ParquetFormatter

class lsst.daf.butler.formatters.parquet.ParquetFormatter(file_descriptor: FileDescriptor, *, ref: DatasetRef, write_parameters: Mapping[str, Any] | None = None, write_recipes: Mapping[str, Any] | None = None, **kwargs: Any)

Bases: FormatterV2

Interface for reading and writing Arrow Table objects to and from Parquet files.

Attributes Summary

can_read_from_local_file

Declare whether read_from_local_file is available to this formatter.

default_extension

Default extension to use when writing a file.

Methods Summary

can_accept(in_memory_dataset)

Indicate whether this formatter can accept the specified storage class directly.

read_from_local_file(path[, component, ...])

Read a dataset from a URI guaranteed to refer to the local file system.

write_local_file(in_memory_dataset, uri)

Serialize the in-memory dataset to a local file.

Attributes Documentation

can_read_from_local_file: ClassVar[bool] = True

Declare whether read_from_local_file is available to this formatter.

default_extension: ClassVar[str | None] = '.parq'

Default extension to use when writing a file.

Can be None if the extension is determined dynamically. Use the get_write_extension method to get the actual extension to use.
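
The relationship between a class-level default and a dynamically chosen extension can be pictured with a toy class. This is a minimal sketch, not the butler implementation: ToyFormatter, its compress parameter, and the ".gz" suffix are hypothetical, and only the names default_extension and get_write_extension come from the text above.

```python
from __future__ import annotations

from typing import ClassVar


class ToyFormatter:
    """Hypothetical stand-in for a formatter; not the real FormatterV2."""

    # Class-level default, mirroring the documented attribute.
    default_extension: ClassVar[str | None] = ".parq"

    def __init__(self, compress: bool = False) -> None:
        # "compress" is an invented write parameter used only for illustration.
        self.compress = compress

    def get_write_extension(self) -> str:
        """Return the extension that would actually be used for writing."""
        ext = self.default_extension
        if ext is None:
            # A class with no default would have to decide dynamically.
            raise RuntimeError("No default extension; a subclass must choose one.")
        return ext + ".gz" if self.compress else ext
```

Callers in this sketch ask the instance via get_write_extension rather than reading default_extension directly, since the instance may adjust the result.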

Methods Documentation

can_accept(in_memory_dataset: Any) -> bool

Indicate whether this formatter can accept the specified storage class directly.

Parameters:
in_memory_dataset : object

The dataset that is to be written.

Returns:
accepts : bool

If True, the formatter can write data of this type without requiring the datastore to convert it. If False, the datastore will attempt to convert the data before writing.

Notes

The base class always returns False even if the given type is an instance of the storage class type. This will result in a storage class conversion no-op but also allows mocks with mocked storage classes to work properly.
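
The decision described above can be sketched with toy classes. This is an assumption-laden illustration rather than datastore code: BaseToyFormatter, DictAwareFormatter, prepare_for_write, and the convert callable are all hypothetical names.

```python
from typing import Any, Callable


class BaseToyFormatter:
    """Hypothetical base class; always declines, as the Notes describe."""

    def can_accept(self, in_memory_dataset: Any) -> bool:
        return False


class DictAwareFormatter(BaseToyFormatter):
    """Hypothetical subclass that can write dict objects directly."""

    def can_accept(self, in_memory_dataset: Any) -> bool:
        return isinstance(in_memory_dataset, dict)


def prepare_for_write(
    formatter: BaseToyFormatter,
    dataset: Any,
    convert: Callable[[Any], Any],
) -> Any:
    """Return the object to hand to the formatter's write method."""
    if formatter.can_accept(dataset):
        # The formatter handles this type itself; skip conversion.
        return dataset
    # Otherwise the (hypothetical) datastore converts first.
    return convert(dataset)
```

With the base class, conversion always runs; if the object already has the target type, that conversion is effectively a no-op.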

read_from_local_file(path: str, component: str | None = None, expected_size: int = -1) -> Any

Read a dataset from a URI guaranteed to refer to the local file system.

Parameters:
path : str

Path to a local file that should be read.

component : str or None, optional

The component to be read from the dataset.

expected_size : int, optional

If known, the expected size of the resource to read. A value of -1 indicates that the file size is not known.

Returns:
in_memory_dataset : object or NotImplemented

The Python object read from the resource or NotImplemented.

Raises:
FormatterNotImplementedError

Raised if there is no implementation written to read data from a local file.

Notes

This method will only be called if the class property can_read_from_local_file is True and other options were not used.
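
A rough picture of how a caller might honor the can_read_from_local_file flag and the NotImplemented return value is given below. It is a sketch under stated assumptions: ToyFormatter, read_dataset, and demo are hypothetical, the toy reader returns raw bytes rather than an Arrow table, and a plain NotImplementedError stands in for FormatterNotImplementedError.

```python
from __future__ import annotations

import os
import tempfile
from typing import Any


class ToyFormatter:
    """Hypothetical formatter that reads raw bytes from a local file."""

    can_read_from_local_file = True

    def read_from_local_file(
        self, path: str, component: str | None = None, expected_size: int = -1
    ) -> Any:
        if component is not None:
            # This toy reader has no components, so it declines the request.
            return NotImplemented
        with open(path, "rb") as handle:
            return handle.read()


def read_dataset(formatter: Any, path: str, component: str | None = None) -> Any:
    """Call read_from_local_file only when the class flag allows it."""
    if getattr(formatter, "can_read_from_local_file", False):
        result = formatter.read_from_local_file(path, component=component)
        if result is not NotImplemented:
            return result
    # Stand-in for the documented FormatterNotImplementedError.
    raise NotImplementedError("No read method could handle this request.")


def demo() -> bytes:
    """Write a small file to a temporary directory and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.bin")
        with open(path, "wb") as handle:
            handle.write(b"payload")
        return read_dataset(ToyFormatter(), path)
```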

write_local_file(in_memory_dataset: Any, uri: ResourcePath) -> None

Serialize the in-memory dataset to a local file.

Parameters:
in_memory_dataset : object

The Python object to serialize.

uri : ResourcePath

The URI to use when writing the file.

Raises:
FormatterNotImplementedError

Raised if the formatter subclass has not implemented this method or has failed to implement the to_bytes method.

Notes

By default this method will attempt to call to_bytes and then write these bytes to the file.
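
The default behavior described in the Notes, serialize with to_bytes and then write the bytes out, might look roughly like the following. This is a simplified sketch, not the butler code: ToyWriter, TextWriter, and round_trip are hypothetical, a plain filesystem path replaces ResourcePath, and NotImplementedError stands in for FormatterNotImplementedError.

```python
import os
import tempfile
from pathlib import Path
from typing import Any


class ToyWriter:
    """Hypothetical base class with a to_bytes-driven default writer."""

    def to_bytes(self, in_memory_dataset: Any) -> bytes:
        # Subclasses are expected to override this.
        raise NotImplementedError("to_bytes is not implemented.")

    def write_local_file(self, in_memory_dataset: Any, path: str) -> None:
        # Default path: serialize to bytes, then write them to the file.
        data = self.to_bytes(in_memory_dataset)
        Path(path).write_bytes(data)


class TextWriter(ToyWriter):
    """Hypothetical subclass that serializes objects as UTF-8 text."""

    def to_bytes(self, in_memory_dataset: Any) -> bytes:
        return str(in_memory_dataset).encode("utf-8")


def round_trip(writer: ToyWriter, dataset: Any) -> bytes:
    """Write the dataset to a temporary file and return the bytes on disk."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "out.txt")
        writer.write_local_file(dataset, path)
        return Path(path).read_bytes()
```

A subclass that overrides write_local_file directly, as a Parquet writer plausibly would, never reaches the to_bytes fallback.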