ParquetFormatter

class lsst.daf.butler.formatters.parquet.ParquetFormatter(file_descriptor: FileDescriptor, *, ref: DatasetRef, write_parameters: Mapping[str, Any] | None = None, write_recipes: Mapping[str, Any] | None = None, **kwargs: Any)

Bases: FormatterV2

Interface for reading and writing Arrow Table objects to and from Parquet files.

Attributes Summary

can_read_from_local_file

Declare whether read_from_local_file is available to this formatter.

default_extension

Default extension to use when writing a file.

Methods Summary

read_from_local_file(path[, component, ...])

Read a dataset from a URI guaranteed to refer to the local file system.

write_local_file(in_memory_dataset, uri)

Serialize the in-memory dataset to a local file.

Attributes Documentation

can_read_from_local_file: ClassVar[bool] = True

Declare whether read_from_local_file is available to this formatter.

default_extension: ClassVar[str | None] = '.parq'

Default extension to use when writing a file.

Can be None if the extension is determined dynamically. Use the get_write_extension method to get the actual extension to use.
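The relationship between default_extension and get_write_extension can be illustrated with a minimal sketch. The two classes below are hypothetical stand-ins, not the butler implementation: only the names default_extension and get_write_extension come from the text above, and both the fallback logic and the ".parq.gz" extension are assumptions made for illustration.

```python
from __future__ import annotations

from typing import ClassVar


class SketchFormatter:
    """Hypothetical stand-in for a formatter; not the real FormatterV2."""

    default_extension: ClassVar[str | None] = ".parq"

    def get_write_extension(self) -> str:
        # Assumed behavior: return the class default when one is declared.
        if self.default_extension is None:
            raise NotImplementedError("Extension must be determined dynamically.")
        return self.default_extension


class DynamicSketchFormatter(SketchFormatter):
    """Hypothetical formatter that chooses its extension at write time."""

    # No fixed default: the extension depends on instance state.
    default_extension: ClassVar[str | None] = None

    def __init__(self, compressed: bool) -> None:
        self.compressed = compressed

    def get_write_extension(self) -> str:
        # Illustrative rule only; the real formatter may decide differently.
        return ".parq.gz" if self.compressed else ".parq"


static_ext = SketchFormatter().get_write_extension()
dynamic_ext = DynamicSketchFormatter(compressed=True).get_write_extension()
```

In this sketch, callers always ask get_write_extension for the extension rather than reading default_extension directly, so the static and dynamic cases are handled the same way.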

Methods Documentation

read_from_local_file(path: str, component: str | None = None, expected_size: int = -1) → Any

Read a dataset from a URI guaranteed to refer to the local file system.

Parameters:
path : str

Path to a local file that should be read.

component : str or None, optional

The component to be read from the dataset.

expected_size : int, optional

If known, the expected size of the resource to read. A value of -1 indicates that the file size is not known.

Returns:
in_memory_dataset : object or NotImplemented

The Python object read from the resource or NotImplemented.

Raises:
FormatterNotImplementedError

Raised if there is no implementation written to read data from a local file.

Notes

This method will only be called if the class property can_read_from_local_file is True and other options were not used.
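The gating described in this note can be sketched as follows. This is a minimal illustration under stated assumptions, not the butler's dispatch code: SketchTextFormatter and read_if_supported are hypothetical names, the formatter reads plain text rather than Parquet so the example needs only the standard library, and the use of expected_size as a byte-count sanity check is an assumption.

```python
from __future__ import annotations

import tempfile
from pathlib import Path
from typing import Any, ClassVar


class SketchTextFormatter:
    """Hypothetical stand-in formatter; not the real ParquetFormatter."""

    can_read_from_local_file: ClassVar[bool] = True

    def read_from_local_file(
        self, path: str, component: str | None = None, expected_size: int = -1
    ) -> Any:
        data = Path(path).read_bytes()
        # Assumed use of expected_size: a sanity check, skipped when it is -1.
        if expected_size >= 0 and len(data) != expected_size:
            raise ValueError(f"Expected {expected_size} bytes, read {len(data)}.")
        return data.decode()


def read_if_supported(
    formatter: SketchTextFormatter, path: str, expected_size: int = -1
) -> Any:
    """Call read_from_local_file only when the class flag allows it."""
    if not type(formatter).can_read_from_local_file:
        return NotImplemented
    return formatter.read_from_local_file(path, expected_size=expected_size)


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "example.txt"
    target.write_text("hello")
    result = read_if_supported(SketchTextFormatter(), str(target), expected_size=5)
```

Checking the flag on the class rather than on the instance mirrors its declaration as a ClassVar.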

write_local_file(in_memory_dataset: Any, uri: ResourcePath) → None

Serialize the in-memory dataset to a local file.

Parameters:
in_memory_dataset : object

The Python object to serialize.

uri : ResourcePath

The URI to use when writing the file.

Raises:
FormatterNotImplementedError

Raised if the formatter subclass has not implemented this method or has failed to implement the to_bytes method.

Notes

By default this method will attempt to call to_bytes and then write these bytes to the file.
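The fallback pattern named in this note (serialize with to_bytes, then write the bytes) can be sketched as below. SketchBytesFormatter is a hypothetical stand-in, not the real base class: it takes a plain string path instead of a ResourcePath, and its to_bytes serialization is purely illustrative.

```python
from __future__ import annotations

import tempfile
from pathlib import Path
from typing import Any


class SketchBytesFormatter:
    """Hypothetical stand-in; not the real FormatterV2."""

    def to_bytes(self, in_memory_dataset: Any) -> bytes:
        # Illustrative serialization: encode the string form of the object.
        return str(in_memory_dataset).encode("utf-8")

    def write_local_file(self, in_memory_dataset: Any, path: str) -> None:
        # Fallback pattern: serialize to bytes, then write those bytes out.
        serialized = self.to_bytes(in_memory_dataset)
        Path(path).write_bytes(serialized)


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "dataset.bin"
    SketchBytesFormatter().write_local_file({"a": 1}, str(out))
    written = out.read_bytes()
```

A subclass that overrides write_local_file directly, as the method summary above suggests ParquetFormatter does, would not rely on this fallback.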