ParquetFormatter

class lsst.daf.butler.formatters.parquet.ParquetFormatter(file_descriptor: FileDescriptor, *, ref: DatasetRef, write_parameters: Mapping[str, Any] | None = None, write_recipes: Mapping[str, Any] | None = None, **kwargs: Any)

Bases: FormatterV2

Interface for reading and writing Arrow Table objects to and from Parquet files.

Attributes Summary

can_read_from_local_file

Declare whether read_from_local_file is available to this formatter.

can_read_from_uri

Declare whether read_from_uri is available to this formatter.

default_extension

Default extension to use when writing a file.

Methods Summary

can_accept(in_memory_dataset)

Indicate whether this formatter can accept the specified storage class directly.

read_from_local_file(path[, component, ...])

Read a dataset from a URI guaranteed to refer to the local file system.

read_from_uri(uri[, component, expected_size])

Read a dataset from a URI that can be local or remote.

write_local_file(in_memory_dataset, uri)

Serialize the in-memory dataset to a local file.

Attributes Documentation

can_read_from_local_file: ClassVar[bool] = True

Declare whether read_from_local_file is available to this formatter.

can_read_from_uri: ClassVar[bool] = True

Declare whether read_from_uri is available to this formatter.

default_extension: ClassVar[str | None] = '.parq'

Default extension to use when writing a file.

Can be None if the extension is determined dynamically. Use the get_write_extension method to get the actual extension to use.

Methods Documentation

can_accept(in_memory_dataset: Any) → bool

Indicate whether this formatter can accept the specified storage class directly.

Parameters:
in_memory_dataset : object

The dataset that is to be written.

Returns:
accepts : bool

If True, the formatter can write data of this type without requiring the datastore to convert it. If False, the datastore will attempt to convert it before writing.

Notes

The base class always returns False even if the given type is an instance of the storage class type. This will result in a storage class conversion no-op but also allows mocks with mocked storage classes to work properly.
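
The contract described above can be sketched with stand-in classes. This is an illustration only: SketchBaseFormatter, SketchTableFormatter, and prepare_for_write are hypothetical names, not part of lsst.daf.butler, and the datastore's storage class conversion is reduced to a plain callable.

```python
from typing import Any, Callable


class SketchBaseFormatter:
    """Stand-in for a formatter base class; not the real FormatterV2."""

    def can_accept(self, in_memory_dataset: Any) -> bool:
        # Mirrors the documented base-class behaviour: always decline.
        return False


class SketchTableFormatter(SketchBaseFormatter):
    """Hypothetical formatter that can write dict and list objects directly."""

    _native_types = (dict, list)

    def can_accept(self, in_memory_dataset: Any) -> bool:
        return isinstance(in_memory_dataset, self._native_types)


def prepare_for_write(
    formatter: SketchBaseFormatter,
    dataset: Any,
    convert: Callable[[Any], Any],
) -> Any:
    """Hypothetical datastore-side step: convert only if the formatter declines."""
    if formatter.can_accept(dataset):
        return dataset
    return convert(dataset)


# A tuple of pairs is not accepted directly, so it is converted to a dict.
converted = prepare_for_write(SketchTableFormatter(), (("a", 1),), dict)
# A dict is accepted directly and passed through unchanged.
untouched = prepare_for_write(SketchTableFormatter(), {"a": 1}, dict)
```

Returning True only for types the formatter can serialize natively keeps the conversion step as the default path for everything else.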

read_from_local_file(path: str, component: str | None = None, expected_size: int = -1) → Any

Read a dataset from a URI guaranteed to refer to the local file system.

Parameters:
path : str

Path to a local file that should be read.

component : str or None, optional

The component to be read from the dataset.

expected_size : int, optional

If known, the expected size of the resource to read. A value of -1 indicates that the file size is not known.

Returns:
in_memory_dataset : object or NotImplemented

The Python object read from the resource or NotImplemented.

Raises:
FormatterNotImplementedError

Raised if there is no implementation written to read data from a local file.

Notes

This method will only be called if the class property can_read_from_local_file is True and other options were not used.
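
The source does not say what the formatter does with expected_size. As one possible interpretation, the sketch below treats -1 as "unknown" and checks the size on disk only when a non-negative value is supplied. The function read_local_bytes is hypothetical and reads raw bytes rather than parsing Parquet.

```python
import os
import tempfile


def read_local_bytes(path: str, expected_size: int = -1) -> bytes:
    """Read a local file, validating its size only when one was supplied."""
    if expected_size >= 0:
        actual = os.path.getsize(path)
        if actual != expected_size:
            raise ValueError(f"{path}: expected {expected_size} bytes, found {actual}")
    with open(path, "rb") as handle:
        return handle.read()


with tempfile.TemporaryDirectory() as tmp:
    example_path = os.path.join(tmp, "example.parq")
    with open(example_path, "wb") as handle:
        handle.write(b"PAR1")

    # Size unknown (-1): no check is performed.
    unknown_size = read_local_bytes(example_path)
    # Size known and correct: the check passes.
    known_size = read_local_bytes(example_path, expected_size=4)
    # Size known but wrong: the mismatch is reported.
    try:
        read_local_bytes(example_path, expected_size=99)
        mismatch_detected = False
    except ValueError:
        mismatch_detected = True
```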

read_from_uri(uri: ResourcePath, component: str | None = None, expected_size: int = -1) → Any

Read a dataset from a URI that can be local or remote.

Parameters:
uri : lsst.resources.ResourcePath

URI to use to read the dataset. This URI can be local or remote and can refer to the actual resource or to a locally cached file.

component : str or None, optional

The component to be read from the dataset.

expected_size : int, optional

If known, the expected size of the resource to read. A value of -1 indicates that the file size is not known.

Returns:
in_memory_dataset : object or NotImplemented

The Python object read from the resource or NotImplemented.

Raises:
FormatterNotImplementedError

Raised if there is no support for direct reads from a possibly remote URI.

Notes

This method is only called if the class property can_read_from_uri is set to True.

It is possible that a cached local file will be given to this method even if it was originally a remote URI. This can happen if the original write resulted in the file being added to the local cache.

If the full file is being read this file will not be added to the local cache. Consider returning NotImplemented in this situation, for example if there are no parameters or component specified, and allowing the system to fall back to calling read_from_local_file (which will populate the cache if configured to do so).
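
The fallback suggested in these notes can be sketched as follows. The classes and the read_dataset dispatcher are hypothetical stand-ins, not the real datastore logic: URIs are plain strings, and the download and caching step is only indicated by a comment.

```python
from typing import Any, Optional


class SketchFormatter:
    """Stand-in formatter; not the real ParquetFormatter."""

    can_read_from_uri = True
    can_read_from_local_file = True

    def read_from_uri(self, uri: str, component: Optional[str] = None) -> Any:
        if component is None:
            # Full-file read: decline so the caller can use the caching path.
            return NotImplemented
        return f"{component} read directly from {uri}"

    def read_from_local_file(self, path: str, component: Optional[str] = None) -> Any:
        return f"full dataset read from local copy of {path}"


def read_dataset(formatter: SketchFormatter, uri: str, component: Optional[str] = None) -> Any:
    """Hypothetical caller that tries each read method in turn."""
    if formatter.can_read_from_uri:
        result = formatter.read_from_uri(uri, component=component)
        if result is not NotImplemented:
            return result
    if formatter.can_read_from_local_file:
        # A real datastore would first download the file and optionally cache it.
        return formatter.read_from_local_file(uri, component=component)
    raise RuntimeError("No read method is available for this formatter")


partial = read_dataset(SketchFormatter(), "s3://bucket/t.parq", component="columns")
full = read_dataset(SketchFormatter(), "s3://bucket/t.parq")
```

Returning the NotImplemented sentinel, rather than raising, lets the caller distinguish "try another read path" from a genuine failure.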

write_local_file(in_memory_dataset: Any, uri: ResourcePath) → None

Serialize the in-memory dataset to a local file.

Parameters:
in_memory_dataset : object

The Python object to serialize.

uri : ResourcePath

The URI to use when writing the file.

Raises:
FormatterNotImplementedError

Raised if the formatter subclass has not implemented this method or has failed to implement the to_bytes method.

Notes

By default this method will attempt to call to_bytes and then write these bytes to the file.
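
The default behaviour described in this note can be sketched with stand-in classes. SketchBase, SketchTextFormatter, and SketchNotImplementedError are hypothetical names used for illustration, and a plain string path stands in for ResourcePath.

```python
import os
import tempfile
from typing import Any


class SketchNotImplementedError(NotImplementedError):
    """Stand-in for FormatterNotImplementedError."""


class SketchBase:
    """Stand-in base class with a to_bytes-based default writer."""

    def to_bytes(self, in_memory_dataset: Any) -> bytes:
        raise SketchNotImplementedError("to_bytes is not implemented")

    def write_local_file(self, in_memory_dataset: Any, path: str) -> None:
        # Default behaviour: serialize to bytes, then write them to the file.
        data = self.to_bytes(in_memory_dataset)
        with open(path, "wb") as handle:
            handle.write(data)


class SketchTextFormatter(SketchBase):
    """Hypothetical subclass that only needs to supply to_bytes."""

    def to_bytes(self, in_memory_dataset: Any) -> bytes:
        return str(in_memory_dataset).encode("utf-8")


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "example.txt")
    SketchTextFormatter().write_local_file({"a": 1}, target)
    with open(target, "rb") as handle:
        written = handle.read()

    # The base class has no to_bytes implementation, so writing fails.
    try:
        SketchBase().write_local_file({"a": 1}, target)
        base_raised = False
    except SketchNotImplementedError:
        base_raised = True
```

A formatter that writes through a library call taking a file path would instead override write_local_file itself and bypass to_bytes.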