ParquetFormatter
- class lsst.daf.butler.formatters.parquet.ParquetFormatter(file_descriptor: FileDescriptor, *, ref: DatasetRef, write_parameters: Mapping[str, Any] | None = None, write_recipes: Mapping[str, Any] | None = None, **kwargs: Any)
Bases: FormatterV2
Interface for reading and writing Arrow Table objects to and from Parquet files.
Attributes Summary
can_read_from_local_file Declare whether read_from_file is available to this formatter.
can_read_from_uri Declare whether read_from_uri is available to this formatter.
default_extension Default extension to use when writing a file.
Methods Summary
add_provenance(in_memory_dataset[, provenance]) Add provenance to the dataset.
can_accept(in_memory_dataset) Indicate whether this formatter can accept the specified storage class directly.
read_from_local_file(path[, component, ...]) Read a dataset from a URI guaranteed to refer to the local file system.
read_from_uri(uri[, component, expected_size]) Read a dataset from a URI that can be local or remote.
write_local_file(in_memory_dataset, uri) Serialize the in-memory dataset to a local file.
Attributes Documentation
- can_read_from_local_file: ClassVar[bool] = True
Declare whether read_from_file is available to this formatter.
- can_read_from_uri: ClassVar[bool] = True
Declare whether read_from_uri is available to this formatter.
- default_extension: ClassVar[str | None] = '.parq'
Default extension to use when writing a file.
Can be None if the extension is determined dynamically. Use the get_write_extension method to get the actual extension to use.
Methods Documentation
- add_provenance(in_memory_dataset: Any, provenance: DatasetProvenance | None = None) -> Any
Add provenance to the dataset.
- Parameters:
- Returns:
- dataset_to_write (object): The dataset to use for serialization. Can be the same object as given.
Notes
The base class implementation returns the given object unchanged.
- can_accept(in_memory_dataset: Any) -> bool
Indicate whether this formatter can accept the specified storage class directly.
- Parameters:
- in_memory_dataset (object): The dataset that is to be written.
- Returns:
Notes
The base class always returns False, even if the given type is an instance of the storage class type. This results in a no-op storage class conversion, but it also allows mocks with mocked storage classes to work properly.
- read_from_local_file(path: str, component: str | None = None, expected_size: int = -1) -> Any
Read a dataset from a URI guaranteed to refer to the local file system.
- Parameters:
- Returns:
- in_memory_dataset (object or NotImplemented): The Python object read from the resource, or NotImplemented.
- Raises:
- FormatterNotImplementedError
Raised if there is no implementation written to read data from a local file.
Notes
This method will only be called if the class property can_read_from_local_file is True and other options were not used.
- read_from_uri(uri: ResourcePath, component: str | None = None, expected_size: int = -1) -> Any
Read a dataset from a URI that can be local or remote.
- Parameters:
- uri (lsst.resources.ResourcePath): URI to use to read the dataset. This URI can be local or remote and can refer to the actual resource or to a locally cached file.
- component (str or None, optional): The component to be read from the dataset.
- expected_size (int, optional): If known, the expected size of the resource to read. A value of -1 indicates that the file size is not known.
- Returns:
- in_memory_dataset (object or NotImplemented): The Python object read from the resource, or NotImplemented.
- Raises:
- FormatterNotImplementedError
Raised if there is no support for direct reads from a possibly remote URI.
Notes
This method is only called if the class property can_read_from_uri is set to True.
It is possible that a cached local file will be given to this method even if it was originally a remote URI. This can happen if the original write resulted in the file being added to the local cache.
If the full file is read through this method, the file will not be added to the local cache. Consider returning NotImplemented in this situation (for example, when no parameters or component are specified) so that the system falls back to calling read_from_local_file, which will populate the cache if configured to do so.
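The fallback described in these notes can be illustrated with a small, self-contained sketch. It does not use the lsst.daf.butler classes: SketchFormatter, read_dataset, and the "name=value" file layout are hypothetical stand-ins, and the dispatch logic is only an assumption about how a caller might honor a NotImplemented return value.

```python
from pathlib import Path


class SketchFormatter:
    """Toy stand-in for a formatter; not the lsst.daf.butler API."""

    can_read_from_uri = True
    can_read_from_local_file = True

    def read_from_uri(self, uri, component=None, expected_size=-1):
        # A full-file read gains nothing from a direct read, so decline
        # and let the caller fall back to the local-file code path.
        if component is None:
            return NotImplemented
        return self._read(uri, component)

    def read_from_local_file(self, path, component=None, expected_size=-1):
        return self._read(path, component)

    def _read(self, path, component):
        # Hypothetical file layout: one "name=value" line per component.
        text = Path(path).read_text()
        if component is None:
            return text
        for line in text.splitlines():
            name, _, value = line.partition("=")
            if name == component:
                return value
        raise KeyError(component)


def read_dataset(formatter, uri, component=None):
    """Hypothetical caller: try the URI read first, then fall back."""
    if formatter.can_read_from_uri:
        result = formatter.read_from_uri(uri, component=component)
        if result is not NotImplemented:
            return result
    if formatter.can_read_from_local_file:
        return formatter.read_from_local_file(uri, component=component)
    raise RuntimeError("no read method available")
```

In this sketch a component read is served directly by read_from_uri, while a full read is declined and handled by read_from_local_file, which is the point at which a real datastore could populate its cache.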
- write_local_file(in_memory_dataset: Any, uri: ResourcePath) -> None
Serialize the in-memory dataset to a local file.
- Parameters:
- in_memory_dataset (object): The Python object to serialize.
- uri (ResourcePath): The URI to use when writing the file.
- Raises:
- FormatterNotImplementedError
Raised if the formatter subclass has not implemented this method or has failed to implement the to_bytes method.
Notes
By default this method will attempt to call to_bytes and then write these bytes to the file.
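The default behavior described above (serialize with to_bytes, then write the bytes to the file) might look roughly like the following sketch. The class names are hypothetical, a plain filesystem path stands in for ResourcePath, and the built-in NotImplementedError stands in for FormatterNotImplementedError; none of this is the actual lsst.daf.butler implementation.

```python
import json
from pathlib import Path


class SketchBytesFormatter:
    """Toy base class; not the lsst.daf.butler FormatterV2."""

    def to_bytes(self, in_memory_dataset):
        # Stand-in for FormatterNotImplementedError in the real package.
        raise NotImplementedError("to_bytes is not implemented")

    def write_local_file(self, in_memory_dataset, path):
        # Default strategy: serialize to bytes, then write them out.
        data = self.to_bytes(in_memory_dataset)
        Path(path).write_bytes(data)


class JsonSketchFormatter(SketchBytesFormatter):
    """Hypothetical subclass that only supplies to_bytes."""

    def to_bytes(self, in_memory_dataset):
        return json.dumps(in_memory_dataset, sort_keys=True).encode("utf-8")
```

A subclass that implements only to_bytes inherits a working write_local_file, while the base class raises because nothing has been implemented. A formatter for a columnar format such as Parquet would more plausibly override write_local_file itself and write through its own library.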