ParquetFormatter
- class lsst.daf.butler.formatters.parquet.ParquetFormatter(file_descriptor: FileDescriptor, *, ref: DatasetRef, write_parameters: Mapping[str, Any] | None = None, write_recipes: Mapping[str, Any] | None = None, **kwargs: Any)
Bases: FormatterV2
Interface for reading and writing Arrow Table objects to and from Parquet files.
Attributes Summary
can_read_from_local_file
Declare whether read_from_local_file is available to this formatter.
can_read_from_uri
Declare whether read_from_uri is available to this formatter.
default_extension
Default extension to use when writing a file.
Methods Summary
can_accept(in_memory_dataset)
Indicate whether this formatter can accept the specified storage class directly.
read_from_local_file(path[, component, ...])
Read a dataset from a URI guaranteed to refer to the local file system.
read_from_uri(uri[, component, expected_size])
Read a dataset from a URI that can be local or remote.
write_local_file(in_memory_dataset, uri)
Serialize the in-memory dataset to a local file.
Attributes Documentation
- can_read_from_local_file: ClassVar[bool] = True
Declare whether read_from_local_file is available to this formatter.
- can_read_from_uri: ClassVar[bool] = True
Declare whether read_from_uri is available to this formatter.
- default_extension: ClassVar[str | None] = '.parq'
Default extension to use when writing a file.
Can be None if the extension is determined dynamically. Use the get_write_extension method to get the actual extension to use.
Methods Documentation
- can_accept(in_memory_dataset: Any) -> bool
Indicate whether this formatter can accept the specified storage class directly.
- Parameters:
- in_memory_dataset : object
The dataset that is to be written.
- Returns:
Notes
The base class always returns False, even if the given object is an instance of the storage class type. The resulting storage class conversion is a no-op, and this behavior also allows mocks with mocked storage classes to work properly.
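The relationship between can_accept and storage class conversion can be sketched with stand-in classes. Every name below (BaseFormatterSketch, TableFormatterSketch, prepare_for_write, the convert callable, and the accepted types) is hypothetical and chosen for illustration; none of it is taken from lsst.daf.butler. The sketch only shows the idea that a caller may skip conversion when the formatter reports that it can take the object as-is.

```python
from typing import Any, Callable, Tuple


class BaseFormatterSketch:
    """Hypothetical stand-in for a formatter base class."""

    def can_accept(self, in_memory_dataset: Any) -> bool:
        # Mirrors the documented base behavior: never accept directly,
        # so the caller always goes through conversion.
        return False


class TableFormatterSketch(BaseFormatterSketch):
    """Hypothetical subclass that accepts some types without conversion."""

    # Assumption: these accepted types are purely illustrative.
    accepted_types: Tuple[type, ...] = (dict, list)

    def can_accept(self, in_memory_dataset: Any) -> bool:
        return isinstance(in_memory_dataset, self.accepted_types)


def prepare_for_write(
    formatter: BaseFormatterSketch,
    dataset: Any,
    convert: Callable[[Any], Any],
) -> Any:
    """Convert the dataset only if the formatter cannot take it as-is."""
    if formatter.can_accept(dataset):
        return dataset
    return convert(dataset)
```

With the base class, convert is always called; if the object already has the target type, that conversion simply hands back an equivalent object, which is the no-op the notes describe.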
- read_from_local_file(path: str, component: str | None = None, expected_size: int = -1) -> Any
Read a dataset from a URI guaranteed to refer to the local file system.
- Parameters:
- Returns:
- in_memory_dataset : object or NotImplemented
The Python object read from the resource or NotImplemented.
- Raises:
- FormatterNotImplementedError
Raised if there is no implementation written to read data from a local file.
Notes
This method will only be called if the class property can_read_from_local_file is True and other options were not used.
- read_from_uri(uri: ResourcePath, component: str | None = None, expected_size: int = -1) -> Any
Read a dataset from a URI that can be local or remote.
- Parameters:
- uri : lsst.resources.ResourcePath
URI to use to read the dataset. This URI can be local or remote and can refer to the actual resource or to a locally cached file.
- component : str or None, optional
The component to be read from the dataset.
- expected_size : int, optional
If known, the expected size of the resource to read. A value of -1 indicates that the file size is not known.
- Returns:
- in_memory_dataset : object or NotImplemented
The Python object read from the resource or NotImplemented.
- Raises:
- FormatterNotImplementedError
Raised if there is no support for direct reads from a possibly remote URI.
Notes
This method is only called if the class property can_read_from_uri is set to True.
It is possible that a cached local file will be given to this method even if it was originally a remote URI. This can happen if the original write resulted in the file being added to the local cache.
If the full file is read through this method, the file will not be added to the local cache. Consider returning NotImplemented in this situation, for example if no parameters or component are specified, so that the system falls back to calling read_from_local_file (which will populate the cache if configured to do so).
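The fallback described in these notes can be sketched as follows. This is a minimal illustration under stated assumptions, not the lsst.daf.butler implementation: ReaderSketch and read_dataset are hypothetical names, plain string paths stand in for ResourcePath, and a "component" is assumed to be one key in a simple key=value text file rather than a Parquet column.

```python
from pathlib import Path
from typing import Any, Optional


class ReaderSketch:
    """Hypothetical stand-in for a formatter with both read methods."""

    can_read_from_uri = True
    can_read_from_local_file = True

    def read_from_uri(self, uri: str, component: Optional[str] = None) -> Any:
        # Decline whole-file reads so the caller can fall back to the
        # local-file path (which, in a real system, may fill the cache).
        if component is None:
            return NotImplemented
        # Assumption: a component is one key in a key=value text file.
        for line in Path(uri).read_text().splitlines():
            key, _, value = line.partition("=")
            if key == component:
                return value
        raise KeyError(component)

    def read_from_local_file(self, path: str, component: Optional[str] = None) -> Any:
        # Read the whole file into a dict of key/value strings.
        lines = Path(path).read_text().splitlines()
        return dict(line.split("=", 1) for line in lines)


def read_dataset(reader: ReaderSketch, uri: str, component: Optional[str] = None) -> Any:
    """Try the URI reader first, then fall back to the local-file reader."""
    if reader.can_read_from_uri:
        result = reader.read_from_uri(uri, component)
        if result is not NotImplemented:
            return result
    if reader.can_read_from_local_file:
        # A real system would download or cache the file before this call.
        return reader.read_from_local_file(uri, component)
    raise RuntimeError("no read method available")
```

Returning the NotImplemented singleton, rather than raising, lets the caller distinguish "this path declines the request" from a genuine read failure.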
- write_local_file(in_memory_dataset: Any, uri: ResourcePath) -> None
Serialize the in-memory dataset to a local file.
- Parameters:
- in_memory_dataset : object
The Python object to serialize.
- uri : ResourcePath
The URI to use when writing the file.
- Raises:
- FormatterNotImplementedError
Raised if the formatter subclass has not implemented this method or has failed to implement the to_bytes method.
Notes
By default this method will attempt to call to_bytes and then write these bytes to the file.
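The default behavior described in these notes (serialize with to_bytes, then write the bytes) can be sketched as below. WriterSketch is a hypothetical stand-in, a plain string path replaces ResourcePath, and UTF-8 text is assumed as the serialization purely for illustration; whether ParquetFormatter itself relies on this default or overrides it is not stated here.

```python
from pathlib import Path
from typing import Any


class WriterSketch:
    """Hypothetical stand-in for a formatter that serializes via bytes."""

    def to_bytes(self, in_memory_dataset: Any) -> bytes:
        # Assumption: UTF-8 text stands in for a real serialization format.
        return str(in_memory_dataset).encode("utf-8")

    def write_local_file(self, in_memory_dataset: Any, path: str) -> None:
        # Default strategy from the notes: serialize first, then write
        # the resulting bytes to the local file in a single call.
        Path(path).write_bytes(self.to_bytes(in_memory_dataset))
```

A subclass that can write its format directly to disk would typically override write_local_file instead of going through an intermediate bytes object.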