ParquetFormatter
- class lsst.daf.butler.formatters.parquet.ParquetFormatter(file_descriptor: FileDescriptor, *, ref: DatasetRef, write_parameters: Mapping[str, Any] | None = None, write_recipes: Mapping[str, Any] | None = None, **kwargs: Any)
- Bases: FormatterV2
- Interface for reading and writing Arrow Table objects to and from Parquet files.
 - Attributes Summary
- can_read_from_local_file - Declare whether read_from_local_file is available to this formatter.
- can_read_from_uri - Declare whether read_from_uri is available to this formatter.
- default_extension - Default extension to use when writing a file.
 - Methods Summary
- can_accept(in_memory_dataset) - Indicate whether this formatter can accept the specified storage class directly.
- read_from_local_file(path[, component, ...]) - Read a dataset from a URI guaranteed to refer to the local file system.
- read_from_uri(uri[, component, expected_size]) - Read a dataset from a URI that can be local or remote.
- write_local_file(in_memory_dataset, uri) - Serialize the in-memory dataset to a local file.
 - Attributes Documentation
 - can_read_from_local_file: ClassVar[bool] = True
- Declare whether read_from_local_file is available to this formatter.
 - can_read_from_uri: ClassVar[bool] = True
- Declare whether read_from_uri is available to this formatter.
 - default_extension: ClassVar[str | None] = '.parq'
- Default extension to use when writing a file. Can be None if the extension is determined dynamically. Use the get_write_extension method to get the actual extension to use.
 - Methods Documentation
 - can_accept(in_memory_dataset: Any) -> bool
- Indicate whether this formatter can accept the specified storage class directly.
- Parameters:
- in_memory_dataset : object
- The dataset that is to be written.
 
- Returns:
 - Notes
- The base class always returns False even if the given type is an instance of the storage class type. This results in a storage class conversion no-op, but also allows mocks with mocked storage classes to work properly.
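The "always decline unless overridden" behavior described in the Notes can be illustrated with a minimal sketch. The class names `BaseFormatterSketch` and `DictAwareFormatterSketch` and the `accepted_types` attribute are hypothetical stand-ins invented for illustration; they are not part of `lsst.daf.butler`, and the types that `ParquetFormatter` itself accepts are not stated on this page.

```python
from typing import Any


class BaseFormatterSketch:
    """Hypothetical stand-in for a formatter base class (not the lsst API)."""

    def can_accept(self, in_memory_dataset: Any) -> bool:
        # Mirrors the documented base-class behavior: always decline,
        # leaving any storage class conversion to the caller.
        return False


class DictAwareFormatterSketch(BaseFormatterSketch):
    """Hypothetical subclass that accepts some in-memory types directly."""

    # Assumption for illustration: types this sketch can serialize as-is.
    accepted_types = (dict, list)

    def can_accept(self, in_memory_dataset: Any) -> bool:
        # Accept only objects whose type the subclass knows how to write.
        return isinstance(in_memory_dataset, self.accepted_types)


base = BaseFormatterSketch()
custom = DictAwareFormatterSketch()

base_result = base.can_accept({"a": 1})
custom_dict_result = custom.can_accept({"a": 1})
custom_str_result = custom.can_accept("text")
```

In this sketch the base class declines every object, while the subclass accepts only the types it lists.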
 - read_from_local_file(path: str, component: str | None = None, expected_size: int = -1) -> Any
- Read a dataset from a URI guaranteed to refer to the local file system.
- Parameters:
- Returns:
- in_memory_dataset : object or NotImplemented
- The Python object read from the resource or NotImplemented.
 
- Raises:
- FormatterNotImplementedError
- Raised if there is no implementation written to read data from a local file. 
 
 - Notes
- This method will only be called if the class property can_read_from_local_file is True and other options were not used.
 - read_from_uri(uri: ResourcePath, component: str | None = None, expected_size: int = -1) -> Any
- Read a dataset from a URI that can be local or remote.
- Parameters:
- uri : lsst.resources.ResourcePath
- URI to use to read the dataset. This URI can be local or remote and can refer to the actual resource or to a locally cached file.
- component : str or None, optional
- The component to be read from the dataset.
- expected_size : int, optional
- If known, the expected size of the resource to read. A value of -1 indicates that the file size is not known.
 
- Returns:
- in_memory_dataset : object or NotImplemented
- The Python object read from the resource or NotImplemented.
 
- Raises:
- FormatterNotImplementedError
- Raised if there is no support for direct reads from a possibly remote URI.
 
 - Notes
- This method is only called if the class property can_read_from_uri is set to True.
- It is possible that a cached local file will be given to this method even if it was originally a remote URI. This can happen if the original write resulted in the file being added to the local cache.
- If the full file is being read, this file will not be added to the local cache. Consider returning NotImplemented in this situation, for example if there are no parameters or component specified, and allowing the system to fall back to calling read_from_local_file (which will populate the cache if configured to do so).
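The fallback described in these Notes can be sketched as follows. This is a simplified illustration only: `FallbackReaderSketch` and `read_dataset` are hypothetical names, plain strings stand in for `ResourcePath` objects, and the real dispatch logic in `lsst.daf.butler` may differ.

```python
from typing import Any, Optional


class FallbackReaderSketch:
    """Hypothetical reader showing the NotImplemented fallback (not the lsst API)."""

    can_read_from_uri = True
    can_read_from_local_file = True

    def read_from_uri(self, uri: str, component: Optional[str] = None,
                      expected_size: int = -1) -> Any:
        if component is None:
            # Full-file read: decline, so the caller can use the
            # local-file path (which may populate a cache).
            return NotImplemented
        return f"component {component!r} read directly from {uri}"

    def read_from_local_file(self, path: str, component: Optional[str] = None,
                             expected_size: int = -1) -> Any:
        return f"full dataset read from local file {path}"


def read_dataset(reader: FallbackReaderSketch, uri: str,
                 component: Optional[str] = None) -> Any:
    """Hypothetical dispatcher: try the URI read first, then fall back."""
    if reader.can_read_from_uri:
        result = reader.read_from_uri(uri, component=component)
        if result is not NotImplemented:
            return result
    if reader.can_read_from_local_file:
        # A real system would download or cache the resource first;
        # this sketch simply reuses the URI string as the local path.
        return reader.read_from_local_file(uri, component=component)
    raise RuntimeError("no read method available")


reader = FallbackReaderSketch()
partial = read_dataset(reader, "s3://bucket/table.parq", component="columns")
full = read_dataset(reader, "s3://bucket/table.parq")
```

Here a component read is served directly from the URI, while a full-file read is declined and handled by the local-file method instead.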
 - write_local_file(in_memory_dataset: Any, uri: ResourcePath) -> None
- Serialize the in-memory dataset to a local file.
- Parameters:
- in_memory_dataset : object
- The Python object to serialize.
- uri : ResourcePath
- The URI to use when writing the file.
 
- Raises:
- FormatterNotImplementedError
- Raised if the formatter subclass has not implemented this method or has failed to implement the to_bytes method.
 
 - Notes
- By default this method will attempt to call to_bytes and then write these bytes to the file.
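The default behavior described in the Notes (call to_bytes, then write the result to the file) might look roughly like the sketch below. `BytesWriterSketch` and `JsonWriterSketch` are hypothetical classes, `pathlib.Path` stands in for `ResourcePath`, and the built-in `NotImplementedError` stands in for `FormatterNotImplementedError`. JSON is used only to keep the example dependency-free; it is not how ParquetFormatter serializes data.

```python
import json
import tempfile
from pathlib import Path
from typing import Any


class BytesWriterSketch:
    """Hypothetical base class mimicking the documented default write path."""

    def to_bytes(self, in_memory_dataset: Any) -> bytes:
        # Stand-in for FormatterNotImplementedError in the real package.
        raise NotImplementedError("to_bytes is not implemented")

    def write_local_file(self, in_memory_dataset: Any, path: Path) -> None:
        # Default behavior: serialize to bytes, then write them to the file.
        data = self.to_bytes(in_memory_dataset)
        path.write_bytes(data)


class JsonWriterSketch(BytesWriterSketch):
    """Hypothetical subclass that only supplies to_bytes."""

    def to_bytes(self, in_memory_dataset: Any) -> bytes:
        return json.dumps(in_memory_dataset, sort_keys=True).encode("utf-8")


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "example.json"
    JsonWriterSketch().write_local_file({"b": 2, "a": 1}, target)
    written = target.read_bytes()
```

A subclass that can produce bytes inherits a working write_local_file, while a subclass that implements neither method raises when asked to write.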