ParquetFormatter
-
class lsst.daf.butler.formatters.parquet.ParquetFormatter(fileDescriptor: FileDescriptor, dataId: Optional[DataCoordinate] = None, writeParameters: Optional[Dict[str, Any]] = None, writeRecipes: Optional[Dict[str, Any]] = None)
Bases: lsst.daf.butler.Formatter
Interface for reading and writing Pandas DataFrames to and from Parquet files.
This formatter is for the DataFrame StorageClass.
Attributes Summary
dataId: DataId associated with this formatter (DataCoordinate).
extension
fileDescriptor: FileDescriptor associated with this formatter (FileDescriptor, read-only).
supportedExtensions
supportedWriteParameters
unsupportedParameters
writeParameters: Parameters to use when writing out datasets.
writeRecipes: Detailed write Recipes indexed by recipe name.
Methods Summary
can_read_bytes(): Indicate if this formatter can format from bytes.
fromBytes(serializedDataset, component): Reads serialized data into a Dataset or its component.
makeUpdatedLocation(location): Return a new Location instance updated with this formatter’s extension.
name(): Returns the fully qualified name of the formatter.
predictPath(): Return the path that would be returned by write, without actually writing.
read(component): Read a Dataset.
segregateParameters(parameters): Segregate the supplied parameters into those understood by the formatter and those not understood by the formatter.
toBytes(inMemoryDataset): Serialize the Dataset to bytes based on formatter.
validateExtension(location): Check that the provided location refers to a file extension that is understood by this formatter.
validateWriteRecipes(recipes): Validate supplied recipes for this formatter.
write(inMemoryDataset): Write a Dataset.
Attributes Documentation
-
dataId
DataId associated with this formatter (DataCoordinate).
-
extension = '.parq'
-
fileDescriptor
FileDescriptor associated with this formatter (FileDescriptor, read-only).
-
supportedExtensions = frozenset()
-
supportedWriteParameters = None
-
unsupportedParameters = frozenset()
-
writeParameters
Parameters to use when writing out datasets.
-
writeRecipes
Detailed write Recipes indexed by recipe name.
Methods Documentation
-
classmethod can_read_bytes() → bool
Indicate if this formatter can format from bytes.
Returns:
-
fromBytes(serializedDataset: bytes, component: Optional[str] = None) → object
Reads serialized data into a Dataset or its component.
Returns:
- inMemoryDataset : object
The requested data as a Python object. The type of object is controlled by the specific formatter.
-
makeUpdatedLocation(location: lsst.daf.butler.core.location.Location) → lsst.daf.butler.core.location.Location
Return a new Location instance updated with this formatter’s extension.
Parameters:
- location : Location
The location to update.
Returns:
- updated : Location
A new Location with a new file extension applied.
Raises:
- NotImplementedError
Raised if there is no extension attribute associated with this formatter.
Notes
This method is available to all Formatters but might not be implemented by all formatters. It requires that a formatter set an extension attribute containing the file extension used when writing files. If extension is None the supplied file will not be updated. Not all formatters write files so this is not defined in the base class.
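The extension handling described in these notes can be illustrated without the butler itself. The following is a minimal sketch, assuming a hypothetical helper `update_extension` that works on plain path strings rather than `Location` objects. It is not the library's implementation; it only mirrors the documented behaviour: replace the suffix, leave the path alone when the extension is None, and raise `NotImplementedError` when no extension is defined.

```python
from pathlib import PurePosixPath

# Sentinel meaning "the formatter defines no extension attribute at all".
_MISSING = object()


def update_extension(path, extension=_MISSING):
    """Return ``path`` with its suffix replaced by ``extension``.

    Hypothetical stand-in for the documented behaviour; it operates on
    strings, not on butler ``Location`` objects.
    """
    if extension is _MISSING:
        raise NotImplementedError("No extension is associated with this formatter")
    if extension is None:
        # Documented behaviour: the supplied file is not updated.
        return path
    return str(PurePosixPath(path).with_suffix(extension))


updated = update_extension("a/b/catalog.tmp", ".parq")
unchanged = update_extension("a/b/catalog.tmp", None)
```

With the `.parq` extension listed above, `updated` ends in `catalog.parq`, while `unchanged` keeps its original suffix.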
-
classmethod name() → str
Returns the fully qualified name of the formatter.
Returns:
- name : str
Fully-qualified name of formatter class.
-
predictPath() → str
Return the path that would be returned by write, without actually writing.
Uses the FileDescriptor associated with the instance.
Returns:
- path : str
Path within datastore that would be associated with the location stored in this Formatter.
-
read(component: Optional[str] = None) → Any
Read a Dataset.
Parameters:
- component : str, optional
Component to read from the file. Only used if the StorageClass for reading differed from the StorageClass used to write the file.
Returns:
- inMemoryDataset : object
The requested Dataset.
-
segregateParameters(parameters: Optional[Dict[str, Any]] = None) → Tuple[Dict[KT, VT], Dict[KT, VT]]
Segregate the supplied parameters into those understood by the formatter and those not understood by the formatter.
Any unsupported parameters are assumed to be usable by associated assemblers.
Parameters: Returns:
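The split into understood and not-understood parameters can be sketched in plain Python. This is an illustration under stated assumptions, not the butler's own code: the function `segregate`, its `unsupported_names` argument, and the parameter names `columns` and `bbox` are all hypothetical and chosen only for the example.

```python
from typing import Any, Dict, FrozenSet, Optional, Tuple


def segregate(
    parameters: Optional[Dict[str, Any]],
    unsupported_names: FrozenSet[str] = frozenset(),
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split ``parameters`` into (supported, unsupported) dictionaries.

    Hypothetical helper: a key counts as unsupported only if it appears
    in ``unsupported_names``.
    """
    if parameters is None:
        return {}, {}
    supported = {k: v for k, v in parameters.items() if k not in unsupported_names}
    unsupported = {k: v for k, v in parameters.items() if k in unsupported_names}
    return supported, unsupported


# Illustrative parameter names only.
supported, unsupported = segregate(
    {"columns": ["ra", "dec"], "bbox": None},
    frozenset({"bbox"}),
)
```

With an empty set of unsupported names, as in the `unsupportedParameters = frozenset()` default listed above, this sketch would place every supplied parameter in the first dictionary.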
-
toBytes(inMemoryDataset: Any) → bytes
Serialize the Dataset to bytes based on formatter.
Parameters:
- inMemoryDataset : object
The Python object to serialize.
Returns:
- serializedDataset : bytes
Bytes representing the serialized dataset.
-
classmethod validateExtension(location: lsst.daf.butler.core.location.Location) → None
Check that the provided location refers to a file extension that is understood by this formatter.
Parameters:
- location : Location
Location from which to extract a file extension.
Raises:
- NotImplementedError
Raised if file extensions are a concept not understood by this formatter.
- ValueError
Raised if the formatter does not understand this extension.
Notes
This method is available to all Formatters but might not be implemented by all formatters. It requires that a formatter set an extension attribute containing the file extension used when writing files. If extension is None only the set of supported extensions will be examined.
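The check described in these notes can be sketched as follows. This is a simplified illustration on plain path strings, assuming a hypothetical function `validate_extension`; it models only the `ValueError` case and the rule that a None extension leaves just the supported set to be examined. The `NotImplementedError` case is not modelled.

```python
from pathlib import PurePosixPath
from typing import FrozenSet, Optional


def validate_extension(
    path: str,
    extension: Optional[str],
    supported: FrozenSet[str] = frozenset(),
) -> None:
    """Raise ValueError if the suffix of ``path`` is not an accepted extension.

    Hypothetical helper: accepted extensions are ``supported`` plus the
    primary ``extension`` when one is set.
    """
    allowed = set(supported)
    if extension is not None:
        allowed.add(extension)
    suffix = PurePosixPath(path).suffix
    if suffix not in allowed:
        raise ValueError(
            f"Extension {suffix!r} of {path!r} is not one of {sorted(allowed)}"
        )


# A matching extension passes silently.
validate_extension("a/b/catalog.parq", ".parq")

# A non-matching extension is rejected.
try:
    validate_extension("a/b/catalog.csv", ".parq")
    rejected = False
except ValueError:
    rejected = True
```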
-
classmethod validateWriteRecipes(recipes: Optional[Mapping[str, Any]]) → Optional[Mapping[str, Any]]
Validate supplied recipes for this formatter.
The recipes are supplemented with default values where appropriate.
Parameters:
- recipes : dict
Recipes to validate.
Returns:
- validated : dict
Validated recipes.
Raises:
- RuntimeError
Raised if validation fails. The default implementation raises if any recipes are given.
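The default behaviour stated above (raise if any recipes are given) can be illustrated with a short stand-alone sketch. The function `validate_write_recipes` and the recipe name `default` are hypothetical; the code mirrors only the documented rule and is not taken from the library.

```python
from typing import Any, Mapping, Optional


def validate_write_recipes(
    recipes: Optional[Mapping[str, Any]],
) -> Optional[Mapping[str, Any]]:
    """Accept missing or empty recipes and reject anything else.

    Hypothetical helper mirroring a formatter that defines no recipes.
    """
    if not recipes:
        # None or an empty mapping: nothing to validate.
        return recipes
    raise RuntimeError(
        f"This formatter does not understand these write recipes: {sorted(recipes)}"
    )


accepted = validate_write_recipes(None)

try:
    validate_write_recipes({"default": {}})
    refused = False
except RuntimeError:
    refused = True
```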
-