ParquetFormatter

class lsst.daf.butler.formatters.parquet.ParquetFormatter(fileDescriptor: FileDescriptor, dataId: DataCoordinate, writeParameters: Optional[Dict[str, Any]] = None, writeRecipes: Optional[Dict[str, Any]] = None)

Bases: lsst.daf.butler.Formatter

Interface for reading and writing Pandas DataFrames to and from Parquet files.

This formatter is for the DataFrame StorageClass.

Attributes Summary

dataId

Data ID associated with this formatter (DataCoordinate).

extension

fileDescriptor

File descriptor associated with this formatter (FileDescriptor).

supportedExtensions

Set of all extensions supported by this formatter.

supportedWriteParameters

Parameters understood by this formatter that can be used to control how a dataset is serialized.

unsupportedParameters

Set of read parameters not understood by this Formatter.

writeParameters

Parameters to use when writing out datasets.

writeRecipes

Detailed write Recipes indexed by recipe name.

Methods Summary

can_read_bytes()

Indicate if this formatter can format from bytes.

fromBytes(serializedDataset[, component])

Read serialized data into a Dataset or its component.

makeUpdatedLocation(location)

Return a new Location updated with this formatter’s extension.

name()

Return the fully qualified name of the formatter.

predictPath()

Return the path that would be returned by write.

read([component])

Read a Dataset.

segregateParameters([parameters])

Segregate the supplied parameters.

toBytes(inMemoryDataset)

Serialize the Dataset to bytes based on formatter.

validateExtension(location)

Check the extension of the provided location for compatibility.

validateWriteRecipes(recipes)

Validate supplied recipes for this formatter.

write(inMemoryDataset)

Write a Dataset.

Attributes Documentation

dataId

Data ID associated with this formatter (DataCoordinate).

extension = '.parq'

fileDescriptor

File descriptor associated with this formatter (FileDescriptor).

Read-only property.

supportedExtensions: ClassVar[AbstractSet[str]] = frozenset({})

Set of all extensions supported by this formatter.

Only expected to be populated by Formatters that write files. Any extension assigned to the extension property will be automatically included in the list of supported extensions.

supportedWriteParameters: ClassVar[Optional[AbstractSet[str]]] = None

Parameters understood by this formatter that can be used to control how a dataset is serialized. None indicates that no parameters are supported.

unsupportedParameters: ClassVar[Optional[AbstractSet[str]]] = frozenset({})

Set of read parameters not understood by this Formatter (frozenset). An empty set means all parameters are supported. None indicates that no parameters are supported.

writeParameters

Parameters to use when writing out datasets.

writeRecipes

Detailed write Recipes indexed by recipe name.

Methods Documentation

classmethod can_read_bytes() → bool

Indicate if this formatter can format from bytes.

Returns
can : bool

True if the fromBytes method is implemented.

fromBytes(serializedDataset: bytes, component: Optional[str] = None) → object

Read serialized data into a Dataset or its component.

Parameters
serializedDataset : bytes

Bytes object to unserialize.

component : str, optional

Component to read from the Dataset. Only used if the StorageClass for reading differed from the StorageClass used to write the file.

Returns
inMemoryDataset : object

The requested data as a Python object. The type of object is controlled by the specific formatter.
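The way a caller might combine can_read_bytes, toBytes, and fromBytes can be sketched as follows. This is only an illustration under stated assumptions: JsonBytesFormatter, FileOnlyFormatter, and load are hypothetical stand-ins, not part of lsst.daf.butler, JSON is used in place of Parquet so the sketch runs with the standard library alone, and treating component as a dictionary key is an assumption made for the example.

```python
import json
from typing import Any, Optional


class JsonBytesFormatter:
    """Hypothetical stand-in that supports byte serialization."""

    @classmethod
    def can_read_bytes(cls) -> bool:
        # Advertise that fromBytes is implemented.
        return True

    def toBytes(self, inMemoryDataset: Any) -> bytes:
        return json.dumps(inMemoryDataset).encode("utf-8")

    def fromBytes(self, serializedDataset: bytes, component: Optional[str] = None) -> Any:
        data = json.loads(serializedDataset.decode("utf-8"))
        # Assumption for this sketch: a component is a top-level key.
        return data if component is None else data[component]


class FileOnlyFormatter:
    """Hypothetical stand-in that can only read from a file."""

    @classmethod
    def can_read_bytes(cls) -> bool:
        return False

    def read(self, component: Optional[str] = None) -> Any:
        # A real formatter would open the file named by its FileDescriptor.
        return {"source": "file"}


def load(formatter: Any, payload: bytes) -> Any:
    """Use the in-memory path only when the formatter advertises it."""
    if formatter.can_read_bytes():
        return formatter.fromBytes(payload)
    return formatter.read()


payload = JsonBytesFormatter().toBytes({"flux": [1.0, 2.5]})
round_trip = load(JsonBytesFormatter(), payload)
from_file = load(FileOnlyFormatter(), payload)
```

Checking can_read_bytes first lets the caller avoid handling NotImplementedError from a formatter that only works with files.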

makeUpdatedLocation(location: lsst.daf.butler.Location) → lsst.daf.butler.Location

Return a new Location updated with this formatter’s extension.

Parameters
location : Location

The location to update.

Returns
updated : Location

A new Location with a new file extension applied.

Raises
NotImplementedError

Raised if there is no extension attribute associated with this formatter.

Notes

This method is available to all Formatters but might not be implemented by all formatters. It requires that a formatter set an extension attribute containing the file extension used when writing files. If extension is None, the supplied file will not be updated. Not all formatters write files, so this is not defined in the base class.
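The three cases described in these notes (an extension is set, the extension is None, no extension attribute exists) can be sketched with plain path strings. This is a minimal illustration, not the library's implementation: updated_path and the three stand-in classes are hypothetical, and a string path handled with pathlib stands in for a Location.

```python
from pathlib import PurePosixPath
from typing import Optional

# Sentinel to tell "attribute absent" apart from "attribute is None".
_MISSING = object()


class WithExtension:
    """Hypothetical formatter stand-in that writes '.parq' files."""

    extension: Optional[str] = ".parq"


class WithNoneExtension:
    """Hypothetical stand-in whose extension is explicitly None."""

    extension: Optional[str] = None


class WithoutExtension:
    """Hypothetical stand-in that defines no extension attribute."""


def updated_path(path: str, formatter_cls: type) -> str:
    """Return ``path`` with the formatter's extension applied."""
    extension = getattr(formatter_cls, "extension", _MISSING)
    if extension is _MISSING:
        raise NotImplementedError("No extension attribute on this formatter")
    if extension is None:
        # Nothing to apply, so the path is returned unchanged.
        return path
    return str(PurePosixPath(path).with_suffix(extension))


new_path = updated_path("run/objectTable.tmp", WithExtension)
same_path = updated_path("run/objectTable.tmp", WithNoneExtension)
```

Note that with_suffix replaces only the final suffix, so this sketch does not handle compound extensions.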

classmethod name() → str

Return the fully qualified name of the formatter.

Returns
name : str

Fully-qualified name of formatter class.

predictPath() → str

Return the path that would be returned by write.

Does not write any data file.

Uses the FileDescriptor associated with the instance.

Returns
path : str

Path within datastore that would be associated with the location stored in this Formatter.

read(component: Optional[str] = None) → Any

Read a Dataset.

Parameters
component : str, optional

Component to read from the file. Only used if the StorageClass for reading differed from the StorageClass used to write the file.

Returns
inMemoryDataset : object

The requested Dataset.

segregateParameters(parameters: Optional[Dict[str, Any]] = None) → Tuple[Dict, Dict]

Segregate the supplied parameters.

This splits the parameters into those understood by the formatter and those not understood by the formatter.

Any unsupported parameters are assumed to be usable by associated assemblers.

Parameters
parameters : dict, optional

Parameters with values that have been supplied by the caller and which might be relevant for the formatter. If None, the parameters will be read from the registered FileDescriptor.

Returns
supported : dict

Those parameters supported by this formatter.

unsupported : dict

Those parameters not supported by this formatter.
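The split described above, together with the documented meaning of unsupportedParameters (an empty set means everything is supported, None means nothing is), can be sketched as a standalone function. This is an illustration of the documented behavior rather than the library's code: segregate is a hypothetical name, the parameter names "columns" and "bbox" are made up for the example, and the fallback to the FileDescriptor when parameters is None is simplified to returning two empty dictionaries.

```python
from typing import AbstractSet, Any, Dict, Optional, Tuple


def segregate(
    parameters: Optional[Dict[str, Any]],
    unsupported_names: Optional[AbstractSet[str]],
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split ``parameters`` into (supported, unsupported) dictionaries."""
    if parameters is None:
        # Simplification: the real method consults the FileDescriptor here.
        return {}, {}
    if unsupported_names is None:
        # None means the formatter supports no parameters at all.
        return {}, dict(parameters)
    supported = {k: v for k, v in parameters.items() if k not in unsupported_names}
    unsupported = {k: v for k, v in parameters.items() if k in unsupported_names}
    return supported, unsupported


# Hypothetical parameter names, chosen only for illustration.
mixed = segregate({"columns": ["a"], "bbox": 1}, frozenset({"bbox"}))
all_supported = segregate({"columns": ["a"]}, frozenset())
none_supported = segregate({"columns": ["a"]}, None)
```

The unsupported dictionary is returned rather than discarded so that it can be handed on to an associated assembler.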

toBytes(inMemoryDataset: Any) → bytes

Serialize the Dataset to bytes based on formatter.

Parameters
inMemoryDataset : object

The Python object to serialize.

Returns
serializedDataset : bytes

Bytes representing the serialized dataset.

classmethod validateExtension(location: lsst.daf.butler.Location) → None

Check the extension of the provided location for compatibility.

Parameters
location : Location

Location from which to extract a file extension.

Raises
NotImplementedError

Raised if file extensions are a concept not understood by this formatter.

ValueError

Raised if the formatter does not understand this extension.

Notes

This method is available to all Formatters but might not be implemented by all formatters. It requires that a formatter set an extension attribute containing the file extension used when writing files. If extension is None, only the set of supported extensions will be examined.
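The check described in these notes can be sketched as follows, assuming the allowed extensions are the union of supportedExtensions and a non-None extension. The function check_extension is hypothetical, a string path stands in for a Location, and the NotImplementedError case is left out of the sketch.

```python
from pathlib import PurePosixPath
from typing import AbstractSet, Optional


def check_extension(
    path: str,
    extension: Optional[str],
    supported_extensions: AbstractSet[str] = frozenset(),
) -> None:
    """Raise ValueError if the extension of ``path`` is not allowed."""
    allowed = set(supported_extensions)
    if extension is not None:
        # The default extension always counts as supported.
        allowed.add(extension)
    found = PurePosixPath(path).suffix
    if found not in allowed:
        raise ValueError(
            f"Extension {found!r} of {path!r} is not one of {sorted(allowed)}"
        )


# Passes silently: the path ends in the default extension.
check_extension("run/objectTable.parq", ".parq")

# Passes silently: no default extension, but the suffix is in the supported set.
check_extension("run/objectTable.parquet", None, frozenset({".parquet"}))
```

A path such as "run/objectTable.csv" checked against ".parq" would raise ValueError in this sketch.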

classmethod validateWriteRecipes(recipes: Optional[Mapping[str, Any]]) → Optional[Mapping[str, Any]]

Validate supplied recipes for this formatter.

The recipes are supplemented with default values where appropriate.

Parameters
recipes : dict

Recipes to validate.

Returns
validated : dict

Validated recipes.

Raises
RuntimeError

Raised if validation fails. The default implementation raises if any recipes are given.
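The default behavior stated above (raise if any recipes are given) can be sketched as a standalone function. This is a minimal illustration of that one sentence, not the library's source: validate_write_recipes is a hypothetical name, and returning an empty or None input unchanged is an assumption of the sketch.

```python
from typing import Any, Mapping, Optional


def validate_write_recipes(
    recipes: Optional[Mapping[str, Any]],
) -> Optional[Mapping[str, Any]]:
    """Accept only an empty or missing set of recipes."""
    if not recipes:
        # Nothing was supplied, so there is nothing to validate.
        return recipes
    raise RuntimeError("This formatter does not understand these write recipes")


no_recipes = validate_write_recipes(None)
empty_recipes = validate_write_recipes({})
```

A formatter that does support recipes would override this check and fill in default values instead of raising.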

write(inMemoryDataset: Any) → None

Write a Dataset.

Parameters
inMemoryDataset : object

The Dataset to store.