FormatterV2

class lsst.daf.butler.FormatterV2(file_descriptor: FileDescriptor, *, ref: DatasetRef, write_parameters: Mapping[str, Any] | None = None, write_recipes: Mapping[str, Any] | None = None, **kwargs: Any)

Bases: object

Interface for reading and writing datasets using URIs.

The formatters are associated with a particular StorageClass.

Parameters:
file_descriptorFileDescriptor

Identifies the file to read or write, and the associated storage classes and parameter information.

refDatasetRef

The dataset associated with this formatter. Should not be a component dataset ref.

write_parametersdict, optional

Parameters to control how the dataset is serialized.

write_recipesdict, optional

Detailed write recipes indexed by recipe name.

**kwargs

Additional arguments that will be ignored but allow for Formatter V1 parameters to be given.

Notes

A FormatterV2 author should not override the default read or write method. Instead, for reads the formatter author should implement one or more of read_from_stream, read_from_uri, or read_from_local_file. The method read_from_uri will always be attempted first and can be more efficient (since it allows a subset of the data file to be accessed remotely when parameters or components are specified), but it will not update the local cache. If the entire contents of the remote file are being accessed (no component or parameters defined) and the dataset would be cached, read_from_uri will be called with a local file. If the file is remote and the supplied parameters are known to be handled more efficiently with a local file, read_from_uri can return NotImplemented to indicate that a local file should be given instead.
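
As a rough illustration of the last point, the sketch below shows a read_from_uri that declines whole-file reads by returning NotImplemented. It does not import lsst.daf.butler: SketchFormatter, its requested_parameters attribute, and the JSON file layout are hypothetical stand-ins chosen for the example, and plain path strings are used in place of ResourcePath.

```python
import json
from typing import Any, Dict, Optional


class SketchFormatter:
    """Hypothetical stand-in for a FormatterV2 subclass (not the real base class)."""

    # Class properties named as in FormatterV2; they advertise which reads exist.
    can_read_from_uri = True
    can_read_from_local_file = True

    def __init__(self, requested_parameters: Optional[Dict[str, Any]] = None) -> None:
        # Assumption: a real formatter gets read parameters from its FileDescriptor.
        self.requested_parameters = requested_parameters or {}

    def read_from_uri(self, uri: str, component: Optional[str] = None, expected_size: int = -1) -> Any:
        if component is None and not self.requested_parameters:
            # Whole file wanted: decline so the caller can retry with a local file.
            return NotImplemented
        # Subset wanted: a real implementation could fetch only part of a remote file.
        with open(uri) as fh:
            data = json.load(fh)
        return data[component] if component is not None else data

    def read_from_local_file(self, path: str, component: Optional[str] = None, expected_size: int = -1) -> Any:
        with open(path) as fh:
            data = json.load(fh)
        return data if component is None else data[component]
```

With this arrangement a component read goes through read_from_uri, while a whole-file read falls through to read_from_local_file.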

Similarly for writes, the write method cannot be overridden. Instead, the formatter author should implement to_bytes or write_local_file. For local URIs the system will always call write_local_file first (which by default calls to_bytes) to ensure that writes are atomic. For remote URIs with local caching disabled, to_bytes will be called first and the remote updated directly. If the dataset should be cached it will always be written locally first.
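
The relationship between to_bytes and write_local_file, and the idea of an atomic local write, can be sketched roughly as follows. This is not the library's implementation: SketchWriter and write_atomically are hypothetical names, plain path strings replace ResourcePath, and the temporary-file-then-rename approach is one common way to obtain an atomic write.

```python
import os
import tempfile
from typing import Any


class SketchWriter:
    """Hypothetical stand-in showing write_local_file defaulting to to_bytes."""

    def to_bytes(self, in_memory_dataset: Any) -> bytes:
        # Assumed serialization for the example: UTF-8 text.
        return str(in_memory_dataset).encode("utf-8")

    def write_local_file(self, in_memory_dataset: Any, path: str) -> None:
        # Default behaviour described above: serialize, then write the bytes out.
        with open(path, "wb") as fh:
            fh.write(self.to_bytes(in_memory_dataset))


def write_atomically(writer: SketchWriter, in_memory_dataset: Any, final_path: str) -> None:
    """Write to a temporary file in the target directory, then rename into place."""
    directory = os.path.dirname(os.path.abspath(final_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    os.close(fd)
    try:
        writer.write_local_file(in_memory_dataset, tmp_path)
        # os.replace is atomic when source and target share a file system.
        os.replace(tmp_path, final_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

A formatter whose format cannot be produced in memory would instead override write_local_file and leave to_bytes unimplemented.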

Attributes Summary

can_read_from_local_file

Declare whether read_from_local_file is available to this formatter.

can_read_from_stream

Declare whether read_from_stream is available to this formatter.

can_read_from_uri

Declare whether read_from_uri is available to this formatter.

data_id

Return Data ID associated with this formatter (DataCoordinate).

dataset_ref

Return Dataset Ref associated with this formatter (DatasetRef).

default_extension

Default extension to use when writing a file.

file_descriptor

File descriptor associated with this formatter (FileDescriptor).

supported_extensions

Set of all extensions supported by this formatter.

supported_write_parameters

Parameters understood by this formatter that can be used to control how a dataset is serialized.

unsupported_parameters

Set of read parameters not understood by this Formatter.

write_parameters

Parameters to use when writing out datasets.

write_recipes

Detailed write recipes indexed by recipe name.

Methods Summary

can_accept(in_memory_dataset)

Indicate whether this formatter can accept the specified storage class directly.

get_write_extension()

Extension to use when writing a file.

make_updated_location(location)

Return a new Location updated with this formatter's extension.

name()

Return the fully qualified name of the formatter.

predict_path()

Return the path that would be returned by write.

read([component, expected_size, cache_manager])

Read a Dataset.

read_directly_from_possibly_cached_uri([...])

Read from arbitrary URI, checking for possible presence in local cache.

read_from_local_file(path[, component, ...])

Read a dataset from a URI guaranteed to refer to the local file system.

read_from_possibly_cached_local_file([...])

Read a dataset ensuring that a local file is used, checking the cache for it.

read_from_possibly_cached_stream([...])

Read from a stream, checking for possible presence in local cache.

read_from_stream(stream[, component, ...])

Read from an open file descriptor.

read_from_uri(uri[, component, expected_size])

Read a dataset from a URI that can be local or remote.

segregate_parameters([parameters])

Segregate the supplied parameters.

to_bytes(in_memory_dataset)

Serialize the in-memory dataset to bytes.

validate_extension(location)

Check the extension of the provided location for compatibility.

validate_write_recipes(recipes)

Validate supplied recipes for this formatter.

write(in_memory_dataset[, cache_manager])

Write a Dataset.

write_direct(in_memory_dataset, uri[, ...])

Serialize and write directly to final location.

write_local_file(in_memory_dataset, uri)

Serialize the in-memory dataset to a local file.

write_locally_then_move(in_memory_dataset, uri)

Write file to file system and then move to final location.

Attributes Documentation

can_read_from_local_file: ClassVar[bool] = False

Declare whether read_from_local_file is available to this formatter.

can_read_from_stream: ClassVar[bool] = False

Declare whether read_from_stream is available to this formatter.

can_read_from_uri: ClassVar[bool] = False

Declare whether read_from_uri is available to this formatter.

data_id

Return Data ID associated with this formatter (DataCoordinate).

dataset_ref

Return Dataset Ref associated with this formatter (DatasetRef).

default_extension: ClassVar[str | None] = None

Default extension to use when writing a file.

Can be None if the extension is determined dynamically. Use the get_write_extension method to get the actual extension to use.

file_descriptor

File descriptor associated with this formatter (FileDescriptor).

supported_extensions: ClassVar[Set[str]] = frozenset({})

Set of all extensions supported by this formatter.

Any extension assigned to the default_extension property will be automatically included in the list of supported extensions.

supported_write_parameters: ClassVar[Set[str] | None] = None

Parameters understood by this formatter that can be used to control how a dataset is serialized. None indicates that no parameters are supported.

unsupported_parameters: ClassVar[Set[str] | None] = frozenset({})

Set of read parameters not understood by this Formatter (frozenset). An empty set means all parameters are supported. None indicates that no parameters are supported. These parameters should match those defined in the storage class definition.

write_parameters

Parameters to use when writing out datasets.

write_recipes

Detailed write recipes indexed by recipe name.

Methods Documentation

can_accept(in_memory_dataset: Any) bool

Indicate whether this formatter can accept the specified storage class directly.

Parameters:
in_memory_datasetobject

The dataset that is to be written.

Returns:
acceptsbool

If True the formatter can write data of this type without requiring datastore to convert it. If False the datastore will attempt to convert before writing.

Notes

The base class always returns False even if the given type is an instance of the storage class type. This will result in a storage class conversion no-op but also allows mocks with mocked storage classes to work properly.
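
A minimal sketch of how a datastore might use can_accept, assuming the behaviour described above. BaseSketch, DictSketch, and prepare_for_write are hypothetical names, and the convert callable stands in for the storage class conversion.

```python
from typing import Any, Callable


class BaseSketch:
    """Mirrors the documented base behaviour: never accept directly."""

    def can_accept(self, in_memory_dataset: Any) -> bool:
        return False


class DictSketch(BaseSketch):
    """Hypothetical formatter that can write dict objects without conversion."""

    def can_accept(self, in_memory_dataset: Any) -> bool:
        return isinstance(in_memory_dataset, dict)


def prepare_for_write(formatter: BaseSketch, dataset: Any, convert: Callable[[Any], Any]) -> Any:
    # Convert only when the formatter says it cannot take the object as-is.
    return dataset if formatter.can_accept(dataset) else convert(dataset)
```

Here prepare_for_write(DictSketch(), {"a": 1}, dict) returns the original object untouched, while a list of pairs is passed through dict first.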

get_write_extension() str

Extension to use when writing a file.

make_updated_location(location: Location) Location

Return a new Location updated with this formatter’s extension.

Parameters:
locationLocation

The location to update.

Returns:
updatedLocation

A new Location with a new file extension applied.

classmethod name() str

Return the fully qualified name of the formatter.

Returns:
namestr

Fully-qualified name of formatter class.

predict_path() str

Return the path that would be returned by write.

Does not write any data file.

Uses the FileDescriptor associated with the instance.

Returns:
pathstr

Path within datastore that would be associated with the location stored in this Formatter.

read(component: str | None = None, expected_size: int = -1, cache_manager: AbstractDatastoreCacheManager | None = None) Any

Read a Dataset.

Parameters:
componentstr, optional

Component to read from the file. Only used if the StorageClass for reading differed from the StorageClass used to write the file.

expected_sizeint, optional

If known, the expected size of the resource to read. This can be used for verification or to decide whether to do a direct read or a file download. -1 indicates the file size is not known.

cache_managerAbstractDatastoreCacheManager, optional

A cache manager to use to allow a formatter to cache a remote file locally or read a cached file that is already local.

Returns:
in_memory_datasetobject

The requested Dataset.

Raises:
FormatterNotImplementedError

Raised if no implementations were found that could read this resource.

Notes

This method should not be overridden. Instead, formatter subclasses should re-implement the specific read_from_* methods as appropriate. Each of these methods has a corresponding class property that must be True for the method to be called.

The priority for reading is:

  • read_from_uri

  • read_from_stream

  • read_from_local_file

Any of these methods can return NotImplemented if there is a desire to skip to the next one in the list. If a dataset is being requested with no component, no parameters, and it should also be added to the local cache, the first two calls will be skipped (unless read_from_stream is the only implemented read method) such that a local file will be used.
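
The fall-through behaviour can be sketched as a loop over candidate readers. This is an illustration only: read_with_fallback is a hypothetical helper, each reader is a zero-argument callable paired with its class-property flag, and the built-in NotImplementedError stands in for FormatterNotImplementedError.

```python
from typing import Any, Callable, List, Tuple


def read_with_fallback(readers: List[Tuple[bool, Callable[[], Any]]]) -> Any:
    """Try each enabled reader in order, skipping any that returns NotImplemented."""
    for enabled, reader in readers:
        if not enabled:
            # The corresponding can_read_from_* property is False.
            continue
        result = reader()
        if result is not NotImplemented:
            return result
    # Stand-in for FormatterNotImplementedError in the real package.
    raise NotImplementedError("No read implementation could handle this resource")
```

For example, a reader list of `[(True, uri_reader), (False, stream_reader), (True, local_reader)]` returns the local reader's result whenever the URI reader declines.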

A Formatter can also read a file from within a Zip file if the URI associated with the FileDescriptor corresponds to a file with a zip extension and a URI fragment of the form zip-path={path_in_zip}. When reading a file from within a Zip file the priority for reading is:

There are multiple cases that must be handled for reading:

For a single file:

  • No component requested, read the whole file.

  • Component requested, optionally read the component efficiently, else read the whole file and extract the component.

  • Derived component requested, read whole file or read relevant component and derive.

Disassembled Composite:

  • The file to read here is the component itself. The formatter only knows about this one component file. No component should be specified in the read call, but the FileDescriptor will know which component this is.

  • A derived component. The file to read is a component but not the specified component. The caching needs the component from which it’s derived.
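
For the Zip case mentioned in these notes, the zip-path={path_in_zip} fragment can be separated from the archive URI with plain string handling, and the member opened as a stream with the standard zipfile module. The helper names split_zip_member and read_zip_member are hypothetical and the sketch handles local archives only; the package's own handling may differ.

```python
import zipfile
from typing import Optional, Tuple
from urllib.parse import unquote

ZIP_PREFIX = "zip-path="


def split_zip_member(uri: str) -> Tuple[str, Optional[str]]:
    """Return (archive URI, path inside the archive), or (uri, None) if not a Zip reference."""
    base, _, fragment = uri.partition("#")
    fragment = unquote(fragment)
    if base.endswith(".zip") and fragment.startswith(ZIP_PREFIX):
        return base, fragment[len(ZIP_PREFIX):]
    return uri, None


def read_zip_member(zip_file: str, member: str) -> bytes:
    """Read one member of a local Zip file through a stream."""
    with zipfile.ZipFile(zip_file) as zf, zf.open(member) as stream:
        return stream.read()
```

The stream returned by ZipFile.open is the kind of object a read_from_stream implementation could consume without extracting the whole archive.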

read_directly_from_possibly_cached_uri(component: str | None = None, expected_size: int = -1, *, cache_manager: AbstractDatastoreCacheManager | None = None) Any

Read from arbitrary URI, checking for possible presence in local cache.

Parameters:
componentstr, optional

Component to read from the file. Only used if the StorageClass for reading differed from the StorageClass used to write the file.

expected_sizeint, optional

If known, the expected size of the resource to read. A value of -1 indicates that the file size is not known.

cache_managerAbstractDatastoreCacheManager, optional

A cache manager to use to allow a formatter to check if there is a copy of the file in the local cache.

Returns:
in_memory_datasetobject or NotImplemented

The requested Dataset or an indication that the read mode was not implemented.

Notes

This method will first check the datastore cache in case the file is present locally. This method will not cache a remote dataset and will only do a size check for local files to avoid unnecessary round trips to a remote server.

The URI will be read by calling read_from_uri.

read_from_local_file(path: str, component: str | None = None, expected_size: int = -1) Any

Read a dataset from a URI guaranteed to refer to the local file system.

Parameters:
pathstr

Path to a local file that should be read.

componentstr or None, optional

The component to be read from the dataset.

expected_sizeint, optional

If known, the expected size of the resource to read. A value of -1 indicates that the file size is not known.

Returns:
in_memory_datasetobject or NotImplemented

The Python object read from the resource or NotImplemented.

Raises:
FormatterNotImplementedError

Raised if there is no implementation written to read data from a local file.

Notes

This method will only be called if the class property can_read_from_local_file is True and other options were not used.

read_from_possibly_cached_local_file(component: str | None = None, expected_size: int = -1, *, cache_manager: AbstractDatastoreCacheManager | None = None) Any

Read a dataset ensuring that a local file is used, checking the cache for it.

Parameters:
componentstr, optional

Component to read from the file. Only used if the StorageClass for reading differed from the StorageClass used to write the file.

expected_sizeint, optional

If known, the expected size of the resource to read. This can be used for verification or to decide whether to do a direct read or a file download. -1 indicates the file size is not known.

cache_managerAbstractDatastoreCacheManager, optional

A cache manager to use to allow a formatter to cache a remote file locally or read a cached file that is already local.

Returns:
in_memory_datasetobject or NotImplemented

The requested Dataset or an indication that the read mode was not implemented.

Notes

The file will be downloaded and cached if it is a remote resource. The file contents will be read using read_from_local_file or read_from_uri, with preference given to the former.

read_from_possibly_cached_stream(component: str | None = None, expected_size: int = -1, *, cache_manager: AbstractDatastoreCacheManager | None = None) Any

Read from a stream, checking for possible presence in local cache.

Parameters:
componentstr, optional

Component to read from the file. Only used if the StorageClass for reading differed from the StorageClass used to write the file.

expected_sizeint, optional

If known, the expected size of the resource to read. This can be used for verification or to decide whether to do a direct read or a file download. -1 indicates the file size is not known.

cache_managerAbstractDatastoreCacheManager, optional

A cache manager to use to allow a formatter to check if there is a copy of the file in the local cache.

Returns:
in_memory_datasetobject or NotImplemented

The requested Dataset or an indication that the read mode was not implemented.

Notes

Calls read_from_stream but will first check the datastore cache in case the file is present locally. This method will not download a file to the local cache.

read_from_stream(stream: BinaryIO | ResourceHandleProtocol, component: str | None = None, expected_size: int = -1) Any

Read from an open file descriptor.

Parameters:
streamlsst.resources.ResourceHandleProtocol or typing.BinaryIO

File stream to use to read the dataset.

componentstr or None, optional

The component to be read from the dataset.

expected_sizeint, optional

If known, the expected size of the resource to read. A value of -1 indicates that the file size is not known.

Returns:
in_memory_datasetobject or NotImplemented

The Python object read from the stream or NotImplemented.

Notes

Only called if the class property can_read_from_stream is True.

read_from_uri(uri: ResourcePath, component: str | None = None, expected_size: int = -1) Any

Read a dataset from a URI that can be local or remote.

Parameters:
urilsst.resources.ResourcePath

URI to use to read the dataset. This URI can be local or remote and can refer to the actual resource or to a locally cached file.

componentstr or None, optional

The component to be read from the dataset.

expected_sizeint, optional

If known, the expected size of the resource to read. A value of -1 indicates that the file size is not known.

Returns:
in_memory_datasetobject or NotImplemented

The Python object read from the resource or NotImplemented.

Raises:
FormatterNotImplementedError

Raised if there is no support for direct reads from a possibly remote URI.

Notes

This method is only called if the class property can_read_from_uri is set to True.

It is possible that a cached local file will be given to this method even if it was originally a remote URI. This can happen if the original write resulted in the file being added to the local cache.

If the full file is being read this file will not be added to the local cache. Consider returning NotImplemented in this situation, for example if there are no parameters or component specified, and allowing the system to fall back to calling read_from_local_file (which will populate the cache if configured to do so).

segregate_parameters(parameters: dict[str, Any] | None = None) tuple[dict, dict]

Segregate the supplied parameters.

This splits the parameters into those understood by the formatter and those not understood by the formatter.

Any unsupported parameters are assumed to be usable by associated assemblers.

Parameters:
parametersdict, optional

Parameters with values that have been supplied by the caller and which might be relevant for the formatter. If None parameters will be read from the registered FileDescriptor.

Returns:
supporteddict

Those parameters supported by this formatter.

unsupporteddict

Those parameters not supported by this formatter.
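
The split can be sketched using the conventions documented for unsupported_parameters (an empty set means everything is supported, None means nothing is). The function name segregate is hypothetical, and the lookup of parameters from the FileDescriptor when None is passed is left out.

```python
from typing import Any, Dict, FrozenSet, Optional, Tuple


def segregate(
    parameters: Dict[str, Any], unsupported_parameters: Optional[FrozenSet[str]]
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return (supported, unsupported) parameter dictionaries."""
    if unsupported_parameters is None:
        # None: the formatter understands no parameters at all.
        return {}, dict(parameters)
    supported = {k: v for k, v in parameters.items() if k not in unsupported_parameters}
    unsupported = {k: v for k, v in parameters.items() if k in unsupported_parameters}
    return supported, unsupported
```

The second dictionary is what would be handed on to an assembler, as noted above.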

to_bytes(in_memory_dataset: Any) bytes

Serialize the in-memory dataset to bytes.

Parameters:
in_memory_datasetobject

The Python object to serialize.

Returns:
serialized_datasetbytes

Bytes representing the serialized dataset.

Raises:
FormatterNotImplementedError

Raised if the formatter has not implemented the method. This will not cause a problem if write_local_file has been implemented.

classmethod validate_extension(location: Location) None

Check the extension of the provided location for compatibility.

Parameters:
locationLocation

Location from which to extract a file extension.

Raises:
ValueError

Raised if the formatter does not understand this extension.
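
A minimal sketch of such a check, assuming the rule stated for supported_extensions that the default extension is always accepted. The function name is hypothetical, a plain path string replaces Location, and rejecting every path when no extensions are declared is an assumption of this sketch rather than documented behaviour.

```python
from typing import AbstractSet, Optional


def validate_extension_sketch(
    path: str, default_extension: Optional[str], supported_extensions: AbstractSet[str]
) -> None:
    """Raise ValueError if the path does not end with a known extension."""
    allowed = set(supported_extensions)
    if default_extension is not None:
        # The default extension is implicitly part of the supported set.
        allowed.add(default_extension)
    # endswith handles compound extensions such as ".fits.gz".
    if not any(path.endswith(ext) for ext in allowed):
        raise ValueError(f"Extension of {path!r} is not one of {sorted(allowed)}")
```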

classmethod validate_write_recipes(recipes: Mapping[str, Any] | None) Mapping[str, Any] | None

Validate supplied recipes for this formatter.

The recipes are supplemented with default values where appropriate.

Parameters:
recipesdict

Recipes to validate.

Returns:
validateddict

Validated recipes.

Raises:
RuntimeError

Raised if validation fails. The default implementation raises if any recipes are given.

final write(in_memory_dataset: Any, cache_manager: AbstractDatastoreCacheManager | None = None) None

Write a Dataset.

Parameters:
in_memory_datasetobject

The Dataset to serialize.

cache_managerAbstractDatastoreCacheManager, optional

A cache manager to use to allow a formatter to cache the written file.

Raises:
FormatterNotImplementedError

Raised if the formatter subclass has not implemented write_local_file and to_bytes was not called.

Exception

Raised if there is an error serializing the dataset to disk.

Notes

The intent is for subclasses to implement either to_bytes or write_local_file, or both, and not to override this method.

write_direct(in_memory_dataset: Any, uri: ResourcePath, cache_manager: AbstractDatastoreCacheManager | None = None) bool

Serialize and write directly to final location.

Parameters:
in_memory_datasetobject

The Dataset to serialize.

urilsst.resources.ResourcePath

URI to use when writing the serialized dataset.

cache_managerAbstractDatastoreCacheManager, optional

A cache manager to use to allow a formatter to cache the written file.

Returns:
writtenbool

Flag to indicate whether the direct write did happen.

Raises:
Exception

Raised if there was a failure from serializing to bytes that was not FormatterNotImplementedError.

Notes

This method will call to_bytes to serialize the in-memory dataset and will then write those bytes directly to the given URI.

If the dataset should be cached or is local, the file will not be written and the method will return False. This is because local URIs should be written to a temporary file name and then renamed to allow atomic writes. That path is handled by write_locally_then_move (through write_local_file) and is preferred over this method being overridden and the atomic write re-implemented.
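
The decision described in these notes can be sketched as a function that reports whether the direct write happened. Everything here is a hypothetical stand-in: a dict plays the role of remote storage, the is_local and should_cache flags replace checks on the URI and cache manager, and returning False when to_bytes is not implemented is an assumption inferred from the Raises section.

```python
from typing import Any, Callable, Dict


class FormatterNotImplementedSketch(NotImplementedError):
    """Stand-in for FormatterNotImplementedError."""


def write_direct_sketch(
    to_bytes: Callable[[Any], bytes],
    in_memory_dataset: Any,
    uri: str,
    *,
    is_local: bool,
    should_cache: bool,
    remote_store: Dict[str, bytes],
) -> bool:
    """Return True only if the bytes were written straight to the final location."""
    if is_local or should_cache:
        # Leave these cases to the write-locally-then-move path.
        return False
    try:
        data = to_bytes(in_memory_dataset)
    except FormatterNotImplementedSketch:
        # No in-memory serialization available; the caller must use a local file.
        return False
    remote_store[uri] = data
    return True
```

A caller would fall back to the local-file path whenever this returns False.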

write_local_file(in_memory_dataset: Any, uri: ResourcePath) None

Serialize the in-memory dataset to a local file.

Parameters:
in_memory_datasetobject

The Python object to serialize.

uriResourcePath

The URI to use when writing the file.

Raises:
FormatterNotImplementedError

Raised if the formatter subclass has not implemented this method or has failed to implement the to_bytes method.

Notes

By default this method will attempt to call to_bytes and then write these bytes to the file.

write_locally_then_move(in_memory_dataset: Any, uri: ResourcePath, cache_manager: AbstractDatastoreCacheManager | None = None) None

Write file to file system and then move to final location.

Parameters:
in_memory_datasetobject

The Dataset to serialize.

urilsst.resources.ResourcePath

URI to use when writing the serialized dataset.

cache_managerAbstractDatastoreCacheManager, optional

A cache manager to use to allow a formatter to cache the written file.

Raises:
FormatterNotImplementedError

Raised if the formatter subclass has not implemented write_local_file.

Exception

Raised if there is an error serializing the dataset to disk.