FormatterV2
- class lsst.daf.butler.FormatterV2(file_descriptor: FileDescriptor, *, ref: DatasetRef, write_parameters: Mapping[str, Any] | None = None, write_recipes: Mapping[str, Any] | None = None, **kwargs: Any)
Bases: object
Interface for reading and writing datasets using URIs.
The formatters are associated with a particular StorageClass.
- Parameters:
- file_descriptor
FileDescriptor, optional
Identifies the file to read or write, and the associated storage classes and parameter information.
- ref
DatasetRef
The dataset associated with this formatter. Should not be a component dataset ref.
- write_parameters
dict, optional
Parameters to control how the dataset is serialized.
- write_recipes
dict, optional
Detailed write recipes indexed by recipe name.
- **kwargs
Additional arguments that will be ignored but allow for Formatter V1 parameters to be given.
Notes
A FormatterV2 author should not override the default read or write method. Instead, for read the formatter author should implement one or all of read_from_stream, read_from_uri, or read_from_local_file. The method read_from_uri will always be attempted first and could be more efficient (since it allows the possibility for a subset of the data file to be accessed remotely when parameters or components are specified) but it will not update the local cache. If the entire contents of the remote file are being accessed (no component or parameters defined) and the dataset would be cached, read_from_uri will be called with a local file. If the file is remote and the parameters that have been included are known to be more efficiently handled with a local file, the read_from_uri method can return NotImplemented to indicate that a local file should be given instead.
Similarly for writes, the write method cannot be overridden. Instead the formatter author should implement to_bytes or write_local_file. For local URIs the system will always call write_local_file first (which by default will call to_bytes) to ensure atomic writes are implemented. For remote URIs with local caching disabled, to_bytes will be called first and the remote updated directly. If the dataset should be cached it will always be written locally first.
Attributes Summary
- can_read_from_local_file
Declare whether read_from_local_file is available to this formatter.
- can_read_from_stream
Declare whether read_from_stream is available to this formatter.
- can_read_from_uri
Declare whether read_from_uri is available to this formatter.
- data_id
Return Data ID associated with this formatter (DataCoordinate).
- dataset_ref
Return Dataset Ref associated with this formatter (DatasetRef).
- default_extension
Default extension to use when writing a file.
- file_descriptor
File descriptor associated with this formatter (FileDescriptor).
- supported_extensions
Set of all extensions supported by this formatter.
- supported_write_parameters
Parameters understood by this formatter that can be used to control how a dataset is serialized.
- unsupported_parameters
Set of read parameters not understood by this Formatter.
- write_parameters
Parameters to use when writing out datasets.
- write_recipes
Detailed write recipes indexed by recipe name.
Methods Summary
- can_accept(in_memory_dataset)
Indicate whether this formatter can accept the specified storage class directly.
- get_write_extension()
Extension to use when writing a file.
- make_updated_location(location)
Return a new Location updated with this formatter's extension.
- name()
Return the fully qualified name of the formatter.
- predict_path()
Return the path that would be returned by write.
- read([component, expected_size, cache_manager])
Read a Dataset.
- read_directly_from_possibly_cached_uri([component, ...])
Read from arbitrary URI, checking for possible presence in local cache.
- read_from_local_file(path[, component, ...])
Read a dataset from a URI guaranteed to refer to the local file system.
- read_from_possibly_cached_local_file([component, ...])
Read a dataset ensuring that a local file is used, checking the cache for it.
- read_from_possibly_cached_stream([component, ...])
Read from a stream, checking for possible presence in local cache.
- read_from_stream(stream[, component, ...])
Read from an open file descriptor.
- read_from_uri(uri[, component, expected_size])
Read a dataset from a URI that can be local or remote.
- segregate_parameters([parameters])
Segregate the supplied parameters.
- to_bytes(in_memory_dataset)
Serialize the in-memory dataset to bytes.
- validate_extension(location)
Check the extension of the provided location for compatibility.
- validate_write_recipes(recipes)
Validate supplied recipes for this formatter.
- write(in_memory_dataset[, cache_manager])
Write a Dataset.
- write_direct(in_memory_dataset, uri[, ...])
Serialize and write directly to final location.
- write_local_file(in_memory_dataset, uri)
Serialize the in-memory dataset to a local file.
- write_locally_then_move(in_memory_dataset, uri)
Write file to file system and then move to final location.
Attributes Documentation
- can_read_from_local_file: ClassVar[bool] = False
Declare whether read_from_local_file is available to this formatter.
- can_read_from_stream: ClassVar[bool] = False
Declare whether read_from_stream is available to this formatter.
- can_read_from_uri: ClassVar[bool] = False
Declare whether read_from_uri is available to this formatter.
- data_id
Return Data ID associated with this formatter (DataCoordinate).
- dataset_ref
Return Dataset Ref associated with this formatter (DatasetRef).
- default_extension: ClassVar[str | None] = None
Default extension to use when writing a file.
Can be None if the extension is determined dynamically. Use the get_write_extension method to get the actual extension to use.
- file_descriptor
File descriptor associated with this formatter (FileDescriptor).
- supported_extensions: ClassVar[Set[str]] = frozenset({})
Set of all extensions supported by this formatter.
Any extension assigned to the default_extension property will be automatically included in the list of supported extensions.
- supported_write_parameters: ClassVar[Set[str] | None] = None
Parameters understood by this formatter that can be used to control how a dataset is serialized. None indicates that no parameters are supported.
- unsupported_parameters: ClassVar[Set[str] | None] = frozenset({})
Set of read parameters not understood by this Formatter. An empty set means all parameters are supported. None indicates that no parameters are supported. These parameters should match those defined in the storage class definition (frozenset).
- write_parameters
Parameters to use when writing out datasets.
- write_recipes
Detailed write recipes indexed by recipe name.
Methods Documentation
- can_accept(in_memory_dataset: Any) -> bool
Indicate whether this formatter can accept the specified storage class directly.
- Parameters:
- in_memory_dataset
object
The dataset that is to be written.
- Returns:
Notes
The base class always returns False even if the given type is an instance of the storage class type. This will result in a storage class conversion no-op but also allows mocks with mocked storage classes to work properly.
- make_updated_location(location: Location) -> Location
Return a new Location updated with this formatter’s extension.
- classmethod name() -> str
Return the fully qualified name of the formatter.
- Returns:
- name
str
Fully-qualified name of formatter class.
- predict_path() -> str
Return the path that would be returned by write.
Does not write any data file.
Uses the FileDescriptor associated with the instance.
- read(component: str | None = None, expected_size: int = -1, cache_manager: AbstractDatastoreCacheManager | None = None) -> Any
Read a Dataset.
- Parameters:
- component
str, optional
Component to read from the file. Only used if the StorageClass for reading differed from the StorageClass used to write the file.
- expected_size
int, optional
If known, the expected size of the resource to read. This can be used for verification or to decide whether to do a direct read or a file download. -1 indicates the file size is not known.
- cache_manager
AbstractDatastoreCacheManager
A cache manager to use to allow a formatter to cache a remote file locally or read a cached file that is already local.
- Returns:
- in_memory_dataset
object
The requested Dataset.
- Raises:
- FormatterNotImplementedError
Raised if no implementations were found that could read this resource.
Notes
This method should not be overridden. Instead formatter subclasses should re-implement the specific read_from_* methods as appropriate. Each of these methods has a corresponding class property that must be True for the method to be called.
The priority for reading is:
- read_from_uri
- read_from_stream
- read_from_local_file
- read_from_uri (but with a local file)
Any of these methods can return NotImplemented if there is a desire to skip to the next one in the list. If a dataset is being requested with no component, no parameters, and it should also be added to the local cache, the first two calls will be skipped (unless read_from_stream is the only implemented read method) such that a local file will be used.
A Formatter can also read a file from within a Zip file if the URI associated with the FileDescriptor corresponds to a file with a zip extension and a URI fragment of the form zip-path={path_in_zip}. When reading a file from within a Zip file the priority for reading is:
There are multiple cases that must be handled for reading:
For a single file:
- No component requested, read the whole file.
- Component requested, optionally read the component efficiently, else read the whole file and extract the component.
- Derived component requested, read whole file or read relevant component and derive.
Disassembled Composite:
- The file to read here is the component itself. Formatter only knows about this one component file. Should be no component specified in the read call but the FileDescriptor will know which component this is.
- A derived component. The file to read is a component but not the specified component. The caching needs the component from which it’s derived.
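The fallback behaviour described in these notes can be illustrated with a small stand-alone sketch. It does not import or reproduce lsst.daf.butler: the classes ReadDispatchSketch and TextFormatterSketch, the "mem://dataset" URI, and the in-memory payload are hypothetical, the order uri, stream, local file is assumed, and the built-in NotImplementedError stands in for FormatterNotImplementedError. Caching and the Zip case are left out.

```python
from __future__ import annotations

import io
from typing import Any


class ReadDispatchSketch:
    """Hypothetical stand-in for the read dispatch; not the library code."""

    can_read_from_uri = False
    can_read_from_stream = False
    can_read_from_local_file = False

    def __init__(self, payload: bytes) -> None:
        self.payload = payload  # stands in for the stored file contents
        self.attempted: list[str] = []  # records which methods were tried

    def read_from_uri(self, uri: str, component: str | None = None) -> Any:
        return NotImplemented

    def read_from_stream(self, stream: io.BytesIO, component: str | None = None) -> Any:
        return NotImplemented

    def read_from_local_file(self, path: str, component: str | None = None) -> Any:
        return NotImplemented

    def read(self, component: str | None = None) -> Any:
        # Assumed priority order; each entry pairs a class flag with a call.
        attempts = [
            ("read_from_uri", self.can_read_from_uri,
             lambda: self.read_from_uri("mem://dataset", component)),
            ("read_from_stream", self.can_read_from_stream,
             lambda: self.read_from_stream(io.BytesIO(self.payload), component)),
            ("read_from_local_file", self.can_read_from_local_file,
             lambda: self.read_from_local_file("dataset.txt", component)),
        ]
        for name, enabled, call in attempts:
            if not enabled:
                continue  # the flag must be True for the method to be called
            self.attempted.append(name)
            result = call()
            if result is not NotImplemented:
                return result  # first method that handles the request wins
        raise NotImplementedError("No read method could handle this resource")


class TextFormatterSketch(ReadDispatchSketch):
    can_read_from_stream = True
    can_read_from_local_file = True

    def read_from_stream(self, stream, component=None):
        if component is not None:
            return NotImplemented  # skip to the next method in the list
        return stream.read().decode()

    def read_from_local_file(self, path, component=None):
        # The sketch reads from the in-memory payload instead of opening `path`.
        text = self.payload.decode()
        return text if component is None else f"{component}:{text}"
```

With no component the stream reader handles the request; with a component it returns NotImplemented and the local-file reader is tried next.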
- read_directly_from_possibly_cached_uri(component: str | None = None, expected_size: int = -1, *, cache_manager: AbstractDatastoreCacheManager | None = None) -> Any
Read from arbitrary URI, checking for possible presence in local cache.
- Parameters:
- component
str, optional
Component to read from the file. Only used if the StorageClass for reading differed from the StorageClass used to write the file.
- expected_size
int, optional
If known, the expected size of the resource to read. -1 indicates the file size is not known.
- cache_manager
AbstractDatastoreCacheManager
A cache manager to use to allow a formatter to check if there is a copy of the file in the local cache.
- Returns:
- in_memory_dataset
object or NotImplemented
The requested Dataset or an indication that the read mode was not implemented.
Notes
This method will first check the datastore cache in case the file is present locally. This method will not cache a remote dataset and will only do a size check for local files to avoid unnecessary round trips to a remote server.
The URI will be read by calling read_from_uri.
- read_from_local_file(path: str, component: str | None = None, expected_size: int = -1) -> Any
Read a dataset from a URI guaranteed to refer to the local file system.
- Parameters:
- Returns:
- in_memory_dataset
object or NotImplemented
The Python object read from the resource or NotImplemented.
- Raises:
- FormatterNotImplementedError
Raised if there is no implementation written to read data from a local file.
Notes
This method will only be called if the class property can_read_from_local_file is True and other options were not used.
- read_from_possibly_cached_local_file(component: str | None = None, expected_size: int = -1, *, cache_manager: AbstractDatastoreCacheManager | None = None) -> Any
Read a dataset ensuring that a local file is used, checking the cache for it.
- Parameters:
- component
str, optional
Component to read from the file. Only used if the StorageClass for reading differed from the StorageClass used to write the file.
- expected_size
int, optional
If known, the expected size of the resource to read. This can be used for verification or to decide whether to do a direct read or a file download. -1 indicates the file size is not known.
- cache_manager
AbstractDatastoreCacheManager
A cache manager to use to allow a formatter to cache a remote file locally or read a cached file that is already local.
- Returns:
- in_memory_dataset
object or NotImplemented
The requested Dataset or an indication that the read mode was not implemented.
Notes
The file will be downloaded and cached if it is a remote resource. The file contents will be read using read_from_local_file or read_from_uri, with preference given to the former.
- read_from_possibly_cached_stream(component: str | None = None, expected_size: int = -1, *, cache_manager: AbstractDatastoreCacheManager | None = None) -> Any
Read from a stream, checking for possible presence in local cache.
- Parameters:
- component
str, optional
Component to read from the file. Only used if the StorageClass for reading differed from the StorageClass used to write the file.
- expected_size
int, optional
If known, the expected size of the resource to read. This can be used for verification or to decide whether to do a direct read or a file download. -1 indicates the file size is not known.
- cache_manager
AbstractDatastoreCacheManager
A cache manager to use to allow a formatter to check if there is a copy of the file in the local cache.
- Returns:
- in_memory_dataset
object or NotImplemented
The requested Dataset or an indication that the read mode was not implemented.
Notes
Calls read_from_stream but will first check the datastore cache in case the file is present locally. This method will not download a file to the local cache.
- read_from_stream(stream: BinaryIO | ResourceHandleProtocol, component: str | None = None, expected_size: int = -1) -> Any
Read from an open file descriptor.
- Parameters:
- stream
lsst.resources.ResourceHandleProtocol or typing.BinaryIO
File stream to use to read the dataset.
- component
str or None, optional
The component to be read from the dataset.
- expected_size
int, optional
If known, the expected size of the resource to read. -1 indicates the file size is not known.
- Returns:
- in_memory_dataset
object or NotImplemented
The Python object read from the stream or NotImplemented.
Notes
Only called if the class property can_read_from_stream is True.
- read_from_uri(uri: ResourcePath, component: str | None = None, expected_size: int = -1) -> Any
Read a dataset from a URI that can be local or remote.
- Parameters:
- uri
lsst.resources.ResourcePath
URI to use to read the dataset. This URI can be local or remote and can refer to the actual resource or to a locally cached file.
- component
str or None, optional
The component to be read from the dataset.
- expected_size
int, optional
If known, the expected size of the resource to read. -1 indicates the file size is not known.
- Returns:
- in_memory_dataset
object or NotImplemented
The Python object read from the resource or NotImplemented.
- Raises:
- FormatterNotImplementedError
Raised if there is no support for direct reads from a possibly remote URI.
Notes
This method is only called if the class property can_read_from_uri is set to True.
It is possible that a cached local file will be given to this method even if it was originally a remote URI. This can happen if the original write resulted in the file being added to the local cache.
If the full file is being read this file will not be added to the local cache. Consider returning NotImplemented in this situation, for example if there are no parameters or component specified, and allowing the system to fall back to calling read_from_local_file (which will populate the cache if configured to do so).
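The advice to decline full-file reads can be sketched as a plain function. This is an illustration only: read_from_uri_sketch is a hypothetical name, the parameters are passed explicitly as a simplification (the signature above has no such argument), and the returned dict merely stands in for a partially read dataset.

```python
from __future__ import annotations

from typing import Any, Mapping


def read_from_uri_sketch(
    uri: str,
    component: str | None = None,
    parameters: Mapping[str, Any] | None = None,
) -> Any:
    """Decline whole-file reads so that a local-file read can take over."""
    if component is None and not parameters:
        # Nothing narrows the read, so a direct read would transfer the whole
        # file without populating the cache; signal the caller to fall back.
        return NotImplemented
    # Placeholder for a partial read of the requested subset.
    return {"uri": uri, "component": component, "parameters": dict(parameters or {})}
```

A caller treats NotImplemented as "try the next read method", not as an error.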
- segregate_parameters(parameters: dict[str, Any] | None = None) -> tuple[dict, dict]
Segregate the supplied parameters.
This splits the parameters into those understood by the formatter and those not understood by the formatter.
Any unsupported parameters are assumed to be usable by associated assemblers.
- Parameters:
- parameters
dict, optional
Parameters with values that have been supplied by the caller and which might be relevant for the formatter. If None, parameters will be read from the registered FileDescriptor.
- Returns:
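As a rough illustration of the split, the following stand-alone function separates a parameter dict using a set of unsupported names, following the unsupported_parameters semantics documented on this page (an empty set means everything is supported, None means nothing is). The function name and the return order (understood first, not understood second) are assumptions, not the library's documented contract.

```python
from __future__ import annotations

from typing import AbstractSet, Any


def segregate_parameters_sketch(
    parameters: dict[str, Any],
    unsupported: AbstractSet[str] | None,
) -> tuple[dict[str, Any], dict[str, Any]]:
    """Split parameters into (understood, not understood); order is assumed."""
    if unsupported is None:
        # None means that no parameters are supported at all.
        return {}, dict(parameters)
    understood = {k: v for k, v in parameters.items() if k not in unsupported}
    not_understood = {k: v for k, v in parameters.items() if k in unsupported}
    return understood, not_understood
```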
- to_bytes(in_memory_dataset: Any) -> bytes
Serialize the in-memory dataset to bytes.
- Parameters:
- in_memory_dataset
object
The Python object to serialize.
- Returns:
- serialized_dataset
bytes
Bytes representing the serialized dataset.
- Raises:
- FormatterNotImplementedError
Raised if the formatter has not implemented the method. This will not cause a problem if
write_local_file
has been implemented.
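The relationship between to_bytes and write_local_file can be sketched without the library. JsonWriterSketch is a hypothetical class, JSON is an arbitrary choice of format, and the file-writing method mirrors only the documented default of serializing through to_bytes.

```python
import json
import pathlib
import tempfile
from typing import Any


class JsonWriterSketch:
    """Hypothetical writer; only to_bytes contains format-specific logic."""

    def to_bytes(self, in_memory_dataset: Any) -> bytes:
        # Sorted keys make the serialized form deterministic.
        return json.dumps(in_memory_dataset, sort_keys=True).encode("utf-8")

    def write_local_file(self, in_memory_dataset: Any, path: pathlib.Path) -> None:
        # Mirrors the documented default: serialize with to_bytes, then write.
        path.write_bytes(self.to_bytes(in_memory_dataset))


writer = JsonWriterSketch()
with tempfile.TemporaryDirectory() as tmp_dir:
    target = pathlib.Path(tmp_dir) / "dataset.json"
    writer.write_local_file({"b": [1, 2], "a": 1}, target)
    round_tripped = json.loads(target.read_bytes())
```

A format that cannot be serialized in memory would instead implement only the file-writing method, which is why a missing to_bytes is not a problem in that case.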
- classmethod validate_extension(location: Location) -> None
Check the extension of the provided location for compatibility.
- Parameters:
- location
Location
Location from which to extract a file extension.
- Raises:
- ValueError
Raised if the formatter does not understand this extension.
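A minimal sketch of such a check, assuming that the allowed set is supported_extensions plus default_extension as described for those attributes. The function name is hypothetical and it takes a plain path string instead of a Location; the suffix comparison is a simplification.

```python
from __future__ import annotations

from typing import Iterable


def validate_extension_sketch(
    path: str,
    supported_extensions: Iterable[str],
    default_extension: str | None = None,
) -> None:
    """Raise ValueError if the path does not end in a known extension."""
    allowed = set(supported_extensions)
    if default_extension is not None:
        # The default extension always counts as supported.
        allowed.add(default_extension)
    if not any(path.endswith(ext) for ext in allowed):
        raise ValueError(f"Extension of {path!r} is not one of {sorted(allowed)}")
```

Matching on the suffix lets compound extensions such as ".fits.gz" be listed directly.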
- classmethod validate_write_recipes(recipes: Mapping[str, Any] | None) -> Mapping[str, Any] | None
Validate supplied recipes for this formatter.
The recipes are supplemented with default values where appropriate.
- final write(in_memory_dataset: Any, cache_manager: AbstractDatastoreCacheManager | None = None) -> None
Write a Dataset.
- Parameters:
- in_memory_dataset
object
The Dataset to serialize.
- cache_manager
AbstractDatastoreCacheManager
A cache manager to use to allow a formatter to cache the written file.
- Raises:
- FormatterNotImplementedError
Raised if the formatter subclass has not implemented write_local_file and to_bytes was not called.
- Exception
Raised if there is an error serializing the dataset to disk.
Notes
The intent is for subclasses to implement either to_bytes or write_local_file or both and not to override this method.
- write_direct(in_memory_dataset: Any, uri: ResourcePath, cache_manager: AbstractDatastoreCacheManager | None = None) -> bool
Serialize and write directly to final location.
- Parameters:
- in_memory_dataset
object
The Dataset to serialize.
- uri
lsst.resources.ResourcePath
URI to use when writing the serialized dataset.
- cache_manager
AbstractDatastoreCacheManager
A cache manager to use to allow a formatter to cache the written file.
- Returns:
- written
bool
Flag to indicate whether the direct write did happen.
- Raises:
- Exception
Raised if there was a failure from serializing to bytes that was not FormatterNotImplementedError.
Notes
This method will call to_bytes to serialize the in-memory dataset and then will call the write method directly.
If the dataset should be cached or is local the file will not be written and the method will return False. This is because local URIs should be written to a temporary file name and then renamed to allow atomic writes. That path is handled by write_locally_then_move (through write_local_file) and is preferred over this method being overridden and the atomic write re-implemented.
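The decision described in these notes amounts to a guard followed by a direct upload. The sketch below uses a plain dict as a stand-in for remote storage; the function name, its arguments, and the two boolean flags are hypothetical and only illustrate when the direct path is declined.

```python
def write_direct_sketch(
    payload: bytes,
    key: str,
    remote_store: dict,
    *,
    is_local: bool,
    should_cache: bool,
) -> bool:
    """Return True only if the payload was written straight to the store."""
    if is_local or should_cache:
        # Local or cached datasets go through the write-then-move path instead.
        return False
    remote_store[key] = payload
    return True
```

A False return tells the caller to use the temporary-file route, so nothing is written in that branch.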
- write_local_file(in_memory_dataset: Any, uri: ResourcePath) -> None
Serialize the in-memory dataset to a local file.
- Parameters:
- in_memory_dataset
object
The Python object to serialize.
- uri
ResourcePath
The URI to use when writing the file.
- Raises:
- FormatterNotImplementedError
Raised if the formatter subclass has not implemented this method or has failed to implement the to_bytes method.
Notes
By default this method will attempt to call to_bytes and then write these bytes to the file.
- write_locally_then_move(in_memory_dataset: Any, uri: ResourcePath, cache_manager: AbstractDatastoreCacheManager | None = None) -> None
Write file to file system and then move to final location.
- Parameters:
- in_memory_dataset
object
The Dataset to serialize.
- uri
lsst.resources.ResourcePath
URI to use when writing the serialized dataset.
- cache_manager
AbstractDatastoreCacheManager
A cache manager to use to allow a formatter to cache the written file.
- Raises:
- FormatterNotImplementedError
Raised if the formatter subclass has not implemented write_local_file.
- Exception
Raised if there is an error serializing the dataset to disk.
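The write-then-move idea can be illustrated for the purely local case with the standard library. This is a sketch under stated assumptions, not the library's implementation: the function name is hypothetical, the destination is a local path string instead of a ResourcePath, and caching is ignored.

```python
import os
import tempfile


def write_locally_then_move_sketch(data: bytes, destination: str) -> None:
    """Write to a temporary file, then rename it onto the destination."""
    directory = os.path.dirname(os.path.abspath(destination))
    # Creating the temporary file in the destination directory keeps the
    # final rename on a single file system.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        os.replace(tmp_path, destination)
    except BaseException:
        # Do not leave a partial temporary file behind on failure.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


with tempfile.TemporaryDirectory() as tmp_dir:
    target = os.path.join(tmp_dir, "out.bin")
    write_locally_then_move_sketch(b"abc", target)
    with open(target, "rb") as handle:
        written = handle.read()
    remaining = os.listdir(tmp_dir)
```

Readers of the destination path see either the old file or the complete new one, never a partially written file.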