LimitedButler
- class lsst.daf.butler.LimitedButler
  Bases: ABC
  A minimal butler interface that is sufficient to back PipelineTask execution.
Attributes Summary
- GENERATION: This is a Generation 3 Butler.
- datastore: The object that manages actual dataset storage (Datastore).
- dimensions: Structure managing all dimensions recognized by this data repository (DimensionUniverse).
Methods Summary
- datasetExistsDirect(ref): Return True if a dataset is actually present in the Datastore.
- get(ref, /, *[, parameters, storageClass]): Retrieve a stored dataset.
- getDeferred(ref, /, *[, parameters, ...]): Create a DeferredDatasetHandle which can later retrieve a dataset, after an immediate registry lookup.
- getDirect(ref, *[, parameters, storageClass]): Retrieve a stored dataset.
- getDirectDeferred(ref, *[, parameters, ...]): Create a DeferredDatasetHandle which can later retrieve a dataset, from a resolved DatasetRef.
- getURI(ref, /, *[, predict]): Return the URI to the Dataset.
- getURIs(ref, /, *[, predict]): Return the URIs associated with the dataset.
- get_datastore_names(): Return the names of the datastores associated with this butler.
- get_datastore_roots(): Return the defined root URIs for all registered datastores.
- get_many_uris(refs[, predict, allow_missing]): Return URIs associated with many datasets.
- markInputUnused(ref): Indicate that a predicted input was not actually used when processing a Quantum.
- pruneDatasets(refs, *[, disassociate, ...]): Remove one or more datasets from a collection and/or storage.
- put(obj, ref, /): Store a dataset that already has a UUID and RUN collection.
- putDirect(obj, ref, /): Store a dataset that already has a UUID and RUN collection.
- stored(ref): Indicate whether the dataset's artifacts are present in the Datastore.
- stored_many(refs): Check the datastore for artifact existence of multiple datasets at once.
Attributes Documentation
- GENERATION: ClassVar[int] = 3
  This is a Generation 3 Butler.
  This attribute may be removed in the future, once the Generation 2 Butler interface has been fully retired; it should only be used in transitional code.
- datastore
  The object that manages actual dataset storage (Datastore).
  Deprecated since version v26.0: The Butler.datastore property is now deprecated. Butler APIs should now exist with the relevant functionality. Will be removed after v26.0.
- dimensions
  Structure managing all dimensions recognized by this data repository (DimensionUniverse).
Methods Documentation
- datasetExistsDirect(ref: DatasetRef) -> bool
  Return True if a dataset is actually present in the Datastore.
  - Parameters:
    - ref : DatasetRef
      Resolved reference to a dataset.
  - Returns:
    - exists : bool
      Whether the dataset exists in the Datastore.
  Deprecated since version v26.0: Butler.datasetExistsDirect() has been replaced by Butler.stored(). Will be removed after v26.0.
- get(ref: DatasetRef, /, *, parameters: dict[str, Any] | None = None, storageClass: StorageClass | str | None = None) -> Any
  Retrieve a stored dataset.
  - Parameters:
    - ref : DatasetRef
      A resolved DatasetRef directly associated with a dataset.
    - parameters : dict
      Additional StorageClass-defined options to control reading, typically used to efficiently read only a subset of the dataset.
    - storageClass : StorageClass or str, optional
      The storage class to be used to override the Python type returned by this method. By default the returned type matches the dataset type definition for this dataset. Specifying a read StorageClass can force a different type to be returned. This type must be compatible with the original type.
  - Returns:
    - obj : object
      The dataset.
  - Raises:
    - AmbiguousDatasetError
      Raised if the supplied DatasetRef is unresolved.
  Notes
  In a LimitedButler the only allowable way to specify a dataset is to use a resolved DatasetRef. Subclasses can support more options.
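The call shape of get can be illustrated without a data repository. The sketch below is a toy, not the library implementation: ToyButler, its string "refs", and the "slice" read parameter are all hypothetical stand-ins, chosen only to show how a parameters dict can request a subset of a stored dataset. Real parameter names are defined by each StorageClass.

```python
from __future__ import annotations

from typing import Any


class ToyButler:
    """Hypothetical in-memory stand-in; NOT part of lsst.daf.butler."""

    def __init__(self) -> None:
        self._store: dict[str, Any] = {}

    def put(self, obj: Any, ref: str, /) -> str:
        # A plain string plays the role of a resolved DatasetRef here.
        self._store[ref] = obj
        return ref

    def get(
        self,
        ref: str,
        /,
        *,
        parameters: dict[str, Any] | None = None,
        storageClass: str | None = None,
    ) -> Any:
        # storageClass is accepted for signature parity but ignored by this toy.
        obj = self._store[ref]
        # Assumed toy parameter: "slice" selects a subset of a sequence.
        if parameters and "slice" in parameters:
            return obj[parameters["slice"]]
        return obj


butler = ToyButler()
ref = butler.put(list(range(10)), "resolved-ref-1")
full = butler.get(ref)
subset = butler.get(ref, parameters={"slice": slice(2, 5)})
```

Note that ref is positional-only while parameters and storageClass are keyword-only, mirroring the `/` and `*` markers in the signature above.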
- getDeferred(ref: DatasetRef, /, *, parameters: dict[str, Any] | None = None, storageClass: str | StorageClass | None = None) -> DeferredDatasetHandle
  Create a DeferredDatasetHandle which can later retrieve a dataset, after an immediate registry lookup.
  - Parameters:
    - ref : DatasetRef
      For the default implementation of a LimitedButler, the only acceptable parameter is a resolved DatasetRef.
    - parameters : dict
      Additional StorageClass-defined options to control reading, typically used to efficiently read only a subset of the dataset.
    - storageClass : StorageClass or str, optional
      The storage class to be used to override the Python type returned by this method. By default the returned type matches the dataset type definition for this dataset. Specifying a read StorageClass can force a different type to be returned. This type must be compatible with the original type.
  - Returns:
    - obj : DeferredDatasetHandle
      A handle which can be used to retrieve a dataset at a later time.
  Notes
  In a LimitedButler the only allowable way to specify a dataset is to use a resolved DatasetRef. Subclasses can support more options.
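The deferred-read idea can be sketched with a small stand-in. ToyDeferredHandle and expensive_read below are hypothetical names used for illustration only; the point is that creating the handle performs no read, and the read happens when the handle's get method is called.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ToyDeferredHandle:
    """Hypothetical stand-in for a deferred handle; NOT the library class."""

    loader: Callable[[], Any]

    def get(self) -> Any:
        # The read only happens here, when the caller asks for the data.
        return self.loader()


calls: list[str] = []


def expensive_read() -> dict[str, list[int]]:
    # Records each invocation so the laziness can be observed.
    calls.append("read")
    return {"pixels": [1, 2, 3]}


handle = ToyDeferredHandle(loader=expensive_read)
reads_before = len(calls)  # nothing has been read yet
data = handle.get()
reads_after = len(calls)  # exactly one read has now happened
```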
- getDirect(ref: DatasetRef, *, parameters: dict[str, Any] | None = None, storageClass: str | StorageClass | None = None) -> Any
  Retrieve a stored dataset.
  - Parameters:
    - ref : DatasetRef
      Resolved reference to an already stored dataset.
    - parameters : dict
      Additional StorageClass-defined options to control reading, typically used to efficiently read only a subset of the dataset.
    - storageClass : StorageClass or str, optional
      The storage class to be used to override the Python type returned by this method. By default the returned type matches the dataset type definition for this dataset. Specifying a read StorageClass can force a different type to be returned. This type must be compatible with the original type.
  - Returns:
    - obj : object
      The dataset.
  Deprecated since version v26.0: Butler.get() now behaves like Butler.getDirect() when given a DatasetRef. Please use Butler.get(). Will be removed after v26.0.
- getDirectDeferred(ref: DatasetRef, *, parameters: dict[str, Any] | None = None, storageClass: str | StorageClass | None = None) -> DeferredDatasetHandle
  Create a DeferredDatasetHandle which can later retrieve a dataset, from a resolved DatasetRef.
  - Parameters:
    - ref : DatasetRef
      Resolved reference to an already stored dataset.
    - parameters : dict
      Additional StorageClass-defined options to control reading, typically used to efficiently read only a subset of the dataset.
    - storageClass : StorageClass or str, optional
      The storage class to be used to override the Python type returned by this method. By default the returned type matches the dataset type definition for this dataset. Specifying a read StorageClass can force a different type to be returned. This type must be compatible with the original type.
  - Returns:
    - obj : DeferredDatasetHandle
      A handle which can be used to retrieve a dataset at a later time.
  Deprecated since version v26.0: Butler.getDeferred() now behaves like getDirectDeferred() when given a DatasetRef. Please use Butler.getDeferred(). Will be removed after v26.0.
- getURI(ref: DatasetRef, /, *, predict: bool = False) -> ResourcePath
  Return the URI to the Dataset.
  - Parameters:
    - ref : DatasetRef
      A DatasetRef for which a single URI is requested.
    - predict : bool
      If True, allow URIs to be returned of datasets that have not been written.
  - Returns:
    - uri : lsst.resources.ResourcePath
      URI pointing to the Dataset within the datastore. If the Dataset does not exist in the datastore, and if predict is True, the URI will be a prediction and will include a URI fragment “#predicted”. If the datastore does not have entities that relate well to the concept of a URI, the returned URI string will be descriptive. The returned URI is not guaranteed to be obtainable.
  - Raises:
    - RuntimeError
      Raised if a URI is requested for a dataset that consists of multiple artifacts.
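Because a predicted URI carries the “#predicted” fragment, a caller can tell a prediction from the location of a written dataset by inspecting the fragment. The sketch below works on plain URI strings with the standard library; the helper name is_predicted and both example URIs are hypothetical, and a real ResourcePath would first need to be converted to a string.

```python
from urllib.parse import urlsplit


def is_predicted(uri: str) -> bool:
    """Return True if the URI carries the "predicted" fragment."""
    return urlsplit(uri).fragment == "predicted"


# Hypothetical URIs, for illustration only.
existing = "file:///repo/run/dataset_1.fits"
predicted = "file:///repo/run/dataset_2.fits#predicted"

existing_is_predicted = is_predicted(existing)
predicted_is_predicted = is_predicted(predicted)
```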
- getURIs(ref: DatasetRef, /, *, predict: bool = False) -> DatasetRefURIs
  Return the URIs associated with the dataset.
  - Parameters:
    - ref : DatasetRef
      A DatasetRef for which URIs are requested.
    - predict : bool
      If True, allow URIs to be returned of datasets that have not been written.
  - Returns:
    - uris : DatasetRefURIs
      The URI to the primary artifact associated with this dataset (if the dataset was disassembled within the datastore this may be None), and the URIs to any components associated with the dataset artifact (can be empty if there are no components).
- get_datastore_names() -> tuple[str, ...]
  Return the names of the datastores associated with this butler.
- get_datastore_roots() -> dict[str, ResourcePath | None]
  Return the defined root URIs for all registered datastores.
  - Returns:
    - roots : dict[str, ResourcePath | None]
      A mapping from datastore name to datastore root URI. The root can be None if the datastore does not have any concept of a root URI.
- get_many_uris(refs: Iterable[DatasetRef], predict: bool = False, allow_missing: bool = False) -> dict[DatasetRef, DatasetRefURIs]
  Return URIs associated with many datasets.
  - Parameters:
    - refs : Iterable[DatasetRef]
    - predict : bool, optional
    - allow_missing : bool, optional
  - Returns:
    - URIs : dict of [DatasetRef, DatasetRefURIs]
      A dict of primary and component URIs, indexed by the passed-in refs.
  - Raises:
    - FileNotFoundError
      A URI has been requested for a dataset that does not exist and guessing is not allowed.
  Notes
  In file-based datastores, get_many_uris does not check that the file is present. It assumes that if the datastore is aware of the file then it actually exists.
- markInputUnused(ref: DatasetRef) -> None
  Indicate that a predicted input was not actually used when processing a Quantum.
  - Parameters:
    - ref : DatasetRef
      Reference to the unused dataset.
  Notes
  By default, a dataset is considered “actually used” if it is accessed via getDirect or a handle to it is obtained via getDirectDeferred (even if the handle is not used). This method must be called after one of those in order to remove the dataset from the actual input list.
  This method does nothing for butlers that do not store provenance information (which is the default implementation provided by the base class).
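The ordering requirement in the notes (read first, then mark as unused) can be sketched with a toy that tracks provenance. ToyProvenanceButler, its actual_inputs set, and the string refs are hypothetical and exist only for illustration; the toy records access in get, standing in for the read methods named above.

```python
from __future__ import annotations

from typing import Any


class ToyProvenanceButler:
    """Hypothetical stand-in that records which inputs were read."""

    def __init__(self, data: dict[str, Any]) -> None:
        self._data = dict(data)
        self.actual_inputs: set[str] = set()

    def get(self, ref: str, /) -> Any:
        # Any read marks the dataset as "actually used".
        self.actual_inputs.add(ref)
        return self._data[ref]

    def markInputUnused(self, ref: str) -> None:
        # Must be called after the read, otherwise there is nothing to remove.
        self.actual_inputs.discard(ref)


butler = ToyProvenanceButler({"a": 1, "b": None})
value_a = butler.get("a")
value_b = butler.get("b")
if value_b is None:
    # The caller decides this input contributed nothing to the result.
    butler.markInputUnused("b")
```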
- abstract pruneDatasets(refs: Iterable[DatasetRef], *, disassociate: bool = True, unstore: bool = False, tags: Iterable[str] = (), purge: bool = False) -> None
  Remove one or more datasets from a collection and/or storage.
  - Parameters:
    - refs : Iterable of DatasetRef
      Datasets to prune. These must be “resolved” references (not just a DatasetType and data ID).
    - disassociate : bool, optional
      Disassociate pruned datasets from tags, or from all collections if purge=True.
    - unstore : bool, optional
      If True (False is default) remove these datasets from all datastores known to this butler. Note that this will make it impossible to retrieve these datasets even via other collections. Datasets that are already not stored are ignored by this option.
    - tags : Iterable[str], optional
      TAGGED collections to disassociate the datasets from. Ignored if disassociate is False or purge is True.
    - purge : bool, optional
      If True (False is default), completely remove the dataset from the Registry. To prevent accidental deletions, purge may only be True if all of the following conditions are met:
      This mode may remove provenance information from datasets other than those provided, and should be used with extreme care.
  - Raises:
    - TypeError
      Raised if the butler is read-only, if no collection was provided, or the conditions for purge=True were not met.
- abstract put(obj: Any, ref: DatasetRef, /) -> DatasetRef
  Store a dataset that already has a UUID and RUN collection.
  - Parameters:
    - obj : object
      The dataset.
    - ref : DatasetRef
      Resolved reference for a not-yet-stored dataset.
  - Returns:
    - ref : DatasetRef
      The same as the given, for convenience and symmetry with Butler.put.
  - Raises:
    - TypeError
      Raised if the butler is read-only.
  Notes
  Whether this method inserts the given dataset into a Registry is implementation defined (some LimitedButler subclasses do not have a Registry), but it always adds the dataset to a Datastore, and the given ref.id and ref.run are always preserved.
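The contract described for put (the same ref is returned, ref.id and ref.run are preserved, and a read-only butler raises TypeError) can be sketched with a toy. ToyRef and ToyWritableButler are hypothetical stand-ins, and the run name is made up; nothing here touches a real Registry or Datastore.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ToyRef:
    """Hypothetical stand-in for a resolved DatasetRef (UUID plus RUN)."""

    id: uuid.UUID
    run: str


class ToyWritableButler:
    """Hypothetical stand-in; a dict plays the role of the Datastore."""

    def __init__(self, writeable: bool = True) -> None:
        self._writeable = writeable
        self._datastore: dict[ToyRef, Any] = {}

    def put(self, obj: Any, ref: ToyRef, /) -> ToyRef:
        if not self._writeable:
            raise TypeError("Butler is read-only.")
        self._datastore[ref] = obj
        # The given ref is returned unchanged, so id and run are preserved.
        return ref

    def stored(self, ref: ToyRef) -> bool:
        return ref in self._datastore


ref = ToyRef(id=uuid.uuid4(), run="u/someone/run1")
butler = ToyWritableButler()
returned = butler.put({"flux": 1.5}, ref)

# A read-only butler rejects the write with TypeError.
try:
    ToyWritableButler(writeable=False).put({"flux": 1.5}, ref)
    read_only_raised = False
except TypeError:
    read_only_raised = True
```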
- putDirect(obj: Any, ref: DatasetRef, /) -> DatasetRef
  Store a dataset that already has a UUID and RUN collection.
  - Parameters:
    - obj : object
      The dataset.
    - ref : DatasetRef
      Resolved reference for a not-yet-stored dataset.
  - Returns:
    - ref : DatasetRef
      The same as the given, for convenience and symmetry with Butler.put.
  - Raises:
    - TypeError
      Raised if the butler is read-only.
  Notes
  Whether this method inserts the given dataset into a Registry is implementation defined (some LimitedButler subclasses do not have a Registry), but it always adds the dataset to a Datastore, and the given ref.id and ref.run are always preserved.
  Deprecated since version v26.0: Butler.put() now behaves like Butler.putDirect() when given a DatasetRef. Please use Butler.put(). Will be removed after v26.0.
- stored(ref: DatasetRef) -> bool
  Indicate whether the dataset’s artifacts are present in the Datastore.
  - Parameters:
    - ref : DatasetRef
      Resolved reference to a dataset.
  - Returns:
    - stored : bool
      Whether the dataset artifact exists in the datastore and can be retrieved.
- stored_many(refs: Iterable[DatasetRef]) -> dict[DatasetRef, bool]
  Check the datastore for artifact existence of multiple datasets at once.
  - Parameters:
    - refs : iterable of DatasetRef
      The datasets to be checked.
  - Returns:
    - existence : dict of [DatasetRef, bool]
      Mapping from given dataset refs to boolean indicating artifact existence.
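A mapping of the shape returned by stored_many is easy to partition into present and missing datasets. In the sketch below the helper split_by_existence is hypothetical, and plain strings stand in for DatasetRef keys; the input dict is written by hand rather than obtained from a butler.

```python
from __future__ import annotations


def split_by_existence(existence: dict[str, bool]) -> tuple[list[str], list[str]]:
    """Split refs into (present, missing), keeping the mapping's order."""
    present = [ref for ref, exists in existence.items() if exists]
    missing = [ref for ref, exists in existence.items() if not exists]
    return present, missing


# Hand-written stand-in for a stored_many result.
existence = {"ref-a": True, "ref-b": False, "ref-c": True}
present, missing = split_by_existence(existence)
```

Checking many refs in one call and then partitioning the result avoids one existence check per dataset in the calling code.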