LimitedButler

class lsst.daf.butler.LimitedButler

Bases: ABC

A minimal butler interface that is sufficient to back PipelineTask execution.

Attributes Summary

GENERATION

This is a Generation 3 Butler.

datastore

The object that manages actual dataset storage (Datastore).

dimensions

Structure managing all dimensions recognized by this data repository (DimensionUniverse).

Methods Summary

datasetExistsDirect(ref)

Return True if a dataset is actually present in the Datastore.

get(ref, /, *[, parameters, storageClass])

Retrieve a stored dataset.

getDeferred(ref, /, *[, parameters, ...])

Create a DeferredDatasetHandle which can later retrieve a dataset, after an immediate registry lookup.

getDirect(ref, *[, parameters, storageClass])

Retrieve a stored dataset.

getDirectDeferred(ref, *[, parameters, ...])

Create a DeferredDatasetHandle which can later retrieve a dataset, from a resolved DatasetRef.

getURI(ref, /, *[, predict])

Return the URI to the Dataset.

getURIs(ref, /, *[, predict])

Return the URIs associated with the dataset.

get_datastore_names()

Return the names of the datastores associated with this butler.

get_datastore_roots()

Return the defined root URIs for all registered datastores.

get_many_uris(refs[, predict, allow_missing])

Return URIs associated with many datasets.

isWriteable()

Return True if this Butler supports write operations.

markInputUnused(ref)

Indicate that a predicted input was not actually used when processing a Quantum.

pruneDatasets(refs, *[, disassociate, ...])

Remove one or more datasets from a collection and/or storage.

put(obj, ref, /)

Store a dataset that already has a UUID and RUN collection.

putDirect(obj, ref, /)

Store a dataset that already has a UUID and RUN collection.

stored(ref)

Indicate whether the dataset's artifacts are present in the Datastore.

stored_many(refs)

Check the datastore for artifact existence of multiple datasets at once.

Attributes Documentation

GENERATION: ClassVar[int] = 3

This is a Generation 3 Butler.

This attribute may be removed in the future, once the Generation 2 Butler interface has been fully retired; it should only be used in transitional code.

datastore

The object that manages actual dataset storage (Datastore).

Deprecated since version v26.0: The Butler.datastore property is now deprecated. Butler APIs should now exist with the relevant functionality. Will be removed after v26.0.

dimensions

Structure managing all dimensions recognized by this data repository (DimensionUniverse).

Methods Documentation

datasetExistsDirect(ref: DatasetRef) → bool

Return True if a dataset is actually present in the Datastore.

Parameters:
ref : DatasetRef

Resolved reference to a dataset.

Returns:
exists : bool

Whether the dataset exists in the Datastore.

Deprecated since version v26.0: Butler.datasetExistsDirect() has been replaced by Butler.stored(). Will be removed after v26.0.

get(ref: DatasetRef, /, *, parameters: dict[str, Any] | None = None, storageClass: StorageClass | str | None = None) → Any

Retrieve a stored dataset.

Parameters:
ref : DatasetRef

A resolved DatasetRef directly associated with a dataset.

parameters : dict

Additional StorageClass-defined options to control reading, typically used to efficiently read only a subset of the dataset.

storageClass : StorageClass or str, optional

The storage class to be used to override the Python type returned by this method. By default the returned type matches the dataset type definition for this dataset. Specifying a read StorageClass can force a different type to be returned. This type must be compatible with the original type.

Returns:
obj : object

The dataset.

Raises:
AmbiguousDatasetError

Raised if the supplied DatasetRef is unresolved.

Notes

In a LimitedButler the only allowable way to specify a dataset is to use a resolved DatasetRef. Subclasses can support more options.
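The resolved-reference requirement can be illustrated with a minimal in-memory sketch. ToyRef, ToyLimitedButler, and the locally defined AmbiguousDatasetError are hypothetical stand-ins written for this example, not the lsst.daf.butler classes; the sketch assumes only that a resolved reference carries a dataset ID and an unresolved one does not.

```python
import uuid
from dataclasses import dataclass
from typing import Any, Optional


class AmbiguousDatasetError(Exception):
    """Local stand-in for the exception named in the Raises section."""


@dataclass(frozen=True)
class ToyRef:
    """Hypothetical stand-in for DatasetRef; id is None when unresolved."""
    dataset_type: str
    run: Optional[str] = None
    id: Optional[uuid.UUID] = None


class ToyLimitedButler:
    """Hypothetical in-memory butler keyed by dataset UUID."""

    def __init__(self) -> None:
        self._store = {}

    def put(self, obj: Any, ref: ToyRef, /) -> ToyRef:
        if ref.id is None:
            raise AmbiguousDatasetError("put requires a resolved ref")
        self._store[ref.id] = obj
        return ref

    def get(self, ref: ToyRef, /) -> Any:
        # Only a resolved ref (one carrying a dataset ID) is accepted.
        if ref.id is None:
            raise AmbiguousDatasetError(f"unresolved ref: {ref}")
        return self._store[ref.id]


butler = ToyLimitedButler()
resolved = ToyRef("calexp", run="demo_run", id=uuid.uuid4())
butler.put([1, 2, 3], resolved)
result = butler.get(resolved)
```

Passing ToyRef("calexp"), which has no ID, to get raises the stand-in AmbiguousDatasetError, mirroring the behavior described in the Raises section.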

getDeferred(ref: DatasetRef, /, *, parameters: dict[str, Any] | None = None, storageClass: str | StorageClass | None = None) → DeferredDatasetHandle

Create a DeferredDatasetHandle which can later retrieve a dataset, after an immediate registry lookup.

Parameters:
ref : DatasetRef

For the default implementation of a LimitedButler, the only acceptable parameter is a resolved DatasetRef.

parameters : dict

Additional StorageClass-defined options to control reading, typically used to efficiently read only a subset of the dataset.

storageClass : StorageClass or str, optional

The storage class to be used to override the Python type returned by this method. By default the returned type matches the dataset type definition for this dataset. Specifying a read StorageClass can force a different type to be returned. This type must be compatible with the original type.

Returns:
obj : DeferredDatasetHandle

A handle which can be used to retrieve a dataset at a later time.

Notes

In a LimitedButler the only allowable way to specify a dataset is to use a resolved DatasetRef. Subclasses can support more options.
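The idea of a deferred handle, where the read is postponed until the handle is used, can be sketched without the butler itself. ToyDeferredHandle, read_pixels, and the "stop" parameter are hypothetical names invented for this illustration; the real DeferredDatasetHandle may differ in its details.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class ToyDeferredHandle:
    """Hypothetical stand-in for DeferredDatasetHandle."""
    loader: Callable[[Optional[dict]], Any]
    parameters: Optional[dict] = None

    def get(self) -> Any:
        # The actual read happens only here, not at handle creation.
        return self.loader(self.parameters)


reads = []  # records each time the pretend datastore is read


def read_pixels(parameters: Optional[dict]) -> list:
    """Pretend read that honors a made-up 'stop' subset parameter."""
    reads.append(parameters)
    data = list(range(10))
    if parameters and "stop" in parameters:
        return data[: parameters["stop"]]
    return data


handle = ToyDeferredHandle(read_pixels, parameters={"stop": 3})
reads_before = len(reads)  # creating the handle triggered no read
subset = handle.get()      # the read happens now
```

Creating the handle is cheap; the cost of reading is paid only when get is called, and the stored parameters select a subset at that point.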

getDirect(ref: DatasetRef, *, parameters: dict[str, Any] | None = None, storageClass: str | StorageClass | None = None) → Any

Retrieve a stored dataset.

Parameters:
ref : DatasetRef

Resolved reference to an already stored dataset.

parameters : dict

Additional StorageClass-defined options to control reading, typically used to efficiently read only a subset of the dataset.

storageClass : StorageClass or str, optional

The storage class to be used to override the Python type returned by this method. By default the returned type matches the dataset type definition for this dataset. Specifying a read StorageClass can force a different type to be returned. This type must be compatible with the original type.

Returns:
obj : object

The dataset.

Deprecated since version v26.0: Butler.get() now behaves like Butler.getDirect() when given a DatasetRef. Please use Butler.get(). Will be removed after v26.0.

getDirectDeferred(ref: DatasetRef, *, parameters: dict[str, Any] | None = None, storageClass: str | StorageClass | None = None) → DeferredDatasetHandle

Create a DeferredDatasetHandle which can later retrieve a dataset, from a resolved DatasetRef.

Parameters:
ref : DatasetRef

Resolved reference to an already stored dataset.

parameters : dict

Additional StorageClass-defined options to control reading, typically used to efficiently read only a subset of the dataset.

storageClass : StorageClass or str, optional

The storage class to be used to override the Python type returned by this method. By default the returned type matches the dataset type definition for this dataset. Specifying a read StorageClass can force a different type to be returned. This type must be compatible with the original type.

Returns:
obj : DeferredDatasetHandle

A handle which can be used to retrieve a dataset at a later time.

Deprecated since version v26.0: Butler.getDeferred() now behaves like getDirectDeferred() when given a DatasetRef. Please use Butler.getDeferred(). Will be removed after v26.0.

getURI(ref: DatasetRef, /, *, predict: bool = False) → ResourcePath

Return the URI to the Dataset.

Parameters:
ref : DatasetRef

A DatasetRef for which a single URI is requested.

predict : bool

If True, allow URIs to be returned for datasets that have not been written.

Returns:
uri : lsst.resources.ResourcePath

URI pointing to the Dataset within the datastore. If the Dataset does not exist in the datastore and predict is True, the URI will be a prediction and will include the URI fragment “#predicted”. If the datastore does not have entities that relate well to the concept of a URI, the returned URI string will be descriptive. The returned URI is not guaranteed to be obtainable.

Raises:
RuntimeError

Raised if a URI is requested for a dataset that consists of multiple artifacts.
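A caller may want to tell a predicted URI apart from one that points at a written artifact. The sketch below checks for the “predicted” fragment using only the standard library; the URI strings and the helper name is_predicted are made up for illustration, and it assumes the URI has already been converted to a plain string.

```python
from urllib.parse import urlsplit


def is_predicted(uri: str) -> bool:
    """Return True if the URI carries the 'predicted' fragment."""
    return urlsplit(uri).fragment == "predicted"


# Hypothetical URI strings, for illustration only.
written = "file:///repo/run/calexp_903334.fits"
guessed = "file:///repo/run/calexp_903336.fits#predicted"

written_is_predicted = is_predicted(written)
guessed_is_predicted = is_predicted(guessed)
```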

getURIs(ref: DatasetRef, /, *, predict: bool = False) → DatasetRefURIs

Return the URIs associated with the dataset.

Parameters:
ref : DatasetRef

A DatasetRef for which URIs are requested.

predict : bool

If True, allow URIs to be returned for datasets that have not been written.

Returns:
uris : DatasetRefURIs

The URI to the primary artifact associated with this dataset (if the dataset was disassembled within the datastore this may be None), and the URIs to any components associated with the dataset artifact (can be empty if there are no components).
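The two cases described above (a single primary artifact, or a disassembled dataset with only component URIs) can be handled uniformly. In this sketch ToyRefURIs is a hypothetical stand-in for DatasetRefURIs, and its field names primary and components are assumptions chosen for the example, as are the URI strings.

```python
from typing import NamedTuple, Optional


class ToyRefURIs(NamedTuple):
    """Hypothetical stand-in for DatasetRefURIs."""
    primary: Optional[str]
    components: dict


def all_artifact_uris(uris: ToyRefURIs) -> list:
    """Collect every URI, whether or not the dataset was disassembled."""
    found = [] if uris.primary is None else [uris.primary]
    found.extend(uris.components.values())
    return found


# A dataset stored as one artifact: primary URI set, no components.
single = ToyRefURIs("file:///repo/run/exp.fits", {})

# A disassembled dataset: no primary URI, one URI per component.
split = ToyRefURIs(
    None,
    {
        "image": "file:///repo/run/exp_image.fits",
        "mask": "file:///repo/run/exp_mask.fits",
    },
)

single_uris = all_artifact_uris(single)
split_uris = all_artifact_uris(split)
```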

get_datastore_names() → tuple[str, ...]

Return the names of the datastores associated with this butler.

Returns:
names : tuple [str, …]

The names of the datastores.

get_datastore_roots() → dict[str, ResourcePath | None]

Return the defined root URIs for all registered datastores.

Returns:
roots : dict [str, ResourcePath | None]

A mapping from datastore name to datastore root URI. The root can be None if the datastore does not have any concept of a root URI.
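Because a root may be None, code that consumes this mapping should not assume every datastore has one. The following sketch uses a plain dictionary shaped like the documented return value; the datastore names and the root URI are hypothetical, and plain strings stand in for ResourcePath.

```python
from typing import Optional

# Hypothetical datastore names and roots, shaped like the documented mapping.
roots: dict[str, Optional[str]] = {
    "file_store": "file:///repo/",
    "memory_store": None,  # no concept of a root URI
}

# Keep only the datastores that expose a root URI.
rooted = {name: root for name, root in roots.items() if root is not None}

# Names of datastores without a root, e.g. for logging or skipping.
rootless = sorted(name for name, root in roots.items() if root is None)
```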

get_many_uris(refs: Iterable[DatasetRef], predict: bool = False, allow_missing: bool = False) → dict[DatasetRef, DatasetRefURIs]

Return URIs associated with many datasets.

Parameters:
refs : iterable of DatasetIdRef

References to the required datasets.

predict : bool, optional

If True, allow URIs to be returned for datasets that have not been written.

allow_missing : bool

If False and predict is also False, raise FileNotFoundError if any DatasetRef does not exist.

Returns:
URIs : dict of [DatasetRef, DatasetRefURIs]

A dict of primary and component URIs, indexed by the passed-in refs.

Raises:
FileNotFoundError

Raised if a URI is requested for a dataset that does not exist and prediction is not allowed.

Notes

In file-based datastores, get_many_uris does not check that the file is present. It assumes that if the datastore is aware of the file, then the file actually exists.
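The interaction between predict and allow_missing can be sketched as a small lookup over a dictionary of known datasets. toy_get_many_uris, the known mapping, and the predicted URI layout are hypothetical; in particular, silently omitting missing refs when allow_missing is True is an assumption of this sketch rather than documented behavior.

```python
def toy_get_many_uris(refs, known, predict=False, allow_missing=False):
    """Hypothetical lookup mirroring the documented flag behavior.

    ``known`` maps each ref to the URI the datastore is aware of.
    """
    result = {}
    for ref in refs:
        if ref in known:
            # As in the note above, existence of the file is not verified.
            result[ref] = known[ref]
        elif predict:
            # Made-up URI layout; only the fragment follows the docs.
            result[ref] = f"file:///repo/{ref}#predicted"
        elif not allow_missing:
            raise FileNotFoundError(f"no URI for {ref}")
        # Otherwise the missing ref is skipped (an assumption of this sketch).
    return result


known = {"ref_a": "file:///repo/a.fits"}
found = toy_get_many_uris(["ref_a", "ref_b"], known, allow_missing=True)
guessed = toy_get_many_uris(["ref_b"], known, predict=True)
```

With both flags left at False, requesting "ref_b" raises FileNotFoundError.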

abstract isWriteable() → bool

Return True if this Butler supports write operations.

markInputUnused(ref: DatasetRef) → None

Indicate that a predicted input was not actually used when processing a Quantum.

Parameters:
ref : DatasetRef

Reference to the unused dataset.

Notes

By default, a dataset is considered “actually used” if it is accessed via getDirect or a handle to it is obtained via getDirectDeferred (even if the handle is not used). This method must be called after one of those in order to remove the dataset from the actual input list.

This method does nothing for butlers that do not store provenance information (which is the default implementation provided by the base class).
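The ordering requirement (access first, then mark unused) is easier to see in a toy provenance tracker. ToyProvenanceButler and its actual_inputs attribute are hypothetical; plain strings stand in for dataset references, and a single get method stands in for the access methods named above.

```python
class ToyProvenanceButler:
    """Hypothetical butler that records which predicted inputs were read."""

    def __init__(self, store):
        self._store = store
        self.actual_inputs = set()

    def get(self, ref):
        # Any access counts as "actually used".
        self.actual_inputs.add(ref)
        return self._store[ref]

    def markInputUnused(self, ref):
        # Only meaningful after get(); before that there is nothing to remove.
        self.actual_inputs.discard(ref)


butler = ToyProvenanceButler({"bias": 0.5, "flat": 1.2})
flat = butler.get("flat")
bias = butler.get("bias")
butler.markInputUnused("bias")  # inspected, but did not affect the output
```

After these calls only "flat" remains in actual_inputs. Calling markInputUnused before get would have no lasting effect, because the later get would add the ref back.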

abstract pruneDatasets(refs: Iterable[DatasetRef], *, disassociate: bool = True, unstore: bool = False, tags: Iterable[str] = (), purge: bool = False) → None

Remove one or more datasets from a collection and/or storage.

Parameters:
refs : Iterable of DatasetRef

Datasets to prune. These must be “resolved” references (not just a DatasetType and data ID).

disassociate : bool, optional

Disassociate pruned datasets from tags, or from all collections if purge=True.

unstore : bool, optional

If True (False is default) remove these datasets from all datastores known to this butler. Note that this will make it impossible to retrieve these datasets even via other collections. Datasets that are already not stored are ignored by this option.

tags : Iterable [ str ], optional

TAGGED collections to disassociate the datasets from. Ignored if disassociate is False or purge is True.

purge : bool, optional

If True (False is default), completely remove the dataset from the Registry. To prevent accidental deletions, purge may only be True if all of the following conditions are met:

  • disassociate is True;

  • unstore is True.

This mode may remove provenance information from datasets other than those provided, and should be used with extreme care.

Raises:
TypeError

Raised if the butler is read-only, if no collection was provided, or the conditions for purge=True were not met.
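The argument rules above can be expressed as a small validator. check_prune_args is a hypothetical helper, not part of the butler API; it encodes one reading of the documented conditions, and treating an empty tags argument as "no collection was provided" when disassociate is True and purge is False is an assumption of this sketch.

```python
def check_prune_args(*, disassociate=True, unstore=False, tags=(), purge=False):
    """Hypothetical validator for the documented pruneDatasets preconditions."""
    if purge:
        # purge is only allowed together with both other removal modes.
        if not disassociate:
            raise TypeError("purge=True requires disassociate=True")
        if not unstore:
            raise TypeError("purge=True requires unstore=True")
    elif disassociate and not tuple(tags):
        # tags is ignored when purge is True or disassociate is False.
        raise TypeError("disassociate=True but no TAGGED collection given")


# Accepted combinations: full purge, and unstore-only.
check_prune_args(disassociate=True, unstore=True, purge=True)
check_prune_args(disassociate=False, unstore=True)

# Rejected combination: purge without unstore.
try:
    check_prune_args(purge=True)
    purge_rejected = False
except TypeError:
    purge_rejected = True
```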

abstract put(obj: Any, ref: DatasetRef, /) → DatasetRef

Store a dataset that already has a UUID and RUN collection.

Parameters:
obj : object

The dataset.

ref : DatasetRef

Resolved reference for a not-yet-stored dataset.

Returns:
ref : DatasetRef

The same ref as given, for convenience and symmetry with Butler.put.

Raises:
TypeError

Raised if the butler is read-only.

Notes

Whether this method inserts the given dataset into a Registry is implementation defined (some LimitedButler subclasses do not have a Registry), but it always adds the dataset to a Datastore, and the given ref.id and ref.run are always preserved.
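A minimal sketch of this contract: the write is refused on a read-only butler, and otherwise the object is stored under the reference exactly as given. ToyStore is a hypothetical class, and the (ID, RUN) tuple is a made-up stand-in for a resolved DatasetRef.

```python
from typing import Any


class ToyStore:
    """Hypothetical butler-like object with a read-only switch."""

    def __init__(self, writeable: bool) -> None:
        self._writeable = writeable
        self.datasets = {}

    def isWriteable(self) -> bool:
        return self._writeable

    def put(self, obj: Any, ref: tuple, /) -> tuple:
        if not self.isWriteable():
            raise TypeError("butler is read-only")
        # ref is a hypothetical (id, run) pair; it is stored unchanged.
        self.datasets[ref] = obj
        return ref


ref = ("3f2c9a", "demo_run")  # made-up ID and RUN name
store = ToyStore(writeable=True)
returned = store.put({"flux": 1.0}, ref)
```

The returned value is the same object that was passed in, so the ID and RUN are preserved by construction.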

putDirect(obj: Any, ref: DatasetRef, /) → DatasetRef

Store a dataset that already has a UUID and RUN collection.

Parameters:
obj : object

The dataset.

ref : DatasetRef

Resolved reference for a not-yet-stored dataset.

Returns:
ref : DatasetRef

The same ref as given, for convenience and symmetry with Butler.put.

Raises:
TypeError

Raised if the butler is read-only.

Notes

Whether this method inserts the given dataset into a Registry is implementation defined (some LimitedButler subclasses do not have a Registry), but it always adds the dataset to a Datastore, and the given ref.id and ref.run are always preserved.

Deprecated since version v26.0: Butler.put() now behaves like Butler.putDirect() when given a DatasetRef. Please use Butler.put(). Will be removed after v26.0.

stored(ref: DatasetRef) → bool

Indicate whether the dataset’s artifacts are present in the Datastore.

Parameters:
ref : DatasetRef

Resolved reference to a dataset.

Returns:
stored : bool

Whether the dataset artifact exists in the datastore and can be retrieved.

stored_many(refs: Iterable[DatasetRef]) → dict[DatasetRef, bool]

Check the datastore for artifact existence of multiple datasets at once.

Parameters:
refs : iterable of DatasetRef

The datasets to be checked.

Returns:
existence : dict of [DatasetRef, bool]

Mapping from given dataset refs to boolean indicating artifact existence.
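A common use of such a mapping is to split the references into those whose artifacts are present and those that are missing. The sketch below operates on a plain dictionary shaped like the documented return value; the string keys are hypothetical stand-ins for DatasetRef objects, and split_by_existence is a helper invented for this example.

```python
def split_by_existence(existence):
    """Partition refs into (present, missing) lists, preserving order."""
    present = [ref for ref, exists in existence.items() if exists]
    missing = [ref for ref, exists in existence.items() if not exists]
    return present, missing


# Hypothetical result of a stored_many call.
existence = {"ref_a": True, "ref_b": False, "ref_c": True}
present, missing = split_by_existence(existence)
```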