Butler

class lsst.daf.butler.Butler(config: Config | str | ParseResult | ResourcePath | Path | None = None, *, butler: Butler | None = None, collections: Any = None, run: str | None = None, searchPaths: Sequence[str | ParseResult | ResourcePath | Path] | None = None, writeable: bool | None = None, inferDefaults: bool = True, without_datastore: bool = False, **kwargs: str)

Bases: LimitedButler

Main entry point for the data access system.

Parameters:
configButlerConfig, Config or str, optional.

Configuration. Anything acceptable to the ButlerConfig constructor. If a directory path is given the configuration will be read from a butler.yaml file in that location. If None is given default values will be used.

butlerButler, optional.

If provided, construct a new Butler that uses the same registry and datastore as the given one, but with the given collections and run. Incompatible with the config, searchPaths, and writeable arguments.

collectionsstr or Iterable [ str ], optional

An expression specifying the collections to be searched (in order) when reading datasets. This may be a str collection name or an iterable thereof. See Collection expressions for more information. These collections are not registered automatically and must be manually registered before they are used by any method, but they may be manually registered after the Butler is initialized.

runstr, optional

Name of the RUN collection new datasets should be inserted into. If collections is None and run is not None, collections will be set to [run]. If not None, this collection will automatically be registered. If this is not set (and writeable is not set either), a read-only butler will be created.

searchPathslist of str, optional

Directory paths to search when calculating the full Butler configuration. Not used if the supplied config is already a ButlerConfig.

writeablebool, optional

Explicitly sets whether the butler supports write operations. If not provided, a read-write butler is created if run is not None.

inferDefaultsbool, optional

If True (default) infer default data ID values from the values present in the datasets in collections: if all collections have the same value (or no value) for a governor dimension, that value will be the default for that dimension. Nonexistent collections are ignored. If a default value is provided explicitly for a governor dimension via **kwargs, no default will be inferred for that dimension.

without_datastorebool, optional

If True do not attach a datastore to this butler. Any attempts to use a datastore will fail.

**kwargsstr

Default data ID key-value pairs. These may only identify “governor” dimensions like instrument and skymap.
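The inferDefaults rule described above can be modelled in a few lines of plain Python. This is only a sketch of the documented behaviour, not the library's implementation: the helper name infer_governor_default and the summaries mapping (collection name to the governor values found in it) are hypothetical, and it assumes that a collection with no value for a dimension simply does not take part in the comparison.

```python
def infer_governor_default(dimension, collections, summaries, explicit=None):
    """Model the documented inference rule for one governor dimension.

    summaries maps collection name -> {dimension: value}; it is a
    hypothetical stand-in for what the registry knows about each collection.
    """
    explicit = explicit or {}
    if dimension in explicit:
        # An explicit **kwargs value always wins; nothing is inferred.
        return explicit[dimension]
    values = set()
    for name in collections:
        summary = summaries.get(name)
        if summary is None:
            continue  # nonexistent collections are ignored
        value = summary.get(dimension)
        if value is not None:
            values.add(value)  # collections with no value do not vote
    # A default exists only when every collection that has a value agrees.
    return next(iter(values)) if len(values) == 1 else None


# Hypothetical collection contents, for illustration only.
summaries = {
    "HSC/defaults": {"instrument": "HSC"},
    "u/bob/DM-49998": {"instrument": "HSC", "skymap": "hsc_rings_v1"},
    "LATISS/defaults": {"instrument": "LATISS"},
}
print(infer_governor_default("instrument", ["HSC/defaults", "u/bob/DM-49998"], summaries))
print(infer_governor_default("instrument", ["HSC/defaults", "LATISS/defaults"], summaries))
```

The first call agrees on a single instrument and yields it as the default; the second sees two different instruments and infers nothing.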

Examples

While there are many ways to control exactly how a Butler interacts with the collections in its Registry, the most common cases are still simple.

For a read-only Butler that searches one collection, do:

butler = Butler("/path/to/repo", collections=["u/alice/DM-50000"])

For a read-write Butler that writes to and reads from a RUN collection:

butler = Butler("/path/to/repo", run="u/alice/DM-50000/a")

The Butler passed to a PipelineTask is often much more complex, because we want to write to one RUN collection but read from several others as well:

butler = Butler("/path/to/repo", run="u/alice/DM-50000/a",
                collections=["u/alice/DM-50000/a",
                             "u/bob/DM-49998",
                             "HSC/defaults"])

This butler will put new datasets to the run u/alice/DM-50000/a. Datasets will be read first from that run (since it appears first in the chain), and then from u/bob/DM-49998 and finally HSC/defaults.
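The ordered search described above can be pictured with a toy model. This is a sketch only: find_first and the contents mapping are hypothetical, and real lookups go through the Registry rather than in-memory sets.

```python
def find_first(collections, contents, dataset_key):
    """Return the first collection, in search order, that holds dataset_key."""
    for name in collections:
        if dataset_key in contents.get(name, ()):
            return name
    return None


search_order = ["u/alice/DM-50000/a", "u/bob/DM-49998", "HSC/defaults"]
# Hypothetical contents of each collection, for illustration only.
contents = {
    "u/alice/DM-50000/a": {"calexp"},
    "u/bob/DM-49998": {"calexp", "src"},
    "HSC/defaults": {"src", "flat"},
}
for key in ("calexp", "src", "bias"):
    print(key, "->", find_first(search_order, contents, key))
```

A dataset present in several collections is taken from the earliest one in the list, which is why the run appears first here.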

Finally, one can always create a Butler with no collections:

butler = Butler("/path/to/repo", writeable=True)

This can be extremely useful when you just want to use butler.registry, e.g. for inserting dimension data or managing collections, or when the collections you want to use with the butler are not consistent. Passing writeable explicitly here is only necessary if you want to be able to make changes to the repo. Usually the value for writeable can be guessed from the collection arguments provided, but it defaults to False when there are no collection arguments.
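The way run, collections, and writeable combine in the examples above can be summarized in a small stand-alone model. It is a sketch of the documented rules only, not the constructor's actual code, and resolve_butler_defaults is a hypothetical name.

```python
def resolve_butler_defaults(collections=None, run=None, writeable=None):
    """Model the documented defaulting rules; not the real constructor."""
    if isinstance(collections, str):
        collections = [collections]
    if collections is None:
        # Documented: collections falls back to [run] when only run is given.
        collections = [run] if run is not None else []
    else:
        collections = list(collections)
    if writeable is None:
        # Documented: without an explicit flag, the butler is read-only
        # unless a run has been provided.
        writeable = run is not None
    return {"collections": collections, "run": run, "writeable": writeable}


print(resolve_butler_defaults(collections=["u/alice/DM-50000"]))
print(resolve_butler_defaults(run="u/alice/DM-50000/a"))
print(resolve_butler_defaults(writeable=True))
```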

Attributes Summary

GENERATION

This is a Generation 3 Butler.

collections

The collections to search by default, in order (Sequence [ str ]).

datastore

The object that manages actual dataset storage (Datastore).

dimensions

Structure managing all dimensions recognized by this data repository (DimensionUniverse).

registry

The object that manages dataset metadata and relationships (Registry).

run

Name of the run this butler writes outputs to by default (str or None).

Methods Summary

datasetExists(datasetRefOrType[, dataId, ...])

Return True if the Dataset is actually present in the Datastore.

datasetExistsDirect(ref)

Return True if a dataset is actually present in the Datastore.

exists(dataset_ref_or_type, /[, data_id, ...])

Indicate whether a dataset is known to Butler registry and datastore.

export(*[, directory, filename, format, ...])

Export datasets from the repository represented by this Butler.

get(datasetRefOrType, /[, dataId, ...])

Retrieve a stored dataset.

getDeferred(datasetRefOrType, /[, dataId, ...])

Create a DeferredDatasetHandle which can later retrieve a dataset, after an immediate registry lookup.

getDirect(ref, *[, parameters, storageClass])

Retrieve a stored dataset.

getDirectDeferred(ref, *[, parameters, ...])

Create a DeferredDatasetHandle which can later retrieve a dataset, from a resolved DatasetRef.

getURI(datasetRefOrType, /[, dataId, ...])

Return the URI to the Dataset.

getURIs(datasetRefOrType, /[, dataId, ...])

Return the URIs associated with the dataset.

get_datastore_names()

Return the names of the datastores associated with this butler.

get_datastore_roots()

Return the defined root URIs for all registered datastores.

get_known_repos()

Retrieve the list of known repository labels.

get_many_uris(refs[, predict, allow_missing])

Return URIs associated with many datasets.

get_repo_uri(label[, return_label])

Look up the label in a butler repository index.

import_(*[, directory, filename, format, ...])

Import datasets into this repository that were exported from a different butler repository via export.

ingest(*datasets[, transfer, run, ...])

Store and register one or more datasets that already exist on disk.

isWriteable()

Return True if this Butler supports write operations.

makeRepo(root[, config, dimensionConfig, ...])

Create an empty data repository by adding a butler.yaml config to a repository root directory.

markInputUnused(ref)

Indicate that a predicted input was not actually used when processing a Quantum.

pruneDatasets(refs, *[, disassociate, ...])

Remove one or more datasets from a collection and/or storage.

put(obj, datasetRefOrType, /[, dataId, run])

Store and register a dataset.

putDirect(obj, ref, /)

Deprecated since version v26.0.

removeRuns(names[, unstore])

Remove one or more RUN collections and the datasets within them.

retrieveArtifacts(refs, destination[, ...])

Retrieve the artifacts associated with the supplied refs.

stored(ref)

Indicate whether the dataset's artifacts are present in the Datastore.

stored_many(refs)

Check the datastore for artifact existence of multiple datasets at once.

transaction()

Context manager supporting Butler transactions.

transfer_from(source_butler, source_refs[, ...])

Transfer datasets to this Butler from a run in another Butler.

validateConfiguration([logFailures, ...])

Validate butler configuration.

Attributes Documentation

GENERATION: ClassVar[int] = 3

This is a Generation 3 Butler.

This attribute may be removed in the future, once the Generation 2 Butler interface has been fully retired; it should only be used in transitional code.

collections

The collections to search by default, in order (Sequence [ str ]).

This is an alias for self.registry.defaults.collections. It cannot be set directly in isolation, but all defaults may be changed together by assigning a new RegistryDefaults instance to self.registry.defaults.

datastore: Datastore

The object that manages actual dataset storage (Datastore).

Direct user access to the datastore should rarely be necessary; the primary exception is the case where a Datastore implementation provides extra functionality beyond what the base class defines.

dimensions

Structure managing all dimensions recognized by this data repository (DimensionUniverse).

registry

The object that manages dataset metadata and relationships (Registry).

Many operations that don’t involve reading or writing butler datasets are accessible only via Registry methods. Eventually these methods will be replaced by equivalent Butler methods.

run

Name of the run this butler writes outputs to by default (str or None).

This is an alias for self.registry.defaults.run. It cannot be set directly in isolation, but all defaults may be changed together by assigning a new RegistryDefaults instance to self.registry.defaults.

Methods Documentation

datasetExists(datasetRefOrType: DatasetRef | DatasetType | str, dataId: DataCoordinate | Mapping[str, Any] | None = None, *, collections: Any = None, **kwargs: Any) bool

Return True if the Dataset is actually present in the Datastore.

Parameters:
datasetRefOrTypeDatasetRef, DatasetType, or str

When DatasetRef the dataId should be None. Otherwise the DatasetType or name thereof.

dataIddict or DataCoordinate

A dict of Dimension link name, value pairs that label the DatasetRef within a Collection. When None, a DatasetRef should be provided as the first argument.

collectionsAny, optional

Collections to be searched, overriding self.collections. Can be any of the types supported by the collections argument to butler construction.

**kwargs

Additional keyword arguments used to augment or construct a DataCoordinate. See DataCoordinate.standardize parameters.

Raises:
LookupError

Raised if the dataset is not even present in the Registry.

ValueError

Raised if a resolved DatasetRef was passed as an input, but it differs from the one found in the registry.

NoDefaultCollectionError

Raised if no collections were provided.

Deprecated since version v26.0: Butler.datasetExists() has been replaced by Butler.exists(). Will be removed after v26.0.

datasetExistsDirect(ref: DatasetRef) bool

Return True if a dataset is actually present in the Datastore.

Parameters:
refDatasetRef

Resolved reference to a dataset.

Returns:
existsbool

Whether the dataset exists in the Datastore.

Deprecated since version v26.0: Butler.datasetExistsDirect() has been replaced by Butler.stored(). Will be removed after v26.0.

exists(dataset_ref_or_type: DatasetRef | DatasetType | str, /, data_id: DataCoordinate | Mapping[str, Any] | None = None, *, full_check: bool = True, collections: Any = None, **kwargs: Any) DatasetExistence

Indicate whether a dataset is known to Butler registry and datastore.

Parameters:
dataset_ref_or_typeDatasetRef, DatasetType, or str

When DatasetRef the data_id should be None. Otherwise the DatasetType or name thereof.

data_iddict or DataCoordinate

A dict of Dimension link name, value pairs that label the DatasetRef within a Collection. When None, a DatasetRef should be provided as the first argument.

full_checkbool, optional

If True, an additional check will be made for dataset artifact existence. This will involve additional overhead due to the need to query an external system. If False, the registry and datastore will only be asked whether they know about the dataset; no check for the artifact itself will be performed.

collectionsAny, optional

Collections to be searched, overriding self.collections. Can be any of the types supported by the collections argument to butler construction.

**kwargs

Additional keyword arguments used to augment or construct a DataCoordinate. See DataCoordinate.standardize parameters.

Returns:
existenceDatasetExistence

Object indicating whether the dataset is known to registry and datastore. Evaluates to True if the dataset is present and known to both.
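Because the returned object is truthy only when the dataset is present and known to both registry and datastore, it can be used directly in a condition. The sketch below wraps that pattern in a hypothetical helper, describe_existence; FakeExistsButler is a stand-in invented here so the example runs without a repository, and it returns a plain bool rather than a DatasetExistence.

```python
def describe_existence(butler, dataset_type, data_id, *, full_check=True):
    """Summarize butler.exists() as a short human-readable string."""
    existence = butler.exists(dataset_type, data_id, full_check=full_check)
    # Truthy only when the dataset is present and known to both
    # registry and datastore.
    return "present" if existence else "missing or incomplete"


class FakeExistsButler:
    """Hypothetical stand-in so the sketch runs without a repository."""

    def __init__(self, known):
        self._known = set(known)

    def exists(self, dataset_ref_or_type, data_id=None, *, full_check=True,
               collections=None, **kwargs):
        key = (dataset_ref_or_type, tuple(sorted((data_id or {}).items())))
        return key in self._known


fake = FakeExistsButler({("flat", (("detector", 10), ("instrument", "HSC")))})
print(describe_existence(fake, "flat", {"instrument": "HSC", "detector": 10}))
print(describe_existence(fake, "flat", {"instrument": "HSC", "detector": 11}))
```

With a real Butler, passing full_check=False would skip the artifact check and avoid the extra overhead described above.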

export(*, directory: str | None = None, filename: str | None = None, format: str | None = None, transfer: str | None = None) Iterator[RepoExportContext]

Export datasets from the repository represented by this Butler.

This method is a context manager that returns a helper object (RepoExportContext) that is used to indicate what information from the repository should be exported.

Parameters:
directorystr, optional

Directory dataset files should be written to if transfer is not None.

filenamestr, optional

Name for the file that will include database information associated with the exported datasets. If this is not an absolute path and directory is not None, it will be written to directory instead of the current working directory. Defaults to “export.{format}”.

formatstr, optional

File format for the database information file. If None, the extension of filename will be used.

transferstr, optional

Transfer mode passed to Datastore.export.

Raises:
TypeError

Raised if the set of arguments passed is inconsistent.

Examples

Typically the Registry.queryDataIds and Registry.queryDatasets methods are used to provide the iterables over data IDs and/or datasets to be exported:

with butler.export(filename="exports.yaml") as export:
    # Export all flats, but none of the dimension element rows
    # (i.e. data ID information) associated with them.
    export.saveDatasets(butler.registry.queryDatasets("flat"),
                        elements=())
    # Export all datasets that start with "deepCoadd_" and all of
    # their associated data ID information.
    export.saveDatasets(butler.registry.queryDatasets("deepCoadd_*"))

get(datasetRefOrType: DatasetRef | DatasetType | str, /, dataId: DataCoordinate | Mapping[str, Any] | None = None, *, parameters: dict[str, Any] | None = None, collections: Any = None, storageClass: StorageClass | str | None = None, **kwargs: Any) Any

Retrieve a stored dataset.

Parameters:
datasetRefOrTypeDatasetRef, DatasetType, or str

When DatasetRef the dataId should be None. Otherwise the DatasetType or name thereof. If a resolved DatasetRef, the associated dataset is returned directly without additional querying.

dataIddict or DataCoordinate

A dict of Dimension link name, value pairs that label the DatasetRef within a Collection. When None, a DatasetRef should be provided as the first argument.

parametersdict

Additional StorageClass-defined options to control reading, typically used to efficiently read only a subset of the dataset.

collectionsAny, optional

Collections to be searched, overriding self.collections. Can be any of the types supported by the collections argument to butler construction.

storageClassStorageClass or str, optional

The storage class to be used to override the Python type returned by this method. By default the returned type matches the dataset type definition for this dataset. Specifying a read StorageClass can force a different type to be returned. This type must be compatible with the original type.

**kwargs

Additional keyword arguments used to augment or construct a DataCoordinate. See DataCoordinate.standardize parameters.

Returns:
objobject

The dataset.

Raises:
LookupError

Raised if no matching dataset exists in the Registry.

TypeError

Raised if no collections were provided.

Notes

When looking up datasets in a CALIBRATION collection, this method requires that the given data ID include temporal dimensions beyond the dimensions of the dataset type itself, in order to find the dataset with the appropriate validity range. For example, a “bias” dataset with native dimensions {instrument, detector} could be fetched with a {instrument, detector, exposure} data ID, because exposure is a temporal dimension.
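The calibration lookup described in this note might be written as below. It is a sketch under assumptions: fetch_bias is a hypothetical helper, the collection name "HSC/calib" and the exposure number are made up, and RecordingButler is a stand-in that records the call instead of reading any data.

```python
def fetch_bias(butler, instrument, detector, exposure, calib_collection):
    """Fetch a bias by adding a temporal dimension (exposure) to the data ID."""
    # The dataset type's own dimensions are {instrument, detector}; the
    # extra exposure key lets the butler pick the right validity range.
    data_id = {"instrument": instrument, "detector": detector, "exposure": exposure}
    return butler.get("bias", data_id, collections=calib_collection)


class RecordingButler:
    """Hypothetical stand-in that records the call instead of reading data."""

    def __init__(self):
        self.calls = []

    def get(self, dataset_ref_or_type, data_id=None, *, collections=None, **kwargs):
        self.calls.append((dataset_ref_or_type, dict(data_id), collections))
        return "<bias image>"


recorder = RecordingButler()
result = fetch_bias(recorder, "HSC", 10, 903334, "HSC/calib")
print(result, recorder.calls[0])
```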

getDeferred(datasetRefOrType: DatasetRef | DatasetType | str, /, dataId: DataCoordinate | Mapping[str, Any] | None = None, *, parameters: dict | None = None, collections: Any = None, storageClass: str | StorageClass | None = None, **kwargs: Any) DeferredDatasetHandle

Create a DeferredDatasetHandle which can later retrieve a dataset, after an immediate registry lookup.

Parameters:
datasetRefOrTypeDatasetRef, DatasetType, or str

When DatasetRef the dataId should be None. Otherwise the DatasetType or name thereof.

dataIddict or DataCoordinate, optional

A dict of Dimension link name, value pairs that label the DatasetRef within a Collection. When None, a DatasetRef should be provided as the first argument.

parametersdict

Additional StorageClass-defined options to control reading, typically used to efficiently read only a subset of the dataset.

collectionsAny, optional

Collections to be searched, overriding self.collections. Can be any of the types supported by the collections argument to butler construction.

storageClassStorageClass or str, optional

The storage class to be used to override the Python type returned by this method. By default the returned type matches the dataset type definition for this dataset. Specifying a read StorageClass can force a different type to be returned. This type must be compatible with the original type.

**kwargs

Additional keyword arguments used to augment or construct a DataId. See DataId parameters.

Returns:
objDeferredDatasetHandle

A handle which can be used to retrieve a dataset at a later time.

Raises:
LookupError

Raised if no matching dataset exists in the Registry or datastore.

ValueError

Raised if a resolved DatasetRef was passed as an input, but it differs from the one found in the registry.

TypeError

Raised if no collections were provided.
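Since the registry lookup happens immediately, missing datasets surface as LookupError when the handle is created rather than when it is read. The sketch below separates the two steps; collect_handles is a hypothetical helper, the Fake* classes are stand-ins so the example runs without a repository, and it assumes the handle exposes a get() method for the deferred read.

```python
def collect_handles(butler, dataset_type, data_ids):
    """Return (handles, missing) after one registry lookup per data ID."""
    handles, missing = [], []
    for data_id in data_ids:
        try:
            handles.append(butler.getDeferred(dataset_type, data_id))
        except LookupError:
            # Documented: raised when no matching dataset exists.
            missing.append(data_id)
    return handles, missing


class FakeHandle:
    """Hypothetical stand-in for a deferred handle."""

    def __init__(self, payload):
        self._payload = payload

    def get(self):
        return self._payload


class FakeDeferredButler:
    """Hypothetical stand-in keyed on the detector number only."""

    def __init__(self, stored):
        self._stored = stored

    def getDeferred(self, dataset_type, data_id=None, **kwargs):
        if data_id["detector"] not in self._stored:
            raise LookupError(f"no {dataset_type} for {data_id}")
        return FakeHandle(self._stored[data_id["detector"]])


deferred_butler = FakeDeferredButler({1: "image-1", 3: "image-3"})
handles, missing = collect_handles(
    deferred_butler, "calexp", [{"detector": d} for d in (1, 2, 3)]
)
# Nothing has been read yet; the payloads are only fetched here.
print([handle.get() for handle in handles], missing)
```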

getDirect(ref: DatasetRef, *, parameters: dict[str, Any] | None = None, storageClass: StorageClass | str | None = None) Any

Retrieve a stored dataset.

Parameters:
refDatasetRef

Resolved reference to an already stored dataset.

parametersdict

Additional StorageClass-defined options to control reading, typically used to efficiently read only a subset of the dataset.

storageClassStorageClass or str, optional

The storage class to be used to override the Python type returned by this method. By default the returned type matches the dataset type definition for this dataset. Specifying a read StorageClass can force a different type to be returned. This type must be compatible with the original type.

Returns:
objobject

The dataset.

Deprecated since version v26.0: Butler.get() now behaves like Butler.getDirect() when given a DatasetRef. Please use Butler.get(). Will be removed after v26.0.

getDirectDeferred(ref: DatasetRef, *, parameters: dict | None = None, storageClass: str | StorageClass | None = None) DeferredDatasetHandle

Create a DeferredDatasetHandle which can later retrieve a dataset, from a resolved DatasetRef.

Parameters:
refDatasetRef

Resolved reference to an already stored dataset.

parametersdict

Additional StorageClass-defined options to control reading, typically used to efficiently read only a subset of the dataset.

storageClassStorageClass or str, optional

The storage class to be used to override the Python type returned by this method. By default the returned type matches the dataset type definition for this dataset. Specifying a read StorageClass can force a different type to be returned. This type must be compatible with the original type.

Returns:
objDeferredDatasetHandle

A handle which can be used to retrieve a dataset at a later time.

Raises:
LookupError

Raised if no matching dataset exists in the Registry.

Deprecated since version v26.0: Butler.getDeferred() now behaves like getDirectDeferred() when given a DatasetRef. Please use Butler.getDeferred(). Will be removed after v26.0.

getURI(datasetRefOrType: DatasetRef | DatasetType | str, /, dataId: DataCoordinate | Mapping[str, Any] | None = None, *, predict: bool = False, collections: Any = None, run: str | None = None, **kwargs: Any) ResourcePath

Return the URI to the Dataset.

Parameters:
datasetRefOrTypeDatasetRef, DatasetType, or str

When DatasetRef the dataId should be None. Otherwise the DatasetType or name thereof.

dataIddict or DataCoordinate

A dict of Dimension link name, value pairs that label the DatasetRef within a Collection. When None, a DatasetRef should be provided as the first argument.

predictbool

If True, allow URIs to be returned of datasets that have not been written.

collectionsAny, optional

Collections to be searched, overriding self.collections. Can be any of the types supported by the collections argument to butler construction.

runstr, optional

Run to use for predictions, overriding self.run.

**kwargs

Additional keyword arguments used to augment or construct a DataCoordinate. See DataCoordinate.standardize parameters.

Returns:
urilsst.resources.ResourcePath

URI pointing to the Dataset within the datastore. If the Dataset does not exist in the datastore, and if predict is True, the URI will be a prediction and will include a URI fragment “#predicted”. If the datastore does not have entities that relate well to the concept of a URI the returned URI string will be descriptive. The returned URI is not guaranteed to be obtainable.

Raises:
LookupError

Raised if a URI is requested for a dataset that does not exist and guessing is not allowed.

ValueError

Raised if a resolved DatasetRef was passed as an input, but it differs from the one found in the registry.

TypeError

Raised if no collections were provided.

RuntimeError

Raised if a URI is requested for a dataset that consists of multiple artifacts.
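A caller that passes predict=True may want to tell predicted URIs apart from those of datasets that already exist. A minimal sketch, using only the standard library and the "#predicted" fragment described above; is_predicted is a hypothetical helper and the example paths are made up.

```python
from urllib.parse import urlsplit


def is_predicted(uri):
    """Return True if the URI carries the "#predicted" fragment."""
    # str() lets this accept either a plain string or a URI-like object.
    return urlsplit(str(uri)).fragment == "predicted"


print(is_predicted("file:///repo/u/alice/DM-50000/a/calexp.fits#predicted"))
print(is_predicted("file:///repo/u/alice/DM-50000/a/calexp.fits"))
```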

getURIs(datasetRefOrType: DatasetRef | DatasetType | str, /, dataId: DataCoordinate | Mapping[str, Any] | None = None, *, predict: bool = False, collections: Any = None, run: str | None = None, **kwargs: Any) DatasetRefURIs

Return the URIs associated with the dataset.

Parameters:
datasetRefOrTypeDatasetRef, DatasetType, or str

When DatasetRef the dataId should be None. Otherwise the DatasetType or name thereof.

dataIddict or DataCoordinate

A dict of Dimension link name, value pairs that label the DatasetRef within a Collection. When None, a DatasetRef should be provided as the first argument.

predictbool

If True, allow URIs to be returned of datasets that have not been written.

collectionsAny, optional

Collections to be searched, overriding self.collections. Can be any of the types supported by the collections argument to butler construction.

runstr, optional

Run to use for predictions, overriding self.run.

**kwargs

Additional keyword arguments used to augment or construct a DataCoordinate. See DataCoordinate.standardize parameters.

Returns:
urisDatasetRefURIs

The URI to the primary artifact associated with this dataset (if the dataset was disassembled within the datastore this may be None), and the URIs to any components associated with the dataset artifact (can be empty if there are no components).
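When only a flat list of artifact locations is needed, the primary and component URIs can be merged. The sketch below assumes the returned object exposes primaryURI and componentURIs attributes, with componentURIs being a mapping from component name to URI; those attribute names are not stated on this page and should be checked against DatasetRefURIs. FakeDatasetRefURIs and all_artifact_uris are hypothetical.

```python
from collections import namedtuple

# Hypothetical stand-in mirroring the two pieces of the return value.
FakeDatasetRefURIs = namedtuple("FakeDatasetRefURIs", ["primaryURI", "componentURIs"])


def all_artifact_uris(uris):
    """Flatten primary and component URIs into a single list."""
    found = []
    if uris.primaryURI is not None:
        # None is expected when the dataset was disassembled.
        found.append(uris.primaryURI)
    found.extend(uris.componentURIs.values())
    return found


whole = FakeDatasetRefURIs("file:///repo/a/calexp.fits", {})
split = FakeDatasetRefURIs(None, {"image": "file:///repo/a/image.fits",
                                  "mask": "file:///repo/a/mask.fits"})
print(all_artifact_uris(whole))
print(all_artifact_uris(split))
```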

get_datastore_names() tuple[str, ...]

Return the names of the datastores associated with this butler.

Returns:
namestuple [str, …]

The names of the datastores.

get_datastore_roots() dict[str, lsst.resources._resourcePath.ResourcePath | None]

Return the defined root URIs for all registered datastores.

Returns:
rootsdict [str, ResourcePath | None]

A mapping from datastore name to datastore root URI. The root can be None if the datastore does not have any concept of a root URI.

classmethod get_known_repos() set[str]

Retrieve the list of known repository labels.

Returns:
reposset of str

All the known labels. Can be empty if no index can be found.

Notes

See ButlerRepoIndex for details on how the information is discovered.

get_many_uris(refs: Iterable[DatasetRef], predict: bool = False, allow_missing: bool = False) dict[lsst.daf.butler.core.datasets.ref.DatasetRef, lsst.daf.butler.core.datastore.DatasetRefURIs]

Return URIs associated with many datasets.

Parameters:
refsiterable of DatasetIdRef

References to the required datasets.

predictbool, optional

If True, allow URIs to be returned of datasets that have not been written.

allow_missingbool

If False, and predict is False, will raise if a DatasetRef does not exist.

Returns:
URIsdict of [DatasetRef, DatasetRefURIs]

A dict of primary and component URIs, indexed by the passed-in refs.

Raises:
FileNotFoundError

Raised if a URI is requested for a dataset that does not exist and guessing is not allowed.

Notes

In file-based datastores, get_many_uris does not check that the file is present. It assumes that if datastore is aware of the file then it actually exists.

classmethod get_repo_uri(label: str, return_label: bool = False) ResourcePath

Look up the label in a butler repository index.

Parameters:
labelstr

Label of the Butler repository to look up.

return_labelbool, optional

If label cannot be found in the repository index (either because index is not defined or label is not in the index) and return_label is True then return ResourcePath(label). If return_label is False (default) then an exception will be raised instead.

Returns:
urilsst.resources.ResourcePath

URI to the Butler repository associated with the given label or default value if it is provided.

Raises:
KeyError

Raised if the label is not found in the index, or if an index is not defined, and return_label is False.

Notes

See ButlerRepoIndex for details on how the information is discovered.
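The two class methods get_known_repos and get_repo_uri can be combined to give a friendlier message than a bare KeyError. This is a sketch only: describe_repo is a hypothetical helper, and FakeIndexButler imitates the documented behaviour with a made-up index so the example runs without a repository index.

```python
def describe_repo(butler_cls, label):
    """Resolve a label, listing the known labels when it is not found."""
    known = butler_cls.get_known_repos()
    if label not in known:
        return f"{label!r} is not a known label (known: {sorted(known)})"
    return f"{label!r} -> {butler_cls.get_repo_uri(label)}"


class FakeIndexButler:
    """Hypothetical stand-in with a made-up repository index."""

    _index = {"main": "file:///repos/main", "dev": "file:///repos/dev"}

    @classmethod
    def get_known_repos(cls):
        return set(cls._index)

    @classmethod
    def get_repo_uri(cls, label, return_label=False):
        if label in cls._index:
            return cls._index[label]
        if return_label:
            return label  # fall back to treating the label as a location
        raise KeyError(label)


print(describe_repo(FakeIndexButler, "main"))
print(describe_repo(FakeIndexButler, "prod"))
```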

import_(*, directory: str | ParseResult | ResourcePath | Path | None = None, filename: str | ParseResult | ResourcePath | Path | TextIO | None = None, format: str | None = None, transfer: str | None = None, skip_dimensions: set | None = None) None

Import datasets into this repository that were exported from a different butler repository via export.

Parameters:
directoryResourcePathExpression, optional

Directory containing dataset files to import from. If None, filename and all dataset file paths specified therein must be absolute.

filenameResourcePathExpression or TextIO

A stream or name of file that contains database information associated with the exported datasets, typically generated by export. If this is a string (name) or ResourcePath and is not an absolute path, it will first be looked for relative to directory and, if not found there, in the current working directory. Defaults to “export.{format}”.

formatstr, optional

File format for filename. If None, the extension of filename will be used.

transferstr, optional

Transfer mode passed to ingest.

skip_dimensionsset, optional

Names of dimensions that should be skipped and not imported.

Raises:
TypeError

Raised if the set of arguments passed is inconsistent, or if the butler is read-only.

ingest(*datasets: FileDataset, transfer: str | None = 'auto', run: str | None = None, idGenerationMode: DatasetIdGenEnum | None = None, record_validation_info: bool = True) None

Store and register one or more datasets that already exist on disk.

Parameters:
datasetsFileDataset

Each positional argument is a struct containing information about a file to be ingested, including its URI (either absolute or relative to the datastore root, if applicable), a resolved DatasetRef, and optionally a formatter class or its fully-qualified string name. If a formatter is not provided, the formatter that would be used for put is assumed. On successful ingest all FileDataset.formatter attributes will be set to the formatter class used. FileDataset.path attributes may be modified to put paths in whatever the datastore considers a standardized form.

transferstr, optional

If not None, must be one of ‘auto’, ‘move’, ‘copy’, ‘direct’, ‘split’, ‘hardlink’, ‘relsymlink’ or ‘symlink’, indicating how to transfer the file.

runstr, optional

The name of the run ingested datasets should be added to, overriding self.run. This parameter is now deprecated since the run is encoded in the FileDataset.

idGenerationModeDatasetIdGenEnum, optional

Specifies option for generating dataset IDs. Parameter is deprecated.

record_validation_infobool, optional

If True, the default, the datastore can record validation information associated with the file. If False the datastore will not attempt to track any information such as checksums or file sizes. This can be useful if such information is tracked in an external system or if the file is to be compressed in place. It is up to the datastore whether this parameter is relevant.

Raises:
TypeError

Raised if the butler is read-only or if no run was provided.

NotImplementedError

Raised if the Datastore does not support the given transfer mode.

DatasetTypeNotSupportedError

Raised if one or more files to be ingested have a dataset type that is not supported by the Datastore.

FileNotFoundError

Raised if one of the given files does not exist.

FileExistsError

Raised if transfer is not None but the (internal) location the file would be moved to is already occupied.

Notes

This operation is not fully exception safe: if a database operation fails, the given FileDataset instances may be only partially updated.

It is atomic in terms of database operations (they will either all succeed or all fail) providing the database engine implements transactions correctly. It will attempt to be atomic in terms of filesystem operations as well, but this cannot be implemented rigorously for most datastores.
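Because a mistyped transfer mode would only be noticed once ingest is under way, it can be worth validating the value up front against the modes listed for the transfer parameter. A minimal sketch; check_transfer_mode and INGEST_TRANSFER_MODES are hypothetical names, and a given Datastore may still reject a listed mode with NotImplementedError.

```python
# Modes listed in the description of the transfer parameter above.
INGEST_TRANSFER_MODES = (
    "auto", "move", "copy", "direct", "split",
    "hardlink", "relsymlink", "symlink",
)


def check_transfer_mode(transfer):
    """Return transfer unchanged if it is None or a listed mode."""
    if transfer is not None and transfer not in INGEST_TRANSFER_MODES:
        raise ValueError(
            f"Unknown transfer mode {transfer!r}; expected None or one of "
            f"{', '.join(INGEST_TRANSFER_MODES)}"
        )
    return transfer


print(check_transfer_mode("copy"))
print(check_transfer_mode(None))
```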

isWriteable() bool

Return True if this Butler supports write operations.

static makeRepo(root: str | ParseResult | ResourcePath | Path, config: Config | str | None = None, dimensionConfig: Config | str | None = None, standalone: bool = False, searchPaths: list[str] | None = None, forceConfigRoot: bool = True, outfile: str | ParseResult | ResourcePath | Path | None = None, overwrite: bool = False) Config

Create an empty data repository by adding a butler.yaml config to a repository root directory.

Parameters:
rootlsst.resources.ResourcePathExpression

Path or URI to the root location of the new repository. Will be created if it does not exist.

configConfig or str, optional

Configuration to write to the repository, after setting any root-dependent Registry or Datastore config options. Cannot be a ButlerConfig or a ConfigSubset. If None, default configuration will be used. Root-dependent config options specified in this config are overwritten if forceConfigRoot is True.

dimensionConfigConfig or str, optional

Configuration for dimensions, will be used to initialize registry database.

standalonebool

If True, write all expanded defaults, not just customized or repository-specific settings. This (mostly) decouples the repository from the default configuration, insulating it from changes to the defaults (which may be good or bad, depending on the nature of the changes). Future additions to the defaults will still be picked up when initializing Butlers to repos created with standalone=True.

searchPathslist of str, optional

Directory paths to search when calculating the full butler configuration.

forceConfigRootbool, optional

If False, any values present in the supplied config that would normally be reset are not overridden and will appear directly in the output config. This allows non-standard overrides of the root directory for a datastore or registry to be given. If this parameter is True the values for root will be forced into the resulting config if appropriate.

outfilelsst.resources.ResourcePathExpression, optional

If not-None, the output configuration will be written to this location rather than into the repository itself. Can be a URI string. Can refer to a directory that will be used to write butler.yaml.

overwritebool, optional

Create a new configuration file even if one already exists in the specified output location. Default is to raise an exception.

Returns:
configConfig

The updated Config instance written to the repo.

Raises:
ValueError

Raised if a ButlerConfig or ConfigSubset is passed instead of a regular Config (as these subclasses would make it impossible to support standalone=False).

FileExistsError

Raised if the output config file already exists.

os.error

Raised if the directory does not exist, exists but is not a directory, or cannot be created.

Notes

Note that when standalone=False (the default), the configuration search path (see ConfigSubset.defaultSearchPaths) that was used to construct the repository should also be used to construct any Butlers to avoid configuration inconsistencies.
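The parameters and exceptions above can be exercised with a small wrapper. This is a minimal sketch, not part of the library: create_repo is a hypothetical helper, butler_cls is expected to be lsst.daf.butler.Butler (passed in so the sketch does not import the LSST stack), and it assumes makeRepo takes the repository root as its first positional argument, which is not shown in this part of the reference.

```python
def create_repo(butler_cls, root, *, standalone=False, overwrite=False):
    """Create a repository at ``root`` and return the written config.

    Returns None if a config file already exists and ``overwrite`` is False.
    ``butler_cls`` is assumed to be ``lsst.daf.butler.Butler``.
    """
    try:
        # Assumption: ``root`` is the first positional argument of makeRepo.
        return butler_cls.makeRepo(root, standalone=standalone, overwrite=overwrite)
    except FileExistsError:
        # Documented behaviour when the output config file already exists.
        return None
```

Returning None instead of propagating FileExistsError is a choice made for this sketch only; callers that need to distinguish the two cases may prefer to let the exception propagate.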

markInputUnused(ref: DatasetRef) None

Indicate that a predicted input was not actually used when processing a Quantum.

Parameters:
refDatasetRef

Reference to the unused dataset.

Notes

By default, a dataset is considered “actually used” if it is accessed via getDirect or a handle to it is obtained via getDirectDeferred (even if the handle is not used). This method must be called after one of those in order to remove the dataset from the actual input list.

This method does nothing for butlers that do not store provenance information (which is the default implementation provided by the base class).

pruneDatasets(refs: Iterable[DatasetRef], *, disassociate: bool = True, unstore: bool = False, tags: Iterable[str] = (), purge: bool = False) None

Remove one or more datasets from a collection and/or storage.

Parameters:
refsIterable of DatasetRef

Datasets to prune. These must be “resolved” references (not just a DatasetType and data ID).

disassociatebool, optional

Disassociate pruned datasets from tags, or from all collections if purge=True.

unstorebool, optional

If True (False is default), remove these datasets from all datastores known to this butler. Note that this will make it impossible to retrieve these datasets even via other collections. Datasets that are already not stored are ignored by this option.

tagsIterable [ str ], optional

TAGGED collections to disassociate the datasets from. Ignored if disassociate is False or purge is True.

purgebool, optional

If True (False is default), completely remove the dataset from the Registry. To prevent accidental deletions, purge may only be True if all of the following conditions are met:

  • disassociate is True;

  • unstore is True.

This mode may remove provenance information from datasets other than those provided, and should be used with extreme care.

Raises:
TypeError

Raised if the butler is read-only, if no collection was provided, or the conditions for purge=True were not met.
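The interaction between disassociate, unstore, tags, and purge can be summarized in a few lines. The function below is an illustrative model of the rules documented above, not the library's implementation; plan_prune and its return value are hypothetical.

```python
def plan_prune(*, disassociate=True, unstore=False, tags=(), purge=False):
    """Model the documented argument rules of pruneDatasets.

    Hypothetical helper: it only validates the flags and reports which
    tags would be honoured; it does not touch a registry or datastore.
    """
    # purge is only allowed when both disassociate and unstore are True.
    if purge and not (disassociate and unstore):
        raise TypeError("purge=True requires disassociate=True and unstore=True")
    # tags are ignored if disassociate is False or purge is True.
    tags_used = tuple(tags) if disassociate and not purge else ()
    return {"purge": purge, "unstore": unstore, "tags": tags_used}
```

For example, plan_prune(purge=True) raises TypeError because unstore defaults to False, while plan_prune(unstore=True, purge=True, tags=["t"]) succeeds and reports an empty tuple of tags.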

put(obj: Any, datasetRefOrType: DatasetRef | DatasetType | str, /, dataId: DataCoordinate | Mapping[str, Any] | None = None, *, run: str | None = None, **kwargs: Any) DatasetRef

Store and register a dataset.

Parameters:
objobject

The dataset.

datasetRefOrTypeDatasetRef, DatasetType, or str

When a DatasetRef is provided, dataId should be None. Otherwise, the DatasetType or the name thereof. If a fully resolved DatasetRef is given, its run and ID are used directly.

dataIddict or DataCoordinate

A dict of Dimension link name, value pairs that label the DatasetRef within a Collection. When None, a DatasetRef should be provided as the second argument.

runstr, optional

The name of the run the dataset should be added to, overriding self.run. Not used if a resolved DatasetRef is provided.

**kwargs

Additional keyword arguments used to augment or construct a DataCoordinate. See DataCoordinate.standardize parameters. Not used if a resolved DatasetRef is provided.

Returns:
refDatasetRef

A reference to the stored dataset, updated with the correct id if given.

Raises:
TypeError

Raised if the butler is read-only or if no run has been provided.
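The two calling conventions described above can be sketched as follows. Both helpers are hypothetical wrappers around put(); butler is assumed to be a writeable Butler, and any dataset type names or data ID keys a caller passes in are the caller's own.

```python
def put_by_type(butler, obj, dataset_type, data_id, run=None):
    """Store ``obj`` under a dataset type (or its name) plus a data ID.

    ``run`` overrides the butler's default run when it is not None.
    """
    # datasetRefOrType is positional-only; dataId may be passed positionally.
    return butler.put(obj, dataset_type, data_id, run=run)


def put_by_ref(butler, obj, ref):
    """Store ``obj`` using a resolved DatasetRef.

    No dataId or run is passed: the run and ID come from the ref itself.
    """
    return butler.put(obj, ref)
```

Both helpers return the DatasetRef produced by put(), and both raise TypeError under the conditions listed above (read-only butler, or no run available).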

putDirect(obj: Any, ref: DatasetRef, /) DatasetRef

Deprecated since version v26.0: Butler.put() now behaves like Butler.putDirect() when given a DatasetRef. Please use Butler.put(). Be aware that you may need to adjust your usage if you were relying on the run parameter to determine the run. Will be removed after v26.0.

removeRuns(names: Iterable[str], unstore: bool = True) None

Remove one or more RUN collections and the datasets within them.

Parameters:
namesIterable [ str ]

The names of the collections to remove.

unstorebool, optional

If True (default), delete datasets from all datastores in which they are present, and attempt to roll back the registry deletions if datastore deletions fail (which may not always be possible). If False, datastore records for these datasets are still removed, but any artifacts (e.g. files) will not be.

Raises:
TypeError

Raised if one or more collections are not of type RUN.

retrieveArtifacts(refs: Iterable[DatasetRef], destination: str | ParseResult | ResourcePath | Path, transfer: str = 'auto', preserve_path: bool = True, overwrite: bool = False) list[lsst.resources._resourcePath.ResourcePath]

Retrieve the artifacts associated with the supplied refs.

Parameters:
refsiterable of DatasetRef

The datasets for which artifacts are to be retrieved. A single ref can result in multiple artifacts. The refs must be resolved.

destinationlsst.resources.ResourcePath or str

Location to write the artifacts.

transferstr, optional

Method to use to transfer the artifacts. Must be one of the options supported by ResourcePath.transfer_from(). “move” is not allowed.

preserve_pathbool, optional

If True the full path of the artifact within the datastore is preserved. If False the final file component of the path is used.

overwritebool, optional

If True allow transfers to overwrite existing files at the destination.

Returns:
targetslist of lsst.resources.ResourcePath

URIs of file artifacts in destination location. Order is not preserved.

Notes

For non-file datastores the artifacts written to the destination may not match the representation inside the datastore. For example a hierarchical data structure in a NoSQL database may well be stored as a JSON file.
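The effect of preserve_path on the destination layout can be illustrated with plain path arithmetic. This is a sketch of the documented rule only, not the datastore's implementation; destination_for is a hypothetical helper, and it assumes the artifact path is relative to the datastore root.

```python
from pathlib import PurePosixPath


def destination_for(artifact_path, destination, preserve_path=True):
    """Return where an artifact would land under ``destination``.

    ``artifact_path`` is assumed to be the artifact's path relative to
    the datastore root.
    """
    src = PurePosixPath(artifact_path)
    dest = PurePosixPath(destination)
    if preserve_path:
        # Keep the full in-datastore path below the destination.
        return dest / src
    # Keep only the final file component.
    return dest / src.name
```

With preserve_path=False, artifacts from different directories that share a file name map to the same destination, which is where the overwrite parameter becomes relevant.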

stored(ref: DatasetRef) bool

Indicate whether the dataset’s artifacts are present in the Datastore.

Parameters:
refDatasetRef

Resolved reference to a dataset.

Returns:
storedbool

Whether the dataset artifact exists in the datastore and can be retrieved.

stored_many(refs: Iterable[DatasetRef]) dict[lsst.daf.butler.core.datasets.ref.DatasetRef, bool]

Check the datastore for artifact existence of multiple datasets at once.

Parameters:
refsiterable of DatasetRef

The datasets to be checked.

Returns:
existencedict of [DatasetRef, bool]

Mapping from given dataset refs to boolean indicating artifact existence.
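A common use of the returned mapping is to find the datasets whose artifacts are absent. The helper below is a hypothetical convenience function built only on stored_many() as documented above.

```python
def missing_artifacts(butler, refs):
    """Return the refs whose artifacts are not present in the datastore."""
    # stored_many returns a dict mapping each DatasetRef to a bool.
    existence = butler.stored_many(refs)
    return [ref for ref, present in existence.items() if not present]
```

Checking many refs in one stored_many() call avoids issuing a separate stored() call per dataset.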

transaction() Iterator[None]

Context manager supporting Butler transactions.

Transactions can be nested.
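A typical use is to group several put() calls so that they succeed or fail together. The sketch below uses a hypothetical helper, put_all, and assumes that an exception raised inside the with block causes the operations performed in it to be rolled back; this section does not spell out the rollback semantics.

```python
def put_all(butler, items, run):
    """Store several datasets inside a single transaction.

    ``items`` is an iterable of (obj, dataset_type, data_id) tuples.
    Assumption: an exception raised inside the ``with`` block rolls
    back the puts already made in it.
    """
    refs = []
    with butler.transaction():
        for obj, dataset_type, data_id in items:
            refs.append(butler.put(obj, dataset_type, data_id, run=run))
    return refs
```

Because transactions can be nested, put_all may itself be called from inside another with butler.transaction() block.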

transfer_from(source_butler: LimitedButler, source_refs: Iterable[DatasetRef], transfer: str = 'auto', skip_missing: bool = True, register_dataset_types: bool = False, transfer_dimensions: bool = False) Collection[DatasetRef]

Transfer datasets to this Butler from a run in another Butler.

Parameters:
source_butlerLimitedButler

Butler from which the datasets are to be transferred. If data IDs in source_refs are not expanded then this has to be a full Butler whose registry will be used to expand data IDs.

source_refsiterable of DatasetRef

Datasets defined in the source butler that should be transferred to this butler.

transferstr, optional

Transfer mode passed to Datastore.transfer_from.

skip_missingbool

If True, datasets with no datastore artifact associated with them are not transferred. If False, a registry entry will be created even if no datastore record is created (so the dataset will look the same as one that has been unstored).

register_dataset_typesbool

If True any missing dataset types are registered. Otherwise an exception is raised.

transfer_dimensionsbool, optional

If True, dimension record data associated with the new datasets will be transferred.

Returns:
refslist of DatasetRef

The refs added to this Butler.

Notes

The datastore artifact has to exist for a transfer to be made but non-existence is not an error.

Datasets that already exist in this run will be skipped.

The datasets are imported as part of a transaction, although dataset types are registered before the transaction is started. This means that it is possible for a dataset type to be registered even though transfer has failed.
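A call that copies datasets and also registers any missing dataset types and dimension records might look like the sketch below. copy_datasets is a hypothetical helper; "copy" is assumed to be an accepted transfer mode, and both butlers are assumed to be already constructed.

```python
def copy_datasets(dest_butler, source_butler, refs):
    """Copy ``refs`` from ``source_butler`` into ``dest_butler``.

    Returns the refs added to the destination as a list.
    """
    transferred = dest_butler.transfer_from(
        source_butler,
        refs,
        transfer="copy",              # assumed to be a supported mode
        skip_missing=True,            # skip refs with no datastore artifact
        register_dataset_types=True,  # register unknown dataset types
        transfer_dimensions=True,     # also transfer dimension records
    )
    return list(transferred)
```

As noted above, dataset types are registered before the transaction starts, so a failed call made with register_dataset_types=True can still leave new dataset types registered in the destination.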

validateConfiguration(logFailures: bool = False, datasetTypeNames: Iterable[str] | None = None, ignore: Iterable[str] | None = None) None

Validate butler configuration.

Checks that each DatasetType can be stored in the Datastore.

Parameters:
logFailuresbool, optional

If True, output a log message for every validation error detected.

datasetTypeNamesiterable of str, optional

The DatasetType names that should be checked. This allows only a subset to be selected.

ignoreiterable of str, optional

Names of DatasetTypes to skip over. This can be used to skip known problems. If a named DatasetType corresponds to a composite, all components of that DatasetType will also be ignored.

Raises:
ButlerValidationError

Raised if there is some inconsistency with how this Butler is configured.
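Since validateConfiguration() returns None and reports problems by raising, a boolean check needs a small wrapper. In this sketch config_is_valid is a hypothetical helper, and the exception class is passed in as validation_error (expected to be ButlerValidationError) so that the example does not import the LSST stack.

```python
def config_is_valid(butler, validation_error, ignore=None):
    """Return True if the butler configuration validates cleanly.

    ``validation_error`` is the exception class to treat as a failed
    validation, expected to be ButlerValidationError.
    """
    try:
        # logFailures=True emits a log message per validation error.
        butler.validateConfiguration(logFailures=True, ignore=ignore)
    except validation_error:
        return False
    return True
```

Any exception other than validation_error still propagates, so unrelated failures are not mistaken for an invalid configuration.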