Butler

Bases: LimitedButler

Interface for data butler and factory for Butler instances.

Parameters:

config (ButlerConfig, Config or str, optional): Configuration. Anything acceptable to the ButlerConfig constructor. If a directory path is given, the configuration will be read from a butler.yaml file in that location. If None is given, default values will be used. If config contains a “cls” key, its value is used as the name of the butler class, which must be a subclass of this class; otherwise DirectButler is instantiated.
collections (str or Iterable[str], optional): An expression specifying the collections to be searched (in order) when reading datasets. This may be a str collection name or an iterable thereof. See Collection expressions for more information. These collections are not registered automatically and must be manually registered before they are used by any method, but they may be manually registered after the Butler is initialized.
run (str, optional): Name of the RUN collection new datasets should be inserted into. If collections is None and run is not None, collections will be set to [run]. If not None, this collection will automatically be registered. If this is not set (and writeable is not set either), a read-only butler will be created.
searchPaths (list of str, optional): Directory paths to search when calculating the full Butler configuration. Not used if the supplied config is already a ButlerConfig.
writeable (bool, optional): Explicitly sets whether the butler supports write operations. If not provided, a read-write butler is created if run is set.
inferDefaults (bool, optional): If True (default), infer default data ID values from the values present in the datasets in collections: if all collections have the same value (or no value) for a governor dimension, that value will be the default for that dimension. Nonexistent collections are ignored. If a default value is provided explicitly for a governor dimension via **kwargs, no default will be inferred for that dimension.
without_datastore (bool, optional): If True, do not attach a datastore to this butler. Any attempts to use a datastore will fail.
metrics (ButlerMetrics or None): External metrics object to be used for tracking butler usage. If None, a new metrics object is created.
**kwargs (typing.Any): Additional keyword arguments passed to the constructor of the actual butler class.
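
The inferDefaults rule described above can be pictured with a small pure-Python model. This is only a sketch of the stated rule, not the library's implementation; the function name and the input shape (one mapping of governor dimension to observed values per existing collection) are assumptions made for illustration.

```python
from collections.abc import Iterable, Mapping
from typing import Optional


def infer_governor_defaults(
    governor_values: Iterable[Mapping[str, set]],
    explicit: Optional[Mapping[str, str]] = None,
) -> dict:
    """Toy model of the inferDefaults rule (hypothetical helper).

    Each mapping describes one existing collection: governor dimension
    name -> set of values seen in that collection's datasets.
    """
    explicit = dict(explicit or {})
    seen: dict = {}
    for per_collection in governor_values:
        for dimension, values in per_collection.items():
            seen.setdefault(dimension, set()).update(values)

    defaults = dict(explicit)
    for dimension, values in seen.items():
        # Explicit defaults win; a dimension with several distinct
        # values across collections is ambiguous and gets no default.
        if dimension not in explicit and len(values) == 1:
            defaults[dimension] = next(iter(values))
    return defaults
```

A collection that has no value for a dimension simply contributes nothing, which matches the "(or no value)" wording.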

Notes

The preferred way to instantiate Butler is via the from_config method. The call to Butler(...) is equivalent to Butler.from_config(...), but mypy will complain about the former.

Attributes Summary

`collection_chains`	Object with methods for modifying collection chains (`ButlerCollections`).
`collections`	Object with methods for modifying and querying collections (`ButlerCollections`).
`registry`	The object that manages dataset metadata and relationships (`Registry`).
`run`	Name of the run this butler writes outputs to by default (`str` or `None`).

Methods Summary

`clone`(*[, collections, run, inferDefaults, ...])	Return a new Butler instance connected to the same repository as this one, optionally overriding `collections`, `run`, `inferDefaults`, and default data ID.
`close`()	Release all resources associated with this Butler instance.
`exists`(dataset_ref_or_type, /[, data_id, ...])	Indicate whether a dataset is known to Butler registry and datastore.
`export`(*[, directory, filename, format, ...])	Export datasets from the repository represented by this `Butler`.
`find_dataset`(dataset_type[, data_id, ...])	Find a dataset given its `DatasetType` and data ID.
`from_config`([config, collections, run, ...])	Create butler instance from configuration.
`get`(datasetRefOrType, /[, dataId, ...])	Retrieve a stored dataset.
`getDeferred`(datasetRefOrType, /[, dataId, ...])	Create a `DeferredDatasetHandle` which can later retrieve a dataset, after an immediate registry lookup.
`getURI`(datasetRefOrType, /[, dataId, ...])	Return the URI to the Dataset.
`getURIs`(datasetRefOrType, /[, dataId, ...])	Return the URIs associated with the dataset.
`get_dataset`(id, *[, storage_class, ...])	Retrieve a Dataset entry.
`get_dataset_from_uri`(uri[, factory])	Get the dataset associated with the given dataset URI.
`get_dataset_type`(name)	Get the `DatasetType`.
`get_known_repos`()	Retrieve the list of known repository labels.
`get_many_datasets`(ids)	Retrieve a list of dataset entries.
`get_repo_uri`(label[, return_label])	Look up the label in a butler repository index.
`import_`(*[, directory, filename, format, ...])	Import datasets into this repository that were exported from a different butler repository via `export`.
`ingest`(*datasets[, transfer, ...])	Store and register one or more datasets that already exist on disk.
`ingest_zip`(zip_file[, transfer, ...])	Ingest a Zip file into this butler.
`makeRepo`(root[, config, dimensionConfig, ...])	Create an empty data repository by adding a butler.yaml config to a repository root directory.
`parse_dataset_uri`(uri)	Extract the butler label and dataset ID from a dataset URI.
`put`(obj, datasetRefOrType, /[, dataId, run, ...])	Store and register a dataset.
`query`()	Context manager returning a `queries.Query` object used for construction and execution of complex queries.
`query_all_datasets`([collections, name, ...])	Query for datasets of potentially multiple types.
`query_data_ids`(dimensions, *[, data_id, ...])	Query for data IDs matching user-provided criteria.
`query_datasets`(dataset_type[, collections, ...])	Query for dataset references matching user-provided criteria.
`query_dimension_records`(element, *[, ...])	Query for dimension information matching user-provided criteria.
`removeRuns`(names[, unstore, unlink_from_chains])	Remove one or more `RUN` collections and the datasets within them.
`retrieveArtifacts`(refs, destination[, ...])	Retrieve the artifacts associated with the supplied refs.
`retrieve_artifacts_zip`(refs, destination[, ...])	Retrieve artifacts from a Butler and place in ZIP file.
`transaction`()	Context manager supporting `Butler` transactions.
`transfer_dimension_records_from`(...)	Transfer dimension records to this Butler from another Butler.
`transfer_from`(source_butler, source_refs[, ...])	Transfer datasets to this Butler from a run in another Butler.
`validateConfiguration`([logFailures, ...])	Validate butler configuration.

Attributes Documentation

collection_chains

Object with methods for modifying collection chains (ButlerCollections).

Deprecated. Replaced with collections property.

collections

Object with methods for modifying and querying collections (ButlerCollections).

Use of this object is preferred over registry wherever possible.

registry

The object that manages dataset metadata and relationships (Registry).

Many operations that don’t involve reading or writing butler datasets are accessible only via Registry methods. Eventually these methods will be replaced by equivalent Butler methods.

run

Name of the run this butler writes outputs to by default (str or None).

Methods Documentation

clone(*[, collections, run, inferDefaults, ...])

Return a new Butler instance connected to the same repository as this one, optionally overriding collections, run, inferDefaults, and default data ID.

Parameters:

collections (CollectionArgType or None, optional): Same as constructor. If omitted, uses value from original object.
run (str or None, optional): Same as constructor. If None, no default run is used. If omitted, copies value from original object.
inferDefaults (bool, optional): Same as constructor. If omitted, copies value from original object.
dataId (str): Same as kwargs passed to the constructor. If omitted, copies values from original object.
metrics (ButlerMetrics or None, optional): Metrics object to record butler statistics.

abstract close() → None

Release all resources associated with this Butler instance. The instance may no longer be used after this is called.

Notes

Instead of calling close() directly, you can use the Butler object as a context manager. For example:

with Butler(...) as butler:
    butler.get(...)
# butler is closed after exiting the block.

abstract exists(dataset_ref_or_type: DatasetRef | DatasetType | str, /, data_id: DataId | None = None, *, full_check: bool = True, collections: Any = None, **kwargs: Any) → DatasetExistence

Indicate whether a dataset is known to Butler registry and datastore.

Parameters:

dataset_ref_or_type (DatasetRef, DatasetType, or str): When a DatasetRef is given, data_id should be None. Otherwise, the DatasetType or its name.
data_id (dict or DataCoordinate): A dict of Dimension link name, value pairs that label the DatasetRef within a Collection. When None, a DatasetRef should be provided as the first argument.
full_check (bool, optional): If True, a check will be made for the actual existence of a dataset artifact. This involves additional overhead because an external system must be queried. If False, this check is omitted: the registry and datastore are only asked whether they know about the dataset, and no direct check for the artifact is performed.
collections (Any, optional): Collections to be searched, overriding self.collections. Can be any of the types supported by the collections argument to butler construction.
**kwargs: Additional keyword arguments used to augment or construct a DataCoordinate. See DataCoordinate.standardize parameters.

Returns:

existence (DatasetExistence): Object indicating whether the dataset is known to registry and datastore. Evaluates to True if the dataset is present and known to both.
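
As a rough illustration of the full_check trade-off, the helper below relies only on the truthiness of the returned DatasetExistence, as described above. The helper name and the example dataset type and data ID are hypothetical, and `butler` is assumed to be an already-constructed Butler.

```python
from typing import Any


def dataset_is_usable(
    butler: Any,
    dataset_type: str,
    data_id: dict,
    *,
    verify_artifact: bool = True,
) -> bool:
    """Return True if the dataset is present and known to both registry and datastore.

    With verify_artifact=False the datastore is only asked whether it
    knows about the dataset; the artifact itself is not checked.
    """
    existence = butler.exists(dataset_type, data_id, full_check=verify_artifact)
    return bool(existence)
```

A call might look like `dataset_is_usable(butler, "flat", {"instrument": "HSC", "detector": 10}, verify_artifact=False)` when a cheaper check is acceptable.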

abstract export(*, directory: str | None = None, filename: str | None = None, format: str | None = None, transfer: str | None = None) → AbstractContextManager[RepoExportContext]

Export datasets from the repository represented by this Butler.

This method is a context manager that returns a helper object (RepoExportContext) that is used to indicate what information from the repository should be exported.

Parameters:

directory (str, optional): Directory dataset files should be written to if transfer is not None.
filename (str, optional): Name for the file that will include database information associated with the exported datasets. If this is not an absolute path and directory is not None, it will be written to directory instead of the current working directory. Defaults to “export.{format}”.
format (str, optional): File format for the database information file. If None, the extension of filename will be used.
transfer (str, optional): Transfer mode passed to Datastore.export.

Raises:

TypeError: Raised if the set of arguments passed is inconsistent.

Examples

Typically the Registry.queryDataIds and Registry.queryDatasets methods are used to provide the iterables over data IDs and/or datasets to be exported:

with butler.export(filename="exports.yaml") as export:
    # Export all flats, but none of the dimension element rows
    # (i.e. data ID information) associated with them.
    export.saveDatasets(
        butler.registry.queryDatasets("flat"), elements=()
    )
    # Export all datasets that start with "deepCoadd_" and all of
    # their associated data ID information.
    export.saveDatasets(butler.registry.queryDatasets("deepCoadd_*"))

find_dataset(dataset_type[, data_id, ...])

Find a dataset given its DatasetType and data ID.

This can be used to obtain a DatasetRef that permits the dataset to be read from a Datastore. If the dataset is a component and cannot be found using the provided dataset type, a dataset ref for the parent will be returned instead but with the correct dataset type.

Parameters:

dataset_type (DatasetType or str): A DatasetType or the name of one. If this is a DatasetType instance, its storage class will be respected and propagated to the output, even if it differs from the dataset type definition in the registry, as long as the storage classes are convertible.
data_id (dict or DataCoordinate, optional): A dict-like object containing the Dimension links that identify the dataset within a collection. If it is a dict, the data ID can include dimension record values such as day_obs and seq_num or full_name that can be used to derive the primary dimension.
collections (str or list[str], optional): An ordered list of collections to search for the dataset. Defaults to self.defaults.collections.
timespan (Timespan, optional): A timespan that the validity range of the dataset must overlap. If not provided, any CALIBRATION collections matched by the collections argument will not be searched.
storage_class (str or StorageClass or None): A storage class to use when creating the returned entry. If given, it must be compatible with the default storage class.
dimension_records (bool, optional): If True, the ref will be expanded and contain dimension records.
datastore_records (bool, optional): If True, the ref will contain associated datastore records.
**kwargs: Additional keyword arguments passed to DataCoordinate.standardize to convert dataId to a true DataCoordinate or augment an existing one. This can also include dimension record metadata that can be used to derive a primary dimension value.

Returns:

ref (DatasetRef or None): A reference to the dataset, or None if no matching Dataset was found.

Raises:

lsst.daf.butler.NoDefaultCollectionError: Raised if collections is None and self.collections is None.
LookupError: Raised if one or more data ID keys are missing.
lsst.daf.butler.MissingDatasetTypeError: Raised if the dataset type does not exist.
lsst.daf.butler.MissingCollectionError: Raised if any of collections does not exist in the registry.

Notes

This method simply returns None and does not raise an exception even when the set of collections searched is intrinsically incompatible with the dataset type, e.g. if datasetType.isCalibration() is False, but only CALIBRATION collections are being searched. This may make it harder to debug some lookup failures, but the behavior is intentional; we consider it more important that failed searches are reported consistently, regardless of the reason, and that adding additional collections that do not contain a match to the search path never changes the behavior.

This method handles component dataset types automatically, though most other query operations do not.
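
Because a failed search returns None instead of raising, callers typically branch on the result. The following is a minimal sketch of that pattern; the helper name is hypothetical, and `butler` is assumed to be an existing Butler whose get accepts the resolved DatasetRef returned by find_dataset.

```python
from typing import Any


def get_or_default(
    butler: Any,
    dataset_type: str,
    data_id: dict,
    collections: Any = None,
    default: Any = None,
) -> Any:
    """Look up a dataset and read it, or return `default` if it is not found."""
    ref = butler.find_dataset(dataset_type, data_id, collections=collections)
    if ref is None:
        # No match in the searched collections; by design this is
        # reported the same way regardless of the underlying reason.
        return default
    return butler.get(ref)
```

Exceptions such as MissingDatasetTypeError or MissingCollectionError are deliberately left to propagate, since they indicate a misconfigured search, not a missing dataset.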

from_config([config, collections, run, ...])

Create butler instance from configuration.

Parameters:

config (ButlerConfig, Config or str, optional): Configuration. Anything acceptable to the ButlerConfig constructor. If a directory path is given, the configuration will be read from a butler.yaml file in that location. If None is given, default values will be used. If config contains a “cls” key, its value is used as the name of the butler class, which must be a subclass of this class; otherwise DirectButler is instantiated.
collections (str or Iterable[str], optional): An expression specifying the collections to be searched (in order) when reading datasets. This may be a str collection name or an iterable thereof. See Collection expressions for more information. These collections are not registered automatically and must be manually registered before they are used by any method, but they may be manually registered after the Butler is initialized.
run (str, optional): Name of the RUN collection new datasets should be inserted into. If collections is None and run is not None, collections will be set to [run]. If not None, this collection will automatically be registered. If this is not set (and writeable is not set either), a read-only butler will be created.
searchPaths (list of str, optional): Directory paths to search when calculating the full Butler configuration. Not used if the supplied config is already a ButlerConfig.
writeable (bool, optional): Explicitly sets whether the butler supports write operations. If not provided, a read-write butler is created if run is set.
inferDefaults (bool, optional): If True (default), infer default data ID values from the values present in the datasets in collections: if all collections have the same value (or no value) for a governor dimension, that value will be the default for that dimension. Nonexistent collections are ignored. If a default value is provided explicitly for a governor dimension via **kwargs, no default will be inferred for that dimension.
without_datastore (bool, optional): If True, do not attach a datastore to this butler. Any attempts to use a datastore will fail.
metrics (ButlerMetrics or None, optional): Metrics object to record butler usage statistics.
**kwargs (typing.Any): Default data ID key-value pairs. These may only identify “governor” dimensions like instrument and skymap.

Returns:

butler (Butler): A Butler constructed from the given configuration.

Notes

Calling this factory method is identical to calling Butler(config, ...). It exists only because mypy complains about direct Butler() calls.

Examples

While there are many ways to control exactly how a Butler interacts with the collections in its Registry, the most common cases are still simple.

For a read-only Butler that searches one collection, do:

butler = Butler.from_config(
    "/path/to/repo", collections=["u/alice/DM-50000"]
)

For a read-write Butler that writes to and reads from a RUN collection:

butler = Butler.from_config(
    "/path/to/repo", run="u/alice/DM-50000/a"
)

The Butler passed to a PipelineTask is often much more complex, because we want to write to one RUN collection but also read from several others:

butler = Butler.from_config(
    "/path/to/repo",
    run="u/alice/DM-50000/a",
    collections=[
        "u/alice/DM-50000/a",
        "u/bob/DM-49998",
        "HSC/defaults",
    ],
)

This butler will put new datasets into the run u/alice/DM-50000/a. Datasets will be read first from that run (since it appears first in the chain), then from u/bob/DM-49998, and finally from HSC/defaults.

Finally, one can always create a Butler with no collections:

butler = Butler.from_config("/path/to/repo", writeable=True)

This can be extremely useful when you just want to use butler.registry, e.g. for inserting dimension data or managing collections, or when the collections you want to use with the butler are not consistent. Passing writeable explicitly here is only necessary if you want to be able to make changes to the repo. Usually the value for writeable can be guessed from the collection arguments provided, but it defaults to False when there are no collection arguments.

get(datasetRefOrType, /[, dataId, ...])

Retrieve a stored dataset.

Parameters:

datasetRefOrType (DatasetRef, DatasetType, or str): When a DatasetRef is given, dataId should be None. Otherwise, the DatasetType or its name. If a resolved DatasetRef is given, the associated dataset is returned directly without additional querying.
dataId (dict or DataCoordinate): A dict of Dimension link name, value pairs that label the DatasetRef within a Collection. When None, a DatasetRef should be provided as the first argument.
parameters (dict): Additional StorageClass-defined options to control reading, typically used to efficiently read only a subset of the dataset.
collections (Any, optional): Collections to be searched, overriding self.collections. Can be any of the types supported by the collections argument to butler construction.
storageClass (StorageClass or str, optional): The storage class to be used to override the Python type returned by this method. By default the returned type matches the dataset type definition for this dataset. Specifying a read StorageClass can force a different type to be returned. This type must be compatible with the original type.
timespan (Timespan or None, optional): A timespan that the validity range of the dataset must overlap. If not provided and this is a calibration dataset type, an attempt will be made to find the timespan from any temporal coordinate in the data ID.
**kwargs: Additional keyword arguments used to augment or construct a DataCoordinate. See DataCoordinate.standardize parameters.

Returns:

obj (object): The dataset.

Raises:

LookupError: Raised if no matching dataset exists in the Registry.
TypeError: Raised if no collections were provided.

Notes

When looking up datasets in a CALIBRATION collection, this method requires that the given data ID include temporal dimensions beyond the dimensions of the dataset type itself, in order to find the dataset with the appropriate validity range. For example, a “bias” dataset with native dimensions {instrument, detector} could be fetched with a {instrument, detector, exposure} data ID, because exposure is a temporal dimension.
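
The “bias” case from the note above could look roughly like this. The helper name is hypothetical, and the instrument, detector, exposure and collection values are placeholders to be supplied by the caller; `butler` is assumed to be an existing Butler.

```python
from typing import Any


def get_bias_for_exposure(
    butler: Any,
    instrument: str,
    detector: int,
    exposure: int,
    collections: Any,
) -> Any:
    """Fetch the bias whose validity range matches the given exposure.

    "bias" has native dimensions {instrument, detector}; the extra
    temporal dimension `exposure` lets the lookup pick the right
    validity range in a CALIBRATION collection.
    """
    return butler.get(
        "bias",
        instrument=instrument,
        detector=detector,
        exposure=exposure,
        collections=collections,
    )
```

Here the data ID is passed entirely through keyword arguments, which the method uses to construct the DataCoordinate.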

getDeferred(datasetRefOrType, /[, dataId, ...])

Create a DeferredDatasetHandle which can later retrieve a dataset, after an immediate registry lookup.

Parameters:

datasetRefOrType (DatasetRef, DatasetType, or str): When a DatasetRef is given, dataId should be None. Otherwise, the DatasetType or its name.
dataId (dict or DataCoordinate, optional): A dict of Dimension link name, value pairs that label the DatasetRef within a Collection. When None, a DatasetRef should be provided as the first argument.
parameters (dict): Additional StorageClass-defined options to control reading, typically used to efficiently read only a subset of the dataset.
collections (Any, optional): Collections to be searched, overriding self.collections. Can be any of the types supported by the collections argument to butler construction.
storageClass (StorageClass or str, optional): The storage class to be used to override the Python type returned by this method. By default the returned type matches the dataset type definition for this dataset. Specifying a read StorageClass can force a different type to be returned. This type must be compatible with the original type.
timespan (Timespan or None, optional): A timespan that the validity range of the dataset must overlap. If not provided and this is a calibration dataset type, an attempt will be made to find the timespan from any temporal coordinate in the data ID.
**kwargs: Additional keyword arguments used to augment or construct a DataId. See DataId parameters.

Returns:

obj (DeferredDatasetHandle): A handle which can be used to retrieve a dataset at a later time.

Raises:

LookupError: Raised if no matching dataset exists in the Registry or datastore.
ValueError: Raised if a resolved DatasetRef was passed as an input, but it differs from the one found in the registry.
TypeError: Raised if no collections were provided.
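
One way to use deferred handles is to resolve all lookups up front and read the data lazily. This sketch assumes that each DeferredDatasetHandle exposes a get() method that performs the actual read; that method is not documented in this section, and the helper names are hypothetical.

```python
from typing import Any, Iterable, Iterator


def defer_all(
    butler: Any,
    dataset_type: str,
    data_ids: Iterable[dict],
    collections: Any = None,
) -> list:
    """Do the registry lookups now; a missing dataset raises LookupError here."""
    return [
        butler.getDeferred(dataset_type, data_id, collections=collections)
        for data_id in data_ids
    ]


def load_lazily(handles: Iterable[Any]) -> Iterator[Any]:
    """Read one dataset at a time, only when the caller asks for it."""
    for handle in handles:
        # Assumed API: DeferredDatasetHandle.get() reads the dataset.
        yield handle.get()
```

Splitting the work this way surfaces lookup failures early while keeping only one dataset in memory at a time.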

getURI(datasetRefOrType: DatasetRef | DatasetType | str, /, dataId: DataId | None = None, *, predict: bool = False, collections: Any = None, run: str | None = None, **kwargs: Any) → ResourcePath

Return the URI to the Dataset.

Parameters:

datasetRefOrType (DatasetRef, DatasetType, or str): When a DatasetRef is given, dataId should be None. Otherwise, the DatasetType or its name.
dataId (dict or DataCoordinate): A dict of Dimension link name, value pairs that label the DatasetRef within a Collection. When None, a DatasetRef should be provided as the first argument.
predict (bool): If True, allow URIs to be returned for datasets that have not been written.
collections (Any, optional): Collections to be searched, overriding self.collections. Can be any of the types supported by the collections argument to butler construction.
run (str, optional): Run to use for predictions, overriding self.run.
**kwargs: Additional keyword arguments used to augment or construct a DataCoordinate. See DataCoordinate.standardize parameters.

Returns:

uri (lsst.resources.ResourcePath): URI pointing to the Dataset within the datastore. If the Dataset does not exist in the datastore, and if predict is True, the URI will be a prediction and will include a URI fragment “#predicted”. If the datastore does not have entities that relate well to the concept of a URI, the returned URI string will be descriptive. The returned URI is not guaranteed to be obtainable.

Raises:

LookupError: Raised if a URI is requested for a dataset that does not exist and guessing is not allowed.
ValueError: Raised if a resolved DatasetRef was passed as an input, but it differs from the one found in the registry.
TypeError: Raised if no collections were provided.
RuntimeError: Raised if a URI is requested for a dataset that consists of multiple artifacts.
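
To tell a predicted location from a real one, a caller can look for the “#predicted” fragment described above. The sketch below assumes that converting the returned ResourcePath to str preserves that fragment; the helper name is hypothetical.

```python
from typing import Any, Tuple


def locate_output(
    butler: Any,
    dataset_type: str,
    data_id: dict,
    run: str,
) -> Tuple[Any, bool]:
    """Return (uri, is_prediction) for a dataset that may not be written yet."""
    uri = butler.getURI(dataset_type, data_id, predict=True, run=run)
    # Assumption: str(uri) includes the "#predicted" fragment when the
    # datastore had to guess the location.
    is_prediction = str(uri).endswith("#predicted")
    return uri, is_prediction
```

Since the returned URI is not guaranteed to be obtainable, the flag is best treated as a hint about whether the dataset has been written, not as proof that the location can be read.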

abstract getURIs(datasetRefOrType: DatasetRef | DatasetType | str, /, dataId: DataId | None = None, *, predict: bool = False, collections: Any = None, run: str | None = None, **kwargs: Any) → DatasetRefURIs

Return the URIs associated with the dataset.

Parameters:

datasetRefOrType (DatasetRef, DatasetType, or str): When a DatasetRef is given, dataId should be None. Otherwise, the DatasetType or its name.
dataId (dict or DataCoordinate): A dict of Dimension link name, value pairs that label the DatasetRef within a Collection. When None, a DatasetRef should be provided as the first argument.
predict (bool): If True, allow URIs to be returned for datasets that have not been written.
collections (Any, optional): Collections to be searched, overriding self.collections. Can be any of the types supported by the collections argument to butler construction.
run (str, optional): Run to use for predictions, overriding self.run.
**kwargs: Additional keyword arguments used to augment or construct a DataCoordinate. See DataCoordinate.standardize parameters.

Returns:

uris (DatasetRefURIs): The URI to the primary artifact associated with this dataset (this may be None if the dataset was disassembled within the datastore), and the URIs to any components associated with the dataset artifact (this can be empty if there are no components).

abstract get_dataset(id: DatasetId | str, *, storage_class: str | StorageClass | None = None, dimension_records: bool = False, datastore_records: bool = False) → DatasetRef | None

Retrieve a Dataset entry.

Parameters:

id (DatasetId or str): The unique identifier for the dataset, as an instance of uuid.UUID or a string containing a hexadecimal number.
storage_class (str or StorageClass or None): A storage class to use when creating the returned entry. If given, it must be compatible with the default storage class.
dimension_records (bool, optional): If True, the ref will be expanded and contain dimension records.
datastore_records (bool, optional): If True, the ref will contain associated datastore records.

Returns:

ref (DatasetRef or None): A ref to the Dataset, or None if no matching Dataset was found.

classmethod get_dataset_from_uri(uri: str, factory: LabeledButlerFactoryProtocol | None = None) → SpecificButlerDataset

Get the dataset associated with the given dataset URI.

Parameters:

uri (str): The URI associated with a dataset.
factory (LabeledButlerFactoryProtocol or None, optional): Bound factory function that will be given the butler label and receive a Butler. If this is not provided, the label will be tried directly.

Returns:

result (SpecificButlerDataset): The butler associated with this URI and the dataset itself. The dataset can be None if the UUID is valid but the dataset is not known to this butler.

abstract get_dataset_type(name: str) → DatasetType

Get the DatasetType.

Parameters:

name (str): Name of the type.

Returns:

type (DatasetType): The DatasetType associated with the given name.

Raises:

lsst.daf.butler.MissingDatasetTypeError: Raised if the requested dataset type has not been registered.

Notes

This method handles component dataset types automatically, though most other operations do not.

classmethod get_known_repos() → set[str]

Retrieve the set of known repository labels.

Returns:

repos (set of str): All the known labels. Can be empty if no index can be found.

Notes

See ButlerRepoIndex for details on how the information is discovered.

abstract get_many_datasets(ids: Iterable[DatasetId | str]) → list[DatasetRef]

Retrieve a list of dataset entries.

Parameters:

ids (Iterable[DatasetId or str]): The unique identifiers for the datasets, as instances of uuid.UUID or strings containing a hexadecimal number.

Returns:

refs (list[DatasetRef]): A list containing a DatasetRef for each of the given dataset IDs. If a dataset is not found, no error is raised; it is simply omitted from the list. The returned datasets are in no particular order.
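
Since missing IDs are dropped silently and the result order is unspecified, callers that need either guarantee have to rebuild it themselves. This sketch assumes each returned DatasetRef exposes its UUID as an `id` attribute; the helper name is hypothetical.

```python
import uuid
from typing import Any, Iterable, List, Tuple


def split_found_and_missing(
    butler: Any,
    ids: Iterable[Any],
) -> Tuple[List[Any], List[uuid.UUID]]:
    """Return (refs in request order, IDs that were not found)."""
    # Normalize hexadecimal strings to uuid.UUID so lookups are uniform.
    wanted = [i if isinstance(i, uuid.UUID) else uuid.UUID(i) for i in ids]
    # Assumption: DatasetRef.id is the dataset's uuid.UUID.
    found = {ref.id: ref for ref in butler.get_many_datasets(wanted)}
    ordered = [found[i] for i in wanted if i in found]
    missing = [i for i in wanted if i not in found]
    return ordered, missing
```

The caller can then decide whether a non-empty `missing` list is an error for its use case.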

classmethod get_repo_uri(label: str, return_label: bool = False) → ResourcePath

Look up the label in a butler repository index.

Parameters:

label (str): Label of the Butler repository to look up.
return_label (bool, optional): If label cannot be found in the repository index (either because the index is not defined or the label is not in the index) and return_label is True, then return ResourcePath(label). If return_label is False (default), an exception will be raised instead.

Returns:

uri (lsst.resources.ResourcePath): URI to the Butler repository associated with the given label or default value if it is provided.

Raises:

KeyError: Raised if the label is not found in the index, or if an index is not defined, and return_label is False.

Notes

See ButlerRepoIndex for details on how the information is discovered.
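
A caller that wants a friendlier failure than a bare KeyError can combine this method with get_known_repos. The sketch below takes the Butler class as an argument so it stays self-contained; the helper name is hypothetical.

```python
from typing import Any


def resolve_repo(butler_cls: Any, label: str) -> Any:
    """Resolve a repository label, listing the known labels on failure."""
    try:
        return butler_cls.get_repo_uri(label)
    except KeyError:
        known = sorted(butler_cls.get_known_repos())
        raise KeyError(
            f"Unknown repository label {label!r}; known labels: {known}"
        ) from None
```

Passing return_label=True to get_repo_uri is the alternative when an unknown label should be treated as a path instead of an error.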

import_(*[, directory, filename, format, ...])

Import datasets into this repository that were exported from a different butler repository via export.

Parameters:

directory (ResourcePathExpression, optional): Directory containing dataset files to import from. If None, filename and all dataset file paths specified therein must be absolute.
filename (ResourcePathExpression or typing.TextIO): A stream or name of file that contains database information associated with the exported datasets, typically generated by export. If this is a string (name) or ResourcePath and is not an absolute path, it will first be looked for relative to directory and, if not found there, in the current working directory. Defaults to “export.{format}”.
format (str, optional): File format for filename. If None, the extension of filename will be used.
transfer (str, optional): Transfer mode passed to ingest.
skip_dimensions (set, optional): Names of dimensions that should be skipped and not imported.
record_validation_info (bool, optional): If True, the default, the datastore can record validation information associated with the file. If False, the datastore will not attempt to track any information such as checksums or file sizes. This can be useful if such information is tracked in an external system or if the file is to be compressed in place. It is up to the datastore whether this parameter is relevant.
without_datastore (bool, optional): If True, only registry records will be imported and the datastore will be ignored.

Raises:

TypeError: Raised if the set of arguments passed is inconsistent, or if the butler is read-only.

abstract ingest(*datasets: FileDataset, transfer: str | None = 'auto', record_validation_info: bool = True, skip_existing: bool = False) → None

Store and register one or more datasets that already exist on disk.

Parameters:

*datasets (FileDataset): Each positional argument is a struct containing information about a file to be ingested, including its URI (either absolute or relative to the datastore root, if applicable), a resolved DatasetRef, and optionally a formatter class or its fully-qualified string name. If a formatter is not provided, the formatter that would be used for put is assumed. On successful ingest all FileDataset.formatter attributes will be set to the formatter class used. FileDataset.path attributes may be modified to put paths in whatever the datastore considers a standardized form.
transfer (str, optional): If not None, must be one of ‘auto’, ‘move’, ‘copy’, ‘direct’, ‘split’, ‘hardlink’, ‘relsymlink’ or ‘symlink’, indicating how to transfer the file.
record_validation_info (bool, optional): If True, the default, the datastore can record validation information associated with the file. If False, the datastore will not attempt to track any information such as checksums or file sizes. This can be useful if such information is tracked in an external system or if the file is to be compressed in place. It is up to the datastore whether this parameter is relevant.
skip_existing (bool, optional): If True, a dataset will not be ingested if a dataset with the same dataset ID already exists in the datastore. If False (the default), a ConflictingDefinitionError will be raised if any datasets with the same dataset ID already exist in the datastore.

Returns:

None

Raises:

TypeError: Raised if the butler is read-only or if no run was provided.
NotImplementedError: Raised if the Datastore does not support the given transfer mode.
DatasetTypeNotSupportedError: Raised if one or more files to be ingested have a dataset type that is not supported by the Datastore.
FileNotFoundError: Raised if one of the given files does not exist.
FileExistsError: Raised if transfer is not None but the (internal) location the file would be moved to is already occupied.
ConflictingDefinitionError: Raised if a dataset already exists in the repository and skip_existing is False.

Notes

This operation is not fully exception safe: if a database operation fails, the given FileDataset instances may be only partially updated.

It is atomic in terms of database operations (they will either all succeed or all fail) providing the database engine implements transactions correctly. It will attempt to be atomic in terms of filesystem operations as well, but this cannot be implemented rigorously for most datastores.

abstract ingest_zip(zip_file: str | ParseResult | ResourcePath | Path, transfer: str = 'auto', *, transfer_dimensions: bool = False, dry_run: bool = False, skip_existing: bool = False) → None¶

Ingest a Zip file into this butler.

The Zip file must have been created by retrieve_artifacts_zip.

Parameters:

zip_filelsst.resources.ResourcePathExpression: Path to the Zip file.
transferstr, optional: Method to use to transfer the Zip into the datastore.
transfer_dimensionsbool, optional: If True, dimension record data associated with the new datasets will be transferred from the Zip file, if present.
dry_runbool, optional: If True the ingest will be processed without any modifications made to the target butler and as if the target butler did not have any of the datasets.
skip_existingbool, optional: If True, a zip will not be ingested if the dataset entries listed in the index with the same dataset ID already exist in the butler. If False (the default), a ConflictingDefinitionError will be raised if any datasets with the same dataset ID already exist in the repository. If, somehow, some datasets are known to the butler and some are not, this is currently treated as an error rather than attempting to do a partial ingest.

Notes

Run collections and dataset types are created as needed.
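
Because the Zip file must come from retrieve_artifacts_zip, a round trip between two butlers can be sketched as below. This is only an illustration: the helper name copy_via_zip and its argument names are hypothetical, and both butler arguments are assumed to be already-constructed Butler instances. Only the two method calls and their keyword arguments are taken from the signatures documented on this page.

```python
def copy_via_zip(source_butler, target_butler, refs, staging_dir):
    """Move datasets between butlers through an intermediate Zip file.

    Hypothetical helper: ``source_butler`` and ``target_butler`` are assumed
    to be Butler instances and ``refs`` an iterable of DatasetRef.
    """
    # Write the Zip into the staging directory; the return value is its path.
    zip_path = source_butler.retrieve_artifacts_zip(
        refs, staging_dir, overwrite=False
    )
    # Ingest with the default transfer mode, carrying dimension records along
    # and skipping the Zip if its datasets are already known to the target.
    target_butler.ingest_zip(
        zip_path, transfer_dimensions=True, skip_existing=True
    )
    return zip_path
```

Passing dry_run=True to ingest_zip first is a cautious way to check an unfamiliar Zip before modifying the target repository.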

Create an empty data repository by adding a butler.yaml config to a repository root directory.

Parameters:

rootlsst.resources.ResourcePathExpression: Path or URI to the root location of the new repository. Will be created if it does not exist.
configConfig or str, optional: Configuration to write to the repository, after setting any root-dependent Registry or Datastore config options. Cannot be a ButlerConfig or a ConfigSubset. If None, default configuration will be used. Root-dependent config options specified in this config are overwritten if forceConfigRoot is True.
dimensionConfigConfig or str, optional: Configuration for dimensions, will be used to initialize registry database.
standalonebool: If True, write all expanded defaults, not just customized or repository-specific settings. This (mostly) decouples the repository from the default configuration, insulating it from changes to the defaults (which may be good or bad, depending on the nature of the changes). Future additions to the defaults will still be picked up when initializing a Butler for repos created with standalone=True.
searchPathslist of str, optional: Directory paths to search when calculating the full butler configuration.
forceConfigRootbool, optional: If False, any values present in the supplied config that would normally be reset are not overridden and will appear directly in the output config. This allows non-standard overrides of the root directory for a datastore or registry to be given. If this parameter is True the values for root will be forced into the resulting config if appropriate.
outfilelsst.resources.ResourcePathExpression, optional: If not None, the output configuration will be written to this location rather than into the repository itself. Can be a URI string. Can refer to a directory that will be used to write butler.yaml.
overwritebool, optional: Create a new configuration file even if one already exists in the specified output location. Default is to raise an exception.

Returns:

configConfig: The updated Config instance written to the repo.

Raises:

ValueError: Raised if a ButlerConfig or ConfigSubset is passed instead of a regular Config (as these subclasses would make it impossible to support standalone=False).
FileExistsError: Raised if the output config file already exists.
os.error: Raised if the directory does not exist, exists but is not a directory, or cannot be created.

Notes

Note that when standalone=False (the default), the configuration search path (see ConfigSubset.defaultSearchPaths) that was used to construct the repository should also be used to construct any Butlers to avoid configuration inconsistencies.

classmethod parse_dataset_uri(uri: str) → ParsedButlerDatasetURI¶

Extract the butler label and dataset ID from a dataset URI.

Parameters:

uristr: The dataset URI to parse.

Returns:

parsedParsedButlerDatasetURI: The label associated with the butler repository from which this dataset originates and the ID of the dataset.

Notes

Supports dataset URIs of the forms ivo://org.rubinobs/usdac/dr1?repo=butler_label&id=UUID (see DMTN-302) and butler://butler_label/UUID. The butler URI form is deprecated and cannot include / in the label string. ivo URIs can include anything supported by the Butler constructor, including paths to repositories and alias labels.

ivo://org.rubinobs/dr1?repo=/repo/main&id=UUID

will return a label of /repo/main.

This method does not attempt to check that the dataset exists in the labeled butler.

Since the IVOID can be issued by any publisher to represent a Butler dataset there is no validation of the path or netloc component of the URI. The only requirement is that there are id and repo keys in the ivo URI query component.
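
The parsing rules in these notes can be approximated with the standard library alone. The sketch below is not the library's implementation; the function name sketch_parse_dataset_uri and the tuple return type are assumptions made for illustration (the real method returns a ParsedButlerDatasetURI). It only mirrors the documented behaviour: ivo URIs need repo and id query keys and are otherwise unvalidated, while the deprecated butler form takes the label from the netloc.

```python
from urllib.parse import parse_qs, urlparse
from uuid import UUID


def sketch_parse_dataset_uri(uri: str) -> tuple:
    """Return (label, dataset_id) for the two documented URI forms."""
    parsed = urlparse(uri)
    if parsed.scheme == "ivo":
        # Path and netloc are deliberately not validated; only the query is.
        query = parse_qs(parsed.query)
        if "repo" not in query or "id" not in query:
            raise ValueError(f"Missing 'repo' or 'id' key in query: {uri}")
        return query["repo"][0], UUID(query["id"][0])
    if parsed.scheme == "butler":
        # Deprecated form: the label is the netloc, so it cannot contain '/'.
        return parsed.netloc, UUID(parsed.path.lstrip("/"))
    raise ValueError(f"Unrecognized dataset URI scheme: {uri}")


label, dataset_id = sketch_parse_dataset_uri(
    "ivo://org.rubinobs/dr1?repo=/repo/main"
    "&id=12345678-1234-5678-1234-567812345678"
)
print(label)  # /repo/main
```

As with the real method, nothing here checks that the dataset exists in the labeled butler.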

Store and register a dataset.

Parameters:

objobject: The dataset.
datasetRefOrTypeDatasetRef, DatasetType, or str: When DatasetRef is provided, dataId should be None. Otherwise the DatasetType or name thereof. If a fully resolved DatasetRef is given the run and ID are used directly.
dataIddict or DataCoordinate: A dict of Dimension link name, value pairs that label the DatasetRef within a Collection. When None, a DatasetRef should be provided as the second argument.
runstr, optional: The name of the run the dataset should be added to, overriding self.run. Not used if a resolved DatasetRef is provided.
provenanceDatasetProvenance or None, optional: Any provenance that should be attached to the serialized dataset. Not supported by all serialization mechanisms.
**kwargs: Additional keyword arguments used to augment or construct a DataCoordinate. See DataCoordinate.standardize parameters. Not used if a resolved DatasetRef is provided.

Returns:

refDatasetRef: A reference to the stored dataset, updated with the correct id if given.

Raises:

TypeError: Raised if the butler is read-only or if no run has been provided.

abstract query() → AbstractContextManager[Query]¶: Context manager returning a queries.Query object used for construction and execution of complex queries.

Query for datasets of potentially multiple types.

Parameters:

collectionsstr or Iterable [ str ], optional: The collection or collections to search, in order. If not provided or None, the default collection search path for this butler is used.
namestr or Iterable [ str ], optional: Names or name patterns (glob-style) that returned dataset type names must match. If an iterable, items are OR’d together. The default is to include all dataset types in the given collections.
find_firstbool, optional: If True (default), for each result data ID, only yield one DatasetRef of each DatasetType, from the first collection in which a dataset of that dataset type appears (according to the order of collections passed in).
data_iddict or DataCoordinate, optional: A data ID whose key-value pairs are used as equality constraints in the query.
wherestr, optional: A string expression similar to a SQL WHERE clause. May involve any column of a dimension table or (as a shortcut for the primary key column of a dimension table) dimension name. See Dimension expressions for more information.
bindMapping, optional: Mapping containing literal values that should be injected into the where expression, keyed by the identifiers they replace. Values of collection type can be expanded in some cases; see Identifiers for more information.
limitint or None, optional: Upper limit on the number of returned records. None can be used if no limit is wanted. A limit of 0 means that the query will be executed and validated but no results will be returned. If a negative value is given a warning will be issued if the number of results is capped by that limit. If no limit is provided, by default a maximum of 20,000 records will be returned.
**kwargs: Additional keyword arguments are forwarded to DataCoordinate.standardize when processing the data_id argument (and may be used to provide a constraining data ID even when the data_id argument is None).

Returns:

refslist [ DatasetRef ]: Dataset references matching the given query criteria. Nested data IDs are guaranteed to include values for all implied dimensions (i.e. DataCoordinate.hasFull will return True), but will not include dimension records (DataCoordinate.hasRecords will be False).

Raises:

MissingDatasetTypeError: When no dataset types match name, or an explicit (non-glob) dataset type in name does not exist.
InvalidQueryError: If the parameters to the query are inconsistent or malformed.
MissingCollectionError: If a given collection is not found.
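
The way an iterable of name patterns is OR'd together can be illustrated with the standard fnmatch module. This is a toy model rather than Butler code: the function match_dataset_type_names is hypothetical, and it assumes that fnmatch-style globbing is a reasonable approximation of the glob syntax the butler accepts.

```python
from fnmatch import fnmatchcase


def match_dataset_type_names(known_names, name):
    """Return the known names matching any of the given glob patterns."""
    # A single string is treated as a one-element list of patterns.
    patterns = [name] if isinstance(name, str) else list(name)
    # A name is kept if at least one pattern matches it (logical OR).
    return [
        known
        for known in known_names
        if any(fnmatchcase(known, pattern) for pattern in patterns)
    ]


known = ["raw", "calexp", "calexpBackground", "src"]
print(match_dataset_type_names(known, ["calexp*", "raw"]))
# ['raw', 'calexp', 'calexpBackground']
```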

Query for data IDs matching user-provided criteria.

Parameters:

dimensionsDimensionGroup, str, or Iterable [str]: The dimensions of the data IDs to yield, as either DimensionGroup instances or str. Will be automatically expanded to a complete DimensionGroup.
data_iddict or DataCoordinate, optional: A data ID whose key-value pairs are used as equality constraints in the query.
wherestr, optional: A string expression similar to a SQL WHERE clause. May involve any column of a dimension table or (as a shortcut for the primary key column of a dimension table) dimension name. See Dimension expressions for more information.
bindMapping, optional: Mapping containing literal values that should be injected into the where expression, keyed by the identifiers they replace. Values of collection type can be expanded in some cases; see Identifiers for more information.
with_dimension_recordsbool, optional: If True (default is False) then returned data IDs will have dimension records.
order_byIterable [str] or str, optional: Names of the columns/dimensions to use for ordering returned data IDs. Column name can be prefixed with minus (-) to use descending ordering.
limitint or None, optional: Upper limit on the number of returned records. None can be used if no limit is wanted. A limit of 0 means that the query will be executed and validated but no results will be returned. In this case there will be no exception even if explain is True. If a negative value is given a warning will be issued if the number of results is capped by that limit.
explainbool, optional: If True (default), an EmptyQueryResultError exception is raised when the resulting list is empty. The exception contains a non-empty list of strings explaining possible causes for the empty result.
**kwargs: Additional keyword arguments are forwarded to DataCoordinate.standardize when processing the data_id argument (and may be used to provide a constraining data ID even when the data_id argument is None).

Returns:

dataIdslist [DataCoordinate]: Data IDs matching the given query parameters. These are always guaranteed to identify all dimensions (DataCoordinate.hasFull returns True).

Raises:

lsst.daf.butler.registry.DataIdError: Raised when data_id or keyword arguments specify unknown dimensions or values, or when they contain inconsistent values.
lsst.daf.butler.registry.UserExpressionError: Raised when where expression is invalid.
lsst.daf.butler.EmptyQueryResultError: Raised when query generates empty result and explain is set to True.
TypeError: Raised when the arguments are incompatible.
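
The four cases of the limit parameter (None, zero, positive, negative) are easy to confuse, so here is a toy model of the documented behaviour. It is not Butler code: apply_limit is a hypothetical function operating on any iterable, and it assumes that a negative limit caps the results at its absolute value, which is how the -20000 default in the signatures on this page lines up with the "maximum of 20,000 records" wording.

```python
import warnings
from itertools import islice
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def apply_limit(rows: Iterable[T], limit: Optional[int]) -> List[T]:
    """Toy model of the documented ``limit`` behaviour."""
    if limit is None:
        return list(rows)  # no cap at all
    cap = abs(limit)
    # Fetch one extra row so we can tell whether the cap cut anything off.
    fetched = list(islice(rows, cap + 1))
    if len(fetched) > cap:
        if limit < 0:
            # Negative limit: same cap, but warn that results were cut.
            warnings.warn(f"Result capped at {cap} records", stacklevel=2)
        fetched = fetched[:cap]
    return fetched
```

With this model, apply_limit(range(5), 3) and apply_limit(range(5), -3) return the same three rows, but only the second call emits a warning.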

query_datasets(dataset_type: str | DatasetType, collections: str | Iterable[str] | None = None, *, find_first: bool = True, data_id: DataId | None = None, where: str = '', bind: Mapping[str, Any] | None = None, with_dimension_records: bool = False, order_by: Iterable[str] | str | None = None, limit: int | None = -20000, explain: bool = True, **kwargs: Any) → list[DatasetRef]¶

Query for dataset references matching user-provided criteria.

Parameters:

dataset_typestr or DatasetType: Dataset type object or name to search for.
collectionscollection expression, optional: A collection name or iterable of collection names to search. If not provided, the default collections are used. Can be a wildcard if find_first is False (if find first is requested the order of collections matters and wildcards make the order indeterminate). See Collection expressions for more information.
find_firstbool, optional: If True (default), for each result data ID, only yield one DatasetRef of each DatasetType, from the first collection in which a dataset of that dataset type appears (according to the order of collections passed in). If True, collections must not contain wildcards.
data_iddict or DataCoordinate, optional: A data ID whose key-value pairs are used as equality constraints in the query.
wherestr, optional: A string expression similar to a SQL WHERE clause. May involve any column of a dimension table or (as a shortcut for the primary key column of a dimension table) dimension name. See Dimension expressions for more information.
bindMapping, optional: Mapping containing literal values that should be injected into the where expression, keyed by the identifiers they replace. Values of collection type can be expanded in some cases; see Identifiers for more information.
with_dimension_recordsbool, optional: If True (default is False) then returned data IDs will have dimension records.
order_byIterable [str] or str, optional: Names of the columns/dimensions to use for ordering returned data IDs. Column name can be prefixed with minus (-) to use descending ordering.
limitint or None, optional: Upper limit on the number of returned records. None can be used if no limit is wanted. A limit of 0 means that the query will be executed and validated but no results will be returned. In this case there will be no exception even if explain is True. If a negative value is given a warning will be issued if the number of results is capped by that limit.
explainbool, optional: If True (default), an EmptyQueryResultError exception is raised when the resulting list is empty. The exception contains a non-empty list of strings explaining possible causes for the empty result.
**kwargs: Additional keyword arguments are forwarded to DataCoordinate.standardize when processing the data_id argument (and may be used to provide a constraining data ID even when the data_id argument is None).

Returns:

refslist [ DatasetRef ]: Dataset references matching the given query criteria. Nested data IDs are guaranteed to include values for all implied dimensions (i.e. DataCoordinate.hasFull will return True).

Raises:

lsst.daf.butler.registry.DatasetTypeExpressionError: Raised when dataset_type expression is invalid.
lsst.daf.butler.registry.DataIdError: Raised when data_id or keyword arguments specify unknown dimensions or values, or when they contain inconsistent values.
lsst.daf.butler.registry.UserExpressionError: Raised when where expression is invalid.
lsst.daf.butler.EmptyQueryResultError: Raised when query generates empty result and explain is set to True.
TypeError: Raised when the arguments are incompatible, such as when a collection wildcard is passed when find_first is True, or when collections is None and default butler collections are not defined.
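
The find_first rule, and the reason it is incompatible with collection wildcards, can be seen in a small in-memory model. Everything here is hypothetical: find_first_refs is not part of the butler API, and the contents mapping is a stand-in for a registry holding a single dataset type.

```python
def find_first_refs(collections, contents):
    """Pick one ref per data ID from the first collection that has it.

    ``contents`` maps a collection name to a dict of {data_id: ref}.
    """
    chosen = {}
    for collection in collections:  # the search order decides the winner
        for data_id, ref in contents.get(collection, {}).items():
            # setdefault keeps the ref from the earliest collection searched.
            chosen.setdefault(data_id, ref)
    return chosen


contents = {
    "run/a": {("visit", 1): "ref_a1", ("visit", 2): "ref_a2"},
    "run/b": {("visit", 1): "ref_b1"},
}
print(find_first_refs(["run/a", "run/b"], contents)[("visit", 1)])  # ref_a1
print(find_first_refs(["run/b", "run/a"], contents)[("visit", 1)])  # ref_b1
```

Swapping the collection order changes which ref is returned for visit 1. A wildcard expands to collections in no particular order, so the result would be indeterminate.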

query_dimension_records(element: str, *, data_id: DataId | None = None, where: str = '', bind: Mapping[str, Any] | None = None, order_by: Iterable[str] | str | None = None, limit: int | None = -20000, explain: bool = True, **kwargs: Any) → list[DimensionRecord]¶

Query for dimension information matching user-provided criteria.

Parameters:

elementstr: The name of a dimension element to obtain records for.
data_iddict or DataCoordinate, optional: A data ID whose key-value pairs are used as equality constraints in the query.
wherestr, optional: A string expression similar to a SQL WHERE clause. See Registry.queryDataIds and Dimension expressions for more information.
bindMapping, optional: Mapping containing literal values that should be injected into the where expression, keyed by the identifiers they replace. Values of collection type can be expanded in some cases; see Identifiers for more information.
order_byIterable [str] or str, optional: Names of the columns/dimensions to use for ordering returned data IDs. Column name can be prefixed with minus (-) to use descending ordering.
limitint or None, optional: Upper limit on the number of returned records. None can be used if no limit is wanted. A limit of 0 means that the query will be executed and validated but no results will be returned. In this case there will be no exception even if explain is True. If a negative value is given a warning will be issued if the number of results is capped by that limit.
explainbool, optional: If True (default), an EmptyQueryResultError exception is raised when the resulting list is empty. The exception contains a non-empty list of strings explaining possible causes for the empty result.
**kwargs: Additional keyword arguments are forwarded to DataCoordinate.standardize when processing the data_id argument (and may be used to provide a constraining data ID even when the data_id argument is None).

Returns:

recordslist [DimensionRecord]: Dimension records matching the given query parameters.

Raises:

lsst.daf.butler.registry.DataIdError: Raised when data_id or keyword arguments specify unknown dimensions or values, or when they contain inconsistent values.
lsst.daf.butler.registry.UserExpressionError: Raised when where expression is invalid.
lsst.daf.butler.EmptyQueryResultError: Raised when query generates empty result and explain is set to True.
TypeError: Raised when the arguments are incompatible.
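
The order_by convention shared by these query methods (a leading minus sign for descending order) can be modelled on plain dictionaries. This is a sketch of the convention only, not of how the butler builds its queries; sort_rows and the sample column names are hypothetical.

```python
def sort_rows(rows, order_by):
    """Sort dict rows by column names; a '-' prefix means descending."""
    keys = [order_by] if isinstance(order_by, str) else list(order_by)
    result = list(rows)
    # Python's sort is stable, so sorting by the least significant key first
    # and the most significant key last yields a multi-column ordering.
    for key in reversed(keys):
        descending = key.startswith("-")
        column = key[1:] if descending else key
        result.sort(key=lambda row: row[column], reverse=descending)
    return result


rows = [
    {"visit": 1, "detector": 2},
    {"visit": 2, "detector": 1},
    {"visit": 1, "detector": 1},
]
print(sort_rows(rows, ["-visit", "detector"]))
# [{'visit': 2, 'detector': 1}, {'visit': 1, 'detector': 1}, {'visit': 1, 'detector': 2}]
```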

abstract removeRuns(names: ~collections.abc.Iterable[str], unstore: bool | type[lsst.daf.butler._butler._DeprecatedDefault] = <class 'lsst.daf.butler._butler._DeprecatedDefault'>, *, unlink_from_chains: bool = False) → None¶

Remove one or more RUN collections and the datasets within them.

Parameters:

namesIterable [ str ]: The names of the collections to remove.
unstorebool, optional: Deprecated; this parameter no longer has any effect. Files are always deleted from datastores unless they were ingested using full URIs. Previously, True (the default) deleted datasets from all datastores in which they were present and attempted to roll back the registry deletions if datastore deletions failed (which may not always be possible), while False still removed the datastore records for these datasets but left any artifacts (e.g. files) in place.
unlink_from_chainsbool, optional: If True remove the RUN collection from any chains prior to removing the RUN. If False the removal will fail if any chains still refer to the RUN.

Raises:

TypeError: Raised if one or more collections are not of type RUN.

abstract retrieveArtifacts(refs: Iterable[DatasetRef], destination: ResourcePathExpression, transfer: str = 'auto', preserve_path: bool = True, overwrite: bool = False) → list[ResourcePath]¶

Retrieve the artifacts associated with the supplied refs.

Parameters:

refsIterable of DatasetRef: The datasets for which artifacts are to be retrieved. A single ref can result in multiple artifacts. The refs must be resolved.
destinationlsst.resources.ResourcePath or str: Location to write the artifacts.
transferstr, optional: Method to use to transfer the artifacts. Must be one of the options supported by transfer_from. “move” is not allowed.
preserve_pathbool, optional: If True the full path of the artifact within the datastore is preserved. If False the final file component of the path is used.
overwritebool, optional: If True allow transfers to overwrite existing files at the destination.

Returns:

targetslist of lsst.resources.ResourcePath: URIs of file artifacts in destination location. Order is not preserved.

Notes

For non-file datastores the artifacts written to the destination may not match the representation inside the datastore. For example a hierarchical data structure in a NoSQL database may well be stored as a JSON file.
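
The effect of preserve_path on where an artifact lands can be sketched with pathlib. This is an assumption-laden illustration, not the datastore's logic: destination_for is a hypothetical helper, and the artifact path is assumed to be relative to the datastore root.

```python
from pathlib import PurePosixPath


def destination_for(artifact_path, destination, preserve_path=True):
    """Compute where an artifact would be written under ``destination``.

    ``artifact_path`` is assumed to be relative to the datastore root.
    """
    source = PurePosixPath(artifact_path)
    # Keep the whole in-datastore path, or only its final component.
    relative = source if preserve_path else PurePosixPath(source.name)
    return PurePosixPath(destination) / relative


print(destination_for("run1/calexp/calexp_1.fits", "/tmp/out"))
# /tmp/out/run1/calexp/calexp_1.fits
print(destination_for("run1/calexp/calexp_1.fits", "/tmp/out", preserve_path=False))
# /tmp/out/calexp_1.fits
```

With preserve_path=False, artifacts from different runs that share a file name map to the same destination, which is where the overwrite parameter becomes relevant.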

abstract retrieve_artifacts_zip(refs: Iterable[DatasetRef], destination: ResourcePathExpression, overwrite: bool = True) → ResourcePath¶

Retrieve artifacts from a Butler and place them in a ZIP file.

Parameters:

refsIterable [ DatasetRef ]: The datasets to be included in the zip file.
destinationlsst.resources.ResourcePathExpression: Directory to write the new ZIP file. This directory will also be used as a staging area for the datasets being downloaded from the datastore.
overwritebool, optional: If False the output Zip will not be written if a file of the same name is already present in destination.

Returns:

zip_filelsst.resources.ResourcePath: The path to the new ZIP file.

Raises:

ValueError: Raised if there are no refs to retrieve.

abstract transaction() → AbstractContextManager[None]¶

Context manager supporting Butler transactions.

Transactions can be nested.
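
A typical use is to group several writes so that they are handled as one unit. The sketch below is a usage pattern, not library code: put_all is a hypothetical helper, butler is assumed to be a writeable Butler instance, and the positional arguments to put follow the parameter list documented on this page (obj, datasetRefOrType, dataId). It also assumes, as is usual for transaction context managers but not stated in this section, that an exception raised inside the block causes the enclosed changes to be rolled back.

```python
def put_all(butler, items, run):
    """Store several datasets inside a single transaction.

    ``items`` is an iterable of (obj, dataset_type_name, data_id) tuples.
    """
    refs = []
    # All puts share one transaction; an exception propagates out of the
    # ``with`` block and no refs are returned.
    with butler.transaction():
        for obj, dataset_type, data_id in items:
            refs.append(butler.put(obj, dataset_type, data_id, run=run))
    return refs
```

Because transactions can be nested, a helper like this can itself be called from inside another transaction block.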

abstract transfer_dimension_records_from(source_butler: LimitedButler | Butler, source_refs: Iterable[DatasetRef | DataCoordinate]) → None¶

Transfer dimension records to this Butler from another Butler.

Parameters:

source_butlerLimitedButler or Butler: Butler from which the records are to be transferred. If data IDs in source_refs are not expanded then this has to be a full Butler whose registry will be used to expand data IDs. If the source refs contain coordinates that are used to populate other records then this will also need to be a full Butler.
source_refsIterable [DatasetRef | DataCoordinate]: Datasets or data IDs defined in the source butler whose dimension records should be transferred to this butler.

abstract transfer_from(source_butler: LimitedButler, source_refs: Iterable[DatasetRef], transfer: str = 'auto', skip_missing: bool = True, register_dataset_types: bool = False, transfer_dimensions: bool = False, dry_run: bool = False) → Collection[DatasetRef]¶

Transfer datasets to this Butler from a run in another Butler.

Parameters:

source_butlerLimitedButler: Butler from which the datasets are to be transferred. If data IDs in source_refs are not expanded then this has to be a full Butler whose registry will be used to expand data IDs.
source_refsIterable of DatasetRef: Datasets defined in the source butler that should be transferred to this butler. In most circumstances, transfer_from is faster if the dataset refs are expanded.
transferstr, optional: Transfer mode passed to transfer_from.
skip_missingbool: If True, datasets with no datastore artifact associated with them are not transferred. If False a registry entry will be created even if no datastore record is created (and so will look equivalent to the dataset being unstored).
register_dataset_typesbool: If True any missing dataset types are registered. Otherwise an exception is raised.
transfer_dimensionsbool, optional: If True, dimension record data associated with the new datasets will be transferred.
dry_runbool, optional: If True the transfer will be processed without any modifications made to the target butler and as if the target butler did not have any of the datasets.

Returns:

refslist of DatasetRef: The refs added to this Butler.

Notes

The datastore artifact has to exist for a transfer to be made but non-existence is not an error.

Datasets that already exist in this run will be skipped.

The datasets are imported as part of a transaction, although dataset types are registered before the transaction is started. This means that it is possible for a dataset type to be registered even though transfer has failed.
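
One cautious way to use dry_run is to rehearse a transfer before performing it. The helper below is hypothetical and only strings together calls with the keyword arguments documented above; both butlers are assumed to be already constructed. What a dry run returns is not specified in this section, so the sketch ignores that value and relies only on the rehearsal raising if something is wrong.

```python
def transfer_with_rehearsal(target_butler, source_butler, source_refs):
    """Rehearse a transfer with dry_run=True, then perform it for real."""
    # Materialize the refs so the same sequence can be passed twice
    # (a generator would be exhausted by the first call).
    refs = list(source_refs)
    options = dict(register_dataset_types=True, transfer_dimensions=True)
    # Rehearsal: processed without modifying the target butler.
    target_butler.transfer_from(source_butler, refs, dry_run=True, **options)
    # Real transfer with identical options; returns the refs that were added.
    return list(target_butler.transfer_from(source_butler, refs, **options))
```

Note that register_dataset_types=True means dataset types may remain registered in the target even if the real transfer later fails, as described in the notes above.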

abstract validateConfiguration(logFailures: bool = False, datasetTypeNames: Iterable[str] | None = None, ignore: Iterable[str] | None = None) → None¶

Validate butler configuration.

Checks that each DatasetType can be stored in the Datastore.

Parameters:

logFailuresbool, optional: If True, output a log message for every validation error detected.
datasetTypeNamesIterable of str, optional: The DatasetType names that should be checked. This allows only a subset to be selected.
ignoreIterable of str, optional: Names of DatasetTypes to skip over. This can be used to skip known problems. If a named DatasetType corresponds to a composite, all components of that DatasetType will also be ignored.

Raises:

ButlerValidationError: Raised if there is some inconsistency with how this Butler is configured.
