Butler

class lsst.daf.butler.Butler(config: Union[lsst.daf.butler.core.config.Config, str, None] = None, *, butler: Optional[lsst.daf.butler._butler.Butler] = None, collections: Optional[Any] = None, run: Optional[str] = None, tags: Iterable[str] = (), chains: Optional[Mapping[str, Any]] = None, searchPaths: Optional[List[str]] = None, writeable: Optional[bool] = None)

Bases: object

Main entry point for the data access system.

Parameters:
config : ButlerConfig, Config or str, optional

Configuration. Anything acceptable to the ButlerConfig constructor. If a directory path is given, the configuration will be read from a butler.yaml file in that location. If None is given, default values will be used.

butler : Butler, optional

If provided, construct a new Butler that uses the same registry and datastore as the given one, but with the given collection and run. Incompatible with the config, searchPaths, and writeable arguments.

collections : Any, optional

An expression specifying the collections to be searched (in order) when reading datasets, and optionally dataset type restrictions on them. This may be:

  • a str collection name;
  • a tuple of (collection name, dataset type restriction);
  • an iterable of either of the above;
  • a mapping from str to dataset type restriction.

See Collection expressions for more information, including the definition of a dataset type restriction. All collections must either already exist or be specified to be created by other arguments.

run : str, optional

Name of the run datasets should be output to. If the run does not exist, it will be created. If collections is None, it will be set to [run]. If this is not set (and writeable is not set either), a read-only butler will be created.

tags : Iterable [ str ], optional

A list of TAGGED collections that datasets should be associated with in put or ingest and disassociated from in pruneDatasets. If any of these collections does not exist, it will be created.

chains : Mapping [ str, Iterable [ str ] ], optional

A mapping from the names of new CHAINED collections to an expression identifying their child collections (which takes the same form as the collections argument). Chains may be nested only if children precede their parents in this mapping.

searchPaths : list of str, optional

Directory paths to search when calculating the full Butler configuration. Not used if the supplied config is already a ButlerConfig.

writeable : bool, optional

Explicitly sets whether the butler supports write operations. If not provided, a read-write butler is created if any of run, tags, or chains is non-empty.

Examples

While there are many ways to control exactly how a Butler interacts with the collections in its Registry, the most common cases are still simple.

For a read-only Butler that searches one collection, do:

butler = Butler("/path/to/repo", collections=["u/alice/DM-50000"])

For a read-write Butler that writes to and reads from a RUN collection:

butler = Butler("/path/to/repo", run="u/alice/DM-50000/a")

The Butler passed to a PipelineTask is often much more complex, because we want to write to one RUN collection but read from several others (as well), while defining a new CHAINED collection that combines them all:

butler = Butler("/path/to/repo", run="u/alice/DM-50000/a",
                collections=["u/alice/DM-50000"],
                chains={
                    "u/alice/DM-50000": ["u/alice/DM-50000/a",
                                         "u/bob/DM-49998",
                                         "raw/hsc"]
                })

This butler will put new datasets into the run u/alice/DM-50000/a, but they’ll also be available from the chained collection u/alice/DM-50000. Datasets will be read first from that run (since it appears first in the chain), and then from u/bob/DM-49998 and finally raw/hsc. If u/alice/DM-50000 had already been defined, the chains argument would be unnecessary. We could also construct a butler that performs exactly the same put and get operations without actually creating a chained collection, just by passing multiple items in collections:

butler = Butler("/path/to/repo", run="u/alice/DM-50000/a",
                collections=["u/alice/DM-50000/a",
                             "u/bob/DM-49998",
                             "raw/hsc"])

Finally, one can always create a Butler with no collections:

butler = Butler("/path/to/repo", writeable=True)

This can be extremely useful when you just want to use butler.registry, e.g. for inserting dimension data or managing collections, or when the collections you want to use with the butler are not consistent. Passing writeable explicitly here is only necessary if you want to be able to make changes to the repo - usually the value for writeable can be guessed from the collection arguments provided, but it defaults to False when there are no collection arguments.

Attributes Summary

GENERATION This is a Generation 3 Butler.

Methods Summary

datasetExists(datasetRefOrType, …) Return True if the Dataset is actually present in the Datastore.
export(*, directory, filename, format, transfer) Export datasets from the repository represented by this Butler.
get(datasetRefOrType, …) Retrieve a stored dataset.
getDeferred(datasetRefOrType, …) Create a DeferredDatasetHandle which can later retrieve a dataset, after an immediate registry lookup.
getDirect(ref, *, parameters) Retrieve a stored dataset.
getDirectDeferred(ref, *, parameters) Create a DeferredDatasetHandle which can later retrieve a dataset, from a resolved DatasetRef.
getURI(datasetRefOrType, …) Return the URI to the Dataset.
getURIs(datasetRefOrType, …) Return the URIs associated with the dataset.
import_(*, directory, filename, format, …) Import datasets exported from a different butler repository.
ingest(*datasets, transfer, run, tags) Store and register one or more datasets that already exist on disk.
isWriteable() Return True if this Butler supports write operations.
makeRepo(root, config, standalone, …) Create an empty data repository by adding a butler.yaml config to a repository root directory.
pruneCollection(name, purge, unstore) Remove a collection and possibly prune datasets within it.
pruneDatasets(refs, *, disassociate, …) Remove one or more datasets from a collection and/or storage.
put(obj, datasetRefOrType, …) Store and register a dataset.
transaction() Context manager supporting Butler transactions.
validateConfiguration(logFailures, …) Validate butler configuration.

Attributes Documentation

GENERATION = 3

This is a Generation 3 Butler.

This attribute may be removed in the future, once the Generation 2 Butler interface has been fully retired; it should only be used in transitional code.

Methods Documentation

datasetExists(datasetRefOrType: Union[lsst.daf.butler.core.datasets.ref.DatasetRef, lsst.daf.butler.core.datasets.type.DatasetType, str], dataId: Union[lsst.daf.butler.core.dimensions.coordinate.DataCoordinate, Mapping[str, Any], None] = None, *, collections: Optional[Any] = None, **kwds) → bool

Return True if the Dataset is actually present in the Datastore.

Parameters:
datasetRefOrType : DatasetRef, DatasetType, or str

When a DatasetRef is provided, dataId should be None. Otherwise the DatasetType or name thereof.

dataId : dict or DataCoordinate

A dict of Dimension link name, value pairs that label the DatasetRef within a Collection. When None, a DatasetRef should be provided as the first argument.

collections : Any, optional

Collections to be searched, overriding self.collections. Can be any of the types supported by the collections argument to butler construction.

kwds

Additional keyword arguments used to augment or construct a DataCoordinate. See DataCoordinate.standardize parameters.

Raises:
LookupError

Raised if the dataset is not even present in the Registry.

ValueError

Raised if a resolved DatasetRef was passed as an input, but it differs from the one found in the registry.

TypeError

Raised if no collections were provided.
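
Examples

A minimal sketch of typical calls; the dataset type name, data ID keys and values, and collection name below are illustrative and depend on the repository:

# Check whether the dataset is actually present in the datastore for the
# collections this butler is configured to search.
exists = butler.datasetExists("calexp", instrument="HSC",
                              visit=903334, detector=16)

# The same check against an explicit set of collections.
exists = butler.datasetExists("calexp",
                              {"instrument": "HSC", "visit": 903334, "detector": 16},
                              collections=["u/alice/DM-50000"])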

export(*, directory: Optional[str] = None, filename: Optional[str] = None, format: Optional[str] = None, transfer: Optional[str] = None) → AbstractContextManager[lsst.daf.butler.transfers._context.RepoExportContext]

Export datasets from the repository represented by this Butler.

This method is a context manager that returns a helper object (RepoExportContext) that is used to indicate what information from the repository should be exported.

Parameters:
directory : str, optional

Directory dataset files should be written to if transfer is not None.

filename : str, optional

Name for the file that will include database information associated with the exported datasets. If this is not an absolute path and directory is not None, it will be written to directory instead of the current working directory. Defaults to “export.{format}”.

format : str, optional

File format for the database information file. If None, the extension of filename will be used.

transfer : str, optional

Transfer mode passed to Datastore.export.

Raises:
TypeError

Raised if the set of arguments passed is inconsistent.

Examples

Typically the Registry.queryDataIds and Registry.queryDatasets methods are used to provide the iterables over data IDs and/or datasets to be exported:

with butler.export(filename="exports.yaml") as export:
    # Export all flats, but none of the dimension element rows
    # (i.e. data ID information) associated with them.
    export.saveDatasets(butler.registry.queryDatasets("flat"),
                        elements=())
    # Export all datasets that start with "deepCoadd_" and all of
    # their associated data ID information.
    export.saveDatasets(butler.registry.queryDatasets("deepCoadd_*"))

get(datasetRefOrType: Union[lsst.daf.butler.core.datasets.ref.DatasetRef, lsst.daf.butler.core.datasets.type.DatasetType, str], dataId: Union[lsst.daf.butler.core.dimensions.coordinate.DataCoordinate, Mapping[str, Any], None] = None, *, parameters: Optional[Dict[str, Any]] = None, collections: Optional[Any] = None, **kwds) → Any

Retrieve a stored dataset.

Parameters:
datasetRefOrType : DatasetRef, DatasetType, or str

When a DatasetRef is provided, dataId should be None. Otherwise the DatasetType or name thereof.

dataId : dict or DataCoordinate

A dict of Dimension link name, value pairs that label the DatasetRef within a Collection. When None, a DatasetRef should be provided as the first argument.

parameters : dict

Additional StorageClass-defined options to control reading, typically used to efficiently read only a subset of the dataset.

collections : Any, optional

Collections to be searched, overriding self.collections. Can be any of the types supported by the collections argument to butler construction.

kwds

Additional keyword arguments used to augment or construct a DataCoordinate. See DataCoordinate.standardize parameters.

Returns:
obj : object

The dataset.

Raises:
ValueError

Raised if a resolved DatasetRef was passed as an input, but it differs from the one found in the registry.

LookupError

Raised if no matching dataset exists in the Registry.

TypeError

Raised if no collections were provided.

Notes

When looking up datasets in a CALIBRATION collection, this method requires that the given data ID include temporal dimensions beyond the dimensions of the dataset type itself, in order to find the dataset with the appropriate validity range. For example, a “bias” dataset with native dimensions {instrument, detector} could be fetched with a {instrument, detector, exposure} data ID, because exposure is a temporal dimension.
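
Examples

A minimal sketch of typical lookups; the dataset type name, data ID keys, and values below are illustrative and depend on the repository:

# Retrieve a dataset by dataset type name, with the data ID given as
# keyword arguments.
calexp = butler.get("calexp", instrument="HSC", visit=903334, detector=16)

# The equivalent lookup with an explicit data ID mapping.
dataId = {"instrument": "HSC", "visit": 903334, "detector": 16}
calexp = butler.get("calexp", dataId)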

getDeferred(datasetRefOrType: Union[lsst.daf.butler.core.datasets.ref.DatasetRef, lsst.daf.butler.core.datasets.type.DatasetType, str], dataId: Union[lsst.daf.butler.core.dimensions.coordinate.DataCoordinate, Mapping[str, Any], None] = None, *, parameters: Optional[dict] = None, collections: Optional[Any] = None, **kwds) → lsst.daf.butler._deferredDatasetHandle.DeferredDatasetHandle

Create a DeferredDatasetHandle which can later retrieve a dataset, after an immediate registry lookup.

Parameters:
datasetRefOrType : DatasetRef, DatasetType, or str

When a DatasetRef is provided, dataId should be None. Otherwise the DatasetType or name thereof.

dataId : dict or DataCoordinate, optional

A dict of Dimension link name, value pairs that label the DatasetRef within a Collection. When None, a DatasetRef should be provided as the first argument.

parameters : dict

Additional StorageClass-defined options to control reading, typically used to efficiently read only a subset of the dataset.

collections : Any, optional

Collections to be searched, overriding self.collections. Can be any of the types supported by the collections argument to butler construction.

kwds

Additional keyword arguments used to augment or construct a DataCoordinate. See DataCoordinate.standardize parameters.

Returns:
obj : DeferredDatasetHandle

A handle which can be used to retrieve a dataset at a later time.

Raises:
LookupError

Raised if no matching dataset exists in the Registry (and allowUnresolved is False).

ValueError

Raised if a resolved DatasetRef was passed as an input, but it differs from the one found in the registry.

TypeError

Raised if no collections were provided.
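
Examples

A minimal sketch, using the same illustrative dataset type and data ID as above; the registry lookup happens immediately, but the read is deferred until the handle is used:

# Resolve the dataset now but defer the (potentially expensive) read.
handle = butler.getDeferred("calexp", instrument="HSC",
                            visit=903334, detector=16)

# ... later, when the data are actually needed:
calexp = handle.get()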

getDirect(ref: lsst.daf.butler.core.datasets.ref.DatasetRef, *, parameters: Optional[Dict[str, Any]] = None)

Retrieve a stored dataset.

Unlike Butler.get, this method allows datasets outside the Butler’s collection to be read as long as the DatasetRef that identifies them can be obtained separately.

Parameters:
ref : DatasetRef

Resolved reference to an already stored dataset.

parameters : dict

Additional StorageClass-defined options to control reading, typically used to efficiently read only a subset of the dataset.

Returns:
obj : object

The dataset.
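
Examples

A minimal sketch that reads datasets via resolved references obtained from a registry query (the dataset type name is illustrative):

# Obtain resolved DatasetRefs from the registry, then read them directly,
# bypassing this butler's collection search.
for ref in butler.registry.queryDatasets("calexp"):
    calexp = butler.getDirect(ref)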

getDirectDeferred(ref: lsst.daf.butler.core.datasets.ref.DatasetRef, *, parameters: Optional[dict] = None) → lsst.daf.butler._deferredDatasetHandle.DeferredDatasetHandle

Create a DeferredDatasetHandle which can later retrieve a dataset, from a resolved DatasetRef.

Parameters:
ref : DatasetRef

Resolved reference to an already stored dataset.

parameters : dict

Additional StorageClass-defined options to control reading, typically used to efficiently read only a subset of the dataset.

Returns:
obj : DeferredDatasetHandle

A handle which can be used to retrieve a dataset at a later time.

Raises:
AmbiguousDatasetError

Raised if ref.id is None, i.e. the reference is unresolved.

getURI(datasetRefOrType: Union[lsst.daf.butler.core.datasets.ref.DatasetRef, lsst.daf.butler.core.datasets.type.DatasetType, str], dataId: Union[lsst.daf.butler.core.dimensions.coordinate.DataCoordinate, Mapping[str, Any], None] = None, *, predict: bool = False, collections: Optional[Any] = None, run: Optional[str] = None, **kwds) → lsst.daf.butler.core._butlerUri.ButlerURI

Return the URI to the Dataset.

Parameters:
datasetRefOrType : DatasetRef, DatasetType, or str

When a DatasetRef is provided, dataId should be None. Otherwise the DatasetType or name thereof.

dataId : dict or DataCoordinate

A dict of Dimension link name, value pairs that label the DatasetRef within a Collection. When None, a DatasetRef should be provided as the first argument.

predict : bool

If True, allow URIs to be returned of datasets that have not been written.

collections : Any, optional

Collections to be searched, overriding self.collections. Can be any of the types supported by the collections argument to butler construction.

run : str, optional

Run to use for predictions, overriding self.run.

kwds

Additional keyword arguments used to augment or construct a DataCoordinate. See DataCoordinate.standardize parameters.

Returns:
uri : ButlerURI

URI pointing to the Dataset within the datastore. If the Dataset does not exist in the datastore, and if predict is True, the URI will be a prediction and will include a URI fragment “#predicted”. If the datastore does not have entities that relate well to the concept of a URI, the returned URI string will be descriptive. The returned URI is not guaranteed to be obtainable.

Raises:
LookupError

Raised if a URI has been requested for a dataset that does not exist and guessing is not allowed.

ValueError

Raised if a resolved DatasetRef was passed as an input, but it differs from the one found in the registry.

TypeError

Raised if no collections were provided.

RuntimeError

Raised if a URI is requested for a dataset that consists of multiple artifacts.
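
Examples

A minimal sketch; the dataset type, data ID values, and run name are illustrative:

# URI of an existing dataset.
uri = butler.getURI("calexp", instrument="HSC", visit=903334, detector=16)

# Predicted URI of a dataset that has not been written yet; the returned
# URI includes a "#predicted" fragment.
uri = butler.getURI("calexp", instrument="HSC", visit=903335, detector=16,
                    predict=True, run="u/alice/DM-50000/a")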

getURIs(datasetRefOrType: Union[lsst.daf.butler.core.datasets.ref.DatasetRef, lsst.daf.butler.core.datasets.type.DatasetType, str], dataId: Union[lsst.daf.butler.core.dimensions.coordinate.DataCoordinate, Mapping[str, Any], None] = None, *, predict: bool = False, collections: Optional[Any] = None, run: Optional[str] = None, **kwds) → Tuple[Optional[lsst.daf.butler.core._butlerUri.ButlerURI], Dict[str, lsst.daf.butler.core._butlerUri.ButlerURI]]

Return the URIs associated with the dataset.

Parameters:
datasetRefOrType : DatasetRef, DatasetType, or str

When a DatasetRef is provided, dataId should be None. Otherwise the DatasetType or name thereof.

dataId : dict or DataCoordinate

A dict of Dimension link name, value pairs that label the DatasetRef within a Collection. When None, a DatasetRef should be provided as the first argument.

predict : bool

If True, allow URIs to be returned of datasets that have not been written.

collections : Any, optional

Collections to be searched, overriding self.collections. Can be any of the types supported by the collections argument to butler construction.

run : str, optional

Run to use for predictions, overriding self.run.

kwds

Additional keyword arguments used to augment or construct a DataCoordinate. See DataCoordinate.standardize parameters.

Returns:
primary : ButlerURI

The URI to the primary artifact associated with this dataset. If the dataset was disassembled within the datastore this may be None.

components : dict

URIs to any components associated with the dataset artifact. Can be empty if there are no components.
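
Examples

A minimal sketch with the same illustrative data ID; useful when the datastore may have disassembled the dataset into per-component artifacts:

primary, components = butler.getURIs("calexp", instrument="HSC",
                                     visit=903334, detector=16)
if primary is None:
    # The dataset was disassembled by the datastore; use the component URIs.
    for name, uri in components.items():
        print(name, uri)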

import_(*, directory: Optional[str] = None, filename: Union[str, TextIO, None] = None, format: Optional[str] = None, transfer: Optional[str] = None, skip_dimensions: Optional[Set[T]] = None)

Import datasets exported from a different butler repository.

Parameters:
directory : str, optional

Directory containing dataset files. If None, all file paths must be absolute.

filename : str or TextIO, optional

A stream or name of file that contains database information associated with the exported datasets. If this is a string (a file name) and is not an absolute path, does not exist in the current working directory, and directory is not None, it is assumed to be in directory. Defaults to “export.{format}”.

format : str, optional

File format for the database information file. If None, the extension of filename will be used.

transfer : str, optional

Transfer mode passed to Datastore.ingest.

skip_dimensions : set, optional

Names of dimensions that should be skipped and not imported.

Raises:
TypeError

Raised if the set of arguments passed is inconsistent, or if the butler is read-only.
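
Examples

A minimal sketch of importing the output of Butler.export; the paths and transfer mode below are illustrative:

# Read "exports.yaml" from the export directory and copy the referenced
# files into this repository's datastore.
butler.import_(directory="/path/to/exports", filename="exports.yaml",
               transfer="copy")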

ingest(*datasets, transfer: Optional[str] = 'auto', run: Optional[str] = None, tags: Optional[Iterable[str]] = None)

Store and register one or more datasets that already exist on disk.

Parameters:
datasets : FileDataset

Each positional argument is a struct containing information about a file to be ingested, including its path (either absolute or relative to the datastore root, if applicable), a DatasetRef, and optionally a formatter class or its fully-qualified string name. If a formatter is not provided, the formatter that would be used for put is assumed. On successful return, all FileDataset.ref attributes will have their DatasetRef.id attribute populated and all FileDataset.formatter attributes will be set to the formatter class used. FileDataset.path attributes may be modified to put paths in whatever the datastore considers a standardized form.

transfer : str, optional

If not None, must be one of ‘auto’, ‘move’, ‘copy’, ‘hardlink’, ‘relsymlink’ or ‘symlink’, indicating how to transfer the file.

run : str, optional

The name of the run ingested datasets should be added to, overriding self.run.

tags : Iterable [ str ], optional

The names of TAGGED collections to associate the datasets with, overriding self.tags. These collections must have already been added to the Registry.

Raises:
TypeError

Raised if the butler is read-only or if no run was provided.

NotImplementedError

Raised if the Datastore does not support the given transfer mode.

DatasetTypeNotSupportedError

Raised if one or more files to be ingested have a dataset type that is not supported by the Datastore.

FileNotFoundError

Raised if one of the given files does not exist.

FileExistsError

Raised if transfer is not None but the (internal) location the file would be moved to is already occupied.

Notes

This operation is not fully exception safe: if a database operation fails, the given FileDataset instances may be only partially updated.

It is atomic in terms of database operations (they will either all succeed or all fail) providing the database engine implements transactions correctly. It will attempt to be atomic in terms of filesystem operations as well, but this cannot be implemented rigorously for most datastores.
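
Examples

A minimal sketch, assuming a DatasetRef ref describing the file has already been constructed for a dataset type known to the registry; the path, run name, and transfer mode are illustrative:

from lsst.daf.butler import FileDataset

# ``ref`` is assumed to be a DatasetRef describing the on-disk file.
dataset = FileDataset(path="/data/raw/file.fits", refs=[ref])
butler.ingest(dataset, transfer="symlink", run="u/alice/raw-ingest")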

isWriteable() → bool

Return True if this Butler supports write operations.

static makeRepo(root: str, config: Union[lsst.daf.butler.core.config.Config, str, None] = None, standalone: bool = False, createRegistry: bool = True, searchPaths: Optional[List[str]] = None, forceConfigRoot: bool = True, outfile: Optional[str] = None, overwrite: bool = False) → lsst.daf.butler.core.config.Config

Create an empty data repository by adding a butler.yaml config to a repository root directory.

Parameters:
root : str or ButlerURI

Path or URI to the root location of the new repository. Will be created if it does not exist.

config : Config or str, optional

Configuration to write to the repository, after setting any root-dependent Registry or Datastore config options. Cannot be a ButlerConfig or a ConfigSubset. If None, default configuration will be used. Root-dependent config options specified in this config are overwritten if forceConfigRoot is True.

standalone : bool

If True, write all expanded defaults, not just customized or repository-specific settings. This (mostly) decouples the repository from the default configuration, insulating it from changes to the defaults (which may be good or bad, depending on the nature of the changes). Future additions to the defaults will still be picked up when initializing Butlers to repos created with standalone=True.

createRegistry : bool, optional

If True create a new Registry.

searchPaths : list of str, optional

Directory paths to search when calculating the full butler configuration.

forceConfigRoot : bool, optional

If False, any values present in the supplied config that would normally be reset are not overridden and will appear directly in the output config. This allows non-standard overrides of the root directory for a datastore or registry to be given. If this parameter is True the values for root will be forced into the resulting config if appropriate.

outfile : str, optional

If not None, the output configuration will be written to this location rather than into the repository itself. Can be a URI string. Can refer to a directory that will be used to write butler.yaml.

overwrite : bool, optional

Create a new configuration file even if one already exists in the specified output location. Default is to raise an exception.

Returns:
config : Config

The updated Config instance written to the repo.

Raises:
ValueError

Raised if a ButlerConfig or ConfigSubset is passed instead of a regular Config (as these subclasses would make it impossible to support standalone=False).

FileExistsError

Raised if the output config file already exists.

os.error

Raised if the directory does not exist, exists but is not a directory, or cannot be created.

Notes

Note that when standalone=False (the default), the configuration search path (see ConfigSubset.defaultSearchPaths) that was used to construct the repository should also be used to construct any Butlers to avoid configuration inconsistencies.
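
Examples

A minimal sketch of creating a new repository and then constructing a writeable Butler against it (the path is illustrative):

from lsst.daf.butler import Butler

config = Butler.makeRepo("/path/to/new_repo")
butler = Butler("/path/to/new_repo", writeable=True)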

pruneCollection(name: str, purge: bool = False, unstore: bool = False)

Remove a collection and possibly prune datasets within it.

Parameters:
name : str

Name of the collection to remove. If this is a TAGGED or CHAINED collection, datasets within the collection are not modified unless unstore is True. If this is a RUN collection, purge and unstore must be True, and all datasets in it are fully removed from the data repository.

purge : bool, optional

If True, permit RUN collections to be removed, fully removing datasets within them. Requires unstore=True as well as an added precaution against accidental deletion. Must be False (default) if the collection is not a RUN.

unstore : bool, optional

If True, remove all datasets in the collection from all datastores in which they appear.

Raises:
TypeError

Raised if the butler is read-only or arguments are mutually inconsistent.
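
Examples

A minimal sketch; the collection names are illustrative:

# Remove a TAGGED or CHAINED collection without touching its datasets.
butler.pruneCollection("u/alice/my-tagged-selection")

# Fully remove a RUN collection and the datasets within it; both flags
# are required for RUN collections.
butler.pruneCollection("u/alice/DM-50000/a", purge=True, unstore=True)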

pruneDatasets(refs: Iterable[lsst.daf.butler.core.datasets.ref.DatasetRef], *, disassociate: bool = True, unstore: bool = False, tags: Optional[Iterable[str]] = None, purge: bool = False, run: Optional[str] = None)

Remove one or more datasets from a collection and/or storage.

Parameters:
refs : Iterable of DatasetRef

Datasets to prune. These must be “resolved” references (not just a DatasetType and data ID).

disassociate : bool, optional

Disassociate pruned datasets from self.tags (or the collections given via the tags argument).

unstore : bool, optional

If True (False is default) remove these datasets from all datastores known to this butler. Note that this will make it impossible to retrieve these datasets even via other collections. Datasets that are already not stored are ignored by this option.

tags : Iterable [ str ], optional

TAGGED collections to disassociate the datasets from, overriding self.tags. Ignored if disassociate is False or purge is True.

purge : bool, optional

If True (False is default), completely remove the dataset from the Registry. To prevent accidental deletions, purge may only be True if all of the following conditions are met:

  • All given datasets are in the given run;
  • disassociate is True;
  • unstore is True.

This mode may remove provenance information from datasets other than those provided, and should be used with extreme care.

run : str, optional

RUN collection to purge from, overriding self.run. Ignored unless purge is True.

Raises:
TypeError

Raised if the butler is read-only, if no collection was provided, or the conditions for purge=True were not met.
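
Examples

A minimal sketch that removes the files for a query result from the datastores while keeping the registry entries (the dataset type name is illustrative):

refs = list(butler.registry.queryDatasets("calexp"))
butler.pruneDatasets(refs, disassociate=False, unstore=True)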

put(obj: Any, datasetRefOrType: Union[lsst.daf.butler.core.datasets.ref.DatasetRef, lsst.daf.butler.core.datasets.type.DatasetType, str], dataId: Union[lsst.daf.butler.core.dimensions.coordinate.DataCoordinate, Mapping[str, Any], None] = None, *, run: Optional[str] = None, tags: Optional[Iterable[str]] = None, **kwds) → lsst.daf.butler.core.datasets.ref.DatasetRef

Store and register a dataset.

Parameters:
obj : object

The dataset.

datasetRefOrType : DatasetRef, DatasetType, or str

When DatasetRef is provided, dataId should be None. Otherwise the DatasetType or name thereof.

dataId : dict or DataCoordinate

A dict of Dimension link name, value pairs that label the DatasetRef within a Collection. When None, a DatasetRef should be provided as the second argument.

run : str, optional

The name of the run the dataset should be added to, overriding self.run.

tags : Iterable [ str ], optional

The names of TAGGED collections to associate the dataset with, overriding self.tags. These collections must have already been added to the Registry.

kwds

Additional keyword arguments used to augment or construct a DataCoordinate. See DataCoordinate.standardize parameters.

Returns:
ref : DatasetRef

A reference to the stored dataset, updated with the correct id if given.

Raises:
TypeError

Raised if the butler is read-only or if no run has been provided.
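
Examples

A minimal sketch; catalog stands in for an in-memory object compatible with the dataset type's StorageClass, and the dataset type name and data ID are illustrative:

# Write the object to self.run (or the run given here) and register it;
# the returned DatasetRef is resolved.
ref = butler.put(catalog, "src", instrument="HSC", visit=903334, detector=16)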

transaction()

Context manager supporting Butler transactions.

Transactions can be nested.
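
Examples

A minimal sketch; if the second put fails, the first is rolled back as well (the objects, dataset type names, and data ID are illustrative):

with butler.transaction():
    butler.put(calexp, "calexp", dataId)
    butler.put(background, "calexpBackground", dataId)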

validateConfiguration(logFailures: bool = False, datasetTypeNames: Optional[Iterable[str]] = None, ignore: Optional[Iterable[str]] = None)

Validate butler configuration.

Checks that each DatasetType can be stored in the Datastore.

Parameters:
logFailures : bool, optional

If True, output a log message for every validation error detected.

datasetTypeNames : iterable of str, optional

The DatasetType names that should be checked. This allows only a subset to be selected.

ignore : iterable of str, optional

Names of DatasetTypes to skip over. This can be used to skip known problems. If a named DatasetType corresponds to a composite, all components of that DatasetType will also be ignored.

Raises:
ButlerValidationError

Raised if there is some inconsistency with how this Butler is configured.
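
Examples

A minimal sketch; the dataset type names are illustrative:

# Log a message for each dataset type that cannot be stored as configured.
butler.validateConfiguration(logFailures=True,
                             datasetTypeNames=["calexp", "src"])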