Butler

class lsst.daf.butler.Butler(config=None, butler=None, collection=None, run=None, searchPaths=None)

Bases: object

Main entry point for the data access system.

Parameters
config : ButlerConfig, Config or str, optional

Configuration. Anything acceptable to the ButlerConfig constructor. If a directory path is given, the configuration will be read from a butler.yaml file in that location. If None is given, default values will be used.

butler : Butler, optional

If provided, construct a new Butler that uses the same registry and datastore as the given one, but with the given collection and run. Incompatible with the config and searchPaths arguments.

collection : str, optional

Collection to use for all input lookups, overriding config["collection"] if provided.

run : str, Run, optional

Collection associated with the Run to use for outputs, overriding config["run"]. If a Run associated with the given Collection does not exist, it will be created. If “collection” is None, this collection will be used for input lookups as well; if not, it must have the same value as “run”.

searchPaths : list of str, optional

Directory paths to search when calculating the full Butler configuration. Not used if the supplied config is already a ButlerConfig.

Raises
ValueError

Raised if neither “collection” nor “run” is provided by argument or config, or if both are provided and are inconsistent.
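The interaction between “collection” and “run” described above can be summarized in a few lines. The following is only a sketch of the documented rule, not the library's own code: the function name resolve_collection_and_run is hypothetical, and both values are treated as plain collection names (a Run object is not modelled).

```python
from typing import Optional, Tuple


def resolve_collection_and_run(
    collection: Optional[str], run: Optional[str]
) -> Tuple[str, Optional[str]]:
    """Model the documented rule; hypothetical helper, not Butler code."""
    if collection is None and run is None:
        raise ValueError("Either 'collection' or 'run' must be provided.")
    if collection is None:
        # With no explicit input collection, the run's collection is
        # used for input lookups as well.
        collection = run
    elif run is not None and collection != run:
        # Both were given, so they must name the same collection.
        raise ValueError(
            f"Inconsistent 'collection' ({collection!r}) and 'run' ({run!r})."
        )
    return collection, run
```

A result whose second element is None corresponds to a Butler constructed without a Run, which the put and ingest entries below describe as read-only.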

Attributes
config : str, ButlerConfig or Config, optional

Configuration, or the path to a configuration file. If this is not a ButlerConfig, defaults will be read. If a str, it may be the path to a directory containing a “butler.yaml” file.

datastore : Datastore

Datastore to use for storage.

registry : Registry

Registry to use for lookups.

Attributes Summary

GENERATION

This is a Generation 3 Butler.

Methods Summary

datasetExists(datasetRefOrType[, dataId])

Return True if the Dataset is actually present in the Datastore.

export(*[, directory, filename, format, …])

Export datasets from the repository represented by this Butler.

get(datasetRefOrType[, dataId, parameters])

Retrieve a stored dataset.

getDeferred(datasetRefOrType[, dataId, …])

Create a DeferredDatasetHandle which can later retrieve a dataset.

getDirect(ref[, parameters])

Retrieve a stored dataset.

getUri(datasetRefOrType[, dataId, predict])

Return the URI to the Dataset.

import_(*[, directory, filename, format, …])

Import datasets exported from a different butler repository.

ingest(*datasets[, transfer])

Store and register one or more datasets that already exist on disk.

makeRepo(root[, config, standalone, …])

Create an empty data repository by adding a butler.yaml config to a repository root directory.

put(obj, datasetRefOrType[, dataId, producer])

Store and register a dataset.

remove(datasetRefOrType[, dataId, delete, …])

Remove a dataset from the collection and possibly the repository.

transaction()

Context manager supporting Butler transactions.

validateConfiguration([logFailures, …])

Validate butler configuration.

Attributes Documentation

GENERATION = 3

This is a Generation 3 Butler.

This attribute may be removed in the future, once the Generation 2 Butler interface has been fully retired; it should only be used in transitional code.

Methods Documentation

datasetExists(datasetRefOrType, dataId=None, **kwds)

Return True if the Dataset is actually present in the Datastore.

Parameters
datasetRefOrType : DatasetRef, DatasetType, or str

If a DatasetRef is provided, dataId should be None. Otherwise, the DatasetType or its name.

dataId : dict or DataCoordinate

A dict of Dimension link name, value pairs that label the DatasetRef within a Collection. When None, a DatasetRef should be provided as the first argument.

kwds

Additional keyword arguments used to augment or construct a DataCoordinate. See DataCoordinate.standardize parameters.

Raises
LookupError

Raised if the Dataset is not even present in the Registry.
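Because datasetExists distinguishes "unknown to the Registry" (LookupError) from "known but not stored" (False), a caller that only wants the data if it is really there has to handle both cases. A minimal sketch, using only the datasetExists and get signatures documented on this page; the helper name read_if_present is hypothetical, and butler is any object providing those two methods:

```python
def read_if_present(butler, dataset_type, data_id=None, **kwds):
    """Return the dataset, or None if it is missing from Registry or Datastore."""
    try:
        present = butler.datasetExists(dataset_type, data_id, **kwds)
    except LookupError:
        # The dataset is not known to the Registry at all.
        return None
    if not present:
        # Known to the Registry, but not actually stored in the Datastore.
        return None
    return butler.get(dataset_type, data_id, **kwds)
```

Collapsing both failure modes into None is a design choice made for brevity; code that needs to tell them apart should let the LookupError propagate.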

export(*, directory: Optional[str] = None, filename: Optional[str] = None, format: Optional[str] = None, transfer: Optional[str] = None) → AbstractContextManager[lsst.daf.butler.core.repoTransfers.RepoExport]

Export datasets from the repository represented by this Butler.

This method is a context manager that returns a helper object (RepoExport) that is used to indicate what information from the repository should be exported.

Parameters
directory : str, optional

Directory dataset files should be written to if transfer is not None.

filename : str, optional

Name for the file that will include database information associated with the exported datasets. If this is not an absolute path and directory is not None, it will be written to directory instead of the current working directory. Defaults to “export.{format}”.

format : str, optional

File format for the database information file. If None, the extension of filename will be used.

transfer : str, optional

Transfer mode passed to Datastore.export.

Raises
TypeError

Raised if the set of arguments passed is inconsistent.
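The defaults for directory, filename, and format interact, so it can help to see them worked through. The sketch below models only the rules stated in the parameter descriptions above; resolve_export_file is a hypothetical helper, and treating "neither filename nor format given" as the inconsistent case that raises TypeError is an assumption, not something this page states.

```python
import os
from typing import Optional, Tuple


def resolve_export_file(
    directory: Optional[str] = None,
    filename: Optional[str] = None,
    format: Optional[str] = None,
) -> Tuple[str, str]:
    """Return (path, format) following the documented defaults."""
    if format is None:
        if filename is None:
            # Assumption: with neither given there is nothing to infer from.
            raise TypeError("At least one of 'filename' or 'format' is required.")
        # Use the filename extension, without the leading dot.
        format = os.path.splitext(filename)[1].lstrip(".")
    elif filename is None:
        # Documented default file name.
        filename = f"export.{format}"
    if directory is not None and not os.path.isabs(filename):
        # Relative names are placed in the export directory.
        filename = os.path.join(directory, filename)
    return filename, format
```

For instance, resolve_export_file(directory="out", format="yaml") yields a path inside out named export.yaml.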

Examples

Typically the Registry.queryDimensions and Registry.queryDatasets methods are used to provide the iterables over data IDs and/or datasets to be exported:

with butler.export(filename="exports.yaml") as export:
    # Export all flats, and the calibration_label dimensions
    # associated with them.
    export.saveDatasets(butler.registry.queryDatasets("flat"),
                        elements=[butler.registry.dimensions["calibration_label"]])
    # Export all datasets that start with "deepCoadd_" and all of
    # their associated data ID information.
    export.saveDatasets(butler.registry.queryDatasets("deepCoadd_*"))

get(datasetRefOrType, dataId=None, parameters=None, **kwds)

Retrieve a stored dataset.

Parameters
datasetRefOrType : DatasetRef, DatasetType, or str

If a DatasetRef is provided, dataId should be None. Otherwise, the DatasetType or its name.

dataId : dict or DataCoordinate

A dict of Dimension link name, value pairs that label the DatasetRef within a Collection. When None, a DatasetRef should be provided as the first argument.

parameters : dict

Additional StorageClass-defined options to control reading, typically used to efficiently read only a subset of the dataset.

kwds

Additional keyword arguments used to augment or construct a DataCoordinate. See DataCoordinate.standardize parameters.

Returns
obj : object

The dataset.

getDeferred(datasetRefOrType: Union[lsst.daf.butler.core.datasets.DatasetRef, lsst.daf.butler.core.datasets.DatasetType, str], dataId: Union[lsst.daf.butler.core.dimensions.coordinate.DataCoordinate, Mapping[str, Any], None] = None, parameters: Optional[dict] = None, **kwds) → lsst.daf.butler.core.deferredDatasetHandle.DeferredDatasetHandle

Create a DeferredDatasetHandle which can later retrieve a dataset.

Parameters
datasetRefOrType : DatasetRef, DatasetType, or str

If a DatasetRef is provided, dataId should be None. Otherwise, the DatasetType or its name.

dataId : dict or DataCoordinate, optional

A dict of Dimension link name, value pairs that label the DatasetRef within a Collection. When None, a DatasetRef should be provided as the first argument.

parameters : dict

Additional StorageClass-defined options to control reading, typically used to efficiently read only a subset of the dataset.

kwds

Additional keyword arguments used to augment or construct a DataCoordinate. See DataCoordinate.standardize parameters.

Returns
obj : DeferredDatasetHandle

A handle which can be used to retrieve a dataset at a later time.
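Deferred handles are useful when many datasets are identified up front but only some end up being read. The sketch below uses the getDeferred signature shown above and assumes that the returned handle exposes a get() method that performs the read; that method is not described on this page, so treat it as an assumption. Both helper names are hypothetical.

```python
def deferred_handles(butler, dataset_type, data_ids):
    """Create one handle per data ID without reading any dataset yet."""
    return [butler.getDeferred(dataset_type, data_id) for data_id in data_ids]


def load_first(handles, predicate):
    """Load datasets one at a time; return the first that satisfies predicate."""
    for handle in handles:
        obj = handle.get()  # assumed API: the actual read happens here
        if predicate(obj):
            return obj
    return None
```

Handles after the first match are never read, which is where the savings come from.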

getDirect(ref, parameters=None)

Retrieve a stored dataset.

Unlike Butler.get, this method allows datasets outside the Butler’s collection to be read as long as the DatasetRef that identifies them can be obtained separately.

Parameters
ref : DatasetRef

Reference to an already stored dataset.

parameters : dict

Additional StorageClass-defined options to control reading, typically used to efficiently read only a subset of the dataset.

Returns
obj : object

The dataset.

getUri(datasetRefOrType, dataId=None, predict=False, **kwds)

Return the URI to the Dataset.

Parameters
datasetRefOrType : DatasetRef, DatasetType, or str

If a DatasetRef is provided, dataId should be None. Otherwise, the DatasetType or its name.

dataId : dict or DataCoordinate

A dict of Dimension link name, value pairs that label the DatasetRef within a Collection. When None, a DatasetRef should be provided as the first argument.

predict : bool

If True, allow URIs to be returned for datasets that have not yet been written.

kwds

Additional keyword arguments used to augment or construct a DataCoordinate. See DataCoordinate.standardize parameters.

Returns
uri : str

URI string pointing to the Dataset within the datastore. If the Dataset does not exist in the datastore and predict is True, the URI will be a prediction and will include the URI fragment “#predicted”. If the datastore does not have entities that relate well to the concept of a URI, the returned URI string will be descriptive. The returned URI is not guaranteed to be obtainable.

Raises
FileNotFoundError

Raised if a URI has been requested for a dataset that does not exist and predict is False.
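Since a predicted URI is marked only by its “#predicted” fragment, callers that pass predict=True usually need to check for it before trying to open the location. A small sketch using the standard library; split_predicted is a hypothetical helper name:

```python
from urllib.parse import urldefrag


def split_predicted(uri: str):
    """Return (uri_without_fragment, is_predicted) for a URI from getUri."""
    base, fragment = urldefrag(uri)
    return base, fragment == "predicted"
```

This only inspects the string; as noted above, even a non-predicted URI is not guaranteed to be obtainable.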

import_(*, directory: Optional[str] = None, filename: Optional[str] = None, format: Optional[str] = None, transfer: Optional[str] = None)

Import datasets exported from a different butler repository.

Parameters
directory : str, optional

Directory containing dataset files. If None, all file paths must be absolute.

filename : str, optional

Name of the file containing database information associated with the exported datasets. If this is not an absolute path, does not exist in the current working directory, and directory is not None, it is assumed to be in directory. Defaults to “export.{format}”.

format : str, optional

File format for the database information file. If None, the extension of filename will be used.

transfer : str, optional

Transfer mode passed to Datastore.ingest.

Raises
TypeError

Raised if the set of arguments passed is inconsistent.

ingest(*datasets: lsst.daf.butler.core.repoTransfers.FileDataset, transfer: Optional[str] = None)

Store and register one or more datasets that already exist on disk.

Parameters
datasets : FileDataset

Each positional argument is a struct containing information about a file to be ingested, including its path (either absolute or relative to the datastore root, if applicable), a DatasetRef, and optionally a formatter class or its fully-qualified string name. If a formatter is not provided, the formatter that would be used for put is assumed. On return, all FileDataset.ref attributes will have their DatasetRef.id attribute populated and all FileDataset.formatter attributes will be set to the formatter class used. FileDataset.path attributes may be modified to put paths in whatever the datastore considers a standardized form.

transfer : str, optional

If not None, must be one of 'move', 'copy', 'hardlink', or 'symlink', indicating how to transfer the file.

Raises
TypeError

Raised if the butler was not constructed with a Run, and is hence read-only.

NotImplementedError

Raised if the Datastore does not support the given transfer mode.

DatasetTypeNotSupportedError

Raised if one or more files to be ingested have a dataset type that is not supported by the Datastore.

FileNotFoundError

Raised if one of the given files does not exist.

FileExistsError

Raised if transfer is not None but the (internal) location the file would be moved to is already occupied.
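Because an unsupported transfer mode is reported with NotImplementedError, a caller can try cheaper modes first and fall back to more expensive ones. The following sketch relies only on the ingest signature and exceptions listed above; ingest_with_fallback and its default mode order are hypothetical, and it assumes that a call rejected with NotImplementedError has ingested nothing.

```python
def ingest_with_fallback(butler, datasets, modes=("hardlink", "symlink", "copy")):
    """Try each transfer mode in turn; return the one that succeeded."""
    for mode in modes:
        try:
            butler.ingest(*datasets, transfer=mode)
        except NotImplementedError:
            # This Datastore does not support the mode; try the next one.
            continue
        return mode
    raise NotImplementedError(f"None of the transfer modes {modes!r} is supported.")
```

Other exceptions (FileNotFoundError, FileExistsError, TypeError) are deliberately left to propagate, since retrying with a different mode would not fix them.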

static makeRepo(root, config=None, standalone=False, createRegistry=True, searchPaths=None, forceConfigRoot=True, outfile=None)

Create an empty data repository by adding a butler.yaml config to a repository root directory.

Parameters
root : str

Filesystem path to the root of the new repository. Will be created if it does not exist.

config : Config or str, optional

Configuration to write to the repository, after setting any root-dependent Registry or Datastore config options. Cannot be a ButlerConfig or a ConfigSubset. If None, default configuration will be used. Root-dependent config options specified in this config are overwritten if forceConfigRoot is True.

standalone : bool

If True, write all expanded defaults, not just customized or repository-specific settings. This (mostly) decouples the repository from the default configuration, insulating it from changes to the defaults (which may be good or bad, depending on the nature of the changes). Future additions to the defaults will still be picked up when initializing Butlers to repos created with standalone=True.

createRegistry : bool, optional

If True, create a new Registry.

searchPaths : list of str, optional

Directory paths to search when calculating the full butler configuration.

forceConfigRoot : bool, optional

If False, any values present in the supplied config that would normally be reset are not overridden and will appear directly in the output config. This allows non-standard overrides of the root directory for a datastore or registry to be given. If this parameter is True the values for root will be forced into the resulting config if appropriate.

outfile : str, optional

If not None, the output configuration will be written to this location rather than into the repository itself.

Returns
config : Config

The updated Config instance written to the repo.

Raises
ValueError

Raised if a ButlerConfig or ConfigSubset is passed instead of a regular Config (as these subclasses would make it impossible to support standalone=False).

os.error

Raised if the directory does not exist, exists but is not a directory, or cannot be created.

Notes

Note that when standalone=False (the default), the configuration search path (see ConfigSubset.defaultSearchPaths) that was used to construct the repository should also be used to construct any Butlers to avoid configuration inconsistencies.
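The note above amounts to passing the same searchPaths to makeRepo and to the Butler constructor. A minimal sketch of that pattern, using only the signatures documented on this page: create_and_open is a hypothetical helper, and butler_cls is expected to be lsst.daf.butler.Butler, passed in as an argument so the sketch carries no import-time dependency.

```python
def create_and_open(butler_cls, root, run, search_paths=None):
    """Create a repository at root and open a Butler on it consistently."""
    # Write butler.yaml into the new repository root.
    butler_cls.makeRepo(root, searchPaths=search_paths)
    # Reuse the same search paths so the defaults resolve identically.
    return butler_cls(root, run=run, searchPaths=search_paths)
```

Passing the repository root as the first constructor argument relies on the documented behavior that a directory path is read as the location of a butler.yaml file.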

put(obj, datasetRefOrType, dataId=None, producer=None, **kwds)

Store and register a dataset.

Parameters
obj : object

The dataset.

datasetRefOrType : DatasetRef, DatasetType, or str

When DatasetRef is provided, dataId should be None. Otherwise the DatasetType or name thereof.

dataId : dict or DataCoordinate

A dict of Dimension link name, value pairs that label the DatasetRef within a Collection. When None, a DatasetRef should be provided as the second argument.

producer : Quantum, optional

The Quantum that produced the dataset, if any.

kwds

Additional keyword arguments used to augment or construct a DataCoordinate. See DataCoordinate.standardize parameters.

Returns
ref : DatasetRef

A reference to the stored dataset, updated with the correct id if given.

Raises
TypeError

Raised if the butler was not constructed with a Run, and is hence read-only.
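The DatasetRef returned by put can be handed straight to getDirect, which avoids a second lookup by dataset type and data ID. A short sketch using only the put and getDirect signatures documented here; put_and_read_back is a hypothetical helper and butler is any object providing those two methods:

```python
def put_and_read_back(butler, obj, dataset_type, data_id=None, **kwds):
    """Store obj, then read it back through the returned DatasetRef."""
    # Raises TypeError if the butler was constructed without a Run.
    ref = butler.put(obj, dataset_type, data_id, **kwds)
    return ref, butler.getDirect(ref)
```

A round trip like this is mainly useful in tests, for example to check that a formatter preserves the object.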

remove(datasetRefOrType, dataId=None, *, delete=True, remember=True, **kwds)

Remove a dataset from the collection and possibly the repository.

The identified dataset is always at least removed from the Butler’s collection. By default it is also deleted from the Datastore (e.g. files are actually deleted), but the dataset is “remembered” by retaining its row in the dataset and provenance tables in the registry.

If the dataset is a composite, all components will also be removed.

Parameters
datasetRefOrType : DatasetRef, DatasetType, or str

If a DatasetRef is provided, dataId should be None. Otherwise, the DatasetType or its name.

dataId : dict or DataCoordinate

A dict of Dimension link name, value pairs that label the DatasetRef within a Collection. When None, a DatasetRef should be provided as the first argument.

delete : bool

If True (default) actually delete the dataset from the Datastore (i.e. actually remove files).

remember : bool

If True (default), retain dataset and provenance records in the Registry for this dataset.

kwds

Additional keyword arguments used to augment or construct a DataCoordinate. See DataCoordinate.standardize parameters.

Raises
ValueError

Raised if delete and remember are both False; a dataset cannot remain in a Datastore if all of its Registry entries are removed.

OrphanedRecordError

Raised if remember is False but the dataset is still present in a Datastore not recognized by this Butler client.
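The four combinations of delete and remember can be laid out explicitly. This is only a model of the behavior described above, written to make the combinations easy to compare; describe_removal is a hypothetical function and performs no removal itself.

```python
def describe_removal(delete: bool = True, remember: bool = True) -> str:
    """Describe what remove() is documented to do for the given flags."""
    if not delete and not remember:
        # A dataset cannot stay in a Datastore with no Registry entry.
        raise ValueError("delete and remember cannot both be False.")
    steps = ["disassociated from the Butler's collection"]
    if delete:
        steps.append("deleted from the Datastore")
    steps.append(
        "Registry records retained" if remember else "Registry records removed"
    )
    return "; ".join(steps)
```

With the defaults, the result lists all three effects: the dataset leaves the collection, its files are deleted, and its Registry records are kept.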

transaction()

Context manager supporting Butler transactions.

Transactions can be nested.
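A typical use is to group several put calls so that they succeed or fail together. The sketch below assumes that an exception escaping the with block undoes the work done inside it; this page does not spell out the rollback semantics, so treat that as an assumption. put_all is a hypothetical helper.

```python
def put_all(butler, items):
    """Store every (obj, dataset_type, data_id) triple in one transaction."""
    with butler.transaction():
        # Any exception raised by put leaves the block and, under the
        # stated assumption, rolls back the earlier puts as well.
        return [
            butler.put(obj, dataset_type, data_id)
            for obj, dataset_type, data_id in items
        ]
```

Since transactions can be nested, put_all can itself be called from inside another transaction block.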

validateConfiguration(logFailures=False, datasetTypeNames=None, ignore=None)

Validate butler configuration.

Checks that each DatasetType can be stored in the Datastore.

Parameters
logFailures : bool, optional

If True, output a log message for every validation error detected.

datasetTypeNames : iterable of str, optional

The DatasetType names that should be checked. This allows only a subset to be selected.

ignore : iterable of str, optional

Names of DatasetTypes to skip over. This can be used to skip known problems. If a named DatasetType corresponds to a composite, all components of that DatasetType will also be ignored.

Raises
ButlerValidationError

Raised if there is some inconsistency with how this Butler is configured.