Butler
-
class
lsst.daf.butler.
Butler
(config: Union[lsst.daf.butler.core.config.Config, str, None] = None, *, butler: Optional[lsst.daf.butler._butler.Butler] = None, collections: Optional[Any] = None, run: Optional[str] = None, tags: Iterable[str] = (), chains: Optional[Mapping[str, Any]] = None, searchPaths: Optional[List[str]] = None, writeable: Optional[bool] = None)¶ Bases:
object
Main entry point for the data access system.
Parameters: - config :
ButlerConfig
,Config
orstr
, optional. Configuration. Anything acceptable to the
ButlerConfig
constructor. If a directory path is given, the configuration will be read from a butler.yaml
file in that location. If None
is given, default values will be used. - butler :
Butler
, optional. If provided, construct a new Butler that uses the same registry and datastore as the given one, but with the given collection and run. Incompatible with the
config
,searchPaths
, andwriteable
arguments.- collections :
Any
, optional An expression specifying the collections to be searched (in order) when reading datasets, and optionally dataset type restrictions on them. This may be: - a
str
collection name; - a tuple of (collection name, dataset type restriction); - an iterable of either of the above; - a mapping fromstr
to dataset type restriction.
See Collection expressions for more information, including the definition of a dataset type restriction. All collections must either already exist or be specified to be created by other arguments.
- run :
str
, optional Name of the run datasets should be output to. If the run does not exist, it will be created. If
collections
isNone
, it will be set to[run]
. If this is not set (andwriteable
is not set either), a read-only butler will be created.- tags :
Iterable
[str
], optional A list of
TAGGED
collections that datasets should be associated with in put
or ingest
and disassociated from in prune
. If any of these collections does not exist, it will be created.- chains :
Mapping
[str
,Iterable
[str
] ], optional A mapping from the names of new
CHAINED
collections to an expression identifying their child collections (which takes the same form as the collections
argument). Chains may be nested only if children precede their parents in this mapping. - searchPaths :
list
ofstr
, optional Directory paths to search when calculating the full Butler configuration. Not used if the supplied config is already a
ButlerConfig
.- writeable :
bool
, optional Explicitly sets whether the butler supports write operations. If not provided, a read-write butler is created if any of
run
,tags
, orchains
is non-empty.
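The defaulting rules described for collections, run and writeable can be sketched in plain Python. This is an illustrative model only, not the Butler constructor's actual code: the function name resolve_butler_defaults and its return format are hypothetical, and the handling of the case where both collections and run are None is an assumption.

```python
from typing import Any, Dict, Iterable, Mapping, Optional


def resolve_butler_defaults(
    collections: Optional[Any] = None,
    run: Optional[str] = None,
    tags: Iterable[str] = (),
    chains: Optional[Mapping[str, Any]] = None,
    writeable: Optional[bool] = None,
) -> Dict[str, Any]:
    """Toy model of the documented defaults (hypothetical helper)."""
    tags = tuple(tags)
    chains = dict(chains) if chains is not None else {}
    # Documented rule: if collections is None, it is set to [run].
    # Assumption: when run is also None, collections simply stays None.
    if collections is None and run is not None:
        collections = [run]
    # Documented rule: without an explicit writeable, the butler is
    # read-write if any of run, tags, or chains is non-empty.
    if writeable is None:
        writeable = bool(run) or bool(tags) or bool(chains)
    return {"collections": collections, "run": run, "writeable": writeable}


read_only = resolve_butler_defaults(collections=["u/alice/DM-50000"])
read_write = resolve_butler_defaults(run="u/alice/DM-50000/a")
```

In this model, read_only ends up with writeable False, while read_write searches its own run and is writeable.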
Examples
While there are many ways to control exactly how a
Butler
interacts with the collections in itsRegistry
, the most common cases are still simple.For a read-only
Butler
that searches one collection, do:butler = Butler("/path/to/repo", collections=["u/alice/DM-50000"])
For a read-write
Butler
that writes to and reads from aRUN
collection:butler = Butler("/path/to/repo", run="u/alice/DM-50000/a")
The
Butler
passed to aPipelineTask
is often much more complex, because we want to write to oneRUN
collection but read from several others (as well), while defining a newCHAINED
collection that combines them all:butler = Butler("/path/to/repo", run="u/alice/DM-50000/a", collections=["u/alice/DM-50000"], chains={ "u/alice/DM-50000": ["u/alice/DM-50000/a", "u/bob/DM-49998", "raw/hsc"] })
This butler will
put
new datasets to the runu/alice/DM-50000/a
, but they’ll also be available from the chained collectionu/alice/DM-50000
. Datasets will be read first from that run (since it appears first in the chain), and then fromu/bob/DM-49998
and finallyraw/hsc
. Ifu/alice/DM-50000
had already been defined, the chains
argument would be unnecessary. We could also construct a butler that performs exactly the same put
and get
operations without actually creating a chained collection, just by passing multiple items in collections
:butler = Butler("/path/to/repo", run="u/alice/DM-50000/a", collections=["u/alice/DM-50000/a", "u/bob/DM-49998", "raw/hsc"])
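The ordered search described above (first the run, then u/bob/DM-49998, then raw/hsc) can be pictured with a small stand-alone model. This is only a sketch of the lookup order, not how the Registry resolves datasets; the function find_first, the TOY_CONTENTS dictionary, and the dataset keys in it are all made up for illustration.

```python
from typing import Any, Dict, Sequence, Tuple

# Hypothetical contents of three collections, keyed by a made-up dataset label.
TOY_CONTENTS: Dict[str, Dict[str, str]] = {
    "u/alice/DM-50000/a": {"calexp@903334": "alice-v2"},
    "u/bob/DM-49998": {"calexp@903334": "bob-v1", "src@903334": "bob-src"},
    "raw/hsc": {"raw@903334": "raw-exposure"},
}

SEARCH_ORDER = ["u/alice/DM-50000/a", "u/bob/DM-49998", "raw/hsc"]


def find_first(
    dataset_key: str,
    collections: Sequence[str],
    contents: Dict[str, Dict[str, Any]],
) -> Tuple[str, Any]:
    """Return (collection, dataset) from the first collection holding the key."""
    for name in collections:
        datasets = contents.get(name, {})
        if dataset_key in datasets:
            return name, datasets[dataset_key]
    raise LookupError(f"{dataset_key!r} not found in any of {list(collections)}")


# The same key exists in two collections; the earlier one in the list wins.
winner = find_first("calexp@903334", SEARCH_ORDER, TOY_CONTENTS)
fallback = find_first("src@903334", SEARCH_ORDER, TOY_CONTENTS)
```

Reordering SEARCH_ORDER changes which copy of a dataset is found, which is why the order of the collections argument matters.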
Finally, one can always create a
Butler
with no collections:butler = Butler("/path/to/repo", writeable=True)
This can be extremely useful when you just want to use
butler.registry
, e.g. for inserting dimension data or managing collections, or when the collections you want to use with the butler are not consistent. Passing writeable
explicitly here is only necessary if you want to be able to make changes to the repo; usually the value for writeable
can be guessed from the collection arguments provided, but it defaults to False
when there are no collection arguments.
Attributes Summary
GENERATION
This is a Generation 3 Butler. Methods Summary
datasetExists
(datasetRefOrType, …)Return True if the Dataset is actually present in the Datastore. export
(*, directory, filename, format, transfer)Export datasets from the repository represented by this Butler
.get
(datasetRefOrType, …)Retrieve a stored dataset. getDeferred
(datasetRefOrType, …)Create a DeferredDatasetHandle
which can later retrieve a dataset. getDirect
(ref, *, parameters) Retrieve a stored dataset. getUri
(datasetRefOrType, …)Return the URI to the Dataset. import_
(*, directory, filename, format, transfer)Import datasets exported from a different butler repository. ingest
(*datasets, transfer, run, tags)Store and register one or more datasets that already exist on disk. isWriteable
()Return True
if thisButler
supports write operations.makeRepo
(root, config, …) Create an empty data repository by adding a butler.yaml config to a repository root directory. prune
(refs, *, disassociate, unstore, tags, …)Remove one or more datasets from a collection and/or storage. put
(obj, datasetRefOrType, …)Store and register a dataset. transaction
()Context manager supporting Butler
transactions.validateConfiguration
(logFailures, …)Validate butler configuration. Attributes Documentation
-
GENERATION
= 3 This is a Generation 3 Butler.
This attribute may be removed in the future, once the Generation 2 Butler interface has been fully retired; it should only be used in transitional code.
Methods Documentation
-
datasetExists
(datasetRefOrType: Union[lsst.daf.butler.core.datasets.ref.DatasetRef, lsst.daf.butler.core.datasets.type.DatasetType, str], dataId: Union[lsst.daf.butler.core.dimensions.coordinate.DataCoordinate, Mapping[str, Any], None] = None, *, collections: Optional[Any] = None, **kwds) → bool¶ Return True if the Dataset is actually present in the Datastore.
Parameters: - datasetRefOrType :
DatasetRef
,DatasetType
, orstr
When
DatasetRef
thedataId
should beNone
. Otherwise theDatasetType
or name thereof.- dataId :
dict
orDataCoordinate
A
dict
ofDimension
link name, value pairs that label theDatasetRef
within a Collection. WhenNone
, aDatasetRef
should be provided as the first argument.- collections : Any, optional
Collections to be searched, overriding
self.collections
. Can be any of the types supported by thecollections
argument to butler construction.- kwds
Additional keyword arguments used to augment or construct a
DataCoordinate
. SeeDataCoordinate.standardize
parameters.
Raises: - LookupError
Raised if the dataset is not even present in the Registry.
- ValueError
Raised if a resolved
DatasetRef
was passed as an input, but it differs from the one found in the registry.- TypeError
Raised if no collections were provided.
-
export
(*, directory: Optional[str] = None, filename: Optional[str] = None, format: Optional[str] = None, transfer: Optional[str] = None) → AbstractContextManager[lsst.daf.butler.core.repoTransfers.RepoExport]¶ Export datasets from the repository represented by this
Butler
.This method is a context manager that returns a helper object (
RepoExport
) that is used to indicate what information from the repository should be exported.Parameters: - directory :
str
, optional Directory dataset files should be written to if
transfer
is notNone
.- filename :
str
, optional Name for the file that will include database information associated with the exported datasets. If this is not an absolute path and
directory
is notNone
, it will be written todirectory
instead of the current working directory. Defaults to “export.{format}”.- format :
str
, optional File format for the database information file. If
None
, the extension offilename
will be used.- transfer :
str
, optional Transfer mode passed to
Datastore.export
.
Raises: - TypeError
Raised if the set of arguments passed is inconsistent.
Examples
Typically the
Registry.queryDimensions
andRegistry.queryDatasets
methods are used to provide the iterables over data IDs and/or datasets to be exported:with butler.export("exports.yaml") as export: # Export all flats, and the calibration_label dimensions # associated with them. export.saveDatasets(butler.registry.queryDatasets("flat"), elements=[butler.registry.dimensions["calibration_label"]]) # Export all datasets that start with "deepCoadd_" and all of # their associated data ID information. export.saveDatasets(butler.registry.queryDatasets("deepCoadd_*"))
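How filename, format and directory interact can be sketched as a small pure-Python function. This is a guess at the logic based only on the parameter descriptions above, not the library's implementation: resolve_export_file is a hypothetical name, and raising TypeError when neither filename nor format is given is an assumption about what "inconsistent" arguments means.

```python
import os
from typing import Optional, Tuple


def resolve_export_file(
    directory: Optional[str] = None,
    filename: Optional[str] = None,
    format: Optional[str] = None,
) -> Tuple[str, str]:
    """Return (path, format) following the documented defaults (hypothetical helper)."""
    if format is None:
        if filename is None:
            # Assumption: nothing to infer the format from is an error.
            raise TypeError("At least one of 'filename' or 'format' is required.")
        # Documented rule: with no format, use the extension of filename.
        format = os.path.splitext(filename)[1].lstrip(".")
        if not format:
            raise TypeError(f"Cannot infer a format from {filename!r}.")
    elif filename is None:
        # Documented rule: filename defaults to "export.{format}".
        filename = f"export.{format}"
    # Documented rule: a relative filename is placed in directory, if given.
    if directory is not None and not os.path.isabs(filename):
        filename = os.path.join(directory, filename)
    return filename, format


inferred = resolve_export_file(filename="exports.yaml")
defaulted = resolve_export_file(directory="/tmp/out", format="yaml")
```

With these rules, the call butler.export("exports.yaml") in the example above would imply the yaml format without it being stated.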
-
get
(datasetRefOrType: Union[lsst.daf.butler.core.datasets.ref.DatasetRef, lsst.daf.butler.core.datasets.type.DatasetType, str], dataId: Union[lsst.daf.butler.core.dimensions.coordinate.DataCoordinate, Mapping[str, Any], None] = None, *, parameters: Optional[Dict[str, Any]] = None, collections: Optional[Any] = None, **kwds) → Any¶ Retrieve a stored dataset.
Parameters: - datasetRefOrType :
DatasetRef
,DatasetType
, orstr
When
DatasetRef
thedataId
should beNone
. Otherwise theDatasetType
or name thereof.- dataId :
dict
orDataCoordinate
A
dict
ofDimension
link name, value pairs that label theDatasetRef
within a Collection. WhenNone
, aDatasetRef
should be provided as the first argument.- parameters :
dict
Additional StorageClass-defined options to control reading, typically used to efficiently read only a subset of the dataset.
- collections : Any, optional
Collections to be searched, overriding
self.collections
. Can be any of the types supported by thecollections
argument to butler construction.- kwds
Additional keyword arguments used to augment or construct a
DataCoordinate
. SeeDataCoordinate.standardize
parameters.
Returns: - obj :
object
The dataset.
Raises: - ValueError
Raised if a resolved
DatasetRef
was passed as an input, but it differs from the one found in the registry.- LookupError
Raised if no matching dataset exists in the
Registry
.- TypeError
Raised if no collections were provided.
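Because get raises LookupError when no matching dataset exists, callers that treat a missing dataset as a normal case may want a thin wrapper. The sketch below is duck-typed and relies only on the get signature and the LookupError behaviour documented above; get_or_default is a hypothetical helper, and _StubButler is a stand-in so the example runs without a real repository.

```python
from typing import Any, Dict, Mapping, Optional, Tuple


def get_or_default(
    butler: Any,
    dataset_type: str,
    data_id: Optional[Mapping[str, Any]] = None,
    *,
    default: Any = None,
    collections: Optional[Any] = None,
    **kwds: Any,
) -> Any:
    """Call butler.get, returning `default` if no matching dataset exists."""
    try:
        return butler.get(dataset_type, data_id, collections=collections, **kwds)
    except LookupError:
        return default


class _StubButler:
    """Stand-in with a get() shaped like the documented one (illustration only)."""

    def __init__(self, stored: Dict[Tuple[str, tuple], Any]) -> None:
        self._stored = stored

    def get(self, datasetRefOrType, dataId=None, *, parameters=None,
            collections=None, **kwds):
        # Merge the data ID mapping with keyword arguments, as kwds "augment" it.
        merged = {**(dataId or {}), **kwds}
        key = (datasetRefOrType, tuple(sorted(merged.items())))
        if key not in self._stored:
            raise LookupError(f"No dataset for {key!r}")
        return self._stored[key]


stub = _StubButler({("calexp", (("detector", 16), ("visit", 903334))): "image"})
found = get_or_default(stub, "calexp", {"visit": 903334, "detector": 16})
missing = get_or_default(stub, "calexp", visit=1, detector=2, default="missing")
```

The wrapper deliberately lets ValueError and TypeError propagate, since those indicate a caller error rather than an absent dataset.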
-
getDeferred
(datasetRefOrType: Union[lsst.daf.butler.core.datasets.ref.DatasetRef, lsst.daf.butler.core.datasets.type.DatasetType, str], dataId: Union[lsst.daf.butler.core.dimensions.coordinate.DataCoordinate, Mapping[str, Any], None] = None, *, parameters: Optional[dict] = None, collections: Optional[Any] = None, **kwds) → lsst.daf.butler._deferredDatasetHandle.DeferredDatasetHandle¶ Create a
DeferredDatasetHandle
which can later retrieve a dataset. Parameters: - datasetRefOrType :
DatasetRef
,DatasetType
, orstr
When
DatasetRef
thedataId
should beNone
. Otherwise theDatasetType
or name thereof.- dataId :
dict
orDataCoordinate
, optional A
dict
ofDimension
link name, value pairs that label theDatasetRef
within a Collection. WhenNone
, aDatasetRef
should be provided as the first argument.- parameters :
dict
Additional StorageClass-defined options to control reading, typically used to efficiently read only a subset of the dataset.
- collections : Any, optional
Collections to be searched, overriding
self.collections
. Can be any of the types supported by thecollections
argument to butler construction.- kwds
Additional keyword arguments used to augment or construct a
DataId
. SeeDataId
parameters.
Returns: - obj :
DeferredDatasetHandle
A handle which can be used to retrieve a dataset at a later time.
Raises: - LookupError
Raised if no matching dataset exists in the
Registry
(andallowUnresolved is False
).- ValueError
Raised if a resolved
DatasetRef
was passed as an input, but it differs from the one found in the registry.- TypeError
Raised if no collections were provided.
-
getDirect
(ref: lsst.daf.butler.core.datasets.ref.DatasetRef, *, parameters: Optional[Dict[str, Any]] = None) Retrieve a stored dataset.
Unlike
Butler.get
, this method allows datasets outside the Butler’s collection to be read as long as theDatasetRef
that identifies them can be obtained separately.Parameters: - ref :
DatasetRef
Reference to an already stored dataset.
- parameters :
dict
Additional StorageClass-defined options to control reading, typically used to efficiently read only a subset of the dataset.
Returns: - obj :
object
The dataset.
-
getUri
(datasetRefOrType: Union[lsst.daf.butler.core.datasets.ref.DatasetRef, lsst.daf.butler.core.datasets.type.DatasetType, str], dataId: Union[lsst.daf.butler.core.dimensions.coordinate.DataCoordinate, Mapping[str, Any], None] = None, *, predict: bool = False, collections: Optional[Any] = None, run: Optional[str] = None, **kwds) → str¶ Return the URI to the Dataset.
Parameters: - datasetRefOrType :
DatasetRef
,DatasetType
, orstr
When
DatasetRef
thedataId
should beNone
. Otherwise theDatasetType
or name thereof.- dataId :
dict
orDataCoordinate
A
dict
ofDimension
link name, value pairs that label theDatasetRef
within a Collection. WhenNone
, aDatasetRef
should be provided as the first argument.- predict :
bool
If
True
, allow URIs to be returned of datasets that have not been written.- collections : Any, optional
Collections to be searched, overriding
self.collections
. Can be any of the types supported by thecollections
argument to butler construction.- run :
str
, optional Run to use for predictions, overriding
self.run
.- kwds
Additional keyword arguments used to augment or construct a
DataCoordinate
. SeeDataCoordinate.standardize
parameters.
Returns: - uri :
str
URI string pointing to the Dataset within the datastore. If the Dataset does not exist in the datastore, and if
predict
isTrue
, the URI will be a prediction and will include a URI fragment “#predicted”. If the datastore does not have entities that relate well to the concept of a URI the returned URI string will be descriptive. The returned URI is not guaranteed to be obtainable.
Raises: - LookupError
Raised if a URI has been requested for a dataset that does not exist and guessing is not allowed.
- ValueError
Raised if a resolved
DatasetRef
was passed as an input, but it differs from the one found in the registry.- TypeError
Raised if no collections were provided.
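Since a predicted URI is marked with the "#predicted" fragment, code that receives a string from getUri can check for it before trying to open anything. The sketch below uses only the standard library; split_predicted is a hypothetical helper name and the example URIs are invented.

```python
from typing import Tuple
from urllib.parse import urldefrag


def split_predicted(uri: str) -> Tuple[str, bool]:
    """Return (uri_without_fragment, is_predicted) for a URI string."""
    base, fragment = urldefrag(uri)
    return base, fragment == "predicted"


# Made-up URIs for illustration.
base, predicted = split_predicted("file:///repo/u/alice/calexp.fits#predicted")
_, stored = split_predicted("file:///repo/u/alice/calexp.fits")
```

A URI that passes this check still may not be readable, because the returned URI is not guaranteed to be obtainable.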
-
import_
(*, directory: Optional[str] = None, filename: Optional[str] = None, format: Optional[str] = None, transfer: Optional[str] = None) Import datasets exported from a different butler repository.
Parameters: - directory :
str
, optional Directory containing dataset files. If
None
, all file paths must be absolute.- filename :
str
, optional Name for the file that contains database information associated with the exported datasets. If this is not an absolute path, does not exist in the current working directory, and
directory
is notNone
, it is assumed to be indirectory
. Defaults to “export.{format}”.- format :
str
, optional File format for the database information file. If
None
, the extension offilename
will be used.- transfer :
str
, optional Transfer mode passed to
Datastore.export
.
Raises: - TypeError
Raised if the set of arguments passed is inconsistent, or if the butler is read-only.
-
ingest
(*datasets, transfer: Optional[str] = None, run: Optional[str] = None, tags: Optional[Iterable[str]] = None) Store and register one or more datasets that already exist on disk.
Parameters: - datasets :
FileDataset
Each positional argument is a struct containing information about a file to be ingested, including its path (either absolute or relative to the datastore root, if applicable), a
DatasetRef
, and optionally a formatter class or its fully-qualified string name. If a formatter is not provided, the formatter that would be used forput
is assumed. On successful return, allFileDataset.ref
attributes will have theirDatasetRef.id
attribute populated and allFileDataset.formatter
attributes will be set to the formatter class used.FileDataset.path
attributes may be modified to put paths in whatever the datastore considers a standardized form.- transfer :
str
, optional If not
None
, must be one of ‘auto’, ‘move’, ‘copy’, ‘hardlink’, ‘relsymlink’ or ‘symlink’, indicating how to transfer the file.- run :
str
, optional The name of the run ingested datasets should be added to, overriding
self.run
.- tags :
Iterable
[str
], optional The names of
TAGGED
collections to associate the dataset with, overridingself.tags
. These collections must have already been added to theRegistry
.
Raises: - TypeError
Raised if the butler is read-only or if no run was provided.
- NotImplementedError
Raised if the
Datastore
does not support the given transfer mode.- DatasetTypeNotSupportedError
Raised if one or more files to be ingested have a dataset type that is not supported by the
Datastore
. - FileNotFoundError
Raised if one of the given files does not exist.
- FileExistsError
Raised if transfer is not
None
but the (internal) location the file would be moved to is already occupied.
Notes
This operation is not fully exception safe: if a database operation fails, the given
FileDataset
instances may be only partially updated.
It is atomic in terms of database operations (they will either all succeed or all fail), provided the database engine implements transactions correctly. It will attempt to be atomic in terms of filesystem operations as well, but this cannot be implemented rigorously for most datastores.
-
static
makeRepo
(root: str, config: Union[lsst.daf.butler.core.config.Config, str, None] = None, standalone: bool = False, createRegistry: bool = True, searchPaths: Optional[List[str]] = None, forceConfigRoot: bool = True, outfile: Optional[str] = None, overwrite: bool = False) → lsst.daf.butler.core.config.Config¶ Create an empty data repository by adding a butler.yaml config to a repository root directory.
Parameters: - root :
str
Filesystem path to the root of the new repository. Will be created if it does not exist.
- config :
Config
orstr
, optional Configuration to write to the repository, after setting any root-dependent Registry or Datastore config options. Can not be a
ButlerConfig
or aConfigSubset
. IfNone
, default configuration will be used. Root-dependent config options specified in this config are overwritten ifforceConfigRoot
isTrue
.- standalone :
bool
If True, write all expanded defaults, not just customized or repository-specific settings. This (mostly) decouples the repository from the default configuration, insulating it from changes to the defaults (which may be good or bad, depending on the nature of the changes). Future additions to the defaults will still be picked up when initializing
Butlers
to repos created withstandalone=True
.- createRegistry :
bool
, optional If
True
create a new Registry.- searchPaths :
list
ofstr
, optional Directory paths to search when calculating the full butler configuration.
- forceConfigRoot :
bool
, optional If
False
, any values present in the suppliedconfig
that would normally be reset are not overridden and will appear directly in the output config. This allows non-standard overrides of the root directory for a datastore or registry to be given. If this parameter isTrue
the values forroot
will be forced into the resulting config if appropriate.- outfile :
str
, optional If not
None
, the output configuration will be written to this location rather than into the repository itself. Can be a URI string. Can refer to a directory that will be used to writebutler.yaml
.- overwrite :
bool
, optional Create a new configuration file even if one already exists in the specified output location. Default is to raise an exception.
Returns: Raises: - ValueError
Raised if a ButlerConfig or ConfigSubset is passed instead of a regular Config (as these subclasses would make it impossible to support
standalone=False
).- FileExistsError
Raised if the output config file already exists.
- os.error
Raised if the directory does not exist, exists but is not a directory, or cannot be created.
Notes
Note that when
standalone=False
(the default), the configuration search path (seeConfigSubset.defaultSearchPaths
) that was used to construct the repository should also be used to construct any Butlers to avoid configuration inconsistencies.
-
prune
(refs: Iterable[lsst.daf.butler.core.datasets.ref.DatasetRef], *, disassociate: bool = True, unstore: bool = False, tags: Optional[Iterable[str]] = None, purge: bool = False, run: Optional[str] = None, recursive: bool = True)¶ Remove one or more datasets from a collection and/or storage.
Parameters: - refs :
Iterable
ofDatasetRef
Datasets to prune. These must be “resolved” references (not just a
DatasetType
and data ID). - disassociate : bool, optional
Disassociate pruned datasets from
self.collections
(or the collection given as thecollection
argument). Datasets that are not in this collection are ignored, unless purge
isTrue
.- unstore :
bool
, optional If
True
(False
is default) remove these datasets from all datastores known to this butler. Note that this will make it impossible to retrieve these datasets even via other collections. Datasets that are already not stored are ignored by this option.- tags :
Iterable
[str
], optional TAGGED
collections to disassociate the datasets from, overridingself.tags
. Ignored ifdisassociate
isFalse
orpurge
isTrue
.- purge :
bool
, optional If
True
(False
is default), completely remove the dataset from theRegistry
. To prevent accidental deletions,purge
may only beTrue
if all of the following conditions are met:This mode may remove provenance information from datasets other than those provided, and should be used with extreme care.
- run :
str
, optional RUN
collection to purge from, overridingself.run
. Ignored unlesspurge
isTrue
.- recursive :
bool
, optional If
True
(default) also prune component datasets of any given composite datasets. This will only prune components that are actually attached to the givenDatasetRef
objects, which may not reflect what is in the database (especially if they were obtained fromRegistry.queryDatasets
, which does not include components in its results).
Raises: - TypeError
Raised if the butler is read-only, if no collection was provided, or the conditions for
purge=True
were not met.- IOError
Raised if an incomplete deletion may have left the repository in an inconsistent state. Only possible if
unstore=True
, and always accompanied by a chained exception describing the lower-level error.
-
put
(obj: Any, datasetRefOrType: Union[lsst.daf.butler.core.datasets.ref.DatasetRef, lsst.daf.butler.core.datasets.type.DatasetType, str], dataId: Union[lsst.daf.butler.core.dimensions.coordinate.DataCoordinate, Mapping[str, Any], None] = None, *, producer: Optional[lsst.daf.butler.core.quantum.Quantum] = None, run: Optional[str] = None, tags: Optional[Iterable[str]] = None, **kwds) → lsst.daf.butler.core.datasets.ref.DatasetRef¶ Store and register a dataset.
Parameters: - obj :
object
The dataset.
- datasetRefOrType :
DatasetRef
,DatasetType
, orstr
When
DatasetRef
is provided,dataId
should beNone
. Otherwise theDatasetType
or name thereof.- dataId :
dict
orDataCoordinate
A
dict
ofDimension
link name, value pairs that label theDatasetRef
within a Collection. WhenNone
, aDatasetRef
should be provided as the second argument.- producer :
Quantum
, optional The producer.
- run :
str
, optional The name of the run the dataset should be added to, overriding
self.run
.- tags :
Iterable
[str
], optional The names of
TAGGED
collections to associate the dataset with, overridingself.tags
. These collections must have already been added to theRegistry
.- kwds
Additional keyword arguments used to augment or construct a
DataCoordinate
. SeeDataCoordinate.standardize
parameters.
Returns: - ref :
DatasetRef
A reference to the stored dataset, updated with the correct id if given.
Raises: - TypeError
Raised if the butler is read-only or if no run has been provided.
-
validateConfiguration
(logFailures: bool = False, datasetTypeNames: Optional[Iterable[str]] = None, ignore: Optional[Iterable[str]] = None) Validate butler configuration.
Checks that each
DatasetType
can be stored in theDatastore
.Parameters: - logFailures :
bool
, optional If
True
, output a log message for every validation error detected.- datasetTypeNames : iterable of
str
, optional The
DatasetType
names that should be checked. This allows only a subset to be selected.- ignore : iterable of
str
, optional Names of DatasetTypes to skip over. This can be used to skip known problems. If a named
DatasetType
corresponds to a composite, all components of that DatasetType
will also be ignored.
Raises: - ButlerValidationError
Raised if there is some inconsistency with how this Butler is configured.