Butler
- class lsst.daf.butler.Butler(config: Config | str | ParseResult | ResourcePath | Path | None = None, *, butler: Butler | None = None, collections: Any = None, run: str | None = None, searchPaths: Sequence[str | ParseResult | ResourcePath | Path] | None = None, writeable: bool | None = None, inferDefaults: bool = True, without_datastore: bool = False, **kwargs: str)
Bases: LimitedButler
Main entry point for the data access system.
- Parameters:
- config : ButlerConfig, Config or str, optional
Configuration. Anything acceptable to the ButlerConfig constructor. If a directory path is given, the configuration will be read from a butler.yaml file in that location. If None is given, default values will be used.
- butler : Butler, optional
If provided, construct a new Butler that uses the same registry and datastore as the given one, but with the given collection and run. Incompatible with the config, searchPaths, and writeable arguments.
- collections : str or Iterable[str], optional
An expression specifying the collections to be searched (in order) when reading datasets. This may be a str collection name or an iterable thereof. See Collection expressions for more information. These collections are not registered automatically and must be manually registered before they are used by any method, but they may be manually registered after the Butler is initialized.
- run : str, optional
Name of the RUN collection new datasets should be inserted into. If collections is None and run is not None, collections will be set to [run]. If not None, this collection will automatically be registered. If this is not set (and writeable is not set either), a read-only butler will be created.
- searchPaths : list of str, optional
Directory paths to search when calculating the full Butler configuration. Not used if the supplied config is already a ButlerConfig.
- writeable : bool, optional
Explicitly sets whether the butler supports write operations. If not provided, a read-write butler is created if any of run, tags, or chains is non-empty.
- inferDefaults : bool, optional
If True (default), infer default data ID values from the values present in the datasets in collections: if all collections have the same value (or no value) for a governor dimension, that value will be the default for that dimension. Nonexistent collections are ignored. If a default value is provided explicitly for a governor dimension via **kwargs, no default will be inferred for that dimension.
- without_datastore : bool, optional
If True, do not attach a datastore to this butler. Any attempts to use a datastore will fail.
- **kwargs : str
Default data ID key-value pairs. These may only identify “governor” dimensions like instrument and skymap.
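The interaction between collections, run, and writeable described above can be sketched in plain Python. This is only a toy model of the documented rules, not the library's implementation: the function name resolve_defaults is hypothetical, and it considers only run when guessing writeable (the tags and chains mentioned above are not modelled).

```python
from __future__ import annotations

from collections.abc import Iterable


def resolve_defaults(
    collections: str | Iterable[str] | None = None,
    run: str | None = None,
    writeable: bool | None = None,
) -> tuple[list[str], str | None, bool]:
    """Toy model of the documented defaulting rules (hypothetical helper)."""
    if isinstance(collections, str):
        # A single collection name is treated as a one-element search path.
        search = [collections]
    elif collections is None:
        # Documented rule: collections falls back to [run] when only run is given.
        search = [run] if run is not None else []
    else:
        search = list(collections)
    if writeable is None:
        # Assumption of this sketch: only `run` is used to guess writeability.
        writeable = run is not None
    return search, run, writeable


# Read-write butler that reads from and writes to one RUN collection.
print(resolve_defaults(run="u/alice/DM-50000/a"))
# Read-only butler that searches one collection.
print(resolve_defaults(collections=["u/alice/DM-50000"]))
# No collections at all, but explicitly writeable.
print(resolve_defaults(writeable=True))
```

The three calls mirror the constructor examples in the Examples section; the real Butler additionally registers the run and may infer default data ID values.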
Examples
While there are many ways to control exactly how a Butler interacts with the collections in its Registry, the most common cases are still simple.
For a read-only Butler that searches one collection, do:
    butler = Butler("/path/to/repo", collections=["u/alice/DM-50000"])
For a read-write Butler that writes to and reads from a RUN collection:
    butler = Butler("/path/to/repo", run="u/alice/DM-50000/a")
The Butler passed to a PipelineTask is often much more complex, because we want to write to one RUN collection but read from several others (as well):
    butler = Butler("/path/to/repo", run="u/alice/DM-50000/a", collections=["u/alice/DM-50000/a", "u/bob/DM-49998", "HSC/defaults"])
This butler will put new datasets to the run u/alice/DM-50000/a. Datasets will be read first from that run (since it appears first in the chain), and then from u/bob/DM-49998 and finally HSC/defaults.
Finally, one can always create a Butler with no collections:
    butler = Butler("/path/to/repo", writeable=True)
This can be extremely useful when you just want to use butler.registry, e.g. for inserting dimension data or managing collections, or when the collections you want to use with the butler are not consistent. Passing writeable explicitly here is only necessary if you want to be able to make changes to the repo. Usually the value for writeable can be guessed from the collection arguments provided, but it defaults to False when there are no collection arguments.
Attributes Summary
- GENERATION: This is a Generation 3 Butler.
- collections: The collections to search by default, in order (Sequence[str]).
- datastore: The object that manages actual dataset storage (Datastore).
- dimensions: Structure managing all dimensions recognized by this data repository (DimensionUniverse).
- registry: The object that manages dataset metadata and relationships (Registry).
- run: Name of the run this butler writes outputs to by default (str or None).
Methods Summary
- datasetExists(datasetRefOrType[, dataId, ...]): Return True if the Dataset is actually present in the Datastore.
- datasetExistsDirect(ref): Return True if a dataset is actually present in the Datastore.
- exists(dataset_ref_or_type, /[, data_id, ...]): Indicate whether a dataset is known to Butler registry and datastore.
- export(*[, directory, filename, format, ...]): Export datasets from the repository represented by this Butler.
- get(datasetRefOrType, /[, dataId, ...]): Retrieve a stored dataset.
- getDeferred(datasetRefOrType, /[, dataId, ...]): Create a DeferredDatasetHandle which can later retrieve a dataset, after an immediate registry lookup.
- getDirect(ref, *[, parameters, storageClass]): Retrieve a stored dataset.
- getDirectDeferred(ref, *[, parameters, ...]): Create a DeferredDatasetHandle which can later retrieve a dataset, from a resolved DatasetRef.
- getURI(datasetRefOrType, /[, dataId, ...]): Return the URI to the Dataset.
- getURIs(datasetRefOrType, /[, dataId, ...]): Return the URIs associated with the dataset.
- get_datastore_names(): Return the names of the datastores associated with this butler.
- get_datastore_roots(): Return the defined root URIs for all registered datastores.
- get_known_repos(): Retrieve the list of known repository labels.
- get_many_uris(refs[, predict, allow_missing]): Return URIs associated with many datasets.
- get_repo_uri(label[, return_label]): Look up the label in a butler repository index.
- import_(*[, directory, filename, format, ...]): Import datasets into this repository that were exported from a different butler repository via export.
- ingest(*datasets[, transfer, run, ...]): Store and register one or more datasets that already exist on disk.
- makeRepo(root[, config, dimensionConfig, ...]): Create an empty data repository by adding a butler.yaml config to a repository root directory.
- markInputUnused(ref): Indicate that a predicted input was not actually used when processing a Quantum.
- pruneDatasets(refs, *[, disassociate, ...]): Remove one or more datasets from a collection and/or storage.
- put(obj, datasetRefOrType, /[, dataId, run]): Store and register a dataset.
- putDirect(obj, ref, /): Deprecated since version v26.0.
- removeRuns(names[, unstore]): Remove one or more RUN collections and the datasets within them.
- retrieveArtifacts(refs, destination[, ...]): Retrieve the artifacts associated with the supplied refs.
- stored(ref): Indicate whether the dataset's artifacts are present in the Datastore.
- stored_many(refs): Check the datastore for artifact existence of multiple datasets at once.
- transaction(): Context manager supporting Butler transactions.
- transfer_from(source_butler, source_refs[, ...]): Transfer datasets to this Butler from a run in another Butler.
- validateConfiguration([logFailures, ...]): Validate butler configuration.
Attributes Documentation
- GENERATION: ClassVar[int] = 3
This is a Generation 3 Butler.
This attribute may be removed in the future, once the Generation 2 Butler interface has been fully retired; it should only be used in transitional code.
- collections
The collections to search by default, in order (Sequence[str]).
This is an alias for self.registry.defaults.collections. It cannot be set directly in isolation, but all defaults may be changed together by assigning a new RegistryDefaults instance to self.registry.defaults.
- datastore: Datastore
The object that manages actual dataset storage (Datastore).
Direct user access to the datastore should rarely be necessary; the primary exception is the case where a Datastore implementation provides extra functionality beyond what the base class defines.
- dimensions
Structure managing all dimensions recognized by this data repository (DimensionUniverse).
- registry
The object that manages dataset metadata and relationships (Registry).
Many operations that don’t involve reading or writing butler datasets are accessible only via Registry methods. Eventually these methods will be replaced by equivalent Butler methods.
- run
Name of the run this butler writes outputs to by default (str or None).
This is an alias for self.registry.defaults.run. It cannot be set directly in isolation, but all defaults may be changed together by assigning a new RegistryDefaults instance to self.registry.defaults.
Methods Documentation
- datasetExists(datasetRefOrType: DatasetRef | DatasetType | str, dataId: DataCoordinate | Mapping[str, Any] | None = None, *, collections: Any = None, **kwargs: Any) -> bool
Return True if the Dataset is actually present in the Datastore.
- Parameters:
- datasetRefOrType : DatasetRef, DatasetType, or str
When a DatasetRef is given, dataId should be None. Otherwise the DatasetType or name thereof.
- dataId : dict or DataCoordinate
A dict of Dimension link name, value pairs that label the DatasetRef within a Collection. When None, a DatasetRef should be provided as the first argument.
- collections : Any, optional
Collections to be searched, overriding self.collections. Can be any of the types supported by the collections argument to butler construction.
- **kwargs
Additional keyword arguments used to augment or construct a DataCoordinate. See DataCoordinate.standardize parameters.
- Raises:
- LookupError
Raised if the dataset is not even present in the Registry.
- ValueError
Raised if a resolved DatasetRef was passed as an input, but it differs from the one found in the registry.
- NoDefaultCollectionError
Raised if no collections were provided.
Deprecated since version v26.0: Butler.datasetExists() has been replaced by Butler.exists(). Will be removed after v26.0.
- datasetExistsDirect(ref: DatasetRef) -> bool
Return True if a dataset is actually present in the Datastore.
- Parameters:
- ref : DatasetRef
Resolved reference to a dataset.
- Returns:
- exists : bool
Whether the dataset exists in the Datastore.
Deprecated since version v26.0: Butler.datasetExistsDirect() has been replaced by Butler.stored(). Will be removed after v26.0.
- exists(dataset_ref_or_type: DatasetRef | DatasetType | str, /, data_id: DataCoordinate | Mapping[str, Any] | None = None, *, full_check: bool = True, collections: Any = None, **kwargs: Any) -> DatasetExistence
Indicate whether a dataset is known to Butler registry and datastore.
- Parameters:
- dataset_ref_or_type : DatasetRef, DatasetType, or str
When a DatasetRef is given, data_id should be None. Otherwise the DatasetType or name thereof.
- data_id : dict or DataCoordinate
A dict of Dimension link name, value pairs that label the DatasetRef within a Collection. When None, a DatasetRef should be provided as the first argument.
- full_check : bool, optional
If True, an additional check will be made for dataset artifact existence. This will involve additional overhead due to the need to query an external system. If False, the registry and datastore will only be asked whether they know about the dataset, and no check for the artifact will be performed.
- collections : Any, optional
Collections to be searched, overriding self.collections. Can be any of the types supported by the collections argument to butler construction.
- **kwargs
Additional keyword arguments used to augment or construct a DataCoordinate. See DataCoordinate.standardize parameters.
- Returns:
- existence : DatasetExistence
Object indicating whether the dataset is known to registry and datastore. Evaluates to True if the dataset is present and known to both.
- export(*, directory: str | None = None, filename: str | None = None, format: str | None = None, transfer: str | None = None) -> Iterator[RepoExportContext]
Export datasets from the repository represented by this Butler.
This method is a context manager that returns a helper object (RepoExportContext) that is used to indicate what information from the repository should be exported.
- Parameters:
- directory : str, optional
Directory dataset files should be written to if transfer is not None.
- filename : str, optional
Name for the file that will include database information associated with the exported datasets. If this is not an absolute path and directory is not None, it will be written to directory instead of the current working directory. Defaults to “export.{format}”.
- format : str, optional
File format for the database information file. If None, the extension of filename will be used.
- transfer : str, optional
Transfer mode passed to Datastore.export.
- Raises:
- TypeError
Raised if the set of arguments passed is inconsistent.
Examples
Typically the Registry.queryDataIds and Registry.queryDatasets methods are used to provide the iterables over data IDs and/or datasets to be exported. Because all arguments to export are keyword-only, the file name is passed as filename:
    with butler.export(filename="exports.yaml") as export:
        # Export all flats, but none of the dimension element rows
        # (i.e. data ID information) associated with them.
        export.saveDatasets(butler.registry.queryDatasets("flat"), elements=())
        # Export all datasets that start with "deepCoadd_" and all of
        # their associated data ID information.
        export.saveDatasets(butler.registry.queryDatasets("deepCoadd_*"))
- get(datasetRefOrType: DatasetRef | DatasetType | str, /, dataId: DataCoordinate | Mapping[str, Any] | None = None, *, parameters: dict[str, Any] | None = None, collections: Any = None, storageClass: StorageClass | str | None = None, **kwargs: Any) -> Any
Retrieve a stored dataset.
- Parameters:
- datasetRefOrType : DatasetRef, DatasetType, or str
When a DatasetRef is given, dataId should be None. Otherwise the DatasetType or name thereof. If a resolved DatasetRef, the associated dataset is returned directly without additional querying.
- dataId : dict or DataCoordinate
A dict of Dimension link name, value pairs that label the DatasetRef within a Collection. When None, a DatasetRef should be provided as the first argument.
- parameters : dict
Additional StorageClass-defined options to control reading, typically used to efficiently read only a subset of the dataset.
- collections : Any, optional
Collections to be searched, overriding self.collections. Can be any of the types supported by the collections argument to butler construction.
- storageClass : StorageClass or str, optional
The storage class to be used to override the Python type returned by this method. By default the returned type matches the dataset type definition for this dataset. Specifying a read StorageClass can force a different type to be returned. This type must be compatible with the original type.
- **kwargs
Additional keyword arguments used to augment or construct a DataCoordinate. See DataCoordinate.standardize parameters.
- Returns:
- obj : object
The dataset.
- Raises:
- LookupError
Raised if no matching dataset exists in the Registry.
- TypeError
Raised if no collections were provided.
Notes
When looking up datasets in a CALIBRATION collection, this method requires that the given data ID include temporal dimensions beyond the dimensions of the dataset type itself, in order to find the dataset with the appropriate validity range. For example, a “bias” dataset with native dimensions {instrument, detector} could be fetched with a {instrument, detector, exposure} data ID, because exposure is a temporal dimension.
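The reason a temporal dimension is needed for CALIBRATION lookups can be illustrated with a small stand-alone model. Everything below is hypothetical (the CertifiedDataset class, the find_calibration helper, the dataset names and dates), and the matching rule used here, an exposure interval fully contained in a half-open validity range, is an assumption of the sketch rather than a statement about how the registry compares timespans.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CertifiedDataset:
    """Hypothetical stand-in for a dataset certified into a CALIBRATION collection."""

    name: str
    begin: datetime  # validity range start (inclusive, by assumption)
    end: datetime  # validity range end (exclusive, by assumption)


def find_calibration(certified, exposure_begin, exposure_end):
    """Return the single dataset whose validity range contains the exposure."""
    matches = [
        c for c in certified
        if c.begin <= exposure_begin and exposure_end <= c.end
    ]
    if not matches:
        raise LookupError("no calibration is valid for this exposure")
    if len(matches) > 1:
        raise LookupError("validity ranges are ambiguous for this exposure")
    return matches[0]


# Two biases for the same {instrument, detector}, valid at different times.
biases = [
    CertifiedDataset("bias_early", datetime(2024, 1, 1), datetime(2024, 6, 1)),
    CertifiedDataset("bias_late", datetime(2024, 6, 1), datetime(2025, 1, 1)),
]

# The exposure's time interval is what disambiguates between them.
chosen = find_calibration(
    biases, datetime(2024, 7, 4, 3, 0), datetime(2024, 7, 4, 3, 1)
)
print(chosen.name)
```

Without the exposure interval, both biases share the same {instrument, detector} data ID and nothing distinguishes them, which is why the extra temporal dimension is required.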
- getDeferred(datasetRefOrType: DatasetRef | DatasetType | str, /, dataId: DataCoordinate | Mapping[str, Any] | None = None, *, parameters: dict | None = None, collections: Any = None, storageClass: str | StorageClass | None = None, **kwargs: Any) -> DeferredDatasetHandle
Create a DeferredDatasetHandle which can later retrieve a dataset, after an immediate registry lookup.
- Parameters:
- datasetRefOrType : DatasetRef, DatasetType, or str
When a DatasetRef is given, dataId should be None. Otherwise the DatasetType or name thereof.
- dataId : dict or DataCoordinate, optional
A dict of Dimension link name, value pairs that label the DatasetRef within a Collection. When None, a DatasetRef should be provided as the first argument.
- parameters : dict
Additional StorageClass-defined options to control reading, typically used to efficiently read only a subset of the dataset.
- collections : Any, optional
Collections to be searched, overriding self.collections. Can be any of the types supported by the collections argument to butler construction.
- storageClass : StorageClass or str, optional
The storage class to be used to override the Python type returned by this method. By default the returned type matches the dataset type definition for this dataset. Specifying a read StorageClass can force a different type to be returned. This type must be compatible with the original type.
- **kwargs
Additional keyword arguments used to augment or construct a DataId. See DataId parameters.
- Returns:
- obj : DeferredDatasetHandle
A handle which can be used to retrieve a dataset at a later time.
- Raises:
- LookupError
Raised if no matching dataset exists in the Registry or datastore.
- ValueError
Raised if a resolved DatasetRef was passed as an input, but it differs from the one found in the registry.
- TypeError
Raised if no collections were provided.
- getDirect(ref: DatasetRef, *, parameters: dict[str, Any] | None = None, storageClass: StorageClass | str | None = None) -> Any
Retrieve a stored dataset.
- Parameters:
- ref : DatasetRef
Resolved reference to an already stored dataset.
- parameters : dict
Additional StorageClass-defined options to control reading, typically used to efficiently read only a subset of the dataset.
- storageClass : StorageClass or str, optional
The storage class to be used to override the Python type returned by this method. By default the returned type matches the dataset type definition for this dataset. Specifying a read StorageClass can force a different type to be returned. This type must be compatible with the original type.
- Returns:
- obj : object
The dataset.
Deprecated since version v26.0: Butler.get() now behaves like Butler.getDirect() when given a DatasetRef. Please use Butler.get(). Will be removed after v26.0.
- getDirectDeferred(ref: DatasetRef, *, parameters: dict | None = None, storageClass: str | StorageClass | None = None) -> DeferredDatasetHandle
Create a DeferredDatasetHandle which can later retrieve a dataset, from a resolved DatasetRef.
- Parameters:
- ref : DatasetRef
Resolved reference to an already stored dataset.
- parameters : dict
Additional StorageClass-defined options to control reading, typically used to efficiently read only a subset of the dataset.
- storageClass : StorageClass or str, optional
The storage class to be used to override the Python type returned by this method. By default the returned type matches the dataset type definition for this dataset. Specifying a read StorageClass can force a different type to be returned. This type must be compatible with the original type.
- Returns:
- obj : DeferredDatasetHandle
A handle which can be used to retrieve a dataset at a later time.
- Raises:
- LookupError
Raised if no matching dataset exists in the Registry.
Deprecated since version v26.0: Butler.getDeferred() now behaves like getDirectDeferred() when given a DatasetRef. Please use Butler.getDeferred(). Will be removed after v26.0.
- getURI(datasetRefOrType: DatasetRef | DatasetType | str, /, dataId: DataCoordinate | Mapping[str, Any] | None = None, *, predict: bool = False, collections: Any = None, run: str | None = None, **kwargs: Any) -> ResourcePath
Return the URI to the Dataset.
- Parameters:
- datasetRefOrType : DatasetRef, DatasetType, or str
When a DatasetRef is given, dataId should be None. Otherwise the DatasetType or name thereof.
- dataId : dict or DataCoordinate
A dict of Dimension link name, value pairs that label the DatasetRef within a Collection. When None, a DatasetRef should be provided as the first argument.
- predict : bool
If True, allow URIs to be returned for datasets that have not been written.
- collections : Any, optional
Collections to be searched, overriding self.collections. Can be any of the types supported by the collections argument to butler construction.
- run : str, optional
Run to use for predictions, overriding self.run.
- **kwargs
Additional keyword arguments used to augment or construct a DataCoordinate. See DataCoordinate.standardize parameters.
- Returns:
- uri : lsst.resources.ResourcePath
URI pointing to the Dataset within the datastore. If the Dataset does not exist in the datastore, and if predict is True, the URI will be a prediction and will include a URI fragment “#predicted”. If the datastore does not have entities that relate well to the concept of a URI, the returned URI string will be descriptive. The returned URI is not guaranteed to be obtainable.
- Raises:
- LookupError
Raised if a URI has been requested for a dataset that does not exist and guessing is not allowed.
- ValueError
Raised if a resolved DatasetRef was passed as an input, but it differs from the one found in the registry.
- TypeError
Raised if no collections were provided.
- RuntimeError
Raised if a URI is requested for a dataset that consists of multiple artifacts.
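Since a predicted URI is described above as carrying a “#predicted” fragment, a caller holding the string form of such a URI could tell predictions apart with the standard library alone. The sketch below assumes the fragment survives in the string form exactly as documented; the helper name is_predicted and the example paths are hypothetical.

```python
from urllib.parse import urlsplit


def is_predicted(uri: str) -> bool:
    """Return True if the URI string carries the documented "#predicted" fragment."""
    return urlsplit(uri).fragment == "predicted"


# Hypothetical URI strings, as they might look after converting a URI to str.
existing = "file:///path/to/repo/u/alice/DM-50000/a/flat/flat_r_12.fits"
predicted = "file:///path/to/repo/u/alice/DM-50000/a/flat/flat_r_13.fits#predicted"

print(is_predicted(existing))
print(is_predicted(predicted))
```

A check like this only classifies the string; as noted above, even a non-predicted URI is not guaranteed to be obtainable.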
- getURIs(datasetRefOrType: DatasetRef | DatasetType | str, /, dataId: DataCoordinate | Mapping[str, Any] | None = None, *, predict: bool = False, collections: Any = None, run: str | None = None, **kwargs: Any) -> DatasetRefURIs
Return the URIs associated with the dataset.
- Parameters:
- datasetRefOrType : DatasetRef, DatasetType, or str
When a DatasetRef is given, dataId should be None. Otherwise the DatasetType or name thereof.
- dataId : dict or DataCoordinate
A dict of Dimension link name, value pairs that label the DatasetRef within a Collection. When None, a DatasetRef should be provided as the first argument.
- predict : bool
If True, allow URIs to be returned for datasets that have not been written.
- collections : Any, optional
Collections to be searched, overriding self.collections. Can be any of the types supported by the collections argument to butler construction.
- run : str, optional
Run to use for predictions, overriding self.run.
- **kwargs
Additional keyword arguments used to augment or construct a DataCoordinate. See DataCoordinate.standardize parameters.
- Returns:
- uris : DatasetRefURIs
The URI to the primary artifact associated with this dataset (if the dataset was disassembled within the datastore this may be None), and the URIs to any components associated with the dataset artifact (can be empty if there are no components).
- get_datastore_names() -> tuple[str, ...]
Return the names of the datastores associated with this butler.
- get_datastore_roots() -> dict[str, lsst.resources._resourcePath.ResourcePath | None]
Return the defined root URIs for all registered datastores.
- Returns:
- roots : dict[str, ResourcePath | None]
A mapping from datastore name to datastore root URI. The root can be None if the datastore does not have any concept of a root URI.
- classmethod get_known_repos() -> set[str]
Retrieve the list of known repository labels.
Notes
See ButlerRepoIndex for details on how the information is discovered.
- get_many_uris(refs: Iterable[DatasetRef], predict: bool = False, allow_missing: bool = False) -> dict[lsst.daf.butler.core.datasets.ref.DatasetRef, lsst.daf.butler.core.datastore.DatasetRefURIs]
Return URIs associated with many datasets.
- Parameters:
- Returns:
- URIs : dict of [DatasetRef, DatasetRefURIs]
A dict of primary and component URIs, indexed by the passed-in refs.
- Raises:
- FileNotFoundError
Raised if a URI has been requested for a dataset that does not exist and guessing is not allowed.
Notes
In file-based datastores, get_many_uris does not check that the file is present. It assumes that if the datastore is aware of the file then it actually exists.
- classmethod get_repo_uri(label: str, return_label: bool = False) -> ResourcePath
Look up the label in a butler repository index.
- Parameters:
- label : str
Label of the Butler repository to look up.
- return_label : bool, optional
If label cannot be found in the repository index (either because the index is not defined or label is not in the index) and return_label is True, then return ResourcePath(label). If return_label is False (default), then an exception will be raised instead.
- Returns:
- uri : lsst.resources.ResourcePath
URI to the Butler repository associated with the given label or default value if it is provided.
- Raises:
- KeyError
Raised if the label is not found in the index, or if an index is not defined, and return_label is False.
Notes
See ButlerRepoIndex for details on how the information is discovered.
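The return_label fallback described above amounts to a dictionary lookup with an optional pass-through. The following is a minimal model of that rule only: lookup_repo and the example index are hypothetical, plain strings stand in for ResourcePath, and the way a real index is discovered (see ButlerRepoIndex) is not modelled.

```python
from __future__ import annotations


def lookup_repo(
    index: dict[str, str] | None, label: str, return_label: bool = False
) -> str:
    """Toy model of the documented lookup rule (hypothetical helper)."""
    if index is not None and label in index:
        return index[label]
    if return_label:
        # Documented fallback: hand the label back, treated as a location itself.
        return label
    raise KeyError(f"No butler repository found for label {label!r}")


# Hypothetical index contents: label -> repository location.
index = {"main": "/repo/main", "embargo": "s3://embargo/butler.yaml"}

print(lookup_repo(index, "main"))
print(lookup_repo(index, "/path/to/repo", return_label=True))
```

The pass-through is what lets a single argument accept either a label or a direct repository path: a known label is translated, anything else is returned unchanged.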
- import_(*, directory: str | ParseResult | ResourcePath | Path | None = None, filename: str | ParseResult | ResourcePath | Path | TextIO | None = None, format: str | None = None, transfer: str | None = None, skip_dimensions: set | None = None) -> None
Import datasets into this repository that were exported from a different butler repository via export.
- Parameters:
- directory : ResourcePathExpression, optional
Directory containing dataset files to import from. If None, filename and all dataset file paths specified therein must be absolute.
- filename : ResourcePathExpression or TextIO
A stream or name of file that contains database information associated with the exported datasets, typically generated by export. If this is a string (name) or ResourcePath and is not an absolute path, it will first be looked for relative to directory and, if not found there, it will be looked for in the current working directory. Defaults to “export.{format}”.
- format : str, optional
File format for filename. If None, the extension of filename will be used.
- transfer : str, optional
Transfer mode passed to ingest.
- skip_dimensions : set, optional
Names of dimensions that should be skipped and not imported.
- Raises:
- TypeError
Raised if the set of arguments passed is inconsistent, or if the butler is read-only.
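The search order documented for a relative filename (first relative to directory, then the current working directory) can be sketched with pathlib. This is an illustration of the stated rule under simplifying assumptions: it handles local paths only, the helper resolve_export_file is hypothetical, and the cwd argument exists purely so the example can run inside a temporary directory.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def resolve_export_file(
    filename: str | Path,
    directory: str | Path | None = None,
    cwd: str | Path | None = None,
) -> Path:
    """Toy model of the documented lookup order for a local export file."""
    path = Path(filename)
    if path.is_absolute():
        return path
    if directory is not None:
        candidate = Path(directory) / path
        if candidate.exists():
            # Found relative to `directory`, which takes precedence.
            return candidate
    # Otherwise fall back to the working directory.
    base = Path(cwd) if cwd is not None else Path.cwd()
    return base / path


with tempfile.TemporaryDirectory() as tmp:
    export_dir = Path(tmp) / "exports"
    export_dir.mkdir()
    (export_dir / "export.yaml").write_text("# placeholder export file\n")

    # Present in `directory`: resolved there.
    found = resolve_export_file("export.yaml", directory=export_dir, cwd=tmp)
    # Absent from `directory`: resolved against the working directory.
    fallback = resolve_export_file("other.yaml", directory=export_dir, cwd=tmp)
    print(found.relative_to(tmp), fallback.relative_to(tmp))
```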
- ingest(*datasets: FileDataset, transfer: str | None = 'auto', run: str | None = None, idGenerationMode: DatasetIdGenEnum | None = None, record_validation_info: bool = True) -> None
Store and register one or more datasets that already exist on disk.
- Parameters:
- datasets : FileDataset
Each positional argument is a struct containing information about a file to be ingested, including its URI (either absolute or relative to the datastore root, if applicable), a resolved DatasetRef, and optionally a formatter class or its fully-qualified string name. If a formatter is not provided, the formatter that would be used for put is assumed. On successful ingest all FileDataset.formatter attributes will be set to the formatter class used. FileDataset.path attributes may be modified to put paths in whatever the datastore considers a standardized form.
- transfer : str, optional
If not None, must be one of ‘auto’, ‘move’, ‘copy’, ‘direct’, ‘split’, ‘hardlink’, ‘relsymlink’ or ‘symlink’, indicating how to transfer the file.
- run : str, optional
The name of the run ingested datasets should be added to, overriding self.run. This parameter is now deprecated since the run is encoded in the FileDataset.
- idGenerationMode : DatasetIdGenEnum, optional
Specifies option for generating dataset IDs. Parameter is deprecated.
- record_validation_info : bool, optional
If True, the default, the datastore can record validation information associated with the file. If False, the datastore will not attempt to track any information such as checksums or file sizes. This can be useful if such information is tracked in an external system or if the file is to be compressed in place. It is up to the datastore whether this parameter is relevant.
- Raises:
- TypeError
Raised if the butler is read-only or if no run was provided.
- NotImplementedError
Raised if the Datastore does not support the given transfer mode.
- DatasetTypeNotSupportedError
Raised if one or more files to be ingested have a dataset type that is not supported by the Datastore.
- FileNotFoundError
Raised if one of the given files does not exist.
- FileExistsError
Raised if transfer is not None but the (internal) location the file would be moved to is already occupied.
Notes
This operation is not fully exception safe: if a database operation fails, the given FileDataset instances may be only partially updated.
It is atomic in terms of database operations (they will either all succeed or all fail) provided the database engine implements transactions correctly. It will attempt to be atomic in terms of filesystem operations as well, but this cannot be implemented rigorously for most datastores.
- static makeRepo(root: str | ParseResult | ResourcePath | Path, config: Config | str | None = None, dimensionConfig: Config | str | None = None, standalone: bool = False, searchPaths: list[str] | None = None, forceConfigRoot: bool = True, outfile: str | ParseResult | ResourcePath | Path | None = None, overwrite: bool = False) -> Config
Create an empty data repository by adding a butler.yaml config to a repository root directory.
- Parameters:
- root : lsst.resources.ResourcePathExpression
Path or URI to the root location of the new repository. Will be created if it does not exist.
- config : Config or str, optional
Configuration to write to the repository, after setting any root-dependent Registry or Datastore config options. Cannot be a ButlerConfig or a ConfigSubset. If None, default configuration will be used. Root-dependent config options specified in this config are overwritten if forceConfigRoot is True.
- dimensionConfig : Config or str, optional
Configuration for dimensions, will be used to initialize registry database.
- standalone : bool
If True, write all expanded defaults, not just customized or repository-specific settings. This (mostly) decouples the repository from the default configuration, insulating it from changes to the defaults (which may be good or bad, depending on the nature of the changes). Future additions to the defaults will still be picked up when initializing Butlers to repos created with standalone=True.
- searchPaths : list of str, optional
Directory paths to search when calculating the full butler configuration.
- forceConfigRoot : bool, optional
If False, any values present in the supplied config that would normally be reset are not overridden and will appear directly in the output config. This allows non-standard overrides of the root directory for a datastore or registry to be given. If this parameter is True, the values for root will be forced into the resulting config if appropriate.
- outfile : lsst.resources.ResourcePathExpression, optional
If not None, the output configuration will be written to this location rather than into the repository itself. Can be a URI string. Can refer to a directory that will be used to write butler.yaml.
- overwrite : bool, optional
Create a new configuration file even if one already exists in the specified output location. Default is to raise an exception.
- Returns:
- Raises:
- ValueError
Raised if a ButlerConfig or ConfigSubset is passed instead of a regular Config (as these subclasses would make it impossible to support standalone=False).
- FileExistsError
Raised if the output config file already exists.
- os.error
Raised if the directory does not exist, exists but is not a directory, or cannot be created.
Notes
Note that when standalone=False (the default), the configuration search path (see ConfigSubset.defaultSearchPaths) that was used to construct the repository should also be used to construct any Butlers to avoid configuration inconsistencies.
- markInputUnused(ref: DatasetRef) -> None
Indicate that a predicted input was not actually used when processing a Quantum.
- Parameters:
- ref : DatasetRef
Reference to the unused dataset.
Notes
By default, a dataset is considered “actually used” if it is accessed via getDirect or a handle to it is obtained via getDirectDeferred (even if the handle is not used). This method must be called after one of those in order to remove the dataset from the actual input list.
This method does nothing for butlers that do not store provenance information (which is the default implementation provided by the base class).
- pruneDatasets(refs: Iterable[DatasetRef], *, disassociate: bool = True, unstore: bool = False, tags: Iterable[str] = (), purge: bool = False) -> None
Remove one or more datasets from a collection and/or storage.
- Parameters:
- refs
Iterable of DatasetRef
Datasets to prune. These must be “resolved” references (not just a DatasetType and data ID).
- disassociate
bool, optional
Disassociate pruned datasets from tags, or from all collections if purge=True.
- unstore
bool, optional
If True (False is default) remove these datasets from all datastores known to this butler. Note that this will make it impossible to retrieve these datasets even via other collections. Datasets that are already not stored are ignored by this option.
- tags
Iterable[str], optional
TAGGED collections to disassociate the datasets from. Ignored if disassociate is False or purge is True.
- purge
bool, optional
If True (False is default), completely remove the dataset from the Registry. To prevent accidental deletions, purge may only be True if all of the following conditions are met:
This mode may remove provenance information from datasets other than those provided, and should be used with extreme care.
- Raises:
- TypeError
Raised if the butler is read-only, if no collection was provided, or the conditions for purge=True were not met.
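As an illustration of the least destructive use of these parameters, the sketch below removes datasets from a single TAGGED collection while leaving both the stored artifacts and the registry entries alone. `untag_datasets` is a hypothetical helper name, and `butler` is assumed to be a writeable Butler.

```python
def untag_datasets(butler, refs, tag):
    """Remove `refs` from one TAGGED collection, keeping artifacts and entries."""
    butler.pruneDatasets(
        refs,
        disassociate=True,  # drop the association with `tag`
        unstore=False,      # keep artifacts in the datastore
        tags=[tag],         # only this TAGGED collection is affected
        purge=False,        # keep the datasets in the Registry
    )
```

Spelling out every keyword, even those left at their defaults, makes the intent explicit for a method whose other modes are irreversible.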
- put(obj: Any, datasetRefOrType: DatasetRef | DatasetType | str, /, dataId: DataCoordinate | Mapping[str, Any] | None = None, *, run: str | None = None, **kwargs: Any) -> DatasetRef
Store and register a dataset.
- Parameters:
- obj
object
The dataset.
- datasetRefOrType
DatasetRef, DatasetType, or str
When DatasetRef is provided, dataId should be None. Otherwise the DatasetType or name thereof. If a fully resolved DatasetRef is given the run and ID are used directly.
- dataId
dict or DataCoordinate
A dict of Dimension link name, value pairs that label the DatasetRef within a Collection. When None, a DatasetRef should be provided as the second argument.
- run
str, optional
The name of the run the dataset should be added to, overriding self.run. Not used if a resolved DatasetRef is provided.
- **kwargs
Additional keyword arguments used to augment or construct a DataCoordinate. See DataCoordinate.standardize parameters. Not used if a resolved DatasetRef is provided.
- Returns:
- ref
DatasetRef
A reference to the stored dataset, updated with the correct id if given.
- Raises:
- TypeError
Raised if the butler is read-only or if no run has been provided.
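A minimal sketch of the dataset-type-name form of the call follows. `store_dataset` is a hypothetical wrapper, and the dataset type name and data ID keys shown in the usage note are illustrative only, since they depend on the repository's dimension configuration.

```python
def store_dataset(butler, obj, dataset_type_name, data_id, run):
    """Store `obj` and return the DatasetRef reported by the butler."""
    # The object and the dataset type (or its name) are positional-only.
    # `run` overrides the butler's default run for this call only.
    return butler.put(obj, dataset_type_name, dataId=data_id, run=run)
```

A call might look like `store_dataset(butler, image, "calexp", {"instrument": "MyCam", "visit": 42}, "u/someone/run")`, where every literal is a placeholder.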
- putDirect(obj: Any, ref: DatasetRef, /) -> DatasetRef
Deprecated since version v26.0: Butler.put() now behaves like Butler.putDirect() when given a DatasetRef. Please use Butler.put(). Be aware that you may need to adjust your usage if you were relying on the run parameter to determine the run. Will be removed after v26.0.
- removeRuns(names: Iterable[str], unstore: bool = True) -> None
Remove one or more RUN collections and the datasets within them.
- Parameters:
- names
Iterable[str]
The names of the collections to remove.
- unstore
bool, optional
If True (default), delete datasets from all datastores in which they are present, and attempt to roll back the registry deletions if datastore deletions fail (which may not always be possible). If False, datastore records for these datasets are still removed, but any artifacts (e.g. files) will not be.
- Raises:
- TypeError
Raised if one or more collections are not of type RUN.
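The unstore=False mode can be sketched as below. `drop_runs_keep_files` is a hypothetical helper; catching TypeError here relies only on the documented behaviour for non-RUN collections, and in real code a narrower check may be preferable.

```python
def drop_runs_keep_files(butler, run_names):
    """Remove RUN collections while leaving artifacts (e.g. files) in place.

    Returns False if the butler rejects a name because it is not a RUN
    collection, True otherwise.
    """
    try:
        # unstore=False: datastore records go away, artifacts stay on disk.
        butler.removeRuns(run_names, unstore=False)
    except TypeError:
        # Documented as raised when a collection is not of type RUN.
        return False
    return True
```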
- retrieveArtifacts(refs: Iterable[DatasetRef], destination: str | ParseResult | ResourcePath | Path, transfer: str = 'auto', preserve_path: bool = True, overwrite: bool = False) -> list[lsst.resources._resourcePath.ResourcePath]
Retrieve the artifacts associated with the supplied refs.
- Parameters:
- refs
iterable of DatasetRef
The datasets for which artifacts are to be retrieved. A single ref can result in multiple artifacts. The refs must be resolved.
- destination
lsst.resources.ResourcePath or str
Location to write the artifacts.
- transfer
str, optional
Method to use to transfer the artifacts. Must be one of the options supported by transfer_from(). “move” is not allowed.
- preserve_path
bool, optional
If True the full path of the artifact within the datastore is preserved. If False the final file component of the path is used.
- overwrite
bool, optional
If True allow transfers to overwrite existing files at the destination.
- Returns:
- targets
list of lsst.resources.ResourcePath
URIs of file artifacts in destination location. Order is not preserved.
Notes
For non-file datastores the artifacts written to the destination may not match the representation inside the datastore. For example a hierarchical data structure in a NoSQL database may well be stored as a JSON file.
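A short sketch of a copy-out helper follows, assuming "copy" is among the transfer modes the datastore supports. `export_artifacts` is a hypothetical name. Because the returned order is not preserved, the helper deliberately avoids pairing results with the input refs and returns sorted URI strings instead.

```python
def export_artifacts(butler, refs, destination):
    """Copy artifacts for `refs` into `destination`; return sorted URI strings."""
    targets = butler.retrieveArtifacts(
        refs,
        destination,
        transfer="copy",      # "move" is not allowed by this method
        preserve_path=False,  # keep only the final file component
        overwrite=False,      # do not overwrite existing files
    )
    # Order is not preserved and one ref may yield several artifacts,
    # so do not zip `targets` with `refs`.
    return sorted(str(target) for target in targets)
```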
- stored(ref: DatasetRef) -> bool
Indicate whether the dataset’s artifacts are present in the Datastore.
- Parameters:
- ref
DatasetRef
Resolved reference to a dataset.
- Returns:
- stored
bool
Whether the dataset artifact exists in the datastore and can be retrieved.
- stored_many(refs: Iterable[DatasetRef]) -> dict[lsst.daf.butler.core.datasets.ref.DatasetRef, bool]
Check the datastore for artifact existence of multiple datasets at once.
- Parameters:
- refs
iterable of DatasetRef
The datasets to be checked.
- Returns:
- existence
dict of [DatasetRef, bool]
Mapping from given dataset refs to boolean indicating artifact existence.
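The returned mapping lends itself to a single bulk check rather than repeated calls to stored(). A minimal sketch, with `missing_artifacts` as a hypothetical helper name:

```python
def missing_artifacts(butler, refs):
    """Return the refs whose artifacts are absent from the datastore."""
    existence = butler.stored_many(refs)
    # Keep only refs that map to False in the existence mapping.
    return [ref for ref, exists in existence.items() if not exists]
```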
- transaction() -> Iterator[None]
Context manager supporting Butler transactions.
Transactions can be nested.
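A sketch of grouping several writes under one transaction is shown below. `put_all_or_nothing` is a hypothetical helper, and the sketch assumes the usual context-manager behaviour in which an exception raised inside the block causes the work done so far to be rolled back.

```python
def put_all_or_nothing(butler, items, run):
    """Store several (obj, dataset_type_name, data_id) tuples in one transaction."""
    refs = []
    # Everything inside the `with` block belongs to the same transaction.
    with butler.transaction():
        for obj, dataset_type_name, data_id in items:
            refs.append(butler.put(obj, dataset_type_name, dataId=data_id, run=run))
    return refs
```

Because transactions can be nested, a helper like this can itself be called from inside a caller's transaction.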
- transfer_from(source_butler: LimitedButler, source_refs: Iterable[DatasetRef], transfer: str = 'auto', skip_missing: bool = True, register_dataset_types: bool = False, transfer_dimensions: bool = False) -> Collection[DatasetRef]
Transfer datasets to this Butler from a run in another Butler.
- Parameters:
- source_butler
LimitedButler
Butler from which the datasets are to be transferred. If data IDs in source_refs are not expanded then this has to be a full Butler whose registry will be used to expand data IDs.
- source_refs
iterable of DatasetRef
Datasets defined in the source butler that should be transferred to this butler.
- transfer
str, optional
Transfer mode passed to transfer_from.
- skip_missing
bool
If True, datasets with no datastore artifact associated with them are not transferred. If False a registry entry will be created even if no datastore record is created (and so will look equivalent to the dataset being unstored).
- register_dataset_types
bool
If True any missing dataset types are registered. Otherwise an exception is raised.
- transfer_dimensions
bool, optional
If True, dimension record data associated with the new datasets will be transferred.
- Returns:
- refs
list of DatasetRef
The refs added to this Butler.
Notes
The datastore artifact has to exist for a transfer to be made but non-existence is not an error.
Datasets that already exist in this run will be skipped.
The datasets are imported as part of a transaction, although dataset types are registered before the transaction is started. This means that it is possible for a dataset type to be registered even though transfer has failed.
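The parameters above combine as in the following sketch, which copies datasets into a destination butler and registers any dataset types it lacks. `copy_datasets` is a hypothetical helper, and "copy" is assumed to be a transfer mode the destination datastore accepts.

```python
def copy_datasets(dest_butler, source_butler, refs):
    """Transfer `refs` from `source_butler` into `dest_butler`; return new refs."""
    transferred = dest_butler.transfer_from(
        source_butler,
        refs,
        transfer="copy",
        skip_missing=True,            # ignore datasets with no artifact
        register_dataset_types=True,  # registered before the transaction starts
        transfer_dimensions=True,     # bring dimension records along
    )
    return list(transferred)
```

As the notes point out, dataset type registration happens outside the transaction, so a failed transfer can still leave newly registered dataset types behind.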
- validateConfiguration(logFailures: bool = False, datasetTypeNames: Iterable[str] | None = None, ignore: Iterable[str] | None = None) -> None
Validate butler configuration.
Checks that each DatasetType can be stored in the Datastore.
- Parameters:
- logFailures
bool, optional
If True, output a log message for every validation error detected.
- datasetTypeNames
iterable of str, optional
The DatasetType names that should be checked. This allows only a subset to be selected.
- ignore
iterable of str, optional
Names of DatasetTypes to skip over. This can be used to skip known problems. If a named DatasetType corresponds to a composite, all components of that DatasetType will also be ignored.
- Raises:
- ButlerValidationError
Raised if there is some inconsistency with how this Butler is configured.