Butler¶
- class lsst.daf.butler.Butler(config: Config | str | ParseResult | ResourcePath | Path | None = None, *, collections: Any = None, run: str | None = None, searchPaths: Sequence[str | ParseResult | ResourcePath | Path] | None = None, writeable: bool | None = None, inferDefaults: bool = True, without_datastore: bool = False, **kwargs: Any)¶
Bases: LimitedButler
Interface for data butler and factory for Butler instances.
- Parameters:
  - config : ButlerConfig, Config or str, optional
    Configuration. Anything acceptable to the ButlerConfig constructor. If a directory path is given, the configuration will be read from a butler.yaml file in that location. If None is given, default values will be used. If config contains a "cls" key, its value is used as the name of the butler class, which must be a subclass of this class; otherwise DirectButler is instantiated.
  - collections : str or Iterable[str], optional
    An expression specifying the collections to be searched (in order) when reading datasets. This may be a str collection name or an iterable thereof. See Collection expressions for more information. These collections are not registered automatically and must be manually registered before they are used by any method, but they may be manually registered after the Butler is initialized.
  - run : str, optional
    Name of the RUN collection new datasets should be inserted into. If collections is None and run is not None, collections will be set to [run]. If not None, this collection will automatically be registered. If this is not set (and writeable is not set either), a read-only butler will be created.
  - searchPaths : list of str, optional
    Directory paths to search when calculating the full Butler configuration. Not used if the supplied config is already a ButlerConfig.
  - writeable : bool, optional
    Explicitly sets whether the butler supports write operations. If not provided, a read-write butler is created if any of run, tags, or chains is non-empty.
  - inferDefaults : bool, optional
    If True (default), infer default data ID values from the values present in the datasets in collections: if all collections have the same value (or no value) for a governor dimension, that value will be the default for that dimension. Nonexistent collections are ignored. If a default value is provided explicitly for a governor dimension via **kwargs, no default will be inferred for that dimension.
  - without_datastore : bool, optional
    If True, do not attach a datastore to this butler. Any attempts to use a datastore will fail.
  - **kwargs : Any
    Additional keyword arguments passed to the constructor of the actual butler class.
Notes
The preferred way to instantiate Butler is via the from_config method. The call to Butler(...) is equivalent to Butler.from_config(...), but mypy will complain about the former.
Attributes Summary
collection_chains: Object with methods for modifying collection chains (ButlerCollections).
collections: Object with methods for modifying and querying collections (ButlerCollections).
registry: The object that manages dataset metadata and relationships (Registry).
run: Name of the run this butler writes outputs to by default (str or None).
Methods Summary
clone(*[, collections, run, inferDefaults, ...]): Return a new Butler instance connected to the same repository as this one, optionally overriding collections, run, inferDefaults, and the default data ID.
exists(dataset_ref_or_type, /[, data_id, ...]): Indicate whether a dataset is known to Butler registry and datastore.
export(*[, directory, filename, format, ...]): Export datasets from the repository represented by this Butler.
find_dataset(dataset_type[, data_id, ...]): Find a dataset given its DatasetType and data ID.
from_config([config, collections, run, ...]): Create a butler instance from configuration.
get(datasetRefOrType, /[, dataId, ...]): Retrieve a stored dataset.
getDeferred(datasetRefOrType, /[, dataId, ...]): Create a DeferredDatasetHandle which can later retrieve a dataset, after an immediate registry lookup.
getURI(datasetRefOrType, /[, dataId, ...]): Return the URI to the Dataset.
getURIs(datasetRefOrType, /[, dataId, ...]): Return the URIs associated with the dataset.
get_dataset(id, *[, storage_class, ...]): Retrieve a Dataset entry.
get_dataset_from_uri(uri[, factory]): Get the dataset associated with the given dataset URI.
get_dataset_type(name): Get the DatasetType.
get_known_repos(): Retrieve the list of known repository labels.
get_repo_uri(label[, return_label]): Look up the label in a butler repository index.
import_(*[, directory, filename, format, ...]): Import datasets into this repository that were exported from a different butler repository via export.
ingest(*datasets[, transfer, ...]): Store and register one or more datasets that already exist on disk.
ingest_zip(zip_file[, transfer]): Ingest a Zip file into this butler.
makeRepo(root[, config, dimensionConfig, ...]): Create an empty data repository by adding a butler.yaml config to a repository root directory.
parse_dataset_uri(uri): Extract the butler label and dataset ID from a dataset URI.
put(obj, datasetRefOrType, /[, dataId, run]): Store and register a dataset.
query(): Context manager returning a Query object used for construction and execution of complex queries.
query_data_ids(dimensions, *[, data_id, ...]): Query for data IDs matching user-provided criteria.
query_datasets(dataset_type[, collections, ...]): Query for dataset references matching user-provided criteria.
query_dimension_records(element, *[, ...]): Query for dimension information matching user-provided criteria.
removeRuns(names[, unstore]): Remove one or more RUN collections and the datasets within them.
retrieveArtifacts(refs, destination[, ...]): Retrieve the artifacts associated with the supplied refs.
retrieve_artifacts_zip(refs, destination[, ...]): Retrieve artifacts from a Butler and place in a ZIP file.
transaction(): Context manager supporting Butler transactions.
transfer_dimension_records_from(source_butler, source_refs): Transfer dimension records to this Butler from another Butler.
transfer_from(source_butler, source_refs[, ...]): Transfer datasets to this Butler from a run in another Butler.
validateConfiguration([logFailures, ...]): Validate butler configuration.
Attributes Documentation
- collection_chains¶
Object with methods for modifying collection chains (ButlerCollections).
Deprecated. Replaced with the collections property.
- collections¶
Object with methods for modifying and querying collections (ButlerCollections).
Use of this object is preferred over registry wherever possible.
- registry¶
The object that manages dataset metadata and relationships (Registry).
Many operations that don't involve reading or writing butler datasets are accessible only via Registry methods. Eventually these methods will be replaced by equivalent Butler methods.
Methods Documentation
- clone(*, collections: CollectionArgType | None | EllipsisType = Ellipsis, run: str | None | EllipsisType = Ellipsis, inferDefaults: bool | EllipsisType = Ellipsis, dataId: dict[str, str] | EllipsisType = Ellipsis) Butler¶
Return a new Butler instance connected to the same repository as this one, optionally overriding collections, run, inferDefaults, and the default data ID.
- Parameters:
  - collections : CollectionArgType or None, optional
    Same as constructor. If omitted, uses the value from the original object.
  - run : str or None, optional
    Same as constructor. If None, no default run is used. If omitted, copies the value from the original object.
  - inferDefaults : bool, optional
    Same as constructor. If omitted, copies the value from the original object.
  - dataId : dict[str, str], optional
    Same as the kwargs passed to the constructor. If omitted, copies values from the original object.
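For example, a minimal sketch of creating a variant of an existing butler that writes to a different run (the repository path and collection names are illustrative placeholders):

    butler = Butler.from_config("/path/to/repo", collections=["HSC/defaults"])
    # Same repository and connection, but new datasets go to a scratch run.
    writer = butler.clone(run="u/alice/scratch")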
- abstract exists(dataset_ref_or_type: DatasetRef | DatasetType | str, /, data_id: DataId | None = None, *, full_check: bool = True, collections: Any = None, **kwargs: Any) DatasetExistence¶
Indicate whether a dataset is known to Butler registry and datastore.
- Parameters:
  - dataset_ref_or_type : DatasetRef, DatasetType, or str
    When a DatasetRef is provided, data_id should be None. Otherwise the DatasetType or name thereof.
  - data_id : dict or DataCoordinate
    A dict of Dimension link name, value pairs that label the DatasetRef within a Collection. When None, a DatasetRef should be provided as the first argument.
  - full_check : bool, optional
    If True, a check will be made for the actual existence of a dataset artifact. This involves additional overhead due to the need to query an external system. If False, this check will be omitted, and the registry and datastore will solely be asked whether they know about the dataset, with no direct check for the artifact performed.
  - collections : Any, optional
    Collections to be searched, overriding self.collections. Can be any of the types supported by the collections argument to butler construction.
  - **kwargs
    Additional keyword arguments used to augment or construct a DataCoordinate. See DataCoordinate.standardize parameters.
- Returns:
  - existence : DatasetExistence
    Object indicating whether the dataset is known to registry and datastore. Evaluates to True if the dataset is present and known to both.
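For example, a sketch of a cheap existence check that skips the artifact lookup (dataset type, data ID values, and collection name are illustrative):

    existence = butler.exists(
        "bias",
        instrument="HSC",
        detector=50,
        collections=["HSC/calib"],
        full_check=False,
    )
    if existence:
        print("registry and datastore both know about this bias")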
- abstract export(*, directory: str | None = None, filename: str | None = None, format: str | None = None, transfer: str | None = None) AbstractContextManager[RepoExportContext]¶
Export datasets from the repository represented by this Butler.
This method is a context manager that returns a helper object (RepoExportContext) that is used to indicate what information from the repository should be exported.
- Parameters:
  - directory : str, optional
    Directory dataset files should be written to if transfer is not None.
  - filename : str, optional
    Name for the file that will include database information associated with the exported datasets. If this is not an absolute path and directory is not None, it will be written to directory instead of the current working directory. Defaults to "export.{format}".
  - format : str, optional
    File format for the database information file. If None, the extension of filename will be used.
  - transfer : str, optional
    Transfer mode passed to Datastore.export.
- Raises:
  - TypeError
    Raised if the set of arguments passed is inconsistent.
Examples
Typically the Registry.queryDataIds and Registry.queryDatasets methods are used to provide the iterables over data IDs and/or datasets to be exported:

    with butler.export(filename="exports.yaml") as export:
        # Export all flats, but none of the dimension element rows
        # (i.e. data ID information) associated with them.
        export.saveDatasets(
            butler.registry.queryDatasets("flat"), elements=()
        )
        # Export all datasets that start with "deepCoadd_" and all of
        # their associated data ID information.
        export.saveDatasets(butler.registry.queryDatasets("deepCoadd_*"))
- abstract find_dataset(dataset_type: DatasetType | str, data_id: DataId | None = None, *, collections: str | Sequence[str] | None = None, timespan: Timespan | None = None, storage_class: str | StorageClass | None = None, dimension_records: bool = False, datastore_records: bool = False, **kwargs: Any) DatasetRef | None¶
Find a dataset given its DatasetType and data ID.
This can be used to obtain a DatasetRef that permits the dataset to be read from a Datastore. If the dataset is a component and cannot be found using the provided dataset type, a dataset ref for the parent will be returned instead, but with the correct dataset type.
- Parameters:
  - dataset_type : DatasetType or str
    A DatasetType or the name of one. If this is a DatasetType instance, its storage class will be respected and propagated to the output, even if it differs from the dataset type definition in the registry, as long as the storage classes are convertible.
  - data_id : dict or DataCoordinate, optional
    A dict-like object containing the Dimension links that identify the dataset within a collection. If it is a dict, the data ID can include dimension record values, such as day_obs and seq_num or full_name, that can be used to derive the primary dimension.
  - collections : str or list[str], optional
    An ordered list of collections to search for the dataset. Defaults to self.defaults.collections.
  - timespan : Timespan, optional
    A timespan that the validity range of the dataset must overlap. If not provided, any CALIBRATION collections matched by the collections argument will not be searched.
  - storage_class : str or StorageClass or None
    A storage class to use when creating the returned entry. If given, it must be compatible with the default storage class.
  - dimension_records : bool, optional
    If True, the ref will be expanded and contain dimension records.
  - datastore_records : bool, optional
    If True, the ref will contain associated datastore records.
  - **kwargs
    Additional keyword arguments passed to DataCoordinate.standardize to convert data_id to a true DataCoordinate or augment an existing one. This can also include dimension record metadata that can be used to derive a primary dimension value.
- Returns:
  - ref : DatasetRef or None
    A reference to the dataset, or None if no matching dataset was found.
- Raises:
  - lsst.daf.butler.NoDefaultCollectionError
    Raised if no collections were provided and the butler has no default collections.
  - LookupError
    Raised if one or more data ID keys are missing.
  - lsst.daf.butler.MissingDatasetTypeError
    Raised if the dataset type does not exist.
  - lsst.daf.butler.MissingCollectionError
    Raised if any of collections does not exist in the registry.
Notes
This method simply returns None and does not raise an exception even when the set of collections searched is intrinsically incompatible with the dataset type, e.g. if datasetType.isCalibration() is False but only CALIBRATION collections are being searched. This may make it harder to debug some lookup failures, but the behavior is intentional: we consider it more important that failed searches are reported consistently, regardless of the reason, and that adding collections that do not contain a match to the search path never changes the behavior.
This method handles component dataset types automatically, though most other query operations do not.
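A short sketch (dataset type, data ID values, and collection name are illustrative):

    ref = butler.find_dataset(
        "bias", instrument="HSC", detector=50, collections=["HSC/calib"]
    )
    if ref is None:
        print("no matching bias in the searched collections")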
- classmethod from_config(config: Config | str | ParseResult | ResourcePath | Path | None = None, *, collections: Any = None, run: str | None = None, searchPaths: Sequence[str | ParseResult | ResourcePath | Path] | None = None, writeable: bool | None = None, inferDefaults: bool = True, without_datastore: bool = False, **kwargs: Any) Butler¶
Create a butler instance from configuration.
- Parameters:
  - config : ButlerConfig, Config or str, optional
    Configuration. Anything acceptable to the ButlerConfig constructor. If a directory path is given, the configuration will be read from a butler.yaml file in that location. If None is given, default values will be used. If config contains a "cls" key, its value is used as the name of the butler class, which must be a subclass of this class; otherwise DirectButler is instantiated.
  - collections : str or Iterable[str], optional
    An expression specifying the collections to be searched (in order) when reading datasets. This may be a str collection name or an iterable thereof. See Collection expressions for more information. These collections are not registered automatically and must be manually registered before they are used by any method, but they may be manually registered after the Butler is initialized.
  - run : str, optional
    Name of the RUN collection new datasets should be inserted into. If collections is None and run is not None, collections will be set to [run]. If not None, this collection will automatically be registered. If this is not set (and writeable is not set either), a read-only butler will be created.
  - searchPaths : list of str, optional
    Directory paths to search when calculating the full Butler configuration. Not used if the supplied config is already a ButlerConfig.
  - writeable : bool, optional
    Explicitly sets whether the butler supports write operations. If not provided, a read-write butler is created if any of run, tags, or chains is non-empty.
  - inferDefaults : bool, optional
    If True (default), infer default data ID values from the values present in the datasets in collections: if all collections have the same value (or no value) for a governor dimension, that value will be the default for that dimension. Nonexistent collections are ignored. If a default value is provided explicitly for a governor dimension via **kwargs, no default will be inferred for that dimension.
  - without_datastore : bool, optional
    If True, do not attach a datastore to this butler. Any attempts to use a datastore will fail.
  - **kwargs : Any
    Default data ID key-value pairs. These may only identify "governor" dimensions like instrument and skymap.
- Returns:
  - butler : Butler
    A Butler constructed from the given configuration.
Notes
Calling this factory method is identical to calling Butler(config, ...). Its only raison d'être is that mypy complains about the Butler() call.
Examples
While there are many ways to control exactly how a Butler interacts with the collections in its Registry, the most common cases are still simple.
For a read-only Butler that searches one collection, do:

    butler = Butler.from_config(
        "/path/to/repo", collections=["u/alice/DM-50000"]
    )

For a read-write Butler that writes to and reads from a RUN collection:

    butler = Butler.from_config(
        "/path/to/repo", run="u/alice/DM-50000/a"
    )

The Butler passed to a PipelineTask is often much more complex, because we want to write to one RUN collection but read from several others (as well):

    butler = Butler.from_config(
        "/path/to/repo",
        run="u/alice/DM-50000/a",
        collections=[
            "u/alice/DM-50000/a", "u/bob/DM-49998", "HSC/defaults"
        ]
    )

This butler will put new datasets to the run u/alice/DM-50000/a. Datasets will be read first from that run (since it appears first in the chain), and then from u/bob/DM-49998 and finally HSC/defaults.
Finally, one can always create a Butler with no collections:

    butler = Butler.from_config("/path/to/repo", writeable=True)

This can be extremely useful when you just want to use butler.registry, e.g. for inserting dimension data or managing collections, or when the collections you want to use with the butler are not consistent. Passing writeable explicitly here is only necessary if you want to be able to make changes to the repo; usually the value for writeable can be guessed from the collection arguments provided, but it defaults to False when there are no collection arguments.
- abstract get(datasetRefOrType: DatasetRef | DatasetType | str, /, dataId: DataId | None = None, *, parameters: dict[str, Any] | None = None, collections: Any = None, storageClass: StorageClass | str | None = None, timespan: Timespan | None = None, **kwargs: Any) Any¶
Retrieve a stored dataset.
- Parameters:
  - datasetRefOrType : DatasetRef, DatasetType, or str
    When a DatasetRef is provided, dataId should be None. Otherwise the DatasetType or name thereof. If a resolved DatasetRef, the associated dataset is returned directly without additional querying.
  - dataId : dict or DataCoordinate
    A dict of Dimension link name, value pairs that label the DatasetRef within a Collection. When None, a DatasetRef should be provided as the first argument.
  - parameters : dict
    Additional StorageClass-defined options to control reading, typically used to efficiently read only a subset of the dataset.
  - collections : Any, optional
    Collections to be searched, overriding self.collections. Can be any of the types supported by the collections argument to butler construction.
  - storageClass : StorageClass or str, optional
    The storage class to be used to override the Python type returned by this method. By default the returned type matches the dataset type definition for this dataset. Specifying a read StorageClass can force a different type to be returned. This type must be compatible with the original type.
  - timespan : Timespan or None, optional
    A timespan that the validity range of the dataset must overlap. If not provided and this is a calibration dataset type, an attempt will be made to find the timespan from any temporal coordinate in the data ID.
  - **kwargs
    Additional keyword arguments used to augment or construct a DataCoordinate. See DataCoordinate.standardize parameters.
- Returns:
  - obj : object
    The dataset.
- Raises:
  - LookupError
    Raised if no matching dataset exists in the Registry.
  - TypeError
    Raised if no collections were provided.
Notes
When looking up datasets in a CALIBRATION collection, this method requires that the given data ID include temporal dimensions beyond the dimensions of the dataset type itself, in order to find the dataset with the appropriate validity range. For example, a "bias" dataset with native dimensions {instrument, detector} could be fetched with a {instrument, detector, exposure} data ID, because exposure is a temporal dimension.
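A sketch of the calibration lookup described above (instrument, detector, exposure, and collection values are illustrative):

    bias = butler.get(
        "bias",
        instrument="HSC",
        detector=50,
        exposure=1228,  # temporal dimension used to select the validity range
        collections=["HSC/calib"],
    )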
- abstract getDeferred(datasetRefOrType: DatasetRef | DatasetType | str, /, dataId: DataId | None = None, *, parameters: dict | None = None, collections: Any = None, storageClass: str | StorageClass | None = None, timespan: Timespan | None = None, **kwargs: Any) DeferredDatasetHandle¶
Create a DeferredDatasetHandle which can later retrieve a dataset, after an immediate registry lookup.
- Parameters:
  - datasetRefOrType : DatasetRef, DatasetType, or str
    When a DatasetRef is provided, dataId should be None. Otherwise the DatasetType or name thereof.
  - dataId : dict or DataCoordinate, optional
    A dict of Dimension link name, value pairs that label the DatasetRef within a Collection. When None, a DatasetRef should be provided as the first argument.
  - parameters : dict
    Additional StorageClass-defined options to control reading, typically used to efficiently read only a subset of the dataset.
  - collections : Any, optional
    Collections to be searched, overriding self.collections. Can be any of the types supported by the collections argument to butler construction.
  - storageClass : StorageClass or str, optional
    The storage class to be used to override the Python type returned by this method. By default the returned type matches the dataset type definition for this dataset. Specifying a read StorageClass can force a different type to be returned. This type must be compatible with the original type.
  - timespan : Timespan or None, optional
    A timespan that the validity range of the dataset must overlap. If not provided and this is a calibration dataset type, an attempt will be made to find the timespan from any temporal coordinate in the data ID.
  - **kwargs
    Additional keyword arguments used to augment or construct a DataId. See DataId parameters.
- Returns:
  - obj : DeferredDatasetHandle
    A handle which can be used to retrieve a dataset at a later time.
- Raises:
  - LookupError
    Raised if no matching dataset exists in the Registry or datastore.
  - ValueError
    Raised if a resolved DatasetRef was passed as an input, but it differs from the one found in the registry.
  - TypeError
    Raised if no collections were provided.
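A sketch of deferring an expensive read (dataset type and data ID values are illustrative):

    handle = butler.getDeferred(
        "calexp", instrument="HSC", visit=1228, detector=50
    )
    # The registry lookup has already happened; the dataset itself is only
    # read when get() is called on the handle.
    calexp = handle.get()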
- getURI(datasetRefOrType: DatasetRef | DatasetType | str, /, dataId: DataId | None = None, *, predict: bool = False, collections: Any = None, run: str | None = None, **kwargs: Any) ResourcePath¶
Return the URI to the Dataset.
- Parameters:
  - datasetRefOrType : DatasetRef, DatasetType, or str
    When a DatasetRef is provided, dataId should be None. Otherwise the DatasetType or name thereof.
  - dataId : dict or DataCoordinate
    A dict of Dimension link name, value pairs that label the DatasetRef within a Collection. When None, a DatasetRef should be provided as the first argument.
  - predict : bool
    If True, allow URIs to be returned for datasets that have not been written.
  - collections : Any, optional
    Collections to be searched, overriding self.collections. Can be any of the types supported by the collections argument to butler construction.
  - run : str, optional
    Run to use for predictions, overriding self.run.
  - **kwargs
    Additional keyword arguments used to augment or construct a DataCoordinate. See DataCoordinate.standardize parameters.
- Returns:
  - uri : lsst.resources.ResourcePath
    URI pointing to the Dataset within the datastore. If the Dataset does not exist in the datastore, and if predict is True, the URI will be a prediction and will include a URI fragment "#predicted". If the datastore does not have entities that relate well to the concept of a URI, the returned URI string will be descriptive. The returned URI is not guaranteed to be obtainable.
- Raises:
  - LookupError
    Raised if a URI has been requested for a dataset that does not exist and guessing is not allowed.
  - ValueError
    Raised if a resolved DatasetRef was passed as an input, but it differs from the one found in the registry.
  - TypeError
    Raised if no collections were provided.
  - RuntimeError
    Raised if a URI is requested for a dataset that consists of multiple artifacts.
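For example, a sketch of looking up where an artifact lives (data ID values are illustrative):

    uri = butler.getURI("calexp", instrument="HSC", visit=1228, detector=50)
    print(uri)  # e.g. a file:// or s3:// ResourcePath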
- abstract getURIs(datasetRefOrType: DatasetRef | DatasetType | str, /, dataId: DataId | None = None, *, predict: bool = False, collections: Any = None, run: str | None = None, **kwargs: Any) DatasetRefURIs¶
Return the URIs associated with the dataset.
- Parameters:
  - datasetRefOrType : DatasetRef, DatasetType, or str
    When a DatasetRef is provided, dataId should be None. Otherwise the DatasetType or name thereof.
  - dataId : dict or DataCoordinate
    A dict of Dimension link name, value pairs that label the DatasetRef within a Collection. When None, a DatasetRef should be provided as the first argument.
  - predict : bool
    If True, allow URIs to be returned for datasets that have not been written.
  - collections : Any, optional
    Collections to be searched, overriding self.collections. Can be any of the types supported by the collections argument to butler construction.
  - run : str, optional
    Run to use for predictions, overriding self.run.
  - **kwargs
    Additional keyword arguments used to augment or construct a DataCoordinate. See DataCoordinate.standardize parameters.
- Returns:
  - uris : DatasetRefURIs
    The URI to the primary artifact associated with this dataset (if the dataset was disassembled within the datastore this may be None), and the URIs to any components associated with the dataset artifact (can be empty if there are no components).
- abstract get_dataset(id: DatasetId, *, storage_class: str | StorageClass | None = None, dimension_records: bool = False, datastore_records: bool = False) DatasetRef | None¶
Retrieve a Dataset entry.
- Parameters:
  - id : DatasetId
    The unique identifier for the dataset.
  - storage_class : str or StorageClass or None
    A storage class to use when creating the returned entry. If given, it must be compatible with the default storage class.
  - dimension_records : bool, optional
    If True, the ref will be expanded and contain dimension records.
  - datastore_records : bool, optional
    If True, the ref will contain associated datastore records.
- Returns:
  - ref : DatasetRef or None
    A ref to the dataset, or None if no matching dataset was found.
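A sketch of a UUID round trip, assuming ref is a resolved DatasetRef obtained elsewhere (e.g. from query_datasets):

    same_ref = butler.get_dataset(ref.id, dimension_records=True)
    if same_ref is None:
        print("no dataset with this ID is known to the butler")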
- classmethod get_dataset_from_uri(uri: str, factory: LabeledButlerFactoryProtocol | None = None) SpecificButlerDataset¶
Get the dataset associated with the given dataset URI.
- Parameters:
  - uri : str
    The URI associated with a dataset.
  - factory : LabeledButlerFactoryProtocol or None, optional
    Bound factory function that will be given the butler label and return a Butler. If this is not provided, the label will be tried directly.
- Returns:
  - result : SpecificButlerDataset
    The butler associated with this URI and the dataset itself. The dataset can be None if the UUID is valid but the dataset is not known to this butler.
- abstract get_dataset_type(name: str) DatasetType¶
Get the DatasetType.
- Parameters:
  - name : str
    Name of the type.
- Returns:
  - type : DatasetType
    The DatasetType associated with the given name.
- Raises:
  - lsst.daf.butler.MissingDatasetTypeError
    Raised if the requested dataset type has not been registered.
Notes
This method handles component dataset types automatically, though most other operations do not.
- classmethod get_known_repos() set[str]¶
Retrieve the list of known repository labels.
Notes
See ButlerRepoIndex for details on how the information is discovered.
- classmethod get_repo_uri(label: str, return_label: bool = False) ResourcePath¶
Look up the label in a butler repository index.
- Parameters:
  - label : str
    Label of the Butler repository to look up.
  - return_label : bool, optional
    If label cannot be found in the repository index (either because the index is not defined or label is not in the index) and return_label is True, then return ResourcePath(label). If return_label is False (default), an exception will be raised instead.
- Returns:
  - uri : lsst.resources.ResourcePath
    URI to the Butler repository associated with the given label, or the default value if it is provided.
- Raises:
  - KeyError
    Raised if the label is not found in the index, or if an index is not defined, and return_label is False.
Notes
See ButlerRepoIndex for details on how the information is discovered.
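A sketch of index lookups; the labels available depend entirely on the repository index configured at a given site:

    for label in Butler.get_known_repos():
        print(label, Butler.get_repo_uri(label))

    # With return_label=True an unknown label falls back to a path-like URI.
    uri = Butler.get_repo_uri("/repo/main", return_label=True)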
- abstract import_(*, directory: str | ParseResult | ResourcePath | Path | None = None, filename: str | ParseResult | ResourcePath | Path | TextIO | None = None, format: str | None = None, transfer: str | None = None, skip_dimensions: set | None = None, record_validation_info: bool = True, without_datastore: bool = False) None¶
Import datasets into this repository that were exported from a different butler repository via export.
- Parameters:
  - directory : ResourcePathExpression, optional
    Directory containing dataset files to import from. If None, filename and all dataset file paths specified therein must be absolute.
  - filename : ResourcePathExpression or TextIO
    A stream or name of a file that contains database information associated with the exported datasets, typically generated by export. If this is a string (name) or ResourcePath and is not an absolute path, it will first be looked for relative to directory, and if not found there it will be looked for in the current working directory. Defaults to "export.{format}".
  - format : str, optional
    File format for filename. If None, the extension of filename will be used.
  - transfer : str, optional
    Transfer mode passed to ingest.
  - skip_dimensions : set, optional
    Names of dimensions that should be skipped and not imported.
  - record_validation_info : bool, optional
    If True, the default, the datastore can record validation information associated with the file. If False, the datastore will not attempt to track any information such as checksums or file sizes. This can be useful if such information is tracked in an external system or if the file is to be compressed in place. It is up to the datastore whether this parameter is relevant.
  - without_datastore : bool, optional
    If True, only registry records will be imported and the datastore will be ignored.
- Raises:
  - TypeError
    Raised if the set of arguments passed is inconsistent, or if the butler is read-only.
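A sketch of importing a previously exported repository subset (the directory and file names are illustrative):

    butler.import_(
        directory="/path/to/exportdir",
        filename="export.yaml",  # format inferred from the extension
        transfer="copy",
    )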
- abstract ingest(*datasets: FileDataset, transfer: str | None = 'auto', record_validation_info: bool = True) None¶
Store and register one or more datasets that already exist on disk.
- Parameters:
  - *datasets : FileDataset
    Each positional argument is a struct containing information about a file to be ingested, including its URI (either absolute or relative to the datastore root, if applicable), a resolved DatasetRef, and optionally a formatter class or its fully-qualified string name. If a formatter is not provided, the formatter that would be used for put is assumed. On successful ingest, all FileDataset.formatter attributes will be set to the formatter class used. FileDataset.path attributes may be modified to put paths in whatever form the datastore considers standardized.
  - transfer : str, optional
    If not None, must be one of 'auto', 'move', 'copy', 'direct', 'split', 'hardlink', 'relsymlink' or 'symlink', indicating how to transfer the file.
  - record_validation_info : bool, optional
    If True, the default, the datastore can record validation information associated with the file. If False, the datastore will not attempt to track any information such as checksums or file sizes. This can be useful if such information is tracked in an external system or if the file is to be compressed in place. It is up to the datastore whether this parameter is relevant.
- Raises:
  - TypeError
    Raised if the butler is read-only or if no run was provided.
  - NotImplementedError
    Raised if the Datastore does not support the given transfer mode.
  - DatasetTypeNotSupportedError
    Raised if one or more files to be ingested have a dataset type that is not supported by the Datastore.
  - FileNotFoundError
    Raised if one of the given files does not exist.
  - FileExistsError
    Raised if transfer is not None but the (internal) location the file would be moved to is already occupied.
Notes
This operation is not fully exception safe: if a database operation fails, the given FileDataset instances may be only partially updated.
It is atomic in terms of database operations (they will either all succeed or all fail) provided the database engine implements transactions correctly. It will attempt to be atomic in terms of filesystem operations as well, but this cannot be implemented rigorously for most datastores.
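A minimal sketch, assuming ref is a resolved DatasetRef and the file path is illustrative:

    from lsst.daf.butler import FileDataset

    dataset = FileDataset(path="/data/raw_903342.fits", refs=[ref])
    # Symlink the existing file into the datastore rather than copying it.
    butler.ingest(dataset, transfer="symlink")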
- abstract ingest_zip(zip_file: str | ParseResult | ResourcePath | Path, transfer: str = 'auto') None¶
Ingest a Zip file into this butler.
The Zip file must have been created by retrieve_artifacts_zip.
- Parameters:
  - zip_file : lsst.resources.ResourcePathExpression
    Path to the Zip file.
  - transfer : str, optional
    Method to use to transfer the Zip into the datastore.
Notes
Run collections are created as needed.
- static makeRepo(root: str | ParseResult | ResourcePath | Path, config: Config | str | None = None, dimensionConfig: Config | str | None = None, standalone: bool = False, searchPaths: list[str] | None = None, forceConfigRoot: bool = True, outfile: str | ParseResult | ResourcePath | Path | None = None, overwrite: bool = False) Config¶
Create an empty data repository by adding a butler.yaml config to a repository root directory.
- Parameters:
  - root : lsst.resources.ResourcePathExpression
    Path or URI to the root location of the new repository. Will be created if it does not exist.
  - config : Config or str, optional
    Configuration to write to the repository, after setting any root-dependent Registry or Datastore config options. Can not be a ButlerConfig or a ConfigSubset. If None, default configuration will be used. Root-dependent config options specified in this config are overwritten if forceConfigRoot is True.
  - dimensionConfig : Config or str, optional
    Configuration for dimensions, will be used to initialize the registry database.
  - standalone : bool
    If True, write all expanded defaults, not just customized or repository-specific settings. This (mostly) decouples the repository from the default configuration, insulating it from changes to the defaults (which may be good or bad, depending on the nature of the changes). Future additions to the defaults will still be picked up when initializing Butlers to repos created with standalone=True.
  - searchPaths : list of str, optional
    Directory paths to search when calculating the full butler configuration.
  - forceConfigRoot : bool, optional
    If False, any values present in the supplied config that would normally be reset are not overridden and will appear directly in the output config. This allows non-standard overrides of the root directory for a datastore or registry to be given. If this parameter is True, the values for root will be forced into the resulting config if appropriate.
  - outfile : lsst.resources.ResourcePathExpression, optional
    If not None, the output configuration will be written to this location rather than into the repository itself. Can be a URI string. Can refer to a directory that will be used to write butler.yaml.
  - overwrite : bool, optional
    Create a new configuration file even if one already exists in the specified output location. Default is to raise an exception.
- Returns:
  - config : Config
    The updated configuration written to the repository.
- Raises:
  - ValueError
    Raised if a ButlerConfig or ConfigSubset is passed instead of a regular Config (as these subclasses would make it impossible to support standalone=False).
  - FileExistsError
    Raised if the output config file already exists.
  - os.error
    Raised if the directory does not exist, exists but is not a directory, or cannot be created.
Notes
Note that when standalone=False (the default), the configuration search path (see ConfigSubset.defaultSearchPaths) that was used to construct the repository should also be used to construct any Butlers, to avoid configuration inconsistencies.
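A sketch of bootstrapping a new repository and then connecting to it (the path is illustrative):

    from lsst.daf.butler import Butler

    Butler.makeRepo("/path/to/newrepo")
    butler = Butler.from_config("/path/to/newrepo", writeable=True)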
- classmethod parse_dataset_uri(uri: str) ParsedButlerDatasetURI¶
Extract the butler label and dataset ID from a dataset URI.
- Parameters:
  - uri : str
    The dataset URI to parse.
- Returns:
  - parsed : ParsedButlerDatasetURI
    The label associated with the butler repository from which this dataset originates, and the ID of the dataset.
Notes
Supports dataset URIs of the forms ivo://org.rubinobs/usdac/dr1?repo=butler_label&id=UUID (see DMTN-302) and butler://butler_label/UUID. The butler URI form is deprecated and can not include "/" in the label string. ivo URIs can include anything supported by the Butler constructor, including paths to repositories and alias labels. For example,

    ivo://org.rubinobs/dr1?repo=/repo/main&id=UUID

will return a label of /repo/main.
This method does not attempt to check that the dataset exists in the labeled butler.
Since the IVOID can be issued by any publisher to represent a Butler dataset, there is no validation of the path or netloc component of the URI. The only requirement is that there are id and repo keys in the ivo URI query component.
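A sketch, assuming the returned ParsedButlerDatasetURI exposes the label and dataset ID as attributes (the URI, UUID, and attribute names are illustrative assumptions):

    parsed = Butler.parse_dataset_uri(
        "ivo://org.rubinobs/dr1?repo=/repo/main"
        "&id=2e2cd10c-1c48-4e59-9c71-3bf6e255c4f7"
    )
    print(parsed.label, parsed.dataset_id)  # attribute names assumed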
- abstract put(obj: Any, datasetRefOrType: DatasetRef | DatasetType | str, /, dataId: DataId | None = None, *, run: str | None = None, **kwargs: Any) DatasetRef¶
Store and register a dataset.
- Parameters:
  - obj : object
    The dataset.
  - datasetRefOrType : DatasetRef, DatasetType, or str
    When DatasetRef is provided, dataId should be None. Otherwise the DatasetType or name thereof. If a fully resolved DatasetRef is given, the run and ID are used directly.
  - dataId : dict or DataCoordinate
    A dict of Dimension link name, value pairs that label the DatasetRef within a Collection. When None, a DatasetRef should be provided as the second argument.
  - run : str, optional
    The name of the run the dataset should be added to, overriding self.run. Not used if a resolved DatasetRef is provided.
  - **kwargs
    Additional keyword arguments used to augment or construct a DataCoordinate. See DataCoordinate.standardize parameters. Not used if a resolved DatasetRef is provided.
- Returns:
  - ref : DatasetRef
    A reference to the stored dataset, updated with the correct id if given.
- Raises:
  - TypeError
    Raised if the butler is read-only or if no run has been provided.
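A sketch, assuming catalog is an in-memory object matching the (illustrative) dataset type's storage class and that a default run is defined:

    ref = butler.put(
        catalog, "sourceTable", instrument="HSC", visit=1228, detector=50
    )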
- abstract query() AbstractContextManager[Query]¶
Context manager returning a Query object used for construction and execution of complex queries.
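A sketch of the context manager, assuming the Query object's where and datasets methods from the query interface (expression, dataset type, and collection are illustrative):

    with butler.query() as q:
        refs = list(
            q.where("instrument = 'HSC' AND visit > 1000")
             .datasets("calexp", collections="HSC/defaults")
        )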
- query_data_ids(dimensions: DimensionGroup | Iterable[str] | str, *, data_id: DataId | None = None, where: str = '', bind: Mapping[str, Any] | None = None, with_dimension_records: bool = False, order_by: Iterable[str] | str | None = None, limit: int | None = -20000, explain: bool = True, **kwargs: Any) list[DataCoordinate]¶
Query for data IDs matching user-provided criteria.
- Parameters:
  - dimensions : DimensionGroup, str, or Iterable[str]
    The dimensions of the data IDs to yield, as either DimensionGroup instances or str. Will be automatically expanded to a complete DimensionGroup.
  - data_id : dict or DataCoordinate, optional
    A data ID whose key-value pairs are used as equality constraints in the query.
  - where : str, optional
    A string expression similar to a SQL WHERE clause. May involve any column of a dimension table or (as a shortcut for the primary key column of a dimension table) a dimension name. See Dimension expressions for more information.
  - bind : Mapping, optional
    Mapping containing literal values that should be injected into the where expression, keyed by the identifiers they replace. Values of collection type can be expanded in some cases; see Identifiers for more information.
  - with_dimension_records : bool, optional
    If True (default is False), then returned data IDs will have dimension records.
  - order_by : Iterable[str] or str, optional
    Names of the columns/dimensions to use for ordering returned data IDs. A column name can be prefixed with a minus (-) to use descending ordering.
  - limit : int or None, optional
    Upper limit on the number of returned records. None can be used if no limit is wanted. A limit of 0 means that the query will be executed and validated but no results will be returned. In this case there will be no exception even if explain is True. If a negative value is given, a warning will be issued if the number of results is capped by that limit.
  - explain : bool, optional
    If True (default), then an EmptyQueryResultError exception is raised when the resulting list is empty. The exception contains a non-empty list of strings explaining possible causes for the empty result.
  - **kwargs
    Additional keyword arguments are forwarded to DataCoordinate.standardize when processing the data_id argument (and may be used to provide a constraining data ID even when the data_id argument is None).
- Returns:
  - dataIds : list[DataCoordinate]
    Data IDs matching the given query parameters. These are always guaranteed to identify all dimensions (DataCoordinate.hasFull returns True).
- Raises:
  - lsst.daf.butler.registry.DataIdError
    Raised when data_id or keyword arguments specify unknown dimensions or values, or when they contain inconsistent values.
  - lsst.daf.butler.registry.UserExpressionError
    Raised when the where expression is invalid.
  - lsst.daf.butler.EmptyQueryResultError
    Raised when the query generates an empty result and explain is set to True.
  - TypeError
    Raised when the arguments are incompatible.
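A sketch using a bind parameter in the where expression (dimension names and values are illustrative):

    data_ids = butler.query_data_ids(
        ["exposure", "detector"],
        where="instrument = 'HSC' AND exposure.observation_type = obs_type",
        bind={"obs_type": "science"},
        limit=100,
    )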
- query_datasets(dataset_type: str | DatasetType, collections: str | Iterable[str] | None = None, *, find_first: bool = True, data_id: DataId | None = None, where: str = '', bind: Mapping[str, Any] | None = None, with_dimension_records: bool = False, order_by: Iterable[str] | str | None = None, limit: int | None = -20000, explain: bool = True, **kwargs: Any) list[DatasetRef]¶
Query for dataset references matching user-provided criteria.
- Parameters:
  - dataset_type : str or DatasetType
    Dataset type object or name to search for.
  - collections : collection expression, optional
    A collection name or iterable of collection names to search. If not provided, the default collections are used. Can be a wildcard if find_first is False (if find-first is requested, the order of collections matters and wildcards make the order indeterminate). See Collection expressions for more information.
  - find_first : bool, optional
    If True (default), for each result data ID, only yield one DatasetRef of each DatasetType, from the first collection in which a dataset of that dataset type appears (according to the order of collections passed in). If True, collections must not contain wildcards.
  - data_id : dict or DataCoordinate, optional
    A data ID whose key-value pairs are used as equality constraints in the query.
  - where : str, optional
    A string expression similar to a SQL WHERE clause. May involve any column of a dimension table or (as a shortcut for the primary key column of a dimension table) a dimension name. See Dimension expressions for more information.
  - bind : Mapping, optional
    Mapping containing literal values that should be injected into the where expression, keyed by the identifiers they replace. Values of collection type can be expanded in some cases; see Identifiers for more information.
  - with_dimension_records : bool, optional
    If True (default is False), then returned data IDs will have dimension records.
  - order_by : Iterable[str] or str, optional
    Names of the columns/dimensions to use for ordering returned data IDs. A column name can be prefixed with a minus (-) to use descending ordering.
  - limit : int or None, optional
    Upper limit on the number of returned records. None can be used if no limit is wanted. A limit of 0 means that the query will be executed and validated but no results will be returned. In this case there will be no exception even if explain is True. If a negative value is given, a warning will be issued if the number of results is capped by that limit.
  - explain : bool, optional
    If True (default), then an EmptyQueryResultError exception is raised when the resulting list is empty. The exception contains a non-empty list of strings explaining possible causes for the empty result.
  - **kwargs
    Additional keyword arguments are forwarded to DataCoordinate.standardize when processing the data_id argument (and may be used to provide a constraining data ID even when the data_id argument is None).
- Returns:
  - refs : list[DatasetRef]
    Dataset references matching the given query criteria. Nested data IDs are guaranteed to include values for all implied dimensions (i.e. DataCoordinate.hasFull will return True).
- Raises:
  - lsst.daf.butler.registry.DatasetTypeExpressionError
    Raised when the dataset_type expression is invalid.
  - lsst.daf.butler.registry.DataIdError
    Raised when data_id or keyword arguments specify unknown dimensions or values, or when they contain inconsistent values.
  - lsst.daf.butler.registry.UserExpressionError
    Raised when the where expression is invalid.
  - lsst.daf.butler.EmptyQueryResultError
    Raised when the query generates an empty result and explain is set to True.
  - TypeError
    Raised when the arguments are incompatible, such as when a collection wildcard is passed when find_first is True, or when collections is None and default butler collections are not defined.
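For example, a sketch of a find-first search (dataset type, collection, and constraint are illustrative):

    refs = butler.query_datasets(
        "calexp",
        collections="HSC/defaults",
        where="instrument = 'HSC' AND detector = 50",
        limit=10,
    )
    for ref in refs:
        print(ref.dataId)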
- query_dimension_records(element: str, *, data_id: DataId | None = None, where: str = '', bind: Mapping[str, Any] | None = None, order_by: Iterable[str] | str | None = None, limit: int | None = -20000, explain: bool = True, **kwargs: Any) list[DimensionRecord]¶
Query for dimension information matching user-provided criteria.
- Parameters:
  - element : str
    The name of a dimension element to obtain records for.
  - data_id : dict or DataCoordinate, optional
    A data ID whose key-value pairs are used as equality constraints in the query.
  - where : str, optional
    A string expression similar to a SQL WHERE clause. See queryDataIds and Dimension expressions for more information.
  - bind : Mapping, optional
    Mapping containing literal values that should be injected into the where expression, keyed by the identifiers they replace. Values of collection type can be expanded in some cases; see Identifiers for more information.
  - order_by : Iterable[str] or str, optional
    Names of the columns/dimensions to use for ordering returned data IDs. A column name can be prefixed with a minus (-) to use descending ordering.
  - limit : int or None, optional
    Upper limit on the number of returned records. None can be used if no limit is wanted. A limit of 0 means that the query will be executed and validated but no results will be returned. In this case there will be no exception even if explain is True. If a negative value is given, a warning will be issued if the number of results is capped by that limit.
  - explain : bool, optional
    If True (default), then an EmptyQueryResultError exception is raised when the resulting list is empty. The exception contains a non-empty list of strings explaining possible causes for the empty result.
  - **kwargs
    Additional keyword arguments are forwarded to DataCoordinate.standardize when processing the data_id argument (and may be used to provide a constraining data ID even when the data_id argument is None).
- Returns:
  - records : list[DimensionRecord]
    Dimension records matching the given query parameters.
- Raises:
  - lsst.daf.butler.registry.DataIdError
    Raised when data_id or keyword arguments specify unknown dimensions or values, or when they contain inconsistent values.
  - lsst.daf.butler.registry.UserExpressionError
    Raised when the where expression is invalid.
  - lsst.daf.butler.EmptyQueryResultError
    Raised when the query generates an empty result and explain is set to True.
  - TypeError
    Raised when the arguments are incompatible, such as when a collection wildcard is passed when find_first is True, or when collections is None and default butler collections are not defined.
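A sketch (element name and constraint are illustrative; the record attributes available depend on the dimension universe):

    records = butler.query_dimension_records(
        "exposure", instrument="HSC", limit=10
    )
    for record in records:
        print(record.id, record.observation_type)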
- abstract removeRuns(names: Iterable[str], unstore: bool = True) None¶
Remove one or more RUN collections and the datasets within them.
- Parameters:
  - names : Iterable[str]
    The names of the collections to remove.
  - unstore : bool, optional
    If True (default), delete datasets from all datastores in which they are present, and attempt to roll back the registry deletions if datastore deletions fail (which may not always be possible). If False, datastore records for these datasets are still removed, but any artifacts (e.g. files) will not be.
- Raises:
  - TypeError
    Raised if one or more collections are not of type RUN.
- abstract retrieveArtifacts(refs: Iterable[DatasetRef], destination: ResourcePathExpression, transfer: str = 'auto', preserve_path: bool = True, overwrite: bool = False) list[ResourcePath]¶
Retrieve the artifacts associated with the supplied refs.
- Parameters:
  - refs : iterable of DatasetRef
    The datasets for which artifacts are to be retrieved. A single ref can result in multiple artifacts. The refs must be resolved.
  - destination : lsst.resources.ResourcePath or str
    Location to write the artifacts.
  - transfer : str, optional
    Method to use to transfer the artifacts. Must be one of the options supported by transfer_from(). "move" is not allowed.
  - preserve_path : bool, optional
    If True, the full path of the artifact within the datastore is preserved. If False, the final file component of the path is used.
  - overwrite : bool, optional
    If True, allow transfers to overwrite existing files at the destination.
- Returns:
  - targets : list of lsst.resources.ResourcePath
    URIs of file artifacts in the destination location. Order is not preserved.
Notes
For non-file datastores the artifacts written to the destination may not match the representation inside the datastore. For example a hierarchical data structure in a NoSQL database may well be stored as a JSON file.
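A sketch of copying a few artifacts out of the datastore (the query and destination are illustrative):

    refs = butler.query_datasets("calexp", collections="HSC/defaults", limit=5)
    paths = butler.retrieveArtifacts(
        refs, destination="/tmp/artifacts", preserve_path=False
    )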
- abstract retrieve_artifacts_zip(refs: Iterable[DatasetRef], destination: ResourcePathExpression, overwrite: bool = True) ResourcePath¶
Retrieve artifacts from a Butler and place in ZIP file.
- Parameters:
  - refs : Iterable[DatasetRef]
    The datasets to be included in the Zip file.
  - destination : lsst.resources.ResourcePathExpression
    Directory to write the new Zip file. This directory will also be used as a staging area for the datasets being downloaded from the datastore.
  - overwrite : bool, optional
    If False, the output Zip will not be written if a file of the same name is already present in destination.
- Returns:
  - zip_file : lsst.resources.ResourcePath
    The path to the new Zip file.
- Raises:
  - ValueError
    Raised if there are no refs to retrieve.
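A sketch of a Zip round trip with ingest_zip (the refs, destination, and second butler are illustrative):

    zip_path = butler.retrieve_artifacts_zip(refs, destination="/tmp")
    # Later, ingest the Zip into a different, writeable butler.
    other_butler.ingest_zip(zip_path, transfer="copy")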
- abstract transaction() AbstractContextManager[None]¶
Context manager supporting Butler transactions.
Transactions can be nested.
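A sketch of grouping writes so they succeed or fail together (the objects and data IDs are illustrative):

    with butler.transaction():
        # An exception raised anywhere in this block rolls back both puts.
        butler.put(catalog, "sourceTable", instrument="HSC", visit=1228, detector=50)
        butler.put(summary, "visitSummary", instrument="HSC", visit=1228)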
- abstract transfer_dimension_records_from(source_butler: LimitedButler | Butler, source_refs: Iterable[DatasetRef]) None¶
Transfer dimension records to this Butler from another Butler.
- Parameters:
  - source_butler : LimitedButler or Butler
    Butler from which the records are to be transferred. If data IDs in source_refs are not expanded, then this has to be a full Butler whose registry will be used to expand data IDs. If the source refs contain coordinates that are used to populate other records, then this will also need to be a full Butler.
  - source_refs : iterable of DatasetRef
    Datasets defined in the source butler whose dimension records should be transferred to this butler. In most circumstances, transfer is faster if the dataset refs are expanded.
- abstract transfer_from(source_butler: LimitedButler, source_refs: Iterable[DatasetRef], transfer: str = 'auto', skip_missing: bool = True, register_dataset_types: bool = False, transfer_dimensions: bool = False, dry_run: bool = False) Collection[DatasetRef]¶
Transfer datasets to this Butler from a run in another Butler.
- Parameters:
  - source_butler : LimitedButler
    Butler from which the datasets are to be transferred. If data IDs in source_refs are not expanded, then this has to be a full Butler whose registry will be used to expand data IDs.
  - source_refs : iterable of DatasetRef
    Datasets defined in the source butler that should be transferred to this butler. In most circumstances, transfer_from is faster if the dataset refs are expanded.
  - transfer : str, optional
    Transfer mode passed to transfer_from.
  - skip_missing : bool
    If True, datasets with no datastore artifact associated with them are not transferred. If False, a registry entry will be created even if no datastore record is created (and so will look equivalent to the dataset being unstored).
  - register_dataset_types : bool
    If True, any missing dataset types are registered. Otherwise an exception is raised.
  - transfer_dimensions : bool, optional
    If True, dimension record data associated with the new datasets will be transferred.
  - dry_run : bool, optional
    If True, the transfer will be processed without any modifications made to the target butler, as if the target butler did not have any of the datasets.
- Returns:
  - refs : list of DatasetRef
    The refs added to this Butler.
Notes
The datastore artifact has to exist for a transfer to be made but non-existence is not an error.
Datasets that already exist in this run will be skipped.
The datasets are imported as part of a transaction, although dataset types are registered before the transaction is started. This means that it is possible for a dataset type to be registered even though transfer has failed.
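A sketch of copying datasets between repositories (paths, dataset type, and collection are illustrative):

    source = Butler.from_config("/path/to/source_repo")
    dest = Butler.from_config("/path/to/dest_repo", writeable=True)
    refs = source.query_datasets("calexp", collections="HSC/runs/RC2")
    transferred = dest.transfer_from(
        source,
        refs,
        transfer="copy",
        register_dataset_types=True,
        transfer_dimensions=True,
    )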
- abstract validateConfiguration(logFailures: bool = False, datasetTypeNames: Iterable[str] | None = None, ignore: Iterable[str] | None = None) None¶
Validate butler configuration.
Checks that each DatasetType can be stored in the Datastore.
- Parameters:
  - logFailures : bool, optional
    If True, output a log message for every validation error detected.
  - datasetTypeNames : iterable of str, optional
    The DatasetType names that should be checked. This allows only a subset to be selected.
  - ignore : iterable of str, optional
    Names of DatasetTypes to skip over. This can be used to skip known problems. If a named DatasetType corresponds to a composite, all components of that DatasetType will also be ignored.
- Raises:
  - ButlerValidationError
    Raised if there is some inconsistency with how this Butler is configured.