Registry

class lsst.daf.butler.Registry(database: Database, universe: DimensionUniverse, *, attributes: Type[ButlerAttributeManager], opaque: Type[OpaqueTableStorageManager], dimensions: Type[DimensionRecordStorageManager], collections: Type[CollectionManager], datasets: Type[DatasetRecordStorageManager], datastoreBridges: Type[DatastoreRegistryBridgeManager], versions: ButlerVersionsManager, writeable: bool = True, create: bool = False)

Bases: object

Registry interface.

Parameters:
config : ButlerConfig, RegistryConfig, Config or str

Registry configuration.

Attributes Summary

defaultConfigFile Path to configuration defaults.
dimensions All dimensions recognized by this Registry (DimensionUniverse).

Methods Summary

associate(collection, refs) Add existing datasets to a TAGGED collection.
deleteOpaqueData(tableName, **where) Remove records from an opaque table.
disassociate(collection, refs) Remove existing datasets from a TAGGED collection.
expandDataId(dataId, …) Expand a dimension-based data ID to include additional information.
fetchOpaqueData(tableName, **where) Retrieve records from an opaque table.
findDataset(datasetType, dataId, …) Find a dataset given its DatasetType and data ID.
fromConfig(config, …) Create Registry subclass instance from config.
getCollectionChain(parent) Return the child collections in a CHAINED collection.
getCollectionType(name) Return an enumeration value indicating the type of the given collection.
getDataset(id) Retrieve a Dataset entry.
getDatasetLocations(ref) Retrieve datastore locations for a given dataset.
getDatasetType(name) Get the DatasetType.
getDatastoreBridgeManager() Return an object that allows a new Datastore instance to communicate with this Registry.
insertDatasets(datasetType, dataIds, …) Insert one or more datasets into the Registry.
insertDimensionData(element, *data, …) Insert one or more dimension records into the database.
insertOpaqueData(tableName, *data) Insert records into an opaque table.
isWriteable() Return True if this registry allows write operations, and False otherwise.
makeQueryBuilder(summary) Return a QueryBuilder instance capable of constructing and managing more complex queries than those obtainable via Registry interfaces.
queryCollections(expression, datasetType, …) Iterate over the collections whose names match an expression.
queryDatasetTypes(expression, *, components) Iterate over the dataset types whose names match an expression.
queryDatasets(datasetType, *, collections, …) Query for and iterate over dataset references matching user-provided criteria.
queryDimensions(dimensions, …) Query for and iterate over data IDs matching user-provided criteria.
registerCollection(name, type) Add a new collection if one with the given name does not exist.
registerDatasetType(datasetType) Add a new DatasetType to the Registry.
registerOpaqueTable(tableName, spec) Add an opaque (to the Registry) table for use by a Datastore or other data repository client.
registerRun(name) Add a new run if one with the given name does not exist.
removeCollection(name) Completely remove the given collection.
removeDatasets(refs) Remove datasets from the Registry.
setCollectionChain(parent, children) Define or redefine a CHAINED collection.
syncDimensionData(element, row, …) Synchronize the given dimension record with the database, inserting if it does not already exist and comparing values if it does.
transaction() Return a context manager that represents a transaction.

Attributes Documentation

defaultConfigFile = None

Path to configuration defaults. Relative to $DAF_BUTLER_DIR/config or absolute path. Can be None if no defaults specified.

dimensions

All dimensions recognized by this Registry (DimensionUniverse).

Methods Documentation

associate(collection: str, refs: Iterable[lsst.daf.butler.core.datasets.ref.DatasetRef]) → None

Add existing datasets to a TAGGED collection.

If a DatasetRef with the same exact integer ID is already in a collection nothing is changed. If a DatasetRef with the same DatasetType and data ID but with different integer ID exists in the collection, ConflictingDefinitionError is raised.

Parameters:
collection : str

Indicates the collection the datasets should be associated with.

refs : Iterable [ DatasetRef ]

An iterable of resolved DatasetRef instances that already exist in this Registry.

Raises:
ConflictingDefinitionError

If a Dataset with the given DatasetRef already exists in the given collection.

AmbiguousDatasetError

Raised if any(ref.id is None for ref in refs).

MissingCollectionError

Raised if collection does not exist in the registry.

TypeError

Raised if adding new datasets to the given collection is not allowed.
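The ID rules above can be illustrated with a toy in-memory model. This is only a sketch of the documented behavior, not the Registry implementation: the function name, the dict used to represent one TAGGED collection, the tuple form of a ref, and the locally defined exception class are all assumptions made for illustration.

```python
class ConflictingDefinitionError(Exception):
    """Local stand-in for the exception named above (assumption for this sketch)."""


def associate(tagged, refs):
    """Toy model of tagging datasets into one TAGGED collection.

    ``tagged`` maps (dataset_type, data_id) -> integer dataset ID.
    ``refs`` is an iterable of (dataset_id, dataset_type, data_id) tuples,
    where ``data_id`` is hashable (for example a tuple of key-value pairs).
    """
    # Work on a copy so a conflict leaves the collection unchanged
    # (a choice made for this sketch, not a documented guarantee).
    pending = dict(tagged)
    for dataset_id, dataset_type, data_id in refs:
        key = (dataset_type, data_id)
        existing = pending.setdefault(key, dataset_id)
        if existing != dataset_id:
            # Same DatasetType and data ID, different integer ID.
            raise ConflictingDefinitionError(
                f"{key} is already tagged with dataset {existing}"
            )
        # Same integer ID already present: nothing changes.
    tagged.update(pending)
```

Tagging the same ref twice is a no-op in this model, while tagging a second dataset with the same dataset type and data ID raises.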

deleteOpaqueData(tableName: str, **where) → None

Remove records from an opaque table.

Parameters:
tableName : str

Logical name of the opaque table. Must match the name used in a previous call to registerOpaqueTable.

where

Additional keyword arguments are interpreted as equality constraints that restrict the deleted rows (combined with AND); keyword arguments are column names and values are the values they must have.

disassociate(collection: str, refs: Iterable[lsst.daf.butler.core.datasets.ref.DatasetRef]) → None

Remove existing datasets from a TAGGED collection.

collection and ref combinations that are not currently associated are silently ignored.

Parameters:
collection : str

The collection the datasets should no longer be associated with.

refs : Iterable [ DatasetRef ]

An iterable of resolved DatasetRef instances that already exist in this Registry.

Raises:
AmbiguousDatasetError

Raised if any of the given dataset references is unresolved.

MissingCollectionError

Raised if collection does not exist in the registry.

TypeError

Raised if removing datasets from the given collection is not allowed.

expandDataId(dataId: Union[lsst.daf.butler.core.dimensions.coordinate.DataCoordinate, Mapping[str, Any], None] = None, *, graph: Optional[lsst.daf.butler.core.dimensions.graph.DimensionGraph] = None, records: Optional[Mapping[lsst.daf.butler.core.dimensions.elements.DimensionElement, Optional[lsst.daf.butler.core.dimensions.records.DimensionRecord]]] = None, **kwargs) → lsst.daf.butler.core.dimensions.coordinate.ExpandedDataCoordinate

Expand a dimension-based data ID to include additional information.

Parameters:
dataId : DataCoordinate or dict, optional

Data ID to be expanded; augmented and overridden by kwargs.

graph : DimensionGraph, optional

Set of dimensions for the expanded ID. If None, the dimensions will be inferred from the keys of dataId and kwargs. Dimensions that are in dataId or kwargs but not in graph are silently ignored, providing a way to extract and expand a subset of a data ID.

records : Mapping [DimensionElement, DimensionRecord], optional

Dimension record data to use before querying the database for that data.

**kwargs

Additional keywords are treated like additional key-value pairs for dataId, extending and overriding it.

Returns:
expanded : ExpandedDataCoordinate

A data ID that includes full metadata for all of the dimensions it identifies.
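How dataId, the keyword arguments, and graph combine can be sketched with plain dictionaries. This is a toy model under stated assumptions: the function name is hypothetical, graph is represented as a set of dimension names rather than a DimensionGraph, and no dimension records are looked up.

```python
def merge_data_id(data_id=None, graph=None, **kwargs):
    """Toy model of how key-value pairs are combined before expansion.

    ``graph`` is modeled as a set of dimension names (assumption).
    """
    merged = dict(data_id or {})
    # Keyword arguments extend and override the keys of data_id.
    merged.update(kwargs)
    if graph is not None:
        # Keys outside the requested dimensions are silently dropped.
        merged = {key: value for key, value in merged.items() if key in graph}
    return merged
```

Passing a smaller graph is what allows a subset of a larger data ID to be extracted.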

fetchOpaqueData(tableName: str, **where) → Iterator[dict]

Retrieve records from an opaque table.

Parameters:
tableName : str

Logical name of the opaque table. Must match the name used in a previous call to registerOpaqueTable.

where

Additional keyword arguments are interpreted as equality constraints that restrict the returned rows (combined with AND); keyword arguments are column names and values are the values they must have.

Yields:
row : dict

A dictionary representing a single result row.
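The meaning of the where keyword arguments (equality constraints combined with AND) can be sketched over a list of dictionaries. This is an in-memory illustration only; the function name and the list-of-dicts table are assumptions, and a real opaque table lives in the database.

```python
def fetch_rows(rows, **where):
    """Yield the rows for which every keyword matches the column of that name."""
    for row in rows:
        # All constraints must hold (AND); no constraints means every row matches.
        if all(
            column in row and row[column] == value
            for column, value in where.items()
        ):
            yield row
```

deleteOpaqueData interprets its keyword arguments in the same way, but removes the matching rows instead of yielding them.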

findDataset(datasetType: Union[lsst.daf.butler.core.datasets.type.DatasetType, str], dataId: Union[lsst.daf.butler.core.dimensions.coordinate.DataCoordinate, Mapping[str, Any], None] = None, *, collections: Any, **kwargs) → Optional[lsst.daf.butler.core.datasets.ref.DatasetRef]

Find a dataset given its DatasetType and data ID.

This can be used to obtain a DatasetRef that permits the dataset to be read from a Datastore. If the dataset is a component and cannot be found using the provided dataset type, a dataset ref for the parent will be returned instead but with the correct dataset type.

Parameters:
datasetType : DatasetType or str

A DatasetType or the name of one.

dataId : dict or DataCoordinate, optional

A dict-like object containing the Dimension links that identify the dataset within a collection.

collections

An expression that fully or partially identifies the collections to search for the dataset, such as a str, re.Pattern, or iterable thereof. The special value ... (Ellipsis) can be used to return all collections. See Collection expressions for more information.

**kwargs

Additional keyword arguments passed to DataCoordinate.standardize to convert dataId to a true DataCoordinate or augment an existing one.

Returns:
ref : DatasetRef

A reference to the dataset, or None if no matching Dataset was found.

Raises:
LookupError

Raised if one or more data ID keys are missing or the dataset type does not exist.

MissingCollectionError

Raised if any of collections does not exist in the registry.
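Because a missing dataset is reported by returning None rather than by raising, callers that require a result need an explicit check. A minimal sketch, assuming registry is any object exposing findDataset with the signature documented above; the helper name and the DatasetNotFoundError class are hypothetical and not part of the library.

```python
class DatasetNotFoundError(LookupError):
    """Hypothetical error for this sketch; not part of lsst.daf.butler."""


def require_dataset(registry, dataset_type, data_id, collections):
    """Return the matching DatasetRef or raise if no dataset was found."""
    ref = registry.findDataset(dataset_type, data_id, collections=collections)
    if ref is None:
        # findDataset signals "no match" with None, not with an exception.
        raise DatasetNotFoundError(
            f"no {dataset_type!r} dataset for {data_id!r} in {collections!r}"
        )
    return ref
```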

classmethod fromConfig(config: Union[ButlerConfig, RegistryConfig, Config, str], create: bool = False, butlerRoot: Optional[str] = None, writeable: bool = True) → Registry

Create Registry subclass instance from config.

Uses registry.cls from config to determine which subclass to instantiate.

Parameters:
config : ButlerConfig, RegistryConfig, Config or str

Registry configuration.

create : bool, optional

Assume empty Registry and create a new one.

butlerRoot : str, optional

Path to the repository root this Registry will manage.

writeable : bool, optional

If True (default) create a read-write connection to the database.

Returns:
registry : Registry (subclass)

A new Registry subclass instance.

getCollectionChain(parent: str) → lsst.daf.butler.registry.wildcards.CollectionSearch

Return the child collections in a CHAINED collection.

Parameters:
parent : str

Name of the chained collection. Must have already been added via a call to Registry.registerCollection.

Returns:
children : CollectionSearch

An object that defines the search path of the collection. See Collection expressions for more information.

Raises:
MissingCollectionError

Raised if parent does not exist in the Registry.

TypeError

Raised if parent does not correspond to a CHAINED collection.

getCollectionType(name: str) → lsst.daf.butler.registry._collectionType.CollectionType

Return an enumeration value indicating the type of the given collection.

Parameters:
name : str

The name of the collection.

Returns:
type : CollectionType

Enum value indicating the type of this collection.

Raises:
MissingCollectionError

Raised if no collection with the given name exists.

getDataset(id: int) → Optional[lsst.daf.butler.core.datasets.ref.DatasetRef]

Retrieve a Dataset entry.

Parameters:
id : int

The unique identifier for the dataset.

Returns:
ref : DatasetRef or None

A ref to the Dataset, or None if no matching Dataset was found.

getDatasetLocations(ref: lsst.daf.butler.core.datasets.ref.DatasetRef) → Iterable[str]

Retrieve datastore locations for a given dataset.

Parameters:
ref : DatasetRef

A reference to the dataset for which to retrieve storage information.

Returns:
datastores : Iterable [ str ]

All the matching datastores holding this dataset.

Raises:
AmbiguousDatasetError

Raised if ref.id is None.

getDatasetType(name: str) → lsst.daf.butler.core.datasets.type.DatasetType

Get the DatasetType.

Parameters:
name : str

Name of the type.

Returns:
type : DatasetType

The DatasetType associated with the given name.

Raises:
KeyError

Requested named DatasetType could not be found in registry.

getDatastoreBridgeManager() → DatastoreRegistryBridgeManager

Return an object that allows a new Datastore instance to communicate with this Registry.

Returns:
manager : DatastoreRegistryBridgeManager

Object that mediates communication between this Registry and its associated datastores.

insertDatasets(datasetType: Union[lsst.daf.butler.core.datasets.type.DatasetType, str], dataIds: Iterable[Union[lsst.daf.butler.core.dimensions.coordinate.DataCoordinate, Mapping[str, Any]]], run: str) → List[lsst.daf.butler.core.datasets.ref.DatasetRef]

Insert one or more datasets into the Registry.

This always adds new datasets; to associate existing datasets with a new collection, use associate.

Parameters:
datasetType : DatasetType or str

A DatasetType or the name of one.

dataIds : Iterable of dict or DataCoordinate

Dimension-based identifiers for the new datasets.

run : str

The name of the run that produced the datasets.

Returns:
refs : list of DatasetRef

Resolved DatasetRef instances for all given data IDs (in the same order).

Raises:
ConflictingDefinitionError

If a dataset with the same dataset type and data ID as one of those given already exists in run.

MissingCollectionError

Raised if run does not exist in the registry.
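Since run must already exist and registerRun cannot be called within a transaction, one plausible ordering is to register the run first and then insert inside a transaction. A sketch, assuming registry is any object exposing the registerRun, transaction, and insertDatasets methods documented on this page; the helper name is hypothetical.

```python
def ingest_into_run(registry, dataset_type, data_ids, run):
    """Register ``run`` if needed, then insert datasets into it atomically."""
    # registerRun performs its own transaction, so call it before opening one.
    registry.registerRun(run)
    with registry.transaction():
        # Returns resolved refs in the same order as data_ids.
        return registry.insertDatasets(dataset_type, data_ids, run)
```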

insertDimensionData(element: Union[lsst.daf.butler.core.dimensions.elements.DimensionElement, str], *data, conform: bool = True) → None

Insert one or more dimension records into the database.

Parameters:
element : DimensionElement or str

The DimensionElement or name thereof that identifies the table records will be inserted into.

data : dict or DimensionRecord (variadic)

One or more records to insert.

conform : bool, optional

If False (True is default) perform no checking or conversions, and assume that element is a DimensionElement instance and data is one or more DimensionRecord instances of the appropriate subclass.

insertOpaqueData(tableName: str, *data) → None

Insert records into an opaque table.

Parameters:
tableName : str

Logical name of the opaque table. Must match the name used in a previous call to registerOpaqueTable.

data

Each additional positional argument is a dictionary that represents a single row to be added.

isWriteable() → bool

Return True if this registry allows write operations, and False otherwise.

makeQueryBuilder(summary: lsst.daf.butler.registry.queries._structs.QuerySummary) → lsst.daf.butler.registry.queries._builder.QueryBuilder

Return a QueryBuilder instance capable of constructing and managing more complex queries than those obtainable via Registry interfaces.

This is an advanced interface; downstream code should prefer Registry.queryDimensions and Registry.queryDatasets whenever those are sufficient.

Parameters:
summary : QuerySummary

Object describing and categorizing the full set of dimensions that will be included in the query.

Returns:
builder : QueryBuilder

Object that can be used to construct and perform advanced queries.

queryCollections(expression: Any = Ellipsis, datasetType: Optional[lsst.daf.butler.core.datasets.type.DatasetType] = None, collectionType: Optional[lsst.daf.butler.registry._collectionType.CollectionType] = None, flattenChains: bool = False, includeChains: Optional[bool] = None) → Iterator[str]

Iterate over the collections whose names match an expression.

Parameters:
expression : Any, optional

An expression that fully or partially identifies the collections to return, such as a str, re.Pattern, or iterable thereof. The special value ... (Ellipsis) can be used to return all collections, and is the default. See Collection expressions for more information.

datasetType : DatasetType, optional

If provided, only yield collections that should be searched for this dataset type according to expression. If this is not provided, any dataset type restrictions in expression are ignored.

collectionType : CollectionType, optional

If provided, only yield collections of this type.

flattenChains : bool, optional

If True (False is default), recursively yield the child collections of matching CHAINED collections.

includeChains : bool, optional

If True, yield records for matching CHAINED collections. Default is the opposite of flattenChains: include either CHAINED collections or their children, but not both.

Yields:
collection : str

The name of a collection that matches expression.
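The interaction between flattenChains and includeChains can be sketched with a toy model. This illustrates the documented defaults only and is not the Registry implementation: the function name and the dict representation of CHAINED collections are assumptions, expression matching is omitted, and duplicate names are not removed.

```python
def iter_collections(names, chains, flatten_chains=False, include_chains=None):
    """Yield collection names from ``names`` with optional chain flattening.

    ``chains`` maps each CHAINED collection name to its ordered children;
    any name not in ``chains`` is treated as a non-CHAINED collection.
    Chains are assumed to be free of cycles.
    """
    if include_chains is None:
        # Default: the opposite of flatten_chains.
        include_chains = not flatten_chains

    def visit(name):
        if name in chains:
            if include_chains:
                yield name
            if flatten_chains:
                for child in chains[name]:
                    yield from visit(child)
        else:
            yield name

    for name in names:
        yield from visit(name)
```

With the defaults, a CHAINED collection is yielded by name; with flatten_chains=True alone, only its children are yielded.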

queryDatasetTypes(expression: Any = Ellipsis, *, components: Optional[bool] = None) → Iterator[lsst.daf.butler.core.datasets.type.DatasetType]

Iterate over the dataset types whose names match an expression.

Parameters:
expression : Any, optional

An expression that fully or partially identifies the dataset types to return, such as a str, re.Pattern, or iterable thereof. The special value ... (Ellipsis) can be used to return all dataset types, and is the default. See DatasetType expressions for more information.

components : bool, optional

If True, apply all expression patterns to component dataset type names as well. If False, never apply patterns to components. If None (default), apply patterns to components only if their parent datasets were not matched by the expression. Fully-specified component datasets (str or DatasetType instances) are always included.

Yields:
datasetType : DatasetType

A DatasetType instance whose name matches expression.

queryDatasets(datasetType: Any, *, collections: Any, dimensions: Optional[Iterable[Union[lsst.daf.butler.core.dimensions.elements.Dimension, str]]] = None, dataId: Union[lsst.daf.butler.core.dimensions.coordinate.DataCoordinate, Mapping[str, Any], None] = None, where: Optional[str] = None, deduplicate: bool = False, expand: bool = True, components: Optional[bool] = None, **kwargs) → Iterator[lsst.daf.butler.core.datasets.ref.DatasetRef]

Query for and iterate over dataset references matching user-provided criteria.

Parameters:
datasetType

An expression that fully or partially identifies the dataset types to be queried. Allowed types include DatasetType, str, re.Pattern, and iterables thereof. The special value ... (Ellipsis) can be used to query all dataset types. See DatasetType expressions for more information.

collections

An expression that fully or partially identifies the collections to search for datasets, such as a str, re.Pattern, or iterable thereof. The special value ... (Ellipsis) can be used to return all collections. See Collection expressions for more information.

dimensions : Iterable of Dimension or str

Dimensions to include in the query (in addition to those used to identify the queried dataset type(s)), either to constrain the resulting datasets to those for which a matching dimension exists, or to relate the dataset type’s dimensions to dimensions referenced by the dataId or where arguments.

dataId : dict or DataCoordinate, optional

A data ID whose key-value pairs are used as equality constraints in the query.

where : str, optional

A string expression similar to a SQL WHERE clause. May involve any column of a dimension table or (as a shortcut for the primary key column of a dimension table) dimension name. See Dimension expressions for more information.

deduplicate : bool, optional

If True (False is default), for each result data ID, only yield one DatasetRef of each DatasetType, from the first collection in which a dataset of that dataset type appears (according to the order of collections passed in). If True, collections must not contain regular expressions and may not be ... (Ellipsis).

expand : bool, optional

If True (default) attach ExpandedDataCoordinate instead of minimal DataCoordinate base-class instances.

components : bool, optional

If True, apply all dataset expression patterns to component dataset type names as well. If False, never apply patterns to components. If None (default), apply patterns to components only if their parent datasets were not matched by the expression. Fully-specified component datasets (str or DatasetType instances) are always included.

**kwargs

Additional keyword arguments are forwarded to DataCoordinate.standardize when processing the dataId argument (and may be used to provide a constraining data ID even when the dataId argument is None).

Yields:
ref : DatasetRef

Dataset references matching the given query criteria. These are grouped by DatasetType if the query evaluates to multiple dataset types, but order is otherwise unspecified.

Raises:
TypeError

Raised when the arguments are incompatible, such as when a collection wildcard is passed when deduplicate is True.

Notes

When multiple dataset types are queried in a single call, the results of this operation are equivalent to querying for each dataset type separately in turn, and no information about the relationships between datasets of different types is included. In contexts where that kind of information is important, the recommended pattern is to use queryDimensions to first obtain data IDs (possibly with the desired dataset types and collections passed as constraints to the query), and then use multiple (generally much simpler) calls to queryDatasets with the returned data IDs passed as constraints.
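The pattern recommended in these notes might look like the following sketch. It assumes registry is any object exposing queryDimensions and queryDatasets with the signatures documented on this page, and that dataset_types is a list of dataset type names; the helper name is hypothetical.

```python
def datasets_by_data_id(registry, dataset_types, collections, dimensions, where=None):
    """Yield (data_id, {dataset_type: [refs]}) pairs for related datasets."""
    # Step 1: obtain data IDs, constrained by the dataset types and collections.
    data_ids = registry.queryDimensions(
        dimensions,
        datasets=dataset_types,
        collections=collections,
        where=where,
    )
    for data_id in data_ids:
        # Step 2: one simple dataset query per dataset type and data ID.
        yield data_id, {
            dataset_type: list(
                registry.queryDatasets(
                    dataset_type, collections=collections, dataId=data_id
                )
            )
            for dataset_type in dataset_types
        }
```

Grouping by data ID keeps the relationship between datasets of different types explicit, which a single multi-type queryDatasets call does not provide.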

queryDimensions(dimensions: Union[Iterable[Union[lsst.daf.butler.core.dimensions.elements.Dimension, str]], lsst.daf.butler.core.dimensions.elements.Dimension, str], *, dataId: Union[lsst.daf.butler.core.dimensions.coordinate.DataCoordinate, Mapping[str, Any], None] = None, datasets: Optional[Any] = None, collections: Optional[Any] = None, where: Optional[str] = None, expand: bool = True, components: Optional[bool] = None, **kwargs) → Iterator[lsst.daf.butler.core.dimensions.coordinate.DataCoordinate]

Query for and iterate over data IDs matching user-provided criteria.

Parameters:
dimensions : Dimension or str, or iterable thereof

The dimensions of the data IDs to yield, as either Dimension instances or str. Will be automatically expanded to a complete DimensionGraph.

dataId : dict or DataCoordinate, optional

A data ID whose key-value pairs are used as equality constraints in the query.

datasets : Any, optional

An expression that fully or partially identifies dataset types that should constrain the yielded data IDs. For example, including “raw” here would constrain the yielded instrument, exposure, detector, and physical_filter values to only those for which at least one “raw” dataset exists in collections. Allowed types include DatasetType, str, re.Pattern, and iterables thereof. Unlike other dataset type expressions, ... (Ellipsis) is not permitted, since it doesn’t make sense to constrain data IDs on the existence of all datasets. See DatasetType expressions for more information.

collections : Any, optional

An expression that fully or partially identifies the collections to search for datasets, such as a str, re.Pattern, or iterable thereof. The special value ... (Ellipsis) can be used to return all collections. Must be provided if datasets is, and is ignored if it is not. See Collection expressions for more information.

where : str, optional

A string expression similar to a SQL WHERE clause. May involve any column of a dimension table or (as a shortcut for the primary key column of a dimension table) dimension name. See Dimension expressions for more information.

expand : bool, optional

If True (default) yield ExpandedDataCoordinate instead of minimal DataCoordinate base-class instances.

components : bool, optional

If True, apply all dataset expression patterns to component dataset type names as well. If False, never apply patterns to components. If None (default), apply patterns to components only if their parent datasets were not matched by the expression. Fully-specified component datasets (str or DatasetType instances) are always included.

**kwargs

Additional keyword arguments are forwarded to DataCoordinate.standardize when processing the dataId argument (and may be used to provide a constraining data ID even when the dataId argument is None).

Yields:
dataId : DataCoordinate

Data IDs matching the given query parameters. Order is unspecified.

registerCollection(name: str, type: lsst.daf.butler.registry._collectionType.CollectionType = <CollectionType.TAGGED: 2>) → None

Add a new collection if one with the given name does not exist.

Parameters:
name : str

The name of the collection to create.

type : CollectionType

Enum value indicating the type of collection to create.

Notes

This method cannot be called within transactions, as it needs to be able to perform its own transaction to be concurrent.

registerDatasetType(datasetType: lsst.daf.butler.core.datasets.type.DatasetType) → bool

Add a new DatasetType to the Registry.

It is not an error to register the same DatasetType twice.

Parameters:
datasetType : DatasetType

The DatasetType to be added.

Returns:
inserted : bool

True if datasetType was inserted, False if an identical existing DatasetType was found. Note that in either case the DatasetType is guaranteed to be defined in the Registry consistently with the given definition.

Raises:
ValueError

Raised if the dimensions or storage class are invalid.

ConflictingDefinitionError

Raised if this DatasetType is already registered with a different definition.

Notes

This method cannot be called within transactions, as it needs to be able to perform its own transaction to be concurrent.

registerOpaqueTable(tableName: str, spec: lsst.daf.butler.core.ddl.TableSpec) → None

Add an opaque (to the Registry) table for use by a Datastore or other data repository client.

Opaque table records can be added via insertOpaqueData, retrieved via fetchOpaqueData, and removed via deleteOpaqueData.

Parameters:
tableName : str

Logical name of the opaque table. This may differ from the actual name used in the database by a prefix and/or suffix.

spec : ddl.TableSpec

Specification for the table to be added.

registerRun(name: str) → None

Add a new run if one with the given name does not exist.

Parameters:
name : str

The name of the run to create.

Notes

This method cannot be called within transactions, as it needs to be able to perform its own transaction to be concurrent.

removeCollection(name: str) → None

Completely remove the given collection.

Parameters:
name : str

The name of the collection to remove.

Raises:
MissingCollectionError

Raised if no collection with the given name exists.

Notes

If this is a RUN collection, all datasets and quanta in it are also fully removed. This requires that those datasets be removed (or at least trashed) from any datastores that hold them first.

A collection may not be deleted as long as it is referenced by a CHAINED collection; the CHAINED collection must be deleted or redefined first.

removeDatasets(refs: Iterable[lsst.daf.butler.core.datasets.ref.DatasetRef]) → None

Remove datasets from the Registry.

The datasets will be removed unconditionally from all collections, and any Quantum that consumed one of these datasets will instead be marked as having a NULL input. Datastore records will not be deleted; the caller is responsible for ensuring that each dataset has already been removed from all Datastores.

Parameters:
refs : Iterable of DatasetRef

References to the datasets to be removed. Must include a valid id attribute, and should be considered invalidated upon return.

Raises:
AmbiguousDatasetError

Raised if any ref.id is None.

OrphanedRecordError

Raised if any dataset is still present in any Datastore.
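One way to respect the requirement that datasets be gone from every Datastore is to check getDatasetLocations before removing. A sketch, assuming registry is any object exposing getDatasetLocations and removeDatasets as documented on this page and that all refs are resolved; the helper name is hypothetical.

```python
def remove_unstored(registry, refs):
    """Remove only the refs that no datastore still holds.

    Returns a (removed, still_stored) pair of lists.
    """
    removable, still_stored = [], []
    for ref in refs:
        # An empty result means no datastore reports holding this dataset.
        if list(registry.getDatasetLocations(ref)):
            still_stored.append(ref)
        else:
            removable.append(ref)
    if removable:
        registry.removeDatasets(removable)
    return removable, still_stored
```

Refs returned in the second list still need to be removed (or at least trashed) from their datastores before they can be deleted from the Registry.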

setCollectionChain(parent: str, children: Any) → None

Define or redefine a CHAINED collection.

Parameters:
parent : str

Name of the chained collection. Must have already been added via a call to Registry.registerCollection.

children : Any

An expression defining an ordered search of child collections, generally an iterable of str. Restrictions on the dataset types to be searched can also be included, by passing a mapping or an iterable containing tuples; see Collection expressions for more information.

Raises:
MissingCollectionError

Raised when any of the given collections do not exist in the Registry.

TypeError

Raised if parent does not correspond to a CHAINED collection.

ValueError

Raised if the given collections contain a cycle.
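The cycle condition can be illustrated with a toy check over a plain dictionary of chain definitions. This is a sketch of the idea only, not the Registry's own check: the function name and the dict representation are assumptions.

```python
def check_no_cycle(chains, parent, children):
    """Raise ValueError if defining ``parent`` -> ``children`` creates a cycle.

    ``chains`` maps each existing CHAINED collection name to its children.
    """
    proposed = dict(chains)
    proposed[parent] = list(children)
    stack = list(children)
    visited = set()
    while stack:
        name = stack.pop()
        if name == parent:
            # The parent is reachable from its own children.
            raise ValueError(f"chain {parent!r} would contain itself")
        if name in visited:
            continue
        visited.add(name)
        # Non-CHAINED collections have no children to follow.
        stack.extend(proposed.get(name, ()))
```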

syncDimensionData(element: Union[lsst.daf.butler.core.dimensions.elements.DimensionElement, str], row: Union[Mapping[str, Any], lsst.daf.butler.core.dimensions.records.DimensionRecord], conform: bool = True) → bool

Synchronize the given dimension record with the database, inserting if it does not already exist and comparing values if it does.

Parameters:
element : DimensionElement or str

The DimensionElement or name thereof that identifies the table records will be inserted into.

row : dict or DimensionRecord

The record to insert.

conform : bool, optional

If False (True is default) perform no checking or conversions, and assume that element is a DimensionElement instance and row is a DimensionRecord instance of the appropriate subclass.

Returns:
inserted : bool

True if a new row was inserted, False otherwise.

Raises:
ConflictingDefinitionError

Raised if the record exists in the database (according to primary key lookup) but is inconsistent with the given one.

Notes

This method cannot be called within transactions, as it needs to be able to perform its own transaction to be concurrent.
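The insert-or-compare behavior can be sketched with a dictionary keyed by primary key. This toy model only mirrors the documented return value and error; the function name, the dict table, the explicit primary_key argument, and the locally defined exception class are assumptions.

```python
class ConflictingDefinitionError(Exception):
    """Local stand-in for the exception named above (assumption for this sketch)."""


def sync_record(table, primary_key, row):
    """Insert ``row`` if its key is new; otherwise compare with the stored row.

    ``table`` maps primary-key tuples to row dicts, and ``primary_key`` is
    the sequence of column names forming the key. Returns True if inserted.
    """
    key = tuple(row[name] for name in primary_key)
    existing = table.get(key)
    if existing is None:
        table[key] = dict(row)
        return True
    if existing != dict(row):
        # Same primary key, different values.
        raise ConflictingDefinitionError(f"record {key} differs from stored row")
    return False
```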

transaction() → Iterator[None]

Return a context manager that represents a transaction.