Registry¶

class lsst.daf.butler.registry.Registry(database: Database, universe: DimensionUniverse, *, opaque: Type[OpaqueTableStorageManager], dimensions: Type[DimensionRecordStorageManager], collections: Type[CollectionManager], create: bool = False)¶

Bases: object

Registry interface.

Parameters:	config : `ButlerConfig`, `RegistryConfig`, `Config` or `str` Registry configuration

Attributes Summary

`defaultConfigFile`	Path to configuration defaults.
`dimensions`	The universe of all dimensions known to the registry (`DimensionUniverse`).

Methods Summary

`associate`(collection, refs, *, recursive)	Add existing Datasets to a collection, implicitly creating the collection if it does not already exist.
`attachComponent`(name, parent, component)	Attach a component to a dataset.
`deleteOpaqueData`(tableName, **where)	Remove records from an opaque table.
`disassociate`(collection, refs, *, recursive)	Remove existing Datasets from a collection.
`expandDataId`(dataId, …)	Expand a dimension-based data ID to include additional information.
`fetchOpaqueData`(tableName, **where)	Retrieve records from an opaque table.
`findDataset`(datasetType, dataId, …)	Find a dataset given its `DatasetType` and data ID.
`fromConfig`(config, …)	Create `Registry` subclass instance from `config`.
`getCollectionChain`(parent)	Return the child collections in a `CHAINED` collection.
`getCollectionType`(name)	Return an enumeration value indicating the type of the given collection.
`getDataset`(id, datasetType, dataId)	Retrieve a Dataset entry.
`getDatasetLocations`(ref)	Retrieve datastore locations for a given dataset.
`getDatasetType`(name)	Get the `DatasetType`.
`insertDatasetLocations`(datastoreName, refs)	Record that a datastore holds the given datasets.
`insertDatasets`(datasetType, dataIds, …)	Insert one or more datasets into the `Registry`.
`insertDimensionData`(element, *data, …)	Insert one or more dimension records into the database.
`insertOpaqueData`(tableName, *data)	Insert records into an opaque table.
`isWriteable`()	Return `True` if this registry allows write operations, and `False` otherwise.
`makeQueryBuilder`(summary)	Return a `QueryBuilder` instance capable of constructing and managing more complex queries than those obtainable via `Registry` interfaces.
`queryCollections`(expression, datasetType, …)	Iterate over the collections whose names match an expression.
`queryDatasetTypes`(expression)	Iterate over the dataset types whose names match an expression.
`queryDatasets`(datasetType, *, collections, …)	Query for and iterate over dataset references matching user-provided criteria.
`queryDimensions`(dimensions, …)	Query for and iterate over data IDs matching user-provided criteria.
`registerCollection`(name, type)	Add a new collection if one with the given name does not exist.
`registerDatasetType`(datasetType)	Add a new `DatasetType` to the Registry.
`registerOpaqueTable`(tableName, spec)	Add an opaque (to the `Registry`) table for use by a `Datastore` or other data repository client.
`registerRun`(name)	Add a new run if one with the given name does not exist.
`removeDataset`(ref)	Remove a dataset from the Registry.
`removeDatasetLocation`(datastoreName, ref)	Remove datastore location associated with this dataset.
`setCollectionChain`(parent, children)	Define or redefine a `CHAINED` collection.
`transaction`()	Return a context manager that represents a transaction.

Attributes Documentation

defaultConfigFile = None¶: Path to configuration defaults. Relative to $DAF_BUTLER_DIR/config or absolute path. Can be None if no defaults specified.

dimensions¶: The universe of all dimensions known to the registry (DimensionUniverse).

Methods Documentation

associate(collection: str, refs: Iterable[lsst.daf.butler.core.datasets.ref.DatasetRef], *, recursive: bool = True)¶

Add existing Datasets to a collection, implicitly creating the collection if it does not already exist.

If a DatasetRef with exactly the same dataset_id is already in the collection, nothing is changed. If a DatasetRef with the same DatasetType and dimension values but a different dataset_id exists in the collection, ValueError is raised.

Parameters:

collection : str: Indicates the collection the Datasets should be associated with.
refs : iterable of DatasetRef: An iterable of resolved DatasetRef instances that already exist in this Registry.
recursive : bool, optional: If True, associate all component datasets as well. Note that this only associates components that are actually included in the given DatasetRef instances, which may not be the same as those in the database (especially if they were obtained from queryDatasets, which does not populate DatasetRef.components).

Raises:

ConflictingDefinitionError: If a Dataset with the given DatasetRef already exists in the given collection.
AmbiguousDatasetError: Raised if any(ref.id is None for ref in refs).
MissingCollectionError: Raised if collection does not exist in the registry.
TypeError: Raised if adding new datasets to the given collection is not allowed.
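The conflict rule described above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions, not how `Registry` stores associations: `associate_sketch`, the tuple stand-in for a resolved `DatasetRef`, and the instrument name are hypothetical, and `ValueError` stands in for whichever error the registry raises on conflict.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical stand-in for a resolved DatasetRef:
# (dataset_id, dataset type name, data ID as sorted key-value pairs).
Ref = Tuple[int, str, Tuple[Tuple[str, object], ...]]


def associate_sketch(collection: Dict[tuple, int], refs: Iterable[Ref]) -> None:
    """Add refs to an in-memory collection keyed by (dataset type, data ID)."""
    for dataset_id, dataset_type, data_id in refs:
        key = (dataset_type, data_id)
        existing = collection.get(key)
        if existing is None:
            collection[key] = dataset_id
        elif existing != dataset_id:
            # Same dataset type and data ID, different dataset_id: conflict.
            raise ValueError(f"{key} already maps to dataset_id={existing}")
        # existing == dataset_id: already associated, nothing changes.


tagged: Dict[tuple, int] = {}
data_id = (("detector", 1), ("instrument", "HypotheticalCam"))
associate_sketch(tagged, [(1, "raw", data_id)])
associate_sketch(tagged, [(1, "raw", data_id)])  # same dataset_id: no-op
try:
    associate_sketch(tagged, [(2, "raw", data_id)])  # different dataset_id
    conflict_detected = False
except ValueError:
    conflict_detected = True
```

The key point is that a collection holds at most one dataset per combination of dataset type and data ID.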

attachComponent(name: str, parent: lsst.daf.butler.core.datasets.ref.DatasetRef, component: lsst.daf.butler.core.datasets.ref.DatasetRef)¶

Attach a component to a dataset.

Parameters:	name : `str` Name of the component. parent : `DatasetRef` A reference to the parent dataset. Will be updated to reference the component. component : `DatasetRef` A reference to the component dataset.
Raises:	AmbiguousDatasetError Raised if `parent.id` or `component.id` is `None`.

deleteOpaqueData(tableName: str, **where)¶

Remove records from an opaque table.

Parameters:	tableName : `str` Logical name of the opaque table. Must match the name used in a previous call to `registerOpaqueTable`. where Additional keyword arguments are interpreted as equality constraints that restrict the deleted rows (combined with AND); keyword arguments are column names and values are the values they must have.

disassociate(collection: str, refs: Iterable[lsst.daf.butler.core.datasets.ref.DatasetRef], *, recursive: bool = True)¶

Remove existing Datasets from a collection.

collection and ref combinations that are not currently associated are silently ignored.

Parameters:

collection : str: The collection the Datasets should no longer be associated with.
refs : iterable of DatasetRef: An iterable of resolved DatasetRef instances that already exist in this Registry.
recursive : bool, optional: If True, disassociate all component datasets as well. Note that this only disassociates components that are actually included in the given DatasetRef instances, which may not be the same as those in the database (especially if they were obtained from queryDatasets, which does not populate DatasetRef.components).

Raises:

AmbiguousDatasetError: Raised if any(ref.id is None for ref in refs).
MissingCollectionError: Raised if collection does not exist in the registry.
TypeError: Raised if removing datasets from the given collection is not allowed.

expandDataId(dataId: Union[lsst.daf.butler.core.dimensions.coordinate.DataCoordinate, Mapping[str, Any], None] = None, *, graph: Optional[lsst.daf.butler.core.dimensions.graph.DimensionGraph] = None, records: Optional[Mapping[lsst.daf.butler.core.dimensions.elements.DimensionElement, lsst.daf.butler.core.dimensions.records.DimensionRecord]] = None, **kwds)¶

Expand a dimension-based data ID to include additional information.

Parameters:

dataId : DataCoordinate or dict, optional: Data ID to be expanded; augmented and overridden by kwds.
graph : DimensionGraph, optional: Set of dimensions for the expanded ID. If None, the dimensions will be inferred from the keys of dataId and kwds. Dimensions that are in dataId or kwds but not in graph are silently ignored, providing a way to extract and expand a subset of a data ID.
records : mapping [DimensionElement, DimensionRecord], optional: Dimension record data to use before querying the database for that data.
**kwds: Additional keywords are treated like additional key-value pairs for dataId, extending and overriding it.

Returns:

expanded : ExpandedDataCoordinate: A data ID that includes full metadata for all of the dimensions it identifies.
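The way dataId, kwds, and graph combine can be sketched with plain dictionaries. This covers only the key handling described in the parameter list, not the attachment of dimension records, and it is not the library's implementation; `merge_data_id_keys` and the example values are hypothetical.

```python
from typing import Any, Iterable, Mapping, Optional


def merge_data_id_keys(
    data_id: Optional[Mapping[str, Any]] = None,
    graph: Optional[Iterable[str]] = None,
    **kwds: Any,
) -> dict:
    """Combine data ID keys as the parameter descriptions read (sketch only)."""
    merged = dict(data_id or {})
    merged.update(kwds)  # keyword arguments extend and override data_id
    if graph is not None:
        wanted = set(graph)
        # Keys outside the requested dimensions are dropped without error.
        merged = {k: v for k, v in merged.items() if k in wanted}
    return merged


full = merge_data_id_keys(
    {"instrument": "HypotheticalCam", "detector": 1}, detector=2, visit=42
)
subset = merge_data_id_keys(full, graph=["instrument", "visit"])
```

Here `detector=2` overrides the value in the mapping, and passing a smaller graph extracts a subset of the keys.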

fetchOpaqueData(tableName: str, **where) → Iterator[dict]¶

Retrieve records from an opaque table.

Parameters:	tableName : `str` Logical name of the opaque table. Must match the name used in a previous call to `registerOpaqueTable`. where Additional keyword arguments are interpreted as equality constraints that restrict the returned rows (combined with AND); keyword arguments are column names and values are the values they must have.
Yields:	row : `dict` A dictionary representing a single result row.
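The `where` keyword arguments act as equality constraints combined with AND. A minimal sketch of that filtering, using an in-memory list in place of a database table; `fetch_rows` and the column names are hypothetical and only illustrate the semantics.

```python
from typing import Any, Dict, Iterator, List


def fetch_rows(rows: List[Dict[str, Any]], **where: Any) -> Iterator[Dict[str, Any]]:
    """Yield rows whose columns equal every keyword value (constraints are ANDed)."""
    for row in rows:
        if all(col in row and row[col] == value for col, value in where.items()):
            yield row


table = [
    {"dataset_id": 1, "path": "a.fits", "formatter": "fits"},
    {"dataset_id": 2, "path": "b.fits", "formatter": "fits"},
    {"dataset_id": 2, "path": "b.json", "formatter": "json"},
]
matches = list(fetch_rows(table, dataset_id=2, formatter="fits"))
everything = list(fetch_rows(table))  # no constraints: every row is yielded
```

The same reading applies to `deleteOpaqueData`, where the matching rows are removed instead of returned.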

findDataset(datasetType: Union[lsst.daf.butler.core.datasets.type.DatasetType, str], dataId: Union[lsst.daf.butler.core.dimensions.coordinate.DataCoordinate, Mapping[str, Any], None] = None, *, collections: Any, **kwds) → Optional[lsst.daf.butler.core.datasets.ref.DatasetRef]¶

Find a dataset given its DatasetType and data ID.

This can be used to obtain a DatasetRef that permits the dataset to be read from a Datastore.

Parameters:

datasetType : `DatasetType` or `str`: A `DatasetType` or the name of one.
dataId : `dict` or `DataCoordinate`, optional: A `dict`-like object containing the `Dimension` links that identify the dataset within a collection.
collections: An expression that fully or partially identifies the collections to search for the dataset, such as a `str`, `re.Pattern`, or iterable thereof. `...` can be used to return all collections. See Collection expressions for more information.
**kwds: Additional keyword arguments passed to `DataCoordinate.standardize` to convert `dataId` to a true `DataCoordinate` or augment an existing one.

Returns:

ref : `DatasetRef`: A reference to the dataset, or `None` if no matching Dataset was found.

Raises:

LookupError: Raised if one or more data ID keys are missing.
MissingCollectionError: Raised if any of `collections` does not exist in the registry.

classmethod fromConfig(config: Union[ButlerConfig, RegistryConfig, Config, str], create: bool = False, butlerRoot: Optional[str] = None, writeable: bool = True) → Registry¶

Create Registry subclass instance from config.

Uses registry.cls from config to determine which subclass to instantiate.

Parameters:	config : `ButlerConfig`, `RegistryConfig`, `Config` or `str` Registry configuration create : `bool`, optional Assume empty Registry and create a new one. butlerRoot : `str`, optional Path to the repository root this `Registry` will manage. writeable : `bool`, optional If `True` (default) create a read-write connection to the database.
Returns:	registry : `Registry` (subclass) A new `Registry` subclass instance.
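Selecting a class from a configuration value such as `registry.cls` is commonly done by importing a fully qualified name at run time. The sketch below shows that general technique with the standard library. It is an assumption about the approach, not the code `fromConfig` runs; `load_class` and the `config` mapping are hypothetical, and a stdlib class stands in for a `Registry` subclass.

```python
import importlib


def load_class(dotted_name: str) -> type:
    """Import and return the class named by a fully qualified dotted path."""
    module_name, _, class_name = dotted_name.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# Hypothetical config mapping; collections.OrderedDict stands in for a
# Registry subclass so that the example runs without the LSST stack.
config = {"registry": {"cls": "collections.OrderedDict"}}
cls = load_class(config["registry"]["cls"])
instance = cls()
```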

getCollectionChain(parent: str) → lsst.daf.butler.registry.wildcards.CollectionSearch¶

Return the child collections in a CHAINED collection.

Parameters:	parent : `str` Name of the chained collection. Must have already been added via a call to `Registry.registerCollection`.
Returns:	children : `CollectionSearch` An object that defines the search path of the collection. See Collection expressions for more information.
Raises:	MissingCollectionError Raised if `parent` does not exist in the `Registry`. TypeError Raised if `parent` does not correspond to a `CHAINED` collection.

getCollectionType(name: str) → lsst.daf.butler.registry._collectionType.CollectionType¶

Return an enumeration value indicating the type of the given collection.

Parameters:	name : `str` The name of the collection.
Returns:	type : `CollectionType` Enum value indicating the type of this collection.
Raises:	MissingCollectionError Raised if no collection with the given name exists.

getDataset(id: int, datasetType: Optional[lsst.daf.butler.core.datasets.type.DatasetType] = None, dataId: Optional[lsst.daf.butler.core.dimensions.coordinate.DataCoordinate] = None) → Optional[lsst.daf.butler.core.datasets.ref.DatasetRef]¶

Retrieve a Dataset entry.

Parameters:

id : int: The unique identifier for the Dataset.
datasetType : DatasetType, optional: The DatasetType of the dataset to retrieve. This is used to short-circuit retrieving the DatasetType, so if provided, the caller is guaranteeing that it is what would have been retrieved.
dataId : DataCoordinate, optional: A Dimension-based identifier for the dataset within a collection, possibly containing additional metadata. This is used to short-circuit retrieving the dataId, so if provided, the caller is guaranteeing that it is what would have been retrieved.

Returns:

ref : DatasetRef: A ref to the Dataset, or None if no matching Dataset was found.

getDatasetLocations(ref: lsst.daf.butler.core.datasets.ref.DatasetRef) → Set[str]¶

Retrieve datastore locations for a given dataset.

Typically used by Datastore.

Parameters:	ref : `DatasetRef` A reference to the dataset for which to retrieve storage information.
Returns:	datastores : `set` of `str` All the matching datastores holding this dataset. Empty set if the dataset does not exist anywhere.
Raises:	AmbiguousDatasetError Raised if `ref.id` is `None`.

getDatasetType(name: str) → lsst.daf.butler.core.datasets.type.DatasetType¶

Get the DatasetType.

Parameters:	name : `str` Name of the type.
Returns:	type : `DatasetType` The `DatasetType` associated with the given name.
Raises:	KeyError Requested named DatasetType could not be found in registry.

insertDatasetLocations(datastoreName: str, refs: Iterable[lsst.daf.butler.core.datasets.ref.DatasetRef])¶

Record that a datastore holds the given datasets.

Typically used by Datastore.

Parameters:	datastoreName : `str` Name of the datastore holding these datasets. refs : `Iterable` of `DatasetRef` References to the datasets.
Raises:	AmbiguousDatasetError Raised if `any(ref.id is None for ref in refs)`.

insertDatasets(datasetType: Union[DatasetType, str], dataIds: Iterable[DataId], run: str, *, producer: Optional[Quantum] = None, recursive: bool = False) → List[DatasetRef]¶

Insert one or more datasets into the Registry.

This always adds new datasets; to associate existing datasets with a new collection, use associate.

Parameters:	datasetType : `DatasetType` or `str` A `DatasetType` or the name of one. dataIds : `Iterable` of `dict` or `DataCoordinate` Dimension-based identifiers for the new datasets. run : `str` The name of the run that produced the datasets. producer : `Quantum` Unit of work that produced the datasets. May be `None` to store no provenance information, but if present the `Quantum` must already have been added to the Registry. recursive : `bool` If True, recursively add datasets and attach entries for component datasets as well.
Returns:	refs : `list` of `DatasetRef` Resolved `DatasetRef` instances for all given data IDs (in the same order).
Raises:	ConflictingDefinitionError If a dataset with the same dataset type and data ID as one of those given already exists in the given collection. MissingCollectionError Raised if `run` does not exist in the registry.

insertDimensionData(element: Union[lsst.daf.butler.core.dimensions.elements.DimensionElement, str], *data, conform: bool = True)¶

Insert one or more dimension records into the database.

Parameters:

element : DimensionElement or str: The DimensionElement or name thereof that identifies the table records will be inserted into.
data : dict or DimensionRecord (variadic): One or more records to insert.
conform : bool, optional: If False (True is default), perform no checking or conversions, and assume that element is a DimensionElement instance and data is one or more DimensionRecord instances of the appropriate subclass.

insertOpaqueData(tableName: str, *data)¶

Insert records into an opaque table.

Parameters:	tableName : `str` Logical name of the opaque table. Must match the name used in a previous call to `registerOpaqueTable`. data Each additional positional argument is a dictionary that represents a single row to be added.

isWriteable() → bool¶: Return True if this registry allows write operations, and False otherwise.

makeQueryBuilder(summary: lsst.daf.butler.registry.queries._structs.QuerySummary) → lsst.daf.butler.registry.queries._builder.QueryBuilder¶

Return a QueryBuilder instance capable of constructing and managing more complex queries than those obtainable via Registry interfaces.

This is an advanced interface; downstream code should prefer Registry.queryDimensions and Registry.queryDatasets whenever those are sufficient.

Parameters:	summary : `QuerySummary` Object describing and categorizing the full set of dimensions that will be included in the query.
Returns:	builder : `QueryBuilder` Object that can be used to construct and perform advanced queries.

queryCollections(expression: Any = Ellipsis, datasetType: Optional[lsst.daf.butler.core.datasets.type.DatasetType] = None, collectionType: Optional[lsst.daf.butler.registry._collectionType.CollectionType] = None, flattenChains: bool = False, includeChains: Optional[bool] = None) → Iterator[str]¶

Iterate over the collections whose names match an expression.

Parameters:

expression : Any, optional: An expression that fully or partially identifies the collections to return, such as a str, re.Pattern, or iterable thereof. `...` can be used to return all collections, and is the default. See Collection expressions for more information.
datasetType : DatasetType, optional: If provided, only yield collections that should be searched for this dataset type according to expression. If this is not provided, any dataset type restrictions in expression are ignored.
collectionType : CollectionType, optional: If provided, only yield collections of this type.
flattenChains : bool, optional: If True (False is default), recursively yield the child collections of matching CHAINED collections.
includeChains : bool, optional: If True, yield records for matching CHAINED collections. Default is the opposite of flattenChains: include either CHAINED collections or their children, but not both.

Yields:

collection : str: The name of a collection that matches expression.
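The interaction between flattenChains and includeChains, including the default for includeChains, can be sketched with an in-memory model. The `COLLECTIONS` mapping, the collection names, and `iter_collections` are hypothetical; the real method also handles expression matching and the datasetType and collectionType filters, which are omitted here.

```python
from typing import Dict, Iterator, List, Optional

# Hypothetical model: CHAINED collections map to an ordered list of children;
# every other collection maps to None.
COLLECTIONS: Dict[str, Optional[List[str]]] = {
    "raw/all": None,
    "calib/v1": None,
    "defaults": ["raw/all", "calib/v1"],
}


def iter_collections(
    names: List[str],
    flattenChains: bool = False,
    includeChains: Optional[bool] = None,
) -> Iterator[str]:
    """Yield collection names, optionally expanding CHAINED collections."""
    if includeChains is None:
        # Default is the opposite of flattenChains.
        includeChains = not flattenChains
    for name in names:
        children = COLLECTIONS[name]
        if children is None:
            yield name
            continue
        if includeChains:
            yield name
        if flattenChains:
            yield from iter_collections(children, flattenChains, includeChains)


default_result = list(iter_collections(["defaults"]))
flattened = list(iter_collections(["defaults"], flattenChains=True))
both = list(iter_collections(["defaults"], flattenChains=True, includeChains=True))
```

With the defaults, a chain is reported as itself; with flattenChains alone, only its children appear; setting both yields the chain followed by its children.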

queryDatasetTypes(expression: Any = Ellipsis) → Iterator[lsst.daf.butler.core.datasets.type.DatasetType]¶

Iterate over the dataset types whose names match an expression.

Parameters:	expression : `Any`, optional An expression that fully or partially identifies the dataset types to return, such as a `str`, `re.Pattern`, or iterable thereof. `...` can be used to return all dataset types, and is the default. See DatasetType expressions for more information.
Yields:	datasetType : `DatasetType` A `DatasetType` instance whose name matches `expression`.

queryDatasets(datasetType: Any, *, collections: Any, dimensions: Optional[Iterable[Union[lsst.daf.butler.core.dimensions.elements.Dimension, str]]] = None, dataId: Union[lsst.daf.butler.core.dimensions.coordinate.DataCoordinate, Mapping[str, Any], None] = None, where: Optional[str] = None, deduplicate: bool = False, expand: bool = True, **kwds) → Iterator[lsst.daf.butler.core.datasets.ref.DatasetRef]¶

Query for and iterate over dataset references matching user-provided criteria.

Parameters:

datasetType: An expression that fully or partially identifies the dataset types to be queried. Allowed types include `DatasetType`, `str`, `re.Pattern`, and iterables thereof. The special value `...` can be used to query all dataset types. See DatasetType expressions for more information.
collections: An expression that fully or partially identifies the collections to search for datasets, such as a `str`, `re.Pattern`, or iterable thereof. `...` can be used to return all collections. See Collection expressions for more information.
dimensions : `Iterable` of `Dimension` or `str`, optional: Dimensions to include in the query (in addition to those used to identify the queried dataset type(s)), either to constrain the resulting datasets to those for which a matching dimension exists, or to relate the dataset type’s dimensions to dimensions referenced by the `dataId` or `where` arguments.
dataId : `dict` or `DataCoordinate`, optional: A data ID whose key-value pairs are used as equality constraints in the query.
where : `str`, optional: A string expression similar to a SQL WHERE clause. May involve any column of a dimension table or (as a shortcut for the primary key column of a dimension table) dimension name. See Dimension expressions for more information.
deduplicate : `bool`, optional: If `True` (`False` is default), for each result data ID, only yield one `DatasetRef` of each `DatasetType`, from the first collection in which a dataset of that dataset type appears (according to the order of `collections` passed in). If `True`, `collections` must not contain regular expressions and may not be `...`.
expand : `bool`, optional: If `True` (default), attach `ExpandedDataCoordinate` instead of minimal `DataCoordinate` base-class instances.
kwds: Additional keyword arguments are forwarded to `DataCoordinate.standardize` when processing the `dataId` argument (and may be used to provide a constraining data ID even when the `dataId` argument is `None`).

Yields:

ref : `DatasetRef`: Dataset references matching the given query criteria. These are grouped by `DatasetType` if the query evaluates to multiple dataset types, but order is otherwise unspecified.

Raises:

TypeError: Raised when the arguments are incompatible, such as when a collection wildcard is passed when `deduplicate` is `True`.

Notes

When multiple dataset types are queried in a single call, the results of this operation are equivalent to querying for each dataset type separately in turn, and no information about the relationships between datasets of different types is included. In contexts where that kind of information is important, the recommended pattern is to use queryDimensions to first obtain data IDs (possibly with the desired dataset types and collections passed as constraints to the query), and then use multiple (generally much simpler) calls to queryDatasets with the returned data IDs passed as constraints.
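The deduplicate option keeps, for each dataset type and data ID, the dataset from the earliest collection in the search order. A rough in-memory sketch of that selection rule follows. The row tuples, collection names, and `deduplicate_sketch` are hypothetical, and the real query is not claimed to work this way internally.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical rows: (collection, dataset type name, data ID, dataset_id).
Row = Tuple[str, str, Tuple[Tuple[str, object], ...], int]


def deduplicate_sketch(rows: List[Row], collections: List[str]) -> Iterator[Row]:
    """Keep one row per (dataset type, data ID), preferring earlier collections."""
    rank = {name: i for i, name in enumerate(collections)}
    best: Dict[tuple, Row] = {}
    for row in rows:
        collection, dataset_type, data_id, _ = row
        if collection not in rank:
            continue  # not part of the search path
        key = (dataset_type, data_id)
        if key not in best or rank[collection] < rank[best[key][0]]:
            best[key] = row
    yield from best.values()


rows = [
    ("run/old", "calexp", (("visit", 1),), 10),
    ("run/new", "calexp", (("visit", 1),), 20),
    ("run/old", "calexp", (("visit", 2),), 11),
]
# "run/new" is searched first, so it wins for visit 1; visit 2 exists only
# in "run/old" and is kept from there.
kept = sorted(deduplicate_sketch(rows, ["run/new", "run/old"]))
```

This is why an ordered, explicit list of collections is required when deduplicate is `True`: a wildcard or regular expression has no defined search order.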

queryDimensions(dimensions: Union[Iterable[Union[lsst.daf.butler.core.dimensions.elements.Dimension, str]], lsst.daf.butler.core.dimensions.elements.Dimension, str], *, dataId: Union[lsst.daf.butler.core.dimensions.coordinate.DataCoordinate, Mapping[str, Any], None] = None, datasets: Optional[Any] = None, collections: Optional[Any] = None, where: Optional[str] = None, expand: bool = True, **kwds) → Iterator[lsst.daf.butler.core.dimensions.coordinate.DataCoordinate]¶

Query for and iterate over data IDs matching user-provided criteria.

Parameters:

dimensions : Dimension or str, or iterable thereof: The dimensions of the data IDs to yield, as either Dimension instances or str. Will be automatically expanded to a complete DimensionGraph.
dataId : dict or DataCoordinate, optional: A data ID whose key-value pairs are used as equality constraints in the query.
datasets : Any, optional: An expression that fully or partially identifies dataset types that should constrain the yielded data IDs. For example, including “raw” here would constrain the yielded instrument, exposure, detector, and physical_filter values to only those for which at least one “raw” dataset exists in collections. Allowed types include DatasetType, str, re.Pattern, and iterables thereof. Unlike other dataset type expressions, `...` is not permitted; it doesn’t make sense to constrain data IDs on the existence of all datasets. See DatasetType expressions for more information.
collections : Any, optional: An expression that fully or partially identifies the collections to search for datasets, such as a str, re.Pattern, or iterable thereof. `...` can be used to return all collections. Must be provided if datasets is provided, and is ignored otherwise. See Collection expressions for more information.
where : str, optional: A string expression similar to a SQL WHERE clause. May involve any column of a dimension table or (as a shortcut for the primary key column of a dimension table) dimension name. See Dimension expressions for more information.
expand : bool, optional: If True (default) yield ExpandedDataCoordinate instead of minimal DataCoordinate base-class instances.
kwds: Additional keyword arguments are forwarded to DataCoordinate.standardize when processing the dataId argument (and may be used to provide a constraining data ID even when the dataId argument is None).

Yields:

dataId : DataCoordinate: Data IDs matching the given query parameters. Order is unspecified.

registerCollection(name: str, type: lsst.daf.butler.registry._collectionType.CollectionType = <CollectionType.TAGGED: 2>)¶

Add a new collection if one with the given name does not exist.

Parameters:	name : `str` The name of the collection to create. type : `CollectionType` Enum value indicating the type of collection to create.

Notes

This method cannot be called within a transaction, because it must perform its own transaction in order to be safe under concurrent use.

registerDatasetType(datasetType: lsst.daf.butler.core.datasets.type.DatasetType) → bool¶

Add a new DatasetType to the Registry.

It is not an error to register the same DatasetType twice.

Parameters:	datasetType : `DatasetType` The `DatasetType` to be added.
Returns:	inserted : `bool` `True` if `datasetType` was inserted, `False` if an identical existing `DatasetType` was found. Note that in either case the DatasetType is guaranteed to be defined in the Registry consistently with the given definition.
Raises:	ValueError Raised if the dimensions or storage class are invalid. ConflictingDefinitionError Raised if this DatasetType is already registered with a different definition.
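The return value and the conflict case can be summarized with a small in-memory sketch. It only mirrors the contract described above: `register_sketch`, the `(dimensions, storage class)` tuple, and `ConflictSketchError` (a stand-in for `ConflictingDefinitionError`) are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical definition: (dimension names, storage class name).
Definition = Tuple[Tuple[str, ...], str]


class ConflictSketchError(Exception):
    """Stand-in for ConflictingDefinitionError in this sketch."""


def register_sketch(known: Dict[str, Definition], name: str, definition: Definition) -> bool:
    """Return True if newly inserted, False if an identical definition exists."""
    existing = known.get(name)
    if existing is None:
        known[name] = definition
        return True
    if existing != definition:
        raise ConflictSketchError(f"{name!r} is already registered differently")
    return False


known: Dict[str, Definition] = {}
first = register_sketch(known, "calexp", (("instrument", "visit"), "ExposureF"))
second = register_sketch(known, "calexp", (("instrument", "visit"), "ExposureF"))
try:
    register_sketch(known, "calexp", (("instrument",), "ExposureF"))
    conflicted = False
except ConflictSketchError:
    conflicted = True
```

Registering twice with the same definition is harmless, so callers can register unconditionally and use the boolean only when they need to know whether anything changed.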

registerOpaqueTable(tableName: str, spec: lsst.daf.butler.core.ddl.TableSpec)¶

Add an opaque (to the Registry) table for use by a Datastore or other data repository client.

Opaque table records can be added via insertOpaqueData, retrieved via fetchOpaqueData, and removed via deleteOpaqueData.

Parameters:	tableName : `str` Logical name of the opaque table. This may differ from the actual name used in the database by a prefix and/or suffix. spec : `ddl.TableSpec` Specification for the table to be added.

registerRun(name: str)¶

Add a new run if one with the given name does not exist.

Parameters:	name : `str` The name of the run to create.

Notes

This method cannot be called within a transaction, because it must perform its own transaction in order to be safe under concurrent use.

removeDataset(ref: lsst.daf.butler.core.datasets.ref.DatasetRef)¶

Remove a dataset from the Registry.

The dataset and all components will be removed unconditionally from all collections, and any associated Quantum records will also be removed. Datastore records will not be deleted; the caller is responsible for ensuring that the dataset has already been removed from all Datastores.

Parameters:	ref : `DatasetRef` Reference to the dataset to be removed. Must include a valid `id` attribute, and should be considered invalidated upon return.
Raises:	AmbiguousDatasetError Raised if `ref.id` is `None`. OrphanedRecordError Raised if the dataset is still present in any `Datastore`.
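The ordering requirement (clear the datastore locations first, then remove the dataset) can be illustrated with an in-memory sketch. The names below are hypothetical: `remove_dataset_sketch` models only the check described above, and `OrphanedRecordSketchError` stands in for `OrphanedRecordError`.

```python
from typing import Dict, Set


class OrphanedRecordSketchError(Exception):
    """Stand-in for OrphanedRecordError in this sketch."""


def remove_dataset_sketch(
    datasets: Set[int], locations: Dict[int, Set[str]], dataset_id: int
) -> None:
    """Remove a dataset only if no datastore still claims to hold it."""
    if locations.get(dataset_id):
        raise OrphanedRecordSketchError(
            f"dataset {dataset_id} still in {sorted(locations[dataset_id])}"
        )
    datasets.discard(dataset_id)
    locations.pop(dataset_id, None)


datasets = {1, 2}
locations = {1: {"HypotheticalDatastore"}, 2: set()}
remove_dataset_sketch(datasets, locations, 2)  # no locations recorded: removed
try:
    remove_dataset_sketch(datasets, locations, 1)  # still held by a datastore
    blocked = False
except OrphanedRecordSketchError:
    blocked = True
# The caller clears the datastore location first, then removes the dataset.
locations[1].discard("HypotheticalDatastore")
remove_dataset_sketch(datasets, locations, 1)
```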

removeDatasetLocation(datastoreName, ref)¶

Remove datastore location associated with this dataset.

Typically used by Datastore when a dataset is removed.

Parameters:	datastoreName : `str` Name of this `Datastore`. ref : `DatasetRef` A reference to the dataset for which information is to be removed.
Raises:	AmbiguousDatasetError Raised if `ref.id` is `None`.

setCollectionChain(parent: str, children: Any)¶

Define or redefine a CHAINED collection.

Parameters:

parent : str: Name of the chained collection. Must have already been added via a call to Registry.registerCollection.
children : Any: An expression defining an ordered search of child collections, generally an iterable of str. Restrictions on the dataset types to be searched can also be included, by passing mapping or an iterable containing tuples; see Collection expressions for more information.

Raises:

MissingCollectionError: Raised when any of the given collections do not exist in the Registry.
TypeError: Raised if parent does not correspond to a CHAINED collection.
ValueError: Raised if the given collections contain a cycle.
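A cycle arises when a chained collection would become reachable from itself through its children. One way to detect this is a depth-first walk over the existing chains, sketched below. This is an assumed approach for illustration, not the check the registry performs; `would_create_cycle` and the `chains` mapping are hypothetical.

```python
from typing import Dict, List


def would_create_cycle(chains: Dict[str, List[str]], parent: str, children: List[str]) -> bool:
    """Return True if giving `parent` these children makes it reachable from itself."""
    stack = list(children)
    seen = set()
    while stack:
        name = stack.pop()
        if name == parent:
            return True
        if name in seen:
            continue
        seen.add(name)
        # Collections absent from `chains` are not CHAINED and have no children.
        stack.extend(chains.get(name, []))
    return False


# Existing chains: "a" searches "b", and "b" searches "c".
chains = {"a": ["b"], "b": ["c"]}
cyclic = would_create_cycle(chains, "c", ["a"])      # c -> a -> b -> c
acyclic = would_create_cycle(chains, "c", ["other"])
self_ref = would_create_cycle(chains, "a", ["a"])
```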

transaction()¶: Return a context manager that represents a transaction.
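This page does not spell out the transaction semantics. The sketch below assumes the common convention that changes are kept when the `with` block exits normally and rolled back when it raises. `SketchStore` is a hypothetical in-memory stand-in, not a `Registry`, and it shows only the shape of the calling code.

```python
import contextlib
from typing import Dict, Iterator


class SketchStore:
    """Hypothetical in-memory store with a transaction context manager."""

    def __init__(self) -> None:
        self.rows: Dict[str, int] = {}

    @contextlib.contextmanager
    def transaction(self) -> Iterator[None]:
        snapshot = dict(self.rows)  # state to restore on failure
        try:
            yield
        except BaseException:
            self.rows = snapshot  # assumed roll-back-on-exception behavior
            raise


store = SketchStore()
with store.transaction():
    store.rows["a"] = 1  # block exits normally: change is kept

try:
    with store.transaction():
        store.rows["b"] = 2
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass  # the change to "b" was rolled back
```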
