SqlRegistry¶

class lsst.daf.butler.registry.SqlRegistry(database: Database, defaults: RegistryDefaults, managers: RegistryManagerInstances)¶

Bases: lsst.daf.butler.registry.Registry

Registry implementation based on SQLAlchemy.

Parameters

databaseDatabase: Database instance to store Registry.
defaultsRegistryDefaults: Default collection search path and/or output RUN collection.
managersRegistryManagerInstances: All the managers required for this registry.

Attributes Summary

`defaultConfigFile`	Path to configuration defaults.
`defaults`	Default collection search path and/or output `RUN` collection (`RegistryDefaults`).
`dimensions`	All dimensions recognized by this `Registry` (`DimensionUniverse`).

Methods Summary

`associate`(collection, refs)	Add existing datasets to a `TAGGED` collection.
`certify`(collection, refs, timespan)	Associate one or more datasets with a calibration collection and a validity range within it.
`copy`([defaults])	Create a new `Registry` backed by the same data repository and connection as this one, but independent defaults.
`createFromConfig`([config, dimensionConfig, …])	Create registry database and return `SqlRegistry` instance.
`decertify`(collection, datasetType, timespan, *)	Remove or adjust datasets to clear a validity range within a calibration collection.
`deleteOpaqueData`(tableName, **where)	Remove records from an opaque table.
`determineTrampoline`(config)	Return class to use to instantiate real registry.
`disassociate`(collection, refs)	Remove existing datasets from a `TAGGED` collection.
`expandDataId`([dataId, graph, records, …])	Expand a dimension-based data ID to include additional information.
`fetchOpaqueData`(tableName, **where)	Retrieve records from an opaque table.
`findDataset`(datasetType[, dataId, …])	Find a dataset given its `DatasetType` and data ID.
`forceRegistryConfig`(config)	Force the supplied config to a `RegistryConfig`.
`fromConfig`(config[, butlerRoot, writeable, …])	Create `Registry` subclass instance from `config`.
`getCollectionChain`(parent)	Return the child collections in a `CHAINED` collection.
`getCollectionDocumentation`(collection)	Retrieve the documentation string for a collection.
`getCollectionSummary`(collection)	Return a summary for the given collection.
`getCollectionType`(name)	Return an enumeration value indicating the type of the given collection.
`getDataset`(id)	Retrieve a Dataset entry.
`getDatasetLocations`(ref)	Retrieve datastore locations for a given dataset.
`getDatasetType`(name)	Get the `DatasetType`.
`getDatastoreBridgeManager`()	Return an object that allows a new `Datastore` instance to communicate with this `Registry`.
`insertDatasets`(datasetType, dataIds[, run, …])	Insert one or more datasets into the `Registry`.
`insertDimensionData`(element, *data[, conform])	Insert one or more dimension records into the database.
`insertOpaqueData`(tableName, *data)	Insert records into an opaque table.
`isWriteable`()	Return `True` if this registry allows write operations, and `False` otherwise.
`makeQueryBuilder`(summary)	Return a `QueryBuilder` instance capable of constructing and managing more complex queries than those obtainable via `Registry` interfaces.
`queryCollections`([expression, datasetType, …])	Iterate over the collections whose names match an expression.
`queryDataIds`(dimensions, *[, dataId, …])	Query for data IDs matching user-provided criteria.
`queryDatasetAssociations`(datasetType[, …])	Iterate over dataset-collection combinations where the dataset is in the collection.
`queryDatasetTypes`([expression, components])	Iterate over the dataset types whose names match an expression.
`queryDatasets`(datasetType, *[, collections, …])	Query for and iterate over dataset references matching user-provided criteria.
`queryDimensionRecords`(element, *[, dataId, …])	Query for dimension information matching user-provided criteria.
`refresh`()	Refresh all in-memory state by querying the database.
`registerCollection`(name[, type, doc])	Add a new collection if one with the given name does not exist.
`registerDatasetType`(datasetType)	Add a new `DatasetType` to the Registry.
`registerOpaqueTable`(tableName, spec)	Add an opaque (to the `Registry`) table for use by a `Datastore` or other data repository client.
`registerRun`(name[, doc])	Add a new run if one with the given name does not exist.
`removeCollection`(name)	Completely remove the given collection.
`removeDatasetType`(name)	Remove the named `DatasetType` from the registry.
`removeDatasets`(refs)	Remove datasets from the Registry.
`resetConnectionPool`()	Reset SQLAlchemy connection pool for `SqlRegistry` database.
`setCollectionChain`(parent, children, *[, …])	Define or redefine a `CHAINED` collection.
`setCollectionDocumentation`(collection, doc)	Set the documentation string for a collection.
`syncDimensionData`(element, row[, conform])	Synchronize the given dimension record with the database, inserting if it does not already exist and comparing values if it does.
`transaction`(*[, savepoint])	Return a context manager that represents a transaction.

Attributes Documentation

defaultConfigFile: Optional[str] = None¶: Path to configuration defaults. Accessed within the configs resource or relative to a search path. Can be None if no defaults specified.

defaults¶

Default collection search path and/or output RUN collection (RegistryDefaults).

This is an immutable struct whose components may not be set individually, but the entire struct can be set by assigning to this property.
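The "replace the whole struct" pattern described above can be sketched with a frozen dataclass. `ToyDefaults` and `ToyRegistry` are hypothetical stand-ins written for this example, and their fields are illustrative; they are not the real `RegistryDefaults` or `SqlRegistry` classes.

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class ToyDefaults:
    """Hypothetical stand-in for RegistryDefaults; fields are illustrative."""

    collections: Tuple[str, ...] = ()
    run: Optional[str] = None


class ToyRegistry:
    """Hypothetical stand-in that exposes a ``defaults`` property."""

    def __init__(self, defaults: ToyDefaults) -> None:
        self._defaults = defaults

    @property
    def defaults(self) -> ToyDefaults:
        return self._defaults

    @defaults.setter
    def defaults(self, value: ToyDefaults) -> None:
        self._defaults = value  # the whole struct is swapped at once


registry = ToyRegistry(ToyDefaults(collections=("shared/base",)))
try:
    registry.defaults.run = "u/me/run1"  # setting one component fails
except FrozenInstanceError:
    # Build a modified copy and assign the entire struct instead.
    registry.defaults = replace(registry.defaults, run="u/me/run1")
```

The collection names are made up; the point is only that individual components are read-only while the property itself is assignable.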

dimensions¶

All dimensions recognized by this Registry (DimensionUniverse).

Methods Documentation

associate(collection: str, refs: Iterable[lsst.daf.butler.DatasetRef]) → None ¶

Add existing datasets to a TAGGED collection.

If a DatasetRef with exactly the same ID is already in the collection, nothing is changed. If a DatasetRef with the same DatasetType and data ID but a different ID exists in the collection, ConflictingDefinitionError is raised.

Parameters

collectionstr: Indicates the collection the datasets should be associated with.
refsIterable [ DatasetRef ]: An iterable of resolved DatasetRef instances that already exist in this Registry.

Raises

ConflictingDefinitionError: If a Dataset with the given DatasetRef already exists in the given collection.
AmbiguousDatasetError: Raised if any(ref.id is None for ref in refs).
MissingCollectionError: Raised if collection does not exist in the registry.
TypeError: Raised if adding new datasets to the given collection is not allowed.
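The conflict rule for `associate` can be illustrated with a small in-memory model. This is a sketch only, not how SqlRegistry stores tags: the `Ref` tuple, the `tag` function, and the local `ConflictingDefinitionError` class are stand-ins invented for the example.

```python
from typing import Dict, Iterable, NamedTuple, Tuple


class ConflictingDefinitionError(Exception):
    """Local stand-in for the registry exception of the same name."""


class Ref(NamedTuple):
    """Hypothetical, simplified stand-in for a resolved DatasetRef."""

    id: int
    dataset_type: str
    data_id: Tuple[Tuple[str, object], ...]


def tag(collection: Dict[Tuple[str, tuple], int], refs: Iterable[Ref]) -> None:
    """Add refs to a toy TAGGED collection keyed by (dataset type, data ID)."""
    for ref in refs:
        key = (ref.dataset_type, ref.data_id)
        existing = collection.get(key)
        if existing is None:
            collection[key] = ref.id  # new association
        elif existing != ref.id:
            # Same dataset type and data ID, but a different dataset.
            raise ConflictingDefinitionError(
                f"{key} is already tagged with dataset {existing}"
            )
        # existing == ref.id: already associated, so nothing changes


tagged: Dict[Tuple[str, tuple], int] = {}
bias = Ref(1, "bias", (("detector", 10),))
tag(tagged, [bias])
tag(tagged, [bias])  # repeating the same ref is a no-op
```

Tagging a second dataset with a different ID but the same dataset type and data ID would raise the stand-in error in this model.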

certify(collection: str, refs: Iterable[lsst.daf.butler.DatasetRef], timespan: lsst.daf.butler.Timespan) → None ¶

Associate one or more datasets with a calibration collection and a validity range within it.

Parameters

collectionstr: The name of an already-registered CALIBRATION collection.
refsIterable [ DatasetRef ]: Datasets to be associated.
timespanTimespan: The validity range for these datasets within the collection.

Raises

AmbiguousDatasetError: Raised if any of the given DatasetRef instances is unresolved.
ConflictingDefinitionError: Raised if the collection already contains a different dataset with the same DatasetType and data ID and an overlapping validity range.
TypeError: Raised if collection is not a CALIBRATION collection or if one or more datasets are of a dataset type for which DatasetType.isCalibration returns False.

copy(defaults: Optional[lsst.daf.butler.registry.RegistryDefaults] = None) → lsst.daf.butler.registry.Registry¶

Create a new Registry backed by the same data repository and connection as this one, but independent defaults.

Parameters

defaultsRegistryDefaults, optional: Default collections and data ID values for the new registry. If not provided, self.defaults will be used (but future changes to either registry’s defaults will not affect the other).

Returns

copyRegistry: A new Registry instance with its own defaults.

Notes

Because the new registry shares a connection with the original, they also share transaction state (despite the fact that their transaction context manager methods do not reflect this), and must be used with care.

classmethod createFromConfig(config: Optional[Union[lsst.daf.butler.registry.RegistryConfig, str]] = None, dimensionConfig: Optional[Union[lsst.daf.butler.DimensionConfig, str]] = None, butlerRoot: Optional[str] = None) → lsst.daf.butler.registry.Registry¶

Create registry database and return SqlRegistry instance.

This method initializes the database contents; the database must be empty prior to calling this method.

Parameters

configRegistryConfig or str, optional: Registry configuration. If missing, the default configuration will be loaded from registry.yaml.
dimensionConfigDimensionConfig or str, optional: Dimensions configuration. If missing, the default configuration will be loaded from dimensions.yaml.
butlerRootstr, optional: Path to the repository root this SqlRegistry will manage.

Returns

registrySqlRegistry: A new SqlRegistry instance.

decertify(collection: str, datasetType: Union[str, lsst.daf.butler.DatasetType], timespan: lsst.daf.butler.Timespan, *, dataIds: Optional[Iterable[Union[lsst.daf.butler.DataCoordinate, Mapping[str, Any]]]] = None) → None ¶

Remove or adjust datasets to clear a validity range within a calibration collection.

Parameters

collectionstr: The name of an already-registered CALIBRATION collection.
datasetTypestr or DatasetType: Name or DatasetType instance for the datasets to be decertified.
timespanTimespan: The validity range to remove datasets from within the collection. Datasets that overlap this range but are not contained by it will have their validity ranges adjusted to not overlap it, which may split a single dataset validity range into two.
dataIdsIterable [ DataId ], optional: Data IDs that should be decertified within the given validity range. If None, all data IDs for self.datasetType will be decertified.

Raises

TypeError: Raised if collection is not a CALIBRATION collection or if datasetType.isCalibration() is False.
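The way `decertify` trims or splits a validity range can be sketched with plain interval arithmetic. This is a toy model, not the registry's implementation: ranges are represented as half-open integer tuples `(begin, end)`, which is an assumption of the sketch, and `clear_range` is a name made up for it.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # half-open [begin, end); an assumption of this sketch


def clear_range(validity: Range, cleared: Range) -> List[Range]:
    """Return the parts of ``validity`` that remain after removing ``cleared``."""
    begin, end = validity
    c_begin, c_end = cleared
    if c_end <= begin or end <= c_begin:
        return [validity]  # no overlap: the range is left unchanged
    remaining = []
    if begin < c_begin:
        remaining.append((begin, c_begin))  # piece before the cleared range
    if c_end < end:
        remaining.append((c_end, end))  # piece after the cleared range
    return remaining


split = clear_range((0, 10), (3, 6))     # cleared range lies strictly inside
trimmed = clear_range((0, 10), (5, 20))  # cleared range overlaps the right edge
removed = clear_range((2, 4), (0, 10))   # validity range is fully contained
```

Here `split` holds two pieces, `trimmed` holds one shortened piece, and `removed` is empty, matching the three cases in the parameter description.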

deleteOpaqueData(tableName: str, **where: Any) → None ¶

Remove records from an opaque table.

Parameters

tableNamestr: Logical name of the opaque table. Must match the name used in a previous call to registerOpaqueTable.
where: Additional keyword arguments are interpreted as equality constraints that restrict the deleted rows (combined with AND); keyword arguments are column names and values are the values they must have.

classmethod determineTrampoline(config: Optional[Union[ButlerConfig, RegistryConfig, Config, str]]) → Tuple[Type[Registry], RegistryConfig]¶

Return class to use to instantiate real registry.

Parameters

configRegistryConfig or str, optional: Registry configuration. If missing, the default configuration will be loaded from registry.yaml.

Returns

requested_clstype of Registry: The real registry class to use.
registry_configRegistryConfig: The RegistryConfig to use.

disassociate(collection: str, refs: Iterable[lsst.daf.butler.DatasetRef]) → None ¶

Remove existing datasets from a TAGGED collection.

collection and ref combinations that are not currently associated are silently ignored.

Parameters

collectionstr: The collection the datasets should no longer be associated with.
refsIterable [ DatasetRef ]: An iterable of resolved DatasetRef instances that already exist in this Registry.

Raises

AmbiguousDatasetError: Raised if any of the given dataset references is unresolved.
MissingCollectionError: Raised if collection does not exist in the registry.
TypeError: Raised if adding new datasets to the given collection is not allowed.

expandDataId(dataId: Optional[Union[lsst.daf.butler.DataCoordinate, Mapping[str, Any]]] = None, *, graph: Optional[lsst.daf.butler.DimensionGraph] = None, records: Optional[Union[lsst.daf.butler.NamedKeyMapping[lsst.daf.butler.DimensionElement, Optional[lsst.daf.butler.DimensionRecord]], Mapping[str, Optional[lsst.daf.butler.DimensionRecord]]]] = None, withDefaults: bool = True, **kwargs: Any) → lsst.daf.butler.DataCoordinate¶

Expand a dimension-based data ID to include additional information.

Parameters

dataIdDataCoordinate or dict, optional: Data ID to be expanded; augmented and overridden by kwargs.
graphDimensionGraph, optional: Set of dimensions for the expanded ID. If None, the dimensions will be inferred from the keys of dataId and kwargs. Dimensions that are in dataId or kwargs but not in graph are silently ignored, providing a way to extract and expand a subset of a data ID.
recordsMapping [str, DimensionRecord], optional: Dimension record data to use before querying the database for that data, keyed by element name.
withDefaultsbool, optional: Utilize self.defaults.dataId to fill in missing governor dimension key-value pairs. Defaults to True (i.e. defaults are used).
**kwargs: Additional keywords are treated like additional key-value pairs for dataId, extending and overriding it.

Returns

expandedDataCoordinate: A data ID that includes full metadata for all of the dimensions it identifies, i.e. guarantees that expanded.hasRecords() and expanded.hasFull() both return True.
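How the `dataId`, `graph`, and keyword arguments of `expandDataId` combine can be sketched with plain dictionaries. This models only the key handling (override, then subset); it does not fetch dimension records, and `merge_data_id` is a hypothetical helper in which `graph` is simplified to an iterable of dimension names.

```python
from typing import Any, Dict, Iterable, Mapping, Optional


def merge_data_id(
    data_id: Optional[Mapping[str, Any]] = None,
    *,
    graph: Optional[Iterable[str]] = None,
    **kwargs: Any,
) -> Dict[str, Any]:
    """Combine a data ID with keyword overrides, then keep only ``graph`` keys."""
    merged = dict(data_id or {})
    merged.update(kwargs)  # keyword arguments extend and override dataId
    if graph is not None:
        wanted = set(graph)
        # Keys outside the requested dimensions are silently dropped.
        merged = {key: value for key, value in merged.items() if key in wanted}
    return merged


base = {"instrument": "HSC", "visit": 903334}
everything = merge_data_id(base, visit=903336, detector=16)
subset = merge_data_id(base, visit=903336, detector=16, graph=["instrument", "visit"])
```

In this model `everything` carries all three keys with the overridden `visit`, while `subset` drops `detector` because it is not in `graph`. The data ID values are illustrative.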

fetchOpaqueData(tableName: str, **where: Any) → Iterator[dict]¶

Retrieve records from an opaque table.

Parameters

tableNamestr: Logical name of the opaque table. Must match the name used in a previous call to registerOpaqueTable.
where: Additional keyword arguments are interpreted as equality constraints that restrict the returned rows (combined with AND); keyword arguments are column names and values are the values they must have.

Yields

rowdict: A dictionary representing a single result row.
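The `**where` keywords accepted by `fetchOpaqueData` and `deleteOpaqueData` are equality constraints combined with AND. A minimal in-memory sketch of that filtering follows; `fetch_rows` and the column names are invented for the example, and the real methods operate on a database table rather than a list.

```python
from typing import Any, Dict, Iterator, List


def fetch_rows(rows: List[Dict[str, Any]], **where: Any) -> Iterator[Dict[str, Any]]:
    """Yield rows for which every keyword equals the named column (AND)."""
    for row in rows:
        if all(column in row and row[column] == value for column, value in where.items()):
            yield row


rows = [
    {"dataset_id": 1, "component": "image", "path": "a.fits"},
    {"dataset_id": 1, "component": "mask", "path": "b.fits"},
    {"dataset_id": 2, "component": "image", "path": "c.fits"},
]
matches = list(fetch_rows(rows, dataset_id=1, component="image"))
```

With no keywords every row is yielded, since there are no constraints to fail.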

findDataset(datasetType: Union[lsst.daf.butler.DatasetType, str], dataId: Optional[Union[lsst.daf.butler.DataCoordinate, Mapping[str, Any]]] = None, *, collections: Optional[Any] = None, timespan: Optional[lsst.daf.butler.Timespan] = None, **kwargs: Any) → Optional[lsst.daf.butler.DatasetRef]¶

Find a dataset given its DatasetType and data ID.

This can be used to obtain a DatasetRef that permits the dataset to be read from a Datastore. If the dataset is a component and cannot be found using the provided dataset type, a dataset ref for the parent will be returned instead, but with the correct dataset type.

Parameters

datasetTypeDatasetType or str: A DatasetType or the name of one.
dataIddict or DataCoordinate, optional: A dict-like object containing the Dimension links that identify the dataset within a collection.
collectionsAny, optional: An expression that fully or partially identifies the collections to search for the dataset; see Collection expressions for more information. Defaults to self.defaults.collections.
timespanTimespan, optional: A timespan that the validity range of the dataset must overlap. If not provided, any CALIBRATION collections matched by the collections argument will not be searched.
**kwargs: Additional keyword arguments passed to DataCoordinate.standardize to convert dataId to a true DataCoordinate or augment an existing one.

Returns

refDatasetRef: A reference to the dataset, or None if no matching Dataset was found.

Raises

TypeError: Raised if collections is None and self.defaults.collections is None.
LookupError: Raised if one or more data ID keys are missing.
KeyError: Raised if the dataset type does not exist.
MissingCollectionError: Raised if any of collections does not exist in the registry.

Notes

This method simply returns None and does not raise an exception even when the set of collections searched is intrinsically incompatible with the dataset type, e.g. if datasetType.isCalibration() is False, but only CALIBRATION collections are being searched. This may make it harder to debug some lookup failures, but the behavior is intentional; we consider it more important that failed searches are reported consistently, regardless of the reason, and that adding additional collections that do not contain a match to the search path never changes the behavior.
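Because a failed search returns None rather than raising, callers usually need to check the result. The helper below is a sketch of that pattern using only the `findDataset` signature documented above; `find_or_report` is a hypothetical name, and `registry` is any object exposing that method.

```python
from typing import Any, Mapping, Optional, Sequence


def find_or_report(
    registry: Any,
    dataset_type: str,
    data_id: Mapping[str, Any],
    collections: Sequence[str],
) -> Optional[Any]:
    """Look up one dataset and report a miss instead of assuming a hit."""
    ref = registry.findDataset(dataset_type, data_id, collections=collections)
    if ref is None:
        # A miss looks the same whatever the cause (no such dataset, or
        # collections that could never hold this dataset type).
        print(f"no {dataset_type} for {dict(data_id)} in {list(collections)}")
    return ref
```

Missing collections, unknown dataset types, and incomplete data IDs still raise the exceptions listed under Raises; only an unsuccessful search yields None.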

classmethod forceRegistryConfig(config: Optional[Union[ButlerConfig, RegistryConfig, Config, str]]) → RegistryConfig ¶

Force the supplied config to a RegistryConfig.

Parameters

configRegistryConfig, Config or str or None: Registry configuration. If missing, the default configuration will be loaded from registry.yaml.

Returns

registry_configRegistryConfig: A registry config.

classmethod fromConfig(config: Union[ButlerConfig, RegistryConfig, Config, str], butlerRoot: Optional[Union[str, ButlerURI]] = None, writeable: bool = True, defaults: Optional[RegistryDefaults] = None) → Registry ¶

Create Registry subclass instance from config.

The registry database must be initialized prior to calling this method.

Parameters

configButlerConfig, RegistryConfig, Config or str: Registry configuration.
butlerRootstr or ButlerURI, optional: Path to the repository root this Registry will manage.
writeablebool, optional: If True (default) create a read-write connection to the database.
defaultsRegistryDefaults, optional: Default collection search path and/or output RUN collection.

Returns

registrySqlRegistry (subclass): A new SqlRegistry subclass instance.

getCollectionChain(parent: str) → lsst.daf.butler.registry.wildcards.CollectionSearch ¶

Return the child collections in a CHAINED collection.

Parameters

parentstr: Name of the chained collection. Must have already been added via a call to Registry.registerCollection.

Returns

childrenCollectionSearch: An object that defines the search path of the collection. See Collection expressions for more information.

Raises

MissingCollectionError: Raised if parent does not exist in the Registry.
TypeError: Raised if parent does not correspond to a CHAINED collection.

getCollectionDocumentation(collection: str) → Optional[str]¶

Retrieve the documentation string for a collection.

Parameters

collectionstr: Name of the collection.

Returns

docsstr or None: Docstring for the collection with the given name.

getCollectionSummary(collection: str) → lsst.daf.butler.registry.summaries.CollectionSummary¶

Return a summary for the given collection.

Parameters

collectionstr: Name of the collection for which a summary is to be retrieved.

Returns

summaryCollectionSummary: Summary of the dataset types and governor dimension values in this collection.

getCollectionType(name: str) → lsst.daf.butler.registry.CollectionType¶

Return an enumeration value indicating the type of the given collection.

Parameters

namestr: The name of the collection.

Returns

typeCollectionType: Enum value indicating the type of this collection.

Raises

MissingCollectionError: Raised if no collection with the given name exists.

getDataset(id: Union[int, uuid.UUID]) → Optional[lsst.daf.butler.DatasetRef]¶

Retrieve a Dataset entry.

Parameters

idDatasetId: The unique identifier for the dataset.

Returns

refDatasetRef or None: A ref to the Dataset, or None if no matching Dataset was found.

getDatasetLocations(ref: lsst.daf.butler.DatasetRef) → Iterable[str]¶

Retrieve datastore locations for a given dataset.

Parameters

refDatasetRef: A reference to the dataset for which to retrieve storage information.

Returns

datastoresIterable [ str ]: All the matching datastores holding this dataset.

Raises

AmbiguousDatasetError: Raised if ref.id is None.

getDatasetType(name: str) → lsst.daf.butler.DatasetType¶

Get the DatasetType.

Parameters

namestr: Name of the type.

Returns

typeDatasetType: The DatasetType associated with the given name.

Raises

KeyError: Requested named DatasetType could not be found in registry.

getDatastoreBridgeManager() → DatastoreRegistryBridgeManager¶

Return an object that allows a new Datastore instance to communicate with this Registry.

Returns

managerDatastoreRegistryBridgeManager: Object that mediates communication between this Registry and its associated datastores.

insertDatasets(datasetType: Union[lsst.daf.butler.core.datasets.type.DatasetType, str], dataIds: Iterable[Union[lsst.daf.butler.core.dimensions._coordinate.DataCoordinate, Mapping[str, Any]]], run: Optional[str] = None, expand: bool = True, idGenerationMode: lsst.daf.butler.registry.interfaces._datasets.DatasetIdGenEnum = <DatasetIdGenEnum.UNIQUE: 0>) → List[lsst.daf.butler.DatasetRef]¶

Insert one or more datasets into the Registry.

This always adds new datasets; to associate existing datasets with a new collection, use associate.

Parameters

datasetTypeDatasetType or str: A DatasetType or the name of one.
dataIdsIterable of dict or DataCoordinate: Dimension-based identifiers for the new datasets.
runstr, optional: The name of the run that produced the datasets. Defaults to self.defaults.run.
expandbool, optional: If True (default), expand data IDs as they are inserted. This is necessary in general to allow datastore to generate file templates, but it may be disabled if the caller can guarantee this is unnecessary.
idGenerationModeDatasetIdGenEnum, optional: Specifies option for generating dataset IDs. By default unique IDs are generated for each inserted dataset.

Returns

refslist of DatasetRef: Resolved DatasetRef instances for all given data IDs (in the same order).

Raises

TypeError: Raised if run is None and self.defaults.run is None.
ConflictingDefinitionError: If a dataset with the same dataset type and data ID as one of those given already exists in run.
MissingCollectionError: Raised if run does not exist in the registry.

insertDimensionData(element: Union[lsst.daf.butler.DimensionElement, str], *data: Union[Mapping[str, Any], lsst.daf.butler.DimensionRecord], conform: bool = True) → None ¶

Insert one or more dimension records into the database.

Parameters

elementDimensionElement or str: The DimensionElement or name thereof that identifies the table records will be inserted into.
datadict or DimensionRecord (variadic): One or more records to insert.
conformbool, optional: If False (True is default), perform no checking or conversions, and assume that element is a DimensionElement instance and data is one or more DimensionRecord instances of the appropriate subclass.

insertOpaqueData(tableName: str, *data: dict) → None ¶

Insert records into an opaque table.

Parameters

tableNamestr: Logical name of the opaque table. Must match the name used in a previous call to registerOpaqueTable.
data: Each additional positional argument is a dictionary that represents a single row to be added.

isWriteable() → bool ¶: Return True if this registry allows write operations, and False otherwise.

makeQueryBuilder(summary: lsst.daf.butler.registry.queries.QuerySummary) → lsst.daf.butler.registry.queries.QueryBuilder¶

Return a QueryBuilder instance capable of constructing and managing more complex queries than those obtainable via Registry interfaces.

This is an advanced interface; downstream code should prefer Registry.queryDataIds and Registry.queryDatasets whenever those are sufficient.

Parameters

summaryqueries.QuerySummary: Object describing and categorizing the full set of dimensions that will be included in the query.

Returns

builderqueries.QueryBuilder: Object that can be used to construct and perform advanced queries.

queryCollections(expression: Any = Ellipsis, datasetType: Optional[lsst.daf.butler.core.datasets.type.DatasetType] = None, collectionTypes: Iterable[lsst.daf.butler.registry._collectionType.CollectionType] = frozenset({<CollectionType.RUN: 1>, <CollectionType.TAGGED: 2>, <CollectionType.CHAINED: 3>, <CollectionType.CALIBRATION: 4>}), flattenChains: bool = False, includeChains: Optional[bool] = None) → Iterator[str]¶

Iterate over the collections whose names match an expression.

Parameters

expressionAny, optional: An expression that identifies the collections to return, such as a str (for full matches), re.Pattern (for partial matches), or iterable thereof. `...` can be used to return all collections, and is the default. See Collection expressions for more information.
datasetTypeDatasetType, optional: If provided, only yield collections that may contain datasets of this type. This is a conservative approximation in general; it may yield collections that do not have any such datasets.
collectionTypesAbstractSet [ CollectionType ], optional: If provided, only yield collections of these types.
flattenChainsbool, optional: If True (False is default), recursively yield the child collections of matching CHAINED collections.
includeChainsbool, optional: If True, yield records for matching CHAINED collections. Default is the opposite of flattenChains: include either CHAINED collections or their children, but not both.

Yields

collectionstr: The name of a collection that matches expression.
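The interaction between `flattenChains` and `includeChains` can be modeled with a small recursive walk. This is a simplified sketch: expression matching, dataset type filtering, and de-duplication are left out, `walk` is a hypothetical function, and the collection names are made up. Collections are modeled as a dict mapping each name to its children (for CHAINED collections) or None (for any other type).

```python
from typing import Dict, Iterator, List, Optional

# name -> children for CHAINED collections, None for every other type
Collections = Dict[str, Optional[List[str]]]


def walk(
    collections: Collections,
    names: List[str],
    flattenChains: bool = False,
    includeChains: Optional[bool] = None,
) -> Iterator[str]:
    """Yield collection names according to the two chain-handling flags."""
    if includeChains is None:
        includeChains = not flattenChains  # the documented default
    for name in names:
        children = collections[name]
        if children is None:
            yield name  # not a chain: always yielded
            continue
        if includeChains:
            yield name  # the CHAINED collection itself
        if flattenChains:
            # Recurse into the chain's children.
            yield from walk(collections, children, flattenChains, includeChains)


toy = {
    "defaults": ["calib", "runs"],
    "runs": ["run1", "run2"],
    "calib": None,
    "run1": None,
    "run2": None,
}
chains_only = list(walk(toy, ["defaults"]))
children_only = list(walk(toy, ["defaults"], flattenChains=True))
both = list(walk(toy, ["defaults"], flattenChains=True, includeChains=True))
```

With the defaults only the chain itself is returned; with `flattenChains=True` only its non-chain descendants are returned; passing both flags as True returns the chains and their children together.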

queryDataIds(dimensions: Union[Iterable[Union[lsst.daf.butler.Dimension, str]], lsst.daf.butler.Dimension, str], *, dataId: Optional[Union[lsst.daf.butler.DataCoordinate, Mapping[str, Any]]] = None, datasets: Optional[Any] = None, collections: Optional[Any] = None, where: Optional[str] = None, components: Optional[bool] = None, bind: Optional[Mapping[str, Any]] = None, check: bool = True, **kwargs: Any) → lsst.daf.butler.registry.queries.DataCoordinateQueryResults¶

Query for data IDs matching user-provided criteria.

Parameters

dimensionsDimension or str, or iterable thereof: The dimensions of the data IDs to yield, as either Dimension instances or str. Will be automatically expanded to a complete DimensionGraph.
dataIddict or DataCoordinate, optional: A data ID whose key-value pairs are used as equality constraints in the query.
datasetsAny, optional: An expression that fully or partially identifies dataset types that should constrain the yielded data IDs. For example, including “raw” here would constrain the yielded instrument, exposure, detector, and physical_filter values to only those for which at least one “raw” dataset exists in collections. Allowed types include DatasetType, str, re.Pattern, and iterables thereof. Unlike other dataset type expressions, ... is not permitted - it doesn’t make sense to constrain data IDs on the existence of all datasets. See DatasetType expressions for more information.
collectionsAny, optional: An expression that identifies the collections to search for datasets, such as a str (for full matches), re.Pattern (for partial matches), or iterable thereof. `...` can be used to search all collections (actually just all RUN collections, because this will still find all datasets). If not provided, self.defaults.collections is used. Ignored unless datasets is also passed. See Collection expressions for more information.
wherestr, optional: A string expression similar to a SQL WHERE clause. May involve any column of a dimension table or (as a shortcut for the primary key column of a dimension table) dimension name. See Dimension expressions for more information.
componentsbool, optional: If True, apply all dataset expression patterns to component dataset type names as well. If False, never apply patterns to components. If None (default), apply patterns to components only if their parent datasets were not matched by the expression. Fully-specified component datasets (str or DatasetType instances) are always included.
bindMapping, optional: Mapping containing literal values that should be injected into the where expression, keyed by the identifiers they replace.
checkbool, optional: If True (default) check the query for consistency before executing it. This may reject some valid queries that resemble common mistakes (e.g. queries for visits without specifying an instrument).
**kwargs: Additional keyword arguments are forwarded to DataCoordinate.standardize when processing the dataId argument (and may be used to provide a constraining data ID even when the dataId argument is None).

Returns

dataIdsDataCoordinateQueryResults: Data IDs matching the given query parameters. These are guaranteed to identify all dimensions (DataCoordinate.hasFull returns True), but will not contain DimensionRecord objects (DataCoordinate.hasRecords returns False). Call DataCoordinateQueryResults.expanded on the returned object to fetch those (and consider using DataCoordinateQueryResults.materialize on the returned object first if the expected number of rows is very large). See documentation for those methods for additional information.

Raises

TypeError: Raised if collections is None, self.defaults.collections is None, and datasets is not None.
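A call that combines `where`, `bind`, and a keyword data ID might look like the sketch below. It relies only on the `queryDataIds` parameters documented above; the helper name `visits_from`, the identifier `first`, and the exact expression text are assumptions for illustration, so check the Dimension expressions documentation for the supported syntax.

```python
from typing import Any


def visits_from(registry: Any, instrument: str, first_visit: int) -> Any:
    """Query visit data IDs for one instrument, starting at ``first_visit``."""
    return registry.queryDataIds(
        ["visit"],
        # ``first`` is replaced by the literal value supplied through ``bind``.
        where="visit >= first",
        bind={"first": first_visit},
        # Extra keywords act as a constraining data ID.
        instrument=instrument,
    )
```

Passing the value through `bind` avoids formatting it into the expression string by hand. The returned results object identifies all dimensions but carries no dimension records until `expanded` is called on it.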

queryDatasetAssociations(datasetType: Union[str, lsst.daf.butler.core.datasets.type.DatasetType], collections: Any = Ellipsis, *, collectionTypes: Iterable[lsst.daf.butler.registry._collectionType.CollectionType] = frozenset({<CollectionType.RUN: 1>, <CollectionType.TAGGED: 2>, <CollectionType.CHAINED: 3>, <CollectionType.CALIBRATION: 4>}), flattenChains: bool = False) → Iterator[lsst.daf.butler.DatasetAssociation]¶

Iterate over dataset-collection combinations where the dataset is in the collection.

This method is a temporary placeholder for better support for association results in queryDatasets. It will probably be removed in the future, and should be avoided in production code whenever possible.

Parameters

datasetTypeDatasetType or str: A dataset type object or the name of one.
collectionsAny, optional: An expression that identifies the collections to search for datasets, such as a str (for full matches), re.Pattern (for partial matches), or iterable thereof. `...` can be used to search all collections (actually just all RUN collections, because this will still find all datasets). If not provided, self.defaults.collections is used. See Collection expressions for more information.
collectionTypesAbstractSet [ CollectionType ], optional: If provided, only yield associations from collections of these types.
flattenChainsbool, optional: If True (False is default), search in the children of CHAINED collections. If False, CHAINED collections are ignored.

Yields

associationDatasetAssociation: Object representing the relationship between a single dataset and a single collection.

Raises

TypeError: Raised if collections is None and self.defaults.collections is None.

queryDatasetTypes(expression: Any = Ellipsis, *, components: Optional[bool] = None) → Iterator[lsst.daf.butler.DatasetType]¶

Iterate over the dataset types whose names match an expression.

Parameters

expressionAny, optional: An expression that fully or partially identifies the dataset types to return, such as a str, re.Pattern, or iterable thereof. `...` can be used to return all dataset types, and is the default. See DatasetType expressions for more information.
componentsbool, optional: If True, apply all expression patterns to component dataset type names as well. If False, never apply patterns to components. If None (default), apply patterns to components only if their parent datasets were not matched by the expression. Fully-specified component datasets (str or DatasetType instances) are always included.

Yields

datasetTypeDatasetType: A DatasetType instance whose name matches expression.

queryDatasets(datasetType: Any, *, collections: Optional[Any] = None, dimensions: Optional[Iterable[Union[lsst.daf.butler.Dimension, str]]] = None, dataId: Optional[Union[lsst.daf.butler.DataCoordinate, Mapping[str, Any]]] = None, where: Optional[str] = None, findFirst: bool = False, components: Optional[bool] = None, bind: Optional[Mapping[str, Any]] = None, check: bool = True, **kwargs: Any) → lsst.daf.butler.registry.queries.DatasetQueryResults¶

Query for and iterate over dataset references matching user-provided criteria.

Parameters

datasetTypeAny: An expression that fully or partially identifies the dataset types to be queried. Allowed types include DatasetType, str, re.Pattern, and iterables thereof. The special value `...` can be used to query all dataset types. See DatasetType expressions for more information.
collectionsAny, optional: An expression that identifies the collections to search, such as a str (for full matches), re.Pattern (for partial matches), or iterable thereof. `...` can be used to search all collections (actually just all RUN collections, because this will still find all datasets). If not provided, self.defaults.collections is used. See Collection expressions for more information.
dimensionsIterable of Dimension or str: Dimensions to include in the query (in addition to those used to identify the queried dataset type(s)), either to constrain the resulting datasets to those for which a matching dimension exists, or to relate the dataset type’s dimensions to dimensions referenced by the dataId or where arguments.
dataIddict or DataCoordinate, optional: A data ID whose key-value pairs are used as equality constraints in the query.
wherestr, optional: A string expression similar to a SQL WHERE clause. May involve any column of a dimension table or (as a shortcut for the primary key column of a dimension table) dimension name. See Dimension expressions for more information.
findFirstbool, optional: If True (False is default), for each result data ID, only yield one DatasetRef of each DatasetType, from the first collection in which a dataset of that dataset type appears (according to the order of collections passed in). If True, collections must not contain regular expressions and may not be `...`.
componentsbool, optional: If True, apply all dataset expression patterns to component dataset type names as well. If False, never apply patterns to components. If None (default), apply patterns to components only if their parent datasets were not matched by the expression. Fully-specified component datasets (str or DatasetType instances) are always included.
bindMapping, optional: Mapping containing literal values that should be injected into the where expression, keyed by the identifiers they replace.
checkbool, optional: If True (default) check the query for consistency before executing it. This may reject some valid queries that resemble common mistakes (e.g. queries for visits without specifying an instrument).
**kwargs: Additional keyword arguments are forwarded to DataCoordinate.standardize when processing the dataId argument (and may be used to provide a constraining data ID even when the dataId argument is None).

Returns

refsqueries.DatasetQueryResults: Dataset references matching the given query criteria. Nested data IDs are guaranteed to include values for all implied dimensions (i.e. DataCoordinate.hasFull will return True), but will not include dimension records (DataCoordinate.hasRecords will be False) unless expanded is called on the result object (which returns a new one).

Raises

TypeError: Raised when the arguments are incompatible, such as when a collection wildcard is passed when findFirst is True, or when collections is None and self.defaults.collections is also None.

Notes

When multiple dataset types are queried in a single call, the results of this operation are equivalent to querying for each dataset type separately in turn, and no information about the relationships between datasets of different types is included. In contexts where that kind of information is important, the recommended pattern is to use queryDataIds to first obtain data IDs (possibly with the desired dataset types and collections passed as constraints to the query), and then use multiple (generally much simpler) calls to queryDatasets with the returned data IDs passed as constraints.
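
The pattern recommended above can be sketched as follows. The helper name `related_datasets` is hypothetical. The sketch assumes that `queryDataIds` accepts the dimensions as its first argument plus `datasets` and `collections` keyword arguments, and that `collections` is an explicit ordered list of names (no patterns and not `...`), since `findFirst=True` is used.

```python
def related_datasets(registry, dimensions, dataset_types, collections):
    """Group datasets of several types by the data ID that relates them.

    Returns a list of ``(data_id, {dataset_type_name: [refs]})`` pairs.
    ``registry`` is assumed to be an existing Registry instance.
    """
    # Step 1: one query for the data IDs, constrained by the dataset types.
    data_ids = registry.queryDataIds(
        dimensions, datasets=dataset_types, collections=collections
    )
    related = []
    # Step 2: simple per-type dataset queries constrained by each data ID.
    for data_id in data_ids:
        refs_by_type = {
            name: list(
                registry.queryDatasets(
                    name, collections=collections, dataId=data_id, findFirst=True
                )
            )
            for name in dataset_types
        }
        related.append((data_id, refs_by_type))
    return related
```

This issues one `queryDatasets` call per data ID and dataset type, which keeps each query simple at the cost of more round trips.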

queryDimensionRecords(element: Union[lsst.daf.butler.DimensionElement, str], *, dataId: Optional[Union[lsst.daf.butler.DataCoordinate, Mapping[str, Any]]] = None, datasets: Optional[Any] = None, collections: Optional[Any] = None, where: Optional[str] = None, components: Optional[bool] = None, bind: Optional[Mapping[str, Any]] = None, check: bool = True, **kwargs: Any) → Iterator[lsst.daf.butler.DimensionRecord]¶

Query for dimension information matching user-provided criteria.

Parameters

elementDimensionElement or str: The dimension element to obtain records for.
dataIddict or DataCoordinate, optional: A data ID whose key-value pairs are used as equality constraints in the query.
datasetsAny, optional: An expression that fully or partially identifies dataset types that should constrain the yielded records. See queryDataIds and DatasetType expressions for more information.
collectionsAny, optional: An expression that identifies the collections to search for datasets, such as a str (for full matches), re.Pattern (for partial matches), or iterable thereof. `...` can be used to search all collections (actually just all RUN collections, because this will still find all datasets). If not provided, self.defaults.collections is used. Ignored unless datasets is also passed. See Collection expressions for more information.
wherestr, optional: A string expression similar to a SQL WHERE clause. See queryDataIds and Dimension expressions for more information.
componentsbool, optional: Whether to apply dataset expressions to components as well. See queryDataIds for more information.
bindMapping, optional: Mapping containing literal values that should be injected into the where expression, keyed by the identifiers they replace.
checkbool, optional: If True (default) check the query for consistency before executing it. This may reject some valid queries that resemble common mistakes (e.g. queries for visits without specifying an instrument).
**kwargs: Additional keyword arguments are forwarded to DataCoordinate.standardize when processing the dataId argument (and may be used to provide a constraining data ID even when the dataId argument is None).

Returns

recordsIterator of DimensionRecord: Dimension records matching the given query parameters.
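
A minimal sketch of combining `where`, `bind`, and keyword data ID values. The helper name `records_in_range` and the identifiers `low_value` and `high_value` are hypothetical; the sketch assumes the expression language accepts `>=`, `<=`, and `AND`, and that `column` is a trusted dimension or column name rather than user input.

```python
def records_in_range(registry, element, column, low, high, **data_id):
    """Return dimension records whose ``column`` lies in ``[low, high]``.

    Literal values are passed through ``bind`` rather than formatted into
    the expression string. Extra keyword arguments become data ID
    constraints. ``registry`` is assumed to be an existing Registry.
    """
    where = f"{column} >= low_value AND {column} <= high_value"
    records = registry.queryDimensionRecords(
        element,
        where=where,
        bind={"low_value": low, "high_value": high},
        **data_id,
    )
    return list(records)
```

For example, `records_in_range(registry, "detector", "detector", 10, 20, instrument="HSC")` would constrain detector records for one instrument; the element, column, and instrument names here are purely illustrative.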

refresh() → None ¶

Refresh all in-memory state by querying the database.

This may be necessary to enable querying for entities added by other registry instances after this one was constructed.

registerCollection(name: str, type: lsst.daf.butler.registry._collectionType.CollectionType = <CollectionType.TAGGED: 2>, doc: Optional[str] = None) → None ¶

Add a new collection if one with the given name does not exist.

Parameters

namestr: The name of the collection to create.
typeCollectionType: Enum value indicating the type of collection to create.
docstr, optional: Documentation string for the collection.

Notes

This method cannot be called within transactions, as it needs to be able to perform its own transaction to be concurrent.

registerDatasetType(datasetType: lsst.daf.butler.DatasetType) → bool ¶

Add a new DatasetType to the Registry.

It is not an error to register the same DatasetType twice.

Parameters

datasetTypeDatasetType: The DatasetType to be added.

Returns

insertedbool: True if datasetType was inserted, False if an identical existing DatasetType was found. Note that in either case the DatasetType is guaranteed to be defined in the Registry consistently with the given definition.

Raises

ValueError: Raised if the dimensions or storage class are invalid.
ConflictingDefinitionError: Raised if this DatasetType is already registered with a different definition.

Notes

This method cannot be called within transactions, as it needs to be able to perform its own transaction to be concurrent.

registerOpaqueTable(tableName: str, spec: lsst.daf.butler.core.ddl.TableSpec) → None ¶

Add an opaque (to the Registry) table for use by a Datastore or other data repository client.

Opaque table records can be added via insertOpaqueData, retrieved via fetchOpaqueData, and removed via deleteOpaqueData.

Parameters

tableNamestr: Logical name of the opaque table. This may differ from the actual name used in the database by a prefix and/or suffix.
specddl.TableSpec: Specification for the table to be added.
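
A minimal sketch of the insert, fetch, and delete cycle on an opaque table. The helper name `opaque_round_trip` is hypothetical. The sketch assumes the table has already been registered with `registerOpaqueTable`, and that `insertOpaqueData` takes the table name followed by one or more row dictionaries.

```python
def opaque_round_trip(registry, table_name, row, **where):
    """Insert ``row``, read back the rows matching ``where``, then delete them.

    ``registry`` is assumed to be an existing Registry instance with
    ``table_name`` already registered as an opaque table.
    """
    registry.insertOpaqueData(table_name, row)
    # Materialize the results before deleting the matching rows.
    fetched = list(registry.fetchOpaqueData(table_name, **where))
    registry.deleteOpaqueData(table_name, **where)
    return fetched
```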

registerRun(name: str, doc: Optional[str] = None) → None ¶

Add a new run if one with the given name does not exist.

Parameters

namestr: The name of the run to create.
docstr, optional: Documentation string for the collection.

Notes

This method cannot be called within transactions, as it needs to be able to perform its own transaction to be concurrent.

removeCollection(name: str) → None ¶

Completely remove the given collection.

Parameters

namestr: The name of the collection to remove.

Raises

MissingCollectionError: Raised if no collection with the given name exists.

Notes

If this is a RUN collection, all datasets and quanta in it are also fully removed. This requires that those datasets be removed (or at least trashed) from any datastores that hold them first.

A collection may not be deleted as long as it is referenced by a CHAINED collection; the CHAINED collection must be deleted or redefined first.
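
A minimal sketch of the "redefine first" route for a collection that is a member of a CHAINED collection. The helper name `drop_chain_member` is hypothetical; the caller is assumed to already know the chain's current ordered children, and the collection is assumed to be referenced by only this one chain.

```python
def drop_chain_member(registry, parent, children, doomed):
    """Remove ``doomed`` from the chain ``parent``, then delete it.

    ``children`` is the chain's current ordered list of child names,
    supplied by the caller. Returns the children that remain.
    """
    remaining = [name for name in children if name != doomed]
    # Redefine the chain so it no longer references the doomed collection.
    registry.setCollectionChain(parent, remaining)
    registry.removeCollection(doomed)
    return remaining
```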

removeDatasetType(name: str) → None ¶

Remove the named DatasetType from the registry.

Warning

Registry implementations can cache the dataset type definitions. This means that deleting the dataset type definition may result in unexpected behavior from other active butler processes that have not seen the deletion.

Parameters

namestr: Name of the type to be removed.

Raises

lsst.daf.butler.registry.OrphanedRecordError: Raised if an attempt is made to remove the dataset type definition when there are already datasets associated with it.

Notes

If the dataset type is not registered the method will return without action.

removeDatasets(refs: Iterable[lsst.daf.butler.DatasetRef]) → None ¶

Remove datasets from the Registry.

The datasets will be removed unconditionally from all collections, and any Quantum that consumed this dataset will instead be marked as having a NULL input. Datastore records will not be deleted; the caller is responsible for ensuring that the dataset has already been removed from all Datastores.

Parameters

refsIterable of DatasetRef: References to the datasets to be removed. Must include a valid id attribute, and should be considered invalidated upon return.

Raises

AmbiguousDatasetError: Raised if any ref.id is None.
OrphanedRecordError: Raised if any dataset is still present in any Datastore.

resetConnectionPool() → None ¶

Reset SQLAlchemy connection pool for SqlRegistry database.

This operation is useful when using a registry with fork-based multiprocessing. To use a registry across a fork boundary, make sure that there are no currently active connections (no session or transaction is in progress) and that the connection pool is reset using this method. This method should be called by the child process immediately after the fork.
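
A minimal sketch of the fork pattern described above, using a `multiprocessing` pool initializer so that each child resets the pool as soon as it starts. The names `init_worker`, `get_worker_registry`, and `make_pool` are hypothetical. The sketch assumes a platform where the "fork" start method is available and that the parent has no session or transaction in progress when the pool is created.

```python
import multiprocessing

_worker_registry = None


def init_worker(registry):
    """Run in each child right after the fork: reset the pool, keep the registry."""
    global _worker_registry
    registry.resetConnectionPool()
    _worker_registry = registry


def get_worker_registry():
    """Return the registry prepared for this worker process (None in the parent)."""
    return _worker_registry


def make_pool(registry, processes=2):
    """Create a fork-based worker pool whose children reset the connection pool.

    With the "fork" start method the registry is inherited by the children
    rather than pickled.
    """
    context = multiprocessing.get_context("fork")
    return context.Pool(processes, initializer=init_worker, initargs=(registry,))
```

Worker task functions would then call `get_worker_registry()` instead of receiving the registry as a pickled argument.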

setCollectionChain(parent: str, children: Any, *, flatten: bool = False) → None ¶

Define or redefine a CHAINED collection.

Parameters

parentstr: Name of the chained collection. Must have already been added via a call to Registry.registerCollection.
childrenAny: An expression defining an ordered search of child collections, generally an iterable of str; see Collection expressions for more information.
flattenbool, optional: If True (False is default), recursively flatten out any nested CHAINED collections in children first.

Raises

MissingCollectionError: Raised when any of the given collections do not exist in the Registry.
TypeError: Raised if parent does not correspond to a CHAINED collection.
ValueError: Raised if the given collections contain a cycle.
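
A minimal sketch of defining a chain from scratch. The helper name `define_chain` is hypothetical, and `chained_type` is assumed to be the `CollectionType` enum value for CHAINED collections, passed in by the caller. Because the registration methods cannot be called within transactions, this helper is meant to be used outside any `transaction` block.

```python
def define_chain(registry, parent, runs, chained_type, doc=None):
    """Register ``runs`` and a chained collection ``parent`` that searches them in order.

    ``registry`` is assumed to be an existing Registry instance. Registering
    a run or collection that already exists is not an error.
    """
    runs = list(runs)
    for run in runs:
        registry.registerRun(run)
    # The parent must exist as a CHAINED collection before it can be defined.
    registry.registerCollection(parent, type=chained_type, doc=doc)
    registry.setCollectionChain(parent, runs, flatten=False)
    return runs
```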

setCollectionDocumentation(collection: str, doc: Optional[str]) → None ¶

Set the documentation string for a collection.

Parameters

collectionstr: Name of the collection.
docstr or None: Docstring for the collection with the given name; will replace any existing docstring. Passing None will remove any existing docstring.

syncDimensionData(element: Union[lsst.daf.butler.DimensionElement, str], row: Union[Mapping[str, Any], lsst.daf.butler.DimensionRecord], conform: bool = True) → bool ¶

Synchronize the given dimension record with the database, inserting if it does not already exist and comparing values if it does.

Parameters

elementDimensionElement or str: The DimensionElement or name thereof that identifies the table records will be inserted into.
rowdict or DimensionRecord: The record to insert.
conformbool, optional: If False (True is default), perform no checking or conversions, and assume that element is a DimensionElement instance and row is a DimensionRecord instance of the appropriate subclass.

Returns

insertedbool: True if a new row was inserted, False otherwise.

Raises

ConflictingDefinitionError: Raised if the record exists in the database (according to primary key lookup) but is inconsistent with the given one.
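
A minimal sketch of using the returned flag to report what changed when synchronizing several records. The helper name `sync_records` is hypothetical; `registry` is assumed to be an existing Registry instance, and any ConflictingDefinitionError is left to propagate to the caller.

```python
def sync_records(registry, element, rows):
    """Synchronize each row for ``element``; return ``(inserted, already_present)`` counts."""
    rows = list(rows)
    inserted = 0
    for row in rows:
        # True means a new row was written; False means a consistent row existed.
        if registry.syncDimensionData(element, row):
            inserted += 1
    return inserted, len(rows) - inserted
```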

transaction(*, savepoint: bool = False) → Iterator[None]¶

Return a context manager that represents a transaction.
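
A minimal sketch of grouping two operations so that they are applied together. The helper name `retag` is hypothetical; it assumes both collections are existing TAGGED collections and that an exception raised inside the `with` block causes the transaction to be rolled back.

```python
def retag(registry, old_collection, new_collection, refs):
    """Move ``refs`` from one TAGGED collection to another in one transaction.

    ``registry`` is assumed to be an existing Registry instance.
    """
    refs = list(refs)  # Iterated twice below, so materialize once.
    with registry.transaction():
        registry.associate(new_collection, refs)
        registry.disassociate(old_collection, refs)
    return len(refs)
```

Methods documented as unusable within transactions, such as `registerCollection`, `registerRun`, and `registerDatasetType`, should be called before entering the block.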
