Registry

class lsst.daf.butler.Registry

Bases: ABC

Abstract Registry interface.

All subclasses should store RegistryDefaults in a _defaults property. No other properties are assumed shared between implementations.

Attributes Summary

defaults

Default collection search path and/or output RUN collection (RegistryDefaults).

dimensions

Definitions of all dimensions recognized by this Registry (DimensionUniverse).

obsCoreTableManager

The ObsCore manager instance for this registry (ObsCoreTableManager or None).

Methods Summary

associate(collection, refs)

Add existing datasets to a TAGGED collection.

certify(collection, refs, timespan)

Associate one or more datasets with a calibration collection and a validity range within it.

createFromConfig([config, dimensionConfig, ...])

Create registry database and return Registry instance.

decertify(collection, datasetType, timespan, *)

Remove or adjust datasets to clear a validity range within a calibration collection.

disassociate(collection, refs)

Remove existing datasets from a TAGGED collection.

expandDataId([dataId, graph, records, ...])

Expand a dimension-based data ID to include additional information.

findDataset(datasetType[, dataId, ...])

Find a dataset given its DatasetType and data ID.

getCollectionChain(parent)

Return the child collections in a CHAINED collection.

getCollectionDocumentation(collection)

Retrieve the documentation string for a collection.

getCollectionParentChains(collection)

Return the CHAINED collections that directly contain the given one.

getCollectionSummary(collection)

Return a summary for the given collection.

getCollectionType(name)

Return an enumeration value indicating the type of the given collection.

getDataset(id)

Retrieve a Dataset entry.

getDatasetLocations(ref)

Retrieve datastore locations for a given dataset.

getDatasetType(name)

Get the DatasetType.

insertDatasets(datasetType, dataIds[, run, ...])

Insert one or more datasets into the Registry.

insertDimensionData(element, *data[, ...])

Insert one or more dimension records into the database.

isWriteable()

Return True if this registry allows write operations, and False otherwise.

queryCollections([expression, datasetType, ...])

Iterate over the collections whose names match an expression.

queryDataIds(dimensions, *[, dataId, ...])

Query for data IDs matching user-provided criteria.

queryDatasetAssociations(datasetType[, ...])

Iterate over dataset-collection combinations where the dataset is in the collection.

queryDatasetTypes([expression, components, ...])

Iterate over the dataset types whose names match an expression.

queryDatasets(datasetType, *[, collections, ...])

Query for and iterate over dataset references matching user-provided criteria.

queryDimensionRecords(element, *[, dataId, ...])

Query for dimension information matching user-provided criteria.

refresh()

Refresh all in-memory state by querying the database.

registerCollection(name[, type, doc])

Add a new collection if one with the given name does not exist.

registerDatasetType(datasetType)

Add a new DatasetType to the Registry.

registerRun(name[, doc])

Add a new run if one with the given name does not exist.

removeCollection(name)

Remove the given collection from the registry.

removeDatasetType(name)

Remove the named DatasetType from the registry.

removeDatasets(refs)

Remove datasets from the Registry.

resetConnectionPool()

Reset connection pool for registry if relevant.

setCollectionChain(parent, children, *[, ...])

Define or redefine a CHAINED collection.

setCollectionDocumentation(collection, doc)

Set the documentation string for a collection.

supportsIdGenerationMode(mode)

Test whether the given dataset ID generation mode is supported by insertDatasets.

syncDimensionData(element, row[, conform, ...])

Synchronize the given dimension record with the database, inserting if it does not already exist and comparing values if it does.

transaction(*[, savepoint])

Return a context manager that represents a transaction.

Attributes Documentation

defaults

Default collection search path and/or output RUN collection (RegistryDefaults).

This is an immutable struct whose components may not be set individually, but the entire struct can be set by assigning to this property.

dimensions

Definitions of all dimensions recognized by this Registry (DimensionUniverse).

obsCoreTableManager

The ObsCore manager instance for this registry (ObsCoreTableManager or None).

The ObsCore manager may not be implemented for all registry backends, or may not be enabled for many repositories.

Methods Documentation

abstract associate(collection: str, refs: Iterable[DatasetRef]) None

Add existing datasets to a TAGGED collection.

If a DatasetRef with exactly the same ID is already in the collection, nothing is changed. If a DatasetRef with the same DatasetType and data ID but a different ID exists in the collection, ConflictingDefinitionError is raised.

Parameters:
collectionstr

Indicates the collection the datasets should be associated with.

refsIterable [ DatasetRef ]

An iterable of resolved DatasetRef instances that already exist in this Registry.

Raises:
lsst.daf.butler.registry.ConflictingDefinitionError

Raised if a dataset with the same DatasetType and data ID but a different ID already exists in the given collection.

lsst.daf.butler.registry.MissingCollectionError

Raised if collection does not exist in the registry.

lsst.daf.butler.registry.CollectionTypeError

Raised if adding new datasets to the given collection is not allowed.
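The two outcomes described for associate (a silent no-op for an identical ID, an error for a different ID with the same DatasetType and data ID) can be illustrated with a toy model. This is a sketch only, not the registry implementation: the TAGGED collection is a plain dict, each dataset is a hypothetical (dataset_id, dataset_type, data_id) tuple, and the exception class is a local stand-in for the real one.

```python
class ConflictingDefinitionError(Exception):
    """Local stand-in for lsst.daf.butler.registry.ConflictingDefinitionError."""


def associate(tagged, refs):
    """Add refs to a toy TAGGED collection.

    ``tagged`` maps (dataset_type, data_id) -> dataset_id, and each ref is a
    (dataset_id, dataset_type, data_id) tuple. Both layouts are hypothetical
    and exist only for this illustration.
    """
    for dataset_id, dataset_type, data_id in refs:
        key = (dataset_type, data_id)
        existing = tagged.get(key)
        if existing is None:
            # Nothing with this dataset type and data ID yet: associate it.
            tagged[key] = dataset_id
        elif existing != dataset_id:
            # Same dataset type and data ID, but a different dataset ID.
            raise ConflictingDefinitionError(
                f"{key} is already associated with dataset {existing}"
            )
        # existing == dataset_id: already associated, nothing changes.
```

Data IDs are written as tuples of key-value pairs here only so that they are hashable in the toy dict.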

abstract certify(collection: str, refs: Iterable[DatasetRef], timespan: Timespan) None

Associate one or more datasets with a calibration collection and a validity range within it.

Parameters:
collectionstr

The name of an already-registered CALIBRATION collection.

refsIterable [ DatasetRef ]

Datasets to be associated.

timespanTimespan

The validity range for these datasets within the collection.

Raises:
lsst.daf.butler.AmbiguousDatasetError

Raised if any of the given DatasetRef instances is unresolved.

lsst.daf.butler.registry.ConflictingDefinitionError

Raised if the collection already contains a different dataset with the same DatasetType and data ID and an overlapping validity range.

lsst.daf.butler.registry.CollectionTypeError

Raised if collection is not a CALIBRATION collection or if one or more datasets are of a dataset type for which DatasetType.isCalibration returns False.

classmethod createFromConfig(config: RegistryConfig | str | None = None, dimensionConfig: DimensionConfig | str | None = None, butlerRoot: str | ParseResult | ResourcePath | Path | None = None) Registry

Create registry database and return Registry instance.

This method initializes the database contents; the database must be empty prior to calling this method.

Parameters:
configRegistryConfig or str, optional

Registry configuration. If missing, the default configuration will be loaded from registry.yaml.

dimensionConfigDimensionConfig or str, optional

Dimensions configuration. If missing, the default configuration will be loaded from dimensions.yaml.

butlerRootconvertible to lsst.resources.ResourcePath, optional

Path to the repository root this Registry will manage.

Returns:
registryRegistry

A new Registry instance.

Notes

This method exists for backward compatibility only, until all clients migrate to the new _RegistryFactory factory class. Regular clients of the registry class do not use this method; it is only used by tests in multiple packages.

abstract decertify(collection: str, datasetType: str | DatasetType, timespan: Timespan, *, dataIds: Iterable[DataCoordinate | Mapping[str, Any]] | None = None) None

Remove or adjust datasets to clear a validity range within a calibration collection.

Parameters:
collectionstr

The name of an already-registered CALIBRATION collection.

datasetTypestr or DatasetType

Name or DatasetType instance for the datasets to be decertified.

timespanTimespan

The validity range to remove datasets from within the collection. Datasets that overlap this range but are not contained by it will have their validity ranges adjusted to not overlap it, which may split a single dataset validity range into two.

dataIdsiterable [dict or DataCoordinate], optional

Data IDs that should be decertified within the given validity range. If None, all data IDs for datasetType will be decertified.

Raises:
lsst.daf.butler.registry.CollectionTypeError

Raised if collection is not a CALIBRATION collection or if datasetType.isCalibration() is False.
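The adjustment of validity ranges described for the timespan parameter can be sketched as interval subtraction. The function below is a toy model, not the registry implementation: it assumes validity ranges are half-open intervals [begin, end) written as plain (begin, end) tuples of comparable values, and the name clear_range is hypothetical.

```python
def clear_range(validity, cleared):
    """Return the parts of ``validity`` that remain after removing ``cleared``.

    Both arguments are (begin, end) pairs treated as half-open intervals
    [begin, end); this representation is an assumption of the sketch.
    The result holds zero, one, or two intervals.
    """
    begin, end = validity
    c_begin, c_end = cleared
    if c_end <= begin or end <= c_begin:
        # No overlap: the validity range is left untouched.
        return [validity]
    remaining = []
    if begin < c_begin:
        # Part of the validity range precedes the cleared range.
        remaining.append((begin, c_begin))
    if c_end < end:
        # Part of the validity range follows the cleared range.
        remaining.append((c_end, end))
    return remaining
```

For instance, clearing (3, 5) from a dataset valid over (0, 10) leaves two ranges, (0, 3) and (5, 10), which is the "split into two" case; clearing a range that fully contains the validity range leaves nothing.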

abstract disassociate(collection: str, refs: Iterable[DatasetRef]) None

Remove existing datasets from a TAGGED collection.

collection and ref combinations that are not currently associated are silently ignored.

Parameters:
collectionstr

The collection the datasets should no longer be associated with.

refsIterable [ DatasetRef ]

An iterable of resolved DatasetRef instances that already exist in this Registry.

Raises:
lsst.daf.butler.AmbiguousDatasetError

Raised if any of the given dataset references is unresolved.

lsst.daf.butler.registry.MissingCollectionError

Raised if collection does not exist in the registry.

lsst.daf.butler.registry.CollectionTypeError

Raised if removing datasets from the given collection is not allowed.

abstract expandDataId(dataId: DataCoordinate | Mapping[str, Any] | None = None, *, graph: DimensionGraph | None = None, records: NamedKeyMapping[DimensionElement, DimensionRecord | None] | Mapping[str, DimensionRecord | None] | None = None, withDefaults: bool = True, **kwargs: Any) DataCoordinate

Expand a dimension-based data ID to include additional information.

Parameters:
dataIdDataCoordinate or dict, optional

Data ID to be expanded; augmented and overridden by kwargs.

graphDimensionGraph, optional

Set of dimensions for the expanded ID. If None, the dimensions will be inferred from the keys of dataId and kwargs. Dimensions that are in dataId or kwargs but not in graph are silently ignored, providing a way to extract and expand a subset of a data ID.

recordsMapping [str, DimensionRecord], optional

Dimension record data to use before querying the database for that data, keyed by element name.

withDefaultsbool, optional

Utilize self.defaults.dataId to fill in missing governor dimension key-value pairs. Defaults to True (i.e. defaults are used).

**kwargs

Additional keywords are treated like additional key-value pairs for dataId, extending and overriding it.

Returns:
expandedDataCoordinate

A data ID that includes full metadata for all of the dimensions it identifies, i.e. guarantees that expanded.hasRecords() and expanded.hasFull() both return True.

Raises:
lsst.daf.butler.registry.DataIdError

Raised when dataId or keyword arguments specify unknown dimensions or values, or when a resulting data ID contains contradictory key-value pairs, according to dimension relationships.

Notes

This method cannot be relied upon to reject invalid data ID values for dimensions that do not actually have any record columns. For efficiency reasons, the records for these dimensions (which have only dimension key values that are given by the caller) may be constructed directly rather than obtained from the registry database.

abstract findDataset(datasetType: DatasetType | str, dataId: DataCoordinate | Mapping[str, Any] | None = None, *, collections: str | Pattern | Iterable[str | Pattern] | ellipsis | CollectionWildcard | None = None, timespan: Timespan | None = None, **kwargs: Any) DatasetRef | None

Find a dataset given its DatasetType and data ID.

This can be used to obtain a DatasetRef that permits the dataset to be read from a Datastore. If the dataset is a component and cannot be found using the provided dataset type, a dataset ref for the parent will be returned instead, but with the correct dataset type.

Parameters:
datasetTypeDatasetType or str

A DatasetType or the name of one. If this is a DatasetType instance, its storage class will be respected and propagated to the output, even if it differs from the dataset type definition in the registry, as long as the storage classes are convertible.

dataIddict or DataCoordinate, optional

A dict-like object containing the Dimension links that identify the dataset within a collection.

collectionscollection expression, optional

An expression that fully or partially identifies the collections to search for the dataset; see Collection expressions for more information. Defaults to self.defaults.collections.

timespanTimespan, optional

A timespan that the validity range of the dataset must overlap. If not provided, any CALIBRATION collections matched by the collections argument will not be searched.

**kwargs

Additional keyword arguments passed to DataCoordinate.standardize to convert dataId to a true DataCoordinate or augment an existing one.

Returns:
refDatasetRef

A reference to the dataset, or None if no matching Dataset was found.

Raises:
lsst.daf.butler.registry.NoDefaultCollectionError

Raised if collections is None and self.defaults.collections is None.

LookupError

Raised if one or more data ID keys are missing.

lsst.daf.butler.registry.MissingDatasetTypeError

Raised if the dataset type does not exist.

lsst.daf.butler.registry.MissingCollectionError

Raised if any of collections does not exist in the registry.

Notes

This method simply returns None and does not raise an exception even when the set of collections searched is intrinsically incompatible with the dataset type, e.g. if datasetType.isCalibration() is False, but only CALIBRATION collections are being searched. This may make it harder to debug some lookup failures, but the behavior is intentional; we consider it more important that failed searches are reported consistently, regardless of the reason, and that adding additional collections that do not contain a match to the search path never changes the behavior.

This method handles component dataset types automatically, though most other registry operations do not.
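Because a failed search returns None rather than raising, callers that require a result usually need their own check. The helper below is a minimal sketch of that pattern; find_or_raise and DatasetNotFoundError are hypothetical names, and registry is assumed to be any object exposing findDataset with the signature documented above.

```python
class DatasetNotFoundError(LookupError):
    """Hypothetical error for this sketch; not part of lsst.daf.butler."""


def find_or_raise(registry, dataset_type, data_id, collections, timespan=None):
    """Look up one dataset and turn a ``None`` result into an exception.

    ``registry`` is assumed to provide ``findDataset`` as documented above.
    Pass ``timespan`` if CALIBRATION collections should be searched.
    """
    ref = registry.findDataset(
        dataset_type, data_id, collections=collections, timespan=timespan
    )
    if ref is None:
        # findDataset reports every failed search the same way, so the
        # message lists the inputs to make debugging easier.
        raise DatasetNotFoundError(
            f"No {dataset_type!r} dataset with data ID {data_id!r} "
            f"found in collections {collections!r}"
        )
    return ref
```

Listing the inputs in the error message partly compensates for the fact that findDataset does not say why a search failed.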

abstract getCollectionChain(parent: str) Sequence[str]

Return the child collections in a CHAINED collection.

Parameters:
parentstr

Name of the chained collection. Must have already been added via a call to Registry.registerCollection.

Returns:
childrenSequence [ str ]

An ordered sequence of collection names that are searched when the given chained collection is searched.

Raises:
lsst.daf.butler.registry.MissingCollectionError

Raised if parent does not exist in the Registry.

lsst.daf.butler.registry.CollectionTypeError

Raised if parent does not correspond to a CHAINED collection.
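A common use of getCollectionChain is to read the current children before redefining the chain with setCollectionChain. The helper below is a sketch under that assumption; prepend_to_chain is a hypothetical name, and registry is assumed to be any object exposing the two methods as listed in the Methods Summary.

```python
def prepend_to_chain(registry, parent, child):
    """Put ``child`` at the front of the CHAINED collection ``parent``.

    ``registry`` is assumed to provide ``getCollectionChain`` and
    ``setCollectionChain`` as listed in the Methods Summary. Returns the
    new ordered list of children.
    """
    # Copy into a list so the returned Sequence is never mutated in place.
    children = list(registry.getCollectionChain(parent))
    if child in children:
        # Avoid listing the same child twice; it moves to the front instead.
        children.remove(child)
    new_children = [child] + children
    registry.setCollectionChain(parent, new_children)
    return new_children
```

Since children are searched in order, placing a collection first gives it priority over the existing children in this sketch.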

abstract getCollectionDocumentation(collection: str) str | None

Retrieve the documentation string for a collection.

Parameters:
collectionstr

Name of the collection.

Returns:
docsstr or None

Docstring for the collection with the given name.

abstract getCollectionParentChains(collection: str) set[str]

Return the CHAINED collections that directly contain the given one.

Parameters:
collectionstr

Name of the collection.

Returns:
chainsset of str

Set of CHAINED collection names.

abstract getCollectionSummary(collection: str) CollectionSummary

Return a summary for the given collection.

Parameters:
collectionstr

Name of the collection for which a summary is to be retrieved.

Returns:
summaryCollectionSummary

Summary of the dataset types and governor dimension values in this collection.

abstract getCollectionType(name: str) CollectionType

Return an enumeration value indicating the type of the given collection.

Parameters:
namestr

The name of the collection.

Returns:
typeCollectionType

Enum value indicating the type of this collection.

Raises:
lsst.daf.butler.registry.MissingCollectionError

Raised if no collection with the given name exists.

abstract getDataset(id: UUID) DatasetRef | None

Retrieve a Dataset entry.

Parameters:
idDatasetId

The unique identifier for the dataset.

Returns:
refDatasetRef or None

A ref to the Dataset, or None if no matching Dataset was found.

abstract getDatasetLocations(ref: DatasetRef) Iterable[str]

Retrieve datastore locations for a given dataset.

Parameters:
refDatasetRef

A reference to the dataset for which to retrieve storage information.

Returns:
datastoresIterable [ str ]

All the matching datastores holding this dataset.

Raises:
lsst.daf.butler.AmbiguousDatasetError

Raised if ref.id is None.

abstract getDatasetType(name: str) DatasetType

Get the DatasetType.

Parameters:
namestr

Name of the type.

Returns:
typeDatasetType

The DatasetType associated with the given name.

Raises:
lsst.daf.butler.registry.MissingDatasetTypeError

Raised if the requested dataset type has not been registered.

Notes

This method handles component dataset types automatically, though most other registry operations do not.

abstract insertDatasets(datasetType: DatasetType | str, dataIds: Iterable[DataCoordinate | Mapping[str, Any]], run: str | None = None, expand: bool = True, idGenerationMode: DatasetIdGenEnum = DatasetIdGenEnum.UNIQUE) list[lsst.daf.butler.core.datasets.ref.DatasetRef]

Insert one or more datasets into the Registry.

This always adds new datasets; to associate existing datasets with a new collection, use associate.

Parameters:
datasetTypeDatasetType or str

A DatasetType or the name of one.

dataIdsIterable of dict or DataCoordinate

Dimension-based identifiers for the new datasets.

runstr, optional

The name of the run that produced the datasets. Defaults to self.defaults.run.

expandbool, optional

If True (default), expand data IDs as they are inserted. This is necessary in general to allow datastore to generate file templates, but it may be disabled if the caller can guarantee this is unnecessary.

idGenerationModeDatasetIdGenEnum, optional

Specifies option for generating dataset IDs. By default unique IDs are generated for each inserted dataset.

Returns:
refslist of DatasetRef

Resolved DatasetRef instances for all given data IDs (in the same order).

Raises:
lsst.daf.butler.registry.DatasetTypeError

Raised if datasetType is not known to registry.

lsst.daf.butler.registry.CollectionTypeError

Raised if run collection type is not RUN.

lsst.daf.butler.registry.NoDefaultCollectionError

Raised if run is None and self.defaults.run is None.

lsst.daf.butler.registry.ConflictingDefinitionError

Raised if a dataset with the same dataset type and data ID as one of those given already exists in run.

lsst.daf.butler.registry.MissingCollectionError

Raised if run does not exist in the registry.
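Since insertDatasets raises MissingCollectionError when the run does not exist, a typical sequence is to register the run first. The helper below sketches that sequence; insert_into_run is a hypothetical name, and registry is assumed to be any object exposing isWriteable, registerRun, and insertDatasets as documented on this page.

```python
def insert_into_run(registry, run, dataset_type, data_ids, doc=None):
    """Register ``run`` if needed, then insert new datasets into it.

    ``registry`` is assumed to provide ``isWriteable``, ``registerRun`` and
    ``insertDatasets`` as documented on this page. Returns the resolved
    dataset references, in the same order as ``data_ids``.
    """
    if not registry.isWriteable():
        raise RuntimeError("registry does not allow write operations")
    # registerRun only adds the run if one with this name does not exist.
    registry.registerRun(run, doc=doc)
    # Passing run explicitly avoids relying on registry.defaults.run.
    return registry.insertDatasets(dataset_type, data_ids, run=run)
```

The default expand=True and idGenerationMode are left untouched in this sketch.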

abstract insertDimensionData(element: DimensionElement | str, *data: Mapping[str, Any] | DimensionRecord, conform: bool = True, replace: bool = False, skip_existing: bool = False) None

Insert one or more dimension records into the database.

Parameters:
elementDimensionElement or str

The DimensionElement or name thereof that identifies the table records will be inserted into.

*datadict or DimensionRecord

One or more records to insert.

conformbool, optional

If False (True is default), perform no checking or conversions, and assume that element is a DimensionElement instance and data is one or more DimensionRecord instances of the appropriate subclass.

replacebool, optional

If True (False is default), replace existing records in the database if there is a conflict.

skip_existingbool, optional

If True (False is default), skip insertion if a record with the same primary key values already exists. Unlike syncDimensionData, this will not detect when the given record differs from what is in the database, and should not be used when this is a concern.

abstract isWriteable() bool

Return True if this registry allows write operations, and False otherwise.

abstract queryCollections(expression: Any = Ellipsis, datasetType: DatasetType | None = None, collectionTypes: Iterable[CollectionType] | CollectionType = frozenset({CollectionType.RUN, CollectionType.TAGGED, CollectionType.CHAINED, CollectionType.CALIBRATION}), flattenChains: bool = False, includeChains: bool | None = None) Sequence[str]

Iterate over the collections whose names match an expression.

Parameters:
expressioncollection expression, optional

An expression that identifies the collections to return, such as a str (for full matches or partial matches via globs), re.Pattern (for partial matches), or iterable thereof. ... can be used to return all collections, and is the default. See Collection expressions for more information.

datasetTypeDatasetType, optional

If provided, only yield collections that may contain datasets of this type. This is a conservative approximation in general; it may yield collections that do not have any such datasets.

collectionTypesSet [CollectionType] or CollectionType, optional

If provided, only yield collections of these types.

flattenChainsbool, optional

If True (False is default), recursively yield the child collections of matching CHAINED collections.

includeChainsbool, optional

If True, yield records for matching CHAINED collections. Default is the opposite of flattenChains: include either CHAINED collections or their children, but not both.

Returns:
collectionsSequence [ str ]

The names of collections that match expression.

Raises:
lsst.daf.butler.registry.CollectionExpressionError

Raised when expression is invalid.

Notes

The order in which collections are returned is unspecified, except that the children of a CHAINED collection are guaranteed to be in the order in which they are searched. When multiple parent CHAINED collections match the same criteria, the order in which the two lists appear is unspecified, and the lists of children may be incomplete if a child has multiple parents.
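The interaction between flattenChains and includeChains can be illustrated with a toy model. This is a sketch only, not the registry implementation: chains are a plain dict from each CHAINED collection name to its ordered children, any name absent from that dict is treated as a non-chained collection, chains are assumed to be acyclic, and filtering by collectionTypes or datasetType is not modeled.

```python
def select_collections(chains, matched, flatten_chains=False, include_chains=None):
    """Toy model of how chained collections are reported.

    ``chains`` maps each CHAINED collection name to its ordered children;
    ``matched`` lists the names that matched the expression. Both are
    hypothetical inputs for this illustration.
    """
    if include_chains is None:
        # Default is the opposite of flatten_chains: report either the
        # chains themselves or their children, but not both.
        include_chains = not flatten_chains
    result = []

    def add(name):
        if name not in result:  # report each collection at most once
            result.append(name)

    def visit(name):
        if name in chains:
            if include_chains:
                add(name)
            if flatten_chains:
                for child in chains[name]:  # children keep search order
                    visit(child)
        else:
            add(name)

    for name in matched:
        visit(name)
    return result
```

With chains = {"defaults": ["run2", "calib-chain"], "calib-chain": ["calib1"]}, matching "defaults" yields ["defaults"] by default and ["run2", "calib1"] with flatten_chains=True.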

abstract queryDataIds(dimensions: Iterable[Dimension | str] | Dimension | str, *, dataId: DataCoordinate | Mapping[str, Any] | None = None, datasets: Any = None, collections: str | Pattern | Iterable[str | Pattern] | ellipsis | CollectionWildcard | None = None, where: str = '', components: bool | None = False, bind: Mapping[str, Any] | None = None, check: bool = True, **kwargs: Any) DataCoordinateQueryResults

Query for data IDs matching user-provided criteria.

Parameters:
dimensionsDimension or str, or iterable thereof

The dimensions of the data IDs to yield, as either Dimension instances or str. Will be automatically expanded to a complete DimensionGraph.

dataIddict or DataCoordinate, optional

A data ID whose key-value pairs are used as equality constraints in the query.

datasetsdataset type expression, optional

An expression that fully or partially identifies dataset types that should constrain the yielded data IDs. For example, including “raw” here would constrain the yielded instrument, exposure, detector, and physical_filter values to only those for which at least one “raw” dataset exists in collections. Allowed types include DatasetType, str, and iterables thereof. Regular expression objects (i.e. re.Pattern) are deprecated and will be removed after the v26 release. See DatasetType expressions for more information.

collectionscollection expression, optional

An expression that identifies the collections to search for datasets, such as a str (for full matches or partial matches via globs), re.Pattern (for partial matches), or iterable thereof. ... can be used to search all collections (actually just all RUN collections, because this will still find all datasets). If not provided, self.defaults.collections is used. Ignored unless datasets is also passed. See Collection expressions for more information.

wherestr, optional

A string expression similar to a SQL WHERE clause. May involve any column of a dimension table or (as a shortcut for the primary key column of a dimension table) dimension name. See Dimension expressions for more information.

componentsbool, optional

If True, apply all dataset expression patterns to component dataset type names as well. If False, never apply patterns to components. If None, apply patterns to components only if their parent datasets were not matched by the expression. Fully-specified component datasets (str or DatasetType instances) are always included.

Values other than False are deprecated, and only False will be supported after v26. After v27 this argument will be removed entirely.

bindMapping, optional

Mapping containing literal values that should be injected into the where expression, keyed by the identifiers they replace. Values of collection type can be expanded in some cases; see Identifiers for more information.

checkbool, optional

If True (default) check the query for consistency before executing it. This may reject some valid queries that resemble common mistakes (e.g. queries for visits without specifying an instrument).

**kwargs

Additional keyword arguments are forwarded to DataCoordinate.standardize when processing the dataId argument (and may be used to provide a constraining data ID even when the dataId argument is None).

Returns:
dataIdsqueries.DataCoordinateQueryResults

Data IDs matching the given query parameters. These are guaranteed to identify all dimensions (DataCoordinate.hasFull returns True), but will not contain DimensionRecord objects (DataCoordinate.hasRecords returns False). Call expanded on the returned object to fetch those (and consider using materialize on the returned object first if the expected number of rows is very large). See documentation for those methods for additional information.

Raises:
lsst.daf.butler.registry.NoDefaultCollectionError

Raised if collections is None and self.defaults.collections is None.

lsst.daf.butler.registry.CollectionExpressionError

Raised when collections expression is invalid.

lsst.daf.butler.registry.DataIdError

Raised when dataId or keyword arguments specify unknown dimensions or values, or when they contain inconsistent values.

lsst.daf.butler.registry.DatasetTypeExpressionError

Raised when the datasets expression is invalid.

lsst.daf.butler.registry.UserExpressionError

Raised when where expression is invalid.
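The parameters above combine as in the following sketch, which constrains data IDs by the existence of "raw" datasets, by a keyword data ID value, and by a where expression with a bound literal. The function name raw_data_ids, the identifier detector_id, and the exact where string are illustrative assumptions; registry is assumed to be any object exposing queryDataIds as documented above, and the expression grammar itself is described under Dimension expressions.

```python
def raw_data_ids(registry, collections, instrument, detector, with_records=False):
    """Query (exposure, detector) data IDs that have a "raw" dataset.

    ``registry`` is assumed to provide ``queryDataIds`` as documented above.
    The ``where`` string and the ``detector_id`` identifier are illustrative.
    """
    results = registry.queryDataIds(
        ["exposure", "detector"],
        datasets="raw",                    # only data IDs with a "raw" dataset
        collections=collections,           # only used because datasets is given
        where="detector = detector_id",    # identifier replaced through bind
        bind={"detector_id": detector},
        instrument=instrument,             # keyword forwarded as a data ID value
    )
    if with_records:
        # expanded() is the documented way to attach DimensionRecord objects.
        return results.expanded()
    return results
```

Using bind rather than formatting the value into the where string keeps the literal out of the expression text.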

abstract queryDatasetAssociations(datasetType: str | DatasetType, collections: str | Pattern | Iterable[str | Pattern] | ellipsis | CollectionWildcard | None = Ellipsis, *, collectionTypes: Iterable[CollectionType] = frozenset({CollectionType.RUN, CollectionType.TAGGED, CollectionType.CHAINED, CollectionType.CALIBRATION}), flattenChains: bool = False) Iterator[DatasetAssociation]

Iterate over dataset-collection combinations where the dataset is in the collection.

This method is a temporary placeholder for better support for association results in queryDatasets. It will probably be removed in the future, and should be avoided in production code whenever possible.

Parameters:
datasetTypeDatasetType or str

A dataset type object or the name of one.

collectionscollection expression, optional

An expression that identifies the collections to search for datasets, such as a str (for full matches or partial matches via globs), re.Pattern (for partial matches), or iterable thereof. ... can be used to search all collections (actually just all RUN collections, because this will still find all datasets). If not provided, self.defaults.collections is used. See Collection expressions for more information.

collectionTypesSet [ CollectionType ], optional

If provided, only yield associations from collections of these types.

flattenChainsbool, optional

If True, search in the children of CHAINED collections. If False, CHAINED collections are ignored.

Yields:
associationDatasetAssociation

Object representing the relationship between a single dataset and a single collection.

Raises:
lsst.daf.butler.registry.NoDefaultCollectionError

Raised if collections is None and self.defaults.collections is None.

lsst.daf.butler.registry.CollectionExpressionError

Raised when collections expression is invalid.

abstract queryDatasetTypes(expression: Any = Ellipsis, *, components: bool | None = False, missing: list[str] | None = None) Iterable[DatasetType]

Iterate over the dataset types whose names match an expression.

Parameters:
expressiondataset type expression, optional

An expression that fully or partially identifies the dataset types to return, such as a str, re.Pattern, or iterable thereof. ... can be used to return all dataset types, and is the default. See DatasetType expressions for more information.

componentsbool, optional

If True, apply all expression patterns to component dataset type names as well. If False, never apply patterns to components. If None, apply patterns to components only if their parent datasets were not matched by the expression. Fully-specified component datasets (str or DatasetType instances) are always included.

Values other than False are deprecated, and only False will be supported after v26. After v27 this argument will be removed entirely.

missinglist of str, optional

String dataset type names that were explicitly given (i.e. not regular expression patterns) but not found will be appended to this list, if it is provided.

Returns:
dataset_typesIterable [ DatasetType]

An Iterable of DatasetType instances whose names match expression.

Raises:
lsst.daf.butler.registry.DatasetTypeExpressionError

Raised when expression is invalid.

abstract queryDatasets(datasetType: Any, *, collections: str | Pattern | Iterable[str | Pattern] | ellipsis | CollectionWildcard | None = None, dimensions: Iterable[Dimension | str] | None = None, dataId: DataCoordinate | Mapping[str, Any] | None = None, where: str = '', findFirst: bool = False, components: bool | None = False, bind: Mapping[str, Any] | None = None, check: bool = True, **kwargs: Any) DatasetQueryResults

Query for and iterate over dataset references matching user-provided criteria.

Parameters:
datasetTypedataset type expression

An expression that fully or partially identifies the dataset types to be queried. Allowed types include DatasetType, str, re.Pattern, and iterables thereof. The special value ... can be used to query all dataset types. See DatasetType expressions for more information.

collectionscollection expression, optional

An expression that identifies the collections to search, such as a str (for full matches or partial matches via globs), re.Pattern (for partial matches), or iterable thereof. ... can be used to search all collections (actually just all RUN collections, because this will still find all datasets). If not provided, self.defaults.collections is used. See Collection expressions for more information.

dimensionsIterable of Dimension or str

Dimensions to include in the query (in addition to those used to identify the queried dataset type(s)), either to constrain the resulting datasets to those for which a matching dimension exists, or to relate the dataset type’s dimensions to dimensions referenced by the dataId or where arguments.

dataIddict or DataCoordinate, optional

A data ID whose key-value pairs are used as equality constraints in the query.

wherestr, optional

A string expression similar to a SQL WHERE clause. May involve any column of a dimension table or (as a shortcut for the primary key column of a dimension table) dimension name. See Dimension expressions for more information.

findFirstbool, optional

If True (False is default), for each result data ID, only yield one DatasetRef of each DatasetType, from the first collection in which a dataset of that dataset type appears (according to the order of collections passed in). If True, collections must not contain regular expressions and must not be the ... wildcard.

componentsbool, optional

If True, apply all dataset expression patterns to component dataset type names as well. If False, never apply patterns to components. If None, apply patterns to components only if their parent datasets were not matched by the expression. Fully-specified component datasets (str or DatasetType instances) are always included.

Values other than False are deprecated, and only False will be supported after v26. After v27 this argument will be removed entirely.

bindMapping, optional

Mapping containing literal values that should be injected into the where expression, keyed by the identifiers they replace. Values of collection type can be expanded in some cases; see Identifiers for more information.

checkbool, optional

If True (default) check the query for consistency before executing it. This may reject some valid queries that resemble common mistakes (e.g. queries for visits without specifying an instrument).

**kwargs

Additional keyword arguments are forwarded to DataCoordinate.standardize when processing the dataId argument (and may be used to provide a constraining data ID even when the dataId argument is None).

Returns:
refsqueries.DatasetQueryResults

Dataset references matching the given query criteria. Nested data IDs are guaranteed to include values for all implied dimensions (i.e. DataCoordinate.hasFull will return True), but will not include dimension records (DataCoordinate.hasRecords will be False) unless expanded is called on the result object (which returns a new one).

Raises:
lsst.daf.butler.registry.DatasetTypeExpressionError

Raised when datasetType expression is invalid.

TypeError

Raised when the arguments are incompatible, such as when a collection wildcard is passed when findFirst is True, or when collections is None and self.defaults.collections is also None.

lsst.daf.butler.registry.DataIdError

Raised when dataId or keyword arguments specify unknown dimensions or values, or when they contain inconsistent values.

lsst.daf.butler.registry.UserExpressionError

Raised when where expression is invalid.

Notes

When multiple dataset types are queried in a single call, the results of this operation are equivalent to querying for each dataset type separately in turn, and no information about the relationships between datasets of different types is included. In contexts where that kind of information is important, the recommended pattern is to use queryDataIds to first obtain data IDs (possibly with the desired dataset types and collections passed as constraints to the query), and then use multiple (generally much simpler) calls to queryDatasets with the returned data IDs passed as constraints.
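The two-step pattern described in the note above might look roughly like the following. This is a minimal sketch, not part of the Registry API: the helper name datasets_by_data_id and its argument names are hypothetical, and it assumes collections is an ordered list of names without wildcards, because findFirst=True is used.

```python
def datasets_by_data_id(registry, dataset_types, dimensions, collections, where=""):
    """Pair each data ID with the datasets of every requested type.

    Hypothetical helper: ``registry`` is any object exposing the
    ``queryDataIds`` and ``queryDatasets`` methods documented here.
    """
    # Step 1: obtain data IDs, constrained by the dataset types and
    # collections of interest.
    data_ids = registry.queryDataIds(
        dimensions,
        datasets=dataset_types,
        collections=collections,
        where=where,
    )
    results = []
    for data_id in data_ids:
        per_type = {}
        for dataset_type in dataset_types:
            # Step 2: one simple query per dataset type, with the data ID
            # passed as an equality constraint.
            refs = registry.queryDatasets(
                dataset_type,
                collections=collections,
                dataId=data_id,
                findFirst=True,
            )
            per_type[dataset_type] = list(refs)
        results.append((data_id, per_type))
    return results
```

Because every entry is keyed by the data ID that produced it, the relationship between datasets of different types is kept, which a single multi-type queryDatasets call does not provide.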

abstract queryDimensionRecords(element: DimensionElement | str, *, dataId: DataCoordinate | Mapping[str, Any] | None = None, datasets: Any = None, collections: str | Pattern | Iterable[str | Pattern] | ellipsis | CollectionWildcard | None = None, where: str = '', components: bool | None = False, bind: Mapping[str, Any] | None = None, check: bool = True, **kwargs: Any) DimensionRecordQueryResults

Query for dimension information matching user-provided criteria.

Parameters:
elementDimensionElement or str

The dimension element to obtain records for.

dataIddict or DataCoordinate, optional

A data ID whose key-value pairs are used as equality constraints in the query.

datasetsdataset type expression, optional

An expression that fully or partially identifies dataset types that should constrain the yielded records. See queryDataIds and DatasetType expressions for more information.

collectionscollection expression, optional

An expression that identifies the collections to search for datasets, such as a str (for full matches or partial matches via globs), re.Pattern (for partial matches), or iterable thereof. ... can be used to search all collections (actually just all RUN collections, because this will still find all datasets). If not provided, self.defaults.collections is used. Ignored unless datasets is also passed. See Collection expressions for more information.

wherestr, optional

A string expression similar to a SQL WHERE clause. See queryDataIds and Dimension expressions for more information.

componentsbool, optional

Whether to apply dataset expressions to components as well. See queryDataIds for more information.

Values other than False are deprecated, and only False will be supported after v26. After v27 this argument will be removed entirely.

bindMapping, optional

Mapping containing literal values that should be injected into the where expression, keyed by the identifiers they replace. Values of collection type can be expanded in some cases; see Identifiers for more information.

checkbool, optional

If True (default) check the query for consistency before executing it. This may reject some valid queries that resemble common mistakes (e.g. queries for visits without specifying an instrument).

**kwargs

Additional keyword arguments are forwarded to DataCoordinate.standardize when processing the dataId argument (and may be used to provide a constraining data ID even when the dataId argument is None).

Returns:
dataIdsqueries.DimensionRecordQueryResults

Dimension records matching the given query parameters.

Raises:
lsst.daf.butler.registry.NoDefaultCollectionError

Raised if collections is None and self.defaults.collections is None.

lsst.daf.butler.registry.CollectionExpressionError

Raised when collections expression is invalid.

lsst.daf.butler.registry.DataIdError

Raised when dataId or keyword arguments specify unknown dimensions or values, or when they contain inconsistent values.

lsst.daf.butler.registry.DatasetTypeExpressionError

Raised when the datasets expression is invalid.

lsst.daf.butler.registry.UserExpressionError

Raised when where expression is invalid.
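As an illustration of combining where with bind, the sketch below fetches the records of one dimension element for a single instrument. The helper name records_for_instrument and the bind identifier inst are hypothetical, and the example assumes a dimension universe in which instrument is a dimension (as in the default universe).

```python
def records_for_instrument(registry, element, instrument, extra_where=""):
    """Return a list of dimension records for one instrument.

    Hypothetical helper: ``registry`` is any object exposing the
    ``queryDimensionRecords`` method documented here.
    """
    # "inst" is an identifier in the expression; its literal value is
    # injected through ``bind`` rather than formatted into the string.
    where = "instrument = inst"
    if extra_where:
        where = f"{where} AND ({extra_where})"
    records = registry.queryDimensionRecords(
        element,
        where=where,
        bind={"inst": instrument},
    )
    return list(records)
```

Passing the value through bind avoids quoting problems that can arise when user-supplied strings are pasted directly into the where expression.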

abstract refresh() None

Refresh all in-memory state by querying the database.

This may be necessary to enable querying for entities added by other registry instances after this one was constructed.

abstract registerCollection(name: str, type: CollectionType = CollectionType.TAGGED, doc: str | None = None) bool

Add a new collection if one with the given name does not exist.

Parameters:
namestr

The name of the collection to create.

typeCollectionType

Enum value indicating the type of collection to create.

docstr, optional

Documentation string for the collection.

Returns:
registeredbool

Boolean indicating whether the collection was created by this call (True) or was already registered (False).

Notes

This method cannot be called within transactions, as it needs to be able to perform its own transaction to be concurrent.

abstract registerDatasetType(datasetType: DatasetType) bool

Add a new DatasetType to the Registry.

It is not an error to register the same DatasetType twice.

Parameters:
datasetTypeDatasetType

The DatasetType to be added.

Returns:
insertedbool

True if datasetType was inserted, False if an identical existing DatasetType was found. Note that in either case the DatasetType is guaranteed to be defined in the Registry consistently with the given definition.

Raises:
ValueError

Raised if the dimensions or storage class are invalid.

lsst.daf.butler.registry.ConflictingDefinitionError

Raised if this DatasetType is already registered with a different definition.

Notes

This method cannot be called within transactions, as it needs to be able to perform its own transaction to be concurrent.

abstract registerRun(name: str, doc: str | None = None) bool

Add a new run if one with the given name does not exist.

Parameters:
namestr

The name of the run to create.

docstr, optional

Documentation string for the collection.

Returns:
registeredbool

Boolean indicating whether a new run was registered. False if it already existed.

Notes

This method cannot be called within transactions, as it needs to be able to perform its own transaction to be concurrent.

abstract removeCollection(name: str) None

Remove the given collection from the registry.

Parameters:
namestr

The name of the collection to remove.

Raises:
lsst.daf.butler.registry.MissingCollectionError

Raised if no collection with the given name exists.

sqlalchemy.exc.IntegrityError

Raised if the database rows associated with the collection are still referenced by some other table, such as a dataset in a datastore (for RUN collections only) or a CHAINED collection of which this collection is a child.

Notes

If this is a RUN collection, all datasets and quanta in it will be removed from the Registry database. This requires that those datasets be removed (or at least trashed) from any datastores that hold them first.

A collection may not be deleted as long as it is referenced by a CHAINED collection; the CHAINED collection must be deleted or redefined first.
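One way to satisfy the CHAINED-collection requirement is to redefine every parent chain without the collection before removing it. The following is a minimal sketch under that assumption; the helper name remove_collection_unchained is hypothetical, and it does not handle the datastore cleanup that RUN collections need.

```python
def remove_collection_unchained(registry, name):
    """Drop ``name`` from every parent chain, then remove it.

    Hypothetical helper built only from Registry methods documented on
    this page. Returns the sorted names of the chains that were redefined.
    """
    parents = sorted(registry.getCollectionParentChains(name))
    for parent in parents:
        # Keep the remaining children in their original search order.
        remaining = [
            child for child in registry.getCollectionChain(parent) if child != name
        ]
        registry.setCollectionChain(parent, remaining)
    # No CHAINED collection references ``name`` any more.
    registry.removeCollection(name)
    return parents
```

Whether silently editing other people's chains is acceptable depends on the repository; in shared repositories it may be preferable to report the parent chains and stop.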

abstract removeDatasetType(name: str | tuple[str, ...]) None

Remove the named DatasetType from the registry.

Warning

Registry implementations can cache the dataset type definitions. This means that deleting a dataset type definition may result in unexpected behavior in other active butler processes that have not yet seen the deletion.

Parameters:
namestr or tuple [str]

Name of the type to be removed or tuple containing a list of type names to be removed. Wildcards are allowed.

Raises:
lsst.daf.butler.registry.OrphanedRecordError

Raised if an attempt is made to remove the dataset type definition when there are already datasets associated with it.

Notes

If the dataset type is not registered the method will return without action.

abstract removeDatasets(refs: Iterable[DatasetRef]) None

Remove datasets from the Registry.

The datasets will be removed unconditionally from all collections, and any Quantum that consumed one of these datasets will instead be marked as having a NULL input. Datastore records will not be deleted; the caller is responsible for ensuring that each dataset has already been removed from all Datastores.

Parameters:
refsIterable [DatasetRef]

References to the datasets to be removed. Must include a valid id attribute, and should be considered invalidated upon return.

Raises:
lsst.daf.butler.AmbiguousDatasetError

Raised if any ref.id is None.

lsst.daf.butler.registry.OrphanedRecordError

Raised if any dataset is still present in any Datastore.

resetConnectionPool() None

Reset connection pool for registry if relevant.

This operation can be used to reset connections to servers when using registry with fork-based multiprocessing. This method should usually be called by the child process immediately after the fork.

The base class implementation is a no-op.
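With the standard library's multiprocessing module, one place to make this call is a pool initializer, which runs once in each child right after it starts. The sketch below assumes a platform where the "fork" start method is available; init_worker and make_fork_pool are hypothetical names.

```python
import multiprocessing


def init_worker(registry):
    """Run once in each child process, right after it is forked."""
    # The child inherited the parent's registry object; reset its
    # connections before doing any work with it.
    registry.resetConnectionPool()


def make_fork_pool(registry, processes):
    """Create a fork-based pool whose workers reset the registry first."""
    ctx = multiprocessing.get_context("fork")
    return ctx.Pool(processes, initializer=init_worker, initargs=(registry,))
```

The returned pool is used like any other multiprocessing pool, for example as a context manager around map calls.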

abstract setCollectionChain(parent: str, children: Any, *, flatten: bool = False) None

Define or redefine a CHAINED collection.

Parameters:
parentstr

Name of the chained collection. Must have already been added via a call to Registry.registerCollection.

childrencollection expression

An expression defining an ordered search of child collections, generally an iterable of str; see Collection expressions for more information.

flattenbool, optional

If True (False is default), recursively flatten out any nested CHAINED collections in children first.

Raises:
lsst.daf.butler.registry.MissingCollectionError

Raised when any of the given collections do not exist in the Registry.

lsst.daf.butler.registry.CollectionTypeError

Raised if parent does not correspond to a CHAINED collection.

ValueError

Raised if the given collections contain a cycle.
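To make the flatten option and the cycle error more concrete, here is a purely conceptual model that works on a plain dict instead of a Registry. It is a sketch of the idea only, not the library's implementation: flatten_children and the chains mapping are hypothetical, and duplicates are deliberately left in place.

```python
def flatten_children(children, chains):
    """Expand nested chains into a flat, ordered list of collection names.

    ``chains`` maps each CHAINED collection name to its ordered children;
    any name not in ``chains`` is treated as a non-chained collection.
    """
    flat = []

    def visit(name, path):
        if name in path:
            # The chain refers back to one of its own ancestors.
            raise ValueError(f"cycle detected at {name!r}")
        if name in chains:
            for child in chains[name]:
                visit(child, path | {name})
        else:
            flat.append(name)

    for child in children:
        visit(child, frozenset())
    return flat
```

For example, with chains = {"a": ["r1", "b"], "b": ["r2", "r3"]}, flattening ["a", "r4"] gives ["r1", "r2", "r3", "r4"], which preserves the depth-first search order of the nested chains.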

abstract setCollectionDocumentation(collection: str, doc: str | None) None

Set the documentation string for a collection.

Parameters:
collectionstr

Name of the collection.

docstr or None

Docstring for the collection with the given name; will replace any existing docstring. Passing None will remove any existing docstring.

abstract supportsIdGenerationMode(mode: DatasetIdGenEnum) bool

Test whether the given dataset ID generation mode is supported by insertDatasets.

Parameters:
modeDatasetIdGenEnum

Enum value for the mode to test.

Returns:
supportedbool

Whether the given mode is supported.

abstract syncDimensionData(element: DimensionElement | str, row: Mapping[str, Any] | DimensionRecord, conform: bool = True, update: bool = False) bool | dict[str, Any]

Synchronize the given dimension record with the database, inserting if it does not already exist and comparing values if it does.

Parameters:
elementDimensionElement or str

The DimensionElement or name thereof that identifies the table records will be inserted into.

rowdict or DimensionRecord

The record to insert.

conformbool, optional

If False (True is default), perform no checking or conversions, and assume that element is a DimensionElement instance and row is a DimensionRecord instance of the appropriate subclass.

updatebool, optional

If True (False is default), update the existing record in the database if there is a conflict.

Returns:
inserted_or_updatedbool or dict

True if a new row was inserted, False if no changes were needed, or a dict mapping updated column names to their old values if an update was performed (only possible if update=True).

Raises:
lsst.daf.butler.registry.ConflictingDefinitionError

Raised if the record exists in the database (according to primary key lookup) but is inconsistent with the given one.
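Because the return value can be a bool or a dict, callers generally need to test it with identity checks rather than truthiness (an empty dict and False are both falsy). A minimal sketch of such handling, using a hypothetical helper name describe_sync_result:

```python
def describe_sync_result(result):
    """Turn the return value of ``syncDimensionData`` into a short message."""
    if result is True:
        return "inserted"
    if result is False:
        return "unchanged"
    if isinstance(result, dict):
        # Keys are the updated column names; values are the old values.
        return "updated: " + ", ".join(sorted(result))
    raise TypeError(f"unexpected result type: {type(result).__name__}")
```

A caller could log describe_sync_result(registry.syncDimensionData(element, row, update=True)) to record what each synchronization did.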

abstract transaction(*, savepoint: bool = False) Iterator[None]

Return a context manager that represents a transaction.