Registry
-
class lsst.daf.butler.Registry
Bases: abc.ABC
Abstract Registry interface.
Each registry implementation can have its own constructor parameters. The assumption is that an instance of a specific subclass will be constructed from configuration using Registry.fromConfig(). The base class will look for a cls entry and call that specific fromConfig() method.
All subclasses should store RegistryDefaults in a _defaults property. No other properties are assumed shared between implementations.
Attributes Summary
defaultConfigFile: Path to configuration defaults.
defaults: Default collection search path and/or output RUN collection (RegistryDefaults).
dimensions: Definitions of all dimensions recognized by this Registry (DimensionUniverse).
Methods Summary
associate(collection, refs): Add existing datasets to a TAGGED collection.
certify(collection, refs, timespan): Associate one or more datasets with a calibration collection and a validity range within it.
copy(defaults): Create a new Registry backed by the same data repository and connection as this one, but independent defaults.
createFromConfig(config, str, None] = None, …): Create registry database and return Registry instance.
decertify(collection, datasetType, …): Remove or adjust datasets to clear a validity range within a calibration collection.
determineTrampoline(config, RegistryConfig, …): Return class to use to instantiate real registry.
disassociate(collection, refs): Remove existing datasets from a TAGGED collection.
expandDataId(dataId, Mapping[str, Any], …): Expand a dimension-based data ID to include additional information.
findDataset(datasetType, str], dataId, …): Find a dataset given its DatasetType and data ID.
forceRegistryConfig(config, RegistryConfig, …): Force the supplied config to a RegistryConfig.
fromConfig(config, RegistryConfig, Config, …): Create Registry subclass instance from config.
getCollectionChain(parent): Return the child collections in a CHAINED collection.
getCollectionDocumentation(collection): Retrieve the documentation string for a collection.
getCollectionSummary(collection): Return a summary for the given collection.
getCollectionType(name): Return an enumeration value indicating the type of the given collection.
getDataset(id, uuid.UUID]): Retrieve a Dataset entry.
getDatasetLocations(ref): Retrieve datastore locations for a given dataset.
getDatasetType(name): Get the DatasetType.
getDatastoreBridgeManager(): Return an object that allows a new Datastore instance to communicate with this Registry.
insertDatasets(datasetType, str], dataIds, …): Insert one or more datasets into the Registry.
insertDimensionData(element, str], *data, …): Insert one or more dimension records into the database.
isWriteable(): Return True if this registry allows write operations, and False otherwise.
queryCollections(expression, datasetType, …): Iterate over the collections whose names match an expression.
queryDataIds(dimensions, str]], …): Query for data IDs matching user-provided criteria.
queryDatasetAssociations(datasetType, …): Iterate over dataset-collection combinations where the dataset is in the collection.
queryDatasetTypes(expression, *, components, …): Iterate over the dataset types whose names match an expression.
queryDatasets(datasetType, *, collections, …): Query for and iterate over dataset references matching user-provided criteria.
queryDimensionRecords(element, str], *, …): Query for dimension information matching user-provided criteria.
refresh(): Refresh all in-memory state by querying the database.
registerCollection(name, type, doc): Add a new collection if one with the given name does not exist.
registerDatasetType(datasetType): Add a new DatasetType to the Registry.
registerRun(name, doc): Add a new run if one with the given name does not exist.
removeCollection(name): Remove the given collection from the registry.
removeDatasetType(name): Remove the named DatasetType from the registry.
removeDatasets(refs): Remove datasets from the Registry.
resetConnectionPool(): Reset connection pool for registry if relevant.
setCollectionChain(parent, children, *, flatten): Define or redefine a CHAINED collection.
setCollectionDocumentation(collection, doc): Set the documentation string for a collection.
supportsIdGenerationMode(mode): Test whether the given dataset ID generation mode is supported by insertDatasets.
syncDimensionData(element, str], row, Any], …): Synchronize the given dimension record with the database, inserting if it does not already exist and comparing values if it does.
transaction(*, savepoint): Return a context manager that represents a transaction.
Attributes Documentation
-
defaultConfigFile = None
Path to configuration defaults. Accessed within the configs resource or relative to a search path. Can be None if no defaults are specified.
-
defaults
Default collection search path and/or output RUN collection (RegistryDefaults).
This is an immutable struct whose components may not be set individually, but the entire struct can be set by assigning to this property.
-
dimensions
Definitions of all dimensions recognized by this Registry (DimensionUniverse).
Methods Documentation
-
associate(collection: str, refs: Iterable[lsst.daf.butler.core.datasets.ref.DatasetRef]) → None
Add existing datasets to a TAGGED collection.
If a DatasetRef with the same exact ID is already in a collection nothing is changed. If a DatasetRef with the same DatasetType and data ID but with different ID exists in the collection, ConflictingDefinitionError is raised.
Parameters:
- collection : str
  Indicates the collection the datasets should be associated with.
- refs : Iterable[DatasetRef]
  An iterable of resolved DatasetRef instances that already exist in this Registry.
Raises:
- ConflictingDefinitionError
  If a Dataset with the given DatasetRef already exists in the given collection.
- AmbiguousDatasetError
  Raised if any(ref.id is None for ref in refs).
- MissingCollectionError
  Raised if collection does not exist in the registry.
- TypeError
  Raised if adding new datasets to the given collection is not allowed.
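The two rules stated for associate (same ID is a no-op, same dataset type and data ID with a different ID is a conflict) can be illustrated with a toy model. This is only a sketch: ToyRef, toy_associate, and the dictionary standing in for a TAGGED collection are hypothetical names, not part of lsst.daf.butler, and the exception class is a local stand-in for the real one.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


class ConflictingDefinitionError(Exception):
    """Local stand-in for the exception named in the associate documentation."""


@dataclass(frozen=True)
class ToyRef:
    """Hypothetical, heavily simplified stand-in for a resolved DatasetRef."""

    id: int
    datasetType: str
    dataId: Tuple[Tuple[str, object], ...]


def toy_associate(tagged: Dict[tuple, int], refs: Iterable[ToyRef]) -> None:
    """Apply the documented association rules to a dict-based toy collection."""
    for ref in refs:
        key = (ref.datasetType, ref.dataId)
        existing = tagged.get(key)
        if existing is None:
            # Nothing with this dataset type and data ID yet: add it.
            tagged[key] = ref.id
        elif existing != ref.id:
            # Same dataset type and data ID, different ID: conflict.
            raise ConflictingDefinitionError(f"{key} already tagged with ID {existing}")
        # Same ID already present: nothing is changed.


tagged: Dict[tuple, int] = {}
first = ToyRef(1, "flat", (("detector", 4),))
toy_associate(tagged, [first])
toy_associate(tagged, [first])  # repeating the same ref changes nothing
```

The toy does not model atomicity, collection types, or unresolved refs; it only shows how the two ID rules interact.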
-
certify(collection: str, refs: Iterable[lsst.daf.butler.core.datasets.ref.DatasetRef], timespan: lsst.daf.butler.core.timespan.Timespan) → None
Associate one or more datasets with a calibration collection and a validity range within it.
Parameters:
- collection : str
  The name of an already-registered CALIBRATION collection.
- refs : Iterable[DatasetRef]
  Datasets to be associated.
- timespan : Timespan
  The validity range for these datasets within the collection.
Raises:
- AmbiguousDatasetError
  Raised if any of the given DatasetRef instances is unresolved.
- ConflictingDefinitionError
  Raised if the collection already contains a different dataset with the same DatasetType and data ID and an overlapping validity range.
- TypeError
  Raised if collection is not a CALIBRATION collection or if one or more datasets are of a dataset type for which DatasetType.isCalibration returns False.
-
copy
(defaults: Optional[lsst.daf.butler.registry._defaults.RegistryDefaults] = None) → lsst.daf.butler.registry._registry.Registry¶ Create a new
Registry
backed by the same data repository and connection as this one, but independent defaults.Parameters: - defaults :
RegistryDefaults
, optional Default collections and data ID values for the new registry. If not provided,
self.defaults
will be used (but future changes to either registry’s defaults will not affect the other).
Returns: Notes
Because the new registry shares a connection with the original, they also share transaction state (despite the fact that their
transaction
context manager methods do not reflect this), and must be used with care.
-
classmethod
createFromConfig
(config: Union[lsst.daf.butler.registry._config.RegistryConfig, str, None] = None, dimensionConfig: Union[lsst.daf.butler.core.dimensions._config.DimensionConfig, str, None] = None, butlerRoot: Union[str, urllib.parse.ParseResult, ResourcePath, pathlib.Path, None] = None) → lsst.daf.butler.registry._registry.Registry¶ Create registry database and return
Registry
instance.
This method initializes database contents; the database must be empty prior to calling this method.
Parameters: - config :
RegistryConfig
orstr
, optional Registry configuration, if missing then default configuration will be loaded from registry.yaml.
- dimensionConfig :
DimensionConfig
orstr
, optional Dimensions configuration, if missing then default configuration will be loaded from dimensions.yaml.
- butlerRoot : convertible to
lsst.resources.ResourcePath
, optional Path to the repository root this
Registry
will manage.
Returns: Notes
This class will determine the concrete
Registry
subclass to use from configuration. Each subclass should implement this method even if it cannot create a registry.
-
decertify(collection: str, datasetType: Union[str, lsst.daf.butler.core.datasets.type.DatasetType], timespan: lsst.daf.butler.core.timespan.Timespan, *, dataIds: Optional[Iterable[Union[lsst.daf.butler.core.dimensions._coordinate.DataCoordinate, Mapping[str, Any]]]] = None) → None
Remove or adjust datasets to clear a validity range within a calibration collection.
Parameters:
- collection : str
  The name of an already-registered CALIBRATION collection.
- datasetType : str or DatasetType
  Name or DatasetType instance for the datasets to be decertified.
- timespan : Timespan, optional
  The validity range to remove datasets from within the collection. Datasets that overlap this range but are not contained by it will have their validity ranges adjusted to not overlap it, which may split a single dataset validity range into two.
- dataIds : Iterable[DataId], optional
  Data IDs that should be decertified within the given validity range. If None, all data IDs for self.datasetType will be decertified.
Raises:
- TypeError
  Raised if collection is not a CALIBRATION collection or if datasetType.isCalibration() is False.
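The adjustment of validity ranges described for the timespan parameter can be sketched with plain integer intervals. This is an illustration only: it assumes half-open [begin, end) intervals represented as tuples, and clear_range is a hypothetical helper, not how Timespan or the registry implement it.

```python
from typing import List, Tuple

# Hypothetical stand-in for a Timespan: a half-open interval [begin, end).
Interval = Tuple[int, int]


def clear_range(validity: Interval, cleared: Interval) -> List[Interval]:
    """Return what remains of ``validity`` once ``cleared`` is removed from it."""
    begin, end = validity
    c_begin, c_end = cleared
    if c_end <= begin or c_begin >= end:
        # No overlap: the validity range is untouched.
        return [validity]
    remaining = []
    if begin < c_begin:
        # Part of the validity range survives before the cleared range.
        remaining.append((begin, c_begin))
    if c_end < end:
        # Part of the validity range survives after the cleared range.
        remaining.append((c_end, end))
    return remaining


split = clear_range((0, 10), (3, 6))      # cleared range strictly inside
trimmed = clear_range((0, 10), (8, 20))   # cleared range overlaps the end
removed = clear_range((0, 10), (0, 10))   # validity range fully contained
```

A range strictly inside the validity range yields two pieces, which is the "split into two" case mentioned above; a fully contained validity range yields none.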
-
classmethod
determineTrampoline
(config: Optional[Union[ButlerConfig, RegistryConfig, Config, str]]) → Tuple[Type[Registry], RegistryConfig]¶ Return class to use to instantiate real registry.
Parameters:
- config : RegistryConfig or str, optional
  Registry configuration, if missing then default configuration will be loaded from registry.yaml.
Returns:
- requested_cls : type of Registry
  The real registry class to use.
- registry_config : RegistryConfig
  The RegistryConfig to use.
-
disassociate(collection: str, refs: Iterable[lsst.daf.butler.core.datasets.ref.DatasetRef]) → None
Remove existing datasets from a TAGGED collection.
collection and ref combinations that are not currently associated are silently ignored.
Parameters:
- collection : str
  The collection the datasets should no longer be associated with.
- refs : Iterable[DatasetRef]
  An iterable of resolved DatasetRef instances that already exist in this Registry.
Raises:
- AmbiguousDatasetError
  Raised if any of the given dataset references is unresolved.
- MissingCollectionError
  Raised if collection does not exist in the registry.
- TypeError
  Raised if adding new datasets to the given collection is not allowed.
-
expandDataId
(dataId: Union[lsst.daf.butler.core.dimensions._coordinate.DataCoordinate, Mapping[str, Any], None] = None, *, graph: Optional[lsst.daf.butler.core.dimensions._graph.DimensionGraph] = None, records: Union[lsst.daf.butler.core.named.NamedKeyMapping[lsst.daf.butler.core.dimensions._elements.DimensionElement, typing.Union[lsst.daf.butler.core.dimensions._records.DimensionRecord, NoneType]][lsst.daf.butler.core.dimensions._elements.DimensionElement, Optional[lsst.daf.butler.core.dimensions._records.DimensionRecord]], Mapping[str, Optional[lsst.daf.butler.core.dimensions._records.DimensionRecord]], None] = None, withDefaults: bool = True, **kwargs) → lsst.daf.butler.core.dimensions._coordinate.DataCoordinate¶ Expand a dimension-based data ID to include additional information.
Parameters:
- dataId : DataCoordinate or dict, optional
  Data ID to be expanded; augmented and overridden by kwargs.
- graph : DimensionGraph, optional
  Set of dimensions for the expanded ID. If None, the dimensions will be inferred from the keys of dataId and kwargs. Dimensions that are in dataId or kwargs but not in graph are silently ignored, providing a way to extract and expand a subset of a data ID.
- records : Mapping[str, DimensionRecord], optional
  Dimension record data to use before querying the database for that data, keyed by element name.
- withDefaults : bool, optional
  Utilize self.defaults.dataId to fill in missing governor dimension key-value pairs. Defaults to True (i.e. defaults are used).
- **kwargs
  Additional keywords are treated like additional key-value pairs for dataId, extending and overriding it.
Returns:
- expanded : DataCoordinate
  A data ID that includes full metadata for all of the dimensions it identifies, i.e. guarantees that expanded.hasRecords() and expanded.hasFull() both return True.
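The way dataId, kwargs, and graph combine can be sketched with ordinary dictionaries. This toy covers only the key-merging rules described in the parameter list (kwargs extend and override dataId; keys outside graph are dropped); it performs no actual expansion, and toy_merge_data_id and the example values are made up for illustration.

```python
from typing import Any, Dict, Iterable, Mapping, Optional


def toy_merge_data_id(
    dataId: Optional[Mapping[str, Any]] = None,
    graph: Optional[Iterable[str]] = None,
    **kwargs: Any,
) -> Dict[str, Any]:
    """Merge keys the way the expandDataId parameters are described to combine."""
    merged = dict(dataId or {})
    # Keyword arguments extend and override the key-value pairs in dataId.
    merged.update(kwargs)
    if graph is not None:
        wanted = set(graph)
        # Keys that are not in the requested dimensions are silently ignored.
        merged = {key: value for key, value in merged.items() if key in wanted}
    return merged


overridden = toy_merge_data_id({"instrument": "ToyCam", "visit": 1}, visit=2, detector=10)
subset = toy_merge_data_id(
    {"instrument": "ToyCam", "visit": 1, "detector": 10},
    graph=["instrument", "detector"],
)
```

In the real method the result is a DataCoordinate carrying dimension records; the dictionaries here only show which keys survive.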
-
findDataset
(datasetType: Union[lsst.daf.butler.core.datasets.type.DatasetType, str], dataId: Union[lsst.daf.butler.core.dimensions._coordinate.DataCoordinate, Mapping[str, Any], None] = None, *, collections: Optional[Any] = None, timespan: Optional[lsst.daf.butler.core.timespan.Timespan] = None, **kwargs) → Optional[lsst.daf.butler.core.datasets.ref.DatasetRef]¶ Find a dataset given its
DatasetType and data ID.
This can be used to obtain a DatasetRef that permits the dataset to be read from a Datastore. If the dataset is a component and cannot be found using the provided dataset type, a dataset ref for the parent will be returned instead but with the correct dataset type.
Parameters:
- datasetType : DatasetType or str
  A DatasetType or the name of one.
- dataId : dict or DataCoordinate, optional
  A dict-like object containing the Dimension links that identify the dataset within a collection.
- collections : optional
  An expression that fully or partially identifies the collections to search for the dataset; see Collection expressions for more information. Defaults to self.defaults.collections.
- timespan : Timespan, optional
  A timespan that the validity range of the dataset must overlap. If not provided, any CALIBRATION collections matched by the collections argument will not be searched.
- **kwargs
  Additional keyword arguments passed to DataCoordinate.standardize to convert dataId to a true DataCoordinate or augment an existing one.
Returns:
- ref : DatasetRef
  A reference to the dataset, or None if no matching Dataset was found.
Raises:
Notes
This method simply returns None and does not raise an exception even when the set of collections searched is intrinsically incompatible with the dataset type, e.g. if datasetType.isCalibration() is False, but only CALIBRATION collections are being searched. This may make it harder to debug some lookup failures, but the behavior is intentional; we consider it more important that failed searches are reported consistently, regardless of the reason, and that adding additional collections that do not contain a match to the search path never changes the behavior.
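The search behavior described in these notes can be sketched with a dictionary-based toy: collections are searched in order, the first match wins, and a failed search always yields None. The names toy_find_dataset and contents are hypothetical, and the sketch ignores timespans, components, and collection types.

```python
from typing import Any, Dict, Mapping, Optional, Sequence, Tuple

# Hypothetical key for a dataset: (dataset type name, sorted data ID items).
Key = Tuple[str, Tuple[Tuple[str, Any], ...]]


def toy_find_dataset(
    contents: Mapping[str, Mapping[Key, int]],
    search_path: Sequence[str],
    datasetType: str,
    dataId: Mapping[str, Any],
) -> Optional[int]:
    """Return the first matching dataset ID along ``search_path``, or None."""
    key = (datasetType, tuple(sorted(dataId.items())))
    for name in search_path:
        found = contents[name].get(key)
        if found is not None:
            return found
    # Every failed search is reported the same way, whatever the reason.
    return None


bias_key: Key = ("bias", (("detector", 1),))
contents: Dict[str, Dict[Key, int]] = {
    "run_a": {bias_key: 11},
    "run_b": {bias_key: 22},
    "empty": {},
}
first_hit = toy_find_dataset(contents, ["run_b", "run_a"], "bias", {"detector": 1})
with_extra = toy_find_dataset(contents, ["empty", "run_b", "run_a"], "bias", {"detector": 1})
no_hit = toy_find_dataset(contents, ["empty"], "bias", {"detector": 1})
```

Adding the "empty" collection to the front of the search path leaves the result unchanged, which is the property the notes call out.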
-
classmethod
forceRegistryConfig
(config: Optional[Union[ButlerConfig, RegistryConfig, Config, str]]) → RegistryConfig¶ Force the supplied config to a
RegistryConfig.
Parameters:
- config : RegistryConfig, Config or str or None
  Registry configuration, if missing then default configuration will be loaded from registry.yaml.
Returns:
- registry_config : RegistryConfig
  A registry config.
-
classmethod
fromConfig
(config: Union[ButlerConfig, RegistryConfig, Config, str], butlerRoot: Optional[ResourcePathExpression] = None, writeable: bool = True, defaults: Optional[RegistryDefaults] = None) → Registry¶ Create
Registry subclass instance from config.
Registry database must be initialized prior to calling this method.
Parameters:
- config : ButlerConfig, RegistryConfig, Config or str
  Registry configuration.
- butlerRoot : lsst.resources.ResourcePathExpression, optional
  Path to the repository root this Registry will manage.
- writeable : bool, optional
  If True (default) create a read-write connection to the database.
- defaults : RegistryDefaults, optional
  Default collection search path and/or output RUN collection.
Returns:
Notes
This class will determine the concrete Registry subclass to use from configuration. Each subclass should implement this method.
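The dispatch pattern described here (a base class reads a cls entry from the configuration and delegates to that subclass's own fromConfig) can be sketched in a self-contained way. Everything below is a hypothetical toy: ToyRegistry, InMemoryToyRegistry, the dictionary-based config, and the name-based lookup table are assumptions for illustration and do not reflect how lsst.daf.butler resolves classes.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Mapping, Type


class ToyRegistry(ABC):
    """Hypothetical abstract base that dispatches on a ``cls`` config entry."""

    # Assumed lookup table from class name to implementation (toy only).
    _implementations: Dict[str, Type["ToyRegistry"]] = {}

    def __init_subclass__(cls, **kwargs: Any) -> None:
        super().__init_subclass__(**kwargs)
        ToyRegistry._implementations[cls.__name__] = cls

    @classmethod
    def fromConfig(cls, config: Mapping[str, Any], writeable: bool = True) -> "ToyRegistry":
        # The base class only picks the concrete class, then delegates to
        # that class's own fromConfig.
        impl = ToyRegistry._implementations[config["cls"]]
        return impl.fromConfig(config, writeable=writeable)

    @abstractmethod
    def isWriteable(self) -> bool:
        """Return True if write operations are allowed."""


class InMemoryToyRegistry(ToyRegistry):
    """Concrete toy implementation with its own constructor parameters."""

    def __init__(self, writeable: bool) -> None:
        self._writeable = writeable

    @classmethod
    def fromConfig(cls, config: Mapping[str, Any], writeable: bool = True) -> "InMemoryToyRegistry":
        # Each subclass implements this method itself.
        return cls(writeable)

    def isWriteable(self) -> bool:
        return self._writeable


registry = ToyRegistry.fromConfig({"cls": "InMemoryToyRegistry"}, writeable=False)
```

Callers only ever name the base class; the configuration decides which implementation is constructed.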
-
getCollectionChain
(parent: str) → lsst.daf.butler.registry.wildcards.CollectionSearch¶ Return the child collections in a
CHAINED
collection.Parameters: - parent :
str
Name of the chained collection. Must have already been added via a call to
Registry.registerCollection
.
Returns: - children :
CollectionSearch
An object that defines the search path of the collection. See Collection expressions for more information.
Raises:
-
getCollectionDocumentation
(collection: str) → Optional[str]¶ Retrieve the documentation string for a collection.
Parameters:
- collection : str
  Name of the collection.
Returns:
-
getCollectionSummary
(collection: str) → lsst.daf.butler.registry.summaries.CollectionSummary¶ Return a summary for the given collection.
Parameters: - collection :
str
Name of the collection for which a summary is to be retrieved.
Returns: - summary :
CollectionSummary
Summary of the dataset types and governor dimension values in this collection.
-
getCollectionType
(name: str) → lsst.daf.butler.registry._collectionType.CollectionType¶ Return an enumeration value indicating the type of the given collection.
Parameters: - name :
str
The name of the collection.
Returns: - type :
CollectionType
Enum value indicating the type of this collection.
Raises: - MissingCollectionError
Raised if no collection with the given name exists.
-
getDataset
(id: Union[int, uuid.UUID]) → Optional[lsst.daf.butler.core.datasets.ref.DatasetRef]¶ Retrieve a Dataset entry.
Parameters:
- id : DatasetId
  The unique identifier for the dataset.
Returns:
- ref : DatasetRef or None
  A ref to the Dataset, or None if no matching Dataset was found.
-
getDatasetLocations
(ref: lsst.daf.butler.core.datasets.ref.DatasetRef) → Iterable[str]¶ Retrieve datastore locations for a given dataset.
Parameters: - ref :
DatasetRef
A reference to the dataset for which to retrieve storage information.
Returns: - datastores :
Iterable
[str
] All the matching datastores holding this dataset.
Raises:
- AmbiguousDatasetError
  Raised if ref.id is None.
-
getDatasetType
(name: str) → lsst.daf.butler.core.datasets.type.DatasetType¶ Get the
DatasetType
.Parameters: - name :
str
Name of the type.
Returns: - type :
DatasetType
The
DatasetType
associated with the given name.
Raises: - KeyError
Requested named DatasetType could not be found in registry.
-
getDatastoreBridgeManager
() → DatastoreRegistryBridgeManager¶ Return an object that allows a new
Datastore instance to communicate with this Registry.
Returns:
- manager : DatastoreRegistryBridgeManager
  Object that mediates communication between this Registry and its associated datastores.
-
insertDatasets
(datasetType: Union[lsst.daf.butler.core.datasets.type.DatasetType, str], dataIds: Iterable[Union[lsst.daf.butler.core.dimensions._coordinate.DataCoordinate, Mapping[str, Any]]], run: Optional[str] = None, expand: bool = True, idGenerationMode: lsst.daf.butler.registry.interfaces._datasets.DatasetIdGenEnum = <DatasetIdGenEnum.UNIQUE: 0>) → List[lsst.daf.butler.core.datasets.ref.DatasetRef]¶ Insert one or more datasets into the
Registry.
This always adds new datasets; to associate existing datasets with a new collection, use associate.
Parameters: - datasetType :
DatasetType
orstr
A
DatasetType
or the name of one.- dataIds :
Iterable
ofdict
orDataCoordinate
Dimension-based identifiers for the new datasets.
- run :
str
, optional The name of the run that produced the datasets. Defaults to
self.defaults.run
.- expand :
bool
, optional If
True
(default), expand data IDs as they are inserted. This is necessary in general to allow datastore to generate file templates, but it may be disabled if the caller can guarantee this is unnecessary.- idGenerationMode :
DatasetIdGenEnum
, optional Specifies option for generating dataset IDs. By default unique IDs are generated for each inserted dataset.
Returns: - refs :
list
ofDatasetRef
Resolved
DatasetRef
instances for all given data IDs (in the same order).
Raises:
-
insertDimensionData
(element: Union[lsst.daf.butler.core.dimensions._elements.DimensionElement, str], *data, conform: bool = True, replace: bool = False) → None¶ Insert one or more dimension records into the database.
Parameters: - element :
DimensionElement
orstr
The
DimensionElement
or name thereof that identifies the table records will be inserted into.- data :
dict
orDimensionRecord
(variadic) One or more records to insert.
- conform : bool, optional
  If False (True is default) perform no checking or conversions, and assume that element is a DimensionElement instance and data is one or more DimensionRecord instances of the appropriate subclass.
- replace : bool, optional
  If True (False is default), replace existing records in the database if there is a conflict.
-
queryCollections
(expression: Any = Ellipsis, datasetType: Optional[lsst.daf.butler.core.datasets.type.DatasetType] = None, collectionTypes: Union[Iterable[lsst.daf.butler.registry._collectionType.CollectionType], lsst.daf.butler.registry._collectionType.CollectionType] = frozenset({<CollectionType.RUN: 1>, <CollectionType.TAGGED: 2>, <CollectionType.CHAINED: 3>, <CollectionType.CALIBRATION: 4>}), flattenChains: bool = False, includeChains: Optional[bool] = None) → Iterator[str]¶ Iterate over the collections whose names match an expression.
Parameters:
- expression : Any, optional
  An expression that identifies the collections to return, such as a str (for full matches or partial matches via globs), re.Pattern (for partial matches), or iterable thereof. ... can be used to return all collections, and is the default. See Collection expressions for more information.
- datasetType : DatasetType, optional
  If provided, only yield collections that may contain datasets of this type. This is a conservative approximation in general; it may yield collections that do not have any such datasets.
- collectionTypes : AbstractSet[CollectionType] or CollectionType, optional
  If provided, only yield collections of these types.
- flattenChains : bool, optional
  If True (False is default), recursively yield the child collections of matching CHAINED collections.
- includeChains : bool, optional
  If True, yield records for matching CHAINED collections. Default is the opposite of flattenChains: include either CHAINED collections or their children, but not both.
Yields:
- collection : str
  The name of a collection that matches expression.
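The interaction between flattenChains and includeChains can be sketched with a toy iterator. This is a minimal illustration assuming chains are given as a plain mapping from a CHAINED collection name to its children; iter_collections and the collection names are hypothetical, and expression matching, type filtering, and de-duplication are not modeled.

```python
from typing import Iterator, Mapping, Optional, Sequence


def iter_collections(
    names: Sequence[str],
    chains: Mapping[str, Sequence[str]],
    flattenChains: bool = False,
    includeChains: Optional[bool] = None,
) -> Iterator[str]:
    """Yield collection names following the documented flag semantics."""
    if includeChains is None:
        # Default is the opposite of flattenChains: chains or children, not both.
        includeChains = not flattenChains
    for name in names:
        children = chains.get(name)
        if children is None:
            # Not a chained collection: always yielded.
            yield name
            continue
        if includeChains:
            yield name
        if flattenChains:
            # Recurse so that nested chains are flattened too.
            yield from iter_collections(children, chains, flattenChains, includeChains)


chains = {"defaults": ["run1", "calib"]}
matched = ["defaults", "run2"]
default_result = list(iter_collections(matched, chains))
flattened = list(iter_collections(matched, chains, flattenChains=True))
both = list(iter_collections(matched, chains, flattenChains=True, includeChains=True))
```

With the defaults only the chain itself is yielded; with flattenChains=True only its children are; passing both flags explicitly yields the chain followed by its children.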
-
queryDataIds
(dimensions: Union[Iterable[Union[lsst.daf.butler.core.dimensions._elements.Dimension, str]], lsst.daf.butler.core.dimensions._elements.Dimension, str], *, dataId: Union[lsst.daf.butler.core.dimensions._coordinate.DataCoordinate, Mapping[str, Any], None] = None, datasets: Optional[Any] = None, collections: Optional[Any] = None, where: Optional[str] = None, components: Optional[bool] = None, bind: Optional[Mapping[str, Any]] = None, check: bool = True, **kwargs) → lsst.daf.butler.core.dimensions._dataCoordinateIterable.DataCoordinateIterable¶ Query for data IDs matching user-provided criteria.
Parameters: - dimensions :
Dimension
orstr
, or iterable thereof The dimensions of the data IDs to yield, as either
Dimension
instances orstr
. Will be automatically expanded to a completeDimensionGraph
.- dataId :
dict
orDataCoordinate
, optional A data ID whose key-value pairs are used as equality constraints in the query.
- datasets :
Any
, optional An expression that fully or partially identifies dataset types that should constrain the yielded data IDs. For example, including “raw” here would constrain the yielded
instrument
,exposure
,detector
, andphysical_filter
values to only those for which at least one “raw” dataset exists incollections
. Allowed types includeDatasetType
,str
,re.Pattern
, and iterables thereof. Unlike other dataset type expressions, ... is not permitted - it doesn’t make sense to constrain data IDs on the existence of all datasets. See DatasetType expressions for more information.
- collections : Any, optional
  An expression that identifies the collections to search for datasets, such as a str (for full matches or partial matches via globs), re.Pattern (for partial matches), or iterable thereof. ... can be used to search all collections (actually just all RUN collections, because this will still find all datasets). If not provided, self.defaults.collections is used. Ignored unless datasets is also passed. See Collection expressions for more information.
- where :
str
, optional A string expression similar to a SQL WHERE clause. May involve any column of a dimension table or (as a shortcut for the primary key column of a dimension table) dimension name. See Dimension expressions for more information.
- components :
bool
, optional If
True
, apply all dataset expression patterns to component dataset type names as well. IfFalse
, never apply patterns to components. IfNone
(default), apply patterns to components only if their parent datasets were not matched by the expression. Fully-specified component datasets (str
orDatasetType
instances) are always included.- bind :
Mapping
, optional Mapping containing literal values that should be injected into the
where
expression, keyed by the identifiers they replace.- check :
bool
, optional If
True
(default) check the query for consistency before executing it. This may reject some valid queries that resemble common mistakes (e.g. queries for visits without specifying an instrument).- **kwargs
Additional keyword arguments are forwarded to
DataCoordinate.standardize
when processing thedataId
argument (and may be used to provide a constraining data ID even when thedataId
argument isNone
).
Returns: - dataIds :
DataCoordinateQueryResults
Data IDs matching the given query parameters. These are guaranteed to identify all dimensions (
DataCoordinate.hasFull
returnsTrue
), but will not containDimensionRecord
objects (DataCoordinate.hasRecords
returnsFalse
). CallDataCoordinateQueryResults.expanded
on the returned object to fetch those (and consider usingDataCoordinateQueryResults.materialize
on the returned object first if the expected number of rows is very large). See documentation for those methods for additional information.
Raises:
-
queryDatasetAssociations
(datasetType: Union[str, lsst.daf.butler.core.datasets.type.DatasetType], collections: Any = Ellipsis, *, collectionTypes: Iterable[lsst.daf.butler.registry._collectionType.CollectionType] = frozenset({<CollectionType.RUN: 1>, <CollectionType.TAGGED: 2>, <CollectionType.CHAINED: 3>, <CollectionType.CALIBRATION: 4>}), flattenChains: bool = False) → Iterator[lsst.daf.butler.core.datasets.association.DatasetAssociation]¶ Iterate over dataset-collection combinations where the dataset is in the collection.
This method is a temporary placeholder for better support for association results in
queryDatasets
. It will probably be removed in the future, and should be avoided in production code whenever possible.Parameters: - datasetType :
DatasetType
orstr
A dataset type object or the name of one.
- collections : Any, optional
  An expression that identifies the collections to search for datasets, such as a str (for full matches or partial matches via globs), re.Pattern (for partial matches), or iterable thereof. ... can be used to search all collections (actually just all RUN collections, because this will still find all datasets). If not provided, self.defaults.collections is used. See Collection expressions for more information.
- collectionTypes :
AbstractSet
[CollectionType
], optional If provided, only yield associations from collections of these types.
- flattenChains : bool, optional
  If True, search in the children of CHAINED collections. If False (the default, per the signature above), CHAINED collections are ignored.
Yields: - association :
DatasetAssociation
Object representing the relationship between a single dataset and a single collection.
Raises:
-
queryDatasetTypes
(expression: Any = Ellipsis, *, components: Optional[bool] = None, missing: Optional[List[str]] = None) → Iterator[lsst.daf.butler.core.datasets.type.DatasetType]¶ Iterate over the dataset types whose names match an expression.
Parameters: - expression :
Any
, optional An expression that fully or partially identifies the dataset types to return, such as a
str
,re.Pattern
, or iterable thereof....
can be used to return all dataset types, and is the default. See DatasetType expressions for more information.- components :
bool
, optional If
True
, apply all expression patterns to component dataset type names as well. IfFalse
, never apply patterns to components. IfNone
(default), apply patterns to components only if their parent datasets were not matched by the expression. Fully-specified component datasets (str
orDatasetType
instances) are always included.- missing :
list
ofstr
, optional String dataset type names that were explicitly given (i.e. not regular expression patterns) but not found will be appended to this list, if it is provided.
Yields: - datasetType :
DatasetType
A
DatasetType
instance whose name matchesexpression
.
-
queryDatasets
(datasetType: Any, *, collections: Optional[Any] = None, dimensions: Optional[Iterable[Union[lsst.daf.butler.core.dimensions._elements.Dimension, str]]] = None, dataId: Union[lsst.daf.butler.core.dimensions._coordinate.DataCoordinate, Mapping[str, Any], None] = None, where: Optional[str] = None, findFirst: bool = False, components: Optional[bool] = None, bind: Optional[Mapping[str, Any]] = None, check: bool = True, **kwargs) → Iterable[lsst.daf.butler.core.datasets.ref.DatasetRef]¶ Query for and iterate over dataset references matching user-provided criteria.
Parameters: - datasetType
An expression that fully or partially identifies the dataset types to be queried. Allowed types include
DatasetType
,str
,re.Pattern
, and iterables thereof. The special value ... can be used to query all dataset types. See DatasetType expressions for more information.
- collections : optional
  An expression that identifies the collections to search, such as a str (for full matches or partial matches via globs), re.Pattern (for partial matches), or iterable thereof. ... can be used to search all collections (actually just all RUN collections, because this will still find all datasets). If not provided, self.defaults.collections is used. See Collection expressions for more information.
- dimensions :
Iterable
ofDimension
orstr
Dimensions to include in the query (in addition to those used to identify the queried dataset type(s)), either to constrain the resulting datasets to those for which a matching dimension exists, or to relate the dataset type’s dimensions to dimensions referenced by the
dataId
orwhere
arguments.- dataId :
dict
orDataCoordinate
, optional A data ID whose key-value pairs are used as equality constraints in the query.
- where :
str
, optional A string expression similar to a SQL WHERE clause. May involve any column of a dimension table or (as a shortcut for the primary key column of a dimension table) dimension name. See Dimension expressions for more information.
- findFirst :
bool
, optional If
True
(False
is default), for each result data ID, only yield oneDatasetRef
of eachDatasetType
, from the first collection in which a dataset of that dataset type appears (according to the order ofcollections
passed in). IfTrue
,collections
must not contain regular expressions and may not be...
.- components :
bool
, optional If
True
, apply all dataset expression patterns to component dataset type names as well. IfFalse
, never apply patterns to components. IfNone
(default), apply patterns to components only if their parent datasets were not matched by the expression. Fully-specified component datasets (str
orDatasetType
instances) are always included.- bind :
Mapping
, optional Mapping containing literal values that should be injected into the
where
expression, keyed by the identifiers they replace.- check :
bool
, optional If
True
(default) check the query for consistency before executing it. This may reject some valid queries that resemble common mistakes (e.g. queries for visits without specifying an instrument).- **kwargs
Additional keyword arguments are forwarded to
DataCoordinate.standardize
when processing thedataId
argument (and may be used to provide a constraining data ID even when thedataId
argument isNone
).
Returns: - refs :
queries.DatasetQueryResults
Dataset references matching the given query criteria. Nested data IDs are guaranteed to include values for all implied dimensions (i.e.
DataCoordinate.hasFull
will returnTrue
), but will not include dimension records (DataCoordinate.hasRecords
will beFalse
) unlessexpanded
is called on the result object (which returns a new one).
Raises: Notes
When multiple dataset types are queried in a single call, the results of this operation are equivalent to querying for each dataset type separately in turn, and no information about the relationships between datasets of different types is included. In contexts where that kind of information is important, the recommended pattern is to use
queryDataIds
to first obtain data IDs (possibly with the desired dataset types and collections passed as constraints to the query), and then use multiple (generally much simpler) calls toqueryDatasets
with the returned data IDs passed as constraints.
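The two-step pattern recommended in the Notes above might be sketched as follows. This is a minimal illustration, not the library's own helper: the function name is hypothetical, `registry` is any object exposing `queryDataIds` and `queryDatasets`, and the `datasets` and `collections` keywords of `queryDataIds` are assumed to behave as constraints in the way the Notes describe.

```python
from collections import defaultdict


def datasets_grouped_by_data_id(registry, dimensions, dataset_types, collections):
    """Group dataset references of several types by a shared data ID.

    Hypothetical helper: ``registry`` is assumed to expose ``queryDataIds``
    and ``queryDatasets`` as documented for ``Registry``.
    """
    grouped = defaultdict(dict)
    # Step 1: one query for the data IDs, constrained by the dataset types
    # and collections of interest.
    data_ids = registry.queryDataIds(
        dimensions, datasets=dataset_types, collections=collections
    )
    for data_id in data_ids:
        # Step 2: one simple query per dataset type, constrained by the data ID.
        for dataset_type in dataset_types:
            refs = registry.queryDatasets(
                dataset_type, collections=collections, dataId=data_id
            )
            grouped[data_id][dataset_type] = list(refs)
    return dict(grouped)
```

The relationship between dataset types is carried by the outer dictionary key (the data ID), which is exactly the information a single multi-type `queryDatasets` call does not provide.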
-
queryDimensionRecords
(element: Union[lsst.daf.butler.core.dimensions._elements.DimensionElement, str], *, dataId: Union[lsst.daf.butler.core.dimensions._coordinate.DataCoordinate, Mapping[str, Any], None] = None, datasets: Optional[Any] = None, collections: Optional[Any] = None, where: Optional[str] = None, components: Optional[bool] = None, bind: Optional[Mapping[str, Any]] = None, check: bool = True, **kwargs) → Iterable[lsst.daf.butler.core.dimensions._records.DimensionRecord]¶ Query for dimension information matching user-provided criteria.
Parameters: - element :
DimensionElement
orstr
The dimension element to obtain records for.
- dataId :
dict
orDataCoordinate
, optional A data ID whose key-value pairs are used as equality constraints in the query.
- datasets :
Any
, optional An expression that fully or partially identifies dataset types that should constrain the yielded records. See
queryDataIds
and DatasetType expressions for more information.- collections :
Any
, optional An expression that identifies the collections to search for datasets, such as a
str
(for full matches or partial matches via globs),re.Pattern
(for partial matches), or iterable thereof.
...
can be used to search all collections (actually just all
RUN
collections, because this will still find all datasets). If not provided,
self.defaults.collections
is used. Ignored unless
datasets
is also passed. See Collection expressions for more information.
- where :
str
, optional A string expression similar to a SQL WHERE clause. See
queryDataIds
and Dimension expressions for more information.- components :
bool
, optional Whether to apply dataset expressions to components as well. See
queryDataIds
for more information.- bind :
Mapping
, optional Mapping containing literal values that should be injected into the
where
expression, keyed by the identifiers they replace.- check :
bool
, optional If
True
(default) check the query for consistency before executing it. This may reject some valid queries that resemble common mistakes (e.g. queries for visits without specifying an instrument).- **kwargs
Additional keyword arguments are forwarded to
DataCoordinate.standardize
when processing thedataId
argument (and may be used to provide a constraining data ID even when thedataId
argument isNone
).
Returns: - dataIds :
Iterator
[DimensionRecord
] Dimension records matching the given query parameters.
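As a rough sketch of how `where`, `bind`, and the extra keyword arguments combine in one call: the helper name and the `det_id` identifier are hypothetical, and the `detector` and `instrument` dimension names are assumptions about the dimension universe in use.

```python
def detector_records(registry, instrument, detector_id):
    """Return dimension records for one detector of one instrument.

    Hypothetical helper; "detector" and "instrument" are assumed to be
    dimensions known to the registry.
    """
    records = registry.queryDimensionRecords(
        "detector",
        # A bare dimension name stands for its primary key column.
        where="detector = det_id",
        # The identifier ``det_id`` in the expression is replaced by this value.
        bind={"det_id": detector_id},
        # Extra keyword arguments act as a constraining data ID.
        instrument=instrument,
    )
    return list(records)
```

Passing the value through `bind` rather than formatting it into the `where` string keeps the expression fixed and avoids quoting issues.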
-
refresh
() → None¶ Refresh all in-memory state by querying the database.
This may be necessary to enable querying for entities added by other registry instances after this one was constructed.
-
registerCollection
(name: str, type: lsst.daf.butler.registry._collectionType.CollectionType = <CollectionType.TAGGED: 2>, doc: Optional[str] = None) → bool¶ Add a new collection if one with the given name does not exist.
Parameters: - name :
str
The name of the collection to create.
- type :
CollectionType
Enum value indicating the type of collection to create.
- doc :
str
, optional Documentation string for the collection.
Returns: - registered :
bool
Boolean indicating whether the collection was already registered or was created by this call.
Notes
This method cannot be called within transactions, as it needs to be able to perform its own transaction to be concurrent.
-
registerDatasetType
(datasetType: lsst.daf.butler.core.datasets.type.DatasetType) → bool¶ Add a new
DatasetType
to the Registry.
It is not an error to register the same
DatasetType
twice.
Parameters: - datasetType :
DatasetType
The
DatasetType
to be added.
Returns: Raises: - ValueError
Raised if the dimensions or storage class are invalid.
- ConflictingDefinitionError
Raised if this DatasetType is already registered with a different definition.
Notes
This method cannot be called within transactions, as it needs to be able to perform its own transaction to be concurrent.
-
registerRun
(name: str, doc: Optional[str] = None) → bool¶ Add a new run if one with the given name does not exist.
Parameters: Returns: Notes
This method cannot be called within transactions, as it needs to be able to perform its own transaction to be concurrent.
-
removeCollection
(name: str) → None¶ Remove the given collection from the registry.
Parameters: - name :
str
The name of the collection to remove.
Raises: - MissingCollectionError
Raised if no collection with the given name exists.
- sqlalchemy.IntegrityError
Raised if the database rows associated with the collection are still referenced by some other table, such as a dataset in a datastore (for
RUN
collections only) or aCHAINED
collection of which this collection is a child.
Notes
If this is a
RUN
collection, all datasets and quanta in it will be removed from the
Registry
database. This requires that those datasets first be removed (or at least trashed) from any datastores that hold them.
A collection may not be deleted as long as it is referenced by a
CHAINED
collection; the
CHAINED
collection must be deleted or redefined first.
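The ordering constraint in the Notes (redefine the chain, then remove the child) could look like the sketch below. The helper name is hypothetical, and it assumes that iterating the result of `getCollectionChain` yields child collection names as strings.

```python
def remove_child_collection(registry, parent, child):
    """Drop ``child`` from the ``parent`` chain, then remove it.

    Hypothetical helper, not part of ``Registry``.
    """
    # Assumption: iterating the result of getCollectionChain yields names.
    remaining = [
        name for name in registry.getCollectionChain(parent) if name != child
    ]
    # Redefine the chain first so that it no longer references ``child``.
    registry.setCollectionChain(parent, remaining)
    registry.removeCollection(child)
    return remaining
```

For a `RUN` collection, the datasets would still need to be removed or trashed from the datastores before this is called; the sketch covers only the `CHAINED` reference.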
-
removeDatasetType
(name: str) → None¶ Remove the named
DatasetType
from the registry.
Warning
Registry implementations can cache dataset type definitions, so deleting a definition may cause unexpected behavior in other active butler processes that have not seen the deletion.
Parameters: - name :
str
Name of the type to be removed.
Raises: - lsst.daf.butler.registry.OrphanedRecordError
Raised if an attempt is made to remove the dataset type definition when there are already datasets associated with it.
Notes
If the dataset type is not registered the method will return without action.
-
removeDatasets
(refs: Iterable[lsst.daf.butler.core.datasets.ref.DatasetRef]) → None¶ Remove datasets from the Registry.
The datasets will be removed unconditionally from all collections, and any
Quantum
that consumed this dataset will instead be marked as having a NULL input.
Datastore
records will not be deleted; the caller is responsible for ensuring that the dataset has already been removed from all Datastores.
Parameters: - refs :
Iterable
ofDatasetRef
References to the datasets to be removed. Must include a valid
id
attribute, and should be considered invalidated upon return.
Raises:
-
resetConnectionPool
() → None¶ Reset connection pool for registry if relevant.
This operation can be used to reset connections to servers when using a registry with fork-based multiprocessing. This method should usually be called by the child process immediately after the fork.
The base class implementation is a no-op.
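One possible way to arrange for the call to happen in each child right after the fork is a pool initializer. The factory below is a hypothetical sketch, not something the library provides.

```python
def make_worker_initializer(registry):
    """Build an initializer for a fork-based worker pool.

    Hypothetical helper; ``registry`` only needs ``resetConnectionPool``.
    """

    def _initialize():
        # Runs once in each child process as it starts: reset the
        # connections that were set up in the parent before the fork.
        registry.resetConnectionPool()

    return _initialize
```

The returned callable could then be passed as `initializer=` to `multiprocessing.get_context("fork").Pool(...)`, so that every worker resets its pool before doing any registry work.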
-
setCollectionChain
(parent: str, children: Any, *, flatten: bool = False) → None¶ Define or redefine a
CHAINED
collection.
Parameters: - parent :
str
Name of the chained collection. Must have already been added via a call to
Registry.registerCollection
.- children :
Any
An expression defining an ordered search of child collections, generally an iterable of
str
; see Collection expressions for more information.- flatten :
bool
, optional If
True
(False
is default), recursively flatten out any nestedCHAINED
collections inchildren
first.
Raises:
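What `flatten=True` means for a nested chain can be modeled in a few lines of plain Python. This is an illustrative model only, not the registry's implementation: the `chains` mapping is a hypothetical stand-in for the registry's knowledge of which collections are `CHAINED`, and the model assumes the chains contain no cycles.

```python
def flatten_chain(children, chains):
    """Expand nested chains depth-first, preserving search order.

    Illustrative model: ``chains`` maps the name of each CHAINED collection
    to its ordered children; any name absent from it is treated as a
    non-chained collection. Cycles are assumed not to occur.
    """
    flat = []
    for name in children:
        if name in chains:
            # Replace a nested chain by its (recursively flattened) children.
            flat.extend(flatten_chain(chains[name], chains))
        else:
            flat.append(name)
    return flat
```

With `chains = {"inner": ["run2a", "run2b"]}`, the children `["run1", "inner", "run3"]` expand to four non-chained names in the same search order.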
-
setCollectionDocumentation
(collection: str, doc: Optional[str]) → None¶ Set the documentation string for a collection.
Parameters:
-
supportsIdGenerationMode
(mode: lsst.daf.butler.registry.interfaces._datasets.DatasetIdGenEnum) → bool¶ Test whether the given dataset ID generation mode is supported by
insertDatasets
.Parameters: - mode :
DatasetIdGenEnum
Enum value for the mode to test.
Returns: - supported :
bool
Whether the given mode is supported.
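A caller with an ordered list of acceptable modes might probe the registry as sketched here. The helper name is hypothetical, and the modes are whatever `DatasetIdGenEnum` values the caller supplies.

```python
def first_supported_mode(registry, preferred_modes):
    """Return the first mode in ``preferred_modes`` the registry supports.

    Hypothetical helper built on ``supportsIdGenerationMode``.
    """
    for mode in preferred_modes:
        if registry.supportsIdGenerationMode(mode):
            return mode
    raise ValueError("registry supports none of the requested ID generation modes")
```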
-
syncDimensionData
(element: Union[lsst.daf.butler.core.dimensions._elements.DimensionElement, str], row: Union[Mapping[str, Any], lsst.daf.butler.core.dimensions._records.DimensionRecord], conform: bool = True, update: bool = False) → Union[bool, Dict[str, Any]]¶ Synchronize the given dimension record with the database, inserting if it does not already exist and comparing values if it does.
Parameters: - element :
DimensionElement
orstr
The
DimensionElement
or name thereof that identifies the table into which the record will be inserted.
- row :
dict
orDimensionRecord
The record to insert.
- conform :
bool
, optional If
False
(True
is default), perform no checking or conversions, and assume that
element
is a
DimensionElement
instance and
row
is a
DimensionRecord
instance of the appropriate subclass.
- update :
bool
, optional
If
True
(False
is default), update the existing record in the database if there is a conflict.
Returns: Raises: - ConflictingDefinitionError
Raised if the record exists in the database (according to primary key lookup) but is inconsistent with the given one.
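The insert-or-compare behavior described above can be modeled on a plain dictionary. This is a toy model under stated assumptions, not the registry's code: the table is a `dict`, `ValueError` stands in for `ConflictingDefinitionError`, and the returned status strings are this sketch's own convention rather than the method's actual return value.

```python
def sync_record(table, key, record, update=False):
    """Toy model of insert-or-compare on a dict acting as a table.

    Returns "inserted", "unchanged" or "updated"; these strings are this
    sketch's own convention, not what ``syncDimensionData`` returns.
    """
    existing = table.get(key)
    if existing is None:
        # No row with this primary key yet: insert it.
        table[key] = dict(record)
        return "inserted"
    if existing == record:
        # Row exists and matches: nothing to do.
        return "unchanged"
    if not update:
        # Stand-in for ConflictingDefinitionError.
        raise ValueError(f"record {key!r} conflicts with the stored one")
    # Row exists, differs, and updates were requested: overwrite it.
    table[key] = dict(record)
    return "updated"
```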
-
transaction
(*, savepoint: bool = False) → Iterator[None]¶ Return a context manager that represents a transaction.
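Because the `register*` methods cannot be called within transactions, a typical arrangement is to register first and open the transaction afterwards. The sketch below assumes the usual meaning of a transaction (the enclosed changes are applied together or not at all); the helper name is hypothetical.

```python
def move_tags(registry, source, target, refs):
    """Move ``refs`` from one TAGGED collection to another.

    Hypothetical helper, not part of ``Registry``.
    """
    # Materialize the iterable because it is used twice below.
    refs = list(refs)
    # Registration performs its own transaction, so it must happen
    # before the ``with`` block rather than inside it.
    registry.registerCollection(target)
    with registry.transaction():
        # Both changes belong to the same transaction.
        registry.disassociate(source, refs)
        registry.associate(target, refs)
```

`registerCollection` is called with its default `type`, which the signature above gives as `CollectionType.TAGGED`.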
-