Registry
- class lsst.daf.butler.Registry(registryConfig, schemaConfig=None, dimensionConfig=None, create=False, butlerRoot=None)
Bases: object
Registry interface.
- Parameters
- registryConfig
RegistryConfig
Registry configuration.
- schemaConfig
SchemaConfig, optional
Schema configuration.
- dimensionConfig
DimensionConfig or Config
DimensionGraph configuration.
Attributes Summary
defaultConfigFile: Path to configuration defaults.
Methods Summary
addDataset(datasetType, dataId, run[, …]): Adds a Dataset entry to the Registry.
addDatasetLocation(ref, datastoreName): Add datastore name locating a given dataset.
addExecution(execution)
addRun(run)
associate(collection, refs): Add existing Datasets to a collection, implicitly creating the collection if it does not already exist.
attachComponent(name, parent, component): Attach a component to a dataset.
deleteOpaqueData(name, **where): Remove records from an opaque table.
disassociate(collection, refs): Remove existing Datasets from a collection.
ensureRun(run)
expandDataId([dataId, graph, records]): Expand a dimension-based data ID to include additional information.
fetchOpaqueData(name, **where): Retrieve records from an opaque table.
find(collection, datasetType[, dataId]): Lookup a dataset.
fromConfig(registryConfig[, schemaConfig, …]): Create Registry subclass instance from config.
getAllCollections(): Get names of all the collections found in this repository.
getAllDatasetTypes(): Get every registered DatasetType.
getDataset(id[, datasetType, dataId]): Retrieve a Dataset entry.
getDatasetLocations(ref): Retrieve datastore locations for a given dataset.
getDatasetType(name): Get the DatasetType.
getExecution(id): Retrieve an Execution.
getRun([id, collection]): Get a Run corresponding to its collection or id.
insertDimensionData(element, *data[, conform]): Insert one or more dimension records into the database.
insertOpaqueData(name, *data): Insert records into an opaque table.
makeRun(collection)
queryDatasets(datasetType, *, collections[, …]): Query for and iterate over dataset references matching user-provided criteria.
queryDimensions(dimensions, *[, dataId, …]): Query for and iterate over data IDs matching user-provided criteria.
registerDatasetType(datasetType): Add a new DatasetType to the Registry.
registerOpaqueTable(name, spec): Add an opaque (to the Registry) table for use by a Datastore or other data repository client.
removeDataset(ref): Remove a dataset from the Registry.
removeDatasetLocation(datastoreName, ref): Remove datastore location associated with this dataset.
setConfigRoot(root, config, full[, overwrite]): Set any filesystem-dependent config options for this Registry to be appropriate for a new empty repository with the given root.
transaction(): Optionally implemented in Registry subclasses to provide exception safety guarantees in case an exception is raised in the enclosed block.
Attributes Documentation
- defaultConfigFile = None
Path to configuration defaults. Either relative to $DAF_BUTLER_DIR/config or an absolute path. Can be None if no defaults are specified.
Methods Documentation
- abstract addDataset(datasetType, dataId, run, producer=None, recursive=False, **kwds)
Adds a Dataset entry to the Registry.
This always adds a new Dataset; to associate an existing Dataset with a new collection, use associate.
- Parameters
- datasetType
DatasetType or str
A DatasetType or the name of one.
- dataId
dict or DataCoordinate
A dict-like object containing the Dimension links that identify the dataset within a collection.
- run
Run
The Run instance that produced the Dataset. Ignored if producer is passed (producer.run is then used instead). A Run must be provided by one of the two arguments.
- producer
Quantum
Unit of work that produced the Dataset. May be None to store no provenance information, but if present the Quantum must already have been added to the Registry.
- recursive
bool
If True, recursively add Dataset and attach entries for component Datasets as well.
- kwds
Additional keyword arguments passed to DataCoordinate.standardize to convert dataId to a true DataCoordinate or augment an existing one.
- Returns
- ref
DatasetRef
A newly-created DatasetRef instance.
- Raises
- ConflictingDefinitionError
If a Dataset with the given DatasetRef already exists in the given collection.
- Exception
If dataId contains unknown or invalid Dimension entries.
- abstract addDatasetLocation(ref, datastoreName)
Add datastore name locating a given dataset.
Typically used by Datastore.
- Parameters
- ref
DatasetRef
A reference to the dataset for which to add storage information.
- datastoreName
str
Name of the datastore holding this dataset.
- Raises
- AmbiguousDatasetError
Raised if ref.id is None.
- abstract addExecution(execution)
Add a new Execution to the Registry.
If execution.id is None, the Registry will update it to that of the newly inserted entry.
- abstract addRun(run)
- abstract associate(collection, refs)
Add existing Datasets to a collection, implicitly creating the collection if it does not already exist.
If a DatasetRef with the exact same dataset_id is already in a collection, nothing is changed. If a DatasetRef with the same DatasetType and dimension values but with a different dataset_id exists in the collection, ValueError is raised.
- Parameters
- collection
str
Indicates the collection the Datasets should be associated with.
- refs
iterable of DatasetRef
An iterable of DatasetRef instances that already exist in this Registry. All component datasets will be associated with the collection as well.
- Raises
- ConflictingDefinitionError
If a Dataset with the given DatasetRef already exists in the given collection.
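The association rules described in the prose above can be sketched with a plain dictionary. This is only an illustration: the `associate` function below, its flat argument list, and the `Collections` layout are hypothetical and are not the lsst.daf.butler API. The sketch raises ValueError on a conflict, following the prose description.

```python
from typing import Dict, Tuple

# Hypothetical layout: collection name -> {(dataset type, data ID): dataset_id}
Collections = Dict[str, Dict[Tuple[str, Tuple], int]]


def associate(collections: Collections, collection: str,
              dataset_type: str, data_id: Tuple, dataset_id: int) -> None:
    """Toy version of the association rule; not the real Registry method."""
    # The collection is created implicitly on first use.
    members = collections.setdefault(collection, {})
    key = (dataset_type, data_id)
    existing = members.get(key)
    if existing is None:
        members[key] = dataset_id
    elif existing != dataset_id:
        # Same dataset type and data ID, but a different dataset_id.
        raise ValueError(f"{key} already present with dataset_id={existing}")
    # Otherwise the same dataset_id is already present: nothing changes.


collections: Collections = {}
associate(collections, "tagged", "calexp", (("visit", 42),), dataset_id=7)
associate(collections, "tagged", "calexp", (("visit", 42),), dataset_id=7)  # no-op
try:
    associate(collections, "tagged", "calexp", (("visit", 42),), dataset_id=8)
    conflict = False
except ValueError:
    conflict = True
```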
- abstract attachComponent(name, parent, component)
Attach a component to a dataset.
- Parameters
- name
str
Name of the component.
- parent
DatasetRef
A reference to the parent dataset. Will be updated to reference the component.
- component
DatasetRef
A reference to the component dataset.
- Raises
- AmbiguousDatasetError
Raised if parent.id or component.id is None.
- abstract deleteOpaqueData(name: str, **where: Any)
Remove records from an opaque table.
- Parameters
- name
str
Logical name of the opaque table. Must match the name used in a previous call to registerOpaqueTable.
- where
Additional keyword arguments are interpreted as equality constraints that restrict the deleted rows (combined with AND); keyword arguments are column names and values are the values they must have.
- abstract disassociate(collection, refs)
Remove existing Datasets from a collection.
collection and ref combinations that are not currently associated are silently ignored.
- Parameters
- collection
str
The collection the Datasets should no longer be associated with.
- refs
list of DatasetRef
A list of DatasetRef instances that already exist in this Registry. All component datasets will also be removed.
- Raises
- AmbiguousDatasetError
Raised if any(ref.id is None for ref in refs).
- abstract ensureRun(run)
Conditionally add a new Run to the Registry.
If run.id is None or a Run with this id or collection doesn’t exist in the Registry yet, add it. Otherwise, ensure the provided run is identical to the one already in the registry.
- abstract expandDataId(dataId: Union[lsst.daf.butler.core.dimensions.coordinate.DataCoordinate, Mapping[str, Any], None] = None, *, graph: Optional[lsst.daf.butler.core.dimensions.graph.DimensionGraph] = None, records: Optional[Mapping[lsst.daf.butler.core.dimensions.elements.DimensionElement, lsst.daf.butler.core.dimensions.records.DimensionRecord]] = None, **kwds)
Expand a dimension-based data ID to include additional information.
- abstract fetchOpaqueData(name: str, **where: Any) → Iterator[dict]
Retrieve records from an opaque table.
- Parameters
- name
str
Logical name of the opaque table. Must match the name used in a previous call to registerOpaqueTable.
- where
Additional keyword arguments are interpreted as equality constraints that restrict the returned rows (combined with AND); keyword arguments are column names and values are the values they must have.
- Yields
- row
dict
A dictionary representing a single result row.
- abstract find(collection, datasetType, dataId=None, **kwds)
Lookup a dataset.
This can be used to obtain a DatasetRef that permits the dataset to be read from a Datastore.
- Parameters
- collection
str
Identifies the collection to search.
- datasetType
DatasetType or str
A DatasetType or the name of one.
- dataId
dict or DataCoordinate, optional
A dict-like object containing the Dimension links that identify the dataset within a collection.
- kwds
Additional keyword arguments passed to DataCoordinate.standardize to convert dataId to a true DataCoordinate or augment an existing one.
- Returns
- ref
DatasetRef
A ref to the Dataset, or None if no matching Dataset was found.
- Raises
- LookupError
If one or more data ID keys are missing.
- static fromConfig(registryConfig, schemaConfig=None, dimensionConfig=None, create=False, butlerRoot=None)
Create Registry subclass instance from config.
Uses registry.cls from config to determine which subclass to instantiate.
- Parameters
- registryConfig
ButlerConfig, RegistryConfig, Config or str
Registry configuration.
- schemaConfig
SchemaConfig, Config or str, optional
Schema configuration. Can be read from the supplied registryConfig if the relevant component is defined and schemaConfig is None.
- dimensionConfig
DimensionConfig or Config or str, optional
DimensionGraph configuration. Can be read from the supplied registryConfig if the relevant component is defined and dimensionConfig is None.
- create
bool
Assume empty Registry and create a new one.
- Returns
- abstract getAllCollections()
Get names of all the collections found in this repository.
- abstract getAllDatasetTypes()
Get every registered DatasetType.
- Returns
- types
frozenset of DatasetType
Every DatasetType in the registry.
- abstract getDataset(id, datasetType=None, dataId=None)
Retrieve a Dataset entry.
- Parameters
- id
int
The unique identifier for the Dataset.
- datasetType
DatasetType, optional
The DatasetType of the dataset to retrieve. This is used to short-circuit retrieving the DatasetType, so if provided, the caller is guaranteeing that it is what would have been retrieved.
- dataId
DataCoordinate, optional
A Dimension-based identifier for the dataset within a collection, possibly containing additional metadata. This is used to short-circuit retrieving the dataId, so if provided, the caller is guaranteeing that it is what would have been retrieved.
- Returns
- ref
DatasetRef
A ref to the Dataset, or None if no matching Dataset was found.
- abstract getDatasetLocations(ref)
Retrieve datastore locations for a given dataset.
Typically used by Datastore.
- Parameters
- ref
DatasetRef
A reference to the dataset for which to retrieve storage information.
- Returns
- Raises
- AmbiguousDatasetError
Raised if ref.id is None.
- abstract getDatasetType(name)
Get the DatasetType.
- Parameters
- name
str
Name of the type.
- Returns
- type
DatasetType
The DatasetType associated with the given name.
- Raises
- KeyError
Requested named DatasetType could not be found in registry.
- abstract getExecution(id)
Retrieve an Execution.
- Parameters
- id
int
The unique identifier for the Execution.
- abstract insertDimensionData(element: Union[lsst.daf.butler.core.dimensions.elements.DimensionElement, str], *data: Union[dict, lsst.daf.butler.core.dimensions.records.DimensionRecord], conform: bool = True)
Insert one or more dimension records into the database.
- Parameters
- element
DimensionElement or str
The DimensionElement or name thereof that identifies the table records will be inserted into.
- data
dict or DimensionRecord (variadic)
One or more records to insert.
- conform
bool, optional
If False (True is default), perform no checking or conversions, and assume that element is a DimensionElement instance and data is one or more DimensionRecord instances of the appropriate subclass.
- abstract insertOpaqueData(name: str, *data: dict)
Insert records into an opaque table.
- Parameters
- name
str
Logical name of the opaque table. Must match the name used in a previous call to registerOpaqueTable.
- data
Each additional positional argument is a dictionary that represents a single row to be added.
- abstract makeRun(collection)
Create a new Run in the Registry and return it.
If a run with this collection already exists, return that instead.
- abstract queryDatasets(datasetType: Union[lsst.daf.butler.core.datasets.DatasetType, str, lsst.daf.butler.core.queries.datasets.Like, ellipsis], *, collections: Union[Sequence[Union[str, lsst.daf.butler.core.queries.datasets.Like]], ellipsis], dimensions: Optional[Iterable[Union[lsst.daf.butler.core.dimensions.elements.Dimension, str]]] = None, dataId: Union[lsst.daf.butler.core.dimensions.coordinate.DataCoordinate, Mapping[str, Any], None] = None, where: Optional[str] = None, deduplicate: bool = False, expand: bool = True, **kwds) → Iterator[lsst.daf.butler.core.datasets.DatasetRef]
Query for and iterate over dataset references matching user-provided criteria.
- Parameters
- datasetType
DatasetType, str, Like, or ...
An expression indicating type(s) of datasets to query for. ... may be used to query for all known DatasetTypes. Multiple explicitly-provided dataset types cannot be queried in a single call to queryDatasets even though wildcard expressions can, because the results would be identical to chaining the iterators produced by multiple calls to queryDatasets.
- collections
Sequence of str or Like, or ...
An expression indicating the collections to be searched for datasets. ... may be passed to search all collections.
- dimensions
Iterable of Dimension or str
Dimensions to include in the query (in addition to those used to identify the queried dataset type(s)), either to constrain the resulting datasets to those for which a matching dimension exists, or to relate the dataset type’s dimensions to dimensions referenced by the dataId or where arguments.
- dataId
dict or DataCoordinate, optional
A data ID whose key-value pairs are used as equality constraints in the query.
- where
str, optional
A string expression similar to a SQL WHERE clause. May involve any column of a dimension table or (as a shortcut for the primary key column of a dimension table) dimension name.
- deduplicate
bool, optional
If True (False is default), for each result data ID, only yield one DatasetRef of each DatasetType, from the first collection in which a dataset of that dataset type appears (according to the order of collections passed in). Cannot be used if any element in collections is an expression.
- expand
bool, optional
If True (default), attach ExpandedDataCoordinate instead of minimal DataCoordinate base-class instances.
- kwds
Additional keyword arguments are forwarded to DataCoordinate.standardize when processing the dataId argument (and may be used to provide a constraining data ID even when the dataId argument is None).
- Yields
- ref
DatasetRef
Dataset references matching the given query criteria. These are grouped by DatasetType if the query evaluates to multiple dataset types, but order is otherwise unspecified.
- Raises
- TypeError
Raised when the arguments are incompatible, such as when a collection wildcard is passed when deduplicate is True.
Notes
When multiple dataset types are queried via a wildcard expression, the results of this operation are equivalent to querying for each dataset type separately in turn, and no information about the relationships between datasets of different types is included. In contexts where that kind of information is important, the recommended pattern is to use queryDimensions to first obtain data IDs (possibly with the desired dataset types and collections passed as constraints to the query), and then use multiple (generally much simpler) calls to queryDatasets with the returned data IDs passed as constraints.
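The deduplicate rule (one reference per dataset type and data ID, taken from the earliest-listed collection that contains it) can be sketched on plain tuples. Everything in this sketch is hypothetical: `ToyRef` is a simplified stand-in that carries its collection name, which is an assumption made for illustration, and `deduplicate` is a free function rather than anything provided by lsst.daf.butler.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple

DataId = Tuple[Tuple[str, int], ...]


class ToyRef(NamedTuple):
    """Hypothetical, simplified stand-in for a DatasetRef."""
    dataset_type: str
    data_id: DataId
    collection: str


def deduplicate(refs: Iterable[ToyRef], collections: List[str]) -> List[ToyRef]:
    # A lower rank means the collection was listed earlier by the caller.
    rank = {name: i for i, name in enumerate(collections)}
    best: Dict[Tuple[str, DataId], ToyRef] = {}
    for ref in refs:
        key = (ref.dataset_type, ref.data_id)
        current = best.get(key)
        if current is None or rank[ref.collection] < rank[current.collection]:
            best[key] = ref
    return list(best.values())


visit = (("visit", 42),)
refs = [
    ToyRef("calexp", visit, "rerun/old"),
    ToyRef("calexp", visit, "rerun/new"),
    ToyRef("src", visit, "rerun/old"),
]
# "rerun/new" is listed first, so it wins wherever both collections match.
result = deduplicate(refs, collections=["rerun/new", "rerun/old"])
```

Because the winner depends on a well-defined collection order, this rule only makes sense for an explicit sequence of collection names, which is consistent with the restriction on expressions stated above.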
- abstract queryDimensions(dimensions: Union[Iterable[Union[lsst.daf.butler.core.dimensions.elements.Dimension, str]], lsst.daf.butler.core.dimensions.elements.Dimension, str], *, dataId: Union[lsst.daf.butler.core.dimensions.coordinate.DataCoordinate, Mapping[str, Any], None] = None, datasets: Optional[Mapping[Union[lsst.daf.butler.core.datasets.DatasetType, str, lsst.daf.butler.core.queries.datasets.Like, ellipsis], Union[Sequence[Union[str, lsst.daf.butler.core.queries.datasets.Like]], ellipsis]]] = None, where: Optional[str] = None, expand: bool = True, **kwds) → Iterator[lsst.daf.butler.core.dimensions.coordinate.DataCoordinate]
Query for and iterate over data IDs matching user-provided criteria.
- Parameters
- dimensions
Dimension or str, or iterable thereof
The dimensions of the data IDs to yield, as either Dimension instances or str. Will be automatically expanded to a complete DimensionGraph.
- dataId
dict or DataCoordinate, optional
A data ID whose key-value pairs are used as equality constraints in the query.
- datasets
Mapping, optional
Datasets whose existence in the registry constrains the set of data IDs returned. This is a mapping from a dataset type expression (a str name, a true DatasetType instance, a Like pattern for the name, or ... for all DatasetTypes) to a collections expression (a sequence of str or Like patterns, or ... for all collections).
- where
str, optional
A string expression similar to a SQL WHERE clause. May involve any column of a dimension table or (as a shortcut for the primary key column of a dimension table) dimension name.
- expand
bool, optional
If True (default), yield ExpandedDataCoordinate instead of minimal DataCoordinate base-class instances.
- kwds
Additional keyword arguments are forwarded to DataCoordinate.standardize when processing the dataId argument (and may be used to provide a constraining data ID even when the dataId argument is None).
- Yields
- dataId
DataCoordinate
Data IDs matching the given query parameters. Order is unspecified.
- abstract registerDatasetType(datasetType)
Add a new DatasetType to the Registry.
It is not an error to register the same DatasetType twice.
- Parameters
- datasetType
DatasetType
The DatasetType to be added.
- Returns
- Raises
- ValueError
Raised if the dimensions or storage class are invalid.
- ConflictingDefinitionError
Raised if this DatasetType is already registered with a different definition.
- abstract registerOpaqueTable(name: str, spec: lsst.daf.butler.core.schema.TableSpec)
Add an opaque (to the Registry) table for use by a Datastore or other data repository client.
Opaque table records can be added via insertOpaqueData, retrieved via fetchOpaqueData, and removed via deleteOpaqueData.
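How the opaque-table methods fit together can be sketched with a small in-memory stand-in. This is a minimal illustration, not the real implementation: `ToyOpaqueTables` and its short method names are hypothetical, rows are assumed to be plain dicts, and no table specification is modeled. The one behavior taken from the documentation is that `**where` keyword arguments act as equality constraints combined with AND.

```python
from typing import Any, Dict, Iterator, List


class ToyOpaqueTables:
    """Hypothetical in-memory stand-in; not the real Registry."""

    def __init__(self) -> None:
        self._tables: Dict[str, List[dict]] = {}

    def register(self, name: str) -> None:
        # In this toy, registering an existing name keeps its rows.
        self._tables.setdefault(name, [])

    def insert(self, name: str, *data: dict) -> None:
        # Each positional argument represents one row.
        self._tables[name].extend(dict(row) for row in data)

    @staticmethod
    def _matches(row: dict, where: Dict[str, Any]) -> bool:
        # Every constraint must hold (logical AND).
        return all(row.get(col) == val for col, val in where.items())

    def fetch(self, name: str, **where: Any) -> Iterator[dict]:
        for row in self._tables[name]:
            if self._matches(row, where):
                yield dict(row)

    def delete(self, name: str, **where: Any) -> None:
        self._tables[name] = [
            row for row in self._tables[name] if not self._matches(row, where)
        ]


tables = ToyOpaqueTables()
tables.register("file_records")
tables.insert(
    "file_records",
    {"dataset_id": 1, "path": "a.fits"},
    {"dataset_id": 2, "path": "b.fits"},
)
tables.delete("file_records", dataset_id=1)
remaining = list(tables.fetch("file_records"))
no_match = list(tables.fetch("file_records", dataset_id=2, path="a.fits"))
```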
- abstract removeDataset(ref)
Remove a dataset from the Registry.
The dataset and all components will be removed unconditionally from all collections, and any associated Quantum records will also be removed. Datastore records will not be deleted; the caller is responsible for ensuring that the dataset has already been removed from all Datastores.
- Parameters
- ref
DatasetRef
Reference to the dataset to be removed. Must include a valid id attribute, and should be considered invalidated upon return.
- Raises
- abstract removeDatasetLocation(datastoreName, ref)
Remove datastore location associated with this dataset.
Typically used by Datastore when a dataset is removed.
- Parameters
- datastoreName
str
Name of this Datastore.
- ref
DatasetRef
A reference to the dataset for which information is to be removed.
- Raises
- AmbiguousDatasetError
Raised if ref.id is None.
- abstract classmethod setConfigRoot(root, config, full, overwrite=True)
Set any filesystem-dependent config options for this Registry to be appropriate for a new empty repository with the given root.
- Parameters
- root
str
Filesystem path to the root of the data repository.
- config
Config
A Config to update. Only the subset understood by this component will be updated. Will not expand defaults.
- full
Config
A complete config with all defaults expanded that can be converted to a RegistryConfig. Read-only and will not be modified by this method. Repository-specific options that should not be obtained from defaults when Butler instances are constructed should be copied from full to config.
- overwrite
bool, optional
If False, do not modify a value in config if the value already exists. Default is always to overwrite with the provided root.
Notes
If a keyword is explicitly defined in the supplied config, it will not be overridden by this method if overwrite is False. This allows explicit values set in external configs to be retained.
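The effect of overwrite can be sketched with plain dictionaries. This is an illustration only: `set_root`, the "db" key, and the SQLite-style connection string are hypothetical choices made for the example and do not describe which options a real Registry subclass sets.

```python
from typing import Dict


def set_root(config: Dict[str, str], root: str, overwrite: bool = True) -> None:
    """Hypothetical helper; the "db" key and URI format are illustrative."""
    value = f"sqlite:///{root}/registry.sqlite3"
    # With overwrite=False, a value that already exists is left untouched.
    if overwrite or "db" not in config:
        config["db"] = value


explicit = {"db": "sqlite:///elsewhere/custom.sqlite3"}
set_root(explicit, "/repo", overwrite=False)  # explicit value is retained

empty: Dict[str, str] = {}
set_root(empty, "/repo", overwrite=False)     # missing value is filled in

forced = {"db": "sqlite:///elsewhere/custom.sqlite3"}
set_root(forced, "/repo")                     # default overwrite=True replaces it
```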
- transaction()
Optionally implemented in Registry subclasses to provide exception safety guarantees in case an exception is raised in the enclosed block.
This context manager may be nested (i.e. any implementation by a Registry subclass must nest properly).
Warning
The level of exception safety is not guaranteed by this API. Depending on the implementation in the subclass, it may implement strong exception safety and roll back any changes, leaving the state unchanged, or it may do nothing, leaving the underlying Registry corrupted.
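One way a subclass could provide the strong guarantee, including correct nesting, is sketched below. This is a toy under stated assumptions: `ToyRegistry` is hypothetical, keeps its state in a Python list, and rolls back by restoring a snapshot. A database-backed implementation would more likely rely on the database's own transactions or savepoints.

```python
from contextlib import contextmanager
from typing import Iterator, List


class ToyRegistry:
    """Hypothetical stand-in that stores dataset names in a list."""

    def __init__(self) -> None:
        self.datasets: List[str] = []

    @contextmanager
    def transaction(self) -> Iterator[None]:
        # Snapshot the state on entry; restore it if the block raises.
        snapshot = list(self.datasets)
        try:
            yield
        except BaseException:
            self.datasets = snapshot
            raise


registry = ToyRegistry()
with registry.transaction():
    registry.datasets.append("kept")
    try:
        with registry.transaction():  # nested block
            registry.datasets.append("rolled back")
            raise RuntimeError("simulated failure")
    except RuntimeError:
        pass  # only the inner block's changes are undone
```

Because each level restores its own snapshot, a failure in the inner block leaves the outer block's earlier changes in place.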