Registry¶

class lsst.daf.butler.Registry(registryConfig, schemaConfig=None, dimensionConfig=None, create=False, butlerRoot=None)¶

Bases: object

Registry interface.

Parameters

registryConfigRegistryConfig: Registry configuration.
schemaConfigSchemaConfig, optional: Schema configuration.
dimensionConfigDimensionConfig or Config or str, optional: DimensionGraph configuration.

Attributes Summary

defaultConfigFile

Path to configuration defaults.

Methods Summary

`addDataset`(datasetType, dataId, run[, …])	Add a Dataset entry to the `Registry`.
`addDatasetLocation`(ref, datastoreName)	Add datastore name locating a given dataset.
`addExecution`(execution)	Add a new `Execution` to the `Registry`.
`addRun`(run)	Add a new `Run` to the `Registry`.
`associate`(collection, refs)	Add existing Datasets to a collection, implicitly creating the collection if it does not already exist.
`attachComponent`(name, parent, component)	Attach a component to a dataset.
`deleteOpaqueData`(name, **where)	Remove records from an opaque table.
`disassociate`(collection, refs)	Remove existing Datasets from a collection.
`ensureRun`(run)	Conditionally add a new `Run` to the `Registry`.
`expandDataId`([dataId, graph, records])	Expand a dimension-based data ID to include additional information.
`fetchOpaqueData`(name, **where)	Retrieve records from an opaque table.
`find`(collection, datasetType[, dataId])	Look up a dataset.
`fromConfig`(registryConfig[, schemaConfig, …])	Create `Registry` subclass instance from `config`.
`getAllCollections`()	Get names of all the collections found in this repository.
`getAllDatasetTypes`()	Get every registered `DatasetType`.
`getDataset`(id[, datasetType, dataId])	Retrieve a Dataset entry.
`getDatasetLocations`(ref)	Retrieve datastore locations for a given dataset.
`getDatasetType`(name)	Get the `DatasetType`.
`getExecution`(id)	Retrieve an Execution.
`getRun`([id, collection])	Get a `Run` corresponding to its collection or id.
`insertDimensionData`(element, *data[, conform])	Insert one or more dimension records into the database.
`insertOpaqueData`(name, *data)	Insert records into an opaque table.
`makeRun`(collection)	Create a new `Run` in the `Registry` and return it.
`queryDatasets`(datasetType, *, collections[, …])	Query for and iterate over dataset references matching user-provided criteria.
`queryDimensions`(dimensions, *[, dataId, …])	Query for and iterate over data IDs matching user-provided criteria.
`registerDatasetType`(datasetType)	Add a new `DatasetType` to the Registry.
`registerOpaqueTable`(name, spec)	Add an opaque (to the `Registry`) table for use by a `Datastore` or other data repository client.
`removeDataset`(ref)	Remove a dataset from the Registry.
`removeDatasetLocation`(datastoreName, ref)	Remove datastore location associated with this dataset.
`setConfigRoot`(root, config, full[, overwrite])	Set any filesystem-dependent config options for this Registry to be appropriate for a new empty repository with the given root.
`transaction`()	Optionally implemented in `Registry` subclasses to provide exception safety guarantees in case an exception is raised in the enclosed block.

Attributes Documentation

defaultConfigFile = None¶

Path to configuration defaults. Either relative to $DAF_BUTLER_DIR/config or an absolute path. Can be None if no defaults are specified.

Methods Documentation

abstract addDataset(datasetType, dataId, run, producer=None, recursive=False, **kwds)¶

Add a Dataset entry to the Registry.

This always adds a new Dataset; to associate an existing Dataset with a new collection, use associate.

Parameters

datasetTypeDatasetType or str: A DatasetType or the name of one.
dataIddict or DataCoordinate: A dict-like object containing the Dimension links that identify the dataset within a collection.
runRun: The Run instance that produced the Dataset. Ignored if producer is passed (producer.run is then used instead). A Run must be provided by one of the two arguments.
producerQuantum: Unit of work that produced the Dataset. May be None to store no provenance information, but if present the Quantum must already have been added to the Registry.
recursivebool: If True, recursively add Dataset and attach entries for component Datasets as well.
kwds: Additional keyword arguments passed to DataCoordinate.standardize to convert dataId to a true DataCoordinate or augment an existing one.

Returns

refDatasetRef: A newly-created DatasetRef instance.

Raises

ConflictingDefinitionError: If a Dataset with the given DatasetRef already exists in the given collection.
Exception: If dataId contains unknown or invalid Dimension entries.

abstract addDatasetLocation(ref, datastoreName)¶

Add datastore name locating a given dataset.

Typically used by Datastore.

Parameters

refDatasetRef: A reference to the dataset for which to add storage information.
datastoreNamestr: Name of the datastore holding this dataset.

Raises

AmbiguousDatasetError: Raised if ref.id is None.

abstract addExecution(execution)¶

Add a new Execution to the Registry.

If execution.id is None the Registry will update it to that of the newly inserted entry.

Parameters

executionExecution: Instance to add to the Registry. The given Execution must not already be present in the Registry.

Raises

ConflictingDefinitionError: If execution is already present in the Registry.

abstract addRun(run)¶

Add a new Run to the Registry.

Parameters

runRun: Instance to add to the Registry. The given Run must not already be present in the Registry (or any other). Therefore its id must be None and its collection must not be associated with any existing Run.

Raises

ConflictingDefinitionError: If a run already exists with this collection.

abstract associate(collection, refs)¶

Add existing Datasets to a collection, implicitly creating the collection if it does not already exist.

If a DatasetRef with exactly the same dataset_id is already in the collection, nothing is changed. If a DatasetRef with the same DatasetType and dimension values but a different dataset_id exists in the collection, ConflictingDefinitionError is raised.

Parameters

collectionstr: Indicates the collection the Datasets should be associated with.
refsiterable of DatasetRef: An iterable of DatasetRef instances that already exist in this Registry. All component datasets will be associated with the collection as well.

Raises

ConflictingDefinitionError: If a Dataset with the given DatasetRef already exists in the given collection.
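
The association rules above can be illustrated with a toy in-memory model. This is a minimal sketch, not the Registry implementation: `ToyCollections`, `ToyConflictError`, the tuple-based ref representation, and the example names (`calexp`, `visit`, `detector`, `my_run`) are all hypothetical.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical ref representation: (dataset_id, dataset type name, data ID dict).
ToyRef = Tuple[int, str, dict]


class ToyConflictError(Exception):
    """Stand-in for the conflict error raised by a real Registry."""


class ToyCollections:
    """Hypothetical in-memory model of collection membership."""

    def __init__(self) -> None:
        # collection name -> {(type name, sorted data ID items): dataset_id}
        self.members: Dict[str, Dict[tuple, int]] = {}

    def associate(self, collection: str, refs: Iterable[ToyRef]) -> None:
        # The collection is created implicitly on first use.
        members = self.members.setdefault(collection, {})
        for dataset_id, type_name, data_id in refs:
            key = (type_name, tuple(sorted(data_id.items())))
            existing = members.get(key)
            if existing is None:
                members[key] = dataset_id
            elif existing != dataset_id:
                # Same dataset type and data ID, but a different dataset.
                raise ToyConflictError(f"{key} already in {collection!r}")
            # Same dataset_id already present: nothing changes.


toy = ToyCollections()
ref = (1, "calexp", {"visit": 42, "detector": 7})
toy.associate("my_run", [ref])
toy.associate("my_run", [ref])  # repeated association is a no-op

try:
    toy.associate("my_run", [(2, "calexp", {"visit": 42, "detector": 7})])
    conflict_raised = False
except ToyConflictError:
    conflict_raised = True
```

The key point is that membership is keyed on dataset type plus data ID, so a collection can hold at most one dataset for each such combination.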

abstract attachComponent(name, parent, component)¶

Attach a component to a dataset.

Parameters

namestr: Name of the component.
parentDatasetRef: A reference to the parent dataset. Will be updated to reference the component.
componentDatasetRef: A reference to the component dataset.

Raises

AmbiguousDatasetError: Raised if parent.id or component.id is None.

abstract deleteOpaqueData(name: str, **where: Any)¶

Remove records from an opaque table.

Parameters

namestr: Logical name of the opaque table. Must match the name used in a previous call to registerOpaqueTable.
where: Additional keyword arguments are interpreted as equality constraints that restrict the deleted rows (combined with AND); keyword arguments are column names and values are the values they must have.

abstract disassociate(collection, refs)¶

Remove existing Datasets from a collection.

collection and ref combinations that are not currently associated are silently ignored.

Parameters

collectionstr: The collection the Datasets should no longer be associated with.
refslist of DatasetRef: A list of DatasetRef instances that already exist in this Registry. All component datasets will also be removed.

Raises

AmbiguousDatasetError: Raised if any(ref.id is None for ref in refs).

abstract ensureRun(run)¶

Conditionally add a new Run to the Registry.

If the run.id is None or a Run with this id or collection doesn’t exist in the Registry yet, add it. Otherwise, ensure the provided run is identical to the one already in the registry.

Parameters

runRun: Instance to add to the Registry.

Raises

ConflictingDefinitionError: If run already exists, but is not identical.

abstract expandDataId(dataId: Union[lsst.daf.butler.core.dimensions.coordinate.DataCoordinate, Mapping[str, Any], None] = None, *, graph: Optional[lsst.daf.butler.core.dimensions.graph.DimensionGraph] = None, records: Optional[Mapping[lsst.daf.butler.core.dimensions.elements.DimensionElement, lsst.daf.butler.core.dimensions.records.DimensionRecord]] = None, **kwds)¶

Expand a dimension-based data ID to include additional information.

abstract fetchOpaqueData(name: str, **where: Any) → Iterator[dict]¶

Retrieve records from an opaque table.

Parameters

namestr: Logical name of the opaque table. Must match the name used in a previous call to registerOpaqueTable.
where: Additional keyword arguments are interpreted as equality constraints that restrict the returned rows (combined with AND); keyword arguments are column names and values are the values they must have.

Yields

rowdict: A dictionary representing a single result row.

abstract find(collection, datasetType, dataId=None, **kwds)¶

Look up a dataset.

This can be used to obtain a DatasetRef that permits the dataset to be read from a Datastore.

Parameters

collectionstr: Identifies the collection to search.
datasetTypeDatasetType or str: A DatasetType or the name of one.
dataIddict or DataCoordinate, optional: A dict-like object containing the Dimension links that identify the dataset within a collection.
kwds: Additional keyword arguments passed to DataCoordinate.standardize to convert dataId to a true DataCoordinate or augment an existing one.

Returns

refDatasetRef: A ref to the Dataset, or None if no matching Dataset was found.

Raises

LookupError: If one or more data ID keys are missing.

static fromConfig(registryConfig, schemaConfig=None, dimensionConfig=None, create=False, butlerRoot=None)¶

Create Registry subclass instance from config.

Uses registry.cls from config to determine which subclass to instantiate.

Parameters

registryConfigButlerConfig, RegistryConfig, Config or str: Registry configuration
schemaConfigSchemaConfig, Config or str, optional: Schema configuration. Can be read from supplied registryConfig if the relevant component is defined and schemaConfig is None.
dimensionConfigDimensionConfig or Config or str, optional: DimensionGraph configuration. Can be read from supplied registryConfig if the relevant component is defined and dimensionConfig is None.
createbool: Assume empty Registry and create a new one.

Returns

registryRegistry (subclass): A new Registry subclass instance.
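
Selecting a class from a configuration string, as described for registry.cls above, is commonly done by importing a dotted path at run time. The sketch below shows that general technique only; how fromConfig actually resolves the class is not stated here, and `load_class`, the nested-dict config layout, and the use of `collections.OrderedDict` as the target class are assumptions made for illustration.

```python
import importlib


def load_class(dotted_path: str) -> type:
    """Import and return the class named by a 'package.module.ClassName' string."""
    module_name, _, class_name = dotted_path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# Hypothetical config; a standard-library class stands in for a Registry subclass.
config = {"registry": {"cls": "collections.OrderedDict"}}

cls = load_class(config["registry"]["cls"])
instance = cls()
```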

abstract getAllCollections()¶

Get names of all the collections found in this repository.

Returns

collectionsset of str: The collections.

abstract getAllDatasetTypes()¶

Get every registered DatasetType.

Returns

typesfrozenset of DatasetType: Every DatasetType in the registry.

abstract getDataset(id, datasetType=None, dataId=None)¶

Retrieve a Dataset entry.

Parameters

idint: The unique identifier for the Dataset.
datasetTypeDatasetType, optional: The DatasetType of the dataset to retrieve. This is used to short-circuit retrieving the DatasetType, so if provided, the caller is guaranteeing that it is what would have been retrieved.
dataIdDataCoordinate, optional: A Dimension-based identifier for the dataset within a collection, possibly containing additional metadata. This is used to short-circuit retrieving the dataId, so if provided, the caller is guaranteeing that it is what would have been retrieved.

Returns

refDatasetRef: A ref to the Dataset, or None if no matching Dataset was found.

abstract getDatasetLocations(ref)¶

Retrieve datastore locations for a given dataset.

Typically used by Datastore.

Parameters

refDatasetRef: A reference to the dataset for which to retrieve storage information.

Returns

datastoresset of str: All the matching datastores holding this dataset. Empty set if the dataset does not exist anywhere.

Raises

AmbiguousDatasetError: Raised if ref.id is None.
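
The bookkeeping behind addDatasetLocation, getDatasetLocations, and removeDatasetLocation can be pictured as a mapping from dataset id to a set of datastore names. The following is a minimal sketch under that assumption; `ToyLocations`, `ToyAmbiguousDatasetError`, and the datastore names are hypothetical, and refs are reduced to a bare id.

```python
from typing import Dict, Optional, Set


class ToyAmbiguousDatasetError(Exception):
    """Stand-in for AmbiguousDatasetError."""


class ToyLocations:
    """Hypothetical in-memory map of dataset id -> datastore names."""

    def __init__(self) -> None:
        self._locations: Dict[int, Set[str]] = {}

    @staticmethod
    def _require_id(ref_id: Optional[int]) -> int:
        # A reference without an id cannot identify a stored dataset.
        if ref_id is None:
            raise ToyAmbiguousDatasetError("dataset reference has no id")
        return ref_id

    def add(self, ref_id: Optional[int], datastore_name: str) -> None:
        key = self._require_id(ref_id)
        self._locations.setdefault(key, set()).add(datastore_name)

    def get(self, ref_id: Optional[int]) -> Set[str]:
        # Returns a copy; an unknown dataset yields an empty set.
        return set(self._locations.get(self._require_id(ref_id), set()))

    def remove(self, datastore_name: str, ref_id: Optional[int]) -> None:
        self._locations.get(self._require_id(ref_id), set()).discard(datastore_name)


locations = ToyLocations()
locations.add(5, "posix_store")
locations.add(5, "memory_store")
locations.remove("memory_store", 5)

try:
    locations.get(None)
    ambiguous = False
except ToyAmbiguousDatasetError:
    ambiguous = True
```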

abstract getDatasetType(name)¶

Get the DatasetType.

Parameters

namestr: Name of the type.

Returns

typeDatasetType: The DatasetType associated with the given name.

Raises

KeyError: Requested named DatasetType could not be found in registry.

abstract getExecution(id)¶

Retrieve an Execution.

Parameters

idint: The unique identifier for the Execution.

abstract getRun(id=None, collection=None)¶

Get a Run corresponding to its collection or id.

Parameters

idint, optional: Retrieve the run with the given integer id.
collectionstr, optional: If given, look up by collection name instead.

Returns

runRun: The Run instance.

Raises

ValueError: Must supply one of collection or id.

abstract insertDimensionData(element: Union[lsst.daf.butler.core.dimensions.elements.DimensionElement, str], *data: Union[dict, lsst.daf.butler.core.dimensions.records.DimensionRecord], conform: bool = True)¶

Insert one or more dimension records into the database.

Parameters

elementDimensionElement or str: The DimensionElement or name thereof that identifies the table records will be inserted into.
datadict or DimensionRecord (variadic): One or more records to insert.
conformbool, optional: If False (True is default), perform no checking or conversions, and assume that element is a DimensionElement instance and data is one or more DimensionRecord instances of the appropriate subclass.

abstract insertOpaqueData(name: str, *data: dict)¶

Insert records into an opaque table.

Parameters

namestr: Logical name of the opaque table. Must match the name used in a previous call to registerOpaqueTable.
data: Each additional positional argument is a dictionary that represents a single row to be added.

abstract makeRun(collection)¶

Create a new Run in the Registry and return it.

If a run with this collection already exists, return that instead.

Parameters

collectionstr: The collection used to identify all inputs and outputs of the Run.

Returns

runRun: A new Run instance.
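
The "create, or return the existing one" behavior described for makeRun is a get-or-create pattern keyed on the collection name. A minimal sketch follows, assuming a simple in-memory table; `ToyRun`, `ToyRunTable`, the id assignment scheme, and the collection names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ToyRun:
    """Hypothetical stand-in for Run."""

    id: int
    collection: str


class ToyRunTable:
    def __init__(self) -> None:
        self._by_collection: Dict[str, ToyRun] = {}

    def make_run(self, collection: str) -> ToyRun:
        run = self._by_collection.get(collection)
        if run is None:
            # No run for this collection yet: create one with the next id.
            run = ToyRun(id=len(self._by_collection) + 1, collection=collection)
            self._by_collection[collection] = run
        return run


table = ToyRunTable()
first = table.make_run("nightly")
second = table.make_run("nightly")  # returns the existing run
other = table.make_run("weekly")
```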

abstract queryDatasets(datasetType: Union[lsst.daf.butler.core.datasets.DatasetType, str, lsst.daf.butler.core.queries.datasets.Like, ellipsis], *, collections: Union[Sequence[Union[str, lsst.daf.butler.core.queries.datasets.Like]], ellipsis], dimensions: Optional[Iterable[Union[lsst.daf.butler.core.dimensions.elements.Dimension, str]]] = None, dataId: Union[lsst.daf.butler.core.dimensions.coordinate.DataCoordinate, Mapping[str, Any], None] = None, where: Optional[str] = None, deduplicate: bool = False, expand: bool = True, **kwds) → Iterator[lsst.daf.butler.core.datasets.DatasetRef]¶

Query for and iterate over dataset references matching user-provided criteria.

Parameters

datasetTypeDatasetType, str, Like, or ...: An expression indicating type(s) of datasets to query for. ... may be used to query for all known DatasetTypes. Multiple explicitly-provided dataset types cannot be queried in a single call to queryDatasets even though wildcard expressions can, because the results would be identical to chaining the iterators produced by multiple calls to queryDatasets.
collectionsSequence of str or Like, or ...: An expression indicating the collections to be searched for datasets. ... may be passed to search all collections.
dimensionsIterable of Dimension or str: Dimensions to include in the query (in addition to those used to identify the queried dataset type(s)), either to constrain the resulting datasets to those for which a matching dimension exists, or to relate the dataset type’s dimensions to dimensions referenced by the dataId or where arguments.
dataIddict or DataCoordinate, optional: A data ID whose key-value pairs are used as equality constraints in the query.
wherestr, optional: A string expression similar to a SQL WHERE clause. May involve any column of a dimension table or (as a shortcut for the primary key column of a dimension table) dimension name.
deduplicatebool, optional: If True (False is default), for each result data ID, only yield one DatasetRef of each DatasetType, from the first collection in which a dataset of that dataset type appears (according to the order of collections passed in). Cannot be used if any element in collections is an expression.
expandbool, optional: If True (default) attach ExpandedDataCoordinate instead of minimal DataCoordinate base-class instances.
kwds: Additional keyword arguments are forwarded to DataCoordinate.standardize when processing the dataId argument (and may be used to provide a constraining data ID even when the dataId argument is None).

Yields

refDatasetRef: Dataset references matching the given query criteria. These are grouped by DatasetType if the query evaluates to multiple dataset types, but order is otherwise unspecified.

Raises

TypeError: Raised when the arguments are incompatible, such as when a collection wildcard is passed when deduplicate is True.

Notes

When multiple dataset types are queried via a wildcard expression, the results of this operation are equivalent to querying for each dataset type separately in turn, and no information about the relationships between datasets of different types is included. In contexts where that kind of information is important, the recommended pattern is to use queryDimensions to first obtain data IDs (possibly with the desired dataset types and collections passed as constraints to the query), and then use multiple (generally much simpler) calls to queryDatasets with the returned data IDs passed as constraints.
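
The deduplicate rule described in the parameters above (one result per dataset type and data ID, taken from the first collection in the order given) can be sketched over plain Python data. This is an illustration of the rule only, not of how a Registry evaluates queries; `find_first`, the tuple-based entries, and the collection and dataset type names are hypothetical.

```python
from typing import Dict, Iterator, List, Sequence, Tuple

# Hypothetical entry representation: (dataset type name, data ID dict, dataset_id).
ToyEntry = Tuple[str, dict, int]


def find_first(
    collections: Sequence[str],
    contents: Dict[str, List[ToyEntry]],
) -> Iterator[Tuple[str, str, int]]:
    """Yield (collection, type name, dataset_id), keeping the first match only."""
    seen = set()
    for collection in collections:  # search order is the order passed in
        for type_name, data_id, dataset_id in contents.get(collection, []):
            key = (type_name, tuple(sorted(data_id.items())))
            if key in seen:
                continue  # an earlier collection already supplied this one
            seen.add(key)
            yield collection, type_name, dataset_id


contents = {
    "rerun/new": [("calexp", {"visit": 1}, 10)],
    "rerun/old": [("calexp", {"visit": 1}, 3), ("calexp", {"visit": 2}, 4)],
}
results = list(find_first(["rerun/new", "rerun/old"], contents))
```

Because the result depends on a definite search order, the rule only makes sense for an explicit sequence of collection names, which is consistent with the restriction on collection expressions noted above.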

abstract queryDimensions(dimensions: Union[Iterable[Union[lsst.daf.butler.core.dimensions.elements.Dimension, str]], lsst.daf.butler.core.dimensions.elements.Dimension, str], *, dataId: Union[lsst.daf.butler.core.dimensions.coordinate.DataCoordinate, Mapping[str, Any], None] = None, datasets: Optional[Mapping[Union[lsst.daf.butler.core.datasets.DatasetType, str, lsst.daf.butler.core.queries.datasets.Like, ellipsis], Union[Sequence[Union[str, lsst.daf.butler.core.queries.datasets.Like]], ellipsis]]] = None, where: Optional[str] = None, expand: bool = True, **kwds) → Iterator[lsst.daf.butler.core.dimensions.coordinate.DataCoordinate]¶

Query for and iterate over data IDs matching user-provided criteria.

Parameters

dimensionsDimension or str, or iterable thereof: The dimensions of the data IDs to yield, as either Dimension instances or str. Will be automatically expanded to a complete DimensionGraph.
dataIddict or DataCoordinate, optional: A data ID whose key-value pairs are used as equality constraints in the query.
datasetsMapping, optional: Datasets whose existence in the registry constrains the set of data IDs returned. This is a mapping from a dataset type expression (a str name, a true DatasetType instance, a Like pattern for the name, or ... for all DatasetTypes) to a collections expression (a sequence of str or Like patterns, or ... for all collections).
wherestr, optional: A string expression similar to a SQL WHERE clause. May involve any column of a dimension table or (as a shortcut for the primary key column of a dimension table) dimension name.
expandbool, optional: If True (default) yield ExpandedDataCoordinate instead of minimal DataCoordinate base-class instances.
kwds: Additional keyword arguments are forwarded to DataCoordinate.standardize when processing the dataId argument (and may be used to provide a constraining data ID even when the dataId argument is None).

Yields

dataIdDataCoordinate: Data IDs matching the given query parameters. Order is unspecified.

abstract registerDatasetType(datasetType)¶

Add a new DatasetType to the Registry.

It is not an error to register the same DatasetType twice.

Parameters

datasetTypeDatasetType: The DatasetType to be added.

Returns

insertedbool: True if datasetType was inserted, False if an identical existing DatasetType was found. Note that in either case the DatasetType is guaranteed to be defined in the Registry consistently with the given definition.

Raises

ValueError: Raised if the dimensions or storage class are invalid.
ConflictingDefinitionError: Raised if this DatasetType is already registered with a different definition.
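
The three outcomes of registration (inserted, already identical, conflicting) can be sketched with a dictionary of definitions. This is a minimal illustration, not the Registry implementation; `ToyDatasetTypes`, `ToyConflictError`, the reduction of a dataset type to a name, a set of dimension names, and a storage class name, and the example values are all hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Tuple


class ToyConflictError(Exception):
    """Stand-in for ConflictingDefinitionError."""


class ToyDatasetTypes:
    def __init__(self) -> None:
        # name -> (dimension names, storage class name)
        self._types: Dict[str, Tuple[FrozenSet[str], str]] = {}

    def register(self, name: str, dimensions: Iterable[str], storage_class: str) -> bool:
        definition = (frozenset(dimensions), storage_class)
        existing = self._types.get(name)
        if existing is None:
            self._types[name] = definition
            return True  # newly inserted
        if existing != definition:
            raise ToyConflictError(f"{name!r} is registered with a different definition")
        return False  # identical definition already present


types = ToyDatasetTypes()
inserted = types.register("calexp", ["visit", "detector"], "ExposureF")
repeated = types.register("calexp", ["detector", "visit"], "ExposureF")

try:
    types.register("calexp", ["visit"], "ExposureF")
    conflict = False
except ToyConflictError:
    conflict = True
```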

abstract registerOpaqueTable(name: str, spec: lsst.daf.butler.core.schema.TableSpec)¶

Add an opaque (to the Registry) table for use by a Datastore or other data repository client.

Opaque table records can be added via insertOpaqueData, retrieved via fetchOpaqueData, and removed via deleteOpaqueData.

Parameters

namestr: Logical name of the opaque table. This may differ from the actual name used in the database by a prefix and/or suffix.
specTableSpec: Specification for the table to be added.
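
The equality-constraint semantics shared by insertOpaqueData, fetchOpaqueData, and deleteOpaqueData can be illustrated with a toy in-memory stand-in. This is a minimal sketch, not the Registry implementation: `ToyOpaqueStore` and its methods are hypothetical, the table specification is ignored, and the table and column names are invented for the example.

```python
from typing import Any, Dict, Iterator, List


class ToyOpaqueStore:
    """Hypothetical in-memory stand-in for opaque-table storage."""

    def __init__(self) -> None:
        self._tables: Dict[str, List[dict]] = {}

    def register(self, name: str) -> None:
        # Registering twice keeps existing rows (an assumption of this sketch).
        self._tables.setdefault(name, [])

    def insert(self, name: str, *data: dict) -> None:
        # Each positional argument is one row.
        self._tables[name].extend(dict(row) for row in data)

    @staticmethod
    def _matches(row: dict, where: Dict[str, Any]) -> bool:
        # Every keyword constraint must hold (combined with AND).
        return all(row.get(col) == val for col, val in where.items())

    def fetch(self, name: str, **where: Any) -> Iterator[dict]:
        for row in self._tables[name]:
            if self._matches(row, where):
                yield dict(row)

    def delete(self, name: str, **where: Any) -> None:
        self._tables[name] = [
            row for row in self._tables[name] if not self._matches(row, where)
        ]


store = ToyOpaqueStore()
store.register("file_records")
store.insert(
    "file_records",
    {"dataset_id": 1, "path": "a.fits"},
    {"dataset_id": 2, "path": "b.fits"},
)
fetched = list(store.fetch("file_records", dataset_id=2))
store.delete("file_records", dataset_id=1, path="a.fits")
remaining = list(store.fetch("file_records"))
```

With no keyword arguments the constraint set is empty, so in this sketch a fetch returns every row and a delete removes every row.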

abstract removeDataset(ref)¶

Remove a dataset from the Registry.

The dataset and all components will be removed unconditionally from all collections, and any associated Quantum records will also be removed. Datastore records will not be deleted; the caller is responsible for ensuring that the dataset has already been removed from all Datastores.

Parameters

refDatasetRef: Reference to the dataset to be removed. Must include a valid id attribute, and should be considered invalidated upon return.

Raises

AmbiguousDatasetError: Raised if ref.id is None.
OrphanedRecordError: Raised if the dataset is still present in any Datastore.
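
The ordering requirement above (remove the dataset from every Datastore first, then from the Registry) can be sketched as a guard on the removal step. This is a toy model only; `remove_dataset`, `ToyOrphanedRecordError`, and the dictionaries standing in for collection membership and datastore locations are hypothetical, and component and Quantum handling is omitted.

```python
from typing import Dict, Set


class ToyOrphanedRecordError(Exception):
    """Stand-in for OrphanedRecordError."""


def remove_dataset(
    dataset_id: int,
    collections: Dict[str, Set[int]],
    locations: Dict[int, Set[str]],
) -> None:
    # Refuse to remove a dataset that some datastore still holds.
    if locations.get(dataset_id):
        raise ToyOrphanedRecordError(f"dataset {dataset_id} is still in a datastore")
    # Otherwise drop it unconditionally from every collection.
    for members in collections.values():
        members.discard(dataset_id)


collections = {"a": {1, 2}, "b": {1}}
locations = {1: {"posix_store"}}

try:
    remove_dataset(1, collections, locations)
    blocked = False
except ToyOrphanedRecordError:
    blocked = True

locations[1].clear()  # the caller removes the dataset from the datastore first
remove_dataset(1, collections, locations)
```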

abstract removeDatasetLocation(datastoreName, ref)¶

Remove datastore location associated with this dataset.

Typically used by Datastore when a dataset is removed.

Parameters

datastoreNamestr: Name of this Datastore.
refDatasetRef: A reference to the dataset for which information is to be removed.

Raises

AmbiguousDatasetError: Raised if ref.id is None.

abstract classmethod setConfigRoot(root, config, full, overwrite=True)¶

Set any filesystem-dependent config options for this Registry to be appropriate for a new empty repository with the given root.

Parameters

rootstr: Filesystem path to the root of the data repository.
configConfig: A Config to update. Only the subset understood by this component will be updated. Will not expand defaults.
fullConfig: A complete config with all defaults expanded that can be converted to a RegistryConfig. Read-only and will not be modified by this method. Repository-specific options that should not be obtained from defaults when Butler instances are constructed should be copied from full to config.
overwritebool, optional: If False, do not modify a value in config if the value already exists. Default is always to overwrite with the provided root.

Notes

If a keyword is explicitly defined in the supplied config it will not be overridden by this method if overwrite is False. This allows explicit values set in external configs to be retained.
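
The effect of the overwrite flag can be shown on a plain dictionary. This is a minimal sketch of the rule in the note above, not of what any Registry subclass actually writes; `set_root_option`, the `db` key, and the connection strings are hypothetical.

```python
def set_root_option(config: dict, key: str, value: str, overwrite: bool = True) -> None:
    """Set config[key], leaving an existing value alone when overwrite is False."""
    if overwrite or key not in config:
        config[key] = value


# An explicitly configured value survives when overwrite is False...
explicit = {"db": "sqlite:///custom.sqlite3"}
set_root_option(explicit, "db", "sqlite:///repo/registry.sqlite3", overwrite=False)

# ...but a missing value is still filled in.
fresh = {}
set_root_option(fresh, "db", "sqlite:///repo/registry.sqlite3", overwrite=False)

# With the default overwrite=True the new value always wins.
replaced = {"db": "sqlite:///custom.sqlite3"}
set_root_option(replaced, "db", "sqlite:///repo/registry.sqlite3")
```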

transaction()¶

Optionally implemented in Registry subclasses to provide exception safety guarantees in case an exception is raised in the enclosed block.

This context manager may be nested (i.e. any implementation by a Registry subclass must nest properly).

Warning

The level of exception safety is not guaranteed by this API. Depending on the subclass implementation, it may provide strong exception safety and roll back any changes, leaving the state unchanged, or it may do nothing, leaving the underlying Registry corrupted.
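
One way a subclass could provide strong exception safety with proper nesting is to snapshot its state on entry and restore it if the block raises. The sketch below shows that approach on an in-memory list; it is an assumption about a possible implementation, not a description of any existing Registry subclass, and `ToyRegistry` and its `rows` attribute are hypothetical.

```python
import copy
from contextlib import contextmanager


class ToyRegistry:
    """Hypothetical registry whose entire state is a list of rows."""

    def __init__(self) -> None:
        self.rows = []

    @contextmanager
    def transaction(self):
        # Remember the state on entry so it can be restored on failure.
        snapshot = copy.deepcopy(self.rows)
        try:
            yield
        except BaseException:
            self.rows = snapshot  # roll back to the state at entry
            raise


registry = ToyRegistry()
with registry.transaction():
    registry.rows.append("outer")
    try:
        with registry.transaction():
            registry.rows.append("inner")
            raise RuntimeError("inner block failed")
    except RuntimeError:
        pass  # the inner changes were rolled back; the outer block continues
```

Each nesting level restores only to its own entry point, so a failure that is caught inside an outer block undoes the inner changes and leaves the outer ones in place.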
