Registry

class lsst.daf.butler.Registry(registryConfig, schemaConfig=None, dimensionConfig=None, create=False, butlerRoot=None)

Bases: object

Registry interface.

Parameters:
registryConfig : RegistryConfig

Registry configuration.

schemaConfig : SchemaConfig, optional

Schema configuration.

dimensionConfig : DimensionConfig, Config or str, optional

DimensionGraph configuration.

Attributes Summary

defaultConfigFile Path to configuration defaults.
limited If True, this Registry does not maintain Dimension metadata or relationships (bool).
pixelization Object that interprets skypix Dimension values (lsst.sphgeom.Pixelization).

Methods Summary

addDataset(datasetType, dataId, run[, …]) Add a Dataset entry to the Registry.
addDatasetLocation(ref, datastoreName) Add datastore name locating a given dataset.
addDimensionEntry(dimension[, dataId, entry]) Add a new Dimension entry.
addDimensionEntryList(dimension, dataIdList) Add multiple new Dimension entries.
addExecution(execution) Add a new Execution to the Registry.
addQuantum(quantum) Add a new Quantum to the Registry.
addRun(run) Add a new Run to the Registry.
associate(collection, refs) Add existing Datasets to a collection, implicitly creating the collection if it does not already exist.
attachComponent(name, parent, component) Attach a component to a dataset.
disassociate(collection, refs) Remove existing Datasets from a collection.
ensureRun(run) Conditionally add a new Run to the Registry.
expandDataId([dataId, dimension, metadata, …]) Expand a data ID to include additional information.
find(collection, datasetType[, dataId]) Lookup a dataset.
findDimensionEntries(dimension) Return all Dimension entries corresponding to the named dimension.
findDimensionEntry(dimension[, dataId]) Return a Dimension entry corresponding to a DataId.
fromConfig(registryConfig[, schemaConfig, …]) Create Registry subclass instance from config.
getAllCollections() Get names of all the collections found in this repository.
getAllDatasetTypes() Get every registered DatasetType.
getDataset(id[, datasetType, dataId]) Retrieve a Dataset entry.
getDatasetLocations(ref) Retrieve datastore locations for a given dataset.
getDatasetType(name) Get the DatasetType.
getExecution(id) Retrieve an Execution.
getQuantum(id) Retrieve a Quantum.
getRun([id, collection]) Get a Run corresponding to its collection or id.
makeDataIdPacker(name[, dataId]) Create an object that can pack certain data IDs into integers.
makeDatabaseDict(table, types, key, value[, …]) Construct a DatabaseDict backed by a table in the same database as this Registry.
makeRun(collection) Create a new Run in the Registry and return it.
markInputUsed(quantum, ref) Record the given DatasetRef as an actual (not just predicted) input of the given Quantum.
packDataId(name[, dataId, returnMaxBits]) Pack the given DataId into an integer.
registerDatasetType(datasetType) Add a new DatasetType to the Registry.
removeDataset(ref) Remove a dataset from the Registry.
removeDatasetLocation(datastoreName, ref) Remove datastore location associated with this dataset.
selectMultipleDatasetTypes(originInfo[, …]) Evaluate a filter expression and lists of DatasetTypes and return a set of dimension values.
setConfigRoot(root, config, full) Set any filesystem-dependent config options for this Registry to be appropriate for a new empty repository with the given root.
setDimensionRegion([dataId, update, region]) Set the region field for a Dimension instance or a combination thereof and update associated spatial join tables.
transaction() Optionally implemented in Registry subclasses to provide exception safety guarantees in case an exception is raised in the enclosed block.

Attributes Documentation

defaultConfigFile = None

Path to configuration defaults. Relative to $DAF_BUTLER_DIR/config or absolute path. Can be None if no defaults specified.

limited

If True, this Registry does not maintain Dimension metadata or relationships (bool).

pixelization

Object that interprets skypix Dimension values (lsst.sphgeom.Pixelization).

None for limited registries.

Methods Documentation

addDataset(datasetType, dataId, run, producer=None, recursive=False, **kwds)

Add a Dataset entry to the Registry.

This always adds a new Dataset; to associate an existing Dataset with a new collection, use associate.

Parameters:
datasetType : DatasetType or str

A DatasetType or the name of one.

dataId : dict or DataId

A dict-like object containing the Dimension links that identify the dataset within a collection.

run : Run

The Run instance that produced the Dataset. Ignored if producer is passed (producer.run is then used instead). A Run must be provided by one of the two arguments.

producer : Quantum

Unit of work that produced the Dataset. May be None to store no provenance information, but if present the Quantum must already have been added to the Registry.

recursive : bool

If True, recursively add Dataset and attach entries for component Datasets as well.

kwds

Additional keyword arguments passed to the DataId constructor to convert dataId to a true DataId or augment an existing one.

Returns:
ref : DatasetRef

A newly-created DatasetRef instance.

Raises:
ConflictingDefinitionError

If a Dataset with the given DatasetRef already exists in the given collection.

Exception

If dataId contains unknown or invalid Dimension entries.
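
The conflict rule above can be sketched with a small in-memory stand-in. All names here (ToyRegistry, the simplified ConflictingDefinitionError, the string-valued run) are illustrative, not the real lsst.daf.butler classes:

```python
import itertools

class ConflictingDefinitionError(RuntimeError):
    """Stand-in for the Registry's conflict exception."""

class ToyRegistry:
    def __init__(self):
        self._ids = itertools.count(1)
        # (collection, datasetType, frozen dataId) -> dataset id
        self._datasets = {}

    def addDataset(self, datasetType, dataId, run):
        """Always create a new dataset; conflict if the same DatasetType
        and data ID already exist in the run's collection."""
        key = (run, datasetType, frozenset(dataId.items()))
        if key in self._datasets:
            raise ConflictingDefinitionError(f"{datasetType}@{dataId} already in {run!r}")
        ref = next(self._ids)
        self._datasets[key] = ref
        return ref

reg = ToyRegistry()
ref = reg.addDataset("calexp", {"instrument": "HSC", "visit": 42}, run="run1")
try:
    reg.addDataset("calexp", {"instrument": "HSC", "visit": 42}, run="run1")
except ConflictingDefinitionError:
    pass  # same data ID in the same collection conflicts
```

To attach an existing dataset to another collection instead of inserting a new one, associate is the right call.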

addDatasetLocation(ref, datastoreName)

Add datastore name locating a given dataset.

Typically used by Datastore.

Parameters:
ref : DatasetRef

A reference to the dataset for which to add storage information.

datastoreName : str

Name of the datastore holding this dataset.

Raises:
AmbiguousDatasetError

Raised if ref.id is None.

addDimensionEntry(dimension, dataId=None, entry=None, **kwds)

Add a new Dimension entry.

Parameters:
dimension : str or Dimension
Either a Dimension object or the name of one.
dataId : dict or DataId, optional
A dict-like object containing the Dimension links that form the primary key of the row to insert. If this is a full DataId object, dataId.entries[dimension] will be updated with entry and then inserted into the Registry.
entry : dict
Dictionary that maps column name to column value.
kwds
Additional keyword arguments passed to the DataId constructor to convert dataId to a true DataId or augment an existing one.

If entry includes a “region” key, setDimensionRegion will automatically be called to set it in any associated spatial join tables. Region fields associated with a combination of Dimensions must be explicitly set separately.

Returns:
dataId : DataId

A Data ID for exactly the given dimension that includes the added entry.

Raises:
TypeError

If the given Dimension does not have explicit entries in the registry.

ConflictingDefinitionError

If an entry with the primary-key defined in values is already present.

NotImplementedError

Raised if limited is True.

addDimensionEntryList(dimension, dataIdList, entry=None, **kwds)

Add multiple new Dimension entries.

Parameters:
dimension : str or Dimension
Either a Dimension object or the name of one.
dataIdList : list of dict or DataId
A list of dict-like objects containing the Dimension links that form the primary keys of the rows to insert. If these are full DataId objects, each dataId.entries[dimension] will be updated with entry and then inserted into the Registry.
entry : dict
Dictionary that maps column name to column value.
kwds
Additional keyword arguments passed to the DataId constructor to convert each dataId to a true DataId or augment an existing one.

If entry includes a “region” key, regions will automatically be added to any associated spatial join tables. Region fields associated with a combination of Dimensions must be explicitly set separately.

Returns:
dataId : DataId

A Data ID for exactly the given dimension that includes the added entry.

Raises:
TypeError

If the given Dimension does not have explicit entries in the registry.

ConflictingDefinitionError

If an entry with the primary-key defined in values is already present.

NotImplementedError

Raised if limited is True.

addExecution(execution)

Add a new Execution to the Registry.

If execution.id is None the Registry will update it to that of the newly inserted entry.

Parameters:
execution : Execution

Instance to add to the Registry. The given Execution must not already be present in the Registry.

Raises:
ConflictingDefinitionError

If execution is already present in the Registry.

addQuantum(quantum)

Add a new Quantum to the Registry.

Parameters:
quantum : Quantum

Instance to add to the Registry. The given Quantum must not already be present in the Registry (or any other); therefore:

  • its run attribute must be set to an existing Run;
  • its predictedInputs attribute must be fully populated with DatasetRefs;
  • its actualInputs and outputs will be ignored.

addRun(run)

Add a new Run to the Registry.

Parameters:
run : Run

Instance to add to the Registry. The given Run must not already be present in the Registry (or any other). Therefore its id must be None and its collection must not be associated with any existing Run.

Raises:
ConflictingDefinitionError

If a run already exists with this collection.

associate(collection, refs)

Add existing Datasets to a collection, implicitly creating the collection if it does not already exist.

If a DatasetRef with the exact same dataset_id is already in a collection, nothing is changed. If a DatasetRef with the same DatasetType and dimension values but a different dataset_id exists in the collection, ConflictingDefinitionError is raised.

Parameters:
collection : str

Indicates the collection the Datasets should be associated with.

refs : iterable of DatasetRef

An iterable of DatasetRef instances that already exist in this Registry. All component datasets will be associated with the collection as well.

Raises:
ConflictingDefinitionError

If a Dataset with the given DatasetRef already exists in the given collection.
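
The two association rules can be sketched with plain dicts; associate, the refs tuples, and this ConflictingDefinitionError are stand-ins, not the real API:

```python
class ConflictingDefinitionError(RuntimeError):
    """Stand-in for the Registry's conflict exception."""

def associate(collections, collection, refs):
    """collections maps name -> {(datasetType, frozen dataId): dataset_id}."""
    contents = collections.setdefault(collection, {})  # implicit creation
    for dataset_id, dataset_type, data_id in refs:
        key = (dataset_type, frozenset(data_id.items()))
        existing = contents.get(key)
        if existing == dataset_id:
            continue  # same dataset_id already associated: nothing changes
        if existing is not None:
            raise ConflictingDefinitionError(key)  # same values, different id
        contents[key] = dataset_id

colls = {}
associate(colls, "shared", [(11, "calexp", {"visit": 1})])
associate(colls, "shared", [(11, "calexp", {"visit": 1})])  # no-op
```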

attachComponent(name, parent, component)

Attach a component to a dataset.

Parameters:
name : str

Name of the component.

parent : DatasetRef

A reference to the parent dataset. Will be updated to reference the component.

component : DatasetRef

A reference to the component dataset.

Raises:
AmbiguousDatasetError

Raised if parent.id or component.id is None.

disassociate(collection, refs)

Remove existing Datasets from a collection.

collection and ref combinations that are not currently associated are silently ignored.

Parameters:
collection : str

The collection the Datasets should no longer be associated with.

refs : list of DatasetRef

A list of DatasetRef instances that already exist in this Registry. All component datasets will also be removed.

Raises:
AmbiguousDatasetError

Raised if any(ref.id is None for ref in refs).

ensureRun(run)

Conditionally add a new Run to the Registry.

If run.id is None, or no Run with this id exists in the Registry yet, add it. Otherwise, ensure that the provided run is identical to the one already in the registry.

Parameters:
run : Run

Instance to add to the Registry.

Raises:
ConflictingDefinitionError

If run already exists, but is not identical.
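
The conditional-insert semantics can be sketched with a plain dict mapping run id to collection; ensure_run, the id-assignment rule, and the simplified ConflictingDefinitionError are illustrative stand-ins:

```python
class ConflictingDefinitionError(RuntimeError):
    """Stand-in for the Registry's conflict exception."""

def ensure_run(runs, run_id, collection):
    """Add the run if its id is unused; otherwise require it to be identical.

    `runs` maps run id -> collection.
    """
    if run_id is None or run_id not in runs:
        if run_id is None:
            run_id = max(runs, default=0) + 1  # mimic id assignment on insert
        runs[run_id] = collection
        return run_id
    if runs[run_id] != collection:
        raise ConflictingDefinitionError(run_id)
    return run_id

runs = {}
rid = ensure_run(runs, None, "nightly")  # inserted, id assigned
ensure_run(runs, rid, "nightly")         # identical: no-op
```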

expandDataId(dataId=None, *, dimension=None, metadata=None, region=False, update=False, **kwds)

Expand a data ID to include additional information.

expandDataId always returns a true DataId and ensures that its entries dict contains (at least) values for all implied dependencies.

Parameters:
dataId : dict or DataId

A dict-like object containing the Dimension links that include the primary keys of the rows to query. If this is a true DataId, the object will be updated in-place.

dimension : Dimension or str

A dimension passed to the DataId constructor to create a true DataId or augment an existing one.

metadata : collections.abc.Mapping, optional

A mapping from Dimension or str name to column name, indicating fields to read into dataId.entries. If dimension is provided, may instead be a sequence of column names for that dimension.

region : bool

If True and the given DataId is uniquely associated with a region on the sky, obtain that region from the Registry and attach it as dataId.region.

update : bool

If True, assume existing entries and regions in the given DataId are out-of-date and should be updated by values in the database. If False, existing values will be assumed to be correct and database queries will only be executed if they are missing.

kwds

Additional keyword arguments passed to the DataId constructor to convert dataId to a true DataId or augment an existing one.

Returns:
dataId : DataId

A Data ID with all requested data populated.

Raises:
NotImplementedError

Raised if limited is True.
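
The "implied dependencies" idea can be illustrated with a hypothetical lookup table standing in for the Registry's dimension tables; expand_data_id and VISIT_TABLE are not real API names:

```python
# Hypothetical table: what a visit implies about other dimension entries.
VISIT_TABLE = {42: {"physical_filter": "HSC-R", "exposure_time": 30.0}}

def expand_data_id(data_id, update=False):
    """Fill in values implied by the visit. Existing values win unless
    update=True, mirroring the `update` flag described above."""
    expanded = dict(data_id)
    for key, value in VISIT_TABLE[data_id["visit"]].items():
        if update or key not in expanded:
            expanded[key] = value
    return expanded

full = expand_data_id({"visit": 42})
```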

find(collection, datasetType, dataId=None, **kwds)

Lookup a dataset.

This can be used to obtain a DatasetRef that permits the dataset to be read from a Datastore.

Parameters:
collection : str

Identifies the collection to search.

datasetType : DatasetType or str

A DatasetType or the name of one.

dataId : dict or DataId, optional

A dict-like object containing the Dimension links that identify the dataset within a collection.

kwds

Additional keyword arguments passed to the DataId constructor to convert dataId to a true DataId or augment an existing one.

Returns:
ref : DatasetRef

A ref to the Dataset, or None if no matching Dataset was found.

Raises:
LookupError

If one or more data ID keys are missing.
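
A toy lookup mirroring these semantics (the datasets mapping, required_keys argument, and integer refs are all illustrative):

```python
def find(datasets, collection, dataset_type, data_id, required_keys):
    """Return the matching dataset ref, or None if no dataset matches;
    raise LookupError when the data ID is missing required keys."""
    missing = set(required_keys) - set(data_id)
    if missing:
        raise LookupError(f"missing data ID keys: {sorted(missing)}")
    return datasets.get((collection, dataset_type, frozenset(data_id.items())))

datasets = {("run1", "calexp", frozenset({"visit": 42}.items())): 7}
assert find(datasets, "run1", "calexp", {"visit": 42}, ["visit"]) == 7
assert find(datasets, "run1", "calexp", {"visit": 1}, ["visit"]) is None
```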

findDimensionEntries(dimension)

Return all Dimension entries corresponding to the named dimension.

Parameters:
dimension : str or Dimension

Either a Dimension object or the name of one.

Returns:
entries : list of dict

A list of dicts, each containing the Dimension values for one entry of the Dimension. Returns an empty list if no entries have been added for this dimension.

Raises:
NotImplementedError

Raised if limited is True.

findDimensionEntry(dimension, dataId=None, **kwds)

Return a Dimension entry corresponding to a DataId.

Parameters:
dimension : str or Dimension

Either a Dimension object or the name of one.

dataId : dict or DataId, optional

A dict-like object containing the Dimension links that form the primary key of the row to retrieve. If this is a full DataId object, dataId.entries[dimension] will be updated with the entry obtained from the Registry.

kwds

Additional keyword arguments passed to the DataId constructor to convert dataId to a true DataId or augment an existing one.

Returns:
entry : dict

Dictionary with all Dimension values, or None if no matching entry is found for the given DataId.

Raises:
NotImplementedError

Raised if limited is True.

static fromConfig(registryConfig, schemaConfig=None, dimensionConfig=None, create=False, butlerRoot=None)

Create Registry subclass instance from config.

Uses registry.cls from config to determine which subclass to instantiate.

Parameters:
registryConfig : ButlerConfig, RegistryConfig, Config or str

Registry configuration.

schemaConfig : SchemaConfig, Config or str, optional

Schema configuration. Can be read from the supplied registryConfig if the relevant component is defined and schemaConfig is None.

dimensionConfig : DimensionConfig, Config or str, optional

DimensionGraph configuration. Can be read from the supplied registryConfig if the relevant component is defined and dimensionConfig is None.

create : bool

If True, assume an empty Registry and create a new one.

Returns:
registry : Registry (subclass)

A new Registry subclass instance.

getAllCollections()

Get names of all the collections found in this repository.

Returns:
collections : set of str

The collections.

getAllDatasetTypes()

Get every registered DatasetType.

Returns:
types : frozenset of DatasetType

Every DatasetType in the registry.

getDataset(id, datasetType=None, dataId=None)

Retrieve a Dataset entry.

Parameters:
id : int

The unique identifier for the Dataset.

datasetType : DatasetType, optional

The DatasetType of the dataset to retrieve. This is used to short-circuit retrieving the DatasetType, so if provided, the caller is guaranteeing that it is what would have been retrieved.

dataId : DataId, optional

A Dimension-based identifier for the dataset within a collection, possibly containing additional metadata. This is used to short-circuit retrieving the DataId, so if provided, the caller is guaranteeing that it is what would have been retrieved.

Returns:
ref : DatasetRef

A ref to the Dataset, or None if no matching Dataset was found.

getDatasetLocations(ref)

Retrieve datastore locations for a given dataset.

Typically used by Datastore.

Parameters:
ref : DatasetRef

A reference to the dataset for which to retrieve storage information.

Returns:
datastores : set of str

All the matching datastores holding this dataset. Empty set if the dataset does not exist anywhere.

Raises:
AmbiguousDatasetError

Raised if ref.id is None.

getDatasetType(name)

Get the DatasetType.

Parameters:
name : str

Name of the type.

Returns:
type : DatasetType

The DatasetType associated with the given name.

Raises:
KeyError

Requested named DatasetType could not be found in registry.

getExecution(id)

Retrieve an Execution.

Parameters:
id : int

The unique identifier for the Execution.

getQuantum(id)

Retrieve a Quantum.

Parameters:
id : int

The unique identifier for the Quantum.

getRun(id=None, collection=None)

Get a Run corresponding to its collection or id.

Parameters:
id : int, optional

Lookup by run id.

collection : str, optional

If given, lookup by collection name instead.

Returns:
run : Run

The Run instance.

Raises:
ValueError

Must supply one of collection or id.

makeDataIdPacker(name, dataId=None, **kwds)

Create an object that can pack certain data IDs into integers.

Parameters:
name : str

Name of the packer, as given in the Registry configuration.

dataId : dict or DataId, optional

Data ID that identifies at least the “given” dimensions of the packer.

kwds

Additional keyword arguments used to augment or override the given data ID.

Returns:
packer : DataIdPacker

Instance of a subclass of DataIdPacker.

makeDatabaseDict(table, types, key, value, lengths=None)

Construct a DatabaseDict backed by a table in the same database as this Registry.

Parameters:
table : str

Name of the table that backs the returned DatabaseDict. If this table already exists, its schema must include at least everything in types.

types : dict

A dictionary mapping str field names to type objects, containing all fields to be held in the database.

key : str

The name of the field to be used as the dictionary key. Must not be present in value._fields.

value : type

The type used for the dictionary’s values, typically a namedtuple. Must have a _fields class attribute that is a tuple of field names (i.e. as defined by namedtuple); these field names must also appear in the types arg. Must also have a _make class method that constructs an instance from a sequence of values (again, as defined by namedtuple).

lengths : dict, optional

Specific lengths of string fields. Defaults will be used if not specified.

Returns:
databaseDict : DatabaseDict

DatabaseDict backed by this registry.
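
The same idea can be sketched with sqlite3 from the standard library: a mapping whose entries live in a database table keyed on one field. SqliteDict is an analogue for illustration, not the DatabaseDict implementation:

```python
import sqlite3

class SqliteDict:
    """A mapping whose entries live in a database table keyed on one field."""

    def __init__(self, conn, table, key, fields):
        self._conn, self._table, self._key, self._fields = conn, table, key, fields
        columns = ", ".join([f"{key} PRIMARY KEY"] + list(fields))
        conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({columns})")

    def __setitem__(self, k, value):
        # value is a tuple matching `fields`, like a namedtuple's field order
        placeholders = ", ".join("?" for _ in range(1 + len(self._fields)))
        self._conn.execute(
            f"INSERT OR REPLACE INTO {self._table} VALUES ({placeholders})",
            (k, *value),
        )

    def __getitem__(self, k):
        row = self._conn.execute(
            f"SELECT {', '.join(self._fields)} FROM {self._table} "
            f"WHERE {self._key} = ?",
            (k,),
        ).fetchone()
        if row is None:
            raise KeyError(k)
        return row

d = SqliteDict(sqlite3.connect(":memory:"), "runs", "id", ["collection", "host"])
d[1] = ("nightly", "lsst-dev")
```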

makeRun(collection)

Create a new Run in the Registry and return it.

If a run with this collection already exists, return that instead.

Parameters:
collection : str

The collection used to identify all inputs and outputs of the Run.

Returns:
run : Run

A new Run instance.

markInputUsed(quantum, ref)

Record the given DatasetRef as an actual (not just predicted) input of the given Quantum.

This updates both the Registry’s Quantum table and the Python Quantum.actualInputs attribute.

Parameters:
quantum : Quantum

Producer to update. Will be updated in this call.

ref : DatasetRef

To set as actually used input.

Raises:
KeyError

If quantum is not a predicted consumer for ref.

packDataId(name, dataId=None, *, returnMaxBits=False, **kwds)

Pack the given DataId into an integer.

Parameters:
name : str

Name of the packer, as given in the Registry configuration.

dataId : dict or DataId, optional

Data ID that identifies at least the “required” dimensions of the packer.

returnMaxBits : bool

If True, return a tuple of (packed, self.maxBits).

kwds

Additional keyword arguments used to augment or override the given data ID.

Returns:
packed : int

Packed integer ID.

maxBits : int, optional

Maximum number of nonzero bits in packed. Not returned unless returnMaxBits is True.
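
What "packing a data ID into an integer" means can be shown for a hypothetical packer over (tract, patch) with fixed field widths; the real DataIdPacker is configured in the Registry, not hard-coded like this:

```python
# Illustrative field widths; a real packer derives these from the Registry.
FIELD_BITS = {"tract": 16, "patch": 8}

def pack_data_id(data_id, return_max_bits=False):
    """Pack the named fields into one integer, low bits first."""
    packed, shift = 0, 0
    for name, bits in FIELD_BITS.items():
        value = data_id[name]
        assert 0 <= value < (1 << bits), f"{name} out of range"
        packed |= value << shift
        shift += bits
    # shift now equals the maximum number of nonzero bits in `packed`
    return (packed, shift) if return_max_bits else packed

packed, max_bits = pack_data_id({"tract": 9813, "patch": 42}, return_max_bits=True)
```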

registerDatasetType(datasetType)

Add a new DatasetType to the Registry.

It is not an error to register the same DatasetType twice.

Parameters:
datasetType : DatasetType

The DatasetType to be added.

Returns:
inserted : bool

True if datasetType was inserted, False if an identical existing DatasetType was found. Note that in either case the DatasetType is guaranteed to be defined in the Registry consistently with the given definition.

Raises:
ValueError

Raised if the dimensions or storage class are invalid.

ConflictingDefinitionError

Raised if this DatasetType is already registered with a different definition.
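
The idempotent-registration contract can be sketched with a dict of definitions; register_dataset_type and this ConflictingDefinitionError are illustrative stand-ins:

```python
class ConflictingDefinitionError(RuntimeError):
    """Stand-in for the Registry's conflict exception."""

def register_dataset_type(registry, name, dimensions, storage_class):
    """Return True on first insert, False for an identical re-registration;
    raise on a conflicting redefinition."""
    definition = (frozenset(dimensions), storage_class)
    existing = registry.get(name)
    if existing is None:
        registry[name] = definition
        return True
    if existing != definition:
        raise ConflictingDefinitionError(name)
    return False

types = {}
assert register_dataset_type(types, "calexp", {"instrument", "visit"}, "ExposureF")
assert not register_dataset_type(types, "calexp", {"instrument", "visit"}, "ExposureF")
```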

removeDataset(ref)

Remove a dataset from the Registry.

The dataset and all components will be removed unconditionally from all collections, and any associated Quantum records will also be removed. Datastore records will not be deleted; the caller is responsible for ensuring that the dataset has already been removed from all Datastores.

Parameters:
ref : DatasetRef

Reference to the dataset to be removed. Must include a valid id attribute, and should be considered invalidated upon return.

Raises:
AmbiguousDatasetError

Raised if ref.id is None.

OrphanedRecordError

Raised if the dataset is still present in any Datastore.

removeDatasetLocation(datastoreName, ref)

Remove datastore location associated with this dataset.

Typically used by Datastore when a dataset is removed.

Parameters:
datastoreName : str

Name of this Datastore.

ref : DatasetRef

A reference to the dataset for which information is to be removed.

Raises:
AmbiguousDatasetError

Raised if ref.id is None.

selectMultipleDatasetTypes(originInfo, expression=None, required=(), optional=(), prerequisite=(), perDatasetTypeDimensions=(), expandDataIds=True)

Evaluate a filter expression and lists of DatasetTypes and return a set of dimension values.

The returned rows consist of combinations of dimensions participating in the transformation from required to optional dataset types, restricted by existing datasets and the filter expression.

Parameters:
originInfo : DatasetOriginInfo

Object which provides names of the input/output collections.

expression : str

An expression that limits the Dimensions and (indirectly) the Datasets returned.

required : iterable of DatasetType or str

The list of DatasetTypes whose Dimensions will be included in the returned column set. Output is limited to the Datasets of these DatasetTypes which already exist in the registry.

optional : iterable of DatasetType or str

The list of DatasetTypes whose Dimensions will be included in the returned column set. Datasets of these types may or may not exist in the registry.

prerequisite : iterable of DatasetType or str

DatasetTypes that should not constrain the query results, but must be present for all result rows. These are included with a LEFT OUTER JOIN, but the results are checked for NULL. Unlike regular inputs, prerequisite input lookups may be deferred (by some Registry implementations). Any DatasetTypes that are present in both required and prerequisite are considered prerequisite.

perDatasetTypeDimensions : iterable of Dimension or str, optional

Dimensions (or str names thereof) for which different dataset types do not need to have the same values in each result row.

expandDataIds : bool

If True (default), expand all data IDs when returning them.

Yields:
row : MultipleDatasetQueryRow

Each row is a unique combination of dimension values participating in the transformation.

Raises:
NotImplementedError

Raised if limited is True.

LookupError

Raised (during iteration) if a prerequisite dataset is not found.

classmethod setConfigRoot(root, config, full)

Set any filesystem-dependent config options for this Registry to be appropriate for a new empty repository with the given root.

Parameters:
root : str

Filesystem path to the root of the data repository.

config : Config

A Config to update. Only the subset understood by this component will be updated. Will not expand defaults.

full : Config

A complete config with all defaults expanded that can be converted to a RegistryConfig. Read-only and will not be modified by this method. Repository-specific options that should not be obtained from defaults when Butler instances are constructed should be copied from full to config.

setDimensionRegion(dataId=None, *, update=True, region=None, **kwds)

Set the region field for a Dimension instance or a combination thereof and update associated spatial join tables.

Parameters:
dataId : dict or DataId

A dict-like object containing the Dimension links that form the primary key of the row to insert or update. If this is a full DataId, dataId.region will be set to region (if region is not None) and then used to update or insert into the Registry.

update : bool

If True, existing region information for these Dimensions is being replaced. This is usually required because Dimension entries are assumed to be pre-inserted prior to calling this function.

region : lsst.sphgeom.ConvexPolygon, optional

The region to update or insert into the Registry. If not provided, dataId.region must not be None.

kwds

Additional keyword arguments passed to the DataId constructor to convert dataId to a true DataId or augment an existing one.

Returns:
dataId : DataId

A Data ID with its region attribute set.

Raises:
NotImplementedError

Raised if limited is True.

transaction()

Optionally implemented in Registry subclasses to provide exception safety guarantees in case an exception is raised in the enclosed block.

This context manager may be nested (i.e. any implementation by a Registry subclass must nest properly).

Warning

The level of exception safety is not guaranteed by this API. Depending on the implementation in the subclass, it may provide strong exception safety, rolling back any changes and leaving the state unchanged, or it may do nothing, leaving the underlying Registry corrupted.
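
One way nested, exception-safe transactions can behave is sketched below with contextlib over an in-memory list of rows; ToyRegistry and its savepoint logic are illustrative, and as the warning notes, real subclasses may provide weaker guarantees:

```python
from contextlib import contextmanager

class ToyRegistry:
    def __init__(self):
        self.rows = []

    @contextmanager
    def transaction(self):
        savepoint = len(self.rows)  # remember state on entry; nests naturally
        try:
            yield
        except Exception:
            del self.rows[savepoint:]  # roll back to the savepoint
            raise

reg = ToyRegistry()
with reg.transaction():
    reg.rows.append("run")
try:
    with reg.transaction():
        reg.rows.append("bad")
        raise RuntimeError("boom")
except RuntimeError:
    pass
# reg.rows holds only "run": the failed block's changes were rolled back
```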