Registry
class lsst.daf.butler.Registry(registryConfig, schemaConfig=None, dimensionConfig=None, create=False, butlerRoot=None)
Bases: object
Registry interface.
Parameters:
- registryConfig : RegistryConfig
  Registry configuration.
- schemaConfig : SchemaConfig, optional
  Schema configuration.
- dimensionConfig : DimensionConfig or Config
  DimensionGraph configuration.
Attributes Summary
defaultConfigFile: Path to configuration defaults.
limited: If True, this Registry does not maintain Dimension metadata or relationships (bool).
pixelization: Object that interprets SkyPix Dimension values (lsst.sphgeom.Pixelization).
Methods Summary
addDataset(datasetType, dataId, run[, …]): Adds a Dataset entry to the Registry.
addDatasetLocation(ref, datastoreName): Add datastore name locating a given dataset.
addDimensionEntry(dimension[, dataId, entry]): Add a new Dimension entry.
addExecution(execution): Add a new Execution to the Registry.
addQuantum(quantum): Add a new Quantum to the Registry.
addRun(run): Add a new Run to the Registry.
associate(collection, refs): Add existing Datasets to a collection, implicitly creating the collection if it does not already exist.
attachComponent(name, parent, component): Attach a component to a dataset.
close(): This method performs any steps to properly close a registry instance.
disassociate(collection, refs): Remove existing Datasets from a collection.
ensureRun(run): Conditionally add a new Run to the Registry.
expandDataId([dataId, dimension, metadata, …]): Expand a data ID to include additional information.
find(collection, datasetType[, dataId]): Look up a dataset.
findDimensionEntries(dimension): Return all Dimension entries corresponding to the named dimension.
findDimensionEntry(dimension[, dataId]): Return a Dimension entry corresponding to a DataId.
fromConfig(registryConfig[, schemaConfig, …]): Create Registry subclass instance from config.
getAllCollections(): Get names of all the collections found in this repository.
getAllDatasetTypes(): Get every registered DatasetType.
getDataset(id[, datasetType, dataId]): Retrieve a Dataset entry.
getDatasetLocations(ref): Retrieve datastore locations for a given dataset.
getDatasetType(name): Get the DatasetType.
getExecution(id): Retrieve an Execution.
getQuantum(id): Retrieve a Quantum.
getRun([id, collection]): Get a Run corresponding to its collection or id.
makeDataIdPacker(name[, dataId]): Create an object that can pack certain data IDs into integers.
makeDatabaseDict(table, types, key, value[, …]): Construct a DatabaseDict backed by a table in the same database as this Registry.
makeRun(collection): Create a new Run in the Registry and return it.
markInputUsed(quantum, ref): Record the given DatasetRef as an actual (not just predicted) input of the given Quantum.
packDataId(name[, dataId, returnMaxBits]): Pack the given DataId into an integer.
registerDatasetType(datasetType): Add a new DatasetType to the Registry.
removeDataset(ref): Remove a dataset from the Registry.
removeDatasetLocation(datastoreName, ref): Remove datastore location associated with this dataset.
selectMultipleDatasetTypes(originInfo[, …]): Evaluate a filter expression and lists of DatasetTypes and return a set of dimension values.
setConfigRoot(root, config, full): Set any filesystem-dependent config options for this Registry to be appropriate for a new empty repository with the given root.
setDimensionRegion([dataId, update, region]): Set the region field for a Dimension instance or a combination thereof and update associated spatial join tables.
transaction(): Optionally implemented in Registry subclasses to provide exception safety guarantees in case an exception is raised in the enclosed block.
Attributes Documentation
-
defaultConfigFile = None
Path to configuration defaults. Relative to $DAF_BUTLER_DIR/config or absolute path. Can be None if no defaults specified.
-
pixelization
Object that interprets SkyPix Dimension values (lsst.sphgeom.Pixelization). None for limited registries.
Methods Documentation
-
addDataset(datasetType, dataId, run, producer=None, recursive=False, **kwds)
Adds a Dataset entry to the Registry.
This always adds a new Dataset; to associate an existing Dataset with a new collection, use associate.
Parameters:
- datasetType : DatasetType or str
  A DatasetType or the name of one.
- dataId : dict or DataId
  A dict-like object containing the Dimension links that identify the dataset within a collection.
- run : Run
  The Run instance that produced the Dataset. Ignored if producer is passed (producer.run is then used instead). A Run must be provided by one of the two arguments.
- producer : Quantum
  Unit of work that produced the Dataset. May be None to store no provenance information, but if present the Quantum must already have been added to the Registry.
- recursive : bool
  If True, recursively add Dataset and attach entries for component Datasets as well.
- kwds
  Additional keyword arguments passed to the DataId constructor to convert dataId to a true DataId or augment an existing one.
Returns:
- ref : DatasetRef
  A newly-created DatasetRef instance.
Raises:
- ConflictingDefinitionError
  If a Dataset with the given DatasetRef already exists in the given collection.
- Exception
  If dataId contains unknown or invalid Dimension entries.
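A minimal usage sketch of addDataset, assuming `registry` is an already-constructed Registry and the dataset type is already registered. It uses only the signatures documented on this page (makeRun, transaction, addDataset); the helper name `add_dataset_in_run` is hypothetical and not part of the API.

```python
def add_dataset_in_run(registry, dataset_type, data_id, collection):
    """Add one Dataset to ``collection`` and return its DatasetRef.

    ``registry`` is assumed to be a Registry instance. This helper is
    illustrative only; it is not provided by lsst.daf.butler.
    """
    # makeRun returns the existing Run for this collection if there is one,
    # otherwise it creates a new Run.
    run = registry.makeRun(collection)
    # transaction() is a context manager; how much exception safety it gives
    # depends on the Registry subclass.
    with registry.transaction():
        ref = registry.addDataset(dataset_type, data_id, run=run)
    return ref
```

Passing `run` explicitly matches the case where no `producer` Quantum is supplied, so no provenance is stored.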
-
addDatasetLocation(ref, datastoreName)
Add datastore name locating a given dataset.
Typically used by Datastore.
Parameters:
- ref : DatasetRef
  A reference to the dataset for which to add storage information.
- datastoreName : str
  Name of the datastore holding this dataset.
Raises:
- AmbiguousDatasetError
  Raised if ref.id is None.
-
addDimensionEntry(dimension, dataId=None, entry=None, **kwds)
Add a new Dimension entry.
Parameters:
- dimension : str or Dimension
  Either a Dimension object or the name of one.
- dataId : dict or DataId, optional
  A dict-like object containing the Dimension links that form the primary key of the row to insert. If this is a full DataId object, dataId.entries[dimension] will be updated with entry and then inserted into the Registry.
- entry : dict
  Dictionary that maps column name to column value.
- kwds
  Additional keyword arguments passed to the DataId constructor to convert dataId to a true DataId or augment an existing one.
If entry includes a "region" key, setDimensionRegion will automatically be called to set it and any associated spatial join tables. Region fields associated with a combination of Dimensions must be explicitly set separately.
Returns:
- dataId : DataId
  A Data ID for exactly the given dimension that includes the added entry.
Raises:
-
addExecution(execution)
Add a new Execution to the Registry.
If execution.id is None the Registry will update it to that of the newly inserted entry.
Parameters:
Raises:
- ConflictingDefinitionError
  If execution is already present in the Registry.
-
addQuantum(quantum)
Add a new Quantum to the Registry.
Parameters:
- quantum : Quantum
  Instance to add to the Registry. The given Quantum must not already be present in the Registry (or any other), therefore:
  - its run attribute must be set to an existing Run;
  - its predictedInputs attribute must be fully populated with DatasetRefs;
  - its actualInputs and outputs will be ignored.
-
addRun(run)
Add a new Run to the Registry.
Parameters:
Raises:
- ConflictingDefinitionError
  If a run already exists with this collection.
-
associate(collection, refs)
Add existing Datasets to a collection, implicitly creating the collection if it does not already exist.
If a DatasetRef with exactly the same dataset_id is already in the collection, nothing is changed. If a DatasetRef with the same DatasetType and dimension values but a different dataset_id exists in the collection, ValueError is raised.
Parameters:
- collection : str
  Indicates the collection the Datasets should be associated with.
- refs : iterable of DatasetRef
  An iterable of DatasetRef instances that already exist in this Registry. All component datasets will be associated with the collection as well.
Raises:
- ConflictingDefinitionError
  If a Dataset with the given DatasetRef already exists in the given collection.
-
attachComponent(name, parent, component)
Attach a component to a dataset.
Parameters:
- name : str
  Name of the component.
- parent : DatasetRef
  A reference to the parent dataset. Will be updated to reference the component.
- component : DatasetRef
  A reference to the component dataset.
Raises:
- AmbiguousDatasetError
  Raised if parent.id or component.id is None.
-
close()
Perform any steps needed to properly close a registry instance. After this method is called, the instance should be considered unusable; any further registry interactions should use a newly constructed registry instance.
-
disassociate(collection, refs)
Remove existing Datasets from a collection.
collection and ref combinations that are not currently associated are silently ignored.
Parameters:
- collection : str
  The collection the Datasets should no longer be associated with.
- refs : list of DatasetRef
  A list of DatasetRef instances that already exist in this Registry. All component datasets will also be removed.
Raises:
- AmbiguousDatasetError
  Raised if any(ref.id is None for ref in refs).
-
ensureRun(run)
Conditionally add a new Run to the Registry.
If the run.id is None or a Run with this id doesn't exist in the Registry yet, add it. Otherwise, ensure the provided run is identical to the one already in the registry.
Parameters:
Raises:
- ConflictingDefinitionError
  If run already exists, but is not identical.
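The conditional logic described for ensureRun can be sketched with an in-memory stand-in. Everything here (`ToyRun`, `ToyRunTable`, `ToyConflictError`) is hypothetical and only mirrors the documented rule; in particular, assigning an id when `run.id` is None is an assumption of this sketch, not something this page states for ensureRun.

```python
from dataclasses import dataclass
from typing import Dict, Optional


class ToyConflictError(Exception):
    """Hypothetical stand-in for ConflictingDefinitionError."""


@dataclass
class ToyRun:
    """Hypothetical stand-in for Run, reduced to two fields."""
    collection: str
    id: Optional[int] = None


class ToyRunTable:
    """In-memory stand-in for the Run records held by a Registry."""

    def __init__(self) -> None:
        self._runs: Dict[int, ToyRun] = {}
        self._next_id = 1

    def ensure_run(self, run: ToyRun) -> None:
        if run.id is None or run.id not in self._runs:
            # Not present yet: add it (assigning an id is an assumption).
            if run.id is None:
                run.id = self._next_id
            self._next_id = max(self._next_id, run.id) + 1
            self._runs[run.id] = ToyRun(run.collection, run.id)
        elif self._runs[run.id] != run:
            # Present but different: refuse rather than overwrite.
            raise ToyConflictError(f"Run {run.id} exists but is not identical")
```

Calling `ensure_run` twice with an identical run is a no-op, which is the property that distinguishes it from addRun.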
-
expandDataId(dataId=None, *, dimension=None, metadata=None, region=False, update=False, **kwds)
Expand a data ID to include additional information.
expandDataId always returns a true DataId and ensures that its entries dict contains (at least) values for all implied dependencies.
Parameters:
- dataId : dict or DataId
  A dict-like object containing the Dimension links that include the primary keys of the rows to query. If this is a true DataId, the object will be updated in-place.
- dimension : Dimension or str
  A dimension passed to the DataId constructor to create a true DataId or augment an existing one.
- metadata : collections.abc.Mapping, optional
  A mapping from Dimension or str name to column name, indicating fields to read into dataId.entries. If dimension is provided, may instead be a sequence of column names for that dimension.
- region : bool
  If True and the given DataId is uniquely associated with a region on the sky, obtain that region from the Registry and attach it as dataId.region.
- update : bool
  If True, assume existing entries and regions in the given DataId are out-of-date and should be updated by values in the database. If False, existing values will be assumed to be correct and database queries will only be executed if they are missing.
- kwds
  Additional keyword arguments passed to the DataId constructor to convert dataId to a true DataId or augment an existing one.
Returns:
- dataId : DataId
  A Data ID with all requested data populated.
Raises:
-
find(collection, datasetType, dataId=None, **kwds)
Look up a dataset.
This can be used to obtain a DatasetRef that permits the dataset to be read from a Datastore.
Parameters:
- collection : str
  Identifies the collection to search.
- datasetType : DatasetType or str
  A DatasetType or the name of one.
- dataId : dict or DataId, optional
  A dict-like object containing the Dimension links that identify the dataset within a collection.
- kwds
  Additional keyword arguments passed to the DataId constructor to convert dataId to a true DataId or augment an existing one.
Returns:
- ref : DatasetRef
  A ref to the Dataset, or None if no matching Dataset was found.
Raises:
- LookupError
  If one or more data ID keys are missing.
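Because find returns None rather than raising when nothing matches, callers need an explicit check before using the result. The sketch below assumes `registry` is a Registry instance and chains find with getDatasetLocations as documented on this page; the helper name `locate_dataset` is hypothetical, and the value returned by getDatasetLocations is passed through unchanged because its type is not described here.

```python
def locate_dataset(registry, collection, dataset_type, data_id):
    """Return the datastore locations for a dataset, or None if not found.

    Illustrative helper only; ``registry`` is assumed to be a Registry.
    """
    # find() may raise LookupError if data ID keys are missing; that is
    # deliberately left to propagate to the caller.
    ref = registry.find(collection, dataset_type, dataId=data_id)
    if ref is None:
        # No matching Dataset in this collection.
        return None
    return registry.getDatasetLocations(ref)
```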
-
findDimensionEntries(dimension)
Return all Dimension entries corresponding to the named dimension.
Parameters:
Returns:
Raises:
-
findDimensionEntry(dimension, dataId=None, **kwds)
Return a Dimension entry corresponding to a DataId.
Parameters:
- dimension : str or Dimension
  Either a Dimension object or the name of one.
- dataId : dict or DataId, optional
  A dict-like object containing the Dimension links that form the primary key of the row to retrieve. If this is a full DataId object, dataId.entries[dimension] will be updated with the entry obtained from the Registry.
- kwds
  Additional keyword arguments passed to the DataId constructor to convert dataId to a true DataId or augment an existing one.
Returns:
Raises:
-
static fromConfig(registryConfig, schemaConfig=None, dimensionConfig=None, create=False, butlerRoot=None)
Create Registry subclass instance from config.
Uses registry.cls from config to determine which subclass to instantiate.
Parameters:
- registryConfig : ButlerConfig, RegistryConfig, Config or str
  Registry configuration.
- schemaConfig : SchemaConfig, Config or str, optional
  Schema configuration. Can be read from supplied registryConfig if the relevant component is defined and schemaConfig is None.
- dimensionConfig : DimensionConfig or Config or str, optional
  DimensionGraph configuration. Can be read from supplied registryConfig if the relevant component is defined and dimensionConfig is None.
- create : bool
  Assume empty Registry and create a new one.
Returns:
-
getAllCollections()
Get names of all the collections found in this repository.
Returns:
-
getAllDatasetTypes()
Get every registered DatasetType.
Returns:
- types : frozenset of DatasetType
  Every DatasetType in the registry.
-
getDataset(id, datasetType=None, dataId=None)
Retrieve a Dataset entry.
Parameters:
- id : int
  The unique identifier for the Dataset.
- datasetType : DatasetType, optional
  The DatasetType of the dataset to retrieve. This is used to short-circuit retrieving the DatasetType, so if provided, the caller is guaranteeing that it is what would have been retrieved.
- dataId : DataId, optional
  A Dimension-based identifier for the dataset within a collection, possibly containing additional metadata. This is used to short-circuit retrieving the DataId, so if provided, the caller is guaranteeing that it is what would have been retrieved.
Returns:
- ref : DatasetRef
  A ref to the Dataset, or None if no matching Dataset was found.
-
getDatasetLocations(ref)
Retrieve datastore locations for a given dataset.
Typically used by Datastore.
Parameters:
- ref : DatasetRef
  A reference to the dataset for which to retrieve storage information.
Returns:
Raises:
- AmbiguousDatasetError
  Raised if ref.id is None.
-
getDatasetType(name)
Get the DatasetType.
Parameters:
- name : str
  Name of the type.
Returns:
- type : DatasetType
  The DatasetType associated with the given name.
Raises:
- KeyError
  Requested named DatasetType could not be found in registry.
-
getExecution(id)
Retrieve an Execution.
Parameters:
- id : int
  The unique identifier for the Execution.
-
getRun(id=None, collection=None)
Get a Run corresponding to its collection or id.
Parameters:
Returns:
Raises:
- ValueError
  Must supply one of collection or id.
-
makeDataIdPacker(name, dataId=None, **kwds)
Create an object that can pack certain data IDs into integers.
Parameters:
Returns:
- packer : DataIdPacker
  Instance of a subclass of DataIdPacker.
-
makeDatabaseDict(table, types, key, value, lengths=None)
Construct a DatabaseDict backed by a table in the same database as this Registry.
Parameters:
- table : str
  Name of the table that backs the returned DatabaseDict. If this table already exists, its schema must include at least everything in types.
- types : dict
  A dictionary mapping str field names to type objects, containing all fields to be held in the database.
- key : str
  The name of the field to be used as the dictionary key. Must not be present in value._fields.
- value : type
  The type used for the dictionary's values, typically a namedtuple. Must have a _fields class attribute that is a tuple of field names (i.e. as defined by namedtuple); these field names must also appear in the types arg, and a _make attribute to construct it from a sequence of values (again, as defined by namedtuple).
- lengths : dict, optional
  Specific lengths of string fields. Defaults will be used if not specified.
Returns:
- databaseDict : DatabaseDict
  DatabaseDict backed by this registry.
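The constraints on `types`, `key`, and `value` can be checked before calling makeDatabaseDict. The sketch below is a hypothetical pre-flight check (`check_database_dict_args` is not part of the API) that encodes the rules stated above; requiring `key` itself to appear in `types` is an inference from "containing all fields to be held in the database" rather than an explicit statement.

```python
from collections import namedtuple


def check_database_dict_args(types, key, value):
    """Raise if the arguments violate the documented constraints.

    Hypothetical helper for illustration; not provided by the Registry.
    """
    fields = getattr(value, "_fields", None)
    if not isinstance(fields, tuple) or not hasattr(value, "_make"):
        raise TypeError("value must provide _fields (a tuple) and _make")
    if key in fields:
        raise ValueError(f"key {key!r} must not be present in value._fields")
    # Inferred rule: the key is also a stored field, so it needs a type.
    if key not in types:
        raise ValueError(f"key {key!r} is missing from types")
    missing = [name for name in fields if name not in types]
    if missing:
        raise ValueError(f"value fields missing from types: {missing}")


# Example value type: a namedtuple supplies _fields and _make automatically.
Location = namedtuple("Location", ["path", "size"])
example_types = {"name": str, "path": str, "size": int}
check_database_dict_args(example_types, key="name", value=Location)
```

The field names and the `Location` type are made up for the example; any namedtuple whose fields all appear in `types` satisfies the same checks.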
-
makeRun(collection)
Create a new Run in the Registry and return it.
If a run with this collection already exists, return that instead.
Parameters:
Returns:
-
markInputUsed(quantum, ref)
Record the given DatasetRef as an actual (not just predicted) input of the given Quantum.
This updates both the Registry's Quantum table and the Python Quantum.actualInputs attribute.
Parameters:
- quantum : Quantum
  Producer to update. Will be updated in this call.
- ref : DatasetRef
  To set as actually used input.
Raises:
- KeyError
  If quantum is not a predicted consumer for ref.
-
packDataId(name, dataId=None, *, returnMaxBits=False, **kwds)
Pack the given DataId into an integer.
Parameters:
- name : str
  Name of the packer, as given in the Registry configuration.
- dataId : dict or DataId, optional
  Data ID that identifies at least the "required" dimensions of the packer.
- returnMaxBits : bool
  If True, return a tuple of (packed, self.maxBits).
- kwds
  Additional keyword arguments used to augment or override the given data ID.
Returns:
-
registerDatasetType(datasetType)
Add a new DatasetType to the Registry.
It is not an error to register the same DatasetType twice.
Parameters:
- datasetType : DatasetType
  The DatasetType to be added.
Returns:
Raises:
- ValueError
  Raised if the dimensions or storage class are invalid.
- ConflictingDefinitionError
  Raised if this DatasetType is already registered with a different definition.
-
removeDataset(ref)
Remove a dataset from the Registry.
The dataset and all components will be removed unconditionally from all collections, and any associated Quantum records will also be removed. Datastore records will not be deleted; the caller is responsible for ensuring that the dataset has already been removed from all Datastores.
Parameters:
- ref : DatasetRef
  Reference to the dataset to be removed. Must include a valid id attribute, and should be considered invalidated upon return.
Raises:
-
removeDatasetLocation(datastoreName, ref)
Remove datastore location associated with this dataset.
Typically used by Datastore when a dataset is removed.
Parameters:
- datastoreName : str
  Name of this Datastore.
- ref : DatasetRef
  A reference to the dataset for which information is to be removed.
Raises:
- AmbiguousDatasetError
  Raised if ref.id is None.
-
selectMultipleDatasetTypes(originInfo, expression=None, required=(), optional=(), prerequisite=(), perDatasetTypeDimensions=(), expandDataIds=True)
Evaluate a filter expression and lists of DatasetTypes and return a set of dimension values.
The returned rows consist of combinations of dimensions participating in the transformation from required to optional dataset types, restricted by existing datasets and filter expression.
Parameters:
- originInfo : DatasetOriginInfo
  Object which provides names of the input/output collections.
- expression : str
  An expression that limits the Dimensions and (indirectly) the Datasets returned.
- required : iterable of DatasetType or str
  The list of DatasetTypes whose Dimensions will be included in the returned column set. Output is limited to the Datasets of these DatasetTypes which already exist in the registry.
- optional : iterable of DatasetType or str
  The list of DatasetTypes whose Dimensions will be included in the returned column set. Datasets of these types may or may not exist in the registry.
- prerequisite : iterable of DatasetType or str
  DatasetTypes that should not constrain the query results, but must be present for all result rows. These are included with a LEFT OUTER JOIN, but the results are checked for NULL. Unlike regular inputs, prerequisite input lookups may be deferred (by some Registry implementations). Any DatasetTypes that are present in both required and prerequisite are considered prerequisite.
- perDatasetTypeDimensions : iterable of Dimension or str, optional
  Dimensions (or str names thereof) for which different dataset types do not need to have the same values in each result row.
- expandDataIds : bool
  If True (default), expand all data IDs when returning them.
Yields:
- row : MultipleDatasetQueryRow
  Single row is a unique combination of units in a transform.
Raises:
-
classmethod setConfigRoot(root, config, full)
Set any filesystem-dependent config options for this Registry to be appropriate for a new empty repository with the given root.
Parameters:
- root : str
  Filesystem path to the root of the data repository.
- config : Config
  A Config to update. Only the subset understood by this component will be updated. Will not expand defaults.
- full : Config
  A complete config with all defaults expanded that can be converted to a RegistryConfig. Read-only and will not be modified by this method. Repository-specific options that should not be obtained from defaults when Butler instances are constructed should be copied from full to config.
-
setDimensionRegion(dataId=None, *, update=True, region=None, **kwds)
Set the region field for a Dimension instance or a combination thereof and update associated spatial join tables.
Parameters:
- dataId : dict or DataId
  A dict-like object containing the Dimension links that form the primary key of the row to insert or update. If this is a full DataId, dataId.region will be set to region (if region is not None) and then used to update or insert into the Registry.
- update : bool
  If True, existing region information for these Dimensions is being replaced. This is usually required because Dimension entries are assumed to be pre-inserted prior to calling this function.
- region : lsst.sphgeom.ConvexPolygon, optional
  The region to update or insert into the Registry. If not present, dataId.region must not be None.
- kwds
  Additional keyword arguments passed to the DataId constructor to convert dataId to a true DataId or augment an existing one.
Returns:
- dataId : DataId
  A Data ID with its region attribute set.
Raises:
-
transaction()
Optionally implemented in Registry subclasses to provide exception safety guarantees in case an exception is raised in the enclosed block.
This context manager may be nested (e.g. any implementation by a Registry subclass must nest properly).
Warning
The level of exception safety is not guaranteed by this API. Depending on the implementation in the subclass, it may implement strong exception safety and roll back any changes, leaving the state unchanged, or it may do nothing, leaving the underlying Registry corrupted.
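To make the nesting requirement concrete, here is a sketch of one way a subclass could provide strong exception safety. `ToyRegistry` is a hypothetical in-memory stand-in, and the snapshot-and-restore approach is an assumption chosen for brevity; a database-backed Registry would more plausibly rely on the database's own transaction support.

```python
import copy
from contextlib import contextmanager


class ToyRegistry:
    """Hypothetical in-memory stand-in; not the real Registry."""

    def __init__(self):
        # Maps collection name to the set of dataset ids it contains.
        self.collections = {}

    def associate(self, collection, dataset_id):
        self.collections.setdefault(collection, set()).add(dataset_id)

    @contextmanager
    def transaction(self):
        # Snapshot the state on entry so it can be restored on failure.
        snapshot = copy.deepcopy(self.collections)
        try:
            yield
        except BaseException:
            # Roll back to the state at entry, then re-raise.
            self.collections = snapshot
            raise


reg = ToyRegistry()
with reg.transaction():
    reg.associate("a", 1)
    try:
        with reg.transaction():
            reg.associate("a", 2)
            raise RuntimeError("inner block failed")
    except RuntimeError:
        # Only the inner block's changes were rolled back.
        pass
```

Each nesting level restores only to its own entry point, so an exception handled between levels undoes the inner changes while keeping the outer ones.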