Registry
-
class lsst.daf.butler.Registry(registryConfig, schemaConfig=None, create=False)
Bases: object
Registry interface.
Parameters:
- registryConfig : RegistryConfig
  Registry configuration.
- schemaConfig : SchemaConfig, optional
  Schema configuration.
Attributes Summary
defaultConfigFile: Path to configuration defaults.
pixelization: Object that interprets SkyPix DataUnit values (sphgeom.Pixelization).
Methods Summary
addDataUnitEntry(dataUnitName, values): Add a new DataUnit entry.
addDataset(datasetType, dataId, run[, …]): Adds a Dataset entry to the Registry.
addDatasetLocation(ref, datastoreName): Add datastore name locating a given dataset.
addExecution(execution): Add a new Execution to the Registry.
addQuantum(quantum): Add a new Quantum to the Registry.
addRun(run): Add a new Run to the Registry.
associate(collection, refs): Add existing Datasets to a collection, possibly creating the collection in the process.
attachComponent(name, parent, component): Attach a component to a dataset.
disassociate(collection, refs[, remove]): Remove existing Datasets from a collection.
ensureRun(run): Conditionally add a new Run to the Registry.
export(expr): Export contents of the Registry, limited to those reachable from the Datasets identified by the expression expr, into a TableSet format such that it can be imported into a different database.
find(collection, datasetType, dataId): Look up a dataset.
findDataUnitEntry(dataUnitName, value): Return a DataUnit entry corresponding to a value.
fromConfig(registryConfig[, schemaConfig, …]): Create Registry subclass instance from config.
getDataset(id): Retrieve a Dataset entry.
getDatasetLocations(ref): Retrieve datastore locations for a given dataset.
getDatasetType(name): Get the DatasetType.
getExecution(id): Retrieve an Execution.
getQuantum(id): Retrieve a Quantum.
getRegion(dataId): Get region associated with a dataId.
getRun([id, collection]): Get a Run corresponding to its collection or id.
import_(tables, collection): Import (previously exported) contents into the (possibly empty) Registry.
makeDatabaseDict(table, types, key, value): Construct a DatabaseDict backed by a table in the same database as this Registry.
makeProvenanceGraph(expr[, types]): Make a QuantumGraph that contains the full provenance of all Datasets matching an expression.
makeRun(collection): Create a new Run in the Registry and return it.
markInputUsed(quantum, ref): Record the given DatasetRef as an actual (not just predicted) input of the given Quantum.
merge(outputCollection, inputCollections): Create a new collection from a series of existing ones.
registerDatasetType(datasetType): Add a new DatasetType to the Registry.
removeDatasetLocation(datastoreName, ref): Remove datastore location associated with this dataset.
selectDataUnits(originInfo, expression, …): Evaluate a filter expression and lists of DatasetTypes and return a set of data unit values.
setConfigRoot(root, config, full): Set any filesystem-dependent config options for this Registry to be appropriate for a new empty repository with the given root.
setDataUnitRegion(dataUnitNames, value, region): Set the region field for a DataUnit instance or a combination thereof and update associated spatial join tables.
subset(collection, expr, datasetTypes): Create a new collection by subsetting an existing one.
transaction(): Optionally implemented in Registry subclasses to provide exception safety guarantees in case an exception is raised in the enclosed block.
transfer(src, expr, collection): Transfer contents from a source Registry, limited to those reachable from the Datasets identified by the expression expr, into this Registry and associate them with a collection.
Attributes Documentation
-
defaultConfigFile = None
Path to configuration defaults. Relative to $DAF_BUTLER_DIR/config or absolute path. Can be None if no defaults specified.
-
pixelization
Object that interprets SkyPix DataUnit values (sphgeom.Pixelization).
Methods Documentation
-
addDataUnitEntry(dataUnitName, values)
Add a new DataUnit entry.
Parameters:
- dataUnitName : str
  Name of the DataUnit (e.g. "Camera").
- values : dict
  Dictionary of columnName, columnValue pairs.
If values includes a “region” key, setDataUnitRegion will automatically be called to set it and update any associated spatial join tables. Region fields associated with a combination of DataUnits must be explicitly set separately.
Raises:
- TypeError
  If the given DataUnit does not have explicit entries in the registry.
- ValueError
  If an entry with the primary key defined in values is already present.
-
addDataset(datasetType, dataId, run, producer=None, recursive=False)
Adds a Dataset entry to the Registry.
This always adds a new Dataset; to associate an existing Dataset with a new collection, use associate.
Parameters:
- datasetType : str
  Name of a DatasetType.
- dataId : dict
  A dict of DataUnit link name, value pairs that label the DatasetRef within a collection.
- run : Run
  The Run instance that produced the Dataset. Ignored if producer is passed (producer.run is then used instead). A Run must be provided by one of the two arguments.
- producer : Quantum
  Unit of work that produced the Dataset. May be None to store no provenance information, but if present the Quantum must already have been added to the Registry.
- recursive : bool
  If True, recursively add Dataset and attach entries for component Datasets as well.
Returns:
- ref : DatasetRef
  A newly-created DatasetRef instance.
Raises:
- ValueError
  If a Dataset with the given DatasetRef already exists in the given collection.
- Exception
  If dataId contains unknown or invalid DataUnit entries.
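The rule for choosing between run and producer in addDataset can be sketched in plain Python. This is only an illustration of the documented behaviour: resolve_run and the SimpleNamespace stand-ins are hypothetical names, not part of lsst.daf.butler, and raising ValueError when neither argument supplies a Run is an assumption.

```python
from types import SimpleNamespace


def resolve_run(run=None, producer=None):
    """Pick the Run as the addDataset text describes (hypothetical helper)."""
    if producer is not None:
        # producer.run takes precedence; the run argument is ignored.
        return producer.run
    if run is None:
        # Assumption: the real Registry may signal this differently.
        raise ValueError("a Run must be provided via run or producer")
    return run


# Stand-ins for Run and Quantum instances (illustration only).
run_a = SimpleNamespace(collection="a")
run_b = SimpleNamespace(collection="b")
quantum = SimpleNamespace(run=run_b)
```

With these stand-ins, passing both run_a and quantum selects run_b, because the producer's run wins.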
-
addDatasetLocation(ref, datastoreName)
Add datastore name locating a given dataset.
Typically used by Datastore.
Parameters:
- ref : DatasetRef
  A reference to the dataset for which to add storage information.
- datastoreName : str
  Name of the datastore holding this dataset.
-
addExecution(execution)
Add a new Execution to the Registry.
If execution.id is None the Registry will update it to that of the newly inserted entry.
Parameters:
Raises:
- Exception
  If Execution is already present in the Registry.
-
addQuantum(quantum)
Add a new Quantum to the Registry.
Parameters:
- quantum : Quantum
  Instance to add to the Registry. The given Quantum must not already be present in the Registry (or any other), therefore its:
  - run attribute must be set to an existing Run.
  - predictedInputs attribute must be fully populated with DatasetRefs.
  - actualInputs and outputs attributes will be ignored.
-
addRun(run)
Add a new Run to the Registry.
Parameters:
Raises:
- ValueError
  If a run already exists with this collection.
-
associate(collection, refs)
Add existing Datasets to a collection, possibly creating the collection in the process.
Parameters:
- collection : str
  Indicates the collection the Datasets should be associated with.
- refs : list of DatasetRef
  A list of DatasetRef instances that already exist in this Registry.
-
attachComponent(name, parent, component)
Attach a component to a dataset.
Parameters:
- name : str
  Name of the component.
- parent : DatasetRef
  A reference to the parent dataset. Will be updated to reference the component.
- component : DatasetRef
  A reference to the component dataset.
-
disassociate(collection, refs, remove=True)
Remove existing Datasets from a collection.
collection and ref combinations that are not currently associated are silently ignored.
Parameters:
- collection : str
  The collection the Datasets should no longer be associated with.
- refs : list of DatasetRef
  A list of DatasetRef instances that already exist in this Registry.
- remove : bool
  If True, remove Datasets from the Registry if they are not associated with any collection (including via any composites).
Returns:
- removed : list of DatasetRef
  If remove is True, the list of DatasetRefs that were removed.
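The interaction between collection membership and the remove flag can be illustrated with a toy in-memory model. This is a sketch under simplifying assumptions, not the Registry implementation: collections are plain sets of integer dataset ids, composites are ignored, and an empty list is returned when remove is False.

```python
def disassociate(associations, collection, refs, remove=True):
    """Toy model: associations maps collection name -> set of dataset ids."""
    members = associations.get(collection, set())
    removed = []
    for ref in refs:
        if ref not in members:
            # Combinations that are not currently associated are ignored.
            continue
        members.remove(ref)
        # A dataset is dropped only when no collection references it any more.
        if remove and not any(ref in m for m in associations.values()):
            removed.append(ref)
    return removed


associations = {"raw": {1, 2}, "calib": {2}}
removed = disassociate(associations, "raw", [1, 2, 3])
```

Here dataset 2 survives because it is still associated with the "calib" collection, and dataset 3 is skipped because it was never in "raw".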
-
ensureRun(run)
Conditionally add a new Run to the Registry.
If the run.id is None or a Run with this id doesn’t exist in the Registry yet, add it. Otherwise, ensure the provided run is identical to the one already in the registry.
Parameters:
Raises:
- ValueError
  If run already exists, but is not identical.
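The conditional logic of ensureRun can be sketched with a small in-memory model. ToyRun and ToyRegistry are hypothetical stand-ins, and assigning the next integer id to a run whose id is None is an assumption about how a backend might behave.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyRun:
    """Hypothetical stand-in for Run; equality compares all fields."""
    collection: str
    id: Optional[int] = None


class ToyRegistry:
    def __init__(self):
        self._runs = {}

    def ensureRun(self, run):
        if run.id is None or run.id not in self._runs:
            if run.id is None:
                # Assumption: the registry assigns the next free id.
                run.id = max(self._runs, default=0) + 1
            # Store a copy so later mutation of `run` is detectable.
            self._runs[run.id] = ToyRun(run.collection, run.id)
        elif self._runs[run.id] != run:
            raise ValueError(f"Run {run.id} already exists but is not identical")
```

Calling ensureRun twice with the same unchanged run is a no-op the second time; calling it with a different run that reuses an existing id raises ValueError.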
-
export(expr)
Export contents of the Registry, limited to those reachable from the Datasets identified by the expression expr, into a TableSet format such that it can be imported into a different database.
Parameters:
- expr : str
  An expression (SQL query that evaluates to a list of Dataset primary keys) that selects the Datasets, or a QuantumGraph that can be similarly interpreted.
Returns:
- ts : TableSet
  Containing all rows, from all tables in the Registry that are reachable from the selected Datasets.
-
find(collection, datasetType, dataId)
Look up a dataset.
This can be used to obtain a DatasetRef that permits the dataset to be read from a Datastore.
Parameters:
- collection : str
  Identifies the collection to search.
- datasetType : DatasetType
  The DatasetType.
- dataId : dict
  A dict of DataUnit link name, value pairs that label the DatasetRef within a collection.
Returns:
- ref : DatasetRef
  A ref to the Dataset, or None if no matching Dataset was found.
Raises:
- ValueError
  If dataId is invalid.
-
findDataUnitEntry(dataUnitName, value)
Return a DataUnit entry corresponding to a value.
Parameters:
Returns:
-
static fromConfig(registryConfig, schemaConfig=None, create=False)
Create Registry subclass instance from config.
Uses registry.cls from config to determine which subclass to instantiate.
Parameters:
- registryConfig : ButlerConfig, RegistryConfig, Config or str
  Registry configuration.
- schemaConfig : SchemaConfig, Config or str, optional
  Schema configuration. Can be read from supplied registryConfig if the relevant component is defined and schemaConfig is None.
- create : bool
  Assume empty Registry and create a new one.
Returns:
-
getDataset(id)
Retrieve a Dataset entry.
Parameters:
- id : int
  The unique identifier for the Dataset.
Returns:
- ref : DatasetRef
  A ref to the Dataset, or None if no matching Dataset was found.
-
getDatasetLocations(ref)
Retrieve datastore locations for a given dataset.
Typically used by Datastore.
Parameters:
- ref : DatasetRef
  A reference to the dataset for which to retrieve storage information.
Returns:
-
getDatasetType(name)
Get the DatasetType.
Parameters:
- name : str
  Name of the type.
Returns:
- type : DatasetType
  The DatasetType associated with the given name.
Raises:
- KeyError
  If the requested DatasetType could not be found in the registry.
-
getExecution(id)
Retrieve an Execution.
Parameters:
- id : int
  The unique identifier for the Execution.
-
getRegion(dataId)
Get region associated with a dataId.
Parameters:
- dataId : dict
  A dict of DataUnit link name, value pairs that label the DatasetRef within a collection.
Returns:
- region : lsst.sphgeom.ConvexPolygon
  The region associated with a dataId, or None if not present.
Raises:
- KeyError
  If the set of DataUnits for the dataId does not correspond to a unique spatial lookup.
-
getRun(id=None, collection=None)
Get a Run corresponding to its collection or id.
Parameters:
Returns:
Raises:
- ValueError
  Must supply one of collection or id.
-
import_(tables, collection)
Import (previously exported) contents into the (possibly empty) Registry.
Parameters:
- ts : TableSet
  Contains the previously exported content.
- collection : str
  An additional collection assigned to the newly imported Datasets.
-
makeDatabaseDict(table, types, key, value)
Construct a DatabaseDict backed by a table in the same database as this Registry.
Parameters:
- table : table
  Name of the table that backs the returned DatabaseDict. If this table already exists, its schema must include at least everything in types.
- types : dict
  A dictionary mapping str field names to type objects, containing all fields to be held in the database.
- key : str
  The name of the field to be used as the dictionary key. Must not be present in value._fields.
- value : type
  The type used for the dictionary’s values, typically a namedtuple. Must have a _fields class attribute that is a tuple of field names (i.e. as defined by namedtuple), all of which must also appear in the types arg, and a _make attribute to construct it from a sequence of values (again, as defined by namedtuple).
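The constraints on types, key, and value can be checked with the standard library alone. The sketch below uses collections.namedtuple, whose _fields and _make attributes are the ones the description refers to; the Location type, the column names, and the check_database_dict_args helper are hypothetical and exist only for illustration.

```python
from collections import namedtuple

# Hypothetical value type and column types (illustration only).
Location = namedtuple("Location", ["path", "checksum"])
types = {"dataset_id": int, "path": str, "checksum": str}
key = "dataset_id"


def check_database_dict_args(types, key, value):
    """Validate the documented constraints before calling makeDatabaseDict."""
    if key in value._fields:
        raise ValueError("key must not be present in value._fields")
    missing = [f for f in value._fields if f not in types]
    if missing:
        raise ValueError(f"value fields missing from types: {missing}")


check_database_dict_args(types, key, Location)

# _make builds a value from a sequence, e.g. one row of column values.
row = Location._make(["/data/a.fits", "abc123"])
```

Any class offering the same _fields and _make attributes would satisfy the description; a namedtuple is simply the most direct way to get both.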
-
makeProvenanceGraph(expr, types=None)
Make a QuantumGraph that contains the full provenance of all Datasets matching an expression.
Parameters:
- expr : str
  An expression (SQL query that evaluates to a list of Dataset primary keys) that selects the Datasets.
Returns:
- graph : QuantumGraph
  Instance (with units set to None).
-
makeRun(collection)
Create a new Run in the Registry and return it.
If a run with this collection already exists, return that instead.
Parameters:
Returns:
-
markInputUsed(quantum, ref)
Record the given DatasetRef as an actual (not just predicted) input of the given Quantum.
This updates both the Registry’s Quantum table and the Python Quantum.actualInputs attribute.
Parameters:
- quantum : Quantum
  Producer to update. Will be updated in this call.
- ref : DatasetRef
  To set as actually used input.
Raises:
- KeyError
  If quantum is not a predicted consumer for ref.
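The predicted-versus-actual bookkeeping can be sketched as follows. This toy covers only the Python-side update: mark_input_used is a hypothetical helper, the quantum is a SimpleNamespace stand-in, and both input containers are modelled as plain lists, which may not match the real Quantum attributes.

```python
from types import SimpleNamespace


def mark_input_used(quantum, ref):
    """Toy version of the Python-side update done by markInputUsed."""
    if ref not in quantum.predictedInputs:
        # Mirrors the documented KeyError for refs that were never predicted.
        raise KeyError(f"{ref!r} is not a predicted input of this quantum")
    if ref not in quantum.actualInputs:
        quantum.actualInputs.append(ref)


# Stand-in quantum with two predicted inputs (illustration only).
quantum = SimpleNamespace(predictedInputs=["raw-1", "raw-2"], actualInputs=[])
mark_input_used(quantum, "raw-1")
```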
-
merge(outputCollection, inputCollections)
Create a new collection from a series of existing ones.
Entries earlier in the list will be used in preference to later entries when both contain Datasets with the same DatasetRef.
Parameters:
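The precedence rule for merge (earlier input collections win) can be illustrated with dictionaries. This is a toy model, not the Registry implementation: each collection is assumed to be a mapping from a hypothetical (dataset type, visit) key to a dataset id.

```python
def merge(collections, outputCollection, inputCollections):
    """Toy model: collections maps name -> {dataset key: dataset id}."""
    merged = {}
    for name in inputCollections:
        for key, dataset in collections[name].items():
            # setdefault keeps the entry from the earliest collection.
            merged.setdefault(key, dataset)
    collections[outputCollection] = merged
    return merged


collections = {
    "reprocessed": {("calexp", 1): 101},
    "original": {("calexp", 1): 7, ("calexp", 2): 8},
}
merged = merge(collections, "combined", ["reprocessed", "original"])
```

Because "reprocessed" is listed first, its entry for ("calexp", 1) is kept, while ("calexp", 2) is filled in from "original".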
-
registerDatasetType(datasetType)
Add a new DatasetType to the Registry.
It is not an error to register the same DatasetType twice.
Parameters:
- datasetType : DatasetType
  The DatasetType to be added.
Returns:
- inserted : bool
  True if datasetType was inserted, False if an identical existing DatasetType was found.
Raises:
- ValueError
  If the DatasetType is not valid for this registry or is already registered but not identical.
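The idempotent registration contract, together with the KeyError behaviour of getDatasetType, can be sketched with an in-memory model. ToyDatasetType and ToyRegistry are hypothetical stand-ins, and the three fields of ToyDatasetType are assumptions chosen for the example.

```python
from collections import namedtuple

# Hypothetical stand-in for DatasetType; namedtuples compare field by field.
ToyDatasetType = namedtuple("ToyDatasetType", ["name", "dataUnits", "storageClass"])


class ToyRegistry:
    def __init__(self):
        self._types = {}

    def registerDatasetType(self, datasetType):
        existing = self._types.get(datasetType.name)
        if existing is None:
            self._types[datasetType.name] = datasetType
            return True  # newly inserted
        if existing != datasetType:
            raise ValueError(f"{datasetType.name} is registered with a different definition")
        return False  # identical definition already present

    def getDatasetType(self, name):
        # A missing name raises KeyError, as documented for getDatasetType.
        return self._types[name]
```

The boolean return value lets callers distinguish a fresh insert from a harmless repeat without having to catch an exception.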
-
removeDatasetLocation(datastoreName, ref)
Remove datastore location associated with this dataset.
Typically used by Datastore when a dataset is removed.
Parameters:
- datastoreName : str
  Name of this Datastore.
- ref : DatasetRef
  A reference to the dataset for which information is to be removed.
-
selectDataUnits(originInfo, expression, neededDatasetTypes, futureDatasetTypes)
Evaluate a filter expression and lists of DatasetTypes and return a set of data unit values.
Returned set consists of combinations of units participating in data transformation from neededDatasetTypes to futureDatasetTypes, restricted by existing data and filter expression.
Parameters:
- originInfo : DatasetOriginInfo
  Object which provides names of the input/output collections.
- expression : str
  An expression that limits the DataUnits and (indirectly) the Datasets returned.
- neededDatasetTypes : list of DatasetType
  The list of DatasetTypes whose DataUnits will be included in the returned column set. Output is limited to the Datasets of these DatasetTypes which already exist in the registry.
- futureDatasetTypes : list of DatasetType
  The list of DatasetTypes whose DataUnits will be included in the returned column set. It is expected that Datasets for these DatasetTypes do not exist in the registry, but presently this is not checked.
Yields:
- row : PreFlightUnitsRow
  Single row is a unique combination of units in a transform.
-
classmethod setConfigRoot(root, config, full)
Set any filesystem-dependent config options for this Registry to be appropriate for a new empty repository with the given root.
Parameters:
- root : str
  Filesystem path to the root of the data repository.
- config : Config
  A Config to update. Only the subset understood by this component will be updated. Will not expand defaults.
- full : Config
  A complete config with all defaults expanded that can be converted to a RegistryConfig. Read-only and will not be modified by this method. Repository-specific options that should not be obtained from defaults when Butler instances are constructed should be copied from full to Config.
-
setDataUnitRegion(dataUnitNames, value, region, update=True)
Set the region field for a DataUnit instance or a combination thereof and update associated spatial join tables.
Parameters:
- dataUnitNames : sequence
  A sequence of DataUnit names whose instances are jointly associated with a region on the sky.
- value : dict
  A dictionary of values that uniquely identify the DataUnits.
- region : sphgeom.ConvexPolygon
  Region on the sky.
- update : bool
  If True, existing region information for these DataUnits is replaced. This is usually required because DataUnit entries are assumed to be inserted before this function is called.
-
subset(collection, expr, datasetTypes)
Create a new collection by subsetting an existing one.
Parameters:
- collection : str
  Indicates the input collection to subset.
- expr : str
  An expression that limits the DataUnits and (indirectly) Datasets in the subset.
- datasetTypes : list of DatasetType
  The list of DatasetTypes whose instances should be included in the subset.
Returns:
- collection : str
  The newly created collection.
-
transaction()
Optionally implemented in Registry subclasses to provide exception safety guarantees in case an exception is raised in the enclosed block.
This context manager may be nested (i.e. any implementation by a Registry subclass must nest properly).
Warning
The level of exception safety is not guaranteed by this API. Depending on the implementation in the subclass, it may provide strong exception safety and roll back any changes, leaving the state unchanged, or it may do nothing, leaving the underlying Registry corrupted.
Todo
Investigate if we may want to provide a TransactionalRegistry subclass that guarantees a particular level of exception safety.
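One way a subclass might provide strong exception safety with properly nesting transactions is to snapshot state on entry and restore it on failure. The sketch below is a hypothetical in-memory example built on contextlib.contextmanager; ToyTransactionalRegistry and its datasets attribute are assumptions, and a database-backed Registry would more likely rely on the database's own transaction or savepoint support.

```python
import copy
from contextlib import contextmanager


class ToyTransactionalRegistry:
    """Hypothetical in-memory registry with strong exception safety."""

    def __init__(self):
        self.datasets = {}

    @contextmanager
    def transaction(self):
        # Snapshot the state on entry; each nesting level keeps its own copy.
        snapshot = copy.deepcopy(self.datasets)
        try:
            yield
        except BaseException:
            # Roll back to the state at entry, then propagate the error.
            self.datasets = snapshot
            raise


registry = ToyTransactionalRegistry()
with registry.transaction():
    registry.datasets[1] = "kept"

try:
    with registry.transaction():
        registry.datasets[2] = "outer"
        with registry.transaction():
            registry.datasets[3] = "inner"
            raise RuntimeError("simulated failure")
except RuntimeError:
    pass
```

The failure in the inner block undoes the inner change, and because the exception propagates through the outer block, the outer change is undone as well; only the first, successful transaction remains.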