SqlRegistry

class lsst.daf.butler.registries.sqlRegistry.SqlRegistry(registryConfig, schemaConfig, create=False)
Bases: lsst.daf.butler.Registry

Registry backed by a SQL database.

Parameters:
- registryConfig : SqlRegistryConfig or str
  Registry configuration to load.
- schemaConfig : SchemaConfig or str
  Definition of the schema to use.
- create : bool
  If True, assume the registry is empty and create a new one.
Attributes Summary

- defaultConfigFile: Path to configuration defaults.

Methods Summary

- addDataUnitEntry(dataUnitName, values): Add a new DataUnit entry.
- addDataset(datasetType, dataId, run[, …]): Add a Dataset entry to the Registry.
- addDatasetLocation(ref, datastoreName): Record the name of a datastore that holds a given dataset.
- addExecution(execution): Add a new Execution to the SqlRegistry.
- addQuantum(quantum): Add a new Quantum to the SqlRegistry.
- addRun(run): Add a new Run to the SqlRegistry.
- associate(collection, refs): Add existing Datasets to a collection, possibly creating the collection in the process.
- attachComponent(name, parent, component): Attach a component to a dataset.
- disassociate(collection, refs[, remove]): Remove existing Datasets from a collection.
- ensureRun(run): Conditionally add a new Run to the SqlRegistry.
- export(expr): Export contents of the SqlRegistry, limited to those reachable from the Datasets identified by the expression expr, into a TableSet format such that it can be imported into a different database.
- find(collection, datasetType, dataId): Look up a dataset.
- findDataUnitEntry(dataUnitName, value): Return a DataUnit entry corresponding to a value.
- getDataUnitDefinition(dataUnitName): Return the definition of a DataUnit (an actual DataUnit object).
- getDataset(id): Retrieve a Dataset entry.
- getDatasetLocations(ref): Retrieve datastore locations for a given dataset.
- getDatasetType(name): Get the DatasetType.
- getExecution(id): Retrieve an Execution.
- getQuantum(id): Retrieve a Quantum.
- getRegion(dataId): Get the region associated with a dataId.
- getRun([id, collection]): Get a Run corresponding to its collection or id.
- import_(tables, collection): Import (previously exported) contents into the (possibly empty) SqlRegistry.
- makeDatabaseDict(table, types, key, value): Construct a DatabaseDict backed by a table in the same database as this Registry.
- makeProvenanceGraph(expr[, types]): Make a QuantumGraph that contains the full provenance of all Datasets matching an expression.
- makeRun(collection): Create a new Run in the SqlRegistry and return it.
- markInputUsed(quantum, ref): Record the given DatasetRef as an actual (not just predicted) input of the given Quantum.
- merge(outputCollection, inputCollections): Create a new collection from a series of existing ones.
- query(sql, **params): Execute a SQL SELECT statement directly.
- registerDatasetType(datasetType): Add a new DatasetType to the SqlRegistry.
- removeDatasetLocation(datastoreName, ref): Remove the datastore location associated with this dataset.
- selectDataUnits(originInfo, expression, …): Evaluate a filter expression and lists of DatasetTypes and return a set of data unit values.
- setDataUnitRegion(dataUnitNames, value, region): Set the region field for a DataUnit instance or a combination thereof and update associated spatial join tables.
- subset(collection, expr, datasetTypes): Create a new collection by subsetting an existing one.
- transaction(): Context manager that implements SQL transactions.

Attributes Documentation
defaultConfigFile = None
Path to configuration defaults. The value is either relative to $DAF_BUTLER_DIR/config or an absolute path. It can be None if no defaults are specified.

Methods Documentation
addDataUnitEntry(dataUnitName, values)
Add a new DataUnit entry.

Parameters:
- dataUnitName : str
  Name of the DataUnit (e.g. "Camera").
- values : dict
  Dictionary of columnName, columnValue pairs.

If values includes a "region" key, setDataUnitRegion is called automatically to set the region and update any associated spatial join tables. Region fields associated with a combination of DataUnits must be set explicitly and separately.

Raises:
- TypeError
  If the given DataUnit does not have explicit entries in the registry.
- ValueError
  If an entry with the primary key defined in values is already present.

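The duplicate-key rule above can be illustrated with a minimal sketch. It uses the standard-library sqlite3 module rather than SqlRegistry itself. The add_data_unit_entry helper and the two-column Camera table are hypothetical. The sketch only shows one way a primary-key clash can surface as ValueError, and it does not handle the "region" key.

```python
import sqlite3


def add_data_unit_entry(conn: sqlite3.Connection, table: str, values: dict) -> None:
    """Hypothetical helper: insert one DataUnit row, reporting a clash as ValueError."""
    columns = ", ".join(values)
    placeholders = ", ".join(f":{name}" for name in values)
    # Table and column names are interpolated for illustration only; real code
    # must take them from a trusted schema definition, never from user input.
    sql = f"INSERT INTO {table} ({columns}) VALUES ({placeholders})"
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(sql, values)
    except sqlite3.IntegrityError as err:
        # This sketch treats every integrity error as a duplicate primary key.
        raise ValueError(f"{table} entry already present: {values}") from err


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Camera (camera TEXT PRIMARY KEY, module TEXT)")
add_data_unit_entry(conn, "Camera", {"camera": "HSC", "module": "hsc"})
try:
    add_data_unit_entry(conn, "Camera", {"camera": "HSC", "module": "other"})
except ValueError as exc:
    print("rejected:", exc)
```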
addDataset(datasetType, dataId, run, producer=None, recursive=False)
Add a Dataset entry to the Registry.

This always adds a new Dataset. To associate an existing Dataset with a new collection, use associate.

Parameters:
- datasetType : DatasetType
  Type of the Dataset.
- dataId : dict
  A dict of DataUnit link name, value pairs that label the DatasetRef within a collection.
- run : Run
  The Run instance that produced the Dataset. Ignored if producer is passed (producer.run is then used instead). A Run must be provided by one of the two arguments.
- producer : Quantum
  Unit of work that produced the Dataset. May be None to store no provenance information, but if present the Quantum must already have been added to the SqlRegistry.
- recursive : bool
  If True, recursively add Dataset and attach entries for component Datasets as well.

Returns:
- ref : DatasetRef
  A newly created DatasetRef instance.

Raises:
- ValueError
  If a Dataset with the given DatasetRef already exists in the given collection.
- Exception
  If dataId contains unknown or invalid DataUnit entries.

addDatasetLocation(ref, datastoreName)
Record the name of a datastore that holds a given dataset.

Typically used by Datastore.

Parameters:
- ref : DatasetRef
  A reference to the dataset for which to add storage information.
- datastoreName : str
  Name of the datastore holding this dataset.

addExecution(execution)
Add a new Execution to the SqlRegistry.

If execution.id is None, the SqlRegistry updates it to the id of the newly inserted entry.

Parameters:
- execution : Execution
  Instance to add to the SqlRegistry. The given Execution must not already be present in the SqlRegistry.

Raises:
- Exception
  If the Execution is already present in the SqlRegistry.

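The id back-fill described above can be sketched with sqlite3. An INTEGER PRIMARY KEY column lets the database choose the key, and Cursor.lastrowid reports it. ToyExecution, add_execution, and the Execution table layout are hypothetical stand-ins, not the butler classes. In this sketch a repeated explicit id surfaces as sqlite3.IntegrityError, which is one concrete kind of the Exception the method documents.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyExecution:
    """Hypothetical stand-in for an Execution: a free-form note and an optional id."""
    note: str
    id: Optional[int] = None


def add_execution(conn: sqlite3.Connection, execution: ToyExecution) -> None:
    with conn:  # commit on success, roll back on error
        if execution.id is None:
            # Let the database pick the key, then copy it back onto the object.
            cur = conn.execute("INSERT INTO Execution (note) VALUES (?)", (execution.note,))
            execution.id = cur.lastrowid
        else:
            # An explicit id that already exists violates the primary key.
            conn.execute(
                "INSERT INTO Execution (id, note) VALUES (?, ?)",
                (execution.id, execution.note),
            )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Execution (id INTEGER PRIMARY KEY, note TEXT)")
e = ToyExecution("first")
add_execution(conn, e)
print(e.id)  # 1
```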
addQuantum(quantum)
Add a new Quantum to the SqlRegistry.

Parameters:
- quantum : Quantum
  Instance to add to the SqlRegistry. The given Quantum must not already be present in the SqlRegistry (or any other). Therefore:
  - its run attribute must be set to an existing Run;
  - its predictedInputs attribute must be fully populated with DatasetRefs;
  - its actualInputs and outputs attributes are ignored.

addRun(run)
Add a new Run to the SqlRegistry.

Parameters:
- run : Run
  Instance to add to the SqlRegistry. The given Run must not already be present in the SqlRegistry (or any other). Therefore its id must be None and its collection must not be associated with any existing Run.

Raises:
- ValueError
  If a run already exists with this collection.

associate(collection, refs)
Add existing Datasets to a collection, possibly creating the collection in the process.

Parameters:
- collection : str
  Indicates the collection the Datasets should be associated with.
- refs : list of DatasetRef
  A list of DatasetRef instances that already exist in this SqlRegistry.

attachComponent(name, parent, component)
Attach a component to a dataset.

Parameters:
- name : str
  Name of the component.
- parent : DatasetRef
  A reference to the parent dataset. Will be updated to reference the component.
- component : DatasetRef
  A reference to the component dataset.

disassociate(collection, refs, remove=True)
Remove existing Datasets from a collection.

Combinations of collection and ref that are not currently associated are silently ignored.

Parameters:
- collection : str
  The collection the Datasets should no longer be associated with.
- refs : list of DatasetRef
  A list of DatasetRef instances that already exist in this SqlRegistry.
- remove : bool
  If True, remove Datasets from the SqlRegistry if they are not associated with any collection (including via any composites).

Returns:

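The following is a minimal in-memory sketch of the disassociate semantics described above. It assumes a toy model in which each collection is a set of integer dataset ids. ToyRegistry is hypothetical, ignores composites, and returns nothing. It only illustrates two behaviors: pairs that are not associated are skipped, and remove=True drops datasets that are left in no collection.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, Set


class ToyRegistry:
    """Hypothetical in-memory model: collections map names to sets of dataset ids."""

    def __init__(self) -> None:
        self.datasets: Set[int] = set()
        self.collections: DefaultDict[str, Set[int]] = defaultdict(set)

    def associate(self, collection: str, refs: Iterable[int]) -> None:
        refs = list(refs)
        self.datasets.update(refs)
        self.collections[collection].update(refs)

    def disassociate(self, collection: str, refs: Iterable[int], remove: bool = True) -> None:
        refs = list(refs)
        members = self.collections.get(collection, set())
        for ref in refs:
            members.discard(ref)  # pairs that are not associated are silently ignored
        if remove:
            # Delete datasets that now belong to no collection at all.
            # Composite (parent/component) links are not modelled here.
            still_used = set().union(*self.collections.values())
            for ref in refs:
                if ref not in still_used:
                    self.datasets.discard(ref)


reg = ToyRegistry()
reg.associate("raw", [1, 2])
reg.associate("calib", [2])
reg.disassociate("raw", [1, 2, 99])  # 99 was never associated and is ignored
print(sorted(reg.datasets))  # [2]: dataset 1 was orphaned, 2 is still in "calib"
```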
ensureRun(run)
Conditionally add a new Run to the SqlRegistry.

If run.id is None, or a Run with this id does not exist in the Registry yet, add it. Otherwise, ensure the provided run is identical to the one already in the registry.

Parameters:
- run : Run
  Instance to add to the SqlRegistry.

Raises:
- ValueError
  If run already exists but is not identical.

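The add-or-compare logic of ensureRun can be sketched in memory. ToyRun and ToyRunTable are hypothetical stand-ins. Dataclass equality plays the role of "identical", and the next free integer stands in for a database-assigned id. These are assumptions of the sketch, not how SqlRegistry compares or numbers Runs.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ToyRun:
    """Hypothetical stand-in for a Run: a collection name and an optional id."""
    collection: str
    id: Optional[int] = None


class ToyRunTable:
    """Hypothetical in-memory run table keyed by id."""

    def __init__(self) -> None:
        self.runs: Dict[int, ToyRun] = {}

    def ensure_run(self, run: ToyRun) -> None:
        if run.id is None or run.id not in self.runs:
            if run.id is None:
                # Assign the next free id, as a database autoincrement would.
                run.id = max(self.runs, default=0) + 1
            self.runs[run.id] = ToyRun(run.collection, run.id)  # store a copy
        elif self.runs[run.id] != run:
            raise ValueError(f"Run {run.id} already exists and differs: {self.runs[run.id]}")


table = ToyRunTable()
r = ToyRun("u/someone/test")
table.ensure_run(r)  # added, and r.id is filled in
table.ensure_run(r)  # identical to the stored Run, so nothing happens
try:
    table.ensure_run(ToyRun("another/collection", id=r.id))
except ValueError as err:
    print("conflict:", err)
```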
export(expr)
Export contents of the SqlRegistry, limited to those reachable from the Datasets identified by the expression expr, into a TableSet format such that it can be imported into a different database.

Parameters:
- expr : str
  An expression (an SQL query that evaluates to a list of Dataset primary keys) that selects the Datasets, or a QuantumGraph that can be similarly interpreted.

Returns:
- ts : TableSet
  Contains all rows, from all tables in the SqlRegistry, that are reachable from the selected Datasets.

find(collection, datasetType, dataId)
Look up a dataset.

This can be used to obtain a DatasetRef that permits the dataset to be read from a Datastore.

Parameters:

Returns:
- ref : DatasetRef
  A ref to the Dataset, or None if no matching Dataset was found.

Raises:
- ValueError
  If dataId is invalid.

findDataUnitEntry(dataUnitName, value)
Return a DataUnit entry corresponding to a value.

Parameters:

Returns:

getDataUnitDefinition(dataUnitName)
Return the definition of a DataUnit (an actual DataUnit object).

Parameters:
- dataUnitName : str
  Name of the DataUnit, e.g. "Camera", "Tract", etc.

getDataset(id)
Retrieve a Dataset entry.

Parameters:
- id : int
  The unique identifier for the Dataset.

Returns:
- ref : DatasetRef
  A ref to the Dataset, or None if no matching Dataset was found.

getDatasetLocations(ref)
Retrieve datastore locations for a given dataset.

Typically used by Datastore.

Parameters:
- ref : DatasetRef
  A reference to the dataset for which to retrieve storage information.

Returns:

getDatasetType(name)
Get the DatasetType.

Parameters:
- name : str
  Name of the type.

Returns:
- type : DatasetType
  The DatasetType associated with the given name.

Raises:
- KeyError
  If the requested DatasetType could not be found in the registry.

getExecution(id)
Retrieve an Execution.

Parameters:
- id : int
  The unique identifier for the Execution.

getRegion(dataId)
Get the region associated with a dataId.

Parameters:

Returns:
- region : lsst.sphgeom.ConvexPolygon
  The region associated with the dataId, or None if not present.

Raises:
- KeyError
  If the set of DataUnits for the dataId does not correspond to a unique spatial lookup.

getRun(id=None, collection=None)
Get a Run corresponding to its collection or id.

Parameters:

Returns:
- run : Run
  The Run instance.

Raises:
- ValueError
  If neither collection nor id is supplied.

import_(tables, collection)
Import (previously exported) contents into the (possibly empty) SqlRegistry.

Parameters:
- tables : TableSet
  Contains the previously exported content.
- collection : str
  An additional collection assigned to the newly imported Datasets.

makeDatabaseDict(table, types, key, value)
Construct a DatabaseDict backed by a table in the same database as this Registry.

Parameters:
- table : str
  Name of the table that backs the returned DatabaseDict. If this table already exists, its schema must include at least everything in types.
- types : dict
  A dictionary mapping str field names to type objects, containing all fields to be held in the database.
- key : str
  The name of the field to be used as the dictionary key. Must not be present in value._fields.
- value : type
  The type used for the dictionary's values, typically a namedtuple. It must have a _fields class attribute that is a tuple of field names (as defined by namedtuple), and these field names must also appear in the types argument. It must also have a _make attribute that constructs an instance from a sequence of values (again, as defined by namedtuple).

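The key, types, and value contract above can be illustrated by a dictionary stored in a single SQLite table. SqliteBackedDict is a hypothetical class written for this sketch, not the real DatabaseDict. It also assumes that types includes the key field and that only int, float, and str columns are needed.

```python
import sqlite3
from collections import namedtuple
from collections.abc import MutableMapping

# Mapping from Python types to SQLite column types (an assumption of this sketch).
_SQL_TYPES = {int: "INTEGER", float: "REAL", str: "TEXT"}


class SqliteBackedDict(MutableMapping):
    """Hypothetical dict stored in one SQLite table; not the real DatabaseDict class."""

    def __init__(self, conn, table, types, key, value):
        if key not in types or key in value._fields:
            raise ValueError("key must be in types and must not be one of value._fields")
        if not set(value._fields) <= set(types):
            raise ValueError("every field of value must appear in types")
        self._conn, self._table, self._key, self._value = conn, table, key, value
        # Names are interpolated into SQL, so they must come from trusted code.
        columns = ", ".join(
            f"{name} {_SQL_TYPES[tp]}" + (" PRIMARY KEY" if name == key else "")
            for name, tp in types.items()
        )
        with conn:
            conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({columns})")

    def __getitem__(self, k):
        fields = ", ".join(self._value._fields)
        row = self._conn.execute(
            f"SELECT {fields} FROM {self._table} WHERE {self._key} = ?", (k,)
        ).fetchone()
        if row is None:
            raise KeyError(k)
        return self._value._make(row)  # rebuild the namedtuple from the row

    def __setitem__(self, k, v):
        columns = ", ".join([self._key, *self._value._fields])
        marks = ", ".join("?" * (len(self._value._fields) + 1))
        with self._conn:
            self._conn.execute(
                f"INSERT OR REPLACE INTO {self._table} ({columns}) VALUES ({marks})",
                (k, *v),
            )

    def __delitem__(self, k):
        with self._conn:
            cur = self._conn.execute(f"DELETE FROM {self._table} WHERE {self._key} = ?", (k,))
        if cur.rowcount == 0:
            raise KeyError(k)

    def __iter__(self):
        keys = self._conn.execute(f"SELECT {self._key} FROM {self._table}").fetchall()
        return iter([k for (k,) in keys])

    def __len__(self):
        return self._conn.execute(f"SELECT COUNT(*) FROM {self._table}").fetchone()[0]


Location = namedtuple("Location", ["path", "size"])
conn = sqlite3.connect(":memory:")
locations = SqliteBackedDict(
    conn, "dataset_location", {"dataset_id": int, "path": str, "size": int}, "dataset_id", Location
)
locations[42] = Location("raw/42.fits", 1024)
print(locations[42])  # Location(path='raw/42.fits', size=1024)
```

Inheriting from collections.abc.MutableMapping supplies get, items, update, and the other dict methods once the five abstract methods exist.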
makeProvenanceGraph(expr, types=None)
Make a QuantumGraph that contains the full provenance of all Datasets matching an expression.

Parameters:
- expr : str
  An expression (an SQL query that evaluates to a list of Dataset primary keys) that selects the Datasets.

Returns:
- graph : QuantumGraph
  Instance (with units set to None).

makeRun(collection)
Create a new Run in the SqlRegistry and return it.

If a Run with this collection already exists, return that instead.

Parameters:
- collection : str
  The collection used to identify all inputs and outputs of the Run.

Returns:
- run : Run
  The new Run instance, or the existing Run for this collection.

markInputUsed(quantum, ref)
Record the given DatasetRef as an actual (not just predicted) input of the given Quantum.

This updates both the SqlRegistry's Quantum table and the Python Quantum.actualInputs attribute.

Parameters:
- quantum : Quantum
  Producer to update. Will be updated in this call.
- ref : DatasetRef
  The DatasetRef to record as an actually used input.

Raises:
- KeyError
  If quantum is not a predicted consumer for ref.

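The in-memory half of markInputUsed can be sketched in a few lines. ToyQuantum and mark_input_used are hypothetical, and datasets are modelled as integer ids. The SqlRegistry table update is deliberately omitted. The point is the precondition that a ref must already be a predicted input before it can become an actual one.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class ToyQuantum:
    """Hypothetical stand-in for a Quantum: dataset ids it may read and actually read."""
    predictedInputs: Set[int] = field(default_factory=set)
    actualInputs: Set[int] = field(default_factory=set)


def mark_input_used(quantum: ToyQuantum, ref: int) -> None:
    """Promote a predicted input to an actual one (the database update is omitted)."""
    if ref not in quantum.predictedInputs:
        raise KeyError(f"quantum is not a predicted consumer of dataset {ref}")
    quantum.actualInputs.add(ref)


q = ToyQuantum(predictedInputs={10, 11})
mark_input_used(q, 10)
try:
    mark_input_used(q, 99)  # never predicted, so this is rejected
except KeyError as err:
    print("rejected:", err)
```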
merge(outputCollection, inputCollections)
Create a new collection from a series of existing ones.

Entries earlier in the list are used in preference to later entries when both contain Datasets with the same DatasetRef.

Parameters:

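The precedence rule above ("earlier entries win") can be sketched with plain dictionaries. The sketch assumes that "the same DatasetRef" means the same dataset type name plus the same dataId. It also models each collection as a dict from that key to a dataset id. merge_collections and dataset_key are hypothetical helpers.

```python
from typing import Dict, Hashable, List

# Hypothetical model: a collection maps (datasetType name, frozen dataId) to a dataset id.
Collection = Dict[Hashable, int]


def dataset_key(dataset_type: str, **data_id) -> Hashable:
    """Build an order-independent key from a dataset type name and a dataId."""
    return (dataset_type, frozenset(data_id.items()))


def merge_collections(input_collections: List[Collection]) -> Collection:
    """Merge collections so that earlier entries win over later ones."""
    merged: Collection = {}
    # Apply the collections from last to first; each update() overwrites matching
    # keys, so the first collection in the list has the final say.
    for collection in reversed(input_collections):
        merged.update(collection)
    return merged


preferred = {dataset_key("calexp", visit=1, sensor=5): 101}
fallback = {dataset_key("calexp", visit=1, sensor=5): 202, dataset_key("src", visit=1): 303}
merged = merge_collections([preferred, fallback])
print(merged[dataset_key("calexp", visit=1, sensor=5)])  # 101
```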
query(sql, **params)
Execute a SQL SELECT statement directly.

Named parameters are specified in the SQL query string by preceding them with a colon. Parameter values are provided as additional keyword arguments. For example:

    registry.query("SELECT * FROM Camera WHERE camera=:name", name="HSC")

Parameters:
- sql : str
  SQL query string. Must be a SELECT statement.
- **params
  Parameter name-value pairs to insert into the query.

Yields:
- row : dict
  The next row result from executing the query.

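The standard-library sqlite3 module accepts the same colon-prefixed named parameters. That makes it possible to sketch the behavior described above without a registry. The query function here is a hypothetical stand-in, not SqlRegistry.query, and the Camera table is invented for the example. Named parameters bind values only; table and column names cannot be passed this way.

```python
import sqlite3
from typing import Any, Dict, Iterator


def query(conn: sqlite3.Connection, sql: str, **params: Any) -> Iterator[Dict[str, Any]]:
    """Hypothetical stand-in for SqlRegistry.query built on sqlite3."""
    # Because this is a generator, the check only runs when iteration starts.
    if not sql.lstrip().upper().startswith("SELECT"):
        raise ValueError("only SELECT statements are allowed")
    cursor = conn.execute(sql, params)  # ":name" placeholders are bound from params
    columns = [description[0] for description in cursor.description]
    for row in cursor:
        yield dict(zip(columns, row))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Camera (camera TEXT PRIMARY KEY, module TEXT)")
conn.executemany("INSERT INTO Camera VALUES (?, ?)", [("HSC", "hsc"), ("DECam", "decam")])
rows = list(query(conn, "SELECT * FROM Camera WHERE camera=:name", name="HSC"))
print(rows)  # [{'camera': 'HSC', 'module': 'hsc'}]
```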
registerDatasetType(datasetType)
Add a new DatasetType to the SqlRegistry.

It is not an error to register the same DatasetType twice.

Parameters:
- datasetType : DatasetType
  The DatasetType to be added.

Returns:
- inserted : bool
  True if datasetType was inserted, False if an identical existing DatasetType was found.

Raises:
- ValueError
  If the DatasetType is not valid for this registry, or is already registered but not identical.

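The idempotent-registration contract above (True on insert, False for an identical repeat, ValueError for a conflicting definition) can be sketched with a dict keyed by name. ToyDatasetType and register_dataset_type are hypothetical. Using frozen-dataclass equality as the test for "identical" is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class ToyDatasetType:
    """Hypothetical stand-in: a name, its DataUnit names, and a storage class name."""
    name: str
    data_units: Tuple[str, ...]
    storage_class: str


def register_dataset_type(registry: Dict[str, ToyDatasetType], dataset_type: ToyDatasetType) -> bool:
    """Return True if inserted, False if an identical definition was already registered."""
    existing = registry.get(dataset_type.name)
    if existing is None:
        registry[dataset_type.name] = dataset_type
        return True
    if existing != dataset_type:
        raise ValueError(f"{dataset_type.name!r} is already registered with a different definition")
    return False  # registering the same type twice is not an error


registry: Dict[str, ToyDatasetType] = {}
calexp = ToyDatasetType("calexp", ("Visit", "Sensor"), "ExposureF")
print(register_dataset_type(registry, calexp))  # True
print(register_dataset_type(registry, calexp))  # False
```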
removeDatasetLocation(datastoreName, ref)
Remove the datastore location associated with this dataset.

Typically used by Datastore when a dataset is removed.

Parameters:
- datastoreName : str
  Name of this Datastore.
- ref : DatasetRef
  A reference to the dataset for which information is to be removed.

selectDataUnits(originInfo, expression, neededDatasetTypes, futureDatasetTypes)
Evaluate a filter expression and lists of DatasetTypes and return a set of data unit values.

The returned set consists of combinations of units that participate in the data transformation from neededDatasetTypes to futureDatasetTypes, restricted by existing data and the filter expression.

Parameters:
- originInfo : DatasetOriginInfo
  Object that provides the names of the input/output collections.
- expression : str
  An expression that limits the DataUnits and (indirectly) the Datasets returned.
- neededDatasetTypes : list of DatasetType
  The list of DatasetTypes whose DataUnits will be included in the returned column set. Output is limited to the Datasets of these DatasetTypes that already exist in the registry.
- futureDatasetTypes : list of DatasetType
  The list of DatasetTypes whose DataUnits will be included in the returned column set. Datasets for these DatasetTypes are expected not to exist in the registry, but this is not currently checked.

Yields:
- row : PreFlightUnitsRow
  Each row is a unique combination of units in a transform.

setDataUnitRegion(dataUnitNames, value, region, update=True)
Set the region field for a DataUnit instance or a combination thereof and update associated spatial join tables.

Parameters:
- dataUnitNames : sequence
  A sequence of DataUnit names whose instances are jointly associated with a region on the sky. This must not include dependencies that are implied; e.g. "Patch" must not include "Tract", but "Sensor" needs to add "Visit".
- value : dict
  A dictionary of values that uniquely identify the DataUnits.
- region : sphgeom.ConvexPolygon
  Region on the sky.
- update : bool
  If True, existing region information for these DataUnits is replaced. This is usually required because DataUnit entries are assumed to have been inserted before this function is called.

subset(collection, expr, datasetTypes)
Create a new collection by subsetting an existing one.

Parameters:

Returns:
- collection : str
  The newly created collection.

transaction()
Context manager that implements SQL transactions.

Any changes to the SqlRegistry database are rolled back if an exception is raised in the enclosed block. This context manager may be nested.
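One way a nestable, roll-back-on-exception context manager can work is sketched here with SQLite SAVEPOINTs through the standard-library sqlite3 module. This is an illustration under assumptions. The transaction function and the Run table are hypothetical, and SqlRegistry.transaction may implement nesting differently (for example, by letting inner blocks join the outer transaction).

```python
import sqlite3
from contextlib import contextmanager
from itertools import count
from typing import Iterator

_savepoint_names = (f"sp_{n}" for n in count())


@contextmanager
def transaction(conn: sqlite3.Connection) -> Iterator[sqlite3.Connection]:
    """Hypothetical nestable transaction built on SQL SAVEPOINTs."""
    name = next(_savepoint_names)
    conn.execute(f"SAVEPOINT {name}")  # opens a transaction if none is active
    try:
        yield conn
    except BaseException:
        # Undo everything since this savepoint, then discard the savepoint.
        conn.execute(f"ROLLBACK TO SAVEPOINT {name}")
        conn.execute(f"RELEASE SAVEPOINT {name}")
        raise
    else:
        # Releasing the outermost savepoint commits the transaction.
        conn.execute(f"RELEASE SAVEPOINT {name}")


# isolation_level=None stops sqlite3 from issuing its own BEGIN/COMMIT statements.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE Run (collection TEXT)")
with transaction(conn):
    conn.execute("INSERT INTO Run VALUES ('outer')")
    try:
        with transaction(conn):
            conn.execute("INSERT INTO Run VALUES ('inner')")
            raise RuntimeError("failure inside the nested block")
    except RuntimeError:
        pass  # only the nested insert is rolled back
print([row[0] for row in conn.execute("SELECT collection FROM Run")])  # ['outer']
```

Savepoints are used here because, unlike a plain BEGIN, they can be opened inside an already active transaction. That lets an inner block fail without discarding the outer block's work.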