SqlRegistry

class lsst.daf.butler.registries.sqlRegistry.SqlRegistry(registryConfig, schemaConfig, create=False)

Bases: lsst.daf.butler.Registry

Registry backed by a SQL database.

Parameters:
registryConfig : SqlRegistryConfig or str

Registry configuration to load.

schemaConfig : SchemaConfig or str

Definition of the schema to use.

create : bool

Assume registry is empty and create a new one.

Attributes Summary

defaultConfigFile Path to configuration defaults.

Methods Summary

addDataUnitEntry(dataUnitName, values) Add a new DataUnit entry.
addDataset(datasetType, dataId, run[, …]) Add a Dataset entry to the Registry.
addDatasetLocation(ref, datastoreName) Add datastore name locating a given dataset.
addExecution(execution) Add a new Execution to the SqlRegistry.
addQuantum(quantum) Add a new Quantum to the SqlRegistry.
addRun(run) Add a new Run to the SqlRegistry.
associate(collection, refs) Add existing Datasets to a collection, possibly creating the collection in the process.
attachComponent(name, parent, component) Attach a component to a dataset.
disassociate(collection, refs[, remove]) Remove existing Datasets from a collection.
ensureRun(run) Conditionally add a new Run to the SqlRegistry.
export(expr) Export contents of the SqlRegistry, limited to those reachable from the Datasets identified by the expression expr, into a TableSet format such that it can be imported into a different database.
find(collection, datasetType, dataId) Lookup a dataset.
findDataUnitEntry(dataUnitName, value) Return a DataUnit entry corresponding to a value.
getDataUnitDefinition(dataUnitName) Return the definition of a DataUnit (an actual DataUnit object).
getDataset(id) Retrieve a Dataset entry.
getDatasetLocations(ref) Retrieve datastore locations for a given dataset.
getDatasetType(name) Get the DatasetType.
getExecution(id) Retrieve an Execution.
getQuantum(id) Retrieve a Quantum.
getRegion(dataId) Get region associated with a dataId.
getRun([id, collection]) Get a Run corresponding to its collection or id.
import_(tables, collection) Import (previously exported) contents into the (possibly empty) SqlRegistry.
makeDatabaseDict(table, types, key, value) Construct a DatabaseDict backed by a table in the same database as this Registry.
makeProvenanceGraph(expr[, types]) Make a QuantumGraph that contains the full provenance of all Datasets matching an expression.
makeRun(collection) Create a new Run in the SqlRegistry and return it.
markInputUsed(quantum, ref) Record the given DatasetRef as an actual (not just predicted) input of the given Quantum.
merge(outputCollection, inputCollections) Create a new collection from a series of existing ones.
query(sql, **params) Execute a SQL SELECT statement directly.
registerDatasetType(datasetType) Add a new DatasetType to the SqlRegistry.
removeDatasetLocation(datastoreName, ref) Remove datastore location associated with this dataset.
selectDataUnits(originInfo, expression, …) Evaluate a filter expression and lists of DatasetTypes and return a set of data unit values.
setDataUnitRegion(dataUnitNames, value, region) Set the region field for a DataUnit instance or a combination thereof and update associated spatial join tables.
subset(collection, expr, datasetTypes) Create a new collection by subsetting an existing one.
transaction() Context manager that implements SQL transactions.

Attributes Documentation

defaultConfigFile = None

Path to configuration defaults. Relative to $DAF_BUTLER_DIR/config or absolute path. Can be None if no defaults specified.

Methods Documentation

addDataUnitEntry(dataUnitName, values)

Add a new DataUnit entry.

Parameters:
dataUnitName : str

Name of the DataUnit (e.g. "Camera").

values : dict

Dictionary of columnName, columnValue pairs.

If values includes a "region" key, setDataUnitRegion will automatically be called to set it and update any associated spatial join tables. Region fields associated with a combination of DataUnits must be explicitly set separately.

Raises:
TypeError

If the given DataUnit does not have explicit entries in the registry.

ValueError

If an entry with the primary-key defined in values is already present.

addDataset(datasetType, dataId, run, producer=None, recursive=False)

Add a Dataset entry to the Registry.

This always adds a new Dataset; to associate an existing Dataset with a new collection, use associate.

Parameters:
datasetType : DatasetType

Type of the Dataset.

dataId : dict

A dict of DataUnit link name, value pairs that label the DatasetRef within a collection.

run : Run

The Run instance that produced the Dataset. Ignored if producer is passed (producer.run is then used instead). A Run must be provided by one of the two arguments.

producer : Quantum

Unit of work that produced the Dataset. May be None to store no provenance information, but if present the Quantum must already have been added to the SqlRegistry.

recursive : bool

If True, recursively add Dataset and attach entries for component Datasets as well.

Returns:
ref : DatasetRef

A newly-created DatasetRef instance.

Raises:
ValueError

If a Dataset with the given DatasetRef already exists in the given collection.

Exception

If dataId contains unknown or invalid DataUnit entries.
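
The rule for choosing between run and producer can be sketched in a few lines. This is only an illustration: resolve_run is a hypothetical helper, SimpleNamespace objects stand in for Run and Quantum, and raising ValueError when neither argument supplies a Run is an assumption, since the exception used in that case is not documented here.

```python
from types import SimpleNamespace


def resolve_run(run=None, producer=None):
    """Return the Run to record, preferring producer.run when available.

    Hypothetical helper mirroring the documented rule; not SqlRegistry code.
    """
    if producer is not None:
        # The explicit run argument is ignored when a producer is passed.
        return producer.run
    if run is None:
        # Assumed error type; the real behaviour is not specified above.
        raise ValueError("A Run must be provided via run or producer")
    return run


# Stand-in objects with only the attributes the sketch needs.
run_a = SimpleNamespace(collection="a")
run_b = SimpleNamespace(collection="b")
quantum = SimpleNamespace(run=run_b)

chosen = resolve_run(run=run_a, producer=quantum)  # chosen is run_b
```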

addDatasetLocation(ref, datastoreName)

Add datastore name locating a given dataset.

Typically used by Datastore.

Parameters:
ref : DatasetRef

A reference to the dataset for which to add storage information.

datastoreName : str

Name of the datastore holding this dataset.

addExecution(execution)

Add a new Execution to the SqlRegistry.

If execution.id is None the SqlRegistry will update it to that of the newly inserted entry.

Parameters:
execution : Execution

Instance to add to the SqlRegistry. The given Execution must not already be present in the SqlRegistry.

Raises:
Exception

If Execution is already present in the SqlRegistry.

addQuantum(quantum)

Add a new Quantum to the SqlRegistry.

Parameters:
quantum : Quantum

Instance to add to the SqlRegistry. The given Quantum must not already be present in the SqlRegistry (or any other). Therefore:

  • its run attribute must be set to an existing Run;
  • its predictedInputs attribute must be fully populated with DatasetRefs;
  • its actualInputs and outputs will be ignored.

addRun(run)

Add a new Run to the SqlRegistry.

Parameters:
run : Run

Instance to add to the SqlRegistry. The given Run must not already be present in the SqlRegistry (or any other). Therefore its id must be None and its collection must not be associated with any existing Run.

Raises:
ValueError

If a run already exists with this collection.

associate(collection, refs)

Add existing Datasets to a collection, possibly creating the collection in the process.

Parameters:
collection : str

Indicates the collection the Datasets should be associated with.

refs : list of DatasetRef

A list of DatasetRef instances that already exist in this SqlRegistry.

attachComponent(name, parent, component)

Attach a component to a dataset.

Parameters:
name : str

Name of the component.

parent : DatasetRef

A reference to the parent dataset. Will be updated to reference the component.

component : DatasetRef

A reference to the component dataset.

disassociate(collection, refs, remove=True)

Remove existing Datasets from a collection.

collection and ref combinations that are not currently associated are silently ignored.

Parameters:
collection : str

The collection the Datasets should no longer be associated with.

refs : list of DatasetRef

A list of DatasetRef instances that already exist in this SqlRegistry.

remove : bool

If True, remove Datasets from the SqlRegistry if they are not associated with any collection (including via any composites).

Returns:
removed : list of DatasetRef

If remove is True, the list of DatasetRefs that were removed.

ensureRun(run)

Conditionally add a new Run to the SqlRegistry.

If the run.id is None or a Run with this id doesn’t exist in the Registry yet, add it. Otherwise, ensure the provided run is identical to the one already in the registry.

Parameters:
run : Run

Instance to add to the SqlRegistry.

Raises:
ValueError

If run already exists, but is not identical.

export(expr)

Export contents of the SqlRegistry, limited to those reachable from the Datasets identified by the expression expr, into a TableSet format such that it can be imported into a different database.

Parameters:
expr : str

An expression (SQL query that evaluates to a list of Dataset primary keys) that selects the Datasets, or a QuantumGraph that can be similarly interpreted.

Returns:
ts : TableSet

Containing all rows, from all tables in the SqlRegistry that are reachable from the selected Datasets.

find(collection, datasetType, dataId)

Lookup a dataset.

This can be used to obtain a DatasetRef that permits the dataset to be read from a Datastore.

Parameters:
collection : str

Identifies the collection to search.

datasetType : DatasetType

The DatasetType.

dataId : dict

A dict of DataUnit link name, value pairs that label the DatasetRef within a collection.

Returns:
ref : DatasetRef

A ref to the Dataset, or None if no matching Dataset was found.

Raises:
ValueError

If dataId is invalid.

findDataUnitEntry(dataUnitName, value)

Return a DataUnit entry corresponding to a value.

Parameters:
dataUnitName : str

Name of a DataUnit.

value : dict

A dictionary of values that uniquely identify the DataUnit.

Returns:
dataUnitEntry : dict

Dictionary with all DataUnit values, or None if no matching entry is found.

getDataUnitDefinition(dataUnitName)

Return the definition of a DataUnit (an actual DataUnit object).

Parameters:
dataUnitName : str

Name of the DataUnit, e.g. “Camera”, “Tract”, etc.

getDataset(id)

Retrieve a Dataset entry.

Parameters:
id : int

The unique identifier for the Dataset.

Returns:
ref : DatasetRef

A ref to the Dataset, or None if no matching Dataset was found.

getDatasetLocations(ref)

Retrieve datastore locations for a given dataset.

Typically used by Datastore.

Parameters:
ref : DatasetRef

A reference to the dataset for which to retrieve storage information.

Returns:
datastores : set of str

All the matching datastores holding this dataset. Empty set if the dataset does not exist anywhere.

getDatasetType(name)

Get the DatasetType.

Parameters:
name : str

Name of the type.

Returns:
type : DatasetType

The DatasetType associated with the given name.

Raises:
KeyError

Requested named DatasetType could not be found in registry.

getExecution(id)

Retrieve an Execution.

Parameters:
id : int

The unique identifier for the Execution.

getQuantum(id)

Retrieve a Quantum.

Parameters:
id : int

The unique identifier for the Quantum.

getRegion(dataId)

Get region associated with a dataId.

Parameters:
dataId : dict

A dict of DataUnit link name, value pairs that label the DatasetRef within a collection.

Returns:
region : lsst.sphgeom.ConvexPolygon

The region associated with a dataId or None if not present.

Raises:
KeyError

If the set of dataunits for the dataId does not correspond to a unique spatial lookup.

getRun(id=None, collection=None)

Get a Run corresponding to its collection or id.

Parameters:
id : int, optional

Lookup by run id, or:

collection : str

If given, lookup by collection name instead.

Returns:
run : Run

The Run instance.

Raises:
ValueError

Must supply one of collection or id.

import_(tables, collection)

Import (previously exported) contents into the (possibly empty) SqlRegistry.

Parameters:
tables : TableSet

Contains the previously exported content.

collection : str

An additional collection assigned to the newly imported Datasets.

makeDatabaseDict(table, types, key, value)

Construct a DatabaseDict backed by a table in the same database as this Registry.

Parameters:
table : str

Name of the table that backs the returned DatabaseDict. If this table already exists, its schema must include at least everything in types.

types : dict

A dictionary mapping str field names to type objects, containing all fields to be held in the database.

key : str

The name of the field to be used as the dictionary key. Must not be present in value._fields.

value : type

The type used for the dictionary’s values, typically a namedtuple. Must have a _fields class attribute that is a tuple of field names (i.e. as defined by namedtuple); these field names must also appear in the types arg, and a _make attribute to construct it from a sequence of values (again, as defined by namedtuple).
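
The constraints on key, value and types can be checked with plain Python before calling this method. The following is a minimal sketch: DatasetRecord, the field names, and check_database_dict_args are all hypothetical and are not part of daf_butler.

```python
from collections import namedtuple

# Hypothetical value type; any namedtuple provides _fields and _make.
DatasetRecord = namedtuple("DatasetRecord", ["path", "checksum"])

# Hypothetical mapping of field names to types for the backing table.
types = {"dataset_id": int, "path": str, "checksum": str}
key = "dataset_id"


def check_database_dict_args(types, key, value):
    """Raise ValueError if the arguments break the documented constraints."""
    if not hasattr(value, "_fields") or not hasattr(value, "_make"):
        raise ValueError("value must define _fields and _make")
    if key in value._fields:
        raise ValueError("key must not be one of value._fields")
    missing = [name for name in value._fields if name not in types]
    if missing:
        raise ValueError(f"fields missing from types: {missing}")


check_database_dict_args(types, key, DatasetRecord)

# _make builds an instance from a sequence of values, as a table row would.
record = DatasetRecord._make(["a/b.fits", "abc123"])
```

A namedtuple satisfies both the _fields and the _make requirement without any extra code, which is presumably why it is the typical choice for value.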

makeProvenanceGraph(expr, types=None)

Make a QuantumGraph that contains the full provenance of all Datasets matching an expression.

Parameters:
expr : str

An expression (SQL query that evaluates to a list of Dataset primary keys) that selects the Datasets.

Returns:
graph : QuantumGraph

Instance (with units set to None).

makeRun(collection)

Create a new Run in the SqlRegistry and return it.

If a run with this collection already exists, return that instead.

Parameters:
collection : str

The collection used to identify all inputs and outputs of the Run.

Returns:
run : Run

A new Run instance.

markInputUsed(quantum, ref)

Record the given DatasetRef as an actual (not just predicted) input of the given Quantum.

This updates both the SqlRegistry's Quantum table and the Python Quantum.actualInputs attribute.

Parameters:
quantum : Quantum

Producer to update. Will be updated in this call.

ref : DatasetRef

To set as actually used input.

Raises:
KeyError

If quantum is not a predicted consumer for ref.

merge(outputCollection, inputCollections)

Create a new collection from a series of existing ones.

Entries earlier in the list will be used in preference to later entries when both contain Datasets with the same DatasetRef.

Parameters:
outputCollection : str

Name to use for the new collection.

inputCollections : list of str

A list of collections to combine.
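
The precedence rule can be illustrated with plain dictionaries. This sketch does not reflect how SqlRegistry stores collections: it assumes each collection is a mapping from a hypothetical (datasetType, dataId) label to a dataset id, and merge_collections is an illustrative name only.

```python
def merge_collections(input_collections):
    """Combine mappings so that earlier collections win on duplicate labels."""
    merged = {}
    for collection in input_collections:
        for label, dataset_id in collection.items():
            # setdefault keeps the value from the earliest collection.
            merged.setdefault(label, dataset_id)
    return merged


# Both collections label a dataset ("calexp", "visit=1"); the first one wins.
first = {("calexp", "visit=1"): 10}
second = {("calexp", "visit=1"): 20, ("calexp", "visit=2"): 21}
merged = merge_collections([first, second])
```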

query(sql, **params)

Execute a SQL SELECT statement directly.

Named parameters are specified in the SQL query string by preceding them with a colon. Parameter values are provided as additional keyword arguments. For example:

registry.query("SELECT * FROM Camera WHERE camera=:name", name="HSC")

Parameters:
sql : str

SQL query string. Must be a SELECT statement.

**params

Parameter name-value pairs to insert into the query.

Yields:
row : dict

The next row result from executing the query.
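
The colon-prefixed parameter style can be tried without a registry. The sketch below uses Python's standard sqlite3 module, which accepts the same :name placeholders, as a stand-in. The in-memory Camera table and the select_rows helper are hypothetical and are not part of SqlRegistry.

```python
import sqlite3


def select_rows(connection, sql, **params):
    """Yield each result row as a dict, binding :name placeholders
    from keyword arguments (hypothetical helper, not SqlRegistry code).
    """
    if not sql.lstrip().upper().startswith("SELECT"):
        raise ValueError("Only SELECT statements are supported")
    cursor = connection.execute(sql, params)
    columns = [description[0] for description in cursor.description]
    for row in cursor:
        yield dict(zip(columns, row))


# Stand-in database with an illustrative Camera table.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE Camera (camera TEXT PRIMARY KEY)")
connection.executemany("INSERT INTO Camera VALUES (?)", [("HSC",), ("DECam",)])

rows = list(
    select_rows(connection, "SELECT * FROM Camera WHERE camera=:name", name="HSC")
)
```

Binding values through named parameters, rather than formatting them into the SQL string, leaves quoting to the database driver.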

registerDatasetType(datasetType)

Add a new DatasetType to the SqlRegistry.

It is not an error to register the same DatasetType twice.

Parameters:
datasetType : DatasetType

The DatasetType to be added.

Returns:
inserted : bool

True if datasetType was inserted, False if an identical existing DatasetType was found.

Raises:
ValueError

DatasetType is not valid for this registry or is already registered but not identical.
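
The three possible outcomes (inserted, already present and identical, already present but different) can be modelled with a dictionary. This is a toy sketch: register is a hypothetical function, and plain tuples stand in for DatasetType definitions.

```python
def register(registered, name, definition):
    """Insert definition under name and return True, or return False if an
    identical definition is already present.

    Raises ValueError if name is registered with a different definition.
    """
    existing = registered.get(name)
    if existing is None:
        registered[name] = definition
        return True
    if existing != definition:
        raise ValueError(f"{name!r} is already registered but not identical")
    return False


registered = {}
definition = ("Camera", "Visit")  # illustrative DataUnit names
first_call = register(registered, "raw", definition)   # inserted
second_call = register(registered, "raw", definition)  # identical, no error
```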

removeDatasetLocation(datastoreName, ref)

Remove datastore location associated with this dataset.

Typically used by Datastore when a dataset is removed.

Parameters:
datastoreName : str

Name of this Datastore.

ref : DatasetRef

A reference to the dataset for which information is to be removed.

selectDataUnits(originInfo, expression, neededDatasetTypes, futureDatasetTypes)

Evaluate a filter expression and lists of DatasetTypes and return a set of data unit values.

Returned set consists of combinations of units participating in data transformation from neededDatasetTypes to futureDatasetTypes, restricted by existing data and filter expression.

Parameters:
originInfo : DatasetOriginInfo

Object which provides names of the input/output collections.

expression : str

An expression that limits the DataUnits and (indirectly) the Datasets returned.

neededDatasetTypes : list of DatasetType

The list of DatasetTypes whose DataUnits will be included in the returned column set. Output is limited to the Datasets of these DatasetTypes which already exist in the registry.

futureDatasetTypes : list of DatasetType

The list of DatasetTypes whose DataUnits will be included in the returned column set. It is expected that Datasets for these DatasetTypes do not exist in the registry, but presently this is not checked.

Yields:
row : PreFlightUnitsRow

Single row is a unique combination of units in a transform.

setDataUnitRegion(dataUnitNames, value, region, update=True)

Set the region field for a DataUnit instance or a combination thereof and update associated spatial join tables.

Parameters:
dataUnitNames : sequence

A sequence of DataUnit names whose instances are jointly associated with a region on the sky. This must not include dependencies that are implied, e.g. “Patch” must not include “Tract”, but “Sensor” needs to add “Visit”.

value : dict

A dictionary of values that uniquely identify the DataUnits.

region : sphgeom.ConvexPolygon

Region on the sky.

update : bool

If True, existing region information for these DataUnits is replaced. This is usually required because DataUnit entries are assumed to be pre-inserted prior to calling this function.

subset(collection, expr, datasetTypes)

Create a new collection by subsetting an existing one.

Parameters:
collection : str

Indicates the input collection to subset.

expr : str

An expression that limits the DataUnits and (indirectly) Datasets in the subset.

datasetTypes : list of DatasetType

The list of DatasetTypes whose instances should be included in the subset.

Returns:
collection : str

The newly created collection.

transaction()

Context manager that implements SQL transactions.

Will roll back any changes to the SqlRegistry database in case an exception is raised in the enclosed block.

This context manager may be nested.
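
One way a nestable, rolling-back context manager can be built is with SQL savepoints. The sketch below uses the standard sqlite3 module. TinyRegistry, its Run table, and its methods are hypothetical, and rolling back only the inner block when an inner exception is caught is an assumption about what nesting means, not a statement of how SqlRegistry implements it.

```python
import sqlite3
from contextlib import contextmanager


class TinyRegistry:
    """Hypothetical stand-in showing nested transactions via SAVEPOINT."""

    def __init__(self):
        # isolation_level=None turns off sqlite3's implicit transactions, so
        # the SAVEPOINT statements below are the only transaction control.
        self._connection = sqlite3.connect(":memory:", isolation_level=None)
        self._connection.execute("CREATE TABLE Run (collection TEXT)")
        self._depth = 0

    @contextmanager
    def transaction(self):
        name = f"sp_{self._depth}"
        self._connection.execute(f"SAVEPOINT {name}")
        self._depth += 1
        try:
            yield
        except BaseException:
            # Undo everything since the savepoint, then discard it.
            self._connection.execute(f"ROLLBACK TO SAVEPOINT {name}")
            self._connection.execute(f"RELEASE SAVEPOINT {name}")
            raise
        else:
            # Releasing the outermost savepoint commits the changes.
            self._connection.execute(f"RELEASE SAVEPOINT {name}")
        finally:
            self._depth -= 1

    def add(self, collection):
        self._connection.execute("INSERT INTO Run VALUES (?)", (collection,))

    def collections(self):
        cursor = self._connection.execute("SELECT collection FROM Run")
        return [row[0] for row in cursor]


registry = TinyRegistry()
with registry.transaction():
    registry.add("kept")
    try:
        with registry.transaction():
            registry.add("discarded")
            raise RuntimeError("simulated failure")
    except RuntimeError:
        pass  # the inner insert was rolled back; the outer block continues

result = registry.collections()
```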