SqlRegistry
-
class lsst.daf.butler.registries.sqlRegistry.SqlRegistry(registryConfig, schemaConfig, dimensionConfig, create=False, butlerRoot=None)
Bases: lsst.daf.butler.Registry
Registry backed by a SQL database.
Parameters:
- registryConfig : SqlRegistryConfig or str
  Load configuration.
- schemaConfig : SchemaConfig or str
  Definition of the schema to use.
- dimensionConfig : DimensionConfig or Config
  DimensionGraph configuration.
- create : bool
  Assume registry is empty and create a new one.
Attributes Summary
- defaultConfigFile: Path to configuration defaults.
Methods Summary
- addDataset(datasetType, dataId, run[, …]): Adds a Dataset entry to the Registry.
- addDatasetLocation(ref, datastoreName): Add datastore name locating a given dataset.
- addExecution(execution): Add a new Execution to the Registry.
- addRun(run): Add a new Run to the Registry.
- associate(collection, refs): Add existing Datasets to a collection, implicitly creating the collection if it does not already exist.
- attachComponent(name, parent, component): Attach a component to a dataset.
- disassociate(collection, refs): Remove existing Datasets from a collection.
- ensureRun(run): Conditionally add a new Run to the Registry.
- expandDataId(dataId, Mapping[str, Any], …): Expand a dimension-based data ID to include additional information.
- find(collection, datasetType[, dataId]): Lookup a dataset.
- fromConfig(registryConfig[, schemaConfig, …]): Create Registry subclass instance from config.
- getAllCollections(): Get names of all the collections found in this repository.
- getAllDatasetTypes(): Get every registered DatasetType.
- getDataset(id[, datasetType, dataId]): Retrieve a Dataset entry.
- getDatasetLocations(ref): Retrieve datastore locations for a given dataset.
- getDatasetType(name): Get the DatasetType.
- getExecution(id): Retrieve an Execution.
- getRun([id, collection]): Get a Run corresponding to its collection or id.
- insertDimensionData(element, str], *data, …): Insert one or more dimension records into the database.
- makeDatabaseDict(table, key, value): Construct a DatabaseDict backed by a table in the same database as this Registry.
- makeQueryBuilder(summary): Return a QueryBuilder instance capable of constructing and managing more complex queries than those obtainable via Registry interfaces.
- makeRun(collection): Create a new Run in the Registry and return it.
- query(sql, **params): Execute a SQL SELECT statement directly.
- queryDatasets(datasetType, str, …): Query for and iterate over dataset references matching user-provided criteria.
- queryDimensions(dimensions, str]], *, …): Query for and iterate over data IDs matching user-provided criteria.
- registerDatasetType(datasetType): Add a new DatasetType to the Registry.
- removeDataset(ref): Remove a dataset from the Registry.
- removeDatasetLocation(datastoreName, ref): Remove datastore location associated with this dataset.
- setConfigRoot(root, config, full[, overwrite]): Set any filesystem-dependent config options for this Registry to be appropriate for a new empty repository with the given root.
- transaction(): Context manager that implements SQL transactions.
Attributes Documentation
-
defaultConfigFile = None
Path to configuration defaults. Relative to $DAF_BUTLER_DIR/config or absolute path. Can be None if no defaults specified.
Methods Documentation
-
addDataset(datasetType, dataId, run, producer=None, recursive=False, **kwds)
Adds a Dataset entry to the Registry.
This always adds a new Dataset; to associate an existing Dataset with a new collection, use associate.
Parameters:
- datasetType : DatasetType or str
  A DatasetType or the name of one.
- dataId : dict or DataCoordinate
  A dict-like object containing the Dimension links that identify the dataset within a collection.
- run : Run
  The Run instance that produced the Dataset. Ignored if producer is passed (producer.run is then used instead). A Run must be provided by one of the two arguments.
- producer : Quantum
  Unit of work that produced the Dataset. May be None to store no provenance information, but if present the Quantum must already have been added to the Registry.
- recursive : bool
  If True, recursively add Dataset and attach entries for component Datasets as well.
- kwds
  Additional keyword arguments passed to DataCoordinate.standardize to convert dataId to a true DataCoordinate or augment an existing one.
Returns:
- ref : DatasetRef
  A newly-created DatasetRef instance.
Raises:
- ConflictingDefinitionError
  If a Dataset with the given DatasetRef already exists in the given collection.
- Exception
  If dataId contains unknown or invalid Dimension entries.
-
addDatasetLocation(ref, datastoreName)
Add datastore name locating a given dataset.
Typically used by Datastore.
Parameters:
- ref : DatasetRef
  A reference to the dataset for which to add storage information.
- datastoreName : str
  Name of the datastore holding this dataset.
Raises:
- AmbiguousDatasetError
  Raised if ref.id is None.
-
addExecution(execution)
Add a new Execution to the Registry.
If execution.id is None the Registry will update it to that of the newly inserted entry.
Parameters:
- execution : Execution
  Instance to add to the Registry. The given Execution must not already be present in the Registry.
Raises:
- ConflictingDefinitionError
  If execution is already present in the Registry.
-
addRun(run)
Add a new Run to the Registry.
Raises:
- ConflictingDefinitionError
  If a run already exists with this collection.
-
associate(collection, refs)
Add existing Datasets to a collection, implicitly creating the collection if it does not already exist.
If a DatasetRef with the exact same dataset_id is already in a collection, nothing is changed. If a DatasetRef with the same DatasetType and dimension values but with a different dataset_id exists in the collection, ValueError is raised.
Parameters:
- collection : str
  Indicates the collection the Datasets should be associated with.
- refs : iterable of DatasetRef
  An iterable of DatasetRef instances that already exist in this Registry. All component datasets will be associated with the collection as well.
Raises:
- ConflictingDefinitionError
  If a Dataset with the given DatasetRef already exists in the given collection.
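The association rules described for associate can be illustrated with a small in-memory model. This is only a sketch under stated assumptions, not the SqlRegistry implementation: the `store` dictionary, the tuple layout of a reference, and the local `ConflictingDefinitionError` class are all hypothetical stand-ins, and the dataset type and data ID values are made up.

```python
class ConflictingDefinitionError(Exception):
    """Local stand-in for illustration; not the daf_butler exception class."""


def associate(store, collection, refs):
    """Associate (dataset_id, dataset_type, data_id) triples with a collection.

    ``store`` maps collection name -> {(dataset_type, data_id): dataset_id}.
    """
    # The collection is created implicitly on first use.
    members = store.setdefault(collection, {})
    for dataset_id, dataset_type, data_id in refs:
        key = (dataset_type, data_id)
        existing = members.get(key)
        if existing is None:
            members[key] = dataset_id
        elif existing != dataset_id:
            # Same dataset type and data ID, but a different dataset.
            raise ConflictingDefinitionError(
                f"{key} is already associated with dataset {existing} in {collection!r}"
            )
        # existing == dataset_id: already associated, nothing changes.


store = {}
ref = (1, "calexp", ("HSC", 903334))
associate(store, "my_run", [ref])
associate(store, "my_run", [ref])  # same dataset_id: no change

try:
    associate(store, "my_run", [(2, "calexp", ("HSC", 903334))])
    conflict = False
except ConflictingDefinitionError:
    conflict = True
```

Re-associating the same dataset is a no-op, while a different dataset with the same type and data ID is rejected.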
-
attachComponent(name, parent, component)
Attach a component to a dataset.
Parameters:
- name : str
  Name of the component.
- parent : DatasetRef
  A reference to the parent dataset. Will be updated to reference the component.
- component : DatasetRef
  A reference to the component dataset.
Raises:
- AmbiguousDatasetError
  Raised if parent.id or component.id is None.
-
disassociate(collection, refs)
Remove existing Datasets from a collection.
collection and ref combinations that are not currently associated are silently ignored.
Raises:
- AmbiguousDatasetError
  Raised if any(ref.id is None for ref in refs).
-
ensureRun(run)
Conditionally add a new Run to the Registry.
If the run.id is None or a Run with this id or collection doesn’t exist in the Registry yet, add it. Otherwise, ensure the provided run is identical to the one already in the registry.
Parameters:
- run : Run
  Instance to add to the Registry.
Raises:
- ConflictingDefinitionError
  If run already exists, but is not identical.
-
expandDataId(dataId: Union[lsst.daf.butler.core.dimensions.coordinate.DataCoordinate, Mapping[str, Any], None] = None, *, graph: Optional[lsst.daf.butler.core.dimensions.graph.DimensionGraph] = None, records: Optional[Mapping[lsst.daf.butler.core.dimensions.elements.DimensionElement, lsst.daf.butler.core.dimensions.records.DimensionRecord]] = None, **kwds)
Expand a dimension-based data ID to include additional information.
-
find(collection, datasetType, dataId=None, **kwds)
Lookup a dataset.
This can be used to obtain a DatasetRef that permits the dataset to be read from a Datastore.
Parameters:
- collection : str
  Identifies the collection to search.
- datasetType : DatasetType or str
  A DatasetType or the name of one.
- dataId : dict or DataCoordinate, optional
  A dict-like object containing the Dimension links that identify the dataset within a collection.
- kwds
  Additional keyword arguments passed to DataCoordinate.standardize to convert dataId to a true DataCoordinate or augment an existing one.
Returns:
- ref : DatasetRef
  A ref to the Dataset, or None if no matching Dataset was found.
Raises:
- LookupError
  If one or more data ID keys are missing.
-
static fromConfig(registryConfig, schemaConfig=None, dimensionConfig=None, create=False, butlerRoot=None)
Create Registry subclass instance from config.
Uses registry.cls from config to determine which subclass to instantiate.
Parameters:
- registryConfig : ButlerConfig, RegistryConfig, Config or str
  Registry configuration.
- schemaConfig : SchemaConfig, Config or str, optional
  Schema configuration. Can be read from supplied registryConfig if the relevant component is defined and schemaConfig is None.
- dimensionConfig : DimensionConfig or Config or str, optional
  DimensionGraph configuration. Can be read from supplied registryConfig if the relevant component is defined and dimensionConfig is None.
- create : bool
  Assume empty Registry and create a new one.
Returns:
- registry : Registry (subclass)
  A new Registry subclass instance.
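The idea of choosing a subclass from a class name stored under registry.cls can be sketched with the standard library alone. This is a minimal illustration, not the daf_butler implementation: `load_class` is a hypothetical helper, the plain dict stands in for a Config object, and `collections.OrderedDict` stands in for a Registry subclass.

```python
import importlib


def load_class(dotted_name):
    """Import and return the class named by a fully qualified dotted string.

    Hypothetical helper for illustration only.
    """
    module_name, _, class_name = dotted_name.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# Plain dict standing in for a registry configuration; "cls" mirrors registry.cls.
config = {"registry": {"cls": "collections.OrderedDict"}}

cls = load_class(config["registry"]["cls"])
instance = cls()  # a real factory would pass the configuration objects here
```

Keeping the class name in configuration lets a repository select its registry backend without the caller importing a concrete subclass.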
-
getAllCollections()
Get names of all the collections found in this repository.
-
getAllDatasetTypes()
Get every registered DatasetType.
Returns:
- types : frozenset of DatasetType
  Every DatasetType in the registry.
-
getDataset(id, datasetType=None, dataId=None)
Retrieve a Dataset entry.
Parameters:
- id : int
  The unique identifier for the Dataset.
- datasetType : DatasetType, optional
  The DatasetType of the dataset to retrieve. This is used to short-circuit retrieving the DatasetType, so if provided, the caller is guaranteeing that it is what would have been retrieved.
- dataId : DataCoordinate, optional
  A Dimension-based identifier for the dataset within a collection, possibly containing additional metadata. This is used to short-circuit retrieving the dataId, so if provided, the caller is guaranteeing that it is what would have been retrieved.
Returns:
- ref : DatasetRef
  A ref to the Dataset, or None if no matching Dataset was found.
-
getDatasetLocations(ref)
Retrieve datastore locations for a given dataset.
Typically used by Datastore.
Parameters:
- ref : DatasetRef
  A reference to the dataset for which to retrieve storage information.
Raises:
- AmbiguousDatasetError
  Raised if ref.id is None.
-
getDatasetType(name)
Get the DatasetType.
Parameters:
- name : str
  Name of the type.
Returns:
- type : DatasetType
  The DatasetType associated with the given name.
Raises:
- KeyError
  Requested named DatasetType could not be found in registry.
-
getExecution(id)
Retrieve an Execution.
Parameters:
- id : int
  The unique identifier for the Execution.
-
getRun(id=None, collection=None)
Get a Run corresponding to its collection or id.
Returns:
- run : Run
  The Run instance.
Raises:
- ValueError
  Must supply one of collection or id.
-
insertDimensionData(element: Union[lsst.daf.butler.core.dimensions.elements.DimensionElement, str], *data, conform: bool = True)
Insert one or more dimension records into the database.
Parameters:
- element : DimensionElement or str
  The DimensionElement or name thereof that identifies the table records will be inserted into.
- data : dict or DimensionRecord (variadic)
  One or more records to insert.
- conform : bool, optional
  If False (True is default) perform no checking or conversions, and assume that element is a DimensionElement instance and data is one or more DimensionRecord instances of the appropriate subclass.
-
makeDatabaseDict(table, key, value)
Construct a DatabaseDict backed by a table in the same database as this Registry.
Parameters:
- table : table
  Name of the table that backs the returned DatabaseDict. If this table already exists, its schema must include at least everything in types.
- key : str
  The name of the field to be used as the dictionary key. Must not be present in value._fields.
- value : type
  The type used for the dictionary’s values, typically a DatabaseDictRecordBase. Must have a fields class method that is a tuple of field names; these field names must also appear in the return value of the types() class method, and it must be possible to construct it from a sequence of values. Lengths of string fields must be obtainable as a dict via the lengths property.
Returns:
- databaseDict : DatabaseDict
  DatabaseDict backed by this registry.
-
makeQueryBuilder(summary: lsst.daf.butler.core.queries.structs.QuerySummary) → lsst.daf.butler.core.queries.builder.QueryBuilder
Return a QueryBuilder instance capable of constructing and managing more complex queries than those obtainable via Registry interfaces.
This is an advanced SqlRegistry-only interface; downstream code should prefer Registry.queryDimensions and Registry.queryDatasets whenever those are sufficient.
Parameters:
- summary : QuerySummary
  Object describing and categorizing the full set of dimensions that will be included in the query.
Returns:
- builder : QueryBuilder
  Object that can be used to construct and perform advanced queries.
-
makeRun(collection)
Create a new Run in the Registry and return it.
If a run with this collection already exists, return that instead.
Parameters:
- collection : str
  The collection used to identify all inputs and outputs of the Run.
Returns:
- run : Run
  A new Run instance.
-
query(sql, **params)
Execute a SQL SELECT statement directly.
Named parameters are specified in the SQL query string by preceding them with a colon. Parameter values are provided as additional keyword arguments. For example:
    registry.query("SELECT * FROM instrument WHERE instrument=:name", name="HSC")
Parameters:
- sql : str
  SQL query string. Must be a SELECT statement.
- **params
  Parameter name-value pairs to insert into the query.
Yields:
- row : dict
  The next row result from executing the query.
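The colon-prefixed named-parameter style and the dict-per-row results can be tried out against an in-memory SQLite database. This is a minimal sketch using Python's standard sqlite3 module, not SqlRegistry itself: the free function `query`, the table layout, and the `detector_max` column and its values are assumptions made for illustration.

```python
import sqlite3


def query(connection, sql, **params):
    """Yield each result row of a SELECT statement as a dict."""
    # sqlite3 binds ":name" placeholders from a mapping of parameter values.
    cursor = connection.execute(sql, params)
    columns = [description[0] for description in cursor.description]
    for row in cursor:
        yield dict(zip(columns, row))


connection = sqlite3.connect(":memory:")
# Hypothetical table contents, made up for this example.
connection.execute("CREATE TABLE instrument (instrument TEXT, detector_max INTEGER)")
connection.executemany(
    "INSERT INTO instrument VALUES (?, ?)", [("HSC", 200), ("DECam", 100)]
)

rows = list(
    query(connection, "SELECT * FROM instrument WHERE instrument=:name", name="HSC")
)
```

Binding values as parameters, rather than formatting them into the SQL string, leaves quoting and escaping to the database driver.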
-
queryDatasets(datasetType: Union[lsst.daf.butler.core.datasets.DatasetType, str, lsst.daf.butler.core.queries.datasets.Like, ellipsis], *, collections: Union[Sequence[Union[str, lsst.daf.butler.core.queries.datasets.Like]], ellipsis], dimensions: Optional[Iterable[Union[lsst.daf.butler.core.dimensions.elements.Dimension, str]]] = None, dataId: Union[lsst.daf.butler.core.dimensions.coordinate.DataCoordinate, Mapping[str, Any], None] = None, where: Optional[str] = None, deduplicate: bool = False, expand: bool = True, **kwds) → Iterator[lsst.daf.butler.core.datasets.DatasetRef]
Query for and iterate over dataset references matching user-provided criteria.
Parameters:
- datasetType : DatasetType, str, Like, or ...
  An expression indicating type(s) of datasets to query for. ... may be used to query for all known DatasetTypes. Multiple explicitly-provided dataset types cannot be queried in a single call to queryDatasets even though wildcard expressions can, because the results would be identical to chaining the iterators produced by multiple calls to queryDatasets.
- collections : Sequence of str or Like, or ...
  An expression indicating the collections to be searched for datasets. ... may be passed to search all collections.
- dimensions : Iterable of Dimension or str
  Dimensions to include in the query (in addition to those used to identify the queried dataset type(s)), either to constrain the resulting datasets to those for which a matching dimension exists, or to relate the dataset type’s dimensions to dimensions referenced by the dataId or where arguments.
- dataId : dict or DataCoordinate, optional
  A data ID whose key-value pairs are used as equality constraints in the query.
- where : str, optional
  A string expression similar to a SQL WHERE clause. May involve any column of a dimension table or (as a shortcut for the primary key column of a dimension table) dimension name.
- deduplicate : bool, optional
  If True (False is default), for each result data ID, only yield one DatasetRef of each DatasetType, from the first collection in which a dataset of that dataset type appears (according to the order of collections passed in). Cannot be used if any element in collections is an expression.
- expand : bool, optional
  If True (default) attach ExpandedDataCoordinate instead of minimal DataCoordinate base-class instances.
- kwds
  Additional keyword arguments are forwarded to DataCoordinate.standardize when processing the dataId argument (and may be used to provide a constraining data ID even when the dataId argument is None).
Yields:
- ref : DatasetRef
  Dataset references matching the given query criteria. These are grouped by DatasetType if the query evaluates to multiple dataset types, but order is otherwise unspecified.
Raises:
- TypeError
  Raised when the arguments are incompatible, such as when a collection wildcard is passed when deduplicate is True.
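The deduplicate rule (one result per dataset type and data ID, taken from the earliest collection in the search order) can be modelled on plain Python objects. This is a sketch of the documented behaviour only, not of how SqlRegistry builds its query: `Ref` is a hypothetical stand-in for DatasetRef that also records the collection a dataset was found in, and every ref is assumed to come from one of the searched collections.

```python
from typing import NamedTuple


class Ref(NamedTuple):
    """Hypothetical stand-in for a DatasetRef plus its source collection."""

    datasetType: str
    dataId: tuple
    collection: str


def deduplicate(refs, collections):
    """Keep, per (data ID, dataset type), the ref from the earliest collection."""
    rank = {name: position for position, name in enumerate(collections)}
    best = {}
    for ref in refs:
        key = (ref.dataId, ref.datasetType)
        current = best.get(key)
        if current is None or rank[ref.collection] < rank[current.collection]:
            best[key] = ref
    return list(best.values())


refs = [
    Ref("calexp", ("HSC", 1), "shared"),
    Ref("calexp", ("HSC", 1), "u/alice"),
    Ref("calexp", ("HSC", 2), "shared"),
]
result = deduplicate(refs, collections=["u/alice", "shared"])
```

Because the rule depends on a definite search order, it only makes sense for an explicit list of collection names, which matches the restriction that deduplicate cannot be combined with collection expressions.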
Notes
When multiple dataset types are queried via a wildcard expression, the results of this operation are equivalent to querying for each dataset type separately in turn, and no information about the relationships between datasets of different types is included. In contexts where that kind of information is important, the recommended pattern is to use queryDimensions to first obtain data IDs (possibly with the desired dataset types and collections passed as constraints to the query), and then use multiple (generally much simpler) calls to queryDatasets with the returned data IDs passed as constraints.
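The two-step pattern recommended in the Notes can be sketched as a helper that uses only the queryDimensions and queryDatasets arguments documented on this page. The helper name `datasets_by_data_id` is hypothetical, and `FakeRegistry` is a tiny in-memory stub whose matching rules are assumptions, included only so the sketch runs without a data repository.

```python
def datasets_by_data_id(registry, dimensions, dataset_types, collections):
    """Group dataset references of several types by the data ID they share."""
    # Step 1: find data IDs, constrained by the existence of the dataset types.
    constraints = {dataset_type: collections for dataset_type in dataset_types}
    grouped = []
    for data_id in registry.queryDimensions(dimensions, datasets=constraints):
        # Step 2: one simple query per dataset type, constrained by the data ID.
        refs = {
            dataset_type: list(
                registry.queryDatasets(
                    dataset_type, collections=collections, dataId=data_id
                )
            )
            for dataset_type in dataset_types
        }
        grouped.append((data_id, refs))
    return grouped


class FakeRegistry:
    """In-memory stub for illustration; not part of daf_butler."""

    def __init__(self, table):
        self._table = table  # maps (dataset type name, visit) -> ref

    def queryDimensions(self, dimensions, *, datasets=None, **kwds):
        # Assumption: yield only visits for which every requested type exists.
        for visit in sorted({visit for _, visit in self._table}):
            if all((name, visit) in self._table for name in datasets):
                yield {"visit": visit}

    def queryDatasets(self, datasetType, *, collections, dataId=None, **kwds):
        key = (datasetType, dataId["visit"])
        if key in self._table:
            yield self._table[key]


table = {("raw", 1): "raw@1", ("calexp", 1): "calexp@1", ("raw", 2): "raw@2"}
grouped = datasets_by_data_id(
    FakeRegistry(table), ["visit"], ["raw", "calexp"], ["shared"]
)
```

Each entry pairs a data ID with the references found for it, which is the cross-type relationship that a single wildcard queryDatasets call does not provide.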
-
queryDimensions(dimensions: Iterable[Union[lsst.daf.butler.core.dimensions.elements.Dimension, str]], *, dataId: Union[lsst.daf.butler.core.dimensions.coordinate.DataCoordinate, Mapping[str, Any], None] = None, datasets: Optional[Mapping[Union[lsst.daf.butler.core.datasets.DatasetType, str, lsst.daf.butler.core.queries.datasets.Like, ellipsis], Union[Sequence[Union[str, lsst.daf.butler.core.queries.datasets.Like]], ellipsis]]] = None, where: Optional[str] = None, expand: bool = True, **kwds) → Iterator[lsst.daf.butler.core.dimensions.coordinate.DataCoordinate]
Query for and iterate over data IDs matching user-provided criteria.
Parameters:
- dimensions : Iterable of Dimension or str
  The dimensions of the data IDs to yield, as either Dimension instances or str. Will be automatically expanded to a complete DimensionGraph.
- dataId : dict or DataCoordinate, optional
  A data ID whose key-value pairs are used as equality constraints in the query.
- datasets : Mapping, optional
  Datasets whose existence in the registry constrains the set of data IDs returned. This is a mapping from a dataset type expression (a str name, a true DatasetType instance, a Like pattern for the name, or ... for all DatasetTypes) to a collections expression (a sequence of str or Like patterns, or ... for all collections).
- where : str, optional
  A string expression similar to a SQL WHERE clause. May involve any column of a dimension table or (as a shortcut for the primary key column of a dimension table) dimension name.
- expand : bool, optional
  If True (default) yield ExpandedDataCoordinate instead of minimal DataCoordinate base-class instances.
- kwds
  Additional keyword arguments are forwarded to DataCoordinate.standardize when processing the dataId argument (and may be used to provide a constraining data ID even when the dataId argument is None).
Yields:
- dataId : DataCoordinate
  Data IDs matching the given query parameters. Order is unspecified.
-
registerDatasetType(datasetType)
Add a new DatasetType to the Registry.
It is not an error to register the same DatasetType twice.
Parameters:
- datasetType : DatasetType
  The DatasetType to be added.
Raises:
- ValueError
  Raised if the dimensions or storage class are invalid.
- ConflictingDefinitionError
  Raised if this DatasetType is already registered with a different definition.
-
removeDataset(ref)
Remove a dataset from the Registry.
The dataset and all components will be removed unconditionally from all collections, and any associated Quantum records will also be removed. Datastore records will not be deleted; the caller is responsible for ensuring that the dataset has already been removed from all Datastores.
Parameters:
- ref : DatasetRef
  Reference to the dataset to be removed. Must include a valid id attribute, and should be considered invalidated upon return.
Raises:
- AmbiguousDatasetError
  Raised if ref.id is None.
- OrphanedRecordError
  Raised if the dataset is still present in any Datastore.
-
removeDatasetLocation(datastoreName, ref)
Remove datastore location associated with this dataset.
Typically used by Datastore when a dataset is removed.
Parameters:
- datastoreName : str
  Name of this Datastore.
- ref : DatasetRef
  A reference to the dataset for which information is to be removed.
Raises:
- AmbiguousDatasetError
  Raised if ref.id is None.
-
classmethod setConfigRoot(root, config, full, overwrite=True)
Set any filesystem-dependent config options for this Registry to be appropriate for a new empty repository with the given root.
Parameters:
- root : str
  Filesystem path to the root of the data repository.
- config : Config
  A Config to update. Only the subset understood by this component will be updated. Will not expand defaults.
- full : Config
  A complete config with all defaults expanded that can be converted to a RegistryConfig. Read-only and will not be modified by this method. Repository-specific options that should not be obtained from defaults when Butler instances are constructed should be copied from full to config.
- overwrite : bool, optional
  If False, do not modify a value in config if the value already exists. Default is always to overwrite with the provided root.
Notes
If a keyword is explicitly defined in the supplied config it will not be overridden by this method if overwrite is False. This allows explicit values set in external configs to be retained.
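The effect of overwrite can be shown with plain dictionaries standing in for Config objects. This is a rough sketch under stated assumptions rather than the actual method: the function name `set_config_root`, the `db` and `cls` keys, the database URL layout, and the example values are all hypothetical.

```python
def set_config_root(root, config, full, overwrite=True):
    """Fill root-dependent and repository-specific options into ``config``."""
    # Root-dependent option (hypothetical "db" key and URL layout).
    if overwrite or "db" not in config:
        config["db"] = f"sqlite:///{root}/registry.sqlite3"
    # Repository-specific options are copied from ``full``, which is only read.
    for key in ("cls",):
        if key in full and (overwrite or key not in config):
            config[key] = full[key]


full = {"db": "sqlite:///:memory:", "cls": "SqliteRegistry"}

# An explicitly configured value survives when overwrite is False.
explicit = {"db": "sqlite:////custom/path.sqlite3"}
set_config_root("/repo", explicit, full, overwrite=False)

# With the default overwrite=True the value is derived from the new root.
fresh = {}
set_config_root("/repo", fresh, full)
```

In both calls `full` is left untouched, matching its description as read-only.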
-
transaction()
Context manager that implements SQL transactions.
Will roll back any changes to the SqlRegistry database in case an exception is raised in the enclosed block.
This context manager may be nested.
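A nestable, rolling-back context manager of this kind can be sketched with SQLite savepoints. This is one possible approach shown with the standard sqlite3 module, not a description of how SqlRegistry implements transaction(): the free function `transaction`, the savepoint naming, and the `run` table are assumptions made for the example.

```python
import sqlite3
from contextlib import contextmanager
from itertools import count

_savepoint_ids = count()


@contextmanager
def transaction(connection):
    """Run the enclosed block in a savepoint; undo it if an exception escapes."""
    name = f"sp_{next(_savepoint_ids)}"
    connection.execute(f"SAVEPOINT {name}")
    try:
        yield
    except BaseException:
        # Undo everything done since the savepoint, then discard it.
        connection.execute(f"ROLLBACK TO SAVEPOINT {name}")
        connection.execute(f"RELEASE SAVEPOINT {name}")
        raise
    else:
        # Releasing the outermost savepoint commits the work.
        connection.execute(f"RELEASE SAVEPOINT {name}")


# isolation_level=None leaves transaction control to the explicit statements.
connection = sqlite3.connect(":memory:", isolation_level=None)
connection.execute("CREATE TABLE run (collection TEXT)")

with transaction(connection):
    connection.execute("INSERT INTO run VALUES ('outer')")
    try:
        with transaction(connection):
            connection.execute("INSERT INTO run VALUES ('inner')")
            raise RuntimeError("simulated failure")
    except RuntimeError:
        pass  # the inner block was rolled back; the outer block continues

rows = [row[0] for row in connection.execute("SELECT collection FROM run")]
```

Only the row written by the outer block remains, because the exception raised inside the nested block rolled back the inner insert before propagating.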