DatasetRecordStorageManager

class lsst.daf.butler.registry.interfaces.DatasetRecordStorageManager(*, registry_schema_version: VersionTuple | None = None)

Bases: VersionedExtension

An interface that manages the tables that describe datasets.

DatasetRecordStorageManager primarily serves as a container and factory for DatasetRecordStorage instances, which each provide access to the records for a different DatasetType.

Parameters:
registry_schema_version : VersionTuple or None, optional

Version of registry schema.

Methods Summary

addDatasetForeignKey(tableSpec, *[, name, ...])

Add a foreign key (field and constraint) referencing the dataset table.

associate(dataset_type, collection, datasets)

Associate one or more datasets with a collection.

certify(dataset_type, collection, datasets, ...)

Associate one or more datasets with a calibration collection and a validity range within it.

clone(*, db, collections, dimensions, ...)

Make an independent copy of this manager instance bound to new instances of Database and other managers.

conform_exact_dataset_type(dataset_type)

Conform a value that may be a dataset type or dataset type name to just the dataset type name, while checking that the dataset type is not a component and (if a DatasetType instance is given) has the exact same definition in the registry.

decertify(dataset_type, collection, timespan, *)

Remove or adjust datasets to clear a validity range within a calibration collection.

delete(datasets)

Fully delete the given datasets from the registry.

disassociate(dataset_type, collection, datasets)

Remove one or more datasets from a collection.

fetch_summaries(collections[, dataset_types])

Fetch collection summaries given their names and dataset types.

getCollectionSummary(collection)

Return a summary for the given collection.

getDatasetRef(id)

Return a DatasetRef for the given dataset primary key value.

get_dataset_type(name)

Look up a dataset type by name.

import_(dataset_type, run, data_ids)

Insert one or more dataset entries into the database.

ingest_date_dtype()

Return type of the ingest_date column.

initialize(db, context, *, collections, ...)

Construct an instance of the manager.

insert(dataset_type_name, run, data_ids[, ...])

Insert one or more dataset entries into the database.

make_joins_builder(dataset_type, ...[, is_union])

Make a direct_query_driver.SqlJoinsBuilder that represents a search for datasets of this type.

make_relation(dataset_type, *collections, ...)

Return a sql.Relation that represents a query for this DatasetType in one or more collections.

preload_cache()

Fetch data from the database and use it to pre-populate caches to speed up later operations.

refresh()

Ensure all other operations on this manager are aware of any dataset types that may have been registered by other clients since it was initialized or last refreshed.

refresh_collection_summaries(dataset_type)

Make sure that collection summaries for this dataset type are consistent with the contents of the dataset tables.

register_dataset_type(dataset_type)

Ensure that this Registry can hold records for the given DatasetType, creating new tables as necessary.

remove_dataset_type(name)

Remove the dataset type.

resolve_wildcard(expression[, missing, ...])

Resolve a dataset type wildcard expression.

Methods Documentation

abstract classmethod addDatasetForeignKey(tableSpec: TableSpec, *, name: str = 'dataset', constraint: bool = True, onDelete: str | None = None, **kwargs: Any) → FieldSpec

Add a foreign key (field and constraint) referencing the dataset table.

Parameters:
tableSpec : ddl.TableSpec

Specification for the table that should reference the dataset table. Will be modified in place.

name : str, optional

A name to use for the prefix of the new field; the full name is {name}_id.

constraint : bool, optional

If False (the default is True), add a field that can be joined to the dataset primary key, but do not add a foreign key constraint.

onDelete : str, optional

One of “CASCADE” or “SET NULL”, indicating what should happen to the referencing row if the dataset row is deleted. None indicates that this should be an integrity error.

**kwargs

Additional keyword arguments are forwarded to the ddl.FieldSpec constructor (only the name and dtype arguments are otherwise provided).

Returns:
idSpec : ddl.FieldSpec

Specification for the ID field.
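
A hedged usage sketch, not from the source: ManagerClass and table_spec are placeholders for a concrete DatasetRecordStorageManager subclass and an existing ddl.TableSpec for a table that should reference the dataset table.

    # Hedged sketch: add a dataset foreign key to an existing table spec.
    id_field = ManagerClass.addDatasetForeignKey(
        table_spec,
        name="dataset",      # the new field will be named "dataset_id"
        onDelete="CASCADE",  # drop referencing rows when the dataset is deleted
    )
    # `id_field` is the ddl.FieldSpec that was added to `table_spec`.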

abstract associate(dataset_type: DatasetType, collection: CollectionRecord, datasets: Iterable[DatasetRef]) → None

Associate one or more datasets with a collection.

Parameters:
dataset_type : DatasetType

Type of all datasets.

collection : CollectionRecord

The record object describing the collection. collection.type must be TAGGED.

datasets : Iterable [ DatasetRef ]

Datasets to be associated. All datasets must have the same DatasetType as dataset_type, but this is not checked.

Notes

Associating a dataset into a collection that already contains a different dataset with the same DatasetType and data ID will remove the existing dataset from that collection.

Associating the same dataset into a collection multiple times is a no-op, but is still not permitted on read-only databases.
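
A hedged usage sketch, not from the source: manager, raw_type, tagged_record, and refs are assumed to already exist (a concrete manager, a DatasetType, a TAGGED CollectionRecord, and resolved DatasetRef instances).

    # Hedged sketch: tag already-resolved datasets into a TAGGED collection.
    manager.associate(raw_type, tagged_record, refs)

    # Re-associating the same refs is a no-op (but still requires write access);
    # associating a different dataset with the same data ID replaces the old one.
    manager.associate(raw_type, tagged_record, refs)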

abstract certify(dataset_type: DatasetType, collection: CollectionRecord, datasets: Iterable[DatasetRef], timespan: Timespan, context: SqlQueryContext) → None

Associate one or more datasets with a calibration collection and a validity range within it.

Parameters:
dataset_type : DatasetType

Type of all datasets.

collection : CollectionRecord

The record object describing the collection. collection.type must be CALIBRATION.

datasets : Iterable [ DatasetRef ]

Datasets to be associated. All datasets must have the same DatasetType as dataset_type, but this is not checked.

timespan : Timespan

The validity range for these datasets within the collection.

context : SqlQueryContext

The object that manages database connections, temporary tables and relation engines for this query.

The object that manages database connections, temporary tables and relation engines for this query.

Raises:
ConflictingDefinitionError

Raised if the collection already contains a different dataset with the same DatasetType and data ID and an overlapping validity range.

DatasetTypeError

Raised if dataset_type.isCalibration() is False.

CollectionTypeError

Raised if collection.type is not CollectionType.CALIBRATION.
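
A hedged usage sketch, not from the source: manager, a CALIBRATION collection record calib_record, calibration refs of dataset type bias_type, and a SqlQueryContext ctx are assumed to already exist; lsst.daf.butler.Timespan and astropy.time.Time are real classes, the surrounding names are placeholders.

    from astropy.time import Time
    from lsst.daf.butler import Timespan

    # Hedged sketch: certify bias frames as valid for calendar year 2024.
    validity = Timespan(
        begin=Time("2024-01-01T00:00:00", scale="tai"),
        end=Time("2025-01-01T00:00:00", scale="tai"),
    )
    manager.certify(bias_type, calib_record, refs, validity, context=ctx)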

abstract clone(*, db: Database, collections: CollectionManager, dimensions: DimensionRecordStorageManager, caching_context: CachingContext) → DatasetRecordStorageManager

Make an independent copy of this manager instance bound to new instances of Database and other managers.

Parameters:
db : Database

New Database object to use when instantiating the manager.

collections : CollectionManager

New CollectionManager object to use when instantiating the manager.

dimensions : DimensionRecordStorageManager

New DimensionRecordStorageManager object to use when instantiating the manager.

caching_context : CachingContext

New CachingContext object to use when instantiating the manager.

Returns:
instance : DatasetRecordStorageManager

New manager instance with the same configuration as this instance, but bound to a new Database object.

conform_exact_dataset_type(dataset_type: DatasetType | str) → DatasetType

Conform a value that may be a dataset type or dataset type name to just the dataset type name, while checking that the dataset type is not a component and (if a DatasetType instance is given) has the exact same definition in the registry.

Parameters:
dataset_type : str or DatasetType

Dataset type object or name.

Returns:
dataset_type : DatasetType

The corresponding registered dataset type.

Raises:
DatasetTypeError

Raised if dataset_type is a component, or if its definition does not exactly match the registered dataset type.

MissingDatasetTypeError

Raised if this dataset type is not registered at all.
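
A hedged sketch of both call styles, not from the source: manager is assumed to be a concrete instance and "raw" a registered, non-component dataset type name.

    # Hedged sketch: look up by name, or validate an existing DatasetType object.
    registered = manager.conform_exact_dataset_type("raw")
    same = manager.conform_exact_dataset_type(registered)
    # Passing a DatasetType whose definition differs from the registered one
    # raises DatasetTypeError; an unregistered name raises MissingDatasetTypeError.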

abstract decertify(dataset_type: DatasetType, collection: CollectionRecord, timespan: Timespan, *, data_ids: Iterable[DataCoordinate] | None = None, context: SqlQueryContext) → None

Remove or adjust datasets to clear a validity range within a calibration collection.

Parameters:
dataset_type : DatasetType

Type of all datasets.

collection : CollectionRecord

The record object describing the collection. collection.type must be CALIBRATION.

timespan : Timespan

The validity range to remove datasets from within the collection. Datasets that overlap this range but are not contained by it will have their validity ranges adjusted to not overlap it, which may split a single dataset validity range into two.

data_ids : Iterable [ DataCoordinate ], optional

Data IDs that should be decertified within the given validity range. If None, all data IDs for dataset_type in collection will be decertified.

context : SqlQueryContext

The object that manages database connections, temporary tables and relation engines for this query.

The object that manages database connections, temporary tables and relation engines for this query.

Raises:
DatasetTypeError

Raised if dataset_type.isCalibration() is False.

CollectionTypeError

Raised if collection.type is not CollectionType.CALIBRATION.
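
A hedged counterpart to the certify sketch above, not from the source, using the same placeholder names (manager, bias_type, calib_record, ctx).

    from astropy.time import Time
    from lsst.daf.butler import Timespan

    # Hedged sketch: clear the second half of 2024 for all data IDs of
    # `bias_type` in the CALIBRATION collection `calib_record`.
    window = Timespan(
        begin=Time("2024-07-01T00:00:00", scale="tai"),
        end=Time("2025-01-01T00:00:00", scale="tai"),
    )
    manager.decertify(bias_type, calib_record, window, context=ctx)
    # Datasets whose validity ranges merely overlap `window` are trimmed or
    # split rather than removed outright.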

abstract delete(datasets: Iterable[UUID | DatasetRef]) → None

Fully delete the given datasets from the registry.

Parameters:
datasets : Iterable [ DatasetId or DatasetRef ]

Datasets to be deleted. If DatasetRef instances are passed, only the DatasetRef.id attribute is used.
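
A hedged usage sketch, not from the source: refs is assumed to be a list of resolved DatasetRef instances.

    # Hedged sketch: delete by DatasetRef or by raw UUID.
    manager.delete(refs)                       # DatasetRef instances; only .id is used
    manager.delete([ref.id for ref in refs])   # equivalent, passing UUIDs directly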

abstract disassociate(dataset_type: DatasetType, collection: CollectionRecord, datasets: Iterable[DatasetRef]) → None

Remove one or more datasets from a collection.

Parameters:
dataset_type : DatasetType

Type of all datasets.

collection : CollectionRecord

The record object describing the collection. collection.type must be TAGGED.

datasets : Iterable [ DatasetRef ]

Datasets to be disassociated. All datasets must have the same DatasetType as dataset_type, but this is not checked.

abstract fetch_summaries(collections: Iterable[CollectionRecord], dataset_types: Iterable[DatasetType] | Iterable[str] | None = None) → Mapping[Any, CollectionSummary]

Fetch collection summaries given their names and dataset types.

Parameters:
collections : Iterable [CollectionRecord]

Collection records to query.

dataset_types : Iterable [DatasetType] or None

Dataset types to include in the returned summaries. If None, all dataset types will be included.

Returns:
summaries : Mapping [Any, CollectionSummary]

Collection summaries indexed by collection record key. This mapping will also contain all nested non-chained collections of the chained collections.
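
A hedged usage sketch, not from the source: records is assumed to be an iterable of CollectionRecord objects obtained from the collections manager.

    # Hedged sketch: fetch summaries restricted to two dataset type names.
    summaries = manager.fetch_summaries(records, dataset_types=["raw", "bias"])
    for key, summary in summaries.items():
        # `key` is the collection record key; chained collections also
        # contribute entries for their nested non-chained children.
        print(key, summary)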

abstract getCollectionSummary(collection: CollectionRecord) → CollectionSummary

Return a summary for the given collection.

Parameters:
collection : CollectionRecord

Record describing the collection for which a summary is to be retrieved.

Returns:
summary : CollectionSummary

Summary of the dataset types and governor dimension values in this collection.

abstract getDatasetRef(id: UUID) → DatasetRef | None

Return a DatasetRef for the given dataset primary key value.

Parameters:
id : DatasetId

Primary key value for the dataset.

Returns:
ref : DatasetRef or None

Object representing the dataset, or None if no dataset with the given primary key values exists in this layer.
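
A hedged usage sketch, not from the source: the UUID value is a placeholder.

    import uuid

    # Hedged sketch: look a dataset up by its UUID primary key.
    dataset_id = uuid.UUID("00000000-0000-0000-0000-000000000000")  # placeholder
    ref = manager.getDatasetRef(dataset_id)
    if ref is None:
        print("no dataset with that ID in this layer")
    else:
        print(ref.datasetType.name, ref.dataId)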

abstract get_dataset_type(name: str) → DatasetType

Look up a dataset type by name.

Parameters:
name : str

Name of a parent dataset type.

Returns:
dataset_type : DatasetType

The object representing the records for the given dataset type.

Raises:
MissingDatasetTypeError

Raised if there is no dataset type with the given name.
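
A hedged usage sketch, not from the source; the import path for MissingDatasetTypeError is the registry exceptions module and is an assumption worth verifying.

    from lsst.daf.butler.registry import MissingDatasetTypeError

    # Hedged sketch: look up a dataset type, tolerating its absence.
    try:
        raw_type = manager.get_dataset_type("raw")
    except MissingDatasetTypeError:
        raw_type = None  # "raw" has not been registered in this repository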

abstract import_(dataset_type: DatasetType, run: RunRecord, data_ids: Mapping[DatasetId, DataCoordinate]) → list[DatasetRef]

Insert one or more dataset entries into the database.

Parameters:
dataset_type : DatasetType

Type of dataset to import. Also used as the dataset type for the returned refs.

run : RunRecord

The record object describing the RUN collection these datasets will be associated with.

data_ids : Mapping

Mapping from dataset ID to data ID.

Returns:
datasets : list [ DatasetRef ]

References to the inserted or existing datasets.
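
A hedged usage sketch, not from the source: run_record is a RunRecord, raw_type a DatasetType, and refs are DatasetRef instances (for example exported from another repository) whose UUIDs should be preserved.

    # Hedged sketch: import datasets with known IDs into a RUN collection.
    data_ids = {ref.id: ref.dataId for ref in refs}
    imported = manager.import_(raw_type, run_record, data_ids)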

abstract ingest_date_dtype() → type

Return type of the ingest_date column.

abstract classmethod initialize(db: Database, context: StaticTablesContext, *, collections: CollectionManager, dimensions: DimensionRecordStorageManager, caching_context: CachingContext, registry_schema_version: VersionTuple | None = None) → DatasetRecordStorageManager

Construct an instance of the manager.

Parameters:
db : Database

Interface to the underlying database engine and namespace.

context : StaticTablesContext

Context object obtained from Database.declareStaticTables; used to declare any tables that should always be present.

collections : CollectionManager

Manager object for the collections in this Registry.

dimensions : DimensionRecordStorageManager

Manager object for the dimensions in this Registry.

caching_context : CachingContext

Object controlling caching of information returned by managers.

registry_schema_version : VersionTuple or None

Schema version of this extension as defined in registry.

Returns:
manager : DatasetRecordStorageManager

An instance of a concrete DatasetRecordStorageManager subclass.

abstract insert(dataset_type_name: str, run: RunRecord, data_ids: Iterable[DataCoordinate], id_generation_mode: DatasetIdGenEnum = DatasetIdGenEnum.UNIQUE) → list[DatasetRef]

Insert one or more dataset entries into the database.

Parameters:
dataset_type_name : str

Name of the dataset type.

run : RunRecord

The record object describing the RUN collection these datasets will be associated with.

data_ids : Iterable [ DataCoordinate ]

Expanded data IDs (DataCoordinate instances) for the datasets to be added. The dimensions of all data IDs must be the same as dataset_type.dimensions.

id_generation_mode : DatasetIdGenEnum

With UNIQUE, each new dataset is inserted with a new unique ID. With the non-UNIQUE modes, the ID is computed from some combination of the dataset type, data ID, and run collection name; if that ID is already in the database, no new record is inserted.

Returns:
datasets : list [ DatasetRef ]

References to the inserted datasets.
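
A hedged usage sketch, not from the source: expanded_ids is assumed to be an iterable of expanded DataCoordinate instances matching the dimensions of the "raw" dataset type, and run_record the RunRecord for the target RUN collection.

    # Hedged sketch: insert new datasets with freshly generated unique IDs.
    new_refs = manager.insert("raw", run_record, expanded_ids)
    print(len(new_refs), "datasets inserted")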

abstract make_joins_builder(dataset_type: DatasetType, collections: Sequence[CollectionRecord], fields: Set[str], is_union: bool = False) → SqlJoinsBuilder

Make a direct_query_driver.SqlJoinsBuilder that represents a search for datasets of this type.

Parameters:
dataset_type : DatasetType

Type of dataset to query for.

collections : Sequence [ CollectionRecord ]

Collections to search, in order, after filtering out collections with no datasets of this type via collection summaries.

fields : Set [ str ]

Names of fields to make available in the builder. Options include:

  • dataset_id (UUID)

  • run (collection name, str)

  • collection (collection name, str)

  • collection_key (collection primary key, manager-dependent)

  • timespan (validity range, or unbounded for non-calibrations)

  • ingest_date (time dataset was ingested into repository)

Dimension keys for the dataset type’s required dimensions are always included.

is_union : bool, optional

If True, this search is being joined in as part of one term in a union over all dataset types. This causes fields to be added to the builder via the special ... instead of the dataset type name.

Returns:
builder : direct_query_driver.SqlJoinsBuilder

A query-construction object representing a table or subquery.

abstract make_relation(dataset_type: DatasetType, *collections: CollectionRecord, columns: Set[str], context: SqlQueryContext) → Relation

Return a sql.Relation that represents a query for this DatasetType in one or more collections.

Parameters:
dataset_type : DatasetType

Type of dataset to query for.

*collections : CollectionRecord

The record object(s) describing the collection(s) to query. May not be of type CollectionType.CHAINED. If multiple collections are passed, the query will search all of them in an unspecified order, and all collections must have the same type. Must include at least one collection.

columns : Set [ str ]

Columns to include in the relation. See Query.find_datasets for most options, but this method supports one more:

  • rank: a calculated integer column holding the index of the collection the dataset was found in, within the collections sequence given.

context : SqlQueryContext

The object that manages database connections, temporary tables and relation engines for this query.

Returns:
relation : Relation

Representation of the query.

abstract preload_cache() → None

Fetch data from the database and use it to pre-populate caches to speed up later operations.

abstract refresh() → None

Ensure all other operations on this manager are aware of any dataset types that may have been registered by other clients since it was initialized or last refreshed.

abstract refresh_collection_summaries(dataset_type: DatasetType) → None

Make sure that collection summaries for this dataset type are consistent with the contents of the dataset tables.

Parameters:
dataset_type : DatasetType

Dataset type whose summary entries should be refreshed.

abstract register_dataset_type(dataset_type: DatasetType) → bool

Ensure that this Registry can hold records for the given DatasetType, creating new tables as necessary.

Parameters:
dataset_type : DatasetType

Dataset type for which a table should be created (as necessary) and an associated DatasetRecordStorage returned.

Returns:
inserted : bool

True if the dataset type did not exist in the registry before.

Notes

This operation may not be invoked within a Database.transaction context.
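
A hedged usage sketch, not from the source: universe is assumed to be the repository's DimensionUniverse, and the storage class name "StructuredDataDict" is a placeholder that would need to exist in the repository's storage class configuration.

    from lsst.daf.butler import DatasetType

    # Hedged sketch: define and register a new dataset type.
    flat_type = DatasetType(
        "flat",
        dimensions=universe.conform(["instrument", "detector", "physical_filter"]),
        storageClass="StructuredDataDict",
    )
    inserted = manager.register_dataset_type(flat_type)  # True if newly registered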

abstract remove_dataset_type(name: str) → None

Remove the dataset type.

Parameters:
name : str

Name of the dataset type.

abstract resolve_wildcard(expression: Any, missing: list[str] | None = None, explicit_only: bool = False) → list[lsst.daf.butler._dataset_type.DatasetType]

Resolve a dataset type wildcard expression.

Parameters:
expression : Any

Expression to resolve. Will be passed to DatasetTypeWildcard.from_expression.

missing : list of str, optional

String dataset type names that were explicitly given (i.e. not regular expression patterns) but not found will be appended to this list, if it is provided.

explicit_only : bool, optional

If True, require explicit DatasetType instances or str names, with re.Pattern instances deprecated and ... prohibited.

Returns:
dataset_types : list [ DatasetType ]

A list of resolved dataset types.
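
A hedged usage sketch, not from the source: the dataset type names and pattern are placeholders.

    import re

    # Hedged sketch: resolve a mixed wildcard expression and collect explicitly
    # named dataset types that turn out not to be registered.
    missing: list[str] = []
    matched = manager.resolve_wildcard(
        ["raw", "bias", re.compile(r"calexp.*")],
        missing=missing,
    )
    print([dt.name for dt in matched])  # resolved DatasetType objects
    print(missing)                      # e.g. ["bias"] if "bias" is not registered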