DatasetRecordStorage

class lsst.daf.butler.registry.interfaces.DatasetRecordStorage(datasetType: DatasetType)

Bases: ABC

An interface that manages the records associated with a particular DatasetType.

Parameters:
datasetType : DatasetType

Dataset type whose records this object manages.

Methods Summary

associate(collection, datasets)

Associate one or more datasets with a collection.

certify(collection, datasets, timespan, context)

Associate one or more datasets with a calibration collection and a validity range within it.

decertify(collection, timespan, *[, dataIds])

Remove or adjust datasets to clear a validity range within a calibration collection.

delete(datasets)

Fully delete the given datasets from the registry.

disassociate(collection, datasets)

Remove one or more datasets from a collection.

import_(run, datasets)

Insert one or more dataset entries into the database.

insert(run, dataIds[, idGenerationMode])

Insert one or more dataset entries into the database.

make_query_joiner(collections, fields)

Make a direct_query_driver.QueryJoiner that represents a search for datasets of this type.

make_relation(*collections, columns, context)

Return a sql.Relation that represents a query for this DatasetType in one or more collections.

refresh_collection_summaries()

Make sure that collection summaries for this dataset type are consistent with the contents of the dataset tables.

Methods Documentation

abstract associate(collection: CollectionRecord, datasets: Iterable[DatasetRef]) -> None

Associate one or more datasets with a collection.

Parameters:
collection : CollectionRecord

The record object describing the collection. collection.type must be TAGGED.

datasets : Iterable [ DatasetRef ]

Datasets to be associated. All datasets must be resolved and have the same DatasetType as self.

Raises:
AmbiguousDatasetError

Raised if any of the given DatasetRef instances is unresolved.

Notes

Associating a dataset into a collection that already contains a different dataset with the same DatasetType and data ID will remove the existing dataset from that collection.

Associating the same dataset into a collection multiple times is a no-op, but is still not permitted on read-only databases.
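The replacement and no-op behavior described in these notes can be pictured with a small in-memory model. This is only a sketch: FakeRef and FakeTaggedCollection are hypothetical stand-ins written for illustration, not part of lsst.daf.butler, and a real implementation works against database tables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeRef:
    """Hypothetical stand-in for a resolved DatasetRef."""

    dataset_id: str
    dataset_type: str
    data_id: tuple  # e.g. (("detector", 1),)


class FakeTaggedCollection:
    """Hypothetical in-memory model of a TAGGED collection."""

    def __init__(self):
        # At most one dataset per (dataset type, data ID) pair.
        self._members = {}

    def associate(self, refs):
        for ref in refs:
            # A different dataset with the same key is replaced;
            # re-adding the same dataset leaves the mapping unchanged.
            self._members[(ref.dataset_type, ref.data_id)] = ref

    def dataset_ids(self):
        return sorted(ref.dataset_id for ref in self._members.values())


tagged = FakeTaggedCollection()
ref_a = FakeRef("a", "flat", (("detector", 1),))
ref_b = FakeRef("b", "flat", (("detector", 1),))
tagged.associate([ref_a])
tagged.associate([ref_a])  # same dataset again: no change
tagged.associate([ref_b])  # same type and data ID: replaces ref_a
```

Keying the mapping on the (dataset type, data ID) pair is what makes both behaviors fall out of a single assignment.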

abstract certify(collection: CollectionRecord, datasets: Iterable[DatasetRef], timespan: Timespan, context: SqlQueryContext) -> None

Associate one or more datasets with a calibration collection and a validity range within it.

Parameters:
collection : CollectionRecord

The record object describing the collection. collection.type must be CALIBRATION.

datasets : Iterable [ DatasetRef ]

Datasets to be associated. All datasets must be resolved and have the same DatasetType as self.

timespan : Timespan

The validity range for these datasets within the collection.

context : SqlQueryContext

The object that manages database connections, temporary tables and relation engines for this query.

Raises:
AmbiguousDatasetError

Raised if any of the given DatasetRef instances is unresolved.

ConflictingDefinitionError

Raised if the collection already contains a different dataset with the same DatasetType and data ID and an overlapping validity range.

CollectionTypeError

Raised if collection.type is not CollectionType.CALIBRATION or if self.datasetType.isCalibration() is False.
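The conflict rule for certify can be illustrated with a minimal in-memory sketch. All names here (SketchSpan, SketchCalibCollection, SketchConflictError) are hypothetical, validity ranges are assumed to be half-open intervals over integers, and conflicts between datasets within a single call are not checked.

```python
from dataclasses import dataclass


class SketchConflictError(RuntimeError):
    """Hypothetical stand-in for ConflictingDefinitionError."""


@dataclass(frozen=True)
class SketchSpan:
    """Hypothetical validity range, assumed half-open: [begin, end)."""

    begin: int
    end: int

    def overlaps(self, other):
        return self.begin < other.end and other.begin < self.end


class SketchCalibCollection:
    """Hypothetical in-memory model of a CALIBRATION collection."""

    def __init__(self):
        self.rows = []  # (data_id, dataset_id, span) tuples

    def certify(self, refs, span):
        refs = list(refs)
        # Check every incoming dataset before storing any of them.
        for data_id, dataset_id in refs:
            for old_data_id, old_dataset_id, old_span in self.rows:
                if (
                    old_data_id == data_id
                    and old_dataset_id != dataset_id
                    and old_span.overlaps(span)
                ):
                    raise SketchConflictError(
                        f"{dataset_id} overlaps {old_dataset_id}"
                    )
        self.rows.extend((data_id, dataset_id, span) for data_id, dataset_id in refs)


calib = SketchCalibCollection()
calib.certify([("detector=1", "bias-a")], SketchSpan(0, 10))
calib.certify([("detector=1", "bias-b")], SketchSpan(10, 20))  # adjacent, no overlap
try:
    calib.certify([("detector=1", "bias-c")], SketchSpan(5, 15))
    conflict_raised = False
except SketchConflictError:
    conflict_raised = True
```

Under the half-open assumption, ranges that merely touch at an endpoint do not conflict, so consecutive calibrations can be certified back to back.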

abstract decertify(collection: CollectionRecord, timespan: Timespan, *, dataIds: Iterable[DataCoordinate] | None = None, context: SqlQueryContext) -> None

Remove or adjust datasets to clear a validity range within a calibration collection.

Parameters:
collection : CollectionRecord

The record object describing the collection. collection.type must be CALIBRATION.

timespan : Timespan

The validity range to remove datasets from within the collection. Datasets that overlap this range but are not contained by it will have their validity ranges adjusted to not overlap it, which may split a single dataset validity range into two.

dataIds : Iterable [ DataCoordinate ], optional

Data IDs that should be decertified within the given validity range. If None, all data IDs for self.datasetType will be decertified.

context : SqlQueryContext

The object that manages database connections, temporary tables and relation engines for this query.

Raises:
CollectionTypeError

Raised if collection.type is not CollectionType.CALIBRATION.
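The range adjustment described for the timespan parameter, including the case where one validity range is split in two, can be sketched with plain tuples. The helper clear_range is hypothetical and assumes half-open (begin, end) ranges over comparable values; it is not the library's implementation.

```python
def clear_range(existing, cleared):
    """Return the parts of ``existing`` that lie outside ``cleared``.

    Both arguments are (begin, end) tuples treated as half-open ranges.
    """
    begin, end = existing
    c_begin, c_end = cleared
    if c_end <= begin or end <= c_begin:
        return [existing]  # no overlap: the range is left untouched
    remaining = []
    if begin < c_begin:
        remaining.append((begin, c_begin))  # piece before the cleared range
    if c_end < end:
        remaining.append((c_end, end))  # piece after the cleared range
    return remaining


split = clear_range((0, 10), (3, 6))      # cleared range strictly inside
trimmed = clear_range((0, 10), (5, 20))   # cleared range covers the tail
removed = clear_range((0, 10), (0, 10))   # fully contained: nothing remains
```

A dataset whose validity range is fully contained in the cleared range yields an empty list, which corresponds to removing it from the collection.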

abstract delete(datasets: Iterable[DatasetRef]) -> None

Fully delete the given datasets from the registry.

Parameters:
datasets : Iterable [ DatasetRef ]

Datasets to be deleted. All datasets must be resolved and have the same DatasetType as self.

Raises:
AmbiguousDatasetError

Raised if any of the given DatasetRef instances is unresolved.

abstract disassociate(collection: CollectionRecord, datasets: Iterable[DatasetRef]) -> None

Remove one or more datasets from a collection.

Parameters:
collection : CollectionRecord

The record object describing the collection. collection.type must be TAGGED.

datasets : Iterable [ DatasetRef ]

Datasets to be disassociated. All datasets must be resolved and have the same DatasetType as self.

Raises:
AmbiguousDatasetError

Raised if any of the given DatasetRef instances is unresolved.

abstract import_(run: RunRecord, datasets: Iterable[DatasetRef]) -> Iterator[DatasetRef]

Insert one or more dataset entries into the database.

Parameters:
run : RunRecord

The record object describing the RUN collection this dataset will be associated with.

datasets : Iterable [ DatasetRef ]

Datasets to be inserted. Each dataset may specify an id attribute, which will be used for the inserted dataset. All dataset IDs must have the same type (int or uuid.UUID). If the type of the dataset IDs does not match the type supported by this class, the IDs will be ignored and new IDs will be generated by the backend.

Returns:
datasets : Iterable [ DatasetRef ]

References to the inserted or existing datasets.

Notes

The datasetType and run attributes are expected to be identical across all of the given datasets, but this is not checked here; it should be enforced by higher-level registry code. This method does not need to use those attributes; only dataId and id are relevant.
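The ID handling described for the datasets parameter can be sketched as follows. The function resolve_import_ids is a hypothetical helper, and the sketch assumes a backend that supports uuid.UUID identifiers and generates replacements with uuid.uuid4; the real backend's generation scheme is not specified here.

```python
import uuid


def resolve_import_ids(requested_ids, supported_type=uuid.UUID):
    """Keep the requested IDs if their type is supported, else regenerate.

    Hypothetical helper: assumes all requested IDs share one type, as the
    parameter description requires.
    """
    requested_ids = list(requested_ids)
    if all(isinstance(i, supported_type) for i in requested_ids):
        return requested_ids
    # Unsupported ID type: ignore the given IDs and create new ones.
    return [uuid.uuid4() for _ in requested_ids]


kept = resolve_import_ids([uuid.UUID(int=1), uuid.UUID(int=2)])
regenerated = resolve_import_ids([1, 2])  # int IDs, UUID backend
```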

abstract insert(run: RunRecord, dataIds: Iterable[DataCoordinate], idGenerationMode: DatasetIdGenEnum = DatasetIdGenEnum.UNIQUE) -> Iterator[DatasetRef]

Insert one or more dataset entries into the database.

Parameters:
run : RunRecord

The record object describing the RUN collection this dataset will be associated with.

dataIds : Iterable [ DataCoordinate ]

Expanded data IDs (DataCoordinate instances) for the datasets to be added. The dimensions of all data IDs must be the same as self.datasetType.dimensions.

idGenerationMode : DatasetIdGenEnum

With UNIQUE, each new dataset is inserted with a new unique ID. With a non-UNIQUE mode, the ID is computed from some combination of dataset type, data ID, and run collection name; if the same ID is already in the database, no new record is inserted.

Returns:
datasets : Iterable [ DatasetRef ]

References to the inserted datasets.
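The difference between UNIQUE and deterministic ID generation can be illustrated with the standard uuid module. This is a sketch under assumptions: the namespace, the string layout, the choice to hash all three inputs, and the helper names make_dataset_id and insert_once are hypothetical, not the scheme used by the actual backend.

```python
import uuid

# Hypothetical namespace, chosen only for this sketch.
SKETCH_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "example.invalid")


def make_dataset_id(dataset_type, data_id, run, unique=True):
    """Return a random ID (unique mode) or a reproducible one."""
    if unique:
        return uuid.uuid4()
    # Sort the data ID items so the result does not depend on dict order.
    items = ",".join(f"{key}={value}" for key, value in sorted(data_id.items()))
    return uuid.uuid5(SKETCH_NAMESPACE, f"{dataset_type}|{run}|{items}")


def insert_once(table, dataset_id):
    """Add the ID to ``table`` (a set); return False if already present."""
    if dataset_id in table:
        return False
    table.add(dataset_id)
    return True


table = set()
first = make_dataset_id("raw", {"visit": 42, "detector": 1}, "run1", unique=False)
second = make_dataset_id("raw", {"detector": 1, "visit": 42}, "run1", unique=False)
inserted_first = insert_once(table, first)
inserted_second = insert_once(table, second)  # same ID, so nothing is added
```

Because the deterministic ID depends only on its inputs, repeating an insert with the same dataset type, data ID, and run yields the same ID and is skipped.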

abstract make_query_joiner(collections: Sequence[CollectionRecord], fields: Set[str]) -> QueryJoiner

Make a direct_query_driver.QueryJoiner that represents a search for datasets of this type.

Parameters:
collections : Sequence [ CollectionRecord ]

Collections to search, in order, after filtering out collections with no datasets of this type via collection summaries.

fields : Set [ str ]

Names of fields to make available in the joiner. Options include:

  • dataset_id (UUID)

  • run (collection name, str)

  • collection (collection name, str)

  • collection_key (collection primary key, manager-dependent)

  • timespan (validity range, or unbounded for non-calibrations)

  • ingest_date (time dataset was ingested into repository)

Dimension keys for the dataset type’s required dimensions are always included.

Returns:
joiner : direct_query_driver.QueryJoiner

A query-construction object representing a table or subquery. If fields is empty or len(collections) <= 1, this is guaranteed to have rows that are unique over dimension keys.

abstract make_relation(*collections: CollectionRecord, columns: Set[str], context: SqlQueryContext) -> Relation

Return a sql.Relation that represents a query for this DatasetType in one or more collections.

Parameters:
*collections : CollectionRecord

The record object(s) describing the collection(s) to query. May not be of type CollectionType.CHAINED. If multiple collections are passed, the query will search all of them in an unspecified order, and all collections must have the same type. Must include at least one collection.

columns : Set [ str ]

Columns to include in the relation. See Query.find_datasets for most options, but this method supports one more:

  • rank: a calculated integer column holding the index of the collection the dataset was found in, within the collections sequence given.

context : SqlQueryContext

The object that manages database connections, temporary tables and relation engines for this query.

Returns:
relation : Relation

Representation of the query.
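One plausible use of the rank column is to pick, for each data ID, the dataset from the earliest collection in the sequence that was passed in. The sketch below models this with plain tuples; first_match_by_rank and the sample rows are hypothetical and do not use the sql.Relation API.

```python
def first_match_by_rank(rows):
    """Keep, for each data ID, the dataset with the lowest rank.

    ``rows`` is an iterable of (data_id, dataset_id, rank) tuples.
    """
    best = {}
    for data_id, dataset_id, rank in rows:
        if data_id not in best or rank < best[data_id][1]:
            best[data_id] = (dataset_id, rank)
    return {data_id: found[0] for data_id, found in best.items()}


# Collections in the order given by the caller; rank is the index here.
collections = ["run_b", "run_a"]
found = [
    ("detector=1", "id-1", "run_a"),
    ("detector=1", "id-2", "run_b"),
    ("detector=2", "id-3", "run_a"),
]
rows = [
    (data_id, dataset_id, collections.index(name))
    for data_id, dataset_id, name in found
]
result = first_match_by_rank(rows)
```

Since the query itself searches the collections in an unspecified order, a rank column like this is what lets a caller impose the intended search order afterwards.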

abstract refresh_collection_summaries() -> None

Make sure that collection summaries for this dataset type are consistent with the contents of the dataset tables.