DatasetRecordStorageManager
- class lsst.daf.butler.registry.interfaces.DatasetRecordStorageManager(*, registry_schema_version: VersionTuple | None = None)
Bases: VersionedExtension
An interface that manages the tables that describe datasets.
DatasetRecordStorageManager primarily serves as a container and factory for DatasetRecordStorage instances, which each provide access to the records for a different DatasetType.
- Parameters:
  - registry_schema_version (VersionTuple or None, optional): Version of registry schema.
Methods Summary
addDatasetForeignKey(tableSpec, *[, name, ...]): Add a foreign key (field and constraint) referencing the dataset table.
associate(dataset_type, collection, datasets): Associate one or more datasets with a collection.
certify(dataset_type, collection, datasets, ...): Associate one or more datasets with a calibration collection and a validity range within it.
clone(*, db, collections, dimensions, ...): Make an independent copy of this manager instance bound to new instances of Database and other managers.
conform_exact_dataset_type(dataset_type): Conform a value that may be a dataset type or dataset type name to the registered DatasetType, while checking that the dataset type is not a component and (if a DatasetType instance is given) has the exact same definition in the registry.
decertify(dataset_type, collection, timespan, *): Remove or adjust datasets to clear a validity range within a calibration collection.
delete(datasets): Fully delete the given datasets from the registry.
disassociate(dataset_type, collection, datasets): Remove one or more datasets from a collection.
fetch_summaries(collections[, dataset_types]): Fetch collection summaries for the given collection records and dataset types.
getCollectionSummary(collection): Return a summary for the given collection.
getDatasetRef(id): Return a DatasetRef for the given dataset primary key value.
get_dataset_type(name): Look up a dataset type by name.
import_(dataset_type, run, data_ids): Insert one or more dataset entries into the database.
Return type of the ingest_date column.
initialize(db, context, *, collections, ...): Construct an instance of the manager.
insert(dataset_type_name, run, data_ids[, ...]): Insert one or more dataset entries into the database.
make_query_joiner(dataset_type, collections, ...): Make a direct_query_driver.QueryJoiner that represents a search for datasets of the given type.
make_relation(dataset_type, *collections, ...): Return a sql.Relation that represents a query for the given DatasetType in one or more collections.
refresh(): Ensure all other operations on this manager are aware of any dataset types that may have been registered by other clients since it was initialized or last refreshed.
refresh_collection_summaries(dataset_type): Make sure that collection summaries for this dataset type are consistent with the contents of the dataset tables.
register_dataset_type(dataset_type): Ensure that this Registry can hold records for the given DatasetType, creating new tables as necessary.
remove_dataset_type(name): Remove the dataset type.
resolve_wildcard(expression[, missing, ...]): Resolve a dataset type wildcard expression.
Methods Documentation
- abstract classmethod addDatasetForeignKey(tableSpec: TableSpec, *, name: str = 'dataset', constraint: bool = True, onDelete: str | None = None, **kwargs: Any) -> FieldSpec
Add a foreign key (field and constraint) referencing the dataset table.
- Parameters:
  - tableSpec (ddl.TableSpec): Specification for the table that should reference the dataset table. Will be modified in place.
  - name (str, optional): A name to use for the prefix of the new field; the full name is {name}_id.
  - constraint (bool, optional): If False (True is default), add a field that can be joined to the dataset primary key, but do not add a foreign key constraint.
  - onDelete (str, optional): One of "CASCADE" or "SET NULL", indicating what should happen to the referencing row if the referenced dataset row is deleted. None indicates that this should be an integrity error.
  - **kwargs: Additional keyword arguments are forwarded to the ddl.FieldSpec constructor (only the name and dtype arguments are otherwise provided).
- Returns:
  - idSpec (ddl.FieldSpec): Specification for the ID field.
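The field-naming and constraint rules above can be illustrated with a toy model. This is a minimal sketch, not the daf_butler implementation: ToyFieldSpec, ToyTableSpec, ToyForeignKey and add_dataset_foreign_key are hypothetical stand-ins, and the "uuid" dtype plus the referenced "dataset" table and "id" column are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


# Hypothetical stand-ins for ddl.FieldSpec / ddl.TableSpec (the real classes differ).
@dataclass
class ToyFieldSpec:
    name: str
    dtype: str
    extra: dict[str, Any] = field(default_factory=dict)


@dataclass
class ToyForeignKey:
    table: str
    source: tuple[str, ...]
    target: tuple[str, ...]
    onDelete: str | None


@dataclass
class ToyTableSpec:
    fields: list[ToyFieldSpec] = field(default_factory=list)
    foreignKeys: list[ToyForeignKey] = field(default_factory=list)


def add_dataset_foreign_key(table_spec, *, name="dataset", constraint=True, onDelete=None, **kwargs):
    """Toy version of the documented contract (assumed table/column names)."""
    if onDelete not in (None, "CASCADE", "SET NULL"):
        raise ValueError(f"Unsupported onDelete value: {onDelete!r}")
    # The new field is always called "{name}_id"; extra kwargs are kept with it.
    id_spec = ToyFieldSpec(name=f"{name}_id", dtype="uuid", extra=kwargs)
    table_spec.fields.append(id_spec)  # the spec is modified in place
    if constraint:
        # Only add the actual foreign key constraint when requested.
        table_spec.foreignKeys.append(
            ToyForeignKey("dataset", source=(id_spec.name,), target=("id",), onDelete=onDelete)
        )
    return id_spec
```

With constraint=False the toy still adds a joinable "dataset_id" field but no foreign key entry, mirroring the parameter description.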
- abstract associate(dataset_type: DatasetType, collection: CollectionRecord, datasets: Iterable[DatasetRef]) -> None
Associate one or more datasets with a collection.
- Parameters:
  - dataset_type (DatasetType): Type of all datasets.
  - collection (CollectionRecord): The record object describing the collection. collection.type must be TAGGED.
  - datasets (Iterable[DatasetRef]): Datasets to be associated. All datasets must have the same DatasetType as dataset_type, but this is not checked.
Notes
Associating a dataset with a collection that already contains a different dataset with the same DatasetType and data ID will remove the existing dataset from that collection.
Associating the same dataset with a collection multiple times is a no-op, but is still not permitted on read-only databases.
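The replacement and no-op rules in these notes can be modeled with a dictionary keyed by (dataset type, data ID). This is a hypothetical in-memory sketch, not how the manager stores TAGGED collections; ToyRef and ToyTaggedCollection are invented names, and raising PermissionError for a read-only database is an assumption of the toy.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyRef:
    # Hypothetical minimal stand-in for DatasetRef.
    id: uuid.UUID
    dataset_type: str
    data_id: tuple


class ToyTaggedCollection:
    """At most one dataset per (dataset type, data ID), as described above."""

    def __init__(self, read_only: bool = False) -> None:
        self.read_only = read_only
        self._members: dict[tuple[str, tuple], uuid.UUID] = {}

    def associate(self, datasets) -> None:
        if self.read_only:
            # Even a no-op association is rejected on a read-only database.
            raise PermissionError("database is read-only")
        for ref in datasets:
            # Assigning replaces a different dataset with the same key;
            # re-associating the same ID leaves the mapping unchanged.
            self._members[(ref.dataset_type, ref.data_id)] = ref.id

    def ids(self) -> set[uuid.UUID]:
        return set(self._members.values())
```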
- abstract certify(dataset_type: DatasetType, collection: CollectionRecord, datasets: Iterable[DatasetRef], timespan: Timespan, context: SqlQueryContext) -> None
Associate one or more datasets with a calibration collection and a validity range within it.
- Parameters:
  - dataset_type (DatasetType): Type of all datasets.
  - collection (CollectionRecord): The record object describing the collection. collection.type must be CALIBRATION.
  - datasets (Iterable[DatasetRef]): Datasets to be associated. All datasets must have the same DatasetType as dataset_type, but this is not checked.
  - timespan (Timespan): The validity range for these datasets within the collection.
  - context (SqlQueryContext): The object that manages database connections, temporary tables and relation engines for this query.
- Raises:
  - ConflictingDefinitionError: Raised if the collection already contains a different dataset with the same DatasetType and data ID and an overlapping validity range.
  - DatasetTypeError: Raised if dataset_type.isCalibration() is False.
  - CollectionTypeError: Raised if collection.type is not CollectionType.CALIBRATION.
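The ConflictingDefinitionError condition reduces to an interval-overlap test per data ID. The sketch below is a toy under stated assumptions: validity ranges are plain half-open (begin, end) number pairs rather than Timespan objects, the calibration collection is a dict, and ConflictingDefinitionError is a local stand-in for the registry exception.

```python
from __future__ import annotations


class ConflictingDefinitionError(Exception):
    """Local stand-in for the registry exception of the same name."""


def overlaps(a: tuple[float, float], b: tuple[float, float]) -> bool:
    # Half-open intervals [begin, end) overlap iff each starts before the other ends.
    return a[0] < b[1] and b[0] < a[1]


def toy_certify(calib: dict, data_id: str, dataset_id: str, span: tuple[float, float]) -> None:
    """calib maps data_id -> list of (dataset_id, (begin, end)) entries."""
    entries = calib.setdefault(data_id, [])
    for other_id, other_span in entries:
        # Only a *different* dataset with an overlapping range is a conflict.
        if other_id != dataset_id and overlaps(span, other_span):
            raise ConflictingDefinitionError(
                f"{data_id}: {other_id} is already valid over {other_span}"
            )
    entries.append((dataset_id, span))
```

Under the half-open assumption, back-to-back ranges such as [0, 10) and [10, 20) do not conflict.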
- abstract clone(*, db: Database, collections: CollectionManager, dimensions: DimensionRecordStorageManager, caching_context: CachingContext) -> DatasetRecordStorageManager
Make an independent copy of this manager instance bound to new instances of Database and other managers.
- Parameters:
  - db (Database): New Database object to use when instantiating the manager.
  - collections (CollectionManager): New CollectionManager object to use when instantiating the manager.
  - dimensions (DimensionRecordStorageManager): New DimensionRecordStorageManager object to use when instantiating the manager.
  - caching_context (CachingContext): New CachingContext object to use when instantiating the manager.
- Returns:
  - instance (DatasetRecordStorageManager): New manager instance with the same configuration as this instance, but bound to a new Database object.
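The "same configuration, new bindings" contract of clone can be expressed with dataclasses.replace. This is only an illustration of the idea: ToyManager and its schema_version field are hypothetical, and the bound resources are plain strings instead of Database and manager objects.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ToyManager:
    # Configuration a clone should carry over unchanged (hypothetical field).
    schema_version: tuple[int, int, int]
    # Bound resources a clone should swap out (strings stand in for objects).
    db: str
    collections: str
    dimensions: str
    caching_context: str

    def clone(self, *, db, collections, dimensions, caching_context) -> ToyManager:
        # Copy the configuration, rebind everything else; self is untouched.
        return replace(
            self,
            db=db,
            collections=collections,
            dimensions=dimensions,
            caching_context=caching_context,
        )
```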
- conform_exact_dataset_type(dataset_type: DatasetType | str) -> DatasetType
Conform a value that may be a dataset type or dataset type name to the registered DatasetType, while checking that the dataset type is not a component and (if a DatasetType instance is given) has the exact same definition in the registry.
- Parameters:
  - dataset_type (str or DatasetType): Dataset type object or name.
- Returns:
  - dataset_type (DatasetType): The corresponding registered dataset type.
- Raises:
  - DatasetTypeError: Raised if dataset_type is a component, or if its definition does not exactly match the registered dataset type.
  - MissingDatasetTypeError: Raised if this dataset type is not registered at all.
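The three checks described above (not a component, registered, and an exact match when an object is passed) can be sketched against a plain dict. Everything here is hypothetical: ToyDatasetType, conform_exact and the local exception classes are invented, and treating a "." in the name as marking a component ("parent.component") is an assumption of the sketch.

```python
from __future__ import annotations

from dataclasses import dataclass


class DatasetTypeError(Exception):
    """Local stand-in for the registry exception."""


class MissingDatasetTypeError(Exception):
    """Local stand-in for the registry exception."""


@dataclass(frozen=True)
class ToyDatasetType:
    name: str
    dimensions: frozenset
    storage_class: str


def conform_exact(registered: dict[str, ToyDatasetType], value: ToyDatasetType | str) -> ToyDatasetType:
    name = value if isinstance(value, str) else value.name
    # Assumption: component dataset types are named "parent.component".
    if "." in name:
        raise DatasetTypeError(f"{name!r} is a component dataset type")
    try:
        found = registered[name]
    except KeyError:
        raise MissingDatasetTypeError(name) from None
    # An object must match the registered definition exactly; a name need not.
    if not isinstance(value, str) and value != found:
        raise DatasetTypeError(f"definition of {name!r} differs from the registry")
    return found
```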
- abstract decertify(dataset_type: DatasetType, collection: CollectionRecord, timespan: Timespan, *, data_ids: Iterable[DataCoordinate] | None = None, context: SqlQueryContext) -> None
Remove or adjust datasets to clear a validity range within a calibration collection.
- Parameters:
  - dataset_type (DatasetType): Type of all datasets.
  - collection (CollectionRecord): The record object describing the collection. collection.type must be CALIBRATION.
  - timespan (Timespan): The validity range to remove datasets from within the collection. Datasets that overlap this range but are not contained by it will have their validity ranges adjusted to not overlap it, which may split a single dataset validity range into two.
  - data_ids (Iterable[DataCoordinate], optional): Data IDs that should be decertified within the given validity range. If None, all data IDs for dataset_type in collection will be decertified.
  - context (SqlQueryContext): The object that manages database connections, temporary tables and relation engines for this query.
- Raises:
  - DatasetTypeError: Raised if dataset_type.isCalibration() is False.
  - CollectionTypeError: Raised if collection.type is not CollectionType.CALIBRATION.
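The range adjustment described for timespan, including the case where one validity range is split into two, is interval subtraction. A toy sketch, assuming half-open (begin, end) number pairs instead of Timespan objects and a dict standing in for the calibration collection; subtract_span and toy_decertify are hypothetical helpers.

```python
from __future__ import annotations


def subtract_span(span: tuple[float, float], cut: tuple[float, float]) -> list[tuple[float, float]]:
    """Return the pieces of half-open `span` that lie outside half-open `cut`."""
    begin, end = span
    cut_begin, cut_end = cut
    if cut_end <= begin or end <= cut_begin:
        return [span]  # no overlap: the validity range is unchanged
    pieces = []
    if begin < cut_begin:
        pieces.append((begin, cut_begin))  # part before the cleared range
    if cut_end < end:
        pieces.append((cut_end, end))  # part after the cleared range
    return pieces  # empty when `cut` fully contains `span`


def toy_decertify(calib: dict, cut: tuple[float, float], data_ids=None) -> None:
    """calib maps data_id -> list of (dataset_id, span); data_ids=None means all."""
    targets = list(calib) if data_ids is None else list(data_ids)
    for data_id in targets:
        if data_id not in calib:
            continue
        calib[data_id] = [
            (dataset_id, piece)
            for dataset_id, span in calib[data_id]
            for piece in subtract_span(span, cut)
        ]
```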
- abstract delete(datasets: Iterable[UUID | DatasetRef]) -> None
Fully delete the given datasets from the registry.
- Parameters:
  - datasets (Iterable[DatasetId or DatasetRef]): Datasets to be deleted. If DatasetRef instances are passed, only the DatasetRef.id attribute is used.
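Because only DatasetRef.id matters, a mixed iterable can be normalized to bare UUIDs before deletion. A minimal sketch; ToyRef and dataset_ids are hypothetical names, not part of daf_butler.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyRef:
    # Hypothetical stand-in for DatasetRef; only `id` is consulted below.
    id: uuid.UUID
    run: str


def dataset_ids(datasets) -> list[uuid.UUID]:
    # Bare UUIDs pass through; anything else is reduced to its `.id`.
    return [d if isinstance(d, uuid.UUID) else d.id for d in datasets]
```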
- abstract disassociate(dataset_type: DatasetType, collection: CollectionRecord, datasets: Iterable[DatasetRef]) -> None
Remove one or more datasets from a collection.
- Parameters:
  - dataset_type (DatasetType): Type of all datasets.
  - collection (CollectionRecord): The record object describing the collection. collection.type must be TAGGED.
  - datasets (Iterable[DatasetRef]): Datasets to be disassociated. All datasets must have the same DatasetType as dataset_type, but this is not checked.
- abstract fetch_summaries(collections: Iterable[CollectionRecord], dataset_types: Iterable[DatasetType] | Iterable[str] | None = None) -> Mapping[Any, CollectionSummary]
Fetch collection summaries for the given collection records and dataset types.
- Parameters:
  - collections (Iterable[CollectionRecord]): Collection records to query.
  - dataset_types (Iterable[DatasetType], Iterable[str], or None): Dataset types to include in the returned summaries. If None, all dataset types will be included.
- Returns:
  - summaries (Mapping[Any, CollectionSummary]): Collection summaries indexed by collection record key. This mapping will also contain all nested non-chained collections of the chained collections.
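The expansion of chained collections and the dataset type filter can be sketched with plain dicts. This toy is hypothetical throughout: summaries are sets of dataset type names rather than CollectionSummary objects, collections are string keys, and it only returns entries for non-chained collections; whether the real mapping also has an entry under a chained collection's own key is not modeled here.

```python
from __future__ import annotations


def toy_fetch_summaries(collections, chains, contents, dataset_types=None):
    """Hypothetical model.

    chains: maps a CHAINED collection key -> list of child keys.
    contents: maps a non-chained key -> set of dataset type names it holds.
    """
    wanted = None if dataset_types is None else set(dataset_types)
    summaries: dict[str, set[str]] = {}
    stack = list(collections)
    seen: set[str] = set()
    while stack:
        key = stack.pop()
        if key in seen:
            continue
        seen.add(key)
        if key in chains:
            stack.extend(chains[key])  # descend into nested collections
            continue
        types = contents.get(key, set())
        # Restrict to the requested dataset types, if any were given.
        summaries[key] = set(types) if wanted is None else types & wanted
    return summaries
```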
- abstract getCollectionSummary(collection: CollectionRecord) -> CollectionSummary
Return a summary for the given collection.
- Parameters:
  - collection (CollectionRecord): Record describing the collection for which a summary is to be retrieved.
- Returns:
  - summary (CollectionSummary): Summary of the dataset types and governor dimension values in this collection.
- abstract getDatasetRef(id: UUID) -> DatasetRef | None
Return a DatasetRef for the given dataset primary key value, or None if no dataset with that primary key exists.
- abstract get_dataset_type(name: str) -> DatasetType
Look up a dataset type by name.
- Parameters:
  - name (str): Name of a parent dataset type.
- Returns:
  - dataset_type (DatasetType): The object representing the records for the given dataset type.
- Raises:
  - MissingDatasetTypeError: Raised if there is no dataset type with the given name.
- abstract import_(dataset_type: DatasetType, run: RunRecord, data_ids: Mapping[DatasetId, DataCoordinate]) -> list[DatasetRef]
Insert one or more dataset entries into the database.
- Parameters:
- Returns:
  - datasets (list[DatasetRef]): References to the inserted or existing datasets.
- abstract classmethod initialize(db: Database, context: StaticTablesContext, *, collections: CollectionManager, dimensions: DimensionRecordStorageManager, caching_context: CachingContext, registry_schema_version: VersionTuple | None = None) -> DatasetRecordStorageManager
Construct an instance of the manager.
- Parameters:
  - db (Database): Interface to the underlying database engine and namespace.
  - context (StaticTablesContext): Context object obtained from Database.declareStaticTables; used to declare any tables that should always be present.
  - collections (CollectionManager): Manager object for the collections in this Registry.
  - dimensions (DimensionRecordStorageManager): Manager object for the dimensions in this Registry.
  - caching_context (CachingContext): Object controlling caching of information returned by managers.
  - registry_schema_version (VersionTuple or None, optional): Schema version of this extension as defined in registry.
- Returns:
  - manager (DatasetRecordStorageManager): An instance of a concrete DatasetRecordStorageManager subclass.
- abstract insert(dataset_type_name: str, run: RunRecord, data_ids: Iterable[DataCoordinate], id_generation_mode: DatasetIdGenEnum = DatasetIdGenEnum.UNIQUE) -> list[DatasetRef]
Insert one or more dataset entries into the database.
- Parameters:
  - dataset_type_name (str): Name of the dataset type.
  - run (RunRecord): The record object describing the RUN collection these datasets will be associated with.
  - data_ids (Iterable[DataCoordinate]): Expanded data IDs (DataCoordinate instances) for the datasets to be added. The dimensions of all data IDs must be the same as the dimensions of the dataset type named by dataset_type_name.
  - id_generation_mode (DatasetIdGenEnum, optional): With UNIQUE, each new dataset is inserted with a new unique ID. With a non-UNIQUE mode, the ID is computed from some combination of dataset type, data ID, and run collection name; if the same ID is already in the database, the new record is not inserted.
- Returns:
  - datasets (list[DatasetRef]): References to the inserted datasets.
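The difference between random and deterministic ID generation can be sketched with the standard uuid module: uuid4 for UNIQUE, and uuid5 over a key built from the dataset type, data ID, and run for a non-UNIQUE mode. This is an illustration only. TOY_NAMESPACE, the key format, make_id, and toy_insert are hypothetical; the toy always includes the run name, whereas the real modes may use a different combination.

```python
from __future__ import annotations

import uuid

# Hypothetical namespace; the real manager's namespace and key format are not shown here.
TOY_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "example.invalid")


def make_id(dataset_type: str, data_id: dict, run: str, mode: str = "UNIQUE") -> uuid.UUID:
    if mode == "UNIQUE":
        return uuid.uuid4()  # random: every insert gets a fresh ID
    # Any other mode: same (dataset type, data ID, run) always gives the same ID.
    key = ",".join([dataset_type, run] + [f"{k}={v}" for k, v in sorted(data_id.items())])
    return uuid.uuid5(TOY_NAMESPACE, key)


def toy_insert(table: dict, dataset_type: str, run: str, data_ids, mode: str = "UNIQUE") -> list[uuid.UUID]:
    ids = []
    for data_id in data_ids:
        dataset_id = make_id(dataset_type, data_id, run, mode)
        # setdefault keeps an existing row untouched: "the new record is not inserted".
        table.setdefault(dataset_id, (dataset_type, run, tuple(sorted(data_id.items()))))
        ids.append(dataset_id)
    return ids
```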
- abstract make_query_joiner(dataset_type: DatasetType, collections: Sequence[CollectionRecord], fields: Set[str]) -> QueryJoiner
Make a direct_query_driver.QueryJoiner that represents a search for datasets of this type.
- Parameters:
  - dataset_type (DatasetType): Type of dataset to query for.
  - collections (Sequence[CollectionRecord]): Collections to search, in order, after filtering out collections with no datasets of this type via collection summaries.
  - fields (Set[str]): Names of fields to make available in the joiner. Options include:
    - dataset_id (UUID)
    - run (collection name, str)
    - collection (collection name, str)
    - collection_key (collection primary key, manager-dependent)
    - timespan (validity range, or unbounded for non-calibrations)
    - ingest_date (time dataset was ingested into repository)
    Dimension keys for the dataset type's required dimensions are always included.
- Returns:
  - joiner (direct_query_driver.QueryJoiner): A query-construction object representing a table or subquery.
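The column set a joiner exposes is the required dimension keys plus the requested fields. A hypothetical helper, not the QueryJoiner API: joiner_columns is invented, and rejecting names outside the six listed options is a choice of this toy (the documentation says only that the options "include" these).

```python
from __future__ import annotations

# Field names listed in the documentation above.
KNOWN_FIELDS = {"dataset_id", "run", "collection", "collection_key", "timespan", "ingest_date"}


def joiner_columns(required_dimensions: list[str], fields: set[str]) -> list[str]:
    """Hypothetical helper: the columns a joiner would expose."""
    unknown = fields - KNOWN_FIELDS
    if unknown:
        raise ValueError(f"unsupported fields: {sorted(unknown)}")
    # Required dimension keys are always present, even if no fields are requested.
    return list(required_dimensions) + sorted(fields)
```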
- abstract make_relation(dataset_type: DatasetType, *collections: CollectionRecord, columns: Set[str], context: SqlQueryContext) -> Relation
Return a sql.Relation that represents a query for the given DatasetType in one or more collections.
- Parameters:
  - dataset_type (DatasetType): Type of dataset to query for.
  - *collections (CollectionRecord): The record object(s) describing the collection(s) to query. May not be of type CollectionType.CHAINED. If multiple collections are passed, the query will search all of them in an unspecified order, and all collections must have the same type. Must include at least one collection.
  - columns (Set[str]): Columns to include in the relation. See Query.find_datasets for most options, but this method supports one more: rank, a calculated integer column holding the index of the collection the dataset was found in, within the collections sequence given.
  - context (SqlQueryContext): The object that manages database connections, temporary tables and relation engines for this query.
- Returns:
  - relation (Relation): Representation of the query.
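The rank column records which position in the given collections sequence each row came from. One plausible use, shown here as an assumption rather than the manager's behavior, is a find-first selection that keeps the lowest-rank row per data ID. rows_with_rank and find_first are hypothetical helpers over plain dicts.

```python
from __future__ import annotations


def rows_with_rank(collections: list[str], contents: dict[str, dict[str, str]]) -> list[dict]:
    """contents maps collection -> {data_id: dataset_id} (hypothetical toy)."""
    rows = []
    for rank, collection in enumerate(collections):
        for data_id, dataset_id in contents.get(collection, {}).items():
            # rank is the index of the collection within the given sequence.
            rows.append({"data_id": data_id, "dataset_id": dataset_id,
                         "collection": collection, "rank": rank})
    return rows


def find_first(rows: list[dict]) -> dict[str, dict]:
    """Keep, per data ID, the row from the lowest-rank (earliest) collection."""
    best: dict[str, dict] = {}
    for row in rows:
        current = best.get(row["data_id"])
        if current is None or row["rank"] < current["rank"]:
            best[row["data_id"]] = row
    return best
```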
- abstract refresh() -> None
Ensure all other operations on this manager are aware of any dataset types that may have been registered by other clients since it was initialized or last refreshed.
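Why refresh is needed can be shown with a client-side cache over shared state. A hypothetical toy: ToyDatasetTypeCache is invented, and a shared dict stands in for the database that other clients write to.

```python
from __future__ import annotations


class ToyDatasetTypeCache:
    """Hypothetical client-side cache over a shared dict standing in for the database."""

    def __init__(self, shared: dict[str, str]) -> None:
        self._shared = shared
        self._cache: dict[str, str] = {}
        self.refresh()

    def refresh(self) -> None:
        # Re-read everything so types registered by other clients become visible.
        self._cache = dict(self._shared)

    def known(self) -> set[str]:
        # Answers come from the cached snapshot, which may be stale.
        return set(self._cache)
```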
- abstract refresh_collection_summaries(dataset_type: DatasetType) -> None
Make sure that collection summaries for this dataset type are consistent with the contents of the dataset tables.
- Parameters:
  - dataset_type (DatasetType): Dataset type whose summary entries should be refreshed.
- abstract register_dataset_type(dataset_type: DatasetType) -> bool
Ensure that this Registry can hold records for the given DatasetType, creating new tables as necessary.
- Parameters:
  - dataset_type (DatasetType): Dataset type for which a table should be created (as necessary).
- Returns:
Notes
This operation may not be invoked within a Database.transaction context.
- abstract remove_dataset_type(name: str) -> None
Remove the dataset type.
- Parameters:
  - name (str): Name of the dataset type.
- abstract resolve_wildcard(expression: Any, missing: list[str] | None = None, explicit_only: bool = False) -> list[lsst.daf.butler._dataset_type.DatasetType]
Resolve a dataset type wildcard expression.
- Parameters:
  - expression (Any): Expression to resolve. Will be passed to DatasetTypeWildcard.from_expression.
  - missing (list of str, optional): String dataset type names that were explicitly given (i.e. not regular expression patterns) but not found will be appended to this list, if it is provided.
  - explicit_only (bool, optional): If True, require explicit DatasetType instances or str names, with re.Pattern instances deprecated and... prohibited.
- Returns:
  - dataset_types (list[DatasetType]): A list of resolved dataset types.
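The roles of missing and explicit_only can be illustrated over plain names. This hypothetical resolver does not use DatasetTypeWildcard. Raising TypeError for a pattern when explicit_only=True, and silently skipping unknown names when missing is None, are assumptions of the toy, not documented behavior.

```python
from __future__ import annotations

import re


def toy_resolve(registered: list[str], expression, missing=None, explicit_only=False) -> list[str]:
    """Hypothetical resolver over dataset type names."""
    items = expression if isinstance(expression, (list, tuple, set)) else [expression]
    result: list[str] = []
    for item in items:
        if isinstance(item, re.Pattern):
            if explicit_only:
                raise TypeError(f"pattern {item.pattern!r} not allowed with explicit_only=True")
            # A pattern expands to every registered name it fully matches.
            result.extend(name for name in registered if item.fullmatch(name))
        elif item in registered:
            result.append(item)
        elif missing is not None:
            missing.append(item)  # explicit name not found: report it
    # Drop duplicates while keeping first-seen order.
    return list(dict.fromkeys(result))
```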