QueryBackend
- class lsst.daf.butler.registry.queries.QueryBackend
Bases: Generic[_C]
An interface for constructing and evaluating the Relation objects that comprise registry queries.
This ABC is expected to have a concrete subclass for each concrete registry type, and most subclasses will be paired with a QueryContext subclass. See QueryContext for the division of responsibilities between these two interfaces.
Attributes Summary
- universe: Definition of all dimensions and dimension elements for this registry (DimensionUniverse).
Methods Summary
- context(): Return a context manager that can be used to execute queries with this backend.
- extract_dimension_relationships(relation): Extract the dimension key relationships encoded in a relation tree.
- filter_dataset_collections(dataset_types, ...): Filter a sequence of collections to those for which a dataset query might succeed.
- get_collection_name(key): Return the collection name associated with a collection primary key value.
- get_dimension_record_cache(element_name, context): Return a local cache of all DimensionRecord objects for a dimension element, fetching it if necessary.
- make_dataset_query_relation(dataset_type, ...): Construct a relation that represents an unordered query for datasets that returns matching results from all given collections.
- make_dataset_search_relation(dataset_type, ...): Construct a relation that represents an ordered query for datasets that returns results from the first matching collection for each data ID.
- make_dimension_relation(dimensions, columns, ...): Construct a relation that provides columns and constraints from dimension records.
- make_doomed_dataset_relation(dataset_type, ...): Construct a relation that represents a doomed query for datasets.
- resolve_collection_wildcard(expression, *[, ...]): Return the collection records that match a wildcard expression.
- resolve_dataset_collections(dataset_type, ...): Resolve the sequence of collections to query for a dataset type.
- resolve_dataset_type_wildcard(expression[, ...]): Return the dataset types that match a wildcard expression.
- resolve_governor_constraints(dimensions, ...): Resolve governor dimension constraints provided by user input to a query against the content in the Registry.
- resolve_single_dataset_type_wildcard(expression): Return a single dataset type that matches a wildcard expression.
Attributes Documentation
- universe
Definition of all dimensions and dimension elements for this registry (DimensionUniverse).
Methods Documentation
- context() → _C
Return a context manager that can be used to execute queries with this backend.
- Returns:
- context : QueryContext
Context manager that manages state and connections needed to execute queries.
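The class is generic over its context type, and each concrete backend is expected to be paired with a concrete context. The sketch below shows that pairing with `typing.Generic`. `ToyQueryContext`, `ToyQueryBackend`, and `ToySqlBackend` are hypothetical names used for illustration, not daf_butler classes, and the "connection" is just a flag.

```python
import contextlib
from abc import ABC, abstractmethod
from typing import Generic, TypeVar


class ToyQueryContext(contextlib.AbstractContextManager):
    """Hypothetical stand-in for a QueryContext that holds per-query state."""

    def __init__(self) -> None:
        self.is_open = False

    def __enter__(self) -> "ToyQueryContext":
        self.is_open = True  # a real context might acquire a connection here
        return self

    def __exit__(self, *exc_info) -> None:
        self.is_open = False  # release resources; exceptions still propagate


_C = TypeVar("_C", bound=ToyQueryContext)


class ToyQueryBackend(ABC, Generic[_C]):
    """Backend that is generic over the context type it is paired with."""

    @abstractmethod
    def context(self) -> _C:
        raise NotImplementedError


class ToySqlBackend(ToyQueryBackend[ToyQueryContext]):
    """Concrete backend paired with its own concrete context type."""

    def context(self) -> ToyQueryContext:
        return ToyQueryContext()


backend = ToySqlBackend()
with backend.context() as ctx:
    assert ctx.is_open
assert not ctx.is_open
```

Parameterizing the base class by the context type lets type checkers confirm that a backend's methods only accept the context type it actually creates.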
- extract_dimension_relationships(relation: Relation) → set[frozenset[str]]
Extract the dimension key relationships encoded in a relation tree.
- Parameters:
- relation : Relation
Relation tree to process.
- Returns:
Notes
Dimension relationships include both many-to-one implied dependencies and many-to-many joins backed by "always-join" dimension elements, and it's important to join in the dimension table that defines a relationship in any query involving dimensions that are a superset of that relationship. For example, let's consider a relation tree that joins dataset existence-check relations for two dataset types, with dimensions {instrument, exposure, detector} and {instrument, physical_filter}. The joined relation appears to have all dimension keys in its expanded graph present except band, and the system could easily correct this by joining that dimension in directly. But it's also missing the {instrument, exposure, physical_filter} relationship we'd get from the exposure dimension's own relation (exposure implies physical_filter) and the similar {instrument, physical_filter, band} relationship from the physical_filter dimension relation; we need the relationship logic to recognize that those dimensions need to be joined in as well in order for the full relation to have rows that represent valid data IDs.
The implementation of this method relies on the assumption that LeafRelation objects always have rows that are consistent with all defined relationships (i.e. are valid data IDs). This is true not just for dimension relations themselves, but for anything created from queries based on them, including datasets and query results. It is possible to construct LeafRelation objects that don't satisfy these criteria (e.g. when accepting user-provided data IDs), and in this case higher-level guards or warnings must be provided.
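The example in the Notes can be reproduced with a toy set model. This is a sketch under stated assumptions, not the library's algorithm: relationships are frozensets of dimension names (matching this method's return type), the relationship list is a small hypothetical subset of a real universe, and each leaf relation is assumed to encode every relationship that is a subset of its own dimensions.

```python
def relationships_to_join(query_dims, defined, encoded):
    """Return defined relationships that the query spans but no leaf relation encodes.

    Toy assumption: a leaf relation over dimensions D has rows consistent with
    every defined relationship that is a subset of D.
    """
    return {
        rel
        for rel in defined
        if rel <= query_dims and not any(rel <= leaf for leaf in encoded)
    }


# Relationships defined by dimension elements (a small, hypothetical subset).
DEFINED = {
    frozenset({"instrument", "detector"}),
    frozenset({"instrument", "exposure"}),
    frozenset({"instrument", "exposure", "physical_filter"}),  # exposure implies physical_filter
    frozenset({"instrument", "physical_filter", "band"}),  # physical_filter implies band
}
# Dimensions of the two dataset existence-check relations from the Notes.
LEAVES = [
    frozenset({"instrument", "exposure", "detector"}),
    frozenset({"instrument", "physical_filter"}),
]
# The expanded dimension graph adds band, which neither leaf provides.
QUERY_DIMS = frozenset().union(*LEAVES) | {"band"}

needed = relationships_to_join(QUERY_DIMS, DEFINED, LEAVES)
```

Here `needed` holds exactly the exposure and physical_filter relationships, so joining only the band dimension would not be enough to guarantee valid data IDs.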
- abstract filter_dataset_collections(dataset_types: Iterable[DatasetType], collections: Sequence[CollectionRecord], *, governor_constraints: Mapping[str, Set[str]], rejections: list[str] | None = None) → dict[DatasetType, list[CollectionRecord]]
Filter a sequence of collections to those for which a dataset query might succeed.
- Parameters:
- dataset_types : Iterable[DatasetType]
Dataset types that are being queried. Must include only parent or standalone dataset types, not components.
- collections : Sequence[CollectionRecord]
Sequence of collections that will be searched.
- governor_constraints : Mapping[str, Set[str]], optional
Constraints imposed by other aspects of the query on governor dimensions; collections inconsistent with these constraints will be skipped.
- rejections : list[str], optional
If not None, a list that diagnostic messages will be appended to, for each collection that matches collections but is not returned. At least one message is guaranteed whenever the result is empty.
- Returns:
Notes
This method accepts multiple dataset types and multiple collections at once to enable implementations to batch up the fetching of summary information needed to relate them.
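A minimal sketch of the filtering contract, assuming each collection carries a precomputed summary of the dataset types and governor values it contains. `ToyCollection` and `filter_collections` are hypothetical; dataset types are plain strings here, and the real method works with DatasetType and CollectionRecord objects and may batch its summary lookups.

```python
from collections.abc import Iterable, Mapping, Sequence, Set
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyCollection:
    """Hypothetical collection record plus a summary of its contents."""

    name: str
    dataset_types: frozenset
    governors: dict = field(default_factory=dict)  # governor name -> values present


def filter_collections(
    dataset_types: Iterable[str],
    collections: Sequence[ToyCollection],
    *,
    governor_constraints: Mapping[str, Set[str]],
    rejections: Optional[list] = None,
) -> dict:
    result = {}
    for dataset_type in dataset_types:
        kept = []
        for coll in collections:
            reason = None
            if dataset_type not in coll.dataset_types:
                reason = f"No datasets of type {dataset_type!r} in collection {coll.name!r}."
            else:
                for dim, allowed in governor_constraints.items():
                    present = coll.governors.get(dim)
                    # Reject only when the summary knows the values and none are allowed.
                    if present is not None and not (present & set(allowed)):
                        reason = f"No datasets with {dim} in {sorted(allowed)} in {coll.name!r}."
                        break
            if reason is None:
                kept.append(coll)
            elif rejections is not None:
                rejections.append(reason)
        # Keep the "at least one message when empty" guarantee for an empty input.
        if not kept and not collections and rejections is not None:
            rejections.append("No collections to search.")
        result[dataset_type] = kept
    return result
```

Handling all dataset types in one call mirrors the batching rationale in the Notes: the summaries are consulted once per collection rather than once per query.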
- abstract get_collection_name(key: Any) → str
Return the collection name associated with a collection primary key value.
- Parameters:
- key
Collection primary key value.
- Returns:
- name : str
Collection name.
- abstract get_dimension_record_cache(element_name: str, context: _C) → Mapping[DataCoordinate, DimensionRecord] | None
Return a local cache of all DimensionRecord objects for a dimension element, fetching it if necessary.
- Parameters:
- element_name : str
Name of the dimension element.
- context : queries.SqlQueryContext
Context to be used to execute queries when no cached result is available.
- Returns:
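A sketch of the "fetch once, then reuse" behavior, assuming a plain callable in place of running a query through the context. `ToyRecordCaches` is hypothetical, keys are tuples rather than DataCoordinate objects, and returning None for elements that are never cached is an assumption about what the `| None` in the return type means.

```python
from collections.abc import Callable, Iterable, Mapping
from typing import Optional


class ToyRecordCaches:
    """Hypothetical per-backend store of lazily fetched dimension-record caches."""

    def __init__(self, fetch: Callable[[str], Mapping], cached_elements: Iterable[str]):
        self._fetch = fetch  # stands in for executing a query via a QueryContext
        self._cacheable = frozenset(cached_elements)
        self._caches: dict = {}

    def get(self, element_name: str) -> Optional[Mapping]:
        if element_name not in self._cacheable:
            return None  # assumption: elements that are never cached yield None
        if element_name not in self._caches:
            # Only the first request for an element triggers a fetch.
            self._caches[element_name] = self._fetch(element_name)
        return self._caches[element_name]
```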
- abstract make_dataset_query_relation(dataset_type: DatasetType, collections: Sequence[CollectionRecord], columns: Set[str], context: _C) → Relation
Construct a relation that represents an unordered query for datasets that returns matching results from all given collections.
- Parameters:
- dataset_type : DatasetType
Type for the datasets being queried.
- collections : Sequence[CollectionRecord]
Records for collections to query. Should generally be the result of a call to resolve_dataset_collections, and must not be empty.
- context : QueryContext
Context that manages per-query state.
- columns : Set[str]
Columns to include in the relation. See Query.find_datasets for details.
- Returns:
- relation : lsst.daf.relation.Relation
Relation representing a dataset query.
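The "unordered, all collections" semantics can be illustrated with ordinary Python data. In this sketch, the hypothetical `CONTENTS` mapping stands in for collection contents and `query_all` is an illustrative name; the real method returns a lazy Relation rather than a list of rows.

```python
def query_all(collections, contents):
    """Toy unordered dataset query: every match from every collection, no shadowing."""
    if not collections:
        raise ValueError("collections must not be empty")
    return [
        (name, data_id, dataset)
        for name in collections
        for data_id, dataset in contents[name].items()
    ]


# Hypothetical contents: collection name -> {data ID: dataset label}.
CONTENTS = {
    "run/a": {("HSC", 1): "a1"},
    "run/b": {("HSC", 1): "b1", ("HSC", 2): "b2"},
}
rows = query_all(["run/a", "run/b"], CONTENTS)
```

The data ID ("HSC", 1) appears twice in `rows`, once per collection, because nothing is filtered out.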
- make_dataset_search_relation(dataset_type: DatasetType, collections: Sequence[CollectionRecord], columns: Set[str], context: _C, *, join_to: Relation | None = None) → Relation
Construct a relation that represents an ordered query for datasets that returns results from the first matching collection for each data ID.
- Parameters:
- dataset_type : DatasetType
Type for the datasets being searched.
- collections : Sequence[CollectionRecord]
Records for collections to search. Should generally be the result of a call to resolve_dataset_collections, and must not be empty.
- columns : Set[str]
Columns to include in the relation. See make_dataset_query_relation for options.
- context : QueryContext
Context that manages per-query state.
- join_to : Relation, optional
Another relation to join with the query for datasets in all collections before filtering out shadowed datasets.
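For contrast with the unordered query, here is a find-first sketch over the same kind of toy data. `search_first` and `CONTENTS` are hypothetical, and `join_to` is modeled as a simple set of allowed data IDs, applied before shadowed datasets are discarded, as the parameter description states.

```python
from typing import Optional

# Hypothetical contents: collection name -> {data ID: dataset label}.
CONTENTS = {
    "run/a": {("HSC", 1): "a1"},
    "run/b": {("HSC", 1): "b1", ("HSC", 2): "b2"},
}


def search_first(collections, contents, join_to: Optional[set] = None):
    """Toy find-first search: keep the first collection's dataset for each data ID."""
    if not collections:
        raise ValueError("collections must not be empty")
    found = {}
    for name in collections:  # earlier collections shadow later ones
        for data_id, dataset in contents[name].items():
            # Constrain rows by join_to *before* shadowed datasets are dropped.
            if join_to is not None and data_id not in join_to:
                continue
            found.setdefault(data_id, (name, dataset))
    return found
```

With the collection order ["run/a", "run/b"], the dataset "b1" is shadowed by "a1", while "b2" survives because no earlier collection has that data ID.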
- abstract make_dimension_relation(dimensions: DimensionGraph, columns: Set[ColumnTag], context: _C, *, initial_relation: Relation | None = None, initial_join_max_columns: frozenset[lsst.daf.relation._columns._tag.ColumnTag] | None = None, initial_dimension_relationships: Set[frozenset[str]] | None = None, spatial_joins: Iterable[tuple[str, str]] = (), governor_constraints: Mapping[str, Set[str]]) → Relation
Construct a relation that provides columns and constraints from dimension records.
- Parameters:
- dimensions : DimensionGraph
Dimensions to include. The key columns for all dimensions (both required and implied) will be included in the returned relation.
- columns : Set[ColumnTag]
Dimension record columns to include. This set may include key column tags as well, though these may be ignored; the set of key columns to include is determined by the dimensions argument instead.
- context : QueryContext
Context that manages per-query state.
- initial_relation : Relation, optional
Initial relation to join to the dimension relations. If this relation provides record columns, key columns, and relationships between key columns (see initial_dimension_relationships below) that would otherwise have been added by joining in a dimension element's relation, that element's relation might not be joined in at all.
- initial_join_max_columns : frozenset[ColumnTag], optional
Maximum superset of common columns for joins to initial_relation (i.e. columns in the ON expression of SQL JOIN clauses). If provided, this is a subset of the dimension key columns in initial_relation, which are otherwise all considered as potential common columns for joins. Ignored if initial_relation is not provided.
- initial_dimension_relationships : Set[frozenset[str]], optional
A set of sets of dimension names representing relationships between dimensions encoded in the rows of initial_relation. If not provided (and initial_relation is), extract_dimension_relationships will be called on initial_relation.
- spatial_joins : collections.abc.Iterable[tuple[str, str]], optional
Iterable of dimension element name pairs that should be spatially joined.
- governor_constraints : Mapping[str, Set[str]], optional
Constraints on governor dimensions that are provided by other parts of the query that either have been included in initial_relation or are guaranteed to be added in the future. This is a mapping from governor dimension name to sets of values that dimension may take.
- make_doomed_dataset_relation(dataset_type: DatasetType, columns: Set[str], messages: Iterable[str], context: _C) → Relation
Construct a relation that represents a doomed query for datasets.
- Parameters:
- dataset_type : DatasetType
Dataset type being queried.
- columns : AbstractSet[str]
Dataset columns to include (dimension key columns are always included). See make_dataset_query_relation for allowed values.
- messages : Iterable[str]
Diagnostic messages that explain why the query is doomed to yield no rows.
- context : QueryContext
Context that manages per-query state.
- abstract resolve_collection_wildcard(expression: Any, *, collection_types: Set[CollectionType] = frozenset({CollectionType.RUN, CollectionType.TAGGED, CollectionType.CHAINED, CollectionType.CALIBRATION}), done: set[str] | None = None, flatten_chains: bool = True, include_chains: bool | None = None) → list[CollectionRecord]
Return the collection records that match a wildcard expression.
- Parameters:
- expression
Names and/or patterns for collections; will be passed to CollectionWildcard.from_expression.
- collection_types : collections.abc.Set[CollectionType], optional
If provided, only yield collections of these types.
- done : set[str], optional
A set of collection names that should be skipped, updated to include all processed collection names on return.
- flatten_chains : bool, optional
If True (default), recursively yield the child collections of CHAINED collections.
- include_chains : bool, optional
If True, return records for CHAINED collections themselves. The default is the opposite of flatten_chains: either return records for CHAINED collections or their children, but not both.
- Returns:
- records : list[CollectionRecord]
Matching collection records.
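The interaction of flatten_chains, include_chains, collection_types, and done can be sketched with a toy registry. In this sketch, collection types are plain strings rather than CollectionType members, the expression handling is a simplified stand-in for CollectionWildcard.from_expression (exact names or compiled re patterns), and `resolve_collections` is a hypothetical name.

```python
import re
from typing import Optional

ALL_TYPES = frozenset({"RUN", "TAGGED", "CHAINED", "CALIBRATION"})


def resolve_collections(expression, registry, *, collection_types=ALL_TYPES,
                        done: Optional[set] = None, flatten_chains=True,
                        include_chains: Optional[bool] = None):
    """Toy resolver; registry maps name -> (type string, list of child names)."""
    if include_chains is None:
        include_chains = not flatten_chains  # chains or their children, not both
    if done is None:
        done = set()
    items = expression if isinstance(expression, list) else [expression]
    names = []
    for item in items:
        if isinstance(item, re.Pattern):
            names.extend(n for n in registry if item.fullmatch(n))
        else:
            names.append(item)  # exact name
    records = []

    def visit(name):
        if name in done:
            return  # already processed (or skipped by the caller)
        done.add(name)
        ctype, children = registry[name]
        if (ctype != "CHAINED" or include_chains) and ctype in collection_types:
            records.append(name)
        if ctype == "CHAINED" and flatten_chains:
            for child in children:
                visit(child)

    for name in names:
        visit(name)
    return records


REG = {
    "HSC/defaults": ("CHAINED", ["HSC/raw/all", "HSC/calib"]),
    "HSC/raw/all": ("RUN", []),
    "HSC/calib": ("CALIBRATION", []),
}
```

With the defaults, resolving "HSC/defaults" yields its children; with flatten_chains=False, it yields only the chain itself.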
- resolve_dataset_collections(dataset_type: DatasetType, collections: CollectionWildcard, *, governor_constraints: Mapping[str, Set[str]], rejections: list[str] | None = None, collection_types: Set[CollectionType] = frozenset({CollectionType.RUN, CollectionType.TAGGED, CollectionType.CHAINED, CollectionType.CALIBRATION}), allow_calibration_collections: bool = False) → list[CollectionRecord]
Resolve the sequence of collections to query for a dataset type.
- Parameters:
- dataset_type : DatasetType
Dataset type to be queried in the returned collections.
- collections : CollectionWildcard
Expression for the collections to be queried.
- governor_constraints : Mapping[str, Set[str]], optional
Constraints imposed by other aspects of the query on governor dimensions; collections inconsistent with these constraints will be skipped.
- rejections : list[str], optional
If not None, a list that diagnostic messages will be appended to, for each collection that matches collections but is not returned. At least one message is guaranteed whenever the result is empty.
- collection_types : Set[CollectionType], optional
Collection types to consider when resolving the collection expression.
- allow_calibration_collections : bool, optional
If False, skip (with a rejections message) any calibration collections that match collections but are not given explicitly by name, and raise NotImplementedError for any calibration collection that is given explicitly. This is a temporary option that will be removed when the query system can handle temporal joins involving calibration collections.
- Returns:
- records : list[CollectionRecord]
A new list of CollectionRecord instances, for collections that both match collections and may have datasets of the given type.
Notes
This is a higher-level driver for resolve_collection_wildcard and filter_dataset_collections that is mostly concerned with handling queries against CALIBRATION collections that aren't fully supported yet. Once that support improves, this method may be removed.
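The allow_calibration_collections rule can be sketched in isolation, assuming the wildcard has already been resolved into (name, type) pairs and that the caller knows which names were given literally. `skip_or_reject_calibrations` is a hypothetical helper, and the message wording is illustrative rather than the library's.

```python
from typing import Optional


def skip_or_reject_calibrations(matches, explicit_names, *,
                                allow_calibration_collections=False,
                                rejections: Optional[list] = None):
    """Toy version of the CALIBRATION handling described above.

    matches: (name, type) pairs already resolved from the collection expression.
    explicit_names: names given literally rather than matched by a pattern.
    """
    kept = []
    for name, ctype in matches:
        if ctype == "CALIBRATION" and not allow_calibration_collections:
            if name in explicit_names:
                # Explicitly requested but unsupported: fail loudly.
                raise NotImplementedError(
                    f"Queries against calibration collection {name!r} are not supported here."
                )
            if rejections is not None:
                # Matched only via a pattern: skip quietly, but explain why.
                rejections.append(f"Skipping calibration collection {name!r}.")
            continue
        kept.append(name)
    return kept
```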
- abstract resolve_dataset_type_wildcard(expression: Any, components: bool | None = None, missing: list[str] | None = None, explicit_only: bool = False, components_deprecated: bool = True) → dict[lsst.daf.butler.core.datasets.type.DatasetType, list[str | None]]
Return the dataset types that match a wildcard expression.
- Parameters:
- expression
Names and/or patterns for dataset types; will be passed to DatasetTypeWildcard.from_expression.
- components : bool, optional
If True, apply all expression patterns to component dataset type names as well. If False, never apply patterns to components. If None (default), apply patterns to components only if their parent datasets were not matched by the expression. Fully-specified component datasets (str or DatasetType instances) are always included.
- missing : list of str, optional
String dataset type names that were explicitly given (i.e. not regular expression patterns) but not found will be appended to this list, if it is provided.
- explicit_only : bool, optional
If True, require explicit DatasetType instances or str names, with re.Pattern instances deprecated and ... (Ellipsis) prohibited.
- components_deprecated : bool, optional
If True, this is a context in which component dataset support is deprecated. This will result in a deprecation warning when components=True or components=None and a component dataset is matched. In the future this will become an error.
- Returns:
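The three-way `components` rule is the subtle part of this method. The sketch below models it with a toy registry mapping parent names to component names. `resolve_dataset_types` is hypothetical; it omits explicit_only and deprecation warnings, and it assumes that a None entry in each value list stands for the parent dataset type itself, consistent with the list[str | None] return annotation.

```python
import re
from typing import Optional


def resolve_dataset_types(expression, registry, components: Optional[bool] = None,
                          missing: Optional[list] = None):
    """Toy resolver; registry maps parent name -> list of component names.

    Returns parent name -> matched component names, with None standing for
    the parent itself (an assumption about the real convention).
    """
    items = expression if isinstance(expression, list) else [expression]
    result: dict = {}

    def add(parent, component):
        entries = result.setdefault(parent, [])
        if component not in entries:
            entries.append(component)

    patterns = [item for item in items if isinstance(item, re.Pattern)]
    for item in items:
        if isinstance(item, str):
            parent, _, component = item.partition(".")
            if parent in registry and (not component or component in registry[parent]):
                add(parent, component or None)  # explicit names are always included
            elif missing is not None:
                missing.append(item)
    for parent in registry:
        if any(p.fullmatch(parent) for p in patterns):
            add(parent, None)
    for parent, comps in registry.items():
        parent_matched = None in result.get(parent, [])
        # components=None: only try components whose parent was not matched.
        if components or (components is None and not parent_matched):
            for component in comps:
                if any(p.fullmatch(f"{parent}.{component}") for p in patterns):
                    add(parent, component)
    return result


REG = {"calexp": ["wcs", "psf"], "raw": []}
```

For example, the pattern `calexp.*` matches only the parent under the default, but also both components when components=True.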
- abstract resolve_governor_constraints(dimensions: DimensionGraph, constraints: Mapping[str, Set[str]], context: _C) → Mapping[str, Set[str]]
Resolve governor dimension constraints provided by user input to a query against the content in the Registry.
- Parameters:
- dimensions : DimensionGraph
Dimensions that bound the governor dimensions to consider (via dimensions.governors, more specifically).
- constraints : Mapping[str, Set[str]]
Constraints from user input to the query (e.g. from data IDs and string expression predicates).
- context : QueryContext
Object that manages state for the query; used here to fetch the governor dimension record cache if it has not already been loaded.
- Returns:
- Raises:
- DataIdValueError
Raised if constraints includes governor dimension values that are not present in the Registry.
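A sketch of the validation step, assuming the registry's governor values are already available as plain sets (the real method obtains them through the context's record cache). `ToyDataIdValueError` is a hypothetical stand-in for DataIdValueError, and filling unconstrained governors with every known value is an assumption, not documented behavior.

```python
from collections.abc import Iterable, Mapping, Set


class ToyDataIdValueError(Exception):
    """Hypothetical stand-in for DataIdValueError."""


def resolve_governors(governors: Iterable[str], constraints: Mapping[str, Set[str]],
                      known_values: Mapping[str, Set[str]]) -> dict:
    resolved = {}
    for name in governors:  # only governors of the query's dimensions are considered
        if name in constraints:
            unknown = set(constraints[name]) - set(known_values[name])
            if unknown:
                raise ToyDataIdValueError(
                    f"Unknown values {sorted(unknown)} for governor dimension {name!r}."
                )
            resolved[name] = set(constraints[name])
        else:
            # Assumption: an unconstrained governor may take any registered value.
            resolved[name] = set(known_values[name])
    return resolved


KNOWN = {"instrument": {"HSC", "LSSTCam"}, "skymap": {"hsc_rings_v1"}}
```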
- resolve_single_dataset_type_wildcard(expression: Any, components: bool | None = None, explicit_only: bool = False, components_deprecated: bool = True) → tuple[lsst.daf.butler.core.datasets.type.DatasetType, list[str | None]]
Return a single dataset type that matches a wildcard expression.
- Parameters:
- expression
Names and/or patterns for the dataset type; will be passed to DatasetTypeWildcard.from_expression.
- components : bool, optional
If True, apply all expression patterns to component dataset type names as well. If False, never apply patterns to components. If None (default), apply patterns to components only if their parent datasets were not matched by the expression. Fully-specified component datasets (str or DatasetType instances) are always included.
- explicit_only : bool, optional
If True, require explicit DatasetType instances or str names, with re.Pattern instances deprecated and ... (Ellipsis) prohibited.
- components_deprecated : bool, optional
If True, this is a context in which component dataset support is deprecated. This will result in a deprecation warning when components=True or components=None and a component dataset is matched. In the future this will become an error.
- Returns:
Notes
This method really finds a single parent dataset type and any number of components, because it's only the parent dataset type that's known to the registry at all; many callers are expected to discard the single_components return value.