QueryBackend

class lsst.daf.butler.registry.queries.QueryBackend

Bases: Generic[_C]

An interface for constructing and evaluating the Relation objects that comprise registry queries.

This ABC is expected to have a concrete subclass for each concrete registry type, and most subclasses will be paired with a QueryContext subclass. See QueryContext for the division of responsibilities between these two interfaces.
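As a rough illustration of how `Generic[_C]` ties each backend to its own context type, the following sketch uses toy classes. `ToyQueryContext`, `ToyQueryBackend`, and `InMemoryBackend` are hypothetical names, not part of `lsst.daf.butler`, and a real QueryContext manages far more state than a single flag.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from contextlib import AbstractContextManager
from typing import Generic, TypeVar


class ToyQueryContext(AbstractContextManager):
    """Hypothetical stand-in for a QueryContext that owns per-query state."""

    def __init__(self) -> None:
        self.is_open = False

    def __enter__(self) -> ToyQueryContext:
        self.is_open = True  # a real context might open a connection here
        return self

    def __exit__(self, *exc_info: object) -> None:
        self.is_open = False  # ...and release it here


_C = TypeVar("_C", bound=ToyQueryContext)


class ToyQueryBackend(ABC, Generic[_C]):
    """Hypothetical analogue of the QueryBackend/QueryContext pairing."""

    @abstractmethod
    def context(self) -> _C:
        """Return a new context manager for executing queries."""
        raise NotImplementedError


class InMemoryBackend(ToyQueryBackend[ToyQueryContext]):
    """One concrete backend, paired with one concrete context type."""

    def context(self) -> ToyQueryContext:
        return ToyQueryContext()


backend = InMemoryBackend()
with backend.context() as ctx:
    assert ctx.is_open  # state is live only inside the with block
assert not ctx.is_open
```

Parameterizing the base class on the context type lets a type checker confirm that each concrete backend's methods receive the context type it was paired with.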

Attributes Summary

universe

Definition of all dimensions and dimension elements for this registry (DimensionUniverse).

Methods Summary

context()

Return a context manager that can be used to execute queries with this backend.

extract_dimension_relationships(relation)

Extract the dimension key relationships encoded in a relation tree.

filter_dataset_collections(dataset_types, ...)

Filter a sequence of collections to those for which a dataset query might succeed.

get_collection_name(key)

Return the collection name associated with a collection primary key value.

get_dimension_record_cache(element_name, context)

Return a local cache of all DimensionRecord objects for a dimension element, fetching it if necessary.

make_dataset_query_relation(dataset_type, ...)

Construct a relation that represents an unordered query for datasets that returns matching results from all given collections.

make_dataset_search_relation(dataset_type, ...)

Construct a relation that represents an ordered query for datasets that returns results from the first matching collection for each data ID.

make_dimension_relation(dimensions, columns, ...)

Construct a relation that provides columns and constraints from dimension records.

make_doomed_dataset_relation(dataset_type, ...)

Construct a relation that represents a doomed query for datasets.

resolve_collection_wildcard(expression, *[, ...])

Return the collection records that match a wildcard expression.

resolve_dataset_collections(dataset_type, ...)

Resolve the sequence of collections to query for a dataset type.

resolve_dataset_type_wildcard(expression[, ...])

Return the dataset types that match a wildcard expression.

resolve_governor_constraints(dimensions, ...)

Resolve governor dimension constraints provided by user input to a query against the content in the Registry.

resolve_single_dataset_type_wildcard(expression)

Return a single dataset type that matches a wildcard expression.

Attributes Documentation

universe

Definition of all dimensions and dimension elements for this registry (DimensionUniverse).

Methods Documentation

context() → _C

Return a context manager that can be used to execute queries with this backend.

Returns:
context : QueryContext

Context manager that manages state and connections needed to execute queries.

extract_dimension_relationships(relation: Relation) → set[frozenset[str]]

Extract the dimension key relationships encoded in a relation tree.

Parameters:
relation : Relation

Relation tree to process.

Returns:
relationships : set[frozenset[str]]

Set of sets of dimension names, where each inner set represents a relationship between dimensions.

Notes

Dimension relationships include both many-to-one implied dependencies and many-to-many joins backed by “always-join” dimension elements, and it’s important to join in the dimension table that defines a relationship in any query involving dimensions that are a superset of that relationship. For example, let’s consider a relation tree that joins dataset existence-check relations for two dataset types, with dimensions {instrument, exposure, detector} and {instrument, physical_filter}. The joined relation appears to have all dimension keys in its expanded graph present except band, and the system could easily correct this by joining that dimension in directly. But it’s also missing the {instrument, exposure, physical_filter} relationship we’d get from the exposure dimension’s own relation (exposure implies physical_filter) and the similar {instrument, physical_filter, band} relationship from the physical_filter dimension relation; we need the relationship logic to recognize that those dimensions need to be joined in as well in order for the full relation to have rows that represent valid data IDs.

The implementation of this method relies on the assumption that LeafRelation objects always have rows that are consistent with all defined relationships (i.e. are valid data IDs). This is true not just for dimension relations themselves, but for anything created from queries based on them, including datasets and query results. It is possible to construct LeafRelation objects that don't satisfy this criterion (e.g. when accepting user-provided data IDs), and in that case higher-level guards or warnings must be provided.
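The completeness check described in these notes can be modeled with plain frozensets. The sketch assumes a toy universe in `ELEMENT_RELATIONSHIPS` (only the five elements from the example) and treats an element's relationship as already present when it is a subset of some relationship carried by the relation tree. `missing_elements` is a hypothetical helper; this is not the daf_butler implementation.

```python
from __future__ import annotations

# Hypothetical toy universe: each dimension element maps to the dimension
# names whose relationship its own relation defines.
ELEMENT_RELATIONSHIPS: dict[str, frozenset[str]] = {
    "instrument": frozenset({"instrument"}),
    "detector": frozenset({"instrument", "detector"}),
    "band": frozenset({"band"}),
    "physical_filter": frozenset({"instrument", "physical_filter", "band"}),
    "exposure": frozenset({"instrument", "exposure", "physical_filter"}),
}


def missing_elements(relationships: set[frozenset[str]], dimensions: set[str]) -> list[str]:
    """Return the elements whose relations would still need to be joined in.

    Assumption: an element's relationship counts as present if it is a
    subset of some relationship already carried by the relation tree.
    """
    missing = []
    for element, rel in ELEMENT_RELATIONSHIPS.items():
        if not rel <= dimensions:
            continue  # element is not relevant to this query's dimensions
        if not any(rel <= existing for existing in relationships):
            missing.append(element)
    return sorted(missing)


# The example from the Notes: two dataset existence checks joined together.
joined = {
    frozenset({"instrument", "exposure", "detector"}),
    frozenset({"instrument", "physical_filter"}),
}
all_dims = {"instrument", "exposure", "detector", "physical_filter", "band"}
print(missing_elements(joined, all_dims))
# → ['band', 'exposure', 'physical_filter']
```

Note that `band` is not the only gap: the `exposure` and `physical_filter` relations must also be joined, which matches the reasoning in the example above.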

abstract filter_dataset_collections(dataset_types: Iterable[DatasetType], collections: Sequence[CollectionRecord], *, governor_constraints: Mapping[str, Set[str]], rejections: list[str] | None = None) → dict[DatasetType, list[CollectionRecord]]

Filter a sequence of collections to those for which a dataset query might succeed.

Parameters:
dataset_types : Iterable[DatasetType]

Dataset types that are being queried. Must include only parent or standalone dataset types, not components.

collections : Sequence[CollectionRecord]

Sequence of collections that will be searched.

governor_constraints : Mapping[str, Set[str]]

Constraints imposed by other aspects of the query on governor dimensions; collections inconsistent with these constraints will be skipped.

rejections : list[str], optional

If not None, a list to which diagnostic messages will be appended for any collection that matches collections but is not returned. At least one message is guaranteed whenever the result is empty.

Returns:
dataset_collections : dict[DatasetType, list[CollectionRecord]]

The collections to search for each dataset type. The dictionary's keys are always exactly dataset_types (in the same order), and each nested list of collections is ordered consistently with the given collections.

Notes

This method accepts multiple dataset types and multiple collections at once to enable implementations to batch up the fetching of summary information needed to relate them.
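A simplified sketch of the documented filtering contract follows: output keys in input order, nested lists in collection order, governor-inconsistent collections skipped, and diagnostics appended to rejections. Dataset types and collections are plain strings here, and `ToySummary` and `filter_collections` are hypothetical stand-ins for the per-collection summary information a real implementation would fetch in a batch.

```python
from __future__ import annotations

from collections.abc import Iterable, Mapping, Sequence, Set
from dataclasses import dataclass, field


@dataclass
class ToySummary:
    """Hypothetical collection summary: dataset types and governor values
    that a collection may contain."""

    dataset_types: frozenset[str]
    governors: dict[str, frozenset[str]] = field(default_factory=dict)


def filter_collections(
    dataset_types: Iterable[str],
    collections: Sequence[str],
    summaries: Mapping[str, ToySummary],
    *,
    governor_constraints: Mapping[str, Set[str]],
    rejections: list[str] | None = None,
) -> dict[str, list[str]]:
    result: dict[str, list[str]] = {}
    for dataset_type in dataset_types:  # keys keep the input order
        kept: list[str] = []
        for name in collections:  # nested lists keep the collection order
            summary = summaries[name]
            if dataset_type not in summary.dataset_types:
                reason = f"No {dataset_type!r} datasets in collection {name!r}."
            else:
                conflicts = [
                    dim
                    for dim, allowed in governor_constraints.items()
                    if dim in summary.governors and not summary.governors[dim] & allowed
                ]
                if not conflicts:
                    kept.append(name)
                    continue
                reason = f"Collection {name!r} is inconsistent with constraints on {conflicts}."
            if rejections is not None:
                rejections.append(reason)
        if not collections and rejections is not None:
            rejections.append(f"No collections to search for {dataset_type!r}.")
        result[dataset_type] = kept
    return result


summaries = {
    "hsc_raw": ToySummary(frozenset({"raw"}), {"instrument": frozenset({"HSC"})}),
    "lsst_raw": ToySummary(frozenset({"raw"}), {"instrument": frozenset({"LSSTCam"})}),
    "hsc_calib": ToySummary(frozenset({"bias"}), {"instrument": frozenset({"HSC"})}),
}
messages: list[str] = []
found = filter_collections(
    ["raw", "bias"],
    ["hsc_raw", "lsst_raw", "hsc_calib"],
    summaries,
    governor_constraints={"instrument": {"HSC"}},
    rejections=messages,
)
print(found)  # → {'raw': ['hsc_raw'], 'bias': ['hsc_calib']}
```

Handling all dataset types in one call is what makes batching possible: a real backend could load every summary it needs with a single query before running loops like these.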

abstract get_collection_name(key: Any) → str

Return the collection name associated with a collection primary key value.

Parameters:
key : Any

Collection primary key value.

Returns:
name : str

Collection name.

abstract get_dimension_record_cache(element_name: str, context: _C) → Mapping[DataCoordinate, DimensionRecord] | None

Return a local cache of all DimensionRecord objects for a dimension element, fetching it if necessary.

Parameters:
element_name : str

Name of the dimension element.

context : queries.SqlQueryContext

Context to be used to execute queries when no cached result is available.

Returns:
cache : Mapping[DataCoordinate, DimensionRecord] or None

Mapping from data ID to dimension record, or None if this element’s records are never cached.
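The "fetch if necessary, or None if never cached" behavior can be sketched as a lazily filled dictionary. This is a hypothetical model: data IDs are tuples, records are dicts, the set of cacheable elements is a constructor argument, and `fetch` stands in for a query run through the given context.

```python
from __future__ import annotations

from collections.abc import Callable, Mapping
from typing import Any

DataId = tuple[str, ...]  # hypothetical stand-in for DataCoordinate
Record = dict[str, Any]  # hypothetical stand-in for DimensionRecord


class ToyRecordCacheBackend:
    """Hypothetical backend that lazily caches all records of some elements."""

    def __init__(
        self,
        cached_elements: set[str],
        fetch: Callable[[str, object], dict[DataId, Record]],
    ) -> None:
        self._cached_elements = cached_elements
        self._fetch = fetch  # runs a query using the given context
        self._caches: dict[str, dict[DataId, Record]] = {}

    def get_dimension_record_cache(
        self, element_name: str, context: object
    ) -> Mapping[DataId, Record] | None:
        if element_name not in self._cached_elements:
            return None  # this element's records are never cached
        if element_name not in self._caches:
            # Only the first call needs the context to run a query.
            self._caches[element_name] = self._fetch(element_name, context)
        return self._caches[element_name]


fetch_calls: list[str] = []


def fake_fetch(element_name: str, context: object) -> dict[DataId, Record]:
    fetch_calls.append(element_name)
    return {("HSC",): {"name": "HSC"}}


backend = ToyRecordCacheBackend({"instrument"}, fake_fetch)
first = backend.get_dimension_record_cache("instrument", context=None)
second = backend.get_dimension_record_cache("instrument", context=None)
assert first is second and fetch_calls == ["instrument"]  # fetched only once
assert backend.get_dimension_record_cache("visit", context=None) is None
```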

make_dataset_query_relation(dataset_type: DatasetType, collections: Sequence[CollectionRecord], columns: Set[str], context: _C, *, join_to: Relation | None = None, temporal_join_on: Set[ColumnTag] = frozenset({})) → Relation

Construct a relation that represents an unordered query for datasets that returns matching results from all given collections.

Parameters:
dataset_type : DatasetType

Type for the datasets being queried.

collections : Sequence[CollectionRecord]

Records for collections to query. Should generally be the result of a call to resolve_dataset_collections, and must not be empty.

columns : Set[str]

Columns to include in the relation. See Query.find_datasets for details.

context : QueryContext

Context that manages per-query state.

join_to : Relation, optional

Another relation to join with the query for datasets in all collections.

temporal_join_on : Set[ColumnTag], optional

Timespan columns in join_to that calibration dataset timespans must overlap. Must already be present in join_to. Ignored if join_to is None or if there are no calibration collections.

Returns:
relation : lsst.daf.relation.Relation

Relation representing a dataset query.

make_dataset_search_relation(dataset_type: DatasetType, collections: Sequence[CollectionRecord], columns: Set[str], context: _C, *, join_to: Relation | None = None, temporal_join_on: Set[ColumnTag] = frozenset({})) → Relation

Construct a relation that represents an ordered query for datasets that returns results from the first matching collection for each data ID.

Parameters:
dataset_type : DatasetType

Type for the datasets being searched.

collections : Sequence[CollectionRecord]

Records for collections to search. Should generally be the result of a call to resolve_dataset_collections, and must not be empty.

columns : Set[str]

Columns to include in the relation. See make_dataset_query_relation for options.

context : QueryContext

Context that manages per-query state.

join_to : Relation, optional

Another relation to join with the query for datasets in all collections before filtering out shadowed datasets.

temporal_join_on : Set[ColumnTag], optional

Timespan columns in join_to that calibration dataset timespans must overlap. Must already be present in join_to. Ignored if join_to is None or if there are no calibration collections.

Returns:
relation : lsst.daf.relation.Relation

Relation representing a find-first dataset search.
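The contrast between make_dataset_query_relation (every match from every collection) and make_dataset_search_relation (first match per data ID) can be sketched in plain Python. This is a hypothetical model: data IDs are tuples, datasets are strings, and `query_all` and `find_first` are illustrative names. The real methods build Relation trees and also handle join_to and calibration timespans, which are omitted here.

```python
from __future__ import annotations

from collections.abc import Mapping, Sequence

DataId = tuple[str, ...]  # hypothetical stand-in for DataCoordinate
Contents = Mapping[str, Mapping[DataId, str]]  # collection -> data ID -> dataset


def query_all(collections: Sequence[str], contents: Contents) -> list[tuple[DataId, str, str]]:
    """Unordered query: every matching dataset from every collection."""
    return [
        (data_id, collection, dataset)
        for collection in collections
        for data_id, dataset in contents[collection].items()
    ]


def find_first(collections: Sequence[str], contents: Contents) -> dict[DataId, tuple[str, str]]:
    """Find-first search: for each data ID keep the match from the earliest
    collection in search order; matches in later collections are shadowed."""
    result: dict[DataId, tuple[str, str]] = {}
    for collection in collections:
        for data_id, dataset in contents[collection].items():
            result.setdefault(data_id, (collection, dataset))
    return result


contents = {
    "run_new": {("HSC", "903334"): "calexp#2"},
    "run_old": {("HSC", "903334"): "calexp#1", ("HSC", "903336"): "calexp#3"},
}
order = ["run_new", "run_old"]
print(len(query_all(order, contents)))  # → 3
print(find_first(order, contents))
# → {('HSC', '903334'): ('run_new', 'calexp#2'), ('HSC', '903336'): ('run_old', 'calexp#3')}
```

Because only the search depends on collection order, reversing `order` changes which dataset `find_first` returns for `('HSC', '903334')` but leaves the `query_all` result set unchanged.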

abstract make_dimension_relation(dimensions: DimensionGroup, columns: Set[ColumnTag], context: _C, *, initial_relation: Relation | None = None, initial_join_max_columns: frozenset[lsst.daf.relation._columns._tag.ColumnTag] | None = None, initial_dimension_relationships: Set[frozenset[str]] | None = None, spatial_joins: Iterable[tuple[str, str]] = (), governor_constraints: Mapping[str, Set[str]]) → Relation

Construct a relation that provides columns and constraints from dimension records.

Parameters:
dimensions : DimensionGroup

Dimensions to include. The key columns for all dimensions (both required and implied) will be included in the returned relation.

columns : Set[ColumnTag]

Dimension record columns to include. This set may include key column tags as well, though these may be ignored; the set of key columns to include is determined by the dimensions argument instead.

context : QueryContext

Context that manages per-query state.

initial_relation : Relation, optional

Initial relation to join to the dimension relations. If this relation provides record columns, key columns, and relationships between key columns (see initial_dimension_relationships below) that would otherwise have been added by joining in a dimension element's relation, that element's relation may not be joined in at all.

initial_join_max_columns : frozenset[ColumnTag], optional

Maximum superset of common columns for joins to initial_relation (i.e. columns in the ON expression of SQL JOIN clauses). If provided, this is a subset of the dimension key columns in initial_relation, which are otherwise all considered as potential common columns for joins. Ignored if initial_relation is not provided.

initial_dimension_relationships : Set[frozenset[str]], optional

A set of sets of dimension names representing relationships between dimensions encoded in the rows of initial_relation. If not provided (and initial_relation is), extract_dimension_relationships will be called on initial_relation.

spatial_joins : collections.abc.Iterable[tuple[str, str]], optional

Iterable of dimension element name pairs that should be spatially joined.

governor_constraints : Mapping[str, Set[str]]

Constraints on governor dimensions that are provided by other parts of the query that either have been included in initial_relation or are guaranteed to be added in the future. This is a mapping from governor dimension name to sets of values that dimension may take.

Returns:
relation : lsst.daf.relation.Relation

Relation containing the given dimension columns and constraints.

make_doomed_dataset_relation(dataset_type: DatasetType, columns: Set[str], messages: Iterable[str], context: _C) → Relation

Construct a relation that represents a doomed query for datasets.

Parameters:
dataset_type : DatasetType

Dataset type being queried.

columns : Set[str]

Dataset columns to include (dimension key columns are always included). See make_dataset_query_relation for allowed values.

messages : Iterable[str]

Diagnostic messages that explain why the query is doomed to yield no rows.

context : QueryContext

Context that manages per-query state.

Returns:
relation : lsst.daf.relation.Relation

Relation with the requested columns and no rows.

abstract resolve_collection_wildcard(expression: Any, *, collection_types: Set[CollectionType] = frozenset({CollectionType.RUN, CollectionType.TAGGED, CollectionType.CHAINED, CollectionType.CALIBRATION}), done: set[str] | None = None, flatten_chains: bool = True, include_chains: bool | None = None) → list[CollectionRecord]

Return the collection records that match a wildcard expression.

Parameters:
expression : Any

Names and/or patterns for collections; will be passed to CollectionWildcard.from_expression.

collection_types : collections.abc.Set[CollectionType], optional

If provided, only yield collections of these types.

done : set[str], optional

A set of collection names that should be skipped, updated to include all processed collection names on return.

flatten_chains : bool, optional

If True (default), recursively yield the child collections of CHAINED collections.

include_chains : bool, optional

If True, return records for CHAINED collections themselves. The default is the opposite of flatten_chains: either return records for CHAINED collections or their children, but not both.

Returns:
records : list[CollectionRecord]

Matching collection records.
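The interaction of done, flatten_chains, and include_chains can be sketched over a toy registry. This sketch handles only explicit names (no patterns), uses strings instead of CollectionType members, and assumes that chains are still flattened when CHAINED is excluded from collection_types. `COLLECTIONS` and `resolve` are hypothetical names.

```python
from __future__ import annotations

from collections.abc import Iterable, Mapping

# Hypothetical registry content: collection name -> (type, children).
COLLECTIONS: Mapping[str, tuple[str, tuple[str, ...]]] = {
    "HSC/defaults": ("CHAINED", ("HSC/raw/all", "HSC/calib")),
    "HSC/raw/all": ("RUN", ()),
    "HSC/calib": ("CALIBRATION", ()),
}
ALL_TYPES = frozenset({"RUN", "TAGGED", "CHAINED", "CALIBRATION"})


def resolve(
    names: Iterable[str],
    *,
    collection_types: frozenset[str] = ALL_TYPES,
    done: set[str] | None = None,
    flatten_chains: bool = True,
    include_chains: bool | None = None,
) -> list[str]:
    if done is None:
        done = set()
    if include_chains is None:
        include_chains = not flatten_chains  # chains or their children, not both
    result: list[str] = []
    for name in names:
        if name in done:
            continue  # already processed (or explicitly skipped by the caller)
        done.add(name)
        kind, children = COLLECTIONS[name]
        if kind == "CHAINED":
            if include_chains and kind in collection_types:
                result.append(name)
            if flatten_chains:  # recurse, sharing the same done set
                result.extend(
                    resolve(
                        children,
                        collection_types=collection_types,
                        done=done,
                        flatten_chains=flatten_chains,
                        include_chains=include_chains,
                    )
                )
        elif kind in collection_types:
            result.append(name)
    return result


print(resolve(["HSC/defaults"]))  # → ['HSC/raw/all', 'HSC/calib']
print(resolve(["HSC/defaults"], flatten_chains=False))  # → ['HSC/defaults']
print(resolve(["HSC/defaults"], collection_types=frozenset({"RUN"})))  # → ['HSC/raw/all']
```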

resolve_dataset_collections(dataset_type: DatasetType, collections: CollectionWildcard, *, governor_constraints: Mapping[str, Set[str]], rejections: list[str] | None = None, collection_types: Set[CollectionType] = frozenset({CollectionType.RUN, CollectionType.TAGGED, CollectionType.CHAINED, CollectionType.CALIBRATION}), allow_calibration_collections: bool = False) → list[CollectionRecord]

Resolve the sequence of collections to query for a dataset type.

Parameters:
dataset_type : DatasetType

Dataset type to be queried in the returned collections.

collections : CollectionWildcard

Expression for the collections to be queried.

governor_constraints : Mapping[str, Set[str]]

Constraints imposed by other aspects of the query on governor dimensions; collections inconsistent with these constraints will be skipped.

rejections : list[str], optional

If not None, a list to which diagnostic messages will be appended for any collection that matches collections but is not returned. At least one message is guaranteed whenever the result is empty.

collection_types : Set[CollectionType], optional

Collection types to consider when resolving the collection expression.

allow_calibration_collections : bool, optional

If False, skip (with a rejections message) any calibration collections that match collections but are not given explicitly by name, and raise NotImplementedError for any calibration collection that is given explicitly. This is a temporary option that will be removed when the query system can handle temporal joins involving calibration collections.

Returns:
records : list[CollectionRecord]

A new list of CollectionRecord instances, for collections that both match collections and may have datasets of the given type.

Notes

This is a higher-level driver for resolve_collection_wildcard and filter_dataset_collections that is mostly concerned with handling queries against CALIBRATION collections that aren’t fully supported yet. Once that support improves, this method may be removed.
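The documented allow_calibration_collections=False rule can be isolated in a small sketch. It operates on hypothetical (name, type) string pairs that have already been matched by the collection expression, and `explicit_names` is an assumed input holding the names the caller gave verbatim. `apply_calibration_rule` is an illustrative helper, not a daf_butler function.

```python
from __future__ import annotations

from collections.abc import Sequence, Set


def apply_calibration_rule(
    matched: Sequence[tuple[str, str]],
    explicit_names: Set[str],
    *,
    allow_calibration_collections: bool = False,
    rejections: list[str] | None = None,
) -> list[tuple[str, str]]:
    """Filter (name, collection type) pairs matched by a collection expression."""
    if allow_calibration_collections:
        return list(matched)
    kept: list[tuple[str, str]] = []
    for name, collection_type in matched:
        if collection_type != "CALIBRATION":
            kept.append((name, collection_type))
        elif name in explicit_names:
            # Named explicitly: refuse rather than silently skipping.
            raise NotImplementedError(f"Querying calibration collection {name!r} is not supported.")
        elif rejections is not None:
            # Matched only via a pattern or chain: skip with a diagnostic.
            rejections.append(f"Skipping calibration collection {name!r}.")
    return kept


matched = [("HSC/raw/all", "RUN"), ("HSC/calib", "CALIBRATION")]
messages: list[str] = []
print(apply_calibration_rule(matched, explicit_names=set(), rejections=messages))
# → [('HSC/raw/all', 'RUN')]
```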

abstract resolve_dataset_type_wildcard(expression: Any, components: bool | None = None, missing: list[str] | None = None, explicit_only: bool = False, components_deprecated: bool = True) → dict[lsst.daf.butler._dataset_type.DatasetType, list[str | None]]

Return the dataset types that match a wildcard expression.

Parameters:
expressionAny

Names and/or patterns for dataset types; will be passed to DatasetTypeWildcard.from_expression.

components : bool, optional

If True, apply all expression patterns to component dataset type names as well. If False, never apply patterns to components. If None (default), apply patterns to components only if their parent datasets were not matched by the expression. Fully-specified component datasets (str or DatasetType instances) are always included.

missing : list of str, optional

String dataset type names that were explicitly given (i.e. not regular expression patterns) but not found will be appended to this list, if it is provided.

explicit_only : bool, optional

If True, require explicit DatasetType instances or str names, with re.Pattern instances deprecated and ... (Ellipsis) prohibited.

components_deprecated : bool, optional

If True, this is a context in which component dataset support is deprecated. This will result in a deprecation warning when components=True or components=None and a component dataset is matched. In the future this will become an error.

Returns:
dataset_types : dict[DatasetType, list[str | None]]

A mapping with resolved dataset types as keys and lists of matched component names as values, where None indicates the parent composite dataset type was matched.
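The three components modes and the shape of the returned mapping can be modeled with a toy registry. This sketch is not DatasetTypeWildcard: names and patterns are passed separately, `REGISTERED` and `resolve_dataset_types` are hypothetical, "parent.component" naming is assumed, and deprecation warnings and explicit_only are omitted. With components=None, "the parent was matched" is interpreted as the parent itself appearing in the result.

```python
from __future__ import annotations

import re
from collections.abc import Iterable

# Hypothetical registry content: parent dataset type name -> component names.
REGISTERED: dict[str, tuple[str, ...]] = {"calexp": ("wcs", "psf"), "raw": ()}


def resolve_dataset_types(
    names: Iterable[str] = (),
    patterns: Iterable[re.Pattern[str]] = (),
    components: bool | None = None,
    missing: list[str] | None = None,
) -> dict[str, list[str | None]]:
    patterns = list(patterns)  # iterated more than once below
    result: dict[str, list[str | None]] = {}

    def add(parent: str, component: str | None) -> None:
        matches = result.setdefault(parent, [])
        if component not in matches:
            matches.append(component)

    # Explicit names, including "parent.component", are always included.
    for name in names:
        parent, _, component = name.partition(".")
        if parent in REGISTERED and (not component or component in REGISTERED[parent]):
            add(parent, component or None)
        elif missing is not None:
            missing.append(name)
    # Patterns always apply to parent dataset type names.
    for pattern in patterns:
        for parent in REGISTERED:
            if pattern.fullmatch(parent):
                add(parent, None)
    if components is False:
        return result
    for pattern in patterns:
        for parent, parent_components in REGISTERED.items():
            # components=None: only look at components of unmatched parents.
            if components is None and None in result.get(parent, []):
                continue
            for component in parent_components:
                if pattern.fullmatch(f"{parent}.{component}"):
                    add(parent, component)
    return result


calexp_any = re.compile(r"calexp.*")
print(resolve_dataset_types(patterns=[calexp_any]))  # → {'calexp': [None]}
print(resolve_dataset_types(patterns=[calexp_any], components=True))
# → {'calexp': [None, 'wcs', 'psf']}
print(resolve_dataset_types(patterns=[re.compile(r".*\.wcs")]))  # → {'calexp': ['wcs']}
```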

abstract resolve_governor_constraints(dimensions: DimensionGroup, constraints: Mapping[str, Set[str]], context: _C) → Mapping[str, Set[str]]

Resolve governor dimension constraints provided by user input to a query against the content in the Registry.

Parameters:
dimensions : DimensionGroup

Dimensions that bound the governor dimensions to consider (via dimensions.governors, more specifically).

constraints : Mapping[str, Set[str]]

Constraints from user input to the query (e.g. from data IDs and string expression predicates).

context : QueryContext

Object that manages state for the query; used here to fetch the governor dimension record cache if it has not already been loaded.

Returns:
resolved : Mapping[str, Set[str]]

A shallow copy of constraints with keys equal to dimensions.governors.names and value sets constrained by the Registry content if they were not already in constraints.

Raises:
DataIdValueError

Raised if constraints includes governor dimension values that are not present in the Registry.
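The documented validate-or-fill behavior can be sketched with plain sets. The inputs are simplified: `governors` stands in for dimensions.governors.names, `known_values` plays the role of the governor dimension record cache, and `ToyDataIdValueError` is a hypothetical substitute for DataIdValueError.

```python
from __future__ import annotations

from collections.abc import Iterable, Mapping, Set


class ToyDataIdValueError(ValueError):
    """Hypothetical stand-in for DataIdValueError."""


def resolve_governor_constraints(
    governors: Iterable[str],
    constraints: Mapping[str, Set[str]],
    known_values: Mapping[str, Set[str]],
) -> dict[str, Set[str]]:
    resolved: dict[str, Set[str]] = {}  # keys are exactly the governors
    for governor in governors:
        if governor in constraints:
            unknown = set(constraints[governor]) - set(known_values[governor])
            if unknown:
                raise ToyDataIdValueError(
                    f"Unknown values {sorted(unknown)} for governor dimension {governor!r}."
                )
            resolved[governor] = constraints[governor]  # user input, validated
        else:
            resolved[governor] = known_values[governor]  # filled from "registry"
    return resolved


known = {"instrument": {"HSC", "LSSTCam"}, "skymap": {"hsc_rings_v1"}}
print(resolve_governor_constraints(["instrument", "skymap"], {"instrument": {"HSC"}}, known))
# → {'instrument': {'HSC'}, 'skymap': {'hsc_rings_v1'}}
```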

resolve_single_dataset_type_wildcard(expression: Any, components: bool | None = None, explicit_only: bool = False, components_deprecated: bool = True) → tuple[lsst.daf.butler._dataset_type.DatasetType, list[str | None]]

Return a single dataset type that matches a wildcard expression.

Parameters:
expressionAny

Names and/or patterns for the dataset type; will be passed to DatasetTypeWildcard.from_expression.

components : bool, optional

If True, apply all expression patterns to component dataset type names as well. If False, never apply patterns to components. If None (default), apply patterns to components only if their parent datasets were not matched by the expression. Fully-specified component datasets (str or DatasetType instances) are always included.

explicit_only : bool, optional

If True, require explicit DatasetType instances or str names, with re.Pattern instances deprecated and ... (Ellipsis) prohibited.

components_deprecated : bool, optional

If True, this is a context in which component dataset support is deprecated. This will result in a deprecation warning when components=True or components=None and a component dataset is matched. In the future this will become an error.

Returns:
single_parent : DatasetType

The matched parent dataset type.

single_components : list[str | None]

The matched components that correspond to this parent, or None if the parent dataset type itself was matched.

Notes

This method really finds a single parent dataset type and any number of components, because it’s only the parent dataset type that’s known to registry at all; many callers are expected to discard the single_components return value.