QueryBackend¶
- class lsst.daf.butler.registry.queries.QueryBackend¶
- Bases: Generic[_C]
- An interface for constructing and evaluating the Relation objects that comprise registry queries.
- This ABC is expected to have a concrete subclass for each concrete registry type, and most subclasses will be paired with a QueryContext subclass. See QueryContext for the division of responsibilities between these two interfaces.
 - Attributes Summary
- universe - Definition of all dimensions and dimension elements for this registry (DimensionUniverse).
 - Methods Summary
- context() - Return a context manager that can be used to execute queries with this backend.
- extract_dimension_relationships(relation) - Extract the dimension key relationships encoded in a relation tree.
- filter_dataset_collections(dataset_types, ...) - Filter a sequence of collections to those for which a dataset query might succeed.
- get_collection_name(key) - Return the collection name associated with a collection primary key value.
- get_dimension_record_cache(element_name, context) - Return a local cache of all DimensionRecord objects for a dimension element, fetching it if necessary.
- make_dataset_query_relation(dataset_type, ...) - Construct a relation that represents an unordered query for datasets that returns matching results from all given collections.
- make_dataset_search_relation(dataset_type, ...) - Construct a relation that represents an ordered query for datasets that returns results from the first matching collection for each data ID.
- make_dimension_relation(dimensions, columns, ...) - Construct a relation that provides columns and constraints from dimension records.
- make_doomed_dataset_relation(dataset_type, ...) - Construct a relation that represents a doomed query for datasets.
- resolve_collection_wildcard(expression, *[, ...]) - Return the collection records that match a wildcard expression.
- resolve_dataset_collections(dataset_type, ...) - Resolve the sequence of collections to query for a dataset type.
- resolve_dataset_type_wildcard(expression[, ...]) - Return the dataset types that match a wildcard expression.
- resolve_governor_constraints(dimensions, ...) - Resolve governor dimension constraints provided by user input to a query against the content in the Registry.
- resolve_single_dataset_type_wildcard(expression) - Return a single dataset type that matches a wildcard expression.
 - Attributes Documentation - universe¶
- Definition of all dimensions and dimension elements for this registry (DimensionUniverse).
 - Methods Documentation - context() _C¶
- Return a context manager that can be used to execute queries with this backend. - Returns:
- contextQueryContext
- Context manager that manages state and connections needed to execute queries. 
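The pairing between a backend and its context type (the `_C` type parameter) can be illustrated with a minimal sketch. `ToyContext`, `ToyBackend`, and `InMemoryBackend` are hypothetical stand-ins written for this example; they are not part of `lsst.daf.butler` and only mimic the shape of the interface described here.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from contextlib import AbstractContextManager
from typing import Generic, TypeVar


class ToyContext(AbstractContextManager):
    """Hypothetical stand-in for a QueryContext: tracks whether it is open."""

    def __init__(self) -> None:
        self.is_open = False

    def __enter__(self) -> ToyContext:
        self.is_open = True
        return self

    def __exit__(self, exc_type, exc_value, traceback) -> None:
        # Release per-query state; returning None lets exceptions propagate.
        self.is_open = False
        return None


_C = TypeVar("_C", bound=ToyContext)


class ToyBackend(ABC, Generic[_C]):
    """Hypothetical backend ABC, generic over the context type it creates."""

    @abstractmethod
    def context(self) -> _C:
        """Return a context manager used to execute queries."""


class InMemoryBackend(ToyBackend[ToyContext]):
    """Concrete backend paired with one concrete context type."""

    def context(self) -> ToyContext:
        return ToyContext()


backend = InMemoryBackend()
with backend.context() as ctx:
    inside = ctx.is_open  # state is available only inside the block
after = ctx.is_open
```

Binding the context type through a type parameter lets a type checker confirm that every method taking a `context` argument receives the kind of context that the same backend produces.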
 
 
 - extract_dimension_relationships(relation: Relation) set[frozenset[str]]¶
- Extract the dimension key relationships encoded in a relation tree. - Parameters:
- relationRelation
- Relation tree to process. 
 
- Returns:
 - Notes - Dimension relationships include both many-to-one implied dependencies and many-to-many joins backed by “always-join” dimension elements, and it’s important to join in the dimension table that defines a relationship in any query involving dimensions that are a superset of that relationship. For example, let’s consider a relation tree that joins dataset existence-check relations for two dataset types, with dimensions {instrument, exposure, detector} and {instrument, physical_filter}. The joined relation appears to have all dimension keys in its expanded graph present except band, and the system could easily correct this by joining that dimension in directly. But it’s also missing the {instrument, exposure, physical_filter} relationship we’d get from the exposure dimension’s own relation (exposure implies physical_filter) and the similar {instrument, physical_filter, band} relationship from the physical_filter dimension relation; we need the relationship logic to recognize that those dimensions need to be joined in as well in order for the full relation to have rows that represent valid data IDs.
- The implementation of this method relies on the assumption that LeafRelation objects always have rows that are consistent with all defined relationships (i.e. are valid data IDs). This is true not just for dimension relations themselves, but for anything created from queries based on them, including datasets and query results. It is possible to construct LeafRelation objects that don’t satisfy this criterion (e.g. when accepting user-provided data IDs), and in this case higher-level guards or warnings must be provided.
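The example in the notes above can be worked through with plain sets. This is only a sketch of the idea, not the library's implementation: `missing_relationships` is a hypothetical helper, the `detector` and `visit` entries are illustrative assumptions, and treating a relationship as already encoded when it is a subset of an existing one is also an assumption.

```python
def missing_relationships(present, defined, dimensions):
    """Return elements whose relationship applies but is not yet encoded."""
    missing = {}
    for element, relationship in defined.items():
        if not relationship <= dimensions:
            continue  # involves dimensions outside this query
        if any(relationship <= encoded for encoded in present):
            continue  # already encoded by a relation in the tree (assumed rule)
        missing[element] = relationship
    return missing


# Relationships encoded by the two dataset existence-check relations.
present = {
    frozenset({"instrument", "exposure", "detector"}),
    frozenset({"instrument", "physical_filter"}),
}
# Relationships defined by dimension elements; the first two come from the
# notes, the last two are hypothetical additions for illustration.
defined = {
    "exposure": frozenset({"instrument", "exposure", "physical_filter"}),
    "physical_filter": frozenset({"instrument", "physical_filter", "band"}),
    "detector": frozenset({"instrument", "detector"}),
    "visit": frozenset({"instrument", "visit", "physical_filter"}),
}
# Expanded dimensions of the joined relation, including band.
dimensions = frozenset(
    {"instrument", "exposure", "detector", "physical_filter", "band"}
)

to_join = missing_relationships(present, defined, dimensions)
print(sorted(to_join))  # ['exposure', 'physical_filter']
```

Both `exposure` and `physical_filter` are reported, matching the conclusion in the notes that their dimension relations must be joined in for the rows to represent valid data IDs.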
 - abstract filter_dataset_collections(dataset_types: Iterable[DatasetType], collections: Sequence[CollectionRecord], *, governor_constraints: Mapping[str, Set[str]], rejections: list[str] | None = None) dict[DatasetType, list[CollectionRecord]]¶
- Filter a sequence of collections to those for which a dataset query might succeed. - Parameters:
- dataset_typesIterable[DatasetType]
- Dataset types that are being queried. Must include only parent or standalone dataset types, not components. 
- collectionsSequence[CollectionRecord]
- Sequence of collections that will be searched. 
- governor_constraintsMapping[str,Set], optional
- Constraints imposed by other aspects of the query on governor dimensions; collections inconsistent with these constraints will be skipped. 
- rejectionslist[str], optional
- If not None, a list that diagnostic messages will be appended to, for any collection that matches collections that is not returned. At least one message is guaranteed whenever the result is empty.
 
- Returns:
 - Notes - This method accepts multiple dataset types and multiple collections at once to enable implementations to batch up the fetching of summary information needed to relate them. 
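The filtering and the `rejections` convention described above can be sketched with plain dictionaries. Everything here is hypothetical: `filter_collections`, the `types_in` and `governors_in` summary mappings, and the collection names are assumptions made for illustration, and the sketch handles a single dataset type rather than a batch.

```python
from __future__ import annotations


def filter_collections(
    dataset_type: str,
    collections: list[str],
    types_in: dict[str, set[str]],
    governors_in: dict[str, dict[str, set[str]]],
    governor_constraints: dict[str, set[str]],
    rejections: list[str] | None = None,
) -> list[str]:
    """Keep only the collections in which a dataset query might succeed."""
    kept = []
    for name in collections:
        reason = None
        if dataset_type not in types_in[name]:
            reason = f"No datasets of type {dataset_type!r} in collection {name!r}."
        else:
            for governor, allowed in governor_constraints.items():
                found = governors_in[name].get(governor)
                # Skip collections whose governor values cannot match.
                if found is not None and not (found & allowed):
                    reason = (
                        f"Collection {name!r} has no {governor} values "
                        f"in {sorted(allowed)}."
                    )
                    break
        if reason is None:
            kept.append(name)
        elif rejections is not None:
            rejections.append(reason)  # caller-owned list of diagnostics
    return kept


# Hypothetical per-collection summaries.
types_in = {"HSC/raw": {"raw"}, "LATISS/raw": {"raw"}, "refcats": {"ref_cat"}}
governors_in = {
    "HSC/raw": {"instrument": {"HSC"}},
    "LATISS/raw": {"instrument": {"LATISS"}},
    "refcats": {},
}
messages: list[str] = []
kept = filter_collections(
    "raw",
    ["HSC/raw", "LATISS/raw", "refcats"],
    types_in,
    governors_in,
    {"instrument": {"HSC"}},
    messages,
)
```

Passing a caller-owned list keeps the return value simple while still letting the caller explain an empty result to the user.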
 - abstract get_collection_name(key: Any) str¶
- Return the collection name associated with a collection primary key value. - Parameters:
- key
- Collection primary key value. 
 
- Returns:
- namestr
- Collection name. 
 
 
 - abstract get_dimension_record_cache(element_name: str, context: _C) Mapping[DataCoordinate, DimensionRecord] | None¶
- Return a local cache of all DimensionRecord objects for a dimension element, fetching it if necessary. - Parameters:
- element_namestr
- Name of the dimension element. 
- contextqueries.SqlQueryContext
- Context to be used to execute queries when no cached result is available. 
 
- Returns:
 
 - abstract make_dataset_query_relation(dataset_type: DatasetType, collections: Sequence[CollectionRecord], columns: Set[str], context: _C) Relation¶
- Construct a relation that represents an unordered query for datasets that returns matching results from all given collections. - Parameters:
- dataset_typeDatasetType
- Type for the datasets being queried. 
- collectionsSequence[CollectionRecord]
- Records for collections to query. Should generally be the result of a call to resolve_dataset_collections, and must not be empty.
- contextQueryContext
- Context that manages per-query state. 
- columnsSet[str]
- Columns to include in the relation. See Query.find_datasets for details.
 
- Returns:
- relationlsst.daf.relation.Relation
- Relation representing a dataset query. 
 
 
 - make_dataset_search_relation(dataset_type: DatasetType, collections: Sequence[CollectionRecord], columns: Set[str], context: _C, *, join_to: Relation | None = None) Relation¶
- Construct a relation that represents an ordered query for datasets that returns results from the first matching collection for each data ID. - Parameters:
- dataset_typeDatasetType
- Type for the datasets being searched.
- collectionsSequence[CollectionRecord]
- Records for collections to search. Should generally be the result of a call to resolve_dataset_collections, and must not be empty.
- columnsSet[str]
- Columns to include in the relation. See make_dataset_query_relation for options.
- contextQueryContext
- Context that manages per-query state. 
- join_toRelation, optional
- Another relation to join with the query for datasets in all collections before filtering out shadowed datasets.
 
- Returns:
- relationlsst.daf.relation.Relation
- Relation representing a find-first dataset search. 
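The difference between an unordered query over all collections and a find-first search can be shown on plain tuples. This is a sketch of the semantics only, under the assumption that rows are `(collection, data_id, dataset_id)` tuples; `query_all` and `find_first` are hypothetical helpers and do not build Relation objects.

```python
def query_all(rows, collections):
    """Unordered query: every matching row from any given collection."""
    wanted = set(collections)
    return [row for row in rows if row[0] in wanted]


def find_first(rows, collections):
    """Ordered search: per data ID, keep the row from the earliest collection."""
    rank = {name: index for index, name in enumerate(collections)}
    best = {}
    for collection, data_id, dataset_id in rows:
        if collection not in rank:
            continue
        current = best.get(data_id)
        # A dataset in an earlier collection shadows those in later ones.
        if current is None or rank[collection] < rank[current[0]]:
            best[data_id] = (collection, data_id, dataset_id)
    return sorted(best.values(), key=lambda row: row[1])


# Hypothetical rows: (collection, data ID, dataset ID).
rows = [
    ("run2", ("HSC", 1), "a"),
    ("run1", ("HSC", 1), "b"),
    ("run2", ("HSC", 2), "c"),
]
everything = query_all(rows, ["run1", "run2"])
first_only = find_first(rows, ["run1", "run2"])
```

Here dataset `"a"` is shadowed by `"b"` because `run1` precedes `run2` in the search order, while dataset `"c"` survives because no earlier collection has that data ID.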
 
 
 - abstract make_dimension_relation(dimensions: DimensionGraph, columns: Set[ColumnTag], context: _C, *, initial_relation: Relation | None = None, initial_join_max_columns: frozenset[lsst.daf.relation._columns._tag.ColumnTag] | None = None, initial_dimension_relationships: Set[frozenset[str]] | None = None, spatial_joins: Iterable[tuple[str, str]] = (), governor_constraints: Mapping[str, Set[str]]) Relation¶
- Construct a relation that provides columns and constraints from dimension records. - Parameters:
- dimensionsDimensionGraph
- Dimensions to include. The key columns for all dimensions (both required and implied) will be included in the returned relation. 
- columnsSet[ColumnTag]
- Dimension record columns to include. This set may include key column tags as well, though these may be ignored; the set of key columns to include is determined by the dimensions argument instead.
- contextQueryContext
- Context that manages per-query state. 
- initial_relationRelation, optional
- Initial relation to join to the dimension relations. If this relation provides record columns, key columns, and relationships between key columns (see initial_dimension_relationships below) that would otherwise have been added by joining in a dimension element’s relation, that relation may not be joined in at all.
- initial_join_max_columnsfrozenset[ColumnTag], optional
- Maximum superset of common columns for joins to initial_relation (i.e. columns in the ON expression of SQL JOIN clauses). If provided, this is a subset of the dimension key columns in initial_relation, which are otherwise all considered as potential common columns for joins. Ignored if initial_relation is not provided.
- initial_dimension_relationshipsSet[frozenset[str]], optional
- A set of sets of dimension names representing relationships between dimensions encoded in the rows of initial_relation. If not provided (and initial_relation is), extract_dimension_relationships will be called on initial_relation.
- spatial_joinscollections.abc.Iterable[tuple[str,str] ]
- Iterable of dimension element name pairs that should be spatially joined. 
- governor_constraintsMapping[str, Set[str]], optional
- Constraints on governor dimensions that are provided by other parts of the query that either have been included in initial_relation or are guaranteed to be added in the future. This is a mapping from governor dimension name to sets of values that dimension may take.
 
- Returns:
- relationlsst.daf.relation.Relation
- Relation containing the given dimension columns and constraints. 
 
 
 - make_doomed_dataset_relation(dataset_type: DatasetType, columns: Set[str], messages: Iterable[str], context: _C) Relation¶
- Construct a relation that represents a doomed query for datasets. - Parameters:
- dataset_typeDatasetType
- Dataset type being queried. 
- columnsAbstractSet[str]
- Dataset columns to include (dimension key columns are always included). See make_dataset_query_relation for allowed values.
- messagesIterable[str]
- Diagnostic messages that explain why the query is doomed to yield no rows. 
- contextQueryContext
- Context that manages per-query state. 
 
- Returns:
- relationlsst.daf.relation.Relation
- Relation with the requested columns and no rows. 
 
 
 - abstract resolve_collection_wildcard(expression: Any, *, collection_types: Set[CollectionType] = frozenset({CollectionType.RUN, CollectionType.TAGGED, CollectionType.CHAINED, CollectionType.CALIBRATION}), done: set[str] | None = None, flatten_chains: bool = True, include_chains: bool | None = None) list[CollectionRecord]¶
- Return the collection records that match a wildcard expression. - Parameters:
- expression
- Names and/or patterns for collections; will be passed to CollectionWildcard.from_expression.
- collection_typescollections.abc.Set[CollectionType], optional
- If provided, only yield collections of these types. 
- doneset[str], optional
- A set of collection names that should be skipped, updated to include all processed collection names on return. 
- flatten_chainsbool, optional
- If True (default), recursively yield the child collections of CHAINED collections.
- include_chainsbool, optional
- If True, return records for CHAINED collections themselves. The default is the opposite of flatten_chains: either return records for CHAINED collections or their children, but not both.
 
- Returns:
- recordslist[CollectionRecord]
- Matching collection records. 
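How `flatten_chains`, `include_chains`, and `done` interact can be sketched on a toy collection tree. The `CHAINS` mapping and the `resolve` function are hypothetical and work on names rather than CollectionRecord objects; the sketch assumes that `include_chains=True` means the CHAINED collection itself is returned, and that the default is the opposite of `flatten_chains`.

```python
# Hypothetical collection tree: CHAINED collection name -> ordered children.
CHAINS = {
    "defaults": ["runs/all", "calib"],
    "runs/all": ["run1", "run2"],
}


def resolve(names, *, flatten_chains=True, include_chains=None, done=None):
    """Resolve collection names, optionally expanding CHAINED collections."""
    if include_chains is None:
        include_chains = not flatten_chains  # assumed default
    if done is None:
        done = set()
    result = []
    for name in names:
        if name in done:
            continue  # already processed, possibly by an earlier call
        done.add(name)
        if name in CHAINS:
            if include_chains:
                result.append(name)
            if flatten_chains:
                result.extend(
                    resolve(
                        CHAINS[name],
                        flatten_chains=True,
                        include_chains=include_chains,
                        done=done,
                    )
                )
        else:
            result.append(name)
    return result


flattened = resolve(["defaults"])
chains_only = resolve(["defaults"], flatten_chains=False)
both = resolve(["defaults"], include_chains=True)

seen = {"run1"}
remaining = resolve(["defaults"], done=seen)  # updates `seen` in place
```

Sharing one `done` set across calls is what lets a caller resolve several expressions without returning the same collection twice.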
 
 
 - resolve_dataset_collections(dataset_type: DatasetType, collections: CollectionWildcard, *, governor_constraints: Mapping[str, Set[str]], rejections: list[str] | None = None, collection_types: Set[CollectionType] = frozenset({CollectionType.RUN, CollectionType.TAGGED, CollectionType.CHAINED, CollectionType.CALIBRATION}), allow_calibration_collections: bool = False) list[CollectionRecord]¶
- Resolve the sequence of collections to query for a dataset type. - Parameters:
- dataset_typeDatasetType
- Dataset type to be queried in the returned collections. 
- collectionsCollectionWildcard
- Expression for the collections to be queried. 
- governor_constraintsMapping[str,Set], optional
- Constraints imposed by other aspects of the query on governor dimensions; collections inconsistent with these constraints will be skipped. 
- rejectionslist[str], optional
- If not None, a list that diagnostic messages will be appended to, for any collection that matches collections that is not returned. At least one message is guaranteed whenever the result is empty.
- collection_typesSet[CollectionType], optional
- Collection types to consider when resolving the collection expression. 
- allow_calibration_collectionsbool, optional
- If False, skip (with a rejections message) any calibration collections that match collections but are not given explicitly by name, and raise NotImplementedError for any calibration collection that is given explicitly. This is a temporary option that will be removed when the query system can handle temporal joins involving calibration collections.
 
- Returns:
- recordslist[CollectionRecord]
- A new list of CollectionRecord instances, for collections that both match collections and may have datasets of the given type.
 
 - Notes - This is a higher-level driver for resolve_collection_wildcard and filter_dataset_collections that is mostly concerned with handling queries against CALIBRATION collections that aren’t fully supported yet. Once that support improves, this method may be removed.
 - abstract resolve_dataset_type_wildcard(expression: Any, components: bool | None = None, missing: list[str] | None = None, explicit_only: bool = False, components_deprecated: bool = True) dict[lsst.daf.butler.core.datasets.type.DatasetType, list[str | None]]¶
- Return the dataset types that match a wildcard expression. - Parameters:
- expression
- Names and/or patterns for dataset types; will be passed to DatasetTypeWildcard.from_expression.
- componentsbool, optional
- If True, apply all expression patterns to component dataset type names as well. If False, never apply patterns to components. If None (default), apply patterns to components only if their parent datasets were not matched by the expression. Fully-specified component datasets (str or DatasetType instances) are always included.
- missinglist of str, optional
- String dataset type names that were explicitly given (i.e. not regular expression patterns) but not found will be appended to this list, if it is provided.
- explicit_onlybool, optional
- If True, require explicit DatasetType instances or str names, with re.Pattern instances deprecated and ... prohibited.
- components_deprecatedbool, optional
- If True, this is a context in which component dataset support is deprecated. This will result in a deprecation warning when components=True or components=None and a component dataset is matched. In the future this will become an error.
 
- Returns:
 
 - abstract resolve_governor_constraints(dimensions: DimensionGraph, constraints: Mapping[str, Set[str]], context: _C) Mapping[str, Set[str]]¶
- Resolve governor dimension constraints provided by user input to a query against the content in the Registry. - Parameters:
- dimensionsDimensionGraph
- Dimensions that bound the governor dimensions to consider (via dimensions.governors, more specifically).
- constraintsMapping[str, Set[str]]
- Constraints from user input to the query (e.g. from data IDs and string expression predicates).
- contextQueryContext
- Object that manages state for the query; used here to fetch the governor dimension record cache if it has not already been loaded. 
 
- Returns:
- Raises:
- DataIdValueError
- Raised if constraints includes governor dimension values that are not present in the Registry.
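Resolving governor constraints can be sketched as a check of user-supplied values against the values known to the registry. This is an illustration under stated assumptions, not the library's code: `resolve_governors`, the `known` mapping, and `UnknownGovernorValueError` (a stand-in for DataIdValueError) are hypothetical, and returning all known values for an unconstrained governor is an assumption.

```python
class UnknownGovernorValueError(ValueError):
    """Hypothetical stand-in for the DataIdValueError named in the text."""


def resolve_governors(governors, constraints, known):
    """Check user constraints against known values for each governor."""
    resolved = {}
    for governor in governors:
        all_values = set(known[governor])
        requested = constraints.get(governor)
        if requested is None:
            # Assumption: an unconstrained governor may take any known value.
            resolved[governor] = all_values
            continue
        unknown = set(requested) - all_values
        if unknown:
            raise UnknownGovernorValueError(
                f"Unknown values {sorted(unknown)} for governor "
                f"dimension {governor!r}."
            )
        resolved[governor] = set(requested)
    return resolved


# Hypothetical registry content: governor dimension -> known values.
known = {"instrument": {"HSC", "LATISS"}, "skymap": {"rings_v1"}}

resolved = resolve_governors(
    ["instrument", "skymap"], {"instrument": {"HSC"}}, known
)

message = None
try:
    resolve_governors(["instrument"], {"instrument": {"NotACamera"}}, known)
except UnknownGovernorValueError as err:
    message = str(err)
```

Failing early on an unknown value gives the user a specific diagnostic instead of a query that silently returns no rows.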
 
 
 - resolve_single_dataset_type_wildcard(expression: Any, components: bool | None = None, explicit_only: bool = False, components_deprecated: bool = True) tuple[lsst.daf.butler.core.datasets.type.DatasetType, list[str | None]]¶
- Return a single dataset type that matches a wildcard expression. - Parameters:
- expression
- Names and/or patterns for the dataset type; will be passed to DatasetTypeWildcard.from_expression.
- componentsbool, optional
- If True, apply all expression patterns to component dataset type names as well. If False, never apply patterns to components. If None (default), apply patterns to components only if their parent datasets were not matched by the expression. Fully-specified component datasets (str or DatasetType instances) are always included.
- explicit_onlybool, optional
- If True, require explicit DatasetType instances or str names, with re.Pattern instances deprecated and ... prohibited.
- components_deprecatedbool, optional
- If True, this is a context in which component dataset support is deprecated. This will result in a deprecation warning when components=True or components=None and a component dataset is matched. In the future this will become an error.
 
- Returns:
 - Notes - This method really finds a single parent dataset type and any number of components, because it’s only the parent dataset type that’s known to registry at all; many callers are expected to discard the single_components return value.
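The "one parent, any number of components" shape of the return value can be sketched with strings. The helpers below are hypothetical and operate on names rather than DatasetType objects; the sketch assumes component names have the form `parent.component` and that `None` in the component list stands for the parent dataset type itself.

```python
def group_by_parent(names):
    """Group dataset type names by parent; None marks the parent itself."""
    grouped = {}
    for name in names:
        parent, _, component = name.partition(".")
        grouped.setdefault(parent, []).append(component or None)
    return grouped


def single_parent(names):
    """Return (parent, components), requiring exactly one parent."""
    grouped = group_by_parent(names)
    if len(grouped) != 1:
        raise ValueError(
            f"Expected exactly one parent dataset type, got {sorted(grouped)}."
        )
    parent, components = next(iter(grouped.items()))
    return parent, components


# Hypothetical dataset type names.
many = group_by_parent(["calexp", "calexp.wcs", "raw"])
parent, components = single_parent(["calexp", "calexp.wcs"])
```

A caller that only needs the parent dataset type can unpack the tuple and ignore the second element.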