DataCoordinateQueryResults

class lsst.daf.butler.registry.queries.DataCoordinateQueryResults(db: lsst.daf.butler.registry.interfaces.Database, query: lsst.daf.butler.registry.queries.Query, *, records: Optional[Mapping[str, Mapping[tuple, lsst.daf.butler.DimensionRecord]]] = None)

Bases: Iterable[lsst.daf.butler.DataCoordinate]

An enhanced implementation of DataCoordinateIterable that represents data IDs retrieved from a database query.

Parameters

db (Database): Database engine used to execute queries.
query (Query): Low-level representation of the query that backs this result object.
records (Mapping, optional): A nested mapping containing DimensionRecord objects for all dimensions and all data IDs this query will yield. If None (default), DataCoordinateIterable.hasRecords will return False. The outer mapping has str keys (the names of dimension elements). The inner mapping has tuple keys representing data IDs (tuple conversions of DataCoordinate.values()) and DimensionRecord values.

Notes

Constructing an instance of this does nothing; the query is not executed until it is iterated over (or some other operation is performed that involves iteration).

Instances should generally only be constructed by Registry methods or the methods of other query result objects.
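The deferred execution described in these notes can be sketched with the standard-library sqlite3 module. This is only an illustration of the pattern, not the butler implementation: the LazyResults class, the data_id table, and the "HypoCam" values are all hypothetical.

```python
import sqlite3
from typing import Iterator, Tuple


class LazyResults:
    """Hypothetical stand-in for a lazy query-results object."""

    def __init__(self, connection: sqlite3.Connection, sql: str) -> None:
        # Construction only stores what is needed to run the query later.
        self._connection = connection
        self._sql = sql
        self.executions = 0  # counts how many times the query actually ran

    def __iter__(self) -> Iterator[Tuple]:
        # The query is executed here, once per pass over the results.
        self.executions += 1
        yield from self._connection.execute(self._sql)


connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE data_id (instrument TEXT, detector INTEGER)")
connection.executemany(
    "INSERT INTO data_id VALUES (?, ?)", [("HypoCam", 1), ("HypoCam", 2)]
)

results = LazyResults(
    connection, "SELECT instrument, detector FROM data_id ORDER BY detector"
)
executions_before = results.executions  # nothing has been executed yet
rows = list(results)  # iterating is what triggers the query
```

In this sketch every new pass over the object runs the query again, which is the cost that materializing a large result set is meant to avoid.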

Attributes Summary

`graph`	Dimensions identified by these data IDs (`DimensionGraph`).
`universe`	Universe that defines all known compatible dimensions.

Methods Summary

`constrain`(query, columns)	Constrain a SQL query to include or relate to only known data IDs.
`expanded`()	Return a results object for which `hasRecords` returns `True`.
`findDatasets`(datasetType, collections, *[, …])	Find datasets using the data IDs identified by this query.
`fromScalar`(dataId)	Return a `DataCoordinateIterable` containing the single data ID.
`hasFull`()	Indicate if all data IDs in this iterable identify all dimensions.
`hasRecords`()	Return whether all data IDs in this iterable contain records.
`materialize`()	Insert this query’s results into a temporary table.
`subset`([graph, unique])	Return a results object containing a subset of the dimensions of this one, and/or a unique near-subset of its rows.
`toSequence`()	Transform this iterable into a `DataCoordinateSequence`.
`toSet`()	Transform this iterable into a `DataCoordinateSet`.

Attributes Documentation

graph

Dimensions identified by these data IDs (DimensionGraph).

universe

Universe that defines all known compatible dimensions (DimensionUniverse).

Methods Documentation

constrain(query: lsst.daf.butler.SimpleQuery, columns: Callable[[str], sqlalchemy.sql.elements.ColumnElement]) → None

Constrain a SQL query to include or relate to only known data IDs.

Parameters

query (SimpleQuery): Struct that represents the SQL query to constrain, either by appending to its WHERE clause, joining a new table or subquery, or both.
columns (Callable): A callable that accepts str dimension names and returns SQLAlchemy objects representing a column for that dimension’s primary key value in the query.

expanded() → lsst.daf.butler.registry.queries.DataCoordinateQueryResults

Return a results object for which hasRecords returns True.

This method may involve actually executing database queries to fetch DimensionRecord objects.

Returns

results (DataCoordinateQueryResults): A results object for which hasRecords returns True. May be self if that is already the case.

Notes

For very large result sets, it may be much more efficient to call materialize before calling expanded, to avoid performing the original query multiple times (as a subquery) in the follow-up queries that fetch dimension records. For example:

with registry.queryDataIds(...).materialize() as tempDataIds:
    dataIdsWithRecords = tempDataIds.expanded()
    for dataId in dataIdsWithRecords:
        ...

findDatasets(datasetType: Union[lsst.daf.butler.DatasetType, str], collections: Any, *, findFirst: bool = True) → lsst.daf.butler.registry.queries.ParentDatasetQueryResults

Find datasets using the data IDs identified by this query.

Parameters

datasetType (DatasetType or str): Dataset type or the name of one to search for. Must have dimensions that are a subset of self.graph.
collections (Any): An expression that fully or partially identifies the collections to search for the dataset, such as a str, re.Pattern, or iterable thereof. The literal `...` can be used to return all collections. See Collection expressions for more information.
findFirst (bool, optional): If True (default), for each result data ID, only yield one DatasetRef, from the first collection in which a dataset of that dataset type appears (according to the order of collections passed in). If True, collections must not contain regular expressions and may not be `...`.

Returns

datasets (ParentDatasetQueryResults): A lazy-evaluation object representing dataset query results, iterable over DatasetRef objects. If self.hasRecords(), all nested data IDs in those dataset references will have records as well.

Raises

ValueError: Raised if datasetType.dimensions.issubset(self.graph) is False.
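The effect of findFirst can be sketched in plain Python. This is a toy model under stated assumptions, not the registry's search logic: the datasets_by_collection mapping, the find_datasets function, the collection names, and the string labels standing in for DatasetRef objects are all hypothetical.

```python
from typing import Dict, Iterable, Iterator, Sequence, Tuple

# Hypothetical stand-in: collection name -> {data ID tuple: dataset label}.
datasets_by_collection: Dict[str, Dict[Tuple, str]] = {
    "run_b": {("HypoCam", 1): "b/1"},
    "run_a": {("HypoCam", 1): "a/1", ("HypoCam", 2): "a/2"},
}


def find_datasets(
    data_ids: Iterable[Tuple],
    collections: Sequence[str],
    find_first: bool = True,
) -> Iterator[str]:
    """Yield dataset labels, searching collections in the order given."""
    for data_id in data_ids:
        for collection in collections:
            found = datasets_by_collection[collection].get(data_id)
            if found is not None:
                yield found
                if find_first:
                    # Stop at the first collection that has this data ID.
                    break


data_ids = [("HypoCam", 1), ("HypoCam", 2)]
first_only = list(find_datasets(data_ids, ["run_b", "run_a"]))
everything = list(find_datasets(data_ids, ["run_b", "run_a"], find_first=False))
```

Because the outcome depends on the order of the collections, a find-first search needs an explicit ordered list, which is consistent with the restriction on regular expressions and `...` stated above.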

static fromScalar(dataId: lsst.daf.butler.DataCoordinate) → lsst.daf.butler.core.dimensions._dataCoordinateIterable._ScalarDataCoordinateIterable

Return a DataCoordinateIterable containing the single data ID.

Parameters

dataId (DataCoordinate): Data ID to adapt. Must be a true DataCoordinate instance, not an arbitrary mapping. No runtime checking is performed.

Returns

iterable (DataCoordinateIterable): A DataCoordinateIterable instance of unspecified (i.e. implementation-detail) subclass. Guaranteed to implement the collections.abc.Sized (i.e. __len__) and collections.abc.Container (i.e. __contains__) interfaces as well as that of DataCoordinateIterable.
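The interface guarantees listed for the return value can be illustrated with a small single-element wrapper. This is a sketch of the general pattern only: ScalarIterable is a hypothetical class, not the private class named in the signature, and a plain tuple stands in for a DataCoordinate.

```python
from collections.abc import Container, Iterable, Sized
from typing import Any, Iterator


class ScalarIterable(Iterable, Sized, Container):
    """Hypothetical iterable that wraps exactly one item."""

    def __init__(self, item: Any) -> None:
        # No validation is done here, mirroring "no runtime checking".
        self._item = item

    def __iter__(self) -> Iterator[Any]:
        yield self._item

    def __len__(self) -> int:
        return 1

    def __contains__(self, key: object) -> bool:
        return key == self._item


single = ScalarIterable(("HypoCam", 1))
```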

hasFull() → bool

Indicate if all data IDs in this iterable identify all dimensions, not just required dimensions.

Returns

state (bool): If True, all(d.hasFull() for d in iterable) is guaranteed. If False, no guarantees are made.

hasRecords() → bool

Return whether all data IDs in this iterable contain records.

Returns

state (bool): If True, all(d.hasRecords() for d in iterable) is guaranteed. If False, no guarantees are made.
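The return values of hasFull and hasRecords are one-way guarantees: True promises something about every element, while False promises nothing either way. A minimal sketch of that contract follows, using a hypothetical FlaggedDataIds container and dicts as stand-ins for data IDs.

```python
from typing import Iterator, List


class FlaggedDataIds:
    """Hypothetical container whose flag is a one-way guarantee."""

    def __init__(self, data_ids: List[dict], known_has_records: bool = False) -> None:
        self._data_ids = list(data_ids)
        self._known_has_records = known_has_records

    def __iter__(self) -> Iterator[dict]:
        return iter(self._data_ids)

    def has_records(self) -> bool:
        # Reports what is known up front; it does not inspect each element.
        return self._known_has_records


def guarantee_holds(data_ids: FlaggedDataIds) -> bool:
    """Check the implication: flag is True => every element has records."""
    if not data_ids.has_records():
        return True  # a False flag promises nothing, so nothing is violated
    return all(d["records"] is not None for d in data_ids)


with_records = [{"values": ("HypoCam", 1), "records": {"detector": "stand-in"}}]
flag_true = FlaggedDataIds(with_records, known_has_records=True)
flag_false = FlaggedDataIds(with_records)  # records present but not advertised
```

Here flag_false happens to hold elements that all carry records even though its flag is False, which is why callers should treat False as "unknown" rather than "absent".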

materialize() → Iterator[lsst.daf.butler.registry.queries.DataCoordinateQueryResults]

Insert this query’s results into a temporary table.

Returns

context (typing.ContextManager[DataCoordinateQueryResults]): A context manager that ensures the temporary table is created and populated in __enter__ (returning a results object backed by that table), and dropped in __exit__. If self is already materialized, the context manager may do nothing (reflecting the fact that an outer context manager should already take care of everything else).

Notes

When using a very large result set to perform multiple queries (e.g. multiple calls to subset with different arguments, or even a single call to expanded), it may be much more efficient to start by materializing the query and only then performing the follow-up queries. It may also be less efficient, depending on how well the database engine’s query optimizer can simplify those particular follow-up queries and how efficiently it caches query results even when they are not explicitly inserted into a temporary table. See expanded and subset for examples.
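The create-on-enter, drop-on-exit behavior described for the context manager can be sketched with sqlite3 and contextlib. This is a generic illustration under stated assumptions, not the butler implementation: the materialize helper below, the tmp_data_ids table name, and the visit_detector table are hypothetical.

```python
import sqlite3
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def materialize(
    connection: sqlite3.Connection, sql: str, name: str = "tmp_data_ids"
) -> Iterator[str]:
    """Hypothetical helper: store the rows of ``sql`` in a temporary table."""
    # Run the original query once and keep its rows.
    connection.execute(f"CREATE TEMP TABLE {name} AS {sql}")
    try:
        yield name
    finally:
        # Drop the table even if the body of the with-block raised.
        connection.execute(f"DROP TABLE {name}")


connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE visit_detector (visit INTEGER, detector INTEGER)")
connection.executemany(
    "INSERT INTO visit_detector VALUES (?, ?)", [(1, 1), (1, 2), (2, 1)]
)

original = "SELECT visit, detector FROM visit_detector WHERE detector = 1"
with materialize(connection, original) as table:
    # Both follow-up queries read the stored rows instead of re-running `original`.
    visits = connection.execute(
        f"SELECT DISTINCT visit FROM {table} ORDER BY visit"
    ).fetchall()
    detectors = connection.execute(
        f"SELECT DISTINCT detector FROM {table}"
    ).fetchall()

# After the with-block no temporary table is left behind.
remaining = connection.execute(
    "SELECT name FROM sqlite_temp_master WHERE type = 'table'"
).fetchall()
```

Whether this beats re-running the original query as a subquery depends on the engine, as the notes above point out.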

subset(graph: Optional[lsst.daf.butler.DimensionGraph] = None, *, unique: bool = False) → lsst.daf.butler.registry.queries.DataCoordinateQueryResults

Return a results object containing a subset of the dimensions of this one, and/or a unique near-subset of its rows.

This method may involve actually executing database queries to fetch DimensionRecord objects.

Parameters

graph (DimensionGraph, optional): Dimensions to include in the new results object. If None, self.graph is used.
unique (bool, optional): If True (False is default), the query should only return unique data IDs. This is implemented in the database; to obtain unique results via Python-side processing (which may be more efficient in some cases), use toSet to construct a DataCoordinateSet from this results object instead.

Returns

results (DataCoordinateQueryResults): A results object corresponding to the given criteria. May be self if it already qualifies.

Notes

This method can only return a “near-subset” of the original result rows in general because of subtleties in how spatial overlaps are implemented; see Query.subset for more information.

When calling subset multiple times on the same very large result set, it may be much more efficient to call materialize first. For example:

dimensions1 = DimensionGraph(...)
dimensions2 = DimensionGraph(...)
with registry.queryDataIds(...).materialize() as tempDataIds:
    for dataId1 in tempDataIds.subset(
            graph=dimensions1,
            unique=True):
        ...
    for dataId2 in tempDataIds.subset(
            graph=dimensions2,
            unique=True):
        ...
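The trade-off behind the unique parameter, deduplicating in the database versus in Python, can be sketched with sqlite3. This is an illustration of the two strategies only; the data_id table and its columns are hypothetical and no butler API is involved.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE data_id (visit INTEGER, detector INTEGER)")
connection.executemany(
    "INSERT INTO data_id VALUES (?, ?)", [(1, 1), (1, 2), (2, 1)]
)

# Database-side: project onto "visit" and let the engine remove duplicates.
db_unique = connection.execute(
    "SELECT DISTINCT visit FROM data_id ORDER BY visit"
).fetchall()

# Python-side: fetch every projected row, then remove duplicates with a set.
py_unique = set(connection.execute("SELECT visit FROM data_id"))
```

Both approaches yield the same distinct values here; they differ in where the work happens and in how many rows cross the database connection.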

toSequence() → lsst.daf.butler.DataCoordinateSequence

Transform this iterable into a DataCoordinateSequence.

Returns

seq (DataCoordinateSequence): A new DataCoordinateSequence with the same elements as self, in the same order. May be self if it is already a DataCoordinateSequence.

toSet() → lsst.daf.butler.DataCoordinateSet

Transform this iterable into a DataCoordinateSet.

Returns

set (DataCoordinateSet): A DataCoordinateSet instance with the same elements as self, after removing any duplicates. May be self if it is already a DataCoordinateSet.
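The difference between the two conversions, order-preserving versus duplicate-removing, can be shown with built-in containers. This is a loose analogy only: to_sequence and to_set are hypothetical helpers, and list and set stand in for DataCoordinateSequence and DataCoordinateSet.

```python
from typing import Hashable, Iterable, List, Set


def to_sequence(data_ids: Iterable[Hashable]) -> List[Hashable]:
    """Keep every element, in iteration order."""
    return list(data_ids)


def to_set(data_ids: Iterable[Hashable]) -> Set[Hashable]:
    """Drop duplicates; iteration order is no longer meaningful."""
    return set(data_ids)


data_ids = [("HypoCam", 2), ("HypoCam", 1), ("HypoCam", 2)]
as_sequence = to_sequence(data_ids)
as_set = to_set(data_ids)
```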
