DataCoordinateQueryResults
- class lsst.daf.butler.registry.queries.DataCoordinateQueryResults
Bases: QueryResultsBase, DataCoordinateIterable
An enhanced implementation of DataCoordinateIterable that represents data IDs retrieved from a database query.
Methods Summary
- expanded(): Return a results object for which hasRecords returns True.
- findDatasets(datasetType, collections, *[, ...]): Find datasets using the data IDs identified by this query.
- findRelatedDatasets(datasetType, collections, *): Find datasets using the data IDs identified by this query, and return them along with the original data IDs.
- materialize(): Insert this query's results into a temporary table.
- subset([dimensions, unique]): Return a results object containing a subset of the dimensions of this one, and/or a unique near-subset of its rows.
Methods Documentation
- abstract expanded() -> DataCoordinateQueryResults
Return a results object for which hasRecords returns True.
This method may involve actually executing database queries to fetch DimensionRecord objects.
- Returns:
  - results (DataCoordinateQueryResults): A results object for which hasRecords returns True. May be self if that is already the case.
Notes
For very large result sets, it may be much more efficient to call materialize before calling expanded, to avoid performing the original query multiple times (as a subquery) in the follow-up queries that fetch dimension records. For example:

    with registry.queryDataIds(...).materialize() as tempDataIds:
        dataIdsWithRecords = tempDataIds.expanded()
        for dataId in dataIdsWithRecords:
            ...
- abstract findDatasets(datasetType: DatasetType | str, collections: Any, *, findFirst: bool = True, components: bool = False) -> ParentDatasetQueryResults
Find datasets using the data IDs identified by this query.
- Parameters:
  - datasetType (DatasetType or str): Dataset type or the name of one to search for. Must have dimensions that are a subset of self.graph.
  - collections (Any): An expression that fully or partially identifies the collections to search for the dataset, such as a str, re.Pattern, or iterable thereof. The Ellipsis literal (...) can be used to return all collections. See Collection expressions for more information.
  - findFirst (bool, optional): If True (default), for each result data ID, only yield one DatasetRef, from the first collection in which a dataset of that dataset type appears (according to the order of collections passed in). If True, collections must not contain regular expressions and may not be the Ellipsis literal (...).
  - components (bool, optional): Must be False. Provided only for backwards compatibility. After v27 this argument will be removed entirely.
- Returns:
  - datasets (ParentDatasetQueryResults): A lazy-evaluation object representing dataset query results, iterable over DatasetRef objects. If self.hasRecords(), all nested data IDs in those dataset references will have records as well.
- Raises:
  - MissingDatasetTypeError: Raised if the given dataset type is not registered.
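The findFirst behavior described above can be modelled without a registry. The following is a minimal sketch in plain Python, not the butler implementation: find_datasets, contents, and the tuple-based data IDs are hypothetical stand-ins, and collections is assumed to be an ordered list of names.

```python
from collections.abc import Iterator, Mapping, Sequence

DataId = tuple  # hypothetical stand-in for a real data ID


def find_datasets(
    data_ids: Sequence[DataId],
    collections: Sequence[str],
    contents: Mapping[str, Mapping[DataId, str]],
    find_first: bool = True,
) -> Iterator[tuple[str, str]]:
    """Yield (collection, dataset) pairs for each data ID.

    With ``find_first=True`` only the match from the earliest collection in
    ``collections`` is yielded for each data ID.
    """
    for data_id in data_ids:
        for name in collections:
            dataset = contents.get(name, {}).get(data_id)
            if dataset is None:
                continue
            yield name, dataset
            if find_first:
                break  # Stop at the first collection that has a match.


# Hypothetical collection contents: data ID -> dataset label.
contents = {
    "run/b": {("HSC", 1): "b-1"},
    "run/a": {("HSC", 1): "a-1", ("HSC", 2): "a-2"},
}
data_ids = [("HSC", 1), ("HSC", 2)]
order = ["run/b", "run/a"]

first_only = list(find_datasets(data_ids, order, contents))
everything = list(find_datasets(data_ids, order, contents, find_first=False))
```

Here first_only holds one entry per data ID, taken from run/b when it has a match and from run/a otherwise, while everything also includes the shadowed a-1 dataset. The search order only matters when find_first is set, which is consistent with the restriction on unordered expressions above.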
- abstract findRelatedDatasets(datasetType: DatasetType | str, collections: Any, *, findFirst: bool = True, dimensions: DimensionGroup | Iterable[str] | None = None) -> Iterable[tuple[lsst.daf.butler.dimensions._coordinate.DataCoordinate, lsst.daf.butler._dataset_ref.DatasetRef]]
Find datasets using the data IDs identified by this query, and return them along with the original data IDs.
This is a variant of findDatasets that is often more useful when the target dataset type does not have all of the dimensions of the original data ID query, as is generally the case with calibration lookups.
- Parameters:
  - datasetType (DatasetType or str): Dataset type or the name of one to search for. Must have dimensions that are a subset of self.graph.
  - collections (Any): An expression that fully or partially identifies the collections to search for the dataset, such as a str, re.Pattern, or iterable thereof. The Ellipsis literal (...) can be used to return all collections. See Collection expressions for more information.
  - findFirst (bool, optional): If True (default), for each data ID in self, only yield one DatasetRef, from the first collection in which a dataset of that dataset type appears (according to the order of collections passed in). If True, collections must not contain regular expressions and may not be the Ellipsis literal (...). Note that this is not the same as yielding one DatasetRef for each yielded data ID if dimensions is not None.
  - dimensions (DimensionGroup or Iterable[str], optional): The dimensions of the data IDs returned. Must be a subset of self.dimensions.
- Returns:
- Raises:
  - MissingDatasetTypeError: Raised if the given dataset type is not registered.
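To illustrate why returning the original data ID alongside each dataset is useful, here is a minimal sketch in plain Python. It is an assumption-laden model, not the butler API: find_related, biases, and the dict-based data IDs are hypothetical, and real calibration lookups involve details (such as validity ranges) that this sketch ignores.

```python
from collections.abc import Iterator, Mapping, Sequence


def find_related(
    data_ids: Sequence[Mapping[str, object]],
    dataset_dimensions: Sequence[str],
    datasets: Mapping[tuple, str],
) -> Iterator[tuple[dict, str]]:
    """Yield (data ID, dataset) pairs.

    The dataset is looked up using only ``dataset_dimensions``, a subset of
    the keys in each data ID, so several data IDs may share one dataset.
    """
    for data_id in data_ids:
        key = tuple(data_id[d] for d in dataset_dimensions)
        dataset = datasets.get(key)
        if dataset is not None:
            yield dict(data_id), dataset


# Hypothetical data: one bias per detector, two exposures on that detector.
biases = {("HSC", 10): "bias-det10"}
exposures = [
    {"instrument": "HSC", "detector": 10, "exposure": 100},
    {"instrument": "HSC", "detector": 10, "exposure": 101},
]

pairs = list(find_related(exposures, ["instrument", "detector"], biases))
```

Both exposures map to the same bias, so the dataset alone would not tell the caller which exposure it was found for. Pairing each result with its originating data ID keeps that association.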
- abstract materialize() -> AbstractContextManager[DataCoordinateQueryResults]
Insert this query’s results into a temporary table.
- Returns:
  - context (typing.ContextManager[DataCoordinateQueryResults]): A context manager that ensures the temporary table is created and populated in __enter__ (returning a results object backed by that table), and dropped in __exit__. If self is already materialized, the context manager may do nothing (reflecting the fact that an outer context manager should already take care of everything else).
Notes
When using a very large result set to perform multiple queries (e.g. multiple calls to subset with different arguments, or even a single call to expanded), it may be much more efficient to start by materializing the query and only then performing the follow-up queries. It may also be less efficient, depending on how well the database engine’s query optimizer can simplify those particular follow-up queries and how efficiently it caches query results even when they are not explicitly inserted into a temporary table. See expanded and subset for examples.
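The create-in-__enter__, drop-in-__exit__ pattern described above can be sketched with the standard-library sqlite3 module. This is only an illustration of the pattern, not how the butler registry implements it: the materialize function here, the visit table, and the fixed temporary table name are all hypothetical.

```python
import sqlite3
from collections.abc import Iterator
from contextlib import contextmanager


@contextmanager
def materialize(conn: sqlite3.Connection, query: str) -> Iterator[str]:
    """Store the rows of ``query`` in a temporary table for the block's duration."""
    table = "tmp_data_ids"  # hypothetical fixed name; real code would generate one
    conn.execute(f"CREATE TEMP TABLE {table} AS {query}")
    try:
        yield table
    finally:
        # Drop the table even if the body of the with-block raises.
        conn.execute(f"DROP TABLE {table}")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visit (instrument TEXT, visit INTEGER)")
conn.executemany(
    "INSERT INTO visit VALUES (?, ?)", [("HSC", 1), ("HSC", 2), ("LATISS", 3)]
)

with materialize(conn, "SELECT * FROM visit WHERE instrument = 'HSC'") as table:
    # Both follow-up queries read the temporary table, not the original query.
    count = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchall()[0][0]
    visits = [row[0] for row in conn.execute(f"SELECT visit FROM {table} ORDER BY visit")]

# After the block exits, the temporary table no longer exists.
remaining = conn.execute(
    "SELECT COUNT(*) FROM sqlite_temp_master WHERE name = 'tmp_data_ids'"
).fetchall()[0][0]
```

The original query runs once, when the table is created. Whether this beats re-running the query as a subquery depends on the database engine, as the notes above point out.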
- abstract subset(dimensions: DimensionGroup | Iterable[str] | None = None, *, unique: bool = False) -> DataCoordinateQueryResults
Return a results object containing a subset of the dimensions of this one, and/or a unique near-subset of its rows.
This method may involve actually executing database queries to fetch DimensionRecord objects.
- Parameters:
  - dimensions (DimensionGroup or Iterable[str], optional): Dimensions to include in the new results object. If None, self.dimensions is used.
  - unique (bool, optional): If True (False is default), the query should only return unique data IDs. This is implemented in the database; to obtain unique results via Python-side processing (which may be more efficient in some cases), use toSet to construct a DataCoordinateSet from this results object instead.
- Returns:
  - results (DataCoordinateQueryResults): A results object corresponding to the given criteria. May be self if it already qualifies.
- Raises:
  - ValueError: Raised when dimensions is not a subset of the dimensions in this result.
Notes
This method can only return a “near-subset” of the original result rows in general because of subtleties in how spatial overlaps are implemented; see Query.projected for more information.
When calling subset multiple times on the same very large result set, it may be much more efficient to call materialize first. For example:

    dimensions1 = DimensionGroup(...)
    dimensions2 = DimensionGroup(...)
    with registry.queryDataIds(...).materialize() as tempDataIds:
        for dataId1 in tempDataIds.subset(dimensions1, unique=True):
            ...
        for dataId2 in tempDataIds.subset(dimensions2, unique=True):
            ...
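The row-level effect of the dimensions and unique arguments can be modelled in plain Python. This is a minimal sketch under stated assumptions: project_data_ids and the dict-based data IDs are hypothetical, deduplication happens in Python rather than in the database, and the "near-subset" subtleties around spatial overlaps are not modelled.

```python
from collections.abc import Iterable, Mapping


def project_data_ids(
    rows: Iterable[Mapping[str, object]],
    available: Iterable[str],
    dimensions: Iterable[str],
    unique: bool = False,
) -> list[dict[str, object]]:
    """Keep only ``dimensions`` from each row, optionally dropping duplicates."""
    dims = list(dimensions)
    known = set(available)
    missing = set(dims) - known
    if missing:
        # Mirrors the documented ValueError for non-subset dimensions.
        raise ValueError(f"{sorted(missing)} not a subset of {sorted(known)}")
    projected = [{d: row[d] for d in dims} for row in rows]
    if not unique:
        return projected
    seen = set()
    result = []
    for row in projected:
        key = tuple(row[d] for d in dims)
        if key not in seen:  # Keep the first occurrence of each data ID.
            seen.add(key)
            result.append(row)
    return result


# Hypothetical rows: one visit observed with two detectors.
available = ["instrument", "visit", "detector"]
rows = [
    {"instrument": "HSC", "visit": 1, "detector": 10},
    {"instrument": "HSC", "visit": 1, "detector": 11},
]

with_duplicates = project_data_ids(rows, available, ["instrument", "visit"])
deduplicated = project_data_ids(rows, available, ["instrument", "visit"], unique=True)
```

Dropping the detector dimension leaves two identical visit-level data IDs, which is why projecting to fewer dimensions commonly goes together with unique=True.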