DataCoordinateQueryResults¶
- class lsst.daf.butler.DataCoordinateQueryResults¶
Bases: Iterable[DataCoordinate]
An interface for objects that represent the results of queries for data IDs.
Attributes Summary
dimensions
    The dimensions of the data IDs returned by this query.
Methods Summary
any(*[, execute, exact])
    Test whether this query returns any results.
count(*[, exact, discard])
    Count the number of rows this query would return.
expanded()
    Return a results object for which has_records returns True.
explain_no_results([execute])
    Return human-readable messages that may help explain why the query yields no results.
find_datasets(dataset_type, collections, *)
    Find datasets using the data IDs identified by this query.
find_related_datasets(dataset_type, ...[, ...])
    Find datasets using the data IDs identified by this query, and return them along with the original data IDs.
has_full()
    Indicate if all data IDs in this iterable identify all dimensions, not just required dimensions.
has_records()
    Return whether all data IDs in this iterable contain records.
limit([limit, offset])
    Make the iterator return a limited number of records.
materialize()
    Insert this query's results into a temporary table.
order_by(*args)
    Make the iterator return ordered results.
subset([dimensions, unique])
    Return a results object containing a subset of the dimensions of this one, and/or a unique near-subset of its rows.
Attributes Documentation
- dimensions¶
The dimensions of the data IDs returned by this query.
Methods Documentation
- abstract any(*, execute: bool = True, exact: bool = True) → bool¶
Test whether this query returns any results.
- Parameters:
- execute : bool, optional
    If True, execute at least a LIMIT 1 query if it cannot be determined prior to execution that the query would return no rows.
- exact : bool, optional
    If True, run the full query and perform post-query filtering if needed, until at least one result row is found. If False, the returned result does not account for post-query filtering, and hence may be True even when all result rows would be filtered out.
- Returns:
- any : bool
    True if the query would (or might, depending on the arguments) yield result rows.
- abstract count(*, exact: bool = True, discard: bool = False) → int¶
Count the number of rows this query would return.
- Parameters:
- exact : bool, optional
    If True, run the full query and perform post-query filtering if needed to account for that filtering in the count. If False, the result may be an upper bound.
- discard : bool, optional
    If True, compute the exact count even if it would require running the full query and then throwing away the result rows after counting them. If False, this is an error, as the user would usually be better off executing the query first to fetch its rows into a new query (or passing exact=False). Ignored if exact=False.
- Returns:
- count : int
    The number of rows the query would return, or an upper bound if exact=False.
Notes
This counts the number of rows returned, not the number of unique rows returned, so even with exact=True it may provide only an upper bound on the number of deduplicated result rows.
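The interplay of exact and discard can be pictured with a toy stand-in class (an illustrative sketch only, not the real lsst.daf.butler implementation): a Python-side post-query filter makes the cheap count an upper bound, and computing the exact count requires fetching and discarding every row.

```python
# Illustrative sketch only: a toy results object mimicking the
# exact/discard semantics described above.
class ToyResults:
    def __init__(self, rows, post_filter=None):
        self._rows = rows
        self._post_filter = post_filter  # Python-side filter applied after the query

    def count(self, *, exact=True, discard=False):
        if not exact or self._post_filter is None:
            # SELECT COUNT(*)-style answer: ignores post-query filtering,
            # so it may overcount.
            return len(self._rows)
        if not discard:
            raise RuntimeError(
                "exact count would require fetching and discarding all rows; "
                "pass discard=True to allow this"
            )
        # Fetch every row, filter, and throw the rows away after counting.
        return sum(1 for r in self._rows if self._post_filter(r))

results = ToyResults(rows=[1, 2, 3, 4], post_filter=lambda r: r % 2 == 0)
print(results.count(exact=False))               # 4, an upper bound
print(results.count(exact=True, discard=True))  # 2, the exact count
```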
- abstract expanded() → DataCoordinateQueryResults¶
Return a results object for which has_records returns True.
This method may involve actually executing database queries to fetch DimensionRecord objects.
- Returns:
- results : DataCoordinateQueryResults
    A results object for which has_records returns True. May be self if that is already the case.
Notes
For very large result sets, it may be much more efficient to call materialize before calling expanded, to avoid performing the original query multiple times (as a subquery) in the follow-up queries that fetch dimension records. For example:

with butler.query() as query:
    with query.data_ids(...).materialize() as tempDataIds:
        dataIdsWithRecords = tempDataIds.expanded()
        for dataId in dataIdsWithRecords:
            ...
- abstract explain_no_results(execute: bool = True) Iterable[str] ¶
Return human-readable messages that may help explain why the query yields no results.
- abstract find_datasets(dataset_type: DatasetType | str, collections: Any, *, find_first: bool = True) → DatasetQueryResults¶
Find datasets using the data IDs identified by this query.
- Parameters:
- dataset_type : DatasetType or str
    Dataset type or the name of one to search for. Must have dimensions that are a subset of self.dimensions.
- collections : Any
    An expression that fully or partially identifies the collections to search for the dataset, such as a str, re.Pattern, or iterable thereof. ... can be used to return all collections. See Collection expressions for more information.
- find_first : bool, optional
    If True (default), for each result data ID, only yield one DatasetRef, from the first collection in which a dataset of that dataset type appears (according to the order of collections passed in). If True, collections must not contain regular expressions and may not be ....
- Returns:
- datasets : ParentDatasetQueryResults
    A lazy-evaluation object representing dataset query results, iterable over DatasetRef objects. If self.has_records(), all nested data IDs in those dataset references will have records as well.
- Raises:
- MissingDatasetTypeError
    Raised if the given dataset type is not registered.
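The find_first behaviour can be sketched with plain Python (an illustrative analogue only; the dict of per-collection lookups and the tuple data IDs below are hypothetical stand-ins, not the real API): the first collection in the search order that contains a dataset for a given data ID wins.

```python
# Illustrative sketch only: how find_first=True yields at most one
# dataset per data ID, honouring the order of the collections given.
collections = ["run2", "run1"]  # search order matters
datasets_by_collection = {
    "run1": {("visit", 42): "ref-a", ("visit", 43): "ref-b"},
    "run2": {("visit", 42): "ref-c"},
}

def find_datasets(data_ids, collections, *, find_first=True):
    for data_id in data_ids:
        for collection in collections:  # earlier collections take precedence
            ref = datasets_by_collection[collection].get(data_id)
            if ref is not None:
                yield (data_id, ref)
                if find_first:
                    break  # at most one dataset per data ID

pairs = list(find_datasets([("visit", 42), ("visit", 43)], collections))
print(pairs)  # [(('visit', 42), 'ref-c'), (('visit', 43), 'ref-b')]
```

With find_first=False, every collection containing a match contributes a pair, so a single data ID can yield several datasets.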
- abstract find_related_datasets(dataset_type: DatasetType | str, collections: Any, *, find_first: bool = True, dimensions: DimensionGroup | Iterable[str] | None = None) → Iterable[tuple[DataCoordinate, DatasetRef]]¶
Find datasets using the data IDs identified by this query, and return them along with the original data IDs.
This is a variant of find_datasets that is often more useful when the target dataset type does not have all of the dimensions of the original data ID query, as is generally the case with calibration lookups.
- Parameters:
- dataset_type : DatasetType or str
    Dataset type or the name of one to search for. Must have dimensions that are a subset of self.dimensions.
- collections : Any
    An expression that fully or partially identifies the collections to search for the dataset, such as a str, re.Pattern, or iterable thereof. ... can be used to return all collections. See Collection expressions for more information.
- find_first : bool, optional
    If True (default), for each data ID in self, only yield one DatasetRef, from the first collection in which a dataset of that dataset type appears (according to the order of collections passed in). If True, collections must not contain regular expressions and may not be .... Note that this is not the same as yielding one DatasetRef for each yielded data ID if dimensions is not None.
- dimensions : DimensionGroup or Iterable[str], optional
    The dimensions of the data IDs returned. Must be a subset of self.dimensions.
- Returns:
- pairs : Iterable[tuple[DataCoordinate, DatasetRef]]
    An iterable of (data ID, dataset reference) pairs.
- Raises:
- MissingDatasetTypeError
    Raised if the given dataset type is not registered.
- abstract has_full() bool ¶
Indicate if all data IDs in this iterable identify all dimensions, not just required dimensions.
- abstract limit(limit: int | None = None, offset: int = 0) → DataCoordinateQueryResults¶
Make the iterator return a limited number of records.
- Parameters:
- limit : int or None, optional
    The maximum number of records to return; None means no limit.
- offset : int, optional
    The number of records to skip before returning results.
- Returns:
- result : DataCoordinateQueryResults
    Returns the self instance, which is updated to return a limited set of records.
Notes
This method modifies the iterator in place and returns the same instance to support method chaining. Normally this method is used together with the order_by method.
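The limit/offset semantics amount to slicing over the (ordered) result rows; a minimal sketch over a plain Python list of toy data IDs (the helper name apply_limit is hypothetical, not part of the API):

```python
# Illustrative sketch only: skip `offset` rows, then keep at most
# `limit` rows, mirroring limit([limit, offset]) described above.
def apply_limit(rows, limit=None, offset=0):
    rows = rows[offset:]        # skip the first `offset` rows
    if limit is not None:
        rows = rows[:limit]     # then keep at most `limit` rows
    return rows

rows = ["id0", "id1", "id2", "id3", "id4"]
print(apply_limit(rows, limit=2, offset=1))  # ['id1', 'id2']
```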
- abstract materialize() → AbstractContextManager[DataCoordinateQueryResults]¶
Insert this query's results into a temporary table.
- Returns:
- context : typing.ContextManager[DataCoordinateQueryResults]
    A context manager that ensures the temporary table is created and populated in __enter__ (returning a results object backed by that table), and dropped in __exit__. If self is already materialized, the context manager may do nothing (reflecting the fact that an outer context manager should already take care of everything else).
Notes
When using a very large result set to perform multiple queries (e.g. multiple calls to subset with different arguments, or even a single call to expanded), it may be much more efficient to start by materializing the query and only then performing the follow-up queries. It may also be less efficient, depending on how well the database engine's query optimizer can simplify those particular follow-up queries and how efficiently it caches query results even when they are not explicitly inserted into a temporary table. See expanded and subset for examples.
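The create-in-__enter__, drop-in-__exit__ pattern can be mimicked with sqlite3 and contextlib (an illustrative sketch only; the materialize function, table name, and schema below are hypothetical, not lsst.daf.butler API):

```python
# Illustrative sketch: a materialize()-style context manager that
# populates a temporary table on entry and drops it on exit.
import sqlite3
from contextlib import contextmanager

@contextmanager
def materialize(conn, rows):
    conn.execute("CREATE TEMPORARY TABLE data_ids (visit INTEGER)")
    conn.executemany("INSERT INTO data_ids VALUES (?)", [(r,) for r in rows])
    try:
        # Follow-up queries can now hit the temp table instead of
        # re-running the original (possibly expensive) query.
        yield conn
    finally:
        conn.execute("DROP TABLE data_ids")

conn = sqlite3.connect(":memory:")
with materialize(conn, [1, 2, 3]) as c:
    print(c.execute("SELECT COUNT(*) FROM data_ids").fetchone()[0])  # 3
```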
- abstract order_by(*args: str) → DataCoordinateQueryResults¶
Make the iterator return ordered results.
- Parameters:
- *args : str
    Names of the columns/dimensions to use for ordering. A column name can be prefixed with a minus sign (-) to use descending ordering.
- Returns:
- result : DataCoordinateQueryResults
    Returns the self instance, which is updated to return ordered results.
Notes
This method modifies the iterator in place and returns the same instance to support method chaining.
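The minus-prefix convention can be sketched in plain Python (an illustrative analogue only; the dict rows and the order_by helper below are hypothetical stand-ins): stable sorts applied from the least-significant key backwards reproduce multi-column ordering.

```python
# Illustrative sketch only: "-column" requests descending order,
# matching the *args convention described above.
def order_by(rows, *args):
    for key in reversed(args):          # apply least-significant key first
        descending = key.startswith("-")
        name = key.lstrip("-")
        rows = sorted(rows, key=lambda r: r[name], reverse=descending)
    return rows

rows = [{"visit": 2, "detector": 1}, {"visit": 1, "detector": 2},
        {"visit": 2, "detector": 3}]
print(order_by(rows, "visit", "-detector"))
# [{'visit': 1, 'detector': 2}, {'visit': 2, 'detector': 3},
#  {'visit': 2, 'detector': 1}]
```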
- abstract subset(dimensions: DimensionGroup | Iterable[str] | None = None, *, unique: bool = False) → DataCoordinateQueryResults¶
Return a results object containing a subset of the dimensions of this one, and/or a unique near-subset of its rows.
This method may involve actually executing database queries to fetch DimensionRecord objects.
- Parameters:
- dimensions : DimensionGroup or Iterable[str], optional
    Dimensions to include in the new results object. If None, self.dimensions is used.
- unique : bool, optional
    If True (False is default), the query should only return unique data IDs. This is implemented in the database; to obtain unique results via Python-side processing (which may be more efficient in some cases), use toSet to construct a DataCoordinateSet from this results object instead.
- Returns:
- results : DataCoordinateQueryResults
    A results object corresponding to the given criteria. May be self if it already qualifies.
- Raises:
- ValueError
    Raised when dimensions is not a subset of the dimensions in this result.
Notes
This method can only return a "near-subset" of the original result rows in general because of subtleties in how spatial overlaps are implemented; see Query.projected for more information.
When calling subset multiple times on the same very large result set, it may be much more efficient to call materialize first. For example:

dimensions1 = DimensionGroup(...)
dimensions2 = DimensionGroup(...)
with butler.query(...) as query:
    with query.data_ids(...).materialize() as data_ids:
        for dataId1 in data_ids.subset(dimensions1, unique=True):
            ...
        for dataId2 in data_ids.subset(dimensions2, unique=True):
            ...