DataCoordinateQueryResults
class lsst.daf.butler.registry.queries.DataCoordinateQueryResults(db: lsst.daf.butler.registry.interfaces._database.Database, query: lsst.daf.butler.registry.queries._query.Query, *, records: Optional[Mapping[str, Mapping[tuple, lsst.daf.butler.core.dimensions._records.DimensionRecord]]] = None)
Bases: lsst.daf.butler.DataCoordinateIterable
An enhanced implementation of DataCoordinateIterable that represents data IDs retrieved from a database query.
Parameters:
- db : Database
  Database engine used to execute queries.
- query : Query
  Low-level representation of the query that backs this result object.
- records : Mapping, optional
  A nested mapping containing DimensionRecord objects for all dimensions and all data IDs this query will yield. If None (default), DataCoordinateIterable.hasRecords will return False. The outer mapping has str keys (the names of dimension elements). The inner mapping has tuple keys representing data IDs (tuple conversions of DataCoordinate.values()) and DimensionRecord values.
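The shape of the records argument can be pictured with plain Python containers. The sketch below is illustrative only: FakeRecord is a hypothetical stand-in for DimensionRecord, the dimension names and values are made up, and which values appear in each inner key is an assumption.

```python
from typing import Mapping, NamedTuple


class FakeRecord(NamedTuple):
    """Hypothetical stand-in for DimensionRecord (not the real class)."""

    name: str
    description: str


# Outer keys: names of dimension elements (str).
# Inner keys: tuples built from a data ID's values.
records: Mapping[str, Mapping[tuple, FakeRecord]] = {
    "instrument": {
        ("CamA",): FakeRecord("CamA", "made-up camera"),
    },
    "detector": {
        ("CamA", 1): FakeRecord("1", "made-up detector 1"),
        ("CamA", 2): FakeRecord("2", "made-up detector 2"),
    },
}

# Look up the record for one data ID of one dimension element.
detector_record = records["detector"][("CamA", 2)]
```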
Notes
Constructing an instance of this does nothing; the query is not executed until it is iterated over (or some other operation is performed that involves iteration).
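The deferred execution described above can be modeled with a small toy class. This is a sketch under stated assumptions, not the daf_butler implementation: LazyResults and fake_query are hypothetical names, and the "query" is just a Python function that records when it runs.

```python
from typing import Callable, Iterable, Iterator, List, Tuple


class LazyResults:
    """Toy results object; hypothetical, not part of daf_butler."""

    def __init__(self, run_query: Callable[[], Iterable[Tuple]]) -> None:
        # Only store the callable; nothing is executed here.
        self._run_query = run_query

    def __iter__(self) -> Iterator[Tuple]:
        # The query runs each time iteration starts.
        return iter(self._run_query())


executions: List[str] = []


def fake_query() -> List[Tuple]:
    """Pretend database query that records each execution."""
    executions.append("ran")
    return [("CamA", 1), ("CamA", 2)]


results = LazyResults(fake_query)
executed_before_iteration = len(executions)  # construction ran nothing
rows = list(results)  # iteration triggers the query
executed_after_iteration = len(executions)
```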
Instances should generally only be constructed by Registry methods or the methods of other query result objects.
Attributes Summary
graph: Dimensions identified by these data IDs (DimensionGraph).
universe: Universe that defines all known compatible dimensions.
Methods Summary
any(*, execute, exact): Test whether this query returns any results.
constrain(query, columns, …): Constrain a SQL query to include or relate to only known data IDs.
count(*, exact): Count the number of rows this query would return.
expanded(): Return a results object for which hasRecords returns True.
explain_no_results(): Return human-readable messages that may help explain why the query yields no results.
findDatasets(datasetType, collections, …): Find datasets using the data IDs identified by this query.
fromScalar(dataId): Return a DataCoordinateIterable containing the single data ID.
hasFull(): Indicate if all data IDs in this iterable identify all dimensions.
hasRecords(): Return whether all data IDs in this iterable contain records.
materialize(): Insert this query’s results into a temporary table.
subset(graph, *, unique): Return a results object containing a subset of the dimensions of this one, and/or a unique near-subset of its rows.
toSequence(): Transform this iterable into a DataCoordinateSequence.
toSet(): Transform this iterable into a DataCoordinateSet.
Attributes Documentation
graph
  Dimensions identified by these data IDs (DimensionGraph).
universe
  Universe that defines all known compatible dimensions (DimensionUniverse).
Methods Documentation
any(*, execute: bool = True, exact: bool = True) → bool
Test whether this query returns any results.
Parameters:
- execute : bool, optional
  If True, execute at least a LIMIT 1 query if it cannot be determined prior to execution that the query would return no rows.
- exact : bool, optional
  If True, run the full query and perform post-query filtering if needed, until at least one result row is found. If False, the returned result does not account for post-query filtering, and hence may be True even when all result rows would be filtered out.
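The effect of the exact flag can be illustrated with a toy model. This is only a sketch: toy_any, db_rows, and reject_all are hypothetical names, and the real method works against a database rather than a Python list.

```python
from typing import Callable, Iterable, Tuple

Row = Tuple[str, int]


def toy_any(
    rows: Iterable[Row],
    post_filter: Callable[[Row], bool],
    *,
    exact: bool = True,
) -> bool:
    """Hypothetical model of any(); not the daf_butler implementation."""
    if not exact:
        # Like a LIMIT 1 query: only checks that the database yields a row.
        return next(iter(rows), None) is not None
    # exact: keep fetching until some row survives post-query filtering.
    return any(post_filter(row) for row in rows)


def reject_all(row: Row) -> bool:
    """Post-query filter that discards every row."""
    return False


db_rows = [("CamA", 1), ("CamA", 2)]

inexact_answer = toy_any(db_rows, reject_all, exact=False)
exact_answer = toy_any(db_rows, reject_all, exact=True)
```

Here the inexact answer is True because the database returned rows, while the exact answer is False because every row is filtered out afterwards.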
constrain(query: lsst.daf.butler.core.simpleQuery.SimpleQuery, columns: Callable[[str], sqlalchemy.sql.elements.ColumnElement]) → None
Constrain a SQL query to include or relate to only known data IDs.
Parameters:
- query : SimpleQuery
  Struct that represents the SQL query to constrain, either by appending to its WHERE clause, joining a new table or subquery, or both.
- columns : Callable
  A callable that accepts str dimension names and returns SQLAlchemy objects representing a column for that dimension’s primary key value in the query.
count(*, exact: bool = True) → int
Count the number of rows this query would return.
Returns:
- count : int
  The number of rows the query would return, or an upper bound if exact=False.
Notes
This counts the number of rows returned, not the number of unique rows returned, so even with exact=True it may provide only an upper bound on the number of deduplicated result rows.
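The distinction between the row count and the number of unique rows is easy to see with made-up data. The rows below are hypothetical and stand in for whatever a query might return.

```python
# Made-up result rows; the first two are duplicates of each other.
rows = [("CamA", 1), ("CamA", 1), ("CamA", 2)]

# What a row count reports: every returned row, duplicates included.
row_count = len(rows)

# The number of deduplicated rows can be smaller.
unique_count = len(set(rows))
```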
expanded() → lsst.daf.butler.registry.queries._results.DataCoordinateQueryResults
Return a results object for which hasRecords returns True.
This method may involve actually executing database queries to fetch DimensionRecord objects.
Returns:
- results : DataCoordinateQueryResults
  A results object for which hasRecords returns True. May be self if that is already the case.
Notes
For very large result sets, it may be much more efficient to call materialize before calling expanded, to avoid performing the original query multiple times (as a subquery) in the follow-up queries that fetch dimension records. For example:
    with registry.queryDataIds(...).materialize() as tempDataIds:
        dataIdsWithRecords = tempDataIds.expanded()
        for dataId in dataIdsWithRecords:
            ...
explain_no_results() → Iterator[str]
Return human-readable messages that may help explain why the query yields no results.
Returns:
- messages : Iterator[str]
  String messages that describe reasons the query might not yield any results.
Notes
Messages related to post-query filtering are only available if the iterator has been exhausted, or if any or count was already called (with exact=True for the latter two).
At present, this method only returns messages that are generated while the query is being built or filtered. In the future, it may perform its own new follow-up queries, which users may wish to short-circuit simply by not continuing to iterate over its results.
findDatasets(datasetType: Union[lsst.daf.butler.core.datasets.type.DatasetType, str], collections: Any, *, findFirst: bool = True) → lsst.daf.butler.registry.queries._results.ParentDatasetQueryResults
Find datasets using the data IDs identified by this query.
Parameters:
- datasetType : DatasetType or str
  Dataset type or the name of one to search for. Must have dimensions that are a subset of self.graph.
- collections : Any
  An expression that fully or partially identifies the collections to search for the dataset, such as a str, re.Pattern, or iterable thereof. The literal ... can be used to return all collections. See Collection expressions for more information.
- findFirst : bool, optional
  If True (default), for each result data ID, only yield one DatasetRef, from the first collection in which a dataset of that dataset type appears (according to the order of collections passed in). If True, collections must not contain regular expressions and may not be the literal ... .
Returns:
- datasets : ParentDatasetQueryResults
  A lazy-evaluation object representing dataset query results, iterable over DatasetRef objects. If self.hasRecords(), all nested data IDs in those dataset references will have records as well.
Raises:
- ValueError
  Raised if datasetType.dimensions.issubset(self.graph) is False.
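The findFirst rule can be sketched with ordinary dictionaries. Everything here is hypothetical: toy_find_datasets, the collection names, and the data IDs are made up, and the real method returns a lazy results object rather than a list.

```python
from typing import Dict, List, Sequence, Set, Tuple

DataId = Tuple[str, int]


def toy_find_datasets(
    data_ids: Sequence[DataId],
    collections: Sequence[str],
    contents: Dict[str, Set[DataId]],
    *,
    find_first: bool = True,
) -> List[Tuple[str, DataId]]:
    """Hypothetical model of the findFirst rule (not the butler API)."""
    found: List[Tuple[str, DataId]] = []
    for data_id in data_ids:
        # Collections are searched in the order they were passed in.
        for collection in collections:
            if data_id in contents.get(collection, set()):
                found.append((collection, data_id))
                if find_first:
                    break  # keep only the first match per data ID
    return found


# Made-up collections: which data IDs have a dataset in each one.
contents = {
    "run2": {("CamA", 1)},
    "run1": {("CamA", 1), ("CamA", 2)},
}
data_ids = [("CamA", 1), ("CamA", 2)]
search_order = ["run2", "run1"]

first_only = toy_find_datasets(data_ids, search_order, contents)
every_match = toy_find_datasets(
    data_ids, search_order, contents, find_first=False
)
```

With find_first=True each data ID yields at most one match, taken from the earliest collection in the search order that contains it.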
static fromScalar(dataId: lsst.daf.butler.core.dimensions._coordinate.DataCoordinate) → lsst.daf.butler.core.dimensions._dataCoordinateIterable._ScalarDataCoordinateIterable
Return a DataCoordinateIterable containing the single data ID.
Parameters:
- dataId : DataCoordinate
  Data ID to adapt. Must be a true DataCoordinate instance, not an arbitrary mapping. No runtime checking is performed.
Returns:
- iterable : DataCoordinateIterable
  A DataCoordinateIterable instance of unspecified (i.e. implementation-detail) subclass. Guaranteed to implement the collections.abc.Sized (i.e. __len__) and collections.abc.Container (i.e. __contains__) interfaces as well as that of DataCoordinateIterable.
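What it means for a single-element iterable to satisfy the Sized and Container interfaces can be shown with a toy class. ScalarIterable is a hypothetical stand-in, not the private class that fromScalar actually returns, and a plain tuple is used in place of a DataCoordinate.

```python
from collections.abc import Container, Sized
from typing import Iterator, Tuple

DataId = Tuple[str, int]


class ScalarIterable:
    """Hypothetical single-element iterable (not the butler class)."""

    def __init__(self, data_id: DataId) -> None:
        self._data_id = data_id

    def __iter__(self) -> Iterator[DataId]:
        yield self._data_id

    def __len__(self) -> int:
        # Sized interface: there is always exactly one element.
        return 1

    def __contains__(self, key: object) -> bool:
        # Container interface: membership is a single comparison.
        return key == self._data_id


single = ScalarIterable(("CamA", 1))
is_sized = isinstance(single, Sized)
is_container = isinstance(single, Container)
```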
hasFull() → bool
Indicate if all data IDs in this iterable identify all dimensions, not just required dimensions.
hasRecords() → bool
Return whether all data IDs in this iterable contain records.
materialize() → Iterator[lsst.daf.butler.registry.queries._results.DataCoordinateQueryResults]
Insert this query’s results into a temporary table.
Returns:
- context : typing.ContextManager[DataCoordinateQueryResults]
  A context manager that ensures the temporary table is created and populated in __enter__ (returning a results object backed by that table), and dropped in __exit__. If self is already materialized, the context manager may do nothing (reflecting the fact that an outer context manager should already take care of everything else).
Notes
When using a very large result set to perform multiple queries (e.g. multiple calls to subset with different arguments, or even a single call to expanded), it may be much more efficient to start by materializing the query and only then performing the follow-up queries. It may also be less efficient, depending on how well the database engine’s query optimizer can simplify those particular follow-up queries and how efficiently it caches query results even when they are not explicitly inserted into a temporary table. See expanded and subset for examples.
subset(graph: Optional[lsst.daf.butler.core.dimensions._graph.DimensionGraph] = None, *, unique: bool = False) → lsst.daf.butler.registry.queries._results.DataCoordinateQueryResults
Return a results object containing a subset of the dimensions of this one, and/or a unique near-subset of its rows.
This method may involve actually executing database queries to fetch DimensionRecord objects.
Parameters:
- graph : DimensionGraph, optional
  Dimensions to include in the new results object. If None, self.graph is used.
- unique : bool, optional
  If True (False is default), the query should only return unique data IDs. This is implemented in the database; to obtain unique results via Python-side processing (which may be more efficient in some cases), use toSet to construct a DataCoordinateSet from this results object instead.
Returns:
- results : DataCoordinateQueryResults
  A results object corresponding to the given criteria. May be self if it already qualifies.
Notes
This method can only return a “near-subset” of the original result rows in general because of subtleties in how spatial overlaps are implemented; see Query.subset for more information.
When calling subset multiple times on the same very large result set, it may be much more efficient to call materialize first. For example:
    dimensions1 = DimensionGraph(...)
    dimensions2 = DimensionGraph(...)
    with registry.queryDataIds(...).materialize() as tempDataIds:
        for dataId1 in tempDataIds.subset(
                graph=dimensions1,
                unique=True):
            ...
        for dataId2 in tempDataIds.subset(
                graph=dimensions2,
                unique=True):
            ...
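Projecting rows onto fewer dimensions and then deduplicating them can be modeled in a few lines. This is a toy sketch: toy_subset and the rows are hypothetical, dictionaries stand in for data IDs, and it ignores the spatial-overlap subtleties that make the real result only a near-subset.

```python
from typing import Dict, List, Sequence, Tuple, Union

Value = Union[str, int]

# Made-up result rows with three dimensions each.
rows: List[Dict[str, Value]] = [
    {"instrument": "CamA", "detector": 1, "visit": 10},
    {"instrument": "CamA", "detector": 1, "visit": 11},
    {"instrument": "CamA", "detector": 2, "visit": 10},
]


def toy_subset(
    rows: Sequence[Dict[str, Value]],
    dimensions: Sequence[str],
    *,
    unique: bool = False,
) -> List[Tuple[Value, ...]]:
    """Hypothetical model of subset(); not the daf_butler implementation."""
    # Keep only the requested dimensions from every row.
    projected = [tuple(row[d] for d in dimensions) for row in rows]
    if unique:
        # dict.fromkeys drops duplicates while keeping first-seen order.
        projected = list(dict.fromkeys(projected))
    return projected


with_duplicates = toy_subset(rows, ("instrument", "detector"))
deduplicated = toy_subset(rows, ("instrument", "detector"), unique=True)

# Python-side alternative: project first, then build a set.
python_side = set(with_duplicates)
```

Dropping the visit dimension makes the first two rows identical, which is why deduplication matters after a projection.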
toSequence() → lsst.daf.butler.core.dimensions._dataCoordinateIterable.DataCoordinateSequence
Transform this iterable into a DataCoordinateSequence.
Returns:
- seq : DataCoordinateSequence
  A new DataCoordinateSequence with the same elements as self, in the same order. May be self if it is already a DataCoordinateSequence.
toSet() → lsst.daf.butler.core.dimensions._dataCoordinateIterable.DataCoordinateSet
Transform this iterable into a DataCoordinateSet.
Returns:
- set : DataCoordinateSet
  A DataCoordinateSet instance with the same elements as self, after removing any duplicates. May be self if it is already a DataCoordinateSet.