Query

class lsst.daf.butler.registry.queries.Query(*, graph: lsst.daf.butler.DimensionGraph, whereRegion: Optional[lsst.sphgeom._sphgeom.Region], managers: lsst.daf.butler.registry.queries.RegistryManagers)

Bases: abc.ABC
An abstract base class for queries that return some combination of DatasetRef and DataCoordinate objects.

Parameters
- graph : DimensionGraph
  Object describing the dimensions included in the query.
- whereRegion : lsst.sphgeom.Region, optional
  Region that all region columns in all returned rows must overlap.
- managers : RegistryManagers
  A struct containing the registry manager instances used by the query system.
Notes
The Query hierarchy abstracts over the database/SQL representation of a particular set of data IDs or datasets. It is expected to be used as a backend for other objects that provide more natural interfaces for one or both of these, not as part of a public interface to query results.

Attributes Summary
- datasetType: The DatasetType of datasets returned by this query, or None if there are no dataset results (DatasetType or None).
- spatial: An iterator over the dimension element columns used in post-query filtering of spatial overlaps (Iterator[DimensionElement]).
- sql: A SQLAlchemy object representing the full query (sqlalchemy.sql.FromClause or None).

Methods Summary
- extractDataId(row, *[, graph, records]): Extract a data ID from a result row.
- extractDatasetRef(row[, dataId, records]): Extract a DatasetRef from a result row.
- extractDimensionsTuple(row, dimensions): Extract a tuple of data ID values from a result row.
- getDatasetColumns(): Return the columns for the datasets returned by this query.
- getDimensionColumn(name): Return the query column that contains the primary key value for the dimension with the given name.
- getRegionColumn(name): Return a region column for one of the dimension elements iterated over by spatial.
- isUnique(): Return True if this query's rows are guaranteed to be unique, and False otherwise.
- makeBuilder([summary]): Return a QueryBuilder that can be used to construct a new Query that is joined to (and hence constrained by) this one.
- materialize(db): Execute this query and insert its results into a temporary table.
- predicate([region]): Return a callable that can perform extra Python-side filtering of query results.
- rows(db, *[, region]): Execute the query and yield result rows, applying predicate.
- subset(*[, graph, datasets, unique]): Return a new Query whose columns and/or rows are (mostly) a subset of this one's.

Attributes Documentation
datasetType
  The DatasetType of datasets returned by this query, or None if there are no dataset results (DatasetType or None).

spatial
  An iterator over the dimension element columns used in post-query filtering of spatial overlaps (Iterator[DimensionElement]).

  Notes
  This property is intended primarily as a hook for subclasses to implement and for the ABC to call in order to provide higher-level functionality; code that uses Query objects (but does not implement one) should usually not have to access this property.

sql
  A SQLAlchemy object representing the full query (sqlalchemy.sql.FromClause or None). This is None in the special case where the query has no columns and only one logical row.
Methods Documentation
extractDataId(row: Optional[sqlalchemy.engine.result.RowProxy], *, graph: Optional[lsst.daf.butler.DimensionGraph] = None, records: Optional[Mapping[str, Mapping[tuple, lsst.daf.butler.DimensionRecord]]] = None) → lsst.daf.butler.DataCoordinate
  Extract a data ID from a result row.

  Parameters
  - row : sqlalchemy.engine.RowProxy or None
    A result row from a SQLAlchemy SELECT query, or None to indicate the row from an EmptyQuery.
  - graph : DimensionGraph, optional
    The dimensions the returned data ID should identify. If not provided, this will be all dimensions in QuerySummary.requested.
  - records : Mapping[str, Mapping[tuple, DimensionRecord]], optional
    Nested mapping containing records to attach to the returned DataCoordinate, for which hasRecords will return True. If provided, outer keys must include all dimension element names in graph, and inner keys should be tuples of dimension primary key values in the same order as element.graph.required. If not provided, DataCoordinate.hasRecords will return False on the returned object.

  Returns
  - dataId : DataCoordinate
    A data ID that identifies all required and implied dimensions. If records is not None, hasRecords() returns True on it.
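The nested records mapping is easier to picture with concrete values. The sketch below uses plain dicts in place of result rows and DimensionRecord objects, and a hypothetical REQUIRED table in place of element.graph.required; extract_data_id is an illustrative helper, not part of daf_butler, and only mirrors the key convention described above (outer key: element name, inner key: tuple of primary-key values in required order).

```python
from typing import Dict, Mapping, Optional, Tuple

# Hypothetical stand-in for element.graph.required: for each dimension
# element, the names of its required dimensions, in order.
REQUIRED: Dict[str, Tuple[str, ...]] = {
    "instrument": ("instrument",),
    "visit": ("instrument", "visit"),
}


def extract_data_id(
    row: Optional[Mapping[str, object]],
    records: Optional[Mapping[str, Mapping[tuple, object]]] = None,
) -> Tuple[Dict[str, object], Optional[Dict[str, object]]]:
    """Return (data ID values, attached records or None); illustrative only."""
    if row is None:
        # A None row stands for the single logical row of an empty query.
        return {}, ({} if records is not None else None)
    data_id = {name: row[name] for name in REQUIRED}
    if records is None:
        # Analogous to hasRecords() returning False.
        return data_id, None
    attached = {}
    for element, required in REQUIRED.items():
        # Inner keys are tuples of primary-key values in required order.
        key = tuple(data_id[name] for name in required)
        attached[element] = records[element][key]
    return data_id, attached


row = {"instrument": "HSC", "visit": 903334}
records = {
    "instrument": {("HSC",): "instrument record"},
    "visit": {("HSC", 903334): "visit record"},
}
data_id, attached = extract_data_id(row, records)
print(data_id)            # {'instrument': 'HSC', 'visit': 903334}
print(attached["visit"])  # visit record
```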

extractDatasetRef(row: sqlalchemy.engine.result.RowProxy, dataId: Optional[lsst.daf.butler.DataCoordinate] = None, records: Optional[Mapping[str, Mapping[tuple, lsst.daf.butler.DimensionRecord]]] = None) → lsst.daf.butler.DatasetRef
  Extract a DatasetRef from a result row.

  Parameters
  - row : sqlalchemy.engine.RowProxy
    A result row from a SQLAlchemy SELECT query.
  - dataId : DataCoordinate
    Data ID to attach to the DatasetRef. If None, a minimal (i.e. base class) DataCoordinate is constructed from row.
  - records : Mapping[str, Mapping[tuple, DimensionRecord]]
    Records to use to return an ExpandedDataCoordinate. If provided, outer keys must include all dimension element names in graph, and inner keys should be tuples of dimension primary key values in the same order as element.graph.required.

  Returns
  - ref : DatasetRef
    Reference to the dataset; guaranteed to have DatasetRef.id not None.

extractDimensionsTuple(row: Optional[sqlalchemy.engine.result.RowProxy], dimensions: Iterable[lsst.daf.butler.Dimension]) → tuple
  Extract a tuple of data ID values from a result row.
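As a rough illustration of how a concrete base-class method such as extractDimensionsTuple can be written purely in terms of an abstract hook like getDimensionColumn, here is a toy hierarchy. ToyQuery and SqliteToyQuery are hypothetical; dimensions are passed as plain names rather than Dimension objects, stdlib sqlite3 rows stand in for SQLAlchemy RowProxy objects, and returning an empty tuple for a None row is an assumption.

```python
import abc
import sqlite3
from typing import Iterable, Optional, Tuple


class ToyQuery(abc.ABC):
    """Hypothetical miniature of the Query hook pattern (not the real class)."""

    @abc.abstractmethod
    def getDimensionColumn(self, name: str) -> str:
        """Return the result-column label holding dimension ``name``."""
        raise NotImplementedError()

    def extractDimensionsTuple(
        self, row: Optional[sqlite3.Row], dimensions: Iterable[str]
    ) -> Tuple:
        # Concrete method written only in terms of the abstract hook.
        if row is None:
            return ()  # assumption: an empty query yields no values
        return tuple(row[self.getDimensionColumn(name)] for name in dimensions)


class SqliteToyQuery(ToyQuery):
    """Subclass whose result columns are labelled ``<dimension>_id``."""

    def getDimensionColumn(self, name: str) -> str:
        return f"{name}_id"


conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # allows row["column_label"] lookups
row = conn.execute("SELECT 'HSC' AS instrument_id, 42 AS visit_id").fetchone()
print(SqliteToyQuery().extractDimensionsTuple(row, ["instrument", "visit"]))
# ('HSC', 42)
```

Keeping the SQL-specific knowledge in the hook lets the base class stay agnostic about how columns are named or built.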

abstract getDatasetColumns() → Optional[lsst.daf.butler.registry.queries._structs.DatasetQueryColumns]
  Return the columns for the datasets returned by this query.

  Returns
  - columns : DatasetQueryColumns or None
    Struct containing SQLAlchemy representations of the result columns for a dataset.

  Notes
  This method is intended primarily as a hook for subclasses to implement and for the ABC to call in order to provide higher-level functionality; code that uses Query objects (but does not implement one) should usually not have to call this method.

abstract getDimensionColumn(name: str) → sqlalchemy.sql.elements.ColumnElement
  Return the query column that contains the primary key value for the dimension with the given name.

  Parameters
  - name : str
    Name of the dimension.

  Returns
  - column : sqlalchemy.sql.ColumnElement
    SQLAlchemy object representing a column in the query.

  Notes
  This method is intended primarily as a hook for subclasses to implement and for the ABC to call in order to provide higher-level functionality; code that uses Query objects (but does not implement one) should usually not have to call this method.

abstract getRegionColumn(name: str) → sqlalchemy.sql.elements.ColumnElement
  Return a region column for one of the dimension elements iterated over by spatial.

  Parameters
  - name : str
    Name of the element.

  Returns
  - column : sqlalchemy.sql.ColumnElement
    SQLAlchemy object representing a result column in the query.

  Notes
  This method is intended primarily as a hook for subclasses to implement and for the ABC to call in order to provide higher-level functionality; code that uses Query objects (but does not implement one) should usually not have to call this method.

abstract isUnique() → bool
  Return True if this query's rows are guaranteed to be unique, and False otherwise.

  If this query has dataset results (datasetType is not None), uniqueness applies to the DatasetRef instances returned by extractDatasetRef from the result of rows. If it does not have dataset results, uniqueness applies to the DataCoordinate instances returned by extractDataId.

abstract makeBuilder(summary: Optional[QuerySummary] = None) → QueryBuilder
  Return a QueryBuilder that can be used to construct a new Query that is joined to (and hence constrained by) this one.

  Parameters
  - summary : QuerySummary, optional
    A QuerySummary instance that specifies the dimensions and any additional constraints to include in the new query being constructed, or None to use the dimensions of self with no additional constraints.

materialize(db: lsst.daf.butler.registry.interfaces.Database) → Iterator[lsst.daf.butler.registry.queries.Query]
  Execute this query and insert its results into a temporary table.

  Parameters
  - db : Database
    Database engine to execute the query against.

  Returns
  - context : typing.ContextManager[MaterializedQuery]
    A context manager that ensures the temporary table is created and populated in __enter__ (returning a MaterializedQuery object backed by that table), and dropped in __exit__. If self is already a MaterializedQuery, __enter__ may just return self and __exit__ may do nothing (reflecting the fact that an outer context manager should already take care of everything else).
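The create/populate/drop lifecycle described above maps naturally onto contextlib.contextmanager. This sketch assumes stdlib sqlite3 in place of the Database interface and yields the temporary table's name rather than a MaterializedQuery; the materialize function and the tmp_query table name are hypothetical.

```python
import sqlite3
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def materialize(
    conn: sqlite3.Connection, select_sql: str, name: str = "tmp_query"
) -> Iterator[str]:
    """Run select_sql into a temporary table, yield its name, drop it on exit."""
    # __enter__: create and populate the temporary table in one statement.
    conn.execute(f"CREATE TEMP TABLE {name} AS {select_sql}")
    try:
        yield name
    finally:
        # __exit__: drop the table even if the with-block raised.
        conn.execute(f"DROP TABLE {name}")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visit (instrument TEXT, visit INTEGER)")
conn.executemany("INSERT INTO visit VALUES (?, ?)", [("HSC", 1), ("HSC", 2)])
with materialize(conn, "SELECT instrument, visit FROM visit WHERE visit > 1") as table:
    result = conn.execute(f"SELECT * FROM {table}").fetchall()
print(result)  # [('HSC', 2)]
```

Using try/finally inside the generator is what guarantees the drop step, mirroring the __exit__ behaviour the docstring requires.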

predicate(region: Optional[lsst.sphgeom._sphgeom.Region] = None) → Callable[[sqlalchemy.engine.result.RowProxy], bool]
  Return a callable that can perform extra Python-side filtering of query results.

  To get the expected results from a query, the returned predicate must be used to ignore rows for which it returns False; this permits the QueryBuilder implementation to move logic from the database to Python without changing the public interface.

  Parameters
  - region : sphgeom.Region, optional
    A region that any result-row regions must overlap in order for the predicate to return True. If not provided, this will be self.whereRegion, if that exists.

  Returns
  - func : Callable
    A callable that takes a single sqlalchemy.engine.RowProxy argument and returns bool.
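A predicate of this kind can be built as a closure over the region columns and the optional region. In this sketch, 1-D intervals are a hypothetical stand-in for sphgeom.Region and rows are plain mappings; the pairwise check between region columns reflects the spatial-join filtering described in the subset notes. make_predicate is illustrative, not the daf_butler implementation.

```python
from itertools import combinations
from typing import Callable, Mapping, Optional, Sequence, Tuple

Interval = Tuple[float, float]  # hypothetical 1-D stand-in for sphgeom.Region


def overlaps(a: Interval, b: Interval) -> bool:
    """Closed intervals overlap when neither lies wholly beyond the other."""
    return a[0] <= b[1] and b[0] <= a[1]


def make_predicate(
    region_columns: Sequence[str], where_region: Optional[Interval] = None
) -> Callable[[Mapping[str, Interval]], bool]:
    """Build a Python-side row filter in the spirit of Query.predicate."""

    def predicate(row: Mapping[str, Interval]) -> bool:
        regions = [row[column] for column in region_columns]
        # Regions joined spatially in the database must really overlap ...
        if not all(overlaps(a, b) for a, b in combinations(regions, 2)):
            return False
        # ... and each must overlap the requested region, if one was given.
        return where_region is None or all(overlaps(r, where_region) for r in regions)

    return predicate


pred = make_predicate(["visit_region", "tract_region"], where_region=(0.0, 5.0))
print(pred({"visit_region": (1.0, 2.0), "tract_region": (1.5, 3.0)}))  # True
print(pred({"visit_region": (1.0, 2.0), "tract_region": (2.5, 3.0)}))  # False
```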

rows(db: lsst.daf.butler.registry.interfaces.Database, *, region: Optional[lsst.sphgeom._sphgeom.Region] = None) → Iterator[Optional[sqlalchemy.engine.result.RowProxy]]
  Execute the query and yield result rows, applying predicate.

  Parameters
  - region : sphgeom.Region, optional
    A region that any result-row regions must overlap in order to be yielded. If not provided, this will be self.whereRegion, if that exists.

  Yields
  - sqlalchemy.engine.RowProxy or None
    Result rows for which predicate returns True.
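Applying the predicate while iterating is a small generator. This sketch assumes stdlib sqlite3 in place of the Database interface and a plain SQL string in place of the sql attribute; treating sql=None as a single logical row represented by None follows the sql attribute documentation, and rows here is a hypothetical free function rather than the real method.

```python
import sqlite3
from typing import Callable, Iterator, Optional


def rows(
    conn: sqlite3.Connection,
    sql: Optional[str],
    predicate: Callable[[sqlite3.Row], bool],
) -> Iterator[Optional[sqlite3.Row]]:
    """Yield result rows that pass predicate, in the spirit of Query.rows."""
    if sql is None:
        # A query with no columns has exactly one logical row, shown as None.
        yield None
        return
    cursor = conn.cursor()
    cursor.row_factory = sqlite3.Row  # name-based access, this cursor only
    for row in cursor.execute(sql):
        # Rows rejected by the Python-side predicate are silently skipped.
        if predicate(row):
            yield row


def overlaps_target(row: sqlite3.Row) -> bool:
    """Hypothetical exact-overlap test against the interval (0.5, 2.0)."""
    return row["lo"] <= 2.0 and 0.5 <= row["hi"]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visit (visit INTEGER, lo REAL, hi REAL)")
conn.executemany("INSERT INTO visit VALUES (?, ?, ?)", [(1, 0.0, 1.0), (2, 4.0, 6.0)])
print([r["visit"] for r in rows(conn, "SELECT * FROM visit", overlaps_target)])  # [1]
print(list(rows(conn, None, overlaps_target)))  # [None]
```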

abstract subset(*, graph: Optional[lsst.daf.butler.DimensionGraph] = None, datasets: bool = True, unique: bool = False) → lsst.daf.butler.registry.queries.Query
  Return a new Query whose columns and/or rows are (mostly) a subset of this one's.

  Parameters
  - graph : DimensionGraph, optional
    Dimensions to include in the new Query being constructed. If None (default), self.graph is used.
  - datasets : bool, optional
    Whether the new Query should include dataset results. Defaults to True, but is ignored if self does not include dataset results.
  - unique : bool, optional
    Whether the new Query should guarantee unique results (this may come with a performance penalty).

  Returns
  - query : Query
    A query object corresponding to the given inputs. May be self if no changes were requested.
  Notes
  The way spatial overlaps are handled at present makes it impossible to fully guarantee in general that the new query's rows are a subset of this one's while also returning unique rows. That's because the database is only capable of performing approximate, conservative overlaps via the common skypix system; we defer actual region overlap operations to per-result-row Python logic. But including the region columns necessary to do that postprocessing in the query makes it impossible to do a SELECT DISTINCT on the user-visible dimensions of the query.

  For example, consider starting with a query with dimensions (instrument, skymap, visit, tract). That involves a spatial join between visit and tract, and we include the region columns from both tables in the results so that we only yield result rows (see predicate and rows) where the regions in those two columns overlap. If the user then wants to subset to just (skymap, tract) with unique results, we have two unpalatable options:
  - we can do a SELECT DISTINCT with just the skymap and tract columns in the SELECT clause, dropping all detailed overlap information and including some tracts that did not actually overlap any of the visits in the original query (but were regarded as possibly overlapping via the coarser, common-skypix relationships);
  - we can include the tract and visit region columns in the query, and continue to filter out the non-overlapping pairs, but completely disregard the user's request for unique tracts.

  This interface specifies that implementations must do the former, as that's what makes things efficient in our most important use case (QuantumGraph generation in pipe_base). We may be able to improve this situation in the future by putting exact overlap information in the database, either by using built-in (but engine-specific) spatial database functionality or (more likely) switching to a scheme in which pairwise dimension spatial relationships are explicitly precomputed (for example, for combinations of instruments and skymaps).
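The trade-off between the two options can be seen in a tiny worked example. It is a sketch under strong simplifying assumptions: regions are 1-D intervals, the common skypix system is modelled as hypothetical unit-width pixels, instrument and skymap are dropped, and stdlib sqlite3 stands in for the registry database. Tract 11 shares a pixel with both visits but overlaps neither of them exactly.

```python
import math
import sqlite3

# Hypothetical 1-D stand-ins: regions are intervals, and the "common skypix"
# system is the set of unit-width pixels floor(x) that a region touches.
VISITS = {1: (0.2, 0.4), 2: (0.5, 0.9)}
TRACTS = {10: (0.1, 0.6), 11: (0.95, 1.5)}


def pixels(lo: float, hi: float) -> range:
    return range(math.floor(lo), math.floor(hi) + 1)


def overlaps(a: tuple, b: tuple) -> bool:
    return a[0] <= b[1] and b[0] <= a[1]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visit_pix (visit INTEGER, pix INTEGER)")
conn.execute("CREATE TABLE tract_pix (tract INTEGER, pix INTEGER)")
for visit, (lo, hi) in VISITS.items():
    conn.executemany("INSERT INTO visit_pix VALUES (?, ?)", [(visit, p) for p in pixels(lo, hi)])
for tract, (lo, hi) in TRACTS.items():
    conn.executemany("INSERT INTO tract_pix VALUES (?, ?)", [(tract, p) for p in pixels(lo, hi)])

# Coarse, conservative spatial join: visit and tract share at least one pixel.
JOIN = "FROM visit_pix v JOIN tract_pix t ON v.pix = t.pix"

# Option 1: SELECT DISTINCT on tract alone. Unique, but over-inclusive.
option1 = sorted(r[0] for r in conn.execute(f"SELECT DISTINCT t.tract {JOIN}"))

# Option 2: keep visit and tract, filter on exact overlap in Python.
# Exact, but a tract repeats once per overlapping visit.
pairs = conn.execute(f"SELECT DISTINCT v.visit, t.tract {JOIN}").fetchall()
option2 = sorted(t for v, t in pairs if overlaps(VISITS[v], TRACTS[t]))

print(option1)  # [10, 11]
print(option2)  # [10, 10]
```

Option 1 is unique but keeps tract 11, which only coarsely overlaps the visits; option 2 drops tract 11 but repeats tract 10 once for each visit it overlaps.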