Query

class lsst.daf.butler.registry.queries.Query(*, connection: sqlalchemy.engine.base.Connection, sql: sqlalchemy.sql.selectable.FromClause, summary: lsst.daf.butler.registry.queries._structs.QuerySummary, columns: lsst.daf.butler.registry.queries._structs.QueryColumns, parameters: lsst.daf.butler.registry.queries._structs.QueryParameters)

Bases: object

A wrapper for a SQLAlchemy query that knows how to re-bind parameters and transform result rows into data IDs and dataset references.

A Query should almost always be constructed by a call to QueryBuilder.finish; constructing one directly makes it difficult to maintain invariants between arguments (see the documentation for QueryColumns and QueryParameters for more information).

Parameters:
connection : sqlalchemy.engine.Connection

Connection used to execute the query.

sql : sqlalchemy.sql.FromClause

A complete SELECT query, including at least SELECT, FROM, and WHERE clauses.

summary : QuerySummary

Struct that organizes the dimensions involved in the query.

columns : QueryColumns

Columns that are referenced in the query in any clause.

parameters : QueryParameters

Bind parameters for the query.

Notes

SQLAlchemy appears in the public interface of Query, not just its implementation, because avoiding it would require writing wrappers for the sqlalchemy.engine.RowProxy and sqlalchemy.engine.ResultProxy classes, which are themselves generic wrappers for lower-level Python DBAPI classes. An extra layer would add its own computational overhead, yet reducing computational overhead is the only reason we would seriously consider dropping SQLAlchemy here in the future.

Methods Summary

bind(dataId) Return a dictionary that can be passed to a SQLAlchemy execute method to provide WHERE clause information at execution time rather than construction time.
execute(dataId) Execute the query.
extractDataId(row, *, graph) Extract a data ID from a result row.
extractDatasetRef(row, datasetType, dataId) Extract a DatasetRef from a result row.
predicate(region) Return a callable that can perform extra Python-side filtering of query results.

Methods Documentation

bind(dataId: lsst.daf.butler.core.dimensions.coordinate.ExpandedDataCoordinate) → Dict[str, Any]

Return a dictionary that can be passed to a SQLAlchemy execute method to provide WHERE clause information at execution time rather than construction time.

Most callers should call Query.execute directly instead; when called with a data ID, that calls bind internally.

Parameters:
dataId : ExpandedDataCoordinate

Data ID to transform into bind parameters. This must identify all dimensions in QuerySummary.given, and must have the same primary key values for all dimensions also identified by QuerySummary.dataId.

Returns:
parameters : dict

Dictionary that can be passed as the second argument (with self.sql as the first argument) to SQLAlchemy execute methods.

Notes

Calling bind does not automatically update the callable returned by predicate with the given data ID’s region (if it has one). That must be done manually by passing the region when calling predicate.

execute(dataId: Optional[lsst.daf.butler.core.dimensions.coordinate.ExpandedDataCoordinate] = None) → sqlalchemy.engine.result.ResultProxy

Execute the query.

This may be called multiple times with different arguments to apply different bind parameter values without repeating the work of constructing the query.

Parameters:
dataId : ExpandedDataCoordinate, optional

Data ID to transform into bind parameters. This must identify all dimensions in QuerySummary.given, and must have the same primary key values for all dimensions also identified by QuerySummary.dataId. If not provided, QuerySummary.dataId must identify all dimensions in QuerySummary.given.

Returns:
results : sqlalchemy.engine.ResultProxy

Object representing the query results; see SQLAlchemy documentation for more information.
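
The idea of building a statement once and re-executing it with different bind parameter values can be sketched with the standard-library sqlite3 module. This is only an illustration of the concept, not the Query or SQLAlchemy API: the table, the SQL text, the run helper, and the sample rows are all hypothetical.

```python
import sqlite3

# Stand-in table with made-up rows; the real Query wraps a SQLAlchemy SELECT.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visit (instrument TEXT, visit INTEGER)")
conn.executemany(
    "INSERT INTO visit VALUES (?, ?)",
    [("HSC", 1), ("HSC", 2), ("DECam", 3)],
)

# The SQL text is built once, with a named placeholder in the WHERE clause.
SQL = "SELECT visit FROM visit WHERE instrument = :instrument ORDER BY visit"


def run(instrument):
    """Execute the prebuilt statement with a new bind parameter value."""
    return [row[0] for row in conn.execute(SQL, {"instrument": instrument})]


# Same statement, different bind values supplied at execution time.
hsc_visits = run("HSC")
decam_visits = run("DECam")
```

In this sketch the dictionary passed to conn.execute plays the role that the result of bind plays for a SQLAlchemy execute method.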

extractDataId(row: sqlalchemy.engine.result.RowProxy, *, graph: Optional[lsst.daf.butler.core.dimensions.graph.DimensionGraph] = None) → lsst.daf.butler.core.dimensions.coordinate.DataCoordinate

Extract a data ID from a result row.

Parameters:
row : sqlalchemy.engine.RowProxy

A result row from a SQLAlchemy SELECT query.

graph : DimensionGraph, optional

The dimensions the returned data ID should identify. If not provided, this will be all dimensions in QuerySummary.requested.

Returns:
dataId : DataCoordinate

A minimal data ID that identifies the requested dimensions but includes no metadata or implied dimensions.
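
As a rough illustration of how the optional graph argument narrows the returned data ID, the sketch below uses plain dicts and tuples as stand-ins for RowProxy, DimensionGraph, and DataCoordinate. The REQUESTED tuple, the extract_data_id function, and the row contents are hypothetical and are not part of the library.

```python
# Hypothetical stand-in for the dimensions in QuerySummary.requested.
REQUESTED = ("instrument", "visit", "detector")


def extract_data_id(row, graph=None):
    """Build a minimal data ID (a dict here) from a result row.

    If `graph` is None, fall back to all requested dimensions.
    """
    names = REQUESTED if graph is None else graph
    # Only dimension keys are copied; other columns (metadata) are dropped.
    return {name: row[name] for name in names}


row = {"instrument": "HSC", "visit": 42, "detector": 7, "exposure_time": 30.0}
full_id = extract_data_id(row)
visit_id = extract_data_id(row, graph=("instrument", "visit"))
```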

extractDatasetRef(row: sqlalchemy.engine.result.RowProxy, datasetType: lsst.daf.butler.core.datasets.type.DatasetType, dataId: Optional[lsst.daf.butler.core.dimensions.coordinate.DataCoordinate] = None) → Tuple[lsst.daf.butler.core.datasets.ref.DatasetRef, Optional[int]]

Extract a DatasetRef from a result row.

Parameters:
row : sqlalchemy.engine.RowProxy

A result row from a SQLAlchemy SELECT query.

datasetType : DatasetType

Type of the dataset to extract. Must have been included in the Query via a call to QueryBuilder.joinDataset with isResult=True, or otherwise included in QueryColumns.datasets.

dataId : DataCoordinate, optional

Data ID to attach to the DatasetRef. A minimal (i.e. base class) DataCoordinate is constructed from row if None.

Returns:
ref : DatasetRef

Reference to the dataset; guaranteed to have DatasetRef.id not None.

rank : int or None

Integer index of the collection in which this dataset was found, within the sequence of collections passed when constructing the query. None if QueryBuilder.joinDataset was called with addRank=False.
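
One plausible use of the returned rank is to keep, for each data ID, the dataset found in the earliest collection of the search sequence. That preference is an assumption about the caller's intent, not something this class enforces. The sketch uses strings as stand-ins for data IDs and DatasetRef objects, assumes every rank is an int (that is, addRank was not False), and first_by_rank is a hypothetical helper.

```python
def first_by_rank(found):
    """Keep, per data ID, the reference with the lowest collection rank.

    `found` is an iterable of (data_id, ref, rank) tuples; rank must be an int.
    """
    best = {}
    for data_id, ref, rank in found:
        # Replace the current choice only if this one has a strictly lower rank.
        if data_id not in best or rank < best[data_id][1]:
            best[data_id] = (ref, rank)
    return {data_id: ref for data_id, (ref, _rank) in best.items()}


# Made-up results: the same data ID appears in two collections.
found = [
    ("visit=1", "ref-a", 1),
    ("visit=1", "ref-b", 0),
    ("visit=2", "ref-c", 1),
]
chosen = first_by_rank(found)
```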

predicate(region: Optional[lsst.sphgeom.region.Region] = None) → Callable[[sqlalchemy.engine.result.RowProxy], bool]

Return a callable that can perform extra Python-side filtering of query results.

To get the expected results from a query, the returned predicate must be used to ignore rows for which it returns False; this permits the QueryBuilder implementation to move logic from the database to Python without changing the public interface.

Parameters:
region : sphgeom.Region, optional

A region that any result-row regions must overlap in order for the predicate to return True. If not provided, this will be the region in QuerySummary.dataId, if there is one.

Returns:
func : Callable

A callable that takes a single sqlalchemy.engine.RowProxy argument and returns bool.
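
The filtering pattern described for predicate (skip every row for which the callable returns False) can be sketched as follows. Everything here is a stand-in: rows are dicts, a region is a closed one-dimensional interval instead of an lsst.sphgeom.Region, and make_predicate and filtered are hypothetical helpers rather than library functions.

```python
from typing import Callable, Iterable, Iterator, Optional, Tuple

# Hypothetical stand-in for a region: a closed interval (lo, hi).
Interval = Tuple[float, float]


def make_predicate(region: Optional[Interval] = None) -> Callable[[dict], bool]:
    """Return a row filter that keeps rows whose region overlaps `region`."""
    if region is None:
        # No region to test against, so every row passes.
        return lambda row: True
    lo, hi = region

    def overlaps(row: dict) -> bool:
        r_lo, r_hi = row["region"]
        return r_lo <= hi and lo <= r_hi

    return overlaps


def filtered(rows: Iterable[dict], predicate: Callable[[dict], bool]) -> Iterator[dict]:
    """Yield only the rows accepted by the predicate."""
    for row in rows:
        if predicate(row):
            yield row


rows = [
    {"patch": 1, "region": (0.0, 1.0)},
    {"patch": 2, "region": (2.0, 3.0)},
]
kept = [row["patch"] for row in filtered(rows, make_predicate((0.5, 1.5)))]
```

Keeping the filter as a separate callable is what lets the overlap test live on either side of the database boundary without the calling loop changing.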