Query¶
- final class lsst.daf.butler.Query(driver: QueryDriver, tree: QueryTree)¶
Bases: QueryBase
A method-chaining builder for butler queries.
- Parameters:
- driver : QueryDriver
Implementation object that knows how to actually execute queries.
- tree : QueryTree
Description of the query as a tree of joins and column expressions. The instance returned directly by the Butler._query entry point should be constructed via make_identity_query_tree.
Notes
Query objects should never be constructed directly by users; use Butler._query instead.
A Query object represents the first stage of query construction, in which constraints and joins are defined (roughly corresponding to the WHERE and FROM clauses in SQL). The various “results” objects represent the second (and final) stage, where the columns returned are specified and any sorting or integer slicing can be applied. Result objects are obtained from the data_ids, datasets, and dimension_records methods.
Query and query-result objects are always immutable (except for caching information fetched from the database or server), so modifier methods always return a new object without modifying the current one.
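For example, a minimal sketch of the two stages (the instrument, dataset type, and collection names are illustrative):

    with butler._query() as query:
        # Stage 1: constraints and joins (roughly WHERE/FROM).
        query = query.join_dataset_search("raw", collections=["LSSTCam/raw/all"])
        query = query.where(instrument="LSSTCam")
        # Stage 2: a results object picks the columns; sorting and slicing happen here.
        for data_id in query.data_ids(["exposure", "detector"]):
            print(data_id)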
Attributes Summary
- constraint_dataset_types: The names of all dataset types joined into the query.
- constraint_dimensions: Dimensions currently present in the query, either directly or indirectly.
- expression_factory: A factory for column expressions using overloaded operators.
Methods Summary
- data_ids([dimensions]): Return a result object that is a DataCoordinate iterable.
- datasets(dataset_type[, collections, find_first]): Return a result object that is a DatasetRef iterable.
- dimension_records(element): Return a result object that is a DimensionRecord iterable.
- join_data_coordinates(iterable): Return a new query that joins in an explicit table of data IDs.
- join_dataset_search(dataset_type[, collections]): Return a new query with a search for a dataset joined in.
- join_dimensions(dimensions): Return a new query that joins the logical tables for additional dimensions.
- materialize(*[, dimensions, datasets]): Execute the query, save its results to a temporary location, and return a new query that represents fetching or joining against those saved results.
- where(*args[, bind]): Return a query with a boolean-expression filter on its rows.
Attributes Documentation
- constraint_dataset_types¶
The names of all dataset types joined into the query.
The existence of datasets of these types constrains the data IDs of any type of result. Fields for these dataset types are also usable in ‘where’ expressions.
- constraint_dimensions¶
Dimensions currently present in the query, either directly or indirectly.
This includes dimensions that are present in any joined subquery (such as a dataset search, materialization, or data ID upload) or where argument, as well as any required or implied dependency of those dimensions.
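For example, a minimal sketch (the dataset type and collection names are illustrative) of inspecting both constraint attributes on a partially built query:

    with butler._query() as query:
        query = query.join_dataset_search("raw", collections=["LSSTCam/raw/all"])
        query = query.where(instrument="LSSTCam")
        print(query.constraint_dataset_types)  # dataset types joined so far
        print(query.constraint_dimensions)     # dimensions present directly or indirectly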
- expression_factory¶
A factory for column expressions using overloaded operators.
Notes
Typically this attribute will be assigned to a single-character local variable, and then its (dynamic) attributes can be used to obtain references to columns that can be included in a query:
    with butler._query() as query:
        x = query.expression_factory
        query = query.where(
            x.instrument == "LSSTCam",
            x.visit.day_obs > 20240701,
            x.any(x.band == 'u', x.band == 'y'),
        )
As shown above, the returned object also has an any method to create combined expressions with logical OR (as well as not_ and all, though the latter is rarely necessary since where already combines its arguments with AND).
Proxies for fields associated with dataset types (dataset_id, ingest_date, run, collection, as well as timespan for CALIBRATION collection searches) can be obtained with dict-like access instead:

    with butler._query() as query:
        x = query.expression_factory
        results = query.datasets("raw").order_by(x["raw"].ingest_date)
Expression proxy objects that correspond to scalar columns overload the standard comparison operators (==, !=, <, >, <=, >=) and provide in_range, in_iterable, and in_query methods for membership tests. For order_by contexts, they also have a desc property to indicate that the sort order for that expression should be reversed.
Proxy objects for region and timespan fields have an overlaps method, and timespans also have begin and end properties to access scalar expression proxies for the bounds.
All proxy objects also have an is_null property.
Literal values can be created by calling ExpressionFactory.literal, but can almost always be created implicitly via overloaded operators instead.
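For example, a minimal sketch (the dimension values and band names are illustrative, and the result object's order_by method is used as described above) combining membership tests with a reversed sort order:

    with butler._query() as query:
        x = query.expression_factory
        query = query.where(
            x.instrument == "LSSTCam",
            x.detector.in_range(0, 9),            # membership test on a scalar column
            x.band.in_iterable(["g", "r", "i"]),  # membership in an explicit collection
        )
        # In an order_by context, 'desc' reverses the sort order for that expression.
        records = query.dimension_records("visit").order_by(x.visit.day_obs.desc)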
Methods Documentation
- data_ids(dimensions: DimensionGroup | Iterable[str] | str | None = None) → DataCoordinateQueryResults¶
Return a result object that is a DataCoordinate iterable.
- Parameters:
- dimensions : DimensionGroup, str, or Iterable[str], optional
The dimensions of the data IDs to yield, as either DimensionGroup instances or str names. Will be automatically expanded to a complete DimensionGroup. These dimensions do not need to match the query’s current dimensions. Default is constraint_dimensions.
- Returns:
- data_ids : DataCoordinateQueryResults
Data IDs matching the given query parameters. These are guaranteed to identify all dimensions (DataCoordinate.hasFull returns True), but will not contain DimensionRecord objects (DataCoordinate.hasRecords returns False). Call with_dimension_records on the returned object to include dimension records as well.
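For example, a minimal sketch (the instrument name is illustrative) that iterates over the visit-detector data IDs satisfying a constraint:

    with butler._query() as query:
        query = query.where(instrument="LSSTCam")
        for data_id in query.data_ids(["visit", "detector"]):
            print(data_id["visit"], data_id["detector"])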
- datasets(dataset_type: str | DatasetType, collections: str | Iterable[str] | None = None, *, find_first: bool = True) → DatasetRefQueryResults¶
Return a result object that is a DatasetRef iterable.
- Parameters:
- dataset_type : str or DatasetType
The dataset type to search for.
- collections : str or Iterable[str], optional
The collection or collections to search, in order. If not provided or None, and the dataset has not already been joined into the query, the default collection search path for this butler is used.
- find_first : bool, optional
If True (default), for each result data ID, only yield one DatasetRef of each DatasetType, from the first collection in which a dataset of that dataset type appears (according to the order of collections passed in). If True, collections must not be ... .
- Returns:
- refs : queries.DatasetRefQueryResults
Dataset references matching the given query criteria. Nested data IDs are guaranteed to include values for all implied dimensions (i.e. DataCoordinate.hasFull will return True), but will not include dimension records (DataCoordinate.hasRecords will be False) unless with_dimension_records is called on the result object (which returns a new one).
- Raises:
- lsst.daf.butler.registry.DatasetTypeExpressionError
Raised when the dataset_type expression is invalid.
- lsst.daf.butler.registry.NoDefaultCollectionError
Raised when collections is None and default butler collections are not defined.
- TypeError
Raised when the arguments are incompatible, such as when a collection wildcard is passed when find_first is True.
Notes
When multiple dataset types are queried in a single call, the results of this operation are equivalent to querying for each dataset type separately in turn, and no information about the relationships between datasets of different types is included.
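For example, a minimal sketch (the dataset type, collection, and data ID values are illustrative) of a find-first dataset search restricted to one visit:

    with butler._query() as query:
        query = query.where(instrument="LSSTCam", visit=12345)
        for ref in query.datasets("calexp", collections=["LSSTCam/runs/example"]):
            print(ref.dataId, ref.run)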
- dimension_records(element: str) → DimensionRecordQueryResults¶
Return a result object that is a DimensionRecord iterable.
- Parameters:
- element : str
The name of a dimension element to obtain records for.
- Returns:
- records : queries.DimensionRecordQueryResults
Dimension records matching the given query parameters.
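For example, a minimal sketch (the instrument name is illustrative, and the record fields shown depend on the dimension universe) of fetching detector records:

    with butler._query() as query:
        query = query.where(instrument="LSSTCam")
        for record in query.dimension_records("detector"):
            print(record.id, record.full_name)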
- join_data_coordinates(iterable: Iterable[DataCoordinate]) → Query¶
Return a new query that joins in an explicit table of data IDs.
- Parameters:
- iterable : Iterable[DataCoordinate]
Iterable of DataCoordinate. All items must have the same dimensions. Must have at least one item.
- Returns:
- query : Query
A new query object with the data IDs joined in.
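For example, a minimal sketch (the instrument, visit values, dataset type, and collection are illustrative, and DataCoordinate.standardize is assumed as the way to build the data IDs) that uploads two explicit data IDs and uses them to constrain a dataset search:

    from lsst.daf.butler import DataCoordinate

    with butler._query() as query:
        data_ids = [
            DataCoordinate.standardize(
                instrument="LSSTCam", visit=v, universe=butler.dimensions
            )
            for v in (12345, 12346)
        ]
        query = query.join_data_coordinates(data_ids)
        refs = list(query.datasets("calexp", collections=["LSSTCam/runs/example"]))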
- join_dataset_search(dataset_type: str | DatasetType, collections: Iterable[str] | None = None) → Query¶
Return a new query with a search for a dataset joined in.
- Parameters:
- dataset_type : str or DatasetType
Dataset type or name. May not refer to a dataset component.
- collections : Iterable[str], optional
Iterable of collections to search. Order is preserved, but will not matter if the dataset search is only used as a constraint on dimensions or if find_first=False when requesting results. If not present or None, the default collection search path will be used.
- Returns:
- query : Query
A new query object with dataset columns available and rows restricted to those consistent with the found data IDs.
- Raises:
- DatasetTypeError
Raised if the given dataset type is inconsistent with the registered dataset type.
- MissingDatasetTypeError
Raised if the dataset type has not been registered and only a str dataset type name was given.
Notes
This method may require communication with the server unless the dataset type and collections have already been referenced by the same query context.
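For example, a minimal sketch (the dataset type and collection name are illustrative) that joins a dataset search so its fields can be referenced in a where expression:

    with butler._query() as query:
        query = query.join_dataset_search("raw", collections=["LSSTCam/raw/all"])
        x = query.expression_factory
        query = query.where(x.instrument == "LSSTCam", x["raw"].run == "LSSTCam/raw/all")
        data_ids = query.data_ids(["exposure", "detector"])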
- join_dimensions(dimensions: Iterable[str] | DimensionGroup) → Query¶
Return a new query that joins the logical tables for additional dimensions.
- Parameters:
- dimensions : Iterable[str] or DimensionGroup
Names of dimensions to join in.
- Returns:
- query : Query
A new query object with the dimensions joined in.
Notes
Dimensions are automatically joined in whenever needed, so this method should rarely need to be called directly.
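A minimal sketch (the dimension and instrument names are illustrative) of joining dimensions explicitly:

    with butler._query() as query:
        # Normally this join happens automatically; shown here only to illustrate the call.
        query = query.join_dimensions(["visit", "detector"])
        query = query.where(instrument="LSSTCam")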
- materialize(*, dimensions: Iterable[str] | DimensionGroup | None = None, datasets: Iterable[str] | None = None) → Query¶
Execute the query, save its results to a temporary location, and return a new query that represents fetching or joining against those saved results.
- Parameters:
- dimensions : Iterable[str] or DimensionGroup, optional
Dimensions to include in the temporary results. Default is to include all dimensions in the query.
- datasets : Iterable[str], optional
Names of dataset types that should be included in the new query; default is to include constraint_dataset_types.
- Returns:
- query : Query
A new query object that represents the materialized rows.
Notes
Only dimension key columns and (at the discretion of the implementation) certain dataset columns are actually materialized, since at this stage we do not know which dataset or dimension record fields are actually needed in result rows, and these can be joined back in on the materialized dimension keys. But all constraints on those dimension keys (including dataset existence) are applied to the materialized rows.
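For example, a minimal sketch (the dataset type and collection name are illustrative) that materializes the constrained rows once and then requests two different result types against them:

    with butler._query() as query:
        query = query.join_dataset_search("raw", collections=["LSSTCam/raw/all"])
        query = query.where(instrument="LSSTCam")
        query = query.materialize()
        records = list(query.dimension_records("exposure"))
        refs = list(query.datasets("raw"))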
- where(*args: str | Predicate | DataCoordinate | Mapping[str, Any], bind: Mapping[str, Any] | None = None, **kwargs: Any) → Query¶
Return a query with a boolean-expression filter on its rows.
- Parameters:
- *args
Constraints to apply, combined with logical AND. Arguments may be str expressions to parse, Predicate objects (these are typically constructed via expression_factory), or data IDs.
- bind : Mapping
Mapping from a string identifier appearing in a string expression to the literal value that should be substituted for it. This is recommended instead of embedding literals directly into the expression, especially for strings, timespans, or other types where quoting or formatting is nontrivial.
- **kwargs
Data ID key-value pairs that extend and override any present in *args.
- Returns:
- query : Query
A new query object with the given row filters (as well as any already present in self). All row filters are combined with logical AND.
Notes
If an expression references a dimension or dimension element that is not already present in the query, it will be joined in, but dataset searches must already be joined into a query in order to reference their fields in expressions.
Data ID values are not checked for consistency; they are extracted from args and then kwargs and combined, with later values overriding earlier ones.
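For example, a minimal sketch (the expression string, bind identifier, and data ID values are illustrative of the butler query expression syntax) combining a parsed string expression with a bind value and a data ID keyword argument:

    with butler._query() as query:
        query = query.where(
            "visit.day_obs > my_day_obs AND band IN ('g', 'r')",
            bind={"my_day_obs": 20240701},
            instrument="LSSTCam",
        )
        data_ids = list(query.data_ids(["visit"]))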