Query

final class lsst.daf.butler.queries.Query(driver: QueryDriver, tree: QueryTree | None = None)

Bases: QueryBase

A method-chaining builder for butler queries.

Parameters:
driver : QueryDriver

Implementation object that knows how to actually execute queries.

tree : QueryTree, optional

Description of the query as a tree of joins and column expressions. Defaults to the result of a call to tree.make_identity_query_tree.

Notes

Query objects should never be constructed directly by users; use Butler.query instead.

A Query object represents the first stage of query construction, in which constraints and joins are defined (roughly corresponding to the WHERE and FROM clauses in SQL). The various “results” objects represent the second (and final) stage, where the columns returned are specified and any sorting or integer slicing can be applied. Result objects are obtained from the data_ids, datasets, and dimension_records methods.

Query and query-result objects are always immutable (except for caching information fetched from the database or server), so modifier methods always return a new object without modifying the current one.
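The immutable method-chaining behavior described above can be sketched with a toy class (an illustration only, not the real Query implementation; ToyQuery and its attributes are invented):

```python
class ToyQuery:
    """Toy stand-in for an immutable, method-chaining query builder."""

    def __init__(self, filters=()):
        self.filters = tuple(filters)

    def where(self, *new_filters):
        # Modifier methods return a new object; self is never mutated.
        return ToyQuery(self.filters + new_filters)


q1 = ToyQuery()
q2 = q1.where("instrument = 'LSSTCam'")
q3 = q2.where("visit > 100")
# q1 is unchanged; q3 accumulates both filters (combined with logical AND).
```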

Attributes Summary

constraint_dataset_types

The names of all dataset types joined into the query.

constraint_dimensions

Dimensions currently present in the query, either directly or indirectly.

expression_factory

A factory for column expressions using overloaded operators.

Methods Summary

any(*[, execute, exact])

Test whether the query would return any rows.

data_ids([dimensions])

Return a result object that is a DataCoordinate iterable.

datasets(dataset_type[, collections, find_first])

Return a result object that is a DatasetRef iterable.

dimension_records(element)

Return a result object that is a DimensionRecord iterable.

explain_no_results([execute])

Return human-readable messages that may help explain why the query yields no results.

general(dimensions, *names[, ...])

Execute query returning general result.

join_data_coordinates(iterable)

Return a new query that joins in an explicit table of data IDs.

join_dataset_search(dataset_type[, collections])

Return a new query with a search for a dataset joined in.

join_dimensions(dimensions)

Return a new query that joins the logical tables for additional dimensions.

materialize(*[, dimensions, datasets])

Execute the query, save its results to a temporary location, and return a new query that represents fetching or joining against those saved results.

where(*args[, bind])

Return a query with a boolean-expression filter on its rows.

Attributes Documentation

constraint_dataset_types

The names of all dataset types joined into the query.

The existence of datasets of these types constrains the data IDs of any type of result. Fields for these dataset types are also usable in ‘where’ expressions.

constraint_dimensions

Dimensions currently present in the query, either directly or indirectly.

This includes dimensions that are present in any joined subquery (such as a dataset search, materialization, or data ID upload) or where argument, as well as any required or implied dependency of those dimensions.

expression_factory

A factory for column expressions using overloaded operators (ExpressionFactory).

Notes

Typically this attribute will be assigned to a single-character local variable, and then its (dynamic) attributes can be used to obtain references to columns that can be included in a query:

with butler.query() as query:
    x = query.expression_factory
    query = query.where(
        x.instrument == "LSSTCam",
        x.visit.day_obs > 20240701,
        x.any(x.band == 'u', x.band == 'y'),
    )

As shown above, the returned object also has an any method to combine expressions with logical OR (as well as not_ and all, though the latter is rarely necessary since where already combines its arguments with AND).

Proxies for fields associated with dataset types (dataset_id, ingest_date, run, collection, as well as timespan for CALIBRATION collection searches) can be obtained with dict-like access instead:

with butler.query() as query:
    x = query.expression_factory
    query = query.order_by(x["raw"].ingest_date)

Expression proxy objects that correspond to scalar columns overload the standard comparison operators (==, !=, <, >, <=, >=) and provide in_range, in_iterable, and in_query methods for membership tests. For order_by contexts, they also have a desc property to indicate that the sort order for that expression should be reversed.

Proxy objects for region and timespan fields have an overlaps method, and timespans also have begin and end properties to access scalar expression proxies for the bounds.

All proxy objects also have an is_null property.

Literal values can be created by calling ExpressionFactory.literal, but can almost always be created implicitly via overloaded operators instead.
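As a rough illustration of the overloaded-operator style described above, a toy proxy class (invented here; not the real ExpressionFactory or its proxy types) might build expression strings from comparisons instead of evaluating them:

```python
class ToyColumnProxy:
    """Toy sketch of an expression proxy: comparison operators construct
    expression objects (here just strings) rather than returning booleans."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        return f"{self.name} == {other!r}"

    def __gt__(self, other):
        return f"{self.name} > {other!r}"

    def in_iterable(self, values):
        # Membership test, analogous to the in_iterable method noted above.
        return f"{self.name} IN {tuple(values)!r}"


band = ToyColumnProxy("band")
day_obs = ToyColumnProxy("visit.day_obs")
# (band == "u") yields the expression string "band == 'u'" instead of a bool.
```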

Methods Documentation

any(*, execute: bool = True, exact: bool = True) → bool

Test whether the query would return any rows.

Parameters:
execute : bool, optional

If True, execute at least a LIMIT 1 query if it cannot be determined prior to execution that the query would return no rows.

exact : bool, optional

If True, run the full query and perform post-query filtering if needed, until at least one result row is found. If False, the returned result does not account for post-query filtering, and hence may be True even when all result rows would be filtered out.

Returns:
any : bool

True if the query would (or might, depending on arguments) yield result rows. False if it definitely would not.

data_ids(dimensions: DimensionGroup | Iterable[str] | str | None = None) → DataCoordinateQueryResults

Return a result object that is a DataCoordinate iterable.

Parameters:
dimensions : DimensionGroup, str, or Iterable [ str ], optional

The dimensions of the data IDs to yield, as either DimensionGroup instances or str names. Will be automatically expanded to a complete DimensionGroup. These dimensions do not need to match the query’s current dimensions. Default is constraint_dimensions.

Returns:
data_ids : DataCoordinateQueryResults

Data IDs matching the given query parameters. These are guaranteed to identify all dimensions (DataCoordinate.hasFull returns True), but will not contain DimensionRecord objects (DataCoordinate.hasRecords returns False). Call with_dimension_records on the returned object to include dimension records as well.

datasets(dataset_type: str | DatasetType, collections: str | Iterable[str] | None = None, *, find_first: bool = True) → DatasetRefQueryResults

Return a result object that is a DatasetRef iterable.

Parameters:
dataset_type : str or DatasetType

The dataset type to search for.

collections : str or Iterable [ str ], optional

The collection or collections to search, in order. If not provided or None, and the dataset has not already been joined into the query, the default collection search path for this butler is used.

find_first : bool, optional

If True (default), for each result data ID, only yield one DatasetRef of each DatasetType, from the first collection in which a dataset of that dataset type appears (according to the order of collections passed in). When find_first is True, collections must not be ....

Returns:
refs : queries.DatasetRefQueryResults

Dataset references matching the given query criteria. Nested data IDs are guaranteed to include values for all implied dimensions (i.e. DataCoordinate.hasFull will return True), but will not include dimension records (DataCoordinate.hasRecords will be False) unless with_dimension_records is called on the result object (which returns a new one).

Raises:
lsst.daf.butler.registry.DatasetTypeExpressionError

Raised when the dataset_type expression is invalid.

lsst.daf.butler.registry.NoDefaultCollectionError

Raised when collections is None and default butler collections are not defined.

TypeError

Raised when the arguments are incompatible, such as when a collection wildcard is passed when find_first is True.

Notes

When multiple dataset types are queried in a single call, the results of this operation are equivalent to querying for each dataset type separately in turn, and no information about the relationships between datasets of different types is included.
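The find-first resolution described for find_first=True can be modeled in plain Python (the collection names, data IDs, and references below are invented for illustration):

```python
# Search order: "run2" is searched before "run1".
collections = ["run2", "run1"]
# Which dataset reference each collection holds for each data ID (invented).
contents = {
    "run1": {("visit", 100): "ref-a", ("visit", 101): "ref-b"},
    "run2": {("visit", 100): "ref-c"},
}

found = {}
for name in collections:
    for data_id, ref in contents[name].items():
        # setdefault keeps the first match, so the earliest collection
        # in the search order wins for each data ID.
        found.setdefault(data_id, ref)
# ("visit", 100) resolves to "ref-c" from run2; ("visit", 101) falls
# through to run1's "ref-b".
```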

dimension_records(element: str) → DimensionRecordQueryResults

Return a result object that is a DimensionRecord iterable.

Parameters:
element : str

The name of a dimension element to obtain records for.

Returns:
records : queries.DimensionRecordQueryResults

Dimension records matching the given query parameters.

explain_no_results(execute: bool = True) → Iterable[str]

Return human-readable messages that may help explain why the query yields no results.

Parameters:
execute : bool, optional

If True (default), execute simplified versions (e.g. LIMIT 1) of aspects of the query tree to more precisely determine where rows were filtered out.

Returns:
messages : Iterable [ str ]

String messages that describe reasons the query might not yield any results.

general(dimensions: DimensionGroup | Iterable[str], *names: str, dimension_fields: Mapping[str, Set[str]] | None = None, dataset_fields: Mapping[str, Set[Literal['dataset_id', 'ingest_date', 'run', 'collection', 'timespan']] | ellipsis] | None = None, find_first: bool | None = None) → GeneralQueryResults

Execute query returning general result.

This is an experimental interface and may change at any time.

Parameters:
dimensions : DimensionGroup or Iterable [ str ]

The dimensions that span all fields returned by this query.

*names : str

Names of dimension fields (in “dimension.field” format) or dataset fields (in “dataset_type.field” format) to include in this query.

dimension_fields : Mapping [ str, Set [ str ] ], optional

Dimension record fields included in this query, keyed by dimension element name.

dataset_fields : Mapping [ str, Set [ DatasetFieldName ] | ... ], optional

Dataset fields included in this query, keyed by dataset type name. Ellipsis (...) can be used as a value to include all dataset fields needed to extract DatasetRef instances later.

find_first : bool, optional

Whether this query requires find-first resolution for a dataset. This is ignored and can be omitted if the query has no dataset fields. It must be explicitly set to False if there are multiple dataset types with fields, or if any dataset type’s collections or timespan fields are included in the results.

Returns:
result : GeneralQueryResults

Query result that can be iterated over.

Notes

The dimensions of the returned query are automatically expanded to include those associated with all dimension and dataset fields; the dimensions argument is just the minimal dimensions to return.

join_data_coordinates(iterable: Iterable[DataCoordinate]) → Query

Return a new query that joins in an explicit table of data IDs.

Parameters:
iterable : Iterable [ DataCoordinate ]

Iterable of DataCoordinate. All items must have the same dimensions. Must have at least one item.

Returns:
query : Query

A new query object with the data IDs joined in.

join_dataset_search(dataset_type: str | DatasetType, collections: Iterable[str] | None = None) → Query

Return a new query with a search for a dataset joined in.

Parameters:
dataset_type : str or DatasetType

Dataset type or name. May not refer to a dataset component.

collections : Iterable [ str ], optional

Iterable of collections to search. Order is preserved, but will not matter if the dataset search is only used as a constraint on dimensions or if find_first=False when requesting results. If not present or None, the default collection search path will be used.

Returns:
query : Query

A new query object with dataset columns available and rows restricted to those consistent with the found data IDs.

Raises:
DatasetTypeError

Raised if the given dataset type is inconsistent with the registered dataset type.

MissingDatasetTypeError

Raised if the dataset type has not been registered and only a str dataset type name was given.

Notes

This method may require communication with the server unless the dataset type and collections have already been referenced by the same query context.

join_dimensions(dimensions: Iterable[str] | DimensionGroup) → Query

Return a new query that joins the logical tables for additional dimensions.

Parameters:
dimensions : Iterable [ str ] or DimensionGroup

Names of dimensions to join in.

Returns:
query : Query

A new query object with the dimensions joined in.

Notes

Dimensions are automatically joined in whenever needed, so this method should rarely need to be called directly.

materialize(*, dimensions: Iterable[str] | DimensionGroup | None = None, datasets: Iterable[str] | None = None) → Query

Execute the query, save its results to a temporary location, and return a new query that represents fetching or joining against those saved results.

Parameters:
dimensions : Iterable [ str ] or DimensionGroup, optional

Dimensions to include in the temporary results. Default is to include all dimensions in the query.

datasets : Iterable [ str ], optional

Names of dataset types that should be included in the new query; default is to include constraint_dataset_types.

Returns:
query : Query

A new query object that represents the materialized rows.

Notes

Only dimension key columns and (at the discretion of the implementation) certain dataset columns are actually materialized, since at this stage we do not know which dataset or dimension record fields are actually needed in result rows, and these can be joined back in on the materialized dimension keys. But all constraints on those dimension keys (including dataset existence) are applied to the materialized rows.
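A plain-Python sketch of the join-back strategy described above (all names and values are invented for illustration): only dimension key columns are materialized, and record fields are attached later by joining on those keys:

```python
# Only dimension keys survive materialization (invented data).
materialized_keys = [("LSSTCam", 100), ("LSSTCam", 101)]  # (instrument, visit)

# Dimension record fields fetched separately, keyed by visit (invented).
visit_records = {100: {"day_obs": 20240701}, 101: {"day_obs": 20240702}}

# Result rows are built by joining record fields back in on the keys.
rows = [
    {"instrument": instrument, "visit": visit, **visit_records[visit]}
    for instrument, visit in materialized_keys
]
```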

where(*args: str | Predicate | DataCoordinate | Mapping[str, Any], bind: Mapping[str, Any] | None = None, **kwargs: Any) → Query

Return a query with a boolean-expression filter on its rows.

Parameters:
*args

Constraints to apply, combined with logical AND. Arguments may be str expressions to parse, Predicate objects (these are typically constructed via expression_factory) or data IDs.

bind : Mapping

Mapping from string identifiers appearing in string expressions to the literal values that should be substituted for them. This is recommended instead of embedding literals directly into the expression, especially for strings, timespans, or other types where quoting or formatting is nontrivial.

**kwargs

Data ID key value pairs that extend and override any present in *args.

Returns:
query : Query

A new query object with the given row filters (as well as any already present in self). All row filters are combined with logical AND.

Notes

If an expression references a dimension or dimension element that is not already present in the query, it will be joined in, but dataset searches must already be joined into a query in order to reference their fields in expressions.

Data ID values are not checked for consistency; they are extracted from args and then kwargs and combined, with later values overriding earlier ones.
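The documented merge order for data ID values can be modeled with plain dicts (keys and values invented for illustration): values from *args are applied in order, then **kwargs override them:

```python
# Data IDs supplied positionally, in order (invented values).
args_data_ids = [{"instrument": "LSSTCam", "visit": 100}, {"visit": 200}]
# Keyword data ID values (invented).
kwargs = {"detector": 5, "visit": 300}

merged = {}
for data_id in args_data_ids:
    merged.update(data_id)  # later positional data IDs override earlier ones
merged.update(kwargs)       # keyword values override everything from *args
# visit ends up as 300: kwargs wins over both positional values.
```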