QueryDriver

class lsst.daf.butler.queries.driver.QueryDriver

Bases: AbstractContextManager[None]

Base class for the implementation object inside Query objects that is specialized for DirectButler vs. RemoteButler.

Notes

Implementations should be context managers. This allows them to manage the lifetime of server-side state, such as:

  • a SQL transaction, when necessary (DirectButler);

  • SQL cursors for queries that were not fully iterated over (DirectButler);

  • temporary database tables (DirectButler);

  • result-page Parquet files that were never fetched (RemoteButler);

  • uploaded Parquet files used to fill temporary database tables (RemoteButler);

  • cached content needed to construct query trees, like collection summaries (potentially all Butlers).

When possible, these resources should be cleaned up earlier, as soon as they are no longer needed, and the Butler server must still guard against the context manager’s __exit__ signal never reaching it. Even so, a context manager takes care of cleanup far more reliably than relying on garbage collection and __del__ would.
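The lifecycle described above can be sketched with a hypothetical in-memory driver; the class, its attributes, and the fake "cursor" state are invented for illustration and are not part of the real API:

```python
# Illustrative sketch only: a hypothetical driver showing how a
# context manager can own server-side state (here, fake "cursors"
# standing in for SQL cursors, temp tables, or result-page files).
from contextlib import AbstractContextManager


class SketchDriver(AbstractContextManager):
    def __init__(self) -> None:
        self.open_cursors: list[str] = []

    def run_query(self, query: str) -> str:
        # Pretend to open a server-side cursor that the caller may
        # never fully iterate over.
        cursor = f"cursor-for-{query}"
        self.open_cursors.append(cursor)
        return cursor

    def __exit__(self, exc_type, exc_value, traceback) -> None:
        # Guaranteed cleanup point: release anything still open.
        self.open_cursors.clear()


driver = SketchDriver()
with driver:
    driver.run_query("SELECT ...")
# On exit, open_cursors has been cleared regardless of how the
# block ended.
```

The same cleanup runs whether the block exits normally or via an exception, which is the property that garbage collection and __del__ cannot guarantee.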

Attributes Summary

universe

Object that defines all dimensions.

Methods Summary

any(tree, *, execute, exact)

Test whether the query would return any rows.

count(tree, result_spec, *, exact, discard)

Return the number of rows a query would return.

execute()

Execute a query and return the first result page.

explain_no_results(tree, execute)

Return human-readable messages that may help explain why the query yields no results.

get_dataset_type(name)

Return the dataset type with the given name.

get_default_collections()

Return the default collection search path.

materialize(tree, dimensions, datasets[, ...])

Execute a query tree, saving results to temporary storage for use in later queries.

upload_data_coordinates(dimensions, rows)

Upload a table of data coordinates for use in later queries.

Attributes Documentation

universe

Object that defines all dimensions.

Methods Documentation

abstract any(tree: QueryTree, *, execute: bool, exact: bool) bool

Test whether the query would return any rows.

Parameters:
tree : QueryTree

Query tree to evaluate.

execute : bool, optional

If True, execute at least a LIMIT 1 query if it cannot be determined prior to execution that the query would return no rows.

exact : bool, optional

If True, run the full query and perform post-query filtering if needed, until at least one result row is found. If False, the returned result does not account for post-query filtering, and hence may be True even when all result rows would be filtered out.

Returns:
any : bool

True if the query would (or might, depending on arguments) yield result rows. False if it definitely would not.
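The execute/exact semantics can be illustrated with a toy stand-in, where a plain list of rows and a post-query filter replace the real query machinery; every name here is invented:

```python
# Hypothetical sketch of the execute/exact semantics of any().
from typing import Callable, Iterable


def sketch_any(
    rows: Iterable[int],
    post_filter: Callable[[int], bool],
    *,
    execute: bool,
    exact: bool,
) -> bool:
    if not execute:
        # Without execution we can only report that rows *might* exist.
        return True
    if exact:
        # Run the full query, applying post-query filtering, until at
        # least one surviving row is found.
        return any(post_filter(r) for r in rows)
    # LIMIT 1 behaviour: stop at the first raw row, ignoring the
    # post-query filter, so the answer may be an overestimate.
    return next(iter(rows), None) is not None
```

Note how exact=False can return True even when the post-query filter would reject every row, matching the caveat in the parameter description.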

abstract count(tree: QueryTree, result_spec: DataCoordinateResultSpec | DimensionRecordResultSpec | DatasetRefResultSpec | GeneralResultSpec, *, exact: bool, discard: bool) int

Return the number of rows a query would return.

Parameters:
tree : QueryTree

Query tree to evaluate.

result_spec : ResultSpec

The kind of results the user wants to count.

exact : bool, optional

If True, run the full query and perform post-query filtering if needed to account for that filtering in the count. If False, the result may be an upper bound.

discard : bool, optional

If True, compute the exact count even if it would require running the full query and then throwing away the result rows after counting them. If False, this is an error, as the user would usually be better off executing the query first to fetch its rows into a new query (or passing exact=False). Ignored if exact=False.

Returns:
count : int

The number of rows the query would return, or an upper bound on it if exact=False.
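The interplay of exact and discard can be sketched with a toy counter; the rows, the post-query filter, and the "needs filtering" flag stand in for a real query, and all names are invented:

```python
# Hypothetical sketch of count() semantics.
from typing import Callable, Iterable


def sketch_count(
    rows: Iterable[int],
    post_filter: Callable[[int], bool],
    *,
    exact: bool,
    discard: bool,
    needs_filtering: bool = True,
) -> int:
    rows = list(rows)
    if not exact or not needs_filtering:
        # Upper bound (or already-exact count): no filtering applied.
        return len(rows)
    if not discard:
        # An exact count would mean running the full query and then
        # throwing the rows away, which the caller did not permit.
        raise RuntimeError("exact count requires discard=True here")
    return sum(1 for r in rows if post_filter(r))
```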

abstract execute(result_spec: DataCoordinateResultSpec, tree: QueryTree) Iterator[DataCoordinateResultPage]
abstract execute(result_spec: DimensionRecordResultSpec, tree: QueryTree) Iterator[DimensionRecordResultPage]
abstract execute(result_spec: DatasetRefResultSpec, tree: QueryTree) Iterator[DatasetRefResultPage]
abstract execute(result_spec: GeneralResultSpec, tree: QueryTree) Iterator[GeneralResultPage]

Execute a query and return the first result page.

Parameters:
result_spec : ResultSpec

The kind of results the user wants from the query. This can affect the actual query (i.e. SQL and Python postprocessing) that is run, e.g. by changing what is in the SQL SELECT clause and even what tables are joined in, but it never changes the number or order of result rows.

tree : QueryTree

Query tree to evaluate.

Yields:
page : ResultPage

A page whose type corresponds to the type of result_spec, with rows from the query.
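The paged delivery can be sketched with a generator that slices a row stream into fixed-size pages, which is the shape a caller of execute() iterates over; the function name and page size are invented for this example:

```python
# Hypothetical sketch of paged result delivery.
from itertools import islice
from typing import Iterable, Iterator


def sketch_pages(rows: Iterable[int], page_size: int = 2) -> Iterator[list[int]]:
    it = iter(rows)
    while True:
        page = list(islice(it, page_size))
        if not page:
            return
        yield page
```

Because pages are produced lazily, a caller that stops iterating early leaves server-side state behind, which is exactly what the context-manager cleanup in the Notes is for.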

abstract explain_no_results(tree: QueryTree, execute: bool) Iterable[str]

Return human-readable messages that may help explain why the query yields no results.

Parameters:
tree : QueryTree

Query tree to evaluate.

execute : bool, optional

If True (default), execute simplified versions (e.g. LIMIT 1) of aspects of the tree to more precisely determine where rows were filtered out.

Returns:
messages : Iterable [ str ]

String messages that describe reasons the query might not yield any results.
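One way such messages might be built up can be sketched by applying simplified constraint stages in turn and reporting the first one that eliminates every remaining row; the function and its stage tuples are invented for illustration:

```python
# Hypothetical sketch of building "why no results" diagnostics.
from typing import Callable, Iterable


def sketch_explain(
    rows: Iterable[int],
    stages: list[tuple[str, Callable[[int], bool]]],
) -> list[str]:
    messages: list[str] = []
    surviving = list(rows)
    for name, predicate in stages:
        surviving = [r for r in surviving if predicate(r)]
        if not surviving:
            messages.append(f"no rows survive constraint {name!r}")
            break
    return messages
```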

abstract get_dataset_type(name: str) DatasetType

Return the dataset type with the given name.

Parameters:
name : str

Name of the dataset type.

Returns:
dataset_type : DatasetType

The dataset type registered under name.

Raises:
MissingDatasetTypeError

Raised if the dataset type is not registered.

abstract get_default_collections() tuple[str, ...]

Return the default collection search path.

Returns:
collections : tuple [ str, … ]

The default collection search path as a tuple of str.

Raises:
NoDefaultCollectionError

Raised if there are no default collections.

abstract materialize(tree: QueryTree, dimensions: DimensionGroup, datasets: frozenset[str], allow_duplicate_overlaps: bool = False) UUID

Execute a query tree, saving results to temporary storage for use in later queries.

Parameters:
tree : QueryTree

Query tree to evaluate.

dimensions : DimensionGroup

Dimensions whose key columns should be preserved.

datasets : frozenset [ str ]

Names of dataset types whose ID columns may be materialized. It is implementation-defined whether they actually are.

allow_duplicate_overlaps : bool, optional

If True, the query is allowed to generate non-distinct rows for spatial overlaps.

Returns:
key : MaterializationKey

Unique identifier for the result rows that allows them to be referenced in a QueryTree.

abstract upload_data_coordinates(dimensions: DimensionGroup, rows: Iterable[tuple[int | str, ...]]) UUID

Upload a table of data coordinates for use in later queries.

Parameters:
dimensions : DimensionGroup

Dimensions of the data coordinates.

rows : Iterable [ tuple ]

Tuples of data coordinate values, covering just the “required” subset of dimensions.

Returns:
key : DataCoordinateUploadKey

Unique identifier for the upload that allows it to be referenced in a QueryTree.