QueryDriver

class lsst.daf.butler.queries.driver.QueryDriver

Bases: AbstractContextManager[None]

Base class for the implementation object inside Query objects that is specialized for DirectButler vs. RemoteButler.

Notes

Implementations should be context managers. This allows them to manage the lifetime of server-side state, such as:

  • a SQL transaction, when necessary (DirectButler);

  • SQL cursors for queries that were not fully iterated over (DirectButler);

  • temporary database tables (DirectButler);

  • result-page Parquet files that were never fetched (RemoteButler);

  • uploaded Parquet files used to fill temporary database tables (RemoteButler);

  • cached content needed to construct query trees, like collection summaries (potentially all Butlers).

When possible, these resources should be cleaned up earlier, as soon as they are no longer needed, and the Butler server must still guard against the context manager’s __exit__ signal never reaching it. Even so, a context manager takes care of cleanup far more reliably than relying on garbage collection and __del__ would.
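The lifecycle described above can be sketched with a hypothetical in-memory driver; the class, its attributes, and the fake "cursor" state are invented for illustration and are not part of the real API:

```python
# Illustrative sketch only: a hypothetical driver showing how a
# context manager can own server-side state (here, fake "cursors"
# standing in for SQL cursors, temp tables, or result-page files).
from contextlib import AbstractContextManager


class SketchDriver(AbstractContextManager):
    def __init__(self) -> None:
        self.open_cursors: list[str] = []

    def run_query(self, query: str) -> str:
        # Pretend to open a server-side cursor that the caller may
        # never fully iterate over.
        cursor = f"cursor-for-{query}"
        self.open_cursors.append(cursor)
        return cursor

    def __exit__(self, exc_type, exc_value, traceback) -> None:
        # Guaranteed cleanup point: release anything still open.
        self.open_cursors.clear()


driver = SketchDriver()
with driver:
    driver.run_query("SELECT ...")
# On exit, open_cursors has been cleared regardless of how the
# block ended.
```

The same cleanup runs whether the block exits normally or via an exception, which is the property that garbage collection and __del__ cannot guarantee.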

Attributes Summary

universe

Object that defines all dimensions.

Methods Summary

any(tree, *, execute, exact)

Test whether the query would return any rows.

count(tree, result_spec, *, exact, discard)

Return the number of rows a query would return.

execute()

Execute a query and return the first result page.

explain_no_results(tree, execute)

Return human-readable messages that may help explain why the query yields no results.

get_dataset_type(name)

Return the dataset type with the given name.

get_default_collections()

Return the default collection search path.

materialize(tree, dimensions, datasets[, ...])

Execute a query tree, saving results to temporary storage for use in later queries.

upload_data_coordinates(dimensions, rows)

Upload a table of data coordinates for use in later queries.

Attributes Documentation

universe

Object that defines all dimensions.

Methods Documentation

abstract any(tree: QueryTree, *, execute: bool, exact: bool) bool

Test whether the query would return any rows.

Parameters:
tree : QueryTree

Query tree to evaluate.

execute : bool, optional

If True, execute at least a LIMIT 1 query if it cannot be determined prior to execution that the query would return no rows.

exact : bool, optional

If True, run the full query and perform post-query filtering if needed, until at least one result row is found. If False, the returned result does not account for post-query filtering, and hence may be True even when all result rows would be filtered out.

Returns:
any : bool

True if the query would (or might, depending on arguments) yield result rows. False if it definitely would not.
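The execute/exact semantics can be illustrated with a toy stand-in, where a plain list of rows and a post-query filter replace the real query machinery; every name here is invented:

```python
# Hypothetical sketch of the execute/exact semantics of any().
from typing import Callable, Iterable


def sketch_any(
    rows: Iterable[int],
    post_filter: Callable[[int], bool],
    *,
    execute: bool,
    exact: bool,
) -> bool:
    if not execute:
        # Without execution we can only report that rows *might* exist.
        return True
    if exact:
        # Run the full query, applying post-query filtering, until at
        # least one surviving row is found.
        return any(post_filter(r) for r in rows)
    # LIMIT 1 behaviour: stop at the first raw row, ignoring the
    # post-query filter, so the answer may be an overestimate.
    return next(iter(rows), None) is not None
```

Note how exact=False can return True even when the post-query filter would reject every row, matching the caveat in the parameter description.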

abstract count(tree: QueryTree, result_spec: DataCoordinateResultSpec | DimensionRecordResultSpec | DatasetRefResultSpec | GeneralResultSpec, *, exact: bool, discard: bool) int

Return the number of rows a query would return.

Parameters:
tree : QueryTree

Query tree to evaluate.

result_spec : ResultSpec

The kind of results the user wants to count.

exact : bool, optional

If True, run the full query and perform post-query filtering if needed to account for that filtering in the count. If False, the result may be an upper bound.

discard : bool, optional

If True, compute the exact count even if it would require running the full query and then throwing away the result rows after counting them. If False, this is an error, as the user would usually be better off executing the query first to fetch its rows into a new query (or passing exact=False). Ignored if exact=False.

Returns:
count : int

The number of rows the query would return, or an upper bound on it if exact=False.
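The interplay of exact and discard can be sketched with a toy counter; the rows, the post-query filter, and the "needs filtering" flag stand in for a real query, and all names are invented:

```python
# Hypothetical sketch of count() semantics.
from typing import Callable, Iterable


def sketch_count(
    rows: Iterable[int],
    post_filter: Callable[[int], bool],
    *,
    exact: bool,
    discard: bool,
    needs_filtering: bool = True,
) -> int:
    rows = list(rows)
    if not exact or not needs_filtering:
        # Upper bound (or already-exact count): no filtering applied.
        return len(rows)
    if not discard:
        # An exact count would mean running the full query and then
        # throwing the rows away, which the caller did not permit.
        raise RuntimeError("exact count requires discard=True here")
    return sum(1 for r in rows if post_filter(r))
```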

abstract execute(result_spec: DataCoordinateResultSpec, tree: QueryTree) Iterator[DataCoordinateResultPage]
abstract execute(result_spec: DimensionRecordResultSpec, tree: QueryTree) Iterator[DimensionRecordResultPage]
abstract execute(result_spec: DatasetRefResultSpec, tree: QueryTree) Iterator[DatasetRefResultPage]
abstract execute(result_spec: GeneralResultSpec, tree: QueryTree) Iterator[GeneralResultPage]

Execute a query and return the first result page.

Parameters:
result_spec : ResultSpec

The kind of results the user wants from the query. This can affect the actual query (i.e. SQL and Python postprocessing) that is run, e.g. by changing what is in the SQL SELECT clause and even what tables are joined in, but it never changes the number or order of result rows.

tree : QueryTree

Query tree to evaluate.

Yields:
page : ResultPage

A page whose type corresponds to the type of result_spec, with rows from the query.
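The paged delivery can be sketched with a generator that slices a row stream into fixed-size pages, which is the shape a caller of execute() iterates over; the function name and page size are invented for this example:

```python
# Hypothetical sketch of paged result delivery.
from itertools import islice
from typing import Iterable, Iterator


def sketch_pages(rows: Iterable[int], page_size: int = 2) -> Iterator[list[int]]:
    it = iter(rows)
    while True:
        page = list(islice(it, page_size))
        if not page:
            return
        yield page
```

Because pages are produced lazily, a caller that stops iterating early leaves server-side state behind, which is exactly what the context-manager cleanup in the Notes is for.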

abstract explain_no_results(tree: QueryTree, execute: bool) Iterable[str]

Return human-readable messages that may help explain why the query yields no results.

Parameters:
tree : QueryTree

Query tree to evaluate.

execute : bool, optional

If True (default), execute simplified versions (e.g. LIMIT 1) of aspects of the tree to more precisely determine where rows were filtered out.

Returns:
messages : Iterable [ str ]

String messages that describe reasons the query might not yield any results.
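One way such messages might be built up can be sketched by applying simplified constraint stages in turn and reporting the first one that eliminates every remaining row; the function and its stage tuples are invented for illustration:

```python
# Hypothetical sketch of building "why no results" diagnostics.
from typing import Callable, Iterable


def sketch_explain(
    rows: Iterable[int],
    stages: list[tuple[str, Callable[[int], bool]]],
) -> list[str]:
    messages: list[str] = []
    surviving = list(rows)
    for name, predicate in stages:
        surviving = [r for r in surviving if predicate(r)]
        if not surviving:
            messages.append(f"no rows survive constraint {name!r}")
            break
    return messages
```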

abstract get_dataset_type(name: str) DatasetType

Return the dataset type with the given name.

Parameters:
name : str

Name of the dataset type.

Returns:
dataset_type : DatasetType

The dataset type registered under name.

Raises:
MissingDatasetTypeError

Raised if the dataset type is not registered.

abstract get_default_collections() tuple[str, ...]

Return the default collection search path.

Returns:
collections : tuple [ str, … ]

The default collection search path as a tuple of str.

Raises:
NoDefaultCollectionError

Raised if there are no default collections.

abstract materialize(tree: QueryTree, dimensions: DimensionGroup, datasets: frozenset[str], allow_duplicate_overlaps: bool = False) UUID

Execute a query tree, saving results to temporary storage for use in later queries.

Parameters:
tree : QueryTree

Query tree to evaluate.

dimensions : DimensionGroup

Dimensions whose key columns should be preserved.

datasets : frozenset [ str ]

Names of dataset types whose ID columns may be materialized. It is implementation-defined whether they actually are.

allow_duplicate_overlaps : bool, optional

If True, the query is allowed to generate non-distinct rows for spatial overlaps.

Returns:
key : MaterializationKey

Unique identifier for the result rows that allows them to be referenced in a QueryTree.

abstract upload_data_coordinates(dimensions: DimensionGroup, rows: Iterable[tuple[int | str, ...]]) UUID

Upload a table of data coordinates for use in later queries.

Parameters:
dimensions : DimensionGroup

Dimensions of the data coordinates.

rows : Iterable [ tuple ]

Tuples of data coordinate values, covering just the “required” subset of dimensions.

Returns:
key : DataCoordinateUploadKey

Unique identifier for the upload that allows it to be referenced in a QueryTree.