DataCoordinateQueryResults

class lsst.daf.butler.registry.queries.DataCoordinateQueryResults(query: Query)

Bases: DataCoordinateIterable

An enhanced implementation of DataCoordinateIterable that represents data IDs retrieved from a database query.

Parameters:
query : Query

Query object that backs this class.

Notes

The Query class now implements essentially all of this class’s functionality; “QueryResult” classes like this one now exist only to provide interface backwards compatibility and more specific iterator types.

Attributes Summary

dimensions

The dimensions of the data IDs returned by this query.

graph

Deprecated since version v27.

Methods Summary

any(*[, execute, exact])

Test whether this query returns any results.

count(*[, exact, discard])

Count the number of rows this query would return.

expanded()

Return a results object for which hasRecords returns True.

explain_no_results([execute])

Return human-readable messages that may help explain why the query yields no results.

findDatasets(datasetType, collections, *[, ...])

Find datasets using the data IDs identified by this query.

findRelatedDatasets(datasetType, collections, *)

Find datasets using the data IDs identified by this query, and return them along with the original data IDs.

hasFull()

Indicate if all data IDs in this iterable identify all dimensions.

hasRecords()

Return whether all data IDs in this iterable contain records.

limit(limit[, offset])

Make the iterator return a limited number of records.

materialize()

Insert this query's results into a temporary table.

order_by(*args)

Make the iterator return ordered results.

subset([dimensions, unique])

Return a results object containing a subset of the dimensions of this one, and/or a unique near-subset of its rows.

Attributes Documentation

dimensions

The dimensions of the data IDs returned by this query.

graph

Deprecated since version v27: Deprecated in favor of .dimensions. Will be removed after v27.

Methods Documentation

any(*, execute: bool = True, exact: bool = True) → bool

Test whether this query returns any results.

Parameters:
execute : bool, optional

If True, execute at least a LIMIT 1 query if it cannot be determined prior to execution that the query would return no rows.

exact : bool, optional

If True, run the full query and perform post-query filtering if needed, until at least one result row is found. If False, the returned result does not account for post-query filtering, and hence may be True even when all result rows would be filtered out.

Returns:
any : bool

True if the query would (or might, depending on arguments) yield result rows. False if it definitely would not.
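
For example, a minimal sketch of the intended usage; the registry object, dimension names, and instrument value are illustrative assumptions rather than part of this API:

dataIds = registry.queryDataIds(["visit", "detector"], instrument="HSC")
# Cheap check: nothing is executed and post-query filtering is not accounted for.
if not dataIds.any(execute=False, exact=False):
    print("Known to be empty without executing the query.")
# Definitive check: may execute the full query and apply post-query filtering.
elif not dataIds.any(exact=True):
    print("No rows survive post-query filtering.")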

count(*, exact: bool = True, discard: bool = False) → int

Count the number of rows this query would return.

Parameters:
exact : bool, optional

If True, run the full query and perform post-query filtering if needed to account for that filtering in the count. If False, the result may be an upper bound.

discard : bool, optional

If True, compute the exact count even if it would require running the full query and then throwing away the result rows after counting them. If False, this is an error, as the user would usually be better off executing the query first to fetch its rows into a new query (or passing exact=False). Ignored if exact=False.

Returns:
count : int

The number of rows the query would return, or an upper bound if exact=False.

Notes

This counts the number of rows returned, not the number of unique rows returned, so even with exact=True it may provide only an upper bound on the number of deduplicated result rows.
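
For example, a sketch of the speed/exactness trade-off, assuming the dataIds results object from the earlier example:

upper_bound = dataIds.count(exact=False)               # fast, may overestimate
exact_count = dataIds.count(exact=True, discard=True)  # may execute the full query
print(f"at most {upper_bound} rows; exactly {exact_count} rows")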

expanded() → DataCoordinateQueryResults

Return a results object for which hasRecords returns True.

This method may involve actually executing database queries to fetch DimensionRecord objects.

Returns:
results : DataCoordinateQueryResults

A results object for which hasRecords returns True. May be self if that is already the case.

Notes

For very large result sets, it may be much more efficient to call materialize before calling expanded, to avoid performing the original query multiple times (as a subquery) in the follow-up queries that fetch dimension records. For example:

with registry.queryDataIds(...).materialize() as tempDataIds:
    dataIdsWithRecords = tempDataIds.expanded()
    for dataId in dataIdsWithRecords:
        ...

explain_no_results(execute: bool = True) → Iterable[str]

Return human-readable messages that may help explain why the query yields no results.

Parameters:
execute : bool, optional

If True (default), execute simplified versions (e.g. LIMIT 1) of aspects of the query tree to determine more precisely where rows were filtered out.

Returns:
messages : Iterable [ str ]

String messages that describe reasons the query might not yield any results.
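
For example, a sketch of the intended diagnostic pattern, assuming the dataIds results object from the earlier examples:

if not dataIds.any():
    for message in dataIds.explain_no_results():
        print(message)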

findDatasets(datasetType: DatasetType | str, collections: Any, *, findFirst: bool = True, components: bool = False) → ParentDatasetQueryResults

Find datasets using the data IDs identified by this query.

Parameters:
datasetType : DatasetType or str

Dataset type or the name of one to search for. Must have dimensions that are a subset of self.graph.

collections : Any

An expression that fully or partially identifies the collections to search for the dataset, such as a str, re.Pattern, or iterable thereof. ... can be used to return all collections. See Collection expressions for more information.

findFirst : bool, optional

If True (default), for each result data ID, only yield one DatasetRef, from the first collection in which a dataset of that dataset type appears (according to the order of collections passed in). If True, collections must not contain regular expressions and may not be ....

components : bool, optional

Must be False. Provided only for backwards compatibility. After v27 this argument will be removed entirely.

Returns:
datasets : ParentDatasetQueryResults

A lazy-evaluation object representing dataset query results, iterable over DatasetRef objects. If self.hasRecords(), all nested data IDs in those dataset references will have records as well.

Raises:
MissingDatasetTypeError

Raised if the given dataset type is not registered.
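
For example, a minimal sketch; the dataset type "calexp" and collection "HSC/runs/example" are assumed to exist in the repository and are purely illustrative:

refs = dataIds.findDatasets("calexp", collections="HSC/runs/example")
for ref in refs:
    print(ref.datasetType.name, ref.dataId, ref.run)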

findRelatedDatasets(datasetType: DatasetType | str, collections: Any, *, findFirst: bool = True, dimensions: DimensionGroup | DimensionGraph | Iterable[str] | None = None) → Iterable[tuple[lsst.daf.butler.dimensions._coordinate.DataCoordinate, lsst.daf.butler._dataset_ref.DatasetRef]]

Find datasets using the data IDs identified by this query, and return them along with the original data IDs.

This is a variant of findDatasets that is often more useful when the target dataset type does not have all of the dimensions of the original data ID query, as is generally the case with calibration lookups.

Parameters:
datasetType : DatasetType or str

Dataset type or the name of one to search for. Must have dimensions that are a subset of self.graph.

collections : Any

An expression that fully or partially identifies the collections to search for the dataset, such as a str, re.Pattern, or iterable thereof. ... can be used to return all collections. See Collection expressions for more information.

findFirst : bool, optional

If True (default), for each data ID in self, only yield one DatasetRef, from the first collection in which a dataset of that dataset type appears (according to the order of collections passed in). If True, collections must not contain regular expressions and may not be .... Note that this is not the same as yielding one DatasetRef for each yielded data ID if dimensions is not None.

dimensions : DimensionGroup, DimensionGraph, or Iterable [ str ], optional

The dimensions of the data IDs returned. Must be a subset of self.dimensions.

Returns:
pairs : Iterable [ tuple [ DataCoordinate, DatasetRef ] ]

An iterable of (data ID, dataset reference) pairs.

Raises:
MissingDatasetTypeError

Raised if the given dataset type is not registered.
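
For example, a sketch of a calibration-style lookup in which the target dataset type has fewer dimensions than the query's data IDs; the "bias" dataset type and "HSC/calib" collection are illustrative assumptions:

for dataId, ref in dataIds.findRelatedDatasets("bias", collections="HSC/calib"):
    print(dataId, "->", ref)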

hasFull() → bool

Indicate if all data IDs in this iterable identify all dimensions, not just required dimensions.

Returns:
state : bool

If True, all(d.hasFull() for d in iterable) is guaranteed. If False, no guarantees are made.

hasRecords() → bool

Return whether all data IDs in this iterable contain records.

Returns:
state : bool

If True, all(d.hasRecords() for d in iterable) is guaranteed. If False, no guarantees are made.
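
For example, a sketch that attaches records only when they are not already present; the "detector" key assumes that dimension is part of the query:

if not dataIds.hasRecords():
    dataIds = dataIds.expanded()
for dataId in dataIds:
    print(dataId.records["detector"])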

limit(limit: int, offset: int | None = 0) → DataCoordinateQueryResults

Make the iterator return a limited number of records.

Parameters:
limit : int

Upper limit on the number of returned records.

offset : int or None, optional

The number of records to skip before returning at most limit records. None is interpreted the same as zero for backwards compatibility.

Returns:
result : DataCoordinateQueryResults

Returns the same instance, updated to return a limited set of records.

Notes

This method modifies the iterator in place and returns the same instance to support method chaining. Normally this method is used together with the order_by method.
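
For example, a sketch of paging through ordered results; the dimension name, page size, and offset are illustrative:

results = registry.queryDataIds(["visit"], instrument="HSC")
page = results.order_by("visit").limit(100, offset=200)  # rows 200-299, ordered by visit
for dataId in page:
    print(dataId["visit"])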

materialize() → Iterator[DataCoordinateQueryResults]

Insert this query’s results into a temporary table.

Returns:
context : typing.ContextManager [ DataCoordinateQueryResults ]

A context manager that ensures the temporary table is created and populated in __enter__ (returning a results object backed by that table), and dropped in __exit__. If self is already materialized, the context manager may do nothing (reflecting the fact that an outer context manager should already take care of everything else).

Notes

When using a very large result set to perform multiple queries (e.g. multiple calls to subset with different arguments, or even a single call to expanded), it may be much more efficient to start by materializing the query and only then performing the follow-up queries. It may also be less efficient, depending on how well the database engine's query optimizer can simplify those particular follow-up queries and how efficiently it caches query results even when they are not explicitly inserted into a temporary table. See expanded and subset for examples.

order_by(*args: str) → DataCoordinateQueryResults

Make the iterator return ordered results.

Parameters:
*args : str

Names of the columns/dimensions to use for ordering. A column name can be prefixed with a minus sign (-) to use descending ordering.

Returns:
result : DataCoordinateQueryResults

Returns the same instance, updated to return ordered results.

Notes

This method modifies the iterator in place and returns the same instance to support method chaining.
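
For example, a sketch of descending ordering chained with a limit; the "visit" dimension is illustrative:

for dataId in dataIds.order_by("-visit").limit(5):
    print(dataId)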

subset(dimensions: DimensionGroup | DimensionGraph | Iterable[str] | None = None, *, unique: bool = False) → DataCoordinateQueryResults

Return a results object containing a subset of the dimensions of this one, and/or a unique near-subset of its rows.

This method may involve actually executing database queries to fetch DimensionRecord objects.

Parameters:
dimensions : DimensionGroup, DimensionGraph, or Iterable [ str ], optional

Dimensions to include in the new results object. If None, self.dimensions is used.

unique : bool, optional

If True (the default is False), the query should only return unique data IDs. This is implemented in the database; to obtain unique results via Python-side processing (which may be more efficient in some cases), use toSet to construct a DataCoordinateSet from this results object instead.

Returns:
results : DataCoordinateQueryResults

A results object corresponding to the given criteria. May be self if it already qualifies.

Raises:
ValueError

Raised when dimensions is not a subset of the dimensions in this result.

Notes

This method can only return a “near-subset” of the original result rows in general because of subtleties in how spatial overlaps are implemented; see Query.projected for more information.

When calling subset multiple times on the same very large result set, it may be much more efficient to call materialize first. For example:

dimensions1 = DimensionGroup(...)
dimensions2 = DimensionGroup(...)
with registry.queryDataIds(...).materialize() as tempDataIds:
    for dataId1 in tempDataIds.subset(dimensions1, unique=True):
        ...
    for dataId2 in tempDataIds.subset(dimensions2, unique=True):
        ...