AllDimensionsQuantumGraphBuilder
- final class lsst.pipe.base.all_dimensions_quantum_graph_builder.AllDimensionsQuantumGraphBuilder(pipeline_graph: PipelineGraph, butler: Butler, *, where: str = '', dataset_query_constraint: DatasetQueryConstraintVariant = <class 'lsst.pipe.base._datasetQueryConstraints._ALL'>, bind: Mapping[str, Any] | None = None, **kwargs: Any)
Bases: QuantumGraphBuilder
An implementation of QuantumGraphBuilder that uses a single large query for data IDs covering all dimensions in the pipeline.
- Parameters:
- pipeline_graph
pipeline_graph.PipelineGraph Pipeline to build a QuantumGraph from, as a graph. Will be resolved in-place with the given butler (any existing resolution is ignored).
- butler
lsst.daf.butler.Butler Client for the data repository. Should be read-only.
- where
str, optional Butler expression language constraint to apply to all data IDs.
- dataset_query_constraint
DatasetQueryConstraintVariant, optional Specification of which overall-input datasets should be used to constrain the initial data ID queries. Not including an important constraint can result in catastrophically large query results that take too long to process, while including too many makes the query much more complex, increasing the chances that the database will choose a bad (sometimes catastrophically bad) query plan.
- bind
Mapping, optional Variable substitutions for the where expression.
- **kwargs
Additional keyword arguments forwarded to QuantumGraphBuilder.
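As a rough illustration of what a single query covering all dimensions implies, the sketch below models the joined dimension table as a plain list of rows and applies one constraint to it. The visit, detector, and tract values, the `Row` type, and the `query_data_ids` helper are invented for this example and are not part of `lsst.pipe.base`; the real builder issues a butler query instead.

```python
from typing import Callable, List, NamedTuple


class Row(NamedTuple):
    """One row of a hypothetical join over all pipeline dimensions."""

    visit: int
    detector: int
    tract: int


# Toy spatial-overlap table: visit 1 overlaps tracts 10 and 11, but each of
# its detectors overlaps only one of them.
ALL_ROWS = [
    Row(visit=1, detector=1, tract=10),
    Row(visit=1, detector=2, tract=11),
    Row(visit=2, detector=1, tract=11),
]


def query_data_ids(rows: List[Row], where: Callable[[Row], bool]) -> List[Row]:
    """Apply one constraint to every row, as a single joint query would."""
    return [row for row in rows if where(row)]


selected = query_data_ids(ALL_ROWS, lambda row: row.tract == 10)

# Project the surviving rows onto the {visit, detector} dimensions.
visit_detectors = sorted({(row.visit, row.detector) for row in selected})
print(visit_detectors)
```

In this toy table visit 1 overlaps tract 10, yet its detector 2 is absent from the result, because no joined row pairs that detector with the constrained tract.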
Notes
This is a general-purpose algorithm that delegates the problem of determining which “end” of the pipeline is more constrained (beginning by input collection contents vs. end by the where string) to the database query planner, which usually does a good job.
This algorithm suffers from a serious limitation, which we refer to as the “tract slicing” problem from its most common variant: the where string and general data ID intersection rules apply to all data IDs in the graph. For example, if a tract constraint is present in the where string or an overall-input dataset, then it is impossible for any data ID that does not overlap that tract to be present anywhere in the pipeline, such as a {visit, detector} combination where the visit overlaps the tract even if the detector does not.
Attributes Summary
universe Definitions of all data dimensions.
Methods Summary
build([metadata, attach_datastore_records]) Build the quantum graph.
process_subgraph(subgraph) Build the rough structure for an independent subset of the QuantumGraph and query for relevant existing datasets.
Attributes Documentation
- universe
Definitions of all data dimensions.
Methods Documentation
- build(metadata: Mapping[str, Any] | None = None, attach_datastore_records: bool = True) -> QuantumGraph
Build the quantum graph.
- Parameters:
- metadata
Mapping, optional Flexible metadata to add to the quantum graph.
- attach_datastore_records
bool, optional Whether to include datastore records in the graph. Required for lsst.daf.butler.QuantumBackedButler execution.
- Returns:
- quantum_graph
QuantumGraph DAG describing processing to be performed.
Notes
External code is expected to construct a QuantumGraphBuilder and then call this method exactly once. See class documentation for details on what it does.
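A minimal sketch of the construct-then-build pattern described above. The `input_collections` and `output_run` keyword arguments are assumed to be among those the QuantumGraphBuilder base class accepts through `**kwargs`; the `make_quantum_graph` helper and the metadata key are illustrative only. The import is deferred so that the function can be defined without the LSST stack installed.

```python
from collections.abc import Mapping, Sequence
from typing import Any, Optional


def make_quantum_graph(
    pipeline_graph: Any,
    butler: Any,
    *,
    input_collections: Sequence[str],
    output_run: str,
    where: str = "",
    bind: Optional[Mapping[str, Any]] = None,
) -> Any:
    """Build a quantum graph with one builder instance (hypothetical helper)."""
    # Deferred import: lsst.pipe.base is needed only when this is called.
    from lsst.pipe.base.all_dimensions_quantum_graph_builder import (
        AllDimensionsQuantumGraphBuilder,
    )

    builder = AllDimensionsQuantumGraphBuilder(
        pipeline_graph,
        butler,
        where=where,
        bind=bind,
        # Assumed base-class keyword arguments, forwarded through **kwargs.
        input_collections=input_collections,
        output_run=output_run,
    )
    # build() is meant to be called exactly once per builder instance.
    return builder.build(
        metadata={"comment": "example run"},
        attach_datastore_records=True,
    )
```

A fresh builder is created on every call, so the "exactly once" expectation for `build` holds by construction.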
- process_subgraph(subgraph: PipelineGraph) -> QuantumGraphSkeleton
Build the rough structure for an independent subset of the QuantumGraph and query for relevant existing datasets.
- Parameters:
- subgraph
pipeline_graph.PipelineGraph Subset of the pipeline graph that should be processed by this call. This is always resolved and topologically sorted. It should not be modified.
- Returns:
- skeleton
quantum_graph_skeleton.QuantumGraphSkeleton Class representing an initial quantum graph. See quantum_graph_skeleton.QuantumGraphSkeleton docs for details. After this is returned, the object may be modified in-place in unspecified ways.
Notes
The quantum_graph_skeleton.QuantumGraphSkeleton should associate DatasetRef objects with nodes for existing datasets. In particular:
- quantum_graph_skeleton.QuantumGraphSkeleton.set_dataset_ref must be used to associate existing datasets with all overall-input dataset nodes in the skeleton by querying input_collections. This includes all standard input nodes and any prerequisite nodes added by the method (prerequisite nodes may also be left out entirely, as the base class can add them later, albeit possibly less efficiently).
- quantum_graph_skeleton.QuantumGraphSkeleton.set_output_for_skip must be used to associate existing datasets with output dataset nodes by querying skip_existing_in.
- quantum_graph_skeleton.QuantumGraphSkeleton.add_output_in_the_way must be used to associate existing outputs with output dataset nodes by querying output_run if output_run_exists is True. Note that the presence of such datasets is not automatically an error, even if clobber is False, as these may be quanta that will be skipped.
DatasetRef objects for existing datasets with empty data IDs in all of the above categories may be found in the empty_dimensions_datasets attribute, as these are queried for prior to this call by the base class, but associating them with graph nodes is still this method’s responsibility.
Dataset types should never be components and should always use the “common” storage class definition in pipeline_graph.DatasetTypeNode (which is the data repository definition when the dataset type is registered).
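The bookkeeping rules in these notes can be sketched with a stand-in object. `ToySkeleton` below is a hypothetical stub that only records calls under the three method names listed above; it is not the real QuantumGraphSkeleton, and the `associate_existing` helper, its arguments, and the string "refs" are invented for illustration. A real implementation would pass DatasetRef objects obtained from butler queries.

```python
class ToySkeleton:
    """Hypothetical stand-in that records which method received which ref."""

    def __init__(self):
        self.inputs = []
        self.skippable = []
        self.in_the_way = []

    def set_dataset_ref(self, ref):
        self.inputs.append(ref)

    def set_output_for_skip(self, ref):
        self.skippable.append(ref)

    def add_output_in_the_way(self, ref):
        self.in_the_way.append(ref)


def associate_existing(
    skeleton,
    *,
    found_in_inputs,
    found_in_skip_existing,
    found_in_output_run,
    output_run_exists,
):
    """Route refs from each (pretend) collection query to the matching method."""
    # Overall inputs found by querying input_collections.
    for ref in found_in_inputs:
        skeleton.set_dataset_ref(ref)
    # Existing outputs found by querying skip_existing_in.
    for ref in found_in_skip_existing:
        skeleton.set_output_for_skip(ref)
    # Outputs already in output_run matter only if that run exists.
    if output_run_exists:
        for ref in found_in_output_run:
            skeleton.add_output_in_the_way(ref)


skeleton = ToySkeleton()
associate_existing(
    skeleton,
    found_in_inputs=["raw@visit=1"],
    found_in_skip_existing=["calexp@visit=1"],
    found_in_output_run=["calexp@visit=1"],
    output_run_exists=True,
)
print(skeleton.inputs, skeleton.skippable, skeleton.in_the_way)
```

The same ref may legitimately appear in both the skip and in-the-way lists; as the notes say, an output in the way is not automatically an error, since its quantum may end up being skipped.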