QuantumGraphBuilder
- class lsst.pipe.base.quantum_graph_builder.QuantumGraphBuilder(pipeline_graph: PipelineGraph, butler: Butler, *, input_collections: Sequence[str] | None = None, output_run: str | None = None, skip_existing_in: Sequence[str] = (), clobber: bool = False)
Bases: ABC
An abstract base class for building QuantumGraph objects from a pipeline.
- Parameters:
- pipeline_graph
pipeline_graph.PipelineGraph
Pipeline to build a QuantumGraph from, as a graph. Will be resolved in-place with the given butler (any existing resolution is ignored).
- butler
lsst.daf.butler.Butler
Client for the data repository. Should be read-only.
- input_collections
Sequence[str], optional
Collections to search for overall-input datasets. If not provided, butler.collections is used (and must not be empty).
- output_run
str, optional
Output RUN collection. If not provided, butler.run is used (and must not be None).
- skip_existing_in
Sequence[str], optional
Collections to search for outputs that already exist for the purpose of skipping quanta that have already been run.
- clobber
bool, optional
Whether to raise if predicted outputs already exist in output_run (not including those quanta that would be skipped because they’ve already been run). This never actually clobbers outputs; it just informs the graph generation algorithm whether execution will run with clobbering enabled. This is ignored if output_run does not exist.
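The fallback rules for input_collections and output_run can be sketched as follows. This is only an illustration: FakeButler and resolve_collections are hypothetical stand-ins, not part of lsst.daf.butler or lsst.pipe.base, and the exception type raised by the real constructor is not stated here, so ValueError is an assumption.

```python
from __future__ import annotations

from collections.abc import Sequence
from dataclasses import dataclass


@dataclass
class FakeButler:
    """Hypothetical stand-in exposing only the two defaults discussed above."""

    collections: tuple[str, ...] = ()
    run: str | None = None


def resolve_collections(
    butler: FakeButler,
    input_collections: Sequence[str] | None = None,
    output_run: str | None = None,
) -> tuple[tuple[str, ...], str]:
    """Apply the documented fallbacks (illustrative; ValueError is assumed)."""
    if input_collections is None:
        # Fall back to the butler's default collections, which must not be empty.
        input_collections = butler.collections
        if not input_collections:
            raise ValueError("No input collections given and butler.collections is empty.")
    if output_run is None:
        # Fall back to the butler's default run, which must not be None.
        output_run = butler.run
        if output_run is None:
            raise ValueError("No output run given and butler.run is None.")
    return tuple(input_collections), output_run


# Toy collection names, chosen only for this example.
butler = FakeButler(collections=("demo/defaults",), run="u/demo/run")
resolved = resolve_collections(butler)
```

Explicit arguments always win over the butler defaults in this sketch; the defaults are consulted only when an argument is left as None.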
Notes
Constructing a QuantumGraphBuilder will run queries for existing datasets with empty data IDs (including but not limited to init inputs and outputs), in addition to resolving the given pipeline graph and testing for existence of the output run collection.
The build method splits the pipeline graph into independent subgraphs, then calls the abstract method process_subgraph on each, to allow concrete implementations to populate the rough graph structure (the QuantumGraphSkeleton class) and search for existing datasets (further populating the builder’s existing_datasets struct). The build method then:
- assembles lsst.daf.butler.Quantum instances from all data IDs in the skeleton;
- looks for existing outputs found in skip_existing_in to see if any quanta should be skipped;
- calls PipelineTaskConnections.adjustQuantum on all quanta, adjusting downstream quanta appropriately when preliminary predicted outputs are rejected (pruning nodes that will not have the inputs they need to run);
- attaches datastore records and registry dataset types to the graph.
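The first step, splitting a pipeline into independent subgraphs before each is handed to process_subgraph, can be illustrated with a toy model. The sketch below assumes a pipeline represented as a plain dict of task labels to input and output dataset type names; split_independent, the task labels and the dataset type names are all hypothetical, the real PipelineGraph API is not used, and the library's own splitting logic may differ.

```python
from __future__ import annotations


def split_independent(tasks: dict[str, tuple[set[str], set[str]]]) -> list[list[str]]:
    """Group tasks that are connected through shared dataset types.

    ``tasks`` maps a task label to ``(input dataset types, output dataset types)``.
    Two tasks land in the same group if a chain of shared dataset types links them.
    """
    # Map each dataset type to the tasks that read or write it.
    users: dict[str, list[str]] = {}
    for label, (inputs, outputs) in tasks.items():
        for dataset_type in inputs | outputs:
            users.setdefault(dataset_type, []).append(label)

    groups: list[list[str]] = []
    seen: set[str] = set()
    for start in tasks:
        if start in seen:
            continue
        # Depth-first walk over task -> dataset type -> task links.
        seen.add(start)
        stack = [start]
        group: list[str] = []
        while stack:
            label = stack.pop()
            group.append(label)
            inputs, outputs = tasks[label]
            for dataset_type in inputs | outputs:
                for neighbor in users[dataset_type]:
                    if neighbor not in seen:
                        seen.add(neighbor)
                        stack.append(neighbor)
        groups.append(sorted(group))
    return groups


# Toy pipeline: task_a feeds task_b; task_c shares nothing with either.
toy_pipeline = {
    "task_a": ({"type_in"}, {"type_mid"}),
    "task_b": ({"type_mid"}, {"type_out"}),
    "task_c": ({"other_in"}, {"other_out"}),
}
subgraphs = split_independent(toy_pipeline)
```

Here task_a and task_b end up in one group because they share type_mid, while task_c forms its own group, so a builder following this scheme would call process_subgraph twice.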
In addition to implementing process_subgraph, derived classes are generally expected to add new construction keyword-only arguments to control the data IDs of the quantum graph, while forwarding all of the arguments defined in the base class to super.
Attributes Summary
universe
Definitions of all data dimensions.
Methods Summary
build([metadata, attach_datastore_records])
Build the quantum graph.
process_subgraph(subgraph)
Build the rough structure for an independent subset of the QuantumGraph and query for relevant existing datasets.
Attributes Documentation
- universe
Definitions of all data dimensions.
Methods Documentation
- final build(metadata: Mapping[str, Any] | None = None, attach_datastore_records: bool = True) -> QuantumGraph
Build the quantum graph.
- Parameters:
- metadata
Mapping, optional
Flexible metadata to add to the quantum graph.
- attach_datastore_records
bool, optional
Whether to include datastore records in the graph. Required for lsst.daf.butler.QuantumBackedButler execution.
- Returns:
- quantum_graph
QuantumGraph
DAG describing processing to be performed.
Notes
External code is expected to construct a QuantumGraphBuilder and then call this method exactly once. See class documentation for details on what it does.
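The subclassing convention from the class notes (add keyword-only arguments, forward the rest to super) and the construct-then-build-once usage can be sketched together. Everything below is a stand-in written for illustration: StandInBuilder mimics only the constructor signature shown at the top of this page, its build is a toy that skips all real work, and WhereBuilder with its where argument is hypothetical.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from collections.abc import Sequence
from typing import Any


class StandInBuilder(ABC):
    """Hypothetical stand-in for the base class; NOT the real implementation."""

    def __init__(
        self,
        pipeline_graph: Any,
        butler: Any,
        *,
        input_collections: Sequence[str] | None = None,
        output_run: str | None = None,
        skip_existing_in: Sequence[str] = (),
        clobber: bool = False,
    ):
        self.pipeline_graph = pipeline_graph
        self.butler = butler
        self.input_collections = input_collections
        self.output_run = output_run
        self.skip_existing_in = skip_existing_in
        self.clobber = clobber

    def build(self) -> list[Any]:
        # Toy version: treat the whole pipeline as a single subgraph.
        return [self.process_subgraph(self.pipeline_graph)]

    @abstractmethod
    def process_subgraph(self, subgraph: Any) -> Any:
        raise NotImplementedError()


class WhereBuilder(StandInBuilder):
    """Hypothetical subclass adding one keyword-only argument, ``where``."""

    def __init__(self, pipeline_graph: Any, butler: Any, *, where: str = "", **kwargs: Any):
        # Forward every base-class argument unchanged.
        super().__init__(pipeline_graph, butler, **kwargs)
        self.where = where

    def process_subgraph(self, subgraph: Any) -> Any:
        # A real implementation would build a skeleton and query for datasets.
        return {"subgraph": subgraph, "where": self.where}


builder = WhereBuilder("toy-pipeline", butler=None, where="visit = 42", output_run="u/demo/run")
result = builder.build()  # called exactly once per builder instance
```

Accepting the base-class arguments through **kwargs keeps the subclass signature short and means new base-class options pass through without changes to the subclass; spelling each argument out explicitly would be an equally valid choice.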
- abstract process_subgraph(subgraph: PipelineGraph) -> QuantumGraphSkeleton
Build the rough structure for an independent subset of the QuantumGraph and query for relevant existing datasets.
- Parameters:
- subgraph
pipeline_graph.PipelineGraph
Subset of the pipeline graph that should be processed by this call. This is always resolved and topologically sorted. It should not be modified.
- Returns:
- skeleton
quantum_graph_skeleton.QuantumGraphSkeleton
Class representing an initial quantum graph. See quantum_graph_skeleton.QuantumGraphSkeleton docs for details. After this is returned, the object may be modified in-place in unspecified ways.
Notes
In addition to returning a quantum_graph_skeleton.QuantumGraphSkeleton, this method should populate the existing_datasets structure by querying for all relevant datasets with non-empty data IDs (those with empty data IDs will already be present). In particular:
- inputs must always be populated with all overall-input datasets (but not prerequisites), by querying input_collections;
- outputs_for_skip must be populated with any intermediate or output datasets present in skip_existing_in (it can be ignored if skip_existing_in is empty);
- outputs_in_the_way must be populated with any intermediate or output datasets present in output_run, if output_run_exists (it can be ignored if output_run_exists is False). Note that the presence of such datasets is not automatically an error, even if clobber is False, as these may be quanta that will be skipped.
- inputs must be populated with all prerequisite-input datasets that were included in the skeleton, by querying input_collections (not all prerequisite inputs need to be included in the skeleton, but the base class can only use per-quantum queries to find them, and that can be slow when there are many quanta).
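The population rules above can be sketched with a toy repository. This is a sketch under stated assumptions: the container types of the real existing_datasets fields are not given here, so ToyExistingDatasets uses plain dicts keyed by (dataset type name, data ID string); find_first, populate, and the in-memory repo mapping are hypothetical helpers standing in for butler queries.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Toy dataset identity: (dataset type name, data ID rendered as a string).
DatasetKey = tuple[str, str]


@dataclass
class ToyExistingDatasets:
    """Hypothetical stand-in with the three fields named in the notes."""

    inputs: dict[DatasetKey, str] = field(default_factory=dict)
    outputs_for_skip: dict[DatasetKey, str] = field(default_factory=dict)
    outputs_in_the_way: dict[DatasetKey, str] = field(default_factory=dict)


def find_first(repo, collections, keys):
    """Return {key: collection} using the first collection that holds each key."""
    found = {}
    for key in keys:
        for collection in collections:
            if key in repo.get(collection, set()):
                found[key] = collection
                break
    return found


def populate(existing, repo, *, overall_inputs, outputs, input_collections,
             skip_existing_in, output_run, output_run_exists):
    # inputs: always populated from input_collections.
    existing.inputs.update(find_first(repo, input_collections, overall_inputs))
    # outputs_for_skip: only relevant when skip_existing_in is non-empty.
    if skip_existing_in:
        existing.outputs_for_skip.update(find_first(repo, skip_existing_in, outputs))
    # outputs_in_the_way: only relevant when the output run already exists.
    if output_run_exists:
        existing.outputs_in_the_way.update(find_first(repo, [output_run], outputs))


# Toy repository: collection name -> datasets it contains.
repo = {
    "demo/raw": {("raw", "visit=1"), ("raw", "visit=2")},
    "u/demo/old": {("calexp", "visit=1")},
    "u/demo/new": {("calexp", "visit=1")},
}
existing = ToyExistingDatasets()
populate(
    existing,
    repo,
    overall_inputs=[("raw", "visit=1"), ("raw", "visit=2")],
    outputs=[("calexp", "visit=1"), ("calexp", "visit=2")],
    input_collections=["demo/raw"],
    skip_existing_in=["u/demo/old"],
    output_run="u/demo/new",
    output_run_exists=True,
)
```

In this toy run the visit=1 output appears in both outputs_for_skip and outputs_in_the_way, which is the situation the note describes: a dataset in the way is not necessarily an error, because its quantum may be skipped.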
Dataset types should never be components and should always use the “common” storage class definition in pipeline_graph.DatasetTypeNode (which is the data repository definition when the dataset type is registered).