QuantumGraphBuilder

class lsst.pipe.base.quantum_graph_builder.QuantumGraphBuilder(pipeline_graph: PipelineGraph, butler: Butler, *, input_collections: Sequence[str] | None = None, output_run: str | None = None, skip_existing_in: Sequence[str] = (), clobber: bool = False)

Bases: ABC

An abstract base class for building QuantumGraph objects from a pipeline.

Parameters:
pipeline_graphpipeline_graph.PipelineGraph

Pipeline to build a QuantumGraph from, as a graph. Will be resolved in-place with the given butler (any existing resolution is ignored).

butlerlsst.daf.butler.Butler

Client for the data repository. Should be read-only.

input_collectionsSequence [ str ], optional

Collections to search for overall-input datasets. If not provided, butler.collections is used (and must not be empty).

output_runstr, optional

Output RUN collection. If not provided, butler.run is used (and must not be None).

skip_existing_inSequence [ str ], optional

Collections to search for outputs that already exist for the purpose of skipping quanta that have already been run.

clobberbool, optional

Whether to raise if predicted outputs already exist in output_run (not including those quanta that would be skipped because they’ve already been run). This never actually clobbers outputs; it just informs the graph generation algorithm whether execution will run with clobbering enabled. This is ignored if output_run does not exist.

Notes

Constructing a QuantumGraphBuilder will run queries for existing datasets with empty data IDs (including but not limited to init inputs and outputs), in addition to resolving the given pipeline graph and testing for existence of the output run collection.

The build method splits the pipeline graph into independent subgraphs, then calls the abstract method process_subgraph on each, to allow concrete implementations to populate the rough graph structure (the QuantumGraphSkeleton class) and search for existing datasets (further populating the builder’s existing_datasets struct). The build method then:

  • assembles lsst.daf.butler.Quantum instances from all data IDs in the skeleton;

  • looks for existing outputs found in skip_existing_in to see if any quanta should be skipped;

  • calls PipelineTaskConnections.adjustQuantum on all quanta, adjusting downstream quanta appropriately when preliminary predicted outputs are rejected (pruning nodes that will not have the inputs they need to run);

  • attaches datastore records and registry dataset types to the graph.

In addition to implementing process_subgraph, derived classes are generally expected to add new construction keyword-only arguments to control the data IDs of the quantum graph, while forwarding all of the arguments defined in the base class to super.

Attributes Summary

universe

Definitions of all data dimensions.

Methods Summary

build([metadata, attach_datastore_records])

Build the quantum graph.

process_subgraph(subgraph)

Build the rough structure for an independent subset of the QuantumGraph and query for relevant existing datasets.

Attributes Documentation

universe

Definitions of all data dimensions.

Methods Documentation

final build(metadata: Mapping[str, Any] | None = None, attach_datastore_records: bool = True) QuantumGraph

Build the quantum graph.

Parameters:
metadataMapping, optional

Flexible metadata to add to the quantum graph.

attach_datastore_recordsbool, optional

Whether to include datastore records in the graph. Required for lsst.daf.butler.QuantumBackedButler execution.

Returns:
quantum_graphQuantumGraph

DAG describing processing to be performed.

Notes

External code is expected to construct a QuantumGraphBuilder and then call this method exactly once. See class documentation for details on what it does.

abstract process_subgraph(subgraph: PipelineGraph) QuantumGraphSkeleton

Build the rough structure for an independent subset of the QuantumGraph and query for relevant existing datasets.

Parameters:
subgraphpipeline_graph.PipelineGraph

Subset of the pipeline graph that should be processed by this call. This is always resolved and topologically sorted. It should not be modified.

Returns:
skeletonquantum_graph_skeleton.QuantumGraphSkeleton

Class representing an initial quantum graph. See quantum_graph_skeleton.QuantumGraphSkeleton docs for details. After this is returned, the object may be modified in-place in unspecified ways.

Notes

In addition to returning a quantum_graph_skeleton.QuantumGraphSkeleton, this method should populate the existing_datasets structure by querying for all relevant datasets with non-empty data IDs (those with empty data IDs will already be present). In particular:

  • inputs must always be populated with all overall-input datasets (but not prerequisites), by querying input_collections;

  • outputs_for_skip must be populated with any intermediate our output datasets present in skip_existing_in (it can be ignored if skip_existing_in is empty);

  • outputs_in_the_way must be populated with any intermediate or output datasets present in output_run, if output_run_exists (it can be ignored if output_run_exists is False). Note that the presence of such datasets is not automatically an error, even if clobber is `False, as these may be quanta that will be skipped.

  • inputs must be populated with all prerequisite-input datasets that were included in the skeleton, by querying input_collections (not all prerequisite inputs need to be included in the skeleton, but the base class can only use per-quantum queries to find them, and that can be slow when there are many quanta).

Dataset types should never be components and should always use the “common” storage class definition in pipeline_graph.DatasetTypeNode (which is the data repository definition when the dataset type is registered).