ResourceUsageQuantumGraphBuilder

class lsst.analysis.tools.tasks.ResourceUsageQuantumGraphBuilder(butler: Butler, *, dataset_type_names: Iterable[str] | None = None, where: str = '', input_collections: Sequence[str] | None = None, output_run: str | None = None, skip_existing_in: Sequence[str] = (), clobber: bool = False)

Bases: QuantumGraphBuilder

Custom quantum graph generator and pipeline builder for resource usage summary tasks.

Parameters:
butler : lsst.daf.butler.Butler

Butler client to query for inputs and dataset types.

dataset_type_names : Iterable[str], optional

Iterable of dataset type names or shell-style glob patterns for the metadata datasets to be used as input. Default is all datasets whose names end with _metadata (other than the resource-usage summary tasks' own metadata outputs, which are always ignored). A gather-resource task with a single quantum is created for each matching metadata dataset.

where : str, optional

Data ID expression that constrains the input metadata datasets.

input_collections : Sequence[str], optional

Sequence of collections to search for inputs. If not provided, butler.collections is used and must not be empty.

output_run : str, optional

Output RUN collection name. If not provided, butler.run is used and must not be None.

skip_existing_in : Sequence[str], optional

Sequence of collections to search for outputs, allowing quanta whose outputs exist to be skipped.

clobber : bool, optional

Whether execution of this quantum graph will permit clobbering. If False (the default), existing outputs in output_run are an error unless skip_existing_in causes those quanta to be skipped.

Notes

The resource-usage summary tasks cannot easily be added to a regular pipeline, as it is much more natural to have the gather tasks run automatically on all other tasks, and a quantum graph for these particular tasks can be generated much more efficiently than the general-purpose algorithm could manage.
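For example, the builder can be used directly from Python (a minimal sketch; the repository path, collection names, and dataset type name below are hypothetical placeholders):

    from lsst.daf.butler import Butler
    from lsst.analysis.tools.tasks import ResourceUsageQuantumGraphBuilder

    # Hypothetical repository and collection names, for illustration only.
    butler = Butler("/repo/main", collections=["HSC/runs/example"])
    builder = ResourceUsageQuantumGraphBuilder(
        butler,
        dataset_type_names=["calibrate_metadata"],  # shell-style globs also work
        where="instrument = 'HSC'",
        output_run="u/someone/resource-usage",
    )
    qgraph = builder.build()

Since input_collections is not given here, the butler's own collections are used as the input search path.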

Attributes Summary

universe

Definitions of all data dimensions.

Methods Summary

build([metadata, attach_datastore_records])

Build the quantum graph.

main()

Run the command-line interface for this quantum-graph builder.

make_argument_parser()

Make the argument parser for the command-line interface.

process_subgraph(subgraph)

Build the rough structure for an independent subset of the QuantumGraph and query for relevant existing datasets.

Attributes Documentation

universe

Definitions of all data dimensions.

Methods Documentation

build(metadata: Mapping[str, Any] | None = None, attach_datastore_records: bool = True) → QuantumGraph

Build the quantum graph.

Parameters:
metadata : Mapping, optional

Flexible metadata to add to the quantum graph.

attach_datastore_records : bool, optional

Whether to include datastore records in the graph. Required for lsst.daf.butler.QuantumBackedButler execution.

Returns:
quantum_graph : QuantumGraph

DAG describing processing to be performed.

Notes

External code is expected to construct a QuantumGraphBuilder and then call this method exactly once. See class documentation for details on what it does.
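For example (a sketch; the graph file name is a placeholder, and saveUri is assumed to be the usual lsst.pipe.base.QuantumGraph persistence method):

    # Build once, keeping datastore records so the graph can be executed
    # with lsst.daf.butler.QuantumBackedButler, then persist it to disk.
    qgraph = builder.build(
        metadata={"comment": "resource usage summary"},
        attach_datastore_records=True,
    )
    qgraph.saveUri("resource-usage.qgraph")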

classmethod main() → None

Run the command-line interface for this quantum-graph builder.

This function provides the implementation for the build-gather-resource-usage-qg script.
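The installed script is essentially a thin wrapper around this method, equivalent to the following sketch:

    # Sketch of the build-gather-resource-usage-qg entry point.
    from lsst.analysis.tools.tasks import ResourceUsageQuantumGraphBuilder

    if __name__ == "__main__":
        ResourceUsageQuantumGraphBuilder.main()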

classmethod make_argument_parser() → ArgumentParser

Make the argument parser for the command-line interface.

process_subgraph(subgraph: PipelineGraph) → QuantumGraphSkeleton

Build the rough structure for an independent subset of the QuantumGraph and query for relevant existing datasets.

Parameters:
subgraph : pipeline_graph.PipelineGraph

Subset of the pipeline graph that should be processed by this call. This is always resolved and topologically sorted. It should not be modified.

Returns:
skeleton : quantum_graph_skeleton.QuantumGraphSkeleton

Object representing an initial quantum graph. See the quantum_graph_skeleton.QuantumGraphSkeleton docs for details. After this is returned, the object may be modified in-place in unspecified ways.

Notes

In addition to returning a quantum_graph_skeleton.QuantumGraphSkeleton, this method should populate the existing_datasets structure by querying for all relevant datasets with non-empty data IDs (those with empty data IDs will already be present). In particular:

  • inputs must always be populated with all overall-input datasets (but not prerequisites), by querying input_collections;

  • outputs_for_skip must be populated with any intermediate or output datasets present in skip_existing_in (it can be ignored if skip_existing_in is empty);

  • outputs_in_the_way must be populated with any intermediate or output datasets present in output_run, if output_run_exists (it can be ignored if output_run_exists is False). Note that the presence of such datasets is not automatically an error, even if clobber is False, as these may be quanta that will be skipped;

  • inputs must be populated with all prerequisite-input datasets that were included in the skeleton, by querying input_collections (not all prerequisite inputs need to be included in the skeleton, but the base class can only use per-quantum queries to find them, and that can be slow when there are many quanta).

Dataset types should never be components and should always use the “common” storage class definition in pipeline_graph.DatasetTypeNode (which is the data repository definition when the dataset type is registered).
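As a loose illustration of this contract for subclass authors (not this class's actual implementation; QuantumGraphSkeleton, DatasetKey, and the attribute access below are assumptions about lsst.pipe.base internals, not a verified recipe):

    from lsst.pipe.base.quantum_graph_skeleton import DatasetKey, QuantumGraphSkeleton

    def process_subgraph(self, subgraph):
        # One node set per task label in this subgraph.
        skeleton = QuantumGraphSkeleton(subgraph.tasks)
        # ... add quantum and dataset nodes plus their edges here ...
        # Populate overall inputs up front so the base class does not have
        # to fall back to slow per-quantum queries.
        for node in subgraph.iter_overall_inputs():
            for ref in self.butler.registry.queryDatasets(
                node.name, collections=self.input_collections, findFirst=True
            ):
                key = DatasetKey(ref.datasetType.name, ref.dataId.required_values)
                self.existing_datasets.inputs[key] = ref
        return skeleton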