ResourceUsageQuantumGraphBuilder¶
- class lsst.analysis.tools.tasks.ResourceUsageQuantumGraphBuilder(butler: Butler, *, dataset_type_names: Iterable[str] | None = None, where: str = '', input_collections: Sequence[str] | None = None, output_run: str | None = None, skip_existing_in: Sequence[str] = (), clobber: bool = False)¶
Bases: QuantumGraphBuilder
Custom quantum graph generator and pipeline builder for resource usage summary tasks.
- Parameters:
  - butler : lsst.daf.butler.Butler
    Butler client to query for inputs and dataset types.
  - dataset_type_names : Iterable[str], optional
    Iterable of dataset type names or shell-style glob patterns for the metadata datasets to be used as input. Default is all datasets ending with _metadata (other than the resource-usage summary tasks’ own metadata outputs, which are always ignored). A gather-resource task with a single quantum is created for each matching metadata dataset.
  - where : str, optional
    Data ID expression that constrains the input metadata datasets.
  - input_collections : Sequence[str], optional
    Sequence of collections to search for inputs. If not provided, butler.collections is used and must not be empty.
  - output_run : str, optional
    Output RUN collection name. If not provided, butler.run is used and must not be None.
  - skip_existing_in : Sequence[str], optional
    Sequence of collections to search for outputs, allowing quanta whose outputs exist to be skipped.
  - clobber : bool, optional
    Whether execution of this quantum graph will permit clobbering. If False (default), existing outputs in output_run are an error unless skip_existing_in will cause those quanta to be skipped.
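The default selection of input metadata datasets described above (all names ending in _metadata, minus the gather tasks’ own metadata outputs) can be illustrated with shell-style glob matching. This is a minimal sketch using Python’s fnmatch; the gatherResourceUsage name prefix used for the exclusion is a hypothetical stand-in for the real task names, and the actual builder queries the butler registry for registered dataset types rather than filtering a plain list.

```python
import fnmatch


def select_metadata_dataset_types(all_dataset_types, patterns=None):
    """Illustrative sketch of the builder's default input selection.

    Keeps every dataset type name matching one of the shell-style glob
    ``patterns`` (default ``*_metadata``), while skipping the resource-usage
    tasks' own metadata outputs (hypothetical ``gatherResourceUsage`` prefix).
    """
    if patterns is None:
        patterns = ["*_metadata"]
    selected = []
    for name in all_dataset_types:
        # Hypothetical exclusion: the gather tasks' own metadata outputs.
        if name.startswith("gatherResourceUsage"):
            continue
        if any(fnmatch.fnmatchcase(name, p) for p in patterns):
            selected.append(name)
    return selected


print(select_metadata_dataset_types(
    ["isr_metadata", "calibrate_log", "gatherResourceUsage_metadata"]
))
# → ['isr_metadata']
```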
Notes
The resource usage summary tasks cannot easily be added to a regular pipeline, as it’s much more natural to have the gather tasks run automatically on all other tasks. And we can generate a quantum graph for these particular tasks much more efficiently than the general-purpose algorithm could.
Attributes Summary
- universe
  Definitions of all data dimensions.
Methods Summary
- build([metadata])
  Build the quantum graph.
- main()
  Run the command-line interface for this quantum-graph builder.
- make_argument_parser()
  Make the argument parser for the command-line interface.
- process_subgraph(subgraph)
  Build the rough structure for an independent subset of the QuantumGraph and query for relevant existing datasets.
Attributes Documentation
- universe¶
Definitions of all data dimensions.
Methods Documentation
- build(metadata: Mapping[str, Any] | None = None) QuantumGraph ¶
Build the quantum graph.
- Parameters:
  - metadata : Mapping, optional
    Flexible metadata to add to the quantum graph.
- Returns:
  - quantum_graph : QuantumGraph
    DAG describing processing to be performed.
Notes
External code is expected to construct a
QuantumGraphBuilder
and then call this method exactly once. See class documentation for details on what it does.
- classmethod main() None ¶
Run the command-line interface for this quantum-graph builder.
This function provides the implementation for the
build-gather-resource-usage-qg
script.
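The main/make_argument_parser pairing follows a common classmethod CLI pattern: the parser is built in one classmethod so it can be inspected or extended, and main parses the arguments and drives the build. The sketch below shows only that pattern; the class name, options, and return value are illustrative assumptions, not the real build-gather-resource-usage-qg interface.

```python
import argparse


class DemoGraphBuilderCLI:
    """Sketch of the ``main``/``make_argument_parser`` classmethod pattern.

    Illustrative only: the option names here are hypothetical and do not
    reproduce the real build-gather-resource-usage-qg script.
    """

    @classmethod
    def make_argument_parser(cls) -> argparse.ArgumentParser:
        parser = argparse.ArgumentParser(
            description="Build a demo quantum graph."
        )
        parser.add_argument("repo", help="Path to the data repository.")
        parser.add_argument(
            "--output-run", default=None, help="Output RUN collection name."
        )
        return parser

    @classmethod
    def main(cls, argv=None):
        # Parse arguments (from ``argv`` if given, else sys.argv), then a
        # real implementation would construct the builder and call
        # ``build`` exactly once.  Here we just echo the parsed values.
        args = cls.make_argument_parser().parse_args(argv)
        return {"repo": args.repo, "output_run": args.output_run}


print(DemoGraphBuilderCLI.main(["myrepo", "--output-run", "u/demo/run1"]))
# → {'repo': 'myrepo', 'output_run': 'u/demo/run1'}
```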
- classmethod make_argument_parser() ArgumentParser ¶
Make the argument parser for the command-line interface.
- process_subgraph(subgraph: PipelineGraph) QuantumGraphSkeleton ¶
Build the rough structure for an independent subset of the QuantumGraph and query for relevant existing datasets.
- Parameters:
  - subgraph : pipeline_graph.PipelineGraph
    Subset of the pipeline graph that should be processed by this call. This is always resolved and topologically sorted. It should not be modified.
- Returns:
  - skeleton : quantum_graph_skeleton.QuantumGraphSkeleton
    Class representing an initial quantum graph. See quantum_graph_skeleton.QuantumGraphSkeleton docs for details. After this is returned, the object may be modified in-place in unspecified ways.
Notes
In addition to returning a quantum_graph_skeleton.QuantumGraphSkeleton, this method should populate the existing_datasets structure by querying for all relevant datasets with non-empty data IDs (those with empty data IDs will already be present). In particular:
- inputs must always be populated with all overall-input datasets (but not prerequisites), by querying input_collections;
- outputs_for_skip must be populated with any intermediate or output datasets present in skip_existing_in (it can be ignored if skip_existing_in is empty);
- outputs_in_the_way must be populated with any intermediate or output datasets present in output_run, if output_run_exists (it can be ignored if output_run_exists is False). Note that the presence of such datasets is not automatically an error, even if clobber is False, as these may be quanta that will be skipped;
- inputs must also be populated with all prerequisite-input datasets that were included in the skeleton, by querying input_collections (not all prerequisite inputs need to be included in the skeleton, but the base class can only use per-quantum queries to find them, and that can be slow when there are many quanta).
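The bookkeeping described in the notes above can be pictured as a small record with one set per category, filled from separate queries. The sketch below is a conceptual stand-in, assuming plain sets of dataset names; the real existing_datasets structure in the quantum-graph builder base class holds resolved dataset references, not strings.

```python
from dataclasses import dataclass, field


@dataclass
class ExistingDatasets:
    """Conceptual stand-in for the ``existing_datasets`` bookkeeping.

    Field names follow the notes above; the real structure holds resolved
    dataset references rather than plain names.
    """
    inputs: set = field(default_factory=set)
    outputs_for_skip: set = field(default_factory=set)
    outputs_in_the_way: set = field(default_factory=set)


def populate_existing(found_inputs, found_skippable, found_in_run,
                      output_run_exists):
    existing = ExistingDatasets()
    # Overall-input (and prerequisite) datasets found via input_collections.
    existing.inputs.update(found_inputs)
    # Intermediate/output datasets found in skip_existing_in.
    existing.outputs_for_skip.update(found_skippable)
    # Datasets already in output_run matter only if that run exists; their
    # presence is not automatically an error, since those quanta may be
    # skipped rather than clobbered.
    if output_run_exists:
        existing.outputs_in_the_way.update(found_in_run)
    return existing


e = populate_existing({"isr_metadata"}, {"summary"}, {"summary"}, False)
print(sorted(e.inputs), sorted(e.outputs_in_the_way))
# → ['isr_metadata'] []
```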
Dataset types should never be components and should always use the “common” storage class definition in
pipeline_graph.DatasetTypeNode
(which is the data repository definition when the dataset type is registered).