lsst.ctrl.mpexec¶
Command Line Scripts¶
The pipetask command is being ported from an argparse framework to a Click framework. During development, the command implemented using Click is called pipetask2. At some point the current pipetask command will be removed and pipetask2 will be renamed to pipetask.
pipetask2¶
pipetask2 [OPTIONS] COMMAND [ARGS]...
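The synopsis above places the top-level options before the subcommand name and the subcommand's own arguments after it. A minimal Python sketch of assembling such a command line follows; the helper name `pipetask2_argv` and the file name `pipeline.yaml` are illustrative assumptions, not part of the package.

```python
import shlex


def pipetask2_argv(command, command_args, log_level=None, long_log=False):
    """Assemble an argument list in the order: pipetask2 [OPTIONS] COMMAND [ARGS]...

    Hypothetical helper for illustration only; it is not part of lsst.ctrl.mpexec.
    """
    argv = ["pipetask2"]
    # Top-level options come before the subcommand name.
    if log_level is not None:
        argv += ["--log-level", log_level]
    if long_log:
        argv.append("--long-log")
    # Then the subcommand and its own options.
    argv.append(command)
    argv.extend(command_args)
    return argv


# "pipeline.yaml" is a placeholder file name.
argv = pipetask2_argv("build", ["-p", "pipeline.yaml", "--show", "pipeline"], log_level="DEBUG")
print(shlex.join(argv))
# prints: pipetask2 --log-level DEBUG build -p pipeline.yaml --show pipeline
```

Building the arguments as a list and joining them only for display avoids quoting problems when a value contains spaces.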
Options
--log-level <LEVEL|COMPONENT=LEVEL>
The logging level. Supported levels are [CRITICAL|ERROR|WARNING|INFO|DEBUG].
--long-log
Make log messages appear in long format.
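The `--log-level` value takes one of two shapes, a bare `LEVEL` or `COMPONENT=LEVEL`. The sketch below shows one way such a value could be split and checked against the supported levels. It is an illustration under stated assumptions, not the parser used by pipetask2: the function name is hypothetical, and the strict upper-case match is an assumption.

```python
SUPPORTED_LEVELS = {"CRITICAL", "ERROR", "WARNING", "INFO", "DEBUG"}


def parse_log_level(value):
    """Split a --log-level value into (component, level).

    The component is None for a bare LEVEL. Hypothetical helper; the real
    command may handle case and validation differently.
    """
    # rpartition splits on the last "=", so a bare LEVEL yields an empty separator.
    component, sep, level = value.rpartition("=")
    if level not in SUPPORTED_LEVELS:
        raise ValueError(f"unsupported log level: {level!r}")
    return (component if sep else None, level)


print(parse_log_level("DEBUG"))
print(parse_log_level("lsst.daf.butler=INFO"))
```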
build¶
Build and optionally save pipeline definition.
This does not require input data to be specified.
pipetask2 build [OPTIONS]
Options
--show <ITEM|ITEM=VALUE>
Dump various info to standard output. Possible items are:
config, config=[Task::]<PATTERN>, or config=[Task::]<PATTERN>:NOIGNORECASE to dump configuration fields possibly matching given pattern and/or task label;
history= to dump configuration history for a field; the field name is specified as [Task::]<PATTERN>;
dump-config, dump-config=Task to dump complete configuration for a task given its label or all tasks;
pipeline to show pipeline composition;
graph to show information about quanta;
workflow to show information about quanta and their dependency;
tasks to show task composition;
uri to show predicted dataset URIs of quanta.
-p, --pipeline <pipeline>
Location of a pipeline definition file in YAML format.
-t, --task <TASK[:LABEL]>
Task name to add to the pipeline; must be a fully qualified task name. The task name can be followed by a colon and a label name; if no label is given, the task base name (class name) is used as the label.
--delete <LABEL>
Delete task with given label from pipeline.
-c, --config <LABEL:NAME=VALUE>
Config override, as a key-value pair.
-C, --config-file <LABEL:FILE>
Configuration override file(s), applies to a task with a given label.
--order-pipeline
Order tasks in pipeline based on their data dependencies; ordering is performed as the last step before saving or executing the pipeline.
-s, --save-pipeline <save_pipeline>
Location for storing resulting pipeline definition in YAML format.
--pipeline-dot <pipeline_dot>
Location for storing GraphViz DOT representation of a pipeline.
--instrument <instrument>
Add an instrument which will be used to load config overrides when defining a pipeline. This must be the fully qualified class name.
-@, --options-file <options_file>
URI to YAML file containing overrides of command line options. The YAML should be organized as a hierarchy with subcommand names at the top level and the options for each subcommand below it.
Notes:
--task, --delete, --config, --config-file, and --instrument action options can appear multiple times; all values are used, in order left to right.
FILE reads command-line options from the specified file. Data may be distributed among multiple lines (e.g. one option per line). Data after # is treated as a comment and ignored. Blank lines and lines starting with # are ignored.
See 'pipetask --help' for more options.
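The `--task` and `--config` values described above follow the shapes `TASK[:LABEL]` and `LABEL:NAME=VALUE`. The sketch below illustrates how those strings decompose, including the documented default of using the class name as the label. The function names, the task name `mypackage.tasks.ExampleTask`, and the field name `doThing` are hypothetical, and this is not the parsing code used by pipetask2.

```python
def split_task_spec(spec):
    """Split TASK[:LABEL] into (task, label).

    When no label is given, the class name (last dotted component) is used,
    as the --task description states.
    """
    task, _, label = spec.partition(":")
    if not label:
        label = task.rsplit(".", 1)[-1]
    return task, label


def split_config_override(spec):
    """Split LABEL:NAME=VALUE into (label, name, value); the value stays a string."""
    label, colon, assignment = spec.partition(":")
    name, equals, value = assignment.partition("=")
    if not (colon and equals and label and name):
        raise ValueError(f"expected LABEL:NAME=VALUE, got {spec!r}")
    return label, name, value


# Hypothetical task and field names, for illustration only.
print(split_task_spec("mypackage.tasks.ExampleTask"))
print(split_task_spec("mypackage.tasks.ExampleTask:example"))
print(split_config_override("example:doThing=False"))
```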
qgraph¶
Build and optionally save quantum graph.
pipetask2 qgraph [OPTIONS]
Options
--show <ITEM|ITEM=VALUE>
Dump various info to standard output. Possible items are:
config, config=[Task::]<PATTERN>, or config=[Task::]<PATTERN>:NOIGNORECASE to dump configuration fields possibly matching given pattern and/or task label;
history= to dump configuration history for a field; the field name is specified as [Task::]<PATTERN>;
dump-config, dump-config=Task to dump complete configuration for a task given its label or all tasks;
pipeline to show pipeline composition;
graph to show information about quanta;
workflow to show information about quanta and their dependency;
tasks to show task composition;
uri to show predicted dataset URIs of quanta.
-p, --pipeline <pipeline>
Location of a pipeline definition file in YAML format.
-t, --task <TASK[:LABEL]>
Task name to add to the pipeline; must be a fully qualified task name. The task name can be followed by a colon and a label name; if no label is given, the task base name (class name) is used as the label.
--delete <LABEL>
Delete task with given label from pipeline.
-c, --config <LABEL:NAME=VALUE>
Config override, as a key-value pair.
-C, --config-file <LABEL:FILE>
Configuration override file(s), applies to a task with a given label.
--order-pipeline
Order tasks in pipeline based on their data dependencies; ordering is performed as the last step before saving or executing the pipeline.
-s, --save-pipeline <save_pipeline>
Location for storing resulting pipeline definition in YAML format.
--pipeline-dot <pipeline_dot>
Location for storing GraphViz DOT representation of a pipeline.
--instrument <instrument>
Add an instrument which will be used to load config overrides when defining a pipeline. This must be the fully qualified class name.
-g, --qgraph <qgraph>
Location for a serialized quantum graph definition (pickle file). If this option is given, input data options and pipeline-building options cannot be used. Can be a URI.
--qgraph-id <qgraph_id>
Quantum graph identifier; if specified, it must match the identifier of the graph loaded from a file. Ignored if graph is not loaded from a file.
--qgraph-node-id <qgraph_node_id>
Only load a specified set of nodes when the graph is loaded from a file; nodes are identified by integer IDs. One or more comma-separated integers are accepted. By default all nodes are loaded. Ignored if graph is not loaded from a file.
--skip-existing
If all Quantum outputs already exist in the output RUN collection then that Quantum will be excluded from the QuantumGraph. Requires the 'run' command's --extend-run flag to be set.
-q, --save-qgraph <save_qgraph>
URI location for storing a serialized quantum graph definition (pickle file).
--save-single-quanta <save_single_quanta>
Format string of locations for storing individual quantum graph definitions (pickle files). The curly braces {} in the input string will be replaced by a quantum number. Can be a URI.
--qgraph-dot <qgraph_dot>
Location for storing GraphViz DOT representation of a quantum graph.
-b, --butler-config <butler_config>
Location of the gen3 butler/registry config file.
-i, --input <COLLECTION>
Comma-separated names of the input collection(s).
-o, --output <COLL>
Name of the output CHAINED collection. This may either be an existing CHAINED collection to use as both input and output (incompatible with --input), or a new CHAINED collection created to include all inputs (requires --input). In both cases, the collection’s children will start with an output RUN collection that directly holds all new datasets (see --output-run).
--output-run <COLL>
Name of the new output RUN collection. If not provided then --output must be provided and a new RUN collection will be created by appending a timestamp to the value passed with --output. If this collection already exists then --extend-run must be passed.
--extend-run
Instead of creating a new RUN collection, insert datasets into either the one given by --output-run (if provided) or the first child collection of --output (which must be of type RUN).
--replace-run
Before creating a new RUN collection in an existing CHAINED collection, remove the first child collection (which must be of type RUN). This can be used to repeatedly write to the same (parent) collection during development, but it does not delete the datasets associated with the replaced run unless --prune-replaced is also passed. Requires --output, and incompatible with --extend-run.
--prune-replaced <prune_replaced>
Delete the datasets in the collection replaced by --replace-run, either just from the datastore (‘unstore’) or by removing them and the RUN completely (‘purge’). Requires --replace-run.
Options: unstore|purge
-d, --data-query <QUERY>
User data selection expression.
-@, --options-file <options_file>
URI to YAML file containing overrides of command line options. The YAML should be organized as a hierarchy with subcommand names at the top level and the options for each subcommand below it.
Notes:
--task, --delete, --config, --config-file, and --instrument action options can appear multiple times; all values are used, in order left to right.
FILE reads command-line options from the specified file. Data may be distributed among multiple lines (e.g. one option per line). Data after # is treated as a comment and ignored. Blank lines and lines starting with # are ignored.
See 'pipetask --help' for more options.
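Two of the qgraph options above take structured string values: `--qgraph-node-id` accepts comma-separated integers, and `--save-single-quanta` accepts a format string in which `{}` is replaced by a quantum number. The sketch below illustrates both. The function names and the example path are hypothetical, and the use of `str.format` for the substitution is an assumption about the mechanism, not a statement of how pipetask2 implements it.

```python
def parse_node_ids(value):
    """Turn a comma-separated string such as "1,5,12" into a list of integer node IDs."""
    return [int(item) for item in value.split(",")]


def single_quantum_locations(template, quantum_numbers):
    """Expand a --save-single-quanta style template once per quantum number.

    Assumes plain str.format substitution of a single "{}" placeholder.
    """
    return [template.format(number) for number in quantum_numbers]


node_ids = parse_node_ids("1,5,12")
print(node_ids)
# "quanta/quantum-{}.pickle" is a placeholder location.
print(single_quantum_locations("quanta/quantum-{}.pickle", node_ids))
```

Under this assumption, a template containing other literal braces would need them doubled (`{{` and `}}`) to survive the substitution.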
run¶
Build and execute pipeline and quantum graph.
pipetask2 run [OPTIONS]
Options
--debug
Enable debugging output using lsstDebug facility (imports debug.py).
--show <ITEM|ITEM=VALUE>
Dump various info to standard output. Possible items are:
config, config=[Task::]<PATTERN>, or config=[Task::]<PATTERN>:NOIGNORECASE to dump configuration fields possibly matching given pattern and/or task label;
history= to dump configuration history for a field; the field name is specified as [Task::]<PATTERN>;
dump-config, dump-config=Task to dump complete configuration for a task given its label or all tasks;
pipeline to show pipeline composition;
graph to show information about quanta;
workflow to show information about quanta and their dependency;
tasks to show task composition;
uri to show predicted dataset URIs of quanta.
-p, --pipeline <pipeline>
Location of a pipeline definition file in YAML format.
-t, --task <TASK[:LABEL]>
Task name to add to the pipeline; must be a fully qualified task name. The task name can be followed by a colon and a label name; if no label is given, the task base name (class name) is used as the label.
--delete <LABEL>
Delete task with given label from pipeline.
-c, --config <LABEL:NAME=VALUE>
Config override, as a key-value pair.
-C, --config-file <LABEL:FILE>
Configuration override file(s), applies to a task with a given label.
--order-pipeline
Order tasks in pipeline based on their data dependencies; ordering is performed as the last step before saving or executing the pipeline.
-s, --save-pipeline <save_pipeline>
Location for storing resulting pipeline definition in YAML format.
--pipeline-dot <pipeline_dot>
Location for storing GraphViz DOT representation of a pipeline.
--instrument <instrument>
Add an instrument which will be used to load config overrides when defining a pipeline. This must be the fully qualified class name.
-g, --qgraph <qgraph>
Location for a serialized quantum graph definition (pickle file). If this option is given, input data options and pipeline-building options cannot be used. Can be a URI.
--qgraph-id <qgraph_id>
Quantum graph identifier; if specified, it must match the identifier of the graph loaded from a file. Ignored if graph is not loaded from a file.
--qgraph-node-id <qgraph_node_id>
Only load a specified set of nodes when the graph is loaded from a file; nodes are identified by integer IDs. One or more comma-separated integers are accepted. By default all nodes are loaded. Ignored if graph is not loaded from a file.
--skip-existing
If all Quantum outputs already exist in the output RUN collection then that Quantum will be excluded from the QuantumGraph. Requires the 'run' command's --extend-run flag to be set.
-q, --save-qgraph <save_qgraph>
URI location for storing a serialized quantum graph definition (pickle file).
--save-single-quanta <save_single_quanta>
Format string of locations for storing individual quantum graph definitions (pickle files). The curly braces {} in the input string will be replaced by a quantum number. Can be a URI.
--qgraph-dot <qgraph_dot>
Location for storing GraphViz DOT representation of a quantum graph.
-b, --butler-config <butler_config>
Location of the gen3 butler/registry config file.
-i, --input <COLLECTION>
Comma-separated names of the input collection(s).
-o, --output <COLL>
Name of the output CHAINED collection. This may either be an existing CHAINED collection to use as both input and output (incompatible with --input), or a new CHAINED collection created to include all inputs (requires --input). In both cases, the collection’s children will start with an output RUN collection that directly holds all new datasets (see --output-run).
--output-run <COLL>
Name of the new output RUN collection. If not provided then --output must be provided and a new RUN collection will be created by appending a timestamp to the value passed with --output. If this collection already exists then --extend-run must be passed.
--extend-run
Instead of creating a new RUN collection, insert datasets into either the one given by --output-run (if provided) or the first child collection of --output (which must be of type RUN).
--replace-run
Before creating a new RUN collection in an existing CHAINED collection, remove the first child collection (which must be of type RUN). This can be used to repeatedly write to the same (parent) collection during development, but it does not delete the datasets associated with the replaced run unless --prune-replaced is also passed. Requires --output, and incompatible with --extend-run.
--prune-replaced <prune_replaced>
Delete the datasets in the collection replaced by --replace-run, either just from the datastore (‘unstore’) or by removing them and the RUN completely (‘purge’). Requires --replace-run.
Options: unstore|purge
-d, --data-query <QUERY>
User data selection expression.
--clobber-partial-outputs
Remove incomplete outputs from previous execution of the same quantum before new execution.
--do-raise
Raise an exception on error. (else log a message and continue?)
--profile <profile>
Dump cProfile statistics to file name.
-j, --processes <processes>
Number of processes to use.
--start-method <start_method>
Multiprocessing start method; the default is platform-specific.
Options: spawn|fork|forkserver
--timeout <timeout>
Timeout for multiprocessing; maximum wall time (sec).
--fail-fast
Stop processing at first error; the default is to process as many tasks as possible.
--graph-fixup <graph_fixup>
Name of the class or factory method which makes an instance used for execution graph fixup.
--skip-init-writes
Do not write collection-wide ‘init output’ datasets (e.g. schemas).
--init-only
Do not actually run; just register dataset types and/or save init outputs.
--register-dataset-types
Register DatasetTypes that do not already exist in the Registry.
--no-versions
Do not save or check package versions.
-@, --options-file <options_file>
URI to YAML file containing overrides of command line options. The YAML should be organized as a hierarchy with subcommand names at the top level and the options for each subcommand below it.
Notes:
--task, --delete, --config, --config-file, and --instrument action options can appear multiple times; all values are used, in order left to right.
FILE reads command-line options from the specified file. Data may be distributed among multiple lines (e.g. one option per line). Data after # is treated as a comment and ignored. Blank lines and lines starting with # are ignored.
See 'pipetask --help' for more options.
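The collection options above (`--output`, `--output-run`, `--extend-run`, `--replace-run`, `--prune-replaced`) carry several stated requirements and incompatibilities. The sketch below collects only the rules written in the option descriptions into a single check. The function name and message wording are hypothetical, it does not inspect any registry (so rules about existing collections are out of scope), and it is not the validation code used by pipetask2.

```python
def check_collection_options(output=None, output_run=None, extend_run=False,
                             replace_run=False, prune_replaced=None):
    """Return a list of violations of the documented option constraints.

    Hypothetical helper; an empty list means no documented rule is violated.
    """
    problems = []
    # "If not provided then --output must be provided" (from --output-run).
    if output_run is None and output is None:
        problems.append("--output is required when --output-run is not given")
    # "Requires --output, and incompatible with --extend-run" (from --replace-run).
    if replace_run and output is None:
        problems.append("--replace-run requires --output")
    if replace_run and extend_run:
        problems.append("--replace-run is incompatible with --extend-run")
    # "Requires --replace-run" and "Options: unstore|purge" (from --prune-replaced).
    if prune_replaced is not None:
        if not replace_run:
            problems.append("--prune-replaced requires --replace-run")
        if prune_replaced not in ("unstore", "purge"):
            problems.append("--prune-replaced must be 'unstore' or 'purge'")
    return problems


# "u/someone/demo" is a placeholder collection name.
print(check_collection_options(output="u/someone/demo", replace_run=True, prune_replaced="purge"))
print(check_collection_options(output="u/someone/demo", prune_replaced="purge"))
```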
Contributing¶
lsst.ctrl.mpexec
is developed at https://github.com/lsst/ctrl_mpexec.
You can find Jira issues for this module under the ctrl_mpexec component.
Python API reference¶
lsst.ctrl.mpexec Package¶
Functions¶
graph2dot(qgraph, file) | Convert QuantumGraph into GraphViz digraph.
pipeline2dot(pipeline, file) | Convert Pipeline into GraphViz digraph.
Classes¶
CmdLineFwk() | PipelineTask framework which executes tasks from command line.
ExecutionGraphFixup | Interface for classes which update quantum graphs before execution.
MPGraphExecutor(numProc, timeout, …[, …]) | Implementation of QuantumGraphExecutor using same-host multiprocess execution of Quanta.
MPGraphExecutorError | Exception class for errors raised by MPGraphExecutor.
MPTimeoutError | Exception raised when task execution times out.
PreExecInit(butler, taskFactory[, skipExisting]) | Initialization of registry for QuantumGraph execution.
QuantumExecutor | Class which abstracts execution of a single Quantum.
QuantumGraphExecutor | Class which abstracts QuantumGraph execution.
SingleQuantumExecutor(taskFactory[, …]) | Executor class which runs one Quantum at a time.
TaskFactory | Class instantiating PipelineTasks.