lsst-ctrl-mpexec 27.0.0 (2024-05-29)¶
New Features¶
Made input/output collection consistency checks more permissive, and added a --rebase option to pipetask run and pipetask qgraph to force consistency.
An existing output collection is now considered consistent with a given sequence of input collections if the latter is a contiguous subsequence of the former. When this is not the case, --rebase redefines the output collection so that it is. (DM-37140)
Updated the open-source license to allow the code to be distributed under either the GPLv3 or the BSD 3-clause license. (DM-37231)
Added pipeline-graph and task-graph options for pipetask build --show, which provide text-art visualization of pipeline graphs. (DM-39779)
Added pipetask report, which reads a quantum graph and reports on the outputs of failed, produced and missing quanta. This is a command-line incarnation of QuantumGraphExecutionReport.make_reports in combination with QuantumGraphExecutionReport.write_summary_yaml. (DM-41131)
Added --summary option to pipetask qgraph. (DM-41542)
Added an option to print pipetask report information to the command line using Astropy tables, and made it the default. The command now unpacks a more human-readable dictionary from lsst.pipe.base.QuantumGraphExecutionReport.to_summary_dict and prints summary tables of quanta and datasets to the command line. Error messages and associated data IDs can now be saved to a YAML file in the working directory, and optionally printed to the screen as well. (DM-41606)
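The --rebase consistency rule above (the input collections must form a contiguous subsequence of the existing output collection) can be illustrated with a small check. This is a minimal sketch, not the code pipetask uses; the function name and the collection names are hypothetical.

```python
from collections.abc import Sequence


def is_contiguous_subsequence(inputs: Sequence[str], chain: Sequence[str]) -> bool:
    """Return True if ``inputs`` appears in ``chain`` as one unbroken run."""
    n = len(inputs)
    if n == 0:
        # An empty input sequence is trivially contained in any chain.
        return True
    wanted = tuple(inputs)
    # Slide a window of length n over the chain and compare each slice.
    return any(
        tuple(chain[start:start + n]) == wanted
        for start in range(len(chain) - n + 1)
    )


# Hypothetical children of an existing output collection, in search order.
existing_chain = ["u/me/run2", "u/me/run1", "HSC/defaults", "refcats"]

consistent = is_contiguous_subsequence(["HSC/defaults", "refcats"], existing_chain)
reordered = is_contiguous_subsequence(["refcats", "HSC/defaults"], existing_chain)
```

Under this sketch, the first call is consistent, while the reordered inputs are not and would be a case for --rebase.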
API Changes¶
SeparablePipelineExecutor.run_pipeline now takes a num_proc parameter that specifies how many subprocesses can be used to execute the pipeline. The default is now 1 (no spawning), which is a change from the previous behavior of using 80% of the available cores. (DM-42751)
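Callers that relied on the old default may want to compute a process count themselves and pass it as num_proc. The helper below is a hypothetical sketch; the exact rounding the previous implementation used is an assumption, and the executor call is shown only as a comment because it needs an LSST environment.

```python
import os


def legacy_num_proc(fraction: float = 0.8) -> int:
    """Approximate the old default: a fraction of the available cores.

    The rounding (truncate, but never below 1) is an assumption for
    illustration, not taken from the ctrl_mpexec source.
    """
    cores = os.cpu_count() or 1  # os.cpu_count() may return None
    return max(1, int(cores * fraction))


num_proc = legacy_num_proc()
# With an existing executor and quantum graph (both assumed here):
# executor.run_pipeline(graph, num_proc=num_proc)
```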
Bug Fixes¶
Removed shadowing of pipetask build -t by pipetask qgraph -t. -t now means --task (the original meaning) rather than --transfer. (DM-35599)
Fixed a storage class bug in registering dataset types in pipetask run.
Prior to this fix, multiple storage classes associated with the same dataset type in a pipeline could cause the registered dataset type’s storage class to be nondeterministic in regular pipetask run execution (but not in quantum-backed butler execution). It now follows the rules set by PipelineGraph, in which the definition in the task that produces the dataset wins. (DM-41962)
Ensured that the implicit threading option for run-qbb is used, so that implicit threading can be disabled. (DM-42118)
Fixed dump_kwargs TypeError caused by migration to Pydantic 2. (DM-42376)
Fixed the --show-errors option in pipetask report.
The option is now correctly passed to the function as a flag. Tests now use the --show-errors option to avoid saving YAML files to disk without adequate cleanup. (DM-43363)
Fixed BPS auto-retry functionality broken on DM-43060, by restoring support for repeated execution of already-successful quanta in pipetask run-qbb. (DM-43484)
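The storage class rule mentioned for DM-41962 (the definition in the producing task wins) can be pictured with a toy resolver. This is a sketch under stated assumptions: Connection, resolve_storage_class, and the task and dataset names are hypothetical stand-ins, not PipelineGraph API.

```python
from __future__ import annotations

from typing import NamedTuple


class Connection(NamedTuple):
    """Hypothetical record of one task connection to a dataset type."""

    task: str
    dataset_type: str
    storage_class: str
    is_output: bool


def resolve_storage_class(dataset_type: str, connections: list[Connection]) -> str | None:
    """Pick the storage class declared by the task that produces the dataset."""
    for conn in connections:
        if conn.dataset_type == dataset_type and conn.is_output:
            return conn.storage_class
    # No producer in this pipeline: this sketch leaves the choice undetermined.
    return None


connections = [
    Connection("consumerTask", "catalog", "DataFrame", is_output=False),
    Connection("producerTask", "catalog", "ArrowAstropy", is_output=True),
]
registered = resolve_storage_class("catalog", connections)
```

The result no longer depends on the order in which tasks are visited, which is the nondeterminism the fix removes.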
Other Changes and Additions¶
Dropped support for Pydantic 1.x. (DM-42302)
An API Removal or Deprecation¶
Support for the fork option in pipetask run has been removed as unsafe. The default start option is now spawn, and forkserver is also available. The fork option is still present in the CLI for compatibility, but it is deprecated and replaced by spawn if specified. (DM-41832)
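The start-method handling described above can be mimicked with the standard multiprocessing module. This is only a sketch of the described behavior; resolve_start_method is a hypothetical helper, and the warning category is an assumption.

```python
from __future__ import annotations

import multiprocessing
import warnings


def resolve_start_method(requested: str | None) -> str:
    """Map a requested start method to the one actually used."""
    if requested is None:
        return "spawn"  # the new default
    if requested == "fork":
        # Deprecated: accepted for compatibility, silently replaced by spawn.
        warnings.warn("'fork' is deprecated; using 'spawn' instead", FutureWarning)
        return "spawn"
    if requested not in ("spawn", "forkserver"):
        raise ValueError(f"Unsupported start method: {requested!r}")
    return requested


# A context bound to the resolved method; pools and processes created
# from it use that method regardless of the platform default.
ctx = multiprocessing.get_context(resolve_start_method("fork"))
```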
lsst-ctrl-mpexec v26.0.0 (2023-09-23)¶
New Features¶
Added support for executing a quantum graph using Quantum-backed butler. pipetask adds two new commands to support execution with Quantum-backed butler, mostly useful for BPS:
pre-exec-init-qbb, which runs the PreExecInit step of the execution to produce InitOutputs.
run-qbb, which executes a QuantumGraph (or individual quanta) using Quantum-backed butler. (DM-33497)
Added --coverage and --cov-packages to pipetask commands to allow for code coverage calculations when running. (DM-34420)
Added SeparablePipelineExecutor, a pipeline executor midway in capability between SimplePipelineExecutor and CmdLineFwk. SeparablePipelineExecutor is designed to be run from Python, and lets the caller decide when each pipeline processing step is carried out. It also allows certain pipeline steps to be customized by passing alternate implementations of execution strategies (e.g., custom graph builder). (DM-36162)
pipetask will now produce a QuantumGraph with resolved output references, even with the execution butler option. (DM-37582)
Added new command update-graph-run to pipetask. It updates an existing quantum graph with a new output run name and regenerates output dataset IDs. (DM-38780)
Added new command line options --cores-per-quantum and --memory-per-quantum. These can be used to pass some execution context into a quantum, allowing that quantum to change how it executes (for example, by using multiple threads). (DM-39661)
Made it possible to force failures in mocked pipelines from the command line. (DM-39672)
The output of pipetask ... --show=graph now includes extended information about dataset references and their related datastore records. (DM-40254)
API Changes¶
Several modifications to multiple classes to support execution with Quantum-backed butler:
The CmdLineFwk class adds two new methods: preExecInitQBB, which only runs the PreExecInit step of the execution to produce InitOutputs; and runGraphQBB, which executes a QuantumGraph using Quantum-backed butler.
Abstract classes QuantumExecutor and QuantumGraphExecutor do not accept a Butler instance in their execute() methods.
MPGraphExecutor and SingleQuantumExecutor methods updated to reflect the above change and support execution with either a full Butler or a LimitedButler.
New class PreExecInitLimited, which performs pre-exec-init in the case of Quantum-backed butler. The code that it shares with the regular PreExecInit class is now in their common base class PreExecInitBase. (DM-33497)
Added new resources parameter to SingleQuantumExecutor, SimplePipelineExecutor, and SeparablePipelineExecutor constructors. This optional parameter is an ExecutionResources object and allows the execution context to be passed into the runQuantum method. (DM-39661)
Bug Fixes¶
Fixed SingleQuantumExecutor class to correctly handle the case with clobberOutputs=True and skipExistingIn=None. The documentation says that complete quantum outputs should be removed in this case, but they were not removed. (DM-38601)
Other Changes and Additions¶
SingleQuantumExecutor has been modified such that it no longer unresolves DatasetRef when putting the non-PipelineTask datasets (such as packages and configs). This has been done so that the refs in the quantum graph are preserved when they are written to a normal Butler.
Fixed a race condition when pipetask run creates the graph with a timestamped output run and then executes it. Previously the graph creation and run execution phases each calculated their own timestamped output run, and it was possible for the execution output run to be one second later than the graph run. Previously this did not matter (the graph run was being ignored), but with the change to always use the DatasetRef from the graph it becomes critical that they match. (DM-38779)
Revived the previously-bitrotted pipeline mocking system.
Most of the implementation has been moved to pipe_base, and the point at which mocking occurs has moved from execution to just before QuantumGraph generation, which changes which pipetask subcommands the --mock option is valid for. (DM-38952)
Updated the directed graph color scheme to make node text easier to read. The previous pipeline directed graph nodes used dark gray as their background color, and black text on that background was reported to be difficult to read. While exploring color schemes for this ticket, it emerged that using the Rubin visual identity colors may be desirable, since it helps make LSST pipeline graphs more instantly recognizable as Rubin-associated products. Colors: https://rubin.canto.com/g/RubinVisualIdentity (DM-39294)
The saveMetadata configuration field is now ignored by executors in this package; metadata is assumed to be saved for each task. (DM-39377)
Improved logging and removed some obsolete code paths in SingleQuantumExecutor. (DM-40332)
Command line help for pipetask run has been updated to reflect its correct clobbering behavior.
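The timestamped output run race (DM-38779) comes down to computing a time-dependent name twice. A minimal sketch of the safe pattern, computing the name once and handing it to both phases, is shown below; the helper name and the timestamp format are assumptions for illustration.

```python
from __future__ import annotations

from datetime import datetime, timezone


def timestamped_run(output: str, now: datetime | None = None) -> str:
    """Build a run collection name by appending a UTC timestamp."""
    if now is None:
        now = datetime.now(timezone.utc)
    return f"{output}/{now.strftime('%Y%m%dT%H%M%SZ')}"


# Compute the run name exactly once ...
run_name = timestamped_run("u/someone/output")

# ... and pass the same string to both phases, instead of letting each
# phase call timestamped_run() and risk names one second apart.
graph_phase_run = run_name
execution_phase_run = run_name
```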
lsst-ctrl-mpexec v25.0.0 (2023-02-28)¶
New Features¶
Added support for transferring files into execution butler. (DM-35494)
Added documentation on how to use --show and --config.
A pipeline will now never execute if the --show option is used with pipetask run.
The --config option can now accept list configuration values (with or without square brackets), for example --config task:listItem=a,b or --config "task:listItem=[a,b]".
The --config-file option can now take comma-separated file names for multiple config files. (DM-35917)
Added additional quanta information to be displayed by the logger, showing the number of quanta per task. (DM-36145)
If pipetask is run with multiple processes and a butler datastore cache is configured, all subprocesses will now share the same cache. For large numbers of simultaneous processes it may be necessary to significantly increase the number of datasets in the cache to make the cache usable. This can be done by using the $DAF_BUTLER_CACHE_EXPIRATION_MODE environment variable.
Previously each subprocess would get its own cache, and if the fork start method was used these cache directories would not be cleaned up. (DM-36412)
Always disable implicit threading (e.g. in OpenBLAS) by default in pipetask run, even when not using -j.
The new --enable-implicit-threading option can be used to turn it back on. (DM-36831)
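To make the two accepted spellings of a list-valued --config override concrete, the sketch below splits such a string into its parts. It is not the parser pipetask uses; parse_list_override is a hypothetical helper that assumes the target field is already known to be a list of strings.

```python
from __future__ import annotations


def parse_list_override(text: str) -> tuple[str, str, list[str]]:
    """Split 'label:field=a,b' or 'label:field=[a,b]' into its parts."""
    target, _, value = text.partition("=")
    label, _, field = target.partition(":")
    value = value.strip()
    # Square brackets are optional, so strip one matching pair if present.
    if value.startswith("[") and value.endswith("]"):
        value = value[1:-1]
    items = [item.strip() for item in value.split(",") if item.strip()]
    return label, field, items


plain = parse_list_override("task:listItem=a,b")
bracketed = parse_list_override("task:listItem=[a,b]")
```

Both spellings yield the same label, field name, and list of items in this sketch.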
API Changes¶
SimplePipelineExecutor factory methods add a bind parameter for bind values to use with the user expression. (DM-36487)
lsst-ctrl-mpexec v24.0.0 (2022-08-26)¶
New Features¶
Added --dataset-query-constraint option to pipetask qgraph command (and thus downstream commands) that allows a user to control how QuantumGraph creation is constrained by dataset existence. (DM-31769)
Builds using setuptools now calculate versions from the Git repository, including the use of alpha releases for those associated with weekly tags. (DM-32408)
Added --summary option to pipetask run command; it produces a JSON report on the execution status of the whole process and individual quanta. (DM-33481)
Added pipetask CLI commands purge and cleanup. (DM-33634)
Removed dependency on the obs_base and afw packages. Now only depends on pipe_base and daf_butler (along with pex_config and utils). (DM-34105)
Replaced the unused --do-raise option with --pdb, which drops the user into the debugger (pdb by default, but --pdb=ipdb also works if you have ipdb installed) on an exception. (DM-34215)
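The behavior of an option like --pdb can be approximated in plain Python by importing the named debugger module and starting a post-mortem session when an exception escapes. This is a sketch of the general technique, not the pipetask implementation; run_with_debugger is a hypothetical helper.

```python
import importlib
from typing import Any, Callable


def run_with_debugger(func: Callable[[], Any], debugger: str = "pdb") -> Any:
    """Call ``func``; on an exception, open a post-mortem debugger session."""
    try:
        return func()
    except Exception as exc:
        # Import lazily so that an optional debugger such as ipdb is only
        # required when it is actually requested.
        module = importlib.import_module(debugger)
        module.post_mortem(exc.__traceback__)
        raise


# Interactive usage (opens a debugger prompt, so it is left commented out):
# run_with_debugger(lambda: 1 / 0)                    # standard pdb
# run_with_debugger(lambda: 1 / 0, debugger="ipdb")   # needs ipdb installed
```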
Bug Fixes¶
Other Changes and Additions¶
Added lsst.ctrl.mpexec.SimplePipelineExecutor, a minimal high-level Python interface for PipelineTask execution intended primarily for unit testing. (DM-31966)
lsst-ctrl-mpexec v23.0.1 (2022-02-02)¶
Miscellaneous Changes of Minor Interest¶
Allow pipetask run execution to continue in single-process mode after failure of one or more tasks. Previously execution stopped on an exception from any task. (DM-33339)
lsst-ctrl-mpexec v23.0.0 (2021-12-10)¶
New Features¶
Several improvements in pipetask execution options:
New option --skip-existing-in, which takes collection name(s); if output datasets already exist in those collections, the corresponding quanta are skipped.
The --skip-existing option is now equivalent to appending the output run collection to the --skip-existing-in list.
The --extend-run option implicitly enables the --skip-existing option.
The --prune-replaced=unstore option only removes regular output datasets; InitOutputs, task configs, and package versions are not removed. (DM-27492)
GraphViz dot files generated by pipetask now include more information (RUN collection for datasets, dimensions for tasks, data IDs for quanta). (DM-28111)
pipetask qgraph can now generate a standalone “execution butler”, which is a SQLite registry with all the expected outputs pre-filled. Using this registry allows pipetask run to execute without touching the main registry whilst still writing file artifacts to the standard location. It is not yet configured to allow completely detached processing using a local datastore, but this can be changed manually after creation to use a chained datastore. (DM-28649)
Log messages issued during quantum execution are now collected and stored in butler as tasklabel_log dataset types.
New command line options for logging have been added to pipetask. These include --log-file to write log messages to a file and --no-log-tty to disable log output to the terminal. (DM-30977)
Add the output run to the log record.
Add --log-label option to pipetask command to allow extra information to be injected into the log record. (DM-31884)
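The relationship between --skip-existing-in, --skip-existing, and --extend-run described above can be summarized as a small resolution function. This is a sketch of the stated rules only; the function and argument names are hypothetical and do not mirror the pipetask code.

```python
from __future__ import annotations

from collections.abc import Iterable


def resolve_skip_collections(
    output_run: str,
    skip_existing_in: Iterable[str] = (),
    skip_existing: bool = False,
    extend_run: bool = False,
) -> tuple[str, ...]:
    """Return the collections searched for existing outputs."""
    collections = list(skip_existing_in)
    # --extend-run implicitly enables --skip-existing.
    if extend_run:
        skip_existing = True
    # --skip-existing is equivalent to appending the output run.
    if skip_existing and output_run not in collections:
        collections.append(output_run)
    return tuple(collections)


searched = resolve_skip_collections(
    "u/someone/run1", skip_existing_in=["u/someone/previous"], extend_run=True
)
```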
Bug Fixes¶
Miscellaneous Changes of Minor Interest¶
Add some of the pipetask command line options to QuantumGraph metadata (DM-30702)
lsst-ctrl-mpexec v22.0 (2021-04-01)¶
New Features¶
pipetask run can now execute a subset of a graph. This allows a single graph file to be created with an entire workflow and then only part of it to be executed. This is very important for large scale workflow execution. [DM-27667]
Performance Enhancement¶
Multi-processing execution performance has been significantly improved for large graphs. [DM-28418]
Other¶
Ignore --input instead of rejecting it if it hasn’t changed. [DM-28101]
The graph file format has been changed from a pickle file to a form that can efficiently be accessed from an object store. This new format has a .qgraph file extension. [DM-27784]
A full URI can now be used to specify the location of the quantum graph. [DM-27682]