Parallel processing with command-line tasks
Command-line tasks can operate in parallel.
Generally, if a command-line task is run against several data IDs, each data ID can be processed in parallel.
Command-line tasks use two different categories of parallel processing: single-node parallel processing (the -j argument) and distributed processing (with the pipe_drivers and ctrl_pool packages).
This page describes how to use these parallel processing features.
How to enable single-node parallel processing
Command-line tasks can process multiple data IDs in parallel across multiple CPUs of one machine if you provide a -j/--processes argument with a value greater than 1.
For example, to spread processing across up to eight CPUs:
task.py REPOPATH --output output --id -j 8 ...
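As a more concrete, hypothetical illustration (the repository path and data ID value here are placeholders, and the data ID keys depend on your camera), the processCcd.py command-line task could be run on four CPUs with:
processCcd.py DATA_REPO --output output --id visit=903334 -j 4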
When operating with -j, command-line tasks have a default per-process timeout of 720 hours.
You can change this timeout with the --timeout command-line argument.
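For example, to shorten the timeout (assuming here that --timeout takes its value in seconds, so 86400 corresponds to 24 hours):
task.py REPOPATH --output output --id -j 8 --timeout 86400 ...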
Log messages from each process are combined into the same console output stream.
To discern logging statements from each process, use the --longlog argument to enable a verbose logging format that includes the data ID being processed.
See Using the verbose logging format for more information.
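For example, following the same generic invocation used above:
task.py REPOPATH --output output --id -j 8 --longlog ...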
Note
Internally, -j multiprocessing is implemented with the multiprocessing module in the Python standard library.
If a task does not support -j parallel processing, it will report a non-fatal warning:
This task does not support multiprocessing; using one process
Specifying -j 1, the default, disables multiprocessing altogether.
How to enable distributed processing
Command-line tasks implemented in the pipe_drivers package can distribute processing across multiple nodes using MPI or batch submission to clusters (PBS or SLURM).
The --batchType argument controls which processing system is used.
--batchType None disables multiprocessing for “driver” tasks.
See the pipe_drivers documentation for more information.
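As a hedged sketch (singleFrameDriver.py is one of the pipe_drivers tasks, but the specific batch options and their values are assumptions here and depend on your site and pipe_drivers version; check its documentation), a Slurm submission might look like:
singleFrameDriver.py REPOPATH --output output --id --batchType slurm --cores 16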
How to control threading while parallel processing
The LSST Science Pipelines use low-level math libraries that support threading.
OpenBLAS and MKL default to using as many threads as there are CPUs on the machine.
When you are already parallelizing at a higher level (with -j, for example), threading in these libraries can cause a net decrease in performance because of thread contention.
When running with -j multiprocessing or via ctrl_pool, command-line tasks check whether these low-level libraries are using their default threading behavior (as many threads as CPUs).
If so, the task automatically disables multi-threading in those libraries and emits a warning:
WARNING: You are using OpenBLAS with multiple threads (16), but have not specified the number of threads using one of the OpenBLAS environment variables: OPENBLAS_NUM_THREADS, GOTO_NUM_THREADS, OMP_NUM_THREADS. This may indicate that you are unintentionally using multiple threads, which may cause problems. WE HAVE THEREFORE DISABLED OpenBLAS THREADING. If you know what you are doing and want threads enabled implicitly, set the environment variable LSST_ALLOW_IMPLICIT_THREADS.
The recommended resolution to this warning is to explicitly limit the number of threads used by OpenBLAS and MKL.
For example, in a bash or similar shell, set the OMP_NUM_THREADS environment variable:
export OMP_NUM_THREADS=1
Note
OMP_NUM_THREADS is recognized by both OpenBLAS and MKL, so it is likely the only environment variable that needs to be set.
Alternatively, the specific environment variables recognized by each library are:
- OpenBLAS: OPENBLAS_NUM_THREADS, GOTO_NUM_THREADS, and OMP_NUM_THREADS.
- MKL: MKL_NUM_THREADS, MKL_DOMAIN_NUM_THREADS, and OMP_NUM_THREADS.
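For example, to pin both libraries to a single thread in a bash or similar shell, using the library-specific variables listed above:
export OPENBLAS_NUM_THREADS=1
export MKL_NUM_THREADS=1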
If necessary, you may bypass a command-line task’s control over threading in low-level math libraries by setting the LSST_ALLOW_IMPLICIT_THREADS environment variable.
For example, in a bash shell:
export LSST_ALLOW_IMPLICIT_THREADS=1
See also
Functions in lsst.base implement threading detection and control.