Use

Use of this plugin by BPS is triggered through the BPS configuration file’s wmsServiceClass entry, which you should set to lsst.ctrl.bps.parsl.ParslService.

The computeSite entry should be set to a value of your choice, representative of the computing site in use. For example, I use local for running on a single machine, and tiger for running on the Princeton Tiger cluster. The site is then configured by settings under site.<computeSite> (this scheme allows simple switching between different sites, and different configurations of the same site). A site should have a class entry, which is the fully-qualified python name of a subclass of lsst.ctrl.bps.parsl.SiteConfig. Beyond that, the configuration of a site depends on the particular site configuration class chosen. See the section on Sites for details on available site configuration classes, and what configuration entries are available.

Here’s an example BPS configuration file for running the ci_hsc dataset:

pipelineYaml: "${DRP_PIPE_DIR}/pipelines/HSC/DRP-ci_hsc.yaml"
wmsServiceClass: lsst.ctrl.bps.parsl.ParslService
computeSite: tiger
parsl:
  log_level: DEBUG
site:
  local:
    class: lsst.ctrl.bps.parsl.sites.Local
    cores: 10
  tiger:
    class: lsst.ctrl.bps.parsl.sites.princeton.Tiger
    nodes: 1
    walltime: "0:59:00"  # Under the 1 hour limit for qos=tiger-test

Note that there are two sites configured:

  1. local, which uses the Local site configuration with 10 cores; and

  2. tiger, which uses the Tiger site configuration with a single node and almost 1 hour walltime.

It’s currently configured (through computeSite) to use the tiger site, but switching between these two is simply a matter of changing the computeSite value.

Configuration

The following configuration settings can be used in configuring the plugin:

  • parsl.log_level (str): logging level for Parsl; may be one of CRITICAL, DEBUG, ERROR, FATAL, INFO, WARN.

  • project (str): project name; defaults to bps.

  • campaign (str): campaign name; defaults to the user name (which can also be set via the username setting).

  • subDirTemplate (str): template used to define log subdirectories in order to avoid having too many files in a single directory; defaults to a very generic template defined by ctrl_bps in bps_defaults.yaml. To run with no subdirectories, in the submit yaml set subDirTemplate to the empty string (subDirTemplate: '').

The workflow job name is taken to be <project>.<campaign>.

Sites

All sites respect the following settings (under site.<computeSite>):

  • commandPrefix (str): command(s) to use as a prefix to executing a job command on a worker.

  • environment` (bool): add bash commands that replicate the environment on the driver/submit machine?

  • retries (int): number of times to retry a job that fails; defaults to 1.

  • run_dir (str): work directory for Parsl to store its running data, including logs (Default: runinfo).

The following sites are provided by the ctrl_bps_parsl package.

Local

lsst.ctrl.bps.parsl.sites.Local uses a ThreadPoolExecutor to execute the workflow on the local machine. Required settings are:

  • cores (int): number of cores to use.

Slurm

lsst.ctrl.bps.parsl.sites.Slurm uses a HighThroughputExecutor and SlurmProvider to execute the workflow on a Slurm cluster. This class can be used directly by providing the necessary values in the BPS configuration, or by subclasssing and setting values in the subclass. When used directly, required settings are:

  • nodes (int): number of nodes for each Slurm job.

  • walltime (str): time limit for each Slurm job.

Caution

walltime colon-delimited values should always be enclosed in double-quotes, to avoid YAML parsing them differently than you intend.

Optional settings are:

  • cores_per_node (int): number of cores per node for each Slurm job; by default we use all cores on the node.

  • mem_per_node (int): memory per node (GB) for each Slurm job; by default we use whatever Slurm gives us.

  • qos (str): quality of service to request for each Slurm job; by default we use whatever Slurm gives us.

  • scheduler_options (str): text to prepend to the Slurm submission script (each line usually starting with #SBATCH); empty string by default.

TripleSlurm

lsst.ctrl.bps.parsl.sites.TripleSlurm uses three HighThroughputExecutors and SlurmProviders to execute the workflow on a Slurm cluster. The small, medium and large executors may have different memory limits, allowing jobs to be sent to different allocations depending upon their requirements. This class can be used directly by providing the necessary values in the BPS configuration, or by subclasssing and setting values in the subclass. The TripleSlurm site respects the same settings as for Slurm (except for walltime), plus the following optional settings:

  • small_memory (float): memory per worker (GB) for each ‘small’ Slurm job (default: 2.0).

  • medium_memory (float): memory per worker (GB) for each ‘medium’ Slurm job (default: 4.0).

  • large_memory (float): memory per worker (GB) for each ‘large’ Slurm job (default: 8.0).

  • small_walltime (str): time limit for each ‘small’ Slurm job (default: 10 hours).

  • medium_walltime (str): time limit for each ‘medium’ Slurm job (default: 10 hours).

  • large_walltime (str): time limit for each ‘large’ Slurm job (default: 40 hours).

Specifying walltime (as for the Slurm site) would override the individual small_walltime, medium_walltime and large_walltime values.

Warning

All the *_walltime colon-delimited values should always be enclosed in double-quotes, to avoid YAML parsing them differently than you intend.

Tiger

lsst.ctrl.bps.parsl.sites.princeton.Tiger is intended for use with the Princeton Tiger cluster. It subclasses Slurm and adds some suitable customisation. By default, a block of 4 nodes of 40 cores each run while another block waits in the queue. Optional settings are:

  • nodes (int): number of nodes for each Slurm job.

  • cores_per_node (int): number of cores per node for each Slurm job.

  • walltime (str): time limit for each Slurm job.

  • mem_per_node (int): memory per node (GB) for each Slurm job.

  • max_blocks (int): number of blocks (Slurm jobs) to use; one will execute while the others wait.

  • cmd_timeout (int): timeout (seconds) to wait for a scheduler.

CoriKnl

lsst.ctrl.bps.parsl.sites.nersc.CoriKnl is intended for use with the NERSC Cori-KNL cluster. It subclasses TripleSlurm and adds some customisation. Required and optional settings are the same as for TripleSlurm.

Sdf

lsst.ctrl.bps.parsl.sites.slac.Sdf is intended to be used with the Rubin partition at the SLAC Shared Scientific Data Facility (SDF). It subclasses Slurm and adds some suitable customisation. By default, a block of 1 node of 100 cores runs while another block waits in the queue. Optional settings are:

  • nodes (int): number of nodes for each Slurm job.

  • cores_per_node (int): number of cores per node for each Slurm job.

  • walltime (str): time limit for each Slurm job.

  • mem_per_node (int): memory per node (GB) for each Slurm job.

  • max_blocks (int): number of blocks (Slurm jobs) to use; one will execute while the others wait.

  • cmd_timeout (int): timeout (seconds) to wait for a scheduler.

LocalSrunWorkQueue

lsst.ctrl.bps.parsl.sites.work_queue.LocalSrunWorkQueue uses a LocalProvider with a WorkQueueExecutor to manage resources on single- or multi-node allocations. For multi-node allocations, Slurm’s srun command is used to launch jobs via an SrunLauncher. This implementation uses the work_queue module to schedule jobs with specific resource requests (e.g., memory, cpus, wall time, disk space), taking into account the available resources on the nodes.

Ccin2p3

lsst.ctrl.bps.parsl.sites.Ccin2p3 is intended to be used with the Slurm farm at CC-IN2P3. It uses a HighThroughputExecutor and SlurmProvider to execute the workflow on the site’s Slurm cluster. The small, medium, large and xlarge executors may have different memory limits, allowing jobs to be sent to different partitions depending upon their requirements, in particular their memory requirements.

Optional settings that apply to all executors are:

  • account (str): account to charge the Slurm resource utilization to. (Default: lsst)

  • partition (str): Slurm partition to submit the jobs to. (Default: lsst,htc)

  • qos (str): quality-of-service to request to Slurm for executing the job. (Default: normal)

  • walltime (str): time limit for each Slurm job. (Default: 72:00:00)

For each executor, you can override the default values above and in addition specify values for the optional settings below:

  • max_blocks (int): maximum number of Slurm jobs to execute simultaneously. (Defaults: 3000 for executor small, 1000 for medium, 100 for large and 10 for xlarge)

  • memory (int): memory per node in GB. (Defaults: 4 GB for executor small, 10 GB for medium, 50 GB for large and 150 GB for xlarge)

Adding a site

You don’t need to use a site configuration that’s already contained in the ctrl_bps_parsl package. You can write a subclass of lsst.ctrl.bps.parsl.SiteConfig, define the two abstract methods (get_executors() and select_executor()), and override any other methods that need to be customised. You should place your SiteConfig subclass somewhere on your PYTHONPATH, and then set the site.<computeSite>.class to the fully-qualified name of your SiteConfig subclass.

If you think your site configuration might be of use to others, we can incorporate it into ctrl_bps_parsl; please file an issue on GitHub.

Monitoring

Turning on Parsl’s monitoring feature allows tracking the progress of the workflow using a web browser. The site settings that support monitoring are:

  • monitorEnable (bool): enable monitor? Defaults to false.

  • monitorInterval (float): time interval (sec) between logging of resource usage. Defaults to 30.

  • monitorFilename (str): name of file to use for the monitor sqlite database. Defaults to monitor.sqlite.

Once the workflow is running, point the parsl-visualize executable to the monitoring database, e.g.:

parsl-visualize sqlite:////path/to/monitor.sqlite

Note

Yes, that’s four slashes!

Then you can point your web browser to the machine serving the visualisation, on the default port of 8080. You will likely have to use an ssh tunnel to expose that port, e.g.:

ssh -L 8080:localhost:8080 username@headnode

Then you can point your browser to localhost:8080.