Tiger
- class lsst.ctrl.bps.parsl.sites.princeton.Tiger(config: BpsConfig, add_resources: bool = False)
Bases: Slurm
Configuration for running jobs on Princeton’s Tiger cluster.
The following BPS configuration parameters are recognised, overriding the defaults:
- nodes (int): number of nodes for each Slurm job.
- cores_per_node (int): number of cores per node for each Slurm job.
- walltime (str): time limit for each Slurm job.
- mem_per_node (int): memory per node (GB) for each Slurm job.
- max_blocks (int): maximum number of blocks (Slurm jobs) to use.
- cmd_timeout (int): timeout (seconds) to wait for a scheduler.
- singleton (bool): allow only one job to run at a time; by default True.
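The override behaviour can be illustrated with a minimal sketch. It uses a plain dict in place of the BPS site sub-configuration, and `resolve_settings` is a hypothetical helper, not part of the library. Only the three defaults stated on this page (`max_blocks`, `cmd_timeout`, `singleton`) are included; the other defaults are not given here and are left out.

```python
from typing import Any

# Defaults stated on this page; defaults for the other parameters are not
# given here, so they are deliberately left out of this sketch.
DOCUMENTED_DEFAULTS: dict[str, Any] = {
    "max_blocks": 2,     # one job running, one waiting in the queue
    "cmd_timeout": 300,  # seconds
    "singleton": True,
}


def resolve_settings(site_config: dict[str, Any]) -> dict[str, Any]:
    """Overlay user-supplied site settings on the documented defaults.

    Hypothetical helper: a plain dict stands in for the BPS site
    sub-configuration.
    """
    settings = dict(DOCUMENTED_DEFAULTS)
    settings.update(site_config)
    return settings


# The values supplied here are illustrative only.
settings = resolve_settings({"nodes": 2, "walltime": "04:00:00", "singleton": False})
print(settings)
```

Any key supplied by the user replaces the corresponding default, and keys that are not supplied keep their default value.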
When running on the Tiger cluster, you should operate on the /scratch/gpfs filesystem rather than /projects or /tigress, as the latter are much slower on the cluster nodes than they are on the head nodes. Your BPS config should contain:

    includeConfigs:
      - ${CTRL_BPS_PARSL_DIR}/etc/execution_butler_copy_files.yaml

This will cause the necessary files to be transferred from your repo (presumably on /projects or /tigress) to the execution butler in your submission directory (presumably on /scratch/gpfs). Failure to do so will result in about a 6x slowdown, and will probably degrade performance for other users. The results will be copied back to the original repo when everything has completed.
Methods Summary
- from_config(config): Get the site configuration nominated in the BPS config.
- get_address(): Return the IP address of the machine hosting the driver/submission.
- get_command_prefix(): Return command(s) to add before each job command.
- get_executors(): Get a list of executors to be used in processing.
- get_monitor(): Get parsl monitor.
- get_parsl_config(): Get Parsl configuration for this site.
- get_site_subconfig(config): Get BPS configuration for the site of interest.
- make_executor(label, *[, nodes, ...]): Return an executor for running on a Slurm cluster.
- select_executor(job): Get the label of the executor to use to execute a job.
Methods Documentation
- classmethod from_config(config: BpsConfig) -> SiteConfig
Get the site configuration nominated in the BPS config.
The computeSite (str) value in the BPS configuration is used to select a site configuration. The site configuration class to use is specified by the BPS configuration as site.<computeSite>.class (str), which should be the fully-qualified name of a Python class that inherits from SiteConfig.
- Parameters:
  - config (BpsConfig): BPS configuration.
- Returns:
  - site_config (subclass of SiteConfig): Site configuration.
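The lookup described for from_config can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: a nested dict stands in for BpsConfig, `load_site_class` is a hypothetical helper, and a standard-library class is named so the example runs without the LSST stack.

```python
import importlib
from typing import Any


def load_site_class(config: dict[str, Any]) -> type:
    """Resolve the class named by site.<computeSite>.class.

    Hypothetical helper: a nested dict stands in for BpsConfig.
    """
    compute_site = config["computeSite"]
    class_path = config["site"][compute_site]["class"]
    # Split "package.module.ClassName" into its module and class parts.
    module_name, _, class_name = class_path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# A standard-library class is used so the sketch runs anywhere; a real
# configuration would name a SiteConfig subclass such as
# lsst.ctrl.bps.parsl.sites.princeton.Tiger.
example_config = {
    "computeSite": "demo",
    "site": {"demo": {"class": "collections.OrderedDict"}},
}
site_class = load_site_class(example_config)
print(site_class.__name__)
```

The same `site.<computeSite>` block that holds the `class` entry also carries the per-site parameters, which is why the site name appears in both the selector and the path.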
- get_address() -> str
Return the IP address of the machine hosting the driver/submission.
This host machine address should be accessible from the workers and should generally be the return value of one of the functions in parsl.addresses.
This is used by the default implementation of get_monitor, but will generally be used by get_executors too.
This implementation gets the address from the Infiniband network interface, because the cluster nodes can’t connect to the head node through the regular internet.
- get_command_prefix() -> str
Return command(s) to add before each job command.
These may be used to configure the environment for the job.
This default implementation respects the BPS configuration elements:
- get_executors() -> list[parsl.executors.base.ParslExecutor]
Get a list of executors to be used in processing.
Each executor should have a unique label.
The walltime default here is set so we get into the tiger-vshort QoS, which will hopefully reduce the wait for us to get a node. Then, we have one Slurm job running at a time (singleton) while another saves a spot in line (max_blocks=2). We hope that this will allow us to run almost continually until the workflow is done.
We set the cmd_timeout value to 300 seconds to help avoid TimeoutExpired errors when commands are slow to return (often due to system contention).
- get_monitor() -> MonitoringHub | None
Get parsl monitor.
The parsl monitor provides a database that tracks the progress of the workflow and the use of resources on the workers.
This implementation respects the BPS configuration elements:
- get_parsl_config() -> Config
Get Parsl configuration for this site.
Subclasses can override this method to build a more specific Parsl configuration, if required.
The retries are set from the site.<computeSite>.retries value found in the BPS configuration file.
- Returns:
  - config (parsl.config.Config): The configuration to be used for Parsl.
- static get_site_subconfig(config: BpsConfig) -> BpsConfig
Get BPS configuration for the site of interest.
We return the BPS sub-configuration for the site indicated by the computeSite value, which is site.<computeSite>.
- Parameters:
  - config (BpsConfig): BPS configuration.
- Returns:
  - site (BpsConfig): Site sub-configuration.
- make_executor(label: str, *, nodes: int | None = None, cores_per_node: int | None = None, walltime: str | None = None, mem_per_node: int | None = None, mem_per_worker: float | None = None, qos: str | None = None, constraint: str | None = None, singleton: bool = False, scheduler_options: str | None = None, provider_options: dict[str, Any] | None = None, executor_options: dict[str, Any] | None = None) -> ParslExecutor
Return an executor for running on a Slurm cluster.
- Parameters:
  - label (str): Label for executor.
  - nodes (int, optional): Default number of nodes for each Slurm job.
  - cores_per_node (int, optional): Default number of cores per node for each Slurm job.
  - walltime (str, optional): Default time limit for each Slurm job.
  - mem_per_node (float, optional): Memory per node (GB) to request for each Slurm job.
  - mem_per_worker (float, optional): Minimum memory per worker (GB), limited by the executor.
  - qos (str, optional): Quality of service for each Slurm job.
  - constraint (str, optional): Node feature(s) to require for each Slurm job.
  - singleton (bool, optional): Whether to allow only a single Slurm job to run at a time.
  - scheduler_options (str, optional): #SBATCH directives to prepend to the Slurm submission script.
  - provider_options (dict, optional): Additional arguments for SlurmProvider constructor.
  - executor_options (dict, optional): Additional arguments for HighThroughputExecutor constructor.
- Returns:
  - executor (HighThroughputExecutor): Executor for Slurm jobs.
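To make the qos, constraint, singleton and scheduler_options parameters more concrete, here is a rough sketch of how such values might be turned into #SBATCH directives. It is not the library's code: `build_scheduler_options` is a hypothetical helper, and the mapping of singleton onto Slurm's `--dependency=singleton` option (which holds a job until earlier jobs with the same name and user have finished) is an assumption that this page does not state.

```python
from __future__ import annotations


def build_scheduler_options(
    job_name: str,
    qos: str | None = None,
    constraint: str | None = None,
    singleton: bool = False,
    scheduler_options: str | None = None,
) -> str:
    """Assemble #SBATCH directives for a Slurm submission script.

    Hypothetical helper for illustration only.
    """
    # Start from any directives the caller supplied.
    lines = [scheduler_options.rstrip("\n")] if scheduler_options else []
    lines.append(f"#SBATCH --job-name={job_name}")
    if qos:
        lines.append(f"#SBATCH --qos={qos}")
    if constraint:
        lines.append(f"#SBATCH --constraint={constraint}")
    if singleton:
        # Assumption: jobs sharing a job name then run one at a time.
        lines.append("#SBATCH --dependency=singleton")
    return "\n".join(lines) + "\n"


# The job name here is illustrative.
options = build_scheduler_options("my_workflow", qos="tiger-vshort", singleton=True)
print(options)
```

Under this assumption the job name matters: Slurm applies the singleton dependency only among jobs that share a name and a user, so each workflow would need a consistent name across its Slurm jobs.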