Tiger
- class lsst.ctrl.bps.parsl.sites.princeton.Tiger(*args, **kwargs)
Bases: Slurm
Configuration for running jobs on Princeton’s Tiger cluster.
The following BPS configuration parameters are recognised, overriding the defaults:
- nodes (int): number of nodes for each Slurm job.
- cores_per_node (int): number of cores per node for each Slurm job.
- walltime (str): time limit for each Slurm job.
- mem_per_node (int): memory per node (GB) for each Slurm job.
- max_blocks (int): maximum number of blocks (Slurm jobs) to use.
- cmd_timeout (int): timeout (seconds) to wait for a scheduler.
- singleton (bool): allow only one job to run at a time; by default True.
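To illustrate how overrides interact with defaults, here is a minimal sketch. It is not the package's own code: `resolve_site_config` and `DOCUMENTED_DEFAULTS` are hypothetical names, only the defaults stated on this page are included, and the override values are made up for illustration.

```python
# Defaults that this page states explicitly; the remaining parameters
# also have defaults, but their values are not given here.
DOCUMENTED_DEFAULTS = {
    "max_blocks": 2,     # one Slurm job running, one waiting in the queue
    "cmd_timeout": 300,  # seconds
    "singleton": True,   # only one job runs at a time
}

# Parameter names recognised according to the list above.
RECOGNISED = {
    "nodes", "cores_per_node", "walltime", "mem_per_node",
    "max_blocks", "cmd_timeout", "singleton",
}


def resolve_site_config(overrides):
    """Return the documented defaults updated with user overrides.

    Hypothetical helper: rejects names that are not in the recognised set.
    """
    unknown = set(overrides) - RECOGNISED
    if unknown:
        raise KeyError(f"unrecognised parameters: {sorted(unknown)}")
    return {**DOCUMENTED_DEFAULTS, **overrides}


# Illustrative override values only.
settings = resolve_site_config(
    {"nodes": 2, "walltime": "01:00:00", "cmd_timeout": 600}
)
```

Any parameter that is not overridden keeps its default, so `singleton` stays `True` here while `cmd_timeout` becomes 600.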
When running on the Tiger cluster, you should operate on the /scratch/gpfs filesystem, rather than /projects or /tigress; the latter are not even mounted on the cluster nodes any more.
Methods Summary
get_address(): Return the IP address of the machine hosting the driver/submission.
get_executors(): Get a list of executors to be used in processing.
select_executor(job): Get the label of the executor to use to execute a job.
Methods Documentation
- get_address() → str
Return the IP address of the machine hosting the driver/submission.
This host machine address should be accessible from the workers and should generally be the return value of one of the functions in parsl.addresses.
This is used by the default implementation of get_monitor, but will generally be used by get_executors too.
This implementation gets the address from the Infiniband network interface, because the cluster nodes can’t connect to the head node through the regular internet.
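As a rough illustration of looking up an address by network interface, the following standard-library sketch works on Linux only. The function names and the interface name `ib0` are assumptions, not taken from this page. In practice a function from parsl.addresses would normally be used instead (for example `address_by_interface`).

```python
import fcntl
import socket
import struct
from typing import Optional

SIOCGIFADDR = 0x8915  # Linux ioctl request code: get interface address


def address_of_interface(ifname: str) -> str:
    """Return the IPv4 address bound to a named network interface.

    Raises OSError if the interface does not exist or has no IPv4 address.
    """
    # The kernel expects an ifreq structure whose first 16 bytes hold the
    # interface name; interface names are at most 15 characters long.
    request = struct.pack("256s", ifname[:15].encode("utf-8"))
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        reply = fcntl.ioctl(sock.fileno(), SIOCGIFADDR, request)
    # Bytes 20..23 of the reply hold the IPv4 address in network order.
    return socket.inet_ntoa(reply[20:24])


def driver_address(ifname: str = "ib0") -> Optional[str]:
    """Return the address of `ifname`, or None if it cannot be found.

    "ib0" is an assumed name for the Infiniband interface.
    """
    try:
        return address_of_interface(ifname)
    except OSError:
        return None
```

On a machine without the named interface, `driver_address` returns `None` rather than raising, which makes the sketch safe to try outside the cluster.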
- get_executors() → list[ParslExecutor]
Get a list of executors to be used in processing.
Each executor should have a unique label.
The walltime default here is set so we get into the tiger-vshort QoS, which will hopefully reduce the wait for us to get a node. Then, we have one Slurm job running at a time (singleton) while another saves a spot in line (max_blocks=2). We hope that this will allow us to run almost continually until the workflow is done.
We set the cmd_timeout value to 300 seconds to help avoid TimeoutExpired errors when commands are slow to return (often due to system contention).
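One way the "one job running, one waiting in line" pattern can be expressed to Slurm is the `--dependency=singleton` option of sbatch, which holds a job until earlier jobs with the same name and user have finished. The sketch below assumes that mechanism. This page does not confirm how the class implements singleton, and `scheduler_options` and the job name are hypothetical.

```python
def scheduler_options(job_name: str, singleton: bool = True) -> str:
    """Build extra #SBATCH lines for a Slurm submission script.

    Hypothetical helper: with singleton=True, jobs sharing `job_name`
    (and the same user) run one at a time, so a second submitted job
    simply waits in the queue until the first one ends.
    """
    lines = [f"#SBATCH --job-name={job_name}"]
    if singleton:
        lines.append("#SBATCH --dependency=singleton")
    return "\n".join(lines)


# Illustrative job name only.
options = scheduler_options("my_workflow")
```

With up to two blocks submitted under the same job name, the second job would already be queued when the first reaches its walltime, which is the behaviour described above.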