Tiger

class lsst.ctrl.bps.parsl.sites.princeton.Tiger(config: BpsConfig, add_resources: bool = False)

Bases: Slurm

Configuration for running jobs on Princeton’s Tiger cluster.

The following BPS configuration parameters are recognised, overriding the defaults:

  • nodes (int): number of nodes for each Slurm job.

  • cores_per_node (int): number of cores per node for each Slurm job.

  • walltime (str): time limit for each Slurm job.

  • mem_per_node (int): memory per node (GB) for each Slurm job.

  • max_blocks (int): maximum number of blocks (Slurm jobs) to use.

  • cmd_timeout (int): timeout (seconds) to wait for scheduler commands to return.

  • singleton (bool): allow only one job to run at a time; by default True.
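
For example, a BPS configuration that selects this site and overrides some of these defaults might look like the following sketch; the site label tiger and all values shown are illustrative, not package defaults:

computeSite: tiger
site:
  tiger:
    class: lsst.ctrl.bps.parsl.sites.princeton.Tiger
    nodes: 2
    cores_per_node: 40
    walltime: "08:00:00"
    mem_per_node: 192
    max_blocks: 2
    singleton: true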

When running on the Tiger cluster, you should operate on the /scratch/gpfs filesystem, rather than /projects or /tigress, as the latter are much slower on the cluster nodes than they are on the head nodes. Your BPS config should contain:

includeConfigs:
  - ${CTRL_BPS_PARSL_DIR}/etc/execution_butler_copy_files.yaml

This will cause the necessary files to be transferred from your repo (presumably on /projects or /tigress) to the execution butler in your submission directory (presumably on /scratch/gpfs). Failure to do so will result in roughly a 6x slowdown and will probably degrade performance for other users. The results will be copied back to the original repo when everything has completed.

Methods Summary

from_config(config)

Get the site configuration nominated in the BPS config.

get_address()

Return the IP address of the machine hosting the driver/submission.

get_command_prefix()

Return command(s) to add before each job command.

get_executors()

Get a list of executors to be used in processing.

get_monitor()

Get parsl monitor.

get_parsl_config()

Get Parsl configuration for this site.

get_site_subconfig(config)

Get BPS configuration for the site of interest.

make_executor(label, *[, nodes, ...])

Return an executor for running on a Slurm cluster.

select_executor(job)

Get the label of the executor to use to execute a job.

Methods Documentation

classmethod from_config(config: BpsConfig) → SiteConfig

Get the site configuration nominated in the BPS config.

The computeSite (str) value in the BPS configuration is used to select a site configuration. The site configuration class to use is specified by the BPS configuration as site.<computeSite>.class (str), which should be the fully-qualified name of a python class that inherits from SiteConfig.

Parameters:
config : BpsConfig

BPS configuration.

Returns:
site_config : subclass of SiteConfig

Site configuration.
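
As a minimal sketch, the configuration below would cause from_config to import and instantiate the Tiger class documented on this page; the site label tiger is arbitrary, and any fully-qualified SiteConfig subclass name may be given:

computeSite: tiger
site:
  tiger:
    class: lsst.ctrl.bps.parsl.sites.princeton.Tiger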

get_address() → str

Return the IP address of the machine hosting the driver/submission.

This host machine address should be accessible from the workers and should generally be the return value of one of the functions in parsl.addresses.

This is used by the default implementation of get_monitor, but will generally be used by get_executors too.

This implementation gets the address from the Infiniband network interface, because the cluster nodes can’t connect to the head node through the regular internet.

get_command_prefix() → str

Return command(s) to add before each job command.

These may be used to configure the environment for the job.

This default implementation respects the BPS configuration elements:

  • site.<computeSite>.commandPrefix (str): command(s) to use as a prefix to executing a job command on a worker.

  • site.<computeSite>.environment (bool): add bash commands that replicate the environment on the driver/submit machine?
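
As a hedged sketch, these elements might be set in the BPS configuration as follows; the commandPrefix value is purely illustrative:

site:
  tiger:
    commandPrefix: module load lsst_distrib
    environment: true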

get_executors() → list[parsl.executors.base.ParslExecutor]

Get a list of executors to be used in processing.

Each executor should have a unique label.

The walltime default here is set so we get into the tiger-vshort QoS, which will hopefully reduce the wait for us to get a node. Then, we have one Slurm job running at a time (singleton) while another saves a spot in line (max_blocks=2). We hope that this will allow us to run almost continually until the workflow is done.

We set the cmd_timeout value to 300 seconds to help avoid TimeoutExpired errors when commands are slow to return (often due to system contention).
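
If your payload needs more time than the short-queue default allows, you can override the relevant keys for this site, e.g. (values illustrative):

site:
  tiger:
    walltime: "04:00:00"
    max_blocks: 2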

get_monitor() → MonitoringHub | None

Get parsl monitor.

The parsl monitor provides a database that tracks the progress of the workflow and the use of resources on the workers.

This implementation respects the BPS configuration elements:

  • site.<computeSite>.monitorEnable (bool): enable monitor?

  • site.<computeSite>.monitorInterval (float): time interval (sec) between logging of resource usage.

  • site.<computeSite>.monitorFilename (str): name of file to use for the monitor sqlite database.

Returns:
monitor : MonitoringHub or None

Parsl monitor, or None for no monitor.
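
For example, to enable the monitor with a 30-second sampling interval (values illustrative):

site:
  tiger:
    monitorEnable: true
    monitorInterval: 30
    monitorFilename: monitor.sqlite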

get_parsl_config() → Config

Get Parsl configuration for this site.

Subclasses can override this method to build a more specific Parsl configuration, if required.

The retries are set from the site.<computeSite>.retries value found in the BPS configuration file.

Returns:
config : parsl.config.Config

The configuration to be used for Parsl.
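
For example, to allow each job two retries (value illustrative):

site:
  tiger:
    retries: 2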

static get_site_subconfig(config: BpsConfig) → BpsConfig

Get BPS configuration for the site of interest.

We return the BPS sub-configuration for the site indicated by the computeSite value, which is site.<computeSite>.

Parameters:
config : BpsConfig

BPS configuration.

Returns:
site : BpsConfig

Site sub-configuration.

make_executor(label: str, *, nodes: int | None = None, cores_per_node: int | None = None, walltime: str | None = None, mem_per_node: int | None = None, mem_per_worker: float | None = None, qos: str | None = None, constraint: str | None = None, singleton: bool = False, scheduler_options: str | None = None, provider_options: dict[str, Any] | None = None, executor_options: dict[str, Any] | None = None) → ParslExecutor

Return an executor for running on a Slurm cluster.

Parameters:
label : str

Label for executor.

nodes : int, optional

Default number of nodes for each Slurm job.

cores_per_node : int, optional

Default number of cores per node for each Slurm job.

walltime : str, optional

Default time limit for each Slurm job.

mem_per_node : float, optional

Memory per node (GB) to request for each Slurm job.

mem_per_worker : float, optional

Minimum memory per worker (GB), limited by the executor.

qos : str, optional

Quality of service for each Slurm job.

constraint : str, optional

Node feature(s) to require for each Slurm job.

singleton : bool, optional

Whether to allow only a single Slurm job to run at a time.

scheduler_options : str, optional

#SBATCH directives to prepend to the Slurm submission script.

provider_options : dict, optional

Additional arguments for SlurmProvider constructor.

executor_options : dict, optional

Additional arguments for HighThroughputExecutor constructor.

Returns:
executor : HighThroughputExecutor

Executor for Slurm jobs.

select_executor(job: ParslJob) → str

Get the label of the executor to use to execute a job.

Parameters:
job : ParslJob

Job to be executed.

Returns:
label : str

Label of executor to use to execute job.