TripleSlurm

class lsst.ctrl.bps.parsl.sites.TripleSlurm(*args, **kwargs)

Bases: Slurm

Configuration for running jobs on a Slurm cluster with three levels.

Parameters:
*args : Any

Positional arguments forwarded to the base class constructor.

**kwargs : Any

Keyword arguments passed to the base class constructor.

Notes

The three levels are useful for running workers with different amounts of available memory (by default, this is how executors are selected), though other uses are possible.

The following BPS configuration parameters are recognised, overriding the defaults:

  • nodes (int): number of nodes for each Slurm job.

  • cores_per_node (int): number of cores per node for each Slurm job; by default we use all cores on the node.

  • walltime (str): time limit for each Slurm job; setting this overrides each of the small_walltime, medium_walltime and large_walltime values.

  • mem_per_node (float): memory per node for each Slurm job; by default we use whatever Slurm gives us.

  • qos (str): quality of service to request for each Slurm job; by default we use whatever Slurm gives us.

  • small_memory (float): memory per worker (GB) for each ‘small’ Slurm job.

  • medium_memory (float): memory per worker (GB) for each ‘medium’ Slurm job.

  • large_memory (float): memory per worker (GB) for each ‘large’ Slurm job.

  • small_walltime (str): time limit for each ‘small’ Slurm job.

  • medium_walltime (str): time limit for each ‘medium’ Slurm job.

  • large_walltime (str): time limit for each ‘large’ Slurm job.
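The walltime precedence described above can be sketched with a plain dictionary standing in for the site configuration. The function name `resolve_walltimes` and the fallback values are hypothetical, chosen for illustration only; they are not the class's real defaults.

```python
from typing import Mapping

# Hypothetical fallback values for illustration only; the real defaults
# are defined by the class and are not stated in this page.
ILLUSTRATIVE_DEFAULTS = {
    "small_walltime": "10:00:00",
    "medium_walltime": "10:00:00",
    "large_walltime": "40:00:00",
}


def resolve_walltimes(site: Mapping[str, str]) -> dict[str, str]:
    """Return the time limit that applies to each of the three levels."""
    resolved = {}
    for level in ("small", "medium", "large"):
        key = f"{level}_walltime"
        # A global ``walltime`` wins over the per-level setting, which in
        # turn wins over the (illustrative) fallback.
        resolved[level] = (
            site.get("walltime") or site.get(key) or ILLUSTRATIVE_DEFAULTS[key]
        )
    return resolved


# Per-level values are ignored once ``walltime`` is set.
overridden = resolve_walltimes({"walltime": "02:00:00", "small_walltime": "00:30:00"})
# Without ``walltime``, the per-level value is used.
per_level = resolve_walltimes({"small_walltime": "00:30:00"})
```

Setting only the per-level keys is therefore the way to give the three levels different time limits.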

Methods Summary

from_config(config)

Get the site configuration nominated in the BPS config.

get_address()

Return the IP address of the machine hosting the driver/submission.

get_command_prefix()

Return command(s) to add before each job command.

get_executors([small_options, ...])

Get a list of executors to be used in processing.

get_monitor()

Get parsl monitor.

get_parsl_config()

Get Parsl configuration for this site.

get_site_subconfig(config)

Get BPS configuration for the site of interest.

make_executor(label, *[, nodes, ...])

Return an executor for running on a Slurm cluster.

select_executor(job)

Get the label of the executor to use to execute a job.

Methods Documentation

classmethod from_config(config: BpsConfig) -> SiteConfig

Get the site configuration nominated in the BPS config.

The computeSite (str) value in the BPS configuration is used to select a site configuration. The site configuration class to use is specified by the BPS configuration as site.<computeSite>.class (str), which should be the fully-qualified name of a python class that inherits from SiteConfig.

Parameters:
config : BpsConfig

BPS configuration.

Returns:
site_config : subclass of SiteConfig

Site configuration.
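As a rough illustration of selecting a class by its fully-qualified name, the sketch below resolves a dotted name with `importlib`. It is not the library's implementation: the helper `load_class` is hypothetical, nested dictionaries stand in for `BpsConfig`, and a standard-library class stands in for a `SiteConfig` subclass so that the example runs without the LSST stack.

```python
import importlib


def load_class(qualified_name: str) -> type:
    """Import and return the class named by a dotted path."""
    module_name, _, class_name = qualified_name.rpartition(".")
    if not module_name:
        raise ValueError(f"Not a fully-qualified class name: {qualified_name!r}")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# Stand-in for the BPS configuration. In a real configuration the class
# value would name a SiteConfig subclass, such as
# "lsst.ctrl.bps.parsl.sites.TripleSlurm".
config = {
    "computeSite": "example",
    "site": {"example": {"class": "collections.OrderedDict"}},
}

site_name = config["computeSite"]
cls = load_class(config["site"][site_name]["class"])
```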

get_address() -> str

Return the IP address of the machine hosting the driver/submission.

This address should be accessible from the workers. This should generally be the return value of one of the functions in parsl.addresses.

This is used by the default implementation of get_monitor, but will generally be used by get_executors too.

This default implementation gets the address from the hostname, but that will not work if the workers don’t access the driver/submission node by that address.

get_command_prefix() -> str

Return command(s) to add before each job command.

These may be used to configure the environment for the job.

This default implementation respects the BPS configuration elements:

  • site.<computeSite>.commandPrefix (str): command(s) to use as a prefix to executing a job command on a worker.

  • site.<computeSite>.environment (bool): add bash commands that replicate the environment on the driver/submit machine?
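One way to picture how these two settings could combine is sketched below. This is an assumption-laden illustration, not the library's code: the function `build_command_prefix` is hypothetical, a dictionary stands in for the site configuration, and the ordering (prefix first, then `export` lines) and quoting are choices made for the example.

```python
import shlex
from typing import Mapping


def build_command_prefix(site: Mapping[str, object], environ: Mapping[str, str]) -> str:
    """Build shell commands to run before a job command (illustrative only)."""
    lines = []
    prefix = site.get("commandPrefix", "")
    if prefix:
        lines.append(str(prefix))
    if site.get("environment", False):
        # Replicate each variable with a bash ``export``; shlex.quote
        # protects values containing spaces or shell metacharacters.
        for key in sorted(environ):
            lines.append(f"export {key}={shlex.quote(environ[key])}")
    return "\n".join(lines)


example = build_command_prefix(
    {"commandPrefix": "source setup.sh", "environment": True},
    {"FOO": "a b"},
)
```

In practice `os.environ` would be passed as the second argument; a small dictionary is used here to keep the result predictable.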

get_executors(small_options: dict[str, Any] | None = None, medium_options: dict[str, Any] | None = None, large_options: dict[str, Any] | None = None, **common_options) -> list[parsl.executors.base.ParslExecutor]

Get a list of executors to be used in processing.

We create three executors, with different walltime and memory per worker.

Parameters:
small_options : kwargs

Options for make_executor for small executor.

medium_options : kwargs

Options for make_executor for medium executor.

large_options : kwargs

Options for make_executor for large executor.

**common_options

Common options for make_executor for each of the executors.

get_monitor() -> MonitoringHub | None

Get parsl monitor.

The parsl monitor provides a database that tracks the progress of the workflow and the use of resources on the workers.

This implementation respects the BPS configuration elements:

  • site.<computeSite>.monitorEnable (bool): enable monitor?

  • site.<computeSite>.monitorInterval (float): time interval (sec) between logging of resource usage.

  • site.<computeSite>.monitorFilename (str): name of file to use for the monitor sqlite database.

Returns:
monitor : MonitoringHub or None

Parsl monitor, or None for no monitor.

get_parsl_config() -> Config

Get Parsl configuration for this site.

Subclasses can override this method to build a more specific Parsl configuration, if required.

The retries are set from the site.<computeSite>.retries value found in the BPS configuration file.

Returns:
config : parsl.config.Config

The configuration to be used for Parsl.

static get_site_subconfig(config: BpsConfig) -> BpsConfig

Get BPS configuration for the site of interest.

We return the BPS sub-configuration for the site indicated by the computeSite value, which is site.<computeSite>.

Parameters:
config : BpsConfig

BPS configuration.

Returns:
site : BpsConfig

Site sub-configuration.

make_executor(label: str, *, nodes: int | None = None, cores_per_node: int | None = None, walltime: str | None = None, mem_per_node: int | None = None, mem_per_worker: float | None = None, qos: str | None = None, constraint: str | None = None, singleton: bool = False, scheduler_options: str | None = None, provider_options: dict[str, Any] | None = None, executor_options: dict[str, Any] | None = None) -> ParslExecutor

Return an executor for running on a Slurm cluster.

Parameters:
label : str

Label for executor.

nodes : int, optional

Default number of nodes for each Slurm job.

cores_per_node : int, optional

Default number of cores per node for each Slurm job.

walltime : str, optional

Default time limit for each Slurm job.

mem_per_node : float, optional

Memory per node (GB) to request for each Slurm job.

mem_per_worker : float, optional

Minimum memory per worker (GB), limited by the executor.

qos : str, optional

Quality of service for each Slurm job.

constraint : str, optional

Node feature(s) to require for each Slurm job.

singleton : bool, optional

Whether to allow only a single Slurm job to run at a time.

scheduler_options : str, optional

#SBATCH directives to prepend to the Slurm submission script.

provider_options : dict, optional

Additional arguments for SlurmProvider constructor.

executor_options : dict, optional

Additional arguments for HighThroughputExecutor constructor.

Returns:
executor : HighThroughputExecutor

Executor for Slurm jobs.

select_executor(job: ParslJob) -> str

Get the label of the executor to use to execute a job.

This implementation only looks at the requested memory.

Parameters:
job : ParslJob

Job to be executed.

Returns:
label : str

Label of executor to use to execute job.
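A minimal sketch of memory-based selection across three levels follows. It assumes that a job goes to the smallest level whose per-worker memory covers its request, which is one plausible reading of the description above rather than a confirmed detail of the implementation. The class `MemoryTiers`, the labels returned, and the threshold values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MemoryTiers:
    """Per-worker memory limits (GB) for the two smaller levels."""

    small_memory: float
    medium_memory: float

    def select(self, request_gb: float) -> str:
        """Return the label of the smallest level that fits the request."""
        if request_gb <= self.small_memory:
            return "small"
        if request_gb <= self.medium_memory:
            return "medium"
        # Anything larger falls through to the largest level.
        return "large"


# Hypothetical thresholds, chosen only for this example.
tiers = MemoryTiers(small_memory=2.0, medium_memory=4.0)
labels = [tiers.select(gb) for gb in (1.5, 4.0, 16.0)]
```

A subclass wanting a different policy (for example, routing by expected run time) would override select_executor and return whichever executor label suits the job.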