lsst-ctrl-bps v24.0.0 (2022-08-29)
==================================

New Features
------------

- Plugins have been moved to separate packages. These new packages are ``ctrl_bps_htcondor``, ``ctrl_bps_pegasus`` (not currently supported) and ``ctrl_bps_panda``. (`DM-33521 `_)
- Introduce a new command, ``restart``, that allows one to restart a failed workflow from the point of its failure. It restarts the workflow as is, just retrying failed jobs; no configuration changes are possible at the moment. (`DM-29575 `_)
- Introduce a new option, ``--global``, to ``bps cancel`` and ``bps report`` which allows the user to interact with (cancel or get the report on) jobs in any job queue of a workflow management system using distributed job queues, e.g., HTCondor. (`DM-29614 `_)
- Add ``ping`` subcommand to test whether the workflow services are available. (`DM-35144 `_)

Bug Fixes
---------

- * Fix cluster naming bug where variables in ``clusterTemplate`` were replaced too early.
  * Fix cluster naming bug if neither ``clusterTemplate`` nor ``templateDataId`` is given. (`DM-34265 `_)
- Change bps to use ``DimensionUniverse`` from the relevant butler repository instead of the default universe from code. (`DM-35090 `_)

Other Changes and Additions
---------------------------

- Display run name after successful submission. (`DM-29575 `_)
- * Abort submission if submit-side run directory already exists.
  * Emit more informative error message when creating the execution Butler fails. (`DM-32657 `_)
- Reformat the code base with ``black`` and ``isort``. (`DM-33267 `_)
- Select BPS commands now report approximate memory usage during their execution. (`DM-33331 `_)
- Add a group and user attribute to the `~lsst.ctrl.bps.GenericWorkflowJob` that can be passed via WMS plugins to any batch systems that require such attributes for accounting purposes. (`DM-33887 `_)
- * Abort submission if a ``Quantum`` is missing a dimension required by the clustering definition.
  * Abort submission if clustering definition results in cycles in the `~lsst.ctrl.bps.ClusteredQuantumGraph`.
  * Add unit tests for the quantum clustering functions. (`DM-34265 `_)
- Add concept of cloud, in particular to be used by PanDA plugin.

  * Submit YAML can specify cloud with ``computeCloud``.
  * Common cloud values can be specified in cloud subsection.

    .. code-block:: YAML

       cloud:
         cloud_name_1:
           key1: value
           key2: value

  * `~lsst.ctrl.bps.GenericWorkflowJob` has ``compute_cloud``. (`DM-34876 `_)
- * Print number of clusters in `~lsst.ctrl.bps.ClusteredQuantumGraph`.
  * Print number of jobs (including final) in `~lsst.ctrl.bps.GenericWorkflow`. (`DM-35066 `_)

ctrl_bps v23.0.1 (2022-02-02)
=============================

New Features
------------

- Check early in the submission process that the WMS service class can be imported, and run any pre-submission checks provided by the WMS plugin. (`DM-32199 `_)
- * Large tasks (> 30k jobs) are split into chunks.
  * Updated iDDS API usage for the most recent version.
  * Updated iDDS API initialization to force PanDA proxy using the IAM user name for submitted workflow.
  * Added limit on number of characters in the task pseudo inputs. (`DM-32675 `_)
- * New ``panda_auth`` command for handling PanDA authentication token. Includes status, reset, and clean capabilities.
  * Added early check of PanDA authentication token in submission process. (`DM-32830 `_)

Other Changes and Additions
---------------------------

- * Changed to print the submit directory early.
  * Changed PanDA plugin to only print the numeric id when outputting the request/run id.
  * Set maximum number of jobs in a PanDA task (``maxJobsPerTask``) to 70000 in ``config/bps_idf.yaml``. (`DM-32830 `_)

ctrl_bps v23.0.0 (2021-12-10)
=============================

New Features
------------

- * Added a bps htcondor job setting that should put on hold any job that receives signal 7 when exceeding memory. Held message will say: "Job raised a signal 7. Usually means job has gone over memory limit."
    Until bps has automatic memory-exceeded retries, you can restart these jobs the same way as jobs that HTCondor held for exceeding memory limits (``condor_qedit`` and ``condor_release``).
  * Too many files were being written to single directories in ``job/