lsst-ctrl-bps-htcondor v24.0.0 (2022-08-29)
New Features
- This package has been extracted from
lsst_ctrl_bps
into a standalone package to make it easier to manage development of the HTCondor plugin.
(DM-33521)
- Add support for a new command,
bps restart
, that allows one to restart the failed workflow from the point of its failure. It restarts the workflow as it is just retrying failed jobs, no configuration changes are possible at the moment. (DM-29575)
- Add support for a new option of
bps cancel
, --global
, which allows the user to interact (cancel or get the report on) with jobs in any HTCondor job queue. (DM-29614)
- Add a configurable memory threshold to the memory scaling mechanism. (DM-32047)
Bug Fixes
- HTCondor plugin now correctly passes attributes defined in site’s ‘profile’ section to the HTCondor submission files. (DM-33887)
Other Changes and Additions
- Make HTCondor treat all jobs exiting with a signal as if they ran out of memory. (DM-32968)
- Make HTCondor plugin pass a group and user attribute to any batch systems that require such attributes for accounting purposes. (DM-33887)
ctrl_bps v23.0.0 (2021-12-10)
New Features
- Added BPS htcondor job setting that should put jobs that
get the signal 7 when exceeding memory on hold. Held
message will say: “Job raised a signal 7. Usually means
job has gone over memory limit.” Until bps has the
automatic memory exceeded retries, you can restart these
the same way as with jobs that htcondor held for exceeding
memory limits (
condor_qedit
and condor_release
).
- Add
numberOfRetries
option which specifies the maximum number of retries
allowed for a job.
- Add
memoryMultiplier
option to allow for increasing the memory
requirements automatically between retries for jobs which exceeded memory
during their execution. At the moment this option is only supported by
HTCondor plugin. (DM-29756)
- Change HTCondor bps plugin to use HTCondor curl plugin for local job transfers. (DM-32074)
Bug Fixes
- Fix bug in HTCondor plugin for reporting final job status when
--id <path>
. (DM-31887)
- Fix execution butler with HTCondor plugin bug when output collection has period. (DM-32201)
- Disable HTCondor auto detection of files to copy back from jobs. (DM-32220)
- Fixed bug when not using lazy commands but using execution butler.
- Fixed bug in
htcondor_service.py
that overwrote message in bps report. (DM-32241)
- Fixed bug when a pipetask process killed by a signal on the edge node did not expose the failing status. (DM-32435)