Running ap_verify from the command line
ap_verify.py is a Python script designed to be run on both developer machines and verification servers.
While ap_verify.py is not a command-line task, the command-line interface is designed to resemble that of command-line tasks where practical.
This page describes the minimum options needed to run ap_verify.
For more details, see the ap_verify command-line reference or run ap_verify.py -h.
Datasets as input arguments
Since ap_verify begins with an uningested dataset, the input argument is a dataset name rather than a repository.
Each dataset name is mapped to an eups-registered directory containing the data.
The mapping is configurable.
The dataset names are a placeholder for a future data repository versioning system and may be replaced in a later version of ap_verify.
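The name-to-directory lookup can be pictured with the minimal sketch below. It assumes a hypothetical dictionary `DATASET_TO_PRODUCT` standing in for ap_verify's configurable mapping (the real configuration format and product names may differ), and relies on EUPS exporting a `<PRODUCT>_DIR` environment variable for each product that has been set up. The helper `resolve_dataset` is illustrative, not part of ap_verify.

```python
import os

# Hypothetical stand-in for ap_verify's configurable mapping from dataset
# names to EUPS product names; the real mapping may use other names.
DATASET_TO_PRODUCT = {
    "HiTS2015": "ap_verify_hits2015",
}


def resolve_dataset(name, mapping=DATASET_TO_PRODUCT, environ=os.environ):
    """Return the directory of the EUPS product registered for ``name``.

    Assumes EUPS has set up the product, which exports <PRODUCT>_DIR.
    """
    try:
        product = mapping[name]
    except KeyError:
        raise ValueError(f"Unknown dataset: {name!r}") from None
    env_var = product.upper() + "_DIR"
    try:
        return environ[env_var]
    except KeyError:
        raise RuntimeError(f"{product} is not set up; {env_var} is undefined") from None
```

Keeping the mapping separate from the lookup mirrors the point above: a dataset name is only a key, so swapping in a different versioning scheme later would change the mapping without changing how callers name datasets.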
How to run ap_verify in a new workspace
Using the HiTS 2015 dataset as an example, one can run ap_verify.py as follows:
ap_verify.py --dataset HiTS2015 --id "visit=412518 filter=g" --output workspaces/hits/ --silent
Here the inputs are:
- HiTS2015 is the dataset name,
- visit=412518 filter=g is the dataId to process,
while the output options are:
- workspaces/hits/ is the directory in which the pipeline will create any necessary Butler repositories,
- --silent disables reporting of metrics to SQuaSH.
This call will create a new directory at workspaces/hits, ingest the HiTS data into a new repository based on <hits-data>/repo/, then run visit 412518 through the entire AP pipeline.
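When the same run is scripted rather than typed, the command can be assembled from its parts. The sketch below uses only the options shown above and assumes ap_verify.py is on the caller's PATH; the helpers `ap_verify_args` and `run_ap_verify` are hypothetical names for illustration.

```python
import shlex
import subprocess


def ap_verify_args(dataset, data_id, output, silent=True):
    """Build the argument list for a single ap_verify.py run."""
    args = ["ap_verify.py", "--dataset", dataset, "--id", data_id, "--output", output]
    if silent:
        args.append("--silent")  # do not upload measurements to SQuaSH
    return args


def run_ap_verify(dataset, data_id, output, silent=True):
    """Run ap_verify.py, raising CalledProcessError on a non-zero exit status."""
    return subprocess.run(ap_verify_args(dataset, data_id, output, silent), check=True)


if __name__ == "__main__":
    cmd = ap_verify_args("HiTS2015", "visit=412518 filter=g", "workspaces/hits/")
    # Print a shell-quoted version of the command for inspection.
    print(shlex.join(cmd))
```

Passing the arguments as a list means the dataId "visit=412518 filter=g" reaches ap_verify.py as one argument without shell quoting; shlex.join is used only to display an equivalent shell command.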
Note
The command-line interface for ap_verify.py is currently much more limited than that of command-line tasks. In particular, only file-based repositories are supported, and compound dataIds cannot be provided. See the ap_verify command-line reference for details.
How to run ingestion by itself
ap_verify includes a separate program, ingest_dataset.py, that ingests datasets but does not run the pipeline on them.
This is useful if the data need special processing, or as a precursor to large-scale processing runs.
Running ap_verify.py with the same arguments as a previous run of ingest_dataset.py will automatically skip ingestion.
Using the HiTS 2015 dataset as an example, one can run ingest_dataset.py as follows:
ingest_dataset.py --dataset HiTS2015 --output workspaces/hits/
The --dataset and --output arguments behave the same way as for ap_verify.py.
Other options from ap_verify.py are not available.
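Because ingestion is skipped when ap_verify.py receives the same --dataset and --output values as an earlier ingest_dataset.py run, a two-step workflow only needs to keep those two values in sync. The sketch below shares them between both commands; it assumes both scripts are on the PATH, and the helper names are hypothetical.

```python
import subprocess


def ingest_args(dataset, output):
    """Arguments for ingest_dataset.py, which accepts only --dataset and --output."""
    return ["ingest_dataset.py", "--dataset", dataset, "--output", output]


def verify_args(dataset, data_id, output):
    """Arguments for ap_verify.py, reusing the same --dataset and --output values."""
    return ["ap_verify.py", "--dataset", dataset, "--output", output,
            "--id", data_id, "--silent"]


def ingest_then_verify(dataset, data_id, output):
    """Ingest first, then process one dataId; the second step should skip ingestion."""
    subprocess.run(ingest_args(dataset, output), check=True)
    subprocess.run(verify_args(dataset, data_id, output), check=True)
```

For example, ingest_then_verify("HiTS2015", "visit=412518 filter=g", "workspaces/hits/") would ingest into workspaces/hits/ and then run visit 412518 against the already-ingested repository.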
How to use measurements of metrics
After ap_verify has run, it will produce files named, by default, ap_verify.<dataId>.verify.json in the caller’s directory.
The file name may be customized using the --metrics-file command-line argument.
These files contain metric measurements in lsst.verify format, and can be loaded and read as described in the lsst.verify documentation or in SQR-019.
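For a quick look without the LSST stack, the files can also be read as plain JSON. The sketch below assumes the lsst.verify Job serialization places a top-level "measurements" list whose entries carry "metric", "value", and "unit" keys; check this against the lsst.verify documentation, which remains the authoritative way to load these files. The helpers `load_measurements` and `load_all` are hypothetical.

```python
import glob
import json


def load_measurements(path):
    """Return {metric name: (value, unit)} from one metrics file.

    Assumes a top-level "measurements" list with "metric", "value", and
    "unit" entries, as in the lsst.verify Job JSON layout.
    """
    with open(path) as f:
        job = json.load(f)
    return {m["metric"]: (m["value"], m.get("unit", ""))
            for m in job.get("measurements", [])}


def load_all(pattern="ap_verify.*.verify.json"):
    """Load every metrics file matching the default naming scheme."""
    return {path: load_measurements(path) for path in sorted(glob.glob(pattern))}
```

If the --metrics-file argument was used, pass a matching glob pattern to load_all instead of the default.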
Unless the --silent argument is provided, ap_verify will also upload measurements to the SQuaSH service on completion.
See the SQuaSH documentation for details.
If the pipeline is interrupted by a fatal error, completed measurements will be saved to metrics files for debugging purposes, but nothing will be sent to SQuaSH. See the error-handling policy for details.