Running ap_verify from the command line

ap_verify.py is a Python script designed to be run on both developer machines and verification servers. While ap_verify.py is not a command-line task, the command-line interface is designed to resemble that of command-line tasks where practical. This page describes the most common options used to run ap_verify. For more details, see the ap_verify command-line reference or run ap_verify.py -h.

Datasets as input arguments

Since ap_verify begins with an uningested dataset, the input argument is a dataset name rather than a repository.

Datasets are identified by a name that gets mapped to an installed eups-registered package containing the data. The mapping is configurable. The dataset names are a placeholder for a future data repository versioning system, and may be replaced in a later version of ap_verify.

How to run ap_verify in a new workspace

Using the HiTS 2015 dataset as an example, one can run ap_verify.py as follows:

ap_verify.py --dataset HiTS2015 --gen2 --id "visit=412518^412568 filter=g" --output workspaces/hits/

Here the inputs are:

  • HiTS2015 is the ap_verify dataset name,
  • --gen2 selects the Gen 2 pipeline framework for processing the dataset,
  • visit=412518^412568 filter=g is the dataId to process,

while the output is:

  • workspaces/hits/ is the location where the pipeline will create any Butler repositories necessary.

This call will create a new directory at workspaces/hits, ingest the HiTS data into a new repository based on <hits-data>/repo/, then run visits 412518 and 412568 through the entire AP pipeline.

To process an entire dataset, omit the --id argument. Some datasets are very large, so do this with caution:

ap_verify.py --dataset CI-HiTS2015 --gen2 --output workspaces/hits/
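When ap_verify.py is launched from a script rather than typed by hand, the two invocations above can be assembled programmatically. The following is a minimal sketch, not part of ap_verify itself: build_ap_verify_command is a hypothetical helper, and it uses only the options shown on this page (--dataset, --gen2 or --gen3, --id, --output).

```python
import shlex


def build_ap_verify_command(dataset, output, data_id=None, gen_flag="--gen2"):
    """Assemble an ap_verify.py argument list (hypothetical helper)."""
    if gen_flag not in ("--gen2", "--gen3"):
        raise ValueError(f"unexpected framework flag: {gen_flag}")
    command = ["ap_verify.py", "--dataset", dataset, gen_flag]
    if data_id is not None:
        # Omitting --id processes the whole dataset, which can be very large.
        command += ["--id", data_id]
    command += ["--output", output]
    return command


if __name__ == "__main__":
    cmd = build_ap_verify_command(
        "HiTS2015", "workspaces/hits/", data_id="visit=412518^412568 filter=g"
    )
    # Print a shell-quoted form of the command for inspection.
    print(shlex.join(cmd))
```

The returned list can be passed to subprocess.run(cmd, check=True) on a machine where ap_verify.py is on the PATH. Passing a list rather than a single string avoids shell-quoting problems with the space and ^ characters in the data ID.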

Note

The command-line interface for ap_verify.py is at present more limited than those of command-line tasks. See the ap_verify command-line reference for details.

How to run ingestion by itself

ap_verify includes a separate program, ingest_dataset.py, that ingests datasets into repositories but does not run the pipeline on them. This is useful if the data need special processing or as a precursor to massive processing runs. Running ap_verify.py with the same arguments as a previous run of ingest_dataset.py will automatically skip ingestion.

Using the HiTS 2015 dataset as an example, one can run ingest_dataset.py as follows:

ingest_dataset.py --dataset HiTS2015 --gen2 --output workspaces/hits/

The --dataset, --output, --gen2, and --gen3 arguments behave the same way as for ap_verify.py. Other options from ap_verify.py are not available.
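Because ap_verify.py skips ingestion only when its arguments match the earlier ingest_dataset.py run, it can help to build both commands from one shared argument list. The sketch below illustrates that idea under stated assumptions: ingest_then_verify is a hypothetical helper, both scripts are assumed to be on the PATH, and since no --id is passed, the second step processes the entire dataset.

```python
import subprocess


def ingest_then_verify(dataset, output, gen_flag="--gen2", runner=subprocess.run):
    """Run ingest_dataset.py, then ap_verify.py, with matching arguments."""
    # Both programs receive exactly the same shared options.
    shared = ["--dataset", dataset, gen_flag, "--output", output]
    commands = [["ingest_dataset.py", *shared], ["ap_verify.py", *shared]]
    for command in commands:
        # check=True stops the sequence if a step exits with a nonzero status.
        runner(command, check=True)
    return commands
```

The runner parameter exists only so the sequence can be exercised without launching the real programs, for example by passing a function that records the commands it receives.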

How to use measurements of metrics

After ap_verify has run, it produces files that are named ap_verify.<dataId>.verify.json by default and are written to the caller’s directory. The file name may be customized using the --metrics-file command-line argument. These files contain metric measurements in lsst.verify format, and can be loaded and read as described in the lsst.verify documentation or in SQR-019.
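As a quick check that a run produced metrics files, they can be located by their default name pattern and parsed as JSON. This is a minimal sketch that does not use lsst.verify: load_metrics_files is a hypothetical helper, it assumes the default file names were kept, and it treats the file contents as opaque JSON rather than assuming any particular schema. For real analysis, follow the lsst.verify documentation.

```python
import glob
import json
import os


def load_metrics_files(directory="."):
    """Return {file name: parsed JSON} for default-named metrics files."""
    pattern = os.path.join(directory, "ap_verify.*.verify.json")
    results = {}
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            results[os.path.basename(path)] = json.load(f)
    return results


if __name__ == "__main__":
    for name, content in load_metrics_files().items():
        # Only report the top-level type; the schema is defined by lsst.verify.
        print(name, type(content).__name__)
```

If --metrics-file was used to customize the file name, the glob pattern would need to be adjusted to match.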

If the pipeline is interrupted by a fatal error, completed measurements will be saved to metrics files for debugging purposes. See the error-handling policy for details.