Running ap_verify from the command line

ap_verify.py is a Python script designed to be run on both developer machines and verification servers. While ap_verify.py is not a command-line task, the command-line interface is designed to resemble that of command-line tasks where practical. This page describes the most common options used to run ap_verify. For more details, see the ap_verify command-line reference or run ap_verify.py -h.

Datasets as input arguments

Since ap_verify begins with an uningested dataset, the input argument is a dataset name rather than a repository.

Datasets are identified by a name that is mapped to an installed eups-registered package containing the data. The mapping is configurable. The dataset names are a placeholder for a future data repository versioning system, and may be replaced in a later version of ap_verify.
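As an illustration, the name lookup might resemble the following sketch. The mapping table and helper function here are assumptions invented for this example; the real mapping in ap_verify is read from configuration rather than hard-coded.

from lsst.utils import getPackageDir

# Hypothetical name-to-package table; ap_verify loads its mapping from
# configuration, not from a literal like this.
DATASET_PACKAGES = {
    "HiTS2015": "ap_verify_hits2015",
}

def dataset_root(name):
    """Return the filesystem root of the eups package backing a dataset name.

    getPackageDir raises LookupError if the package is not set up with eups.
    """
    return getPackageDir(DATASET_PACKAGES[name])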

How to run ap_verify in a new workspace (Gen 2 pipeline)

Using the HiTS 2015 dataset as an example, one can run ap_verify.py as follows:

ap_verify.py --dataset HiTS2015 --gen2 --id "visit=412518^412568 filter=g" --output workspaces/hits/

Here the inputs are:

  • HiTS2015 is the ap_verify dataset name,
  • --gen2 specifies to process the dataset using the Gen 2 pipeline framework,
  • visit=412518^412568 filter=g is the dataId to process,

while the output is:

  • workspaces/hits/ is the location where the pipeline will create any necessary Butler repositories.

This call will create a new directory at workspaces/hits, ingest the HiTS data into a new repository based on <hits-data>/repo/, then run visits 412518 and 412568 through the entire AP pipeline.

It’s also possible to run an entire dataset by omitting the --id argument (do this with caution, as some datasets are very large):

ap_verify.py --dataset CI-HiTS2015 --gen2 --output workspaces/hits/

Note

The command-line interface for ap_verify.py is at present more limited than those of command-line tasks. See the ap_verify command-line reference for details.

How to run ap_verify in a new workspace (Gen 3 pipeline)

The command for running the pipeline on Gen 3 data is almost identical to Gen 2:

ap_verify.py --dataset HiTS2015 --gen3 --id "visit in (412518, 412568) and band='g'" --output workspaces/hits/

The only differences are substituting --gen3 for --gen2, and formatting the (optional) data ID in the Gen 3 query syntax. For further compatibility with Gen 3 pipelines, --id may be replaced with --data-query.
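For example, the command above can equivalently be written as:

ap_verify.py --dataset HiTS2015 --gen3 --data-query "visit in (412518, 412568) and band='g'" --output workspaces/hits/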

Note

Because the science pipelines are still being converted to Gen 3, Gen 3 processing may not be supported for all ap_verify datasets. See the individual dataset’s documentation for more details.

How to run ingestion by itself

ap_verify includes a separate program, ingest_dataset.py, that ingests datasets into repositories but does not run the pipeline on them. This is useful when the data need special processing, or as a precursor to large processing runs. Running ap_verify.py with the same arguments as a previous run of ingest_dataset.py will automatically skip ingestion.

Using the HiTS 2015 dataset as an example, one can run ingest_dataset as follows:

ingest_dataset.py --dataset HiTS2015 --gen2 --output workspaces/hits/

The --dataset, --output, --gen2, --gen3, and --processes arguments behave the same way as for ap_verify.py. Other options from ap_verify.py are not available.

How to use measurements of metrics (Gen 2 pipeline)

After ap_verify has run, it will produce files named, by default, ap_verify.<dataId>.verify.json in the caller’s directory. The file name may be customized using the --metrics-file command-line argument. These files contain metric measurements in lsst.verify format, and can be loaded and read as described in the lsst.verify documentation or in SQR-019.
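As a sketch, such a file might be loaded as follows; the specific file name below is a hypothetical instance of the default pattern, chosen for illustration:

import json

from lsst.verify import Job

# Load the measurements saved by a previous ap_verify run.
with open("ap_verify.visit=412518.verify.json") as f:
    job = Job.deserialize(**json.load(f))

# Print every recorded metric measurement.
for metric_name, measurement in job.measurements.items():
    print(metric_name, measurement.quantity)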

If the pipeline is interrupted by a fatal error, completed measurements will be saved to metrics files for debugging purposes. See the error-handling policy for details.

How to use measurements of metrics (Gen 3 pipeline)

After ap_verify has run, it will produce Butler datasets named metricValue_<metric package>_<metric>. These can be queried, like any Butler dataset, using methods like queryDatasetTypes and get.
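As a sketch, the metric values might be retrieved along these lines; the repository path, collection name, dataset type name, and data ID below are all assumptions for illustration, and should be checked against your actual workspace:

from lsst.daf.butler import Butler

# Connect to the Gen 3 repository in the ap_verify workspace
# (path and collection are assumptions; check your workspace layout).
butler = Butler("workspaces/hits/", collections="ap_verify-output")

# List every metric-value dataset type registered in the repository.
for dataset_type in butler.registry.queryDatasetTypes("metricValue_*"):
    print(dataset_type.name)

# Fetch a single (hypothetical) measurement by dataset type and data ID.
value = butler.get(
    "metricValue_ap_association_numNewDiaObjects",
    instrument="DECam", visit=412518, detector=5,
)
print(value)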

Note

Not all metric values need have the same data ID as the data run through the pipeline. For example, metrics describing the full focal plane have a visit but no detector.