Running ap_verify from the command line

ap_verify.py is a Python script designed to be run on both developer machines and verification servers. This page describes the most common options used to run ap_verify. For more details, see the ap_verify command-line reference or run ap_verify.py -h.

How to run ap_verify in a new workspace

Using the Cosmos PDR2 CI dataset as an example, one can run ap_verify.py as follows.

First, download and set up the dataset:

git clone https://github.com/lsst/ap_verify_ci_cosmos_pdr2/
setup -r ap_verify_ci_cosmos_pdr2

You will need to set up the dataset each time you want to use it.

ap_verify.py --dataset ap_verify_ci_cosmos_pdr2 --data-query "visit in (59150, 59160)" -j4 --output workspaces/cosmos/

Here the inputs are:

  • ap_verify_ci_cosmos_pdr2 is the ap_verify dataset to process,

  • visit in (59150, 59160) is the data ID query to process,

  • -j4 runs the ingest and processing pipelines with 4 parallel processes. Choose a value appropriate for your machine; ap_verify does not determine the number of parallel processes automatically.

while the output is:

  • workspaces/cosmos/ is the location where the pipeline will create a Butler repository along with other outputs such as the alert production database.

This call will create a new directory at workspaces/cosmos, ingest the Cosmos data into a new repository, then run visits 59150 and 59160 through the entire AP pipeline.
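Since ap_verify does not pick a process count for you, one option is to derive -j from the machine's core count. The sketch below only assembles and prints the command line from this example; the cap of 8 is an arbitrary illustration, not an ap_verify default.

```python
import os
import shlex

# ap_verify does not choose a process count automatically; derive -j from
# the machine's core count (the cap of 8 here is an arbitrary example).
n_procs = min(os.cpu_count() or 1, 8)
cmd = ["ap_verify.py",
       "--dataset", "ap_verify_ci_cosmos_pdr2",
       "--data-query", "visit in (59150, 59160)",
       f"-j{n_procs}",
       "--output", "workspaces/cosmos/"]
print(shlex.join(cmd))
```

shlex.join quotes the data ID query for you, so the printed line can be pasted directly into a shell.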

It’s also possible to run an entire dataset by omitting the --data-query argument (since some datasets are very large, do this with caution):

ap_verify.py --dataset ap_verify_ci_cosmos_pdr2 -j4 --output workspaces/cosmos/

Warning

Some datasets require particular data queries in order to successfully run through the pipeline, due to missing data or other limitations. Check the README.md in each dataset’s main directory for what additional arguments might be necessary.

How to run ingestion by itself

ap_verify includes a separate program, ingest_dataset.py, that ingests datasets into repositories but does not run the pipeline on them. This is useful when the data need special processing, or as a precursor to large processing runs. Running ap_verify.py with the same arguments as a previous run of ingest_dataset.py will automatically skip ingestion.

Using the Cosmos PDR2 dataset as an example, one can run ingest_dataset as follows:

ingest_dataset.py --dataset ap_verify_ci_cosmos_pdr2 -j4 --output workspaces/cosmos/

The --dataset, --output, and -j/--processes arguments behave the same way as for ap_verify.py. Other options from ap_verify.py are not available.
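The ingest-then-run workflow can be scripted by passing the same arguments to both programs, which is what lets ap_verify.py skip the already-completed ingestion. The sketch below only builds and prints the two command lines (using the doc's example values); the actual subprocess call is left commented out because it requires the LSST stack to be set up.

```python
import shlex

# The same arguments are passed to both programs, so the later
# ap_verify.py run detects the earlier ingest and skips ingestion.
common = ["--dataset", "ap_verify_ci_cosmos_pdr2",
          "-j4", "--output", "workspaces/cosmos/"]
steps = [["ingest_dataset.py"] + common,
         ["ap_verify.py"] + common]
for cmd in steps:
    print(shlex.join(cmd))
    # subprocess.run(cmd, check=True)  # uncomment with the LSST stack set up
```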

How to use measurements of metrics

After ap_verify has run, it will produce Butler datasets named metricValue_<metric package>_<metric>. These can be queried, like any Butler dataset, using methods like queryDatasetTypes and get.

Note

Not all metric values need have the same data ID as the data run through the pipeline. For example, metrics describing the full focal plane have a visit but no detector.
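As an illustration, here is a hedged Python sketch of reading metric values back from a workspace. The helper names metric_value_dataset_type and load_metric_values are hypothetical, and the Gen3 lsst.daf.butler calls are a sketch, not a tested recipe; adapt the repository path and collection names to your workspace.

```python
def metric_value_dataset_type(package: str, metric: str) -> str:
    """Build the Butler dataset type name ap_verify uses for one metric."""
    return f"metricValue_{package}_{metric}"


def load_metric_values(repo, collections, name_glob="metricValue_*"):
    """Yield (data ID, value) pairs for matching metric datasets.

    Assumes the LSST Science Pipelines are set up; the Butler calls
    below follow the Gen3 lsst.daf.butler API.
    """
    from lsst.daf.butler import Butler  # deferred: needs the LSST stack

    butler = Butler(repo, collections=collections)
    for dataset_type in butler.registry.queryDatasetTypes(name_glob):
        for ref in butler.registry.queryDatasets(dataset_type,
                                                 collections=collections):
            yield ref.dataId, butler.get(ref)


# Example name only; substitute the package and metric you care about.
print(metric_value_dataset_type("ap_association", "numNewDiaObjects"))
```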

Further reading