Running ap_verify from the command line

ap_verify.py is a Python script designed to be run on both developer machines and verification servers. This page describes the most common options used to run ap_verify. For more details, see the ap_verify command-line reference or run ap_verify.py -h.

This guide assumes that the dataset(s) to be run are already installed on the machine. If this is not the case, see Installing datasets.

How to run ap_verify in a new workspace

Using the Cosmos PDR2 CI dataset as an example, first set up the dataset package, if it isn’t already set up.

setup [-r] ap_verify_ci_cosmos_pdr2

You will need to set up the dataset once each session.
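
If you are unsure whether the package is already set up, eups can report its status. As a quick check (assuming a standard EUPS-based installation; the output format varies by EUPS version):

eups list -s ap_verify_ci_cosmos_pdr2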

Next, clone the model package for the Real-Bogus (RB) classifier and set it up as follows.

setup [-r] rbClassifier_data

You will also need to set up rbClassifier_data once each session.

You can then run ap_verify.py as follows.

ap_verify.py --dataset ap_verify_ci_cosmos_pdr2 -j4 --output workspaces/cosmos/

Here the inputs are:

  • ap_verify_ci_cosmos_pdr2 is the ap_verify dataset to process,

  • -j 4 causes the ingest and processing pipelines to use 4 parallel processes; choose a value appropriate for your machine, as ap_verify does not determine the number of processes automatically.

while the output is:

  • workspaces/cosmos/ is the location where the pipeline will create a Butler repository along with other outputs such as the alert production database.

This call will create a new directory at workspaces/cosmos/, ingest the Cosmos data into a new repository, and then run visits 59150 and 59160 through the entire AP pipeline.

Warning

Some datasets require particular data ID queries (e.g. --data-query "visit in ...") in order to successfully run through the pipeline, due to missing data or other limitations. Check the README.md in each dataset’s main directory for what additional arguments might be necessary.
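
For example, if a dataset’s README calls for restricting processing to particular visits, the query is passed with --data-query. The following invocation is purely illustrative, reusing the two Cosmos visits mentioned above:

ap_verify.py --dataset ap_verify_ci_cosmos_pdr2 --data-query "visit in (59150, 59160)" -j4 --output workspaces/cosmos/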

How to run ingestion by itself

ap_verify includes a separate program, ingest_dataset.py, that ingests datasets into repositories but does not run the pipeline on them. This is useful if the data need special processing, or as a precursor to massive processing runs. Running ap_verify.py with the same arguments as a previous run of ingest_dataset.py will automatically skip ingestion.

Using the Cosmos PDR2 CI dataset as an example, one can run ingest_dataset.py as follows:

ingest_dataset.py --dataset ap_verify_ci_cosmos_pdr2 -j4 --output workspaces/cosmos/

The --dataset, --output, and -j/--processes arguments behave the same way as for ap_verify.py. Other options from ap_verify.py are not available.
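
For example, after the ingest_dataset.py call above, the following ap_verify.py run will find the already-ingested data in workspaces/cosmos/ and proceed directly to processing:

ap_verify.py --dataset ap_verify_ci_cosmos_pdr2 -j4 --output workspaces/cosmos/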

How to use measurements of metrics

After ap_verify has run, it will have produced Butler datasets of types named metricValue_<metric package>_<metric>. These can be queried like any other Butler dataset, using methods such as queryDatasetTypes and get.

Note

Not all metric values need to have the same data ID as the data run through the pipeline. For example, metrics describing the full focal plane have a visit but no detector.
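
You can also inspect the repository from the command line with the butler tool shipped with recent versions of daf_butler. As a rough sketch, assuming the workspace root doubles as the Butler repository (adjust the path if your workspace layout differs), the available metric-value dataset types can be listed with:

butler query-dataset-types workspaces/cosmos/ "metricValue_*"

Each reported dataset type can then be retrieved with get, using a data ID of the appropriate granularity (for a focal-plane metric, a visit with no detector).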

Further reading