Running ap_verify from the command line

ap_verify.py is a Python script designed to be run on both developer machines and verification servers. While ap_verify.py is not a command-line task, its command-line interface resembles that of command-line tasks where practical. This page describes the most common options used to run ap_verify. For more details, see the ap_verify command-line reference or run ap_verify.py -h.

Datasets as input arguments

Since ap_verify begins with an uningested dataset, the input argument is a dataset name rather than a repository.

Datasets are identified by a name that is mapped to an installed eups-registered package containing the data. The mapping is configurable. The dataset names are a placeholder for a future data repository versioning system and may be replaced in a later version of ap_verify.
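As an illustration, such a name-to-package mapping might look like the following configuration fragment. This is a sketch only: the file name, schema, and the package name ap_verify_ci_cosmos_pdr2 are assumptions for illustration, not guaranteed details; consult the ap_verify documentation for the authoritative format.

```yaml
# Hypothetical mapping of ap_verify dataset names to eups packages.
# The actual configuration file and its schema may differ.
datasets:
  CI-CosmosPDR2: ap_verify_ci_cosmos_pdr2
```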

How to run ap_verify in a new workspace (Gen 2 pipeline)

Using the Cosmos PDR2 CI dataset as an example, one can run ap_verify.py as follows:

ap_verify.py --dataset CI-CosmosPDR2 --gen2 --id "visit=59150^59160 filter=HSC-G" --output workspaces/cosmos/

Here the inputs are:

  • CI-CosmosPDR2 is the ap_verify dataset name,
  • --gen2 specifies to process the dataset using the Gen 2 pipeline framework,
  • visit=59150^59160 filter=HSC-G is the dataId to process,

while the output is:

  • workspaces/cosmos/ is the location where the pipeline will create any necessary Butler repositories.

This call will create a new directory at workspaces/cosmos, ingest the Cosmos data into a new repository based on <cosmos-data>/repo/, then run visits 59150 and 59160 through the entire AP pipeline.

It’s also possible to run an entire dataset by omitting the --id argument (some datasets are very large, so do this with caution):

ap_verify.py --dataset CI-CosmosPDR2 --gen2 --output workspaces/cosmos/

Note

The command-line interface for ap_verify.py is at present more limited than those of command-line tasks. See the ap_verify command-line reference for details.

How to run ap_verify in a new workspace (Gen 3 pipeline)

Using the Cosmos PDR2 CI dataset as an example, one can run ap_verify.py as follows:

ap_verify.py --dataset CI-CosmosPDR2 --gen3 --data-query "visit in (59150, 59160) and band='g'" --output workspaces/cosmos/

Here the inputs are:

  • CI-CosmosPDR2 is the ap_verify dataset name,
  • --gen3 specifies to process the dataset using the Gen 3 pipeline framework,
  • visit in (59150, 59160) and band='g' is the data ID query to process,

while the output is:

  • workspaces/cosmos/ is the location where the pipeline will create a Butler repository along with other outputs such as the alert production database.

This call will create a new directory at workspaces/cosmos, ingest the Cosmos data into a new repository, then run visits 59150 and 59160 through the entire AP pipeline.

It’s also possible to run an entire dataset by omitting the --data-query argument (some datasets are very large, so do this with caution):

ap_verify.py --dataset CI-CosmosPDR2 --gen3 --output workspaces/cosmos/

Note

Because the science pipelines are still being converted to Gen 3, Gen 3 processing may not be supported for all ap_verify datasets. See the individual dataset’s documentation for more details.

How to run ingestion by itself

ap_verify includes a separate program, ingest_dataset.py, that ingests datasets into repositories but does not run the pipeline on them. This is useful if the data need special processing or as a precursor to massive processing runs. Running ap_verify.py with the same arguments as a previous run of ingest_dataset.py will automatically skip ingestion.

Using the Cosmos PDR2 CI dataset as an example, one can run ingest_dataset.py in Gen 2 as follows:

ingest_dataset.py --dataset CI-CosmosPDR2 --gen2 --output workspaces/cosmos/

The --dataset, --output, --gen2, --gen3, and --processes arguments behave the same way as for ap_verify.py. Other options from ap_verify.py are not available.

How to use measurements of metrics (Gen 2 pipeline)

After ap_verify has run, it will produce files named, by default, ap_verify.<dataId>.verify.json in the caller’s directory. The file name may be customized using the --metrics-file command-line argument. These files contain metric measurements in lsst.verify format, and can be loaded and read as described in the lsst.verify documentation or in SQR-019.

If the pipeline is interrupted by a fatal error, completed measurements will be saved to metrics files for debugging purposes. See the error-handling policy for details.
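As a sketch of how such a file can be inspected even without the full lsst.verify stack, the standard json module is enough to pull out metric names and values. The file layout below (a top-level "measurements" list whose entries carry "metric" and "value" keys, with hypothetical metric names) is a simplified assumption modeled on the lsst.verify JSON format; consult the lsst.verify documentation or SQR-019 for the authoritative schema.

```python
import json

# Simplified, assumed layout of an ap_verify.<dataId>.verify.json file;
# the real lsst.verify schema carries more fields (units, blobs, metadata).
example = """
{
  "measurements": [
    {"metric": "ap_association.totalUnassociatedDiaObjects", "value": 42},
    {"metric": "ap_pipe.ApPipeTime", "value": 312.5}
  ]
}
"""

job = json.loads(example)
for m in job["measurements"]:
    # Print each metric name with its measured value.
    print(f'{m["metric"]} = {m["value"]}')
```

In practice one would pass the path of a real metrics file to json.load instead of parsing an inline string.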

How to use measurements of metrics (Gen 3 pipeline)

After ap_verify has run, it will produce Butler datasets named metricValue_<metric package>_<metric>. These can be queried, like any Butler dataset, using registry methods such as queryDatasetTypes and retrieved with Butler.get.

Note

Not all metric values need have the same data ID as the data run through the pipeline. For example, metrics describing the full focal plane have a visit but no detector.
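The query pattern described above can be sketched as follows. This is a sketch only, assuming the Gen 3 middleware (lsst.daf.butler) is installed and configured; the repository path and collection name in the usage note are hypothetical placeholders, not names guaranteed by ap_verify.

```python
def print_metric_values(repo, collection):
    """Sketch: list all metricValue_* datasets in a Gen 3 repository.

    Assumes the LSST Gen 3 middleware is available; `repo` and
    `collection` stand in for an actual ap_verify workspace.
    """
    from lsst.daf.butler import Butler  # deferred so the sketch is importable

    butler = Butler(repo, collections=collection)
    # queryDatasetTypes accepts glob-style patterns such as "metricValue_*".
    for dataset_type in butler.registry.queryDatasetTypes("metricValue_*"):
        for ref in butler.registry.queryDatasets(dataset_type,
                                                 collections=collection):
            measurement = butler.get(ref)
            # As noted above, a metric value's data ID may differ from that
            # of the processed data (e.g. a visit with no detector).
            print(ref.dataId, measurement)
```

For example, print_metric_values("workspaces/cosmos", "ap_verify-output") would enumerate the metric values in the workspace created above; the collection name here is a guess, so check the workspace's Butler for the actual output collection.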