Running ap_verify from the command line¶
ap_verify.py is a Python script designed to be run on both developer machines and verification servers.
While ap_verify.py is not a command-line task, the command-line interface is designed to resemble that of command-line tasks where practical.
This page describes the most common options used to run ap_verify.
For more details, see the ap_verify command-line reference or run ap_verify.py -h.
How to run ap_verify in a new workspace (Gen 2 pipeline)¶
Using the Cosmos PDR2 CI dataset as an example, one can run ap_verify.py as follows:
ap_verify.py --dataset ap_verify_ci_cosmos_pdr2 --gen2 --id "visit=59150^59160 filter=HSC-G" -j4 --output workspaces/cosmos/
Here the inputs are:
- ap_verify_ci_cosmos_pdr2 is the ap_verify dataset to process,
- --gen2 specifies to process the dataset using the Gen 2 pipeline framework,
- visit=59150^59160 filter=HSC-G is the dataId to process,
- -j causes the ingest and processing pipelines to use 4 processes: choose a value appropriate for your machine; the system does not automatically determine how many parallel processes to use.
while the output is:
- workspaces/cosmos/ is the location where the pipeline will create any Butler repositories necessary.
This call will create a new directory at workspaces/cosmos, ingest the Cosmos data into a new repository based on <cosmos-data>/repo/, then run visits 59150 and 59160 through the entire AP pipeline.
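The Gen 2 dataId string passed to --id uses space-separated key=value pairs, with ^ separating multiple values for one key. As an illustration only (this sketch is not part of ap_verify, which does its own parsing internally), the string above decomposes like this:

```python
# Illustrative sketch: decomposing a Gen 2 dataId string such as
# "visit=59150^59160 filter=HSC-G".  Space separates key=value pairs;
# "^" separates multiple values for a single key.

def parse_data_id(data_id: str) -> dict:
    """Split a Gen 2 dataId string into {key: [values]}."""
    result = {}
    for pair in data_id.split():
        key, _, values = pair.partition("=")
        result[key] = values.split("^")
    return result

print(parse_data_id("visit=59150^59160 filter=HSC-G"))
# {'visit': ['59150', '59160'], 'filter': ['HSC-G']}
```

So the example command selects two visits (59150 and 59160) in one filter (HSC-G).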
It’s also possible to run an entire dataset by omitting the --id argument (as some datasets are very large, do this with caution):
ap_verify.py --dataset ap_verify_ci_cosmos_pdr2 --gen2 -j4 --output workspaces/cosmos/
Note
The command-line interface for ap_verify.py is at present more limited than those of command-line tasks. See the ap_verify command-line reference for details.
How to run ap_verify in a new workspace (Gen 3 pipeline)¶
Using the Cosmos PDR2 CI dataset as an example, one can run ap_verify.py as follows:
ap_verify.py --dataset ap_verify_ci_cosmos_pdr2 --gen3 --data-query "visit in (59150, 59160) and band='g'" -j4 --output workspaces/cosmos/
Here the inputs are:
- ap_verify_ci_cosmos_pdr2 is the ap_verify dataset to process,
- --gen3 specifies to process the dataset using the Gen 3 pipeline framework,
- visit in (59150, 59160) and band='g' is the data ID query to process,
- -j causes the ingest and processing pipelines to use 4 processes: choose a value appropriate for your machine; the system does not automatically determine how many parallel processes to use.
while the output is:
- workspaces/cosmos/ is the location where the pipeline will create a Butler repository along with other outputs such as the alert production database.
This call will create a new directory at workspaces/cosmos, ingest the Cosmos data into a new repository, then run visits 59150 and 59160 through the entire AP pipeline.
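The Gen 3 --data-query string is evaluated against the Butler registry. To show only what the example query selects (this sketch merely mimics the selection on a small hypothetical list of data IDs; the real evaluation is done by the Butler):

```python
# Illustrative sketch: the selection implied by the data query
# "visit in (59150, 59160) and band='g'", mimicked in plain Python
# on a hypothetical list of data IDs.

data_ids = [
    {"visit": 59150, "band": "g"},
    {"visit": 59160, "band": "g"},
    {"visit": 59161, "band": "r"},  # excluded: matches neither clause
]

selected = [d for d in data_ids
            if d["visit"] in (59150, 59160) and d["band"] == "g"]
print(selected)
# [{'visit': 59150, 'band': 'g'}, {'visit': 59160, 'band': 'g'}]
```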
It’s also possible to run an entire dataset by omitting the --data-query argument (as some datasets are very large, do this with caution):
ap_verify.py --dataset ap_verify_ci_cosmos_pdr2 --gen3 -j4 --output workspaces/cosmos/
Note
Because the science pipelines are still being converted to Gen 3, Gen 3 processing may not be supported for all ap_verify datasets. See the individual dataset’s documentation for more details.
Warning
Some datasets require particular data queries in order to successfully run through the pipeline, due to missing data or other limitations.
Check the README.md in each dataset’s main directory for what additional arguments might be necessary.
How to run ingestion by itself¶
ap_verify includes a separate program, ingest_dataset.py, that ingests datasets into repositories but does not run the pipeline on them.
This is useful if the data need special processing or as a precursor to massive processing runs.
Running ap_verify.py with the same arguments as a previous run of ingest_dataset.py will automatically skip ingestion.
Using the Cosmos PDR2 dataset as an example, one can run ingest_dataset in Gen 2 as follows:
ingest_dataset.py --dataset ap_verify_ci_cosmos_pdr2 --gen2 --output workspaces/cosmos/
The --dataset, --output, --gen2, --gen3, -j (does not apply to --gen2), and --processes arguments behave the same way as for ap_verify.py.
Other options from ap_verify.py are not available.
How to use measurements of metrics (Gen 2 pipeline)¶
After ap_verify has run, it will produce files named, by default, ap_verify.<dataId>.verify.json in the caller’s directory.
The file name may be customized using the --metrics-file command-line argument.
These files contain metric measurements in lsst.verify format, and can be loaded and read as described in the lsst.verify documentation or in SQR-019.
If the pipeline is interrupted by a fatal error, completed measurements will be saved to metrics files for debugging purposes. See the error-handling policy for details.
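For a quick look at a metrics file without the full lsst.verify machinery, the JSON can be inspected with the standard library. The exact schema is defined by lsst.verify (consult its documentation); the sketch below assumes a top-level "measurements" list whose entries carry "metric" and "value" fields, and the metric name shown is only a stand-in:

```python
# Hedged sketch: summarizing an ap_verify .verify.json file with only
# the standard library.  Assumes (per the lsst.verify JSON format) a
# top-level "measurements" list with "metric" and "value" fields.
# For real files, use: job = json.load(open("ap_verify.<dataId>.verify.json"))
import json

def summarize(job: dict) -> dict:
    """Map metric name -> measured value from a loaded verify.json dict."""
    return {m["metric"]: m["value"] for m in job.get("measurements", [])}

# A tiny in-memory stand-in for real file contents (hypothetical values):
sample = {"measurements": [
    {"metric": "ap_association.totalUnassociatedDiaObjects", "value": 42},
]}
print(summarize(sample))
# {'ap_association.totalUnassociatedDiaObjects': 42}
```

For anything beyond a quick inspection, prefer the lsst.verify API, which understands units and metadata.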
How to use measurements of metrics (Gen 3 pipeline)¶
After ap_verify has run, it will produce Butler datasets named metricValue_<metric package>_<metric>.
These can be queried, like any Butler dataset, using methods like queryDatasetTypes and get.
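A minimal sketch of such a query, assuming the LSST stack is set up (the import is deferred so the snippet parses without it, and the workspace path is the one from the examples above, not something this sketch verifies):

```python
# Hedged sketch: listing metric-value dataset types in the ap_verify
# output repository via the Gen 3 Butler Python API.  Requires the LSST
# stack at call time; the import is deferred so this file parses without it.

def list_metric_dataset_types(repo: str = "workspaces/cosmos"):
    """Return the metricValue_* dataset types registered in a Gen 3 repo."""
    from lsst.daf.butler import Butler  # LSST stack required
    butler = Butler(repo)
    # Dataset types follow the pattern metricValue_<metric package>_<metric>:
    return list(butler.registry.queryDatasetTypes("metricValue_*"))
```

Individual values can then be retrieved with butler.get on a dataset reference found via the registry.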
Note
Not all metric values need have the same data ID as the data run through the pipeline. For example, metrics describing the full focal plane have a visit but no detector.