Getting started with the AP pipeline (Gen 2)

This page explains how to set up a Gen 2 data repository that can then be processed with the AP Pipeline (see Running the AP pipeline (Gen 2)). This is the established Science Pipelines workflow, and is compatible with a variety of existing pipelines and tools. However, it is expected to be phased out in the future in favor of the Gen 3 framework.

If you already have a Gen 3 data repository or want to learn the new framework, see Getting started with the AP pipeline (Gen 3).

Installation

lsst.ap.pipe is available from the LSST Science Pipelines. It is installed as part of the lsst_apps and lsst_distrib metapackages.

Ingesting data files

LSST-style image processing typically operates on Butler repositories and does not directly interface with data files. lsst.ap.pipe is no exception. The process of turning a set of raw data files and corresponding calibration products into a format the Butler understands is called ingestion. Ingestion can be somewhat camera-specific, and is outside the scope of the AP Pipeline.

A utility to ingest data before running lsst.ap.pipe is available in ap_verify. However, it works only on datasets that adhere to the ap_verify dataset format. Alternatively, you may use a pre-ingested dataset or manually ingest files yourself following the directions for the relevant obs_ package.

A standard ingestion workflow for DECam looks something like this:

ingestImagesDecam.py input_loc --filetype raw path/to/raw/files --mode=link
ingestCuratedCalibs.py input_loc --calib calib_loc $OBS_DECAM_DATA_DIR/decam/defects
ingestCuratedCalibs.py input_loc --calib calib_loc $OBS_DECAM_DATA_DIR/decam/crosstalk
ingestCalibs.py input_loc --calib calib_loc /path/to/biases/and/flats --mode=link --validity 999
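
If you script this workflow rather than typing it by hand, one option is to assemble the four commands in Python, print them for review, and only then run them in order. The sketch below is illustrative, not part of lsst.ap.pipe: the helper names build_ingest_commands and run_ingest are hypothetical, the flags are copied unchanged from the commands above, and paths are passed literally (no shell globbing or variable expansion is performed).

```python
import os
import shlex
import subprocess


def build_ingest_commands(repo, calib_repo, raw_dir, calib_dir, obs_decam_data_dir):
    """Return the four DECam ingestion commands shown above, in order.

    Hypothetical helper: it only assembles argument lists and does not
    check that the scripts or paths exist.
    """
    return [
        ["ingestImagesDecam.py", repo, "--filetype", "raw", raw_dir, "--mode=link"],
        ["ingestCuratedCalibs.py", repo, "--calib", calib_repo,
         os.path.join(obs_decam_data_dir, "decam", "defects")],
        ["ingestCuratedCalibs.py", repo, "--calib", calib_repo,
         os.path.join(obs_decam_data_dir, "decam", "crosstalk")],
        ["ingestCalibs.py", repo, "--calib", calib_repo, calib_dir,
         "--mode=link", "--validity", "999"],
    ]


def run_ingest(commands, dry_run=True):
    """Print each command; if dry_run is False, also run it.

    check=True makes subprocess raise on the first failing step, so later
    steps never run against a partially ingested repository.
    """
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

Calling run_ingest(build_ingest_commands("input_loc", "calib_loc", "path/to/raw/files", "/path/to/biases/and/flats", os.environ["OBS_DECAM_DATA_DIR"])) prints the commands without executing them. Pass dry_run=False from a shell where the Science Pipelines are set up to actually run them.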

Required data products

For the AP Pipeline to successfully process data, the following are required:

  • Raw science images and reference catalogs ingested into a main Butler repository
    • The reference catalogs must be in a directory called ref_cats with subdirectories for each catalog containing the appropriate catalog shards. We recommend using Pan-STARRS for photometry and Gaia for astrometry. An example config file for using these two catalogs can be found in the ap_verify_hits2015 repository.
  • Calibration products (biases, flats, and possibly others) ingested into a Butler repository that you must specify with the --calib flag on the command line at runtime
    • To check if this requirement has been satisfied, you can inspect the calibRegistry.sqlite3 created in this repository and ensure the information in the tables is accurate
  • Template images (of type deepCoadd by default) for difference imaging must be either in the main Butler repository or in another location you may specify with the --template flag on the command line at runtime
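
Some of these requirements can be spot-checked before launching the pipeline. The following is a minimal sketch using only the Python standard library, not an official tool: list_refcats and summarize_registry are hypothetical helper names. It assumes the main repository is a plain directory containing ref_cats and that calibRegistry.sqlite3 is an ordinary SQLite file. The table names it reports depend on the camera, so none are hard-coded here.

```python
import os
import sqlite3


def list_refcats(repo):
    """Return the sorted names of catalog subdirectories under <repo>/ref_cats."""
    ref_dir = os.path.join(repo, "ref_cats")
    if not os.path.isdir(ref_dir):
        return []
    return sorted(
        name for name in os.listdir(ref_dir)
        if os.path.isdir(os.path.join(ref_dir, name))
    )


def summarize_registry(path):
    """Return {table name: row count} for every table in a SQLite registry."""
    # sqlite3.connect() would silently create a missing file, so check first.
    if not os.path.isfile(path):
        raise FileNotFoundError(path)
    conn = sqlite3.connect(path)
    try:
        tables = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        return {
            table: conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
            for table in tables
        }
    finally:
        conn.close()
```

An empty list from list_refcats("input_loc"), or a calibration table with zero rows in summarize_registry("calib_loc/calibRegistry.sqlite3"), suggests that an ingestion step was skipped. Non-zero counts do not by themselves prove that validity ranges or filters are correct, so inspecting the table contents is still worthwhile.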

A sample dataset from the DECam HiTS survey, in the format described in The dataset framework, is available as ap_verify_hits2015 and works with ap_pipe. However, this dataset must first be ingested as described in Ingesting data files, and the reference catalog files must be decompressed and extracted.
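
The decompression step can be scripted. The sketch below assumes, without having checked the dataset, that the reference catalogs ship as gzip-compressed tar archives (*.tar.gz) and that each archive unpacks into a directory named for its catalog. Verify both against the actual files before relying on it. The function name extract_refcat_archives is hypothetical.

```python
import os
import tarfile


def extract_refcat_archives(archive_dir, repo):
    """Extract every .tar.gz in archive_dir into <repo>/ref_cats.

    Returns the sorted list of archive file names that were extracted.
    Only use this on archives from a source you trust, since extraction
    writes whatever paths the archive contains.
    """
    dest = os.path.join(repo, "ref_cats")
    os.makedirs(dest, exist_ok=True)
    extracted = []
    for name in sorted(os.listdir(archive_dir)):
        if name.endswith(".tar.gz"):
            with tarfile.open(os.path.join(archive_dir, name), "r:gz") as archive:
                archive.extractall(dest)
            extracted.append(name)
    return extracted
```

After extraction, each catalog should appear as its own subdirectory of ref_cats, matching the layout described under Required data products.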

Please continue to Pipeline Tutorial for more details about running the AP Pipeline and interpreting the results.