Running the AP pipeline

Setup

Pick up where you left off in Getting Started. This means you already have a repository of ingested DECam data and have set up the LSST Science Pipelines stack.

Your repository should have the following collections, which can be checked using butler query-collections <repo>:

DECam/calib: biases, flats, defects, camera specs, etc.
DECam/raw/all: images to be processed
refcats: reference catalogs for calibration
skymaps: index for the templates
templates/deep: deepCoadd templates for difference imaging
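
As a quick sanity check, the list above can be compared against the collection names in your repository. The following is a minimal sketch, not part of ap_pipe: missing_collections is a hypothetical helper, and it assumes you have already obtained the collection names as plain strings (for example, by copying them from the output of butler query-collections <repo>).

```python
# Collections this tutorial expects, taken from the list above.
REQUIRED_COLLECTIONS = (
    "DECam/calib",
    "DECam/raw/all",
    "refcats",
    "skymaps",
    "templates/deep",
)


def missing_collections(available):
    """Return the required collections that are absent from ``available``.

    ``available`` is any iterable of collection names (plain strings).
    This helper is illustrative only; it is not part of the LSST stack.
    """
    present = set(available)
    return [name for name in REQUIRED_COLLECTIONS if name not in present]


if __name__ == "__main__":
    # Hypothetical listing in which the templates were never ingested.
    found = ["DECam/calib", "DECam/raw/all", "refcats", "skymaps"]
    print(missing_collections(found))
```

An empty result means every collection named above is present; any names returned should be ingested before running the pipeline.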

AP pipeline on the command line

Like most Vera C. Rubin Observatory pipelines, the AP Pipeline is run with an external runner called pipetask. This can be found in the ctrl_mpexec package, which is included as part of lsst_distrib.

The pipeline itself is configured in ap_pipe/pipelines/DECam/ApPipe.yaml.

To process your ingested data, run

apdb-cli create-sql "sqlite:///apdb.db" apdb_config.py
pipetask run -p ${AP_PIPE_DIR}/pipelines/DECam/ApPipe.yaml \
    --register-dataset-types -c parameters:coaddName=deep \
    -c isr:connections.bias=cpBias -c isr:connections.flat=cpFlat \
    -c parameters:apdb_config=apdb_config.py -b repo/ \
    -i "DECam/defaults,DECam/raw/all" -o processed \
    -d "visit in (411420, 419802) and detector=10"

In this case, pipetask creates a processed/<timestamp> collection within repo and writes the results there. The apdb_config.py file is created by apdb-cli and then passed to pipetask through the parameters:apdb_config option. See Setting up the Alert Production Database for ap_pipe for more information on apdb-cli.

This example command processes only visits 411420 and 419802, and only detector 10 of each visit.
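
The -d expression is an ordinary string, so it can be assembled programmatically when the list of visits is long. The following is a minimal sketch that covers only the visit and detector constraints used in the example command; build_data_query is a hypothetical helper, not a pipetask or Butler function.

```python
def build_data_query(visits, detector=None):
    """Build a ``-d`` data query like the one in the example command.

    ``visits`` is a non-empty iterable of integer visit IDs; ``detector``
    is an optional integer detector ID. Hypothetical helper for illustration.
    """
    visit_ids = [int(v) for v in visits]
    if not visit_ids:
        raise ValueError("at least one visit is required")
    query = "visit in ({})".format(", ".join(str(v) for v in visit_ids))
    if detector is not None:
        query += " and detector={}".format(int(detector))
    return query


if __name__ == "__main__":
    # Reproduces the query string passed to -d in the example command.
    print(build_data_query([411420, 419802], detector=10))
```

Remember to quote the resulting string when passing it on the command line, as the example command does.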

The example creates a “chained” output collection that can refer back to its inputs. If you prefer to have a standalone output collection, you may instead run

pipetask run -p ${AP_PIPE_DIR}/pipelines/DECam/ApPipe.yaml \
    --register-dataset-types -c parameters:coaddName=deep \
    -c isr:connections.bias=cpBias -c isr:connections.flat=cpFlat \
    -c parameters:apdb_config=apdb_config.py -b repo/ \
    -i "DECam/defaults,DECam/raw/all" --output-run processed \
    -d "visit in (411420, 419802) and detector=10"

Note

You must configure the pipeline to use the APDB config file, or ap_pipe will not run. In the examples above, it is configured with the -c option.

Note

Both examples above are only valid when running the pipeline for the first time. When rerunning with an existing chained collection using -o, you should omit the -i argument. When rerunning with an existing standalone collection using --output-run, you must pass --extend-run.
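
The rules in the note above can be summarized in a small helper that picks the collection-related arguments for a given situation. This is a sketch based only on the rules stated in this section: collection_args is a hypothetical function that builds an argument list and does not run anything. It also assumes that -i is kept unchanged when rerunning with --output-run, which the note does not state explicitly.

```python
def collection_args(output, inputs, chained=True, first_run=True):
    """Return the collection-related arguments for ``pipetask run``.

    Encodes the rules described in this section:
    * chained output (-o): pass -i on the first run, omit it on reruns;
    * standalone output (--output-run): add --extend-run on reruns.
    Hypothetical helper; it only builds an argument list.
    """
    input_arg = ["-i", ",".join(inputs)]
    if chained:
        args = ["-o", output]
        if first_run:
            args = input_arg + args
        return args
    # Assumption: -i stays in place for standalone reruns.
    args = input_arg + ["--output-run", output]
    if not first_run:
        args.append("--extend-run")
    return args


if __name__ == "__main__":
    inputs = ["DECam/defaults", "DECam/raw/all"]
    # Rerun into an existing chained collection: no -i argument.
    print(collection_args("processed", inputs, chained=True, first_run=False))
```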

Expected outputs

If you used the chained option above, most of the output from ap_pipe should be written to a timestamped collection (e.g., processed/20200131T00h00m00s) in the repository. The exception is the source association database, which will be written to the location you configure. The result from running ap_pipe should look something like

apdb.db   <--- the Alert Production Database with DIAObjects
repo/
   contains_no_user_servicable_files/

To inspect this data, instantiate a Butler in Python and use it to retrieve the data products.

For example, in Python:

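import lsst.daf.butler as dafButler

# Open the repository; "processed" is the output collection from the commands above.
butler = dafButler.Butler('repo', collections="processed")  # collections keyword is optional
# Data ID of one detector-visit processed by the example command.
dataId = {'instrument': 'DECam', 'visit': 411420, 'detector': 10}
calexp = butler.get('calexp', dataId=dataId)  # calibrated exposure
diffim = butler.get('deepDiff_differenceExp', dataId=dataId)  # difference image
diaSourceTable = butler.get('deepDiff_diaSrc', dataId=dataId)  # DIASource catalog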

Supplemental information

Running on other cameras

Running ap_pipe on cameras other than DECam works much the same way. You need to provide a repository containing raws, calibs, and templates appropriate for the camera. There are versions of the AP pipeline for DECam, HSC, LATISS, and ImSim.

Common errors

‘KeyError: DatasetType <type> could not be found’: This usually means you left out the --register-dataset-types argument.
‘Expected exactly one instance of input <arbitrary dataset>’: This may mean an invalid pipeline, but can also mean that you did not provide an -i or --input argument when it was required. This is especially likely if the data ID is not one of the expected values.
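
When wrapping pipetask in a script, the two messages above can be matched against captured error text to print a hint. The following is a minimal sketch: the name suggest_fix and the exact substrings matched are assumptions based only on the messages quoted above, and real error text may differ between releases.

```python
# Fragments from the messages listed above, paired with the likely cause.
# Every fragment in a tuple must appear in the error text for a match.
KNOWN_ERRORS = (
    (("DatasetType", "could not be found"),
     "You may have left out the --register-dataset-types argument."),
    (("Expected exactly one instance of input",),
     "Check the pipeline, the -i/--input argument, and the data ID."),
)


def suggest_fix(error_text):
    """Return a hint for a recognized error message, or None otherwise.

    Hypothetical helper for illustration; not part of the LSST stack.
    """
    for fragments, hint in KNOWN_ERRORS:
        if all(fragment in error_text for fragment in fragments):
            return hint
    return None


if __name__ == "__main__":
    print(suggest_fix("KeyError: DatasetType calexp could not be found"))
```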
