Running the AP pipeline

Setup

Pick up where you left off in Getting Started. This means you already have a repository of ingested DECam data and have set up the LSST Science Pipelines stack.

Your repository should have the following collections, which you can check with butler query-collections <repo> (see the example after this list):

  • DECam/calib: biases, flats, defects, camera specs, etc.

  • DECam/raw/all: images to be processed

  • refcats: reference catalogs for calibration

  • skymaps: index for the templates

  • templates/deep: deepCoadd templates for difference imaging
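
For example, with a repository directory named repo, the check looks like this (the output formatting varies with your version of the Butler command-line tools):

butler query-collections repo

The output should list the five collections above, along with their types (RUN, CALIBRATION, or CHAINED).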

AP pipeline on the command line

Like most Vera C. Rubin Observatory pipelines, the AP Pipeline is run with an external runner called pipetask. This can be found in the ctrl_mpexec package, which is included as part of lsst_distrib.

The pipeline itself is configured in ap_pipe/pipelines/DECam/ApPipe.yaml.
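
Before running anything, you can ask pipetask to expand and print the pipeline as a quick sanity check. This sketch assumes ap_pipe has been set up through EUPS so that $AP_PIPE_DIR is defined; consult pipetask build --help if the options differ in your version:

pipetask build -p $AP_PIPE_DIR/pipelines/DECam/ApPipe.yaml --show pipeline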

To process your ingested data, run a command along the lines of the sketch below. The input collections, the data query, and the parameters:apdb_config override should be adjusted to match your repository and your version of ApPipe.yaml.
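
# Create the APDB and write its config file (see the APDB setup page referenced below).
apdb-cli create-sql "sqlite:///apdb.db" apdb_config.yaml

# Run the AP pipeline on two visits, one detector each.
pipetask run -p $AP_PIPE_DIR/pipelines/DECam/ApPipe.yaml \
    -c parameters:apdb_config=apdb_config.yaml \
    -b repo \
    -i "DECam/raw/all,DECam/calib,refcats,skymaps,templates/deep" \
    -o processed \
    -d "instrument='DECam' and visit in (411420, 419802) and detector=10" \
    --register-dataset-types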

In this case, a processed/<timestamp> collection will be created within repo and the results will be written there. The apdb_config.yaml file will be created by apdb-cli and passed to pipetask. See Setting up the Alert Production Database for ap_pipe for more information on apdb-cli.

This example command processes only the observations corresponding to visits 411420 and 419802, using only detector 10 for each.

The example creates a “chained” output collection that can refer back to its inputs. If you prefer a standalone output collection, you may instead run a command like the following sketch, identical except for the output option (processed/my-run is a placeholder run name):
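
# As above, but write to a standalone RUN collection instead of a chain.
pipetask run -p $AP_PIPE_DIR/pipelines/DECam/ApPipe.yaml \
    -c parameters:apdb_config=apdb_config.yaml \
    -b repo \
    -i "DECam/raw/all,DECam/calib,refcats,skymaps,templates/deep" \
    --output-run processed/my-run \
    -d "instrument='DECam' and visit in (411420, 419802) and detector=10" \
    --register-dataset-types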

Note

You must configure the pipeline to use the APDB config file, or ap_pipe will not run. In the examples above, it is configured with the -c option.

Note

Both examples above are only valid when running the pipeline for the first time. When rerunning with an existing chained collection using -o, you should omit the -i argument. When rerunning with an existing standalone collection using --output-run, you must pass --extend-run.
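
As a sketch, reruns of the two examples above would look like this (same pipeline, config, and query as before):

# Rerun into an existing chained collection: keep -o, drop -i.
pipetask run -p $AP_PIPE_DIR/pipelines/DECam/ApPipe.yaml \
    -c parameters:apdb_config=apdb_config.yaml \
    -b repo -o processed \
    -d "instrument='DECam' and visit in (411420, 419802) and detector=10"

# Rerun into an existing standalone collection: add --extend-run.
pipetask run -p $AP_PIPE_DIR/pipelines/DECam/ApPipe.yaml \
    -c parameters:apdb_config=apdb_config.yaml \
    -b repo -i "DECam/raw/all,DECam/calib,refcats,skymaps,templates/deep" \
    --output-run processed/my-run --extend-run \
    -d "instrument='DECam' and visit in (411420, 419802) and detector=10"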

Expected outputs

If you used the chained option above, most of the output from ap_pipe will be written to a timestamped run collection (e.g., processed/20200131T000000Z) in the repository. The exception is the source association database, which is written to the location you configured. The results of running ap_pipe should look something like

apdb.db   <--- the Alert Production Database with DIAObjects
repo/
   contains_no_user_serviceable_files/
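
As a quick sanity check on the association step, you can count DIAObjects in the database directly. This sketch assumes the default SQLite backend and the standard APDB table name DiaObject; adjust for your schema:

sqlite3 apdb.db "SELECT COUNT(*) FROM DiaObject;"  # number of associated DIAObjects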

To inspect these data with the Butler, instantiate a Butler within Python and access the data products that way.

For example, in Python:

import lsst.daf.butler as dafButler

# Point the Butler at the chained output collection created above.
butler = dafButler.Butler('repo', collections='processed')
# A data ID matching one of the visits processed in this tutorial.
dataId = {'instrument': 'DECam', 'visit': 411420, 'detector': 10}
calexp = butler.get('calexp', dataId=dataId)                   # calibrated exposure
diffim = butler.get('deepDiff_differenceExp', dataId=dataId)   # difference image
diaSourceTable = butler.get('deepDiff_diaSrc', dataId=dataId)  # DIASource catalog

Supplemental information

Running on other cameras

Running ap_pipe on cameras other than DECam works much the same way. You need to provide a repository containing raws, calibs, and templates appropriate for the camera. There are versions of the AP pipeline for DECam, HSC, LATISS, and ImSim.
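
For example, running on HSC data would look like the DECam command above, but pointed at the HSC pipeline file (a sketch; the per-camera directories follow the layout of ap_pipe/pipelines in recent versions):

pipetask run -p $AP_PIPE_DIR/pipelines/HSC/ApPipe.yaml ...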

Common errors

  • ‘KeyError: DatasetType <type> could not be found’: This usually means you left out the --register-dataset-types argument.

  • ‘Expected exactly one instance of input <arbitrary dataset>’: This may indicate an invalid pipeline, but it can also mean that you did not provide an -i or --input argument when one was required. This is especially likely if the data ID is not one of the expected values.