Running the AP pipeline#
Setup#
Pick up where you left off in Getting Started. This means you already have a repository of ingested DECam data and have set up the LSST Science Pipelines stack.
Your repository should have the following collections, which can be checked with butler query-collections <repo> (see the example after this list):
DECam/calib: biases, flats, defects, camera specs, etc.
DECam/raw/all: images to be processed
refcats: reference catalogs for calibration
skymaps: index for the templates
templates/deep: deepCoadd templates for difference imaging
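For example, assuming your repository directory is named repo:

butler query-collections repo

This should list the five collections above; if any are missing, revisit the ingestion steps in Getting Started.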
AP pipeline on the command line#
Like most Vera C. Rubin Observatory pipelines, the AP pipeline is run with an external runner called pipetask.
This can be found in the ctrl_mpexec package, which is included as part of lsst_distrib.
The pipeline itself is configured in ap_pipe/pipelines/DECam/ApPipe.yaml.
To process your ingested data, run the pipeline with pipetask run.
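A minimal sketch of such a command is below. It assumes the repository is named repo, the inputs are the five collections listed above, and the pipeline exposes the APDB location through an apdb_config entry in its parameters section; adjust these to match your setup:

pipetask run -p $AP_PIPE_DIR/pipelines/DECam/ApPipe.yaml \
    -b repo \
    -i DECam/raw/all,DECam/calib,refcats,skymaps,templates/deep \
    -o processed \
    -c parameters:apdb_config=apdb_config.yaml \
    -d "visit in (411420, 419802) and detector = 10" \
    --register-dataset-types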
In this case, a processed/<timestamp> collection will be created within repo and the results will be written there.
The apdb_config.yaml file will be created by apdb-cli and passed to pipetask.
See Setting up the Alert Production Database for ap_pipe for more information on apdb-cli.
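For a local SQLite database, the creation step looks something like this sketch (the database URL and file names are assumptions; the linked page has the authoritative instructions):

apdb-cli create-sql "sqlite:///apdb.db" apdb_config.yaml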
This example command only processes observations corresponding to visits 411420 and 419802, both with only detector 10.
The example creates a “chained” output collection that can refer back to its inputs. If you prefer a standalone output collection, you may instead pass --output-run.
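For example, under the same assumptions as the sketch above (the run collection name processed/my-run is a placeholder of your choosing):

pipetask run -p $AP_PIPE_DIR/pipelines/DECam/ApPipe.yaml \
    -b repo \
    -i DECam/raw/all,DECam/calib,refcats,skymaps,templates/deep \
    --output-run processed/my-run \
    -c parameters:apdb_config=apdb_config.yaml \
    -d "visit in (411420, 419802) and detector = 10" \
    --register-dataset-types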
Note
You must configure the pipeline to use the APDB config file, or ap_pipe will not run.
In the examples above, it is configured with the -c option.
Note
Both examples above are only valid when running the pipeline for the first time.
When rerunning with an existing chained collection using -o, you should omit the -i argument.
When rerunning with an existing standalone collection using --output-run, you must pass --extend-run.
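Sketches of both rerun forms, under the same assumptions as the commands above:

# Rerun with an existing chained collection: keep -o, omit -i
pipetask run -p $AP_PIPE_DIR/pipelines/DECam/ApPipe.yaml -b repo \
    -o processed \
    -c parameters:apdb_config=apdb_config.yaml \
    -d "visit in (411420, 419802) and detector = 10"

# Rerun into an existing standalone collection: add --extend-run
pipetask run -p $AP_PIPE_DIR/pipelines/DECam/ApPipe.yaml -b repo \
    -i DECam/raw/all,DECam/calib,refcats,skymaps,templates/deep \
    --output-run processed/my-run --extend-run \
    -c parameters:apdb_config=apdb_config.yaml \
    -d "visit in (411420, 419802) and detector = 10"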
Expected outputs#
If you used the chained option above, most of the output from ap_pipe should be written to a timestamped collection (e.g., processed/20200131T00h00m00s) in the repository.
The exception is the source association database, which will be written to the location you configure.
The results from running ap_pipe should look something like:

apdb.db   <--- the Alert Production Database with DIAObjects
repo/
    contains_no_user_serviceable_files/
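To confirm what was produced, you can also query the repository from the command line; a sketch, assuming the chained collection name used above:

butler query-datasets repo deepDiff_differenceExp --collections "processed*"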
To inspect these data with the Butler, instantiate a Butler within Python and access the data products that way.
For example, in Python:
import lsst.daf.butler as dafButler

# Point the Butler at the chained output collection created by pipetask
butler = dafButler.Butler('repo', collections='processed')

# Use a visit/detector pair that was actually processed above
dataId = {'instrument': 'DECam', 'visit': 411420, 'detector': 10}

calexp = butler.get('calexp', dataId=dataId)                    # calibrated exposure
diffim = butler.get('deepDiff_differenceExp', dataId=dataId)    # difference image
diaSourceTable = butler.get('deepDiff_diaSrc', dataId=dataId)   # detected DIASources
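You can also spot-check the APDB itself. A minimal sketch for a SQLite APDB, assuming the file name from the layout above and the DiaObject table name from the default APDB SQL schema (both are assumptions, not outputs this guide guarantees):

import sqlite3

# Open the APDB created by apdb-cli (file name assumed from the layout above)
conn = sqlite3.connect('apdb.db')
# 'DiaObject' is the table name in the default APDB SQL schema (assumption)
n_objects = conn.execute('SELECT COUNT(*) FROM DiaObject').fetchone()[0]
print(f'{n_objects} DIAObjects in the APDB')
conn.close()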
Supplemental information#
Running on other cameras#
Running ap_pipe on cameras other than DECam works much the same way. You need to provide a repository containing raws, calibs, and templates appropriate for the camera. There are versions of the AP pipeline for DECam, HSC, LATISS, and ImSim.
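For example, to run on HSC you would point pipetask at the HSC version of the pipeline, with input collections and a data query appropriate to that camera (the pipeline path below is an assumption, extrapolated from the DECam layout shown earlier):

pipetask run -p $AP_PIPE_DIR/pipelines/HSC/ApPipe.yaml \
    -b repo -i <your HSC input collections> -o processed \
    -c parameters:apdb_config=apdb_config.yaml \
    -d "<your data query>" \
    --register-dataset-types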
Common errors#
‘KeyError: DatasetType <type> could not be found’: This usually means you left out the --register-dataset-types argument.
‘Expected exactly one instance of input <arbitrary dataset>’: This may mean an invalid pipeline, but can also mean that you did not provide an -i or --input argument when it was required. This is especially likely if the data ID is not one of the expected values.