Processing DC2 data with the Gen3 Butler¶
The data explored in this guide is simulated data, the same kind used for Rubin’s Data Preview 0 (DP0).
Note
This guide assumes the user has access to a shared “Gen3” Butler repository containing data from the Dark Energy Science Collaboration (DESC)’s Data Challenge 2 (DC2), most likely on the lsst-devl machines at NCSA.
This guide further assumes the user has a recently-built version of lsst_distrib from the LSST Science Pipelines.
The instructions in this guide were verified to work in July 2021 using rubin-env 0.6.0 with the weekly release tagged w_2021_30.
Finally, this guide assumes the user is interested in running the Alert Production (AP) pipeline on this data using good seeing templates.
Processing Data¶
Now it’s time to process some data.
In this guide, we will start with raws, run “processCcd” (which includes lsst.ip.isr.IsrTask, lsst.pipe.tasks.characterizeImage.CharacterizeImageTask, and lsst.pipe.tasks.calibrate.CalibrateTask), and make good seeing coadd templates.
In a second pipeline, we will run difference imaging using the templates we just built and save the results in an Alert Production Database (APDB).
Building good seeing templates¶
Here is the pipeline we will use.
Note that multiple Python configuration statements can be supplied under a task's python key by using the YAML literal block indicator (|), followed by one Python config statement per line.
description: An AP pipeline for building templates with LsstCam-imSim data
instrument: lsst.obs.lsst.LsstCamImSim
imports:
  - location: $AP_PIPE_DIR/pipelines/ApTemplate.yaml
tasks:
  isr:
    class: lsst.ip.isr.IsrTask
    config:
      connections.newBFKernel: bfk
      doBrighterFatter: True
  calibrate:
    class: lsst.pipe.tasks.calibrate.CalibrateTask
    config:
      connections.astromRefCat: 'cal_ref_cat_2_2'
      connections.photoRefCat: 'cal_ref_cat_2_2'
      astromRefObjLoader.ref_dataset_name: 'cal_ref_cat_2_2'
      photoRefObjLoader.ref_dataset_name: 'cal_ref_cat_2_2'
      python: |
        config.astromRefObjLoader.filterMap = {band: 'lsst_%s_smeared' % (band) for band in 'ugrizy'};
        config.photoRefObjLoader.filterMap = {band: 'lsst_%s_smeared' % (band) for band in 'ugrizy'};
subsets:
  singleFrameAp:
    subset:
      - isr
      - characterizeImage
      - calibrate
      - consolidateVisitSummary
    description: >
      Tasks to run for single frame processing that are necessary to use the good
      seeing selector to build coadds for use as difference imaging templates.
This example pipeline imports a pipeline from lsst.ap.pipe, which you may view on GitHub.
There are some special configurations concerning reference catalogs that must be set for this camera and/or dataset, so the example pipeline above lists the calibrate task explicitly to add custom configurations.
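The two lines under the python key in the calibrate configuration are ordinary Python dict comprehensions. As a minimal illustration, the same expression can be evaluated on its own to see the mapping from band to reference catalog filter name that it produces. This standalone sketch does not touch any LSST configuration object; the variable name filter_map is only for the example.

```python
# The same comprehension used for filterMap in the pipeline above,
# evaluated on its own to show the mapping it produces.
filter_map = {band: 'lsst_%s_smeared' % (band) for band in 'ugrizy'}

# One entry per band letter in 'ugrizy'.
for band, refcat_filter in filter_map.items():
    print(band, '->', refcat_filter)
```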
To run this example pipeline, save it as ApTemplate-DC2.yaml, choose an appropriate output collection name (u/USERNAME/OUTPUT-COLLECTION-1 in the example below), and run:
pipetask run -j 12 -b /repo/dc2 -d "band='g' AND skymap='DC2' AND tract=3829" -i 2.2i/defaults/test-med-1 -o u/USERNAME/OUTPUT-COLLECTION-1 -p ApTemplate-DC2.yaml#singleFrameAp --register-dataset-types
This will take some time, but when it’s done, you should have calibrated exposures and a visit summary table ready for making warps, selecting the best seeing visits, and assembling coadds for use as difference imaging templates. To continue, run:
pipetask run -j 12 -b /repo/dc2 -d "skymap='DC2' AND tract=3829 AND patch=47" -i u/USERNAME/OUTPUT-COLLECTION-1 -o u/USERNAME/OUTPUT-COLLECTION-2 -p ApTemplate-DC2.yaml#makeTemplate --register-dataset-types
This will also take some time. When it is complete, you should have good seeing coadds covering the entirety of patch 47 in tract 3829 for multiple bands and be ready to run the rest of the AP Pipeline (namely difference imaging and source association).
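The two commands above differ only in their data query, input and output collections, and pipeline subset, and the output collection of the first becomes the input collection of the second. One way to keep that chaining explicit is to assemble both commands from the same helper. The following is a minimal sketch that only builds and prints the command lines; the helper name pipetask_run_args is hypothetical, and it uses only the flags that appear in the commands above.

```python
import shlex


def pipetask_run_args(repo, query, inputs, output, pipeline, jobs=12):
    """Assemble a `pipetask run` argument list from the flags used in this guide."""
    return [
        "pipetask", "run",
        "-j", str(jobs),
        "-b", repo,
        "-d", query,
        "-i", ",".join(inputs),  # multiple input collections are comma-separated
        "-o", output,
        "-p", pipeline,
        "--register-dataset-types",
    ]


single_frame = pipetask_run_args(
    repo="/repo/dc2",
    query="band='g' AND skymap='DC2' AND tract=3829",
    inputs=["2.2i/defaults/test-med-1"],
    output="u/USERNAME/OUTPUT-COLLECTION-1",
    pipeline="ApTemplate-DC2.yaml#singleFrameAp",
)

# The template step reads from whatever collection the first step wrote to.
first_output = single_frame[single_frame.index("-o") + 1]
make_template = pipetask_run_args(
    repo="/repo/dc2",
    query="skymap='DC2' AND tract=3829 AND patch=47",
    inputs=[first_output],
    output="u/USERNAME/OUTPUT-COLLECTION-2",
    pipeline="ApTemplate-DC2.yaml#makeTemplate",
)

for command in (single_frame, make_template):
    print(shlex.join(command))  # shell-quoted form, suitable for pasting
```

The argument lists can also be passed directly to subprocess.run(..., check=True) instead of being printed, which avoids shell quoting altogether.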
Performing difference imaging to make an APDB¶
This next step uses a second pipeline, which effectively includes lsst.pipe.tasks.imageDifference.ImageDifferenceTask, lsst.ap.association.TransformDiaSourceCatalogTask, and lsst.ap.association.DiaPipelineTask.
description: An AP pipeline for difference imaging with LsstCam-imSim
instrument: lsst.obs.lsst.LsstCamImSim
imports:
  - location: ApTemplate-DC2.yaml
    exclude:  # These tasks are not necessary, as we already have templates
      - consolidateVisitSummary
      - selectGoodSeeingVisits
      - makeWarp
      - assembleCoadd
  - location: $AP_PIPE_DIR/pipelines/ApPipe.yaml
    exclude:  # These tasks come from the ApTemplate-DC2 pipeline instead
      - isr
      - characterizeImage
      - calibrate
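Save this pipeline as ApPipe-DC2.yaml, the file name used in the pipetask command below.
This difference imaging pipeline uses the good seeing templates we built and treats all the DP0 defaults (the 2.2i/defaults/test-med-1 collection) as input “science” images.
Running it is a two-step process. First, create an empty sqlite APDB: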
make_apdb.py -c isolation_level=READ_UNCOMMITTED -c db_url="sqlite:////PATH-TO-DESIRED-APDB/ApPipeTest1.db"
Note that the APDB must be empty, and it is highly recommended to make a new one each time the AP Pipeline is rerun for any reason.
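The four slashes in the db_url above are the usual SQLAlchemy-style form for an absolute path: the sqlite:/// prefix followed by a path that itself starts with a slash. The sketch below builds such a URL and refuses to reuse a file that already exists, in line with the recommendation to start from a new APDB. The helper name new_sqlite_apdb_url is hypothetical, and the existence check is a convenience added here, not something this guide attributes to make_apdb.py.

```python
import os
from pathlib import PurePosixPath


def new_sqlite_apdb_url(db_path):
    """Return a sqlite URL for an absolute path that does not exist yet."""
    path = PurePosixPath(db_path)
    if not path.is_absolute():
        raise ValueError("use an absolute path so the URL is unambiguous")
    if os.path.exists(db_path):
        raise FileExistsError(f"{db_path} already exists; choose a new APDB file")
    # "sqlite:///" plus a path starting with "/" gives the four slashes.
    return "sqlite:///" + str(path)


print(new_sqlite_apdb_url("/PATH-TO-DESIRED-APDB/ApPipeTest1.db"))
```

The returned string is the value to pass as db_url to make_apdb.py.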
Second, run the pipeline:
pipetask run -j 12 -b /repo/dc2 -d "skymap='DC2' AND tract=3829 AND patch=47" -i u/USERNAME/OUTPUT-COLLECTION-2,2.2i/defaults/test-med-1 -o u/USERNAME/OUTPUT-COLLECTION-3 -p ApPipe-DC2.yaml -c diaPipe:apdb.isolation_level=READ_UNCOMMITTED -c diaPipe:apdb.db_url="sqlite:////PATH-TO-DESIRED-APDB/ApPipeTest1.db" --register-dataset-types
When this pipeline completes, you should have difference images and an APDB with populated tables (DiaSource, DiaObject, etc.) for multiple bands in patch 47 of tract 3829 of this dataset.
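A quick way to confirm that an sqlite APDB was populated is to count the rows in each of its tables with Python's built-in sqlite3 module. The sketch below is an assumption-laden illustration, not part of the AP Pipeline: the helper name apdb_row_counts is hypothetical, and the demonstration runs against a throwaway stand-in database with invented single-purpose columns, since the real APDB schema is not described in this guide.

```python
import os
import sqlite3
import tempfile
from contextlib import closing
from pathlib import Path


def apdb_row_counts(db_file):
    """Return {table_name: row_count} for every table in an SQLite file."""
    # Open read-only so a mistyped path is an error rather than a new empty file.
    uri = Path(db_file).resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        names = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        return {name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
                for name in names}


# Stand-in database so the sketch runs without a real APDB.
# The columns here are invented; only the table names come from this guide.
demo_db = os.path.join(tempfile.mkdtemp(), "demo.db")
with closing(sqlite3.connect(demo_db)) as conn:
    conn.execute('CREATE TABLE "DiaObject" (id INTEGER PRIMARY KEY)')
    conn.execute('CREATE TABLE "DiaSource" (id INTEGER PRIMARY KEY, objectId INTEGER)')
    conn.executemany('INSERT INTO "DiaSource" (objectId) VALUES (?)', [(1,), (1,), (2,)])
    conn.commit()

counts = apdb_row_counts(demo_db)
print(counts)
```

To inspect a real APDB, call apdb_row_counts with the path used in db_url (without the sqlite:/// prefix).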
Processing Data with BPS¶
The example data processing steps above assume a relatively small data volume (a single patch), so running from the command line and using an sqlite APDB is appropriate.
However, if you want to process larger data volumes, you’ll need to use the Batch Processing System (BPS, lsst.ctrl.bps) and a PostgreSQL APDB.
Describing how to set up a PostgreSQL APDB is beyond the scope of this guide.
Members of the Data Management Team may wish to reference this non-public guide for how to use an existing NCSA PostgreSQL database as an APDB.
One key difference between an sqlite APDB and a PostgreSQL APDB is that the former is a file on disk created from scratch when running make_apdb.py.
The latter requires a database to already exist, and make_apdb.py turns the PostgreSQL database’s default schema into an empty APDB.
As before, you will still need to run, e.g.,
make_apdb.py -c db_url="postgresql://USER@DB_ADDRESS/DB_NAME"
(being sure to replace USER, DB_ADDRESS, and DB_NAME with the correct values).
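Because the placeholders in that URL are easy to leave in by accident, it can help to check the URL before running make_apdb.py. The following is a small sketch using only the standard library; the helper name unreplaced_placeholders and the filled-in example values (alice, db.example.org, apdb_test) are hypothetical.

```python
from urllib.parse import urlsplit


def unreplaced_placeholders(db_url):
    """List which of USER, DB_ADDRESS, DB_NAME still appear verbatim in the URL."""
    parts = urlsplit(db_url)
    found = []
    if parts.username == "USER":
        found.append("USER")
    # urlsplit lowercases the hostname, so compare case-insensitively.
    if (parts.hostname or "").upper() == "DB_ADDRESS":
        found.append("DB_ADDRESS")
    if parts.path.lstrip("/") == "DB_NAME":
        found.append("DB_NAME")
    return found


print(unreplaced_placeholders("postgresql://USER@DB_ADDRESS/DB_NAME"))
print(unreplaced_placeholders("postgresql://alice@db.example.org/apdb_test"))
```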
Next, use the documentation for lsst.ctrl.bps to define a submission by creating two BPS configuration files: one for the template-building step and one for the difference-imaging step.
Save these BPS configuration files as ApTemplate-DC2-bps.yaml and ApPipe-DC2-bps.yaml.
Note
The lsst.ctrl.bps module is well-documented, and is the first place to look for how to submit a batch processing run on the lsst-devl machines.
Ensure the pipelineYaml keyword points to ApTemplate-DC2.yaml and ApPipe-DC2.yaml in each configuration file, respectively, and that you specify appropriate values for inCollection, outCollection, and dataQuery, just as you did on the command line with the -i, -o, and -d arguments to pipetask run.
For example, to make good seeing templates using all available patches and bands, you may wish to use a less restrictive data query like instrument='LSSTCam-imSim' and tract in (3828, 3829) and skymap='DC2'.
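The data queries in this guide are all built from the same few constraints (band, skymap, tract, patch, instrument), so they can be composed programmatically when moving between a single-patch test and a wider run. The helper below is a hypothetical sketch that only formats strings; it uses the AND and in keywords exactly as they appear in the queries in this guide and does not validate the result against a Butler repository.

```python
def build_data_query(tracts, skymap, instrument=None, band=None, patch=None):
    """Compose a data query string from the constraints used in this guide."""
    clauses = []
    if instrument is not None:
        clauses.append(f"instrument='{instrument}'")
    if band is not None:
        clauses.append(f"band='{band}'")
    clauses.append(f"skymap='{skymap}'")
    if len(tracts) == 1:
        clauses.append(f"tract={tracts[0]}")
    else:
        clauses.append("tract in (" + ", ".join(str(t) for t in tracts) + ")")
    if patch is not None:
        clauses.append(f"patch={patch}")
    return " AND ".join(clauses)


# The single-band query used for single frame processing earlier in this guide.
print(build_data_query([3829], "DC2", band="g"))
# A wider query covering two tracts, all patches, and all bands.
print(build_data_query([3828, 3829], "DC2", instrument="LSSTCam-imSim"))
```

The second query has the same constraints as the example above, in a different clause order.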
When you are ready to submit your first BPS run to build templates, follow the documentation to submit a run, e.g.,
bps submit ApTemplate-DC2-bps.yaml
Once the templates are built, the second BPS configuration file will need to have two input collections: the output collection from the first run and a collection with raw science images (such as 2.2i/defaults/test-med-1).
To submit the second BPS run, which performs difference imaging and populates the PostgreSQL APDB, run, e.g.,
bps submit ApPipe-DC2-bps.yaml
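Before submitting either file, it may be worth checking that the four keywords named above (pipelineYaml, inCollection, outCollection, and dataQuery) are present and non-empty. The sketch below works on a configuration that has already been parsed into nested dictionaries by a YAML loader of your choice, and it searches at any depth so that it does not assume a particular layout. The helper names and the example dictionary, including its payload nesting, are invented for illustration; consult the lsst.ctrl.bps documentation for the actual file structure.

```python
REQUIRED_KEYS = ("pipelineYaml", "inCollection", "outCollection", "dataQuery")


def find_key(config, key):
    """Depth-first search of nested dicts; return the first value found, else None."""
    if isinstance(config, dict):
        if key in config:
            return config[key]
        for value in config.values():
            found = find_key(value, key)
            if found is not None:
                return found
    return None


def missing_keys(config):
    """Return the required keys that are absent or set to an empty value."""
    return [key for key in REQUIRED_KEYS if not find_key(config, key)]


# Hypothetical parsed configuration; the nesting is invented for this example.
template_config = {
    "pipelineYaml": "ApTemplate-DC2.yaml",
    "payload": {
        "inCollection": "2.2i/defaults/test-med-1",
        "outCollection": "u/USERNAME/OUTPUT-COLLECTION-1",
        "dataQuery": "",  # left empty on purpose to show the check
    },
}

print(missing_keys(template_config))
```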