Inject Synthetic Sources

Injecting Synthetic Sources Into Visit-Level or Coadd-Level Dataset Types

Synthetic sources can be injected into any imaging data product output by the LSST Science Pipelines. This is useful for testing algorithmic performance on simulated data, where the truth is known, and for various subsequent quality assurance tasks.

The sections below describe how to inject synthetic sources into visit-level exposure-type or visit-type datasets (i.e., datasets with the exposure or visit dimension), or into coadd-level coadded datasets. Options for injection on both the command line and in Python are presented.

The instructions on this page assume that, prior to injection, the user already has in place a fully qualified source injection pipeline definition YAML (see Make an Injection Pipeline) and a suitable synthetic source injection catalog describing the sources to be injected (see Generate an Injection Catalog), which has been ingested into the data butler (see Ingest an Injection Catalog).

Injection on the Command Line

Source injection on the command line is performed using the pipetask run command. The process for injection into visit-level imaging (i.e., exposure or visit type data) or injection into coadd-level imaging (e.g., a deepCoadd) is largely the same, save for the use of a different data query and a different injection task or pipeline subset.

The following command line example injects synthetic sources into the HSC exposure 1228, detector 51, postISRCCD dataset. For the purposes of this example, we will run the entirety of the HSC DRP RC2 step 1 subset. This subset contains all the tasks necessary to process raw science data through to initial visit-level calibrated outputs. The step 1 subset will have had the inject_exposure task (ExposureInjectTask) merged into it following a successful run of make_injection_pipeline.

Tip

Injection into a coadd-level data product such as a deepCoadd can be achieved by replacing step1 with step3 in the command below and modifying the -d data query. For the injection catalog generated in these notes, the following coadd-level data query would work well:

-d "instrument='HSC' AND skymap='hsc_rings_v1' AND tract=9813 AND patch=42 AND band='i'"

pipetask --long-log --log-file $LOGFILE \
run --register-dataset-types \
-b $REPO \
-i $INPUT_DATA_COLL,$INJECTION_CATALOG_COLL \
-o $OUTPUT_COLL \
-p DRP-RC2+injection.yaml#step1 \
-d "instrument='HSC' AND exposure=1228 AND detector=51"

where

$LOGFILE

The full path to a user-defined output log file.

$REPO

The path to the butler repository.

$INPUT_DATA_COLL

The name of the input data collection.

$INJECTION_CATALOG_COLL

The name of the input injection catalog collection.

$OUTPUT_COLL

The name of the injected output collection.

Caution

Standard processing should not normally have to make use of the --register-dataset-types flag. This flag is only required to register a new output dataset type with the butler for the very first time.

If injection outputs have already been generated within your butler repository, you should omit this flag from your run command to prevent any accidental registration of unwanted dataset types.

Note

In addition to the stepN subsets, injected_stepN subsets are also available. These run only the injection task and the tasks that follow it, and can save memory and runtime if the tasks prior to injection have already been run.
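Because visit-level and coadd-level runs differ only in the pipeline subset and the data query, the pipetask command can also be assembled programmatically, for example from a batch script. The sketch below uses two hypothetical helpers (format_data_query and build_pipetask_args are not part of the Science Pipelines) that reproduce the flags used on this page: --register-dataset-types is added only when explicitly requested for a first run, and either a stepN or an injected_stepN subset label can be passed. The repository path, collection names and log file name are placeholders.

```python
def format_data_query(**dimensions):
    """Build a butler data query such as "instrument='HSC' AND exposure=1228".

    String values are single-quoted and integers are written bare, matching
    the queries used on this page.
    """
    terms = []
    for name, value in dimensions.items():
        if isinstance(value, str):
            terms.append(f"{name}='{value}'")
        else:
            terms.append(f"{name}={value}")
    return " AND ".join(terms)


def build_pipetask_args(repo, inputs, output, pipeline, subset, data_query,
                        log_file, register_dataset_types=False):
    """Return a pipetask argument list (hypothetical helper for illustration)."""
    args = ["pipetask", "--long-log", "--log-file", log_file, "run"]
    # Only needed the very first time new output dataset types are registered.
    if register_dataset_types:
        args.append("--register-dataset-types")
    args += [
        "-b", repo,
        "-i", ",".join(inputs),
        "-o", output,
        "-p", f"{pipeline}#{subset}",
        "-d", data_query,
    ]
    return args


# Visit-level example from this page, re-running only the tasks from injection onwards.
visit_query = format_data_query(instrument="HSC", exposure=1228, detector=51)
visit_args = build_pipetask_args(
    repo="/path/to/repo",  # placeholder
    inputs=["input_data_coll", "injection_catalog_coll"],  # placeholders
    output="output_coll",  # placeholder
    pipeline="DRP-RC2+injection.yaml",
    subset="injected_step1",
    data_query=visit_query,
    log_file="inject.log",  # placeholder
)
print(visit_args)
```

The resulting list can be passed directly to subprocess.run, which avoids having to shell-quote the data query.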

Assuming processing completes successfully, the injected_postISRCCD and associated injected_postISRCCD_catalog will be written to the butler repository. Various downstream step1 data products should also exist, including the injected_calexp dataset type (see example images below).

Standard log messages that get printed as part of a successful run may include lines similar to:

Retrieved 25 injection sources from 1 HTM trixel.
Identified 19 injection sources with centroids outside the padded image bounding box.
Catalog cleaning removed 19 of 25 sources; 6 remaining for catalog checking.
Catalog checking flagged 0 of 6 sources; 6 remaining for source generation.
Adding INJECTED and INJECTED_CORE mask planes to the exposure.
Generating 6 injection sources consisting of 1 unique type: Sersic(6).
Injected 6 of 6 potential sources. 0 sources flagged and skipped.
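The first lines of this log show the catalog being trimmed to the sources that can affect the image: 19 of the 25 retrieved sources have centroids outside the padded image bounding box and are removed. A minimal NumPy sketch of that kind of cut is shown below, assuming pixel centroids are already known; the padding value and the toy image size are hypothetical, and this illustrates the idea only rather than the source injection package's implementation.

```python
import numpy as np


def inside_padded_bbox(x, y, bbox, padding):
    """Return a boolean mask of centroids inside a padded bounding box.

    bbox is (x_min, y_min, x_max, y_max) in pixels, inclusive; padding is the
    number of pixels added on every side (a hypothetical parameter here).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_min, y_min, x_max, y_max = bbox
    return (
        (x >= x_min - padding) & (x <= x_max + padding)
        & (y >= y_min - padding) & (y <= y_max + padding)
    )


# Three toy centroids against a 1000 x 1000 pixel image with 50 pixels of padding.
x = [100.0, -30.0, 1200.0]
y = [500.0, 10.0, 500.0]
keep = inside_padded_bbox(x, y, bbox=(0, 0, 999, 999), padding=50)
print(f"Catalog cleaning removed {np.count_nonzero(~keep)} of {keep.size} sources; "
      f"{np.count_nonzero(keep)} remaining.")
```

Padding the box lets sources centred just off the edge still contribute their outer light to the image.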

An example injected output produced by the above snippet is shown below.

Calibrated exposure (calexp and injected_calexp) data for HSC visit 1228, detector 51, showcasing the injection of a series of synthetic Sérsic sources. Images are asinh scaled across the central 98% flux range and smoothed with a Gaussian kernel of FWHM 5 pixels.

[Figure panels: before injection; after injection; difference image.]

Injection in Python

Source injection in Python is achieved by using the source injection task classes directly. As on the command line, the process for injection into visit-level imaging or coadd-level imaging is largely the same, save for the use of a different task class, a different data query, and use of different calibration data products (see the notes in the Python snippet below).

The following Python example injects synthetic sources into the HSC i-band tract 9813, patch 42, deepCoadd dataset. For the purposes of this example, we run only the source injection task.

from lsst.daf.butler import Butler
from lsst.source.injection import CoaddInjectConfig, CoaddInjectTask
# NOTE: For injections into other dataset types, use the following instead:
# from lsst.source.injection import ExposureInjectConfig, ExposureInjectTask
# from lsst.source.injection import VisitInjectConfig, VisitInjectTask

# Instantiate a butler.
butler = Butler(REPO)

# Load an input deepCoadd dataset.
dataId = dict(
    instrument="HSC",
    skymap="hsc_rings_v1",
    tract=9813,
    patch=42,
    band="i",
)
input_exposure = butler.get(
    "deepCoadd",
    dataId=dataId,
    collections=INPUT_DATA_COLL,
)
# NOTE: Visit-level injections also require a visit summary table.
# visit_summary = butler.get(
#     "finalVisitSummary",
#     dataId=dataId,
#     collections=INPUT_DATA_COLL,
# )

# Get calibration data products.
psf = input_exposure.getPsf()
photo_calib = input_exposure.getPhotoCalib()
wcs = input_exposure.getWcs()
# NOTE: Visit-level injections should instead use the visit summary table.
# detector_summary = visit_summary.find(dataId["detector"])
# psf = detector_summary.getPsf()
# photo_calib = detector_summary.getPhotoCalib()
# wcs = detector_summary.getWcs()

# Load input injection catalogs, here just for i-band catalogs.
injection_refs = butler.registry.queryDatasets(
    "injection_catalog",
    band="i",
    collections=INJECTION_CATALOG_COLL,
)
injection_catalogs = [
    butler.get(injection_ref) for injection_ref in injection_refs
]

# Instantiate the injection classes.
inject_config = CoaddInjectConfig()
inject_task = CoaddInjectTask(config=inject_config)

# Run the source injection task.
injected_output = inject_task.run(
    injection_catalogs=injection_catalogs,
    input_exposure=input_exposure.clone(),
    psf=psf,
    photo_calib=photo_calib,
    wcs=wcs,
)
injected_exposure = injected_output.output_exposure
injected_catalog = injected_output.output_catalog

where

REPO

The path to the butler repository.

INPUT_DATA_COLL

The name of the input data collection.

INJECTION_CATALOG_COLL

The name of the input injection catalog collection.
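The choice between the three task classes named in the snippet above follows the dimensions of the target dataset: exposure-dimension datasets (such as postISRCCD), visit-dimension datasets (such as calexp), or tract/patch coadds (such as deepCoadd). The hypothetical helper below encodes that rule; it returns class names as strings so that it runs without the Science Pipelines installed, and the matching class can then be imported from lsst.source.injection.

```python
def choose_inject_task(dimensions):
    """Pick an injection task class name from a dataset's dimension names.

    Hypothetical helper: exposure-dimension datasets map to ExposureInjectTask,
    visit-dimension datasets to VisitInjectTask, and tract/patch coadds to
    CoaddInjectTask.
    """
    dimensions = set(dimensions)
    if "exposure" in dimensions:
        return "ExposureInjectTask"
    if "visit" in dimensions:
        return "VisitInjectTask"
    if {"tract", "patch"} <= dimensions:
        return "CoaddInjectTask"
    raise ValueError(f"no injection task for dimensions {sorted(dimensions)}")


print(choose_inject_task(["instrument", "exposure", "detector"]))  # postISRCCD-like
print(choose_inject_task(["skymap", "tract", "patch", "band"]))    # deepCoadd-like
```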

An example injected output produced by the above snippet is shown below.

Coadd-level (deepCoadd and injected_deepCoadd) data for HSC tract 9813, patch 42 in the i-band, showcasing the injection of a series of synthetic Sérsic sources. Images are log scaled across the central 99% flux range and smoothed with a Gaussian kernel of FWHM 5 pixels.

[Figure panels: before injection; after injection; difference image.]
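The figure captions on this page describe the display processing used for the example images: an asinh or log stretch across a central percentile range of the flux, with smoothing by a Gaussian kernel of FWHM 5 pixels. The NumPy sketch below reproduces that kind of display processing under stated assumptions: the softening constants inside the stretches, the 4-sigma kernel truncation and the order of operations (smooth, then stretch) are illustrative choices, not the settings used to make these figures.

```python
import numpy as np


def fwhm_to_sigma(fwhm):
    """Convert a Gaussian FWHM to its standard deviation: FWHM / (2 sqrt(2 ln 2))."""
    return fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))


def gaussian_smooth(image, fwhm):
    """Smooth a 2D image with a separable Gaussian kernel (zero padding at edges).

    Assumes each image axis is at least as long as the kernel.
    """
    sigma = fwhm_to_sigma(fwhm)
    radius = int(np.ceil(4 * sigma))  # assumed truncation at 4 sigma
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    kernel /= kernel.sum()
    # Convolve along rows, then along columns.
    smoothed = np.apply_along_axis(np.convolve, 1, image, kernel, mode="same")
    return np.apply_along_axis(np.convolve, 0, smoothed, kernel, mode="same")


def stretch(image, percentile=98.0, kind="asinh"):
    """Clip to the central `percentile` per cent of values, stretch to [0, 1]."""
    tail = (100.0 - percentile) / 2.0
    low, high = np.percentile(image, [tail, 100.0 - tail])
    scaled = np.clip((image - low) / (high - low), 0.0, 1.0)
    if kind == "asinh":
        return np.arcsinh(10.0 * scaled) / np.arcsinh(10.0)  # assumed softening
    if kind == "log":
        return np.log10(1.0 + 1000.0 * scaled) / np.log10(1001.0)  # assumed softening
    raise ValueError(f"unknown stretch: {kind}")


# Toy noise image with a bright square "source" in the middle.
rng = np.random.default_rng(42)
image = rng.normal(0.0, 1.0, size=(64, 64))
image[30:34, 30:34] += 50.0
display = stretch(gaussian_smooth(image, fwhm=5.0), percentile=98.0, kind="asinh")
```

For a 5-pixel FWHM the kernel standard deviation is about 2.12 pixels.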

Injecting Postage Stamps

The commands above have focussed on injecting synthetic parametric models produced by GalSim. It’s also possible to inject FITS postage stamps directly into the data. These may be real astronomical images, or they may be simulated images produced by other software.

By way of example, let's inject multiple copies of the 2dFGRS galaxy TGN420Z151, a \(z\sim0.17\) galaxy of brightness \(m_{i}\sim17.2\) mag located in HSC tract 9813, patch 42. First, let's construct a small postage stamp using existing HSC data products:

from lsst.daf.butler import Butler
from lsst.geom import Box2I, Extent2I, Point2I, SpherePoint, degrees

# Instantiate a butler.
butler = Butler(REPO)

# Get the deepCoadd for HSC i-band tract 9813, patch 42.
dataId = dict(
    instrument="HSC",
    skymap="hsc_rings_v1",
    tract=9813,
    patch=42,
    band="i",
)
t9813p42i = butler.get(
    "deepCoadd",
    dataId=dataId,
    collections=INPUT_DATA_COLL,
)

# Find the x/y pixel coordinates for the 2dFGRS TGN420Z151 galaxy.
wcs = t9813p42i.wcs
pixel = wcs.skyToPixel(SpherePoint(149.8599524, 2.1487149, degrees))
# Point2I requires integer pixel indices, so round the floating-point position.
x0, y0 = round(pixel.getX()), round(pixel.getY())

# Create a 181x181 pixel postage stamp centered on the galaxy.
bbox = Box2I(Point2I(x0, y0), Extent2I(1, 1))
bbox.grow(90)
tgn420z151 = t9813p42i[bbox]

# Save the postage stamp image to a FITS file.
tgn420z151.image.writeFits(POSTAGE_STAMP_FILE)

where

REPO

The path to the butler repository.

INPUT_DATA_COLL

The name of the input data collection.

POSTAGE_STAMP_FILE

The file name for the postage stamp FITS file.
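The stamp size follows from the box arithmetic in the snippet above: a 1x1 box grown by 90 pixels on every side spans 2 × 90 + 1 = 181 pixels in each direction. The hypothetical plain-Python helpers below (not lsst.geom calls) reproduce the inclusive corner coordinates such a box would have, which can be useful for checking that a stamp lies entirely within the patch before cutting it out. The centre position and the outer patch bounds used here are made up for illustration; both boxes must be expressed in the same pixel coordinate system.

```python
def grown_box(x0, y0, buffer):
    """Inclusive (x_min, y_min, x_max, y_max) of a 1x1 box at (x0, y0) grown by `buffer`."""
    return (x0 - buffer, y0 - buffer, x0 + buffer, y0 + buffer)


def box_size(box):
    """Return (width, height) of an inclusive pixel box."""
    x_min, y_min, x_max, y_max = box
    return (x_max - x_min + 1, y_max - y_min + 1)


def fits_inside(inner, outer):
    """True if the inclusive box `inner` lies entirely within `outer`."""
    return (inner[0] >= outer[0] and inner[1] >= outer[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


stamp_box = grown_box(1000, 2000, 90)  # hypothetical galaxy position
patch_box = (0, 0, 4199, 4199)         # hypothetical patch bounds
print(stamp_box, box_size(stamp_box), fits_inside(stamp_box, patch_box))
```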

This postage stamp looks like this:

An HSC i-band postage stamp of the 2dFGRS galaxy TGN420Z151, a \(z\sim0.17\) galaxy of brightness \(m_{i}\sim17.2\) mag located in HSC tract 9813, patch 42. Image is log scaled across the central 99.5% flux range.

Next, let's construct a simple injection catalog and ingest it into the butler. Injection of FITS-file postage stamps only requires the ra, dec, source_type, mag and stamp columns to be specified in the injection catalog. Note that below we switch from Python to the command line interface:

generate_injection_catalog \
-a 149.7 150.1 \
-d 2.0 2.4 \
-n 50 \
-p source_type Stamp \
-p mag 17.2 \
-p stamp $POSTAGE_STAMP_FILE \
-b $REPO \
-w deepCoadd_calexp \
-c $INPUT_DATA_COLL \
--where "instrument='HSC' AND skymap='hsc_rings_v1' AND tract=9813 AND patch=42 AND band='i'" \
-i i \
-o $INJECTION_CATALOG_COLL

where

$POSTAGE_STAMP_FILE

The file name for the postage stamp FITS file.

$REPO

The path to the butler repository.

$INPUT_DATA_COLL

The name of the input data collection.

$INJECTION_CATALOG_COLL

The name of the output injection catalog collection.

The first several rows from the injection catalog produced by the above snippet look like this:

injection_id         ra                dec         source_type mag       stamp
------------ ------------------ ------------------ ----------- ---- ---------------
           0  150.0403162981621  2.076877152109224       Stamp 17.2 tgn420z151.fits
           1 149.94655709194345 2.0422859082646854       Stamp 17.2 tgn420z151.fits
           2 150.02155685175438  2.116390565528664       Stamp 17.2 tgn420z151.fits
           3 149.92773562242124  2.358408570029682       Stamp 17.2 tgn420z151.fits
           4 149.82770694427973  2.338624350977013       Stamp 17.2 tgn420z151.fits
...
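Each row of the catalog above is one injected copy of the stamp, carrying an injection_id plus the five required columns. The NumPy sketch below builds an equivalent set of columns, for example when preparing a catalog without generate_injection_catalog. Drawing positions uniformly in RA and Dec is an assumption and need not match how generate_injection_catalog places sources; the dict-of-arrays result would still need converting to a table and ingesting into the butler (see Ingest an Injection Catalog).

```python
import numpy as np


def make_stamp_catalog(ra_range, dec_range, n, mag, stamp, seed=None):
    """Build columns for a FITS postage stamp injection catalog (illustrative only).

    Positions are drawn uniformly in RA and Dec within the given ranges, which
    is an assumption made for this sketch.
    """
    rng = np.random.default_rng(seed)
    return {
        "injection_id": np.arange(n),
        "ra": rng.uniform(*ra_range, size=n),
        "dec": rng.uniform(*dec_range, size=n),
        "source_type": np.full(n, "Stamp"),
        "mag": np.full(n, mag),
        "stamp": np.full(n, stamp),
    }


catalog = make_stamp_catalog((149.7, 150.1), (2.0, 2.4), n=50, mag=17.2,
                             stamp="tgn420z151.fits", seed=1)
for row in range(3):
    print(catalog["injection_id"][row], catalog["ra"][row], catalog["dec"][row],
          catalog["source_type"][row], catalog["mag"][row], catalog["stamp"][row])
```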

Finally, let's inject our postage stamp multiple times into the HSC i-band tract 9813, patch 42 image:

pipetask --long-log --log-file $LOGFILE \
run --register-dataset-types \
-b $REPO \
-i $INPUT_DATA_COLL,$INJECTION_CATALOG_COLL \
-o $OUTPUT_COLL \
-p $SOURCE_INJECTION_DIR/pipelines/inject_coadd.yaml \
-d "instrument='HSC' AND skymap='hsc_rings_v1' AND tract=9813 AND patch=42 AND band='i'"

where

$LOGFILE

The full path to a user-defined output log file.

$REPO

The path to the butler repository.

$INPUT_DATA_COLL

The name of the input data collection.

$INJECTION_CATALOG_COLL

The name of the input injection catalog collection.

$OUTPUT_COLL

The name of the injected output collection.

$SOURCE_INJECTION_DIR

The path to the source injection package directory.

Tip

If the injection FITS file is not in the working directory from which the pipetask run command is executed, the stamp_prefix configuration option can be used. This prepends a string to the FITS file name taken from the catalog, allowing your FITS files to be stored in a different directory from the current working directory.
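If stamp_prefix is prepended to the catalog's file name by plain string concatenation, as the tip's wording suggests, then a directory prefix needs its trailing path separator. The hypothetical function below illustrates that rule only; it is an assumption, not the injection task's code, and the directory names are placeholders.

```python
import os


def resolve_stamp_path(stamp_prefix, stamp):
    """Hypothetical: prepend stamp_prefix to the stamp file name by concatenation."""
    return stamp_prefix + stamp


# With a trailing separator the directory and file name join correctly.
print(resolve_stamp_path("/data/stamps/", "tgn420z151.fits"))  # /data/stamps/tgn420z151.fits
# Without it, the names run together.
print(resolve_stamp_path("/data/stamps", "tgn420z151.fits"))   # /data/stampstgn420z151.fits
# os.path.join(directory, "") adds the separator when building the prefix.
print(resolve_stamp_path(os.path.join("/data/stamps", ""), "tgn420z151.fits"))
```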

Running the above snippet produces the following:

Coadd-level (deepCoadd and injected_deepCoadd) data for HSC tract 9813, patch 42 in the i-band, showcasing the injection of multiple copies of 2dFGRS galaxy TGN420Z151. Images are log scaled across the central 99% flux range and smoothed with a Gaussian kernel of FWHM 5 pixels.

[Figure panels: before injection; after injection; difference image.]

See also

For a “Rubin themed” example postage stamp injection, see the top of the FAQs page.

Wrap Up

This page has described how to inject synthetic sources into a visit-level exposure-type or visit-type dataset, or into a coadd-level coadded dataset. Options for injection on the command line and in Python have been presented. The special case of injecting FITS-file postage stamp images has also been covered.

Move on to another quick reference guide, consult the FAQs, or head back to the main page.