Getting started tutorial part 1: setting up the Butler data repository
The LSST Science Pipelines can process data from several telescopes using LSST’s algorithms. In this tutorial series, we’ll calibrate and reduce Hyper Suprime-Cam (HSC) exposures into coadditions and catalogs of objects.
This hands-on tutorial is intended for anyone just getting started with the LSST Science Pipelines. You’ll get a feel for setting up a Pipelines environment, working with data repositories, running command line tasks, and working with the Pipelines’ Python APIs. Along the way we’ll point you to additional documentation.
In this first part of the tutorial series we will collect the raw observations and calibration data needed for the tutorial. Along the way, you’ll be introduced to the Butler, which is the Pipelines’ interface for managing, reading, and writing datasets.
Setup check
Before we get started, you’ll need to install the LSST Science Pipelines. Follow the installation tutorial to get the Pipelines software using the recommended method.
To make sure the environment is set up properly, run:
eups list lsst_distrib
The line printed out should contain the word setup.
If not, review the installation tutorials on activating the environment and setting up lsst_distrib.
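If you want to script this check, for example at the top of a processing script, a minimal sketch follows. It assumes only what is stated above: the line printed for an activated product contains setup. The function name is_setup is hypothetical and not part of EUPS.

```shell
# Hypothetical helper (not part of EUPS): succeeds when the given text,
# such as the output of `eups list lsst_distrib`, contains "setup".
is_setup() {
    case "$1" in
        *setup*) return 0 ;;
        *)       return 1 ;;
    esac
}

# Example use, assuming eups is available in the current shell:
#   is_setup "$(eups list lsst_distrib)" || echo "lsst_distrib is not set up" >&2
```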
Let’s get started.
Downloading the sample HSC data
Sample data for this tutorial comes from the ci_hsc package. ci_hsc contains a small set of Hyper Suprime-Cam (HSC) exposures. The Science Pipelines provides native integrations for many observatories, including HSC, CFHT/MegaCam, SDSS, and of course LSST.
ci_hsc is a Git LFS-backed package, so make sure you’ve installed and configured Git LFS for LSST.
First, clone ci_hsc using Git:
git clone https://github.com/lsst/ci_hsc
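If Git LFS is not configured, a clone can succeed but leave small text pointer files in place of the FITS images. The sketch below is one way to spot that. The helper name is_lfs_pointer is hypothetical; it relies on the fact that a Git LFS pointer file begins with a version line naming the LFS spec URL.

```shell
# Hypothetical helper: succeeds when the file looks like a Git LFS pointer
# (a short text file starting with the LFS spec line) rather than real data.
is_lfs_pointer() {
    head -c 64 "$1" 2>/dev/null |
        grep -q '^version https://git-lfs\.github\.com/spec/v1'
}

# Example use against the clone (raw/ is the directory ingested later on):
#   for f in ci_hsc/raw/*.fits; do
#       is_lfs_pointer "$f" && echo "LFS pointer, not real data: $f"
#   done
```

If any file is reported, revisit the Git LFS configuration and fetch the data again before continuing.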
Then set up the package to add it to the EUPS stack:
setup -j -r ci_hsc
Tip
The -r ci_hsc argument points to the package’s directory.
The -j argument means that we’re just setting up ci_hsc without affecting other packages.
Now run:
echo $CI_HSC_DIR
The $CI_HSC_DIR environment variable should point to the ci_hsc directory.
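The same check can be made non-interactive. This is a sketch only; check_pkg_dir is a hypothetical helper name, and it tests nothing beyond the path being non-empty and an existing directory.

```shell
# Hypothetical helper: report whether a package directory variable is usable.
# $1 is the path to test, for example "$CI_HSC_DIR".
check_pkg_dir() {
    if [ -n "$1" ] && [ -d "$1" ]; then
        echo "ok: $1"
    else
        echo "package directory is unset or does not exist" >&2
        return 1
    fi
}

# Example use after running setup:
#   check_pkg_dir "$CI_HSC_DIR"
```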
Creating a Butler repository for HSC data
In the LSST Science Pipelines you don’t directly manage data files on disk. Instead, you access data through the Butler client. This gives you flexibility to work with data from different observatories without significantly changing your workflow.
The Butler manages data in repositories.
On a local filesystem, Butler repositories are simple directories.
Let’s create a repository called DATA:
mkdir DATA
Then add a _mapper file to the repository:
echo "lsst.obs.hsc.HscMapper" > DATA/_mapper
The Butler uses the mapper to find and organize data in a format specific to each camera.
Here we’re using the lsst.obs.hsc.HscMapper mapper because we’re processing HSC data in this repository.
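If you expect to recreate the repository more than once, the two steps above can be wrapped so they are safe to rerun. The following is a sketch; init_hsc_repo is a hypothetical name, and the only content it writes is the mapper class named in this tutorial.

```shell
# Hypothetical helper: create a repository directory and its _mapper file.
# Safe to rerun: an existing _mapper file is left untouched.
init_hsc_repo() {
    repo="$1"
    mkdir -p "$repo"
    if [ ! -f "$repo/_mapper" ]; then
        echo "lsst.obs.hsc.HscMapper" > "$repo/_mapper"
    fi
}

# Example use, equivalent to the mkdir and echo commands above:
#   init_hsc_repo DATA
```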
Ingesting raw data into the Butler repository
Next, let’s populate the repository with data from ci_hsc. The Pipelines’ ingestImages.py command (called a command line task) links raw images into a Butler repository, allowing the mapper to organize the data. Run:
ingestImages.py DATA $CI_HSC_DIR/raw/*.fits --mode=link
Tip
Notice that the first argument to most command line tasks is the Butler repository.
In this case it’s the DATA directory.
Tip
You can learn about the arguments for command line tasks with the -h flag.
For example:
ingestImages.py -h
Ingesting calibrations into the Butler repository
Next, we’ll add calibration images (such as dark, flat, and bias frames) associated with the raw data:
ln -s $CI_HSC_DIR/CALIB/ DATA/CALIB
Linking an astrometric reference catalog into the Butler repository
The Pipelines uses external stellar catalogs to refine the WCS of images. ci_hsc includes a subset of the Pan-STARRS PS1 catalog that has been prepared as an astrometric reference catalog. Let’s link that catalog into the Butler repository:
mkdir -p DATA/ref_cats
ln -s $CI_HSC_DIR/ps1_pv3_3pi_20170110 DATA/ref_cats/ps1_pv3_3pi_20170110
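A symbolic link can be created successfully even when its target does not exist, so it can be worth confirming that the calibration and reference catalog links resolve. A minimal sketch, with link_ok as a hypothetical helper name:

```shell
# Hypothetical helper: succeeds only when $1 is a symbolic link
# and the link's target actually exists.
link_ok() {
    [ -L "$1" ] && [ -e "$1" ]
}

# Example use for the two links created in this tutorial:
#   for l in DATA/CALIB DATA/ref_cats/ps1_pv3_3pi_20170110; do
#       link_ok "$l" || echo "broken or missing link: $l" >&2
#   done
```

A broken link here usually means $CI_HSC_DIR was unset or pointed somewhere unexpected when the ln -s command ran.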
See also
Learn more about the PS1 reference catalog and how to use it with the LSST Science Pipelines in this LSST Community forum topic.
Next up
In part 2 of this tutorial series we will process the HSC data in the Butler repository into calibrated exposures.