..
   Brief:
   This tutorial is geared towards beginners to data processing with the Science Pipelines.
   Our goal is to guide the reader through a small data processing project to show what it feels like to use the Science Pipelines.
   We want this tutorial to be kinetic; instead of getting bogged down in explanations and side-notes, we'll link to other documentation.
   Don't assume the user has any prior experience with the Pipelines; do assume a working knowledge of astronomy and the command line.

.. _getting-started-tutorial-processccd:

#############################################################################
Getting started tutorial part 2: calibrating single frames with processCcd.py
#############################################################################

In this part of the :ref:`tutorial series ` you'll process individual raw HSC images in the Butler repository (which you assembled in :doc:`part 1 `) into calibrated exposures.
We'll use the :command:`processCcd.py` command-line task to remove instrumental signatures with dark, bias and flat field calibration images.
:command:`processCcd.py` will also use the reference catalog to establish a preliminary WCS and photometric zeropoint solution.

Set up
======

Pick up your shell session where you left off in :doc:`part 1 `.
That means your current working directory must *contain* the :file:`DATA` directory (the Butler repository).

The ``lsst_distrib`` package also needs to be set up in your shell environment.
See :doc:`/install/setup` for details on doing this.

Reviewing what data will be processed
=====================================

:command:`processCcd.py` can operate on a single image or iterate over multiple images.
You can do a dry-run to see what data will be processed in the Butler repository:

.. code-block:: bash

   processCcd.py DATA --rerun processCcdOutputs --id --show data

The important arguments here are ``--id`` and ``--show data``.

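Each data ID printed by the dry run is a set of key-value pairs, so individual keys can be pulled out with ordinary shell tools and reused in a narrower ``--id`` selection.
The following is a minimal sketch and not part of the Science Pipelines.
It assumes each data ID is printed on a line of the form ``id dataRef.dataId = {...}`` (the sample line is copied from this tutorial's example output), and ``get_key`` is a hypothetical helper written only for this illustration.

```shell
# One data ID line, copied from the example dry-run output in this tutorial.
line="id dataRef.dataId = {'taiObs': '2013-06-17', 'pointing': 533, 'visit': 903334, 'dateObs': '2013-06-17', 'filter': 'HSC-R', 'field': 'STRIPE82L', 'ccd': 23, 'expTime': 30.0}"

# get_key is a hypothetical helper (an assumption, not a pipeline command).
# It prints the value stored under key $2 in the data ID line $1,
# dropping the quotes around string values.
get_key() {
    printf '%s\n' "$1" | sed -n "s/.*'$2': '\{0,1\}\([^',}]*\).*/\1/p"
}

visit=$(get_key "$line" visit)
ccd=$(get_key "$line" ccd)
filter=$(get_key "$line" filter)

# Combine two keys into a selection for exactly this exposure and chip.
selection="visit=$visit ccd=$ccd"

printf '%s\n' "filter: $filter"
printf '%s\n' "--id $selection"
```

The printed selection could then be passed to another dry run, for example ``processCcd.py DATA --rerun processCcdOutputs --id visit=903334 ccd=23 --show data``, which should list only that one data ID.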

The ``--id`` argument allows you to select datasets to process by their **data IDs**.
Data IDs describe individual datasets in the Butler repository.
Datasets also have *types*, and each command-line task will only process data of certain types.
In this case, :command:`processCcd.py` processes ``raw`` exposures (uncalibrated images from individual CCD chips).

In the dry-run command, the plain ``--id`` argument acts as a wildcard that selects all ``raw``-type data in the repository (in a moment we'll see how to filter data IDs).

The ``--show data`` argument puts :command:`processCcd.py` into a dry-run mode: rather than actually processing the data, it prints to standard output the list of data IDs that would be processed according to the ``--id`` argument.
For example, one line of the output from a :command:`processCcd.py` run with ``--show data`` looks like:

.. code-block:: text

   id dataRef.dataId = {'taiObs': '2013-06-17', 'pointing': 533, 'visit': 903334, 'dateObs': '2013-06-17', 'filter': 'HSC-R', 'field': 'STRIPE82L', 'ccd': 23, 'expTime': 30.0}

Notice the keys that describe each data ID, such as the ``visit`` (exposure identifier for the HSC camera), ``ccd`` (identifies a specific chip in the HSC camera) and ``filter``, among others.
With these keys you can select exactly what data you want to process.
For example, here's how to select just ``HSC-I``-band datasets:

.. code-block:: bash

   processCcd.py DATA --rerun processCcdOutputs --id filter=HSC-I --show data

Now only data IDs for ``HSC-I`` datasets are printed.

The ``--id`` argument supports a rich syntax for expressing data IDs by multiple selection criteria.

.. FIXME: Link to further documentation on Data IDs and the selector language from the lsst.pipe.base package documentation.

Running processCcd.py
=====================

Now that you know how datasets are selected, go ahead and run :command:`processCcd.py` on all ``raw`` datasets in the repository:

.. code-block:: bash

   processCcd.py DATA --rerun processCcdOutputs --id

Aside: reruns and output Butler repositories
============================================

While :command:`processCcd.py` runs, let's discuss the ``--rerun`` argument.

Command-line tasks, like :command:`processCcd.py`, write their output datasets to Butler data repositories.
There are two ways to specify an output data repository: with the ``--output`` argument, or with the ``--rerun`` argument.

The rerun pattern is convenient, particularly with local Butler repositories, because each rerun is packaged within the file system directory of the parent Butler data repository (the :file:`DATA` directory in this tutorial).
Above, when you ran :command:`processCcd.py`, you configured it to write outputs to a new rerun named ``processCcdOutputs``.

The idea is that you'll process data by running a sequence of individual command-line tasks.
At each stage, you will output datasets to a new rerun.
This is called *rerun chaining*, and you'll learn how to do it :ref:`in the next tutorial `.

If you need to redo a processing step, for example to experiment with a different command-line task configuration, you can do that safely by outputting to a new rerun.

.. important::

   Bottom line: a given rerun must contain data that was all processed consistently, with the same task configurations.
   If you mix outputs from multiple runs of a command-line task with different configurations, it may be impossible to understand or use the results of the data processing.

Wrap up
=======

In this tutorial, you've used the :command:`processCcd.py` command-line task to calibrate ``raw`` images in a Butler repository.
Here are some key takeaways:

- The :command:`processCcd.py` command-line task processes ``raw`` datasets, applying both photometric and astrometric calibrations.
- Datasets are described by both a *type* and a *data ID*.
  Data IDs are key-value pairs that describe a dataset (for example ``filter``, ``visit``, ``ccd``, ``field``).
- Command-line tasks have ``--id`` arguments that let you select which datasets to process.
  An empty ``--id`` argument acts as a wildcard that selects all available datasets in the repository of the type the command-line task can process.
- Command-line tasks write their outputs to a Butler data repository.
  Reruns (``--rerun`` argument) are a convenient way to create output data repositories.
  Make sure that all datasets in a rerun are processed consistently.

Continue this tutorial in :doc:`part 3, where you'll learn how to display these calibrated exposures `.