.. lsst-task-topic:: lsst.pipe.tasks.postprocess.TransformSourceTableTask

########################
TransformSourceTableTask
########################


``TransformSourceTableTask`` transforms the full-width source table
(a ``source`` dataset) to a narrower Source Table (a ``sourceTable`` dataset)
as specified by the Data Products Definition Document (`DPDD <https://lse-163.lsst.io>`_).
It extracts, transforms, and renames columns according to a YAML specification, by default the
``Source.yaml`` file in ``obs_package/policy``. Inputs and outputs are both per-detector.
The input is typically a wide table, and the output is a narrow table suitable for
concatenation into a per-visit table by ``ConsolidateSourceTableTask``.

It is the second of three postprocessing tasks that convert a ``src`` table to a
per-visit Source Table that conforms to the standard data model. The first is
:doc:`lsst.pipe.tasks.postprocess.WriteSourceTableTask`, and the third is :doc:`lsst.pipe.tasks.postprocess.ConsolidateSourceTableTask`.

``TransformSourceTableTask`` is available as a
:ref:`command-line task <pipe-tasks-command-line-tasks>`,
:command:`transformSourceTableTask.py`.

.. _lsst.pipe.tasks.postprocess.TransformSourceTableTask-summary:

Processing summary
==================

``TransformSourceTableTask`` performs the following steps:

#. Read in the ``source`` dataset.

#. Generate functors (by instantiating a `lsst.pipe.tasks.functors.CompositeFunctor`)
   from the YAML specification, and apply them to the columns.

#. Store the output DataFrame in the Parquet-formatted ``sourceTable`` dataset.
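
The steps above can be pictured with a minimal, library-free sketch.
It is not the task's implementation: rows are plain dictionaries rather than a DataFrame, and the names ``SPEC``, ``scale``, and ``transform_rows``, as well as the column names, are hypothetical stand-ins for the YAML specification and the functors built from it.

.. code-block:: python

    # Hypothetical stand-in for a functor specification: each output column
    # names one input column and a callable that transforms its values.
    def scale(factor):
        """Return a callable that multiplies a value by ``factor``."""
        return lambda value: value * factor

    SPEC = {
        "x": ("centroid_x", scale(1.0)),
        "flux": ("calib_instFlux", scale(2.0)),
    }

    def transform_rows(rows, spec):
        """Extract, transform, and rename the columns of each row per ``spec``."""
        return [
            {out: func(row[col]) for out, (col, func) in spec.items()}
            for row in rows
        ]

    # A "wide" input: columns not named in SPEC are dropped from the output.
    wide = [
        {"centroid_x": 10.0, "calib_instFlux": 3.0, "unused": 1},
        {"centroid_x": 12.5, "calib_instFlux": 4.0, "unused": 2},
    ]
    narrow = transform_rows(wide, SPEC)

Only the columns named in the specification survive, which is what makes the output narrower than the input.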

.. _lsst.pipe.tasks.postprocess.TransformSourceTableTask-cli:

transformSourceTableTask.py command-line interface
==================================================

.. code-block:: text

   transformSourceTableTask.py REPOPATH [@file [@file2 ...]] [--output OUTPUTREPO | --rerun RERUN] [--id] [other options]

Key arguments:

:option:`REPOPATH`
   The input Butler repository's URI or file path.

Key options:

:option:`--id`
   The data IDs to process.

.. seealso::

   See :ref:`command-line-task-argument-reference` for details and additional options.

.. _lsst.pipe.tasks.postprocess.TransformSourceTableTask-api:

Python API summary
==================

.. lsst-task-api-summary:: lsst.pipe.tasks.postprocess.TransformSourceTableTask

.. _lsst.pipe.tasks.postprocess.TransformSourceTableTask-butler:

Butler datasets
===============

When run as the ``transformSourceTableTask.py`` command-line task, or directly through the `~lsst.pipe.tasks.postprocess.TransformSourceTableTask.runDataRef` method, ``TransformSourceTableTask`` obtains datasets from the input Butler data repository and persists outputs to the output Butler data repository.
Note that configurations for ``TransformSourceTableTask``, and its subtasks, affect what datasets are persisted and what their content is.

.. _lsst.pipe.tasks.postprocess.TransformSourceTableTask-butler-inputs:

Input datasets
--------------

``source``
    Full-width Parquet version of the ``src`` catalog.
    It is generated by ``WriteSourceTableTask``.

.. _lsst.pipe.tasks.postprocess.TransformSourceTableTask-butler-outputs:

Output datasets
---------------

``sourceTable``
    Source Table in Parquet format (per-detector).


.. _lsst.pipe.tasks.postprocess.TransformSourceTableTask-subtasks:

Retargetable subtasks
=====================

.. lsst-task-config-subtasks:: lsst.pipe.tasks.postprocess.TransformSourceTableTask

.. _lsst.pipe.tasks.postprocess.TransformSourceTableTask-configs:

Configuration fields
====================

.. lsst-task-config-fields:: lsst.pipe.tasks.postprocess.TransformSourceTableTask

.. _lsst.pipe.tasks.postprocess.TransformSourceTableTask-examples:

Examples
========

The following command runs the task on an example HSC repository:

.. code-block:: bash

    transformSourceTable.py /datasets/hsc/repo --calib /datasets/hsc/repo/CALIB --rerun <rerun name> --id visit=30504 ccd=0..8^10..103


Using the Python API:

.. code-block:: python

    import os
    from lsst.utils import getPackageDir
    from lsst.daf.persistence import Butler
    from lsst.pipe.tasks.postprocess import TransformSourceTableTask

    # get input catalogs
    butler = Butler('/path/to/repo')
    dataId = {'visit': 30504, 'ccd': 51}
    source = butler.get('source', dataId=dataId)

    # set up the task using the obs_subaru Source.yaml specification
    config = TransformSourceTableTask.ConfigClass()
    config.functorFile = os.path.join(getPackageDir("obs_subaru"), 'policy', 'Source.yaml')
    task = TransformSourceTableTask(config=config)
    defaultFunctors = task.getFunctors()

    # run the task to get a DataFrame
    df = task.run(source, funcs=defaultFunctors, dataId=dataId)

You may also specify your own functors to apply:

.. code-block:: python

    import yaml
    from lsst.pipe.tasks.functors import CompositeFunctor

    # YAML specification defining two output columns.
    # This example reuses task, source, and dataId from the previous example.
    functorSpec = """
    funcs:
        ApFlux:
            functor: LocalNanojansky
            args:
                - slot_CalibFlux_instFlux
                - slot_CalibFlux_instFluxErr
                - base_LocalPhotoCalib
                - base_LocalPhotoCalibErr
        ApFluxErr:
            functor: LocalNanojanskyErr
            args:
                - slot_CalibFlux_instFlux
                - slot_CalibFlux_instFluxErr
                - base_LocalPhotoCalib
                - base_LocalPhotoCalibErr
    """
    # yaml.safe_load parses the specification without needing an explicit Loader
    exampleFunctors = CompositeFunctor.from_yaml(yaml.safe_load(functorSpec))
    df = task.run(source, funcs=exampleFunctors, dataId=dataId)
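
For orientation, the arithmetic behind the two functors in the example above can be sketched in plain Python.
This sketch assumes that ``LocalNanojansky`` multiplies the instrumental flux by the local photometric calibration and that ``LocalNanojanskyErr`` propagates both uncertainties to first order in quadrature; consult ``lsst.pipe.tasks.functors`` for the authoritative definitions.
The function names below are hypothetical and are not part of the LSST API.

.. code-block:: python

    import math

    def inst_flux_to_nanojansky(inst_flux, local_calib):
        """Scale an instrumental flux by the local calibration (assumed behavior)."""
        return inst_flux * local_calib

    def inst_flux_to_nanojansky_err(inst_flux, inst_flux_err,
                                    local_calib, local_calib_err):
        """First-order error propagation for the product inst_flux * local_calib."""
        return math.hypot(inst_flux_err * local_calib, inst_flux * local_calib_err)

    # Illustrative numbers only; they are not taken from real data.
    flux = inst_flux_to_nanojansky(1000.0, 0.5)
    flux_err = inst_flux_to_nanojansky_err(1000.0, 30.0, 0.5, 0.02)

Under these assumptions, both output columns need the same four input columns, which is consistent with the identical ``args`` lists in the specification.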

.. _lsst.pipe.tasks.postprocess.TransformSourceTableTask-debug: