TransformSourceTableTask

TransformSourceTableTask transforms the full-width source table (a source dataset) into a narrower Source Table (a sourceTable dataset), as specified by the Data Products Definition Document (DPDD). It extracts, transforms, and renames columns per a YAML specification, by default schemas/Source.yaml in this package. Inputs and outputs are both per-detector. The input is typically a wide table and the output a narrow table appropriate for concatenation into a per-visit table by ConsolidateSourceTableTask.

It is the second of three postprocessing tasks to convert a src table to a per-visit Source Table that conforms to the standard data model. The first is WriteSourceTableTask, and the third is ConsolidateSourceTableTask.

TransformSourceTableTask is available as a command-line task, transformSourceTableTask.py.

Processing summary

TransformSourceTableTask

  1. Read in the source dataset.

  2. Generate functors (by instantiating a lsst.pipe.tasks.functors.CompositeFunctor) from the YAML specification, and apply them to the columns.

  3. Store the output DataFrame in the parquet-formatted sourceTable dataset.
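The extract/transform/rename flow above can be sketched in plain Python. This is an illustration only, not the stack implementation: the real task applies a lsst.pipe.tasks.functors.CompositeFunctor to a pandas DataFrame, and the column names and calibration arithmetic below are assumptions.

```python
# Minimal sketch of the transform step: apply "functors" (per-row column
# computations) to wide input rows and emit a narrow, renamed table.

def transform(rows, functors):
    """Apply each named functor to every row; return narrow rows."""
    return [{name: func(row) for name, func in functors.items()}
            for row in rows]

# Hypothetical wide source rows (column names are assumptions).
wide = [
    {"slot_CalibFlux_instFlux": 1000.0,
     "base_LocalPhotoCalib": 0.5,
     "coord_ra": 1.23},
]

# Functors extract, compute, and rename in one pass: "ra" is a plain
# column copy, "apFlux" a (simplified, assumed) calibrated flux.
functors = {
    "ra": lambda r: r["coord_ra"],
    "apFlux": lambda r: r["slot_CalibFlux_instFlux"] * r["base_LocalPhotoCalib"],
}

narrow = transform(wide, functors)
```

Only the columns named in the functor mapping survive into the output, which is what makes the result safe to concatenate across detectors.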

transformSourceTableTask.py command-line interface

transformSourceTableTask.py REPOPATH [@file [@file2 ...]] [--output OUTPUTREPO | --rerun RERUN] [--id] [other options]

Key arguments:

REPOPATH
The input Butler repository’s URI or file path.

Key options:

--id:
The data IDs to process.

See also

See Command-line task argument reference for details and additional options.

Python API summary

from lsst.pipe.tasks.postprocess import TransformSourceTableTask
class TransformSourceTableTask(*args, **kwargs)

Transform/standardize a source catalog...

attribute config

Access configuration fields and retargetable subtasks.

method run(parq, funcs=None, dataId=None, band=None)

Do postprocessing calculations...

method runDataRef(dataRef)

Override to specify band label to run()...

See also

See the TransformSourceTableTask API reference for complete details.

Butler datasets

When run as the transformSourceTableTask.py command-line task, or directly through the runDataRef method, TransformSourceTableTask obtains datasets from the input Butler data repository and persists outputs to the output Butler data repository. Note that configurations for TransformSourceTableTask affect what datasets are persisted and what their content is.

Input datasets

source
Full-width parquet version of the src catalog, generated by WriteSourceTableTask.

Output datasets

sourceTable
Source Table in parquet format (per-detector)

Retargetable subtasks

No subtasks.

Configuration fields

connections

Data type
lsst.pipe.base.config.Connections
Field type
ConfigField
Configurations describing the connections of the PipelineTask to datatypes

functorFile

Default
None
Field type
str Field (optional)
Path to YAML file specifying Science Data Model functors to use when copying columns and computing calibrated values.

primaryKey

Default
None
Field type
str Field (optional)
Name of column to be set as the DataFrame index. If None, the index will be named id
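The primaryKey behavior can be sketched with plain dicts (a minimal illustration of the assumed semantics, not the stack implementation, which sets a pandas DataFrame index; the column names are hypothetical):

```python
def set_index(rows, primary_key=None):
    """Key each row dict by the configured primary-key column.

    With primary_key=None the index column is assumed to be named
    "id", mirroring the documented default.
    """
    key = primary_key if primary_key is not None else "id"
    return {row[key]: {k: v for k, v in row.items() if k != key}
            for row in rows}

# Hypothetical narrow rows keyed by a "sourceId" column.
rows = [{"sourceId": 11, "flux": 1.0}, {"sourceId": 12, "flux": 2.0}]
indexed = set_index(rows, primary_key="sourceId")
```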

saveLogOutput

Default
True
Field type
bool Field
Flag to enable/disable saving of log output for a task, enabled by default.

saveMetadata

Default
True
Field type
bool Field
Flag to enable/disable metadata saving for a task, enabled by default.

Examples

The following command shows an example of how to run the task on an example HSC repository.

transformSourceTableTask.py /datasets/hsc/repo --calib /datasets/hsc/repo/CALIB --rerun <rerun name> --id visit=30504 ccd=0..8^10..103

Using the python API

import os
from lsst.utils import getPackageDir
from lsst.daf.persistence import Butler
from lsst.pipe.tasks.postprocess import TransformSourceTableTask

# get input catalogs
butler = Butler('/path/to/repo')
dataId = {'visit': 30504, 'ccd': 51}
source = butler.get('source', dataId=dataId)

# setup task using the obs_subaru Source.yaml specification
config = TransformSourceTableTask.ConfigClass()
config.functorFile = os.path.join(getPackageDir("obs_subaru"), 'policy', 'Source.yaml')
task = TransformSourceTableTask(config=config)
defaultFunctors = task.getFunctors()

# run the task to get a DataFrame
df = task.run(source, funcs=defaultFunctors, dataId=dataId)

You may also specify your own functors to apply:

import yaml
from lsst.pipe.tasks.functors import CompositeFunctor

yamlSpec = """
funcs:
    ApFlux:
        functor: LocalNanojansky
        args:
            - slot_CalibFlux_instFlux
            - slot_CalibFlux_instFluxErr
            - base_LocalPhotoCalib
            - base_LocalPhotoCalibErr
    ApFluxErr:
        functor: LocalNanojanskyErr
        args:
            - slot_CalibFlux_instFlux
            - slot_CalibFlux_instFluxErr
            - base_LocalPhotoCalib
            - base_LocalPhotoCalibErr
"""
exampleFunctors = CompositeFunctor.from_yaml(yaml.safe_load(yamlSpec))
df = task.run(source, funcs=exampleFunctors, dataId=dataId)