TransformSourceTableTask¶
TransformSourceTableTask transforms the full-width source table (a source dataset) into a narrower Source Table (a sourceTable dataset) as specified by the Data Products Definition Document (DPDD).
It extracts, transforms, and renames columns according to a YAML specification, by default the Source.yaml in obs_package/policy. Inputs and outputs are both per-detector: the input is typically a wide table, and the output is a narrow table suitable for concatenating into a per-visit table by ConsolidateSourceTableTask.
It is the second of three postprocessing tasks that convert a src table to a per-visit Source Table conforming to the standard data model. The first is WriteSourceTableTask, and the third is ConsolidateSourceTableTask.
TransformSourceTableTask is available as a command-line task, transformSourceTableTask.py.
Processing summary¶
TransformSourceTableTask:
- Reads in source.
- Generates functors (by instantiating a lsst.pipe.tasks.functors.CompositeFunctor) from the YAML specification and applies them to the columns.
- Stores the output DataFrame in the parquet-formatted sourceTable.
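The steps above can be pictured with a toy, pure-Python stand-in for the functor mechanism. This is only a sketch of the extract/transform/rename idea, assuming a table held as a dict of column lists. The helper apply_spec, the layout of spec, and the sample values are hypothetical and are not the lsst.pipe.tasks.functors API.

```python
import math


def apply_spec(wide, spec):
    """Build a narrow table from `wide` (dict of column name -> list of values).

    `spec` maps each output column name to a (function, input column names)
    pair; the function is called once per row with the named input values.
    """
    n_rows = len(next(iter(wide.values())))
    narrow = {}
    for out_name, (func, in_names) in spec.items():
        narrow[out_name] = [
            func(*(wide[name][i] for name in in_names)) for i in range(n_rows)
        ]
    return narrow


# Hypothetical wide per-detector table with more columns than the output needs.
wide = {
    "id": [1, 2],
    "coord_ra": [0.1, 0.2],  # radians
    "base_SdssCentroid_x": [10.5, 20.25],
    "unused_column": [0, 0],
}

# Hypothetical specification: output name -> (function, input columns).
spec = {
    "sourceId": (lambda v: v, ["id"]),            # rename only
    "ra": (math.degrees, ["coord_ra"]),           # transform and rename
    "x": (lambda v: v, ["base_SdssCentroid_x"]),  # rename only
}

narrow = apply_spec(wide, spec)
```

Columns that no entry in spec refers to (unused_column here) are dropped, which is how the output ends up narrower than the input.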
transformSourceTableTask.py command-line interface¶
transformSourceTableTask.py REPOPATH [@file [@file2 ...]] [--output OUTPUTREPO | --rerun RERUN] [--id] [other options]
Key arguments:
REPOPATH
- The input Butler repository’s URI or file path.
Key options:
--id
- The data IDs to process.
See also
See Command-line task argument reference for details and additional options.
Python API summary¶
from lsst.pipe.tasks.postprocess import TransformSourceTableTask
- class TransformSourceTableTask(*args, **kwargs)
Transform/standardize a source catalog
...
- attribute config
Access configuration fields and retargetable subtasks.
- method run(parq, funcs=None, dataId=None, band=None)
Do postprocessing calculations
...
- method runDataRef(dataRef)
Override to specify band label to run()
...
See also
See the TransformSourceTableTask API reference for complete details.
Butler datasets¶
When run as the transformSourceTableTask.py command-line task, or directly through the runDataRef method, TransformSourceTableTask obtains datasets from the input Butler data repository and persists outputs to the output Butler data repository.
Note that configurations for TransformSourceTableTask, and its subtasks, affect what datasets are persisted and what their content is.
Input datasets¶
source
- Full-width parquet version of the src catalog. It is generated by WriteSourceTableTask.
Output datasets¶
sourceTable
- Source Table in parquet format (per-detector)
Retargetable subtasks¶
No subtasks.
Configuration fields¶
connections¶
- Data type
lsst.pipe.base.config.Connections
- Field type
ConfigField
functorFile¶
Examples¶
The following command runs the task on an example HSC repository:
transformSourceTable.py /datasets/hsc/repo --calib /datasets/hsc/repo/CALIB --rerun <rerun name> --id visit=30504 ccd=0..8^10..103
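To see which detectors the --id expression above selects, the value syntax can be expanded by hand: ^ separates alternatives, and a..b denotes an inclusive integer range. The sketch below assumes only those two forms (it ignores any stride suffix and non-integer values). expand_id_values is a hypothetical helper, not part of the LSST command-line parser.

```python
def expand_id_values(expr):
    """Expand a data ID value expression such as '0..8^10..103' into integers."""
    values = []
    for part in expr.split("^"):
        if ".." in part:
            # 'a..b' is an inclusive range, so the stop value is included.
            start, stop = part.split("..")
            values.extend(range(int(start), int(stop) + 1))
        else:
            values.append(int(part))
    return values


ccds = expand_id_values("0..8^10..103")
```

Here ccds holds 103 detector numbers: 0 through 103 with detector 9 left out.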
Using the Python API:
import os
from lsst.utils import getPackageDir
from lsst.daf.persistence import Butler
from lsst.pipe.tasks.postprocess import TransformSourceTableTask
# get input catalogs
butler = Butler('/path/to/repo')
dataId = {'visit': 30504, 'ccd': 51}
source = butler.get('source', dataId=dataId)
# setup task using the obs_subaru Source.yaml specification
config = TransformSourceTableTask.ConfigClass()
config.functorFile = os.path.join(getPackageDir("obs_subaru"), 'policy', 'Source.yaml')
task = TransformSourceTableTask(config=config)
defaultFunctors = task.getFunctors()
# run the task to get a DataFrame
df = task.run(source, funcs=defaultFunctors, dataId=dataId)
You may also specify your own functors to apply:
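import yaml
from lsst.pipe.tasks.functors import CompositeFunctor
# YAML specification: each entry under funcs names an output column,
# the functor that computes it, and the input columns passed to that functor
functorSpec = """
funcs:
    ApFlux:
        functor: LocalNanojansky
        args:
            - slot_CalibFlux_instFlux
            - slot_CalibFlux_instFluxErr
            - base_LocalPhotoCalib
            - base_LocalPhotoCalibErr
    ApFluxErr:
        functor: LocalNanojanskyErr
        args:
            - slot_CalibFlux_instFlux
            - slot_CalibFlux_instFluxErr
            - base_LocalPhotoCalib
            - base_LocalPhotoCalibErr
"""
# safe_load avoids the explicit Loader argument that yaml.load requires in recent PyYAML releases
exampleFunctors = CompositeFunctor.from_yaml(yaml.safe_load(functorSpec))
# reuse the task, source, and dataId from the previous example
df = task.run(source, funcs=exampleFunctors, dataId=dataId)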
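For intuition about what the two functors in this specification compute, the arithmetic can be sketched in plain Python. The sketch assumes that the local photometric calibration is a multiplicative factor converting instrumental flux to nanojansky, and that the flux and calibration errors are uncorrelated and therefore add in quadrature. The function names and numbers are illustrative only; the actual LocalNanojansky and LocalNanojanskyErr implementations should be checked in the API reference.

```python
import math


def to_nanojansky(inst_flux, local_calib):
    """Scale an instrumental flux by the (assumed multiplicative) local calibration."""
    return inst_flux * local_calib


def to_nanojansky_err(inst_flux, inst_flux_err, local_calib, local_calib_err):
    """Propagate uncorrelated flux and calibration errors in quadrature."""
    return math.hypot(inst_flux_err * local_calib, inst_flux * local_calib_err)


# Illustrative values, not taken from any real catalog.
flux = to_nanojansky(1000.0, 0.5)
flux_err = to_nanojansky_err(1000.0, 30.0, 0.5, 0.02)
```

With these inputs the two error terms are 15 and 20, so the combined error is 25 on a calibrated flux of 500.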