Functor

class lsst.pipe.tasks.functors.Functor(filt=None, dataset=None, noDup=None)

Bases: object

Define and execute a calculation on a ParquetTable

The __call__ method accepts either a ParquetTable object or a DeferredDatasetHandle, and returns the result of the calculation as a single column. Each functor defines what columns are needed for the calculation, and only these columns are read from the ParquetTable.

The action of __call__ consists of two steps: first, loading the necessary columns from disk into memory as a pandas.DataFrame object; and second, performing the computation on this dataframe and returning the result.

To define a new Functor, a subclass must define a _func method that takes a pandas.DataFrame and returns the result as a pandas.Series. In addition, it must define the following attributes:

  • _columns: The columns necessary to perform the calculation

  • name: A name appropriate for a figure axis label

  • shortname: A name appropriate for use as a dictionary key
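The subclass contract and the two-step action of __call__ can be sketched as follows. This is only an illustration: ToyFunctor is a hypothetical stand-in written for this example, not the real lsst.pipe.tasks.functors.Functor, and the column name x and the sample values are assumptions. It needs only pandas to run.

```python
import pandas as pd


class ToyFunctor:
    """Hypothetical stand-in for Functor; not the LSST implementation."""

    _columns = ()       # columns necessary to perform the calculation
    name = "toy"        # name appropriate for a figure axis label
    shortname = "toy"   # name appropriate for use as a dictionary key

    def _func(self, df):
        raise NotImplementedError("subclasses must define _func")

    def __call__(self, data):
        # Step 1: keep only the columns this functor declares.
        df = data[list(self._columns)]
        # Step 2: run the calculation and return a single column.
        return self._func(df)


class DoubleX(ToyFunctor):
    """Toy functor returning twice the (assumed) column 'x'."""

    _columns = ("x",)
    name = "Twice x"
    shortname = "double_x"

    def _func(self, df):
        return 2 * df["x"]


# An in-memory DataFrame stands in for the table read from disk.
table = pd.DataFrame({"x": [1.0, 2.5], "y": [10.0, 20.0]})
result = DoubleX()(table)
```

In this toy version the unused column y never reaches _func, which mirrors the statement that only the declared columns are read.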

On initialization, a Functor should declare what band (filt kwarg) and dataset (e.g. 'ref', 'meas', 'forced_src') it is intended to be applied to. This enables the _get_data method to extract the proper columns from the parquet file. If not specified, the dataset will fall back on the _defaultDataset attribute. If band is not specified and dataset is anything other than 'ref', then an error will be raised when trying to perform the calculation.

Originally, Functor was set up to expect datasets formatted like the deepCoadd_obj dataset; that is, a dataframe with a multi-level column index, with the levels of the column index being band, dataset, and column. It has since been generalized to apply to dataframes without multi-level indices and to multi-level indices with just dataset and column levels. In addition, the _get_data method that reads the dataframe from the ParquetTable will return a dataframe with column index levels defined by the _dfLevels attribute; by default, this is column.

The _dfLevels attribute should generally not need to be changed, unless _func needs columns from multiple filters or datasets to do the calculation. An example of this is the lsst.pipe.tasks.functors.Color functor, for which _dfLevels = ('band', 'column'), and _func expects the dataframe it gets to have those levels in the column index.
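To make the column index levels concrete, the sketch below builds a small dataframe with band, dataset, and column levels using plain pandas. It does not call _get_data; the band names, dataset names, column name flux, and values are all assumed for illustration.

```python
import pandas as pd

# Assumed toy data: band, dataset, and column values are illustrative.
columns = pd.MultiIndex.from_tuples(
    [
        ("g", "meas", "flux"),
        ("g", "ref", "flux"),
        ("r", "meas", "flux"),
        ("r", "ref", "flux"),
    ],
    names=["band", "dataset", "column"],
)
df = pd.DataFrame(
    [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]],
    columns=columns,
)

# One band and one dataset selected: only the 'column' level remains.
single = df["g"]["meas"]

# One dataset across all bands: 'band' and 'column' levels remain,
# the shape a calculation spanning several bands would need.
multi = df.xs("meas", axis=1, level="dataset")
```

Here single corresponds to the default case of a dataframe indexed by column only, while multi has the ('band', 'column') levels described for Color.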

Parameters:
filt : str

Filter upon which to do the calculation

dataset : str

Dataset upon which to do the calculation (e.g., ‘ref’, ‘meas’, ‘forced_src’).

Attributes Summary

columns

Columns required to perform calculation

name

Full name of functor (suitable for figure labels)

noDup

shortname

Short name of functor (suitable for column name/dict key)

Methods Summary

__call__(data[, dropna])

Call self as a function.

difference(data1, data2, **kwargs)

Computes difference between functor called on two different ParquetTable objects

fail(df)

multilevelColumns(data[, columnIndex, ...])

Returns columns needed by functor from multilevel dataset

Attributes Documentation

columns

Columns required to perform calculation

name

Full name of functor (suitable for figure labels)

noDup

shortname

Short name of functor (suitable for column name/dict key)

Methods Documentation

__call__(data, dropna=False)

Call self as a function.

difference(data1, data2, **kwargs)

Computes difference between functor called on two different ParquetTable objects
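The description does not spell out how the difference is formed. Assuming it is an element-wise subtraction of the two results, the behaviour can be pictured with plain pandas as below; double_x is a hypothetical stand-in for a functor, and the tables and their row labels are made up.

```python
import pandas as pd


def double_x(df):
    """Hypothetical stand-in for a functor: twice the column 'x'."""
    return 2 * df["x"]


# Two assumed tables that share only some of their row labels.
table1 = pd.DataFrame({"x": [1.0, 2.0, 3.0]}, index=[10, 11, 12])
table2 = pd.DataFrame({"x": [0.5, 1.0]}, index=[11, 12])

# pandas aligns the two Series on their index before subtracting,
# so rows present in only one table come out as NaN.
diff = double_x(table1) - double_x(table2)
```

Under this assumption, rows are matched by index label rather than by position.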

fail(df)

multilevelColumns(data, columnIndex=None, returnTuple=False)

Returns columns needed by functor from multilevel dataset

To access tables with multilevel column structure, a MultilevelParquetTable or DeferredDatasetHandle needs to be passed either a list of tuples or a dictionary.

Parameters:
data : MultilevelParquetTable or DeferredDatasetHandle

columnIndex : pandas Index object, optional

Either passed or read in from DeferredDatasetHandle.

returnTuple : bool

If true, then return a list of tuples rather than the column dictionary specification. This is set to True by CompositeFunctor in order to be able to combine columns from the various component functors.
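The two forms of column specification can be illustrated in plain Python. This is a sketch under assumptions: the level order, the example values, and the helper dict_to_tuples are all hypothetical and are not taken from the LSST code.

```python
from itertools import product

# Assumed level order and values, for illustration only.
levels = ("band", "dataset", "column")
column_dict = {
    "band": "g",
    "dataset": "meas",
    "column": ["flux", "fluxErr"],
}


def dict_to_tuples(spec, level_names):
    """Hypothetical helper: expand a column dictionary into tuples."""
    # Wrap scalar entries in a list so every level can be iterated.
    per_level = [
        [spec[name]] if isinstance(spec[name], str) else list(spec[name])
        for name in level_names
    ]
    # One tuple per combination of level values.
    return list(product(*per_level))


column_tuples = dict_to_tuples(column_dict, levels)
```

A flat list of tuples from several functors can simply be concatenated, whereas dictionaries with different band or dataset entries cannot be merged without losing information, which is one plausible reason the tuple form suits combining component functors.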