Functor

class lsst.pipe.tasks.functors.Functor(filt=None, dataset=None, noDup=None)

Bases: object

Define and execute a calculation on a DataFrame or Handle holding a DataFrame.

The __call__ method accepts a DataFrame, a DeferredDatasetHandle, or an InMemoryDatasetHandle, and returns the result of the calculation as a single column. Each functor defines which columns are needed for the calculation, and only these columns are read from the dataset handle.

The action of __call__ consists of two steps: first, loading the necessary columns from disk into memory as a DataFrame object; and second, performing the computation on this DataFrame and returning the result.

To define a new Functor, a subclass must define a _func method that takes a DataFrame and returns the result as a Series. In addition, it must define the following attributes:

  • _columns: The columns necessary to perform the calculation

  • name: A name appropriate for a figure axis label

  • shortname: A name appropriate for use as a dictionary key
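The subclass contract and the two-step call described above can be sketched with a toy stand-in. This is not the LSST class: ToyFunctor and FluxRatio are hypothetical names, and a plain dict of lists stands in for a DataFrame so that the sketch runs without any third-party package.

```python
class ToyFunctor:
    """Toy stand-in for the pattern; NOT lsst.pipe.tasks.functors.Functor."""

    _columns = ()        # columns necessary to perform the calculation
    name = "toy"         # name appropriate for a figure axis label
    shortname = "toy"    # name appropriate for use as a dictionary key

    def _get_data(self, table):
        # Step 1: load only the required columns (here: pick them from a dict).
        return {col: table[col] for col in self._columns}

    def _func(self, df):
        # Subclasses implement the actual calculation.
        raise NotImplementedError

    def __call__(self, table):
        # Step 2: run the calculation on the loaded columns.
        df = self._get_data(table)
        return self._func(df)


class FluxRatio(ToyFunctor):
    """Hypothetical subclass: ratio of two flux columns."""

    _columns = ("flux_a", "flux_b")
    name = "Flux ratio (a / b)"
    shortname = "fluxRatio"

    def _func(self, df):
        return [a / b for a, b in zip(df["flux_a"], df["flux_b"])]


# The "unused" column is never touched, mirroring selective column loading.
table = {"flux_a": [2.0, 9.0], "flux_b": [1.0, 3.0], "unused": [0, 0]}
ratio = FluxRatio()(table)
```

In the real class the same division of labour applies: _get_data handles the loading and _func only ever sees the columns it asked for.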

On initialization, a Functor should declare what band (filt kwarg) and dataset (e.g. 'ref', 'meas', 'forced_src') it is intended to be applied to. This enables the _get_data method to extract the proper columns from the underlying data. If the dataset is not specified, it falls back to the _defaultDataset attribute. If the band is not specified and the dataset is anything other than 'ref', an error is raised when trying to perform the calculation.
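The fallback and error rules above can be written out as a small check. This is only a sketch of the stated rules: resolve_selection is a hypothetical helper, and both the default dataset value 'ref' and the choice of ValueError are assumptions made for illustration.

```python
def resolve_selection(filt=None, dataset=None, default_dataset="ref"):
    """Hypothetical helper applying the band/dataset rules described above.

    The default of 'ref' and the ValueError type are assumptions.
    """
    # Fall back to the default dataset when none is given.
    if dataset is None:
        dataset = default_dataset
    # A band is required for every dataset except 'ref'.
    if filt is None and dataset != "ref":
        raise ValueError(
            f"A band must be specified for dataset {dataset!r}"
        )
    return filt, dataset


resolved_default = resolve_selection()
resolved_meas = resolve_selection(filt="g", dataset="meas")
```

Calling resolve_selection(dataset="meas") without a band raises, which corresponds to the error the text says appears at calculation time.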

Originally, Functor was set up to expect datasets formatted like the deepCoadd_obj dataset; that is, a DataFrame with a multi-level column index, with the levels of the column index being band, dataset, and column. It has since been generalized to apply to DataFrames without multi-level indices and multi-level indices with just dataset and column levels. In addition, the _get_data method that reads the columns from the underlying data will return a DataFrame with column index levels defined by the _dfLevels attribute; by default, this is column.

The _dfLevels attribute should generally not need to be changed, unless _func needs columns from multiple bands or datasets to do the calculation. An example of this is the Color functor, for which _dfLevels = ('band', 'column'), and _func expects the DataFrame it gets to have those levels in the column index.
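To illustrate why a calculation might need two column levels, here is a toy color computation. It is not the actual Color implementation: toy_color is a hypothetical function, the column name "flux" is assumed, and a dict keyed by (band, column) tuples stands in for a DataFrame whose column index has the levels ('band', 'column').

```python
import math


def toy_color(df, band1, band2, flux_col="flux"):
    """Hypothetical sketch: magnitude difference band1 - band2 from fluxes."""
    fluxes1 = df[(band1, flux_col)]
    fluxes2 = df[(band2, flux_col)]
    # m1 - m2 = -2.5 * log10(f1 / f2)
    return [-2.5 * math.log10(f1 / f2) for f1, f2 in zip(fluxes1, fluxes2)]


# Keys are (band, column) pairs, mimicking a two-level column index.
df = {
    ("g", "flux"): [100.0, 10.0],
    ("r", "flux"): [10.0, 10.0],
}
g_minus_r = toy_color(df, "g", "r")
```

With only a single 'column' level there would be no way to tell the g-band flux from the r-band flux, which is why such a functor keeps the band level.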

Parameters:
filt : str

Band upon which to do the calculation.

dataset : str

Dataset upon which to do the calculation (e.g., 'ref', 'meas', 'forced_src').

Attributes Summary

columns

Columns required to perform calculation.

name

Full name of functor (suitable for figure labels).

noDup

Do not explode by band if used on object table.

shortname

Short name of functor (suitable for column name/dict key).

Methods Summary

__call__(data[, dropna])

Call self as a function.

difference(data1, data2, **kwargs)

Computes difference between functor called on two different DataFrame/Handle objects.

fail(df)

multilevelColumns(data[, columnIndex, ...])

Returns columns needed by functor from multilevel dataset.

Attributes Documentation

columns

Columns required to perform calculation.

name

Full name of functor (suitable for figure labels).

noDup

Do not explode by band if used on object table.

shortname

Short name of functor (suitable for column name/dict key).

Methods Documentation

__call__(data, dropna=False)

Call self as a function.

difference(data1, data2, **kwargs)

Computes difference between functor called on two different DataFrame/Handle objects.

fail(df)

multilevelColumns(data, columnIndex=None, returnTuple=False)

Returns columns needed by functor from multilevel dataset.

To access tables with multilevel column structure, the DeferredDatasetHandle or InMemoryDatasetHandle needs to be passed either a list of tuples or a dictionary.

Parameters:
data : various

The data, as either a DeferredDatasetHandle or an InMemoryDatasetHandle.

columnIndex : pandas.Index, optional

Either passed in or read from the DeferredDatasetHandle.

returnTuple : bool

If True, return a list of tuples rather than the column dictionary specification. This is set to True by CompositeFunctor in order to be able to combine columns from the various component functors.
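The two return forms can be pictured as follows. This is a sketch under assumptions, not the method's actual code: columns_as_tuples is a hypothetical helper, the dictionary keys follow the level names mentioned earlier (band, dataset, column), and the level order passed in is chosen only for illustration.

```python
from itertools import product


def columns_as_tuples(column_dict, level_names):
    """Hypothetical helper: expand a column dictionary into a list of tuples.

    One tuple is produced per combination of the values at each level,
    in the order given by level_names.
    """
    per_level = []
    for level in level_names:
        value = column_dict[level]
        # Accept either a single string or a list of strings per level.
        per_level.append([value] if isinstance(value, str) else list(value))
    return list(product(*per_level))


# Dictionary form: one entry per column index level.
column_dict = {"band": "g", "dataset": "meas", "column": ["flux", "fluxErr"]}

# Tuple form: one fully qualified column per tuple.
tuples = columns_as_tuples(column_dict, ("band", "dataset", "column"))
```

Lists of tuples from several functors can simply be concatenated and de-duplicated, which is one plausible reason the tuple form is convenient for combining component functors.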