Functor
- class lsst.pipe.tasks.functors.Functor(filt=None, dataset=None, noDup=None)
Bases: object
Define and execute a calculation on a ParquetTable.
The __call__ method accepts either a ParquetTable object or a DeferredDatasetHandle, and returns the result of the calculation as a single column. Each functor defines what columns are needed for the calculation, and only these columns are read from the ParquetTable.
The action of __call__ consists of two steps: first, loading the necessary columns from disk into memory as a pandas.DataFrame object; and second, performing the computation on this dataframe and returning the result.
To define a new Functor, a subclass must define a _func method that takes a pandas.DataFrame and returns the result in a pandas.Series. In addition, it must define the following attributes:
- _columns: The columns necessary to perform the calculation
- name: A name appropriate for a figure axis label
- shortname: A name appropriate for use as a dictionary key
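The subclassing pattern described above can be sketched without the LSST stack. This is a minimal illustration, not the library's implementation: `SketchFunctor` is a hypothetical stand-in for the real base class, `FluxRatio` and its column names are invented for the example, and the "load from disk" step is replaced by selecting columns from an in-memory dataframe.

```python
import pandas as pd


class SketchFunctor:
    """Hypothetical stand-in for the Functor base class (not the LSST code)."""

    _columns = ()      # columns needed by the calculation
    name = ""          # label suitable for a figure axis
    shortname = ""     # label suitable for a dictionary key

    @property
    def columns(self):
        return list(self._columns)

    def _func(self, df):
        raise NotImplementedError("subclasses must implement _func")

    def __call__(self, data, dropna=False):
        # Step 1: keep only the required columns. The real class reads
        # them from a ParquetTable; here we slice an in-memory dataframe.
        df = data[self.columns]
        # Step 2: run the calculation and return a single column.
        result = self._func(df)
        return result.dropna() if dropna else result


class FluxRatio(SketchFunctor):
    """Invented example functor: ratio of two flux columns."""

    _columns = ("flux_a", "flux_b")
    name = "Flux ratio (a / b)"
    shortname = "fluxRatio"

    def _func(self, df):
        return df["flux_a"] / df["flux_b"]


table = pd.DataFrame(
    {"flux_a": [2.0, 9.0], "flux_b": [1.0, 3.0], "unused": [0, 0]}
)
ratio = FluxRatio()(table)
```

The `unused` column never reaches `_func`, which mirrors the point that only the declared columns are loaded.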
On initialization, a Functor should declare what band (filt kwarg) and dataset (e.g. 'ref', 'meas', 'forced_src') it is intended to be applied to. This enables the _get_data method to extract the proper columns from the parquet file. If not specified, the dataset will fall back on the _defaultDataset attribute. If band is not specified and dataset is anything other than 'ref', then an error will be raised when trying to perform the calculation.
Originally, Functor was set up to expect datasets formatted like the deepCoadd_obj dataset; that is, a dataframe with a multi-level column index, with the levels of the column index being band, dataset, and column. It has since been generalized to apply to dataframes without multi-level indices and to multi-level indices with just dataset and column levels. In addition, the _get_data method that reads the dataframe from the ParquetTable will return a dataframe with column index levels defined by the _dfLevels attribute; by default, this is column.
The _dfLevels attribute should generally not need to be changed, unless _func needs columns from multiple filters or datasets to do the calculation. An example of this is the lsst.pipe.tasks.functors.Color functor, for which _dfLevels = ('band', 'column'), and _func expects the dataframe it gets to have those levels in the column index.
Parameters:
- filt: str
Filter upon which to do the calculation
- dataset: str
Dataset upon which to do the calculation (e.g., ‘ref’, ‘meas’, ‘forced_src’).
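To make the multi-level column index concrete, the following sketch builds a small dataframe with dataset, band, and column levels and reduces it to the ('band', 'column') levels that a Color-like `_func` would expect. The level order, the `flux` column name, and the `color_sketch` helper are assumptions made for illustration; they are not taken from the LSST code.

```python
import numpy as np
import pandas as pd

# Assumed layout: three column levels named dataset, band, column.
columns = pd.MultiIndex.from_tuples(
    [
        ("meas", "g", "flux"),
        ("meas", "r", "flux"),
        ("ref", "g", "flux"),
        ("ref", "r", "flux"),
    ],
    names=("dataset", "band", "column"),
)
df = pd.DataFrame(
    [[100.0, 1000.0, 1.0, 1.0], [10.0, 10.0, 1.0, 1.0]],
    columns=columns,
)

# Select one dataset; the remaining column levels are (band, column).
meas = df.xs("meas", axis=1, level="dataset")


def color_sketch(frame, band1, band2, flux_col="flux"):
    """Hypothetical colour: magnitude difference between two bands."""
    mag1 = -2.5 * np.log10(frame[band1][flux_col])
    mag2 = -2.5 * np.log10(frame[band2][flux_col])
    return mag1 - mag2


g_minus_r = color_sketch(meas, "g", "r")
```

Because the calculation needs two bands at once, the band level has to survive in the dataframe handed to the function, which is the situation the `_dfLevels` attribute is described as handling.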
Attributes Summary
columns: Columns required to perform calculation
name: Full name of functor (suitable for figure labels)
shortname: Short name of functor (suitable for column name/dict key)
Methods Summary
__call__(data[, dropna]): Call self as a function.
difference(data1, data2, **kwargs): Computes difference between functor called on two different ParquetTable objects
fail(df)
multilevelColumns(data[, columnIndex, ...]): Returns columns needed by functor from multilevel dataset
Attributes Documentation
- columns
Columns required to perform calculation
- name
Full name of functor (suitable for figure labels)
- noDup
- shortname
Short name of functor (suitable for column name/dict key)
Methods Documentation
- __call__(data, dropna=False)
Call self as a function.
- difference(data1, data2, **kwargs)
Computes the difference between the results of calling the functor on two different ParquetTable objects
- fail(df)
- multilevelColumns(data, columnIndex=None, returnTuple=False)
Returns columns needed by functor from multilevel dataset
To access tables with multilevel column structure, the MultilevelParquetTable or DeferredDatasetHandle needs to be passed either a list of tuples or a dictionary.
Parameters:
- data: MultilevelParquetTable or DeferredDatasetHandle
- columnIndex (optional): pandas `Index` object
Either passed or read in from the DeferredDatasetHandle.
- returnTuple: bool
If true, then return a list of tuples rather than the column dictionary specification. This is set to True by CompositeFunctor in order to be able to combine columns from the various component functors.
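The two column specifications mentioned for multilevelColumns, a dictionary and a list of tuples, can be related with a short sketch. The dictionary layout (one entry per level, each holding a string or a list of strings), the level order, and the `spec_to_tuples` helper are assumptions for illustration only; the documentation above does not define them.

```python
from itertools import product


def spec_to_tuples(column_spec, levels=("dataset", "band", "column")):
    """Expand an assumed per-level dictionary into a list of column tuples."""
    per_level = []
    for level in levels:
        value = column_spec[level]
        # Accept either a single string or a list of strings per level.
        per_level.append([value] if isinstance(value, str) else list(value))
    # Every combination of level values names one column.
    return list(product(*per_level))


spec = {"dataset": "meas", "band": ["g", "r"], "column": ["flux", "fluxErr"]}
tuples = spec_to_tuples(spec)
```

A flat list of tuples is easy to concatenate across several functors, which is consistent with the stated reason that CompositeFunctor sets returnTuple to True.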