Functor
- class lsst.pipe.tasks.functors.Functor(filt=None, dataset=None, noDup=None)
Bases: object
Define and execute a calculation on a ParquetTable.
The `__call__` method accepts either a `ParquetTable` object or a `DeferredDatasetHandle`, and returns the result of the calculation as a single column. Each functor defines what columns are needed for the calculation, and only these columns are read from the `ParquetTable`.

The action of `__call__` consists of two steps: first, loading the necessary columns from disk into memory as a `pandas.DataFrame` object; and second, performing the computation on this dataframe and returning the result.

To define a new `Functor`, a subclass must define a `_func` method that takes a `pandas.DataFrame` and returns the result in a `pandas.Series`. In addition, it must define the following attributes:
- `_columns`: The columns necessary to perform the calculation
- `name`: A name appropriate for a figure axis label
- `shortname`: A name appropriate for use as a dictionary key
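The subclass contract and the two-step `__call__` described above can be sketched with plain pandas. This is a minimal illustration, not the LSST implementation: `SketchFunctor` is a hypothetical stand-in for the base class, an in-memory `pandas.DataFrame` stands in for the `ParquetTable`, and the `FluxRatio` functor and its column names (`psfFlux`, `modelFlux`) are invented for the example.

```python
import pandas as pd


class SketchFunctor:
    """Hypothetical stand-in for the Functor base class (not the LSST code)."""

    _columns = ()
    name = ""
    shortname = ""

    def _func(self, df):
        raise NotImplementedError

    def __call__(self, table):
        # Step 1: load only the required columns into a DataFrame.
        # Assumption: `table` is already an in-memory DataFrame, so
        # "loading" is reduced to a column selection.
        df = table[list(self._columns)]
        # Step 2: run the calculation and return a single column.
        return self._func(df)


class FluxRatio(SketchFunctor):
    """Hypothetical functor: ratio of two (invented) flux columns."""

    _columns = ("psfFlux", "modelFlux")
    name = "PSF flux / model flux"
    shortname = "fluxRatio"

    def _func(self, df):
        return df["psfFlux"] / df["modelFlux"]


table = pd.DataFrame(
    {
        "psfFlux": [10.0, 20.0, 30.0],
        "modelFlux": [5.0, 10.0, 60.0],
        "unused": [0, 1, 2],  # never read, because it is not in _columns
    }
)
ratio = FluxRatio()(table)
```

Here `ratio` is a `pandas.Series` with one value per input row, and the `unused` column is never passed to `_func`.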
On initialization, a `Functor` should declare what band (`filt` kwarg) and dataset (e.g. `'ref'`, `'meas'`, `'forced_src'`) it is intended to be applied to. This enables the `_get_data` method to extract the proper columns from the parquet file. If not specified, the dataset will fall back on the `_defaultDataset` attribute. If band is not specified and `dataset` is anything other than `'ref'`, then an error will be raised when trying to perform the calculation.

Originally, `Functor` was set up to expect datasets formatted like the `deepCoadd_obj` dataset; that is, a dataframe with a multi-level column index, with the levels of the column index being `band`, `dataset`, and `column`. It has since been generalized to apply to dataframes without multi-level indices and multi-level indices with just `dataset` and `column` levels. In addition, the `_get_data` method that reads the dataframe from the `ParquetTable` will return a dataframe with column index levels defined by the `_dfLevels` attribute; by default, this is `column`.

The `_dfLevels` attribute should generally not need to be changed, unless `_func` needs columns from multiple filters or datasets to do the calculation. An example of this is the `lsst.pipe.tasks.functors.Color` functor, for which `_dfLevels = ('band', 'column')`, and `_func` expects the dataframe it gets to have those levels in the column index.
- Parameters:
- filt : str
Filter upon which to do the calculation
- dataset : str
Dataset upon which to do the calculation (e.g., ‘ref’, ‘meas’, ‘forced_src’).
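The band and dataset selection described in the class notes above can be illustrated on a toy dataframe with a `band`/`dataset`/`column` column index. This is a sketch in plain pandas, not the behaviour of `_get_data` itself: the `select` helper is hypothetical, the column names are invented, and picking the first band when `dataset` is `'ref'` is an assumption made for the example.

```python
import pandas as pd

# Toy table with the three column-index levels named in the text.
columns = pd.MultiIndex.from_product(
    [["g", "r"], ["meas", "ref"], ["flux", "x"]],
    names=["band", "dataset", "column"],
)
data = [[float(10 * row + i) for i in range(8)] for row in range(2)]
df = pd.DataFrame(data, columns=columns)


def select(df, dataset, filt=None):
    """Hypothetical helper: reduce the table to a plain `column` index."""
    if filt is None:
        if dataset != "ref":
            raise ValueError("a band is required unless dataset is 'ref'")
        # Assumption: any band will do for reference columns; take the first.
        filt = df.columns.get_level_values("band")[0]
    return df.xs((filt, dataset), level=["band", "dataset"], axis=1)


# Default case: only the `column` level is left.
single_band = select(df, "meas", filt="g")

# Color-like case: keep ('band', 'column') so two bands can be combined.
by_band = df.xs("meas", level="dataset", axis=1)
color_like = by_band[("g", "flux")] - by_band[("r", "flux")]
```

The second selection mirrors the `_dfLevels = ('band', 'column')` case: the calculation receives a dataframe whose column index still carries the band level.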
Attributes Summary
- columns: Columns required to perform calculation
- name: Full name of functor (suitable for figure labels)
- shortname: Short name of functor (suitable for column name/dict key)
Methods Summary
- __call__(data[, dropna]): Call self as a function.
- difference(data1, data2, **kwargs): Computes difference between functor called on two different ParquetTable objects
- fail(df)
- multilevelColumns(data[, columnIndex, ...]): Returns columns needed by functor from multilevel dataset
Attributes Documentation
- columns
Columns required to perform calculation
- name
Full name of functor (suitable for figure labels)
- noDup
- shortname
Short name of functor (suitable for column name/dict key)
Methods Documentation
- __call__(data, dropna=False)
Call self as a function.
- difference(data1, data2, **kwargs)
Computes difference between functor called on two different ParquetTable objects
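A rough sketch of what such a difference could look like, assuming it amounts to calling the functor on each table and subtracting the two resulting columns. The `flux_functor` and `difference` functions below are hypothetical stand-ins written for this example, and plain dataframes stand in for the `ParquetTable` objects.

```python
import pandas as pd


def flux_functor(table):
    # Hypothetical stand-in for a functor call: returns one column as a Series.
    return table["flux"]


def difference(functor, data1, data2, **kwargs):
    # Assumed behaviour: evaluate on each table, then subtract the results.
    return functor(data1, **kwargs) - functor(data2, **kwargs)


table1 = pd.DataFrame({"flux": [10.0, 20.0, 30.0]}, index=[1, 2, 3])
table2 = pd.DataFrame({"flux": [1.0, 2.0, 3.0]}, index=[3, 2, 1])

# pandas aligns on the index, so rows are matched by label, not by position.
diff = difference(flux_functor, table1, table2)
```

Because the subtraction is label-aligned, the two tables do not need to list their rows in the same order, but they do need a shared index for the result to be meaningful.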
- fail(df)
- multilevelColumns(data, columnIndex=None, returnTuple=False)
Returns columns needed by functor from multilevel dataset
To access tables with multilevel column structure, the `MultilevelParquetTable` or `DeferredDatasetHandle` needs to be passed either a list of tuples or a dictionary.
- Parameters:
- data : `MultilevelParquetTable` or `DeferredDatasetHandle`
- columnIndex (optional) : pandas `Index` object
Either passed or read in from `DeferredDatasetHandle`.
- returnTuple : bool
If true, then return a list of tuples rather than the column dictionary specification. This is set to `True` by `CompositeFunctor` in order to be able to combine columns from the various component functors.
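The two column specifications mentioned above, a dictionary and a list of tuples, can be related as follows. This is a sketch under stated assumptions rather than the method's actual output: the dictionary layout (one entry per column-index level), the `cols_from_dict` helper, and the column names are all hypothetical.

```python
from itertools import product

import pandas as pd


def cols_from_dict(col_dict, level_names=("band", "dataset", "column")):
    """Hypothetical helper: expand a column dictionary into a list of tuples."""
    per_level = []
    for level in level_names:
        value = col_dict[level]
        # Allow a single string as shorthand for a one-element list.
        per_level.append([value] if isinstance(value, str) else list(value))
    return list(product(*per_level))


# Assumed dictionary form: one entry per level of the column index.
column_dict = {"band": "g", "dataset": "meas", "column": ["flux", "x"]}
column_tuples = cols_from_dict(column_dict)

columns = pd.MultiIndex.from_product(
    [["g", "r"], ["meas", "ref"], ["flux", "x"]],
    names=["band", "dataset", "column"],
)
df = pd.DataFrame([list(range(8))], columns=columns)

# A list of tuples selects exactly those columns from a multilevel index.
subset = df[column_tuples]
```

The tuple form is convenient when column requests from several functors have to be merged into one list, which is the use case the `returnTuple` description gives for `CompositeFunctor`.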