DataFrame
The DataFrame storage class corresponds to the pandas.DataFrame class in Python.
It includes special support for dealing with multi-level indexes (i.e. pandas.MultiIndex) in columns.
Components
The DataFrame storage class has a single component, columns, which contains a description of the columns as a pandas.Index (often pandas.MultiIndex) instance.
Parameters
The DataFrame storage class supports a single parameter for partial reads, with the key columns.
For single-level columns, this should be a single column name (str) or a list of column names.
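The two accepted forms can be illustrated on an in-memory table. This is a minimal sketch in plain pandas, not the storage class's actual read path: select_columns is a hypothetical helper, the column names ra, dec, and flux are made up, and it assumes that a single name yields a one-column table rather than a pandas.Series.

```python
import pandas as pd


def select_columns(df: pd.DataFrame, columns) -> pd.DataFrame:
    """Hypothetical helper mimicking the ``columns`` parameter on an in-memory table."""
    # Treat a bare string as shorthand for a one-element list (an assumption).
    names = [columns] if isinstance(columns, str) else list(columns)
    return df[names]


# Illustrative single-level table; the column names are invented for this sketch.
table = pd.DataFrame({"ra": [1.0, 2.0], "dec": [3.0, 4.0], "flux": [5.0, 6.0]})

one = select_columns(table, "ra")           # single column name
two = select_columns(table, ["ra", "dec"])  # list of column names
```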
For multi-level index (pandas.MultiIndex) columns, this should be a dictionary whose keys are the names of the levels, and whose values are column names (str) or lists thereof.
The loaded columns are the Cartesian product of the values given for each level.
Levels omitted from the dict are loaded in their entirety.
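The product rule can be sketched with plain pandas and itertools. The helper expand_columns below is hypothetical and only illustrates the semantics described here; it is not the code the storage class runs, and the level values are chosen purely for illustration.

```python
import itertools

import pandas as pd


def expand_columns(index: pd.MultiIndex, spec: dict) -> list:
    """Hypothetical helper: list the column tuples a per-level dict selects."""
    per_level = []
    for i, name in enumerate(index.names):
        if name in spec:
            values = spec[name]
            # A bare string stands for a single value at this level.
            per_level.append([values] if isinstance(values, str) else list(values))
        else:
            # Levels missing from the dict contribute all of their values.
            per_level.append(list(index.levels[i]))
    present = set(index)
    # Keep only the combinations that actually exist as columns.
    return [t for t in itertools.product(*per_level) if t in present]


index = pd.MultiIndex.from_product(
    [["meas", "ref"], ["HSC-R", "HSC-I"], ["base_SdssShape_xx", "base_SdssShape_yy"]],
    names=["dataset", "filter", "column"],
)

# "filter" is omitted, so both filters are selected.
selected = expand_columns(index, {"dataset": "meas", "column": ["base_SdssShape_xx"]})
```

Here selected holds two tuples, one per filter, each with dataset "meas" and column "base_SdssShape_xx".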
For example, the deepCoadd_obj dataset is typically defined as a hierarchical table with levels dataset, filter, and column, which take values such as ("meas", "HSC-R", "base_SdssShape_xx").
Retrieving this dataset via:
butler.get(
    "deepCoadd_obj", ...,
    parameters={
        "columns": {"dataset": "meas",
                    "filter": ["HSC-R", "HSC-I"],
                    "column": ["base_SdssShape_xx", "base_SdssShape_yy"]}
    }
)
is equivalent to (but potentially much more efficient than):
full = butler.get("deepCoadd_obj", ...)
full.loc[:, ("meas", ["HSC-R", "HSC-I"],
             ["base_SdssShape_xx", "base_SdssShape_yy"])]
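The in-memory selection can be tried without a butler on a small synthetic table. The sketch below assumes nothing about the real deepCoadd_obj contents beyond the three level names; the "ref" dataset, the extra base_SdssShape_xy column, and the numeric values are placeholders. pandas selects across the levels of a MultiIndex with a tuple holding one entry per level.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the full table: 2 datasets x 2 filters x 3 columns.
columns = pd.MultiIndex.from_product(
    [
        ["meas", "ref"],
        ["HSC-I", "HSC-R"],
        ["base_SdssShape_xx", "base_SdssShape_xy", "base_SdssShape_yy"],
    ],
    names=["dataset", "filter", "column"],
)
full = pd.DataFrame(
    np.arange(36, dtype=float).reshape(3, 12), columns=columns
).sort_index(axis=1)

# One tuple entry per level: a scalar for "dataset", lists for "filter" and "column".
subset = full.loc[
    :, ("meas", ["HSC-R", "HSC-I"], ["base_SdssShape_xx", "base_SdssShape_yy"])
]
```

The result keeps all three rows and the four columns formed by the two filters and two shape columns under "meas".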