ParquetTable

class lsst.pipe.tasks.parquetTable.ParquetTable(filename=None, dataFrame=None)

Bases: object

Thin wrapper around pyarrow’s ParquetFile object.

Call the toDataFrame method to get a pandas.DataFrame object, optionally passing the specific columns to load.

The main purpose of having this wrapper rather than directly using pyarrow.ParquetFile is to make it nicer to load selected subsets of columns, especially from dataframes with multi-level column indices.

Instantiate with either a path to a Parquet file or a pandas DataFrame.
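To make the multi-level case concrete, here is a minimal pandas-only sketch of selecting a subset of columns from a DataFrame whose columns form a two-level index. The (band, quantity) layout and the values are hypothetical, and the snippet does not use ParquetTable itself; it only illustrates the kind of selection the wrapper is meant to simplify.

```python
import pandas as pd

# Hypothetical two-level column index: (band, quantity).
columns = pd.MultiIndex.from_product(
    [["g", "r"], ["flux", "fluxErr"]], names=["band", "quantity"]
)
df = pd.DataFrame(
    [[1.0, 0.1, 2.0, 0.2], [3.0, 0.3, 4.0, 0.4]], columns=columns
)

# With a multi-level index, each column is addressed by a tuple,
# one entry per level.
wanted = [("g", "flux"), ("r", "flux")]
subset = df[wanted]
```

Each requested column has to be spelled out as a full tuple here, which becomes tedious once the index has many levels or many values per level.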

Parameters:
filename : str, optional

Path to Parquet file.

dataFrame : pandas.DataFrame, optional

DataFrame to wrap, as an alternative to a Parquet file.

Attributes Summary

columnIndex

Columns as a pandas Index

columns

List of column names (or column index if df is set)

pandasMd

Methods Summary

toDataFrame([columns])

Get table (or specified columns) as a pandas DataFrame

write(filename)

Write pandas dataframe to parquet

Attributes Documentation

columnIndex

Columns as a pandas Index

columns

List of column names (or column index if df is set)

This may either be a list of column names, or a pandas.Index object describing the column index, depending on whether the ParquetTable object is wrapping a ParquetFile or a DataFrame.

pandasMd

Methods Documentation

toDataFrame(columns=None)

Get table (or specified columns) as a pandas DataFrame

Parameters:
columns : list, optional

Desired columns. If None, then all columns will be returned.

write(filename)

Write pandas dataframe to parquet

Parameters:
filename : str

Path to which to write.