ParquetTable
- class lsst.pipe.tasks.parquetTable.ParquetTable(filename=None, dataFrame=None)
Bases: object
Thin wrapper to pyarrow’s ParquetFile object.
Call the toDataFrame method to get a pandas.DataFrame object, optionally passing specific columns.
The main purpose of having this wrapper rather than directly using pyarrow.ParquetFile is to make it easier to load selected subsets of columns, especially from dataframes with multi-level column indices.
Instantiated with either a path to a Parquet file or a dataFrame.
- Parameters:
- filename : str, optional
Path to Parquet file.
- dataFrame : dataFrame, optional
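The DataFrame-backed mode can be illustrated without the LSST stack. The sketch below uses a hypothetical stand-in class, `DataFrameTableSketch`, that mimics only the documented interface (`columns`, `toDataFrame`). It is not the LSST implementation, and the band and column names are invented for illustration.

```python
import pandas as pd


class DataFrameTableSketch:
    """Hypothetical stand-in mimicking the documented DataFrame-backed interface."""

    def __init__(self, dataFrame):
        self._df = dataFrame

    @property
    def columns(self):
        # For a wrapped DataFrame, expose its column index.
        return self._df.columns

    def toDataFrame(self, columns=None):
        # None means "all columns"; otherwise select the requested subset.
        if columns is None:
            return self._df
        return self._df[columns]


# Sample data with a two-level column index (names are illustrative only).
col_index = pd.MultiIndex.from_tuples(
    [("g", "flux"), ("g", "fluxErr"), ("r", "flux"), ("r", "fluxErr")],
    names=["band", "column"],
)
df = pd.DataFrame([[1.0, 0.1, 2.0, 0.2], [3.0, 0.3, 4.0, 0.4]], columns=col_index)

table = DataFrameTableSketch(df)
everything = table.toDataFrame()
# With multi-level columns, each requested column is a tuple of level values.
fluxes = table.toDataFrame(columns=[("g", "flux"), ("r", "flux")])
```

With the real class, the equivalent calls would presumably be `ParquetTable(dataFrame=df)` followed by `toDataFrame(columns=...)`, per the signature above.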
Attributes Summary
- columnIndex: Columns as a pandas Index
- columns: List of column names (or column index if df is set)
Methods Summary
- toDataFrame([columns]): Get table (or specified columns) as a pandas DataFrame
- write(filename): Write pandas dataframe to parquet
Attributes Documentation
- columnIndex
Columns as a pandas Index
- columns
List of column names (or column index if df is set)
This may either be a list of column names, or a pandas.Index object describing the column index, depending on whether the ParquetTable object is wrapping a ParquetFile or a DataFrame.
- pandasMd
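Because `columns` may be either a plain list or a `pandas.Index`, calling code that needs one uniform type has to normalize the value. A minimal sketch, assuming a hypothetical helper `column_names` that is not part of the LSST API (the column names are placeholders):

```python
import pandas as pd


def column_names(columns):
    """Return column labels as a plain list, given either a list or a pandas Index."""
    if isinstance(columns, pd.Index):
        # A MultiIndex yields one tuple per column; a flat Index yields scalars.
        return columns.tolist()
    return list(columns)


# A list, as a file-backed table would report it.
from_file = column_names(["coord_ra", "coord_dec"])
# A pandas Index, as a DataFrame-backed table would report it.
from_frame = column_names(pd.Index(["coord_ra", "coord_dec"]))
```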
Methods Documentation
- toDataFrame(columns=None)
Get table (or specified columns) as a pandas DataFrame
- Parameters:
- columns : list, optional
Desired columns. If None, then all columns will be returned.
- write(filename)
Write pandas dataframe to parquet
- Parameters:
- filename : str
Path to which to write.