compute_row_group_size
- lsst.daf.butler.formatters.parquet.compute_row_group_size(schema: Schema, target_size: int = 1000000000) → int
Compute approximate row group size for a given arrow schema.
Given a schema, this routine computes the number of rows per row group so that each row group's persisted size on disk is at or below the target size. The exact size on disk depends on the compression settings and the ratios achieved; typical binary data tables see around 15-20% compression with the pyarrow default snappy compression algorithm.
Parameters
- schema
pyarrow.Schema Arrow table schema.
- target_size
int, optional The target size in bytes. Default is 1000000000 (1 GB).
Returns
- row_group_size
int Number of rows per row group to hit the target size.
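The idea described above can be illustrated with a minimal, stdlib-only sketch. It assumes each column's uncompressed width in bytes is already known and passed as a plain dict; `estimate_row_group_size`, `DEFAULT_TARGET_SIZE`, and the column names are hypothetical, and this is not the lsst.daf.butler implementation, which works from a pyarrow.Schema directly.

```python
"""Sketch: estimate rows per Parquet row group from per-row byte widths.

Assumption (hypothetical helper): column widths in bytes are supplied as a
dict rather than derived from a pyarrow.Schema. Illustrative only.
"""

DEFAULT_TARGET_SIZE = 1_000_000_000  # bytes; mirrors the documented default


def estimate_row_group_size(
    column_widths: dict[str, int], target_size: int = DEFAULT_TARGET_SIZE
) -> int:
    """Return a row count whose uncompressed size stays near target_size."""
    if target_size <= 0:
        raise ValueError("target_size must be positive")
    # Uncompressed bytes per row: the sum of the per-column widths.
    row_bytes = sum(column_widths.values())
    # Treat an empty or zero-width schema as 1 byte per row (no divide-by-zero).
    row_bytes = max(row_bytes, 1)
    # Floor division keeps the uncompressed estimate at or below the target;
    # always allow at least one row per group.
    return max(target_size // row_bytes, 1)


if __name__ == "__main__":
    # Hypothetical table: three 8-byte columns and one 4-byte column (28 bytes/row).
    widths = {"id": 8, "ra": 8, "dec": 8, "flux": 4}
    print(estimate_row_group_size(widths))
```

Because the estimate uses uncompressed widths, the compressed row group written to disk will usually come in below the target, consistent with the "(or smaller)" wording above. The resulting count could then be passed as the `row_group_size` argument of `pyarrow.parquet.write_table`, which caps the number of rows per row group.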