compute_row_group_size

lsst.daf.butler.formatters.parquet.compute_row_group_size(schema: Schema, target_size: int = 1000000000) -> int

Compute approximate row group size for a given arrow schema.

Given a schema, this routine computes the number of rows per row group such that each row group's persisted size on disk is at or below the target size. The exact size on disk depends on the compression settings and the achieved compression ratio; typical binary data tables achieve around 15-20% compression with pyarrow's default Snappy compression algorithm.

Parameters:
schema : pyarrow.Schema

Arrow table schema.

target_size : int, optional

The target size of each row group on disk, in bytes. Defaults to 1000000000 (1 GB).

Returns:
row_group_size : int

Number of rows per row group needed to stay at or below the target size.