compute_row_group_size

lsst.daf.butler.formatters.parquet.compute_row_group_size(schema: Schema, target_size: int = 1000000000) → int

Compute approximate row group size for a given arrow schema.

Given a schema, this routine computes the number of rows per row group such that each row group's persisted size on disk is at or below the target size. The exact size on disk depends on the compression settings and the ratios achieved; typical binary data tables see around 15-20% compression with pyarrow's default snappy compression algorithm.

Parameters

schema : pyarrow.Schema

Arrow table schema.

target_size : int, optional

The target size (in bytes).

Returns

row_group_size : int

Number of rows per row group to hit the target size.
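
Examples

The description above does not spell out how the row count is derived, so the following is only a minimal sketch of one plausible approach, not the library's implementation. It assumes the estimate is the target size divided by the uncompressed width of one row. The helper name `estimate_row_group_size` and the column widths are hypothetical. With pyarrow, the width of a fixed-width column type could be read from its `bit_width` attribute; plain integers are used here to keep the sketch self-contained.

```python
def estimate_row_group_size(column_bit_widths, target_size=1_000_000_000):
    """Hypothetical helper: estimate rows per row group for a target byte size.

    column_bit_widths maps column name -> uncompressed width in bits
    (an assumption for illustration; variable-width columns such as
    strings would need an assumed average length).
    """
    # Uncompressed width of a single row, in bits.
    row_bits = sum(column_bit_widths.values())
    # Round up to whole bytes and guard against a zero-width row.
    row_bytes = max(1, (row_bits + 7) // 8)
    # Integer division keeps the uncompressed row group at or below the target.
    return max(1, target_size // row_bytes)


# Illustrative table: one int64 id, two float64 coordinates, one float32 flux.
widths = {"id": 64, "ra": 64, "dec": 64, "flux": 32}
rows = estimate_row_group_size(widths)
print(rows)  # 28 bytes per row -> 35714285 rows for a 1 GB target
```

Because this estimate uses uncompressed widths, any compression applied at write time would make the persisted row group smaller than the target, which matches the "or smaller" behavior described above.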