Columnstore Sizing Estimations
There are two methods for estimating the total size of a columnstore table: a simple estimate, and a more precise measurement based on a sample of your data.
For the simple method, start by calculating the average row size for your table: add together the size of each column's data type, as listed in Data Types. Next, estimate the compression ratio for your dataset and apply it to the total uncompressed size (average row size multiplied by the number of rows) to estimate the table's size on disk.
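The simple method can be sketched as follows. The table schema, row count, and 3x compression ratio below are illustrative assumptions, not measurements:

```python
# Simple-method sizing sketch. All figures are hypothetical examples.

# Per-column data type sizes in bytes, taken from the Data Types reference
# for an imaginary table (BIGINT, DOUBLE, and DATETIME are 8 bytes each).
column_sizes = {"id": 8, "amount": 8, "created": 8}

avg_row_size = sum(column_sizes.values())  # 24 bytes per row
row_count = 1_000_000_000                  # assumed table size
compression_ratio = 3.0                    # assumed; measure for real data

uncompressed_bytes = avg_row_size * row_count
estimated_on_disk = uncompressed_bytes / compression_ratio

print(f"Uncompressed: {uncompressed_bytes / 1e9:.1f} GB")
print(f"Estimated on disk: {estimated_on_disk / 1e9:.1f} GB")
```

The result is only as good as the assumed compression ratio, which is why the second method measures the ratio from real data.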
For the more precise method, start in the same way: calculate the average row size for your table by adding together the size of each column's data type from Data Types. Then, to improve on the base calculation above, measure a more accurate compression ratio from your actual data:
1. Load a representative sample of data into a columnstore table. More data leads to more accurate measurements.
2. Run OPTIMIZE TABLE on the table to ensure the data is optimized and flushed to disk. Use caution when running this command, as it can impact the performance of all other workloads running in the workspace. For more information, see the linked topic.
3. Measure the compression ratios using the SQL queries outlined below.
By querying COLUMNAR_SEGMENTS, you can measure the compressed and uncompressed sizes of your data directly.
Compression savings ratio per table (1 - compressed/uncompressed):

```sql
SELECT
    database_name,
    table_name,
    (1 - SUM(compressed_size) / SUM(uncompressed_size)) AS ratio
FROM information_schema.columnar_segments
GROUP BY database_name, table_name
ORDER BY ratio DESC;
```
Compression savings ratio per column (1 - compressed/uncompressed):

```sql
SELECT
    database_name,
    table_name,
    column_name,
    (1 - SUM(compressed_size) / SUM(uncompressed_size)) AS ratio
FROM information_schema.columnar_segments
GROUP BY database_name, table_name, column_name
ORDER BY ratio DESC;
```
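Once measured, the sample's compressed footprint can be scaled up to project the full table's size. The sample sizes and row counts below are hypothetical; in practice the byte figures come from the columnar_segments queries above:

```python
# Project full-table disk usage from a measured sample.
# All figures are hypothetical examples.

sample_uncompressed = 2_000_000_000  # bytes of raw sample data loaded
sample_compressed = 500_000_000      # bytes on disk after OPTIMIZE TABLE

# The queries report savings as 1 - compressed/uncompressed.
savings = 1 - sample_compressed / sample_uncompressed  # 0.75

full_row_count = 10_000_000_000      # expected rows in production (assumed)
sample_row_count = 100_000_000       # rows in the loaded sample (assumed)

# Scale the sample's compressed footprint up to the full dataset.
projected_bytes = sample_compressed * (full_row_count / sample_row_count)
print(f"Savings: {savings:.0%}, projected size: {projected_bytes / 1e9:.0f} GB")
```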
The following query can be used to obtain the total disk space used by columnstore secondary indexes:
```sql
SELECT database_name, table_name, SUM(size) AS total_index_size
FROM information_schema.MV_COLUMNSTORE_FILES
WHERE file_type = 'INVERTED_INDEX'
   OR file_type = 'CROSS_SEGMENT_HASH_INDEX'
GROUP BY 1, 2;
```
For more information about the column descriptions, see MV_COLUMNSTORE_FILES.
Beyond the variable size estimates explained above, columnstore tables also require space for a constant metadata overhead of:
- ~100 bytes of memory per segment per column.
- ~100 KB for each segment that has at least one deleted row.
- 7 KB per column per partition for incremental cardinality statistics.
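The constant overhead can be tallied from segment, column, and partition counts. The counts below are illustrative assumptions, not measurements:

```python
# Metadata overhead sketch using the constants listed above.
# Segment, column, and partition counts are hypothetical examples.

columns = 20
partitions = 16
segments = 1_000               # total segments across all partitions
segments_with_deletes = 50     # segments with at least one deleted row

per_segment_column = 100 * segments * columns            # ~100 B per segment per column
delete_overhead = 100 * 1024 * segments_with_deletes     # ~100 KB per such segment
cardinality_stats = 7 * 1024 * columns * partitions      # 7 KB per column per partition

total = per_segment_column + delete_overhead + cardinality_stats
print(f"Estimated metadata overhead: {total / 1e6:.1f} MB")
```

For most tables this overhead is small relative to the data itself, but it can matter for very wide tables with many partitions.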
Last modified: April 17, 2023