Columnstore Sizing Estimations
On this page
There are two possible methods for estimating the total memory used by a columnstore table.
Simple Method
For the simple method, start with calculating the average row size for your table by adding together the size of each column's data type from Data Types.
Next, estimate the compression ratio for your dataset.
Complex Method
In the same way as the simple method, start with calculating the average row size for your table by adding together the size of each column's data type from Data Types.
Then, to get an estimate that is more accurate than the base calculation demonstrated above, a more precise compression ratio is needed.
-
Load a representative sample of data into a columnstore table.
More data leads to more accurate measurements. -
For information about creating a columnstore table, see Creating a Columnstore Table.
-
To see options and procedures for loading data, see Load Data and the topics nested below it.
-
-
Run OPTIMIZE TABLE on the table in order to ensure the data is optimized and on disk.
Use caution when running this command, as it can impact the performance of all other workloads running in the cluster. For more information, see the linked topic. -
Measure the compression ratios using either SingleStore Studio, or SQL queries as outlined below.
-
Using Studio:
-
Navigate to the Databases page.
-
Select the database containing the table in question.
-
Compression ratios are displayed at the table level.
-
Select a table to see compression ratios per column.
-
-
By querying COLUMNAR_
SEGMENTS. -
Compression ratio per table (compressed / uncompressed):
SELECTdatabase_name,table_name,(1-SUM(compressed_size)/SUM(uncompressed_size)) AS ratioFROM information_schema.columnar_segmentsGROUP BY database_name, table_nameORDER BY ratio DESC; -
Compression ratio per column (compressed / uncompressed):
SELECTdatabase_name,table_name,column_name,(1-SUM(compressed_size)/SUM(uncompressed_size)) AS ratioFROM information_schema.columnar_segmentsGROUP BY database_name, table_name, column_nameORDER BY ratio DESC;
-
-
Additional Methods to Obtain Columnstore Table Statistics
The following query can be used to obtain the total disk space used by columnstore secondary indexes:
SELECT database_name, table_name, SUM(size) AS total_index_sizeFROM information_schema.MV_COLUMNSTORE_FILESWHERE file_type = "INVERTED_INDEX" OR file_type = "CROSS_SEGMENT_HASH_INDEX" GROUP BY 1,2;
For more information about the column descriptions, see MV_
Additional Sizing Estimations for Columnstore Table Overhead
Beyond the variable size estimates explained above, columnstore tables also require space for a constant metadata overhead of:
-
~100 bytes of memory per segment per column.
-
~100 KB for each segment that has at least one deleted row.
-
7KB per column per partition for incremental cardinality statistics.
Last modified: March 7, 2024