Columnstore Sizing Estimations

There are two methods for estimating the total storage used by a columnstore table. The first method is quick, while the second is more precise. For more information about how columnstore works, see Columnstore and the topics nested under it.

Simple Method

For the simple method, start by calculating the average row size for your table: add together the size of each column's data type, as listed in Data Types. Multiply the average row size by the expected number of rows in your table to get the estimated uncompressed size.
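
For example, a hypothetical table with a BIGINT column (8 bytes), an INT column (4 bytes), a DATETIME column (8 bytes), and a VARCHAR column averaging roughly 100 bytes of data has an average row size of about 8 + 4 + 8 + 100 = 120 bytes. At an expected 500 million rows, the estimated uncompressed size is 120 bytes × 500,000,000 = ~60 GB.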

Next, estimate the compression ratio for your dataset. Columnstore compression typically reduces data size by 3-7x. Dividing the estimated uncompressed size by this ratio yields a rough estimate of the storage requirements for a columnstore table.
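
Continuing the hypothetical example above: at a mid-range 5x compression ratio, the ~60 GB uncompressed estimate shrinks to roughly 60 GB / 5 = 12 GB of storage.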

Complex Method

As with the simple method, start by calculating the average row size for your table: add together the size of each column's data type, as listed in Data Types. Multiply the average row size by the expected number of rows in your table to get the estimated uncompressed size.

Then, to get a more accurate estimate than the base calculation demonstrated above, a more precise compression ratio is needed. The steps below accomplish this by loading a large amount of representative data, then measuring the actual compression ratios achieved, both by table and by column.

  1. Load a representative sample of data into a columnstore table. More data leads to more accurate measurements.

    1. For information about creating a columnstore table, see Creating a Columnstore Table.

    2. To see options and procedures for loading data, see Load Data and the topics nested below it.
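
    3. For example, here is a minimal sketch of both steps; the table name, schema, and file path are hypothetical:

        -- Columnstore is the default table type; SORT KEY makes it explicit.
        CREATE TABLE events (
            id BIGINT,
            user_id INT,
            event_time DATETIME,
            payload VARCHAR(200),
            SORT KEY (event_time)
        );

        -- Load a representative sample, for example from a CSV file:
        LOAD DATA INFILE '/path/to/sample.csv'
        INTO TABLE events
        FIELDS TERMINATED BY ',';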

  2. Run OPTIMIZE TABLE on the table to ensure the data is optimized and flushed to disk. Use caution when running this command, as it can impact the performance of all other workloads running in the workspace. For more information, see OPTIMIZE TABLE.
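
    For example, with the hypothetical events table from step 1:

        OPTIMIZE TABLE events;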

  3. Measure the compression ratios using SQL queries as outlined below.

    1. By querying COLUMNAR_SEGMENTS.

      1. Space savings per table, computed as 1 - (compressed / uncompressed); higher values indicate better compression:

        SELECT
            database_name,
            table_name,
            (1 - SUM(compressed_size) / SUM(uncompressed_size)) AS ratio
        FROM information_schema.columnar_segments
        GROUP BY database_name, table_name
        ORDER BY ratio DESC;
      2. Space savings per column, computed as 1 - (compressed / uncompressed):

        SELECT
            database_name,
            table_name,
            column_name,
            (1 - SUM(compressed_size) / SUM(uncompressed_size)) AS ratio
        FROM information_schema.columnar_segments
        GROUP BY database_name, table_name, column_name
        ORDER BY ratio DESC;
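
Once the measured ratio is available, apply it to the projected full-size dataset. The following sketch scales a sample up to the expected production row count; the database name, table name, and row counts are hypothetical:

-- Assumes a 10-million-row sample and a projected 1 billion rows.
SELECT
    SUM(compressed_size) / 10000000 AS avg_compressed_bytes_per_row,
    (SUM(compressed_size) / 10000000) * 1000000000 AS projected_compressed_bytes
FROM information_schema.columnar_segments
WHERE database_name = 'mydb' AND table_name = 'events';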

Additional Methods to Obtain Columnstore Table Statistics

The following query returns the total disk space used by columnstore secondary indexes:

SELECT database_name, table_name, SUM(size) AS total_index_size
FROM information_schema.MV_COLUMNSTORE_FILES
WHERE file_type IN ('INVERTED_INDEX', 'CROSS_SEGMENT_HASH_INDEX')
GROUP BY 1, 2;

For more information about the column descriptions, see MV_COLUMNSTORE_FILES.

Additional Sizing Estimations for Columnstore Table Overhead

Beyond the variable size estimates explained above, columnstore tables also require space for fixed metadata overhead:

  • ~100 bytes of memory per segment per column.

  • ~100 KB for each segment that has at least one deleted row.

  • 7 KB per column per partition for incremental cardinality statistics.
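
For example, a hypothetical table with 20 columns spread across 1,000 segments and 16 partitions would carry roughly 100 bytes × 1,000 × 20 = ~2 MB of segment metadata, plus 7 KB × 20 × 16 = ~2.2 MB for cardinality statistics, before accounting for any deleted-row overhead.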
