Columnstore Sizing Estimations

There are two possible methods for estimating the total memory used by a columnstore table. The first method is brief, while the second is more precise. For more information about how columnstore works, see Columnstore, and the other topics nested under it.

Simple Method

For the simple method, start by calculating the average row size for your table: add together the size of each column's data type, as listed in Data Types. Multiply the average row size by the expected number of rows in your table to get the estimated uncompressed size.

Next, estimate the compression ratio for your dataset. Columnstore compression typically shrinks data by 3-7x. Dividing the estimated uncompressed size by this ratio gives a very rough estimate of the storage required for a columnstore table.
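The simple method reduces to a few lines of arithmetic. The following sketch illustrates it; the row size, row count, and compression ratio are example assumptions, not recommendations:

```python
# Rough columnstore size estimate (simple method).
# All inputs are example assumptions; substitute values for your schema.

avg_row_size_bytes = 120       # sum of column data type sizes (example)
expected_rows = 1_000_000_000  # expected row count (example)
compression_ratio = 5          # typical range is 3-7x

uncompressed_bytes = avg_row_size_bytes * expected_rows
estimated_bytes = uncompressed_bytes / compression_ratio

print(f"Uncompressed: {uncompressed_bytes / 1024**3:.1f} GiB")
print(f"Estimated on disk: {estimated_bytes / 1024**3:.1f} GiB")
```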

Complex Method

As with the simple method, start by calculating the average row size for your table: add together the size of each column's data type, as listed in Data Types. Multiply the average row size by the expected number of rows in your table to get the estimated uncompressed size.

To get an estimate that is more accurate than the base calculation above, you need a more precise compression ratio. The steps below show how to obtain one: load a large amount of representative data, then measure the actual compression ratios achieved, both per table and per column.

  1. Load a representative sample of data into a columnstore table. More data leads to more accurate measurements.

    1. For information about creating a columnstore table, see Creating a Columnstore Table.

    2. To see options and procedures for loading data, see Load Data and the topics nested below it.

  2. Run OPTIMIZE TABLE on the table to ensure the data is optimized and flushed to disk. Use caution when running this command, as it can impact the performance of all other workloads running in the cluster. For more information, see OPTIMIZE TABLE.

  3. Measure the compression ratios using either SingleStore Studio or SQL queries, as outlined below.

    1. Using Studio:

      1. Navigate to the Databases page.

      2. Select the database containing the table in question.

      3. Compression ratios are displayed at the table level.

      4. Select a table to see compression ratios per column.

    2. By querying COLUMNAR_SEGMENTS:

      1. Compression ratio per table, expressed as the fraction of space saved (1 - compressed / uncompressed):

        SELECT
          database_name,
          table_name,
          (1 - SUM(compressed_size) / SUM(uncompressed_size)) AS ratio
        FROM information_schema.columnar_segments
        GROUP BY database_name, table_name
        ORDER BY ratio DESC;

      2. Compression ratio per column, expressed as the fraction of space saved (1 - compressed / uncompressed):

        SELECT
          database_name,
          table_name,
          column_name,
          (1 - SUM(compressed_size) / SUM(uncompressed_size)) AS ratio
        FROM information_schema.columnar_segments
        GROUP BY database_name, table_name, column_name
        ORDER BY ratio DESC;
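Once measured, this ratio replaces the assumed 3-7x factor in the base calculation. Note that the queries above return the fraction of space saved (1 - compressed / uncompressed), so the estimated on-disk size is the uncompressed size multiplied by (1 - ratio). A minimal sketch, where all numeric inputs are example assumptions:

```python
# Refine the base estimate using a measured compression ratio.
# measured_savings is the "ratio" value returned by the query above:
# 1 - SUM(compressed_size) / SUM(uncompressed_size), i.e. fraction saved.
# All numeric inputs are example assumptions.

avg_row_size_bytes = 120       # sum of column data type sizes (example)
expected_rows = 1_000_000_000  # expected row count (example)
measured_savings = 0.82        # example result of the per-table query

uncompressed_bytes = avg_row_size_bytes * expected_rows
estimated_bytes = uncompressed_bytes * (1 - measured_savings)

print(f"Estimated on disk: {estimated_bytes / 1024**3:.1f} GiB")
```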

Additional Methods to Obtain Columnstore Table Statistics

The following query can be used to obtain the total disk space used by columnstore secondary indexes:

SELECT database_name, table_name, SUM(size) AS total_index_size
FROM information_schema.MV_COLUMNSTORE_FILES
WHERE file_type = 'INVERTED_INDEX' OR file_type = 'CROSS_SEGMENT_HASH_INDEX'
GROUP BY 1, 2;

For more information about the column descriptions, see MV_COLUMNSTORE_FILES.

Additional Sizing Estimations for Columnstore Table Overhead

Beyond the variable size estimates explained above, columnstore tables also require space for a constant metadata overhead of:

  • ~100 bytes of memory per segment per column.

  • ~100 KB for each segment that has at least one deleted row.

  • 7 KB per column per partition for incremental cardinality statistics.
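These constant overheads can be totaled with simple arithmetic. The sketch below applies the three figures above; the segment, column, partition, and deleted-segment counts are placeholder assumptions:

```python
# Rough metadata overhead for a columnstore table.
# All counts below are example assumptions; adjust for your deployment.

num_segments = 1000         # example segment count
num_columns = 20            # example column count
num_partitions = 16         # example partition count
segments_with_deletes = 50  # segments containing at least one deleted row

per_segment_column = 100 * num_segments * num_columns        # ~100 bytes each
deleted_segments = 100 * 1024 * segments_with_deletes        # ~100 KB each
cardinality_stats = 7 * 1024 * num_columns * num_partitions  # 7 KB each

total = per_segment_column + deleted_segments + cardinality_stats
print(f"Estimated metadata overhead: {total / 1024**2:.1f} MiB")
```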

Last modified: March 7, 2024
