SingleStore Helios Sizing Guidelines


This document is designed to help technical users properly size a SingleStore Helios deployment.

Size a Workload

Workspace Sizing Considerations


Each sizing consideration below includes a description and guidance on how to measure it.

Data Size

It is important to understand how much data will be stored in SingleStore Helios.

The most critical element for sizing is to understand how much of the data is the “working set,” which equates to how much data is required to meet the query latency requirements/SLA of an application. For optimal performance, it is imperative that the working set of data fits in the persistent cache.

For example: A Fintech application displays stock performance over the last 30 days and stores the historical data for a full year for other types of analysis. There is a specific SLA for how quickly the dashboard must be visualized when working with data from the last 30 days, but the SLA for the analysis over the year’s worth of data is more flexible.

  • The market performance data for the last 30 days is considered the working set of data

  • The full year’s worth of data is the total data set

Even if the back-end database stores the previous year’s worth of market performance data (which is periodically queried but does not have an SLA), this would not impact the functionality of the application, and thus would not factor into sizing, as this data is stored in the object storage and not in the local cache storage.

Data size can be used as a baseline/lower bound for sizing. To calculate data size:

  1. Start with the raw size of the working set. SingleStore typically sees 75% compression on average, but the actual ratio depends on the data set.

  2. Multiply the raw data size by (1 - compression ratio) to get the compressed size

  3. Multiply that result by two (2) to account for High Availability (HA)

  4. Add 25% to that result for other metadata (logs, snapshots, etc.)

This can be expressed formulaically as:

RawDataSizeInTBs * (1 - CompressionRatio) * 2 * (1 + 0.25)
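As a sketch, the four steps above can be expressed as a small Python function. The default values are only the typical figures quoted above; substitute measured values for your own workload:

```python
def estimate_cache_size_tb(raw_data_tb, compression_ratio=0.75,
                           ha_factor=2, metadata_overhead=0.25):
    """Estimate the persistent cache size needed for a working set.

    raw_data_tb       -- uncompressed size of the working set, in TB
    compression_ratio -- fraction saved by compression (0.75 = 75%)
    ha_factor         -- multiplier for High Availability replicas
    metadata_overhead -- extra fraction for logs, snapshots, etc.
    """
    compressed = raw_data_tb * (1 - compression_ratio)
    return compressed * ha_factor * (1 + metadata_overhead)

# Example: a 10 TB raw working set at the typical 75% compression
print(estimate_cache_size_tb(10))  # 10 * 0.25 * 2 * 1.25 = 6.25 TB
```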


Questions to ask:

  1. How large is the working set of data?

  2. How much data does the application need in order to perform well?

  3. In lieu of an application, how many distinct data sets are there?

  4. How many tables? What are the core tables? How large are they?

  5. What types of data are primarily being stored? Strings? INTs? Floats? JSON? etc.

  6. Is a single tenant or multi-tenant structure being maintained? If multi-tenant, how many tenants are there? What is the approximate size of each?

  7. Is the data set growing? If so, by how much? How frequently?

  8. For workloads whose latency and concurrency SLAs are less stringent, data size becomes the dominant factor in determining cluster size.

Data Ingest

Data ingest describes how much data, in both rows and bytes, is written into the database, and how fast. Writes are typically inserts, but can also be updates and/or deletes.


Questions to ask:

  1. What is the data size? The number of rows is not as helpful as the overall size of the data.

  2. What is the average ingest rate in bytes/second? What is the peak?

  3. How frequently is this volume of data ingested into the database? Does it stream in continuously, arrive in small batches (every hour, etc.), or arrive infrequently (once a day, etc.)?

  4. How is the data currently ingested? What is the data source and format? Is it performed via an ELT, an ETL, or other transformation, like a stored procedure? The goal is to understand how much work will be performed during ingestion. For example, is data being inserted straight into a table or is there significant work performed in a stored procedure to process and transform the data?

  5. How is the ingest rate growing? By what percentage over what time?
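To make these questions concrete, average and peak ingest rates in bytes/second can be back-calculated from row counts and an estimated row width. The row size and batch numbers below are purely illustrative:

```python
def ingest_rate_bytes_per_sec(rows, avg_row_bytes, window_seconds):
    """Convert a row count over a time window into a bytes/second rate."""
    return rows * avg_row_bytes / window_seconds

# Illustrative: 18 million rows of ~200-byte rows loaded each hour
avg = ingest_rate_bytes_per_sec(18_000_000, 200, 3600)
print(f"average ingest: {avg / 1e6:.1f} MB/s")  # 1.0 MB/s

# Peak: the same volume arriving in a 10-minute burst
peak = ingest_rate_bytes_per_sec(18_000_000, 200, 600)
print(f"peak ingest: {peak / 1e6:.1f} MB/s")    # 6.0 MB/s
```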

Query Shapes

The “core queries” of an application are required to calculate the appropriate cluster size. Core queries are the queries most commonly run in the application and which require a specific SLA in order to meet the desired/required user experience.

Some core queries are run many times, possibly with different parameters (literals) but otherwise the same structure.

It is important to understand the shapes of the core queries that are running as they can impact the performance of the database differently.

A query shape (type) is the form of the query plan, which determines how much work a query must do. Common query shapes are listed below; Visual Explain can be used to profile these queries.

  • Selective queries: Queries that return a small number of rows.

    SELECT * FROM table WHERE id = 5

  • Aggregation queries

    SELECT AVG(col1), SUM(col2) FROM table GROUP BY (col3)

  • Joins

    SELECT Count(*) FROM lineitem JOIN orders ON o_orderkey = l_orderkey;

It's also important to identify queries that can run on one partition vs. queries that must run on every partition. Selective queries typically hit one or a small number of partitions, while aggregations often touch all partitions. Queries that run on one partition tend to have lower total CPU cost.

For example: Filtering on id when id is the primary key (which is a shard key and has a unique index) enables SELECT * FROM table WHERE id = 5 to run on a single leaf node. If id is not a shard key, the query must touch every partition, and thus every leaf node.

Graphical profile plans as well as JSON profile plans will reveal the indexes being used.

Consider CREATE TABLE t(id INT, id2 INT, PRIMARY KEY(id)); and profile the query SELECT * FROM t WHERE id2 = 1;

The following in the JSON profile means it is touching all shards and scanning:

"columnstore_in_memory_scan_index":"KEY __UNORDERED () USING CLUSTERED COLUMNSTORE",

For SELECT * FROM t WHERE id = 1;, the following entry in the JSON profile means the query touches one shard and performs an index seek:

"columnstore_in_memory_scan_index":"UNIQUE KEY PRIMARY (id) USING HASH",


Questions to ask:

  1. What are the core queries?

  2. What are the query shapes?

  3. What is the total CPU of each core query? All core queries?

  4. What is the response time of each core query? All core queries?


Query Latency

Typically, customers provide one or more latency targets for their core queries, such as a 5-second response time, or a range of response times for each core query.


Questions to ask:

  1. What are the latency expectations for each core query?

  2. Is there a latency expectation for all core queries? If so, what is it?
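When gathering these numbers, latency targets are usually stated as percentiles over measured response times, and the Python standard library can compute them from a sample. The timings below are illustrative:

```python
from statistics import quantiles

# Illustrative response times for one core query, in milliseconds
timings_ms = [12, 15, 14, 18, 22, 35, 16, 13, 19, 250]

# quantiles with n=100 returns the 1st..99th percentile cut points
pcts = quantiles(timings_ms, n=100, method="inclusive")
p50, p95, p99 = pcts[49], pcts[94], pcts[98]
print(f"P50={p50:.1f}ms P95={p95:.1f}ms P99={p99:.1f}ms")
```

Note how a single outlier (250 ms) leaves P50 unaffected but dominates the tail percentiles, which is why latency SLAs are usually stated at several percentiles rather than as an average.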


Concurrency

Concurrency refers to the number of queries hitting the database at a given moment, including both typical and peak traffic. Ideally, concurrency is measured per query shape. Review the concurrency of each of the core queries.

General numbers can be used to record the concurrency: 1, 5, 10, 20, 50, 100, 500, 1000 etc.


Questions to ask:

  1. When a user logs into the application, how many queries are triggered?

  2. On average, how many of the core queries are running in the application concurrently?

  3. Do large spikes in traffic occur? If so, what is the minimum number of queries? The maximum?

  4. How is concurrency growing? By what percentage over what time?
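As a rough capacity sketch (not an official SingleStore formula), sustained concurrency can be translated into a CPU estimate by multiplying the query arrival rate by per-query CPU time, per Little's law. All numbers below are illustrative:

```python
import math

def estimated_cores(qps, avg_cpu_seconds_per_query, headroom=0.5):
    """Rough vCPU estimate: arrival rate x CPU time per query,
    plus fractional headroom for traffic spikes. Illustrative only."""
    busy_cores = qps * avg_cpu_seconds_per_query
    return math.ceil(busy_cores * (1 + headroom))

# 50 queries/second, each consuming ~0.2s of total CPU across the cluster
print(estimated_cores(50, 0.2))  # 15 (10 busy cores + 50% headroom)
```

An estimate like this only bounds the starting workspace size; the latency targets above still need to be validated by testing the core queries at the expected concurrency.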

Comparative Performance

To estimate a workspace size, review the examples below and select the one that most closely resembles your requirements.

To determine the optimal workspace size, test the core queries with the selected workspace size and either increase or decrease the size until it fulfills your latency requirements.

Workspace Sizing Examples

Example 1

  • Use Case: Support ultra-fast, high-performance queries and massive concurrency while minimizing monthly costs

  • Data Ingest: < 5,000 RPS

  • Query Latency: P50 < 1s; P95 < 5s; P99 < 10s

  • Query Shapes: Analytical queries/summary statistics on 4 fact tables of 2M records each, and 20 reference tables of approximately 5,000 records each

  • Concurrency Tested (QPS): 10 - 50

Example 2

  • Use Case: Support real-time analytics and user-defined metrics while minimizing the time to onboard new tenants

  • Data Ingest: The current solution takes 40 - 60 minutes to fully load product data; load times for SingleStore Helios must not increase the overall ETL duration by more than 10%. While not specifically tested on an S-8 cluster, load times tested on an S-6 cluster with a single tenant (~135s) and seven tenants concurrently (~340s) demonstrated that replacing the existing database with SingleStore Helios would significantly reduce ETL times.

  • Query Shapes: Medium complexity analytical queries. SingleStore Helios was evaluated as a means to power an analytics system and provide customized, user-defined metrics within the customer’s app, where “user-defined metrics” are metrics and signals that customers wanted to track/use within their individual environment.

Example 3

  • Use Case: Maintain existing customers and prepare for significant customer growth by increasing platform availability and resiliency, and optimizing application responsiveness

  • Data Ingest: < 5,000 inserts/second from the application

  • Query Latency: < 100ms for 27 different queries

  • Query Shapes: Analytical (large table scans and aggregations)

  • Concurrency Tested (QPS): Achieved with a 16:1 CPU:partition ratio

* Data sizes for these examples are compressed; the compression rate can vary up to 80% based on the type of data.

Last modified: February 23, 2024
