Workspace Scaling

Overview

SingleStore Helios has a unique architecture which offers the flexibility to scale resources dynamically for both read and write workloads. Many databases make it easy to scale reads but provide no way to dynamically scale writes. For developers building dynamic applications, SingleStore Helios provides best-in-class performance and elasticity.

This is because SingleStore Helios is built on a clustered architecture which is distributed across compute resources. As workloads grow, resources are managed automatically to allow applications and workloads to scale effortlessly.

Compute workspaces can be scaled up or down to accommodate changing workloads. The size of a workspace, such as S-32, determines the overall number of compute resources (vCPU cores and memory) available to a workload.

Scaling operations are always online. However, connections might drop for various reasons; when that happens, an immediate reconnect should always succeed.
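Because a connection may drop while a scaling operation is in progress, client code is usually easier to operate when it reconnects and retries rather than failing outright. The following is a minimal sketch of that pattern, not a prescribed client implementation. The function name run_with_reconnect, the connect and work callables, and the retry defaults are all hypothetical; a real database driver raises its own exception types, which would be passed in through retry_on.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def run_with_reconnect(
    connect: Callable[[], object],
    work: Callable[[object], T],
    attempts: int = 3,
    delay_seconds: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
) -> T:
    """Open a connection and run work(conn); on a drop, reconnect and retry.

    connect and work are caller-supplied (hypothetical) callables. retry_on
    lists the exception types treated as "connection dropped"; substitute
    the ones raised by the driver in use.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            conn = connect()  # a fresh connection on every attempt
            return work(conn)
        except retry_on as exc:
            last_error = exc
            if attempt < attempts - 1:
                time.sleep(delay_seconds)  # brief pause before reconnecting
    raise last_error
```

Retrying is only safe when the work is idempotent or wrapped in a transaction that did not commit; a write that was applied just before the connection dropped would otherwise be applied twice.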

How Scaling Works

A SingleStore Helios compute workspace is made up of individual nodes, which allow an even distribution of jobs across the underlying cloud resources. From a user's perspective, all that is required is to select a size (such as S-64) and SingleStore Helios automatically provisions all the necessary resources.

There are multiple ways to scale resources depending on the workload requirements. Compute can be Resized, Scaled, or Autoscaled. Additionally, the amount of cache for low-latency query access can also be scaled independently (Cache Scaling).

Resizing

Resizing is performed by changing the base size of the compute deployment (for example, from S-12 to S-24). This operation will automatically add or remove compute, memory, and cache resources, and redistribute data within the workspace for optimal performance.

As data is redistributed when resizing, the amount of time to perform a full resize is dependent on the workspace size and the size of the data working set. This operation is fully online, and the complete process can take between minutes and hours depending on data volumes.

Resizing is ideal for workloads which have grown or shrunk over time and are expected to continue operating at the new compute size. For workloads which are expected to rapidly scale up and down, see the next section, Scaling.

Scaling

Scaling operations are performed by changing the scaleFactor of the deployment, for example from "1" to "2" or "4". This automatically increases the amount of vCPU and memory available to workloads from 1x up to 4x of the base amount.
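The arithmetic behind scaleFactor can be illustrated with a short sketch. It assumes that vCPU and memory both scale linearly with the factor, which is how the text above describes it. The Resources type and the base figures in the usage note are hypothetical and do not correspond to any published workspace size.

```python
from dataclasses import dataclass

# Scale factors mentioned in the text above.
ALLOWED_SCALE_FACTORS = (1, 2, 4)


@dataclass(frozen=True)
class Resources:
    """Hypothetical container for the compute available to a workload."""
    vcpus: int
    memory_gb: int


def effective_resources(base: Resources, scale_factor: int) -> Resources:
    """Return base resources multiplied by the scale factor (assumed linear)."""
    if scale_factor not in ALLOWED_SCALE_FACTORS:
        raise ValueError(f"scale_factor must be one of {ALLOWED_SCALE_FACTORS}")
    return Resources(
        vcpus=base.vcpus * scale_factor,
        memory_gb=base.memory_gb * scale_factor,
    )
```

With a made-up base of Resources(vcpus=16, memory_gb=128), a scale factor of 4 yields Resources(vcpus=64, memory_gb=512).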

This feature is designed to scale resources up and down in order to handle dynamic changes in workload needs. The time for this operation to complete will vary based on workspace size, the number of tables involved, and other factors.

Autoscaling

Autoscaling is designed to track the active compute workload and automatically scale the deployment based on compute and memory usage. When the workload requests more vCPU or memory than is available, autoscaling will automatically add compute resources on the fly. If the workload decreases and the additional compute is no longer needed, autoscaling will return to the base size.

While many databases limit autoscaling to read-replicas, SingleStore has implemented autoscaling to provide both enhanced write and read performance. This allows Intelligent Applications to leverage SingleStore Helios as workloads change to increase ingest and read performance across Hybrid Transactional Analytical Processing (HTAP) workloads.

When configuring autoscaling, users can turn the feature on or off and set the maximum amount of vCPU and memory to be provisioned (2x or 4x of the base amount). This provides dynamic flexibility while allowing users to tightly manage costs.

Autoscaling is ideal for dynamic workloads where the user does not know when peaks in workload may occur. It can be turned on or off for each compute deployment independently.

The default settings for autoscaling may not fit every workload. Some workloads can benefit from a shorter sample duration, while others benefit from a longer duration to prevent unnecessary scaling.

Autoscaling provides three sensitivity levels to handle a workload:

Low - this is the most conservative setting, which uses 15-minute sampling and a 30-minute cooldown.
Normal - this is the standard configuration, which uses 5-minute sampling and a 10-minute cooldown.
High - this is the most sensitive setting, which uses 3-minute sampling and a 5-minute cooldown.

As a general rule, it is best to use the Normal setting.

The Low setting can be used when scaling should happen only after a sufficient workload has been running for the full 15-minute sampling period. This prevents scaling up in response to short spikes of load that are followed by a drop in CPU utilization.

The High setting is recommended when workloads are highly dynamic and scaling should occur as often as needed, with minimal cooldown before returning to the base scale factor.
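The effect of the three sensitivity levels can be explored with a small model. The sampling and cooldown durations below come from the list above; everything else is an assumption made for illustration. In particular, the rule "scale up when average CPU utilization over the sampling window exceeds a threshold", the 0.8 threshold, and the function names are hypothetical, and the service's actual trigger logic is not described in this document.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Sensitivity:
    sampling_minutes: int
    cooldown_minutes: int


# Durations taken from the sensitivity levels listed above.
SENSITIVITY = {
    "LOW": Sensitivity(sampling_minutes=15, cooldown_minutes=30),
    "NORMAL": Sensitivity(sampling_minutes=5, cooldown_minutes=10),
    "HIGH": Sensitivity(sampling_minutes=3, cooldown_minutes=5),
}


def should_scale_up(cpu_by_minute: Sequence[float], level: str,
                    threshold: float = 0.8) -> bool:
    """Assumed rule: average CPU over the latest sampling window > threshold.

    cpu_by_minute holds one utilization sample (0.0 to 1.0) per minute,
    oldest first. The threshold value is an illustrative assumption.
    """
    window = SENSITIVITY[level].sampling_minutes
    if len(cpu_by_minute) < window:
        return False  # not enough history to fill the sampling window
    recent = cpu_by_minute[-window:]
    return sum(recent) / window > threshold


def cooldown_elapsed(minutes_since_last_change: int, level: str) -> bool:
    """Assumed rule: a further scaling change waits for the cooldown period."""
    return minutes_since_last_change >= SENSITIVITY[level].cooldown_minutes
```

Under this model, a trace of twelve quiet minutes at 0.2 followed by a three-minute spike at 0.95 triggers a scale-up only at the High level; the Normal and Low windows average the spike away, which matches the guidance above about avoiding scaling on short spikes.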

Cache Configuration

Setting the Cache Configuration allows compute deployments to leverage greater volumes of Persistent Cache to increase the amount of data (the working set) that can be accessed with extremely low latency.

Increasing the cache configuration (for example, from “1x” to “2x” or “4x”) increases the overall volume of the cache and automatically distributes data within the cache. Because data is automatically redistributed for optimal performance, this operation typically takes between minutes and hours, depending on the volume of data.

This operation runs online, and data is available to be written and read throughout the reconfiguration process. Cache configurations can be increased or decreased as desired.

Resizing and Scaling

Scaling up or down can be triggered through the Cloud Portal or Management API.

Using Cloud Portal

To scale a workspace through the Cloud Portal, navigate to Deployments > Overview, select the workspace card, open the workspace options menu (⋮), and select Resize Workspace. Alternatively, navigate to Deployments > Workspaces, select the workspace to resize, and under the Actions column, select Resize Workspace.

Using Management API

Resizing up or down through the Management API is done with the WorkspaceUpdate size field. Scaling is performed with the WorkspaceUpdate scaleFactor field, and the Cache Configuration is updated with the WorkspaceUpdate cacheConfig field. Refer to Management API Overview for more information.
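As a rough illustration of the three WorkspaceUpdate fields, the sketch below assembles a JSON request body and nothing more. The field names size, scaleFactor, and cacheConfig come from the text above; the helper name workspace_update_body, the numeric encoding of the factors, and the accepted values (1, 2, 4) are assumptions. The endpoint, HTTP method, and authentication are deliberately left out, so consult the Management API reference for those.

```python
import json
from typing import Optional

# Factors mentioned in this document for scaleFactor and cacheConfig.
ALLOWED_FACTORS = (1, 2, 4)


def workspace_update_body(
    size: Optional[str] = None,
    scale_factor: Optional[int] = None,
    cache_config: Optional[int] = None,
) -> str:
    """Build a JSON body containing only the fields that should change.

    Hypothetical helper: numeric factor values are an assumption about how
    the API encodes "1x", "2x", and "4x".
    """
    body = {}
    if size is not None:
        body["size"] = size  # for example "S-24"
    if scale_factor is not None:
        if scale_factor not in ALLOWED_FACTORS:
            raise ValueError(f"scale_factor must be one of {ALLOWED_FACTORS}")
        body["scaleFactor"] = scale_factor
    if cache_config is not None:
        if cache_config not in ALLOWED_FACTORS:
            raise ValueError(f"cache_config must be one of {ALLOWED_FACTORS}")
        body["cacheConfig"] = cache_config
    if not body:
        raise ValueError("at least one field must be set")
    return json.dumps(body, sort_keys=True)
```

For example, workspace_update_body(size="S-24") produces the string {"size": "S-24"}, while passing only scale_factor=2 leaves the base size untouched and changes just the scale factor.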

High Availability

Critical workloads need to stay online, even when scaling the underlying resources. Rather than forcing users to take downtime when scaling, SingleStore Helios leverages its unique distributed SQL storage engine to handle resource provisioning without a major impact on read or write workloads. This is possible because of a high availability framework called “Load-Balanced Partition Placement.” In this architecture, data is replicated within the distributed workspace, and applications maintain the ability to read and write even when the underlying resources are being added or removed.

Changing the Database Partition Count

You can use either of the following methods to change the partition count in a database:

Use the BACKUP WITH SPLIT PARTITIONS command.

BACKUP [DATABASE] db_name WITH SPLIT PARTITIONS [BY 2] TO [S3 | AZURE | GCS] "backup_path"
[CONFIG configuration_json]
[CREDENTIALS credentials_json]

For more information about the syntax options, refer to BACKUP DATABASE.

Use the INSERT…SELECT command.

In this method, you must first create a new database with the desired number of partitions and then use INSERT…SELECT to copy the tables from the existing database to the new database. For huge tables that take more than a few minutes to copy (this depends on the amount of data and your system's scale), you should move the rows of the table over in large batches, instead of all at once.

For example, suppose you have an S-8 monitoring workspace for which the recommended partition count is 64. However, the databases are out of sync with the recommended partition count, and you want to repartition them to the recommended count to get the ideal performance.

In this case, create a new database with the required ideal partition count and use the INSERT…SELECT command to copy the data from the existing database to the new database.
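The batched copy described above can be sketched as a small statement generator. This is one possible approach under stated assumptions, not the documented procedure: it assumes each table has an integer key column that can be sliced into ranges, that the destination table already exists with the same schema, and that the identifiers come from a trusted source, since they are interpolated directly into the SQL text. The function names and the database, table, and column names in the usage note are hypothetical.

```python
from typing import Iterator


def create_database_statement(name: str, partitions: int) -> str:
    """Statement that creates the new database with the desired partition count."""
    if partitions < 1:
        raise ValueError("partitions must be positive")
    return f"CREATE DATABASE {name} PARTITIONS {partitions};"


def batched_copy_statements(
    src_db: str,
    dst_db: str,
    table: str,
    key_column: str,
    min_key: int,
    max_key: int,
    batch_size: int,
) -> Iterator[str]:
    """Yield one INSERT ... SELECT per key range, covering min_key..max_key.

    Assumes key_column is an integer column; ranges are inclusive and do
    not overlap, so each row is copied exactly once.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    low = min_key
    while low <= max_key:
        high = min(low + batch_size - 1, max_key)
        yield (
            f"INSERT INTO {dst_db}.{table} "
            f"SELECT * FROM {src_db}.{table} "
            f"WHERE {key_column} BETWEEN {low} AND {high};"
        )
        low = high + 1
```

For the S-8 example, create_database_statement("metrics_new", 64) gives the statement for the new 64-partition database, and batched_copy_statements("metrics", "metrics_new", "events", "id", 1, 250, 100) yields three statements covering ids 1 to 100, 101 to 200, and 201 to 250. Running the batches one at a time keeps each copy short and makes it possible to resume after an interruption.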

Scaling Impact on Performance

Resizing operations trigger the online addition or removal of compute resources, as well as a redistribution of data to ensure even performance across the compute workspace. When the operation is running there may be a temporary reduction in performance due to resources being dynamically added or removed.

For large deployments with heavy active workloads, the time required to complete the resizing operation may increase because a larger volume of active data needs to be redistributed within the deployment. The resize operation and data rebalancing continue in the background while the workload is running, so no extra steps are needed to ensure the operation completes.

Billing

Compute consumes compute credits while running. When a deployment is scaled up or down, the number of credits consumed changes depending on the size and scaleFactor of the workspace. This is reflected in the Resize Workspace > Review Changes menu, which shows both the current credit consumption and the new consumption for the target deployment size.

Resizing and scaling do not affect the storage costs, as storage is charged based on the average number of monthly GB stored, which does not change when deployments are scaled up or down.