Smart Disaster Recovery (DR): SmartDR

You can initiate Smart DR via the SingleStore Portal or API. The failover and failback processes are completely automated. Upon initiation, SingleStore activates your workspaces in the secondary region, connects your databases to these workspaces, and generates a connection string. You can then utilize this connection string to access your data or to configure your application.

Smart DR replicates the exact topology from the primary to the secondary region and maintains all the users, permissions, and workspace configurations across the regions.

Benefits

The principal benefits of Smart DR are minimal ongoing costs and a low Recovery Point Objective (RPO) of up to 10 minutes. Smart DR reduces your ongoing disaster recovery expenses by eliminating the need for active compute resources in the secondary region. You incur charges only for storage and data transfer.
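As a rough illustration of what an RPO of up to 10 minutes means, the sketch below computes the window of data that would be missing in the secondary region if an outage occurred at a given moment. The function names and timestamps are hypothetical and are not part of any SingleStore API; only the 10-minute bound comes from this page.

```python
from datetime import datetime, timedelta

# The 10-minute upper bound is the RPO stated for Smart DR on this page.
RPO = timedelta(minutes=10)


def data_loss_window(last_replicated_at: datetime, outage_at: datetime) -> timedelta:
    """Return how much recent data would be missing in the secondary region."""
    # A negative gap makes no sense here, so clamp it to zero.
    return max(outage_at - last_replicated_at, timedelta(0))


def within_rpo(last_replicated_at: datetime, outage_at: datetime,
               rpo: timedelta = RPO) -> bool:
    """Check whether the potential data loss stays within the RPO."""
    return data_loss_window(last_replicated_at, outage_at) <= rpo


# Hypothetical timestamps, for illustration only.
outage = datetime(2024, 1, 1, 12, 0, 0)
print(within_rpo(datetime(2024, 1, 1, 11, 53, 0), outage))  # 7-minute gap
print(within_rpo(datetime(2024, 1, 1, 11, 45, 0), outage))  # 15-minute gap
```

In this sketch, a 7-minute gap is within the objective and a 15-minute gap is not.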

Use Case

A primary use case for Smart DR is to maintain business continuity during a region outage. Such an outage could result from natural causes, technical failures, or human error.

Depending on your business requirements, it may be essential to have both Multi-AZ High Availability (HA) and Smart DR. The distinction between the two is crucial because HA focuses on maintaining data availability for day-to-day activities despite minor disruptions, whereas disaster recovery is about recovering and restoring databases following a major regional outage.

In conjunction with Smart DR, Point-in-Time Recovery (PITR) gives you the ability to go back in time and recover data in both the primary and secondary regions.

Setting up Smart DR

Configuring Replication

Go to Workspaces in the left navigation pane and select the workspace.
Click the three vertical dots next to the selected workspace, and from the drop-down list select Configure SmartDR.
Click +Configure Replication, choose the relevant database options, and submit.

The Configure Smart DR screen displays three top-level menu options:

Workspaces displays each workspace's name, region, and Failover Role.
Databases displays each replicated database's name and status. Clicking the Manage button lets you view the available databases and select them for replication.
Settings displays the Primary Region, which is the region where your database(s) currently reside, and the Secondary Region, to which you want to replicate the database(s).

Settings also displays the Replication Type, which is Storage only by default. With this type, data is copied asynchronously between clusters, and the secondary site does not require active compute nodes.

Auto-replication is disabled by default. If enabled, new databases are replicated automatically.

Pre-provisioning

Pre-provisioning can be used to configure compute resources in a secondary region in advance.

This enables you to:

Configure the private endpoints in the secondary region, before failover is initiated.
Test DR by failing over to the secondary region without disrupting your production environment.
Fail over faster, because compute is already running.

To pre-provision:

In the Portal, navigate to the Replication tab.
Click on Enable Pre-Provisioning.

This starts the background process to configure the secondary region with the same topology of workspaces as your primary region. During this process, your primary region continues to actively replicate data to the secondary region.

To validate your failover capabilities and ensure business continuity, you can switch to the secondary region, attach your database to a workspace, and start querying the data from the secondary region.

You can attach your application in the secondary region and verify that it can insert or update your database as expected, without impacting your production environment.

This seamless testing is possible because, when you attach your database to a workspace in the secondary region, SingleStore Helios automatically creates a branch of your database. This branch reflects all the data up to the point of attachment. More importantly, this branch is independent of your primary region, so any updates or modifications made during the test failover do not impact your production environment. For more information on branching, refer to Database Branching.
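The insert-and-update check described above can be scripted. The following is a minimal sketch, assuming your application talks to the database through a Python DB-API 2.0 connection. The table name dr_smoke_test is hypothetical, and an in-memory sqlite3 connection stands in for the secondary-region connection so that the example runs anywhere. In a real test you would instead open a connection with the connection string of the secondary-region workspace.

```python
import sqlite3


def smoke_test(conn) -> bool:
    """Insert and update one row through a DB-API 2.0 connection, then clean up."""
    cur = conn.cursor()
    try:
        # dr_smoke_test is a hypothetical scratch table used only for this check.
        cur.execute("DROP TABLE IF EXISTS dr_smoke_test")
        cur.execute("CREATE TABLE dr_smoke_test (id INT PRIMARY KEY, note VARCHAR(64))")
        cur.execute("INSERT INTO dr_smoke_test (id, note) VALUES (1, 'inserted')")
        cur.execute("UPDATE dr_smoke_test SET note = 'updated' WHERE id = 1")
        cur.execute("SELECT note FROM dr_smoke_test WHERE id = 1")
        rows = cur.fetchall()
        cur.execute("DROP TABLE dr_smoke_test")
        conn.commit()
        # The check passes only if the updated value is what we read back.
        return len(rows) == 1 and rows[0][0] == "updated"
    finally:
        cur.close()


# Stand-in connection (assumption): replace with a connection to the
# secondary-region workspace when testing a real failover.
conn = sqlite3.connect(":memory:")
print(smoke_test(conn))
conn.close()
```

Because the attached database is a branch, a check like this writes only to the branch and leaves the primary region's data untouched.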

Failover

Once database replication is set up and in sync, you can fail over your application to the secondary region whenever a regional failure occurs.

To start the process:

Click the three vertical dots next to the selected workspace, and from the drop-down list select Configure SmartDR.
Click the Failover button in the upper-left corner of the Configure Smart DR screen.

Check the I confirm… checkbox and click Confirm in the popup window. You can monitor the progress from the status bar.

During the failover deployment, the system automatically performs the following tasks in the background:

Provisions the environment in the secondary region, maintaining the primary region's topology.
Provisions and configures all your workspaces.
Attaches the databases to the workspace and provides a connection string.
Preserves user permissions, pipelines, firewall settings and other metadata from your primary region.

As part of the failover task, the primary region workspace is automatically suspended. While suspended in this way, it cannot be resumed or terminated.
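On the application side, switching to the connection string generated during failover can be reduced to a configuration change. The sketch below shows one possible approach, assuming the application reads its settings from environment variables. The variable names ACTIVE_REGION, PRIMARY_DB_URL, and SECONDARY_DB_URL and the placeholder URLs are hypothetical and are not defined by SingleStore.

```python
import os


def active_connection_string(env=os.environ) -> str:
    """Return the connection string for the region the application should use."""
    # ACTIVE_REGION is a hypothetical switch that you flip after failover or failback.
    region = env.get("ACTIVE_REGION", "primary")
    key = {"primary": "PRIMARY_DB_URL", "secondary": "SECONDARY_DB_URL"}.get(region)
    if key is None:
        raise ValueError(f"unknown region: {region!r}")
    url = env.get(key)
    if not url:
        raise KeyError(f"{key} is not set")
    return url


# Placeholder values for illustration; use the connection strings shown in the Portal.
settings = {
    "ACTIVE_REGION": "secondary",
    "PRIMARY_DB_URL": "mysql://app@primary.example.com:3306/orders",
    "SECONDARY_DB_URL": "mysql://app@secondary.example.com:3306/orders",
}
print(active_connection_string(settings))
```

Keeping both connection strings in configuration means that failover and failback require changing a single value rather than redeploying the application.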

System-Managed Databases

After initiating the failover process, you may notice, either in the UI or by running the SQL command SHOW DATABASES, that there are two databases: one attached to the workspace, and another in a detached state that has the same name with a timestamp added.

The detached database, referred to as a system-managed database, is a continuation of your primary region's database. It's called "system-managed" because users cannot directly attach it to a workspace. SingleStore ensures synchronization between the primary region's database and this system-managed database in the secondary region in the background.

During failover, SingleStore attaches a branch of this system-managed database to your workspace. In most failovers, the data in the branch database mirrors the detached database. However, in cases where the region connection is abruptly interrupted during data ingest, some data may not fully replicate during failover. In such scenarios, the system-managed database works behind the scenes to sync everything by continuously reconnecting to the primary database to pick up any missing rows.

You can access the data in the system-managed database at any time by attaching it as a branch and recovering the missing rows. Importantly, database branches do not consume extra storage, allowing you to create as many branches as needed to read data from these system-managed databases without incurring additional storage costs. Also, SingleStore will display these system-managed databases in the secondary region only if the corresponding database is attached and active in the primary region during failover.
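When reviewing SHOW DATABASES output after a failover, it can help to separate the attached databases from their timestamped, system-managed counterparts. The sketch below does this by name alone. The exact naming format is not specified on this page, so the sketch assumes only that a system-managed name starts with the attached database's name and that the remainder contains digits. The sample names are hypothetical.

```python
def split_databases(names):
    """Split database names into attached names and timestamped copies.

    Assumption: a system-managed database's name starts with the attached
    database's name, and the remainder contains digits (the timestamp).
    """
    attached = []
    system_managed = {}  # maps system-managed name -> attached database name
    for name in names:
        bases = [
            base for base in names
            if base != name
            and name.startswith(base)
            and any(ch.isdigit() for ch in name[len(base):])
        ]
        if bases:
            # Prefer the longest matching base name.
            system_managed[name] = max(bases, key=len)
        else:
            attached.append(name)
    return attached, system_managed


# Hypothetical names, as they might be listed by SHOW DATABASES.
attached, system_managed = split_databases(["orders", "orders_1714000000", "inventory"])
print(attached)
print(system_managed)
```

A name-based match like this is only a heuristic. Treat the database state shown in the UI as the authoritative source.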

Failback

To initiate failback from the secondary region to the primary region:

Click the three vertical dots next to the selected workspace, and from the drop-down list select Configure SmartDR.
Click the Failback button in the upper-left corner of the Configure Smart DR screen.

Check the I confirm… checkbox and click Confirm in the popup window. You can monitor the progress from the status bar.

The system automatically performs the following tasks during failback:

Configures the primary region environment.
Attaches replicated databases to the workspace and provides the connection string.
Updates user permissions and other metadata with changes from the secondary region.

Upon successful completion, the primary region becomes active, and the secondary region is no longer accessible.
