Smart Disaster Recovery (DR): SmartDR

Note

This is a Preview feature.

SmartDR continuously and asynchronously replicates data between a primary region and a geographically separate secondary region. This service allows you to access the secondary region with minimal downtime and data loss for critical applications.

You can initiate SmartDR via the SingleStore Portal or API. The failover and failback processes are completely automated. Upon initiation, SingleStore activates your workspaces in the secondary region, connects your databases to these workspaces, and generates a connection string. You can then use this connection string to access your data or to configure your application.
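One way to use the generated connection string is to keep the connection strings for both regions in application configuration and switch between them with a single setting. The following is a minimal Python sketch of that idea, not a SingleStore API. The environment variable names APP_ACTIVE_REGION, APP_PRIMARY_CONN, and APP_SECONDARY_CONN and the function name are hypothetical.

```python
import os
from typing import Mapping

# Hypothetical setting names; they are not defined by SingleStore.
_CONN_KEYS = {
    "primary": "APP_PRIMARY_CONN",
    "secondary": "APP_SECONDARY_CONN",
}


def active_connection_string(env: Mapping[str, str] = os.environ) -> str:
    """Return the connection string for the region the application should use."""
    # Default to the primary region when no region is configured.
    region = env.get("APP_ACTIVE_REGION", "primary")
    if region not in _CONN_KEYS:
        raise ValueError(f"unknown region: {region!r}")
    value = env.get(_CONN_KEYS[region], "")
    if not value:
        raise KeyError(f"{_CONN_KEYS[region]} is not set")
    return value
```

With this layout, pointing the application at the secondary region after a failover would only require storing the new connection string and changing APP_ACTIVE_REGION to "secondary".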

SmartDR replicates the exact topology from the primary to the secondary region and maintains all the users, permissions, and workspace configurations across the regions.

Benefits

The principal benefits of SmartDR are minimal ongoing costs and a low Recovery Point Objective (RPO) of up to 10 minutes. SmartDR reduces your ongoing disaster recovery expenses by eliminating the need for active compute resources in the secondary region. You incur charges only for storage and data transfer.
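To put the RPO in concrete terms, the following minimal Python sketch estimates an upper bound on the writes that might not yet have reached the secondary region when an outage starts. It assumes a constant ingest rate, which is a simplification, and the function name and the example rate are illustrative only.

```python
def worst_case_unreplicated_rows(ingest_rows_per_second: float,
                                 rpo_minutes: float = 10.0) -> float:
    """Upper bound on rows written to the primary but not yet replicated.

    Assumes a constant ingest rate over the whole RPO window.
    """
    if ingest_rows_per_second < 0 or rpo_minutes < 0:
        raise ValueError("rate and RPO must be non-negative")
    return ingest_rows_per_second * rpo_minutes * 60


# Example: 500 rows per second with the 10-minute RPO.
print(worst_case_unreplicated_rows(500))  # 300000.0
```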

Use Case

A primary use case for SmartDR is to maintain business continuity in the face of a region outage. Such an outage can result from natural causes, technical failures, or human error.

Depending on your business requirements, it may be essential to have both Multi-AZ High Availability (HA) and SmartDR. The distinction between the two is crucial because HA focuses on maintaining data availability for day-to-day activities despite minor disruptions, whereas disaster recovery is about recovering and restoring databases following a major regional outage.

In conjunction with SmartDR, Point-in-Time Recovery (PITR) gives you the ability to go back in time and recover data in both the primary and secondary regions.

Setting up SmartDR

Minimum Prerequisites

  1. An existing workspace group - at the time of creation of the workspace group, under Advanced Settings, you must select the Enable SmartDR and Database Branching option. The Replication tab, which you use for SmartDR and branching, is visible only if you have enabled this option.

  2. A workspace in the workspace group - at the time of creation of the workspace, under Deployment Type, currently only the Non-Production option is available.

  3. A database and its tables.

Configuring the Replication

  1. Select the Replication tab from the top menu and then select +Configure Replication.

  2. Select the details for:

    1. Primary region - this displays the region where your database(s) currently reside. This is displayed by default and cannot be modified.

    2. Secondary region - select from the drop-down list the region to which you want to replicate the database(s).

  3. Choose the database(s) to replicate from the list displayed.

  4. Select Submit. You will see a status bar indicating the progress of the replication processes and the replication status of each selected database.

Failover

Once replication of the database(s) is set up and in sync, you can fail over your application to the secondary region whenever there is a regional failure.

To start the process, click the Failover button under the Replication tab. Check the "I confirm…" checkbox and click Confirm in the popup window. You can monitor the progress from the status bar.

During the failover deployment, the system automatically performs the following tasks in the background:

  • Provisions the environment in the secondary region, maintaining the primary region's topology.

  • Provisions and configures all your workspaces.

  • Attaches the databases to the workspace and provides a connection string. To access the connection string:

    • Select the Overview tab.

    • Go to Workspaces > Connect and choose Connect String from the drop-down list.

  • Preserves user permissions, pipelines, firewall settings and other metadata from your primary region.

System-Managed Databases

After initiating the failover process, you may notice, either through the UI or by running the SQL command SHOW DATABASES, that there are two databases: one attached to the workspace and another, in a detached state, with the same name plus a timestamp.

The detached database, referred to as a system-managed database, is a continuation of your primary region's database. It's called "system-managed" because users cannot directly attach it to a workspace. SingleStore ensures synchronization between the primary region's database and this system-managed database in the secondary region in the background.
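When reviewing the output of SHOW DATABASES, a simple name comparison can help spot these pairs. The Python sketch below is a heuristic under stated assumptions, not SingleStore behavior: the exact timestamp format is not documented here, so the function only looks for a name that extends another listed name with a suffix containing at least one digit. The function name and the example database names are hypothetical.

```python
from typing import Dict, Iterable


def candidate_system_managed(names: Iterable[str]) -> Dict[str, str]:
    """Map each likely system-managed database name to its base database name.

    Heuristic: a name qualifies if it starts with another listed name and the
    remaining suffix contains a digit (assumed to be part of the timestamp).
    """
    all_names = list(names)
    matches: Dict[str, str] = {}
    for name in all_names:
        bases = [
            base for base in all_names
            if base != name
            and name.startswith(base)
            and any(ch.isdigit() for ch in name[len(base):])
        ]
        if bases:
            # Prefer the longest matching base name.
            matches[name] = max(bases, key=len)
    return matches


# Example with made-up names; real names come from SHOW DATABASES.
listed = ["information_schema", "orders", "orders_1720600000", "orders_archive"]
print(candidate_system_managed(listed))  # {'orders_1720600000': 'orders'}
```

Because this relies on naming alone, treat the result as a hint and confirm the attached or detached state in the UI.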

During failover, SingleStore attaches a branch of this system-managed database to your workspace. In most failovers, the data in the branch database mirrors the detached database. However, in cases where the region connection is abruptly interrupted during data ingest, some data may not fully replicate during failover. In such scenarios, the system-managed database works behind the scenes to sync everything by continuously reconnecting to the primary database to pick up any missing rows.

You can access the data in the system-managed database at any time by attaching it as a branch and recovering the missing rows. Importantly, database branches do not consume extra storage, allowing you to create as many branches as needed to read data from these system-managed databases without incurring additional storage costs. Also, SingleStore will display these system-managed databases in the secondary region only if the corresponding database is attached and active in the primary region during failover.

Failback

To initiate failback from the secondary region to the primary region:

  1. Click the Failback button located under the Replication tab.

  2. Check the "I confirm..." checkbox and click Confirm in the popup window. This action triggers the failback process. A status bar displays progress.

The system automatically performs the following tasks during failback:

  • Configures the primary region environment.

  • Attaches replicated databases to the workspace and provides the connection string.

  • Updates user permissions and other metadata with changes from the secondary region.

Upon successful completion, the primary region becomes active, and the secondary region is no longer accessible.
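The failover and failback flow described above can be summarized as a small state model. The following Python sketch is purely illustrative and does not call any SingleStore API. The class and attribute names are hypothetical, and the rule that failover requires synced replication is taken from the Failover section.

```python
from enum import Enum


class Region(Enum):
    PRIMARY = "primary"
    SECONDARY = "secondary"


class DrState:
    """Tracks which region is active for a replicated workspace group."""

    def __init__(self) -> None:
        self.active = Region.PRIMARY
        self.replication_synced = False

    def failover(self) -> None:
        # Failover is only meaningful from the primary, once replication is in sync.
        if self.active is not Region.PRIMARY:
            raise RuntimeError("already running in the secondary region")
        if not self.replication_synced:
            raise RuntimeError("replication is not set up and synced")
        self.active = Region.SECONDARY

    def failback(self) -> None:
        # Failback returns to the primary; the secondary is then no longer active.
        if self.active is not Region.SECONDARY:
            raise RuntimeError("failback requires an active secondary region")
        self.active = Region.PRIMARY


state = DrState()
state.replication_synced = True
state.failover()
print(state.active.value)  # secondary
state.failback()
print(state.active.value)  # primary
```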

Pre-provisioning

Pre-provisioning can be used to configure compute resources in a secondary region in advance.

This enables you to:

  • Configure the private endpoints in the secondary region, before failover is initiated.

  • Test DR by failing over to the secondary region without disrupting your production environment.

  • Fail over faster, because compute is already running.

To pre-provision:

  1. In the Portal, navigate to the Replication tab.

  2. Click Enable Pre-Provisioning.

This starts the background process to configure the secondary region with the same topology of workspaces as your primary region. During this process, your primary region continues to actively replicate data to the secondary region.

To validate your failover capabilities and ensure business continuity, you can switch to the secondary region, attach your database to a workspace, and start querying the data from the secondary region.

You can attach your application in the secondary region and verify that it can insert or update your database as expected, without impacting your production environment.

This seamless testing is possible because, when you attach your database to a workspace in the secondary region, SingleStore Helios automatically creates a branch of your database. This branch reflects all the data up to the point of attachment. More importantly, this branch is independent of your primary region, ensuring that any updates or modifications made during the test failover do not impact your production environment. For more information on branching, refer to Database Branching.
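The isolation property described above can be pictured with a toy model. The Python sketch below is an analogy only: it copies rows in memory to stand in for a branch, whereas real database branches do not consume extra storage. The class and method names are hypothetical.

```python
import copy
from typing import Dict, List, Optional


class ToyDatabase:
    """In-memory stand-in for a database, used only to illustrate branching."""

    def __init__(self, rows: Optional[List[Dict]] = None) -> None:
        self.rows: List[Dict] = list(rows or [])

    def branch(self) -> "ToyDatabase":
        # The branch reflects all data up to this point and is independent afterwards.
        return ToyDatabase(copy.deepcopy(self.rows))


production = ToyDatabase([{"id": 1, "status": "shipped"}])
test_branch = production.branch()

# Writes made during the DR test go to the branch only.
test_branch.rows.append({"id": 2, "status": "test"})
test_branch.rows[0]["status"] = "modified"

print(len(production.rows))          # 1
print(production.rows[0]["status"])  # shipped
```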

Last modified: July 10, 2024
