Database Branching

Note

This is a Preview feature.

Database branching lets you quickly create private, independent copies of your database for testing, development, data recovery, and other scenarios. Branches share the same history as their parent, but once created, they diverge and become independent databases. This means you can insert, update, and delete data in a branch without impacting the parent database’s performance or stability.

Benefits

  • It is a cost-effective solution that saves on infrastructure and time in maintaining several duplicate environments.

  • You can create multiple isolated datasets of your production database, each containing data from a specific point in time, current or from an earlier moment.

Use Cases

  • Development and Testing

    As a developer, you can instantly spin up isolated branches with the latest replica of the production database or at a specific point in time. This enables efficient testing and iterative development without impacting the production environment.

  • Point-In-Time Recovery (PITR)

    You can use branching to rewind your database to any previous point in time within the retention period and recover your data. Just create a branch that captures your data at the desired time in the past, and seamlessly restore the lost or corrected information to your production database. You can do all this online without impacting your application.

    The recovery will succeed only if the PITR target is within the retention period. Before attempting a PITR, check the RETENTION_PERIOD_MINUTES and EARLIEST_AVAILABLE_TIME columns in the MV_BOTTOMLESS_DATABASES view to find out if the PITR is feasible.

    For standard customers, the time defaults to midnight in the timezone where the cluster is located. For example, if your cluster is on US-EAST-1, performing PITR to 2024-9-16 defaults to 2024-9-16 00:00:00 in Eastern Standard Time.

  • Upgrades and Performance Tuning

    You can upgrade your application on a branch, isolating and resolving any issues while the production database remains active. Any performance bugs can be investigated and tuned in the branch before implementing the solutions in the production environment.

Creating a Branch Using SQL

Note

Branching is currently supported only via SQL.

It is available in all AWS regions.

The following syntax creates a branch on the same workspace as the parent database:

ATTACH DATABASE <parentDB name> AS <branchDB name>;

To create a branch on a separate workspace for development, testing, or data analysis, begin by opening SingleStore Notebooks or the SQL editor. select your desired workspace, then use this syntax:

ATTACH DATABASE <parentDB name> AS <branchDB name>;

You can also create a branch at any specific point in time for data recovery purposes, known as Point-in-Time Recovery (PITR). To do this, use the following syntax:

ATTACH DATABASE <ParentDB Name> AS <BranchDB Name> AT TIME 'YYYY-MM-DD HH_MM_SS';

Note: An enterprise package is required to perform PITR at seconds granularity (HH:MM:SS). For Standard deployments, you can still restore your data at a day's granularity by specifying only the date (YYYY-MM-DD).

Example Scenarios

The following examples use a database named sales, and a table named orders. Create a database named sales:

CREATE DATABASE sales;

Create a table named orders:

CREATE TABLE orders ( 
order_id bigint(11) NOT NULL, 
customer_id int(11) NOT NULL, 
order_date date NOT NULL, 
order_status char(1) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL, 
total_price decimal(15,2) NOT NULL, 
SHARD KEY (order_id), 
SORT KEY (order_date));

Scenario 1: As a developer, you want to test a new business logic before implementing the query on a production dataset.

Create a new workspace named 'workspace2 (this can be done via the Portal UI), navigate to this workspace in the SQL editor (under the dropdown list, select workspace2) and then run the branch command on 'workspace2.

ATTACH DATABASE sales AS branch_sales;

You can navigate to the orders table within the branch_sales database and test your new business logic without impacting the production dataset. You can insert, update or delete rows, test the performance of your queries etc.

Once done testing, you can simply drop the branch database:

DROP DATABASE branch_sales;

Scenario 2: A faulty query has deleted 1000 rows from the orders table. You can recover the data by simply creating a branch of your database at a point in time before the bad query was executed.

Suppose the bad query was executed on January 2nd. You may branch the sales database at a timestamp before the query was executed and recover the data.

ATTACH DATABASE sales AS recover_sales AT TIME2024-01-02 21_57_31’;

To recover the data, you may query the missing rows and copy them to the production orders table.

Alternatively, you may drop the sales database and then rename recover_sales as sales. Note: when you do this you will have to reconnect this newly renamed sales database to your application.

To drop your existing sales database:

DROP DATABASE IF EXISTS sales;

To rename recover_sales as sales:

DETACH DATABASE recover_sales;
ATTACH DATABASE recover_sales AS sales;

Using Branches

Branches can only be created within the same workspace group as the parent database and all workspaces need to have the same engine version. You can create a branch at the current time or at a previous point in time. Branches are read/write by default. User permissions from the parent database are automatically inherited by the branch database.

Branches are independent, so any updates to the branch database are not propagated to the parent database, and updates to the parent database are not propagated to the branch databases. Similarly dropping the branch database does not impact the parent database and vice versa.

When a new branch database is created, the in-memory data and blob cache are duplicated. However, no data gets duplicated in the object store (S3) when you create a branch. You only pay for what you add. Any storage used up to the point of the branch creation is not counted again. Only new data inserted or updated in the branch adds to your storage consumption.

Last modified: September 25, 2024

Was this article helpful?