Database Branching

Note

This is a Preview feature.

Database branching lets you quickly create private, independent copies of your database for testing, development, data recovery, and other scenarios. Branches share the same history as their parent, but once created, they diverge and become independent databases. This means you can insert, update, and delete data in a branch without impacting the parent database’s performance or stability.

Benefits

  • It is a cost-effective solution that saves on infrastructure and time in maintaining several duplicate environments.

  • You can create multiple isolated datasets of your production database, each containing data from a specific point in time, current or from an earlier moment.

Use Cases

  • Development and Testing

    As a developer, you can instantly spin up isolated branches with the latest replica of the production database or at a specific point in time. This enables efficient testing and iterative development without impacting the production environment.

  • Point-In-Time Recovery (PITR)

    You can use branching to rewind your database to any previous point in time within the retention period and recover your data. Simply create a branch that captures your data at the desired time in the past, and seamlessly restore the lost or corrected information to your production database. You can do all this online without impacting your application.

  • Upgrades and Performance Tuning

    You can perform your application upgrades on a branch, thereby isolating and resolving any issues while the production database remains active. Any performance bugs can be investigated and tuned in the branch before implementing the solutions in the production environment.

Creating a Branch Using SQL

Note

Branching is currently supported only via SQL.

The following syntax creates a branch on the same workspace as the parent database:

ATTACH DATABASE <parentDB name> AS <branchDB name>;

The following syntax creates a branch on a different workspace from the parent database:

ATTACH DATABASE <parentDB name> FROM WORKSPACE GROUP <'group ID'> AS <branchDB name> ON WORKSPACE <workspace name>;

The following syntax creates a read-only branch on a separate workspace for scenarios such as data analysis:

ATTACH DATABASE <parentDB name> FROM WORKSPACE GROUP <'group ID'> ${attachMode} AS <BranchDB Name> ON WORKSPACE <workspace name>;

The following syntax creates a branch at a specific point in time:

ATTACH DATABASE <ParentDB Name> AS <BranchDB Name> AT TIME 'YYYY-MM-DD HH_MM_SS';

Example Scenarios

The following examples use a database called "sales", and a table called "orders". Create a database named SALES:

CREATE DATABASE sales;

Create a table called ORDERS:

CREATE TABLE orders ( 
order_id bigint(11) NOT NULL, 
customer_id int(11) NOT NULL, 
order_date date NOT NULL, 
order_status char(1) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL, 
total_price decimal(15,2) NOT NULL, 
SHARD KEY (order_id), 
SORT KEY (order_date));

Scenario 1: As a developer, you want to test a new business logic before implementing the query on a production dataset.

Create a new workspace called WORKSPACE2 (this can be done via the Portal UI), and then branch your SALES database on WORKSPACE2.

ATTACH DATABASE sales WORKSPACE GROUP '0fee214e-76dd-4ea9-918e-152e665b72ae'
AS branch_sales ON WORKSPACE 'workspace2';

If you already have WORKSPACE2 selected using the SQL Editor, then you just need to execute:

ATTACH DATABASE sales WORKSPACE GROUP '0fee214e-76dd-4ea9-918e-152e665b72ae' AS branch_sales;

You can navigate to the ORDERS table within the BRANCH_SALES database and test your new business logic without impacting the production dataset. You can insert, update or delete rows, test the performance of your queries etc.

Once done testing, you can simply drop the branch database:

DROP DATABASE branch_sales;

Scenario 2: Suppose a faulty query has deleted 1000 rows from the ORDERS table. You can recover the data by simply creating a branch of your database at a point in time before the bad query was executed.

Suppose the bad query was executed on January 2nd. You may branch the SALES database at a timestamp before the query was executed and recover the data.

ATTACH DATABASE sales WORKSPACE GROUP '0fee214e-76dd-4ea9-918e-152e665b72ae' AS recover_sales AT TIMESTAMP2024-01-02 21_57_31’;

To recover the data, you may query the missing rows and copy them to the production ORDERS table.

Alternatively, you may drop the SALES database and then rename RECOVER_SALES as SALES. Note: when you do this you will have to reconnect this newly renamed SALES database to your application.

To drop your existing SALES database:

DROP DATABASE IF EXISTS sales;

To rename RECOVER_SALES as SALES:

DETACH DATABASE recover_sales;
ATTACH DATABASE recover_sales AS sales;

Using Branches

Branches can only be created within the same workspace group as the parent database and all workspaces need to have the same engine version. You can create a branch at the current time or at a previous point in time. Branches are read/write by default. User permissions from the parent database are automatically inherited by the branch database.

Branches are independent, so any updates to the branch database are not propagated to the parent database, and updates to the parent database are not propagated to the branch databases. Similarly dropping the branch database does not impact the parent database and vice versa.

When a new branch database is created, the in-memory data and blob cache are duplicated. However, no data gets duplicated in the object store (S3) when you create a branch. You only pay for what you add. Any storage used up to the point of the branch creation is not counted again. Only new data inserted or updated in the branch adds to your storage consumption.

Last modified: February 9, 2024

Was this article helpful?