BACKUP DATABASE

Backs up a database. The backup files are saved on the master aggregator and on the leaves.

Refer to the Back Up and Restore Data guide for additional information.

Syntax

Backup to a Local or Network Drive

BACKUP [DATABASE] <database_name> [WITH { INIT | DIFFERENTIAL | SPLIT PARTITIONS [BY 2] }] TO "backup_path" [TIMEOUT <timeout_value>] [(OPTION resource_pool = resource_pool_name)] [TAGS (<user defined tag names>];

Backup to S3

BACKUP [DATABASE] <database_name> [WITH {INIT | DIFFERENTIAL | SPLIT PARTITIONS [BY 2] }] TO S3 "bucket/path" [TIMEOUT <timeout_value>] [CONFIG configuration_json] CREDENTIALS credentials_json [(OPTION resource_pool = resource_pool_name)] [TAGS (<user defined tag names>];
configuration_json:
'{"region":"your_region",["multipart_chunk_size_mb":<size_in_MB>,]
"endpoint_url":"http://other_endpoint"
[,"x-amz-server-side-encryption":"<encryption_type>" [, "x-amz-server-side-encryption-aws-kms-key-id":"<optional_key>" ] |
"x-amz-server-side-encryption-customer-algorithm":"<encryption_type>",
"x-amz-server-side-encryption-customer-key":"<encrypted_or_unencrypted_key>",
"x-amz-server-side-encryption-customer-key-MD5":"<key>"
]
}'
credentials_json:
'{"aws_access_key_id": "replace_with_your_access_key_id",
"aws_secret_access_key": "replace_with_your_secret_access_key",
["aws_session_token": "replace_with_your_temp_session_token",]
["role_arn":"replace_with_your_role_arn"]
}'

Backup to an S3 Compatible Storage Provider

BACKUP [DATABASE] <database_name> [WITH {INIT | DIFFERENTIAL | SPLIT PARTITIONS [BY 2] }] TO S3 "bucket/path" [TIMEOUT <timeout_value>] [CONFIG configuration_json] CREDENTIALS credentials_json [(OPTION resource_pool = resource_pool_name)];
configuration_json:
'{"endpoint_url":"http://storage_provider_endpoint"
[,"s3_force_path_style":"<true|false>"]
}'
credentials_json:
'{"aws_access_key_id": "replace_with_your_access_key_id",
"aws_secret_access_key": "replace_with_your_secret_access_key"
}'

Backup to Google Cloud Storage

BACKUP [DATABASE] <database_name> [WITH {INIT | DIFFERENTIAL | SPLIT PARTITIONS [BY 2] }] TO GCS "bucket/path" [TIMEOUT <timeout_value>] [CONFIG configuration_json] CREDENTIALS credentials_json [(OPTION resource_pool = resource_pool_name)] [TAGS (<user defined tag names>];
configuration_json:
'{}'
credentials_json:
'{"access_id": "replace_with_your_google_access_key",
"secret_key": "replace_with_your_google_secret_key"
}'

See the remarks for more details.

Backup to Azure Blobs

BACKUP DATABASE <database_name> [WITH {INIT | DIFFERENTIAL | SPLIT PARTITIONS [BY 2] }] TO AZURE "container/blob-prefix" [TIMEOUT <timeout_value>] CREDENTIALS credentials_json [(OPTION resource_pool = resource_pool_name)] [TAGS (<user defined tag names>];
credentials_json:
'{"account_name": "your_account_name_here",
"account_key": "your_account_key_here"
}'
BACKUP [DATABASE] <database_name> TO LINK [link_database_name.]connection_name "backup_path" [TIMEOUT <timeout_value>] [TAGS (<user defined tag names>];

Arguments

  • <database_name> is the SingleStore database while link_database_name is the database that stores the connection link.

  • resource_pool_name is optional. It is the name of the resource pool where the backup operation runs.

  • Use the [WITH { INIT | DIFFERENTIAL }] clause to make an incremental backup. If you don’t specify this clause, a full backup is made.

    Note

    The [WITH { INIT | DIFFERENTIAL }] clause is not supported for backing up an unlimited storage database.

  • Use the [WITH {SPLIT PARTITIONS [BY 2]}] clause to double the number of partitions in your database in an offline manner. See Using BACKUP WITH SPLIT PARTITIONS for details..

Remarks

  • Because backups go directly from each node to the backup destination, all nodes must have write access to the destination.

  • Disk usage increases while backing up non-bottomless databases. As a database is being backed up blobs are written to disk where they remain until after the backup is finished (or killed). Once the backup is finished the regular cleanup process removes the blobs from disk.

  • CONFIG and CREDENTIALS can be specified in either order (CONFIG followed by CREDENTIALS or CREDENTIALS followed by CONFIG).

  • The multipart_chunk_size_mb must be in the range of [5..500]. By default, the chunk size is 5 MB. A larger chunk size allows users to upload large files without going over Amazon’s limitation on maximum number of parts per upload. Although, a larger chunk size increases the chance of a network error during the upload to S3. If a chunk fails to upload, SingleStore retries uploading it until the limit set on the number of retries by AWS is reached. Each partition will use "multipart_chunk_size_mb" MB(s) of additional memory.

  • The BACKUP DATABASE command writes a consistent snapshot of the database to disk. This means a backup stores the data in the database as it existed at the time the BACKUP DATABASE operation started (all committed transactions at the time the BACKUP DATABASE started will be stored in the backup). Backup files containing the database’s reference tables are saved on the master aggregator. Backup files are also saved on the leaves.

  • This command can be run on the master aggregator node, or a child aggregator node. For more information, see Node Requirements for SingleStore Commands.

  • This command causes implicit commits. See COMMIT for more information.

  • See Backing Up and Restoring Data for a list of items that are included in or excluded from a backup.

  • BACKUP DATABASE is an online operation, which means that writes will continue to run while the BACKUP DATABASE is in progress. Any writes executed after the BACKUP DATABASE has started will not be reflected in the stored backup.

  • All partitions of the database need to be in the online or replicating state to run a BACKUP DATABASE operation. The backup file is created on the server and not the client, unlike mysqldump.

  • By default, BACKUP DATABASE makes a lock-free backup. An exception is BACKUP DATABASE ... WITH SPLIT PARTITIONS because the WITH SPLIT PARTITIONS option does not support lock-free backups.

  • Incremental backups contain incremental data only for columnstore tables. For rowstore tables, an incremental backup contains a full copy of the rowstore data (instead of incremental data).

  • A backup of system databases cannot be created.

  • Remote databases, i.e. databases on a secondary cluster to which the primary database is being replicated, cannot be backed up with the BACKUP DATABASE command.

  • TIMEOUT, a value specified in milliseconds, determines the length of time to wait for the BACKUP DATABASE command to commit across the cluster. If not specified, the default_distributed_ddl_timeout global variable value is used.

  • TAGS can have a list of descriptive labels that help in locating backups with the same tag.

  • This command cannot be run on system databases.

  • The subprocess_io_idle_timeout_ms engine variable controls the amount of time the engine waits for before timing out and failing the backup. You can try increasing it's value (e.g. 1800000ms) to avoid the timeout error.

  • Refer to the Permission Matrix for the required permission.

  • BACKUP ... TO a 3rd party file storage system (S3, Azure Blob, or GSC) requires BACKUP and OUTBOUND privileges.

    Without the required privileges, an error will be generated:

    BACKUP ... TO:
    BACKUP t1 TO S3 'testing/output'
    CONFIG '{"region":"us-east-1"}'
    CREDENTIALS '{"aws_access_key_id":"your_access_key_id","aws_secret_access_key":"your_secret_access_key"}';
    ERROR 1227 (42000): Access denied; you need (at least one of) the OUTBOUND privilege(s) for this operation

    How to set the privileges for a user to be allowed to execute this command:

    BACKUP ... TO:
    GRANT BACKUP, OUTBOUND ON *.* to user1;

    The OUTBOUND privilege will be displayed in the SHOW GRANTS command:

    SHOW GRANTS FOR user1;
    +---------------------------------------------+
    | Grants for user1@%                          |
    +---------------------------------------------+
    | GRANT BACKUP, OUTBOUND ON *.* TO 'user1'@'% |
    +---------------------------------------------+
  • There are two different types of backup files, partition level backups and columnstore backups. With rowstore, you only see partition level backups but with columnstore you can see both. Example of rowstore and columnstore backup::

    c2-user@leaf2(71):/tmp/adtech0806> ls -lrt
    2total 113576
    3-rw-------. 1 memsql memsql  4580594 Oct  1 21:06 adtech_2.backup
    4-rw-------. 1 memsql memsql 14013022 Oct  1 21:06 adtech_6.backup
    5-rw-r--r--. 1 memsql memsql 49177088 Oct  1 21:06 adtech_2.backup_columns0.tar
    6-rw-r--r--. 1 memsql memsql 48519680 Oct  1 21:06 adtech_6.backup_columns0.tar

    Example of only a rowstore backup on a single leaf:

    =ec2-user@leaf1(123):~> ls -l /tmp/inmembackup/
    2total 1593540
    3-rw-------. 1 memsql memsql 203873226 Oct  1 21:58 rstoreonly_0.backup
    4-rw-------. 1 memsql memsql 204036519 Oct  1 21:58 rstoreonly_12.backup
    5-rw-------. 1 memsql memsql 203773854 Oct  1 21:58 rstoreonly_16.backup
    6-rw-------. 1 memsql memsql 203912757 Oct  1 21:58 rstoreonly_20.backup
    7-rw-------. 1 memsql memsql 204124707 Oct  1 21:58 rstoreonly_24.backup

Backing up to a Local or Network Drive

  • When a local backup is made, the backup files are saved on the SingleStore master aggregator node and leaf nodes.

  • Backing up to a network drive is preferable to backing up locally. This is because if any backup files stored locally on the SingleStore nodes become corrupted or are lost, the entire backup will not be able to be restored.

Local Backups

  • The path backup_path on each node needs to be accessible by the memsqld process.

  • The backup_path is relative to the directory stored in the datadirengine variable by default. You can include \ at the front of the backup_path to override the default and use an absolute path.

Note

Incremental backups cannot be stored on a local file system.

Backup to S3

  • BACKUP ... TO S3 writes the backup files to the specified S3 bucket.

  • SingleStore backup will add database_name.backup to the path and put the objects in the path path/database_name.backup/.

  • bucket/path/database_name.backup must not currently exist; otherwise, an error will be returned.

  • aws_session_token is optional. This is only needed if your AWS account uses the AWS Security Token Service.

  • region is optional. It is the AWS region where S3 bucket resides.

  • endpoint_url is optional.

  • role_arn is optional and can be used instead of specifying aws_access_key_id and aws_secret_access_key values. This is only needed if your AWS security is set up to require a role.The role needs to be alive for the entire backup process and will result in an assume_role on every S3 API call. It is specified as {"role_arn": "xxxx"} in the credentials JSON.

  • compatibility_mode makes it possible to backup to Google Cloud Storage (GCS) using S3’s API.  However, GCS does not support multipart uploads, which is the default mechanism for uploading large files. (In a multi-part upload, S3 breaks up the data in a file into small chunks which are automatically merged together on the server side resulting in a single file.)   In compatibility_mode:true certain optimizations are turned off to allow the use of a non-default uploader automatically. Note this may be less resilient to random network failures because the retries use small chunks. By default, it is not advisable to use large chunks as it will lead to memory overhead for the whole system.

  • Backup to an S3 bucket requires the following permissions:

    s3::GetObject
    s3::PutObject
    s3::ListBucket
  • As an alternative to using a BACKUP ... TO S3 statement where you specify the CONFIG and CREDENTIALS clauses, you can use the BACKUP ... TO LINK statement, where you reference a connection link. For more information, see Configuring and Using Connection Links.

Backup to an S3 Compatible Storage Provider

  • BACKUP ... TO S3 with the endpoint_url set to the URL of a storage provider creates an S3 compatible backup at the storage provider.

  • SingleStore backup will add database_name.backup to the path and put the objects in the path path/database_name.backup/.

  • bucket/path/database_name.backup must not currently exist; otherwise, an error will be returned.

  • aws_access_key_id is optional. It is the access key id for accessing the storage provider.

  • aws_secret_access_key is optional. It is the secret access key for accessing the storage provider.

  • You should specify the compatibility_mode setting when backing up to Dell EMC ECS object storage.

  • If you're backing up to an S3 compatible storage you should specify the compatibility_mode:true setting in your backup command.

  • As an alternative to using a BACKUP ... TO S3 statement where you specify the CONFIG and CREDENTIALS clauses, you can use the BACKUP ... TO LINK statement, where you reference a connection link. For more information, see Configuring and Using Connection Links.

  • s3_force_path_style is an optional boolean JSON config option that defaults to true. It specifies whether to use path style (the default: region.amazonaws.com/bucket) or virtual address style (bucket.region.amazonaws.com) syntax when specifying the location of the bucket.

    For s3 compatible services that only support virtual address paths, specify "s3_force_path_style":"false" in the config options.

Backup to Google Cloud Storage

  • The CONFIG clause may optionally specify an endpoint_url.

  • The CREDENTIALS clause is required

  • We support only HMAC keys

  • The CREDENTIALS clause should be a JSON object with two fields:

    access_id: usually a 24 or 60 character alphanumeric string, which is linked to the Google account, typically all uppercase and starts with GOOG.

    secret_key: usually a 40 character Base-64 encoded string that is linked to a specific access_id.

As an alternative to using a BACKUP ... TO GCS statement where you specify the CONFIG and CREDENTIALS clauses, you can use the BACKUP ... TO LINK statement, where you reference a connection link. For more information, see Configuring and Using Connection Links.

Back up to Azure Blobs

  • BACKUP ... TO AZURE writes the backup files to the specified Azure container.

  • SingleStore backup will append database_name.backup to the blob prefix and will put objects into the pathblob-prefix/database_name.backup/.

  • container/blob-prefix/database_name.backup must not currently exist; otherwise, an error will be returned.

  • As an alternative to using a BACKUP ... TO AZURE statement where you specify the CREDENTIALS clause, you can use the BACKUP ... TO LINK statement, where you reference a connection link. For more information, see Configuring and Using Connection Links.

  • BACKUP ... TO AZURE does not support Azure Data Lake Storage v2.

  • The BACKUP.. TO LINK command writes the backup files to the specified container using a connection link. To use this command, you only need to know the connection link name, not the connection details and configuration.

  • This command supports backup to S3, Azure, and GCS connections only.

Examples

Making an Incremental Backup

The Backing Up and Restoring Data topic provides an example of making incremental backups.

All of the examples shown below make full backups.

Local Backup

The following example backs up a database to /my-backup-dir.

BACKUP DATABASE memsql_demo TO "/my-backup-dir";

The following example backs up a database to the directory stored in the datadir engine variable.

BACKUP DATABASE memsql_demo TO "./";

The following example backs up a database to the my-relative-backup-dir directory, which is relative to the directory stored in the datadir engine variable. In this example, if datadir contained the directory /var/lib/memsql/leaf-3308-aaaaaaaaaa/data, the backup would be saved to the /var/lib/memsql/leaf-3308-aaaaaaaaaa/data/my-relative-backup-dir directory.

BACKUP DATABASE memsql_demo TO "./my-relative-backup-dir";

The following example creates a backup with tags.

BACKUP DATABASE db1 TO "/tmp/db1.backup" TAGS ("production", "no-prune");

S3 Backup

The following example backs up a database to an S3 bucket.

BACKUP DATABASE memsql_demo TO S3 "backup_bucket/backups/6_1_2018"
CONFIG '{"region":"us-east-1"}'
CREDENTIALS '{"aws_access_key_id":"your_access_key_id","aws_secret_access_key":"your_secret_access_key"}';

The following examples demonstrate the CONFIG clause for backing up to encrypted buckets.

--- For Server-Side Encryption (SSE) ---
CONFIG '{"region":"us-east-1", "x-amz-server-side-encryption":"AES256"}'
--- For Server-Side Encryption - Key Management Service (SSE-KMS) ---
CONFIG '{"region":"us-east-1",
"x-amz-server-side-encryption":"aws:kms",
"x-amz-server-side-encryption-aws-kms-key-id": "<optional_key>"}'
--- For Server-Side Encryption with Customer-Provided Encryption Keys (SSE-C) ---
CONFIG '{"region":"us-east-1", "endpoint_url":"<target_endpoint_url>",
"x-amz-server-side-encryption-customer-algorithm":"AES256",
"x-amz-server-side-encryption-customer-key":"<key>",
"x-amz-server-side-encryption-customer-key-MD5":"<key>"}'

The following example demonstrates the use of the compatibility_mode clause:

BACKUP <dbname> TO S3 "<target-bucket>"
CONFIG '{"compatibility_mode": true, "endpoint_url": "https://storage.googleapis.com"}'
CREDENTIALS '{"aws_access_key_id": "<keyid>....",  "aws_secret_access_key ": “<access-key>"}';

Using virtual address path style syntax, available beginning in version 7.6:

BACKUP test TO S3 "mytestdb"
CONFIG '{"endpoint_url":"https://mybucket.mys3compatstorageprovider.com/", "compatibility_mode": true, "s3_force_path_style": "false"}'
CREDENTIALS '{"aws_access_key_id": "<your access key ID here","aws_secret_access_key": "your secret key here"}';

Backup to Google Cloud Storage

The following example backs up a database to a Google Cloud Storage bucket.

BACKUP DATABASE memsql_demo TO GCS "backup_bucket/backups/6_1_2018"
CREDENTIALS '{"access_id":"your_google_access_key","secret_key":"your_google_secret_key"}';

Azure Blobs Backup

The following example backs up a database to an Azure bucket.

BACKUP DATABASE memsql_demo TO Azure "backup_container/backups/6_1_2018"
CREDENTIALS '{"account_name":"your_account_name_here","account_key":"your_account_key_here"}';

Backup Using a Resource Pool

The following example backs up a database to /my-backup-dir, using an existing resource pool, general.

BACKUP DATABASE memsql_demo TO "/my-backup-dir" OPTION (resource_pool = general);

The following example backs up the database db1 to the bucket at the specified path, using an S3 connection link named S3_link stored in the database db2. Ensure that S3_link already exists in db2. Refer to the CREATE_LINK topic for details on creating a connection link.

BACKUP DATABASE db1 TO LINK db2.S3_link "backup_path/backups/6_1_2020"

Note: db1 is the SingleStore database while db2 is the database that stores the connection link S3_link.

CONFIG Clause Examples

This clause is optional and needs to be specified only if non-default values are used.

For S3, you can use any options that are supported by S3.

To specify a non-default region, use:

CONFIG '{"region": "us-west-1"}'

The disable_gunzip option disables the decompression of files with the .gz extension. If this option is disabled or missing, files with the .gz extension will be decompressed.

CONFIG '{"disable_gunzip" : true}'

You can specify suffixes for files such that only files that have the specified suffix are considered. These suffixes are a JSON array of strings.

CONFIG '{"suffixes": ["csv"]}'

You can specify that the requester will be charged for the request. Bucket owners need not specify this parameter in their requests.

CONFIG '{"region": "us-east-1","request_payer": "requester"}' 

For Azure you can specify,

CONFIG '{"blob_endpoint": "http://%s:%d/%s" % (self.host, self.API_PORT, self.account_name)}'

Last modified: November 29, 2023

Was this article helpful?