BACKUP DATABASE
Backs up a database. The backup files are saved on the master aggregator and on the leaves.
Refer to the Back Up and Restore Data guide for additional information.
Syntax
Backup to a Local or Network Drive
BACKUP [DATABASE] <database_name> [WITH { INIT | DIFFERENTIAL | SPLIT PARTITIONS [BY 2] }] TO "backup_path" [TIMEOUT <timeout_value>] [(OPTION resource_pool = resource_pool_name)] [TAGS (<user defined tag names>];
Backup to S3
BACKUP [DATABASE] <database_name> [WITH {INIT | DIFFERENTIAL | SPLIT PARTITIONS [BY 2] }] TO S3 "bucket/path" [TIMEOUT <timeout_value>] [CONFIG configuration_json] CREDENTIALS credentials_json [(OPTION resource_pool = resource_pool_name)] [TAGS (<user defined tag names>];
configuration_json:
'{"region":"your_region",["multipart_chunk_size_mb":<size_in_MB>,]
"endpoint_url":"http://other_endpoint"
[,"x-amz-server-side-encryption":"<encryption_type>" [, "x-amz-server-side-encryption-aws-kms-key-id":"<optional_key>" ] |
"x-amz-server-side-encryption-customer-algorithm":"<encryption_type>",
"x-amz-server-side-encryption-customer-key":"<encrypted_or_unencrypted_key>",
"x-amz-server-side-encryption-customer-key-MD5":"<key>"
]
}'
credentials_json:
'{"aws_access_key_id": "replace_with_your_access_key_id",
"aws_secret_access_key": "replace_with_your_secret_access_key",
["aws_session_token": "replace_with_your_temp_session_token",]
["role_arn":"replace_with_your_role_arn"]
}'
Backup to an S3 Compatible Storage Provider
BACKUP [DATABASE] <database_name> [WITH {INIT | DIFFERENTIAL | SPLIT PARTITIONS [BY 2] }] TO S3 "bucket/path" [TIMEOUT <timeout_value>] [CONFIG configuration_json] CREDENTIALS credentials_json [(OPTION resource_pool = resource_pool_name)];
configuration_json:
'{"endpoint_url":"http://storage_provider_endpoint"
[,"s3_force_path_style":"<true|false>"]
}'
credentials_json:
'{"aws_access_key_id": "replace_with_your_access_key_id",
"aws_secret_access_key": "replace_with_your_secret_access_key"
}'
Backup to Google Cloud Storage
BACKUP [DATABASE] <database_name> [WITH {INIT | DIFFERENTIAL | SPLIT PARTITIONS [BY 2] }] TO GCS "bucket/path" [TIMEOUT <timeout_value>] [CONFIG configuration_json] CREDENTIALS credentials_json [(OPTION resource_pool = resource_pool_name)] [TAGS (<user defined tag names>];
configuration_json:
'{}'
credentials_json:
'{"access_id": "replace_with_your_google_access_key",
"secret_key": "replace_with_your_google_secret_key"
}'
See the remarks for more details.
Backup to Azure Blobs
BACKUP DATABASE <database_name> [WITH {INIT | DIFFERENTIAL | SPLIT PARTITIONS [BY 2] }] TO AZURE "container/blob-prefix" [TIMEOUT <timeout_value>] CREDENTIALS credentials_json [(OPTION resource_pool = resource_pool_name)] [TAGS (<user defined tag names>];
credentials_json:
'{"account_name": "your_account_name_here",
"account_key": "your_account_key_here"
}'
Backup Using a Connection Link
BACKUP [DATABASE] <database_name> TO LINK [link_database_name.]connection_name "backup_path" [TIMEOUT <timeout_value>] [TAGS (<user defined tag names>];
Arguments
<database_name>
is the SingleStoreDB database whilelink_database_name
is the database that stores the connection link.resource_pool_name
is optional. It is the name of the resource pool where the backup operation runs.Use the
[WITH { INIT | DIFFERENTIAL }]
clause to make an incremental backup. If you don’t specify this clause, a full backup is made.Note
The
[WITH { INIT | DIFFERENTIAL }]
clause is not supported for backing up an unlimited storage database.Use the
[WITH {SPLIT PARTITIONS [BY 2]}]
clause to double the number of partitions in your database in an offline manner. See Using BACKUP WITH SPLIT PARTITIONS for details..
Remarks
Because backups go directly from each node to the backup destination, all nodes must have write access to the destination.
Disk usage increases while backing up non-bottomless databases. As a database is being backed up blobs are written to disk where they remain until after the backup is finished (or killed). Once the backup is finished the regular cleanup process removes the blobs from disk.
CONFIG
andCREDENTIALS
can be specified in either order (CONFIG
followed byCREDENTIALS
orCREDENTIALS
followed byCONFIG
).The
multipart_chunk_size_mb
must be in the range of [5..500]. By default, the chunk size is 5 MB. A larger chunk size allows users to upload large files without going over Amazon’s limitation on maximum number of parts per upload. Although, a larger chunk size increases the chance of a network error during the upload to S3. If a chunk fails to upload, SingleStoreDB retries uploading it until the limit set on the number of retries by AWS is reached. Each partition will use "multipart_chunk_size_mb" MB(s) of additional memory.The
BACKUP DATABASE
command writes a consistent snapshot of the database to disk. This means a backup stores the data in the database as it existed at the time theBACKUP DATABASE
operation started (all committed transactions at the time theBACKUP DATABASE
started will be stored in the backup). Backup files containing the database’s reference tables are saved on the master aggregator. Backup files are also saved on the leaves.This command can be run on the master aggregator node, or a child aggregator node. For more information, see Node Requirements for SingleStoreDB Commands.
This command causes implicit commits. See COMMIT for more information.
See Backing Up and Restoring Data for a list of items that are included in or excluded from a backup.
BACKUP DATABASE
is an online operation, which means that writes will continue to run while theBACKUP DATABASE
is in progress. Any writes executed after theBACKUP DATABASE
has started will not be reflected in the stored backup.All partitions of the database need to be in the online or replicating state to run a
BACKUP DATABASE
operation. The backup file is created on the server and not the client, unlikemysqldump
.By default,
BACKUP DATABASE
makes a lock-free backup. An exception isBACKUP DATABASE ... WITH SPLIT PARTITIONS
because theWITH SPLIT PARTITIONS
option does not support lock-free backups.Incremental backups contain incremental data only for columnstore tables. For rowstore tables, an incremental backup contains a full copy of the rowstore data (instead of incremental data).
A backup of system databases cannot be created.
Remote databases, i.e. databases on a secondary cluster to which the primary database is being replicated, cannot be backed up with the
BACKUP DATABASE
command.TIMEOUT
, a value specified in milliseconds, determines the length of time to wait for theBACKUP DATABASE
command to commit across the cluster. If not specified, thedefault_distributed_ddl_timeout
global variable value is used.TAGS
can have a list of descriptive labels that help in locating backups with the same tag.This command cannot be run on system databases.
The
subprocess_io_idle_timeout_ms
engine variable controls the amount of time the engine waits for before timing out and failing the backup. You can try increasing it's value (e.g. 1800000ms) to avoid the timeout error.See the Permission Matrix for the required permission.
There are two different types of backup files, partition level backups and columnstore backups. With rowstore, you only see partition level backups but with columnstore you can see both. Example of rowstore and columnstore backup::
c2-user@leaf2(71):/tmp/adtech0806> ls -lrt 2total 113576 3-rw-------. 1 memsql memsql 4580594 Oct 1 21:06 adtech_2.backup 4-rw-------. 1 memsql memsql 14013022 Oct 1 21:06 adtech_6.backup 5-rw-r--r--. 1 memsql memsql 49177088 Oct 1 21:06 adtech_2.backup_columns0.tar 6-rw-r--r--. 1 memsql memsql 48519680 Oct 1 21:06 adtech_6.backup_columns0.tar
Example of only a rowstore backup on a single leaf:
=ec2-user@leaf1(123):~> ls -l /tmp/inmembackup/ 2total 1593540 3-rw-------. 1 memsql memsql 203873226 Oct 1 21:58 rstoreonly_0.backup 4-rw-------. 1 memsql memsql 204036519 Oct 1 21:58 rstoreonly_12.backup 5-rw-------. 1 memsql memsql 203773854 Oct 1 21:58 rstoreonly_16.backup 6-rw-------. 1 memsql memsql 203912757 Oct 1 21:58 rstoreonly_20.backup 7-rw-------. 1 memsql memsql 204124707 Oct 1 21:58 rstoreonly_24.backup
Backing up to a Local or Network Drive
When a local backup is made, the backup files are saved on the SingleStoreDB master aggregator node and leaf nodes.
Backing up to a network drive is preferable to backing up locally. This is because if any backup files stored locally on the SingleStoreDB nodes become corrupted or are lost, the entire backup will not be able to be restored.
Local Backups
The path
backup_path
on each node needs to be accessible by thememsqld
process.The
backup_path
is relative to the directory stored in thedatadir
engine variable by default. You can include\
at the front of thebackup_path
to override the default and use an absolute path.
Note
Incremental backups cannot be stored on a local file system.
Backup to S3
BACKUP ... TO S3
writes the backup files to the specified S3 bucket.SingleStoreDB backup will add
database_name.backup
to the path and put the objects in the pathpath/database_name.backup/
.bucket/path/database_name.backup
must not currently exist; otherwise, an error will be returned.aws_session_token
is optional. This is only needed if your AWS account uses the AWS Security Token Service.region
is optional. It is the AWS region where S3 bucket resides.endpoint_url
is optional.role_arn
is optional and can be used instead of specifyingaws_access_key_id
andaws_secret_access_key
values. This is only needed if your AWS security is setup to require a role.compatibility_mode
makes it possible to backup to Google Cloud Storage (GCS) using S3’s API. However, GCS does not support multipart uploads, which is the default mechanism for uploading large files. (In a multi-part upload, S3 breaks up the data in a file into small chunks which are automatically merged together on the server side resulting in a single file.) Incompatibility_mode:true
certain optimizations are turned off to allow the use of a non-default uploader automatically. Note this may be less resilient to random network failures because the retries use small chunks. By default, it is not advisable to use large chunks as it will lead to memory overhead for the whole system.Backup to an S3 bucket requires the following permissions:
s3::GetObject s3::PutObject s3::ListBucket
As an alternative to using a
BACKUP ... TO S3
statement where you specify theCONFIG
andCREDENTIALS
clauses, you can use theBACKUP ... TO LINK
statement, where you reference a connection link. For more information, see Configuring and Using Connection Links.
Backup to an S3 Compatible Storage Provider
BACKUP ... TO S3
with theendpoint_url
set to the URL of a storage provider creates an S3 compatible backup at the storage provider.SingleStoreDB backup will add
database_name.backup
to the path and put the objects in the pathpath/database_name.backup/
.bucket/path/database_name.backup
must not currently exist; otherwise, an error will be returned.aws_access_key_id
is optional. It is the access key id for accessing the storage provider.aws_secret_access_key
is optional. It is the secret access key for accessing the storage provider.You should specify the
compatibility_mode
setting when backing up to Dell EMC ECS object storage.If you're backing up to an S3 compatible storage you should specify the
compatibility_mode:true
setting in your backup command.As an alternative to using a
BACKUP ... TO S3
statement where you specify theCONFIG
andCREDENTIALS
clauses, you can use theBACKUP ... TO LINK
statement, where you reference a connection link. For more information, see Configuring and Using Connection Links.s3_force_path_style
is an optional boolean JSON config option that defaults to true. It specifies whether to use path style (the default: region.amazonaws.com/bucket) or virtual address style (bucket.region.amazonaws.com) syntax when specifying the location of the bucket.For s3 compatible services that only support virtual address paths, specify
"s3_force_path_style":"false"
in the config options.
Backup to Google Cloud Storage
The
CONFIG
clause may optionally specify anendpoint_url
.The
CREDENTIALS
clause is requiredWe support only HMAC keys
The
CREDENTIALS
clause should be a JSON object with two fields:access_id
: usually a 24 or 60 character alphanumeric string, which is linked to the Google account, typically all uppercase and starts withGOOG
.secret_key
: usually a 40 character Base-64 encoded string that is linked to a specificaccess_id
.
As an alternative to using a BACKUP ... TO GCS
statement where you specify the CONFIG
and CREDENTIALS
clauses, you can use the BACKUP ... TO LINK
statement, where you reference a connection link. For more information, see Configuring and Using Connection Links.
Back up to Azure Blobs
BACKUP ... TO AZURE
writes the backup files to the specified Azure container.SingleStoreDB backup will append
database_name.backup
to the blob prefix and will put objects into the “path”blob-prefix/database_name.backup/
.container/blob-prefix/database_name.backup
must not currently exist; otherwise, an error will be returned.As an alternative to using a
BACKUP ... TO AZURE
statement where you specify theCREDENTIALS
clause, you can use theBACKUP ... TO LINK
statement, where you reference a connection link. For more information, see Configuring and Using Connection Links.BACKUP ... TO AZURE
does not support Azure Data Lake Storage v2.
Backup Using a Connection Link
The
BACKUP.. TO LINK
command writes the backup files to the specified container using a connection link. To use this command, you only need to know the connection link name, not the connection details and configuration.This command supports backup to S3, Azure, and GCS connections only.
Examples
Making an Incremental Backup
The Backing Up and Restoring Data topic provides an example of making incremental backups.
All of the examples shown below make full backups.
Local Backup
The following example backs up a database to /my-backup-dir
.
BACKUP DATABASE memsql_demo TO "/my-backup-dir";
The following example backs up a database to the directory stored in the datadir
engine variable.
BACKUP DATABASE memsql_demo TO "./";
The following example backs up a database to the my-relative-backup-dir
directory, which is relative to the directory stored in the datadir
engine variable. In this example, if datadir
contained the directory /var/lib/memsql/leaf-3308-aaaaaaaaaa/data
, the backup would be saved to the /var/lib/memsql/leaf-3308-aaaaaaaaaa/data/my-relative-backup-dir
directory.
BACKUP DATABASE memsql_demo TO "./my-relative-backup-dir";
The following example creates a backup with tags.
BACKUP DATABASE db1 TO "/tmp/db1.backup" TAGS ("production", "no-prune");
S3 Backup
The following example backs up a database to an S3 bucket.
BACKUP DATABASE memsql_demo TO S3 "backup_bucket/backups/6_1_2018" CONFIG '{"region":"us-east-1"}' CREDENTIALS '{"aws_access_key_id":"your_access_key_id","aws_secret_access_key":"your_secret_access_key"}';
The following example backs up a database to an S3 bucket using an Amazon Resource Name (ARN) for AWS Identity and Access Management (IAM).
BACKUP DATABASE mydatabase TO S3 'my-bucket/memsql-backup/' CONFIG '{"region":"us-east-1"}' CREDENTIALS '{"role_arn": "arn:aws:iam::<AccountID>:role/EC2AmazonS3FullAccess"}';
The following examples demonstrate the CONFIG
clause for backing up to encrypted buckets.
--- For Server-Side Encryption (SSE) --- CONFIG '{"region":"us-east-1", "x-amz-server-side-encryption":"AES256"}'
--- For Server-Side Encryption - Key Management Service (SSE-KMS) --- CONFIG '{"region":"us-east-1", "x-amz-server-side-encryption":"aws:kms", "x-amz-server-side-encryption-aws-kms-key-id": "<optional_key>"}'
--- For Server-Side Encryption with Customer-Provided Encryption Keys (SSE-C) --- CONFIG '{"region":"us-east-1", "endpoint_url":"<target_endpoint_url>", "x-amz-server-side-encryption-customer-algorithm":"AES256", "x-amz-server-side-encryption-customer-key":"<key>", "x-amz-server-side-encryption-customer-key-MD5":"<key>"}'
The following example demonstrates the use of the compatibility_mode
clause:
BACKUP <dbname> TO S3 "<target-bucket>" CONFIG '{"compatibility_mode": true, "endpoint_url": "https://storage.googleapis.com"}' CREDENTIALS '{"aws_access_key_id": "<keyid>....", "aws_secret_access_key ": “<access-key>"}';
Using virtual address path style syntax, available beginning in version 7.6:
BACKUP test TO S3 "mytestdb" CONFIG '{"endpoint_url":"https://mybucket.mys3compatstorageprovider.com/", "compatibility_mode": true, "s3_force_path_style": "false"}' CREDENTIALS '{"aws_access_key_id": "<your access key ID here","aws_secret_access_key": "your secret key here"}';
Backup to Google Cloud Storage
The following example backs up a database to a Google Cloud Storage bucket.
BACKUP DATABASE memsql_demo TO GCS "backup_bucket/backups/6_1_2018" CREDENTIALS '{"access_id":"your_google_access_key","secret_key":"your_google_secret_key"}';
Azure Blobs Backup
The following example backs up a database to an Azure bucket.
BACKUP DATABASE memsql_demo TO Azure "backup_container/backups/6_1_2018" CREDENTIALS '{"account_name":"your_account_name_here","account_key":"your_account_key_here"}';
Backup Using a Resource Pool
The following example backs up a database to /my-backup-dir
, using an existing resource pool, general
.
BACKUP DATABASE memsql_demo TO "/my-backup-dir" OPTION (resource_pool = general);
Backup Using a Connection Link
The following example backs up the database db1
to the bucket at the specified path, using an S3 connection link named S3_link
stored in the database db2
. Ensure that S3_link
already exists in db2
. Refer to the CREATE_LINK topic for details on creating a connection link.
BACKUP DATABASE db1 TO LINK db2.S3_link "backup_path/backups/6_1_2020"
Note: db1
is the SingleStoreDB database while db2
is the database that stores the connection link S3_link
.
CONFIG
Clause Examples
This clause is optional and needs to be specified only if non-default values are used.
For S3, you can use any options that are supported by S3.
To specify a non-default region
, use:
CONFIG '{"region": "us-west-1"}'
The disable_gunzip
option disables the decompression of files with the .gz extension. If this option is disabled or missing, files with the .gz extension will be decompressed.
CONFIG '{"disable_gunzip" : true}'
You can specify suffixes for files such that only files that have the specified suffix are considered. These suffixes are a JSON array of strings.
CONFIG '{"suffixes": ["csv"]}'
You can specify that the requester will be charged for the request. Bucket owners need not specify this parameter in their requests.
CONFIG '{"region": "us-east-1","request_payer": "requester"}'
For Azure you can specify,
CONFIG '{"blob_endpoint": "http://%s:%d/%s" % (self.host, self.API_PORT, self.account_name)}'