SNAPSHOT DATABASE

Syntax

SQL

SNAPSHOT [DATABASE] <database_name>;

Remarks

database_name is the name of the database to snapshot.
The engine variable, snapshot_trigger_size, sets a threshold for the size, in bytes, that the transaction log must reach before the next snapshot is taken.
A user must have either the SUPER or BACKUP privilege to run this command.
SNAPSHOT DATABASE forces a manual snapshot, and can only be run on the master aggregator.
This command causes implicit commits. Refer to COMMIT for more information.
Refer to the Permissions Matrix for the required permissions.

This command forces data to the object store if you are using bottomless. This is useful before a heavy maintenance operation, like a cluster resize, to ensure that all the data is on S3 so that you can attach back to that point or later.

Snapshots occur automatically and can be manually triggered with the SNAPSHOT command.

During normal operation, SingleStore logs writes and DDL commands to transaction log files and periodically takes snapshots of the database.

When a snapshot is taken, a compact snapshot of the in-memory portion of a database is written to disk. This snapshot contains only in-memory data; that is, it contains rowstore segments of columnstore tables, metadata information about columnstore segments, and rowstore tables.

Snapshots occur at the following times:

When the transaction log reached the threshold set by snapshot_trigger_size
After large ingests to columnstore tables, during which new columnstore segments are written or merged during ingest, and old (immutable) segments need to be cleared out. The snapshot effectively accomplishes this, since they are stored as long as the transaction log references them.

The SNAPSHOT DATABASE command allows you to force a snapshot operation on a given database across all the partitions in a cluster. SNAPSHOT DATABASE is primarily useful to compact the logs to enable faster recovery if the SingleStore service is restarted. SingleStore recommends performing a manual SNAPSHOT on a database prior to performing maintenance operations (such as upgrades) that require you to take leaves offline, especially if your workload is highly transactional.

Consider a database where DDL queries being run frequently cancel each other out, such as CREATE TABLE and DROP TABLE for the same table. These commands would always be written to the transaction log as two commands. Comparatively, a snapshot of the same set of transactions would merge out these operations. Effectively, this means reduced recovery work in cases where a snapshot is taken immediately prior to a restart or any maintenance. This is because SingleStore only needs to recover the snapshot, rather than replaying the transaction log and recovering the previous snapshot.

Note: For workloads with ETL transactions, some users choose to take a snapshot at the end of each transaction in order to keep the cluster ready for a restart. This is not always necessary, however. In many cases, the amount of work being done during ETL is enough to make the transaction log surpass the snapshot_trigger_size (sometimes more than once), or enough data is ingested to trigger a snapshot.

Although taking frequent snapshots is beneficial in many cases, users who utilize DR cluster replication should exercise caution when configuring snapshot_trigger_size or triggering manual snapshots with SNAPSHOT DATABASE. For example, consider a case where a secondary DR cluster falls behind replaying transactions from snapshot A, while the primary cluster is logging transactions to what will be snapshot B. Then, the primary cluster ingests enough data to trigger snapshot B, and moves on to logging what will be snapshot C. The secondary cluster no longer has the capability to catch up on old write operations from snapshot A, and will reprovision instead. This causes the secondary cluster to drop its data, and start replaying the newest snapshot B from the primary cluster. While the secondary cluster reprovisions, it cannot be queried, which could impact any analytical workload running on the secondary cluster.

For more information about replication, see Replicating Data Across Clusters.

Example

SQL

SNAPSHOT DATABASE database_name;

Query OK, 0 rows affected (45.36 sec)