Local and Unlimited Database Storage Concepts
On this page
You can store data locally in a SingleStore cluster or remotely in an object store.
Local Storage Databases
Local storage databases are stored in your SingleStore cluster.
To backup or restore a local storage database, use the BACKUP DATABASE and RESTORE DATABASE commands, respectively.
Unlimited Storage Databases
Note
This feature is not available in all editions of SingleStore.
Unlimited storage databases (also known as "bottomless" databases) are kept on remote object storage.
The cache size is set by the maximum_
The blobs for high availability (HA) are kept in the blob cache.maximum_
includes the local storage used for HA.
For unlimited storage databases, SingleStore caches data from remote storage on local disks or SSDs.
The views mv_
, mv_
and mv_
can be used to monitor the cache behavior.
In an unlimited storage database:
-
All columnstore data is stored externally in an object store.
Recently accessed columnstore data objects are also cached locally on the cluster's local storage. -
Columnstore data can be committed locally, so you do not have to wait for it to get uploaded, which helps keep the latency as low as possible.
-
All rowstore data is stored locally in cluster memory and externally in an object store.
-
Data updates made on the cluster are flushed to the object store asynchronously.
Typically, the object store will be no more than one minute behind the latest update on the cluster. Data is imported into an on-disk rowstore, and then asynchronously exported into unlimited storage on the cloud. This is the key to the high performance as data is ingested into the rowstore, "cold" data is put into long term online storage, keeping the "hot" data on disk where it is faster to access. -
Data is compressed (lz4) upon insertion into remote storage and decompressed upon retrieval.
This seamless process can help reduce network bandwidth and storage costs. The compression ratio is highly data-dependent. For rowstore data, it can be as much as 20x. For columnstores, the data is already compressed, but lz4 can compress it further, often in the range of 20% to up to 2x. -
Data is always stored in an object store but may only have a subset in local storage in your cluster.
Querying data that is in object store but not in local storage is comparatively slower the first time as it has to bring the data into the local storage. Hence it is useful to know how much data and what data is in the cache. -
When you resume a cluster the data does not automatically fault back into the local storage.
You will have to run queries to "warm the cache" on a resume.
Note
You cannot use the REPLICATE DATABASE
command to replicate an unlimited storage database.
Disk Cache with Unlimited Storage
To ensure consistent query performance when querying through a hash index, irrespective of when the query was last accessed, the OPTIMIZE TABLE .
command can be used to explicitly warm the disk cache with the index blobs for specific tables.
Local Disk Space Recommendations for Unlimited Databases
There’s no one-size-fits-all recommendation because every application has different access and locality-of-reference patterns.
If you have a very large database and you know that you have a high locality of reference, then having the local storage just, say, 20% larger than your working set size can be a good choice.
To monitor the rate of blob cache fetch activity, you can query information about fetch events in the mv_
information schema table.
You can also query mv_
which tells you how much data is being downloaded from the blob cache per second.
Use the following query to find the total number of GB for an entire database:
SELECT SUM(mv_columnstore_files.size)/(1024*1024*1024) AS dataSizeGBFROM mv_columnstore_filesWHERE mv_columnstore_files.database_name = 'database_name';
Use the following query to find the amount of data in GB that is cached in a local disk in a cluster:
SELECT COUNT(*) AS numSegments,SUM(mv_cached_blobs.size)/(1024*1024*1024) AS blobdataSizeGBFROM mv_cached_blobsWHERE mv_cached_blobs.database_name = 's2_dataset_tpch'AND type = 'primary';
You have to run the above two queries against the information_
database.
Point-in-Time Recovery (PITR)
Note
This feature is not available in all editions of SingleStore.
PITR is a user-initiated operation that restores a unlimited storage database to a point in time.
The PITR window or timeline is the length of the retention period.
For unlimited storage databases in SingleStore, garbage collection cannot occur at a given version until all partitions can recover from that version.
For SingleStore, you control the PITR timeline by setting the bottomless_
variable.
When you invoke PITR, you can specify either a timestamp or a milestone.
Attaching (restoring) an unlimited storage database can be faster than restoring an equivalent local storage database.
Prior to working with PITR, the enable_
engine variable must be set to ON
(the default value).
To work with PITR, use the following commands:
-
CREATE DATABASE (Create an unlimited storage database)
-
CREATE MILESTONE (Create a milestone, which is a marker to which you can restore)
-
ATTACH DATABASE (Attach an unlimited storage database and bring the database online)
-
DETACH DATABASE (Take an unlimited storage database offline)
For creating milestones and restoring a database to a milestone, see Attach an Unlimited Storage Database Using Point-in-Time Recovery (PITR).
Last modified: April 24, 2024