Choosing a Table Storage Type
The most important consideration when creating a table is its data layout. SingleStoreDB supports two types of tables: on-disk columnstores (which is the default table type for SingleStoreDB) and in-memory rowstore. Columnstore is also known as Universal Storage.
The following table enumerates the strengths and intended uses of the columnstore and rowstore.
Flash, SSD, or Disk-based Columnstore/Universal Storage | In-Memory Rowstore |
---|---|
Analytical workloads | Operational/transactional workloads |
Fast inserts over a small or large number of rows | Fast inserts and updates over a small or large number of rows |
Fast aggregations and table scans via sub-segment access | Random seek performance |
Updates/deletes are rare | Updates/deletes are frequent |
Compression |
|
Conceptual Differences Between Columnstore and Rowstore
Column-oriented stores or “columnstores” (columnstore is also referred to as Universal Storage) treat each column as a unit. Columnstore tables sequentially store data segments for each column on the same physical location on disk. This enables two important capabilities. One capability is being able to scan only the columns needed for the query with good cache locality during the scan. The other capability is that columnstores lend themselves well to compression. For example, repeating and similar values can easily be compressed together.
In addition, columnstore tables utilize a feature called sub-segment access with allows quick retrieval of a row once its position is known via hash indexing. Columnstore tables can have multiple multi-column hash indexes. Finally, row-level locking allows many concurrent updates of multiple rows in a columnstore to proceed, without requiring transactions to wait.
Columnstores are usually batch-loaded for optimal performance, although small-batch and streaming insert to columnstores can also perform well. The combination of efficient batch loading and processing of analytic queries make columnstore highly suitable for data warehousing scenarios.
In general, columnstore tables are effective when your workload matches the following descriptions as much as possible:
A large number of rows are scanned sequentially (i.e. millions of rows or >5% of the table)
Aggregation happens over only a few columns (e.g. <10 columns)
Small updates and deletes are rare; most affect large batches of rows
A simplified example of how columnstore data is physically stored is shown here:
Columnstore
ProductId |
---|
1 |
2 |
3 |
4 |
Color |
---|
Red x 2 |
Black |
White |
“x 2” indicates that the value is repeated twice.
Price |
---|
10 |
20 x 3 |
“x 3” indicates that the value is repeated three times.
Row-oriented stores or “rowstores” are the most common type of data stores used by relational databases. As the name suggests, a rowstore treats each row as a unit and stores all fields for a given row together in the same physical location. This results in very fast performance when running queries seek specific rows. This makes rowstores great for transactional workloads, where the database frequently selects, inserts, updates, and deletes individual rows, often referencing either most or all columns.
With rowstore, data is stored in lock-free indexes for great performance at high concurrency. Rowstore tables can use multiple indexes, allowing them to flexibly support many types of queries.
Finally, when rowstore tables are created with SPARSE compression, the tables are allowed to store more data in the same amount of RAM. Nullable, structured columns can use sparse data compression. The column data types include BIGINT
, INT
, MEDIUMINT
, SMALLINT
, TINYINT
, DATE
, DATETIME
, TIME
, TIMESTAMP
, and VARCHAR
.
Logical table and rowstore representation
ProductId | Color | Price |
---|---|---|
1 | Red | 10 |
2 | Red | 20 |
3 | Black | 20 |
4 | White | 20 |
Related Topics
Training: SingleStoreDB Storage
Check Your Understanding
Q: Do columnstore tables use memory?
A: Absolutely – SingleStoreDB uses the operating system disk buffer cache to cache segment files in memory. Good performance in columnstore can only be achieved when there is enough memory to cache the working set. In addition, columnstore tables use a rowstore buffer table as a special segment to batch writes to the disk.
Q: Do rowstore tables ever write to disk?
A: Yes – data in a rowstore is also written to the transaction log on disk so that it can be recovered when SingleStoreDB is restarted.
Q: Are rowstore tables always faster than columnstore tables?
A: No, columnstore tables are faster on some workloads – if the workload is batch inserts and sequential reads (e.g. an analytical workload with lots of scans) a columnstore can be significantly faster.
Q: What if I need small high-concurrency updates and my data doesn’t fit in memory?
A: You can also use columnstore tables while setting hash index(es) on columns you regularly filter on. Rowstore tables could also be effective if you create them with SPARSE
compression.
Related Topics
Training: SingleStoreDB Storage