Choosing a Table Storage Type

The most important consideration when creating a table is its data layout. SingleStore Helios supports two types of tables: on-disk columnstores (which is the default table type for SingleStore Helios) and in-memory rowstore. Columnstore is also known as Universal Storage.

The following table enumerates the strengths and intended uses of the columnstore and rowstore.

Flash, SSD, or Disk-based Columnstore/Universal Storage

In-Memory Rowstore

Analytical workloads, but can handle operational/transactional workloads with relatively frequent updates and deletes

Demanding operational/transactional workloads with high update and delete rates

Fast inserts over a small or large number of rows

Fast inserts and updates over a small or large number of rows

Fast aggregations and table scans via sub-segment access

Random seek performance

Compression

SPARSE compression option

Conceptual Differences Between Columnstore and Rowstore

Column-oriented stores or columnstores (columnstore is also referred to as Universal Storage) treat each column as a unit. Columnstore tables sequentially store data segments for each column on the same physical location on disk. This enables two important capabilities. One capability is being able to scan only the columns needed for the query with good cache locality during the scan. The other capability is that columnstores lend themselves well to compression. For example, repeating and similar values can easily be compressed together.

In addition, columnstore tables utilize a feature called sub-segment access with allows quick retrieval of a row once its position is known via hash indexing. Columnstore tables can have multiple multi-column hash indexes. Finally, row-level locking allows many concurrent updates of multiple rows in a columnstore to proceed, without requiring transactions to wait.

Columnstores are usually batch-loaded for optimal performance, although small-batch and streaming insert to columnstores can also perform well. The combination of efficient batch loading and processing of analytic queries make columnstore highly suitable for data warehousing scenarios.

Columnstores can handle operational and transactional workloads with relatively frequent updates and deletes, but rowstore can perform better for a demanding operational and transactional workload with high update and delete rates. Column group indexes can be used on columnstore tables to speed up full-row retrievals and updates, particularly for wide tables.

In general, columnstore tables are effective when your workload matches the following descriptions as much as possible:

  • A large number of rows are scanned sequentially (i.e. millions of rows or >5% of the table)

  • Aggregation happens over only a few columns (e.g. <10 columns)

A simplified example of how columnstore data is physically stored is shown here:

Columnstore

ProductId

1

2

3

4

Color

Red x 2

Black

White

x 2 indicates that the value is repeated twice.

Price

10

20 x 3

x 3 indicates that the value is repeated three times.

Row-oriented stores or rowstores are the most common type of data stores used by relational databases. As the name suggests, a rowstore treats each row as a unit and stores all fields for a given row together in the same physical location. This results in very fast performance when running queries seek specific rows. This makes rowstores great for transactional workloads, where the database frequently selects, inserts, updates, and deletes individual rows, often referencing either most or all columns.

With rowstore, data is stored in lock-free indexes for great performance at high concurrency. Rowstore tables can use multiple indexes, allowing them to flexibly support many types of queries.

Finally, when rowstore tables are created with SPARSE compression, the tables are allowed to store more data in the same amount of RAM. Nullable, structured columns can use sparse data compression. The column data types include BIGINT, INT, MEDIUMINT, SMALLINT, TINYINT, DATE, DATETIME, TIME, TIMESTAMP, and VARCHAR.

Logical table and rowstore representation

ProductId

Color

Price

1

Red

10

2

Red

20

3

Black

20

4

White

20

Related Topics

Check Your Understanding

Q: Do columnstore tables use memory?

A: Absolutely – SingleStore Helios uses the operating system disk buffer cache to cache segment files in memory. Good performance in columnstore can only be achieved when there is enough memory to cache the working set. In addition, columnstore tables use a rowstore buffer table as a special segment to batch writes to the disk.

Q: Do rowstore tables ever write to disk?

A: Yes – data in a rowstore is also written to the transaction log on disk so that it can be recovered when SingleStore Helios is restarted.

Q: Are rowstore tables always faster than columnstore tables?

A: No, columnstore tables are faster on some workloads – if the workload is batch inserts and sequential reads (e.g. an analytical workload with lots of scans) a columnstore can be significantly faster.

Q: What if I need small high-concurrency updates and my data doesn’t fit in memory?

A: You can also use columnstore tables while setting hash index(es) on columns you regularly filter on. Rowstore tables could also be effective if you create them with SPARSE compression.

Related Topics

Last modified: February 8, 2024

Was this article helpful?