Rowstore

The total combined size of all rowstore tables is limited by the total available RAM on the leaf nodes in the workspace. It is important to reserve a reasonable amount of RAM, say 20%, for query execution. So the effective total rowstore table capacity will be reduced by this amount.

Note

Rowstore is no longer the default table storage format. As of version 7.3 of SingleStore, Columnstore is the default table storage format.

For applications that do many point lookups and small inserts, updates, and deletes, rowstore performs much better than columnstore. In general, since rowstores support a larger variety of workloads, they are a good starting point.

This section provides a conceptual overview of SingleStore Helios’s rowstore and includes considerations for optimizing your database performance using the rowstore.

Creating a Rowstore Table

Typically you will specify a shard key and one or more indexes for a rowstore, although a shard key and indexes are optional. A primary key, which is enforced to be unique, is also supported.

On rowstore tables, when the shard key is not the same column as the primary key, a skiplist index will also implicitly be created on the shard key columns.

Important

SingleStore supports both ordered (skiplist) and hash indices on rowstore tables.

Here is an example of a statement that creates a rowstore table:

CREATE ROWSTORE TABLE products (
ProductId INT,
Color VARCHAR(10),
Price INT,
dt DATETIME,
KEY (Price),
SHARD KEY (ProductId)
);

We define a SHARD KEY to explicitly control the data distribution. We define it on ProductId since sharding on a high cardinality column or columns generally allows for a more even distribution and prevents skew. The KEY specified on Price causes an index to be created on the Price column.

It is also possible to randomly distribute data by either omitting the shard key, or defining an empty shard key SHARD KEY(), as long as no primary key is defined.

Rowstore keys

Rowstores with multiple indexes support very fast seeking to find one or a small number of rows via several different keys, or access paths. SingleStore Helios rowstores are able to provide extremely fast lookup along multiple different access paths with low variance in response time from query to query.

The following statement creates a products table with keys on both Price and Color, and a unique (primary) key on ProductId.

CREATE ROWSTORE TABLE products (
ProductId INT,
Color VARCHAR(10),
Price INT,
dt DATETIME,
KEY (Price),
KEY (Color),
PRIMARY KEY(ProductId),
SHARD KEY (ProductId)
);

Primary keys must contain all the columns in the shard key so they can be enforced efficiently by looking at data in only a single shard (partition). Creating a primary key without specifying a shard key automatically shards on the primary key.

You can also create a key by creating an index on a rowstore using the CREATE INDEX statement.

Rowstore persistence

Rowstore data is fully persistent and updates on rowstore tables are done in transactions. Persistence is implemented for rowstores using periodic snapshots of the in-memory data, and a write-ahead log, which are both stored on a file system to make them permanent. If a SingleStore Helios node restarts, all its rowstore data will be recovered from the snapshots and log and the in-memory state for the rowstore will be rebuilt.

Garbage collection

Unneeded columnstore segments stored in memory are eliminated in a process called garbage collection. A buffer manager maintains unused memory blocks for future use until no query or operation is using the unused memory anymore.

The garbage collection process automatically runs in the background, so it's normally not necessary to execute the TRIGGER GC [FLUSH] command which manually activates the garbage collector. However, if memory-related errors or node failures are being generated, executing TRIGGER GC [FLUSH] might be appropriate. You can specify the number of garbage collection threads to run via the rowstore_gc_thread_count engine variable. See TRIGGER GC for more information about triggering garbage collection.

Porting applications to SingleStore Helios rowstores

If you are porting an application from another database that uses a disk-based rowstore structure based on B-tree, ISAM or a similar access method, you should start by using a SingleStore Helios rowstore with a key for each index on the original system. If the original system had a heap access method (unordered rowstore), you can start with a SingleStore Helios rowstore with no key.

OLTP operations on large data sets

For OLTP operations on large tables, the total cost of provisioning enough RAM, and possibly enough servers, when using rowstores, can become a significant concern for some users. If this is the case, and the application doesn’t need the very fastest row lookup time, consider using columnstore tables with hash indexes on the most frequently-used lookup key columns instead. Or, for wide tables with many null values, the total cost of ownership (TCO) can be reduced by using rowstore tables with SPARSE data compression.

See The Story Behind SingleStore’s Skiplist Indexes for details about how standard rowstore table indexes are implemented in SingleStore Helios.

Rowstore Sizing Estimations

In order to estimate the total memory used for a rowstore table, we simply need to estimate the amount of memory used for each row, and then multiply that by the number of rows in the table.

Memory Use Per Row

  • data: memory use of each column in the table, which can be found by utilizing the size per each data type listed in Data Types.

    • Be sure to consider changes in size that result from a column being nullable.

  • metadata overhead: 24 bytes

  • index(es):

    • 40 bytes per skiplist index

      • Skiplist indexes are created on rowstore tables by default in the following situations: if no primary key is specified, with the primary key unless otherwise specified, and when added explicitly after the table has already been been created.

    • 32 bytes per hash index

      • Hash indexes are created on Rowstore tables in the following situations: with the primary key when specified, and when added explicitly after the table has already been created.

    • For more information about keys and indexes see Understanding Keys and Indexes in SingleStore.

Memory Use Per Table

Based on the memory use per row, memory use for Rowstore tables has the following base formula:

(data + 64 bytes + (index size * number of secondary indexes)) * number of rows

data is the data size based on number of columns and their respective Data Type, 64 bytes is the metadata overhead + the primary key index (or the index created in its place where one isn't specified), index size is the size of the index type of any additional indexes on the table, number of secondary indexes is the number of additional indexes, and number of rows is the total number of rows in a given table.

Additional Sizing Estimates for Rowstore Table Overhead

  • Additional indexes and columns cause overhead to increase. The amount of increase depends on index and data type.

  • Every partition in a database will consume the same amount of table overhead. For example, if a leaf node has 8 partitions, and a table has a constant overhead of 32 MB, that lead will consume at least 256 MB of memory for that table.

  • Per row, a table with SPARSE compression enabled consumes 4 bytes per every 8 SPARSE columns, plus an additional 8 bytes. 

  • Deleted rows consume an additional 40 bytes, plus 8 bytes per index until they are physically deleted by the engine.

    • DELETE queries mark rows as deleted; they don't physically delete them right away. The row is removed at a later time when garbage collection runs, and the version can be safely deleted. Row garbage collection runs in the background, however OPTIMIZE TABLE can be used to force it to run immediately. Note that OPTIMIZE TABLE can result in a performance impact and should be run with caution.

Last modified: August 12, 2024

Was this article helpful?