Other Schema Concepts

Reference Database

When a user creates a database, two transaction logs are created for each partition of that database. In addition, one special transaction log is created, called the reference database. This database holds metadata about the whole database (for example, the schemas of tables and stored procedures) as well as the data for special tables called reference tables. The reference database is replicated to each node in a cluster because every node needs this metadata. Further, reference tables take advantage of this database to implement joins that do not need to go over the network. The reference database has its master copy on the master aggregator, and only the master aggregator can write to it (this is why only the master aggregator can execute DDL queries).

Reference Tables

Reference tables are relatively small tables that do not need to be distributed and are present on every node in the cluster. They are implemented with primary-secondary replication: the master aggregator holds the writable copy and replicates it asynchronously to every node in the cluster, down to the leaves. Replication makes reference tables dynamic: updates that you perform to a reference table on the master aggregator are quickly reflected on every machine in the cluster. Because every node already holds a full copy of a reference table, queries never need to transfer the table's data across the network. Reference tables should mostly be used for data that changes rarely, because write operations on reference tables are replicated cluster-wide and consume considerably more resources.

SingleStore aggregators can take advantage of reference tables’ ubiquity by pushing joins between reference tables and distributed tables down onto the leaves. Imagine you have a distributed clicks table storing billions of records and a smaller customers table with just a few million records. Since the customers table is small, it can be replicated on every node in the cluster. If you run a join between the clicks table and the customers table, the bulk of the work for the join will occur on the leaves.
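
For instance, a minimal sketch of this pattern might look like the following (the clicks and customers tables and their columns are hypothetical):

CREATE REFERENCE TABLE customers (
    customer_id BIGINT PRIMARY KEY,
    name VARCHAR(100)
);

CREATE TABLE clicks (
    click_id BIGINT,
    customer_id BIGINT,
    clicked_at DATETIME,
    SHARD KEY (click_id)
);

-- Each leaf already holds a full copy of customers, so the join below
-- can be executed locally on the leaves without moving data.
SELECT c.name, COUNT(*) AS click_count
FROM clicks k
JOIN customers c ON c.customer_id = k.customer_id
GROUP BY c.name;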

Important

Reference tables are a convenient way to implement dimension tables.

Best Practices for Using Reference Tables

  • Do not use reference tables to store a large number of rows, e.g. more than a few million rows.

  • If your workload requires intensive searches for individual rows (say, for OLTP-type operations), use a rowstore reference table.

  • Use columnstore reference tables to store dimension tables for a data warehouse type schema, with a typical star-join query and non-intensive updates.

DDL Forwarding for Reference Tables

Data Definition Language (DDL) commands for reference tables can be run from the master aggregator or a child aggregator node. DDL commands run on a child aggregator are forwarded to the master aggregator and propagated from there. (See Cluster Management Commands for more information.)

The enable_query_forwarding engine variable controls DDL query forwarding for all database tables, including reference tables.
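
For example, assuming the variable can be changed at runtime like other global engine variables (verify the default and settability for your version), forwarding might be enabled as in the following sketch:

-- Sketch: allow DDL to be issued from child aggregators and forwarded
-- to the master aggregator.
SET GLOBAL enable_query_forwarding = ON;

-- This statement can then be run on a child aggregator; it is forwarded
-- to the master aggregator, which owns the reference database.
CREATE REFERENCE TABLE region (region_id INT PRIMARY KEY, name VARCHAR(50));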

DML Forwarding for Reference Tables

Data Manipulation Language (DML) commands for reference tables can be run from the master aggregator or a child aggregator node, with some exceptions. The following DML commands must be run directly on the master aggregator; they will fail with an error when run on a child aggregator:

  • The LOAD DATA command

  • Query profiling commands such as PROFILE and SHOW PROFILE, if the sync_permissions engine variable is set to ON.

  • DML statements used in stored procedures

  • DML queries that reference temporary tables

  • DML queries in multi-statement transactions that are enclosed within BEGIN and COMMIT transaction statements.

DML commands run on a child aggregator are forwarded to the master aggregator and propagated from there. (See Cluster Management Commands for more information.)

The enable_dml_query_forwarding engine variable controls DML query forwarding for reference tables.
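
As a sketch, and assuming the variable is settable like other global engine variables, DML forwarding might be enabled and used as follows (the region reference table is the hypothetical one from the earlier sketch):

-- Allow reference-table DML to be issued from child aggregators.
SET GLOBAL enable_dml_query_forwarding = ON;

-- Run on a child aggregator: the write is forwarded to the master
-- aggregator, which holds the writable copy, and then replicated out.
UPDATE region SET name = 'EMEA' WHERE region_id = 1;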

Temporary Tables

Temporary tables can be created as columnstore or rowstore tables; for more information, see CREATE TABLE. Rowstore temporary tables are created in memory. Temporary tables exist for the duration of a client session: they are scoped to the connection that opened them, cannot be queried by other users, and are dropped once the connection ends. They can also be dropped manually without closing the connection.

SingleStore does not write logs or take snapshots of temporary tables; they are designed for temporary, intermediate computations. Since temporary tables are neither persisted nor replicated, they have high availability disabled. This means that if a node in a cluster goes down and a failover occurs, all the temporary tables on the cluster lose data. Any query that references a temporary table after a node failover returns an error. For example,

"Temporary table <table> is missing on leaf <host> due to failover. The table will need to be dropped and recreated."

Important

To prevent loss of data on node failover, use SingleStore tables that have high availability enabled.

Views cannot reference temporary tables because temporary tables only exist for the duration of a client session. Although SingleStore does not materialize views, views are available as a service for all clients, and so cannot depend on client session-specific temporary tables.

CREATE TEMPORARY TABLE can be run on any aggregator, not just the master aggregator. Temporary tables are sharded tables and can be modified and queried like any permanent table, including in distributed joins.
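
A minimal sketch of a session-scoped temporary table (table and column names are hypothetical):

-- Created for this session only; other connections cannot see it.
CREATE TEMPORARY TABLE staging_events (
    event_id BIGINT,
    payload JSON
);

INSERT INTO staging_events VALUES (1, '{"type": "click"}');
SELECT COUNT(*) FROM staging_events;

-- Dropped automatically when the session ends, or manually:
DROP TABLE staging_events;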

Global Temporary Tables

Another type of temporary table is the global temporary table. Like temporary tables, global temporary tables are not persisted.

Note

Global temporary tables are not supported for columnstore.

Unlike temporary tables, global temporary tables are shared across sessions, i.e. they exist beyond the duration of a session. They are never automatically dropped and must always be dropped manually. They can also be queried by other users since they are not session dependent.

Global temporary tables can be used as a temporary space for processing data that can be accessed from multiple connections, for example, performing ETL (Extract, Transform, and Load) operations on modern dynamic databases.

If failover occurs, global temporary tables lose data and enter an errored state; they need to be dropped and recreated. However, dropping a global or non-global temporary table does not remove its plancache from disk; the cached plans are reused if the table is recreated with the same schema.

DDL such as CREATE ROWSTORE GLOBAL TEMPORARY TABLE and DML for global temporary tables can be run on the master aggregator or a child aggregator node.
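
A minimal sketch (table and column names are hypothetical):

-- Shared across sessions; persists until explicitly dropped.
CREATE ROWSTORE GLOBAL TEMPORARY TABLE etl_scratch (
    batch_id BIGINT,
    loaded_at DATETIME
);

-- Any session can read or write the table until it is dropped:
DROP TABLE etl_scratch;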

Note: Neither global temporary tables nor non-global temporary tables can be altered.

Types of Indexes

SingleStore, like all standard SQL databases, allows the creation of indexes on tables, which speed up certain access patterns. As with MySQL, SingleStore supports two index type keywords. The BTREE keyword is accepted for compatibility with MySQL syntax, but it creates the default SingleStore skiplist index instead. The sort key index type is used for columnstore tables.

Note

KEY() USING CLUSTERED COLUMNSTORE is a legacy syntax that is equivalent to SORT KEY(). SingleStore recommends using SORT KEY().

Skiplist Indexes

The default index type in SingleStore is a skiplist. Skiplists in SingleStore are meant to replace the B-Tree indexes used by most other databases, including MySQL. Skiplists are optimized to run in memory: they can be implemented lock-free and offer extremely fast insert performance. Like B-Trees, they offer expected O(log(n)) lookup performance and can be traversed in sorted order.

Unlike B-Trees in MySQL, skiplists in SingleStore are uni-directional (singly linked). Each column in a compound skiplist index can be specified as ascending (ASC) or descending (DESC). The default is ASC. Which one you pick will not impact lookup performance, but it does impact scan performance depending on the direction the index is scanned. Scanning a skiplist in reverse order is approximately twice as costly as scanning in forward order. So, if you have an ASC index and you run a query that would traverse the index in descending order (ORDER BY DESC for example), then the query will require a more expensive iteration than if the index were DESC.
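
For example, if a table is read mostly in reverse chronological order, declaring the key as DESC keeps those scans in the cheaper forward direction. A sketch, assuming hypothetical table and column names:

CREATE ROWSTORE TABLE events (
    id BIGINT,
    ts DATETIME,
    KEY (ts DESC)  -- skiplist ordered newest-first
);

-- Traverses the skiplist in its forward (cheaper) direction.
SELECT * FROM events ORDER BY ts DESC LIMIT 100;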

SingleStore supports skiplists only on rowstore tables. For more information on what a skiplist is and why it is used in SingleStore, see The Story Behind SingleStore’s Skiplist Indexes.

Columnstore Indexes

Columnstore indexes leverage columnstore technology to efficiently store and retrieve large numbers of values from disk (using flash or SSD storage is recommended). Because columnstore indexes are backed by disk and do not require all data to fit in memory, they are typically very useful for analytical workloads. SingleStore currently supports sort key indexes, which, when added to a table, make the entire table backed by the columnstore. Currently, columnstore indexes cannot be combined with in-memory rowstore indexes on the same table. For more information about using columnstore indexes in SingleStore, see How the Columnstore Works.
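
For example, a table becomes a columnstore table when it is given a sort key (names are hypothetical):

CREATE TABLE page_views (
    viewed_at DATETIME,
    url VARCHAR(2048),
    user_id BIGINT,
    SORT KEY (viewed_at)  -- makes the table a columnstore, ordered by time
);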

Hash Indexes

HASH indexes provide fast exact-match access to unique values. A hash index is stored as a sparse array of buckets indexed by a hash function over the relevant columns, e.g. hash(column_a, column_b). Queries can quickly find exact-match data by examining only the bucket identified by the hash function; however, the index cannot easily be scanned over a range of values. For multi-column indexes, query filters must match all of the index columns to be able to take advantage of the index (see the warning below for the difference between rowstore and columnstore behavior). Because of these restrictions, HASH indexes should only be used when there is a demonstrated need and a measurable benefit on your particular dataset and workload. SingleStore supports HASH indexes on both columnstore and rowstore tables. For more information, see USING HASH behavior and Highly Selective Joins.

Warning

For rowstore tables, a HASH index will only be utilized if the query filters on all the columns in a multi-column HASH index.

For columnstore tables, a HASH index will be utilized if the query filters on any (or all) of the columns in a multi-column HASH index.

Consider an example table:

CREATE TABLE t(a int, b int, INDEX(a, b) USING HASH);

Suppose we are running queries like:

SELECT * FROM t WHERE a < 3;

EXPLAIN shows us that since we are performing a range scan and not filtering on all the columns in our hash index, a full Table Scan is performed.

EXPLAIN SELECT * FROM t WHERE a < 3;
+------------------------------------------------+
| EXPLAIN                                        |
+------------------------------------------------+
| Gather partitions:all                          |
| Project [t.a, t.b]                             |
| Filter [t.a < 3]                               |
| TableScan db.t                                 |
+------------------------------------------------+

The hash index is only utilized if the query uses only equality predicates, and filters on all columns in the hash index.

EXPLAIN SELECT * FROM t WHERE a=3 AND b=7;
+--------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                  |
+--------------------------------------------------------------------------------------------------------------------------+
| Gather partitions:all                                                                                                    |
| Project [t.a, t.b]                                                                                                       |
| IndexRangeScan db.t, KEY a (a, b) USING HASH storage:lf_hashtable scan:[a = 3 AND b = 7]                                 |
+--------------------------------------------------------------------------------------------------------------------------+

Important

A query on a columnstore table that uses an equality filter on a column with a hash index defined on it will always use the hash index. For this reason, if the filter is very unselective (returns many records), a hash index should not be defined on that column.
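
As a sketch, a secondary hash index on a columnstore table might look like the following (names are hypothetical). The equality filter on user_id will use the hash index, so the index is only worth defining if such filters are selective:

CREATE TABLE user_events (
    event_time DATETIME,
    user_id BIGINT,
    SORT KEY (event_time),
    KEY (user_id) USING HASH
);

-- Equality filter on the hash-indexed column.
SELECT * FROM user_events WHERE user_id = 42;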

Runtime Plan Choosing

SingleStore can dynamically select which index to use for a query at runtime. Instead of collecting statistics and building histograms, SingleStore can compute these statistics for a given query on-demand by inspecting its indexes. If a query can match more than one index, SingleStore compiles an execution plan for each choice, along with the necessary expression logic to cheaply analyze and evaluate which plan to choose at runtime. This process eliminates the need to manually recompute statistics on indexes.

Index Hints

SingleStore supports the following index hint syntax:

tbl_name [index_hint]
index_hint:
USE {INDEX | KEY} (index_list)
| IGNORE {INDEX | KEY} (index_list)
| FORCE {INDEX | KEY} (index_list)
index_list:
index_name [, index_name] ...

For example:

SELECT * FROM t1 USE INDEX(idx_name);

-- OR

SELECT * FROM t1 FORCE INDEX(idx_name);

  • USE and FORCE hints force the use of one of the specified indexes to run the query. In SingleStore, there is no difference between a USE and FORCE hint.

  • IGNORE hints disallow the specified indexes from being used to run the query.

The EXPLAIN <query> statement can be used to show which indexes the query considers and which one it will actually use.
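
For example (reusing the hypothetical t1 and idx_name from above, with a hypothetical column id; the output depends on your schema and is omitted here):

EXPLAIN SELECT * FROM t1 USE INDEX(idx_name) WHERE id = 10;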

Index Commands

An index may also be specified when creating or altering tables.
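
For example, a sketch with hypothetical table and index names: one index declared at CREATE TABLE time and another added afterwards with CREATE INDEX.

CREATE ROWSTORE TABLE orders (
    order_id BIGINT,
    customer_id BIGINT,
    KEY (customer_id)  -- skiplist index declared at creation time
);

-- Add an index to an existing table.
CREATE INDEX idx_order_id ON orders (order_id);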

Last modified: September 13, 2024
