Choosing a Columnstore Key
On this page
Columnstore tables have exactly one index, the sort key.
In addition, SingleStore stores metadata for each row segment in memory, which includes the minimum and maximum values for each column contained within the segment.
For example, we will use this columnstore table:
CREATE TABLE products (ProductId INT,Color VARCHAR(10),Price INT,Quantity INT,SORT KEY (Price));
The following table represents a logical arrangement of the data in a single partition of the database:
ProductId |
Color |
Price |
Quantity |
---|---|---|---|
1 |
Red |
10 |
2 |
2 |
Red |
20 |
2 |
3 |
Black |
20 |
2 |
4 |
White |
30 |
2 |
5 |
Red |
20 |
2 |
6 |
Black |
10 |
2 |
7 |
White |
25 |
2 |
8 |
Red |
30 |
2 |
9 |
Black |
50 |
2 |
10 |
White |
15 |
2 |
11 |
Red |
5 |
2 |
12 |
Red |
20 |
2 |
13 |
Black |
35 |
2 |
14 |
White |
30 |
2 |
15 |
Red |
4 |
2 |
Since this is a columnstore table, the data is split into segments.x N
to indicate that a value is repeated N times.
Row segment #1 of 3:
Row segment #2 of 3:
Row segment #3 of 3:
In this example, we can see there are 3 segments containing different ranges of the Price
column (4-15, 20-25, and 30-50).
Segment Elimination
The single most important consideration for columnstore keys is increasing the amount of segment elimination.
For queries that filter on the key column of the columnstore index, segment elimination is typically very efficient as row segments within each row segment group will not have overlapping segments.SELECT AVG(Price), AVG(Qty) FROM Products WHERE Price BETWEEN 1 AND 10;
will eliminate all segments except row segment #1 Price 4-15.
The segment size (number of rows per segment) also impacts query execution with respect to sort keys.
Ordered Scans
Ordered scan over a sort key is faster than sorting a table.
Questions to Ask When Choosing a Columnstore Key:
-
Is the data always filtered by some column (e.
g. insert timestamp or event type)? Ensure that the common columns for all queries are in the columnstore key to improve segment elimination. -
Is the data generally inserted in order by some column (e.
g. insert timestamp)? It’s best to put that column first in the columnstore key to minimize the amount of work required by the background columnstore segment merger. -
Does one column in your key have a higher cardinality than the other? It’s best to put the lowest cardinality columns first to increase the likelihood that segment elimination will be able to affect later columns.
Check Your Understanding
Q: Can the columnstore key and the shard key be different?
A: Absolutely – they are not related at all.event_
) to improve join performance and to pick an unrelated columnstore key that matches the common filters on the table (e.event_
or event_
).
Q: Is (insert_
a good columnstore key?
A: Not necessarily, because the number of distinct microsecond precision timestamps is likely very high.insert_
and not use region_
.region_
.DATE_
.
For more information, see How the Columnstore Works.
Last modified: August 4, 2022