Understanding Shard Key Selection
Data Distribution into Partitions

Data is distributed among the partitions on the leaf nodes based on the shard key selected when a table is created. Rows with the same shard key value are stored in the same partition, even if they belong to different tables.

For example, the table below has a shard key that contains a single column, first. All people with the same first name will be stored in the same partition.

CREATE TABLE people (
    user VARCHAR(24),
    first VARCHAR(24), 
    last VARCHAR(24),
    SHARD KEY(first)
);

Inside the node, the values of the first column are distributed across all partitions as evenly as possible.

(Figure: data skew illustration)

But using a first name column as the shard key can create data skew because some first names are more common than others. Data skew occurs when data is unevenly distributed across partitions.

SingleStore computes a hash from all the columns in the shard key and uses this hash to assign each row to a partition, so rows with the same shard key value always reside on the same partition. If first is used as the shard key, the data could be distributed among the partitions as follows.

(Figure: data skew diagram)

In the first partition, three of the rows have the same shard key value ("alex"), so they are assigned to the same partition. The row with the shard key value "tom" simply happened to hash to the first partition as well. The same applies to the other three partitions. As a result, the data in the people table is distributed unevenly across partitions, causing data skew. If this table had millions of rows, the skew could be even more pronounced.

When data is skewed, some partitions store more data than others, throwing off the balance of the cluster. Partitions with more data than average will likely be asked to do more work during SELECT, UPDATE, and DELETE queries. Since a query executes only as fast as its slowest partition, data skew degrades overall performance.

Choosing an appropriate shard key for a table is important for minimizing data skew. In the CREATE TABLE example, the user column would be a better choice for the shard key in terms of even data distribution: user names are more evenly distributed than first names, and user is likely to appear as a filter in queries against the table. However, it is best not to use a varchar column as the shard key at all, both because of its inherent potential for data skew and because varchar values take longer to process during query execution. Since JOIN operations are generally performed on the shard key, a varchar shard key can hurt performance. In our example, you would get optimal data distribution by creating an auto-increment id column and using it as the shard key.

CREATE TABLE people (
    id INT AUTO_INCREMENT PRIMARY KEY,
    user VARCHAR(24),
    first VARCHAR(24), 
    last VARCHAR(24),
    SHARD KEY(id)
);

On a rowstore table, if a primary key is specified without a shard key, then the primary key will be used as the shard key. If the table doesn't have a primary key or a shard key, the engine will perform keyless sharding on that table. This means the engine performs a hash of the entire row, then randomly maps the resulting hash to a partition.
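
A minimal sketch of both cases, assuming the default rowstore table type; the table and column names are illustrative:

-- Primary key, no explicit shard key: the primary key doubles as the shard key
CREATE TABLE orders (
    order_id BIGINT PRIMARY KEY,
    amount DECIMAL(10,2)
);

-- Neither a primary key nor a shard key: the engine uses keyless sharding
CREATE TABLE clicks (
    url VARCHAR(255),
    ts DATETIME
);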

Detecting and Measuring Data Skew

The following methods can be used to detect and measure data skew.

Method 1: Inspect memory settings

With any version of SingleStore, you can manually compare maximum_table_memory and Alloc_table_memory (memory used by tables) on each node:

SHOW VARIABLES LIKE "maximum_%";
****
+----------------------+-------+
| Variable_name        | Value |
+----------------------+-------+
| maximum_memory       | 48291 |
| maximum_table_memory | 43462 |
+----------------------+-------+
SHOW STATUS EXTENDED LIKE "Alloc_table_memory";
****
+--------------------+----------+
| Variable_name      | Value    |
+--------------------+----------+
| Alloc_table_memory | 43462 MB |
+--------------------+----------+
1 row in set (0.01 sec)

If memory consumption is significantly higher on specific nodes, skew is present. If memory consumption is fairly uniform across the leaf nodes, no skew is present.

Method 2: Find the number of rows and memory use per partition

The built-in INFORMATION_SCHEMA database contains metadata about tables, columns, indexes, and partitions. In particular, the TABLE_STATISTICS table contains one row per table partition, and can be used to see if a given partition has a lot more rows than average:

SELECT
    DATABASE_NAME,
    TABLE_NAME,
    ORDINAL AS PARTITION_ID,
    ROWS,
    MEMORY_USE
FROM INFORMATION_SCHEMA.TABLE_STATISTICS
WHERE TABLE_NAME = 'domains';
****
+---------------+------------+--------------+-------+------------+
| DATABASE_NAME | TABLE_NAME | PARTITION_ID | ROWS  | MEMORY_USE |
+---------------+------------+--------------+-------+------------+
| web           | domains    | 0            | 85916 |   11527936 |
| web           | domains    | 1            | 85925 |   11528512 |
| web           | domains    | 2            | 85427 |   11627712 |
| web           | domains    | 3            | 86185 |   11545152 |
| web           | domains    | 4            | 85592 |   11376128 |
| web           | domains    | 5            | 85549 |   11504448 |
...

Method 3: Calculate the memory skew for all tables, per database

Use the relative standard deviation (RSD) to calculate the row and memory skew across all partitions for every table in your database. The RSD is a simple relative measure of skew.

SELECT
    DATABASE_NAME,
    TABLE_NAME,
    FLOOR(AVG(ROWS)) AS avg_rows,
    ROUND(STDDEV(ROWS)/AVG(ROWS),3) * 100 AS row_skew,
    FLOOR(AVG(MEMORY_USE)) AS avg_memory,
    ROUND(STDDEV(MEMORY_USE)/AVG(MEMORY_USE),3) * 100 AS memory_skew
FROM INFORMATION_SCHEMA.TABLE_STATISTICS
GROUP BY 1, 2
HAVING SUM(ROWS) > 10000
ORDER BY row_skew DESC;
****
+---------------+----------------------+----------+----------+--------------+-------------+
| DATABASE_NAME | TABLE_NAME           | avg_rows | row_skew | avg_memory   | memory_skew |
+---------------+----------------------+----------+----------+--------------+-------------+
| web           | urls                 | 28881401 |    9.520 |  25733328291 |       3.200 |
| web           | domains              |   902817 |    0.100 |    111046491 |       0.200 |
...

A skew score of N means that there is about N% variation in the number of rows (or memory allocated) in a partition. As a general rule of thumb, skew less than 10% should not greatly affect system stability or performance and is not a cause for worry. If you do see persistent skew and observe other issues as well, such as a server running particularly hot, then you should further investigate the distribution of the values in your chosen shard keys.
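
For example, to surface only the tables that exceed the 10% rule of thumb, you could tighten the HAVING clause of the query above (the thresholds are illustrative):

SELECT
    DATABASE_NAME,
    TABLE_NAME,
    FLOOR(AVG(ROWS)) AS avg_rows,
    ROUND(STDDEV(ROWS)/AVG(ROWS),3) * 100 AS row_skew
FROM INFORMATION_SCHEMA.TABLE_STATISTICS
GROUP BY 1, 2
HAVING SUM(ROWS) > 10000
   AND ROUND(STDDEV(ROWS)/AVG(ROWS),3) * 100 > 10
ORDER BY row_skew DESC;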

Method 4: Calculate the skew per partition for the columns in a table

Use the relative standard deviation (RSD) to measure, for each partition, how evenly rows are distributed across the groups formed by a candidate set of columns.

A high skew factor indicates that some groups have very high cardinality while others have very low cardinality, which can lead to long single-threaded operations.

SELECT ROUND(STDDEV(c)/AVG(c),3)*100 AS group_skew,
       PARTITION_ID()
FROM (SELECT col1,
             col2,
             ...,
             coln,
             COUNT(*) AS c
      FROM <table_name>
      GROUP BY 1, 2, ..., n) sub
GROUP BY PARTITION_ID();
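
For example, using the people table from earlier with a candidate grouping on (first, last):

SELECT ROUND(STDDEV(c)/AVG(c),3)*100 AS group_skew,
       PARTITION_ID()
FROM (SELECT first,
             last,
             COUNT(*) AS c
      FROM people
      GROUP BY 1, 2) sub
GROUP BY PARTITION_ID();
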
Method 5: Show the data distribution for a table if you were to reshard on another column or set of columns

This query reshuffles the data by the candidate shard key and shows how many rows would land in each partition:
SELECT WITH(leaf_pushdown=true) SUM(c) AS rows,
       PARTITION_ID()
FROM (SELECT COUNT(*) c
      FROM <table_name>
      GROUP BY <potential_shard_key>) reshuffle
GROUP BY PARTITION_ID();
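
For example, to preview how the rows of the people table would distribute if it were resharded on user:

SELECT WITH(leaf_pushdown=true) SUM(c) AS rows,
       PARTITION_ID()
FROM (SELECT COUNT(*) c
      FROM people
      GROUP BY user) reshuffle
GROUP BY PARTITION_ID();
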
Resolving Data Skew

SingleStore does not support the use of ALTER TABLE to add or change shard keys on existing tables. If your data becomes skewed, there are a couple of workarounds to reduce the skew in a table.

One method to reduce skew in a table is to redistribute the data using INSERT INTO ... SELECT with the force_random_reshuffle query hint. A common use case for this hint is keyless sharded tables, as it forces data to be redistributed among partitions rather than inserted locally into the same partition.

1) ORIGINAL TABLE 
####
CREATE TABLE people (
    id INT,
    user VARCHAR(24),
    first VARCHAR(24), 
    last VARCHAR(24) 
);

2) CREATE ANOTHER TABLE WITH A SHARD KEY
####
CREATE TABLE people_1 (
    id INT AUTO_INCREMENT,
    user VARCHAR(24),
    first VARCHAR(24), 
    last VARCHAR(24),
    SHARD KEY(id)
);

3) INSERT INTO...SELECT using force_random_reshuffle
####
INSERT INTO people_1 (id, user, first, last)    
    SELECT WITH(force_random_reshuffle=1) *    
    FROM people;

Notice

To use this method, the column count in both tables must be the same or you will receive an error message.

Another method is to export your data (the entire database must be exported). Next, drop the table in question using the DROP TABLE syntax. Then recreate the table with a different shard key, or with a composite shard key on multiple columns, that will distribute the data more evenly. When you reload the data, SingleStore will automatically shard it based on the new shard key.

1a) RECREATE TABLE WITH A SINGLE SHARD KEY
####
CREATE TABLE people (
    id INT AUTO_INCREMENT,
    user VARCHAR(24),
    first VARCHAR(24), 
    last VARCHAR(24),
    SHARD KEY(id)
);

1b) RECREATE TABLE WITH A COMPOSITE (MULTI-COLUMN) SHARD KEY
####
CREATE TABLE people (
    id INT AUTO_INCREMENT,
    user VARCHAR(24),
    first VARCHAR(24), 
    last VARCHAR(24),
    SHARD KEY(id,user)
);
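
After recreating the table, reload the exported data. A minimal sketch, assuming the export produced a comma-delimited file at a hypothetical path:

2) RELOAD THE EXPORTED DATA
####
-- The file path and CSV format here are assumptions; your export tooling
-- may differ. SingleStore shards the incoming rows by the new shard key
-- as they are inserted.
LOAD DATA INFILE '/tmp/people.csv'
INTO TABLE people
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n';
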
Questions to Ask When Choosing Shard Keys
Local Joins and Distributed Joins

A local join is a join that can be executed entirely within the partition where the data is currently stored. For this to happen, the joined tables must have shard keys that match the join condition, so the best approach is to set the shard key on the column you will join on most frequently. In our earlier example of the people table with id as the shard key, say we are joining to an address table within the same database. To ensure the join executes locally, make sure the address table also has an id column and uses it as its shard key, as shown in the sketch below.
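
A minimal sketch of that setup, assuming a hypothetical address table:

CREATE TABLE address (
    id INT,
    street VARCHAR(48),
    city VARCHAR(24),
    SHARD KEY(id)
);

-- Both tables are sharded on id, so matching rows sit on the same partition
-- and the join executes locally:
SELECT p.first, p.last, a.city
FROM people p
JOIN address a ON p.id = a.id;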

A distributed join is a join where matching rows may reside on different partitions because the tables are not sharded on the join condition. Distributed joins are executed in one of two ways (a sketch showing how to inspect the chosen strategy follows this list):

1) Repartition:

  • Each partition opens connections to every node.

  • Because the join requires the table to be sharded on the join condition, the query optimizer creates an internal, temporary results table in memory. This table contains the result set at that stage of the query, with a shard key that matches the join condition.

2) Broadcast:

  • Occurs when one of the joined tables is small, so the cost of sending a full copy of it to every node is low.

  • After the broadcast, the join is performed on the partitions where the larger table's rows are stored.
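
To see which strategy the optimizer chooses, inspect the query plan with EXPLAIN. A minimal sketch using the tables defined earlier (exact plan output varies by version); joining on user, which is not the shard key of either table, forces a distributed join:

EXPLAIN SELECT COUNT(*)
FROM people p
JOIN people_1 q ON p.user = q.user;
-- Look for Repartition or Broadcast operators in the plan output.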

Performance differences between repartition and broadcast operations

Queries that require distributed joins, i.e., queries that involve broadcasts or repartitions, generally require far more connections and threads than other queries. The concurrency of such queries is automatically managed by the Workload Manager. If you have any questions about this functionality, please contact us via the customer portal. For more information, see Using the Workload Manager.

In most scenarios, the default workload management settings will schedule your workload appropriately, utilizing cluster resources without exhausting connection and thread limits. If you encounter performance problems with distributed join queries run at high concurrency, see Setting Resource Limits for information about how to configure resource limit settings.