Query Execution
On this page
Distributed DML Query Execution
In this topic we will look at common DML query patterns and how they are executed through the distributed system.
You can use the EXPLAIN command to examine a query’s node-level query plans.
Let’s assume the following schema:
CREATE TABLE a (a1 int,a2 int,a3 int,SHARD KEY (a1, a2),KEY (a3));CREATE TABLE b (b1 int,b2 int,b3 int,SHARD KEY (b1, b2));CREATE REFERENCE TABLE r (r1 int,r2 int,PRIMARY KEY (r1),KEY (r2));
Index Matching
Matching on the shard key or the secondary index.
Matching the Shard Key
If you specify an equality on every column in the shard key, then the node will direct the query to exactly one partition.
These queries are sent to one partition:
SELECT * FROM a WHERE a1 = 4 AND a2 = 10;SELECT a3, count(*) FROM a WHERE a1 = 4 AND a2 = 10 GROUP BY a3;
These queries are sent to all partitions:
SELECT * FROM a WHERE a1 = 4;SELECT * FROM a WHERE a1 = 4 OR a2 = 10;SELECT * FROM a WHERE a1 IN (4, 5) AND a2 IN (10);
Secondary Index Matching
If your query uses a secondary (non-shard) index, then the node must send the query to every partition in the workspace.
This query matches the secondary index on the column a3:
SELECT * FROM a WHERE a3 = 5;
No Index Matching
Queries that do not match any index perform a full table scan on all partitions.
Distributed Joins
Aligning Shard Keys for Performance
Aligning the shard keys of large tables enables more efficient joining.
CREATE TABLE users (id BIGINT AUTO_INCREMENT,user_name VARCHAR(1000),account_id BIGINT,PRIMARY KEY (id));CREATE TABLE clicks (click_id BIGINT AUTO_INCREMENT,account_id BIGINT,user_id BIGINT,page_id INT,ts TIMESTAMP,SHARD KEY (user_id),PRIMARY KEY (click_id, user_id));
In this example, id
is the shard key of the users
table, and the shard key on the clicks
table has the same signature (a single BIGINT).
SELECT * FROM users INNER JOIN clicks ON users.id = clicks.user_id WHERE clicks.page_id = 10;SELECT avg(c1.ts - c2.ts) FROM clicks c1 INNER JOIN clicks c2 ON c1.user_id = c2.user_id WHERE c1.page_id > c2.page_id;
Whereas this query will stream rows between nodes:
SELECT u.account_id, count(distinct user_id), count(1)FROM users u INNER JOIN clicks c ON u.account_id = c.account_idGROUP BY u.account_id;
If you identify your data layout and join patterns in advance, this technique can be an extremely effective way to run performant joins between distributed tables.
Write Queries
UPDATE
and DELETE
Queries
The WHERE
clause in an UPDATE
or DELETE
query is optimized the same way as the WHERE
clause in a SELECT
query.
INSERT
Queries
SingleStore Helios executes INSERT
queries by analyzing the insert values relevant to the shard key and routing the query to the corresponding partition.INSERT INTO a (a1, a2, a3) VALUES (1, 2, 3)
would compute the hash value of (1, 2)
and map this value to the appropriate partition.
If you are bulk inserting data with INSERT
queries, then you should take advantage of the multi-insert syntax: INSERT INTO a (a1, a2, a3) VALUES (1, 2, 3), (2, 3, 4), .
.
Last modified: September 24, 2021