Warning
SingleStore 9.0 gives you the opportunity to preview, evaluate, and provide feedback on new and upcoming features prior to their general availability. In the interim, SingleStore 8.9 is recommended for production workloads, which can later be upgraded to SingleStore 9.0.
Distributed Joins
On this page
Aligning Shard Keys for Performance
Aligning the shard keys of large tables enables more efficient joining.
CREATE TABLE users (id BIGINT AUTO_INCREMENT,user_name VARCHAR(1000),account_id BIGINT,PRIMARY KEY (id));CREATE TABLE clicks (click_id BIGINT AUTO_INCREMENT,account_id BIGINT,user_id BIGINT,page_id INT,ts TIMESTAMP,SHARD KEY (user_id),PRIMARY KEY (click_id, user_id));
In this example, id
is the shard key of the users
table, and the shard key on the clicks
table has the same signature (a single BIGINT).
SELECT * FROM users INNER JOIN clicks ON users.id = clicks.user_id WHERE clicks.page_id = 10;SELECT avg(c1.ts - c2.ts) FROM clicks c1 INNER JOIN clicks c2 ON c1.user_id = c2.user_id WHERE c1.page_id > c2.page_id;
Whereas this query will stream rows between leaf nodes:
SELECT u.account_id, count(distinct user_id), count(1)FROM users u INNER JOIN clicks c ON u.account_id = c.account_idGROUP BY u.account_id;
If you identify your data layout and join patterns in advance, this technique can be an extremely effective way to run performant joins between distributed tables.
Last modified: September 7, 2023