SingleStore Managed Service

Distributed Joins
Reference Joins

As a general rule, SingleStore DB will efficiently execute any join query with a single sharded table and as many reference tables as you’d like. Since reference tables are fully replicated on every machine in the cluster, leaves can join against their local copies of reference tables.

These queries leverage reference joins:

SELECT * FROM a INNER JOIN r ON a.a1 = r.r1;
SELECT * FROM r LEFT JOIN a ON a.a1 = r.r1;
SELECT * FROM a, r r1, r r2, r r3;
SELECT * FROM a INNER JOIN
    (SELECT DISTINCT r1 FROM r) x
    ON a.a1 = x.c;
Aligning Shard Keys for Performance

Aligning the shard keys of large tables enables more efficient joining. It is possible to perform arbitrary distributed joins across any tables and along any column. However, if you join two tables with identical shard key signatures along that shard key, the joins will be performed local to the partitions, reducing network overhead.

CREATE TABLE users (
    id BIGINT AUTO_INCREMENT,
    user_name VARCHAR(1000),
    account_id BIGINT,
    PRIMARY KEY (id)
);

CREATE TABLE clicks (
    click_id BIGINT AUTO_INCREMENT,
    account_id BIGINT,
    user_id BIGINT,
    page_id INT,
    ts TIMESTAMP,
    SHARD KEY (user_id),
    PRIMARY KEY (click_id, user_id)
);

In this example, id is the shard key of the users table, and the shard key on the clicks table has the same signature (a single BIGINT). These queries join locally without network overhead:

SELECT * FROM users INNER JOIN clicks ON users.id = clicks.user_id WHERE clicks.page_id = 10;

SELECT avg(c1.t - c2.t) FROM clicks c1 INNER JOIN clicks c2 ON c1.user_id = c2.user_id WHERE c1.page_id > c2.page_id;

Whereas this query will stream rows between leaves:

SELECT u.account_id, count(distinct user_id), count(1)
FROM users u INNER JOIN clicks c ON u.account_id = c.account_id
GROUP BY u.account_id;

If you identify your data layout and join patterns in advance, this technique can be an extremely effective way to run performant joins between distributed tables. For more information about how queries execute, see EXPLAIN.