Highly Selective Joins
On this page
Starting in version 7.
SingleStoreDB supports these highly selective joins using an adaptive hash join algorithm.
If there are only a few rows in the hash table, it switches to use a nested loop join strategy, seeking into the larger table (on the probe side) via the index on the join column of the table on the probe side.
If the hash build side produces a lot of rows, then it performs a normal hash join.
The following example takes advantage of this strategy for selective joins.
CREATE TABLE orders(oid INT,d DATETIME,SORT KEY(d),SHARD(oid),KEY(oid) USING hash);CREATE TABLE lineitems(id INT,oid INT,item INT,SORT KEY(oid),SHARD(oid));
Now, add some sample data to
INSERT INTO orders VALUES(1, NOW());
Execute the following command repeatedly, until the table has around 33.
INSERT INTO ordersSELECT oid+(SELECT MAX(oid) FROM orders), NOW()FROM orders;
lineitems table, such that each line item belongs to an order, and each order has exactly two line items.
INSERT INTO lineitems SELECT oid, oid, 1 FROM orders;INSERT INTO lineitems SELECT oid + 1000*1000*1000, oid, 2 FROM orders;
Find a selective
DATETIME value for
d to search on:
SELECT d, COUNT(*)FROM ordersGROUP BY d;
The result shows that a number of
DATETIME values only appear in one row in
The following query uses this date to produce a join result with exactly two rows:
SELECT *FROM orders o JOIN lineitems l ON o.oid = l.oidWHERE o.d = "2020-03-30 16:47:05";
This query filters rows on
o. to find a single row of orders, then joins to
lineitems via the hash index on
lineitems., using the new selective join algorithm.
In the JSON profile plan, you can see if a plan may be able to perform a join via a hash index.
join index in descriptions of columnstore filter operators.
"executor":"ColumnStoreFilter","keyId":4294968023,"condition":["o.oid = l.oid bloom AND l.oid = o.oid join index"],
You may see a condition that mentions
o.oid = l.oid bloom AND l.oid = o.oid join index
join index filter can also be viewed in the
A higher number of join columns having a matching columnstore hash index and fewer number of hash values result in better join performance.
product are two columnstore tables with hash indexes on the columns
code, then the following query has better join performance than query with single join column:
SELECT * FROMinventory JOIN productON inventory.id = product.id AND inventory.code = product.code;
See SingleStore Universal Storage – And Then There Was One for additional information about (1) using columnstores with hash indexes, sub-segment access, and fine-grain locking to enable OLTP operations on data bigger than will fit in RAM, and (2) using SPARSE rowstore compression to reduce TCO for rowstore tables with many NULL values.
Last modified: March 2, 2023