Highly Selective Joins
On this page
Working with Highly Selective Joins
Starting in version 7.
SingleStore supports these highly selective joins using an adaptive hash join algorithm.
-
If there are only a few rows in the hash table, it switches to use a nested loop join strategy, seeking into the larger table (on the probe side) via the index on the join column of the table on the probe side.
-
If the hash build side produces a lot of rows, then it performs a normal hash join.
The following example takes advantage of this strategy for selective joins.
CREATE TABLE orders(oid INT,d DATETIME,SORT KEY(d),SHARD(oid),KEY(oid) USING hash);CREATE TABLE lineitems(id INT,oid INT,item INT,SORT KEY(oid),SHARD(oid));
Now, add some sample data to orders
:
INSERT INTO orders VALUES(1, NOW());
Execute the following command repeatedly, until the table has around 33.
INSERT INTO ordersSELECT oid+(SELECT MAX(oid) FROM orders), NOW()FROM orders;
Add 67.lineitems
table, such that each line item belongs to an order, and each order has exactly two line items.
INSERT INTO lineitems SELECT oid, oid, 1 FROM orders;INSERT INTO lineitems SELECT oid + 1000*1000*1000, oid, 2 FROM orders;
Find a selective DATETIME
value for d
to search on:
SELECT d, COUNT(*)FROM ordersGROUP BY d;
The result shows that a number of DATETIME
values only appear in one row in orders
.2020-03-30 16:47:05
.
The following query uses this date to produce a join result with exactly two rows:
SELECT *FROM orders o JOIN lineitems l ON o.oid = l.oidWHERE o.d = "2020-03-30 16:47:05";
This query filters rows on o.
to find a single row of orders, then joins to lineitems
via the hash index on lineitems.
, using the new selective join algorithm.
In the JSON profile plan, you can see if a plan may be able to perform a join via a hash index.join index
in descriptions of columnstore filter operators.
"executor":"ColumnStoreFilter",
"keyId":4294968023,
"condition":[
"o.oid = l.oid bloom AND l.oid = o.oid join index"
],
You may see a condition that mentions join index
.
o.oid = l.oid bloom AND l.oid = o.oid join index
The join index
filter can also be viewed in the EXPLAIN
output.
Performance of Highly Selective Joins
A higher number of join columns having a matching columnstore hash index and fewer number of hash values result in better join performance.inventory
and product
are two columnstore tables with hash indexes on the columns id
and code
, then the following query has better join performance than query with single join column:
SELECT * FROMinventory JOIN productON inventory.id = product.id AND inventory.code = product.code;
Related Content
See SingleStore Universal Storage – And Then There Was One for additional information about (1) using columnstores with hash indexes, sub-segment access, and fine-grain locking to enable OLTP operations on data bigger than will fit in RAM, and (2) using SPARSE rowstore compression to reduce TCO for rowstore tables with many NULL values.
Last modified: March 2, 2023