# Highly Selective Joins

## Working with Highly Selective Joins

Starting in version 7.0, SingleStore introduced support for columnstore hash indexes to broaden the support for OLTP-type queries. However, a common join pattern in OLTP is to have a very selective filter on one table, which produces a few rows from the source table, and then join those rows with another table. Databases for OLTP normally use a nested loop join for this. For each row from the outer table, an index seek is done on the inner table.

SingleStore supports these highly selective joins using an adaptive hash join algorithm. As a result, there’s no need to run a full columnstore scan when there’s a matching hash index. First, it builds a hash table for the table with the highly-selective filter. Then, depending on the number of rows in the hash table, the adaptive algorithm will switch strategies internally.

* If there are only a few rows in the hash table, it switches to use a nested loop join strategy, seeking into the larger table (on the probe side) via the index on the join column of the table on the probe side.
* If the hash build side produces a lot of rows, then it performs a normal hash join.

The following example takes advantage of this strategy for selective joins.

```sql
CREATE TABLE orders(
  oid INT,
  d DATETIME,
  SORT KEY(d),
  SHARD(oid),
  KEY(oid) USING hash);

CREATE TABLE lineitems(
  id INT,
  oid INT,
  item INT,
  SORT KEY(oid),
  SHARD(oid)
  KEY(oid) USING HASH);

```

Now, add some sample data to `orders`:

```sql
INSERT INTO orders VALUES(1, NOW());

```

Execute the following command repeatedly, until the table has around 33.5 million rows in it.

```sql
INSERT INTO orders
SELECT oid+(SELECT MAX(oid) FROM orders), NOW()
FROM orders;

```

Add 67.1 million rows of data to the `lineitems` table, such that each line item belongs to an order, and each order has exactly two line items.

```sql
INSERT INTO lineitems SELECT oid, oid, 1 FROM orders;
INSERT INTO lineitems SELECT oid + 1000*1000*1000, oid, 2 FROM orders;

```

Find a selective `DATETIME` value for `d` to search on:

```sql
SELECT d, COUNT(*)
FROM orders
GROUP BY d;

```

The result shows that a number of `DATETIME` values only appear in one row in `orders`. For this example, let’s say that one such value is `2020-03-30 16:47:05`.

The following query uses this date to produce a join result with exactly two rows:

```sql
SELECT *
FROM orders o JOIN lineitems l ON o.oid = l.oid
WHERE o.d = "2020-03-30 16:47:05";

```

This query filters rows on `o.d` to find a single row of orders, then joins to `lineitems` via the hash index on `lineitems.oid`, using the new selective join algorithm.

In the JSON profile plan, you can see if a plan may be able to perform a join via a hash index. If this optimization strategy is available in your plan, you will see `join index` in descriptions of columnstore filter operators. For example:

```
"executor":"ColumnStoreFilter",
"keyId":4294968023,
"condition":[
        "o.oid = l.oid bloom AND l.oid = o.oid join index"
],

```

You may see a condition that mentions `join index`.

```
o.oid = l.oid bloom AND l.oid = o.oid join index

```

The `join index` filter can also be viewed in the `EXPLAIN` output.

## Performance of Highly Selective Joins

A higher number of join columns having a matching columnstore hash index and fewer number of hash values result in better join performance. For example, if `inventory` and `product` are two columnstore tables with hash indexes on the columns `id` and `code`, then the following query has better join performance than query with single join column:

```sql
SELECT * FROM 
inventory JOIN product
ON inventory.id = product.id AND inventory.code = product.code;
```

## Related Content

See [SingleStore Universal Storage – And Then There Was One](https://www.singlestore.com/blog/memsql-singlestore-then-there-was-one/) for additional information about (1) using columnstores with hash indexes, sub-segment access, and fine-grain locking to enable OLTP operations on data bigger than will fit in RAM, and (2) using SPARSE rowstore compression to reduce TCO for rowstore tables with many NULL values.

***

Modified at: March 27, 2025

Source: [/db/v9.1/create-a-database/columnstore/highly-selective-joins/](https://docs.singlestore.com/db/v9.1/create-a-database/columnstore/highly-selective-joins/)

(An index of the documentation is available at /llms.txt)
