# Query Execution

## Distributed DML Query Execution

In this topic we will look at common DML query patterns and how they are executed through the distributed system.&#x20;

You can use the [EXPLAIN](https://docs.singlestore.com/cloud/reference/sql-reference/data-manipulation-language-dml/explain.md) command to examine a query’s node-level query plans.

Let’s assume the following schema:

```sql
CREATE TABLE a (
    a1 int,
    a2 int,
    a3 int,
    SHARD KEY (a1, a2),
    KEY (a3)
);

CREATE TABLE b (
    b1 int,
    b2 int,
    b3 int,
    SHARD KEY (b1, b2)
);

CREATE REFERENCE TABLE r (
    r1 int,
    r2 int,
    PRIMARY KEY (r1),
    KEY (r2)
);

```

## Index Matching

Matching on the shard key or the secondary index.

## Matching the Shard Key

If you specify an equality on every column in the shard key, then the node will direct the query to exactly one partition. Most queries do not fall into this pattern; instead, the node must send queries to every partition in the workspace for intermediate results and then stitch them together.

These queries are sent to one partition:

```sql
SELECT * FROM a WHERE a1 = 4 AND a2 = 10;
SELECT a3, count(*) FROM a WHERE a1 = 4 AND a2 = 10 GROUP BY a3;

```

These queries are sent to all partitions:

```sql
SELECT * FROM a WHERE a1 = 4;
SELECT * FROM a WHERE a1 = 4 OR a2 = 10;
SELECT * FROM a WHERE a1 IN (4, 5) AND a2 IN (10);

```

## Secondary Index Matching

If your query uses a secondary (non-shard) index, then the node must send the query to every partition in the workspace. Locally, each partition’s table will use its part of the secondary index to speed up the query. While the overall performance of the query is dictated by the seek and scan time of these indexes, the fact that the query must be sent everywhere in the workspace can increase the variance (and therefore overall latency) of the query.

This query matches the secondary index on the column a3:

```sql
SELECT * FROM a WHERE a3 = 5;

```

## No Index Matching

Queries that do not match any index perform a full table scan on all partitions. From the perspective of the node, these queries are the same as queries that match a secondary index.

## Distributed Joins

## Aligning Shard Keys for Performance

Aligning the shard keys of large tables enables more efficient joining. It is possible to perform arbitrary distributed joins across any tables and along any column. However, if you join two tables with identical shard key signatures along that shard key, the joins will be performed local to the partitions, reducing network overhead.

```sql
CREATE TABLE users (
    id BIGINT AUTO_INCREMENT,
    user_name VARCHAR(1000),
    account_id BIGINT,
    PRIMARY KEY (id)
);

CREATE TABLE clicks (
    click_id BIGINT AUTO_INCREMENT,
    account_id BIGINT,
    user_id BIGINT,
    page_id INT,
    ts TIMESTAMP,
    SHARD KEY (user_id),
    PRIMARY KEY (click_id, user_id)
);

```

In this example, `id` is the shard key of the `users` table, and the shard key on the `clicks` table has the same signature (a single BIGINT). These queries join locally without network overhead:

```sql
SELECT * FROM users INNER JOIN clicks ON users.id = clicks.user_id WHERE clicks.page_id = 10;

SELECT avg(c1.ts - c2.ts) FROM clicks c1 INNER JOIN clicks c2 ON c1.user_id = c2.user_id WHERE c1.page_id > c2.page_id;

```

Whereas this query will stream rows between nodes:

```sql
SELECT u.account_id, count(distinct user_id), count(1)
FROM users u INNER JOIN clicks c ON u.account_id = c.account_id
GROUP BY u.account_id;

```

If you identify your data layout and join patterns in advance, this technique can be an extremely effective way to run performant joins between distributed tables. For more information about how queries execute, see [EXPLAIN](https://docs.singlestore.com/cloud/reference/sql-reference/data-manipulation-language-dml/explain.md).

## Write Queries

## `UPDATE` and `DELETE` Queries

The `WHERE` clause in an `UPDATE` or `DELETE` query is optimized the same way as the `WHERE` clause in a `SELECT` query. If the predicate matches the shard key exactly then the query is routed to a single partition.

## `INSERT` Queries

SingleStore Helios executes `INSERT` queries by analyzing the insert values relevant to the shard key and routing the query to the corresponding partition. For example, `INSERT INTO a (a1, a2, a3) VALUES (1, 2, 3)` would compute the hash value of `(1, 2)` and map this value to the appropriate partition.

If you are bulk inserting data with `INSERT` queries, then you should take advantage of the multi-insert syntax: `INSERT INTO a (a1, a2, a3) VALUES (1, 2, 3), (2, 3, 4), ...`. The node will chop up the multi-insert into single-partition insert queries and run them in parallel across the workspace. This technique enables your application to combat the inherent latency of running in a distributed system.

***

Modified at: September 24, 2021

Source: [/cloud/getting-started-with-singlestore-helios/about-singlestore-helios/query-execution/](https://docs.singlestore.com/cloud/getting-started-with-singlestore-helios/about-singlestore-helios/query-execution/)

(An index of the documentation is available at /llms.txt)
