Distributed DML Query Execution
On this page
In this topic we will look at common DML query patterns and how they are executed through the distributed system.
You can use the EXPLAIN command to examine a query’s aggregator-level and leaf-level query plans.
Let’s assume the following schema:
CREATE TABLE a (a1 int,a2 int,a3 int,SHARD KEY (a1, a2),KEY (a3));CREATE TABLE b (b1 int,b2 int,b3 int,SHARD KEY (b1, b2));CREATE REFERENCE TABLE r (r1 int,r2 int,PRIMARY KEY (r1),KEY (r2));
Matching on the shard key or the secondary index.
If you specify an equality on every column in the shard key, then the aggregator node will direct the query to exactly one partition.
These queries are sent to one partition:
SELECT * FROM a WHERE a1 = 4 AND a2 = 10;SELECT a3, count(*) FROM a WHERE a1 = 4 AND a2 = 10 GROUP BY a3;
These queries are sent to all partitions:
SELECT * FROM a WHERE a1 = 4;SELECT * FROM a WHERE a1 = 4 OR a2 = 10;SELECT * FROM a WHERE a1 IN (4, 5) AND a2 IN (10);
If your query uses a secondary (non-shard) index, then the aggregator node must send the query to every partition in the cluster.
This query matches the secondary index on the column a3:
SELECT * FROM a WHERE a3 = 5;
Queries that do not match any index perform a full table scan on all partitions.
Last modified: June 22, 2022