Query Performance Tools
Run ANALYZE
The ANALYZE command collects data statistics on a table to facilitate accurate query optimization. ANALYZE your tables after inserting, updating, or deleting large numbers of rows (30% of your table row count is a good rule of thumb).
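The 30% rule of thumb above can be wired into a maintenance job. A minimal sketch (the function name and the row counters are hypothetical; actual modified-row counts would come from your own bookkeeping or monitoring, and the job would then issue ANALYZE against the table):

```python
def should_analyze(total_rows: int, modified_rows: int, threshold: float = 0.30) -> bool:
    """Return True when the modified-row fraction crosses the threshold.

    Follows the rule of thumb above: re-run ANALYZE once roughly 30% of a
    table's rows have been inserted, updated, or deleted since the last run.
    """
    if total_rows == 0:
        return modified_rows > 0
    return modified_rows / total_rows >= threshold

# A 1M-row table with 350k modified rows is past the 30% threshold.
print(should_analyze(1_000_000, 350_000))  # True
print(should_analyze(1_000_000, 100_000))  # False
```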
EXPLAIN
Returns detailed information on how the query is compiled.
PROFILE
Provides detailed resource usage metrics about a query.
Auto Profiling
Before auto profiling was introduced, PROFILE could only be utilized on a per-query basis: the PROFILE command is explicitly written with the query, and the SHOW PROFILE command is then run to obtain the profile results.
PROFILE <statement>;
SHOW PROFILE;
Now SingleStoreDB offers two automatic profiling modes (called "auto profiling" hereafter): FULL and LITE.
FULL auto profiling collects statistics for every run of a query with an existing query plan. FULL auto profiling may add network latency of 1-5 milliseconds, mostly on low-latency queries, and can cause a small, variable overhead on the CPU. FULL auto profiling works best for queries where adding this latency won't degrade the performance of the overall workload, such as analytical queries that execute on the order of seconds, not milliseconds.
LITE auto profiling collects statistics on the first run of a new query plan and when a query plan is loaded from disk to memory; it gathers a smaller set of statistics than FULL auto profiling. There is negligible memory overhead for high-throughput, low-latency queries when using the LITE auto profiling setting. LITE auto profiling works best on transaction processing workloads with mostly inserts and updates.
The following code snippets show how to turn on auto profiling and then how to select the FULL or LITE auto profiling type. First, enable the enable_auto_profile engine variable. Setting it globally requires SUPER permissions, but SUPER permissions are not required to set this engine variable at the session level:
SET GLOBAL enable_auto_profile = ON;
Next, set the auto_profile_type engine variable to FULL or LITE:
SET GLOBAL auto_profile_type = FULL | LITE;
Note
If you utilize auto profiling and later decide to turn it off (enable_auto_profile = OFF), your plans will be recompiled. To avoid recompilation, switch between LITE and FULL auto profiling instead (SET GLOBAL auto_profile_type = LITE | FULL).
Auto Profiling Use Cases
Capturing Live vs Historical Profiles
The profiles of both explicitly- and auto-profiled queries are obtained using the same method.
Live profiles (also called partial profiles) are available if the query is profiled while it is currently executing. Find the running query's ID with the SHOW PROCESSLIST command. Using the ID from the SHOW PROCESSLIST results as the process_id, the live profile can be obtained by running the following command in a different session. This requires the PROCESS permission.
SHOW PROFILE [JSON] PROCESS <process_id>;
Note
When you run SHOW PROFILE PROCESS on a running query, the query's execution is interrupted while the engine gathers the currently collected statistics to return to the client. Avoid running SHOW PROFILE PROCESS at high frequency, as it can degrade the performance of the query.
To obtain a comprehensive profile after the session has completed the execution of the profiled query, run SHOW PROFILE [JSON] (without the PROCESS and process_id arguments) from the same session that just finished executing the query.
Historical profiles are obtained by adding the PLAN and plan_id arguments to the SHOW PROFILE command. The plan_id is obtained by running the SHOW PLANCACHE command, or by querying the corresponding information_schema view with the appropriate filters to narrow the search for the correct plan.
SHOW PROFILE [JSON] PLAN <plan_id>;
Profile statistics from previous executions are stored independently of individual users and sessions and are retrieved with SHOW PROFILE PLAN. This is where SHOW PROFILE PLAN remains useful even after the session that ran the query has finished, as the profile statistics from that session's query execution remain available.
There is no way to manually drop historical query profiles without removing the associated query plan from the in-memory plancache.
Profiling Use Cases in Production Environments
The following use cases assume the auto profile type is set to LITE
except for the last use case.
- Investigating why a query inexplicably slows down: Despite the absence of operators' individual execution times and network times in the initial run, the plan's profile stats still include the total execution time, row counts, and memory usage. As a result, if the query plan starts running slower than anticipated, the user can save the initial run's stats and perform an explicit profile execution. The explicit profile will show which parts of the query are most expensive and how the query execution has changed by comparing row counts, memory usage, and network traffic.
- Investigating why a workload or a procedure inexplicably slows down: In this situation, if the exact issue cannot be identified through workload profiling and existing monitoring tools, auto_profile_type can be set to FULL to collect basic stats from the procedures and/or sessions that require profiling. Additionally, during intervals between query executions, the user can periodically run SHOW PROFILE [JSON] PLAN queries to persist individual profiles. If more detailed stats are needed for specific query plans, the explicit profile approach can be utilized.
- Checking whether a long-running query is progressing or should be terminated: The user can capture and save a live profile of the query, assuming the query is collecting statistics. After waiting a few minutes, the user can capture another live profile and compare the two, specifically examining the row counts. If the row counts have changed, the query is making progress; if they haven't, the query is likely stuck at some point. If the row counts are only changing for one operator, that operator may be the source of the slowness; refactoring the query or changing table and index structures may improve it.
- Determining which query plans are more likely to be resource-intensive: The user may anticipate high demand from their application and consequently expect increased resource usage during a specific time window. To proactively prepare, the user can analyze the profile stats of existing plans to estimate which plans are likely to be more resource-demanding on the cluster.
- Obtaining actual profiling information from users so the support team can troubleshoot query performance issues: This is especially helpful when a potentially degraded query is part of a workload and it is difficult to profile the individual queries. Also, some query degradation presents itself only intermittently; enabling auto profiling can capture the unexpected performance decline.
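The progress check described above can be partially automated by diffing two SHOW PROFILE [JSON] PROCESS captures. A minimal sketch, assuming the JSON shape shown later on this page (a "profile" array of operator trees with "keyId", "actual_row_count", and "inputs" fields); the helper names and the inline snapshots are illustrative, not part of the product:

```python
import json

def rowcounts(profile_json: str) -> dict:
    """Flatten a SHOW PROFILE JSON tree into {operator keyId: actual row count}."""
    counts = {}
    def walk(node):
        if "keyId" in node and "actual_row_count" in node:
            counts[node["keyId"]] = node["actual_row_count"]["value"]
        for child in node.get("inputs", []):
            walk(child)
    for root in json.loads(profile_json).get("profile", []):
        walk(root)
    return counts

def progressing_operators(earlier: str, later: str) -> list:
    """Operators whose row counts grew between two live-profile captures."""
    a, b = rowcounts(earlier), rowcounts(later)
    return [k for k in b if b[k] > a.get(k, 0)]

# Two hypothetical captures taken a few minutes apart.
snap1 = '{"profile": [{"executor": "Gather", "keyId": 1, "actual_row_count": {"value": 10}, "inputs": [{"executor": "TableScan", "keyId": 2, "actual_row_count": {"value": 500}}]}]}'
snap2 = '{"profile": [{"executor": "Gather", "keyId": 1, "actual_row_count": {"value": 10}, "inputs": [{"executor": "TableScan", "keyId": 2, "actual_row_count": {"value": 900}}]}]}'
print(progressing_operators(snap1, snap2))  # [2]
```

Only the TableScan (keyId 2) is advancing between the captures, so the query is making progress and that operator is where the time is going.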
Profiling Use Cases in Query/Workload Tuning
- Understanding what the impact would be if new DML queries are added to a workload: The user can switch auto_profile_type to FULL, execute the workload, and then switch auto_profile_type back to LITE. Then, the user can obtain profile statistics on the query plans that were executed as part of the workload. If specific query plans require detailed information such as individual execution times and network times, the user can explicitly profile them as well.
- Performance testing a stored procedure: Typically, workload profiling is the primary approach for analyzing the performance of an entire workload. Query profiling is intended to complement workload profiling, so if there is a noticeable change in the workload profiling results, query profiles of both new and existing DML queries can be captured and compared. Note that the profiles of existing DML queries may change significantly due to the introduction of new DML queries, which can alter row counts. Leveraging auto profiles can assist in capturing these changes effectively.
Visual Explain
Visual Explain is a feature of SingleStoreDB that allows customers to see a query plan visualized via a graphical interface.
Query Plan Operations
This topic describes the operations that a query plan may use. The examples use t, a rowstore table with a primary key, and ct, a columnstore table. Both are in the database db1.
CREATE ROWSTORE TABLE t(id INT PRIMARY KEY, a INT, b INT, KEY(a));
CREATE TABLE ct(a INT, b INT, SORT KEY(a), SHARD KEY(a));
Table access methods
- Project - outputs a subset of columns of the input (for example, a SELECT statement that calls out specific columns from a table) in a particular order, and optionally computes new columns that are expressions of existing ones (for example, SELECT column_a / column_b AS column_c FROM table_name)
- TableScan - scans every row in a table using an index
- IndexSeek - navigates to a particular row using an index
- IndexRangeScan - scans a range of rows using an index
- ColumnStoreScan - scans a columnstore table
- OrderedColumnStoreScan - scans a table using the columnstore sort key in key order
EXPLAIN SELECT * FROM t WHERE t.a = 5;
+---------------------------------------------+
| EXPLAIN |
+---------------------------------------------+
| Project [t.id, t.a, t.b] |
| Gather partitions:all |
| Project [t.id, t.a, t.b] |
| IndexRangeScan db.t, KEY a (a) scan:[a = 5] |
+---------------------------------------------+
EXPLAIN SELECT * FROM ct;
+---------------------------------------------------------------+
| EXPLAIN |
+---------------------------------------------------------------+
| Project [ct.a, ct.b] |
| Gather partitions:all |
| Project [ct.a, ct.b] |
| ColumnStoreScan db1.ct, KEY a (a) USING CLUSTERED COLUMNSTORE |
+---------------------------------------------------------------+
Filter Methods
- Filter - reads a stream of input rows and outputs only those rows that match a specified condition
- BloomFilter - filters rows based on matching them against a join condition from a corresponding HashJoin
The following examples show how BloomFilter can appear in a query plan alone, and how it can appear as part of ColumnstoreFilter when used on a columnstore table, respectively.
EXPLAIN SELECT * FROM rowstore_table_a straight_join (SELECT WITH (no_merge_this_select=true) * FROM columnstore_table_a) t WITH (bloom_filter=true) ON t.column_b = rowstore_table_a.column_b;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Top limit:[@@SESSION.`sql_select_limit`] |
| Gather partitions:all est_rows:1 alias:remote_0 |
| Project [r0.column_a, r0.column_b, r1.column_a AS column_a_1, r1.column_b AS column_b_2] est_rows:1 est_select_cost:4 |
| Top limit:[?] |
| HashJoin |
| |---HashTableProbe [r1.column_b = r0.column_b] |
| | HashTableBuild alias:r1 |
| | Repartition [columnstore_table_a.column_a, columnstore_table_a.column_b] AS r1 shard_key:[column_b] est_rows:1 |
| ColumnStoreScan database_name.columnstore_table_a, KEY column_a (column_a) USING CLUSTERED COLUMNSTORE table_type:sharded_columnstore est_table_rows:1 est_filtered:1 |
| TableScan r0 storage:list stream:yes table_type:sharded est_table_rows:1 est_filtered:1 |
| Repartition [rowstore_table_a.column_a, rowstore_table_a.column_b] AS r0 shard_key:[column_b] est_rows:1 |
| TableScan database_name.rowstore_table_a table_type:sharded_rowstore est_table_rows:1 est_filtered:1 |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
EXPLAIN SELECT * FROM columnstore_table_b straight_join (SELECT WITH (no_merge_this_select=true) * FROM columnstore_table_a) t WITH (bloom_filter=true) ON t.column_b = columnstore_table_b.column_b;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Top limit:[@@SESSION.`sql_select_limit`] |
| Gather partitions:all est_rows:1 alias:remote_0 |
| Project [r0.column_a, r0.column_b, r1.column_a AS column_a_1, r1.column_b AS column_b_2] est_rows:1 est_select_cost:4 |
| Top limit:[?] |
| HashJoin |
| |---HashTableProbe [r1.column_b = r0.column_b] |
| | HashTableBuild alias:r1 |
| | Repartition [columnstore_table_a.column_a, columnstore_table_a.column_b] AS r1 shard_key:[column_b] est_rows:1 |
| | ColumnStoreScan database_name.columnstore_table_a, KEY column_a (column_a) USING CLUSTERED COLUMNSTORE table_type:sharded_columnstore est_table_rows:1 est_filtered:1 |
| TableScan r0 storage:list stream:yes table_type:sharded est_table_rows:1 est_filtered:1 |
| Repartition [columnstore_table_b.column_a, columnstore_table_b.column_b] AS r0 shard_key:[column_b] est_rows:1 |
| ColumnStoreScan database_name.columnstore_table_b, KEY column_a (column_a) USING CLUSTERED COLUMNSTORE table_type:sharded_columnstore est_table_rows:1 est_filtered:1 |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
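Conceptually, a Bloom filter lets the scan side of a HashJoin cheaply discard rows whose join key cannot possibly be in the hash table, at the cost of occasional false positives. A toy sketch of the data structure (an illustration of the general technique, not the engine's implementation):

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: k hash functions over an m-slot bit array.

    May report false positives but never false negatives, which is why it is
    safe to use as a pre-filter ahead of the real join condition check.
    """
    def __init__(self, m: int = 1024, k: int = 3):
        self.m, self.k, self.bits = m, k, bytearray(m)

    def _positions(self, key):
        # Derive k independent positions from salted SHA-256 digests.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.m

    def add(self, key):
        for p in self._positions(key):
            self.bits[p] = 1

    def might_contain(self, key) -> bool:
        return all(self.bits[p] for p in self._positions(key))

# Build the filter from the hash-table (build) side of a join ...
bf = BloomFilter()
for join_key in [10, 20, 30]:
    bf.add(join_key)

# ... then use it to skip probe-side rows whose keys can't match.
probe_rows = [(10, "a"), (55, "b"), (30, "c")]
survivors = [row for row in probe_rows if bf.might_contain(row[0])]
```

Rows that survive the Bloom filter still go through the exact join condition; the filter only prunes guaranteed non-matches early.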
ColumnstoreFilter
Table Access Method: Applies a Filter to a Columnstore Table
The following example demonstrates the ColumnstoreFilter query operation, using the table articles:
CREATE TABLE articles (
  id INT UNSIGNED,
  year INT UNSIGNED,
  title VARCHAR(200),
  body TEXT,
  SHARD KEY(id),
  SORT KEY(id),
  KEY(id) USING HASH,
  KEY(title) USING HASH,
  KEY(year) USING HASH);
The EXPLAIN statement shows the ColumnStoreFilter operation with the index attribute, because a hash index is used to apply the filter.
EXPLAIN SELECT * FROM articles WHERE title = 'Interesting title here';
+------------------------------------------------------------------------------------------------------+
| EXPLAIN |
+------------------------------------------------------------------------------------------------------+
| Gather partitions:all alias:remote_0 |
| Project [articles.id, articles.year, articles.title, articles.body] |
| ColumnStoreFilter [articles.title = 'Interesting title here' index] |
| ColumnStoreScan d.articles, KEY id_2 (id) USING CLUSTERED COLUMNSTORE table_type:sharded_columnstore |
+------------------------------------------------------------------------------------------------------+
GROUP BY and Aggregations
- Aggregate - computes an aggregate
- HashGroupBy - uses a hash table to compute group-by results
- StreamingGroupBy - leverages the fact that the underlying operation produces rows in order to compute group-by results. The advantage of StreamingGroupBy is that it only uses a constant amount of memory
- ShuffleGroupBy - occurs when a GROUP BY clause operates on a set of columns that do not include the shard key. First, a local GROUP BY is performed per host. Then, the data is repartitioned and the GROUP BY is completed
- Distinct - removes duplicate rows
EXPLAIN SELECT SUM(id) FROM t;
+-----------------------------------------+
| EXPLAIN |
+-----------------------------------------+
| Project [`sum(id)`] |
| Aggregate [SUM(`sum(id)`) AS `sum(id)`] |
| Gather partitions:all |
| Project [`sum(id)`] |
| Aggregate [SUM(t.id) AS `sum(id)`] |
| TableScan db1.t, PRIMARY KEY (id) |
+-----------------------------------------+
EXPLAIN SELECT SUM(id) FROM t GROUP BY a+1;
+------------------------------------------------------------+
| EXPLAIN |
+------------------------------------------------------------+
| Project [`sum(id)`] |
| HashGroupBy [SUM(`sum(id)`) AS `sum(id)`] groups:[t.a + 1] |
| Gather partitions:all |
| Project [`sum(id)`, t.a + 1 AS op, t.a, 1 AS op_1] |
| HashGroupBy [SUM(t.id) AS `sum(id)`] groups:[t.a + 1] |
| TableScan db1.t, PRIMARY KEY (id) |
+------------------------------------------------------------+
Distributed data movement
- Gather - collects all the results from the leaf nodes to the aggregator node. When a query can be routed to a single partition, it has the attribute partitions:single. If Gather collects data from all the partitions, the attribute is set to partitions:all. If the shard key matches an IN list predicate, the attribute is set to partitions:inlist, and the query will only be sent to partitions that match the values in the IN list. Queries that have partitions:single are called single-partition queries. An advantage of single-partition queries is that they can scale to much higher concurrency and throughput because they only need to execute on a single partition.
- GatherMerge - collects ordered streams of rows from the leaf nodes and merges them to output an ordered stream.
EXPLAIN SELECT * FROM t WHERE id = 1;
+-------------------------------------------------+
| EXPLAIN |
+-------------------------------------------------+
| Gather partitions:single |
| Project [t.id, t.a, t.b] |
| IndexSeek db1.t, PRIMARY KEY (id) scan:[id = 1] |
+-------------------------------------------------+
EXPLAIN SELECT * FROM t WHERE id > 1;
+------------------------------------------------------+
| EXPLAIN |
+------------------------------------------------------+
| Project [t.id, t.a, t.b] |
| Gather partitions:all |
| Project [t.id, t.a, t.b] |
| IndexRangeScan db1.t, PRIMARY KEY (id) scan:[id > 1] |
+------------------------------------------------------+
EXPLAIN SELECT * FROM t WHERE id IN (2,3,4);
+-----------------------------------------------------------------------------------+
| EXPLAIN |
+-----------------------------------------------------------------------------------+
| Gather partitions:inlist alias:remote_0 |
| Project [t.id, t.a, t.b] |
| IndexSeek demo.t, PRIMARY KEY (id) scan:[id IN (...)] table_type:sharded_rowstore |
+-----------------------------------------------------------------------------------+
EXPLAIN SELECT * FROM t ORDER BY id;
+-----------------------------------+
| EXPLAIN |
+-----------------------------------+
| Project [t.id, t.a, t.b] |
| GatherMerge [t.id] partitions:all |
| Project [t.id, t.a, t.b] |
| TableScan db.t, PRIMARY KEY (id) |
+-----------------------------------+
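The merge step GatherMerge performs is the classic k-way merge of already-sorted streams, which is why the combined result never needs a re-sort on the aggregator. Python's standard heapq.merge demonstrates the idea; the three lists are stand-ins for per-partition streams already ordered by the ORDER BY key:

```python
import heapq

# Each "partition" returns its rows already ordered by the ORDER BY key
# (simple integers standing in for rows sorted by t.id).
partition_streams = [
    [1, 4, 9],
    [2, 3, 10],
    [5, 6, 7],
]

# GatherMerge-style k-way merge: repeatedly emit the smallest head element
# across all streams, producing one globally ordered stream in a single pass.
merged = list(heapq.merge(*partition_streams))
print(merged)  # [1, 2, 3, 4, 5, 6, 7, 9, 10]
```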
- Repartition - redistributes a dataset to hash-partition it on a particular key
- Broadcast - broadcasts a dataset to every node in a workspace
Note
For broadcast LEFT JOIN queries, a BRANCH operator is added to more accurately represent shared computations. A shared result table is now computed once, shown once, and shared across different branches in the plan.
EXPLAIN SELECT t.*, ct.* FROM t LEFT JOIN ct ON t.a = ct.a;
+--------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN |
+--------------------------------------------------------------------------------------------------------------------------------------------+
| Gather partitions:all est_rows:30 alias:remote_0 |
| Project [SUBQ_VWW_1.a, SUBQ_VWW_1.b, SUBQ_VWW_1.a_1, SUBQ_VWW_1.b_2] est_rows:30 est_select_cost:32 |
| TableScan 1tmp AS SUBQ_VWW_1 storage:list stream:yes est_table_rows:30 est_filtered:30 |
| UnionAll est_rows:30 |
| |---Project [r4.a, r4.b, r4.a_1, r4.b_2] est_rows:1 |
| | Filter [$0 = 8] |
| | HashGroupBy [COUNT(*) AS $0] groups:[r4.i0] |
| | TableScan r4 storage:list stream:no table_type:sharded est_table_rows:30 est_filtered:30 |
| | Project [r3.i0, r3.a, r3.b, r3.a_1, r3.b_2] alias:r4 est_rows:30 |
| | TableScan r3 storage:list stream:yes table_type:sharded est_table_rows:30 est_filtered:30 |
| | Repartition AS r3 shard_key:[i0] est_rows:30 |
| | Branch [SUBQ_VWW_0.ConstIntCol IS NULL] position:[2/2] |
| Project [r2.a, r2.b, r2.a_1, r2.b_2] est_rows:30 |
| TableScan r2 storage:list stream:yes table_type:sharded est_table_rows:30 est_filtered:30 |
| Project [r1.i0, r1.a, r1.b, SUBQ_VWW_0.a_1, SUBQ_VWW_0.b_2] alias:r2 est_rows:30 |
| Branch [SUBQ_VWW_0.ConstIntCol IS NOT NULL] position:[1/2] |
| HashJoin type:right |
| |---HashTableProbe [r1.a = SUBQ_VWW_0.a_1] |
| | HashTableBuild alias:r1 |
| | Project [r0.a, r0.b, i0] alias:r1 hash_key:[a] est_rows:1 |
| | Window [ROW_NUMBER() OVER () AS i0] |
| | TableScan r0 storage:list stream:yes table_type:reference est_table_rows:1 est_filtered:1 |
| | Broadcast [t.a, t.b] AS r0 distribution:tree est_rows:1 |
| | ColumnStoreScan test1.t, KEY __UNORDERED () USING CLUSTERED COLUMNSTORE table_type:sharded_columnstore est_table_rows:1 est_filtered:1 |
| TableScan 0tmp AS SUBQ_VWW_0 storage:list stream:yes est_table_rows:3,072 est_filtered:3,072 |
| Project [ct.a AS a_1, ct.b AS b_2, 0 AS ConstIntCol] est_rows:3,072 |
| ColumnStoreScan test1.ct, KEY a (a) USING CLUSTERED COLUMNSTORE table_type:sharded_columnstore est_table_rows:3,072 est_filtered:3,072 |
+--------------------------------------------------------------------------------------------------------------------------------------------+
27 rows in set (0.01 sec)
EXPLAIN SELECT * FROM t,ct WHERE t.id = ct.b;
+-----------------------------------------------------------------------------------------------+
| EXPLAIN |
+-----------------------------------------------------------------------------------------------+
| Project [t.id, t.a, t.b, r0.a_1, r0.b_2] |
| Gather partitions:all est_rows:1 |
| Project [t.id, t.a, t.b, r0.a_1, r0.b_2] est_rows:1 est_select_cost:3 |
| NestedLoopJoin |
| |---IndexSeek db1.t, PRIMARY KEY (id) scan:[id = r0.b_2] est_table_rows:1 est_filtered:1 |
| TableScan r0 storage:list stream:no |
| Repartition [ct.a AS a_1, ct.b AS b_2] AS r0 shard_key:[b_2] est_rows:1 |
| ColumnStoreScan db1.ct, KEY a (a) USING CLUSTERED COLUMNSTORE est_table_rows:1 est_filtered:1 |
+-----------------------------------------------------------------------------------------------+
- ChoosePlan - indicates that SingleStore will choose one of the listed plans at runtime based on cost estimates. The estimate section shows the statistics being estimated; note that these SQL statements are not actually executed. Instead, SingleStore uses index information to estimate these statistics.
EXPLAIN SELECT * FROM t WHERE id > 5 AND a > 5;
+-----------------------------------------------------------+
| EXPLAIN |
+-----------------------------------------------------------+
| Project [t.id, t.a, t.b] |
| Gather partitions:all |
| Project [t.id, t.a, t.b] |
| ChoosePlan |
| | :estimate |
| | SELECT COUNT(*) AS cost FROM db1.t WHERE t.id > 5 |
| | SELECT COUNT(*) AS cost FROM db1.t WHERE t.a > 5 |
| |---Filter [t.a > 5] |
| | IndexRangeScan db1.t, PRIMARY KEY (id) scan:[id > 5] |
| +---Filter [t.id > 5] |
| IndexRangeScan db1.t, KEY a (a) scan:[a > 5] |
+-----------------------------------------------------------+
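The runtime decision ChoosePlan makes can be pictured as picking the candidate with the smallest estimated cost. A minimal sketch (the plan names echo the example above, and the estimator functions are hypothetical stand-ins for the engine's index-based estimates):

```python
def choose_plan(plans):
    """Pick the plan whose cost estimator returns the smallest row count.

    Mirrors the idea behind ChoosePlan: each candidate carries a cost
    estimate, and only the cheapest candidate is executed.
    """
    return min(plans, key=lambda plan: plan["estimate"]())

# Hypothetical index statistics: estimated rows matching each predicate.
plans = [
    {"name": "IndexRangeScan PRIMARY (id > 5)", "estimate": lambda: 900_000},
    {"name": "IndexRangeScan KEY a (a > 5)", "estimate": lambda: 1_200},
]
print(choose_plan(plans)["name"])  # IndexRangeScan KEY a (a > 5)
```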
Joins
The following are the three types of joins that the optimizer can perform.
- NestedLoopJoin - performs a nested loop join: for every row on the outer side of the join, SingleStore scans into the inner table to find all the matching rows. The complexity of the NestedLoopJoin operation is on the order of the number of rows in the outer table times the number of rows in the inner table. If there are millions of rows in each table, this operation is not efficient; it is best to add an index and/or shard keys to the tables in that case. If no index or shard key exists on either table, the optimizer performs keyless sharding or sorting on the outer table before the NestedLoopJoin operation. This is displayed by an index seek operation in the EXPLAIN output. This is the fallback join that the optimizer performs if no other join strategy applies.
- HashJoin - performs a hash join: SingleStore builds a hash table from one of the joined tables. For every row in the left table, the hash table is probed; if there is a match, the rows are joined. The hash table must fit in memory, so SingleStore attempts to create the hash table from the smaller of the two tables. If there isn't enough memory, a NestedLoopJoin is performed.
- MergeJoin - performs a merge join: SingleStore scans both the inner and outer sides of the join at the same time and merges matching rows. Both tables must have a sort key and a shard key on the column being joined; if these conditions are not met, the other two joins are considered. Since the values to be joined are in sorted order, both tables are scanned at the same time and the optimizer needs to perform only one scan. The shard key requirement makes the join local, ensuring only matches on each partition are considered. This is why MergeJoin is the most performant.
Note
MergeJoin is only supported for inner joins, not for outer joins.
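The single-pass property of a merge join can be sketched with two cursors over sorted inputs; the run-collection step handles duplicate keys. This is an illustration of the general algorithm under simplified assumptions (lists of (key, payload) tuples already sorted by key), not the engine's implementation:

```python
def merge_join(left, right):
    """Inner merge join of two lists sorted by key; one pass over each side.

    Advancing two cursors and emitting the cross product of equal-key runs
    touches each row once per run, instead of the outer-times-inner work a
    nested loop join would do.
    """
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Collect the run of equal keys on each side, then emit the
            # cross product of the two runs.
            i2 = i
            while i2 < len(left) and left[i2][0] == lk:
                i2 += 1
            j2 = j
            while j2 < len(right) and right[j2][0] == rk:
                j2 += 1
            for a in left[i:i2]:
                for b in right[j:j2]:
                    out.append((a, b))
            i, j = i2, j2
    return out

left = [(1, "l1"), (2, "l2"), (2, "l2b"), (5, "l5")]
right = [(2, "r2"), (3, "r3"), (5, "r5")]
print(merge_join(left, right))
```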
EXPLAIN SELECT * FROM t t1, t t2 WHERE t1.id = t2.a;
+------------------------------------------------------------------------------------------------+
| EXPLAIN |
+------------------------------------------------------------------------------------------------+
| Project [t1.id, t1.a, t1.b, r0.id_1, r0.a_2, r0.b_3] |
| Gather partitions:all est_rows:1 |
| Project [t1.id, t1.a, t1.b, r0.id_1, r0.a_2, r0.b_3] est_rows:1 est_select_cost:3 |
| NestedLoopJoin |
| |---IndexSeek db1.t AS t1, PRIMARY KEY (id) scan:[id = r0.a_2] est_table_rows:1 est_filtered:1 |
| TableScan r0 storage:list stream:no |
| Repartition [t2.id AS id_1, t2.a AS a_2, t2.b AS b_3] AS r0 shard_key:[a_2] est_rows:1 |
| TableScan db1.t AS t2, PRIMARY KEY (id) est_table_rows:1 est_filtered:1 |
+------------------------------------------------------------------------------------------------+
EXPLAIN SELECT * FROM t, ct WHERE t.b = ct.b;
+-----------------------------------------------------------------------------------------------+
| EXPLAIN |
+-----------------------------------------------------------------------------------------------+
| Project [r1.id, r1.a, r1.b, ct.a AS a_1, ct.b AS b_2] |
| Gather partitions:all est_rows:1 |
| Project [r1.id, r1.a, r1.b, ct.a AS a_1, ct.b AS b_2] est_rows:1 est_select_cost:4 |
| HashJoin [r1.b = ct.b] |
| |---Broadcast [t.id, t.a, t.b] AS r1 est_rows:1 |
| | TableScan db1.t, PRIMARY KEY (id) est_table_rows:1 est_filtered:1 |
| ColumnStoreScan db1.ct, KEY a (a) USING CLUSTERED COLUMNSTORE est_table_rows:1 est_filtered:1 |
+-----------------------------------------------------------------------------------------------+
EXPLAIN SELECT * FROM ct t1, ct t2 WHERE t1.a = t2.a;
+--------------------------------------------------------------------------------+
| EXPLAIN |
+--------------------------------------------------------------------------------+
| Project [t1.a, t1.b, t2.a, t2.b] |
| Gather partitions:all |
| Project [t1.a, t1.b, t2.a, t2.b] |
| MergeJoin condition:[t2.a = t1.a] |
| |---OrderedColumnStoreScan db1.ct AS t2, KEY a (a) USING CLUSTERED COLUMNSTORE |
| +---OrderedColumnStoreScan db1.ct AS t1, KEY a (a) USING CLUSTERED COLUMNSTORE |
+--------------------------------------------------------------------------------+
Handling Parameter-Dependent Query Plan Issues
A parameter-dependent query plan issue can occur when the query optimizer generates a query execution plan optimized for a specific parameter value/set of values.
Workarounds that can reduce parameter-dependent query plan performance issues are:
- Using both the with(row_count=xxx) and with(selectivity=x.x) query hints. The hints can be used with the right table after the JOIN:
JOIN <table_name> [AS alias] with(row_count=xxx, selectivity=x.x) ON
These hints override the statistics the query optimizer uses. with(row_count=xxx) treats the table as having xxx number of rows. with(selectivity=x.x) sets an estimate of the fraction of the table's rows that are not filtered.
- Using the NOPARAM function. This function disables the parameterization of constants before a query plan is compiled. Be aware that when parameterization is disabled, separate query plans are created for the different parameter values, thereby increasing compile time and resource usage.
- Adding comments to the query. Comments are not parameterized, so using them forces the optimizer to generate different plans. Note, however, that the client can strip the comments from a query, thereby preventing the effectiveness of this workaround. For example, with:
SELECT /*1*/ 1;
the server could receive the following command:
SELECT 1;
Again, be aware of the increase in compile time and resources needed when separate query plans are created.
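The effect of constant parameterization on plan-cache keys can be pictured with a toy normalizer. This is only an illustration of the idea (real engines parameterize during parsing, not with a regex), and it uses letter-labeled comments so the comments themselves contain no digits:

```python
import re

def parameterize(sql: str) -> str:
    """Toy constant parameterization: replace numeric literals with a
    placeholder so queries differing only in constants share one plan-cache
    key. Comments with non-numeric text are left alone, so queries carrying
    different comments get distinct keys.
    """
    return re.sub(r"\b\d+\b", "?", sql)

# Different constants collapse to the same plan-cache key ...
print(parameterize("SELECT * FROM t WHERE id = 7"))  # SELECT * FROM t WHERE id = ?
print(parameterize("SELECT * FROM t WHERE id = 42") ==
      parameterize("SELECT * FROM t WHERE id = 7"))  # True

# ... but distinct comments survive, forcing separate plans.
print(parameterize("SELECT /*a*/ 1") == parameterize("SELECT /*b*/ 1"))  # False
```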
Query Performance Tool Results For The Same Query
Using a join between the people and age tables shown in the shard key and sort key topics, this is what we get when we submit the same query to EXPLAIN, PROFILE, and Visual Explain.
EXPLAIN:
EXPLAIN SELECT people.user, people.first, people.last, age.age
FROM people JOIN age ON people.id = age.age_id
ORDER BY age.age;
+----------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN |
+----------------------------------------------------------------------------------------------------------------------------------------------------+
| GatherMerge [remote_0.age] partitions:all est_rows:8 alias:remote_0 |
| Project [people.user, people.first, people.last, age.age] est_rows:8 est_select_cost:16 |
| TopSort limit:[?] [age.age] |
| HashJoin |
| |---HashTableProbe [people.id = age.age_id] |
| | HashTableBuild alias:age |
| | Project [age_0.age, age_0.age_id] est_rows:8 |
| | ColumnStoreScan demo.age AS age_0, KEY __UNORDERED () USING CLUSTERED COLUMNSTORE table_type:sharded_columnstore est_table_rows:8 est_filtered:8 |
| ColumnStoreScan demo.people, KEY __UNORDERED () USING CLUSTERED COLUMNSTORE table_type:sharded_columnstore est_table_rows:8 est_filtered:8 |
+----------------------------------------------------------------------------------------------------------------------------------------------------+
PROFILE:
PROFILE SELECT people.user, people.first, people.last, age.age
FROM people JOIN age ON people.id = age.age_id
ORDER BY age.age;
+-------------+-----------+------------+-----+
| user | first | last | age |
+-------------+-----------+------------+-----+
| cjohnson | carrie | johnson | 10 |
| dyarbrough | dennis | yarbrough | 13 |
| pquincy | penelope | quincy | 21 |
| cachebe | chioma | achebe | 29 |
| jmalik | jahid | malik | 35 |
| kharris | keisha | harris | 43 |
| tling | tai | ling | 67 |
| sstevens | samuel | stevens | 72 |
+-------------+-----------+------------+-----+
{
"plan_warnings": {
},
"execution_warnings": {
},
"mpl_path":"\/var\/lib\/memsql\/instance\/plancache\/a91\/Select_profile_age__et_al_a9111431d10dfe90ceed8b3f2b8e9c1d9706242c27af5f7ee4e97d7d3954a29a_6400bba1c33851b2",
"profile":[
{
"executor":"GatherMerge",
"keyId":4295163904,
"order":[
"remote_0.age"
],
"partitions":"all",
"est_rows":"8",
"est_rows_source":"JOIN",
"query":"SELECT STRAIGHT_JOIN `people`.`user` AS `user`, `people`.`first` AS `first`, `people`.`last` AS `last`, `age`.`age` AS `age` FROM (`demo_0`.`people` as `people` STRAIGHT_JOIN ( SELECT WITH(NO_MERGE_THIS_SELECT=1) `age_0`.`age` AS `age`, `age_0`.`age_id` AS `age_id` FROM `demo_0`.`age` as `age_0` ) AS `age`WITH (gen_min_max = TRUE)) WHERE (`people`.`id` = `age`.`age_id`) ORDER BY 4 \/*!90623 OPTION(NO_QUERY_REWRITE=1, INTERPRETER_MODE=INTERPRET_FIRST, CLIENT_FOUND_ROWS=1)*\/",
"alias":"remote_0",
"actual_total_time":{ "value":0 },
"start_time":{ "value":0 },
"end_time":{ "value":1 },
"actual_row_count":{ "value":8, "avg":0.000000, "stddev":0.000000, "max":0, "maxPartition":0 },
"inputs":[
{
"executor":"Project",
"keyId":196736,
"out":[
{
"alias":"",
"projection":"people.user"
},
{
"alias":"",
"projection":"people.first"
},
{
"alias":"",
"projection":"people.last"
},
{
"alias":"",
"projection":"age.age"
}
],
"est_rows":"8",
"est_rows_source":"JOIN",
"est_select_cost":"16",
"subselects":[],
"actual_row_count":{ "value":8, "avg":4.000000, "stddev":1.000000, "max":5, "maxPartition":0 },
"actual_total_time":{ "value":0, "avg":0.000000, "stddev":0.000000, "max":0, "maxPartition":0 },
"start_time":{ "value":0, "avg":0.000000, "stddev":0.000000, "max":0, "maxPartition":0 },
"network_traffic":{ "value":202, "avg":101.000000, "stddev":18.000000, "max":119, "maxPartition":0 },
"network_time":{ "value":0, "avg":0.000000, "stddev":0.000000, "max":0, "maxPartition":0 },
"inputs":[
{
"executor":"Sort",
"keyId":196744,
"order":[
"age.age"
],
"actual_row_count":{ "value":8, "avg":4.000000, "stddev":1.000000, "max":5, "maxPartition":0 },
"actual_total_time":{ "value":0, "avg":0.000000, "stddev":0.000000, "max":0, "maxPartition":0 },
"start_time":{ "value":0, "avg":0.000000, "stddev":0.000000, "max":0, "maxPartition":0 },
"memory_usage":{ "value":785424, "avg":392712.000000, "stddev":0.000000, "max":392712, "maxPartition":0 },
"inputs":[
{
"executor":"HashJoin",
"keyId":327681,
"type":"inner",
"subselects":[],
"actual_row_count":{ "value":8, "avg":4.000000, "stddev":1.000000, "max":5, "maxPartition":0 },
"actual_total_time":{ "value":0, "avg":0.000000, "stddev":0.000000, "max":0, "maxPartition":0 },
"start_time":{ "value":0, "avg":0.000000, "stddev":0.000000, "max":0, "maxPartition":0 },
"inputs":[
{
"executor":"HashTableProbe",
"keyId":327825,
"condition":[
"people.id = age.age_id"
],
"inputs":[
{
"executor":"HashTableBuild",
"keyId":327824,
"alias":"age",
"actual_row_count":{ "value":8, "avg":0.000000, "stddev":0.000000, "max":0, "maxPartition":0 },
"actual_total_time":{ "value":0, "avg":0.000000, "stddev":0.000000, "max":0, "maxPartition":0 },
"start_time":{ "value":0, "avg":0.000000, "stddev":0.000000, "max":0, "maxPartition":0 },
"memory_usage":{ "value":262144, "avg":131072.000000, "stddev":0.000000, "max":131072, "maxPartition":0 },
"inputs":[
{
"executor":"Project",
"keyId":327808,
"out":[
{
"alias":"",
"projection":"age_0.age"
},
{
"alias":"",
"projection":"age_0.age_id"
}
],
"est_rows":"8",
"est_rows_source":"JOIN",
"subselects":[],
"actual_row_count":{ "value":8, "avg":4.000000, "stddev":1.000000, "max":5, "maxPartition":0 },
"actual_total_time":{ "value":0, "avg":0.000000, "stddev":0.000000, "max":0, "maxPartition":0 },
"start_time":{ "value":0, "avg":0.000000, "stddev":0.000000, "max":0, "maxPartition":0 },
"inputs":[
{
"executor":"ColumnStoreScan",
"keyId":4295360512,
"db":"demo",
"table":"age",
"alias":"age_0",
"index":"KEY __UNORDERED () USING CLUSTERED COLUMNSTORE",
"storage":"columnar",
"table_type":"sharded_columnstore",
"columnstore_in_memory_scan_type":"TableScan",
"columnstore_in_memory_scan_index":"KEY __UNORDERED () USING CLUSTERED COLUMNSTORE",
"est_table_rows":"8",
"est_filtered":"8",
"est_filtered_source":"DEFAULT",
"actual_row_count":{ "value":8, "avg":4.000000, "stddev":1.000000, "max":5, "maxPartition":0 },
"actual_total_time":{ "value":0, "avg":0.000000, "stddev":0.000000, "max":0, "maxPartition":0 },
"start_time":{ "value":0, "avg":0.000000, "stddev":0.000000, "max":0, "maxPartition":0 },
"memory_usage":{ "value":262144, "avg":131072.000000, "stddev":0.000000, "max":131072, "maxPartition":0 },
"segments_scanned":{ "value":2, "avg":1.000000, "stddev":0.000000, "max":1, "maxPartition":0 },
"segments_skipped":{ "value":0, "avg":0.000000, "stddev":0.000000, "max":0, "maxPartition":0 },
"segments_fully_contained":{ "value":0, "avg":0.000000, "stddev":0.000000, "max":0, "maxPartition":0 },
"segments_filter_encoded_data":{ "value":0, "avg":0.000000, "stddev":0.000000, "max":0, "maxPartition":0 },
"blob_fetch_network_time":{ "value":0, "avg":0.000000, "stddev":0.000000, "max":0, "maxPartition":0 },
"segments_in_blob_cache":{ "value":2, "avg":1.000000, "stddev":0.000000, "max":1, "maxPartition":0 },
"inputs":[]
}
]
}
]
}
]
},
{
"executor":"ColumnStoreScan",
"keyId":4295229440,
"db":"demo",
"table":"people",
"alias":"people",
"index":"KEY __UNORDERED () USING CLUSTERED COLUMNSTORE",
"storage":"columnar",
"table_type":"sharded_columnstore",
"columnstore_in_memory_scan_type":"TableScan",
"columnstore_in_memory_scan_index":"KEY __UNORDERED () USING CLUSTERED COLUMNSTORE",
"est_table_rows":"8",
"est_filtered":"8",
"est_filtered_source":"DEFAULT",
"actual_row_count":{ "value":8, "avg":4.000000, "stddev":1.000000, "max":5, "maxPartition":0 },
"actual_total_time":{ "value":0, "avg":0.000000, "stddev":0.000000, "max":0, "maxPartition":0 },
"start_time":{ "value":0, "avg":0.000000, "stddev":0.000000, "max":0, "maxPartition":0 },
"memory_usage":{ "value":262144, "avg":131072.000000, "stddev":0.000000, "max":131072, "maxPartition":0 },
"segments_scanned":{ "value":2, "avg":1.000000, "stddev":0.000000, "max":1, "maxPartition":0 },
"segments_skipped":{ "value":0, "avg":0.000000, "stddev":0.000000, "max":0, "maxPartition":0 },
"segments_fully_contained":{ "value":0, "avg":0.000000, "stddev":0.000000, "max":0, "maxPartition":0 },
"segments_filter_encoded_data":{ "value":0, "avg":0.000000, "stddev":0.000000, "max":0, "maxPartition":0 },
"blob_fetch_network_time":{ "value":0, "avg":0.000000, "stddev":0.000000, "max":0, "maxPartition":0 },
"segments_in_blob_cache":{ "value":2, "avg":1.000000, "stddev":0.000000, "max":1, "maxPartition":0 },
"inputs":[]
}
]
}
]
}
]
}
]
}
],
"version":"4",
"info":{
"memsql_version":"7.5.8",
"memsql_version_hash":"12c73130aa6881ec53d57b44a654e4bada1a07c5",
"num_online_leaves":"2",
"num_online_aggs":"1",
"context_database":"demo"
},
"query_info":{
"query_text":"PROFILE SELECT people.user, people.first, people.last, age.age\n FROM people\n JOIN age ON people.id = age.age_id\n ORDER BY age.age",
"total_runtime_ms":"2",
"text_profile":"GatherMerge [remote_0.age] partitions:all est_rows:8 alias:remote_0 exec_time: 0ms start_time: 00:00:00.000 end_time: 00:00:00.001 actual_rows: 8\nProject [people.user, people.first, people.last, age.age] est_rows:8 est_select_cost:16 actual_rows: 8 exec_time: 0ms start_time: 00:00:00.000 network_traffic: 0.202000 KB network_time: 0ms\nSort [age.age] actual_rows: 8 exec_time: 0ms start_time: 00:00:00.000 memory_usage: 785.424011 KB\nHashJoin actual_rows: 8 exec_time: 0ms start_time: 00:00:00.000\n|---HashTableProbe [people.id = age.age_id]\n| HashTableBuild alias:age actual_rows: 8 exec_time: 0ms start_time: 00:00:00.000 memory_usage: 262.144012 KB\n| Project [age_0.age, age_0.age_id] est_rows:8 actual_rows: 8 exec_time: 0ms start_time: 00:00:00.000\n| ColumnStoreScan demo.age AS age_0, KEY __UNORDERED () USING CLUSTERED COLUMNSTORE table_type:sharded_columnstore est_table_rows:8 est_filtered:8 actual_rows: 8 exec_time: 0ms start_time: 00:00:00.000 memory_usage: 262.144012 KB segments_scanned: 2 segments_skipped: 0 segments_fully_contained: 0 blob_fetch_network_time: 0ms segments_in_blob_cache: 2\nColumnStoreScan demo.people, KEY __UNORDERED () USING CLUSTERED COLUMNSTORE table_type:sharded_columnstore est_table_rows:8 est_filtered:8 actual_rows: 8 exec_time: 0ms start_time: 00:00:00.000 memory_usage: 262.144012 KB segments_scanned: 2 segments_skipped: 0 segments_fully_contained: 0 blob_fetch_network_time: 0ms segments_in_blob_cache: 2\nCompile Total Time: 0ms\n",
"compile_time_stats":{
"mbc_emission":"0",
"create_mbc_context":"0",
"optimizer_query_rewrites":"0",
"optimizer_stats_analyze":"0",
"optimizer_stats_other":"0",
"optimizer_setting_up_subselect":"0",
"optimizer_distributed_optimizations":"0",
"optimizer_enumerate_temporary_tables":"0",
"optimizer_singlebox_optimizations_agg":"0",
"optimizer_stats_autostats":"0",
"generating_query_mpl":"0",
"generating_user_function_mpl":"0",
"unknown":"0",
"total":"0"
}
}
}
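Because the profile is plain JSON, it can be post-processed programmatically. The sketch below (illustrative, not an official tool) walks the operator tree through the nested `inputs` arrays and reports each executor's `actual_row_count` and `memory_usage`; the `profile_json` fragment is a trimmed stand-in for the full output above:

```python
import json

def walk(node, depth=0, out=None):
    """Flatten a PROFILE JSON operator tree into (depth, executor, rows, memory_bytes) tuples."""
    if out is None:
        out = []
    rows = node.get("actual_row_count", {}).get("value")   # total rows this operator produced
    mem = node.get("memory_usage", {}).get("value", 0)     # bytes; not every executor reports it
    out.append((depth, node["executor"], rows, mem))
    for child in node.get("inputs", []):
        walk(child, depth + 1, out)
    return out

# Trimmed fragment mirroring the structure of the JSON profile above.
profile_json = """
{
  "profile": [
    {"executor": "GatherMerge",
     "actual_row_count": {"value": 8},
     "inputs": [
       {"executor": "Sort",
        "actual_row_count": {"value": 8},
        "memory_usage": {"value": 785424},
        "inputs": []}
     ]}
  ]
}
"""

doc = json.loads(profile_json)
ops = []
for root in doc["profile"]:
    ops.extend(walk(root))

for depth, name, rows, mem in ops:
    print(f"{'  ' * depth}{name}: rows={rows}, memory={mem} bytes")
```

The same traversal works on the full profile, where operators such as `ColumnStoreScan` additionally expose `segments_scanned` and `segments_skipped`, useful for checking segment elimination.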
VISUAL EXPLAIN:
SELECT people.user, people.first, people.last, age.age
FROM people
JOIN age ON people.id = age.age_id
ORDER BY age.age;

Last modified: August 1, 2023