Code Generation
On this page
A large contributor to SingleStore's superior query execution performance is its system of code generation.
As stated above, the first time SingleStore encounters a given query shape, it will optimize and compile the query asynchronously for future invocations.
Code generation applies to all Data Manipulation Language DML queries.
The Interpreter Mode’s Effects on Code Generation
The interpreter_
setting controls whether SingleStore interprets or compiles query shapes, or does both.
This section explains how this mode operates when SingleStore encounters a new query shape.
interpreter_ mode
is set to compile
or llvm
In this mode, SingleStore compiles a query’s shape the first time it is encountered.SELECT * FROM t WHERE col = 1;
takes longer to complete on the first run; this is a result of the compile overhead during the first step.WHERE
clauses are different in the second and third query, but both query shapes are identical.
SELECT * FROM t WHERE col = 1;
Empty set (0.13 sec)
SELECT * FROM t WHERE col = 1;
Empty set (0.00 sec)
SELECT * FROM t WHERE col = 100000;
Empty set (0.00 sec)
Important
Run the PROFILE command followed by the SHOW PROFILE JSON
command to view detailed statistics on compilation time (displayed under compile_
in the output).
interpreter_ mode
is set to compile_ lite
In this mode, a query's shape is compiled, without any optimizations, the first time the query shape is encountered.interpreter_
is set to compile
.compile_
applies no optimizations, it will execute queries slower than compile
.
interpreter_ mode
is set to compile_ lite_ first
In this mode, two parallel compilation threads are run: A foreground thread that compiles the query shape in compile_
mode and a background thread that compiles the query shape in compile
mode.compile_
mode instead of compile_
mode.compile_
would consume too many CPU resources (as the same query shape is compiled twice).
interpreter_ mode
is set to interpret
or mbc
In this mode, queries are interpreted and not compiled.
interpreter_ mode
is set to interpret_ first
In this mode, SingleStore interprets and compiles a query shape in parallel, upon encountering the query shape for the first time.SELECT * FROM t WHERE col = 1;
query runs faster than the first query in the previous example.
SELECT * FROM t WHERE col = 1;
Empty set (0.02 sec)
SELECT * FROM t WHERE col = 1;
Empty set (0.00 sec)
SELECT * FROM t WHERE col = 100000;
Empty set (0.00 sec)
As of MemSQL 6.interpreter_
is set to interpret_
by default.
Setting the Interpreter Mode
interpreter_
can be set:
Using a Query Option
To use the query option, add the OPTION (interpreter_
clause to the end of a query.
SELECT * FROM t WHERE col = 1 OPTION (interpreter_mode = compile);
The interpreter_
option is non-operational when a query containing the option is run on a leaf node.
Understanding the Plancache
When a query is first run, it activates code generation, gets optimized, and translated into lower-level machine language./var/lib/memsql/<node ID>/plancache
/.
If the data has undergone a lot of changes, then you might want the plancache to be updated with the new stats by running: ANALYZE TABLE table_
After code generation, the compiled query plans are saved for later use in a plancache.
There are two layers of this plancache: the in-memory plancache and the on-disk plancache (located in the @@global.
directory).plan_
global variable has elapsed (the default value of this variable is 720, or 12 hours).
When a plan expires from the in-memory plancache, it remains in the on-disk plancache and is loaded back into memory the next time the query is run.enable_
and disk_
are set, then a plan that has not been read from disk during the specified disk_
limit will expire from the on-disk plancache and be removed from disk.
To purge all generated plans, both in-memory and on-disk, run the DROP ALL FROM PLANCACHE command.
To see DML query plans that have been cached in the in-memory plancache, run the SHOW PLANCACHE command or query against the information_
table.
The information_
and information_
views contain information about query plans that have been compiled and executed, cumulative query execution statistics about each plan, and plancache information for all nodes.
Please note the details on how the plan_
and activity_
columns can change in the plancache views.SELECT * FROM table_
-
When the plan expires from memory as set by the
plan_
engine variable, if the query is re-run, note that theexpiration_ minutes plan_
changes, theid activity_
does not:name SELECT plan_id, activity_name FROM information_schema.PLANCACHEWHERE query_text LIKE 'SELECT * FROM planc1 ORDER BY%';+---------+--------------------------------+ | plan_id | activity_name | +---------+--------------------------------+ | 13 | Select_planc1_901e63e0a36b09c1 | +---------+--------------------------------+
SELECT plan_id, activity_name FROM information_schema.PLANCACHEWHERE query_text LIKE 'SELECT * FROM planc1 ORDER BY%';+---------+--------------------------------+ | plan_id | activity_name | +---------+--------------------------------+ | 33 | Select_planc1_901e63e0a36b09c1 | +---------+--------------------------------+
-
If plans are dropped from both memory and disk, then both the
plan_
andid activity_
changes.name SELECT plan_id, activity_name FROM information_schema.PLANCACHEWHERE query_text LIKE 'SELECT * FROM planc1 ORDER BY%';+---------+--------------------------------+ | plan_id | activity_name | +---------+--------------------------------+ | 33 | Select_planc1_901e63e0a36b09c1 | +---------+--------------------------------+
DROP ALL FROM PLANCACHE;SELECT plan_id, activity_name FROM information_schema.PLANCACHEWHERE query_text LIKE 'SELECT * FROM planc1 ORDER BY%';+---------+--------------------------------+ | plan_id | activity_name | +---------+--------------------------------+ | 38 | Select_planc1_2950b7e9280f64bb | +---------+--------------------------------+
-
If the definition of the table queried changes or an engine variable that affects query plans changes, then the
plan_
andid activity_
columns both change in the plancache.name The following example illustrates the change of both the plan_
andid activity_
when a column is added to the table:name SELECT plan_id, activity_name FROM information_schema.PLANCACHEWHERE query_text LIKE 'SELECT * FROM planc1 ORDER BY%';+---------+--------------------------------+ | plan_id | activity_name | +---------+--------------------------------+ | 38 | Select_planc1_2950b7e9280f64bb | +---------+--------------------------------+
ALTER TABLE planc1 ADD COLUMN d INT;SELECT plan_id, activity_name FROM information_schema.PLANCACHEWHERE query_text LIKE 'SELECT * FROM planc1 ORDER BY%';+---------+--------------------------------+ | plan_id | activity_name | +---------+--------------------------------+ | 39 | Select_planc1_4d6188772954c27c | +---------+--------------------------------+
-
The following example illustrates what happens when a plan-affecting engine variable is changed.
In this example, auto profiling is enabled. The previous plan remains valid; however, there are two different plans for the same query. SELECT plan_id, activity_name FROM information_schema.PLANCACHEWHERE query_text LIKE 'SELECT * FROM planc1 ORDER BY%';+---------+--------------------------------+ | plan_id | activity_name | +---------+--------------------------------+ | 39 | Select_planc1_4d6188772954c27c | +---------+--------------------------------+
SET GLOBAL enable_auto_profile = 1;SELECT plan_id, activity_name FROM information_schema.PLANCACHEWHERE query_text LIKE 'SELECT * FROM planc1 ORDER BY%';+---------+----------------------------------------+ | plan_id | activity_name | +---------+----------------------------------------+ | 39 | Select_planc1_4d6188772954c27c | | 45 | Select_profile_planc1_cb0dd9eed1c43d24 | +---------+----------------------------------------+
Overriding Parameterization of Queries
Parameterization is when the constants in a query are converted into parameters, so that queries with the same structure can reuse the query plan.
To override parameterization, use the NOPARAM built-in function.
Codegen Architecture
Code generation produces a specialized programming language, the SingleStore (MemSQL) Plan Language.
In addition to the simplicity of the language, MPL benefits from exposing as primitives many relational operations such as index seek
and scalar operations such as
.BIGINT
addition
SingleStore Bytecode
When compiling a query, SingleStore first flattens MPL into a compact format known as SingleStore (MemSQL) Bytecode (MBC).x86_
machine code.
SELECT * FROM t WHERE j > 0;
Empty set (0.05 sec)
SELECT * FROM t WHERE j > 0;
Empty set (0.05 sec)
In this example, there is no additional latency on the first request because the query is interpreted.
SELECT * FROM t WHERE j > 0; -- With code generation
Empty set (0.08 sec)
SELECT * FROM t WHERE j > 0; -- Using the cached plan
Empty set (0.02 sec)
Here, the additional latency on the first request is the result of code generation.
Last modified: October 31, 2024