Query Compilation

Why does CREATE TABLE take so long in SingleStore?

In order to speed up the compilation of queries in SingleStore, CREATE TABLE will precompile code used to access and update table indexes. This is a one-time operation for each unique table schema, and compiled table code will be cached on disk for future uses.

Why do SingleStore queries typically run faster the second time they are executed?

Traditional relational database management systems interpret SQL queries the same way interpreters for languages like Python and Ruby run programs. The first time a SingleStore server encounters a given query shape, it will optimize and compile the query for future invocations. This incurs overhead which does not depend on the amount of data to be processed, but rather the complexity of the query. The process of code generation involves extracting parameters from the query then transforming the normalized query into a SingleStore-specific intermediate representation tailored to the system. Subsequent requests with the same shape can reuse this plan to complete both quickly and consistently. Starting with MemSQL 5, MemSQL embeds an industrial compiler (LLVM) for code generation, leading to fast query performance for even the first time queries are run.

How much can you change a query before it needs to be recompiled?

If you only change an integer or string constant in a query, it will not require recompilation.

SingleStorestrips out numeric and string parameters from a query and attaches the resulting string to a compiled plan. This string is referred to as a parameterized query. For example, SELECT * FROM foo WHERE id=22 AND name='bar' is parameterized to SELECT * FROM foo WHERE id=@ AND name=^.

You can list the distinct parameterized queries corresponding to all executed queries by running SHOW PLANCACHE.

The one exception to this rule is constants in the projection clause without an alias. These constants are compiled directly into the plan’s assembly code for performance reasons. For example, SELECT id + 1, id + 2 AS z FROM foo is converted to SELECT id + 1, id + @ AS z FROM foo.

Why query compilation is abandoned?

If you receive a warning WARN: Failed to finalize asynchronous compilation, abandoning compilation of this query, this indicates a query's asynchronous compilation did not finish properly. It is likely due to query compilation timeouts or reaching the maximum compilation memory limit.

You can also get this warning if a table is updated, altered, or dropped within a transaction before the asynchronous compilation is finished.

Here are some solutions to consider:

  • Increase the max_compilation_memory_mb (maximum amount of memory to compile a query) setting on the Master Aggregator. The default value for this engine variable is 4096 megabytes:

    SET GLOBAL max_compilation_memory_mb = 4097;
  • Increase the max_compilation_time_s (maximum time compilation time) setting on the Master Aggregator. The default value for this engine variable is 600 seconds:

    SET GLOBAL max_compilation_time_s = 601;
  • Set the interpreter_mode session variable to mbc or compile_lite to interpret the query rather than compile it, and then try running the query again:

    SET SESSION interpreter_mode = mbc;
    SET SESSION interpreter_mode = compile_lite;

If you continue to see this warning, it may be worthwhile to investigate why the query is taking so much memory to compile. You may need to optimize the query or consider other performance tuning techniques.

How does parameterization affect queries that use JSON?

The parametrize_json_keys variable controls how the engine approaches queries that perform the same operations but operate on different JSON keys. For example, consider the two queries:

SELECT * FROM t WHERE JSON_EXTRACT_STRING(t.data, 'foo') = "bar";
SELECT * FROM t WHERE JSON_EXTRACT_STRING(t.data, 'bar') = "foo";

Here the engine compiles two different query shapes, one for each key ('foo' and 'bar'); by comparison, if parametrize_json_keys is enabled the key is parametrized and only one plan is created.

Now consider the below query where the specific key has been replaced with a generic placeholder:

SELECT * FROM t WHERE JSON_EXTRACT_STRING(t.data, ^) = "bar";

In this case, fewer plans are stored and less time is spent compiling if you send the same shape with different keys.

However, results may vary and having parametrize_json_keys enabled may not always be optimal for all queries.

For example, consider:

SELECT * FROM t, t2 WHERE JSON_EXTRACT_STRING(t.data, "address_state") = "CA"  AND t.a = t2.b;

Here we need to see how data movement will work for this query, depending on the filter on the ADDRESS_STATE key. If you parametrize this query you could use the same shape for:

SELECT * FROM t, t2 WHERE JSON_EXTRACT_STRING(t.data, "user_id") = "0124512abcdef" AND t.a = t2.b;

but you may see wildly different row counts depending on what key is being used.

Another optimization that may be affected when parametrizing involves deduplicating column references.

For example, consider:

SELECT JSON_EXTRACT_STRING(t.data, "foo") FROM t WHERE JSON_EXTRACT_STRING(t.data, "foo") LIKE "%bar%";

With no parametrization, the "foo" key will be read once during execution but with parametrization, you cannot be sure that the same key will be read each time and potentially the same data may be read twice on each execution.

Last modified: July 16, 2024

Was this article helpful?