WITH (Common Table Expressions)

Specifies a temporary named result set, referred to as a Common Table Expression (CTE).

Syntax

Non-recursive (standard) CTE:

WITH cte [, ...]
cte:
expression_name[(column_name, ...)]
AS
select_statement

Recursive CTE:

WITH RECURSIVE cte [, ...]
cte:
expression_name [(column_name, ...)] AS (
<base_case>
UNION ALL
<recursive_case>
) select_statement;

Remarks

  • WITH clauses are supported as part of SELECT, INSERT, UPDATE, DELETE, and REPLACE statements for non-recursive CTEs.

  • Recursive CTEs must have the materialize_ctesengine variable enabled to work. If this engine variable is not enabled, an error is generated when attempting to execute a recursive CTE.

    ERROR 2730 UNKNOWN_ERR_CODE: Unsupported recursive common table expression query shape:
    the engine variable 'materialize_ctes' is currently set to OFF.
    In order to run the recursive common table expression, you need to set 'materialize_ctes' to 'AUTO' or 'ALL'.
    See the Materializing Common Table Expressions documentation for additional information.
  • WITH RECURSIVE clauses defining recursive CTEs are only supported for SELECT statements for both the base case and the recursive case.

  • SingleStore supports materializing CTEs without recomputing them when the query contains UNION/UNION ALL and other SET operations. To enable the feature, set the engine variable allow_materialize_cte_with_union to TRUE. The default value for this engine variable is FALSE for version 8.1.19 and earlier. For more information about materializing CTEs, see the WITH (Common Table Expressions) section below.

  • By default, SingleStore Helios attempts to materialize CTEs. See Materializing Common Table Expressions for more information.

  • If a CTE defined in the WITH clause is not referenced in the SELECT statement, it does not affect the execution of the query.

  • The engine variable internal_max_cte_depth defines the maximum number of common table expressions (CTEs) that can be nested. The default value of this variable is 128.

  • The engine variable max_recursive_cte_iterations sets the maximum number of times the recursive case is executed.

Non-recursive vs recursive CTEs

A non-recursive standard CTE does not reference itself within the statement it is executed in. Non-recursive CTEs are inline view definitions that appear at the start of a query. Also, a non-recursive CTE can be used to simplify a query by isolating subselects out of the main SQL statement.

A recursive CTE references itself. They are useful when working with hierarchical data because the CTE continues to execute until the query returns the desired portion of the hierarchy.

A common example of hierarchical data is an organization chart where employees are listed under their direct managers.

This shows, in a graphical form, that a recursive CTE can retrieve from the database by recursively expanding the result set starting at the top with Maggie. The result set will still be in the form of a table.

A recursive CTE contains two SELECT statements joined by a UNION ALL clause. The first SELECT statement is the base case which is executed once. The second SELECT statement is the recursive case where the CTE references itself, expanding repeatedly until the termination condition is met. The termination condition does not evaluate to TRUE or FALSE. The termination condition is not specific syntax; rather, the RCTE terminates when the recursive case does not return any new rows.

The recursive CTE will use the data types from the columns in the base case if the columns in the recursive case have different data types.

Limitations

There are several limitations on which operations are allowed in the SELECT statements for the recursive case of recursive CTEs. Please note, this list is subject to change upon future releases. These operations are not allowed in the SELECT in the recursive case:

  • The base case in a recursive CTE must be a sharded table.

  • SELECT DISTINCT

  • EXCEPT/MINUS

  • LEFT [OUTER] JOIN

  • RIGHT [OUTER] JOIN

  • INTERSECT

  • UNION

  • GROUP BY

  • ORDER BY

  • LIMIT

  • Aggregate functions like COUNT, MAX, etc.

  • Window functions like RANK, DENSE RANK, etc.

Troubleshooting timed-out recursive CTEs

If the recursive CTE takes an inordinate amount of time to execute or times out, there can be an issue with the data’s structure or with the creation of the CTE itself. These issues prevent the recursive case of the CTE from terminating.

Your data structure may not be truly hierarchical. Using the organization chart example, if Maggie also reported to John, then the data structure contains a cycle. If your data contains cycles, either update the data to remove cycles, or use some other method to make the expansion terminate, like explicitly checking expansion depth.

Your recursive CTE query can be incorrectly coded and create an infinite loop. An infinite loop can occur when the SELECT clause in the recursive case outputs a value from the base case instead of the current iteration. This causes incorrect values to be used in the subsequent iteration’s JOIN operation.

Both the query itself and the data must allow the recursive case to terminate.

Known issue with recursive CTEs

If a recursive CTE results in the error "'Table'.'doesn't exist'", the materialize_ctes setting may be disabled. This usually happens when upgrading from a pre-7.8 version where the materialize_ctes setting was not enabled by default. See the WITH (Common Table Expressions) page for information on how to enable the functionality.

Examples

Non-recursive CTEs:

WITH myCTE AS
(select col from myTable)
select col from myCTE;
WITH myCTE (colAlias) AS
(select col from myTable)
select colAlias from myCTE;
WITH orderCTE AS
(select o_orderkey from orders),
lineitemCTE AS
(select l_orderkey from lineitem)
select count(*) from orderCTE join lineitemCTE on o_orderkey = l_orderkey;
WITH foo AS
(WITH bar AS
(SELECT * FROM t) SELECT * FROM bar)
SELECT * FROM foo;
WITH EmpSal(averageSal) AS
(SELECT AVG(Salary) FROM Employee)
SELECT EmpID, Name, Salary FROM Employee, EmpSal
WHERE Employee.Salary > EmpSal.averageSal;
WITH ObjectCTE (Name, Id, Date) AS
(SELECT objname, objectid, invoicedate FROM Inventory)
INSERT INTO Itemlist(Name, objectid, createdDate)
SELECT Name, Id, Date FROM ObjectCTE;
SELECT * FROM Itemlist;

Recursive CTEs:

Using the organization chart described above, this example illustrates how recursive CTEs simplify the retrieval of hierarchical data.

CREATE TABLE cte_emp(id INT, mgr_id INT, name VARCHAR(30));
INSERT INTO cte_emp VALUES(1, NULL, "Maggie"), (2, 1, "John"), (3, 1, "Felix"),
(4, 2, "Joan"), (5, 2, "Richard"), (6, 3, "Andy"), (7, 3, "Jill");
WITH RECURSIVE org_chart (emp_id, mgr_id, name, level) AS (
SELECT cte_emp.id, cte_emp.mgr_id, cte_emp.name, 0 AS level
FROM cte_emp
WHERE cte_emp.mgr_id is NULL
UNION ALL
SELECT cte_emp.id, cte_emp.mgr_id, cte_emp.name, level+1
FROM cte_emp
INNER JOIN org_chart
ON cte_emp.mgr_id = org_chart.emp_id
) SELECT * FROM org_chart;
+------+--------+---------+-------+
| id   | mgr_id | name    | level |
+------+--------+---------+-------+
|    1 |   NULL | Maggie  |     0 |
|    3 |      1 | Felix   |     1 |
|    2 |      1 | John    |     1 |
|    5 |      2 | Richard |     2 |
|    4 |      2 | Joan    |     2 |
|    7 |      3 | Jill    |     2 |
|    6 |      3 | Andy    |     2 |
+------+--------+---------+-------+

Another example is a graph of connected nodes. A recursive CTE can retrieve the nodes and their relationships to one other. The id column is the node id and pr is the parent node to the node in the id column.

CREATE TABLE cte_g (id INT, pr INT);
INSERT INTO cte_g VALUES (0, null), (1, 0), (2, 0),(3, 1), (4, 1), (5, 2), (6, 2);
WITH RECURSIVE depths AS (
SELECT 0 depth, *
FROM cte_g
WHERE id = 0
UNION ALL
SELECT depth+1, cte_g.id, cte_g.pr
FROM cte_g, depths
WHERE cte_g = depths.id
)
SELECT depth, id FROM depths;
+-------+------+
| depth | id   |
+-------+------+
|     0 |    0 |
|     1 |    1 |
|     1 |    2 |
|     2 |    5 |
|     2 |    3 |
|     2 |    4 |
|     2 |    6 |
+-------+------+

The following examples show the recursive CTE will use the data types from the columns in the base case if the columns in the recursive case have different data types.

CREATE TABLE cte_g (id INT, pr INT, SHARD KEY(id));
INSERT INTO cte_g VALUES (0, null), (1, 0), (2, 0),(3, 1), (4, 1), (5, 2), (6, 2);
WITH RECURSIVE depths AS (
SELECT 0 depth, id, pr
FROM cte_g
WHERE pr IS NULL
UNION ALL
SELECT CAST((depth+1) AS DECIMAL(10, 2)), cte_g.id, cte_g.pr
FROM cte_g, depths
WHERE cte_g.pr = depths.id
)
SELECT depth, id FROM depths;
+-------+------+
| depth | id   |
+-------+------+
|     0 |    0 |
|     1 |    2 |
|     2 |    5 |
|     2 |    4 |
|     1 |    1 |
|     2 |    3 |
|     2 |    6 |
+-------+------+
WITH RECURSIVE routes (id, path) AS (
SELECT id, cast ( 1 as char(30) )
FROM cte_g
WHERE pr IS NULL
UNION ALL
SELECT t.id, CONCAT ( routes.path, '-->', t.id )
FROM cte_g
JOIN routes
ON routes.id = cte_g.pr
)
SELECT * FROM routes;
+----+-------------------------+
| id |	cast ( 1 as char(30) ) |
+----+-------------------------+
| 1  |	1-->1                  |
| 3  |	1-->1-->3              |
| 0  |	1                      |
| 2  |	1-->2                  | 
| 5  |	1-->2-->5              | 
| 4  |	1-->1-->4              |  
+----+-------------------------+

Materializing Common Table Expressions

By using the variable materialize_ctes, common table expressions (CTEs) are materialized when queries with CTEs are run. The materialized results can then be referenced multiple times by the parent query. The CTE queries are cached, which results in optimized performance. This is particularly useful for running queries that contain an expensive CTE that produces a relatively small number of rows and is used two or more times.

The materializing CTEs functionality is enabled by default.

Enable or disable this functionality by setting the materialize_ctes engine variable, or setting it per session via the MATERIALIZE_CTES session variable:

SET MATERIALIZE_CTES = { 'ALL' | 'AUTO' | 'OFF' };

The following settings are available:

  • ALL: Each CTE that appears more than once in the query is materialized into an internal result table.

  • AUTO: This is the default value. A suite of optimizations are enabled. Currently, only the Predicate Pushdown optimization is available.

  • OFF: CTEs are inlined, which is similar to VIEW definitions.

If users can't or don't want to use engine variables, the materialize_ctes engine variable can be set on a per query basis by adding an OPTION expression at the end of the query:

SELECT * FROM foo OPTION(materialize_ctes="OFF");
SELECT * FROM foo OPTION(materialize_ctes="AUTO");
SELECT * FROM foo OPTION(materialize_ctes="ALL");

Materialization can also be turned off per CTE:

WITH foo as (select with(materialize=off) * from titanic),
bar as (select * from titanic)
select * from foo, bar;

Remarks

  • Materialized CTEs appear in the EXPLAIN as m_foo_i, where foo is the name of the CTE in the original query, and i is an integer value generated during name normalization.

  • The results table that is created from a materialized CTE is analogous to an intermediate results table that is created as part of a distributed join query.

  • Plans for queries with CTEs are cacheable regardless of the setting of the materialize_ctes variable.

  • Some SELECT statements may not be materializable. For example, selects with limits, selects with aggregates and no group by, and selects without FROM clauses cannot be materialized. All CTEs whose selects are non-materializable will get inlined with the parent query.

Predicate Pushdown

To avoid materializing unused rows, a rudimentary form of predicate pushdown in CTEs is supported. This optimization is enabled only when the materialize_ctes session variable is set to AUTO. An example of how a query is internally rewritten by SingleStore Helios is shown below.

WITH foo AS (SELECT * FROM t)
SELECT *
FROM foo, foo as bar
WHERE foo.a = 1 AND bar.b = 2;

is transformed to:

WITH foo AS (SELECT * FROM t WHERE t.a = 1 OR t.b = 2)
SELECT *
FROM foo, foo as bar
WHERE foo.a = 1 AND bar.b = 2;

Unlike normal predicate pushdown, the original top-level predicates remain.

Last modified: February 15, 2024

Was this article helpful?