CREATE PROJECTION
On this page
Note
This is a Preview feature.
Creates a projection.
A projection is a copy of a table, which may include only a subset of the base table's columns.
All projections on the base table must be dropped before the base table can be dropped.DROP TABLE operation is attempted.ALTER TABLE operation can proceed.
Updates to projections are done in the same transaction as updates to the base table.
Projections consume storage space and require maintenance when updating your data.
If you want to save space, you can create a projection with fewer columns than the original table – only the columns needed to speed up a certain query or class of queries.
Projections can improve query performance in these scenarios:
Optimize range queries: The use of sort keys can optimize certain range filters.col1 and col2) in the same table, using projections can be helpful.col1 as the sort key of the base table.col2 as the sort key.
Optimize ORDER BY/LIMIT Queries: Certain ORDER BY/LIMIT queries can be made faster using a columnstore sort key.ORDER BY/LIMIT on two different sort keys, you can make the base table be sorted by one key and a projection of the base table sorted on the second key.
Optimize grouping operations: In scenarios involving queries with different GROUP BY clauses within your workload, the goal is to optimize query execution by minimizing data transfer across nodes.GROUP BY, this same approach can speed up COUNT(DISTINCT …) and SELECT DISTINCT. operations.StreamingGroupBy query plan operator can be utilized by the engine to reduce the memory use and improve performance.
Optimize joins: Projections can also be used to optimize data transfer in JOIN clauses.
Optimize key lookups: If you have two different keys and want the fastest possible lookup speed on both of them, with high concurrency, you can shard the base table on one key and create a projection that shards on the other key.
Syntax
CREATE PROJECTION [IF NOT EXISTS] <projection_name>([SHARD KEY(<columns>),] [SORT KEY(<columns>),] [KEY(<columns>),...])AS SELECT <project_list>FROM <table_name>;
Arguments
projection_
The name of the projection.
table_
The projection must be defined on one columnstore table.
project_
The project list may contain any column(s) of <table>..* is allowed, but it is expanded at creation time (new columns are not automatically displayed or added to the projection).
Remarks
-
CREATE PROJECTIONis supported on columnstore tables only - both as the source table and projection object. -
CREATE PROJECTIONis not supported on temporary tables. -
Secondary
UNIQUEkeys and filtered projections are not supported. -
CREATE PROJECTIONis an offline operation, i.e. , any writes (or DDL operation) to the base table are blocked while it is running (but read queries to the base table are not blocked). The CREATE PROJECTIONoperation can use lots of resources and take some time to complete with large base tables. -
BACKUP DATABASE .
. . WITH SPLIT PARTITIONS is blocked if there is a projection in a database. Projections must be dropped before running backup with partition split. -
If a table has projections based on it, truncating and renaming the base table is blocked, as well as any
ALTER TABLEoperation that modifies a column that is referred to by a projection. -
Existing projections cannot be altered.
They can be dropped and re-created. Refer to DROP PROJECTION for more information. -
CREATE PROJECTIONrequiresALTERpermissions and any one of theSELECT,INSERT,UPDATE,andDELETEpermissions on the base table. -
The projection query hint (
use_) can be used in a query to force the optimizer to select the projection instead of the base table.projection='projection_ name' This is required for range filter optimization. Collocated join and group by optimization can be identified automatically by the optimizer. The following example shows the usage of this hint in a simple query shape: SELECT col1, col2, coln...FROM <table_name> WITH (use_projection='<projection_name>'); -
If set to
OFF, theconsider_engine variable will turn off the use of all projections in queries except where the projection query hintsecondary_ projection use_) is used.projection='projection_ name' The default value is ON. -
If there is a projection on a table that is being queried, the hint
disable_can be used to force the optimizer not to use the projection but use the base table instead.projection_ replacement = 1 SELECT col1, col2, coln...FROM <table_name> WITH (disable_projection_replacement = 1); -
If a projection query hint is explicitly used, any columns queried that are not in the projection will result in an error.
The following example returns an error because col3is not in the projectionproj_but it's queried in thename SELECTstatement:CREATE PROJECTION IF NOT EXISTS proj_name(SHARD KEY(col1), SORT KEY(col2))AS SELECT col1, col2 FROM t1;SELECT col1, col2, col3FROM t1 WITH (use_projection='proj_name')WHERE col1 BETWEEN '' AND '';ERROR 1877 ER_BAD_HINT: projection 'proj_name' is not eligible for replacement for table 't1'
Examples
Using the TPC-H dataset, these examples use the lineitem and the orders tables from the tpch database to demonstrate four use cases for utilizing projections.
The TPC-H scale-factor 100 dataset can be used to load data into both tables.
CREATE DATABASE tpch;USE tpch;
CREATE TABLE lineitem (l_orderkey BIGINT(11) NOT NULL,l_partkey INT(11) NOT NULL,l_suppkey INT(11) NOT NULL,l_linenumber INT(11) NOT NULL,l_quantity DECIMAL(15,2) NOT NULL,l_extendedprice DECIMAL(15,2) NOT NULL,l_discount DECIMAL(15,2) NOT NULL,l_tax DECIMAL(15,2) NOT NULL,l_returnflag CHAR(1) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,l_linestatus CHAR(1) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,l_shipdate DATE NOT NULL,l_commitdate DATE NOT NULL,l_receiptdate DATE NOT NULL,l_shipinstruct CHAR(25) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,l_shipmode CHAR(10) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,l_comment VARCHAR(44) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,UNIQUE KEY pk ( l_orderkey , l_linenumber ) UNENFORCED RELY,SHARD KEY ( l_orderkey ));
CREATE TABLE orders (o_orderkey BIGINT(11) NOT NULL,o_custkey INT(11) NOT NULL,o_orderstatus CHAR(1) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,o_totalprice DECIMAL(15,2) NOT NULL,o_orderdate DATE NOT NULL,o_orderpriority CHAR(15) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,o_clerk CHAR(15) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,o_shippriority INT(11) NOT NULL,o_comment VARCHAR(79) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL);
The tables can be loaded using pipelines.
Optimizing Range Filters
If your application needs to filter on the l_ column in the lineitem table, this query will not run as fast as it could because l_ is not the sort key:
SELECT COUNT(*)FROM lineitemWHERE l_shipdate BETWEEN '1992-01-01' AND '1992-01-31';
+----------+
| COUNT(*) |
+----------+
| 861221 |
+----------+
77 msIf a projection of the lineitem table is created with l_ as the sort key, this will improve the execution speed of this query.use_) must be used in the updated query for the optimizer to use the projection:
CREATE PROJECTION IF NOT EXISTS lineitem_sort_shipdate(SHARD KEY(l_orderkey), SORT KEY(l_shipdate))AS SELECT l_shipdate, l_orderkey FROM lineitem;
SELECT COUNT(*)FROM lineitem WITH (use_projection='lineitem_sort_shipdate')WHERE l_shipdate BETWEEN '1992-01-01' AND '1992-01-31';
+----------+
| COUNT(*) |
+----------+
| 861221 |
+----------+
9 msOptimize GROUP BY on Shard Key Queries
If you need to know how many orders are grouped by l_ with the same date range as above, the execution of this query would not run optimally since l_ is not the shard key or the sort key:
SELECT COUNT(l_orderkey), l_shipdateFROM lineitemWHERE l_shipdate BETWEEN '1992-01-01' AND '1992-01-31'GROUP BY l_shipdate;
+-------------------+------------+
| COUNT(l_orderkey) | l_shipdate |
+-------------------+------------+
| 1853 | 1992-01-02 |
| 3814 | 1992-01-03 |
| . . . . . . . . . | . . . . . |
| 54004 | 1992-01-30 |
| 55737 | 1992-01-31 |
+-------------------+------------+
80 msThe image below illustrates the execution of this query.l_ is not the shard key for the lineitem table, rows from lineitem need to be transferred between nodes so that all rows from lineitem with the same value of l_ (the GROUP BY column) are on the same node to enable calculation of the COUNT aggregate.
The bold lines in the image below represent network transfers, which consume time and resources.
A projection can be created with l_ set as both the shard key and the sort key:
CREATE PROJECTION IF NOT EXISTS lineitem_shard_sort_shipdate(SHARD KEY(l_shipdate), SORT KEY (l_shipdate))AS SELECT l_orderkey, l_shipdate FROM lineitem;
After creating the projection and using the projection query hint, the query will run faster:
SELECT COUNT(l_orderkey), l_shipdateFROM lineitem WITH (use_projection='lineitem_shard_sort_shipdate')WHERE l_shipdate BETWEEN '1992-01-01' AND '1992-01-31'GROUP BY l_shipdate;
+-------------------+------------+
| COUNT(l_orderkey) | l_shipdate |
+-------------------+------------+
| 1853 | 1992-01-02 |
| 3814 | 1992-01-03 |
| . . . . . . . . . | . . . . . |
| 54004 | 1992-01-30 |
| 55737 | 1992-01-31 |
+-------------------+------------+
4 msThe image below shows the query using the new lineitem_ projection, labeled lineitem_ in the image.
Creating a Collocated Join
The lineitem table needs to be joined to the orders table in order to pull the order dates into the query.orders table:
SELECT COUNT(l_orderkey), o_orderdate, l_shipdateFROM lineitem INNER JOIN orders ON lineitem.l_orderkey = orders.o_orderkeyWHERE l_shipdate BETWEEN '1992-01-01' AND '1992-01-31'GROUP BY l_shipdate, o_orderdate;
+-------------------+-------------+-------------+
| COUNT(l_orderkey) | o_orderdate | l_shipdate |
+-------------------+-------------+-------------+
| 1853 | 1992-01-01 | 1992-01-02 |
| 1974 | 1992-01-02 | 1992-01-03 |
| . . . . . . . . . | . . . . . . | . . . . . . |
| 1833 | 1992-01-21 | 1992-01-31 |
| 1831 | 1992-01-30 | 1992-01-31 |
+-------------------+-------------+-------------+
250 msThe image below illustrates the execution of this query.order_ (the JOIN column) is not the shard key for the orders table, rows from orders need to be transferred between nodes so that all rows from orders with a particular value of order_ are on the same node as with rows from lineitem that have that same value of order_.
As above, the bold lines in this image represent network transfers, which consume time and resources and which can be eliminated by using a projection.
You can create a projection and set an explicit shard key on the o_o_:
CREATE PROJECTION IF NOT EXISTS orders_shard_orderkey_sort_orderdate(SHARD KEY(o_orderkey), SORT KEY(o_orderdate))AS SELECT o_orderkey, o_orderdate FROM orders;
Note
The projection query hint may need to be positioned after the table that is projected in a JOIN clause.orders table is the projected table so the WITH (use_ syntax is placed after "INNER JOIN orders".
SELECT COUNT(l_orderkey), o_orderdate, l_shipdateFROM lineitem INNER JOIN orders WITH (use_projection='orders_shard_orderkey_sort_orderdate') ON lineitem.l_orderkey = orders.o_orderkeyWHERE l_shipdate BETWEEN '1992-01-01' AND '1992-01-31'GROUP BY l_shipdate, o_orderdate;
+-------------------+-------------+-------------+
| COUNT(l_orderkey) | o_orderdate | l_shipdate |
+-------------------+-------------+-------------+
| 1853 | 1992-01-01 | 1992-01-02 |
| 3814 | 1992-01-01 | 1992-01-03 |
| . . . . . . . . . | . . . . . . | . . . . . . |
| 54004 | 1992-01-09 | 1992-01-30 |
| 55737 | 1992-01-26 | 1992-01-31 |
+-------------------+-------------+-------------+
139 msRun the SHOW PROFILE command to confirm that the optimizer used the projection.
+--------------------------------------------------------------------------------+| PROFILE |+--------------------------------------------------------------------------------+| . . . . . . . . . . . . . . . . . . . . . . . . . . || ColumnStoreScan tpch.orders PROJECTION orders_shard_orderkey_sort_orderdate . .|| . . . . . . . . . . . . . . . . . . . . . . . . . . |+--------------------------------------------------------------------------------+
The image below shows the same query using the orders_ projection, labeled orders_ in the image.
Optimize Key Lookups
Prior to the introduction of projections, any key lookup where the key didn't contain the shard key would require a broadcast to every leaf node to look up rows having that key on that node.
Consider this query.
SELECT * FROM lineitem WHERE l_partkey = 8752612;
If you run the EXPLAIN statement for this query, you will notice "Gather partitions:all" in the output, which means a fanout is used to send a message to every leaf node to look up rows having that key on that node.
EXPLAIN SELECT * FROM lineitem WHERE l_partkey = 8752612;
+------------------------------------------------------------------+
| EXPLAIN |
+------------------------------------------------------------------+
| . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . |
| Gather partitions:all alias:remote_0 parallelism_level:segment |
| . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . |
+------------------------------------------------------------------+If the lookup key contains the shard key, then a single partition query can be used to query only the leaf node containing the key.l_) contains the shard key:
CREATE PROJECTION IF NOT EXISTS lineitem_shard_key_sort_key_partkey(SHARD KEY(l_partkey), SORT KEY(l_partkey))AS SELECT * FROM lineitem;
Now if you run the EXPLAIN statement, you notice "Gather partitions:single" in the output.
EXPLAIN SELECT * FROM lineitem WITH (use_projection='lineitem_shard_key_sort_key_partkey')WHERE l_partkey = 527508935;
+--------------------------------------------------------------------+
| EXPLAIN |
+--------------------------------------------------------------------+
| . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . |
| Gather partitions:single alias:remote_0 parallelism_level:segment |
| . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . |
+--------------------------------------------------------------------+Note that the projection query hint is currently required in this case.
You can further optimize this example by adding hash index keys on l_ and l_ on their respective projections.
Last modified: July 2, 2025