# CREATE PROJECTION

> **📝 Note**: This is a Preview feature.

The `CREATE PROJECTION` command creates a projection.

A projection is a copy of a table, which may include only a subset of the base table's columns. The projection can be sharded and/or sorted differently from the base table. Projections are dependent on the base table, similar to indexes.

All projections on the base table must be dropped before the base table can be dropped. If not, an error will be generated when the `DROP TABLE` operation is attempted. Attempting to modify the columns in the base table that are used in a projection will also result in an error being generated. The projection would have to be dropped before the `ALTER TABLE` operation can proceed.

Updates to projections are done in the same transaction as updates to the base table. The contents of projections are always kept in sync with the contents of the base table.

Projections consume storage space and require maintenance when updating your data. Physically, a projection is like a second table, with some or all the columns of the primary table. So a typical projection that includes all the columns of the base table will double the storage usage and double the update time. Therefore, they should be utilized sparingly. Typically, you'll employ at most a few projections on a table. Often, no projections or only one may be needed on a table.

If you want to save space, you can create a projection with fewer columns than the original table – only the columns needed to speed up a certain query or class of queries. Also, it's often the case that storage is inexpensive compared to CPU, so consider that when deciding whether to create a projection.

Projections can improve query performance in these scenarios:

**Optimize range queries:** The use of sort keys can optimize certain range filters. If your workload requires fast range filters on two different columns (for example, `col1` and `col2`) in the same table, using projections can be helpful. You can use `col1` as the sort key of the base table. Then create a projection of the base table with `col2` as the sort key.

**Optimize ORDER BY/LIMIT Queries:** Certain `ORDER BY/LIMIT` queries can be made faster using a columnstore sort key. If you need the fastest possible query plans for `ORDER BY/LIMIT` on two different sort keys, you can make the base table be sorted by one key and a projection of the base table sorted on the second key.

**Optimize grouping operations:** In scenarios involving queries with different `GROUP BY` clauses within your workload, the goal is to optimize query execution by minimizing data transfer across nodes. You can adopt a strategy where one shard key is applied to the base table, while a different shard key is assigned to a projection. This approach ensures that data grouping remains localized, resulting in minimal cross-node data movement, and ultimately improving query efficiency. Similar to `GROUP BY`, this same approach can speed up `COUNT(DISTINCT …)` and `SELECT DISTINCT...` operations. Additionally, you can specify the grouping columns as the sort key, so that `StreamingGroupBy` query plan operator can be utilized by the engine to reduce the memory use and improve performance.

**Optimize joins:** Projections can also be used to optimize data transfer in `JOIN` clauses. If one or neither of your join predicate columns are sharded, shard keys can be set on those columns in a projection. This allows the utilization of collocated (local) joins where partitions of the table can be joined locally and no shuffle or broadcast is needed to complete the join, which is required if the joined tables are not both sharded on the join columns.

**Optimize key lookups:** If you have two different keys and want the fastest possible lookup speed on both of them, with high concurrency, you can shard the base table on one key and create a projection that shards on the other key. This can enable single-node lookups rather than broadcast lookups for both keys.

## Syntax

```sql
CREATE PROJECTION [IF NOT EXISTS] <projection_name> 
([SHARD KEY(<columns>),] [SORT KEY(<columns>),] [KEY(<columns>),...]) 
AS SELECT <project_list>
FROM <table_name>;

```

## Arguments

**projection\_name**

The name of the projection.

**table\_name**

The projection must be defined on one columnstore table.

**project\_list**

The project list may contain any column(s) of `<table>.*`. Use of `*` is allowed, but it is expanded at creation time (**new columns are not automatically displayed or added to the projection**).

****

## Remarks

* `CREATE PROJECTION` is supported on columnstore tables only - both as the source table and projection object.
* `CREATE PROJECTION` is not supported on temporary tables.
* Secondary `UNIQUE` keys and filtered projections are not supported.
* `CREATE PROJECTION` is an offline operation, i.e., any writes (or DDL operation) to the base table are blocked while it is running (but read queries to the base table are not blocked). The `CREATE PROJECTION` operation can use lots of resources and take some time to complete with large base tables.
* If a table has projections based on it, truncating and renaming the base table is blocked, as well as any `ALTER TABLE` operation that modifies a column that is referred to by a projection.
* Existing projections cannot be altered. They can be dropped and re-created. Refer to [DROP PROJECTION](https://docs.singlestore.com/cloud/reference/sql-reference/data-definition-language-ddl/drop-projection.md) for more information.
* `CREATE PROJECTION` requires `ALTER` permissions and any one of the `SELECT`, `INSERT`, `UPDATE,` and `DELETE` permissions on the base table.
* The projection query hint (`use_projection='projection_name'`) can be used in a query to force the optimizer to select the projection instead of the base table. This is required for range filter optimization. Collocated join and group by optimization can be identified automatically by the optimizer. The following example shows the usage of this hint in a simple query shape:
  ```sql
  SELECT col1, col2, coln... 
      FROM <table_name> WITH (use_projection='<projection_name>'); 
  ```
* If set to `OFF`, the `consider_secondary_projection` [engine variable ](https://docs.singlestore.com/cloud/reference/configuration-reference/engine-variables/list-of-engine-variables/#sync-variables-lists.md) will turn off the use of all projections in queries except where the projection query hint `use_projection='projection_name'`) is used. The default value is `ON`.
* If there is a projection on a table that is being queried, the hint `disable_projection_replacement = 1` can be used to force the optimizer **not** to use the projection but use the base table instead.
  ```sql
  SELECT col1, col2, coln... 
      FROM <table_name> WITH (disable_projection_replacement = 1); 
  ```
* If a projection query hint is explicitly used, any columns queried that are not in the projection will result in an error. The following example returns an error because `col3` is not in the projection `proj_name` but it's queried in the `SELECT` statement:
  ```sql
  CREATE PROJECTION IF NOT EXISTS proj_name
      (SHARD KEY(col1), SORT KEY(col2))
      AS SELECT col1, col2 FROM t1;
  ```
  ```sql
  SELECT col1, col2, col3 
      FROM t1 WITH (use_projection='proj_name')
      WHERE col1 BETWEEN '' AND '';

  ```
  ```output

  ERROR 1877 ER_BAD_HINT: projection 'proj_name' is not eligible for replacement for table 't1'
  ```

## Examples

Using the TPC-H dataset, these examples use the `lineitem` and the `orders` tables from the `tpch` database to demonstrate four use cases for utilizing projections.

The TPC-H scale-factor 100 dataset can be used to load data into both tables. Refer to [Load TPC-H Data into SingleStore](https://docs.singlestore.com/cloud/getting-started-with-singlestore-helios/next-steps-and-examples/sample-data/load-tpc-h-data-into-singlestore.md)  for more information.

```sql
CREATE DATABASE tpch;
USE tpch;
```

```sql
CREATE TABLE lineitem (
   l_orderkey BIGINT(11) NOT NULL,
   l_partkey INT(11) NOT NULL,
   l_suppkey INT(11) NOT NULL,
   l_linenumber INT(11) NOT NULL,
   l_quantity DECIMAL(15,2) NOT NULL,
   l_extendedprice DECIMAL(15,2) NOT NULL,
   l_discount DECIMAL(15,2) NOT NULL,
   l_tax DECIMAL(15,2) NOT NULL,
   l_returnflag CHAR(1) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
   l_linestatus CHAR(1) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
   l_shipdate DATE NOT NULL,
   l_commitdate DATE NOT NULL,
   l_receiptdate DATE NOT NULL,
   l_shipinstruct CHAR(25) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
   l_shipmode CHAR(10) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
   l_comment VARCHAR(44) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
  UNIQUE KEY pk ( l_orderkey ,  l_linenumber ) UNENFORCED RELY,
  SHARD KEY ( l_orderkey )
);

```

```sql
CREATE TABLE orders (
   o_orderkey BIGINT(11) NOT NULL,
   o_custkey INT(11) NOT NULL,
   o_orderstatus CHAR(1) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
   o_totalprice DECIMAL(15,2) NOT NULL,
   o_orderdate DATE NOT NULL,
   o_orderpriority CHAR(15) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
   o_clerk CHAR(15) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
   o_shippriority INT(11) NOT NULL,
   o_comment VARCHAR(79) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL
  );
```

The tables can be loaded using pipelines. See [Load Data with Pipelines](https://docs.singlestore.com/cloud/getting-started-with-singlestore-helios/next-steps-and-examples/sample-data/load-tpc-h-data-into-singlestore/#load-data-with-pipelines.md) for more information.

## Optimizing Range Filters

If your application needs to filter on the `l_shipdate` column in the `lineitem` table, this query will not run as fast as it could because `l_shipdate` is not the sort key:

```sql
SELECT COUNT(*)     
    FROM lineitem 
    WHERE l_shipdate BETWEEN '1992-01-01' AND '1992-01-31';

```

```output

+----------+
| COUNT(*) |
+----------+
| 861221   |
+----------+
77 ms
```

If a projection of the `lineitem` table is created with `l_shipdate` as the sort key, this will improve the execution speed of this query. The projection hint (`use_projection='name_of_projection'`) must be used in the updated query for the optimizer to use the projection:

```sql
CREATE PROJECTION IF NOT EXISTS lineitem_sort_shipdate 
    (SHARD KEY(l_orderkey), SORT KEY(l_shipdate)) 
    AS SELECT l_shipdate, l_orderkey FROM lineitem;
```

```sql
SELECT COUNT(*) 
    FROM lineitem WITH (use_projection='lineitem_sort_shipdate')     
    WHERE l_shipdate BETWEEN '1992-01-01' AND '1992-01-31';

```

```output

+----------+
| COUNT(*) |
+----------+
| 861221   |
+----------+
9 ms

```

## Optimize `GROUP BY` on Shard Key Queries

If you need to know how many orders are grouped by `l_shipdate` with the same date range as above, the execution of this query would not run optimally since `l_shipdate` is not the shard key or the sort key:

```sql
SELECT COUNT(l_orderkey), l_shipdate
    FROM lineitem
    WHERE l_shipdate BETWEEN '1992-01-01' AND '1992-01-31'
    GROUP BY l_shipdate;

```

```output

+-------------------+------------+
| COUNT(l_orderkey) | l_shipdate |
+-------------------+------------+
| 1853              | 1992-01-02 |
| 3814              | 1992-01-03 |
| . . . . . . . . . | . . . . .  |
| 54004             | 1992-01-30 |
| 55737             | 1992-01-31 |
+-------------------+------------+
80 ms
```

The image below illustrates the execution of this query. Since `l_shipdate` is not the shard key for the `lineitem` table, rows from `lineitem` need to be transferred between nodes so that all rows from `lineitem` with the same value of `l_shipdate` (the `GROUP BY` column) are on the same node to enable calculation of the `COUNT` aggregate.

The bold lines in the image below represent network transfers, which consume time and resources. These network transfers can be eliminated by using a projection.

![](https://images.contentstack.io/v3/assets/bltac01ee6daa3a1e14/blt89d5d04978a38890/6a2fe83239f23771ed28a262/group-by-without-projection-Ofj0lB.png)

A projection can be created with `l_shipdate` set as both the shard key and the sort key:

```sql
CREATE PROJECTION IF NOT EXISTS lineitem_shard_sort_shipdate 
    (SHARD KEY(l_shipdate), SORT KEY (l_shipdate))
    AS SELECT l_orderkey, l_shipdate FROM lineitem;
```

After creating the projection and using the projection query hint, the query will run faster:

```sql
SELECT COUNT(l_orderkey), l_shipdate
    FROM lineitem WITH (use_projection='lineitem_shard_sort_shipdate')  
    WHERE l_shipdate BETWEEN '1992-01-01' AND '1992-01-31'
    GROUP BY l_shipdate;

```

```output

+-------------------+------------+
| COUNT(l_orderkey) | l_shipdate |
+-------------------+------------+
| 1853              | 1992-01-02 |
| 3814              | 1992-01-03 |
| . . . . . . . . . | . . . . .  |
| 54004             | 1992-01-30 |
| 55737             | 1992-01-31 |
+-------------------+------------+
4 ms

```

The image below shows the query using the new `lineitem_shard_sort_shipdate` projection, labeled `lineitem_shipdate` in the image. Note that when using the projection, network transfers are not needed reducing the run time of the query.

![](https://images.contentstack.io/v3/assets/bltac01ee6daa3a1e14/blta03da3c5e9d0dd73/6a2fe8b05acece0671a29592/group-by-with-projection-22hquJ.png)

## Creating a Collocated Join

The `lineitem` table needs to be joined to the `orders` table in order to pull the order dates into the query. The optimizer will need to reshuffle the partitions because there is not an explicit shard key on the `orders` table:

```sql
SELECT COUNT(l_orderkey), o_orderdate, l_shipdate    
    FROM lineitem INNER JOIN orders ON lineitem.l_orderkey = orders.o_orderkey    
    WHERE l_shipdate BETWEEN '1992-01-01' AND '1992-01-31'    
    GROUP BY l_shipdate, o_orderdate;

```

```output

+-------------------+-------------+-------------+
| COUNT(l_orderkey) | o_orderdate | l_shipdate  |
+-------------------+-------------+-------------+
| 1853              | 1992-01-01  | 1992-01-02  |
| 1974              | 1992-01-02  | 1992-01-03  | 
| . . . . . . . . . | . . . . . . | . . . . . . |
| 1833              | 1992-01-21  | 1992-01-31  |
| 1831              | 1992-01-30  | 1992-01-31  |
+-------------------+-------------+-------------+ 
250 ms
```

The image below illustrates the execution of this query. Since `order_key` (the `JOIN` column) is not the shard key for the `orders` table, rows from `orders` need to be transferred between nodes so that all rows from `orders` with a particular value of `order_key` are on the same node as with rows from `lineitem` that have that same value of `order_key`.

As above, the bold lines in this image represent network transfers, which consume time and resources and which can be eliminated by using a projection.

![](https://images.contentstack.io/v3/assets/bltac01ee6daa3a1e14/bltea3a91fabad76a5d/6a2fe8b35408256b102062a1/join-without-projection-iRuKvw.png)

You can create a projection and set an explicit shard key on the `o_orderkey``o_orderdate`:

```sql
CREATE PROJECTION IF NOT EXISTS orders_shard_orderkey_sort_orderdate    
    (SHARD KEY(o_orderkey), SORT KEY(o_orderdate))     
    AS SELECT o_orderkey, o_orderdate FROM orders;

```

> **📝 Note**: The projection query hint may need to be positioned after the table that is projected in a `JOIN` clause. In the following example, the `orders` table is the projected table so the `WITH (use_projection='')` syntax is placed after "`INNER JOIN orders`".

```sql
SELECT COUNT(l_orderkey), o_orderdate, l_shipdate
    FROM lineitem INNER JOIN orders WITH (use_projection='orders_shard_orderkey_sort_orderdate') ON lineitem.l_orderkey = orders.o_orderkey
    WHERE l_shipdate BETWEEN '1992-01-01' AND '1992-01-31'
    GROUP BY l_shipdate, o_orderdate;


```

```output

+-------------------+-------------+-------------+
| COUNT(l_orderkey) | o_orderdate | l_shipdate  |
+-------------------+-------------+-------------+
| 1853              | 1992-01-01  | 1992-01-02  |
| 3814              | 1992-01-01  | 1992-01-03  | 
| . . . . . . . . . | . . . . . . | . . . . . . |
| 54004             | 1992-01-09  | 1992-01-30  |
| 55737             | 1992-01-26  | 1992-01-31  |
+-------------------+-------------+-------------+ 
139 ms
```

Run the `SHOW PROFILE` command to confirm that the optimizer used the projection. This is an abridged version of the plan:

```sql
+--------------------------------------------------------------------------------+
| PROFILE                                                                        | 
+--------------------------------------------------------------------------------+            
|  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  |
| ColumnStoreScan tpch.orders PROJECTION orders_shard_orderkey_sort_orderdate . .|     
|   .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  . |  
+--------------------------------------------------------------------------------+
```

Here is the result of the `SHOW PROFILE` command from the SingleStore Helios UI:

![Screenshot of query profile in SingleStore portal](https://images.contentstack.io/v3/assets/bltac01ee6daa3a1e14/bltbb02dd12b5c4e592/6a2fe91165c88d4a16617356/projection-visual-profile-aPY3Xh.jpg)

The image below shows the same query using the `orders_shard_orderkey_sort_orderdate` projection, labeled `orders_orderkey` in the image. As before, when using the projection, network transfers are not necessary which reduces the run time of the query.

![](https://images.contentstack.io/v3/assets/bltac01ee6daa3a1e14/blt4eac0fa31f9c06a0/6a2fe8a518d64f5a8a6a85c6/join-with-projection-ai8Qs6.png)

## Optimize Key Lookups

Prior to the introduction of projections, any key lookup where the key didn't contain the shard key would require a broadcast to every leaf node to look up rows having that key on that node.

Consider this query.

```sql
SELECT * FROM lineitem WHERE l_partkey = 8752612;
```

If you run the `EXPLAIN` statement for this query, you will notice "Gather partitions:all" in the output, which means a fanout is used to send a message to every leaf node to look up rows having that key on that node. This is an abridged version of the plan:

```sql
EXPLAIN SELECT * FROM lineitem WHERE l_partkey = 8752612;

```

```output

+------------------------------------------------------------------+
| EXPLAIN                                                          |
+------------------------------------------------------------------+
| . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  |    
| Gather partitions:all alias:remote_0 parallelism_level:segment |
| . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  | 
+------------------------------------------------------------------+

```

If the lookup key contains the shard key, then a single partition query can be used to query only the leaf node containing the key. A projection can be created where the lookup key (`l_partkey`) contains the shard key:

```sql
CREATE PROJECTION IF NOT EXISTS lineitem_shard_key_sort_key_partkey
    (SHARD KEY(l_partkey), SORT KEY(l_partkey))
    AS SELECT * FROM lineitem;
```

Now if you run the `EXPLAIN` statement, you notice "Gather partitions:single" in the output. This means that the aggregator node sends a message only to the node containing the partition that may contain the requested key. This is an abridged version of the plan:

```sql
EXPLAIN SELECT * FROM lineitem WITH (use_projection='lineitem_shard_key_sort_key_partkey')
    WHERE l_partkey = 527508935;

```

```output

+--------------------------------------------------------------------+
| EXPLAIN                                                            |
+--------------------------------------------------------------------+
| . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  |    
| Gather partitions:single alias:remote_0 parallelism_level:segment |
| . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  |  
+--------------------------------------------------------------------+
```

Note that the projection query hint is currently required in this case. Otherwise, the projection won't be used and you'll notice "Gather partitions:all" indicating a fanout Gather.

You can further optimize this example by adding hash index keys on `l_orderkey` and `l_partkey` on their respective projections.

***

Modified at: June 11, 2026

Source: [/cloud/reference/sql-reference/data-definition-language-ddl/create-projection/](https://docs.singlestore.com/cloud/reference/sql-reference/data-definition-language-ddl/create-projection/)

(An index of the documentation is available at /llms.txt)
