Using Persistent Computed Columns

A computed column is a column defined by an expression that uses other columns in the table. SingleStore Helios allows users to create persistent computed columns defined by an expression that combines other columns, constants, built-in functions, and operators. SingleStore Helios’s computed columns are fully materialized and can be indexed like a standard column.

There are storage and performance tradeoffs to consider when using computed columns. At a high level, computed columns are a way to optimize computationally expensive read queries that use built-in functions or require additional data processing. They allow users to precompute values that would otherwise have to be computed as part of the execution of a read query. However, computed columns also consume additional storage and require more computation on writes. Some example use cases for computed columns include:

  • Parsing JSON objects for improved read performance

  • Precomputing a value using an expression that includes values from other columns in the table

  • Extracting values from a column, such as a year from a timestamp or a domain from a URL

  • Pre-materializing common expressions in queries allowing them to be used in high-order operations, such as segment elimination or encoded GROUP BYs (see Encoded Data in Columnstores for more information)

  • Using computed columns as sort keys, such as a timestamp type computed column (see Optimizing Table Data Structures for more information on sort keys)

  • When filtering on a query, using computed column names isn't necessary to get faster performance. If there is an index or a shard key on the computed column, then the optimizer will match the expression in the WHERE clause and use the computed column to execute the query. An index on a computed column can speed up an existing query by orders of magnitude without rewriting the query. See an example below:

    Note

    It is a best practice to use large storage size data types for persistent computed columns, e.g. BIGINT or LONGTEXT. This ensures the expression result will fit into the computed column.

    CREATE TABLE t (a INT, b INT, c as a + b PERSISTED BIGINT, sort key(c));
    SELECT * FROM t WHERE c = <constant>;
    SELECT * FROM t WHERE a + b = <constant>;

    Either query is acceptable. The second query will also use column c to filter and does not have to compute a + b.

  • Data type mismatch within the computed column and or parameterization of a computed column can prevent matching. Warnings will inform you if there is an issue with computed column matching:

    COMPILE <query>;
    SHOW WARNINGS;
    An example warning is:
    SHOW WARNINGS;
    +---------+------+---------------------------------------------------------------------------+
    | Level   | Code | Message                                                                   |
    +---------+------+---------------------------------------------------------------------------+
    | Warning | 2626 | Prospect computed column: <table_name>.<column_name> of type text         |
    |         |      | CHARACTER SET utf8 COLLATE utf8_general_ci NULL cannot suit expression of |
    |         |      | type longtext CHARACTER SET utf8 COLLATE utf8_general_ci NULL.            |
    +---------+------+---------------------------------------------------------------------------+
    1 row in set (0.00 sec)

    In the above case, change the column_name type to LONGTEXT and the matching will work as expected.

  • Using computed columns helps to maintain data independence by keeping the physical structure of data independent from application logic. For example, if your application requires data that is a combination of two or more columns within a table, a persistent computed column plus computed column matching can be used to speed up a query. This can be done without having to modify the query to refer to the computed column (a physical structure created to optimize performance) by name.

  • Promoting and indexing JSON fields is the main use case of persistent computed columns. See Using JSON for more information.

Persistent computed columns can be created as part of a CREATE TABLE statement, or can be added later using ALTER TABLE.

Computed Column Examples

The following statement creates a table col_test with an INT column col_a and a persistent computed column col_b that is defined using an expression that takes the value in col_a and adds 1 to it.

CREATE TABLE cal_test (col_a INT PRIMARY KEY, col_b AS col_a + 1
PERSISTED BIGINT, SORT KEY(col_b));

Inserting values into column col_a will automatically populate column col_b using the column definition expression.

INSERT INTO cal_test (col_a) VALUES (1), (2);
SELECT * FROM cal_test;
+-------+-------+
| col_a | col_b |
+-------+-------+
| 2     |    3  |
| 1     |    2  |
+-------+-------+

It is possible to create a table with computed columns initialized using a NOT NULL constraint, just like a non-computed column.

CREATE TABLE cal_test (col_a int NOT NULL, col_b AS col_a+1 PERSISTED BIGINT NOT NULL);
DESC cal_test;
+-------+---------+------+------+---------+----------+
| Field | Type    | Null | Key  | Default | Extra    |
+-------+---------+------+------+---------+----------+
| col_a | int(11) | NO   |      | NULL    |          |
| col_b | int(11) | NO   |      | NULL    | computed |
+-------+---------+------+------+---------+----------+

It is also possible to add a computed column to an existing table using ALTER TABLE.

CREATE TABLE cal_test (col_a INT, col_b INT);
INSERT INTO cal_test (col_a, col_b) VALUES (15, 9), (45, 81);
ALTER TABLE cal_test ADD col_c AS (col_a + col_b/2) PERSISTED BIGINT;
SELECT * FROM cal_test;
+---+-------------------+
| col_a | col_b | col_c |
+-------+-------+-------+
| 15    |   9   |  20   |
| 45    |   81  |  86   |
+-------+-------+-------+
DESC cal_test;
+-------+---------+------+------+---------+----------+
| Field | Type    | Null | Key  | Default | Extra    |
+-------+---------+------+------+---------+----------+
| col_a | int(11) | YES  |      | NULL    |          |
| col_b | int(11) | YES  |      | NULL    |          |
| col_c | int(11) | YES  |      | NULL    | computed |
+-------+---------+------+------+---------+----------+

Adding a computed column with a NOT NULL constraint using ALTER TABLE will result in an error.

ALTER TABLE cal_test ADD col_c AS (col_a + col_b/2)
PERSISTED BIGINT NOT NULL;
ERROR 1064 ER_PARSE_ERROR: You have an error in your SQL syntax; check the manual that 
corresponds to your MySQL server version for the right syntax to use near 'NOT NULL' at line 1

Attempts to insert values directly into a computed column will result in an error, even if the values are correct with respect to the column definition.

INSERT INTO cal_test (col_a, col_b, col_c) VALUES (3, 4, 2);
ERROR 1822 ER_MEMSQL_COMPUTED_INSERT: Cannot insert into computed column 'col_c'

A computed column can be created using any expression that outputs a value that is a valid SingleStore Helios data type, composed of operators and built-in functions. For example, computed columns allow you to extract values from a JSON blob so they can be scanned and read more quickly. However, note that aggregations, user-defined functions, and non-deterministic functions such as rand() and now() cannot be used in computed columns.

SELECT * FROM j;
+---+-------------------------------------------------+
| a | b                                               |
+---+-------------------------------------------------+
| 1 | {"name":"Sandy Beach","occupation":"lifeguard"} |
+---+-------------------------------------------------+
ALTER TABLE j ADD COLUMN occupation AS b::$occupation PERSISTED LONGTEXT;
SELECT * FROM j;
+---+-------------------------------------------------+------------+
| a | b                                               | occupation |
+---+-------------------------------------------------+------------+
| 1 | {"name":"Sandy Beach","occupation":"lifeguard"} | lifeguard  |
+---+-------------------------------------------------+------------+

Modifying values in a column used to define a computed column will also update the affected values in the computed column.

UPDATE j SET b::$occupation = 'circus clown';
SELECT * FROM j;
+---+----------------------------------------------------+--------------+
| a | b                                                  | occupation   |
+---+----------------------------------------------------+--------------+
| 1 | {"name":"Sandy Beach","occupation":"circus clown"} | circus clown |
+---+----------------------------------------------------+--------------+

Computed Column Matching

The keyword AUTO has been added when using computed columns. This addition allows the engine to infer the type of expression used by the computed column.

Instead of using the syntax:

CREATE TABLE test_cal(col_a int, col_b AS col_a * col_a PERSISTED int);

Use the word AUTO in place of the expression type:

CREATE TABLE auto_cal(col_a int, col_b AS col_a * col_a PERSISTED AUTO);

This example uses the table created above and specifies the expression type.

EXPLAIN SELECT col_a, col_a*col_a FROM test_cal;
+-------------------------------------------------------------------+
| EXPLAIN                                                           | 
+-------------------------------------------------------------------+
| WARNING: Computed column test_cal.col_b of type [int(11) NULL]    |
| does not match similar expression's type. Consider changing the   |
| computed column's type to that of the expression [bigint(21)      |
| NULL]                                                             |
| Project [test_cal.col_a, test_cal.col_a * test_cal.col_a AS       |
| 'col_a*col_a']                                                    |
|                                                                   |
| ColumnStoreScan ticket_test.test_cal, KEY __UNORDERED () USING    |
| CLUSTERED COLUMNSTORE table_type:sharded_columnstore              |
+-------------------------------------------------------------------+

The following uses the table created above and uses AUTO for the expression type.

EXPLAIN SELECT col_a, col_a*col_a FROM auto_cal;
+-------------------------------------------------------------------+
| EXPLAIN                                                           | 
+-------------------------------------------------------------------+
| Project [auto_cal.col_a, auto_cal.col_b AS `col_a*col_a`]         |
| ColumnStoreScan ticket_test.auto_cal, KEY __UNORDERED () USING    |
| CLUSTERED COLUMNSTORE table_type:sharded_columnstore              |
+-------------------------------------------------------------------+

Use SHOW CREATE TABLE <table_name> to see the persisted column expression when the expression type is specified.

SHOW CREATE TABLE test_cal;
+-----------+------------------------------------------------------+
| Table     | Create Table                                         |
+-----------+------------------------------------------------------+
| test_cal  | CREATE TABLE `test_cal` (`col_a` int(11) DEFAULT     |
|           | NULL,`col_b` as col_a * col_a PERSISTED int(11),     |
|           | KEY `__UNORDERED` () USING CLUSTERED COLUMNSTORE,    |
|           | SHARD KEY () )AUTOSTATS_CARDINALITY_MODE=INCREMENTAL |
|           | AUTOSTATS_HISTOGRAM_MODE=CREATE AUTOSTATS_SAMPLING=ON|
|           | SQL_MODE='STRICT_ALL_TABLES'                         |
+-----------+------------------------------------------------------+

Use SHOW CREATE TABLE <table_name> to see the persisted column and when using AUTO.

SHOW CREATE TABLE auto_cal;
+-----------+------------------------------------------------------+
| Table     | Create Table                                         |
+-----------+------------------------------------------------------+
| auto_cal  | CREATE TABLE `auto_cal` (`col_a` int(11) DEFAULT     |
|           | NULL,`col_b` as col_a * col_a PERSISTED bigint(21),  |
|           | KEY `__UNORDERED` () USING CLUSTERED COLUMNSTORE,    |
|           | SHARD KEY () )AUTOSTATS_CARDINALITY_MODE=INCREMENTAL |
|           | AUTOSTATS_HISTOGRAM_MODE=CREATE AUTOSTATS_SAMPLING=ON|
|           | SQL_MODE='STRICT_ALL_TABLES'                         |
+-----------+------------------------------------------------------+

Last modified: May 4, 2023

Was this article helpful?