Using Persistent Computed Columns

A computed column is a column defined by an expression that uses other columns in the table. SingleStore DB allows users to create persistent computed columns defined by an expression that combines other columns, constants, built-in functions, and operators. SingleStore DB’s computed columns are fully materialized and can be indexed like a standard column.

There are storage and performance tradeoffs to consider when using computed columns. At a high level, computed columns are a way to optimize computationally expensive read queries that use built-in functions or require additional data processing. They allow users to precompute values that would otherwise have to be computed as part of the execution of a read query. However, computed columns also consume additional storage and require more computation on writes. Some example use cases for computed columns include:

  • Parsing JSON objects for improved read performance

  • Precomputing a value using an expression that includes values from other columns in the table

  • Extracting values from a column, such as a year from a timestamp or a domain from a URL

  • Pre-materializing common expressions in queries allowing them to be used in high-order operations, such as segment elimination or encoded GROUP BYs (see Encoded Data in Columnstores for more information)

  • Using computed columns as sort keys, such as a timestamp type computed column (see Optimizing Table Data Structures for more information on sort keys)

  • When filtering on a query, using computed column names isn't necessary to get faster performance. If there is an index or a shard key on the computed column, then the optimizer will match the expression in the WHERE clause and use the computed column to execute the query. An index on a computed column can speed up an existing query by orders of magnitude without rewriting the query. See an example below:

    Notice

    It is a best practice to use large storage size data types for persistent computed columns, e.g. BIGINT or LONGTEXT. This ensures the expression result will fit into the computed column.

    CREATE TABLE t (a INT, b INT, c as a + b BIGINT, sort key(c));
    
    SELECT * FROM t WHERE c = <constant>;
    SELECT * FROM t WHERE a + b = <constant>;
    

    Either query is acceptable. The second query will also use column c to filter and does not have to compute a + b.

  • Data type mismatch within the computed column and or parameterization of a computed column can prevent matching. Warnings will inform you if there is an issue with computed column matching:

    COMPILE <query>;
    SHOW WARNINGS;
    ****
    An example warning is:
    SHOW WARNINGS;
    +---------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | Level   | Code | Message                                                                                                                                                                                              |
    +---------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | Warning | 2626 | Prospect computed column: <table_name>.<column_name> of type text CHARACTER SET utf8 COLLATE utf8_general_ci NULL cannot suit expression of type longtext CHARACTER SET utf8 COLLATE utf8_general_ci NULL. |
    +---------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    1 row in set (0.00 sec)

    In the above case, change the column_name type to LONGTEXT and the matching will work as expected.

  • Using computed columns helps to maintain data independence by keeping the physical structure of data independent from application logic. For example, if your application requires data that is a combination of two or more columns within a table, a persistent computed column plus computed column matching can be used to speed up a query. This can be done without having to modify the query to refer to the computed column (a physical structure created to optimize performance) by name.

  • Promoting and indexing JSON fields is the main use case of persistent computed columns. See Using JSON for more information.

Notice

Persistent computed columns cannot be initialized with the NOT NULL constraint.

Computed columns can be created as part of a CREATE TABLE statement, or can be added later using ALTER TABLE.

Computed Column Examples

The following statement creates a table t with an INT column a and a persistent computed column b that is defined using an expression that takes the value in a and adds 1 to it.

CREATE TABLE t (a INT PRIMARY KEY, b AS a + 1 PERSISTED BIGINT, SORT KEY(b));

Inserting values into column a will automatically populate column b using the column definition expression.

INSERT INTO t (a) VALUES (1), (2);

SELECT * FROM t;
****
+---+------+
| a | b    |
+---+------+
| 2 |    3 |
| 1 |    2 |
+---+------+

It is also possible to add a computed column to an existing table using ALTER TABLE.

ALTER TABLE t ADD COLUMN c AS a * 2 - b PERSISTED BIGINT;

SELECT * FROM t;
****
+---+------+------+
| a | b    | c    |
+---+------+------+
| 2 |    3 |    1 |
| 1 |    2 |    0 |
+---+------+------+

Attempts to insert values directly into a computed column will result in an error, even if the values are correct with respect to the column definition.

INSERT INTO t (a, b, c) VALUES (3, 4, 2);
****
ERROR 1822 (HY000): Cannot insert into computed column 'b'

A computed column can be created using any expression that outputs a value that is a valid SingleStore DB data type, composed of operators and built-in functions. For example, computed columns allow you to extract values from a JSON blob so they can be scanned and read more quickly. However, note that aggregations, user-defined functions, and non-deterministic functions such as rand() and now() cannot be used in computed columns.

SELECT * FROM j;
****
+---+-------------------------------------------------+
| a | b                                               |
+---+-------------------------------------------------+
| 1 | {"name":"Sandy Beach","occupation":"lifeguard"} |
+---+-------------------------------------------------+
ALTER TABLE j ADD COLUMN occupation AS b::$occupation PERSISTED LONGTEXT;

SELECT  * FROM j;
****
+---+-------------------------------------------------+------------+
| a | b                                               | occupation |
+---+-------------------------------------------------+------------+
| 1 | {"name":"Sandy Beach","occupation":"lifeguard"} | lifeguard  |
+---+-------------------------------------------------+------------+

Modifying values in a column used to define a computed column will also update the affected values in the computed column.

UPDATE j SET b::$occupation = 'circus clown';

SELECT * FROM j;
****
+---+----------------------------------------------------+--------------+
| a | b                                                  | occupation   |
+---+----------------------------------------------------+--------------+
| 1 | {"name":"Sandy Beach","occupation":"circus clown"} | circus clown |
+---+----------------------------------------------------+--------------+