Working with Window Functions

Syntax

Window functions in SingleStore use the following syntax:

function ([expr]) OVER {window_name | (window_definition)}

window_definition:
    [partition_by_clause] [order_by_clause] [frame_clause]

partition_by_clause:
    PARTITION BY {col | expr}

order_by_clause:
    ORDER BY {col | expr}

frame_clause:
    {ROWS | RANGE}
        {UNBOUNDED PRECEDING | int PRECEDING | CURRENT ROW} |
        {BETWEEN
            {UNBOUNDED PRECEDING | int {PRECEDING | FOLLOWING} | CURRENT ROW}
         AND
            {UNBOUNDED FOLLOWING | int {PRECEDING | FOLLOWING} | CURRENT ROW}
        }

Syntax Remarks

window_definition

Window definition clauses are similar to standard GROUP BY and ORDER BY clauses with the following exception: you cannot refer to the projections list using aliases or position, because literal integers are treated as integers instead of positional references.

frame_clause

The frame_clause is used with an aggregate window function, and it specifies a subset of rows in a partition to evaluate. Ranking, value, and percentile window functions are not affected by the frame_clause because they are intended to operate on the entire set of rows in each partition.

In the frame_clause, int PRECEDING and int FOLLOWING are referred to as frame offsets. RANGE cannot be used with frame offsets. Additionally, frame offsets must be non-negative integers. Any non-integer is automatically rounded.

If BETWEEN is not specified, the default upper bound is CURRENT ROW. For example, ROWS UNBOUNDED PRECEDING is equivalent to ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.

If no frame is specified, the default is RANGE UNBOUNDED PRECEDING, i.e. RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.

RANGE includes all rows whose elements in the order_by_clause are equal. If the order_by_clause is empty, RANGE will include all rows in the partition.

The order_by_clause can include optional NULLS FIRST or NULLS LAST clauses that sort NULL values to the beginning or end of the results set. Ascending order is assumed for the non-NULL values. DESC can be added to the clause to order the non-NULL values in descending order. NULLS FIRST and NULLS LAST clauses are supported by all window functions. Consider the following examples:

SQL

CREATE TABLE wf_t (a INT);

INSERT INTO wf_t VALUES (1),(2),(2),(3),(3),(3),(4);

SQL

SELECT COUNT(1) OVER (ORDER BY a ROWS UNBOUNDED PRECEDING) AS col1,
          COUNT(1) OVER (ORDER BY a RANGE UNBOUNDED PRECEDING) AS col2 FROM wf_t;

+------+------+
| col1 | col2 |
+------+------+
|    1 |    1 |
|    2 |    3 |
|    3 |    3 |
|    4 |    6 |
|    5 |    6 |
|    6 |    6 |
|    7 |    7 |
+------+------+

SQL

INSERT INTO wf_t VALUES (null), (null);

SQL

SELECT a, RANK() OVER (ORDER BY a NULLS FIRST) AS col1 FROM wf_t;

+------+------+
| a    | col1 |
+------+------+
| NULL |    1 |
| NULL |    1 |
|    1 |    3 |
|    2 |    4 |
|    2 |    4 |
|    3 |    6 |
|    3 |    6 |
|    3 |    6 |
|    4 |    9 |
+------+------+

SQL

SELECT a, RANK() OVER (ORDER BY a NULLS LAST) AS col1 FROM wf_t;

+------+------+
| a    | col1 |
+------+------+
|    1 |    1 |
|    2 |    2 |
|    2 |    2 |
|    3 |    4 |
|    3 |    4 |
|    3 |    4 |
|    4 |    7 |
| NULL |    8 |
| NULL |    8 |
+------+------+

SQL

SELECT a, RANK() OVER (ORDER BY a DESC NULLS FIRST) AS col1 FROM wf_t;

+------+------+
| a    | col1 |
+------+------+
| NULL |    1 |
| NULL |    1 |
|    4 |    3 |
|    3 |    4 |
|    3 |    4 |
|    3 |    4 |
|    2 |    7 |
|    2 |    7 |
|    1 |    9 |
+------+------+

SQL

SELECT a, RANK() OVER (ORDER BY a DESC NULLS LAST) AS col1 FROM wf_t;

+------+------+
| a    | col1 |
+------+------+
|    4 |    1 |
|    3 |    2 |
|    3 |    2 |
|    3 |    2 |
|    2 |    5 |
|    2 |    5 |
|    1 |    7 |
| NULL |    8 |
| NULL |    8 |
+------+------+

Supported Functions

The following built-in functions are supported in a window function:

Ranking functions:

Value functions:

Aggregate functions:

Percentile functions:

Named Window Definition

window_name is a reference to a named window definition given in a WINDOW clause, such as:

SQL

SELECT RANK() OVER window_name FROM table WINDOW window_name AS (window_definition)

If using multiple window functions on the same window frame, you may see improved performance by setting and using a window_name. Using the same window_name across multiple functions allows SingleStore Helios code generation to make execution more efficient.

Consider the following examples. The statement below does not use a window_name for two identical window definitions.

SQL

SELECT website, timestamp,
AVG(clicks) OVER (PARTITION BY website
                  ORDER BY timestamp
                  ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING),
SUM(visits) OVER (PARTITION BY website
                  ORDER BY timestamp
                  ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING)
FROM t;

The statement above can be improved by using window_name syntax, which will improve performance:

SQL

SELECT website, timestamp,
AVG(clicks) OVER w,
SUM(visits) OVER w
FROM t
WINDOW w AS (PARTITION BY website
              ORDER BY timestamp
              ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING);

The second version of the query has fewer parameters, so the generated code can be smaller and more efficient.

Examples

In the following simple examples, the table t contains four uniquely identified values divided into two categories.

SQL

CREATE TABLE wft (id int, category char(1), value int);

INSERT INTO wft VALUES (1, 'a', 1), (2, 'a', 3),
        (3, 'b', 2), (4, 'b', 4);

SELECT * FROM wft ORDER BY id;

+----+----------+-------+
| id | category | value |
+----+----------+-------+
|  1 | a        |     1 |
|  2 | a        |     3 |
|  3 | b        |     2 |
|  4 | b        |     4 |
+----+----------+-------+

Selecting the average of a column with GROUP BY performs an aggregation that returns a single row per group. Selecting (projecting over) id would be meaningless in this case – SingleStore Helios will return values from an arbitrary row.

SQL

SELECT category, avg(value) FROM wft GROUP BY category;

+----------+------------+
| category | avg(value) |
+----------+------------+
| a        |     2.0000 |
| b        |     3.0000 |
+----------+------------+

In contrast, using avg as a window function with OVER and partitioning by category will compute the average value for each partition of category, then append that category-wise average to each row.

SQL

SELECT id, category, value, avg(value) over (partition by category) FROM wft;

+----+----------+-------+-----------------------------------------+
| id | category | value | avg(value) over (partition by category) |
+----+----------+-------+-----------------------------------------+
|  2 | a        |     3 |                                  2.0000 |
|  1 | a        |     1 |                                  2.0000 |
|  4 | b        |     4 |                                  3.0000 |
|  3 | b        |     2 |                                  3.0000 |
+----+----------+-------+-----------------------------------------+

Here is another example where we order by value within each partition, and use the rank() window function to produce the rank of each row within its partition.

SQL

SELECT id, category, value, rank() OVER (partition by category order by value) AS rank FROM wft;

+------+----------+-------+------+
| id   | category | value | rank |
+------+----------+-------+------+
|    1 | a        |     1 |    1 |
|    2 | a        |     3 |    2 |
|    3 | b        |     2 |    1 |
|    4 | b        |     4 |    2 |
+------+----------+-------+------+

On this page