PERCENTILE_ CONT and MEDIAN
On this page
In SingleStore, percentile functions are available as window functions and aggregate functions.
There are two percentile functions: PERCENTILE_ and PERCENTILE_
PERCENTILE_ is an inverse distribution function.
Syntax
PERCENTILE_CONT(percentile)
WITHIN GROUP (ORDER BY (col | expr))
[OVER (
[PARTITION BY (col | expr), ...]
[frame_clause]
)]or equivalently
PERCENTILE_CONT(percentile)
OVER (
ORDER BY (col | expr)
[PARTITION BY {col | expr}, ...]
[frame_clause]
)Also, MEDIAN can be used as a regular aggregate function, as follows:
MEDIAN (col | expr)Arguments
percentile
A numeric value between 0 and 1.
col
A column of a numeric data type.
expr
An expression that evaluates to a numeric data type.
Remarks
Both the PERCENTILE_ and MEDIAN functions can be used as standard aggregate functions.
The OVER() clause defines how to partition the input set.PARTITION BY clause is specified, the entire input set is treated as a single partition.
Only one column or expression is allowed in the ORDER BY clause, and this column is the one for which the percentile is calculated.
Examples
Example 1
The following example demonstrates the use of PERCENTILE_ as an aggregate function.
CREATE TABLE percont_sets (a INT, b INT, c INT);INSERT INTO percont_sets VALUES (1,1,2),(2,2,3),(3,1,2),(4,2,4),(5,1,1),(2,1,1),(null,null,3);
SELECT PERCENTILE_CONT(0.5)WITHIN GROUP (ORDER BY a)FROM percont_sets;
+------------------------------------------------+
| PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY a) |
+------------------------------------------------+
| 2.5000 |
+------------------------------------------------+SELECT c, PERCENTILE_CONT(0.5)WITHIN GROUP (ORDER BY a) AS "PERCENTILE"FROM percont_setsGROUP BY c ORDER BY c;
+------+------------+
| c | PERCENTILE |
+------+------------+
| 1 | 3.5000 |
| 2 | 2.0000 |
| 3 | 2.0000 |
| 4 | 4.0000 |
+------+------------+SELECT PERCENTILE_CONT(0.25)WITHIN GROUP (ORDER BY a) AS "PERCENTILE"FROM percont_sets;
+------------+
| PERCENTILE |
+------------+
| 2.0000 |
+------------+The percentile of a data set may change depending on how it is ordered.
SELECT PERCENTILE_CONT(0.25)WITHIN GROUP (ORDER BY a DESC) AS "PERCENTILE"FROM percont_sets;
+------------+
| PERCENTILE |
+------------+
| 3.7500 |
+------------+Multiple PERCENTILE_ functions can be used in a single query, and each function must have its own WITHIN GROUP clause.
SELECT PERCENTILE_CONT(0.25)WITHIN GROUP (ORDER BY a) AS "PERCENTILE a",PERCENTILE_CONT(0.25)WITHIN GROUP (ORDER BY b) AS "PERCENTILE b"FROM percont_sets;
+--------------+--------------+
| PERCENTILE a | PERCENTILE b |
+--------------+--------------+
| 2.0000 | 1.0000 |
+--------------+--------------+Example 2
The following example creates a table containing SAT scores.
CREATE TABLE percont_SAT (reading_writing int, math int);INSERT INTO percont_SAT VALUES (800,800),(650,300),(700,500);
SELECT reading_writing, math, reading_writing + math AS total,PERCENTILE_CONT(.9)WITHIN GROUP (ORDER BY reading_writing + math)OVER () AS `90th Percentile`FROM percont_SAT;
+-----------------+------+-------+-----------------+
| reading_writing | math | total | 90th Percentile |
+-----------------+------+-------+-----------------+
| 650 | 300 | 950 | 1520.0000 |
| 700 | 500 | 1200 | 1520.0000 |
| 800 | 800 | 1600 | 1520.0000 |
+-----------------+------+-------+-----------------+Median
Compute the median using percentile = 0.
PERCENTILE_CONT( 0.5 ) ...Example:
CREATE TABLE percont (a int);INSERT INTO percont VALUES (1),(2),(3),(166),(1024);
SELECT a, AVG(a) OVER () AS mean,PERCENTILE_CONT(.5)WITHIN GROUP (ORDER BY a)OVER () AS medianFROM percont;
+------+----------+--------+
| a | mean | median |
+------+----------+--------+
| 1 | 239.2000 | 3.0000 |
| 2 | 239.2000 | 3.0000 |
| 3 | 239.2000 | 3.0000 |
| 166 | 239.2000 | 3.0000 |
| 1024 | 239.2000 | 3.0000 |
+------+----------+--------+Median and Percentiles with Regular Aggregates and Grouping
Suppose you have some data with columns you want to group by and another column you want to aggregate using several aggregate functions, including standard ones like AVG, MIN, MAX etc.
For example, suppose you have the following data:
CREATE TABLE percont_t (id int, category varchar(10), year int, value int);INSERT INTO percont_t VALUES(1, "ABC", 2019, 100),(2, "ABC", 2019, 150),(3, "ABC", 2019, 200),(4, "ABC", 2020, 100),(5, "ABC", 2020, 175),(6, "ABC", 2020, 250),(7, "XYZ", 2020, 200),(8, "XYZ", 2020, 275),(9, "XYZ", 2020, 350),(10, "XYZ", 2020, 450);
Suppose you want to compute the following aggregates of value grouped by category and year for this data: SUM, AVG, MEDIAN, 10th percentile, and 90th percentile.
SELECT category, year, SUM(value), AVG(value), MEDIAN(value),PERCENTILE_CONT(0.1) WITHIN GROUP (ORDER BY value),PERCENTILE_CONT(0.9) WITHIN GROUP (ORDER BY value)FROM percont_tGROUP BY category, yearORDER BY category, year;
+----------+------+------------+------------+---------------+----------------------------------------------------+----------------------------------------------------+
| category | year | SUM(value) | AVG(value) | MEDIAN(value) | PERCENTILE_CONT(0.1) WITHIN GROUP (ORDER BY value) | PERCENTILE_CONT(0.9) WITHIN GROUP (ORDER BY value) |
+----------+------+------------+------------+---------------+----------------------------------------------------+----------------------------------------------------+
| ABC | 2019 | 450 | 150.0000 | 150.0000 | 110.0000 | 190.0000 |
| ABC | 2020 | 525 | 175.0000 | 175.0000 | 115.0000 | 235.0000 |
| XYZ | 2020 | 1275 | 318.7500 | 312.5000 | 222.5000 | 420.0000 |
+----------+------+------------+------------+---------------+----------------------------------------------------+----------------------------------------------------+The above query gives the desired result because both the PERCENTILE_ and MEDIAN functions are allowed as standard aggregate functions.
Last modified: February 16, 2023