APPROX_ PERCENTILE
On this page
Calculates the approximate percentile for a set of values.
APPROX_ executes faster than the PERCENTILE_
APPROX_ is useful for workloads with large datasets that require statistical analysis.
May be used as a window function.
Syntax
Aggregate Function
APPROX_PERCENTILE (col | expr , percentage [, error_tolerance])
Window Function
APPROX_PERCENTILE (col | expr , percentage [, error_tolerance]) OVER( [PARTITION BY (col | expr)] )
Arguments
-
col: A column of a numeric data type. -
expr: An expression that evaluates to a numeric data type. -
percentage: A numeric value between0and1. -
error_: A numeric value betweentolerance 0and0.(not including5 0).
Return Type
A DOUBLE data type.
Remarks
-
While calculating the percentile,
APPROX_ignores all thePERCENTILE NULLvalues. -
A
percentagevalue of0.will return the approximate median for the data set.5 -
The default value for
error_istolerance 0..01 A higher error_value means a higher tolerance for error, hence lower accuracy.tolerance -
APPROX_can be used as a scalar aggregate function.PERCENTILE It can also be used with a GROUP BYclause.
Note
If you require an exact value instead of an estimation, you should use the PERCENTILE_ or PERCENTILE_ functions.
Error Tolerance Behavior
The error threshold of APPROX_ depends directly on the size of data and the requested quantile.
Examples
The following examples use the test_ table:
CREATE TABLE test_scores(student_id INT, lastname TEXT, class TEXT, test1 decimal(5,2), test2 decimal(5,2));INSERT INTO test_scores VALUES('304', 'Smith', 'sophomore','51.3', '64.1'),('288', 'Hawkins', 'junior','76.9', '81.6'),('614', 'Jones', 'senior','89.3', '75.9'),('451', 'Brown', 'junior','68.4', '71'),('643', 'Nova', 'junior','81', '84.4'),('400', 'Miller', 'senior','90', '88'),('690', 'Maxwell', 'sophomore','72', '74.5'),('243', 'Anderson', 'junior','78.8', '86.3'),('184', 'Williams', 'junior','82', '91'),('111', 'Jackson', 'sophomore','72', '84.7');
Aggregate Function
To calculate the percentile rank using the approx_ function: two inputs are required: the column that will be used to determine the percentile value and the specific percentile rank to be calculated.
Example 1
The approx_ function is used to calculate the approximate percentile value of a specified column in a table.
SELECT student_id, class, test1, approx_percentile(test1, 0.1)OVER (PARTITION BY class)FROM test_scoresORDER BY class, student_id, test1;
+------------+-----------+-------+--------------------------------+
| student_id | lastname | test1 | approx_percentile(test1, 0.1) |
| | | | OVER (PARTITION BY student_id) |
+------------+-----------+-------+--------------------------------+
| 184 | junior | 82.00 | 68.400000 |
| 243 | junior | 78.80 | 68.400000 |
| 288 | junior | 76.90 | 68.400000 |
| 451 | junior | 68.40 | 68.400000 |
| 643 | junior | 81.00 | 68.400000 |
| 400 | senior | 90.00 | 89.300000 |
| 614 | senior | 89.30 | 89.300000 |
| 111 | sophomore | 72.00 | 51.300000 |
| 304 | sophomore | 51.30 | 51.300000 |
| 690 | sophomore | 72.00 | 51.300000 |
+------------+-----------+-------+--------------------------------+Example 2
The SELECT clause includes two columns: class and the approximate median of test2 scores for each class.APPROX_ function calculates the approximate median at the 50th percentile (0.
The GROUP BY clause groups the results by class.
SELECT class, APPROX_PERCENTILE(test2, 0.5) AS "approx_median"FROM test_scores GROUP BY class;
+-----------+---------------+
| class | approx_median |
+-----------+---------------+
| senior | 81.950000 |
| sophomore | 74.500000 |
| junior | 84.400000 |
+-----------+---------------+Window Function
The SELECT clause includes four columns: student_approx_ function calculates the approximate 10th percentile of the test1 column.OVER clause is used to specify the window or subset of rows on which to perform the calculation.PARTITION BY clause, which divides the data by class, so the 10th percentile is calculated for each distinct class in the test_
SELECT class, student_id, test1, APPROX_PERCENTILE(test1, 0.3)OVER (PARTITION BY class) AS "percentile" FROM test_scores;
+-----------+------------+-------+------------+
| class | student_id | test1 | percentile |
+-----------+------------+-------+------------+
| senior | 614 | 89.30 | 89.370000 |
| senior | 400 | 90.00 | 89.370000 |
| sophomore | 690 | 72.00 | 59.580000 |
| sophomore | 304 | 51.30 | 59.580000 |
| sophomore | 111 | 72.00 | 59.580000 |
| junior | 643 | 81.00 | 76.900000 |
| junior | 243 | 78.80 | 76.900000 |
| junior | 184 | 82.00 | 76.900000 |
| junior | 288 | 76.90 | 76.900000 |
| junior | 451 | 68.40 | 76.900000 |
+-----------+------------+-------+------------+Related Topics
Last modified: March 27, 2024