# APPROX\_PERCENTILE

Calculates the approximate percentile for a set of values. The calculation occurs for a given percentage.

`APPROX_PERCENTILE` executes faster than the [PERCENTILE\_DISC](https://docs.singlestore.com/db/v9.1/reference/sql-reference/window-functions/percentile-disc.md) and [PERCENTILE\_CONT and MEDIAN](https://docs.singlestore.com/db/v9.1/reference/sql-reference/window-functions/percentile-cont-and-median.md) functions, and is an alternative to using those functions.

`APPROX_PERCENTILE` is useful for workloads with large datasets that require statistical analysis.

May be used as a window function.

## Syntax

## Aggregate Function

```sql
APPROX_PERCENTILE (col | expr , percentage [, error_tolerance])
```

## Window Function

```sql
APPROX_PERCENTILE (col | expr , percentage [, error_tolerance]) OVER 
  ( [PARTITION BY (col | expr)] )
```

## Arguments

* `col`: A column of a numeric data type.
* `expr`: An expression that evaluates to a numeric data type.
* `percentage`: A numeric value between `0` and `1`.
* `error_tolerance`: A numeric value between `0` and `0.5` (not including `0`).

## Return Type

A `DOUBLE` data type.

## Remarks

* While calculating the percentile, `APPROX_PERCENTILE` ignores all the `NULL` values.
* A `percentage` value of `0.5` will return the approximate median for the data set.
* The default value for `error_tolerance` is `0.01`. A higher `error_tolerance` value means a higher tolerance for error, hence lower accuracy.
* `APPROX_PERCENTILE` can be used as a scalar aggregate function. It can also be used with a `GROUP BY` clause.

> **📝 Note**: If you require an exact value instead of an estimation, you should use the `PERCENTILE_CONT` or `PERCENTILE_DISC` functions.

## Error Tolerance Behavior

The error threshold of `APPROX_PERCENTILE` depends directly on the size of data and the requested quantile. We expect the approximate estimation to differ from the exact estimation within a bounded range of \[-0.01,0.01] for more than 100k rows. For example, for a quantile of 0.5 (50th percentile), you can expect the estimation value will be within the range of 0.49 or 0.51 deterministic/original quantile (+-0.01). Additionally, accuracy improves with larger datasets, and also as the percentage approaches the lower or upper bounds of the distribution (i.e., 1st and 99th percentile).

## Examples

The following examples use the `test_scores` table:

```sql
CREATE TABLE test_scores(student_id INT, lastname TEXT, class TEXT, test1 decimal(5,2), test2 decimal(5,2));

INSERT INTO test_scores VALUES('304', 'Smith', 'sophomore','51.3', '64.1'),
('288', 'Hawkins', 'junior','76.9', '81.6'),
('614', 'Jones', 'senior','89.3', '75.9'),
('451', 'Brown', 'junior','68.4', '71'),
('643', 'Nova', 'junior','81', '84.4'),
('400', 'Miller', 'senior','90', '88'),
('690', 'Maxwell', 'sophomore','72', '74.5'),
('243', 'Anderson', 'junior','78.8', '86.3'),
('184', 'Williams', 'junior','82', '91'),
('111', 'Jackson', 'sophomore','72', '84.7');

```

## Aggregate Function

To calculate the percentile rank using the `approx_percentile` function: two inputs are required: the column that will be used to determine the percentile value and the specific percentile rank to be calculated.

## Example 1

The `approx_percentile` function is used to calculate the approximate percentile value of a specified column in a table.

```sql
SELECT student_id, class, test1, approx_percentile(test1, 0.1)
    OVER (PARTITION BY class)     
    FROM test_scores
    ORDER BY class, student_id, test1;


```

```output

+------------+-----------+-------+--------------------------------+
| student_id | lastname  | test1 | approx_percentile(test1, 0.1)  |
|            |           |       | OVER (PARTITION BY student_id) | 
+------------+-----------+-------+--------------------------------+
|        184 | junior    | 82.00 |                      68.400000 |
|        243 | junior    | 78.80 |                      68.400000 |
|        288 | junior    | 76.90 |                      68.400000 |
|        451 | junior    | 68.40 |                      68.400000 |
|        643 | junior    | 81.00 |                      68.400000 |
|        400 | senior    | 90.00 |                      89.300000 |
|        614 | senior    | 89.30 |                      89.300000 |
|        111 | sophomore | 72.00 |                      51.300000 |
|        304 | sophomore | 51.30 |                      51.300000 |
|        690 | sophomore | 72.00 |                      51.300000 |
+------------+-----------+-------+--------------------------------+
```

## Example 2

The `SELECT` clause includes two columns: class and the approximate median of *test2* scores for each *class*. The `APPROX_PERCENTILE` function calculates the approximate median at the 50th percentile (0.5).

The `GROUP BY` clause groups the results by *class*. The query calculates the approximate median of the *test2* scores for each unique class in the *test\_scores* table.

```sql
SELECT class, APPROX_PERCENTILE(test2, 0.5) AS "approx_median"
    FROM test_scores GROUP BY class;


```

```output

+-----------+---------------+
| class     | approx_median |
+-----------+---------------+
| senior    |     81.950000 |
| sophomore |     74.500000 |
| junior    |     84.400000 |
+-----------+---------------+

```

## Window Function

The `SELECT` clause includes four columns: *student\_id, class, test1*, and the *10th percentile* of *test1* within each class. The `approx_percentile` function calculates the approximate 10th percentile of the *test1* column. The `OVER` clause is used to specify the window or subset of rows on which to perform the calculation. In this example, the window is defined by the `PARTITION BY` clause, which divides the data by *class*, so the 10th percentile is calculated for each distinct *class* in the *test\_scores* table.

```sql
SELECT class, student_id, test1, APPROX_PERCENTILE(test1, 0.3)     
    OVER (PARTITION BY class) AS "percentile" FROM test_scores;


```

```output

+-----------+------------+-------+------------+
| class     | student_id | test1 | percentile |
+-----------+------------+-------+------------+
| senior    |        614 | 89.30 |  89.370000 |
| senior    |        400 | 90.00 |  89.370000 |
| sophomore |        690 | 72.00 |  59.580000 |
| sophomore |        304 | 51.30 |  59.580000 |
| sophomore |        111 | 72.00 |  59.580000 |
| junior    |        643 | 81.00 |  76.900000 |
| junior    |        243 | 78.80 |  76.900000 |
| junior    |        184 | 82.00 |  76.900000 |
| junior    |        288 | 76.90 |  76.900000 |
| junior    |        451 | 68.40 |  76.900000 |
+-----------+------------+-------+------------+

```

## Related Topics

* [PERCENTILE\_CONT](https://docs.singlestore.com/db/v9.1/reference/sql-reference/window-functions/percentile-cont-and-median.md)
* [PERCENTILE\_DISC](https://docs.singlestore.com/db/v9.1/reference/sql-reference/window-functions/percentile-disc.md)

***

Modified at: March 27, 2024

Source: [/db/v9.1/reference/sql-reference/aggregate-functions/approx-percentile/](https://docs.singlestore.com/db/v9.1/reference/sql-reference/aggregate-functions/approx-percentile/)

(An index of the documentation is available at /llms.txt)
