# SingleStore Managed Service

##### APPROX_PERCENTILE

Calculates the approximate percentile for a set of values. The calculation occurs for a given percentage.

`APPROX_PERCENTILE` executes faster than the PERCENTILE_DISC and PERCENTILE_CONT and MEDIAN functions, and is an alternative to using those functions.

`APPROX_PERCENTILE` is useful for workloads with large datasets that require statistical analysis.

###### Syntax
```APPROX_PERCENTILE (col|expr , percentage [, error_tolerance])
```
###### Arguments
• `col`: A column of a numeric data type.

• `expr`: An expression that evaluates to a numeric data type.

• `percentage`: A numeric value between `0` and `1`.

• `error_tolerance`: A numeric value between `0` and `0.5` (not including `0`).

###### Return Type

A `DOUBLE` data type.

###### Remarks
• While calculating the percentile, `APPROX_PERCENTILE` ignores all the `NULL` values.

• A `percentage` value of `0.5` will return the approximate median for the data set.

• The default value for `error_tolerance` is `0.01`. A higher `error_tolerance` value means a higher tolerance for error, hence lower accuracy.

• `APPROX_PERCENTILE` can be used as a scalar aggregate function. It can also be used with a `GROUP BY` clause.

### Notice

If you require an exact value instead of an estimation, you should use the `PERCENTILE_CONT` or `PERCENTILE_DISC` functions.

###### Error Tolerance Behavior

The error threshold of `APPROX_PERCENTILE` depends directly on the size of data and the requested quantile. We expect the approximate estimation to differ from the exact estimation within a bounded range of [-0.01,0.01] for more than 100k rows. For example, for a quantile of 0.5 (50th percentile), you can expect the estimation value will be within the range of 0.49 or 0.51 deterministic/original quantile (+-0.01). Additionally, accuracy improves with larger datasets, and also as the percentage approaches the lower or upper bounds of the distribution (i.e., 1st and 99th percentile).

###### Examples

The following examples use the `courses` table:

```CREATE TABLE courses(course_code TEXT, section_number INT, number_students INT);
```

The contents of the `courses` table are as follows:

```+-------------+----------------+-----------------+
| course_code | section_number | number_students |
+-------------+----------------+-----------------+
| CS-101      |              1 |              20 |
| CS-101      |              2 |              16 |
| CS-101      |              3 |              22 |
| CS-101      |              4 |              25 |
| CS-101      |              5 |              22 |
| CS-150      |              1 |              10 |
| CS-150      |              2 |              16 |
| CS-150      |              3 |              11 |
| CS-150      |              4 |              17 |
| CS-150      |              5 |               9 |
| CS-201      |              1 |              14 |
| CS-201      |              2 |              17 |
| CS-301      |              1 |               7 |
| CS-301      |              2 |              10 |
+-------------+----------------+-----------------+
```
###### Example 1

The following example demonstrates the usage of `APPROX_PERCENTILE` by calculating the number of students across different courses for a given percentile.

###### Example 1a
```SELECT course_code, APPROX_PERCENTILE(number_students, 0.65) AS "percentile"
FROM courses GROUP BY course_code;
****
+-------------+------------+
| course_code | percentile |
+-------------+------------+
| CS-301      |     9.4000 |
| CS-201      |    16.4000 |
| CS-101      |    22.0000 |
| CS-150      |    14.7500 |
+-------------+------------+
```
###### Example 1b

The following example calculates the median of the dataset.

```SELECT course_code, APPROX_PERCENTILE(number_students, 0.5) AS "approx_median"
FROM courses GROUP BY course_code;
****
+-------------+---------------+
| course_code | approx_median |
+-------------+---------------+
| CS-301      |        8.5000 |
| CS-201      |       15.5000 |
| CS-101      |       22.0000 |
| CS-150      |       11.0000 |
+-------------+---------------+
```
###### Example 2

The `APPROX_PERCENTILE` function can be used as a Window Function.

###### Example 2a
```SELECT DISTINCT(course_code), APPROX_PERCENTILE(number_students, 0.5) OVER (ORDER BY course_code) AS "percentile" FROM courses;
****
+-------------+------------+
| course_code | percentile |
+-------------+------------+
| CS-101      |    22.0000 |
| CS-150      |    16.5000 |
| CS-201      |    16.5000 |
| CS-301      |    16.0000 |
+-------------+------------+
```
###### Example 2b
```SELECT DISTINCT(course_code), APPROX_PERCENTILE(number_students, 0.3, 0.3) OVER (ORDER BY course_code) AS "percentile" FROM courses;
****
+-------------+------------+
| course_code | percentile |
+-------------+------------+
| CS-101      |    20.0000 |
| CS-150      |    13.5000 |
| CS-201      |    14.2000 |
| CS-301      |    10.7000 |
+-------------+------------+
```

Related Topics