APPROX_ COUNT_ DISTINCT
On this page
Returns an estimate of the number of distinct values in a collection.
APPROX_
is useful if an exact result is not required.SELECT COUNT(DISTINCT <column>)
, which calculates the exact number of distinct values in a column of table, APPROX_
can run much faster and consume significantly less memory.
Syntax
APPROX_COUNT_DISTINCT ( expr [, expr] )
Arguments
expr
One or more expressions evaluating to a scalar value.
expr
is typically the name of a column.APPROX_
returns an estimate of the number of distinct values in the column.
Examples
The following example populates a column with one-hundred thousand unique integers and runs SELECT COUNT(DISTINCT <column>)
and APPROX_
on the column so you can compare the results.
DROP DATABASE IF EXISTS singlestore_docs_example;CREATE DATABASE singlestore_docs_example;USE singlestore_docs_example;DELIMITER //CREATE PROCEDURE populate_table() ASBEGINCREATE TABLE acd_t(a INT);FOR i IN 1..100000 LOOPINSERT INTO acd_t VALUES(i);END LOOP;END//DELIMITER ;CALL populate_table();
Get the exact count of distinct values in the column a
:
SELECT COUNT(DISTINCT a) FROM acd_t;
+--------------------+
| COUNT(DISTINCT a) |
+--------------------+
| 100000 |
+--------------------+
Get an estimate of the number of distinct values in the column a
:
SELECT APPROX_COUNT_DISTINCT(a) FROM acd_t;
+--------------------------+
| APPROX_COUNT_DISTINCT(a) |
+--------------------------+
| 100498 |
+--------------------------+
Using multiple expressions:
CREATE TABLE acd (a int, b int);INSERT INTO acd VALUES (1, 1), (1, 2), (1, 1), (2, 1), (2, 2), (2, 2);
SELECT APPROX_COUNT_DISTINCT(a, b) from acd;
+-----------------------------+
| APPROX_COUNT_DISTINCT(a, b) |
+-----------------------------+
| 4 |
+-----------------------------+
1 row in set (0.254 sec)
Last modified: February 24, 2023