APPROX_COUNT_DISTINCT
Returns an estimate of the number of distinct values in a collection.
APPROX_COUNT_DISTINCT
is useful if an exact result is not required. Compared to SELECT COUNT(DISTINCT <column>)
, which calculates the exact number of distinct values in a column of table, APPROX_COUNT_DISTINCT
can run much faster and consume significantly less memory.
Syntax
APPROX_COUNT_DISTINCT ( expr [, expr] )
Arguments
expr
One or more expressions evaluating to a scalar value.
expr
is typically the name of a column. In this case, APPROX_COUNT_DISTINCT
returns an estimate of the number of distinct values in the column.
Examples
The following example populates a column with one-hundred thousand unique integers and runs SELECT COUNT(DISTINCT <column>)
and APPROX_COUNT_DISTINCT
on the column so you can compare the results.
DROP DATABASE IF EXISTS singlestore_docs_example; CREATE DATABASE singlestore_docs_example; USE singlestore_docs_example; DELIMITER // CREATE PROCEDURE populate_table() AS BEGIN CREATE TABLE acd_t(a INT); FOR i IN 1..100000 LOOP INSERT INTO acd_t VALUES(i); END LOOP; END // DELIMITER ; CALL populate_table();
Get the exact count of distinct values in the column a
:
SELECT COUNT(DISTINCT a) FROM acd_t; **** +--------------------+ | COUNT(DISTINCT a) | +--------------------+ | 100000 | +--------------------+
Get an estimate of the number of distinct values in the column a
:
SELECT APPROX_COUNT_DISTINCT(a) FROM acd_t; **** +--------------------------+ | APPROX_COUNT_DISTINCT(a) | +--------------------------+ | 100498 | +--------------------------+
Using multiple expressions:
CREATE TABLE acd (a int, b int); INSERT INTO acd VALUES (1, 1), (1, 2), (1, 1), (2, 1), (2, 2), (2, 2);
SELECT APPROX_COUNT_DISTINCT(a, b) from acd; **** +-----------------------------+ | APPROX_COUNT_DISTINCT(a, b) | +-----------------------------+ | 4 | +-----------------------------+ 1 row in set (0.254 sec)