APPROX_ COUNT_ DISTINCT
On this page
Aggregate function.
APPROX_
is useful if an exact result is not required.SELECT COUNT(DISTINCT <column>)
, which calculates the exact number of distinct values in a column of table, APPROX_
can run much faster and consume significantly less memory.
Syntax
APPROX_COUNT_DISTINCT ( expr [, expr ...] )
Arguments
expr
One or more expressions evaluating to a scalar value.
expr
is typically the name of a column.APPROX_
returns an estimate of the number of distinct values in the column.
Examples
The following example populates a column with one-hundred thousand unique integers and runs SELECT COUNT(DISTINCT <column>)
and APPROX_
on the column so you can compare the results.
DROP DATABASE IF EXISTS singlestore_docs_example;CREATE DATABASE singlestore_docs_example;USE singlestore_docs_example;DELIMITER //CREATE PROCEDURE populate_table() ASBEGINCREATE TABLE acd_t(a INT);FOR i IN 1..100000 LOOPINSERT INTO acd_t VALUES(i);END LOOP;END//DELIMITER ;CALL populate_table();
Get the exact count of distinct values in the column a
:
SELECT COUNT(DISTINCT a) FROM acd_t;
+--------------------+
| COUNT(DISTINCT a) |
+--------------------+
| 100000 |
+--------------------+
Get an estimate of the number of distinct values in the column a
:
SELECT APPROX_COUNT_DISTINCT(a) FROM acd_t;
+--------------------------+
| APPROX_COUNT_DISTINCT(a) |
+--------------------------+
| 100498 |
+--------------------------+
Using multiple expressions:
CREATE TABLE acd (a int, b int);INSERT INTO acd VALUES (1, 1), (1, 2), (1, 1), (2, 1), (2, 2), (2, 2);
SELECT APPROX_COUNT_DISTINCT(a, b) from acd;
+-----------------------------+
| APPROX_COUNT_DISTINCT(a, b) |
+-----------------------------+
| 4 |
+-----------------------------+
1 row in set (0.254 sec)
A somewhat more practical example, using the following table and data:
CREATE TABLE acd_assets(asset_id int,asset_type varchar(50),asset_desc varchar(50),asset_value decimal(6,2));INSERT into acd_assets values('1049', 'laptop', 'mac_book_pro', '1999.00'),('49', 'cell_phone', 'iphone_12','879.00'),('1100', 'laptop', 'mac_book_pro','1999.00'),('2037', 'laptop', 'mac_book_air_M2','1199.00'),('58', 'cell_phone', 'iphone_12', '879.00'),('130', 'cell_phone', 'iphone_13', '699'),('210', 'laptop', 'mac_book_pro','2500.00'),('111', 'laptop', 'mac_book_pro','2100.00'),('099', 'laptop', 'mac_book_air_M1','999'),('140', 'cell_phone', 'iphone_13_pro','999.00');
SELECT approx_count_distinct(asset_id) AS approx_distinct_asset_idFROM acd_assets;
+-------------------------+
|approx_distinct_asset_id |
+-------------------------+
| 10 |
+-------------------------+
The GROUP BY
clause can be used with the APPROX_
function.GROUP BY
returns the approximate total of each type of asset from the assets table.
SELECT asset_type, APPROX_COUNT_DISTINCT(asset_id) AS approx_distinct_asset_idFROM acd_assetsGROUP BY asset_type;
+------------+--------------------------+
|asset_type | approx_distinct_asset_id |
+------------+--------------------------+
| cell_phone | 4 |
| laptop | 6 |
+------------+--------------------------+
Last modified: March 18, 2024