APPROX_ COUNT_ DISTINCT
Warning
SingleStore 9.0 gives you the opportunity to preview, evaluate, and provide feedback on new and upcoming features prior to their general availability. In the interim, SingleStore 8.9 is recommended for production workloads, which can later be upgraded to SingleStore 9.0.
On this page
Aggregate function.
APPROX_
is useful if an exact result is not required.SELECT COUNT(DISTINCT <column>)
, which calculates the exact number of distinct values in a column of table, APPROX_
can run much faster and consume significantly less memory.
Syntax
APPROX_COUNT_DISTINCT ( expr [, expr ...] )
Arguments
expr
One or more expressions evaluating to a scalar value.
expr
is typically the name of a column.APPROX_
returns an estimate of the number of distinct values in the column.
Examples
The following example populates a column with one-hundred thousand unique integers and runs SELECT COUNT(DISTINCT <column>)
and APPROX_
on the column so you can compare the results.
DROP DATABASE IF EXISTS singlestore_docs_example;CREATE DATABASE singlestore_docs_example;USE singlestore_docs_example;DELIMITER //CREATE PROCEDURE populate_table() ASBEGINCREATE TABLE acd_t(a INT);FOR i IN 1..100000 LOOPINSERT INTO acd_t VALUES(i);END LOOP;END//DELIMITER ;CALL populate_table();
Get the exact count of distinct values in the column a
:
SELECT COUNT(DISTINCT a) FROM acd_t;
+--------------------+
| COUNT(DISTINCT a) |
+--------------------+
| 100000 |
+--------------------+
Get an estimate of the number of distinct values in the column a
:
SELECT APPROX_COUNT_DISTINCT(a) FROM acd_t;
+--------------------------+
| APPROX_COUNT_DISTINCT(a) |
+--------------------------+
| 100498 |
+--------------------------+
Using multiple expressions:
CREATE TABLE acd (a int, b int);INSERT INTO acd VALUES (1, 1), (1, 2), (1, 1), (2, 1), (2, 2), (2, 2);
SELECT APPROX_COUNT_DISTINCT(a, b) from acd;
+-----------------------------+
| APPROX_COUNT_DISTINCT(a, b) |
+-----------------------------+
| 4 |
+-----------------------------+
1 row in set (0.254 sec)
A somewhat more practical example, using the following table and data:
CREATE TABLE acd_assets(asset_id int,asset_type varchar(50),asset_desc varchar(50),asset_value decimal(6,2));INSERT into acd_assets values('1049', 'laptop', 'mac_book_pro', '1999.00'),('49', 'cell_phone', 'iphone_12','879.00'),('1100', 'laptop', 'mac_book_pro','1999.00'),('2037', 'laptop', 'mac_book_air_M2','1199.00'),('58', 'cell_phone', 'iphone_12', '879.00'),('130', 'cell_phone', 'iphone_13', '699'),('210', 'laptop', 'mac_book_pro','2500.00'),('111', 'laptop', 'mac_book_pro','2100.00'),('099', 'laptop', 'mac_book_air_M1','999'),('140', 'cell_phone', 'iphone_13_pro','999.00');
SELECT approx_count_distinct(asset_id) AS approx_distinct_asset_idFROM acd_assets;
+-------------------------+
|approx_distinct_asset_id |
+-------------------------+
| 10 |
+-------------------------+
The GROUP BY
clause can be used with the APPROX_
function.GROUP BY
returns the approximate total of each type of asset from the assets table.
SELECT asset_type, APPROX_COUNT_DISTINCT(asset_id) AS approx_distinct_asset_idFROM acd_assetsGROUP BY asset_type;
+------------+--------------------------+
|asset_type | approx_distinct_asset_id |
+------------+--------------------------+
| cell_phone | 4 |
| laptop | 6 |
+------------+--------------------------+
Last modified: March 18, 2024