APPROX_COUNT_DISTINCT

Syntax

APPROX_COUNT_DISTINCT ( expr [, expr ...] )

Arguments

expr

One or more expressions evaluating to a scalar value.

expr is typically the name of a column. In this case, APPROX_COUNT_DISTINCT returns an estimate of the number of distinct values in the column.

Examples

The following example populates a column with one-hundred thousand unique integers and runs SELECT COUNT(DISTINCT <column>) and APPROX_COUNT_DISTINCT on the column so you can compare the results.

SQL

DROP DATABASE IF EXISTS singlestore_docs_example;
CREATE DATABASE singlestore_docs_example;
USE singlestore_docs_example;

DELIMITER //
CREATE PROCEDURE populate_table() AS
BEGIN
  CREATE TABLE acd_t(a INT);
  FOR i IN 1..100000 LOOP
    INSERT INTO acd_t VALUES(i);
  END LOOP;
END
//
DELIMITER ;

CALL populate_table();

Get the exact count of distinct values in the column a:

SQL

SELECT COUNT(DISTINCT a) FROM acd_t;

+--------------------+
| COUNT(DISTINCT a)  |
+--------------------+
|             100000 |
+--------------------+

Get an estimate of the number of distinct values in the column a:

SQL

SELECT APPROX_COUNT_DISTINCT(a) FROM acd_t;

+--------------------------+
| APPROX_COUNT_DISTINCT(a) |
+--------------------------+
|                   100498 |
+--------------------------+

Using multiple expressions:

SQL

CREATE TABLE acd (a int, b int);

INSERT INTO acd VALUES (1, 1), (1, 2), (1, 1), (2, 1), (2, 2), (2, 2);

SQL

SELECT APPROX_COUNT_DISTINCT(a, b) from acd;

+-----------------------------+
| APPROX_COUNT_DISTINCT(a, b) |
+-----------------------------+
|                           4 |
+-----------------------------+
1 row in set (0.254 sec)

A somewhat more practical example, using the following table and data:

SQL

CREATE TABLE acd_assets(
asset_id int,
asset_type varchar(50),
asset_desc varchar(50),
asset_value decimal(6,2)
);

INSERT into acd_assets values('1049', 'laptop', 'mac_book_pro', '1999.00'),
('49', 'cell_phone', 'iphone_12','879.00'),
('1100', 'laptop', 'mac_book_pro','1999.00'),
('2037', 'laptop', 'mac_book_air_M2','1199.00'),
('58', 'cell_phone', 'iphone_12', '879.00'),
('130', 'cell_phone', 'iphone_13', '699'),
('210', 'laptop', 'mac_book_pro','2500.00'),
('111', 'laptop', 'mac_book_pro','2100.00'),
('099', 'laptop', 'mac_book_air_M1','999'),
('140', 'cell_phone', 'iphone_13_pro','999.00');

SQL

SELECT approx_count_distinct(asset_id) AS approx_distinct_asset_id
FROM acd_assets;

+-------------------------+
|approx_distinct_asset_id |
+-------------------------+
| 10                      |
+-------------------------+

The GROUP BY clause can be used with the APPROX_COUNT_DISTINCT function. In this example GROUP BY returns the approximate total of each type of asset from the assets table.

SQL

SELECT asset_type, APPROX_COUNT_DISTINCT(asset_id) AS approx_distinct_asset_id
FROM acd_assets
GROUP BY asset_type;

+------------+--------------------------+
|asset_type  | approx_distinct_asset_id |
+------------+--------------------------+
| cell_phone | 4                        |
| laptop     | 6                        |
+------------+--------------------------+