Statistics and Sampling Tasks
On this page
Disabling and Enabling Automatic Statistics
You can disable or enable autostats on each table.
You can disable autostats on an existing table by running:
ALTER TABLE table_name DISABLE AUTOSTATS;
or
ALTER TABLE table_name AUTOSTATS_ENABLED = OFF;
and enable autostats by running
ALTER TABLE table_name ENABLE AUTOSTATS;
or
ALTER TABLE table_name AUTOSTATS_ENABLED = ON;
You can choose whether autostats are enabled or disabled when creating a table by adding the option:
AUTOSTATS_ENABLED=<TRUE|FALSE>
at the end of the CREATE TABLE
statement, such as:
CREATE TABLE t (a INT, b INT, SORT KEY (a)) AUTOSTATS_ENABLED=FALSE;
When AUTOSTATS_
is set to TRUE
, autostats categories are set to the following default settings
-
For rowstore tables,
AUTOSTATS_CARDINALITY_MODE = PERIODICAUTOSTATS_HISTOGRAM_MODE = CREATE -
For columnstore tables,
AUTOSTATS_CARDINALITY_MODE = INCREMENTALAUTOSTATS_HISTOGRAM_MODE = CREATEAUTOSTATS_SAMPLING = ON
SingleStore allows you to independently control how each category of statistics is automatically gathered.
AUTOSTATS_CARDINALITY_MODE = [ INCREMENTAL | PERIODIC | OFF ]AUTOSTATS_HISTOGRAM_MODE = [ CREATE | UPDATE | OFF ]AUTOSTATS_SAMPLING = [ ON | OFF ]
These autostats settings can also be specified or modified through the ALTER TABLE
and CREATE TABLE
commands.CREATE TABLE
statement.
ALTER TABLE table_name AUTOSTATS_CARDINALITY_MODE = PERIODIC;
CREATE TABLE t1 (a INT, b INT, SORT KEY(a))AUTOSTATS_CARDINALITY_MODE = INCREMENTAL,AUTOSTATS_HISTOGRAM_MODE = CREATE,AUTOSTATS_SAMPLING= ON;
Warning
Each type of autostats setting may not work for all types of tables.
Additionally, you can change whether autostats is enabled or disabled by default on future CREATE TABLE
statements by setting the global variable default_TRUE
or FALSE
on the master aggregator.TRUE
.
You can also modify the default settings for autostats, individually, by setting the following global variables:
default_autostats_columnstore_cardinality_mode = [ INCREMENTAL | PERIODIC | OFF ]default_autostats_rowstore_cardinality_mode = [ PERIODIC | OFF ]default_autostats_histogram_mode = [ CREATE | UPDATE | OFF ]default_autostats_columnstore_sampling = [ ON | OFF ]
Note
Limitations associated with using autostats: * Enabling AUTOSTATS_
(either mode besides OFF
) requires enabling AUTOSTATS_
(either mode besides OFF
).INCREMENTAL
cardinality mode is only supported on columnstore tables.AUTOSTATS_
is not allowed for rowstore tables.
You can see whether autostats is enabled or disabled on a table by running SHOW CREATE TABLE
.
Autostats Support upon Upgrading SingleStore
Any table that is upgraded to a newer version of SingleStore retains its autostats settings.ALTER TABLE <table_
on each table.
Below is a simple script you can use to enable autostats on all tables (written in bash to minimize dependencies).
Save this script in a file called enable-autostats.
:
#!/bin/bash# Enables autostats on all columnstore tables which currently have autostats disabled## Run this bash script with the mysql client arguments you use to connect to the SingleStore master aggregator, like:# bash enable-autostats.sh -h<host> -P<port> -u<user> -p<password>## Use the --dry-run option to only print the list of tables which would be altered, without actually running the ALTER commands.set -eclient_args=(mysql --batch --skip-column-names)dry_run=falsewhile [[ $# -gt 0 ]]docase "$1" in--dry-run)dry_run=true;;*)client_args+=("$1");;esacshiftdone# List of table names (fully qualified and quoted)tables=$(${client_args[@]} -e "select concat(table_schema, '.\`', replace(table_name, '\`', '\`\`'), '\`') from information_schema.tables where table_type = 'BASE TABLE'")echo "$tables" | while read -r tabledoshow_create_table=$(${client_args[@]} -e "show create table $table")alter_stmt=""if echo "$show_create_table" | grep -q "AUTOSTATS_CARDINALITY_MODE=OFF"thenif echo "$show_create_table" | grep -q "COLUMNSTORE"thenalter_stmt="${alter_stmt}AUTOSTATS_CARDINALITY_MODE=INCREMENTAL"elsealter_stmt="${alter_stmt}AUTOSTATS_CARDINALITY_MODE=PERIODIC"fifiif echo "$show_create_table" | grep -q "AUTOSTATS_HISTOGRAM_MODE=OFF"thenif [[ "$alter_stmt" != "" ]]thenalter_stmt="${alter_stmt},"fialter_stmt="${alter_stmt}AUTOSTATS_HISTOGRAM_MODE=CREATE"fiif echo "$show_create_table" | grep -q "AUTOSTATS_SAMPLING=OFF"thenif echo "$show_create_table" | grep -q "COLUMNSTORE"thenif [[ "$alter_stmt" != "" ]]thenalter_stmt="${alter_stmt},"fifialter_stmt="${alter_stmt}AUTOSTATS_SAMPLING=ON"fiif [[ "$alter_stmt" != "" ]]thenalter_stmt="ALTER TABLE $table $alter_stmt"echo $alter_stmt${client_args[@]} -e "$alter_stmt"fidone
Run this bash script with the mysql client arguments you use to connect to the SingleStore master aggregator, like:
bash enable-autostats.sh -h<host> -P<port> -u<user> -p<password>
Possible reasons to disable autostats include:
-
Incremental autostats require a small memory overhead per column to maintain the statistics.
If you have a large number of columns across all the columnstore tables in your database, this memory overhead may be significant. In this case you can modify your table to use periodic autostats, as the memory overhead is significantly lower and it will still keep the statistics up to date. -
In
CREATE
mode (when default_autostats_ histogram_ mode is set to CREATE
), automatic histograms may end up creating histograms on unwanted columns.As each histogram has a small memory overhead, you may choose to switch the histogram mode to UPDATE
so that all existing histograms will be kept up to date, but no new histograms will be automatically created.You can still create any histograms manually using the ANALYZE
command. -
Histograms cannot be deleted in
CREATE
mode.Deleting a histogram in this mode would have limited impact, as the same histogram may be created automatically in the future. To choose between specific columns for creating histograms, a table must be in UPDATE
mode. -
Incremental autostats require a small performance cost on writes to a table.
You may wish to consider disabling autostats if you find this write performance cost to be significant on your workload, can achieve better overall performance with manual statistics, and are willing to handle the operational requirements of managing statistics manually. -
Periodic autostats uses a small fixed amount of resources in the background.
If the resources used negatively affect your workload, you may find it beneficial to disable autostats -
If you want to “freeze” the statistics, so that the query optimizer continues to use a fixed set of statistics regardless of how the data is changing, you can do so by disabling automatic statistics and using
ANALYZE
to manage statistics.
Statistics Population after Enabling Autostats
Immediately after enabling autostats on a table where it was previously disabled, SingleStore will start collecting automatic statistics in the background until it catches up with the current table data.
This background statistics collection uses limited resources to avoid large impacts on the performance of other workloads.
Troubleshooting Autostats
Viewing Cardinality Statistics
You can view the cardinality statistics last collected by querying the information_
table.
DESCRIBE INFORMATION_SCHEMA.OPTIMIZER_STATISTICS;
+-----------------------+--------------+------+------+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------------------+--------------+------+------+---------+-------+
| DATABASE_NAME | varchar(512) | NO | | NULL | |
| TABLE_NAME | varchar(512) | NO | | NULL | |
| COLUMN_NAME | varchar(512) | NO | | NULL | |
| ROW_COUNT | bigint(21) | YES | | NULL | |
| NULL_COUNT | bigint(21) | YES | | NULL | |
| CARDINALITY | bigint(21) | YES | | NULL | |
| AUTOSTATS_ENABLED | tinyint(1) | YES | | NULL | |
| AUTOSTATS_OUTDATED | tinyint(1) | YES | | NULL | |
| AUTOSTATS_ROW_COUNT | bigint(21) | YES | | NULL | |
| AUTOSTATS_NULL_COUNT | bigint(21) | YES | | NULL | |
| AUTOSTATS_CARDINALITY | bigint(21) | YES | | NULL | |
| ADVANCED_HISTOGRAMS | bigint(21) | YES | | NULL | |
| LEGACY_HISTOGRAMS | bigint(21) | YES | | NULL | |
| RANGE_STATS | bigint(21) | YES | | NULL | |
| SAMPLE_SIZE | bigint(21) | YES | | NULL | |
| LAST_UPDATED | datetime | YES | | NULL | |
+-----------------------+--------------+------+------+---------+-------+
See the OPTIMIZER_information_
table.
Viewing Histogram Statistics
When collected, histograms can be queried through the information_
table:
DESCRIBE INFORMATION_SCHEMA.ADVANCED_HISTOGRAMS;
+-------------------------+---------------+------+------+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------------------+---------------+------+------+---------+-------+
| DATABASE_NAME | varchar(512) | NO | | NULL | |
| TABLE_NAME | varchar(512) | NO | | NULL | |
| COLUMN_NAME | varchar(512) | NO | | NULL | |
| TYPE | varchar(255) | NO | | NULL | |
| VERSION | bigint(21) | NO | | NULL | |
| BUCKET_COUNT | bigint(21) | NO | | NULL | |
| BUCKET_INDEX | bigint(21) | NO | | NULL | |
| RANGE_MIN | varchar(8192) | YES | | NULL | |
| RANGE_MAX | varchar(8192) | YES | | NULL | |
| UNIQUE_COUNT | double | YES | | NULL | |
| CARDINALITY | double | YES | | NULL | |
| UNIQUE_COUNT_CUMULATIVE | double | YES | | NULL | |
| CARDINALITY_CUMULATIVE | double | YES | | NULL | |
+-------------------------+---------------+------+------+---------+-------+
See the ADVANCED_information_
table.
Out-of-Date Statistics
If there is too large a difference between the data which SingleStore has gathered statistics on and the actual table data, such as immediately after autostats is enabled, autostats will be considered out-of-date.ANALYZE
was run on the table, if any.EXPLAIN
will show a warning that autostats is out of date.
You can accelerate the collection of autostats by running OPTIMIZE TABLE
:
OPTIMIZE TABLE <table_name> FLUSH;OPTIMIZE TABLE <table_name>;
After statistics have been updated, you may run ANALYZE TABLE
to trigger plan invalidation and re-optimization of any query plans that were based on stale statistics - see Plan Invalidation.
Identifying Missing Statistics Warnings in EXPLAIN
When you run EXPLAIN
on a query, SingleStore will show information about any missing statistics that the optimizer could take advantage of, and the ANALYZE
commands that you can run to collect those statistics.
When autostats are disabled, you should normally follow the recommendations in these warnings.
Note
As discussed above, collecting column statistics is strongly recommended, and collecting histograms is recommended in most situations.
For example, for the SELECT
statement below, the EXPLAIN
output shows that the optimizer is missing column statistics on both tables t
and t2
as well as histograms on t.
.
CREATE TABLE t (id BIGINT PRIMARY KEY, x DOUBLE);CREATE TABLE t2 (id BIGINT PRIMARY KEY, x DOUBLE);EXPLAIN SELECT * FROM t, t2 WHERE t.id = t2.id AND t.x > 1;
+--------------------------------------------------------------------------------------------------+
| EXPLAIN |
+--------------------------------------------------------------------------------------------------+
| WARNING: Missing statistics on some tables, consider running the following commands: |
| ANALYZE TABLE db.`t2`; |
| ANALYZE TABLE db.`t` COLUMNS `x` ENABLE; |
| |
| Project [t.id, t.id2, t.x, t2.id, t2.x] |
| Gather partitions:all |
| Project [t.id, t.id2, t.x, t2.id, t2.x] |
| NestedLoopJoin |
| |---IndexSeek db.t2, PRIMARY KEY (id) scan:[id = t.id] |
| Filter [t.x > 1] |
| TableScan db.t, PRIMARY KEY (id) |
+--------------------------------------------------------------------------------------------------+
Identifying Missing Histograms in the Plancache
In addition, you can view information about any recently run queries which were missing histograms, which you can use to help identify which columns you may need to enable autostats on, or collect histograms on.
The view MV_information_
and information_
.
If the AUTOSTATS_
engine variable is set to CREATE
, a histogram is created if the USAGE_
is 2 or greater on the information_
view.
SELECT * FROM information_schema.MV_PROSPECTIVE_HISTOGRAMS;
+---------------+------------+-------------+-------------+
| DATABASE_NAME | TABLE_NAME | COLUMN_NAME | USAGE_COUNT |
+---------------+------------+-------------+-------------+
| db | t | c | 3 |
| db | t | b | 3 |
| db | t | a | 3 |
+---------------+------------+-------------+-------------+
The view information_
shows precisely which queries in the plancache could have made use of the histograms.MV_
and MV_
to learn more about the queries.
SELECT * FROM information_schema.MV_QUERY_PROSPECTIVE_HISTOGRAMS;
+-----------------------------------+---------------+------------+-------------+-------------+
| ACTIVITY_NAME | DATABASE_NAME | TABLE_NAME | COLUMN_NAME | USAGE_COUNT |
+-----------------------------------+---------------+------------+-------------+-------------+
| Select_symbols_t_349f04a075ae9982 | db | t | a | 1 |
| Select_symbols_t_bf93eac63fe6a2e8 | db | t | a | 1 |
| Select_symbols_t_bf93eac63fe6a2e8 | db | t | c | 1 |
| Select_symbols_t_056d53b2265b58ce | db | t | a | 1 |
| Select_symbols_t_4222228538392d61 | db | t | b | 1 |
| Select_symbols_t_ecb971f633697055 | db | t | b | 1 |
| Select_symbols_t_ecb971f633697055 | db | t | c | 1 |
| Select_symbols_t_d0692ff8d63f22e8 | db | t | b | 1 |
| Select_symbols_t_d0692ff8d63f22e8 | db | t | c | 1 |
+-----------------------------------+---------------+------------+-------------+-------------+
Both of these views display columns which were missing histograms at the time the queries were compiled.
Exporting and Importing Statistics
It is also possible to export and import statistics.
ANALYZE TABLE table_name INTO OUTFILE '</path/file.json>';
Save column and histograms recorded on table_
into the file /path/file.
in a JSON representation.data
directory of the SingleStore instance.
Note that this exports the current statistics; it does not collect new statistics.ANALYZE
.
ANALYZE TABLE table_name INFILE '/path/file.json';
Import range and column statistics for table_
This command is disallowed if autostats is enabled on the table.ANALYZE .
to import the statistics, and re-enable autostats.
You can also export and import configuration parameters (including session and global variables) used by the optimizer via a .
ANALYZE OPTIMIZER_STATE INTO OUTFILE '</path/file.json>';ANALYZE OPTIMIZER_STATE INFILE '</path/file.json>';
INTO OUTFILE
stores the configuration parameters into the specified file.
INFILE
specifies a file from which to import configuration parameters.
Note
Using INFILE
to set the configuration will update global variables.
Statistics Lifetime
Statistics are tied to a base table, and are persisted through ALTER TABLE
or by autostats.DROP TABLE
or ANALYZE TABLE t DROP
.
Intermittent Statistic Collection
You can configure settings for periodic statistics collection by setting the engine variables background_
and background_
.
The following scenario demonstrates the periodic statistics collection process for autostats-enabled tables.
-
The user creates a new database
DB1
and tableDB1.
with autostats enabled and the following default settings:T1 -
background_
= 60 (trigger collection every 60 seconds)statistics_ collection_ interval -
background_
= 0.statistics_ collection_ threshold 5 (trigger collection when the ratio of changes to the number of rows hits 0. 5)
-
-
The user inserts 5 rows into the
DB1.
table.T1 The mv_
table reports a total of 5 changes across the partitions forrow_ change_ counts DB1
. -
Within 60 seconds, a background thread wakes up and scans the change counts.
The row count is saved as 5, and after the stats collection, the last_
field in theanalyzed_ total_ changes mv_
table is set to thecollected_ row_ change_ counts total_
field, both of which will sum to 5 across partitions.changes -
The user inserts 3 new rows.
The row change counts now sum to 8. -
Next time the background thread wakes, it will see 3 changes and observe the last collected row count (5).
As 3/5 exceeds the 0. 5 ratio, the collection is triggered again.
Last modified: March 8, 2024