ANALYZE
On this page
Statistics for a table can be collected and updated on an ad hoc basis by running the ANALYZE
command:
ANALYZE [TABLE table_name | MEMORY];
If autostats are enabled for a table, ANALYZE
for that table will not gather statistics and will return almost instantly.
Collecting column statistics is strongly recommended for optimal query performance.
Statistics are not updated when the table is modified through DML statements, including INSERT
, UPDATE
, DELETE
, LOAD DATA
, and TRUNCATE
.ANALYZE
command is called, which completely overwrites any previous statistics collected on the table.
Warning
-
ANALYZE
may invalidate plans, i.e. cause them to be reoptimized and recompiled the next time they are run when data statistics change and a new plan could be faster. See the Plan invalidation section for more details. -
Collecting statistics with
ANALYZE
can be expensive for large tables, especially when collecting histograms.When autostats are disabled, because statistics are collected only when you run ANALYZE
, you can choose when to run it to best manage the performance impact.
Syntax
The ANALYZE
command has the following forms:
ANALYZE TABLE table_name;ANALYZE TABLE table_name COLUMNS column_name [, ...] ENABLE;ANALYZE TABLE table_name COLUMNS ALL ENABLE;ANALYZE TABLE table_name COLUMNS column_name [, ...] DISABLE;ANALYZE TABLE table_name COLUMNS ALL DISABLE;ANALYZE TABLE table_name DROP;ANALYZE TABLE table_name CORRELATE COLUMN correlated_column_name WITH COLUMN column_name [USING COEFFICIENT coefficient_value];ANALYZE MEMORY;
These are described in detail below:
ANALYZE TABLE table_ name
Collect column statistics on all columns in table_
ANALYZE TABLE table_ name COLUMNS column_ name [, . . . ] ENABLE
Designate the specified columns to have histograms collected, in addition to any previously designated columns.ANALYZE TABLE table_
.
ANALYZE TABLE table_ name COLUMNS ALL ENABLE
Designate all columns to have histograms collected, in addition to any previously designated columns.ANALYZE TABLE table_
.
Because this command creates a histogram on every column of the table and collects histogram information for each new histogram, please note the following:
-
On very wide tables, this can be expensive to collect, increasing the execution time of the
ANALYZE
command. -
The histograms are stored in memory.
Make sure to plan for the extra memory enabling those histograms will consume. -
Histograms are not always useful to have.
Query the information_
table to check if a column has been identified from your workload, where having a histogram on that column would be beneficial to your queries.schema. mv_ prospective_ histograms -
In some cases sampling stats are better than histograms, see Histograms vs.
Sampling for more information.
Instead, on clusters on v7.default_
With default_
set to CREATE
, if a column is identified at least twice in information_
that having a histogram on it could be beneficial, then the histogram will be automatically created and updated at that time.ANALYZE
, and also as part of the periodic autostats (same interval: 10 minutes, and actually collected as part of the same process).
ANALYZE TABLE table_ name COLUMNS column_ name [, . . . ] DISABLE
Delete histograms on specified columns if present, and designate the specified columns to not have histograms collected in the future.
ANALYZE TABLE table_ name COLUMNS ALL DISABLE
Delete histograms on all columns of the table if present, and designate all columns to not have histograms collected in the future.
ANALYZE TABLE table_ name DROP
Deletes all stored statistics for a table, and disables histograms on any columns for which they have been enabled.
ANALYZE TABLE table_ name CORRELATE COLUMN . . . WITH COLUMN . . . [USING COEFFICIENT coefficient_ value]
Stores correlations between two columns in a table along with their coefficient values.correlated_
is the column that is closely related to another column in a table.
An example is the relationship between car makers and models.
ANALYZE TABLE table_name CORRELATE COLUMN model WITH COLUMN make USING COEFFICIENT 1.0;
The optimizer defaults correlation coefficients at 0.
For more information, see the Statistics and Sampling Concepts page.
ANALYZE MEMORY
Note
You must have SUPER
privileges to run the ANALYZE MEMORY
command or an error will be generated.
ANALYZE MEMORY
starts a query internally, collecting memory cost details from past runs for each query across leaf nodes, and retrieves execution time data from the aggregator node.
ANALYZE MEMORY;
This memory information can also help with workload management tasks such as categorizing query types and managing available memory in the cluster.
JSON Statistics
Statistics for a table containing JSON keys can be collected and updated by using the ANALYZE
command.
Note
To use JSON statistics the global engine variable enable_
needs to be set to true.
To build a histogram on a JSON key:
ANALYZE table <table_name> COLUMN <column+key> enable;ANALYZE table <table_name> COLUMNS <list of (column+key)> enable;
To drop histograms for a JSON key:
ANALYZE table <table_name> COLUMN <column+key> disable;
ANALYZE table <table_name> COLUMNS <list of (column+key)> disable;
The format of <column+key> is similar to what is allowed in WHERE
clauses and projections.
SET GLOBAL enable_json_statistics = 1;CREATE TABLE analyze_test(col json);INSERT analyze_test VALUES('{"path" : { "path" : "hello" }}');
SELECT * FROM analyze_test WHERE col::`path`::$`path` = "hello";
To build a histogram on col::'path'::'path'
from the previous example:
ANALYZE TABLE analyze_test COLUMN col::`path`::`path` ENABLE;
Remarks
-
ANALYZE
triggers plan invalidation and re-optimization when statistics change, as described in Plan Invalidation.This is true whether autostats are enabled or not. -
If autostats are disabled for a table,
ANALYZE
collects column statistics on every column, as well as histograms on chosen columns.You may designate which columns to collect histograms on with the ANALYZE TABLE table_
commands.name COLUMNS . . . {ENABLE | DISABLE} -
All forms of
ANALYZE
are subject to the following restrictions:-
ANALYZE
can be run on a master or child aggregator node. -
ANALYZE
requiresSELECT
andALTER
orINSERT
permissions on the target table. -
table_
must be the name of a table in the chosen database.name -
column_
must refer to a column in the tablename table_
.name
-
Last modified: June 20, 2024