ANALYZE

Statistics for a table can be collected and updated on an ad hoc basis by running the ANALYZE command:

ANALYZE [TABLE table_name | MEMORY];

If autostats are enabled for a table, ANALYZE for that table will not gather statistics and will return almost instantly. This is normal because the autostats system already is maintaining the statistics so they do not need to be gathered again.

Collecting column statistics is strongly recommended for optimal query performance. Collecting histograms is recommended in most situations. See Histogram vs. Sampling for more details.

Statistics are not updated when the table is modified through DML statements, including INSERT, UPDATE, DELETE, LOAD DATA, and TRUNCATE. They are only collected and updated when the ANALYZE command is called, which completely overwrites any previous statistics collected on the table.

Warning

  • ANALYZE may invalidate plans, i.e. cause them to be reoptimized and recompiled the next time they are run when data statistics change and a new plan could be faster. See the Plan invalidation section for more details.

  • Collecting statistics with ANALYZE can be expensive for large tables, especially when collecting histograms. When autostats are disabled, because statistics are collected only when you run ANALYZE, you can choose when to run it to best manage the performance impact.

Syntax

The ANALYZE command has the following forms:

ANALYZE TABLE table_name;
ANALYZE TABLE table_name COLUMNS column_name [, ...] ENABLE;
ANALYZE TABLE table_name COLUMNS ALL ENABLE;
ANALYZE TABLE table_name COLUMNS column_name [, ...] DISABLE;
ANALYZE TABLE table_name COLUMNS ALL DISABLE;
ANALYZE TABLE table_name DROP;
ANALYZE TABLE table_name CORRELATE COLUMN correlated_column_name WITH COLUMN column_name [USING COEFFICIENT coefficient_value];
ANALYZE MEMORY;

These are described in detail below:

ANALYZE TABLE table_name

Collect column statistics on all columns in table_name if it is not autostats-enabled, and collect histograms over all previously designated columns. Invalidate all plans on this table which were compiled with significantly different statistics than the newly observed statistics.

ANALYZE TABLE table_name COLUMNS column_name [, ...] ENABLE

Designate the specified columns to have histograms collected, in addition to any previously designated columns. Then collects statistics and invalidates plans the same way as ANALYZE TABLE table_name.

ANALYZE TABLE table_name COLUMNS ALL ENABLE

Designate all columns to have histograms collected, in addition to any previously designated columns. If the table contains columns which do not support histograms, a warning will be raised for each such column. Collects statistics and invalidates plans the same way as ANALYZE TABLE table_name.

Because this command creates a histogram on every column of the table and collects histogram information for each new histogram, please note the following:

  • On very wide tables, this can be expensive to collect, increasing the execution time of the ANALYZE command.

  • The histograms are stored in memory. Make sure to plan for the extra memory enabling those histograms will consume.

  • Histograms are not always useful to have. Query the information_schema.mv_prospective_histograms table to check if a column has been identified from your workload, where having a histogram on that column would be beneficial to your queries.

  • In some cases sampling stats are better than histograms, see Histograms vs. Sampling for more information.

Instead, on workspaces on v7.0+ SingleStore recommends using the default behavior: default_autostats_histogram_mode=CREATE

With default_autostats_histogram_mode set to CREATE, if a column is identified at least twice in information_schema.mv_prospective_histograms that having a histogram on it could be beneficial, then the histogram will be automatically created and updated at that time. Like any histograms you create, it is updated when you run ANALYZE, and also as part of the periodic autostats (same interval: 10 minutes, and actually collected as part of the same process).

ANALYZE TABLE table_name COLUMNS column_name [, ...] DISABLE

Delete histograms on specified columns if present, and designate the specified columns to not have histograms collected in the future. Does not collect any statistics.

ANALYZE TABLE table_name COLUMNS ALL DISABLE

Delete histograms on all columns of the table if present, and designate all columns to not have histograms collected in the future. If the table contains columns that do not support histograms, a warning will be raised for each such column. Does not collect any statistics.

ANALYZE TABLE table_name DROP

Deletes all stored statistics for a table, and disables histograms on any columns for which they have been enabled. Invalidates all plans on the table which were compiled using statistics. Does not delete automatic statistics.

ANALYZE TABLE table_name CORRELATE COLUMN ... WITH COLUMN ... [USING COEFFICIENT coefficient_value]

Stores correlations between two columns in a table along with their coefficient values. The correlated_column_name is the column that is closely related to another column in a table.

An example is the relationship between car makers and models. A car model is associated with a specific car maker. So in this case, the model is the correlated column:

ANALYZE TABLE table_name CORRELATE COLUMN model WITH COLUMN make USING COEFFICIENT 1.0;

The optimizer defaults correlation coefficients at 0.5. The coefficient values can range from 0 (no correlation) to 1 (highly correlated).

For more information, see the Statistics and Sampling Concepts page.

ANALYZE MEMORY

Note

You must have SUPER privileges to run the ANALYZE MEMORYcommand or an error will be generated. This command also runs in the background every 10 minutes.

ANALYZE MEMORY starts a query internally, collecting memory cost details from past runs for each query across leaf nodes, and retrieves execution time data from the aggregator node. An estimated peak memory usage is calculated for each query. Then, the estimated peak memory usage is populated to the corresponding query entry in the plancache.

ANALYZE MEMORY;

This memory information can also help with workload management tasks such as categorizing query types and managing available memory in the cluster.

JSON Statistics

Statistics for a table containing JSON keys can be collected and updated by using the ANALYZE command.

Note

To use JSON statistics the global engine variable enable_json_statistics needs to be set to true.

To build a histogram on a JSON key:

ANALYZE table <table_name> COLUMN <column+key> enable;
ANALYZE table <table_name> COLUMNS <list of (column+key)> enable;

To drop histograms for a JSON key:

ANALYZE table <table_name> COLUMN <column+key> disable;

ANALYZE table <table_name> COLUMNS <list of (column+key)> disable;

The format of <column+key> is similar to what is allowed in WHERE clauses and projections. For example:

SET GLOBAL enable_json_statistics = 1;
CREATE TABLE analyze_test(col json);
INSERT analyze_test VALUES('{"path" : { "path" : "hello" }}');
SELECT * FROM analyze_test WHERE col::`path`::$`path` = "hello";

To build a histogram on col::'path'::'path' from the previous example:

ANALYZE TABLE analyze_test COLUMN col::`path`::`path` ENABLE;

Remarks

  • ANALYZE triggers plan invalidation and re-optimization when statistics change, as described in Plan Invalidation. This is true whether autostats are enabled or not.

  • If autostats are disabled for a table, ANALYZE collects column statistics on every column, as well as histograms on chosen columns. You may designate which columns to collect histograms on with the ANALYZE TABLE table_name COLUMNS ... {ENABLE | DISABLE} commands.

  • All forms of ANALYZE are subject to the following restrictions:

    1. ANALYZE can be run on a master or child aggregator node.

    2. ANALYZE requires SELECT and ALTER or INSERT permissions on the target table.

    3. table_name must be the name of a table in the chosen database.

    4. column_name must refer to a column in the table table_name.

Last modified: June 20, 2024

Was this article helpful?