ANALYZE

Statistics for a table can be collected and updated on an ad hoc basis by running the ANALYZE command:

ANALYZE [TABLE table_name | MEMORY];

If autostats are enabled for a table, ANALYZE for that table will not gather statistics and will return almost instantly. This is normal because the autostats system already is maintaining the statistics so they do not need to be gathered again.

Collecting column statistics is strongly recommended for optimal query performance. Collecting histograms is recommended in most situations. See Histogram vs. Sampling for more details.

Statistics are not updated when the table is modified through DML statements, including INSERT, UPDATE, DELETE, LOAD DATA, and TRUNCATE. They are only collected and updated when the ANALYZE command is called, which completely overwrites any previous statistics collected on the table.

Warning

  • ANALYZE may invalidate plans, i.e. cause them to be reoptimized and recompiled the next time they are run when data statistics change and a new plan could be faster. See the Plan invalidation section for more details.

  • Collecting statistics with ANALYZE can be expensive for large tables, especially when collecting histograms. When autostats are disabled, because statistics are collected only when you run ANALYZE, you can choose when to run it to best manage the performance impact.

Syntax

The ANALYZE command has the following forms:

ANALYZE TABLE table_name;
ANALYZE TABLE table_name COLUMNS column_name [, ...] ENABLE;
ANALYZE TABLE table_name COLUMNS ALL ENABLE;
ANALYZE TABLE table_name COLUMNS column_name [, ...] DISABLE;
ANALYZE TABLE table_name COLUMNS ALL DISABLE;
ANALYZE TABLE table_name DROP;
ANALYZE MEMORY;

These are described in detail below:

ANALYZE TABLE table_name

Collect column statistics on all columns in table_name if it is not autostats-enabled, and collect histograms over all previously designated columns. Invalidate all plans on this table which were compiled with significantly different statistics than the newly observed statistics.

ANALYZE TABLE table_name COLUMNS column_name [, ...] ENABLE

Designate the specified columns to have histograms collected, in addition to any previously designated columns. Then collects statistics and invalidates plans the same way as ANALYZE TABLE table_name.

ANALYZE TABLE table_name COLUMNS ALL ENABLE

Designate all columns to have histograms collected, in addition to any previously designated columns. If the table contains columns which do not support histograms, a warning will be raised for each such column. Collects statistics and invalidates plans the same way as ANALYZE TABLE table_name.

Because this command creates a histogram on every column of the table and collects histogram information for each new histogram, please note the following:

  • On very wide tables, this can be expensive to collect, increasing the execution time of the ANALYZE command.

  • The histograms are stored in memory. Make sure to plan for the extra memory enabling those histograms will consume.

  • Histograms are not always useful to have. Query the information_schema.mv_prospective_histograms table to check if a column has been identified from your workload, where having a histogram on that column would be beneficial to your queries.

  • In some cases sampling stats are better than histograms, see Histograms vs. Sampling for more information.

Instead, on clusters on v7.0+ SingleStore recommends using the default behavior: default_autostats_histogram_mode=CREATE

With default_autostats_histogram_mode set to CREATE, if a column is identified at least twice in information_schema.mv_prospective_histograms that having a histogram on it could be beneficial, then the histogram will be automatically created and updated at that time. Like any histograms you create, it is updated when you run ANALYZE, and also as part of the periodic autostats (same interval: 10 minutes, and actually collected as part of the same process).

ANALYZE TABLE table_name COLUMNS column_name [, ...] DISABLE

Delete histograms on specified columns if present, and designate the specified columns to not have histograms collected in the future. Does not collect any statistics.

ANALYZE TABLE table_name COLUMNS ALL DISABLE

Delete histograms on all columns of the table if present, and designate all columns to not have histograms collected in the future. If the table contains columns that do not support histograms, a warning will be raised for each such column. Does not collect any statistics.

ANALYZE TABLE table_name DROP

Deletes all stored statistics for a table, and disables histograms on any columns for which they have been enabled. Invalidates all plans on the table which were compiled using statistics. Does not delete automatic statistics.

ANALYZE MEMORY

Note

You must have SUPER privileges to run the ANALYZE MEMORYcommand or an error will be generated. This command also runs in the background every 10 minutes.

ANALYZE MEMORY starts a query internally, collecting memory cost details from past runs for each query across leaf nodes, and retrieves execution time data from the aggregator node. An estimated peak memory usage is calculated for each query. Then, the estimated peak memory usage is populated to the corresponding query entry in the plancache.

ANALYZE MEMORY;

This memory information can also help with workload management tasks such as categorizing query types and managing available memory in the cluster.

Remarks

  • ANALYZE triggers plan invalidation and re-optimization when statistics change, as described in Plan Invalidation. This is true whether autostats are enabled or not.

  • If autostats are disabled for a table, ANALYZE collects column statistics on every column, as well as histograms on chosen columns. You may designate which columns to collect histograms on with the ANALYZE TABLE table_name COLUMNS ... {ENABLE | DISABLE} commands.

  • All forms of ANALYZE are subject to the following restrictions:

    1. ANALYZE can be run on a master or child aggregator node.

    2. ANALYZE requires SELECT and ALTER or INSERT permissions on the target table.

    3. table_name must be the name of a table in the chosen database.

    4. column_name must refer to a column in the table table_name.

Last modified: June 20, 2024

Was this article helpful?