ANALYZE

Statistics for a table can be collected and updated on an ad hoc basis by running the ANALYZE command:

ANALYZE TABLE table_name;

If autostats are enabled for a table, ANALYZE for that table will not gather statistics and will return almost instantly. This is normal because the autostats system already is maintaining the statistics so they don’t need to be gathered again.

ANALYZE also triggers plan invalidation and re-optimization when statistics change, as described in Plan Invalidation. This is true whether autostats are enabled or not.

If autostats are disabled for a table, ANALYZE collects column statistics on every column, as well as histograms on chosen columns. You may designate which columns to collect histograms on with the ANALYZE TABLE table_name COLUMNS ... {ENABLE | DISABLE} commands, as described below.

Collecting column statistics is strongly recommended for optimal query performance. Collecting histograms is recommended in most situations. See Histogram vs. Sampling for more details.

Statistics are not updated when the table is modified through DML statements, including INSERT, UPDATE, DELETE, LOAD DATA, and TRUNCATE. They are only collected and updated when the ANALYZE command is called, which completely overwrites any previous statistics collected on the table.

Warning

  • ANALYZE may invalidate plans, i.e. cause them to be reoptimized and recompiled the next time they are run when data statistics change and a new plan could be faster. See the Plan invalidation section for more details.

  • Collecting statistics with ANALYZE can be expensive for large tables, especially when collecting histograms. When autostats are disabled, because statistics are collected only when you run ANALYZE, you can choose when to run it to best manage the performance impact.

Notes

All forms of ANALYZE are subject to the following restrictions:

  • ANALYZE can be run on a master or child aggregator node.

  • ANALYZE requires SELECT and ALTER or INSERT permissions on the target table.

  • table_name must be the name of a table in the chosen database.

  • column_name must refer to a column in the table table_name.

The ANALYZE command has the following forms:

ANALYZE TABLE table_name;
ANALYZE TABLE table_name COLUMNS column_name [, ...] ENABLE;
ANALYZE TABLE table_name COLUMNS ALL ENABLE;
ANALYZE TABLE table_name COLUMNS column_name [, ...] DISABLE;
ANALYZE TABLE table_name COLUMNS ALL DISABLE;
ANALYZE TABLE table_name DROP;

These are described in detail below.

ANALYZE TABLE table_name;

Collect column statistics on all columns in table_name if it is not autostats-enabled, and collect histograms over all previously designated columns. Invalidate all plans on this table which were compiled with significantly different statistics than the newly observed statistics.

ANALYZE TABLE table_name COLUMNS column_name [, ...] ENABLE;

Designate the specified columns to have histograms collected, in addition to any previously designated columns. Then collects statistics and invalidates plans the same way as ANALYZE TABLE table_name.

ANALYZE TABLE table_name COLUMNS ALL ENABLE;

Designate all columns to have histograms collected, in addition to any previously designated columns. If the table contains columns which do not support histograms, a warning will be raised for each such column. Collects statistics and invalidates plans the same way as ANALYZE TABLE table_name.

Because this command creates a histogram on every column of the table and collects histogram information for each new histogram, please note the following:

  • On very wide tables, this can be expensive to collect, increasing the execution time of the ANALYZE command.

  • The histograms are stored in memory. Make sure to plan for the extra memory enabling those histograms will consume.

  • Histograms are not always useful to have. Query the information_schema.mv_prospective_histograms table to check if a column has been identified from your workload, where having a histogram on that column would be beneficial to your queries.

  • In some cases sampling stats are better than histograms, see Histograms vs. Sampling for more information.

Instead, on clusters on v7.0+ we recommend using the default behavior: default_autostats_histogram_mode=CREATE

With default_autostats_histogram_mode set to CREATE, if a column is identified at least twice in information_schema.mv_prospective_histograms that having a histogram on it could be beneficial, then the histogram will be automatically created and updated at that time. Like any histograms you create, it is updated when you run ANALYZE, and also as part of the periodic autostats (same interval: 10 minutes, and actually collected as part of the same process).

ANALYZE TABLE table_name COLUMNS column_name [, ...] DISABLE;

Delete histograms on specified columns if present, and designate the specified columns to not have histograms collected in the future. Does not collect any statistics.

ANALYZE TABLE table_name COLUMNS ALL DISABLE;

Delete histograms on all columns of the table if present, and designate all columns to not have histograms collected in the future. If the table contains columns that do not support histograms, a warning will be raised for each such column. Does not collect any statistics.

ANALYZE TABLE table_name DROP;

Deletes all stored statistics for a table, and disables histograms on any columns for which they have been enabled. Invalidates all plans on the table which were compiled using statistics. Does not delete automatic statistics.