OPTIMIZE TABLE
On this page
Performs optimizations on the storage of rows in a table.
Optimizing Columnstore Tables
This topic assumes familiarity with the concepts of columnstore table layout, including segments, row segment groups, and background mergers.
SingleStore automatically runs optimization routines on tables, but they can also be started manually.OPTIMIZE TABLE
command is run.
OPTIMIZE TABLE
requires the INSERT
permission.
Syntax
OPTIMIZE TABLE <table_name> [FULL | FLUSH | FIX_ALTER | INDEX];
OPTIMIZE TABLE <table_name> WARM BLOB CACHE FOR COLUMN <column_names>;
Warning
This command can only be run against one table at a time.
Arguments (Columnstore)
The types of optimization performed by OPTIMIZE TABLE
are discussed below.
-
Without any arguments,
OPTIMIZE TABLE
runs a Manual merge. -
With the
FLUSH
argument,OPTIMIZE TABLE
runs a Manual flush.It does not run a Manual merge like OPTIMIZE TABLE
. -
With the
FULL
argument,OPTIMIZE TABLE
runs both a Manual merge - Full and a Manual flush (in MemSQL 6.0/SingleStore 7. 1 and later). In MemSQL 5. 8 and earlier, it runs only a Manual merge - Full. -
The
FIX_
argument is used when a previousALTER ALTER TABLE
command failed.This argument completes the background work of the failed ALTER TABLE
command.It can only be used with columnstore tables. It cannot be used with either the FULL
orFLUSH
arguments. -
The
INDEX
argument runs the optimization routine for columnstore secondary indexes manually. -
The WARM BLOB CACHE FOR COLUMN argument can take a comma separated list of column names (
c1, c2, c3
, etc.) or ' *
' for all columns. -
This command causes implicit commits.
Refer to COMMIT for more information. -
Refer to the Permission Matrix for the required permission.
Remarks
-
SingleStore supports online
OPTIMIZE TABLE
, which means that you can read and write while the table is being optimized.OPTIMIZE TABLE
on a sharded table is always executed online.Note that online OPTIMIZE TABLE
will not begin optimizing the table, but it will wait until all DML queries that were already running on the table finish.This allows any in-progress queries to complete execution before optimizing the table, and ensures consistency of results from queries on the table since the time of execution of OPTIMIZE TABLE
.As soon as the in-progress reads and writes complete and the OPTIMIZE TABLE
command begins optimizing the table, new reads and writes will proceed as normal.This blocking period usually lasts on the order of milliseconds. -
When there are concurrent writes being submitted against large columnstore tables while they are being optimized, you may wish to run the
OPTIMIZE TABLE
command repeatedly instead of running theOPTIMIZE TABLE FULL
command, since in larger tables, runningOPTIMIZE TABLE FULL
can take longer.Running repeated OPTIMIZE TABLE
commands instead ofOPTIMIZE TABLE FULL
avoids lock wait timeout for updates and improves concurrency. -
If you are running frequent
OPTIMIZE TABLE
statements on a table and have a lot of long-running queries on that table, then your normal workload may experience some periods of delay since it blocks other queries from starting while it waits for completion of long-running queries.There are a few types of optimization performed on columnstore tables: -
Automatic background optimization There is a background merger process, that is equivalent to constantly running the
OPTIMIZE TABLE
command in the background.These background threads automatically optimize columnstore indexes as needed. These optimizations are run in a single thread to minimize impact to concurrent query workloads. The background optimization will attempt to keep the number of row segment groups low, but will not attempt to create a single row segment group due to the cost of this operation. If the load is uneven, it can fall behind and should eventually catch up automatically.
However, sometimes it may be worthwhile to manually run the OPTIMIZE TABLE
command just to tell the server you want more resources dedicated to the task to speed up the performance. -
Manual merge - The
OPTIMIZE TABLE table_
statement runs a foreground optimization which uses multiple threads to complete the process as fast as possible, unlike automatic background optimization which only uses a single thread.name As a result, this may negatively impact the performance of other concurrent workloads. -
Manual merge - Full - With the
FULL
option, theOPTIMIZE TABLE
command sorts the entire table, creating a single sorted row segment group.This is more expensive and therefore takes longer than a regular manual optimization. The potential benefit is that because it sorts the entire table into a single sorted row segment group instead of a small number of them, read queries may perform faster. However, this benefit diminishes as writes are made to the table. See the Managing Columnstore Segments section for more information.
-
-
Manual flush - This
OPTIMIZE TABLE table_
statement will flush any rows in the in-memory rowstore-format row segment group into one or more columnstore-format row segment groups.name FLUSH Rows in the in-memory rowstore-format group are automatically flushed to the columnstore format in the background, so this command is generally not necessary. Some potential benefits are that this may be helpful if your rowstore segment is taking up a lot of memory that you would like to free up for other purposes, and putting all data into the columnar format may allow queries to run faster.
See Managing Columnstore Segments for details.
Optimizing Rowstore Tables
For rowstore tables, OPTIMIZE TABLE
physically sorts the data in memory by the primary key and optimizes the organization of the table’s internal structures, so primary key table scans will execute significantly faster.
OPTIMIZE TABLE
requires the INSERT
permission.
Syntax
OPTIMIZE TABLE <table_name>;
Arguments (Rowstore)
-
The
FLUSH
flag is not a valid option for rowstore tables. -
The
FULL
flag is not a valid option for rowstore tables; the syntax is accepted but it will be ignored by SingleStore.
Remarks
-
OPTIMIZE TABLE
can be particularly useful for reducing memory usage and improving query performance in cases where a leaf has been online for a long time, or if the workload is highly transactional. -
This can be a costly command in terms of time and CPU use if many rows need to be reordered to return the table to ordered memory, especially if it is being run during a high traffic time.
-
SingleStore supports online
OPTIMIZE TABLE
, which means that you can read and write while the table is being optimized.OPTIMIZE TABLE
on a sharded table is always executed online.Note that online OPTIMIZE TABLE
will not begin optimizing the table, but it will wait until all DML queries that were already running on the table finish.This allows any in-progress queries to complete execution before optimizing the table, and ensures consistency of results from queries on the table since the time of execution of OPTIMIZE TABLE
.As soon as the in-progress reads and writes complete and the OPTIMIZE TABLE
command begins optimizing the table, new reads and writes will proceed as normal.This blocking period usually lasts on the order of milliseconds. -
If you are running frequent
OPTIMIZE TABLE
statements on a table and have a lot of long-running queries on that table, then your normal workload may experience some periods of delay since it blocks other queries from starting while it waits for completion of long-running queries. -
This command can be run on the master aggregator node, or a child aggregator node (see Cluster Management Commands ).
Optimizing to Warm the Blob Cache
This command can be used to explicitly warm the disk cache with the index or columnar blobs for specific tables.
Syntax
OPTIMIZE TABLE <table_name> WARM BLOB CACHE FOR INDEX <index_name>
OPTIMIZE TABLE <table_name> WARM BLOB CACHE FOR COLUMN <column_names>
Arguments
-
table_
: name of the table to fetch index data from into the blob cache.name -
index_
: name of the specific hash index of the table to warm the blob cachename -
column_
: a comma-separated list of column names or 'name *
' for all columns
Remarks
-
The explicit blob cache warming command supports only one mode namely, warm a specific index.
-
The command will submit blob cache fetch requests for all user-requested data and block until all fetch requests are complete.
-
The blob cache will not hold references on all fetched data, so it is possible that these fetched data will be evicted by queries after the warm command.
-
If the blob cache has enough space, then the warm command can continue.
-
If the blob cache does not have enough space but it can evict existing blobs, then the warm command can also continue and will end in a full loading (by evicting all other blobs), or end in a partial loading (for example, 70% loaded, by evicting your own recently loaded blobs).
-
If the blob cache does not have enough space, and it cannot evict any existing blobs (for example, due to running queries), then this command will fail with the out-of-disk error.
Last modified: August 1, 2024