Managing Disk Space Used by Transaction Logs

Calculating Disk Space Needed by Transaction Logs

Transaction logs require (snapshot_trigger_size * snapshots_to_keep) of space over time (approximately), or about 4 GB per partition using the default values of these variables. Hence the transaction logs can grow to 2GB before a snapshot is taken. SingleStore recommends to have twice this amount of disk space available for long-term log storage.

Each database partition has two preallocated transaction log files. By default, these files are 256 MB each. The size of these preallocated transaction log files is defined by the variable log_file_size_partitions. For example, if each leaf has 8 partitions, an empty database will require 2 log files * 256 MB * 8 partitions = 4 GB of space in transaction logs per leaf. It is important to remember that paired mode high availability is on by default. This means that each leaf will have one set of master database partitions and one set of replica database partitions, and therefore the minimum amount of space required for transaction logs is doubled: 2 log files * 256 MB * 16 partitions = 8 GB per leaf node. Extending this to a cluster with 10 databases, the transaction logs will consume at least 8 GB * 10 databases = 80 GB.

Moreover, when queries are submitted to the database transaction log files are filled. Once the preallocated logs become full, more will be created. The transaction logs will only be truncated once a snapshot is taken. Snapshots occur when a partition has enough transactions to meet the snapshot_trigger_size, which is 2GB by default. See the section Deleting Transaction Logs below for more information on snapshots.

Controlling Individual Log File Size

Individual log file size can be controlled by adjusting the log_file_size_partitions and log_file_size_ref_dbs global engine variables. These variables are set to large values so that the preallocated files can handle sudden high throughput. Note that changing the log file size will not affect existing databases. Reducing the size of existing log files will have a random and severe performance impact on your workload when new log files need to be allocated.

These engine variables impact write throughput when there is a sustained burst of rowstore writes. Dips in throughput indicate the need to create a new log file. If your workload is mostly columnstore it's unlikely you would notice anything.

Reducing Disk Space Used by Transaction Logs

To reduce the total disk space occupied by all log files, tune the snapshot_trigger_size and snapshots_to_keep global engine variables while keeping the following in mind.

  • A lower snapshot_trigger_size value will result in more frequent snapshots, thereby reducing the accumulation of transaction logs. However, a higher setting will lead to fewer snapshots and result in a longer replay time for the snapshots taken.

  • A large snapshots_to_keep value increases the number of snapshots available and therefore increases the amount of space needed to store snapshots.

Managing Transaction Log Growth

To manage the growing size of your transaction logs, you can safely expand the disk size to accommodate new log files.

Deleting Transaction Logs

Warning

Transaction log files should never be deleted manually as doing so may result in the loss or corruption of your data.

To delete transaction logs, run SNAPSHOT on the database in which the logs are growing. This command forces the cleanup of log files, thereby allowing you to control how much disk space exists between snapshots. Once you issue this command, all the logs will be committed to the snapshot and deleted. (note that the engine always keeps two log files created ahead).

You can run the du Linux command to determine which databases have the largest log files.

In the following situations, SNAPSHOT is run implicitly, in accordance with the value of the snapshots_to_keep variable. This causes deletion of the transaction logs:

  • Transaction logs reach the size defined by the snapshot_trigger_size variable and, as a result, a snapshot is triggered.

  • The BACKUP command is run and, as a result, a snapshot is triggered.

  • Node maintenance operations occur (such as cluster upgrades) and, as a result, a snapshot is triggered.

The transaction logs may not get immediately cleaned out because the garbage collector (GC) has to do the final cleanup to complete the deletion process. Since the transaction logs are a part of the normal operation they will regrow so ideally there should be enough disk space to completely fill the TX  logs and snapshots.

Last modified: March 8, 2024

Was this article helpful?