# Use Case: Disk Bottleneck

An update query against a rowstore table is unexpectedly slow, and we’d like to understand why.

## Was the query running or waiting for a resource?

To answer this question, we compare the cluster-wide CPU usage associated with the query against cluster-wide wait times. `mv_activities` reports these in a single row. To find the correct row, we join `mv_activities` with `mv_queries` so that we may search by query text.

```sql
USE information_schema;

```

```sql
SELECT a.cpu_time_ms, a.cpu_wait_time_ms, a.lock_time_ms, a.disk_time_ms, a.network_time_ms, left(q.query_text, 50), a.activity_name
FROM mv_activities a LEFT JOIN mv_queries q ON q.activity_name = a.activity_name
WHERE q.query_text like "update %";

```

```output

+-------------+------------------+--------------+--------------+-----------------+----------------------------------------------------+----------------------------------+
| cpu_time_ms | cpu_wait_time_ms | lock_time_ms | disk_time_ms | network_time_ms | left(q.query_text, 50)                             | activity_name                    |
+-------------+------------------+--------------+--------------+-----------------+----------------------------------------------------+----------------------------------+
|          64 |                3 |            0 |          938 |             966 | update rowstore_db.rowstore set c = c+@ where a =  | Update_rowstore_9cb5d2de6e9ae9cc |
+-------------+------------------+--------------+--------------+-----------------+----------------------------------------------------+----------------------------------+

```

The query mostly waited. Over the interval of time described by `mv_activities`, the query spent only 60ms running, compared to 900ms waiting for disk operations to complete and 900ms waiting for network messages.

## Why was the query polling on the network?

`network_time_ms`*may* indicate a network issue, but it also includes time the aggregator spent waiting for a response from leaves.

To decide whether we have an actual network issue, we use `mv_activities_extended` to drill down into the per-partition usage associated with the query across the entire cluster. We again join with `mv_queries` to find the correct rows. We also join with `mv_nodes` to describe the nodes on which tasks ran.

```sql
use information_schema;

```

```sql
select a.elapsed_time_ms, a.cpu_time_ms, a.cpu_wait_time_ms, a.lock_time_ms, a.lock_row_time_ms, a.network_time_ms, a.disk_time_ms, a.log_flush_time_ms, a.log_buffer_time_ms, a.database_name, a.partition_id, n.type node_type
from mv_activities_extended a left join mv_queries q on a.aggregator_activity_name = q.activity_name left join mv_nodes n on a.node_id=n.id
where q.query_text like "update%";

```

```output

+-----------------+-------------+------------------+--------------+------------------+-----------------+--------------+-------------------+--------------------+---------------+--------------+-----------+
| elapsed_time_ms | cpu_time_ms | cpu_wait_time_ms | lock_time_ms | lock_row_time_ms | network_time_ms | disk_time_ms | log_flush_time_ms | log_buffer_time_ms | database_name | partition_id | node_type |
+-----------------+-------------+------------------+--------------+------------------+-----------------+--------------+-------------------+--------------------+---------------+--------------+-----------+
|             941 |          44 |                1 |            0 |                0 |               0 |            0 |               915 |                  0 | rowstore_db   |            0 | LEAF      |
|             993 |          46 |                2 |            1 |                0 |             956 |            0 |                 0 |                  0 | rowstore_db   |         NULL | MA        |
+-----------------+-------------+------------------+--------------+------------------+-----------------+--------------+-------------------+--------------------+---------------+--------------+-----------+

```

The 900ms `network_time_ms` we saw in `mv_activities` was accumulated on the MA node, and closely matches the latency of processing the query on the sole relevant leaf partition.

We can deduce that `network_time_ms` was high because the MA genuinely needed to wait for the leaf to finish processing the query.

## Why was latency high on the leaf?

In the previous query, we saw 900ms `log_flush_time_ms`. The leaf spent its time waiting for transaction logs to be flushed to disk.

Note that we saw zero disk\_time\_ms in `mv_activities_extended`, and that `mv_activities` doesn’t have a `log_flush_time_ms column`. This is because `disk_time_ms` in `mv_activities` is split into `disk_time_ms` and `log_flush_time_ms` in `mv_activities_extended`.

We can drill down further, to illustrate the system-wide profiling potential of `mv_activities_extended`.

SingleStore performs actual flushing of transaction logs in a separate thread per partition, which is instrumented with a separate task. We can see the relationship between queries and log flusher threads by expanding our query to include system tasks. In particular, we select all rows of `mv_activities_extended` associated with the query’s database:

```sql
use information_schema;

```

```sql
select a.disk_time_ms, a.log_flush_time_ms, a.log_buffer_time_ms, a.database_name, a.partition_id, n.type node_type, a.activity_name, a.activity_type
from mv_activities_extended a left join mv_nodes n on a.node_id=n.id
where a.database_name="rowstore_db" order by a.disk_time_ms desc;

```

```output

+--------------+-------------------+--------------------+---------------+--------------+-----------+----------------------------------+---------------+
| disk_time_ms | log_flush_time_ms | log_buffer_time_ms | database_name | partition_id | node_type | activity_name                    | activity_type |
+--------------+-------------------+--------------------+---------------+--------------+-----------+----------------------------------+---------------+
|          923 |                 0 |                  0 | rowstore_db   |            0 | LEAF      | LogFlusher                       | Database      |
|            0 |                 0 |                  0 | rowstore_db   |            2 | LEAF      | LogFlusher                       | Database      |
|            0 |                 0 |                  0 | rowstore_db   |            2 | LEAF      | RunBackgroundMerger              | Database      |
|            0 |                 0 |                  0 | rowstore_db   |         NULL | LEAF      | ReplayFile                       | Database      |
|            0 |                 0 |                  0 | rowstore_db   |            1 | LEAF      | RunBackgroundMerger              | Database      |
|            0 |               925 |                  0 | rowstore_db   |            0 | LEAF      | update_rowstore_62c84c96b9194b88 | Query         |
|            0 |                 0 |                  0 | rowstore_db   |            0 | LEAF      | RunBackgroundMerger              | Database      |
|            0 |                 0 |                  0 | rowstore_db   |         NULL | LEAF      | StreamFileFromMaster             | Database      |
|            0 |                 0 |                  0 | rowstore_db   |            1 | LEAF      | LogFlusher                       | Database      |
|            0 |                 0 |                  0 | rowstore_db   |         NULL | MA        | Update_rowstore_9cb5d2de6e9ae9cc | Query         |
|            0 |                 0 |                  0 | rowstore_db   |         NULL | MA        | LogFlusher                       | Database      |
|            0 |                 0 |                  0 | rowstore_db   |         NULL | MA        | MEMSQL_REPLICATION_START         | Database      |
|            0 |                 0 |                  0 | rowstore_db   |         NULL | MA        | ReplicaHandler_Loop              | Database      |
|            0 |                 0 |                  0 | rowstore_db   |         NULL | MA        | ReplicaHandler_StreamFiles       | Database      |
+--------------+-------------------+--------------------+---------------+--------------+-----------+----------------------------------+---------------+

```

LogFlusher tasks on partition 0 spent as much time waiting for disk writes to complete as our query tasks spent waiting for transaction log flushes to be acknowledged.

We can now build a detailed theory of what happened. Query latency was high because the queried MA spent most of its time waiting for a leaf node to finish processing the query on partition 0. This leaf node, in turn, was delayed waiting for a transaction log flusher thread to complete disk writes.

Ultimately, the query was waiting for disk.

## What else was using the disk?

We extract disk usage figures from `mv_activities_extended`, again joining with mv\_queries for descriptive query text.

```sql
USE information_schema;

```

```sql
SELECT a.disk_logical_write_b, a.disk_logical_read_b, a.activity_name, a.database_name, a.partition_id, LEFT(q.query_text, 50)
FROM mv_activities_extended a LEFT JOIN mv_queries q ON a.aggregator_activity_name = q.activity_name
ORDERY BY a.disk_logical_write_b + a.disk_logical_read_b desc limit 10;

```

```output

+----------------------+---------------------+-------------------------------------+----------------+--------------+----------------------------------------------------+
| disk_logical_write_b | disk_logical_read_b | activity_name                       | database_name  | partition_id | left(q.query_text, 50)                             |
+----------------------+---------------------+-------------------------------------+----------------+--------------+----------------------------------------------------+
|              1383163 |             1383855 | RunBackgroundMerger                 | columnstore_db |            0 | NULL                                               |
|              1212571 |             1213233 | RunBackgroundMerger                 | columnstore_db |            1 | NULL                                               |
|               797124 |              598624 | RunBackgroundMerger                 | columnstore_db |            2 | NULL                                               |
|                22535 |                   0 | LogFlusher                          | columnstore_db |            1 | NULL                                               |
|                21335 |                   0 | LogFlusher                          | columnstore_db |            2 | NULL                                               |
|                17990 |                   0 | LogFlusher                          | columnstore_db |            0 | NULL                                               |
|                 7076 |                   0 | LogFlusher                          | rowstore_db    |            0 | NULL                                               |
|                 1025 |                   0 | Insert_columnstore_61fc44a9950b4882 | columnstore_db |            1 | insert into columnstore select @*round(@ * @ *rand |
|                  894 |                   0 | Insert_columnstore_61fc44a9950b4882 | columnstore_db |            2 | insert into columnstore select @*round(@ * @ *rand |
|                  769 |                   0 | Insert_columnstore_61fc44a9950b4882 | columnstore_db |            0 | insert into columnstore select @*round(@ * @ *rand |
+----------------------+---------------------+-------------------------------------+----------------+--------------+----------------------------------------------------+

```

The disk was dominated by work done on behalf of a columnstore insert query in the `columnstore_db` database.

***

Modified at: March 3, 2023

Source: [/db/v9.1/reference/troubleshooting-reference/diagnosing-a-performance-issue-using-workload-profiling/use-case-disk-bottleneck/](https://docs.singlestore.com/db/v9.1/reference/troubleshooting-reference/diagnosing-a-performance-issue-using-workload-profiling/use-case-disk-bottleneck/)

(An index of the documentation is available at /llms.txt)
