Use Case: Disk Bottleneck

An update query against a rowstore table is unexpectedly slow, and we’d like to understand why.

Was the query running or waiting for a resource?

To answer this question, we compare the cluster-wide CPU usage associated with the query against cluster-wide wait times. mv_activities reports these in a single row. To find the correct row, we join mv_activities with mv_queries so that we may search by query text.

USE information_schema;
SELECT a.cpu_time_ms, a.cpu_wait_time_ms, a.lock_time_ms, a.disk_time_ms, a.network_time_ms, left(q.query_text, 50), a.activity_name
FROM mv_activities a LEFT JOIN mv_queries q ON q.activity_name = a.activity_name
WHERE q.query_text like "update %";
+-------------+------------------+--------------+--------------+-----------------+----------------------------------------------------+----------------------------------+
| cpu_time_ms | cpu_wait_time_ms | lock_time_ms | disk_time_ms | network_time_ms | left(q.query_text, 50)                             | activity_name                    |
+-------------+------------------+--------------+--------------+-----------------+----------------------------------------------------+----------------------------------+
|          64 |                3 |            0 |          938 |             966 | update rowstore_db.rowstore set c = c+@ where a =  | Update_rowstore_9cb5d2de6e9ae9cc |
+-------------+------------------+--------------+--------------+-----------------+----------------------------------------------------+----------------------------------+

The query mostly waited. Over the interval of time described by mv_activities, the query spent only about 60ms running, compared to over 900ms waiting for disk operations to complete and a similar amount of time waiting on network messages.
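
As a quick sanity check, we can turn these columns into a single "share of time spent waiting" figure. This is only a sketch using the columns shown above; the individual wait columns are not guaranteed to be strictly additive, so treat the percentage as approximate.

USE information_schema;
-- Sum the wait columns and compare them against CPU time to estimate what share of the query's time was spent waiting rather than running.
SELECT a.activity_name, a.cpu_time_ms, a.cpu_wait_time_ms + a.lock_time_ms + a.disk_time_ms + a.network_time_ms AS wait_time_ms, ROUND(100 * (a.cpu_wait_time_ms + a.lock_time_ms + a.disk_time_ms + a.network_time_ms) / (a.cpu_time_ms + a.cpu_wait_time_ms + a.lock_time_ms + a.disk_time_ms + a.network_time_ms)) AS pct_waiting
FROM mv_activities a LEFT JOIN mv_queries q ON q.activity_name = a.activity_name
WHERE q.query_text LIKE "update %";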

Why was the query waiting on the network?

network_time_ms may indicate a network issue, but it also includes time the aggregator spent waiting for responses from the leaves.

To decide whether we have an actual network issue, we use mv_activities_extended to drill down into the per-partition usage associated with the query across the entire cluster. We again join with mv_queries to find the correct rows. We also join with mv_nodes to describe the nodes on which tasks ran.

use information_schema;
select a.elapsed_time_ms, a.cpu_time_ms, a.cpu_wait_time_ms, a.lock_time_ms, a.lock_row_time_ms, a.network_time_ms, a.disk_time_ms, a.log_flush_time_ms, a.log_buffer_time_ms, a.database_name, a.partition_id, n.type node_type
from mv_activities_extended a left join mv_queries q on a.aggregator_activity_name = q.activity_name left join mv_nodes n on a.node_id=n.id
where q.query_text like "update%";
+-----------------+-------------+------------------+--------------+------------------+-----------------+--------------+-------------------+--------------------+---------------+--------------+-----------+
| elapsed_time_ms | cpu_time_ms | cpu_wait_time_ms | lock_time_ms | lock_row_time_ms | network_time_ms | disk_time_ms | log_flush_time_ms | log_buffer_time_ms | database_name | partition_id | node_type |
+-----------------+-------------+------------------+--------------+------------------+-----------------+--------------+-------------------+--------------------+---------------+--------------+-----------+
|             941 |          44 |                1 |            0 |                0 |               0 |            0 |               915 |                  0 | rowstore_db   |            0 | LEAF      |
|             993 |          46 |                2 |            1 |                0 |             956 |            0 |                 0 |                  0 | rowstore_db   |         NULL | MA        |
+-----------------+-------------+------------------+--------------+------------------+-----------------+--------------+-------------------+--------------------+---------------+--------------+-----------+

The roughly 900ms of network_time_ms we saw in mv_activities accumulated on the master aggregator (MA) node, and closely matches the latency of processing the query on the sole relevant leaf partition.

We can deduce that network_time_ms was high because the MA genuinely needed to wait for the leaf to finish processing the query.

Why was latency high on the leaf?

In the previous query, we saw 900ms log_flush_time_ms. The leaf spent its time waiting for transaction logs to be flushed to disk.

Note that we saw zero disk_time_ms in mv_activities_extended, and that mv_activities doesn’t have a log_flush_time_ms column. This is because disk_time_ms in mv_activities is split into disk_time_ms and log_flush_time_ms in mv_activities_extended.
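
We can see this relationship in the data with a rough cross-check (a sketch, not an exact identity, since the two views may not cover exactly the same interval): summing disk_time_ms and log_flush_time_ms across the query's tasks in mv_activities_extended should roughly account for the disk_time_ms reported by mv_activities.

USE information_schema;
-- Sum per-task disk and log-flush wait times for the update query; together these roughly account for the single disk_time_ms figure in mv_activities.
SELECT a.aggregator_activity_name, SUM(a.disk_time_ms) AS disk_time_ms, SUM(a.log_flush_time_ms) AS log_flush_time_ms, SUM(a.disk_time_ms + a.log_flush_time_ms) AS combined_disk_ms
FROM mv_activities_extended a LEFT JOIN mv_queries q ON a.aggregator_activity_name = q.activity_name
WHERE q.query_text LIKE "update%"
GROUP BY a.aggregator_activity_name;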

We can drill down further, to illustrate the system-wide profiling potential of mv_activities_extended.

SingleStore performs actual flushing of transaction logs in a separate thread per partition, which is instrumented with a separate task. We can see the relationship between queries and log flusher threads by expanding our query to include system tasks. In particular, we select all rows of mv_activities_extended associated with the query’s database:

use information_schema;
select a.disk_time_ms, a.log_flush_time_ms, a.log_buffer_time_ms, a.database_name, a.partition_id, n.type node_type, a.activity_name, a.activity_type
from mv_activities_extended a left join mv_nodes n on a.node_id=n.id
where a.database_name="rowstore_db" order by a.disk_time_ms desc;
+--------------+-------------------+--------------------+---------------+--------------+-----------+----------------------------------+---------------+
| disk_time_ms | log_flush_time_ms | log_buffer_time_ms | database_name | partition_id | node_type | activity_name                    | activity_type |
+--------------+-------------------+--------------------+---------------+--------------+-----------+----------------------------------+---------------+
|          923 |                 0 |                  0 | rowstore_db   |            0 | LEAF      | LogFlusher                       | Database      |
|            0 |                 0 |                  0 | rowstore_db   |            2 | LEAF      | LogFlusher                       | Database      |
|            0 |                 0 |                  0 | rowstore_db   |            2 | LEAF      | RunBackgroundMerger              | Database      |
|            0 |                 0 |                  0 | rowstore_db   |         NULL | LEAF      | ReplayFile                       | Database      |
|            0 |                 0 |                  0 | rowstore_db   |            1 | LEAF      | RunBackgroundMerger              | Database      |
|            0 |               925 |                  0 | rowstore_db   |            0 | LEAF      | update_rowstore_62c84c96b9194b88 | Query         |
|            0 |                 0 |                  0 | rowstore_db   |            0 | LEAF      | RunBackgroundMerger              | Database      |
|            0 |                 0 |                  0 | rowstore_db   |         NULL | LEAF      | StreamFileFromMaster             | Database      |
|            0 |                 0 |                  0 | rowstore_db   |            1 | LEAF      | LogFlusher                       | Database      |
|            0 |                 0 |                  0 | rowstore_db   |         NULL | MA        | Update_rowstore_9cb5d2de6e9ae9cc | Query         |
|            0 |                 0 |                  0 | rowstore_db   |         NULL | MA        | LogFlusher                       | Database      |
|            0 |                 0 |                  0 | rowstore_db   |         NULL | MA        | MEMSQL_REPLICATION_START         | Database      |
|            0 |                 0 |                  0 | rowstore_db   |         NULL | MA        | ReplicaHandler_Loop              | Database      |
|            0 |                 0 |                  0 | rowstore_db   |         NULL | MA        | ReplicaHandler_StreamFiles       | Database      |
+--------------+-------------------+--------------------+---------------+--------------+-----------+----------------------------------+---------------+

The LogFlusher task on partition 0 spent almost exactly as much time waiting for disk writes to complete (923ms) as our query's task spent waiting for its transaction log flushes to be acknowledged (925ms).

We can now build a detailed theory of what happened. Query latency was high because the master aggregator spent most of its time waiting for a leaf node to finish processing the query on partition 0. That leaf node, in turn, was delayed waiting for a transaction log flusher thread to complete its disk writes.

Ultimately, the query was waiting for disk.

What else was using the disk?

We extract disk usage figures from mv_activities_extended, again joining with mv_queries for descriptive query text.

USE information_schema;
SELECT a.disk_logical_write_b, a.disk_logical_read_b, a.activity_name, a.database_name, a.partition_id, LEFT(q.query_text, 50)
FROM mv_activities_extended a LEFT JOIN mv_queries q ON a.aggregator_activity_name = q.activity_name
ORDER BY a.disk_logical_write_b + a.disk_logical_read_b DESC LIMIT 10;
+----------------------+---------------------+-------------------------------------+----------------+--------------+----------------------------------------------------+
| disk_logical_write_b | disk_logical_read_b | activity_name                       | database_name  | partition_id | left(q.query_text, 50)                             |
+----------------------+---------------------+-------------------------------------+----------------+--------------+----------------------------------------------------+
|              1383163 |             1383855 | RunBackgroundMerger                 | columnstore_db |            0 | NULL                                               |
|              1212571 |             1213233 | RunBackgroundMerger                 | columnstore_db |            1 | NULL                                               |
|               797124 |              598624 | RunBackgroundMerger                 | columnstore_db |            2 | NULL                                               |
|                22535 |                   0 | LogFlusher                          | columnstore_db |            1 | NULL                                               |
|                21335 |                   0 | LogFlusher                          | columnstore_db |            2 | NULL                                               |
|                17990 |                   0 | LogFlusher                          | columnstore_db |            0 | NULL                                               |
|                 7076 |                   0 | LogFlusher                          | rowstore_db    |            0 | NULL                                               |
|                 1025 |                   0 | Insert_columnstore_61fc44a9950b4882 | columnstore_db |            1 | insert into columnstore select @*round(@ * @ *rand |
|                  894 |                   0 | Insert_columnstore_61fc44a9950b4882 | columnstore_db |            2 | insert into columnstore select @*round(@ * @ *rand |
|                  769 |                   0 | Insert_columnstore_61fc44a9950b4882 | columnstore_db |            0 | insert into columnstore select @*round(@ * @ *rand |
+----------------------+---------------------+-------------------------------------+----------------+--------------+----------------------------------------------------+

The disk was dominated by work done on behalf of a columnstore insert query in the columnstore_db database.
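
As a possible follow-up (a sketch; the activity name below is taken from the output above and will differ on your cluster), we can recover the full parameterized text of that insert by following its aggregator_activity_name into mv_queries:

USE information_schema;
-- Follow the insert task's aggregator_activity_name into mv_queries to recover the full parameterized statement.
SELECT DISTINCT q.activity_name, q.query_text
FROM mv_activities_extended a LEFT JOIN mv_queries q ON a.aggregator_activity_name = q.activity_name
WHERE a.activity_name = "Insert_columnstore_61fc44a9950b4882";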
