ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
Rowlock timeout errors can occur in an application because of contention with transactions or uncommitted transactions.
Contention with Transactions
Updates and deletes in SingleStore are row locking operations.q1
running in transaction t1
, a second query q2
in transaction t2
that operates on the same row will be blocked until q1
completes.
Updates and deletes on a columnstore table lock the entire table if the count of rows being updated is greater than default_
.
The Lock wait timeout exceeded; try restarting transaction
error will occur when a query cannot proceed because it is blocked by a rowlock.
Deadlock between Transactions
Typically, a deadlock happens when two or more transactions are writing to the same rows, but in a different order.q1
and q2
in different transactions, where both q1
and q2
want to write to rows r1
and r2
.q1
wants to write to rows r1
and then r2
, but the query q2
wants to write to row r2
first and then r1
, they will deadlock.
A single transaction that runs multiple statements should not deadlock itself, because queries are executed in the order they are defined.
Solutions
-
To resolve contention when two or more transactions are writing to the same row (in the same order), add indexes on the columns being updated.
Adding indexes to the columns ensures that the row locks are taken in the same order that the write statements are specified in each transaction. Note: This solution will not resolve a deadlock (defined previously). For example, if the following query generates a lock wait timeout exceeded error:
DELETE FROM bucket WHERE meta_uid_id = ?Add an index on meta_
uid_ id. Consider the following
INSERT .
statement:. . ON DUPLICATE KEY UPDATE INSERT INTO allotments (DevID, DevName, Orders)SELECT * FROM deprAllotON DUPLICATE KEY UPDATEDevID = VALUES(DevID),Orders = VALUES (Orders);To reduce the chance of a lock wait timeout exceeded error, add individual indexes to the
DevID
andOrders
columns. -
You can decrease the
lock_
value to more quickly fail a query that is blocked by a lock.wait_ timeout -
To resolve a deadlock (defined above) you should change the flow of your application logic execute queries that write to the same rows in the same order.
-
You can also disable
multistatement_
.transactions Note: Disabling
multistatement_
to prevent deadlocks is not recommended because it can lead to data loss or other serious issues.transactions show variables like '%multistatement_transactions%';Shows whether multistatement transactions are on, and
SET GLOBAL multistatement_transactions = off;turns it off.
Disabling
multistatement_
turns off cross-node transactions for write queries.transactions Hence, write queries that span across multiple nodes, commit locally when the processing finishes on the node. To persist disabling multi-statement transactions through node restart, use the update-config command to update the
memsql.
file as shown below:cnf SingleStore Toolbox
sdb-admin update-config --key multistatement_transactions --value off --set-global --allBy turning off multi-statement transactions, write queries that touch multiple nodes will commit locally when the processing finishes on the node.
For example, if node 1 hits a duplicate key error, node 2 will still commit its transaction.
Uncommitted Transactions
Open transactions hold the locks on rows affected by the transaction until they are committed or rolled back, and any other write query modifying the same rows has to wait for the open transaction to release the locks.lock_
(default 60 seconds), it fails.
Solutions
-
To resolve the issue, identify the idle transaction and kill its connection.
-
To prevent the issue from recurring, ensure that all the transactions are committed or rolled back.
Resolving the current locking issue
If a query has failed because it waited long enough to exceed the lock_
, identify the transaction that is causing the timeout, and kill its connection.
START TRANSACTION;UPDATE cust SET ORDERS = (ORDERS + 1) WHERE ID > 7680;
In another connection start a transaction, and run a query that tries to update the same set of rows as the transaction above.
START TRANSACTION;UPDATE cust SET ORDER_DATE = DATE_ADD(ORDER_DATE, INTERVAL 1 DAY) WHERE ID > 7680;
ERROR 1205 (HY000): Leaf Error (192.168.3.152:3307): Lock wait timeout exceeded; try restarting transaction. Lock owned by connection id 76, query `open idle transaction`
To get the list of all the running transactions, query the INFORMATION_
table.INFORMATION_
table to identify the nodes on which each process is running.
SELECT TYPE, IP_ADDR, PORT, mvp.ID, USER, HOST, COMMAND, TIME, TRANSACTION_STATE, ROW_LOCKS_HELDFROM INFORMATION_SCHEMA.MV_PROCESSLIST mvp JOIN INFORMATION_SCHEMA.MV_NODES mvnON mvp.NODE_ID = mvn.IDWHERE TRANSACTION_STATE = "open" AND COMMAND = "Sleep" AND TIME > 60ORDER BY TYPE;
+------+---------------+------+-----+-------------+---------------------+---------+------+-------------------+----------------+
| TYPE | IP_ADDR | PORT | ID | USER | HOST | COMMAND | TIME | TRANSACTION_STATE | ROW_LOCKS_HELD |
+------+---------------+------+-----+-------------+---------------------+---------+------+-------------------+----------------+
| CA | 192.168.1.146 | 3306 | 143 | root | localhost:58322 | Sleep | 385 | open | 0 |
| LEAF | 192.168.3.152 | 3307 | 76 | distributed | 192.168.1.146:9555 | Sleep | 385 | open | 5168 |
| LEAF | 192.168.0.24 | 3307 | 125 | distributed | 192.168.1.146:7371 | Sleep | 385 | open | 5047 |
| LEAF | 192.168.0.24 | 3307 | 104 | distributed | 192.168.1.146:60874 | Sleep | 385 | open | 5001 |
| LEAF | 192.168.0.24 | 3307 | 115 | distributed | 192.168.1.146:63178 | Sleep | 385 | open | 5044 |
| LEAF | 192.168.3.242 | 3307 | 55 | distributed | 192.168.1.146:10118 | Sleep | 385 | open | 5129 |
| LEAF | 192.168.3.242 | 3307 | 77 | distributed | 192.168.1.146:21382 | Sleep | 385 | open | 5168 |
| LEAF | 192.168.3.242 | 3307 | 81 | distributed | 192.168.1.146:26758 | Sleep | 385 | open | 4998 |
| LEAF | 192.168.3.242 | 3307 | 122 | distributed | 192.168.1.146:55942 | Sleep | 385 | open | 5021 |
| LEAF | 192.168.0.24 | 3307 | 75 | distributed | 192.168.1.146:40650 | Sleep | 385 | open | 5015 |
| LEAF | 192.168.1.100 | 3307 | 131 | distributed | 192.168.1.146:23998 | Sleep | 385 | open | 5013 |
| LEAF | 192.168.3.152 | 3307 | 78 | distributed | 192.168.1.146:11603 | Sleep | 385 | open | 5124 |
| LEAF | 192.168.3.152 | 3307 | 83 | distributed | 192.168.1.146:21331 | Sleep | 385 | open | 5018 |
| LEAF | 192.168.3.152 | 3307 | 120 | distributed | 192.168.1.146:39763 | Sleep | 385 | open | 5092 |
| LEAF | 192.168.1.100 | 3307 | 79 | distributed | 192.168.1.146:1726 | Sleep | 385 | open | 5090 |
| LEAF | 192.168.1.100 | 3307 | 82 | distributed | 192.168.1.146:6078 | Sleep | 385 | open | 5065 |
| LEAF | 192.168.1.100 | 3307 | 114 | distributed | 192.168.1.146:21438 | Sleep | 385 | open | 5160 |
| MA | 192.168.0.231 | 3306 | 490 | root | localhost:32737 | Sleep | 73 | open | 0 |
+------+---------------+------+-----+-------------+---------------------+---------+------+-------------------+----------------+
In the result, each aggregator and each partition on each leaf has one idle open transaction which is holding row locks.143
, which is running on the child aggregator on host 192.
.143
.
KILL CONNECTION 143;
Alternatively, you can kill the open connections that have been idle for more than lock_
.
Note: Although the error message indicates that the lock is owned by the connection with ID 76
, it is not the connection that needs to be killed.
The original connection on the child aggregator that was updating ORDERS
is now closed, and the transaction is rolled back.
SELECT COUNT(*) FROM cust;
ERROR 2013 (HY000): Lost connection to MySQL server during query
SELECT COUNT(*) FROM cust;
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...
Connection id: 160
Current database: db1
+----------+
| COUNT(*) |
+----------+
| 390955 |
+----------+
Preventing the Issue from Recurring
To prevent the issue from recurring, inspect the application code and ensure that all the transactions are either committed or rolled back, including the transactions in exception handling.
A common cause of this error is if the transaction is executed as part of a stored procedure, but exception handling causes the final commit to get skipped.
The impact of this error can be mitigated by setting a session timeout for the client, in order to automatically kill the idle transactions.
Note: Setting a query timeout in Resource Governance will not affect idle transactions.
Last modified: June 20, 2024