SingleStore DB

ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction

Rowlock timeout errors can occur in an application because of contention with transactions or uncommitted transactions. Refer to the following sections to resolve lock wait timeout errors.

Contention with Transactions

Updates and deletes in SingleStore are row locking operations. If a row is currently locked by query q1 running in transaction t1, a second query q2 in transaction t2 that operates on the same row will be blocked until q1 completes.

Updates and deletes on a columnstore table lock the entire table if the count of rows being updated is greater than columnstore_disk_insert_threshold. See Locking in Columnstores for more information.

The Lock wait timeout exceeded; try restarting transaction error will occur when a query cannot proceed because it is blocked by a rowlock.

Deadlock between Transactions

Typically, a deadlock happens when two or more transactions are writing to the same rows, but in a different order. For example, consider two concurrently running queries q1 and q2 in different transactions, where both q1 and q2 want to write to rows r1 and r2. If the query q1 wants to write to rows r1 and then r2, but the query q2 wants to write to row r2 first and then r1, they will deadlock.

A single transaction that runs multiple statements should not deadlock itself, because queries are executed in the order they are defined.

Solutions

  • To resolve contention when two or more transactions are writing to the same row (in the same order), add indexes on the columns being updated. Adding indexes to the columns ensures that the row locks are taken in the same order that the write statements are specified in each transaction. Note: This solution will not resolve a deadlock (defined previously).

    For example, if the following query generates a lock wait timeout exceeded error:

    DELETE FROM bucket WHERE meta_uid_id = ?
    

    Add an index on meta_uid_id.

    Consider the following INSERT ... ON DUPLICATE KEY UPDATE statement:

    INSERT INTO allotments (DevID, DevName, Orders)
      SELECT * FROM deprAllot
      ON DUPLICATE KEY UPDATE
      DevID = VALUES(DevID),
      Orders =  VALUES (Orders);
    

    To reduce the chance of a lock wait timeout exceeded error, add individual indexes to the DevID and Orders columns.

  • You can decrease the lock_wait_timeout value to more quickly fail a query that is blocked by a lock.

  • To resolve a deadlock (defined above) you should change the flow of your application logic execute queries that write to the same rows in the same order.

  • You can also disable multistatement_transactions.

    Note: Disabling multistatement_transactions to prevent deadlocks is not recommended because it can lead to data loss or other serious issues.

    show variables like '%multistatement_transactions%';
    

    Shows whether multistatement transactions are on, and

    SET GLOBAL multistatement_transactions = off;
    

    turns it off.

    Disabling multistatement_transactions turns off cross-node transactions for write queries. Hence, write queries that span across multiple nodes, commit locally when the processing finishes on the node.

    To persist disabling multi-statement transactions through node restart, use the update-config command to update the memsql.cnf file as shown below:

    SingleStore Tools

    sdb-admin update-config --key multistatement_transactions --value off --set-global --all
    

    By turning off multi-statement transactions, write queries that touch multiple nodes will commit locally when the processing finishes on the node. For example, if node 1 hits a duplicate key error, node 2 will still commit its transaction.

Uncommitted Transactions

Open transactions hold the locks on rows affected by the transaction until they are committed or rolled back, and any other write query modifying the same rows has to wait for the open transaction to release the locks. If the query has to wait for more than the lock_wait_timeout (default 60 seconds), it fails. This happens most often when the open transaction is idle and unnecessarily holding the locks.

Solutions

  • To resolve the issue, identify the idle transaction and kill its connection.

  • To prevent the issue from recurring, ensure that all the transactions are committed or rolled back.

Resolving the current locking issue

If a query has failed because it waited long enough to exceed the lock_wait_timeout, identify the transaction that is causing the timeout, and kill its connection. Killing the connection rolls back the uncommitted writes of the open transaction. For example, a write operation is performed in the following transaction, but it is not committed.

START TRANSACTION;

UPDATE cust SET ORDERS = (ORDERS + 1) WHERE ID > 7680;

In another connection start a transaction, and run a query that tries to update the same set of rows as the transaction above.

START TRANSACTION;

UPDATE cust SET ORDER_DATE = DATE_ADD(ORDER_DATE, INTERVAL 1 DAY) WHERE ID > 7680;
****
ERROR 1205 (HY000): Leaf Error (192.168.3.152:3307): Lock wait timeout exceeded; try restarting transaction.  Lock owned by connection id 76, query `open idle transaction`

To get the list of all the running transactions, query the INFORMATION_SCHEMA.PROCESSLIST table. This management view table contains the processlist information for all nodes in the cluster, and you can join this table with the INFORMATION_SCHEMA.MV_NODES table to identify the nodes on which each process is running.

SELECT TYPE, IP_ADDR, PORT, mvp.ID, USER, HOST, COMMAND, TIME, TRANSACTION_STATE, ROW_LOCKS_HELD
FROM INFORMATION_SCHEMA.MV_PROCESSLIST mvp JOIN INFORMATION_SCHEMA.MV_NODES mvn
ON mvp.NODE_ID = mvn.ID
WHERE TRANSACTION_STATE = "open" AND COMMAND = "Sleep" AND TIME > 60
ORDER BY TYPE;
****
+------+---------------+------+-----+-------------+---------------------+---------+------+-------------------+----------------+
| TYPE | IP_ADDR       | PORT | ID  | USER        | HOST                | COMMAND | TIME | TRANSACTION_STATE | ROW_LOCKS_HELD |
+------+---------------+------+-----+-------------+---------------------+---------+------+-------------------+----------------+
| CA   | 192.168.1.146 | 3306 | 143 | root        | localhost:58322     | Sleep   |  385 | open              |              0 |
| LEAF | 192.168.3.152 | 3307 |  76 | distributed | 192.168.1.146:9555  | Sleep   |  385 | open              |           5168 |
| LEAF | 192.168.0.24  | 3307 | 125 | distributed | 192.168.1.146:7371  | Sleep   |  385 | open              |           5047 |
| LEAF | 192.168.0.24  | 3307 | 104 | distributed | 192.168.1.146:60874 | Sleep   |  385 | open              |           5001 |
| LEAF | 192.168.0.24  | 3307 | 115 | distributed | 192.168.1.146:63178 | Sleep   |  385 | open              |           5044 |
| LEAF | 192.168.3.242 | 3307 |  55 | distributed | 192.168.1.146:10118 | Sleep   |  385 | open              |           5129 |
| LEAF | 192.168.3.242 | 3307 |  77 | distributed | 192.168.1.146:21382 | Sleep   |  385 | open              |           5168 |
| LEAF | 192.168.3.242 | 3307 |  81 | distributed | 192.168.1.146:26758 | Sleep   |  385 | open              |           4998 |
| LEAF | 192.168.3.242 | 3307 | 122 | distributed | 192.168.1.146:55942 | Sleep   |  385 | open              |           5021 |
| LEAF | 192.168.0.24  | 3307 |  75 | distributed | 192.168.1.146:40650 | Sleep   |  385 | open              |           5015 |
| LEAF | 192.168.1.100 | 3307 | 131 | distributed | 192.168.1.146:23998 | Sleep   |  385 | open              |           5013 |
| LEAF | 192.168.3.152 | 3307 |  78 | distributed | 192.168.1.146:11603 | Sleep   |  385 | open              |           5124 |
| LEAF | 192.168.3.152 | 3307 |  83 | distributed | 192.168.1.146:21331 | Sleep   |  385 | open              |           5018 |
| LEAF | 192.168.3.152 | 3307 | 120 | distributed | 192.168.1.146:39763 | Sleep   |  385 | open              |           5092 |
| LEAF | 192.168.1.100 | 3307 |  79 | distributed | 192.168.1.146:1726  | Sleep   |  385 | open              |           5090 |
| LEAF | 192.168.1.100 | 3307 |  82 | distributed | 192.168.1.146:6078  | Sleep   |  385 | open              |           5065 |
| LEAF | 192.168.1.100 | 3307 | 114 | distributed | 192.168.1.146:21438 | Sleep   |  385 | open              |           5160 |
| MA   | 192.168.0.231 | 3306 | 490 | root        | localhost:32737     | Sleep   |  73  | open              |              0 |
+------+---------------+------+-----+-------------+---------------------+---------+------+-------------------+----------------+

In the result, each aggregator and each partition on each leaf has one idle open transaction which is holding row locks. The locks are only held on the leaves, because that is where the data is stored. Look for an aggregator process with the same run time as the leaf processes that are holding locks. In the result above, it is the process with connection ID 143, which is running on the child aggregator on host 192.168.1.146:3306. To resolve the current issue, connect to this aggregator and kill the connection with ID 143.

KILL CONNECTION 143;

Alternatively, you can kill the open connections that have been idle for more than lock_wait_timeout.

Note: Although the error message indicates that the lock is owned by the connection with ID 76, it is not the connection that needs to be killed. It is the connection for one of the distributed leaf processes, which holds the lock that our query was waiting on. Instead, we need to kill the connection for the aggregator process, which will roll back the entire transaction.

The original connection on the child aggregator that was updating ORDERS is now closed, and the transaction is rolled back. If you try to use that session again, it automatically re-establishes a new connection with a new ID.

SELECT COUNT(*) FROM cust;
****
ERROR 2013 (HY000): Lost connection to MySQL server during query
SELECT COUNT(*) FROM cust;
****
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...
Connection id:    160
Current database: db1

+----------+
| COUNT(*) |
+----------+
|   390955 |
+----------+

Preventing the Issue from Recurring

To prevent the issue from recurring, inspect the application code and ensure that all the transactions are either committed or rolled back, including the transactions in exception handling. Additionally, all manually run transactions must be committed or rolled back as well.

A common cause of this error is if the transaction is executed as part of a stored procedure, but exception handling causes the final commit to get skipped. To prevent this issue, add rollbacks for the transaction in the exception handling section of the stored procedure. For more information, see Transactions in Stored Procedures.

The impact of this error can be mitigated by setting a session timeout for the client, in order to automatically kill the idle transactions. When a client ends a session, the database rolls back the uncommitted transactions and closes the connection on its end. If the session timeout value is set lower than the lock wait timeout (default 60 seconds), then idle connections will not cause lock wait timeouts to be breached.

Note: Setting a query timeout in Resource Governance will not affect idle transactions. The query timeout limit only applies to executing queries.