ISSUE: Inconsistent Query Run Times
If you receive very different response times when you run the same query (or groups of queries on the same leaf) repeatedly, Transparent Huge Pages (THP) may be the cause.
Linux organizes RAM into pages that are ordinarily 4 KB each. With THP, Linux can instead use 2 MB pages or larger. THP transparently reorganizes the memory a process uses inside the kernel, merging some small pages into huge pages and splitting some huge pages back into small pages. This background work holds locks on the memory manager and can prevent your process from accessing its memory while it runs.
You can check whether THP is enabled by running:

```bash
cat /sys/kernel/mm/*transparent_hugepage/enabled
```

If the brackets are around `[always]`, you should disable THP on all nodes in your cluster. Fully remedying the issue may require restarting MemSQL, or fully restarting all of the servers. Cluster operators should disable THP before launching in production.
Note: Having THP enabled may also be the cause of high system CPU (sometimes called red CPU).
How to Disable THP
MemSQL recommends disabling transparent huge pages (THP) at boot time on all nodes (master aggregator, child aggregators, and leaves) in the cluster. Not doing so may result in inconsistent query run times.
To disable THP, add the following lines to the end of `rc.local` before the `exit` line (if present), and reboot the host machines.
For Debian-based distributions, add the following to `rc.local`:

```bash
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
echo 0 > /sys/kernel/mm/transparent_hugepage/khugepaged/defrag
echo no > /sys/kernel/mm/transparent_hugepage/khugepaged/defrag
```
For RedHat/CentOS distributions, add the following to `rc.local`:

```bash
echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled
echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag
echo 0 > /sys/kernel/mm/redhat_transparent_hugepage/khugepaged/defrag
echo no > /sys/kernel/mm/redhat_transparent_hugepage/khugepaged/defrag
```
The `khugepaged/defrag` option is `0` on newer Linux versions (e.g., CentOS 7+), but `no` on older versions (e.g., CentOS 6). You can check which form your system uses by running:

```bash
cat /sys/kernel/mm/*transparent_hugepage/khugepaged/defrag
```

Keep only the matching setting. For example, if you see `0`, keep the line with `echo 0` and remove the line with `echo no`; if you see `no`, keep the line with `echo no` and remove the line with `echo 0`.
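After rebooting, verify on every node that THP is actually off. A minimal shell sketch of the check (the `thp_state` helper name is ours; on a live node its input would come from `cat /sys/kernel/mm/*transparent_hugepage/enabled`):

```shell
# Extract the active (bracketed) value from a THP setting line,
# e.g. "always madvise [never]" -> "never".
thp_state() {
  echo "$1" | grep -o '\[[a-z]*\]' | tr -d '[]'
}

# On a live node, you would run:
# thp_state "$(cat /sys/kernel/mm/transparent_hugepage/enabled)"
# and expect "never" after a successful reboot.
```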
ISSUE: Long Running Queries Blocking DDL Operations and Workload
MemSQL supports execution of online DDL statements, which means that you can perform read and write operations while the DDL command is being executed on a table. The execution of a DDL command does not begin until all the in-progress DML queries running on the table finish. As soon as the in-progress read and write operations complete and the DDL command begins executing, new reads and writes proceed as normal. This blocking period usually lasts on the order of milliseconds. Long-running DML queries in your workload extend the blocking period, because the DDL must wait until those queries complete. The DDL waits and blocks until the `default_distributed_ddl_timeout` is reached (default value is 180000 milliseconds, or 3 minutes), after which it throws a timeout error. If you frequently run DDL statements and have many long-running queries, this negatively impacts the workload, since each DDL operation blocks other queries from starting while it waits for the long-running queries to complete.
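You can inspect the current timeout value on an aggregator (a sketch; the variable name is the one described above):

```sql
SHOW VARIABLES LIKE 'default_distributed_ddl_timeout';
```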
Increasing the timeout duration for a DDL operation is not recommended, since it delays the start of the operation. While the DDL operation is waiting to start, it continues to block new queries from executing. Therefore, raising the timeout can cause longer delays for regular queries in your workload.
- Optimize your workload. If you have long-running queries, you can optimize your current workload by running frequent maintenance and backups, or by rescheduling parts of the workload. For example, run the analytical queries in your workload during the lowest-traffic period. Additionally, you can use cluster replication to replicate the data to a secondary read-only cluster, and then run all long-running or analytical queries on the secondary cluster. This drastically reduces the load on the primary cluster and improves workload performance.
- Use the `TRUNCATE` command instead of the `DELETE` command to remove rows from a table. `TRUNCATE` is the preferred method if you want to remove all rows of a large table, since it instantly frees up the memory. In contrast, `DELETE` does not instantly drop the deleted rows from memory; it marks the rows for deletion, and a background garbage-collection job periodically clears them out. Each marked row continues to consume memory, which may result in memory allocation issues for a large table.
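As a sketch (the `events` table name is hypothetical):

```sql
-- Removes all rows and frees their memory immediately:
TRUNCATE TABLE events;

-- Equivalent intent, but rows are only marked for deletion and
-- reclaimed later by the background garbage collector:
-- DELETE FROM events;
```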
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
Updates and deletes in MemSQL lock the rows being updated. If the update is a singleton update (or upsert), this error means that two processes are trying to update the same row. The `lock_wait_timeout` setting defaults to 60 seconds, so a query that waits on a row lock for longer than 60 seconds fails. Normally the first query's update completes quickly and the second then proceeds; if you are seeing this error, there is serious contention. This happens most often when updating on a field that is not indexed, since MemSQL takes a very short lock on every row while scanning the entire table to see which rows match the `WHERE` clause of the update.
- If you’re experiencing contention with singleton transactions, the solution is to add indexes. For example, if the following query generates a lock wait timeout exceeded error:

```sql
DELETE FROM bucket WHERE meta_uid_id = ?;
```

To address the error, add an index on `meta_uid_id`.
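A minimal sketch of the fix (assuming `bucket` is a rowstore table; the index name is ours):

```sql
CREATE INDEX idx_meta_uid_id ON bucket (meta_uid_id);
```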
- If you’re experiencing contention with transactions having multiple statements, you can turn off the `multistatement_transactions` engine variable.

To check whether multi-statement transactions are on:

```sql
SHOW VARIABLES LIKE '%multistatement_transactions%';
```

To turn them off:

```sql
SET GLOBAL multistatement_transactions = off;
```

You can also apply the setting through your cluster management tool:

```bash
memsql-admin update-config --key multistatement_transactions --value off --set-global --memsql-id <aggregator_id>
```

```bash
memsql-ops memsql-update-config --key multistatement_transactions --value off --set-global <aggregator_id>
```
With multi-statement transactions turned off, write queries that touch multiple nodes commit locally as soon as processing finishes on each node. For example, if node 1 hits a duplicate key error, node 2 still commits its part of the transaction.
Open transactions hold locks on the rows they affect until they are committed or rolled back, and any other write query modifying the same rows has to wait for the open transaction to release the locks. If a query has to wait for longer than the `lock_wait_timeout` (default 60 seconds), it fails. This happens most often when the open transaction is idle and unnecessarily holding the locks.
- To resolve the issue, identify the idle transaction and kill its connection.
- To prevent the issue from recurring, ensure that all the transactions are committed or rolled back.
Resolving the Current Locking Issue
If a query has failed because it waited long enough to exceed the `lock_wait_timeout`, identify the transaction that is causing the timeout, and kill its connection. Killing the connection rolls back the uncommitted writes of the open transaction. For example, a write operation is performed in the following transaction, but it is not committed:

```sql
START TRANSACTION;
UPDATE cust SET ORDERS = (ORDERS + 1) WHERE ID > 7680;
```
In another connection, start a transaction and run a query that tries to update the same set of rows as the transaction above:

```sql
START TRANSACTION;
UPDATE cust SET ORDER_DATE = DATE_ADD(ORDER_DATE, INTERVAL 1 DAY) WHERE ID > 7680;
```

```
ERROR 1205 (HY000): Leaf Error (192.168.3.152:3307): Lock wait timeout exceeded; try restarting transaction. Lock owned by connection id 76, query `open idle transaction`
```
To get the list of all running transactions, query the `INFORMATION_SCHEMA.MV_PROCESSLIST` table. This management view contains the processlist information for all nodes in the cluster, and you can join it with the `INFORMATION_SCHEMA.MV_NODES` table to identify the node on which each process is running.
```sql
SELECT TYPE, IP_ADDR, PORT, mvp.ID, USER, HOST, COMMAND, TIME,
       TRANSACTION_STATE, ROW_LOCKS_HELD
FROM INFORMATION_SCHEMA.MV_PROCESSLIST mvp
JOIN INFORMATION_SCHEMA.MV_NODES mvn ON mvp.NODE_ID = mvn.ID
WHERE TRANSACTION_STATE = "open" AND COMMAND = "Sleep"
ORDER BY TYPE;
```

```
+------+---------------+------+-----+-------------+---------------------+---------+------+-------------------+----------------+
| TYPE | IP_ADDR       | PORT | ID  | USER        | HOST                | COMMAND | TIME | TRANSACTION_STATE | ROW_LOCKS_HELD |
+------+---------------+------+-----+-------------+---------------------+---------+------+-------------------+----------------+
| CA   | 192.168.1.146 | 3306 | 143 | root        | localhost:58322     | Sleep   | 385  | open              | 0              |
| LEAF | 192.168.3.152 | 3307 | 76  | distributed | 192.168.1.146:9555  | Sleep   | 385  | open              | 5168           |
| LEAF | 192.168.0.24  | 3307 | 125 | distributed | 192.168.1.146:7371  | Sleep   | 385  | open              | 5047           |
| LEAF | 192.168.0.24  | 3307 | 104 | distributed | 192.168.1.146:60874 | Sleep   | 385  | open              | 5001           |
| LEAF | 192.168.0.24  | 3307 | 115 | distributed | 192.168.1.146:63178 | Sleep   | 385  | open              | 5044           |
| LEAF | 192.168.3.242 | 3307 | 55  | distributed | 192.168.1.146:10118 | Sleep   | 385  | open              | 5129           |
| LEAF | 192.168.3.242 | 3307 | 77  | distributed | 192.168.1.146:21382 | Sleep   | 385  | open              | 5168           |
| LEAF | 192.168.3.242 | 3307 | 81  | distributed | 192.168.1.146:26758 | Sleep   | 385  | open              | 4998           |
| LEAF | 192.168.3.242 | 3307 | 122 | distributed | 192.168.1.146:55942 | Sleep   | 385  | open              | 5021           |
| LEAF | 192.168.0.24  | 3307 | 75  | distributed | 192.168.1.146:40650 | Sleep   | 385  | open              | 5015           |
| LEAF | 192.168.1.100 | 3307 | 131 | distributed | 192.168.1.146:23998 | Sleep   | 385  | open              | 5013           |
| LEAF | 192.168.3.152 | 3307 | 78  | distributed | 192.168.1.146:11603 | Sleep   | 385  | open              | 5124           |
| LEAF | 192.168.3.152 | 3307 | 83  | distributed | 192.168.1.146:21331 | Sleep   | 385  | open              | 5018           |
| LEAF | 192.168.3.152 | 3307 | 120 | distributed | 192.168.1.146:39763 | Sleep   | 385  | open              | 5092           |
| LEAF | 192.168.1.100 | 3307 | 79  | distributed | 192.168.1.146:1726  | Sleep   | 385  | open              | 5090           |
| LEAF | 192.168.1.100 | 3307 | 82  | distributed | 192.168.1.146:6078  | Sleep   | 385  | open              | 5065           |
| LEAF | 192.168.1.100 | 3307 | 114 | distributed | 192.168.1.146:21438 | Sleep   | 385  | open              | 5160           |
| MA   | 192.168.0.231 | 3306 | 490 | root        | localhost:32737     | Sleep   | 73   | open              | 0              |
+------+---------------+------+-----+-------------+---------------------+---------+------+-------------------+----------------+
```
In the result, each aggregator and each partition on each leaf has one idle open transaction that is holding row locks. The locks are only held on the leaves, because that is where the data is stored. Look for an aggregator process with the same run time as the leaf processes that are holding locks. In the result above, it is the process with connection ID `143`, which is running on the child aggregator on host `192.168.1.146:3306`. To resolve the current issue, connect to this aggregator and kill the connection with that ID:

```sql
KILL CONNECTION 143;
```
Alternatively, you can kill any open connections that have been idle for longer than the `lock_wait_timeout` (60 seconds by default).
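A sketch of finding such connections, built on the same management views used above (the 60-second threshold matches the default `lock_wait_timeout`):

```sql
SELECT TYPE, IP_ADDR, PORT, mvp.ID, TIME
FROM INFORMATION_SCHEMA.MV_PROCESSLIST mvp
JOIN INFORMATION_SCHEMA.MV_NODES mvn ON mvp.NODE_ID = mvn.ID
WHERE TRANSACTION_STATE = "open"
  AND COMMAND = "Sleep"
  AND TIME > 60;
```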
Note: Although the error message indicates that the lock is owned by the connection with ID `76`, that is not the connection to kill. It belongs to one of the distributed leaf processes, which merely holds the lock our query was waiting on. Instead, kill the connection for the aggregator process, which rolls back the entire transaction.
The original connection on the child aggregator that was updating `ORDERS` is now closed, and the transaction is rolled back. If you try to use that session again, it automatically re-establishes a new connection with a new ID:

```sql
SELECT COUNT(*) FROM cust;
```

```
ERROR 2013 (HY000): Lost connection to MySQL server during query
```

```sql
SELECT COUNT(*) FROM orders;
```

```
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...
Connection id:    160
Current database: db1

+----------+
| COUNT(*) |
+----------+
|   390955 |
+----------+
```
Preventing the Issue from Recurring
To prevent the issue from recurring, inspect the application code and ensure that all the transactions are either committed or rolled back, including the transactions in exception handling. Additionally, all manually run transactions must be committed or rolled back as well.
A common cause of this error is if the transaction is executed as part of a stored procedure, but exception handling causes the final commit to get skipped. For more information, see Transactions in Stored Procedures.
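One way to guarantee that every application-side transaction finishes is a small wrapper that always commits or rolls back; a minimal Python sketch for any DB-API-style connection (the `transaction` helper name is ours, not part of any driver):

```python
from contextlib import contextmanager

@contextmanager
def transaction(conn):
    """Commit on success, roll back on any error, so the
    transaction never stays open holding row locks."""
    try:
        yield conn
        conn.commit()
    except Exception:
        conn.rollback()
        raise

# Usage (sketch):
#   with transaction(conn) as c:
#       c.cursor().execute("UPDATE cust SET ORDERS = ORDERS + 1 WHERE ID > 7680")
```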
The impact of this error can be mitigated by setting a session timeout for the client, in order to automatically kill the idle transactions. When a client ends a session, the database rolls back the uncommitted transactions and closes the connection on its end. If the session timeout value is set lower than the lock wait timeout (default 60 seconds), then idle connections will not cause lock wait timeouts to be breached.
Note: Setting a query timeout in Resource Governance will not affect idle transactions. The query timeout limit only applies to executing queries.
ERROR 1706 (HY000): Feature ‘Multi-table UPDATE/DELETE with a reference table as target table’ is not supported by MemSQL
Updating a reference table based on a join with a sharded table returns an error. This is because reference tables are only updatable on the master aggregator while joins against sharded tables must run on the leaves, so the query cannot run like normal updates.
Use the same table type for both the target table and the joined table: either make both sharded tables, or make both reference tables.
Alternatively, you can use an `UPSERT` with a reference table that has a primary key. For example, if `t1` is a reference table with a primary key, then the following query runs successfully (the `ON DUPLICATE KEY UPDATE` assignment shown is one valid completion):

```sql
INSERT INTO t1
SELECT t1.* FROM t1 JOIN t2 ON t1.b = t2.b
ON DUPLICATE KEY UPDATE b = VALUES(b);
```
For more details, refer to the UPDATE topic.
Turning off multi-statement transactions has some caveats:

- Per-partition components of a distributed transaction commit independently, without waiting for other partitions to finish and potentially trigger a rollback. This removes the common row-lock deadlock (transaction A waiting for transaction B on partition 1, while B waits for A on partition 2), at the cost of potentially letting failed queries commit partial results.
- If the values inserted depend on multiple statements executing concurrently, inconsistent data may be inserted.
- In case of failure, it is not possible to roll back an entire multi-upsert inside one transaction.
ERROR 1706 (HY000): Leaf Error (127.0.0.1:3307): Feature ‘INSERT IGNORE … ON DUPLICATE KEY UPDATE’ is not supported by MemSQL
When you use the `IGNORE` modifier in an `INSERT` statement, MemSQL ignores records with duplicate key entries and only inserts records with unique key values. The `ON DUPLICATE KEY UPDATE` clause, instead of ignoring the insert, performs an update on the row with the duplicate key value. Using both in the same query leads to a dilemma: whether to ignore the insert or update the record. To prevent any ambiguity in data, MemSQL returns an error instead of choosing one operation over the other.
Use either the `IGNORE` modifier or the `ON DUPLICATE KEY UPDATE` clause in `INSERT` statements, but not both. For more information, see INSERT.
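As a sketch of the two valid alternatives (the `t` table and its columns are hypothetical):

```sql
-- Either skip rows whose key already exists:
INSERT IGNORE INTO t (id, val) VALUES (1, 'a');

-- Or update the existing row on a key collision:
INSERT INTO t (id, val) VALUES (1, 'a')
ON DUPLICATE KEY UPDATE val = VALUES(val);
```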