Monitoring Replication
SHOW PARTITIONS EXTENDED
Running SHOW PARTITIONS EXTENDED on secondary-MA displays information such as the replication role, the location of each partition, and whether the partition is locked.
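For example, assuming the replicated database is named db_name (the name used in Setting Up Replication), a minimal invocation on secondary-MA might look like this:
USE db_name;
SHOW PARTITIONS EXTENDED;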
SHOW CLUSTER STATUS
Running SHOW CLUSTER STATUS provides information such as the log replay position and detailed information about all databases in the cluster.
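For example, on secondary-MA:
SHOW CLUSTER STATUS;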
SHOW DATABASES EXTENDED
SHOW DATABASES EXTENDED is another useful command for monitoring replication status. The output summarizes the replication status and other information about the state of the databases present in a SingleStoreDB cluster.
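For example, running it on secondary-MA summarizes the state of every database in the secondary cluster, including the replicated db_name database:
SHOW DATABASES EXTENDED;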
SHOW REPLICATION STATUS
Running SHOW REPLICATION STATUS on a node shows the status of every replication process running on that node. However, it is recommended to use the information schema views mv_aggregated_replication_status, mv_replication_status, and lmv_replication_status instead to monitor DR replication progress. These views provide an aggregated replication status for each database (including its partitions) and for the replication links between the primary and secondary clusters, so you can see whether replication is lagging. See Monitoring Replication Progress from the Primary to Secondary Cluster below for details.
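For instance, the per-link DR replication status can be retrieved on any aggregator with a query along these lines (the views and their columns are shown in the examples later in this section):
SELECT * FROM information_schema.MV_REPLICATION_STATUS WHERE IS_DR = 1;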
The following is an example of the output of SHOW REPLICATION STATUS run on secondary-MA. Note that this example follows the naming conventions established in Setting Up Replication.
SHOW REPLICATION STATUS;
****
+---------------+------------------------------+-------------------------+--------------+------------------+--------------------+------------------+----------------+----------------+-----------+----------------------------------------------+---------------+-------------------+---------------------+-------------------+-----------------+-----------------+
| Role          | Database                     | Master_URI              | Master_State | Master_CommitLSN | Master_HardenedLSN | Master_ReplayLSN | Master_TailLSN | Master_Commits | Connected | Replica_URI                                  | Replica_State | Replica_CommitLSN | Replica_HardenedLSN | Replica_ReplayLSN | Replica_TailLSN | Replica_Commits |
+---------------+------------------------------+-------------------------+--------------+------------------+--------------------+------------------+----------------+----------------+-----------+----------------------------------------------+---------------+-------------------+---------------------+-------------------+-----------------+-----------------+
| master        | cluster                      | NULL                    | online       | 0:37             | 0:37               | 0:0              | 0:37           | 34             | yes       | 127.0.0.1:20002/cluster                      | replicating   | 0:36              | 0:37                | 0:36              | 0:37            | 17              |
| master        | cluster                      | NULL                    | online       | 0:37             | 0:37               | 0:0              | 0:37           | 34             | yes       | 127.0.0.1:20001/cluster                      | replicating   | 0:36              | 0:37                | 0:36              | 0:37            | 17              |
| async replica | cluster_17639882876507016380 | 127.0.0.1:10000/cluster | online       | 0:37             | 0:37               | 0:0              | 0:37           | 33             | yes       | NULL                                         | replicating   | 0:36              | 0:37                | 0:36              | 0:37            | 16              |
| master        | cluster_17639882876507016380 | NULL                    | replicating  | 0:36             | 0:37               | 0:36             | 0:37           | 16             | yes       | 127.0.0.1:20002/cluster_17639882876507016380 | replicating   | 0:36              | 0:37                | 0:36              | 0:37            | 16              |
| master        | cluster_17639882876507016380 | NULL                    | replicating  | 0:36             | 0:37               | 0:36             | 0:37           | 16             | yes       | 127.0.0.1:20001/cluster_17639882876507016380 | replicating   | 0:36              | 0:37                | 0:36              | 0:37            | 16              |
| async replica | db_name                      | 127.0.0.1:10000/db_name | online       | 0:683            | 0:683              | 0:683            | 0:683          | 8              | yes       | NULL                                         | replicating   | 0:683             | 0:683               | 0:683             | 0:683           | 8               |
| master        | db_name                      | NULL                    | replicating  | 0:683            | 0:683              | 0:683            | 0:683          | 8              | yes       | 127.0.0.1:20002/db_name                      | replicating   | 0:683             | 0:683               | 0:683             | 0:683           | 8               |
| master        | db_name                      | NULL                    | replicating  | 0:683            | 0:683              | 0:683            | 0:683          | 8              | yes       | 127.0.0.1:20001/db_name                      | replicating   | 0:683             | 0:683               | 0:683             | 0:683           | 8               |
+---------------+------------------------------+-------------------------+--------------+------------------+--------------------+------------------+----------------+----------------+-----------+----------------------------------------------+---------------+-------------------+---------------------+-------------------+-----------------+-----------------+
8 rows in set (0.03 sec)
In this example, the first line describes replication of the sharding database on primary-MA to the cluster_17639882876507016380 database on secondary-MA. The sharding database exists on the master aggregator and stores metadata that defines how data is partitioned. REPLICATE DATABASE automatically creates a cluster_[hash] database on the secondary cluster, which stores partition metadata about the primary cluster. The second line describes replication of metadata and reference tables for the db_name database in the secondary cluster. This data is replicated asynchronously to all aggregators and leaves. The third and fourth lines describe replication of db_name metadata and reference tables from secondary-MA to the secondary cluster's two leaf nodes (secondary-L1 and secondary-L2).
The LSN values in the output use the format [log file ordinal]:[byte offset into log file]. For example, 0:683 denotes byte offset 683 in log file 0.
The following is the output of SHOW REPLICATION STATUS run on secondary-L1. In this example, db_name_[ordinal] refers to a partition of the sharded db_name database.
SHOW REPLICATION STATUS;
****
+---------------+------------------------------+----------------------------------------------+--------------+------------------+--------------------+------------------+----------------+----------------+-----------+----------------------------+---------------+-------------------+---------------------+-------------------+-----------------+-----------------+
| Role          | Database                     | Master_URI                                   | Master_State | Master_CommitLSN | Master_HardenedLSN | Master_ReplayLSN | Master_TailLSN | Master_Commits | Connected | Replica_URI                | Replica_State | Replica_CommitLSN | Replica_HardenedLSN | Replica_ReplayLSN | Replica_TailLSN | Replica_Commits |
+---------------+------------------------------+----------------------------------------------+--------------+------------------+--------------------+------------------+----------------+----------------+-----------+----------------------------+---------------+-------------------+---------------------+-------------------+-----------------+-----------------+
| async replica | cluster                      | 127.0.0.1:20000/cluster                      | online       | 0:37             | 0:37               | 0:0              | 0:37           | 34             | yes       | NULL                       | replicating   | 0:36              | 0:37                | 0:36              | 0:37            | 17              |
| async replica | cluster_17639882876507016380 | 127.0.0.1:20000/cluster_17639882876507016380 | replicating  | 0:36             | 0:37               | 0:36             | 0:37           | 16             | yes       | NULL                       | replicating   | 0:36              | 0:37                | 0:36              | 0:37            | 16              |
| async replica | db_name                      | 127.0.0.1:20000/db_name                      | replicating  | 0:683            | 0:683              | 0:683            | 0:683          | 8              | yes       | NULL                       | replicating   | 0:683             | 0:683               | 0:683             | 0:683           | 8               |
| async replica | db_name_0                    | 127.0.0.1:10001/db_name_0                    | online       | 0:30423          | 0:30423            | 0:30423          | 0:30423        | 1778           | yes       | NULL                       | replicating   | 0:30423           | 0:30423             | 0:30423           | 0:30423         | 1778            |
| master        | db_name_0                    | NULL                                         | replicating  | 0:30423          | 0:30423            | 0:30423          | 0:30423        | 1778           | yes       | 127.0.0.1:20001/db_name_0  | replicating   | 0:30423           | 0:30423             | 0:30423           | 0:30423         | 1778            |
| async replica | db_name_0_AUTO_REPLICA       | 127.0.0.1:20001/db_name_0_AUTO_REPLICA       | replicating  | 0:30423          | 0:30423            | 0:30423          | 0:30423        | 1778           | yes       | NULL                       | replicating   | 0:30423           | 0:30423             | 0:30423           | 0:30423         | 1778            |
| async replica | db_name_10                   | 127.0.0.1:10001/db_name_10                   | online       | 0:29835          | 0:29835            | 0:29835          | 0:29835        | 1766           | yes       | NULL                       | replicating   | 0:29835           | 0:29835             | 0:29835           | 0:29835         | 1766            |
| master        | db_name_10                   | NULL                                         | replicating  | 0:29835          | 0:29835            | 0:29835          | 0:29835        | 1766           | yes       | 127.0.0.1:20001/db_name_10 | replicating   | 0:29835           | 0:29835             | 0:29835           | 0:29835         | 1766            |
| async replica | db_name_10_AUTO_REPLICA      | 127.0.0.1:20001/db_name_10_AUTO_REPLICA      | replicating  | 0:29835          | 0:29835            | 0:29835          | 0:29835        | 1766           | yes       | NULL                       | replicating   | 0:29835           | 0:29835             | 0:29835           | 0:29835         | 1766            |
| async replica | db_name_12                   | 127.0.0.1:10001/db_name_12                   | online       | 0:29773          | 0:29773            | 0:29773          | 0:29773        | 1747           | yes       | NULL                       | replicating   | 0:29773           | 0:29773             | 0:29773           | 0:29773         | 1747            |
| master        | db_name_12                   | NULL                                         | replicating  | 0:29773          | 0:29773            | 0:29773          | 0:29773        | 1747           | yes       | 127.0.0.1:20001/db_name_12 | replicating   | 0:29773           | 0:29773             | 0:29773           | 0:29773         | 1747            |
| async replica | db_name_12_AUTO_REPLICA      | 127.0.0.1:20001/db_name_12_AUTO_REPLICA      | replicating  | 0:29773          | 0:29773            | 0:29773          | 0:29773        | 1747           | yes       | NULL                       | replicating   | 0:29773           | 0:29773             | 0:29773           | 0:29773         | 1747            |
| async replica | db_name_14                   | 127.0.0.1:10001/db_name_14                   | online       | 0:29476          | 0:29476            | 0:29476          | 0:29476        | 1736           | yes       | NULL                       | replicating   | 0:29476           | 0:29476             | 0:29476           | 0:29476         | 1736            |
| master        | db_name_14                   | NULL                                         | replicating  | 0:29476          | 0:29476            | 0:29476          | 0:29476        | 1736           | yes       | 127.0.0.1:20001/db_name_14 | replicating   | 0:29476           | 0:29476             | 0:29476           | 0:29476         | 1736            |
| async replica | db_name_14_AUTO_REPLICA      | 127.0.0.1:20001/db_name_14_AUTO_REPLICA      | replicating  | 0:29476          | 0:29476            | 0:29476          | 0:29476        | 1736           | yes       | NULL                       | replicating   | 0:29476           | 0:29476             | 0:29476           | 0:29476         | 1736            |
| async replica | db_name_2                    | 127.0.0.1:10001/db_name_2                    | online       | 0:29188          | 0:29188            | 0:29188          | 0:29188        | 1696           | yes       | NULL                       | replicating   | 0:29188           | 0:29188             | 0:29188           | 0:29188         | 1696            |
| master        | db_name_2                    | NULL                                         | replicating  | 0:29188          | 0:29188            | 0:29188          | 0:29188        | 1696           | yes       | 127.0.0.1:20001/db_name_2  | replicating   | 0:29188           | 0:29188             | 0:29188           | 0:29188         | 1696            |
| async replica | db_name_2_AUTO_REPLICA       | 127.0.0.1:20001/db_name_2_AUTO_REPLICA       | replicating  | 0:29188          | 0:29188            | 0:29188          | 0:29188        | 1696           | yes       | NULL                       | replicating   | 0:29188           | 0:29188             | 0:29188           | 0:29188         | 1696            |
| async replica | db_name_4                    | 127.0.0.1:10001/db_name_4                    | online       | 0:30611          | 0:30611            | 0:30611          | 0:30611        | 1798           | yes       | NULL                       | replicating   | 0:30611           | 0:30611             | 0:30611           | 0:30611         | 1798            |
| master        | db_name_4                    | NULL                                         | replicating  | 0:30611          | 0:30611            | 0:30611          | 0:30611        | 1798           | yes       | 127.0.0.1:20001/db_name_4  | replicating   | 0:30611           | 0:30611             | 0:30611           | 0:30611         | 1798            |
| async replica | db_name_4_AUTO_REPLICA       | 127.0.0.1:20001/db_name_4_AUTO_REPLICA       | replicating  | 0:30611          | 0:30611            | 0:30611          | 0:30611        | 1798           | yes       | NULL                       | replicating   | 0:30611           | 0:30611             | 0:30611           | 0:30611         | 1798            |
| async replica | db_name_6                    | 127.0.0.1:10001/db_name_6                    | online       | 0:30573          | 0:30573            | 0:30573          | 0:30573        | 1797           | yes       | NULL                       | replicating   | 0:30573           | 0:30573             | 0:30573           | 0:30573         | 1797            |
| master        | db_name_6                    | NULL                                         | replicating  | 0:30573          | 0:30573            | 0:30573          | 0:30573        | 1797           | yes       | 127.0.0.1:20001/db_name_6  | replicating   | 0:30573           | 0:30573             | 0:30573           | 0:30573         | 1797            |
| async replica | db_name_6_AUTO_REPLICA       | 127.0.0.1:20001/db_name_6_AUTO_REPLICA       | replicating  | 0:30573          | 0:30573            | 0:30573          | 0:30573        | 1797           | yes       | NULL                       | replicating   | 0:30573           | 0:30573             | 0:30573           | 0:30573         | 1797            |
| async replica | db_name_8                    | 127.0.0.1:10001/db_name_8                    | online       | 0:29812          | 0:29812            | 0:29812          | 0:29812        | 1735           | yes       | NULL                       | replicating   | 0:29812           | 0:29812             | 0:29812           | 0:29812         | 1735            |
| master        | db_name_8                    | NULL                                         | replicating  | 0:29812          | 0:29812            | 0:29812          | 0:29812        | 1735           | yes       | 127.0.0.1:20001/db_name_8  | replicating   | 0:29812           | 0:29812             | 0:29812           | 0:29812         | 1735            |
| async replica | db_name_8_AUTO_REPLICA       | 127.0.0.1:20001/db_name_8_AUTO_REPLICA       | replicating  | 0:29812          | 0:29812            | 0:29812          | 0:29812        | 1735           | yes       | NULL                       | replicating   | 0:29812           | 0:29812             | 0:29812           | 0:29812         | 1735            |
+---------------+------------------------------+----------------------------------------------+--------------+------------------+--------------------+------------------+----------------+----------------+-----------+----------------------------+---------------+-------------------+---------------------+-------------------+-----------------+-----------------+
In this sample output, the first line refers to replication of the reference database (metadata) for db_name. This data is replicated from primary-MA to secondary-MA, from which it is replicated to each leaf node in the secondary cluster. The remaining lines refer to replication of the partitions of the sharded db_name database. As you can see, this data is replicated directly from leaf nodes in the primary cluster to leaf nodes in the secondary cluster. In this example, secondary-L1 is receiving data from both primary-L1 and primary-L2.
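One way to confirm this from secondary-L1 itself is to list the distinct primary-side endpoints of its DR replication links. This is only a sketch: it assumes the lmv_replication_status view described in the next section, with the PRIMARY_URI and IS_DR columns shown in the later examples:
SELECT DISTINCT PRIMARY_URI
FROM information_schema.LMV_REPLICATION_STATUS
WHERE IS_DR = 1;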
Finally, note that SingleStoreDB will automatically take the steps necessary to ensure the secondary cluster is consistent with the primary cluster. For example, in a primary cluster with redundancy 2, a replica partition on the secondary cluster can get ahead of a replica partition on the primary cluster (due to network or other irregularity). If that primary-cluster replica partition is later promoted to master, SingleStoreDB will automatically drop and reprovision the replica partition on the secondary cluster so that it is consistent with the recently promoted master partition on the primary cluster. Please note that a dropped or reprovisioning replica partition will not appear in the SHOW REPLICATION STATUS output.
Monitoring Replication Progress from the Primary to Secondary Cluster
You can view the progress of replication between the primary and secondary clusters by querying the following information schema views:
lmv_replication_status: This view can be accessed from any node. It provides the replication status of every replication link on the node from which the view is accessed. In inter-cluster replication, a replication link is either an instance of replication between an aggregator on the primary cluster and an aggregator on the secondary cluster, or an instance of replication between a leaf partition on the primary cluster and its counterpart on the secondary cluster. For each replication link, the view shows statistics such as the replication status, lag volume (in MB), total number of lagging LSNs, replication throughput, and the estimated time lag. A sample query against this view appears after this list.
mv_replication_status: This view can be accessed from any aggregator. It provides the replication status of every replication link on every node in the cluster. For each replication link, it provides the same information as the lmv_replication_status view.
mv_aggregated_replication_status: This view can be accessed from any aggregator. It provides the aggregated replication status for each database in the cluster. In the view, the partitions of a database are aggregated into one row, where each row corresponds to a specific type of replication (DR or non-DR).
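As a sketch of how these views can be queried, the following, run on any node, reports the lag statistics of that node's local replication links; it assumes lmv_replication_status exposes the same columns as mv_replication_status (shown in the example output later in this section):
SELECT DATABASE_NAME, PRIMARY_URI, SECONDARY_URI, LSN_LAG,
       VOLUME_LAG_MB, REPLICATION_THROUGHPUT_MBPS, ESTIMATED_CATCHUP_TIME_S
FROM information_schema.LMV_REPLICATION_STATUS;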
You can tell whether replication has caught up on the secondary cluster by checking the values of the LSN_LAG and ESTIMATED_CATCHUP_TIME_S columns. If either of those columns has a value of 0, replication has caught up on the secondary cluster.
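For example, to list any DR databases that have not yet caught up, you could filter on those columns (a sketch using the mv_aggregated_replication_status columns shown in the example below):
SELECT DATABASE_NAME, LSN_LAG, VOLUME_LAG_MB, ESTIMATED_CATCHUP_TIME_S
FROM information_schema.MV_AGGREGATED_REPLICATION_STATUS
WHERE IS_DR = 1 AND LSN_LAG > 0;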
Example
Consider two SingleStoreDB clusters, db as the primary cluster and db1 as the secondary cluster. To replicate db and view its replication progress, perform the following steps:
Connect to the master aggregator of the secondary cluster db1 and execute the following command to replicate db:
REPLICATE DATABASE db FROM root:'memsqlroot1'@192.168.0.1:3306/db;
Query the mv_replication_status view in the information_schema database to view the replication status of all database partitions and replication links between the primary and the secondary cluster:
SELECT * FROM information_schema.MV_REPLICATION_STATUS WHERE database_name LIKE 'db%' AND IS_DR = 1;
****
+-------+---------+---------------+----------------------+----------------------+---------------+-----------------+---------+---------------+-----------------------------+--------------------------+
| IS_DR | IS_SYNC | DATABASE_NAME | PRIMARY_URI          | SECONDARY_URI        | PRIMARY_STATE | SECONDARY_STATE | LSN_LAG | VOLUME_LAG_MB | REPLICATION_THROUGHPUT_MBPS | ESTIMATED_CATCHUP_TIME_S |
+-------+---------+---------------+----------------------+----------------------+---------------+-----------------+---------+---------------+-----------------------------+--------------------------+
| 1     | 0       | db            | 127.0.0.1:10000/db   | 127.0.0.1:20000/db   | online        | replicating     | 0       | 0.0           | 0.008                       | 0.0                      |
| 1     | 0       | db_0          | 127.0.0.1:10001/db_0 | 127.0.0.1:20001/db_0 | online        | replicating     | 19561   | 80.122        | 28.099                      | 2.851                    |
| 1     | 0       | db_1          | 127.0.0.1:10002/db_1 | 127.0.0.1:20002/db_1 | online        | replicating     | 18804   | 77.021        | 18.125                      | 4.249                    |
+-------+---------+---------------+----------------------+----------------------+---------------+-----------------+---------+---------------+-----------------------------+--------------------------+
Note: A DATABASE_NAME with a suffix such as db_0 or db_1 indicates a database partition, while a name without a suffix indicates the replication link for the master aggregator. Also, IS_DR = 1 indicates replication between clusters.
To view the DR replication link between the primary master aggregator and the secondary master aggregator of the database db, and to check whether it is lagging, query the mv_replication_status view as follows:
SELECT * FROM information_schema.MV_REPLICATION_STATUS WHERE primary_uri = '192.168.0.1:3306/db';
****
+-------+---------+---------------+--------------------+--------------------+---------------+-----------------+---------+---------------+-----------------------------+--------------------------+
| IS_DR | IS_SYNC | DATABASE_NAME | PRIMARY_URI        | SECONDARY_URI      | PRIMARY_STATE | SECONDARY_STATE | LSN_LAG | VOLUME_LAG_MB | REPLICATION_THROUGHPUT_MBPS | ESTIMATED_CATCHUP_TIME_S |
+-------+---------+---------------+--------------------+--------------------+---------------+-----------------+---------+---------------+-----------------------------+--------------------------+
| 1     | 0       | db            | 192.168.0.1:3306/db| 127.0.0.1:10000/db | online        | replicating     | 0       | 0.000         | 0.000                       | 0.000                    |
+-------+---------+---------------+--------------------+--------------------+---------------+-----------------+---------+---------------+-----------------------------+--------------------------+
Query the mv_aggregated_replication_status view in the information_schema database to view the aggregated replication lag of the entire database db (including partitions) between the primary and secondary cluster:
SELECT * FROM information_schema.MV_AGGREGATED_REPLICATION_STATUS WHERE database_name = 'db' AND IS_DR = 1;
****
+-------+---------+---------------+--------------------+--------------------+---------------+-----------------+---------+---------------+-----------------------------+--------------------------+
| IS_DR | IS_SYNC | DATABASE_NAME | PRIMARY_URI        | SECONDARY_URI      | PRIMARY_STATE | SECONDARY_STATE | LSN_LAG | VOLUME_LAG_MB | REPLICATION_THROUGHPUT_MBPS | ESTIMATED_CATCHUP_TIME_S |
+-------+---------+---------------+--------------------+--------------------+---------------+-----------------+---------+---------------+-----------------------------+--------------------------+
| 1     | 0       | db            | 127.0.0.1:10000/db | 127.0.0.1:20000/db | online        | replicating     | 45895   | 187.986       | 3.227                       | 7339.667                 |
+-------+---------+---------------+--------------------+--------------------+---------------+-----------------+---------+---------------+-----------------------------+--------------------------+
Note: IS_DR = 1 indicates replication between clusters.