Monitoring Replication
SHOW PARTITIONS EXTENDED
Running SHOW PARTITIONS EXTENDED on secondary-MA displays information such as the replication role, the location of each partition, and whether the partition is locked.
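For example, assuming the replicated database is named db_name (the name used in Setting Up Replication), a minimal invocation on secondary-MA might look like this:
USE db_name;
SHOW PARTITIONS EXTENDED;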
SHOW CLUSTER STATUS
Running SHOW CLUSTER STATUS provides information such as the log replay position and detailed information about all databases in the cluster.
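For example, on secondary-MA:
SHOW CLUSTER STATUS;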
SHOW DATABASES EXTENDED
SHOW DATABASES EXTENDED is another useful command for monitoring replication status. The output summarizes the replication status and other information about the state of the databases present in a SingleStoreDB cluster.
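For example, running it on secondary-MA summarizes the state of every database in the secondary cluster, including the replicated db_name database:
SHOW DATABASES EXTENDED;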
SHOW REPLICATION STATUS
Running SHOW REPLICATION STATUS on a node shows the status of every replication process running on that node. However, it is recommended to use the information schema views mv_aggregated_replication_status, mv_replication_status, and lmv_replication_status instead to monitor DR replication progress. These views provide an aggregated replication status for each database (including its partitions) and for the replication links between the primary and secondary clusters, so you can see whether replication is lagging. See Monitoring Replication Progress from the Primary to Secondary Cluster below for details.
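For instance, the per-link DR replication status can be retrieved on any aggregator with a query along these lines (the views and their columns are shown in the examples later in this section):
SELECT * FROM information_schema.MV_REPLICATION_STATUS WHERE IS_DR = 1;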
The following is an example of the output of SHOW REPLICATION STATUS run on secondary-MA. Note that this example follows the naming conventions established in Setting Up Replication.
SHOW REPLICATION STATUS;
****
+---------------+------------------------------+-------------------------+--------------+------------------+--------------------+------------------+----------------+----------------+-----------+----------------------------------------------+---------------+-------------------+---------------------+-------------------+-----------------+-----------------+
| Role          | Database                     | Master_URI              | Master_State | Master_CommitLSN | Master_HardenedLSN | Master_ReplayLSN | Master_TailLSN | Master_Commits | Connected | Replica_URI                                  | Replica_State | Replica_CommitLSN | Replica_HardenedLSN | Replica_ReplayLSN | Replica_TailLSN | Replica_Commits |
+---------------+------------------------------+-------------------------+--------------+------------------+--------------------+------------------+----------------+----------------+-----------+----------------------------------------------+---------------+-------------------+---------------------+-------------------+-----------------+-----------------+
| master        | cluster                      | NULL                    | online       | 0:37             | 0:37               | 0:0              | 0:37           | 34             | yes       | 127.0.0.1:20002/cluster                      | replicating   | 0:36              | 0:37                | 0:36              | 0:37            | 17              |
| master        | cluster                      | NULL                    | online       | 0:37             | 0:37               | 0:0              | 0:37           | 34             | yes       | 127.0.0.1:20001/cluster                      | replicating   | 0:36              | 0:37                | 0:36              | 0:37            | 17              |
| async replica | cluster_17639882876507016380 | 127.0.0.1:10000/cluster | online       | 0:37             | 0:37               | 0:0              | 0:37           | 33             | yes       | NULL                                         | replicating   | 0:36              | 0:37                | 0:36              | 0:37            | 16              |
| master        | cluster_17639882876507016380 | NULL                    | replicating  | 0:36             | 0:37               | 0:36             | 0:37           | 16             | yes       | 127.0.0.1:20002/cluster_17639882876507016380 | replicating   | 0:36              | 0:37                | 0:36              | 0:37            | 16              |
| master        | cluster_17639882876507016380 | NULL                    | replicating  | 0:36             | 0:37               | 0:36             | 0:37           | 16             | yes       | 127.0.0.1:20001/cluster_17639882876507016380 | replicating   | 0:36              | 0:37                | 0:36              | 0:37            | 16              |
| async replica | db_name                      | 127.0.0.1:10000/db_name | online       | 0:683            | 0:683              | 0:683            | 0:683          | 8              | yes       | NULL                                         | replicating   | 0:683             | 0:683               | 0:683             | 0:683           | 8               |
| master        | db_name                      | NULL                    | replicating  | 0:683            | 0:683              | 0:683            | 0:683          | 8              | yes       | 127.0.0.1:20002/db_name                      | replicating   | 0:683             | 0:683               | 0:683             | 0:683           | 8               |
| master        | db_name                      | NULL                    | replicating  | 0:683            | 0:683              | 0:683            | 0:683          | 8              | yes       | 127.0.0.1:20001/db_name                      | replicating   | 0:683             | 0:683               | 0:683             | 0:683           | 8               |
+---------------+------------------------------+-------------------------+--------------+------------------+--------------------+------------------+----------------+----------------+-----------+----------------------------------------------+---------------+-------------------+---------------------+-------------------+-----------------+-----------------+
8 rows in set (0.03 sec)
In this example, the first line describes replication of the sharding database on primary-MA to the cluster_17639882876507016380 database on secondary-MA. The sharding database exists on the master aggregator and stores metadata that defines how data is partitioned. REPLICATE DATABASE automatically creates a cluster_[hash] database on the secondary cluster, which stores partition metadata about the primary cluster. The second line describes replication of metadata and reference tables for the db_name database in the secondary cluster. This data is replicated asynchronously to all aggregators and leaves. The third and fourth lines describe replication of db_name metadata and reference tables from secondary-MA to the secondary cluster's two leaf nodes (secondary-L1 and secondary-L2).
The LSN values in the output use the format [log file ordinal]:[byte offset into log file]. For example, 0:683 denotes byte offset 683 in log file 0.
The following is the output of SHOW REPLICATION STATUS run on secondary-L1. In this example, db_name_[ordinal] refers to a partition of the sharded db_name database.
SHOW REPLICATION STATUS;
****
+---------------+------------------------------+----------------------------------------------+--------------+------------------+--------------------+------------------+----------------+----------------+-----------+----------------------------+---------------+-------------------+---------------------+-------------------+-----------------+-----------------+
| Role          | Database                     | Master_URI                                   | Master_State | Master_CommitLSN | Master_HardenedLSN | Master_ReplayLSN | Master_TailLSN | Master_Commits | Connected | Replica_URI                | Replica_State | Replica_CommitLSN | Replica_HardenedLSN | Replica_ReplayLSN | Replica_TailLSN | Replica_Commits |
+---------------+------------------------------+----------------------------------------------+--------------+------------------+--------------------+------------------+----------------+----------------+-----------+----------------------------+---------------+-------------------+---------------------+-------------------+-----------------+-----------------+
| async replica | cluster                      | 127.0.0.1:20000/cluster                      | online       | 0:37             | 0:37               | 0:0              | 0:37           | 34             | yes       | NULL                       | replicating   | 0:36              | 0:37                | 0:36              | 0:37            | 17              |
| async replica | cluster_17639882876507016380 | 127.0.0.1:20000/cluster_17639882876507016380 | replicating  | 0:36             | 0:37               | 0:36             | 0:37           | 16             | yes       | NULL                       | replicating   | 0:36              | 0:37                | 0:36              | 0:37            | 16              |
| async replica | db_name                      | 127.0.0.1:20000/db_name                      | replicating  | 0:683            | 0:683              | 0:683            | 0:683          | 8              | yes       | NULL                       | replicating   | 0:683             | 0:683               | 0:683             | 0:683           | 8               |
| async replica | db_name_0                    | 127.0.0.1:10001/db_name_0                    | online       | 0:30423          | 0:30423            | 0:30423          | 0:30423        | 1778           | yes       | NULL                       | replicating   | 0:30423           | 0:30423             | 0:30423           | 0:30423         | 1778            |
| master        | db_name_0                    | NULL                                         | replicating  | 0:30423          | 0:30423            | 0:30423          | 0:30423        | 1778           | yes       | 127.0.0.1:20001/db_name_0  | replicating   | 0:30423           | 0:30423             | 0:30423           | 0:30423         | 1778            |
| async replica | db_name_0_AUTO_REPLICA       | 127.0.0.1:20001/db_name_0_AUTO_REPLICA       | replicating  | 0:30423          | 0:30423            | 0:30423          | 0:30423        | 1778           | yes       | NULL                       | replicating   | 0:30423           | 0:30423             | 0:30423           | 0:30423         | 1778            |
| async replica | db_name_10                   | 127.0.0.1:10001/db_name_10                   | online       | 0:29835          | 0:29835            | 0:29835          | 0:29835        | 1766           | yes       | NULL                       | replicating   | 0:29835           | 0:29835             | 0:29835           | 0:29835         | 1766            |
| master        | db_name_10                   | NULL                                         | replicating  | 0:29835          | 0:29835            | 0:29835          | 0:29835        | 1766           | yes       | 127.0.0.1:20001/db_name_10 | replicating   | 0:29835           | 0:29835             | 0:29835           | 0:29835         | 1766            |
| async replica | db_name_10_AUTO_REPLICA      | 127.0.0.1:20001/db_name_10_AUTO_REPLICA      | replicating  | 0:29835          | 0:29835            | 0:29835          | 0:29835        | 1766           | yes       | NULL                       | replicating   | 0:29835           | 0:29835             | 0:29835           | 0:29835         | 1766            |
| async replica | db_name_12                   | 127.0.0.1:10001/db_name_12                   | online       | 0:29773          | 0:29773            | 0:29773          | 0:29773        | 1747           | yes       | NULL                       | replicating   | 0:29773           | 0:29773             | 0:29773           | 0:29773         | 1747            |
| master        | db_name_12                   | NULL                                         | replicating  | 0:29773          | 0:29773            | 0:29773          | 0:29773        | 1747           | yes       | 127.0.0.1:20001/db_name_12 | replicating   | 0:29773           | 0:29773             | 0:29773           | 0:29773         | 1747            |
| async replica | db_name_12_AUTO_REPLICA      | 127.0.0.1:20001/db_name_12_AUTO_REPLICA      | replicating  | 0:29773          | 0:29773            | 0:29773          | 0:29773        | 1747           | yes       | NULL                       | replicating   | 0:29773           | 0:29773             | 0:29773           | 0:29773         | 1747            |
| async replica | db_name_14                   | 127.0.0.1:10001/db_name_14                   | online       | 0:29476          | 0:29476            | 0:29476          | 0:29476        | 1736           | yes       | NULL                       | replicating   | 0:29476           | 0:29476             | 0:29476           | 0:29476         | 1736            |
| master        | db_name_14                   | NULL                                         | replicating  | 0:29476          | 0:29476            | 0:29476          | 0:29476        | 1736           | yes       | 127.0.0.1:20001/db_name_14 | replicating   | 0:29476           | 0:29476             | 0:29476           | 0:29476         | 1736            |
| async replica | db_name_14_AUTO_REPLICA      | 127.0.0.1:20001/db_name_14_AUTO_REPLICA      | replicating  | 0:29476          | 0:29476            | 0:29476          | 0:29476        | 1736           | yes       | NULL                       | replicating   | 0:29476           | 0:29476             | 0:29476           | 0:29476         | 1736            |
| async replica | db_name_2                    | 127.0.0.1:10001/db_name_2                    | online       | 0:29188          | 0:29188            | 0:29188          | 0:29188        | 1696           | yes       | NULL                       | replicating   | 0:29188           | 0:29188             | 0:29188           | 0:29188         | 1696            |
| master        | db_name_2                    | NULL                                         | replicating  | 0:29188          | 0:29188            | 0:29188          | 0:29188        | 1696           | yes       | 127.0.0.1:20001/db_name_2  | replicating   | 0:29188           | 0:29188             | 0:29188           | 0:29188         | 1696            |
| async replica | db_name_2_AUTO_REPLICA       | 127.0.0.1:20001/db_name_2_AUTO_REPLICA       | replicating  | 0:29188          | 0:29188            | 0:29188          | 0:29188        | 1696           | yes       | NULL                       | replicating   | 0:29188           | 0:29188             | 0:29188           | 0:29188         | 1696            |
| async replica | db_name_4                    | 127.0.0.1:10001/db_name_4                    | online       | 0:30611          | 0:30611            | 0:30611          | 0:30611        | 1798           | yes       | NULL                       | replicating   | 0:30611           | 0:30611             | 0:30611           | 0:30611         | 1798            |
| master        | db_name_4                    | NULL                                         | replicating  | 0:30611          | 0:30611            | 0:30611          | 0:30611        | 1798           | yes       | 127.0.0.1:20001/db_name_4  | replicating   | 0:30611           | 0:30611             | 0:30611           | 0:30611         | 1798            |
| async replica | db_name_4_AUTO_REPLICA       | 127.0.0.1:20001/db_name_4_AUTO_REPLICA       | replicating  | 0:30611          | 0:30611            | 0:30611          | 0:30611        | 1798           | yes       | NULL                       | replicating   | 0:30611           | 0:30611             | 0:30611           | 0:30611         | 1798            |
| async replica | db_name_6                    | 127.0.0.1:10001/db_name_6                    | online       | 0:30573          | 0:30573            | 0:30573          | 0:30573        | 1797           | yes       | NULL                       | replicating   | 0:30573           | 0:30573             | 0:30573           | 0:30573         | 1797            |
| master        | db_name_6                    | NULL                                         | replicating  | 0:30573          | 0:30573            | 0:30573          | 0:30573        | 1797           | yes       | 127.0.0.1:20001/db_name_6  | replicating   | 0:30573           | 0:30573             | 0:30573           | 0:30573         | 1797            |
| async replica | db_name_6_AUTO_REPLICA       | 127.0.0.1:20001/db_name_6_AUTO_REPLICA       | replicating  | 0:30573          | 0:30573            | 0:30573          | 0:30573        | 1797           | yes       | NULL                       | replicating   | 0:30573           | 0:30573             | 0:30573           | 0:30573         | 1797            |
| async replica | db_name_8                    | 127.0.0.1:10001/db_name_8                    | online       | 0:29812          | 0:29812            | 0:29812          | 0:29812        | 1735           | yes       | NULL                       | replicating   | 0:29812           | 0:29812             | 0:29812           | 0:29812         | 1735            |
| master        | db_name_8                    | NULL                                         | replicating  | 0:29812          | 0:29812            | 0:29812          | 0:29812        | 1735           | yes       | 127.0.0.1:20001/db_name_8  | replicating   | 0:29812           | 0:29812             | 0:29812           | 0:29812         | 1735            |
| async replica | db_name_8_AUTO_REPLICA       | 127.0.0.1:20001/db_name_8_AUTO_REPLICA       | replicating  | 0:29812          | 0:29812            | 0:29812          | 0:29812        | 1735           | yes       | NULL                       | replicating   | 0:29812           | 0:29812             | 0:29812           | 0:29812         | 1735            |
+---------------+------------------------------+----------------------------------------------+--------------+------------------+--------------------+------------------+----------------+----------------+-----------+----------------------------+---------------+-------------------+---------------------+-------------------+-----------------+-----------------+
In this sample output, the first line refers to replication of the reference database (metadata) for db_name. This data is replicated from primary-MA to secondary-MA, from which it is replicated to each leaf node in the secondary cluster. The remaining lines refer to replication of the partitions of the sharded db_name database. As you can see, this data is replicated directly from leaf nodes in the primary cluster to leaf nodes in the secondary cluster. In this example, secondary-L1 is receiving data from both primary-L1 and primary-L2.
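One way to confirm this from secondary-L1 itself is to list the distinct primary-side endpoints of its DR replication links. This is only a sketch: it assumes the lmv_replication_status view described in the next section, with the PRIMARY_URI and IS_DR columns shown in the later examples:
SELECT DISTINCT PRIMARY_URI
FROM information_schema.LMV_REPLICATION_STATUS
WHERE IS_DR = 1;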
Finally, note that SingleStoreDB will automatically take the steps necessary to ensure the secondary cluster is consistent with the primary cluster. For example, in a primary cluster with redundancy 2, a replica partition on the secondary cluster can get ahead of a replica partition on the primary cluster (due to network or other irregularity). If that primary-cluster replica partition is later promoted to master, SingleStoreDB will automatically drop and reprovision the replica partition on the secondary cluster so that it is consistent with the recently promoted master partition on the primary cluster. Please note that a dropped or reprovisioning replica partition will not appear in the SHOW REPLICATION STATUS output.
Monitoring Replication Progress from the Primary to Secondary Cluster
You can view the progress of replication between the primary and secondary clusters by querying the following information schema views:
lmv_replication_status: This view can be accessed from any node. It provides the replication status of every replication link on the node from which the view is accessed. In inter-cluster replication, a replication link is either an instance of replication between an aggregator on the primary cluster and an aggregator on the secondary cluster, or an instance of replication between a leaf partition on the primary cluster and its counterpart on the secondary cluster. For each replication link, the view shows statistics such as the replication status, lag volume (in MB), total number of lagging LSNs, replication throughput, and the estimated time lag. A sample query against this view appears after this list.
mv_replication_status: This view can be accessed from any aggregator. It provides the replication status of every replication link on every node in the cluster. For each replication link, it provides the same information as the lmv_replication_status view.
mv_aggregated_replication_status: This view can be accessed from any aggregator. It provides the aggregated replication status for each database in the cluster. In the view, the partitions of a database are aggregated into one row, where each row corresponds to a specific type of replication (DR or non-DR).
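As a sketch of how these views can be queried, the following, run on any node, reports the lag statistics of that node's local replication links; it assumes lmv_replication_status exposes the same columns as mv_replication_status (shown in the example output later in this section):
SELECT DATABASE_NAME, PRIMARY_URI, SECONDARY_URI, LSN_LAG,
       VOLUME_LAG_MB, REPLICATION_THROUGHPUT_MBPS, ESTIMATED_CATCHUP_TIME_S
FROM information_schema.LMV_REPLICATION_STATUS;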
You can tell whether replication has caught up on the secondary cluster by checking the values of the LSN_LAG and ESTIMATED_CATCHUP_TIME_S columns. If either of those columns has a value of 0, replication has caught up on the secondary cluster.
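For example, to list any DR databases that have not yet caught up, you could filter on those columns (a sketch using the mv_aggregated_replication_status columns shown in the example below):
SELECT DATABASE_NAME, LSN_LAG, VOLUME_LAG_MB, ESTIMATED_CATCHUP_TIME_S
FROM information_schema.MV_AGGREGATED_REPLICATION_STATUS
WHERE IS_DR = 1 AND LSN_LAG > 0;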
Example
Consider two SingleStoreDB clusters, db as the primary cluster and db1 as the secondary cluster. To replicate db and view its replication progress, perform the following steps:
Connect to the master aggregator of the secondary cluster db1 and execute the following command to replicate db:
REPLICATE DATABASE db FROM root:'memsqlroot1'@192.168.0.1:3306/db;
Query the mv_replication_status view in the information_schema database to view the replication status of all database partitions and replication links between the primary and the secondary cluster:
SELECT * FROM information_schema.MV_REPLICATION_STATUS WHERE database_name LIKE 'db%' AND IS_DR = 1;
****
+-------+---------+---------------+----------------------+----------------------+---------------+-----------------+---------+---------------+-----------------------------+--------------------------+
| IS_DR | IS_SYNC | DATABASE_NAME | PRIMARY_URI          | SECONDARY_URI        | PRIMARY_STATE | SECONDARY_STATE | LSN_LAG | VOLUME_LAG_MB | REPLICATION_THROUGHPUT_MBPS | ESTIMATED_CATCHUP_TIME_S |
+-------+---------+---------------+----------------------+----------------------+---------------+-----------------+---------+---------------+-----------------------------+--------------------------+
| 1     | 0       | db            | 127.0.0.1:10000/db   | 127.0.0.1:20000/db   | online        | replicating     | 0       | 0.0           | 0.008                       | 0.0                      |
| 1     | 0       | db_0          | 127.0.0.1:10001/db_0 | 127.0.0.1:20001/db_0 | online        | replicating     | 19561   | 80.122        | 28.099                      | 2.851                    |
| 1     | 0       | db_1          | 127.0.0.1:10002/db_1 | 127.0.0.1:20002/db_1 | online        | replicating     | 18804   | 77.021        | 18.125                      | 4.249                    |
+-------+---------+---------------+----------------------+----------------------+---------------+-----------------+---------+---------------+-----------------------------+--------------------------+
Note: A DATABASE_NAME with a suffix such as db_0 or db_1 indicates a database partition, while a name without a suffix indicates the replication link for the master aggregator. Also, IS_DR = 1 indicates replication between clusters.
To view the DR replication link between the primary master aggregator and the secondary master aggregator of the database db, and to check whether it is lagging, query the mv_replication_status view as follows:
SELECT * FROM information_schema.MV_REPLICATION_STATUS WHERE primary_uri = '192.168.0.1:3306/db';
****
+-------+---------+---------------+--------------------+--------------------+---------------+-----------------+---------+---------------+-----------------------------+--------------------------+
| IS_DR | IS_SYNC | DATABASE_NAME | PRIMARY_URI        | SECONDARY_URI      | PRIMARY_STATE | SECONDARY_STATE | LSN_LAG | VOLUME_LAG_MB | REPLICATION_THROUGHPUT_MBPS | ESTIMATED_CATCHUP_TIME_S |
+-------+---------+---------------+--------------------+--------------------+---------------+-----------------+---------+---------------+-----------------------------+--------------------------+
| 1     | 0       | db            | 192.168.0.1:3306/db| 127.0.0.1:10000/db | online        | replicating     | 0       | 0.000         | 0.000                       | 0.000                    |
+-------+---------+---------------+--------------------+--------------------+---------------+-----------------+---------+---------------+-----------------------------+--------------------------+
Query the mv_aggregated_replication_status view in the information_schema database to view the aggregated replication lag of the entire database db (including partitions) between the primary and secondary cluster:
SELECT * FROM information_schema.MV_AGGREGATED_REPLICATION_STATUS WHERE database_name = 'db' AND IS_DR = 1;
****
+-------+---------+---------------+--------------------+--------------------+---------------+-----------------+---------+---------------+-----------------------------+--------------------------+
| IS_DR | IS_SYNC | DATABASE_NAME | PRIMARY_URI        | SECONDARY_URI      | PRIMARY_STATE | SECONDARY_STATE | LSN_LAG | VOLUME_LAG_MB | REPLICATION_THROUGHPUT_MBPS | ESTIMATED_CATCHUP_TIME_S |
+-------+---------+---------------+--------------------+--------------------+---------------+-----------------+---------+---------------+-----------------------------+--------------------------+
| 1     | 0       | db            | 127.0.0.1:10000/db | 127.0.0.1:20000/db | online        | replicating     | 45895   | 187.986       | 3.227                       | 7339.667                 |
+-------+---------+---------------+--------------------+--------------------+---------------+-----------------+---------+---------------+-----------------------------+--------------------------+
Note: IS_DR = 1 indicates replication between clusters.