High Availability (HA)and Disaster Recovery (DR) FAQs
What triggers reprovisioning, and how to identify that it is happening?
Reprovisioning is the process of overwriting a replica's data with the master’s data.
Reprovision happens in two cases:
To identify reprovisioning is happening, you can use the event table and look for the DATABASE_
Can the re-replication of a database or a single partition be forced? What role does
SNAPSHOT DATABASE play here?
You cannot force re-replication.
SNAPSHOT DATABASE can help when setting up DR, as the database will have to replicate a smaller amount of logs but otherwise has no major role.
How does an intensive ingest on the primary side affect DR and any potentially running provisioning?
In a rare case, DR may lag slightly, but replication is generally very efficient.
What happens if there is a DDL operation on the primary side during replication? Does this block any operations?
DDL statements (
CREATE/ALTER/DROP ) are all propagated to the replicate.
How are users/grants/roles/groups/passwords handled with replication?
They are all maintained at the cluster level while replication is set up at the database level.
What are the different configurations for setting up DR?
Can you drop a database if replication is set up?
Yes, you can.
What is the impact of leaf node removal or failure on the redundancy mode of data?
It depends on the setting of the
leaf_ global variable.
In 'paired' mode, redundancy is not maintained when removing leaf nodes.
Why is it not advisable to have multiple availability groups (AGs) on the same host?
Rebalance uses AGs to make sure it puts a replica of a partition on a different AG than the master partition so that, if the node with the master goes offline, the cluster can failover to the replica which is on a different node.
How do temporary tables behave in a DR instance?
Temporary tables are not persistent and are not replicated for DR.
What is the Failover Grace Period for flapping leaves in HA?
For sync replication, the default since 7.
For async replication, if a leaf node fails repeatedly and HA is enabled, then a grace period of 300 seconds is initiated (and increases exponentially if flapping continues).
failover_ = 0' and '
failover_ = 0'.
How are pipelines handled in replication?
CREATE/DROP/ALTER pipeline commands are replicated to the replicate database.
One caveat, when you stop replication, the pipelines on the replicate cluster will inherit the state when the replication was stopped.
Does taking a snapshot of the primary database have an impact on the speed of replication?
Taking a snapshot will help the speed at which replication is first set up.
How are inflight queries and transactions handled in a failover scenario?
For the most part, "connection lost errors" are not retried if there is an "offline" failover (i.
UPDATEs and DELETEs are retried.
SELECT queries that have sent any rows back to the user already cannot be retried.
Queries inside multi-statement transactions other then the first query cannot be retried.
BEGIN;INSERT;INSERT; <--- if we hit a failover here we can't retry.COMMIT;
This transaction will rollback if any node fails.
Last modified: March 16, 2023