Replicate Data from MySQL
On this page
Change Data Capture (CDC) pipelines enable you to ingest historical data and sync the continuous changes to data as they happen on the source MySQL database.
-
Configure the source MySQL database.
Refer to Configure MySQL for more information. -
Configure the path of the
java
binary.Refer to Configure java
Binary Path for more information. -
(Optional) Create a link to the MySQL database.
Refer to CREATE LINK for more information. You can also specify the link configuration and credentials in the CONFIG
/CREDENTIALS
clause of theCREATE {TABLES | TABLE} AS INFER PIPELINE
SQL statement instead of creating a link.Note
Create a link to the primary MySQL instance.
SingleStore does not support replicating MySQL databases using CDC from secondary (replica) MySQL instances. -
Create the required table(s), stored procedure(s), and pipeline(s) using the
CREATE {TABLES | TABLE} AS INFER PIPELINE
SQL statement.Refer to Syntax for more information. You can either replicate the MySQL tables as is or apply custom transformations. Note
Before restarting the
INFER PIPELINE
operation, delete all the related artifacts.DROP TABLE <target_table_name>;DROP PIPELINE <pipeline_name>;DROP PROCEDURE <procedure_name>; -
Once all the components are configured, start the pipelines.
-
To start all the pipelines, run the
START ALL PIPELINES
SQL statement. -
To start a specific pipeline, run the
START PIPELINE <pipeline_
SQL statement.name> By default, the pipeline is named <source_
.db_ name>. <table_ name>
Note: Once the pipeline(s) (CDC operation) has started, do not run
ALTER TABLE
statements on the source MySQL database. -
For more information, refer to the relevant section on this page.
Replicate MySQL Tables Example
The following example shows how to replicate MySQL tables without any custom transformations.LINK
clause to specify the MySQL endpoint connection configuration.
Perform the following tasks after configuring MySQL:
-
Create a link to the MySQL endpoint using the
CREATE LINK
command.CREATE LINK pLink AS MYSQLCONFIG '{"database.hostname": "svchost", "database.port": 3306, "database.ssl.mode":"required"}'CREDENTIALS '{"database.password": "pa55w0rd", "database.user": "repl_user"}'; -
Create tables, pipelines, and stored procedures in SingleStore based on the inference from the source tables:
CREATE TABLES AS INFER PIPELINE AS LOAD DATALINK pLink "*" FORMAT AVRO; -
Start the pipelines, and begin the replication process.
START ALL PIPELINES;
Prerequisites
Configure java
Binary Path
Install JRE version 11+, and specify the full path of the java
binary using the java_
engine variable.java
binary path in the following command and then run it to configure java_
:
sdb-admin update-config --all --set-global --key "java_pipelines_java11_path" --value "/path_to/bin/java"
Refer to sdb-admin update-config for more information.
Configure MySQL
-
Enable binary logging for MySQL replication by setting the
log-bin
option tomysql-bin
in your MySQL configuration.Refer to Enabling the binlog for more information. -
Configure the following additional options in MySQL, as specified:
binlog_format=ROW binlog_row_image=FULL
You may need to restart MySQL in order to incorporate the configuration changes.
Consult the MySQL documentation for more information. -
Create a new database named
singlestore
.This database is required for internal usage. CREATE DATABASE IF NOT EXISTS singlestore;The
singlestore
database is meant to store the pipeline metadata information to confirm the status of the pipeline and other operations.SingleStore does not allow replicating this database and the tables that are included in it. The tables in the singlestore
database can be empty by design. -
Create a database user for connecting to the MySQL instance:
CREATE USER <user> IDENTIFIED BY '<password>';You can also use an existing user.
-
Grant the following privileges to the
<user>
:GRANT CREATE, INSERT, DELETE, DROP, SELECT ON singlestore.* TO <user>;GRANT REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO <user>;GRANT SELECT ON *.* TO <user>; -
For authentication, the
<user>
must use themysql_
plugin.native_ password Refer to Native Pluggable Authentication for more information. -
The MySQL instance must allow incoming traffic from the SingleStore cluster.
-
All the source MySQL tables must have primary keys.
Syntax
CREATE TABLE [IF NOT EXISTS] <table_name>AS INFER PIPELINE AS LOAD DATA{ LINK <link_name> "<source_db>.<source_table>" |MYSQL "<source_db>.<source_table>" CONFIG '<config_json>' CREDENTIALS '<credentials_json>' }FORMAT AVRO;
CREATE TABLES [IF NOT EXISTS]AS INFER PIPELINE AS LOAD DATA{ LINK <link_name> "*" |MYSQL "*" CONFIG '<config_json>' CREDENTIALS '<credentials_json>' }FORMAT AVRO;
CREATE TABLE . . . AS INFER PIPELINE
Behavior
The CREATE TABLE [IF NOT EXISTS] <table_
statement,
-
Connects to the MySQL instance using the specified
LINK <link_
orname> CONFIG
/CREDENTIALS
clause. -
Infers the schema of the table and creates a table (named <table_
name>) using the inferred schema. When the IF NOT EXISTS
clause is specified and a table with the specified name already exists, a new table is not created and the existing table is used instead.Refer to Limitations for more information. -
Creates a pipeline (named
<source_
) and stored procedure (nameddb_ name>. <table_ name> <source_
) that maps the AVRO data structure to the SingleStore data structure.db_ name>. <table_ name> The IF NOT EXISTS
clause is ignored for pipelines and stored procedures.If a pipeline or stored procedure with the same name already exists, the CREATE TABLE .
statement returns an error.. . AS INFER PIPELINE
CREATE TABLES AS INFER PIPELINE
Behavior
The CREATE TABLES [IF NOT EXISTS] AS INFER PIPELINE
statement creates a table in SingleStore for each table in the source MySQL database using the same set of operations as the CREATE TABLE [IF NOT EXISTS] <table_
statement (specified above).database.
, database.
, table.
, and table.
.
Arguments
-
<table_
: Name of the table to create in the SingleStore database.name> You can also specify a table name that differs from the name of the source MySQL table. -
<link_
: Name of the link to the MySQL endpoint.name> Refer to CREATE LINK for more information. -
<source_
: Name of the source MySQL database.db> -
<source_
: Name of the source MySQL table.table> -
<config_
: Configuration parameters, including the source MySQL configuration, in the JSON format.json> Refer to Parameters for supported parameters. -
<credentials_
: Credentials to use to access the MySQL database in JSON format.json> For example: CREDENTIALS '{"database.password": "<password>", "database.user": "<user>"}'
-
database.
: Name of the MySQL database user.user -
database.
: Password of the MySQL database user.password
-
Parameters
The CREATE {TABLE | TABLES}
, CREATE LINK
, and CREATE AGGREGATOR PIPELINE
statements support the following parameters in the CONFIG
clause:
Parameter |
Description |
---|---|
|
IP address or the hostname of the source MySQL instance. |
|
Port of the MySQL instance. Default: |
|
Specifies the SSL mode. |
|
Specifies the size of the queue inside the extractor process for records that are ready for ingestion. Default: |
|
Specifies the maximum number of rows of data fetched from the remote source in a single iteration (batch). Default: |
|
Specifies the interval for polling of remote sources (in milliseconds) if there were no new records in the previous iteration in the replication process. Default: |
|
A comma-separated list of regular expressions that match fully-qualified table identifiers (in |
|
A comma-separated list of regular expressions that match fully-qualified table identifiers (in |
|
A comma-separated list of regular expressions that match the names of databases to monitor. |
|
A comma-separated list of regular expressions that match the names of databases to exclude from monitoring. |
|
A table in the remote MySQL source that is used by SingleStore to generate special markings for snapshotting and synchronization. |
|
Specifies the snapshot mode for the pipeline.
Refer to CDC Snapshot Strategies for more information. Default: |
Replication Strategies
Use one of the following methods to create the required components for data ingestion.
Replicate MySQL Tables As Is
To replicate or migrate MySQL tables as is, use the CREATE {TABLES | TABLE} AS INFER PIPELINE
SQL statement.
Apply Transformations or Ingest a Subset of Columns
To apply transformations or ingest only a subset of columns, manually create the required tables, stored procedure, and pipelines:
-
Run the
CREATE {TABLES | TABLE} AS INFER PIPELINE
SQL statement to infer the schema of the MySQL table(s) and automatically generate templates for the relevant table(s), stored procedure(s), and aggregator pipeline(s). -
Use the automatically-generated templates as a base to create a new table(s), stored procedure(s), and pipeline(s) for custom transformations.
To inspect the generated table(s), stored procedure(s), and pipeline(s), use the SHOW CREATE TABLE
,SHOW CREATE PROCEDURE
, andSHOW CREATE PIPELINE
commands, respectively.After running the SHOW
commands, you can drop the templates and then recreate the same components with custom transformations.Using the automatically-generated templates:
-
Create table(s) in SingleStore with a structure that can store the ingested MySQL table.
By default, the primary key is used as the shard key for a table. You can specify a different shard key and sort key in the table definition. For an example, refer to Example 3. Refer to CREATE TABLE for more information. -
Create stored(s) procedure to map the MySQL table columns to the SingleStore table and implement other transformations required.
Refer to CREATE PROCEDURE for information on creating stored procedures. -
Create pipeline(s) to ingest the MySQL tables using the
CREATE AGGREGATOR PIPELINE
SQL statement.Note that the pipeline definition may contain encoded names in the field mapping clause for unsupported symbols in the
_
format because of Avro naming limitations, for example,uxxxx `i_
.d` <- `payload`::`i_ u005fd` The table and stored procedure declarations use human readable Unicode symbols instead of the encoding. Hence, they do not require additional handling. Refer to Parameters for a list of supported parameters. Refer to CREATE PIPELINE for the complete syntax and related information. Note: The CDC feature only supports
AGGREGATOR
pipelines.
-
Refer to Syntax for information on CREATE {TABLES | TABLE} AS INFER PIPELINE
SQL statement.
CDC Snapshot Strategies
SingleStore supports the following strategies for creating snapshots:
-
Perform a full snapshot before CDC:
The pipeline captures the position in the binary log and then performs a full snapshot of the data.
Once the snapshot is complete, the pipeline continues ingestion using CDC. This strategy is enabled by default. If the pipeline is restarted while the snapshot is in progress, the snapshot is restarted from the beginning. To use this strategy, set
"snapshot.
in themode":"initial" CONFIG
JSON.Requirement: The binary log retention period must be long enough to maintain the records while the snapshot is in progress.
Otherwise, the pipeline will fail and the process will have to be started over. -
CDC only:
The pipeline will not ingest existing data, and only the changes are captured using CDC.
To use this strategy, set
"snapshot.
in themode":"schema_ only" CONFIG
JSON. -
Perform a snapshot in parallel to CDC:
The pipeline captures the position in the binary log and starts capturing the changes using CDC.
In parallel, the pipeline performs incremental snapshots of the existing data and merges it with the CDC records. Although this strategy is slower than performing a full snapshot and then ingesting changes using CDC, it is more resilient to pipeline restarts. To use this strategy, set
"snapshot.
in themode":"incremental" CONFIG
JSON.Requirement: The binary log retention period must be long enough to compensate for unexpected pipeline downtime.
-
Manually perform the snapshot and capture changes using the CDC pipeline:
-
Create a pipeline and then wait for at least one batch of ingestion to capture the binary log position.
-
Stop the pipeline.
-
Snapshot the data using any of the suitable methods, for example,
mysqldump
. -
Restore the snapshot in SingleStore with any of the supported methods, for example, load data from an S3 bucket.
-
Start the CDC pipeline.
To use this strategy, set
"snapshot.
in themode":"schema_ only" CONFIG
JSON.This strategy provides faster data ingestion when the initial historical data is very large in size. Requirement: The binary log retention period must be long enough to maintain the records while the snapshot is in progress.
Otherwise, the pipeline will fail and the process will have to be started over. -
Configure Ingestion Speed Limit using Engine Variables
Use the following engine variables to configure ingestion speed:
Variable Name |
Description |
Default Value |
---|---|---|
|
Specifies a forced delay in row emission while migrating/replicating your tables (or collections) to your SingleStore databases. |
|
|
Specifies the JVM heap size limit (in MBs) for CDC-in pipelines. |
|
|
Specifies the maximum number of CDC-in extractor instances that can run concurrently. |
|
|
Specifies the minimum duration (in seconds) that the extractor allocates to a single pipeline for ingesting data and listening to CDC events. |
|
Refer to Engine Variables for information on how to set engine variables.
In-Depth Variable Definitions
Use the pipelines_
engine variable to limit the impact of CDC pipelines on the master aggregator node.1000000
.0
.
Warning
Disabling the emit delay may result in excessive CPU usage on master aggregator nodes.
Use the max.
parameter in the CONFIG
JSON to control the ingestion speed.max.
to half of max.
.pipelines_
engine variable accordingly.INFORMATION_
table for information on pipeline batch performance.
Use the pipelines_
engine variable to limit the number of CDC-in extractor instances that can run concurrently.pipelines_
, some pipelines will have to wait in the queue until an extractor can be acquired to fetch data.1024
.
Use the pipelines_
variable to specify the minimum duration (in seconds) that the extractor allocates to a single pipeline for ingesting data and listening to CDC events.3600
.
Optimize CDC-in Pipelines
Pipelines and Extractors
CDC-in pipelines ("pipelines") are aggregator pipelines that run on the Master Aggregator (MA).
Because the MA is limited in resources, it can only run a limited number of CDC-in extractors.pipelines_
), then the extractors extract data to pipelines_
number of pipelines for pipelines_
duration, before moving on to the next set of pipelines in the queue.
For example, if the total number of pipelines is 50, pipelines_
is set to 10
, and pipelines_
is set to 60
, then the extractors allocate resources to first 10 pipelines for 60 seconds, then to the next 10 pipelines in the queue for 60 seconds, and so on.
SingleStore recommends ingesting a limited number of tables using CDC-in pipelines.
Memory and Resource Consumption
Each extractor consumes a persistent amount of resources (approximately pipelines_
per extractor).pipelines_
MB.
Note: Total memory consumption may be higher and includes memory usage by static system memory, shared libraries, JVM heap, etc.
To reduce the resource consumption on the MA, either reduce the maximum number of extractors (pipelines_
) or the JVM heap size (pipelines_
).
Note: If JVM heap size is reduced, you may also need to reduce the pipeline batch and queue size.
Limitations
-
SingleStore intentionally does not support all of the features of MySQL.
Refer to Unsupported MySQL Features for more information. -
Autoincrement properties, default values, indexes, and additional keys (excluding primary keys) are not inferred or replicated.
-
If any column in the source MySQL table has a data type that is not supported by SingleStore, all the columns in the table are ingested as
TEXT
type in SingleStore.Apply custom transformations to your stored procedures and pipelines to map the unsupported MySQL type to SingleStore type. Refer to Data Types for information on supported data types. -
The MySQL database, table, or column names must not contain the following characters:
/
,\
,`
,'
,"
,'\b'
,'\t'
,'\n'
,'\f'
,'\r'
, or non-printable characters.Additionally, the database name must not contain .
.
Troubleshooting
-
If the
CREATE {TABLES | TABLE} AS INFER PIPELINE
SQL statement returns an error, run theSHOW WARNINGS
command to view the reason behind the failure. -
To view the status of the pipelines, query the
information_
table.schema. pipelines_ cursors Run the following SQL statement to display the status of the replication task: SELECT SOURCE_PARTITION_ID,EARLIEST_OFFSET,LATEST_OFFSET,LATEST_EXPECTED_OFFSET-LATEST_OFFSET as DIFF,UPDATED_UNIX_TIMESTAMPFROM information_schema.pipelines_cursors; -
To view pipeline errors, run the following SQL statement:
SELECT * FROM information_schema.pipelines_errors; -
If a pipeline fails with an out of memory error in Java, either increase the heap size using the
pipeline_
engine variable or reduce thecdc_ java_ heap_ size max.
andqueue. size max.
parameters in thebatch. size CONFIG
clause.The heap size is limited by the memory available on the MA. SingleStore recommends setting the queue size double of batch size.
Examples
Refer to Replicate MySQL Tables Example for a sample replication example.
Example 1
The following example shows how to specify a table name that differs from the source MySQL table name:
CREATE TABLE s2table AS INFER PIPELINE AS LOAD DATALINK pLink "db1.mtest" FORMAT AVRO;
This command replicates the mtest
table from the source MySQL database to the s2table
in SingleStore.
Example 2
The following example shows how to specify additional parameters in a CREATE LINK
statement:
CREATE LINK pLink AS MYSQLCONFIG '{"database.hostname": "svchost","database.port": 3306,"database.ssl.mode":"required","database.exclude.list": "mysql,performance_schema","table.include.list": "db1.example","max.queue.size": 1024,"max.batch.size": 512,"poll.interval.ms": 500}'CREDENTIALS '{"database.password": "pa55w0rd","database.user": "repl_user"}';
Example 3
The following example shows how to specify a different sort key and shard key for the table.
-
Infer the schema of the source MySQL table and generate a template for the table:
CREATE TABLE IF NOT EXISTS mtest AS INFER PIPELINEAS LOAD DATA LINK pLink "db1.mtest" FORMAT AVRO; -
Run the
SHOW CREATE TABLE
command to get the table definition:SHOW CREATE TABLE mtest;+-----------+-----------------------------------------------------------------------------------------------------------------------------------+ | Table | Create Table | +-----------+----------------------------------------------------------------------------------------------------------------------------------+ | mtest | CREATE TABLE `mtest` ( `id` int(11) NOT NULL, `text` text CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci, PRIMARY KEY (`id`), SHARD KEY `__SHARDKEY` (`id`), SORT KEY `__UNORDERED` () ) AUTOSTATS_CARDINALITY_MODE=INCREMENTAL AUTOSTATS_HISTOGRAM_MODE=CREATE AUTOSTATS_SAMPLING=ON SQL_MODE='STRICT_ALL_TABLES,NO_AUTO_CREATE_USER' | +-----------+-----------------------------------------------------------------------------------------------------------------------------------+
-
Drop the existing table, and use the definition of the template as base to specify a different sort key or shard key:
DROP TABLE mtest;CREATE TABLE mtest (id INT(11) NOT NULL PRIMARY KEY,text TEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci,SHARD KEY (id), SORT KEY(id, text));
Last modified: January 7, 2025