# Replicate Data from MySQL

> **📝 Note**: Replicating data from MySQL via CDC-in pipelines has been deprecated. Future application development must use [SingleStore Flow](https://docs.singlestore.com/db/v9.1/load-data/load-data-with-singlestore-flow.md), which is recommended for all workloads.

Change Data Capture (CDC) pipelines enable you to ingest historical data and sync the continuous changes to data as they happen on the source MySQL database. You can replicate your MySQL databases to SingleStore using CDC pipelines.

> **📝 Note**: You can only create a maximum of 16 CDC-in pipelines in total for MongoDB® and MySQL sources.

Perform the following tasks:

1. Configure the source MySQL database. Refer to [Configure MySQL](https://docs.singlestore.com/#section-idm4594041876137634092850850389.md) for more information.

2. Configure the path of the `java` binary. Refer to [Configure `java` Binary Path](https://docs.singlestore.com/#section-idm234424122741452.md) for more information.

3. (Optional) Create a link to the MySQL database. Refer to [CREATE LINK](https://docs.singlestore.com/db/v9.1/reference/sql-reference/security-management-commands/create-link.md) for more information. You can also specify the link configuration and credentials in the `CONFIG`/`CREDENTIALS` clause of the `CREATE {TABLES | TABLE} AS INFER PIPELINE` SQL statement instead of creating a link.
   > **📝 Note**: Create a link to the primary MySQL instance. SingleStore does not support replicating MySQL databases using CDC from secondary (replica) MySQL instances.

4. Create the required table(s), stored procedure(s), and pipeline(s) using the `CREATE {TABLES | TABLE} AS INFER PIPELINE` SQL statement. Refer to [Syntax](https://docs.singlestore.com/#section-idm4503128687899233950410758547.md) for more information. You can either [replicate the MySQL tables as is](https://docs.singlestore.com/#section-idm234424168959935.md) or [apply custom transformations](https://docs.singlestore.com/#section-idm234424169115312.md).
   > **📝 Note**: Before restarting the `INFER PIPELINE` operation, delete all the related artifacts.```sql
   > DROP TABLE <target_table_name>;
   > DROP PIPELINE <pipeline_name>;
   > DROP PROCEDURE <procedure_name>;
   > ```

5. Once all the components are configured, start the pipelines.

   * To start all the pipelines, run the `START ALL PIPELINES` SQL statement.
   * To start a specific pipeline, run the `START PIPELINE <pipeline_name>` SQL statement. By default, the pipeline is named `<source_db_name>.<table_name>`.

   **Note**: Once the pipeline(s) (CDC operation) has started, do not run `ALTER TABLE` statements on the source MySQL database.

For more information, refer to the relevant section on this page.

* [Replication Strategies](https://docs.singlestore.com/#section-idm234424168393315.md)
* [CDC Snapshot Strategies](https://docs.singlestore.com/#section-idm323432564239142.md)
* [Configure Ingestion Speed Limit using Engine Variables](https://docs.singlestore.com/#section-idm323435536289338.md)
* [Limitations](https://docs.singlestore.com/#section-idm234424135826981.md)

## Replicate MySQL Tables Example

The following example shows how to replicate MySQL tables without any custom transformations. This example uses the `LINK` clause to specify the MySQL endpoint connection configuration.

Perform the following tasks after configuring MySQL:

1. Create a link to the MySQL endpoint using the `CREATE LINK` command.
   ```sql
   CREATE LINK pLink AS MYSQL 
     CONFIG '{"database.hostname": "svchost", "database.port": 3306, "database.ssl.mode":"required"}' 
     CREDENTIALS '{"database.password": "pa55w0rd", "database.user": "repl_user"}';
   ```

2. Create tables, pipelines, and stored procedures in SingleStore based on the inference from the source tables:
   ```sql
   CREATE TABLES AS INFER PIPELINE AS LOAD DATA 
     LINK pLink "*" FORMAT AVRO;
   ```

3. Start the pipelines, and begin the replication process.
   ```sql
   START ALL PIPELINES;
   ```

## Prerequisites

## Configure `java` Binary Path

Install JRE version 25+, and specify the full path of the `java` binary using the `java_path` engine variable. You must set this engine variable on each node. Update the `java` binary path in the following command and then run it to configure `java_path`:

```Shell
sdb-admin update-config --all --set-global --key "java_path" --value "/path_to/bin/java"
```

Refer to [sdb-admin update-config](https://docs.singlestore.com/db/v9.1/reference/singlestore-tools-reference/sdb-admin-commands/update-config.md) for more information.

## Configure MySQL

1. Enable binary logging for MySQL replication by setting the `log-bin` option to `mysql-bin` in your MySQL configuration. Refer to [Enabling the binlog](https://debezium.io/documentation/reference/stable/connectors/mysql.html#enable-mysql-binlog) for more information.

2. Configure the following additional options in MySQL, as specified:
   ```
   binlog_format=ROW
   binlog_row_image=FULL
   ```
   You may need to restart MySQL in order to incorporate the configuration changes. Consult the MySQL documentation for more information.

3. Create a new database named `singlestore`. This database is required for internal usage.
   ```sql
   CREATE DATABASE IF NOT EXISTS singlestore;
   ```
   The `singlestore` database is meant to store the pipeline metadata information to confirm the status of the pipeline and other operations. SingleStore does not allow replicating this database and the tables that are included in it. The tables in the `singlestore` database can be empty by design.

4. Create a database user for connecting to the MySQL instance:
   ```SQL
   CREATE USER <user> IDENTIFIED BY '<password>';
   ```
   You can also use an existing user.

5. Grant the following privileges to the `<user>`:
   ```sql
   GRANT CREATE, INSERT, DELETE, DROP, SELECT ON singlestore.* TO <user>;
   GRANT REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO <user>;
   GRANT SELECT ON *.* TO <user>;
   ```

6. For authentication, the `<user>` must use the `mysql_native_password` plugin. Refer to [Native Pluggable Authentication](https://dev.mysql.com/doc/refman/8.0/en/native-pluggable-authentication.html) for more information.

7. The MySQL instance must allow incoming traffic from the SingleStore cluster.

8. All the source MySQL tables must have primary keys.

## Syntax

```sql
CREATE TABLE [IF NOT EXISTS] <table_name> 
  AS INFER PIPELINE 
  AS LOAD DATA <mysql_configuration>
  FORMAT AVRO;

-- Use either the LINK or MYSQL clause, they are mutually exclusive --
<mysql_configuration>:
  LINK <link_name> "<source_db>.<source_table>"
  | MYSQL "<source_db>.<source_table>" CONFIG '<config_json>' CREDENTIALS '<credentials_json>'
```

```sql
CREATE TABLES [IF NOT EXISTS] 
  AS INFER PIPELINE 
  AS LOAD DATA <mysql_configuration>
  FORMAT AVRO;

-- Use either the LINK or MYSQL clause, they are mutually exclusive --
<mysql_configuration>:
  LINK <link_name> "*"
  | MYSQL "*" CONFIG '<config_json>' CREDENTIALS '<credentials_json>'
```

## `CREATE TABLE ... AS INFER PIPELINE` Behavior

The `CREATE TABLE [IF NOT EXISTS] <table_name> AS INFER PIPELINE` statement,

1. Connects to the MySQL instance using the specified `LINK <link_name>` or `CONFIG`/`CREDENTIALS` clause.

2. Infers the schema of the table and creates a table (named **\<table\_name>**) using the inferred schema. When the `IF NOT EXISTS` clause is specified and a table with the specified name already exists, a new table is not created and the existing table is used instead. Refer to [Limitations](https://docs.singlestore.com/#section-idm234424135826981.md) for more information.

3. Creates a pipeline (named `<source_db_name>.<table_name>`) and stored procedure (named `<source_db_name>.<table_name>`) that maps the AVRO data structure to the SingleStore data structure. The `IF NOT EXISTS` clause is ignored for pipelines and stored procedures. If a pipeline or stored procedure with the same name already exists, the `CREATE TABLE ... AS INFER PIPELINE` statement returns an error.

## `CREATE TABLES AS INFER PIPELINE` Behavior

The `CREATE TABLES [IF NOT EXISTS] AS INFER PIPELINE` statement creates a table in SingleStore for each table in the source MySQL database using the same set of operations as the `CREATE TABLE [IF NOT EXISTS] <table_name> AS INFER PIPELINE` statement (specified above). Additionally, it discovers the available databases and tables filtered by `database.exclude.list`, `database.include.list`, `table.exclude.list`, and `table.include.list`.

## Arguments

* `<table_name>`: Name of the table to create in the SingleStore database. You can also specify a table name that differs from the name of the source MySQL table.
* `<link_name>`: Name of the link to the MySQL endpoint. Refer to [CREATE LINK](https://docs.singlestore.com/db/v9.1/reference/sql-reference/security-management-commands/create-link.md) for more information.
* `<source_db>`: Name of the source MySQL database.
* `<source_table>`: Name of the source MySQL table.
* `<config_json>`: Configuration parameters, including the source MySQL configuration, in the JSON format. Refer to [Parameters](https://docs.singlestore.com/#section-idm457045647466883432568358349.md) for supported parameters.
* `<credentials_json>`: Credentials to use to access the MySQL database in JSON format. For example:
  ```
  CREDENTIALS '{"database.password": "<password>", "database.user": "<user>"}'
  ```
  * `database.user`: Name of the MySQL database user.
  * `database.password`: Password of the MySQL database user.

## Parameters

The `CREATE {TABLE | TABLES}`, `CREATE LINK`, and `CREATE AGGREGATOR PIPELINE` statements support the following parameters in the `CONFIG` clause:

| Parameter                | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |
| ------------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `database.hostname`      | IP address or the hostname of the source MySQL instance.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
| `database.port`          | Port of the MySQL instance.Default:`3306`                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |
| `database.ssl.mode`      | Specifies the SSL mode.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
| `max.queue.size`         | Specifies the size of the queue inside the extractor process for records that are ready for ingestion. Increasing the queue size results in an increase in the memory consumption by the replication process and you may need to increase the`pipelines_cdc_java_heap_size`engine variable. Refer to[Configure Ingestion Speed Limit using Engine Variables](https://docs.singlestore.com/#section-idm323435536289338.md)for more information.Default:`1024`                                                                                                                                                                                              |
| `max.batch.size`         | Specifies the maximum number of rows of data fetched from the remote source in a single iteration (batch).`max.batch.size`must be lower than`max.queue.size`. Refer to[Configure Ingestion Speed Limit using Engine Variables](https://docs.singlestore.com/#section-idm323435536289338.md)for more information.Default:`512`                                                                                                                                                                                                                                                                                                                             |
| `poll.interval.ms`       | Specifies the interval for polling of remote sources (in milliseconds) if there were no new records in the previous iteration in the replication process.Default:`500`                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |
| `table.include.list`     | A comma-separated list of regular expressions that match fully-qualified table identifiers (in`databaseName.tableName`format) for MySQL tables to monitor. By default, changes in all the non-system tables in the database are captured. This parameter is only supported in`CREATE TABLES AS INFER PIPELINE`statement. When this option is specified, tables excluded from the list are not monitored. The`table.include.list`and`table.exclude.list`parameters are mutually exclusive, i.e., they cannot be used in the same`CREATE TABLES AS INFER PIPELINE`statement. This parameter is ignored in the`CREATE TABLE ... AS INFER PIPELINE`statement. |
| `table.exclude.list`     | A comma-separated list of regular expressions that match fully-qualified table identifiers (in`databaseName.tableName`format) for MySQL tables to exclude from the monitoring list. By default, this list is empty. This parameter is only supported in`CREATE TABLES AS INFER PIPELINE`statement. The`table.include.list`and`table.exclude.list`parameters are mutually exclusive, i.e., they cannot be used in the same`CREATE TABLES AS INFER PIPELINE`statement.                                                                                                                                                                                      |
| `database.include.list`  | A comma-separated list of regular expressions that match the names of databases to monitor. By default, all the databases are monitored. When this option is specified, databases excluded from the list are not monitored. This parameter is only supported in`CREATE TABLES AS INFER PIPELINE`statements. The`database.include.list`and`database.exclude.list`parameters are mutually exclusive, i.e., they cannot be used in the same`CREATE TABLE AS INFER PIPELINE`statement. If this option is used with the`table.include.list`or`table.exclude.list`option, it returns the intersection of the matches                                            |
| `database.exclude.list`  | A comma-separated list of regular expressions that match the names of databases to exclude from monitoring. By default, this list is empty. This parameter is only supported in`CREATE TABLES AS INFER PIPELINE`statement. The`database.include.list`and`database.exclude.list`parameters are mutually exclusive, i.e., they cannot be used in the same`CREATE TABLES AS INFER PIPELINE`statement. If this option is used with the`table.include.list`or`table.exclude.list`option, it returns the intersection of the matches.                                                                                                                           |
| `signal.data.collection` | A table in the remote MySQL source that is used bySingleStoreto generate special markings for snapshotting and synchronization. By default, this parameter is set to`singlestore.signals_xxxxxx`, where`xxxxxx`is an automatically generated character sequence. The default signal table is in the`singlestore`database. Once the pipelines are started, any change to the value of this parameter is ignored, and the pipelines use the latest value specified before the pipelines started.                                                                                                                                                            |
| `snapshot.mode`          | Specifies the snapshot mode for the pipeline. It can have one of the following values:<ul> <li><code>"initial"</code> (Default): Perform a full snapshot first and replay CDC events created during the snapshot. Then, continue ingestion using CDC.</li> <li><code>"incremental"</code>: Start the snapshot operation and CDC simultaneously.</li> <li><code>"schema_only"</code>: Skip the snapshot, and ingest changes using CDC.</li> </ul>Refer to[CDC Snapshot Strategies](https://docs.singlestore.com/#section-idm323432564239142.md)for more information.Default:`"initial"`                                                                    |

## Replication Strategies

Use one of the following methods to create the required components for data ingestion.

## Replicate MySQL Tables As Is

To replicate or migrate MySQL tables as is, use the `CREATE {TABLES | TABLE} AS INFER PIPELINE` SQL statement. This method automatically creates the required tables, pipelines, and stored procedures. Refer to [Syntax](https://docs.singlestore.com/#section-idm4503128687899233950410758547.md) for more information.

## Apply Transformations or Ingest a Subset of Columns

To apply transformations or ingest only a subset of columns, manually create the required tables, stored procedure, and pipelines:

1. Run the `CREATE {TABLES | TABLE} AS INFER PIPELINE` SQL statement to infer the schema of the MySQL table(s) and automatically generate templates for the relevant table(s), stored procedure(s), and aggregator pipeline(s).

2. Use the automatically-generated templates as a base to create a new table(s), stored procedure(s), and pipeline(s) for custom transformations. To inspect the generated table(s), stored procedure(s), and pipeline(s), use the `SHOW CREATE TABLE` , `SHOW CREATE PROCEDURE`, and `SHOW CREATE PIPELINE` commands, respectively. After running the `SHOW` commands, you can drop the templates and then recreate the same components with custom transformations.

   Using the automatically-generated templates:

   1. Create table(s) in SingleStore with a structure that can store the ingested MySQL table. By default, the primary key is used as the shard key for a table. You can specify a different shard key and sort key in the table definition. For an example, refer to [Example 3](https://docs.singlestore.com/#section-idm234424452975799.md). Refer to [CREATE TABLE](https://docs.singlestore.com/db/v9.1/reference/sql-reference/data-definition-language-ddl/create-table.md) for more information.

   2. Create stored(s) procedure to map the MySQL table columns to the SingleStore table and implement other transformations required. Refer to [CREATE PROCEDURE](https://docs.singlestore.com/db/v9.1/reference/sql-reference/procedural-sql-reference/create-procedure.md) for information on creating stored procedures.

   3. Create pipeline(s) to ingest the MySQL tables using the `CREATE AGGREGATOR PIPELINE` SQL statement.

      Note that the pipeline definition may contain encoded names in the field mapping clause for unsupported symbols in the `_uxxxx` format because of Avro naming limitations, for example, `` `i_d` <- `payload`::`i_u005fd` ``. The table and stored procedure declarations use human readable Unicode symbols instead of the encoding. Hence, they do not require additional handling. Refer to [Parameters](https://docs.singlestore.com/#section-idm457045647466883432568358349.md) for a list of supported parameters. Refer to [CREATE PIPELINE](https://docs.singlestore.com/db/v9.1/reference/sql-reference/pipelines-commands/create-pipeline.md) for the complete syntax and related information.

      **Note**: The CDC feature only supports `AGGREGATOR` pipelines.

Refer to [Syntax](https://docs.singlestore.com/#section-idm4503128687899233950410758547.md) for information on `CREATE {TABLES | TABLE} AS INFER PIPELINE` SQL statement.

## CDC Snapshot Strategies

SingleStore supports the following strategies for creating snapshots:

* **Perform a full snapshot before CDC** (`"snapshot.mode":"initial"`):

  The pipeline captures the position in the binary log and then performs a full snapshot of the data. Once the snapshot is complete, the pipeline continues ingestion using CDC. This strategy is enabled by default. If the pipeline is restarted while the snapshot is in progress, the snapshot is restarted from the beginning.

  If the initial snapshot is large in size and the deployment is prone to restarts or connection issues from the source, SingleStore recommends using the `incremental` snapshot mode. Note that the `incremental` snapshot mode is slower than the `initial` mode. For faster data ingestion when the initial historical data is large in size, manually perform the snapshot and capture changes using the `schema_only` mode.

  To use this strategy, set `"snapshot.mode":"initial"` in the `CONFIG` JSON.

  **Requirement**: The binary log retention period must be long enough to maintain the records while the snapshot is in progress. Otherwise, the pipeline will fail and the process will have to be started over.
* **CDC only** (`"snapshot.mode":"schema_only"`):

  The pipeline will not ingest existing data, and only the changes are captured using CDC.

  To use this strategy, set `"snapshot.mode":"schema_only"` in the `CONFIG` JSON.
* **Perform a snapshot in parallel to CDC** (`"snapshot.mode":"incremental"`):

  The pipeline captures the position in the binary log and starts capturing the changes using CDC. In parallel, the pipeline performs incremental snapshots of the existing data and merges it with the CDC records. Although this strategy is slower than performing a full snapshot and then ingesting changes using CDC, it is more resilient to pipeline restarts.

  To use this strategy, set `"snapshot.mode":"incremental"` in the `CONFIG` JSON.

  **Requirement**: The binary log retention period must be long enough to compensate for unexpected pipeline downtime.
* **Manually perform the snapshot and capture changes using the CDC pipeline**:

  To use this strategy, set `"snapshot.mode":"schema_only"` in the `CONFIG` JSON.

  1. Create a pipeline and then wait for at least one batch of ingestion to capture the binary log position.

  2. Stop the pipeline.

  3. Snapshot the data using any of the suitable methods, for example, `mysqldump`.

  4. Restore the snapshot in SingleStore with any of the supported methods, for example, [load data from an S3 bucket](https://docs.singlestore.com/db/v9.1/load-data/data-sources/load-data-from-amazon-web-services-aws-s-3.md).

  5. Start the CDC pipeline.

  This strategy provides faster data ingestion when the initial historical data is very large in size.

  **Requirement**: The binary log retention period must be long enough to maintain the records while the snapshot is in progress. Otherwise, the pipeline will fail and the process will have to be started over.

## Configure Ingestion Speed Limit using Engine Variables

Use the following engine variables to configure ingestion speed:

| Variable Name                     | Description                                                                                                                                                             | Default Value |
| --------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------- |
| `pipelines_cdc_row_emit_delay_us` | Specifies a forced delay in row emission while migrating/replicating your tables (or collections) to yourSingleStoredatabases. It can have a maximum value of`1000000`. | `1`           |
| `pipelines_cdc_java_heap_size`    | Specifies the JVM heap size limit (in MBs) for CDC-in pipelines.                                                                                                        | `128`         |

Refer to [Engine Variables](https://docs.singlestore.com/db/v9.1/reference/configuration-reference/engine-variables.md) for information on how to set engine variables. You can also use the [supported parameters](https://docs.singlestore.com/#section-idm457045647466883432568358349.md) to configure the ingestion speed.

## In-Depth Variable Definitions

Use the `pipelines_cdc_row_emit_delay_us` engine variable to limit the impact of CDC pipelines on the master aggregator node. It specifies a forced delay in row emission during ingest. This variable can be set to a maximum value of `1000000`. To disable the forced delay while ingestion, set this variable to `0`.

> **⚠️ Warning**: Disabling the emit delay may result in excessive CPU usage on master aggregator nodes.

Use the `max.queue.size` parameter in the `CONFIG` JSON to control the ingestion speed. SingleStore recommends setting `max.batch.size` to half of `max.queue.size`. Increasing the queue size requires a larger Java heap limit, adjust the `pipelines_cdc_java_heap_size` engine variable accordingly. Query the `INFORMATION_SCHEMA.PIPELINES_BATCHES_SUMMARY` table for information on pipeline batch performance.

## Optimize CDC-in Pipelines

## Pipelines and Extractors

CDC-in [pipelines](https://docs.singlestore.com/db/v9.1/load-data/about-singlestore-pipelines.md)  ("pipelines") are aggregator pipelines that run on the Master Aggregator (MA). Each pipeline extracts data from a single source table and loads data into a single SingleStore table. Extractors are subprocesses that extract data from the source and provide the data to the pipelines. The extractors are shared between the CDC-in pipelines.

SingleStore recommends ingesting a limited number of tables using CDC-in pipelines.

## Memory and Resource Consumption

Each extractor consumes a persistent amount of resources (approximately `pipelines_cdc_java_heap_size` per extractor). It is used to store the queue of rows for subsequent batches of ingestion.

**Note**: Total memory consumption may be higher and includes memory usage by the static system memory, shared libraries, JVM heap, etc.

To reduce the resource consumption on the MA reduce the JVM heap size (`pipelines_cdc_java_heap_size`). Although, reducing this limit may also reduce the ingestion speed.

**Note**: If JVM heap size is reduced, you may also need to reduce the pipeline batch and queue size.

## Limitations

* SingleStore intentionally does not support all of the features of MySQL. Refer to [Unsupported MySQL Features](https://docs.singlestore.com/db/v9.1/connect-to-singlestore/connect-with-mysql/connect-with-mysql-client/unsupported-mysql-features.md) for more information.
* Autoincrement properties, default values, indexes, and additional keys (excluding primary keys) are not inferred or replicated.
* If any column in the source MySQL table has a data type that is not supported by SingleStore, all the columns in the table are ingested as `TEXT` type in SingleStore. Apply custom transformations to your stored procedures and pipelines to map the unsupported MySQL type to SingleStore type. Refer to [Data Types](https://docs.singlestore.com/db/v9.1/reference/sql-reference/data-types.md) for information on supported data types.
* The MySQL database, table, or column names must not contain the following characters: `/`, `\`, `` ` ``, `'`, `"`, `'\b'`, `'\t'`, `'\n'`, `'\f'`, `'\r'`, or non-printable characters. Additionally, the database name must not contain `.`.

## Troubleshooting

* If the `CREATE {TABLES | TABLE} AS INFER PIPELINE` SQL statement returns an error, run the `SHOW WARNINGS` command to view the reason behind the failure.
* To view the status of the pipelines, query the `information_schema.PIPELINES_CURSORS` table. Run the following SQL statement to display the status of the replication task:
  ```sql
  SELECT SOURCE_PARTITION_ID, 
   EARLIEST_OFFSET,
   LATEST_OFFSET,
   LATEST_EXPECTED_OFFSET-LATEST_OFFSET as STATUS,
   UPDATED_UNIX_TIMESTAMP 
  FROM information_schema.PIPELINES_CURSORS;
  ```
  The value in the `STATUS` column indicates the following:

  * `1`: Indicates that the snapshot is in progress and the pipeline is expecting more data.
  * `0`: Indicates that the initial snapshot is complete and the pipeline is capturing changes using CDC.

  **Note**: If `snapshot.mode` is set to `incremental`, the pipeline performs incremental snapshots in parallel to capturing changes via CDC. In this case, the value `1` in the `STATUS` column indicates that the pipeline is performing incremental snapshots and capturing changes in parallel.
* To view pipeline errors, run the following SQL statement:
  ```sql
  SELECT * FROM information_schema.PIPELINES_ERRORS;
  ```
* If a pipeline fails with an out of memory error in Java, either increase the heap size using the `pipeline_cdc_java_heap_size` engine variable or reduce the `max.queue.size` and `max.batch.size` parameters in the `CONFIG` clause. The heap size is limited by the memory available on the MA. SingleStore recommends setting the queue size as double of the batch size.

## Examples

Refer to [Replicate MySQL Tables Example](https://docs.singlestore.com/#section-idm4578618939937633950439090354.md) for a sample replication example.

## Example 1

The following example shows how to specify a table name that differs from the source MySQL table name:

```sql
CREATE TABLE s2table AS INFER PIPELINE AS LOAD DATA 
  LINK pLink "db1.mtest" FORMAT AVRO;
```

This command replicates the `mtest` table from the source MySQL database to the `s2table` in SingleStore.

## Example 2

The following example shows how to specify additional parameters in a `CREATE LINK` statement:

```sql
CREATE LINK pLink AS MYSQL 
CONFIG '{
  "database.hostname": "svchost",
  "database.port": 3306,
  "database.ssl.mode":"required",
  "database.exclude.list": "mysql,performance_schema",
  "table.include.list": "db1[.]example",
  "max.queue.size": 1024, 
  "max.batch.size": 512, 
  "poll.interval.ms": 500}'
CREDENTIALS '{
  "database.password": "pa55w0rd", 
  "database.user": "repl_user"}';
```

## Example 3

The following example shows how to specify a different sort key and shard key for the table.

1. Infer the schema of the source MySQL table and generate a template for the table:
   ```sql
   CREATE TABLE IF NOT EXISTS mtest AS INFER PIPELINE
     AS LOAD DATA LINK pLink "db1.mtest"  FORMAT AVRO;
   ```

2. Run the `SHOW CREATE TABLE` command to get the table definition:
   ```sql
   SHOW CREATE TABLE mtest;

   ```
   ```output

   +-----------+-----------------------------------------------------------------------------------------------------------------------------------+
   | Table     | Create Table |
   +-----------+----------------------------------------------------------------------------------------------------------------------------------+
   | mtest     | CREATE TABLE `mtest` (
   `id` int(11) NOT NULL,
   `text` text CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci,
   PRIMARY KEY (`id`),
   SHARD KEY `__SHARDKEY` (`id`),
   SORT KEY `__UNORDERED` ()
   ) AUTOSTATS_CARDINALITY_MODE=INCREMENTAL AUTOSTATS_HISTOGRAM_MODE=CREATE AUTOSTATS_SAMPLING=ON SQL_MODE='STRICT_ALL_TABLES,NO_AUTO_CREATE_USER' |
   +-----------+-----------------------------------------------------------------------------------------------------------------------------------+
   ```

3. Drop the existing table, and use the definition of the template as base to specify a different sort key or shard key:
   ```sql
   DROP TABLE mtest;

   CREATE TABLE mtest (
     id INT(11) NOT NULL PRIMARY KEY, 
     text TEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci, 
     SHARD KEY (id), SORT KEY(id, text));
   ```

## Example 4 - Use Regular Expressions to Include or Exclude Tables or Databases

The names of the databases or tables to include or exclude are specified using regular expressions. For example, to include or exclude all the tables in a database named `dbTest`, use the following regular expression:

```
dbTest[.].* 
-- OR -- 
dbTest\..*
```

where,

* `dbTest` matches the exact string `dbTest`.
* `[.]` and `\.` match the character dot (`.`), which represents the dot (`.`) in `<database_name>.<table_name>` notation.

  Note that `.` is a special character in regular expressions, and to match the character `.` it must either be escaped (`\.`) or specified as a literal character (`[.]`).
* `.*` matches any sequence of characters, because the dot (`.`) matches any single character and the asterisk (`*`) matches zero or more occurrences of any character.

Here's a sample `CREATE LINK` statement to include all the tables in the `dbTest` database, for example, `dbTest.foo`, `dbTest.bar`, `dbTest.exampleTable`, etc.:

```sql
CREATE LINK pLink AS MYSQL 
CONFIG '{
  "database.hostname": "svchost",
  "database.port": 3306,
  "database.ssl.mode":"required",
  "database.exclude.list": "mysql,performance_schema",
  "table.include.list": "dbTest\..*"
}'
CREDENTIALS '{
  "database.password": "pa55w0rd", 
  "database.user": "repl_user"}';
```

To specify the tables or databases to include or exclude from the replication task, use either (or a combination) of the following parameters: `table.include.list`, `table.exclude.list`, `database.include.list`, or `database.exclude.list`. Refer to [Parameters](https://docs.singlestore.com/#section-idm457045647466883432568358349.md) for more information.

## In this section

* [Load Data from MySQL using mysqldump](https://docs.singlestore.com/db/v9.1/load-data/data-sources/replicate-data-from-mysql/load-data-from-mysql.md)

***

Modified at: May 12, 2026

Source: [/db/v9.1/load-data/data-sources/replicate-data-from-mysql/](https://docs.singlestore.com/db/v9.1/load-data/data-sources/replicate-data-from-mysql/)

(An index of the documentation is available at /llms.txt)
