Connecting StreamSets to SingleStoreDB via Fast Loader
You can connect StreamSets to SingleStoreDB via the Fast Loader by creating pipelines with different origin types. This document provides the connection details for the following origins:
Hadoop
Kafka
Filesystem
Perform the following steps first, then follow the specific instructions for your respective pipeline below.
Open the following URL: http://<IP address of the server running the StreamSets service>:18630/
Enter the username and password to log in. The default credentials are admin/admin.
On the Get Started page, click + Create New Pipeline to create a new Pipeline.
Add a title and description for the new pipeline and click Save.
Pipeline from Hadoop Source to SingleStoreDB Using the Fast Loader
Under Select Origin on the right-hand side panel, select Hadoop FS Standalone.
Provide the Origin Name and select “Send to error” for the On Record Error field.
On the Connection tab, enter the Hadoop File System URI in the following format: hdfs://<IP address of the server running the Hadoop system>:9000/
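As a concrete sketch, the URI can be assembled as below. The IP address is a hypothetical placeholder; substitute the address of your NameNode, and note that 9000 is the port this guide assumes for the Hadoop file system.

```python
# Sketch: assemble the Hadoop FS URI expected on the Connection tab.
# The host is a hypothetical example; substitute your NameNode address.
HDFS_HOST = "10.0.0.5"   # hypothetical NameNode IP
HDFS_PORT = 9000         # Hadoop file system port used in this guide

hdfs_uri = f"hdfs://{HDFS_HOST}:{HDFS_PORT}/"
print(hdfs_uri)  # hdfs://10.0.0.5:9000/
```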
Click on the Fields tab and provide the Hadoop file system details.
In the Data Source tab, provide the data source details.
From the right pane, select SingleStoreDB Fast Loader as the destination type and configure it as follows:
General
Name: Name of destination
On Record Error: Send to error
JDBC
JDBC Connection string
Schema Name and Table Name
NOTE: Complete the field-to-column mapping for every column present in the target table.
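The exact JDBC URL format depends on the driver bundled with your StreamSets installation; SingleStoreDB speaks the MySQL wire protocol on port 3306 by default, so a common form is jdbc:mysql://<host>:3306/<database>. A minimal sketch, with hypothetical host, database, and column names:

```python
# Sketch: build the JDBC connection string and the field-to-column
# mapping. Host, port, database, and column names are hypothetical.
DB_HOST = "10.0.0.7"
DB_PORT = 3306        # SingleStoreDB's default MySQL-protocol port
DB_NAME = "demo_db"

jdbc_url = f"jdbc:mysql://{DB_HOST}:{DB_PORT}/{DB_NAME}"

# Every column present in the target table must have a mapped field.
field_to_column = {
    "/id":    "id",
    "/name":  "name",
    "/email": "email",
}
print(jdbc_url)  # jdbc:mysql://10.0.0.7:3306/demo_db
```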
Connect the Hadoop FS Standalone origin to the SingleStoreDB Fast Loader destination.
Start the pipeline; data processing will begin.
You are now ready to move the data.
Pipeline from Kafka Source to SingleStoreDB Using the Fast Loader
Under Select Origin on the right-hand side panel, select Kafka Consumer as the origin type and configure it as follows:
General
Name: Name of Origin
On Record Error: Send to error
Kafka
Broker URI: <IP address of the machine running Kafka>:9092
Zookeeper URI: <IP address of the machine running Kafka>:2181
Consumer Group: <name of the consumer group>
Topic: <name of the Kafka topic>
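Collected in one place, the Kafka origin settings above look like the sketch below; every value is a hypothetical placeholder to be replaced with your own.

```python
# Sketch: the Kafka Consumer origin settings from the steps above.
# Every value here is a hypothetical placeholder.
kafka_settings = {
    "broker_uri":     "10.0.0.9:9092",       # Kafka broker
    "zookeeper_uri":  "10.0.0.9:2181",       # ZooKeeper on the same host
    "consumer_group": "singlestore_loaders",
    "topic":          "orders",
}
print(kafka_settings["topic"])  # orders
```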
Data Format:
Data format: Delimited
Delimiter Format Type: Default CSV (ignore empty lines)
Header line: No Header Line
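With these settings, records arrive as delimited text without a header, so fields are addressed positionally. A minimal sketch of how such a record parses, using Python's csv module, hypothetical sample data, and a filter that mirrors the "ignore empty lines" behavior:

```python
import csv
import io

# Sketch: a sample Kafka message body in the Default CSV format
# (comma-delimited, no header line); the values are hypothetical.
message = "42,alice,alice@example.com\n\n7,bob,bob@example.com\n"

# Mirror the "ignore empty lines" setting by skipping blank rows.
reader = csv.reader(io.StringIO(message))
records = [row for row in reader if row]
print(records)
# [['42', 'alice', 'alice@example.com'], ['7', 'bob', 'bob@example.com']]
```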
From the right pane, select SingleStoreDB Fast Loader as the destination type and configure it as follows:
General
Name: Name of destination
On Record Error: Send to error
JDBC
JDBC Connection string
Schema Name and Table Name
Field to Column Mapping: <map each incoming field to its corresponding column in the target table>
Default Operation: Insert
Credentials:
Username: <DB user>
Password: <DB password>
NOTE: Complete the field-to-column mapping for every column present in the target table.
Connect the Kafka Consumer origin to the SingleStoreDB Fast Loader destination.
Start the pipeline by clicking the Start option above the left pane.
You are now ready to move the data.
Pipeline from Filesystem Source to SingleStoreDB Using the Fast Loader
Under Select Origin on the right-hand side panel, select Directory as the origin type and configure it as follows:
General
Name: Name of Origin
On Record Error: Send to error
Files
File Directory: <directory path where the files exist>
File Name Pattern: *.csv
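The file name pattern uses glob-style matching, so *.csv selects only files whose names end in .csv. A quick sketch with Python's fnmatch and hypothetical file names:

```python
from fnmatch import fnmatch

# Sketch: which files a pattern of *.csv would pick up.
# File names are hypothetical examples.
files = ["orders.csv", "orders.csv.bak", "readme.txt", "2024_sales.csv"]
matched = [f for f in files if fnmatch(f, "*.csv")]
print(matched)  # ['orders.csv', '2024_sales.csv']
```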
Data Format:
Data format: Delimited
Delimiter Format Type: Default CSV (ignore empty lines)
Header line: With Header Line
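With a header line, fields are addressed by the column names taken from the file's first row rather than by position. A sketch with Python's csv.DictReader and hypothetical contents:

```python
import csv
import io

# Sketch: a sample CSV file whose first row is the header
# (the "With Header Line" setting); contents are hypothetical.
data = "id,name,email\n42,alice,alice@example.com\n"

rows = list(csv.DictReader(io.StringIO(data)))
print(rows[0]["name"])  # alice
```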
From the right pane, select SingleStoreDB Fast Loader as the destination type and configure it as follows:
General
Name: Name of destination
On Record Error: Send to error
JDBC
JDBC Connection string
Schema Name and Table Name
Field to Column Mapping: <map each incoming field to its corresponding column in the target table>
Default Operation: Insert
Credentials:
Username: <DB user>
Password: <DB password>
NOTE: Complete the field-to-column mapping for every column present in the target table.
Connect the Directory origin to the SingleStoreDB Fast Loader destination.
Start the pipeline by clicking the Start option above the left pane.
You are now ready to move the data.