Pipeline Concepts

Create a Pipeline

A core CREATE PIPELINE command contains:

A Pipeline Declaration with the pipeline name.
A Data Source Specification based on where the data is stored (AWS S3, Azure, Kafka, etc.).
A Data File Mapping which maps the data file to a SingleStore table and which is based on the data file format.

Below is a simple pipeline that loads a CSV file from Amazon S3.

In the pipeline above, the structure of the Data Source Specification is determined by the location of the data file, Amazon S3 in this example. The structure of the Data File Mapping is determined by the data file format, CSV in this example.

The following pipeline loads a Parquet file from Amazon S3.

In this pipeline, the Data Source Specification is the same as in the pipeline above (except for the file name), but the Data File Mapping is different as both the file format and the data file format are different (CSV in the prior example, Parquet in this example).

Refer to CREATE PIPELINE for a full description of the CREATE PIPELINE command.

Supported Data Stores

Pipelines support the following data stores:

Amazon S3 - S3 Pipeline Syntax
Google Cloud Services - GCS Pipeline Syntax
HDFS - HDFS Pipeline Syntax (Version 2.2.x or newer)
Kafka - Kafka Pipeline Syntax (Version 0.8.2.2 or newer)
Link - Creating a Pipeline Using a Connection Link
Local Filesystem - Local Filesystem Syntax
Microsoft Azure - Azure Blob Pipeline Syntax
MongoDB - Replicate MongoDB® Collections using SQL
MySQL - Replicate Data from MySQL

On this page

Create a Pipeline

Supported Data Stores

Supported Data File Formats

In this section

Was this article helpful?

On this page

Was this article helpful?