Pipeline Overview

A pipeline is a mechanism for continuously loading data into a SingleStoreDB database from external sources. With pipelines, users can extract, shape, and load data without the need for additional ETL (Extract, Transform, Load) tools.
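To make the idea concrete, here is a minimal Python sketch that only assembles the SQL text for a Kafka pipeline. It does not connect to a database. The helper name build_kafka_pipeline_sql, the pipeline name, the broker address, the topic, and the table are all hypothetical. The statements follow the general shape of SingleStoreDB's CREATE PIPELINE ... LOAD DATA KAFKA ... INTO TABLE and START PIPELINE syntax. Check the CREATE PIPELINE reference for your version before running the generated SQL.

```python
import re
from typing import List

# Conservative identifier check so the hypothetical helper never emits malformed SQL.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def build_kafka_pipeline_sql(pipeline: str, endpoint: str, table: str) -> List[str]:
    """Return CREATE PIPELINE and START PIPELINE statements (hypothetical helper).

    endpoint is assumed to look like '<broker host>:<port>/<topic>'; the sketch
    loads comma-separated records into a table that already exists.
    """
    for name in (pipeline, table):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"invalid identifier: {name!r}")
    if "'" in endpoint:
        raise ValueError("endpoint must not contain quotes")
    create = (
        f"CREATE PIPELINE {pipeline} AS\n"
        f"LOAD DATA KAFKA '{endpoint}'\n"
        f"INTO TABLE {table}\n"
        "FIELDS TERMINATED BY ',';"
    )
    # A pipeline is defined first and only begins loading once it is started.
    return [create, f"START PIPELINE {pipeline};"]

for stmt in build_kafka_pipeline_sql("orders_pipeline", "127.0.0.1:9092/orders", "orders"):
    print(stmt)
```

Extraction, shaping (here, splitting fields on commas), and loading are all declared in one statement. That is why no separate ETL tool is involved.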

Users can monitor the status and performance of pipelines in SingleStoreDB from the dashboard. The Pipeline Summary section of the dashboard gives a high-level overview of all pipelines, including the number and percentage of pipelines in each state, such as running, stopped, and errored. The Pipeline Performance section gives more detailed insight into how pipelines operate. It reports metrics such as execution count, average CPU time per execution, and average elapsed time per execution, among others. These metrics can help identify and optimize aspects of pipeline performance.
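The dashboard reports these figures directly. The following Python sketch only illustrates the arithmetic behind a state summary and per-execution averages. It assumes you already have each pipeline's state and aggregate totals for execution count, CPU time, and elapsed time. The function names, the state labels, and the millisecond units are illustrative assumptions, not the dashboard's internals.

```python
from collections import Counter
from typing import Dict, List, Tuple

def state_summary(states: List[str]) -> Dict[str, Tuple[int, float]]:
    """Map each pipeline state to (count, percentage of all pipelines)."""
    counts = Counter(states)
    total = len(states)
    # An empty list yields an empty dict, so the division never runs with total == 0.
    return {state: (n, 100.0 * n / total) for state, n in counts.items()}

def per_execution_averages(executions: int, total_cpu_ms: float,
                           total_elapsed_ms: float) -> Tuple[float, float]:
    """Return (average CPU ms, average elapsed ms) per execution."""
    if executions <= 0:
        return 0.0, 0.0
    return total_cpu_ms / executions, total_elapsed_ms / executions

print(state_summary(["Running", "Running", "Running", "Stopped"]))
# {'Running': (3, 75.0), 'Stopped': (1, 25.0)}
print(per_execution_averages(40, 2000.0, 5000.0))
# (50.0, 125.0)
```

An average elapsed time well above the average CPU time suggests a pipeline spends much of each execution waiting rather than computing. That is one kind of pattern these metrics can surface.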

Supported Data Sources

Data Source           Data Source Version   MemSQL/SingleStoreDB Cloud Version
Apache Kafka          0.8.2.2 or newer      5.5.0 or newer
Amazon S3             N/A                   5.7.1 or newer
Filesystem Extractor  N/A                   5.8.5 or newer
Azure Blob            N/A                   5.8.5 or newer
HDFS                  2.2.x or newer        6.5.2 or newer
Google Cloud Storage  N/A                   7.0.14 or newer
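If you script deployments, you can encode the table as data. This Python sketch checks only the minimum MemSQL/SingleStoreDB version from the table and ignores the data source's own version. It assumes plain numeric version strings such as 7.0.14. The names MIN_SERVER_VERSION, parse_version, and source_supported are hypothetical.

```python
from typing import Tuple

# Minimum MemSQL/SingleStoreDB versions per data source, copied from the table.
MIN_SERVER_VERSION = {
    "Apache Kafka": "5.5.0",
    "Amazon S3": "5.7.1",
    "Filesystem Extractor": "5.8.5",
    "Azure Blob": "5.8.5",
    "HDFS": "6.5.2",
    "Google Cloud Storage": "7.0.14",
}

def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '7.0.14' into (7, 0, 14); missing parts are padded with zeros."""
    parts = [int(p) for p in text.split(".")]
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)

def source_supported(source: str, server_version: str) -> bool:
    """True if server_version meets the minimum listed for source."""
    if source not in MIN_SERVER_VERSION:
        raise KeyError(f"unknown data source: {source}")
    # Tuples compare element by element, so (7, 0, 9) < (7, 0, 14).
    return parse_version(server_version) >= parse_version(MIN_SERVER_VERSION[source])

print(source_supported("Google Cloud Storage", "7.0.9"))
```

The sketch compares versions as integer tuples rather than strings. A string comparison would wrongly rank "7.0.9" above "7.0.14".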

Supported File Formats

Pipelines support the following file formats:

  • JSON

  • Avro

  • Parquet

  • CSV
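A pipeline statement names its input format explicitly with a FORMAT clause. This Python sketch picks that clause from a file's extension. Choosing the format by extension is an assumption of the sketch, not something pipelines do themselves. JSON, Avro, and Parquet pipelines typically also need a mapping from source fields to table columns, which is left out here. The names FORMAT_BY_EXTENSION and format_clause are hypothetical.

```python
import os

# File extension -> format keyword for the four supported formats.
# Selecting by extension is this sketch's assumption; the SQL names the format explicitly.
FORMAT_BY_EXTENSION = {
    ".json": "JSON",
    ".avro": "AVRO",
    ".parquet": "PARQUET",
    ".csv": "CSV",
}

def format_clause(filename: str) -> str:
    """Return e.g. 'FORMAT PARQUET' for a file name, or raise ValueError."""
    ext = os.path.splitext(filename)[1].lower()
    try:
        return "FORMAT " + FORMAT_BY_EXTENSION[ext]
    except KeyError:
        raise ValueError(f"unsupported file format: {filename}") from None

print(format_clause("data/part-0001.parquet"))
```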

Last modified: September 20, 2023
