Pipeline Overview

A pipeline is a mechanism for continuously loading data into a SingleStore database from external sources. With pipelines, users can extract, shape, and load data without the need for additional ETL (Extract, Transform, Load) tools.
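
As an illustration of what defining a pipeline might look like, the following minimal sketch composes a CREATE PIPELINE statement for a Kafka topic, followed by the statement that starts it. The pipeline name, broker address, topic, and table name are hypothetical placeholders, and the helper function is not part of any SingleStore client library. Check the statement against the CREATE PIPELINE reference before using it.

```python
def create_pipeline_sql(pipeline, kafka_endpoint, table, delimiter=","):
    """Compose a CREATE PIPELINE statement for a delimited-text Kafka topic.

    kafka_endpoint is assumed to have the form "<broker host:port>/<topic>".
    All names passed in here are hypothetical placeholders.
    """
    return (
        f"CREATE PIPELINE {pipeline} AS\n"
        f"LOAD DATA KAFKA '{kafka_endpoint}'\n"
        f"INTO TABLE {table}\n"
        f"FIELDS TERMINATED BY '{delimiter}';"
    )


# A pipeline does not load data until it is started.
statements = [
    create_pipeline_sql("orders_pipeline", "kafka.example.com:9092/orders", "orders"),
    "START PIPELINE orders_pipeline;",
]

print("\n".join(statements))
```

The generated statements can be run from any SQL client connected to the database.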

Users can monitor the status and performance of pipelines in SingleStore through the dashboard. The Pipeline Summary portion of the dashboard provides a high-level overview of the state of all pipelines, including the number and percentage of pipelines in each state, such as running, stopped, and errored. The Pipeline Performance section provides more detailed insight into how pipelines operate. For example, metrics such as execution count, average CPU time per execution, and average elapsed time per execution can help identify and optimize aspects of pipeline performance.
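
The kind of count-and-percentage breakdown shown in the Pipeline Summary can be sketched as follows. This is only an illustration of the arithmetic, not how the dashboard is implemented. The list of states is made-up sample data, and the state labels are taken from the description above.

```python
from collections import Counter


def summarize_states(states):
    """Return {state: (count, percentage)} for a list of pipeline states."""
    total = len(states)
    if total == 0:
        return {}
    counts = Counter(states)
    return {
        state: (count, round(100.0 * count / total, 1))
        for state, count in counts.items()
    }


# Hypothetical sample: one entry per pipeline.
sample_states = ["running", "running", "running", "stopped", "errored"]
summary = summarize_states(sample_states)

for state, (count, percent) in summary.items():
    print(f"{state}: {count} ({percent}%)")
```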

Pipelines and Resource Pools

Resource pools group queries so that non-critical workloads do not overburden the system. Setting MAX_CONCURRENCY for a resource pool limits the number of SQL statements that run simultaneously, thereby reducing the load on the system. See CREATE RESOURCE POOL and Set Resource Limits for more details.

Pipelines waiting in the queue use the thread pool slots reserved for background pipelines, based on the MAX_CONCURRENCY and MAX_QUEUE_DEPTH settings. To adjust the number of pipelines that run simultaneously, change the pipelines_max_concurrent engine variable.
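
The settings mentioned above might be applied with statements like the ones composed below. This is a minimal sketch: the pool name and the numeric limits are hypothetical values chosen for illustration, and the helper functions are not part of any SingleStore client library. See CREATE RESOURCE POOL for the full list of supported settings.

```python
def resource_pool_sql(name, max_concurrency, max_queue_depth):
    """Compose a CREATE RESOURCE POOL statement with concurrency limits."""
    if max_concurrency < 1 or max_queue_depth < 0:
        raise ValueError("limits must be positive")
    return (
        f"CREATE RESOURCE POOL {name} WITH "
        f"MAX_CONCURRENCY = {max_concurrency}, "
        f"MAX_QUEUE_DEPTH = {max_queue_depth};"
    )


def pipeline_concurrency_sql(limit):
    """Compose the statement that sets the pipelines_max_concurrent variable."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return f"SET GLOBAL pipelines_max_concurrent = {limit};"


# Hypothetical values; choose limits that suit the workload.
pool_statement = resource_pool_sql("background_pool", 10, 100)
variable_statement = pipeline_concurrency_sql(20)

print(pool_statement)
print(variable_statement)
```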

Supported Data Sources

Data Source             Data Source Version    MemSQL/SingleStore Version
Apache Kafka            0.8.2.2 or newer       5.5.0 or newer
Amazon S3               N/A                    5.7.1 or newer
Filesystem Extractor    N/A                    5.8.5 or newer
Azure Blob              N/A                    5.8.5 or newer
HDFS                    2.2.x or newer         6.5.2 or newer
Google Cloud Storage    N/A                    7.0.14 or newer
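
The minimum engine versions listed above can be checked programmatically. The sketch below is a hypothetical helper, not a SingleStore API. It compares versions as tuples of integers, because a plain string comparison would wrongly rank 7.0.9 above 7.0.14.

```python
# Minimum MemSQL/SingleStore version per data source, from the table above.
MIN_ENGINE_VERSION = {
    "Apache Kafka": (5, 5, 0),
    "Amazon S3": (5, 7, 1),
    "Filesystem Extractor": (5, 8, 5),
    "Azure Blob": (5, 8, 5),
    "HDFS": (6, 5, 2),
    "Google Cloud Storage": (7, 0, 14),
}


def parse_version(text):
    """Turn a dotted version string such as '7.0.14' into (7, 0, 14)."""
    return tuple(int(part) for part in text.split("."))


def supports(source, engine_version):
    """Return True if the engine version meets the minimum for the source."""
    return parse_version(engine_version) >= MIN_ENGINE_VERSION[source]


print(supports("Google Cloud Storage", "7.0.14"))
print(supports("Google Cloud Storage", "7.0.9"))
```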

Supported File Formats

Pipelines support the following file formats:

  • JSON

  • Avro

  • Parquet

  • CSV

  • Iceberg

Last modified: December 10, 2024
