SingleStore DB

Overview of Pipelines

SingleStore Pipelines is a feature that natively ingests real-time data from external sources. As a built-in component of the database, Pipelines can extract, shape (modify), and load external data without the need for third-party tools or middleware. Pipelines is robust, scalable, highly performant, and supports fully distributed workloads.
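
As a rough sketch of that extract-shape-load flow, the statements below create and start a pipeline that reads comma-separated records from a Kafka topic and modifies one field as it is loaded. The host, topic, table, and column names are illustrative placeholders, not part of the documentation above.

    -- Hypothetical pipeline: extract CSV records from a Kafka topic,
    -- shape the status field with an expression, and load into a table.
    CREATE PIPELINE orders_pipeline AS
      LOAD DATA KAFKA 'kafka-host.example.com:9092/orders'
      INTO TABLE orders
      FIELDS TERMINATED BY ','
      (order_id, @raw_status)
      SET status = UPPER(@raw_status);

    -- Begin ingesting data in the background.
    START PIPELINE orders_pipeline;

Once started, a pipeline runs continuously in the background; it can be stopped or removed later with STOP PIPELINE and DROP PIPELINE.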

Pipelines support Apache Kafka, Amazon S3, Azure Blob, file system, Google Cloud Storage, and HDFS data sources.

Pipelines support the JSON, Avro, Parquet, and CSV data formats.
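
For example, combining one of the sources above with a format clause, a pipeline over an S3 bucket of JSON objects might look like the following sketch; the bucket, region, credentials, table, and key names are placeholders.

    -- Hypothetical pipeline over JSON files in an S3 bucket; top-level
    -- JSON keys are mapped onto table columns.
    CREATE PIPELINE events_pipeline AS
      LOAD DATA S3 'my-bucket/events/'
      CONFIG '{"region": "us-east-1"}'
      CREDENTIALS '{"aws_access_key_id": "...", "aws_secret_access_key": "..."}'
      INTO TABLE events
      FORMAT JSON
      (event_id <- id, event_type <- type);

    START PIPELINE events_pipeline;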

Features

The features of SingleStore Pipelines make it a powerful alternative to third-party ETL middleware in many scenarios:

  • Scalability: Pipelines inherently scales with SingleStore DB clusters as well as distributed data sources like Kafka and cloud data stores like Amazon S3.

  • High Performance: In most situations, pipeline data is loaded in parallel from the data source directly onto the SingleStore leaf nodes, bypassing the aggregator and improving throughput. Additionally, Pipelines has been optimized for low lock contention and high concurrency.

  • Exactly-once Semantics: The architecture of Pipelines ensures that transactions are processed exactly once, even in the event of failover.

  • Debugging: Pipelines makes it easier to debug each step in the ETL process by storing exhaustive metadata about transactions, including stack traces and stderr messages.
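
As a quick illustration of that metadata, batch and error history can be queried directly from information_schema; the exact columns vary by version, so SELECT * keeps the sketch generic.

    -- Inspect per-batch history and any recorded errors (including stderr
    -- output) captured by the pipeline machinery.
    SELECT * FROM information_schema.PIPELINES_BATCHES_SUMMARY;
    SELECT * FROM information_schema.PIPELINES_ERRORS;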