Load Data with Pipelines

SingleStore Pipelines is a feature that continuously loads data as it arrives from external sources. As a built-in component of the database, Pipelines can extract, shape (modify), and load external data without the need for third-party tools or middleware. Pipelines are robust, scalable, and highly performant, and they support fully distributed workloads.
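As a sketch of the extract-shape-load flow, the following hypothetical pipeline loads CSV files from an S3 bucket into a table and transforms one field during ingest; the bucket name, credentials, and table schema are illustrative assumptions, not part of this document:

```sql
-- Hypothetical sketch: continuously load CSV files from an S3 bucket
-- into the table `events`, converting a Unix timestamp during ingest.
-- The bucket, credentials, and schema are placeholders.
CREATE PIPELINE events_pipeline AS
LOAD DATA S3 's3://example-bucket/events/'
CONFIG '{"region": "us-east-1"}'
CREDENTIALS '{"aws_access_key_id": "...", "aws_secret_access_key": "..."}'
INTO TABLE events
FIELDS TERMINATED BY ','
(id, @raw_ts)
SET ts = FROM_UNIXTIME(@raw_ts);

START PIPELINE events_pipeline;
```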

Pipelines support the following data sources: Apache Kafka, Amazon S3, Azure Blob Storage, the file system, Google Cloud Storage, and HDFS.

Pipelines support the JSON, Avro, Parquet, and CSV data formats.

A database backup preserves the state of all pipelines (offsets, etc.) in that database.

When a backup is restored, all pipelines in that database revert to the state (offsets, etc.) they were in when the target backup was generated.
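As a minimal sketch of this behavior (the database name and backup path are assumptions), backing up and restoring a database, and with it the state of its pipelines, looks like:

```sql
-- Hypothetical sketch: the backup captures pipeline state (offsets, etc.)
-- along with the data; restoring reverts pipelines to that state.
BACKUP DATABASE mydb TO '/backups/mydb';

RESTORE DATABASE mydb FROM '/backups/mydb';
```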

Features

The features of SingleStore Pipelines make it a powerful alternative to third-party ETL middleware in many scenarios:

  • Easy continuous loading: Pipelines monitor their source folder or Kafka queue and, when new files or messages arrive, automatically load them. This simplifies the job of the application developer.

  • Scalability: Pipelines inherently scale with SingleStore clusters, as well as with distributed data sources like Kafka and cloud data stores like Amazon S3.

  • High Performance: In most situations, pipeline data is loaded in parallel from the data source directly to the SingleStore leaves, which improves throughput by bypassing the aggregator. Additionally, Pipelines are optimized for low lock contention and high concurrency.

  • Exactly-once Semantics: The architecture of Pipelines ensures that transactions are processed exactly once, even in the event of failover.

  • Debugging: Pipelines make it easier to debug each step in the ETL process by storing exhaustive metadata about transactions, including stack traces and stderr messages.

  • Concurrency: Multiple pipelines can insert data into a single table. This ability is similar to using multiple write queries. See Sync Variables Lists for more information.
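For instance, the stored error metadata described above can be inspected through SingleStore's information_schema pipeline views; the exact columns returned vary by version and should be treated as an assumption here:

```sql
-- Sketch: inspect pipeline errors (including stderr output)
-- recorded by the engine. The column set is version-dependent.
SELECT * FROM information_schema.PIPELINES_ERRORS;
```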

Last modified: December 13, 2024

Verification instructions

Note: You must install cosign to verify the authenticity of the SingleStore file.

Use the following steps to verify the authenticity of downloaded singlestoredb-server, singlestoredb-toolbox, singlestoredb-studio, and singlestore-client files.

You may perform the following steps on any computer that can run cosign, such as the main deployment host of the cluster.

  1. (Optional) Run the following command to view the associated signature files.

    curl undefined
  2. Download the signature file from the SingleStore release server.

    • Option 1: Click the Download Signature button next to the SingleStore file.

    • Option 2: Copy and paste the following URL into the address bar of your browser and save the signature file.

    • Option 3: Run the following command to download the signature file.

      curl -O undefined
  3. After the signature file has been downloaded, run the following command to verify the authenticity of the SingleStore file.

    echo -n undefined |
    cosign verify-blob --certificate-oidc-issuer https://oidc.eks.us-east-1.amazonaws.com/id/CCDCDBA1379A5596AB5B2E46DCA385BC \
    --certificate-identity https://kubernetes.io/namespaces/freya-production/serviceaccounts/job-worker \
    --bundle undefined \
    --new-bundle-format -

    If verification succeeds, the command outputs:

    Verified OK