SingleStore Ingest

Overview

SingleStore Ingest (“Ingest”) is real-time data replication software that replicates data from various sources to SingleStore. It is one of the primary components of SingleStore Flow. Ingest offers high performance, enabling real-time Change Data Capture from sources with zero load on the source systems. It captures changes and transfers them to the target system. It automates the creation of either an exact copy or a time-series copy of the data source in the target. It first performs a full initial load from the source, then incrementally merges changes to SingleStore. The entire process is fully automated.
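The load-then-merge flow described above can be sketched in a few lines. This is a conceptual illustration only, not Ingest's implementation: the `Change` record, the function names, and the reading of "time-series copy" as an append-only history of every change are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Change:
    """Hypothetical change record, as it might be read from a source log."""
    op: str                # "insert", "update", or "delete"
    key: Any               # primary key of the affected row
    row: Optional[dict]    # new row image; None for deletes
    ts: int                # commit timestamp (assumed monotonic)


def initial_load(source_rows: dict) -> dict:
    """Full initial load: copy every source row into the target."""
    return {key: dict(row) for key, row in source_rows.items()}


def merge_exact_copy(target: dict, changes: list) -> dict:
    """Exact copy: merge each change by key so the target mirrors the source."""
    for change in changes:
        if change.op == "delete":
            target.pop(change.key, None)
        else:
            target[change.key] = change.row
    return target


def append_time_series(history: list, changes: list) -> list:
    """Time-series copy (assumed meaning): keep every change as its own record."""
    for change in changes:
        history.append((change.ts, change.op, change.key, change.row))
    return history
```

In this sketch, an update followed by a delete leaves no row in the exact copy, while the time-series history retains both events with their timestamps.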

Supported Source Databases

Ingest supports the following database sources:

  • Oracle

  • Microsoft SQL Server

  • MySQL

  • PostgreSQL

Contact your SingleStore account team or SingleStore Sales if you want to move data from a source not listed above.

Ingest Architecture

Ingest replicates data from any supported source to a SingleStore destination database. It is a fully self-service, automated data replication tool.

SingleStore Flow, of which Ingest is a part, offers several deployment strategies for its customers, including:

  • Standard deployment in an AWS environment

  • High Availability deployment in an AWS environment

  • Hybrid deployment using both on-premises and cloud infrastructure

  • Fully on-premises deployment

Flow components can be deployed in Google Cloud and Microsoft Azure as well. AWS components and services are referenced here as an illustration of a common type of deployment.

Ingest uses log-based Change Data Capture for data replication. The following is the technical architecture diagram that illustrates the standard setup in an AWS environment.

The following diagram serves as the reference for all setup instructions.

Estimated deployment time: Approximately 1 hour

Ingest / AWS Service Integration

The following is the Ingest architecture, which showcases integration with various optional AWS services in a standard deployment.

This architecture diagram illustrates a standard deployment that highlights the following features:

  • AWS services running alongside Ingest.

  • Recommended Flow architecture for a VPC in AWS.

  • Data flow between the source database, AWS, and SingleStore destination database, including security and monitoring features.

  • Security, including IAM, organized in a separate group and integrated with Ingest.

Ingest High Availability Architecture

The following High Availability architecture explains how Ingest is deployed in a multi-AZ setup. In the event of an instance or AZ failure, it automatically scales to another AZ without incurring any data loss.

Estimated deployment time: Approximately 1 day

Ingest Hybrid Architecture

Ingest also offers a hybrid deployment model that combines on-premises services with those in the AWS Cloud. Ingest can be set up on a Windows server in an on-premises environment, while the SingleStore destination endpoint resides in the AWS Cloud, which makes the deployment hybrid. SingleStore recommends secure connectivity between on-premises and AWS services, which can be achieved using a VPN connection or AWS Direct Connect.

Estimated deployment time: Approximately 2 hours to 1 day

Prerequisites

The following are the prerequisites for launching Ingest on Amazon EC2:

  • Selection of the Ingest volume.

  • Selection of the EC2 instance type.

  • Ensure connectivity between the server/EC2 hosting the Ingest software and the source. Additionally, ensure connectivity to DynamoDB if the high availability option is required.

To create the necessary AWS services, refer to Environment Preparation.

The following are the steps to take before launching Ingest in AWS via custom installation on an EC2 instance:

  1. Create a policy with a relevant name for EC2, such as FlowEC2Policy. For instructions on creating policies, refer to Define custom IAM permissions with customer managed policies.

  2. Use the policy JSON provided in AWS Identity and Access Management (IAM) for SingleStore Flow.

  3. Create an IAM role called FlowEC2Role. For instructions on creating roles, refer to Create a role to delegate permissions to an IAM user.

  4. Attach the FlowEC2Policy to the role.

  5. Create a Lambda policy for disk checks and attach the Lambda policy JSON. Refer to AWS Recovery for the Lambda policy JSON.
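Steps 3 and 4 can also be scripted. The following is a minimal sketch, assuming the `iam` argument is a boto3 IAM client (for example, `boto3.client("iam")`) and that FlowEC2Policy has already been created from the JSON referenced in step 2. The trust policy shown, which lets EC2 assume the role, is an assumption; the permissions policy itself is not reproduced here.

```python
import json


def ec2_trust_policy() -> dict:
    """Trust policy allowing the EC2 service to assume the role (assumed here)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "ec2.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def create_flow_role(iam, policy_arn: str, role_name: str = "FlowEC2Role") -> None:
    """Create the role (step 3) and attach the existing policy to it (step 4).

    `iam` is expected to behave like a boto3 IAM client; `policy_arn` is the
    ARN of the FlowEC2Policy created in steps 1 and 2.
    """
    iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(ec2_trust_policy()),
    )
    iam.attach_role_policy(RoleName=role_name, PolicyArn=policy_arn)
```

Passing the client in as a parameter keeps the sketch free of hard-coded credentials and regions, and lets the function be exercised with a stub before it is run against a real account.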

The following are the recommended EC2 options for replicating source data volumes.

Total Data Volume     EC2 Recommended
< 100 GB              t2.small
100 GB – 300 GB       t2.medium
300 GB – 1 TB         t2.large
> 1 TB                Contact SingleStore Support

These recommendations serve as a starting point. If you have any questions, please contact SingleStore Support or your technical account team representative.
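The sizing guidance above can be expressed as a small lookup. This is a sketch under stated assumptions: the function name is hypothetical, 1 TB is taken as 1000 GB, and because the ranges share their endpoints (100 GB, 300 GB, 1 TB), each upper bound is treated as inclusive.

```python
def recommend_ec2_instance(total_data_gb: float) -> str:
    """Map a total source data volume (in GB) to the suggested starting point.

    Assumptions: 1 TB == 1000 GB; a volume exactly on a boundary falls into
    the range whose upper bound it matches (except 100 GB, since the first
    range is strictly "< 100 GB").
    """
    if total_data_gb < 0:
        raise ValueError("total_data_gb must be non-negative")
    if total_data_gb < 100:
        return "t2.small"
    if total_data_gb <= 300:
        return "t2.medium"
    if total_data_gb <= 1000:
        return "t2.large"
    # Above 1 TB there is no listed instance type.
    return "Contact SingleStore Support"
```

As with the recommendations themselves, the result is a starting point rather than a guarantee of adequate capacity.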

The following are the system requirements when not using Amazon EC2:

  • Port 8081 must be open on the server hosting the Ingest software.

  • Google Chrome is required as the internet browser on the server hosting the Ingest software.

  • Java version 21 or higher is required.

  • If using Microsoft SQL Server as a source, download and install the BCP utility.

  • Ensure connectivity from the server hosting the Ingest software to the source, and to DynamoDB if the high availability option is required.
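Two of the requirements above, the Java version and port 8081, lend themselves to a quick pre-flight check. The following is a minimal sketch using only the Python standard library; the function names are hypothetical. Note that the port check can only succeed once something is listening on the port, so it is useful after Ingest has been started rather than before.

```python
import re
import socket
import subprocess

REQUIRED_JAVA_MAJOR = 21   # from the requirements list above
INGEST_PORT = 8081         # from the requirements list above


def parse_java_major(version_output: str) -> int:
    """Extract the major version from the text printed by `java -version`."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("could not find a Java version string")
    major = int(match.group(1))
    # Legacy numbering: "1.8.0_292" denotes Java 8.
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


def java_is_supported(version_output: str) -> bool:
    """True if the reported Java version meets the stated minimum."""
    return parse_java_major(version_output) >= REQUIRED_JAVA_MAJOR


def port_is_reachable(host: str, port: int = INGEST_PORT, timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # `java -version` writes its report to stderr, not stdout.
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    print("Java OK:", java_is_supported(result.stderr))
    print("Port 8081 reachable:", port_is_reachable("localhost"))
```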

The following describes the hardware configuration for a Windows server; a similar configuration is recommended for a Linux-based (for example, Ubuntu) server. It assumes a small number of source and target combinations (ideally three, of medium size). Treat it as a guide: the resources needed depend on how intensively data is replicated from these sources, and additional resources may be required as the volume of replicated data grows. The disk space required also depends on the amount of data being replicated.

Component             Specification
Processor             4 cores
Memory                16 GB
Disk requirements     Varies based on the data being extracted, with a minimum of 300 GB
Network performance   High
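A candidate server can be compared against this configuration with a short script. This is a sketch under assumptions: the processor and memory figures are treated as minimums (the table states a minimum only for disk), disk size is taken as total capacity in decimal gigabytes, and installed memory is supplied by the caller because the Python standard library has no portable way to read it. All names are hypothetical.

```python
import os
import shutil

MIN_CORES = 4          # treated as a minimum (assumption)
MIN_MEMORY_GB = 16     # treated as a minimum (assumption)
MIN_DISK_GB = 300      # stated minimum


def find_shortfalls(cores: int, memory_gb: float, disk_gb: float) -> list:
    """Return a message for each value that falls below the configuration."""
    shortfalls = []
    if cores < MIN_CORES:
        shortfalls.append(f"cores: {cores} < {MIN_CORES}")
    if memory_gb < MIN_MEMORY_GB:
        shortfalls.append(f"memory: {memory_gb} GB < {MIN_MEMORY_GB} GB")
    if disk_gb < MIN_DISK_GB:
        shortfalls.append(f"disk: {disk_gb:.0f} GB < {MIN_DISK_GB} GB")
    return shortfalls


def local_cores_and_disk(path: str = ".") -> tuple:
    """Read the core count and the total size (GB) of the volume holding `path`."""
    cores = os.cpu_count() or 0
    disk_gb = shutil.disk_usage(path).total / 1e9
    return cores, disk_gb
```

For example, `find_shortfalls(*local_cores_and_disk()[:1], 16, local_cores_and_disk()[1])` is more readable when unpacked first: read `cores, disk_gb = local_cores_and_disk()`, then call `find_shortfalls(cores, 16, disk_gb)` with the memory figure taken from the operating system's own tools. An empty list means no shortfall was found.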

Prerequisites for Software on Server

The following software must be installed on the server:

Required Skills

Flow is a suite of robust applications that enables seamless data replication to the cloud. It handles large data volumes with ease, and the process is fully automated. The setup takes only three simple steps. The application does not require highly technical resources, but basic knowledge of the following is recommended for deployment:

  • AWS Cloud Fundamentals

  • Basic database skills, including writing and executing database queries (for RDBMS endpoints)

  • Familiarity with using Microsoft Windows or Linux-based systems

Installation

For details on how to install Ingest and other Flow components, refer to Install SingleStore Flow.


Last modified: February 7, 2025

