Load Data with Pipelines

This part of the tutorial shows how to ingest TPC-H data from a public AWS S3 bucket into the SingleStore database using pipelines.

Note

The SQL Editor only runs the queries that you select, so ensure you have them all selected before selecting Run.

  1. Run the following SQL commands to create the pipelines:

    USE tpch;
    CREATE OR REPLACE PIPELINE tpch_100_lineitem
    AS LOAD DATA S3 'memsql-tpch-dataset/sf_100/lineitem/'
    config '{"region":"us-east-1"}'
    SKIP DUPLICATE KEY ERRORS
    INTO TABLE lineitem
    FIELDS TERMINATED BY '|'
    LINES TERMINATED BY '|\n';
    CREATE OR REPLACE PIPELINE tpch_100_customer
    AS LOAD DATA S3 'memsql-tpch-dataset/sf_100/customer/'
    config '{"region":"us-east-1"}'
    SKIP DUPLICATE KEY ERRORS
    INTO TABLE customer
    FIELDS TERMINATED BY '|'
    LINES TERMINATED BY '|\n';
    CREATE OR REPLACE PIPELINE tpch_100_nation
    AS LOAD DATA S3 'memsql-tpch-dataset/sf_100/nation/'
    config '{"region":"us-east-1"}'
    SKIP DUPLICATE KEY ERRORS
    INTO TABLE nation
    FIELDS TERMINATED BY '|'
    LINES TERMINATED BY '|\n';
    CREATE OR REPLACE PIPELINE tpch_100_orders
    AS LOAD DATA S3 'memsql-tpch-dataset/sf_100/orders/'
    config '{"region":"us-east-1"}'
    SKIP DUPLICATE KEY ERRORS
    INTO TABLE orders
    FIELDS TERMINATED BY '|'
    LINES TERMINATED BY '|\n';
    CREATE OR REPLACE PIPELINE tpch_100_part
    AS LOAD DATA S3 'memsql-tpch-dataset/sf_100/part/'
    config '{"region":"us-east-1"}'
    SKIP DUPLICATE KEY ERRORS
    INTO TABLE part
    FIELDS TERMINATED BY '|'
    LINES TERMINATED BY '|\n';
    CREATE OR REPLACE PIPELINE tpch_100_partsupp
    AS LOAD DATA S3 'memsql-tpch-dataset/sf_100/partsupp/'
    config '{"region":"us-east-1"}'
    SKIP DUPLICATE KEY ERRORS
    INTO TABLE partsupp
    FIELDS TERMINATED BY '|'
    LINES TERMINATED BY '|\n';
    CREATE OR REPLACE PIPELINE tpch_100_region
    AS LOAD DATA S3 'memsql-tpch-dataset/sf_100/region/'
    config '{"region":"us-east-1"}'
    SKIP DUPLICATE KEY ERRORS
    INTO TABLE region
    FIELDS TERMINATED BY '|'
    LINES TERMINATED BY '|\n';
    CREATE OR REPLACE PIPELINE tpch_100_supplier
    AS LOAD DATA S3 'memsql-tpch-dataset/sf_100/supplier/'
    config '{"region":"us-east-1"}'
    SKIP DUPLICATE KEY ERRORS
    INTO TABLE supplier
    FIELDS TERMINATED BY '|'
    LINES TERMINATED BY '|\n';
  2. Run the following SQL commands to start the pipelines:

    USE tpch;
    START ALL PIPELINES;

    Once the Success message is returned for all the created pipelines, SingleStore starts ingesting the data from the S3 bucket.

Last modified: October 28, 2024

Was this article helpful?

Verification instructions

Note: You must install cosign to verify the authenticity of the SingleStore file.

Use the following steps to verify the authenticity of singlestoredb-server, singlestoredb-toolbox, singlestoredb-studio, and singlestore-client SingleStore files that have been downloaded.

You may perform the following steps on any computer that can run cosign, such as the main deployment host of the cluster.

  1. (Optional) Run the following command to view the associated signature files.

    curl undefined
  2. Download the signature file from the SingleStore release server.

    • Option 1: Click the Download Signature button next to the SingleStore file.

    • Option 2: Copy and paste the following URL into the address bar of your browser and save the signature file.

    • Option 3: Run the following command to download the signature file.

      curl -O undefined
  3. After the signature file has been downloaded, run the following command to verify the authenticity of the SingleStore file.

    echo -n undefined |
    cosign verify-blob --certificate-oidc-issuer https://oidc.eks.us-east-1.amazonaws.com/id/CCDCDBA1379A5596AB5B2E46DCA385BC \
    --certificate-identity https://kubernetes.io/namespaces/freya-production/serviceaccounts/job-worker \
    --bundle undefined \
    --new-bundle-format -
    Verified OK