Skip to main content

Load Data with Pipelines

This part of the guide will show you how to pull the TPC-H data from a public S3 bucket into your SingleStoreDB database using Pipelines. Because of the powerful Pipelines functionality, loading TPC-H SF100 (approximately 100 GBs of row files) will take around four minutes on your cluster in AWS. Once a pipeline has been created, SingleStoreDB will continuously pull data from the bucket.

  1. Create the pipelines by copying the following block. Again, make sure you select all the queries in SQL Editor before clicking Run.

    use tpch;
    
    CREATE OR REPLACE PIPELINE tpch_100_lineitem
        AS LOAD DATA S3 'memsql-tpch-dataset/sf_100/lineitem/'
        config '{"region":"us-east-1"}'
        SKIP DUPLICATE KEY ERRORS
        INTO TABLE lineitem
        FIELDS TERMINATED BY '|'
        LINES TERMINATED BY '|\n';
    
    CREATE OR REPLACE PIPELINE tpch_100_customer
        AS LOAD DATA S3 'memsql-tpch-dataset/sf_100/customer/'
        config '{"region":"us-east-1"}'
        SKIP DUPLICATE KEY ERRORS
        INTO TABLE customer
        FIELDS TERMINATED BY '|'
        LINES TERMINATED BY '|\n';
    
    CREATE OR REPLACE PIPELINE tpch_100_nation
        AS LOAD DATA S3 'memsql-tpch-dataset/sf_100/nation/'
        config '{"region":"us-east-1"}'
        SKIP DUPLICATE KEY ERRORS
        INTO TABLE nation
        FIELDS TERMINATED BY '|'
        LINES TERMINATED BY '|\n';
    
    CREATE OR REPLACE PIPELINE tpch_100_orders
        AS LOAD DATA S3 'memsql-tpch-dataset/sf_100/orders/'
        config '{"region":"us-east-1"}'
        SKIP DUPLICATE KEY ERRORS
        INTO TABLE orders
        FIELDS TERMINATED BY '|'
        LINES TERMINATED BY '|\n';
    
    CREATE OR REPLACE PIPELINE tpch_100_part
        AS LOAD DATA S3 'memsql-tpch-dataset/sf_100/part/'
        config '{"region":"us-east-1"}'
        SKIP DUPLICATE KEY ERRORS
        INTO TABLE part
        FIELDS TERMINATED BY '|'
        LINES TERMINATED BY '|\n';
    
    CREATE OR REPLACE PIPELINE tpch_100_partsupp
        AS LOAD DATA S3 'memsql-tpch-dataset/sf_100/partsupp/'
        config '{"region":"us-east-1"}'
        SKIP DUPLICATE KEY ERRORS
        INTO TABLE partsupp
        FIELDS TERMINATED BY '|'
        LINES TERMINATED BY '|\n';
    
    CREATE OR REPLACE PIPELINE tpch_100_region
        AS LOAD DATA S3 'memsql-tpch-dataset/sf_100/region/'
        config '{"region":"us-east-1"}'
        SKIP DUPLICATE KEY ERRORS
        INTO TABLE region
        FIELDS TERMINATED BY '|'
        LINES TERMINATED BY '|\n';
    
    CREATE OR REPLACE PIPELINE tpch_100_supplier
        AS LOAD DATA S3 'memsql-tpch-dataset/sf_100/supplier/'
        config '{"region":"us-east-1"}'
        SKIP DUPLICATE KEY ERRORS
        INTO TABLE supplier
        FIELDS TERMINATED BY '|'
        LINES TERMINATED BY '|\n';
    
    16083b5779a4fa.png
  2. Start the pipelines by running the following queries.

    use tpch;
    START ALL PIPELINES;
    

    Once you see Success messages for all the Pipelines created, SingleStoreDB will begin pulling data from the S3 datasource.

    16083b577a2628.png

Note

The SQL Editor only runs the queries you have selected, so make sure you have them all selected before clicking Run.