Load Data Using pipeline_source_file()

Pipelines can extract, transform, and insert objects from an Amazon S3 bucket into a destination table. A pipeline can also record the name of the file that each row was loaded from by using the pipeline_source_file() function.

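The following example uses an S3 pipeline to load three files, Books/Horror.csv, Books/Nautical.csv, and Books/science_fiction.csv, into a table. Each file has a single numeric column with one ISBN per line. First, create the destination table. The title column will hold the name of the source file for each row.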

CREATE TABLE book_inventory (isbn NUMERIC(13), title VARCHAR(50));

Create an S3 pipeline to ingest the data.

-- Load the isbn column from each file and store the file's name in title.
CREATE PIPELINE books AS
LOAD DATA S3 's3://<bucket_name>/Books/'
CONFIG '{"region":"us-west-2"}'
CREDENTIALS '{"aws_access_key_id": "<access_key_id>",
              "aws_secret_access_key": "<secret_access_key>"}'
SKIP DUPLICATE KEY ERRORS
INTO TABLE book_inventory
(isbn)
SET title = pipeline_source_file();
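The SET clause above fills title with the name of the file each row came from. The Python sketch below models that per-row behavior outside SingleStore, assuming the layout described earlier (one ISBN per line). The load_with_source_file helper and the in-memory file contents are hypothetical; this illustrates the semantics only and is not how SingleStore implements pipelines.

```python
from typing import Dict, List, Tuple


def load_with_source_file(files: Dict[str, str]) -> List[Tuple[int, str]]:
    """Hypothetical model of `(isbn) SET title = pipeline_source_file()`.

    `files` maps an S3 object key (for example "Books/Horror.csv") to the
    file's text. Each non-empty line is parsed as one ISBN, and the row is
    tagged with the key of the file it was read from.
    """
    rows: List[Tuple[int, str]] = []
    for key, body in files.items():
        for line in body.splitlines():
            line = line.strip()
            if not line:
                continue  # skip blank lines, such as a trailing newline
            rows.append((int(line), key))
    return rows


if __name__ == "__main__":
    # Made-up file contents; the ISBNs are taken from the sample output.
    sample = {
        "Books/Horror.csv": "9780770437404\n9780380977277\n",
        "Books/science_fiction.csv": "9780380973835\n9780739462287\n",
    }
    for isbn, title in load_with_source_file(sample):
        print(isbn, title)
```

Every row produced from a given file carries the same key, which is why the TEST PIPELINE output shows Books/Horror.csv for each of the first five rows.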

Test the pipeline:

TEST PIPELINE books LIMIT 5;
+---------------+------------------+
| isbn          | title            |
+---------------+------------------+
| 9780770437404 | Books/Horror.csv |
| 9780380977277 | Books/Horror.csv |
| 9780385319676 | Books/Horror.csv |
| 9781416552963 | Books/Horror.csv |
| 9780316362269 | Books/Horror.csv |
+---------------+------------------+

Start the pipeline.

START PIPELINE books;

Query the table to verify that every row contains the name of the file it was loaded from.

SELECT * FROM book_inventory;
+---------------+---------------------------+
| isbn          | title                     |
+---------------+---------------------------+
| 9780316137492 | Books/Nautical.csv        |
| 9780440117377 | Books/Horror.csv          |
| 9780297866374 | Books/Nautical.csv        |
| 9780006166269 | Books/Nautical.csv        |
| 9780721405971 | Books/Nautical.csv        |
| 9781416552963 | Books/Horror.csv          |
| 9780316362269 | Books/Horror.csv          |
| 9783104026886 | Books/Nautical.csv        |
| 9788496957879 | Books/Nautical.csv        |
| 9780380783601 | Books/Horror.csv          |
| 9780380973835 | Books/science_fiction.csv |
| 9780739462287 | Books/science_fiction.csv |
+---------------+---------------------------+
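For larger tables, an aggregate by title is easier to check than reading every row. The sketch below runs two such queries from Python against an in-memory SQLite database loaded with the sample rows shown above. SQLite is only a local stand-in for SingleStore here; the queries are plain SQL and are expected to work against book_inventory in SingleStore as well, but the sketch has not been run there.

```python
import sqlite3
from typing import List, Tuple

# Sample rows copied from the SELECT output above.
ROWS = [
    (9780316137492, "Books/Nautical.csv"),
    (9780440117377, "Books/Horror.csv"),
    (9780297866374, "Books/Nautical.csv"),
    (9780006166269, "Books/Nautical.csv"),
    (9780721405971, "Books/Nautical.csv"),
    (9781416552963, "Books/Horror.csv"),
    (9780316362269, "Books/Horror.csv"),
    (9783104026886, "Books/Nautical.csv"),
    (9788496957879, "Books/Nautical.csv"),
    (9780380783601, "Books/Horror.csv"),
    (9780380973835, "Books/science_fiction.csv"),
    (9780739462287, "Books/science_fiction.csv"),
]

# Number of rows loaded from each source file.
PER_FILE_SQL = """
    SELECT title, COUNT(*) AS row_count
    FROM book_inventory
    GROUP BY title
    ORDER BY title
"""

# Rows with no source file name recorded (expected to be zero).
MISSING_SQL = """
    SELECT COUNT(*)
    FROM book_inventory
    WHERE title IS NULL OR title = ''
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for book_inventory with the sample rows."""
    conn = sqlite3.connect(":memory:")
    # SQLite accepts these column types, although it does not enforce them.
    conn.execute("CREATE TABLE book_inventory (isbn NUMERIC(13), title VARCHAR(50))")
    conn.executemany("INSERT INTO book_inventory VALUES (?, ?)", ROWS)
    return conn


def summarize(conn: sqlite3.Connection) -> Tuple[List[Tuple[str, int]], int]:
    """Return (rows per source file, count of rows missing a file name)."""
    per_file = conn.execute(PER_FILE_SQL).fetchall()
    (missing,) = conn.execute(MISSING_SQL).fetchone()
    return per_file, missing


if __name__ == "__main__":
    per_file, missing = summarize(build_demo_db())
    for title, count in per_file:
        print(f"{title}: {count} rows")
    print(f"rows without a source file: {missing}")
```

With the sample data, each of the three files is represented and no row is missing a file name.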

To load only the files in a specific folder of your S3 bucket and ignore the files in its subfolders, append the ** wildcard pattern to the folder path, as in 's3://<bucket_name>/<folder_name>/**'. For example:

CREATE PIPELINE <your_pipeline> AS
LOAD DATA S3 's3://<bucket_name>/<folder_name>/**'
CONFIG '{"region":"<your_region>"}'
CREDENTIALS '{"aws_access_key_id": "<access_key_id>",  
              "aws_secret_access_key": "<secret_access_key>"}'
SKIP DUPLICATE KEY ERRORS
INTO TABLE <your_table>;

Using two asterisks (**) after the folder name instructs the pipeline to load all of the files directly in that folder and skip the files in its subfolders. However, the files in the subfolders are still scanned when the pipeline lists the contents of the bucket.
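The folder-only rule described above can be pictured as a filter over object keys: a key qualifies if it is under the folder prefix and the rest of the key contains no further "/". The Python sketch below illustrates that rule under this assumption, using a hypothetical top_level_keys helper and made-up keys; it is not SingleStore's matching code. It also mirrors the scanning caveat: every listed key, including those in subfolders, is examined before being kept or discarded.

```python
from typing import Iterable, List


def top_level_keys(keys: Iterable[str], folder: str) -> List[str]:
    """Hypothetical model of 's3://<bucket_name>/<folder>/**' matching.

    Keeps keys that sit directly inside `folder` and drops keys in its
    subfolders. Every key in `keys` is still inspected.
    """
    prefix = folder.rstrip("/") + "/"
    selected: List[str] = []
    for key in keys:  # subfolder keys are examined too, then skipped
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        # An empty remainder is the folder placeholder itself; a "/" means a subfolder.
        if rest and "/" not in rest:
            selected.append(key)
    return selected


if __name__ == "__main__":
    listing = [
        "Books/Horror.csv",
        "Books/Nautical.csv",
        "Books/archive/old.csv",
        "Magazines/issue1.csv",
    ]
    print(top_level_keys(listing, "Books"))
```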

Last modified: July 16, 2025
