# Load Data Using pipeline\_source\_file()

Pipelines can extract, transform, and insert objects from an Amazon S3 bucket into a destination table. A pipeline can also persist the name of each source file, that is, store the file name alongside the loaded rows so that it can be retrieved and used later, by using the `pipeline_source_file()` helper.

The three files listed below will be loaded into a table using an S3 pipeline. Each file contains a single numeric column with one numeric entry per line, and the pipeline pairs every entry with the name of the file it came from.

![S3 file list](https://images.contentstack.io/v3/assets/bltac01ee6daa3a1e14/bltc0fa32e01c3ae3dc/6a2fe874a3d66d63f7cfb1f1/S3_file_list-pZpIWc.png)

Create the destination table. The `title` column will hold the source file name.

```sql
CREATE TABLE book_inventory(isbn NUMERIC(13),title VARCHAR(50));
```

Create an S3 pipeline to ingest the data.

```sql
CREATE PIPELINE books AS
LOAD DATA S3 's3://<bucket_name>/Books/'
CONFIG '{"region":"us-west-2"}'
CREDENTIALS '{"aws_access_key_id": "<access_key_id>",
              "aws_secret_access_key": "<secret_access_key>"}'
SKIP DUPLICATE KEY ERRORS
INTO TABLE book_inventory
(isbn)
SET title = pipeline_source_file();
```

Test the pipeline:

```sql
TEST PIPELINE books LIMIT 5;
```

```output

+---------------+------------------+
| isbn          | title            |
+---------------+------------------+
| 9780770437404 | Books/Horror.csv |
| 9780380977277 | Books/Horror.csv |
| 9780385319676 | Books/Horror.csv |
| 9781416552963 | Books/Horror.csv |
| 9780316362269 | Books/Horror.csv |
+---------------+------------------+
```
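As the test output shows, `pipeline_source_file()` returns the file name together with its folder prefix (`Books/Horror.csv`). If only the file name itself is wanted, the `SET` clause can wrap the helper in a string function. The following is a minimal sketch, assuming that `SUBSTRING_INDEX()` is accepted inside the pipeline's `SET` clause in your version; the pipeline name `books_file_only` is hypothetical and not part of the original example.

```sql
-- Hypothetical variant of the books pipeline; create it instead of,
-- not in addition to, books, so that rows are not loaded twice.
CREATE PIPELINE books_file_only AS
LOAD DATA S3 's3://<bucket_name>/Books/'
CONFIG '{"region":"us-west-2"}'
CREDENTIALS '{"aws_access_key_id": "<access_key_id>",
              "aws_secret_access_key": "<secret_access_key>"}'
SKIP DUPLICATE KEY ERRORS
INTO TABLE book_inventory
(isbn)
-- SUBSTRING_INDEX(str, '/', -1) returns the text after the last '/',
-- so 'Books/Horror.csv' would be stored as 'Horror.csv'.
SET title = SUBSTRING_INDEX(pipeline_source_file(), '/', -1);
```

The rest of this walkthrough continues with the original `books` pipeline, which stores the full value returned by `pipeline_source_file()`.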

Start the pipeline.

```sql
START PIPELINE books;
```

Query the table to verify that every row has a corresponding filename.

```sql
SELECT * FROM book_inventory;
```

```output

+---------------+---------------------------+
| isbn          | title                     |
+---------------+---------------------------+
| 9780316137492 | Books/Nautical.csv        |
| 9780440117377 | Books/Horror.csv          |
| 9780297866374 | Books/Nautical.csv        |
| 9780006166269 | Books/Nautical.csv        |
| 9780721405971 | Books/Nautical.csv        |
| 9781416552963 | Books/Horror.csv          |
| 9780316362269 | Books/Horror.csv          |
| 9783104026886 | Books/Nautical.csv        |
| 9788496957879 | Books/Nautical.csv        |
| 9780380783601 | Books/Horror.csv          |
| 9780380973835 | Books/science_fiction.csv |
| 9780739462287 | Books/science_fiction.csv |
+---------------+---------------------------+
```
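Because the source file name is stored with each row, it can also be used to summarize the load. As an illustrative sketch, the following query counts how many rows came from each file; the column aliases `source_file` and `row_count` are chosen here for readability and are not part of the original example.

```sql
-- Count the rows loaded from each source file.
SELECT title AS source_file,
       COUNT(*) AS row_count
FROM book_inventory
GROUP BY title
ORDER BY title;
```

A file that is missing from the result contributed no rows, which can help narrow down an empty or skipped file.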

To load files from a specific folder in your S3 bucket while ignoring the files in its subfolders, append the `**` wildcard pattern to the folder path, as in `s3://<bucket_name>/<folder_name>/**`. For example:

```sql
CREATE PIPELINE <your_pipeline> AS
LOAD DATA S3 's3://<bucket_name>/<folder_name>/**'
CONFIG '{"region":"<your_region>"}'
CREDENTIALS '{"aws_access_key_id": "<access_key_id>",  
              "aws_secret_access_key": "<secret_access_key>"}'
SKIP DUPLICATE KEY ERRORS
INTO TABLE <your_table>;
```

The two asterisks (`**`) after the folder name instruct the pipeline to load all of the files in the main folder and ignore the files in the subfolders. However, the files in the subfolders are still scanned when the contents of the bucket are listed.
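One way to confirm which files a pipeline actually picked up is to inspect its file list. The sketch below assumes that the `information_schema.PIPELINES_FILES` view is available in your deployment with the `PIPELINE_NAME`, `FILE_NAME`, and `FILE_STATE` columns; `my_pipeline` is a hypothetical name standing in for `<your_pipeline>`.

```sql
-- List the files tracked by the pipeline and their load state.
-- 'my_pipeline' is a placeholder; substitute your own pipeline name.
SELECT FILE_NAME, FILE_STATE
FROM information_schema.PIPELINES_FILES
WHERE PIPELINE_NAME = 'my_pipeline'
ORDER BY FILE_NAME;
```

If the pattern behaves as described above, files located in subfolders should not appear in this list as loaded.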

***

Modified at: July 16, 2025

Source: [/db/v9.1/load-data/data-sources/load-data-from-amazon-web-services-aws-s-3/load-data-using-pipeline-source-file/](https://docs.singlestore.com/db/v9.1/load-data/data-sources/load-data-from-amazon-web-services-aws-s-3/load-data-using-pipeline-source-file/)

