# Load Data from Amazon Web Services (AWS) S3

SingleStore Pipelines can extract objects from Amazon S3 buckets, optionally transform them, and insert them into a destination table. To understand Amazon S3’s core concepts and the terminology used in this topic, please read the [Amazon S3 documentation](http://docs.aws.amazon.com/AmazonS3/latest/dev/Welcome.html).

## Prerequisites

To complete this Quickstart, your environment must meet the following prerequisites:

* **AWS Account**: This Quickstart uses Amazon S3 and requires an AWS account’s *access key id* and *secret access key*.
* **SingleStore installation or a SingleStore Helios workspace**: You will connect to the database or workspace and create a pipeline to pull data from your Amazon S3 bucket.

## Part 1: Creating an Amazon S3 Bucket and Adding a File

1. On your local machine, create a text file with the following CSV contents and name it `books.txt`:
   ```
   The Catcher in the Rye, J.D. Salinger, 1945
   Pride and Prejudice, Jane Austen, 1813
   Of Mice and Men, John Steinbeck, 1937
   Frankenstein, Mary Shelley, 1818

   ```

2. In S3, create a bucket and upload `books.txt` to the bucket. For information on working with S3, refer to the [Amazon S3 documentation](http://docs.aws.amazon.com/AmazonS3/latest/dev/Welcome.html).

   Note that the `aws_access_key_id` that your SingleStore pipeline will use (specified in the `CREDENTIALS` clause of `CREATE PIPELINE library ...` in Part 3) must have read access to both the bucket and the file.
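
As a convenience, the sample file from step 1 can be generated and sanity-checked with a short script before it is uploaded. The following is a minimal sketch that uses only the Python standard library. The function names `write_books` and `read_books` are illustrative and are not part of any SingleStore or AWS tooling.

```python
import csv
from pathlib import Path

# Rows copied from the sample file in step 1.
ROWS = [
    ("The Catcher in the Rye", "J.D. Salinger", "1945"),
    ("Pride and Prejudice", "Jane Austen", "1813"),
    ("Of Mice and Men", "John Steinbeck", "1937"),
    ("Frankenstein", "Mary Shelley", "1818"),
]


def write_books(path):
    """Write the sample rows using ', ' between fields, as in step 1."""
    lines = [", ".join(row) for row in ROWS]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")


def read_books(path):
    """Parse the file with ',' as the only delimiter."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.reader(handle))


if __name__ == "__main__":
    write_books("books.txt")
    for row in read_books("books.txt"):
        print(row)
```

Each parsed row has three fields. Because the delimiter is a bare comma, the second and third fields keep the space that follows the comma (for example, `' J.D. Salinger'`), which is consistent with the extra leading space visible in the author and date values of the query output later in this Quickstart.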

Once the `books.txt` file has been uploaded, you can proceed to the next part of the Quickstart.

## Part 2: Generating AWS Credentials

To be able to use an S3 bucket within the pipeline syntax, the following minimum permissions are required:

* s3:GetObject
* s3:ListBucket

These permissions grant read-only access to an S3 bucket, which is the minimum a pipeline needs to ingest data.

There are two ways to create an IAM policy: with the Visual editor or with JSON. Both methods require obtaining the bucket's Amazon Resource Name (ARN) before the policy is created.

## Create an IAM Policy Using the Visual Editor

1. Log into the AWS Management Console.

2. Obtain the Amazon Resource Name (ARN) and region for the bucket. The ARN and region are located in the **Properties** tab of the bucket.

3. Select **IAM** from the list of services.

4. Select **Policies** under Access Management and select the **Create policy** button.

5. Using the Visual editor:

   1. Select the **Service** link and select **S3** from the list or manually enter S3 into the search block.

   2. Select the **S3** link from the available selections.

   3. In the Action section, select the **List** and **Read** checkboxes.

   4. Under Resources, select the **bucket** link and select the **Add ARN** link. Enter the ARN and bucket name and select the **Add** button.

   5. Under Resources, select the **object** link and select the **Add ARN** link. Enter the ARN and object name and select the **Add** button. If no objects are added under resources, the created policy has access to all objects in the bucket’s root path.

   6. Request conditions are optional.

## Create an IAM Policy Using JSON

1. To use JSON for policy creation, copy the contents of the following code block into the AWS JSON tab. Make sure to replace `<bucket_name>` with the name of your bucket.
   ```json
   {
   	"Version": "2012-10-17",
   	"Statement": [{
   		"Sid": "VisualEditor1",
   		"Effect": "Allow",
   		"Action": [
   			"s3:GetObject",
   			"s3:ListBucket"
   		],
   		"Resource": [
   			"arn:aws:s3:::<bucket_name>",
   			"arn:aws:s3:::<bucket_name>/*"
   		]
   	}]
   }
   ```

2. Select the **Add tag** button if needed and select **Next: Review**.

3. Enter a policy name, which is a required field. The description field is optional. Select **Create policy** to finish.
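
If you manage several buckets, the policy document can also be generated rather than edited by hand. The following is a minimal sketch in Python that reproduces the JSON policy shown in step 1 for a given bucket name. The function name `build_read_only_policy` is hypothetical, and the output still has to be pasted into the AWS JSON tab or supplied to whatever tooling you use.

```python
import json


def build_read_only_policy(bucket_name):
    """Return an IAM policy document granting list and read access to one bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "VisualEditor1",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                # s3:ListBucket is evaluated against the bucket ARN,
                # s3:GetObject against the object ARNs under it.
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
            }
        ],
    }


if __name__ == "__main__":
    # "my-bucket-name" is a placeholder; substitute your own bucket.
    print(json.dumps(build_read_only_policy("my-bucket-name"), indent=2))
```

Both resource entries are needed: the bucket ARN covers listing, and the `/*` entry covers reading the objects themselves.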

## Assign the IAM Policy to a New User

1. In the IAM services, select **Users** and select the **Add users** button.

2. Enter a name for the new user and select **Next**.

3. Select the **Attach policies directly** radio button. Use the search box to find the policy or scroll through the list of available policies.

4. Select the checkbox next to the policy to be applied to the user and select **Next**.

5. Select the **Create user** button to finish.

## Create Access Keys for Pipeline Syntax

Access keys must be generated for the new user.

1. In the IAM services, select **Users** and select the user name.

2. Select the **Security credentials** tab.

3. In the access keys section, select the **Create access key** button.

4. Select the **Third-party service** radio button and select **Next**.

5. Although setting a description tag is optional, SingleStore recommends doing so, especially when multiple keys are needed. Select the **Create key** button to continue.

6. Either download a `.csv` file containing the access and secret key information or copy the credentials directly. Select **Done** when finished.

7. The following is the basic syntax for using an access key ID and a secret access key in a pipeline:
   ```sql
   CREATE PIPELINE <pipeline_name> AS
   LOAD DATA S3 's3://<bucket_name>/<file_name>'
   CONFIG '{"region":"us-west-2"}'
   CREDENTIALS '{"aws_access_key_id": "<aws_access_key_id>",
                 "aws_secret_access_key": "<aws_secret_access_key>"}'
   INTO TABLE <destination_table>
   FIELDS TERMINATED BY ',';
   ```

> **⚠️ Warning**: If the key information is not downloaded or copied to a secure location before selecting **Done**, the secret key *cannot* be retrieved and must be recreated.

If creating or starting an S3 pipeline takes approximately `60` seconds or fails with a subprocess timeout, which can happen when running outside AWS or in environments where IMDS is blocked, reduce the value of the `subprocess_ec2_metadata_timeout_ms` engine variable (for example, to `1000`) or explicitly provide `CREDENTIALS`. New workspaces already set this value to `1` (millisecond) to avoid delays. Refer to [Sync Variables Lists](https://docs.singlestore.com/cloud/reference/configuration-reference/engine-variables/list-of-engine-variables/#sync-variables-lists.md) for more information.

## Part 3: Creating a SingleStore Database and S3 Pipeline

Now that you have an S3 bucket that contains an object (file), you can use SingleStore Helios or DB to create a new pipeline and ingest the data.

Create a new database and a table that adheres to the schema contained in the `books.txt` file. At the prompt, execute the following statements:

```sql
CREATE DATABASE books;
USE books;
```

```sql
CREATE TABLE classic_books
(
title VARCHAR(255),
author VARCHAR(255),
date VARCHAR(255)
);

```

These statements create a new database named `books` and a new table named `classic_books`, which has three columns: `title`, `author`, and `date`.

Now that the destination database and table have been created, you can create an S3 pipeline. In Part 1 of this Quickstart, you uploaded the `books.txt` file to your bucket. To create the pipeline, you will need the following information:

* The name of the bucket, such as: `<bucket-name>`
* The name of the bucket’s region, such as: `us-west-1`
* Your AWS account’s access keys, such as:

  * *Access Key ID*: `<aws_access_key_id>`
  * *Secret Access Key*: `<aws_secret_access_key>`
* Your AWS session token, if you are using temporary credentials, such as:

  * *Session Token*: `your_session_token`
  * Note that the `aws_session_token` is required only if the credentials in the `CREDENTIALS` clause are temporary.

Using these identifiers and keys, execute the following statement, replacing the placeholder values with your own.

```sql
CREATE PIPELINE library
AS LOAD DATA S3 'my-bucket-name'
CONFIG '{"region": "us-west-1"}'
CREDENTIALS '{"aws_access_key_id": "your_access_key_id", "aws_secret_access_key": "your_secret_access_key", "aws_session_token": "your_session_token"}'
INTO TABLE `classic_books`
FIELDS TERMINATED BY ',';

```
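
If the statement is generated from a script, serializing the `CONFIG` and `CREDENTIALS` values with a JSON library helps avoid quoting mistakes and makes it easy to include `aws_session_token` only when it is needed. The following is a minimal sketch in Python under those assumptions. The function name `build_create_pipeline` is hypothetical, it performs no SQL escaping (it rejects suspicious characters instead), and it only builds the statement text; running it against the database is left to your client of choice.

```python
import json


def build_create_pipeline(pipeline, bucket, region, table,
                          access_key_id, secret_access_key,
                          session_token=None):
    """Assemble a CREATE PIPELINE ... LOAD DATA S3 statement as a string."""
    credentials = {
        "aws_access_key_id": access_key_id,
        "aws_secret_access_key": secret_access_key,
    }
    # The session token is only needed for temporary credentials.
    if session_token:
        credentials["aws_session_token"] = session_token

    config_json = json.dumps({"region": region})
    credentials_json = json.dumps(credentials)

    # This sketch does no SQL escaping, so reject characters that would
    # break out of the single-quoted literals or the backtick-quoted name.
    for text in (bucket, config_json, credentials_json):
        if "'" in text or "\\" in text:
            raise ValueError("quote or backslash in a pipeline setting")
    if "`" in table or not pipeline.isidentifier():
        raise ValueError("unexpected character in pipeline or table name")

    return (
        f"CREATE PIPELINE {pipeline}\n"
        f"AS LOAD DATA S3 '{bucket}'\n"
        f"CONFIG '{config_json}'\n"
        f"CREDENTIALS '{credentials_json}'\n"
        f"INTO TABLE `{table}`\n"
        "FIELDS TERMINATED BY ',';"
    )


if __name__ == "__main__":
    # All argument values here are placeholders.
    print(build_create_pipeline(
        "library", "my-bucket-name", "us-west-1", "classic_books",
        "your_access_key_id", "your_secret_access_key"))
```

Avoid logging or committing the generated statement, since it contains the secret access key in plain text.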

You can see which files the pipeline will load by running the following:

```sql
SELECT * FROM information_schema.PIPELINES_FILES;

```

If everything is properly configured, you should see one row in the `Unloaded` state, corresponding to `books.txt`. The `CREATE PIPELINE` statement creates a new pipeline named `library`, but the pipeline has not yet been started, and no data has been loaded. A SingleStore pipeline can run either in the background or be triggered by a foreground query. Start it in the foreground first.

```sql
START PIPELINE library FOREGROUND;

```

When this command returns successfully, all files from your bucket have been loaded. If you check `information_schema.PIPELINES_FILES` again, you should see all files in the `Loaded` state. Now query the `classic_books` table to make sure the data has actually loaded.

```sql
SELECT * FROM classic_books;

```

```output

+------------------------+-----------------+-------+
| title                  | author          | date  |
+------------------------+-----------------+-------+
| The Catcher in the Rye |  J.D. Salinger  |  1945 |
| Pride and Prejudice    |  Jane Austen    |  1813 |
| Of Mice and Men        |  John Steinbeck |  1937 |
| Frankenstein           |  Mary Shelley   |  1818 |
+------------------------+-----------------+-------+

```

You can also have SingleStore run your pipeline in the background. In such a configuration, SingleStore will periodically poll S3 for new files and continuously load them as they are added to the bucket. Before running your pipeline in the background, you must reset the state of the pipeline and the table.

```sql
DELETE FROM classic_books;
ALTER PIPELINE library SET OFFSETS EARLIEST;

```

The first command deletes all rows from the target table. The second causes the pipeline to start from the beginning, in this case, *forgetting* it already loaded `books.txt` so you can load it again. You can also drop and recreate the pipeline, if you prefer.

To start a pipeline in the background, run:

```sql
START PIPELINE library;

```

This statement starts the pipeline. To see whether the pipeline is running, run `SHOW PIPELINES`.

```sql
SHOW PIPELINES;

```

```output

+----------------------+---------+
| Pipelines_in_books   | State   |
+----------------------+---------+
| library              | Running |
+----------------------+---------+

```

At this point, the pipeline is running and the contents of the `books.txt` file should once again be present in the `classic_books` table.

> **📝 Note**: Foreground pipelines and background pipelines have different intended uses and behave differently. For more information, see the [START PIPELINE](https://docs.singlestore.com/cloud/reference/sql-reference/pipelines-commands/start-pipeline.md) topic.

## Use Cloud Workload Identity with S3 Pipelines

You can use [Cloud Workload Identity](https://docs.singlestore.com/cloud/user-and-workspace-administration/cloud-workload-identity-and-delegated-entities.md) instead of static credentials to load data via S3 pipelines.

Perform the following tasks to create an S3 pipeline that authenticates using the cloud workload identity:

1. [Configure delegated entities](https://docs.singlestore.com/cloud/user-and-workspace-administration/cloud-workload-identity-and-delegated-entities/#section-id235390403966648.md).

   1. **Create an IAM role** in your AWS account with the necessary privileges. You can also use an existing IAM role.

   2. **Update the IAM role's trust policy** to allow the workspace's cloud workload identity to assume the role. Specify the [cloud workload identity ARN of the SingleStore workspace](https://docs.singlestore.com/cloud/user-and-workspace-administration/cloud-workload-identity-and-delegated-entities/#section-id23539041172686.md).

   Alternatively, [create a CloudFormation stack](https://docs.singlestore.com/#section-id235391786003154.md) to configure the IAM roles.

2. Create an S3 pipeline. In the pipeline configuration:

   1. Set `creds_mode` to `eks_irsa` in the `CONFIG` clause.

   2. Specify the IAM role to assume using `role_arn` in the `CREDENTIALS` clause. The specified role ARN must match the configured delegated entities for the SingleStore workspace. For example:
      ```sql
      CREATE PIPELINE s3_pipeline AS
      LOAD DATA S3 's3://bucket-name/path/'
      CONFIG '{
        "region": "us-east-1",
        "creds_mode": "eks_irsa"
      }'
      CREDENTIALS '{
        "role_arn": "arn:aws:iam::xxxxxxxx:role/singlestore-s3-pipeline"
      }'
      INTO TABLE table_name
      FIELDS TERMINATED BY ',';
      ```

3. Start the pipeline to ingest data.
   ```sql
   START PIPELINE s3_pipeline;
   ```

SingleStore pipelines that use IRSA fail at runtime if delegated entities are not configured or if the specified role ARN is not present in the delegated entities list.
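
Because a mismatched role ARN only surfaces when the pipeline runs, it can be worth checking the value locally before creating the pipeline. The following is a minimal sketch in Python. The function name `check_role_arn` is hypothetical, the delegated entities list is assumed to be a list of role ARN strings that you copied from your workspace configuration by hand, and the pattern only covers the standard `aws` partition.

```python
import re

# IAM role ARNs have the form arn:aws:iam::<12-digit account id>:role/<name>.
# Only the standard "aws" partition is handled in this sketch.
ROLE_ARN_PATTERN = re.compile(r"arn:aws:iam::\d{12}:role/[\w+=,.@/-]+")


def check_role_arn(role_arn, delegated_entities):
    """Return a list of problems found; an empty list means no problem was detected."""
    problems = []
    if not ROLE_ARN_PATTERN.fullmatch(role_arn):
        problems.append("role_arn is not a well-formed IAM role ARN")
    if role_arn not in delegated_entities:
        problems.append("role_arn is not in the delegated entities list")
    return problems


if __name__ == "__main__":
    # The account ID below is a made-up placeholder.
    arn = "arn:aws:iam::123456789012:role/singlestore-s3-pipeline"
    print(check_role_arn(arn, [arn]))
```

An empty result only means the ARN is well formed and appears in the list you supplied; it does not confirm that the role's trust policy or permissions are correct.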

## Use CloudFormation Stack to Configure the IAM Role

Use [this CloudFormation Stack template](https://singlestore-public-resources-production.s3.us-east-1.amazonaws.com/cf-templates/s3-access-role/template.yaml) to define the IAM role. Configure the following input parameters for the stack:

| Parameter                               | Description                                                                                                                                                                  |
| --------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `RoleName`                              | Name of the IAM role that the SingleStore workspace assumes to access the S3 buckets. Default: `SingleStoreS3AccessRole-01` |
| `WorkspaceGroupCloudWorkloadIdentities` | Comma-separated list of the SingleStore workspace's cloud workload identity ARNs that can assume this IAM role. **Note**: Do not include a trailing comma (at the end of the list). |
| `BucketNames`                           | Comma-separated list of S3 buckets that can be accessed by assuming this role. |

Refer to [Getting started with CloudFormation](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/GettingStarted.html) for more information.
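
The comma-separated parameter values are easy to get wrong when typed by hand. The following is a minimal sketch in Python that assembles the three input parameters from lists, so that a trailing comma cannot occur. The function name `build_stack_parameters` is hypothetical. The output uses the `ParameterKey`/`ParameterValue` list shape that the AWS CLI accepts for stack parameters; whether you pass it to the CLI or enter the values in the console is up to you.

```python
import json


def build_stack_parameters(identity_arns, bucket_names,
                           role_name="SingleStoreS3AccessRole-01"):
    """Return template parameters in the ParameterKey/ParameterValue list form."""
    # Dropping empty entries and joining with "," cannot leave a trailing comma.
    identities = ",".join(a.strip() for a in identity_arns if a.strip())
    buckets = ",".join(b.strip() for b in bucket_names if b.strip())
    values = {
        "RoleName": role_name,
        "WorkspaceGroupCloudWorkloadIdentities": identities,
        "BucketNames": buckets,
    }
    return [{"ParameterKey": k, "ParameterValue": v} for k, v in values.items()]


if __name__ == "__main__":
    # Placeholder values; substitute your workspace identity ARNs and buckets.
    params = build_stack_parameters(
        ["<cloud_workload_identity_arn>"], ["my-bucket-name"])
    print(json.dumps(params, indent=2))
```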

## Next Steps

See [About SingleStore Pipelines](https://docs.singlestore.com/cloud/load-data/about-singlestore-pipelines.md) to learn more about how pipelines work.

## In this section

* [Scanning and Loading Files in AWS S3](https://docs.singlestore.com/cloud/load-data/data-sources/load-data-from-amazon-web-services-aws-s-3/scanning-and-loading-files-in-aws-s-3.md)
* [Load Data Using pipeline\_source\_file()](https://docs.singlestore.com/cloud/load-data/data-sources/load-data-from-amazon-web-services-aws-s-3/load-data-using-pipeline-source-file.md)
* [Connect to AWS S3 Bucket from SingleStore](https://docs.singlestore.com/cloud/load-data/data-sources/load-data-from-amazon-web-services-aws-s-3/connect-to-aws-s-3-bucket-from-singlestore.md)
* [Load Data in CSV Format from Amazon S3 Using a Pipeline](https://docs.singlestore.com/cloud/load-data/data-sources/load-data-from-amazon-web-services-aws-s-3/load-data-in-csv-format-from-amazon-s-3-using-a-pipeline.md)
* [Load Data in JSON Format from Amazon S3 Using a Wildcard](https://docs.singlestore.com/cloud/load-data/data-sources/load-data-from-amazon-web-services-aws-s-3/load-data-in-json-format-from-amazon-s-3-using-a-wildcard.md)
* [Enable EKS IRSA](https://docs.singlestore.com/cloud/load-data/data-sources/load-data-from-amazon-web-services-aws-s-3/enable-eks-irsa.md)
* [S3 Pipeline Errors](https://docs.singlestore.com/cloud/load-data/data-sources/load-data-from-amazon-web-services-aws-s-3/s-3-pipeline-errors.md)

***

Modified at: May 18, 2026

Source: [/cloud/load-data/data-sources/load-data-from-amazon-web-services-aws-s-3/](https://docs.singlestore.com/cloud/load-data/data-sources/load-data-from-amazon-web-services-aws-s-3/)

