# Load Data from AWS Glue

AWS Glue is a fully managed, serverless data integration service for extracting, transforming, and loading (ETL) data from various sources for analytics and data processing. The AWS Glue runtime supports connectivity to a variety of data sources.

SingleStore provides a connector for AWS Glue, based on the Apache Spark DataSource API and available through AWS Marketplace. The connection supports VPC networking and integration with [AWS Secrets Manager](https://aws.amazon.com/secrets-manager/) for authentication credentials.

The following architecture diagram shows SingleStore connecting with AWS Glue for an ETL job.

![](https://images.contentstack.io/v3/assets/bltac01ee6daa3a1e14/bltccfe61e49b393274/6a2c423891ca1c5f09253614/16083b576a82e3-hBpfOO.png)

## Prerequisites

* AWS Glue version 5.0+.
* Admin access to the AWS account.
* An active SingleStore workspace with a sample dataset.

  The example uses the `lineitem` table from the `tpch` database. Refer to [Load TPC-H Data into SingleStore](https://docs.singlestore.com/cloud/getting-started-with-singlestore-helios/next-steps-and-examples/sample-data/load-tpc-h-data-into-singlestore.md). Alternatively, you can use another data set stored in your SingleStore database.

> **📝 Note**: The SingleStore AWS Glue connector returns an error if used in the following AWS regions: Hong Kong, São Paulo, Stockholm, Bahrain, and Cape Town.

## Configuration Overview

The following steps use a SingleStore workspace as the source in an AWS Glue ETL job, transform the data, and store the results in the following targets:

* SingleStore database
* Amazon S3 in Parquet format

To create an ETL job using the SingleStore connector:

1. Store authentication credentials in Secrets Manager.

2. Create an [AWS Identity and Access Management (IAM)](http://aws.amazon.com/iam) role for the AWS Glue ETL job.

3. Configure the SingleStore connector and connection.

4. Create an ETL job using the SingleStore connection in AWS Glue Studio.

## Store Authentication Credentials in Secrets Manager

AWS Glue provides integration with AWS Secrets Manager to securely store connection authentication credentials. To create these credentials:

1. Log in to AWS, and open **Secrets Manager** (search for Secrets Manager in the AWS console and select it from the results).

2. Select **Store a new secret**.

3. Under **Secret type**, select **Other type of secret**.

4. Under **Key/value pairs**, set one row for each of the following parameters:

   * `ddlEndpoint`: IP address or hostname of the SingleStore workspace.
   * `database`: Name of the SingleStore database to connect with.
   * `user`: Username of the SingleStore database user with which to connect.
   * `password`: Password for the SingleStore database user.

5. Select **Next**.

6. In the **Secret name** field, enter a name for the secret, for example, `aws-glue-singlestore-connection-info`.

7. Select **Next**.

8. Disable **Automatic rotation**, and then select **Next**.

9. Review the secret configuration and then select **Store**.

10. Copy and securely store the **Secret ARN**.
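
The key/value pairs from step 4 can also be assembled in code. The following is a minimal sketch rather than an official SingleStore or AWS procedure: `build_secret_payload` and `store_secret` are hypothetical helper names, the hostname and credentials are placeholders, and `store_secret` assumes the `boto3` package and valid AWS credentials are available.

```python
import json

# The four keys listed in step 4.
REQUIRED_KEYS = ("ddlEndpoint", "database", "user", "password")


def build_secret_payload(ddl_endpoint, database, user, password):
    """Return a JSON string holding the four connection parameters."""
    payload = {
        "ddlEndpoint": ddl_endpoint,
        "database": database,
        "user": user,
        "password": password,
    }
    missing = [key for key in REQUIRED_KEYS if not payload[key]]
    if missing:
        raise ValueError(f"missing values for: {', '.join(missing)}")
    return json.dumps(payload)


def store_secret(name, secret_string, region):
    """Create the secret with boto3 (assumed installed) and return its ARN."""
    import boto3  # imported lazily so the payload helper works without boto3

    client = boto3.client("secretsmanager", region_name=region)
    response = client.create_secret(Name=name, SecretString=secret_string)
    return response["ARN"]


# Placeholder values for illustration only.
secret_string = build_secret_payload(
    "singlestore.example.com", "tpch", "admin", "example-password"
)
```

Calling `store_secret("aws-glue-singlestore-connection-info", secret_string, "us-east-1")` would then return the ARN to keep for the IAM policy. The Region shown here is only an example.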

## Create an IAM Role for the AWS Glue ETL Job

To create a role with an attached policy that gives the AWS Glue ETL job read-only access to the credentials stored in Secrets Manager:

1. Log in to AWS, and open **IAM** (search for IAM in the AWS console and select it from the results).

2. Select **Policies > Create Policy**.

3. Select **JSON**. On the **JSON** tab, enter the following JSON snippet, replacing `<REGION>` and `<ACCOUNT_ID>` with the Region and account ID from the secret ARN:
   ```json
   {
       "Version": "2012-10-17",
       "Statement": [
           {
               "Sid": "VisualEditor0",
               "Effect": "Allow",
               "Action": [
                   "secretsmanager:GetSecretValue",
                   "secretsmanager:DescribeSecret"
               ],
               "Resource": "arn:aws:secretsmanager:<REGION>:<ACCOUNT_ID>:secret:aws-glue-*"
           }
       ]
   }
   ```

4. Select **Next**.

5. Under **Policy name**, enter a name for the policy, for example, **GlueAccessSecretValue**.

6. Select **Create policy**.

7. On the **IAM Dashboard**, select **Roles > Create role**.

8. Under **Trusted entity type**, select **AWS Service**. Under **Use case**, select **Glue**.

9. Select **Next**.

10. Find and select the following AWS managed policies: `AWSGlueServiceRole` and `AmazonEC2ContainerRegistryReadOnly`.

11. Find and select the `GlueAccessSecretValue` policy created earlier.

12. Select **Next**.

13. In the **Role name** field, enter a role name, for example, **GlueCustomETLConnectionRole**.

14. Confirm that the three policies are selected, and select **Create role**.
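
The policy document from step 3 can also be generated from the secret ARN, which avoids copying the Region and account ID by hand. This is an illustrative sketch: `region_and_account_from_arn` and `secret_read_policy` are hypothetical helpers, and the ARN below is a made-up placeholder, not a real secret.

```python
import json


def region_and_account_from_arn(secret_arn):
    """Extract the Region and account ID fields from a Secrets Manager ARN."""
    parts = secret_arn.split(":")
    if len(parts) < 7 or parts[2] != "secretsmanager":
        raise ValueError("not a Secrets Manager ARN")
    return parts[3], parts[4]


def secret_read_policy(region, account_id, name_prefix="aws-glue-"):
    """Build the read-only policy for secrets whose names start with the prefix."""
    resource = (
        f"arn:aws:secretsmanager:{region}:{account_id}:secret:{name_prefix}*"
    )
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "secretsmanager:GetSecretValue",
                    "secretsmanager:DescribeSecret",
                ],
                "Resource": resource,
            }
        ],
    }


# Placeholder ARN for illustration only.
example_arn = (
    "arn:aws:secretsmanager:us-east-1:123456789012:"
    "secret:aws-glue-singlestore-connection-info-AbCdEf"
)
region, account_id = region_and_account_from_arn(example_arn)
policy_json = json.dumps(secret_read_policy(region, account_id), indent=4)
```

The resulting `policy_json` string can be pasted into the **JSON** tab. The `aws-glue-*` wildcard matches any secret whose name starts with `aws-glue-`, which is why the example secret name uses that prefix.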

## Configure the SingleStore Connector and Connection

To subscribe to the SingleStore connector and configure the connection:

1. On the AWS dashboard, select **AWS Glue Studio** (search for Glue Studio in the AWS console and select it from the results).

2. In the left navigation pane, select **Data connections**.

3. Select **Go to AWS Marketplace**. On the **AWS Marketplace**, search for and select **SingleStore connector for AWS Glue**.

4. Select **View purchase options > Subscribe**.

5. Once the subscription request is successful, select **View subscription > SingleStore connector for AWS Glue > Usage instructions**.

6. On the **Usage Instructions** dialog, open the **Activate the Glue connector using AWS Glue Studio** link to activate the connector.

7. On the **AWS Glue Studio** console, select **Data connections** in the left navigation pane.

8. In the **Connections** section, select **Create connection**.

9. In the **Create connection** dialog, under **Connection properties**, enter a name for the connection in the **Name** field, for example, `SingleStore_connection`.

10. From the **AWS Secret** list, select the AWS secret value `aws-glue-singlestore-connection-info` created earlier.

11. Select **Create connection and activate connector**.

## Create an ETL Job using the SingleStore Connection in AWS Glue Studio

After authentication is set up and the SingleStore connector is configured, create an ETL job using the connection:

1. In the **AWS Glue Studio** console, select **Data connections** in the left navigation pane.

2. In the **Connections** dialog, select **SingleStore\_connection** (the connection created earlier). Select **Create job**.

3. On the **Job details** tab, in the **Name** field, enter a name for the job, for example, **SingleStore\_transform\_job**.

4. In the **Description** field, enter a description, for example, **Glue job to transform tpch data from SingleStore Helios**.

5. From the **IAM Role** list, select **GlueCustomETLConnectionRole**.

6. From the **Glue version** list, select a Glue version.

7. Use default settings for other properties. Select **Save**.

8. On the **Visual** tab, in the workspace area, select the **SingleStore** connection. On the **Data source properties – Connector** tab, expand **Connection options**, and select **Add new option**.

   ![](https://images.contentstack.io/v3/assets/bltac01ee6daa3a1e14/blt80df0287b42cb2ab/6a2c436303b373445e908174/glue_select_data_source-uTkfb1.png)

9. Under **Connection options**, in the **Key** field, enter **dbtable**. In the **Value** field, enter `lineitem`.

10. On the **Output schema** tab, select **Edit**. Select the three dots, and select **Add root key** from the list.
    > **📝 Note**: In this example, AWS Glue Studio is using information stored in the connection to access the data source instead of retrieving metadata information from a Data Catalog table. Hence, you must provide the schema metadata for the data source. Use the schema editor to update the source schema. For instructions on how to use the schema editor, refer to [Editing the schema in a custom transform node](https://docs.aws.amazon.com/glue/latest/ug/edit-jobs-transforms.html).

11. Add the key and data type, which represent the name of the column in the database and its data type, respectively. Select **Apply**.

12. On the **Visual** tab, in the workspace area, select **Add nodes**, and then select **Drop Fields** from the list.

13. On the **Transform** tab (for Drop Fields), from the **Node parents** list, select **SingleStore connector for AWS Glue**.

14. Select the fields to drop.

15. On the **Visual** tab, in the workspace area, select **Add nodes**, and then select **Custom Transform** from the list.

16. From the **Node parents** list, select **Drop Fields**.

17. Add the following script to **Code block**:
    ```python
    def MyTransform(glueContext, dfc) -> DynamicFrameCollection:
        from pyspark.sql.functions import col

        # Take the first DynamicFrame in the collection and keep 100 rows.
        df = dfc.select(list(dfc.keys())[0]).toDF().limit(100)
        # Discounted price: extended price reduced by the discount rate.
        df1 = df.withColumn("disc_price", (col("l_extendedprice") * (1 - col("l_discount"))).cast("decimal(10,2)"))
        # Final price: discounted price with tax applied.
        df2 = df1.withColumn("price", (col("disc_price") * (1 + col("l_tax"))).cast("decimal(10,2)"))
        dyf = DynamicFrame.fromDF(df2, glueContext, "updated_lineitem")

        # Write the transformed rows back to SingleStore through the connection.
        glueContext.write_dynamic_frame.from_options(
            frame=dyf,
            connection_type="marketplace.spark",
            connection_options={"dbtable": "updated_lineitem", "connectionName": "SingleStore_connection"},
        )

        return DynamicFrameCollection({"CustomTransform0": dyf}, glueContext)
    ```
    This example calculates two additional columns, `disc_price` and `price`. It uses `glueContext.write_dynamic_frame` to write the updated data back to SingleStore using the connection **SingleStore\_connection** created earlier.
    > **📝 Note**: You can specify any of the SingleStore Spark connector [configuration settings](https://docs.singlestore.com/cloud/load-data/integrate-with-singlestore-helios/load-data-from-spark/configuration-settings.md) in `connection_options`. For example, the following statement (from the script) is extended to specify a shard key:
    > ```python
    > glueContext.write_dynamic_frame.from_options(
    >     frame=dyf,
    >     connection_type="marketplace.spark",
    >     connection_options={"dbtable": "updated_lineitem", "connectionName": "SingleStore_connection", "tableKey.shard": "l_partkey"},
    > )
    > ```

18. On the **Output schema** tab, select **Edit**. Add the additional columns `price` and `disc_price` with the `decimal` data type. Select **Apply**.

19. On the **Visual** tab, in the workspace area, select **Add nodes > Select from Collection**.

20. On the **Transform** tab, from the **Node parents** list, select **Custom Transform**.

21. Select **Add nodes**. From the **Target** list, select **Amazon S3**.

22. From the **Node parents** list, select **Select from Collection**.

23. On the **Data target properties - S3** tab, from the **Format** list, select **Parquet**.

24. In the **S3 Target Location** field, enter `s3://aws-glue-assets-{Your Account ID as a 12-digit number}-{AWS region}/output/` or select from the list.

    ![](https://images.contentstack.io/v3/assets/bltac01ee6daa3a1e14/bltf4a36b93f3154ade/6a2c42c5dc08997ef32f23db/glue_s3_target_location-Iuf5Cw.png)

25. Select **Save > Run**.
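
To see what the custom transform in step 17 computes, the two derived columns can be reproduced for a single row with Python's `decimal` module. This is a worked sketch only: the input values are invented rather than taken from the TPC-H data, `derived_prices` is a hypothetical helper, and it assumes the cast to `decimal(10,2)` rounds half up to two decimal places.

```python
from decimal import Decimal, ROUND_HALF_UP

TWO_PLACES = Decimal("0.01")


def derived_prices(extended_price, discount, tax):
    """Mirror the disc_price and price expressions from the custom transform."""
    # disc_price = l_extendedprice * (1 - l_discount), kept to two decimals
    disc_price = (extended_price * (1 - discount)).quantize(
        TWO_PLACES, rounding=ROUND_HALF_UP
    )
    # price = disc_price * (1 + l_tax), kept to two decimals
    price = (disc_price * (1 + tax)).quantize(TWO_PLACES, rounding=ROUND_HALF_UP)
    return disc_price, price


# Invented sample row: extended price 1000.00, 5% discount, 8% tax.
disc_price, price = derived_prices(
    Decimal("1000.00"), Decimal("0.05"), Decimal("0.08")
)
print(disc_price, price)  # prints "950.00 1026.00"
```

Because `price` is derived from the already rounded `disc_price`, the rounding is applied twice, in the same order as in the transform.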

The transformed data is now stored in Amazon S3 and the SingleStore database.
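
The connection options used in this job (the `dbtable` key in step 9 and the `connection_options` dictionary in step 17) follow one pattern, which can be sketched for a script-based job as shown below. The helper names `singlestore_options`, `read_table`, and `write_table` are hypothetical. The two functions that take `glue_context` assume they run inside an AWS Glue job where a `GlueContext` exists, and they are not called here.

```python
def singlestore_options(connection_name, table, extra=None):
    """Build a connection_options dict for the SingleStore Glue connection."""
    options = {"dbtable": table, "connectionName": connection_name}
    # Extra keys are passed through unchanged, for example "tableKey.shard".
    options.update(extra or {})
    return options


def read_table(glue_context, options):
    """Read a table as a DynamicFrame (only meaningful inside a Glue job)."""
    return glue_context.create_dynamic_frame.from_options(
        connection_type="marketplace.spark",
        connection_options=options,
    )


def write_table(glue_context, frame, options):
    """Write a DynamicFrame to SingleStore (only meaningful inside a Glue job)."""
    glue_context.write_dynamic_frame.from_options(
        frame=frame,
        connection_type="marketplace.spark",
        connection_options=options,
    )


# Options matching the source table and the target table used in this example.
source_options = singlestore_options("SingleStore_connection", "lineitem")
target_options = singlestore_options(
    "SingleStore_connection", "updated_lineitem", {"tableKey.shard": "l_partkey"}
)
```

Keeping the option dictionaries in one helper makes it easier to add other SingleStore Spark connector settings consistently across reads and writes.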

***

Modified at: June 11, 2026

Source: [/cloud/load-data/integrate-with-singlestore-helios/load-data-from-aws-glue/](https://docs.singlestore.com/cloud/load-data/integrate-with-singlestore-helios/load-data-from-aws-glue/)

