# Load Data from Azure Blob Storage Using a Pipeline

## Prerequisites

To complete this Quickstart, your environment must meet the following prerequisites:

* **Azure Account**: This Quickstart uses Azure Blob Storage.
* **SingleStore installation or a SingleStore cluster**: You will connect to the database or cluster and create a pipeline to pull data from Azure Blob Storage.

## Part 1: Creating an Azure Blob Container and Adding a File

1. On your local machine, create a text file with the following CSV contents and name it *books.txt*:
   ```
   The Catcher in the Rye, J.D. Salinger, 1945
   Pride and Prejudice, Jane Austen, 1813
   Of Mice and Men, John Steinbeck, 1937
   Frankenstein, Mary Shelley, 1818
   ```

2. In Azure, create a container and upload `books.txt` to the container. For information on working with Azure, see the [Azure Docs](https://docs.microsoft.com/en-us/azure/).

Once the *books.txt* file has been uploaded, you can proceed to the next part of the Quickstart.

## Part 2: Creating a SingleStore Database and Azure Blob Pipeline

Now that you have an Azure container that contains an object (file), you can use SingleStore to create a new pipeline and ingest the blob.

First, create a new database and a table that matches the schema of the **books.txt** file. At the SingleStore prompt, execute the following statements:

```sql
CREATE DATABASE books;
USE books;
```

```sql
CREATE TABLE classic_books
(
title VARCHAR(255),
author VARCHAR(255),
date VARCHAR(255)
);
```

These statements create a new database named `books` and a new table named `classic_books`, which has three columns: `title`, `author`, and `date`.

Now that the destination database and table have been created, you can create an Azure pipeline. In Part 1 of this Quickstart, you uploaded the **books.txt** file to your container. To create the pipeline, you will need the following information:

* The name of the container, such as: `my-container-name`
* Your Azure Storage account’s name and key, such as:

  * *Account Name*: `your_account_name`
  * *Account Key*: `your_account_key`

Using these identifiers and keys, execute the following statement, replacing the placeholder values with your own:

```sql
CREATE PIPELINE library
AS LOAD DATA AZURE 'my-container-name'
CREDENTIALS '{"account_name": "your_account_name", "account_key": "your_account_key"}'
INTO TABLE `classic_books`
FIELDS TERMINATED BY ',';
```
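
Before starting the pipeline, a dry run can help confirm that the credentials and the field delimiter are correct. The sketch below assumes that the `TEST PIPELINE` command is available in your SingleStore version; it extracts a sample of rows and returns them without writing to `classic_books`. The `LIMIT` value is an arbitrary choice.

```sql
-- Dry run: extract up to 4 rows from the source and display them
-- without loading anything into classic_books.
TEST PIPELINE library LIMIT 4;
```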

You can see what files the pipeline wants to load by running the following:

```sql
SELECT * FROM information_schema.PIPELINES_FILES;

```
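
The unfiltered query can also return files that belong to other pipelines. A narrower variant might look like the following; the column names (`PIPELINE_NAME`, `FILE_NAME`, `FILE_STATE`) are assumed to match the view definition in your version.

```sql
-- Show only the files tracked by the `library` pipeline and their load state.
SELECT PIPELINE_NAME, FILE_NAME, FILE_STATE
FROM information_schema.PIPELINES_FILES
WHERE PIPELINE_NAME = 'library';
```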

If everything is properly configured, you should see one row in the `Unloaded` state, corresponding to `books.txt`. The `CREATE PIPELINE` statement creates a new pipeline named `library`, but the pipeline has not been started yet, so no data has been loaded. A SingleStore pipeline can either run in the background or be triggered by a foreground query. Start it in the foreground first.

```sql
START PIPELINE library FOREGROUND;

```

When this command returns successfully, all files from your container will have been loaded. If you check `information_schema.PIPELINES_FILES` again, you should see all files in the `Loaded` state. Now query the `classic_books` table to make sure the data has actually loaded.

```sql
SELECT * FROM classic_books;

```

```output

+------------------------+-----------------+-------+
| title                  | author          | date  |
+------------------------+-----------------+-------+
| The Catcher in the Rye |  J.D. Salinger  |  1945 |
| Pride and Prejudice    |  Jane Austen    |  1813 |
| Of Mice and Men        |  John Steinbeck |  1937 |
| Frankenstein           |  Mary Shelley   |  1818 |
+------------------------+-----------------+-------+

```
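
The leading spaces in the `author` and `date` values above come from the space that follows each comma in `books.txt`, because the pipeline splits fields on `,` alone. One possible way to strip them at load time is sketched below, assuming your version supports a column list with user variables and a `SET` clause in `CREATE PIPELINE`. The pipeline name `library_trimmed` and the variable names are hypothetical, and this pipeline is meant to be used in place of `library`, not alongside it, since both would write the same rows to `classic_books`.

```sql
-- Hypothetical variant of the `library` pipeline that trims whitespace.
-- Replace the placeholder container name and credentials with your own.
CREATE PIPELINE library_trimmed
AS LOAD DATA AZURE 'my-container-name'
CREDENTIALS '{"account_name": "your_account_name", "account_key": "your_account_key"}'
INTO TABLE `classic_books`
FIELDS TERMINATED BY ','
-- Load the first field directly; capture the other two in variables.
(title, @raw_author, @raw_date)
-- Remove the surrounding spaces before storing the values.
SET author = TRIM(@raw_author),
    `date` = TRIM(@raw_date);
```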

You can also have SingleStore run your pipeline in the background. In this configuration, SingleStore periodically polls Azure Blob Storage for new files and continuously loads them as they are added to the storage container. Before running your pipeline in the background, you must reset the state of the pipeline and the table.

```sql
DELETE FROM classic_books;
ALTER PIPELINE library SET OFFSETS EARLIEST;

```

The first command deletes all rows from the target table. The second resets the pipeline so that it starts from the beginning; in this case, it *forgets* that it already loaded `books.txt`, so the file can be loaded again. You can also drop and recreate the pipeline, if you prefer.
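
The drop-and-recreate alternative might look like the following sketch. It reuses the same placeholder container name and credentials as the original `CREATE PIPELINE` statement; the target table still needs to be emptied separately.

```sql
-- Empty the target table so the reload does not create duplicate rows.
DELETE FROM classic_books;

-- Remove the existing pipeline along with its record of loaded files.
DROP PIPELINE library;

-- Recreate it; a new pipeline has no load history, so books.txt is unloaded again.
CREATE PIPELINE library
AS LOAD DATA AZURE 'my-container-name'
CREDENTIALS '{"account_name": "your_account_name", "account_key": "your_account_key"}'
INTO TABLE `classic_books`
FIELDS TERMINATED BY ',';
```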

To start a pipeline in the background, run `START PIPELINE`.

```sql
START PIPELINE library;

```

To see whether the pipeline is running, run `SHOW PIPELINES`.

```sql
SHOW PIPELINES;

```

```output

+----------------------+---------+
| Pipelines_in_books   | State   |
+----------------------+---------+
| library              | Running |
+----------------------+---------+

```

At this point, the pipeline is running and the contents of the **books.txt** file should once again be present in the `classic_books` table.
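
Because a background pipeline does not return its result to your session, it can be useful to confirm the load explicitly. The queries below are a minimal sketch; the `information_schema.PIPELINES_ERRORS` column names are assumed to match your version. The row count should match the number of lines in `books.txt`, and the second query should return no rows if the load succeeded.

```sql
-- Count the rows that the background pipeline has loaded so far.
SELECT COUNT(*) AS loaded_rows FROM classic_books;

-- List any errors recorded for this pipeline.
SELECT ERROR_TYPE, ERROR_CODE, ERROR_MESSAGE
FROM information_schema.PIPELINES_ERRORS
WHERE PIPELINE_NAME = 'library';
```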

> **📝 Note**: Foreground pipelines and background pipelines have different intended uses and behave differently. For more information, see [START PIPELINE](https://docs.singlestore.com/db/v9.1/reference/sql-reference/pipelines-commands/start-pipeline.md).

## Next Steps

See [About SingleStore Pipelines](https://docs.singlestore.com/db/v9.1/load-data/about-singlestore-pipelines.md) to learn more about how pipelines work.

***

Modified at: September 9, 2024

Source: [/db/v9.1/load-data/data-sources/load-data-from-azure-blob-storage-using-a-pipeline/](https://docs.singlestore.com/db/v9.1/load-data/data-sources/load-data-from-azure-blob-storage-using-a-pipeline/)

