Load Data from Azure Blob Storage Using a Pipeline
Prerequisites
To complete this Quickstart, your environment must meet the following prerequisites:
- Azure Account: This Quickstart uses Azure Blob Storage.
- SingleStore installation or a SingleStore cluster: You will connect to the database or cluster and create a pipeline to pull data from your Azure Blob Storage container.
Part 1: Creating an Azure Blob Container and Adding a File
- On your local machine, create a text file with the following CSV contents and name it books.txt:

    The Catcher in the Rye, J.D. Salinger, 1945
    Pride and Prejudice, Jane Austen, 1813
    Of Mice and Men, John Steinbeck, 1937
    Frankenstein, Mary Shelley, 1818

- In Azure, create a container and upload books.txt to the container. For information on working with Azure, see the Azure Docs.
Once the books.txt file has been uploaded, you can proceed to the next part of the Quickstart.
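If you would rather script the first step than type the file by hand, the following minimal Python sketch writes the same four rows to books.txt in the current directory. The function name write_books_file is illustrative only and is not part of any SingleStore or Azure tooling.

```python
from pathlib import Path

# Rows exactly as listed in Part 1, including the space after each comma.
BOOKS = [
    "The Catcher in the Rye, J.D. Salinger, 1945",
    "Pride and Prejudice, Jane Austen, 1813",
    "Of Mice and Men, John Steinbeck, 1937",
    "Frankenstein, Mary Shelley, 1818",
]


def write_books_file(path="books.txt"):
    """Write one book per line and return the path that was written."""
    target = Path(path)
    target.write_text("\n".join(BOOKS) + "\n", encoding="utf-8")
    return target


if __name__ == "__main__":
    print(f"Wrote {write_books_file()}")
```

The resulting file still has to be uploaded to your container as described above.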
Part 2: Creating a SingleStore Database and Azure Blob Pipeline
Now that you have an Azure container that contains an object (file), you can use SingleStore to create a new pipeline and ingest the blobs.
We will create a new database and a table that adheres to the schema contained in books.txt:
CREATE DATABASE books;
USE books;
CREATE TABLE classic_books (title VARCHAR(255), author VARCHAR(255), date VARCHAR(255));
These statements create a new database named books and a new table named classic_books, which has three columns: title, author, and date.
Now that the destination database and table have been created, you can create an Azure pipeline. To create it, you will need the following:
- The name of the container, such as: my-container-name
- Your Azure Storage account’s name and key, such as:
    - Account Name: your_account_name
    - Account Key: your_account_key
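The account name and key are passed to the pipeline as a JSON object inside the CREDENTIALS clause. If you assemble that string programmatically, letting a JSON library produce it helps avoid quoting mistakes. A minimal Python sketch follows; the helper name azure_credentials_json is hypothetical, and the values are the same placeholders used in this Quickstart.

```python
import json


def azure_credentials_json(account_name, account_key):
    """Return the JSON text that goes inside CREDENTIALS '...'."""
    for value in (account_name, account_key):
        # Quotes and backslashes would need extra escaping once the JSON is
        # embedded in a SQL string literal; this sketch simply refuses them.
        if any(ch in value for ch in ("'", '"', "\\")):
            raise ValueError("value contains a quote or backslash")
    return json.dumps({"account_name": account_name, "account_key": account_key})


if __name__ == "__main__":
    print(azure_credentials_json("your_account_name", "your_account_key"))
    # prints {"account_name": "your_account_name", "account_key": "your_account_key"}
```

Avoid printing or logging the real key outside a trusted environment.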
            
Using these identifiers and keys, execute the following statement, replacing the placeholder values with your own:
CREATE PIPELINE library AS
LOAD DATA AZURE 'my-container-name'
CREDENTIALS '{"account_name": "your_account_name", "account_key": "your_account_key"}'
INTO TABLE `classic_books`
FIELDS TERMINATED BY ',';
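The FIELDS TERMINATED BY ',' clause tells the pipeline to split each line on commas. Because every comma in books.txt is followed by a space, that space stays attached to the start of the author and date values, which appears to be why those columns show leading spaces in the query output later in this Quickstart. The Python sketch below only emulates the splitting to make the effect visible; it is not how SingleStore parses files internally, and split_fields is a hypothetical helper.

```python
def split_fields(line, terminator=","):
    """Split one line on the terminator without trimming whitespace."""
    return line.rstrip("\n").split(terminator)


if __name__ == "__main__":
    title, author, date = split_fields("Frankenstein, Mary Shelley, 1818\n")
    # The author and date values keep the space that followed each comma.
    print(repr(title), repr(author), repr(date))
    # prints 'Frankenstein' ' Mary Shelley' ' 1818'
```

If the leading spaces are unwanted, the simplest option is to remove the space after each comma in books.txt before uploading it.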
You can see what files the pipeline wants to load by running the following:
SELECT * FROM information_schema.PIPELINES_FILES;
If everything is properly configured, you should see one row in the Unloaded state, corresponding to books.txt. The CREATE PIPELINE statement creates a new pipeline named library, but the pipeline has not yet been started, and no data has been loaded. To load the file, start the pipeline in the foreground:
START PIPELINE library FOREGROUND;
When this command returns successfully, all files from your container will have been loaded. If you query information_schema.PIPELINES_FILES again, you should see all files in the Loaded state. Now query the classic_books table to make sure the data has actually loaded:
SELECT * FROM classic_books;
+------------------------+-----------------+-------+
| title                  | author          | date  |
+------------------------+-----------------+-------+
| The Catcher in the Rye |  J.D. Salinger  |  1945 |
| Pride and Prejudice    |  Jane Austen    |  1813 |
| Of Mice and Men        |  John Steinbeck |  1937 |
| Frankenstein           |  Mary Shelley   |  1818 |
+------------------------+-----------------+-------+
You can also have SingleStore run your pipeline in the background. Before doing so, reset the pipeline and the table so that the file can be loaded again:
DELETE FROM classic_books;
ALTER PIPELINE library SET OFFSETS EARLIEST;
The first command deletes all rows from the target table. The second resets the pipeline's offsets so that it no longer treats books.txt as loaded, so you can load it again.
To start a pipeline in the background, run START PIPELINE.
START PIPELINE library;
This statement starts the pipeline. To check whether the pipeline is running, run SHOW PIPELINES.
SHOW PIPELINES;
+----------------------+---------+
| Pipelines_in_books   | State   |
+----------------------+---------+
| library              | Running |
+----------------------+---------+
At this point, the pipeline is running and the contents of the books.txt file should once again be present in the classic_books table.
Note
Foreground pipelines and background pipelines have different intended uses and behave differently.
Next Steps
See About SingleStore Pipelines to learn more about how pipelines work.
Last modified: September 9, 2024