Load Data from Azure Blob Storage Using a Pipeline
Prerequisites
To complete this Quickstart, your environment must meet the following prerequisites:
- Azure Account: This Quickstart uses Azure Blob Storage.
- SingleStore installation –or– a SingleStore cluster: You will connect to the database or cluster and create a pipeline to pull data from your Azure Blob Storage container.
Part 1: Creating an Azure Blob Container and Adding a File
- On your local machine, create a text file named books.txt with the following CSV contents:

The Catcher in the Rye, J.D. Salinger, 1945
Pride and Prejudice, Jane Austen, 1813
Of Mice and Men, John Steinbeck, 1937
Frankenstein, Mary Shelley, 1818
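If you prefer to generate the file from a script, the sketch below writes the same four lines using only the standard library; the filename books.txt comes from this guide.

```python
# Write the sample CSV from this Quickstart to books.txt.
lines = [
    "The Catcher in the Rye, J.D. Salinger, 1945",
    "Pride and Prejudice, Jane Austen, 1813",
    "Of Mice and Men, John Steinbeck, 1937",
    "Frankenstein, Mary Shelley, 1818",
]

with open("books.txt", "w") as f:
    f.write("\n".join(lines) + "\n")

# Sanity check: four rows, three comma-separated fields each.
with open("books.txt") as f:
    rows = [line.strip().split(",") for line in f if line.strip()]
print(len(rows))  # 4
```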
- In Azure, create a container and upload books.txt to the container. For information on working with Azure, see the Azure Docs.

Once the books.txt file has been uploaded, you can proceed to the next part.
Part 2: Creating a SingleStore Database and Azure Blob Pipeline
Now that you have an Azure container that contains an object (file), you can use SingleStore to create a new pipeline and ingest the blobs.
We will create a new database and a table that adheres to the schema contained in books.txt.
CREATE DATABASE books;
CREATE TABLE classic_books (
  title VARCHAR(255),
  author VARCHAR(255),
  date VARCHAR(255)
);
These statements create a new database named books and a new table named classic_books, which has three columns: title, author, and date.
Now that the destination database and table have been created, you can create an Azure pipeline. To do so, you will need the following:
- The name of the container, such as: my-container-name
- Your Azure Storage account's name and key, such as:
  - Account Name: your_account_name
  - Account Key: your_account_key
Using these identifiers and keys, execute the following statement, replacing the placeholder values with your own:
CREATE PIPELINE library
AS LOAD DATA AZURE 'my-container-name'
CREDENTIALS '{"account_name": "your_account_name", "account_key": "your_account_key"}'
INTO TABLE `classic_books`
FIELDS TERMINATED BY ',';
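The CREDENTIALS clause takes a JSON document with account_name and account_key keys. If you are templating this statement from application code, building the document with a JSON serializer avoids quoting mistakes; a minimal sketch, using the placeholder values from this guide:

```python
import json

# Placeholder credentials from this guide; substitute your real values.
credentials = json.dumps(
    {"account_name": "your_account_name", "account_key": "your_account_key"}
)

# Interpolate the JSON document into the statement text.
statement = (
    "CREATE PIPELINE library "
    "AS LOAD DATA AZURE 'my-container-name' "
    f"CREDENTIALS '{credentials}' "
    "INTO TABLE `classic_books` "
    "FIELDS TERMINATED BY ',';"
)
print(statement)
```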
You can see what files the pipeline wants to load by running the following:
SELECT * FROM information_schema.PIPELINES_FILES;
If everything is properly configured, you should see one row in the Unloaded state, corresponding to books.txt. The CREATE PIPELINE statement creates a new pipeline named library, but the pipeline has not yet been started, and no data has been loaded. Start the pipeline in the foreground to trigger an immediate load:
START PIPELINE library FOREGROUND;
When this command returns successfully, all files from your container will be loaded. If you query information_schema.PIPELINES_FILES again, you should see all files in the Loaded state. Next, query the classic_books table to make sure the data has actually loaded:
SELECT * FROM classic_books;
+------------------------+-----------------+-------+
| title | author | date |
+------------------------+-----------------+-------+
| The Catcher in the Rye | J.D. Salinger | 1945 |
| Pride and Prejudice | Jane Austen | 1813 |
| Of Mice and Men | John Steinbeck | 1937 |
| Frankenstein | Mary Shelley | 1818 |
+------------------------+-----------------+-------+
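The mapping from the lines of books.txt to the rows above follows the pipeline's FIELDS TERMINATED BY ',' delimiter. As a rough illustration only, the sketch below uses Python's csv.reader to stand in for the pipeline's parser (skipinitialspace handles the space after each comma in the sample file; this does not replicate SingleStore's exact parsing rules):

```python
import csv
import io

# Two sample lines from books.txt.
data = (
    "The Catcher in the Rye, J.D. Salinger, 1945\n"
    "Pride and Prejudice, Jane Austen, 1813\n"
)

# Split each line on ',' into (title, author, date) fields,
# mirroring the FIELDS TERMINATED BY ',' clause.
reader = csv.reader(io.StringIO(data), delimiter=",", skipinitialspace=True)
records = [tuple(row) for row in reader]
print(records[0])  # ('The Catcher in the Rye', 'J.D. Salinger', '1945')
```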
You can also have SingleStore run your pipeline in the background. Before doing so, delete the rows that were loaded by the foreground run and reset the pipeline's offsets:
DELETE FROM classic_books;

ALTER PIPELINE library SET OFFSETS EARLIEST;
The first command deletes all rows from the target table. The second causes the pipeline to treat books.txt as unloaded so you can load it again.
To start a pipeline in the background, run START PIPELINE:
START PIPELINE library;
This statement starts the pipeline. To verify that the pipeline is running, run SHOW PIPELINES:
SHOW PIPELINES;
+----------------------+---------+
| Pipelines_in_books | State |
+----------------------+---------+
| library | Running |
+----------------------+---------+
At this point, the pipeline is running and the contents of the books.txt file will be loaded into the classic_books table.
Note
Foreground pipelines and background pipelines have different intended uses and behave differently.
Next Steps
See About SingleStore Pipelines to learn more about how pipelines work.
Last modified: September 9, 2024