Load Data from the Filesystem Using a Pipeline
On this page
Note
The filesystem pipelines feature is not supported by the SingleStore (MemSQL) Operator or the SingleStore database engine in Kubernetes environments.
Prerequisites
The following is a list of the minimum requirements needed to load data from the filesystem using a pipeline:
-
Filesystem access to obtain the data.
-
A SingleStore database.
-
A table with corresponding columns based on the data to be loaded.
Create a Database and a Filesystem Pipeline
The following sections walk through creating a local directory, a text file, a database, a table, and a pipeline for ingesting data.
-
In a terminal window, create a directory and change to the created directory.
In this step the directory is called test_
.directory mkdir test_directorycd test_directoryNote
The directory you create will need to be in the path specified by the
secure_
engine variable.file_ priv Import and export operations are only allowed on files in the directory specified by secure_
or its subdirectories.file_ priv -
Create a file called
books.
and cut and paste in the information below.txt Save the file using Ctrl-D
.cat > books.txtThe Catcher in the Rye, J.D. Salinger, 1945Pride and Prejudice, Jane Austen, 1813Of Mice and Men, John Steinbeck, 1937Frankenstein, Mary Shelley, 1818Ctrl-D -
All the steps including this one will be run from a SingleStore prompt.
Create a database and use the database. CREATE DATABASE books;USE books; -
This statement creates a new table titled
classic_
, which contains three columns:books title
,author
, anddate
.CREATE TABLE classic_books(title VARCHAR(255),author VARCHAR(255),date VARCHAR(255)); -
Now that the destination database and table have been created, the pipeline can be created.
The following information is needed to create a pipeline: CREATE PIPELINE library ASLOAD DATA FS '/absolute/path/to/test_directory'INTO TABLE classic_booksFIELDS TERMINATED BY ',';You can see what files the pipeline will attempt to load by running the following:
SELECT * FROM information_schema.PIPELINES_FILES;+---------------+---------------+-------------+-------------------------+-----------+------------+ | DATABASE_NAME | PIPELINE_NAME | SOURCE_TYPE | FILE_NAME | FILE_SIZE | FILE_STATE | +---------------+---------------+-------------+-------------------------+-----------+------------+ | books | library | FS | /tmp/books.txt | 154 | Unloaded | +---------------+---------------+-------------+-------------------------+-----------+------------+
-
If properly configured, there will be one row in the Unloaded state for the
books.
file.txt The CREATE PIPELINE
statement will create a new pipeline named library.The pipeline has not been started, and data has not been loaded. A SingleStore pipeline can run either in the background or foreground.
Starting the pipeline in the foreground allows you to see any errors that may occur during the execution. Starting the pipeline in the background allows continuous data ingest into the specified table. START PIPELINE library FOREGROUND; -
Once the pipeline has started, the contents of the
books.
file should be loaded.txt Check the information_
, all files should be in the Loaded state.schema. PIPELINES_ FILES Query the classic_
table to verify the data has loaded.books SELECT * FROM classic_books;+------------------------+-----------------+-------+ | title | author | date | +------------------------+-----------------+-------+ | The Catcher in the Rye | J.D. Salinger | 1945 | | Pride and Prejudice | Jane Austen | 1813 | | Of Mice and Men | John Steinbeck | 1937 | | Frankenstein | Mary Shelley | 1818 | +------------------------+-----------------+-------+
Running a pipeline in the background results in a periodic polling of the directory for new files and will continuously add them to the directory.
The first query deletes all rows from the target table.
DELETE FROM classic_books;ALTER PIPELINE library SET OFFSETS EARLIEST;
To start a pipeline in the background, use:
START PIPELINE library;
This statement starts the pipeline.SHOW PIPELINES
.
SHOW PIPELINES;
+--------------------+---------+-----------+
| Pipelines_in_books | State | Scheduled |
+--------------------+---------+-----------+
| library | Running | False |
+--------------------+---------+-----------+
At this point, the pipeline is running and the contents of the books.classic_
table.
Note
Foreground pipelines and background pipelines have different intended uses and behave differently.
Next Steps
See Load Data with Pipelines to learn more about how pipelines work.
Last modified: January 23, 2024