Load Data from the Filesystem Using a Pipeline

Prerequisites

The following is a list of the minimum requirements needed to load data from the filesystem using a pipeline:

Filesystem access to obtain the data.
A SingleStore database.
A table with corresponding columns based on the data to be loaded.

Create a Database and a Filesystem Pipeline

The following sections walk through creating a local directory, a text file, a database, a table, and a pipeline for ingesting data.

In a terminal window, create a directory and change to the created directory. In this step the directory is called test_directory.
Shell
```
mkdir test_directory

cd test_directory
```
Note

The directory you create will need to be in the path specified by the secure_file_priv engine variable. Import and export operations are only allowed on files in the directory specified by secure_file_priv or its subdirectories.

Create a file called books.txt and cut and paste in the information below. Save the file using Ctrl-D.

Shell

cat > books.txt

The Catcher in the Rye, J.D. Salinger, 1945
Pride and Prejudice, Jane Austen, 1813
Of Mice and Men, John Steinbeck, 1937
Frankenstein, Mary Shelley, 1818

Ctrl-D

All the steps including this one will be run from a SingleStore prompt. Create a database and use the database.
SQL
```
CREATE DATABASE books;

USE books;
```
This statement creates a new table titled classic_books, which contains three columns: title, author, and date.
SQL
```
CREATE TABLE classic_books
(
title VARCHAR(255),
author VARCHAR(255),
date VARCHAR(255)
);
```

Now that the destination database and table have been created, the pipeline can be created. The following information is needed to create a pipeline:

SQL

CREATE PIPELINE library AS
LOAD DATA FS '/absolute/path/to/test_directory'
INTO TABLE classic_books
FIELDS TERMINATED BY ',';

You can see what files the pipeline will attempt to load by running the following:

SQL

SELECT * FROM information_schema.PIPELINES_FILES;

+---------------+---------------+-------------+-------------------------+-----------+------------+
| DATABASE_NAME | PIPELINE_NAME | SOURCE_TYPE | FILE_NAME               | FILE_SIZE | FILE_STATE |
+---------------+---------------+-------------+-------------------------+-----------+------------+
| books         | library       | FS          | /tmp/books.txt          |       154 | Unloaded   |
+---------------+---------------+-------------+-------------------------+-----------+------------+

If properly configured, there will be one row in the Unloaded state for the books.txt file. The CREATE PIPELINE statement will create a new pipeline named library. The pipeline has not been started, and data has not been loaded.

A SingleStore pipeline can run either in the background or foreground. Starting the pipeline in the foreground allows you to see any errors that may occur during the execution. Starting the pipeline in the background allows continuous data ingest into the specified table.
SQL
```
START PIPELINE library FOREGROUND;
```

Once the pipeline has started, the contents of the books.txt file should be loaded. Check the information_schema.PIPELINES_FILES, all files should be in the Loaded state. Query the classic_books table to verify the data has loaded.

SQL

SELECT * FROM classic_books;

+------------------------+-----------------+-------+
| title                  | author          | date  |
+------------------------+-----------------+-------+
| The Catcher in the Rye |  J.D. Salinger  |  1945 |
| Pride and Prejudice    |  Jane Austen    |  1813 |
| Of Mice and Men        |  John Steinbeck |  1937 |
| Frankenstein           |  Mary Shelley   |  1818 |
+------------------------+-----------------+-------+

Running a pipeline in the background results in a periodic polling of the directory for new files and will continuously add them to the directory. Before running a pipeline in the background, reset the pipeline and table.

The first query deletes all rows from the target table. The second query will cause the pipeline to start from the beginning. In this example, the pipeline will forget it already loaded books.txt. An alternate method is to drop and recreate the pipeline.

SQL

DELETE FROM classic_books;

ALTER PIPELINE library SET OFFSETS EARLIEST;

To start a pipeline in the background, use:

SQL

START PIPELINE library;

This statement starts the pipeline. To see whether the pipeline is running, run SHOW PIPELINES.

SQL

SHOW PIPELINES;

+--------------------+---------+-----------+
| Pipelines_in_books | State   | Scheduled |
+--------------------+---------+-----------+
| library            | Running | False     |
+--------------------+---------+-----------+

At this point, the pipeline is running and the contents of the books.txt file should once again be present in the classic_books table.

Note

Foreground pipelines and background pipelines have different intended uses and behave differently. For more information, see START PIPELINE.

Next Steps

See About SingleStore Pipelines to learn more about how pipelines work.

Load Data from the Filesystem Using a Pipeline

On this page

Prerequisites

Create a Database and a Filesystem Pipeline

Next Steps

Was this article helpful?

On this page

Was this article helpful?