# Load Data from the Filesystem Using a Pipeline

> **📝 Note**: The filesystem pipelines feature is not supported by the SingleStore (MemSQL) Operator or the SingleStore database engine in Kubernetes environments.

## Prerequisites

The following is a list of the minimum requirements needed to load data from the filesystem using a pipeline:

* Filesystem access to obtain the data.
* A SingleStore database.
* A table with corresponding columns based on the data to be loaded.

## Create a Database and a Filesystem Pipeline

The following sections walk through creating a local directory, a text file, a database, a table, and a pipeline for ingesting data.

1. In a terminal window, create a directory and change to the created directory. In this step the directory is called `test_directory`.
   ```shell
   mkdir test_directory 

   cd test_directory
   ```
   > **📝 Note**: The directory you create will need to be in the path specified by the `secure_file_priv` engine variable. Import and export operations are only allowed on files in the directory specified by `secure_file_priv` or its subdirectories.

2. Create a file called `books.txt` and cut and paste in the information below. Save the file using `Ctrl-D`.
   ```shell
   cat > books.txt

   The Catcher in the Rye, J.D. Salinger, 1945
   Pride and Prejudice, Jane Austen, 1813
   Of Mice and Men, John Steinbeck, 1937
   Frankenstein, Mary Shelley, 1818

   Ctrl-D
   ```

3. All the steps including this one will be run from a SingleStore prompt. Create a database and use the database.
   ```sql
   CREATE DATABASE books;

   USE books;
   ```

4. This statement creates a new table titled `classic_books`, which contains three columns: `title`, `author`, and `date`.
   ```sql
   CREATE TABLE classic_books
   (
   title VARCHAR(255),
   author VARCHAR(255),
   date VARCHAR(255)
   );
   ```

5. Now that the destination database and table have been created, the pipeline can be created. The following information is needed to create a pipeline:
   ```sql
   CREATE PIPELINE library AS 
   LOAD DATA FS '/absolute/path/to/test_directory'
   INTO TABLE classic_books
   FIELDS TERMINATED BY ',';
   ```
   You can see what files the pipeline will attempt to load by running the following:
   ```sql
   SELECT * FROM information_schema.PIPELINES_FILES;


   ```
   ```output

   +---------------+---------------+-------------+-------------------------+-----------+------------+
   | DATABASE_NAME | PIPELINE_NAME | SOURCE_TYPE | FILE_NAME               | FILE_SIZE | FILE_STATE |
   +---------------+---------------+-------------+-------------------------+-----------+------------+
   | books         | library       | FS          | /tmp/books.txt          |       154 | Unloaded   |
   +---------------+---------------+-------------+-------------------------+-----------+------------+
   ```

6. If properly configured, there will be one row in the Unloaded state for the `books.txt` file. The `CREATE PIPELINE` statement will create a new pipeline named library. The pipeline has not been started, and data has not been loaded.

   A SingleStore pipeline can run either in the background or foreground. Starting the pipeline in the *foreground* allows you to see any errors that may occur during the execution. Starting the pipeline in the *background* allows continuous data ingest into the specified table.
   ```sql
   START PIPELINE library FOREGROUND;
   ```

7. Once the pipeline has started, the contents of the `books.txt` file should be loaded. Check the `information_schema.PIPELINES_FILES`, all files should be in the *Loaded* state. Query the `classic_books` table to verify the data has loaded.
   ```sql
   SELECT * FROM classic_books;


   ```
   ```output

   +------------------------+-----------------+-------+
   | title                  | author          | date  |
   +------------------------+-----------------+-------+
   | The Catcher in the Rye |  J.D. Salinger  |  1945 |
   | Pride and Prejudice    |  Jane Austen    |  1813 |
   | Of Mice and Men        |  John Steinbeck |  1937 |
   | Frankenstein           |  Mary Shelley   |  1818 |
   +------------------------+-----------------+-------+
   ```

Running a pipeline in the background results in a periodic polling of the directory for new files and will continuously add them to the directory. Before running a pipeline in the background, reset the pipeline and table.

The first query deletes all rows from the target table. The second query will cause the pipeline to start from the beginning. In this example, the pipeline will *forget* it already loaded **books.txt**. An alternate method is to drop and recreate the pipeline.

```sql
DELETE FROM classic_books;

ALTER PIPELINE library SET OFFSETS EARLIEST;
```

To start a pipeline in the background, use:

```sql
START PIPELINE library;
```

This statement starts the pipeline. To see whether the pipeline is running, run `SHOW PIPELINES`.

```sql
SHOW PIPELINES;


```

```output

+--------------------+---------+-----------+
| Pipelines_in_books | State   | Scheduled |
+--------------------+---------+-----------+
| library            | Running | False     |
+--------------------+---------+-----------+

```

At this point, the pipeline is running and the contents of the **books.txt** file should once again be present in the `classic_books` table.

> **📝 Note**: Foreground pipelines and background pipelines have different intended uses and behave differently. For more information, see [START PIPELINE](https://docs.singlestore.com/db/v9.1/reference/sql-reference/pipelines-commands/start-pipeline.md).

## Next Steps

See [About SingleStore Pipelines](https://docs.singlestore.com/db/v9.1/load-data/about-singlestore-pipelines.md) to learn more about how pipelines work.

***

Modified at: January 23, 2024

Source: [/db/v9.1/load-data/data-sources/load-data-from-the-filesystem-using-a-pipeline/](https://docs.singlestore.com/db/v9.1/load-data/data-sources/load-data-from-the-filesystem-using-a-pipeline/)

(An index of the documentation is available at /llms.txt)