# The Lifecycle of a Pipeline

The following list shows the progression of a pipeline from `CREATE` and `START`, through error handling, to pipeline status updates.

1. Use [CREATE PIPELINE](https://docs.singlestore.com/db/v9.1/reference/sql-reference/pipelines-commands/create-pipeline.md) to create a new pipeline.

2. Use [START PIPELINE](https://docs.singlestore.com/db/v9.1/reference/sql-reference/pipelines-commands/start-pipeline.md) to start the pipeline.
   > **📝 Note**: Steps 3 to 8 refer to a batch, which is a subset of data that the pipeline extracts from its data source. Refer to [Batches](#batches) for related information.

3. The pipeline extracts the batch (files for S3; offset ranges for Kafka) and updates offsets to reflect progress in the data source.

4. The pipeline optionally transforms (modifies) the batch, using one of the data shaping methods. Refer to [Methods for Data Shaping with Pipelines](https://docs.singlestore.com/db/v9.1/load-data/about-singlestore-pipelines/pipeline-concepts/data-shaping-with-pipelines/#methods-for-data-shaping-with-pipelines.md) for more information.

5. If the pipeline successfully processes the batch, it loads the batch into one or more SingleStore tables. If the processing fails, the batch is fully rolled back.

6. If an error occurs while a batch `b` is running, the batch `b` fails and all its transactions are rolled back. Batch `b` is then retried up to `pipelines_max_retries_per_batch_partition` times. If `pipelines_stop_on_error` is set to `ON` and all retries are unsuccessful, the pipeline stops. Otherwise, the pipeline continues and processes a new batch `nb`, using the same files or objects from batch `b`, excluding the files or objects that may have caused the error. Refer to [Pipeline Troubleshooting](https://docs.singlestore.com/db/v9.1/load-data/about-singlestore-pipelines/pipeline-troubleshooting.md) for more information.

7. The pipeline updates the `FILE_STATE` column in the `information_schema.PIPELINES_FILES` table as follows:

   * Files and objects in the batch that the pipeline processed successfully are marked as `Loaded`.
   * Files and objects in the batch that the pipeline did not process successfully, after all retries are unsuccessful (as described in step 6), are marked as `Skipped`.

   A file or object that is marked as `Loaded` or `Skipped` is not processed again by the pipeline, unless [`ALTER PIPELINE ... DROP FILE ...`](https://docs.singlestore.com/db/v9.1/reference/sql-reference/pipelines-commands/alter-pipeline.md) is run. The pipeline does not delete files and objects from the data source.

8. The pipeline checks if the data source contains new data. If it does, the pipeline processes another batch immediately by running steps 3 to 7 again. If the data source does not contain more data, the pipeline waits for `BATCH_INTERVAL` milliseconds (which is specified in the `CREATE PIPELINE` statement) before checking the data source for new data. If the pipeline finds new data at this point, the pipeline runs steps 3 to 7 again.

> **📝 Note**:
>
> - Use [`STOP PIPELINE`](https://docs.singlestore.com/db/v9.1/reference/sql-reference/pipelines-commands/stop-pipeline.md) to stop a running pipeline. If a batch is in progress when the command runs, the batch completes before the pipeline stops.
> - If a pipeline is stopped with the [`DETACH PIPELINE`](https://docs.singlestore.com/db/v9.1/reference/sql-reference/pipelines-commands/detach-pipeline.md) command, data loading stops. The pipeline can be restarted with the [`START PIPELINE`](https://docs.singlestore.com/db/v9.1/reference/sql-reference/pipelines-commands/start-pipeline.md) command, after which data loading continues in the same way as before.

During a pipeline's lifecycle, the pipeline updates the pipelines tables in the information schema. Refer to [Data Ingest](https://docs.singlestore.com/db/v9.1/reference/information-schema-reference/data-ingest.md) for more information on pipelines tables. Except for the update of the `information_schema.PIPELINES_FILES` table mentioned in step 7, other updates are not discussed here.
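
The steps above can be sketched as the following sequence of statements. This is a minimal sketch under stated assumptions, not a definitive setup: the pipeline name `trades_pipeline`, the table `trades`, the bucket path, the region, the credential placeholders, the file name, and the `BATCH_INTERVAL` value are all hypothetical. The `FILE_NAME` column is not mentioned on this page and is assumed from the `PIPELINES_FILES` reference.

```sql
-- Step 1: create the pipeline (hypothetical bucket, table, and credentials).
-- BATCH_INTERVAL is given in milliseconds (see step 8).
CREATE PIPELINE trades_pipeline AS
  LOAD DATA S3 'example-bucket/trades/'
  CONFIG '{"region": "us-east-1"}'
  CREDENTIALS '{"aws_access_key_id": "<key-id>", "aws_secret_access_key": "<secret>"}'
  BATCH_INTERVAL 2500
  INTO TABLE trades
  FIELDS TERMINATED BY ',';

-- Step 2: start the pipeline; steps 3 to 8 then repeat for each batch.
START PIPELINE trades_pipeline;

-- Step 7: inspect the per-file state recorded by the pipeline.
-- FILE_NAME is an assumed column name; FILE_STATE is described in step 7.
SELECT FILE_NAME, FILE_STATE
FROM information_schema.PIPELINES_FILES
WHERE PIPELINE_NAME = 'trades_pipeline';

-- Make a Loaded or Skipped file eligible for processing again.
-- Stopping the pipeline first is a conservative choice in this sketch.
STOP PIPELINE trades_pipeline;
ALTER PIPELINE trades_pipeline DROP FILE 'trades/trade.csv';
START PIPELINE trades_pipeline;
```

After the final `START PIPELINE`, the dropped file is treated as new data, so the pipeline picks it up in a later batch.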

## Batches

A batch is the set of data processed in one pipeline cycle across all batch partitions. It succeeds or fails as a unit, and failures cause a rollback for the entire batch.

A batch partition is the work performed by a single database partition during one pipeline cycle. Each partition independently extracts data from the source (files or offsets) and contributes to the batch.
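
One way to see the relationship between a batch and its batch partitions is to aggregate the rows of `information_schema.PIPELINES_BATCHES` by `BATCH_ID`. The sketch below assumes that the table holds one row per batch partition, which the column names suggest but this page does not state. The database name `trades` is taken from the S3 example further down this page.

```sql
-- Summarize each batch: how many batch partitions contributed,
-- and how many rows they parsed in total.
SELECT
  PIPELINE_NAME,
  BATCH_ID,
  BATCH_STATE,
  COUNT(*) AS batch_partitions,
  SUM(BATCH_PARTITION_PARSED_ROWS) AS parsed_rows
FROM information_schema.PIPELINES_BATCHES
WHERE DATABASE_NAME = 'trades'
GROUP BY PIPELINE_NAME, BATCH_ID, BATCH_STATE
ORDER BY BATCH_ID;
```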

`BATCH_INTERVAL` is the time a running pipeline waits before checking for new data when the source temporarily has no data.

`pipelines_max_retries_per_batch_partition` defines the maximum number of retries for a failing batch partition before the pipeline either continues or stops, depending on the configuration.
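
The settings described above can be inspected and adjusted roughly as follows. This is a hedged sketch: it assumes that the engine variables can be changed with `SET GLOBAL` in your deployment (managed deployments may require a different mechanism), and the pipeline name `trades_pipeline` and the interval value are hypothetical.

```sql
-- Show the current values of the two engine variables.
SHOW VARIABLES LIKE 'pipelines_max_retries_per_batch_partition';
SHOW VARIABLES LIKE 'pipelines_stop_on_error';

-- Stop the pipeline when all retries of a batch partition fail.
SET GLOBAL pipelines_stop_on_error = ON;

-- Change how long an existing pipeline waits (in milliseconds)
-- before checking an empty source again.
ALTER PIPELINE trades_pipeline SET BATCH_INTERVAL 5000;
```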

## S3 Pipelines

* Each database partition (batch partition) picks exactly one source file per batch cycle, if available.
* The union of all files across partitions in that cycle is the batch.
* If fewer files are available than partitions, some partitions remain idle; the batch still consists of the files picked that cycle.
* File paths appear in `information_schema.PIPELINES_BATCHES` as `BATCH_SOURCE_PARTITION_ID`.

For example:

```sql
SELECT * FROM information_schema.PIPELINES_BATCHES WHERE DATABASE_NAME = 'trades';
```

```output
| DATABASE_NAME | PIPELINE_NAME | BATCH_ID | BATCH_STATE | BATCH_ROWS_WRITTEN | BATCH_ROWS_INSERTED | BATCH_ROWS_UPDATED | BATCH_ROWS_DELETED | BATCH_TIME | BATCH_START_UNIX_TIMESTAMP | BATCH_PARTITION_STATE | BATCH_PARTITION_PARSED_ROWS | BATCH_SOURCE_PARTITION_ID | BATCH_EARLIEST_OFFSET | BATCH_LATEST_OFFSET | BATCH_PARTITION_TIME | BATCH_PARTITION_EXTRACTED_BYTES | BATCH_PARTITION_TRANSFORMED_BYTES | BATCH_PARTITION_EXTRACTOR_WAIT_TIME | BATCH_PARTITION_TRANSFORM_WAIT_TIME |                                                     HOST                                      | PORT | PARTITION |
| ------------- | ------------- | -------- | ----------- | ------------------ | ------------------- | ------------------ | ------------------ | ---------- | -------------------------- | --------------------- | --------------------------- | ------------------------- | --------------------- | ------------------- | -------------------- | ------------------------------- | --------------------------------- | ----------------------------------- | ----------------------------------- | --------------------------------------------------------------------------------------------- | ---- | --------- |
|     trades    |    company    |   40129  |  Succeeded  |        3,288       |        3,288        |          0         |           0        |  1.161775  |      1762498594.440545     |       Succeeded       |             3,288           |     trades/company.csv    |           0           |         1           |       0.740079       |              387,137            |                 NULL              |                 0.6682              |                   0                 | node-57eb03ab-7fba-445a-b4c8-31af2e0633ae-leaf-ag2-0.svc-57eb03ab-7fba-445a-9abb-23fe45e4c0ab | 3306 |     5     |
|     trades    |    trade      |   40130  |  Succeeded  |     5,810,328      |       5,810,328     |          0         |           0        |  17.830798 |      1762498594.440578     |       Succeeded       |           5,810,328         |     trades/trade.csv      |           0           |         1           |       15.383328      |            329,317,993          |                 NULL              |                 3.7094              |                   0                 | node-57eb03ab-7fba-445a-b4c8-31af2e0633ae-leaf-ag1-0.svc-57eb03ab-7fba-445a-9abb-23fe45e4c0ab | 3306 |     1     |
```
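
Because the full output is wide, a narrower query can make the file-to-partition mapping easier to read. The following sketch selects only columns that appear in the output above. The alias `source_file` is introduced here for illustration, and `PARTITION` is quoted with backticks on the assumption that it is a reserved word.

```sql
-- List which source file each batch partition picked, per batch.
SELECT
  BATCH_ID,
  `PARTITION`,
  BATCH_SOURCE_PARTITION_ID AS source_file,
  BATCH_PARTITION_STATE,
  BATCH_PARTITION_PARSED_ROWS
FROM information_schema.PIPELINES_BATCHES
WHERE DATABASE_NAME = 'trades'
ORDER BY BATCH_ID, `PARTITION`;
```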

## Kafka Pipelines

* Each database partition claims a contiguous set of offsets from its assigned Kafka partition for the current batch cycle.
* The union of all offsets across partitions constitutes the batch.
* Earliest and latest offsets for each batch partition appear as `BATCH_EARLIEST_OFFSET` and `BATCH_LATEST_OFFSET` along with the Kafka partition in `BATCH_SOURCE_PARTITION_ID`.

For example:

```sql
SELECT * FROM information_schema.PIPELINES_BATCHES WHERE DATABASE_NAME = 'adtech';
```

```output
| DATABASE_NAME | PIPELINE_NAME | BATCH_ID | BATCH_STATE | BATCH_ROWS_WRITTEN | BATCH_ROWS_INSERTED | BATCH_ROWS_UPDATED | BATCH_ROWS_DELETED | BATCH_TIME | BATCH_START_UNIX_TIMESTAMP | BATCH_PARTITION_STATE | BATCH_PARTITION_PARSED_ROWS | BATCH_SOURCE_PARTITION_ID | BATCH_EARLIEST_OFFSET | BATCH_LATEST_OFFSET | BATCH_PARTITION_TIME | BATCH_PARTITION_EXTRACTED_BYTES | BATCH_PARTITION_TRANSFORMED_BYTES | BATCH_PARTITION_EXTRACTOR_WAIT_TIME | BATCH_PARTITION_TRANSFORM_WAIT_TIME |                                              HOST                                             | PORT | PARTITION |
| ------------- | ------------- | -------- | ----------- | ------------------ | ------------------- | ------------------ | ------------------ | ---------- | -------------------------- | --------------------- | --------------------------- | ------------------------- | --------------------- | ------------------- | -------------------- | ------------------------------- | --------------------------------- | ----------------------------------- | ----------------------------------- | --------------------------------------------------------------------------------------------- | ---- | --------- |
|     adtech    |     events    |   40006  |  Succeeded  |      8,000,000     |       8,000,000     |           0        |          0         | 158.126145 |      1762497976.481443     |        Succeeded      |           1,000,000         |              0            |       37,021,295      |      38,021,295     |       145.240453     |             105,468,756         |                 NULL              |                 0.7554              |                   0                 | node-57eb03ab-7fba-445a-b4c8-31af2e0633ae-leaf-ag2-0.svc-57eb03ab-7fba-445a-9abb-23fe45e4c0ab | 3306 |     7     |
```
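
The size of the offset range that each batch partition claimed can be derived from the two offset columns. The sketch below assumes that both columns are numeric. The aliases `kafka_partition` and `offsets_claimed` are introduced here for illustration.

```sql
-- Compute the size of the offset range claimed by each batch partition.
SELECT
  BATCH_ID,
  BATCH_SOURCE_PARTITION_ID AS kafka_partition,
  BATCH_EARLIEST_OFFSET,
  BATCH_LATEST_OFFSET,
  BATCH_LATEST_OFFSET - BATCH_EARLIEST_OFFSET AS offsets_claimed,
  BATCH_PARTITION_PARSED_ROWS
FROM information_schema.PIPELINES_BATCHES
WHERE DATABASE_NAME = 'adtech'
ORDER BY BATCH_ID, BATCH_SOURCE_PARTITION_ID;
```

For the row shown above, the difference is 38,021,295 - 37,021,295 = 1,000,000, which matches the `BATCH_PARTITION_PARSED_ROWS` value for that batch partition.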

***

Modified at: November 21, 2025

Source: [/db/v9.1/load-data/about-singlestore-pipelines/pipeline-concepts/the-lifecycle-of-a-pipeline/](https://docs.singlestore.com/db/v9.1/load-data/about-singlestore-pipelines/pipeline-concepts/the-lifecycle-of-a-pipeline/)

