The Lifecycle of a Pipeline
The following list shows the progression of a pipeline from CREATE and START, through error handling, to pipeline status updates.
1. Use CREATE PIPELINE to create a new pipeline.
2. Use START PIPELINE to start the pipeline.

Note
Steps 3 to 8 refer to a batch, which is a subset of data that the pipeline extracts from its data source. Refer to Batches for related information.

3. The pipeline extracts the batch (files for S3; offset ranges for Kafka) and updates offsets to reflect progress in the data source.
4. The pipeline optionally transforms (modifies) the batch, using one of the data shaping methods. Refer to Methods for Data Shaping with Pipelines for more information.
5. If the pipeline successfully processes the batch, it loads the batch into one or more SingleStore tables. If the processing fails, the batch is fully rolled back.
6. If an error occurs while a batch b is running, batch b fails and all its transactions are rolled back. Batch b is then retried up to pipelines_max_retries_per_batch_partition times. If pipelines_stop_on_error is set to ON and all retries are unsuccessful, the pipeline stops. Otherwise, the pipeline continues and processes a new batch nb, using the same files or objects from batch b, excluding the files or objects that may have caused the error. Refer to Pipeline Troubleshooting for more information.
7. The pipeline updates the FILE_STATE column in the information_schema.PIPELINES_FILES table as follows:
   - Files and objects in the batch that the pipeline processed successfully are marked as Loaded.
   - Files and objects in the batch that the pipeline did not process successfully, after all retries are unsuccessful (as described in step 6), are marked as Skipped.

   A file or object that is marked as Loaded or Skipped is not processed again by the pipeline, unless ALTER PIPELINE ... DROP FILE ... is run. The pipeline does not delete files and objects from the data source.
8. The pipeline checks if the data source contains new data. If it does, the pipeline processes another batch immediately by running steps 3 to 7 again. If the data source does not contain more data, the pipeline waits for BATCH_INTERVAL milliseconds (which is specified in the CREATE PIPELINE statement) before checking the data source for new data. If the pipeline finds new data at this point, the pipeline runs steps 3 to 7 again.
9. Use STOP PIPELINE to stop a running pipeline. If a batch is in progress when the command runs, the batch completes before the pipeline stops.
10. If a pipeline is stopped with the DETACH PIPELINE command, data loading stops. However, the pipeline can be restarted with the START PIPELINE command, and data loading continues in the same way as before.
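
As a rough sketch of the commands named in the list above, the statements below walk one pipeline through create, start, stop, and detach. The pipeline name, table name, Kafka endpoint, and BATCH_INTERVAL value are hypothetical placeholders, and the clause order should be checked against the CREATE PIPELINE reference for your version.

```sql
-- Hypothetical names: example_pipeline, example_table, and the Kafka
-- endpoint/topic are placeholders for illustration only.

-- Create the pipeline. BATCH_INTERVAL is in milliseconds and is the wait
-- applied when the data source has no new data.
CREATE PIPELINE example_pipeline AS
  LOAD DATA KAFKA 'kafka-host.example.com:9092/example-topic'
  BATCH_INTERVAL 2500
  INTO TABLE example_table
  FIELDS TERMINATED BY ',';

-- Start the pipeline; it then extracts, transforms, and loads batches.
START PIPELINE example_pipeline;

-- Stop the running pipeline.
STOP PIPELINE example_pipeline;

-- A stopped pipeline can be started again.
START PIPELINE example_pipeline;

-- Detaching also stops data loading; START PIPELINE resumes it.
DETACH PIPELINE example_pipeline;
START PIPELINE example_pipeline;
```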
During a pipeline's lifecycle, the pipeline updates the pipelines tables in the information schema. Other than the update to the FILE_STATE column in the information_schema.PIPELINES_FILES table mentioned in step 7, these updates are not discussed here.
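
The error-handling and file-state behavior can be explored with a few statements. This is a minimal sketch: the variable values, the pipeline name example_pipeline, and the file name are hypothetical, and it assumes your deployment and privileges allow these engine variables to be changed with SET GLOBAL.

```sql
-- Illustrative values only, not recommendations.
SET GLOBAL pipelines_max_retries_per_batch_partition = 4;
SET GLOBAL pipelines_stop_on_error = ON;

-- List files or objects that the pipeline marked as Skipped
-- after all retries were unsuccessful.
SELECT FILE_NAME, FILE_STATE
FROM information_schema.PIPELINES_FILES
WHERE PIPELINE_NAME = 'example_pipeline'   -- hypothetical pipeline name
  AND FILE_STATE = 'Skipped';

-- Drop one file from the pipeline's records so that it is no longer
-- treated as Loaded or Skipped and can be processed again.
ALTER PIPELINE example_pipeline DROP FILE 'data/example_file.csv';  -- hypothetical file
```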
Batches
A batch is the set of data processed in one pipeline cycle across all batch partitions.
A batch partition is the work performed by a single database partition during a pipeline lifecycle.
BATCH_INTERVAL is the time, in milliseconds, that a running pipeline waits before checking for new data when the source temporarily has no data.
pipelines_max_retries_per_batch_partition defines the maximum number of retries for a failing batch partition before the pipeline either continues or stops, depending on the configuration.
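
Both settings can be inspected or adjusted from SQL. The sketch below assumes a hypothetical pipeline named example_pipeline, and the interval value is illustrative only.

```sql
-- Show the current retry limit for failing batch partitions.
SHOW VARIABLES LIKE 'pipelines_max_retries_per_batch_partition';

-- Change how long example_pipeline (hypothetical name) waits, in
-- milliseconds, before checking an empty data source again.
ALTER PIPELINE example_pipeline SET BATCH_INTERVAL 5000;
```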
S3 Pipelines
- Each database partition (batch partition) picks exactly one source file per batch cycle, if available.
- The union of all files across partitions in that cycle is the batch.
- If fewer files are available than partitions, some partitions remain idle; the batch still consists of the files picked that cycle.
- File paths appear in information_schema.PIPELINES_BATCHES as BATCH_SOURCE_PARTITION_ID.
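
Grouping by batch gives a quick view of how many files each S3 batch picked up. This is a sketch that assumes PIPELINES_BATCHES reports one row per batch partition, as the sample output in this section suggests; the column aliases are made up for illustration.

```sql
-- One output row per batch: number of files picked and rows parsed.
-- COUNT(column) ignores NULLs, so partitions without a file (if they
-- are reported at all) do not inflate the file count.
SELECT PIPELINE_NAME,
       BATCH_ID,
       COUNT(BATCH_SOURCE_PARTITION_ID) AS files_in_batch,
       SUM(BATCH_PARTITION_PARSED_ROWS) AS parsed_rows
FROM information_schema.PIPELINES_BATCHES
WHERE DATABASE_NAME = 'trades'
GROUP BY PIPELINE_NAME, BATCH_ID
ORDER BY BATCH_ID;
```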
For example,
SELECT * FROM information_schema.PIPELINES_BATCHES WHERE DATABASE_NAME = "trades";
| DATABASE_NAME | PIPELINE_NAME | BATCH_ID | BATCH_STATE | BATCH_ROWS_WRITTEN | BATCH_ROWS_INSERTED | BATCH_ROWS_UPDATED | BATCH_ROWS_DELETED | BATCH_TIME | BATCH_START_UNIX_TIMESTAMP | BATCH_PARTITION_STATE | BATCH_PARTITION_PARSED_ROWS | BATCH_SOURCE_PARTITION_ID | BATCH_EARLIEST_OFFSET | BATCH_LATEST_OFFSET | BATCH_PARTITION_TIME | BATCH_PARTITION_EXTRACTED_BYTES | BATCH_PARTITION_TRANSFORMED_BYTES | BATCH_PARTITION_EXTRACTOR_WAIT_TIME | BATCH_PARTITION_TRANSFORM_WAIT_TIME | HOST | PORT | PARTITION |
| ------------- | ------------- | -------- | ----------- | ------------------ | ------------------- | ------------------ | ------------------ | ---------- | -------------------------- | --------------------- | --------------------------- | ------------------------- | --------------------- | ------------------- | -------------------- | ------------------------------- | --------------------------------- | ----------------------------------- | ----------------------------------- | --------------------------------------------------------------------------------------------- | ---- | --------- |
| trades | company | 40129 | Succeeded | 3,288 | 3,288 | 0 | 0 | 1.161775 | 1762498594.440545 | Succeeded | 3,288 | trades/company.csv | 0 | 1 | 0.740079 | 387,137 | NULL | 0.6682 | 0 | node-57eb03ab-7fba-445a-b4c8-31af2e0633ae-leaf-ag2-0.svc-57eb03ab-7fba-445a-9abb-23fe45e4c0ab | 3306 | 5 |
| trades | trade | 40130 | Succeeded | 5,810,328 | 5,810,328 | 0 | 0 | 17.830798 | 1762498594.440578 | Succeeded | 5,810,328 | trades/trade.csv | 0 | 1 | 15.383328 | 329,317,993 | NULL | 3.7094 | 0 | node-57eb03ab-7fba-445a-b4c8-31af2e0633ae-leaf-ag1-0.svc-57eb03ab-7fba-445a-9abb-23fe45e4c0ab | 3306 | 1 |

Kafka Pipelines
- Each database partition claims a contiguous set of offsets from its assigned Kafka partition for the current batch cycle.
- The union of all offsets across partitions constitutes the batch.
- Earliest and latest offsets for each batch partition appear as BATCH_EARLIEST_OFFSET and BATCH_LATEST_OFFSET, along with the Kafka partition in BATCH_SOURCE_PARTITION_ID.
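
To see the size of the offset range each batch partition claimed, the two offset columns can be subtracted. This sketch assumes the offset columns are numeric; the alias names are hypothetical. In the sample row in this section, the difference is 38,021,295 - 37,021,295 = 1,000,000, which matches BATCH_PARTITION_PARSED_ROWS for that row.

```sql
-- One output row per batch partition, with the width of its offset range.
SELECT BATCH_ID,
       BATCH_SOURCE_PARTITION_ID AS kafka_partition,
       BATCH_EARLIEST_OFFSET,
       BATCH_LATEST_OFFSET,
       BATCH_LATEST_OFFSET - BATCH_EARLIEST_OFFSET AS offsets_in_range
FROM information_schema.PIPELINES_BATCHES
WHERE DATABASE_NAME = 'adtech'
ORDER BY BATCH_ID, BATCH_SOURCE_PARTITION_ID;
```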
For example,
SELECT * FROM information_schema.PIPELINES_BATCHES WHERE DATABASE_NAME = "adtech";
| DATABASE_NAME | PIPELINE_NAME | BATCH_ID | BATCH_STATE | BATCH_ROWS_WRITTEN | BATCH_ROWS_INSERTED | BATCH_ROWS_UPDATED | BATCH_ROWS_DELETED | BATCH_TIME | BATCH_START_UNIX_TIMESTAMP | BATCH_PARTITION_STATE | BATCH_PARTITION_PARSED_ROWS | BATCH_SOURCE_PARTITION_ID | BATCH_EARLIEST_OFFSET | BATCH_LATEST_OFFSET | BATCH_PARTITION_TIME | BATCH_PARTITION_EXTRACTED_BYTES | BATCH_PARTITION_TRANSFORMED_BYTES | BATCH_PARTITION_EXTRACTOR_WAIT_TIME | BATCH_PARTITION_TRANSFORM_WAIT_TIME | HOST | PORT | PARTITION |
| ------------- | ------------- | -------- | ----------- | ------------------ | ------------------- | ------------------ | ------------------ | ---------- | -------------------------- | --------------------- | --------------------------- | ------------------------- | --------------------- | ------------------- | -------------------- | ------------------------------- | --------------------------------- | ----------------------------------- | ----------------------------------- | --------------------------------------------------------------------------------------------- | ---- | --------- |
| adtech | events | 40006 | Succeeded | 8,000,000 | 8,000,000 | 0 | 0 | 158.126145 | 1762497976.481443 | Succeeded | 1,000,000 | 0 | 37,021,295 | 38,021,295 | 145.240453 | 105,468,756 | NULL | 0.7554 | 0 | node-57eb03ab-7fba-445a-b4c8-31af2e0633ae-leaf-ag2-0.svc-57eb03ab-7fba-445a-9abb-23fe45e4c0ab | 3306 | 7 |

Last modified: November 21, 2025