The Lifecycle of a Pipeline
The following list describes the lifecycle of a pipeline and shows its progression from CREATE and START, to error handling, and finally to pipeline status information.
-
The user creates a new pipeline using CREATE PIPELINE.
-
The user starts the pipeline using START PIPELINE.
Note
Steps 3 to 8 refer to a batch, which is a subset of data that the pipeline extracts from its data source. Together, these steps comprise a single batch operation, which either succeeds or fails as a whole: if any step fails, the batch operation fails and rolls back.
-
The pipeline extracts a batch from its data source. The pipeline's offsets are updated to reflect the current position in the data source.
-
The pipeline optionally shapes (modifies) the batch, using one of these methods.
-
If the pipeline is able to successfully process the batch, the pipeline loads the batch into one or more SingleStore tables.
-
If an error occurs while a batch is running, the batch fails and its transactions are rolled back.
-
Each batch is retried at most pipelines_max_retries_per_batch_partition times.
-
If all retries are unsuccessful and pipelines_stop_on_error is set to ON, the pipeline stops.
-
If all retries are unsuccessful but pipelines_stop_on_error is set to OFF, the pipeline continues and processes a new batch. This batch includes the same files and/or objects as the first batch, excluding any files or objects that may have caused the error.
For more information, refer to Troubleshoot Pipelines.
-
The pipeline updates the FILE_STATE column in the information_schema.PIPELINES_FILES table, as follows:
-
Files and objects in the batch that the pipeline processed successfully are marked as Loaded.
-
Files and objects in the batch that the pipeline did not process successfully are marked as Skipped.
A file or object that is marked as Loaded or Skipped will not be processed again by the pipeline, unless ALTER PIPELINE ... DROP FILE ... is run. The pipeline does not delete files or objects from the data source.
-
The pipeline checks if the data source contains new data. If it does, the pipeline processes another batch immediately by rerunning steps 3 to 7. If the data source does not contain more data, the pipeline waits for BATCH_INTERVAL milliseconds (which can be specified in the CREATE PIPELINE statement) before checking the data source for new data. If the pipeline finds new data at this point, the pipeline reruns steps 3 to 7.
-
The user can stop a running pipeline using STOP PIPELINE. If this command is issued while a batch operation is running, the batch operation completes before the pipeline stops.
-
If a pipeline is stopped with the DETACH PIPELINE command, data loading stops. However, loading can be restarted with the START PIPELINE command, after which it continues as before.
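The polling behavior in step 8 can be pictured with a small Python model. This is only an illustrative sketch, not SingleStore's implementation: the function name run_polling, the process_batch and sleep callbacks, and the max_idle_checks stop condition are all hypothetical and exist only to keep the example finite and testable. A real pipeline keeps polling until it is stopped.

```python
from collections import deque


def run_polling(source, process_batch, batch_interval_ms, max_idle_checks, sleep):
    """Toy model of step 8: process batches while data exists, otherwise wait.

    source            -- deque of pending batches (stand-in for the data source)
    process_batch     -- hypothetical callback standing in for steps 3 to 7
    batch_interval_ms -- stand-in for BATCH_INTERVAL, in milliseconds
    max_idle_checks   -- hypothetical limit so the example terminates
    sleep             -- callback that receives the wait time in seconds
    """
    idle_checks = 0
    batches = 0
    while idle_checks < max_idle_checks:
        if source:
            # New data is available: process another batch immediately.
            process_batch(source.popleft())
            batches += 1
            idle_checks = 0
        else:
            # No new data: wait for the batch interval before checking again.
            sleep(batch_interval_ms / 1000.0)
            idle_checks += 1
    return batches
```

Passing time.sleep as the sleep callback gives real waits; passing a list's append method instead records the waits without blocking, which is convenient for experimenting with the model.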
During a pipeline's lifecycle, the pipeline updates the pipelines tables in the information schema at different times. One example is the update to the information_schema.PIPELINES_FILES table mentioned in step 7.
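The retry and file-state rules in steps 5 to 7 can be sketched as a toy Python model. It is a simplification under stated assumptions, not SingleStore's actual logic: BadFile, run_batch, and the parse callback are hypothetical names; max_retries stands in for pipelines_max_retries_per_batch_partition and stop_on_error for pipelines_stop_on_error. The model assumes one initial attempt plus up to max_retries retries, and it leaves files unmarked when the pipeline stops; the source text does not specify either detail.

```python
class BadFile(Exception):
    """Raised by the hypothetical parse callback when a file cannot be loaded."""

    def __init__(self, name):
        super().__init__(name)
        self.name = name


def run_batch(files, parse, max_retries, stop_on_error):
    """Return (table, file_state, pipeline_running) for one batch of files."""
    table = []        # stand-in for the destination table
    file_state = {}   # stand-in for the FILE_STATE column
    pending = list(files)
    while pending:
        failed = None
        # Assumption: one initial attempt plus up to max_retries retries.
        for _attempt in range(1 + max_retries):
            staged = []
            try:
                for name in pending:
                    staged.append(parse(name))
            except BadFile as err:
                failed = err.name
                continue  # staged rows are discarded: the batch rolls back
            failed = None
            table.extend(staged)  # the whole batch commits together
            break
        if failed is None:
            for name in pending:
                file_state[name] = "Loaded"
            break
        if stop_on_error:
            return table, file_state, False  # the pipeline stops
        # Otherwise skip the failing file and process a new batch
        # made up of the remaining files.
        file_state[failed] = "Skipped"
        pending.remove(failed)
    return table, file_state, True
```

In this model, a batch containing one unparseable file ends with that file marked Skipped and the rest marked Loaded when stop_on_error is false, and with nothing loaded and the pipeline stopped when it is true.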
Last modified: May 9, 2025