The Lifecycle of a Pipeline
1. The user creates a new pipeline using CREATE PIPELINE.
2. The user starts the pipeline using START PIPELINE.
Note
Steps 3 to 8 refer to a batch, which is a subset of the data that the pipeline extracts from its data source.
These steps make up one batch operation, which succeeds or fails as a whole. If any step fails, the batch operation rolls back.
3. The pipeline extracts a batch from its data source.
   The pipeline's offsets are updated to reflect the current position in the data source.
4. The pipeline optionally shapes (modifies) the batch, using one of the supported shaping methods.
5. If the pipeline processes the batch successfully, it loads the batch into one or more SingleStore tables.
6. If an error occurs while a batch b is running, b fails and its transactions are rolled back. Then b is retried at most pipelines_max_retries_per_batch_partition times. If pipelines_stop_on_error is set to ON and all retries are unsuccessful, the pipeline stops. Otherwise, the pipeline continues and processes a new batch nb, which processes the same files or objects that b attempted to process, excluding any files or objects that may have caused the error. For more information, see Pipeline Troubleshooting.
7. The pipeline updates the FILE_STATE column in the information_schema.PIPELINES_FILES table, as follows:
   - Files and objects in the batch that the pipeline processed successfully are marked as Loaded.
   - Files and objects in the batch that the pipeline did not process successfully, after all retries are unsuccessful (as described in step 6), are marked as Skipped.
   A file or object that is marked as Loaded or Skipped will not be processed again by the pipeline, unless ALTER PIPELINE ... DROP FILE ... is run. The pipeline does not delete files or objects from the data source.
8. The pipeline checks if the data source contains new data. If it does, the pipeline processes another batch immediately by running steps 3 to 7 again. If the data source does not contain more data, the pipeline waits for BATCH_INTERVAL milliseconds (which is specified in the CREATE PIPELINE statement) before checking the data source for new data. If the pipeline finds new data at this point, the pipeline runs steps 3 to 7 again.
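The retry and skip behavior in steps 3 to 7 can be illustrated with a toy model. This is a minimal sketch and not SingleStore's implementation: ToyPipeline, LoadError, load_file, and drop_file are hypothetical names, max_retries and stop_on_error only stand in for pipelines_max_retries_per_batch_partition and pipelines_stop_on_error, and the sketch assumes that "retried at most N times" means one initial attempt followed by up to N retries.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


class LoadError(Exception):
    """Hypothetical error that records which file or object failed."""

    def __init__(self, file_name: str):
        super().__init__(f"cannot load {file_name}")
        self.file_name = file_name


@dataclass
class ToyPipeline:
    load_file: Callable[[str], List[str]]  # hypothetical loader: file name -> rows
    max_retries: int = 4                   # stands in for pipelines_max_retries_per_batch_partition
    stop_on_error: bool = False            # stands in for pipelines_stop_on_error
    table: List[str] = field(default_factory=list)
    file_state: Dict[str, str] = field(default_factory=dict)
    running: bool = True

    def _attempt(self, files: List[str]) -> None:
        """All or nothing: rows are staged first and committed only if every file loads."""
        staged: List[str] = []
        for name in files:
            staged.extend(self.load_file(name))  # an exception here discards `staged`
        self.table.extend(staged)

    def run_batch(self, files: List[str]) -> None:
        # Files already marked Loaded or Skipped are never processed again.
        pending = [f for f in files if f not in self.file_state]
        failed = None
        for _ in range(1 + self.max_retries):  # assumption: first attempt plus retries
            try:
                self._attempt(pending)
                failed = None
                break
            except LoadError as err:
                failed = err.file_name  # batch rolled back: self.table is unchanged
        if failed is None:
            for name in pending:
                self.file_state[name] = "Loaded"
        elif self.stop_on_error:
            # Assumption of this sketch: file states are left untouched when stopping.
            self.running = False
        else:
            # New batch "nb": same files, minus the one that caused the error.
            self.file_state[failed] = "Skipped"
            self.run_batch([f for f in pending if f != failed])

    def drop_file(self, name: str) -> None:
        """Models ALTER PIPELINE ... DROP FILE ...: forget the file so it can be processed again."""
        self.file_state.pop(name, None)


def demo_loader(name: str) -> List[str]:
    if name == "bad.csv":
        raise LoadError(name)
    return [f"{name}:row"]


pipeline = ToyPipeline(load_file=demo_loader, max_retries=2)
pipeline.run_batch(["a.csv", "bad.csv", "b.csv"])
print(pipeline.file_state)
print(pipeline.table)
```

In this run, the first batch fails on every attempt and loads nothing, bad.csv ends up Skipped, and the follow-up batch loads the remaining two files. Staging rows before committing them is one simple way to mimic the rollback described in the note above; a real pipeline relies on database transactions instead.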
Note
- The user can stop a running pipeline using STOP PIPELINE. If this command is executed while a batch operation is executing, the batch operation completes before the pipeline stops.
- If a pipeline is stopped with the DETACH PIPELINE command, data loading stops. However, the pipeline can be restarted with the START PIPELINE command, and data loading continues in the same way as before.
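The polling behavior of step 8 and the stop semantics in the note above can be sketched the same way. This is a toy model under stated assumptions, not SingleStore's implementation: PollingPipeline, batch_size, on_item, and max_idle_polls are hypothetical (max_idle_polls exists only so the demo terminates), and the stored offset stands in for the pipeline's position in its data source.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class PollingPipeline:
    source: List[str]              # stand-in for the data source
    batch_size: int = 2            # hypothetical; real batch sizing depends on the source
    batch_interval_ms: int = 2500  # stands in for BATCH_INTERVAL
    offset: int = 0                # current position in the source; kept across stop and start
    stop_requested: bool = False
    loaded: List[str] = field(default_factory=list)

    def stop(self) -> None:
        """Models a stop request: it only sets a flag."""
        self.stop_requested = True

    def start(
        self,
        max_idle_polls: int,
        sleep: Callable[[float], None] = time.sleep,
        on_item: Optional[Callable[["PollingPipeline", str], None]] = None,
    ) -> None:
        self.stop_requested = False
        idle = 0
        # The stop flag is checked only between batches, so a running batch always finishes.
        while not self.stop_requested and idle < max_idle_polls:
            batch = self.source[self.offset:self.offset + self.batch_size]
            if batch:
                for item in batch:
                    if on_item is not None:
                        on_item(self, item)  # hook that may call stop() mid-batch
                    self.loaded.append(item)
                self.offset += len(batch)    # new data found: the next batch runs immediately
                idle = 0
            else:
                sleep(self.batch_interval_ms / 1000)  # no new data: wait before checking again
                idle += 1


def stop_at_first_item(pipeline: PollingPipeline, item: str) -> None:
    if item == "a":
        pipeline.stop()


waits: List[float] = []
pipeline = PollingPipeline(source=["a", "b", "c", "d", "e"], batch_interval_ms=500)
pipeline.start(max_idle_polls=1, sleep=waits.append, on_item=stop_at_first_item)
print(pipeline.loaded)  # the batch in progress completed before the stop took effect
pipeline.start(max_idle_polls=2, sleep=waits.append)
print(pipeline.loaded, waits)
```

Because the offset survives the stop, the second start() call picks up at the third item instead of reloading the first batch, which mirrors how a restarted pipeline continues loading as before.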
During a pipeline's lifecycle, the pipeline updates the pipelines tables in the information schema at different times. Apart from the updates to the information_schema.PIPELINES_FILES table mentioned in step 7, these updates are not discussed here.
Last modified: February 25, 2025