The Lifecycle of a Pipeline
1. The user creates a new pipeline using CREATE PIPELINE.
2. The user starts the pipeline using START PIPELINE.
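Taken together, the first two steps might look like the following sketch. The Kafka endpoint, pipeline name, and table name are illustrative placeholders, not part of the original text:

```sql
-- Create a pipeline that loads messages from a Kafka topic
-- (the host, topic, pipeline, and table names are placeholders).
CREATE PIPELINE clicks_pipeline AS
  LOAD DATA KAFKA 'kafka-host:9092/clicks'
  INTO TABLE clicks;

-- Begin extracting and loading data in the background.
START PIPELINE clicks_pipeline;
```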
Steps 3 to 8 refer to a batch, which is a subset of data that the pipeline extracts from its data source. These steps comprise one batch operation, which will succeed or fail completely. If any step fails, the batch operation rolls back.
3. The pipeline extracts a batch from its data source.
4. The pipeline's offsets are updated to reflect the current position in the data source.
5. The pipeline optionally shapes (modifies) the batch, using one of these methods.
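One shaping method is a SET clause in the CREATE PIPELINE statement, which computes column values at load time. A minimal sketch, with placeholder pipeline, table, and column names:

```sql
-- Sketch: shape each batch with a SET clause that computes a column
-- value at load time (all names here are placeholders).
CREATE PIPELINE clicks_pipeline AS
  LOAD DATA KAFKA 'kafka-host:9092/clicks'
  INTO TABLE clicks
  (url, ts)
  SET loaded_at = NOW();
```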
6. If the pipeline is able to successfully process the batch, the pipeline loads the batch into one or more SingleStore tables.

   If an error occurs while a batch b is running, then b will fail and b's transaction rolls back. b is retried at most pipelines_max_retries_per_batch_partition times. If all of the retries are unsuccessful and pipelines_stop_on_error is set to ON, the pipeline stops. Otherwise, the pipeline continues and processes a new batch nb, which processes the same files or objects that b attempted to process, excluding any files or objects that may have caused the error. For more information, see View and Handle Pipeline Errors.
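The retry behavior above is governed by engine variables. A sketch of how they might be inspected and adjusted; the specific value set below is illustrative:

```sql
-- Inspect the current retry and error-handling settings.
SHOW VARIABLES LIKE 'pipelines_max_retries_per_batch_partition';
SHOW VARIABLES LIKE 'pipelines_stop_on_error';

-- Example: stop the pipeline as soon as a batch exhausts its retries.
SET GLOBAL pipelines_stop_on_error = ON;
```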
7. The pipeline updates the FILE_STATE column in the information_schema.PIPELINES_FILES table, as follows:

   Files and objects in the batch that the pipeline processed successfully are marked as Loaded.

   Files and objects in the batch that the pipeline did not process successfully, after all retries are unsuccessful (as described in step 6), are marked as Skipped. A file or object that is marked as Skipped will not be processed again by the pipeline, unless ALTER PIPELINE ... DROP FILE ... is run.

   The pipeline does not delete files nor objects from the data source.
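To see which files a pipeline has skipped, and to make the pipeline attempt one of them again, something like the following can be used. The pipeline name and file name are placeholders:

```sql
-- List files that pipelines have marked as Skipped.
SELECT pipeline_name, file_name, file_state
FROM information_schema.PIPELINES_FILES
WHERE file_state = 'Skipped';

-- Forget a skipped file so the pipeline will process it again.
ALTER PIPELINE clicks_pipeline DROP FILE 'bad_batch.csv';
```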
8. The pipeline checks if the data source contains new data. If it does, the pipeline processes another batch immediately by running steps 3 to 7 again. If the data source does not contain more data, the pipeline waits for BATCH_INTERVAL milliseconds (which is specified in the CREATE PIPELINE statement) before checking the data source for new data. If the pipeline finds new data at this point, the pipeline runs steps 3 to 7 again.
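The polling interval is set per pipeline with the BATCH_INTERVAL clause of CREATE PIPELINE. A sketch; the names and the interval value are illustrative:

```sql
-- Wait 2500 ms between checks for new data when the source is idle.
CREATE PIPELINE clicks_pipeline AS
  LOAD DATA KAFKA 'kafka-host:9092/clicks'
  BATCH_INTERVAL 2500
  INTO TABLE clicks;
```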
9. The user can stop a running pipeline using STOP PIPELINE.
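Stopping a pipeline and checking its state might look like the following; the pipeline name is a placeholder:

```sql
-- Stop the background extraction; the pipeline's offsets are preserved,
-- so a later START PIPELINE resumes from where it left off.
STOP PIPELINE clicks_pipeline;

-- Confirm the pipeline's state.
SHOW PIPELINES;
```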
During a pipeline's lifecycle, the pipeline updates the pipelines tables in the information schema at different times. Except for the update of the information_schema.PIPELINES_FILES table mentioned in step 7, these updates are not discussed here.
Last modified: September 28, 2023