The Lifecycle of a Pipeline

The following list describes the lifecycle of a pipeline, from CREATE PIPELINE and START PIPELINE, through error handling, to pipeline status information.

1. The user creates a new pipeline using CREATE PIPELINE.
2. The user starts the pipeline using START PIPELINE.

Note

Steps 3 to 8 refer to a batch, which is a subset of data that the pipeline extracts from its data source. Steps 3 to 8 comprise a single batch operation, which either succeeds or fails. That is, if any step fails, the batch operation fails and rolls back.

3. The pipeline extracts a batch from its data source. The pipeline's offsets are updated to reflect the current position in the data source.
4. The pipeline optionally shapes (modifies) the batch, using one of these methods.
5. If the pipeline successfully processes the batch, the pipeline loads the batch into one or more SingleStore tables.
6. If an error occurs while a batch is running, the batch fails and its transactions are rolled back.
   - Each batch is retried at most pipelines_max_retries_per_batch_partition times.
   - If all retries are unsuccessful and pipelines_stop_on_error is set to ON, the pipeline stops.
   - If all retries are unsuccessful but pipelines_stop_on_error is set to OFF, the pipeline continues and a new batch is processed. This batch includes the same files and/or objects as the first batch, excluding any files or objects that may have caused the error.
   For more information, refer to Pipeline Troubleshooting.
7. The pipeline updates the FILE_STATE column in the information_schema.PIPELINES_FILES table, as follows:
   - Files and objects in the batch that the pipeline processed successfully are marked as Loaded.
   - Files and objects in the batch that the pipeline did not process successfully are marked as Skipped.
   A file or object that is marked as Loaded or Skipped will not be processed again by the pipeline, unless ALTER PIPELINE ... DROP FILE ... is run.

   The pipeline does not delete files or objects from the data source.
8. The pipeline checks if the data source contains new data. If it does, the pipeline processes another batch immediately by rerunning steps 3 to 7.

   If the data source does not contain more data, the pipeline waits for BATCH_INTERVAL milliseconds (which can be specified in the CREATE PIPELINE statement) before checking the data source for new data. If the pipeline finds new data at this point, the pipeline reruns steps 3 to 7.
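
As a rough illustration of the lifecycle above, the following SQL sketch creates and starts a pipeline, then inspects and resets file state. The pipeline name orders_pipeline, the table orders, the path /data/orders/*.csv, and the file name 2024-01.csv are hypothetical. The filesystem source, the comma delimiter, and the values chosen for the engine variables are assumptions made for the example, not recommendations.

```sql
-- Hypothetical names throughout: orders_pipeline, orders, /data/orders/*.csv.

-- Engine variables that govern error handling (values are illustrative only).
SET GLOBAL pipelines_max_retries_per_batch_partition = 4;
SET GLOBAL pipelines_stop_on_error = ON;

-- Create the pipeline. BATCH_INTERVAL is given in milliseconds and controls
-- how long the pipeline waits before rechecking a source with no new data.
CREATE PIPELINE orders_pipeline AS
  LOAD DATA FS '/data/orders/*.csv'
  BATCH_INTERVAL 2500
  INTO TABLE orders
  FIELDS TERMINATED BY ',';

-- Start the pipeline; batches are then extracted and loaded in the background.
START PIPELINE orders_pipeline;

-- See which files the pipeline has marked as Loaded or Skipped.
SELECT FILE_NAME, FILE_STATE
FROM information_schema.PIPELINES_FILES
WHERE PIPELINE_NAME = 'orders_pipeline';

-- Make the pipeline forget a file so that it can be processed again.
-- Depending on the version, the pipeline may need to be stopped first.
ALTER PIPELINE orders_pipeline DROP FILE '/data/orders/2024-01.csv';
```

Dropping a file only changes the pipeline's bookkeeping. The file itself stays in the data source, so it is picked up again by a later batch.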

Note

The user can stop a running pipeline using STOP PIPELINE. If this command is issued while a batch operation is running, the batch operation completes before the pipeline stops.
If a pipeline is stopped with the DETACH PIPELINE command, data loading stops. However, loading can be restarted with the START PIPELINE command, after which data loading continues as before.
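
The commands in this note can be sketched as follows. The pipeline name orders_pipeline is hypothetical, and the two sequences are alternatives shown side by side, not a script meant to be run top to bottom.

```sql
-- Option A: stop the pipeline. A batch operation that is already running
-- completes before the pipeline stops.
STOP PIPELINE orders_pipeline;
-- Resume loading later.
START PIPELINE orders_pipeline;

-- Option B: stop data loading with DETACH PIPELINE, then restart it.
DETACH PIPELINE orders_pipeline;
START PIPELINE orders_pipeline;
```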

During a pipeline's lifecycle, the pipeline updates the pipelines tables in the information schema at different times. Refer to Data Ingest for additional pipeline-related information schema tables, including information_schema.PIPELINES_FILES mentioned in step 7.
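
A minimal sketch of how these tables might be queried while a pipeline runs is shown below. The pipeline name orders_pipeline is hypothetical. Only information_schema.PIPELINES_FILES is named in this article; the other tables and the STATE column are assumptions based on the SingleStore information schema, so confirm them against the information schema reference for your version.

```sql
-- Current state of the pipeline (STATE column is assumed; see lead-in).
SELECT PIPELINE_NAME, STATE
FROM information_schema.PIPELINES
WHERE PIPELINE_NAME = 'orders_pipeline';

-- Per-batch summary; SELECT * avoids assuming specific column names.
SELECT *
FROM information_schema.PIPELINES_BATCHES_SUMMARY
WHERE PIPELINE_NAME = 'orders_pipeline';

-- Errors recorded for failed batches.
SELECT *
FROM information_schema.PIPELINES_ERRORS
WHERE PIPELINE_NAME = 'orders_pipeline';

-- Count of files per state in PIPELINES_FILES.
SELECT FILE_STATE, COUNT(*) AS file_count
FROM information_schema.PIPELINES_FILES
WHERE PIPELINE_NAME = 'orders_pipeline'
GROUP BY FILE_STATE;
```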
