The Lifecycle of a Pipeline

  1. The user creates a new pipeline using CREATE PIPELINE.

  2. The user starts the pipeline using START PIPELINE.

    Note

    Steps 3 to 8 refer to a batch, which is a subset of data that the pipeline extracts from its data source. These steps comprise one batch operation, which succeeds or fails as a whole. If any step fails, the batch operation rolls back.

  3. The pipeline extracts a batch from its data source. The pipeline's offsets are updated to reflect the current position in the data source.

  4. The pipeline optionally shapes (modifies) the batch, using one of the supported shaping methods.

  5. If the pipeline processes the batch successfully, it loads the batch into one or more SingleStore tables.

  6. If an error occurs while a batch b is running, b fails and its transaction rolls back. The pipeline then retries b at most pipelines_max_retries_per_batch_partition times. If all of the retries are unsuccessful and pipelines_stop_on_error is set to ON, the pipeline stops. Otherwise, the pipeline continues with a new batch nb, which processes the same files or objects that b attempted to process, excluding any files or objects that may have caused the error.

    For more information, see Pipeline Troubleshooting.

  7. The pipeline updates the FILE_STATE column in the information_schema.PIPELINES_FILES table, as follows:

    • Files and objects in the batch that the pipeline processed successfully are marked as Loaded.

    • Files and objects in the batch that the pipeline did not process successfully, after all retries are unsuccessful (as described in step 6), are marked as Skipped.

    A file or object that is marked as Loaded or Skipped will not be processed again by the pipeline, unless ALTER PIPELINE ... DROP FILE ... is run.

    The pipeline does not delete files or objects from the data source.

  8. The pipeline checks if the data source contains new data. If it does, the pipeline processes another batch immediately by running steps 3 to 7 again. If the data source does not contain more data, the pipeline waits for BATCH_INTERVAL milliseconds (which is specified in the CREATE PIPELINE statement) before checking the data source for new data. If the pipeline finds new data at this point, the pipeline runs steps 3 to 7 again.
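
The retry and skip rules in steps 5 to 7 can be easier to follow as a small simulation. The Python sketch below is a toy model, not SingleStore's implementation: the class PipelineSim, its fields, and the use of None to mark an unreadable file are all hypothetical names introduced for illustration. It also assumes that "retried at most N times" means one initial attempt plus N retries.

```python
from dataclasses import dataclass, field


@dataclass
class PipelineSim:
    """Toy model of steps 5 to 7 (hypothetical; not SingleStore's code)."""
    max_retries: int = 4         # stands in for pipelines_max_retries_per_batch_partition
    stop_on_error: bool = False  # stands in for pipelines_stop_on_error
    file_state: dict = field(default_factory=dict)  # stands in for FILE_STATE
    table: list = field(default_factory=list)       # stands in for the destination table
    stopped: bool = False

    def load(self, batch):
        """Load a batch atomically: either every row lands or none does."""
        staged = []
        for name, rows in batch.items():
            if rows is None:            # hypothetical marker for an unreadable file
                raise ValueError(name)  # staged rows are discarded: the rollback
            staged.extend(rows)
        self.table.extend(staged)       # the commit

    def run_batch(self, batch):
        bad = None
        # Assumption: one initial attempt plus max_retries retries.
        for _ in range(1 + self.max_retries):
            try:
                self.load(batch)
                break
            except ValueError as err:
                bad = err.args[0]
        else:
            # Every attempt failed.
            if self.stop_on_error:
                self.stopped = True
                return
            self.file_state[bad] = "Skipped"
            # New batch nb: same files, minus the one that caused the error.
            self.run_batch({n: r for n, r in batch.items() if n != bad})
            return
        for name in batch:
            self.file_state[name] = "Loaded"


pipeline = PipelineSim(max_retries=2)
pipeline.run_batch({"a.csv": [1, 2], "b.csv": None, "c.csv": [3]})
print(pipeline.file_state)
```

In this run, b.csv fails on every attempt and is marked Skipped, while a.csv and c.csv are loaded by the follow-up batch and marked Loaded. With stop_on_error=True, the same input would stop the simulated pipeline without loading any rows.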

Note

The user can stop a running pipeline using STOP PIPELINE. If this command is executed while a batch operation is executing, the batch operation completes before the pipeline stops.
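
The interaction between this stop behavior and the polling described in step 8 can be pictured as a loop that checks for a stop request only between batches. The Python sketch below is an illustration under stated assumptions: run_pipeline and its callable parameters (source, process_batch, stop_requested) are hypothetical names, and the 2500 ms default is an arbitrary illustrative value, not a documented one.

```python
import time


def run_pipeline(source, process_batch, stop_requested,
                 batch_interval_ms=2500, sleep=time.sleep):
    """Toy scheduler loop (hypothetical; not SingleStore's code).

    source()         returns the next batch, or None if there is no new data
    process_batch(b) runs one whole batch operation
    stop_requested() returns True once a stop has been requested
    """
    while not stop_requested():          # a stop is honored only between batches
        batch = source()
        if batch is None:
            # No new data: wait BATCH_INTERVAL milliseconds, then check again.
            sleep(batch_interval_ms / 1000)
            continue
        # New data: process it right away. A stop request that arrives during
        # this call takes effect only after the batch operation completes.
        process_batch(batch)
```

Because the stop check sits at the top of the loop, a batch that has already started always runs to completion, which mirrors the behavior the note describes.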

During its lifecycle, a pipeline updates the pipelines tables in the information schema at different times. Apart from the update of the information_schema.PIPELINES_FILES table described in step 7, these updates are not discussed here.

Last modified: October 8, 2024
