Pipelines Scheduling

If you create multiple pipelines in SingleStore, they run in parallel until all SingleStore partitions are saturated. You can use the variables in the following table to set the maximum number of partitions per batch and the maximum number of pipelines that can run at the same time.

Variable | Description

max_partitions_per_batch | Allows you to specify the maximum number of partitions for each batch. This variable can be modified for each pipeline by using the ALTER PIPELINE SET MAX_PARTITIONS_PER_BATCH command.

pipelines_max_concurrent_batch_partitions | Allows you to specify the maximum number of pipeline batch partitions that can run concurrently. This is a global variable.

pipelines_max_concurrent | Allows you to set the maximum number of pipelines running concurrently.

Note

The number of partitions that a pipeline uses depends on its source. For example, for Kafka pipelines, the number of batch partitions that can run concurrently cannot exceed the number of Kafka topic partitions.

For example, consider a SingleStore database with 10 partitions. Without any constraints, it is possible to run 5 parallel pipelines using 2 partitions each, 2 pipelines using 5 partitions each, and so on.

If the combined partition requirements (as set via max_partitions_per_batch) of any two pipelines exceed the total number of partitions, those pipelines are run serially in a round-robin fashion.

For example, consider a SingleStore database with 10 partitions and 3 pipelines. Suppose the first batches of pipelines P1, P2, and P3 require 4, 8, and 4 partitions, respectively. Pipelines are scheduled concurrently with the aim of saturating the partitions in the cluster. The scheduler therefore runs P1 and P3 in parallel to process their first batches (4 + 4 = 8 partitions, which fits within 10). It then runs P2 on its own, because the partitions required by P2 plus those required by either P1 or P3 (8 + 4 = 12) exceed the 10 partitions in the cluster.

In the same scenario, if the pipelines_max_concurrent_batch_partitions variable is set to 5 and the max_partitions_per_batch variable is not specified, then pipelines P1, P2, and P3 are each run serially.

You can also use the pipelines_max_concurrent variable in this scenario. If it is set to 1, pipelines P1, P2, and P3 are each run serially, because no two pipelines can run concurrently.
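
The three scenarios above can be reproduced with a small Python sketch. It is only an illustrative first-fit grouping, not SingleStore's actual scheduler: the function plan_rounds and its parameters are hypothetical names, and it assumes that a batch requesting more partitions than the concurrency cap allows is clipped to that cap. Pipelines placed in the same round are the ones that would run in parallel.

```python
from typing import Dict, List, Optional


def plan_rounds(
    demands: Dict[str, int],
    total_partitions: int,
    max_concurrent_partitions: Optional[int] = None,
    max_concurrent_pipelines: Optional[int] = None,
) -> List[List[str]]:
    """Group pipelines into rounds; pipelines in one round run in parallel.

    Hypothetical model: max_concurrent_partitions stands in for
    pipelines_max_concurrent_batch_partitions and max_concurrent_pipelines
    for pipelines_max_concurrent.
    """
    # Partition budget per round: the cluster size, optionally lowered by the cap.
    budget = total_partitions
    if max_concurrent_partitions is not None:
        budget = min(budget, max_concurrent_partitions)

    rounds: List[List[str]] = []  # pipeline names in each round
    used: List[int] = []          # partitions consumed in each round

    for name, demand in demands.items():
        # Assumption: a batch asking for more than the budget is clipped to it.
        need = min(demand, budget)
        for i, members in enumerate(rounds):
            fits_partitions = used[i] + need <= budget
            fits_count = (
                max_concurrent_pipelines is None
                or len(members) < max_concurrent_pipelines
            )
            if fits_partitions and fits_count:
                members.append(name)
                used[i] += need
                break
        else:
            # No existing round has room, so this pipeline starts a new round.
            rounds.append([name])
            used.append(need)
    return rounds


first_batch = {"P1": 4, "P2": 8, "P3": 4}

# No extra limits: P1 and P3 share a round, P2 runs alone.
print(plan_rounds(first_batch, 10))  # [['P1', 'P3'], ['P2']]

# Cap of 5 concurrent batch partitions: every pipeline runs alone.
print(plan_rounds(first_batch, 10, max_concurrent_partitions=5))  # [['P1'], ['P2'], ['P3']]

# At most 1 concurrent pipeline: every pipeline runs alone.
print(plan_rounds(first_batch, 10, max_concurrent_pipelines=1))  # [['P1'], ['P2'], ['P3']]
```

The sketch only models one batch per pipeline. In practice, the partition demand of each batch depends on the pipeline's source, so the grouping can change from one batch to the next.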

Last modified: September 2, 2024
