Flexible Parallelism
SingleStoreDB supports Flexible Parallelism, which allows multiple cores on the same node to access the same database partition. To use Flexible Parallelism, it must be enabled prior to database creation.
With Flexible Parallelism, as database partitions are created they are divided into sub-partitions. As a query runs on a leaf node, multiple cores working on behalf of the query can process different sub-partitions of a partition in parallel.
As an example, if you are currently at one partition per core with Flexible Parallelism, doubling the size of your cluster and then rebalancing will result in two cores for each partition. As a result, a simple query that scans and aggregates all the data in a single partition will now execute more quickly than it did before.
Typically, analytic or hybrid transactional/analytic workloads will gain some benefits from Flexible Parallelism, especially over time as you add more hardware and rebalance clusters. Strictly OLTP workloads will tend to not benefit from enabling Flexible Parallelism.
As you add cores to your SingleStore system, you can reach a point where there are more cores than database partitions on each leaf node. Without Flexible Parallelism, only one core can access each database partition, so the maximum ratio of cores to partitions a query can use is 1:1. This can create a bottleneck: cores sit idle on a node while queries access a single partition.
Note
When Flexible Parallelism is enabled, all new databases created will use Flexible Parallelism. Existing databases use the maximum of one thread per partition. You cannot enable or disable Flexible Parallelism on existing databases.
Flexible Parallelism is disabled by default in SingleStoreDB version 7.8 and earlier.
Enabling and Configuring Flexible Parallelism
You cannot enable or disable Flexible Parallelism on existing databases. Making a database capable of supporting Flexible Parallelism can only be done at the time the database is created. If you want to make Flexible Parallelism available for an existing database, create a new database with Flexible Parallelism enabled and copy the tables from the original database to the new one. One way to do this is with CREATE TABLE LIKE and INSERT…SELECT.
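As a sketch, assuming an existing table t in a database olddb and a new database newdb created with Flexible Parallelism enabled (all three names are hypothetical), the copy might look like:

```sql
-- Recreate the table's schema in the new database,
-- then copy the rows across.
USE newdb;
CREATE TABLE t LIKE olddb.t;
INSERT INTO t SELECT * FROM olddb.t;
```

Repeat for each table you want to migrate, then point your application at the new database.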
Flexible Parallelism is enabled and configured via two engine variables. The sub_to_physical_partition_ratio engine variable enables Flexible Parallelism, and query_parallelism_per_leaf_core controls whether queries use Flexible Parallelism or not.
Set sub_to_physical_partition_ratio to a valid value greater than 0 (a power of 2 up to 64) to enable Flexible Parallelism. This engine variable sets the number of sub-partitions per physical partition that a new database will be created with.
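A minimal sketch of enabling it before creating a database (the variable name comes from this document; the ratio of 8 is just an example value from the allowed range):

```sql
-- Enable Flexible Parallelism for databases created from now on:
-- each new physical partition is divided into 8 sub-partitions.
SET GLOBAL sub_to_physical_partition_ratio = 8;
```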
When a database is created with sub_to_physical_partition_ratio set to a valid value greater than 0, it will support Flexible Parallelism for queries over columnstore tables in that database. The maximum query parallelism is sub_to_physical_partition_ratio * number of physical partitions.
Once Flexible Parallelism is enabled, you can configure CPU usage per query on each leaf node.
To control the fraction of the CPU power that will be dedicated to one query on each leaf node, set the query_parallelism_per_leaf_core engine variable on the aggregator.
For example, if you want to use half of the available cores on each leaf node to process one query, then set query_parallelism_per_leaf_core to 0.5.
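That example, as a sketch in SQL (run on the aggregator, per the text above):

```sql
-- Dedicate half of the cores on each leaf node to each query.
SET GLOBAL query_parallelism_per_leaf_core = 0.5;
```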
If you want to limit the parallelism to a specific number of cores per query, rather than a proportion of the cores, you need to know how many cores there are on each leaf. This can be determined by reading the value of the expected_leaf_core_count engine variable. This variable is set during installation and should not be changed during normal operation.
The degree of parallelism on each node is calculated as expected_leaf_core_count * query_parallelism_per_leaf_core, rounded to the nearest integer.
Suppose expected_leaf_core_count is 28 and you want to use 10 cores on each leaf node to process a query. Dividing 10 by 28 gives 0.35714. To make sure you get 10 cores processing each query, set query_parallelism_per_leaf_core to 0.35714.
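The calculation above can be checked and applied directly in SQL (the 10-core target is the example's assumption, not a recommendation):

```sql
-- Confirm the per-leaf core count recorded at install time.
SELECT @@expected_leaf_core_count;

-- Target 10 of 28 cores per query: 10 / 28 = 0.35714 (to 5 places).
-- ROUND(28 * 0.35714) = 10, so each query gets 10 threads per leaf.
SET GLOBAL query_parallelism_per_leaf_core = 0.35714;
```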
How to Tell When Flexible Parallelism is Enabled
At the database level, the sub-partition count is available in both the information_schema distributed_databases and mv_distributed_databases_status views.
singlestore> SELECT * FROM information_schema.distributed_databases;
+-------------+---------------+----------------+--------------------+------------+-------------+---------+--------------------+-----------+---------+
| DATABASE_ID | DATABASE_NAME | NUM_PARTITIONS | NUM_SUB_PARTITIONS | CLUSTER_ID | REMOTE_NAME | IS_SYNC | IS_SYNC_DURABILITY | IS_PAUSED | SITE_ID |
+-------------+---------------+----------------+--------------------+------------+-------------+---------+--------------------+-----------+---------+
|           4 | db            |              3 |                 24 |          0 |             |       1 |                  0 |         0 |       1 |
+-------------+---------------+----------------+--------------------+------------+-------------+---------+--------------------+-----------+---------+
In the NUM_SUB_PARTITIONS column, the databases with a non-zero value have Flexible Parallelism enabled.
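To narrow the output to just those databases, a small variation on the query above:

```sql
-- List only the databases that have Flexible Parallelism enabled.
SELECT DATABASE_NAME, NUM_PARTITIONS, NUM_SUB_PARTITIONS
FROM information_schema.distributed_databases
WHERE NUM_SUB_PARTITIONS > 0;
```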
At the query level, you can tell whether a query is using Flexible Parallelism via Query Plan Operators. When node_degree_of_parallelism is set, the parallelism level is shown in Explain or Profile output for operators like Gather, GatherMerge, Project, Repartition, Broadcast, and MaterializedCTE, where parallelism_level is either partition, sub_partition, or segment:
partitions:all parallelism_level:segment
partitions:all parallelism_level:partition
partitions:all parallelism_level:sub_partition
A value of partition indicates one thread per database partition (Flexible Parallelism is not enabled).
The values sub_partition and segment both mean Flexible Parallelism is used on the node. The difference between sub_partition and segment is the granularity of the work items handed to the query threads. In the sub_partition case, each thread scans a pre-computed range (a set of sub-partitions). In the segment case, there is no pre-computed split of work across threads; as each thread finishes its work, it tries to get one more segment to scan.
In other words, segment-level parallelism lets threads that finish early help the remaining threads (at segment granularity), while sub-partition-level parallelism does not (each thread scans the set of sub-partitions assigned to it at the start of the query).
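To see which level a particular query uses, a sketch (assuming a columnstore table named events in a Flexible Parallelism-enabled database; the table name is hypothetical):

```sql
-- Operator lines in the output include parallelism_level,
-- e.g. "partitions:all parallelism_level:segment".
EXPLAIN SELECT COUNT(*) FROM events;
```

Running PROFILE instead of EXPLAIN shows the same operators annotated with runtime statistics.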