SingleStore Managed Service

Parallel Read Support

If you enable parallel reads via the enableParallelRead option, the singlestore-spark-connector will attempt to read results directly from SingleStore leaf nodes. This can drastically improve performance in some cases.
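As a rough illustration of how the option might be set, the sketch below configures a Spark session for the connector and loads a table. The endpoint, user, password, and table name are hypothetical placeholders, and the `spark.datasource.singlestore.` prefix and the value `"true"` are assumptions based on the connector's usual configuration style; accepted values may differ between connector versions, so check the documentation for the version you run.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical endpoint, credentials, and table name; replace with your own.
val spark = SparkSession.builder()
  .appName("parallel-read-example")
  .config("spark.datasource.singlestore.ddlEndpoint", "singlestore-master.example.internal:3306")
  .config("spark.datasource.singlestore.user", "example_user")
  .config("spark.datasource.singlestore.password", "example-password")
  // Ask the connector to attempt parallel reads from the leaf nodes.
  // Assumption: this connector version accepts "true" for this option.
  .config("spark.datasource.singlestore.enableParallelRead", "true")
  .getOrCreate()

// Load a table through the connector; parallel reads are attempted automatically.
val df = spark.read
  .format("singlestore")
  .load("example_db.example_table")
```

Setting the option on the session applies it to every read; it can usually also be passed per read with `.option("enableParallelRead", "true")` on the reader.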


Note: Parallel reads are not consistent

Parallel reads read directly from partitions on the leaf nodes, which skips our entire transaction layer. As a result, each individual read sees its own independent version of the database's distributed state, so the combined result may not correspond to a single consistent snapshot. Make sure to take this into account when enabling parallel reads.

Note: Parallel reads transparently fall back to single-stream reads

Parallel reads currently only work for query shapes that do no work on the aggregator and can therefore be pushed down entirely to the leaf nodes. To determine whether a particular query is being pushed down, ask the DataFrame how many partitions it has:

df.rdd.getNumPartitions

If this value is greater than 1, then we are reading in parallel from the leaf nodes.
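The partition-count check can be wrapped in a small helper so that a job can log which read path was taken. This is a minimal sketch: `readMode` is a hypothetical helper name, not part of the connector, and it only encodes the rule stated above.

```scala
// Interprets the partition count reported by df.rdd.getNumPartitions.
// More than one partition indicates a parallel read from the leaf nodes;
// otherwise the read is treated as a single-stream read.
// Hypothetical helper, not part of the singlestore-spark-connector.
def readMode(numPartitions: Int): String =
  if (numPartitions > 1) "parallel" else "single-stream"

// Usage with a DataFrame loaded through the connector (not run here):
//   println(s"read mode: ${readMode(df.rdd.getNumPartitions)}")
```

Logging this once per query makes a silent fallback visible without having to inspect the query plan.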

Note: Parallel reads require consistent authentication and connectible leaf nodes

In order to use parallel reads, the username and password provided to the singlestore-spark-connector must be the same across all nodes in the cluster.

In addition, the hostnames and ports listed by SHOW LEAVES must be directly connectible from Spark.
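One way to sanity-check the connectivity requirement is to try opening a TCP connection to each host and port listed by SHOW LEAVES from a machine on the same network as the Spark executors. The sketch below uses only the Java standard library; `canConnect` is a hypothetical helper, the 2-second timeout is an arbitrary choice, and a successful TCP connection does not prove that authentication will succeed.

```scala
import java.io.IOException
import java.net.{InetSocketAddress, Socket}

// Returns true if a TCP connection to host:port can be opened within the timeout.
// Hypothetical helper for checking the leaf addresses reported by SHOW LEAVES.
def canConnect(host: String, port: Int, timeoutMs: Int = 2000): Boolean = {
  val socket = new Socket()
  try {
    socket.connect(new InetSocketAddress(host, port), timeoutMs)
    true
  } catch {
    // Covers refused connections, timeouts, and unresolvable hostnames.
    case _: IOException => false
  } finally {
    socket.close()
  }
}

// Usage with hypothetical leaf addresses copied from SHOW LEAVES (not run here):
//   val leaves = Seq(("leaf-1.example.internal", 3306), ("leaf-2.example.internal", 3306))
//   leaves.foreach { case (h, p) => println(s"$h:$p reachable: ${canConnect(h, p)}") }
```

Running the check from the driver alone is not sufficient if the executors sit on a different network, since the executors are the ones that connect to the leaf nodes.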