Getting Started

You can download the latest version of the SingleStore Spark Connector from Maven Central or Spark Packages, and the source code from its GitHub repository. The Maven group ID is com.singlestore and the artifact ID is singlestore-spark-connector_2.12.
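The group and artifact combine into a Maven coordinate of the form group:artifact:version. spark-submit accepts this coordinate through its --packages flag, and the spark.jars.packages setting accepts it as well. The _2.12 suffix is the Scala binary version. The helper names, the version string 4.1.8-spark-3.5.0, and the file name my_app.py are assumptions for illustration. Check Maven Central for the versions that are actually published.

```python
from typing import List

# Coordinates stated on this page; the version passed in is up to the caller.
GROUP_ID = "com.singlestore"
ARTIFACT_ID = "singlestore-spark-connector_2.12"  # _2.12 = Scala binary version


def maven_coordinate(version: str) -> str:
    """Return a group:artifact:version coordinate for the connector."""
    if not version:
        raise ValueError("version must be a non-empty string")
    return f"{GROUP_ID}:{ARTIFACT_ID}:{version}"


def spark_submit_args(version: str, app: str) -> List[str]:
    """Build a spark-submit argument list that pulls the connector via --packages."""
    return ["spark-submit", "--packages", maven_coordinate(version), app]


if __name__ == "__main__":
    # Hypothetical version and application file; verify the version on Maven Central.
    print(" ".join(spark_submit_args("4.1.8-spark-3.5.0", "my_app.py")))
```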

The following matrix shows currently supported versions of the connector and their compatibility with different Spark versions:

Connector version    Supported Spark versions
4.1.8                Spark 3.1, Spark 3.2, Spark 3.3, Spark 3.4, Spark 3.5
4.1.7                Spark 3.1, Spark 3.2, Spark 3.3, Spark 3.4, Spark 3.5
4.1.6                Spark 3.1, Spark 3.2, Spark 3.3, Spark 3.4, Spark 3.5
4.1.5                Spark 3.0, Spark 3.1, Spark 3.2, Spark 3.3, Spark 3.4, Spark 3.5
4.1.4                Spark 3.0, Spark 3.1, Spark 3.2, Spark 3.3, Spark 3.4
4.1.3                Spark 3.0, Spark 3.1, Spark 3.2, Spark 3.3
4.1.2                Spark 3.0, Spark 3.1, Spark 3.2, Spark 3.3
4.1.1                Spark 3.0, Spark 3.1, Spark 3.2, Spark 3.3
4.1.0                Spark 3.0, Spark 3.1, Spark 3.2
4.0.x                Spark 3.0, Spark 3.1, Spark 3.2
3.2.1+               Spark 3.0, Spark 3.1, Spark 3.2
3.2.0                Spark 3.0, Spark 3.1

Note

SingleStore recommends using the latest version of the connector compatible with the corresponding Spark version. Refer to Migrate between SingleStore Spark Connector Versions for more information.
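The recommendation above can be applied mechanically. The following sketch transcribes the compatibility matrix into a list ordered from newest to oldest connector and returns the first connector that lists the given Spark minor version. The helper names (spark_minor, latest_compatible) are hypothetical, and the transcribed list has to be kept in sync with this page by hand.

```python
from typing import List, Optional, Tuple

# Transcribed from the compatibility matrix on this page, newest connector first.
COMPATIBILITY: List[Tuple[str, Tuple[str, ...]]] = [
    ("4.1.8", ("3.1", "3.2", "3.3", "3.4", "3.5")),
    ("4.1.7", ("3.1", "3.2", "3.3", "3.4", "3.5")),
    ("4.1.6", ("3.1", "3.2", "3.3", "3.4", "3.5")),
    ("4.1.5", ("3.0", "3.1", "3.2", "3.3", "3.4", "3.5")),
    ("4.1.4", ("3.0", "3.1", "3.2", "3.3", "3.4")),
    ("4.1.3", ("3.0", "3.1", "3.2", "3.3")),
    ("4.1.2", ("3.0", "3.1", "3.2", "3.3")),
    ("4.1.1", ("3.0", "3.1", "3.2", "3.3")),
    ("4.1.0", ("3.0", "3.1", "3.2")),
    ("4.0.x", ("3.0", "3.1", "3.2")),
    ("3.2.1+", ("3.0", "3.1", "3.2")),
    ("3.2.0", ("3.0", "3.1")),
]


def spark_minor(spark_version: str) -> str:
    """Reduce a Spark version such as '3.5.1' to its major.minor form '3.5'."""
    parts = spark_version.split(".")
    if len(parts) < 2:
        raise ValueError(f"expected a version like '3.5.1', got {spark_version!r}")
    return ".".join(parts[:2])


def latest_compatible(spark_version: str) -> Optional[str]:
    """Return the newest connector row that lists this Spark minor version, or None."""
    minor = spark_minor(spark_version)
    for connector, spark_versions in COMPATIBILITY:
        if minor in spark_versions:
            return connector
    return None
```

Because the list is ordered by connector release, the first match is the newest compatible row. For Spark 3.0, for instance, that is 4.1.5, since later releases no longer list Spark 3.0.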

The connector follows the x.x.x-spark-y.y.y naming convention, where x.x.x represents the connector version and y.y.y represents the corresponding Spark version. For example, in connector 3.0.0-spark-3.2.0, 3.0.0 is the version of the connector, compiled and tested against Spark version 3.2.0. It is critical to select the connector version that corresponds to the Spark version in use.
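A small parser for the x.x.x-spark-y.y.y convention can catch a mismatch before a job starts. This is a sketch only: parse_release and matches_spark are hypothetical helpers, and treating a release as matching when the Spark major and minor versions agree (ignoring the patch level) is an assumption of this sketch, not a documented rule.

```python
import re
from typing import NamedTuple

# x.x.x-spark-y.y.y, for example "3.0.0-spark-3.2.0"
_RELEASE = re.compile(r"(\d+\.\d+\.\d+)-spark-(\d+\.\d+\.\d+)")


class ConnectorRelease(NamedTuple):
    connector_version: str  # x.x.x
    spark_version: str      # y.y.y, the Spark version it was compiled and tested against


def parse_release(name: str) -> ConnectorRelease:
    """Split a release name into its connector and Spark versions."""
    match = _RELEASE.fullmatch(name)
    if match is None:
        raise ValueError(f"not in x.x.x-spark-y.y.y form: {name!r}")
    return ConnectorRelease(match.group(1), match.group(2))


def matches_spark(name: str, installed_spark: str) -> bool:
    """Assumption: a release matches when the Spark major.minor versions are equal."""
    built_for = parse_release(name).spark_version.split(".")[:2]
    return built_for == installed_spark.split(".")[:2]


if __name__ == "__main__":
    print(parse_release("3.0.0-spark-3.2.0"))
    print(matches_spark("3.0.0-spark-3.2.0", "3.2.4"))
```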

Release Highlights

Version 4.1.8

  • Changed retries when reading from the result table to use exponential backoff.

  • Replaced FixedThreadPool with ForkJoinPool.

  • Added more logging.
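The release notes do not state the connector's retry limits or delays. The following is therefore only a generic sketch of exponential backoff, in which each retry waits base * factor**n seconds, capped at max_delay. The function names and default values are hypothetical. The sleep parameter is injectable so that the schedule can be tested without actually waiting.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def backoff_delays(base: float, factor: float, max_delay: float, retries: int) -> List[float]:
    """Delay in seconds before each retry: base, base*factor, ... capped at max_delay."""
    return [min(max_delay, base * factor ** n) for n in range(retries)]


def retry_with_backoff(
    operation: Callable[[], T],
    *,
    retries: int = 5,        # hypothetical default
    base: float = 0.1,       # hypothetical first delay, in seconds
    factor: float = 2.0,     # each delay doubles
    max_delay: float = 5.0,  # hypothetical upper bound per delay
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation, retrying on any exception; the last attempt's exception propagates."""
    for delay in backoff_delays(base, factor, max_delay, retries):
        try:
            return operation()
        except Exception:
            sleep(delay)
    return operation()  # final attempt after all retries are used up
```

Production code would usually catch only transient errors and add random jitter, so that many tasks do not retry in lockstep.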

Version 4.1.7

  • Fixed a bug that caused reading from the wrong result table when the task was restarted.

Version 4.1.6

  • Changed LoadDataWriter to send data in batches.

  • Added the numPartitions parameter to specify the exact number of resulting partitions during a parallel read.
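LoadDataWriter is internal to the connector, and this page does not document its batch size. The following sketch therefore shows only the general idea of grouping rows into fixed-size batches and sending one batch at a time. batched_rows, write_in_batches, and the send callback are hypothetical names.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched_rows(rows: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most batch_size rows."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(rows)
    while batch := list(islice(it, batch_size)):
        yield batch


def write_in_batches(rows: Iterable[T], batch_size: int, send: Callable[[List[T]], None]) -> int:
    """Send rows one batch at a time and return the number of batches sent."""
    count = 0
    for batch in batched_rows(rows, batch_size):
        send(batch)  # hypothetical stand-in for whatever ships one batch to the database
        count += 1
    return count
```

On Python 3.12 and later, itertools.batched provides the same grouping, yielding tuples rather than lists.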

Version 4.1.5

  • Added support for Spark 3.5 for connector version 4.1.5.

  • Updated dependencies.

Version 4.1.4

  • Added support for Spark 3.4 for connector version 4.1.4.

  • Added support for additional connection attributes.

  • Fixed conflicts in result table names during parallel read.

Version 4.1.3

  • Improved error handling when using the onDuplicateKeySQL option.

Version 4.1.2

  • Fixed an issue where retrying parallel reads caused a "Table has reached its quota of 1 reader(s)" error.

Version 4.1.1

  • Added support for the clientEndpoint option.

  • Added support for Spark 3.3 for connector version 4.1.1.

  • Fixed an issue with error handling that caused deadlocks.

Version 4.1.0

  • Added support for JWT-based authentication.

  • Added support for connection pooling.

  • Added multi-partition support to the parallel read feature.

  • Added support for more SQL expressions in pushdown.

Version 4.0.x

  • The connector uses the SingleStore JDBC driver instead of the MariaDB JDBC driver.

Version 3.2

  • Added support for parallel reads from aggregator nodes.

  • Added support for repartitioning results by columns in parallel reads from aggregators.

Version 3.1

  • The connector uses the MariaDB JDBC driver and was rebranded from memsql-spark-connector to singlestore-spark-connector.

  • Applies the rebranding from memsql to singlestore throughout the connector. For example, the configuration prefix changed from spark.datasource.memsql.<config_name> to spark.datasource.singlestore.<config_name>.
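When upgrading from memsql-spark-connector, existing Spark settings need the new spark.datasource.singlestore. prefix. The following is a minimal sketch of that rename over a plain dictionary of settings. migrate_config is a hypothetical helper, and the config name and host in the usage line are examples only. Letting an already-present singlestore key take precedence over its memsql equivalent is a design choice of this sketch.

```python
from typing import Dict

OLD_PREFIX = "spark.datasource.memsql."
NEW_PREFIX = "spark.datasource.singlestore."


def migrate_config(conf: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of conf with memsql-prefixed keys renamed to the singlestore prefix."""
    migrated: Dict[str, str] = {}
    for key, value in conf.items():
        if key.startswith(OLD_PREFIX):
            new_key = NEW_PREFIX + key[len(OLD_PREFIX):]
            # If the singlestore key is already set explicitly, keep that value instead.
            if new_key in conf:
                continue
            migrated[new_key] = value
        else:
            migrated[key] = value  # unrelated settings pass through unchanged
    return migrated


if __name__ == "__main__":
    old = {"spark.datasource.memsql.ddlEndpoint": "example-host:3306", "spark.app.name": "demo"}
    print(migrate_config(old))
```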

Last modified: August 29, 2024
