SingleStore Managed Service

Migrating between the Spark Connector 2.0 and the Spark Connector 3.0

If you previously used the SingleStore Spark Connector 2.0, note that version 3.0 includes many enhancements. The sections below describe the differences in configuration and functionality between versions 2.0 and 3.0 of the SingleStore Spark Connector.

Configuration Differences
  • If you are only using the Spark reader API, moving to version 3.0 of the connector requires only a change in the configuration options. See the configuration comparison below for details.

  • If you are using the previous Spark connector to write data to SingleStore, the existing saveToMemSQL function is deprecated and has been replaced with Spark’s df.write and .save() functions.
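As a rough illustration of the two bullets above, the following Python sketch wraps the standard Spark reader and writer chains. It assumes an existing SparkSession and DataFrame are passed in by the caller, and it assumes the connector is registered under the data source name "memsql"; that name, the helper names read_table and write_table, and the "append" default are illustrative assumptions rather than details taken from this page.

```python
# Sketch only: `spark` is an existing SparkSession and `df` an existing
# DataFrame, both supplied by the caller.

# Assumed data source name for the 3.0 connector; check your connector release.
CONNECTOR_FORMAT = "memsql"


def read_table(spark, table, options):
    """Load `table` (for example "db.table") through the Spark reader API."""
    return spark.read.format(CONNECTOR_FORMAT).options(**options).load(table)


def write_table(df, table, options, mode="append"):
    """Write `df` with df.write ... .save(), the replacement for saveToMemSQL."""
    df.write.format(CONNECTOR_FORMAT).options(**options).mode(mode).save(table)
```

A call such as write_table(df, "db.table", {"ddlEndpoint": "host:3306", "user": "admin"}) would then stand in for the old save call; the endpoint and user values here are placeholders.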

Configuration Comparisons

| 2.0 Option | Related 3.0 Option (if applicable) | Details |
| --- | --- | --- |
| masterHost | ddlEndpoint | Master host to connect to. |
| N/A | dmlEndpoint | Spark Connector 3.0 allows you to specify DML endpoints for load balancing of non-DDL queries. Load balancing is supported through a different mechanism in Spark Connector 2.0. |
| masterPort | N/A | In version 3.0, the port is specified as part of the DDL or DML endpoint. |
| user | user | User to connect with. |
| password | password | Password for the user. |
| defaultDatabase | database | Database to connect to. |
| N/A | query | The query to run (optional in 3.0). |
| N/A | dbtable | The table to query (optional in 3.0). |
| defaultSaveMode | N/A | Used for Spark streaming in version 2.0; allows a user to specify options for overriding duplicate keys. Spark streaming is not available in 3.0. |
| disablePartitionPushdown | enableParallelRead | Spark Connector 3.0 provides an opt-in parallel read option. |
| defaultCreateMode | N/A | Controls whether databases and tables are created if they don’t exist. Version 3.0 automatically creates a table if it doesn’t exist, but does not create a database. |
| CompressionType | loadDataCompression | Compression options (more compression options are available in the 3.0 connector). |
| defaultInsertBatchSize | insertBatchSize | Limits the size of insert batches when using multi-insert statements. |
| N/A | disablePushdown | Controls whether SQL pushdown is enabled or disabled. The previous connector did not support robust SQL pushdown. |
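The renames in the comparison above can be applied mechanically. The sketch below is one possible helper, not part of either connector: the function name migrate_options, the "host:port" endpoint format, and the choice to pass values through unchanged are all assumptions. Options without a one-to-one 3.0 equivalent (such as defaultSaveMode, defaultCreateMode, and disablePartitionPushdown) are returned separately for manual review instead of being translated.

```python
# Direct renames taken from the comparison (2.0 name -> 3.0 name).
RENAMED = {
    "user": "user",
    "password": "password",
    "defaultDatabase": "database",
    "CompressionType": "loadDataCompression",
    "defaultInsertBatchSize": "insertBatchSize",
}


def migrate_options(old):
    """Return (new_options, needs_review) for a dict of 2.0 settings."""
    new = {}
    needs_review = []
    host = old.get("masterHost")
    port = old.get("masterPort")
    if host is not None:
        # 3.0 has no separate port option; "host:port" is an assumed format.
        new["ddlEndpoint"] = f"{host}:{port}" if port is not None else str(host)
    elif port is not None:
        # A port without a host cannot be turned into an endpoint.
        needs_review.append("masterPort")
    for key, value in old.items():
        if key in ("masterHost", "masterPort"):
            continue
        if key in RENAMED:
            # Assumption: the value itself is valid unchanged in 3.0.
            new[RENAMED[key]] = value
        else:
            # No one-to-one 3.0 option: leave the decision to a person.
            needs_review.append(key)
    return new, sorted(needs_review)
```

The second return value is a sorted list of 2.0 option names that still need a decision, which makes it easy to log or fail a migration script when it is not empty.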

Functionality Differences

Version 2.0 of the SingleStore Spark Connector contains the following functionality that is not available in version 3.0:

  • The saveToMemSQL() function to write to SingleStore; this is replaced by using df.write directly.

  • Adding indexes to automatically created tables; in 3.0 this can be done via a JDBC query.

  • Formal Spark Streaming API integration; version 3.0 does not provide one.
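For the index item above, the page suggests a JDBC query. As a stand-in in Python, the sketch below issues the equivalent statement through a generic DB-API (PEP 249) connection that the caller has already opened. The helper name add_index, the use of CREATE INDEX, and the backtick quoting are assumptions for illustration; whether a given index type is accepted depends on the table it is added to.

```python
def add_index(conn, table, index_name, columns):
    """Create an index on an existing table through a DB-API connection.

    `table` is an unqualified table name; `columns` is a list of column names.
    """

    def quote(name):
        # Backtick-quote an identifier, doubling any embedded backticks.
        return "`" + name.replace("`", "``") + "`"

    column_list = ", ".join(quote(c) for c in columns)
    sql = f"CREATE INDEX {quote(index_name)} ON {quote(table)} ({column_list})"
    cursor = conn.cursor()
    try:
        cursor.execute(sql)
    finally:
        cursor.close()
    return sql
```

Running this once after the connector has created the table keeps index management outside the write path, which matches the division of work described in the list above.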