Configuration Settings

The SingleStore Spark Connector leverages Spark SQL’s Data Sources API.

The singlestore-spark-connector can be configured globally through Spark options and locally when constructing a DataFrame. Global and local options share the same names, except that each global option carries the prefix spark.datasource.singlestore. (including the trailing dot), for example spark.datasource.singlestore.ddlEndpoint. The connection to SingleStore relies on the following Spark configuration options:
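As a rough illustration of the naming rule above, the following Python sketch converts a dictionary of local option names into their global equivalents. The helper names to_global_options and apply_global_options and the example values are hypothetical and not part of the connector; spark is assumed to be an existing SparkSession.

```python
# Prefix carried by the global form of every connector option.
GLOBAL_PREFIX = "spark.datasource.singlestore."


def to_global_options(local_options):
    """Return a new dict whose keys carry the global prefix (hypothetical helper)."""
    return {GLOBAL_PREFIX + name: str(value) for name, value in local_options.items()}


def apply_global_options(spark, local_options):
    """Set each option globally; `spark` is assumed to be an existing SparkSession."""
    for key, value in to_global_options(local_options).items():
        spark.conf.set(key, value)


# Placeholder values for illustration only.
local = {"ddlEndpoint": "master-agg.foo.internal:3308", "user": "root"}
print(to_global_options(local))
```

The same unprefixed names can instead be supplied per DataFrame, for example with spark.read.format("singlestore").options(**local).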

Basic Options

| Option | Description | Default Value |
| --- | --- | --- |
| ddlEndpoint (required) | The hostname or IP address of the SingleStore Master Aggregator in the host[:port] format, where port is optional. Example: master-agg.foo.internal:3308 or master-agg.foo.internal. | |
| dmlEndpoints | The hostnames or IP addresses of the SingleStore Aggregator nodes to run queries against, as a comma-separated list in the host[:port],host[:port],... format, where :port is optional. Example: child-agg:3308,child-agg2. | ddlEndpoint |
| user | SingleStore username. | root |
| password | SingleStore password. | |
| query | The query to run (mutually exclusive with the dbtable option). | |
| dbtable | The table to query (mutually exclusive with the query option). | |
| database | If set, all connections use this database by default. This option is empty by default. | |
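The constraints stated for the basic options (ddlEndpoint is required, query and dbtable are mutually exclusive, and multiple aggregators are comma-separated) can be sketched as a small Python helper. The function basic_options and its parameter names are hypothetical, and the table and database names are placeholders.

```python
def basic_options(ddl_endpoint, user="root", password=None, query=None,
                  dbtable=None, database=None, dml_endpoints=None):
    """Assemble the basic connector options as a dict (hypothetical helper)."""
    if not ddl_endpoint:
        raise ValueError("ddlEndpoint is required")
    if query is not None and dbtable is not None:
        raise ValueError("query and dbtable are mutually exclusive")

    options = {"ddlEndpoint": ddl_endpoint, "user": user}
    if dml_endpoints:
        # Multiple aggregators are joined into one comma-separated string.
        options["dmlEndpoints"] = ",".join(dml_endpoints)
    optional = {"password": password, "query": query,
                "dbtable": dbtable, "database": database}
    # Leave unset options out so that the connector defaults apply.
    options.update({k: v for k, v in optional.items() if v is not None})
    return options


# Placeholder table and database names for illustration only.
opts = basic_options(
    "master-agg.foo.internal:3308",
    dbtable="example_table",
    database="example_db",
    dml_endpoints=["child-agg:3308", "child-agg2"],
)
print(opts)
```

A dict built this way could then be passed to spark.read.format("singlestore").options(**opts).load(), assuming an existing SparkSession named spark.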

Read Options

| Option | Description | Default Value |
| --- | --- | --- |
| disablePushdown | Disables SQL Pushdown when running queries. | false |
| enableParallelRead | Enables reading data in parallel for some query shapes. It can have one of the following values: disabled, automaticLite, automatic, and forced. For more information, see Parallel Read Support. | automaticLite |
| parallelRead.Features | Specifies a comma-separated list of parallel read features that are tried in the order they are listed. SingleStore supports the following features: ReadFromLeaves, ReadFromAggregators, and ReadFromAggregatorsMaterialized. For example, ReadFromAggregators,ReadFromAggregatorsMaterialized. For more information, see Parallel Read Support. | ReadFromAggregators |
| parallelRead.tableCreationTimeoutMS | Specifies the amount of time (in ms) the reader waits for the result table creation when using the ReadFromAggregators feature. If set to 0, the timeout is disabled. | 0 |
| parallelRead.materializedTableCreationTimeoutMS | Specifies the amount of time (in ms) the reader waits for the result table creation when using the ReadFromAggregatorsMaterialized feature. If set to 0, the timeout is disabled. | 0 |
| parallelRead.maxNumPartitions | Specifies the maximum number of partitions in the resulting DataFrame. If set to 0, the DataFrame can have an unlimited number of partitions. | 0 |
| parallelRead.repartition | Repartitions data before reading. | false |
| parallelRead.repartition.columns | Specifies a comma-separated list of columns that are used for repartitioning (when parallelRead.repartition is enabled). By default, an additional column with a RAND() value is used for repartitioning. | |
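To make the parallel read settings above more concrete, here is a minimal Python sketch that checks the enumerated values and builds the corresponding option strings. The helper parallel_read_options and its parameter names are hypothetical; the option names and allowed values are the ones listed above.

```python
PARALLEL_READ_MODES = {"disabled", "automaticLite", "automatic", "forced"}
PARALLEL_READ_FEATURES = {
    "ReadFromLeaves",
    "ReadFromAggregators",
    "ReadFromAggregatorsMaterialized",
}


def parallel_read_options(mode="automaticLite", features=("ReadFromAggregators",),
                          max_num_partitions=0, repartition_columns=None):
    """Build parallel read options as strings (hypothetical helper)."""
    if mode not in PARALLEL_READ_MODES:
        raise ValueError(f"unknown enableParallelRead value: {mode}")
    unknown = [f for f in features if f not in PARALLEL_READ_FEATURES]
    if unknown:
        raise ValueError(f"unknown parallel read features: {unknown}")

    options = {
        "enableParallelRead": mode,
        # Features are tried in the order they are listed.
        "parallelRead.Features": ",".join(features),
        "parallelRead.maxNumPartitions": str(max_num_partitions),
    }
    if repartition_columns:
        options["parallelRead.repartition"] = "true"
        options["parallelRead.repartition.columns"] = ",".join(repartition_columns)
    return options


read_opts = parallel_read_options(
    mode="forced",
    features=["ReadFromAggregators", "ReadFromAggregatorsMaterialized"],
    max_num_partitions=16,
)
print(read_opts)
```

Validating the values on the client side is a design choice of this sketch, not connector behavior; it simply surfaces typos before a query is submitted.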

Write Options

| Option | Description | Default Value |
| --- | --- | --- |
| overwriteBehavior | Specifies the behavior during Overwrite. It can have one of the following values: dropAndCreate, truncate, or merge. | dropAndCreate |
| truncate | Deprecated; use overwriteBehavior instead. Truncates an existing table during Overwrite instead of dropping it. | false |
| loadDataCompression | Compresses data on load. It can have one of the following values: GZip, LZ4, or Skip. | GZip |
| loadDataFormat | Serializes data on load. It can have one of the following values: Avro or CSV. | CSV |
| tableKey | Specifies additional keys to add to tables created by the connector. See Load Data from Spark Examples for more information. | |
| onDuplicateKeySQL | If this option is specified and a new row with a duplicate PRIMARY KEY or UNIQUE index is inserted, SingleStore performs an UPDATE operation on the existing row. See Load Data from Spark Examples for more information. | |
| insertBatchSize | Specifies the size of the batch for row insertion. | 10000 |
| maxErrors | The maximum number of errors in a single LOAD DATA request. When this limit is reached, the load fails. If this property is set to 0, no error limit exists. | 0 |
| createRowstoreTable | If enabled, the connector creates a rowstore table. | false |
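The write options above can be collected in the same way. The following Python sketch starts from the documented defaults, applies overrides, and rejects values outside the enumerated sets as well as the deprecated truncate option. The helper write_options and the constant names are hypothetical.

```python
# Defaults as documented above, expressed as option strings.
WRITE_DEFAULTS = {
    "overwriteBehavior": "dropAndCreate",
    "loadDataCompression": "GZip",
    "loadDataFormat": "CSV",
    "insertBatchSize": "10000",
    "maxErrors": "0",
    "createRowstoreTable": "false",
}
ENUMERATED = {
    "overwriteBehavior": {"dropAndCreate", "truncate", "merge"},
    "loadDataCompression": {"GZip", "LZ4", "Skip"},
    "loadDataFormat": {"Avro", "CSV"},
}


def write_options(**overrides):
    """Merge overrides into the defaults and check enumerated values (hypothetical helper)."""
    if "truncate" in overrides:
        raise ValueError("truncate is deprecated; use overwriteBehavior instead")
    options = dict(WRITE_DEFAULTS)
    for name, value in overrides.items():
        # Render booleans as lowercase text, everything else via str().
        options[name] = str(value).lower() if isinstance(value, bool) else str(value)
    for name, allowed in ENUMERATED.items():
        if options[name] not in allowed:
            raise ValueError(f"{name} must be one of {sorted(allowed)}")
    return options


write_opts = write_options(overwriteBehavior="merge", loadDataCompression="LZ4",
                           createRowstoreTable=True)
print(write_opts)
```

Assuming a DataFrame df and a placeholder target such as example_db.example_table, the result could be used as df.write.format("singlestore").options(**write_opts).mode("overwrite").save("example_db.example_table").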

Connection Pool Options

| Option | Description | Default Value |
| --- | --- | --- |
| driverConnectionPool.Enabled | Enables the use of a connection pool on the driver. | true |
| driverConnectionPool.MaxOpenConns | The maximum number of active connections with the same options that can be allocated from the driver pool at the same time. A negative value indicates an unlimited number of active connections. | -1 |
| driverConnectionPool.MaxIdleConns | The maximum number of connections with the same options that can remain idle in the driver pool without extra ones being released. A negative value indicates an unlimited number of idle connections. | 8 |
| driverConnectionPool.MinEvictableIdleTimeMs | The minimum amount of time (in ms) an object may sit idle in the driver pool before it is eligible for eviction by the idle object evictor (if any). | 30000 (30 sec) |
| driverConnectionPool.TimeBetweenEvictionRunsMS | The number of milliseconds to sleep between runs of the idle object evictor thread on the driver. If set to 0 or a negative number, no idle object evictor thread is run. | 1000 (1 sec) |
| driverConnectionPool.MaxWaitMS | The maximum number of milliseconds that the driver pool waits (when there are no available connections) for a connection to be returned before throwing an exception. If set to -1, the pool waits indefinitely. | -1 |
| driverConnectionPool.MaxConnLifetimeMS | The maximum lifetime of a connection (in ms), after which the connection fails the next activation, passivation, or validation test. If set to 0 or a negative number, the connection has an infinite lifetime. | -1 |
| executorConnectionPool.Enabled | Enables the use of a connection pool on executors. | true |
| executorConnectionPool.MaxOpenConns | The maximum number of active connections with the same options that can be allocated from the executor pool at the same time. A negative value indicates an unlimited number of active connections. | -1 |
| executorConnectionPool.MaxIdleConns | The maximum number of connections with the same options that can remain idle in the executor pool without extra ones being released. A negative value indicates an unlimited number of idle connections. | 8 |
| executorConnectionPool.MinEvictableIdleTimeMs | The minimum amount of time (in ms) an object may sit idle in the executor pool before it is eligible for eviction by the idle object evictor (if any). | 2000 (2 sec) |
| executorConnectionPool.TimeBetweenEvictionRunsMS | The number of milliseconds to sleep between runs of the idle object evictor thread on the executor. If set to 0 or a negative number, no idle object evictor thread is run. | 1000 (1 sec) |
| executorConnectionPool.MaxWaitMS | The maximum number of milliseconds that the executor pool waits (when there are no available connections) for a connection to be returned before throwing an exception. If set to -1, the pool waits indefinitely. | -1 |
| executorConnectionPool.MaxConnLifetimeMS | The maximum lifetime of a connection (in ms), after which the connection fails the next activation, passivation, or validation test. If set to 0 or a negative number, the connection has an infinite lifetime. | -1 |
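Because the driver and executor pools expose the same settings under different prefixes, the option names can be generated rather than typed out. The following Python sketch does this; the helper pool_options is hypothetical, and the setting names are copied from the options above (note the Ms casing in MinEvictableIdleTimeMs).

```python
# Setting names shared by the driver and executor pools, as listed above.
POOL_SETTINGS = {
    "Enabled",
    "MaxOpenConns",
    "MaxIdleConns",
    "MinEvictableIdleTimeMs",
    "TimeBetweenEvictionRunsMS",
    "MaxWaitMS",
    "MaxConnLifetimeMS",
}


def pool_options(side, **settings):
    """Prefix pool settings for the driver or the executors (hypothetical helper)."""
    if side not in ("driver", "executor"):
        raise ValueError("side must be 'driver' or 'executor'")
    unknown = sorted(set(settings) - POOL_SETTINGS)
    if unknown:
        raise ValueError(f"unknown pool settings: {unknown}")
    prefix = side + "ConnectionPool."
    return {
        # Render booleans as lowercase text, everything else via str().
        prefix + name: str(value).lower() if isinstance(value, bool) else str(value)
        for name, value in settings.items()
    }


# Example: cap each executor pool at 4 open connections and wait at most 5 seconds.
executor_pool = pool_options("executor", MaxOpenConns=4, MaxWaitMS=5000)
print(executor_pool)
```

The example values (4 connections, 5000 ms) are arbitrary and chosen only to show the shape of the resulting keys.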

Last modified: February 23, 2024
