SingleStore DB

Configuration Settings

The SingleStore Spark Connector leverages Spark SQL's Data Sources API. The connection to SingleStore relies on the following configuration settings.

The singlestore-spark-connector is configurable both globally, via Spark options, and locally, when constructing a DataFrame. The global and local options share the same names; the global options carry the prefix spark.datasource.singlestore.:
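As a sketch of the two configuration styles described above (the hostnames, credentials, database, and query below are placeholders, not values from this document, and a reachable SingleStore cluster is assumed):

```python
from pyspark.sql import SparkSession

# Global options: same names as the local ones, prefixed with
# "spark.datasource.singlestore." (placeholder endpoint/credentials).
spark = (
    SparkSession.builder
    .appName("singlestore-example")
    .config("spark.datasource.singlestore.ddlEndpoint",
            "master-agg.foo.internal:3308")
    .config("spark.datasource.singlestore.user", "admin")
    .config("spark.datasource.singlestore.password", "secret")
    .config("spark.datasource.singlestore.database", "demo")
    .getOrCreate()
)

# Local options: set per-DataFrame; these override the global settings.
df = (
    spark.read
    .format("singlestore")
    .option("query", "SELECT id, name FROM users WHERE active = 1")
    .load()
)
df.show()
```

Local options are useful when a single application reads from several databases or tables, while the global prefix keeps shared connection details (endpoint, credentials) in one place.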

| Option | Description |
|--------|-------------|
| `ddlEndpoint` (required) | Hostname or IP address of the SingleStore Master Aggregator, in the format `host[:port]` (`:port` is optional). Example: `master-agg.foo.internal:3308` or `master-agg.foo.internal`. |
| `dmlEndpoint` | Hostname or IP address of the SingleStore Aggregator nodes to run queries against, in the format `host[:port],host[:port],...` (`:port` is optional; multiple hosts are separated by commas). Example: `child-agg:3308,child-agg2`. (Default: `ddlEndpoint`) |
| `user` (required) | SingleStore username. |
| `password` (required) | SingleStore password. |
| `query` | The query to run (mutually exclusive with `dbtable`). |
| `dbtable` | The table to query (mutually exclusive with `query`). |
| `database` | If set, all connections default to this database. (Default: empty) |
| `overwriteBehavior` | The behavior during Overwrite; one of `dropAndCreate`, `truncate`, or `merge`. (Default: `dropAndCreate`) |
| `truncate` | Deprecated; use `overwriteBehavior` instead. Truncate, rather than drop, an existing table during Overwrite. (Default: `false`) |
| `loadDataCompression` | Compress data on load; one of `GZip`, `LZ4`, or `Skip`. (Default: `GZip`) |
| `disablePushdown` | Disable SQL Pushdown when running queries. (Default: `false`) |
| `enableParallelRead` | Enable reading data in parallel for some query shapes. (Default: `false`) |
| `tableKey` | Additional keys to add to tables created by the connector (see below for more details). |
| `onDuplicateKeySQL` | If this option is specified, and an inserted row would produce a duplicate value in a PRIMARY KEY or UNIQUE index, SingleStore instead performs an UPDATE of the old row. See the examples below. |
| `insertBatchSize` | Size of the batch for row insertion. (Default: `10000`) |
| `loadDataFormat` | Serialize data on load; one of `Avro` or `CSV`. (Default: `CSV`) |
| `maxErrors` | The maximum number of errors allowed in a single LOAD DATA request; when this limit is reached, the load fails. If set to `0`, there is no error limit. (Default: `0`) |
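To illustrate the write-side options listed above, here is a hedged sketch of an upsert-style write (the DataFrame `df`, the `demo.users` table, and the `id`/`name` columns are assumptions for illustration; a live cluster is required):

```python
# Placeholder DataFrame `df` is assumed to exist, e.g. from a prior read.
(
    df.write
    .format("singlestore")
    # Add a PRIMARY KEY on `id` if the connector creates the table.
    .option("tableKey.primary", "id")
    # On a duplicate key, update the existing row instead of failing.
    .option("onDuplicateKeySQL", "name = VALUES(name)")
    .option("insertBatchSize", "5000")
    .mode("append")
    .save("demo.users")
)
```

For Overwrite-mode saves, `overwriteBehavior` controls whether the existing table is dropped and recreated (`dropAndCreate`), truncated (`truncate`), or merged into (`merge`).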