SingleStore DB

Load Data from Spark

Apache Spark is an open-source data processing framework. Spark excels at iterative computation and includes numerous libraries for statistical analysis, graph computations, and machine learning. The SingleStore Spark Connector allows you to connect your Spark and SingleStore environments. You can use SingleStore and Spark together to accelerate workloads by combining the computational power of Spark with the fast ingest and persistent storage that SingleStore offers.

The SingleStore Spark Connector integrates with Apache Spark 2.3 and 2.4. It supports both loading data from Spark DataFrames into database tables and extracting data from database tables into Spark DataFrames.

The connector is implemented as a native Spark SQL plugin and supports Spark’s DataSource API. Spark SQL operates on a variety of data sources through the DataFrame interface, and the DataFrame API is a widely used way for Spark to interact with other systems.

In addition, the connector is a true Spark data source; it integrates with the Catalyst query optimizer, supports robust SQL pushdown, and leverages SingleStore LOAD DATA to accelerate ingest from Spark via compression.

You can download the Spark Connector from its GitHub repository and from Maven Central. The group is com.singlestore and the artifact is singlestore-spark-connector_2.11.
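As an illustration of how the Maven coordinates above fit together, here is a minimal Python sketch that builds the group:artifact:version string accepted by Spark's --packages option. The group and artifact come from this topic. The helper names (maven_coordinate, spark_shell_command) are hypothetical, and "X.Y.Z" is a placeholder rather than a real release number, so check Maven Central or the GitHub repository for the version you need.

```python
# Maven coordinates stated in this topic.
GROUP = "com.singlestore"
ARTIFACT = "singlestore-spark-connector_2.11"


def maven_coordinate(version: str) -> str:
    """Return the group:artifact:version string that Spark's --packages option expects."""
    if not version or ":" in version:
        raise ValueError("version must be a non-empty string without ':'")
    return f"{GROUP}:{ARTIFACT}:{version}"


def spark_shell_command(version: str) -> list:
    """Build an argv list that starts spark-shell with the connector resolved from Maven."""
    return ["spark-shell", "--packages", maven_coordinate(version)]


if __name__ == "__main__":
    # "X.Y.Z" is a placeholder (assumption), not an actual connector release.
    print(" ".join(spark_shell_command("X.Y.Z")))
```

The same coordinate string can also be supplied through the spark.jars.packages configuration property instead of the command line.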

This topic discusses how to configure and start using the SingleStore Spark Connector 3.0.

Note: The Spark Connector 3.0 differs significantly from the Spark Connector 2.0. See Migrating between the Spark Connector 2.0 and the Spark Connector 3.0.