Installation
You can add the SingleStore Spark Connector to your Spark application using the Spark shell, PySpark, or spark-submit by running the following command:
$SPARK_HOME/bin/spark-shell --packages com.singlestore:singlestore-spark-connector_2.12:<insert-connector-version>-spark-<insert-spark-version>
Before running the command, replace the connector and Spark version placeholders with the versions you are using. For example:
$SPARK_HOME/bin/spark-shell --packages com.singlestore:singlestore-spark-connector_2.12:4.0.0-spark-3.2.0
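Once the shell starts with the connector package loaded, you can point the connector at your cluster and read a table. The following is a minimal sketch; the endpoint, credentials, and my_database.my_table table name are placeholders for your own deployment:

// Configure the connector (values shown are placeholders for your deployment).
spark.conf.set("spark.datasource.singlestore.ddlEndpoint", "singlestore-host:3306")
spark.conf.set("spark.datasource.singlestore.user", "admin")
spark.conf.set("spark.datasource.singlestore.password", "password")

// Load a SingleStoreDB table as a DataFrame and display it.
val df = spark.read
  .format("singlestore")
  .load("my_database.my_table")
df.show()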
You can also use Maven or SBT to integrate SingleStore with Spark.
Integrate SingleStoreDB with Spark Using Maven
To connect Spark to SingleStoreDB using Maven:
Log in to the machine where you want to create the Maven project.
Create an empty Maven project (only contains pom.xml and the src directory):
mvn archetype:generate -DgroupId=example \
    -DartifactId=SparkSingleStoreConnection \
    -DarchetypeArtifactId=maven-archetype-quickstart \
    -DinteractiveMode=false
Note: Maven uses a set of identifiers, also called coordinates, to uniquely identify a project and specify how the project artifact should be packaged:
groupId – a unique base name of the company or group that created the project
artifactId – a unique name of the project
archetypeArtifactId – a project template that contains only a pom.xml file and src directory
Update the pom.xml file in your project to include the SingleStore Spark Connector dependency. Your pom.xml file may differ depending on your project's required dependencies and your version of Spark. Here's a sample pom.xml file:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>org.example</groupId>
    <artifactId>SparkSingleStoreConnection</artifactId>
    <version>1.0-SNAPSHOT</version>
    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.8.0</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                </configuration>
            </plugin>
            <plugin>
                <artifactId>maven-shade-plugin</artifactId>
                <version>2.4.1</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <filters>
                                <filter>
                                    <artifact>*:*</artifact>
                                    <excludes>
                                        <exclude>META-INF/*.RSA</exclude>
                                        <exclude>META-INF/*.SF</exclude>
                                        <exclude>META-INF/*.inf</exclude>
                                    </excludes>
                                </filter>
                            </filters>
                            <transformers>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>reference.conf</resource>
                                </transformer>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                    <mainClass>{main-class-name}</mainClass>
                                </transformer>
                            </transformers>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
    <dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.12</artifactId>
            <version>{insert-spark-version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.12</artifactId>
            <version>{insert-spark-version}</version>
        </dependency>
        <dependency>
            <groupId>com.singlestore</groupId>
            <artifactId>singlestore-spark-connector_2.12</artifactId>
            <version>{insert-connector-version}-spark-{insert-spark-version}</version>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>3.8.1</version>
            <scope>test</scope>
        </dependency>
    </dependencies>
</project>
Update the pom.xml file with names appropriate to your app/environment:

Change the name of your parent folder.

Enter the target main class {main-class-name} in the <mainClass> tag.

Replace {insert-spark-version} and {insert-connector-version} with the appropriate Spark and SingleStore connector versions, respectively.

Build the project from the parent directory using the following command:
mvn clean package
You are ready to run the executable.
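For example, a spark-submit invocation might look like the following; the jar name follows from the sample pom.xml above, and {main-class-name} is the main class you set in the <mainClass> tag:

$SPARK_HOME/bin/spark-submit --class {main-class-name} target/SparkSingleStoreConnection-1.0-SNAPSHOT.jar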
Integrate SingleStoreDB with Spark Using SBT
To connect Spark to SingleStoreDB using SBT:
Log in to the machine where you want to create the SBT project.
Create the following directory structure to encompass the SBT project:
SparkSingleStoreSBT
├── build.sbt
├── project
│   └── plugins.sbt
└── src
    └── main
        └── scala
            ├── Reader.scala
            └── Writer.scala
Add the following content to the plugins.sbt file, in addition to any other plugins required by your project:
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.15.0")
Add the following content to the build.sbt file, in addition to any other dependencies required by your project. Your file may differ depending on your version of Spark and other required project dependencies. Here's a sample build.sbt file:
name := "SparkSingleStoreConnector"

version := "0.1"

scalaVersion := "2.12.12"

mainClass := Some("Reader")

val sparkVersion = "{spark-version}"

libraryDependencies += "org.apache.spark" %% "spark-sql" % sparkVersion
libraryDependencies += "com.singlestore" % "singlestore-spark-connector_2.12" % "{connector-version}-spark-{spark-version}"

assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) =>
    xs map {_.toLowerCase} match {
      case "manifest.mf" :: Nil | "index.list" :: Nil | "dependencies" :: Nil =>
        MergeStrategy.discard
      case ps @ x :: xs if ps.last.endsWith(".sf") || ps.last.endsWith(".dsa") =>
        MergeStrategy.discard
      case "plexus" :: xs => MergeStrategy.discard
      case "services" :: xs => MergeStrategy.filterDistinctLines
      case "spring.schemas" :: Nil | "spring.handlers" :: Nil => MergeStrategy.filterDistinctLines
      case _ => MergeStrategy.first
    }
  case "application.conf" => MergeStrategy.concat
  case "reference.conf" => MergeStrategy.concat
  case _ => MergeStrategy.first
}
Replace {spark-version} and {connector-version} with the appropriate Spark and SingleStore connector versions, respectively.

Develop your Spark application, using SingleStoreDB as the datastore for load and sink operations, as in the sketch below.
Package your application by setting the target main class in the build.sbt file:

mainClass := Some("target_main_class_name")
Build the project from the parent directory using the following command:
sbt clean assembly
You are ready to run the executable.
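For example, assuming the sample build.sbt above (the jar name and path shown are sbt-assembly defaults and may differ in your setup):

$SPARK_HOME/bin/spark-submit --class Reader target/scala-2.12/SparkSingleStoreConnector-assembly-0.1.jar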