Installation

You can add the SingleStore Spark Connector to your Spark application using spark-shell, PySpark, or spark-submit by passing the connector's Maven coordinates with the --packages option. For example, with spark-shell:

$SPARK_HOME/bin/spark-shell --packages com.singlestore:singlestore-spark-connector_2.12:<insert-connector-version>-spark-<insert-spark-version>

Before running the command, replace <insert-connector-version> and <insert-spark-version> with your connector and Spark versions. For example:

$SPARK_HOME/bin/spark-shell --packages com.singlestore:singlestore-spark-connector_2.12:4.0.0-spark-3.2.0
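The same --packages option also applies to PySpark and spark-submit. As a minimal sketch, the script below builds the coordinate string from two shell variables and prints the equivalent launch commands rather than running them. The versions are the example values used above, connector_coordinate is a helper introduced only for illustration, and app.py is a hypothetical application file.

```shell
#!/bin/sh
# Example versions taken from the command above; replace with your own.
CONNECTOR_VERSION="4.0.0"
SPARK_VERSION="3.2.0"

# Build the Maven coordinate that --packages expects (Scala 2.12 build).
# Arguments: connector version, Spark version.
connector_coordinate() {
  printf 'com.singlestore:singlestore-spark-connector_2.12:%s-spark-%s\n' "$1" "$2"
}

COORDINATE="$(connector_coordinate "$CONNECTOR_VERSION" "$SPARK_VERSION")"

# Print the launch commands instead of running them, so they can be reviewed first.
# app.py is a hypothetical application file.
echo "\$SPARK_HOME/bin/pyspark --packages $COORDINATE"
echo "\$SPARK_HOME/bin/spark-submit --packages $COORDINATE app.py"
```

Keeping the two versions in variables makes it harder for the connector suffix and the Spark version to drift apart when you upgrade.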

You can also use Maven or SBT to integrate SingleStore with Spark.

Integrate SingleStore with Spark Using Maven

To connect Spark to SingleStore using Maven:

  1. Log in to the machine where you want to create the Maven project.

  2. Create an empty Maven project (containing only pom.xml and the src directory):

    mvn archetype:generate -DgroupId=org.example \
    -DartifactId=SparkSingleStoreConnection \
    -DarchetypeArtifactId=maven-archetype-quickstart \
    -DinteractiveMode=false
    

    Note: Maven uses a set of identifiers, also called coordinates, to uniquely identify a project and specify how the project artifact should be packaged. The command above sets the following:

    • groupId – a unique base name of the company or group that created the project

    • artifactId – a unique name of the project

    • archetypeArtifactId – the project template to generate from; the template used here contains only a pom.xml file and src directory

  3. Update the pom.xml file in your project to include the SingleStore Spark Connector dependency. Your pom.xml file may be different based on your project’s required dependencies and your version of Spark. Here's a sample pom.xml file:

    <?xml version="1.0" encoding="UTF-8"?>
    <project xmlns="http://maven.apache.org/POM/4.0.0"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
        <modelVersion>4.0.0</modelVersion>
    
        <groupId>org.example</groupId>
        <artifactId>SparkSingleStoreConnection</artifactId>
        <version>1.0-SNAPSHOT</version>
        <build>
            <plugins>
                <plugin>
                    <groupId>org.apache.maven.plugins</groupId>
                    <artifactId>maven-compiler-plugin</artifactId>
                    <version>3.8.0</version>
                    <configuration>
                        <source>1.8</source>
                        <target>1.8</target>
                    </configuration>
                </plugin>
                <plugin>
                    <artifactId>maven-shade-plugin</artifactId>
                    <version>2.4.1</version>
                    <executions>
                        <execution>
                            <phase>package</phase>
                            <goals>
                                <goal>shade</goal>
                            </goals>
                            <configuration>
                                <filters>
                                    <filter>
                                        <artifact>*:*</artifact>
                                        <excludes>
                                            <exclude>META-INF/*.RSA</exclude>
                                            <exclude>META-INF/*.SF</exclude>
                                            <exclude>META-INF/*.inf</exclude>
                                        </excludes>
                                    </filter>
                                </filters>
                                <transformers>
                                    <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
                                    <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                        <resource>reference.conf</resource>
                                    </transformer>
                                    <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                        <mainClass>{main-class-name}</mainClass>
                                    </transformer>
                                </transformers>
                            </configuration>
                        </execution>
                    </executions>
                </plugin>
            </plugins>
        </build>
    
        <dependencies>
            <dependency>
                <groupId>org.apache.spark</groupId>
                <artifactId>spark-sql_2.12</artifactId>
                <version>{insert-spark-version}</version>
            </dependency>
            <dependency>
                <groupId>org.apache.spark</groupId>
                <artifactId>spark-core_2.12</artifactId>
                <version>{insert-spark-version}</version>
            </dependency>
            <dependency>
                <groupId>com.singlestore</groupId>
                <artifactId>singlestore-spark-connector_2.12</artifactId>
                <version>{insert-connector-version}-spark-{insert-spark-version}</version>
            </dependency>
            <dependency>
                <groupId>junit</groupId>
                <artifactId>junit</artifactId>
                <version>3.8.1</version>
                <scope>test</scope>
            </dependency>
        </dependencies>
    
    </project>
    
  4. Update the pom.xml file with names appropriate to your application and environment, then build the project:

    • Change the name of your parent folder.

    • Enter the target main class in place of {main-class-name} in the <mainClass> tag.

    • Replace {insert-spark-version} and {insert-connector-version} with the appropriate Spark and SingleStore connector versions, respectively.

    • Build the project from the parent directory using the following command:

      mvn clean package

You are ready to run the executable.
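As a rough sketch of the run step: with the sample pom.xml above, Maven's default naming (artifactId-version.jar) would place the shaded JAR at target/SparkSingleStoreConnection-1.0-SNAPSHOT.jar. The script below only assembles and prints a spark-submit command. The helper names and the main class com.example.Reader are hypothetical, and the JAR path is an assumption that no longer holds if you rename the artifact or set a custom final name.

```shell
#!/bin/sh
# Assumed values derived from the sample pom.xml; adjust to your project.
ARTIFACT_ID="SparkSingleStoreConnection"
VERSION="1.0-SNAPSHOT"
MAIN_CLASS="com.example.Reader"   # hypothetical main class

# Maven names the packaged JAR <artifactId>-<version>.jar by default.
maven_jar_path() {
  printf 'target/%s-%s.jar\n' "$1" "$2"
}

# Assemble the spark-submit invocation for a given main class and JAR.
submit_command() {
  printf '%s --class %s %s\n' '$SPARK_HOME/bin/spark-submit' "$1" "$2"
}

JAR="$(maven_jar_path "$ARTIFACT_ID" "$VERSION")"

# Print the command so it can be reviewed before running it from the project directory.
submit_command "$MAIN_CLASS" "$JAR"
```

Because the shade plugin bundles the connector into the JAR, this command needs no --packages option.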

Integrate SingleStore with Spark Using SBT

To connect Spark to SingleStore using SBT:

  1. Log in to the machine where you want to create the SBT project.

  2. Create the following directory structure for the SBT project:

    SparkSingleStoreSBT
      |── build.sbt
      |── project
        |── plugins.sbt
      |── src
        |── main
          |── scala
            |── Reader.scala
            |── Writer.scala
    
  3. Add the following content to the plugins.sbt file, in addition to any other dependencies required by your project:

    addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.15.0")

  4. Add the following content to the build.sbt file, in addition to any other dependencies required by your project. Your file may be different based on your version of Spark and other required project dependencies. Here's a sample build.sbt file:

    name := "SparkSingleStoreConnector"
    
    version := "0.1"
    
    scalaVersion := "2.12.12"
    
    mainClass := Some("Reader")
    
    val sparkVersion = "{spark-version}"
    
    libraryDependencies += "org.apache.spark" %% "spark-sql" % sparkVersion
    libraryDependencies += "com.singlestore" % "singlestore-spark-connector_2.12" % "{connector-version}-spark-{spark-version}"
    
    
    assemblyMergeStrategy in assembly := {
      case PathList("META-INF", xs @ _*) =>
        xs map {_.toLowerCase} match {
          case "manifest.mf" :: Nil | "index.list" :: Nil | "dependencies" :: Nil =>
            MergeStrategy.discard
          case ps @ x :: xs if ps.last.endsWith(".sf") || ps.last.endsWith(".dsa") =>
            MergeStrategy.discard
          case "plexus" :: xs =>
            MergeStrategy.discard
          case "services" :: xs =>
            MergeStrategy.filterDistinctLines
          case "spring.schemas" :: Nil | "spring.handlers" :: Nil =>
            MergeStrategy.filterDistinctLines
          case _ => MergeStrategy.first
        }
      case "application.conf" => MergeStrategy.concat
      case "reference.conf" => MergeStrategy.concat
      case _ => MergeStrategy.first
    }
    

    Replace {spark-version} and {connector-version} with the appropriate Spark and SingleStore connector versions, respectively.

  5. Develop your Spark application, using SingleStore as the data store for reads (load) and writes (sink).

  6. Set the target main class in the build.sbt file and package your application:

    • Set the target main class in mainClass := Some("target_main_class_name").

    • Build the project from the parent directory using the following command:

      sbt clean assembly

You are ready to run the executable.
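The SBT steps above can be sketched as a short shell script. It scaffolds the directory layout from step 2, leaving the .sbt files empty and the Scala sources for you to add, and prints a spark-submit command for the assembled JAR. The helper names are hypothetical. The JAR path assumes sbt-assembly's default naming (name-assembly-version.jar under target/scala-2.12) with the name, version, and main class from the sample build.sbt, so check your own target directory before relying on it.

```shell
#!/bin/sh
# Create the project skeleton from step 2 under the given root directory.
scaffold_sbt_project() {
  root="$1"
  mkdir -p "$root/project" "$root/src/main/scala"
  # Empty placeholders; fill them with the content shown in steps 3 and 4.
  touch "$root/build.sbt" "$root/project/plugins.sbt"
}

# Assumed default sbt-assembly output path for a Scala 2.12 build.
# Arguments: project name, project version.
assembly_jar_path() {
  printf 'target/scala-2.12/%s-assembly-%s.jar\n' "$1" "$2"
}

scaffold_sbt_project "SparkSingleStoreSBT"

# name and version come from the sample build.sbt.
JAR="$(assembly_jar_path "SparkSingleStoreConnector" "0.1")"

# Print the command to run from inside the project directory
# after `sbt clean assembly` has finished.
echo "\$SPARK_HOME/bin/spark-submit --class Reader $JAR"
```

Scala sources such as Reader.scala go in src/main/scala before you build.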