SingleStore Managed Service

Installation

You can add the SingleStore Spark Connector 3.0 to your Spark application using spark-shell, PySpark, or spark-submit by running the following command. Make sure to update the command with your connector and Spark versions.

$SPARK_HOME/bin/spark-shell --packages com.singlestore:singlestore-spark-connector_2.11:3.0.<insert-connector-version>-spark-<insert-spark-version>

Alternatively, you may use Maven or SBT.

SingleStore Integration with Spark Using Maven

This topic describes how to integrate and connect Spark to SingleStore using Maven.

  1. Log in to the machine where the Maven project is to be created.

  2. Create an empty Maven project (it will contain only a pom.xml file and the src directory):

    mvn archetype:generate -DgroupId=org.example \
      -DartifactId=SparkSingleStoreConnection \
      -DarchetypeArtifactId=maven-archetype-quickstart \
      -DinteractiveMode=false
    

    Note: Maven uses a set of identifiers, also called coordinates, to uniquely identify a project and specify how the project artifact should be packaged:

    • groupId – a unique base name of the company or group that created the project

    • artifactId – a unique name of the project

    • archetypeArtifactId – a project template that contains only a pom.xml file and src directory

  3. Update the pom.xml file in your project to include the SingleStore Spark Connector dependency. The following is an example pom.xml file with the SingleStore Spark Connector 3.0 dependency. Your pom.xml file might be different based on your project’s required dependencies and your version of Spark.

    <?xml version="1.0" encoding="UTF-8"?>
    <project xmlns="http://maven.apache.org/POM/4.0.0"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
        <modelVersion>4.0.0</modelVersion>
    
        <groupId>org.example</groupId>
        <artifactId>SparkSingleStoreConnection</artifactId>
        <version>1.0-SNAPSHOT</version>
        <build>
            <plugins>
                <plugin>
                    <groupId>org.apache.maven.plugins</groupId>
                    <artifactId>maven-compiler-plugin</artifactId>
                    <version>3.8.0</version>
                    <configuration>
                        <source>1.8</source>
                        <target>1.8</target>
                    </configuration>
                </plugin>
                <plugin>
                    <artifactId>maven-shade-plugin</artifactId>
                    <version>2.4.1</version>
                    <executions>
                        <execution>
                            <phase>package</phase>
                            <goals>
                                <goal>shade</goal>
                            </goals>
                            <configuration>
                                <filters>
                                    <filter>
                                        <artifact>*:*</artifact>
                                        <excludes>
                                            <exclude>META-INF/*.RSA</exclude>
                                            <exclude>META-INF/*.SF</exclude>
                                            <exclude>META-INF/*.inf</exclude>
                                        </excludes>
                                    </filter>
                                </filters>
                                <transformers>
                                    <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
                                    <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                        <resource>reference.conf</resource>
                                    </transformer>
                                    <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                        <mainClass>{main-class-name}</mainClass>
                                    </transformer>
                                </transformers>
                            </configuration>
                        </execution>
                    </executions>
                </plugin>
            </plugins>
        </build>
    
        <dependencies>
            <dependency>
                <groupId>org.apache.spark</groupId>
                <artifactId>spark-sql_2.11</artifactId>
                <version>2.4.4</version>
            </dependency>
            <dependency>
                <groupId>com.singlestore</groupId>
                <artifactId>singlestore-spark-connector_2.11</artifactId>
                <version>3.0.1-spark-2.4.4</version>
            </dependency>
        </dependencies>
    
    </project>
    
  4. Edit the pom.xml file (using names appropriate to your app/environment):

    • Change the artifactId to match the name of your parent folder

    • Replace {main-class-name} in the <mainClass> tag with the fully qualified name of your application's main class

    • Build the project from the parent directory using the following command: mvn clean package
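    Once packaged, the application code itself is standard Spark. The following is a minimal read sketch; the endpoint, credentials, and table name are placeholders, and the spark.datasource.singlestore.* option names follow the connector 3.0 documentation:

    ```scala
    import org.apache.spark.sql.SparkSession

    object Main {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("SparkSingleStoreConnection")
          // Placeholder connection settings -- replace with your cluster's values.
          .config("spark.datasource.singlestore.ddlEndpoint", "singlestore-host:3306")
          .config("spark.datasource.singlestore.user", "admin")
          .config("spark.datasource.singlestore.password", "<password>")
          .getOrCreate()

        // Load a SingleStore table into a DataFrame and print a few rows.
        spark.read
          .format("singlestore")
          .load("test_db.people")
          .show(5)

        spark.stop()
      }
    }
    ```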

You are ready to run the executable.

SingleStore Integration with Spark Using SBT

This topic describes how to integrate and connect Spark to SingleStore using SBT.

  1. Log in to the machine where the SBT project is to be created.

  2. Create the following directory structure for the SBT project:

    SparkSingleStoreSBT
    |── build.sbt
    |── project
    |   |── plugins.sbt
    |── src
        |── main
            |── scala
                |── Reader.scala
                |── Writer.scala
    
  3. Add the following content to the build.sbt file, in addition to any other dependencies required for your project. This is an example dependency file for the SingleStore Spark Connector 3.0. Your file may differ based on your version of Spark and other required project dependencies.

    name := "SparkMemSQLConnector"
    
    version := "0.1"
    
    scalaVersion := "2.11.12"
    
    mainClass := Some("Reader")
    
    val sparkVersion = "2.4.4"
    
    libraryDependencies += "org.apache.spark" %% "spark-sql" % sparkVersion
    libraryDependencies += "com.singlestore" % "singlestore-spark-connector_2.11" % "3.0.0-spark-2.4.4"
    
    assemblyMergeStrategy in assembly := {
      case PathList("META-INF", xs @ _*) =>
        xs map {_.toLowerCase} match {
          case "manifest.mf" :: Nil | "index.list" :: Nil | "dependencies" :: Nil =>
            MergeStrategy.discard
          case ps @ x :: xs if ps.last.endsWith(".sf") || ps.last.endsWith(".dsa") =>
            MergeStrategy.discard
          case "plexus" :: xs =>
            MergeStrategy.discard
          case "services" :: xs =>
            MergeStrategy.filterDistinctLines
          case "spring.schemas" :: Nil | "spring.handlers" :: Nil =>
            MergeStrategy.filterDistinctLines
          case _ => MergeStrategy.first
        }
      case "application.conf" => MergeStrategy.concat
      case "reference.conf" => MergeStrategy.concat
      case _ => MergeStrategy.first
    }
    
  4. Develop your Spark application, using SingleStore as the datastore for both load and sink operations.
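    The sink side of this pattern can be sketched as follows; the endpoint, credentials, and table names are placeholders, and the spark.datasource.singlestore.* option names follow the connector 3.0 documentation:

    ```scala
    import org.apache.spark.sql.SparkSession

    // Writer.scala -- writes a DataFrame into SingleStore (the sink side).
    object Writer {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("SparkMemSQLConnector")
          // Placeholder connection settings -- replace with your cluster's values.
          .config("spark.datasource.singlestore.ddlEndpoint", "singlestore-host:3306")
          .config("spark.datasource.singlestore.user", "admin")
          .config("spark.datasource.singlestore.password", "<password>")
          .getOrCreate()

        import spark.implicits._

        // Build a small DataFrame and append it to a SingleStore table.
        val people = Seq((1, "Alice"), (2, "Bob")).toDF("id", "name")
        people.write
          .format("singlestore")
          .mode("append")
          .save("test_db.people")

        spark.stop()
      }
    }
    ```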

  5. Package your application by setting the target main class in the build.sbt file:

    • Set the target main class in mainClass := Some("target_main_class_name")

    • Build the project from the parent directory using the following command: sbt clean assembly

You are ready to run the executable.