SingleStore and Spark

SingleStore and Apache Spark are both distributed, in-memory technologies. SingleStore is a SQL database, while Spark is a general computation framework. SingleStore integrates tightly with Apache Spark through the SingleStore Spark Connector. For instance, with SingleStore and Spark clusters deployed, users can extract data from real-time sources such as Kafka, run the data through a Spark machine learning library model, and store the model's results in SingleStore, where they are persisted and queryable.

What are the differences between SingleStore and Spark SQL?

  • Spark SQL treats datasets (RDDs) as immutable: there is currently no concept of an INSERT, UPDATE, or DELETE. You could express these operations as a transformation, but a transformation returns a new RDD rather than updating the dataset in place. In contrast, SingleStore is an operational database with full transactional semantics.

  • SingleStore supports updatable relational database indexes. The closest analogue in Spark is IndexedRDD, which is currently under development and provides updatable key/value indexes.
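
The contrast between an immutable transformation and an in-place update can be illustrated with a small sketch. This is a toy illustration only: a plain Python tuple stands in for an immutable dataset, and sqlite3 stands in for an operational SQL database. Neither is Spark or SingleStore, and the table and column names are made up for the example.

```python
import sqlite3

# Immutable style: a transformation builds a NEW dataset; the input is untouched.
prices = (("apple", 100), ("pear", 80))  # stand-in for an immutable dataset
discounted = tuple((name, cents * 9 // 10) for name, cents in prices)

# Operational style: an UPDATE changes the stored rows in place, inside a
# transaction. sqlite3 is only a local stand-in here, not SingleStore.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (name TEXT PRIMARY KEY, cents INTEGER)")
conn.executemany("INSERT INTO prices VALUES (?, ?)", prices)
with conn:  # commits on success, rolls back on error
    conn.execute("UPDATE prices SET cents = cents * 9 / 10")
updated = conn.execute("SELECT name, cents FROM prices ORDER BY name").fetchall()
```

After the transformation, prices still holds the original values and discounted is a separate object, whereas the table itself now holds the changed rows.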

You can connect SingleStore to Spark with the SingleStore Spark Connector. The SingleStore Spark Connector 3.0 is the latest GA version.

SQL Push Down

What happens if SQL push down fails?

The SingleStore Spark Connector takes a best-effort approach to query push down. While Spark is preparing the query for execution, the SingleStore push down strategy attempts to push down every subtree, starting with the entire query. If anything fails, the tree is left as is and Spark executes the unsupported section of the tree.
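
The best-effort strategy described above can be pictured with a toy model. The sketch below is not the connector's implementation: the Node class, the SUPPORTED operator set, and the "pushed" marker are all hypothetical names invented for illustration. It tries to push down a whole subtree first and, when that fails, leaves the operator in place and retries each child subtree.

```python
from dataclasses import dataclass, field

# Hypothetical set of operators the database side can execute in this toy model.
SUPPORTED = {"scan", "filter", "project", "join"}


@dataclass
class Node:
    op: str
    children: list = field(default_factory=list)


def fully_supported(node):
    """True if this operator and everything below it can be pushed down."""
    return node.op in SUPPORTED and all(fully_supported(c) for c in node.children)


def push_down(node):
    """Try the whole subtree first; on failure keep the node and try its children."""
    if fully_supported(node):
        return Node("pushed", [node])  # the whole subtree runs in the database
    # Unsupported here: leave this operator as is, but retry each child subtree.
    return Node(node.op, [push_down(c) for c in node.children])


def render(node):
    """Format a tree as nested text, e.g. filter(scan)."""
    if not node.children:
        return node.op
    return f"{node.op}({', '.join(render(c) for c in node.children)})"


# "python_udf" is a made-up operator that the toy database cannot run.
plan = Node("project", [Node("python_udf", [Node("filter", [Node("scan")])])])
result = render(push_down(plan))
```

In this model, result is project(python_udf(pushed(filter(scan)))): only the subtree below the unsupported operator is pushed down, and the remaining operators stay with the outer engine.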

How can I check to see if a query is pushed down?

Every DataFrame has a method called .explain, which prints the final plan before execution. If the first element in that plan is a MemSQLPhysicalRDD, then the DataFrame has been fully pushed down.
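
If this check needs to be automated, one option is to capture the text that .explain prints and inspect its first operator. The helper below is a minimal sketch under stated assumptions: the function names are hypothetical, the exact layout of the plan text may differ between Spark versions, and the sample strings are hand-written placeholders rather than real connector output.

```python
import re


def first_plan_node(plan_text):
    """Return the name of the first operator in explain-style text, or None."""
    for line in plan_text.splitlines():
        line = line.strip()
        # Skip blank lines and section headers such as "== Physical Plan ==".
        if not line or line.startswith("=="):
            continue
        # Drop tree decorations and whole-stage codegen markers such as "*(1) ".
        line = re.sub(r"^[\s+\-:|]*(\*\(\d+\)\s*)?", "", line)
        match = re.match(r"\w+", line)
        return match.group(0) if match else None
    return None


def is_fully_pushed_down(plan_text, node_name="MemSQLPhysicalRDD"):
    """True if the first operator in the plan text is the given node name."""
    return first_plan_node(plan_text) == node_name


# Hand-written placeholder plans, not captured from a real cluster.
pushed_plan = "== Physical Plan ==\nMemSQLPhysicalRDD"
partial_plan = "== Physical Plan ==\n*(1) Filter\n+- MemSQLPhysicalRDD"
```

With these placeholders, is_fully_pushed_down(pushed_plan) is True and is_fully_pushed_down(partial_plan) is False, because in the second plan a Spark operator sits above the pushed-down node.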

What SQL push downs are not supported?

We are constantly improving push down, so the best approach is to try your query and then use .explain to check whether it was pushed down. If you find a query that is not pushed down, please raise a GitHub issue on the Connector repo.

Last modified: January 10, 2023

