SingleStoreDB and Spark
What are the differences between SingleStoreDB and Spark SQL?
Spark SQL treats datasets (RDDs) as immutable - there is currently no concept of an INSERT, UPDATE, or DELETE. You can express these operations as transformations, but a transformation returns a new RDD rather than updating the dataset in place. In contrast, SingleStoreDB is an operational database with full transactional semantics.
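As a minimal sketch of this model (the data and session setup here are arbitrary, not from the connector), a "delete" in Spark is expressed as a filter transformation that yields a new RDD and leaves the original untouched:

    import org.apache.spark.sql.SparkSession

    // Build a local Spark session for the example.
    val spark = SparkSession.builder()
      .appName("immutability-sketch")
      .master("local[*]")
      .getOrCreate()

    // An RDD of five arbitrary integers.
    val rdd = spark.sparkContext.parallelize(Seq(1, 2, 3, 4, 5))

    // There is no rdd.delete(...); a "delete" is a filter transformation
    // that returns a new RDD.
    val withoutEvens = rdd.filter(_ % 2 != 0)

    // The original RDD is unchanged; only the derived RDD lacks the rows.
    assert(rdd.count() == 5)
    assert(withoutEvens.count() == 3)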
SingleStoreDB supports updatable relational database indexes. The closest analogue in Spark is IndexedRDD, which is currently under development and provides updatable key/value indexes.
You can connect SingleStoreDB to Spark with the SingleStore Spark Connector; version 3.0 is the latest GA release.
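As a hedged sketch of making the connection, the following assumes the 3.x connector's documented data source name and option keys (newer 3.x releases register the singlestore name; the initial 3.0 builds used memsql), with placeholder host, credentials, database, and table names:

    import org.apache.spark.sql.SparkSession

    // Endpoint, credentials, database, and table below are placeholders.
    val spark = SparkSession.builder()
      .appName("singlestore-read-sketch")
      .config("spark.datasource.singlestore.ddlEndpoint", "singlestore-host:3306")
      .config("spark.datasource.singlestore.user", "admin")
      .config("spark.datasource.singlestore.password", "password")
      .getOrCreate()

    // Read a table through the connector's data source.
    val df = spark.read
      .format("singlestore")
      .load("mydb.mytable")

    df.show()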
SQL Push Down
What happens if SQL push down fails?
The SingleStoreDB Connector takes a best-effort approach to query push down. While Spark is preparing the query for execution, the SingleStoreDB push down strategy attempts to push down every subtree, starting with the entire query. If push down fails for a subtree, we leave that subtree as is and Spark executes the unsupported section of the plan.
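As a hypothetical illustration of partial push down, a Spark UDF is opaque to the connector and cannot be translated to SQL, so a query that filters through one is split: the connector pushes down what it can, and Spark evaluates the UDF itself. Table and column names below are placeholders:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{col, udf}

    val spark = SparkSession.builder()
      .appName("partial-pushdown-sketch")
      .getOrCreate()

    // A Spark UDF cannot be translated into SingleStoreDB SQL.
    val isVip = udf((spend: Double) => spend > 1000.0)

    // The connector pushes down the parts it can (the table scan), while
    // Spark evaluates the UDF filter on the returned rows.
    val partiallyPushed = spark.read
      .format("singlestore")
      .load("mydb.customers")              // hypothetical table
      .filter(isVip(col("total_spend")))   // hypothetical column

    partiallyPushed.explain()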
How can I check to see if a query is pushed down?
Every DataFrame has a method called .explain, which prints the final plan before execution. If the first element in that plan is a MemSQLPhysicalRDD, then the DataFrame has been fully pushed down.
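As a minimal sketch of this check (table and column names are placeholders, and the exact node name printed depends on the connector version):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.col

    val spark = SparkSession.builder()
      .appName("explain-sketch")
      .getOrCreate()

    // A hypothetical table with a simple filter that the connector should
    // be able to push down in full.
    val orders = spark.read.format("singlestore").load("mydb.orders")
    val bigOrders = orders.filter(col("amount") > 100)

    // Prints the physical plan without executing the query. If the first
    // operator is the connector's node (e.g. MemSQLPhysicalRDD), the
    // DataFrame was fully pushed down.
    bigOrders.explain()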
What SQL push downs are not supported?
We are constantly improving push down, so the best thing to do is to try your query and then use .explain to check whether it was pushed down. If you find a query that is not pushed down, please raise a GitHub issue on the Connector repo.