SQL Pushdown

The singlestore-spark-connector has extensive support for rewriting Spark SQL and DataFrame operation query plans into standalone SingleStore queries. This allows most of the computation to be pushed into the SingleStore distributed system without any manual intervention. The SQL rewrites are enabled automatically but can be disabled using the disablePushdown option. We also support partial pushdown, in which some parts of a query are evaluated in SingleStore and the rest are evaluated in Spark.
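
For example, here is a minimal sketch of enabling the connector and toggling pushdown, assuming the session-wide spark.datasource.singlestore. option prefix; the endpoint, credentials, and table name are illustrative:

```scala
import org.apache.spark.sql.SparkSession

// Build a session with SingleStore connection options. The
// spark.datasource.singlestore. prefix applies connector options
// (including disablePushdown) session-wide.
val spark = SparkSession.builder()
  .master("local[*]")
  .config("spark.datasource.singlestore.ddlEndpoint", "singlestore-host:3306") // illustrative host
  .config("spark.datasource.singlestore.user", "admin")            // illustrative credentials
  .config("spark.datasource.singlestore.password", "password")
  .config("spark.datasource.singlestore.disablePushdown", "false") // pushdown enabled (the default)
  .getOrCreate()

// Reads through the connector are now eligible for SQL pushdown.
val df = spark.read
  .format("singlestore")
  .load("db.table") // illustrative database.table name
```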

SQL Pushdown is either enabled or disabled on the entire Spark Session. If you want to run multiple queries in parallel with different disablePushdown values, run them on separate Spark Sessions.
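
To illustrate, here is a sketch of isolating the two settings, assuming the config key from the example above; newSession() shares the underlying SparkContext but keeps independent SQL configuration:

```scala
// One session with pushdown enabled...
val pushdownSession = spark.newSession()
pushdownSession.conf.set("spark.datasource.singlestore.disablePushdown", "false")

// ...and a second, independent session with pushdown disabled.
val noPushdownSession = spark.newSession()
noPushdownSession.conf.set("spark.datasource.singlestore.disablePushdown", "true")

// Each read uses its own session's setting, so the two queries can
// run in parallel with different pushdown behavior.
val pushed = pushdownSession.read.format("singlestore").load("db.t")    // illustrative table
val unpushed = noPushdownSession.read.format("singlestore").load("db.t")
```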

We currently support most of the primary Logical Plan nodes in Spark SQL, including Project, Filter, Aggregate, Window, Join, Limit, and Sort.
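
As an illustration, the following hypothetical query exercises several of these nodes (Filter, Aggregate, Sort, and Limit); with pushdown enabled, explain() should show the work collapsed into a single SingleStore query rather than separate Spark operators. The table and column names are assumptions:

```scala
import org.apache.spark.sql.functions.col

val topCities = spark.read
  .format("singlestore")
  .load("analytics.events")                // hypothetical table
  .filter(col("event_type") === "click")   // Filter
  .groupBy(col("city"))                    // Aggregate
  .count()
  .orderBy(col("count").desc)              // Sort
  .limit(10)                               // Limit

topCities.explain() // inspect the physical plan to verify pushdown
```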

We also support most Spark SQL expressions. A full list of supported operators/functions can be found in the ExpressionGen.scala file.

The best place to look for examples of fully supported queries is in the tests. Check out this file as a starting point: SQLPushdownTest.scala.

SQL Pushdown Incompatibilities

  • ToUnixTimestamp and UnixTimestamp handle only timestamps earlier than 2038-01-19 03:14:08 when they receive a DateType or TimestampType value as the first argument.

  • FromUnixTime with the default format (yyyy-MM-dd HH:mm:ss) handles only values less than 2147483648 (2^31); see the sketch after this list.
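
Both limitations stem from 32-bit time handling. As a minimal sketch of a workaround, assuming the session-level config key shown earlier and hypothetical table and column names, queries that touch epoch values at or beyond 2^31 can be evaluated in Spark instead:

```scala
import org.apache.spark.sql.functions.{col, from_unixtime}

// Timestamps at or after 2038-01-19 03:14:08 (epoch >= 2^31) fall outside
// the range these functions handle in SingleStore, so disable pushdown on
// the session and let Spark evaluate the expression.
spark.conf.set("spark.datasource.singlestore.disablePushdown", "true")

val events = spark.read
  .format("singlestore")
  .load("db.events")                        // hypothetical table
  .select(from_unixtime(col("created_at"))) // epoch-seconds column, may exceed 2^31 - 1
```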