
SQL Pushdown

The singlestore-spark-connector has extensive support for rewriting Spark SQL and DataFrame operation query plans into standalone SingleStoreDB queries. This allows most of the computation to be pushed into the SingleStoreDB distributed system without any manual intervention. The SQL rewrites are enabled automatically, but they can be disabled using the disablePushdown option. SingleStoreDB also supports partial pushdown, where some parts of a query are evaluated in SingleStoreDB and the remaining parts are evaluated in Spark.
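For example, pushdown can be disabled for a read with the disablePushdown option. The following is a configuration sketch only — the endpoint, credentials, and table name are placeholders for your deployment:

```python
from pyspark.sql import SparkSession

# Placeholder connection details; adjust for your cluster.
spark = (SparkSession.builder
         .appName("pushdown-example")
         .config("spark.datasource.singlestore.ddlEndpoint", "singlestore-host:3306")
         .getOrCreate())

df = (spark.read
      .format("singlestore")
      .option("disablePushdown", "true")  # evaluate the query plan in Spark instead
      .load("example_db.example_table"))  # placeholder database.table
```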

SQL Pushdown is either enabled or disabled on the entire Spark Session. If you want to run multiple queries in parallel with different disablePushdown values, run them on separate Spark Sessions.
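One way to do this, sketched below assuming an existing SparkSession named `spark`, is to derive isolated sessions with newSession() and give each its own disablePushdown value:

```python
# Each derived session is independent, so reads issued through them can run
# in parallel with different pushdown settings.
session_pushdown = spark.newSession()
session_no_pushdown = spark.newSession()

df_pushed = (session_pushdown.read.format("singlestore")
             .option("disablePushdown", "false")
             .load("example_db.example_table"))   # placeholder table

df_spark_side = (session_no_pushdown.read.format("singlestore")
                 .option("disablePushdown", "true")
                 .load("example_db.example_table"))
```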

We currently support most of the primary Logical Plan nodes in Spark SQL including Project, Filter, Aggregate, Window, Join, Limit, and Sort.

We also support most Spark SQL expressions. A full list of supported operators/functions can be found in the ExpressionGen.scala file.

The best place to look for examples of fully supported queries is in the tests. Check out this file as a starting point: SQLPushdownTest.scala.

SQL Pushdown Incompatibilities

  • ToUnixTimestamp and UnixTimestamp handle only timestamps earlier than 2038-01-19 03:14:08 when the first argument is of DateType or TimestampType.

  • FromUnixTime with the default format (yyyy-MM-dd HH:mm:ss) handles only timestamp values less than 2147483648 (2^31).

  • DecimalType is truncated on overflow (Spark, by contrast, either throws an exception or returns null, depending on configuration).

  • greatest and least return null if at least one argument is null (in Spark these functions skip nulls).

  • When a value cannot be converted to a numeric or fractional type, SingleStoreDB returns 0 (Spark returns null).

  • Atanh(x), for x ∈ (-∞, -1] ∪ [1, ∞), returns null (Spark returns NaN).

  • When a string is cast to a numeric type, SingleStoreDB parses any numeric prefix of the string (Spark returns null unless the whole string is numeric).

  • When a numeric type is cast to a smaller one (in size), SingleStoreDB saturates it to the target type's range. For example, 500 cast to ByteType yields 127.

    Note: The Spark optimizer may fold casts of literal values, in which case the result for literals follows Spark's own behavior.

  • When a fractional type is cast to an integral type, SingleStoreDB rounds it to the nearest integer.

  • Log returns null instead of NaN, Infinity, or -Infinity.

  • Round rounds down when the digit after the rounding position is 5 and the value is DOUBLE or FLOAT (DECIMAL values are rounded up).

  • Conv works differently if the number contains non-alphanumeric characters.

  • ShiftLeft, ShiftRight, and ShiftRightUnsigned convert the value to an UNSIGNED BIGINT and then perform the shift. On overflow, they return 0 (e.g., 1 << 64 = 0 and 10 >> 20 = 0).

  • BitwiseGet returns 0 when the bit position is negative or exceeds the bit upper limit.

  • Initcap treats a letter as the start of a new word even when it follows quotation marks, brackets, etc. For example, "dear sir/madam (miss)" is converted to "Dear Sir/Madam (Miss)".

  • In Spark 3.0, Skewness(x) returns null instead of NaN when STD(x) = 0.
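Two of the boundary values above can be made concrete with a small sketch in plain Python (no database required) that emulates the described behavior — the helper function below is hypothetical, written only to mirror the UNSIGNED BIGINT shift semantics:

```python
from datetime import datetime, timezone

# FromUnixTime's default-format limit: 2^31 - 1 seconds is the last value
# handled, which corresponds to 2038-01-19 03:14:07 UTC.
limit = 2**31 - 1
print(datetime.fromtimestamp(limit, tz=timezone.utc))  # 2038-01-19 03:14:07+00:00

def singlestore_shift_left(value: int, bits: int) -> int:
    """Emulate a shift on an UNSIGNED BIGINT: bits shifted past bit 63 are lost."""
    return (value << bits) & 0xFFFFFFFFFFFFFFFF

print(singlestore_shift_left(1, 64))  # 0 -- the set bit overflows the 64-bit value
print(10 >> 20)                       # 0 -- every bit is shifted out
```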