SQL Pushdown
The singlestore-spark-connector has extensive support for rewriting Spark SQL and DataFrame operation query plans into standalone SingleStoreDB queries. This allows most of the computation to be pushed into the SingleStoreDB distributed system without any manual intervention. The SQL rewrites are enabled automatically, but they can be disabled using the disablePushdown option. SingleStoreDB also supports partial pushdown, where certain parts of a query are evaluated in SingleStoreDB and the remaining parts are evaluated in Spark.
SQL Pushdown is enabled or disabled for the entire Spark session. If you want to run multiple queries in parallel with different disablePushdown values, run them on separate Spark sessions.
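The disablePushdown option itself comes from the connector, but the cluster details, table name, and option-prefix spelling below are illustrative assumptions — verify them against your connector version. A minimal sketch of disabling pushdown for a session:

```python
from pyspark.sql import SparkSession

# Hypothetical connection details; replace with your own cluster and table.
spark = (
    SparkSession.builder
    .appName("pushdown-example")
    # disablePushdown applies to the whole session; the config-key prefix
    # here is an assumption -- check it against your connector version.
    .config("spark.datasource.singlestore.disablePushdown", "true")
    .getOrCreate()
)

df = spark.read.format("singlestore").load("db.example_table")
```

Because pushdown is a session-level setting, queries that need different disablePushdown values must run on separate SparkSession instances.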
We currently support most of the primary Logical Plan nodes in Spark SQL, including Project, Filter, Aggregate, Window, Join, Limit, and Sort.
We also support most Spark SQL expressions. A full list of supported operators/functions can be found in the ExpressionGen.scala file.
The best place to look for examples of fully supported queries is in the tests. Check out this file as a starting point: SQLPushdownTest.scala.
SQL Pushdown Incompatibilities
- ToUnixTimestamp and UnixTimestamp handle only times before 2038-01-19 03:14:08 if they receive a DateType or TimestampType as the first argument.
- FromUnixTime with the default format (yyyy-MM-dd HH:mm:ss) handles only times less than 2147483648 (2^31).
- DecimalType is truncated on overflow (by default, Spark either throws an exception or returns null).
- greatest and least return null if at least one argument is null (in Spark, these functions skip nulls).
- When a value cannot be converted to a numeric or fractional type, SingleStoreDB returns 0 (Spark returns null).
- Atanh(x), for x ∈ (-∞, -1] ∪ [1, ∞), returns null (Spark returns NaN).
- When a string is cast to a numeric type, SingleStoreDB accepts its numeric prefix (Spark returns null if the whole string is not numeric).
- When a numeric type is cast to a smaller one (in size), SingleStoreDB truncates it. For example, 500 cast to Byte will be 127. Note: the Spark optimizer can evaluate casts of literals at planning time, in which case the behaviour for literals matches Spark's own behaviour.
- When a fractional type is cast to an integral type, SingleStoreDB rounds it to the nearest integer.
- Log returns null instead of NaN, Infinity, or -Infinity.
- Round rounds down if the number to be rounded is followed by 5 and it is a DOUBLE or FLOAT (DECIMAL is rounded up).
- Conv works differently if the number contains non-alphanumeric characters.
- ShiftLeft, ShiftRight, and ShiftRightUnsigned convert the value to an UNSIGNED BIGINT and then perform the shift. In case of an overflow, they return 0 (1 << 64 = 0 and 10 >> 20 = 0).
- BitwiseGet returns 0 when the bit position is negative or exceeds the bit upper limit.
- Initcap treats a letter as the beginning of a word even if it is enclosed in quotation marks, brackets, etc. For example, "dear sir/madam (miss)" is converted to "Dear Sir/Madam (Miss)".
- Skewness(x), in Spark 3.0, returns null instead of NaN when STD(x) = 0.
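The timestamp limits in the first two items are two views of the same boundary: 2147483648 (2^31) seconds after the Unix epoch is the classic signed 32-bit "Year 2038" cutoff. Plain Python (independent of the connector) can confirm the arithmetic:

```python
from datetime import datetime, timezone

# 2**31 seconds after 1970-01-01 00:00:00 UTC is the signed 32-bit
# overflow point -- the same boundary the pushdown limits describe.
limit = 2**31
boundary = datetime.fromtimestamp(limit, tz=timezone.utc)
print(boundary)  # 2038-01-19 03:14:08+00:00
```

Timestamps at or beyond this boundary are the ones the pushed-down ToUnixTimestamp, UnixTimestamp, and FromUnixTime expressions do not handle.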
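The narrowing-cast difference can be sketched in plain Python. The helper names are hypothetical, and the wrap-around model of Spark's cast is an assumption based on its non-ANSI behaviour (500 cast to a byte wraps to -12), while SingleStoreDB clamps to the target range as the example above states:

```python
def spark_like_byte_cast(n: int) -> int:
    # Assumed model of Spark's non-ANSI cast: keep the low 8 bits and
    # reinterpret them as a signed byte (wrap-around).
    low = n & 0xFF
    return low - 256 if low > 127 else low

def singlestore_like_byte_cast(n: int) -> int:
    # SingleStoreDB clamps to the byte range, so 500 becomes 127.
    return max(-128, min(127, n))

print(spark_like_byte_cast(500))        # -12
print(singlestore_like_byte_cast(500))  # 127
```

The "Note" above applies here: if the cast operand is a literal, Spark may fold the cast before pushdown, so the result follows Spark's semantics rather than SingleStoreDB's.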
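The shift-overflow behaviour can also be emulated outside the database: shifting within an UNSIGNED BIGINT means any bits pushed past position 63 are simply lost. A small sketch (the function name is illustrative, not part of the connector):

```python
U64_MASK = (1 << 64) - 1  # value range of an UNSIGNED BIGINT

def shift_left_u64(value: int, bits: int) -> int:
    # Emulates a shift performed after conversion to UNSIGNED BIGINT:
    # bits shifted past position 63 are dropped, so 1 << 64 yields 0.
    return (value << bits) & U64_MASK

print(shift_left_u64(1, 64))  # 0
print(shift_left_u64(10, 2))  # 40
```

This matches the overflow examples above: 1 << 64 wraps to 0 in 64-bit unsigned arithmetic, and 10 >> 20 is 0 because all set bits are shifted out.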