8.0 Release Notes
Note
To deploy a SingleStoreDB 8.0 cluster, refer to the Deploy SingleStoreDB Guide.
To make a backup of a database in this release or to restore a database backup to this release, follow this guide.
The data_conversion_compatibility_level engine variable now defaults to
'8.0'
in new installations. This results in stricter data type conversions. The value is not changed when upgrading to version 8.0. This newdata_conversion_compatibility_level
setting additionally flags invalid string-to-number conversion inINSERT
statements.Applications will likely see more compatibility issues flagged when run against installations with
data_conversion_compatibility_level
set to'8.0'
than when run with a lower compatibility level.
Release Highlights
Note
This is the complete list of new features and fixes in engine version 8.0 and its maintenance patches. For a list of all new features and bug fixes added in maintenance releases only, see the maintenance release changelog. If you are upgrading from an existing 8.0 version, see the changelog for a list of new features and fixes added to 8.0 by date.
Code Engine - Powered by Wasm
The Code Engine feature allows you to create UDFs/TVFs using code compiled to WebAssembly (Wasm). This feature supports any language that can compile to the Wasm core specification.
For more information, see Code Engine - Powered by Wasm.
Improved Seekability in Universal Storage Tables
These enhancements will deliver large performance gains for transactional workloads on universal storage tables.
Added support for fast seeking into JSON columns in a universal storage table using subsegment access.
Improved seek performance for string data types for universal storage for LZ4 and run-length encoded (RLE) data.
Recursive Common Table Expressions
Recursive common table expressions (CTE) are now supported by SingleStoreDB. Previously, complex operations including temporary tables within a stored procedure would be needed to perform the actions that a simple recursive CTE query can handle.
For more information, see WITH (Common Table Expressions).
Initial IPv6 Support
Preview feature: Added initial support for IPv6 to the SingleStoreDB engine (memsqld
) and SingleStoreDB Toolbox. Refer to the allow_ipv6
and bind_address
engine variables for more information.
Other Improvements and Fixes
Performance Enhancements
Added the ability to cache histogram results during optimization to reduce the work performed by the histograms. (8.0.14)
Improved the parsing performance of queries that contain several tables. (8.0.14)
Improved the performance of S3 pipelines when Garbage Collection (GC) is enabled. (8.0.14)
Improved the query execution performance of JSON columns under a higher level of parallelism. (8.0.10)
Improved the performance on columnstore scans that perform multiple JSON extraction operations on the same JSON column. (8.0.8)
Improved the performance of various commands (
SHOW
commands, DDL, etc.) when there are very many views or tables in the database (100s of thousands). (8.0.6)Improved performance of comparing utf8mb4 strings. (8.0.5)
Improved performance for user-defined functions (UDFs) and Stored Procedures that take JSON arguments, and the
JSON_TO_ARRAY
command.Decreased the memory overhead for columnstore cardinality statistics by 25% as the first phase of an overall project to improve memory for auto-stats in general.
Improved the performance of the
PROFILE
functionality such as lower memory overheads, lower performance impacts to OLAP queries, and better statistics collecting.
Query Optimization enhancements:
Moved sub-queries for some outer joins from the
ON
clause to aWHERE
clause to enable subselects to be rewritten as joins.Enabled repartition on expressions.
Added ability to use
GROUP BY
push down for outer joins.Enhanced column pruning by eliminating derived duplicate columns.
Removed redundant
GROUP BY
clauses that are implied by equi-joins.Added sampling (a small portion of the rows in the table are used for analysis) for Reference tables as part of query optimization.
Added support for improved segment elimination in queries with
WHERE
clauses containingDATE
andTIME
functions. The functions that are supported for segment elimination are DATE, DATE_TRUNC, TIMESTAMP, UNIX_TIMESTAMP, and YEAR. See each topic for specific examples.
Selectivity Estimation improvements:
Enabled sampling for reference tables.
Improved date/time histogram estimates by utilizing a heuristic when the current date/time is outside of the histogram range.
Added selectivity estimation for filters containing uncorrelated scalar subselects. This behavior can be controlled by the engine variable
exclude_scalar_subselects_from_filters
. This change has the side-effect of enabling bloom filters more often.Changed the estimation source to heuristics when sampling is turned on but the total sampled rows are zero.
Added ability to use histogram estimation for filtering predicates that use a stored procedure parameter.
Increased the default value for engine variable
optimizer_cross_join_cost
to reduce the chance of Cartesian Joins being included when there are incorrect estimations.Improved the
GROUP BY
cardinality estimates for predicates usingOR
expressions.Enabled ability to combine histogram and sampling selectivity estimates by default.
New Information Schema Views and Columns
Add an information schema table (
JSON_COLUMN_SCHEMA
) which shows the schema inferred for JSON columns in columnstore tables. (8.0.15)Added the
DATETIME_PRECISION
column to bothPARAMETER
andROUTINES
information_schema views. Also, theDATETIME_PRECISION
column will includeTIME
andTIMESTAMP
data types in theCOLUMNS
information_schema view. (8.0.9)Added a new information_schema view named
LMV_LOCAL_DATABASES
. This view shows the state of local databases likeSHOW DATABASES EXTENDED
, but it can be queried against unlike show commands. (8.0.9)Added
CREATE_TIME
andALTER_TIME
columns to information_schema.pipelines.CREATE_TIME
shows the date/time a pipeline was created or recreated.ALTER_TIME
shows the date/time a pipeline was altered via anALTER PIPELINE
statement. (8.0.6)Added a new information schema view
internal_table_statistics
which shows memory use of SingleStoreDB internal metadata tables. The columns displayed are the same as those shown fortable_statistics
.Added several Replication Management views.
Added the MV_RECOVERY_STATUS view which includes information about the status of the current recovery process.
Added the
AVERAGE_DISK_SPILLING_USE
column to the MV_PLANCACHE information schema table. It shows the average amount of data (in bytes) spilt to disk during query execution.
New Commands and Functions
Added
JSON_INCLUDE/EXCLUDE_MASK
function when applied to a JSON document; it will return a subset of the document based on the mask. (8.0.18)JSON_EXCLUDE_MASK(<json>,<mask>);
JSON_INCLUDE_MASK(<json>,<mask>);
Added syntax to allow multiple leaf nodes to be detached using the
DETACH LEAF
command. (8.0.14)New feature:
ORDER BY SELF JOIN
, it creates a self join on ORDER BY LIMIT queries to take advantage of differences in bandwidth. (8.0.13)The
ORDER BY ALL [DESC|ASC]
(orORDER BY *
) syntax is now supported. (8.0.10)The
GROUP BY ALL [DESC|ASC]
(orGROUP BY *
) syntax is now supported. (8.0.10)Added the
REVERSE()
built-in string function that reverses the target string. (8.0.9)Added a new
OPTIMIZE TABLE <table_name> INDEX;
command for columnstore tables. This command runs the optimization routine for columnstore secondary indexes manually. (8.0.7)The
SHOW STATUS EXTENDED
command contains a new "Gv_clock
" key whose value is the current logical clock of the server. (8.0.5)The
SHOW DATABASE STATUS
command contains a new "gv_clock
" key whose value is the current logical clock of the server. (8.0.5)Added support to
PROFILE
for hash join spilling. (8.0.5)Improved the accuracy of
network_time
inPROFILE
output for some query shapes. (8.0.5)Added
ALTER USER ... ACCOUNT LOCK
to manually lock accounts:ALTER USER 'test'@'%' ACCOUNT LOCK; ALTER USER 'test'@'%' ACCOUNT UNLOCK;
Added support for encoded
GROUP BY
clauses in queries containing conditional and character expressions in aggregate functions.Updated the supported syntax for DROP … FROM PLANCACHE so plans on a specified node and plans from all aggregators based on the query text can be dropped.
DROP plan_id FROM PLANCACHE ON NODE node_id; DROP PLAN FROM PLANCACHE [ON AGGREGATORS] FOR QUERY <query_text>;
Added the optional parameter
DEFINER
forCREATE PROCEDURE
,FUNCTION
, andAGGREGATE
.Added ability to use use the
ORDER BY
clause with theJSON_AGG
function.The
BACKUP
command no longer blocks theALTER TABLE
and several other commands for the duration of the backup. This allows you to run commands likeTRUNCATE
on your tables even during the backup of a very large deployment. For a complete list of commands no longer blocked during backup refer to Commands Not Blocked by the BACKUP Process.Added support for the
AUTO
option in the computed column definition clause of aCREATE TABLE
statement to automatically infer the data type of a computed column expression. For more information, see CREATE TABLE.Added the ability to use
JSON_MATCH_<ANY>
. Returns true if, in the JSON, there is a value at the specified filter path which evaluates the optional filter predicate as true. If no filter predicate is provided, will return true if the filter path exists.
Engine Variables
Added new session variable
disable_remove_redundant_gby_rewrite
to prevent theGROUP BY
columns from being removed when used in anORDER BY
clause. (8.0.14)Introduced a new global variable
subprocess_max_retries
, which is used for retrying on retry-able connection failures during select into/backup queries for S3 and GCS. (8.0.13)Fixed the case where the
REGEXP_REPLACE
andREGEXP_SUBSTR
expressions can produce non-utf8 strings by introducing the new engine variableregexp_output_validation_mode
. Regular expression built-ins can produce non-utf8 strings because they don't have full support for multi-byte characters. The engine variable controls this behavior if regular expression built-ins return strings that are invalid under its collation settings. (8.0.8)Added the
ignore_foreign_keys
system variable, which allows foreign key syntax inCREATE TABLE
commands, but completely ignores the key (it will not show up in metadata). (8.0.7)Added a new option,
SERVER_V2
, to thejson_extract_string_collation
engine variable. This new, recommended option is the default for new clusters, and allows comparison of utf8mb4 strings extracted from JSON to utf8 string constants. Existing clusters will retain their original setting upon upgrade. (8.0.7)Added the
skip_segelim_with_inlist_threshold
engine variable, which controls when segment elimination will not use an IN list that is too large (default 1000 elements). (8.0.7)Added a new global variable,
maximum_blob_cache_size_percent
, which can set the blob cache size as a value from 0 to 1 that is percentage of local disk the blob cache is allowed to use. The default value is 0. (8.0.5)Added a new global variable,
num_background_merger_threads
, which controls the number of background merger threads to start for each node. The default value is 2. (8.0.5)Setting Collation for String Literals
You can set the collation for string literals explicitly:
SELECT "My string" COLLATE utf8mb4_unicode_ci;
Added two Workload Management engine variables:
workload_management_queue_size_allow_upgrade
andworkload_management_dynamic_resource_allocation
. These variables work together to dynamically move queries to another queue if the original queue is saturated.The columnstore_small_blob_combination_threshold engine variable default value has been changed to 5242880 bytes. Prior to the 8.0 release, the default value was 33554432 bytes.
Storage of CHAR(<length>) as VARCHAR(<length>): For a column defined as type
CHAR
of lengthlen
, SingleStoreDB will store the column as aVARCHAR
of lengthlen
iflen
greater than or equal to the value of the new engine variablevarchar_column_string_optimization_length
. If the value of the variable is0
, the column is not stored as aVARCHAR
.The enable_auto_profile engine variable now has a third value:
LITE
.LITE
is the new default value for new customers. It has a lower memory overhead thatON
. The default value for existing customers isON
.The sync_permissions engine variable default value is now
ON
.The default value only impacts newly installed clusters. Existing clusters must be manually updated to the variable.The data_conversion_compatibility_level engine variable can now be set to
'8.0'
for stricter data type conversions. This will now be the default value. This newdata_conversion_compatibility_level
setting additionally flags invalid string-to-number conversion inINSERT
statements.
Miscellaneous
Improvements to memsql_exporter: Improved error handling and reduced memory usage. (8.0.18)
Updated timezone metadata to include Mexico's latest timezone change. (8.0.17)
Added support for LOAD DATA from S3 for Avro and Parquet data. (8.0.17)
Improved column type resolution for base and recursive branches in recursive common table expressions (CTEs). (8.0.15)
Added the ability to backup a database to an HTTPS S3 target with an unverified SSL certificate when using the option:
verify_ssl: false
. (8.0.14)Expanded existing Unicode characters to support Private Use Area (PUA) code points. Including one in the Basic Multilingual Plane (U+E000–U+F8FF) and one in each plane 15 and 16 (U+F0000–U+FFFFD, U+100000–U+10FFFD). (8.0.13)
Added the
/api/v2/jwks_setup
endpoint to Data API to allow users to enable JWT Auth in Data API on Cloud. See jwks_setup for more information.Added the option to authenticate Data API requests using JWT.
Created the
ALTER USER
permission. Users must have this permission or theGRANT
permission to be able to execute theALTER USER
command.Expressions can be assigned to system variables. System variables, literals, or any combination of these can be referenced using built-ins like
CONCAT
as a variant of complex expressions.Added support for ? and [ ] glob patterns to FS pipelines.
Added ability for a JSON computed column to be returned in a query instead of the entire document.
Expanded the type of query execution operations (hash joins, window functions, and sort operations) to offload memory to disk using spilling to allow a large memory footprint query to succeed at the cost of query execution times in a memory constrained environment.
Flexible Parallelism is enabled by default. All new clusters created will have Flexible Parallelism enabled. This does not apply to updated clusters and restored databases, which will retain their original settings.
This change in behavior could impact applications that create databases or clusters. To retain the old behavior (Flexible Parallelism not enabled when creating clusters/databases) disable Flexible Parallelism prior to object creation via the
sub_to_physical_partition_ratio
engine variable. To use Flexible Parallelism, it must be enabled prior to database creation.Subselect related lockdown messages are now more informative and they indicate the line number and character offset of the subselect that caused the error. In addition, up to 100 bytes of text from the beginning of the referred subselect is also displayed.
SELECT (SELECT DISTINCT t1.a FROM t ORDER BY a) FROM t t1; Old output: "Feature 'subselect containing dependent field inside group by' is not supported by SingleStore." New output: "Feature 'subselect containing dependent field inside group by' is not supported by SingleStore. Near '(SELECT DISTINCT t1.a FROM t ORDER BY a) FROM t t1' at line 1, character 7."
After adding a new leaf node to a cluster and rebalancing partitions, the blob cache on the new leaf is warmed with copies of blobs from the node originally holding data that has been moved to the new leaf. This is done before the new leaf begins handling queries. It is fully automatic.
For unlimited storage databases, SingleStoreDB caches data from remote storage on local disks or SSDs. It uses a modified least-recently-used (LRU(2)) replacement policy. Information is retained to indicate if objects are frequently-accessed. This reduces the chance that a single large query will flush frequently-accessed data from the cache.
Fixed an issue where
REGEXP
andRLIKE
were case-insensitive on binary collations (for example, utf8_bin). They are now case-sensitive for binary collations.