8.0 Release Notes
Note
-
To deploy a SingleStore 8.0 cluster, refer to the Deploy SingleStore Guide.
-
To make a backup of a database in this release or to restore a database backup to this release, follow this guide.
-
The data_conversion_compatibility_level engine variable now defaults to '8.0' in new installations. This results in stricter data type conversions. The value is not changed when upgrading to version 8.0. This new data_conversion_compatibility_level setting additionally flags invalid string-to-number conversion in INSERT statements. Applications will likely see more compatibility issues flagged when run against installations with data_conversion_compatibility_level set to '8.0' than when run with a lower compatibility level. SingleStore now supports in-place change of the data_conversion_compatibility_level for persisted computed column shard keys for versions 8.0.24 and newer. We still suggest testing any existing applications before deploying a change to a production environment.
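As a minimal sketch of the stricter behavior described in the note above, the statements below inspect the current level and attempt an invalid string-to-number conversion. The table conv_test and its columns are hypothetical names used only for illustration, and the exact error text is not shown because it may vary by version.

```sql
-- Inspect the level currently in effect.
SELECT @@data_conversion_compatibility_level;

-- Hypothetical table, used only for this illustration.
CREATE TABLE conv_test (id INT, qty INT);

-- With the level set to '8.0', this invalid string-to-number
-- conversion is expected to be flagged instead of silently coerced.
INSERT INTO conv_test VALUES (1, 'abc');

-- A valid numeric string should still convert.
INSERT INTO conv_test VALUES (2, '42');
```

Running the same statements on a test cluster with a lower compatibility level is one way to see which application statements would start failing after a change.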
Release Highlights
Note
This is the complete list of new features and fixes in engine version 8.0.
Code Engine - Powered by Wasm
The Code Engine feature allows you to create UDFs/TVFs using code compiled to WebAssembly (Wasm).
For more information, see Code Engine - Powered by Wasm.
Improved Seekability in Universal Storage Tables
These enhancements will deliver large performance gains for transactional workloads on universal storage tables.
-
Added support for fast seeking into JSON columns in a universal storage table using subsegment access.
-
Improved seek performance for string data types for universal storage for LZ4 and run-length encoded (RLE) data.
Recursive Common Table Expressions
Recursive common table expressions (CTE) are now supported by SingleStore.
For more information, see WITH (Common Table Expressions).
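A small illustration of the recursive form, assuming the standard WITH RECURSIVE syntax with a base branch and a recursive branch joined by UNION ALL. The CTE name nums and column n are hypothetical.

```sql
-- Base branch produces 1; the recursive branch adds 1 until n reaches 5.
WITH RECURSIVE nums AS (
  SELECT 1 AS n
  UNION ALL
  SELECT n + 1 FROM nums WHERE n < 5
)
SELECT n FROM nums ORDER BY n;
```

The query returns the integers 1 through 5. The same shape is typically used to walk hierarchies such as parent/child tables.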
Initial IPv6 Support
Preview feature: Added initial support for IPv6 to the SingleStore engine (memsqld) and SingleStore Toolbox. See the allow_ and bind_ engine variables for more information.
Other Improvements and Fixes
Performance Enhancements
-
Added the ability to cache histogram results during optimization to reduce the work performed by the histograms. (8.0.14)
-
Improved the parsing performance of queries that contain several tables. (8.0.14)
-
Improved the performance of S3 pipelines when Garbage Collection (GC) is enabled. (8.0.14)
-
Improved the query execution performance of JSON columns under a higher level of parallelism. (8.0.10)
-
Improved the performance on columnstore scans that perform multiple JSON extraction operations on the same JSON column. (8.0.8)
-
Improved the performance of various commands (SHOW commands, DDL, etc.) when there are very many views or tables in the database (hundreds of thousands). (8.0.6)
-
Improved performance of comparing utf8mb4 strings. (8.0.5)
-
Improved performance for user-defined functions (UDFs) and stored procedures that take JSON arguments, and for the JSON_TO_ARRAY command.
-
Decreased the memory overhead for columnstore cardinality statistics by 25% as the first phase of an overall project to improve memory for auto-stats in general.
-
Improved the PROFILE functionality: lower memory overhead, lower performance impact on OLAP queries, and better statistics collection.
Query Optimization enhancements:
-
Moved sub-queries for some outer joins from the ON clause to a WHERE clause to enable subselects to be rewritten as joins.
-
Enabled repartition on expressions.
-
Added ability to use GROUP BY push down for outer joins.
-
Enhanced column pruning by eliminating derived duplicate columns.
-
Removed redundant GROUP BY clauses that are implied by equi-joins.
-
Added sampling (a small portion of the rows in the table are used for analysis) for Reference tables as part of query optimization.
-
Added support for improved segment elimination in queries with WHERE clauses containing DATE and TIME functions. The functions that are supported for segment elimination are DATE, DATE_TRUNC, TIMESTAMP, UNIX_TIMESTAMP, and YEAR. See each topic for specific examples.
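The segment elimination item above can be pictured with filters like the following. This is only a sketch: the events table, its columns, and the sort key choice are hypothetical, and whether segments are actually skipped depends on how the data is laid out.

```sql
-- Hypothetical columnstore table sorted on a DATETIME column.
CREATE TABLE events (
  id BIGINT,
  created_at DATETIME,
  amount INT,
  SORT KEY (created_at)
);

-- Filters that wrap the sort column in supported functions.
SELECT COUNT(*) FROM events WHERE DATE(created_at) = '2023-06-01';
SELECT SUM(amount) FROM events WHERE YEAR(created_at) = 2023;
```

Running these queries under PROFILE is one way to check how many segments were scanned versus skipped.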
Selectivity Estimation improvements:
-
Enabled sampling for reference tables.
-
Improved date/time histogram estimates by utilizing a heuristic when the current date/time is outside of the histogram range.
-
Added selectivity estimation for filters containing uncorrelated scalar subselects.
This behavior can be controlled by the engine variable exclude_scalar_subselects_from_filters. This change has the side-effect of enabling bloom filters more often.
-
Changed the estimation source to heuristics when sampling is turned on but the total sampled rows are zero.
-
Added ability to use histogram estimation for filtering predicates that use a stored procedure parameter.
-
Increased the default value for engine variable optimizer_cross_join_cost to reduce the chance of Cartesian Joins being included when there are incorrect estimations.
-
Improved the GROUP BY cardinality estimates for predicates using OR expressions.
-
Enabled ability to combine histogram and sampling selectivity estimates by default.
New Information Schema Views and Columns
-
Added an information schema table (JSON_COLUMN_SCHEMA) which shows the schema inferred for JSON columns in columnstore tables. (8.0.15)
-
Added the DATETIME_PRECISION column to both PARAMETER and ROUTINES information_schema views. Also, the DATETIME_PRECISION column will include TIME and TIMESTAMP data types in the COLUMNS information_schema view. (8.0.9)
-
Added a new information_schema view named LMV_LOCAL_DATABASES. This view shows the state of local databases like SHOW DATABASES EXTENDED, but unlike SHOW commands, it can be queried. (8.0.9)
-
Added CREATE_TIME and ALTER_TIME columns to information_schema.pipelines. CREATE_TIME shows the date/time a pipeline was created or recreated. ALTER_TIME shows the date/time a pipeline was altered via an ALTER PIPELINE statement. (8.0.6)
-
Added a new information schema view internal_table_statistics which shows memory use of SingleStore internal metadata tables. The columns displayed are the same as those shown for table_statistics.
-
Added several Replication Management views.
-
Added the MV_RECOVERY_STATUS view which includes information about the status of the current recovery process.
-
Added the AVERAGE_DISK_SPILLING_USE column to the MV_PLANCACHE information schema table. It shows the average amount of data (in bytes) spilled to disk during query execution.
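Because these are ordinary information schema views, they can be filtered and joined like any table. The queries below are a sketch that uses the view and column names listed above; PIPELINE_NAME is assumed to be an existing column of the pipelines view.

```sql
-- State of local databases, queryable unlike SHOW DATABASES EXTENDED.
SELECT * FROM information_schema.LMV_LOCAL_DATABASES;

-- When each pipeline was created and last altered.
SELECT PIPELINE_NAME, CREATE_TIME, ALTER_TIME
FROM information_schema.PIPELINES
ORDER BY ALTER_TIME DESC;

-- Status of the current recovery process.
SELECT * FROM information_schema.MV_RECOVERY_STATUS;
```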
New Commands and Functions
-
Added the JSON_INCLUDE/EXCLUDE_MASK functions. When applied to a JSON document, they return a subset of the document based on the mask. (8.0.18)
JSON_EXCLUDE_MASK(<json>,<mask>);
JSON_INCLUDE_MASK(<json>,<mask>);
-
Added syntax to allow multiple leaf nodes to be detached using the DETACH LEAF command. (8.0.14)
-
New feature: ORDER BY SELF JOIN, which creates a self join on ORDER BY LIMIT queries to take advantage of differences in bandwidth. (8.0.13)
-
The ORDER BY ALL [DESC|ASC] (or ORDER BY *) syntax is now supported. (8.0.10)
-
The GROUP BY ALL [DESC|ASC] (or GROUP BY *) syntax is now supported. (8.0.10)
-
Added the REVERSE() built-in string function that reverses the target string. (8.0.9)
-
Added a new OPTIMIZE TABLE <table_name> INDEX; command for columnstore tables. This command runs the optimization routine for columnstore secondary indexes manually. (8.0.7)
-
The SHOW STATUS EXTENDED command contains a new "Gv_clock" key whose value is the current logical clock of the server. (8.0.5)
-
The SHOW DATABASE STATUS command contains a new "gv_clock" key whose value is the current logical clock of the server. (8.0.5)
-
Added support to PROFILE for hash join spilling. (8.0.5)
-
Improved the accuracy of network_time in PROFILE output for some query shapes. (8.0.5)
-
Added the ability to run SHOW GRANTS within stored procedures.
-
Added ALTER USER ... ACCOUNT LOCK to manually lock accounts:
ALTER USER 'test'@'%' ACCOUNT LOCK;
ALTER USER 'test'@'%' ACCOUNT UNLOCK;
-
Added support for encoded GROUP BY clauses in queries containing conditional and character expressions in aggregate functions.
-
Updated the supported syntax for DROP … FROM PLANCACHE so plans on a specified node and plans from all aggregators based on the query text can be dropped.
DROP plan_id FROM PLANCACHE ON NODE node_id;
DROP PLAN FROM PLANCACHE [ON AGGREGATORS] FOR QUERY <query_text>;
-
Added the optional parameter DEFINER for CREATE PROCEDURE, FUNCTION, and AGGREGATE.
-
Added ability to use the ORDER BY clause with the JSON_AGG function.
-
The BACKUP command no longer blocks the ALTER TABLE and several other commands for the duration of the backup. This allows you to run commands like TRUNCATE on your tables even during the backup of a very large deployment. For a complete list of commands no longer blocked during backup, refer to Lock-free Backups.
-
Added support for the AUTO option in the computed column definition clause of a CREATE TABLE statement to automatically infer the data type of a computed column expression. For more information, see CREATE TABLE.
-
Added the ability to use JSON_MATCH_<ANY>. Returns true if, in the JSON, there is a value at the specified filter path which evaluates the optional filter predicate as true. If no filter predicate is provided, will return true if the filter path exists.
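A few of the additions above can be tried together on a scratch table. This is a sketch only: the sales table and its columns are hypothetical, and the comments describe the expected behavior of ORDER BY ALL and GROUP BY ALL as read from the items above rather than verified output.

```sql
-- Hypothetical table used only for illustration.
CREATE TABLE sales (region VARCHAR(20), product VARCHAR(20), amount INT);

-- GROUP BY ALL: expected to group by every non-aggregate column
-- in the select list (region, product).
SELECT region, product, SUM(amount) AS total
FROM sales
GROUP BY ALL;

-- ORDER BY ALL: expected to sort by every select-list column, descending.
SELECT region, product FROM sales ORDER BY ALL DESC;

-- REVERSE() reverses the target string.
SELECT REVERSE('abc');

-- Manually run the secondary index optimization routine
-- (applies to columnstore tables).
OPTIMIZE TABLE sales INDEX;
```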
Engine Variables
-
Added new session variable disable_remove_redundant_gby_rewrite to prevent the GROUP BY columns from being removed when used in an ORDER BY clause. (8.0.14)
-
Introduced a new global variable subprocess_max_retries, which is used for retrying on retry-able connection failures during select into/backup queries for S3 and GCS. (8.0.13)
-
Fixed the case where the REGEXP_REPLACE and REGEXP_SUBSTR expressions can produce non-utf8 strings by introducing the new engine variable regexp_output_validation_mode. Regular expression built-ins can produce non-utf8 strings because they don't have full support for multi-byte characters. The engine variable controls this behavior if regular expression built-ins return strings that are invalid under its collation settings. (8.0.8)
-
Added the ignore_foreign_keys system variable, which allows foreign key syntax in CREATE TABLE commands, but completely ignores the key (it will not show up in metadata). (8.0.7)
-
Added a new option, SERVER_V2, to the json_extract_string_collation engine variable. This new, recommended option is the default for new clusters, and allows comparison of utf8mb4 strings extracted from JSON to utf8 string constants. Existing clusters will retain their original setting upon upgrade. (8.0.7)
-
Added the skip_segelim_with_inlist_threshold engine variable, which controls when segment elimination will not use an IN list that is too large (default 1000 elements). (8.0.7)
-
Added a new global variable, maximum_blob_cache_size_percent, which can set the blob cache size as a value from 0 to 1 that is the percentage of local disk the blob cache is allowed to use. The default value is 0. (8.0.5)
-
Added a new global variable, num_background_merger_threads, which controls the number of background merger threads to start for each node. The default value is 2. (8.0.5)
-
Setting Collation for String Literals
You can set the collation for string literals explicitly:
SELECT "My string" COLLATE utf8mb4_unicode_ci;
-
Added two Workload Management engine variables: workload_management_queue_size_allow_upgrade and workload_management_dynamic_resource_allocation. These variables work together to dynamically move queries to another queue if the original queue is saturated.
-
The columnstore_small_blob_combination_threshold engine variable default value has been changed to 5242880 bytes. Prior to the 8.0 release, the default value was 33554432 bytes.
-
Storage of CHAR(<length>) as VARCHAR(<length>): For a column defined as type CHAR of length len, SingleStore will store the column as a VARCHAR of length len if len is greater than or equal to the value of the new engine variable varchar_column_string_optimization_length. If the value of the variable is 0, the column is not stored as a VARCHAR.
-
The sync_permissions engine variable default value is now ON. The default value only impacts newly installed clusters. On existing clusters, the variable must be updated manually.
-
The data_conversion_compatibility_level engine variable can now be set to '8.0' for stricter data type conversions. This will now be the default value. This new data_conversion_compatibility_level setting additionally flags invalid string-to-number conversion in INSERT statements.
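Since several defaults above apply only to new installations, it can be useful to check what an upgraded cluster is actually running with. The statements below are a minimal sketch that only reads values, using variable names from the list above. The final SET GLOBAL is an assumption about how the variable is changed and should be confirmed against the engine variable reference before use.

```sql
-- Read the values currently in effect.
SHOW VARIABLES LIKE 'sync_permissions';
SHOW VARIABLES LIKE 'columnstore_small_blob_combination_threshold';
SHOW VARIABLES LIKE 'varchar_column_string_optimization_length';
SELECT @@data_conversion_compatibility_level;

-- Assumption: the compatibility level is changed with SET GLOBAL.
-- Test applications before applying this in production.
SET GLOBAL data_conversion_compatibility_level = '8.0';
```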
Miscellaneous
-
Improvements to memsql_exporter: Improved error handling and reduced memory usage. (8.0.18)
-
Updated timezone metadata to include Mexico's latest timezone change. (8.0.17)
-
Added support for LOAD DATA from S3 for Avro and Parquet data. (8.0.17)
-
Improved column type resolution for base and recursive branches in recursive common table expressions (CTEs). (8.0.15)
-
Added the ability to back up a database to an HTTPS S3 target with an unverified SSL certificate when using the option verify_ssl: false. (8.0.14)
-
Expanded Unicode character support to include Private Use Area (PUA) code points: one area in the Basic Multilingual Plane (U+E000–U+F8FF) and one in each of planes 15 and 16 (U+F0000–U+FFFFD, U+100000–U+10FFFD). (8.0.13)
-
Added the /api/v2/jwks_setup endpoint to Data API to allow users to enable JWT Auth in Data API on Cloud. See jwks_setup for more information.
-
Added the option to authenticate Data API requests using JWT.
-
Created the ALTER USER permission. Users must have this permission or the GRANT permission to be able to execute the ALTER USER command.
-
Expressions can be assigned to system variables. System variables, literals, or any combination of these can be referenced using built-ins like CONCAT as a variant of complex expressions.
-
Added support for ? and [ ] glob patterns to FS pipelines.
-
Added ability for a JSON computed column to be returned in a query instead of the entire document.
-
Expanded the set of query execution operations (hash joins, window functions, and sort operations) that can offload memory to disk using spilling. In a memory-constrained environment, this allows a query with a large memory footprint to succeed at the cost of longer query execution time.
-
Flexible Parallelism is enabled by default.
All new clusters created will have Flexible Parallelism enabled. This does not apply to updated clusters and restored databases, which will retain their original settings. This change in behavior could impact applications that create databases or clusters.
To retain the old behavior (Flexible Parallelism not enabled when creating clusters/databases), disable Flexible Parallelism prior to object creation via the sub_to_physical_partition_ratio engine variable. To use Flexible Parallelism, it must be enabled prior to database creation.
Subselect related lockdown messages are now more informative and they indicate the line number and character offset of the subselect that caused the error.
In addition, up to 100 bytes of text from the beginning of the referred subselect is also displayed.
SELECT (SELECT DISTINCT t1.a FROM t ORDER BY a) FROM t t1;
Old output: "Feature 'subselect containing dependent field inside group by' is not supported by SingleStore."
New output: "Feature 'subselect containing dependent field inside group by' is not supported by SingleStore. Near '(SELECT DISTINCT t1.a FROM t ORDER BY a) FROM t t1' at line 1, character 7."
-
After adding a new leaf node to a cluster and rebalancing partitions, the blob cache on the new leaf is warmed with copies of blobs from the node originally holding data that has been moved to the new leaf.
This is done before the new leaf begins handling queries. It is fully automatic.
-
For unlimited storage databases, SingleStore caches data from remote storage on local disks or SSDs.
It uses a modified least-recently-used (LRU(2)) replacement policy. Information is retained to indicate whether objects are frequently accessed. This reduces the chance that a single large query will flush frequently accessed data from the cache.
-
Fixed an issue where REGEXP and RLIKE were case-insensitive on binary collations (for example, utf8_bin). They are now case-sensitive for binary collations.
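Two of the behavior changes in this list lend themselves to a quick check on a test cluster. The sketch below assumes that setting sub_to_physical_partition_ratio to 0 is how Flexible Parallelism is disabled and that the variable is set with SET GLOBAL. The database name legacy_db is hypothetical.

```sql
-- Assumption: a ratio of 0 disables Flexible Parallelism for
-- databases created afterwards.
SET GLOBAL sub_to_physical_partition_ratio = 0;
CREATE DATABASE legacy_db;

-- With a binary collation, REGEXP and RLIKE are now case-sensitive,
-- so this comparison is expected to no longer match.
SELECT 'ABC' COLLATE utf8_bin RLIKE 'abc' AS matches;
```

Because the setting is read at database creation time, changing it later does not alter databases that already exist.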
Last modified: September 24, 2024