8. 1 Release Notes
On this page
Note
-
To deploy a SingleStore 8.
1 cluster, refer to the Deploy SingleStore Guide. -
To upgrade a self-managed install to this release, follow this guide.
-
To make a backup of a database in this release or to restore a database backup to this release, follow this guide.
-
The data_
conversion_ compatibility_ level engine variable now defaults to '8.
in new installations.0' This results in stricter data type conversions. The value is not changed when upgrading to version 8. 0. This new data_
setting additionally flags invalid string-to-number conversion inconversion_ compatibility_ level INSERT
statements,PIPELINES
,LOAD DATA
commands, and type casts.Applications will likely see more compatibility issues flagged when run against installations with
data_
set toconversion_ compatibility_ level '8.
than when run with a lower compatibility level.0' -
New deployments of SingleStore 8.
1 require a 64-bit version of RHEL/AlmaLinux 7 or later, or Debian 8 or later, with kernel 3. 10 or later and glibc 2. 17 or later. Refer to the System Requirements and Recommendations for additional information.
Release Highlights
Note
This is the complete list of new features and fixes in engine version 8.
Column Group Index
Introduced a columnstore row index structure (COLUMN GROUP
) that will create a materialized copy of individual rows as a separate structure in a columnstore table.
Automatic Profiling
Added support for query profiling that is efficient enough to be enabled all the time by introducing the auto_
engine variable with FULL
and LITE
options for automatic profiling.enable_
engine variable must be set to ON
for the FULL
or LITE
options to work.
Visual Explain via SHOW PROFILE UI and EXPLAIN
Using the SHOW PROFILE UI
or EXPLAIN UI
commands will generate a URL that, when opened, loads a visual representation of the query plan on the Visual Explain website.
User ATTRIBUTE and COMMENT Fields
Added ATTRIBUTE
and COMMENT
fields for users.CREATE USER
and ALTER USER
commands.INFORMATION_
view.
CREATE USER 'user1'@'localhost' ATTRIBUTE '{"phone": "1234567890"}';CREATE USER 'user1'@'localhost' COMMENT 'Some information about user1';ALTER USER 'user1'@'localhost' ATTRIBUTE '{"phone": "1234567890"}';ALTER USER 'user1'@'localhost' COMMENT 'some comment about user1';ALTER USER 'user1'@'localhost' ATTRIBUTE '{"phone": "1234567890"}' COMMENT 'some comment about user1';SELECT USER, ATTRIBUTE, COMMENT from INFORMATION_SCHEMA.USERS WHERE user='user1';
+-------------+-------------------------+------------------------------+
| USER | ATTRIBUTE | COMMENT |
+-------------+-------------------------+------------------------------+
| user1 | {"phone": "1234567890"} | some information about user1 |
+-------------+-------------------------+------------------------------+
The value for ATTRIBUTE
must be a valid JSON object.ATTRIBUTE
and COMMENT
in the same SQL statement, ATTRIBUTE
must come first.
Full-Text Search Is Now Supported on utf8mb4 Data
The updated full-text search tokenizer in version 8.
New full-text indexes will use the new tokenizer when created.DROP INDEX
or ALTER TABLE DROP INDEX
) and recreate them via the new ALTER TABLE ADD FULLTEXT statement.
Other than adding proper support for emojis and extended characters, full-text search behavior using the new tokenizer is different from that of the previous tokenizer only in some rare edge cases.
SASL OAUTHBEARER Authentication Mechanism
Added the ability to configure Simple Authentication and Security Layer (SASL) OAUTHBEARER for use with Kafka pipelines.
Trace Log Rotation Function
SingleStore now has a built-in trace log rotation function (added in version 8.logrotate
or similar, you will need to disable that before enabling this new rotation feature.
The built-in trace log rotation feature can be configured to rotate logs based on their size or based on time.
The new trace log rotation feature is disabled by default.
As a part of this new feature, added the following engine variables:
-
tracelog_
- indicates the size threshold in bytesrotation_ size -
tracelog_
- indicates the time threshold in secondsrotation_ time -
tracelog_
- indicates retention period in daysretention_ period
Other Improvements and Fixes
Other Performance Enhancements
-
Enhanced workload management so that a medium-size query can be put in the large-query queue if there are free resources for large queries but the resources for medium-size queries are fully used.
This can improve throughput when there are many medium-size queries and few large ones. See the related discussion of the new variable workload_
.management_ queue_ size_ allow_ upgrade -
Enhanced disk space manageability that allows controlled space sharing between column store blob cache and query spilling.
Supports setting the maximum disk spilling space limit explicitly with a new engine variable spilling_
.maximum_ disk_ percent Additional metrics in the SHOW STATUS EXTENDED output give details about the disk usage: -
Disk_
spilling_ usage_ limit_ mb -
Disk_
spilling_ usage_ mb -
Disk_
spilling_ evict_ blob_ cache_ mb_ average -
Disk_
spilling_ usage_ mb_ high_ water_ mark -
Disk_
total_ size_ mb -
Disk_
available_ size_ mb -
Disk_
uncategorized_ usage_ mb -
Blob_
cache_ max_ size_ mb_ adjusted_ for_ low_ disk -
Blob_
cache_ max_ size_ mb_ adjusted_ for_ low_ disk_ and_ spilling
-
Query Optimization enhancements:
-
Added support to optimize table scans with an Order-By-Limit by (a) do the order and limit first, with only the minimum needed columns, and then (b) using a self-join to retrieve the additional necessary columns for only the qualifying rows.
This optimization can be enabled/disabled via the optimizer_
session variable, which is ON by default.enable_ orderby_ limit_ self_ join -
Added a
METADATA_
argument for shard keys which will prevent an index from being created on shard keys thereby saving memory.ONLY It can cause queries that would have used that index to run slower. -
Reduced the amount of memory used by autostats.
This can be controlled via the autostats_
engine variable.flush_ interval_ secs
New Information Schema Views and Columns
-
Added two Information Schema views for tracking connection attributes: LMV_
CONNECTION_ ATTRIBUTES and MV_ CONNECTION_ ATTRIBUTES. -
A new WORKLOAD_
MANAGEMENT_ QUEUE system table has been added which shows the list of queries for which the WM is not able to reserve resources and therefore has put them into the waiting queue. -
Added two Information Schema views for reporting the breakdown of disk utilization: information_
schema. MV_ DATA_ DISK_ USAGE and information_ schema. MV_ DISK_ USAGE.
New Commands and Functions
-
Added the ability to add and drop FULLTEXT indexes via the
ALTER TABLE
andDROP INDEX
commands.This allows simple addition or removal of full-text search capability without reloading data. A table can only have one FULLTEXT index. For example, with the following table: CREATE TABLE articles (id INT UNSIGNED,year int UNSIGNED,title VARCHAR(200),body TEXT,SORT KEY (id));Add a FULLTEXT index via:
ALTER TABLE articles ADD FULLTEXT (title, body);To remove the index, use either of these commands:
ALTER TABLE articles DROP INDEX title;DROP INDEX title ON articles;You cannot add and drop an index in a single
ALTER TABLE
statement. -
Added the ability to start only failed pipelines via the
START FAILED PIPELINES
command.START FAILED PIPELINES; -
Added the ability to load Avro formatted data from an AWS S3 bucket using LOAD DATA, which can be simpler than using pipelines.
LOAD DATA S3 '<bucket name>'CONFIG '{"region" : "<region_name>"}'CREDENTIALS '{"aws_access_key_id" : "<key_id> ","aws_secret_access_key": "<access_key>"}'INTO TABLE <table_name>(`<col_a>` <- %,`<col_b>` <- % DEFAULT NULL ,) FORMAT AVRO; -
Added the ability to load Parquet formatted data from an AWS S3 bucket, which can be simpler than using pipelines.
LOAD DATA S3 '<bucket name>'CONFIG '{"region" : "<region_name>"}'CREDENTIALS '{"aws_access_key_id" : "<key_id> ","aws_secret_access_key": "<access_key>"}'INTO TABLE <table_name>(`<col_a>` <- %,`<col_b>` <- % DEFAULT NULL ,) FORMAT PARQUET; -
Added ability for users with
SUPER
orALTER USER
privileges to be able to see the hashed password.SHOW GRANTS FOR <user>; -
Added the
WARM BLOB CACHE FOR INDEX
clause to the OPTIMIZE TABLE command to ensure consistent query performance when querying through a hash index, irrespective of when the query was last accessed. -
Added support for creating a
DEEP COPY
of tables with computed columns.(8. 1. 31) -
Added support for the
REGEXP_
function.MATCH() This function returns a JSON array of matching substring(s) within the first match of a regular expression pattern to a string. (8. 1. 18)
New or Modified Engine Variables
-
Enhancement: Introduced a new session engine variable
optimizer_
which disables predicate transitivity on query rewrites.disable_ transitive_ predicates -
Added the following engine variables:
max_
which sets a limit on the number of expressions within an entire query andexpression_ query_ limit max_
which sets a limit on the number of expressions within a query item.expression_ item_ limit Both can be set to a range between 100
and the maximum unsignedINT
value.Setting these engine variables to the maximum unsigned INT
value disables both features.(8. 1. 21) -
Added two engine variables,
jwks_
andusername_ field jwks_
to add flexibility and improve security.require_ audience (8. 1. 20) -
Added support for materializing CTEs without recomputing them when the query contains
UNION
,UNION ALL
, and otherSET
operations.To enable the feature, set the engine variable allow_
tomaterialize_ cte_ with_ union TRUE
.(8. 1. 20) -
json_
which limits the number of JSON key paths inferred within a segment.document_ max_ leaves The default value is 10000
.(8. 1. 18) -
pipelines_
, which is used in conjunction withparse_ errors_ threshold SKIP PARSER ERRORS
for Kafka pipelines.(8. 1. 16) -
dr_
, which allows users to adjust the minimum timeout period in Disaster Recovery (DR) replication.min_ connection_ timeout_ ms (8. 1. 15) -
workload_
: When enabled allows workload management to classify queries as large solely based on memory usage.management_ enable_ large_ memory_ classification This is an Aggregator Only engine variable. (8. 1. 9) -
workload_
: The expected number of aggregators that will be used to run a high volume of client queries which require fully distributed execution.management_ expected_ aggregators The default value is 0, which is equivalent to setting it to the total number of aggregators in the Products: cluster. For version 7. 0 and up, this variable should only be set to its default of 0, unless `workload_ management_ enable_ static_ partitioning is set to TRUE. This variable can sync to all aggregators. (8. 1. 9) -
Improved the performance of a left join of a reference table with a sharded table by pushing the reference table clauses to the leaves.
The engine variable disable_
must be set to "OFF" to enable this operation.reference_ table_ leftjoin_ where_ pushdown (8. 1. 4) -
auto_
: Determines the mode of auto profiling.profile_ type There are two values: LITE
andFULL
. -
autostats_
: Determines when autostats are flushed to disk if they are not used within the specified time.flush_ interval_ secs The default value is 600
seconds.If the engine variable is set to 0
, autostats will always stay in memory. -
compiled_
: Sets the percentage ofimages_ eviction_ memory_ limit_ percent maximum_
that is available for caching query object code images.memory This variable is set to 0 by default. For more information about this engine variable, see Managing Plancache Memory and Disk Usage. -
compiled_
: Sets the memory limit (in MBs) that is available for caching query object code images.images_ eviction_ memory_ limit_ mb For more information about this engine variable, see Managing Plancache Memory and Disk Usage. -
enable_
is now enabled by default.compiled_ images_ eviction -
sql_
is now a Sync variable and is settable on SingleStore Helios.mode -
spilling_
: Sets the maximum percentage of disk space to be used by disk spilling.maximum_ disk_ percent The value must be between 0 and 1 or left to the default (-1). -
Two new Sync variables control the dynamic reuse of WM queues:
workload_
andmanagement_ dynamic_ resource_ allocation workload_
.management_ queue_ size_ allow_ upgrade The default value for workload_
ismanagement_ queue_ size_ allow_ upgrade 1
.This means we can upgrade queries from the medium queue to the large queue until the large queue length becomes equal to 1. -
backup_
: Improves backup performance.multipart_ upload_ concurrency -
max_
: It is unused and setting this variable has no effect.execution_ time It exists so certain MySQL-compatible apps that have this variable set will not fail when connected to SingleStore. -
Introduced a new session engine variable
optimizer_
which disables predicate transitivity on query rewrites.disable_ transitive_ predicates -
Added the
bottomless_
engine variable, which when enabled, completes additional verification of persisted files immediately after upload.experimental_ blobstore_ mode This mode is experimental and may reduce upload speed. Please use with caution. -
The data_
conversion_ compatibility_ level engine variable can now be set to '8.
for stricter data type conversions.1' This new data_
setting additionally flags invalid or out-of-range timestamp inputs toconversion_ compatibility_ level TIMESTAMP
andTIMESTAMP(6)
columns inINSERT
statements.Please note: The data_ conversion_ compatibility_ level engine variable defaults to '8.
in new installations.0'
Miscellaneous
-
Allow Kerberos users to skip password validation for users who are authenticated with plugins.
(8. 1. 42) -
Updated the
information_
view to reflect the account status for locked users.schema. USERS (8. 1. 32) -
Added more log messages for backup sub-process errors.
(8. 1. 32) -
Improved the ability to kill queries containing JSON built-in functions.
(8. 1. 32) -
The query optimizer now considers more left join elimination cases.
(8. 1. 32) -
Added support for creating a
DEEP COPY
of tables with computed columns.(8. 1. 31) -
Added additional config and credential option validation while creating pipelines.
(8. 1. 31) -
Fixed an issue with the
SET
clause of theCREATE PIPELINE
statement: now it is possible to use type options likeNULL/NOT NULL/COLLATE.
with. . :>/!:>
typecast operators.(8. 1. 29) -
Improved performance during snapshotting for CDC-in pipelines.
(8. 1. 27) -
Added additional node metrics to the /cluster-metrics endpoint of the
memsql_
.exporter (8. 1. 27) -
Added support for simple multi-column update with sub-query.
(8. 1. 25) -
Added ability to use SSL keys with a password in the HTTP API.
(8. 1. 25) -
Improved the performance of
mv_
for clusters with large DR databases.replication_ status (8. 1. 24) -
Added ability to infer CSV data with text boolean values.
(8. 1. 24) -
Added resource pool metrics to the /cluster-metrics endpoint of the
memsql_
.exporter (8. 1. 24) -
Improved blob cache sync performance when there is no running query.
(8. 1. 23) -
Added support for flipping join order for
FULL OUTER JOIN
operations.(8. 1. 23) -
Enhanced the performance of DDL statements for role manipulation.
(8. 1. 20) -
Improved memory consumption in json decoding.
(8. 1. 19) -
SKIP PARSER ERRORS
is now supported for Kafka.(8. 1. 16) -
Added support for multiple uncorrelated IN-subselects in more query shapes.
(8. 1. 16) -
SingleStore automatically rewrites A=B OR (A IS NULL AND B IS NULL) to null safe equal (A<=>B) to enable hash joins.
(8. 1. 16) -
Added support for hash joins on null-accepting expressions in the ON condition of outer joins.
(8. 1. 16) -
Added support for Control Group v2 (cgroup v2).
(8. 1. 13) -
Default values for BLOB/TEXT and JSON columns are allowed, as well as NULL and empty strings.
(8. 1. 12) -
Added an opt-in optimization to Kerberos for HDFS pipelines to reduce the amount of requests.
(8. 1. 11) -
Improved the accuracy of the
blob_
statistic in query profiles.fetch_ network_ time (8. 1. 8) -
Added metrics to
memsql_
that show the breakdown of disk usage by SingleStore.exporter Also, added metrics to memsql_
that show unlimited storage utilization statistics.exporter (8. 1. 8) -
Enhanced the on-disk plancache functionality by shrinking the plancache when disk space gets low and the on-disk plancache is using at least 25% of available disk space.
(8. 1. 8) -
Improved the performance of backups to GCS (Google Cloud Storage).
GCS backups are not using multi-part uploads. (8. 1. 6) -
The
FLUSH EXTRACTOR POOLS
command now runs on all nodes in a cluster when run on an aggregator.(8. 1. 6) -
Now allowing child aggregator root users to be dropped even if the
sync_
engine variable is enabled.permission (8. 1. 4) -
Removed the restrictions on the names of JWT users.
Customers may now create users with names that look like UUIDs and emails. (8. 1. 4) -
Added more information about blob files that need to be downloaded from unlimited storage to the blob cache when running
PROFILE
.(8. 1. 4) -
Changed the system information built-ins to use the utf8_
general_ ci collation for the collation_
value.server (8. 1. 3) -
Added capability to create Wasm functions in SingleStoreclusters with heterogeneous hardware, e.
g. , with different levels of SIMD support. -
Wasm modules are now serialized to the plancache.
Refer to Code Engine - Powered by Wasm for more information. -
SingleStore now supports creation of Wasm-based user-defined aggregate functions (UDAFs).
Refer to CREATE AGGREGATE for more information. -
The
CREATE FUNCTION
statement now returns an error if theUSING EXPORT
clause is omitted and the Wasm module has functions with the same name but different case.Refer to Wasm UDF/TVF Naming Convention for more information. -
Added a new metric to the output of the
SHOW STATUS EXTENDED
command, which displays the memory allocated (in MBs) used to cache query object code images, as derived fromcompiled_
andimages_ eviction_ memory_ limit_ percent compiled_
.images_ eviction_ memory_ limit_ mb -
Adjusted how
CREATE_
,TIME CREATE_
,USER ALTER_
, andTIME ALTER_
columns in information_USER schema. TABLES are set so that they are given proper values for the relevant CREATE
andALTER
commands. -
Enhanced support for recursive common table expressions (CTEs) by expanding the range of query shapes allowed, improving column type checking and resolution between the base and recursive branches, and adding cross database support.
Also, resolved the issue that caused the "ER_ UNSUPPORTED_ RECURSIVE_ CTE_ SHAPE" error with the accompanying message about dependency on a streaming result table outside the cycle. -
Renamed
spill_
tooutputted_ rows non_
for the shuffle groupby operator in the profiling output to avoid confusion with disk spilling.aggregated_ output_ rows [...] "non_aggregated_output_rows":{ "value":1048576, "avg":349525.333333, [...]
-
Improved JSON encoding speed for sparse schemas (JSON schemas with a very large number of distinct property names across the full set of documents, with most properties missing in each document).
This applies only to universal storage. -
Incremental backups are supported on SingleStore Self-Managed for unlimited storage space.
Note: incremental backups are not supported as part of the managed backup solution. -
Updated OpenSSL to version 3.
0, which is now used to establish secure connections to SingleStore. As a consequence, a client certificate that uses SHA or MD5 hash functions in its signature must be replaced with a certificate that uses SHA256 at a minimum, or a secure connection to SingleStore cannot be established. While SingleStore supports TLS v1, TLS v1. 1, and TLS v1. 2, using TLS v1. 2 is recommended. When FIPS is enabled, only TLS v1. 2 is supported. Refer to Troubleshoot OpenSSL 3. 0 Connections for more information. -
The
ORDER BY ALL [DESC|ASC]
(orORDER BY *
) syntax is now supported. -
The
GROUP BY ALL [DESC|ASC]
(orGROUP BY *
) syntax is now supported. -
Added the following JSON functions:
-
JSON_
returns true if there exists a value in the JSON at the filter path for which the filter predicate evaluates to true.MATCH_ ANY JSON_MATCH_ANY(<filter_predicate>, <json>, <filter_path>); -
JSON_
returns true if there is a value (possibly null) in the JSON at the filter path.MATCH_ ANY_ EXISTS JSON_MATCH_ANY_EXISTS(<json>, <filter_path>); -
JSON_
eliminates all portions of the document that do not match the mask.INCLUDE_ MASK JSON_INCLUDE_MASK(<json>,<mask>); -
JSON_
eliminates all portions of the document that match the mask.EXCLUDE_ MASK JSON_EXCLUDE_MASK(<json>,<mask>);
-
Changes in Patch Releases Since 8. 0GA
Several features have been added since the GA release of 8.GROUP BY ALL
, ORDER BY ALL
, and several new JSON functions; however, there are others.
In this section
Last modified: August 22, 2024