8.9 Release Notes
Note
-
To deploy a SingleStore 8.9 cluster, refer to the Deploy SingleStore Guide.
-
To upgrade a self-managed install to this release, follow this guide.
-
To make a backup of a database in this release or to restore a database backup to this release, follow this guide.
-
New deployments of SingleStore 8.9 require a 64-bit version of RHEL/AlmaLinux 7 or later, or Debian 8 or later, with kernel 3.10 or later and glibc 2.17 or later. Refer to the System Requirements and Recommendations for additional information.
Release Highlights
Note
This is the complete list of new features and fixes in SingleStore engine version 8.9.
Full-Text Search - Analyzers and Tokenizers
Full-text search using SingleStore's VERSION 2 full-text index has been enhanced with support for custom analyzers and tokenizers.
Full-Text Search - Enhanced BM25 Scoring
Full-text search using SingleStore's VERSION 2 full-text index has updated BM25 scoring functionality.
The BM25 function has been enhanced with support for boolean and boost queries, phrase and proximity search queries, and queries over multiple columns.
A new function, BM25_, has been added to provide BM25 scoring across all partitions. The BM25_ function augments the existing BM25 and MATCH functions, and it is more accurate and more expensive than both of these functions.
BM25_ function.
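To illustrate why scoring across all partitions can differ from scoring within a single partition, here is a minimal Python sketch. It assumes a Lucene-style BM25 formula with k1 = 1.2 and b = 0.75 and models each partition as a list of tokenized documents. The function name `bm25_scores`, the sample data, and the formula variant are illustrative assumptions, not SingleStore's implementation.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, stats_docs, k1=1.2, b=0.75):
    """Score each document in `docs`, taking corpus statistics
    (document count, average length, document frequency) from `stats_docs`."""
    n = len(stats_docs)
    avg_len = sum(len(d) for d in stats_docs) / n
    # Document frequency: how many statistics-corpus documents contain each term.
    df = Counter(t for d in stats_docs for t in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in query_terms:
            if tf[t] == 0:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            norm = tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avg_len))
            score += idf * norm
        scores.append(score)
    return scores


# Two hypothetical partitions, each holding tokenized documents.
partitions = [
    [["fast", "vector", "search"], ["vector", "index"]],
    [["full", "text", "search"], ["text", "analyzer"], ["text", "tokenizer"]],
]
query = ["search"]

# Partition-scoped: each partition uses only its own statistics.
local = [bm25_scores(query, p, p) for p in partitions]

# Cross-partition: every partition uses statistics gathered from all documents.
all_docs = [d for p in partitions for d in p]
global_ = [bm25_scores(query, p, all_docs) for p in partitions]
```

In this sketch the first document of each partition has the same length and the same term frequency for the query term. With cross-partition statistics the two documents receive the same score, while with per-partition statistics their scores differ because each partition sees a different document count and average length. Gathering statistics from every partition is extra work, which is one plausible reason a cross-partition score costs more.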
Iceberg Continuous Ingest
Added support for continuous ingest of data from Iceberg tables. CREATE OR REPLACE
command are supported.
Iceberg - New Catalogs
Added support for Snowflake, REST, JDBC, Hive, and Polaris Catalogs for Iceberg Ingest using pipelines.
Enhanced Disk Spilling
Added disk spilling for RIGHT and FULL OUTER JOIN.
Writable Views
Writable views allow users to run UPDATE, INSERT, and DELETE queries on views.
enable_ global variable to 1.
information_ view to inspect if a view can be updated.
Other Improvements and Fixes
Vector Index on Nullable Column
Vector indexes can be created on columns that are nullable.
NOT NULL.
NULL vector value into a table with a vector index.
Vector Index Memory Tracking
The memory used by vector indexes can be tracked using the alloc_ metric, which is now available in SHOW STATUS EXTENDED.
Other Performance Enhancements
-
Enhancement: Added sub-segment elimination for flexible parallelism.
Refer to Flexible Parallelism for more information. -
Enhancement: Performance of full-text search, specifically throughput performance, has been significantly improved.
-
Enhancement: Improved performance of VECTOR type user-defined variables (UDVs). These variables no longer require an extra typecast to the BLOB data type.
-
Enhancement: Improved the performance of LOAD DATA queries that include the CHARACTER SET clause.
-
Enhancement: Improved performance of the CREATE PROJECTION command by skipping unique key checks.
-
Enhancement: Improved the performance of REPLACE queries into a columnstore table when table-level locking is triggered.
-
Enhancement: Improved the performance of CREATE {TABLE | TABLES} AS INFER PIPELINE queries.
-
Enhancement: Optimized full-text queries that use ORDER BY ... LIMIT on a full-text score and that optionally filter on the same full-text clause.
-
Enhancement: Significantly improved the performance (~20x) of certain JSON-based SQL queries, when JSON objects contain arrays of sub-objects.
This optimization reduces the need to normalize the data into multiple tables to achieve high analytics performance. Queries that expand JSON arrays without any aggregations and/or perform the following operations benefit from this optimization:
-
Group by a field outside the array (in the GROUP BY clause)
-
Filter on the fields in the array
-
New Information Schema Views and Columns
-
Enhancement: Added an OBSERVE_DATABASE_OFFSETS information schema view that contains information on offsets for starting an OBSERVE query. Refer to OBSERVE_DATABASE_OFFSETS for more information.
-
Enhancement: Added the TABLE_NAME column to the LOAD_DATA_ERRORS information schema view. TABLE_NAME is the name of the table associated with the error.
-
Enhancement: Added the NODE_ID column to the MV_RECOVERY_STATUS information schema view that specifies the ID of the node from which the database is being recovered.
-
Enhancement: Added the MV_BOTTOMLESS_API_EVENTS_SUMMARY information schema view that contains a summary of remote API calls made from the engine. Refer to MV_BOTTOMLESS_API_EVENTS_SUMMARY for more information.
New Commands and Functions
-
New feature: Added the FULLTEXT SERVICE STOP command. This command stops the full-text V2 service running on any node connected to the aggregator on which the command is run. Refer to FULLTEXT SERVICE STOP for more information.
-
New feature: Added the following Identifier Generation Functions:
-
UUID_TO_BIN
-
BIN_TO_UUID
-
IS_UUID
-
New feature: Added a SHOW FULLTEXT SERVICE METRICS command that displays the diagnostic metrics for the JLucene full-text search in JSON format. Refer to SHOW FULLTEXT SERVICE METRICS for more information.
-
New feature: Added a SHOW CDC EXTRACTOR POOL command that displays information about the CDC-in pipelines. Refer to SHOW CDC EXTRACTOR POOL for more information.
-
New feature: Added a new JSON_MERGE_PATCH function that merges two JSON objects into a single JSON object. Refer to JSON_MERGE_PATCH for more information.
-
New feature: Added support for Lateral Join. Lateral join allows a subquery in the FROM clause of a SQL query to reference another table in that same FROM clause, which can simplify query syntax. Refer to Lateral Join for more information.
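The identifier generation functions listed above share their names with MySQL functions. As a rough sketch, assuming they convert between the 36-character text form of a UUID and a 16-byte binary value (these release notes do not spell out the exact semantics, such as byte ordering or accepted input formats), the equivalent round trip in Python with the standard `uuid` module looks like this. The helper names are hypothetical and only mirror the SQL function names.

```python
import uuid


def uuid_to_bin(text: str) -> bytes:
    """Convert the text form of a UUID to its 16-byte binary form."""
    return uuid.UUID(text).bytes


def bin_to_uuid(raw: bytes) -> str:
    """Convert a 16-byte binary value back to the hyphenated, lowercase text form."""
    return str(uuid.UUID(bytes=raw))


def is_uuid(text) -> bool:
    """Return True if `text` parses as a UUID.

    Note: Python also accepts forms without hyphens or wrapped in braces;
    whether the SQL function does the same is not stated in the source.
    """
    try:
        uuid.UUID(text)
        return True
    except (ValueError, AttributeError, TypeError):
        return False


text = "6ccd780c-baba-1026-9564-5b8c656024db"
raw = uuid_to_bin(text)          # 16 bytes, compact for storage and indexing
round_trip = bin_to_uuid(raw)    # back to the text form
```

Storing the 16-byte form instead of the 36-character string is the usual motivation for this pair of conversions, since it shrinks keys and indexes.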
New or Modified Engine Variables
Refer to List of Engine Variables for information on each of the following engine variables.
-
Enhancement: Added a new engine variable sync_partitions_timeout_sec that specifies the timeout (in seconds) to synchronize the cluster metadata across the cluster.
-
Enhancement: Added a new engine variable disconnect_client_on_invalid_connection_state. If enabled, client connections are closed when their state becomes invalid.
-
Enhancement: Added a new engine variable synchronize_reference_timeout_ms that specifies the time (in seconds) long running queries wait for reference databases to synchronize on commit in the cluster.
-
Enhancement: Added a new engine variable assume_udfs_deterministic that controls behavior where SingleStore does extra work to avoid issues caused by UDFs that return different values when called repeatedly (e.g., are non-deterministic).
-
Enhancement: Added a new engine variable max_autostats_update_workers to tune the maximum number of background autostats update workers.
-
Enhancement: Added a new engine variable enable_writable_views that enables creation of writable views. Refer to CREATE VIEW for more information.
-
Enhancement: Added a new engine variable recovery_concurrency that controls the replay and database initialization concurrency during database recovery.
-
Enhancement: Updated the minimum value of the json_document_max_children engine variable to 1 (from 128 previously).
-
Enhancement: Disabled the optimize_json_computed_column engine variable by default.
-
Enhancement: Added a new engine variable enable_block_level_stats_collection that controls the collection of block-level statistics for sub-segment elimination for flexible parallelism.
-
Enhancement: Added a new engine variable enable_block_stats_use_in_query that controls whether the block-level statistics are read and used during scan as part of sub-segment elimination for flexible parallelism.
-
Enhancement: Added a new engine variable pipelines_iceberg_heap_size to control heap size specifically for Iceberg pipelines.
-
Enhancement: Added the json_collation global variable to control collation of JSON. The value of json_collation can be either utf8_bin or utf8mb4_bin.
-
Enhancement: Added a method to throttle upload ingest when blob cache has low evictability and running out of disk space is imminent. This is controlled via the following two new engine variables:
-
bottomless_upload_throttle_hard_limit_cache_usability: The usability (free space + evictable space) of blob cache below which all columnstore ingest is throttled.
-
bottomless_upload_throttle_soft_limit_cache_usability: The usability (free space + evictable space) of blob cache below which some columnstore ingest is throttled.
-
Enhancement: Added a new optimizer_not_null_filter_derivation engine variable that controls a new filter derivation rewrite.
-
Enhancement: Added a new engine variable observe_agg_timeout_secs that specifies the maximum time (in seconds) that an OBSERVE query can remain idle on an aggregator node before the query is terminated.
-
Enhancement: Added a new engine variable external_functions_service_buffer_mb that sets the maximum size (in MB) of the memory-mapped region used to communicate between the engine and collocated services.
-
Modification: During the upgrade to SingleStore 8.9, if the value of fts2_max_connections is equal to 100000, the value is set to 32.
Miscellaneous
-
Enhancement: Added the DETERMINISTIC clause to the CREATE FUNCTION (UDF) command that instructs the query optimizer to assume that the created function is deterministic. Refer to CREATE FUNCTION (UDF) for more information.
-
Enhancement: Added support for the IGNORE <n> LINES clause to the INFER PIPELINE command for CSV files. Refer to Schema and Pipeline Inference for more information.
-
Enhancement: Added support for using the SKIP ALL ERRORS clause during creation of Kafka pipelines for ingesting JSON formatted data.
-
Enhancement: Added support for SKIP ALL ERRORS and SKIP PARSER ERRORS clauses during creation of Kafka pipelines for ingesting Avro formatted data.
-
Enhancement: Added the ability to load Kafka properties and headers with the get_kafka_pipeline_prop("<property>") function.
-
Enhancement: Added the ability to override the pipelines_max_offsets_per_batch_partition global variable for each Kafka pipeline using the MAX_OFFSETS_PER_BATCH_PARTITION pipeline variable in CREATE PIPELINE and ALTER PIPELINE commands.
-
New feature: Added support for the following parameters in the CONFIG clause of the CREATE PIPELINE AS ... LOAD DATA S3 statement:
-
file_compression: Decompresses files with the specified extensions.
-
file_time_threshold: Only ingests files modified after the specified timestamp.
Refer to S3 Configurations for more information.
-
Enhancement: Added ability to re-optimize a query multiple times. Refer to Query Tuning for more information.
-
Enhancement: Added the ability to use connection links to load Avro and Parquet formatted data stored in an AWS S3 bucket.
-
Enhancement: Enabled auto PROFILE for INSERT ... SELECT and REPLACE ... SELECT query shapes.
-
New feature: Added the ENABLE_OVERWRITE clause in the SELECT ... INTO S3 and SELECT ... INTO LINK statements that enables the overwriting of existing files. Refer to SELECT … INTO S3 for more information.
-
Enhancement: Updated PROFILE to show number_of_blocks_tested_for_block_elim and number_of_blocks_eliminated_for_block_elim for sub-segment elimination for flexible parallelism.
-
Enhancement: Updated the JSON_EXTRACT_<type> functions to accept a JSON document as the only argument. With this enhancement, it is possible to extract from JSON documents with a string, numeric, boolean, or NULL value as the root of the document.
-
Enhancement: Updated the simplified syntax for the JSON_MATCH_ANY() function to allow specifying MATCH_ELEMENTS by appending a * to the end of the keypath.
-
Enhancement: Added the ability to load CSV and JSON files from an Amazon S3 bucket using a LOAD DATA query.
-
Enhancement: Enhanced the TABLE() function to support DISTINCT.
-
Enhancement: Improved recovery time for tables with incremental autostats by recovering statistics from disk instead of rebuilding them from scratch.
-
Bugfix: Fixed an issue that caused a deadlock between the DROP TABLE and AGGREGATOR SYNC AUTO_INCREMENT queries.
-
Enhancement: Improved the logic for selecting vectors close to vector threshold in vector range search.
-
Bugfix: Fixed an issue where attaching a leaf node to the Master Aggregator (MA) failed if the MA was still starting up.
-
Enhancement: Added support for BM25 partition-scoped scoring for phrase and proximity search queries.
-
Enhancement: Improved columnstore hash index performance on low cardinality column.
-
Enhancement: Improved the Debezium DDL statements parser.
-
Enhancement: Added row count estimate for joins in the output of EXPLAIN and PROFILE queries.
-
Enhancement: Added support for LIMIT and OFFSET clauses in non-equality WHERE conditions in subselects.
-
Enhancement: Added support for LATERAL joins for table-valued functions (TVFs).
-
Enhancement: SYNC DURABILITY is always enabled for reference databases. Reference tables in user databases that use async durability may notice a decrease in performance for DDL and DML statements.
-
Enhancement: Added support for HIGHLIGHT ... AGAINST as a computed column expression when creating a new table.
-
Enhancement: Added support for a query rewrite that enables hash joins when two tables are joined using the JSON_ARRAY_CONTAINS_<type> function.
-
Enhancement: Fixed an undefined behavior in the failure path of an OBSERVE query.
-
Enhancement: Added new support for sub-select to join rewrites for correlated subselects and nested scalar subselects.
-
Enhancement: Added support for null-accepting projections in scalar subselect queries.
-
Enhancement: Added support for LATERAL join subselects to reference any level of outer tables.
-
Enhancement: Changed the default collation to utf8mb4_general_ci and default character set to utf8mb4 for TEXT and ENUM type columns for CSV, JSON, AVRO, and Parquet formats in INFER PIPELINE AS LOAD DATA statements.
-
Bugfix: Fixed an issue that caused dangling compute sessions after a failed ALTER DATABASE command.
-
Bugfix: Fixed an issue where concurrent ALTER compute and REBALANCE operations created unlimited storage partitions with the wrong compute ID.
-
Bugfix: Fixed a race condition between the transaction log garbage collection and database transition-to-master that could result in unrecoverable partitions.
-
Enhancement: Updated the distributed OpenSSL license file to 1.0.2zj.
-
Enhancement: Added support for multi-column IN list predicate in the WHERE clause of a query.
-
Enhancement: Function mapped IN-lists now share the same signature in the plancache for the same set of built-in functions.
-
Bugfix: Removed soft lock on the CHARACTER SET clause and added a warning to indicate invalid character set value in the LOAD DATA statement.
-
Bugfix: Database names can no longer end with big numbers, such as db_<big_number>, to avoid conflicts with internal databases used in replication.
-
Enhancement: Each node in the cluster now validates the availability of bottle service every minute and records any consecutive failures in the LMV_EVENTS information schema view.
-
Enhancement: Added more information to the out-of-memory (OOM) errors.
-
Bugfix: Fixed an issue that occurred while parsing manifest files (associated with backup or restore operations) having more than 4096 characters.
-
Bugfix: Fixed an issue that caused duplication of storage blobs that have not been repaired yet for ongoing repair operations.
-
Bugfix: Fixed an issue in MySQL CDC-in pipelines where some MySQL tables with names containing _ were not being replicated.
-
Bugfix: Fixed a race condition that caused shutdown to wait on idle async compile manager thread.
-
Bugfix: Fixed an issue where repair operation gets stuck while converting milestones.
-
Bugfix: Fixed a synchronization issue where the database was not immediately available after some clustering operations.
-
Enhancement: Added the number of partitions in the optimizer to the debug profile export.
-
Enhancement: Added support for VECTOR type in REDUCE built-in function.
-
Enhancement: Improved the lockdown message when changing collation or character set related engine variables within utf8mb4 character set.
-
Bugfix: Fixed an issue with PROMOTE AGGREGATOR ... MASTER command where a restart at the end of the command, followed by manually finishing the promote operation, caused reprovisioning of the old Master Aggregator.
-
Bugfix: Fixed an issue that caused an empty network prefetch queue.
-
Bugfix: Fixed a distributed deadlock caused by a query blocking reference database reprovisioning in one part of the deadlock cycle.
-
Enhancement: Added the following headers to Data API responses:
-
Cache-control: no-store
-
Strict-Transport-Security: max-age=31536000
-
X-Content-Type-Options: nosniff
-
-
Bugfix: Fixed an issue with counting common table expressions (CTEs) references in a query.
-
Bugfix: Fixed a memory corruption issue caused by a rare race condition involving the MV_ACTIVE_TRANSACTIONS information schema view.
-
Enhancement: Sharding planner now recognizes non-union style single partitioned derived table.
-
Enhancement: Disabled pipeline batches sample by default in memsql_exporter.
-
Enhancement: Improved message for an error where a global variable cannot be set because of a table sharded on a computed column.
-
Enhancement: Improved connection stability for MongoDB® CDC-in pipelines.
-
Enhancement: Upgraded the librdkafka library to version 2.4.0-3.
-
Bugfix: Fixed an issue where the Master Aggregator (MA) temporarily stopped behaving as the MA after being restarted.
-
Enhancement: Improved the performance of cluster operations through distributed plancache.
-
Bugfix: Fixed a bug with incorrect privilege checks.
-
Enhancement: Optimized row locking for internal transactions.
-
Enhancement: Improved performance of REVOKE in case of errors.
-
Bugfix: Fixed an engine crash caused when the Master Aggregator received a REMOVE AGGREGATOR query with its own <host>:<port>.
-
Enhancement: Reduced the chances of reference database reprovisioning in universal storage if a snapshot is taken concurrently with Master Aggregator shutdown.
-
Enhancement: TO_JSON() and JSON_BUILD_OBJECT() built-in functions now convert VECTOR type arguments to a JSON array instead of a JSON string.
-
Enhancement: Added support for ZSTD compressed Kafka topics to Kafka pipelines.
-
Bugfix: Fixed an issue where ClampTimestamp spammed the tracelog.
-
Enhancement: Limited DR connection attempts to the primary node when it is failing connections, avoiding a substantial increase in TIME_WAIT sockets.
-
Enhancement: Improved error messages related to reprovisioning.
-
Enhancement: Implemented rotation and deletion of webproxy socket logs.
-
Enhancement: Ingest Kafka headers into the SingleStore table if they are included in the Kafka message.
-
Enhancement: Lockdown hints for common table expressions (CTEs).
-
Enhancement: Locked select and row count hints for multi-table views.
-
Bugfix: Fixed an issue where slow snapshots blocked clustering operations.
-
Bugfix: Fixed an issue where unrecoverable reference databases did not auto-heal for a prolonged period of time and get blocked, for example, by alter operations.
-
Enhancement: Improved the error message for "Correlated subselect that cannot be transformed and does not match on shard keys" errors.
-
Bugfix: Fixed data loss in sync durability in a rare race condition.
-
Enhancement: Improved processing of queries with redundant (superfluous)
EXISTS
subselects. -
Bugfix: Fixed the reference count of committed blobs on upgrade.
-
Enhancement: Added the ON NODE <node_id> clause to the SHOW PROFILE command, which forwards the command to another aggregator.
-
Enhancement: Improved enforcement of the internal_columnstore_max_uncompressed_blob_size engine variable.
-
Enhancement: Added a new clause ENSURE_PARTITION_SAFETY to the REMOVE {LEAF | LEAVES} command that prevents a leaf (or leaves) from being removed if it contains the last online instance of a partition.
-
Enhancement: The memsql_exporter now collects additional fields from the mv_activities_extended_cumulative information schema view.
-
Enhancement: Removed the TxPartitions column and added a new TxTimestamp column for OBSERVE.
-
Enhancement: Optimized full-text queries that contain an ORDER BY ... LIMIT ... over a full-text score and optionally filter on the same full-text clause.
-
Enhancement: The InternalId in the OBSERVE output is now unique across partitions.
-
Enhancement: Added support for numeric range queries when doing full-text search against JSON fields.
Refer to numeric range queries for more information.
-
Enhancement: Set collation utf8_bin for the JSON_TO_ARRAY builtin in cases where the input has utf8 character set, and set collation utf8mb4_bin in cases where the input has utf8mb4 character set.
-
Enhancement: If a primary key is defined for the table, the DELETE change records for OBSERVE queries now populate the primary key for tables instead of the internal ID.
-
Enhancement: Added information on remote API calls and bottle service reliability metrics to the MV_BOTTOMLESS_STATUS_EXTENDED information schema view.
-
Enhancement: Added metrics to track the availability of unlimited storage and the bottle service to the MV_BOTTOMLESS_SUMMARY information schema view.
-
Enhancement: Updated the syntax of JSON_MATCH_ANY() to allow specifying MATCH_ELEMENTS by appending a * at the end of the keypath.
-
Enhancement: Additional binlog position updates for CDC-in pipelines in the extractor pool queue.
-
Enhancement: Added support for resumable offsets into columnar segments for OBSERVE. The OBSERVE query now returns 28-byte offsets. While 24-byte offsets are still valid with the BEGIN AT clause, the "strictly increasing" guarantee does not hold when comparing 24-byte and 28-byte offsets.
-
Enhancement: OBSERVE now immediately flushes the result-set metadata to the client, eliminating the need to wait for rows to be returned to receive result metadata.
-
Enhancement: Information schema queries are no longer case-insensitive with respect to database names.
-
Enhancement: Allow rewrite for joining tables on the JSON_ARRAY_CONTAINS_<type> function to support multiple predicates of that form on different columns.
-
Bugfix: Fixed a snapshot not found error.
-
Bugfix: Fixed an issue where the result of a seek operation was not handled correctly.
-
Enhancement: Detect scalar subselect requirement at runtime rather than rewrite time for certain cases.
-
Bugfix: Fixed a bug in full-text search query compilation.
-
New Feature: High availability for the Master Aggregator (HA for MA) has been introduced.
This enhances the reliability and failover capabilities, allowing mission-critical workloads to remain highly available. Refer to High Availability for the Master Aggregator for more information. -
Enhancement: Enabled LOAD DATA queries to use the specified CHARACTER SET to ingest string type fields.
-
Enhancement: Optimized parsing of JSON computed columns.
Refer to JSON Computed Column Optimization for more information.
Last modified: November 15, 2024