8.9 Release Notes

Note

  • To deploy a SingleStore 8.9 cluster, refer to the Deploy SingleStore Guide.

  • To upgrade a self-managed install to this release, follow this guide.

  • To make a backup of a database in this release or to restore a database backup to this release, follow this guide.

  • New deployments of SingleStore 8.9 require a 64-bit version of RHEL/AlmaLinux 7 or later, or Debian 8 or later, with kernel 3.10 or later and glibc 2.17 or later. Refer to the System Requirements and Recommendations for additional information.

Release Highlights

Note

This is the complete list of new features and fixes in SingleStore engine version 8.9.

Full-Text Search - Analyzers and Tokenizers

Full-text search using SingleStore's VERSION 2 full-text index has been enhanced with support for custom analyzers and tokenizers. With this enhancement, SingleStore full-text indexes can be created to support languages other than English, for text that contains emails and URLs, with custom whitespace processing, and more. The full set of Apache Lucene analyzers and tokenizers is supported. Refer to Full Text VERSION 2 Custom Analyzers for more information.

Full-Text Search - Enhanced BM25 Scoring

Full-text search using SingleStore's VERSION 2 full-text index has updated BM25 scoring functionality. 

The BM25 function has been enhanced with support for boolean and boost queries, phrase and proximity search queries, and queries over multiple columns. Refer to BM25 for more information and examples.

A new function, BM25_GLOBAL, has been added to provide BM25 scoring across all partitions. With this new function, all rows in a table are scored together; collection and term statistics are calculated for a table, ensuring accurate scores relative to all rows in a table. The BM25_GLOBAL function augments the existing BM25 and MATCH functions, and it is more accurate and more expensive than both of these functions. Refer to BM25 for more information on the BM25_GLOBAL function.
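To see why table-wide statistics can change a score, the following minimal sketch compares partition-scoped and table-wide scoring. It uses a textbook BM25 formula with a Lucene-style IDF term and the common defaults k1=1.2 and b=0.75. The function name `bm25_score`, the sample partitions, and the parameter values are assumptions for illustration, not SingleStore's implementation.

```python
import math

def bm25_score(term, doc, corpus, k1=1.2, b=0.75):
    """Score one tokenized document for one term using statistics from `corpus`."""
    n_docs = len(corpus)
    docs_with_term = sum(1 for d in corpus if term in d)
    avg_len = sum(len(d) for d in corpus) / n_docs
    # Lucene-style inverse document frequency (assumed variant).
    idf = math.log(1 + (n_docs - docs_with_term + 0.5) / (docs_with_term + 0.5))
    tf = doc.count(term)
    return idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avg_len))

# Two hypothetical partitions of one table, each holding tokenized rows.
partition_a = [["red", "apple"], ["red", "car"], ["red", "door"]]
partition_b = [["red", "kite"], ["blue", "sky"], ["green", "leaf"]]
whole_table = partition_a + partition_b

row_a = partition_a[0]
row_b = partition_b[0]

# Partition-scoped: each row is scored with its own partition's statistics.
local_a = bm25_score("red", row_a, partition_a)
local_b = bm25_score("red", row_b, partition_b)

# Table-wide: both rows are scored with the same collection statistics.
global_a = bm25_score("red", row_a, whole_table)
global_b = bm25_score("red", row_b, whole_table)

print(f"partition-scoped: {local_a:.3f} vs {local_b:.3f}")
print(f"table-wide:       {global_a:.3f} vs {global_b:.3f}")
```

Both sample rows contain "red" once and have the same length, yet the partition-scoped scores differ because the term is common in one partition and rare in the other. With table-wide statistics, the two rows receive the same score, at the cost of gathering statistics across all partitions.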

Iceberg Continuous Ingest

Added support for continuous ingest of data from Iceberg tables. Upsert and append-only workloads are supported. In addition, manual upserts with the CREATE OR REPLACE command are supported. Refer to Iceberg Ingest for more information.

Iceberg - New Catalogs

Added support for Snowflake, REST, JDBC, Hive, and Polaris Catalogs for Iceberg Ingest using pipelines. Refer to Iceberg Ingest for more information.

Enhanced Disk Spilling

Added disk spilling for RIGHT and FULL OUTER JOIN. Refer to Disk Spilling for more information.

Writable Views

Writable views allow users to run UPDATE, INSERT, and DELETE queries on views. To enable writable views, set the enable_writable_views global variable to 1. Query the information_schema.VIEWS view to check whether a view can be updated. Refer to CREATE VIEW for more information.

Other Improvements and Fixes

Vector Index on Nullable Column

Vector Indexes can be created on columns that are nullable. Prior to this improvement, vector indexes could only be created on columns that were declared NOT NULL. With this improvement, a user can insert a row containing text and a NULL vector value into a table with a vector index. The user can subsequently obtain a vector embedding for the text and update the row with that vector value. The updated value will be added to the vector index.

Vector Index Memory Tracking

The memory used by vector indexes can be tracked using the alloc_vector_index metric which is now available in SHOW STATUS EXTENDED. Refer to Vector Indexing and Tuning Vector Indexes and Queries for more information.

Other Performance Enhancements

  • Enhancement: Added sub-segment elimination for flexible parallelism. Refer to Flexible Parallelism for more information.

  • Enhancement: Performance of full-text search, specifically throughput performance, has been significantly improved.

  • Enhancement: Improved performance of VECTOR type user-defined variables (UDVs). These variables no longer require an extra typecast to the BLOB data type.

  • Enhancement: Improved the performance of LOAD DATA queries that included the CHARACTER SET clause.

  • Enhancement: Improved performance of the CREATE PROJECTION command by skipping unique key checks.

  • Enhancement: Improved the performance of REPLACE queries into columnstore tables when table-level locking is triggered.

  • Enhancement: Improved the performance of CREATE {TABLE | TABLES} AS INFER PIPELINE queries.

  • Enhancement: Optimized full-text queries with ORDER BY ... LIMIT on a full-text score and that optionally filter on the same full-text clause.

  • Enhancement: Significantly improved the performance (~20x) of certain JSON-based SQL queries, when JSON objects contain arrays of sub-objects. This optimization reduces the need to normalize the data into multiple tables to achieve high analytics performance.

    Queries that expand JSON arrays without any aggregations and/or perform the following operations benefit from this optimization:

    • Group by a field outside the array (in the GROUP BY clause)

    • Filter on the fields in the array
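The query shape described in the JSON optimization above can be pictured with a small sketch. It expands an array of sub-objects, filters on a field inside the array, and groups by a field outside the array. The sample rows and the names `orders`, `expand_items`, `region`, `sku`, and `qty` are hypothetical, and the sketch only mirrors the logical shape of such a query, not how the engine executes it.

```python
from collections import defaultdict

# Hypothetical rows: each holds a JSON-like document with an array of sub-objects.
orders = [
    {"region": "EU", "doc": {"items": [{"sku": "a", "qty": 2}, {"sku": "b", "qty": 5}]}},
    {"region": "US", "doc": {"items": [{"sku": "a", "qty": 1}]}},
    {"region": "EU", "doc": {"items": [{"sku": "c", "qty": 7}]}},
]

def expand_items(rows):
    """Yield one flat record per array element, keeping the outer field."""
    for row in rows:
        for item in row["doc"]["items"]:
            yield {"region": row["region"], **item}

qty_by_region = defaultdict(int)
for record in expand_items(orders):
    if record["qty"] >= 2:                                # filter on a field in the array
        qty_by_region[record["region"]] += record["qty"]  # group by a field outside the array

result = dict(qty_by_region)
print(result)
```

Keeping the sub-objects nested in one document, rather than normalizing them into a second table, is the layout this optimization is meant to make practical.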

New Information Schema Views and Columns

  • Enhancement: Added an OBSERVE_DATABASE_OFFSETS information schema view that contains information on offsets for starting an OBSERVE query. Refer to OBSERVE_DATABASE_OFFSETS for more information.

  • Enhancement: Added the TABLE_NAME column to the LOAD_DATA_ERRORS information schema view. TABLE_NAME is the name of the table associated with the error.

  • Enhancement: Added the NODE_ID column to the MV_RECOVERY_STATUS information schema view that specifies the ID of the node from which the database is being recovered.

  • Enhancement: Added the MV_BOTTOMLESS_API_EVENTS_SUMMARY information schema view that contains a summary of remote API calls made from the engine. Refer to MV_BOTTOMLESS_API_EVENTS_SUMMARY for more information.

New Commands and Functions

  • New feature: Added the FULLTEXT SERVICE STOP command. This command stops the full-text V2 service running on any node connected to the aggregator on which the command is run. Refer to FULLTEXT SERVICE STOP for more information.

  • New feature: Added the following Identifier Generation Functions:

    • UUID_TO_BIN

    • BIN_TO_UUID

    • IS_UUID

  • New feature: Added a SHOW FULLTEXT SERVICE METRICS command that displays the diagnostic metrics for the JLucene full-text search in JSON format. Refer to SHOW FULLTEXT SERVICE METRICS for more information.

  • New feature: Added a SHOW CDC EXTRACTOR POOL command that displays information about the CDC-in pipelines. Refer to SHOW CDC EXTRACTOR POOL for more information.

  • New feature: Added a new JSON_MERGE_PATCH function that merges two JSON objects into a single JSON object. Refer to JSON_MERGE_PATCH for more information.

  • New feature: Added support for Lateral Join. Lateral join allows a subquery in the FROM clause of a SQL query to reference another table in that same FROM clause, which can simplify query syntax. Refer to Lateral Join for more information.
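The lateral join semantics described above can be sketched outside SQL. In this minimal illustration, the "subquery" is re-evaluated for each row of the left table and can reference that row. The tables and the names `customers`, `orders`, `top_orders_for`, and `lateral_join` are hypothetical; the sketch shows the logical behavior only and is not SingleStore syntax.

```python
customers = [{"customer_id": 1, "name": "Ada"}, {"customer_id": 2, "name": "Bo"}]
orders = [
    {"customer_id": 1, "total": 30},
    {"customer_id": 1, "total": 80},
    {"customer_id": 1, "total": 50},
    {"customer_id": 2, "total": 20},
]

def top_orders_for(customer, limit=2):
    """The 'subquery': it may reference the current outer row (customer)."""
    matching = [o for o in orders if o["customer_id"] == customer["customer_id"]]
    return sorted(matching, key=lambda o: o["total"], reverse=True)[:limit]

def lateral_join(left_rows, subquery):
    """Re-evaluate `subquery` for each left row and combine the results."""
    return [{**left, **right} for left in left_rows for right in subquery(left)]

rows = lateral_join(customers, top_orders_for)
for row in rows:
    print(row["name"], row["total"])
```

A "top N per group" result like this one typically needs window functions or a correlated subquery without a lateral join, which is the kind of syntax simplification the feature targets.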

New or Modified Engine Variables

Refer to List of Engine Variables for information on each of the following engine variables.

  • Enhancement: Added a new engine variable sync_partitions_timeout_sec that specifies the timeout (in seconds) to synchronize the cluster metadata across the cluster.

  • Enhancement: Added a new engine variable disconnect_client_on_invalid_connection_state. If enabled, client connections are closed when their state becomes invalid.

  • Enhancement: Added a new engine variable synchronize_reference_timeout_ms that specifies the time (in milliseconds) long running queries wait for reference databases to synchronize on commit in the cluster.

  • Enhancement: Added a new engine variable assume_udfs_deterministic that controls behavior where SingleStore does extra work to avoid issues caused by UDFs that return different values when called repeatedly (i.e., are non-deterministic).

  • Enhancement: Added a new engine variable max_autostats_update_workers to tune the maximum number of background autostats update workers.

  • Enhancement: Added a new engine variable enable_writable_views that enables creation of writable views. Refer to CREATE VIEW for more information.

  • Enhancement: Added a new engine variable recovery_concurrency that controls the replay and database initialization concurrency during database recovery.

  • Enhancement: Updated the minimum value of json_document_max_children engine variable to 1 (from 128 previously).

  • Enhancement: Disabled the optimize_json_computed_column engine variable by default.

  • Enhancement: Added a new engine variable enable_block_level_stats_collection that controls the collection of block-level statistics for sub-segment elimination for flexible parallelism.

  • Enhancement: Added a new engine variable enable_block_stats_use_in_query that controls whether the block-level statistics are read and used during scan as part of sub-segment elimination for flexible parallelism.

  • Enhancement: Added a new engine variable pipelines_iceberg_heap_size to control heap size specifically for Iceberg pipelines.

  • Enhancement: Added json_collation global variable to control collation of JSON. The value of json_collation can be either utf8_bin or utf8mb4_bin.

  • Enhancement: Added a method to throttle upload ingest when blob cache has low evictability and running out of disk space is imminent. This is controlled via the following two new engine variables:

    • bottomless_upload_throttle_hard_limit_cache_usability: The usability (free space + evictable space) of blob cache below which all columnstore ingest is throttled.

    • bottomless_upload_throttle_soft_limit_cache_usability: The usability (free space + evictable space) of blob cache below which some columnstore ingest is throttled.

  • Enhancement: Added a new optimizer_not_null_filter_derivation engine variable that controls a new filter derivation rewrite.

  • Enhancement: Added a new engine variable observe_agg_timeout_secs that specifies the maximum time (in seconds) that an OBSERVE query can remain idle on an aggregator node before the query is terminated.

  • Enhancement: Added a new engine variable external_functions_service_buffer_mb that sets the maximum size (in MB) of the memory-mapped region used to communicate between the engine and collocated services.

  • Modification: During the upgrade to SingleStore 8.9, if the value of fts2_max_connections is equal to 100000, the value is set to 32.
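The relationship between the two bottomless_upload_throttle_* variables listed above can be pictured with a short sketch. It assumes that usability is expressed as a fraction of the total blob cache size and that both limits are fractions as well; the release notes do not state the units, so treat those as assumptions. The function name `ingest_throttle_level` and the sample numbers are hypothetical.

```python
def ingest_throttle_level(free_bytes, evictable_bytes, cache_bytes, soft_limit, hard_limit):
    """Classify how much columnstore ingest would be throttled.

    Usability is (free space + evictable space), expressed here as a
    fraction of the total cache size (an assumption for this sketch).
    """
    usability = (free_bytes + evictable_bytes) / cache_bytes
    if usability < hard_limit:
        return "all"    # below the hard limit: all columnstore ingest is throttled
    if usability < soft_limit:
        return "some"   # below the soft limit: some columnstore ingest is throttled
    return "none"

# Hypothetical limits: soft limit at 30% usability, hard limit at 10%.
print(ingest_throttle_level(40, 20, 100, soft_limit=0.3, hard_limit=0.1))
print(ingest_throttle_level(10, 10, 100, soft_limit=0.3, hard_limit=0.1))
print(ingest_throttle_level(2, 3, 100, soft_limit=0.3, hard_limit=0.1))
```

For the two limits to produce distinct stages, the soft limit needs to sit above the hard limit, so throttling starts partially and becomes total only as the cache approaches exhaustion.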

Miscellaneous

  • Enhancement: Added the DETERMINISTIC clause to the CREATE FUNCTION (UDF) command that instructs the query optimizer to assume that the created function is deterministic. Refer to CREATE FUNCTION (UDF) for more information.

  • Enhancement: Added support for the IGNORE <n> LINES clause to the INFER PIPELINE command for CSV files. Refer to Schema and Pipeline Inference for more information.

  • Enhancement: Added support for using the SKIP ALL ERRORS clause during creation of Kafka pipelines for ingesting JSON formatted data.

  • Enhancement: Added support for SKIP ALL ERRORS and SKIP PARSER ERRORS clauses during creation of Kafka pipelines for ingesting Avro formatted data.

  • Enhancement: Added the ability to load Kafka properties and headers with the get_kafka_pipeline_prop("<property>") function.

  • Enhancement: Added the ability to override the pipelines_max_offsets_per_batch_partition global variable for each Kafka pipeline using the MAX_OFFSETS_PER_BATCH_PARTITION pipeline variable in CREATE PIPELINE and ALTER PIPELINE commands.

  • New feature: Added support for the following parameters in the CONFIG clause of CREATE PIPELINE AS ... LOAD DATA S3 statement:

    • file_compression: Decompresses files with the specified extensions.

    • file_time_threshold: Only ingest files modified after the specified timestamp.

    Refer to S3 Configurations for more information.

  • Enhancement: Added the ability to re-optimize a query multiple times. Refer to Query Tuning for more information.

  • Enhancement: Added the ability to use connection links to load Avro and Parquet formatted data stored in an AWS S3 bucket.

  • Enhancement: Enabled auto PROFILE for INSERT...SELECT and REPLACE...SELECT query shapes.

  • New feature: Added the ENABLE_OVERWRITE clause in the SELECT ... INTO S3 and SELECT ... INTO LINK statements that enables the overwriting of existing files. Refer to SELECT … INTO S3 for more information.

  • Enhancement: Updated PROFILE to show number_of_blocks_tested_for_block_elim and number_of_blocks_eliminated_for_block_elim for sub-segment elimination for flexible parallelism.

  • Enhancement: Updated the JSON_EXTRACT_<type> functions to accept a JSON document as the only argument. With this enhancement, it is possible to extract from JSON documents with a string, numeric, boolean, or NULL value as the root of the document.

  • Enhancement: Updated the simplified syntax for the JSON_MATCH_ANY() function, to allow specifying MATCH_ELEMENTS by appending a * to the end of the keypath.

  • Enhancement: Added the ability to load CSV and JSON files from an Amazon S3 bucket using a LOAD DATA query.

  • Enhancement: Enhanced the TABLE() function to support DISTINCT.

  • Enhancement: Improved recovery time for tables with incremental autostats by recovering statistics from disk instead of rebuilding them from scratch.

  • Bugfix: Fixed an issue that caused a deadlock between the DROP TABLE and AGGREGATOR SYNC AUTO_INCREMENT queries.

  • Enhancement: Improved the logic for selecting vectors close to the vector threshold in vector range search.

  • Bugfix: Fixed an issue where attaching a leaf node to the Master Aggregator (MA) failed if the MA was still starting up.

  • Enhancement: Added support for BM25 partition-scoped scoring for phrase and proximity search queries.

  • Enhancement: Improved columnstore hash index performance on low-cardinality columns.

  • Enhancement: Improved the Debezium DDL statements parser.

  • Enhancement: Added row count estimate for joins in the output of EXPLAIN and PROFILE queries.

  • Enhancement: Added support for LIMIT and OFFSET clauses in non-equality WHERE conditions in subselects.

  • Enhancement: Added support for LATERAL joins for table-valued functions (TVFs).

  • Enhancement: SYNC DURABILITY is always enabled for reference databases. Reference tables in user databases that use async durability may notice a decrease in performance for DDL and DML statements.

  • Enhancement: Added support for HIGHLIGHT ... AGAINST as a computed column expression when creating a new table.

  • Enhancement: Added support for a query rewrite that enables hash joins when two tables are joined using the JSON_ARRAY_CONTAINS_<type> function.

  • Enhancement: Fixed an undefined behavior in the failure path of an OBSERVE query.

  • Enhancement: Added new support for sub-select to join rewrites for correlated subselects and nested scalar subselects.

  • Enhancement: Added support for null-accepting projections in scalar subselect queries.

  • Enhancement: Added support for LATERAL join subselects to reference any level of outer tables.

  • Enhancement: Changed the default collation to utf8mb4_general_ci and the default character set to utf8mb4 for TEXT and ENUM type columns for CSV, JSON, Avro, and Parquet formats in INFER PIPELINE AS LOAD DATA statements.

  • Bugfix: Fixed an issue that caused dangling compute sessions after a failed ALTER DATABASE command.

  • Bugfix: Fixed an issue where concurrent ALTER compute and REBALANCE operations created unlimited storage partitions with the wrong compute ID.

  • Bugfix: Fixed a race condition between the transaction log garbage collection and database transition-to-master that could result in unrecoverable partitions.

  • Enhancement: Updated the distributed OpenSSL license file to 1.0.2zj.

  • Enhancement: Added support for multi-column IN list predicate in the WHERE clause of a query.

  • Enhancement: Function mapped IN-lists now share the same signature in the plancache for the same set of built-in functions.

  • Bugfix: Removed soft lock on the CHARACTER SET clause and added a warning to indicate invalid character set value in the LOAD DATA statement.

  • Bugfix: Database names can no longer end with big numbers, such as db_<big_number>, to avoid conflicts with internal databases used in replication.

  • Enhancement: Each node in the cluster now validates the availability of bottle service every minute and records any consecutive failures in the LMV_EVENTS information schema view.

  • Enhancement: Added more information to the out-of-memory (OOM) errors.

  • Bugfix: Fixed an issue that occurred while parsing manifest files (associated with backup or restore operations) having more than 4096 characters.

  • Bugfix: Fixed an issue that caused duplication of storage blobs that have not been repaired yet for ongoing repair operations.

  • Bugfix: Fixed an issue in MySQL CDC-in pipelines where some MySQL tables with names containing _ were not being replicated.

  • Bugfix: Fixed a race condition that caused shutdown to wait on an idle async compile manager thread.

  • Bugfix: Fixed an issue where a repair operation got stuck while converting milestones.

  • Bugfix: Fixed a synchronization issue where the database was not immediately available after some clustering operations.

  • Enhancement: Added the number of partitions in the optimizer to the debug profile export.

  • Enhancement: Added support for VECTOR type in REDUCE built-in function.

  • Enhancement: Improved the lockdown message when changing collation or character set related engine variables within utf8mb4 character set.

  • Bugfix: Fixed an issue with PROMOTE AGGREGATOR ... MASTER command where a restart at the end of the command, followed by manually finishing the promote operation, caused reprovisioning of the old Master Aggregator.

  • Bugfix: Fixed an issue that caused an empty network prefetch queue.

  • Bugfix: Fixed a distributed deadlock caused by a query blocking reference database reprovisioning in one part of the deadlock cycle.

  • Enhancement: Added the following headers to Data API responses:

    • Cache-control: no-store

    • Strict-Transport-Security: max-age=31536000

    • X-Content-Type-Options: nosniff

  • Bugfix: Fixed an issue with counting common table expressions (CTEs) references in a query.

  • Bugfix: Fixed a memory corruption issue caused by a rare race condition involving MV_ACTIVE_TRANSACTIONS information schema view.

  • Enhancement: The sharding planner now recognizes non-union style, single-partitioned derived tables.

  • Enhancement: Disabled pipeline batches sample by default in memsql_exporter.

  • Enhancement: Improved message for an error where a global variable cannot be set because of a table sharded on a computed column.

  • Enhancement: Improved connection stability for MongoDB® CDC-in pipelines.

  • Enhancement: Upgraded the librdkafka library to version 2.4.0-3.

  • Bugfix: Fixed an issue where the Master Aggregator (MA) temporarily stopped behaving as the MA after being restarted.

  • Enhancement: Improved the performance of cluster operations through distributed plancache.

  • Bugfix: Fixed a bug with incorrect privilege checks.

  • Enhancement: Optimized row locking for internal transactions.

  • Enhancement: Improved performance of REVOKE in case of errors.

  • Bugfix: Fixed an engine crash caused when the Master Aggregator received a REMOVE AGGREGATOR query with its own <host>:<port>.

  • Enhancement: Reduced the chances of reference database reprovisioning in universal storage if a snapshot is taken concurrently with Master Aggregator shutdown.

  • Enhancement: The TO_JSON() and JSON_BUILD_OBJECT() built-in functions now convert VECTOR type arguments to a JSON array instead of a JSON string.

  • Enhancement: Added support for ZSTD compressed Kafka topics to Kafka pipelines.

  • Bugfix: Fixed an issue where ClampTimestamp spammed the tracelog.

  • Enhancement: Limited DR connection attempts to the primary node while it is failing connections, which avoids a substantial increase in TIME_WAIT sockets.

  • Enhancement: Improved error messages related to reprovisioning.

  • Enhancement: Implemented rotation and deletion of webproxy socket logs.

  • Enhancement: Kafka headers are now ingested into the SingleStore table if they are included in the Kafka message.

  • Enhancement: Lockdown hints for common table expressions (CTEs).

  • Enhancement: Locked select and row count hints for multi-table views.

  • Bugfix: Fixed an issue where slow snapshots blocked clustering operations.

  • Bugfix: Fixed an issue where unrecoverable reference databases did not auto-heal for a prolonged period of time and were blocked, for example, by alter operations.

  • Enhancement: Improved the error message for the "Correlated subselect that cannot be transformed and does not match on shard keys" error.

  • Bugfix: Fixed data loss in sync durability in a rare race condition.

  • Enhancement: Improved processing of queries with redundant (superfluous) EXISTS subselects.

  • Bugfix: Fixed the reference count of committed blobs on upgrade.

  • Enhancement: Added the ON NODE <node_id> clause to the SHOW PROFILE command, which forwards the command to another aggregator.

  • Enhancement: Improved enforcement of internal_columnstore_max_uncompressed_blob_size engine variable.

  • Enhancement: Added a new clause ENSURE_PARTITION_SAFETY to the REMOVE {LEAF | LEAVES} command that prevents a leaf (or leaves) from being removed if it contains the last online instance of a partition.

  • Enhancement: The memsql_exporter now collects additional fields from the mv_activities_extended_cumulative information schema view.

  • Enhancement: Removed the TxPartitions column and added a new TxTimestamp column for OBSERVE.

  • Enhancement: The InternalId in the OBSERVE output is now unique across partitions.

  • Enhancement: Added support for numeric range queries when doing full-text search against JSON fields.

    Refer to numeric range queries for more information.

  • Enhancement: Set collation utf8_bin for the JSON_TO_ARRAY builtin in cases where the input has utf8 character set, and set collation utf8mb4_bin in cases where the input has utf8mb4 character set.

  • Enhancement: If a primary key is defined for the table, the DELETE change records for OBSERVE queries now populate the primary key for tables instead of the internal ID.

  • Enhancement: Added information on remote API calls and bottle service reliability metrics to MV_BOTTOMLESS_STATUS_EXTENDED information schema view.

  • Enhancement: Added metrics to track the availability of unlimited storage and the bottle service to MV_BOTTOMLESS_SUMMARY information schema view.

  • Enhancement: Added additional binlog position updates for CDC-in pipelines in the extractor pool queue.

  • Enhancement: Added support for resumable offsets into columnar segments for OBSERVE. The OBSERVE query now returns 28-byte offsets. While 24-byte offsets are still valid with BEGIN AT clause, the "strictly increasing" guarantee does not hold when comparing 24-byte and 28-byte offsets.

  • Enhancement: OBSERVE now immediately flushes the result-set metadata to the client, eliminating the need to wait for rows to be returned to receive result metadata.

  • Enhancement: Information schema queries are no longer case-insensitive with respect to database names.

  • Enhancement: Extended the rewrite for joining tables on the JSON_ARRAY_CONTAINS_<type> function to support multiple predicates of that form on different columns.

  • Bugfix: Fixed a snapshot not found error.

  • Bugfix: Fixed an issue where the result of a seek operation was not handled correctly.

  • Enhancement: The scalar subselect requirement is now detected at runtime rather than at rewrite time for certain cases.

  • Bugfix: Fixed a bug in full-text search query compilation.

  • New Feature: High availability for the Master Aggregator (HA for MA) has been introduced. This enhances the reliability and failover capabilities, allowing mission-critical workloads to remain highly available. Refer to High Availability for the Master Aggregator for more information.

  • Enhancement: Enabled LOAD DATA queries to use the specified CHARACTER SET to ingest string type fields.

  • Enhancement: Optimized parsing of JSON computed columns. Refer to JSON Computed Column Optimization for more information.
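The rule behind the ENSURE_PARTITION_SAFETY clause listed above (a leaf is not removed if it holds the last online instance of a partition) can be expressed as a small check. This is a sketch of the rule as stated in these notes, not the engine's implementation; the function name `unsafe_partitions`, the `placement` mapping, and the leaf and partition names are hypothetical.

```python
def unsafe_partitions(placement, online_leaves, leaves_to_remove):
    """Return partitions that would lose their last online instance.

    placement: maps a partition name to the leaves that hold an instance of it.
    online_leaves: the set of leaves that are currently online.
    leaves_to_remove: the leaves named in the removal request.
    """
    removing = set(leaves_to_remove)
    at_risk = []
    for partition, leaves in placement.items():
        online_now = {leaf for leaf in leaves if leaf in online_leaves}
        remaining = online_now - removing
        # Unsafe only if the removal takes away every online instance.
        if online_now and not remaining:
            at_risk.append(partition)
    return sorted(at_risk)

# Hypothetical cluster: leaf3 is offline, so p1 has a single online instance on leaf2.
placement = {"p0": ["leaf1", "leaf2"], "p1": ["leaf2", "leaf3"]}
online_leaves = {"leaf1", "leaf2"}

print(unsafe_partitions(placement, online_leaves, ["leaf1"]))
print(unsafe_partitions(placement, online_leaves, ["leaf2"]))
```

In this example, removing leaf1 is safe because leaf2 still serves p0, while removing leaf2 would take away the only online instance of p1, which is the situation the clause is described as preventing.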

Last modified: November 15, 2024
