8.7 Release Notes

Default Charset and Collation

The default charset has changed in version 8.7:

  • The default server character set is now utf8mb4 (previously utf8).

  • The default collation is now utf8mb4_general_ci (previously utf8_general_ci).

Newly created workspaces will have these values as defaults, ensuring that characters with 4-byte encoding (such as emojis) are treated correctly. Existing workspaces will retain their current settings.

The following additional notes apply:

JSON Collation: JSON now has utf8mb4_bin collation unless collation is specified explicitly, regardless of the value of the default character set. This change can affect JSON columns in the information schema, as they will have utf8mb4_bin collation after an upgrade.

Upgrading: For all existing workspaces, the default collation and collations of TEXT/JSON columns will not be changed.

JSON_EXTRACT_STRING: To support using the JSON_EXTRACT_STRING built-in function on JSON with utf8mb4_bin collation, it is highly recommended to set the json_extract_string_collation engine variable to SERVER_V2, which is default beginning in version 7.8.21. If json_extract_string_collation is set to AUTO or SERVER, using a JSON argument with utf8mb4_bin collation may throw an error. To avoid this, set json_extract_string_collation to SERVER_V2.

Release Highlights

Note

This is the complete list of new features and fixes in engine version 8.7 and its maintenance patches. For a list of all new features and bug fixes added in maintenance releases only, see the maintenance release changelog. If you are upgrading from an existing 8.7 version, see the changelog for a list of new features and fixes added to 8.7 by date.

Vector range search allows users to query for vectors that have a similarity score beyond a specified threshold using a vector index. This feature supplements the existing indexed vector search in SingleStore. Refer to Vector Range Search for more information.

Full-Text Search Enhancement

Full-text search version 2 is the enhanced and recommended use of the full-text search functionality in SingleStore. The legacy full-text search functionality has been deprecated.

Full-text search version 2:

  • Supports integration with the Java Lucene library.

  • Supports BM25 relevancy scoring for term queries at a partition level.

  • Allows creation and querying of full-text indexes on JSON columns.

Refer to Working with Full-Text Search for more information.

Schema and Pipeline Inference

Table and pipeline definitions can be inferred from input files using the INFER and CREATE INFERRED commands which create definitions for table or pipeline based on input files. Refer to Schema and Pipeline Inference for more information.

Ingest from Apache Iceberg

Apache Iceberg tables can be directly ingested into SingleStore using pipelines. Refer to Iceberg Ingest for more information.

Feedback Reoptimization

This feature can be used to reoptimize a poorly performing query by feeding in the statistics collected from the query's initial run back into the optimizer instead of repeatedly using the original estimates. It helps fix query performance issues without the need to add any hints or rewrites. Refer to Query Tuning for more information.

Other Improvements and Fixes

Vector Type Fixes and Enhancements

Vector - Project Format

The default projection format for VECTOR-type columns now is binary format. The format can be controlled via the vector_type_project_format engine variable. The currently supported values are BINARY and JSON. Refer to Vector Type for more information and for examples of using the vector_type_project_format engine variable.

Vector - String Builtins

String builtins now operate on the binary representation of VECTOR arguments instead of the JSON string representation. String functions, such as CONCAT, LENGTH, and SUBSTR previously operated on the JSON representation of a vector, interpreted as a string. This behavior is deprecated. In the 8.7 release, string functions operate on the binary representation of a vector.

This is a breaking change; those using the VECTOR type should review their application for the use of string functions on VECTOR arguments before upgrading to 8.7. Refer to Vector Type for more information.

Vector - HEX Function

The HEX function now operates on the binary representation of VECTOR arguments instead of the JSON string representation.

This is a breaking change; those using the VECTOR type should review their application for the use of the HEX function on vector arguments before upgrading to 8.7. Refer to Vector Type for more information.

Vector Comparisons

Added the ability to compare VECTOR-type expressions vs. BLOBs.

Other Performance Enhancements

  • Enhancement: The library used for building IVF_PQFS indexes, FAISS, has been updated to improve the performance of IVF_PQFS indexes. With this change, nprobe is now a search option for IVF_PQFS indexes in addition to being an index-building option. Refer to Vector Indexing for more information.

  • Bugfix: Resolved inaccurate memory estimation due to out-of-sync stats between aggregators and leaf nodes by only using collated memory usage stat for profile queries.

  • Enhancement: Improved the performance of Point In Time Recovery (PITR).

  • Enhancement: Improved performance related to PIPELINES_BATCHES information schema table.

  • Enhancement: Improved query performance up to 2x when retrieving data blobs from the object store for unlimited storage databases. This is achieved by allowing blob files that are downloaded from the object store to fsync() in the background while enabling query threads to read them immediately. This process occurs transparently without any user intervention.

Query Optimization Enhancements:

  • Enabled decorrelating scalar subselects that are not equi-joins on fields. Decorrelation involves transforming a query that contains a correlated subselect into a join that can be more efficiently executed by the database engine.

  • Improved the optimization of queries containing ORDER BY and LIMIT in a subselect.

New Information Schema Views and Columns

  • Enhancement: Updated the MV_BOTTOMLESS_STATUS_EXTENDED information schema table (8.7.4):

    • Removed the LATEST_API_FAILURE_RATE and AVERAGE_API_LATENCY_US columns.

    • Added the following columns:

      SUCCESS_API_PER_SECOND - Number of successful remote storage api calls per second over the last 30 seconds.

      FAILED_API_PER_SECOND - Number of failed remote storage api calls per second over the last 30 seconds.

      AVERAGE_API_EVENT_LATENCY_US - Average latency to complete a remote storage api event over the past 30 seconds. Includes time taken on retries or waiting for rate limiting.

      AVERAGE_API_RESPONSE_LATENCY_US - Average latency of receiving a response for a remote storage api request. Excludes waiting for rate limiting or retries. NOTE: This is relevant only for S3. otherwise this field is identical to AVERAGE_API_EVENT_LATENCY_US.

  • Enhancement: Added the following monitoring metrics to the information_schema.mv_bottomless_status_extended view: NUM_SUCCESS_API_REQUESTS, NUM_FAILED_API_REQUESTS, LATEST_API_FAILURE_RATE, AVERAGE_API_LATENCY_US, IS_GC_RUNNING, NUM_REMOTE_FILES_DELETED, and NUM_SLOW_DOWNS. (8.7.2)

  • Added the new MV_VERSION_HISTORY view. This view shows the upgrade history for a workspace. There are also two related views: LMV_VERSION_HISTORY_EXTENDED and MV_VERSION_HISTORY_EXTENDED.

New Commands and Functions

  • Enhancement: Added Feedback Reoptimization via the REOPTIMIZE SQL command. The command is used to reoptimize a query based on previous executions of it. You can reoptimize a query multiple times, pick execution plans from any iteration, and then choose the best iteration's plan to commit to use. Syntax:

    REOPTIMIZE [[EXPLAIN] | [COMMIT]] [<reoptimize_level>]

    For more information, see REOPTIMIZE.

    Added in 8.7.7

  • Introduced the BM25( ) function. This function is used to employ BM25 partition-scoped scoring for VERSION 2 full-text search queries. Refer to Working with Full-Text Search for more information.

New or Modified Engine Variables

  • Enhancement: Enabled log_file_size_partitions and log_file_size_ref_dbs engine variables on SingleStore Helios. (8.7.7)

  • Enhancement: Added a new engine variable, sampling_estimate_approach, used to configure sampling in columnstore tables. (8.7.7)

    See Statistics and Sampling Concepts for more information.

  • An engine variable vector_type_project_format has been added to allow the users to control whether vectors are output in binary or JSON format. The default value is BINARY for new deployments, but will be set to JSON for upgrades from 8.5 and will not be changed in future upgrades. Refer to List of Engine Variables for more information.

  • Several engine variables have been introduced to configure and manage the enhanced full-text search (FTS V2) functionality. Refer to the Working with Full-Text Search page for more information. The new engine variables are:

    • fts2_java_home - If set, this provides the JAVA_HOME for the full-text search service.  Not configurable on SingleStore Helios.

    • fts2_java_path - If set, this provides the path for the Java executable used by the full-text search service. Not configurable on SingleStore Helios.

    • fts2_ctrl_endpoint - Collocated service control endpoint for the full-text search service. Not configurable on SingleStore Helios.

    • fts2_endpoint - Collocated service request endpoint for full-text search service. Not configurable on SingleStore Helios.

    • fts2_init_memory_mb - The amount of heap memory with which to start the Java process for the full-text search collocated service. If it is 0, the Java default is used.

    • fts2_max_connections - The maximum number of parallel requests that the full-text search collocated service can accept before queuing them.

    • fts2_max_idle_seconds - The number of seconds that the full-text search collocated service may be idle before shutting itself down.

    • fts2_max_memory_mb - The maximum amount of heap memory that the Java process for the full-text search collocated service may use. If it is 0, the Java default is used.

    • fts2_monitor_interval_secs - This is the number of seconds between monitor polls (0 disables it).

    • fts2_position_increment_gap - Sets the positionIncrementGap for use when indexing multi-valued fields. Concatenates multiple values within a field. This variable controls the amount of virtual white space that is inserted between same-field values during a search. This helps prevent proximity and phrase queries from erroneously matching across field instances. The default value gives good results in most cases.

    • fts2_query_timeout - Sets the maximum number of seconds that a query will wait for a response from the full-text service before returning a timeout error.

    • fts2_stop_timeout_secs - When the full-text search collocated service is shut down in a controlled manner, this sets the number of seconds to wait before it must be forcibly killed.

  • Full for table evictions is now the default value for the global engine variable enable_idle_table_eviction. Refer to Sync Variables Lists for more information.

  • Adjusted the lower threshold for max_allowed_packet engine variable. Refer to Sync Variables Lists for more information.

  • An engine variable enable_iceberg_ingest has been added which can be used to turn on the new Iceberg Ingest feature. Refer to List of Engine Variables for more information.

  • The enable_json_statistics engine variable is now configurable in SingleStore Helios.

  • Introduced an engine variable highlight_allow_ascii_binary_strings to control whether HIGHLIGHT allows ascii binary strings as input argument. This variable controls the behavior of HIGHLIGHT (expression) AGAINST (query_expression, [max_number_fragments]) statement if it is allowed to process ascii binary strings or not. Refer to the Sync Variables.

  • An engine variable enable_use_of_stale_incremental_stats has been added to allow the query optimizer to use stale statistics instead of querying for statistics during optimization.

  • Added an engine variable max_sys_mem_usage_for_load_data that sets percentage of maximum_memory available for allocation to the load_data_read_size and load_data_write_size variables. This variable limits the memory usage by LOAD DATA read/write operations

  • Added an engine variable enable_histogram_auto_creation_for_joins that enables the automatic creation of histograms based on JOIN clauses.

  • Added an engine variable pipelines_cdc_max_extractors that specifies the maximum number of CDC-in extractor instances that can run concurrently.

  • Added an engine variable pipelines_cdc_min_extractor_lifetime_s that specifies the minimum duration (in seconds) that the extractor allocates to a single pipeline for ingesting data and listening to CDC events.

Miscellaneous

  • Bugfix: Fixed an issue where indexes were not case-sensitive when performing full-text search against JSON keys. (8.7.7)

  • Bugfix: Fixed an issue where ALTER TABLE could throw errors related to VECTOR-type columns, even if those columns were not part of the ALTER. (8.7.7)

  • Bugfix: Fixed an issue in CDC-in pipelines causing delays between batches. (8.7.7)

  • Enhancement: Added websocket_proxy.log to the files managed via Trace Log Rotation. (8.7.7)

  • Enhancement: Added support for numeric range queries when doing full-text search against JSON fields. (8.7.7)

    See numeric range queries for more information.

  • Enhancement: Enabled lateral join subselects to reference fields in grandparent SELECT. (8.7.7)

  • Bugfix: Fixed an issue that could occur when replacing JSON expressions with computed columns in GROUP BY lists. (8.7.7)

  • Bugfix: Fixed display of microseconds for default TIMESTAMP(6) column values in system information commands (such as DESCRIBE TABLE or INFORMATION_SCHEMA.COLUMNS.COLUMN_DEFAULT). (8.7.6)

  • Bugfix: Fixed a potential false positive output for OBSERVE queries during the first delete of large columnstore segments. (8.7.6)

  • Enhancement: Allow leading 0x prefix for OBSERVE offsets. (8.7.6)

  • Bugfix: Fixed an issue where the output of the OBSERVE query for the first delete on a columnstore segment was the inverse of expected. (8.7.6)

  • Bugfix: Fixed an issue where user defined variables could read collation-related variables incorrectly during a SET @var = ... command. (8.7.6)

  • Bugfix: Fixed a table eviction issue that can crash a node when replaying straight-to-disk inserts on unique key tables during database recovery. (8.7.6)

  • Bugfix: Fixed an issue in MongoDB CDC-in pipelines which blocks data ingestion after ALTER PIPELINE. (8.7.6)

  • Enhancement: Allow NULL values in JSON columns when used with Full-Text search. (8.7.6)

  • Bugfix: Fixed several system warnings that occur when running a SELECT from the MV_VERSION_HISTORY information schema view. (8.7.6)

  • Enhancement: Improved performance of VECTOR-type User-Defined Variables(UDVs). No longer require using extra typecast to BLOB datatype. (8.7.6)

  • Enhancement: Added support for INNER/LEFT lateral joins. (8.7.5)

  • Bugfix: Fixed issue that caused protocol errors in the pipelines. (8.7.5)

  • Bugfix: Fixed an issue to now avoid prematurely assigning column reference to namespace. (8.7.5)

  • Bugfix: Temporarily disallow shallow table copy for tables created on version 8.0 or before. (8.7.4)

  • Enhancement: Enhance table pushdown to union rewrite to enable derived table pushdown based on row count estimation. (8.7.4)

  • Enhancement: Added support for ON NODE clause for SHOW PROFILE PROCESS|PLAN, which forwards the command to another aggregator. (8.7.4)

  • Enhancement: Improved performance of the CREATE PROJECTION command. (8.7.4)

  • Bugfix: Unneeded sort fields will no longer be projected to the aggregator. (8.7.4)

  • Enhancement: Improved the performance of CREATE TABLES AS INFER PIPELINE. (8.7.4)

  • Bugfix: Fixed an issue that could occur where a join with TABLE(JSON_TO_ARRAY(JSON_EXTRACT_JSON())) is performed with numeric keys in the JSON_EXTRACT_JSON() statement could possibly lead to incorrect output or crashes. (8.7.4)

  • Bugfix: Identified an upgrade scenario that causes a blob leak and added a command to delete the leaked blobs. As a result, temporarily disallowing shallow table copy for tables created on version 8.0 or before. (8.7.4)

  • Enhancement: Set collation utf8_bin for the JSON_TO_ARRAY builtin in cases where the input has utf8 charset, and set collation utf8mb4_bin in cases where the input has utf8mb4 charset. (8.7.4)

  • Bugfix: An error is reported when inserting incorrect data into vector column for any data_conversion_compatibility_level. (8.7.4)

  • Bugfix: Fixed a possible connectivity issue and eliminated delay when starting the Lucene service for the first time. (8.7.4)

  • Bugfix: Fixed an edge case where unlimited storage garbage collection was causing the master aggregator to run out of connections. (8.7.3)

  • Bugfix: Fixed a crash that occurred when dropping a database that contained tables with both very high and low column IDs. (8.7.3)

  • Bugfix: Fixed a crash that occurred when spilling a hash join operation with at least 3 tables. (8.7.2)

  • Bugfix: Fixed a crash that occurred when running the SHOW CREATE PROJECTIONS command when using the database name in the syntax (SHOW CREATE PROJECTIONS database_name.projection_name...). (8.7.2)

  • Bugfix: Fixed a bug where the full-text service controller would periodically restart the process when the started process hadn't fully completed. (8.7.2)

  • Bugfix: Fixed an issue where INSERT ... SELECT into VECTOR type columns returned an error. (8.7.2)

  • Bugfix: Fixed issues with the TO_CHAR() function where results were truncated or reading communication packet errors were generated. (8.7.2)

  • Bugfix: Fixed an integer overflow issue in unique integer columns by changing the storage type to a 32-bit unsigned integer. (8.7.2)

  • Bugfix: Fixed incorrect results in an uncommon scenario where hash join operators with an optimization path need to spill. (8.7.2)

  • Enhancement: The memsql_exporter now collects additional fields from the mv_activities_extended_cumulative information schema view. (8.7.2)

  • Enhancement: Updated the default client.id for the Kafka configuration to the SingleStore partner ID. (8.7.2)

  • Enhancement: Added support for Iceberg ingest via Snowflake, REST, and JDBC. (8.7.2)

  • Enhancement: Added support for BM25 full-text search (FTS) as a built-in feature. (8.7.2)

  • Bugfix: Fixed an edge case regression with SingleStore Kai and information schema queries when using multiple workspaces and databases. (8.7.2)

  • Enhancement: Indexed vector search has been upgraded so that filters are applied before index search. This removes the need to adjust the search parameter k in indexed searches with a filter.

  • Enhancement: Introduced the SUB_PARTITIONS n argument to the CREATE DATABASE command. This argument allows you to set the number of sub-partitions per physical partition when a new database is created. Using this optional argument in the CREATE DATABASE command will overwrite the value set by the sub_to_physical_partition_ratio variable. The value of n must be set at 0 or a power of 2 value of the physical partition count.

  • Enhancement: A user with both the GRANT OPTION and SUPER privileges can now grant and remove any privilege to the users (including themselves) within a scoped database.

  • Enhancement: The deprecated Strict Mode feature is officially removed from SingleStore.

  • Enhancement: Improved error reporting for exceeding stack memory during SQL parsing for CTEs.

  • Bugfix: Fixed a bug that could cause scans on non-unique indexes to return incorrect results or crash.

  • Bugfix: Fixed a crash in a rare case involving ALTER TABLE and failovers.

  • Bugfix: Fixed a bug where the cleanup of unlimited storage data outside of the retention period is delayed when a database is restored.

  • Enhancement: Improved error messaging for unsupported index types on columnstore tables.

  • Bugfix: Fixed an error with the  total cache size after an OPTIMIZE TABLE WARM BLOB CACHE command.

  • Enhancement: Updated the default return type of the FORMAT function to VARCHAR.

  • Bugfix: Fixed a bug with LIMIT 0 in an EXIST clause in a subselect such as WHERE EXISTS (SELECT … LIMIT 0).

  • Bugfix: Fixed incorrect results that occur in some DELETE commands that contain a self-join and a LIMIT clause.

  • Bugfix: Fixed incorrect results and deadlocks that occur in some INSERT SELECTS inside multi-statement transactions.

  • Bugfix: Restricted queries on information_schema tables to be case-sensitive with respect to database names.

  • Bugfix: Table-level locking (such as large UPDATEs or DELETEs) no longer blocks the background flusher.

  • Bugfix: Fixed an error encountered while backing up a database using an hyphenated resource group name.

  • Enhancement: Unnecessary LIMIT clauses are removed when combined with unique key filters.

  • Bugfix: Fixed an issue where histograms are built from up to 10x more rows than desired.

  • Enhancement: Added file lock to prevent multiple SingleStore instances starting on the same datadir.

  • Enhancement: Disabled JSON_TO_ARRAY TVF join optimization if query does not use the TVF output column.

  • Bugfix: An info_schema.mv_cached_blobs query will no longer affect blob cache's internal eviction performance.

  • Bugfix: Fixed an unlimited storage issue with checking for missing files before downloading.

  • Bugfix: Fixed a rare situation where a windows function would not be killable.

  • Bugfix: Fixed a rare crash scenario with mv views.

  • Enhancement: Added support histogram data collection for BSON keys.

  • Enhancement: Allow using TABLE builtin with info schema tables (for example: SELECT * FROM information_schema.tables JOIN TABLE(SPLIT(table_name, '_')) LIMIT 2;).

  • Enhancement: Added support for expressions with VALUE() in the delete condition for ON DUPLICATE KEY DELETE in pipelines.

  • Enhancement: Engine variable optimizer_empty_tables_limit restricts to only show rule-based optimization warning message when cost based optimization is meaningful.

  • Bugfix: Fixed an issue in backup restore when the backup has a corrupted GV timestamp in the about snapshot record.

  • Enhancement: Provided the ability to collect non-truncated pipeline errors (CREATE PIPELINE ... DEBUG).

  • Enhancement: Improved performance for SELECT INTO kafka.

  • Enhancement: Added support for correlated subselects with LIMIT. ORDER BY and OFFSET are optional, but if they are present, they are also supported.

  • Enhancement: Allow EXPLAIN for ATTACH DATABASE READ ONLY. Nothing is returned if it could not be attached read only.

  • Enhancement: The EVENT_TYPE column of the MV_BOTTOMLESS_API_EVENTS view was modified to include the COPY event.

  • Enhancement: Added two file types to mv_bottomless_api_events: Backup Blob and Backup Snapshot

  • Enhancement: Added a visibility (security) check for detached databases.

  • Enhancement: Added JSON validation for link credentials and configuration.

  • Enhancement: Added the optional IF NOT EXISTS clause to the CREATE LINK and CREATE RESOURCE POOL statements.

  • Bugfix: Fixed a bug that blocked histograms from being enabled on remote tables.

  • Bugfix: LAST_INSERT_ID will now correctly retrieve the last inserted ID for a forwarded INSERT query.

  • Enhancement: Added support for mongodb.connection.string parameter in the CONFIG/CREDENTIAL clause of the CREATE AGGREGATOR PIPELINE ... AS LOAD DATA MONGODB statement. Refer to Load Data from MongoDB® for more information.

  • Bugfix: Fixed an issue with incorrect results in information_schema.PROJECTIONS when filtering on the PROJECTION_NAME column.

In this section

Last modified: July 23, 2024

Was this article helpful?