SingleStore Helios Release Notes

New features added to SingleStore Helios listed by month.

January 2025

Highlights

Full-Text Search - Analyzers and Tokenizers

Full-text search using SingleStore's VERSION 2 full-text index has been enhanced with support for custom analyzers and tokenizers. With this enhancement, SingleStore full-text indexes can be created to support languages other than English, for text that contains emails and URLs, with custom whitespace processing, and more. The n_gram tokenizer, which breaks words into n-grams, sequences of n adjacent symbols is included. The full set of Apache Lucene analyzers and tokenizers is supported. Refer to Full Text VERSION 2 Custom Analyzers for more information.

Full-Text Search - Enhanced BM25 Scoring

Full-text search using SingleStore's VERSION 2 full-text index has updated BM25 scoring functionality. 

The BM25 function has been enhanced with support for boolean and boost queries, phrase and proximity search queries, and queries over multiple columns. Refer to BM25 for more information and examples.

A new function, BM25_GLOBAL, has been added to provide BM25 scoring across all partitions. With this new function, all rows in a table are scored together; collection and term statistics are calculated for a table, ensuring accurate scores relative to all rows in a table. The BM25_GLOBAL function augments the existing BM25 and MATCH functions, and it is more accurate and more expensive than both of these functions. Refer to BM25 for more information on the BM25_GLOBAL function.

Iceberg Continuous Ingest

Added support for continuous ingest of data from Iceberg tables. Upsert and append-only workloads are supported. In addition, manual upserts with the CREATE OR REPLACE command are supported. Refer to Iceberg Ingest for more information.

Iceberg - New Catalogs

Added support for Snowflake, REST, JDBC, Hive, and Polaris Catalogs for Iceberg Ingest using pipelines. Refer to Iceberg Ingest for more information.

Enhanced Disk Spilling

Added disk spilling for RIGHT and FULL OUTER JOIN.

Writable Views

Writable views allow users to run UPDATE, INSERT, and DELETE queries on views. To enable writable views, set the enable_writable_views global variable to 1. Query the information_schema.VIEWS view to inspect if a view can be updated. Refer to CREATE VIEW for more information.

Other Improvements and Fixes

Vector Index on Nullable Column

Vector Indexes can be created on columns that are nullable. Prior to this improvement, vector indexes could only be created on columns that were declared NOT NULL. With this improvement, a user can insert a row containing text and a NULL vector value into a table with a vector index. The user can subsequently obtain a vector embedding for the text and update the row with that vector value. The updated value will be added to the vector index.

Vector Index Memory Tracking

The memory used by vector indexes can be tracked using the alloc_vector_index metric which is now available in SHOW STATUS EXTENDED. Refer to Vector Indexing and Tuning Vector Indexes and Queries for more information.

November 2024

  • Enhancement: Improved out of memory (OOM) error message to make the message more accurate.

October 2024

  • Enhancement: Improved the performance of regular expression (regex) operations.

  • Enhancement: Improved the performance of REPLACE query into a columnstore table when table-level locking is triggered.

  • Enhancement: Added a variable, max_autostats_update_workers, to tune the maximum number of background autostats update workers.

  • Enhancement: Improved the Debezium DDL statements parser.

  • Enhancement: Improved the Debezium DDL statements parser.

September 2024

  • Enhancement: Added single-parameter JSON_EXTRACT builtins.

  • Enhancement: Disabled the optimize_json_computed_column engine variable by default.

  • Bugfix: Fixed a crash when using a unique key to upsert into a table when JSON column optimization is enabled.

  • Bugfix: Fixed a certificate refresh issue for connections to remote storage in long-running clusters.

  • Enhancement: Both END AT TAIL and the DDL gracefully end OBSERVE queries instead of producing an error.

  • Enhancement: Added a new observe_agg_timeout_secs global variable that sets the number of seconds an OBSERVE query can block on sending data before giving the consumer a network error.

  • Bugfix: Fixed an undefined behavior in the failure path of an OBSERVE query.

  • Enhancement: Where appropriate, the DELETE records for OBSERVE queries now populate the primary key for tables instead of the internal ID.

  • Enhancement: Removed the TxPartitions column and added a new TxTimestamp column for OBSERVE.

  • Enhancement: The internal IDs for OBSERVE are now unique across partitions.

  • Enhancement: OBSERVE now immediately flushes the result-set metadata to the client, eliminating the need to wait for rows to be returned to receive result metadata.

  • Bugfix: Fixed the failure to reset the socket-write timeout after an OBSERVE query has completed.

  • Enhancement: Reading segments in OBSERVE now respects KILL QUERY.

  • Bugfix: Fixed misleading KILL QUERY tracelog messages in OBSERVE.

  • Bugfix: Fixed connection reset errors when the value in the observe_leaf_timeout_secs engine variable was not used correctly with OBSERVE.

  • Bugfix: Fixed BEGIN AT in OBSERVE when the offset is in the middle of a snapshot on an unlimited storage database.

  • Enhancement: Added an information_schema.OBSERVE_DATABASE_OFFSETS view for obtaining various OBSERVE offsets.

  • Enhancement: Added support for resumable offsets into columnar segments for OBSERVE. The OBSERVE query now returns 28-byte offsets. While 24-byte offsets are still valid with BEGIN AT, the "strictly increasing" guarantee does not hold when comparing 24-byte and 28-byte offsets.

  • Bugfix: Fixed OBSERVE returning the wrong transaction ID (txId) when resuming from some transactions.

  • Bugfix: Fixed an issue where the output of OBSERVE would be incorrect when performing a delete on a large columnstore segment.

  • Bugfix: Fixed some cases of an incorrect transaction ID (txId) for snapshot records. BeginSnapshot/CommitSnapshot can now report nested BeginTransaction/CommitTransaction when there are unresolved distributed transactions while the snapshot is taken.

  • Enhancement: Added metrics to track the availability of unlimited storage and the bottle service.

  • Enhancement: Added support for query rewrite that enables hash joins when two tables are joined together using the JSON_ARRAY_CONTAINS_<type> function.

  • Enhancement: Added bottle service reliability metrics to MV_BOTTOMLESS_STATUS_EXTENDED.

  • Enhancement: Fixed the KILLED_QUERIES counter in MV_RESOURCE_POOL_STATUS.

  • Enhancement: Added support for query rewrite for certain query patterns with EXISTS and subselects.

  • Enhancement: Added support for BM25 over multiple columns.

  • Enhancement: Added support for BM25 partition-scoped scoring for Boolean and boost queries.

  • Bugfix: Fixed a rare issue where a membership-related clustering operation could delay the re-provision or deletion of a cluster database.

  • Enhancement: Added the json_collation engine variable to control JSON collation, the value of which can either be utf8_bin or utf8mb4_bin.

  • Enhancement: Improved the logic for selecting vectors close to the distance threshold during vector range search.

  • Bugfix: Fixed an issue where JSON_MATCH_ANY() queries with MATCH_ELEMENTS evaluated incorrectly on non-array values.

  • Enhancement: Updated JSON_MATCH_ANY() simplified syntax to allow specifying MATCH_ELEMENTS by appending a '*' to the end of the keypath.

  • Enhancement: Added support for BM25 partition-scoped scoring for phrase and proximity search queries.

  • Bugfix: Fixed sporadic reading failures from MV_SYSINFO_CPU.

  • Bugfix: Clarified the error message produced in some cases where a snapshot gets aborted.

  • Bugfix: Fixed schema history in incremental snapshot mode for MySQL CDC.

  • Enhancement: The optimize_json_computed_column engine variable is now disabled by default. This only impacts new workspaces/clusters.

August 2024

  • New Feature: Added the new FULLTEXT SERVICE STOP command. This command stops the full-text V2 service running on any node connected to the aggregator on which the command is run.

    Syntax:

    FULLTEXT SERVICE STOP

    FULLTEXT SERVICE STOP requires the PROCESS permission.

    This command is useful when making configuration changes to the full-text service via engine variables that only take effect when the service (re)starts (fts2_max_connections, for example). Also can be used to force a reclaim of resources the service might be using before the idle-out time is reached.

    The service run state can be checked using SHOW FULLTEXT SERVICE STATUS. (8.7.12)

  • Enhancement: Significant overall performance improvements for the full-text V2 service.

  • Bugfix: Fixed an intermittent hang under load issue with the full-text V2 service.

  • Bugfix: Reduce server_id collision probability for MySQL CDC.

  • Bugfix: Fixed an issue with json computed column optimization crashes on computed column shard key.

  • Enhancement: When parametrize_json_keys is enabled, numeric keys will be parametrized as well. Also fixed a crash in queries with json extracts within JSON MATCH ANY (when parametrize_json_keys is switched on).

  • Enhancement: Update to certificate loading behavior. Prior to this change, only the first certificate was loaded from a server's .pem file specified via the server configuration. Now all certificates are loaded, so if a .pem file contains a certificates chain, then intermediate CA certificates will be returned by the server.

  • Enhancement: Updated the distributed OpenSSL license file to to 1.0.2zj

  • Bugfix: Fixed an issue that caused INFER of CDC-in pipelines to fail with "Failed to allocate slot in the extractor pool" error after increasing the limit.

  • Bugfix: Fixed an issue where concurrent DROP TABLEs could block a snapshot from running indefinitely.

  • Bugfix: Address an issue with rewriting of scalar subqueries with null-accepting projections.

  • Enhancement: Added X-Content-Type-Options: nosniff, Strict-Transport-Security: max-age=31536000, Cache-control: no-store headers to Data API responses.

  • Enhancement: Added a new engine variable, optimize_json_computed_column, which is used to enable or disable optimization for json computed columns. Enabled by default.

  • Enhancement: Added new engine variables for full-text search, used to fine-tune Lucene service connections: fts2_socket_timeout_ms and fts2_socket_ttl_ms. These variables control the service's socket timeout and time-to-live.

  • Enhancement: The default value for the fts2_max_connections engine variable has been increased to 100000.

  • Enhancement: Improved throughput for full-text v2 queries.

  • Enhancement: Optimized full-text queries that contain an ORDER BY ... LIMIT ... over a full-text score and optionally filter on the same full-text clause.

  • Bugfix: Fixed an issue in vector range search with DOT PRODUCT as the distance metric.

  • Bugfix: Fixed an issue with lateral join queries with identical aliases.

  • Bugfix: Fixed a possible race condition with DR connections that could cause repeated connection failures when attempting to replicate from the primary.

  • Bugfix: Fixed a rare race condition where PITR would incorrectly operate on an unrecoverable database.

July 2024

  • Bugfix: Fixed a password validation error on ADD AGGREGATOR.

  • Enhancement: Added a new engine variable: assume_udfs_deterministic. This variable controls behavior where SingleStore will do extra work to avoid issues caused by UDFs that return different values when repeatedly invoked (e.g., are non-deterministic). This variable defaults to false, preserving the existing behavior (since SingleStore version 8.0), where SingleStore will ensure these computations take place once on an aggregator node. When set to true, SingleStore will avoid this extra layer of coordination (and bottleneck) allowing leaf nodes to independently coordinate.

  • Enhancement: Added a new engine variable: external_functions_service_buffer_mb. Sets the maximum size (in mb) of the memory-mapped region used to communicate between the engine and collocated services.

  • Bugfix: Fixed an issue that could occur when replacing JSON expressions with computed columns in GROUP BY lists.

  • Bugfix: Fixed potential deadlock in Lucene service under high load.

  • Bugfix: Fixed an issue where in a VM environment SHOW LEAVES could show corrupted latency values due to incorrect values from the system clock if the VM is stopped or migrated.

  • Bugfix: Fixed a vector index issue in unlimited storage that could make a vector index blob file in accessible.

  • Enhancement: Added a new engine variable, sampling_estimate_approach, used to configure sampling in columnstore tables.

    See Statistics and Sampling Concepts for more information.

  • Bugfix: Fixed an issue where indexes were not case-sensitive when performing full-text search against JSON keys.

  • Bugfix: Fixed an issue where ALTER TABLE could throw errors related to VECTOR-type columns, even if those columns were not part of the ALTER.

  • Bugfix: Fixed an issue in CDC-in pipelines causing delays between batches.

  • Enhancement: Added Feedback Reoptimization via the REOPTIMIZE SQL command. The command is used to reoptimize a query based on previous executions of it. You can reoptimize a query multiple times, pick execution plans from any iteration, and then choose the best iteration's plan to commit to use. Syntax:

    REOPTIMIZE [[EXPLAIN] | [COMMIT]] [<reoptimize_level>]

    For more information, see REOPTIMIZE. (8.7.7)

  • Enhancement: Added websocket_proxy.log to the files managed via Trace Log Rotation.

  • Enhancement: Added support for numeric range queries when doing full-text search against JSON fields.

    Refer to numeric range queries for more information.

  • Enhancement: Enabled lateral join subselects to reference fields in grandparent SELECT.

  • Bugfix: Fixed an issue that could occur when replacing JSON expressions with computed columns in GROUP BY lists.

  • Enhancement: Enabled log_file_size_partitions and log_file_size_ref_dbs engine variables on SingleStore Helios.

  • Bugfix: Fixed display of microseconds for default TIMESTAMP(6) column values in system information commands (such as DESCRIBE TABLE or INFORMATION_SCHEMA.COLUMNS.COLUMN_DEFAULT).

  • Bugfix: Fixed a potential false positive output for OBSERVE queries during the first delete of large columnstore segments.

  • Enhancement: Allow leading 0x prefix for OBSERVE offsets.

  • Bugfix: Fixed an issue where the output of the OBSERVE query for the first delete on a columnstore segment was the inverse of expected.

  • Bugfix: Fixed an issue where user defined variables could read collation-related variables incorrectly during a SET @var = ... command.

  • Bugfix: Fixed a table eviction issue that can crash a node when replaying straight-to-disk inserts on unique key tables during database recovery.

  • Bugfix: Fixed an issue in MongoDB® CDC-in pipelines which blocks data ingestion after ALTER PIPELINE.

  • Enhancement: Allow NULL values in JSON columns when used with Full-Text search.

  • Bugfix: Fixed several system warnings that occur when running a SELECT from the MV_VERSION_HISTORY information schema view.

  • Enhancement: Improved performance of VECTOR-type User-Defined Variables(UDVs). No longer require using extra typecast to BLOB datatype.

  • Bugfix: Fixed an issue to now avoid prematurely assigning column reference to namespace.

  • Enhancement: Added support for INNER/LEFT lateral joins.

  • Bugfix: Fixed issue that caused protocol errors in the pipelines.

  • Bugfix: Temporarily disallow shallow table copy for tables created on version 8.0 or before.

  • Enhancement: Enhance table pushdown to union rewrite to enable derived table pushdown based on row count estimation.

  • Enhancement: Added support for ON NODE clause for SHOW PROFILE PROCESS|PLAN, which forwards the command to another aggregator.

  • Enhancement: Improved performance of the CREATE PROJECTION command.

  • Bugfix: Unneeded sort fields will no longer be projected to the aggregator.

  • Enhancement: Improved the performance of CREATE TABLES AS INFER PIPELINE.

  • Bugfix: Fixed an issue that could occur where a join with TABLE(JSON_TO_ARRAY(JSON_EXTRACT_JSON())) is performed with numeric keys in the JSON_EXTRACT_JSON() statement could possibly lead to incorrect output or crashes.

  • Bugfix: Identified an upgrade scenario that causes a blob leak and added a command to delete the leaked blobs. As a result, temporarily disallowing shallow table copy for tables created on version 8.0 or before.

  • Enhancement: Set collation utf8_bin for the JSON_TO_ARRAY builtin in cases where the input has utf8 charset, and set collation utf8mb4_bin in cases where the input has utf8mb4 charset.

  • Bugfix: An error is reported when inserting incorrect data into vector column for any data_conversion_compatibility_level.

  • Bugfix: Fixed a possible connectivity issue and eliminated delay when starting the Lucene service for the first time.

  • Enhancement: Updated the MV_BOTTOMLESS_STATUS_EXTENDED information schema table:

    • Removed the LATEST_API_FAILURE_RATE and AVERAGE_API_LATENCY_US columns.

    • Added the following columns:

      SUCCESS_API_PER_SECOND - Number of successful remote storage api calls per second over the last 30 seconds.

      FAILED_API_PER_SECOND - Number of failed remote storage api calls per second over the last 30 seconds.

      AVERAGE_API_EVENT_LATENCY_US - Average latency to complete a remote storage api event over the past 30 seconds. Includes time taken on retries or waiting for rate limiting.

      AVERAGE_API_RESPONSE_LATENCY_US - Average latency of receiving a response for a remote storage api request. Excludes waiting for rate limiting or retries. NOTE: This is relevant only for S3. otherwise this field is identical to AVERAGE_API_EVENT_LATENCY_US.

June 2024

A major update was released this month. Please see the associated release notes for a complete list of new features, enhancements, and bugfixes.

In addition the the large update, the following updates have been made this month:

  • Enhancement: Added the enable_use_of_stale_incremental_stats engine variable which allows the optimizer to use stale stats instead of querying for stats during optimization.

  • Enhancement: Added the throttle_replay_behind_tail_mb engine variable which controls how far the system allows a child aggregator to lag behind replicating the transaction log of its master aggregator before throttling is applied to the master aggregator.

  • Bugfix: Fixed issues with the TO_CHAR() function where results were truncated or reading communication packet errors were generated.

  • Bugfix: Fixed an integer overflow issue in unique integer columns by changing the storage type to a 32-bit unsigned integer.

  • Bugfix: Fixed incorrect results in an uncommon scenario where hash join operators with an optimization path need to spill.

  • Enhancement: The memsql_exporter now collects additional fields from the mv_activities_extended_cumulative information schema view.

  • Enhancement: Updated the default client.id for the Kafka configuration to the SingleStore partner ID.

  • Enhancement: Added the following monitoring metrics to the information_schema.mv_bottomless_status_extended view: NUM_SUCCESS_API_REQUESTS, NUM_FAILED_API_REQUESTS, LATEST_API_FAILURE_RATE, AVERAGE_API_LATENCY_US, IS_GC_RUNNING, NUM_REMOTE_FILES_DELETED, and NUM_SLOW_DOWNS.

  • Bugfix: Fixed a crash that occurred when spilling a hash join operation with at least 3 tables.

  • Bugfix: Fixed a crash that occurred when running the SHOW CREATE PROJECTIONS command when using the database name in the syntax (SHOW CREATE PROJECTIONS database_name.projection_name...).

  • Bugfix: Fixed a bug where the full-text service controller would periodically restart the process when the started process hadn't fully completed.

  • Bugfix: Fixed an issue where INSERT ... SELECT into VECTOR type columns returned an error.

  • Enhancement: Added support for Iceberg ingest via Snowflake, REST, and JDBC.

  • Enhancement: Added support for BM25 full-text search (FTS) as a built-in feature.

  • Bugfix: Fixed an edge case regression with SingleStore Kai and information schema queries when using multiple workspaces and databases.

  • Bugfix: Fixed a crash that occurred when dropping a database with tables containing very high and very low column IDs.

May 2024

  • Bugfix: Addressed a crash issue with REGEX_REPLACE.

  • Bugfix: Fixed an issue in ALTER TABLE when adding a computed column containing JSON_BUILD_OBJECT.

  • Bugfix: Reduced stack usage during compilation for some CASE expressions with many WHEN-THEN statements.

  • Bugfix: Fixed a potential deadlock issue between background garbage collector threads and foreground query execution threads.

  • Bugfix: Fixed a collation issue in the JSON_COLUMN_SCHEMA information schema table.

  • Enhancement: SingleStore now supports module eviction of idle tables with an ALTER history. This means that, prior to the 8.5.20 release, tables that had been altered could not be evicted. Altered here includes CREATE INDEX, ALTER TABLE ADD COLUMN, and some others.

  • Bugfix: Fixed a bug with Wasm UDFs where REPLACE FUNCTION did not correctly evict from the plancache, causing continued usage of the old UDF after replacement.

  • Bugfix: Fixed an upgrade issue present in version 8.5.17.

  • Bugfix: Fixed a BSON issue that could occur when large documents are read and mistakenly reported as corrupted.

  • Enhancement: Upgraded OpenSSL 1.0.2u to 1.0.2zj

  • Bugfix: Fixed a crash issue that could occur under a rare race condition in some distributed join queries.

  • Bugfix: Fixed statement atomicity in transactions writing to tables that have projections on them.

  • Enhancement: Allow Kerberos users to skip password validation for users who are authenticated with plugins.

April 2024

  • Enhancement: Added support for loading data from compressed Avro datafiles.

  • Bugfix: Fixed an issue that occurred when primary key is NULLABLE when copying a table using "Create table as select" statement.

  • Bugfix: Addressed a crash issue occurring while executing queries with vector builtins on clusters running via Docker containers on Mac machines equipped with Apple silicon processors (M1).

  • Bugfix: No longer show information_schema pipeline metadata of databases that are not attached to the workspace.

  • Enhancement: Use SeekableString encoding by default for storing VECTOR datatype columns in columnstore tables.

  • Bugfix: Fixed an issue that caused rare "Arrangement already exists in namespace" errors in recursive CTE queries.

  • Bugfix: Improved the hashGroupBy spilling condition to trigger (spill) when appropriate.

  • Enhancement: Added a new engine variable costing_max_djo_tables which sets the maximum amount of tables we allow in full DQO estimate search when filling table row estimates for distributed join hints.

  • Enhancement: Made the following engine variables configurable on SingleStore Helios:

    multi_insert_tuple_count, columnstore_row_value_table_lock_threshold, internal_columnstore_max_uncompressed_blob_size, internal_columnstore_max_uncompressed_rowindex_blob_size, columnstore_validate_blob_before_merge, default_max_histogram_buckets, enable_varbuffer_dictionary_compression, enable_alias_space_trim, skip_segelim_with_inlist_threshold, default_autostats_histogram_mode, default_autostats_columnstore_cardinality_mode, default_autostats_rowstore_cardinality_mode, default_autostats_columnstore_sampling, experimental_disable_multipartition_parallel_read, internal_enable_parallel_query_throttling, enable_spilling, spilling_node_memory_threshold_ratio, spilling_query_operator_memory_threshold, regexp_compile_mem_mb, selectivity_estimation_min_sample_size, repartition_minimum_buffer_size, use_seekable_json, json_document_max_children, json_document_max_leaves, json_document_sparse_children_check_threshold, json_document_sparse_children_check_ratio, json_document_absolute_sparse_key_check_ratio, json_document_page_data_soft_limit, ignore_insert_into_computed_column, max_subselect_aggregator_rowcount, leaf_pushdown_enable_rowcount, reshuffle_group_by_base_cost, max_broadcast_tree_rowcount, enable_broadcast_left_join, display_full_estimation_stats, force_bushy_joins, force_heuristic_rewrites, force_table_pushdown, force_bloom_filters, explain_joinplan_costs, statistics_warnings, optimizer_warnings, optimize_constants, optimize_huge_expressions, optimize_expressions_larger_than, optimize_stmt_threshold, quadratic_rewrite_size_limit, batch_external_functions, batch_external_functions_size, optimizer_enable_json_text_matching, optimizer_enable_orderby_limit_self_join, distributed_optimizer_broadcast_mult, distributed_optimizer_old_selectivity_table_threshold, distributed_optimizer_selectivity_fallback_threshold, distributed_optimizer_verbose, old_local_join_optimizer, optimizer_hash_join_cost, optimizer_merge_join_cost, optimizer_nested_join_cost, optimizer_disable_right_join, interpreter_mode_sampling_threshold, hash_groupby_segment_distinct_values_threshold, sampling_estimates_for_complex_filters, enable_histogram_to_unwrap_literals, estimate_zero_rows_when_sampling_data_is_missing, exclude_scalar_subselects_from_filters, leaf_pushdown_default, distributed_optimizer_nodes, optimizer_num_partitions, enable_binary_protocol, enable_multipartition_queries, enable_local_shuffle_group_by, enable_skiplist_sampling_for_selectivity, columnstore_sample_per_partition_limit, disable_sampling_estimation, disable_histogram_estimation, inlist_precision_limit, allow_materialize_cte_with_union, optimizer_cross_join_cost, distributed_optimizer_run_legacy_heuristic, distributed_optimizer_min_join_size_run_initial_heuristics, distributed_optimizer_unrestricted_search_threshold, singlebox_optimizer_cost_based_threshold, distributed_optimizer_estimated_restricted_search_cost_bound, disable_reference_table_leftjoin_where_pushdown, disable_sampling_estimation_with_histograms, disable_subquery_merge_with_straight_joins, default_columnstore_table_lock_threshold, default_spill_dependent_outputters, query_rewrite_loop_iterations, verify_fields_in_transitivity, optimizer_min_reference_tables_for_gathered_join, optimizer_min_reference_rows_for_gathered_join, max_expression_query_limit, max_expression_item_limit, optimizer_empty_tables_limit, optimizer_beam_width, optimizer_disable_subselect_to_join, disable_remove_redundant_gby_rewrite, subquery_merge_with_outer_joins, optimizer_disable_operator_tree_nullability_check, clamp_histogram_date_estimates, varchar_column_string_optimization_length, histogram_column_correlation, use_column_correlation, consider_secondary_projection, optimizer_disable_semi_join_reduction, optimizer_gb_placement_table_limit, optimizer_disable_transitive_predicates

  • Enhancement: Added more config information about pipelines to memsql_exporter.

  • Enhancement: Improved error handling for VECTOR type.

  • Enhancement: Added support for aggregate functions (FIRST, LAST) with VECTOR datatype.

  • Enhancement: Added support for Kafka key (producer) in the SELECT ... INTO KAFKA command.

  • Enhancement: Improved Kafka error messages to make them more specific.

  • Bugfix: Fixed a case where a killed query could incorrectly return a JSON error rather than query interrupted message.

  • Bugfix: MV_DISK_USAGE queries can no longer block garbage collection.

  • Bugfix: Fixed a crash-on-startup issue on Centos9.

  • Bugfix: Fixed potential deadlock between table module eviction and GC manager.

  • Bugfix: Fixed a crash that could occur when a scalar subselect is passed as an argument to an external TVF.

  • Bugfix: Fixed an issue with restoring from public backups.

  • Bugfix: Backups now throttle after snapshot.

  • New Feature: Added ability to create a Projection which is a copy of some or all columns in a table, and may be sharded and sorted differently than the primary table. Projections can be used to speed up several types of queries including range filters, ORDER BY/LIMIT, GROUP BY, count(DISTINCT...), DISTINCT, joins, and row lookups on secondary keys. Projections depend on the table and are updated in real time when you update the table. Related information schema table: PROJECTIONS.

  • Bugfix: Removed an unnecessary metadata copy to improve JSON performance.

  • Enhancement: Multi-insert queries now respect lower columnstore_segment_rows settings.

  • Enhancement: Subprocess and S3 API tracing are now also enabled via the enable_subprocess_tracing engine variable.

  • Enhancement: Added the optional IF NOT EXISTS clause to the CREATE TABLE ... AS INFER PIPELINE statement. Refer to Replicate MongoDB® Collections using SQL for more information.

March 2024

  • Bugfix: LAST_INSERT_ID will now correctly retrieve the last inserted ID for a forwarded INSERT query.

  • Bugfix: The rule based optimization warning message will only be shown when cost based optimization is meaningful.

  • Bugfix: Fixed a minor performance issue where a table module could be evicted prematurely during database recovery.

  • Bugfix: Fixed a very rare deadlock between ALTER of a table and a DROP of a database.

  • Bugfix: Fixed a JSON insertion failure to old utf8 sample tables that contain utf8mb4 characters by encoding it with base64 format beforehand for backward compatibility.

  • Enhancement: Disk Spilling now takes QUERY_MEMORY_PERCENTAGE resource pool setting into consideration.

  • Bugfix: Fixed an issue with a blob file leak in a rare crash scenario.

  • Bugfix: Fixed a potential deadlock in a scenario involving DDL, a 2PC transaction, and a concurrent failover.

  • Bugfix: Fixed a deadlock involving BACKUP and ALTER in a rare case.

  • Enhancement: Added support for utf8mb4 symbols in column comments, table comments, and user comments.

  • Enhancement: Enabled additional query hint for non-distributed queries so that first run of those queries can be queued.

  • Enhancement: Updated timezone data, now using IANA's tzdata version 2024a; new timezones Kyiv, Ciudad_Juarez, and Kanton supported.

  • Enhancement: Idle Table Eviction improvement. In the 8.5 GA version, an idle table's objects cannot be evicted from memory if the table has one or more UNIQUE key columns (both PRIMARY KEYs and any other keys with a UNIQUE constraint). In the 8.5.11 patch, idle tables' code objects can be evicted from memory if the table has one or more UNIQUE key columns.

  • Bugfix: Fixed an issue in background merger that allowed concurrent deletes of empty segments.

  • Bugfix: Fixed an issue to ensure preserving a field of a table correctly.

  • Enhancement: Added numInferredPaths and schemaSize as new columns to JSON_COLUMN_SCHEMA table. numInferredPaths is the number of key paths inferred for the segment. schemaSize is the size of schema_json in bytes.

  • Bugfix: Added safety checks around dropping databases without exclusive access to storage.

  • Bugfix: Fixed an issue where explicitly defining a JSON type in the RETURNS of a Wasm TVF can cause an error when it is run.

  • Bugfix: Fixed a rare crash scenario.

  • Bugfix: Fixed an issue that could lead to the risk of undefined behavior when running DROP EXTENSION IF EXISTS.

  • Enhancement: IN-lists will now use hashmap optimization for matching parameters in *_MATCH_ANY statements.

  • Enhancement: Made default_distributed_ddl_timeout a sync variable on all nodes.

  • Enhancement: Modified the conversion logic for converting from VECTOR to BSON types. Now, when casting VECTOR(F32), it will generate a BSON array of doubles, rather than a combination of numeric types.

  • Bugfix: Addressed a family of issues relating to optimal execution of JSON extracts on schema'd data.

February 2024

  • Bugfix: Fixed an issue with regression in replay performance for databases with many tables.

  • Enhancement: Added support for usage of vector built-ins with string/JSON arguments without requiring an explicit typecast to VECTOR (e.g., 'SELECT @vec<*> '[1,2,3]').

  • Enhancement: Added spilling metrics for TopSort (ORDER BY with LIMIT).

  • Enhancement: Added support to the VECTOR built-ins for all VECTOR elements types (e.g., F32, F64, I8, I16, I32, and I64).

  • Enhancement: Added support for creating numeric histograms based on JSON values. (Feature flag gated, contact your SingleStore representative)

  • Enhancement: Added new metrics to memsql_exporter based on the information_schema.mv_sysinfo_disk columns.

  • Enhancement: Creation of computed columns with VECTOR data type is now allowed.

  • Enhancement: Added ability for information_schema.optimizer_statistics to display JSON keys.

  • Enhancement: Added ability to skip eviction when log replaying hits the blob cache space limit, providing a greater chance to succeed.

  • Enhancement: Improved JSON histograms to support numeric histograms, analyze command, info_schema.optimizer_statistics, and displays JSON histograms.

  • Enhancement: Added support for warming blob cache with table's column data. Syntax is: OPTIMIZE TABLE <table_name> WARM BLOB CACHE FOR COLUMN <column_names>:

    OPTIMIZE TABLE t WARM BLOB CACHE FOR COLUMN c1, c2;
    OPTIMIZE TABLE t WARM BLOB CACHE FOR COLUMN *;
  • Enhancement: Disabled the default semi join reduction rewrite.

  • Enhancement: Now recognize more EXTRACT/MATCH functions as candidates for pushdown/computed column matching.

  • Enhancement: Replaced expression pushdown (EPD) approach with more complete version that does not over project rows.

  • Bugfix: Fixed an issue where the existing websocket connection would close when variables are updated in global scope.

  • Bugfix: Removed the hard-coded 'collation_server' from constant.StrictModeParams.

  • Bugfix: Fixed an issue with a sensitive information leak inside of out-of-memory reports.

  • Bugfix: Fixed the result collation from VECTOR built-ins to be utf8mb4_bin instead of binary.

  • Bugfix: Fixed an issue with nested extracts in the JSON_MATCH_ANY() statement.

  • Bugfix: Fixed an issue with backup restore when the backup has a corrupted GV timestamp in the about snapshot record.

  • Enhancement: The new engine variable parametrize_json_keys allows JSON keys to be parametrized and plans can be reused.

  • Bugfix: Fixed an issue that causes CREATE TABLES AS INFER PIPELINE to fail with an Out Of Memory error.

  • Bugfix: Fixed an issue with exporter not working with Public-Key Cryptography Standards #8(PKCS#8) pem format.

  • Bugfix: Fixed an issue with potential crashes when specifying a vector index in index hint.

  • Bugfix: Fixed a crash that occurred when trying to grant access to the table with no context database in per-privilege mode.

  • Bugfix: Fixed a memory issue, when calling a Table-Valued Function (TVF) that has arguments with default values.

  • Bugfix: Fixed an issue with the RG pool selector function return type check.

  • Enhancement: Added ability for information_schema.mv_connection_attributes to show tls_version and tls_cipher for connections where SSL is enabled.

  • Enhancement: Added ability for predicate pushdown with NOW() and user defined functions (UDF).

  • Enhancement.: Reduced amount of memory used by unlimited storage file listing, including point-in-time recovery (PITR).

  • Enhancement. Added an engine variable, innodb_lock_wait_timeout. This variable exists for backwards compatibility with MySQL and is non-operational in SingleStore Helios.

  • Enhancement. Optimized performance when using VECTOR built-ins with the VECTOR data type.

  • Bugfix: Fixed an issue where a backup would inadvertently carry metadata about the retention period of the backed-up unlimited storage database.

  • Bugfix: Fixed the change to aggregator activity queries in memsql_exporter not being applied properly.

  • Bugfix: Fixed an issue with START PIPELINE FOREGROUND skipping files in some circumstances.

  • Enhancement: Added ability to memsql_exporter to always return all sub-activities of an aggregator activity.

  • Enhancement: Improved ability to configure Kafka extract/egress by allowing additional options.

  • Enhancement: Improved performance for certain query shapes used by Kai.

  • Bugfix: Fixed the query-events endpoint link on the memsql_exporter landing page.

  • Bugfix: Fixed an issue where a segmentation fault (crash) appeared after terminating a query on the remote node.

  • Bugfix: Fixed an issue with crashes occurring when using certain JSON_EXTRACTs in the predicate expression of JSON_MATCH_ANY().

  • Bugfix: Fixed an issue where the number of rows affected with SELECT INTO object storage was erroneously reported as 0 rows.

January 2024

SmartDR

SmartDR creates and manages a continuous replication of data to a geographically separate secondary region thereby allowing you to failover to the secondary region with minimal downtime. For more information, refer to Smart Disaster Recovery (DR): SmartDR

Database Branching

Database branching enables you to quickly create private, independent copies of your database for development, testing and other scenarios. For more information, refer to Database Branching

Extensions

Extensions in SingleStore allow you to combine user-defined objects, such as UDFs or UDAFs, into a packaged archive (the extension) and then create, manage, and deploy these objects and other resources using a single command. Extensions support both Wasm-based and PSQL functions. For more information, refer to Extensions.

Trace Events and Query History

Added the ability to trace query completions as events, which is the initial installment of the larger event tracing framework. The Query History feature relies on query event tracing, and can be used to display query trace events over time. The Query History feature can therefore be used to troubleshoot and optimize query performance, including, but not limited to, tracing and recording expensive queries, resolving unexpected slowdowns, and viewing and optimizing workloads in real time. Refer to Query History for more information.

Improved Memory Management for Resource Pools

Added the QUERY_MEMORY_PERCENTAGE option for resource pools, which restricts memory usage in the pool on a per-individual query basis. This in contrast to MEMORY_PERCENTAGE which restricts usage based on total memory used within the current pool.

For example, when creating or altering a resource pool, setting MEMORY_PERCENTAGE to 60% and QUERY_MEMORY_PERCENTAGE to 50% would configure the system so that all queries running within the specified resource pool should together use a maximum of 60% of system memory, and any single query running within the pool should use, at most, 50% of system memory. Example syntax:

CREATE RESOURCE POOL rpoolmain WITH
MEMORY_PERCENTAGE = 60,
QUERY_MEMORY_PERCENTAGE = 50,
SOFT_CPU_LIMIT_PERCENTAGE = 65,
MAX_CONCURRENCY = 40;

Load Data Updates

SingleStore now supports loading data using the Change Data Capture (CDC) pipelines from the following data sources: MongoDB® and MySQL. Refer to Replicate MongoDB® Collections to SingleStore or Replicate Data from MySQL for information on loading data from the respective data source.

Enhancement: Other Performance Enhancements

  • SingleStore now supports creating shallow copies of tables. The WITH SHALLOW COPY feature copies an existing table and creates a new table that will have the same structure as the original table. The data is not physically copied to the new table, but referenced against the original table.

  • SingleStore now supports sorted scan query plan operators for queries containing ORDER BY/LIMIT clauses when utilizing flexible parallelism. Before this enhancement, there could be performance regressions for this query shape using flexible parallelism.

  • Improved performance when completing large sets of security operations (creating a lot of groups/users/roles, etc.).

  • Added the ability to use named argument notation when calling a PSQL SP or function. Can reduce total lines of code and make code more readable.

  • Added reduction of memory pre-allocation during columnstore JSON reads.

  • Added ability to check if all leaf node partitions are available, before processing new batches.

  • Addressed a table resolution issue for embedded recursive Common Table Expressions (CTEs).

  • SingleStore now natively supports the BSON data type.

Enhancement: Query Optimization Enhancements

  • Added support for Row Count and Selectivity hints in views.

  • Added new join logic to recognize when a non-reference table is being joined exclusively to reference tables and then gather the non-reference table to avoid duplicating work across every partition.

  • Remove redundant aggregation functions and GROUP BY statements.

  • JSON expressions are properly pushed down.

  • Allow columnstore optimization for JSON_MATCH_ANY with JSON_EXTRACT in predicate.

  • Added support for hash joins on null-accepting expressions in the ON condition of outer joins.

  • Automatically rewrite A=B OR (A IS NULL AND B IS NULL) to null safe equal (A<=>B) so that many important optimizations (e.g. shard key joins, hash joins) will work.

  • Perform a subselect to join rewrite in an UPDATE statement when there are multiple columns in the subselect.

  • Removed some query shape lockdowns.

  • Added support for flipping join order for full outer join.

  • Improved performance by not executing query optimization procedures for read queries during the process of persistent plan cache lookup. This optimization strategy has resulted in improved lookup performance, leading to faster data retrieval operations.

  • Added support for statistics on correlations between columns in cases where highly correlated filters are used.

  • The Data API now supports HTTPS for connections where ssl_key is encrypted with ssl_key_passphrase.

  • Removed parametrization of LIMIT 0 and LIMIT 1 to unlock more rewrites, especially for subselects.

  • Modified computed column matching to accurately evaluate JSON expressions containing equality and non-safe equality comparisons.

  • Fixed an issue where filtering with a JSON_EXTRACT_<type> function performs inconsistently.

  • The query optimizer now considers more LEFT JOIN elimination cases.

Enhancement: New Information Schema Views and Columns

  • Added a new view, correlated_column_statistics, to provide metadata on correlated columns.

  • Added a new view, RESOURCE_POOL_PRIVILEGES, to provide information about resource pool grants and privileges.

  • Added the following new columns to MV_BACKUP_HISTORY:

    • error_code: Error code for failed backups.

    • error_message: Error message for failed backups.

  • Added the following new columns to MV_SYSINFO_DISK:

    • read_operations_cumulative_per_device: Number of read operations performed by the device since start up.

    • write_operations_cumulative_per_device: Number of write operations performed by the device since start up.

    • device_name: Name of the device to which the values in read_operations_cumulative_per_device and write_operations_cumulative_per_device are associated.

  • Added the following to support trace events that are used by the Query History feature:

    • MV_TRACE_EVENTS: A snapshot of all trace events, the size of which is dictated by the trace_events_queue_size variable

    • MV_TRACE_EVENTS_STATUS: A view that reflects the status of current trace events

    • LMV_TRACE_EVENTS: A snapshot of each node's trace events

  • Added the following new column to ADVANCED_HISTOGRAMS, L/MV_QUERY_PROSPECTIVE_HISTOGRAMS, and L/MV_PROSPECTIVE_HISTOGRAMS:

    • JSON_KEY: an entry for each (column, json_key) pair. For non-json columns JSON_KEY is NULL.

  • Added the blob_cache_miss_b and blob_cache_wait_time_ms columns to the following information schema views: information_schema.plancache, information_schema.MV_ACTIVITIES, information_schema.MV_ACTIVITIES_CUMULATIVE, information_schema.MV_TASKS, information_schema.MV_FINISHED_TASKS, and information_schema.MV_QUERY_ACTIVITIES_EXTENDED_CUMULATIVE.

  • Bugfix: Updated the information_schema.USERS view to reflect the account status for locked users.

Enhancement: New Commands and Functions

  • Added support for the REGEXP_MATCH() function. This function returns a JSON array of matching substring(s) within the first match of a regular expression pattern to a string.

  • Added support for CUBE and ROLLUP grouping operations to Wasm-based user-defined aggregate functions (UDAFs) in SingleStore. For more information, refer to CREATE AGGREGATE.

  • Added the following to support trace events that are used by the Query History feature:

    • CREATE EVENT TRACE to create a trace event

    • DROP EVENT to drop a trace event

  • Added ability to use DELETE on identical keys with the ON DUPLICATE KEY clause. This is in addition to existing "upsert" support with ON DUPLICATE KEY UPDATE. This allows new scenarios such as the ability to manage streaming aggregation with INSERT … ON DUPLICATE KEY UPDATE … ELSE DELETE ….

  • INFER PIPELINE for MongoDB® CDC-in now generates tables with BSON column types.

Enhancement: New or Modified Engine Variables

  • Added enable_idle_table_optimizations and enable_idle_table_eviction, which are used to reduce table memory overhead for idle tables on a cluster. The feature is enabled by default on all new and existing clusters. The variable enable_idle_table_optimizations needs to be set at runtime and requires a restart for changes to take effect. It can be set to OFF or ON (default). The other related variable, enable_idle_table_eviction, can be set during a session (though a very small amount of overhead will remain until the server is restarted) and can be set to Full, SkipListsOnly, and Off. It defaults to SkipListsOnly, which means SingleStore will only evict skiplist indexes for idle tables on the cluster. Full means it will evict skiplists and table modules, and Off means no eviction.

  • Added privilege_caches_update_mode, which can be used to address some performance issues that occur when performing large sets of security operations (creating a lot of groups/users/roles, etc.).

  • Added the optimizer_use_average_rowsize engine variable which can be used to now allow row size estimations in query optimization costing.

  • The query_parallelism engine variable (which was deprecated in 8.1) now is non-functional. To modify Flexible Parallelism settings, use query_parallelism_per_leaf_core instead.

  • Added the use_user_provided_index_types_in_show engine variable which controls what will be displayed via the DESCRIBE <table_name> or SHOW COLUMNS syntaxes for backward compatibility.

  • Added the throttle_replay_behind_tail_mb engine variable which controls how far the system allows a child aggregator to lag behind replicating the transaction log of it's master aggregator before throttling is applied to the master aggregator.

  • Added the ability to use the ANALYZE command with JSON keys to create histograms when the new engine variable enable_json_statistics is enabled. The engine variable enable_json_statistics is disabled by default.

  • Added trace_events_queue_size to capture trace events, the first of which is query event tracing. This engine variable is enabled by default (set to a value of 16 MB, where the value must be provided in bytes). Refer to Query History for more information.

  • Added the optimizer_disable_transitive_predicates engine variable which disables predicate transitivity on query rewrites if set to TRUE. This engine variable defaults to FALSE.

  • Added a new engine variable pipelines_cdc_java_heap_size to specify the JVM heap size limit for CDC-in pipelines.

  • Added the bottomless_experimental_blobstore_mode engine variable, which when enabled, completes additional verification of persisted files immediately after upload. This mode is experimental and may reduce upload speed. Please use with caution.

  • The new engine variable max_table_memory_room_mb sets the maximum amount of memory required when creating or attaching a database. Configuring this engine variable allows more control over whether a detached database can be reattached.

  • The engine variable backup_multipart_upload_concurrency maximum value has been increased to 15.

Miscellaneous Enhancements and Bugfixes:

  • Enhancement: Introduced a new VECTOR data type that is recommended over the BLOB data type for vector operations and use with ANN indexes. The VECTOR type improves code readability, error checking, and reduces total lines of code.

  • Enhancement: Added support for DDL forwarding for CLEAR BACKUP_HISTORY.CLEAR BACKUP_HISTORY works on the DML endpoint now.

  • Enhancement: Improved retry logic for connection reset: write errors.

  • Enhancement: SingleStore now natively supports the BSON data type.

  • Enhancement: Added support for collection.exclude.list, database.include.list, and database.exclude.list parameters to the CONFIG/CREDENTIAL clause of the CREATE AGGREGATOR PIPELINE ... AS LOAD DATA MONGODB statement. Refer to Replicate MongoDB® Collections to SingleStore for more information.

  • Enhancement: Added support for Approximate Nearest Neighbor (ANN) vector search using inverted file (IVF) and hierarchical navigable small world (HNSW) indexes, and variants of them based on product quantization (PQ). Enables support of larger-scale semantic search and generative AI applications.

  • Enhancement: Increased name length limit to 256 characters for  tables, views, table columns, view columns, procedures, functions, and aliases.

  • Enhancement: Added the ability to truncate plancache file names if they exceed the operating system's specified limit (255 bytes).

  • Bugfix: Both SHOW PIPELINES and SELECT * FROM information_schema.pipelines now show consistent pipeline state information across master and child aggregators.

  • Bugfix: Specific pipeline built-ins like pipeline_source_file() and pipeline_batch_Id() should not be used in UPSERT clause when creating a pipeline.

  • Bugfix: Improved the ability to terminate expressions containing JSON built-ins.

  • Enhancement: Added error-handling details for pipelines, including state, error, and performance-related metrics through monitoring solutions.

  • Enhancement: MemSQL Procedural SQL (MPSQL) has been renamed to simply Procedural SQL (PSQL). The name change will only show in some SHOW command output and information schema views. For example, SHOW FUNCTIONS output changed.

  • Enhancement: Added support for %ROWTYPE and %TYPE for use in declaring scalar type variables and parameters. Employing these abbreviations in Procedural SQL (PSQL) can lead to a reduction in the required lines of code.

  • Enhancement: Introduced blob cache profiling metrics for columnstore tables on unlimited storage databases. Refer to the PROFILE page for more information on what blob cache metrics are available.

  • Enhancement: Added infix operators for dot_product (<*>) and euclidean_distance (<->).

  • Enhancement: Added ability to delay retry attempts for pipeline retries.

  • Bugfix: Updated the output for the Key_name and Index_type columns in the SHOW INDEX, SHOW INDEXES, and SHOW KEYS commands for primary keys on columnstore tables. Refer to the SHOW INDEX, SHOW INDEXES, SHOW KEYS page for more information.

  • Bugfix: Improved the error message displayed when trying to create a primary key on an existing table.

  • Bugfix: Improved the error message displayed when a GRANT command fails due to missing permissions. The error message will now show the missing permissions:

    GRANT SELECT, UPDATE, DELETE, EXECUTE ON *.* TO test2;
    ERROR 1045 (28000): Current user is missing UPDATE, DELETE permission(s) for this GRANT
  • Bugfix: Fixed a case when UNIX_TIMESTAMP() was incorrectly returning 999999999.999999 for DATETIME data types with precision.

  • Bugfix: Fixed an issue where a user could be erroneously marked as deleted.

  • Bugfix: Fixed an issue where a child aggregator could crash if it ran out of memory during query forwarding.

  • Bugfix: Fixed an issue with blob cache LRU2 eviction that could occur when a query fetches a blob, evict it, and fetches it again.

  • Bugfix: Fixed an issue that could cause information for a blob to be missing from an Information Schema table.

  • Bugfix: Disk Spilling now takes the resource pool settings into consideration.

  • Enhancement: Auto user creation is deprecated and the NO_AUTO_CREATE_USER variable is enabled by default.

  • Bugfix: Fixed an erroneous access denied issue to views selecting from shard tables with computed columns.

  • Bugfix: Fixed a rare issue where the incorrect timezone could be used in logging.

  • Bugfix: Fixed an issue where using user-defined variables inside ORDER/GROUP BY statements could cause a crash.

  • Enhancement: Added column name to error messages when invalid date/time is inserted into a column.

  • Enhancement: Added BSON columnstore functionality over Parquet storage.

  • Enhancement: Added support for SELECT ... INTO KAFKA using OAUTH credentials.

  • Bugfix: Prevent the ability to create a Kafka Pipeline using Parquet Format.

  • Bugfix: Queued Time is now excluded from the cost estimate for workload management leaf memory.

  • Bugfix: Specific error messages are now logged for GCS subprocess failures.

  • Bugfix: Fixed a network communication error that occurred after query rewrites.

  • Bugfix: Fixed an error that occurred when a user attempts to access a view that is based on a table that has no privileges granted on it.

  • Enhancement: Improved the ability to kill queries containing JSON built-in functions.

  • Bugfix: JSON_KEY escape characters were not working as expected.

  • Bugfix: ALTER PIPELINE setting max_partitions_per_batch to use a default 0 value is now allowed.

  • Enhancement: Improved performance for JSON_EXTRACT_<type> built-ins in ORDER BY clauses.

  • Enhancement: Now suppressing a harmless traceSuspiciousClockUpdate trace message during recovery.

  • Bugfix: Fixed an issue where INSERT...SELECT queries with a partition_id() filter generating an error.

  • Bugfix: Fixed an issue with memory crashing when using REGEXP_MATCH, JSON_INCLUDE_MASK, or JSON_EXCLUDE_MASK built-ins.

  • Enhancement: Improved performance by optimizing joins on TABLE(JSON_TO_ARRAY()) queries.

  • Bugfix: Fixed an allocation issue that caused poor performance on high load insertion queries.

  • Enhancement: Enabled support for external UDFs used in INSERT statements with multiple VALUE clauses.

  • Enhancement: Added BSON fundamentals and column type support for SingleStore Kai.

  • Bugfix: Fixed a bug that could result in unrecoverable databases if the database had 1024 or more tables.

  • Bugfix: Fixed an optimization out-of-memory issue cause by operators generated for wide user tables.

  • Bugfix: Fixed a crash that occurs in rare scenarios involving ALTER TABLE and failovers.

  • Bugfix: Fixed ineffective search options that change in subsequent vector search queries.

  • Bugfix: Resolved an issue related to FROM_BASE64 and TO_BASE64 builtins when processing large string inputs thereby preventing potential errors in reading communication packets.

  • Bugfix: Fixed the code involved in backups to improve download error messaging.

  • Bugfix: Fixed an bug that could cause scans using non-unique indexes which could return incorrect results or cause crashes.

December 2023

  • Enhancement: Added support for creating a DEEP COPY of tables with computed columns.

  • Enhancement: Added additional config and credential option validation while creating pipelines.

  • Bugfix: Fixed an issue where valid LINK config and credential parameters were not supported for both reading from and writing to a datasource.

November 2023

  • Enhancement: Notebooks now have autosave (currently saves every 5 seconds).

  • Enhancement: Added additional node metrics to the /cluster-metrics endpoint of the memsql_exporter.

  • Added three new fields to MV_SYSINFO_DISK:

    • READ_OPERATIONS_CUMULATIVE_PER_DEVICE: Number of read operations performed by the device since start up.

    • WRITE_OPERATIONS_CUMULATIVE_PER_DEVICE: Number of write operations performed by the device since start up.

    • DEVICE_NAME: Name of the device to which the values in read_operations_cumulative_per_device and write_operations_cumulative_per_device are associated.

  • Enhancement: Changed the SingleStore Helios workspace default and range for the pipelines_cdc_row_emit_delay_us engine variable. Throttling default value is set to 1. The supported range is from 0 to 1000000.

  • Enhancement: Addressed some performance issues that occur when performing large sets of security operations (creating a lot of groups/users/roles, etc.) via the new privilege_caches_update_mode engine variable.

  • Enhancement: Improved performance during snapshotting for CDC-in pipelines.

October 2023

  • Enhancement: Added ability to infer CSV data with text boolean values.

  • Enhancement: Added support for simple multi-column update with sub-query.

  • Enhancement: Added ability to use SSL keys with a password in the HTTP API.

  • Bugfix: Fixed an issue preventing nodes from attaching during upgrade.

  • Bugfix: Fixed an upgrade issue where some databases could temporarily become unrecoverable if snapshots were skipped during a pre-upgrade on a recently attached unlimited database.

  • Enhancement: The singlestore_bundle.pem file, which SQL clients can use to connect to SingleStore Helios, will be updated as of October 20, 2023. As a consequence, connecting to SingleStore Helios may not be possible until this file has been (re-)downloaded. Refer to Connect to SingleStore Helios using TLS/SSL for more information.

September 2023

  • New Feature: SingleStore Spaces - Find a gallery of notebooks to learn about scenarios that SingleStore covers at: https://www.singlestore.com/spaces/

  • Enhancement: Notebooks have been improved with the addition of the following features:

    • SQL Notebooks

    • Hints to connect to external sources for the notebook firewall settings

    • Performance improvements around loading time

    • Jupyterlab 4.0

  • New Feature: Datadog integration. Monitor the health and performance of your SingleStore Helios workspaces in Datadog.

  • Enhancement: Enhanced the performance of DDL statements for role manipulation.

  • Enhancement: Added two engine variables, jwks_username_field and jwks_require_audience to add flexibility and improve security.

  • Enhancement: Added two engine variables: max_expression_query_limit which sets a limit on the number of expressions within an entire query and max_expression_item_limit which sets a limit on the number of expressions within a query item. Both can be set to a range between 100 and the maximum unsigned INT value. Setting these engine variables to the maximum unsigned INT value disables both features.

  • Enhancement: Added support for materializing CTEs without recomputing them when the query contains UNION, UNION ALL, and other SET operations. To enable the feature, set the engine variable allow_materialize_cte_with_union to TRUE.

  • Bugfix: Fixed several issues causing slow compilation for queries over wide tables.

  • New Feature: Persistent Cache/Disk Monitoring - Monitoring dashboard to help explain "What's consuming Persistent Cache" as well as the Blob Cache downloaded/evicted rate.

  • Bugfix: Fixed an issue where a crash occurs where the engine improperly rewrites queries with a UNION in an EXCEPT clause.

August 2023

  • New Feature: Added support for the REGEXP_MATCH() function. This function returns a JSON array of matching substring(s) within the first match of a regular expression pattern to a string.

  • Enhancement: Improved performance of multi-part GCS backups.

  • Enhancement: Improved memory consumption in json decoding.

  • Enhancement: Introduced a new global engine variable, json_document_max_leaves which limits the number of JSON key paths inferred within a segment. The default value is 10000.

  • Enhancement: Introduced a new global variable, dr_min_connection_timeout_ms, which allows users to adjust the minimum timeout period in Disaster Recovery (DR) replication.

  • Enhancement: Added support for multiple uncorrelated IN-subselects in more query shapes.

  • Enhancement: SKIP PARSER ERRORS is now supported for Kafka. Additionally, a new related engine variable, pipelines_parse_errors_threshold, has been added.

  • Enhancement: SingleStore automatically rewrites A=B OR (A IS NULL AND B IS NULL) to null safe equal (A<=>B) to enable hash joins.

  • Enhancement: Added support for hash joins on null-accepting expressions in the ON condition of outer joins.

  • Bugfix: Fixed an issue with significant memory reduction from spilling for hash join queries with variable length strings (varchar, text, etc.) involved.

  • Bugfix: Fixed an issue where BACKUP DATABASE WITH INIT would fail under out-of-memory conditions.

  • Bugfix: Fixed a potential crash in NFS backup when encountering an IO error.

July 2023

  • Enhancement: Added support for Control Group v2 (cgroup v2).

  • Bugfix: Fixed an issue where a large spike in the query memory consumption on a any SingleStore node could cause replicas on the same node to become unrecoverable.

  • Bugfix: Fixed a pipeline wrong state issue caused by an error in a table with a computed column.

  • Bugfix: Fixed a potential issue where REBALANCE PARTITION would not stabilize to a partition placement on a read replicas workspace.

  • Bugfix: Fixed a potential crash in some edge cases when using Parquet pipelines.

  • Enhancement: The SingleStore Python Client is now the standard for our notebooks. This upgrade supports the ingestion of dataframes with specialized data types, including geospatial or vector data, into the database. Additionally, it incorporates the Ibis component, enabling Python to directly interact with the database. This allows dataframes to be executed within the database itself, greatly enhancing performance.

  • Enhancement: Default values for BLOB/TEXT and JSON columns are allowed, as well as NULL and empty strings.

  • Bugfix: Fixed a crash that may occur when using SELECT … FOR UPDATE … LIMIT <n> in a multi-statement transaction when two_phase_commit is ON.

  • Bugfix: Fixed a bug where the permissions do not clear for Data Manipulation Language (DML) queries and then leak to subsequent Data Definition Language (DDL) queries.

  • Bugfix: Fixed a bug where the mv_disk_usage table would show incorrect results for the plancache directory.

  • Bugfix: Fixed a crash that may occur when a function calls more than one external function.

    In the following example get_embedding() is an external function:

    SELECT DOT_PRODUCT(JSON_ARRAY_PACK(get_embedding('house')),
  • Enhancement: Added an opt-in optimization to Kerberos for HDFS pipelines to reduce the amount of requests.

  • Bugfix: Fixed a JSON formatting issue for the PARTITION_UNRECOVERABLE event in MV_EVENTS details column.

  • Bugfix: Fixed the dependency on PIPES_AS_CONCAT sql_mode inside user defined functions. The sql_mode state is now stored when a user defined function is created and used whenever the function is executed.

June 2023

  • Enhancement: Introduced the 8.1 data_conversion_compatibility_level which provides additional out-of-range checks for the TIMESTAMP data type.

  • Bugfix: Fixed a bug where a HashJoin on a materialized CTE gets executed as a NestedLoopJoin in some query shapes causing slow join performance.

  • Bugfix: Fixed the SPLIT string function that was not detecting the correct argument type when used inside a table-valued function (TVF) or a user-defined scalar-valued function (UDF).

  • Bugfix: Reduced memory usage in situations where tables are repeatedly created and dropped.

  • Enhancement: Added mv_disk_usage and mv_data_disk_usage information schema tables that report the breakdown of disk utilization by SingleStore Helios.

  • Enhancement: Added the max_execution_time engine variable. It is unused and setting this variable has no effect. It exists so certain MySQL-compatible apps that have this variable set will not fail when connected to SingleStore Helios.

  • Bugfix: Fixed an issue on sample tables that can be created with different collations on different partitions.

  • Bugfix: Fixed an issue where JSON computed column matching incorrectly triggers a JSON function.

  • Bugfix: Fixed a rare case where the query optimizer re-wrote a query into a non-equivalent query which produced more result rows.

  • Bugfix: Fixed unlimited storage S3 multipart uploads to properly retry on 200 responses which contain an embedded "SlowDown" error, instead of failing immediately.

  • Bugfix: Fixed an issue where memory accumulates over time if reshuffle or repartition operators are executed for many unique query shapes.

  • Bugfix: Fixed an issue with crashes that occur when SSL is enabled on universal storage databases.

May 2023

SingleStore Kai (Preview)

SingleStore Kai allows you to run MongoDB® queries natively in a SingleStore Helios workspace. This feature is currently available in these regions. You can enable this feature while creating a workspace. Each SingleStore Kai-enabled workspace has an additional mongodb:// endpoint, that can be used to connect from supported MongoDB® tools/applications to run MongoDB® queries.

For more information, refer to SingleStore Kai. Here's a few additional references:

New Feature: Introduced a columnstore row data structure (COLUMN GROUP) that will create a materialized copy of individual rows as a separate structure in a columnstore table. This index will speed up full row retrieval and updates.

New Feature: SingleStore Helios now supports creation of Wasm-based user-defined aggregate functions (UDAFs). Refer to CREATE AGGREGATE for more information.

Enhancement: Introduced the autostats_flush_interval_secs engine variable. It determines when autostats are flushed to disk if they are not used within the specified time. The default value is 600 seconds. If the engine variable is set to 0, autostats will always stay in memory.

Enhancement: Introduced two new Sync variables control the dynamic reuse of WM queues: workload_management_dy-namic_resource_allocation and workload_management_queue_size_allow_upgrade. The default value for workload_management_queue_size_allow_upgrade is 1. This means we can upgrade a medium queue to a large queue until the large queue becomes equal to 1.

Enhancement: Added two new monitoring dashboards: Pipeline Summary and Pipeline Performance. Currently, both dashboards are in preview mode.

Enhancement: Improved JSON encoding speed for sparse schemas (JSON schemas with a very large number of distinct property names across the full set of documents, with most properties missing in each document). This applies only to universal storage.

Enhancement: A disk manager now brokers the usage of disk space between competing uses like data, log, cache, and spilling.

Enhancement: Added capability to create Wasm functions in SingleStore Helios workspaces with heterogeneous hardware, e.g., with different levels of SIMD support.

New Feature: Added a METADATA_ONLY argument for shard keys which will prevent an index from being created on shard keys thereby saving memory. It can cause queries that would have used that index to run slower.

Enhancement: Reduced processing costs for queries containing an ORDER BY clause by minimizing the amount of data that needs to be processed. This is achieved by only projecting the primary key at the lower levels of the query plan and then joining the primary key with the original data to retrieve the remaining columns. This can significantly improve performance when dealing with large amounts of data and when only a small portion of the data needs to be retrieved.

New Feature: Added support for query profiling that is efficient enough to be enabled all the time by introducing the auto_profile_type engine variable with FULL and LITE options for automatic profiling. The enable_auto_profile engine variable must be set to ON for the FULL or LITE options to work.

Enhancement: The updated full-text search tokenizer in version 8.1 supports utf8mb4. The tokenizer properly tokeni- zes emojis and other extended 4-byte characters. In addition, certain emoji and glyph characters in utf8mb3 are also recognized as their own token rather than being treated as blanks.

Enhancement: Enhanced support for recursive common table expressions (CTEs) by expanding the range of query shapes allowed, improving column type checking and resolution between the base and recursive branches, and adding cross database support. Also, resolved the issue that caused the "ER_UN- SUPPORTED_RECURSIVE_CTE_SHAPE" error with the accompanying message about dependen- cy on a streaming result table outside the cycle.

New Feature: Added two Information Schema views for tracking connection attributes: LMV_CONNECTION_ATTRIBUTES and MV_CONNECTION_ATTRIBUTES.

Enhancement: Added support to optimize table scans with an Order-By-Limit by (a) do the order and limit first, with only the minimum needed columns, and then (b) using a self-join to retrieve the additional necessary columns for only the qualifying rows. This optimization can be enabled/disabled via the optimizer_enable_orderby_limit_self_join session variable, which is ON by default.

New Feature: Added ATTRIBUTE and COMMENT fields for users. These can be set via the CREATE USER and ALTER USER commands. The values are shown in the INFORMATION_SCHEMA.USERS view.

New Feature: Notebooks feature (Preview) is available to all customers through our Cloud service. It gives users the ability to marry SQL and Python interoperably and in a secure way. This is available in the Develop section within the Cloud Portal.

Bugfix: The sql_mode session variable now shows up in SHOW VARIABLES and the mv_global_variables view.

Bugfix: Fixed an issue that could cause an incorrect error message to be generated on an out of memory error.

Bugfix: Fixed an issue where a NULL could be pushed down to a union which could cause the type-cast/collation to be lost, resulting in an error: "Operation 'Scanning a hash join build' is not allowed".

Bugfix: Fixed an issue with DATETIME columns with ON UPDATE CURRENT_TIMESTAMP defaults that could cause incorrect results.

Enhancement: Removed EXECUTE privilege requirement for Kai internal UDFs.

Enhancement: Changed system information built-ins to use utf8_general_ci collation for the collation_server.

Enhancement: Removed the restrictions on the names of JWT users. Customers may now create users with names that look like UUIDs and emails.

Enhancement: Improved the performance of a left join of a reference table with a sharded table by pushing the reference table clauses to the leaves. The engine variable disable_reference_table_leftjoin_where_pushdown must be set to "OFF" to enable this operation.

Bugfix: Fixed an issue with optimizing queries containing no-operation predicates such as 1=1.

Bugfix: Fixed rare issue with sync durability and unlimited storage that could cause log file corruption in the blob storage.

Enhancement: Added more information about blob files that need to be downloaded from unlimited storage to the blob cache when running PROFILE.

Bugfix: Fixed an issue where the query optimizer could choose incorrectly between using the HashGroupBy or the StreamingGroupBy operators.

Bugfix: Disabled semi-join operator reduction optimization.

April 2023

Bugfix: Fixed an issue where errors could occur during backup when large files are copied.

Bugfix: Fixed an issue where S3 pipelines could stop loading data from the bucket if files are deleted from the bucket while the pipeline is active.

Bugfix: Fixed an issue where, in rare cases, an UPDATE statement with a left join in the filter clauses could fail to execute.

Bugfix: Fixed an issue where replica blobs could be loaded into the blob cache with an incorrect LRU timestamp.

New Feature: Added support for LOAD DATA from S3 for Avro and Parquet data.

New Feature: Added  JSON_INCLUDE/EXCLUDE_MASK function. When applied to a JSON document it will return a subset of the document based on the mask.

Bugfix: Fixed a potential crash issue in UPDATE queries that involve joins and that have scalar subselects in SET clauses.

Bugfix: Fixed an issue where running DROP PLAN FROM PLANCACHE ... FOR QUERY on a query with invalid syntax could cause a crash.

Enhancement: Updated timezone metadata to include Mexico's latest timezone change.

Enhancement: Added a new information schema view internal_table_statistics which shows memory use of SingleStore internal metadata tables. The columns displayed are the same as those shown for table_statistics.

Bugfix: Fixed a column name resolution issue for recursive CTEs when the column type is the same across the base branch and the recursive branch.

Bugfix: Fixed an issue where CLEAR ORPHAN DATABASE could cause data loss if run when the master aggregator has a database in an unrecoverable state.

Bugfix: Added the option to use HTTPS with memsql_exporter. To use HTTPS, enable the engine variables exporter_use_httpsexporter_ssl_cert, and exporter_ssl_key.

Bugfix: Fixed an issue with missing row counts during optimization when sampling is disabled.

Bugfix: Fixed an issue where background mergers were not stopping quickly enough, resulting in delayed DDL operations or REBALANCE commands.

Bugfix: Fixed an issue in information schema view JSON_COLUMN_SCHEMA where incorrect details were being shown for the leaf columns.

Bugfix: Fixed an issue where the memory used by an external function was not being freed quickly enough.

Enhancement: The default_partitions_per_leaf global variable will no longer be user-settable in SingleStore Helios, ensuring that it's always defined for the most optimal performance, according to the current resource size.

Bugfix: Fixed an issue where filters using the range_stats column in the information_schema.optimizer_statistics table were not allowed.

Bugfix: Fixed an issue where incorrect results could be returned when segment-level flexible parallelism is used inside a subquery containing both a shard key filter and either an aggregate, a window function, or the limit clause.

Bugfix: Fixed an issue where an internal result table was not created for a recursive CTE involving data across databases.

New Feature: Added self-service historical monitoring that allows you to quickly and easily understand your application workloads and debug performance-related issues.

March 2023

Enhancement: Improved column type resolution for base and recursive branches in recursive common table expressions (CTEs).

Bugfix: Fixed an error that could occur when attaching databases with a snapshot file of greater than 5 GB.

Bugfix: Fixed a bug where too many rows are sampled from columnstore tables with more than 1 billion rows.

Bugfix: Fixed an issue with histogram estimation in columns with a negative value TIME type.

Bugfix: Fixed "table partition count mismatch errors" that occur due to the following conditions: the system variable enable_workspace_ha is set and there is an upgraded workspace with an attached read replica.

Bugfix: Fixed an issue with DDL endpoint queries using a lower-than-specified query parallelism setting when workspaces are enabled.

Bugfix: Fixed a bug that prevents GROUP BY Push-Down optimization if the join filter contains a mismatched column type.

Bugfix: Fixed a data conversion compatibility level 8.0 error that may occur when sampling columnstore tables.

Bugfix: Fixed a possible deadlock that may occur between the blob cache and the Rowstore Garbage Collection (GC) when the blob cache encounters an out-of-memory error.

Bugfix: Fixed an error caused by setting the collation_server global variable to a non-default value when performing a REBALANCE PARTITIONS resource availability check.

Enhancement: Improved the parsing performance of queries that contain several tables.

Enhancement: The  read_advanced_counters, snapshot_trigger_size, and  snapshot_to_keep  engine variables can now be set on SingleStore Helios.

New Feature: Added new session variable disable_remove_redundant_gby_rewrite to prevent the GROUP BY columns from being removed when used in an ORDER BY clause.

Enhancement: Introduced disk and memory availability checks that run before a database is allowed to be attached to a workspace.

Enhancement: Added the ability to cache histogram results during optimization to reduce the work performed by the histograms.

Enhancement: Improved the performance of S3 pipelines when Garbage Collection (GC) is enabled.

Enhancement: Added the ability to backup a database to an HTTPS S3 target with an unverified SSL certificate when using the option: verify_ssl: false.

New feature: ORDER BY SELF JOIN, it creates a self join on ORDER BY LIMIT queries to take advantage of differences in bandwidth.

Bugfix: Fixed an issue with column type checking on base and recursive case of a recursive common table expression.

Bugfix: Fixed an issue that may cause a "Table doesn't exist" error when a multi-insert contains expressions and the target table has a computed column as its shard key.

Enhancement: Expanded existing Unicode characters to support Private Use Area (PUA) code points. Including one in the Basic Multilingual Plane (U+E000–U+F8FF) and one in each plane 15 and 16 (U+F0000–U+FFFFD, U+100000–U+10FFFD).

Bugfix: Fixed a crash that could occur when a computed column definition refers to a table name.

New Feature: Added the ability to set the maximum_blob_cache_size_percent global variable for workspaces.

Bugfix: Fixed an issue with promote lock timeout errors that may occur during a rebalance due to a heavy ingest workload, which causes the merger to be slow to pause.

New Feature: Introduced a new global variable subprocess_max_retries, which is used for retrying on retry-able connection failures during select into/backup queries for S3 and GCS.

New feature: ORDER BY SELF JOIN, it creates a self join on ORDER BY LIMIT queries to take advantage of differences in bandwidth.

February 2023

Enhancement: Background snapshots are now allowed to run during BACKUP DATABASE commands. This prevents increased disk usage by logs during a long-running backup.

Bugfix: The CREATE_TIME, CREATE_USER, ALTER_TIME, and ALTER_USER columns in the information_schema.TABLES view are now properly set for views and TVFs (table-valued functions).

Bugfix: Fixed an issue that occurred when the MATCH_PARAM_<type> argument of the JSON_MATCH_ANYfunction was not in a predicate expression.

Enhancement: Improved the performance of the JSON_MATCH_ANY_EXISTS function over columnstore tables.

Bugfix: Fixed a profiling issue specific to non-collocated hash joins where the memory usage and disk spilling are missing under the join operators.

Enhancement: Background snapshots are now allowed to run during BACKUP commands. This prevents increased disk use by logs during a long running BACKUP.

Bugfix: Attaching a database that exceeds available workspace memory or persistent cache is now automatically blocked.

Enhancement: Workspace creation and resume times are reduced by running operations in parallel.

Bugfix: Fixed an issue where the database engine locks up on certain out-of-memory errors.

Bugfix: Fixed a parsing issue for queries containing multi-line comments near GROUP BY or ORDER BY clauses.

Enhancement: The num_background_merger_threads engine variable is now settable on Cloud.

Enhancement: The ORDER BY ALL [DESC|ASC] (or ORDER BY *) syntax is now supported. 

Enhancement: The GROUP BY ALL [DESC|ASC] (or GROUP BY *) syntax is now supported. 

Enhancement: Improved the query execution performance of JSON columns under a higher level of parallelism.

Enhancement: Expanded support for encoded GROUP BY query shapes containing expressions in aggregates.

Bugfix: Fixed an issue where extra CPU was used when a read-only database is attached to a workspace without any writable mount for the read-only database.

Enhancement: Sampling will no longer be used for table size estimation when statistics are present.

Enhancement: Added the /api/v2/jwks_setup endpoint to Data API to allow users to enable JWT Auth in Data API on Cloud. See jwks_setup for more information.

Enhancement: Improved the code generation performance of tables with a large number of indexes.

Bugfix: Fixed an issue causing incorrect trace messages in master logs where clocks were incorrectly advancing from "0".

Enhancement: Added the DATETIME_PRECISION column to both PARAMETER and ROUTINESinformation_schema views. Also, the DATETIME_PRECISION column will include TIME and TIMESTAMP data types in the COLUMNSinformation_schema view.

Enhancement: Added the REVERSE() built-in string function that reverses the target string.

January 2023

Bugfix: Fixed some error handling issues with unlimited storage download and upload processes.

Enhancement: The SHOW TABLE STATUS command now displays the memory usage by GLOBAL TEMPORARY tables.

Bugfix: Fixed a crash when parsing certain Parquet data into a pipeline.

Enhancement: Added support for using a connection link for inserting data with the FORMAT PARQUET option.

Bugfix: Fixed an issue with aggregate functions using incorrect enum/set types that may result in inaccurate output in the operator tree.

Bugfix: Fixed an issue with a transaction leak on the master aggregator when running CREATE TABLE AS SELECT on a child aggregator using autocommit=0 on the connection.

Bugfix: Fixed a bug that may cause a query to hang when comparing an utf8 column with an utf8mb4 constant. This issue occurs when collation_server is set to one of the utf8mb4 collations.

Bugfix: Improved the accuracy of network time reporting in query profiles regarding the time spent sending the results back to the user connection.

Bugfix: Fixed an edge case issue causing a potential memory leak when running an UPSERT statement against a columnstore table.

Bugfix: Fixed an issue that could cause the engine to crash from info_schema query submissions.

Enhancement: Improved the performance of bushy join rewrites.

Bugfix: Fixed an edge case issue where the engine could crash when performing multi-inserts.

Bugfix: The avro schema registry URL portion of the CREATE PIPELINE syntax is now redacted in processlist.

Bugfix: Fixed an issue where the engine could crash during recursive set operations.

Bugfix: The information_schema.statistics "collation" column now correctly indicates whether an index is ascending ("A") or descending ("D").

Enhancement: Improved performance of comparing utf8mb4 strings.

Bugfix: Fixed an edge case issue which could cause the engine to hang during shutdown.

Enhancement: Added the skip_segelim_with_inlist_threshold engine variable, which will skip segment elimination with the IN list if its size is larger than threshold (default is 1000 elements).

Bugfix: information_schema.table_statistics now correctly shows information about global temporary tables.

December 2022

The following features may require you to enable them manually.

New Feature: Improved Seekability in Universal Storage Tables delivers large performance gains for transactional workloads on universal storage tables. 

  • Added support for fast seeking into JSON columns in a universal storage table using subsegment access.

  • Improved seek performance for string data types for universal storage for LZ4 and run-length encoded (RLE) data.

New Feature: Recursive common table expressions (CTE) are now supported by SingleStore. Previously, complex operations including temporary tables within a stored procedure would be needed to perform the actions that a simple recursive CTE query can handle. For more information, see WITH (Common Table Expressions).

Enhancement: New Information Schema Views

  • Added the MV_RECOVERY_STATUS view which includes information about the status of the current recovery process.

  • Added several Replication Management views.

Enhancement: Subselect lockdown messages are now more informative and they indicate the line number and character offset of the subselect that caused the error. In addition, up to 100 bytes of text from the beginning of the referred subselect is also displayed.

SELECT (SELECT DISTINCT t1.a FROM t ORDER BY a) FROM t t1;

Old output: "Feature 'subselect containing dependent field inside group by' is not supported by SingleStore."

New output: "Feature 'subselect containing dependent field inside group by' is not supported by SingleStore.    Near '(SELECT DISTINCT t1.a FROM t ORDER BY a) FROM t t1' at line 1, character 7."

Enhancement: Decreased the memory overhead for columnstore cardinality statistics by 25% as the first phase of an overall project to improve memory for auto-stats in general.

Enhancement: Improved performance for user-defined functions (UDFs) and Stored Procedures that take JSON arguments, and the JSON_TO_ARRAY command.

Enhancement: Updated the supported syntax for DROP … FROM PLANCACHE so plans on a specified node and plans from all aggregators based on the query text can be dropped.

DROP plan_id FROM PLANCACHE ON NODE node_id;
DROP PLAN FROM PLANCACHE [ON AGGREGATORS] FOR QUERY <query_text>;

Enhancement: Setting Collation for String Literals

You can set the collation for string literals explicitly:

SELECT "My string" COLLATE utf8mb4_unicode_ci;

Enhancement: Created the ALTER USER permission. Users must have this permission or the GRANT permission to be able to execute the ALTER USER command.

Enhancement: Added ALTER USER ... ACCOUNT LOCK to manually lock accounts:

ALTER USER 'test'@'%' ACCOUNT LOCK;

ALTER USER 'test'@'%' ACCOUNT UNLOCK;

Enhancement: Added sampling (a small portion of the rows in the table are used for analysis) for Reference tables as part of query optimization.

Enhancement: Improved the performance of the PROFILE functionality such as lower memory overheads, lower performance impacts to OLAP queries, and better statistics collecting.

Enhancement: Added support for improved segment elimination in queries with WHERE clauses containing DATE and TIME functions. The functions that are supported for segment elimination are DATE, DATE_TRUNC, TIMESTAMP, UNIX_TIMESTAMP, and YEAR.

Enhancement: The data_conversion_compatibility_level engine variable can now be set to '8.0' for stricter data type conversions. This will now be the default value. This new data_conversion_compatibility_level setting additionally flags invalid string-to-number conversion in INSERT statements.

Enhancement: The sync_permissions engine variable default value is now ON. The default value only impacts newly installed clusters. Existing clusters must be manually updated to the variable.

Enhancement: The enable_auto_profile engine variable now has a third value: LITE. LITE is the new default value for new customers. It has a lower memory overhead that ON. The default value for existing customers is ON.

Enhancement: The columnstore_small_blob_combination_threshold engine variable default value has been changed to 5242880 bytes. Prior to the 8.0 release, the default value was 33554432 bytes.

Enhancement: Added support for encoded GROUP BY clauses in queries containing conditional and character expressions in aggregate functions.

Enhancement: Expanded the type of query execution operations (hash joins, window functions, and sort operations) to offload memory to disk using spilling to allow a large memory footprint query to succeed at the cost of query execution times in a memory constraint environment.

Enhancement: Added support for ? and [ ] glob patterns to FS pipelines.

Enhancement: Added the optional parameter DEFINER for CREATE PROCEDURE, FUNCTION, and AGGREGATE.

Enhancement: Added ability for a JSON computed column to be returned in a query instead of the entire document.

Enhancement: Added ability to use use the ORDER BY clause with the JSON_AGG function.

Enhancement: Expressions can be assigned to system variables. System variables, literals, or any combination of these can be referenced using built-ins like CONCAT as a variant of complex expressions.

Enhancement: For unlimited storage databases, SingleStore caches data within the workspace. It uses a modified least-recently-used (LRU(2)) replacement policy. Information is retained to indicate if objects are frequently-accessed. This reduces the chance that a single large query will flush frequently-accessed data from the cache.

Enhancement: Added support for the AUTO option in the computed column definition clause of a CREATE TABLE statement to automatically infer the data type of a computed column expression. For more information, see CREATE TABLE.

Enhancement: Added two Workload Management engine variables: workload_management_queue_size_allow_upgrade and workload_management_dynamic_resource_allocation. These variables work together to dynamically move queries to another queue if the original queue is saturated.

Enhancement: Storage of CHAR(<length>) as VARCHAR(<length>): For a column defined as type CHAR of length len, SingleStore will store the column as a VARCHAR of length len if len greater than or equal to the value of the new engine variable varchar_column_string_optimization_length. If the value of the variable is 0, the column is not stored as a VARCHAR.

Enhancement: After scaling a workspace, the persistent cache on the workspace is warmed with copies of blobs before new resources begin handling queries. It is fully automatic.

Enhancement: The BACKUP command no longer blocks the ALTER TABLE and several other commands for the duration of the backup. This allows you to run commands like TRUNCATE on your tables even during the backup of a very large deployment.

Enhancement: Added the ability to use JSON_MATCH_<ANY>. Returns true if, in the JSON, there is a value at the specified filter path which evaluates the optional filter predicate as true. If no filter predicate is provided, will return true if the filter path exists.

Enhancement: Made the following Selectivity Estimation improvements:

  • Enabled sampling for reference tables.Improved date/time histogram estimates by utilizing a heuristic when the current date/time is outside of the histogram range.

  • Added selectivity estimation for filters containing uncorrelated scalar subselects. This behavior can be controlled by the engine variable exclude_scalar_subselects_from_filters. This change has the side-effect of enabling bloom filters more often.

  • Changed the estimation source to heuristics when sampling is turned on but the total sampled rows are zero.

  • Added ability to use histogram estimation for filtering predicates that use a stored procedure parameter.

  • Increased the default value for engine variable optimizer_cross_join_cost to reduce the chance of Cartesian Joins being included when there are incorrect estimations.

  • Improved the GROUP BY cardinality estimates for predicates using OR expressions.

  • Enabled ability to combine histogram and sampling selectivity estimates by default.

Enhancement: Made the following Query Optimization enhancements:

  • Moved sub-queries for some outer joins from the ON clause to a WHERE clause to enable subselects to be rewritten as joins.

  • Enabled repartition on expressions.

  • Added ability to use GROUP BY push down for outer joins.

  • Enhanced column pruning by eliminating derived duplicate columns.

  • Removed redundant GROUP BY clauses that are implied by equi-joins.

Bugfix: Fixed an issue where REGEXP and RLIKE were case-insensitive. They are now case-sensitive.

June 2022

New Feature: Management API now supports Workspaces. For more information see, Management API Reference.

The following features may require you to enable them manually.

Code Engine - Powered by Wasm

The Code Engine feature allows you to create UDFs using code compiled to WebAssembly (Wasm). This feature supports any language that can compile to the Wasm core specification.

For more information, see Code Engine - Powered by Wasm.

Workspaces

The Workspace feature allows you to spin up compute resources and size them up or down on-demand independent of storage.

Workspaces also provide greater flexibility than workspaces by allowing databases to be shared across multiple workspaces thereby eliminating the need of maintaining data across multiple workloads.

See What is a Workspace for more information.

The SingleStore Management API now supports Workspaces. For more information see, Management API Reference.

OUTBOUND privilege

The OUTBOUND privilege can be used to mitigate security risks. The privilege can be assigned to users who are allowed to create outbound internet connectivity.

For more information, see GRANT.

April 2022

The following features may require you to enable them manually.

New Feature: Flexible Parallelism allows multiple cores on the same node to access the same database partition.

With Flexible Parallelism, as database partitions are created they are divided into sub-partitions. As a query runs on a leaf node, multiple cores working on behalf of the query can process different sub-partitions of a partition in parallel. As an example, if you are currently at one partition per core with Flexible Parallelism, doubling the size of your workspace and then rebalancing will result in two cores for each partition. As a result, a simple query that scans and aggregates all the data in a single partition will now execute more quickly than it did before.

Added new engine variables used for enabling and configuring Flexible Parallelism: sub_to_physical_partition_ratio, query_parallelism_per_leaf_core, and expected_leaf_core_count. The existing engine variable node_degree_of_parallelism is deprecated. For more information, see Flexible Parallelism.

New Function: Added the ISNUMERIC function, used to determine whether the provided expression is a valid numeric type.

New Function: Added the SESSION_USER function, used to return the user name you specified when connecting to the server, and the client host from which you connected.

New Function: Added the SET function, used to initialize a user-defined session variable.

New Function: Added new vector functions, namely VECTOR_ELEMENTS_SUM, VECTOR_KTH_ELEMENT,VECTOR_NUM_ELEMENTS, VECTOR_SORT, and VECTOR_SUBVECTOR.

Enhancement: Added support for TRIM string function. TRIM is now multi-byte safe which means the result of an operation using TRIM is either a valid string or an unmodified string.

Enhancement: Unlimited storage databases now support the BACKUP … WITH SPLIT PARTITIONS command.

Enhancement: The DROP MILESTONE command is used to delete a milestone of a currently attached database.

Enhancement: Improved performance for columnstore seeks into string columns – now it is no longer necessary to scan an entire segment to look up the data for a string value for a row when seeking to find that one row.

New Function: A new clause "AS new_db_name" has been added to the RESTORE DATABASE command which allows the use of the full original backup path if trying to restore to a new database name.

New Function: SECRET - Added the ability to hide credentials from queries. Passing credentials in queries can leave them exposed in plain text during parameterization and they can be seen in logs and the process list. To counter this, you can use the SECRET function (similar in function to NOPARAM). SECRET takes a string (such as a password or other sensitive information) and replaces it with the literal string "<password>" during parameterization. The string is unchanged for the query however.

CALL db.do_something_useful('root', SECRET('super-secret-password'));

See SECRET for more information.

Enhancement: Added per privilege transferability from one user to another via the new TRANSFERABLE clause and SYSTEM_VARIABLES_ADMIN grant in the REVOKE security management command. A new engine variable, privilege_transfer_mode, must be set to per_privilege for this functionality to work as expected. Also, this new functionality will affect the results of the SHOW GRANTS command. If the privilege_transfer_mode engine variable remains on the default value of grant_option, then the output is one row and can include the WITH GRANT OPTION privilege. If the value of privilege_transfer_mode is per_privilege, then the output can be two rows. The first row will display the non-transferable privileges. The second row will display the transferable privileges.

Enhancement: Added new EXPLAIN and PROFILE reproduction clause syntax. EXPLAIN REPRO outputs the explain information in JSON format and provides important debugging information. EXPLAIN REPRO will work for SELECT queries only. The PROFILE REPRO syntax will replace the need to set the engine variable set_profile_for_debug to on. The engine variable will continue to be supported for backward compatibility.

Enhancement: Added ability to match a computed column expression and the same expression appearing in a query, to improve query performance, especially for indexed computed JSON fields. The enhancement promotes data independence between the physical and application layer.

Enhancement: Spilling for GROUP BY statements is enabled by default starting in engine version 7.8. Added an additional engine variable, spilling_minimal_disk_space. If a node has less disk space than spilling_minimal_disk_space (default is 500MB), queries on that node that require spilling will fail instead of spilling to disk.

Enhancement: Materialized CTEs are now on by default and no longer considered a preview feature.Reduced the memory usage of approx_count_distinct by using a more compact representation.

Enhancement: Existing queries are no longer recompiled on minor upgrades (from 7.8.x to 7.8.y for example).

New Function: Added ALTER_TIME, ALTER_USER, and CREATE_USER to the information_schema.TABLES table, to show the time of the latest update the table, the user who made the change, and the user who created the table. For existing tables, the ALTER_TIME value will be NULL until the table is altered. For new tables, ALTER_TIME will be the same as CREATE_TIME.

New Function: Added FLAGS column to the information_schema.VIEWS table, to indicate whether a view is a Table Valued Function (TVF). A value of IS_TABLE_VALUED_FUNCTION indicates a TVF.

Enhancement: An internal component, the LLVM code generation framework, was updated to version 10 from version 3.8. This improves performance of query compilation for DELETES on tables with a very large number of columns.

November 2021

New Feature: Point-in-time recovery (PITR) has moved from preview to production status, and is now supported for production use cases.

Enhancement: By default, all backups are now lock-free. Distributed write transactions no longer have to wait when a backup starts.

New feature: Introduced row-level decompression for the string data type which will increase performance on reads against columnstore tables.  Before this improvement, decompression occurred on the order of blocks of data which consists of 4096 rows.

New Feature: Added support for cross-database INSERT...SELECT into columnstore temporary tables.

Enhancement: Allow spilling hash GROUP BY operator.

New Feature: Added support for SELECT ... INTO AZURE. This command supports the WITH COMPRESSION option, which is described in the next release note.

New Feature: Added the WITH COMPRESSION option to SELECT … INTO FS, SELECT … INTO GCS, and SELECT … INTO S3. WITH COMPRESSION writes the SELECT query results, in compressed .gzip files, to an object store.

New Feature: Added support for new vector functions, namely JSON_ARRAY_UNPACK, SCALAR_VECTOR_MUL, VECTOR_ADD, and VECTOR_MUL.

New Feature: Added support for the current user security model in stored procedures. In this model, when the current user executes a stored procedure, the stored procedure is executed using the security permissions of that user.

New Feature: Added support for external functions, as a preview feature. An external function calls code that is executed outside of a SingleStore database. For more information, see CREATE [OR REPLACE] EXTERNAL FUNCTION.

Enhancement: Improved full-text filter performance when used with other secondary hash index filters. For highly selective full-text filters, the improvement in execution speed can be 10 times faster.

Enhancement: Introduced new logic to determine when to evict a compiled image of a query plan. The logic will sort on the oldest number of plans while considering the explicitly set memory limit usage of each plan. The feature is disabled by default. To enable the logic, the engine variable enable_compiled_images_eviction must be set to ON. The engine variable compiled_images_eviction_memory_limit_mb  is used to set the memory limit.

New Feature: Added a new function, JSON_KEYS, which returns the top-level keys of a JSON object in the form of a JSON array. Optionally, if a keypath is defined, returns the top-level keys from the keypath.

Enhancement: Added support for more query shapes with FULL JOIN or correlated subselects when reference tables are involved. Prior to this release, these query shapes would hit a lockdown error.

New Feature: Added support for query shapes that include repartitioned subqueries containing SELECT statements with aggregated column(s) without a GROUP BY clause. Prior to this release, these query shapes would hit a lockdown error.

New Feature: Added support for LEFT JOIN when the left table is a reference table without a primary key. Prior to this release, this query shape would hit a lockdown error.

Enhancement: Improved query execution for repartition DELETE FROM ...LIMIT and broadcast LEFT JOIN.

Enhancement: Improved selectivity estimate for RIGHT JOIN query shapes when doing BloomFilter decision. See Query Plan Operations for a detailed explanation of BloomFilter and other filtering methods.

Enhancement: Improved query performance using SORT KEY() and KEY () WITH CLUSTERED COLUMNSTORE columns with integer data types.

Enhancement: Added password_expiration column to the information_schema.USERS table. If the password_expiration_seconds engine variable is not enabled, the password_expiration column will be NULL. If the password_expiration_seconds engine variable is enabled, the password_expiration column will display the number of seconds remaining for the password to expire.

Enhancement: Improved the performance of selective filters using string columns in columnstore tables.

New Feature: Added support for UNION between reference and sharded tables.  Prior to this, this query shape would hit a lockdown error.

Enhancement: Improved EXPLAIN output to clarify a result table for a broadcast LEFT JOIN or for a MATERIALIZE_CTE as they can have the same result table name. For broadcast LEFT JOIN, a branch operator is added so that the branching operation on the shared result table is reflected. See Query Plan Operations for a detailed explanation of broadcasts and other distributed data movement.

New Feature: Ingest, Added support for transactions in Kafka pipelines.

Last modified: January 14, 2025

Was this article helpful?