8.0 Release Notes

Note

  • To deploy a SingleStore 8.0 cluster, refer to the Deploy SingleStore Guide.

  • To make a backup of a database in this release or to restore a database backup to this release, follow this guide.

  • The data_conversion_compatibility_level engine variable now defaults to '8.0' in new installations. This results in stricter data type conversions. The value is not changed when upgrading to version 8.0. This new data_conversion_compatibility_level setting additionally flags invalid string-to-number conversion in INSERT statements.

    Applications will likely see more compatibility issues flagged when run against installations with data_conversion_compatibility_level set to '8.0' than when run with a lower compatibility level.

    SingleStore now supports in-place change of the data_conversion_compatibility_level for persisted computed column shard keys for versions 8.0.24 and newer. We still suggest testing any existing applications before deploying a change to a production environment.

Release Highlights

Note

This is the complete list of new features and fixes in engine version 8.0 and its maintenance patches. For a list of all new features and bug fixes added in maintenance releases only, see the maintenance release changelog. If you are upgrading from an existing 8.0 version, see the changelog for a list of new features and fixes added to 8.0 by date.

Code Engine - Powered by Wasm

The Code Engine feature allows you to create UDFs/TVFs using code compiled to WebAssembly (Wasm). This feature supports any language that can compile to the Wasm core specification.

For more information, see Code Engine - Powered by Wasm.

Improved Seekability in Universal Storage Tables

These enhancements will deliver large performance gains for transactional workloads on universal storage tables.

  • Added support for fast seeking into JSON columns in a universal storage table using subsegment access.

  • Improved seek performance for string data types for universal storage for LZ4 and run-length encoded (RLE) data.

Recursive Common Table Expressions

Recursive common table expressions (CTE) are now supported by SingleStore. Previously, complex operations including temporary tables within a stored procedure would be needed to perform the actions that a simple recursive CTE query can handle.

For more information, see WITH (Common Table Expressions).

Initial IPv6 Support

Preview feature: Added initial support for IPv6 to the SingleStore engine (memsqld) and SingleStore Toolbox. Refer to the allow_ipv6 and bind_addressengine variables for more information.

Other Improvements and Fixes

Performance Enhancements

  • Added the ability to cache histogram results during optimization to reduce the work performed by the histograms. (8.0.14)

  • Improved the parsing performance of queries that contain several tables. (8.0.14)

  • Improved the performance of S3 pipelines when Garbage Collection (GC) is enabled. (8.0.14)

  • Improved the query execution performance of JSON columns under a higher level of parallelism. (8.0.10)

  • Improved the performance on columnstore scans that perform multiple JSON extraction operations on the same JSON column. (8.0.8)

  • Improved the performance of various commands (SHOW commands, DDL, etc.) when there are very many views or tables in the database (100s of thousands). (8.0.6)

  • Improved performance of comparing utf8mb4 strings. (8.0.5)

  • Improved performance for user-defined functions (UDFs) and Stored Procedures that take JSON arguments, and the JSON_TO_ARRAY command.

  • Decreased the memory overhead for columnstore cardinality statistics by 25% as the first phase of an overall project to improve memory for auto-stats in general.

  • Improved the performance of the PROFILE functionality such as lower memory overheads, lower performance impacts to OLAP queries, and better statistics collecting.

Query Optimization enhancements:

  • Moved sub-queries for some outer joins from the ON clause to a WHERE clause to enable subselects to be rewritten as joins.

  • Enabled repartition on expressions.

  • Added ability to use GROUP BY push down for outer joins.

  • Enhanced column pruning by eliminating derived duplicate columns.

  • Removed redundant GROUP BY clauses that are implied by equi-joins.

  • Added sampling (a small portion of the rows in the table are used for analysis) for Reference tables as part of query optimization.

  • Added support for improved segment elimination in queries with WHERE clauses containing DATE and TIME functions. The functions that are supported for segment elimination are DATE, DATE_TRUNC, TIMESTAMP, UNIX_TIMESTAMP, and YEAR. See each topic for specific examples.

Selectivity Estimation improvements:

  • Enabled sampling for reference tables.

  • Improved date/time histogram estimates by utilizing a heuristic when the current date/time is outside of the histogram range.

  • Added selectivity estimation for filters containing uncorrelated scalar subselects. This behavior can be controlled by the engine variable exclude_scalar_subselects_from_filters. This change has the side-effect of enabling bloom filters more often.

  • Changed the estimation source to heuristics when sampling is turned on but the total sampled rows are zero.

  • Added ability to use histogram estimation for filtering predicates that use a stored procedure parameter.

  • Increased the default value for engine variable optimizer_cross_join_cost to reduce the chance of Cartesian Joins being included when there are incorrect estimations.

  • Improved the GROUP BY cardinality estimates for predicates using OR expressions.

  • Enabled ability to combine histogram and sampling selectivity estimates by default.

New Information Schema Views and Columns

  • Add an information schema table (JSON_COLUMN_SCHEMA) which shows the schema inferred for JSON columns in columnstore tables. (8.0.15)

  • Added the DATETIME_PRECISION column to both PARAMETER and ROUTINESinformation_schema views. Also, the DATETIME_PRECISION column will include TIME and TIMESTAMP data types in the COLUMNSinformation_schema view. (8.0.9)

  • Added a new information_schema view named LMV_LOCAL_DATABASES. This view shows the state of local databases like SHOW DATABASES EXTENDED, but it can be queried against unlike show commands. (8.0.9)

  • Added CREATE_TIME and ALTER_TIME columns to information_schema.pipelines. CREATE_TIME shows the date/time a pipeline was created or recreated. ALTER_TIME shows the date/time a pipeline was altered via an ALTER PIPELINE statement. (8.0.6)

  • Added a new information schema view internal_table_statistics which shows memory use of SingleStore internal metadata tables. The columns displayed are the same as those shown for table_statistics.

  • Added several Replication Management views.

  • Added the MV_RECOVERY_STATUS view which includes information about the status of the current recovery process.

  • Added the AVERAGE_DISK_SPILLING_USE column to the MV_PLANCACHE information schema table. It shows the average amount of data (in bytes) spilt to disk during query execution.

New Commands and Functions

  • Added  JSON_INCLUDE/EXCLUDE_MASK function when applied to a JSON document; it will return a subset of the document based on the mask. (8.0.18)

    JSON_EXCLUDE_MASK(<json>,<mask>);
    JSON_INCLUDE_MASK(<json>,<mask>);
  • Added syntax to allow multiple leaf nodes to be detached using the DETACH LEAF command. (8.0.14)

  • New feature: ORDER BY SELF JOIN, it creates a self join on ORDER BY LIMIT queries to take advantage of differences in bandwidth. (8.0.13)

  • The ORDER BY ALL [DESC|ASC] (or ORDER BY *) syntax is now supported. (8.0.10)

  • The GROUP BY ALL [DESC|ASC] (or GROUP BY *) syntax is now supported. (8.0.10)

  • Added the REVERSE() built-in string function that reverses the target string. (8.0.9)

  • Added a new OPTIMIZE TABLE <table_name> INDEX; command for columnstore tables. This command runs the optimization routine for columnstore secondary indexes manually. (8.0.7)

  • The SHOW STATUS EXTENDED command contains a new "Gv_clock" key whose value is the current logical clock of the server. (8.0.5)

  • The SHOW DATABASE STATUS command contains a new "gv_clock" key whose value is the current logical clock of the server. (8.0.5)

  • Added support to PROFILE for hash join spilling. (8.0.5)

  • Improved the accuracy of network_time in PROFILE output for some query shapes. (8.0.5)

  • Added the ability to run SHOW GRANTS within stored procedures.

  • Added ALTER USER ... ACCOUNT LOCK to manually lock accounts:

    ALTER USER 'test'@'%' ACCOUNT LOCK;
    ALTER USER 'test'@'%' ACCOUNT UNLOCK;
  • Added support for encoded GROUP BY clauses in queries containing conditional and character expressions in aggregate functions.

  • Updated the supported syntax for DROP … FROM PLANCACHE so plans on a specified node and plans from all aggregators based on the query text can be dropped.

    DROP plan_id FROM PLANCACHE ON NODE node_id;
    DROP PLAN FROM PLANCACHE [ON AGGREGATORS] FOR QUERY <query_text>;
  • Added the optional parameter DEFINER for CREATE PROCEDURE, FUNCTION, and AGGREGATE.

  • Added ability to use use the ORDER BY clause with the JSON_AGG function.

  • The BACKUP command no longer blocks the ALTER TABLE and several other commands for the duration of the backup. This allows you to run commands like TRUNCATE on your tables even during the backup of a very large deployment. For a complete list of commands no longer blocked during backup refer to Lock-free Backups.

  • Added support for the AUTO option in the computed column definition clause of a CREATE TABLE statement to automatically infer the data type of a computed column expression. For more information, see CREATE TABLE.

  • Added the ability to use JSON_MATCH_<ANY>. Returns true if, in the JSON, there is a value at the specified filter path which evaluates the optional filter predicate as true. If no filter predicate is provided, will return true if the filter path exists.

Engine Variables

  • Added new session variable disable_remove_redundant_gby_rewrite to prevent the GROUP BY columns from being removed when used in an ORDER BY clause. (8.0.14)

  • Introduced a new global variable subprocess_max_retries, which is used for retrying on retry-able connection failures during select into/backup queries for S3 and GCS. (8.0.13)

  • Fixed the case where the REGEXP_REPLACE and REGEXP_SUBSTR expressions can produce non-utf8 strings by introducing the new engine variableregexp_output_validation_mode. Regular expression built-ins can produce non-utf8 strings because they don't have full support for multi-byte characters. The engine variable controls this behavior if regular expression built-ins return strings that are invalid under its collation settings. (8.0.8)

  • Added the ignore_foreign_keys system variable, which allows foreign key syntax in CREATE TABLE commands, but completely ignores the key (it will not show up in metadata). (8.0.7)

  • Added a new option, SERVER_V2, to the json_extract_string_collationengine variable. This new, recommended option is the default for new clusters, and allows comparison of utf8mb4 strings extracted from JSON to utf8 string constants. Existing clusters will retain their original setting upon upgrade. (8.0.7)

  • Added the skip_segelim_with_inlist_threshold engine variable, which controls when segment elimination will not use an IN list that is too large (default 1000 elements). (8.0.7)

  • Added a new global variable, maximum_blob_cache_size_percent, which can set the blob cache size as a value from 0 to 1 that is percentage of local disk the blob cache is allowed to use. The default value is 0. (8.0.5)

  • Added a new global variable, num_background_merger_threads, which controls the number of background merger threads to start for each node. The default value is 2. (8.0.5)

  • Setting Collation for String Literals

    You can set the collation for string literals explicitly:

    SELECT "My string" COLLATE utf8mb4_unicode_ci;
  • Added two Workload Management engine variables: workload_management_queue_size_allow_upgrade and workload_management_dynamic_resource_allocation. These variables work together to dynamically move queries to another queue if the original queue is saturated.

  • The columnstore_small_blob_combination_threshold engine variable default value has been changed to 5242880 bytes. Prior to the 8.0 release, the default value was 33554432 bytes.

  • Storage of CHAR(<length>) as VARCHAR(<length>): For a column defined as type CHAR of length len, SingleStore will store the column as a VARCHAR of length len if len greater than or equal to the value of the new engine variable varchar_column_string_optimization_length. If the value of the variable is 0, the column is not stored as a VARCHAR.

  • The sync_permissions engine variable default value is now ON.The default value only impacts newly installed clusters. Existing clusters must be manually updated to the variable.

  • The data_conversion_compatibility_level engine variable can now be set to '8.0' for stricter data type conversions. This will now be the default value. This new data_conversion_compatibility_level setting additionally flags invalid string-to-number conversion in INSERT statements.

Miscellaneous

  • Improvements to memsql_exporter: Improved error handling and reduced memory usage. (8.0.18)

  • Updated timezone metadata to include Mexico's latest timezone change. (8.0.17)

  • Added support for LOAD DATA from S3 for Avro and Parquet data. (8.0.17)

  • Improved column type resolution for base and recursive branches in recursive common table expressions (CTEs). (8.0.15)

  • Added the ability to backup a database to an HTTPS S3 target with an unverified SSL certificate when using the option: verify_ssl: false. (8.0.14)

  • Expanded existing Unicode characters to support Private Use Area (PUA) code points. Including one in the Basic Multilingual Plane (U+E000–U+F8FF) and one in each plane 15 and 16 (U+F0000–U+FFFFD, U+100000–U+10FFFD). (8.0.13)

  • Added the /api/v2/jwks_setup endpoint to Data API to allow users to enable JWT Auth in Data API on Cloud. See jwks_setup for more information.

  • Added the option to authenticate Data API requests using JWT.

  • Created the ALTER USER permission. Users must have this permission or the GRANT permission to be able to execute the ALTER USER command.

  • Expressions can be assigned to system variables. System variables, literals, or any combination of these can be referenced using built-ins like CONCAT as a variant of complex expressions.

  • Added support for ? and [ ] glob patterns to FS pipelines.

  • Added ability for a JSON computed column to be returned in a query instead of the entire document.

  • Expanded the type of query execution operations (hash joins, window functions, and sort operations) to offload memory to disk using spilling to allow a large memory footprint query to succeed at the cost of query execution times in a memory constrained environment.

  • Flexible Parallelism is enabled by default. All new clusters created will have Flexible Parallelism enabled. This does not apply to updated clusters and restored databases, which will retain their original settings.

    This change in behavior could impact applications that create databases or clusters. To retain the old behavior (Flexible Parallelism not enabled when creating clusters/databases) disable Flexible Parallelism prior to object creation via the sub_to_physical_partition_ratio engine variable. To use Flexible Parallelism, it must be enabled prior to database creation.

  • Subselect related lockdown messages are now more informative and they indicate the line number and character offset of the subselect that caused the error. In addition, up to 100 bytes of text from the beginning of the referred subselect is also displayed.

    SELECT (SELECT DISTINCT t1.a FROM t ORDER BY a) FROM t t1;
    
    Old output: "Feature 'subselect containing dependent field inside group by' is not supported by SingleStore."
    
    New output: "Feature 'subselect containing dependent field inside group by' is not supported by SingleStore.
        Near '(SELECT DISTINCT t1.a FROM t ORDER BY a) FROM t t1' at line 1, character 7."
  • After adding a new leaf node to a cluster and rebalancing partitions, the blob cache on the new leaf is warmed with copies of blobs from the node originally holding data that has been moved to the new leaf. This is done before the new leaf begins handling queries. It is fully automatic.

  • For unlimited storage databases, SingleStore caches data from remote storage on local disks or SSDs. It uses a modified least-recently-used (LRU(2)) replacement policy. Information is retained to indicate if objects are frequently-accessed. This reduces the chance that a single large query will flush frequently-accessed data from the cache.

  • Fixed an issue where REGEXP and RLIKE were case-insensitive on binary collations (for example, utf8_bin). They are now case-sensitive for binary collations.

In this section

Last modified: September 24, 2024

Was this article helpful?