8.1 Release Notes

To deploy a SingleStore 8.1 cluster, refer to the Deploy SingleStore Guide.
To upgrade a self-managed install to this release, follow this guide.
To make a backup of a database in this release or to restore a database backup to this release, follow this guide.
The data_conversion_compatibility_level engine variable now defaults to '8.0' in new installations. This results in stricter data type conversions. The value is not changed when upgrading to version 8.0. This new data_conversion_compatibility_level setting additionally flags invalid string-to-number conversion in INSERT statements, PIPELINES, LOAD DATA commands, and type casts.

Applications will likely see more compatibility issues flagged when run against installations with data_conversion_compatibility_level set to '8.0' than when run with a lower compatibility level.
New deployments of SingleStore 8.1 require a 64-bit version of RHEL/AlmaLinux 7 or later, or Debian 8 or later, with kernel 3.10 or later and glibc 2.17 or later. Refer to the System Requirements and Recommendations for additional information.

Release Highlights

Note

This is the complete list of new features and fixes in engine version 8.1 and its maintenance patches. For a list of all new features and bug fixes added in maintenance releases only, see the maintenance release changelog. If you are upgrading from an existing 8.1 version, see the changelog for a list of new features and fixes added to 8.1 by date.

Column Group Index

Introduced a columnstore row index structure (COLUMN GROUP) that will create a materialized copy of individual rows as a separate structure in a columnstore table. This index is most useful for wide tables, and can speed up full-row retrieval and updates by 8x or more on tables with over 200 columns.

Automatic Profiling

Added support for query profiling that is efficient enough to be enabled all the time by introducing the auto_profile_type engine variable with FULL and LITE options for automatic profiling. The enable_auto_profile engine variable must be set to ON for the FULL or LITE options to work.

Visual Explain via SHOW PROFILE UI and EXPLAIN

Using the SHOW PROFILE UI or EXPLAIN UI commands will generate a URL that, when opened, loads a visual representation of the query plan on the Visual Explain website.

User ATTRIBUTE and COMMENT Fields

Added ATTRIBUTE and COMMENT fields for users. These can be set via the CREATE USER and ALTER USER commands. The values are shown in the INFORMATION_SCHEMA.USERS view. Example syntax:

SQL

CREATE USER 'user1'@'localhost' ATTRIBUTE '{"phone": "1234567890"}';

CREATE USER 'user1'@'localhost' COMMENT 'Some information about user1';

ALTER USER 'user1'@'localhost' ATTRIBUTE '{"phone": "1234567890"}';

ALTER USER 'user1'@'localhost' COMMENT 'some comment about user1';

ALTER USER 'user1'@'localhost' ATTRIBUTE '{"phone": "1234567890"}' COMMENT 'some comment about user1';

SELECT USER, ATTRIBUTE, COMMENT from INFORMATION_SCHEMA.USERS WHERE user='user1';

+-------------+-------------------------+------------------------------+
| USER        | ATTRIBUTE               | COMMENT                      |
+-------------+-------------------------+------------------------------+
| user1       | {"phone": "1234567890"} | some information about user1 |
+-------------+-------------------------+------------------------------+

The value for ATTRIBUTE must be a valid JSON object. When setting both ATTRIBUTE and COMMENT in the same SQL statement, ATTRIBUTE must come first.

Full-Text Search Is Now Supported on utf8mb4 Data

The updated full-text search tokenizer in version 8.1 supports utf8mb4. The tokenizer properly tokenizes emojis and other extended 4-byte characters. In addition, certain emoji and glyph characters in utf8mb3 are also recognized as their own token rather than being treated as blanks.

New full-text indexes will use the new tokenizer when created. For existing full-text indexes, you can leave them as they are, using the old tokenizer, or you can drop old/existing indexes (via DROP INDEX or ALTER TABLE DROP INDEX) and recreate them via the new ALTER TABLE ADD FULLTEXT statement.

Other than adding proper support for emojis and extended characters, full-text search behavior using the new tokenizer is different from that of the previous tokenizer only in some rare edge cases. These are extremely unlikely to have any negative user impact.

SASL OAUTHBEARER Authentication Mechanism

Added the ability to configure Simple Authentication and Security Layer (SASL) OAUTHBEARER for use with Kafka pipelines. OAuthBearer support in Kafka enables clients to authenticate with Kafka using OAuth 2.0 tokens. With OAuthBearer support in Kafka pipelines, clients can use an OAuth 2.0 access token to authenticate with Kafka brokers and perform operations such as producing and consuming messages.

Trace Log Rotation Function

SingleStore now has a built-in trace log rotation function (added in version 8.1.25). This is separate from and mutually exclusive of any manual log rotation implementation already in place, so if you want to use the new function and already have a solution in place using logrotate or similar, you will need to disable that before enabling this new rotation feature.

The built-in trace log rotation feature can be configured to rotate logs based on their size or based on time. You can also, optionally, configure it to automatically delete logs after a specified number of days. The default, when the feature is enabled, is to never delete logs.

The new trace log rotation feature is disabled by default. See Rotating Trace Log to Manage its Size for information on how to enable and configure the new rotation feature.

As a part of this new feature, added the following engine variables:

tracelog_rotation_size - indicates the size threshold in bytes
tracelog_rotation_time - indicates the time threshold in seconds
tracelog_retention_period - indicates retention period in days

Other Improvements and Fixes

Other Performance Enhancements

Enhanced workload management so that a medium-size query can be put in the large-query queue if there are free resources for large queries but the resources for medium-size queries are fully used. This can improve throughput when there are many medium-size queries and few large ones. See the related discussion of the new variable workload_management_queue_size_allow_upgrade.
Enhanced disk space manageability that allows controlled space sharing between column store blob cache and query spilling. Supports setting the maximum disk spilling space limit explicitly with a new engine variable spilling_maximum_disk_percent. Additional metrics in the SHOW STATUS EXTENDED output give details about the disk usage:
- Disk_spilling_usage_limit_mb
- Disk_spilling_usage_mb
- Disk_spilling_evict_blob_cache_mb_average
- Disk_spilling_usage_mb_high_water_mark
- Disk_total_size_mb
- Disk_available_size_mb
- Disk_uncategorized_usage_mb
- Blob_cache_max_size_mb_adjusted_for_low_disk
- Blob_cache_max_size_mb_adjusted_for_low_disk_and_spilling

Query Optimization enhancements:

Added support to optimize table scans with an Order-By-Limit by (a) do the order and limit first, with only the minimum needed columns, and then (b) using a self-join to retrieve the additional necessary columns for only the qualifying rows. This optimization can be enabled/disabled via the optimizer_enable_orderby_limit_self_join session variable, which is ON by default.
Added a METADATA_ONLYargument for shard keys which will prevent an index from being created on shard keys thereby saving memory. It can cause queries that would have used that index to run slower.
Reduced the amount of memory used by autostats. This can be controlled via the autostats_flush_interval_secs engine variable.

New Information Schema Views and Columns

Added two Information Schema views for tracking connection attributes: LMV_CONNECTION_ATTRIBUTES and MV_CONNECTION_ATTRIBUTES.
A new WORKLOAD_MANAGEMENT_QUEUE system table has been added which shows the list of queries for which the WM is not able to reserve resources and therefore has put them into the waiting queue.
Added two Information Schema views for reporting the breakdown of disk utilization: information_schema.MV_DATA_DISK_USAGE and information_schema.MV_DISK_USAGE.

New Commands and Functions

Added the ability to add and drop FULLTEXT indexes via the ALTER TABLE and DROP INDEX commands. This allows simple addition or removal of full-text search capability without reloading data. A table can only have one FULLTEXT index. For example, with the following table:
SQL
```
CREATE TABLE articles (
    id INT UNSIGNED,
    year int UNSIGNED,
    title VARCHAR(200),
    body TEXT,
    SORT KEY (id));
```
Add a FULLTEXT index via:
SQL
```
ALTER TABLE articles ADD FULLTEXT (title, body);
```
To remove the index, use either of these commands:
SQL
```
ALTER TABLE articles DROP INDEX title;

DROP INDEX title ON articles;
```
You cannot add and drop an index in a single ALTER TABLE statement.
Added the ability to start only failed pipelines via the START FAILED PIPELINES command.
SQL
```
START FAILED PIPELINES;
```

Added the ability to load Avro formatted data from an AWS S3 bucket using LOAD DATA, which can be simpler than using pipelines.

SQL

LOAD DATA S3 '<bucket name>'
CONFIG '{"region" : "<region_name>"}'
CREDENTIALS '{"aws_access_key_id" : "<key_id> ",
             "aws_secret_access_key": "<access_key>"}'
INTO TABLE <table_name>
       (`<col_a>` <- %,
 `<col_b>` <- % DEFAULT NULL ,
  ) FORMAT AVRO;

Added the ability to load Parquet formatted data from an AWS S3 bucket, which can be simpler than using pipelines.

SQL

LOAD DATA S3 '<bucket name>'
CONFIG '{"region" : "<region_name>"}'
CREDENTIALS '{"aws_access_key_id" : "<key_id> ",
             "aws_secret_access_key": "<access_key>"}'
INTO TABLE <table_name>
       (`<col_a>` <- %,
 `<col_b>` <- % DEFAULT NULL ,
  ) FORMAT PARQUET;

Added ability for users with SUPER or ALTER USER privileges to be able to see the hashed password.
SQL
```
SHOW GRANTS FOR <user>;
```
Added the WARM BLOB CACHE FOR INDEX clause to the OPTIMIZE TABLE command to ensure consistent query performance when querying through a hash index, irrespective of when the query was last accessed.
Added support for creating a DEEP COPY of tables with computed columns. (8.1.31)
Added support for the REGEXP_MATCH() function. This function returns a JSON array of matching substring(s) within the first match of a regular expression pattern to a string. (8.1.18)

New or Modified Engine Variables

Enhancement: Introduced a new session engine variable optimizer_disable_transitive_predicates which disables predicate transitivity on query rewrites.
Added the following engine variables: max_expression_query_limit which sets a limit on the number of expressions within an entire query and max_expression_item_limit which sets a limit on the number of expressions within a query item. Both can be set to a range between 100 and the maximum unsigned INT value. Setting these engine variables to the maximum unsigned INT value disables both features. (8.1.21)
Added two engine variables, jwks_username_field and jwks_require_audience to add flexibility and improve security. (8.1.20)
Added support for materializing CTEs without recomputing them when the query contains UNION, UNION ALL, and other SET operations. To enable the feature, set the engine variableallow_materialize_cte_with_union to TRUE. (8.1.20)
json_document_max_leaves which limits the number of JSON key paths inferred within a segment. The default value is 10000. (8.1.18)
pipelines_parse_errors_threshold, which is used in conjunction with SKIP PARSER ERRORS for Kafka pipelines. (8.1.16)
dr_min_connection_timeout_ms, which allows users to adjust the minimum timeout period in Disaster Recovery (DR) replication. (8.1.15)
workload_management_enable_large_memory_classification: When enabled allows workload management to classify queries as large solely based on memory usage. This is an Aggregator Only engine variable. (8.1.9)
workload_management_expected_aggregators: The expected number of aggregators that will be used to run a high volume of client queries which require fully distributed execution. The default value is 0, which is equivalent to setting it to the total number of aggregators in the Products: cluster. For version 7.0 and up, this variable should only be set to its default of 0, unless `workload_management_enable_static_partitioning is set to TRUE. This variable can sync to all aggregators. (8.1.9)
Improved the performance of a left join of a reference table with a sharded table by pushing the reference table clauses to the leaves. The engine variable disable_reference_table_leftjoin_where_pushdown must be set to "OFF" to enable this operation. (8.1.4)
auto_profile_type: Determines the mode of auto profiling. There are two values: LITE and FULL.
autostats_flush_interval_secs: Determines when autostats are flushed to disk if they are not used within the specified time. The default value is 600 seconds. If the engine variable is set to 0, autostats will always stay in memory.
compiled_images_eviction_memory_limit_percent: Sets the percentage of maximum_memory that is available for caching query object code images. This variable is set to 0 by default. For more information about this engine variable, see Managing Plancache Memory and Disk Usage.
compiled_images_eviction_memory_limit_mb: Sets the memory limit (in MBs) that is available for caching query object code images. For more information about this engine variable, see Managing Plancache Memory and Disk Usage.
enable_compiled_images_eviction is now enabled by default.
sql_mode is now a Sync variable and is settable on SingleStore Helios.
spilling_maximum_disk_percent: Sets the maximum percentage of disk space to be used by disk spilling. The value must be between 0 and 1 or left to the default (-1).
Two new Sync variables control the dynamic reuse of WM queues: workload_management_dynamic_resource_allocation and workload_management_queue_size_allow_upgrade. The default value for workload_management_queue_size_allow_upgrade is 1. This means we can upgrade queries from the medium queue to the large queue until the large queue length becomes equal to 1.
backup_multipart_upload_concurrency : Improves backup performance.
max_execution_time: It is unused and setting this variable has no effect. It exists so certain MySQL-compatible apps that have this variable set will not fail when connected to SingleStore.
Introduced a new session engine variable optimizer_disable_transitive_predicates which disables predicate transitivity on query rewrites.
Added the bottomless_experimental_blobstore_mode engine variable, which when enabled, completes additional verification of persisted files immediately after upload. This mode is experimental and may reduce upload speed. Please use with caution.
The data_conversion_compatibility_level engine variable can now be set to '8.1' for stricter data type conversions. This new data_conversion_compatibility_level setting additionally flags invalid or out-of-range timestamp inputs to TIMESTAMP and TIMESTAMP(6) columns in INSERT statements. Please note: The data_conversion_compatibility_level engine variable defaults to '8.0' in new installations.

Miscellaneous

Bugfix: Fixed an issue caused when multi-part names were used in common table expressions (CTEs) within UNION. (8.1.53)
Bugfix: Fixed a crash that could occur when columns were dropped with ALTER TABLE. (8.1.53)

Bugfix: Fixed an issue where left join returned the wrong result when spilling occurred. (8.1.52)
Allow Kerberos users to skip password validation for users who are authenticated with plugins. (8.1.42)
Updated the information_schema.USERS view to reflect the account status for locked users. (8.1.32)
Added more log messages for backup sub-process errors. (8.1.32)
Improved the ability to kill queries containing JSON built-in functions. (8.1.32)
The query optimizer now considers more left join elimination cases. (8.1.32)
Added support for creating a DEEP COPY of tables with computed columns. (8.1.31)
Added additional config and credential option validation while creating pipelines. (8.1.31)
Fixed an issue with the SET clause of the CREATE PIPELINE statement: now it is possible to use type options like NULL/NOT NULL/COLLATE... with :>/!:> typecast operators. (8.1.29)
Improved performance during snapshotting for CDC-in pipelines. (8.1.27)
Added additional node metrics to the /cluster-metrics endpoint of the memsql_exporter. (8.1.27)
Added support for simple multi-column update with sub-query. (8.1.25)
Added ability to use SSL keys with a password in the HTTP API. (8.1.25)
Improved the performance of mv_replication_status for clusters with large DR databases. (8.1.24)
Added ability to infer CSV data with text boolean values. (8.1.24)
Added resource pool metrics to the /cluster-metrics endpoint of the memsql_exporter. (8.1.24)
Improved blob cache sync performance when there is no running query. (8.1.23)
Added support for flipping join order for FULL OUTER JOIN operations. (8.1.23)
Enhanced the performance of DDL statements for role manipulation. (8.1.20)
Improved memory consumption in json decoding. (8.1.19)
SKIP PARSER ERRORS is now supported for Kafka. (8.1.16)
Added support for multiple uncorrelated IN-subselects in more query shapes. (8.1.16)
SingleStore automatically rewrites A=B OR (A IS NULL AND B IS NULL) to null safe equal (A<=>B) to enable hash joins. (8.1.16)
Added support for hash joins on null-accepting expressions in the ON condition of outer joins. (8.1.16)
Added support for Control Group v2 (cgroup v2). (8.1.13)
Default values for BLOB/TEXT and JSON columns are allowed, as well as NULL and empty strings. (8.1.12)
Added an opt-in optimization to Kerberos for HDFS pipelines to reduce the amount of requests. (8.1.11)
Improved the accuracy of the blob_fetch_network_time statistic in query profiles. (8.1.8)
Added metrics to memsql_exporter that show the breakdown of disk usage by SingleStore. Also, added metrics to memsql_exporter that show unlimited storage utilization statistics. (8.1.8)
Enhanced the on-disk plancache functionality by shrinking the plancache when disk space gets low and the on-disk plancache is using at least 25% of available disk space. (8.1.8)
Improved the performance of backups to GCS (Google Cloud Storage). GCS backups are not using multi-part uploads. (8.1.6)
The FLUSH EXTRACTOR POOLS command now runs on all nodes in a cluster when run on an aggregator. (8.1.6)
Now allowing child aggregator root users to be dropped even if the sync_permissionengine variable is enabled. (8.1.4)
Removed the restrictions on the names of JWT users. Customers may now create users with names that look like UUIDs and emails. (8.1.4)
Added more information about blob files that need to be downloaded from unlimited storage to the blob cache when running PROFILE. (8.1.4)
Changed the system information built-ins to use the utf8_general_ci collation for the collation_server value. (8.1.3)
Added capability to create Wasm functions in SingleStoreclusters with heterogeneous hardware, e.g., with different levels of SIMD support.
Wasm modules are now serialized to the plancache. Refer to Code Engine - Powered by Wasm for more information.
SingleStore now supports creation of Wasm-based user-defined aggregate functions (UDAFs). Refer to CREATE AGGREGATE for more information.
The CREATE FUNCTION statement now returns an error if the USING EXPORT clause is omitted and the Wasm module has functions with the same name but different case. Refer to Wasm UDF/TVF Naming Convention for more information.
Added a new metric to the output of the SHOW STATUS EXTENDED command, which displays the memory allocated (in MBs) used to cache query object code images, as derived from compiled_images_eviction_memory_limit_percent and compiled_images_eviction_memory_limit_mb.
Adjusted how CREATE_TIME, CREATE_USER, ALTER_TIME, and ALTER_USER columns in information_schema.TABLES are set so that they are given proper values for the relevant CREATE and ALTER commands.
Enhanced support for recursive common table expressions (CTEs) by expanding the range of query shapes allowed, improving column type checking and resolution between the base and recursive branches, and adding cross database support. Also, resolved the issue that caused the "ER_UNSUPPORTED_RECURSIVE_CTE_SHAPE" error with the accompanying message about dependency on a streaming result table outside the cycle.
Renamed spill_outputted_rows to non_aggregated_output_rows for the shuffle groupby operator in the profiling output to avoid confusion with disk spilling.
```
[...]
"non_aggregated_output_rows":{ "value":1048576, "avg":349525.333333,
[...]
```
Improved JSON encoding speed for sparse schemas (JSON schemas with a very large number of distinct property names across the full set of documents, with most properties missing in each document). This applies only to universal storage.
Incremental backups are supported on SingleStore Self-Managed for unlimited storage space. Note: incremental backups are not supported as part of the managed backup solution.
Updated OpenSSL to version 3.0, which is now used to establish secure connections to SingleStore. As a consequence, a client certificate that uses SHA or MD5 hash functions in its signature must be replaced with a certificate that uses SHA256 at a minimum, or a secure connection to SingleStore cannot be established. While SingleStore supports TLS v1, TLS v1.1, and TLS v1.2, using TLS v1.2 is recommended. When FIPS is enabled, only TLS v1.2 is supported. Refer to Troubleshoot OpenSSL 3.0 Connections for more information.
The ORDER BY ALL [DESC|ASC] (or ORDER BY *) syntax is now supported.
The GROUP BY ALL [DESC|ASC] (or GROUP BY *) syntax is now supported.
Added the following JSON functions:
- JSON_MATCH_ANY returns true if there exists a value in the JSON at the filter path for which the filter predicate evaluates to true.
  SQL
```
JSON_MATCH_ANY(<filter_predicate>, <json>, <filter_path>);
```
- JSON_MATCH_ANY_EXISTS returns true if there is a value (possibly null) in the JSON at the filter path.
  SQL
```
JSON_MATCH_ANY_EXISTS(<json>, <filter_path>);
```
- JSON_INCLUDE_MASK eliminates all portions of the document that do not match the mask.
  SQL
```
JSON_INCLUDE_MASK(<json>,<mask>);
```
- JSON_EXCLUDE_MASK eliminates all portions of the document that match the mask.
  SQL
```
JSON_EXCLUDE_MASK(<json>,<mask>);
```

Changes in Patch Releases Since 8.0GA

Several features have been added since the GA release of 8.0 (8.0.5). Particularly notable ones include GROUP BY ALL, ORDER BY ALL, and several new JSON functions; however, there are others. Please refer to the 8.0 release notes or maintenance release change log for additional details.