Watch the 7.3 Webinar On-Demand
This new release brings updates to Universal Storage, query optimization, and usability that you won’t want to miss.

7.3 Release Notes

Info

To deploy a SingleStore DB 7.3 cluster, refer to the Deploy SingleStore DB Guide.

To upgrade a self-managed install to this release, follow this guide.

To make a backup of a database in this release or to restore a database backup to this release, follow this guide.

Release Highlights

The SingleStore DB 7.3 release is focused on storage, query processing, and programmability. As Universal Storage evolves, useful features such as columnstore as a default and support for upserts into columnstore tables have been added. Other highlights include data definition language (DDL) forwarding from child to master aggregator, a new, safer command for promoting a child to master aggregator, support for large queries that join over 30 tables, and a variety of new engine variables.

System of Record

  • Added three information schema views: mv_aggregated_replication_status, mv_replication_status, and lmv_replication_status to monitor the progress of a DR replication. Users can now view the aggregated replication status of each database (including partition-level details) and replication links between the primary and the secondary cluster to know if there is any lag in replication, and view statistics related to the lag.
  • Added a new column, type, to the Backup History Table. This column shows the type of backup and can be accessed by querying the information_schema.MV_BACKUP_HISTORY table.
  • Tables that define a unique key using UNENFORCED now have an INDEX_TYPE of NONE in the information schema. The INDEX_TYPE was listed as BTREE in previous versions.

Storage

  • Implemented forwarding of data definition language (DDL) commands from child to master aggregator. Previously, these commands could only be run on a master aggregator. See Node Requirements for SingleStore DB Commands for more information about how to enable this feature.
  • Database-level DDL and clustering operations are now allowed to run in parallel across databases.
  • Added new command REBALANCE ALL DATABASES, which rebalances the partitions on all databases in the cluster.
  • Added the FULL option to REBALANCE PARTITIONS which takes effect when the number of partitions in the database is not divisible by the number of leaves. The extra partitions are placed on the leaves containing the fewest number of partitions.
  • Added new command PROMOTE AGGREGATOR … TO MASTER for all use cases that require promotion of a child aggregator to master, aside from permanent loss of the master aggregator.
  • Added the information schema view information_schema.MV_BACKUP_STATUS for monitoring backup progress.

Universal Storage

  • Added support for INSERT … ON DUPLICATE KEY UPDATE, INSERT … IGNORE, and REPLACE on columnstore tables.
  • Added the columnstore as default feature, which allows you to create a columnstore table using standard CREATE TABLE syntax.
  • Added support for the LOAD DATA ... [REPLACE | IGNORE | SKIP { ALL | CONSTRAINT | DUPLICATE KEY } ERRORS] semantics for ingesting data into columnstore tables with unique keys. These semantics allow duplicate keys to be handled without returning an error to the client application. See example 10 in LOAD DATA.
  • Added support for upserts on columnstore tables using Pipelines.

Query Optimization

  • Improved optimization of queries with large numbers of tables being joined. Join optimization is now significantly faster and adaptively handles very large join sizes to provide good execution plans while keeping query optimization time low. The engine variable distributed_optimizer_max_join_size is now deprecated and replaced by new variables distributed_optimizer_unrestricted_search_threshold, distributed_optimizer_min_join_size_run_initial_heuristics, and singlebox_optimizer_cost_based_threshold - see their descriptions in the List of Engine Variables for further information on configuring these variables.
  • Added a new engine variable profile_for_debug which can be used to enable collection of additional data with PROFILE that can be displayed using SHOW PROFILE JSON and is useful for troubleshooting query optimizer issues. For more information, see PROFILE.
  • Improved selectivity estimation by using sampling and histograms together, when both are available. This improvement only applies when cardinality_estimation_level is set to 7.3 or higher. By default, cardinality_estimation_level is set to 7.1.
  • Decreased query optimization cost of lookups of query plans from the on-disk plancache.

Query Execution

  • For each entry in the plancache, only the LLVM executable is now stored in-memory. This decreases the size of in-memory size of query plans by up to 80%. For more information about LLVM and the plancache, see Code Generation.
  • Implemented optimizations for system information schema queries, resulting in significant performance increases for tables such as index_statistics, column_statistics, and columnar_segments in particular.
  • Added support for EXPLAIN and PROFILE queries in stored procedures.

Usability and Programmability

  • Added a new aggregate function APPROX_PERCENTILE that calculates the approximate percentile and is about 10 times faster than the PERCENTILE_DISC and PERCENTILE_CONT functions.
  • The USING clause of query text is no longer captured as part of audit logging. It is now included in the output of SHOW PROCESSLIST in a new column titled RPC Info.
  • Added a new JSON function, JSON_AGG, that aggregates values as a JSON array.
    • Added a new variable, json_agg_max_len, which is the maximum string length JSON_AGG can return in bytes. For more information, see the List of Engine Variables.
  • The ALTER permission is no longer required for ANALYZE (SELECT and either ALTER or INSERT are required).
  • Added support for defining User-Defined Variables, via SELECT INTO @varname.

Ingest

  • Added support for publishing data to Google Cloud Storage (GCS) via SELECT … INTO GCS.
  • Added support for specifying the chunk size while uploading data to an Amazon S3 bucket via SELECT … INTO S3 to enable output of very large files.
  • SingleStore Pipelines now supports new Avro schema evolution capabilities. Hostname or IP address of the schema registry can be specified at the time of Pipeline creation; any changes to the schema can be easily viewed by the Pipelines. If fields are added to the Avro schema, the Pipelines can be modified without stopping and losing any offsets.
  • Added new option, max_retries_per_batch_partition, for CREATE PIPELINE and ALTER PIPELINE. When set, this determines the number of retries that will be attempted for writing batch partition data to the destination table. Specifying fewer retries when there is a large amount of data to load can be useful for conserving resources; the reverse is true for increasing the number of retries for smaller tables where performance is less of a concern.
  • Added a new option to the CONFIG clause of CREATE PIPELINE for Filesystem pipelines called process_zero_byte_files. Enabling this option ensures zero byte files are processed. Otherwise they are skipped by default.

Small Fixes and Otherwise Uncategorized Items

  • Improved the performance of columnstore batch deletes on certain column encodings.
  • Reduced memory usage for columnstore updates and deletes.
  • More concurrent operations are now allowed on the database level.
  • Added SORT KEY(...) as an alias for KEY(...) USING CLUSTERED COLUMNSTORE.
  • Added a new variable, columnstore_row_value_table_lock_threshold, that sets the threshold at which multiple inserts to a columnstore table with unique keys will switch from row value locking to table locking. For more information, see the List of Engine Variables.
  • Added two engine variables: internal_columnstore_validate_blob_after_write and internal_columnstore_validate_blob_before_read that control verification of the checksum of a blob immediately after it is created, and before reading it, respectively.
  • Fixed a bug where some materialized common table expressions (CTEs) with cross database joins would not run because the table holding the CTE result set was assigned to the wrong database.
  • Fixed an issue where JSON_EXTRACT_STRING() was not respecting json_extract_string_collation when extracting strings from columnstore tables.
  • Fixed an issue that caused RESTORE DATABASE FROM filesystem to result in an unbalanced distribution of partitions among nodes in the restored database.
  • Previously, running ALTER VIEW on a schema-bound view could cause recovery to fail to create a new schema-bound view referring to the altered view. This action has been disallowed.
  • Changed how SingleStore DB looks for files when configuring SSL. SingleStore expects an absolute path (/path/to/files). If you specify a relative path (./path/to/files), SingleStore first looks for the path relative to the location of the memsql.cnf file. If that fails, SingleStore looks for the path relative to the current working directory. If neither of those paths work, the operation fails.
  • JSON_EXTRACT_BIGINT now returns a SQL NULL value when the value of a key is JSON-NULL or otherwise undefined. Previously, this function returned 0 (not null) when using columnstore.
  • Tables created with syntax such as CREATE TABLE … AS SELECT will no longer inherit AUTO_INCREMENT behavior. For example, if CREATE TABLE table_1 AS SELECT * FROM table_2 is used to create table_1 where table_2 has an AUTO_INCREMENT column, it will be created as a non-auto-increment column in table_1.
  • SingleStore now supports aliases for column names in a PIVOT.
  • information_schema.MV_COLUMNSTORE_FILES now includes files showing global secondary index disk usage.
  • information_schema.MV_BLOCKED_QUERIES now shows queries on columnstore tables that are blocked.
  • Added connection link feature that stores connection details (credentials and configurations) to supported data providers such as S3, Azure, GCS, HDFS, and Kafka. Permitted users can run commands such as BACKUP,RESTORE, CREATE PIPELINE, and SELECT … INTO without specifying the connection details.

Maintenance Release Changelog

2021-04-01 Version 7.3.7

2021-03-15 Version 7.3.6

  • Running DDL inside of stored procedures from child aggregators is now properly supported.
  • Fixed a crash that previously occurred in cases where SHOW CLUSTER STATUS was run while a leaf node was unresponsive.
  • Fixed an issue that caused incorrect tracing to the error log when SHOW WORKLOAD MANAGEMENT STATUS was executed.
  • Fixed an issue with the system variable redundancy_level being incorrectly set on leaf nodes, which previously caused errors when CHECK BLOB CHECKSUM WITH REPAIR was run.
  • Monitoring clusters are now supported via SSL.
  • Fixed an issue that previously caused an incorrect error when running LOAD DATA from inside a stored procedure.

2021-03-08 Version 7.3.5

  • REBALANCE ALL DATABASES and REBALANCE PARTITIONS FULL now fully rebalance redundancy 1 clusters.
  • Added support for connecting to Avro Confluent Schema Registry over SSL, by specifying SSL configuration settings in CREATE PIPELINE.
  • Fixed an issue with automatic histograms on clusters that were upgraded with the variable cardinality_estimation_level set to 6.0.

2021-03-01 Version 7.3.4

  • Fixed an issue that occurred when running ALTER or TRUNCATE against a table that included ON DUPLICATE KEY UPDATE in its CREATE PIPELINE statement.
  • Fixed an issue that caused read actions on columnstore JSON objects to result in an error (Error reading from parquet: Column not found in schema) in cases where trailing whitespaces were present in the JSON key names stored for a given table.
  • Improved the optimizer’s cost estimates for hash joins when one of the tables in a given join has an index matching the join condition.
  • When inserting into a columnstore table, columns not needed to deal with a duplicate key are no longer decompressed.
  • Fixed a crash that occurred when an aggregate function was used in an unsupported context inside a stored procedure.
  • Added a new engine variable optimizer_empty_tables_limit, which sets a threshold for the number of empty tables that must be present in a query before the optimizer falls back to being rule based, instead of cost based. For more information, see List of Engine Variables.
  • When running memsql_exporter, the password for the monitoring user is no longer exposed in the ps output.
  • Allowed memsqlctl to establish a secure connection when a node is configured for SSL using either the engine variable ssl_ca, or the host system’s trusted certificates for verification.
  • Increased TasksMax to 128000 for memsql.service in order to prevent errors and/or unexpected node behavior. Previously, this setting was often exceeded, causing systemd to limit the ability of memsqld to fork processes.
  • Fixed an issue that caused auto-rebalance operations occuring when leaf_failover_fanout is set to load_balanced not to fully rebalance the cluster.
  • Updated the Websocket Proxy to respect the engine variables, ssl_cipher and tls_version, in configuring TLS for external communication via https.

2021-01-19 Version 7.3.3

  • Fixed a timeout error that occurred when running REBALANCE PARTITIONS on databases with large columnstore tables.
  • Improved the speed of server shutdown and of recovery for databases with large numbers of tables. This improvement is evident at 1,000 tables or more, and increases significantly after 10,000 tables.
  • Fixed an issue that caused RESTORE commands to fail in cases where a database contained a large number of reference tables.
  • Fixed an issue that caused DROP TEMPORARY TABLE IF NOT EXISTS to incorrectly return an error in cases where the table didn’t exist.
  • Fixed a crash caused by a check on the maximum number of partitions allowed on a leaf node. Now, an error message is surfaced when this is exceeded.
  • Fixed issue that caused an error (Transaction rolled back mid-query) to occur when running cross-database queries in multi-statement transactions.
  • Reduced blocks on UPDATE and DELETE commands via background merger.
  • Added a new variable, default_columnstore_table_lock_threshold, which sets a threshold for the number of rows that are locked when updating rows in a columnstore table before a table lock is acquired. For more information, see List of Engine Variables.

2020-12-15 Version 7.3

  • Initial GA release of SingleStore DB 7.3.2

Documentation Changelog

While the release notes serve to capture product changes for a given version of SingleStore DB, this section serves to capture significant additions to our documentation since the last release. These changes include areas of documentation that didn’t previously exist, or have been deepened and improved in order to make them more useful for our readers, but are not necessarily related to the release itself.

2021-04-01

2021-03-01

2021-02-01

2021-01-19