SingleStore Managed Service

Understanding How Datatype Can Affect Performance
Comparing mismatched datatypes

Using comparisons between mismatched datatypes may degrade query performance and may use unsafe type conversions which can yield undesirable query results. SingleStore will display a warning for queries with potentially problematic comparisons between mismatched datatypes in the EXPLAIN and information_schema records for the query.

These warnings do not necessarily indicate a problem, and you may have valid reasons for comparing different datatypes. But these datatype mismatches are flagged to help you identify potential problems that you may not be aware of.

When you see these warnings, you should check whether the datatypes are expected to be different. You may wish to consider changing the datatypes of the fields or values involved. You may also wish to consider adding explicit type conversion operations, such as the cast operators or functions like STR_TO_DATE which convert between different types.

These warnings can be seen in EXPLAIN <query>, information_schema.plancache.plan_warnings, information_schema.mv_queries.plan_warnings, and the PlanWarnings column of show plancache in textual form, as well as in EXPLAIN JSON <query>, information_schema.plancache.plan_info, and information_schema.mv_queries.plan_info in JSON form.

Example

An example of these warnings is the following:

CREATE TABLE t (id VARCHAR(50), PRIMARY KEY (id));

EXPLAIN SELECT * FROM t WHERE id = 123;
****
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                                                                                                                                                |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| WARNING: Comparisons between mismatched datatypes which may involve unsafe datatype conversions and/or degrade performance. Consider changing the datatypes, or adding explicit typecasts. See https://docs.singlestore.com/docs/mismatched-datatypes for more information. |
|                                                                                                                                                                                                                                                                        |
| WARNING: Comparison between mismatched datatypes: (`t`.`id` = 123). Types 'varchar(50) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL' vs 'bigint(20) NOT NULL'.                                                                                                  |
|                                                                                                                                                                                                                                                                        |
| Gather partitions:all alias:remote_0                                                                                                                                                                                                                                   |
| Project [t.id]                                                                                                                                                                                                                                                         |
| Filter [t.id = 123]                                                                                                                                                                                                                                                    |
| TableScan db.t, PRIMARY KEY (id)                                                                                                                                                                                                                                       |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

In this example, the query involves the expression t.i = 123, where t.i is a string field. When a numeric value is compared to a string, the string is converted to a numeric value, e.g. the string '123' is converted to the number 123.

This comparison may be problematic for multiple reasons:

  • The mismatched datatypes may indicate a mistake in how the query is written or how the table is defined. The query may behave differently than intended and yield undesired results. For example, all of the strings '123', '123.0', '0123', and '123abc' compare equal to the integer 123, so while the query may be intended to retrieve a single row with the specified id, this equality comparison may yield multiple rows whose id values all compare equal to 123.

  • The mismatched datatypes negatively impact the performance of the query. If the field and constant were either both strings or both integers, the query plan would be able to use the index to efficiently seek to lookup the matching id. But because there are many possible string ids that match the number 123, which do not come in any particular order in terms of string lexicographic order, the query cannot seek into the index and instead must scan all the rows of the table.

Data Type Conversion

The global sync variable data_conversion_compatibility_level controls the way certain data conversions are performed. This variable can have the following possible values: '6.0', '6.5', '7.0', and '7.5'. Higher values introduce stricter checking of values and will error for conversions that worked at lower levels. E.g., the '7.0' level will fail the conversion of 'abc' to an integer value, while the '6.0' level will silently convert it to a NULL or 0.

The following table lists the data type conversion checks and behavior changes that are introduced with each data_conversion_compatibility_level value. Note that the data type conversion checks introduced in a compatibility level are preserved in subsequent levels.

data_conversion_compatibility_level

Data Type Conversion Checks

6.0

The lowest data type conversion compatibility level. It is the default data_conversion_compatibility_level in all SingleStore versions.

6.5

  • Error on integer overflows and underflows in INSERT statements. This check applies to all INTEGER data types, including BIGINT, INT, MEDIUMINT, SMALLINT, TINYINT, BIT, and BOOL.

  • Error on string inputs to VARCHAR columns that are too short to store the data. This check is run against INSERT statements.

  • Error on string inputs to LONGBLOB, MEDIUMBLOB, BLOB, TINYBLOB, LONGTEXT, MEDIUMTEXT, TEXT, and TINYTEXT columns that are too short to store the data. This check is run against INSERT statements.

  • Error on invalid string to int conversion in INSERT statements. Only strings with valid characters will be converted to integers.

7.0

  • Error on string inputs to CHAR columns that are too short to store the data. This check is run against INSERT statements.

  • Error on inputs to BINARY columns that are too short to store the data. This check is run against INSERT statements.

  • Error on decimal inputs that do not match the size specified in the DECIMAL column definition. This check is run against INSERT statements.

  • Error on invalid or out-of-range date inputs to DATE, DATETIME, DATETIME(6), and YEAR  columns in INSERT statements.

  • Arithmetic operations involving DATE and INT values are performed by converting the INT data into intervals (days) instead of converting both data types to DOUBLE.

7.5

Error on invalid string to DECIMAL conversion in INSERT statements. Only strings with valid characters undergo numeric conversion.

It’s recommended to set the data_conversion_compatibility_level variable to the highest available value for new application development. For existing applications, it’s also recommended to use the highest available level, but it’s recommended that you test your application before deploying this change. In particular, changing the value of data_conversion_compatibility_level can change the behavior of expressions in computed columns.

If a computed column value changes due to a change in data_conversion_compatibility_level, columnstore sorting, indexes, and sharding can become logically corrupted. SingleStore does not recompile an existing plan when data_conversion_compatibility_level or sql_mode changes.

Notice

sql_mode is persisted to the CREATE TABLE statement. Therefore, the computed column of a table always uses the sql_mode that the table is created with, which may be different from the current sql_mode.

Best practices to avoid data corruption

  • Review the computed column expressions when changing data_conversion_compatibility_level.

  • Change data_conversion_compatibility_level only as a part of an application upgrade process.

  • Perform application upgrade tests.

For example, if a computed column contains corrupted data and you have to switch to a higher data_conversion_compatibility_level, you may need to drop the computed column before changing the level. Once the level is increased, add the computed column to the table. Alternatively, if a persisted computed column is used in a columnstore key, you may have to create a new table and select data into this new table. After copying the data, drop the old table and rename the new table.

The following examples demonstrate how the behavior of expressions may change if data_conversion_compatibility_level is changed.

Example 1

SET GLOBAL data_conversion_compatibility_level = '6.0';
SELECT DATE('2019-04-20') + 2;
****
+------------------------+
| DATE('2019-04-20') + 2 |
+------------------------+
|               20190422 |
+------------------------+

Example 2

SET GLOBAL data_conversion_compatibility_level = '7.0';
SELECT DATE('2019-04-20') + 2;
****
+------------------------+
| DATE('2019-04-20') + 2 |
+------------------------+
| 2019-04-22             |
+------------------------+