Understanding How Datatype Can Affect Performance

Comparing Mismatched Datatypes

Using comparisons between mismatched data types may degrade query performance and may use unsafe type conversions which can yield undesirable query results. SingleStore will display a warning for queries with potentially problematic comparisons between mismatched data types in the EXPLAIN and information_schema records for the query.

These warnings do not necessarily indicate a problem, and you may have valid reasons for comparing different data types. But these data type mismatches are flagged to help you identify potential problems that you may not be aware of.

When you see these warnings, you should check whether the data types are expected to be different. You may wish to consider changing the data types of the fields or values involved. You may also wish to consider adding explicit type conversion operations, such as the cast operators or functions like STR_TO_DATE which convert between different types.

These warnings can be seen in EXPLAIN <query>, information_schema.plancache.plan_warnings, information_schema.mv_queries.plan_warnings, and the PlanWarnings column of show plancache in textual form, as well as in EXPLAIN JSON <query>, information_schema.plancache.plan_info, and information_schema.mv_queries.plan_info in JSON form.

Example

An example of these warnings is the following:

CREATE TABLE t (id VARCHAR(50), PRIMARY KEY (id));
EXPLAIN SELECT * FROM t WHERE id = 123;
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                                                                                                                                                |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| WARNING: Comparisons between mismatched datatypes which may involve unsafe datatype conversions and/or degrade performance. Consider changing the datatypes, or adding explicit typecasts. See https://docs.singlestore.com/docs/mismatched-datatypes for more information. |
|                                                                                                                                                                                                                                                                        |
| WARNING: Comparison between mismatched datatypes: (`t`.`id` = 123). Types 'varchar(50) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL' vs 'bigint(20) NOT NULL'.                                                                                                  |
|                                                                                                                                                                                                                                                                        |
| Gather partitions:all alias:remote_0                                                                                                                                                                                                                                   |
| Project [t.id]                                                                                                                                                                                                                                                         |
| Filter [t.id = 123]                                                                                                                                                                                                                                                    |
| TableScan db.t, PRIMARY KEY (id)                                                                                                                                                                                                                                       |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

In this example, the query involves the expression t.i = 123, where t.i is a string field. When a numeric value is compared to a string, the string is converted to a numeric value, e.g. the string '123' is converted to the number 123.

This comparison may be problematic for multiple reasons:

  • The mismatched data types may indicate a mistake in how the query is written or how the table is defined. The query may behave differently than intended and yield undesired results. For example, all of the strings '123', '123.0', '0123', and '123abc' compare equal to the integer 123, so while the query may be intended to retrieve a single row with the specified id, this equality comparison may yield multiple rows whose id values all compare equal to 123.

  • The mismatched data types negatively impact the performance of the query. If the field and constant were either both strings or both integers, the query plan would be able to use the index to efficiently seek to lookup the matching id. But because there are many possible string ids that match the number 123, which do not come in any particular order in terms of string lexicographic order, the query cannot seek into the index and instead must scan all the rows of the table.

Data Type Conversion

The global sync variable data_conversion_compatibility_level controls the way certain data conversions are performed. This variable can have the following possible values: '6.0', '6.5', '7.0','7.5', and '8.0'. Higher values introduce stricter checking of values and will error for conversions that worked at lower levels. For example, the '7.0' level will fail the conversion of 'abc' to an integer value, while the '6.0' level will silently convert it to a NULL or 0.

The following table lists the data type conversion checks and behavior changes that are introduced with each data_conversion_compatibility_level value. Note that the data type conversion checks introduced in a compatibility level are preserved in subsequent levels.

data_conversion_compatibility_level

Data Type Conversion Checks

6.0

  • The lowest data type conversion compatibility level.

  • Default value for SingleStore engine versions lower than 8.0.

6.5

  • Error on integer overflows and underflows in INSERT statements. This check applies to all INTEGER data types, including BIGINT, INT, MEDIUMINT, SMALLINT, TINYINT, BIT, and BOOL.

  • Error on string inputs to VARCHAR columns that are too short to store the data. This check is run against INSERT statements.

  • Error on string inputs to LONGBLOB, MEDIUMBLOB, BLOB, TINYBLOB, LONGTEXT, MEDIUMTEXT, TEXT, and TINYTEXT columns that are too short to store the data. This check is run against INSERT statements.

  • Error on invalid string to int conversion in INSERT statements. Only strings with valid characters will be converted to integers.

7.0

  • Error on string inputs to CHAR columns that are too short to store the data. This check is run against INSERT statements.

  • Error on inputs to BINARY columns that are too short to store the data. This check is run against INSERT statements.

  • Error on decimal inputs that do not match the size specified in the DECIMAL column definition. This check is run against INSERT statements.

  • Error on invalid or out-of-range date inputs to DATE, DATETIME, DATETIME(6), and YEAR columns in INSERT statements.

  • Errors on zero value date inputs into DATE, DATETIME, and DATETIME(6) columns in INSERT statements.

  • Arithmetic operations involving DATE and INT values are performed by converting the INT data into intervals (days) instead of converting both data types to DOUBLE.

7.5

  • Error on invalid string to DECIMAL conversion in INSERT statements. Only strings with valid characters undergo numeric conversion.

8.0

  • Errors on conversions from STRING to DATE like data types instead of generating NULL or 0.

  • Default value beginning in version 8.0.

8.1

  • Errors on the invalid or out-of-range timestamp inputs to TIMESTAMP and TIMESTAMP(6) columns in INSERT statements.

It’s recommended to set the data_conversion_compatibility_level variable to the highest available value for new application development. SingleStore supports in-place change of data_conversion_compatibility_level for versions 8.0.24 and newer. This allows you to change your data_conversion_compatibility_level even if one of your shard keys is a persisted computed column. Prior to this, shard keys that were persisted computed columns could become logically corrupted when changing the data_conversion_compatibility_level. For existing applications, it’s also recommended to use the highest available level, but it’s recommended that you test your application before deploying this change. In particular, changing the value of data_conversion_compatibility_level can change the behavior of expressions in computed columns.

If a computed column value changes due to a change in data_conversion_compatibility_level; columnstore sorting, indexes, and sharding can become logically corrupted. SingleStore does not recompile an existing plan when data_conversion_compatibility_level or sql_mode changes.

Note

sql_mode is persisted to the CREATE TABLE statement. Therefore, the computed column of a table always uses the sql_mode that the table is created with, which may be different from the current sql_mode.

If a persisted computed column is also a shard key and the data_conversion_compatibility_level has been changed, rows can move to a different partition. Updating the row will move it  to the correct partition automatically, but some queries won’t be able to find the correct partition to fetch a row when they otherwise would.

For example, see how expression evaluation may behave differently under the two levels:

SET GLOBAL data_conversion_compatibility_level = '6.0';
SELECT CAST('2023-01-31' AS DATE) + 1;
+--------------------------------+
| CAST('2023-01-31' AS DATE) + 1 |
+--------------------------------+
|                       20230132 |
+--------------------------------+
SET GLOBAL data_conversion_compatibility_level = '8.0';
SELECT CAST('2023-01-31' AS DATE) + 1;
+--------------------------------+
| CAST('2023-01-31' AS DATE) + 1 |
+--------------------------------+
|                       20230201 |
+--------------------------------+

Finding Incorrect Computed Shard Key Values

You can find all of the incorrect computed shard key values with a SQL query that searches for records where the computed columns are not equal to the value of their computed column expression.

First, set the allow_modifying_sysvars_with_computed_shard_key engine variable to ON. This will allow you to change the data_conversion_compatibility_level on databases that contain tables with a persisted computed column as the shard key. Otherwise, an error would be generated.

SET GLOBAL allow_modifying_sysvars_with_computed_shard_key = ON;

Consider a dates table where dates are inserted into the table as strings. Then those dates must be converted to the DATE format and 1 day is added to the inserted date. The persisted computed column is the shard key and the current data_conversion_compatibility_level is 6.0.

CREATE TABLE dates (date_str TEXT
,date_int AS CONVERT(CONVERT(date_str, DATE) + 1, SIGNED INT) PERSISTED INT
,SHARD KEY(date_int));
INSERT dates VALUES('2023-01-31'), ('2023-01-30');
SELECT * FROM dates;
+------------+----------+
| date_str   | date_int |
+------------+----------+
| 2023-01-30 | 20230131 |
| 2023-01-31 | 20230132 | 
+------------+----------+

Next, follow these steps:

  1. Set the data_conversion_compatibility_level to your desired version:

  2. Compare the computed column with a SELECT statement with a WHERE filter that finds rows where the computed column is not equal to the value of its computed column expression. For example:

    SET GLOBAL data_conversion_compatibility_level = '8.0';

    Note

    When creating a query to find values where the computed column changed, make sure that it is not using a shard key lookup, which might not find the desired row due to the change in shard key value. 

    For example, the query used to search for changed computed columns uses a full columnstore scan, which you can verify using EXPLAIN:

    EXPLAIN SELECT * FROM dates
    WHERE date_int <> CONVERT(CONVERT(date_str, DATE) + 1,
    SIGNED INT);
    +----------------------------------------------------------------------------------------------+
    | EXPLAIN                                                                                      |
    +----------------------------------------------------------------------------------------------+
    | Gather partitions:all alias:remote_0 parallelism_level:segment                               |                           
    | Project [r.date_str, r.date_int]                                                             |
    | Top limit:[?]                                                                                |                                           
    | ColumnStoreFilter [r.date_int <> CAST(DATE_ADD(DATE(r.date_str), INTERVAL 1 DAY) AS SIGNED)] |
    | ColumnStoreScan test1.r, SORT KEY __UNORDERED () table_type:sharded_columnstore              |
    +----------------------------------------------------------------------------------------------+
    4 rows in set (0.00 sec)
  3. Once the affected rows have been identified, delete the row and re-insert it.

    • Create a temporary table where the data will be backed up:

      CREATE TEMPORARY TABLE tmp_dates LIKE dates;
    • Insert the record(s) with the incorrect shard key value into the temporary table to back up the data. As previously explained, the value for date_str = ‘2023-01-31’ has been recomputed to a valid value:

      INSERT INTO tmp_dates (date_str)
      SELECT * FROM dates
      WHERE date_int <> CONVERT(CONVERT(date_str, DATE) + 1, SIGNED INT);
      SELECT * FROM tmp_dates;
      +-------------+-----------+
      | date_str    | date_int  |
      +-------------+-----------+
      | 2023-01-31  |  20230201 |
      +-------------+-----------+
    • Using the query that finds rows where the computed column is not equal to the value of its computed column expression (#3), delete those rows from the temporary table:

      DELETE FROM dates WHERE date_int <> CONVERT(CONVERT(date_str, DATE) + 1, SIGNED INT);
      Query OK, 1 row affected (0.06 sec)
      Records: 1 Duplicates: 0 Warnings: 0
      SELECT * FROM dates;
      +------------+----------+
      | date_str   | date_int |
      +------------+----------+
      | 2023-01-30 | 20230131 |
      +------------+----------+
    • Insert the data from the temporary table back into dates and confirm the date_int column is not valid under the new data_conversion_compatibility_level.

      INSERT INTO dates (date_str) SELECT date_str from tmp_dates;
      SELECT * FROM dates;
      +------------+----------+
      | date_str   | date_int |
      +------------+----------+
      | 2023-01-30 | 20230131 |
      | 2023-01-31 | 20230201 |
      +------------+----------+
    • Drop the temporary table:

      DROP TABLE tmp_dates;
  4. Lastly, set the allow_modifying_sysvars_with_computed_shard_key engine variable back to its default setting of OFF.

The procedure above is one possible way to repair persisted computed column shard keys that may have changed after you change data_conversion_compatibility_level. Other ways are also possible–any update to a row will recompute a persisted computed column. However, setting a field value to itself may not update the row due to logic created to optimize updates by doing nothing when an update is known to have no effect. For example, if t.A is 5 in a row in table t, and you try to update t.A to 5, the operation may not have any effect and not trigger recalculation of any persisted computed columns in the row.

Best Practices to Avoid Data Corruption

  • Review the computed column expressions when changing data_conversion_compatibility_level.

  • Perform application upgrade tests.

For example, if a computed column contains corrupted data and you have to switch to a higher data_conversion_compatibility_level, you may need to drop the computed column before changing the level. Once the level is increased, add the computed column to the table. Alternatively, if a persisted computed column is used in a columnstore key, you may have to create a new table and select data into this new table. After copying the data, drop the old table and rename the new table.

The following examples demonstrate how the behavior of expressions may change if data_conversion_compatibility_level is changed.

Example 1

SET GLOBAL data_conversion_compatibility_level = '6.0';
SELECT DATE('2019-04-20') + 2;
+------------------------+
| DATE('2019-04-20') + 2 |
+------------------------+
|               20190422 |
+------------------------+

Example 2

SET GLOBAL data_conversion_compatibility_level = '7.0';
SELECT DATE('2019-04-20') + 2;
+------------------------+
| DATE('2019-04-20') + 2 |
+------------------------+
| 2019-04-22             |
+------------------------+

The following example shows the error message generated if a zero value is attempted to be inserted into a DATE, DATETIME, or DATETIME(6) column if the data_conversion_compatibility_level is set to 7.0 or higher.

SET GLOBAL data_conversion_compatibility_level = '7.0';
CREATE TABLE dates (dt DATETIME);
INSERT INTO dates VALUES ('0000-00-00');
ERROR 2363 (HY000): Leaf Error: Invalid DATE/TIME in type conversion

See the table on the Time and Date page for examples of zero values for DATE, DATETIME, and DATETIME(6) data types. This page also discusses how date (and time) values are implicitly converted from string literals (e.g., '2023-02-03') and how to use date and time functions to perform explicit conversions from string literals.

Last modified: October 18, 2021

Was this article helpful?