Understanding How Datatype Can Affect Performance
On this page
Comparing Mismatched Datatypes
Using comparisons between mismatched data types may degrade query performance and may use unsafe type conversions which can yield undesirable query results.EXPLAIN
and information_
records for the query.
These warnings do not necessarily indicate a problem, and you may have valid reasons for comparing different data types.
When you see these warnings, you should check whether the data types are expected to be different.
These warnings can be seen in EXPLAIN <query>
, information_
, information_
, and the PlanWarnings
column of show plancache
in textual form, as well as in EXPLAIN JSON <query>
, information_
, and information_
in JSON form.
Example
An example of these warnings is the following:
CREATE TABLE t (id VARCHAR(50), PRIMARY KEY (id));EXPLAIN SELECT * FROM t WHERE id = 123;
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| WARNING: Comparisons between mismatched datatypes which may involve unsafe datatype conversions and/or degrade performance. Consider changing the datatypes, or adding explicit typecasts. See https://docs.singlestore.com/docs/mismatched-datatypes for more information. |
| |
| WARNING: Comparison between mismatched datatypes: (`t`.`id` = 123). Types 'varchar(50) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL' vs 'bigint(20) NOT NULL'. |
| |
| Gather partitions:all alias:remote_0 |
| Project [t.id] |
| Filter [t.id = 123] |
| TableScan db.t, PRIMARY KEY (id) |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
In this example, the query involves the expression t.
, where t.
is a string field.'123'
is converted to the number 123
.
This comparison may be problematic for multiple reasons:
-
The mismatched data types may indicate a mistake in how the query is written or how the table is defined.
The query may behave differently than intended and yield undesired results. For example, all of the strings '123'
,'123.
,0' '0123'
, and'123abc'
compare equal to the integer123
, so while the query may be intended to retrieve a single row with the specifiedid
, this equality comparison may yield multiple rows whoseid
values all compare equal to123
. -
The mismatched data types negatively impact the performance of the query.
If the field and constant were either both strings or both integers, the query plan would be able to use the index to efficiently seek to lookup the matching id
.But because there are many possible string id
s that match the number123
, which do not come in any particular order in terms of string lexicographic order, the query cannot seek into the index and instead must scan all the rows of the table.
Data Type Conversion
The global sync variable data_
controls the way certain data conversions are performed.'7.
level will fail the conversion of 'abc'
to an integer value, while the '6.
level will silently convert it to a NULL or 0.
The following table lists the data type conversion checks and behavior changes that are introduced with each data_
value.
data_ |
Data Type Conversion Checks |
---|---|
|
|
|
|
|
|
|
|
|
|
It’s recommended to set the data_
variable to the highest available value for new application development.data_
for versions 8.data_
even if one of your shard keys is a persisted computed column.data_
.data_
can change the behavior of expressions in computed columns.
If a computed column value changes due to a change in data_
; columnstore sorting, indexes, and sharding can become logically corrupted.data_
or sql_
changes.
Note
sql_
is persisted to the CREATE TABLE
statement.sql_
that the table is created with, which may be different from the current sql_
.
If a persisted computed column is also a shard key and the data_
has been changed, rows can move to a different partition.
For example, see how expression evaluation may behave differently under the two levels:
SET GLOBAL data_conversion_compatibility_level = '6.0';SELECT CAST('2023-01-31' AS DATE) + 1;
+--------------------------------+
| CAST('2023-01-31' AS DATE) + 1 |
+--------------------------------+
| 20230132 |
+--------------------------------+
SET GLOBAL data_conversion_compatibility_level = '8.0';SELECT CAST('2023-01-31' AS DATE) + 1;
+--------------------------------+
| CAST('2023-01-31' AS DATE) + 1 |
+--------------------------------+
| 20230201 |
+--------------------------------+
Finding Incorrect Computed Shard Key Values
You can find all of the incorrect computed shard key values with a SQL query that searches for records where the computed columns are not equal to the value of their computed column expression.
First, set the allow_
engine variable to ON
.data_
on databases that contain tables with a persisted computed column as the shard key.
SET GLOBAL allow_modifying_sysvars_with_computed_shard_key = ON;
Consider a dates
table where dates are inserted into the table as strings.DATE
format and 1 day is added to the inserted date.data_
is 6.
.
CREATE TABLE dates (date_str TEXT,date_int AS CONVERT(CONVERT(date_str, DATE) + 1, SIGNED INT) PERSISTED INT,SHARD KEY(date_int));
INSERT dates VALUES('2023-01-31'), ('2023-01-30');
SELECT * FROM dates;
+------------+----------+
| date_str | date_int |
+------------+----------+
| 2023-01-30 | 20230131 |
| 2023-01-31 | 20230132 |
+------------+----------+
Next, follow these steps:
-
Set the
data_
to your desired version:conversion_ compatibility_ level -
Compare the computed column with a
SELECT
statement with aWHERE
filter that finds rows where the computed column is not equal to the value of its computed column expression.For example: SET GLOBAL data_conversion_compatibility_level = '8.0';Note
When creating a query to find values where the computed column changed, make sure that it is not using a shard key lookup, which might not find the desired row due to the change in shard key value.
For example, the query used to search for changed computed columns uses a full columnstore scan, which you can verify using
EXPLAIN
:EXPLAIN SELECT * FROM datesWHERE date_int <> CONVERT(CONVERT(date_str, DATE) + 1,SIGNED INT);+----------------------------------------------------------------------------------------------+ | EXPLAIN | +----------------------------------------------------------------------------------------------+ | Gather partitions:all alias:remote_0 parallelism_level:segment | | Project [r.date_str, r.date_int] | | Top limit:[?] | | ColumnStoreFilter [r.date_int <> CAST(DATE_ADD(DATE(r.date_str), INTERVAL 1 DAY) AS SIGNED)] | | ColumnStoreScan test1.r, SORT KEY __UNORDERED () table_type:sharded_columnstore | +----------------------------------------------------------------------------------------------+ 4 rows in set (0.00 sec)
-
Once the affected rows have been identified, delete the row and re-insert it.
-
Create a temporary table where the data will be backed up:
CREATE TEMPORARY TABLE tmp_dates LIKE dates; -
Insert the record(s) with the incorrect shard key value into the temporary table to back up the data.
As previously explained, the value for date_
= ‘2023-01-31’ has been recomputed to a valid value:str INSERT INTO tmp_dates (date_str)SELECT * FROM datesWHERE date_int <> CONVERT(CONVERT(date_str, DATE) + 1, SIGNED INT);SELECT * FROM tmp_dates;+-------------+-----------+ | date_str | date_int | +-------------+-----------+ | 2023-01-31 | 20230201 | +-------------+-----------+
-
Using the query that finds rows where the computed column is not equal to the value of its computed column expression (#3), delete those rows from the temporary table:
DELETE FROM dates WHERE date_int <> CONVERT(CONVERT(date_str, DATE) + 1, SIGNED INT);Query OK, 1 row affected (0.06 sec)Records: 1 Duplicates: 0 Warnings: 0SELECT * FROM dates;+------------+----------+ | date_str | date_int | +------------+----------+ | 2023-01-30 | 20230131 | +------------+----------+
-
Insert the data from the temporary table back into
dates
and confirm thedate_
column is not valid under the newint data_
.conversion_ compatibility_ level INSERT INTO dates (date_str) SELECT date_str from tmp_dates;SELECT * FROM dates;+------------+----------+ | date_str | date_int | +------------+----------+ | 2023-01-30 | 20230131 | | 2023-01-31 | 20230201 | +------------+----------+
-
Drop the temporary table:
DROP TABLE tmp_dates;
-
-
Lastly, set the
allow_
engine variable back to its default setting ofmodifying_ sysvars_ with_ computed_ shard_ key OFF
.
The procedure above is one possible way to repair persisted computed column shard keys that may have changed after you change data_
.t.
is 5
in a row in table t
, and you try to update t.
to 5
, the operation may not have any effect and not trigger recalculation of any persisted computed columns in the row.
Best Practices to Avoid Data Corruption
-
Review the computed column expressions when changing
data_
.conversion_ compatibility_ level -
Review the computed column expressions when changing
data_
.conversion_ compatibility_ level -
Perform application upgrade tests.
For example, if a computed column contains corrupted data and you have to switch to a higher data_
, you may need to drop the computed column before changing the level.
The following examples demonstrate how the behavior of expressions may change if data_
is changed.
Example 1
SET GLOBAL data_conversion_compatibility_level = '6.0';
SELECT DATE('2019-04-20') + 2;
+------------------------+
| DATE('2019-04-20') + 2 |
+------------------------+
| 20190422 |
+------------------------+
Example 2
SET GLOBAL data_conversion_compatibility_level = '7.0';
SELECT DATE('2019-04-20') + 2;
+------------------------+
| DATE('2019-04-20') + 2 |
+------------------------+
| 2019-04-22 |
+------------------------+
The following example shows the error message generated if a zero value is attempted to be inserted into a DATE
, DATETIME
, or DATETIME(6)
column if the data_
is set to 7.
or higher.
SET GLOBAL data_conversion_compatibility_level = '7.0';
CREATE TABLE dates (dt DATETIME);INSERT INTO dates VALUES ('0000-00-00');
ERROR 2363 (HY000): Leaf Error: Invalid DATE/TIME in type conversion
See the table on the Time and Date page for examples of zero values for DATE
, DATETIME
, and DATETIME(6)
data types.
Last modified: October 18, 2021