Understanding How Datatype Can Affect Performance
On this page
Comparing Mismatched Datatypes
Using comparisons between mismatched data types may degrade query performance and may use unsafe type conversions which can yield undesirable query results.EXPLAIN
and information_
records for the query.
These warnings do not necessarily indicate a problem, and you may have valid reasons for comparing different data types.
When you see these warnings, you should check whether the data types are expected to be different.
These warnings can be seen in EXPLAIN <query>
, information_
, information_
, and the PlanWarnings
column of show plancache
in textual form, as well as in EXPLAIN JSON <query>
, information_
, and information_
in JSON form.
Example
An example of these warnings is the following:
CREATE TABLE t (id VARCHAR(50), PRIMARY KEY (id));EXPLAIN SELECT * FROM t WHERE id = 123;
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| WARNING: Comparisons between mismatched datatypes which may involve unsafe datatype conversions and/or degrade performance. Consider changing the datatypes, or adding explicit typecasts. See https://docs.singlestore.com/docs/mismatched-datatypes for more information. |
| |
| WARNING: Comparison between mismatched datatypes: (`t`.`id` = 123). Types 'varchar(50) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL' vs 'bigint(20) NOT NULL'. |
| |
| Gather partitions:all alias:remote_0 |
| Project [t.id] |
| Filter [t.id = 123] |
| TableScan db.t, PRIMARY KEY (id) |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
In this example, the query involves the expression t.
, where t.
is a string field.'123'
is converted to the number 123
.
This comparison may be problematic for multiple reasons:
-
The mismatched data types may indicate a mistake in how the query is written or how the table is defined.
The query may behave differently than intended and yield undesired results. For example, all of the strings '123'
,'123.
,0' '0123'
, and'123abc'
compare equal to the integer123
, so while the query may be intended to retrieve a single row with the specifiedid
, this equality comparison may yield multiple rows whoseid
values all compare equal to123
. -
The mismatched data types negatively impact the performance of the query.
If the field and constant were either both strings or both integers, the query plan would be able to use the index to efficiently seek to lookup the matching id
.But because there are many possible string id
s that match the number123
, which do not come in any particular order in terms of string lexicographic order, the query cannot seek into the index and instead must scan all the rows of the table.
Data Type Conversion
The global sync variable data_
controls the way certain data conversions are performed.'7.
level will fail the conversion of 'abc'
to an integer value, while the '6.
level will silently convert it to a NULL or 0.
The following table lists the data type conversion checks and behavior changes that are introduced with each data_
value.
data_ |
Data Type Conversion Checks |
---|---|
|
|
|
|
|
|
It’s recommended to set the data_
variable to the highest available value for new application development.data_
can change the behavior of expressions in computed columns.
If a computed column value changes due to a change in data_
; columnstore sorting, indexes, and sharding can become logically corrupted.data_
or sql_
changes.
Note
sql_
is persisted to the CREATE TABLE
statement.sql_
that the table is created with, which may be different from the current sql_
.
Best Practices to Avoid Data Corruption
-
Review the computed column expressions when changing
data_
.conversion_ compatibility_ level -
Perform application upgrade tests.
For example, if a computed column contains corrupted data and you have to switch to a higher data_
, you may need to drop the computed column before changing the level.
The following examples demonstrate how the behavior of expressions may change if data_
is changed.
Example 1
SET GLOBAL data_conversion_compatibility_level = '6.0';
SELECT DATE('2019-04-20') + 2;
+------------------------+
| DATE('2019-04-20') + 2 |
+------------------------+
| 20190422 |
+------------------------+
Example 2
SET GLOBAL data_conversion_compatibility_level = '7.0';
SELECT DATE('2019-04-20') + 2;
+------------------------+
| DATE('2019-04-20') + 2 |
+------------------------+
| 2019-04-22 |
+------------------------+
The following example shows the error message generated if a zero value is attempted to be inserted into a DATE
, DATETIME
, or DATETIME(6)
column if the data_
is set to 7.
or higher.
SET GLOBAL data_conversion_compatibility_level = '7.0';
CREATE TABLE dates (dt DATETIME);INSERT INTO dates VALUES ('0000-00-00');
ERROR 2363 (HY000): Leaf Error: Invalid DATE/TIME in type conversion
See the table on the Time and Date page for examples of zero values for DATE
, DATETIME
, and DATETIME(6)
data types.
Last modified: October 18, 2021