Data Type Conversion
On this page
The global sync variable data_
controls the way certain data conversions are performed.'7.
level will fail the conversion of 'abc'
to an integer value, while the '6.
level will silently convert it to a NULL or 0.
The following table lists the data type conversion checks and behavior changes that are introduced with each data_
value.
data_ |
Data Type Conversion Checks |
---|---|
|
|
|
|
|
|
|
|
|
|
It’s recommended to set the data_
variable to the highest available value for new application development.data_
for versions 8.data_
even if one of your shard keys is a persisted computed column.data_
.data_
can change the behavior of expressions in computed columns.
If a computed column value changes due to a change in data_
; columnstore sorting, indexes, and sharding can become logically corrupted.data_
or sql_
changes.
Note
sql_
is persisted to the CREATE TABLE
statement.sql_
that the table is created with, which may be different from the current sql_
.
If a persisted computed column is also a shard key and the data_
has been changed, rows can move to a different partition.
For example, see how expression evaluation may behave differently under the two levels:
SET GLOBAL data_conversion_compatibility_level = '6.0';SELECT CAST('2023-01-31' AS DATE) + 1;
+--------------------------------+
| CAST('2023-01-31' AS DATE) + 1 |
+--------------------------------+
| 20230132 |
+--------------------------------+
SET GLOBAL data_conversion_compatibility_level = '8.0';SELECT CAST('2023-01-31' AS DATE) + 1;
+--------------------------------+
| CAST('2023-01-31' AS DATE) + 1 |
+--------------------------------+
| 20230201 |
+--------------------------------+
Finding Incorrect Computed Shard Key Values
You can find all of the incorrect computed shard key values with a SQL query that searches for records where the computed columns are not equal to the value of their computed column expression.
First, set the allow_
engine variable to ON
.data_
on databases that contain tables with a persisted computed column as the shard key.
SET GLOBAL allow_modifying_sysvars_with_computed_shard_key = ON;
Consider a dates
table where dates are inserted into the table as strings.DATE
format and 1 day is added to the inserted date.data_
is 6.
.
CREATE TABLE dates (date_str TEXT,date_int AS CONVERT(CONVERT(date_str, DATE) + 1, SIGNED INT) PERSISTED INT,SHARD KEY(date_int));
INSERT dates VALUES('2023-01-31'), ('2023-01-30');
SELECT * FROM dates;
+------------+----------+
| date_str | date_int |
+------------+----------+
| 2023-01-30 | 20230131 |
| 2023-01-31 | 20230132 |
+------------+----------+
Next, follow these steps:
-
Set the
data_
to your desired version:conversion_ compatibility_ level -
Compare the computed column with a
SELECT
statement with aWHERE
filter that finds rows where the computed column is not equal to the value of its computed column expression.For example: SET GLOBAL data_conversion_compatibility_level = '8.0';Note
When creating a query to find values where the computed column changed, make sure that it is not using a shard key lookup, which might not find the desired row due to the change in shard key value.
For example, the query used to search for changed computed columns uses a full columnstore scan, which you can verify using
EXPLAIN
:EXPLAIN SELECT * FROM datesWHERE date_int <> CONVERT(CONVERT(date_str, DATE) + 1,SIGNED INT);+----------------------------------------------------------------------------------------------+ | EXPLAIN | +----------------------------------------------------------------------------------------------+ | Gather partitions:all alias:remote_0 parallelism_level:segment | | Project [r.date_str, r.date_int] | | Top limit:[?] | | ColumnStoreFilter [r.date_int <> CAST(DATE_ADD(DATE(r.date_str), INTERVAL 1 DAY) AS SIGNED)] | | ColumnStoreScan test1.r, SORT KEY __UNORDERED () table_type:sharded_columnstore | +----------------------------------------------------------------------------------------------+ 4 rows in set (0.00 sec)
-
Once the affected rows have been identified, delete the row and re-insert it.
-
Create a temporary table where the data will be backed up:
CREATE TEMPORARY TABLE tmp_dates LIKE dates; -
Insert the record(s) with the incorrect shard key value into the temporary table to back up the data.
As previously explained, the value for date_
= ‘2023-01-31’ has been recomputed to a valid value:str INSERT INTO tmp_dates (date_str)SELECT * FROM datesWHERE date_int <> CONVERT(CONVERT(date_str, DATE) + 1, SIGNED INT);SELECT * FROM tmp_dates;+-------------+-----------+ | date_str | date_int | +-------------+-----------+ | 2023-01-31 | 20230201 | +-------------+-----------+
-
Using the query that finds rows where the computed column is not equal to the value of its computed column expression (#3), delete those rows from the temporary table:
DELETE FROM dates WHERE date_int <> CONVERT(CONVERT(date_str, DATE) + 1, SIGNED INT);Query OK, 1 row affected (0.06 sec)Records: 1 Duplicates: 0 Warnings: 0SELECT * FROM dates;+------------+----------+ | date_str | date_int | +------------+----------+ | 2023-01-30 | 20230131 | +------------+----------+
-
Insert the data from the temporary table back into
dates
and confirm thedate_
column is not valid under the newint data_
.conversion_ compatibility_ level INSERT INTO dates (date_str) SELECT date_str from tmp_dates;SELECT * FROM dates;+------------+----------+ | date_str | date_int | +------------+----------+ | 2023-01-30 | 20230131 | | 2023-01-31 | 20230201 | +------------+----------+
-
Drop the temporary table:
DROP TABLE tmp_dates;
-
-
Lastly, set the
allow_
engine variable back to its default setting ofmodifying_ sysvars_ with_ computed_ shard_ key OFF
.
The procedure above is one possible way to repair persisted computed column shard keys that may have changed after you change data_
.t.
is 5
in a row in table t
, and you try to update t.
to 5
, the operation may not have any effect and not trigger recalculation of any persisted computed columns in the row.
Best Practices to Avoid Data Corruption
-
Review the computed column expressions when changing
data_
.conversion_ compatibility_ level -
Perform application upgrade tests.
For example, if a computed column contains corrupted data and you have to switch to a higher data_
, you may need to drop the computed column before changing the level.
The following examples demonstrate how the behavior of expressions may change if data_
is changed.
Example 1
SET GLOBAL data_conversion_compatibility_level = '6.0';
SELECT DATE('2019-04-20') + 2;
+------------------------+
| DATE('2019-04-20') + 2 |
+------------------------+
| 20190422 |
+------------------------+
Example 2
SET GLOBAL data_conversion_compatibility_level = '7.0';
SELECT DATE('2019-04-20') + 2;
+------------------------+
| DATE('2019-04-20') + 2 |
+------------------------+
| 2019-04-22 |
+------------------------+
The following example shows the error message generated if a zero value is attempted to be inserted into a DATE
, DATETIME
, or DATETIME(6)
column if the data_
is set to 7.
or higher.
SET GLOBAL data_conversion_compatibility_level = '7.0';
CREATE TABLE dates (dt DATETIME);INSERT INTO dates VALUES ('0000-00-00');
ERROR 2363 (HY000): Leaf Error: Invalid DATE/TIME in type conversion
See the table on the Time and Date page for examples of zero values for DATE
, DATETIME
, and DATETIME(6)
data types.
Last modified: October 25, 2024