Implicit Collation in Special Cases
By default, SingleStore usesutf8mb4
and utf8mb4_
for character_
and collation_
respectively.character_
is set to utf8
, SingleStore accepts characters that use 4-byte encoding, stores them, and displays them in the output.utf8
, the string literals are implicitly assigned binary collation and processed as a sequence of bytes rather than characters.
For example, the CHARACTER_
string function returns the following output when used against a string literal with 4-byte encoding.
SHOW VARIABLES like "collation%";
+----------------------+-----------------+
| Variable_name | Value |
+----------------------+-----------------+
| collation_connection | utf8_general_ci |
| collation_database | utf8_general_ci |
| collation_server | utf8_general_ci |
+----------------------+-----------------+
3 rows in set (0.001 sec)
SELECT character_length("ๆฅ"), character_length("๐");
+-------------------------+--------------------------+
| character_length("ๆฅ") | character_length("๐") |
+-------------------------+--------------------------+
| 1 | 4 |
+-------------------------+--------------------------+
1 row in set (0.092 sec)
In the above example, SingleStore tries to interpret the provided string literal using utf8_
.CHARACTER_
to return 4 instead of 1 (as it would be if character_
was utf8mb4
), where 4 is a size in bytes of the string literal encoding.
To avoid the implicit use of binary collation for string literals in string functions, either use explicit type casting or use database columns defined with the utf8mb4
character set.
SELECT "My string literal" COLLATE utf8mb4_unicode_ci;
SELECT character_length("๐" :> CHAR(20) COLLATE utf8mb4_unicode_ci) as result;
+--------+
| result |
+--------+
| 1 |
+--------+
Alternatively, you can use a database column defined with the utf8mb4
character set.
CREATE TABLE t(a VARCHAR(200) COLLATE utf8mb4_unicode_ci);INSERT INTO t VALUES("Hi ๐!");SELECT * FROM t;
+----------+------+
| a | len |
+----------+------+
| Hi ๐! | 5 |
+----------+------+
SELECT a, character_length(a) AS len FROM t;
+--------+
| result |
+--------+
| 1 |
+--------+
Last modified: June 7, 2024