Implicit Collation in Special Cases

By default, SingleStore usesutf8mb4 and utf8mb4_general_ci for character_set_server and collation_server respectively. When character_set_server is set to utf8, SingleStore accepts characters that use 4-byte encoding, stores them, and displays them in the output. However, as 4-byte characters are invalid in utf8, the string literals are implicitly assigned binary collation and processed as a sequence of bytes rather than characters. This implicit conversion to binary collation causes string functions and operations to return unexpected results.

For example, the CHARACTER_LENGTH string function returns the following output when used against a string literal with 4-byte encoding.

SHOW VARIABLES like "collation%";
+----------------------+-----------------+
| Variable_name        | Value           |
+----------------------+-----------------+
| collation_connection | utf8_general_ci |
| collation_database   | utf8_general_ci |
| collation_server     | utf8_general_ci |
+----------------------+-----------------+
3 rows in set (0.001 sec)
SELECT character_length("ζ•₯"), character_length("πŸ˜€");
+-------------------------+--------------------------+
| character_length("ζ•₯")  | character_length("πŸ˜€")    |
+-------------------------+--------------------------+
|                       1 |                        4 |
+-------------------------+--------------------------+
1 row in set (0.092 sec)

In the above example, SingleStore tries to interpret the provided string literal using utf8_general_ci. As the string literal contains invalid characters for the current collation, SingleStore implicitly assigns a binary collation to the string literal and processes it as a sequence of bytes. This implicit use of binary collation causes CHARACTER_LENGTH("πŸ˜€") to return 4 instead of 1 (as it would be if character_set_server was utf8mb4), where 4 is a size in bytes of the string literal encoding.

To avoid the implicit use of binary collation for string literals in string functions, either use explicit type casting or use database columns defined with the utf8mb4 character set. The following example uses explicit typecasting of the string literal.

SELECT "My string literal" COLLATE utf8mb4_unicode_ci;
SELECT character_length("πŸ˜€" :> CHAR(20) COLLATE utf8mb4_unicode_ci) as result;
+--------+
| result |
+--------+
|      1 |
+--------+

Alternatively, you can use a database column defined with the utf8mb4 character set.

CREATE TABLE t(a VARCHAR(200) COLLATE utf8mb4_unicode_ci);
INSERT INTO t VALUES("Hi πŸ˜€!");
SELECT * FROM t;
+----------+------+
| a        | len  |
+----------+------+
| Hi πŸ˜€!   |    5 |
+----------+------+
SELECT a, character_length(a) AS len FROM t;
+--------+
| result |
+--------+
|      1 |
+--------+

Last modified: June 7, 2024

Was this article helpful?

Verification instructions

Note: You must install cosign to verify the authenticity of the SingleStore file.

Use the following steps to verify the authenticity of singlestoredb-server, singlestoredb-toolbox, singlestoredb-studio, and singlestore-client SingleStore files that have been downloaded.

You may perform the following steps on any computer that can run cosign, such as the main deployment host of the cluster.

  1. (Optional) Run the following command to view the associated signature files.

    curl undefined
  2. Download the signature file from the SingleStore release server.

    • Option 1: Click the Download Signature button next to the SingleStore file.

    • Option 2: Copy and paste the following URL into the address bar of your browser and save the signature file.

    • Option 3: Run the following command to download the signature file.

      curl -O undefined
  3. After the signature file has been downloaded, run the following command to verify the authenticity of the SingleStore file.

    echo -n undefined |
    cosign verify-blob --certificate-oidc-issuer https://oidc.eks.us-east-1.amazonaws.com/id/CCDCDBA1379A5596AB5B2E46DCA385BC \
    --certificate-identity https://kubernetes.io/namespaces/freya-production/serviceaccounts/job-worker \
    --bundle undefined \
    --new-bundle-format -
    Verified OK