# Implicit Collation in Special Cases

By default, SingleStore uses`utf8mb4`  and `utf8mb4_bin` for `character_set_server` and `collation_server` respectively. When `character_set_server` is set to `utf8`, SingleStore accepts characters that use 4-byte encoding, stores them, and displays them in the output. However, as 4-byte characters are invalid in `utf8`, the string literals are implicitly assigned binary collation and processed as a sequence of bytes rather than characters. This implicit conversion to binary collation causes string functions and operations to return unexpected results.

For example, the `CHARACTER_LENGTH` string function returns the following output when used against a string literal with 4-byte encoding.

```sql
SHOW VARIABLES like "collation%";

```

```output

+----------------------+-----------------+
| Variable_name        | Value           |
+----------------------+-----------------+
| collation_connection | utf8mb4_bin     |
| collation_database   | utf8mb4_bin     |
| collation_server     | utf8mb4_bin     |
+----------------------+-----------------+
3 rows in set (0.001 sec)


```

```sql
SELECT character_length("敥"), character_length("😀");

```

```output

+-------------------------+--------------------------+
| character_length("敥")  | character_length("😀")    |
+-------------------------+--------------------------+
|                       1 |                        4 |
+-------------------------+--------------------------+
1 row in set (0.092 sec)
```

In the above example, the engine tries to interpret the provided string literal using `utf8_general_ci`. As the string literal contains invalid characters for the current collation,  a binary collation is implicitly assigned to the string literal and it is processed as a sequence of bytes. This implicit use of binary collation causes `CHARACTER_LENGTH("😀")` to return 4 instead of 1 (as it would be if `character_set_server` was `utf8mb4`), where 4 is a size in bytes of the string literal encoding.

To avoid the implicit use of binary collation for string literals in string functions, either use explicit type casting or use database columns defined with the `utf8mb4` character set. The following example uses explicit typecasting of the string literal.

```sql
SELECT "My string literal" COLLATE utf8mb4_unicode_ci;
```

```sql
SELECT character_length("😀" :> CHAR(20) COLLATE utf8mb4_unicode_ci) as result;

```

```output

+--------+
| result |
+--------+
|      1 |
+--------+
```

Alternatively, you can use a database column defined with the `utf8mb4` character set.

```sql
CREATE TABLE t(a VARCHAR(200) COLLATE utf8mb4_unicode_ci);

INSERT INTO t VALUES("Hi 😀!");

SELECT * FROM t;

```

```output

+----------+------+
| a        | len  |
+----------+------+
| Hi 😀!   |    5 |
+----------+------+

```

```sql
SELECT a, character_length(a) AS len FROM t;

```

```output

+--------+
| result |
+--------+
|      1 |
+--------+

```

***

Modified at: June 18, 2025

Source: [/db/v9.1/reference/sql-reference/character-encoding/special-cases/](https://docs.singlestore.com/db/v9.1/reference/sql-reference/character-encoding/special-cases/)

(An index of the documentation is available at /llms.txt)
