Collations Supported

Each character set supported in SingleStore can have multiple collations with one default collation per character set. To view the collations available and the character sets to which they apply, issue the SHOW COLLATION command. Alternatively, you can run the SELECT statement with optional LIKE and WHERE clauses to yield specific results from the COLLATIONS view.

SELECT * FROM INFORMATION_SCHEMA.COLLATIONS WHERE CHARACTER_SET_NAME = 'utf8mb4' LIMIT 5;
+----------------------+--------------------+-----+------------+-------------+---------+
| COLLATION_NAME       | CHARACTER_SET_NAME | ID  | IS_DEFAULT | IS_COMPILED | SORTLEN |
+----------------------+--------------------+-----+------------+-------------+---------+
| utf8mb4_general_ci   | utf8mb4            |  45 | Yes        | Yes         |       1 |
| utf8mb4_bin          | utf8mb4            |  46 |            | Yes         |       1 |
| utf8mb4_unicode_ci   | utf8mb4            | 224 |            | Yes         |       8 |
| utf8mb4_icelandic_ci | utf8mb4            | 225 |            | Yes         |       8 |
| utf8mb4_latvian_ci   | utf8mb4            | 226 |            | Yes         |       8 |
+----------------------+--------------------+-----+------------+-------------+---------+

To view the mapping between collations and character sets, issue the following command.

SELECT * FROM INFORMATION_SCHEMA.COLLATION_CHARACTER_SET_APPLICABILITY;

When applying a character set, the default collation of the character set is automatically assigned unless specified otherwise.In the above command output, default collations are indicated by the Yes value in the IS_DEFAULT field. You can also identify the default collation of each character set from the CHARACTER_SETS view, or from the results of the SHOW COLLATION or the SHOW CHARACTER SET command. Note that two character sets cannot have the same collation.

By default, SingleStore uses the utf8mb4 character set along with its default collation utf8mb4_general_ci. If not set explicitly, newly created text columns will have the utf8mb4_general_ci collation. You can retrieve the value of the collation in current use by using the @@ in a SELECT for one of the following collation variables: collation_server, collation_connection, and collation_database.

SELECT @@character_set_server;
+------------------------+
| @@character_set_server |
+------------------------+
| utf8mb4                |
+------------------------+
SELECT @@collation_server;
+--------------------+
| @@collation_server |
+--------------------+
| utf8mb4_general_ci |
+--------------------+

Collation Naming Conventions

The collation names are prefixed with their associated character set name, typically followed by one or more suffixes representing the collation properties. For example, language-specific collations such as utf8_swedish_ci contain the language as a suffix and then the _ci suffix to indicate case-insensitivity.

Similarly, there are suffixes to indicate case-sensitivity (_cs), accent-insensitivity (_ai), and accent-sensitivity (_as). Currently, SingleStore only supports case-insensitive collations except for the *_bin collation, which is case-sensitive. If a collation name does not contain an accent-sensitivity designation, the collation's accent-sensitivity is determined by its case-sensitivity designation. In other words, if a collation does not contain _ai or _as, _ci in the name also implies _ai and _cs also implies _as. For example, utf8mb4_general_ci is explicitly case-insensitive and implicitly accent-insensitive (See also Collation Demonstration). The exception to this format is the binary collation, which has no suffixes.

The _bin suffix indicates binary, which is different from the binary collation. The binary collation is the default and the only collation of the binary character set. It defines rules for character comparisons based on numeric byte values. On the contrary, the collations of nonbinary characters with the _bin suffix define comparisons based on numeric character code values, which differ from byte values for multibyte characters.

Last modified: June 5, 2024

Was this article helpful?