Collations Supported
Each character set supported in SingleStore can have multiple collations with one default collation per character set. To view the collations available and the character sets to which they apply, issue the SHOW COLLATION
command. Alternatively, you can run the SELECT
statement with optional LIKE
and WHERE
clauses to yield specific results from the COLLATIONS
view.
SELECT * FROM INFORMATION_SCHEMA.COLLATIONS WHERE CHARACTER_SET_NAME = 'utf8mb4' LIMIT 5; +----------------------+--------------------+-----+------------+-------------+---------+ | COLLATION_NAME | CHARACTER_SET_NAME | ID | IS_DEFAULT | IS_COMPILED | SORTLEN | +----------------------+--------------------+-----+------------+-------------+---------+ | utf8mb4_general_ci | utf8mb4 | 45 | Yes | Yes | 1 | | utf8mb4_bin | utf8mb4 | 46 | | Yes | 1 | | utf8mb4_unicode_ci | utf8mb4 | 224 | | Yes | 8 | | utf8mb4_icelandic_ci | utf8mb4 | 225 | | Yes | 8 | | utf8mb4_latvian_ci | utf8mb4 | 226 | | Yes | 8 | +----------------------+--------------------+-----+------------+-------------+---------+ 5 rows in set (0.17 sec)
To view the mapping between collations and character sets, issue the following command.
SELECT * FROM INFORMATION_SCHEMA.COLLATION_CHARACTER_SET_APPLICABILITY;
When applying a character set, the default collation of the character set is automatically assigned unless specified otherwise. In the above command output, default collations are indicated by the Yes
value in the IS_DEFAULT
field. You can also identify the default collation of each character set from the CHARACTER_SETS
view, or from the results of the SHOW COLLATION
or the SHOW CHARACTER SET
command. Note that two character sets cannot have the same collation.
Collation Naming Conventions
The collation names are prefixed with their associated character set name, typically followed by one or more suffixes representing the collation properties. For example, language-specific collations such as utf8_swedish_ci
contain the language as a suffix and then the _ci
suffix to indicate case-insensitivity.
Similarly, there are suffixes to indicate case-sensitivity (_cs
), accent-insensitivity (_ai
), and accent-sensitivity (_as
). Currently, SingleStore only supports case-insensitive collations except for the *_bin
collation, which is case-sensitive. If a collation name does not contain an accent-sensitivity designation, the collation's accent-sensitivity is determined by its case-sensitivity designation. In other words, if a collation does not contain _ai
or _as
, _ci
in the name also implies _ai
and _cs
also implies _as
. For example, utf8mb4_general_ci
is explicitly case-insensitive and implicitly accent-insensitive (See also Collation Demonstration). The exception to this format is the binary
collation, which has no suffixes.
The _bin
suffix indicates binary, which is different from the binary
collation. The binary
collation is the default and the only collation of the binary
character set. It defines rules for character comparisons based on numeric byte values. On the contrary, the collations of nonbinary characters with the _bin
suffix define comparisons based on numeric character code values, which differ from byte values for multibyte characters.