Collations Supported
On this page
Each character set supported in SingleStore can have multiple collations with one default collation per character set.
A collation determines the ordering of characters in a character set and is used for sorting and comparison of columns defined to use that collation.
To view the available collations and the character sets to which they apply, use the SHOW COLLATION command.COLLATIONS information schema view as shown in the following example.
SELECT *FROM INFORMATION_SCHEMA.COLLATIONSWHERE CHARACTER_SET_NAME = 'utf8mb4' LIMIT 5;
+----------------------+--------------------+-----+------------+-------------+---------+
| COLLATION_NAME | CHARACTER_SET_NAME | ID | IS_DEFAULT | IS_COMPILED | SORTLEN |
+----------------------+--------------------+-----+------------+-------------+---------+
| utf8mb4_general_ci | utf8mb4 | 45 | Yes | Yes | 1 |
| utf8mb4_bin | utf8mb4 | 46 | | Yes | 1 |
| utf8mb4_unicode_ci | utf8mb4 | 224 | | Yes | 8 |
| utf8mb4_icelandic_ci | utf8mb4 | 225 | | Yes | 8 |
| utf8mb4_latvian_ci | utf8mb4 | 226 | | Yes | 8 |
+----------------------+--------------------+-----+------------+-------------+---------+When a character set is used, the default collation of the character set is automatically assigned unless otherwise specified.Yes value in the IS_ field.CHARACTER_ information schema view, or from the results of the SHOW COLLATION or the SHOW CHARACTER SET command.
By default, SingleStore uses the utf8 character set along with its default collation utf8_.collation_, which is by default utf8_ collation.
SingleStore recommends using binary collations, such as utf8mb4_, for best performance.
Retrieve the value of the collation in current use by using the @@ in a SELECT for one of the following collation variables: collation_, collation_, and collation_.
SELECT @@character_set_server;
+------------------------+
| @@character_set_server |
+------------------------+
| utf8 |
+------------------------+SELECT @@collation_server;
+--------------------+
| @@collation_server |
+--------------------+
| utf8_general_ci |
+--------------------+To view the mapping between collations and character sets, use the following command.
SELECT *FROM INFORMATION_SCHEMA.COLLATION_CHARACTER_SET_APPLICABILITY;
Note: When used to specify a collation, binary must be surrounded by backticks (`binary`) or single quotes ('binary') as shown in the following examples.
CREATE TABLE t(a TEXT COLLATE binary);
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'binary)' at line 1CREATE TABLE t(a TEXT COLLATE 'binary');
Query OK, 0 rows affected (0.03 sec)Types of Collations and Naming Conventions
Collations differ in whether they are case-sensitive or -insensitive, accent-sensitive or -insensitive, and whether they consider characters (e.
Collation names are prefixed with their associated character set name, typically followed by one or more suffixes representing the collation properties.utf8_ contain the language as a suffix and then the _ suffix to indicate case-insensitivity.
SingleStore supports case-insensitive (_) collations._), accent-insensitive (_), and accent-sensitive (_) collations are not supported._ in the name implies _.
The _ suffix indicates a binary collation for a specific character set, which is different from the binary collation.utf8mb4_ is a binary collation for the utf8mb4 character set.
The binary collation is the only collation for the binary character set.
In contrast, collations of character sets with the _ suffix define comparisons based on numeric character code values.utf8mb4 charset follows UTF8 rules, while the binary charset stores any binary values.
Choosing a Collation
Collations have an impact on performance.
SingleStore recommends using binary collations, such as utf8mb4_, for best performance._ collations compare using Unicode values, which is faster than comparisons using other types of collations which apply linguistic rules.
However, binary collations are case-sensitive, accent-sensitive, and do not account for different representations of certain characters.
Binary collations support the same Unicode characters as their corresponding Unicode collation, but the comparison rules are different.utf8mb4_ supports the same Unicode characters as utf8mb4_.
General case-insensitive collations (*_), and Unicode collations (*_) provide more sophisticated collation algorithms than binary collations (*_) at the cost of performance.
For any Unicode character set, the _ collations are faster than the _ collations.utf8mb4_ collation does not fully inherit Unicode rules for all languages, but is case-insensitive.
For example, comparisons for the utf8mb4_ collation are faster, but less precise, than comparisons for utf8mb4_.utf8mb4_ collation supports mappings such as expansions; that is, when one character is considered equal to combinations of other characters.
Thus for the utf8mb4 character set:
-
utf8mb4_is the fastest, but is case-sensitive and does not support expansions.bin -
utf8mb4_is slower, and is case-insensitive, but does not support expansions.general_ ci -
utf8mb4_is the slowest, it is case-insensitive and supports expansions.unicode_ ci
SingleStore recommends using case-insensitive, accent-insensitive, and Unicode collations only when explicitly needed by the application.
Unicode Collation Behavior
SingleStore's utf8mb4_ collation uses the Unicode Collation Algorithm (UCA) version 13._ collations according to prior UCA versions may have slightly different collation behavior.
Last modified: