Specifying Character Set and Collation Cluster-Wide
Setting Character Set and Collation Engine Variables
You can specify the character set and collation for your cluster by setting the following engine variables.
character_set_server: The character set of the entire cluster that can be set to
utf8mb4
,utf8
, orbinary
. If you setcharacter_set_server
to a value that is different from its previous value, SingleStore sets the associated collation variables collation_connection, collation_database, and collation_server to the character set’s default collation.To set the
character_set
variable cluster-wide, follow the instructions below.Warning
Although the
character_set_server
variable is a session variable, it is strongly recommended to set the variable globally to avoid undesired behavior.Using a SingleStoreDB client, run
SET CLUSTER
(which is the equivalent ofSET GLOBAL
) to set thecharacter_set_server
variable across the cluster. For example:SET CLUSTER character_set_server = 'utf8'; **** Query OK, 0 rows affected (0.03 sec)
Note: SingleStore offers alternative commands to set engine variables. See Setting Sync Variables for more information.
If you set the character_set_server variable using
SET CLUSTER
(or its equivalentSET GLOBAL
)orSET AGGREGATOR
, you need to additionally run FLUSH CONNECTION POOLS on every aggregator node and restart the SQL session for the newcharacter_set_server
value to take effect.
collation_connection, collation_database, and collation_server: These are the collation variables that are set cluster-wide, or on all aggregator nodes or leaf nodes depending on how you set the engine variables. These variables can take the collations returned by the
SHOW COLLATION
command as values. When you set any ofcollation_connection
,collation_database
, andcollation_server
, SingleStore sets the other two variables to the same value and applies the collation to the nodes. Also, when you set any of these three variables, SingleStore sets character_set_server to a value that corresponds to the collation configured.To set the collation variables cluster-wide, follow the instructions below.
Warning
Although the collation variables are session variables, it is strongly recommended to set the variable globally to avoid undesired behavior.
Using a SingleStoreDB client, run
SET CLUSTER
(which is the equivalent of SET GLOBAL) to set the collation variable across the cluster. For example:SET CLUSTER collation_server = 'utf8_general_ci'; **** Query OK, 0 rows affected (0.03 sec)
Note: SingleStore offers alternative commands to set engine variables. See Setting Sync Variables for more information.
If you set the character_set_server variable using
SET CLUSTER
(or its equivalentSET GLOBAL
) orSET AGGREGATOR
, you need to additionally run FLUSH CONNECTION POOLS on every aggregator node and restart the SQL session for the new collation value to take effect.
To set a collation cluster-wide, issue the following command.
SET CLUSTER collation_server = 'utf8_general_ci'; **** Query OK, 0 rows affected (0.03 sec)
Alternatively, you can use SET AGGREGATOR
or SET LEAF
to sync the collation variable to the respective nodes. Note that you need to restart your SQL session for the new collation value to take effect if it is set using one of these commands. For more information, see Setting Sync Variables.
Reading Character Set and Collation Engine Variables
To retrieve values of the character set or the collation in current use, use the @@
selector with the required variable in a SELECT
statement.
SELECT @@collation_server; **** +--------------------+ | @@collation_server | +--------------------+ | utf8mb4_general_ci | +--------------------+ 1 row in set (0.00 sec)
Collation Demonstration
The following example shows that utf8_general_ci
collation is case-insensitive. The output is 1, indicating that A and a match in this collation.
SET collation_connection = 'utf8_general_ci'; SELECT 'A' = 'a';
The following example shows that utf8_bin
collation is case-sensitive. The output is 0, indicating that A and a do not match in this collation.
SET collation_connection = 'utf8_bin'; SELECT 'A' = 'a';
Important
Among the collations supported for the utf8mb4
character set, utf8mb4_unicode_ci
is recommended for accurate comparison and sorting of characters, especially when using characters from non-English international languages. While utf8mb4_general_ci
improves the performance of comparing and sorting operations, the results may not be precise when using certain language characters.