Specifying Character Set and Collation for Workspaces

Setting Character Set and Collation Engine Variables

You can specify the character set and collation for your workspace by setting the following engine variables.

  • character_set_server: The character set of the entire workspace that can be set to utf8mb4, utf8, or binary. If you set character_set_server to a value that is different from its previous value, SingleStore sets the associated collation variables collation_connection, collation_database, and collation_server to the character set’s default collation.

    To set the character_set variable workspace-wide, follow the instructions below.

    Warning

    Although the character_set_server variable is a session variable, you must set the variable globally to avoid undesired behavior. An error is generated if the variable is set in a session to a value that does not match the value set globally. Change the value globally first, then change it in the session to match.

    1. Using a SingleStore Helios client, run SET CLUSTER (which is the equivalent of SET GLOBAL) to set the character_set_server variable across the workspace. For example:

      SET CLUSTER character_set_server = 'utf8';
      Query OK, 0 rows affected (0.03 sec)

      Note: SingleStore offers alternative commands to set engine variables. See Sync Variables for more information.

  • collation_connection, collation_database, and collation_server: These are the collation variables that are set workspace-wide, or on all aggregator nodes or leaf nodes depending on how you set the engine variables. These variables can take the collations returned by the SHOW COLLATION command as values. When you set any of collation_connection, collation_database, and collation_server, SingleStore sets the other two variables to the same value and applies the collation to the nodes. Also, when you set any of these three variables, SingleStore sets character_set_server to a value that corresponds to the collation configured.

    To set the collation variables workspace-wide, follow the instructions below.

    Warning

    Although the collation variables are session variables, you must set the variable globally to avoid undesired behavior. An error is generated if any of these variables is set in a session to a value that does not match the value set globally. Change the value globally first, then change it in the session to match.

    1. Using a SingleStore Helios client, run SET CLUSTER (which is the equivalent of SET GLOBAL) to set the collation variable across the workspace. For example:

      SET CLUSTER collation_server = 'utf8_general_ci';
      Query OK, 0 rows affected (0.03 sec)

      Note: SingleStore offers alternative commands to set engine variables. See Sync Variables for more information.

To set a collation workspace-wide, issue the following command.

SET CLUSTER collation_server = 'utf8_general_ci';
Query OK, 0 rows affected (0.03 sec)

Alternatively, you can use SET AGGREGATOR or SET LEAF to sync the collation variable to the respective nodes. Note that you need to restart your SQL session for the new collation value to take effect if it is set using one of these commands. For more information, see Setting Sync Variables.

Reading Character Set and Collation Engine Variables

To retrieve values of the character set or the collation in current use, use the @@ selector with the required variable in a SELECT statement.

SELECT @@collation_server;
+--------------------+
| @@collation_server |
+--------------------+
| utf8mb4_general_ci |
+--------------------+
1 row in set (0.00 sec)

Collation Demonstration

The following example shows that utf8_general_ci collation is case-insensitive. The output is 1, indicating that A and a match in this collation.

SET collation_connection = 'utf8_general_ci';
SELECT 'A' = 'a';

The following example shows that utf8_bin collation is case-sensitive. The output is 0, indicating that A and a do not match in this collation.

SET collation_connection = 'utf8_bin';
SELECT 'A' = 'a';

Important

Among the collations supported for the utf8mb4 character set, utf8mb4_unicode_ci is recommended for accurate comparison and sorting of characters, especially when using characters from non-English international languages. While utf8mb4_general_ci improves the performance of comparing and sorting operations, the results may not be precise when using certain language characters.

Last modified: August 29, 2024

Was this article helpful?