Character Set and Collation Override

Although the server character set and collation are applied cluster-wide, you can override the server configurations in procedural extensions, tables, columns, and expressions within each database. You can override the cluster-wide character set and collation when using the LOAD DATA statement.

Table-Level Override

The COLLATE clause in the CREATE TABLE statement allows you to override the server collation and hence override the server character set.

CREATE [ROWSTORE] TABLE <table_name> (column_name {<column_definition>})
COLLATE <collation_name>;

Examples

In the following example, the results table uses utf8mb4_general_ci collation overriding the server collation of utf8_general_ci.

SET collation_server = utf8_general_ci;
CREATE TABLE results (student_id INT, grade CHAR(5), pass VARCHAR(5)) COLLATE utf8mb4_general_ci;
INSERT results VALUES(1, "B+", "👍");
SELECT * FROM results;
+------------+-------+------+
| student_id | grade | pass |
+------------+-------+------+
|          1 | B+    | 👍   |
+------------+-------+------+
1 row in set (0.04 sec)

In the following example, a table introduced within a stored procedure uses utf8 collation overriding the server collation of utf8mb4_general_ci.

SET collation_server='utf8mb4_general_ci';
DELIMITER //
CREATE OR REPLACE PROCEDURE sample(id INT, city VARCHAR(30))
AS
/* A variable uses utf8 overriding the server character set*/
DECLARE
location TEXT = UCASE(city);
/* A table uses utf8 overriding the server character set and collation */
BEGIN
CREATE TABLE customers(id INT, city VARCHAR (30) COLLATE utf8_general_ci);
INSERT INTO customers VALUES(id, location);
END //
DELIMITER ;
CALL sample(1, "london 🗺");
SELECT * FROM customers;
+------+---------+
| id   | city    |
+------+---------+
|    1 | LONDON  |
+------+---------+
1 row in set (0.02 sec)

To view the character set used by a table, issue the following command.

SHOW CREATE TABLE <table_name>;

The table character set and collation are used as default values for the columns in the table. You can, however, override the table settings at the column level. For more information, see Column-level Override.

Important

Table character set and collation values cannot be specified using the ALTER TABLE statement.

Column-Level Override

You can override the character set and collation for table columns using the COLLATE clause in the column definition of the CREATE TABLE statement. For example, you can specify the character set and the collation for VARCHAR and JSON columns using the following syntax.

CREATE [ROWSTORE] TABLE <table_name> (id INT,
<column_name> VARCHAR(<length>) COLLATE <collation_name>,
<column_name> JSON COLLATE <collation_name>);

Examples

In the following example, the comments.response column uses utf8mb4_general_ci overriding the table collation of utf8_general_ci.

CREATE TABLE response (message VARCHAR (20),
comments JSON COLLATE utf8mb4_general_ci) COLLATE utf8_general_ci;
INSERT response VALUES ('Available 👍','{"Good to meet next week":"🙂"}'),
('Cannot say 😐','{"Prefer to meet this week":"🙁"}'), ('Unavailable 👎','{"Planned time off":"🙁"}');
SELECT * FROM response ORDER BY message;
+----------------+-------------------------------------+
| message        | comments                            |
+----------------+-------------------------------------+
| Available      | {"Good to meet next week":"🙂"}      |
| Cannot say     | {"Prefer to meet this week":"🙁"}    |
| Unavailable    | {"Planned time off":"🙁"}            |
+----------------+-------------------------------------+

Important

For JSON columns, the COLLATE clause can take any of utf8 or utf8mb4 collation as values. If no collation is specified in the column definition, JSON columns use utf8_bin by default regardless of the value set for the collation engine variable.

In the following example, the Status.delivery column uses utf8mb4_general_ci, overriding the server collation of utf8_bin.

SET collation_server = utf8_bin;
CREATE TABLE delivery (ID INT, Status ENUM('NOT_DISPATCHED','DELIVERED ✅','IN_TRANSIT') COLLATE utf8mb4_general_ci);
INSERT INTO delivery VALUES(567,1);
INSERT INTO delivery VALUES(472,'DELIVERED ✅');
INSERT INTO delivery(ID) VALUES(323);
SELECT * FROM delivery ORDER BY ID ASC;
+------+----------------+
| ID   | Status         |
+------+----------------+
| 472  | DELIVERED ✅   |
| 323  | NULL           |
| 567  | NOT_DISPATCHED |
+------+----------------+

To view the character sets used by columns in a table, run one of the following commands.

SHOW FULL COLUMNS {FROM | IN} TABLE <table_name>;
SHOW CREATE TABLE <table_name>;

Modifying the Collation of Columns

In rowstores, you can additionally use the ALTER TABLE statement to modify the collation, and in turn, modify the character set of existing columns.

CREATE ROWSTORE TABLE posts(comments VARCHAR(50) COLLATE utf8_general_ci);
ALTER TABLE posts MODIFY comments VARCHAR(50) COLLATE utf8mb4_general_ci;

Important

  • In columnstores, the collation of existing columns cannot be modified.

  • In rowstores, when the character set of existing columns is modified from utf8mb4 to utf8, the column entries will be truncated to the first string of valid utf8 Unicode characters.

Expression-Level Override

The following example uses the collation clause with the cast (:>) operator to control the GROUP BY clause based on case sensitivity.

Collate utf8_bin compares strings based solely on their Unicode code point values. Collate utf8_general_ci applies Unicode normalization using language-specific rules and compares strings for case-insensitively.

CREATE TABLE grades (student_id INT, Lastname varchar(25),
grade CHAR(5)) COLLATE utf8mb4_general_ci;
INSERT grades values(751, "williams", "A"),
("184", "Williams", "B"),
("938", "Browning", "D"),
("260", "Anderson", "C");
SELECT * FROM grades;
+--------------------------------+
| student_id | Lastname  | Grade |
+--------------------------------+
| 184        | Williams  | B     |
+--------------------------------+
| 938        | Browning  | D     |
+--------------------------------+
| 260        | Anderson  | C     |
+--------------------------------+
| 751        | williams  | A     |
+--------------------------------+ 

When using utf8_bin the results will display all last names from the table regardless of capitalization.

SELECT lastname :> text COLLATE utf8_bin FROM grades GROUP BY 1;
+---------------------------------+
| lastname:>text collate utf8_bin |
|+--------------------------------+
| Williams                        |
+---------------------------------+
| Browning                        |
+---------------------------------+
| Anderson                        |
+---------------------------------+
| williams                        |
+---------------------------------+

When using utf8_general_ci the results will display only uppercase last names from the table.

SELECT lastname :> text COLLATE utf8_general_ci FROM grades GROUP BY 1;
+-----------------------------------------+
| lastname:>text collate utf8_general_ci  |
|+----------------------------------------+
| Williams                                |
+-----------------------------------------+
| Browning                                |
+-----------------------------------------+
| Anderson                                |
+-----------------------------------------+

When using expressions with JSON columns, use the :> operator to specify a collation for different expressions in a query.

:> text COLLATE <colation_name>

The following example uses a binary collation (utf8_bin) for the first expression and a case-insensitive collation (utf8_general_ci) for the second expression.

SELECT * FROM sets
WHERE sets.json_field::$x :> text COLLATE utf8_bin = "string1"
AND sets.json_field::$y :> text COLLATE utf8_general_ci = "string2";

Important

If the collation is not specified for expressions, the column character set and collation are used, which may rely on the table settings or the server settings.

Setting Collation for String Literals

You can set the collation for string literals explicitly.

SELECT "My string" COLLATE utf8mb4_unicode_ci;

Procedural Extension-Level Override

Parameters and variables in Procedural Extensions (stored procedures and user-defined functions) use the server character set and collation by default. The COLLATE clause in the CREATE PROCEDURE and the CREATE FUNCTION statements allows you to override the server collation, and in turn, override the server character set for parameters and variables.

Use the following syntax to override the collation for parameters and variables within stored procedures.

CREATE PROCEDURE procedure_name ( [parameter_name data_type COLLATE collation_name], [...n] )
...
CREATE PROCEDURE procedure_name ( [parameter_list] )
...
[DECLARE variable_name type_expression COLLATE collation_name [= value_expression ] ; [... ;] [ ...n ]
...

Similarly, you can use the following syntax to override the collation for parameters and variables in user-defined functions.

CREATE FUNCTION function_name(id INT, product VARCHAR(20) COLLATE utf8_general_ci);
CREATE FUNCTION single_parameter_example(a INT) AS
DECLARE
num INT = 0;
str VARCHAR(255) COLLATE utf8mb4_general_ci;
...

Examples

In the following stored procedure, a parameter uses utf8 overriding the server character set of utf8mb4.

SET collation_server='utf8mb4_general_ci';
/* A parameter uses utf8 overriding the server character set */
DELIMITER //
CREATE OR REPLACE PROCEDURE sample(id INT, city VARCHAR(30) COLLATE utf8_general_ci)
AS
DECLARE
location TEXT = UCASE(city);
BEGIN
CREATE TABLE customers(id INT, city VARCHAR(30) COLLATE utf8_general_ci);
INSERT INTO customers VALUES(id, location);
END //
DELIMITER ;
CALL sample(1, "london 🗺");
SELECT * FROM customers;
+------+--------+
| id   | city   |
+------+--------+
|    1 | LONDON |
+------+--------+
1 row in set (0.02 sec)

In the following user-defined function, a variable uses utf8 overriding the server character set of utf8mb4.

SET collation_server='utf8mb4_general_ci';
DELIMITER //
CREATE OR REPLACE FUNCTION getVersion() RETURNS VARCHAR(255) AS
DECLARE
var1 VARCHAR(255);
var2 VARCHAR(255);
var3 VARCHAR(10) COLLATE utf8_general_ci;
result VARCHAR(255);
BEGIN
var1 = @@memsql_version;
var2 = @@version_compile_os;
var3 = "👾";
result = CONCAT("OS: ", var2, "; Version: ", var1, " ", var3);
RETURN result;
END //
DELIMITER ;
SELECT getVersion() AS Info;
+----------------------------+
| Info                       |
+----------------------------+
| OS: Linux; Version: 7.5.1  |
+----------------------------+
1 row in set (0.01 sec)

Important

The server character set and collation can be overridden for tables and columns introduced within procedural extensions. For more information, see Table-Level Override and Column-Level Override.

LOAD DATA

The LOAD DATA statement allows you to specify a character set for the files to be imported. Use the following syntax.

LOAD DATA [LOCAL] INFILE '<file_name>'> INTO TABLE '<table_name>' CHARACTER SET <character_set_name>;

Last modified: September 24, 2024

Was this article helpful?