Character Set and Collation Override
On this page
Although the server character set and collation are applied cluster-wide, you can override the server configurations in procedural extensions, tables, columns, and expressions within each database.LOAD DATA
statement.
Table-Level Override
The COLLATE
clause in the CREATE TABLE
statement allows you to override the server collation and hence override the server character set.
CREATE [ROWSTORE] TABLE <table_name> (column_name {<column_definition>})COLLATE <collation_name>;
Examples
In the following example, the results
table uses utf8mb4_
collation overriding the server collation of utf8_
.
SET collation_server = utf8_general_ci;CREATE TABLE results (student_id INT, grade CHAR(5), pass VARCHAR(5)) COLLATE utf8mb4_general_ci;INSERT results VALUES(1, "B+", "👍");SELECT * FROM results;
+------------+-------+------+
| student_id | grade | pass |
+------------+-------+------+
| 1 | B+ | 👍 |
+------------+-------+------+
1 row in set (0.04 sec)
In the following example, a table introduced within a stored procedure uses utf8
collation overriding the server collation of utf8mb4_
.
SET collation_server='utf8mb4_general_ci';DELIMITER //CREATE OR REPLACE PROCEDURE sample(id INT, city VARCHAR(30))AS/* A variable uses utf8 overriding the server character set*/DECLARElocation TEXT = UCASE(city);/* A table uses utf8 overriding the server character set and collation */BEGINCREATE TABLE customers(id INT, city VARCHAR (30) COLLATE utf8_general_ci);INSERT INTO customers VALUES(id, location);END //DELIMITER ;CALL sample(1, "london 🗺");SELECT * FROM customers;
+------+---------+
| id | city |
+------+---------+
| 1 | LONDON |
+------+---------+
1 row in set (0.02 sec)
To view the character set used by a table, issue the following command.
SHOW CREATE TABLE <table_name>;
The table character set and collation are used as default values for the columns in the table.
Important
Table character set and collation values cannot be specified using the ALTER TABLE
statement.
Column-Level Override
You can override the character set and collation for table columns using the COLLATE
clause in the column definition of the CREATE TABLE
statement.VARCHAR
and JSON
columns using the following syntax.
CREATE [ROWSTORE] TABLE <table_name> (id INT,<column_name> VARCHAR(<length>) COLLATE <collation_name>,<column_name> JSON COLLATE <collation_name>);
Examples
In the following example, the comments.
column uses utf8mb4_
overriding the table collation of utf8_
.
CREATE TABLE response (message VARCHAR (20),comments JSON COLLATE utf8mb4_general_ci) COLLATE utf8_general_ci;INSERT response VALUES ('Available 👍','{"Good to meet next week":"🙂"}'),('Cannot say 😐','{"Prefer to meet this week":"🙁"}'), ('Unavailable 👎','{"Planned time off":"🙁"}');SELECT * FROM response ORDER BY message;
+----------------+-------------------------------------+
| message | comments |
+----------------+-------------------------------------+
| Available | {"Good to meet next week":"🙂"} |
| Cannot say | {"Prefer to meet this week":"🙁"} |
| Unavailable | {"Planned time off":"🙁"} |
+----------------+-------------------------------------+
Important
For JSON
columns, the COLLATE
clause can take any of utf8
or utf8mb4
collation as values.JSON
columns use utf8_
by default regardless of the value set for the collation engine variable.
In the following example, the Status.
column uses utf8mb4_
, overriding the server collation of utf8_
.
SET collation_server = utf8_bin;CREATE TABLE delivery (ID INT, Status ENUM('NOT_DISPATCHED','DELIVERED ✅','IN_TRANSIT') COLLATE utf8mb4_general_ci);INSERT INTO delivery VALUES(567,1);INSERT INTO delivery VALUES(472,'DELIVERED ✅');INSERT INTO delivery(ID) VALUES(323);SELECT * FROM delivery ORDER BY ID ASC;
+------+----------------+
| ID | Status |
+------+----------------+
| 472 | DELIVERED ✅ |
| 323 | NULL |
| 567 | NOT_DISPATCHED |
+------+----------------+
To view the character sets used by columns in a table, run one of the following commands.
SHOW FULL COLUMNS {FROM | IN} TABLE <table_name>;
SHOW CREATE TABLE <table_name>;
Modifying the Collation of Columns
In rowstores, you can additionally use the ALTER TABLE
statement to modify the collation, and in turn, modify the character set of existing columns.
CREATE ROWSTORE TABLE posts(comments VARCHAR(50) COLLATE utf8_general_ci);ALTER TABLE posts MODIFY comments VARCHAR(50) COLLATE utf8mb4_general_ci;
Important
-
In columnstores, the collation of existing columns cannot be modified.
-
In rowstores, when the character set of existing columns is modified from
utf8mb4
toutf8
, the column entries will be truncated to the first string of validutf8
Unicode characters.
Expression-Level Override
The following example uses the collation clause with the cast (:>) operator to control the GROUP BY clause based on case sensitivity.
Collate utf8_
compares strings based solely on their Unicode code point values.utf8_
applies Unicode normalization using language-specific rules and compares strings for case-insensitively.
CREATE TABLE grades (student_id INT, Lastname varchar(25),grade CHAR(5)) COLLATE utf8mb4_general_ci;INSERT grades values(751, "williams", "A"),("184", "Williams", "B"),("938", "Browning", "D"),("260", "Anderson", "C");SELECT * FROM grades;
+--------------------------------+
| student_id | Lastname | Grade |
+--------------------------------+
| 184 | Williams | B |
+--------------------------------+
| 938 | Browning | D |
+--------------------------------+
| 260 | Anderson | C |
+--------------------------------+
| 751 | williams | A |
+--------------------------------+
When using utf8_
the results will display all last names from the table regardless of capitalization.
SELECT lastname :> text COLLATE utf8_bin FROM grades GROUP BY 1;
+---------------------------------+
| lastname:>text collate utf8_bin |
|+--------------------------------+
| Williams |
+---------------------------------+
| Browning |
+---------------------------------+
| Anderson |
+---------------------------------+
| williams |
+---------------------------------+
When using utf8_
the results will display only uppercase last names from the table.
SELECT lastname :> text COLLATE utf8_general_ci FROM grades GROUP BY 1;
+-----------------------------------------+
| lastname:>text collate utf8_general_ci |
|+----------------------------------------+
| Williams |
+-----------------------------------------+
| Browning |
+-----------------------------------------+
| Anderson |
+-----------------------------------------+
When using expressions with JSON
columns, use the :>
operator to specify a collation for different expressions in a query.
:> text COLLATE <colation_name>
The following example uses a binary collation (utf8_
) for the first expression and a case-insensitive collation (utf8_
) for the second expression.
SELECT * FROM setsWHERE sets.json_field::$x :> text COLLATE utf8_bin = "string1"AND sets.json_field::$y :> text COLLATE utf8_general_ci = "string2";
Important
If the collation is not specified for expressions, the column character set and collation are used, which may rely on the table settings or the server settings.
Procedural Extension-Level Override
Parameters and variables in Procedural Extensions (stored procedures and user-defined functions) use the server character set and collation by default.COLLATE
clause in the CREATE PROCEDURE
and the CREATE FUNCTION
statements allows you to override the server collation, and in turn, override the server character set for parameters and variables.
Use the following syntax to override the collation for parameters and variables within stored procedures.
CREATE PROCEDURE procedure_name ( [parameter_name data_type COLLATE collation_name], [...n] )...
CREATE PROCEDURE procedure_name ( [parameter_list] )...[DECLARE variable_name type_expression COLLATE collation_name [= value_expression ] ; [... ;] [ ...n ]...
Similarly, you can use the following syntax to override the collation for parameters and variables in user-defined functions.
CREATE FUNCTION function_name(id INT, product VARCHAR(20) COLLATE utf8_general_ci);…
CREATE FUNCTION single_parameter_example(a INT) ASDECLAREnum INT = 0;str VARCHAR(255) COLLATE utf8mb4_general_ci;...
Examples
In the following stored procedure, a parameter uses utf8
overriding the server character set of utf8mb4
.
SET collation_server='utf8mb4_general_ci';/* A parameter uses utf8 overriding the server character set */DELIMITER //CREATE OR REPLACE PROCEDURE sample(id INT, city VARCHAR(30) COLLATE utf8_general_ci)ASDECLARElocation TEXT = UCASE(city);BEGINCREATE TABLE customers(id INT, city VARCHAR(30) COLLATE utf8_general_ci);INSERT INTO customers VALUES(id, location);END //DELIMITER ;CALL sample(1, "london 🗺");SELECT * FROM customers;
+------+--------+
| id | city |
+------+--------+
| 1 | LONDON |
+------+--------+
1 row in set (0.02 sec)
In the following user-defined function, a variable uses utf8
overriding the server character set of utf8mb4
.
SET collation_server='utf8mb4_general_ci';DELIMITER //CREATE OR REPLACE FUNCTION getVersion() RETURNS VARCHAR(255) ASDECLAREvar1 VARCHAR(255);var2 VARCHAR(255);var3 VARCHAR(10) COLLATE utf8_general_ci;result VARCHAR(255);BEGINvar1 = @@memsql_version;var2 = @@version_compile_os;var3 = "👾";result = CONCAT("OS: ", var2, "; Version: ", var1, " ", var3);RETURN result;END //DELIMITER ;SELECT getVersion() AS Info;
+----------------------------+
| Info |
+----------------------------+
| OS: Linux; Version: 7.5.1 |
+----------------------------+
1 row in set (0.01 sec)
Important
The server character set and collation can be overridden for tables and columns introduced within procedural extensions.
LOAD DATA
The LOAD DATA statement allows you to specify a character set for the files to be imported.
LOAD DATA [LOCAL] INFILE '<file_name>'> INTO TABLE '<table_name>' CHARACTER SET <character_set_name>;
Last modified: September 9, 2024