SingleStore DB

Character Set and Collation Override

Although the server character set and collation are applied cluster-wide, you can override the server configurations in procedural extensions, tables, columns, and expressions within each database. You can override the cluster-wide character set and collation when using the LOAD DATA statement.

Table-Level Override

The COLLATE clause in the CREATE TABLE statement allows you to override the server collation and hence override the server character set.

CREATE TABLE <table_name> (column_name {<column_definition>}) COLLATE <collation_name>;

Examples

In the following example, the results table uses utf8mb4_general_ci collation overriding the server collation of utf8_general_ci.

SET collation_server = utf8_general_ci;
****
Query OK, 0 rows affected (0.00 sec)
 
CREATE TABLE results (student_id INT, grade CHAR(5), pass VARCHAR(5)) COLLATE utf8mb4_general_ci;
****
Query OK, 0 rows affected (0.09 sec)
 
INSERT results VALUES(1, "B+", "👍");
****
Query OK, 1 row affected (0.05 sec)
 
SELECT * FROM results;
****
+------------+-------+------+
| student_id | grade | pass |
+------------+-------+------+
|          1 | B+    | 👍     |
+------------+-------+------+
1 row in set (0.04 sec)

In the following example, a table introduced within a stored procedure uses utf8 collation overriding the server collation of utf8mb4_general_ci.

set collation_server='utf8mb4_general_ci';
****
Query OK, 0 rows affected (0.00 sec)
 
DELIMITER //
CREATE OR REPLACE PROCEDURE sample(id INT, city VARCHAR(30)) 
AS
/* A variable uses utf8 overriding the server character set*/
DECLARE
	location TEXT = UCASE(city);
/* A table uses utf8 overriding the server character set and collation */
BEGIN
	CREATE TABLE customers(id INT, city VARCHAR (30) COLLATE utf8_general_ci);
	INSERT INTO customers VALUES(id, location);
END //
DELIMITER ;
****
Query OK, 1 row affected (0.02 sec)
 
CALL sample(1, "london 🗺");
****
Query OK, 0 rows affected (0.13 sec)
 
SELECT * FROM customers;
****
+------+---------+
| id   | city    |
+------+---------+
|    1 | LONDON  |
+------+---------+
1 row in set (0.02 sec)

To view the character set used by a table, issue the following command.

SHOW CREATE TABLE <table_name>;

The table character set and collation are used as default values for the columns in the table. You can, however, override the table settings at the column level. For more information, see Column-level Override.

Important

Table character set and collation values cannot be specified using the ALTER TABLE statement.

Column-Level Override

You can override the character set and collation for table columns using the COLLATE clause in the column definition of the CREATE TABLE statement. For example, you can specify the character set and the collation for VARCHAR and JSON columns using the following syntax.

CREATE TABLE <table_name> (id INT, <column_name> VARCHAR(<length>) COLLATE <collation_name>, <column_name> JSON COLLATE <collation_name>);

Examples

In the following example, the comments.response column uses utf8mb4_general_ci overriding the table collation of utf8_general_ci.

CREATE TABLE response (message VARCHAR (20), comments JSON COLLATE utf8mb4_general_ci) COLLATE utf8_general_ci;
****
Query OK, 0 rows affected (0.08 sec)
 
INSERT response VALUES ('Available 👍','{"Good to meet next week":"🙂"}'), ('Cannot say 😐','{"Prefer to meet this week":"🙁"}'), ('Unavailable 👎','{"Planned time off":"🙁"}');
****
Query OK, 3 rows affected (0.15 sec)
 
SELECT * FROM response ORDER BY message;
****
+----------------+-------------------------------------+
| message        | comments                            |
+----------------+-------------------------------------+
| Available      | {"Good to meet next week":"🙂"}      |
| Cannot say     | {"Prefer to meet this week":"🙁"}    |
| Unavailable    | {"Planned time off":"🙁"}            |
+----------------+-------------------------------------+

Important

For JSON columns, the COLLATE clause can take any of utf8 or utf8mb4 collation as values. If no collation is specified in the column definition, JSON columns use utf8_bin by default regardless of the value set for the collation engine variable.

In the following example, the Status.delivery column uses utf8mb4_general_ci, overriding the server collation of utf8_bin.

SET collation_server = utf8_bin;
****
Query OK, 0 rows affected (0.00 sec)


CREATE TABLE delivery (ID INT, Status ENUM('NOT_DISPATCHED','DELIVERED ✅','IN_TRANSIT') COLLATE utf8mb4_general_ci);
Query OK, 0 rows affected, 2 warnings (0.04 sec)
 
INSERT INTO delivery VALUES(567,1);
Query OK, 1 row affected (0.05 sec)
 
INSERT INTO delivery VALUES(472,'DELIVERED ✅');
Query OK, 1 row affected (0.00 sec)
 
INSERT INTO delivery(ID) VALUES(323);
Query OK, 1 row affected (0.06 sec)
 
SELECT * FROM delivery ORDER BY ID ASC;
+------+----------------+
| ID   | Status         |
+------+----------------+
| 472  | DELIVERED ✅   |
| 323  | NULL           |
| 567  | NOT_DISPATCHED |
+------+----------------+

In rowstores, you can additionally use the ALTER TABLE statement to modify the collation, and in turn, modify the character set of existing columns.

CREATE TABLE posts(comments VARCHAR(50) COLLATE utf8_general_ci);
Query OK, 0 rows affected (0.52 sec)
 
ALTER TABLE posts MODIFY comments VARCHAR(50) COLLATE utf8mb4_general_ci;
Query OK, 0 rows affected, 1 warning (0.78 sec)

To view the character sets used by columns in a table, run one of the following commands.

SHOW FULL COLUMNS {FROM | IN} TABLE <table_name>;
SHOW CREATE TABLE <table_name>;

Important

  • In columnstores, the collation of existing columns cannot be modified.

  • In rowstores, when the character set of existing columns is modified from utf8mb4 to utf8, the column entries will be truncated to the first string of valid utf8 Unicode characters.

Expression-Level Override (in JSON columns)

When using expressions with JSON columns, use the :> operator to specify a collation for different expressions in a query.

:> text collate <colation_name>

The following example uses a binary collation (utf8_bin) for the first expression and a case-insensitive collation (utf8_general_ci) for the second expression.

SELECT * FROM sets
WHERE sets.json_field::$x :> text collate utf8_bin = "string1"
AND sets.json_field::$y :> text collate utf8_general_ci = "string2";

Important

If the collation is not specified for expressions, the column character set and collation are used, which may rely on the table settings or the server settings.

Procedural Extension-Level Override

Parameters and variables in Procedural Extensions (stored procedures and user-defined functions) use the server character set and collation by default. The COLLATE clause in the CREATE PROCEDURE and the CREATE FUNCTION statements allows you to override the server collation, and in turn, override the server character set for parameters and variables.

Use the following syntax to override the collation for parameters and variables within stored procedures.

CREATE PROCEDURE procedure_name ( [parameter_name data_type COLLATE collation_name], [...n] ) 
...
CREATE PROCEDURE procedure_name ( [parameter_list] ) 
... 
[DECLARE variable_name type_expression COLLATE collation_name [= value_expression ] ; [... ;] [ ...n ] 
...

Similarly, you can use the following syntax to override the collation for parameters and variables in user-defined functions.

CREATE FUNCTION function_name(id INT, product VARCHAR(20) COLLATE utf8_general_ci);
…
CREATE FUNCTION single_parameter_example(a INT) AS
  DECLARE
    num INT = 0;
    str VARCHAR(255) COLLATE utf8mb4_general_ci;
  ...

Examples

In the following stored procedure, a parameter uses utf8 overriding the server character set of utf8mb4.

set collation_server='utf8mb4_general_ci';
****
Query OK, 0 rows affected (0.00 sec)
 
/* A parameter uses utf8 overriding the server character set */
DELIMITER //
CREATE OR REPLACE PROCEDURE sample(id INT, city VARCHAR(30) COLLATE utf8_general_ci) 
AS
DECLARE
	location TEXT = UCASE(city);
BEGIN
	CREATE TABLE customers(id INT, city VARCHAR (30) COLLATE utf8_general_ci);
	INSERT INTO customers VALUES(id, location);
END //
DELIMITER ;
****
Query OK, 1 row affected (0.02 sec)
 
CALL sample(1, "london 🗺");
****
Query OK, 0 rows affected (0.15 sec)
 
SELECT * FROM customers;
****
+------+--------+
| id   | city   |
+------+--------+
|    1 | LONDON |
+------+--------+
1 row in set (0.02 sec)

In the following user-defined function, a variable uses utf8 overriding the server character set of utf8mb4.

set collation_server='utf8mb4_general_ci';
****
Query OK, 0 rows affected (0.00 sec)

DELIMITER //
 
CREATE OR REPLACE FUNCTION getVersion() RETURNS VARCHAR(255) AS
DECLARE
  var1 VARCHAR(255);
  var2 VARCHAR(255);
  var3 VARCHAR(10) COLLATE utf8_general_ci;
  result VARCHAR(255);
BEGIN
  var1 = @@memsql_version;
  var2 = @@version_compile_os;
  var3 = "👾";
  result = CONCAT("OS: ", var2, "; Version: ", var1, " ", var3);
RETURN result;
END //
 
DELIMITER ;
****
Query OK, 1 row affected (0.01 sec)
 
 
SELECT getVersion() AS Info;
****
+----------------------------+
| Info                       |
+----------------------------+
| OS: Linux; Version: 7.5.1  |
+----------------------------+
1 row in set (0.01 sec)

Important

The server character set and collation can be overridden for tables and columns introduced within procedural extensions. For more information, see Table-Level Override and Column-Level Override.

LOAD DATA

The LOAD DATA statement allows you to specify a character set for the files to be imported. Use the following syntax.

LOAD DATA [LOCAL] INFILE '<file_name>'> INTO TABLE '<table_name>' CHARACTER SET <character_set_name>;