Character Sets Supported
SingleStore supports a variety of character sets in the Unicode standard and their associated collations. To list the supported character sets, run the SHOW CHARACTER SET command.
SHOW CHARACTER SET;
+---------+-----------------------+--------------------+--------+
| Charset | Description | Default collation | Maxlen |
+---------+-----------------------+--------------------+--------+
| utf8mb4 | UTF-8 Unicode | utf8mb4_general_ci | 4 |
| utf8 | UTF-8 Unicode | utf8_general_ci | 3 |
| binary | Binary pseudo charset | binary | 1 |
+---------+-----------------------+--------------------+--------+
Alternatively, you can retrieve the supported character sets from the INFORMATION_SCHEMA.CHARACTER_SETS view by using a SELECT statement with optional LIKE and WHERE clauses.
SELECT * FROM INFORMATION_SCHEMA.CHARACTER_SETS WHERE CHARACTER_SET_NAME = 'utf8mb4';
+--------------------+----------------------+---------------+--------+
| CHARACTER_SET_NAME | DEFAULT_COLLATE_NAME | DESCRIPTION | MAXLEN |
+--------------------+----------------------+---------------+--------+
| utf8mb4 | utf8mb4_general_ci | UTF-8 Unicode | 4 |
+--------------------+----------------------+---------------+--------+
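The LIKE form mentioned above can be sketched as follows. This is a minimal illustration that uses only the view and column names shown in the previous query; the ORDER BY clause is an addition for readability, not something the page prescribes.

```sql
-- List every character set whose name starts with "utf8".
-- Column names are taken from the INFORMATION_SCHEMA.CHARACTER_SETS output above.
SELECT CHARACTER_SET_NAME, DEFAULT_COLLATE_NAME, MAXLEN
FROM INFORMATION_SCHEMA.CHARACTER_SETS
WHERE CHARACTER_SET_NAME LIKE 'utf8%'
ORDER BY MAXLEN DESC;
```

Given the SHOW CHARACTER SET output listed earlier, this query would be expected to return the utf8mb4 and utf8 rows and omit binary.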
Character Sets Supported by SingleStore Features
binary
A character set used for encoding binary strings. It uses binary as the default collation.
Important
The binary character set is a universal feature that is supported across most applicable database schema objects and commands.
utf8
An alias for utf8mb3, which is a Unicode character set that encodes each character using 1 to 3 bytes. utf8_general_ci is the default collation assigned to this character set.
Important
The utf8 character set is a universal feature that is supported across most applicable database schema objects and commands.
utf8mb4
A Unicode character set that encodes each character using 1 to 4 bytes. utf8mb4_general_ci is the default collation assigned to this character set.
SingleStore uses the utf8mb4 character set by default.
utf8mb4 is supported for specific database schema objects and commands that are discussed in the following sections.
Data Types
The following data types allow you to store utf8mb4 Unicode characters.
- JSON
- CHAR
- VARCHAR
- LONGTEXT, MEDIUMTEXT, TEXT, TINYTEXT
- ENUM
- SET
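As a rough sketch of how some of the data types listed above might be declared with a utf8mb4 collation, consider the table below. The table name, column names, and sample row are hypothetical; the COLLATE utf8mb4_general_ci clause mirrors the one this page uses for a JSON column, and applying it to VARCHAR and TEXT columns in the same way is an assumption to verify against your SingleStore version.

```sql
-- Hypothetical table: names and columns are illustrative only.
CREATE TABLE feedback (
  id INT,
  author VARCHAR(40) COLLATE utf8mb4_general_ci,
  body TEXT COLLATE utf8mb4_general_ci
);

-- The emoji are 4-byte characters, so they need utf8mb4 rather than utf8.
INSERT INTO feedback VALUES (1, 'Zoë', 'Great event 🎉');

-- LENGTH reports bytes, so multi-byte characters count more than once.
SELECT id, author, LENGTH(author) AS author_bytes, body FROM feedback;
```

Because utf8mb4 is the default character set in SingleStore, the explicit COLLATE clauses may be redundant; they are spelled out here only to make the intent visible.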
String Functions
String functions can be used with strings in the utf8mb4 character set. The following example applies the LENGTH function to a string in the utf8mb4 character set.
SELECT LENGTH('Hello world!🙂');
+----------------------------+
| LENGTH('Hello world!🙂') |
+----------------------------+
| 16 |
+----------------------------+
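The result of 16 reflects a byte count: the string has 13 characters, of which the 12 ASCII characters take 1 byte each and the emoji takes 4 bytes. A minimal sketch of the byte versus character distinction follows, assuming the CHARACTER_LENGTH function is available in your deployment and that the client connection uses utf8mb4.

```sql
-- LENGTH counts bytes; CHARACTER_LENGTH counts characters.
-- The column aliases are illustrative.
SELECT
  LENGTH('Hello world!🙂') AS byte_count,
  CHARACTER_LENGTH('Hello world!🙂') AS character_count;
```

When sizing CHAR or VARCHAR columns, the character count is usually the relevant figure, whereas storage and network estimates depend on the byte count.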
JSON Functions
JSON functions can be used with JSON columns and string arguments in the utf8mb4 character set. The following example uses a JSON column that supports the utf8mb4 character set.
CREATE TABLE events (name VARCHAR(20), registrations INT, comments JSON COLLATE utf8mb4_general_ci);
INSERT INTO events VALUES ("Swimming",50,'{"Registration closed":"✅"}'), ("Biking",28,'{"Registration is open":"⏸"}'), ("Powerlifting",22,'{"Registration is open":"⏸"}');
SELECT JSON_AGG(comments) FROM events;
+-----------------------------------------------------------------------------------------------+
| JSON_AGG(comments) |
+-----------------------------------------------------------------------------------------------+
| [{"Registration is open":"⏸"},{"Registration closed":"✅"},{"Registration is open":"⏸"}] |
+-----------------------------------------------------------------------------------------------+
1 row in set (0.05 sec)
Procedural Extensions
In procedural extensions such as stored procedures and user-defined functions, you can use parameters and variables with utf8mb4 Unicode characters.
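A minimal sketch of a user-defined function that accepts and returns utf8mb4 text is shown below. The function name, parameter, and sizes are hypothetical, and the example assumes the VARCHAR parameter inherits the default utf8mb4 character set; DELIMITER is a client-side command and may not be needed in every SQL client.

```sql
DELIMITER //
-- Hypothetical function: appends an emoji to the supplied name.
CREATE OR REPLACE FUNCTION greet(name VARCHAR(50)) RETURNS VARCHAR(70) AS
BEGIN
  RETURN CONCAT('Hello ', name, ' 🙂');
END //
DELIMITER ;

-- Both the argument and the return value contain non-ASCII characters.
SELECT greet('Zoë');
```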
SingleStore Pipelines
SingleStore Pipelines can ingest and process data with the utf8mb4 character set from the supported data sources.
LOAD DATA
The LOAD DATA statement allows you to import files with any supported character set, including utf8mb4, into SingleStore.
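The statement below sketches what such an import might look like. The file path, table name, and CSV layout are hypothetical, the target table is assumed to exist already with matching columns, and the placement of the CHARACTER SET clause is an assumption to check against the LOAD DATA reference for your version.

```sql
-- Hypothetical path and table: adjust both for your environment.
LOAD DATA INFILE '/tmp/event_comments.csv'
INTO TABLE event_comments
CHARACTER SET utf8mb4
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES;  -- skip the header row
```

Naming the character set explicitly documents the encoding the file is expected to use, which helps when files arrive from systems with different defaults.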
Last modified: July 8, 2024