Character Sets Supported

SingleStore supports a variety of character sets in the Unicode standard and their associated collations. To view supported character sets, run the SHOW CHARACTER SET command. This displays the character sets along with their default collation and the maximum byte length of the characters within each character set.

SHOW CHARACTER SET;
+---------+-----------------------+--------------------+--------+
| Charset | Description           | Default collation  | Maxlen |
+---------+-----------------------+--------------------+--------+
| utf8mb4 | UTF-8 Unicode         | utf8mb4_general_ci |      4 |
| utf8    | UTF-8 Unicode         | utf8_general_ci    |      3 |
| binary  | Binary pseudo charset | binary             |      1 |
+---------+-----------------------+--------------------+--------+

Alternatively, retrieve the supported character sets from the CHARACTER_SETS information schema view by using a SELECT statement with optional LIKE and WHERE clauses.

SELECT * FROM INFORMATION_SCHEMA.CHARACTER_SETS WHERE CHARACTER_SET_NAME = 'utf8mb4';
+--------------------+----------------------+---------------+--------+
| CHARACTER_SET_NAME | DEFAULT_COLLATE_NAME | DESCRIPTION   | MAXLEN |
+--------------------+----------------------+---------------+--------+
| utf8mb4            | utf8mb4_general_ci   | UTF-8 Unicode |      4 |
+--------------------+----------------------+---------------+--------+
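
The LIKE form of the same lookup can be sketched as follows. This is a minimal example, assuming the CHARACTER_SETS view and the column names shown in the output above; the pattern 'utf8%' is chosen only for illustration.

```sql
-- List every character set whose name starts with "utf8".
-- Column names are taken from the query output shown above.
SELECT CHARACTER_SET_NAME, DEFAULT_COLLATE_NAME, MAXLEN
FROM INFORMATION_SCHEMA.CHARACTER_SETS
WHERE CHARACTER_SET_NAME LIKE 'utf8%'
ORDER BY MAXLEN DESC;
```

Given the SHOW CHARACTER SET output listed earlier, this query would be expected to return the utf8mb4 and utf8 rows and to omit binary.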

Character Sets Supported by Features

binary

A character set used for encoding binary strings. This character set has binary as the default collation.

Important

The binary character set is a universal feature that is supported across most applicable database schema objects and commands.

utf8

An alias for utf8mb3, a Unicode character set that encodes each character using 1 to 3 bytes. This character set covers the characters in the Basic Multilingual Plane (BMP). utf8_general_ci is the default collation assigned to this character set.

Important

The utf8 character set is a universal feature that is supported across most applicable database schema objects and commands.

utf8mb4

A Unicode character set that encodes each character using 1 to 4 bytes. This character set covers all the characters in the Basic Multilingual Plane (BMP) as well as the supplementary characters that lie outside the BMP, such as pictographic symbols (emojis), ancient scripts such as Egyptian hieroglyphs, and the supplementary private use areas (PUA). utf8mb4_general_ci is the default collation assigned to this character set.

SingleStore uses the utf8mb4 character set by default.

utf8mb4 is supported for specific database schema objects and commands that are discussed in the following sections.

Data Types

The following data types can store utf8mb4 Unicode characters.

  • JSON

  • CHAR

  • VARCHAR

  • LONGTEXT, MEDIUMTEXT, TEXT, TINYTEXT

  • ENUM

  • SET
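
As a rough illustration of storing utf8mb4 data in one of these types, the sketch below declares a VARCHAR column with an explicit character set and collation. The table name messages and the column names id and body are hypothetical, and the column-level CHARACTER SET and COLLATE clauses are assumed to follow the MySQL-compatible syntax; check the CREATE TABLE reference for your version before relying on it.

```sql
-- Hypothetical table; "messages", "id", and "body" are illustrative names.
-- Assumes column-level CHARACTER SET / COLLATE clauses are accepted.
CREATE TABLE messages (
  id INT,
  body VARCHAR(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci
);

-- The emoji lies outside the BMP, so it needs a utf8mb4 column.
INSERT INTO messages VALUES (1, 'Hello world!🙂');

SELECT id, body FROM messages;
```

Because SingleStore uses utf8mb4 by default, the explicit clauses may be redundant on a default installation; they are spelled out here only to make the configuration visible.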

String Functions

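String functions can be used with strings that use the utf8mb4 character set. For example, the LENGTH string function returns the number of bytes in a string that uses the utf8mb4 character set. In the following query, the 12 ASCII characters occupy 1 byte each and the emoji occupies 4 bytes, for a total of 16.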

SELECT LENGTH('Hello world!🙂');
+----------------------------+
| LENGTH('Hello world!🙂')   |
+----------------------------+
|                         16 |
+----------------------------+
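
To contrast the byte count with the character count, the following sketch runs LENGTH next to CHARACTER_LENGTH on the same literal. It assumes the session character set is utf8mb4, so that the emoji is handled as a single 4-byte character; the column aliases are illustrative.

```sql
-- LENGTH counts bytes; CHARACTER_LENGTH counts characters.
-- Assumes the connection uses utf8mb4.
SELECT
  LENGTH('Hello world!🙂') AS byte_length,
  CHARACTER_LENGTH('Hello world!🙂') AS char_length;
```

Under that assumption, byte_length matches the 16 shown above, while char_length would be expected to be 13, because the emoji counts as one character rather than four bytes.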

JSON Functions

JSON functions can be used with JSON columns and string arguments that use the utf8 or utf8mb4 character set with the utf8_bin or utf8mb4_bin collation. For example, the JSON_AGG function aggregates a JSON column that supports the utf8mb4 character set.

Procedural Extensions

In procedural extensions such as stored procedures and user-defined functions, parameters and variables with utf8mb4 Unicode characters can be used. In addition, the tables and columns introduced in procedural extensions can store utf8mb4 Unicode characters.

Pipelines

Pipelines can ingest and process data with the utf8mb4 character set from the supported data sources. The columns that store the ingested data must be configured to support the utf8mb4 character set.

LOAD DATA

The LOAD DATA statement imports files in any supported character set, including utf8mb4, into SingleStore. To store utf8mb4 data, the columns that receive the imported data must be configured to support the utf8mb4 character set.
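
A minimal sketch of such an import is shown below. The file path /tmp/messages.csv and the table name messages are hypothetical, the target table is assumed to exist already with utf8mb4 columns, and the placement of the CHARACTER SET clause is assumed to follow the MySQL-compatible LOAD DATA syntax; confirm it against the LOAD DATA reference for your version.

```sql
-- Hypothetical file path and table name; adjust both for your environment.
-- Assumes "messages" already exists and its columns support utf8mb4.
LOAD DATA INFILE '/tmp/messages.csv'
INTO TABLE messages
CHARACTER SET utf8mb4
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n';
```

Stating the character set explicitly documents how the file is encoded, rather than leaving the interpretation to a server or session default.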
