String Types

Fixed-Length String

There are two fixed-size string types in SingleStore. The space consumed is directly proportional to their length definitions. Values of type CHAR require 4 x LENGTH bytes of memory, as SingleStore supports the UTF8 character set including the 4-byte character encoding. Values of type BINARY require LENGTH bytes of memory. char(0) not null and binary(0) not null still allocate 1 byte.

Data Type

Size

Size (Not Null)

CHAR(LENGTH)

4*LENGTH + 1 bytes

4*LENGTH bytes

BINARY(LENGTH)

LENGTH + 1 bytes

LENGTH bytes

A column declared as CHAR() stores values as per the defined length. When CHAR values are stored, they are right-padded with spaces to the specified length but when the CHAR values are retrieved, trailing spaces are removed.

CREATE TABLE t1 (c1 CHAR(3));
INSERT INTO t1 VALUES (' ');
INSERT INTO t1 VALUES ('ab ');
INSERT INTO t1 VALUES ('abcd');
SELECT c1, LENGTH (c1) FROM t1;
+------+-------------+
| c1   | LENGTH (c1) |
+------+-------------+
| abc  |           3 |
| ab   |           2 |
|      |           0 |
+------+-------------+

Variable-Length Strings

There are several variable-length string types. The data for this data type is stored in memory for rowstore tables and the main table structure holds pointers to the memory locations.

The total space consumed by a particular string value is:

  • The byte length of the string.

  • An 8-byte overhead for the pointer.

  • A 4-byte overhead for storing the length. There is no 4-byte overhead if the field is nullable.

Note

One exception is values less than 7 bytes long. Those small values are stored directly in the pointer area.

The three main types are VARCHAR, VARBINARY, and LONGTEXT. While they all operate the same for most English-language alphanumeric strings, there are several important differences in their behavior. The length limit of the VARCHAR is measured in Unicode characters, not in bytes. VARBINARY fields store data verbatim, regardless of whether they are valid Unicode strings. Sorting and comparing a VARBINARY uses the exact binary representation.

To reduce memory fragmentation, memory for variable-sized data is allocated from a fixed list in multiples of eight (16, 32, 40, etc.), but not every multiple of eight. Variable data of other lengths are rounded up to the next nearest multiple of eight in the list. For example, a ten (10) byte string would be allocated sixteen (16) bytes.

Data Type

Max Length

VARCHAR

21,844 characters

VARBINARY

65,533 bytes

LONGTEXT

While the DDL will allow you to specify a length of up to 4GB, there is an internal limit applied when assigning a value to a string or JSON field that caps the maximum size of a single value to max_allowed_packet. This is 100MB by default and can be set to up to 1GB.

Note

There are several variants of the TEXT type, listed below. Aside from their maximum lengths, there is no practical difference between these types in SingleStore, including performance or memory efficiency. Further, TEXT , LONGTEXT, and VARCHAR are identical in functionality other than the maximum lengths. TEXT types are implemented solely for backward compatibility with MySQL syntax.

Name

Max Length

Description

MEDIUMTEXT

16 MB

LONGTEXT

TEXT

65,535 bytes

LONGTEXT

TINYTEXT

255 bytes

LONGTEXT

Comparison operators do not treat trailing spaces differently when used with variable length TEXT type columns such as VARCHAR(), TEXT(), MEDIUMTEXT(), and TINYTEXT(). Comparison operators ignore the trailing spaces at the end of the string being compared, except for the LIKE operator. The LIKE operator does not ignore the trailing spaces and considers them while comparing the strings.

The following example demonstrates how comparison operators treat trailing spaces for a VARCHAR() column. Consider a table Employee as follows:

CREATE TABLE st_Employee (Location varchar(10));
INSERT INTO st_Employee VALUES ('BERLIN '),('BERLIN  '),('BERLIN'),('BERLIN ');

Insert a few values in the Location column, with and without trailing spaces:

SELECT LENGTH(Location),CONCAT('[', Location, ']') FROM st_Employee;
+------------------+----------------------------+
| LENGTH(Location) | CONCAT('[', Location, ']') |
+------------------+----------------------------+
|                6 | [Berlin]                   |
|                8 | [Berlin  ]                 |
|                7 | [Berlin ]                  |
|                7 | [Berlin ]                  |
+------------------+----------------------------+

The following SELECT statements ignore the trailing spaces and return the same output:

SELECT LENGTH(Location),CONCAT('[', Location, ']') FROM st_Employee WHERE Location = 'Berlin ';
SELECT LENGTH(Location),CONCAT('[', Location, ']') FROM st_Employee WHERE Location = 'Berlin';
SELECT LENGTH(Location),CONCAT('[', Location, ']') FROM st_Employee WHERE Location = 'Berlin ';
+------------------+----------------------------+
| LENGTH(Location) | CONCAT('[', Location, ']') |
+------------------+----------------------------+
|                6 | [Berlin]                   |
|                8 | [Berlin  ]                 |
|                7 | [Berlin ]                  |
|                7 | [Berlin ]                  |
+------------------+----------------------------+

An exception is the LIKE comparison operator, where trailing spaces are significant while comparison.

SELECT LENGTH(Location), CONCAT('[', Location, ']') FROM st_Employee WHERE Location LIKE 'Berlin';
+------------------+----------------------------+
| LENGTH(Location) | CONCAT('[', Location, ']') |
+------------------+----------------------------+
|                6 | [Berlin]                   |
+------------------+----------------------------+

Last modified: August 29, 2024

Was this article helpful?