Vector Type

An instance of the vector data type is an ordered collection of numeric values with a fixed number of dimensions. Vectors can represent various kinds of data, such as vector embeddings obtained from large language models (LLMs), vector embeddings representing images or faces, financial time series, spatial coordinates, velocities, colors, and other related information. Using the vector data type provides an easier way to insert, load, and query vectors.

VECTOR(<N> [ , <elementType> ])

Where:

  • <N> is the number of elements (dimensions)

  • <elementType> (optional) is the vector element type. Element types supported are 8-, 16-, 32-, and 64-bit integers and 32- and 64-bit floating point numbers. Options for elementType are I8, I16, I32, I64, F32, and F64. The default for <elementType> is 32-bit floating-point number (F32).

The input data for a VECTOR (<N>, ...) must have exactly <N> elements.

Refer to Working with Vector Data and How to Bulk Load Vectors for more information on using the VECTOR type.

Important

VECTOR_SUM and VECTOR_SUBVECTOR are currently not supported with the VECTOR data type.

Warning

Deprecation Notice

String functions, such as CONCAT, LENGTH, SUBSTR, and HEX, previously operated on the JSON representation of a vector, interpreted as a string. This behavior is deprecated. In the 8.7 release, string functions operate on the binary representation of a vector.

If you truly intend to use a string function on the JSON string representation of a vector, you can write code in the following way so that it will run the same way before and after this behavior change.

Suppose that vec is a vector-type value and stringfunc is any string function and you have expression:

stringfunc(vec)

you can change it to:

stringfunc(vec :> json)

Output Format for Examples

Vectors may be output in JSON or binary format. Use JSON format for examples and for output readability. For production, use the default binary for efficiency.

To get JSON output which will match the examples, use the following command to output vectors in JSON.

SET vector_type_project_format = JSON;

Use the following command to set the output format back to binary.

SET vector_type_project_format = BINARY;

Examples using Vector Data Type

Using Vector Data Type in Tables

Create a table and insert values using the VECTOR data type. The elementType will default to F32.

CREATE TABLE exp_data(col_a VECTOR(3));
INSERT INTO exp_data VALUES('[0.267261237,0.534522474,0.801783681]');
INSERT INTO exp_data VALUES('[0.371390671,0.557085991,0.742781341]');
INSERT INTO exp_data VALUES('[0.424264073,0.565685451,0.707106829]');

Confirm that the vectors have been loaded:

SET vector_type_project_format = JSON; /* to make vector output readable */
SELECT col_a FROM exp_data;
+---------------------------------------+
| col_a                                 |
+---------------------------------------+
| [0.267261237,0.534522474,0.801783681] |
| [0.424264073,0.565685451,0.707106829] |
| [0.371390671,0.557085991,0.742781341] |
+---------------------------------------+ 

Using Vector Data Type with Integer Element Type

This example shows how to create a VECTOR with an elementType of 16-bit integer (I16) to demonstrate how to use an elementType other than the default.

CREATE TABLE exp_data_i16(col_a VECTOR(3, I16));
INSERT INTO exp_data_i16 VALUES('[1, 2, 3]');
INSERT INTO exp_data_i16 VALUES('[4, 5, 6]');
INSERT INTO exp_data_i16 VALUES('[1, 4, 8]');

Confirm that the vectors were loaded.

SET vector_type_project_format = JSON; /* to make vector output readable */
SELECT col_a FROM exp_data_i16;
+---------------------------------------+
| col_a                                 |
+---------------------------------------+
| [1, 2, 3]                             |
| [4, 5, 6]                             |
| [1, 4, 8]                             |
+---------------------------------------+ 

A Note on Dot Product and Euclidean Distance

Two common operations on vectors are dot product and Euclidean distance. SingleStore supports DOT_PRODUCT and EUCLIDEAN_DISTANCE functions and infix operators for those functions:

  • <*> for DOT_PRODUCT

  • <-> for EUCLIDEAN_DISTANCE

The DOT_PRODUCT (<*>) and EUCLIDEAN_DISTANCE (<->) operators take as input two vector expressions and require that both operands have the same number of elements and are the same element type. In some cases, the system will do automatic type conversions between JSON strings, BLOBs and VECTORs. See DOT_PRODUCT and EUCLIDEAN_DISTANCE for more information.

A Note on Infix Operators

Two common operations on vectors are DOT_PRODUCT and EUCLIDEAN_DISTANCE. Infix operators for those functions are supported:

  • <*> for DOT_PRODUCT

  • <-> for EUCLIDEAN_DISTANCE

The infix operators both operate on two vectors and require that both of those vector operands have the same number of elements and have the same element type. See DOT_PRODUCT and EUCLIDEAN_DISTANCE for more information.

Transition from Blob to Vector Data Type

Transition from Blob to Vector Data Type in a Rowstore Table

Prior to the addition of the VECTOR data type, SingleStore supported storing vectors as BLOB values. With the introduction of this data type, you may wish to update your applications to use VECTORs instead of BLOBs.

The column type in a rowstore table can be easily modified from blob to vector using the ALTER TABLE MODIFY statement.

To simulate an existing application that stores vectors as blobs, create a table using the BLOB data type and enter vectors using JSON_ARRAY_PACK.

CREATE ROWSTORE TABLE exp_data2(col_b BLOB);
INSERT INTO exp_data2 VALUES(JSON_ARRAY_PACK('[1,2,3]'));
INSERT INTO exp_data2 VALUES(JSON_ARRAY_PACK('[3,5,9]'));

Confirm that the vectors have been inserted into the table.

SELECT JSON_ARRAY_UNPACK(col_b) FROM exp_data2;
+--------------------------+
| JSON_ARRAY_UNPACK(col_b) |
+--------------------------+
| [1,2,3]                  |
| [3,5,9]                  |
+--------------------------+

Use ALTER TABLE and MODIFY to convert the blob data type to a vector data type.

ALTER TABLE exp_data2 MODIFY col_b VECTOR(3);

Confirm that the column has been modified to a vector data type. Note that the element type of the vector is F32, the default.

SHOW COLUMNS FROM exp_data2;
+---------+----------------+------+------+---------+-------+
| Field   | Type           | Null | Key  | Default | Extra |
+-----------+--------------+------+------+---------+-------+
| col_b   | vector(3, F32) | YES  |      | NULL    |       |
+---------+----------------+------+------+---------+-------+

Confirm that the table reflects the vector data type.

SET vector_type_project_format = JSON; /* to make vector output readable */
SELECT col_b FROM exp_data2;
+--------+
|col_b   |
+--------+
|[3,5,9] |
|[1,2,3] |
+--------+

Transition from Blob to Vector Data Type in a Columnstore Table

The following steps outline how to change a columnstore blob column to vector.

  1. Add a new vector column to the right of the blob column.

  2. Update the vector column with the data from the blob column.

  3. Drop the blob column.

  4. Rename the new vector column to the old blob column name. This will ensure any previous queries will still work, or at least require fewer changes.

These steps are demonstrated in the following example.

Columnstore Example

Create a table and review the table details using the DESC statement.

CREATE TABLE test(col_1 int, col_2 blob, col_3 int);
DESC test;
+---------+---------+------+------+---------+-------+
| Field   | Type    | Null | Key  | Default | Extra |
+---------+---------+------+------+---------+-------+
| col_1   | int(11) | YES  |      | NULL    |       |
| col_2   | blob    | YES  |      | NULL    |       |
| col_3   | int(11) | YES  |      | NULL    |       |
+---------+---------+------+------+---------+-------+

Insert data into the table, add a new column to the table (col2), and update the new column (col2) with the data from the old column (col_2).

INSERT test VALUES(1,JSON_ARRAY_PACK("[1,2,3]"), 100);
INSERT test VALUES(2,JSON_ARRAY_PACK("[2,3,4]"), 200);
ALTER TABLE test ADD COLUMN col2 vector(3) AFTER col_2;
UPDATE test SET col2 = col_2;

Check the table to confirm that the data has been moved to the new column (col2).

SET vector_type_project_format = JSON; /* to make vector output readable */
SELECT col_1, JSON_ARRAY_UNPACK(col_2), col2, col_3 FROM test;
+--------+---------------------------+---------+-------+
| col_1  | JSON_ARRAY_UNPACK(col_2)  | col2    | col_3 |
+--------+---------------------------+---------+-------+
|      1 | [1,2,3]                   | [1,2,3] |  100  |
|      2 | [2,3,4]                   | [2,3,4] |  200  |
+------+-----------------------------+---------+-------+ 

Drop the old column (col_2) and rename the new column (col2) using the old column name (col_2).

ALTER TABLE test DROP COLUMN col_2;
ALTER TABLE test CHANGE col2 col_2;
DESC test;
+---------+----------------+------+------+---------+-------+
| Field   | Type           | Null | Key  | Default | Extra |
+---------+----------------+------+------+---------+-------+
| col_1   | int(11)        | YES  |      | NULL    |       |
| col_2   | vector(3, F32) | YES  |      | NULL    |       |
| col_3   | int(11)        | YES  |      | NULL    |       |
+---------+----------------+------+------+---------+-------+

Casting between Blob and Vector

Casting is a type of conversion that allows a value to be converted from one data type to another. The following examples show how to cast a blob data type to a vector data type and vice versa.

Continuing with the table and values from the previous example, applying the following SELECT statement will return a vector type value in the left column (col_2).

The blob produced by the UNHEX() function will be automatically cast as a vector of the correct length, which will match the type of the <*> operator’s left argument. If the length is not correct, an error will  occur.

SET vector_type_project_format = JSON; /* to make vector output readable */
SELECT col_2, col_2 <*> UNHEX("0000803F0000803F0000803F")
FROM test;
+-----------+---------------------------------------------+
| col_2     | col_2 <*> UNHEX("0000803F0000803F0000803F") |
+-----------+---------------------------------------------+
| [1,2,3]   |                                           6 |
| [2,3,4]   |                                           9 |
+-----------+---------------------------------------------+

Casting can be done in either direction using the :> operator. For example:

SET vector_type_project_format = JSON; /* to make vector output readable */
SELECT "[1,2,3]" :> VECTOR(3) :> BLOB :> VECTOR(3);
+---------------------------------------------+
| "[1,2,3]" :> VECTOR(3) :> BLOB :> VECTOR(3) |
+---------------------------------------------+
| [1,2,3]                                     |
+---------------------------------------------+

When writing UDFs that operate on vectors of different lengths, casting can be used to allow a function to accept a blob argument instead of a fixed-length vector. Refer to the vector normalization topic for an example of the normalize function, which produces the results in the following example. You may also use this function on vector-type values of a fixed length, where an automatic casting from vector to blob is performed. For example:

SET vector_type_project_format = JSON; /* to make vector output readable */
SET @v = "[1,2,3,4]" :> VECTOR(4);
SELECT normalize(@v) :> VECTOR(4);
+---------------------------------------------------+
| normalize(@v) :> VECTOR(4)                        |
+---------------------------------------------------+
| [0.182574183,0.365148365,0.547722578,0.730296731] |
+---------------------------------------------------+

Casting may be used to write helper functions, which are functions that call another function. For example, a helper function can be employed to operate on a fixed-length vector. Helper functions are useful when rewriting the logic of the calling function is impossible or impractical.

To illustrate, normalize treats a vector of any length as a blob due to the implicit casting of its argument from vector to blob. The following function, norm1536, is a thin wrapper around the normalize function. It normalizes a typed vector argument of length 1536 and returns a typed vector argument of length 1536.

DELIMITER //
CREATE OR REPLACE FUNCTION norm1536(v VECTOR(1536)) RETURNS VECTOR (1536) AS
BEGIN
RETURN normalize(v);
END //
DELIMITER ;

The norm1536 function can be used to write queries with no explicit casting required. This particular function is useful when working with vectors that are always of type VECTOR(1536) and the desired return value is explicitly typed as VECTOR(1536).

Troubleshooting

Trying to insert vectors with incorrect element types or lengths will cause an error. Below are two common errors and their causes.

To demonstrate these errors, create a table t_invalid with a VECTOR attribute v of length 3. The element type of v will be 32-bit floating point number (F32) as F32 is the default element type for VECTORs.

CREATE TABLE t_invalid(v VECTOR(3));

The following error occurs when attempting to insert a vector '[8,2,4a]', with an element that does not match the element type of the vector attribute, v. Specifically, '4a' cannot be converted into a 32-bit floating point number.

INSERT INTO t_invalid VALUES ('[8,2,4a]');
ERROR 2856 (HY000): Invalid VECTOR value for column 'v',
  reason: 'Value must be a JSON array of numbers.

The following error occurs when attempting to insert a a vector '[1,6,7,9]' with a number of elements that is different than the specified length for the vector column, v.

INSERT INTO t_invalid VALUES ('[1,6,7,9]');
ERROR 2856 (HY000): Invalid VECTOR value for column 'v',
  reason: 'Cannot cast vector (4, F32) to a vector (3, F32). 
  Ensure the lengths are equal and the element types are the same.'

Last modified: August 23, 2024

Was this article helpful?