Vector Type
On this page
An instance of the vector data type is an ordered collection of numeric values with a fixed number of dimensions.
VECTOR(<N> [ , <elementType> ])
Where:
-
<N>
is the number of elements (dimensions) -
<elementType>
(optional) is the vector element type.Element types supported are 8-, 16-, 32-, and 64-bit integers and 32- and 64-bit floating point numbers. Options for elementType
areI8
,I16
,I32
,I64
,F32
, andF64
.The default for <elementType>
is 32-bit floating-point number (F32
).
The input data for a VECTOR (<N>, .
must have exactly <N>
elements.
Refer to Working with Vector Data and How to Bulk Load Vectors for more information on using the VECTOR
type.
Important
VECTOR_
and VECTOR_
are currently not supported with the VECTOR data type.
Warning
Deprecation Notice
String functions, such as CONCAT
, LENGTH
, SUBSTR
, and HEX
, previously operated on the JSON representation of a vector, interpreted as a string.
If you truly intend to use a string function on the JSON string representation of a vector, you can write code in the following way so that it will run the same way before and after this behavior change.
Suppose that vec
is a vector-type value and stringfunc
is any string function and you have expression:
stringfunc(vec)
you can change it to:
stringfunc(vec :> json)
Output Format for Examples
Vectors may be output in JSON or binary format.
To get JSON output which will match the examples, use the following command to output vectors in JSON.
SET vector_type_project_format = JSON;
Use the following command to set the output format back to binary.
SET vector_type_project_format = BINARY;
Examples using Vector Data Type
Using Vector Data Type in Tables
Create a table and insert values using the VECTOR
data type.elementType
will default to F32
.
CREATE TABLE exp_data(col_a VECTOR(3));INSERT INTO exp_data VALUES('[0.267261237,0.534522474,0.801783681]');INSERT INTO exp_data VALUES('[0.371390671,0.557085991,0.742781341]');INSERT INTO exp_data VALUES('[0.424264073,0.565685451,0.707106829]');
Confirm that the vectors have been loaded:
SET vector_type_project_format = JSON; /* to make vector output readable */SELECT col_a FROM exp_data;
+---------------------------------------+
| col_a |
+---------------------------------------+
| [0.267261237,0.534522474,0.801783681] |
| [0.424264073,0.565685451,0.707106829] |
| [0.371390671,0.557085991,0.742781341] |
+---------------------------------------+
Using Vector Data Type with Integer Element Type
This example shows how to create a VECTOR
with an elementType
of 16-bit integer (I16
) to demonstrate how to use an elementType
other than the default.
CREATE TABLE exp_data_i16(col_a VECTOR(3, I16));INSERT INTO exp_data_i16 VALUES('[1, 2, 3]');INSERT INTO exp_data_i16 VALUES('[4, 5, 6]');INSERT INTO exp_data_i16 VALUES('[1, 4, 8]');
Confirm that the vectors were loaded.
SET vector_type_project_format = JSON; /* to make vector output readable */SELECT col_a FROM exp_data_i16;
+---------------------------------------+
| col_a |
+---------------------------------------+
| [1, 2, 3] |
| [4, 5, 6] |
| [1, 4, 8] |
+---------------------------------------+
A Note on Dot Product and Euclidean Distance
Two common operations on vectors are dot product and Euclidean distance.
-
<*>
forDOT_
PRODUCT -
<->
forEUCLIDEAN_
DISTANCE
The DOT_
(<*>
) and EUCLIDEAN_
(<->
) operators take as input two vector expressions and require that both operands have the same number of elements and are the same element type.BLOB
s and VECTOR
s.
A Note on Infix Operators
Two common operations on vectors are DOT_
and EUCLIDEAN_
.
-
<*>
forDOT_
PRODUCT -
<->
forEUCLIDEAN_
DISTANCE
The infix operators both operate on two vectors and require that both of those vector operands have the same number of elements and have the same element type.
Transition from Blob to Vector Data Type
Transition from Blob to Vector Data Type in a Rowstore Table
Prior to the addition of the VECTOR data type, SingleStore supported storing vectors as BLOB values.
The column type in a rowstore table can be easily modified from blob to vector using the ALTER TABLE MODIFY
statement.
To simulate an existing application that stores vectors as blobs, create a table using the BLOB data type and enter vectors using JSON_
.
CREATE ROWSTORE TABLE exp_data2(col_b BLOB);INSERT INTO exp_data2 VALUES(JSON_ARRAY_PACK('[1,2,3]'));INSERT INTO exp_data2 VALUES(JSON_ARRAY_PACK('[3,5,9]'));
Confirm that the vectors have been inserted into the table.
SELECT JSON_ARRAY_UNPACK(col_b) FROM exp_data2;
+--------------------------+
| JSON_ARRAY_UNPACK(col_b) |
+--------------------------+
| [1,2,3] |
| [3,5,9] |
+--------------------------+
Use ALTER TABLE
and MODIFY
to convert the blob data type to a vector data type.
ALTER TABLE exp_data2 MODIFY col_b VECTOR(3);
Confirm that the column has been modified to a vector data type.F32
, the default.
SHOW COLUMNS FROM exp_data2;
+---------+----------------+------+------+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------+--------------+------+------+---------+-------+
| col_b | vector(3, F32) | YES | | NULL | |
+---------+----------------+------+------+---------+-------+
Confirm that the table reflects the vector data type.
SET vector_type_project_format = JSON; /* to make vector output readable */SELECT col_b FROM exp_data2;
+--------+
|col_b |
+--------+
|[3,5,9] |
|[1,2,3] |
+--------+
Transition from Blob to Vector Data Type in a Columnstore Table
The following steps outline how to change a columnstore blob column to vector.
-
Add a new vector column to the right of the blob column.
-
Update the vector column with the data from the blob column.
-
Drop the blob column.
-
Rename the new vector column to the old blob column name.
This will ensure any previous queries will still work, or at least require fewer changes.
These steps are demonstrated in the following example.
Columnstore Example
Create a table and review the table details using the DESC
statement.
CREATE TABLE test(col_1 int, col_2 blob, col_3 int);DESC test;
+---------+---------+------+------+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------+---------+------+------+---------+-------+
| col_1 | int(11) | YES | | NULL | |
| col_2 | blob | YES | | NULL | |
| col_3 | int(11) | YES | | NULL | |
+---------+---------+------+------+---------+-------+
Insert data into the table, add a new column to the table (col2
), and update the new column (col2
) with the data from the old column (col_
).
INSERT test VALUES(1,JSON_ARRAY_PACK("[1,2,3]"), 100);INSERT test VALUES(2,JSON_ARRAY_PACK("[2,3,4]"), 200);ALTER TABLE test ADD COLUMN col2 vector(3) AFTER col_2;UPDATE test SET col2 = col_2;
Check the table to confirm that the data has been moved to the new column (col2
).
SET vector_type_project_format = JSON; /* to make vector output readable */SELECT col_1, JSON_ARRAY_UNPACK(col_2), col2, col_3 FROM test;
+--------+---------------------------+---------+-------+
| col_1 | JSON_ARRAY_UNPACK(col_2) | col2 | col_3 |
+--------+---------------------------+---------+-------+
| 1 | [1,2,3] | [1,2,3] | 100 |
| 2 | [2,3,4] | [2,3,4] | 200 |
+------+-----------------------------+---------+-------+
Drop the old column (col_
) and rename the new column (col2
) using the old column name (col_
).
ALTER TABLE test DROP COLUMN col_2;ALTER TABLE test CHANGE col2 col_2;DESC test;
+---------+----------------+------+------+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------+----------------+------+------+---------+-------+
| col_1 | int(11) | YES | | NULL | |
| col_2 | vector(3, F32) | YES | | NULL | |
| col_3 | int(11) | YES | | NULL | |
+---------+----------------+------+------+---------+-------+
Casting between Blob and Vector
Casting is a type of conversion that allows a value to be converted from one data type to another.
Continuing with the table and values from the previous example, applying the following SELECT
statement will return a vector type value in the left column (col_
).
The blob produced by the UNHEX()
function will be automatically cast as a vector of the correct length, which will match the type of the <*>
operator’s left argument.
SET vector_type_project_format = JSON; /* to make vector output readable */SELECT col_2, col_2 <*> UNHEX("0000803F0000803F0000803F")FROM test;
+-----------+---------------------------------------------+
| col_2 | col_2 <*> UNHEX("0000803F0000803F0000803F") |
+-----------+---------------------------------------------+
| [1,2,3] | 6 |
| [2,3,4] | 9 |
+-----------+---------------------------------------------+
Casting can be done in either direction using the :>
operator.
SET vector_type_project_format = JSON; /* to make vector output readable */SELECT "[1,2,3]" :> VECTOR(3) :> BLOB :> VECTOR(3);
+---------------------------------------------+
| "[1,2,3]" :> VECTOR(3) :> BLOB :> VECTOR(3) |
+---------------------------------------------+
| [1,2,3] |
+---------------------------------------------+
When writing UDFs that operate on vectors of different lengths, casting can be used to allow a function to accept a blob argument instead of a fixed-length vector.normalize
function, which produces the results in the following example.
SET vector_type_project_format = JSON; /* to make vector output readable */SET @v = "[1,2,3,4]" :> VECTOR(4);SELECT normalize(@v) :> VECTOR(4);
+---------------------------------------------------+
| normalize(@v) :> VECTOR(4) |
+---------------------------------------------------+
| [0.182574183,0.365148365,0.547722578,0.730296731] |
+---------------------------------------------------+
Casting may be used to write helper functions, which are functions that call another function.
To illustrate, normalize
treats a vector of any length as a blob due to the implicit casting of its argument from vector to blob.norm1536
, is a thin wrapper around the normalize
function.
DELIMITER //CREATE OR REPLACE FUNCTION norm1536(v VECTOR(1536)) RETURNS VECTOR (1536) ASBEGINRETURN normalize(v);END //DELIMITER ;
The norm1536
function can be used to write queries with no explicit casting required.VECTOR(1536)
and the desired return value is explicitly typed as VECTOR(1536)
.
Troubleshooting
Trying to insert vectors with incorrect element types or lengths will cause an error.
To demonstrate these errors, create a table t_
with a VECTOR
attribute v
of length 3.v
will be 32-bit floating point number (F32
) as F32
is the default element type for VECTOR
s.
CREATE TABLE t_invalid(v VECTOR(3));
The following error occurs when attempting to insert a vector '[8,2,4a]'
, with an element that does not match the element type of the vector attribute, v
.'4a'
cannot be converted into a 32-bit floating point number.
INSERT INTO t_invalid VALUES ('[8,2,4a]');
ERROR 2856 (HY000): Invalid VECTOR value for column 'v',
reason: 'Value must be a JSON array of numbers.
The following error occurs when attempting to insert a a vector '[1,6,7,9]'
with a number of elements that is different than the specified length for the vector column, v
.
INSERT INTO t_invalid VALUES ('[1,6,7,9]');
ERROR 2856 (HY000): Invalid VECTOR value for column 'v',
reason: 'Cannot cast vector (4, F32) to a vector (3, F32).
Ensure the lengths are equal and the element types are the same.'
Related Topics
Last modified: August 23, 2024