Using Vector Functions
On this page
This topic gives information about working with vectors represented as blobs.
Data type suffixes
The vector functions present a 32byte floatingpoint integer data type by default.

_
I8  8byte signed integer 
_
I16  16byte signed integer 
_
I32  32byte signed integer 
_
I64  64byte signed integer 
_
F32  32byte floatingpoint number (this is the default) 
_
F64  64byte floatingpoint number
Inserting Vectors as Blobs from an External Application
As a best practice when inserting binary data into a table from an external application, first convert a vector blob to its hexadecimal representation.INSERT
statement using the UNHEX()
builtin function to convert the hex representation back to its original binary format.
Consider the following example written in Python (3.
import numpyimport structvector = numpy.random.random(1024)vectorStr = "".join([str(struct.pack('f', elem)) for elem in vector])vectorStrEncoded = vectorStr.encode("utf8").hex()
When this code is executed, a random floating point vector is generated:
[0.09361789 0.22198475 0.36929942 ..., 0.97525847 0.98422799 0.26693578]
The vector is then converted to a binary string representation.
bdbabf3df84f633ed014bd3ef6ca183ffc759b ... 48ca303e8aaa793f5ef67b3fcfab883e
Using the hexencoded vector, you can connect to the database and execute a query that calls the UNHEX()
builtin function to convert the vector to its original binary format during insertion:
memsqlConnection.query('INSERT INTO TABLE vectors VALUES (UNHEX("%s"))' % (vectorStrEncoded))
You can also do:
memsqlConnection.query('SELECT EUCLIDEAN_DISTANCE(UNHEX("%s"), UNHEX("%s"))' % (vectorStrEncoded, vectorStrEncoded))
Performance Note
DOT_
, EUCLIDEAN_
, and VECTOR_
are highperformance functions implemented for fast vector operations, using singleinstruction multipledata (SIMD) processor instructions.
Once the vectors have been stored in a table in the binary
format, you do not need to use the UNHEX
builtin to use the blob with the vector builtins.
DOT_
takes in two vector blob arguments and returns the scalar dot product of those two vectors
VECTOR_
takes in two vector blob arguments and returns a vector blob which is a result of the second argument vector being subtracted from the first argument vector
EUCLIDEAN_
takes in two vector blob arguments and returns the scalar euclidean distance between the two vectors.EUCLIDEAN_
is equal to sqrt(DOT_
.EUCLIDEAN_
is more efficient with memory bandwidth, we expect the former to be faster than the latter.
For each of the three builtins, the input blob arguments must be the same length, and their binary length must be a multiple of 4 or 8.
If the result for either DOT_
or EUCLIDEAN_
is infinity, negative infinity, or NaN, null
will be returned instead.
JSON_ARRAY_PACK
There is also the JSON_
builtin which can be used as a helper for the vector functions.
SELECT DOT_PRODUCT(JSON_ARRAY_PACK('[1.0, 0.5, 2.0]'), JSON_ARRAY_PACK('[1.0, 0.5, 2.0]')) "Result";
++
 Result 
++
 5.25 
++
Using the JSON_
builtin, we can illustrate that vectors must have the same number of elements of the same type (BINARY
or BLOB) to work correctly in vector functions:
SET @v1 = JSON_ARRAY_PACK('[1,2,3,4]');SET @v2 = JSON_ARRAY_PACK('[1,1,0,0]');SET @v3 = JSON_ARRAY_PACK('[0,1]');SET @v4 = JSON_ARRAY_PACK_I8('[0,0,0,0,1]');SELECT DOT_PRODUCT(@v1,@v2);
++
 DOT_PRODUCT(@v1,@v2) 
++
 3 
++
SELECT LENGTH(@v1), LENGTH(@v4);
+++
 LENGTH(@v1)  LENGTH(@v4) 
+++
 16  5 
+++
/* Note: the length is in bytes, not elements */
An error will be generated if the number of elements or the types don't match:
SELECT DOT_PRODUCT(@v1,@v3);
ERROR 1940 (HY000): Error with a vector function: argument sizes do not match
SELECT DOT_PRODUCT(@v1, @v4);
ERROR 1940 (HY000): Error with a vector function: argument sizes do not match
Use Cases
A common use case for these functions is to work with floatingpoint vectors that describe images, such as images of faces or products.DOT_
or EUCLIDEAN_
can then be run to find the closest matches.DOT_
.
SELECT t.*, DOT_PRODUCT(t.vector, UNHEX("...constant vector...")) AS score
FROM t
ORDER BY score DESC
LIMIT 10;
Last modified: April 30, 2024