# Using Vector Functions

This section lists the built-in functions for working with vectors represented as blobs. See the topic on Working with Vector Data for a discussion of use cases for vector functions.

## Data type suffixes

The vector functions present a 32-byte floating-point integer data type by default. The data type for vector functions can be changed by using type suffixes:

• _I8 - 8-byte signed integer

• _I16 - 16-byte signed integer

• _I32 - 32-byte signed integer

• _I64 - 64-byte signed integer

• _F32 - 32-byte floating-point number (this is the default)

• _F64 - 64-byte floating-point number

## Inserting Vectors as Blobs from an External Application

As a best practice when inserting binary data into a table from an external application, first convert a vector blob to its hexadecimal representation. Then in SingleStore, execute the `INSERT` statement using the `UNHEX()` built-in function to convert the hex representation back to its original binary format.

Consider the following example written in Python (3.8), which uses a few modules to generate vectors and convert them to hex:

```import numpyimport struct
vector = numpy.random.random(1024)vectorStr = "".join([str(struct.pack('f', elem)) for elem in vector])vectorStrEncoded = vectorStr.encode("utf-8").hex()```

When this code is executed, a random floating point vector is generated:

`[0.09361789  0.22198475  0.36929942 ...,  0.97525847  0.98422799  0.26693578]`

The vector is then converted to a binary string representation. Finally, the resulting vector is hex encoded:

`bdbabf3df84f633ed014bd3ef6ca183ffc759b ... 48ca303e8aaa793f5ef67b3fcfab883e`

Using the hex-encoded vector, you can connect to the database and execute a query that calls the `UNHEX()` built-in function to convert the vector to its original binary format during insertion:

`memsqlConnection.query('INSERT INTO TABLE vectors VALUES (UNHEX("%s"))' % (vectorStrEncoded))`

You can also do:

`memsqlConnection.query('SELECT EUCLIDEAN_DISTANCE(UNHEX("%s"), UNHEX("%s"))' % (vectorStrEncoded, vectorStrEncoded))`

## Performance Note

`DOT_PRODUCT()`, `EUCLIDEAN_DISTANCE()`, and `VECTOR_SUB()` are high-performance functions implemented for fast vector operations, using single-instruction multiple-data (SIMD) processor instructions. Your hardware must support AVX2 instructions in order to use these functions.

Once the vectors have been stored in a table in the `binary` format, you do not need to use the `UNHEX` built-in to use the blob with the vector builtins.

-`DOT_PRODUCT` takes in two vector blob arguments and returns the scalar dot product of those two vectors

-`VECTOR_SUB` takes in two vector blob arguments and returns a vector blob which is a result of the second argument vector being subtracted from the first argument vector

-`EUCLIDEAN_DISTANCE` takes in two vector blob arguments and returns the scalar euclidean distance between the two vectors. `EUCLIDEAN_DISTANCE(v1, v2)` is equal to `sqrt(DOT_PRODUCT(VECTOR_SUB(v1, v2), VECTOR_SUB(v1, v2)))`. However, because `EUCLIDEAN_DISTANCE` is more efficient with memory bandwidth, we expect the former to be faster than the latter.

For each of the three builtins, the input blob arguments must be the same length, and their binary length must be a multiple of 4. As long as those two conditions are met, the builtins can operator on vectors of any length. There is no requirement that the length of the vectors must be a power of 2, or that the length of the vectors must be divisible by 8 or 16.

If the result for either `DOT_PRODUCT` or `EUCLIDEAN_DISTANCE` is infinity, negative infinity, or NaN, `null` will be returned instead.

### JSON_ARRAY_PACK

There is also the `JSON_ARRAY_PACK` builtin which can be used as a helper for the vector functions. It takes in a JSON array of floating point numbers and returns a packed binary blob suitable to be used as an argument for the vector functions. For example:

`SELECT DOT_PRODUCT(JSON_ARRAY_PACK('[1.0, 0.5, 2.0]'), JSON_ARRAY_PACK('[1.0, 0.5, 2.0]')) "Result";`
``````+--------+
| Result |
+--------+
|   5.25 |
+--------+``````

Using the `JSON_ARRAY_PACK` builtin, we can illustrate that vectors must have the same number of elements of the same type (`BINARY` or BLOB) to work correctly in vector functions:

```SET @v1 = JSON_ARRAY_PACK('[1,2,3,4]');
SET @v2 = JSON_ARRAY_PACK('[1,1,0,0]');
SET @v3 = JSON_ARRAY_PACK('[0,1]');
SET @v4 = JSON_ARRAY_PACK_I8('[0,0,0,0,1]');
SELECT DOT_PRODUCT(@v1,@v2);```
``````+----------------------+
| DOT_PRODUCT(@v1,@v2) |
+----------------------+
|                    3 |
+----------------------+

SELECT LENGTH(@v1), LENGTH(@v4);

+-------------+-------------+
| LENGTH(@v1) | LENGTH(@v4) |
+-------------+-------------+
|          16 |           5 |
+-------------+-------------+
/* Note: the length is in bytes, not elements */``````

An error will be generated if the number of elements or the types don't match:

`SELECT DOT_PRODUCT(@v1,@v3);`
``````ERROR 1940 (HY000): Error with a vector function: argument sizes do not match

SELECT DOT_PRODUCT(@v1, @v4);

ERROR 1940 (HY000): Error with a vector function: argument sizes do not match``````

## Use Cases

A common use case for these functions is to work with floating-point vectors that describe images, such as images of faces or products. These vectors can be created using a neural network trained to categorize the images. Objects are inserted into a table. One field of the object is the vector that describes the image, obtained by feeding the image into the neural network. Queries using `DOT_PRODUCT()` or `EUCLIDEAN_DISTANCE()` can then be run to find the closest matches. For example, the following query retrieves the 10 closest matches to the constant vector in the query, based on a cosine similarity match implemented using `DOT_PRODUCT()`.

`SELECT t.*, DOT_PRODUCT(t.vector, UNHEX("...constant vector...")) AS scoreFROM tORDER BY score DESCLIMIT 10;`