Vector Normalization

Vector normalization is the process of standardizing a vector. Vector normalization often involves normalizing feature vectors that describe data objects in applications of artificial intelligence for searching vector embeddings derived from images, text, and audio. The normalization process ensures that the vectors have a consistent scale or magnitude, which can be important in certain operations such as distance calculations, clustering, or classification. 

Specifically, cosine similarity, a common measure of the similarity of two vectors, can be calculated with the DOT_PRODUCT function when the input vectors are all of length 1. Thus, when performing cosine similarity calculations with DOT_PRODUCT, the vectors should be normalized to length 1 before storing them in the database.

Many APIs that produce vector embeddings, such as the OpenAI APIs, always return vectors of length 1, so check the documentation for the vectors' source to see if they are length 1. If so, they do not need to be normalized further.

See the following for more information about working with vector data in SingleStore.


However, if your database contains vectors that are not normalized, you can normalize them with a SQL function. The following UDF normalizes a vector of 32-bit floating point values:

CREATE or REPLACE FUNCTION normalize(v blob) RETURNS blob AS
squares blob = vector_mul(v,v);
length FLOAT = sqrt(vector_elements_sum(squares));
RETURN scalar_vector_mul(1/length, v);
END //

It computes the length as the square root of the sum of the squares of the elements. Then it forms a new vector that divides the original vector elements by the length.

Consider the following table:

CREATE TABLE vectors(vec_id INT, vec_details BLOB);
INSERT vectors VALUES(1, json_array_pack('[1,1]'));
INSERT vectors VALUES(2, json_array_pack('[2,1]'));
INSERT vectors VALUES(3, json_array_pack('[0,1]'));

Some of the vectors have not been normalized. The vectors in the table can be normalized in place like so:

UPDATE vectors SET vec_details = normalize(vec_details);

The vectors in the table are now normalized, as follows:

SELECT vec_id, json_array_unpack(vec_details)
FROM vectors
ORDER BY vec_id;
| vec_id | json_array_unpack(vec_details) |
|      1 | [0.707106769,0.707106769]      |
|      2 | [0.89442718,0.44721359]        |
|      3 | [0,1]                          |

The elements of vectors 1 and 2 have changed since they were longer than one. Vector 3 is unchanged since it was length one originally.

Last modified: June 3, 2024

Was this article helpful?