Vector Normalization

Vector normalization usually involves normalizing the feature vectors that describe data objects. This normalization process ensures that the feature vectors have a consistent scale or magnitude, which can be important in certain operations such as distance calculations, clustering, or classification. In particular, when performing cosine similarity calculations with DOT_PRODUCT, the vectors should be normalized to length 1 before storing them in the database.

Many APIs that produce vector embeddings, such as the OpenAI APIs, always return vectors of length 1, so check the documentation for the vectors' source to see if they are length 1. If so, they do not need to be normalized further.

However, if your database contains vectors that are not normalized, you can normalize them with a SQL function. The following UDF normalizes a vector of 32-bit floating point values:

DELIMITER //
CREATE or REPLACE FUNCTION normalize(v blob) RETURNS blob AS
DECLARE
squares blob = vector_mul(v,v);
length FLOAT = sqrt(vector_elements_sum(squares));
BEGIN
RETURN scalar_vector_mul(1/length, v);
END //
DELIMITER ;

It computes the length as the square root of the sum of the squares of the elements. Then it forms a new vector that divides the original vector elements by the length.

Consider the following table:

CREATE TABLE vectors(vec_id INT, vec_details BLOB);
INSERT vectors VALUES(1, json_array_pack('[1,1]'));
INSERT vectors VALUES(2, json_array_pack('[2,1]'));
INSERT vectors VALUES(3, json_array_pack('[0,1]'));

Some of the vectors have not been normalized. The vectors in the table can be normalized in place like so:

UPDATE vectors SET vec_details = normalize(vec_details);

The vectors in the table are now normalized, as follows:

SELECT vec_id, json_array_unpack(vec_details)
FROM vectors
ORDER BY vec_id;
+--------+--------------------------------+
| vec_id | json_array_unpack(vec_details) |
+--------+--------------------------------+
|      1 | [0.707106769,0.707106769]      |
|      2 | [0.89442718,0.44721359]        |
|      3 | [0,1]                          |
+--------+--------------------------------+

The elements of vectors 1 and 2 have changed since they were longer than one. Vector 3 is unchanged since it was length one originally.

Last modified: June 29, 2023

Was this article helpful?