Vector Normalization
Vector normalization usually involves normalizing the feature vectors that describe data objects.
Many APIs that produce vector embeddings, such as the OpenAI APIs, always return vectors of length 1, so check the documentation for the vectors' source to see if they are length 1.
However, if your database contains vectors that are not normalized, you can normalize them with a SQL function.
DELIMITER //CREATE or REPLACE FUNCTION normalize(v blob) RETURNS blob ASDECLAREsquares blob = vector_mul(v,v);length FLOAT = sqrt(vector_elements_sum(squares));BEGINRETURN scalar_vector_mul(1/length, v);END //DELIMITER ;
It computes the length as the square root of the sum of the squares of the elements.
Consider the following table:
CREATE TABLE vectors(vec_id INT, vec_details BLOB);INSERT vectors VALUES(1, json_array_pack('[1,1]'));INSERT vectors VALUES(2, json_array_pack('[2,1]'));INSERT vectors VALUES(3, json_array_pack('[0,1]'));
Some of the vectors have not been normalized.
UPDATE vectors SET vec_details = normalize(vec_details);
The vectors in the table are now normalized, as follows:
SELECT vec_id, json_array_unpack(vec_details)FROM vectorsORDER BY vec_id;
+--------+--------------------------------+
| vec_id | json_array_unpack(vec_details) |
+--------+--------------------------------+
| 1 | [0.707106769,0.707106769] |
| 2 | [0.89442718,0.44721359] |
| 3 | [0,1] |
+--------+--------------------------------+
The elements of vectors 1 and 2 have changed since they were longer than one.
Last modified: June 29, 2023