Vector Normalization
On this page
Vector normalization is the process of standardizing a vector.
Specifically, cosine similarity, a common measure of the similarity of two vectors, can be calculated with the DOT_
Many APIs that produce vector embeddings, such as the OpenAI APIs, always return vectors of length 1, so check the documentation for the vectors' source to see if they are length 1.
See the following for more information about working with vector data in SingleStore.
Example
However, if your database contains vectors that are not normalized, you can normalize them with a SQL function.
DELIMITER //CREATE or REPLACE FUNCTION normalize(v blob) RETURNS blob ASDECLAREsquares blob = vector_mul(v,v);length FLOAT = sqrt(vector_elements_sum(squares));BEGINRETURN scalar_vector_mul(1/length, v);END //DELIMITER ;
It computes the length as the square root of the sum of the squares of the elements.
Consider the following table:
CREATE TABLE vectors(vec_id INT, vec_details BLOB);INSERT vectors VALUES(1, json_array_pack('[1,1]'));INSERT vectors VALUES(2, json_array_pack('[2,1]'));INSERT vectors VALUES(3, json_array_pack('[0,1]'));
Some of the vectors have not been normalized.
UPDATE vectors SET vec_details = normalize(vec_details);
The vectors in the table are now normalized, as follows:
SELECT vec_id, json_array_unpack(vec_details)FROM vectorsORDER BY vec_id;
+--------+--------------------------------+
| vec_id | json_array_unpack(vec_details) |
+--------+--------------------------------+
| 1 | [0.707106769,0.707106769] |
| 2 | [0.89442718,0.44721359] |
| 3 | [0,1] |
+--------+--------------------------------+
The elements of vectors 1 and 2 have changed since they were longer than one.
Related Topics
Last modified: June 3, 2024