Vector Normalization
On this page
Vector normalization is the process of standardizing a vector.
Specifically, cosine similarity, a common measure of the similarity of two vectors, can be calculated with the DOT_
Many APIs that produce vector embeddings, such as the OpenAI APIs, always return vectors of length 1, so check the documentation for the vectors' source to see if they are length 1.
See the following for more information about working with vector data in SingleStore.
Output Format for Examples
Vectors may be output in JSON or binary format.
To get JSON output which will match the examples, use the following command to output vectors in JSON.
SET vector_type_project_format = JSON;
Use the following command to set the output format back to binary.
SET vector_type_project_format = BINARY;
Example
If your database contains vectors that are not normalized, you can normalize them with a SQL function.
DELIMITER //CREATE or REPLACE FUNCTION normalize(v VECTOR(4)) RETURNS VECTOR(4) ASDECLAREsquares VECTOR(4) = vector_mul(v,v);length FLOAT = sqrt(vector_elements_sum(squares));BEGINRETURN scalar_vector_mul(1/length, v);END //DELIMITER ;
This function computes the length of the vector as the square root of the sum of the squares of the elements of the vector.
You can adapt this code to suit your application by changing the vector length.
The following example shows the use of this function to normalize a table of vectors.
CREATE TABLE vectors_not_normalized(id TEXT, v VECTOR(4));INSERT INTO vectors_not_normalized VALUES("A", '[1,2,3,4]'),("B", '[5,4,3,2]'),("C", '[1,0,0,0]');
Display the vectors and their lengths from the vectors_
table with the following query.
Click the Playground icon to the right of the SQL listing to try this query.
SET vector_type_project_format = JSON; /* to make vector output readable */SELECT V.id, V.v, sqrt(vector_elements_sum(vector_mul(V.v,V.v))) AS lengthFROM vectors_not_normalized VORDER BY V.id;
+------+-----------+--------------------+
| id | v | length |
+------+-----------+--------------------+
| A | [1,2,3,4] | 5.477225575051661 |
| B | [5,4,3,2] | 7.3484692283495345 |
| C | [1,0,0,0] | 1 |
+------+-----------+--------------------+
Note that vectors with ids A and B have length longer than 1 and will be expected to change when normalized.
Now, create a table to hold the normalized vectors.
CREATE TABLE vectors_normalized(id TEXT, v VECTOR(4));
The following SQL will normalize the vectors and insert the normalized vectors into the new vectors_
table.
INSERT INTO vectors_normalizedSELECT id, normalize(v)FROM vectors_not_normalized;
Run the following SQL to see the vectors and their lengths after normalization.
SET vector_type_project_format = JSON; /* to make vector output readable */SELECT V.id, V.v, sqrt(vector_elements_sum(vector_mul(V.v,V.v))) AS lengthFROM vectors_normalized VORDER BY V.id;
+------+---------------------------------------------------+--------------------+
| id | v | length |
+------+---------------------------------------------------+--------------------+
| A | [0.182574183,0.365148365,0.547722578,0.730296731] | 0.9999999850988387 |
| B | [0.680413842,0.544331074,0.408248305,0.272165537] | 1.000000033527612 |
| C | [1,0,0,0] | 1 |
+------+---------------------------------------------------+--------------------+
The elements of vectors A and B have changed significantly since and their original lengths were not very close to 1.
The lengths of vectors A and B are close to, but not exactly one.
Related Topics
Last modified: June 3, 2024