Vector Normalization

Vector normalization is the process of standardizing a vector. Vector normalization often involves normalizing feature vectors that describe data objects in applications of artificial intelligence for searching vector embeddings derived from images, text, and audio. The normalization process ensures that the vectors have a consistent scale or magnitude, which can be important in certain operations such as distance calculations, clustering, or classification.

Specifically, cosine similarity, a common measure of the similarity of two vectors, can be calculated with the DOT_PRODUCT function when the input vectors are all of length 1. Thus, when performing cosine similarity calculations with DOT_PRODUCT, the vectors should be normalized to length 1 before storing them in the database.

Many APIs that produce vector embeddings, such as the OpenAI APIs, always return vectors of length 1, so check the documentation for the vectors' source to see if they are length 1. If so, they do not need to be normalized further.

See the following for more information about working with vector data in SingleStore.

Output Format for Examples

Vectors can be output in JSON or binary format. Use JSON format for examples and for output readability. For production, use the default binary for efficiency.

Use the following command to output vectors in JSON format.

SQL

SET vector_type_project_format = JSON;

Use the following command to set the output format back to binary.

SQL

SET vector_type_project_format = BINARY;

Example

If your database contains vectors that are not normalized, you can normalize them with a SQL function. The following UDF (CREATE FUNCTION (UDF)) normalizes a vector of 32-bit floating point values with length 4:

SQL

DELIMITER //
CREATE or REPLACE FUNCTION normalize(v VECTOR(4)) RETURNS VECTOR(4) AS
DECLARE
  squares VECTOR(4) = vector_mul(v,v);
  length FLOAT = sqrt(vector_elements_sum(squares));
BEGIN
  RETURN scalar_vector_mul(1/length, v);
END //
DELIMITER ;

This function computes the length of the vector as the square root of the sum of the squares of the elements of the vector. The function then forms a new vector by dividing the original vector elements by the calculated length of the vector. The result is a new "normalized" vector which has length 1.

You can adapt this code to suit your application by changing the vector length. If you work with different vector lengths, you can create a function with a different name for each desired length.

The following example shows the use of this function to normalize a table of vectors. Vectors of length 4 are used to simplify the example and make it easier to see and understand the results. In a real-world use case, vectors are expected to have more elements.

SQL

CREATE TABLE vectors_not_normalized(id TEXT, v VECTOR(4));

INSERT INTO vectors_not_normalized VALUES
  ("A", '[1,2,3,4]'),
  ("B", '[5,4,3,2]'),
  ("C", '[1,0,0,0]');

Display the vectors and their lengths from the vectors_not_normalized table with the following query.

SQL

SET vector_type_project_format = JSON;  /* to make vector output readable */

SELECT V.id, V.v, sqrt(vector_elements_sum(vector_mul(V.v,V.v))) AS length
FROM vectors_not_normalized V
ORDER BY V.id;

+------+-----------+--------------------+
| id   | v         | length             |
+------+-----------+--------------------+
| A    | [1,2,3,4] |  5.477225575051661 |
| B    | [5,4,3,2] | 7.3484692283495345 |
| C    | [1,0,0,0] |                  1 |
+------+-----------+--------------------+

Note that vectors with ids A and B have length longer than 1 and will be expected to change when normalized. Vector with id C has a length of 1 and is not expected to change when normalized.

Now, create a table to hold the normalized vectors.

SQL

CREATE TABLE vectors_normalized(id TEXT, v VECTOR(4));

The following SQL will normalize the vectors and insert the normalized vectors into the new vectors_normalized table.

SQL

INSERT INTO vectors_normalized
SELECT id, normalize(v)
FROM vectors_not_normalized;

Run the following SQL to see the vectors and their lengths after normalization.

SQL

SET vector_type_project_format = JSON;  /* to make vector output readable */

SELECT V.id, V.v, sqrt(vector_elements_sum(vector_mul(V.v,V.v))) AS length
FROM vectors_normalized V
ORDER BY V.id;

+------+---------------------------------------------------+--------------------+
| id   | v                                                 | length             |
+------+---------------------------------------------------+--------------------+
| A    | [0.182574183,0.365148365,0.547722578,0.730296731] | 0.9999999850988387 |
| B    | [0.680413842,0.544331074,0.408248305,0.272165537] |  1.000000033527612 |
| C    | [1,0,0,0]                                         |                  1 |
+------+---------------------------------------------------+--------------------+

The elements of vectors A and B have changed significantly since and their original lengths were not very close to 1. Vector C has not changed as its original length was 1.

The lengths of vectors A and B are close to, but not exactly one. This is because of the finite precision of floating point representation.

Vector Normalization

On this page

Output Format for Examples

Example

Was this article helpful?

On this page

Was this article helpful?

Vector Normalization

On this page

Output Format for Examples

Example

Related Topics

Was this article helpful?

On this page

Was this article helpful?