# Cosine Similarity and Cosine Distance

Cosine similarity is a measure of the similarity of two vectors.
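For reference, the standard definition can be written as follows. The symbols $a$, $b$, and $n$ are introduced here for illustration only and are not used elsewhere on this page.

$$
\operatorname{cosine\_similarity}(a, b) = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert} = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2} \, \sqrt{\sum_{i=1}^{n} b_i^2}}
$$

The value ranges from -1 to 1. When both vectors have length 1, the denominator equals 1 and the expression reduces to the dot product $a \cdot b$.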

In vector search applications, images, blocks of text, audio files, and other content are converted to vectors.

SingleStore provides a Vector Type for storing vector data and Vector Indexing to speed up vector search when searching over large sets of vector data.

Also see Working with Vector Data for more information about and examples of working with vector data in SingleStore.

**Cosine Similarity Calculation:** Cosine similarity can be calculated in two different ways in SingleStore. The first is to use `DOT_PRODUCT` over normalized vectors to calculate cosine similarity.

Using `DOT_PRODUCT` to calculate cosine similarity gives the best performance since `DOT_PRODUCT` does not have to normalize its input every time it is called and because indexed vector search can be used.

Important

When performing cosine similarity calculations with `DOT_PRODUCT`, the vectors should be normalized to length 1 before storing them in the database.
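As a rough illustration of what normalizing to length 1 means, the following Python sketch scales a vector on the client side before it would be inserted. The `normalize` and `dot` helpers here are hypothetical names written for this sketch; they are not the SingleStore functions of the same name.

```python
import math


def normalize(vec):
    """Scale vec so that its Euclidean length is 1 (hypothetical helper)."""
    length = math.sqrt(sum(x * x for x in vec))
    if length == 0:
        raise ValueError("cannot normalize a zero-length vector")
    return [x / length for x in vec]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


raw = [3.0, 4.0, 0.0, 0.0]      # length 5 before normalization
unit = normalize(raw)
print(unit)                      # [0.6, 0.8, 0.0, 0.0]
print(math.sqrt(dot(unit, unit)))  # length after normalization, approximately 1
```

Once every stored vector and the query vector have length 1, the dot product of any pair is already their cosine similarity, so no per-query normalization is needed.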

Example 1 below demonstrates calculating cosine similarity using `DOT_PRODUCT` over a set of vectors with length 1.

In some cases, it is necessary to calculate cosine similarity directly or to calculate cosine distance. Example 2 below defines `cosine_similarity` and `cosine_distance` functions for these cases; both are built on `DOT_PRODUCT`.

Euclidean distance is another common measure of vector similarity.

Important

When finding the top K closest matches, descending (`DESC`) sort order must be used when you are using `DOT_PRODUCT` and cosine similarity. Use ascending (`ASC`) sort order for cosine distance and `EUCLIDEAN_DISTANCE`.

## Examples

### Example 1 - Cosine Similarity using DOT_PRODUCT (<*>)

A common use case of cosine similarity is to calculate the similarity of a set of vectors to a query vector. This example calculates the cosine similarity between a query vector, `@query_vec`, and a set of normalized vectors using the infix `DOT_PRODUCT` operator (`<*>`).

Note

This example assumes that the vectors are already normalized.

This SQL creates a comments table which includes text comments, and a vector representing an embedding for each comment.

```sql
CREATE TABLE comments(
  id INT,
  comment TEXT,
  comment_embedding VECTOR(4) NOT NULL,
  category VARCHAR(256)
);

INSERT INTO comments VALUES
  (1, "The cafeteria in building 35 has a great salad bar",
   '[0.45, 0.55, 0.495, 0.5]', "Food"),
  (2, "I love the taco bar in the B16 cafeteria.",
   '[0.01111, 0.01111, 0.1, 0.999]', "Food"),
  (3, "The B24 restaurant salad bar is quite good.",
   '[0.1, 0.8, 0.2, 0.555]', "Food");
```

This SQL creates a query vector and uses an `ORDER BY … LIMIT` query to find the two comments whose `comment_embedding` vectors are most similar to the query vector.

The `@query_vec` variable is cast to a `VECTOR` to ensure that `@query_vec` is a valid `VECTOR` and to improve performance.

```sql
SET @query_vec = ('[0.44, 0.554, 0.34, 0.62]'):>VECTOR(4);

SELECT id, comment, category,
  comment_embedding <*> @query_vec AS score
FROM comments
ORDER BY score DESC
LIMIT 2;
```

```
*** 1. row ***
id: 1
comment: The cafeteria in building 35 has a great salad bar
category: Food
score: 0.9810000061988831
*** 2. row ***
id: 3
comment: The B24 restaurant salad bar is quite good.
category: Food
score: 0.8993000388145447
```
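To see where these scores come from, the dot products can be recomputed by hand. The sketch below does so in plain Python using the same vectors as the `comments` table; the `dot` helper and the dictionary layout are assumptions made for this illustration, not part of SingleStore.

```python
# Embeddings copied from the INSERT statement above, keyed by comment id.
comments = {
    1: [0.45, 0.55, 0.495, 0.5],
    2: [0.01111, 0.01111, 0.1, 0.999],
    3: [0.1, 0.8, 0.2, 0.555],
}
query_vec = [0.44, 0.554, 0.34, 0.62]


def dot(a, b):
    """Dot product of two equal-length vectors (hypothetical helper)."""
    return sum(x * y for x, y in zip(a, b))


# Score every comment, then keep the two highest scores (descending order).
scores = {cid: dot(vec, query_vec) for cid, vec in comments.items()}
top2 = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:2]

for cid, score in top2:
    print(cid, round(score, 4))
# 1 0.981
# 3 0.8993
```

The extra trailing digits in the SingleStore output (for example `0.9810000061988831`) are consistent with the vector elements being stored as 32-bit floats, although that is an assumption about the element type rather than something stated on this page.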

### Example 2 - Cosine Similarity and Cosine Distance

Note

Using the `cosine_similarity` and `cosine_distance` functions provided below will have a performance impact as the functions normalize the input vectors and are unable to use vector indexes.

The following is a User Defined Function (UDF) for calculating cosine similarity. See CREATE FUNCTION (UDF) for details on defining UDFs.

```sql
DELIMITER //
CREATE OR REPLACE FUNCTION cosine_similarity(v1 VECTOR(4), v2 VECTOR(4))
RETURNS FLOAT AS
BEGIN
  -- Normalize both inputs to length 1, then take their dot product.
  RETURN DOT_PRODUCT(normalize(v1), normalize(v2));
END //
DELIMITER ;
```

This function uses `DOT_PRODUCT` and the `normalize` function defined in Vector Normalization to compute the cosine similarity of two vectors of length 4.

You can adapt this code to suit your application by changing the vector length.

The UDF below calculates `cosine_distance` using the `cosine_similarity` UDF defined above.

```sql
DELIMITER //
CREATE OR REPLACE FUNCTION cosine_distance(v1 VECTOR(4), v2 VECTOR(4))
RETURNS FLOAT AS
BEGIN
  -- Cosine distance is 1 minus cosine similarity.
  RETURN 1 - cosine_similarity(v1, v2);
END //
DELIMITER ;
```

With cosine similarity, a higher score means the vectors have greater similarity. With cosine distance, a lower score means the vectors have greater similarity.

Important

For top K queries, you must use a descending (`DESC`) sort order when using cosine similarity and an ascending (`ASC`) sort order when using cosine distance.

The following query calculates the cosine similarity of the comment embedding vectors to the query vector, `@query_vec`, and orders the results in descending (`DESC`) order of the score.

```sql
SET @query_vec = ('[0.44, 0.554, 0.34, 0.62]'):>VECTOR(4);

SELECT id, comment,
  cosine_similarity(comment_embedding, @query_vec) AS score
FROM comments
ORDER BY score DESC;
```

```
+------+----------------------------------------------------+----------+
| id | comment | score |
+------+----------------------------------------------------+----------+
| 1 | The cafeteria in building 35 has a great salad bar | 0.980735 |
| 3 | The B24 restaurant salad bar is quite good. | 0.899957 |
| 2 | I love the taco bar in the B16 cafeteria. | 0.66153 |
+------+----------------------------------------------------+----------+
```

The query below calculates the `cosine_distance` between the query vector and the vectors in the table. The results are sorted in ascending (`ASC`) order as is appropriate for cosine distance.

Note

`ASC` is the default ordering for `ORDER BY` clauses.

```sql
SET @query_vec = ('[0.44, 0.554, 0.34, 0.62]'):>VECTOR(4);

SELECT id, comment,
  cosine_distance(comment_embedding, @query_vec) AS score
FROM comments
ORDER BY score ASC;
```

```
+------+----------------------------------------------------+-----------+
| id | comment | score |
+------+----------------------------------------------------+-----------+
| 1 | The cafeteria in building 35 has a great salad bar | 0.0192653 |
| 3 | The B24 restaurant salad bar is quite good. | 0.100043 |
| 2 | I love the taco bar in the B16 cafeteria. | 0.33847 |
+------+----------------------------------------------------+-----------+
```

Finally, observe that while the score is different in the query using `cosine_similarity` versus the query using `cosine_distance`, the ordering of the comments in the results is the same.
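The reason the two orderings agree is that cosine distance is simply 1 minus cosine similarity, so sorting distance ascending is the same as sorting similarity descending. The following Python sketch checks this outside the database with the same vectors; the function names mirror the UDFs above but are hypothetical Python helpers, not calls into SingleStore.

```python
import math

# Embeddings and query vector copied from the examples above.
comments = {
    1: [0.45, 0.55, 0.495, 0.5],
    2: [0.01111, 0.01111, 0.1, 0.999],
    3: [0.1, 0.8, 0.2, 0.555],
}
query_vec = [0.44, 0.554, 0.34, 0.62]


def cosine_similarity(a, b):
    """Dot product divided by the product of the two vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    len_a = math.sqrt(sum(x * x for x in a))
    len_b = math.sqrt(sum(y * y for y in b))
    return dot / (len_a * len_b)


def cosine_distance(a, b):
    """Cosine distance, defined here as 1 minus cosine similarity."""
    return 1.0 - cosine_similarity(a, b)


# Similarity is sorted descending; distance is sorted ascending.
by_similarity = sorted(
    comments,
    key=lambda cid: cosine_similarity(comments[cid], query_vec),
    reverse=True,
)
by_distance = sorted(
    comments,
    key=lambda cid: cosine_distance(comments[cid], query_vec),
)

print(by_similarity)  # [1, 3, 2]
print(by_distance)    # [1, 3, 2]
```

Sorting cosine distance in descending order by mistake would instead return the least similar comments first.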

Last modified: March 12, 2024