# BM25

The `BM25_GLOBAL` and `BM25` functions are used to employ [BM25](https://en.wikipedia.org/wiki/Okapi_BM25) scoring for `VERSION 2` full-text search queries.

For columnstore tables created with a `VERSION 2 FULLTEXT` index, the text columns in such tables can be searched using the `BM25_GLOBAL` and `BM25` functions. The `BM25_GLOBAL` function scores all rows in a table together, while the `BM25` function provides partition-level scoring. For `BM25_GLOBAL`, collection and term statistics are calculated for a table, ensuring accurate scores relative to all rows in a table. For `BM25`, collection and term statistics are calculated within a partition, ensuring accurate scores relative to a row's placement within a partition.

The `BM25_GLOBAL` function is more accurate than the `BM25` function and is also more expensive. In `BM25` the accuracy of the scores depends on how data is distributed within each partition. If text data is evenly distributed across all partitions, then `BM25` will produce scores that are close to `BM25_GLOBAL`. Refer to [Comparison of BM25 and BM25\_GLOBAL](https://docs.singlestore.com/#section-idm234863501492724.md) for details.

Queries using the `MATCH` syntax are scored using BM25 on a segment level. As partitions contain multiple segments, searches using `BM25` and `BM25_GLOBAL` return more accurate results than `MATCH` but are less efficient.

Thus, the functions `MATCH`, `BM25`, and `BM25_GLOBAL` provide BM25 scores with a tradeoff between accuracy and performance with `MATCH` as the least accurate, but most efficient, and `BM25_GLOBAL` as the most accurate and most expensive.

The result of  the `BM25`, `BM25_GLOBAL`, and `MATCH` functions is a relevancy score. A higher score indicates a higher quality match, while a lower score indicates a lower quality match. There is no upper bound to a BM25 score; the score is intended to be used for ordering and is not intended to be directly interpreted.

> **❗ Important**: To include recent inserts/updates in the results, run `OPTIMIZE TABLE tbl_name FLUSH` before running your query.

Refer to [Working with Full-Text Search](https://docs.singlestore.com/db/v9.1/developer-resources/functional-extensions/working-with-full-text-search.md) for conceptual information on full-text search.

## Syntax

The syntax for `BM25` and `BM25_GLOBAL`scoring is as follows:

```sql
BM25(<table_name>, <expression>)
BM25_GLOBAL(<table_name>, <expression>)
```

The `USING VERSION 2` syntax must be used in the `CREATE TABLE` command to utilize `BM25` and `BM25_GLOBAL`. Refer to [Working with Full-Text Search](https://docs.singlestore.com/db/v9.1/developer-resources/functional-extensions/working-with-full-text-search.md) for more information on `VERSION 2`.

BM25 full-text search can be customized for languages other than English; refer to [Full Text VERSION 2 Custom Analyzers](https://docs.singlestore.com/db/v9.1/developer-resources/functional-extensions/full-text-version-2-custom-analyzers.md) for more information.

Each `BM25` `or BM25_GLOBAL` clause applies to only one table. Specify multiple `BM25` or `BM25_GLOBAL`clauses to search against multiple tables.&#x20;

## Operators

A set of operators from [Java Lucene full-text search string syntax](https://lucene.apache.org/core/9_10_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#package.description) is supported. The `<expression>` consists of a list of one or more columns, followed by a mix of text with zero or more of the following operators.

| **Operator**   | **Description**                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |
| -------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| (no operator)  | When no operator is specified, the word is optional; however, the rows that contain it are rated higher.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |
| +              | A leading plus sign indicates that this word must be present in each row returned.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |
| -              | A leading minus sign indicates that this word must not be present in any of the rows that are returned. Note: The`-`operator acts only to exclude rows that are otherwise matched by other search terms.                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |
| NOT            | The`NOT`operator behaves the same as the`-`operator. The symbol`!`can be used in place of the word`NOT`. The`NOT`operator must be in all caps.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
| AND            | The`AND`operator matches documents where both terms exist anywhere in the text of a single document. This is equivalent to an intersection using sets. The symbol`&&`can be used in place of the word`AND`. The expression“A AND B”is equivalent to“+A +B”. The`AND`operator must be in all caps.                                                                                                                                                                                                                                                                                                                                                                             |
| OR             | The`OR`operator behaves the same as not having any operator between words. The symbol \|\| can be used in place of the word`OR`. The`OR`operator must be in all caps.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |
| ()             | Parentheses group words into subexpressions. Parenthesized groups can be nested.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
| ""             | A phrase is a quoted string with two or more space-delimited terms. For example,`'key:"word1 word2"'`is correct syntax for a phrase. The double quotes around the phrase`"word1 word2"`are required. The words in the phrase search may not use wildcards or proximity search notation.A phrase that is enclosed within double quote (") characters matches the words in the quotes as if it is a single word. If the phrase contains no words that are in the index, the result is empty. The words might not be in the index because of a combination of factors: if they do not exist in the text, are stopwords, or are shorter than the minimum length of indexed words. |
| \~\<proximity> | A`~`with a`<proximity>`value can be appended to the`BM25`query when using a phrase. The proximity is the maximum number of words allowed between terms in the phrase. For example,`"word1 word2"~3`would be an example of appending a proximity value of`3`to the phrase`"word1 word2"`.                                                                                                                                                                                                                                                                                                                                                                                      |

## Stopwords

Certain words are ignored by full-text search due to their commonality resulting in less relevant results. These are called stopwords. SingleStore 's default list of stopwords is as follows:

```
a, an, and, are, as, at, be, but, by, for, if, in, into, is, it, no, not, 
of, on, or, such, that, the, their, then, there, these, they, this, to, was, will, with

```

Customized stopword lists can be used with custom analyzers. Refer to [Full Text VERSION 2 Custom Analyzers](https://docs.singlestore.com/db/v9.1/developer-resources/functional-extensions/full-text-version-2-custom-analyzers.md) for more information.

## Special Characters

The default tokenizer for full-text search removes special characters from the search. The list of special characters is:

```
+ - && || ! ( ) { } [ ] ^ " ~ * ? : \

```

Custom tokenizers can be used to search for strings that include those characters. Refer to [Full Text VERSION 2 Custom Analyzers](https://docs.singlestore.com/db/v9.1/developer-resources/functional-extensions/full-text-version-2-custom-analyzers.md) for more information.

## Relevancy Score

The relevancy score of an expression in a `BM25` or `BM25_GLOBAL`statement denotes the ranking of the expression based on the following factors:

* Number of times an expression appears in a column. More occurrences of an expression in the matched column(s) increases its relevancy score.
* Rarity of the expression. Rare words have a higher relevancy score than commonly used words.
* The length of the column containing the expression. A column with a short expression has a higher relevancy score than a column with a long expression.

## Remarks

* The column specification in the `<expression>` may contain a JSON keypath. A keypath is an expression containing one or more keys, delimited by dots, that describes the hierarchical path to a field in a JSON document. For example, `key1.key2.key3` would be a valid keypath.
* Refer to [MATCH](https://docs.singlestore.com/db/v9.1/reference/sql-reference/full-text-search-functions/match.md) for more examples of full-text search with the above operators.

## Comparison of BM25 and BM25\_GLOBAL

This section provides details on the differences between the `BM25` and `BM25_GLOBAL` functions. The `BM25` function is more efficient, but potentially less accurate than the `BM25_GLOBAL` function. However, with careful data distribution, `BM25` can provide accurate scores, comparable to those of `BM25_GLOBAL`.

The `BM25_GLOBAL` and `BM25` functions both use the [BM25](https://en.wikipedia.org/wiki/Okapi_BM25) ranking function to compute full-text scores. This function utilizes statistics to compute BM25 scores that include:

* The number of documents being searched.
* The number of tokens in all the documents being searched.
* The number of documents in which the term you are searching for exists.
* The total number of occurrences of the term across all documents.

The `BM25_GLOBAL` function calculates these statistics and the `BM25` scores across the entire data set. In contrast, the `BM25` function calculates these statistics and the `BM25` scores per partition. Thus, a `BM25_GLOBAL` score is relative to ALL the data, while a `BM25` score is relative to the data on the partition.

There is a tradeoff of accuracy versus search speed between the two functions, `BM25_GLOBAL` is more accurate, but searches are slower; `BM25` has faster searches, but less accuracy.

## Obtaining Accurate BM25 Scores

The `BM25` function works best if the data is evenly distributed across partitions with respect to the statistics specified above. That is, the statistics should be similar across partitions.

Ideally:

* Terms should be evenly distributed across partitions; terms should not appear significantly more often in one partition than in another.
* Partitions should have similar numbers of documents.
* Partitions should have similar distributions of document lengths; there should not be a partition with longer documents than other partitions.

To distribute data to obtain `BM25` scores that are as accurate as possible:

* Ensure that the `SHARD KEY` is not strongly correlated with the text data in the column with the full-text index.
* Set the `SHARD KEY` to a unique column. This is typically sufficient to ensure even data distribution.
* If there is not a unique column suitable for use as a `SHARD KEY`, randomly shard by creating a unique `ID` column and set that column as the `SHARD KEY`.
* For performance, consider a `SORT KEY` on a column other than the `SHARD KEY`, such as a column that your queries do range filters on, such as a date or time column.

If distributing data in such ways is not feasible in your application, and you require high accuracy, consider using `BM25_GLOBAL`.

## Examples

> **📝 Note**: The examples in this section use the `BM25` function. All examples also work with `BM25_GLOBAL`. The only difference between `BM25` and `BM25_GLOBAL` is the table-level vs. partition-level scoring.

## Create Example Data

Create a table with a `FULLTEXT VERSION 2` index on columns title and body. Insert data into that table and optimize the table to ensure queries have access to all the added rows via the full-text index.

Either column (`title` or `body`) can be queried separately using the `BM25` function, and the index on the column will be applied.

```sql
CREATE TABLE articles (
    id INT UNSIGNED,
    year int UNSIGNED,
    title VARCHAR(200),
    body TEXT,
    SORT KEY (id),    
    FULLTEXT USING VERSION 2 art_ft_index (title, body));

```

```sql
INSERT INTO articles (id, year, title, body) VALUES
   (1, 2021, 'Introduction to SQL', 'SQL is a standard language for accessing and manipulating databases.'),
   (2, 2022, 'Advanced SQL Techniques', 'Explore advanced techniques and functions in SQL for better data manipulation.'),
   (3, 2020, 'Database Optimization', 'Learn about various optimization techniques to improve database performance.'),
   (4, 2023, 'SQL in Web Development', 'Discover how SQL is used in web development to interact with databases.'),
   (5, 2019, 'Data Security in SQL', 'An overview of best practices for securing data in SQL databases.'),
   (6, 2021, 'SQL and Data Analysis', 'Using SQL for effective data analysis and reporting.'),
   (7, 2022, 'Introduction to Database Design', 'Fundamentals of designing a robust and scalable database.'),
   (8, 2020, 'SQL Performance Tuning', 'Tips and techniques for tuning SQL queries for better performance.'),
   (9, 2023, 'Using SQL with Python', 'Integrating SQL with Python for data science and automation tasks.'),
   (10, 2019, 'NoSQL vs SQL', 'A comparison of NoSQL and SQL databases and their use cases.'),
   (11, 2020, 'Real-time Data Analysis', 'An introduction to real-time analytics.'),
   (12, 2021, 'Analysis for Beginners', 'Simple examples of real time analytics.'),
   (13, 2023, 'Data-Dictionary Design', 'Create and maintain effective data dictionaries.'),
   (14, 2024, 'Scalable Performance', 'Designing for scalability.');

OPTIMIZE TABLE articles FLUSH;

```

## Example 1 - Search for a Term

Search for articles with the word `'database'` in the body. This query will return rows where the `BM25` score for the row is non-zero.

```sql
SELECT id, title, body, BM25(articles,'body:database') AS score
   FROM articles
   WHERE BM25(articles,'body:database');

```

```output

+------+---------------------------------+------------------------------------------------------------------------------+---------------------+
| id   | title                           | body                                                                         | score               |
+------+---------------------------------+------------------------------------------------------------------------------+---------------------+
|    7 | Introduction to Database Design | Fundamentals of designing a robust and scalable database.                    | 0.13076457381248474 |
|    3 | Database Optimization           | Learn about various optimization techniques to improve database performance. |  0.3219999074935913 |
+------+---------------------------------+------------------------------------------------------------------------------+---------------------+
```

## Example 2 - Use BM25 Twice in one Query

```sql
SELECT title
    FROM articles  
   WHERE BM25(articles, 'title:SQL OR body:("Business Intelligence")')
  AND BM25(articles,'body:web');

```

```output

+------------------------+
| title                  |
+------------------------+
| SQL in Web Development |
+------------------------+
```

## Example 3 - Use BM25 with a SQL Predicate

```sql
SELECT count(*)
  FROM articles
  WHERE year = 2021
 AND BM25(articles,'body:SQL');

```

```output

+----------+
| count(*) |
+----------+
|        2 |
+----------+
```

## Example 4 - Score over Multiple Columns

Search for the word `'database'` across two columns

```sql
SELECT *
FROM articles
  WHERE BM25(articles,'title:database OR body:database' );

```

```output

+------+------+---------------------------------+------------------------------------------------------------------------------+
| id   | year | title                           | body                                                                         |
+------+------+---------------------------------+------------------------------------------------------------------------------+
|    7 | 2022 | Introduction to Database Design | Fundamentals of designing a robust and scalable database.                    |
|    3 | 2020 | Database Optimization           | Learn about various optimization techniques to improve database performance. |
+------+------+---------------------------------+------------------------------------------------------------------------------+

```

## Example 5 - Use a Single Boolean Operator

Boolean operators allow terms to be combined through logic operators. The following Boolean operators are supported:  AND, "+", OR, NOT and "-"

> **📝 Note**: Boolean operators must be in ALL CAPS.

```sql
SELECT title
  FROM articles
 WHERE BM25(articles,'title:Database OR title:"Business Intelligence"');

```

```output

+---------------------------------+
| title                           |
+---------------------------------+
| Introduction to Database Design |
| Database Optimization           |
+---------------------------------+

```

## Example 6 - Use Multiple Boolean Operators

The following example returns the title of an article that contains either `'Database'` or `'Business Intelligence'` in the title and the string `'real-time analytics'` in the body.

```sql
SELECT title
  FROM articles
  WHERE BM25(articles,'(title:Database OR title:"Analysis") AND (body:"real-time analytics")');

```

```output

+-------------------------+
| title                   |
+-------------------------+
| Analysis for Beginners  |
| Real-time Data Analysis |
+-------------------------+
```

## Example 7 - Score over JSON

For this and the following example, create a table named `scoring` with a `FULLTEXT VERSION 2` index on columns `b` and `c`. Insert data into that table and optimize the table to ensure queries have access to all the added rows via the full-text index.

```sql
CREATE TABLE scoring (a INT, b TEXT, c JSON, SORT KEY(a),
                        FULLTEXT USING VERSION 2 scr_ft_index (b, c));

INSERT INTO scoring VALUES
  (1, 'one two three three four four four five five five five five','{"one": "two"}');
INSERT INTO scoring VALUES
  (2, 'one two two three three three three four four four four four four four four','{"one": "three"}');

OPTIMIZE TABLE scoring FLUSH;
```

To obtain `BM25` scores over JSON documents at a keypath, use the following query. In this query, the keypath is `one:three`.

```sql
SELECT a, ROUND(BM25(scoring, 'c$one:three'), 3) AS score_one 
FROM scoring;

```

```output

+-----+-----------+
| a   | score_one |
+-----+-----------+
| 1   |     0.000 | 
| 2   |     0.131 |
+-----+-----------+

```

## Example 8 - Use Phrases and Proximity Search

Proximity search enables you to find words within a specific distance of each other. To do a proximity search use the tilde, `~`, symbol with a `<proximity>` value at the end of a phrase. For example, to search for a `"Single"` and `"Store"` within `10` words of each other in a row use the search: `"Single Store"~10`.

The following example finds `BM25` scores over phrases using proximity search. Observe the change in scores as the proximity value is increased.

```sql
SELECT a
     , ROUND(BM25(scoring,'b:"two four"'), 3) AS score_two_four
     , ROUND(BM25(scoring,'b:"two four"~2'), 3) AS score_two_four_proximity_2
     , ROUND(BM25(scoring,'b:"two four"~4'), 3) AS score_two_four_proximity_4
FROM scoring;


```

```output

+-----+----------------+----------------------------|----------------------------+ 
| a   | score_two_four | score_two_four_proximity_2 | score_two_four_proximity_4 |
+-----+----------------+----------------------------|----------------------------+
| 1   |          0.000 |                     0.125  |                     0.125  |
| 2   |          0.000 |                     0.000  |                     0.082  |
|+----+----------------+----------------------------+----------------------------+
```

## Related Topics

* [Working with Full-Text Search](https://docs.singlestore.com/db/v9.1/developer-resources/functional-extensions/working-with-full-text-search.md)
* [MATCH](https://docs.singlestore.com/db/v9.1/reference/sql-reference/full-text-search-functions/match.md)

***

Modified at: February 11, 2026

Source: [/db/v9.1/reference/sql-reference/full-text-search-functions/bm-25/](https://docs.singlestore.com/db/v9.1/reference/sql-reference/full-text-search-functions/bm-25/)

(An index of the documentation is available at /llms.txt)
