BM25

This function is used to employ BM25 partition-scoped scoring for VERSION 2 full-text search queries.

Partition level scoring improves accuracy across a larger set of data. Collection and term statistics are calculated within a partition, ensuring accurate scores relative to a row's placement within that partition. Queries using the MATCH syntax are still scored on a segment level.

Refer to Working with Full-Text Search for more conceptual information on this function.

Syntax

The syntax for BM25 partition-scoped scoring, use the BM25() function as follows:

BM25(<table_name>,'<column>:<term>')

The syntax for the BM25() function for JSON columns is:

BM25(<table_name>,'<column>$<key.path>:<term>')

Examples

Obtaining a BM25 Score Example

CREATE TABLE scoring (a INT, b TEXT, c JSON, SORT KEY(a),
FULLTEXT USING VERSION 2 scr_ft_index (b, c));
INSERT INTO scoring VALUES
(1, 'one two three three four four four five five five five five','{"one": "two"}');
INSERT INTO scoring VALUES
(2, 'one two two three three three three four four four four four four four four','{"one": "three"}');
OPTIMIZE TABLE scoring FLUSH;
SELECT a
, ROUND(BM25(scoring,'b:one'), 3) AS score_one
, ROUND(BM25(scoring,'b:two'), 3) AS score_two
, ROUND(0.6*BM25(scoring,'b:three') + 0.4*BM25(scoring,'b:four'),3) AS score_boosted
FROM scoring;
+-----+-----------+-----------+---------------+
| a   | score_one | score_two | score_boosted |
+-----+-----------+-----------+---------------+
| 1   |     0.087 |     0.087 |         0.124 |
| 2   |     0.079 |     0.110 |         0.145 |
+-----+-----------+-----------+---------------+
 

To obtain BM25 scores over JSON documents at a keypath, use the following query:

SELECT a, ROUND(BM25(scoring, 'c$one:three'), 3) AS score_one FROM scoring;
+-----+-----------+
| a   | score_one |
+-----+-----------+
| 1   |     0.000 | 
| 2   |     0.131 |
+-----+-----------+

Last modified: June 3, 2024

Was this article helpful?