# ANALYZE FULLTEXT

The `ANALYZE FULLTEXT` command displays the tokens generated by a Lucene analyzer for a string.

This command shows the tokens that an analyzer would generate when building a full-text index for a specified string. This information can be used to debug full-text index searches. In addition, `ANALYZE FULLTEXT` can be used to understand tokenization before selecting an analyzer configuration.

Refer to [Example 1](https://docs.singlestore.com/db/v9.1/reference/sql-reference/full-text-search-functions/analyze-fulltext/#section-id235600035966813.md) and [Example 2](https://docs.singlestore.com/db/v9.1/reference/sql-reference/full-text-search-functions/analyze-fulltext/#section-id235600036070459.md) for demonstrations of these use cases.

## Syntax

```sql
ANALYZE FULLTEXT "<text>" [OPTIONS '<analyzer options>'];
```

## Arguments

* `<text>` is the string to be analyzed.
* `<analyzer options>` is a JSON string that specifies the analyzer used to tokenize the `<text>` string.

  * `<analyzer options>` uses the same syntax as the `analyzer` key in `INDEX_OPTIONS` for full-text indexes. Refer to [Full Text Version 2 Custom Analyzers](https://docs.singlestore.com/db/v9.1/developer-resources/functional-extensions/full-text-version-2-custom-analyzers.md) for more information.
  * If `<analyzer options>` is not provided, the Lucene StandardAnalyzer is used.

To see the tokens that an existing full-text index generates, set `<analyzer options>` in the `ANALYZE FULLTEXT` command to the same `analyzer` setting that was used in `INDEX_OPTIONS` in the index creation command.

For example, if an index was created with: `INDEX_OPTIONS '{"analyzer" : "spanish"}'`, use `OPTIONS '{"analyzer" : "spanish"}'` in the `ANALYZE FULLTEXT` command.
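The mapping from `INDEX_OPTIONS` to the `OPTIONS` clause can be sketched in client code. The following Python helper is a minimal, hypothetical sketch: `sql_quote` and `build_analyze_statement` are illustrative names and not part of any SingleStore API, and the quoting assumes MySQL-style backslash escapes in string literals.

```python
import json


def sql_quote(value: str, quote: str) -> str:
    """Wrap value in the given quote character.

    Assumption: the server accepts MySQL-style backslash escapes.
    """
    escaped = value.replace("\\", "\\\\").replace(quote, "\\" + quote)
    return quote + escaped + quote


def build_analyze_statement(text: str, index_options: str) -> str:
    """Build an ANALYZE FULLTEXT statement that reuses an index's analyzer."""
    options = json.loads(index_options)
    statement = "ANALYZE FULLTEXT " + sql_quote(text, '"')
    if "analyzer" in options:
        # Keep only the analyzer setting; other index options are dropped.
        analyzer_only = json.dumps({"analyzer": options["analyzer"]})
        statement += " OPTIONS " + sql_quote(analyzer_only, "'")
    # Without an analyzer setting, OPTIONS is omitted and the command
    # falls back to the StandardAnalyzer.
    return statement + ";"


print(build_analyze_statement("to be or not to be", '{ "analyzer": "english"}'))
```

The helper only assembles a string. It does not validate that the analyzer name or custom definition is accepted by the server.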

## Output

The output of `ANALYZE FULLTEXT` lists each token produced by the Lucene analyzer together with the attributes associated with that token, which are derived from Lucene attributes.

The following table lists the attributes in the output of the `ANALYZE FULLTEXT` command and the Lucene Attributes used to create them.

| SingleStore Attribute(s)    | Lucene Attribute             | Description                                                                                                                                                            |
| --------------------------- | ---------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `token`                     | `CharTermAttribute`          | The term text of a token.                                                                                                                                              |
| `position_length`           | `PositionLengthAttribute`    | The number of positions occupied by a token.                                                                                                                           |
| `type`                      | `TypeAttribute`              | The type of the token.                                                                                                                                                 |
| `start_offset`,`end_offset` | `OffsetAttribute`            | The start and end offset of a token in characters.                                                                                                                     |
| `position`                  | `PositionIncrementAttribute` | `position`: The absolute position of the token. The first token is at position 0. `PositionIncrementAttribute`: The position of a token relative to the previous token. |

Refer to [Attribute and Attribute Source](https://lucene.apache.org/core/10_0_0/core/org/apache/lucene/analysis/package-summary.html#attribute-and-attributesource-heading) for additional information about the Lucene Attributes.
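As a rough illustration of how these attributes fit together, the following Python sketch parses a `Response` value and uses `start_offset` and `end_offset` to slice the analyzed string. The JSON is a compacted copy of the StandardAnalyzer output shown for `"the train is moving"` on this page. The helper name `token_rows` is hypothetical, and treating offsets as Python string indexes is an assumption that holds for this ASCII string.

```python
import json

text = "the train is moving"

# Compacted copy of the Response value for the StandardAnalyzer example.
response = """{"tokens": [
  {"position_length": 1, "end_offset": 3, "start_offset": 0, "position": 0, "type": "<ALPHANUM>", "token": "the"},
  {"position_length": 1, "end_offset": 9, "start_offset": 4, "position": 1, "type": "<ALPHANUM>", "token": "train"},
  {"position_length": 1, "end_offset": 12, "start_offset": 10, "position": 2, "type": "<ALPHANUM>", "token": "is"},
  {"position_length": 1, "end_offset": 19, "start_offset": 13, "position": 3, "type": "<ALPHANUM>", "token": "moving"}
]}"""


def token_rows(response_json: str) -> list:
    """Return (position, token, start_offset, end_offset, type) per token."""
    return [
        (t["position"], t["token"], t["start_offset"], t["end_offset"], t["type"])
        for t in json.loads(response_json)["tokens"]
    ]


for position, token, start, end, token_type in token_rows(response):
    # end_offset is exclusive, so text[start:end] recovers the source span.
    print(position, token, repr(text[start:end]), token_type)
```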

## Examples

## Example 1 - Debug Full-Text Search

The following query searches for the quote "to be or not to be" in the `quotes` table. The search does not return any results even though the quote is in the data.

```sql
SELECT *
FROM quotes
WHERE MATCH(TABLE quotes) AGAINST ("quote:\"to be or not to be\"");

```

```output

Empty set (0.19 sec)
```

The following steps show how to debug this disconnect.

Use the `SHOW CREATE TABLE` command to view the full-text index definition.

```sql
SHOW CREATE TABLE quotes;

```

```output

+--------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table  | Create Table                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
+--------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| quotes | CREATE TABLE `quotes` (
  `id` int(11) DEFAULT NULL,
  `quote` varchar(200) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin DEFAULT NULL,
  `author` varchar(50) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin DEFAULT NULL,
  FULLTEXT USING VERSION 2 KEY `quote` (`quote`) INDEX_OPTIONS="{ \"analyzer\": \"english\"}",
  SORT KEY `__UNORDERED` ()
  , SHARD KEY () 
) AUTOSTATS_CARDINALITY_MODE=INCREMENTAL AUTOSTATS_HISTOGRAM_MODE=CREATE AUTOSTATS_SAMPLING=ON SQL_MODE='STRICT_ALL_TABLES,NO_AUTO_CREATE_USER' CHARACTER SET=`utf8mb4` COLLATE=`utf8mb4_bin` |
+--------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

```

From the results, observe that the `INDEX_OPTIONS` used to define the full-text index is as follows.

```sql
INDEX_OPTIONS="{ \"analyzer\": \"english\"}"
```

Use `ANALYZE FULLTEXT` with the index options in the `OPTIONS` clause to display the tokens generated by the analyzer for the string "to be or not to be".

```sql
ANALYZE FULLTEXT "to be or not to be" OPTIONS '{"analyzer": "english"}';

```

```output

+----------------+
| Response       |
+----------------+
| {"tokens": []} |
+----------------+

```

No tokens are created because all of the words in this quote are stopwords, which explains why there was no match in the first query.
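The effect of stopword removal can be approximated outside the database. The sketch below is a simplification, not the analyzer itself: the `STOPWORDS` set is assumed to mirror Lucene's default English stop set, the regular expression is a crude stand-in for the tokenizer, and stemming is omitted. The function name `surviving_terms` is hypothetical.

```python
import re

# Assumption: this mirrors Lucene's default English stop word set.
STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
    "the", "their", "then", "there", "these", "they", "this", "to", "was",
    "will", "with",
}


def surviving_terms(text: str) -> list:
    """Lowercase, split on non-alphanumerics, and drop stopwords."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [word for word in words if word not in STOPWORDS]


print(surviving_terms("to be or not to be"))
# []
print(surviving_terms("To be, or not to be, that is the question."))
# ['question']
```

Under these assumptions, only "question" survives from the full quote, which is consistent with the search on "question" returning the row.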

Instead, search for quotes that contain the word "question". The following result is produced.

```sql
SELECT * 
FROM quotes 
WHERE MATCH(TABLE quotes) AGAINST ("quote:\"question\"");

```

```output

+------+--------------------------------------------+---------------------+
| id   | quote                                      | author              |
+------+--------------------------------------------+---------------------+
|    1 | To be, or not to be, that is the question. | William Shakespeare |
+------+--------------------------------------------+---------------------+

```

## Example Table

The following commands were used to create and populate the `quotes` table.

```sql
CREATE TABLE quotes (
  id INT,
  quote VARCHAR(200),
  author VARCHAR(50),
  FULLTEXT USING VERSION 2 (quote)
    INDEX_OPTIONS '{ "analyzer": "english"}'
);

INSERT INTO quotes VALUES
 (1, "To be, or not to be, that is the question.", "William Shakespeare"),
 (2, "We delight in the beauty of the butterfly, but rarely admit the changes it has gone through to achieve that beauty.", "Maya Angelou"),
 (3, "The most common way people give up their power is by thinking they don't have any.", "Alice Walker");

OPTIMIZE TABLE quotes FLUSH;

```

## Example 2 - Understand Analyzer Behavior

Use the `ANALYZE FULLTEXT` command to understand how a custom analyzer will tokenize text and preview the results of an analyzer configuration.

Consider the following `INDEX_OPTIONS` from [this example](https://docs.singlestore.com/db/v9.1/developer-resources/functional-extensions/full-text-version-2-custom-analyzers/#section-idm234616973775952.md) in [Full Text Version 2 Custom Analyzers](https://docs.singlestore.com/db/v9.1/developer-resources/functional-extensions/full-text-version-2-custom-analyzers.md).  These `INDEX_OPTIONS` are intended for searching HTML content. However, before creating an index with these options, it may be useful to investigate how exactly these options tokenize text.

```sql
INDEX_OPTIONS
 '{
    "analyzer": {
        "custom": {"char_filters": ["html_strip"],
                   "tokenizer": "whitespace",
                   "token_filters":["lower_case"]
                   }
    }
 }'
```

Use `ANALYZE FULLTEXT` to see how the HTML string `Learning is a never-ending journey &amp; I&apos;m excited!</p>` is tokenized.

```sql
ANALYZE FULLTEXT 
"Learning is a never-ending journey &amp; I&apos;m excited!</p>"
OPTIONS  '{
    "analyzer": {
        "custom": {"char_filters": ["html_strip"],
                   "tokenizer": "whitespace",
                   "token_filters":["lower_case"]
                   }
    }
 }';

```

```output

+-----------------------------------------------+
| Response                                      |
+-----------------------------------------------+
| {"tokens": [
    {
      "position_length": 1,
      "end_offset": 8,
      "start_offset": 0,
      "position": 0,
      "type": "word",
      "token": "learning"
    },
    {
      "position_length": 1,
      "end_offset": 11,
      "start_offset": 9,
      "position": 1,
      "type": "word",
      "token": "is"
    },
    {
      "position_length": 1,
      "end_offset": 13,
      "start_offset": 12,
      "position": 2,
      "type": "word",
      "token": "a"
    },
    {
      "position_length": 1,
      "end_offset": 26,
      "start_offset": 14,
      "position": 3,
      "type": "word",
      "token": "never-ending"
    },
    {
      "position_length": 1,
      "end_offset": 34,
      "start_offset": 27,
      "position": 4,
      "type": "word",
      "token": "journey"
    },
    {
      "position_length": 1,
      "end_offset": 40,
      "start_offset": 35,
      "position": 5,
      "type": "word",
      "token": "&"
    },
    {
      "position_length": 1,
      "end_offset": 49,
      "start_offset": 41,
      "position": 6,
      "type": "word",
      "token": "i'm"
    },
    {
      "position_length": 1,
      "end_offset": 58,
      "start_offset": 50,
      "position": 7,
      "type": "word",
      "token": "excited!"
    }
  ]} |
+-----------------------------------------------+
```

The results show the following.

* HTML tags are stripped and HTML entities are converted to their character equivalents. For example, `&amp;` becomes "&" and `I&apos;m` becomes "i'm".
* Punctuation characters are included with words. For example, "excited!" and "never-ending".
* All text is converted to lower case.

With this set of options, a search for "never-ending" matches strings containing "never-ending" without returning strings that contain only "never" or "ending".

The following query uses the table `html_table`, which was created with those options in [Example 2](https://docs.singlestore.com/db/v9.1/developer-resources/functional-extensions/full-text-version-2-custom-analyzers/#section-idm234616973775952.md) of [Full Text Version 2 Custom Analyzers](https://docs.singlestore.com/db/v9.1/developer-resources/functional-extensions/full-text-version-2-custom-analyzers.md).

```sql
SELECT *
FROM html_table
WHERE MATCH(TABLE html_table) AGAINST ("content:never-ending");

```

```output

+------------------+----------------------------------------------------------------+
| title            | content                                                        |
+------------------+----------------------------------------------------------------+
| Learning Journey | Learning is a never-ending journey &amp; I&apos;m excited!</p> |
+------------------+----------------------------------------------------------------+

```

However, searches for the word "excited" return no results because the "!" is tokenized together with the word, and the search term "excited" does not match the token "excited!".

```sql
SELECT *
FROM html_table
WHERE MATCH(TABLE html_table) AGAINST ("content:'excited'");

```

```output

Empty set (0.15 sec)
```
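The mismatch can be reproduced with a small Python approximation of this analyzer chain. This is a sketch under stated assumptions, not the Lucene implementation: a regular expression plus `html.unescape` stands in for `html_strip`, `str.split` stands in for the `whitespace` tokenizer, and `str.lower` stands in for `lower_case`. The function name `approximate_tokens` is hypothetical.

```python
import html
import re


def approximate_tokens(text: str) -> list:
    """Approximate html_strip + whitespace tokenizer + lower_case."""
    without_tags = re.sub(r"<[^>]+>", " ", text)  # crude tag removal
    decoded = html.unescape(without_tags)          # &amp; -> &, &apos; -> '
    return [part.lower() for part in decoded.split()]


tokens = approximate_tokens(
    "Learning is a never-ending journey &amp; I&apos;m excited!</p>"
)
print(tokens)

# A term matches only if it equals a token exactly.
print("never-ending" in tokens)  # True
print("excited" in tokens)       # False
print("excited!" in tokens)      # True
```

If matching "excited" is required, one option to evaluate with `ANALYZE FULLTEXT` is a tokenizer that splits on punctuation, such as `standard`, at the cost of also splitting "never-ending".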

## Example 3 - Use the StandardAnalyzer

The following command displays the tokens and associated attributes that are produced by the Lucene StandardAnalyzer for the string `"the train is moving"`.

```sql
ANALYZE FULLTEXT "the train is moving";

```

```output

+-----------------------------------------------+
| Response                                      | 
+-----------------------------------------------+
| {"tokens": [
    {
      "position_length": 1,
      "end_offset": 3,
      "start_offset": 0,
      "position": 0,
      "type": "<ALPHANUM>",
      "token": "the"
    },
    {
      "position_length": 1,
      "end_offset": 9,
      "start_offset": 4,
      "position": 1,
      "type": "<ALPHANUM>",
      "token": "train"
    },
    {
      "position_length": 1,
      "end_offset": 12,
      "start_offset": 10,
      "position": 2,
      "type": "<ALPHANUM>",
      "token": "is"
    },
    {
      "position_length": 1,
      "end_offset": 19,
      "start_offset": 13,
      "position": 3,
      "type": "<ALPHANUM>",
      "token": "moving"
    }
  ]} |
+-----------------------------------------------+

```

## Example 4 - Specify an Analyzer

The following command shows the tokens that would be generated by the `english` analyzer for the string `"the train is moving"`.

```sql
ANALYZE FULLTEXT "the train is moving" OPTIONS '{"analyzer": "english"}';

```

```output

+-----------------------------------------------+
| Response                                      | 
+-----------------------------------------------+
| {"tokens": [
    {
      "position_length": 1,
      "end_offset": 9,
      "start_offset": 4,
      "position": 1,
      "type": "<ALPHANUM>",
      "token": "train"
    },
    {
      "position_length": 1,
      "end_offset": 19,
      "start_offset": 13,
      "position": 3,
      "type": "<ALPHANUM>",
      "token": "move"
    }
  ]} |
+-----------------------------------------------+

```
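The `position` values 1 and 3 in this output leave gaps where the stopwords "the" and "is" were removed. The following Python sketch derives position increments from the absolute positions. It assumes the Lucene convention that positions start at -1 before the first token, so that the first increment is `position + 1`. The function name `position_increments` is hypothetical.

```python
import json

# Compacted copy of the Response value for the english analyzer example.
response = """{"tokens": [
  {"position_length": 1, "end_offset": 9, "start_offset": 4, "position": 1, "type": "<ALPHANUM>", "token": "train"},
  {"position_length": 1, "end_offset": 19, "start_offset": 13, "position": 3, "type": "<ALPHANUM>", "token": "move"}
]}"""


def position_increments(response_json: str) -> list:
    """Return (token, increment) pairs derived from absolute positions."""
    previous = -1  # assumed starting position before the first token
    result = []
    for token in json.loads(response_json)["tokens"]:
        result.append((token["token"], token["position"] - previous))
        previous = token["position"]
    return result


print(position_increments(response))
# [('train', 2), ('move', 2)]
```

An increment greater than 1 indicates that at least one token was dropped immediately before that token.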

## Example 5 - Custom Tokenizer and Token Filter

The following command generates the tokens for the string `"MemSQL is SingleStore."` for the specified custom analyzer. The analyzer specified in this case uses the Lucene standard tokenizer with a token filter customization that specifies that all occurrences of MemSQL should be replaced with SingleStore.

```sql
ANALYZE FULLTEXT "MemSQL is SingleStore." 
OPTIONS 
'{
  "analyzer": {
    "custom": {
      "tokenizer": "standard",
      "token_filters": [
        {
          "pattern_replace": {
            "pattern": "MemSQL",
            "replacement": "SingleStore"
          }
        }
      ]
    }
  }
}';

```

```output

+-----------------------------------------------+
| Response                                      | 
+-----------------------------------------------+
| {"tokens": [
    {
      "position_length": 1,
      "end_offset": 6,
      "start_offset": 0,
      "position": 0,
      "type": "<ALPHANUM>",
      "token": "SingleStore"
    },
    {
      "position_length": 1,
      "end_offset": 9,
      "start_offset": 7,
      "position": 1,
      "type": "<ALPHANUM>",
      "token": "is"
    },
    {
      "position_length": 1,
      "end_offset": 21,
      "start_offset": 10,
      "position": 2,
      "type": "<ALPHANUM>",
      "token": "SingleStore"
    }
  ]} |
+-----------------------------------------------+
```
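In this output, the first token's text is "SingleStore" while its offsets (0 to 6) still span "MemSQL" in the input. The following Python sketch mimics that behavior under simplifying assumptions: `\w+` is a rough stand-in for the standard tokenizer, `re.sub` stands in for the `pattern_replace` token filter, and the function name `tokenize_and_replace` is hypothetical.

```python
import re


def tokenize_and_replace(text: str, pattern: str, replacement: str) -> list:
    """Return (token, start_offset, end_offset) after a per-token replace."""
    tokens = []
    for match in re.finditer(r"\w+", text):
        # The filter rewrites the term text only; offsets keep pointing
        # at the span of the original input.
        term = re.sub(pattern, replacement, match.group())
        tokens.append((term, match.start(), match.end()))
    return tokens


print(tokenize_and_replace("MemSQL is SingleStore.", "MemSQL", "SingleStore"))
# [('SingleStore', 0, 6), ('is', 7, 9), ('SingleStore', 10, 21)]
```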

## Example 6 - Custom Analyzer with Whitespace Tokenizer

The following command generates the tokens for the string `"This guide teaches you how to build a multimodal Retrieval-Augmented Generation (RAG) application … "` using the `whitespace` tokenizer, as in [Example 1 on the Full Text Version 2 Custom Analyzers](https://docs.singlestore.com/db/v9.1/developer-resources/functional-extensions/full-text-version-2-custom-analyzers/#section-idm234616973460398.md) page.

```sql
ANALYZE FULLTEXT "This guide teaches you how to build a multimodal Retrieval-Augmented Generation (RAG) application using SingleStore, integrating various data types for enhanced AI responses." 
OPTIONS
  '{
      "analyzer": {
          "custom": {
              "tokenizer": "whitespace"
          }
      }
  }';

```

```output

+-----------------------------------------------+
| Response                                      |
+-----------------------------------------------+
| {"tokens": [ 
    {
      "position_length": 1,
      "end_offset": 4,
      "start_offset": 0,
      "position": 0,
      "type": "word",
      "token": "This"
    },
    {
      "position_length": 1,
      "end_offset": 10,
      "start_offset": 5,
      "position": 1,
      "type": "word",
      "token": "guide"
    },
    {
      "position_length": 1,
      "end_offset": 18,
      "start_offset": 11,
      "position": 2,
      "type": "word",
      "token": "teaches"
    },
    {
      "position_length": 1,
      "end_offset": 22,
      "start_offset": 19,
      "position": 3,
      "type": "word",
      "token": "you"
    },
    {
      "position_length": 1,
      "end_offset": 26,
      "start_offset": 23,
      "position": 4,
      "type": "word",
      "token": "how"
    },
    {
      "position_length": 1,
      "end_offset": 29,
      "start_offset": 27,
      "position": 5,
      "type": "word",
      "token": "to"
    },
    {
      "position_length": 1,
      "end_offset": 35,
      "start_offset": 30,
      "position": 6,
      "type": "word",
      "token": "build"
    },
    {
      "position_length": 1,
      "end_offset": 37,
      "start_offset": 36,
      "position": 7,
      "type": "word",
      "token": "a"
    },
    {
      "position_length": 1,
      "end_offset": 48,
      "start_offset": 38,
      "position": 8,
      "type": "word",
      "token": "multimodal"
    },
    {
      "position_length": 1,
      "end_offset": 68,
      "start_offset": 49,
      "position": 9,
      "type": "word",
      "token": "Retrieval-Augmented"
    },
    {
      "position_length": 1,
      "end_offset": 79,
      "start_offset": 69,
      "position": 10,
      "type": "word",
      "token": "Generation"
    },
    {
      "position_length": 1,
      "end_offset": 85,
      "start_offset": 80,
      "position": 11,
      "type": "word",
      "token": "(RAG)"
    },
    {
      "position_length": 1,
      "end_offset": 97,
      "start_offset": 86,
      "position": 12,
      "type": "word",
      "token": "application"
    },
    {
      "position_length": 1,
      "end_offset": 103,
      "start_offset": 98,
      "position": 13,
      "type": "word",
      "token": "using"
    },
    {
      "position_length": 1,
      "end_offset": 116,
      "start_offset": 104,
      "position": 14,
      "type": "word",
      "token": "SingleStore,"
    },
    {
      "position_length": 1,
      "end_offset": 128,
      "start_offset": 117,
      "position": 15,
      "type": "word",
      "token": "integrating"
    },
    {
      "position_length": 1,
      "end_offset": 136,
      "start_offset": 129,
      "position": 16,
      "type": "word",
      "token": "various"
    },
    {
      "position_length": 1,
      "end_offset": 141,
      "start_offset": 137,
      "position": 17,
      "type": "word",
      "token": "data"
    },
    {
      "position_length": 1,
      "end_offset": 147,
      "start_offset": 142,
      "position": 18,
      "type": "word",
      "token": "types"
    },
    {
      "position_length": 1,
      "end_offset": 151,
      "start_offset": 148,
      "position": 19,
      "type": "word",
      "token": "for"
    },
    {
      "position_length": 1,
      "end_offset": 160,
      "start_offset": 152,
      "position": 20,
      "type": "word",
      "token": "enhanced"
    },
    {
      "position_length": 1,
      "end_offset": 163,
      "start_offset": 161,
      "position": 21,
      "type": "word",
      "token": "AI"
    },
    {
      "position_length": 1,
      "end_offset": 174,
      "start_offset": 164,
      "position": 22,
      "type": "word",
      "token": "responses."
    }
  ]} |
+-----------------------------------------------+

```

***

Modified at: June 11, 2026

Source: [/db/v9.1/reference/sql-reference/full-text-search-functions/analyze-fulltext/](https://docs.singlestore.com/db/v9.1/reference/sql-reference/full-text-search-functions/analyze-fulltext/)
