REGEXP_MATCH()

The REGEXP_MATCH() function returns a JSON array of matching substring(s) within the first match of a regular expression pattern to a string.

REGEXP_MATCH supports the extended and advanced regular expression formats. You can set the global variable regexp_format to specify which format you want to use.

Syntax

REGEXP_MATCH (<source_string>, <pattern> [, <parameters> ] ] )

Arguments

  • source_string: The string from which to extract substrings that match a regular expression.

  • pattern: A string representing a regular expression.

  • parameters:An optional argument, if present, has one of the following values:

    • c: Perform case-sensitive matching.

    • i: Perform case-insensitive matching.

    • If c or i is not specified, performs case-sensitive matching.

Remarks

  • If no match is found, the REGEXP_MATCH result will be NULL.

  • If a match is found, and the pattern does not contain parenthesized sub-expressions, the result is a single-element text array containing the substring that matches the entire pattern. This result is similar to the REGEXP_SUBSTR() function.

  • If a match is found, and the pattern contains parenthesized sub-expressions, the result is a JSON array of strings, whose n'th element is the substring matching the n'th parenthesized sub-expression of the pattern (not counting "non-capturing" parentheses). See the examples section for more details.

Examples

The following examples demonstrate how REGEXP_MATCH works when the global variable REGEXP_FORMAT is set to advanced.

SET GLOBAL REGEXP_FORMAT = 'advanced';

Searches for a group of characters between the alpha and gamma in the substrings.

SELECT REGEXP_MATCH('alphabetagammadelta', 'alpha(.*)gamma');
+-------------------------------------------------------+
| REGEXP_MATCH('alphabetagammadelta', 'alpha(.*)gamma') |
+-------------------------------------------------------+
| ["beta"]                                              |
+-------------------------------------------------------+

Searches for a group of digits, which occurs after the word gamma.

SELECT REGEXP_MATCH('alphabetagamma579XYZ', 'gamma([0-9]+)');
+-------------------------------------------------------+
| REGEXP_MATCH('alphabetagamma579XYZ', 'gamma([0-9]+)') |
+-------------------------------------------------------+
| ["579"]                                               |
+-------------------------------------------------------+

There is no capturing group (i.e., parenthesized subexpressions), so the returned values matches the entire substring pattern.

SELECT REGEXP_MATCH('alphabetagammadelta', 'alpha.*gamma');
+-----------------------------------------------------+
| REGEXP_MATCH('alphabetagammadelta', 'alpha.*gamma') |
+-----------------------------------------------------+
| ["alphabetagamma"]                                  |
+-----------------------------------------------------+

This example matches data based on the each part of the date (i.e., by month, day, and year). It will also accept invalid dates (e.g., 29-11-000).

To use \d expression inside the pattern, the regexp_format global variable should be set to advanced. Also, when using SingleStore strings an escape should be used.

SELECT REGEXP_MATCH('04-08-2023', '(\\d{2})-(\\d{2})-(\\d{4})');
+----------------------------------------------------------+
| REGEXP_MATCH('04-08-2023', '(\\d{2})-(\\d{2})-(\\d{4})') |
+----------------------------------------------------------+
| ["04","08","2023"]                                       |
+----------------------------------------------------------+

Returns the first occurrences of a group of lowercase characters followed by groups of digits.

SELECT REGEXP_MATCH('abc357cde123abc', '[a-z]+([0-9]+)');
+---------------------------------------------------+
| REGEXP_MATCH('abc357cde123abc', '[a-z]+([0-9]+)') |
+---------------------------------------------------+
| ["357"]                                           |
+---------------------------------------------------+

Returns NULL since there is no match.

SELECT REGEXP_MATCH('alphabetagammadelta', 'alpha(.*)eplison');
+---------------------------------------------------------+
| REGEXP_MATCH('alphabetagammadelta', 'alpha(.*)eplison') |
+---------------------------------------------------------+
| NULL                                                    |
+---------------------------------------------------------+

This is a case-sensitive search.

SELECT REGEXP_MATCH('ALPHAbetagammadelta', 'alpha(.*)gamma', 'c');
+------------------------------------------------------------+
| REGEXP_MATCH('ALPHAbetagammadelta', 'alpha(.*)gamma', 'c') |
+------------------------------------------------------------+
| NULL                                                       |
+------------------------------------------------------------+

This is a case-sensitive search.

SELECT REGEXP_MATCH('ALPHAbetagammadelta', 'alpha(.*)gamma', 'i');
+------------------------------------------------------------+
| REGEXP_MATCH('ALPHAbetagammadelta', 'alpha(.*)gamma', 'i') |
+------------------------------------------------------------+
| ["beta"]                                                   |
+------------------------------------------------------------+

The first group is non-capturing (?:) but needs to be matched.

SELECT REGEXP_MATCH('barbeta', '(?:ba.)(beta)');
+------------------------------------------+
| REGEXP_MATCH('barbeta', '(?:ba.)(beta)') |
+------------------------------------------+
| ["beta"]                                 |
+------------------------------------------+

The second group is optional if it was not captured. JSON null will be present in the output array.

SELECT REGEXP_MATCH('alphabarbeta', '(bar)(.+)?(beta)');
+--------------------------------------------------+
| REGEXP_MATCH('alphabarbeta', '(bar)(.+)?(beta)') |
+--------------------------------------------------+
| ["bar",null,"beta"]                              |
+--------------------------------------------------+

The '?=' expression is used for a positive lookahead; it is not a captured group.

SELECT REGEXP_MATCH('barbeta', '(bar)(?=beta)');
+------------------------------------------+
| REGEXP_MATCH('barbeta', '(bar)(?=beta)') |
+------------------------------------------+
| ["bar"]                                  |
+------------------------------------------+

Extracting a Matched Group From a JSON Array in a Table

The REGEXP_MATCH builtin returns a JSON array of matches. Sometimes, you may need to obtain the exact matched string (i.e., extract the element from the array); to do this, use the JSON_EXTRACT_STRING builtin or shorthand ::$ syntax.

Examples

Create a table and insert some values.

CREATE TABLE filenames(source_strings text);
INSERT INTO filenames VALUES("file_expense_record.pdf"), ("file_24082023.pdf"),("inventory_file_new.pdf.tmp");

This statement extracts the first element from the array (note there is a zero (0) indexing).

SELECT REGEXP_MATCH(source_strings, '^(file.+)\\.pdf$') AS a, JSON_EXTRACT_STRING(a, 0) AS b FROM filenames;
+-------------------------+---------------------+
| a                       | b                   |
+-------------------------+---------------------+
| ["file_expense_record"] | file_expense_record |
| NULL                    | NULL                |
| ["file_24082023"]       | file_24082023       |
+-------------------------+---------------------+

This statement also extracts the first element but uses the shorthand format.

SELECT REGEXP_MATCH(source_strings, '^(file.+)\\.pdf$') rmatch, rmatch::$`0` FROM filenames;
+-------------------------+---------------------+
| rmatch                  | rmatch::$`0`        |
+-------------------------+---------------------+
| ["file_expense_record"] | file_expense_record |
| NULL                    | NULL                |
| ["file_24082023"]       | file_24082023       |
+-------------------------+---------------------+

Last modified: August 24, 2023

Was this article helpful?