REGEXP_MATCH()

Syntax

REGEXP_MATCH (<source_string>, <pattern> [, <parameters> ] ] )

Arguments

source_string: The string from which to extract substrings that match a regular expression.
pattern: A string representing a regular expression.
parameters:An optional argument, if present, has one of the following values:
- c: Perform case-sensitive matching.
- i: Perform case-insensitive matching.
- If c or i is not specified, performs case-sensitive matching.

Remarks

If no match is found, the REGEXP_MATCH result will be NULL.
If a match is found, and the pattern does not contain parenthesized sub-expressions, the result is a single-element text array containing the substring that matches the entire pattern. This result is similar to the REGEXP_SUBSTR() function.
If a match is found, and the pattern contains parenthesized sub-expressions, the result is a JSON array of strings, whose n'th element is the substring matching the n'th parenthesized sub-expression of the pattern (not counting "non-capturing" parentheses). See the examples section for more details.

Examples

The following examples demonstrate how REGEXP_MATCH works when the global variable REGEXP_FORMAT is set to advanced.

SQL

SET GLOBAL REGEXP_FORMAT = 'advanced';

Searches for a group of characters between the alpha and gamma in the substrings.

SQL

SELECT REGEXP_MATCH('alphabetagammadelta', 'alpha(.*)gamma');

+-------------------------------------------------------+
| REGEXP_MATCH('alphabetagammadelta', 'alpha(.*)gamma') |
+-------------------------------------------------------+
| ["beta"]                                              |
+-------------------------------------------------------+

Searches for a group of digits, which occurs after the word gamma.

SQL

SELECT REGEXP_MATCH('alphabetagamma579XYZ', 'gamma([0-9]+)');

+-------------------------------------------------------+
| REGEXP_MATCH('alphabetagamma579XYZ', 'gamma([0-9]+)') |
+-------------------------------------------------------+
| ["579"]                                               |
+-------------------------------------------------------+

There is no capturing group (i.e., parenthesized subexpressions), so the returned values matches the entire substring pattern.

SQL

SELECT REGEXP_MATCH('alphabetagammadelta', 'alpha.*gamma');

+-----------------------------------------------------+
| REGEXP_MATCH('alphabetagammadelta', 'alpha.*gamma') |
+-----------------------------------------------------+
| ["alphabetagamma"]                                  |
+-----------------------------------------------------+

This example matches data based on the each part of the date (i.e., by month, day, and year). It will also accept invalid dates (e.g., 29-11-000).

To use \d expression inside the pattern, the regexp_format global variable should be set to advanced. Also, when using SingleStore strings an escape should be used.

SQL

SELECT REGEXP_MATCH('04-08-2023', '(\\d{2})-(\\d{2})-(\\d{4})');

+----------------------------------------------------------+
| REGEXP_MATCH('04-08-2023', '(\\d{2})-(\\d{2})-(\\d{4})') |
+----------------------------------------------------------+
| ["04","08","2023"]                                       |
+----------------------------------------------------------+

Returns the first occurrences of a group of lowercase characters followed by groups of digits.

SQL

SELECT REGEXP_MATCH('abc357cde123abc', '[a-z]+([0-9]+)');

+---------------------------------------------------+
| REGEXP_MATCH('abc357cde123abc', '[a-z]+([0-9]+)') |
+---------------------------------------------------+
| ["357"]                                           |
+---------------------------------------------------+

Returns NULL since there is no match.

SQL

SELECT REGEXP_MATCH('alphabetagammadelta', 'alpha(.*)eplison');

+---------------------------------------------------------+
| REGEXP_MATCH('alphabetagammadelta', 'alpha(.*)eplison') |
+---------------------------------------------------------+
| NULL                                                    |
+---------------------------------------------------------+

This is a case-sensitive search.

SQL

SELECT REGEXP_MATCH('ALPHAbetagammadelta', 'alpha(.*)gamma', 'c');

+------------------------------------------------------------+
| REGEXP_MATCH('ALPHAbetagammadelta', 'alpha(.*)gamma', 'c') |
+------------------------------------------------------------+
| NULL                                                       |
+------------------------------------------------------------+

This is a case-sensitive search.

SQL

SELECT REGEXP_MATCH('ALPHAbetagammadelta', 'alpha(.*)gamma', 'i');

+------------------------------------------------------------+
| REGEXP_MATCH('ALPHAbetagammadelta', 'alpha(.*)gamma', 'i') |
+------------------------------------------------------------+
| ["beta"]                                                   |
+------------------------------------------------------------+

The first group is non-capturing (?:) but needs to be matched.

SQL

SELECT REGEXP_MATCH('barbeta', '(?:ba.)(beta)');

+------------------------------------------+
| REGEXP_MATCH('barbeta', '(?:ba.)(beta)') |
+------------------------------------------+
| ["beta"]                                 |
+------------------------------------------+

The second group is optional if it was not captured. JSON null will be present in the output array.

SQL

SELECT REGEXP_MATCH('alphabarbeta', '(bar)(.+)?(beta)');

+--------------------------------------------------+
| REGEXP_MATCH('alphabarbeta', '(bar)(.+)?(beta)') |
+--------------------------------------------------+
| ["bar",null,"beta"]                              |
+--------------------------------------------------+

The '?=' expression is used for a positive lookahead; it is not a captured group.

SQL

SELECT REGEXP_MATCH('barbeta', '(bar)(?=beta)');

+------------------------------------------+
| REGEXP_MATCH('barbeta', '(bar)(?=beta)') |
+------------------------------------------+
| ["bar"]                                  |
+------------------------------------------+

Extracting a Matched Group From a JSON Array in a Table

The REGEXP_MATCH builtin returns a JSON array of matches. Sometimes, you may need to obtain the exact matched string (i.e., extract the element from the array); to do this, use the JSON_EXTRACT_STRING builtin or shorthand ::$ syntax.

Examples

Create a table and insert some values.

SQL

CREATE TABLE filenames(source_strings text);
INSERT INTO filenames VALUES("file_expense_record.pdf"), ("file_24082023.pdf"),("inventory_file_new.pdf.tmp");

This statement extracts the first element from the array (note there is a zero (0) indexing).

SQL

SELECT REGEXP_MATCH(source_strings, '^(file.+)\\.pdf$') AS a, JSON_EXTRACT_STRING(a, 0) AS b FROM filenames;

+-------------------------+---------------------+
| a                       | b                   |
+-------------------------+---------------------+
| ["file_expense_record"] | file_expense_record |
| NULL                    | NULL                |
| ["file_24082023"]       | file_24082023       |
+-------------------------+---------------------+

This statement also extracts the first element but uses the shorthand format.

SQL

SELECT REGEXP_MATCH(source_strings, '^(file.+)\\.pdf$') rmatch, rmatch::$`0` FROM filenames;

+-------------------------+---------------------+
| rmatch                  | rmatch::$`0`        |
+-------------------------+---------------------+
| ["file_expense_record"] | file_expense_record |
| NULL                    | NULL                |
| ["file_24082023"]       | file_24082023       |
+-------------------------+---------------------+

REGEXP_MATCH()

On this page

Syntax

Arguments

Remarks

Examples

Extracting a Matched Group From a JSON Array in a Table

Examples

Was this article helpful?

On this page

Was this article helpful?