REGEXP_ MATCH()
On this page
The REGEXP_
function returns a JSON array of matching substring(s) within the first match of a regular expression pattern to a string.
REGEXP_
supports the extended and advanced regular expression formats.regexp_
to specify which format you want to use.
Syntax
REGEXP_MATCH (<source_string>, <pattern> [, <parameters> ] ] )
Arguments
-
source_
: The string from which to extract substrings that match a regular expression.string -
pattern
: A string representing a regular expression. -
parameters
:An optional argument, if present, has one of the following values:-
c
: Perform case-sensitive matching. -
i
: Perform case-insensitive matching. -
If
c
ori
is not specified, performs case-sensitive matching.
-
Remarks
-
If no match is found, the
REGEXP_
result will be NULL.MATCH -
If a match is found, and the pattern does not contain parenthesized sub-expressions, the result is a single-element text array containing the substring that matches the entire pattern.
This result is similar to the REGEXP_ SUBSTR() function. -
If a match is found, and the pattern contains parenthesized sub-expressions, the result is a JSON array of strings, whose n'th element is the substring matching the n'th parenthesized sub-expression of the pattern (not counting "non-capturing" parentheses).
See the examples section for more details.
Examples
The following examples demonstrate how REGEXP_
works when the global variable REGEXP_
is set to advanced.
SET GLOBAL REGEXP_FORMAT = 'advanced';
Searches for a group of characters between the alpha and gamma in the substrings.
SELECT REGEXP_MATCH('alphabetagammadelta', 'alpha(.*)gamma');
+-------------------------------------------------------+
| REGEXP_MATCH('alphabetagammadelta', 'alpha(.*)gamma') |
+-------------------------------------------------------+
| ["beta"] |
+-------------------------------------------------------+
Searches for a group of digits, which occurs after the word gamma.
SELECT REGEXP_MATCH('alphabetagamma579XYZ', 'gamma([0-9]+)');
+-------------------------------------------------------+
| REGEXP_MATCH('alphabetagamma579XYZ', 'gamma([0-9]+)') |
+-------------------------------------------------------+
| ["579"] |
+-------------------------------------------------------+
There is no capturing group (i.
SELECT REGEXP_MATCH('alphabetagammadelta', 'alpha.*gamma');
+-----------------------------------------------------+
| REGEXP_MATCH('alphabetagammadelta', 'alpha.*gamma') |
+-----------------------------------------------------+
| ["alphabetagamma"] |
+-----------------------------------------------------+
This example matches data based on the each part of the date (i.
To use \d
expression inside the pattern, the regexp_
SELECT REGEXP_MATCH('04-08-2023', '(\\d{2})-(\\d{2})-(\\d{4})');
+----------------------------------------------------------+
| REGEXP_MATCH('04-08-2023', '(\\d{2})-(\\d{2})-(\\d{4})') |
+----------------------------------------------------------+
| ["04","08","2023"] |
+----------------------------------------------------------+
Returns the first occurrences of a group of lowercase characters followed by groups of digits.
SELECT REGEXP_MATCH('abc357cde123abc', '[a-z]+([0-9]+)');
+---------------------------------------------------+
| REGEXP_MATCH('abc357cde123abc', '[a-z]+([0-9]+)') |
+---------------------------------------------------+
| ["357"] |
+---------------------------------------------------+
Returns NULL since there is no match.
SELECT REGEXP_MATCH('alphabetagammadelta', 'alpha(.*)eplison');
+---------------------------------------------------------+
| REGEXP_MATCH('alphabetagammadelta', 'alpha(.*)eplison') |
+---------------------------------------------------------+
| NULL |
+---------------------------------------------------------+
This is a case-sensitive search.
SELECT REGEXP_MATCH('ALPHAbetagammadelta', 'alpha(.*)gamma', 'c');
+------------------------------------------------------------+
| REGEXP_MATCH('ALPHAbetagammadelta', 'alpha(.*)gamma', 'c') |
+------------------------------------------------------------+
| NULL |
+------------------------------------------------------------+
This is a case-sensitive search.
SELECT REGEXP_MATCH('ALPHAbetagammadelta', 'alpha(.*)gamma', 'i');
+------------------------------------------------------------+
| REGEXP_MATCH('ALPHAbetagammadelta', 'alpha(.*)gamma', 'i') |
+------------------------------------------------------------+
| ["beta"] |
+------------------------------------------------------------+
The first group is non-capturing (?:) but needs to be matched.
SELECT REGEXP_MATCH('barbeta', '(?:ba.)(beta)');
+------------------------------------------+
| REGEXP_MATCH('barbeta', '(?:ba.)(beta)') |
+------------------------------------------+
| ["beta"] |
+------------------------------------------+
The second group is optional if it was not captured.
SELECT REGEXP_MATCH('alphabarbeta', '(bar)(.+)?(beta)');
+--------------------------------------------------+
| REGEXP_MATCH('alphabarbeta', '(bar)(.+)?(beta)') |
+--------------------------------------------------+
| ["bar",null,"beta"] |
+--------------------------------------------------+
The '?='
expression is used for a positive lookahead; it is not a captured group.
SELECT REGEXP_MATCH('barbeta', '(bar)(?=beta)');
+------------------------------------------+
| REGEXP_MATCH('barbeta', '(bar)(?=beta)') |
+------------------------------------------+
| ["bar"] |
+------------------------------------------+
Extracting a Matched Group From a JSON Array in a Table
The REGEXP_
builtin returns a JSON array of matches.JSON_
builtin or shorthand ::$ syntax
.
Examples
Create a table and insert some values.
CREATE TABLE filenames(source_strings text);INSERT INTO filenames VALUES("file_expense_record.pdf"), ("file_24082023.pdf"),("inventory_file_new.pdf.tmp");
This statement extracts the first element from the array (note there is a zero (0) indexing).
SELECT REGEXP_MATCH(source_strings, '^(file.+)\\.pdf$') AS a, JSON_EXTRACT_STRING(a, 0) AS b FROM filenames;
+-------------------------+---------------------+
| a | b |
+-------------------------+---------------------+
| ["file_expense_record"] | file_expense_record |
| NULL | NULL |
| ["file_24082023"] | file_24082023 |
+-------------------------+---------------------+
This statement also extracts the first element but uses the shorthand format.
SELECT REGEXP_MATCH(source_strings, '^(file.+)\\.pdf$') rmatch, rmatch::$`0` FROM filenames;
+-------------------------+---------------------+
| rmatch | rmatch::$`0` |
+-------------------------+---------------------+
| ["file_expense_record"] | file_expense_record |
| NULL | NULL |
| ["file_24082023"] | file_24082023 |
+-------------------------+---------------------+
Last modified: August 24, 2023