REGEXP_MATCH()

The REGEXP_MATCH() function returns a JSON array of matching substring(s) within the first match of a regular expression pattern to a string.

REGEXP_MATCH supports the extended and advanced regular expression formats. You can set the global variable regexp_format to specify which format you want to use.

Syntax

REGEXP_MATCH (<source_string>, <pattern> [, <parameters> ] ] )

Arguments

  • source_string: The string from which to extract substrings that match a regular expression.

  • pattern: A string representing a regular expression.

  • parameters:An optional argument, if present, has one of the following values:

    • c: Perform case-sensitive matching.

    • i: Perform case-insensitive matching.

    • If c or i is not specified, performs case-sensitive matching.

Remarks

  • If no match is found, the REGEXP_MATCH result will be NULL.

  • If a match is found, and the pattern does not contain parenthesized sub-expressions, the result is a single-element text array containing the substring that matches the entire pattern. This result is similar to the REGEXP_SUBSTR() function.

  • If a match is found, and the pattern contains parenthesized sub-expressions, the result is a JSON array of strings, whose n'th element is the substring matching the n'th parenthesized sub-expression of the pattern (not counting "non-capturing" parentheses). See the examples section for more details.

Examples

The following examples demonstrate how REGEXP_MATCH works when the global variable REGEXP_FORMAT is set to advanced.

SET GLOBAL REGEXP_FORMAT = 'advanced';

Searches for a group of characters between the alpha and gamma in the substrings.

SELECT REGEXP_MATCH('alphabetagammadelta', 'alpha(.*)gamma');
+-------------------------------------------------------+
| REGEXP_MATCH('alphabetagammadelta', 'alpha(.*)gamma') |
+-------------------------------------------------------+
| ["beta"]                                              |
+-------------------------------------------------------+

Searches for a group of digits, which occurs after the word gamma.

SELECT REGEXP_MATCH('alphabetagamma579XYZ', 'gamma([0-9]+)');
+-------------------------------------------------------+
| REGEXP_MATCH('alphabetagamma579XYZ', 'gamma([0-9]+)') |
+-------------------------------------------------------+
| ["579"]                                               |
+-------------------------------------------------------+

There is no capturing group (i.e., parenthesized subexpressions), so the returned values matches the entire substring pattern.

SELECT REGEXP_MATCH('alphabetagammadelta', 'alpha.*gamma');
+-----------------------------------------------------+
| REGEXP_MATCH('alphabetagammadelta', 'alpha.*gamma') |
+-----------------------------------------------------+
| ["alphabetagamma"]                                  |
+-----------------------------------------------------+

This example matches data based on the each part of the date (i.e., by month, day, and year). It will also accept invalid dates (e.g., 29-11-000).

To use \d expression inside the pattern, the regexp_format global variable should be set to advanced. Also, when using SingleStore strings an escape should be used.

SELECT REGEXP_MATCH('04-08-2023', '(\\d{2})-(\\d{2})-(\\d{4})');
+----------------------------------------------------------+
| REGEXP_MATCH('04-08-2023', '(\\d{2})-(\\d{2})-(\\d{4})') |
+----------------------------------------------------------+
| ["04","08","2023"]                                       |
+----------------------------------------------------------+

Returns the first occurrences of a group of lowercase characters followed by groups of digits.

SELECT REGEXP_MATCH('abc357cde123abc', '[a-z]+([0-9]+)');
+---------------------------------------------------+
| REGEXP_MATCH('abc357cde123abc', '[a-z]+([0-9]+)') |
+---------------------------------------------------+
| ["357"]                                           |
+---------------------------------------------------+

Returns NULL since there is no match.

SELECT REGEXP_MATCH('alphabetagammadelta', 'alpha(.*)eplison');
+---------------------------------------------------------+
| REGEXP_MATCH('alphabetagammadelta', 'alpha(.*)eplison') |
+---------------------------------------------------------+
| NULL                                                    |
+---------------------------------------------------------+

This is a case-sensitive search.

SELECT REGEXP_MATCH('ALPHAbetagammadelta', 'alpha(.*)gamma', 'c');
+------------------------------------------------------------+
| REGEXP_MATCH('ALPHAbetagammadelta', 'alpha(.*)gamma', 'c') |
+------------------------------------------------------------+
| NULL                                                       |
+------------------------------------------------------------+

This is a case-sensitive search.

SELECT REGEXP_MATCH('ALPHAbetagammadelta', 'alpha(.*)gamma', 'i');
+------------------------------------------------------------+
| REGEXP_MATCH('ALPHAbetagammadelta', 'alpha(.*)gamma', 'i') |
+------------------------------------------------------------+
| ["beta"]                                                   |
+------------------------------------------------------------+

The first group is non-capturing (?:) but needs to be matched.

SELECT REGEXP_MATCH('barbeta', '(?:ba.)(beta)');
+------------------------------------------+
| REGEXP_MATCH('barbeta', '(?:ba.)(beta)') |
+------------------------------------------+
| ["beta"]                                 |
+------------------------------------------+

The second group is optional if it was not captured. JSON null will be present in the output array.

SELECT REGEXP_MATCH('alphabarbeta', '(bar)(.+)?(beta)');
+--------------------------------------------------+
| REGEXP_MATCH('alphabarbeta', '(bar)(.+)?(beta)') |
+--------------------------------------------------+
| ["bar",null,"beta"]                              |
+--------------------------------------------------+

The '?=' expression is used for a positive lookahead; it is not a captured group.

SELECT REGEXP_MATCH('barbeta', '(bar)(?=beta)');
+------------------------------------------+
| REGEXP_MATCH('barbeta', '(bar)(?=beta)') |
+------------------------------------------+
| ["bar"]                                  |
+------------------------------------------+

Extracting a Matched Group From a JSON Array in a Table

The REGEXP_MATCH builtin returns a JSON array of matches. Sometimes, you may need to obtain the exact matched string (i.e., extract the element from the array); to do this, use the JSON_EXTRACT_STRING builtin or shorthand ::$ syntax.

Examples

Create a table and insert some values.

CREATE TABLE filenames(source_strings text);
INSERT INTO filenames VALUES("file_expense_record.pdf"), ("file_24082023.pdf"),("inventory_file_new.pdf.tmp");

This statement extracts the first element from the array (note there is a zero (0) indexing).

SELECT REGEXP_MATCH(source_strings, '^(file.+)\\.pdf$') AS a, JSON_EXTRACT_STRING(a, 0) AS b FROM filenames;
+-------------------------+---------------------+
| a                       | b                   |
+-------------------------+---------------------+
| ["file_expense_record"] | file_expense_record |
| NULL                    | NULL                |
| ["file_24082023"]       | file_24082023       |
+-------------------------+---------------------+

This statement also extracts the first element but uses the shorthand format.

SELECT REGEXP_MATCH(source_strings, '^(file.+)\\.pdf$') rmatch, rmatch::$`0` FROM filenames;
+-------------------------+---------------------+
| rmatch                  | rmatch::$`0`        |
+-------------------------+---------------------+
| ["file_expense_record"] | file_expense_record |
| NULL                    | NULL                |
| ["file_24082023"]       | file_24082023       |
+-------------------------+---------------------+

Last modified: August 24, 2023

Was this article helpful?

Verification instructions

Note: You must install cosign to verify the authenticity of the SingleStore file.

Use the following steps to verify the authenticity of singlestoredb-server, singlestoredb-toolbox, singlestoredb-studio, and singlestore-client SingleStore files that have been downloaded.

You may perform the following steps on any computer that can run cosign, such as the main deployment host of the cluster.

  1. (Optional) Run the following command to view the associated signature files.

    curl undefined
  2. Download the signature file from the SingleStore release server.

    • Option 1: Click the Download Signature button next to the SingleStore file.

    • Option 2: Copy and paste the following URL into the address bar of your browser and save the signature file.

    • Option 3: Run the following command to download the signature file.

      curl -O undefined
  3. After the signature file has been downloaded, run the following command to verify the authenticity of the SingleStore file.

    echo -n undefined |
    cosign verify-blob --certificate-oidc-issuer https://oidc.eks.us-east-1.amazonaws.com/id/CCDCDBA1379A5596AB5B2E46DCA385BC \
    --certificate-identity https://kubernetes.io/namespaces/freya-production/serviceaccounts/job-worker \
    --bundle undefined \
    --new-bundle-format -
    Verified OK