REGEXP_SUBSTR

Searches a string for a regular expression pattern and returns the matching substring.

REGEXP_SUBSTR supports the extended and advanced regular expression formats. You can set the global variable regexp_format to specify which format you want to use.

Syntax

REGEXP_SUBSTR (<source_string>, <pattern> [, <position> [, <occurrence> [, <parameters> ] ] ] )

Arguments

  • source_string: A string expression, such as a column name, to be searched.

  • pattern: A string literal representing a regular expression

  • position: Optional if occurrence and parameters are not specified.

    A positive integer indicating the position within source_string to begin searching. The position is based on the number of characters, not bytes, so that multibyte characters are counted as single characters. The default is 1. If position is less than 1, the search begins at the first character of source_string. If position is greater than the number of characters in source_string, the result is NULL.

  • occurrence: Optional if parameters is not specified.

    A positive integer indicating which occurrence of pattern in source_string to return. The search for the occurrence begins at position. If occurrence is not present, defaults to 1. If occurrence is less than 1 or greater than the number of characters in source_string, the search is ignored and REGEXP_SUBSTR returns NULL.

  • parameters:Optional. If present, has one of the following values:

    • c: Perform case-sensitive matching.

    • i: Perform case-insensitive matching.

    • If c or i is not specified, performs case-sensitive matching.

Remarks

  • REGEXP_SUBSTR returns NULL if no match is found, source_string is NULL, or pattern is NULL.

Examples

The following examples demonstrate how REGEXP_SUBSTR works when the global variable regexp_format is set to 'extended'.

Call to REGEXP_SUBSTR

Result

Comments

SELECT REGEXP_SUBSTR('adog','o');

o

Returns the substring o.

SELECT REGEXP_SUBSTR('adog','.o');

do

Returns the character preceding the first occurrence of o, followed by the substring o.

SELECT REGEXP_SUBSTR('dog58','[[:digit:]]');

5

Returns the first digit.

SELECT REGEXP_SUBSTR('1dogbat','[[:alpha:]]');

d

Returns the first alphabet.

SELECT REGEXP_SUBSTR('rat','b|c');

NULL

Looks for b or c. Since neither character is found, returns NULL.

SELECT REGEXP_SUBSTR('mousecat','cat$');

cat

Returns the substring cat that appears at the end of the string.

SELECT REGEXP_SUBSTR('2dogcat','^dog');

NULL

Searches for dog at the beginning of the string. Since dog is not found at the beginning of the string, returns NULL.

SELECT REGEXP_SUBSTR('catcat','ca.',1);

cat

Returns the first occurrence of ca followed by any character.

SELECT REGEXP_SUBSTR('catcat','cat',1,2);

cat

Returns the second occurrence of cat.

SELECT REGEXP_SUBSTR('bat3BaT','BaT',1,1,'c');

BaT

Returns the first occurrence of BaT (case sensitive search).

SELECT REGEXP_SUBSTR('bat3BaT','BaT',1,1,'i');

bat

Returns the first occurrence of BaT (case-insensitive search).

The following examples demonstrate how REGEXP_SUBSTR works when the global variable regexp_format is set to 'advanced'.

Call to REGEXP_SUBSTR

Result

Comments

SELECT REGEXP_SUBSTR('dog58','\\d');

5

Returns the first digit.

SELECT REGEXP_SUBSTR('dog58','\\D');

d

Returns the first non-digit.

SELECT REGEXP_SUBSTR('%26dog','\\w');

2

Returns the first alpha-numeric character.

SELECT REGEXP_SUBSTR('%26dog','\\W');

%

Returns the first non-alpha-numeric character.

SELECT REGEXP_SUBSTR('cat dog','\\mdog');

dog

Returns the first occurrence of the word dog.

Using REGEXP_SUBSTR With a Table

The following example shows how to call REGEXP_SUBSTR once per table row. This example uses extended regular expression matching. Advanced regular expression matching could be used as well.

SET GLOBAL regexp_format = 'extended';
DROP DATABASE IF EXISTS singlestore_docs_example;
CREATE DATABASE singlestore_docs_example;
use singlestore_docs_example;
CREATE TABLE regexample_table(id INT, text_field TEXT);
INSERT INTO regexample_table VALUES (1, 'the'), (2, 'athe'), (3, 'abthe'),
(4, 'abcthe'), (5, 'thethe');
SELECT id, text_field, REGEXP_SUBSTR(text_field,'..the') FROM regexample_table
ORDER BY id;
+------+------------+-----------------------------------+
| id   | text_field | REGEXP_SUBSTR(text_field,'..the') |
+------+------------+-----------------------------------+
|    1 | the        | NULL                              |
|    2 | athe       | NULL                              |
|    3 | abthe      | abthe                             |
|    4 | abcthe     | bcthe                             |
|    5 | thethe     | hethe                             |
+------+------------+-----------------------------------+

Last modified: August 9, 2023

Was this article helpful?