Using Regular Expression Commands
Suppose you want to find the starting position of a string within another string. Call the former the search string and the latter the target string. To find dog
within The dog chased the cat
, you would run the query SELECT INSTR('The dog chased the cat','dog');
. This query returns 5
, indicating the position where it found dog
.
But INSTR does not allow search strings that contain placeholders. The following examples show search strings containing placeholders, denoted by the ?
symbol.
Search String | Placeholder |
---|---|
| Any string |
| Any three letter string |
| The string |
Regular expressions enable you to define search strings with placeholders. You specify these strings using patterns. For instance, the pattern The .{3,5} chased the mouse|frog
matches many target strings. Two of them are The dog chased the frog
and The rat chased the mouse
.
Patterns contain literals and metacharacters. Literals are characters that a pattern matcher attempts to locate in the target string exactly as the characters are written in the search string. Metacharacters describe the placeholders that the pattern uses.
The pattern The .{3,5} chased the mouse|frog
contains the literals The
,chased
,the
,mouse
and frog
. It contains the metacharacters .
,{3,5}
and |
.
Regular Expression Functions
SingleStoreDB has five built-in functions that allow you to work with regular expressions.
REGEXP
: Searches a string for a regular expression pattern and returns1
if the string is found,0
otherwise.RLIKE
: Has the same functionality asREGEXP
.REGEXP_INSTR
: Searches a string for a regular expression pattern and returns an integer indicating the beginning position or ending position of the matched substring.REGEXP_SUBSTR
: Searches a string for a regular expression pattern and returns the matching substring.REGEXP_REPLACE
: Searches a string for the first occurrence of a regular expression pattern and replaces that occurrence with a replacement string. Can also perform a search and replace of all occurrences.
Regular Expression Format - Advanced Regular Expression (ARE)
SingleStoreDB supports advanced regular expressions (ARE) as defined in PostgreSQL. ARE are nearly a superset of ERE and support common Perl regular expression extensions.
You can use the ARE format with the built-in SingleStoreDB functions RLIKE
, REGEXP
,REGEXP_INSTR
,REGEXP_SUBSTR
, and REGEXP_REPLACE
, which are described in the next section.
Set the global variable regexp_format
to specify the regular expression format to be used by the built-in functions you call.
To use ARE globally, run:
SET GLOBAL regexp_format=advanced;
You should set regexp_format
to 'advanced'
if you are developing new regular expression logic.
Advanced Regular Expression (ARE) Support
SingleStoreDB supports ARE as defined in PostgreSQL. This support is summarized below.
Backreferences
A pattern containing a backreference matches the same string that it matched earlier in the string. The earlier string must be enclosed in parentheses. A pattern may contain multiple backreferences that are denoted by \\n
, where n
is the number of the backreference.
For example, the pattern a(bc)d(ef)g\\1\\2
matches abcdefgbcef
.
Escapes
Escapes make it easier to write some types of patterns. Common escapes are listed below. See section 9.7.3.3 of the PostgreSQL pattern matching documentation for a full explanation of the supported escapes.
Metacharacter | Meaning |
---|---|
| Matches any digit. |
| Matches a space. |
| Matches any alpha-numeric character including the underscore. |
| Matches any character except a digit. |
| Matches any character except a space. |
| Matches any non-alpha-numeric character. Does not match the underscore. |
| Matches the beginning of a word. |
| Matches the end of a word. |
Note
In PostgreSQL, escapes in patterns begin with one \
character.
In SingleStoreDB, escapes in patterns also begin with one \
character. Additionally, a \
literal in an SingleStoreDB string needs to escaped with a leading \
. Hence, pattern strings containing escapes use two \
characters. For example, SELECT REGEXP_INSTR('dog58','\\d');