Regular expression functions#
All of the regular expression functions use the Java pattern syntax, with a few notable exceptions:
- When using multi-line mode (enabled via the - (?m)flag), only- \nis recognized as a line terminator. Additionally, the- (?d)flag is not supported and must not be used.
- Case-insensitive matching (enabled via the - (?i)flag) is always performed in a Unicode-aware manner. However, context-sensitive and local-sensitive matching is not supported. Additionally, the- (?u)flag is not supported and must not be used.
- Surrogate pairs are not supported. For example, - \uD800\uDC00is not treated as- U+10000and must be specified as- \x{10000}.
- Boundaries ( - \b) are incorrectly handled for a non-spacing mark without a base character.
- \Qand- \Eare not supported in character classes (such as- [A-Z123]) and are instead treated as literals.
- Unicode character classes ( - \p{prop}) are supported with the following differences:- All underscores in names must be removed. For example, use - OldItalicinstead of- Old_Italic.
- Scripts must be specified directly, without the - Is,- script=or- sc=prefixes. Example:- \p{Hiragana}
- Blocks must be specified with the - Inprefix. The- block=and- blk=prefixes are not supported. Example:- \p{Mongolian}
- Categories must be specified directly, without the - Is,- general_category=or- gc=prefixes. Example:- \p{L}
- Binary properties must be specified directly, without the - Is. Example:- \p{NoncharacterCodePoint}
 
- regexp_count(string, pattern) bigint#
- Returns the number of occurrence of - patternin- string:- SELECT regexp_count('1a 2b 14m', '\s*[a-z]+\s*'); -- 3 
- regexp_extract_all(string, pattern)#
- Returns the substring(s) matched by the regular expression - patternin- string:- SELECT regexp_extract_all('1a 2b 14m', '\d+'); -- [1, 2, 14] 
- regexp_extract_all(string, pattern, group)
- Finds all occurrences of the regular expression - patternin- stringand returns the capturing group number- group:- SELECT regexp_extract_all('1a 2b 14m', '(\d+)([a-z]+)', 2); -- ['a', 'b', 'm'] 
- regexp_extract(string, pattern) varchar#
- Returns the first substring matched by the regular expression - patternin- string:- SELECT regexp_extract('1a 2b 14m', '\d+'); -- 1 
- regexp_extract(string, pattern, group) varchar
- Finds the first occurrence of the regular expression - patternin- stringand returns the capturing group number- group:- SELECT regexp_extract('1a 2b 14m', '(\d+)([a-z]+)', 2); -- 'a' 
- regexp_like(string, pattern) boolean#
- Evaluates the regular expression - patternand determines if it is contained within- string.- The - patternonly needs to be contained within- string, rather than needing to match all of- string. In other words, this performs a contains operation rather than a match operation. You can match the entire string by anchoring the pattern using- ^and- $:- SELECT regexp_like('1a 2b 14m', '\d+b'); -- true 
- regexp_position(string, pattern) integer#
- Returns the index of the first occurrence (counting from 1) of - patternin- string. Returns -1 if not found:- SELECT regexp_position('I have 23 apples, 5 pears and 13 oranges', '\b\d+\b'); -- 8 
- regexp_position(string, pattern, start) integer
- Returns the index of the first occurrence of - patternin- string, starting from- start(include- start). Returns -1 if not found:- SELECT regexp_position('I have 23 apples, 5 pears and 13 oranges', '\b\d+\b', 5); -- 8 SELECT regexp_position('I have 23 apples, 5 pears and 13 oranges', '\b\d+\b', 12); -- 19 
- regexp_position(string, pattern, start, occurrence) integer
- Returns the index of the nth - occurrenceof- patternin- string, starting from- start(include- start). Returns -1 if not found:- SELECT regexp_position('I have 23 apples, 5 pears and 13 oranges', '\b\d+\b', 12, 1); -- 19 SELECT regexp_position('I have 23 apples, 5 pears and 13 oranges', '\b\d+\b', 12, 2); -- 31 SELECT regexp_position('I have 23 apples, 5 pears and 13 oranges', '\b\d+\b', 12, 3); -- -1 
- regexp_replace(string, pattern) varchar#
- Removes every instance of the substring matched by the regular expression - patternfrom- string:- SELECT regexp_replace('1a 2b 14m', '\d+[ab] '); -- '14m' 
- regexp_replace(string, pattern, replacement) varchar
- Replaces every instance of the substring matched by the regular expression - patternin- stringwith- replacement. Capturing groups can be referenced in- replacementusing- $gfor a numbered group or- ${name}for a named group. A dollar sign (- $) may be included in the replacement by escaping it with a backslash (- \$):- SELECT regexp_replace('1a 2b 14m', '(\d+)([ab]) ', '3c$2 '); -- '3ca 3cb 14m' 
- regexp_replace(string, pattern, function) varchar
- Replaces every instance of the substring matched by the regular expression - patternin- stringusing- function. The lambda expression- functionis invoked for each match with the capturing groups passed as an array. Capturing group numbers start at one; there is no group for the entire match (if you need this, surround the entire expression with parenthesis).- SELECT regexp_replace('new york', '(\w)(\w*)', x -> upper(x[1]) || lower(x[2])); --'New York' 
- regexp_split(string, pattern)#
- Splits - stringusing the regular expression- patternand returns an array. Trailing empty strings are preserved:- SELECT regexp_split('1a 2b 14m', '\s*[a-z]+\s*'); -- [1, 2, 14, ]