Regular expressions
A regular expression is a sequence of characters that acts as a pattern for matching and manipulating strings. Regular expressions are used in the fn:matches, fn:replace, and fn:tokenize functions.
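As a rough illustration of the three kinds of operations, the following sketch uses Python's re module as a stand-in. That choice is an assumption of this example, not part of the XQuery functions: re.search, re.sub, and re.split only approximate fn:matches, fn:replace, and fn:tokenize, and the sample strings are made up.

```python
import re

# Python's re module is a stand-in for the XQuery functions here;
# the patterns and input strings are hypothetical samples.

# Roughly like fn:matches: does the pattern match anywhere in the input?
has_digit = re.search(r"[0-9]", "hello3world") is not None

# Roughly like fn:replace: replace every match with a replacement string.
masked = re.sub(r"[0-9]+", "#", "order 66 of 1000")

# Roughly like fn:tokenize: split the input wherever the pattern matches.
tokens = re.split(r",", "red,green,blue")

print(has_digit)  # True
print(masked)     # order # of #
print(tokens)     # ['red', 'green', 'blue']
```

Edge-case behavior can differ between regular expression engines, so treat the output as indicative only.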
Syntax
- character
- In a regular expression, character is a normal XML character that is not a metacharacter.
- Metacharacters
- Metacharacters are control characters in regular expressions.
The following regular expression metacharacters are currently supported:
- backslash (\)
- Begins a character class escape. A character class escape indicates that the metacharacter that follows is to be used as a character, instead of a metacharacter.
- period (.)
- Matches any single character except a newline character (\n).
- caret (^)
- If the caret appears outside of a character class, the characters that follow the caret must match at the start of the input string or, for multi-line input strings, at the start of a line. An input string is considered to be a multi-line input string if the function that uses the input string includes the m flag.
If the caret appears as the first character within a character class, the caret acts as a not-sign. The character class then matches any single character that does not appear in the character group.
- dollar sign ($)
- Matches the end of the input string or, for multi-line input strings, the end of a line. An input string is considered to be a multi-line input string if the function that uses the input string includes the m flag.
- question mark (?)
- Matches the preceding character or character group in the regular expression zero or one time.
- asterisk (*)
- Matches the preceding character or character group in the regular expression zero or more times.
- plus sign (+)
- Matches the preceding character or character group in the regular expression one or more times.
- {n}
- Matches the preceding character or character group in the regular expression exactly n times. n must be a positive integer.
- {n,m}
- Matches the preceding character or character group in the regular expression at least n times, but not more than m times. n must be a positive integer, and m must be a positive integer that is greater than or equal to n.
- {n,}
- Matches the preceding character or character group in the regular expression at least n times. n must be a positive integer.
- opening bracket ([) and closing bracket (])
- The opening and closing brackets and the enclosed character group
define a character class. For example, the character class [aeiou]
matches any single vowel. Character classes also support character
ranges. For example:
- [a-z] means any lowercase letter.
- [a-p] means any lowercase letter from a through p.
- [0-9] means any single digit.
- opening parenthesis (() and closing parenthesis ())
- Opening and closing parentheses denote a group of characters within a regular expression. You can then apply an operator, such as a repetition operator, to the entire group.
- character-class-escape
- A character class escape specifies that you want certain special
characters to be treated as characters, instead of performing some
function. A character class escape consists of a backslash (\), followed
by a single metacharacter, newline character, return character, or
tab character. The following table lists the character class escapes.
Table 1. Single-character character class escapes
Character escape    Character represented    Description
\n                  #x0A                     Newline
\r                  #x0D                     Return
\t                  #x09                     Tab
\\                  \                        Backslash
\|                  |                        Pipe
\.                  .                        Period
\?                  ?                        Question mark
\*                  *                        Asterisk
\+                  +                        Plus sign
\(                  (                        Opening parenthesis
\)                  )                        Closing parenthesis
\{                  {                        Opening curly brace
\}                  }                        Closing curly brace
\$                  $                        Dollar sign
\-                  -                        Dash
\[                  [                        Opening bracket
\]                  ]                        Closing bracket
\^                  ^                        Caret
- character-group
- A character group is the set of characters in a character class. The character class is used for matching. It can consist of characters, character ranges, character class escapes, and an optional opening caret. If the caret is included, it indicates the complement of the set of characters that are defined by the rest of the character group.
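The syntax rules above can be tried out in a short sketch. It uses Python's re module as a stand-in for the XQuery regular expression engine, which is an assumption of this example; re.MULTILINE plays the role of the m flag, and all input strings are made up. Engines can differ in edge cases, so this is an illustration rather than a specification.

```python
import re

# Python's re module stands in for the XQuery engine; re.MULTILINE plays the
# role of the m flag. All sample strings are hypothetical.

text = "hello world\nworld hello"

# Without the m flag, ^ matches only at the start of the input string.
starts_single = re.findall(r"^world", text)
# With the m flag, ^ and $ also match at the start and end of each line.
starts_multi = re.findall(r"^world", text, re.MULTILINE)
ends_multi = re.findall(r"world$", text, re.MULTILINE)

# A repetition operator applied to a group: "ab" two or three times in a row.
group_ok = re.search(r"^(ab){2,3}$", "ababab") is not None
group_too_few = re.search(r"^(ab){2,3}$", "ab") is not None

# A leading caret inside a character class negates the character group.
non_vowels = re.findall(r"[^aeiou]", "regex")

# A character class escape makes a metacharacter literal: \. is only a period.
literal_dot = re.search(r"a\.b", "a.b") is not None
not_any_char = re.search(r"a\.b", "axb") is not None

print(starts_single)               # []
print(starts_multi)                # ['world']
print(ends_multi)                  # ['world']
print(group_ok, group_too_few)     # True False
print(non_vowels)                  # ['r', 'g', 'x']
print(literal_dot, not_any_char)   # True False
```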
Examples
The following examples demonstrate how each of the metacharacters affects a regular expression.
- "hello[0-9]world" matches "hello3world", but not "hello world".
- "^hello" matches this text:
hello world
However, "^hello" does not match this text:
world hello
- "hello$" matches this text:
world hello
However, "hello$" does not match this text:
hello world
- "(ca)|(bd)" matches "arcade" or "abdicate".
- "^((ca)|(bd))" does not match "arcade" or "abdicate".
- "w?s" matches "ws" or "s".
- "w.*s" matches "was" or "waters".
- "be+t" matches "beet" or "bet".
- "be{1,3}t" matches "bet", "beet", or "beeet".
- "\[n\]" matches "[n]".
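The examples above can be checked mechanically. The following sketch does so with Python's re module, which is an assumption of this example rather than the XQuery engine itself. The matches helper is hypothetical and uses re.search, so that, like the examples, a pattern can match anywhere in the input unless it is anchored.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Hypothetical helper: True if pattern matches anywhere in text."""
    return re.search(pattern, text) is not None

# (pattern, input string, expected result), taken from the examples above.
cases = [
    (r"hello[0-9]world", "hello3world", True),
    (r"hello[0-9]world", "hello world", False),
    (r"^hello", "hello world", True),
    (r"^hello", "world hello", False),
    (r"hello$", "world hello", True),
    (r"hello$", "hello world", False),
    (r"(ca)|(bd)", "arcade", True),
    (r"(ca)|(bd)", "abdicate", True),
    (r"^((ca)|(bd))", "arcade", False),
    (r"^((ca)|(bd))", "abdicate", False),
    (r"w?s", "ws", True),
    (r"w?s", "s", True),
    (r"w.*s", "was", True),
    (r"w.*s", "waters", True),
    (r"be+t", "beet", True),
    (r"be+t", "bet", True),
    (r"be{1,3}t", "beeet", True),
    (r"\[n\]", "[n]", True),
]

# Collect every case whose actual result differs from the expected one.
failures = [(p, t) for p, t, expected in cases if matches(p, t) != expected]
print(failures)  # []
```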