If you need a refresher on how Regular Expressions work, check out our Interactive Tutorial first!
Java supports regular expressions through the classes in the java.util.regex
package in the standard Java library. While the Java regular expression library differs from PCRE
in some advanced features, the two share most of their syntax, so many patterns can be used
unchanged in Java and in other languages.
In the examples below, be sure to import the following package at the top of the source file if you are trying out the code:
import java.util.regex.*;
In Java, regular strings can contain special characters (also known as escape sequences),
which are characters preceded by a backslash (\) that identify a special piece of text such as
a newline (\n) or a tab character (\t). As a result, when writing regular expressions in Java code,
you need to escape the backslash in each metacharacter to let the compiler know that it's not an errant
escape sequence.
For example, take the pattern "There are \d dogs". In Java, you would escape the backslash of the digit
metacharacter by using the escape sequence \\ (effectively escaping the backslash with itself) to create
the pattern "There are \\d dogs".
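To make the escaping rule concrete, here is a minimal sketch. The class name EscapeDemo and the helper matchesDogs are made up for illustration; the only library call is the standard Pattern.matches() method. It shows that the doubled backslash in the source code becomes a single backslash in the actual string that the regex engine sees.

```java
import java.util.regex.Pattern;

public class EscapeDemo {
    // The Java literal below holds the 17-character regex: There are \d dogs
    static final String REGEX = "There are \\d dogs";

    // Hypothetical helper: true if the whole input matches the regex above.
    static boolean matchesDogs(String input) {
        return Pattern.matches(REGEX, input);
    }

    public static void main(String[] args) {
        System.out.println(REGEX);                             // prints: There are \d dogs
        System.out.println(REGEX.length());                    // prints: 17
        System.out.println(matchesDogs("There are 3 dogs"));   // prints: true
        System.out.println(matchesDogs("There are no dogs"));  // prints: false
    }
}
```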
This is only necessary when hard-coding patterns into Java code: strings that are read in from user
input or from plain text files are taken character by character, and escape sequences are not interpreted.
A common way to get around this problem is therefore to put the patterns in a Properties or resource
file, where they can be easier to read and understand. Keep in mind that the .properties format applies
its own backslash escaping, so backslashes still need to be doubled in that kind of file.
Other languages like C# or Python support the notion of raw strings, but Java still has no raw string literals in the core language; even the text blocks added in Java 15 interpret backslash escapes.
Working with regular expressions in Java generally involves instantiating a Pattern
and matching it against some text. The simplest way to do this is to call the static method Pattern.matches(),
which takes the regular expression and an input string to match it against, and simply returns whether
the pattern matches the entire string.
boolean isMatch = Pattern.matches(String regex, String inputStr)
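As a quick illustration of the one-shot call, the sketch below checks whether a string looks like a five-digit code. The class QuickMatchDemo and the helper isFiveDigits are hypothetical names chosen for this example. Note that Pattern.matches() only succeeds when the entire input matches, not just a part of it.

```java
import java.util.regex.Pattern;

public class QuickMatchDemo {
    // Hypothetical helper: true only if the whole input is exactly five digits.
    static boolean isFiveDigits(String input) {
        return Pattern.matches("\\d{5}", input);
    }

    public static void main(String[] args) {
        System.out.println(isFiveDigits("90210"));      // prints: true
        System.out.println(isFiveDigits("zip 90210"));  // prints: false (extra text)
        System.out.println(isFiveDigits("902101"));     // prints: false (six digits)
    }
}
```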
However, this does not give you any additional information, such as where in the input string the
pattern matches or the groups that matched. So for most purposes, it is both more useful and
more efficient to compile a Pattern once and then use it to create a new Matcher
for each input string that you are matching against; the Matcher holds the results of the match.
Pattern ptrn = Pattern.compile(String regex)
Matcher matcher = ptrn.matcher(String inputStr)
// Let's use a regular expression to match a date string.
Pattern ptrn = Pattern.compile("([a-zA-Z]+) (\\d+)");
Matcher matcher = ptrn.matcher("June 24");
if (matcher.matches()) {
    // Indeed, the expression "([a-zA-Z]+) (\d+)" matches the date string
    // To get the indices of the match, you can read the Matcher object's
    // start and end values.
    // This will print "Match at index [0, 7)", since the match spans the
    // whole string (the end index is exclusive)
    System.out.println("Match at index [" + matcher.start() +
        ", " + matcher.end() + ")");
    // To get the fully matched text, you can read the Matcher object's group
    // This will print "Match: June 24"
    System.out.println("Match: " + matcher.group());
}
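The efficiency point above comes from compiling the pattern once and reusing it. The following is a rough sketch of that idea, with a made-up class ReusePatternDemo and helper filterDates: a single compiled Pattern is kept in a constant (Pattern instances are immutable and safe to share), and a fresh Matcher is created for each input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReusePatternDemo {
    // Compiled once and reused for every input string.
    private static final Pattern DATE = Pattern.compile("([a-zA-Z]+) (\\d+)");

    // Hypothetical helper: keeps only the inputs that are entirely a date.
    static List<String> filterDates(List<String> inputs) {
        List<String> dates = new ArrayList<>();
        for (String input : inputs) {
            Matcher m = DATE.matcher(input);  // a new Matcher per input
            if (m.matches()) {
                dates.add(m.group());
            }
        }
        return dates;
    }

    public static void main(String[] args) {
        List<String> inputs = Arrays.asList("June 24", "not a date!", "Dec 12");
        System.out.println(filterDates(inputs));  // prints: [June 24, Dec 12]
    }
}
```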
Capturing groups in a regular expression is as straightforward as matching a string in the example
above. After using a Pattern to match an input string, you can just iterate through the extracted
groups in the returned Matcher.
// Let's use a regular expression to capture data from a few date strings.
String pattern = "([a-zA-Z]+) (\\d+)";
Pattern ptrn = Pattern.compile(pattern);
Matcher matcher = ptrn.matcher("June 24, August 9, Dec 12");
// This will print each of the matches and the index in the input string
// where the match was found:
//   Match: June 24 at index [0, 7)
//   Match: August 9 at index [9, 17)
//   Match: Dec 12 at index [19, 25)
while (matcher.find()) {
    System.out.println(String.format("Match: %s at index [%d, %d)",
        matcher.group(), matcher.start(), matcher.end()));
}
// If we are iterating over the matches again, first reset the
// matcher to start at the beginning of the input string.
matcher.reset();
// For each match, we can extract the captured information by reading the
// captured groups.
while (matcher.find()) {
    // This will print the number of capturing groups in the pattern
    // (2 here); group 0, the whole match, is not included in the count
    System.out.println(String.format("%d groups captured",
        matcher.groupCount()));
    // This will print the month and day of each match. Remember that
    // group 0 is always the whole matched text, so the month is group 1
    // and the day is group 2.
    System.out.println("Month: " + matcher.group(1) + ", Day: " +
        matcher.group(2));
    // Each group in the match also has a start and end index, which is the
    // position in the input string where the group was found.
    System.out.println(String.format("Month found at [%d, %d)",
        matcher.start(1), matcher.end(1)));
}
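Once groups are captured, they are usually converted into typed data. The sketch below is one possible way to do that, not the only one; the class GroupExtractDemo and the helper parseDates are hypothetical names. It collects each month and its day number into a map, in the order the matches are found.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupExtractDemo {
    private static final Pattern DATE = Pattern.compile("([a-zA-Z]+) (\\d+)");

    // Hypothetical helper: maps month name (group 1) to day number (group 2).
    // A repeated month overwrites the earlier entry, and a very long digit
    // run would make Integer.parseInt throw; both are ignored in this sketch.
    static Map<String, Integer> parseDates(String input) {
        Map<String, Integer> days = new LinkedHashMap<>();
        Matcher m = DATE.matcher(input);
        while (m.find()) {
            days.put(m.group(1), Integer.parseInt(m.group(2)));
        }
        return days;
    }

    public static void main(String[] args) {
        System.out.println(parseDates("June 24, August 9, Dec 12"));
        // prints: {June=24, August=9, Dec=12}
    }
}
```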
Another common task is to find and replace a part of a string using regular expressions, for example,
to replace all instances of an old email domain, or to swap the order of some text. You can do this
in Java with the Matcher.replaceAll() and Matcher.replaceFirst() methods. Both methods first reset
the matcher to the beginning of the input string; replaceAll() then replaces every match up to the
end of the string, while replaceFirst() replaces only the first match. Each returns a new string and
leaves the input string unchanged.
The replacement string can contain references to captured groups in the pattern (using the dollar sign $), or just a regular literal string.
String replacedString = matcher.replaceAll(String replacement)
String replacedString = matcher.replaceFirst(String replacement)
// Let's try to reverse the order of the day and month in a few date
// strings. Notice how the replacement string contains back references to
// the captured groups ($1 and $2). These use a dollar sign rather than a
// backslash, so no extra escaping is needed in the Java string.
Pattern ptrn = Pattern.compile("([a-zA-Z]+) (\\d+)");
Matcher matcher = ptrn.matcher("June 24, August 9, Dec 12");
// This returns a new, reordered string and prints:
//   24 of June, 9 of August, 12 of Dec
// Remember that group 0 is always the full matched text, so the
// month and day groups are numbered from 1 instead of zero.
String replacedString = matcher.replaceAll("$2 of $1");
System.out.println(replacedString);
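For comparison, here is a small sketch of replaceFirst() next to replaceAll() with a plain literal replacement. The class ReplaceDemo and its two helpers are made-up names for illustration; the date pattern is the same one used above.

```java
import java.util.regex.Pattern;

public class ReplaceDemo {
    private static final Pattern DATE = Pattern.compile("([a-zA-Z]+) (\\d+)");

    // Hypothetical helper: swaps day and month in the first date only.
    static String swapFirst(String input) {
        return DATE.matcher(input).replaceFirst("$2 of $1");
    }

    // Hypothetical helper: replaces every date with a literal placeholder.
    static String redactAll(String input) {
        return DATE.matcher(input).replaceAll("<date>");
    }

    public static void main(String[] args) {
        String input = "June 24, August 9, Dec 12";
        System.out.println(swapFirst(input));  // prints: 24 of June, August 9, Dec 12
        System.out.println(redactAll(input));  // prints: <date>, <date>, <date>
        System.out.println(input);             // prints: June 24, August 9, Dec 12
    }
}
```

If a literal replacement may itself contain a dollar sign or a backslash, Matcher.quoteReplacement() can be used to escape it first.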
Pattern Flags
When compiling a Pattern, you will notice that you can pass in additional flags to change how input
strings are matched. Most of the available flags are a convenience and can be written directly into the
regular expression itself, but some can be useful in certain cases.
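As a rough sketch of both styles, the example below passes Pattern.CASE_INSENSITIVE as the second argument to Pattern.compile() and then gets the same behavior with the inline form (?i). It also counts line starts with and without Pattern.MULTILINE. The class FlagsDemo and its helper methods are hypothetical names used only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FlagsDemo {
    // Flag passed as the second argument to Pattern.compile().
    static boolean matchesWithFlag(String input) {
        return Pattern.compile("june \\d+", Pattern.CASE_INSENSITIVE)
                .matcher(input).matches();
    }

    // Same behavior with the inline flag (?i) written into the regex itself.
    static boolean matchesInline(String input) {
        return Pattern.compile("(?i)june \\d+").matcher(input).matches();
    }

    // Counts words found at a position where ^ matches.
    // flags is 0 for no flags, or a constant such as Pattern.MULTILINE.
    static int countLineStarts(String input, int flags) {
        Matcher m = Pattern.compile("^\\w+", flags).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(matchesWithFlag("JUNE 24"));  // prints: true
        System.out.println(matchesInline("JUNE 24"));    // prints: true
        String lines = "June 24\nAugust 9";
        System.out.println(countLineStarts(lines, 0));                  // prints: 1
        System.out.println(countLineStarts(lines, Pattern.MULTILINE));  // prints: 2
    }
}
```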
Pattern.CASE_INSENSITIVE makes the pattern case insensitive so that it matches strings of different
capitalizations.
Pattern.MULTILINE is useful if your input string has newline characters (\n); it allows
the start and end metacharacters (^ and $ respectively) to match at the beginning and end of each
line instead of at the beginning and end of the whole input string.
Pattern.DOTALL allows the dot metacharacter (.) to match newline characters as well.
Pattern.LITERAL makes the pattern literal, in the sense that the escaped characters are matched
as-is. For example, the pattern "\d" will match a backslash followed by a 'd' character as opposed to
a digit character.
For more information about using regular expressions in Java, please visit the following links: