If you need a refresher on how Regular Expressions work, check out our Interactive Tutorial first!
C# supports regular expressions through the classes in the System.Text.RegularExpressions
namespace in the standard .NET framework. While the .NET regular expression library differs from
PCRE in some advanced features, the two share most of their syntax, so most patterns can be used
unchanged in C# and in other languages.
In the examples below, be sure to import the following namespaces at the top of the source file if you are trying out the code:
using System;
using System.Text.RegularExpressions;
When writing regular expressions in C#, it is recommended that you use verbatim strings
instead of regular strings. Verbatim strings begin with a special prefix (@) and signal C# not
to interpret backslashes and special metacharacters in the string, allowing you to pass them through
directly to the regular expression engine.
This means that a pattern like \n\w can be written as @"\n\w" instead of "\\n\\w", as it would
have to be in a regular string or in many other languages, which is much easier to read.
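As a minimal sketch of the point above, the following compares an escaped literal with its verbatim equivalent. The class name and the sample pattern are made up for illustration and are not part of the tutorial.

```csharp
using System;
using System.Text.RegularExpressions;

public static class VerbatimStringDemo
{
    // The same pattern written twice (illustrative names):
    // once with every backslash escaped, once as a verbatim string.
    public const string Escaped = "\\d+\\s\\w+";
    public const string Verbatim = @"\d+\s\w+";

    public static void Main()
    {
        // Both literals produce the identical sequence of characters.
        Console.WriteLine(Escaped == Verbatim);                 // True

        // The regular expression engine receives \d+\s\w+ either way.
        Console.WriteLine(Regex.IsMatch("24 June", Verbatim));  // True
    }
}
```

One detail to keep in mind: inside a verbatim string a double quote is written as two double quotes (""), since the backslash no longer acts as an escape character.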
The System.Text.RegularExpressions namespace contains a Regex class, which encapsulates the
interface to the regular expression engine and allows you to perform matches and extract
information from text using regular expressions.
To test whether a regular expression matches a string, you can use the static method Regex.Match(),
which takes an optional set of RegexOptions flags. It returns a Match object that contains
information about where the match (if any) was found.
Match match = Regex.Match(InputStr, Pattern, RegexOptions)
// Let's use a regular expression to match a date string.
string pattern = @"([a-zA-Z]+) (\d+)";
// The RegexOptions argument is optional for this call; we will go into more
// detail about it below.
Match result = Regex.Match("June 24", pattern);
if (result.Success) {
    // Indeed, the expression "([a-zA-Z]+) (\d+)" matches the date string.
    // To get the indices of the match, you can read the Match object's
    // Index and Length values.
    // This will print "Match at index [0, 7)", since the match spans the
    // whole string.
    Console.WriteLine("Match at index [{0}, {1})",
        result.Index,
        result.Index + result.Length);
    // To get the fully matched text, you can read the Match object's Value.
    // This will print "Match: June 24"
    Console.WriteLine("Match: {0}", result.Value);
    // If you want to iterate over each of the matches, you can call the
    // Match object's NextMatch() method, which will return the next Match
    // object.
    // This will print out each of the matches sequentially. The input has
    // only one match, so "Match: June 24" is printed once more.
    while (result.Success) {
        Console.WriteLine("Match: {0}", result.Value);
        result = result.NextMatch();
    }
}
If we wanted to perform a global search over the whole input string and return all the matches with
their corresponding capture data, we can instead use the static method Regex.Matches() to get a
MatchCollection, which can be iterated over and processed as in the example above.
MatchCollection matches = Regex.Matches(InputStr, Pattern, RegexOptions)
// Let's use a regular expression to capture data from a few date strings.
string pattern = @"([a-zA-Z]+) (\d+)";
MatchCollection matches = Regex.Matches("June 24, August 9, Dec 12", pattern);
// This will print the number of matches: "3 matches"
Console.WriteLine("{0} matches", matches.Count);
// This will print each of the matches and the index in the input string
// where the match was found:
//   Match: June 24 at index [0, 7)
//   Match: August 9 at index [9, 17)
//   Match: Dec 12 at index [19, 25)
foreach (Match match in matches) {
    Console.WriteLine("Match: {0} at index [{1}, {2})",
        match.Value,
        match.Index,
        match.Index + match.Length);
}
// For each match, we can extract the captured information by reading the
// captured groups.
foreach (Match match in matches) {
    GroupCollection data = match.Groups;
    // This will print the number of groups in this match. The count
    // includes group 0 (the whole match), so it is 3 for this pattern.
    Console.WriteLine("{0} groups captured in {1}", data.Count, match.Value);
    // This will print the month and day of each match. Remember that the
    // first group is always the whole matched text, so the month starts at
    // index 1 instead.
    Console.WriteLine("Month: " + data[1] + ", Day: " + data[2]);
    // Each Group in the collection also has an Index and Length member,
    // which store where in the input string the group was found.
    Console.WriteLine("Month found at [{0}, {1})",
        data[1].Index,
        data[1].Index + data[1].Length);
}
Another common task is to find and replace a part of a string using regular expressions, for example,
to replace all instances of an old email domain, or to swap the order of some text. You
can do this in C# with the static method Regex.Replace().
The replacement string can either be a replacement pattern that contains references to captured groups in the pattern (such as $1 and $2), or just a regular string.
string replaced = Regex.Replace(InputStr, Pattern, ReplacementPattern, RegexOptions)
// Let's try to reverse the order of the day and month in a few date
// strings. The replacement string contains substitutions ($1 and $2) that
// refer to the captured groups. The dollar sign is not an escape character
// in C# strings, so the verbatim prefix is optional here.
string pattern = @"([a-zA-Z]+) (\d+)";
// This will reorder each date and print:
// 24 of June, 9 of August, 12 of Dec
// Remember that the first group is always the full matched text, so the
// month and day indices start from 1 instead of zero.
string replacedString = Regex.Replace("June 24, August 9, Dec 12",
    pattern, @"$2 of $1");
Console.WriteLine(replacedString);
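The other task mentioned above, replacing an old email domain, could look roughly like the sketch below. The domains old-example.com and new-example.com, the class name and the MigrateDomain helper are hypothetical, and the pattern for the local part of the address is deliberately simple rather than a full email validator.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DomainReplaceDemo
{
    // Hypothetical helper: rewrites addresses at old-example.com so that
    // they point at new-example.com (both domains are made up).
    public static string MigrateDomain(string text)
    {
        // ([\w.+-]+) captures the part before the @ sign.
        // The dot in the domain is escaped so it only matches a literal '.',
        // and \b stops the match at the end of "com".
        string pattern = @"([\w.+-]+)@old-example\.com\b";

        // $1 re-inserts the captured local part in front of the new domain.
        return Regex.Replace(text, pattern, "$1@new-example.com");
    }

    public static void Main()
    {
        string input = "Contact ann@old-example.com or bob.smith@old-example.com";
        // Prints: Contact ann@new-example.com or bob.smith@new-example.com
        Console.WriteLine(MigrateDomain(input));
    }
}
```

Addresses at any other domain are left untouched, because Regex.Replace() only rewrites the parts of the input that the pattern actually matches.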
RegexOptions Enums
In the regular expression methods above, you will notice that each of them also takes an optional
RegexOptions argument. Most of the available flags are a convenience and can be written into the
regular expression itself directly, but some can be useful in certain cases.
RegexOptions.Compiled speeds up matching multiple input strings with the same pattern by allowing
the regular expression engine to compile the pattern first, at the cost of a longer startup time.
It is off by default and has to be requested explicitly.
RegexOptions.IgnoreCase makes the pattern case insensitive so that it matches strings of different
capitalizations.
RegexOptions.Multiline allows the start and end metacharacters (^ and $ respectively) to match at
the beginning and end of each line instead of only at the beginning and end of the whole input
string. It is only needed if your input string has newline characters (\n) and you want to anchor
to individual lines.
RegexOptions.RightToLeft makes the engine search from the end of the input string towards the
beginning instead of from left to right.
RegexOptions.Singleline allows the dot (.) metacharacter to match all characters, including the
newline character (\n).
While the static methods above are fine for normal processing, if you are testing millions of input
strings against the same pattern, you can reduce object allocations and get faster performance by
instantiating a Regex object with the pattern in the constructor.
All of the methods above are also available as instance methods on this Regex object, except that
you do not have to pass in the pattern again with each call.
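To make this concrete, here is a small sketch of a Regex object that is constructed once with a pattern and a combination of options, then reused. The class name, field names, helper methods and the sample input are all illustrative assumptions; the second field shows the same two options written inline in the pattern, as an instance of flags being expressed in the regular expression itself.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ReusableRegexDemo
{
    // Constructed once, then reused for every input string.
    // IgnoreCase lets [a-z] match capital letters; Multiline lets ^ and $
    // match at the start and end of each line.
    private static readonly Regex DateLine = new Regex(
        @"^([a-z]+) (\d+)$",
        RegexOptions.IgnoreCase | RegexOptions.Multiline);

    // The same two options written inline at the start of the pattern.
    private static readonly Regex DateLineInline =
        new Regex(@"(?im)^([a-z]+) (\d+)$");

    // Instance methods take only the input; the pattern is already stored.
    public static int CountDates(string input) => DateLine.Matches(input).Count;

    public static int CountDatesInline(string input) => DateLineInline.Matches(input).Count;

    public static string FirstMonth(string input) => DateLine.Match(input).Groups[1].Value;

    public static void Main()
    {
        string input = "June 24\naugust 9\nDEC 12";
        Console.WriteLine(CountDates(input));        // 3
        Console.WriteLine(CountDatesInline(input));  // 3
        Console.WriteLine(FirstMonth(input));        // June
    }
}
```

Storing the Regex in a static readonly field is one common way to make sure it is built only once; whether adding RegexOptions.Compiled on top of that pays off depends on how many inputs you process, so it is worth measuring before relying on it.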
For more information, you can read the documentation on Static vs Instance methods on MSDN.
For more information about using regular expressions in C#, please visit the following links: