Using Regular Expressions in C#

If you need a refresher on how Regular Expressions work, check out our Interactive Tutorial first!

C# supports regular expressions through the classes in the System.Text.RegularExpressions namespace of the standard .NET class library. While the .NET regular expression library differs from PCRE in some advanced features, the two share most of their syntax, so most patterns can be used unchanged in C# and in other languages.

In the examples below, be sure to import the following namespaces at the top of the source file if you are trying out the code (System is needed for Console):

    using System;
    using System.Text.RegularExpressions;

Verbatim String Literals

When writing regular expressions in C#, it is recommended that you use verbatim strings instead of regular strings. Verbatim strings begin with a special prefix (@) and tell the C# compiler not to process backslash escape sequences in the string, so backslashes are passed through directly to the regular expression engine.

This means that a pattern like \n\w can be written as @"\n\w" instead of "\\n\\w" as in many other languages, which is much easier to read.
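As a quick check of the point above, the following sketch compares the two spellings of one pattern. The class name VerbatimDemo and the sample inputs are made up for illustration; only the Regex calls come from the .NET library.

```csharp
using System;
using System.Text.RegularExpressions;

public static class VerbatimDemo
{
    // The same pattern written as a regular string and as a verbatim string.
    public const string Regular = "\\d+\\.\\d+";
    public const string Verbatim = @"\d+\.\d+";

    // Inside a verbatim string, a double quote is written as two double quotes.
    public const string Quoted = @"""(\w+)""";

    public static void Main()
    {
        // Both literals produce the identical sequence of characters.
        Console.WriteLine(Regular == Verbatim);                        // True

        // The verbatim pattern reaches the regex engine unchanged.
        Console.WriteLine(Regex.IsMatch("price: 3.14", Verbatim));     // True

        // Group 1 holds the word between the quotes.
        Console.WriteLine(Regex.Match("say \"hello\"", Quoted).Groups[1].Value);  // hello
    }
}
```

The only character that still needs special treatment in a verbatim string is the double quote itself.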

Matching a string

In the System.Text.RegularExpressions namespace is a Regex class, which encapsulates the interface to the regular expressions engine and allows you to perform matches and extract information from text using regular expressions.

To test whether a regular expression matches a string, you can use the static method Regex.Match(), which takes an optional RegexOptions argument. It returns a Match object that contains information about where the match (if any) was found.

    Match match = Regex.Match(InputStr, Pattern, RegexOptions)

    // Let's use a regular expression to match a date string.
    string pattern = @"([a-zA-Z]+) (\d+)";

    // The RegexOptions argument is optional; it is covered in more detail below.
    Match result = Regex.Match("June 24", pattern);
    if (result.Success)
    {
        // The expression "([a-zA-Z]+) (\d+)" matches the date string.
        // To get the position of the match, read the Match object's
        // Index and Length values.
        // This prints "Match at index [0, 7)", since the match spans the
        // whole string.
        Console.WriteLine("Match at index [{0}, {1})",
            result.Index, result.Index + result.Length);

        // To get the fully matched text, read the Match object's Value.
        // This prints "Match: June 24".
        Console.WriteLine("Match: {0}", result.Value);

        // To iterate over every match in the input, call the Match object's
        // NextMatch() method, which returns the next Match object.
        // This prints each match in order (here there is only one).
        while (result.Success)
        {
            Console.WriteLine("Match: {0}", result.Value);
            result = result.NextMatch();
        }
    }

Capturing groups

If we want to perform a global search over the whole input string and return all the matches with their corresponding capture data, we can instead use the static method Regex.Matches() to get a MatchCollection, which can be iterated over and processed as in the example above.

    MatchCollection matches = Regex.Matches(InputStr, Pattern, RegexOptions)

    // Let's use a regular expression to capture data from a few date strings.
    string pattern = @"([a-zA-Z]+) (\d+)";
    MatchCollection matches = Regex.Matches("June 24, August 9, Dec 12", pattern);

    // This prints the number of matches: "3 matches".
    Console.WriteLine("{0} matches", matches.Count);

    // This prints each match and where it was found in the input string:
    //   Match: June 24 at index [0, 7)
    //   Match: August 9 at index [9, 17)
    //   Match: Dec 12 at index [19, 25)
    foreach (Match match in matches)
    {
        Console.WriteLine("Match: {0} at index [{1}, {2})",
            match.Value, match.Index, match.Index + match.Length);
    }

    // For each match, we can extract the captured information by reading the
    // captured groups.
    foreach (Match match in matches)
    {
        GroupCollection data = match.Groups;

        // This prints the number of groups in this match. The count is 3,
        // because group 0 (the whole matched text) is included.
        Console.WriteLine("{0} groups captured in {1}", data.Count, match.Value);

        // This prints the month and day of each match. Remember that the
        // first group is always the whole matched text, so the month is at
        // index 1.
        Console.WriteLine("Month: " + data[1] + ", Day: " + data[2]);

        // Each Group in the collection also has Index and Length members,
        // which record where in the input string the group was found.
        Console.WriteLine("Month found at [{0}, {1})",
            data[1].Index, data[1].Index + data[1].Length);
    }

Finding and replacing strings

Another common task is to find and replace a part of a string using regular expressions, for example, to replace all instances of an old email domain, or to swap the order of some text. You can do this in C# with the static method Regex.Replace().

The replacement string can either be a replacement pattern that contains references to captured groups in the pattern (such as $1 and $2), or just a regular string.

    string replaced = Regex.Replace(InputStr, Pattern, ReplacementPattern, RegexOptions)

    // Let's reverse the order of the day and month in a few date strings.
    // Notice that the replacement string also contains special sequences
    // (the references to the captured groups), so we use a verbatim
    // string for it as well.
    string pattern = @"([a-zA-Z]+) (\d+)";

    // This reorders each date and prints:
    //   24 of June, 9 of August, 12 of Dec
    // Remember that the first group is always the full matched text, so the
    // month and day indices start from 1 instead of zero.
    string replacedString = Regex.Replace("June 24, August 9, Dec 12",
        pattern, @"$2 of $1");
    Console.WriteLine(replacedString);
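The email domain case mentioned above could look roughly like the sketch below. The helper name MigrateDomain and both domains (old-example.com, new-example.com) are hypothetical, and the local-part pattern is deliberately simple rather than a full email validator.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DomainReplaceDemo
{
    // Hypothetical helper: rewrites addresses at old-example.com so that
    // they point to new-example.com, keeping the part before the @ sign.
    public static string MigrateDomain(string input)
    {
        // Group 1 captures the local part. The dot in the domain is escaped
        // so that it only matches a literal dot.
        string pattern = @"([\w.+-]+)@old-example\.com";
        return Regex.Replace(input, pattern, "$1@new-example.com",
            RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        // Only the address at the old domain is rewritten.
        Console.WriteLine(MigrateDomain("alice@old-example.com, bob@other.org"));
        // alice@new-example.com, bob@other.org
    }
}
```

Because the replacement is a plain string with a single $1 reference, everything outside the matched addresses is copied through unchanged.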

RegexOptions Enums

In the regular expression methods above, you will notice that each of them also takes an optional RegexOptions argument. Most of the available flags are a convenience and can be written directly into the regular expression itself, but some can be useful in certain cases.

  • RegexOptions.Compiled compiles the pattern to code instead of interpreting it, which costs more at startup but speeds up matching many input strings with the same pattern. It is off by default.
  • RegexOptions.IgnoreCase makes the pattern case insensitive so that it matches strings of different capitalizations.
  • RegexOptions.Multiline changes the start and end metacharacters (^ and $ respectively) to match at the beginning and end of each line instead of only at the beginning and end of the whole input string. It is only relevant if your input string contains newline characters (\n).
  • RegexOptions.RightToLeft makes the engine search the input from right to left instead of from left to right. It controls the search direction and is not specific to right-to-left languages.
  • RegexOptions.Singleline allows the dot (.) metacharacter to match all characters, including the newline character (\n).
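A small sketch of how some of these flags change the result on the same input. The class name OptionsDemo and the sample text are invented for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OptionsDemo
{
    // Two lines of sample text separated by a newline character.
    public const string Text = "first line\nSecond LINE";

    public static void Main()
    {
        // Without Multiline, ^ only matches at the very start of the input.
        Console.WriteLine(Regex.Matches(Text, @"^\w+").Count);                          // 1

        // With Multiline, ^ also matches right after each newline.
        Console.WriteLine(Regex.Matches(Text, @"^\w+", RegexOptions.Multiline).Count);  // 2

        // Options are flags and can be combined with the | operator.
        RegexOptions flags = RegexOptions.IgnoreCase | RegexOptions.Multiline;
        Console.WriteLine(Regex.Matches(Text, @"line$", flags).Count);                  // 2

        // By default the dot stops at a newline; Singleline lets it cross one.
        Console.WriteLine(Regex.IsMatch(Text, @"first.*Second"));                            // False
        Console.WriteLine(Regex.IsMatch(Text, @"first.*Second", RegexOptions.Singleline));   // True

        // The same flag written inline, inside the pattern itself.
        Console.WriteLine(Regex.IsMatch(Text, @"(?s)first.*Second"));                   // True
    }
}
```

The last call shows the inline form: (?s) inside the pattern has the same effect as passing RegexOptions.Singleline.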

Compiling a pattern for performance

While the static methods above are sufficient for normal processing, if you are testing millions of input strings against the same pattern, you can reduce object allocations and get faster performance by instantiating a Regex object with the pattern in the constructor.

With this Regex object, all the same methods above are available on the object, except you will not have to pass in the pattern again with each call.
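A minimal sketch of the instance-based approach, reusing the date pattern from the earlier examples. The names InstanceDemo, DatePattern and CountDates are hypothetical, and passing RegexOptions.Compiled is an optional choice here, not a requirement.

```csharp
using System;
using System.Text.RegularExpressions;

public static class InstanceDemo
{
    // Construct the Regex once and reuse it for every input.
    // RegexOptions.Compiled is optional and assumed worthwhile only
    // when the pattern is applied to a large number of inputs.
    public static readonly Regex DatePattern =
        new Regex(@"([a-zA-Z]+) (\d+)", RegexOptions.Compiled);

    // Counts how many of the inputs contain a date-like string.
    public static int CountDates(string[] inputs)
    {
        int count = 0;
        foreach (string input in inputs)
        {
            // Instance methods take only the input; the pattern is already bound.
            if (DatePattern.IsMatch(input))
            {
                count++;
            }
        }
        return count;
    }

    public static void Main()
    {
        string[] inputs = { "June 24", "no date here", "Dec 12" };
        Console.WriteLine(CountDates(inputs));                            // 2
        Console.WriteLine(DatePattern.Replace("August 9", "$2 of $1"));   // 9 of August
        Console.WriteLine(DatePattern.Match("Dec 12").Groups[1].Value);   // Dec
    }
}
```

Storing the Regex in a static readonly field is one common way to make sure it is built only once per process.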

For more information, you can read the documentation on Static vs Instance methods on MSDN.


For more information about using regular expressions in C#, please visit the following links: