RegexOne - Learn Regular Expressions - Lesson 11: Match groups

Lesson 11: Match groups

Regular expressions let us not only match text but also extract information for further processing. This is done by defining groups of characters and capturing them with the parentheses metacharacters ( and ). Any subpattern inside a pair of parentheses is captured as a group. In practice, this can be used to extract information like phone numbers or emails from all sorts of data.

Imagine for example that you had a command line tool to list all the image files you have in the cloud. You could then use a pattern such as ^(IMG\d+\.png)$ to capture and extract the full filename, but if you only wanted to capture the filename without the extension, you could use the pattern ^(IMG\d+)\.png$ which only captures the part before the period.
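As a minimal sketch of the difference between those two patterns, the snippet below uses Python's standard re module. The filename IMG0042.png is a made-up example for illustration, not one taken from the lesson.

```python
import re

# Hypothetical filename, used only for illustration.
filename = "IMG0042.png"

# Group 1 wraps the whole name, so the extension is captured too.
full = re.match(r"^(IMG\d+\.png)$", filename)

# Group 1 ends before the period: ".png" must still match,
# but it is not part of the captured text.
stem = re.match(r"^(IMG\d+)\.png$", filename)

print(full.group(1))  # IMG0042.png
print(stem.group(1))  # IMG0042
```

Both patterns match the same line; only the position of the parentheses changes what group 1 contains.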

Go ahead and try to use this to write a regular expression that captures only the filenames (without the extension) of the PDF files below.

Exercise 11: Matching groups

Task	Text	Capture Groups
capture	file_record_transcript.pdf	file_record_transcript
capture	file_07241999.pdf	file_07241999
skip	testfile_fake.pdf.tmp

Solution

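We only want to capture lines that start with "file" and end with the file extension ".pdf", so we can write a simple pattern that captures everything from the start of "file" up to the extension, like this: ^(file.+)\.pdf$. The third line is skipped because it starts with "test" rather than "file" and ends in ".tmp" rather than ".pdf".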
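One way to check this solution outside the interactive exercise is a short Python sketch like the one below. It runs the pattern against the three lines from the exercise table; the variable names are illustrative choices, not part of the lesson.

```python
import re

# The solution pattern: capture from "file" up to, but not including, ".pdf".
pattern = re.compile(r"^(file.+)\.pdf$")

# The three lines from the exercise table.
lines = [
    "file_record_transcript.pdf",
    "file_07241999.pdf",
    "testfile_fake.pdf.tmp",
]

# Map each line to its captured group, or None when the line does not match.
captures = {}
for line in lines:
    m = pattern.match(line)
    captures[line] = m.group(1) if m else None

for line, captured in captures.items():
    print(line, "->", captured)
```

The first two lines yield their names without the extension, while the third yields None because the ^ anchor requires the line to begin with "file".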


Lesson Notes

	abc…	Letters
	123…	Digits
	\d	Any Digit
	\D	Any Non-digit character
	.	Any Character
	\.	Period
	[abc]	Only a, b, or c
	[^abc]	Not a, b, nor c
	[a-z]	Characters a to z
	[0-9]	Numbers 0 to 9
	\w	Any Alphanumeric character
	\W	Any Non-alphanumeric character
	{m}	m Repetitions
	{m,n}	m to n Repetitions
	*	Zero or more repetitions
	+	One or more repetitions
	?	Optional character
	\s	Any Whitespace
	\S	Any Non-whitespace character
	^…$	Starts and ends
	(…)	Capture Group
	(a(bc))	Capture Sub-group
	(.*)	Capture all
	(abc|def)	Matches abc or def