Occasionally, you'll find yourself with a log file that has ill-formatted whitespace, where lines are indented too much or not enough. One way to fix this is to use an editor's search and replace with a regular expression that extracts the content of each line without the extra whitespace.
We have previously seen how to match a full line of text by anchoring the pattern with the hat ^ at the start and the dollar sign $ at the end. Used in conjunction with the whitespace character class \s, these anchors let you easily skip all leading and trailing whitespace.
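As a rough illustration of the search-and-replace idea, the sketch below uses Python's re module in place of an editor. The log text is made up for the example, and each line is split out first so that ^ and $ refer to the start and end of that single line.

```python
import re

# Hypothetical log excerpt with inconsistent indentation and trailing spaces.
log = "   INFO start  \n\t\tWARN disk low\t\nERROR failed   "

# ^\s+ matches whitespace anchored at the start of the line,
# \s+$ matches whitespace anchored at the end; both are replaced with nothing.
cleaned = [re.sub(r"^\s+|\s+$", "", line) for line in log.split("\n")]

for line in cleaned:
    print(repr(line))
```

Whitespace between words is left alone because neither alternative can match in the middle of a line.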
Write a simple regular expression to capture the content of each line, without the extra whitespace.
Task | Text | Capture Groups |
capture | The quick brown fox... | The quick brown fox... |
capture | jumps over the lazy dog. | jumps over the lazy dog. |
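Solution | We can skip the leading and trailing whitespace by matching it outside the capture group. For example, the expression ^\s*(.*?)\s*$ captures only the content. The lazy quantifier .*? matters here: a greedy (.*) runs to the end of the line and takes any trailing whitespace into the group, leaving the final \s* nothing to match. |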
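To check how a capture group behaves on lines like the ones above, here is a small sketch in Python. The amount of leading and trailing whitespace in the sample lines is an assumption made for the demonstration, and capture is a hypothetical helper, not part of any library.

```python
import re

# Sample lines; the surrounding whitespace is assumed for illustration.
lines = ["    The quick brown fox...", "  jumps over the lazy dog.   "]

greedy = re.compile(r"^\s*(.*)\s*$")   # (.*) grabs as much as it can
lazy = re.compile(r"^\s*(.*?)\s*$")    # (.*?) grabs as little as it can

def capture(pattern, text):
    """Return the first capture group, or None if the line does not match."""
    m = pattern.match(text)
    return m.group(1) if m else None

greedy_result = [capture(greedy, line) for line in lines]
lazy_result = [capture(lazy, line) for line in lines]

print(greedy_result)  # trailing spaces survive on the second line
print(lazy_result)    # both lines are trimmed on both sides
```

Both patterns drop the leading whitespace, but only the lazy version leaves the trailing whitespace for the final \s* to consume.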