One of Sed’s nifty but often overlooked features is its ability to remember the last regular expression it used in a script. The advantage is that the code gets shorter, though at the cost of being more cryptic.
Imagine that we want to extract only the numbers on the “div” lines without the “div” tags.
We could do it by using Grep and piping its output to Sed:
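A minimal sketch of such a pipeline, assuming some made-up sample input. The sample function and the variable name grep_then_sed are illustrative only.

```shell
# Hypothetical sample input; any file with lines like these would do.
sample() {
  printf '%s\n' '<div>123</div>' '<p>456</p>' '<div>789</div>'
}

# grep keeps only the lines containing an opening <div> tag,
# then sed deletes every <div> and </div> on those lines.
grep_then_sed=$(sample | grep '<div>' | sed 's/<\/*div>//g')
printf '%s\n' "$grep_then_sed"
```

With this input, only the two numbers from the “div” lines should come out, one per line.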
Actually, we wouldn’t even need Grep to do it:
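One way the Sed-only version might look, again on made-up sample input (the sample function and the variable name sed_only are assumptions for illustration). Here the same regular expression is written out twice, once as the address and once in the substitute command.

```shell
# Hypothetical sample input; any file with lines like these would do.
sample() {
  printf '%s\n' '<div>123</div>' '<p>456</p>' '<div>789</div>'
}

# -n switches off automatic printing.
# The address /<\/*div>/ selects the lines to work on,
# s/<\/*div>//g deletes the tags, and the p flag prints the changed lines.
sed_only=$(sample | sed -n '/<\/*div>/s/<\/*div>//gp')
printf '%s\n' "$sed_only"
```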
It turns out, however, that here we can take advantage of Sed’s ability to remember the last regular expression:
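A sketch of the shortened command, assuming the same kind of made-up sample input (the sample function and the variable name sed_reuse are illustrative only). The regular expression appears once, in the address, and the substitute command leaves its regexp field empty.

```shell
# Hypothetical sample input; any file with lines like these would do.
sample() {
  printf '%s\n' '<div>123</div>' '<p>456</p>' '<div>789</div>'
}

# The empty regexp in s///gp reuses the last regular expression used,
# which here is the address /<\/*div>/.
sed_reuse=$(sample | sed -n '/<\/*div>/s///gp')
printf '%s\n' "$sed_reuse"
```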
So what’s happening here? The -n flag suppresses Sed’s automatic printing of every line. The “p” flag then prints only the lines that have been modified, that is, the lines containing “div” tags.
The address in the command above matches lines containing either <div> or </div>. Remember that the asterisk (*) means zero or more occurrences of the previous character, so it makes the slash optional. As we used / as the delimiter, we also had to escape the literal / with a backslash (\). Then we have the ‘s’ (= substitute) command with two empty fields delimited by three forward slashes (///). The first empty field (the regexp) tells Sed to reuse the last regular expression it used, which here is the one from the address. The second empty field (the replacement) tells Sed to replace whatever was matched with nothing (i.e. get rid of it). As Sed by default replaces only the first match on each line, we need to add the ‘g’ (= global) flag so that both the opening and the closing tag are removed.
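To see what the ‘g’ flag changes, here is a small sketch that runs the command with and without it on a single made-up line (the variable names are assumptions for illustration):

```shell
# Hypothetical one-line input.
line='<div>123</div>'

# Without g, only the first match on the line (the opening tag) is removed.
first_only=$(printf '%s\n' "$line" | sed -n '/<\/*div>/s///p')

# With g, every match on the line is removed.
all_matches=$(printf '%s\n' "$line" | sed -n '/<\/*div>/s///gp')

printf '%s\n' "$first_only" "$all_matches"
```

The first result still carries the closing tag (123</div>), while the second is just the number.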