Remember the last regular expression in Sed

One of Sed’s nifty features that tends to be overlooked is its ability to remember the last regular expression used in a script. The result is shorter, more concise code, albeit at the cost of being more cryptic.

Consider the following input file, called file in the commands below:

<tag>345464245</tag>
<tag>24524524</tag>
<tag>24524524</tag>
<div>2424662</div>
<tag>24524524</tag>
<div>2462424624</div>
<tag>46246246</tag>
<div>4624642464</div>
<tag>24623462346423</tag>
<div>462462642646</div>
<tag>462462466264</tag>
<div>24646464</div>
<tag>2466246246</tag>
<div>24665457524</div>
<tag>53456575435</tag>
<div>35655</div>
<tag>565</tag>
<tag>457537753</tag>
<tag>48334</tag>

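To follow along, the sample lines above can be saved as a file named file, the name the commands in this post assume. A minimal sketch using a quoted here-document (the quotes around EOF stop the shell from interpreting anything inside it):

```shell
# Save the sample lines as "file", the file name used by the sed commands.
cat > file <<'EOF'
<tag>345464245</tag>
<tag>24524524</tag>
<tag>24524524</tag>
<div>2424662</div>
<tag>24524524</tag>
<div>2462424624</div>
<tag>46246246</tag>
<div>4624642464</div>
<tag>24623462346423</tag>
<div>462462642646</div>
<tag>462462466264</tag>
<div>24646464</div>
<tag>2466246246</tag>
<div>24665457524</div>
<tag>53456575435</tag>
<div>35655</div>
<tag>565</tag>
<tag>457537753</tag>
<tag>48334</tag>
EOF
```

The file has 19 lines, 7 of which are “div” lines.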
Imagine that we want to extract only the numbers on the “div” lines without the “div” tags.

We could do it using Grep and then piping the output to Sed:

grep div file | sed 's/<\/*div>//g'

Actually, we wouldn’t even need Grep to do it:

sed -n '/div/ s/<\/*div>//gp' file

It turns out, however, that here we can take advantage of Sed’s ability to remember the last regular expression:

sed -n '/<\/*div>/ s///gp' file

So what’s happening here? The -n flag suppresses Sed’s automatic printing of every line, so nothing is output unless we explicitly ask for it. The “p” flag on the ‘s’ command then prints a line only if a substitution was actually made on it.
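The effect of -n can be seen on a tiny two-line input (made up for illustration, assuming a POSIX-compatible sed such as GNU sed). Without -n, the untouched line is auto-printed and the modified line appears twice, once from “p” and once from the automatic print:

```shell
# Without -n: every line is auto-printed, and p prints the modified line again.
printf '%s\n' '<tag>1</tag>' '<div>2</div>' | sed '/<\/*div>/ s///gp'
# prints:
# <tag>1</tag>
# 2
# 2

# With -n: only the lines printed explicitly by p appear.
printf '%s\n' '<tag>1</tag>' '<div>2</div>' | sed -n '/<\/*div>/ s///gp'
# prints:
# 2
```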

/<\/*div>/

The above address matches lines containing either <div> or </div>. Remember that the asterisk (*) means zero or more occurrences of the preceding character, here the escaped slash, so \/* matches zero or more slashes and covers both the opening and the closing tag. Because we used / as the delimiter, we also had to escape the literal / with a backslash (\).

Then we have the ‘s’ (=substitute) command with two empty fields delimited by three forward slashes (///). The first empty field is the regular expression: when it is left empty, Sed reuses the last regular expression it applied, which here is the one from the address. The second empty field is the replacement, so whatever was matched is replaced with nothing (i.e. deleted). By default Sed replaces only the first match on each line, and every “div” line contains two tags, so we need to add the ‘g’ (=global) flag to remove both.
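A few one-liners make these rules concrete. This is a sketch with made-up input, assuming a POSIX-compatible sed such as GNU sed. Note that the empty regex reuses whichever regular expression sed applied last, which can also come from an earlier ‘s’ command rather than an address. The last example shows an optional alternative to escaping the slash: a context address can use another delimiter when written as \%regex%.

```shell
# An empty regex reuses the last regex applied: here /a/ from the first s.
# The first s turns "aaa" into "xaa"; the second replaces the next "a".
echo 'aaa bbb' | sed 's/a/x/; s//y/'
# prints: xya bbb

# Without g, only the first match on the line (<div>) is removed.
echo '<div>35655</div>' | sed 's/<\/*div>//'
# prints: 35655</div>

# With g, every match on the line is removed.
echo '<div>35655</div>' | sed 's/<\/*div>//g'
# prints: 35655

# Using % as the address delimiter means the / needs no escaping;
# the empty regex in s///gp still reuses the address regex.
echo '<div>35655</div>' | sed -n '\%</*div>% s///gp'
# prints: 35655
```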
