Erik van de Ven - 6 months ago
Perl Question

Get all <a> tags which do not contain a rel="nofollow"

I've written a small command that uses ack to find all files containing external URLs and opens them in Sublime, so I can do a find and replace on all tags and add rel="nofollow":

sublime $(ack -l '<a[^>]+href="http')


But now I would like to make sure those <a> tags do not already contain rel="nofollow". Can anyone help me out?

I just need to get all <a> tags that contain href="http (so I'm pretty sure it's an external URL), but it would probably be better to check for href="<does not contain website.nl>", i.e. an href without website.nl. And the tag must not contain rel="nofollow".

It would be a great bonus if it could check for both rel="nofollow" and rel='nofollow' (single and double quotes, and the same for href). But I could run the same command a couple of times, with and without double quotes, so it wouldn't be much of an issue.

Answer

ack uses Perl regular expressions, so you can combine a positive look-ahead for the href with a negative look-ahead for the rel attribute, like this:

$ sublime $(ack -l '<a\b(?=[^>]+\bhref="http)(?![^>]+\brel="nofollow")')

But note that ack checks only one line at a time, whereas an HTML <a> element may span several lines.