user2650277 user2650277 - 7 days ago 6
HTML Question

Extract href of a specific anchor text in bash

I am trying to get the

href
of the most recent production release from Exiftool page.

curl -s 'http://www.sno.phy.queensu.ca/~phil/exiftool/history.html' | grep -o -E "href=[\"'](.*)[\"'].*Version"


Actual output

href="Image-ExifTool-10.36.tar.gz">Version


I want this an as output

Image-ExifTool-10.36.tar.gz

Answer

Using grep -P you can use a lookahead and \K for match reset:

curl -s  'http://www.sno.phy.queensu.ca/~phil/exiftool/history.html' |
grep -o -P "href=[\"']\K[^'\"]+(?=[\"']>Version)"

Image-ExifTool-10.36.tar.gz