Anders Anders - 2 months ago 5
Vb.net Question

What's wrong with my Regex string for scraping link elements

I'm having a small problem with a VB.NET scraper. It is supposed to extract all links from an HTML string that I have already downloaded. The links are there (I have checked), so the problem must be in my regex string.

My regex string:

<a.*?href=""(.*?)"".*?>(.*?)</a>


This works for some sites, but for others it does not.

Here are examples from the HTML source that match and don't match.

Working:

<a href="http://domain.com" rel="nofollow" onmousedown="return clk('25936','3')" target="_blank"></a>


Not working:

<a href='http://domain.com' target="_blank" ><font size=2><b>text</b></a>


Could it be because of the " and the '?

Answer

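Try the following regex:

<a.*?href=["'](.*?)["'].*?>(.*?)<\/a>

Your pattern only accepts double quotes around the href value. An href attribute can be delimited with either single or double quotes, so the pattern has to allow both. The character class ["'] matches either quote character. When you put this pattern in a VB.NET string literal, write each double quote as "", as you did in your original string.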
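As a rough illustration of the idea above, here is a minimal VB.NET sketch that runs a quote-tolerant pattern over the two anchors from the question. The module name, variable names, and the exact pattern are assumptions made for this example, not part of the original answer. The working sample is assumed to end in `></a>`. The pattern uses a backreference (`\1`) so the closing quote must be the same character as the opening one, which is slightly stricter than the `["']` class on both sides.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module LinkScraperSketch
    Sub Main()
        ' Sample input based on the two anchors in the question (illustrative only).
        Dim working As String = "<a href=""http://domain.com"" rel=""nofollow"" onmousedown=""return clk('25936','3')"" target=""_blank""></a>"
        Dim failing As String = "<a href='http://domain.com' target=""_blank"" ><font size=2><b>text</b></a>"
        Dim html As String = working & failing

        ' Group 1 captures the opening quote (" or '); \1 requires the same
        ' character to close the value. Group 2 is the URL, group 3 the inner HTML.
        ' Inside a VB.NET string literal a double quote is written as "".
        Dim pattern As String = "<a\b[^>]*?\bhref\s*=\s*([""'])(.*?)\1[^>]*>(.*?)</a>"

        ' IgnoreCase covers <A HREF=...>; Singleline lets . match newlines
        ' in case the link content spans several lines.
        Dim options As RegexOptions = RegexOptions.IgnoreCase Or RegexOptions.Singleline

        For Each m As Match In Regex.Matches(html, pattern, options)
            Console.WriteLine("url:  " & m.Groups(2).Value)
            Console.WriteLine("text: " & m.Groups(3).Value)
        Next
    End Sub
End Module
```

With this pattern both anchors should be reported, and for the second one group 3 contains the inner markup (`<font size=2><b>text</b>`) rather than plain text. The sketch does not handle unquoted href values such as `href=http://domain.com`. Regex-based link extraction in general remains fragile on irregular HTML.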