Duke Duke - 3 months ago 10
C# Question

C# regex with an optional part

I need to parse some HTML text in which a link can appear in either of two forms:

1. <a href="http://freelistenonline.com/">Site</a>
2. <a class="mobile" href="http://m.freelistenonline.com/">Site</a>


I wrote the following regex:

<a[\s]*class="(?<class>[\w\W]*?)"[\s]*href="(?<link>[\w\W]*?)">


This works for the 2nd case but not for the 1st. How should I change it to work for both? Some portions of the text, such as the attribute class="mobile", are optional. How do I modify the regex so that the part of the pattern containing class="(?<class>[\w\W]*?)"[\s]* is optional? What is the syntax for that?

Duke

Answer

I think this will solve your problem: wrap the class portion in a group and put a zero-or-more quantifier (*) on it. As written, your pattern requires that section to be present, which is why the first string fails to match:

<a[\s]+(class="(?<class>[\w\W]*?)")*[\s]*href="(?<link>[\w\W]*?)">

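Edited to incorporate the fix for the "aclass" match noted in the comments by rory.ap: the whitespace after <a is now required ([\s]+), so a string such as <aclass="..."> no longer matches.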
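
For illustration, here is a minimal sketch of how an optional class group might be used from C#. It is a variant of the pattern above, not the exact one: it assumes a zero-or-one quantifier (?) in place of *, since the attribute appears at most once in the two sample links, and [^"]* in place of [\w\W]*? so that a value cannot run past its closing quote. The names LinkParser and TryParseLink are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LinkParser
{
    // Assumed variant of the answer's pattern: the class attribute is wrapped
    // in a non-capturing group made optional with '?'. Whitespace after "<a"
    // is required (\s+) so that "<aclass=..." does not match.
    private static readonly Regex AnchorPattern = new Regex(
        @"<a\s+(?:class=""(?<class>[^""]*)""\s+)?href=""(?<link>[^""]*)"">");

    // Returns true if an anchor tag was found. cssClass is null when the
    // optional class attribute is absent.
    public static bool TryParseLink(string html, out string cssClass, out string link)
    {
        cssClass = null;
        link = null;

        Match m = AnchorPattern.Match(html);
        if (!m.Success)
        {
            return false;
        }

        // Group.Success is false when the optional group did not participate.
        Group cls = m.Groups["class"];
        cssClass = cls.Success ? cls.Value : null;
        link = m.Groups["link"].Value;
        return true;
    }
}

public static class Program
{
    public static void Main()
    {
        string[] samples =
        {
            "<a href=\"http://freelistenonline.com/\">Site</a>",
            "<a class=\"mobile\" href=\"http://m.freelistenonline.com/\">Site</a>"
        };

        foreach (string html in samples)
        {
            string cssClass, link;
            if (LinkParser.TryParseLink(html, out cssClass, out link))
            {
                Console.WriteLine("{0} -> {1}", cssClass ?? "(none)", link);
            }
        }
    }
}
```

Checking Group.Success is what distinguishes "attribute absent" from "attribute present but empty". This sketch only covers the two link shapes in the question; for arbitrary HTML (other attributes, different attribute order, single quotes) an HTML parser is likely a safer choice than a regex.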
