Glory Jain - 1 month ago
C# Question

Unable to build a regex to match the article tag

I have been trying to create a regex that matches the article tag and captures all the text inside it.

Here is my article tag:

<article id="post-82" class="post-82 post type-post status-publish format-standard hentry category-publishing">
<div class="entry-content clearfix">
<div class="abh_box abh_box_up abh_box_drop-down"><ul class="abh_tabs"> <li class="abh_about abh_active">
<p>With India playing host,</p>
<footer class="entry-meta-bar clearfix"><div class="entry-meta clearfix">
<span class="comments"><a href="http://www.test.com/blog/emerging-markets/#respond">No Comments</a></span>

</div></footer>
</article>


I need everything that is inside the article tag. So far I have tried the following regexes:

<article (.*?)</article>

(?:<article>)(.*?)(?:</article>)


Neither of them works. Please help.

Answer

Don't use regex to parse HTML. Use an HTML parser such as Html Agility Pack:

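using HtmlAgilityPack;

HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(htmlContent);

// "//article" matches at any depth; SelectSingleNode returns null when nothing matches
HtmlNode article = doc.DocumentNode.SelectSingleNode("//article");
string inner = article?.InnerHtml; // everything inside the tag; use InnerText for the text only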
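
As for why the two patterns in the question fail: by default `.` does not match a newline, so `(.*?)` cannot span a multi-line article, and the literal `<article>` in the second pattern never matches an opening tag that carries attributes. If a regex really is unavoidable for a quick one-off, a minimal sketch might look like the following. The class and method names (`ArticleRegexDemo`, `ExtractArticle`) are hypothetical, and the pattern assumes article elements are not nested; a parser remains the more robust choice.

```csharp
using System;
using System.Text.RegularExpressions;

public class ArticleRegexDemo
{
    // Hypothetical helper: returns the raw markup between <article ...> and
    // </article>, or null when no article element is found.
    public static string ExtractArticle(string html)
    {
        // \b[^>]* lets the opening tag carry any attributes.
        // RegexOptions.Singleline makes '.' match newlines as well.
        Match m = Regex.Match(
            html,
            @"<article\b[^>]*>(.*?)</article>",
            RegexOptions.Singleline | RegexOptions.IgnoreCase);
        return m.Success ? m.Groups[1].Value : null;
    }

    public static void Main()
    {
        string html =
            "<article id=\"post-82\" class=\"post\">\n" +
            "<p>With India playing host,</p>\n" +
            "</article>";
        Console.WriteLine(ExtractArticle(html));
    }
}
```

Because the match is lazy, it stops at the first `</article>`, so a nested article would be cut short; that is one of the cases the parser handles correctly.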