Felipe Oliveira - 1 month ago
C# Question

How to extract a div with a specific HTML class in C#

I am using C# to download a whole HTML page, but I would like to isolate just one specific div:

<div class="row row-dia-obituario">


I'm using this code to get the HTML; it returns the full HTML of the page:

request = (HttpWebRequest)WebRequest.Create("https://pt.wikipedia.org/wiki/Wikip%C3%A9dia:P%C3%A1gina_principal");
request.Proxy = webProxy;
request.Timeout = 20000;
request.Method = "GET";
request.KeepAlive = true;
response = (HttpWebResponse)request.GetResponse();
sr = new StreamReader(response.GetResponseStream(), encoding);
html = sr.ReadToEnd();
string htmlaux = Regex.Replace(html, "&quot;", "").Trim();
html = System.Net.WebUtility.HtmlDecode(htmlaux);

Answer

Don't use Regex to parse HTML. Use an HTML parser instead; you can look into Html Agility Pack.

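    // Requires the HtmlAgilityPack NuGet package, plus:
    // using System.Linq; using HtmlAgilityPack;
    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml(html);

    // GetAttributeValue returns "" for divs without a class attribute,
    // whereas x.Attributes["class"].Value would throw on them.
    var divNode = doc.DocumentNode.Descendants("div")
                     .FirstOrDefault(x => x.GetAttributeValue("class", "") == "row row-dia-obituario");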
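
The exact string comparison on the class attribute above only matches when the attribute is literally "row row-dia-obituario". If the page reorders the classes or adds another one, it stops matching. A minimal sketch of a more tolerant check follows; `ClassMatcher` and `HasAllClasses` are hypothetical names introduced here for illustration and are not part of Html Agility Pack.

```csharp
using System;
using System.Linq;

public static class ClassMatcher
{
    // Hypothetical helper (not part of Html Agility Pack): returns true when
    // every class name in `required` appears in the element's class attribute,
    // regardless of order or extra whitespace.
    public static bool HasAllClasses(string classAttribute, params string[] required)
    {
        if (string.IsNullOrWhiteSpace(classAttribute))
            return false;

        // A null separator array makes Split break on any whitespace.
        var present = classAttribute.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);

        // Class names are compared case-sensitively here.
        return required.All(r => present.Contains(r, StringComparer.Ordinal));
    }
}
```

Assuming the same `doc` as above, the predicate could then be written as `x => ClassMatcher.HasAllClasses(x.GetAttributeValue("class", ""), "row", "row-dia-obituario")`.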