Felipe Oliveira - 1 year ago
C# Question

How to extract a div with a specific HTML class in C#

I am using C# to download a whole HTML page, but I would like to isolate just one specific div:

<div class="row row-dia-obituario">

I'm using this code to get the HTML. It returns the full HTML of the page:

    request = (HttpWebRequest)WebRequest.Create("https://pt.wikipedia.org/wiki/Wikip%C3%A9dia:P%C3%A1gina_principal");
    request.Proxy = webProxy;
    request.Timeout = 20000;
    request.Method = "GET";
    request.KeepAlive = true;
    response = (HttpWebResponse)request.GetResponse();
    sr = new StreamReader(response.GetResponseStream(), encoding);
    html = sr.ReadToEnd();
    string htmlaux = Regex.Replace(html, "&quot;", "").Trim();
    html = System.Net.WebUtility.HtmlDecode(htmlaux);

Answer Source

Don't use Regex to parse HTML. Use an HTML parser instead, such as Html Agility Pack:

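    // Requires the HtmlAgilityPack package and `using System.Linq;`
    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml(html); // html is the string downloaded in the question

    // GetAttributeValue returns "" instead of throwing when a div has no class attribute.
    // FirstOrDefault yields null when no div matches.
    var divNode = doc.DocumentNode.Descendants("div")
                     .FirstOrDefault(x => x.GetAttributeValue("class", "") == "row row-dia-obituario");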
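
The exact string comparison above only matches when the class attribute is literally "row row-dia-obituario". If the page ever emits the classes in a different order or adds another class, a token-based check may be more robust. The following is a minimal sketch of that idea, assuming the HtmlAgilityPack NuGet package is installed; the helper name `FindDivWithClasses` and the inline sample markup are hypothetical and not part of the original question or answer.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

static class DivExtractor
{
    // Hypothetical helper: returns the first <div> whose class attribute
    // contains every class in `classes`, or null when there is no such div.
    public static HtmlNode FindDivWithClasses(string html, params string[] classes)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        return doc.DocumentNode
            .Descendants("div")
            .FirstOrDefault(div =>
            {
                // Split the class attribute on whitespace so that the order
                // of the classes and any extra classes do not matter.
                var present = div.GetAttributeValue("class", "")
                    .Split(new[] { ' ', '\t', '\r', '\n' },
                           StringSplitOptions.RemoveEmptyEntries);
                return classes.All(c => present.Contains(c));
            });
    }

    static void Main()
    {
        // Made-up sample markup standing in for the downloaded page.
        const string html =
            "<html><body>" +
            "<div class=\"row\">other</div>" +
            "<div class=\"row-dia-obituario extra row\"><p>wanted</p></div>" +
            "</body></html>";

        HtmlNode div = FindDivWithClasses(html, "row", "row-dia-obituario");
        Console.WriteLine(div == null ? "not found" : div.InnerText);
    }
}
```

In the question's code, the `html` string read from the response would be passed in place of the sample markup, and `div.OuterHtml` gives the markup of the matched div if the tags themselves are needed.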