Vija02 - 4 years ago
VB.NET Question

Scraping specific text from a website into a VB.NET application

I'm trying to create a simple app that compares information across several websites. I've seen ways to pull all of a page's text into the app, but is there any way to extract only specific parts, say, the title and description?

Take a book site as an example. Is there any way to search for a book title and then show only its reviews, synopsis, and prices, without any irrelevant text?

Answer

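A quick and simple solution is to use a WebBrowser control, which exposes an HtmlDocument through its .Document property. Navigate to the page, then read the document in the DocumentCompleted event handler, once the page has finished loading.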

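Imports System.Windows.Forms

Public Class Form1

    Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
        ' Keep script errors on the loaded page from popping up dialogs.
        Me.WebBrowser1.ScriptErrorsSuppressed = True
        Me.WebBrowser1.Navigate(New Uri("http://stackoverflow.com/"))
    End Sub

    Private Sub WebBrowser1_DocumentCompleted(sender As Object, e As WebBrowserDocumentCompletedEventArgs) Handles WebBrowser1.DocumentCompleted
        ' DocumentCompleted can be raised once per frame; wait until the whole page has loaded.
        If Me.WebBrowser1.ReadyState <> WebBrowserReadyState.Complete Then Return

        Dim document As HtmlDocument = Me.WebBrowser1.Document
        Dim title As String = Me.GetTitle(document)
        Dim description As String = Me.GetMeta(document, "description")
        Dim keywords As String = Me.GetMeta(document, "keywords")
        Dim author As String = Me.GetMeta(document, "author")

        ' Show the extracted values; replace this with your own UI.
        MessageBox.Show("Title: " & title & Environment.NewLine &
                        "Description: " & description & Environment.NewLine &
                        "Keywords: " & keywords & Environment.NewLine &
                        "Author: " & author)
    End Sub

    ' Returns the text of the first <title> element inside <head>.
    Private Function GetTitle(document As HtmlDocument) As String
        Dim head As HtmlElement = Me.GetHead(document)
        If head IsNot Nothing Then
            For Each el As HtmlElement In head.GetElementsByTagName("title")
                Return If(el.InnerText, String.Empty)
            Next
        End If
        Return String.Empty
    End Function

    ' Returns the content attribute of <meta name="...">, matching the name case-insensitively.
    Private Function GetMeta(document As HtmlDocument, name As String) As String
        Dim head As HtmlElement = Me.GetHead(document)
        If head IsNot Nothing Then
            For Each el As HtmlElement In head.GetElementsByTagName("meta")
                If String.Equals(el.GetAttribute("name"), name, StringComparison.OrdinalIgnoreCase) Then
                    Return el.GetAttribute("content")
                End If
            Next
        End If
        Return String.Empty
    End Function

    ' Returns the document's <head> element, or Nothing if there is none.
    Private Function GetHead(document As HtmlDocument) As HtmlElement
        If document Is Nothing Then Return Nothing
        For Each el As HtmlElement In document.GetElementsByTagName("head")
            Return el
        Next
        Return Nothing
    End Function

End Class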
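Meta tags only cover page-level information. For the book-site case in the question (reviews, synopsis, prices), one option is to walk the elements of a given tag and keep those carrying a particular CSS class. The sketch below assumes the target site marks those values with class names; the module `PageScraping`, the helpers `HasClass` and `GetTextByClass`, and the class name used in the usage line are hypothetical and must be matched to the real page markup. If the element you want has an id, `HtmlDocument.GetElementById` is simpler.

```vbnet
Imports System.Collections.Generic
Imports System.Windows.Forms

' Hypothetical helper module: a sketch assuming the page tags the wanted data with CSS classes.
Public Module PageScraping

    ' Returns True when the space-separated class attribute contains className as a whole token.
    Public Function HasClass(classAttribute As String, className As String) As Boolean
        If String.IsNullOrEmpty(classAttribute) Then Return False
        Dim separators As Char() = {" "c, ControlChars.Tab, ControlChars.Cr, ControlChars.Lf}
        For Each token As String In classAttribute.Split(separators, StringSplitOptions.RemoveEmptyEntries)
            If String.Equals(token, className, StringComparison.Ordinal) Then Return True
        Next
        Return False
    End Function

    ' Collects the trimmed inner text of every <tagName> element that carries className.
    Public Function GetTextByClass(document As HtmlDocument, tagName As String, className As String) As List(Of String)
        Dim results As New List(Of String)()
        If document Is Nothing Then Return results
        For Each el As HtmlElement In document.GetElementsByTagName(tagName)
            ' The IE-based WebBrowser control exposes the HTML "class" attribute as "className".
            If HasClass(el.GetAttribute("className"), className) Then
                Dim text As String = el.InnerText
                If Not String.IsNullOrWhiteSpace(text) Then results.Add(text.Trim())
            End If
        Next
        Return results
    End Function

End Module
```

Inside the DocumentCompleted handler you could then call, for example, `Dim prices As List(Of String) = PageScraping.GetTextByClass(Me.WebBrowser1.Document, "span", "price")`, where "span" and "price" stand in for whatever tag and class the real site uses. Matching whole class tokens avoids false hits such as "priceless" when looking for "price".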