user3227774 - 11 days ago
Vb.net Question

Visual Basic HTML Agility Pack How to Grab Images from Table Cells

Hope someone can help, as I have spent ages trying to figure this out. I am using the HTML Agility Pack to extract data from a table and put it in a data grid (the data grid is not important; I am just using it to see if the extraction works). The first column of the table contains thumbnail pictures. I can extract all the text using the code below, but I don't know how to extract the images from the first column. Can anyone help?

PS: I have saved the webpage as an MHT file, as I couldn't extract any data directly from the site. I believe it's something to do with the site security/credentials. I don't know if I have made things easier or harder for myself.

Private Sub Button1_Click(ByVal sender As Object, ByVal e As EventArgs) Handles Button1.Click

    ' Original code ***************************************
    Dim Doc As New HtmlAgilityPack.HtmlDocument

    ' Loading the live page directly returned no data:
    ' Dim Web As New HtmlAgilityPack.HtmlWeb
    ' Doc = Web.Load("https://firefly.cardinalnewman.ac.uk/home/my")

    Doc.Load("E:\table.mht")

    ' One node per row of the table with id "HomeMyStudents"
    Dim rows As HtmlAgilityPack.HtmlNodeCollection = Doc.DocumentNode.SelectNodes("//*[@id='HomeMyStudents']/tbody/tr")
    If rows Is Nothing Then Return ' SelectNodes returns Nothing when nothing matches

    For Each row As HtmlAgilityPack.HtmlNode In rows
        Dim cells As HtmlAgilityPack.HtmlNodeCollection = row.SelectNodes("./td")
        If cells Is Nothing OrElse cells.Count < 11 Then Continue For

        ' Add an empty grid row and remember its index
        Dim rowIndex As Integer = DGV.Rows.Add()

        ' cells(1) is td[2]: this is the section where I need to grab the image
        ' in each cell and either save it or place it in my data grid
        ' DGV.Rows(rowIndex).Cells(1).Value = somehow insert image

        ' td[3] to td[11] hold text and go into grid columns 2 to 10
        For col As Integer = 2 To 10
            DGV.Rows(rowIndex).Cells(col).Value = cells(col).InnerText
        Next
    Next

End Sub

Answer

So, presumably the images look something like:

<img src="whatever.jpg"/> 

in the markup, right?

HAP will let you grab the image nodes inside a cell node with something like the following (use .//img instead if the image is nested inside a link or span):

... .SelectNodes("./img") 

And for the paths:

... .Attributes("src").Value()
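Putting those two pieces together, here is a minimal sketch of reading the src of each thumbnail. The inline markup and the file names in it are made up for illustration, since the real table isn't shown in the question; only the table id comes from the question's code.

```vb
Imports System

Module ImageSrcSketch
    Sub Main()
        ' Hypothetical markup standing in for the real table.
        Dim markup As String = _
            "<table id='HomeMyStudents'><tbody>" & _
            "<tr><td><a href='#'><img src='thumbs/1.jpg'/></a></td><td>Alice</td></tr>" & _
            "<tr><td>no picture</td><td>Bob</td></tr>" & _
            "</tbody></table>"

        Dim doc As New HtmlAgilityPack.HtmlDocument()
        doc.LoadHtml(markup)

        ' First cell of every row; SelectNodes returns Nothing when nothing matches.
        Dim cells As HtmlAgilityPack.HtmlNodeCollection = _
            doc.DocumentNode.SelectNodes("//*[@id='HomeMyStudents']//tr/td[1]")
        If cells Is Nothing Then Return

        For Each cell As HtmlAgilityPack.HtmlNode In cells
            ' .//img also finds an image wrapped in a link or span.
            Dim img As HtmlAgilityPack.HtmlNode = cell.SelectSingleNode(".//img")
            If img Is Nothing Then
                Console.WriteLine("(no image in this cell)")
            Else
                ' GetAttributeValue returns the default ("") if src is missing.
                Console.WriteLine(img.GetAttributeValue("src", ""))
            End If
        Next
    End Sub
End Module
```

GetAttributeValue is used here rather than Attributes("src").Value so that an img without a src doesn't throw.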

From there, I'm not aware of any HAP feature that will download the image files themselves, so you're going to want a WebClient for that.

Dim wc As New System.Net.WebClient

wc.DownloadFile(StringContainingThatSrcValue, PathToSaveFileTo) 
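A slightly fuller sketch of the download step follows. The page address and src value are placeholders, not the real ones. It also assumes the src may be relative, in which case it has to be resolved against the page address first. Since the site apparently needs a login, the request may well fail the same way loading the page did, so the call is wrapped in a Try block.

```vb
Imports System
Imports System.IO
Imports System.Net

Module ImageDownloadSketch
    ' Resolves a possibly relative src against the address of the page.
    Function ResolveImageUrl(ByVal pageUrl As String, ByVal src As String) As Uri
        Return New Uri(New Uri(pageUrl), src)
    End Function

    Sub Main()
        Dim pageUrl As String = "https://example.com/home/my" ' hypothetical page address
        Dim src As String = "thumbs/1.jpg"                    ' hypothetical src value

        Dim imageUrl As Uri = ResolveImageUrl(pageUrl, src)
        Console.WriteLine(imageUrl)

        ' Save under the temp folder, using the file name from the URL.
        Dim target As String = _
            Path.Combine(Path.GetTempPath(), Path.GetFileName(imageUrl.LocalPath))

        Using wc As New WebClient()
            ' If the site needs a login, credentials or cookies would have to be set here.
            Try
                wc.DownloadFile(imageUrl, target)
                Console.WriteLine("Saved to " & target)
            Catch ex As WebException
                Console.WriteLine("Download failed: " & ex.Message)
            End Try
        End Using
    End Sub
End Module
```

To show the picture in the grid rather than save it, one option is to fetch the bytes with wc.DownloadData and pass them through a MemoryStream to Image.FromStream. That assumes the grid column is a DataGridViewImageColumn.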

HTH!