smccull smccull - 1 year ago 47
Vb.net Question

How to pull listed data from several web pages - Cannot locate table tag

First of all I have read ALOT of different answers online in relation to this topic but I have to admit I am really struggling to adapt them to what I need so please any help is very much appreciated!

I need to extract the data listed on the following webpage (Pages 1-7) i.e. Fund Name, Price, Currency etc https://toolkit.financialexpress.net/santanderam and pull this data through to excel.

I have the below code that will open the IE page (which is working):

' return the document containg the DOM of the page strWebAddress
' returns Nothing if the timeout lngTimeoutInSeconds was reached
Public Function GetIEDocument(ByVal strWebAddress As String, Optional ByVal lngTimeoutInSeconds As Long = 15) As MSHTML.HTMLDocument
Dim IE As SHDocVw.InternetExplorer
Dim IEDocument As MSHTML.HTMLDocument
Dim dateNow As Date

' create an IE application, representing a tab
Set IE = New SHDocVw.InternetExplorer

' optionally make the application visible, though it will work perfectly fine in the background otherwise
IE.Visible = True

' open a webpage in the tab represented by IE and wait until the main request successfully finished
' times out after lngTimeoutInSeconds with a warning
IE.Navigate strWebAddress
dateNow = Now
Do While IE.Busy
If Now > DateAdd("s", lngTimeoutInSeconds, dateNow) Then Exit Function
Loop

' retrieve the webpage's content (that is, the HTML DOM) and wait until everything is loaded (images, etc.)
' times out after lngTimeoutInSeconds with a warning
Set IEDocument = IE.Document
dateNow = Now
Do While IEDocument.ReadyState <> "complete"
If Now > DateAdd("s", lngTimeoutInSeconds, dateNow) Then Exit Function
Loop

Set GetIEDocument = IEDocument
End Function


However I cannot find the table tag that contains all other tags that I am interested to allow the rest of the code to pull through the data, The below code is what I have so far:

Public Sub GetTeamData()
Dim strWebAddress As String
Dim strH2AnchorContent As String
Dim IEDocument As MSHTML.HTMLDocument
Dim objH2 As MSHTML.HTMLHeaderElement
Dim objTable As MSHTML.HTMLTable
Dim objRow As MSHTML.HTMLTableRow
Dim objCell As MSHTML.HTMLTableCell
Dim lngRow As Long
Dim lngColumn As Long

' initialize some variables that should probably better be passed as paramaters or defined as constants
strWebAddress = "https://toolkit.financialexpress.net/santanderam"
strH2AnchorContent = " "

' open page
Set IEDocument = GetIEDocument(strWebAddress)
If IEDocument Is Nothing Then
MsgBox "Timeout reached opening this address:" & vbNewLine & strWebAddress, vbCritical
Exit Sub
End If

' retrieve anchor element
For Each objH2 In IEDocument.getElementsByTagName("h2")
If objH2.innerText = strH2AnchorContent Then Exit For
Next objH2
If objH2 Is Nothing Then
MsgBox "Could not find """ & strH2AnchorContent & """ in DOM!", vbCritical
Exit Sub
End If

' traverse HTML tree to desired table element
' * move up one element in the hierarchy
' * skip two elements to proceed to the third (interjected each time with whitespace that is interpreted as an element of its own)
' * move down two elements n the hierarchy
Set objTable = objH2.parentElement _
.NextSibling.NextSibling _
.NextSibling.NextSibling _
.NextSibling.NextSibling _
.Children(0) _
.Children(0)

' iterate over the table and output its contents
lngRow = 1
For Each objRow In objTable.Rows
lngColumn = 1
For Each objCell In objRow.Cells
Cells(lngRow, lngColumn) = objCell.innerText
lngColumn = lngColumn + 1
Next objCell
lngRow = lngRow + 1
Next
End Sub


I am assuming if I can locate the correct table tag to enter in the line below:

strH2AnchorContent = " "


Then the above will work? If so can anyone help with finding the correct tag or advise where I am going wrong with the above?

Again any help would be REALLY appreciated!

Thanks

Edit 1

Updated code:

' open a webpage in the tab represented by IE and wait until the main request successfully finished
' times out after lngTimeoutInSeconds with a warning
IE.Navigate strWebAddress
dateNow = Now
Do While IE.Busy
If Now > DateAdd("s", lngTimeoutInSeconds, dateNow) Then Exit Function
Loop

' retrieve the webpage's content (that is, the HTML DOM) and wait until everything is loaded (images, etc.)
' times out after lngTimeoutInSeconds with a warning
Set IEDocument = IE.Document
dateNow = Now
Do While IEDocument.ReadyState <> "complete"
If Now > DateAdd("s", lngTimeoutInSeconds, dateNow) Then Exit Function
Loop

Set GetIEDocument = IEDocument
End Function

Public Sub GetTeamData()
Dim strWebAddress As String
Dim strH2AnchorContent As String
Dim IEDocument As MSHTML.HTMLDocument
Dim objH2 As MSHTML.HTMLHeaderElement
Dim obTable As MSHTML.HTMLTable
Dim objRow As MSHTML.HTMLTableRow
Dim objCell As MSHTML.HTMLTableCell
Dim lngRow As Long
Dim lngColumn As Long

' initialize some variables that should probably better be passed as paramaters or defined as constants
strWebAddress = "https://toolkit.financialexpress.net/santanderam"


' open page
Set IEDocument = GetIEDocument(strWebAddress)
If IEDocument Is Nothing Then
MsgBox "Timeout reached opening this address:" & vbNewLine & strWebAddress, vbCritical
Exit Sub
End If

' retrieve anchor element
Set oTable = IEDocument.getElementById("Price_1_1")
Debug.Print oTable.innerText

' iterate over the table and output its contents
lngRow = 1
For Each objRow In oTable.Rows
lngColumn = 1
For Each objCell In objRow.Cells
Cells(lngRow, lngColumn) = objCell.innerText
lngColumn = lngColumn + 1
Next objCell
lngRow = lngRow + 1
Next
End Sub

Zac Zac
Answer Source

Your code is working fine, problem is that you are trying to capture data from the table before it is loaded. I added a simple Wait loop for 5 seconds and you current code captured the data. Below is the loop I added just before Set oTable = IEDocument.getElementById("Price_1_1") statement:

dateNow = Now
bExitLoop = False
lngTimeoutInSeconds = 5
Do While Not bExitLoop
    If Now > DateAdd("s", lngTimeoutInSeconds, dateNow) Then Exit Do
Loop

Code above is a static 5 second wait. You could make it more dynamic.. I'll leave that there as a brain teaser :)

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download