Glitch_Doctor Glitch_Doctor - 6 months ago 21
HTML Question

Can I web scrape for a particular style in VBA?

I have been looking everywhere for any possible workaround to this issue.

All of the data at my company is accessed via a web portal that produces static HTML pages. Unfortunately our department cannot be given direct access to the Server which would make my life easy so I need to page scrape this portal to find the data that I need. My navigation is fine and i am quite experienced with scraping where elements are named or given an ID, however this does not have either.

Anyway, background out of the way.

I want to grab a table from the page that has a unique style of "empty-cells: show;":

<TABLE cellspacing=10 cellPadding=10 border="1" style="empty-cells: show;">

</TABLE>


Or failing that there is a heading in the first row which always contains the same text string. Once I have that table I can manipulate the data I need from it. Hugely sensitive data here guys, so I can't provide the full page code unfortunately.

I know that there have been many posts regarding GetElementByRegex but I cannot find a post or website that actually explains how to use it. Instead they all want me to install their add-on which isn't an option (I need to learn this to sate my thirst for knowledge).

To help I have added the full table code below removing the sensitive data:

<TABLE cellspacing=10 cellPadding=10 border="0" width=100%>

<tr>

<td>
<TABLE cellspacing=10 cellPadding=10 border="1" style="empty-cells: show;">

<TR class="row0">
<TD style="width: 25%; background-color: #A3DCF5;"><strong>TITLE:</strong></TD>
<TD>LINE1</TD>
</TR>
<TR class="row1">
<TD>&nbsp;</TD><td>LINE2</td>
</TR>
<TR class="row0">
<TD>&nbsp;</TD><td>LINE3</td>
</TR>
<TR class="row1">
<TD>&nbsp;</TD><td>LINE4</td>
</TR>
<TR class="row0">
<TD>&nbsp;</TD><td>LINE5</td>
</TR>

</TABLE>
</td>

</tr>

</TABLE>


There are many other tables though so using a Len check will not help me top sift through the TD tags.

Thank you for all the help Tim!

The below worked perfectly for me:

Dim tbls, tbl
Dim L1, L2, L3, L4, L5 As String

Set tbls = IE.Document.getElementsByTagName("table")

For Each tbl In tbls
If tbl.Rows(0).Cells(0).innerText = "Card Address:" Then

On Error Resume Next
L1 = tbl.Rows(0).Cells(1).innerText
L2 = tbl.Rows(1).Cells(1).innerText
L3 = tbl.Rows(2).Cells(1).innerText
L4 = tbl.Rows(3).Cells(1).innerText
L5 = tbl.Rows(4).Cells(1).innerText

Exit For
End If
Next

Worksheets("Sheet2").Range("A1").Value = L1
Worksheets("Sheet2").Range("A2").Value = L2
Worksheets("Sheet2").Range("A3").Value = L3
Worksheets("Sheet2").Range("A4").Value = L4
Worksheets("Sheet2").Range("A5").Value = L5

End Sub

Answer
Dim tbls, tbl, tr, j, td, row, sht

Set tbls = IE.document.getElementsByTagName("table")

For Each tbl in tbls
    'item indexes are zero-based (AFAIR)
    If tbl.Rows(0).Cells(1).innerText = "LINE1" Then


        'EDIT: extracting the table contents
        Set sht = ActiveSheet
        row = 3
        For Each tr In t.getelementsbytagname("TR")
            j = 1
            For Each td In tr.getelementsbytagname("TD")
                sht.Cells(row + 1, j).Value = td.innerText
                j = j + 1
            Next
            row = row + 1
        Next

        Exit For 'stop looping
    End If
Next
Comments