Nicholas Kan Nicholas Kan - 1 month ago 11
HTML Question

Extract the text in html tag and put it as table title



Now the table title is set as

tblNameArr = Array("Balance Sheet", "Cash Flow", "Header 3", "Header 4")


how to change the name of four table titles according to the text in html tag?

e.g. I want the first table name changed to "重要财务指标" which the name is "a1" and id is also "a1", and change the following table name to "资产负债表", "现金流量表" and "综合损益表" accordingly?

Please refer to the html code below.

Sub GetFinanceData()
For x = 1 To 10
Dim URL As String, elemCollection As Object
Dim t As Integer, r As Integer, c As Integer

Worksheets("Stocks").Select
Worksheets("Stocks").Activate

'Open IE and Go to the Website

URL = "http://stock.finance.sina.com.cn/hkstock/finance/00001.html"
URL = Cells(x, 1)

Set IE = CreateObject("InternetExplorer.Application")
With IE
.navigate URL
.Visible = True

Do While .Busy = True Or .readyState <> 4
Loop
DoEvents

Worksheets.Add(After:=Worksheets(Worksheets.Count)).Name = _
ThisWorkbook.Worksheets("Stocks").Range("B" & x).Value 'You could even simplify it and just state the name as Cells(x,2)


'Select the Report Type

Set selectItems = IE.Document.getElementsByTagName("select")
For Each i In selectItems
i.Value = "zero"
i.FireEvent ("onchange")
Application.Wait (Now + TimeValue("0:00:05"))
Next i

Do While .Busy: DoEvents: Loop

ActiveSheet.Range("A1:K500").ClearContents

ActiveSheet.Range("A1").Value = .Document.getElementsByTagName("h1")(0).innerText
ActiveSheet.Range("B1").Value = .Document.getElementsByTagName("em")(0).innerText

'Find and Get Table Data

tblNameArr = Array("Balance Sheet", "Cash Flow", "Header 3", "Header 4")
tblStartRow = 5
Set elemCollection = .Document.getElementsByTagName("TABLE")
For t = 0 To elemCollection.Length - 1
For r = 0 To (elemCollection(t).Rows.Length - 1)
For c = 0 To (elemCollection(t).Rows(r).Cells.Length - 1)
ActiveSheet.Cells(r + tblStartRow, c + 1) = elemCollection(t).Rows(r).Cells(c).innerText
Next c
Next r

ActiveSheet.Cells(r + tblStartRow + 2, 1) = tblNameArr(t)
tblStartRow = tblStartRow + r + 4

Next t

End With

' cleaning up memory

IE.Quit

Next x


End Sub


Below is the html code:

<!--重要财务指标 start-->
<a name="a1" id="a1"></a>
<div class="part02">
<div class="sub01">
<div class="sub01_tt fblue">
<span class=" selected"><a href="#a1" target="_self">重要财务指标</a></span>
<span class=""><a href="#a2" target="_self">资产负债表</a></span>
<span class=""><a href="#a3" target="_self">现金流量表</a></span>
<span class=""><a href="#a4" target="_self">综合损益表</a></span>
<em class="rt">报表类型:<select class="fgrey" style="width:100px;" interface="getFinanceStandardForjs?symbol=$symbol&financeStanderd=" table="tableGetFinanceStandard" onchange="selectData(this);">
<option value="all" >全部</option>
<option value="zero" >年报</option>
<option value="1" >中报</option>
<option value="2" >一季报</option>
<option value="3" >三季报</option>
</select></em>
</div>

Answer

The element which is created by the tag <a name="a1" id="a1"></a> is empty. It is just an link anchor. It does not contain something. So getting this element is useless.

One approach could be, running over all A elements and pick those which refers to href="#a1", href="#a2", ...

Example:

...

nameBalanceSheet = "Balance Sheet"
nameCashFlow = "Cash Flow"
nameHeader3 = "Header 3"
nameHeader4 = "Header 4"

Set elemCollection = .Document.getElementsByTagName("A")
For i = 0 To elemCollection.Length - 1
 If Right(elemCollection(i).href, 3) = "#a1" Then
  nameBalanceSheet = elemCollection(i).innerText
 ElseIf Right(elemCollection(i).href, 3) = "#a2" Then
  nameCashFlow = elemCollection(i).innerText
 ElseIf Right(elemCollection(i).href, 3) = "#a3" Then
  nameHeader3 = elemCollection(i).innerText
 ElseIf Right(elemCollection(i).href, 3) = "#a4" Then
  nameHeader4 = elemCollection(i).innerText
 End If
Next

tblNameArr = Array(nameBalanceSheet, nameCashFlow, nameHeader3, nameHeader4)

...