menteith - 6 months ago
Javascript Question

How do I retrieve some information from HTML code in AHK?

I'd like to retrieve some information from HTML code. Let's consider the following:

<ul class="article-additional-info">
<li><strong>Issue Year:</strong> 2011</li>
<li><strong>Issue No:</strong> 1 (200)</li>
<li><strong>Page Range:</strong> 65-80</li>
<li><strong>Page Count:</strong> 15</li>
<li><strong>Language:</strong> Polish</li>
</ul>


Using document.getElementsByClassName("article-additional-info")[0].innerText, I can get all the information in the article-additional-info class.

But how do I retrieve an individual value from this class, such as 2011 (from <strong>Issue Year:</strong> 2011)? I'd like to avoid using RegEx.

EDIT:

Based on the answer, I slightly modified the code. However, I cannot get rid of one label, "Language:", which still appears in the output. Here's the code:

html =
(
<body>
<ul class="article-additional-info">
<li><strong>Issue Year:</strong> 2011</li>
<li><strong>Issue No:</strong> 1 (200)</li>
<li><strong>Page Range:</strong> 65-80</li>
<li><strong>Page Count:</strong> 15</li>
<li><strong>Language:</strong> Polish</li>
</ul>
</body>
)

document := ComObjCreate("HTMLfile")
document.write(html)

test := ["Issue Year:", "Issue No:", "Page Range:", "Page Count:"]

try While (x := document.getElementsByTagName("ul")[A_Index-1])
    {
    if (x.className = "article-additional-info")
        {
        count++
        yclass%count% := x.innerHTML
        }
    }

loop, %count%
{
html := yclass%A_Index%
document.Close
document := ComObjCreate("HTMLfile")
document.write(html)

try While (x := document.getElementsByTagName("strong")[A_Index-1])
    {
    StringLen, y, % test[A_Index]
    msgbox % [A_Index] . " " . substr(x.parentnode.innerText, y+2)
    }
}
ExitApp

Answer

Try it like so:

html =
(
<body>
<ul class="article-additional-info">
   <li><strong>Issue Year:</strong> 2011</li>
   <li><strong>Issue No:</strong> 1 (200)</li>
   <li><strong>Page Range:</strong> 65-80</li>
   <li><strong>Page Count:</strong> 15</li>
   <li><strong>Language:</strong> Polish</li>
</ul>
<ul class="article-additional-info">
   <li><strong>Issue Year:</strong> XX 2011</li>
   <li><strong>Issue No:</strong> XX 1 (200)</li>
   <li><strong>Page Range:</strong> XX 65-80</li>
   <li><strong>Page Count:</strong> XX 15</li>
   <li><strong>Language:</strong> XX Polish</li>
</ul>
</body>
)

test := "Language:"  ;  adjust for the label whose value you want to return
classno := 1  ;  adjust the number for the correct class instance!

document := ComObjCreate("HTMLfile")
document.write(html)

;  store the innerHTML of every <ul> whose class is article-additional-info,
;  numbering only the matching lists (yclass1, yclass2, ...)
try While (x := document.getElementsByTagName("ul")[A_Index-1])
    {
    if (x.className = "article-additional-info")
        {
        count++
        yclass%count% := x.innerHTML
        }
    }
html := yclass%classno%

;  load the chosen list into a fresh document
document.close()
document := ComObjCreate("HTMLfile")
document.write(html)

StringLen, y, test  ;  length of the label, e.g. 9 for "Language:"
try While (x := document.getElementsByTagName("strong")[A_Index-1])
    {
    if (x.innerText = test)
        msgbox % substr(x.parentnode.innerText, y+2)  ;  returns "Polish"
    }
ExitApp
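A possible alternative, sketched here assuming AutoHotkey v1.1 and the same HTMLfile COM object, is to call getElementsByTagName directly on each matching <ul> element instead of copying its innerHTML into a second document, and to store each value under its label so the order of the <strong> elements no longer matters. GetInfo and info are hypothetical names introduced for this sketch, and it assumes each <li> holds one <strong> label followed by its value.

```autohotkey
html =
(
<body>
<ul class="article-additional-info">
   <li><strong>Issue Year:</strong> 2011</li>
   <li><strong>Issue No:</strong> 1 (200)</li>
   <li><strong>Page Range:</strong> 65-80</li>
   <li><strong>Page Count:</strong> 15</li>
   <li><strong>Language:</strong> Polish</li>
</ul>
</body>
)

document := ComObjCreate("HTMLfile")
document.write(html)

uls := document.getElementsByTagName("ul")
Loop % uls.length
{
    ul := uls[A_Index-1]
    if (ul.className != "article-additional-info")
        continue  ;  skip lists with any other class
    info := GetInfo(ul)  ;  hypothetical helper defined below
    MsgBox % "Issue Year: " info["Issue Year:"] "`nLanguage: " info["Language:"]
}

;  Hypothetical helper: returns an object that maps each <strong> label
;  (e.g. "Issue Year:") to the text after it in the same <li> (e.g. "2011").
GetInfo(ul)
{
    info := {}
    items := ul.getElementsByTagName("li")
    Loop % items.length
    {
        li := items[A_Index-1]
        strongs := li.getElementsByTagName("strong")
        if (strongs.length = 0)
            continue  ;  no label in this <li>
        label := strongs[0].innerText
        ;  drop the label, then trim the space(s) that follow it
        info[label] := Trim(SubStr(li.innerText, StrLen(label) + 1))
    }
    return info
}
```

Because the values are keyed by label, info["Page Range:"] gives 65-80 for this list regardless of where that <li> appears, and Trim() also copes with more than one space after the label, which a fixed y+2 offset does not.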

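And, if you want to iterate over all the class instances and all the labels, do it like so. Note that the test array must list all five labels, including "Language:", in the same order as the <strong> elements. In your edited code it had only four, so on the fifth <strong> test[A_Index] was empty, StringLen set y to 0, and the "Language:" label was not stripped. Also, [A_Index] in your msgbox is an array literal rather than the index, so it shows up as an empty string; the code below prints which (the class instance) and test[A_Index] (the label) instead: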

html =
(
<body>
<ul class="article-additional-info">
   <li><strong>Issue Year:</strong> 2011</li>
   <li><strong>Issue No:</strong> 1 (200)</li>
   <li><strong>Page Range:</strong> 65-80</li>
   <li><strong>Page Count:</strong> 15</li>
   <li><strong>Language:</strong> Polish</li>
</ul>
<ul class="ao">
   <li><strong>Issue Year:</strong> zz 2011</li>
   <li><strong>Issue No:</strong> zz 1 (200)</li>
   <li><strong>Page Range:</strong> zz 65-80</li>
   <li><strong>Page Count:</strong> zz 15</li>
   <li><strong>Language:</strong> zz Polish</li>
</ul>
<ul class="article-additional-info">
   <li><strong>Issue Year:</strong> XX 2011</li>
   <li><strong>Issue No:</strong> XX 1 (200)</li>
   <li><strong>Page Range:</strong> XX 65-80</li>
   <li><strong>Page Count:</strong> XX 15</li>
   <li><strong>Language:</strong> XX Polish</li>
</ul>
</body>
)

document := ComObjCreate("HTMLfile")
document.write(html)

test := ["Issue Year:", "Issue No:", "Page Range:", "Page Count:", "Language:"]

try While (x := document.getElementsByTagName("ul")[A_Index-1])
    {
    if (x.className = "article-additional-info")
        {
        count++
        yclass%count% := x.innerHTML
        }
    }

loop, %count%
{
which++
html := yclass%A_Index%

document.close()
document := ComObjCreate("HTMLfile")
document.write(html)

try While (x := document.getElementsByTagName("strong")[A_Index-1])
    {
    StringLen, y, % test[A_Index]
    msgbox % which . ": " . test[A_Index] . " " . substr(x.parentnode.innerText, y+2)
    }
}
ExitApp

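Here, substr(x.parentnode.innerText, y+2) is the value you are looking for: x.parentnode.innerText is the text of the whole <li> (for example "Issue Year: 2011"), y is the length of the label, and y+2 starts just after the label and the single space that follows it.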
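To make the offset arithmetic concrete, here is a small worked example on one hard-coded <li> text. The variable names liText, label and value are illustrative only, and it assumes exactly one space between the label and the value, as in the sample HTML.

```autohotkey
liText := "Issue Year: 2011"    ;  what x.parentnode.innerText returns for this <li>
label  := "Issue Year:"         ;  the <strong> label
y := StrLen(label)              ;  y = 11
value := SubStr(liText, y + 2)  ;  character 12 is the space, so start at 13
MsgBox % value                  ;  shows 2011
```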

Have Fun!!
