user2093601 - 3 months ago
PowerShell Question

How to select HTML attribute values with PowerShell

I can't figure out how to pull the values of certain attributes out of an Invoke-WebRequest response.

What I have so far:

$r = Invoke-WebRequest -Uri "http://someInternalSite.com"


From the response, I select a specific table on the page by its ID:

$s = $r.ParsedHtml.body.getElementsByTagName('table') | Where {$_.id -eq 'SpecificIdOfTable'}


Then I can see the table's HTML by using:

$h = $s.getAttribute('outerhtml')


The output looks like:

<TABLE id=SpecificIdOfTable>
<TBODY>
<TR>
<TD>
<INPUT id=SpecificIdOfTable_0 type=checkbox value=11 name=SpecificIdOfTable$0><LABEL for=SpecificIdOfTable_0>First</LABEL>
</TD>
<TD><INPUT id=SpecificIdOfTable_2 type=checkbox value=12 name=SpecificIdOfTable$2><LABEL for=SpecificIdOfTable_2>Second</LABEL>
</TD>
</TR>
<TR>
<TD>
<INPUT id=SpecificIdOfTable_1 type=checkbox value=13 name=SpecificIdOfTable$1><LABEL for=SpecificIdOfTable_1>Third</LABEL>
</TD>
<TD></TD></TR></TBODY></TABLE>


What I'd like is to get just the values 11, 12, and 13 from the value= attribute of the input in each td. I've tried getByAttribute, but I keep getting errors. What I found is that outerHTML is just a string. Since PowerShell is an object-based scripting language, is there any way to access these attributes as objects and pull each value without regex? If I do need regex, what would it look like?

Also, a bonus question if someone could answer it: I've tried to shorten the request with pipes (to avoid the $h = statement) but have been met with errors. What am I doing wrong when I try to shorten the script with either of the following?

$r.ParsedHtml.body.getElementsByTagName('table') | Where {$_.id -eq 'SpecificIdOfTable'} | select outerHTML


or

$r.ParsedHtml.body.getElementsByTagName('table') | Where {$_.id -eq 'SpecificIdOfTable'} | $_.getAttribute('outerhtml')


Neither works, and I don't know why or how to shorten the code.

Answer

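Try this. It looks the table up by ID directly, then selects the id and value of each input element inside it: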

$r.ParsedHTML.GetElementByID('SpecificIdOfTable').GetElementsByTagName('input') |
    Select-Object -Property id, value
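
On the regex part of the question: the object approach above is usually the cleaner route, but if only the outerHTML string is available, a pattern along these lines may work. This is only a rough sketch. The $html here-string is a stand-in copied from the sample output in the question, the variable names $pattern and $values are made up for illustration, and the pattern assumes simple attributes like the ones shown (it is not a general HTML parser).

```powershell
# Stand-in for the outerHTML string; copied from the sample output in the question.
$html = @'
<TABLE id=SpecificIdOfTable>
<TBODY>
<TR>
<TD>
<INPUT id=SpecificIdOfTable_0 type=checkbox value=11 name=SpecificIdOfTable$0><LABEL for=SpecificIdOfTable_0>First</LABEL>
</TD>
<TD><INPUT id=SpecificIdOfTable_2 type=checkbox value=12 name=SpecificIdOfTable$2><LABEL for=SpecificIdOfTable_2>Second</LABEL>
</TD>
</TR>
<TR>
<TD>
<INPUT id=SpecificIdOfTable_1 type=checkbox value=13 name=SpecificIdOfTable$1><LABEL for=SpecificIdOfTable_1>Third</LABEL>
</TD>
<TD></TD></TR></TBODY></TABLE>
'@

# Match value=... inside each <input ...> tag; the value may be bare or quoted.
$pattern = '<input\b[^>]*\bvalue=["'']?([^\s"''>]+)'
$options = [System.Text.RegularExpressions.RegexOptions]::IgnoreCase

# Capture group 1 holds the attribute value; ForEach-Object unwraps each match.
$values = [regex]::Matches($html, $pattern, $options) |
    ForEach-Object { $_.Groups[1].Value }

$values   # 11, 12, 13 (as strings)
```

The ForEach-Object step is also what the second shortened pipeline in the question is missing: $_ is only defined inside a script block, so a bare $_.getAttribute('outerhtml') cannot be used as a pipeline stage by itself. Under the same assumption, appending | ForEach-Object { $_.value } to the getElementsByTagName('input') call above, in place of Select-Object, should return just the values.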