user3839452 - 4 months ago
PowerShell Question

PowerShell HTML parsing from multiple tables

I need help extracting the HTML tables from this page. I am trying to query different types of devices, and they have different fields. I'd like a way to use the column headers as property names and the data as the values, regardless of which tables it finds. The good thing is that every table has just two rows.

I've tried using ConvertFrom-Html from http://poshcode.org/4849 but that didn't help.

One big problem I am having is that when I run Invoke-WebRequest, there is no ParsedHtml property, so I can't search by ID.

$url = 'http://ipaddressofdevice/status.htm'
$r = Invoke-WebRequest $url


This is $r

StatusCode : 200
StatusDescription : OK
Content : {60, 104, 116, 109...}
RawContent : HTTP/1.0 200 OK
Tue, 02 Aug 2016 01: 35:59 GMT
Context-Type: text/html

<html>
<head>
<title><center><b>Controller Status</b></title>
<meta http-equiv="Content-Type" content="text/html; charset...
Headers : {[Tue, 02 Aug 2016 01, 35:59 GMT], [Context-Type, text/html]}
RawContentLength : 4443
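
A note on the dump above: Content is shown as a byte array ({60, 104, 116, 109...}) rather than a string, which is probably related to the missing ParsedHtml. The device sends a header named Context-Type instead of Content-Type, so Invoke-WebRequest likely cannot tell that the body is HTML. Below is a minimal sketch of turning such bytes into text, assuming the iso-8859-1 charset named in the page's meta tag. $contentBytes is a hypothetical stand-in for $r.Content; only its first four bytes come from the dump, the last two are assumed.

```powershell
# Hypothetical stand-in for $r.Content (a byte array when the response is not recognised as text).
# 60, 104, 116, 109 are taken from the dump; 108 and 62 are assumed to complete the tag.
$contentBytes = [byte[]](60, 104, 116, 109, 108, 62)

# Decode with the charset declared in the page's <meta> tag (assumption: it is accurate).
$encoding = [System.Text.Encoding]::GetEncoding('iso-8859-1')
$html = $encoding.GetString($contentBytes)

$html
```

With the real response, $r.Content would take the place of $contentBytes, and the resulting string contains only the body, without the status line and headers that RawContent includes.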


This is $r.RawContent

HTTP/1.0 200 OK
Tue, 02 Aug 2016 01: 35:59 GMT
Context-Type: text/html

<html>
<head>
<title><center><b>Controller Status</b></title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<link rel="stylesheet" href="font_styles.css">
</head>
<body bgcolor="#FFFFFF"><center>
<table CELLSPACING=4 CELLPADDING=4 WIDTH="90%" bordercolorlight="#CCFFFF" bordercolordark="#003366"><tr>
<td colspan="2" class="intro_18">
<br><center>Controller Status<br><center><table BORDER=2 CELLSPACING=0 CELLPADDING=3 COLS=3 WIDTH="90%" bordercolorlight="#CCFFFF" bordercolordark="#003366">
<br><tr class="listhead_0" BGCOLOR="#009999"><th>Controller Type</th><th>Controller Name</th><th>Online</th><tr><tr><td class="listdata_1">Master Controller</td><td class="listdata_1">somename</td>
<td class="listdata_1">Yes</td></tr></table></td></tr><tr><td><center><table BORDER=2 CELLSPACING=0 CELLPADDING=3 COLS=3 WIDTH="90%" bordercolorlight="#CCFFFF" bordercolordark="#003366">
<tr class="listhead_0" BGCOLOR="#009999"><th>Main Image</th><th>Boot Image</th><th>Bootloader</th><th>Processor</th><th>Board</th><tr><td class="listdata_1">5.2.A.19813.i2</td>
<td class="listdata_1">5.0.4.17504.BOOT.i2</td><td class="listdata_1">2.0.35</td><td class="listdata_1">MPC860 D4</td><td class="listdata_1">II</td></tr></table></td></tr><tr><td>
<center><table BORDER=2 CELLSPACING=0 CELLPADDING=3 COLS=4 WIDTH="90%" bordercolorlight="#CCFFFF" bordercolordark="#003366">
<tr class="listhead_0" BGCOLOR="#009999"><th>MAC Address</th><th>IP Address</th><th>Host IP Address</th>
</tr><tr class="listdata_1"><td class="listdata_1">010bdc</td>
</td><td class="listdata_1">172.0.0.1</td><td class="listdata_1">hostname.com</td></tr></table></td></tr><tr><td>
<center><table BORDER=2 CELLSPACING=0 CELLPADDING=3 COLS=4 WIDTH="90%" bordercolorlight="#CCFFFF" bordercolordark="#003366">
<tr class="listhead_0" BGCOLOR="#009999"><th>Local Date / Time</th>
<th>GMT Date / Time</th>
<th>DST</th>
<th>Boot Date / Time</th>
<th>Elapsed Time Since Boot</th>
</tr><td class="listdata_1">Tue Aug 2 7: 5:59 2016
India Standard Time</td>
<td class="listdata_1">Tue Aug 2 1:35:59 2016
</td>
<td class="listdata_1">No</td><td class="listdata_1">Fri Jul 15 23:30:13 2016
</td>
<td class="listdata_1">17 days 2 hours 5 minutes 46 seconds</td>
</table></td></tr><tr><td>
<center><table BORDER=2 CELLSPACING=0 CELLPADDING=3 COLS=3 WIDTH="90%" bordercolorlight="#CCFFFF" bordercolordark="#003366">
<tr class="listhead_0" BGCOLOR="#009999"><th>Total Program Memory</th><th>Free Program Memory</th><th>Percent Free</th></tr><tr><td class="listdata_1">15425536</td>
<td class="listdata_1">6197248</td>
<td class="listdata_1">40.18 %</td>
</tr>
</table>
</td></tr><tr><td>
<center><table BORDER=2 CELLSPACING=0 CELLPADDING=3 COLS=3 WIDTH="90%" bordercolorlight="#CCFFFF" bordercolordark="#003366">
<tr class="listhead_0" BGCOLOR="#009999"><th>Total Storage Memory</th><th>Free Storage Memory</th><th>Total Physical Memory</th></tr><tr><td class="listdata_1">50819072</td>
<td class="listdata_1">45514064</td>
<td class="listdata_1">64 Meg</td>
</tr>
</table>
</td></tr><tr><td>
<center><table BORDER=2 CELLSPACING=0 CELLPADDING=3 COLS=2 WIDTH="90%" bordercolorlight="#CCFFFF" bordercolordark="#003366">
<tr class="listhead_0" BGCOLOR="#009999"><th>Host Connection Status</th>
<th>Path To Host</th></tr><tr><td class="listdata_1">Host Connection Established</td>
<td class="listdata_1">Yes</td></tr>
</table>
</td></tr><tr><td>
<center><table BORDER=2 CELLSPACING=0 CELLPADDING=3 COLS=2 WIDTH="90%" bordercolorlight="#CCFFFF" bordercolordark="#003366">
<tr class="listhead_0" BGCOLOR="#009999"><th>Active Communication Type</th><th>Secondary Communication Type</th>
</tr><td class="listdata_1">Ethernet</td>
<td class="listdata_1">N/A</td>
</tr>
</table>
</td></tr><tr><td>
<center><table BORDER=2 CELLSPACING=0 CELLPADDING=3 COLS=3 WIDTH="90%" bordercolorlight="#CCFFFF" bordercolordark="#003366">
<tr class="listhead_0" BGCOLOR="#009999"><th>PCMCIA Ethernet Card Address</th><th>Modem</th><th>USB Security Key</th></tr><td class="listdata_1">N/A</td><td class="listdata_1">N/A</td>
<td class="listdata_1">N/A</td></tr>
</table>
</td></tr>
</table>
<p><font face="Verdana, Trebuchet MS, Tahoma, Arial, sans-serif" size="1" color="#003366">
Copyright ? 2008 Tyco International Ltd. and its Respective Companies. All Rights Reserved</font></p>
</body>
</html>

Answer

If your HTML page always looks like this, you could use the following:

# $raw is your $r.RawContent from Invoke-WebRequest
$raw = @"
HTTP/1.0 200 OK
Tue, 02 Aug 2016 01: 35:59 GMT
Context-Type: text/html

<html>
<head>
<title><center><b>Controller Status</b></title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<link rel="stylesheet" href="font_styles.css">
</head>
<body bgcolor="#FFFFFF"><center>
<table CELLSPACING=4 CELLPADDING=4 WIDTH="90%" bordercolorlight="#CCFFFF" bordercolordark="#003366"><tr>
<td colspan="2" class="intro_18">
<br><center>Controller Status<br><center><table BORDER=2 CELLSPACING=0 CELLPADDING=3 COLS=3 WIDTH="90%" bordercolorlight="#CCFFFF" bordercolordark="#003366">
<br><tr class="listhead_0" BGCOLOR="#009999"><th>Controller Type</th><th>Controller Name</th><th>Online</th><tr><tr><td class="listdata_1">Master Controller</td><td class="listdata_1">somename</td>
<td class="listdata_1">Yes</td></tr></table></td></tr><tr><td><center><table BORDER=2 CELLSPACING=0 CELLPADDING=3 COLS=3 WIDTH="90%" bordercolorlight="#CCFFFF" bordercolordark="#003366">
<tr class="listhead_0" BGCOLOR="#009999"><th>Main Image</th><th>Boot Image</th><th>Bootloader</th><th>Processor</th><th>Board</th><tr><td class="listdata_1">5.2.A.19813.i2</td>
<td class="listdata_1">5.0.4.17504.BOOT.i2</td><td class="listdata_1">2.0.35</td><td class="listdata_1">MPC860 D4</td><td class="listdata_1">II</td></tr></table></td></tr><tr><td>
<center><table BORDER=2 CELLSPACING=0 CELLPADDING=3 COLS=4 WIDTH="90%" bordercolorlight="#CCFFFF" bordercolordark="#003366">
<tr class="listhead_0" BGCOLOR="#009999"><th>MAC Address</th><th>IP Address</th><th>Host IP Address</th>
</tr><tr class="listdata_1"><td class="listdata_1">010bdc</td>
</td><td class="listdata_1">172.0.0.1</td><td class="listdata_1">hostname.com</td></tr></table></td></tr><tr><td>
<center><table BORDER=2 CELLSPACING=0 CELLPADDING=3 COLS=4 WIDTH="90%" bordercolorlight="#CCFFFF" bordercolordark="#003366">
<tr class="listhead_0" BGCOLOR="#009999"><th>Local Date / Time</th>
<th>GMT Date / Time</th>
<th>DST</th>
<th>Boot Date / Time</th>
<th>Elapsed Time Since Boot</th>
</tr><td class="listdata_1">Tue Aug  2  7: 5:59 2016
 India Standard Time</td>
<td class="listdata_1">Tue Aug  2  1:35:59 2016
</td>
<td class="listdata_1">No</td><td class="listdata_1">Fri Jul 15 23:30:13 2016
</td>
<td class="listdata_1">17 days 2 hours 5 minutes 46 seconds</td>
</table></td></tr><tr><td>
<center><table BORDER=2 CELLSPACING=0 CELLPADDING=3 COLS=3 WIDTH="90%" bordercolorlight="#CCFFFF" bordercolordark="#003366">
<tr class="listhead_0" BGCOLOR="#009999"><th>Total Program Memory</th><th>Free Program Memory</th><th>Percent Free</th></tr><tr><td class="listdata_1">15425536</td>
<td class="listdata_1">6197248</td>
<td class="listdata_1">40.18 %</td>
</tr>
</table>
</td></tr><tr><td>
<center><table BORDER=2 CELLSPACING=0 CELLPADDING=3 COLS=3 WIDTH="90%" bordercolorlight="#CCFFFF" bordercolordark="#003366">
<tr class="listhead_0" BGCOLOR="#009999"><th>Total Storage Memory</th><th>Free Storage Memory</th><th>Total Physical Memory</th></tr><tr><td class="listdata_1">50819072</td>
<td class="listdata_1">45514064</td>
<td class="listdata_1">64 Meg</td>
</tr>
</table>
</td></tr><tr><td>
<center><table BORDER=2 CELLSPACING=0 CELLPADDING=3 COLS=2 WIDTH="90%" bordercolorlight="#CCFFFF" bordercolordark="#003366">
<tr class="listhead_0" BGCOLOR="#009999"><th>Host Connection Status</th>
<th>Path To Host</th></tr><tr><td class="listdata_1">Host Connection Established</td>
<td class="listdata_1">Yes</td></tr>
</table>
</td></tr><tr><td>
<center><table BORDER=2 CELLSPACING=0 CELLPADDING=3 COLS=2 WIDTH="90%" bordercolorlight="#CCFFFF" bordercolordark="#003366">
<tr class="listhead_0" BGCOLOR="#009999"><th>Active Communication Type</th><th>Secondary Communication Type</th>
</tr><td class="listdata_1">Ethernet</td>
<td class="listdata_1">N/A</td>
</tr>
</table>
</td></tr><tr><td>
<center><table BORDER=2 CELLSPACING=0 CELLPADDING=3 COLS=3 WIDTH="90%" bordercolorlight="#CCFFFF" bordercolordark="#003366">
<tr class="listhead_0" BGCOLOR="#009999"><th>PCMCIA Ethernet Card Address</th><th>Modem</th><th>USB Security Key</th></tr><td class="listdata_1">N/A</td><td class="listdata_1">N/A</td>
<td class="listdata_1">N/A</td></tr>
</table>
</td></tr>
</table>
<p><font face="Verdana, Trebuchet MS, Tahoma, Arial, sans-serif" size="1" color="#003366">
Copyright ? 2008 Tyco International Ltd. and its Respective Companies.  All Rights Reserved</font></p>
</body>
</html>
"@

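# Collect every header/value pair from all tables on a single object.
$obj = New-Object PSObject
# $i is the offset of the first unprocessed character; it only advances inside the loop body.
for ($i=0;$i -lt $raw.Length-8;){

    # Take the unprocessed remainder and cut it just after the next '</table>' (8 characters long).
    $table = $raw.Remove(0,$i)
    $i += $table.IndexOf('</table>') + 8
    $table = $table.Remove($table.IndexOf('</table>') + 8)
    # Header names: the text between each '<th>' and its '</th>'.
    # @() keeps the result an array even when a table has a single column.
    $columns = @($table -split "`n" -join '' -split '<th>' |
                    Where-Object {$_ -like '*</th>*'} |
                        ForEach-Object {$_.Remove($_.IndexOf('</th>'))} |
                          Where-Object {$_ -and $_ -ne [Char]13})
    # Cell values: the text between each '<td class="...">' and its '</td>'.
    $values =  @($table -split "`n" -join '' -split '<td class' |
                    Where-Object {$_ -like '*</td>*'} |
                        ForEach-Object {$_.Remove($_.IndexOf('</td>')) -replace '.*">'} |
                            Where-Object {$_ -and $_ -ne [Char]13})
    # Pair the n-th header with the n-th value and add it as a property.
    $o=0
    $columns | ForEach-Object {
        $column = $_
        $obj | Add-Member -MemberType NoteProperty -Name $column -Value $values[$o]
        $o++
    }
}
$obj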

Output:

Controller Type              : Master Controller
Controller Name              : somename
Online                       : Yes
Main Image                   : 5.2.A.19813.i2
Boot Image                   : 5.0.4.17504.BOOT.i2
Bootloader                   : 2.0.35
Processor                    : MPC860 D4
Board                        : II
MAC Address                  : 010bdc
IP Address                   : 172.0.0.1
Host IP Address              : hostname.com
Local Date / Time            : Tue Aug  2  7: 5:59 2016 India Standard Time
GMT Date / Time              : Tue Aug  2  1:35:59 2016
DST                          : No
Boot Date / Time             : Fri Jul 15 23:30:13 2016
Elapsed Time Since Boot      : 17 days 2 hours 5 minutes 46 seconds
Total Program Memory         : 15425536
Free Program Memory          : 6197248
Percent Free                 : 40.18 %
Total Storage Memory         : 50819072
Free Storage Memory          : 45514064
Total Physical Memory        : 64 Meg
Host Connection Status       : Host Connection Established
Path To Host                 : Yes
Active Communication Type    : Ethernet
Secondary Communication Type : N/A
PCMCIA Ethernet Card Address : N/A
Modem                        : N/A
USB Security Key             : N/A
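
The same header-to-value pairing can also be sketched with regular expressions instead of string splitting. This is an alternative under stated assumptions, not the method used in the answer: $sample is a made-up two-table fragment in the shape of the device page, and the pattern assumes every data cell carries class="listdata_1" as it does in the question's HTML.

```powershell
# Hypothetical sample: two tables shaped like the ones on the device page.
$sample = @'
<table><tr class="listhead_0"><th>Controller Type</th><th>Online</th><tr>
<tr><td class="listdata_1">Master Controller</td>
<td class="listdata_1">Yes</td></tr></table>
<table><tr class="listhead_0"><th>DST</th><th>Boot Date / Time</th></tr>
<td class="listdata_1">No</td><td class="listdata_1">Fri Jul 15 23:30:13 2016
</td></table>
'@

# Singleline lets '.' match newlines, so cells that span lines are still captured.
$options = [System.Text.RegularExpressions.RegexOptions]'IgnoreCase, Singleline'
$props = [ordered]@{}

foreach ($t in [regex]::Matches($sample, '<table\b.*?</table>', $options)) {
    # Header text of this table, in document order.
    $headers = @([regex]::Matches($t.Value, '<th[^>]*>(.*?)</th>', $options) |
        ForEach-Object { $_.Groups[1].Value.Trim() })
    # Data cells of this table, with internal whitespace and line breaks collapsed.
    $cells = @([regex]::Matches($t.Value, '<td\s+class="listdata_1"[^>]*>(.*?)</td>', $options) |
        ForEach-Object { ($_.Groups[1].Value -replace '\s+', ' ').Trim() })
    # Pair the n-th header with the n-th cell; a repeated header overwrites the earlier value.
    for ($n = 0; $n -lt $headers.Count; $n++) {
        $props[$headers[$n]] = $cells[$n]
    }
}

$status = [pscustomobject]$props
$status
```

Building the properties in an ordered dictionary first avoids the error Add-Member raises when two tables share a header name, at the cost of keeping only the last value for that header.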