ImTrying - 3 months ago
PowerShell Question

Having trouble with a non-breaking error

My code appears to work every time I run it, yet it still returns an error. I'd like to know what is happening and how to fix it.

My code is used to scrape metadata from an array of website links.

Non-breaking error:

Cannot index into a null array.
At C:\test\websiteScrape.ps1:127 char:5
+ $List += [pscustomobject]@{
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : InvalidOperation: (:) [], RuntimeException
+ FullyQualifiedErrorId : NullArray


Code:

$web = New-Object Net.WebClient
$web | Get-Member

function getMetaData($Array){
    $fullArray = @()

    foreach ($element in $Array){
        $metaString = $web.DownloadString($element)

        $metaArray = $metaString | Select-String -AllMatches '(meta name=".*?".+")|(a lang="fr" href=".*?")' | % { $_.Matches } | % { $_.Value }

        $fullArray += ,($element,$metaArray)
    }

    return $fullArray
}
# $Array is a System.Array holding a bunch of strings (links to a website).

$metaData = getMetaData $Array

$List = @()
for ($i=0; $i -le $metaData.length; $i++){
    $List += [pscustomobject]@{
        PageName = $metaData[$i][0]
        Description = [regex]::Replace($metaData[$i][1][1], 'meta name=".*?" content="(.*?)"', '$1')
        Creator = [regex]::Replace($metaData[$i][1][2], 'meta name=".*?" content="(.*?)"', '$1')
        Instituation = [regex]::Replace($metaData[$i][1][3], 'meta name=".*?" content="(.*?)"', '$1')
        Languague = [regex]::Replace($metaData[$i][1][4], 'meta name=".*?" content="(.*?)"', '$1')
        Subject = [regex]::Replace($metaData[$i][1][5], 'meta name=".*?" content="(.*?)"', '$1')
        Indentifier = [regex]::Replace($metaData[$i][1][6], 'meta name=".*?" content="(.*?)"', '$1')
    }
}
$List | Select-Object -Property PageName, Description | Export-Csv -path C:\Desktop\urlsAndMetaData.csv -NoTypeInformation

Answer
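
The error most likely comes from the loop condition: `$i -le $metaData.length` runs one iteration past the last element. Outside strict mode, indexing one past the end of an array returns `$null`, and indexing into that `$null` raises "Cannot index into a null array". The failure stops only that single `$List +=` statement, which would explain why the output still looks right. Here is a minimal sketch of that behaviour, assuming default (non-strict) mode; the sample `$metaData` is a hypothetical stand-in for the real scrape results, not your actual data:

```powershell
# Hypothetical stand-in for the real $metaData: two (url, matches) pairs.
$metaData = @(
    @('http://example.com/a', @('m0', 'm1')),
    @('http://example.com/b', @('m0', 'm1'))
)

# One past the end: no error outside strict mode, just $null.
$pastEnd = $metaData[$metaData.Length]

# Indexing into that $null reproduces the error from the question.
$errorId = $null
try {
    $null = $metaData[$metaData.Length][0]
} catch {
    $errorId = $_.FullyQualifiedErrorId   # NullArray
}

# Using -lt instead of -le stops at the last valid index.
$pageNames = for ($i = 0; $i -lt $metaData.Length; $i++) {
    $metaData[$i][0]
}
```

With `-lt` in place of `-le`, the loop in the question should visit only indices 0 through `$metaData.length - 1`, so the extra failing iteration goes away. I have not run this against your real pages.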

If I read your code correctly (unfortunately I don't have a link to test it against), you can simplify it a lot. This should do the same:

$web = New-Object Net.WebClient
# DownloadString expects absolute URLs, so include the scheme.
$urls = @('http://www.firstlink.com', 'http://www.link2.com')

$regex = '<meta\s+name="([^"]+)" content="([^"]+)'

$urls | ForEach-Object {
    $webSiteContent = $web.DownloadString($_) 
    $metaData = @{}
    [regex]::Matches($webSiteContent, $regex) | ForEach-Object {
        # Indexer assignment, unlike Add(), does not throw when a meta name repeats.
        $metaData[$_.Groups[1].Value] = $_.Groups[2].Value
    }
    [PSCustomObject]@{
        PageName = $_
        Description =  $metaData['gc.description.long']
        Creator = $metaData['dc.creator']
        Instituation = $metaData['dc.institution']
        Languague = $metaData['dc.language']
    }
} | Export-Csv -path C:\Desktop\urlsAndMetaData.csv -NoTypeInformation 

$web.Dispose()
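
To see how the name-to-content lookup behaves without hitting the network, here is a small sketch run against a made-up HTML snippet. The sample markup and its values are assumptions for illustration only; the regex is the one from the code above.

```powershell
# Made-up sample markup (an assumption, not taken from a real page).
$sampleHtml = @'
<meta name="dc.creator" content="Jane Doe">
<meta name="dc.language" content="eng">
'@

# Group 1 captures the meta name, group 2 captures its content.
$regex = '<meta\s+name="([^"]+)" content="([^"]+)'

$metaData = @{}
foreach ($match in [regex]::Matches($sampleHtml, $regex)) {
    # Indexer assignment keeps the last value if a name appears twice.
    $metaData[$match.Groups[1].Value] = $match.Groups[2].Value
}

$creator = $metaData['dc.creator']
$missing = $metaData['dc.institution']   # name not present in the sample
```

A meta name that is absent from the page simply yields `$null`, which Export-Csv writes as an empty cell, so no positional indexing is needed and a page with fewer tags should not raise an error.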