Jordan Lee - 4 months ago
HTML Question

VB.net searching through HTML code

I'm creating a program that searches through a page's HTML source code and reports whether a specified string is present, but it always comes back false. Could someone have a look in case I am missing something?

Private Const QUOTE As Char = """"c

Private Sub ServerStatus_Load(sender As Object, e As EventArgs) Handles MyBase.Load

'download the page source and store it here
Dim sourceString As String = New System.Net.WebClient().DownloadString("https://support.rockstargames.com/hc/en-us/articles/200426246")

'call the source and validate a string exists, if not
If (sourceString).Contains($"<div class={QUOTE}panel-base xbl{QUOTE} style={QUOTE}background-color: RGB(236, 255, 236);{QUOTE}><div class={QUOTE}marshmallowLogo{QUOTE} id={QUOTE}xboxLogo{QUOTE}>Xbox 360</div><center><span class={QUOTE}statusSpan{QUOTE} style={QUOTE}color green;{QUOTE}>Up</span></center>") = True Then
Label1.Text = "It's there"
' if it does
ElseIf (sourceString).Contains($"<div class={QUOTE}panel-base xbl{QUOTE} style={QUOTE}background-color: RGB(236, 255, 236);{QUOTE}><div class={QUOTE}marshmallowLogo{QUOTE} id={QUOTE}xboxLogo{QUOTE}>Xbox 360</div><center><span class={QUOTE}statusSpan{QUOTE} style={QUOTE}color green;{QUOTE}>Up</span></center>") = False Then
Label1.Text = "It's not"
End If

End Sub


End Class

Answer

So I spent a few minutes analyzing the page (you're welcome), and as indicated in a comment, the data is loaded via JavaScript and is not present in the base HTML returned by your original URL. I'm not 100% sure yet, but I think you actually want to look at this address:

https://supportfiles.rockstargames.com/support/serverStatus.json

which returns a response like this:

jsonCallbackStatus(
    {
        "statuses":

            {
                "psnUpOrDownOverride": "",
                "ps4UpOrDownOverride": "",
                "xboxUpOrDownOverride": "",
                "xboxOneUpOrDownOverride": "",
                "rgscUpOrDownOverride": "",
                "psnWarningOverrideMessage": "",
                "ps4WarningOverrideMessage": "",
                "xboxWarningOverrideMessage": "",
                "xboxOneWarningOverrideMessage": "",
                "rgscWarningOverrideMessage": "",
                "pcWarningOverrideMessage": "",
                "pcUpOrDownOverride": "",
                "giantWarningOverrideMessage": ""
            },

    }
);

If I'm reading this correctly, the empty string next to each item means there's nothing wrong... no news is good news. This should be so much easier to parse than all that html :) Don't forget to look at both the warning and the up/down status for your platform, as well as the giantWarningOverrideMessage.
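One wrinkle with that response: it's wrapped in a jsonCallbackStatus( ... ); call (JSONP), so it isn't plain json until the wrapper comes off. Here's a minimal sketch of removing it, assuming the wrapper always looks like the sample above. StripJsonpWrapper is a name I made up for illustration, not anything from Rockstar or the framework.

```vbnet
Imports System

Module JsonpSketch
    ' Hypothetical helper: returns the text between the first "(" and the last ")".
    ' Assumes the response is wrapped as callbackName( ... ); like the sample above.
    Function StripJsonpWrapper(response As String) As String
        Dim openParen = response.IndexOf("("c)
        Dim closeParen = response.LastIndexOf(")"c)
        If openParen < 0 OrElse closeParen <= openParen Then
            ' No wrapper found; assume the body is already plain json.
            Return response.Trim()
        End If
        Return response.Substring(openParen + 1, closeParen - openParen - 1).Trim()
    End Function

    Sub Main()
        ' Shortened stand-in for the real response.
        Dim sample = "jsonCallbackStatus({""statuses"":{""xboxUpOrDownOverride"":""""}});"
        Console.WriteLine(StripJsonpWrapper(sample))   ' {"statuses":{"xboxUpOrDownOverride":""}}
    End Sub
End Module
```

This is deliberately naive: if the server ever sends unwrapped json that contains parentheses inside a string value, the helper would cut in the wrong place.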

How I found this address

Data like this almost always comes in one of three ways: json, rss (or similar xml), or web service (soap). A web service would usually be loaded and parsed at the server, and then sent with the html, and rss is harder to parse in javascript and less popular recently, so I went for json first.

I started by opening the page in Chrome. Then I opened the developer tools (F12), chose the Network tab, and refreshed the page to get a list of every item downloaded from the web server for this page (see footnote 1). I then narrowed down the list by looking at just the javascript downloads (the JS button in the toolbar... I was looking for a json response). This gave me a reasonable number of items, and I narrowed the search further by only looking at 200 status responses, of which I only saw two: both from this address.

Note that the full address actually looked like this:

https://supportfiles.rockstargames.com/support/serverStatus.json?callback=jsonCallbackStatus&callback=jsonCallbackStatus&_=1465445182216

There's a bug in the page, as it makes no sense to have a callback url parameter twice, especially with the same value. I only bring this up because of the _ url parameter. That value is in milliseconds: cut the last 3 digits off and you end up with a unix timestamp (in seconds) that happens to match today's date. You may want to generate a url which includes a timestamp like this, as it's possible that Rockstar uses the timestamp on the server to avoid serving a cached response. You'd hate to get a response cached an hour ago, when everything was fine, if a server is down now.
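To make the timestamp arithmetic concrete, here's a small sketch that decodes the captured _ value and builds a fresh url the same way. It assumes .NET Framework 4.6 or later (for the DateTimeOffset unix-time methods), and BuildStatusUrl is a hypothetical helper name. Whether the server actually looks at this parameter is still a guess on my part.

```vbnet
Imports System

Module TimestampSketch
    ' Hypothetical helper: appends a millisecond timestamp as the "_" parameter.
    Function BuildStatusUrl(unixMs As Long) As String
        Return $"https://supportfiles.rockstargames.com/support/serverStatus.json?_={unixMs}"
    End Function

    Sub Main()
        ' The "_" value from the captured request: milliseconds since 1970-01-01 UTC.
        Dim captured As Long = 1465445182216L
        Dim whenSent = DateTimeOffset.FromUnixTimeMilliseconds(captured)
        Console.WriteLine(whenSent.ToString("u"))   ' 2016-06-09 04:06:22Z

        ' Same idea for a new request: use the current time in milliseconds.
        Console.WriteLine(BuildStatusUrl(DateTimeOffset.UtcNow.ToUnixTimeMilliseconds()))
    End Sub
End Module
```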

One last reminder: I'm not 100% sure this is the data you need. It's possible it comes from another request. But this is all you get for free :) Hopefully the write up of how I got this far is enough for you to do your own detective work verifying the result.

Of course, you also have the option of using a WebBrowser control, which would run the javascript. But it's way slower, you're back to parsing the ugly html, and any little html change will break your code (whereas the json result is likely to live through several web site redesigns).

Source code to read the data

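' Needs at the top of the file: Imports System.IO, System.Net, System.Text.RegularExpressions
' Milliseconds since the unix epoch, sent as the cache-busting "_" parameter
Dim unixTime As ULong = CULng((DateTime.UtcNow - New DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc)).TotalMilliseconds)
Using wc As New WebClient(),
      rdr As New StreamReader(wc.OpenRead($"https://supportfiles.rockstargames.com/support/serverStatus.json?_={unixTime}"))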

    Dim line = rdr.ReadLine()
    While line IsNot Nothing
        line = line.Trim()
        If line.StartsWith("""xboxUpOrDownOverride") Then
            Dim parts = line.Split(":".ToCharArray())
            parts(1) = Regex.Replace(parts(1), "[ "",]", "")
            If parts(1).Length > 0 Then
                Console.WriteLine("Up/Down Failed")
            Else
                Console.WriteLine("Up/Down Okay")
            End If
        End If
        If line.StartsWith("""xboxWarningOverrideMessage") Then
            Dim parts = line.Split(":".ToCharArray())
            parts(1) = Regex.Replace(parts(1), "[ "",]", "")
            If parts(1).Length > 0 Then
                Console.WriteLine("Warning Failed")
            Else
                Console.WriteLine("Warning Okay")
            End If
        End If
        If line.StartsWith("""giantWarningOverrideMessage") Then
            Dim parts = line.Split(":".ToCharArray())
            parts(1) = Regex.Replace(parts(1), "[ "",]", "")
            If parts(1).Length > 0 Then
                Console.WriteLine("Giant Warning Failed")
            Else
                Console.WriteLine("Giant Warning Okay")
            End If
        End If
        line = rdr.ReadLine()
    End While
End Using

You should also consider using a real json parser (very easy to do via NuGet), as even something as simple as adding a minifier would break this existing code by pushing everything onto one line.
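As a rough idea of what the parser route could look like, here's a sketch using Json.NET (the Newtonsoft.Json NuGet package, which is my choice here, not something the page requires). It assumes the jsonCallbackStatus( ... ) wrapper has already been removed from the text. DescribeStatus is a hypothetical helper, and the "Maintenance" value in the sample is made up so that both outcomes show.

```vbnet
Imports System
Imports Newtonsoft.Json.Linq

Module StatusParserSketch
    ' Hypothetical helper: "Okay" when the value is empty or missing, otherwise the message.
    Function DescribeStatus(json As String, key As String) As String
        Dim statuses = CType(JObject.Parse(json)("statuses"), JObject)
        Dim value = statuses.Value(Of String)(key)
        Return If(String.IsNullOrEmpty(value), "Okay", "Failed: " & value)
    End Function

    Sub Main()
        ' Made-up sample body, minified to show that line breaks no longer matter.
        Dim json = "{""statuses"":{""xboxUpOrDownOverride"":"""",""xboxWarningOverrideMessage"":"""",""giantWarningOverrideMessage"":""Maintenance""}}"

        For Each key In {"xboxUpOrDownOverride", "xboxWarningOverrideMessage", "giantWarningOverrideMessage"}
            Console.WriteLine($"{key}: {DescribeStatus(json, key)}")
        Next
    End Sub
End Module
```

Because the lookup goes by key rather than by line, this keeps working if the response is reformatted or the keys are reordered.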


1 And there were a lot of things downloaded. Rockstar should invest in a bundler to minimize http requests for faster page loads and lower bandwidth, especially on mobile devices.