Greetings, Stack Overflow! I am looking for a little help on how to parse an HTML document. My challenge is that I cannot use a third-party DLL such as HTML Agility Pack. Unfortunately, this all has to be done via code or references native to VS. I was looking into JSON, but I thought maybe someone had an easier way. I am trying to retrieve certain data from webpages like http://www.wowhead.com/item=109118/blackrock-ore. There are multiple sections I am looking to retrieve data from. Each section starts with:
Well, hundreds of SO users will tell you not to regex HTML, but you're technically scraping the content within
<script>...</script> tags, so you may be able to get away with this one.
Let's take a crack at it.
After inspecting the page source, it appears that the JS within the
<script>...</script> tags is formatted consistently. This makes our job easy.
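As a rough illustration of narrowing the search to the script content first, here is a minimal sketch. It is written in Python only for brevity; the same pattern should carry over to .NET's built-in Regex class with the Singleline and IgnoreCase options. The sample markup, the `Listview` call, and its values are made up for the example and are not copied from the real page, and `script_bodies` is a hypothetical helper name.

```python
import re

# Hypothetical sample shaped like the kind of page described above.
# The Listview call and its values are invented for illustration.
SAMPLE_HTML = """
<html><body>
<script type="text/javascript">
new Listview({template: 'item', id: 'mined-from', data: []});
</script>
<p>Some markup we do not care about.</p>
<script>
new Listview({template: 'npc', id: 'dropped-by', data: []});
</script>
</body></html>
"""

# Non-greedy match from each opening <script ...> tag to the nearest closing tag.
# DOTALL lets "." span newlines; IGNORECASE tolerates <SCRIPT> as well.
SCRIPT_BODY = re.compile(r"<script\b[^>]*>(.*?)</script>", re.DOTALL | re.IGNORECASE)


def script_bodies(html):
    """Return the stripped text inside every <script> element found in html."""
    return [body.strip() for body in SCRIPT_BODY.findall(html)]


bodies = script_bodies(SAMPLE_HTML)
for body in bodies:
    print(body)
```

This only works because the script content is predictable. It would break on a page where the text `</script>` appears inside a JavaScript string, which is one reason people warn against regex for HTML in general.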
We know that the
id attribute will follow the
template attribute. We also know that the developer of this webpage consistently used single quotes to surround the
template values. Therefore, we'll capture the contents within the single quotes that follow the
id attribute names using