Simeon Popov - 6 months ago
Python Question

Python string to json and get html

I have a JSON string that does not seem to be valid JSON :)

{"d":{"__type":"NGW.WebClient.AjaxMessages.GVGameHtmlResponse","res":0,"html":"\u003cdiv id=\"gvGameFixed\" class=\"Hockey\" leagueid=\"4\" brmatchid=\"0\"\u003e\r\n\t\r\n \u003cdiv class=\"gameHead\"\u003e\r\n \u003cdiv class=\"section\"\u003e\r\n \u003cdiv class=\"subtitle\"\u003eHockey - NHL\u003c/div\u003e\r\n\t\t\t\u003cdiv class=\"desc\"\u003eHp Pavillion At San Jose\u003c/div\u003e\r\n \u003cdiv class=\"title\"\u003ePit Penguins vs SJ Sharks\u003c/div\u003e\r\n \u003c/div\u003e\r\n \u003cdiv class=\"nav\"\u003e\r\n \u003cbutton id=\"btnMyBets\" type=\"button\" class=\"btnMyBets\" onclick=\"loadMyWagersFrameOnGame(70892);\"\u003eMy Bets on This Game\u003c/button\u003e\r\n \u003c/div\u003e\r\n \u003c/div\u003e\r\n\r\n \r\n \r\n\r\n\u003c/div\u003e\r\n\r\n\u003cdiv id=\"gvPropContainer\" class=\"scrollInner\"\u003e\r\n \u003cdiv id=\"gvGameNoProps\"\u003e\r\n This event has no active propositions\r\n \u003c/div\u003e\r\n \r\n \r\n\r\n\u003cdiv class=\"gvProp\" pid=\u00272736341\u0027 order=\u002710\u0027\u003e\r\n \u003cdiv class=\"propTitle\"\u003e\u003cspan\u003eGame Winner\u003c/span\u003e\u003c/div\u003e \r\n \u003cul class=\u0027oneUp\u0027\u003e\r\n \r\n \r\n\r\n\r\n\u003cli onmouseover=\u0027mouseOver(this);\u0027 onmouseout=\u0027mouseOut(this);\u0027 onclick=\u0027betSlipAdd(event||window.event, this);\u0027 class=\u0027\u0027 pid=\u00272736341\u0027 pos=\u00271\u0027 odds=\u00271.3704\u0027 pts=\u00271.5\u0027\u003e\r\n\r\n\t\u003cdiv class=\"box\"\u003e\r\n\t \u003cdiv class=\"propText\"\u003epit penguins +1.5\u003c/div\u003e\r\n\t \u003cdiv class=\"odds\"\u003e\r\n\t \t−270\r\n \u003cimg alt=\"flat\" src=\"Skin/Pinoccio/Images/odds_flat.png?v=4.5.4.81\"/\u003e\r\n\t \u003c/div\u003e\r\n\t \u003cdiv class=\u0027selStatus\u0027\u003e\u003cspan\u003e\u003c/span\u003e\u003c/div\u003e\r\n \u003c/div\u003e\r\n\u003c/li\u003e\r\n\r\n\r\n\r\n\u003cli onmouseover=\u0027mouseOver(this);\u0027 onmouseout=\u0027mouseOut(this);\u0027 
onclick=\u0027betSlipAdd(event||window.event, this);\u0027 class=\u0027\u0027 pid=\u00272736341\u0027 pos=\u00272\u0027 odds=\u00273.21\u0027 pts=\u00271.5\u0027\u003e\r\n\r\n\t\u003cdiv class=\"box\"\u003e\r\n\t \u003cdiv class=\"propText\"\u003esj sharks −1.5\u003c/div\u003e\r\n\t \u003cdiv class=\"odds\"\u003e\r\n\t \t+221\r\n \u003cimg alt=\"flat\" src=\"Skin/Pinoccio/Images/odds_flat.png?v=4.5.4.81\"/\u003e\r\n\t \u003c/div\u003e\r\n\t \u003cdiv class=\u0027selStatus\u0027\u003e\u003cspan\u003e\u003c/span\u003e\u003c/div\u003e\r\n \u003c/div\u003e\r\n\u003c/li\u003e\r\n\r\n \u003c/ul\u003e\r\n\u003c/div\u003e\r\n\r\n\r\n\u003cdiv class=\"gvProp\" pid=\u00272736342\u0027 order=\u002720\u0027\u003e\r\n \u003cdiv class=\"propTitle\"\u003e\u003cspan\u003eGame Total - Incl OT/Pen\u003c/span\u003e\u003c/div\u003e \r\n \u003cul class=\u0027oneUp\u0027\u003e\r\n \r\n \r\n\r\n\r\n\u003cli onmouseover=\u0027mouseOver(this);\u0027 onmouseout=\u0027mouseOut(this);\u0027 onclick=\u0027betSlipAdd(event||window.event, this);\u0027 class=\u0027\u0027 pid=\u00272736342\u0027 pos=\u00271\u0027 odds=\u00272.39\u0027 pts=\u00275.5\u0027\u003e\r\n\r\n\t\u003cdiv class=\"box\"\u003e\r\n\t \u003cdiv class=\"propText\"\u003eover 5.5\u003c/div\u003e\r\n\t \u003cdiv class=\"odds\"\u003e\r\n\t \t+139\r\n \u003cimg alt=\"flat\" src=\"Skin/Pinoccio/Images/odds_flat.png?v=4.5.4.81\"/\u003e\r\n\t \u003c/div\u003e\r\n\t \u003cdiv class=\u0027selStatus\u0027\u003e\u003cspan\u003e\u003c/span\u003e\u003c/div\u003e\r\n \u003c/div\u003e\r\n\u003c/li\u003e\r\n\r\n\r\n\r\n\u003cli onmouseover=\u0027mouseOver(this);\u0027 onmouseout=\u0027mouseOut(this);\u0027 onclick=\u0027betSlipAdd(event||window.event, this);\u0027 class=\u0027\u0027 pid=\u00272736342\u0027 pos=\u00272\u0027 odds=\u00271.6061\u0027 pts=\u00275.5\u0027\u003e\r\n\r\n\t\u003cdiv class=\"box\"\u003e\r\n\t \u003cdiv class=\"propText\"\u003eunder 5.5\u003c/div\u003e\r\n\t \u003cdiv class=\"odds\"\u003e\r\n\t \t−165\r\n \u003cimg 
alt=\"flat\" src=\"Skin/Pinoccio/Images/odds_flat.png?v=4.5.4.81\"/\u003e\r\n\t \u003c/div\u003e\r\n\t \u003cdiv class=\u0027selStatus\u0027\u003e\u003cspan\u003e\u003c/span\u003e\u003c/div\u003e\r\n \u003c/div\u003e\r\n\u003c/li\u003e\r\n\r\n \u003c/ul\u003e\r\n\u003c/div\u003e\r\n\r\n\r\n\u003cdiv class=\"gvProp\" pid=\u00272736343\u0027 order=\u002730\u0027\u003e\r\n \u003cdiv class=\"propTitle\"\u003e\u003cspan\u003eGame Winner ML - Incl OT/Pen\u003c/span\u003e\u003c/div\u003e \r\n \u003cul class=\u0027oneUp\u0027\u003e\r\n \r\n \r\n\r\n\r\n\u003cli onmouseover=\u0027mouseOver(this);\u0027 onmouseout=\u0027mouseOut(this);\u0027 onclick=\u0027betSlipAdd(event||window.event, this);\u0027 class=\u0027\u0027 pid=\u00272736343\u0027 pos=\u00271\u0027 odds=\u00272.16\u0027 pts=\u00270\u0027\u003e\r\n\r\n\t\u003cdiv class=\"box\"\u003e\r\n\t \u003cdiv class=\"propText\"\u003epit penguins\u003c/div\u003e\r\n\t \u003cdiv class=\"odds\"\u003e\r\n\t \t+116\r\n \u003cimg alt=\"flat\" src=\"Skin/Pinoccio/Images/odds_flat.png?v=4.5.4.81\"/\u003e\r\n\t \u003c/div\u003e\r\n\t \u003cdiv class=\u0027selStatus\u0027\u003e\u003cspan\u003e\u003c/span\u003e\u003c/div\u003e\r\n \u003c/div\u003e\r\n\u003c/li\u003e\r\n\r\n\r\n\r\n\u003cli onmouseover=\u0027mouseOver(this);\u0027 onmouseout=\u0027mouseOut(this);\u0027 onclick=\u0027betSlipAdd(event||window.event, this);\u0027 class=\u0027\u0027 pid=\u00272736343\u0027 pos=\u00272\u0027 odds=\u00271.7299\u0027 pts=\u00270\u0027\u003e\r\n\r\n\t\u003cdiv class=\"box\"\u003e\r\n\t \u003cdiv class=\"propText\"\u003esj sharks\u003c/div\u003e\r\n\t \u003cdiv class=\"odds\"\u003e\r\n\t \t−137\r\n \u003cimg alt=\"flat\" src=\"Skin/Pinoccio/Images/odds_flat.png?v=4.5.4.81\"/\u003e\r\n\t \u003c/div\u003e\r\n\t \u003cdiv class=\u0027selStatus\u0027\u003e\u003cspan\u003e\u003c/span\u003e\u003c/div\u003e\r\n \u003c/div\u003e\r\n\u003c/li\u003e\r\n\r\n 
\u003c/ul\u003e\r\n\u003c/div\u003e\r\n\r\n\u003c/div\u003e\r\n\r\n","gameID":70892,"maxPropStamp":1465233306663,"progStamp":1464871557570,"msgsHtml":"\r\n\r\n\u003cdiv id=\"eventMessages\"\u003e\r\n \u003cul id=\"eventMessagesContent\"\u003e\r\n \r\n \r\n \u003c/ul\u003e\r\n \u003cdiv class=\"viewMoreBtn collapsed\"\u003e\r\n \u003cinput type=\"hidden\" id=\"strViewMoreMessages\" value=\"Show messages\"/\u003e\r\n \u003cinput type=\"hidden\" id=\"strHideMessages\" value=\"Hide messages\"/\u003e\r\n \u003cp\u003e\r\n Show messages\r\n \u003c/p\u003e\r\n \u003c/div\u003e\r\n\u003c/div\u003e","maxMessageStamp":1465233485217}}


I need to get the "html" value and process it with BeautifulSoup.

The problems are:
1. Why can't I convert this to JSON? (Anyway, I can get the value with a regex too.)
2. The biggest problem is that I cannot convert this Unicode-escaped string to plain HTML that I can process with bs4. Can you help? What can I do to get this string and process it with BeautifulSoup?

Thanks.

Answer

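This reads your data properly (assuming test.json contains your data). The \u003c and \u003e sequences are ordinary JSON escapes for < and >, so the JSON parser decodes them for you and no separate Unicode conversion is needed: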

#!/usr/bin/env python

import json
import bs4

# Parse the whole file as JSON
with open('test.json') as file_:
    json_data = json.load(file_)

# The HTML fragment lives under d -> html
soup = bs4.BeautifulSoup(json_data['d']['html'], 'html.parser')

print(soup)
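
If the response is already in a Python string rather than in a file, json.loads should work the same way. Below is a minimal sketch, assuming a shortened, made-up sample: the raw and broken strings are illustrative, not the full response from the question. It also shows one possible reason the conversion might fail: if the text was copied with real line breaks inside a string value, the default parser rejects it, and strict=False relaxes that check.

```python
import json

# Shortened, hypothetical sample shaped like the response in the question.
raw = r'{"d":{"res":0,"html":"\u003cdiv class=\"title\"\u003ePit Penguins vs SJ Sharks\u003c/div\u003e","gameID":70892}}'

data = json.loads(raw)
html = data["d"]["html"]  # the JSON escapes are now plain < and >
print(html)

# Hypothetical broken input: a real newline inside a JSON string value.
broken = '{"html": "line one\nline two"}'
try:
    json.loads(broken)
except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
    print("strict parse failed:", exc)

# strict=False allows control characters (such as newlines) inside strings.
relaxed = json.loads(broken, strict=False)
print(repr(relaxed["html"]))
```

The resulting html string can then be passed to bs4.BeautifulSoup(html, 'html.parser') in the same way as the file-based version.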