plzhelpmi plzhelpmi - 4 months ago 22
JSON Question

Using Beautiful Soup to extract tag attributes and output them as JSON

As mentioned in my previous question, I am using Beautiful Soup with Python to retrieve weather data from a website.

Here is what the website's XML looks like:

<channel>
<title>2 Hour Forecast</title>
<source>Meteorological Services Singapore</source>
<description>2 Hour Forecast</description>
<item>
<title>Nowcast Table</title>
<category>Singapore Weather Conditions</category>
<forecastIssue date="18-07-2016" time="03:30 PM"/>
<validTime>3.30 pm to 5.30 pm</validTime>
<weatherForecast>
<area forecast="TL" lat="1.37500000" lon="103.83900000" name="Ang Mo Kio"/>
<area forecast="SH" lat="1.32100000" lon="103.92400000" name="Bedok"/>
<area forecast="TL" lat="1.35077200" lon="103.83900000" name="Bishan"/>
<area forecast="CL" lat="1.30400000" lon="103.70100000" name="Boon Lay"/>
<area forecast="CL" lat="1.35300000" lon="103.75400000" name="Bukit Batok"/>
<area forecast="CL" lat="1.27700000" lon="103.81900000" name="Bukit Merah"/>
<channel>


I managed to retrieve the information I need using this code:

import requests
from bs4 import BeautifulSoup
import urllib3
import json


weather = []

#getting the time

r = requests.get('http://www.nea.gov.sg/api/WebAPI/?dataset=2hr_nowcast&keyref=781CF461BB6606AD907750DFD1D07667C6E7C5141804F45D')
soup = BeautifulSoup(r.content, "xml")
time = soup.find('validTime').string
print("validTime: " + time)

# the issue date and time are attributes of <forecastIssue>
for currentdate in soup.find_all('item'):
    element = currentdate.find('forecastIssue')
    print("date: " + element['date'])

for currentdate in soup.find_all('item'):
    element = currentdate.find('forecastIssue')
    print("time: " + element['time'])

# each <area> tag is printed whole, markup included
for area in soup.find('weatherForecast').find_all('area'):
    print(area)


#file writing
with open("c:/scripts/nea.json", 'w') as outfile:
    # json.dump writes to the file; json.dumps only returns a string
    json.dump(weather, outfile)
    #outfile.write(",")


This is the output I got (in CMD):

C:\scripts>python neaweather.py
2.30 pm to 4.30 pm
date: 25-07-2016
time: 02:30 PM
<area forecast="LR" lat="1.37500000" lon="103.83900000" name="Ang Mo Kio"/>
<area forecast="LR" lat="1.32100000" lon="103.92400000" name="Bedok"/>
<area forecast="LR" lat="1.35077200" lon="103.83900000" name="Bishan"/>
<area forecast="LR" lat="1.30400000" lon="103.70100000" name="Boon Lay"/>
<area forecast="LR" lat="1.35300000" lon="103.75400000" name="Bukit Batok"/>
<area forecast="LR" lat="1.27700000" lon="103.81900000" name="Bukit Merah"/>


I have a few questions that I'm not sure how to solve:


  1. Is there any way to retrieve the attributes in area forecast="LR" lat="1.37500000" lon="103.83900000" name="Ang Mo Kio" without the tag markup?

    I tried adding ".text" to my code, but it always raised an error.

  2. I would like to write the output as JSON, but my data isn't in a table format like the examples in tutorials on creating a JSON file with Python.



EDIT: I have managed to get the data into a JSON file, but how do I convert the unicode strings into normal strings? The result contains u' prefixes.

Answer

Try this in your code:

with open("nea.json",'a+') as fs:
    for area in soup.find('weatherForecast').find_all('area'):
        fs.write(str(area.attrs))
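
Note that str(area.attrs) writes the Python representation of the dict, which is not valid JSON and, on Python 2, is where the u' prefixes come from. One possible alternative is to collect the attributes into plain dicts and let the json module serialize them. The following is only a sketch under a few assumptions: SAMPLE_XML is a trimmed copy of the feed from the question standing in for r.content, build_forecast is a hypothetical helper name, and the output file name nea.json and the key names in the result are arbitrary choices.

```python
import json
from bs4 import BeautifulSoup

# Assumption: a trimmed sample of the feed shown in the question.
# In the real script this would be r.content from requests.get(...).
SAMPLE_XML = """<channel>
<item>
<forecastIssue date="18-07-2016" time="03:30 PM"/>
<validTime>3.30 pm to 5.30 pm</validTime>
<weatherForecast>
<area forecast="TL" lat="1.37500000" lon="103.83900000" name="Ang Mo Kio"/>
<area forecast="SH" lat="1.32100000" lon="103.92400000" name="Bedok"/>
</weatherForecast>
</item>
</channel>"""


def build_forecast(xml_text):
    """Hypothetical helper: gather the forecast into plain dicts and lists."""
    soup = BeautifulSoup(xml_text, "xml")  # the "xml" parser needs lxml installed
    issue = soup.find("forecastIssue")
    return {
        "validTime": soup.find("validTime").get_text(),
        "date": issue["date"],
        "time": issue["time"],
        # area.attrs is a dict of the tag's attributes, without the markup
        "areas": [dict(area.attrs) for area in soup.find_all("area")],
    }


forecast = build_forecast(SAMPLE_XML)

# json.dump (not json.dumps) serializes straight into an open file
with open("nea.json", "w") as outfile:
    json.dump(forecast, outfile, indent=2)
```

The u' prefix is only how Python 2 displays a unicode object when it is printed as part of a dict or list; it is not part of the data. Text written through json.dump contains ordinary double-quoted JSON strings, so the prefix does not appear in the file.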