Behi Behi -3 years ago 223
Python Question

python text parsing and splitting

I grab a text file, shown below, from a web page using Python. The data includes extra content that I don't need; I only need the parts that are bolded. I also need to split the bolded parts from each other. Could you help me do this? In the image, the red parts are also what I am trying to extract from the data.

[
'\n249\nSRUS54 KFWD 051849\nRR5FWD\n:\n:
ALERT HOURLY ACCUMULATOR DATA\n:
NATIONAL WEATHER SERVICE FORT WORTH TX\n:
**1249 PM CST SUN MAR 5 2017**\n:\n:
HOURLY ACCUMULATOR INFORMATION TABLE\n:\n:
NOTE: ERRONEOUS REPORTS MAY BE RECEIVED UNDER CERTAIN\n:
WEATHER CONDITIONS\n:\n:
**********************************************************\n:
ID LOCATION ACCUMULATOR VALUE\n:
**********************************************************\n:
**CITY OF DALLAS ALERT SYSTEM**
\n**.A DCQT2 170305 C DH124216 /HGIRS
396.7**:
\n\n**.A DCVT2 170305 C DH123434 /HGIRS 516.8**:
\n\n**.A DAOT2 170305 C DH123721 /HGIRS 534.2**:\n\n**.A DDCT2
170305 C DH120338 /HGIRS 395.0**:\n\n**.A DAHT2 170305 C DH114758 /HGIRS
496.1**:\n\n\n\n']


This is an image of the data I grab from the web

import urllib
import re
htmlfile=urllib.urlopen("http://forecast.weather.gov/product.php?site=NWS&issuedby=FWD&product=RR5&format=CI&version=1&glossary=0")
htmltext=htmlfile.read()
regex='<pre class="glossaryProduct">(.+?)</pre>'
pattern=re.compile(regex,re.S)
out=re.findall(pattern, htmltext)
text=str(out)
saveFile=open('test.txt', 'w')
saveFile.write(text)
saveFile.close()
print (text)

Answer Source

NOAA data is usually formatted pretty regularly. The best approach is to split the input into separate lines and then loop through line-by-line.

Skip lines, unless they start with a phrase or keyword you're interested in. For example:

for line in text.split('\n'):
    if any([re.match('^: [0-9]{4} [AP]M', line),   # matches : 1249 PM
            line.startswith(': CITY OF'),          # CITY OF...
            line.startswith('.A D')]):             # .A D....
        saveFile.write(line + '\n')                # keep one record per line

(You'll need to modify the above based on what the actual, possible line values are.)
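
To also split the matched lines into separate pieces, something like the following might work. It is only a sketch under a few assumptions: that each record sits on its own line, that header lines start with ": ", and that every ".A" line has seven whitespace-separated fields, as in the sample from the question. The name parse_report, the SAMPLE text and the choice of which fields to keep are made up for illustration. Note also that text=str(out) in the question turns the list into its printed form, so the newlines become a literal backslash followed by "n" and split('\n') will not find them; passing out[0] instead should avoid that.

```python
import re

# Shortened sample, assuming the layout shown in the question.
SAMPLE = """:
: NATIONAL WEATHER SERVICE FORT WORTH TX
: 1249 PM CST SUN MAR 5 2017
:
: ID LOCATION ACCUMULATOR VALUE
: CITY OF DALLAS ALERT SYSTEM
.A DCQT2 170305 C DH124216 /HGIRS 396.7 :
.A DCVT2 170305 C DH123434 /HGIRS 516.8 :
"""


def parse_report(text):
    """Return (timestamp, system, readings) found in the report text."""
    timestamp = None
    system = None
    readings = []
    for line in text.split('\n'):
        line = line.strip()
        if re.match(r'^: [0-9]{3,4} [AP]M', line):
            # Issue time, e.g. ": 1249 PM CST SUN MAR 5 2017"
            timestamp = line[2:]
        elif line.startswith(': CITY OF'):
            # Name of the reporting system
            system = line[2:]
        elif line.startswith('.A '):
            # Drop the trailing ":" and split on whitespace
            fields = line.rstrip(': ').split()
            if len(fields) >= 7:
                # Keep the 2nd, 3rd and 5th fields plus the numeric value
                readings.append(
                    (fields[1], fields[2], fields[4], float(fields[6])))
    return timestamp, system, readings


timestamp, system, readings = parse_report(SAMPLE)
print(timestamp)
print(system)
for reading in readings:
    print(reading)
```

Each reading comes back as a tuple, so the individual values can be written to a file or processed further one at a time. If real reports contain lines with a different number of fields, the length check will silently skip them, which may or may not be what you want.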
