insecte insecte - 5 months ago 18
MySQL Question

How to loop over regex patterns from a MySQL table using Python

I'm trying to parse Scrapy results with regex, but my regex patterns are stored in a MySQL table. I'm having trouble looping over the patterns in sequence so that I get back clean content without any HTML tags.
Simply put: Scrapy HTML result -> apply the pattern in row 1 (e.g. strip the HTML above the content) -> apply the pattern in row 2 (strip the HTML below the content) -> ... -> clean content.

example

<body>
<title>
<some tags>
<content>
<footer tags>
<another tags>
</body>


I'm trying to clean that HTML using this table, which has the fields (pattern, sequence, replacer) and these values:

      pattern               sequence  replacer
row1  <body.*?some tags>    1         None
row2  <footer.*?/body>      2         None
row3  <br>                  3         Enter
row4  #&quot                4         ""


That way I get clean content back. I'm using regex replacement patterns rather than XPath matching because I expect to scrape many websites, each with its own variation of HTML tags.

Here's my code. It doesn't raise an error, but the result is repeated; there should be one clean result per Scrapy result. I think I did something wrong, but I can't figure it out since I'm new to Python and Scrapy.

def parse(self, response):
    for mbuh in response.xpath('//body'):
        Item = ParsingerbotItem()
        Item['ling'] = str(response.url)
        ngaliase = re.findall("\w+.com", str(response.url))[0]
        mmhtml = mbuh.xpath('//body').extract()
        cur.execute("select aliase, pattern, seq, opsi, replacer from tb_bersihin where aliase='"+ngaliase+"\' order by seq asc")
        for filde in cur.fetchall():
            faliase = filde[0]
            fpattern = filde[1]
            fseq = filde[2]
            fopsi = filde[3]
            freplacer = filde[4]
            print "faliase=%s,fpattern=%s,furutan=%d,fopsi=%s,freplacer=%s" % \
                (faliase, fpattern, fseq, fopsi, freplacer )
            if ( freplacer == "NO" ) : freplacer=""
            if ( fopsi == "NL" ) : fopsi="re.DOTALL"
            k1 = re.sub(fpattern , freplacer, str(mmhtml), re.DOTALL)
            print k1


Thank you in advance.

Answer

I think I solved my own question. Maybe I didn't describe it well above, but all I wanted was for the result of the first pattern to become the subject of the second pattern, and so on through the remaining patterns from the MySQL table.

All I did was change

k1 = re.sub(fpattern , freplacer, str(mmhtml), re.DOTALL) 

into

k1 = re.sub(fpattern , freplacer, k1)

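and declare k1 = str(mmhtml) before the cur.fetchall() loop, so that each substitution works on the output of the previous one instead of starting again from the original HTML.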
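
For anyone who wants to try the idea without Scrapy or MySQL, here is a minimal sketch of the chained substitution in Python 3. The RULES list, the clean_html function and the sample string are hypothetical stand-ins for the rows returned by cur.fetchall() and the extracted body; they are not part of my spider, and the patterns are slightly adjusted so that they match the sample.

```python
import re

# Hypothetical stand-in for the rows from cur.fetchall(), already ordered
# by seq: (pattern, seq, opsi, replacer).
RULES = [
    (r"<body.*?<some tags>", 1, "NL", "NO"),
    (r"<footer.*?</body>", 2, "NL", "NO"),
    (r"<br>", 3, "", "\n"),
]


def clean_html(html, rules):
    """Apply each rule to the output of the previous rule."""
    result = html  # declared once, before the loop
    for pattern, _seq, opsi, replacer in rules:
        if replacer == "NO":
            replacer = ""
        # "NL" means the pattern should also match across newlines.
        flags = re.DOTALL if opsi == "NL" else 0
        # flags is passed by keyword: the fourth positional argument
        # of re.sub is count, not flags.
        result = re.sub(pattern, replacer, result, flags=flags)
    return result


# Hypothetical sample page, modelled on the example in the question.
sample = (
    "<body>\n<title>\n<some tags>\n"
    "line one<br>line two\n"
    "<footer tags>\n<another tags>\n</body>"
)

cleaned = clean_html(sample, RULES)
print(cleaned)
```

One thing I noticed while writing this: in re.sub(fpattern, freplacer, text, re.DOTALL) the re.DOTALL value lands in the count parameter, so it limits the number of replacements instead of changing how "." matches. Passing flags=re.DOTALL by keyword avoids that.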

thank you
