I wrote a script that finds expressions in a web page:
# -*- coding: utf-8 -*-
import sre, urllib2, sys, BaseHTTPServer

# URL to fetch, taken from the first command-line argument
address = sys.argv[1]
website_handle = urllib2.urlopen(address)
website_text = website_handle.read()  # raw byte string, not yet decoded
matches = sre.findall(u"עברית", website_text)
for item in matches:
    print item
You need to make sure the input string is decoded from UTF-8 as well.
Use the unicode function with "utf-8" as the second argument:
website_text = unicode(website_text, "utf-8")
In Python 2, the pattern and the text it is matched against must use a consistent encoding for Unicode matching to work.
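As a rough illustration of the decode-then-match step, here is a minimal sketch that uses a hard-coded byte string in place of the downloaded page. The names raw_page and page_text are made up for this example, and it assumes the page really is UTF-8 encoded. It uses the re module and the .decode("utf-8") method, which does the same job as unicode(text, "utf-8") in Python 2 and also runs under Python 3.

```python
# -*- coding: utf-8 -*-
import re

# Stand-in for the bytes returned by urlopen(...).read().
# Assumption: the page is UTF-8 encoded.
raw_page = u"<p>שלום, עברית היא שפה. עברית!</p>".encode("utf-8")

# Decode the byte string into a unicode string before matching.
page_text = raw_page.decode("utf-8")

# Pattern and text are now both unicode, so the search can succeed.
matches = re.findall(u"עברית", page_text)
for item in matches:
    print(item)
```

If the page is served in a different encoding, the codec name passed to decode would have to change accordingly.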