user1941390 - 5 months ago
Python Question

How to search for elements containing Unicode/Arabic letters?

I am running the code below to find an element whose text contains Arabic (Unicode) characters. It works fine if I replace XXX with English letters, but it does not work if I replace it with Arabic letters.

I checked the HTML page and it has <meta charset="utf-8">, so I declared the source encoding on the first line of my Python script to make sure the letters are interpreted as expected, but it still does not work.

Any clue is much appreciated.


# coding=UTF8

from selenium import webdriver
# create a new Firefox session
driver = webdriver.Firefox()
print driver.find_element_by_xpath(u"//*[contains(text(), 'XXX')]").text


I think you are not using the correct Unicode characters in the XPath. See the IPython demo below.

First I selected one node to get the Unicode code points for that Arabic word, then used those code points in the XPath as follows. This was the output:

In [1]: response.xpath('//li[@class="lensItem"]/a/text()').extract()
Out[1]: [u'\u0639\u062f\u0633\u06cc']

In [2]: response.xpath(u'//a[contains(text(), "\u0639\u062f\u0633\u06cc")]/text()').extract()
Out[2]: [u'\u0645\u0634\u062e\u0635\u0627\u062a \u0639\u062f\u0633\u06cc \u0622\u0641\u062a\u0627\u0628\u06cc']

In [3]: a = response.xpath(u'//a[contains(text(), "\u0639\u062f\u0633\u06cc")]/text()').extract()

In [4]: for i in a:
    ...:     print i
مشخصات عدسی آفتابی
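
If the same XPath works with English text but not with Arabic text, one thing worth checking is whether the characters typed in the script are the same code points as the ones on the page. This is only a guess, since the question does not show the actual string. Some Arabic-script letters can look alike while being different characters, for example Arabic yeh (U+064A) and Farsi yeh (U+06CC), which is the last letter of the word in the demo above. Here is a minimal sketch that lists the code points of a string. The helper name describe is made up for illustration.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function  # lets the same code run on Python 2 and 3

import unicodedata


def describe(text):
    """Return a (code point, Unicode name) pair for every character in text."""
    return [("U+%04X" % ord(ch), unicodedata.name(ch, "UNKNOWN")) for ch in text]


# The word from the demo above; its last letter is U+06CC.
persian = u"\u0639\u062f\u0633\u06cc"
# Hypothetical variant typed with a different keyboard layout; last letter is U+064A.
arabic = u"\u0639\u062f\u0633\u064a"

for code, name in describe(persian):
    print(code, name)

# The two strings are different, so contains() would not match one against the other.
print(persian == arabic)
```

Running describe() on both the text extracted from the page and the literal in your script shows quickly whether they really are the same string.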


I tested the XPath using Scrapy, but it also works with Selenium:

In [6]: driver.find_element_by_xpath(u'//a[contains(text(), "\u0639\u062f\u0633\u06cc")]').text
Out[6]: u'\u0639\u062f\u0633\u06cc'
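
To avoid retyping the expression, the XPath can be built from a unicode string. This is just a sketch: xpath_contains and find_by_text are hypothetical helper names, and the quoting check is deliberately simple. Note that find_element_by_xpath was removed in Selenium 4.3, so the second helper uses find_element with By.XPATH, which also exists in older versions.

```python
# -*- coding: utf-8 -*-


def xpath_contains(text, tag="*"):
    """Build //tag[contains(text(), "...")] for the given text fragment."""
    # XPath 1.0 has no escape for quotes inside a literal, so this sketch
    # simply refuses text that contains a double quote.
    if '"' in text:
        raise ValueError("text must not contain a double quote")
    return u'//%s[contains(text(), "%s")]' % (tag, text)


def find_by_text(driver, text, tag="*"):
    """Return the first element whose text node contains the given fragment."""
    # Imported here so that xpath_contains can be used without Selenium installed.
    from selenium.webdriver.common.by import By
    return driver.find_element(By.XPATH, xpath_contains(text, tag))


# Same expression as in the Selenium call above.
expr = xpath_contains(u"\u0639\u062f\u0633\u06cc", tag="a")
```

With a driver that has already loaded the page, find_by_text(driver, u"\u0639\u062f\u0633\u06cc", tag="a").text should give the same result as the call above.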

I hope this will help you to solve your issues.