I would like to do my first project in python but I have problem with coding. When I fetch data it shows coded letters instead of my native letters, for example '\xc4\x87' instead of 'ć'. The code is below:
page = urllib.request.urlopen("http://olx.pl/")
test = page.read()
z = "ł"
The data you read from a
urlopen() response is encoded data. You'd need to first decode that data using the right encoding.
You appear to have downloaded UTF-8 data; you'd have to decode that data first before you had text:
test = page.read().decode('utf8')
However, it is up to the server to tell you what data was received. Check for a characterset in the headers:
encoding = page.info().getparam('charset')
This can still be
None; many data formats include the encoding as part of the format. XML for example is UTF-8 by default but the XML declaration at the start can contain information about what codec was used for that document. An XML parser would extract that information to ensure you get properly decoded Unicode text when parsing.
You may not be able to print that data; the 852 codepage can only handle 256 different codepoints, while the Unicode standard is far larger.