bjornasm - 8 months ago

Encoding error when reading url with urllib

When I try to scrape a Wikipedia page whose URL contains a special character, using urllib.request in Python, I get the following error:

UnicodeEncodeError: 'ascii' codec can't encode character '\xf8' in position 23: ordinal not in range(128)

The code:

# -*- coding: utf-8 -*-
import urllib.request as ur

url = "øre"
r = ur.urlopen(url).read()

How can I use urllib.request with utf-8 encoding?
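The error message can be reproduced without any network access. This is a minimal sketch, assuming (as in CPython's http.client) that the request line is encoded as ASCII before it is sent. The path `/wiki/øre` and the helper `ascii_error` are hypothetical, since the question's full URL was not preserved.

```python
def ascii_error(text: str) -> str:
    """Return the error raised when `text` is encoded as ASCII, or "" if none."""
    try:
        # CPython's http.client encodes the request line ("GET <path> HTTP/1.1") as ASCII.
        text.encode("ascii")
    except UnicodeEncodeError as exc:
        return str(exc)
    return ""


# Hypothetical path; the question's full URL was not preserved.
print(ascii_error("/wiki/øre"))
# 'ascii' codec can't encode character '\xf8' in position 6: ordinal not in range(128)
print(repr(ascii_error("/wiki/%C3%B8re")))  # '' (the percent-encoded form is pure ASCII)
```

The position in the message is simply the index of `ø` in whatever string is being encoded, so it differs from the question's 23 because the strings differ.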


Apparently, urllib can only handle ASCII requests, and converting your URL to ASCII raises an error on the special character. Replacing ø with %C3%B8 (its percent-encoded UTF-8 form, which is the proper way to put this character in an HTTP URL) seems to do the trick. However, I can't find a method that does this automatically, the way a browser does.
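The standard library can do this conversion: `urllib.parse.quote` encodes a string as UTF-8 by default and percent-escapes every byte outside the unreserved ASCII set. That produces exactly the `%C3%B8` sequence above. A minimal sketch, using the hypothetical path `/wiki/øre`:

```python
from urllib.parse import quote, unquote

# quote() encodes the text as UTF-8 and percent-escapes non-ASCII bytes;
# "/" is in the default `safe` set, so path separators are kept.
encoded = quote("/wiki/øre")
print(encoded)           # /wiki/%C3%B8re

# unquote() reverses the escaping, decoding the bytes as UTF-8.
print(unquote(encoded))  # /wiki/øre
```

Apply `quote` only to the path (or query) part. Quoting a whole URL such as `https://...` would also escape the `:` after the scheme.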


>>> f=""
>>> import urllib.request
>>> g=urllib.request.urlopen(f)
>>> text=g.read()
>>> text[:100]
b'<!DOCTYPE html>\n<html class="client-nojs" lang="nb" dir="ltr">\n<head>\n<meta charset="UTF-8"/>\n<title'

The answer above doesn't work because it does the encoding after the request has been processed, whereas your error is raised while the request itself is being processed.
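Because the error is raised while urllib builds the request, the URL has to be made ASCII-only before it is passed to `urlopen`. The sketch below shows one way to do that for a whole URL, roughly imitating what a browser does. `iri_to_uri` and `SAFE_CHARS` are hypothetical names, and the exact safe-character set is an assumption. The helper also assumes the host part contains no `user:password@` section.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Reserved characters that keep their meaning in a URL and are not escaped.
# "%" is included so that already-escaped sequences are left alone.
SAFE_CHARS = "!#$%&'()*+,/:;=?@[]~"


def iri_to_uri(iri: str) -> str:
    """Hypothetical helper: turn a URL containing non-ASCII text into ASCII.

    Assumes the netloc has no "user:password@" part.
    """
    parts = urlsplit(iri)
    # Internationalized host names use IDNA (punycode), not percent-encoding.
    netloc = parts.netloc.encode("idna").decode("ascii")
    # Path, query and fragment are percent-encoded as UTF-8.
    return urlunsplit((
        parts.scheme,
        netloc,
        quote(parts.path, safe=SAFE_CHARS),
        quote(parts.query, safe=SAFE_CHARS),
        quote(parts.fragment, safe=SAFE_CHARS),
    ))


print(iri_to_uri("https://example.org/wiki/øre"))
# https://example.org/wiki/%C3%B8re
```

With a helper like this, the question's call would become `r = ur.urlopen(iri_to_uri(url)).read()`. Because `%` is treated as safe, a URL that is already percent-encoded passes through unchanged.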