When I try to scrape a Wikipedia page with a special character in its URL, using urllib.request in Python, I get the following error:
UnicodeEncodeError: 'ascii' codec can't encode character '\xf8' in position 23: ordinal not in range(128)
# -*- coding: utf-8 -*-
import urllib.request as ur
url = "https://no.wikipedia.org/wiki/Jonas_Gahr_Støre"
r = ur.urlopen(url).read()
Apparently, urllib can only handle ASCII URLs, and converting your URL to ASCII raises an error on the special character. Replacing ø with %C3%B8, the percent-encoded UTF-8 form of this character, seems to do the trick. However, I can't find a method that does this automatically the way your browser does.
>>> f = "https://no.wikipedia.org/wiki/Jonas_Gahr_St%C3%B8re"
>>> import urllib.request
>>> g = urllib.request.urlopen(f)
>>> text = g.read()
>>> text[:100]
b'<!DOCTYPE html>\n<html class="client-nojs" lang="nb" dir="ltr">\n<head>\n<meta charset="UTF-8"/>\n<title'
The answer above doesn't work, because it encodes after the request has been processed, while the error is raised during the request itself.
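Encoding the URL before the request is made can probably be automated with urllib.parse.quote from the standard library. The following is only a minimal sketch: the helper name to_ascii_url is made up for illustration, and it assumes that only the path and query contain non-ASCII characters (a non-ASCII host name would need separate handling). Keeping "%" in the safe set is a design choice so that URLs which are already percent-encoded are not encoded a second time.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def to_ascii_url(url):
    """Percent-encode non-ASCII characters in the path and query of a URL.

    Hypothetical helper, not part of urllib. It leaves the scheme, host
    and fragment untouched.
    """
    parts = urlsplit(url)
    # quote() encodes as UTF-8 by default, so "ø" becomes "%C3%B8".
    # "/" keeps the path separators; "%" avoids double-encoding.
    path = quote(parts.path, safe="/%")
    # Keep the usual query delimiters intact.
    query = quote(parts.query, safe="=&+%")
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


url = to_ascii_url("https://no.wikipedia.org/wiki/Jonas_Gahr_Støre")
print(url)

# The encoded URL can then be passed to urlopen as in the question:
# import urllib.request as ur
# r = ur.urlopen(url).read()
```

If only the last path segment is known to contain special characters, calling quote() on that segment alone and concatenating it to the base URL should work just as well.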