Zepol - 6 months ago
Python Question

AttributeError: 'HTTPResponse' object has no attribute 'split'

I am trying to get some information from Google Finance, but I seem to be getting this error:

AttributeError: 'HTTPResponse' object has no attribute 'split'

Here is my Python code:

import urllib.request
import urllib
from bs4 import BeautifulSoup

symbolsfile = open("Stocklist.txt")

symbolslist = symbolsfile.read()

thesymbolslist = symbolslist.split("\n")

i=0


while i < len(thesymbolslist):
    theurl = "http://www.google.com/finance/getprices?q=" + thesymbolslist[i] + "&i=10&p=25m&f=c"
    thepage = urllib.request.urlopen(theurl)
    print(thesymbolslist[i] + " price is " + thepage.split()[len(thepage.split())-1])
    i = i + 1

Answer

The Cause of the Problem

This happens because urllib.request.urlopen(theurl) returns an HTTPResponse object representing the connection, not a string, so it has no split method.
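A quick way to see this for yourself, as a rough sketch: the snippet below uses a `data:` URL as a stand-in so that it runs without network access (that substitution is my assumption, not part of the question). The object returned for a `data:` URL is not literally an `HTTPResponse`, but it exposes the same file-like `read()` interface and likewise has no `split` method.

```python
import urllib.request

# Assumption: a data: URL stands in for the real HTTP request so the
# snippet runs offline. Its body is the text "10.5\n10.7\n10.6".
demo_url = "data:text/plain;charset=utf-8,10.5%0A10.7%0A10.6"

response = urllib.request.urlopen(demo_url)

# The response is a file-like object: it has read() but no split().
has_split = hasattr(response, "split")
has_read = hasattr(response, "read")

body = response.read()  # the raw body, as bytes
response.close()

print(type(response).__name__, has_split, has_read, type(body).__name__)
```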


The Solution

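To read the data from this connection, call read() on the response:

thepage = urllib.request.urlopen(theurl).read()

In Python 3 this returns a bytes object rather than a str. bytes does have a split method, but the result must be decoded before it can be concatenated with the other strings in your print call (see the addendum below).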
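Putting it together, here is a minimal sketch of the loop from the question with the fix applied. The helper names `last_price`, `fetch_last_price` and `print_prices` are my own, and the decode step assumes the response is UTF-8 text. I have not verified that the URL from the question still serves data.

```python
import urllib.request


def last_price(body):
    """Return the last whitespace-separated token of a response body."""
    # read() gives bytes, so decode before mixing with str values.
    text = body.decode("utf-8")  # assumption: the body is UTF-8 text
    tokens = text.split()
    return tokens[-1] if tokens else ""


def fetch_last_price(symbol):
    """Fetch the price page for one symbol (URL taken from the question)."""
    theurl = ("http://www.google.com/finance/getprices?q=" + symbol
              + "&i=10&p=25m&f=c")
    with urllib.request.urlopen(theurl) as thepage:
        return last_price(thepage.read())


def print_prices(path="Stocklist.txt"):
    """Print the last price for each symbol listed in the file."""
    with open(path) as symbolsfile:
        # split() without arguments also skips blank lines
        symbols = symbolsfile.read().split()
    for symbol in symbols:
        print(symbol + " price is " + fetch_last_price(symbol))
```

Calling `print_prices()` runs the whole thing. Keeping the parsing in `last_price` means you can check it without touching the network, for example with `last_price(b"10.5\n10.7\n")`.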

Addendum to the Solution

In Python 3, read() always returns a bytestring (bytes) rather than a regular string (str), because the response body arrives as raw bytes in whatever character encoding the server used.

The right approach to dealing with that is to find the correct character encoding and decode the bytestring into a regular string using it, as seen in this question:

thepage = urllib.request.urlopen(theurl)
# read the character encoding from the `Content-Type` response header
# (get_content_charset() returns None if the header does not name one)
charset_encoding = thepage.info().get_content_charset()
# decode the bytes using that encoding
thepage = thepage.read().decode(charset_encoding)
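One caveat with the snippet above: `get_content_charset()` returns `None` when the `Content-Type` header does not name a charset, and `decode(None)` raises a `TypeError`. A hedged sketch with a fallback follows. The helper name `read_text` and the choice of `utf-8` as the default are my assumptions, and the `data:` URLs are only there so the example runs offline.

```python
import urllib.request


def read_text(url, default_charset="utf-8"):
    """Open url and return its body decoded to str."""
    with urllib.request.urlopen(url) as page:
        # None when the Content-Type header carries no charset parameter
        charset = page.info().get_content_charset() or default_charset
        return page.read().decode(charset)


# Offline stand-ins for real pages (assumption, not from the question):
# the first declares its charset, the second does not.
declared = read_text("data:text/plain;charset=utf-8,caf%C3%A9")
undeclared = read_text("data:text/plain,10.7")
print(undeclared)
```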

It is often safe to assume that the character encoding is utf-8, in which case

thepage = urllib.request.urlopen(theurl).read().decode('utf-8')

works more often than not. It's a statistically good guess if nothing else.
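If the guess turns out to be wrong, `decode` raises `UnicodeDecodeError`. As a small illustration (the sample bytes are made up: they are "café" encoded as Latin-1, which is not valid UTF-8), the `errors` argument lets you trade the exception for a replacement character:

```python
# Hypothetical sample: "café" encoded as Latin-1, which is invalid UTF-8.
raw = b"caf\xe9"

try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# errors="replace" substitutes U+FFFD for undecodable bytes instead of raising.
lenient = raw.decode("utf-8", errors="replace")

# Decoding with the encoding that was actually used recovers the text.
correct = raw.decode("latin-1")
```

Whether silently replacing characters is acceptable depends on what you do with the text afterwards. For numeric price data it is probably harmless, but that is a judgment call.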