VoidBug - 1 month ago
Python Question

scrapy: ERROR: Error downloading <GET http://stackoverflow.com/questions?sort=votes> TypeError: 'float' object is not iterable

I am new to Python and Scrapy. I copied this code from a video, where it worked well, but when I run it myself I get TypeError: 'float' object is not iterable. Here is the code:

import scrapy


class StackOverflowSpider(scrapy.Spider):
    name = "stackoverflow"
    start_urls = ["http://stackoverflow.com/questions?sort=votes"]

    def parse(self, response):
        for href in response.css('.question-summary h3 a::attr(href)'):
            full_url = response.urljoin(href.extract())
            yield scrapy.Request(full_url, callback=self.parse_question)

    def parse_question(self, response):
        yield {
            'title': response.css('h1 a::text').extract()[0],
            'votes': response.css(".question.vote-count-post::text").extract()[0],
            'body': response.css(".question.post-text").extract()[0],
            'tags': response.css(".question.post-tag::text").extract(),
            'link': response.url,
        }


Here is the error:

2017-03-10 16:06:39 [scrapy] INFO: Enabled item pipelines:[]
2017-03-10 16:06:39 [scrapy] INFO: Spider opened
2017-03-10 16:06:39 [scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2017-03-10 16:06:39 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2017-03-10 16:06:40 [scrapy] ERROR: Error downloading <GET http://stackoverflow.com/questions?sort=votes>
Traceback (most recent call last):
  File "C:\Anaconda2\lib\site-packages\twisted\internet\defer.py", line 1299, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "C:\Anaconda2\lib\site-packages\twisted\python\failure.py", line 393, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "C:\Anaconda2\lib\site-packages\scrapy\core\downloader\middleware.py", line 43, in process_request
    defer.returnValue((yield download_func(request=request,spider=spider)))
  File "C:\Anaconda2\lib\site-packages\scrapy\utils\defer.py", line 45, in mustbe_deferred
    result = f(*args, **kw)
  File "C:\Anaconda2\lib\site-packages\scrapy\core\downloader\handlers\__init__.py", line 65, in download_request
    return handler.download_request(request, spider)
  File "C:\Anaconda2\lib\site-packages\scrapy\core\downloader\handlers\http11.py", line 60, in download_request
    return agent.download_request(request)
  File "C:\Anaconda2\lib\site-packages\scrapy\core\downloader\handlers\http11.py", line 285, in download_request
    method, to_bytes(url, encoding='ascii'), headers, bodyproducer)
  File "C:\Anaconda2\lib\site-packages\twisted\web\client.py", line 1631, in request
    parsedURI.originForm)
  File "C:\Anaconda2\lib\site-packages\twisted\web\client.py", line 1408, in _requestWithEndpoint
    d = self._pool.getConnection(key, endpoint)
  File "C:\Anaconda2\lib\site-packages\twisted\web\client.py", line 1294, in getConnection
    return self._newConnection(key, endpoint)
  File "C:\Anaconda2\lib\site-packages\twisted\web\client.py", line 1306, in _newConnection
    return endpoint.connect(factory)
  File "C:\Anaconda2\lib\site-packages\twisted\internet\endpoints.py", line 788, in connect
    EndpointReceiver, self._hostText, portNumber=self._port
  File "C:\Anaconda2\lib\site-packages\twisted\internet\_resolver.py", line 174, in resolveHostName
    onAddress = self._simpleResolver.getHostByName(hostName)
  File "C:\Anaconda2\lib\site-packages\scrapy\resolver.py", line 21, in getHostByName
    d = super(CachingThreadedResolver, self).getHostByName(name, timeout)
  File "C:\Anaconda2\lib\site-packages\twisted\internet\base.py", line 276, in getHostByName
    timeoutDelay = sum(timeout)
TypeError: 'float' object is not iterable
2017-03-10 16:06:40 [scrapy] INFO: Closing spider (finished)
2017-03-10 16:06:40 [scrapy] INFO: Dumping Scrapy stats:
{'downloader/exception_count': 1,
 'downloader/exception_type_count/exceptions.TypeError': 1,
 'downloader/request_bytes': 235,
 'downloader/request_count': 1,
 'downloader/request_method_count/GET': 1,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2017, 3, 10, 8, 6, 40, 117000),
 'log_count/DEBUG': 1,
 'log_count/ERROR': 1,
 'log_count/INFO': 7,
 'scheduler/dequeued': 1,
 'scheduler/dequeued/memory': 1,
 'scheduler/enqueued': 1,
 'scheduler/enqueued/memory': 1,
 'start_time': datetime.datetime(2017, 3, 10, 8, 6, 39, 797000)}
2017-03-10 16:06:40 [scrapy] INFO: Spider closed (finished)


Thanks for your help!

Answer
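
A note on the TypeError itself: every frame in the traceback is inside Scrapy or Twisted, and the last one shows Twisted's getHostByName evaluating sum(timeout) on the value that Scrapy's CachingThreadedResolver passed through. sum() needs an iterable, so a single float fails there. The sketch below only reproduces that failure mode; total_delay is a hypothetical stand-in for the failing line, and the numbers are illustrative assumptions, not values read from either library.

```python
def total_delay(timeout):
    """Hypothetical stand-in for the failing line: timeoutDelay = sum(timeout)."""
    return sum(timeout)


# A sequence of per-attempt delays works (values are illustrative).
print(total_delay((1, 3, 11, 45)))  # 60

# A single float, which is what the traceback implies was passed, does not.
try:
    total_delay(60.0)
except TypeError as exc:
    print(exc)  # 'float' object is not iterable

# Wrapping the float in a one-element tuple makes it iterable again.
print(total_delay((60.0,)))  # 60.0
```

Since the spider code never appears in the traceback, this looks like a mismatch between the installed Scrapy and Twisted versions rather than a mistake in the spider, though that is an inference from the log. Running scrapy version -v prints the versions of both, which is a reasonable first thing to check.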

Your code runs under Python 3, but several of the selectors return empty lists. I removed the [0] index from the extract() calls and ran it again:

2017-03-10 16:48:34 [scrapy.core.scraper] DEBUG: Scraped from <200 http://stackoverflow.com/questions/179123/how-to-modify-existing-unpushed-commits>
{'link': 'http://stackoverflow.com/questions/179123/how-to-modify-existing-unpushed-commits', 'title': ['How to modify existing, unpushed commits?'], 'votes': [], 'body': [], 'tags': []}
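
The output above shows why the index had to go: votes and body come back as empty lists, and indexing an empty list with [0] raises IndexError. A minimal sketch of a safer pattern follows; first_or_default is a hypothetical helper written for this illustration, and the sample lists simply mirror the scraped item rather than coming from a live crawl.

```python
def first_or_default(values, default=None):
    """Return the first extracted value, or `default` if nothing matched."""
    return values[0] if values else default


# Results shaped like the scraped item above: one match for title, none for votes.
title = ['How to modify existing, unpushed commits?']
votes = []

print(first_or_default(title))       # How to modify existing, unpushed commits?
print(first_or_default(votes, '0'))  # 0

# Indexing the empty result directly is what fails.
try:
    votes[0]
except IndexError as exc:
    print(exc)  # list index out of range
```

Scrapy's selector lists also provide extract_first(), which returns None (or a supplied default) instead of raising when nothing matches. As for why the lists are empty, one possible cause is the selectors themselves: .question.vote-count-post with no space matches a single element carrying both classes, while .question .vote-count-post matches an element with the second class nested inside the first. Whether the video used the spaced form is an assumption.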