Tony Wang - 2 months ago
JSON Question

How to return every loaded item from a loop in Scrapy

The code is below. Every time it runs, it returns only the result of the first loop iteration; the other 9 iterations disappear. What should I do to get the results of all iterations?

I tried adding "m = []" and m.append(l), but got the error "ERROR: Spider must return Request, BaseItem, dict or None, got 'ItemLoader'".

The link is http://ajax.lianjia.com/ajax/housesell/area/district?ids=23008619&limit_offset=0&limit_count=100&sort=&&city_id=110000

def parse(self, response):
    jsonresponse = json.loads(response.body_as_unicode())
    for i in range(0,len(jsonresponse['data']['list'])):
        l = ItemLoader(item = ItjuziItem(),response=response)
        house_code = jsonresponse['data']['list'][i]['house_code']
        price_total = jsonresponse['data']['list'][i]['price_total']
        ctime = jsonresponse['data']['list'][i]['ctime']
        title = jsonresponse['data']['list'][i]['title']
        frame_hall_num = jsonresponse['data']['list'][i]['frame_hall_num']
        tags = jsonresponse['data']['list'][i]['tags']
        house_area = jsonresponse['data']['list'][i]['house_area']
        community_id = jsonresponse['data']['list'][i]['community_id']
        community_name = jsonresponse['data']['list'][i]['community_name']
        is_two_five = jsonresponse['data']['list'][i]['is_two_five']
        frame_bedroom_num = jsonresponse['data']['list'][i]['frame_bedroom_num']
        l.add_value('house_code',house_code)
        l.add_value('price_total',price_total)
        l.add_value('ctime',ctime)
        l.add_value('title',title)
        l.add_value('frame_hall_num',frame_hall_num)
        l.add_value('tags',tags)
        l.add_value('house_area',house_area)
        l.add_value('community_id',community_id)
        l.add_value('community_name',community_name)
        l.add_value('is_two_five',is_two_five)
        l.add_value('frame_bedroom_num',frame_bedroom_num)
        print l
        return l.load_item()

Answer

The error:

ERROR: Spider must return Request, BaseItem, dict or None, got 'ItemLoader'

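is slightly misleading, since a callback can also return a generator (or any other iterable) of items. Two separate things are happening here. First, the return statement sits inside the loop, so it ends the loop and the whole function on the first iteration, which is why only the first item comes back. Second, the list attempt failed because m.append(l) collects the ItemLoader objects themselves rather than the items produced by l.load_item(). The simplest fix is to turn the function into a generator.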
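The difference between return and yield inside a loop can be seen in a small standalone example. This is only a sketch in plain Python, with no Scrapy involved; the function names first_only and every_entry and the values in codes are made up for illustration.

```python
def first_only(entries):
    # `return` inside the loop ends the function on the first iteration.
    for entry in entries:
        return entry


def every_entry(entries):
    # `yield` hands back one value, then resumes the loop when the
    # caller asks for the next one.
    for entry in entries:
        yield entry


codes = ["a1", "b2", "c3"]  # made-up values, for illustration only

first = first_only(codes)              # only the first element
everything = list(every_entry(codes))  # all three elements
```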

Just replace return with yield in the last line of the loop, changing:

return l.load_item()

to:

yield l.load_item()
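For reference, here is a minimal sketch of the same loop written as a generator. It deliberately leaves Scrapy out so that it runs on its own: the function name iter_houses, the FIELDS tuple, and the sample payload are assumptions made for illustration, and only the data/list structure and the field names come from the question. In the real spider, the loop body would build the ItemLoader as before and end with yield l.load_item().

```python
import json

# A subset of the field names used in the question.
FIELDS = ("house_code", "price_total", "title")


def iter_houses(body):
    """Yield one plain dict per entry under data -> list in the JSON body."""
    payload = json.loads(body)
    # Iterate over the entries directly instead of indexing with range(len(...)).
    for entry in payload["data"]["list"]:
        # `yield` (not `return`) so the loop continues after each entry.
        yield {field: entry[field] for field in FIELDS}


# Hypothetical sample payload, shaped like the response described in the question.
sample_body = json.dumps({
    "data": {
        "list": [
            {"house_code": "h1", "price_total": 100, "title": "first"},
            {"house_code": "h2", "price_total": 200, "title": "second"},
        ]
    }
})

houses = list(iter_houses(sample_body))
```

The list approach from the question should also work, provided the list holds loaded items rather than loaders: call m.append(l.load_item()) inside the loop and put return m after the loop, not inside it.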