user3768495 - 6 months ago

How to debug a scrapy pipeline?

I am following this tutorial to learn how to use Scrapy and MongoDB together. However, I keep getting this error:

[Anaconda2] C:\Users\Segovia\Dropbox\stack>scrapy crawl stack
Traceback (most recent call last):
  File "c:\users\segovia\anaconda2\lib\runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "c:\users\segovia\anaconda2\lib\runpy.py", line 72, in _run_code
    exec code in run_globals
  File "C:\Users\Segovia\Anaconda2\Scripts\scrapy.exe\__main__.py", line 9, in <module>
  File "c:\users\segovia\anaconda2\lib\site-packages\scrapy\cmdline.py", line 108, in execute
    settings = get_project_settings()
  File "c:\users\segovia\anaconda2\lib\site-packages\scrapy\utils\project.py", line 60, in get_project_settings
    settings.setmodule(settings_module_path, priority='project')
  File "c:\users\segovia\anaconda2\lib\site-packages\scrapy\settings\__init__.py", line 285, in setmodule
    self.set(key, getattr(module, key), priority)
  File "c:\users\segovia\anaconda2\lib\site-packages\scrapy\settings\__init__.py", line 260, in set
    self.attributes[name].set(value, priority)
  File "c:\users\segovia\anaconda2\lib\site-packages\scrapy\settings\__init__.py", line 55, in set
    value = BaseSettings(value, priority=priority)
  File "c:\users\segovia\anaconda2\lib\site-packages\scrapy\settings\__init__.py", line 91, in __init__
    self.update(values, priority)
  File "c:\users\segovia\anaconda2\lib\site-packages\scrapy\settings\__init__.py", line 317, in update
    for name, value in six.iteritems(values):
  File "c:\users\segovia\anaconda2\lib\site-packages\six.py", line 599, in iteritems
    return d.iteritems(**kw)
AttributeError: 'list' object has no attribute 'iteritems'


Can someone tell me what might have gone wrong, or give me a hint on how to debug it? I tried the 'parse' command described in the official Scrapy documentation, but it did not work for me. Ideally, I would like to use an IDE to step into this code and see in detail what is going on. Thanks!

The settings.py file has these lines in it:

ITEM_PIPELINES = ['stack.pipelines.MongoDBPipeline', ]

MONGODB_SERVER = "localhost"
MONGODB_PORT = 27017
MONGODB_DB = "stackoverflow"
MONGODB_COLLECTION = "questions"


And I am sure 'mongod' is running in another cmd window.

Answer

Let's look at the error:

AttributeError: 'list' object has no attribute 'iteritems'

At this part of your project settings:

ITEM_PIPELINES = ['stack.pipelines.MongoDBPipeline', ]

And at the Scrapy documentation for the ITEM_PIPELINES setting.

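Scrapy expects ITEM_PIPELINES to be a dictionary that maps each pipeline's import path to an integer order value (pipelines run from lower to higher values, customarily in the 0-1000 range), but you are giving it a list. While loading your settings, Scrapy iterates over the setting's key/value pairs with iteritems(), which a list does not have, hence the AttributeError. Fix it: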

ITEM_PIPELINES = {'stack.pipelines.MongoDBPipeline': 300}
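To see the failure in isolation, here is a minimal sketch that mimics, in simplified form, what the traceback shows Scrapy doing with the setting: iterating over its key/value pairs. It uses .items() instead of Python 2's .iteritems() so it also runs on Python 3. The function load_pipelines and the second pipeline path are hypothetical, for illustration only; this is not Scrapy's actual code.

```python
def load_pipelines(setting):
    """Hypothetical, simplified stand-in for how Scrapy consumes ITEM_PIPELINES.

    Treats the setting as a mapping of import path -> order value and
    returns the paths sorted so that lower order values come first.
    """
    # A list has no .items(), so this raises AttributeError for the
    # original ['stack.pipelines.MongoDBPipeline', ] value.
    pairs = list(setting.items())
    return [path for path, order in sorted(pairs, key=lambda pair: pair[1])]


broken = ['stack.pipelines.MongoDBPipeline', ]
fixed = {'stack.pipelines.MongoDBPipeline': 300}

try:
    load_pipelines(broken)
except AttributeError as exc:
    print(exc)  # 'list' object has no attribute 'items'

print(load_pipelines(fixed))  # ['stack.pipelines.MongoDBPipeline']

# With several pipelines, the lower number runs first.
# 'stack.pipelines.CleanupPipeline' is a made-up name for illustration.
print(load_pipelines({
    'stack.pipelines.MongoDBPipeline': 300,
    'stack.pipelines.CleanupPipeline': 100,
}))  # ['stack.pipelines.CleanupPipeline', 'stack.pipelines.MongoDBPipeline']
```

The same reasoning applies to the real traceback: its last line names the type that was passed ('list') and the method that was missing ('iteritems'), which points straight at the one setting whose value has the wrong type.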