Geo Geo - 11 months ago 61
Python Question

Is it possible to run just through the item pipeline without crawling with Scrapy?

I have a .jl file with items I've already scraped. I have since added another pipeline which was not present when I was doing the scraping. Is it possible to run the items through just the new pipeline, without doing the crawl/scrape again?

Answer Source

Quick answer: Yes.

To bypass the downloader while keeping the other components of Scrapy working, you could use a custom downloader middleware that returns Response objects from its process_request method. See the downloader middleware section of the Scrapy documentation for details.

But in your case I personally think it is simpler to have Scrapy read the .jl file straight from your local file system. A quick (and full) example:

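# coding: utf8

import json
import scrapy


class SampleSpider(scrapy.Spider):

    name = 'sample_spider'
    # Read the previously scraped items from a local file, not from the web.
    start_urls = [
        'file:///tmp/some_file.jl',
    ]
    custom_settings = {
        'ITEM_PIPELINES': {
            'your_pipeline_here': 100,
        },
    }

    def parse(self, response):
        # Each non-empty line of a .jl (JSON Lines) file holds one item.
        for line in response.body.splitlines():
            if not line.strip():
                continue
            jdata = json.loads(line)
            yield jdata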

Just replace '/tmp/some_file.jl' with your actual path to the file.
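If you want a quick check of the new pipeline before running the spider, a rough sketch like the following may help. It relies on the fact that `process_item` is an ordinary method, so it can be called directly on each parsed line. `AddFlagPipeline` and `replay_lines` are hypothetical names standing in for your own pipeline and a small helper. The sketch ignores `open_spider`/`close_spider` hooks and any asynchronous pipeline behaviour, which only a real Scrapy run would exercise:

```python
import json


class AddFlagPipeline:
    """Hypothetical stand-in for the new pipeline."""

    def process_item(self, item, spider):
        # Mark the item so it is visible that the pipeline ran.
        item['replayed'] = True
        return item


def replay_lines(lines, pipeline, spider=None):
    """Pass each JSON line through pipeline.process_item, skipping blanks."""
    results = []
    for line in lines:
        if not line.strip():
            continue
        results.append(pipeline.process_item(json.loads(line), spider))
    return results


# Sample bytes in the same shape as the contents of a .jl file.
sample = b'{"title": "a"}\n\n{"title": "b"}\n'
items = replay_lines(sample.splitlines(), AddFlagPipeline())
print(items)
```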