3 Scrapy spiders implemented with ItemLoader
This repository contains three spiders, each implemented both with and without ItemLoader. After reading this readme to the end, you can decide whether to adopt ItemLoader for your own spiders.
Table of Contents
- Introduction
- Migration steps
- Installation
- Scraping
- More spiders
- Summary
- License
Introduction
If you are fairly new to Scrapy but have already written several spiders and plan to write more, you should consider using ItemLoader if you don't already. I won't describe the features of ItemLoader and its processors here; the official docs cover that. Instead, I will show how to migrate real-world spiders that don't use ItemLoader to spiders that do.
Migration steps
- Replace bare item field assignments with ItemLoader
- Use selector context to simplify the code
- Required step: add output processors
- Set a default output processor
- Optional step: extend ItemLoader
Installation
$ pip install scrapy
$ git clone [email protected]:taroved/3spiders-with-itemloader.git
# check contracts for spiders
$ cd 3spiders-with-itemloader
$ scrapy check
Scraping
$ scrapy crawl apple
Output the scraped data to a file and write a log file:
$ scrapy crawl apple -o apple.json --logfile=apple.log
More spiders
The second spider scrapes store locations from wetseal.com:
$ scrapy crawl wetseal -o wetseal.json
The third spider scrapes products from hhgregg.com:
$ scrapy crawl hhgregg -o hhgregg.json
Summary
I haven't said much here, but you can look at the full diff between the spider versions without and with ItemLoader and make your own decision.
License
WTFPL