
accordbox / awesome-scrapy




Awesome Scrapy

A curated list of awesome packages, articles, and other cool resources from the Scrapy community. Scrapy is a fast high-level web crawling & scraping framework for Python.


Apps

Visual Web Scraping

  • Portia Visual scraping for Scrapy

Distributed Spider

Scrapy Service

  • scrapyscript Run a Scrapy spider programmatically from a script or a Celery task - no project required.

  • scrapyd A service daemon to run Scrapy spiders

  • scrapyd-client Command line client for Scrapyd server

  • python-scrapyd-api A Python wrapper for working with Scrapyd's API.

  • SpiderKeeper A scalable admin UI for spider services

  • scrapyrt An HTTP server that provides an API for scheduling Scrapy spiders and making requests with spiders.
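
As a rough illustration of how a script might talk to a Scrapyd daemon, the sketch below builds (but does not send) a POST request to Scrapyd's schedule.json endpoint. The host, port, project name, and spider name are placeholder assumptions, and build_schedule_request is a hypothetical helper, not part of any package listed above.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_schedule_request(base_url, project, spider, **spider_args):
    """Build a POST request for Scrapyd's schedule.json endpoint.

    Hypothetical helper for illustration; extra keyword arguments are
    sent along as additional form fields (spider arguments).
    """
    payload = {"project": project, "spider": spider, **spider_args}
    data = urlencode(payload).encode("utf-8")
    return Request(f"{base_url.rstrip('/')}/schedule.json", data=data, method="POST")


# Assumed values: a local Scrapyd on port 6800 and a made-up project/spider pair.
request = build_schedule_request("http://localhost:6800", "myproject", "quotes")
print(request.full_url)
print(request.data.decode("utf-8"))
```

Sending it would be a matter of passing the request to urllib.request.urlopen, assuming a Scrapyd instance is actually running at that address.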

Front-End Scrapy Managers

  • Gerapy Distributed Crawler Management Framework Based on Scrapy, Scrapyd, Scrapyd-Client, Scrapyd-API, Django and Vue.js

  • SpiderKeeper Admin UI for Scrapy; an open source Scrapinghub.

  • ScrapydWeb Scrapyd cluster management, Scrapy log analysis & visualization, Basic auth, Auto packaging, Timer Tasks, Email notice, and Mobile UI.

Monitor

  • scrapy-sentry Logs Scrapy exceptions into Sentry

  • scrapy-statsd-middleware StatsD integration middleware for Scrapy

  • scrapy-jsonrpc An extension to control a running Scrapy web crawler via JSON-RPC

  • scrapy-fieldstats A Scrapy extension to log item field coverage when the spider shuts down

  • spidermon An extension that provides tools for data validation, stats monitoring, and notification messages.
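
To make the idea of field coverage concrete, here is a minimal sketch that computes, for a batch of items, the fraction in which each field is non-empty. It is plain Python over dictionaries and does not reproduce how any of the extensions above are implemented; field_coverage and the sample items are made up for illustration.

```python
from collections import Counter


def field_coverage(items):
    """Return the fraction of items in which each field has a non-empty value."""
    items = list(items)
    if not items:
        return {}
    filled = Counter()
    for item in items:
        for field, value in item.items():
            # Treat None and empty containers/strings as "missing".
            if value not in (None, "", [], {}):
                filled[field] += 1
    return {field: count / len(items) for field, count in filled.items()}


# Made-up sample items standing in for scraped output.
sample = [
    {"title": "A", "price": "10"},
    {"title": "B", "price": None},
    {"title": "C"},
    {"title": "", "price": "7"},
]
coverage = field_coverage(sample)
print(coverage)
```

Fields that are never filled do not appear in the result; a fuller version would take the expected field names as input so that zero-coverage fields are reported too.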

Avoid Ban

  • HttpProxyMiddleware A Scrapy middleware that changes the HTTP proxy from time to time.

  • scrapy-proxies Processes Scrapy requests using a random proxy from a list to avoid IP bans and improve crawling speed.

  • scrapy-rotating-proxies Use multiple proxies with Scrapy

  • scrapy-random-useragent Scrapy Middleware to set a random User-Agent for every Request.

  • scrapy-fake-useragent Random User-Agent middleware based on fake-useragent

  • scrapy-crawlera Crawlera routes requests through a pool of IPs, throttling access by introducing delays and discarding IPs from the pool when they get banned from certain domains, or have other problems.
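
The random User-Agent approach can be sketched as a small downloader middleware. In Scrapy, a downloader middleware may define process_request(request, spider); everything else here (the class name, the placeholder User-Agent strings, and the FakeRequest stand-in that lets the sketch run without Scrapy installed) is an assumption for illustration and is not taken from the packages above.

```python
import random

# Placeholder User-Agent strings; a real list would be longer and kept current.
USER_AGENTS = [
    "ExampleBrowser/1.0",
    "ExampleBrowser/2.0",
    "ExampleCrawler/0.1",
]


class RandomUserAgentMiddleware:
    """Hypothetical downloader middleware that picks a User-Agent per request."""

    def __init__(self, user_agents, rng=None):
        self.user_agents = list(user_agents)
        self.rng = rng or random.Random()

    def process_request(self, request, spider):
        # Returning None lets processing of the request continue.
        request.headers["User-Agent"] = self.rng.choice(self.user_agents)
        return None


class FakeRequest:
    """Stand-in for a Scrapy request so the sketch runs without Scrapy."""

    def __init__(self, url):
        self.url = url
        self.headers = {}


middleware = RandomUserAgentMiddleware(USER_AGENTS)
request = FakeRequest("https://example.com")
middleware.process_request(request, spider=None)
print(request.headers["User-Agent"])
```

In a real project such a class would be enabled through the DOWNLOADER_MIDDLEWARES setting rather than called by hand.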

Data Processing

Process JavaScript

Other Useful Extensions

  • scrapy-djangoitem Scrapy extension to write scraped items using Django models

  • scrapy-deltafetch Scrapy spider middleware to ignore requests to pages containing items seen in previous crawls

  • scrapy-crawl-once A Scrapy middleware that avoids re-crawling pages already downloaded in previous crawls.

  • scrapy-magicfields Scrapy middleware to add extra fields to items, such as timestamps, response fields, and spider attributes.

  • scrapy-pagestorage A Scrapy extension to store request and response information in a storage service.

  • itemloaders Library to populate items using XPath and CSS with a convenient API.

  • itemadapter An adapter that provides a common interface for handling objects of different types in a uniform manner.

  • scrapy-poet Page Object pattern implementation which enables writing reusable and portable extraction and crawling code.
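
The idea of skipping pages seen in earlier crawls can be illustrated with a fingerprint set. This is a simplified, in-memory sketch under stated assumptions: SeenUrlFilter is a hypothetical name, only the URL is hashed, and nothing is persisted between runs, whereas a real setup would need to keep this state across crawls.

```python
import hashlib


class SeenUrlFilter:
    """Hypothetical helper that remembers which URLs were already offered."""

    def __init__(self, seen_fingerprints=None):
        self.seen = set(seen_fingerprints or ())

    @staticmethod
    def fingerprint(url):
        # A SHA-1 hex digest of the URL serves as a compact lookup key.
        return hashlib.sha1(url.encode("utf-8")).hexdigest()

    def should_fetch(self, url):
        """Return True the first time a URL is offered, False afterwards."""
        fp = self.fingerprint(url)
        if fp in self.seen:
            return False
        self.seen.add(fp)
        return True


url_filter = SeenUrlFilter()
first = url_filter.should_fetch("https://example.com/page/1")
second = url_filter.should_fetch("https://example.com/page/1")
other = url_filter.should_fetch("https://example.com/page/2")
print(first, second, other)
```

Passing a previously saved set of fingerprints to the constructor is one way the state from an earlier run could be restored.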

Resources

Articles

Exercises

Video

Book

Contributors

asciidiego, fkromer, michael-yin
