MSpider

Talk

Join QQ group 153691452 to discuss MSpider.

Installation

On Ubuntu, install the following libraries with pip, easy_install, or apt-get:

  • lxml
  • chardet
  • splinter
  • gevent
  • phantomjs
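
Before running MSpider, it can help to confirm that the dependencies listed above are present. The following is a minimal sketch, not part of MSpider: the helper name `missing_dependencies` is hypothetical, it assumes each Python package is imported under the same name as in the list, and it assumes phantomjs is used as an executable on `PATH`. It is written for Python 3, which may differ from the Python version MSpider itself targets.

```python
import importlib.util
import shutil

# Python packages from the list above (import names assumed to match).
PYTHON_PACKAGES = ["lxml", "chardet", "splinter", "gevent"]


def missing_dependencies():
    """Return the names of dependencies that cannot be found."""
    missing = [name for name in PYTHON_PACKAGES
               if importlib.util.find_spec(name) is None]
    # Assumption: phantomjs is needed as an executable on PATH.
    if shutil.which("phantomjs") is None:
        missing.append("phantomjs")
    return missing


if __name__ == "__main__":
    missing = missing_dependencies()
    print("Missing:", ", ".join(missing) if missing else "none")
```

Anything reported as missing can then be installed with whichever of pip, easy_install, or apt-get suits your system.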

Example

  1. Use MSpider to collect vulnerability information from wooyun.org.
	python mspider.py -u "http://www.wooyun.org/bugs/" --focus-domain "wooyun.org" --filter-keyword "xxx" --focus-keyword "bugs" -t 15 --random-agent true
  2. Use MSpider to collect news from news.sina.com.cn.
	python mspider.py -u "http://news.sina.com.cn/c/2015-12-20/doc-ifxmszek7395594.shtml" --focus-domain "news.sina.com.cn" -t 15 --random-agent true
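
If you want to launch crawls like the ones above from another Python script, one option is to assemble the argument list programmatically. This is only a sketch: `build_command` is a hypothetical helper, not something MSpider provides, and it simply maps keyword names such as `focus_domain` onto the documented `--focus-domain` style flags.

```python
import shlex


def build_command(url, threads=None, **options):
    """Build an argument list for mspider.py from keyword options.

    Keyword names use underscores, e.g. focus_domain -> --focus-domain.
    """
    args = ["python", "mspider.py", "-u", url]
    if threads is not None:
        args += ["-t", str(threads)]
    for name, value in options.items():
        args += ["--" + name.replace("_", "-"), str(value)]
    return args


cmd = build_command(
    "http://www.wooyun.org/bugs/",
    threads=15,
    focus_domain="wooyun.org",
    filter_keyword="xxx",
    focus_keyword="bugs",
    random_agent="true",
)
# Print a shell-safe version of the command for inspection.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Passing `cmd` to `subprocess.run(cmd, check=True)` from the directory that contains `mspider.py` would then start the crawl.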

ToDo

  1. Crawl and store information.
  2. Distributed crawling.

MSpider's help

Usage:
  __  __  _____       _     _
 |  \/  |/ ____|     (_)   | |
 | \  / | (___  _ __  _  __| | ___ _ __
 | |\/| |\___ \| '_ \| |/ _` |/ _ \ '__|
 | |  | |____) | |_) | | (_| |  __/ |
 |_|  |_|_____/| .__/|_|\__,_|\___|_|
               | |
               |_|
                        Author: Manning23


Options:
  -h, --help            show this help message and exit
  -u MSPIDER_URL, --url=MSPIDER_URL
                        Target URL (e.g. "http://www.site.com/")
  -t MSPIDER_THREADS_NUM, --threads=MSPIDER_THREADS_NUM
                        Max number of concurrent HTTP(s) requests (default 10)
  --depth=MSPIDER_DEPTH
                        Crawling depth
  --count=MSPIDER_COUNT
                        Crawling number
  --time=MSPIDER_TIME   Crawl time
  --referer=MSPIDER_REFERER
                        HTTP Referer header value
  --cookies=MSPIDER_COOKIES
                        HTTP Cookie header value
  --spider-model=MSPIDER_MODEL
                        Crawling mode: Static_Spider: 0  Dynamic_Spider: 1
                        Mixed_Spider: 2
  --spider-policy=MSPIDER_POLICY
                        Crawling strategy: Breadth-first 0  Depth-first 1
                        Random-first 2
  --focus-keyword=MSPIDER_FOCUS_KEYWORD
                        Focus keyword in URL
  --filter-keyword=MSPIDER_FILTER_KEYWORD
                        Filter keyword in URL
  --filter-domain=MSPIDER_FILTER_DOMAIN
                        Filter domain
  --focus-domain=MSPIDER_FOCUS_DOMAIN
                        Focus domain
  --random-agent=MSPIDER_AGENT
                        Use randomly selected HTTP User-Agent header value
  --print-all=MSPIDER_PRINT_ALL
                        Will show more information
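
The help text does not spell out how the focus/filter options and the `--spider-policy` values behave, so the sketch below shows one plausible reading rather than MSpider's actual implementation. It assumes that "focus" values must match for a URL to be crawled, that "filter" values exclude a URL, and that the policy numbers select how the next URL is taken from the queue. The function names `url_allowed` and `next_url` are hypothetical.

```python
import random
from collections import deque
from urllib.parse import urlparse


def _in_domain(host, domain):
    """True if host is the domain itself or one of its subdomains."""
    return host == domain or host.endswith("." + domain)


def url_allowed(url, focus_domain=None, filter_domain=None,
                focus_keyword=None, filter_keyword=None):
    """Assumed semantics: 'focus' values must match, 'filter' values must not."""
    host = urlparse(url).hostname or ""
    if focus_domain and not _in_domain(host, focus_domain):
        return False
    if filter_domain and _in_domain(host, filter_domain):
        return False
    if focus_keyword and focus_keyword not in url:
        return False
    if filter_keyword and filter_keyword in url:
        return False
    return True


def next_url(frontier, policy):
    """Pop the next URL from a deque: 0 breadth-first, 1 depth-first, 2 random."""
    if policy == 0:
        return frontier.popleft()  # oldest discovered URL first (FIFO)
    if policy == 1:
        return frontier.pop()      # newest discovered URL first (LIFO)
    if policy == 2:
        index = random.randrange(len(frontier))
        url = frontier[index]
        del frontier[index]
        return url
    raise ValueError("unknown policy: %r" % policy)
```

Under these assumptions, the wooyun.org example would keep `http://www.wooyun.org/bugs/1` and skip any URL that contains `xxx` or lies outside `wooyun.org`.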
