crawl-me

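crawl-me is a plugin-based tool for downloading images from web pages. With a single command line, crawl-me downloads images from a supported site in the way you specify. Currently only gamersky and pixiv are supported. More plugins are planned, and contributions of new plugins are welcome.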

Installation

Install via git

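  1. Installing on Ubuntu

    Because the code depends on pyquery, make sure the libxml2 and libxslt development packages (libxml2-dev and libxslt1-dev on Ubuntu) are installed first: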

     sudo apt-get install libxml2-dev
     sudo apt-get install libxslt1-dev 
    

    Next, make sure setuptools is installed. On Ubuntu you can run:

     sudo apt-get install python-setuptools
    

    Then clone the source from GitHub and install it:

     $ git clone https://github.com/nyankosama/crawl-me.git
     $ cd crawl-me/
     $ sudo python setup.py install
    
  2. Installing on Windows

    First, install Python 2.7 and pip. Python 2.7 can be installed with the Windows installer. To install pip, download get-pip.py and then run the following command:

     python get-pip.py
    

    Next, install lxml, which pyquery depends on: choose the matching lxml installer, then download and run it.

    Finally, clone the repository from GitHub and install it:

     $ git clone https://github.com/nyankosama/crawl-me.git
     $ cd crawl-me/
     $ python setup.py install
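
After installing, one quick way to confirm that the dependencies mentioned above are importable is a small check like the following. This is a minimal sketch and not part of crawl-me: the helper name missing_modules is hypothetical, and the module names pyquery and lxml are taken from the installation steps above.

```python
import importlib


def missing_modules(names):
    """Return the entries of `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            # The module is not installed (or one of its own imports failed).
            missing.append(name)
    return missing


# An empty list means both dependencies were found.
print("missing: %s" % missing_modules(["pyquery", "lxml"]))
```

If the printed list is not empty, revisit the package or lxml installer step for your platform before running crawl-me.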
    

Usage

Examples

  1. Download all images from pages 1 through 10 of http://www.gamersky.com/ent/201404/352055.shtml on gamersky into the gamersky-crawl folder under the current directory

     crawl-me gamersky http://www.gamersky.com/ent/201404/352055.shtml ./gamersky-crawl 1 10
    
  2. Download all works by the pixiv user with id 3878890 into the pixiv-crawl folder

    crawl-me pixiv 3878890 ./pixiv-crawl <your pixiv id> <your password>
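
If you want to drive these commands from a script, the two examples above can be assembled as argument lists and passed to subprocess. The sketch below assumes crawl-me is installed and on your PATH. The helper names gamersky_command, pixiv_command and run are hypothetical, and the argument order simply mirrors the two example commands above.

```python
import subprocess


def gamersky_command(url, save_path, first_page, last_page):
    """Build the argument list for the gamersky example above."""
    if first_page > last_page:
        raise ValueError("first_page must not be greater than last_page")
    return ["crawl-me", "gamersky", url, save_path,
            str(first_page), str(last_page)]


def pixiv_command(author_id, save_path, pixiv_id, password):
    """Build the argument list for the pixiv example above."""
    return ["crawl-me", "pixiv", str(author_id), save_path,
            pixiv_id, password]


def run(command):
    """Run crawl-me and return its exit status (needs crawl-me on PATH)."""
    return subprocess.call(command)
```

For example, run(gamersky_command("http://www.gamersky.com/ent/201404/352055.shtml", "./gamersky-crawl", 1, 10)) issues the same command as the first example. Passing an argument list instead of a single shell string avoids quoting problems with URLs and paths.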
    

Command line options

  1. general help

     $ crawl-me -h    
    
     usage: crawl-me [-h] plugin
    
     positional arguments:
         plugin      plugin the crawler uses
     
     optional arguments:
         -h, --help  show this help message and exit
    
     available plugins:
     ----gamersky
     ----pixiv
    
  2. gamersky

     $ crawl-me gamersky -h
     
     usage: crawl-me [-h] plugin authorId savePath pixivId password
    
     positional arguments:
         plugin      plugin the crawler uses
         authorId    the author id you want to crawl
         savePath    the path where the imgs ars saved
         pixivId     your pixiv login id
         password    your pixiv login password
    
     optional arguments:
         -h, --help  show this help message and exit
    
  3. pixiv

     $ crawl-me pixiv -h
    
     usage: crawl-me [-h] plugin authorId savePath pixivId password
    
     positional arguments:
         plugin      plugin the crawler uses
         authorId    the author id you want to crawl
         savePath    the path where the imgs ars saved
         pixivId     your pixiv login id
         password    your pixiv login password
     
     optional arguments:
         -h, --help  show this help message and exit
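
The help output above has the shape produced by Python's argparse module. As an illustration only, a parser with the same positional arguments as the pixiv plugin could be declared as follows. This is a sketch under that assumption, not crawl-me's actual source, and the function name build_pixiv_parser is hypothetical.

```python
import argparse


def build_pixiv_parser():
    """Declare the positional arguments listed in the pixiv help text."""
    parser = argparse.ArgumentParser(prog="crawl-me")
    parser.add_argument("plugin", help="plugin the crawler uses")
    parser.add_argument("authorId", help="the author id you want to crawl")
    parser.add_argument("savePath", help="the path where the images are saved")
    parser.add_argument("pixivId", help="your pixiv login id")
    parser.add_argument("password", help="your pixiv login password")
    return parser


# Parse the arguments from the pixiv example, with placeholder credentials.
args = build_pixiv_parser().parse_args(
    ["pixiv", "3878890", "./pixiv-crawl", "my-id", "my-password"]
)
print(args.plugin, args.authorId, args.savePath)
```

All five arguments are positional, so they must be given in exactly this order, and argparse adds the -h/--help option automatically.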
    

TODO

Add support for installing via pip

Licenses

MIT
