zhangyunhao116 / mini-spider

A simple, practical crawler tool: create your own crawler in just four steps!

License: Other

Python 99.37% Shell 0.63%
crawler python spider

mini-spider's Introduction

Mini-Spider

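Mini-Spider is a practical crawler tool. Its purpose is to get you the resources you want quickly, without having to think about crawler construction, data storage, network conditions, language implementation, and similar concerns. With just a few simple commands you can create a crawler and finish your task!

With mini-spider, you need only two steps to create your own crawler! (Most of the time.)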

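Features

  • Automatically extracts resources from web pages and classifies them algorithmically (including full URLs and the contents of all HTML tags)
  • Automatically generates extractors based on the resources
  • Custom extractors and Host data
  • Automatically adds extracted content to the corresponding database
  • Automatic categorized downloads, with resumable (breakpoint) transfers
  • Database import and export

In short, a few commands are all you need to crawl the resources you want!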
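
To make the first feature more concrete, here is a minimal sketch of collecting URLs from a page and grouping them by file extension. It is an illustration only and is not Mini-Spider's actual implementation: the names `LinkCollector`, `classify_by_extension`, and `extract_resources` are hypothetical, the choice to group by extension is an assumption, and the sketch uses only the Python 3 standard library.

```python
from collections import defaultdict
from html.parser import HTMLParser
from os.path import splitext
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Hypothetical helper: collect absolute URLs from href/src attributes."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Resolve relative links against the page URL.
                self.urls.append(urljoin(self.base_url, value))


def classify_by_extension(urls):
    """Group URLs by the lowercased file extension of their path."""
    groups = defaultdict(list)
    for url in urls:
        ext = splitext(urlparse(url).path)[1].lower() or "(none)"
        groups[ext].append(url)
    return dict(groups)


def extract_resources(html, base_url):
    """Parse an HTML string and return {extension: [urls]}."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return classify_by_extension(collector.urls)


# Example input; example.com is a placeholder host.
sample = (
    '<a href="/post/1.html">post</a>'
    '<img src="img/a.jpg">'
    '<img src="http://example.com/b.JPG">'
)
resources = extract_resources(sample, "http://example.com/user/")
for ext, urls in resources.items():
    print(ext, urls)
```

Grouping by extension is only one possible classification rule; a real tool might also group by host or by URL pattern before generating an extractor.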

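Installation

Before installing, note:

  • Requires Python 3.x only; not compatible with Python 2.x.

  • This project has no third-party dependencies.

Download the whole project, change into its directory, and run the following in a terminal: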

$ python3 setup.py install

Alternatively, install it with pip:

$ pip3 install mini-spider

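Usage

Example

This example uses three commands to create a crawler, then two more commands to complete the whole task.

Goal: extract all images posted by the author here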

Current version

Ver 0.0.4: basic feature testing stage.

mini-spider's People

Contributors

zhangyunhao116, zyunh

mini-spider's Issues

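Complete the basic features

  • Improve the display of the list command

  • Refactor the database

  • Make import work correctly when the imported text contains only one argument

  • Add unit tests for all modules

  • Refactor the analysis command

  • Rewrite the command-line parser from scratch (reinventing the wheel)

A very promising crawler!

This is a very promising crawler, and I hope it gets a lot of development. Adding a GUI, parameterization, proxy support, scheduling, JavaScript support, and similar features would make it even better.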
