awesome-archive / wspider

This project forked from csuldw/wspider

Crawler practice: scraping Sina Weibo user data and simulating a Zhihu login

Python 17.27% HTML 82.73%

Introduction

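Subprojects

Mini crawler

  • conf.ini: configures parameters such as proxies and headers; the Sina API parameters must be set to your own values;
  • dataEncode.py: encodes the POST data submitted when simulating a Sina login;
  • Logger.py: writes the log files;
  • main.py: the entry point for running the project;
  • myconf.py: loads the configuration file;
  • SinaSpider.py: the core of the spider, mainly the SinaClient class, whose methods are described below
    • switchUserAccount(self, userlist): switches user accounts, so that an account is not banned for crawling too long
    • login(self, username, password): logs in to Sina Weibo with a username and password
    • getUserInfos(self, uid): gets a user's profile information by user ID
    • getUserFollows(self, uid, params): gets the list of user IDs that a user follows, by user ID
    • getUserFans(self, uid, params): gets the list of a user's follower IDs, by user ID
    • getUserTweets(self, uid, tweets_all, params): gets a user's posts by user ID; tweets_all is a list variable
  • output: output directory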
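As an illustration of the conf.ini and myconf.py roles listed above, here is a minimal sketch of loading proxy, header and API settings with Python's standard `configparser`. The section names, key names and values (`proxies`, `headers`, `sina_api`, `app_key`) and the `load_conf` helper are assumptions made for this example; the project's real conf.ini layout and loader may differ.

```python
import configparser

# Hypothetical conf.ini contents; the real section and key names may differ.
SAMPLE_CONF = """
[proxies]
http = http://127.0.0.1:8080

[headers]
User-Agent = Mozilla/5.0 (example)

[sina_api]
app_key = YOUR_APP_KEY
"""


def load_conf(text):
    """Parse INI text and return a plain dict of {section: {key: value}}."""
    # Disable interpolation so a literal '%' in a value is not treated specially.
    parser = configparser.ConfigParser(interpolation=None)
    # Keep option names case-sensitive; by default they are lowercased,
    # which would turn 'User-Agent' into 'user-agent'.
    parser.optionxform = str
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


conf = load_conf(SAMPLE_CONF)
print(conf["headers"]["User-Agent"])
```

To read a file on disk rather than an inline string, `parser.read(path, encoding="utf-8")` can replace `parser.read_string(text)`.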

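Simulated Zhihu login

Files

  • ZhiHuPro/zhiHuLogin.py
  • ZhiHuPro/WSpider.py: the wrapped WSpider class, including the log output functions
  • ZhiHuPro/out: stores the output web pages
  • ZhiHuPro/temp: stores the captcha images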

Simulated Sina login

Files

  • SinaLogin/dataEncode.py: encodes the data submitted in the POST request
  • SinaLogin/Logger.py: prints the log
  • SinaLogin/SinaSpider.py: the file that crawls Sina Weibo data (main file)
  • SinaLogin/out: stores the output files
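The kind of POST-data encoding that dataEncode.py is described as doing might look roughly like the sketch below. This is not the project's code. The encoding scheme (URL-quote, then Base64), the function names and the form field names are all assumptions chosen for illustration, and a real login form will typically need further fields and password handling.

```python
import base64
from urllib.parse import quote, urlencode


def encode_username(username):
    """URL-quote the username, then Base64-encode it (assumed scheme)."""
    quoted = quote(username)
    return base64.b64encode(quoted.encode("utf-8")).decode("ascii")


def build_post_body(username, extra_fields):
    """Build a form-encoded POST body; the field names are hypothetical."""
    fields = {"username": encode_username(username)}
    fields.update(extra_fields)
    return urlencode(fields)


body = build_post_body("user@example.com", {"entry": "example"})
print(body)
```

`urlencode` percent-escapes the Base64 padding character `=`, so the body can be sent as `application/x-www-form-urlencoded` data.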

Contributor

@author: Diwei Liu


This project will continue to be updated, so stay tuned. If you like it, give it a Star.
