hhy5277 / meituan-spider

This project forked from hankailuo/meituan-spider

A multithreaded Meituan hotel crawler that simulates Meituan's _token in Python

Python 87.84% HTML 12.16%

meituan-spider's Introduction

A Python implementation that simulates Meituan's _token verification and crawls hotel information with multiple threads

Dependencies

  • beautifulsoup
  • lxml
  • requests

How to run

Install the dependencies with pip

$ pip install -r requirments.txt

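Open getproxy.py and add your own proxy pool. By default it uses the free Xici (西刺) proxies. Meituan's JS verification makes it easy to get blocked, so a paid proxy service is recommended. Suggestions for a reliable proxy pool are also welcome.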

$ python getproxy.py

Collect enough proxy IPs
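
The README does not show how getproxy.py stores or rotates the proxies it collects. As a rough illustration only, a pool that hands out proxies and drops the ones that keep failing might look like the sketch below. The `ProxyPool` class, its method names, and the `max_failures` threshold are all hypothetical and are not taken from the project; the only real convention used is the `{"http": ..., "https": ...}` mapping that `requests` accepts through its `proxies` argument.

```python
import random


class ProxyPool:
    """Hypothetical in-memory pool: rotates proxies, drops repeat offenders."""

    def __init__(self, proxies, max_failures=3):
        # Map each proxy URL to its count of consecutive failures.
        self.failures = {proxy: 0 for proxy in proxies}
        self.max_failures = max_failures

    def __len__(self):
        return len(self.failures)

    def get(self):
        """Return (proxy_url, mapping) where mapping suits requests' `proxies=`."""
        if not self.failures:
            raise LookupError("proxy pool is empty")
        proxy = random.choice(list(self.failures))
        return proxy, {"http": proxy, "https": proxy}

    def report_failure(self, proxy):
        """Count a failure and evict the proxy once it reaches the threshold."""
        if proxy not in self.failures:
            return
        self.failures[proxy] += 1
        if self.failures[proxy] >= self.max_failures:
            del self.failures[proxy]

    def report_success(self, proxy):
        """Reset the failure count after a successful request."""
        if proxy in self.failures:
            self.failures[proxy] = 0
```

A caller would pass the returned mapping to `requests.get(url, proxies=mapping)` and call `report_failure` when the request times out or is blocked.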

$ python run.py

Run: more than 200,000 hotels

Data

Program flow

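The crawl starts from the URL of the city-selection hotel list page. From there it collects the hotel-list URLs of cities nationwide and adds them to the task queue. get_hotel_url crawls every hotel URL and adds it to the url queue, and get_url_encode simulates the JS verification. run takes URLs from the url queue, crawls them, and stores the results in datafile. The crawl can be resumed from where it stopped.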
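
The queue-and-worker pattern described above, together with resuming an interrupted crawl, can be sketched roughly as follows. This is not the project's code: the `crawl` function, the `fetch` callback, and the `done_path` file that records finished URLs are hypothetical stand-ins for run, the page-fetching logic, and whatever checkpointing the project actually uses.

```python
import queue
import threading


def crawl(urls, fetch, done_path, workers=4):
    """Fetch every URL not already recorded in done_path, using worker threads."""
    # Resume support: URLs recorded by an earlier run are skipped.
    try:
        with open(done_path, encoding="utf-8") as f:
            done = {line.strip() for line in f if line.strip()}
    except FileNotFoundError:
        done = set()

    url_queue = queue.Queue()
    for url in urls:
        if url not in done:
            url_queue.put(url)

    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                url = url_queue.get_nowait()
            except queue.Empty:
                return  # queue drained, this worker is finished
            try:
                data = fetch(url)
            except Exception:
                continue  # leave the URL unrecorded so a later run retries it
            with lock:
                results[url] = data
                # Record progress immediately so an interruption loses little work.
                with open(done_path, "a", encoding="utf-8") as f:
                    f.write(url + "\n")

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Running `crawl` a second time with the same `done_path` only fetches URLs that failed or were never reached, which is the behavior a resumable crawl needs.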

Token verification

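Use Chrome to find the parameter submitted in the asynchronous request, search for it globally, and deobfuscate the code to locate the JS function that produces it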

Python implementation
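
The README does not say what the deobfuscated JS function does, so the actual _token algorithm is not reproduced here. The sketch below only illustrates the general shape of porting such a routine to Python, under the assumption (not confirmed by this document) that the function serializes a parameter object to JSON, compresses it with deflate, and base64-encodes the result. The function names and the payload fields are hypothetical.

```python
import base64
import json
import zlib


def encode_token(payload):
    """Assumed scheme: compact JSON -> zlib deflate -> base64 text."""
    raw = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    return base64.b64encode(zlib.compress(raw)).decode("ascii")


def decode_token(token):
    """Inverse of encode_token, handy for inspecting a captured token."""
    raw = zlib.decompress(base64.b64decode(token))
    return json.loads(raw.decode("utf-8"))


# Hypothetical payload; real field names would come from the deobfuscated JS.
example = {"ts": 1500000000000, "page": 1}
token = encode_token(example)
```

Decoding a token captured in Chrome with a helper like `decode_token` is one way to check whether an assumed scheme matches what the site really sends before relying on the encoder.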

perfect

meituan-spider's People

Contributors

hankailuo
