netease_spider's Introduction

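A NetEase Yanxuan clone.

The website side uses Django + MySQL + Celery + Redis.

It provides a home page product list, product detail pages, and category pages, implementing simple page display only.

The crawler side uses Scrapy and crawls product information from one Yanxuan channel on a daily schedule: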

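  • Uses scrapy-redis for distributed crawling
  • Keeps a pool of user agents and rotates through them, picking one per request
  • Disables cookies
  • Scrapes proxies from proxy-listing sites, re-checks their status at regular intervals, and crawls only through proxies that are still usable
  • Hashes the returned content on every crawl to fingerprint what was fetched; on the next update, if the hash is unchanged, the database is not updated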
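
The user agent rotation and the hash check from the list above can be sketched in plain Python. This is a minimal illustration, not the project's actual code: the names `USER_AGENTS`, `next_user_agent`, `content_hash`, and `needs_update` are hypothetical, SHA-256 is an assumed choice of hash (the README does not say which one is used), and an in-memory dict stands in for the database.

```python
import hashlib
from itertools import cycle

# Hypothetical user agent pool; the project's real list is not shown in this README.
USER_AGENTS = [
    "Mozilla/5.0 (X11; Linux x86_64) ExampleAgent/1.0",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ExampleAgent/2.0",
]
_ua_cycle = cycle(USER_AGENTS)


def next_user_agent():
    """Return the next user agent from the pool, in round-robin order."""
    return next(_ua_cycle)


def content_hash(body):
    """Return the SHA-256 hex digest of a raw response body (bytes)."""
    return hashlib.sha256(body).hexdigest()


def needs_update(url, body, seen_hashes):
    """Return True (and record the new hash) if the body changed since the last crawl.

    `seen_hashes` is a dict mapping URL -> last digest; it stands in for
    whatever storage the real project uses.
    """
    digest = content_hash(body)
    if seen_hashes.get(url) == digest:
        return False  # same content as last time: skip the database write
    seen_hashes[url] = digest
    return True
```

Calling `needs_update` twice with the same URL and body returns `True` the first time and `False` the second, so only changed pages reach the database. In a Scrapy project, the rotation would usually live in a downloader middleware that sets the `User-Agent` header on each request.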

Directory structure

  • netease_spider: main project configuration
  • netease: banner models
  • goods: products, categories, and related code
  • spider: crawler code
  • utils: utility functions

Requirements

MySQL + Redis + Python and the related packages

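Other notes

  • When setting up the environment, edit netease_spider/local_setings.py to change the Redis and MySQL configuration
  • Mind the crawl order: crawl category first, then goods. The scheduled tasks for these two jobs also run in that order
  • Because time was tight, the environment was only set up on my own development machine and has not been tested on other platforms; this will be revised later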
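
As a rough guide to the first note above, a local settings file for a Django + MySQL + Redis project might look like the sketch below. `DATABASES` and the `django.db.backends.mysql` engine are standard Django; everything else is an assumption: the database name, credentials, and the `REDIS_HOST`, `REDIS_PORT`, and `REDIS_URL` names are placeholders, and the project's real file may use different keys.

```python
# Sketch of netease_spider/local_setings.py; every value below is a placeholder.

# Django's standard DATABASES setting, pointed at a local MySQL server.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "netease_spider",  # assumed database name
        "USER": "root",            # placeholder user
        "PASSWORD": "change-me",   # placeholder password
        "HOST": "127.0.0.1",
        "PORT": "3306",
    }
}

# Hypothetical names for the Redis connection shared by Celery and the crawler.
REDIS_HOST = "127.0.0.1"
REDIS_PORT = 6379
REDIS_URL = "redis://{}:{}/0".format(REDIS_HOST, REDIS_PORT)
```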

netease_spider's People

Contributors

tobetterman, tmpbook