guoyu07 / spider_job

This project forked from xianyunyh/spider_job

An analysis of PHP job postings in the Shanghai area, based on data from Boss Zhipin.

Home Page: http://yehe.37he.cn/job/#/weekline

License: MIT License

PHP 5.47% Python 41.31% HTML 40.27% JavaScript 3.30% Vue 9.64%

spider_job's Introduction

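Crawler project

This is a Python project I put together mainly to study the positions listed on recruitment sites and the requirements they ask for.

The data comes from Lagou and Boss Zhipin. The crawlers are built on the Scrapy framework, and the scraped data is stored in MongoDB.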
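The Scrapy-to-MongoDB storage step mentioned above is commonly written as an item pipeline. The following is a minimal sketch, not the project's actual `pipelines.py`: the class name `MongoPipeline`, the setting names `MONGO_URI` and `MONGO_DATABASE`, the `positions` collection, and the `url` field used as the upsert key are all assumptions made for illustration.

```python
class MongoPipeline:
    """Upsert each scraped item into a MongoDB collection (illustrative sketch)."""

    def __init__(self, mongo_uri="mongodb://localhost:27017", db_name="jobs",
                 collection_name="positions", collection=None):
        self.mongo_uri = mongo_uri
        self.db_name = db_name
        self.collection_name = collection_name
        self.client = None
        # A collection object can be injected (for example a fake in tests);
        # otherwise it is created from pymongo when the spider opens.
        self.collection = collection

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this hook to build the pipeline from the project settings.
        # MONGO_URI and MONGO_DATABASE are hypothetical setting names.
        return cls(
            mongo_uri=crawler.settings.get("MONGO_URI", "mongodb://localhost:27017"),
            db_name=crawler.settings.get("MONGO_DATABASE", "jobs"),
        )

    def open_spider(self, spider):
        if self.collection is None:
            from pymongo import MongoClient  # lazy import; requires pymongo
            self.client = MongoClient(self.mongo_uri)
            self.collection = self.client[self.db_name][self.collection_name]

    def close_spider(self, spider):
        if self.client is not None:
            self.client.close()

    def process_item(self, item, spider):
        doc = dict(item)
        # Upsert on the posting URL so re-crawling does not create duplicates.
        # The "url" field is an assumption about the item schema.
        self.collection.update_one({"url": doc["url"]}, {"$set": doc}, upsert=True)
        return item
```

A pipeline like this would be enabled through the `ITEM_PIPELINES` setting in `settings.py`. Upserting rather than inserting is a design choice that keeps repeated crawls of the same posting from piling up duplicate documents.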

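  • Project directory structure

├─backend PHP backend API
├─front frontend UI
│  ├─job                Vue app
│  ├─company.html       popular companies
│  ├─education.html     education analysis
│  ├─weekline.html      posting trends
├─tutorial Python crawler
│  ├─spiders           spiders
│  │  ├─51job.py       51job spider
│  │  ├─lagou.py       Lagou spider
│  │  ├─zhipin.py      Boss Zhipin spider
│  ├─items.py          item definitions
│  ├─middlewares.py    middlewares
│  ├─pipelines.py      pipelines
│  ├─settings.py       project settings
├─word.json generated JSON of English technical terms
├─word.py generates the English word tokens
├─stop.txt stop-word list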
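The listing only names `word.py`, `word.json` and `stop.txt`, so how they work together is not documented here. One plausible approach, shown as a sketch under assumptions, is to pull ASCII tokens out of the mostly Chinese job descriptions, drop stop words, and count what remains. The function name `count_english_terms` and the token pattern are hypothetical and not taken from the project.

```python
import re
from collections import Counter

# Matches runs that start with an ASCII letter and may contain digits,
# '+', '#' or '.', so terms like "c++", "c#" or "vue.js" stay in one piece.
TOKEN_RE = re.compile(r"[A-Za-z][A-Za-z0-9+#.]*")


def count_english_terms(texts, stop_words):
    """Count lower-cased English tokens across texts, skipping stop words."""
    stop = {word.lower() for word in stop_words}
    counts = Counter()
    for text in texts:
        for token in TOKEN_RE.findall(text):
            # Strip a trailing full stop picked up from sentence endings.
            token = token.lower().rstrip(".")
            if token and token not in stop:
                counts[token] += 1
    return counts
```

The resulting `Counter` could then be written out with `json.dump(dict(counts), fh)` to produce a file along the lines of `word.json`.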

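Installation

  • Install MongoDB and Redis.

  • Use Python 3.6 or later. The required packages are pymongo, scrapy, redis, and pyquery (which may be removed later).

  • For PHP, install the PECL mongodb extension. The PHP code depends on mongodb/mongodb: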

    composer require mongodb/mongodb
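Before running the crawlers it can help to confirm that the Python side of these requirements is met. The check below is a small sketch using only the standard library. The helper names `missing_dependencies` and `python_is_supported` are made up for this example, and the module names are assumed to match the package names listed above.

```python
import sys
from importlib.util import find_spec

# Import names for the packages listed in the installation notes.
REQUIRED = ["pymongo", "scrapy", "redis", "pyquery"]


def missing_dependencies(names=REQUIRED):
    """Return the names that cannot be found on the import path."""
    return [name for name in names if find_spec(name) is None]


def python_is_supported(version_info=sys.version_info):
    """True when the interpreter is Python 3.6 or later."""
    return tuple(version_info[:2]) >= (3, 6)
```

Calling `missing_dependencies()` returns an empty list when everything is installed. Any names it does return can be passed to `pip install`.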
    

Running the crawlers

scrapy crawl boss # crawl Boss Zhipin
scrapy crawl 51job # crawl 51job
scrapy crawl lagou # crawl Lagou
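To run the three spiders in one go, the commands above could be wrapped in a small script. This is only a convenience sketch: `build_crawl_command` and `run_spiders` are hypothetical helpers, and the script is assumed to be started from the Scrapy project directory, the one containing `scrapy.cfg`.

```python
import subprocess

# Spider names taken from the commands above.
SPIDERS = ["boss", "51job", "lagou"]


def build_crawl_command(spider_name):
    """Return the argv list for `scrapy crawl <spider_name>`."""
    return ["scrapy", "crawl", spider_name]


def run_spiders(names=SPIDERS, runner=subprocess.run):
    """Run each spider in sequence and map its name to the exit code."""
    results = {}
    for name in names:
        completed = runner(build_crawl_command(name))
        results[name] = completed.returncode
    return results
```

Calling `run_spiders()` runs the spiders one after another. A non-zero value in the returned mapping indicates that the corresponding crawl exited with an error.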

spider_job's People

Contributors

xianyunyh
