data_crawler

Python web crawler

Crawling and analysis of Lagou job postings

Project structure

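lagou
│
│
├─conf
│      common.py  # configuration
│      __init__.py
│
├─crawler  # crawler module
│      job_crawler.py  # crawls job postings and exports them to a CSV file
│      __init__.py
│
├─data  # where crawled data is saved
│
│
├─picture  # where generated charts are saved
│
└─data_analysis  # analysis module
       job_analysis.py  # data cleaning, analysis, and visualization
       CompareType.py  # enum of comparison types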

Job posting crawler module

1. Fetch the raw response data

{
	'totalCount': 10213,
	'locationInfo': {
		'city': '北京',
		'district': None,
		'queryByGisCode': False,
		'businessZone': None,
		'locationCode': None,
		'isAllhotBusinessZone': False
	},
	'resultSize': 15,
	'queryAnalysisInfo': {
		'jobNature': None,
		'companyName': None,
		'positionName': 'python',
		'usefulCompany': False,
		'industryName': None
	},
	'strategyProperty': {
		'name': 'dm-csearch-useUserAllInterest',
		'id': 0
	},
	'hotLabels': None,
	'hiTags': None,
	'result': [{
		'companyId': 63714,
		'approve': 1,
		'jobNature': '全职',
		'workYear': '1-3年',
		'education': '本科',
		'city': '北京',
		'companyLogo': 'i/image/M00/59/0B/CgqKkVfWgPaAdaAxAAAq6WYoG_0975.png',
		'positionAdvantage': '技术大牛多,福利待遇好',
		'salary': '25k-40k',
		'positionLables': ['后端'],
		'industryLables': [],
		'businessZones': None,
		'industryField': '移动互联网,教育',
		'companyShortName': '粉笔网',
		'companyFullName': '北京粉笔蓝天科技有限公司',
		'adWord': 0,
		'score': 0,
		'positionId': 5232056,
		'positionName': 'Python开发工程师',
		'createTime': '2018-10-18 11:25:13',
		'financeStage': '不需要融资',
		'companySize': '150-500人',
		'companyLabelList': ['技能培训', '节日礼物', '年底双薪', '带薪年假'],
		'publisherId': 3028023,
		'district': '朝阳区',
		'longitude': '116.481162',
		'latitude': '39.996092',
		'formatCreateTime': '11:25发布',
		'hitags': None,
		'resumeProcessRate': 100,
		'resumeProcessDay': 2,
		'imState': 'today',
		'lastLogin': 1539835283000,
		'explain': None,
		'plus': None,
		'pcShow': 0,
		'appShow': 0,
		'deliver': 0,
		'gradeDescription': None,
		'promotionScoreExplain': None,
		'firstType': '开发|测试|运维类',
		'secondType': '后端开发',
		'isSchoolJob': 0,
		'subwayline': '15号线',
		'stationname': '望京东',
		'linestaion': '14号线东段_望京;14号线东段_阜通;14号线东段_望京南;15号线_望京东;15号线_望京',
		'thirdType': 'Python',
		'skillLables': ['后端']
	}]
}
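
Each entry in the `result` list above is one job posting. The project tree describes `job_crawler.py` as exporting postings to a CSV file; the sketch below shows one way such a flattening step could look. The function names `flatten_job` and `export_jobs_to_csv`, the column subset in `FIELDS`, and the `;` separator for list values are assumptions made for illustration, not the repository's actual implementation.

```python
import csv
import io

# Hypothetical column subset; the real crawler may export different fields.
FIELDS = ["positionId", "positionName", "companyShortName", "city",
          "district", "salary", "workYear", "education", "positionLables"]


def flatten_job(job):
    """Pick FIELDS from one posting; join list values with ';', map None to ''."""
    row = {}
    for key in FIELDS:
        value = job.get(key)
        if isinstance(value, list):
            value = ";".join(str(v) for v in value)
        row[key] = "" if value is None else value
    return row


def export_jobs_to_csv(response, out):
    """Write every posting in response['result'] to the file-like object out."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    jobs = response.get("result") or []
    for job in jobs:
        writer.writerow(flatten_job(job))
    return len(jobs)


# Trimmed copy of the sample response shown above.
sample = {
    "totalCount": 10213,
    "resultSize": 15,
    "result": [{
        "positionId": 5232056,
        "positionName": "Python开发工程师",
        "companyShortName": "粉笔网",
        "city": "北京",
        "district": "朝阳区",
        "salary": "25k-40k",
        "workYear": "1-3年",
        "education": "本科",
        "positionLables": ["后端"],
    }],
}

buffer = io.StringIO()
count = export_jobs_to_csv(sample, buffer)
print(count)
print(buffer.getvalue())
```

When writing to a real file instead of a `StringIO` buffer, open it with `newline=""` as the `csv` module documentation recommends, and pick an explicit encoding such as `utf-8` so the Chinese field values are stored correctly.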

Contributors

chuanaqi
