xianyu110 / bigdatasalaryanaliysystem

This project forked from linjiayue/bigdatasalaryanaliysystem

Big data recruitment information analysis platform

Python 7.18% Java 74.11% HTML 0.91% JavaScript 2.36% CSS 15.44%

bigdatasalaryanaliysystem's Introduction

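##Big Data Recruitment Information Analysis Platform##

This platform analyzes the job postings collected by the crawler along several dimensions and presents the final results.

###Platform Environment###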

  • CentOS 7
  • Hadoop-2.5.1
  • Zookeeper-3.4.6
  • HBase-1.1.2
  • Hive-1.2.1
  • MySQL-5.6
  • Python 3.5
  • JDK 1.8

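For the setup process, see: my blog

###Project Structure###

  • Crawler project, written in Python
  • Project that loads data into the HBase database, written in Java
  • Web presentation project, written in Java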

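###Features of Each Project###

1. Crawler project

  • Crawls the content of the specified links on a schedule and saves it to a specified folder
  • Calls the HBase loading project (Jar) so that the data is loaded into HBase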
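The two crawler steps above (save the crawled content, then hand it to the loader Jar) can be sketched as follows. This is a minimal illustration, not the project's actual crawler: the one-JSON-string-per-line layout, the function names, and the loader's command line (a single directory argument) are all assumptions, and scheduling is left out.

```python
import json
import subprocess
from pathlib import Path


def save_records(records, out_dir, file_name):
    """Write one JSON string per line into out_dir/file_name and return the path."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    target = out_path / file_name
    with target.open("w", encoding="utf-8") as fh:
        for record in records:
            # ensure_ascii=False keeps Chinese text readable in the output file.
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")
    return target


def build_loader_command(jar_path, data_dir):
    """Build the argument list for the loader Jar.

    Hypothetical CLI: the real Jar's arguments are not documented here.
    """
    return ["java", "-jar", str(jar_path), str(data_dir)]


def run_loader(jar_path, data_dir):
    """Run the loader Jar; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(build_loader_command(jar_path, data_dir), check=True)
```

Keeping the command construction separate from the call makes it possible to check the arguments without a Java runtime installed.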

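2. Project that loads data into the HBase database

  • Reads the files in a specified directory, parses their contents, and saves them to HBase
  • Loads with multiple threads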
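The loader itself is written in Java; the Python sketch below only illustrates the idea of parsing a directory of crawl files on a thread pool. It assumes one JSON object per non-empty line, and `store` is a hypothetical callback standing in for the actual HBase write.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def parse_file(path):
    """Parse one crawl file, assuming one JSON object per non-empty line."""
    records = []
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def load_directory(data_dir, store, max_workers=4):
    """Parse every file in data_dir on a thread pool and pass each record to store.

    store is a hypothetical callback standing in for the HBase write.
    Returns the number of records handed to store.
    """
    files = sorted(p for p in Path(data_dir).iterdir() if p.is_file())
    total = 0
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order while files are parsed concurrently.
        for records in pool.map(parse_file, files):
            for record in records:
                store(record)
                total += 1
    return total
```

In this sketch only the parsing runs on worker threads; `store` is called from the calling thread, so the callback does not need to be thread-safe.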

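3. Web presentation project

  • Presents the results of the data analysis graphically
  • Runs the statistics job on a schedule and saves the results to the result table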

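###Current Platform Features###

  • The crawler script crawls Lagou for the specified keywords (Hadoop, Spark, data analysis (数据分析), data mining (数据挖掘)) and stores the crawled content in files in a specified format; the current format is JSON strings
  • The crawler script calls the Jar that loads the data into HBase
  • The loader parses the contents of the files in the specified folder, converts them to JSON, and stores the values in the mapped HBase table columns
  • In the web application, a statistics job runs on a schedule and saves its results to the HBase result table; the only statistic so far is the number of postings per day, per city, and per position (Hadoop, Spark, data analysis, data mining)
  • The current visualizations are: a map showing the number of postings for each position in each province, a bar chart showing the number of postings for each position (Hadoop, Spark, data analysis, data mining), and a stacked line chart showing the number of postings for each position in Beijing, Shanghai, Guangzhou, and Shenzhen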
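The per-day, per-city, per-position statistic described above amounts to a grouped count. The sketch below shows one way to compute it in memory; it is not the web application's actual job (which is written in Java and writes to HBase). The `createDate` and `city` field names come from the table definition, while `keyword` is a hypothetical field naming the search term a record was crawled under.

```python
from collections import Counter


def count_positions(records):
    """Count postings per (createDate, city, keyword) triple.

    'keyword' is a hypothetical field; 'createDate' and 'city' follow
    the column names used in the SalaryInfo table.
    """
    counts = Counter()
    for record in records:
        counts[(record["createDate"], record["city"], record["keyword"])] += 1
    return counts


def series_for_cities(counts, cities, keyword):
    """Sum the daily counts for one keyword into a total per listed city."""
    totals = {city: 0 for city in cities}
    for (_, city, kw), n in counts.items():
        if kw == keyword and city in totals:
            totals[city] += n
    return totals
```

Calling `series_for_cities` once per keyword with the four cities (Beijing, Shanghai, Guangzhou, Shenzhen) would give the kind of data a stacked chart needs.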

###Table Creation Scripts###

####HBase####

create 'SalaryInfoResult','ResultInfoFamily'

create 'SalaryInfo','PositionInfoFamily','CompanyInfoFamily','OtherInfoFamily'

####Hive####

create external table SalaryInfo(
  key string,
  positionName string, createTime string, createDate string, insertDate string, salary string, workYear string, city string, createTimeSort string,
  companySize string, financeStage string, education string, positionAdvantage string, positionType string, industryField string, companyLabelList string, companyName string, companyShortName string, jobNature string, positionFirstType string, leaderName string,
  flowScore string, searchScore string, countAdjusted string, pvScore string, companyLogo string, positonTypesMap string, orderBy string, formatCreateTime string, haveDeliver string, adWord string, score string, positionId string, deliverCount string, relScore string, totalCount string, showOrder string, showCount string, calcScore string, companyId string, randomScore string, hrScore string, adjustScore string, imstate string, plus string
)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
with serdeproperties("hbase.columns.mapping"=":key,PositionInfoFamily:positionName,PositionInfoFamily:createTime,PositionInfoFamily:createDate,PositionInfoFamily:insertDate,PositionInfoFamily:salary,PositionInfoFamily:workYear,PositionInfoFamily:city,PositionInfoFamily:createTimeSort,CompanyInfoFamily:companySize,CompanyInfoFamily:financeStage,CompanyInfoFamily:education,CompanyInfoFamily:positionAdvantage,CompanyInfoFamily:positionType,CompanyInfoFamily:industryField,CompanyInfoFamily:companyLabelList,CompanyInfoFamily:companyName,CompanyInfoFamily:companyShortName,CompanyInfoFamily:jobNature,CompanyInfoFamily:positionFirstType,CompanyInfoFamily:leaderName,OtherInfoFamily:flowScore,OtherInfoFamily:searchScore,OtherInfoFamily:countAdjusted,OtherInfoFamily:pvScore,OtherInfoFamily:companyLogo,OtherInfoFamily:positonTypesMap,OtherInfoFamily:orderBy,OtherInfoFamily:formatCreateTime,OtherInfoFamily:haveDeliver,OtherInfoFamily:adWord,OtherInfoFamily:score,OtherInfoFamily:positionId,OtherInfoFamily:deliverCount,OtherInfoFamily:relScore,OtherInfoFamily:totalCount,OtherInfoFamily:showOrder,OtherInfoFamily:showCount,OtherInfoFamily:calcScore,OtherInfoFamily:companyId,OtherInfoFamily:randomScore,OtherInfoFamily:hrScore,OtherInfoFamily:adjustScore,OtherInfoFamily:imstate,OtherInfoFamily:plus")
tblproperties("hbase.table.name"="SalaryInfo");
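The column mapping in the Hive statement assigns each field to one of the three HBase column families. As a rough illustration of that mapping, the sketch below turns a parsed JSON record into `family:qualifier` keys. The function names are hypothetical, and sending any field not listed under the first two families to `OtherInfoFamily` is an assumption of this sketch, not documented loader behavior.

```python
# Field lists copied from the hbase.columns.mapping string in the Hive statement.
POSITION_FIELDS = (
    "positionName", "createTime", "createDate", "insertDate",
    "salary", "workYear", "city", "createTimeSort",
)
COMPANY_FIELDS = (
    "companySize", "financeStage", "education", "positionAdvantage",
    "positionType", "industryField", "companyLabelList", "companyName",
    "companyShortName", "jobNature", "positionFirstType", "leaderName",
)


def family_for(field):
    """Return the column family for a field (assumption: unknown fields go to OtherInfoFamily)."""
    if field in POSITION_FIELDS:
        return "PositionInfoFamily"
    if field in COMPANY_FIELDS:
        return "CompanyInfoFamily"
    return "OtherInfoFamily"


def to_hbase_columns(record):
    """Map a parsed JSON record to {'family:qualifier': string value} pairs."""
    return {
        "{}:{}".format(family_for(field), field): str(value)
        for field, value in record.items()
    }
```

Values are converted to strings because every column in the Hive table is declared as `string`.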

##Update History##

###Version: 1.0, updated 2015-12-26###

bigdatasalaryanaliysystem's People

Contributors

linjiayue
