
feng-li / tools-for-data-science-course


"Tools for Data Science Course" Student Interactive Git Repository

Home Page: https://feng.li/teaching/tds

Python 3.90% R 0.03% Jupyter Notebook 71.93% Shell 0.01% HTML 24.13% C++ 0.01%
data-science-tools git python hadoop spark

tools-for-data-science-course's Introduction

Tools for Data Science Course

  • Course homepage: https://feng.li/teaching/tds/

    • Venue: 209M, Shahe Campus
    • Time: Every Friday, 19:20 to 21:00
    • Taught by Feng Li
  • Information for 2019 spring semester

    • Venue: 105M, Shahe Campus
    • Time: Every Wednesday, 19:20

Hello, my old friends! hello

tools-for-data-science-course's People

Contributors

13897597076, 1silverlining, 2019310115, cloudy55, cx0222, divinerhjf, droluo, feng-li, hanxiya, haoliangzheng, hejieruo, iron1995, jellysillyfish, jiachen-mu, jiangxin-1205, jiaxin-li-sonia, keaton9527, leoli2002, liloyunwen, ning19991214, philharmoni, shiyuanyang, sijia-gao, stacey-ckk, vincent-cufe, wajijiwa-dot, walan98, wocufer, yizhenwang-2020, zshenwang


tools-for-data-science-course's Issues

爬虫.py (web scraper)

from bs4 import BeautifulSoup
import requests
import xlwt

def getHouseList(url):
    house = []
    headers = {'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.71 Safari/537.1 LBBROWSER'}
    # Fetch the listing page with a GET request
    res = requests.get(url, headers=headers)
    # Parse the HTML
    soup = BeautifulSoup(res.content, 'html.parser')
    # Listing titles
    housename_divs = soup.find_all('div', class_='title')
    for housename_div in housename_divs:
        housename_as = housename_div.find_all('a')
        for housename_a in housename_as:
            housename = []
            # Title text
            housename.append(housename_a.get_text())
            # Hyperlink to the detail page
            housename.append(housename_a.get('href'))
            house.append(housename)
    huseinfo_divs = soup.find_all('div', class_='houseInfo')
    for i in range(len(huseinfo_divs)):
        info = huseinfo_divs[i].get_text()
        infos = info.split('|')
        # Community name
        house[i].append(infos[0])
        # Layout (rooms)
        house[i].append(infos[1])
        # Area in square meters
        house[i].append(infos[2])
    # Look up the total prices
    house_prices = soup.find_all('div', class_='totalPrice')
    for i in range(len(house_prices)):
        # Price
        price = house_prices[i].get_text()
        house[i].append(price)
    return house

# Scrape listing details: district and interior area
def houseinfo(url):
    headers = {'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.71 Safari/537.1 LBBROWSER'}
    res = requests.get(url, headers=headers)
    soup = BeautifulSoup(res.content, 'html.parser')
    msg = []
    # District
    areainfos = soup.find_all('span', class_='info')
    for areainfo in areainfos:
        # Only the text of the first <a> tag is needed
        area = areainfo.find('a')
        if not area:
            continue
        # Use .get so an anchor without an href does not raise KeyError
        hrefStr = area.get('href', '')
        if hrefStr.startswith('javascript'):
            continue
        msg.append(area.get_text())
        break
    # Compute the interior area by summing the room sizes
    infolist = soup.find_all('div', id='infoList')
    num = []
    for info in infolist:
        cols = info.find_all('div', class_='col')
        for i in cols:
            pingmi = i.get_text()
            try:
                # Drop the two-character unit suffix before converting
                a = float(pingmi[:-2])
                num.append(a)
            except ValueError:
                continue
    msg.append(sum(num))
    return msg

# Write the listings to an Excel file
def writeExcel(excelPath, houses):
    workbook = xlwt.Workbook()
    # Create the first sheet
    sheet = workbook.add_sheet('git')
    # Header row: title, link, layout, area, orientation, total price, district, interior area
    row0 = ['标题','链接地址','户型','面积','朝向','总价','所属区域','套内面积']
    for i in range(0, len(row0)):
        sheet.write(0, i, row0[i])
    for i in range(0, len(houses)):
        house = houses[i]
        print(house)
        for j in range(0, len(house)):
            sheet.write(i+1, j, house[j])
    workbook.save(excelPath)

# Main function
def main():
    data = []
    for i in range(1, 5):
        # Print a separator for each result page
        print('-----分隔符', i, '-------')
        if i == 1:
            url = 'https://sjz.lianjia.com/ershoufang/l2rs%E5%92%8C%E5%B9%B3%E4%B8%96%E5%AE%B6/'
        else:
            url = 'https://sjz.lianjia.com/ershoufang/pg'+str(i)+'l2rs%E5%92%8C%E5%B9%B3%E4%B8%96%E5%AE%B6/'
        houses = getHouseList(url)
        for house in houses:
            link = house[1]
            if not link or not link.startswith('http'):
                continue
            mianji = houseinfo(link)
            # Append the district and interior area to the listing
            house.extend(mianji)
        data.extend(houses)
    writeExcel('d:/house.xls', data)

if __name__ == '__main__':
    main()
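
The script above hard-codes the percent-encoded search keyword in its URLs and indexes the '|'-separated houseInfo text directly. The following is a minimal sketch of two helpers that could make those steps less brittle. The names `build_page_urls` and `split_house_info` are hypothetical and not part of the original script. The `l2` segment is treated as an opaque filter copied from the original URLs, and the keyword 和平世家 is what the percent-encoded part of those URLs decodes to.

```python
from urllib.parse import quote

# Base URL taken from the original script
BASE = "https://sjz.lianjia.com/ershoufang/"


def build_page_urls(keyword, first_page=1, last_page=4, filters="l2"):
    """Return one search URL per result page (hypothetical helper).

    Page 1 has no 'pg' prefix; later pages use 'pg<N>', mirroring the
    two URL shapes used in main().
    """
    encoded = quote(keyword)  # UTF-8 percent-encoding of the keyword
    urls = []
    for page in range(first_page, last_page + 1):
        prefix = "" if page == 1 else f"pg{page}"
        urls.append(f"{BASE}{prefix}{filters}rs{encoded}/")
    return urls


def split_house_info(text, expected=3):
    """Split a '|'-separated string into stripped fields (hypothetical helper).

    Pads with empty strings so that at least `expected` fields exist,
    which avoids an IndexError when the page omits a field.
    """
    fields = [part.strip() for part in text.split("|")]
    fields += [""] * (expected - len(fields))
    return fields


if __name__ == "__main__":
    for url in build_page_urls("和平世家"):
        print(url)
    print(split_house_info("3室2厅 | 89平米"))
```

With helpers like these, `main()` could loop over `build_page_urls(...)` instead of branching on the page number, and `getHouseList` could index the padded list without checking its length first.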
