
python-pyspider's Introduction

Python-pyspider

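A repository for my own web-scraping projects.

Purpose: scrape the COVID-19 information published by a particular website.

Usage

covid_spider.py is the actual scraper program; the two folders under spider hold the stored data.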

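Code logic overview

The program first requests the specified website and downloads the page content. It then filters that content to extract the required data and saves the result to a specified JSON file.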
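The filtering step is the one part of this pipeline that the README does not show, so here is a minimal sketch of how it might look. It assumes the page embeds its data as a JSON array inside a script tag; the sample HTML, the element id and the field names are all hypothetical. It uses only `re` and `json` to stay dependency-free, although the project also imports BeautifulSoup.

```python
import json
import re

# Hypothetical page: the real site, element id and field names are not given
# in this README, so everything in SAMPLE_HTML is made up for illustration.
SAMPLE_HTML = """
<html><body>
<script id="countryData">
try { window.countryData = [{"name": "A", "confirmed": 10}, {"name": "B", "confirmed": 3}] } catch (e) {}
</script>
</body></html>
"""


def extract_records(html):
    """Pull the first JSON array out of the page text and parse it."""
    # Greedy match from the first '[' to the last ']' on the same line
    # ('.' does not cross newlines by default).
    match = re.search(r"\[.+\]", html)
    if match is None:
        raise ValueError("no JSON array found in page")
    return json.loads(match.group())


records = extract_records(SAMPLE_HTML)
print(records[0]["name"], len(records))
```

On a real page it is usually safer to locate the right script tag first (for example with BeautifulSoup) and run the regular expression on that tag's text only, so that unrelated brackets elsewhere in the page cannot be matched.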

Libraries used

import json
import re

import requests
from bs4 import BeautifulSoup
from tqdm import tqdm

Fetching the page

def get_content_from_url(self, url):
    # Send a request to fetch the epidemic home page
    response = requests.get(url)
    return response.content.decode()
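The fragment above has no timeout and does not check the HTTP status. A slightly more defensive variant could look like the sketch below. The function name `fetch_text` and its parameters are assumptions for illustration, not part of the project; `session` is any object with a requests-style `get` method, such as `requests.Session()`.

```python
def fetch_text(session, url, timeout=10.0, encoding="utf-8"):
    """Return the body of `url` decoded as text.

    `session` is anything with a requests-style `get` method,
    for example `requests.Session()` (hypothetical helper, not
    taken from covid_spider.py).
    """
    response = session.get(url, timeout=timeout)
    # Turn 4xx/5xx answers into exceptions instead of parsing an error page.
    response.raise_for_status()
    # Decode explicitly; bytes.decode() with no argument also uses UTF-8.
    return response.content.decode(encoding)
```

With Requests installed, a call might read `fetch_text(requests.Session(), url)`. Passing the session in also makes the function easy to exercise with a stub object when no network is available.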

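Collecting the specified data

corona_virus = []
for country in tqdm(last_day_covid, 'Collecting recent epidemic data for each country'):  # tqdm displays a progress bar
    # Send a request to fetch the country's JSON data to date
    statistics_data_url = country['statisticsData']
    statistics_data_json_str = self.get_content_from_url(statistics_data_url)
    # print(statistics_data_json_str)
    # Convert the JSON data to Python objects and add them to the list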
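The loop above stops at the conversion step. One way that step might be completed is sketched here as a standalone function. The `"data"` and `"name"` keys are assumptions about the payload, since the README does not show its shape, and `fetch` is a hypothetical stand-in for `self.get_content_from_url`.

```python
import json


def collect_history(last_day_covid, fetch):
    """Gather per-country history records into one flat list.

    `fetch` is a callable taking a URL and returning the response text,
    standing in for `self.get_content_from_url`.
    """
    corona_virus = []
    for country in last_day_covid:
        statistics_data_json_str = fetch(country["statisticsData"])
        # Convert the JSON text into Python objects.
        statistics_data = json.loads(statistics_data_json_str)
        # ASSUMPTION: the payload keeps its daily records under "data"
        # and each country entry carries a "name" field.
        for day in statistics_data["data"]:
            day["country"] = country["name"]
            corona_virus.append(day)
    return corona_virus
```

Wrapping `last_day_covid` in `tqdm(...)` at the call site would restore the progress bar without changing the function.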

Saving the data to a file

def save_data(self, path, data):
    # Save in JSON format
    with open(path, 'w', encoding='utf8') as fp:
        json.dump(data, fp, ensure_ascii=False)
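The `ensure_ascii=False` argument matters for this kind of data, because place names are often non-ASCII. The short sketch below, using made-up sample data and a temporary file, shows the difference and confirms that the saved file loads back unchanged.

```python
import json
import os
import tempfile

# Made-up sample record, for illustration only.
data = [{"country": "中国", "confirmed": 1}]

# Default: non-ASCII characters are written as \uXXXX escapes.
escaped = json.dumps(data)
# ensure_ascii=False: characters are written as-is, so the file must be
# opened with an encoding that can represent them (UTF-8 here).
readable = json.dumps(data, ensure_ascii=False)

path = os.path.join(tempfile.mkdtemp(), "sample.json")
with open(path, "w", encoding="utf8") as fp:
    json.dump(data, fp, ensure_ascii=False)
with open(path, encoding="utf8") as fp:
    loaded = json.load(fp)
```

Both forms parse to the same Python objects; the unescaped form is simply easier to read when the file is opened in an editor.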

Running

python-pyspider's People

Contributors

qixiaomao
