
hhy5277 / node-spider-2


This project forked from inuanfeng/node-spider


A Node.js crawler demo that scrapes data for every car model on Autohome (汽车之家).

JavaScript 89.95% HTML 10.05%

node-spider-2's Introduction

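About the application

A Node.js crawler that scrapes data for every car model on Autohome (汽车之家): http://www.autohome.com.cn/car/

The data covers four levels: brand, series, model year, and model.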

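Features

Current features

1. Scrapes data from Autohome;

2. Stores the data in MongoDB automatically.

Planned features

1. Display the scraped data with HighChart;

2. Store the data in MySQL automatically;

3. Add unit tests.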

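Node modules used:

superagent, request, iconv; (HTTP request modules; iconv is used for GBK transcoding)

cheerio; (same API as jQuery, used to process the fetched HTML without regex matching)

eventproxy, async; (control concurrent requests; async gives finer-grained control)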
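The GBK transcoding mentioned for iconv matters because a page served in GBK turns into mojibake if its bytes are read as UTF-8. The sketch below illustrates the idea only. It swaps in Node's built-in `TextDecoder` instead of iconv so that it runs without dependencies, which assumes a Node build with full ICU; `decodeGbk` is a hypothetical helper name, not something from the project.

```javascript
// Sketch only: the project uses the iconv module. TextDecoder is swapped in
// here so the example has no dependencies (assumes Node built with full ICU).

// Hypothetical helper: turn raw GBK response bytes into a JavaScript string.
function decodeGbk(bytes) {
  return new TextDecoder('gbk').decode(bytes);
}

// These four bytes are the GBK encoding of the two characters "汽车".
const gbkBytes = Buffer.from([0xc6, 0xfb, 0xb3, 0xb5]);

// Reading the bytes as UTF-8 yields replacement characters (mojibake).
const wrong = gbkBytes.toString('utf8');

// Decoding them as GBK recovers the original text.
const right = decodeGbk(gbkBytes);

console.log({ wrong, right });
```

In a real crawler the bytes would come from the HTTP response body, requested as a raw buffer rather than as an already-decoded string.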

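async limits the number of concurrent requests to 10 (to avoid IP bans and network errors).

A simulated sleep adds a 100 ms gap between requests (without a gap, DNS errors occur occasionally).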
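The concurrency cap and the pause between requests can be sketched with plain promises. This is not the project's code, which relies on the async module; `sleep`, `runLimited`, `fakeFetch`, and the `example.invalid` URLs are hypothetical names chosen for illustration, and the fake fetcher stands in for a real HTTP request so the sketch runs offline.

```javascript
// Hypothetical helper: resolve after `ms` milliseconds (the "simulated sleep").
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Run `worker` over `items` with at most `limit` calls in flight, pausing
// `delayMs` after each call before that slot picks up the next item.
async function runLimited(items, limit, delayMs, worker) {
  const results = new Array(items.length);
  let next = 0;   // index of the next unclaimed item
  let active = 0; // calls currently in flight
  let peak = 0;   // highest concurrency observed

  async function runSlot() {
    while (next < items.length) {
      const index = next++;
      active++;
      peak = Math.max(peak, active);
      try {
        results[index] = await worker(items[index], index);
      } finally {
        active--;
      }
      await sleep(delayMs);
    }
  }

  // Start `limit` slots; each slot processes items one at a time.
  const slots = Array.from({ length: Math.min(limit, items.length) }, runSlot);
  await Promise.all(slots);
  return { results, peak };
}

// Stand-in for a real HTTP request so the sketch runs offline.
async function fakeFetch(url) {
  await sleep(5);
  return `body of ${url}`;
}

const urls = Array.from({ length: 25 }, (_, i) => `https://example.invalid/page/${i}`);
const done = runLimited(urls, 10, 100, fakeFetch);
done.then(({ results, peak }) => {
  console.log(`fetched ${results.length} pages, peak concurrency ${peak}`);
});
```

Each of the 10 slots handles one request at a time and waits 100 ms before taking the next URL, so the number of requests in flight never exceeds the limit.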

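The express module was removed in favor of starting the crawler directly from the console (with this much data, starting the crawler by opening a web page can time out and trigger a repeated request).

Modules used in the end

request, iconv, cheerio, async

Finally, the data is stored in MongoDB automatically.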

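Project notes

app.js is the crawler's main program; it fetches the data step by step.

Crawl steps:

  1. Fetch brands and series;
  2. Fetch model years;
  3. Fetch models;
  4. Save to a local JSON file;
  5. Store in MongoDB automatically.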
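The steps above can be sketched as a small pipeline that builds a brand > series > year > model tree and writes it to a JSON file. Everything here is an assumption made for illustration: the stub functions, the field names, the sample values, and the temporary file path are hypothetical and do not reflect the actual structure used in app.js. The MongoDB step is left out so the sketch runs without a database.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical stubs standing in for the three scraping steps.
function fetchBrandsAndSeries() {
  return [{ brand: 'Brand A', series: [{ name: 'Series 1' }] }];
}
function fetchYears(series) {
  return ['2016', '2017'];
}
function fetchModels(series, year) {
  return [`${series.name} ${year} base`, `${series.name} ${year} sport`];
}

// Steps 1-3: build the brand > series > year > model tree.
function crawl() {
  const brands = fetchBrandsAndSeries();
  for (const brand of brands) {
    for (const series of brand.series) {
      series.years = fetchYears(series).map((year) => ({
        year,
        models: fetchModels(series, year),
      }));
    }
  }
  return brands;
}

// Step 4: write the tree to a local JSON file.
function saveJson(file, data) {
  fs.writeFileSync(file, JSON.stringify(data, null, 2), 'utf8');
}

// A temp path is used for the demo instead of data.json in the project root.
const outFile = path.join(os.tmpdir(), 'node-spider-demo-data.json');
const tree = crawl();
saveJson(outFile, tree);

// Step 5 would read the file back and insert it into MongoDB; this sketch
// only reads it back to show the round trip.
const loaded = JSON.parse(fs.readFileSync(outFile, 'utf8'));
console.log(JSON.stringify(loaded[0], null, 2));
```

Writing the intermediate JSON file before the database step means a failed import can be repeated without crawling the site again.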

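Handling the details

1. Models currently on sale include both 2016 and 2017 model years;

2. Some series have 2016 models on sale as well as discontinued 2016 models;

3. When a page fails to load, it is fetched again;

4. When the crawl finishes, the data is saved to data.json automatically;

5. Once the file is saved, it is read back and stored in MongoDB.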
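Re-fetching a page after a failure can be sketched as a small retry wrapper. The README does not say how many times the project retries, so the `maxAttempts` parameter is an assumption, and `withRetry` and `flakyFetch` are hypothetical names; the flaky fetcher simulates two network errors so the sketch runs offline.

```javascript
// Hypothetical helper: call `task` until it succeeds or `maxAttempts` is
// reached, then rethrow the last error. The attempt cap is an assumption.
async function withRetry(task, maxAttempts) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await task(attempt);
    } catch (err) {
      lastError = err; // remember the failure and try again
    }
  }
  throw lastError;
}

// Stand-in for a page fetch that fails twice before succeeding.
let calls = 0;
async function flakyFetch() {
  calls++;
  if (calls < 3) {
    throw new Error(`simulated network error #${calls}`);
  }
  return '<html>ok</html>';
}

const fetched = withRetry(flakyFetch, 5);
fetched.then((html) => {
  console.log(`got ${html} after ${calls} attempts`);
});
```

A cap on attempts keeps a permanently broken page from stalling the whole crawl; a pause between attempts could be added in the `catch` branch.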

Requirements

Install Node and MongoDB before running the project.

Contributors

Frank--https://github.com/sunfeng90

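Usage

#### Install dependencies
npm install

#### Start the crawler; the data is saved to data.json
node app

#### Store in MongoDB
Note: the crawled data is stored automatically in your local MongoDB database (provided MongoDB is already installed).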

Screenshots of crawl results

Sponsorship

License
