sci-journals-crawler's Introduction

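Collecting SCI Journal Data with Puppeteer

1. Learning Goals

  • Learn and apply web crawling techniques based on Node.js/Puppeteer.
  • Use Puppeteer to crawl two key metrics from the Elsevier website: journal review speed and acceptance rate.
  • Use mongoose to save the data to a local MongoDB database in real time.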
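The third goal, saving each record as soon as it is scraped, could look roughly like the sketch below. The model name `JournalMetric`, the field names, and the database name in the usage comment are assumptions made for illustration; the repository's actual schema is not shown here. The `mongoose` module is passed in as a parameter so that nothing connects to a database when the file is loaded.

```javascript
// Hypothetical schema: one document per (issn, metric) pair.
// `mongoose` is the object returned by require("mongoose").
function defineJournalMetric(mongoose) {
  const schema = new mongoose.Schema(
    {
      issn: { type: String, required: true },
      metric: { type: String, required: true }, // e.g. "review_speed"
      value: mongoose.Schema.Types.Mixed, // whatever was scraped from the page
    },
    { timestamps: true } // adds createdAt / updatedAt
  );
  // Prevent duplicate rows when the crawler is run more than once.
  schema.index({ issn: 1, metric: 1 }, { unique: true });
  return mongoose.model("JournalMetric", schema);
}

// Insert the record, or overwrite the value if the pair already exists.
async function saveMetric(Model, issn, metric, value) {
  return Model.findOneAndUpdate(
    { issn, metric },
    { $set: { value } },
    { upsert: true, new: true }
  );
}

// Usage (requires `npm install mongoose` and a local MongoDB instance):
//   const mongoose = require("mongoose");
//   await mongoose.connect("mongodb://127.0.0.1:27017/sci_journals"); // assumed DB name
//   const JournalMetric = defineJournalMetric(mongoose);
//   await saveMetric(JournalMetric, "0163-8343", "review_speed", "…");
//   await mongoose.disconnect();
```

Calling `saveMetric` once per scraped page, rather than collecting everything and writing at the end, means an interrupted run keeps whatever it had already gathered.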

2. Running the Program

Crawl the "review speed" data:

node review_speed.js

Crawl the "acceptance rate" data:

node acceptance_rate.js

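3. How the Crawler Works

Elsevier publishes Review Speed and Acceptance Rate metrics for its journals. The crawler builds a URL from each journal's ISSN and scrapes the corresponding data. For example: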

https://journalinsights.elsevier.com/journals/0163-8343/review_speed
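The URL construction can be sketched as a small helper. The base URL and the `review_speed` path segment come from the example above; the `acceptance_rate` segment is an assumption inferred by analogy with the script names. The ISSN check uses the standard mod-11 check digit, so a mistyped ISSN is rejected before any page is requested.

```javascript
// Base URL taken from the example URL in this README.
const BASE_URL = "https://journalinsights.elsevier.com/journals";

// "acceptance_rate" is assumed by analogy with "review_speed".
const METRICS = ["review_speed", "acceptance_rate"];

// Validate an ISSN of the form NNNN-NNNC, where C is a digit or X.
// Check digit: weight the first seven digits 8..2, sum, then (11 - sum % 11) % 11;
// a result of 10 is written as "X".
function isValidIssn(issn) {
  if (!/^\d{4}-\d{3}[\dXx]$/.test(issn)) return false;
  const chars = issn.replace("-", "").toUpperCase();
  let sum = 0;
  for (let i = 0; i < 7; i++) {
    sum += Number(chars[i]) * (8 - i);
  }
  const check = (11 - (sum % 11)) % 11;
  const expected = check === 10 ? "X" : String(check);
  return chars[7] === expected;
}

// Build the metric page URL for one journal.
function buildMetricUrl(issn, metric) {
  if (!isValidIssn(issn)) throw new Error(`Invalid ISSN: ${issn}`);
  if (!METRICS.includes(metric)) throw new Error(`Unknown metric: ${metric}`);
  return `${BASE_URL}/${issn}/${metric}`;
}

// buildMetricUrl("0163-8343", "review_speed")
// returns "https://journalinsights.elsevier.com/journals/0163-8343/review_speed"
```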

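The ISSNs in the crawl list cover all SCI journals published by Elsevier; the list was compiled from Clarivate data. Note that not every journal has both of the metrics we want to collect.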
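Because some journals lack one or both metrics, a crawl loop needs to treat "no data" as a normal outcome rather than an error. The sketch below shows one way to do that. `fetchMetric` is a hypothetical async function standing in for the real Puppeteer scraping step; it is assumed to resolve to `null` when the page has no value for that journal.

```javascript
// Visit every ISSN in turn and sort the outcomes into three buckets.
// `fetchMetric` is a hypothetical async (issn) => value | null.
async function crawlAll(issns, fetchMetric) {
  const found = []; // journals with a scraped value
  const missing = []; // journals whose page has no such metric
  const failed = []; // journals whose page could not be loaded or parsed
  for (const issn of issns) {
    try {
      const value = await fetchMetric(issn);
      if (value === null || value === undefined) {
        missing.push(issn);
      } else {
        found.push({ issn, value });
      }
    } catch (err) {
      // One bad page should not abort the whole run.
      failed.push({ issn, error: String(err.message || err) });
    }
  }
  return { found, missing, failed };
}
```

Processing the list sequentially keeps the load on the site low; the `failed` bucket can be fed back into a later run.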

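4. Basic Puppeteer Usage

Crawling with Node.js and the Puppeteer headless browser gives front-end developers an efficient way to collect data. Compared with Python-based scraping, it is easier to get started with and is well suited to relatively simple collection tasks. For an introduction to basic usage, see: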
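As a rough illustration of the basic Puppeteer workflow (launch a browser, open a page, read one value, close the browser), here is a minimal sketch. It is not the repository's code. The helper name `readText` and the CSS selector passed to it are assumptions: the actual selector has to be found by inspecting the target page. The `launch` parameter is injectable so the function can be exercised without a real browser; by default it lazily loads the `puppeteer` package.

```javascript
// Read the trimmed text of the first element matching `selector` on `url`.
// Resolves to null when no such element exists on the page.
// Requires `npm install puppeteer` when the default `launch` is used.
async function readText(
  url,
  selector,
  launch = () => require("puppeteer").launch()
) {
  const browser = await launch();
  try {
    const page = await browser.newPage();
    // Wait until network activity settles so client-rendered content is present.
    await page.goto(url, { waitUntil: "networkidle2" });
    const handle = await page.$(selector); // null if nothing matches
    if (handle === null) return null;
    // Runs inside the page context against the matched DOM element.
    return await handle.evaluate((el) => el.textContent.trim());
  } finally {
    // Always release the browser, even if navigation or parsing throws.
    await browser.close();
  }
}

// Usage ("h1" is only a placeholder selector):
//   readText(
//     "https://journalinsights.elsevier.com/journals/0163-8343/review_speed",
//     "h1"
//   ).then(console.log);
```

Returning `null` for a missing element, instead of throwing, fits the case where a journal simply has no data for a metric.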

5. Putting the Results to Use

The collected data was compiled into a published WeChat article; scan the QR code to preview it.
