tangweize / spiderforwebofscience

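A small Python crawler that scrapes bibliographic records from Web of Science, including "title", full author names, abbreviated author names, keywords, abstract, and every other field shown on the page, and saves them as a CSV table. It can also download the PDF files of papers that Web of Science makes available.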

Python 99.94% Shell 0.06%
spider python3 webofscience paperspider

spiderforwebofscience's Introduction

User Manual

The code has very few parameters, just three, and all of them are explicit and easy to understand.

Prerequisite: you must be able to open Web of Science and display the results of a search query.

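The crawler lives in the Spider_by_VZ directory, which contains three main .py files:

  • Main_Methods contains all the code that extracts the fields to be scraped; you do not need to touch it.
  • main is the entry point. Three parameters must be set in main; they are described below.
  • DownloadPdf downloads the paper PDFs that are directly available from Web of Science.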
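DownloadPdf's code is not reproduced here. As a hedged illustration of what a download step like this usually involves, the sketch below streams a PDF to disk with requests. The name download_pdf and its signature are hypothetical, not the module's actual API, and the `%PDF` header check is an added assumption meant to catch cases where the server returns an HTML page (for example a login or paywall page) instead of the file.

```python
import os
from typing import Optional


def download_pdf(url: str, path: str, session: Optional[object] = None,
                 timeout: float = 30.0) -> str:
    """Hypothetical helper: stream the PDF at `url` into the file `path`.

    `session` may be any object with a requests-style `get`; if omitted,
    a new requests.Session is created.
    """
    if session is None:
        import requests  # imported lazily so a stand-in session works without requests
        session = requests.Session()
    with session.get(url, stream=True, timeout=timeout) as resp:
        resp.raise_for_status()  # fail loudly on 4xx/5xx responses
        with open(path, "wb") as fh:
            for chunk in resp.iter_content(chunk_size=8192):
                if chunk:  # skip empty keep-alive chunks
                    fh.write(chunk)
    # Assumption: a real PDF starts with the "%PDF" magic bytes; anything else
    # (e.g. an HTML login page) is treated as a failed download and deleted.
    with open(path, "rb") as fh:
        is_pdf = fh.read(4) == b"%PDF"
    if not is_pdf:
        os.remove(path)
        raise ValueError(f"response from {url} is not a PDF")
    return path
```

Usage would look like `download_pdf("https://example.com/paper.pdf", "paper.pdf")`, where the URL is a placeholder. Checking the magic bytes after the download keeps the streaming loop simple and avoids mistaking a saved HTML page for a paper.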

main.py parameters:

There are three parameters to set in total; each is explained below with text and screenshots.

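  1. Open the Web of Science search results page. At this point its URL does not meet the code's requirements, because it has no pagination parameter. (Screenshot: Web of Science search results page)
  2. Type any page number into the page box marked by the arrow in the screenshot; this produces a URL that carries the page parameter. (Screenshot: getting a URL with the page parameter)
  3. From this page you can now read off two of the parameters in main:
  • url_root: the URL containing the page parameter, with the page number itself deleted (e.g. remove the 2 in the screenshot above). Note: url_root embeds session authentication information that is usually valid for about 24 hours, after which it must be replaced.
  • nums_page: the number circled in the screenshot, i.e. the total number of result pages. (Screenshot: setting the page count)
  • filename: the path and file name of the output table of paper records.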
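Deleting the page number from url_root suggests the script appends each page number itself. Assuming it simply concatenates 1 through nums_page onto url_root, the three settings and the resulting page list can be sketched as follows. The URL is a placeholder, and build_page_urls is a hypothetical helper, not a function from the repository.

```python
from typing import List

# Hypothetical values: copy url_root from your own browser session instead.
url_root = "https://example.com/results?SID=YOUR_SESSION_ID&page="  # page number removed
nums_page = 3            # total page count shown on the results page
filename = "papers.csv"  # where the table of paper records will be written


def build_page_urls(url_root: str, nums_page: int) -> List[str]:
    """Append page numbers 1..nums_page to url_root (assumed URL scheme)."""
    if nums_page < 1:
        raise ValueError("nums_page must be at least 1")
    return [url_root + str(page) for page in range(1, nums_page + 1)]


for url in build_page_urls(url_root, nums_page):
    print(url)
```

With these placeholder values the loop prints three URLs ending in `page=1`, `page=2` and `page=3`. Because url_root carries session information, a stale value typically yields error or login pages rather than results, which is why it needs refreshing about once a day.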

Environment

  • Python 3.6
  • Dependencies: requests, pandas, beautifulsoup4, tqdm
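pandas is listed for writing the CSV table of paper records. The sketch below shows that final step with the standard-library csv module instead, so it runs without extra packages; with pandas, `pandas.DataFrame(records).to_csv(filename, index=False)` is the usual equivalent. The helper name save_records and the record keys are hypothetical; the real field names presumably come from Main_Methods.

```python
import csv
from typing import Dict, List


def save_records(records: List[Dict[str, object]], filename: str) -> List[str]:
    """Hypothetical helper: write scraped records to a CSV file at `filename`.

    Columns are the union of all record keys in first-seen order; list values
    (e.g. several authors) are joined with "; ". Returns the column names.
    """
    columns: List[str] = []
    for rec in records:
        for key in rec:
            if key not in columns:
                columns.append(key)
    # utf-8-sig writes a BOM so Excel detects UTF-8 and shows Chinese names correctly
    with open(filename, "w", newline="", encoding="utf-8-sig") as fh:
        writer = csv.DictWriter(fh, fieldnames=columns)  # missing keys become ""
        writer.writeheader()
        for rec in records:
            row = {k: "; ".join(v) if isinstance(v, (list, tuple)) else v
                   for k, v in rec.items()}
            writer.writerow(row)
    return columns
```

For example, `save_records([{"title": "A", "authors": ["Li, X", "Wang, Y"]}], "papers.csv")` writes one row whose authors cell is `Li, X; Wang, Y`; the csv module quotes it automatically because it contains a comma.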
