
google_scholar_spider's Introduction

Google Scholar Spider Documentation

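Google Scholar Spider is a Python-based tool that retrieves data on articles published on Google Scholar for a given keyword. It lets users save the results to a CSV file, plot the results, and filter them by year and citation count.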

News

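This repository is a Google Scholar crawler that I wrote on the side in 2023 while training an academic large language model. The project has been largely shelved since then, but the crawler is still quite valuable. If you are interested in it or want to give this repository a major overhaul, you can contact me on WeChat: db277500.

I have also recently been building an AI SaaS product for overseas markets and have set up a small discussion group. You are welcome to join.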

Usage

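You can use Google Scholar Spider by running google_scholar_spider.py from the command line and passing any arguments you need. The available arguments are:

--kw (default "machine learning") The keyword to search for.

--nresults (default 50) The number of articles to search for on Google Scholar.

--notsavecsv Use this flag to print the results instead of saving them to a CSV file.

--csvpath The path where the exported CSV file is saved. Defaults to the current folder.

--sortby (default "Citations") The column to sort the data by. To sort by citations per year, use --sortby "cit/year".

--plotresults Use this flag to plot the results, with the original rank on the x-axis and the citation count on the y-axis.

--startyear The start year of the article search.

--endyear (default current year) The end year of the article search.

--debug Use this flag to enable debug mode. Debug mode is used for unit testing and stores pages in a web archive.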
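The options listed above could be declared with Python's standard argparse module roughly as follows. This is a minimal sketch, not the script's actual code: the function name build_parser, the help strings, and the value types are assumptions. In particular, --plotresults is assumed to accept both the bare flag form and the "--plotresults 1" form used in the Examples section.

```python
import argparse
from datetime import date


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(description="Google Scholar Spider (sketch)")
    parser.add_argument("--kw", default="machine learning",
                        help="Keyword to search for")
    parser.add_argument("--nresults", type=int, default=50,
                        help="Number of articles to search for")
    parser.add_argument("--notsavecsv", action="store_true",
                        help="Print results instead of saving a CSV file")
    parser.add_argument("--csvpath", default=".",
                        help="Folder for the exported CSV file")
    parser.add_argument("--sortby", default="Citations",
                        help='Column to sort by, e.g. "cit/year"')
    # Assumption: allow a bare "--plotresults" as well as "--plotresults 1".
    parser.add_argument("--plotresults", nargs="?", type=int, const=1, default=0,
                        help="Plot original rank (x) against citations (y)")
    parser.add_argument("--startyear", type=int, default=None,
                        help="Start year of the search")
    parser.add_argument("--endyear", type=int, default=date.today().year,
                        help="End year of the search")
    parser.add_argument("--debug", action="store_true",
                        help="Enable debug mode")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch is easy to try.
args = build_parser().parse_args(["--kw", "deep learning", "--nresults", "30"])
print(args.kw, args.nresults, args.sortby)
```

Any option left out of the argument list falls back to the default stated in the list above, so args.sortby is "Citations" here.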

Examples

python google_scholar_spider.py --kw "deep learning" --nresults 30 --csvpath "./data" --sortby "cit/year" --plotresults 1

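This command searches Google Scholar for articles related to "deep learning", retrieves 30 results, saves them to a CSV file in the "./data" folder, sorts the data by citations per year, and plots the results.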
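To make the "cit/year" ordering and the year filter concrete, here is a small standalone sketch. It is not taken from the tool: the function name rank_by_citations_per_year, the sample rows, and the formula (citations divided by the number of years from publication through the end year, inclusive) are all assumptions chosen for illustration.

```python
from typing import Dict, List, Optional


def rank_by_citations_per_year(
    rows: List[Dict],
    end_year: int,
    start_year: Optional[int] = None,
) -> List[Dict]:
    """Filter rows to [start_year, end_year] and sort by citations per year."""
    kept = []
    for row in rows:
        year = row["Year"]
        if year > end_year or (start_year is not None and year < start_year):
            continue  # outside the requested year range
        # Assumed formula: count the publication year itself, so age is at least 1.
        age = end_year - year + 1
        kept.append({**row, "cit/year": row["Citations"] / age})
    return sorted(kept, key=lambda r: r["cit/year"], reverse=True)


# Made-up sample data, not real search results.
sample = [
    {"Title": "Paper A", "Year": 2015, "Citations": 900},
    {"Title": "Paper B", "Year": 2021, "Citations": 600},
    {"Title": "Paper C", "Year": 2009, "Citations": 1500},
]

ranked = rank_by_citations_per_year(sample, end_year=2023, start_year=2010)
for row in ranked:
    print(row["Title"], row["cit/year"])
```

With this sample, Paper C is dropped by the start year, and Paper B ranks above Paper A even though it has fewer total citations, which is the point of sorting by citations per year rather than by raw citation count.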

License

Google Scholar Spider is released under the MIT License.


google_scholar_spider's Issues

Doesn't work, the Google request is not found

DevTools listening on ws://127.0.0.1:59473/devtools/browser/991c0108-9f7c-4648-b098-7d207331820f
Element not found
No success. The following error was raised:
'NoneType' object has no attribute 'get_attribute'

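Thank you, this is super useful.

I tried using Cursor to automatically generate a Python program that reads the Title column from the CSV file and looks up the abstract, but it failed (I have zero coding background and am a complete beginner). Could you add a feature that retrieves the abstract based on the Title and inserts it into the existing CSV?
I realized that what I lack is not only coding skills. I don't even know which website to scrape the information from.
The automatically generated program always tries to retrieve a PDF, so it runs without errors, but the PDF never exists.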
