This project is forked from neu-datamining/dailypaper.



📰DailyPaper

This project is a library of paper summaries, aiming to offer a simple and efficient way to keep up with the latest research in academia.

📎Workflow:
A crawler fetches the latest arXiv papers matching the specified key words every day; ChatGPT then summarizes each paper's content, and the summaries are compiled and updated.
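The crawl step can be sketched with arXiv's public Atom API. This is a minimal sketch, not the project's actual `get_arxiv.py`; the function names `build_query` and `fetch_titles` are assumptions:

```python
from urllib.parse import urlencode
from urllib.request import urlopen
import xml.etree.ElementTree as ET

ARXIV_API = "http://export.arxiv.org/api/query"
ATOM = "{http://www.w3.org/2005/Atom}"  # Atom XML namespace used by the feed

def build_query(keyword: str, max_results: int = 20) -> str:
    # Search paper abstracts for the keyword, newest submissions first.
    params = {
        "search_query": f'abs:"{keyword}"',
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "max_results": max_results,
    }
    return f"{ARXIV_API}?{urlencode(params)}"

def fetch_titles(keyword: str) -> list[str]:
    # Download the Atom feed and pull out each entry's title.
    with urlopen(build_query(keyword)) as resp:
        feed = ET.fromstring(resp.read())
    return [e.findtext(f"{ATOM}title") for e in feed.iter(f"{ATOM}entry")]
```

Running this daily (e.g. from cron or a scheduled GitHub Action) yields the candidate papers that are then passed to the summarization step.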

🎃Tips:
1. The script matches arXiv key words by tokenizing each paper's abstract and comparing tokens (a match ratio above 2/3 counts as a hit), so a few papers from other fields are inevitably captured (for example, the "diffusion" keyword often catches astrophysics papers);
2. The summaries are not fully accurate and should only be treated as a rough reference;
3. ChatGPT sometimes disobeys and outputs English content;
4. arXiv does not publish new papers on Saturday and Sunday.
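The matching heuristic in tip 1 can be sketched as follows. This is a minimal sketch under the stated 2/3 rule; the function name `keyword_match` and the exact tokenizer are assumptions, not the repository's actual code:

```python
import re

def keyword_match(abstract: str, keywords: list[str], threshold: float = 2 / 3) -> bool:
    """Return True if any keyword's tokens appear in the abstract at a
    ratio strictly above `threshold` (hypothetical helper -- the
    repository's real matcher may differ)."""
    tokens = set(re.findall(r"[a-z0-9-]+", abstract.lower()))
    for kw in keywords:
        kw_tokens = kw.lower().split()
        hits = sum(t in tokens for t in kw_tokens)
        if hits / len(kw_tokens) > threshold:
            return True
    return False
```

Because a one-word keyword like `diffusion` matches any abstract containing that token, an astrophysics paper about gas diffusion also passes, which is exactly the false-positive mode tip 1 warns about.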

☀️Quick navigation: May

| Sun | Mon | Tue | Wed | Thu | Fri | Sat |
|-----|-----|-----|-----|-----|-----|-----|
|     | 1   | 2   | 3   | 4   | 5   | 6   |
| 7   | 8   | 9   | 10  | 11  | 12  | 13  |
| 14  | 15  | 16  | 17  | 18  | 19  | 20  |
| 21  | 22  | 23  | 24  | 25  | 26  | 27  |
| 28  | 29  | 30  | 31  |     |     |     |

📝Example

  1. Title: Efficient and Effective Text Encoding for Chinese LLaMA and Alpaca.
  2. Authors: Yiming Cui, Ziqing Yang, Xin Yao.
  3. Affiliation: None
  4. Keywords: Large Language Models, natural language processing, Chinese language, open-source software.
  5. Urls: arXiv:2304.08177v1, Github: https://github.com/ymcui/Chinese-LLaMA-Alpaca.
  6. Summary:
  • (1): The research background is the wide adoption of large language models in natural language processing and the challenges they pose to transparent, open academic research.
  • (2): Previous approaches suffered from proprietary restrictions and high training costs, which kept the research community from doing fine-grained follow-up work on top of them. The authors' method focuses on secondary pre-training of the ** large language model and fine-tuning on ** data, which markedly improves the model's comprehension and instruction-following ability.
  • (3): The authors propose the secondary pre-training and fine-tuning techniques, then fine-tune and test on the ** instruction dataset to evaluate the model's performance and comprehension.
  • (4): The method is evaluated comprehensively on the ** large language model, achieving good performance and supporting the goal of open-source software.
  • (5): The main motivation of this paper is to promote open research in natural language processing and to improve the transparency and interpretability of large language models.

📍Current key words

| query | key words |
|-------|-----------|
| chatgpt | chatgpt |
| multimodal | multimodal, text, image, video |
| dialogue system | dialogue system, chatbot, chat-bot |
| empathetic dialogue | empathetic dialogue, ed |
| humorous dialogue | humorous dialogue |
| diffusion | diffusion, image, text, image generation |
| large language model | LLM, large language model |
| contrastive learning | contrastive learning, text generation, negative response, negative example, attention mask, multi-modal, dialog |
| reinforcement learning | rl, reinforcement learning, reinforcement learning from human feedback |

⌨Code

This project's code routes requests through api2d as a relay. To use ChatGPT's service directly, just change the request URL and request headers in the api2d.py file.
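The swap can be sketched like this, assuming api2d mirrors OpenAI's chat-completions interface so that only the endpoint URL and the Authorization header change; `build_request` and `complete` are hypothetical helpers, not the actual code in api2d.py:

```python
import json
from urllib.request import Request, urlopen

# To call OpenAI directly instead of the api2d relay, point the request
# at OpenAI's endpoint and send an OpenAI-style API key.
OPENAI_URL = "https://api.openai.com/v1/chat/completions"

def build_request(prompt: str, api_key: str) -> Request:
    payload = {
        "model": "gpt-3.5-turbo",
        "messages": [{"role": "user", "content": prompt}],
    }
    return Request(
        OPENAI_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",  # OpenAI-style key
            "Content-Type": "application/json",
        },
    )

def complete(prompt: str, api_key: str) -> str:
    # Send the request and extract the assistant's reply.
    with urlopen(build_request(prompt, api_key)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```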

# Basic configuration: stores the apikey, email, query, key words, etc.
config.ini 
# arXiv crawler
get_arxiv.py
# Local database
database.py
# Email delivery
send_email.py
# PDF processing
process_pdf.get_summary.py, prompt_convert_json.py 
# 🧠 The brain
NavigoX.py
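Since config.ini is said to hold the apikey, email, query, and key words, a hypothetical layout (section and key names are guesses, not the repository's actual schema) might look like:

```ini
[account]
apikey = sk-...             ; api2d / OpenAI key
email = you@example.com     ; recipient for the daily digest

[search]
query = diffusion
key words = diffusion, image, text, image generation
```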

✏Reference projects

ChatPaper: https://github.com/kaixindelele/ChatPaper

ChatGPT Academic: https://github.com/binary-husky/chatgpt_academic

Contributors

zhangyiqun018, mrgengli
