jeffhj / lm-reasoning

Reasoning in Large Language Models

Awesome · License: MIT · Made With Love

This repository contains a collection of papers and resources on Reasoning in Large Language Models.

For more details, please refer to Towards Reasoning in Large Language Models: A Survey

Feel free to let me know about missing papers (via issue or pull request).

Contributor: Jie Huang @UIUC

Thanks to Kevin Chen-Chuan Chang @UIUC, Jason Wei @Google Brain, and Denny Zhou @Google Brain for insightful discussions and suggestions.

Contents

Survey

Jie Huang, Kevin Chen-Chuan Chang

Relevant Surveys, Position Papers, and Blog Posts

Jason Wei, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph, Sebastian Borgeaud, Dani Yogatama, Maarten Bosma, Denny Zhou, Donald Metzler, Ed H. Chi, Tatsunori Hashimoto, Oriol Vinyals, Percy Liang, Jeff Dean, William Fedus

David Dohan, Winnie Xu, Aitor Lewkowycz, Jacob Austin, David Bieber, Raphael Gontijo Lopes, Yuhuai Wu, Henryk Michalewski, Rif A. Saurous, Jascha Sohl-dickstein, Kevin Murphy, Charles Sutton

Yao Fu, Hao Peng, Tushar Khot

Shuofei Qiao, Yixin Ou, Ningyu Zhang, Xiang Chen, Yunzhi Yao, Shumin Deng, Chuanqi Tan, Fei Huang, Huajun Chen

Pan Lu, Liang Qiu, Wenhao Yu, Sean Welleck, Kai-Wei Chang

Qingxiu Dong, Lei Li, Damai Dai, Ce Zheng, Zhiyong Wu, Baobao Chang, Xu Sun, Jingjing Xu, Lei Li, Zhifang Sui

Zonglin Yang, Xinya Du, Rui Mao, Jinjie Ni, Erik Cambria

Fei Yu, Hongbo Zhang, Benyou Wang

Technique

Fully Supervised Finetuning

We mainly focus on techniques that are applicable to improving or eliciting "reasoning" in large language models like GPT-3 (175B).

Papers in this paradigm vary widely and are usually based on small models trained on specific datasets. We list several representative papers here (the list is not exhaustive); please refer to our survey for discussion.
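As a rough illustration of this paradigm, fully supervised finetuning typically trains a model on examples whose targets contain a human-written rationale followed by the final answer. The sketch below shows one plausible (input, target) formatting; the field names and the `format_example` helper are illustrative inventions, not the data format of any specific paper.

```python
# Hedged sketch: formatting a rationale-annotated example for fully
# supervised finetuning. Field names are illustrative, not taken from
# any particular paper's code.

def format_example(question, choices, rationale, answer):
    """Build an (input, target) pair where the target contains the
    rationale followed by the final answer, so the model is trained
    to generate its reasoning before answering."""
    source = f"question: {question} choices: {', '.join(choices)}"
    target = f"rationale: {rationale} answer: {answer}"
    return source, target

src, tgt = format_example(
    "Where would you find a mailbox?",
    ["street", "ocean", "sky"],
    "Mailboxes are installed along streets for postal delivery.",
    "street",
)
```

A seq2seq model finetuned on such pairs learns to emit the rationale and answer jointly, which is one reason these approaches tend to be dataset-specific.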

Nazneen Fatema Rajani, Bryan McCann, Caiming Xiong, Richard Socher

Alon Talmor, Oyvind Tafjord, Peter Clark, Yoav Goldberg, Jonathan Berant

Dan Hendrycks, Collin Burns, Saurav Kadavath, Akul Arora, Steven Basart, Eric Tang, Dawn Song, Jacob Steinhardt

Maxwell Nye, Anders Johan Andreassen, Guy Gur-Ari, Henryk Michalewski, Jacob Austin, David Bieber, David Dohan, Aitor Lewkowycz, Maarten Bosma, David Luan, Charles Sutton, Augustus Odena

Soumya Sanyal, Harman Singh, Xiang Ren

......

Prompting and In-Context Learning

Chain of Thought Prompting and Its Variants/Applications

Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, Denny Zhou

Boshi Wang, Xiang Deng, Huan Sun

Takeshi Kojima, Shixiang Shane Gu, Machel Reid, Yutaka Matsuo, Yusuke Iwasawa

Ben Prystawski, Paul Thibodeau, Noah Goodman

Freda Shi, Mirac Suzgun, Markus Freitag, Xuezhi Wang, Suraj Srivats, Soroush Vosoughi, Hyung Won Chung, Yi Tay, Sebastian Ruder, Denny Zhou, Dipanjan Das, Jason Wei

Wenhu Chen

Aman Madaan, Shuyan Zhou, Uri Alon, Yiming Yang, Graham Neubig

Luyu Gao*, Aman Madaan*, Shuyan Zhou*, Uri Alon, Pengfei Liu, Yiming Yang, Jamie Callan, Graham Neubig

Wenhu Chen, Xueguang Ma, Xinyi Wang, William W. Cohen

Hangfeng He, Hongming Zhang, Dan Roth

Rationale Engineering

Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, Christopher Hesse, John Schulman

Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Sharan Narang, Aakanksha Chowdhery, Denny Zhou

Yifei Li, Zeqi Lin, Shizhuo Zhang, Qiang Fu, Bei Chen, Jian-Guang Lou, Weizhu Chen

Yao Fu, Hao Peng, Ashish Sabharwal, Peter Clark, Tushar Khot

Zhuosheng Zhang, Aston Zhang, Mu Li, Alex Smola

Hattie Zhou, Azade Nova, Hugo Larochelle, Aaron Courville, Behnam Neyshabur, Hanie Sedghi

Yixuan Weng, Minjun Zhu, Shizhu He, Kang Liu, Jun Zhao

Problem Decomposition

Denny Zhou, Nathanael Schärli, Le Hou, Jason Wei, Nathan Scales, Xuezhi Wang, Dale Schuurmans, Claire Cui, Olivier Bousquet, Quoc Le, Ed Chi

Andrew Drozdov, Nathanael Schärli, Ekin Akyürek, Nathan Scales, Xinying Song, Xinyun Chen, Olivier Bousquet, Denny Zhou

Tushar Khot, Harsh Trivedi, Matthew Finlayson, Yao Fu, Kyle Richardson, Peter Clark, Ashish Sabharwal

Ofir Press, Muru Zhang, Sewon Min, Ludwig Schmidt, Noah A. Smith, Mike Lewis

Dheeru Dua, Shivanshu Gupta, Sameer Singh, Matt Gardner

Yunhu Ye, Binyuan Hui, Min Yang, Binhua Li, Fei Huang, Yongbin Li

Others

Wenlong Huang, Pieter Abbeel, Deepak Pathak, Igor Mordatch

Antonia Creswell, Murray Shanahan, Irina Higgins

Jaehun Jung, Lianhui Qin, Sean Welleck, Faeze Brahman, Chandra Bhagavatula, Ronan Le Bras, Yejin Choi

Antonia Creswell, Murray Shanahan

Pan Lu, Swaroop Mishra, Tony Xia, Liang Qiu, Kai-Wei Chang, Song-Chun Zhu, Oyvind Tafjord, Peter Clark, Ashwin Kalyan

Shiyang Li, Jianshu Chen, Yelong Shen, Zhiyu Chen, Xinlu Zhang, Zekun Li, Hong Wang, Jing Qian, Baolin Peng, Yi Mao, Wenhu Chen, Xifeng Yan

Kumar Shridhar, Alessandro Stolfo, Mrinmaya Sachan

Lucie Charlotte Magister, Jonathan Mallinson, Jakub Adamek, Eric Malmi, Aliaksei Severyn

Seyed Mehran Kazemi, Najoung Kim, Deepti Bhatia, Xin Xu, Deepak Ramachandran

Shibo Hao, Yi Gu, Haodi Ma, Joshua Jiahua Hong, Zhen Wang, Daisy Zhe Wang, Zhiting Hu

Hybrid Method

Reasoning-Enhanced Training and Prompting

Xinyu Pi, Qian Liu, Bei Chen, Morteza Ziyadi, Zeqi Lin, Qiang Fu, Yan Gao, Jian-Guang Lou, Weizhu Chen

Aitor Lewkowycz, Anders Andreassen, David Dohan, Ethan Dyer, Henryk Michalewski, Vinay Ramasesh, Ambrose Slone, Cem Anil, Imanol Schlag, Theo Gutman-Solo, Yuhuai Wu, Behnam Neyshabur, Guy Gur-Ari, Vedant Misra

Cem Anil, Yuhuai Wu, Anders Andreassen, Aitor Lewkowycz, Vedant Misra, Vinay Ramasesh, Ambrose Slone, Guy Gur-Ari, Ethan Dyer, Behnam Neyshabur

Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Yunxuan Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Alex Castro-Ros, Marie Pellat, Kevin Robinson, Dasha Valter, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, Jason Wei

Ross Taylor, Marcin Kardas, Guillem Cucurull, Thomas Scialom, Anthony Hartshorn, Elvis Saravia, Andrew Poulton, Viktor Kerkez, Robert Stojnic

Ping Yu, Tianlu Wang, Olga Golovneva, Badr Alkhamissy, Gargi Ghosh, Mona Diab, Asli Celikyilmaz

Bootstrapping and Self-Improving

Eric Zelikman, Yuhuai Wu, Jesse Mu, Noah D. Goodman

Patrick Haluptzok, Matthew Bowers, Adam Tauman Kalai

Jiaxin Huang, Shixiang Shane Gu, Le Hou, Yuexin Wu, Xuezhi Wang, Hongkun Yu, Jiawei Han

Evaluation and Analysis

Arkil Patel, Satwik Bhattamishra, Navin Goyal

Yasaman Razeghi, Robert L. Logan IV, Matt Gardner, Sameer Singh

Jie Huang, Hanyin Shao, Kevin Chen-Chuan Chang

Karthik Valmeekam, Alberto Olmo, Sarath Sreedharan, Subbarao Kambhampati

Cem Anil, Yuhuai Wu, Anders Andreassen, Aitor Lewkowycz, Vedant Misra, Vinay Ramasesh, Ambrose Slone, Guy Gur-Ari, Ethan Dyer, Behnam Neyshabur

Ishita Dasgupta, Andrew K. Lampinen, Stephanie C. Y. Chan, Antonia Creswell, Dharshan Kumaran, James L. McClelland, Felix Hill

Simeng Han, Hailey Schoelkopf, Yilun Zhao, Zhenting Qi, Martin Riddell, Luke Benson, Lucy Sun, Ekaterina Zubova, Yujie Qiao, Matthew Burtell, David Peng, Jonathan Fan, Yixin Liu, Brian Wong, Malcolm Sailor, Ansong Ni, Linyong Nan, Jungo Kasai, Tao Yu, Rui Zhang, Shafiq Joty, Alexander R. Fabbri, Wojciech Kryscinski, Xi Victoria Lin, Caiming Xiong, Dragomir Radev

Abulhair Saparov, He He

Mirac Suzgun, Nathan Scales, Nathanael Schärli, Sebastian Gehrmann, Yi Tay, Hyung Won Chung, Aakanksha Chowdhery, Quoc V. Le, Ed H. Chi, Denny Zhou, Jason Wei

Laura Ruis, Akbir Khan, Stella Biderman, Sara Hooker, Tim Rocktäschel, Edward Grefenstette

Olga Golovneva, Moya Chen, Spencer Poff, Martin Corredor, Luke Zettlemoyer, Maryam Fazel-Zarandi, Asli Celikyilmaz

Boshi Wang, Sewon Min, Xiang Deng, Jiaming Shen, You Wu, Luke Zettlemoyer, Huan Sun

Citation

If you find this repo useful, please kindly cite our survey:

@article{huang2022towards,
  title={Towards Reasoning in Large Language Models: A Survey},
  author={Huang, Jie and Chang, Kevin Chen-Chuan},
  journal={arXiv preprint arXiv:2212.10403},
  year={2022}
}

lm-reasoning's People

Contributors

ber666, haluptzok, huybery, jeffhj, shuyanzhou, siviltaram, zxlzr


lm-reasoning's Issues

Request to add paper

Hi,
great work/repo!

Please consider adding our work on deductive/logical reasoning.

FaiRR: Faithful and Robust Deductive Reasoning over Natural Language, ACL 2022 (arXiv: 19 Mar 2022)
paper link
Soumya Sanyal, Harman Singh, Xiang Ren

A request to add new papers on logical reasoning data augmentation, prompt augmentation, and evaluation

Hi Jie,

Here are our new papers on logical reasoning data augmentation, prompt augmentation, and evaluation. Please consider adding them to your arXiv paper. Thanks a lot.

Logic-Driven Data Augmentation and Prompt Augmentation

We present an AMR-based logic-driven data augmentation method for contrastive learning that improves the logical reasoning performance of discriminative language models. We also use the same AMR-based augmentation to augment prompts, which helped GPT-4 achieve #1 on the ReClor leaderboard (one of the hardest logical reasoning reading comprehension datasets, collected from LSAT and GMAT questions). Our method also outperforms baseline models on a range of logical reasoning reading comprehension and natural language inference tasks. Details are given in the paper below.

Our paper (Qiming Bao, Alex Yuxuan Peng, Zhenyun Deng, Wanjun Zhong, Neset Tan, Nathan Young, Yang Chen, Yonghua Zhu, Michael Witbrock, Jiamou Liu)
"Enhancing Logical Reasoning of Large Language Models through Logic-Driven Data Augmentation" [Paper link] [Source code] [Model weights] [Leaderboard].
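The core augmentation idea above can be illustrated at the string level: from an implication, derive a logically equivalent positive (the contrapositive) and a contradictory hard negative for contrastive training. This is a toy sketch under that assumption, not the paper's actual AMR-based pipeline; `augment_implication` is a hypothetical helper.

```python
# Hedged toy sketch of logic-driven augmentation for contrastive
# learning: given "if P then Q", the contrapositive is logically
# equivalent (positive pair) and the negated conclusion contradicts
# it (hard negative). Not the paper's AMR-based implementation.

def augment_implication(p, q):
    anchor = f"If {p}, then {q}."
    # Contrapositive: equivalent to the anchor, used as a positive.
    positive = f"If it is not the case that {q}, then it is not the case that {p}."
    # Negated conclusion: contradicts the anchor, used as a hard negative.
    negative = f"If {p}, then it is not the case that {q}."
    return {"anchor": anchor, "positive": positive, "negative": negative}

pair = augment_implication("it rains", "the ground gets wet")
```

A contrastive objective would then pull the anchor and positive embeddings together while pushing the negative away.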

Out-of-Distribution Logical Reasoning Evaluation and Prompt Augmentation for Enhancing OOD Logical Reasoning

We present a systematic out-of-distribution evaluation of logical reasoning tasks. We introduce three new, more robust logical reasoning datasets, ReClor-Plus, LogiQA-Plus, and LogiQAv2-Plus, constructed from ReClor, LogiQA, and LogiQAv2 by changing the order and form of the answer options. We found that simply using chain-of-thought prompting does not improve models' performance in the out-of-distribution scenario, while using our AMR-based logic-driven data augmentation to augment the prompt does improve large language models' performance on out-of-distribution logical reasoning tasks. The three datasets have been included in OpenAI/Evals.
"A Systematic Evaluation of Large Language Models on Out-of-Distribution Logical Reasoning Tasks" [Paper link] [Source code] [Dataset links].
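The option-order perturbation described above can be sketched as a small data transformation: shuffle the answer options of a multiple-choice example and remap the gold label. This is a minimal sketch of the general idea, not the datasets' actual construction code; `permute_options` is a hypothetical helper.

```python
# Hedged sketch: building an option-order OOD variant of a
# multiple-choice example (shuffle the options, remap the gold label).
# Illustrative only; not the ReClor-Plus construction code.
import random

def permute_options(options, gold_index, seed=0):
    rng = random.Random(seed)  # seeded for reproducibility
    order = list(range(len(options)))
    rng.shuffle(order)
    new_options = [options[i] for i in order]
    new_gold = order.index(gold_index)  # where the gold answer moved to
    return new_options, new_gold

opts = ["always", "never", "sometimes", "often"]
new_opts, new_gold = permute_options(opts, gold_index=2, seed=0)
```

A model that relies on superficial positional cues rather than the content of the options will see its accuracy drop on such permuted variants.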

An Empirical Study on Out-of-Distribution Multi-Step Logical Reasoning

We find that pre-trained language models are not good at robust multi-step logical reasoning tasks, and one of the main reasons is the limited amount of training data for deeper multi-step logical reasoning. Therefore, we present a deeper, larger multi-step logical reasoning dataset named PARARULE-Plus. The dataset has also been included in OpenAI/Evals.
"Multi-Step Deductive Reasoning Over Natural Language: An Empirical Study on Out-of-Distribution Generalisation" [Paper link] [Source code] [Dataset links].

In case you were planning to expand on inductive reasoning

Thanks for synthesizing such a fast-growing list of papers on LLMs and reasoning! I also appreciate you writing about reasoning types that go beyond deductive!

I have a few papers that touch on inductive reasoning in humans and models, in case you'd like to expand on that topic in the survey. One disclaimer: these only deal with what you consider to be small LMs (though the methods are model-agnostic).

Misra, 2022 (AAAI Doctoral Consortium 2022): On Semantic Cognition, Inductive Generalization, and Language Models
https://ojs.aaai.org/index.php/AAAI/article/view/21584

Misra et al., 2022 (CogSci 2022): A Property Induction Framework For Neural Language Models:
https://arxiv.org/abs/2205.06910

Misra et al., 2021 (CogSci 2021): Do language models learn typicality judgments from text? (exp 2 is the first analysis of LMs on Inductive Reasoning)
https://arxiv.org/abs/2105.02987

Other papers that should be included in case you do decide to pursue this route:

Han et al., 2022 (CogSci 2022): Human-like property induction is a challenge for large language models
https://psyarxiv.com/6mkjy/

Yang et al., 2022: Language Models as Inductive Reasoners
https://arxiv.org/abs/2212.10923

Request to add a new survey

Hi, thanks for your contributions to collating large language model reasoning papers!
Recently, we released a survey on natural language reasoning, approached mainly from another perspective: the reasoning paradigm (end-to-end, forward, and backward).

Here are our survey and repository:
Natural Language Reasoning, A Survey
https://arxiv.org/pdf/2303.14725.pdf
https://github.com/FreedomIntelligence/ReasoningNLP

I believe our surveys and repositories complement each other in helping people better understand reasoning!

Paper addition request

Hi, thanks for the great work! I wanted to point to this paper about using LMs to perform reasoning over knowledge graphs for the explainable recommendation task: Faithful Path Language Modelling for Explainable Recommendation over Knowledge Graph
https://arxiv.org/abs/2310.16452

Request to add a paper.

Great work!

Could you please add our paper:

Search-in-the-Chain: Towards Accurate, Credible and Traceable Large Language Models for Knowledge-intensive Tasks
In this paper, we propose a novel framework to combine the reasoning of LLMs with a search engine.
paper link
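The general pattern of interleaving model reasoning with retrieval can be sketched as a loop that alternates between querying a search engine and extending the reasoning chain. The sketch below is a stand-in for that pattern, not the paper's actual Search-in-the-Chain implementation: `toy_search` is a word-overlap retriever substituting for a real search engine, and the next query is taken from the retrieved evidence where a real system would let the LLM propose it.

```python
# Hedged sketch of interleaving reasoning with search (the general
# retrieve-as-you-reason pattern). `toy_search` is a toy word-overlap
# retriever standing in for a real search engine.

def toy_search(query, corpus):
    """Return the corpus sentence sharing the most words with the query."""
    def overlap(sentence):
        return len(set(query.lower().split()) & set(sentence.lower().split()))
    return max(corpus, key=overlap)

def reason_with_search(question, corpus, steps=2):
    """At each step, retrieve evidence for the current query and append
    it to a growing chain. A real system would have the LLM generate the
    next sub-query and verify each step against the retrieved evidence."""
    chain = []
    query = question
    for _ in range(steps):
        candidates = [s for s in corpus if s not in chain]  # avoid repeats
        evidence = toy_search(query, candidates)
        chain.append(evidence)
        query = evidence  # stand-in for an LLM-generated follow-up query
    return chain

corpus = [
    "Hamlet was written by William Shakespeare.",
    "William Shakespeare was born in Stratford-upon-Avon.",
    "Paris is the capital of France.",
]
chain = reason_with_search("Who wrote Hamlet", corpus)
```

Each hop retrieves a new piece of evidence grounded in the previous one, which is the basic mechanism such frameworks build on for multi-hop, knowledge-intensive questions.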
