jeffhj / lm-reasoning

Reasoning in Large Language Models

Awesome · License: MIT · Made With Love

This repository contains a collection of papers and resources on Reasoning in Large Language Models.

For more details, please refer to Towards Reasoning in Large Language Models: A Survey

Feel free to let me know about missing papers (via issue or pull request).

Contributor: Jie Huang @UIUC

Thanks to Kevin Chen-Chuan Chang @UIUC, Jason Wei @Google Brain, and Denny Zhou @Google Brain for insightful discussions and suggestions.

Contents

Survey

Jie Huang, Kevin Chen-Chuan Chang

Relevant Surveys, Position Papers, and Blog Posts

Jason Wei, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph, Sebastian Borgeaud, Dani Yogatama, Maarten Bosma, Denny Zhou, Donald Metzler, Ed H. Chi, Tatsunori Hashimoto, Oriol Vinyals, Percy Liang, Jeff Dean, William Fedus

David Dohan, Winnie Xu, Aitor Lewkowycz, Jacob Austin, David Bieber, Raphael Gontijo Lopes, Yuhuai Wu, Henryk Michalewski, Rif A. Saurous, Jascha Sohl-dickstein, Kevin Murphy, Charles Sutton

Yao Fu, Hao Peng, Tushar Khot

Shuofei Qiao, Yixin Ou, Ningyu Zhang, Xiang Chen, Yunzhi Yao, Shumin Deng, Chuanqi Tan, Fei Huang, Huajun Chen

Pan Lu, Liang Qiu, Wenhao Yu, Sean Welleck, Kai-Wei Chang

Qingxiu Dong, Lei Li, Damai Dai, Ce Zheng, Zhiyong Wu, Baobao Chang, Xu Sun, Jingjing Xu, Lei Li, Zhifang Sui

Zonglin Yang, Xinya Du, Rui Mao, Jinjie Ni, Erik Cambria

Fei Yu, Hongbo Zhang, Benyou Wang

Technique

Fully Supervised Finetuning

We mainly focus on techniques that are applicable to improving or eliciting "reasoning" in large language models like GPT-3 (175B).

Papers in this paradigm vary widely and are usually based on small models trained on specific datasets. We list several representative papers here (the list is not exhaustive); please refer to our survey for discussion.
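As a rough illustration of this paradigm, fully supervised finetuning typically trains a model on examples whose targets contain a human-written rationale followed by the final answer. The sketch below shows one plausible (input, target) formatting; the field names and the `format_example` helper are illustrative inventions, not the data format of any specific paper.

```python
# Hedged sketch: formatting a rationale-annotated example for fully
# supervised finetuning. Field names are illustrative, not taken from
# any particular paper's code.

def format_example(question, choices, rationale, answer):
    """Build an (input, target) pair where the target contains the
    rationale followed by the final answer, so the model is trained
    to generate its reasoning before answering."""
    source = f"question: {question} choices: {', '.join(choices)}"
    target = f"rationale: {rationale} answer: {answer}"
    return source, target

src, tgt = format_example(
    "Where would you find a mailbox?",
    ["street", "ocean", "sky"],
    "Mailboxes are installed along streets for postal delivery.",
    "street",
)
```

A seq2seq model finetuned on such pairs learns to emit the rationale and answer jointly, which is one reason these approaches tend to be dataset-specific.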

Nazneen Fatema Rajani, Bryan McCann, Caiming Xiong, Richard Socher

Alon Talmor, Oyvind Tafjord, Peter Clark, Yoav Goldberg, Jonathan Berant

Dan Hendrycks, Collin Burns, Saurav Kadavath, Akul Arora, Steven Basart, Eric Tang, Dawn Song, Jacob Steinhardt

Maxwell Nye, Anders Johan Andreassen, Guy Gur-Ari, Henryk Michalewski, Jacob Austin, David Bieber, David Dohan, Aitor Lewkowycz, Maarten Bosma, David Luan, Charles Sutton, Augustus Odena

Soumya Sanyal, Harman Singh, Xiang Ren

......

Prompting and In-Context Learning

Chain of Thought Prompting and Its Variants/Applications

Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, Denny Zhou

Boshi Wang, Xiang Deng, Huan Sun

Takeshi Kojima, Shixiang Shane Gu, Machel Reid, Yutaka Matsuo, Yusuke Iwasawa

Ben Prystawski, Paul Thibodeau, Noah Goodman

Freda Shi, Mirac Suzgun, Markus Freitag, Xuezhi Wang, Suraj Srivats, Soroush Vosoughi, Hyung Won Chung, Yi Tay, Sebastian Ruder, Denny Zhou, Dipanjan Das, Jason Wei

Wenhu Chen

Aman Madaan, Shuyan Zhou, Uri Alon, Yiming Yang, Graham Neubig

Luyu Gao*, Aman Madaan*, Shuyan Zhou*, Uri Alon, Pengfei Liu, Yiming Yang, Jamie Callan, Graham Neubig

Wenhu Chen, Xueguang Ma, Xinyi Wang, William W. Cohen

Hangfeng He, Hongming Zhang, Dan Roth

Rationale Engineering

Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, Christopher Hesse, John Schulman

Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Sharan Narang, Aakanksha Chowdhery, Denny Zhou

Yifei Li, Zeqi Lin, Shizhuo Zhang, Qiang Fu, Bei Chen, Jian-Guang Lou, Weizhu Chen

Yao Fu, Hao Peng, Ashish Sabharwal, Peter Clark, Tushar Khot

Zhuosheng Zhang, Aston Zhang, Mu Li, Alex Smola

Hattie Zhou, Azade Nova, Hugo Larochelle, Aaron Courville, Behnam Neyshabur, Hanie Sedghi

Yixuan Weng, Minjun Zhu, Shizhu He, Kang Liu, Jun Zhao

Problem Decomposition

Denny Zhou, Nathanael Schärli, Le Hou, Jason Wei, Nathan Scales, Xuezhi Wang, Dale Schuurmans, Claire Cui, Olivier Bousquet, Quoc Le, Ed Chi

Andrew Drozdov, Nathanael Schärli, Ekin Akyürek, Nathan Scales, Xinying Song, Xinyun Chen, Olivier Bousquet, Denny Zhou

Tushar Khot, Harsh Trivedi, Matthew Finlayson, Yao Fu, Kyle Richardson, Peter Clark, Ashish Sabharwal

Ofir Press, Muru Zhang, Sewon Min, Ludwig Schmidt, Noah A. Smith, Mike Lewis

Dheeru Dua, Shivanshu Gupta, Sameer Singh, Matt Gardner

Yunhu Ye, Binyuan Hui, Min Yang, Binhua Li, Fei Huang, Yongbin Li

Others

Wenlong Huang, Pieter Abbeel, Deepak Pathak, Igor Mordatch

Antonia Creswell, Murray Shanahan, Irina Higgins

Jaehun Jung, Lianhui Qin, Sean Welleck, Faeze Brahman, Chandra Bhagavatula, Ronan Le Bras, Yejin Choi

Antonia Creswell, Murray Shanahan

Pan Lu, Swaroop Mishra, Tony Xia, Liang Qiu, Kai-Wei Chang, Song-Chun Zhu, Oyvind Tafjord, Peter Clark, Ashwin Kalyan

Shiyang Li, Jianshu Chen, Yelong Shen, Zhiyu Chen, Xinlu Zhang, Zekun Li, Hong Wang, Jing Qian, Baolin Peng, Yi Mao, Wenhu Chen, Xifeng Yan

Kumar Shridhar, Alessandro Stolfo, Mrinmaya Sachan

Lucie Charlotte Magister, Jonathan Mallinson, Jakub Adamek, Eric Malmi, Aliaksei Severyn

Seyed Mehran Kazemi, Najoung Kim, Deepti Bhatia, Xin Xu, Deepak Ramachandran

Shibo Hao, Yi Gu, Haodi Ma, Joshua Jiahua Hong, Zhen Wang, Daisy Zhe Wang, Zhiting Hu

Hybrid Method

Reasoning-Enhanced Training and Prompting

Xinyu Pi, Qian Liu, Bei Chen, Morteza Ziyadi, Zeqi Lin, Qiang Fu, Yan Gao, Jian-Guang Lou, Weizhu Chen

Aitor Lewkowycz, Anders Andreassen, David Dohan, Ethan Dyer, Henryk Michalewski, Vinay Ramasesh, Ambrose Slone, Cem Anil, Imanol Schlag, Theo Gutman-Solo, Yuhuai Wu, Behnam Neyshabur, Guy Gur-Ari, Vedant Misra

Cem Anil, Yuhuai Wu, Anders Andreassen, Aitor Lewkowycz, Vedant Misra, Vinay Ramasesh, Ambrose Slone, Guy Gur-Ari, Ethan Dyer, Behnam Neyshabur

Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Yunxuan Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Alex Castro-Ros, Marie Pellat, Kevin Robinson, Dasha Valter, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, Jason Wei

Ross Taylor, Marcin Kardas, Guillem Cucurull, Thomas Scialom, Anthony Hartshorn, Elvis Saravia, Andrew Poulton, Viktor Kerkez, Robert Stojnic

Ping Yu, Tianlu Wang, Olga Golovneva, Badr Alkhamissy, Gargi Ghosh, Mona Diab, Asli Celikyilmaz

Bootstrapping and Self-Improving

Eric Zelikman, Yuhuai Wu, Jesse Mu, Noah D. Goodman

Patrick Haluptzok, Matthew Bowers, Adam Tauman Kalai

Jiaxin Huang, Shixiang Shane Gu, Le Hou, Yuexin Wu, Xuezhi Wang, Hongkun Yu, Jiawei Han

Evaluation and Analysis

Arkil Patel, Satwik Bhattamishra, Navin Goyal

Yasaman Razeghi, Robert L. Logan IV, Matt Gardner, Sameer Singh

Jie Huang, Hanyin Shao, Kevin Chen-Chuan Chang

Karthik Valmeekam, Alberto Olmo, Sarath Sreedharan, Subbarao Kambhampati

Cem Anil, Yuhuai Wu, Anders Andreassen, Aitor Lewkowycz, Vedant Misra, Vinay Ramasesh, Ambrose Slone, Guy Gur-Ari, Ethan Dyer, Behnam Neyshabur

Ishita Dasgupta, Andrew K. Lampinen, Stephanie C. Y. Chan, Antonia Creswell, Dharshan Kumaran, James L. McClelland, Felix Hill

Simeng Han, Hailey Schoelkopf, Yilun Zhao, Zhenting Qi, Martin Riddell, Luke Benson, Lucy Sun, Ekaterina Zubova, Yujie Qiao, Matthew Burtell, David Peng, Jonathan Fan, Yixin Liu, Brian Wong, Malcolm Sailor, Ansong Ni, Linyong Nan, Jungo Kasai, Tao Yu, Rui Zhang, Shafiq Joty, Alexander R. Fabbri, Wojciech Kryscinski, Xi Victoria Lin, Caiming Xiong, Dragomir Radev

Abulhair Saparov, He He

Mirac Suzgun, Nathan Scales, Nathanael Schärli, Sebastian Gehrmann, Yi Tay, Hyung Won Chung, Aakanksha Chowdhery, Quoc V. Le, Ed H. Chi, Denny Zhou, Jason Wei

Laura Ruis, Akbir Khan, Stella Biderman, Sara Hooker, Tim Rocktäschel, Edward Grefenstette

Olga Golovneva, Moya Chen, Spencer Poff, Martin Corredor, Luke Zettlemoyer, Maryam Fazel-Zarandi, Asli Celikyilmaz

Boshi Wang, Sewon Min, Xiang Deng, Jiaming Shen, You Wu, Luke Zettlemoyer, Huan Sun

Citation

If you find this repo useful, please kindly cite our survey:

@article{huang2022towards,
  title={Towards Reasoning in Large Language Models: A Survey},
  author={Huang, Jie and Chang, Kevin Chen-Chuan},
  journal={arXiv preprint arXiv:2212.10403},
  year={2022}
}

lm-reasoning's People

Contributors

ber666, haluptzok, huybery, jeffhj, shuyanzhou, siviltaram, zxlzr


lm-reasoning's Issues

Request to add paper

Hi,
great work/repo!

Please consider adding our work on deductive/logical reasoning.

FaiRR: Faithful and Robust Deductive Reasoning over Natural Language, ACL 2022 (arXiv: 19 Mar 2022)
paper link
Soumya Sanyal, Harman Singh, Xiang Ren

A request to add new papers on logical reasoning data augmentation, prompt augmentation, and evaluation

Hi Jie,

Here are our new papers on logical reasoning data augmentation, prompt augmentation, and evaluation. Please consider adding them to your arXiv paper. Thanks a lot.

Logic-Driven Data Augmentation and Prompt Augmentation

We present an AMR-based logic-driven data augmentation method for contrastive learning that improves the logical reasoning performance of discriminative language models. We also use the same AMR-based augmentation to augment prompts, which helped GPT-4 achieve #1 on the ReClor leaderboard (one of the hardest logical reasoning reading comprehension datasets, collected from LSAT and GMAT questions). Our method also outperforms baseline models on a range of logical reasoning reading comprehension and natural language inference tasks. Details are given in the paper below.

Our paper (Qiming Bao, Alex Yuxuan Peng, Zhenyun Deng, Wanjun Zhong, Neset Tan, Nathan Young, Yang Chen, Yonghua Zhu, Michael Witbrock, Jiamou Liu)
"Enhancing Logical Reasoning of Large Language Models through Logic-Driven Data Augmentation" [Paper link] [Source code] [Model weights] [Leaderboard].
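The core augmentation idea above can be illustrated at the string level: from an implication, derive a logically equivalent positive (the contrapositive) and a contradictory hard negative for contrastive training. This is a toy sketch under that assumption, not the paper's actual AMR-based pipeline; `augment_implication` is a hypothetical helper.

```python
# Hedged toy sketch of logic-driven augmentation for contrastive
# learning: given "if P then Q", the contrapositive is logically
# equivalent (positive pair) and the negated conclusion contradicts
# it (hard negative). Not the paper's AMR-based implementation.

def augment_implication(p, q):
    anchor = f"If {p}, then {q}."
    # Contrapositive: equivalent to the anchor, used as a positive.
    positive = f"If it is not the case that {q}, then it is not the case that {p}."
    # Negated conclusion: contradicts the anchor, used as a hard negative.
    negative = f"If {p}, then it is not the case that {q}."
    return {"anchor": anchor, "positive": positive, "negative": negative}

pair = augment_implication("it rains", "the ground gets wet")
```

A contrastive objective would then pull the anchor and positive embeddings together while pushing the negative away.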

Out-of-Distribution Logical Reasoning Evaluation and Prompt Augmentation for Enhancing OOD Logical Reasoning

We present a systematic out-of-distribution evaluation of logical reasoning tasks. We introduce three new, more robust logical reasoning datasets, ReClor-Plus, LogiQA-Plus, and LogiQAv2-Plus, constructed from ReClor, LogiQA, and LogiQAv2 by changing the order and form of the answer options. We found that simply using chain-of-thought prompting does not improve models' performance in the out-of-distribution scenario, while using our AMR-based logic-driven data augmentation to augment the prompt does improve large language models' performance on out-of-distribution logical reasoning tasks. The three datasets have been included in OpenAI/Evals.
"A Systematic Evaluation of Large Language Models on Out-of-Distribution Logical Reasoning Tasks" [Paper link] [Source code] [Dataset links].
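The option-order perturbation described above can be sketched as a small data transformation: shuffle the answer options of a multiple-choice example and remap the gold label. This is a minimal sketch of the general idea, not the datasets' actual construction code; `permute_options` is a hypothetical helper.

```python
# Hedged sketch: building an option-order OOD variant of a
# multiple-choice example (shuffle the options, remap the gold label).
# Illustrative only; not the ReClor-Plus construction code.
import random

def permute_options(options, gold_index, seed=0):
    rng = random.Random(seed)  # seeded for reproducibility
    order = list(range(len(options)))
    rng.shuffle(order)
    new_options = [options[i] for i in order]
    new_gold = order.index(gold_index)  # where the gold answer moved to
    return new_options, new_gold

opts = ["always", "never", "sometimes", "often"]
new_opts, new_gold = permute_options(opts, gold_index=2, seed=0)
```

A model that relies on superficial positional cues rather than the content of the options will see its accuracy drop on such permuted variants.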

An Empirical Study on Out-of-Distribution Multi-Step Logical Reasoning

We find that pre-trained language models are not good at robust multi-step logical reasoning tasks, and one of the main reasons is the limited amount of training data for deeper multi-step logical reasoning. Therefore, we present a deeper, larger multi-step logical reasoning dataset named PARARULE-Plus. The dataset has also been included in OpenAI/Evals.
"Multi-Step Deductive Reasoning Over Natural Language: An Empirical Study on Out-of-Distribution Generalisation" [Paper link] [Source code] [Dataset links].

In case you were planning to expand on inductive reasoning

Thanks for synthesizing such a fast-growing list of papers on LLMs and reasoning! I also appreciate you writing about reasoning types that go beyond deductive!

I have a few papers that touch on inductive reasoning in humans and models, in case you'd like to expand on that topic in the survey. One disclaimer: these only deal with what you consider to be small LMs (though the methods are model-agnostic).

Misra, 2022 (AAAI Doctoral Consortium 2022): On Semantic Cognition, Inductive Generalization, and Language Models
https://ojs.aaai.org/index.php/AAAI/article/view/21584

Misra et al., 2022 (CogSci 2022): A Property Induction Framework For Neural Language Models:
https://arxiv.org/abs/2205.06910

Misra et al., 2021 (CogSci 2021): Do language models learn typicality judgments from text? (exp 2 is the first analysis of LMs on Inductive Reasoning)
https://arxiv.org/abs/2105.02987

Other papers that should be included in case you do decide to pursue this route:

Han et al., 2022 (CogSci 2022): Human-like property induction is a challenge for large language models
https://psyarxiv.com/6mkjy/

Yang et al., 2022: Language Models as Inductive Reasoners
https://arxiv.org/abs/2212.10923

Request to add a new survey

Hi, thanks for your contributions to collating large language model reasoning papers!
Recently, we released a survey on natural language reasoning, approached mainly from another perspective: the reasoning paradigm (end-to-end, forward, and backward).

Here are our survey and repository:
Natural Language Reasoning, A Survey
https://arxiv.org/pdf/2303.14725.pdf
https://github.com/FreedomIntelligence/ReasoningNLP

I believe our surveys and repositories complement each other in helping people better understand reasoning!

Paper addition request

Hi, thanks for the great work! I wanted to point to this paper about using LMs to perform reasoning over knowledge graphs for the explainable recommendation task: Faithful Path Language Modelling for Explainable Recommendation over Knowledge Graph
https://arxiv.org/abs/2310.16452

Request to add a paper.

Great work!

Could you please add our paper:

Search-in-the-Chain: Towards Accurate, Credible and Traceable Large Language Models for Knowledge-intensive Tasks
In this paper, we propose a novel framework to combine the reasoning of LLMs with a search engine.
paper link
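The general pattern of interleaving model reasoning with retrieval can be sketched as a loop that alternates between querying a search engine and extending the reasoning chain. The sketch below is a stand-in for that pattern, not the paper's actual Search-in-the-Chain implementation: `toy_search` is a word-overlap retriever substituting for a real search engine, and the next query is taken from the retrieved evidence where a real system would let the LLM propose it.

```python
# Hedged sketch of interleaving reasoning with search (the general
# retrieve-as-you-reason pattern). `toy_search` is a toy word-overlap
# retriever standing in for a real search engine.

def toy_search(query, corpus):
    """Return the corpus sentence sharing the most words with the query."""
    def overlap(sentence):
        return len(set(query.lower().split()) & set(sentence.lower().split()))
    return max(corpus, key=overlap)

def reason_with_search(question, corpus, steps=2):
    """At each step, retrieve evidence for the current query and append
    it to a growing chain. A real system would have the LLM generate the
    next sub-query and verify each step against the retrieved evidence."""
    chain = []
    query = question
    for _ in range(steps):
        candidates = [s for s in corpus if s not in chain]  # avoid repeats
        evidence = toy_search(query, candidates)
        chain.append(evidence)
        query = evidence  # stand-in for an LLM-generated follow-up query
    return chain

corpus = [
    "Hamlet was written by William Shakespeare.",
    "William Shakespeare was born in Stratford-upon-Avon.",
    "Paris is the capital of France.",
]
chain = reason_with_search("Who wrote Hamlet", corpus)
```

Each hop retrieves a new piece of evidence grounded in the previous one, which is the basic mechanism such frameworks build on for multi-hop, knowledge-intensive questions.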
