knarayan / awesome-tabular-llms

This project forked from spursgozmy/awesome-tabular-llms

We collect papers about "large language models (LLMs) for table-related tasks", e.g., using an LLM for the Table QA task. A curated list of "Table + LLM" papers.

awesome-tabular-llms's Introduction

A-Paper-List-of-Awesome-Tabular-LLMs

Different types of tables are widely used to store and present information. To automatically process numerous tables and gain valuable insights, researchers have proposed a series of deep-learning models for various table-based tasks, e.g., table question answering (TQA), table-to-text (T2T), text-to-SQL (NL2SQL) and table fact verification (TFV). Recently, the emerging Large Language Models (LLMs) and more powerful Multimodal Large Language Models (MLLMs) have opened up new possibilities for processing tabular data, i.e., one general model can process diverse tables and fulfill different tabular tasks based on the user's natural language instructions. We refer to these LLMs specialized for tabular tasks as Tabular LLMs. In this repository, we collect a list of papers on recent Tabular (M)LLMs and divide them into the following categories based on their key ideas.


Table of Contents:

  1. Survey of Tabular LLMs and table understanding
  2. Prompting LLMs for different tabular tasks, e.g., in-context learning, prompt engineering and integrating external tools.
  3. Training LLMs for better table understanding ability, e.g., training existing LLMs by instruction fine-tuning or post-pretraining.
  4. Developing agents for processing tabular data, e.g., developing a copilot for processing Excel tables.
  5. Empirical study or benchmarks for evaluating LLMs' table understanding ability, e.g., exploring the influence of various table types or table formats.
  6. Multimodal table understanding, e.g., training MLLMs to understand diverse table images and textual user requests.

Task Names and Abbreviations:

Task Names Abbreviations Task Descriptions
Table Question Answering TQA Answering questions based on the table(s), e.g., answering look-up or computation questions about the table(s).
Table-to-Text Table2Text or T2T Generating a text based on the table(s), e.g., generating an analysis report given a financial statement.
Text-to-Table Text2Table Generating structured tables based on input text, e.g., generating a statistical table based on a game summary.
Table Fact Verification TFV Judging whether a statement is true or false (or lacks enough evidence) based on the table(s).
Text-to-SQL NL2SQL Generating a SQL statement to answer the user question based on the database schema.
Tabular Mathematical Reasoning TMR Solving mathematical reasoning problems based on the table(s), e.g., solving math word problems related to a table.
Table-and-Text Question Answering TAT-QA Answering questions based on both table(s) and their related texts, e.g., answering questions given Wikipedia tables and their surrounding texts.
Table Interpretation TI Interpreting basic table content and structure information, e.g., column type annotation, entity linking, relation extraction, cell type classification, etc.
Table Augmentation TA Augmenting existing tables with new data, e.g., schema augmentation, row population, etc.
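
To make the difference between some of these tasks concrete, here is a minimal sketch of what the inputs and outputs of NL2SQL, TQA and TFV could look like on a toy table. The table, the question, the statement and the hand-written SQL are all invented for illustration; in practice an LLM would produce the SQL, the answer or the label, and none of this is taken from a specific paper in the list.

```python
import sqlite3

# Toy table; the schema and rows are invented for this illustration.
rows = [
    ("Alice", "Sales", 120),
    ("Bob", "Sales", 80),
    ("Carol", "Engineering", 150),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (name TEXT, dept TEXT, revenue INTEGER)")
conn.executemany("INSERT INTO staff VALUES (?, ?, ?)", rows)


def run_sql(query):
    """Execute a query against the toy table and return all result rows."""
    return conn.execute(query).fetchall()


# NL2SQL: the expected output is a SQL statement (hand-written here,
# standing in for what a model would generate from the question and schema).
question = "What is the total revenue of the Sales department?"
predicted_sql = "SELECT SUM(revenue) FROM staff WHERE dept = 'Sales'"

# TQA: the expected output is the answer itself; here it is obtained
# by executing the SQL above.
tqa_answer = run_sql(predicted_sql)[0][0]

# TFV: the expected output is a true/false label for a statement
# about the table.
statement = "Carol has the highest revenue."
top_name = run_sql("SELECT name FROM staff ORDER BY revenue DESC LIMIT 1")[0][0]
tfv_label = top_name == "Carol"

print(tqa_answer)  # 200
print(tfv_label)   # True
```

The same table serves all three tasks; only the form of the expected output changes (a query, an answer, or a label).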

1. Survey of Tabular LLMs and Table Understanding

Title Conference Date Pages
Large Language Model for Table Processing: A Survey arxiv 2024-02-04 9
A Survey of Table Reasoning with Large Language Models arxiv 2024-02-13 9
Large Language Models(LLMs) on Tabular Data: Prediction, Generation, and Understanding -- A Survey arxiv 2024-03-01 41
Transformers for Tabular Data Representation: A Survey of Models and Applications TACL 2023 23
Table Pre-training: A Survey on Model Architectures, Pre-training Objectives, and Downstream Tasks IJCAI 2022 2022-01-24 15

2. Prompting LLMs for Different Tabular Tasks

Title Conference Date Task Code
Enhancing Temporal Understanding in LLMs for Semi-structured Tables arxiv 2024-07-22 Temporal TQA
ALTER: Augmentation for Large-Table-Based Reasoning arxiv 2024-07-03 TQA Github
TrustUQA: A Trustful Framework for Unified Structured Data Question Answering arxiv 2024-06-27 TQA
Adapting Knowledge for Few-shot Table-to-Text Generation arxiv 2024-03-27 T2T
Graph Reasoning Enhanced Language Models for Text-to-SQL SIGIR 2024 NL2SQL
NormTab: Improving Symbolic Reasoning in LLMs Through Tabular Data Normalization arxiv 2024-06-25 TQA,TFV
Improving Factual Accuracy of Neural Table-to-Text Output by Addressing Input Problems in ToTTo NAACL 2024 2024-04-05 T2T
TabSQLify: Enhancing Reasoning Capabilities of LLMs Through Table Decomposition NAACL 2024 TQA,TFV
E5: Zero-shot Hierarchical Table Analysis using Augmented LLMs via Explain, Extract, Execute, Exhibit and Extrapolate NAACL 2024 TQA on hierarchical tables Github
OpenTE: Open-Structure Table Extraction From Text ICASSP 2024 Text-to-Table Extraction
On Linearizing Structured Data in Encoder-Decoder Language Models: Insights from Text-to-SQL NAACL 2024 2024-04-03 NL2SQL
MFORT-QA: Multi-hop Few-shot Open Rich Table Question Answering arxiv 2024-03-28 TQA
OpenTab: Advancing Large Language Models as Open-domain Table Reasoners ICLR 2024 2024-02-22 TQA,TFV Github
CABINET: Content Relevance based Noise Reduction for Table Question Answering ICLR 2024 2024-02-02 TQA
Augment before You Try: Knowledge-Enhanced Table Question Answering via Table Expansion arxiv 2024-01-24 TQA Github
Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding ICLR 2024 2024-01-09 TQA,TFV
TAP4LLM: Table Provider on Sampling, Augmenting, and Packing Semi-structured Data for Large Language Model Reasoning arxiv 2023-12-14 TQA,TAT-QA,TFV,T2T Github
Large Language Models are Complex Table Parsers EMNLP 2023 2023-12-13 TQA
API-Assisted Code Generation for Question Answering on Varied Table Structures EMNLP 2023 2023-10-23 TQA
TableQAKit: A Comprehensive and Practical Toolkit for Table-based Question Answering arxiv 2023-10-23 TQA,NL2SQL Github
Enhancing Few-shot Text-to-SQL Capabilities of Large Language Models: A Study on Prompt Design Strategies arxiv 2023-05-21 NL2SQL
StructGPT: A General Framework for Large Language Model to Reason over Structured Data EMNLP 2023 2023-05-16 TQA, TFV Github
Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models NeurIPS 2023 2023-04-19 TMR Github
Generate, Transform, Answer: Question Specific Tool Synthesis for Tabular Data EMNLP 2023 2023-03-17 TQA,NL2SQL
DTT: An Example-Driven Tabular Transformer for Joinability by Leveraging Large Language Models SIGMOD 2024 2023-03-12 Table Transformation
Large Language Models are Versatile Decomposers: Decompose Evidence and Questions for Table-based Reasoning SIGIR 2023 2023-01-13 TQA, TFV Github
Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks TMLR 2023 2022-11-22 TMR, TAT-QA Github
Large Language Models are few(1)-shot Table Reasoners EACL 2023 Findings 2022-10-13 TQA, TFV Github
Binding Language Models in Symbolic Languages ICLR 2023 2022-10-06 TQA, TFV Github
Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning ICLR 2023 2022-09-29 TMR Github

3. Training LLMs for Better Table Understanding Ability

Title Conference Date Task LLM Backbone Code
SpreadsheetLLM: Encoding Spreadsheets for Large Language Models arxiv 2024-07-12 Excel Manipulation
Unleashing the Potential of Large Language Models for Predictive Tabular Tasks in Data Science arxiv 2024-03-29 Predictive Tabular Tasks Llama2 7B HuggingFace
HGT: Leveraging Heterogeneous Graph-enhanced Large Language Models for Few-shot Complex Table Understanding arxiv 2024-03-28 TI,TQA Vicuna-1.5 7B
TableLLM: Enabling Tabular Data Manipulation by LLMs in Real Office Usage Scenarios arxiv 2024-03-28 Table Manipulation CodeLlama 7B, 13B Github
StructLM: Towards Building Generalist Models for Structured Knowledge Grounding CoLM 2024 2024-02-26 TQA,TFV,T2T,NL2SQL CodeLlama 7B-34B Github
TAT-LLM: A Specialized Language Model for Discrete Reasoning over Tabular and Textual Data arxiv 2024-01-24 TQA Llama2 7B, 13B, 70B Github
TableLlama: Towards Open Large Generalist Models for Tables NAACL 2024 2023-11-15 TQA,TFV,T2T,TA,TI Llama2 7B Github
HELLaMA: LLaMA-based Table to Text Generation by Highlighting the Important Evidence arxiv 2023-11-15 T2T Llama2 7B-13B
Table-GPT: Table-tuned GPT for Diverse Table Tasks arxiv 2023-10-13 TQA GPT-3.5, ChatGPT

Pre-trained Tabular Language Models (non-LLM)

Title Conference Date Task Code
HYTREL: Hypergraph-enhanced Tabular Data Representation Learning NeurIPS 2023 2023-07-14 TA, TI Github
FLAME: A small language model for spreadsheet formulas AAAI 2024 2023-01-31 Generating Excel Formulas Github

4. Developing Agents for Processing Tabular Data

Title Conference Date Task Code
SheetAgent: A Generalist Agent for Spreadsheet Reasoning and Manipulation via Large Language Models arxiv 2024-03-06 Manipulating Excel spreadsheets with LLMs Github
EHRAgent: Code Empowers Large Language Models for Few-shot Complex Tabular Reasoning on Electronic Health Records arxiv 2024-01-13 TQA Github
InfiAgent-DABench: Evaluating Agents on Data Analysis Tasks arxiv 2024-01-10 Data Analysis Github
DB-GPT: Empowering Database Interactions with Private Large Language Models arxiv 2023-12-29 Data Analysis Github
ReAcTable: Enhancing ReAct for Table Question Answering arxiv 2023-10-01 TQA
SheetCopilot: Bringing Software Productivity to the Next Level through Large Language Models NeurIPS 2023 2023-05-30 Manipulating Excel spreadsheets with LLMs Github
TableGPT: Towards Unifying Tables, Nature Language and Commands into One GPT arxiv 2023-07-17 Manipulating CSV tables with LLMs

5. Empirical Study or Benchmarks for Evaluating LLMs' Table Understanding Ability

Title Conference Date Task Code
Rethinking Tabular Data Understanding with Large Language Models NAACL 2024 2023-12-27 TQA
On the Robustness of Language Models for Tabular Question Answering arxiv 2024-06-18 TQA
FREB-TQA: A Fine-Grained Robustness Evaluation Benchmark for Table Question Answering NAACL 2024 2024-04-29 TQA
How Robust are the Tabular QA Models for Scientific Tables? A Study using Customized Dataset arxiv 2024-03-20 TQA
InstructExcel: A Benchmark for Natural Language Instruction in Excel Findings of EMNLP 2023 2023-10-23 Excel operations Github
Tabular Representation, Noisy Operators, and Impacts on Table Structure Understanding Tasks in LLMs arxiv 2023-10-16 Fact-Finding Tasks, Transformation Tasks
Investigating Table-to-Text Generation Capabilities of LLMs in Real-World Information Seeking Scenarios EMNLP 2023 2023-05-24 T2T Github
TABLET: Learning From Instructions For Tabular Data arxiv 2023-04-25 Github
Table Meets LLM: Can Large Language Models Understand Structured Table Data? A Benchmark and Empirical Study WSDM 2024 2023-05-22 TQA,TFV,T2T
Evaluating the Text-to-SQL Capabilities of Large Language Models arxiv 2022-03-15 NL2SQL
A comprehensive evaluation of ChatGPT's zero-shot Text-to-SQL capability arxiv 2023-03-12 NL2SQL Github
RobuT: A Systematic Study of Table QA Robustness Against Human-Annotated Adversarial Perturbations ACL 2023 2023-06-25 TQA Github

6. Multimodal Table Understanding

Title Conference Date Task Code
TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy arxiv 2024-06-03 TQA,TI
TableVQA-Bench: A Visual Question Answering Benchmark on Multiple Table Domains arxiv 2024-04-30 TQA, TFV Github
Tables as Texts or Images: Evaluating the Table Reasoning Ability of LLMs and MLLMs ACL 2024 2024-02-19 TQA,TFV,T2T
Multimodal Table Understanding ACL 2024 2024-02-15 TQA, TFV, T2T, TI, TAT-QA, TMR Github

Contributors: spursgozmy, 01warpdrive
