
LLM-and-VLM-Paper-List

A paper list about large language models and multi-modal models.
Note: this list only records papers for my personal needs. You are welcome to open an issue if you think I missed some important or exciting work!

Table of Contents

Survey

  • LVLM Attack Survey: A Survey of Attacks on Large Vision-Language Models: Resources, Advances, and Future Trends. arXiv'2024. paper, github
  • HELM: Holistic Evaluation of Language Models. TMLR. paper
  • HEIM: Holistic Evaluation of Text-to-Image Models. NeurIPS'2023. paper
  • Eval Survey: A Survey on Evaluation of Large Language Models. arXiv'2023. paper
  • Healthcare LM Survey: A Survey of Large Language Models for Healthcare: from Data, Technology, and Applications to Accountability and Ethics. arXiv'2023. paper, github
  • Multimodal LLM Survey: A Survey on Multimodal Large Language Model. arXiv'2023. paper, github
  • VLM for Vision Tasks Survey: Vision Language Models for Vision Tasks: A Survey. arXiv'2023. paper, github
  • Efficient LLM Survey: Efficient Large Language Models: A Survey. arXiv'2023. paper, github
  • Prompt Engineering Survey: Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing. arXiv'2021. paper
  • Multimodal Safety Survey: Safety of Multimodal Large Language Models on Images and Text. arXiv'2024. paper
  • Multimodal LLM Recent Survey: MM-LLMs: Recent Advances in MultiModal Large Language Models. arXiv'2024. paper
  • Prompt Engineering in LLM Survey: A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications. arXiv'2024. paper
  • LLM Security and Privacy Survey: A Survey on Large Language Model (LLM) Security and Privacy: The Good, the Bad, and the Ugly. arXiv'2024. paper
  • LLM Privacy Survey: Privacy in Large Language Models: Attacks, Defenses and Future Directions. arXiv'2023. paper

Language Model

Foundation LM Models

  • Transformer: Attention Is All You Need. NIPS'2017. paper
  • GPT-1: Improving Language Understanding by Generative Pre-Training. 2018. paper
  • BERT: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NAACL'2019. paper
  • GPT-2: Language Models are Unsupervised Multitask Learners. 2019. paper
  • RoBERTa: RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv'2019. paper
  • DistilBERT: DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv'2019. paper
  • T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. JMLR'2020. paper
  • GPT-3: Language Models are Few-Shot Learners. NeurIPS'2020. paper
  • GLaM: GLaM: Efficient Scaling of Language Models with Mixture-of-Experts. ICML'2022. paper
  • PaLM: PaLM: Scaling Language Modeling with Pathways. arXiv'2022. paper
  • BLOOM: BLOOM: A 176B-Parameter Open-Access Multilingual Language Model. arXiv'2022. paper
  • BLOOMZ: Crosslingual Generalization through Multitask Finetuning. arXiv'2023. paper
  • LLaMA: LLaMA: Open and Efficient Foundation Language Models. arXiv'2023. paper
  • GPT-4: GPT-4 Technical Report. arXiv'2023. paper
  • PaLM 2: PaLM 2 Technical Report. 2023. paper
  • LLaMA 2: Llama 2: Open Foundation and Fine-Tuned Chat Models. arXiv'2023. paper
  • Mistral: Mistral 7B. arXiv'2023. paper
  • Phi-1: Project Link
  • Phi-1.5: Project Link
  • Phi-2: Project Link
  • Falcon: Project Link

RLHF

  • PPO: Proximal Policy Optimization Algorithms. arXiv'2017. paper
  • DPO: Direct Preference Optimization: Your Language Model is Secretly a Reward Model. NeurIPS'2023. paper
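
The core of DPO is a loss over preference pairs that uses the log-ratio between policy and frozen reference likelihoods as an implicit reward, avoiding a separate reward model. A minimal numeric sketch in plain Python (the log-probability values are illustrative; `beta` is the KL-strength hyperparameter from the paper):

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair.

    Each argument is the summed log-likelihood of a response under the
    trainable policy (logp_*) or the frozen reference model (ref_logp_*).
    """
    # Implicit reward: beta * log(pi(y|x) / pi_ref(y|x))
    chosen_reward = beta * (logp_chosen - ref_logp_chosen)
    rejected_reward = beta * (logp_rejected - ref_logp_rejected)
    # Negative log-sigmoid of the reward margin
    margin = chosen_reward - rejected_reward
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# The loss shrinks as the policy prefers the chosen response more
# strongly than the reference model does.
loss = dpo_loss(-10.0, -12.0, -11.0, -11.5, beta=0.1)
```

When policy and reference agree exactly, the margin is zero and the loss sits at log 2, the usual logistic-loss baseline.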

Parameter Efficient Fine-tuning

  • LoRA: LoRA: Low-Rank Adaptation of Large Language Models. arXiv'2021. paper
  • Q-LoRA: QLoRA: Efficient Finetuning of Quantized LLMs. NeurIPS'2023. paper
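
LoRA freezes the pretrained weight W and learns only a low-rank update BA, so the trainable parameter count scales with the rank r rather than with the full weight matrix. A toy NumPy sketch under assumed shapes (all names and dimensions here are illustrative, not from any library):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r = 64, 64, 4             # r << d_in: the low-rank bottleneck
alpha = 8.0                            # scaling hyperparameter

W = rng.normal(size=(d_out, d_in))     # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, r))               # trainable up-projection, zero-init

def lora_forward(x):
    # h = W x + (alpha / r) * B (A x); only A and B receive gradients
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=(d_in,))
# With B zero-initialized, the adapted model matches the base model exactly
# at the start of training, as in the paper.
assert np.allclose(lora_forward(x), W @ x)
```

Here A and B hold 512 parameters against 4096 in W, which is why LoRA checkpoints stay small and can be merged back into W after training.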

Healthcare LM

  • Med-PaLM: Large Language Models Encode Clinical Knowledge. arXiv'2022. paper
  • MedAlpaca: MedAlpaca -- An Open-Source Collection of Medical Conversational AI Models and Training Data. arXiv'2023. paper
  • Med-PaLM 2: Towards Expert-Level Medical Question Answering with Large Language Models. arXiv'2023. paper
  • HuatuoGPT: HuatuoGPT, Towards Taming Language Model to Be a Doctor. EMNLP'2023 (Findings). paper
  • GPT-4-Med: Capabilities of GPT-4 on Medical Challenge Problems. arXiv'2023. paper

Watermarking LLM

Prompt Engineering in LLM

Hard Prompt

  • PET: Exploiting Cloze Questions for Few Shot Text Classification and Natural Language Inference. EACL'2021. paper
  • LM-BFF: Making Pre-trained Language Models Better Few-shot Learners. ACL'2021. paper

Soft Prompt

  • Prompt-Tuning: The Power of Scale for Parameter-Efficient Prompt Tuning. EMNLP'2021. paper
  • Prefix-Tuning: Prefix-Tuning: Optimizing Continuous Prompts for Generation. ACL'2021. paper
  • P-tuning: P-Tuning: Prompt Tuning Can Be Comparable to Fine-tuning Across Scales and Tasks. ACL'2022. paper
  • P-tuning v2: P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks. arXiv'2022. paper
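
What these soft-prompt methods share is that the language model stays frozen and only a handful of continuous embedding vectors, prepended to the input, are trained. A minimal NumPy sketch of that input construction (all names and sizes are illustrative assumptions, not from any specific method's code):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, d_model, k = 100, 16, 5     # k soft-prompt vectors

embedding = rng.normal(size=(vocab, d_model))      # frozen token embeddings
soft_prompt = rng.normal(size=(k, d_model)) * 0.5  # the only trainable part

def build_inputs(token_ids):
    """Prepend the learned soft prompt to the embedded input sequence."""
    tokens = embedding[token_ids]                  # (seq_len, d_model)
    return np.concatenate([soft_prompt, tokens], axis=0)

inputs = build_inputs([3, 17, 42])
# The sequence grows by k positions; the frozen LM consumes it unchanged.
assert inputs.shape == (k + 3, d_model)
```

Prefix-Tuning and P-tuning v2 extend the same idea by injecting trainable vectors at every transformer layer rather than only at the input.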

Between Soft and Hard

  • Auto-Prompt: AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts. EMNLP'2020. paper
  • FluentPrompt: Toward Human Readable Prompt Tuning: Kubrick's The Shining is a good movie, and a good prompt too? EMNLP'2023 (Findings). paper
  • PEZ: Hard Prompts Made Easy: Gradient-Based Discrete Optimization for Prompt Tuning and Discovery. arXiv'2023. paper

Multi-modal Models

Foundation Multi-Modal Models

  • CLIP: Learning Transferable Visual Models From Natural Language Supervision. ICML'2021. paper
  • DeCLIP: Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm. ICLR'2022. paper
  • FILIP: FILIP: Fine-grained Interactive Language-Image Pre-Training. ICLR'2022. paper
  • Stable Diffusion: High-Resolution Image Synthesis with Latent Diffusion Models. CVPR'2022. paper
  • BLIP: BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation. ICML'2022. paper
  • BLIP2: BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models. ICML'2023. paper
  • LLaMA-Adapter: LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention. arXiv'2023. paper
  • LLaVA: Visual Instruction Tuning. NeurIPS'2023. paper
  • LLaVA 1.5: Improved Baselines with Visual Instruction Tuning. CVPR'2024. paper
  • Instruct BLIP: InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning. NeurIPS'2023. paper
  • InternVL 1.0: InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks. CVPR'2024 (Oral). paper
  • InternVL 1.5: How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites. arXiv'2024. report
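
The contrastive objective behind CLIP (and variants like DeCLIP and FILIP above) is a symmetric cross-entropy over the image-text similarity matrix of a batch, with matching pairs on the diagonal. A toy NumPy sketch under simplifying assumptions (precomputed embeddings, no learnable temperature):

```python
import numpy as np

def clip_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature       # (batch, batch) cosine similarities
    labels = np.arange(len(logits))          # matching pairs sit on the diagonal

    def xent(l):
        # Row-wise softmax cross-entropy against the diagonal labels
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # Average the image->text and text->image directions
    return 0.5 * (xent(logits) + xent(logits.T))

# Perfectly aligned pairs (orthogonal embeddings) give near-zero loss;
# shuffling the pairing makes the loss large.
aligned = clip_loss(np.eye(4, 8), np.eye(4, 8))
shuffled = clip_loss(np.eye(4, 8), np.eye(4, 8)[[1, 0, 3, 2]])
```

The low temperature sharpens the softmax, which is why even modest off-diagonal similarity differences produce strong gradients in practice.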

T2I Safety

  • SLD: Safe Latent Diffusion: Mitigating Inappropriate Degeneration in Diffusion Models. CVPR'2023. paper
  • ESD: Erasing Concepts from Diffusion Models. ICCV'2023. paper

LVLM Hallucinations

  • POPE: Evaluating Object Hallucination in Large Vision-Language Models. EMNLP'2023. paper
  • HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models. CVPR'2024. paper

LVLM Adversarial Attack

  • On the Adversarial Robustness of Multi-Modal Foundation Models. ICCV Workshop'2023. paper

LVLM Privacy

Prompt Engineering in VLM


Agent

LLM-based Agent

  • Stanford Town: Generative Agents: Interactive Simulacra of Human Behavior. UIST'2023. paper

VLM-based Agent

  • OSWorld: OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments. arXiv'2024. paper

Useful Resources

Contributors

hongcheng-gao, wu-zongyu
