
A Survey of Backpropagation-free Training for LLMs

This repo contains the paper list for A Survey of Backpropagation-free Training For LLMs.

Abstract

Large language models (LLMs) have achieved remarkable performance on various downstream tasks. However, training LLMs is computationally expensive and requires substantial memory. To address this issue, backpropagation-free (BP-free) training has been proposed as a promising approach to reduce the computational and memory costs of training LLMs. In this survey, we provide a comprehensive overview of BP-free training for LLMs, covering the mainstream BP-free training methods and their optimizations for LLMs. The goal of this survey is to foster a thorough understanding of BP-free training for LLMs and to inspire future research in this area.

Contribute

If we have left out any important papers, please let us know in the Issues and we will include them in the next version.

We will actively maintain both the survey and this GitHub repo.

Table of Contents

BP-free Methods

Perturbed Model

  • Gradients without Backpropagation. [arXiv'22] [Paper]

  • Can Forward Gradient Match Backpropagation? [ICLR'23] [Paper]

  • Scaling Forward Gradient With Local Losses. [ICLR'23] [Paper] [Code]

Forward Gradient

  • Gradients without Backpropagation. [arXiv'22] [Paper]

  • Learning by Directional Gradient Descent. [ICLR'22] [Paper]

  • Optimization without Backpropagation. [arXiv'22] [Paper]

  • Scaling Forward Gradient With Local Losses. [ICLR'23] [Paper] [Code]

  • Can Forward Gradient Match Backpropagation? [ICLR'23] [Paper]

  • Low-variance Forward Gradients using Direct Feedback Alignment and momentum. [arXiv'22] [Paper]

  • How to Guess a Gradient. [arXiv'23] [Paper]
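The papers above share one core estimator: sample a random tangent direction v, obtain the directional derivative ∇f(θ)·v in a single forward (Jacobian-vector product) pass, and use (∇f(θ)·v)·v as an unbiased estimate of the gradient. A minimal NumPy sketch on a toy quadratic objective, where the JVP is available in closed form (the function names and the toy objective are illustrative, not from any listed paper):

```python
import numpy as np

def forward_gradient_step(theta, jvp, lr, rng):
    # Sample a random tangent direction v.
    v = rng.standard_normal(theta.shape)
    # One forward/JVP pass gives the directional derivative; no backward pass.
    d = jvp(theta, v)
    # (grad . v) v is an unbiased estimate of the true gradient.
    return theta - lr * d * v

# Toy objective f(theta) = 0.5 * ||theta||^2, whose JVP is simply theta . v.
jvp = lambda theta, v: float(theta @ v)

rng = np.random.default_rng(0)
theta = np.ones(5)
for _ in range(200):
    theta = forward_gradient_step(theta, jvp, lr=0.05, rng=rng)
# theta's norm shrinks toward 0 as the estimator follows the true gradient on average.
```

In a real network the JVP would come from forward-mode automatic differentiation, which costs roughly one extra forward pass and stores no activations.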

Zeroth-order Optimization

  • Does Federated Learning Really Need Backpropagation? [arXiv'23] [Paper] [Code]

  • Fine-Tuning Language Models with Just Forward Passes. [NeurIPS'23] [Paper] [Code]

  • DPZero: Dimension-Independent and Differentially Private Zeroth-Order Optimization. [arXiv'23] [Paper]

  • DeepZero: Scaling up Zeroth-Order Optimization for Deep Model Training. [ICLR'24] [Paper] [Code]
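Zeroth-order methods in this line of work (e.g. the SPSA-style estimator used by MeZO) need only loss values: perturb the parameters by a random z in both directions, take the finite difference of the two losses, and scale z by it. A rough sketch on a toy loss (all names and the objective are illustrative):

```python
import numpy as np

def spsa_grad(f, theta, eps=1e-3, rng=None):
    # Two forward passes with an antithetic perturbation z: no backward pass,
    # and only scalar loss values are needed from the model.
    rng = rng or np.random.default_rng(0)
    z = rng.standard_normal(theta.shape)
    return (f(theta + eps * z) - f(theta - eps * z)) / (2 * eps) * z

f = lambda th: 0.5 * float(th @ th)   # toy loss standing in for a model's loss
rng = np.random.default_rng(1)
theta = np.ones(4)
for _ in range(300):
    theta -= 0.05 * spsa_grad(f, theta, rng=rng)
```

The estimator's variance grows with dimension, which is why the papers above focus on variance reduction and on why pretrained LLM loss landscapes make such estimates usable.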

Evolution Strategy

  • Black-Box Tuning for Language-Model-as-a-Service. [ICML'22] [Paper] [Code]

  • BBTv2: Towards a Gradient-Free Future with Large Language Models. [EMNLP'22] [Paper] [Code]

  • Unbiased Gradient Estimation in Unrolled Computation Graphs with Persistent Evolution Strategies. [ICML'21] [Paper]

  • Low-Variance Gradient Estimation in Unrolled Computation Graphs with ES-Single. [arXiv'23] [Paper]
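Evolution strategies replace the single perturbation of zeroth-order methods with a population: score many random perturbations by their loss and combine them into a search-gradient estimate. A sketch of a vanilla ES estimator with baseline subtraction on a toy objective (names and hyperparameters are illustrative):

```python
import numpy as np

def es_gradient(f, theta, sigma=0.1, pop=40, rng=None):
    # Score a population of random perturbations by their loss and combine
    # them into a search-gradient estimate (forward passes only).
    rng = rng or np.random.default_rng(0)
    eps = rng.standard_normal((pop, theta.size))
    losses = np.array([f(theta + sigma * e) for e in eps])
    losses -= losses.mean()              # baseline subtraction reduces variance
    return (eps.T @ losses) / (pop * sigma)

f = lambda th: 0.5 * float(th @ th)      # toy loss
rng = np.random.default_rng(2)
theta = np.ones(6)
for _ in range(100):
    theta -= 0.2 * es_gradient(f, theta, rng=rng)
```

The population evaluations are embarrassingly parallel, which is what makes ES attractive for black-box and distributed settings.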

Perturbed Input

  • The Forward-Forward Algorithm: Some Preliminary Investigations. [arXiv'22] [Paper][Code]

  • Graph Neural Networks Go Forward-Forward. [arXiv'23] [Paper]

  • The Predictive Forward-Forward Algorithm. [arXiv'23] [Paper][Code]

  • Contrastive-Signal-Dependent Plasticity: Forward-Forward Learning of Spiking Neural Systems. [arXiv'23] [Paper]

  • Training Convolutional Neural Networks with the Forward-Forward Algorithm. [arXiv'23] [Paper]

  • Backpropagation-free Training of Deep Physical Neural Networks. [Science'23] [Paper]

  • Forward-Forward Training of an Optical Neural Network. [arXiv'23] [Paper]

  • µ-FF: On-Device Forward-Forward Training Algorithm for Microcontrollers. [SMARTCOMP'23] [Paper]

  • Error-driven Input Modulation: Solving the Credit Assignment Problem without a Backward Pass. [ICML'22] [Paper][Code]

  • Suitability of Forward-Forward and PEPITA Learning to MLCommons-Tiny benchmarks. [COINS'23] [Paper][Code]
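The Forward-Forward line of work replaces the backward pass with two forward passes: each layer is trained locally to produce high "goodness" (e.g. the sum of squared activations) on positive data and low goodness on negative data. A single-ReLU-layer NumPy sketch of this local update, with a logistic loss on goodness (the threshold, learning rate, and toy data are illustrative):

```python
import numpy as np

def goodness(h):
    # Layer-local "goodness": sum of squared activations.
    return (h * h).sum(axis=1)

def ff_layer_step(W, x_pos, x_neg, theta=2.0, lr=0.03):
    # One Forward-Forward update: raise goodness on positive data, lower it
    # on negative data, using only this layer's own activations.
    for x, sign in ((x_pos, 1.0), (x_neg, -1.0)):
        h = np.maximum(x @ W, 0.0)
        p = 1.0 / (1.0 + np.exp(-sign * (goodness(h) - theta)))
        # Local gradient of -log(p) w.r.t. W; no backprop through other layers.
        dh = -((1.0 - p) * sign)[:, None] * 2.0 * h
        W -= lr * (x.T @ dh) / len(x)
    return W

# Toy "positive" and "negative" data the layer can tell apart by magnitude.
rng = np.random.default_rng(0)
x_pos = 2.0 * rng.standard_normal((32, 8))
x_neg = 0.5 * rng.standard_normal((32, 8))
W = 0.1 * rng.standard_normal((8, 16))
for _ in range(100):
    W = ff_layer_step(W, x_pos, x_neg)
```

Because each layer's objective depends only on its own output, layers can be trained greedily or in a pipeline without storing a global computation graph.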

No Perturbation

  • Neural Network Learning without Backpropagation. [IEEE Transactions on Neural Networks'10] [Paper]

  • The HSIC Bottleneck: Deep Learning without Back-Propagation. [AAAI'20] [Paper][Code]

  • Building Deep Random Ferns Without Backpropagation. [IEEE Access'20] [Paper]

BP-free LLM

Parameter-Efficient Tuning

  • Black-Box Tuning for Language-Model-as-a-Service. [ICML'22] [Paper][Code]

  • BBTv2: Towards a Gradient-Free Future with Large Language Models. [EMNLP'22] [Paper][Code]

  • Make Prompt-based Black-Box Tuning Colorful: Boosting Model Generalization from Three Orthogonal Perspectives. [arXiv'23] [Paper][Code]

  • Clip-Tuning: Towards Derivative-free Prompt Learning with a Mixture of Rewards. [EMNLP'22] [Paper]

  • RLPrompt: Optimizing Discrete Text Prompts with Reinforcement Learning. [EMNLP'22] [Paper][Code]

  • Black-box Prompt Learning for Pre-trained Language Models. [arXiv'22] [Paper][Code]

  • PromptBoosting: Black-Box Text Classification with Ten Forward Passes. [ICML'23] [Paper][Code]

  • GrIPS: Gradient-free, Edit-based Instruction Search for Prompting Large Language Models. [arXiv'22] [Paper][Code]

  • Efficient Federated Prompt Tuning for Black-box Large Pre-trained Models. [arXiv'23] [Paper]

  • Iterative Forward Tuning Boosts In-context Learning in Language Models. [arXiv'23] [Paper][Code]

  • FwdLLM: Efficient FedLLM using Forward Gradient. [arXiv'23] [Paper][Code]

  • HyperTuning: Toward Adapting Large Language Models without Back-propagation. [ICML'23] [Paper]
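A recurring idea in the black-box tuning papers above (e.g. BBT) is to exploit the low intrinsic dimensionality of prompt tuning: optimize a small vector z with a gradient-free method, and map it into the model's high-dimensional prompt-embedding space through a fixed random projection. A toy sketch, where the "black-box model" is a stand-in quadratic since only its loss values are queried (all names, dimensions, and the objective are illustrative):

```python
import numpy as np

# Hypothetical black-box model: we can only query its loss, never its gradients.
def black_box_loss(prompt_embedding):
    target = np.linspace(-1, 1, 50)          # stand-in for a frozen model's behaviour
    return float(((prompt_embedding - target) ** 2).mean())

d, D = 10, 50                                # intrinsic dim << embedding dim
rng = np.random.default_rng(0)
A = rng.standard_normal((D, d)) / np.sqrt(d) # fixed random projection
z = np.zeros(d)                              # the only thing we optimize

loss_init = black_box_loss(A @ z)

# Simple ES loop in the low-dimensional subspace.
sigma, lr, pop = 0.1, 0.3, 20
for _ in range(200):
    eps = rng.standard_normal((pop, d))
    losses = np.array([black_box_loss(A @ (z + sigma * e)) for e in eps])
    losses -= losses.mean()
    z -= lr * (eps.T @ losses) / (pop * sigma)
```

Searching in d dimensions instead of D keeps query counts manageable while the frozen model is only ever called through its inference API.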

Full-Parameter Tuning

  • Backpropagation Free Transformers. [NeurIPS'20] [Paper]

  • Forward Learning of Large Language Models by Consumer Devices. [Electronics'24] [Paper][Code]

  • Fine-Tuning Language Models with Just Forward Passes. [arXiv'23] [Paper][Code]

  • Federated Full-Parameter Tuning of Billion-Sized Language Models with Communication Cost under 18 Kilobytes. [arXiv'23] [Paper][Code]
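A memory trick shared by the full-parameter ZO papers above ("Fine-Tuning Language Models with Just Forward Passes" and the 18-kilobyte federated follow-up) is to never materialize the perturbation z: regenerate it from its random seed each time, so memory stays at inference level and only a seed plus a scalar need to be communicated. A NumPy sketch of one such in-place step (parameter shapes and the toy loss are illustrative):

```python
import numpy as np

def mezo_step(params, loss_fn, eps=1e-3, lr=0.05, seed=0):
    # In-place zeroth-order step: the perturbation z is regenerated from its
    # seed instead of being stored alongside the parameters.
    def perturb(scale):
        rng = np.random.default_rng(seed)
        for p in params:
            p += scale * eps * rng.standard_normal(p.shape)
    perturb(+1.0)
    loss_plus = loss_fn(params)
    perturb(-2.0)                       # now at theta - eps * z
    loss_minus = loss_fn(params)
    perturb(+1.0)                       # restore the original parameters
    g = (loss_plus - loss_minus) / (2 * eps)
    rng = np.random.default_rng(seed)   # replay z for the update itself
    for p in params:
        p -= lr * g * rng.standard_normal(p.shape)

# Toy "model": two parameter tensors with a quadratic loss.
params = [np.ones((3, 3)), np.ones(5)]
loss_fn = lambda ps: sum(float((p * p).sum()) for p in ps) / 2
for step in range(400):
    mezo_step(params, loss_fn, seed=step)
```

In the federated setting, transmitting only (seed, g) per step is what drives the communication cost down to kilobytes.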

