
A Survey of Backpropagation-free Training for LLMs

This repo contains the paper list for A Survey of Backpropagation-free Training For LLMs.

Abstract

Large language models (LLMs) have achieved remarkable performance on various downstream tasks. However, training LLMs is computationally expensive and requires substantial memory. To address this issue, backpropagation-free (BP-free) training has been proposed as a promising approach to reduce the computational and memory costs of training LLMs. In this survey, we provide a comprehensive overview of BP-free training for LLMs, covering the mainstream BP-free training methods and their optimizations for LLMs. The goal of this survey is to foster a thorough understanding of BP-free training for LLMs and to inspire future research in this area.

Contribute

If we have left out any important papers, please let us know in the Issues and we will include them in the next version.

We will actively maintain both the survey and this GitHub repo.

Table of Contents

BP-free Methods

Perturbed Model

  • Gradients without Backpropagation. [arXiv'22] [Paper]

  • Can Forward Gradient Match Backpropagation? [ICLR'23] [Paper]

  • Scaling Forward Gradient With Local Losses. [ICLR'23] [Paper] [Code]

Forward Gradient

  • Gradients without Backpropagation. [arXiv'22] [Paper]

  • Learning by Directional Gradient Descent. [ICLR'22] [Paper]

  • Optimization without Backpropagation. [arXiv'22] [Paper]

  • Scaling Forward Gradient With Local Losses. [ICLR'23] [Paper] [Code]

  • Can Forward Gradient Match Backpropagation? [ICLR'23] [Paper]

  • Low-variance Forward Gradients using Direct Feedback Alignment and momentum. [arXiv'22] [Paper]

  • How to Guess a Gradient. [arXiv'23] [Paper]
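The papers above share one core estimator: sample a random tangent direction v, obtain the directional derivative ∇f(θ)·v in a single forward (Jacobian-vector product) pass, and use (∇f(θ)·v)·v as an unbiased estimate of the gradient. A minimal NumPy sketch on a toy quadratic objective, where the JVP is available in closed form (the function names and the toy objective are illustrative, not from any listed paper):

```python
import numpy as np

def forward_gradient_step(theta, jvp, lr, rng):
    # Sample a random tangent direction v.
    v = rng.standard_normal(theta.shape)
    # One forward/JVP pass gives the directional derivative; no backward pass.
    d = jvp(theta, v)
    # (grad . v) v is an unbiased estimate of the true gradient.
    return theta - lr * d * v

# Toy objective f(theta) = 0.5 * ||theta||^2, whose JVP is simply theta . v.
jvp = lambda theta, v: float(theta @ v)

rng = np.random.default_rng(0)
theta = np.ones(5)
for _ in range(200):
    theta = forward_gradient_step(theta, jvp, lr=0.05, rng=rng)
# theta's norm shrinks toward 0 as the estimator follows the true gradient on average.
```

In a real network the JVP would come from forward-mode automatic differentiation, which costs roughly one extra forward pass and stores no activations.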

Zeroth-order Optimization

  • Does Federated Learning Really Need Backpropagation? [arXiv'23] [Paper] [Code]

  • Fine-Tuning Language Models with Just Forward Passes. [NeurIPS'23] [Paper] [Code]

  • DPZero: Dimension-Independent and Differentially Private Zeroth-Order Optimization. [arXiv'23] [Paper]

  • DeepZero: Scaling up Zeroth-Order Optimization for Deep Model Training. [ICLR'24] [Paper] [Code]
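Zeroth-order methods in this line of work (e.g. the SPSA-style estimator used by MeZO) need only loss values: perturb the parameters by a random z in both directions, take the finite difference of the two losses, and scale z by it. A rough sketch on a toy loss (all names and the objective are illustrative):

```python
import numpy as np

def spsa_grad(f, theta, eps=1e-3, rng=None):
    # Two forward passes with an antithetic perturbation z: no backward pass,
    # and only scalar loss values are needed from the model.
    rng = rng or np.random.default_rng(0)
    z = rng.standard_normal(theta.shape)
    return (f(theta + eps * z) - f(theta - eps * z)) / (2 * eps) * z

f = lambda th: 0.5 * float(th @ th)   # toy loss standing in for a model's loss
rng = np.random.default_rng(1)
theta = np.ones(4)
for _ in range(300):
    theta -= 0.05 * spsa_grad(f, theta, rng=rng)
```

The estimator's variance grows with dimension, which is why the papers above focus on variance reduction and on why pretrained LLM loss landscapes make such estimates usable.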

Evolution Strategy

  • Black-Box Tuning for Language-Model-as-a-Service. [ICML'22] [Paper] [Code]

  • BBTv2: Towards a Gradient-Free Future with Large Language Models. [EMNLP'22] [Paper] [Code]

  • Unbiased Gradient Estimation in Unrolled Computation Graphs with Persistent Evolution Strategies. [ICML'21] [Paper]

  • Low-Variance Gradient Estimation in Unrolled Computation Graphs with ES-Single. [arXiv'23] [Paper]
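Evolution strategies replace the single perturbation of zeroth-order methods with a population: score many random perturbations by their loss and combine them into a search-gradient estimate. A sketch of a vanilla ES estimator with baseline subtraction on a toy objective (names and hyperparameters are illustrative):

```python
import numpy as np

def es_gradient(f, theta, sigma=0.1, pop=40, rng=None):
    # Score a population of random perturbations by their loss and combine
    # them into a search-gradient estimate (forward passes only).
    rng = rng or np.random.default_rng(0)
    eps = rng.standard_normal((pop, theta.size))
    losses = np.array([f(theta + sigma * e) for e in eps])
    losses -= losses.mean()              # baseline subtraction reduces variance
    return (eps.T @ losses) / (pop * sigma)

f = lambda th: 0.5 * float(th @ th)      # toy loss
rng = np.random.default_rng(2)
theta = np.ones(6)
for _ in range(100):
    theta -= 0.2 * es_gradient(f, theta, rng=rng)
```

The population evaluations are embarrassingly parallel, which is what makes ES attractive for black-box and distributed settings.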

Perturbed Input

  • The Forward-Forward Algorithm: Some Preliminary Investigations. [arXiv'22] [Paper][Code]

  • Graph Neural Networks Go Forward-Forward. [arXiv'23] [Paper]

  • The Predictive Forward-Forward Algorithm. [arXiv'23] [Paper][Code]

  • Contrastive-Signal-Dependent Plasticity: Forward-Forward Learning of Spiking Neural Systems. [arXiv'23] [Paper]

  • Training Convolutional Neural Networks with the Forward-Forward Algorithm. [arXiv'23] [Paper]

  • Backpropagation-free Training of Deep Physical Neural Networks. [Science'23] [Paper]

  • Forward-Forward Training of an Optical Neural Network. [arXiv'23] [Paper]

  • µ-FF: On-Device Forward-Forward Training Algorithm for Microcontrollers. [SMARTCOMP'23] [Paper]

  • Error-driven Input Modulation: Solving the Credit Assignment Problem without a Backward Pass. [ICML'22] [Paper][Code]

  • Suitability of Forward-Forward and PEPITA Learning to MLCommons-Tiny benchmarks. [COINS'23] [Paper][Code]
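The Forward-Forward line of work replaces the backward pass with two forward passes: each layer is trained locally to produce high "goodness" (e.g. the sum of squared activations) on positive data and low goodness on negative data. A single-ReLU-layer NumPy sketch of this local update, with a logistic loss on goodness (the threshold, learning rate, and toy data are illustrative):

```python
import numpy as np

def goodness(h):
    # Layer-local "goodness": sum of squared activations.
    return (h * h).sum(axis=1)

def ff_layer_step(W, x_pos, x_neg, theta=2.0, lr=0.03):
    # One Forward-Forward update: raise goodness on positive data, lower it
    # on negative data, using only this layer's own activations.
    for x, sign in ((x_pos, 1.0), (x_neg, -1.0)):
        h = np.maximum(x @ W, 0.0)
        p = 1.0 / (1.0 + np.exp(-sign * (goodness(h) - theta)))
        # Local gradient of -log(p) w.r.t. W; no backprop through other layers.
        dh = -((1.0 - p) * sign)[:, None] * 2.0 * h
        W -= lr * (x.T @ dh) / len(x)
    return W

# Toy "positive" and "negative" data the layer can tell apart by magnitude.
rng = np.random.default_rng(0)
x_pos = 2.0 * rng.standard_normal((32, 8))
x_neg = 0.5 * rng.standard_normal((32, 8))
W = 0.1 * rng.standard_normal((8, 16))
for _ in range(100):
    W = ff_layer_step(W, x_pos, x_neg)
```

Because each layer's objective depends only on its own output, layers can be trained greedily or in a pipeline without storing a global computation graph.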

No Perturbation

  • Neural Network Learning without Backpropagation. [IEEE Transactions on Neural Networks'10] [Paper]

  • The HSIC Bottleneck: Deep Learning without Back-Propagation. [AAAI'20] [Paper][Code]

  • Building Deep Random Ferns Without Backpropagation. [IEEE Access'20] [Paper]

BP-free LLM

Parameter-Efficient Tuning

  • Black-Box Tuning for Language-Model-as-a-Service. [ICML'22] [Paper][Code]

  • BBTv2: Towards a Gradient-Free Future with Large Language Models. [EMNLP'22] [Paper][Code]

  • Make Prompt-based Black-Box Tuning Colorful: Boosting Model Generalization from Three Orthogonal Perspectives. [arXiv'23] [Paper][Code]

  • Clip-Tuning: Towards Derivative-free Prompt Learning with a Mixture of Rewards. [EMNLP'22] [Paper]

  • RLPrompt: Optimizing Discrete Text Prompts with Reinforcement Learning. [EMNLP'22] [Paper][Code]

  • Black-box Prompt Learning for Pre-trained Language Models. [arXiv'22] [Paper][Code]

  • PromptBoosting: Black-Box Text Classification with Ten Forward Passes. [ICML'23] [Paper][Code]

  • GrIPS: Gradient-free, Edit-based Instruction Search for Prompting Large Language Models. [arXiv'22] [Paper][Code]

  • Efficient Federated Prompt Tuning for Black-box Large Pre-trained Models. [arXiv'23] [Paper]

  • Iterative Forward Tuning Boosts In-context Learning in Language Models. [arXiv'23] [Paper][Code]

  • FwdLLM: Efficient FedLLM using Forward Gradient. [arXiv'23] [Paper][Code]

  • HyperTuning: Toward Adapting Large Language Models without Back-propagation. [ICML'23] [Paper]
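A recurring idea in the black-box tuning papers above (e.g. BBT) is to exploit the low intrinsic dimensionality of prompt tuning: optimize a small vector z with a gradient-free method, and map it into the model's high-dimensional prompt-embedding space through a fixed random projection. A toy sketch, where the "black-box model" is a stand-in quadratic since only its loss values are queried (all names, dimensions, and the objective are illustrative):

```python
import numpy as np

# Hypothetical black-box model: we can only query its loss, never its gradients.
def black_box_loss(prompt_embedding):
    target = np.linspace(-1, 1, 50)          # stand-in for a frozen model's behaviour
    return float(((prompt_embedding - target) ** 2).mean())

d, D = 10, 50                                # intrinsic dim << embedding dim
rng = np.random.default_rng(0)
A = rng.standard_normal((D, d)) / np.sqrt(d) # fixed random projection
z = np.zeros(d)                              # the only thing we optimize

loss_init = black_box_loss(A @ z)

# Simple ES loop in the low-dimensional subspace.
sigma, lr, pop = 0.1, 0.3, 20
for _ in range(200):
    eps = rng.standard_normal((pop, d))
    losses = np.array([black_box_loss(A @ (z + sigma * e)) for e in eps])
    losses -= losses.mean()
    z -= lr * (eps.T @ losses) / (pop * sigma)
```

Searching in d dimensions instead of D keeps query counts manageable while the frozen model is only ever called through its inference API.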

Full-Parameter Tuning

  • Backpropagation Free Transformers. [NeurIPS'20] [Paper]

  • Forward Learning of Large Language Models by Consumer Devices. [Electronics'24] [Paper][Code]

  • Fine-Tuning Language Models with Just Forward Passes. [arXiv'23] [Paper][Code]

  • Federated Full-Parameter Tuning of Billion-Sized Language Models with Communication Cost under 18 Kilobytes. [arXiv'23] [Paper][Code]
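A memory trick shared by the full-parameter ZO papers above ("Fine-Tuning Language Models with Just Forward Passes" and the 18-kilobyte federated follow-up) is to never materialize the perturbation z: regenerate it from its random seed each time, so memory stays at inference level and only a seed plus a scalar need to be communicated. A NumPy sketch of one such in-place step (parameter shapes and the toy loss are illustrative):

```python
import numpy as np

def mezo_step(params, loss_fn, eps=1e-3, lr=0.05, seed=0):
    # In-place zeroth-order step: the perturbation z is regenerated from its
    # seed instead of being stored alongside the parameters.
    def perturb(scale):
        rng = np.random.default_rng(seed)
        for p in params:
            p += scale * eps * rng.standard_normal(p.shape)
    perturb(+1.0)
    loss_plus = loss_fn(params)
    perturb(-2.0)                       # now at theta - eps * z
    loss_minus = loss_fn(params)
    perturb(+1.0)                       # restore the original parameters
    g = (loss_plus - loss_minus) / (2 * eps)
    rng = np.random.default_rng(seed)   # replay z for the update itself
    for p in params:
        p -= lr * g * rng.standard_normal(p.shape)

# Toy "model": two parameter tensors with a quadratic loss.
params = [np.ones((3, 3)), np.ones(5)]
loss_fn = lambda ps: sum(float((p * p).sum()) for p in ps) / 2
for step in range(400):
    mezo_step(params, loss_fn, seed=step)
```

In the federated setting, transmitting only (seed, g) per step is what drives the communication cost down to kilobytes.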

