
strivin0311 / long-llms-learning


A repository sharing the literature on long-context large language models, including methodologies and evaluation benchmarks.

Jupyter Notebook 95.54% Python 4.46%

long-llms-learning's Introduction

long-llms-learning


A repository sharing a panorama of the methodology literature on Transformer architecture upgrades that let Large Language Models handle extensive context windows, continuously updated with newly published works.

Overview

Survey

For a clear taxonomy and more insights into the methodology, you can refer to our survey, Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey, with an overview shown below.

Overview of the survey

Flash-ReRoPE

We have augmented Su's great work, rerope, with the flash-attn kernel to combine rerope's infinite positional extrapolation capability with flash-attn's efficiency. We call the result flash-rerope.

You can find and use the implementation as a flash-attn-like interface function here, with a simple precision and FLOPs test script here.

You can also see how to implement the LLaMA attention module with flash-rerope here.
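As a rough illustration of the idea behind rerope (not the flash-rerope kernel in this repo), the sketch below builds the clipped relative-position matrix that ReRoPE is generally described as using: the usual relative distance `i - j` inside a window, and a constant equal to the window size beyond it. The function name `rerope_relative_positions` and the `window` parameter are hypothetical names chosen for this example.

```python
def rerope_relative_positions(seq_len, window):
    """Clipped relative positions for causal attention.

    Assumption: ReRoPE uses the ordinary relative distance (i - j) when it
    is below `window`, and the constant `window` otherwise. Entries for
    future keys (j > i) are masked and returned as None.
    """
    return [
        [min(i - j, window) if j <= i else None for j in range(seq_len)]
        for i in range(seq_len)
    ]


if __name__ == "__main__":
    # Print the matrix for a short sequence; "." marks masked positions.
    for row in rerope_relative_positions(seq_len=6, window=3):
        print(" ".join("." if p is None else str(p) for p in row))
```

Because no relative position ever exceeds `window`, the model never sees a distance larger than those covered in training, which is the intuition behind the extrapolation claim above.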

Latest News

Latest Works

Latest Baselines

Latest Benchmarks

More to Learn

Long-LLMs-Evals

  • We've also released long-llms-evals, a repo still under construction, as a pipeline for evaluating various methods designed to enhance the long-context capabilities of general or specific LLMs on well-known long-context benchmarks.

LLMs-Learning

  • This repo is also a sub-track of another repo, llms-learning, where you can learn more about the technologies and applied tasks across the full stack of Large Language Models.

Table of Contents

Contribution

If you want to contribute to this repo, you can open a PR or email us with the link to the paper(s), or use the format below:

  • (un)read paper format:
#### <paper title> [(UN)READ]

paper link: [here](<link address>)

xxx link: [here](<link address>)

citation:
<bibtex citation>

Citation

If you find the survey or this repo helpful in your research or work, you can cite our paper as below:

@misc{huang2024advancing,
      title={Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey}, 
      author={Yunpeng Huang and Jingwei Xu and Junyu Lai and Zixu Jiang and Taolue Chen and Zenan Li and Yuan Yao and Xiaoxing Ma and Lijuan Yang and Hao Chen and Shupeng Li and Penghao Zhao},
      year={2024},
      eprint={2311.12351},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

long-llms-learning's People

Contributors

eltociear, lixin4ever, paragonlight, strivin0311
