meta-llama's Introduction

LLaMA

Papers 📄

I am reading these papers:
✅ LLaMA: Open and Efficient Foundation Language Models
✅ Llama 2: Open Foundation and Fine-Tuned Chat Models
☑️ OPT: Open Pre-trained Transformer Language Models
✅ Attention Is All You Need
✅ Root Mean Square Layer Normalization
✅ GLU Variants Improve Transformer
✅ RoFormer: Enhanced Transformer with Rotary Position Embedding
✅ Self-Attention with Relative Position Representations
☑️ BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
☑️ To Fold or Not to Fold: a Necessary and Sufficient Condition on Batch-Normalization Layers Folding
✅ Fast Transformer Decoding: One Write-Head is All You Need
✅ GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
☑️ PaLM: Scaling Language Modeling with Pathways

Goals 🚀

✅ Understand the concept of dot product of two matrices.
✅ Understand the concept of autoregressive language models.
✅ Understand the concept of attention computation.
✅ Understand the workings of Byte-Pair Encoding (BPE) algorithm and tokenizer.
✅ Read and implement the workings of the SentencePiece library and tokenizer.
✅ Understand the concept of tokenization, input ids and embedding vectors.
✅ Understand & implement the concept of positional encoding.
✅ Understand the concept of single head self-attention.
✅ Understand the concept of scaled dot-product attention.
✅ Understand & implement the concept of multi-head attention.
✅ Understand & implement the concept of layer normalization.
✅ Understand the concept of masked multi-head attention & softmax layer.
✅ Understand and implement the concept of RMSNorm and difference with LayerNorm.
✅ Understand the concept of internal covariate shift.
✅ Understand the concept and implementation of feed-forward network with ReLU activation.
✅ Understand the concept and implementation of feed-forward network with SwiGLU activation.
✅ Understand the concept of absolute positional encoding.
✅ Understand the concept of relative positional encoding.
✅ Understand and implement the rotary positional embedding.
✅ Understand and implement the transformer architecture.
✅ Understand and implement the original Llama (1) architecture.
✅ Understand the concept of multi-query attention with single KV projection.
✅ Understand and implement grouped query attention from scratch.
✅ Understand and implement the concept of KV cache.
✅ Understand and implement the concept of Llama2 architecture.
✅ Test the Llama2 implementation using the checkpoints from Meta.
✅ Download the checkpoints of Llama2 and inspect the inference code and working.
☑️ Documentation of the Llama2 implementation and repo.

Blog Posts:

✅ LLAMA: OPEN AND EFFICIENT LLM NOTES
☑️ Add more blog posts.

Related GitHub Works:

🌐 pytorch-llama - PyTorch implementation of LLaMA by Umar Jamil.
🌐 pytorch-transformer - PyTorch implementation of Transformer by Umar Jamil.
🌐 llama - Facebook's LLaMA implementation.
🌐 tensor2tensor - Google's transformer implementation.
🌐 rmsnorm - RMSNorm implementation.
🌐 roformer - Rotary Transformer implementation.
🌐 xformers - Facebook's implementation.

Articles:

✅ Understanding SentencePiece ([Under][Standing][_Sentence][Piece])
✅ SwiGLU: GLU Variants Improve Transformer (2020)
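
To make the SwiGLU article above more concrete, here is a minimal sketch of a SwiGLU feed-forward layer in plain Python, following the form (Swish(xW) ⊙ xV) W2 from the GLU Variants paper. The helper names (`silu`, `matvec`, `swiglu_ffn`) and the weight names (`w_gate`, `w_up`, `w_down`) are hypothetical, chosen for illustration only. Bias terms are left out, and weights are stored as nested lists with one row per input feature.

```python
import math

def silu(z):
    # Swish with beta = 1, also called SiLU: z * sigmoid(z).
    return z / (1.0 + math.exp(-z))

def matvec(x, w):
    # Row vector x (length d_in) times matrix w (d_in rows, d_out columns).
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]

def swiglu_ffn(x, w_gate, w_up, w_down):
    gate = [silu(z) for z in matvec(x, w_gate)]  # Swish(xW)
    up = matvec(x, w_up)                         # xV
    hidden = [g * u for g, u in zip(gate, up)]   # element-wise gating
    return matvec(hidden, w_down)                # project back with W2

# Tiny example: model width 2, hidden width 3.
x = [1.0, -1.0]
w_gate = [[0.5, -0.5, 1.0], [0.25, 0.5, -1.0]]
w_up = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]
w_down = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
print(swiglu_ffn(x, w_gate, w_up, w_down))
```

Compared with a ReLU feed-forward network, this layer uses three weight matrices instead of two, with one projection acting as a learned gate on the other.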

meta-llama's People

Contributors

thinamxx
