
MobiLlama: Small Language Model tailored for edge devices

Home Page: https://github.com/mbzuai-oryx/MobiLlama

License: Apache License 2.0

Languages: Python 99.74%, Shell 0.26%
Topics: efficient-llm, llm, slm, mobile-llm, tiny-llm


📱🦙 MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT


Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI), UAE and Linköping University, Sweden

Paper | 🤗 HuggingFace Demo


📢 Latest Updates

  • Feb-26-24: arXiv preprint is released!
  • Feb-25-24: Code (training and evaluation scripts) is released!
  • Feb-25-24: Final pre-trained models (including intermediate checkpoints) and chat versions, along with online demo links, are released!

Overview

"Bigger is better" has been the predominant trend in recent Large Language Model (LLM) development. However, LLMs are not well suited to scenarios that require on-device processing, energy efficiency, a low memory footprint, and response efficiency. These requisites are crucial for privacy, security, and sustainable deployment. This paper explores the "less is more" paradigm by addressing the challenge of designing accurate yet efficient Small Language Models (SLMs) for resource-constrained devices. Our primary contribution is the introduction of an accurate and fully transparent open-source 0.5 billion (0.5B) parameter SLM, named MobiLlama, catering to the specific needs of resource-constrained computing with an emphasis on enhanced performance at reduced resource demands. MobiLlama is an SLM design that starts from a larger model and applies a careful parameter-sharing scheme to reduce both the pre-training and the deployment cost.

⚡ Model Download

Model Name Download
MobiLlama-05B HuggingFace
MobiLlama-08B HuggingFace
MobiLlama-1B HuggingFace
MobiLlama-05B-Chat HuggingFace
MobiLlama-1B-Chat HuggingFace

Generation with MobiLlama


Loading MobiLlama

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model and tokenizer from the HuggingFace Hub.
model = AutoModelForCausalLM.from_pretrained("MBZUAI/MobiLlama-05B", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("MBZUAI/MobiLlama-05B", trust_remote_code=True)

model.to('cuda')
text = "I was walking towards the river when "
input_ids = tokenizer(text, return_tensors="pt").to('cuda').input_ids
# Generate a continuation, mildly penalizing repetition.
outputs = model.generate(input_ids, max_length=1000, repetition_penalty=1.2, pad_token_id=tokenizer.eos_token_id)
# Decode only the newly generated tokens (everything after the prompt).
print(tokenizer.batch_decode(outputs[:, input_ids.shape[1]:-1])[0].strip())
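
The chat-tuned variants load the same way. Below is a minimal sketch for MobiLlama-05B-Chat; note that the instruction-style prompt template used here is an assumption, not the confirmed format, so check the model card on HuggingFace for the exact template the model was tuned with.

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("MBZUAI/MobiLlama-05B-Chat", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("MBZUAI/MobiLlama-05B-Chat", trust_remote_code=True)
model.to('cuda')

# NOTE: this prompt template is an assumption; consult the model card for the real one.
prompt = "### Human: What are small language models good for on edge devices?\n### Assistant:"
input_ids = tokenizer(prompt, return_tensors="pt").to('cuda').input_ids
outputs = model.generate(input_ids, max_length=512, repetition_penalty=1.2, pad_token_id=tokenizer.eos_token_id)
print(tokenizer.batch_decode(outputs[:, input_ids.shape[1]:])[0].strip())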

Loading Intermediate Checkpoints

model = AutoModelForCausalLM.from_pretrained("MBZUAI/MobiLlama-05B", revision="ckpt_352", trust_remote_code=True)

All the intermediate checkpoints are available from ckpt_100 to ckpt_358.
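
If you want to sweep over checkpoints (e.g., to track a metric across training), the available revisions can be discovered programmatically. Here is a minimal sketch using huggingface_hub, assuming each intermediate checkpoint is published as a ckpt_* branch of the model repo:

from huggingface_hub import HfApi

# List all branches of the model repo and keep the intermediate checkpoints.
refs = HfApi().list_repo_refs("MBZUAI/MobiLlama-05B")
checkpoints = sorted(
    (b.name for b in refs.branches if b.name.startswith("ckpt_")),
    key=lambda name: int(name.split("_")[1]),
)
print(checkpoints)  # e.g. ['ckpt_100', 'ckpt_101', ..., 'ckpt_358']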

Dataset

Download the preprocessed Amber data from HuggingFace. The full training set comes in 360 chunks totalling ~8 TB. The Amber dataset contains roughly 1.2 trillion tokens gathered from the data sources shown below.

Subset Tokens (Billion)
Arxiv 30.00
Book 28.86
C4 197.67
Refined-Web 665.01
StarCoder 291.92
StackExchange 21.75
Wikipedia 23.90
Total 1259.13
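
A minimal download sketch using huggingface_hub is shown below. The repo id LLM360/AmberDatasets and the chunk file pattern are assumptions based on the LLM360 release, so adjust them to match the dataset page; remember that fetching all 360 chunks requires ~8 TB of disk.

from huggingface_hub import snapshot_download

# Pull only a few chunks for a trial run; drop allow_patterns to fetch everything.
# NOTE: repo id and file pattern are assumptions, not confirmed paths.
snapshot_download(
    repo_id="LLM360/AmberDatasets",
    repo_type="dataset",
    allow_patterns=["train/train_000*"],
    local_dir="./amber_data",
)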

Installation

First install PyTorch according to the instructions specific to your operating system.

To install from source (recommended for training/fine-tuning), run:

conda create -n mobillama python=3.10
conda activate mobillama
git clone https://github.com/mbzuai-oryx/MobiLlama.git
cd MobiLlama
pip install -r requirements.txt
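
Optionally, verify that PyTorch can see a GPU before launching any training (this quick check assumes a CUDA-enabled PyTorch build):

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"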

Pretraining

For MobiLlama (using 20 nodes of A100 80GB GPUs):

sbatch pretrain.sh

For large-base, use main_largebase.py on line 11 of pretrain.sh.
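
For orientation, here is a hypothetical skeleton of what a Slurm launcher like pretrain.sh may look like. The node count matches the setup above, but the GPU layout, job name, and default entry point (assumed to be main.py) are assumptions; treat the actual pretrain.sh in the repo as authoritative.

#!/bin/bash
#SBATCH --nodes=20            # 20 nodes, as described above
#SBATCH --gres=gpu:8          # assumed: 8x A100 80GB per node
#SBATCH --job-name=mobillama-pretrain

# Line 11 of the real pretrain.sh selects the training entry point:
# swap main.py for main_largebase.py to pretrain the large-base model.
srun python main.py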

🔎 Evaluation

We used Analysis-360 to evaluate our model on different LLM benchmarks.
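
For example, a single benchmark can be scored through the harness entry point. This sketch follows the command format of the LLM360 harness (the same format appears in the issues below); the task name and output path are placeholders:

python Analysis360/eval/harness/main.py --device cuda:0 \
  --model=hf-causal-experimental \
  --model_args="pretrained=MBZUAI/MobiLlama-05B,trust_remote_code=True,dtype=bfloat16" \
  --tasks=hellaswag --num_fewshot=0 \
  --output_path=results-mobillama-05b.json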

📊 Results

Model Name #Params HellaSwag TruthfulQA MMLU ARC-C CrowS-Pairs PIQA RACE SIQA WinoGrande Average
gpt-neo-125m 0.15B 30.26 45.58 25.97 22.95 61.55 62.46 27.56 40.33 51.78 40.93
tiny-starcoder 0.17B 28.17 47.68 26.79 20.99 49.68 52.55 25.45 38.28 51.22 37.86
cerebras-gpt-256m 0.26B 28.99 45.98 26.83 22.01 60.52 61.42 27.46 40.53 52.49 40.69
opt-350m 0.35B 36.73 40.83 26.02 23.55 64.12 64.74 29.85 41.55 52.64 42.22
megatron-gpt2-345m 0.38B 39.18 41.51 24.32 24.23 64.82 66.87 31.19 40.28 52.96 42.81
LiteLlama 0.46B 38.47 41.59 26.17 24.91 62.90 67.73 28.42 40.27 49.88 42.26
gpt-sw3-356m 0.47B 37.05 42.55 25.93 23.63 61.59 64.85 32.15 41.56 53.04 42.48
pythia-410m 0.51B 40.85 41.22 27.25 26.19 64.20 67.19 30.71 41.40 53.12 43.57
xglm-564m 0.56B 34.64 40.43 25.18 24.57 62.25 64.85 29.28 42.68 53.03 41.87
Lamini-GPT-LM 0.59B 31.55 40.72 25.53 24.23 63.09 63.87 29.95 40.78 47.75 40.83
MobiLlama (Ours) 0.5B 52.52 38.05 26.45 29.52 64.03 72.03 33.68 40.22 57.53 46.00
Lamini-GPT-LM 0.77B 43.83 40.25 26.24 27.55 66.12 69.31 37.12 42.47 56.59 45.49
MobiLlama (Ours) 0.8B 54.09 38.48 26.92 30.20 64.82 73.17 33.37 41.60 57.45 46.67

The table provides a comparative analysis of various models, including our MobiLlama, across several LLM benchmarks. It highlights MobiLlama's superior performance, particularly in its 0.5B and 0.8B configurations, showcasing its efficiency and effectiveness on complex language tasks. This comparison underscores MobiLlama's advances in accuracy and demonstrates its potential as a leading solution among efficient LLMs.


Model #Params HellaSwag TruthfulQA MMLU ARC-C CrowS-Pairs PIQA RACE SIQA WinoGrande Average
Boomer 1B 31.62 39.42 25.42 22.26 61.26 57.99 28.99 40.32 50.98 39.80
Pythia-Dedup 1B 49.63 38.92 24.29 29.09 67.11 70.23 32.44 42.63 53.98 45.36
Falcon-RW 1B 63.12 35.96 25.36 35.06 69.04 74.10 36.07 40.23 61.88 48.98
TinyLlama 1.1B 60.22 37.59 26.11 33.61 70.60 73.28 36.45 41.65 59.18 48.74
OLMo 1.2B 62.50 32.94 25.86 34.45 69.59 73.70 36.74 41.14 58.90 48.42
Cerebras-GPT 1.3B 38.51 42.70 26.66 26.10 63.67 66.75 30.33 42.42 53.59 43.41
Lamini 1.3B 38.05 36.43 28.47 26.62 64.62 67.89 33.39 43.19 50.59 43.25
OPT 1.3B 54.50 38.67 24.63 29.60 70.70 72.47 34.16 42.47 59.74 47.43
GPT-NEO 1.3B 48.49 39.61 24.82 31.31 65.67 71.05 34.06 41.81 57.06 45.98
Pythia-Deduped 1.4B 55.00 38.63 25.45 32.59 67.33 72.68 34.64 42.68 56.90 47.32
large-base 1.2B 62.99 35.90 24.79 34.55 68.49 75.57 35.31 41.96 62.03 49.06

Comprehensive comparison with existing fully open-source LLMs of < 2B parameters on 9 benchmarks. Our 1.2B "large-base" model, pre-trained on 1.2T tokens, achieves superior performance compared to both the recent OLMo 1.17B model and the TinyLlama 1.1B model, which are pre-trained on substantially larger data of 3T tokens.

📱 MobiLlama on Android

To run our model in an Android app, please download and install the APK from here.

🙏 Acknowledgements

  • We thank LLM360 for the fully transparent, open-source implementation of their language model. The MobiLlama repo is built on LLM360.

📜 Citation

@misc{thawakar2024mobillama,
      title={MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT}, 
      author={Omkar Thawakar and Ashmal Vayani and Salman Khan and Hisham Cholakkal and Rao Muhammad Anwer and Michael Felsberg and Timothy Baldwin and Eric P. Xing and Fahad Shahbaz Khan},
      year={2024},
      eprint={2402.16840},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
} 


mobillama's Issues

Question on MobiLlama-V

Thanks for your great work! In the Multimodal MobiLlama part of the Results section, you briefly introduce how you developed MobiLlama-V. The model seems to have a LLaVA-like architecture but is trained only on the visual instruction-tuning data, which may explain why MobiLlama-V exhibits mediocre performance. Hence, my questions are the following:

  1. Can you release more details about the architecture and training process of MobiLlama-V?
  2. Did/Will you perform two-stage training instead of only the second stage?
  3. Would you consider using ALLaVA-4V, a high-quality multimodal dataset for vision-language training? This dataset was proposed to improve the performance of small VLMs.

Thanks!

The app doesn't install

I tried to install the app on my tablet and on my phone, but it failed to install on either one.
This is probably because the armeabi-v7a libs are missing.

Android running problems

Great project! After I installed the APK on my Android device, I was able to run the 05B model and enter questions to get some feedback.

But I have some questions:

  1. The output content seems somewhat irrelevant to the question.
  2. Could the code for loading local models in the APK be open-sourced?
  3. Does the model support fine-tuning so that it can handle specific business scenarios?

Extending context size

This is more a question than an issue.
Can one extend the context size of the model?
I am asking because I would like to try fine-tuning it to a longer context, to see how far one can get in terms of context size with constrained resources (RTX 4090).

cannot reproduce siqa numbers

Hello @OmkarThawakar, I used the LLM360 Analysis repo to run the eval for the siqa task:

python Analysis360/eval/harness/main.py --device cuda:0 --model=hf-causal-experimental --batch_size=auto:1 --model_args="pretrained=MBZUAI/MobiLlama-05B,trust_remote_code=True,dtype=bfloat16" --tasks=social_iqa --num_fewshot=0 --output_path=Analysis360-MobiLlama-05B.json

It only gives 0.3327, which is close to random chance, since there are only three choices.

Tasks Version Filter n-shot Metric Value Stderr
social_iqa 0 none 0 acc 0.3327 ± 0.0107

Could you share how you ran the siqa evaluation? Thanks!

How is this app made?

How do you build the Android app? Can you provide the llama implementation as a library (jar + .so)?

MobiLlama-V code and ckpts

Thanks for your great work!
Will you release the code and checkpoints of MobiLlama-V?
This work is very interesting and I hope to develop my work using MobiLlama-V. Thank you so much!

Android App load local model

Could you modify the home activity to load the local model instead? The network speed for downloading the model is quite slow. Thank you for sharing this amazing AI project and Android app.
