Comments (7)
This should help: #111
Just append --dtype bfloat16 to the conversion arguments and it will keep half-precision tensors in memory during conversion.
@psych0v0yager can you try this out on your system?
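To illustrate what the flag buys you, here is a minimal PyTorch sketch (hypothetical tensor shape, not lit-llama's actual conversion code) showing that casting a tensor to bfloat16 halves its memory footprint:

```python
import torch

# float32 weights take 4 bytes per element; bfloat16 takes 2.
w_fp32 = torch.randn(1024, 1024)        # float32 by default
w_bf16 = w_fp32.to(torch.bfloat16)      # cast halves the memory

bytes_fp32 = w_fp32.element_size() * w_fp32.nelement()
bytes_bf16 = w_bf16.element_size() * w_bf16.nelement()
print(bytes_fp32, bytes_bf16)  # 4194304 2097152
```

The same ratio applies to every weight tensor in the checkpoint, which is why converting in bfloat16 roughly halves the peak RAM of the conversion step.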
from lit-llama.
Most likely it means you don't have enough RAM.
@psych0v0yager I don't think you have enough RAM to hold the 13B model at 32-bit precision (you need about 48 GB).
As a check, try instantiating the 13B model:
from lit_llama.model import LLaMA
model = LLaMA.from_name("13B")
BTW are you loading the original checkpoints? We could provide an option to load incrementally in bfloat16 to reduce the requirements.
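For reference, a back-of-envelope estimate of the RAM needed just to hold the weights (parameter counts are approximate, and this ignores activations and any conversion overhead):

```python
# Rough weight memory per model size and dtype. Parameter counts
# are approximations, not exact lit-llama figures.
PARAMS = {"7B": 6.7e9, "13B": 13.0e9}
BYTES_PER_ELEM = {"float32": 4, "bfloat16": 2, "float16": 2}

def weight_gib(model: str, dtype: str) -> float:
    """GiB needed to hold the weights of `model` in `dtype`."""
    return PARAMS[model] * BYTES_PER_ELEM[dtype] / 2**30

print(f"{weight_gib('13B', 'float32'):.1f} GiB")   # 48.4 GiB
print(f"{weight_gib('13B', 'bfloat16'):.1f} GiB")  # 24.2 GiB
```

This is where the ~48 GB figure above comes from, and why an incremental bfloat16 load would roughly halve the requirement.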
Thank you for the prompt replies.
I attempted the instantiation and additional conversion argument and received the following results.
Python 3.10.10 (main, Mar 21 2023, 18:45:11) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from lit_llama.model import LLaMA
>>> model = LLaMA.from_name("13B")
Killed
python scripts/convert_checkpoint.py --output_dir checkpoints/lit-llama --ckpt_dir /dalai/llama/models --tokenizer_path /dalai/llama/models/tokenizer.model --model_size 13B --dtype bfloat16
50%|█████████████████████████████████████████████████████████████████▌ | 1/2 [00:46<00:46, 46.04s/it]
Killed
Could it potentially be a VRAM issue? I only have 12 GB on my NVIDIA 3060. Furthermore, the model weights come from the dalai llama repo (https://github.com/cocktailpeanut/dalai) and I believe they are the full-precision weights.
Thanks again for the support.
Killed means the program was terminated by your OS. In my experience this happens almost always when you try to use more RAM and swap than you have. My advice: run htop (or top) and check the memory consumption while the script is running to confirm the root cause. If you didn't have enough VRAM, you would see an OOM exception instead.
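As a quick pre-flight check, this Linux-only sketch reads available memory from /proc/meminfo before you launch the script (a rough check, not a substitute for watching htop while it runs):

```python
def available_gib() -> float:
    """Return MemAvailable from /proc/meminfo in GiB (Linux only)."""
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                kib = int(line.split()[1])  # value is reported in KiB
                return kib / 2**20
    raise RuntimeError("MemAvailable not found in /proc/meminfo")

print(f"{available_gib():.1f} GiB available")
```

If the number printed is well below the ~48 GiB the 13B float32 weights need, the OOM kill is expected.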
Closing this one; feel free to reopen.