
pyllama's People

Contributors

a1ex90, daniel-kukiela, george-adams1, gmlove, guspan-tanadi, jack-moo, juncongmoo, llimllib, mldevorg, wanweilove


pyllama's Issues

Error trying to quantize 7B model to 2-bit

I have installed GPTQ as described at https://pypi.org/project/gptq/#description, but the following error comes out after executing python -m llama.llama_quant D:\Repo\Llama\weights\7B c4 --wbits 2 --save pyllama-7B2b.pt:
Traceback (most recent call last):
  File "C:\Users\ASUS\.conda\envs\PyLlama\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\ASUS\.conda\envs\PyLlama\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "D:\Repo\PyLlama\pyllama\llama\llama_quant.py", line 6, in <module>
    from gptq import (
  File "C:\Users\ASUS\.conda\envs\PyLlama\lib\site-packages\gptq\__init__.py", line 9, in <module>
    from .gptq import GPTQ
  File "C:\Users\ASUS\.conda\envs\PyLlama\lib\site-packages\gptq\gptq.py", line 5, in <module>
    from .quant import quantize
  File "C:\Users\ASUS\.conda\envs\PyLlama\lib\site-packages\gptq\quant.py", line 4, in <module>
    from quant_cuda import matvmul2, matvmul3, matvmul4, matvmul8, matvmul16
ModuleNotFoundError: No module named 'quant_cuda'
My OS is Windows 11.
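
A hedged diagnostic sketch (my addition, not from the report): quant_cuda is a compiled CUDA extension that must be built separately from the pure-Python parts of the gptq package, so the first thing to check is whether it is importable at all in the active environment.

    import importlib.util

    # quant_cuda is a compiled extension; if find_spec returns None it was never
    # built or installed into this environment, which matches the error above.
    if importlib.util.find_spec("quant_cuda") is None:
        raise SystemExit("quant_cuda not found: build/install the GPTQ CUDA kernels first")
    print("quant_cuda is importable")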

example.py FAILED

RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 261346) of binary: /usr/bin/python3
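
A hedged check (my addition; the path is illustrative): torch saves checkpoints as zip archives, so "failed finding central directory" usually means the file is truncated or corrupted. The standard zipfile module can confirm this without loading any tensors.

    import zipfile

    ckpt = "./7B/consolidated.00.pth"  # illustrative path to the failing checkpoint
    try:
        with zipfile.ZipFile(ckpt) as zf:
            bad = zf.testzip()  # CRC-checks every member; returns the first bad one
        print("archive OK" if bad is None else f"corrupt member: {bad}")
    except zipfile.BadZipFile:
        print("not a valid zip archive; the download is likely truncated")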

How can I input a prompt when I use multiple GPUs?

Hello, I use 4 V100 GPUs and load the 30B model. I want to modify the example.py code to input my own prompts, but it does not work. My code is as follows:

user_input = input("please enter your prompts (Ctrl+C to exit): ")
prompts = [user_input]
print("prompts", prompts)

It hangs before the print statement. How can I solve this?
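
A hedged sketch of one common fix (my addition, not from the thread): under torch.distributed every rank runs the script, but only rank 0 is attached to the terminal, so a bare input() blocks the other ranks forever. Reading the prompt on rank 0 and broadcasting it keeps all ranks in sync; this assumes the process group is already initialized, as example.py does.

    import torch.distributed as dist

    def read_prompt(local_rank: int) -> str:
        # Only rank 0 talks to the terminal; every rank receives its string.
        payload = [None]
        if local_rank == 0:
            payload[0] = input("please enter your prompts (Ctrl+C to exit): ")
        dist.broadcast_object_list(payload, src=0)
        return payload[0]

    # prompts = [read_prompt(local_rank)]  # local_rank from setup_model_parallel()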

Error when running model for inference: ModuleNotFoundError: No module named 'transformers.models.llama'

Hi,

I am trying to run inference with pyllama using the quantized 4-bit model on Google Colab, but I get the error below after the model loads successfully:

(The command to run inference is:
!python pyllama/quant_infer.py --wbits 4 --load drive/MyDrive/pyllama/llama-7b-4bit.pt --text "the general theory of relativity states that" --max_length 24 --cuda cuda:0)

mod,126,transformers.models.llama.tokenization_llama: ModuleNotFoundError: No module named 'transformers.models.llama'

At:
  <frozen importlib._bootstrap>(973): _find_and_load_unlocked
  <frozen importlib._bootstrap>(991): _find_and_load
  <frozen importlib._bootstrap>(1014): _gcd_import
  <frozen importlib._bootstrap>(219): _call_with_frames_removed
  <frozen importlib._bootstrap>(961): _find_and_load_unlocked
  <frozen importlib._bootstrap>(991): _find_and_load
  <frozen importlib._bootstrap>(1014): _gcd_import
  /usr/lib/python3.8/importlib/__init__.py(127): import_module
  <string>(2): <module>
  /usr/local/lib/python3.8/dist-packages/hiq/base.py(491): _h
  /usr/local/lib/python3.8/dist-packages/hiq/base.py(426): enable_hiq
  /usr/local/lib/python3.8/dist-packages/hiq/base.py(160): __init__
  /usr/local/lib/python3.8/dist-packages/hiq/base.py(722): __init__
  pyllama/quant_infer.py(6): main
  pyllama/quant_infer.py(25): <module>

🦉 transformers.models.llama.tokenization_llama.LLaMATokenizer.encode is not traced('NoneType' object has no attribute 'LLaMATokenizer')
⌛️ Loading model from drive/MyDrive/pyllama/llama-7b-4bit.pt...
✅ Model from drive/MyDrive/pyllama/llama-7b-4bit.pt is loaded successfully.
Traceback (most recent call last):
File "pyllama/quant_infer.py", line 25, in
main()
File "pyllama/quant_infer.py", line 19, in main
hiq.mod("llama.llama_infer").run(args)
File "/usr/local/lib/python3.8/dist-packages/hiq/base.py", line 375, in __x
s.handle_exception(f_name, e)
File "/usr/local/lib/python3.8/dist-packages/hiq/utils.py", line 493, in __y
r = f(s, *args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/hiq/base.py", line 353, in handle_exception
raise e
File "/usr/local/lib/python3.8/dist-packages/hiq/base.py", line 368, in __x
result = call_decorated(
File "/usr/local/lib/python3.8/dist-packages/hiq/hiq_utils.py", line 326, in call_decorated
return f(*args, **kwargs)
File "", line 27, in __run_quant
File "", line 11, in __run_quant
File "/content/pyllama/llama/llama_infer.py", line 75, in run
tokenizer = AutoTokenizer.from_pretrained(args.model)
File "/usr/local/lib/python3.8/dist-packages/hiq/base.py", line 375, in __x
s.handle_exception(f_name, e)
File "/usr/local/lib/python3.8/dist-packages/hiq/utils.py", line 493, in __y
r = f(s, *args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/hiq/base.py", line 353, in handle_exception
raise e
File "/usr/local/lib/python3.8/dist-packages/hiq/base.py", line 368, in __x
result = call_decorated(
File "/usr/local/lib/python3.8/dist-packages/hiq/hiq_utils.py", line 326, in call_decorated
return f(*args, **kwargs)
File "", line 27, in __from_pretrained
File "", line 11, in __from_pretrained
File "/usr/local/lib/python3.8/dist-packages/transformers/models/auto/tokenization_auto.py", line 676, in from_pretrained
raise ValueError(
ValueError: Tokenizer class LLaMATokenizer does not exist or is not currently imported.

Sorry, I can't run it

(llama) -bash-4.2$ python inference.py --ckpt_dir ./models/7B --tokenizer_path ./models/tokenizer.model
Traceback (most recent call last):
File "/home/ycshu_wlxy/kingingwang/pyllama-main/inference.py", line 67, in
run(
File "/home/ycshu_wlxy/kingingwang/pyllama-main/inference.py", line 47, in run
generator = load(ckpt_dir, tokenizer_path, local_rank, world_size, max_seq_len, max_batch_size)
File "/home/ycshu_wlxy/kingingwang/pyllama-main/inference.py", line 22, in load
checkpoint = torch.load(ckpt_path, map_location="cpu")
File "/home/ycshu_wlxy/.conda/envs/llama/lib/python3.10/site-packages/torch/serialization.py", line 789, in load
return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
File "/home/ycshu_wlxy/.conda/envs/llama/lib/python3.10/site-packages/torch/serialization.py", line 1131, in _load
result = unpickler.load()
File "/home/ycshu_wlxy/.conda/envs/llama/lib/python3.10/site-packages/torch/serialization.py", line 1101, in persistent_load
load_tensor(dtype, nbytes, key, _maybe_decode_ascii(location))
File "/home/ycshu_wlxy/.conda/envs/llama/lib/python3.10/site-packages/torch/serialization.py", line 1079, in load_tensor
storage = zip_file.get_storage_from_record(name, numel, torch.UntypedStorage).storage().untyped()
RuntimeError: PytorchStreamReader failed reading file data/22: invalid header or archive is corrupted
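
A hedged verification sketch (my addition; file names follow the official download layout, where each model directory ships a checklist.chk of MD5 sums): "invalid header or archive is corrupted" almost always means a bad shard, so re-hashing it tells you whether to re-download.

    import hashlib

    def md5(path: str, chunk: int = 1 << 20) -> str:
        # Stream the file so a 13 GB shard doesn't need to fit in RAM.
        h = hashlib.md5()
        with open(path, "rb") as f:
            while block := f.read(chunk):
                h.update(block)
        return h.hexdigest()

    # Compare against the matching line in ./models/7B/checklist.chk
    print(md5("./models/7B/consolidated.00.pth"))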

ValueError: Tokenizer class LLaMATokenizer does not exist or is not currently imported.

python -m llama.llama_quant decapoda-research/llama-7b-hf c4 --wbits 2 --save pyllama-7B2b.pt
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████| 33/33 [00:12<00:00, 2.68it/s]
Found cached dataset json (/home/jjjj/.cache/huggingface/datasets/allenai___json/allenai--c4-6fbe877195f42de5/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
Found cached dataset json (/home/jjjj/.cache/huggingface/datasets/allenai___json/allenai--c4-efc3d4f4606f44bd/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
Traceback (most recent call last):
File "/miniconda3/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/jjjj/miniconda3/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/jjjj/Project/00.TextGen/pyllama/llama/llama_quant.py", line 474, in
run()
File "/home/jjjj/Project/00.TextGen/pyllama/llama/llama_quant.py", line 437, in run
dataloader, testloader = get_loaders(
File "/home/jjjj/miniconda3/lib/python3.10/site-packages/gptq/datautils.py", line 112, in get_loaders
return get_c4(nsamples, seed, seqlen, model, tokenizer)
File "/home/jjjj/miniconda3/lib/python3.10/site-packages/gptq/datautils.py", line 67, in get_c4
tokenizer = tokenizer or AutoTokenizer.from_pretrained(model, use_fast=False)
File "/home/jjjj/miniconda3/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 655, in from_pretrained
raise ValueError(
ValueError: Tokenizer class LLaMATokenizer does not exist or is not currently imported.
This error might be caused by LLaMATokenizer having been renamed to LlamaTokenizer. Where should I make the modification?
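
A hedged workaround sketch (my addition; the path is illustrative): newer transformers releases renamed the class to LlamaTokenizer, while the decapoda-research checkpoint's tokenizer_config.json still says LLaMATokenizer. Patching that config in the downloaded or cached model directory is a common fix.

    import json
    import pathlib

    # Point this at the tokenizer_config.json inside the model directory.
    cfg = pathlib.Path("llama-7b-hf/tokenizer_config.json")
    data = json.loads(cfg.read_text())
    if data.get("tokenizer_class") == "LLaMATokenizer":
        data["tokenizer_class"] = "LlamaTokenizer"  # match the renamed class
        cfg.write_text(json.dumps(data, indent=2))
        print("patched tokenizer_class")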

Downloading gets stuck in an infinite loop

When trying to download the models, I get stuck in an infinite loop.

(screenshot of the repeating download output, 2023-03-27)

This repeats once every second until I terminate the program.

Environment:

  • pyllama version: commit 321d475f01c88e179c8a30d68b5281e2caca5b07 (HEAD -> main, tag: v0.0.9, origin/main, origin/HEAD)
  • OS: macOS 13.2.1
  • Hardware: Apple M1 Max
  • After installation, the following packages had to be installed manually:
    • transformers
      • This was missing from requirements.txt
      • Version 4.27.3
      • Command used: pip install transformers
    • py-itree
      • The PyPI wheel was incompatible with the M1 architecture
      • Work-around instructions: https://pypi.org/project/py-itree/
      • Command used: pip uninstall py-itree ; pip install https://github.com/juncongmoo/itree/archive/refs/tags/tag-bf9f3aada064acf3ce4db6fc58ed2e744caee0a3.tar.gz

Already quantized to 4-bit and got the model pyllama-7B4b.pt, but it cannot run on an RTX 3080. Reports torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 86.00 MiB (GPU 0; 10.00 GiB total capacity; 9.24 GiB already allocated)

The error is as follows:
python webapp_single.py --ckpt_dir $CKPT_DIR --tokenizer_path $TOKENIZER_PATH
Traceback (most recent call last):
File "/home/xxxx/chatllama/pyllama/apps/gradio/webapp_single.py", line 80, in
generator = load(
File "/home/u/chatllama/pyllama/apps/gradio/webapp_single.py", line 42, in load
model = Transformer(model_args)
File "/home/xxxx/miniconda3/envs/chatllama/lib/python3.10/site-packages/llama/model_single.py", line 199, in init
self.layers.append(TransformerBlock(layer_id, params))
File "/home/xxxx/miniconda3/envs/chatllama/lib/python3.10/site-packages/llama/model_single.py", line 167, in init
self.feed_forward = FeedForward(
File "/home/xxxx/miniconda3/envs/chatllama/lib/python3.10/site-packages/llama/model_single.py", line 154, in init
self.w3 = nn.Linear(dim, hidden_dim, bias=False)
File "/home/xxxx/miniconda3/envs/chatllama/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 96, in init
self.weight = Parameter(torch.empty((out_features, in_features), **factory_kwargs))
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 86.00 MiB (GPU 0; 10.00 GiB total capacity; 9.24 GiB already allocated; 0 bytes free; 9.25 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

du -sh pyllama-7B4b.pt
3.6G pyllama-7B4b.pt
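
A hedged mitigation sketch (my addition; the value is illustrative): the error message itself suggests PYTORCH_CUDA_ALLOC_CONF. Setting it before torch touches CUDA can reduce fragmentation, though it will not help if the model genuinely needs more memory than the 10 GiB the card has.

    import os

    # Must be set before the first CUDA allocation is made.
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

    import torch  # imported afterwards so the allocator sees the setting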

Quantize Original LLaMA Model Files

A bit confused here. In README.md, users are asked to download the LLaMA model files first, but the quantization examples use decapoda-research/llama-7b-hf. How do I quantize the downloaded LLaMA model files (for example, consolidated.00.pth for 7B)?

python -m llama.llama_quant decapoda-research/llama-7b-hf c4 --wbits 4 --groupsize 128 --save pyllama-7B4b.pt
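
A hedged sketch (my addition; paths are illustrative, and it assumes a transformers release that bundles the LLaMA converter, i.e. 4.28 or later): one common route is converting the raw consolidated.*.pth shards to the Hugging Face layout first, then pointing llama.llama_quant at the converted directory instead of decapoda-research/llama-7b-hf.

    import subprocess
    import sys

    # --input_dir is the folder containing tokenizer.model and the 7B/ directory.
    subprocess.run(
        [
            sys.executable, "-m",
            "transformers.models.llama.convert_llama_weights_to_hf",
            "--input_dir", "./llama_weights",
            "--model_size", "7B",
            "--output_dir", "./llama-7b-hf",
        ],
        check=True,
    )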

Error trying to quantize 7B model to 8-bit

When running:
python -m llama.llama_quant decapoda-research/llama-7b-hf c4 --wbits 8 --save pyllama-7B8b.pt
I get this error:
OSError: Unable to load weights from pytorch checkpoint file for '/home/jima/.cache/huggingface/hub/models--decapoda-research--llama-7b-hf/snapshots/5f98eefcc80e437ef68d457ad7bf167c2c6a1348/pytorch_model-00002-of-00033.bin' at '/home/jima/.cache/huggingface/hub/models--decapoda-research--llama-7b-hf/snapshots/5f98eefcc80e437ef68d457ad7bf167c2c6a1348/pytorch_model-00002-of-00033.bin'. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.
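
A hedged recovery sketch (my addition; repo and file name are taken from the traceback above): this OSError on a cached shard usually means the cached file is corrupted or only partially downloaded, not that it is a TF checkpoint. Force-fetching just that shard often resolves it.

    from huggingface_hub import hf_hub_download

    hf_hub_download(
        repo_id="decapoda-research/llama-7b-hf",
        filename="pytorch_model-00002-of-00033.bin",
        force_download=True,  # ignore the corrupt cached copy and re-download
    )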

Quantization with "groupsize" makes the results completely wrong.

Hi,

I'm quantizing the models following the README, and there's one common thread when using the groupsize parameter: in each case the perplexity goes through the roof and the results are completely wrong.
For example, quantizing the 7B model with 4 bits, perplexity:

wikitext2: 7.462815284729004
ptb:       11.122198104858398
c4:        8.211784362792969

And the same model with 4 bits and --groupsize 128:

wikitext2: 243848.546875
ptb:       309488.53125
c4:        240030.015625

And the results for input What's the Earth?:

  • 4b:
🦙: What's the Earth?
So what's the earth? It's a planet.
Which one? Well, the one that revolves around the sun.
Now that's true, but what does that mean?
  • 4b, group size of 128:
🦙: What's the Earth?örtfitolly Alburd Tob fitpaunity Tobżyurd girlsurd fitattanattan�ört SE�ży girlsolly Podpois Siegunityunityollyź�éliollyört Nationpois Pod girls finalepoisazineattan

Any idea what's going on?

If this matters, I'm using Python 3.8 on Ubuntu 22.04 running in WSL.

Running web_server.py on a multi-GPU instance

Hello. I started an 8x A100 80G instance on Google Cloud and can't start the 65B model:

root@llama:/pyllama/apps/flask# python3 web_server.py --ckpt_dir /var/llama/65B --tokenizer_path /var/llama/tokenizer.model
Traceback (most recent call last):
  File "/pyllama/apps/flask/web_server.py", line 101, in <module>
    generator = init_generator(
  File "/pyllama/apps/flask/web_server.py", line 88, in init_generator
    local_rank, world_size = setup_model_parallel()
  File "/pyllama/apps/flask/web_server.py", line 39, in setup_model_parallel
    dist.init_process_group("nccl")
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/distributed_c10d.py", line 754, in init_process_group
    store, rank, world_size = next(rendezvous_iterator)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/rendezvous.py", line 236, in _env_rendezvous_handler
    rank = int(_get_env_or_raise("RANK"))
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/rendezvous.py", line 221, in _get_env_or_raise
    raise _env_error(env_var)
ValueError: Error initializing torch.distributed using env:// rendezvous: environment variable RANK expected, but not set
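
A hedged sketch (my addition): init_process_group("nccl") with the default env:// rendezvous expects the launcher to set RANK, WORLD_SIZE, MASTER_ADDR and MASTER_PORT, which torchrun does automatically (e.g. torchrun --nproc_per_node 8 web_server.py ...). If the script must run under plain python3, the variables can be provided before the call; the values below are illustrative for a single process.

    import os

    os.environ.setdefault("RANK", "0")
    os.environ.setdefault("WORLD_SIZE", "1")
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    # ...after which dist.init_process_group("nccl") can rendezvous via env://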

ModuleNotFoundError: No module named 'quant_cuda'

Traceback (most recent call last):
  File "/home/orion/AI-Horde-Worker/llama.cpp/pyllama/llama/llama_quant.py", line 6, in <module>
    from gptq import (
  File "/home/orion/.local/lib/python3.10/site-packages/gptq/__init__.py", line 9, in <module>
    from .gptq import GPTQ
  File "/home/orion/.local/lib/python3.10/site-packages/gptq/gptq.py", line 5, in <module>
    from .quant import quantize
  File "/home/orion/.local/lib/python3.10/site-packages/gptq/quant.py", line 4, in <module>
    from quant_cuda import matvmul2, matvmul3, matvmul4, matvmul8, matvmul16
ModuleNotFoundError: No module named 'quant_cuda'

I can't find anything about this online and have no idea what's going on. nvidia-smi output:

$ nvidia-smi
Sat Mar 18 19:49:11 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02              Driver Version: 530.30.02    CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3060 Ti      On | 00000000:09:00.0 Off |                  N/A |
|  0%   45C    P8               23W / 200W|     64MiB /  8192MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1245      G   /usr/lib/xorg/Xorg                           56MiB |
|    0   N/A  N/A      1437      G   /usr/bin/gnome-shell                          6MiB |
+---------------------------------------------------------------------------------------+

Download takes forever

Stuck at "downloading file to llama_7B/7B/consolidated.00.pth" for several hours. I checked the size of the model folder, it's around 6.6GB. The size stays constant.

The following are script outputs.

❤️ Resume download is supported. You can ctrl-c and rerun the program to resume the downloading
Downloading tokenizer...
✅ llama_7B/tokenizer.model
✅ llama_7B/tokenizer_checklist.chk
tokenizer.model: OK
Downloading 7B
downloading file to llama_7B/7B/consolidated.00.pth ...please wait for a few minutes ...
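
A hedged stall-check sketch (my addition; the path matches the log above, and the full 7B shard is roughly 13.5 GB, so 6.6 GB is about half done): sampling the partial file's size twice shows whether bytes are still arriving or the transfer is dead and worth a Ctrl-C plus resume.

    import os
    import time

    path = "llama_7B/7B/consolidated.00.pth"
    before = os.path.getsize(path)
    time.sleep(30)
    after = os.path.getsize(path)
    if after == before:
        print("no bytes in 30s: transfer looks stalled; ctrl-c and rerun to resume")
    else:
        print(f"still downloading at about {(after - before) / 30 / 1e6:.1f} MB/s")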

Quantize 7B model to 8-bit --> "Killed"

Getting this issue:

python -m llama.llama_quant decapoda-research/llama-7b-hf c4 --wbits 8 --save pyllama-7B8b.pt
Loading checkpoint shards:  64%|████████████████████████████████████████                       | 21/33 [00:11<00:05,  2.32it/s]
Killed

Any ideas? It seems to consistently fail at 64%.
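
A hedged diagnostic sketch (my addition; psutil is a third-party package): a bare "Killed" while loading shards is usually the kernel OOM killer, since loading the fp16 7B checkpoint needs on the order of 13-14 GB of RAM. Checking available memory (or dmesg for an oom-kill line) confirms it.

    import psutil  # pip install psutil

    avail_gb = psutil.virtual_memory().available / 1e9
    print(f"available RAM: {avail_gb:.1f} GB")
    if avail_gb < 14:
        print("likely too little free RAM for the fp16 7B shards; add swap or RAM")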

hiq-python installation problem

My process

  • anaconda powershell

  • python 3.10

  • pip install pyllama

  • git clone https://github.com/juncongmoo/pyllama.git

  • cd pyllama

  • run inference => failure

  • pip install -r requirements.txt

  • get the following error

    Using cached hiq_python-1.0.0-py3-none-any.whl (49 kB)
    ERROR: Cannot install hiq-python==1.0.0, hiq-python==1.0.1, hiq-python==1.0.2, hiq-python==1.0.3, hiq-python==1.0.4, hiq-python==1.0.5, hiq-python==1.1.0, hiq-python==1.1.1, hiq-python==1.1.2, hiq-python==1.1.3, hiq-python==1.1.4, hiq-python==1.1.5, hiq-python==1.1.6, hiq-python==1.1.7 and hiq-python==1.1.8 because these package versions have conflicting dependencies.

The conflict is caused by:
hiq-python 1.1.8 depends on py-itree
hiq-python 1.1.7 depends on py-itree
hiq-python 1.1.6 depends on py-itree
hiq-python 1.1.5 depends on py-itree
hiq-python 1.1.4 depends on py-itree
hiq-python 1.1.3 depends on py-itree
hiq-python 1.1.2 depends on py-itree
hiq-python 1.1.1 depends on py-itree~=0.0.15
hiq-python 1.1.0 depends on py-itree~=0.0.15
hiq-python 1.0.5 depends on py-itree~=0.0.15
hiq-python 1.0.4 depends on py-itree~=0.0.15
hiq-python 1.0.3 depends on py-itree~=0.0.15
hiq-python 1.0.2 depends on py-itree~=0.0.14
hiq-python 1.0.1 depends on py-itree~=0.0.14
hiq-python 1.0.0 depends on py-itree~=0.0.14

To fix this you could try to:

  1. loosen the range of package versions you've specified
  2. remove package versions to allow pip attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts

Things I've tried

  • there are no listed version requirements in requirements.txt to loosen
  • uninstall, reinstall pyllama, hiq
  • uninstall, reinstall torch
  • pip install py-itree => "ERROR: No matching distribution found for py-itree"

My current thought is that Python 3.10 is too new for these packages?

Next steps: recreate the environment with Python 3.8, which I've seen referenced around, and try again.

World size AssertionError

I'm trying to run the 7B model on my single-GPU server, and I get this error:

Traceback (most recent call last):
  File "inference.py", line 82, in <module>
    run(
  File "inference.py", line 50, in run
    generator = load(
  File "inference.py", line 17, in load
    assert world_size == len(
AssertionError: Loading a checkpoint for MP=0 but world size is 1

I used the community way to download the model files.

Where can I modify the MP setting, or do I have to run it on multiple GPUs?
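
A hedged sketch of the likely cause (my addition; the path is illustrative): in the loader, MP is the number of *.pth shards found in --ckpt_dir, so "MP=0" means the glob matched nothing: the directory is wrong or the download is incomplete, not that multiple GPUs are required (7B is a single shard). Listing the shards confirms it.

    from pathlib import Path

    ckpt_dir = Path("./models/7B")  # same value as --ckpt_dir
    shards = sorted(ckpt_dir.glob("*.pth"))
    print(f"found {len(shards)} shard(s) in {ckpt_dir}:", [p.name for p in shards])
    # For 7B, exactly one consolidated.00.pth should appear here.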

Model mismatch for 13B

CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node 2 webapp.py --ckpt_dir ../../../llama/ckpt/13B/ --tokenizer_path ../../../llama/ckpt/tokenizer.model
WARNING:torch.distributed.run:


Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.


initializing model parallel with size 2
initializing ddp with size 1
initializing pipeline with size 1
Traceback (most recent call last):
File "webapp.py", line 95, in
generator = load(
File "webapp.py", line 56, in load
model.load_state_dict(checkpoint, strict=False)
File "/data/anaconda3/envs/pyllama/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1671, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Transformer:
size mismatch for tok_embeddings.weight: copying a param with shape torch.Size([32000, 2560]) from checkpoint, the shape in current model is torch.Size([32000, 5120]).
size mismatch for layers.0.attention.wq.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.0.attention.wk.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.0.attention.wv.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.0.attention.wo.weight: copying a param with shape torch.Size([5120, 2560]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.0.feed_forward.w1.weight: copying a param with shape torch.Size([6912, 5120]) from checkpoint, the shape in current model is torch.Size([13824, 5120]).
size mismatch for layers.0.feed_forward.w2.weight: copying a param with shape torch.Size([5120, 6912]) from checkpoint, the shape in current model is torch.Size([5120, 13824]).
size mismatch for layers.0.feed_forward.w3.weight: copying a param with shape torch.Size([6912, 5120]) from checkpoint, the shape in current model is torch.Size([13824, 5120]).
[... the same seven attention/feed_forward mismatches repeat for layers 1 through 31; log truncated ...]
size mismatch for layers.31.attention.wk.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.31.attention.wv.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.31.attention.wo.weight: copying a param with shape torch.Size([5120, 2560]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.31.feed_forward.w1.weight: copying a param with shape torch.Size([6912, 5120]) from checkpoint, the shape in current model is torch.Size([13824, 5120]).
size mismatch for layers.31.feed_forward.w2.weight: copying a param with shape torch.Size([5120, 6912]) from checkpoint, the shape in current model is torch.Size([5120, 13824]).
size mismatch for layers.31.feed_forward.w3.weight: copying a param with shape torch.Size([6912, 5120]) from checkpoint, the shape in current model is torch.Size([13824, 5120]).
size mismatch for layers.32.attention.wq.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.32.attention.wk.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.32.attention.wv.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.32.attention.wo.weight: copying a param with shape torch.Size([5120, 2560]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.32.feed_forward.w1.weight: copying a param with shape torch.Size([6912, 5120]) from checkpoint, the shape in current model is torch.Size([13824, 5120]).
size mismatch for layers.32.feed_forward.w2.weight: copying a param with shape torch.Size([5120, 6912]) from checkpoint, the shape in current model is torch.Size([5120, 13824]).
size mismatch for layers.32.feed_forward.w3.weight: copying a param with shape torch.Size([6912, 5120]) from checkpoint, the shape in current model is torch.Size([13824, 5120]).
size mismatch for layers.33.attention.wq.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.33.attention.wk.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.33.attention.wv.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.33.attention.wo.weight: copying a param with shape torch.Size([5120, 2560]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.33.feed_forward.w1.weight: copying a param with shape torch.Size([6912, 5120]) from checkpoint, the shape in current model is torch.Size([13824, 5120]).
size mismatch for layers.33.feed_forward.w2.weight: copying a param with shape torch.Size([5120, 6912]) from checkpoint, the shape in current model is torch.Size([5120, 13824]).
size mismatch for layers.33.feed_forward.w3.weight: copying a param with shape torch.Size([6912, 5120]) from checkpoint, the shape in current model is torch.Size([13824, 5120]).
size mismatch for layers.34.attention.wq.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.34.attention.wk.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.34.attention.wv.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.34.attention.wo.weight: copying a param with shape torch.Size([5120, 2560]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.34.feed_forward.w1.weight: copying a param with shape torch.Size([6912, 5120]) from checkpoint, the shape in current model is torch.Size([13824, 5120]).
size mismatch for layers.34.feed_forward.w2.weight: copying a param with shape torch.Size([5120, 6912]) from checkpoint, the shape in current model is torch.Size([5120, 13824]).
size mismatch for layers.34.feed_forward.w3.weight: copying a param with shape torch.Size([6912, 5120]) from checkpoint, the shape in current model is torch.Size([13824, 5120]).
size mismatch for layers.35.attention.wq.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.35.attention.wk.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.35.attention.wv.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.35.attention.wo.weight: copying a param with shape torch.Size([5120, 2560]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.35.feed_forward.w1.weight: copying a param with shape torch.Size([6912, 5120]) from checkpoint, the shape in current model is torch.Size([13824, 5120]).
size mismatch for layers.35.feed_forward.w2.weight: copying a param with shape torch.Size([5120, 6912]) from checkpoint, the shape in current model is torch.Size([5120, 13824]).
size mismatch for layers.35.feed_forward.w3.weight: copying a param with shape torch.Size([6912, 5120]) from checkpoint, the shape in current model is torch.Size([13824, 5120]).
size mismatch for layers.36.attention.wq.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.36.attention.wk.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.36.attention.wv.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.36.attention.wo.weight: copying a param with shape torch.Size([5120, 2560]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.36.feed_forward.w1.weight: copying a param with shape torch.Size([6912, 5120]) from checkpoint, the shape in current model is torch.Size([13824, 5120]).
size mismatch for layers.36.feed_forward.w2.weight: copying a param with shape torch.Size([5120, 6912]) from checkpoint, the shape in current model is torch.Size([5120, 13824]).
size mismatch for layers.36.feed_forward.w3.weight: copying a param with shape torch.Size([6912, 5120]) from checkpoint, the shape in current model is torch.Size([13824, 5120]).
size mismatch for layers.37.attention.wq.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.37.attention.wk.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.37.attention.wv.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.37.attention.wo.weight: copying a param with shape torch.Size([5120, 2560]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.37.feed_forward.w1.weight: copying a param with shape torch.Size([6912, 5120]) from checkpoint, the shape in current model is torch.Size([13824, 5120]).
size mismatch for layers.37.feed_forward.w2.weight: copying a param with shape torch.Size([5120, 6912]) from checkpoint, the shape in current model is torch.Size([5120, 13824]).
size mismatch for layers.37.feed_forward.w3.weight: copying a param with shape torch.Size([6912, 5120]) from checkpoint, the shape in current model is torch.Size([13824, 5120]).
size mismatch for layers.38.attention.wq.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.38.attention.wk.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.38.attention.wv.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.38.attention.wo.weight: copying a param with shape torch.Size([5120, 2560]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.38.feed_forward.w1.weight: copying a param with shape torch.Size([6912, 5120]) from checkpoint, the shape in current model is torch.Size([13824, 5120]).
size mismatch for layers.38.feed_forward.w2.weight: copying a param with shape torch.Size([5120, 6912]) from checkpoint, the shape in current model is torch.Size([5120, 13824]).
size mismatch for layers.38.feed_forward.w3.weight: copying a param with shape torch.Size([6912, 5120]) from checkpoint, the shape in current model is torch.Size([13824, 5120]).
size mismatch for layers.39.attention.wq.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.39.attention.wk.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.39.attention.wv.weight: copying a param with shape torch.Size([2560, 5120]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.39.attention.wo.weight: copying a param with shape torch.Size([5120, 2560]) from checkpoint, the shape in current model is torch.Size([5120, 5120]).
size mismatch for layers.39.feed_forward.w1.weight: copying a param with shape torch.Size([6912, 5120]) from checkpoint, the shape in current model is torch.Size([13824, 5120]).
size mismatch for layers.39.feed_forward.w2.weight: copying a param with shape torch.Size([5120, 6912]) from checkpoint, the shape in current model is torch.Size([5120, 13824]).
size mismatch for layers.39.feed_forward.w3.weight: copying a param with shape torch.Size([6912, 5120]) from checkpoint, the shape in current model is torch.Size([13824, 5120]).
size mismatch for output.weight: copying a param with shape torch.Size([16000, 5120]) from checkpoint, the shape in current model is torch.Size([32000, 5120]).`
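
Note: every mismatched checkpoint shape above is exactly half of the model's along one axis (2560 vs 5120, 6912 vs 13824, 16000 vs 32000), which is what loading a single shard of a model-parallel checkpoint into a full-width model looks like; the 13B weights ship as two shards, consolidated.00.pth and consolidated.01.pth, and both must be loaded (or merged). A quick, purely illustrative check that all shards are present (the path is a placeholder):

    from pathlib import Path

    ckpt_dir = Path("path/to/13B")  # placeholder checkpoint directory
    shards = sorted(ckpt_dir.glob("consolidated.*.pth"))
    print(f"{len(shards)} shard(s): {[p.name for p in shards]}")  # expect 2 for 13B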

ModuleNotFoundError: No module named 'quant_cuda'

I got this error when running "!python3 -m llama.llama_quant --help" on Google Colab:

Traceback (most recent call last):
  File "/usr/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.9/dist-packages/llama/llama_quant.py", line 6, in <module>
    from gptq import (
  File "/usr/local/lib/python3.9/dist-packages/gptq/__init__.py", line 9, in <module>
    from .gptq import GPTQ
  File "/usr/local/lib/python3.9/dist-packages/gptq/gptq.py", line 5, in <module>
    from .quant import quantize
  File "/usr/local/lib/python3.9/dist-packages/gptq/quant.py", line 4, in <module>
    from quant_cuda import matvmul2, matvmul3, matvmul4, matvmul8, matvmul16
ModuleNotFoundError: No module named 'quant_cuda'
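
Note: the last frame is the giveaway. The gptq Python package is installed, but its compiled CUDA extension is not. A minimal sanity check (this only verifies the import; it assumes nothing about how the kernel is meant to be built):

    # Check whether the compiled kernel that gptq/quant.py imports is available.
    try:
        import quant_cuda
        print("quant_cuda found:", quant_cuda.__file__)
    except ImportError as err:
        # The CUDA kernel was never compiled/installed in this environment
        # (on Colab this typically means the build step did not run).
        print("quant_cuda missing:", err)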

M1 inference

I can run the original llama (with minimal changes) and llama.cpp on my MacBook M1 Max. I think it would be great if I could use pyllama on the same hardware too.

No module named "transformers" error

When I try to run "python -m llama.download --model_size 7B", it says the python command doesn't exist, so I have to use the "python3" command; but once I run "python3 -m llama.download --model_size 7B", all these errors appear:
(screenshot of the errors attached)
Can someone help me figure out what is wrong?

Struggle with training LLaMA with a single GPU using both PT v1 and v2

Hi,
I love your code base and want to try training LLaMA on a single GPU. The code I use is here: https://github.com/juncongmoo/pyllama/blob/main/llama/model_single.py.
However, I'm stuck on an error. The message shows:

    self.tok_embeddings = nn.Embedding(params.vocab_size, params.dim)
  File "/home/linh/anaconda3/envs/a/lib/python3.9/site-packages/torch/nn/modules/sparse.py", line 139, in __init__
    self.weight = Parameter(torch.empty((num_embeddings, embedding_dim), **factory_kwargs))
RuntimeError: Trying to create tensor with negative dimension -1: [-1, 512]

Can you help me fix/test this code?

Thanks in advance.
Linh
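
Note: in the reference code, ModelArgs.vocab_size defaults to -1 and is normally overwritten from the tokenizer before the Transformer is constructed, which would explain the [-1, 512] embedding. A minimal sketch along those lines (the path is a placeholder, and this is a guess at the fix, not a verified one):

    from llama import ModelArgs, Tokenizer, Transformer

    tokenizer = Tokenizer(model_path="path/to/tokenizer.model")  # placeholder
    params = ModelArgs(max_seq_len=512, max_batch_size=1)
    params.vocab_size = tokenizer.n_words  # without this, nn.Embedding sees (-1, dim)
    model = Transformer(params)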

torch.cuda.OutOfMemoryError: CUDA out of memory

Thanks for making this repo! I was looking to run this on my own hardware and this is helping me do just that.

I first tried to run inference with Facebook's own instructions, but I was getting a memory error. I tried a few other modifications, but they did not work either.

Finally, I came to this repository to try and fix my problem. I'm still getting the same error, however.

Error:

Traceback (most recent call last):
  File "/mnt/FILEZ/Files/Downloads/Media/llama/inference.py", line 67, in <module>
    run(
  File "/mnt/FILEZ/Files/Downloads/Media/llama/inference.py", line 48, in run
    generator = load(ckpt_dir, tokenizer_path, local_rank, world_size, max_seq_len, max_batch_size)
  File "/mnt/FILEZ/Files/Downloads/Media/llama/inference.py", line 32, in load
    model = Transformer(model_args)
  File "/mnt/FILEZ/Files/Downloads/Media/llama/llama/model_single.py", line 196, in __init__
    self.layers.append(TransformerBlock(layer_id, params))
  File "/mnt/FILEZ/Files/Downloads/Media/llama/llama/model_single.py", line 170, in __init__
    self.feed_forward = FeedForward(
  File "/mnt/FILEZ/Files/Downloads/Media/llama/llama/model_single.py", line 152, in __init__
    self.w2 = nn.Linear(
  File "/home/musa/.local/share/anaconda3/envs/llama/lib/python3.9/site-packages/torch/nn/modules/linear.py", line 96, in __init__
    self.weight = Parameter(torch.empty((out_features, in_features), **factory_kwargs))
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 86.00 MiB (GPU 0; 11.75 GiB total capacity; 11.50 GiB already allocated; 11.12 MiB free; 11.50 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I have been consistently seeing this error everytime that I've tried to run inference and I'm not sure how to fix it.

I run the inference with this command: python inference.py --ckpt_dir ./llama-dl/7B --tokenizer_path ./llama-dl/tokenizer.model

My Specs:

CPU: Intel i5 11500
GPU: 12GB Nvidia 3060
RAM: 16 GB

With these specs it seems I should be able to run this version of inference but it still does not work.

Before running the program I ran the free command:

               total        used        free      shared  buff/cache   available
Mem:        15173760      668404    12962088         584     1543268    14165152
Swap:       15605752      550436    15055316

So I definitely have more than the 8 GB of RAM shown in the README.

I would really appreciate your help, thanks!
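
Note (hedged): a 7B model in fp16 needs roughly 13-14 GB for the weights alone, so a 12 GB card is tight regardless, and quantized inference may be needed. Still, the allocator hint in the error message itself is cheap to try:

    import os

    # Must be set before the first CUDA allocation (i.e. before torch touches
    # the GPU); 128 is just a common starting value for this tuning knob.
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"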

Share your evaluation results

We evaluate LLaMA using 100 examples of the SQuAD dataset with the Open-evals framework, which extends OpenAI's Evals to different language models. We take the sentence immediately following the prompt as LLaMA's output and use include accuracy as the metric to measure its performance.

For a model completion a and a reference list of correct answers B:
include: any([(a in b) for b in B])

model             squad(100)
alpaca-lora-7b    0.88
llama-7b          0.63
gpt-3.5-turbo     0.9
text-davinci-003  0.87
text-davinci-002  0.66
text-davinci-001  0.58
ada               0.35
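
For concreteness, here is the metric above as runnable Python (a direct transcription of the definition, where a is the model completion and B the list of reference answers):

    def include(a: str, B: list[str]) -> bool:
        # Correct if the completion occurs inside any reference answer.
        return any(a in b for b in B)

    assert include("1947", ["August 1947", "1947"]) is True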

Unknown cuda error

Traceback (most recent call last):
  File "/home/orion/AI-Horde-Worker/llama.cpp/pyllama/inference.py", line 82, in <module>
    run(
  File "/home/orion/AI-Horde-Worker/llama.cpp/pyllama/inference.py", line 50, in run
    generator = load(
  File "/home/orion/AI-Horde-Worker/llama.cpp/pyllama/inference.py", line 33, in load
    model = Transformer(model_args)
  File "/home/orion/AI-Horde-Worker/llama.cpp/pyllama/llama/model_single.py", line 195, in __init__
    self.tok_embeddings = nn.Embedding(params.vocab_size, params.dim)
  File "/home/orion/.local/lib/python3.10/site-packages/torch/nn/modules/sparse.py", line 142, in __init__
    self.weight = Parameter(torch.empty((num_embeddings, embedding_dim), **factory_kwargs),
  File "/home/orion/.local/lib/python3.10/site-packages/torch/cuda/__init__.py", line 247, in _lazy_init
    torch._C._cuda_init()
RuntimeError: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 804: forward compatibility was attempted on non supported HW
python3 pyllama/inference.py --ckpt_dir models/7B/ --tokenizer_path models/tokenizer.model

Environment: Ubuntu 22, CUDA 12.1, RTX 3060 Ti.
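
Note: error 804 ("forward compatibility was attempted on non supported HW") usually points at a driver/runtime mismatch, e.g. a PyTorch build targeting a newer CUDA than the installed NVIDIA driver supports, rather than at pyllama itself. A quick way to see what the wheel was built against:

    import torch

    print(torch.version.cuda)         # CUDA version the installed wheel targets
    print(torch.cuda.is_available())  # False when the driver cannot serve it

Comparing that with the driver's supported CUDA version reported by nvidia-smi should confirm or rule this out.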

Killed

Hello all, I installed the project's requirements, but when I try to execute the following command:

python -m llama.llama_quant decapoda-research/llama-7b-hf c4 --wbits 2 --save pyllama-7B2b.pt

I just get the message "Killed". Could you help me pin down and fix the issue? Thanks.

run inference.py and it reports a 'model parallel group is not initialized' error

I have called torch.distributed.init_process_group and I still get this error:

Traceback (most recent call last):
  File "inference.py", line 67, in <module>
    run(
  File "inference.py", line 48, in run
    generator = load(ckpt_dir, tokenizer_path, local_rank, world_size, max_seq_len, max_batch_size)
  File "inference.py", line 32, in load
    model = Transformer(model_args)
  File "/usr/local/lib/python3.8/dist-packages/llama/model.py", line 205, in __init__
    self.tok_embeddings = ParallelEmbedding(
  File "/usr/local/lib/python3.8/dist-packages/fairscale/nn/model_parallel/layers.py", line 186, in __init__
    world_size = get_model_parallel_world_size()
  File "/usr/local/lib/python3.8/dist-packages/fairscale/nn/model_parallel/initialize.py", line 152, in get_model_parallel_world_size
    return torch.distributed.get_world_size(group=get_model_parallel_group())
  File "/usr/local/lib/python3.8/dist-packages/fairscale/nn/model_parallel/initialize.py", line 128, in get_model_parallel_group
    assert _MODEL_PARALLEL_GROUP is not None, "model parallel group is not initialized"
AssertionError: model parallel group is not initialized

How can I solve this? Many thanks.
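
Note: the assertion is raised inside fairscale. init_process_group only creates the default process group, while ParallelEmbedding looks up a separate model-parallel group that only initialize_model_parallel creates. A minimal setup sketch, mirroring what Meta's reference example.py does (LOCAL_RANK and WORLD_SIZE are the variables torchrun sets):

    import os
    import torch
    from fairscale.nn.model_parallel.initialize import initialize_model_parallel

    def setup_model_parallel():
        local_rank = int(os.environ.get("LOCAL_RANK", -1))
        world_size = int(os.environ.get("WORLD_SIZE", -1))
        torch.distributed.init_process_group("nccl")
        initialize_model_parallel(world_size)  # creates the model-parallel group
        torch.cuda.set_device(local_rank)
        return local_rank, world_size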

Model does not split for 65B

I have 8 80 GB A100 GPUs. I can't run this project correctly, although I can run the official example.py.

torchrun --nproc_per_node 8 webapp.py --ckpt_dir /nvme/syx/llama/model/65B/65B/ --tokenizer_path /nvme/syx/llama/model/tokenizer.model

Output:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 344.00 MiB (GPU 0; 79.20 GiB total capacity; 77.97 GiB already allocated; 297.25 MiB free; 77.97 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Vanilla pytorch LLaMA implementation

Hey, great work with pyllama.
I may be wrong, but I noticed that your code checks whether the system has the same number of GPUs as the checkpoint has shards (like here).
If that's the case, it means you can only run the 65B version if you have 8 GPUs, but this is not necessary.

Here you can find a vanilla pytorch implementation of LLaMA and a weights conversion script that you can use to run LLaMA using as many (or as few) GPUs as you want https://github.com/galatolofederico/vanilla-llama

No module named 'hiq'

G:\ai\pyllama>python inference.py --ckpt_dir G:\model\7B --tokenizer_path G:\model/tokenizer.model
Traceback (most recent call last):
  File "G:\ai\pyllama\inference.py", line 6, in <module>
    from llama import ModelArgs, Transformer, Tokenizer, LLaMA
  File "G:\ai\pyllama\llama\__init__.py", line 5, in <module>
    from .model_single import ModelArgs, Transformer
  File "G:\ai\pyllama\llama\model_single.py", line 8, in <module>
    import hiq
ModuleNotFoundError: No module named 'hiq'
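
Note: hiq is one of pyllama's dependencies, so the import chain above dies as soon as model_single.py is loaded. Reinstalling the requirements should bring it in; hedged, since package naming can change, but on PyPI the project is (as of writing) published as hiq-python:

    # pip install hiq-python
    import hiq  # should succeed once the dependency is installed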

"torch.cuda.OutOfMemoryError: CUDA out of memory" when I'm *not* out of memory

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 64.00 MiB (GPU 0; 12.00 GiB total capacity; 2.60 GiB already allocated; 8.36 GiB free; 2.62 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

woaah. I'm out of memory already? 8.36 GiB free and I can't allocate 64.00 MiB?

Meaningless Prediction in 13B 2bit

I have quantized the 13B model to 2-bit by executing:

python -m llama.llama_quant decapoda-research/llama-13b-hf c4 --wbits 2 --save pyllama-13B2b.pt

After quantization, when I run the test inference, the output seems completely random:

python quant_infer.py --model decapoda-research/llama-13b-hf --wbits 2 --load ../pyllama-13B2b.pt --text "the meaning of life is" --max_length 24 --cuda cuda:0

(screenshot attached: Screenshot from 2023-03-24 22-44-32)

ModuleNotFoundError: No module named 'llama.hf'

Trying to run:
python -m llama.llama_quant decapoda-research/llama-7b-hf c4 --wbits 8 --save pyllama-7B8b.pt

Got an error:
Traceback (most recent call last):
File "/home/user/miniconda3/envs/transformers/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/user/miniconda3/envs/transformers/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/user/miniconda3/envs/transformers/lib/python3.10/site-packages/llama/llama_quant.py", line 16, in
from llama.hf.modeling_llama import LLaMAForC
ModuleNotFoundError: No module named 'llama.hf'

AttributeError: module 'numpy' has no attribute 'array'

(tf) C:\Users\James>python -m llama.download
Traceback (most recent call last):
  File "C:\Users\James\anaconda3\envs\tf\lib\runpy.py", line 188, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "C:\Users\James\anaconda3\envs\tf\lib\runpy.py", line 111, in _get_module_details
    __import__(pkg_name)
  File "C:\Users\James\anaconda3\envs\tf\lib\site-packages\llama\__init__.py", line 1, in <module>
    from .generation import LLaMA
  File "C:\Users\James\anaconda3\envs\tf\lib\site-packages\llama\generation.py", line 6, in <module>
    import torch
  File "C:\Users\James\anaconda3\envs\tf\lib\site-packages\torch\__init__.py", line 831, in <module>
    from .functional import *  # noqa: F403
  File "C:\Users\James\anaconda3\envs\tf\lib\site-packages\torch\functional.py", line 7, in <module>
    import torch.backends.opt_einsum as opt_einsum
  File "C:\Users\James\anaconda3\envs\tf\lib\site-packages\torch\backends\opt_einsum\__init__.py", line 9, in <module>
    import opt_einsum as _opt_einsum  # type: ignore[import]
  File "C:\Users\James\anaconda3\envs\tf\lib\site-packages\opt_einsum\__init__.py", line 5, in <module>
    from . import blas
  File "C:\Users\James\anaconda3\envs\tf\lib\site-packages\opt_einsum\blas.py", line 7, in <module>
    from . import helpers
  File "C:\Users\James\anaconda3\envs\tf\lib\site-packages\opt_einsum\helpers.py", line 14, in <module>
    _sizes = np.array([2, 3, 4, 5, 4, 3, 2, 6, 5, 4, 3, 2, 5, 7, 4, 3, 2, 3, 4])
AttributeError: module 'numpy' has no attribute 'array'

Error when downloading models

Hi,

Here are the errors when I try to download from my Mac M1:

python3 -m llama.download        
Traceback (most recent call last):
  File "/Users/paulo/Library/Python/3.9/lib/python/site-packages/itree/__init__.py", line 5, in <module>
    from . import _itree
ImportError: cannot import name '_itree' from partially initialized module 'itree' (most likely due to a circular import) (/Users/paulo/Library/Python/3.9/lib/python/site-packages/itree/__init__.py)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/runpy.py", line 188, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/runpy.py", line 111, in _get_module_details
    __import__(pkg_name)
  File "/Users/paulo/Library/Python/3.9/lib/python/site-packages/llama/__init__.py", line 4, in <module>
    from .model_single import ModelArgs, Transformer
  File "/Users/paulo/Library/Python/3.9/lib/python/site-packages/llama/model_single.py", line 8, in <module>
    import hiq
  File "/Users/paulo/Library/Python/3.9/lib/python/site-packages/hiq/__init__.py", line 57, in <module>
    from .tree import (
  File "/Users/paulo/Library/Python/3.9/lib/python/site-packages/hiq/tree.py", line 9, in <module>
    import itree
  File "/Users/paulo/Library/Python/3.9/lib/python/site-packages/itree/__init__.py", line 7, in <module>
    import _itree
ImportError: dlopen(/Users/paulo/Library/Python/3.9/lib/python/site-packages/_itree.cpython-39-darwin.so, 0x0002): tried: '/Users/paulo/Library/Python/3.9/lib/python/site-packages/_itree.cpython-39-darwin.so' (mach-o file, but is an incompatible architecture (have (x86_64), need (arm64e)))

Error Downloading Models from Community on Windows

I have cloned the repo and installed all requirements, including CMake and itree (based on one of the reported issues), but I still run into the following traceback when trying to download the model via:

python -m llama.download
or
python -m llama.download --folder .\models\

Exception has occurred: FileNotFoundError
[Errno 2] No such file or directory: '/tmp/error.njkfo9xztnqw.log'
  File "C:\Users\majmo\Git\pyllama\llama\download.py", line 17, in download
    retcode = hiq.execute_cmd(cmd, verbose=True, shell=True, runtime_output=True)
  File "C:\Users\majmo\Git\pyllama\llama\download.py", line 87, in <module>
    download(args)
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/error.njkfo9xztnqw.log'

I have tried to see what I can change, but it was not clear what hiq actually does when executing line 17!
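
Note: the path in the error is the tell. The helper writes its log to a hard-coded POSIX /tmp, which does not exist on Windows, hence the FileNotFoundError before any downloading happens. Purely illustrative:

    import os
    import tempfile

    print(os.path.isdir("/tmp"))   # False on Windows, so the open() above fails
    print(tempfile.gettempdir())   # the portable temp dir, e.g. C:\Users\...\Temp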

"KeyError: 'llama'"

When debugging the quantization code, I received an error message saying "KeyError: 'llama'", and upgrading transformers did not help.

    dataloader, testloader = get_loaders(
        args.dataset, # C4
        nsamples=args.nsamples,
        seed=args.seed,
        # model=args.model,
        model='D:\\SPACE_Research_AI\\QutaModel_TransformerBased\\modelCk\\models--decapoda-research--llama-7b-hf\\'
              'snapshots\\5f98eefcc80e437ef68d457ad7bf167c2c6a1348',
        seqlen=model.seqlen,
    )
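
Note (hedged): a KeyError: 'llama' here usually means the installed transformers predates native LLaMA support (added around 4.28), so the model_type "llama" in the downloaded config has no registered class; if an upgrade "did not work", the active environment may still be resolving an older copy. A quick check of what is actually imported:

    import transformers

    print(transformers.__version__)  # LLaMA support landed around 4.28.0
    print(transformers.__file__)     # confirms which installation is active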

pyllama/downloads returns empty folders

Hello, when running:

python3 -m llama.download

the command runs almost instantly but only creates empty folders named 7B, 13B, etc.
I also tried specifying --model-size and --folder, with the same result.
