Comments (6)
@LIUKAI0815 Thanks for the feedback. Could you kindly tell me which model you are using? This requires using the official GPTQ-quantized checkpoints from HF.
from tensorrt-llm.
I have the same issue using a quantized Mistral model: TheBloke/Mistral-7B-v0.1-AWQ
@jershi425 I'm using Qwen1.5-14B-Chat.
Has this problem been solved? I hit the same error when using a quantized Mixtral model.
Hi @Mary-Sam, could you please share more details/logs on your issue so we can look into it?
Hi @nv-guomingz,
I ran the following command for the quantized model:
python3 /tensorrt_llm/examples/llama/convert_checkpoint.py --model_dir /model --output_dir /engine --load_model_on_cpu
I am using the latest version, tensorrt_llm==0.9.0.
My model has the following quantization configuration:
{
  "bits": 4,
  "group_size": 128,
  "modules_to_not_convert": ["gate"],
  "quant_method": "awq",
  "version": "gemm",
  "zero_point": true
}
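For context, a quantization config like this lives in the checkpoint's config.json, and checking its quant_method is a quick way to confirm the checkpoint is already quantized before feeding it to a converter. A minimal, self-contained sketch (the JSON is inlined here; in practice you would read it from the model directory, e.g. a hypothetical /model/config.json):

```python
import json

# Inlined copy of the quantization_config shown above, so the example
# is self-contained; normally this would come from /model/config.json.
raw = """
{
  "bits": 4,
  "group_size": 128,
  "modules_to_not_convert": ["gate"],
  "quant_method": "awq",
  "version": "gemm",
  "zero_point": true
}
"""

qcfg = json.loads(raw)

# A quant_method of "awq" (or "gptq") means the weights on disk are already
# packed low-bit tensors, not plain fp16/bf16 weights.
print(qcfg["quant_method"])                 # awq
print(qcfg["bits"], qcfg["group_size"])     # 4 128
```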
And I am getting the following error:
2024-06-03 12:56:17,367 utils.common INFO: [TensorRT-LLM] TensorRT-LLM version: 0.9.0
We suggest you to set `torch_dtype=torch.float16` for better efficiency with AWQ.
Loading checkpoint shards: 100%|██████████| 2/2 [00:01<00:00, 1.42it/s]
Traceback (most recent call last):
  File "/tensorrt_llm/examples/llama/convert_checkpoint.py", line 446, in <module>
    main()
  File "/tensorrt_llm/examples/llama/convert_checkpoint.py", line 438, in main
    convert_and_save_hf(args)
  File "/tensorrt_llm/examples/llama/convert_checkpoint.py", line 375, in convert_and_save_hf
    execute(args.workers, [convert_and_save_rank] * world_size, args)
  File "/tensorrt_llm/examples/llama/convert_checkpoint.py", line 397, in execute
    f(args, rank)
  File "/tensorrt_llm/examples/llama/convert_checkpoint.py", line 362, in convert_and_save_rank
    llama = LLaMAForCausalLM.from_hugging_face(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/model.py", line 244, in from_hugging_face
    llama = convert.from_hugging_face(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/convert.py", line 1192, in from_hugging_face
    weights = load_weights_from_hf(config=config,
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/convert.py", line 1296, in load_weights_from_hf
    weights = convert_hf_llama(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/convert.py", line 964, in convert_hf_llama
    convert_layer(l)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/convert.py", line 646, in convert_layer
    q_weight = get_weight(model_params, prefix + 'self_attn.q_proj', dtype)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/convert.py", line 399, in get_weight
    if config[prefix + '.weight'].dtype != dtype:
KeyError: 'model.layers.0.self_attn.q_proj.weight'
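The KeyError is consistent with how AWQ checkpoints are laid out: the converter's get_weight looks up a dense `...self_attn.q_proj.weight` tensor, while an AWQ (gemm) shard typically stores packed `qweight`/`qzeros`/`scales` tensors per projection and no plain `.weight`. A minimal sketch of that mismatch, with a toy dict standing in for the loaded state dict (the key layout is an assumption based on the AutoAWQ format, not taken from TensorRT-LLM itself):

```python
# Toy slice of an AWQ-quantized checkpoint's state dict: each projection has
# packed int4 weights plus per-group zeros/scales, but no dense '.weight'.
awq_state_dict = {
    "model.layers.0.self_attn.q_proj.qweight": "packed int4 tensor",
    "model.layers.0.self_attn.q_proj.qzeros": "per-group zero points",
    "model.layers.0.self_attn.q_proj.scales": "per-group scales",
}

def get_weight(params, prefix):
    # Mirrors the failing lookup in convert.py, which assumes an unquantized
    # HF checkpoint where every projection has a dense '.weight' entry.
    return params[prefix + ".weight"]

try:
    get_weight(awq_state_dict, "model.layers.0.self_attn.q_proj")
except KeyError as e:
    print("KeyError:", e)  # the same key the traceback above reports
```

In other words, this conversion path expects an unquantized HF checkpoint, which matches the maintainer's note above that only specific official quantized checkpoint layouts are supported.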