Comments (6)

jershi425 avatar jershi425 commented on July 17, 2024

@LIUKAI0815 Thanks for the feedback. Could you kindly tell me which model you are using? This requires using the official GPTQ-quantized checkpoints from HF.
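
One quick way to check which quantization format a local HF checkpoint actually uses is to read the quantization_config in its config.json. A minimal sketch, assuming the checkpoint lives in /model (a placeholder path):

# Print the quant_method recorded in a local HF checkpoint's config.json.
# Official GPTQ checkpoints report "gptq"; AWQ checkpoints report "awq".
import json
from pathlib import Path

config = json.loads(Path("/model/config.json").read_text())
quant_cfg = config.get("quantization_config", {})
print(quant_cfg.get("quant_method"))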

RoslinAdama avatar RoslinAdama commented on July 17, 2024

I have the same issue using a quantized Mistral model: TheBloke/Mistral-7B-v0.1-AWQ.

LIUKAI0815 avatar LIUKAI0815 commented on July 17, 2024

@jershi425 I'm using Qwen1.5-14B-Chat.

Mary-Sam avatar Mary-Sam commented on July 17, 2024

Has this problem been solved? I have the same error when using a quantized Mixtral model.

nv-guomingz avatar nv-guomingz commented on July 17, 2024

> Has this problem been solved? I have the same error when using a quantized Mixtral model.

Hi @Mary-Sam, could you please share more details/logs about your issue so we can look into it?

Mary-Sam avatar Mary-Sam commented on July 17, 2024

Hi @nv-guomingz
I ran the following command for the quantized model:
python3 /tensorrt_llm/examples/llama/convert_checkpoint.py --model_dir /model --output_dir /engine --load_model_on_cpu

I am using the latest version, tensorrt_llm==0.9.0.

My model has the following quantization configuration:

{
    "bits": 4,
    "group_size": 128,
    "modules_to_not_convert": [
        "gate"
    ],
    "quant_method": "awq",
    "version": "gemm",
    "zero_point": true
}

And I am getting the following error:

2024-06-03 12:56:17,367 utils.common INFO:[TensorRT-LLM] TensorRT-LLM version: 0.9.0
2024-06-03 12:56:17,367 utils.common INFO:We suggest you to set `torch_dtype=torch.float16` for better efficiency with AWQ.
2024-06-03 12:56:17,367 utils.common INFO:0.9.0
Loading checkpoint shards: 100%|██████████| 2/2 [00:01<00:00,  1.42it/s]
2024-06-03 12:56:17,367 utils.common INFO:Traceback (most recent call last):
2024-06-03 12:56:17,367 utils.common INFO:  File "/tensorrt_llm/examples/llama/convert_checkpoint.py", line 446, in <module>
2024-06-03 12:56:17,367 utils.common INFO:    main()
2024-06-03 12:56:17,367 utils.common INFO:  File "/tensorrt_llm/examples/llama/convert_checkpoint.py", line 438, in main
2024-06-03 12:56:17,367 utils.common INFO:    convert_and_save_hf(args)
2024-06-03 12:56:17,367 utils.common INFO:  File "/tensorrt_llm/examples/llama/convert_checkpoint.py", line 375, in convert_and_save_hf
2024-06-03 12:56:17,367 utils.common INFO:    execute(args.workers, [convert_and_save_rank] * world_size, args)
2024-06-03 12:56:17,367 utils.common INFO:  File "/tensorrt_llm/examples/llama/convert_checkpoint.py", line 397, in execute
2024-06-03 12:56:17,367 utils.common INFO:    f(args, rank)
2024-06-03 12:56:17,367 utils.common INFO:  File "/tensorrt_llm/examples/llama/convert_checkpoint.py", line 362, in convert_and_save_rank
2024-06-03 12:56:17,367 utils.common INFO:    llama = LLaMAForCausalLM.from_hugging_face(
2024-06-03 12:56:17,367 utils.common INFO:  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/model.py", line 244, in from_hugging_face
2024-06-03 12:56:17,367 utils.common INFO:    llama = convert.from_hugging_face(
2024-06-03 12:56:17,367 utils.common INFO:  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/convert.py", line 1192, in from_hugging_face
2024-06-03 12:56:17,367 utils.common INFO:    weights = load_weights_from_hf(config=config,
2024-06-03 12:56:17,367 utils.common INFO:  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/convert.py", line 1296, in load_weights_from_hf
2024-06-03 12:56:17,367 utils.common INFO:    weights = convert_hf_llama(
2024-06-03 12:56:17,367 utils.common INFO:  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/convert.py", line 964, in convert_hf_llama
2024-06-03 12:56:17,367 utils.common INFO:    convert_layer(l)
2024-06-03 12:56:17,367 utils.common INFO:  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/convert.py", line 646, in convert_layer
2024-06-03 12:56:17,367 utils.common INFO:    q_weight = get_weight(model_params, prefix + 'self_attn.q_proj', dtype)
2024-06-03 12:56:17,367 utils.common INFO:  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/convert.py", line 399, in get_weight
2024-06-03 12:56:17,367 utils.common INFO:    if config[prefix + '.weight'].dtype != dtype:
2024-06-03 12:56:17,367 utils.common INFO:KeyError: 'model.layers.0.self_attn.q_proj.weight'
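
For context on the KeyError: AWQ "gemm" checkpoints typically store each quantized linear layer as packed qweight/qzeros/scales tensors rather than a float weight tensor, while get_weight in convert.py looks up 'model.layers.0.self_attn.q_proj.weight'. A minimal sketch for confirming this by listing the tensor names in one checkpoint shard (the shard filename below is a placeholder; adjust it to your files):

from safetensors import safe_open

# List the tensors stored for the first attention layer's q_proj.
# For an AWQ (gemm) checkpoint this usually prints qweight/qzeros/scales
# and no plain '.weight' entry, which matches the KeyError above.
with safe_open("/model/model-00001-of-00002.safetensors", framework="pt") as f:
    for name in f.keys():
        if "layers.0.self_attn.q_proj" in name:
            print(name)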
