Comments (16)
@AGI-player Thanks for your feedback. This is a known issue and we will fix it soon.
It is likely caused by the difference between GQA (32B) and MHA (the other model sizes).
With this modification, the code runs normally.
from tensorrt-llm.
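The GQA-versus-MHA distinction mentioned above can be checked directly from a model's config.json: under grouped-query attention the number of key/value heads is smaller than the number of query heads, while under multi-head attention the two are equal. A minimal sketch (the head counts below are illustrative assumptions, not read from an actual checkpoint):

```python
def attention_kind(num_attention_heads: int, num_key_value_heads: int) -> str:
    """Classify the attention layout from two config.json fields."""
    if num_key_value_heads == num_attention_heads:
        return "MHA"  # every query head has its own KV head
    return "GQA"      # KV heads are shared across groups of query heads

# Illustrative values: a 32B-class GQA config with 40 query heads
# sharing 8 KV heads, versus a smaller MHA config with equal counts.
print(attention_kind(40, 8))   # GQA
print(attention_kind(32, 32))  # MHA
```

A converter that assumes MHA will mis-shape the fused QKV weights for a GQA checkpoint, which is consistent with the error only appearing for the 32B model.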
ok~
Nice, I'll try it.
I commented out those three lines, but I still get the same error.
Did you set "--qwen_type qwen2"?
I set it, using:
python3 convert_checkpoint.py --model_dir /workspace/model/model/Qwen1.5-32B-Chat/ --output_dir /workspace/model/model/Qwen-32B-trt --dtype float16 --qwen_type qwen2
and it doesn't seem to work.
python3 convert_checkpoint.py \
--model_dir ./Qwen1.5-32B-Chat-GPTQ-Int4/ \
--output_dir ./tllm_checkpoint_1gpu_gptq/ \
--dtype float16 \
--use_weight_only \
--weight_only_precision int4_gptq \
--per_group \
--load_model_on_cpu \
--qwen_type qwen2
This works for me
Thanks, it works for me with Qwen1.5-32B-Chat-GPTQ-Int4 too.
This issue arises because conversion of the non-quantized version of Qwen1.5 is not implemented in "tensorrt_llm/models/qwen/convert.py" or "tensorrt_llm/models/qwen/model.py".
Is this issue fixed now?
No.
@jershi425 is this issue fixed in the latest version?