Hello, I am having trouble reproducing the results on llama-13b. The following error occurs on line 203, in the whitening function:

scaling_matrix_inv = torch.linalg.inv(scaling_diag_matrix)
torch._C._LinAlgError: linalg.inv: The diagonal element 6940 is zero, the inversion could not be completed because the input matrix is singular

How can I solve this problem? Thanks.
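For anyone else hitting this: one workaround that often helps with singular whitening matrices (a sketch, not the authors' fix; the `safe_inverse` name and the damping value are my own assumptions) is to add a small ridge term to the diagonal before inverting, falling back to a pseudo-inverse if the inversion still fails:

```python
import torch

def safe_inverse(scaling_diag_matrix: torch.Tensor, damp: float = 1e-6) -> torch.Tensor:
    # Add a small ridge to the diagonal so near-zero entries become invertible.
    # `damp` is a hypothetical knob; keep it small so the whitening barely changes.
    eye = torch.eye(
        scaling_diag_matrix.shape[0],
        device=scaling_diag_matrix.device,
        dtype=scaling_diag_matrix.dtype,
    )
    try:
        return torch.linalg.inv(scaling_diag_matrix + damp * eye)
    except RuntimeError:
        # torch raises a RuntimeError subclass (_LinAlgError) on singular input;
        # the Moore-Penrose pseudo-inverse is a last-resort fallback.
        return torch.linalg.pinv(scaling_diag_matrix)
```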
Hello!
I was wondering if this code can be adapted to transform a tensor, say (32, 128, 128), into a smaller tensor (8, 64, 64).
Basically, reduce the size of the LLM layer by layer. (A sketch of what I mean is below.)
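As far as I understand, SVD-based compression does not map a weight tensor directly to an arbitrary smaller shape; it replaces each weight matrix with two low-rank factors. A minimal sketch of rank truncation on a single (128, 128) slice (the function name and rank choice are mine, just to illustrate):

```python
import torch

def truncate_rank(W: torch.Tensor, rank: int):
    # Low-rank factorization: W (m x n) ~= A (m x rank) @ B (rank x n).
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]   # absorb singular values into the left factor
    B = Vh[:rank, :]
    return A, B

W = torch.randn(128, 128)
# Note: rank must be below m*n/(m+n) = 64 here for the two factors to hold
# fewer parameters than the original matrix; rank=32 halves the count.
A, B = truncate_rank(W, rank=32)
print((W - A @ B).norm() / W.norm())  # relative reconstruction error
```

So a (32, 128, 128) stack of weights would become 32 pairs of factors rather than an (8, 64, 64) tensor.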
I tried to use the provided scripts to compress LLaMA 2 with a 0.2 compression ratio. The model evaluation script shows a perplexity of 7.2 on wikitext, but the model responses are mostly incoherent. I am getting responses like:
Instruction: tell me about you==\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\ selecting\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
whereas the original model gives decent responses.
Is there any modification needed to the inference script or the tokeniser after model compression? Is there an inference script within the repository?
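While waiting for the authors: if the compression script saves the model and tokenizer together with torch.save (which is what I assume from the repo; the checkpoint path and dictionary keys below are placeholders), a minimal generation sketch would be:

```python
import torch

# Hypothetical path to the checkpoint produced by the compression script.
ckpt = torch.load("compressed_llama2_0.2.pt", map_location="cuda")
model, tokenizer = ckpt["model"], ckpt["tokenizer"]  # assumed layout; adjust to the actual save format
model = model.eval().cuda()

prompt = "Instruction: tell me about you\nResponse:"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```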
Hi, thank you for your reply, but I still get the same problem as mentioned before:
Traceback (most recent call last):
  File "/home/xxx/SVD-LLM/SVDLLM_new.py", line 193, in whitening
    scaling_matrix_inv = torch.linalg.inv(scaling_diag_matrix)
torch._C._LinAlgError: linalg.inv: The diagonal element 6940 is zero, the inversion could not be completed because the input matrix is singular.
My Python environment was built from requirements.txt, and I am running the code on two RTX 3090 GPUs.
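Not a confirmed fix, but in similar cases computing the inversion in float64 has helped me, since float32 accumulation can round small eigenvalues to zero; torch.linalg.cond gives a quick check on how ill-conditioned the matrix actually is (the helper name below is mine):

```python
import torch

def inv_in_double(scaling_diag_matrix: torch.Tensor) -> torch.Tensor:
    # Promote to float64 for the inversion, then cast back to the original dtype.
    m = scaling_diag_matrix.double()
    print("condition number:", torch.linalg.cond(m).item())  # sanity check
    return torch.linalg.inv(m).to(scaling_diag_matrix.dtype)
```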
Firstly, I want to express my gratitude for the fascinating work you've been doing. It's been inspiring.
I've recently come across your paper where you describe the integration of SVD-LLM with GPTQ, and I'm eager to explore the implementation further.
Could you please share the code where you've integrated SVD-LLM with GPTQ as described in the paper?
Your assistance in providing access to this code would be appreciated. Thank you for your time and consideration.