Hi, I am reproducing the LLaMA-7B pruning experiment, but the perplexities after pruning to 70% sparsity with SparseGPT, Wanda, and OWL w. differ from the results in the paper, as shown in the table below.
The second column is my result; the third column is the paper's result.
I wonder whether I missed something, or whether the hyperparameters in this repo differ from the ones used in the paper?
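For what it's worth, small perplexity gaps can also come from the evaluation itself (context length, stride, and which token positions are scored), not only from the pruning hyperparameters. A minimal sketch of the usual computation, assuming per-token negative log-likelihoods are already available (the values below are placeholders, not results from the repo):

```python
import math

def perplexity(token_nlls):
    """Perplexity = exp(mean per-token negative log-likelihood)."""
    return math.exp(sum(token_nlls) / len(token_nlls))

# Scoring the same text in chunks of different lengths changes which
# tokens get full left context, so the averaged NLL (and hence the
# perplexity) can shift slightly between implementations.
nlls = [2.1, 1.8, 2.4, 2.0]  # hypothetical per-token NLLs
print(perplexity(nlls))
```

Checking that both runs use the same sequence length and the same evaluation split is usually the first thing to rule out.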
I have a question about the Layerwise Outlier Distribution (LOD). Why are the values in the LOD larger than 1 in Figure 1? I would expect an outlier ratio to be smaller than 1.
Hey, I wanted to know whether the size of the model stays the same after pruning, or whether it is reduced.
I tried pruning the OPT-125M model, but the saved model is the same size as before, about 250 MB.
Thanks in advance.
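Regarding the question above: unstructured pruning only sets weights to zero; it does not change tensor shapes, so a dense checkpoint stays the same size on disk. A size reduction would require a sparse storage format or structured pruning. A minimal NumPy sketch of this effect (the 70% sparsity and matrix size here are just illustrative values):

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal((512, 512)).astype(np.float32)

# Magnitude pruning: zero out the 70% of entries with smallest |w|.
k = int(0.7 * w.size)
threshold = np.sort(np.abs(w), axis=None)[k]
pruned = np.where(np.abs(w) < threshold, 0.0, w).astype(np.float32)

# Dense storage is unchanged: same shape, same dtype, same bytes.
print(w.nbytes == pruned.nbytes)  # True

# Only the nonzero values (plus their indices) would need storing in
# a sparse format, which is where a size saving could come from.
print(np.count_nonzero(pruned) / pruned.size)  # roughly 0.3
```

So seeing an unchanged 250 MB file after pruning OPT-125M is expected behavior for unstructured sparsity, unless the checkpoint is re-saved in a compressed or sparse format.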