Comments (19)
Hi, is it possible to take this one?
Hello @RedShift51, the task is assigned to you.
Thank you for looking into this issue! Please let us know if you have any questions or require any help.
Hey, what metric value is acceptable for tinyllama/tinyllama-1.1b-step-50k-105b?
Hey,
Similarity metric between the float16 and the int8 weight-compressed tinyllama-1.1b-step-50k-105b model on whowhatbench:
similarity : 0.9628345480671635
Code to reproduce:
import torch
import whowhatbench
from transformers import AutoModelForCausalLM, AutoTokenizer

import nncf

MODEL_ID = "tinyllama/tinyllama-1.1b-step-50k-105b"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.float16, device_map="auto")

# Collect reference outputs of the original float16 model.
evaluator = whowhatbench.Evaluator(base_model=model, tokenizer=tokenizer)

# Calibration sample for data-aware weight compression.
text = "The TinyLlama project aims to pretrain a 1.1B Llama model on 3 trillion tokens."
token = tokenizer(text, max_length=500, return_tensors="pt", truncation=True)
inputs = {"input_ids": token["input_ids"].cuda(), "attention_mask": token["attention_mask"].cuda()}
compressed_model = nncf.compress_weights(model, dataset=nncf.Dataset([inputs]))

# Score the compressed model against the reference outputs.
metrics_per_prompt, metrics = evaluator.score(compressed_model)
metric_of_interest = "similarity"
print(metric_of_interest, ": ", metrics["similarity"][0])
Hi, sorry for the delay, I have reproduced it on a CPU:
import torch
import nncf
import transformers
import whowhatbench
MODEL_ID = "tinyllama/tinyllama-1.1b-step-50k-105b"
tokenizer = transformers.AutoTokenizer.from_pretrained(MODEL_ID)
model = transformers.AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.float16, device_map="cpu")
text = 'The TinyLlama project aims to pretrain a 1.1B Llama model on 3 trillion tokens.'
token = tokenizer(text, max_length=500, return_tensors="pt", truncation=True)
inputs = {"input_ids": token["input_ids"], "attention_mask": token["attention_mask"]}
compressed_model = nncf.compress_weights(model, dataset=nncf.Dataset([inputs]))
evaluator = whowhatbench.Evaluator(base_model=compressed_model, tokenizer=tokenizer)
metrics_per_prompt, metrics = evaluator.score(compressed_model)
print(metrics)
metric_of_interest = "similarity"
print(metric_of_interest, ": ", metrics["similarity"][0])
worst_examples = evaluator.worst_examples(top_k=5, metric=metric_of_interest)
print("Metric: ", metric_of_interest)
The main idea of whowhatbench is to compare the original_model and the compressed_model. But in your code you have compared compressed_model with compressed_model, so, as expected, you get a similarity metric of 1. The intended order is:
# collect outputs of original_model
evaluator = whowhatbench.Evaluator(base_model=model, tokenizer=tokenizer)
# inplace weight model compression
compressed_model = nncf.compress_weights(model, dataset=nncf.Dataset([inputs]))
# collect outputs of compressed model and calculate the similarity metric.
metrics_per_prompt, metrics = evaluator.score(compressed_model)
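For reference, a minimal end-to-end sketch of that flow on CPU, combining the snippet above with the earlier CPU reproduction script (model ID, prompt, and whowhatbench calls are taken from the comments above; int8 is the default mode of nncf.compress_weights):

import torch
import transformers
import whowhatbench

import nncf

MODEL_ID = "tinyllama/tinyllama-1.1b-step-50k-105b"
tokenizer = transformers.AutoTokenizer.from_pretrained(MODEL_ID)
model = transformers.AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.float16, device_map="cpu")

# 1. Collect reference outputs of the original float16 model.
evaluator = whowhatbench.Evaluator(base_model=model, tokenizer=tokenizer)

# 2. Compress the weights (in place for torch models).
text = "The TinyLlama project aims to pretrain a 1.1B Llama model on 3 trillion tokens."
token = tokenizer(text, max_length=500, return_tensors="pt", truncation=True)
inputs = {"input_ids": token["input_ids"], "attention_mask": token["attention_mask"]}
compressed_model = nncf.compress_weights(model, dataset=nncf.Dataset([inputs]))

# 3. Score the compressed model against the reference outputs.
metrics_per_prompt, metrics = evaluator.score(compressed_model)
print("similarity:", metrics["similarity"][0])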
@RedShift51, are you going to continue working on this issue? Do you have any updates?
Removed assignment due to inactivity.
.take
Thank you for looking into this issue! Please let us know if you have any questions or require any help.
@alexsu52 @ksj20 Any updates on this issue? If the assignee isn't going to work on this, I'd be down to take it.
.take
Thank you for looking into this issue! Please let us know if you have any questions or require any help.
@alexsu52 @AlexanderDokuchaev If I add the following code to LMWeightCompression.compress() and then run a benchmark right after it using whowhatbench, how should I store the metrics?
Also, please tell me whether I am going in the right direction; this approach feels a bit odd so far.
class LMWeightCompression(BaseTestPipeline):
    ...

    def compress(self) -> None:
        if self.backend == BackendType.FP32:
            return
        elif self.backend == BackendType.TORCH:
            start_time = time.perf_counter()
            MODEL_ID = "tinyllama/tinyllama-1.1b-step-50k-105b"
            tokenizer = transformers.AutoTokenizer.from_pretrained(MODEL_ID)
            self.model = transformers.AutoModelForCausalLM.from_pretrained(
                MODEL_ID, torch_dtype=torch.float16, device_map="cpu"
            )
            text = "The TinyLlama project aims to pretrain a 1.1B Llama model on 3 trillion tokens."
            token = tokenizer(text, max_length=500, return_tensors="pt", truncation=True)
            inputs = {"input_ids": token["input_ids"], "attention_mask": token["attention_mask"]}
            self.run_info.compression_memory_usage = memory_usage((self._compress_torch, (inputs,)), max_usage=True)
            self.run_info.time_compression = time.perf_counter() - start_time
            return

        print("Weight compression...")
        start_time = time.perf_counter()
        self.run_info.compression_memory_usage = memory_usage(self._compress, max_usage=True)
        self.run_info.time_compression = time.perf_counter() - start_time

    def _compress_torch(self, inputs):
        self.compressed_model = nncf.compress_weights(self.model, dataset=nncf.Dataset([inputs]))

    ...
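One note on the memory measurement: memory_profiler's memory_usage accepts either a callable or a (func, args, kwargs) tuple, so a method that takes arguments, like _compress_torch above, is passed as a tuple rather than being called directly. A minimal, standalone illustration of that pattern (the work function is hypothetical, and memory_profiler is assumed to be installed):

from memory_profiler import memory_usage

def work(n_mib):
    # Hypothetical workload: allocate roughly n_mib MiB so there is something to measure.
    data = bytearray(n_mib * 1024 * 1024)
    return len(data)

# A zero-argument callable can be passed as-is; a function with arguments
# goes in as a (func, args) tuple, as with _compress_torch in the snippet above.
peak_mib = memory_usage((work, (50,)), max_usage=True)
print("peak memory, MiB:", peak_mib)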
> @alexsu52 @AlexanderDokuchaev If I add the following code to LMWeightCompression.compress() and then run a benchmark right after it using whowhatbench, how should I store the metrics?
> Also, please tell me whether I am going in the right direction; this approach feels a bit odd so far.

@alexsu52 @AlexanderDokuchaev following up on the above^
Hi @AdiKsOnDev,
Add a _validate function to LMWeightCompression that will contain a call to the evaluator from whowhatbench. Example of a _validate function: https://github.com/openvinotoolkit/nncf/blob/develop/tests/post_training/pipelines/image_classification_timm.py#L127
Metrics should be stored in self.run_info (see nncf/tests/post_training/pipelines/image_classification_timm.py, lines 170 to 171 at 0b407de).
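A rough sketch of what such a _validate could look like, modeled on the image_classification_timm example; the attribute names on self and run_info used here (model_id, compressed_model, metric_name, metric_value) are assumptions that would need to be adapted to the actual LMWeightCompression pipeline:

def _validate(self) -> None:
    # Sketch only: attribute names are assumptions, not the actual pipeline API.
    # The reference outputs must come from the original float16 model, because
    # nncf.compress_weights modifies a torch model in place; reload it here.
    reference_model = transformers.AutoModelForCausalLM.from_pretrained(
        self.model_id, torch_dtype=torch.float16, device_map="cpu"
    )
    tokenizer = transformers.AutoTokenizer.from_pretrained(self.model_id)

    evaluator = whowhatbench.Evaluator(base_model=reference_model, tokenizer=tokenizer)
    _, metrics = evaluator.score(self.compressed_model)

    self.run_info.metric_name = "Similarity"
    self.run_info.metric_value = metrics["similarity"][0]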
> Hi @AdiKsOnDev,
> Add a _validate function to LMWeightCompression that will contain a call to the evaluator from whowhatbench. Example of a _validate function: https://github.com/openvinotoolkit/nncf/blob/develop/tests/post_training/pipelines/image_classification_timm.py#L127
> Metrics should be stored in self.run_info (see nncf/tests/post_training/pipelines/image_classification_timm.py, lines 170 to 171 at 0b407de).

OK, thanks for the directions.
@AlexanderDokuchaev _validate(self) already exists in LMWeightCompression (Git Blame).
@AlexanderDokuchaev I added the following code for INT_8 support; do you want me to send a PR?
def compress(self) -> None:
    if self.backend == BackendType.FP32:
        return
    elif self.backend == BackendType.TORCH:
        start_time = time.perf_counter()
        tokenizer = transformers.AutoTokenizer.from_pretrained(self.model_id)
        self.model = transformers.AutoModelForCausalLM.from_pretrained(
            self.model_id, torch_dtype=torch.float16, device_map="cpu"
        )
        text = "The TinyLlama project aims to pretrain a 1.1B Llama model on 3 trillion tokens."
        token = tokenizer(text, max_length=500, return_tensors="pt", truncation=True)
        inputs = {"input_ids": token["input_ids"], "attention_mask": token["attention_mask"]}
        self.run_info.compression_memory_usage = memory_usage((self._compress_torch, (inputs,)), max_usage=True)
        self.run_info.time_compression = time.perf_counter() - start_time
        return

    print("Weight compression...")
    start_time = time.perf_counter()
    self.run_info.compression_memory_usage = memory_usage(self._compress, max_usage=True)
    self.run_info.time_compression = time.perf_counter() - start_time

def _compress_torch(self, inputs):
    self.compressed_model = nncf.compress_weights(self.model, dataset=nncf.Dataset([inputs]))