Comments (19)

RedShift51 commented on June 25, 2024

Hi, is it possible to take this one?

alexsu52 commented on June 25, 2024

Hello @RedShift51, the task is assigned to you.

Thank you for looking into this issue! Please let us know if you have any questions or require any help.

RedShift51 commented on June 25, 2024

Hey, what metric value is okay for tinyllama/tinyllama-1.1b-step-50k-105b?

alexsu52 commented on June 25, 2024

Hey,

Similarity metric between the float16 model and the int8 weight-compressed tinyllama-1.1b-step-50k-105b model on whowhatbench:
similarity : 0.9628345480671635

Code to reproduce:

import torch
import whowhatbench
from transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

import nncf

MODEL_ID = "tinyllama/tinyllama-1.1b-step-50k-105b"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.float16, device_map="auto")

# Collect outputs of the original float16 model as the baseline.
evaluator = whowhatbench.Evaluator(base_model=model, tokenizer=tokenizer)

text = "The TinyLlama project aims to pretrain a 1.1B Llama model on 3 trillion tokens."
token = tokenizer(text, max_length=500, return_tensors="pt", truncation=True)
inputs = {"input_ids": token["input_ids"].cuda(), "attention_mask": token["attention_mask"].cuda()}
# In-place int8 weight compression with a one-sample calibration dataset.
compressed_model = nncf.compress_weights(model, dataset=nncf.Dataset([inputs]))

# Compare the compressed model's outputs against the baseline.
metrics_per_prompt, metrics = evaluator.score(compressed_model)

metric_of_interest = "similarity"
print(metric_of_interest, ": ", metrics["similarity"][0])

RedShift51 commented on June 25, 2024

Hi, sorry for the delay, I have reproduced it on a CPU:
[screenshot]

import torch
import nncf
import transformers
import whowhatbench

MODEL_ID = "tinyllama/tinyllama-1.1b-step-50k-105b"

tokenizer = transformers.AutoTokenizer.from_pretrained(MODEL_ID)
model = transformers.AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.float16, device_map="cpu")

text = "The TinyLlama project aims to pretrain a 1.1B Llama model on 3 trillion tokens."
token = tokenizer(text, max_length=500, return_tensors="pt", truncation=True)
inputs = {"input_ids": token["input_ids"], "attention_mask": token["attention_mask"]}

compressed_model = nncf.compress_weights(model, dataset=nncf.Dataset([inputs]))

evaluator = whowhatbench.Evaluator(base_model=compressed_model, tokenizer=tokenizer)
metrics_per_prompt, metrics = evaluator.score(compressed_model)
print(metrics)
metric_of_interest = "similarity"
print(metric_of_interest, ": ", metrics["similarity"][0])

worst_examples = evaluator.worst_examples(top_k=5, metric=metric_of_interest)
print("Metric: ", metric_of_interest)

alexsu52 commented on June 25, 2024

The main idea of whowhatbench is to compare the original_model and the compressed_model. But in your code you compared the compressed_model with itself, so, as expected, you got a similarity metric of 1.

# collect outputs of the original model first
evaluator = whowhatbench.Evaluator(base_model=model, tokenizer=tokenizer)
# in-place weight compression: the model object itself is modified
compressed_model = nncf.compress_weights(model, dataset=nncf.Dataset([inputs]))
# collect outputs of the compressed model and calculate the similarity metric
metrics_per_prompt, metrics = evaluator.score(compressed_model)
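
For completeness, here is a corrected version of the CPU script in full (a sketch assuming the same whowhatbench API as above). The key point is that the Evaluator is constructed on the float16 model before nncf.compress_weights modifies it in place:

import torch
import transformers
import whowhatbench

import nncf

MODEL_ID = "tinyllama/tinyllama-1.1b-step-50k-105b"

tokenizer = transformers.AutoTokenizer.from_pretrained(MODEL_ID)
model = transformers.AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.float16, device_map="cpu")

# Baseline outputs must be collected while the model is still float16.
evaluator = whowhatbench.Evaluator(base_model=model, tokenizer=tokenizer)

text = "The TinyLlama project aims to pretrain a 1.1B Llama model on 3 trillion tokens."
token = tokenizer(text, max_length=500, return_tensors="pt", truncation=True)
inputs = {"input_ids": token["input_ids"], "attention_mask": token["attention_mask"]}

# compress_weights modifies the model in place and returns it.
compressed_model = nncf.compress_weights(model, dataset=nncf.Dataset([inputs]))

metrics_per_prompt, metrics = evaluator.score(compressed_model)
print("similarity: ", metrics["similarity"][0])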

alexsu52 commented on June 25, 2024

@RedShift51, are you going to continue working on this issue? Do you have any updates?

alexsu52 commented on June 25, 2024

Removed assignment due to inactivity.

ksj20 commented on June 25, 2024

.take

github-actions commented on June 25, 2024

Thank you for looking into this issue! Please let us know if you have any questions or require any help.

AdiKsOnDev commented on June 25, 2024

@alexsu52 @ksj20 Any updates on this issue? If the assignee isn't going to work on this, I'd be down to take it.

AdiKsOnDev commented on June 25, 2024

.take

github-actions commented on June 25, 2024

Thank you for looking into this issue! Please let us know if you have any questions or require any help.

AdiKsOnDev commented on June 25, 2024

@alexsu52 @AlexanderDokuchaev If I add the following code to LMWeightCompression.compress() and then run a benchmark right after using whowhatbench, how should I store the metrics?
Also, please tell me if I am going in the right direction; this approach feels a bit odd so far.

class LMWeightCompression(BaseTestPipeline):
    ...

    def compress(self) -> None:
        if self.backend == BackendType.FP32:
            return
        elif self.backend == BackendType.TORCH:
            start_time = time.perf_counter()
            MODEL_ID = "tinyllama/tinyllama-1.1b-step-50k-105b"

            tokenizer = transformers.AutoTokenizer.from_pretrained(MODEL_ID)
            self.model = transformers.AutoModelForCausalLM.from_pretrained(
                MODEL_ID, torch_dtype=torch.float16, device_map="cpu"
            )

            text = "The TinyLlama project aims to pretrain a 1.1B Llama model on 3 trillion tokens."
            token = tokenizer(text, max_length=500, return_tensors="pt", truncation=True)
            inputs = {"input_ids": token["input_ids"], "attention_mask": token["attention_mask"]}

            # Pass (function, args) so that memory_usage itself runs the
            # compression and measures its peak memory; calling the function
            # here would pass its return value (None) instead.
            self.run_info.compression_memory_usage = memory_usage(
                (self._compress_torch, (inputs,)), max_usage=True
            )
            self.run_info.time_compression = time.perf_counter() - start_time

            return

        print("Weight compression...")
        start_time = time.perf_counter()
        self.run_info.compression_memory_usage = memory_usage(self._compress, max_usage=True)
        self.run_info.time_compression = time.perf_counter() - start_time

    def _compress_torch(self, inputs):
        self.compressed_model = nncf.compress_weights(self.model, dataset=nncf.Dataset([inputs]))

...

AdiKsOnDev commented on June 25, 2024

> @alexsu52 @AlexanderDokuchaev If I add the following code to LMWeightCompression.compress() and then run a benchmark right after using whowhatbench, how should I store the metrics?
> Also, please tell me if I am going in the right direction; this approach feels a bit odd so far.

@alexsu52 @AlexanderDokuchaev following up on the above ^

AlexanderDokuchaev commented on June 25, 2024

Hi @AdiKsOnDev

Add a _validate function to LMWeightCompression that will contain a call to the evaluator from whowhatbench.

An example of a _validate function: https://github.com/openvinotoolkit/nncf/blob/develop/tests/post_training/pipelines/image_classification_timm.py#L127

Metrics should be stored in self.run_info:

self.run_info.metric_name = "Acc@1"
self.run_info.metric_value = acc_top1
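
For example, a minimal sketch of such a _validate, assuming the self.model_id attribute used elsewhere in this pipeline and the whowhatbench API shown earlier in this thread (note the pitfall discussed above: the baseline must come from a freshly loaded, uncompressed model):

def _validate(self) -> None:
    # Reload the original float16 model: nncf.compress_weights compressed
    # self.model in place, so it can no longer serve as the baseline.
    base_model = transformers.AutoModelForCausalLM.from_pretrained(
        self.model_id, torch_dtype=torch.float16, device_map="cpu"
    )
    tokenizer = transformers.AutoTokenizer.from_pretrained(self.model_id)

    # Collect baseline outputs, then score the compressed model.
    evaluator = whowhatbench.Evaluator(base_model=base_model, tokenizer=tokenizer)
    metrics_per_prompt, metrics = evaluator.score(self.compressed_model)

    # Store the result in run_info as suggested above.
    self.run_info.metric_name = "Similarity"
    self.run_info.metric_value = metrics["similarity"][0]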

AdiKsOnDev commented on June 25, 2024

> Hi @AdiKsOnDev
>
> Add a _validate function to LMWeightCompression that will contain a call to the evaluator from whowhatbench.
>
> An example of a _validate function: https://github.com/openvinotoolkit/nncf/blob/develop/tests/post_training/pipelines/image_classification_timm.py#L127
>
> Metrics should be stored in self.run_info:
>
> self.run_info.metric_name = "Acc@1"
> self.run_info.metric_value = acc_top1

OK, thanks for the directions.

AdiKsOnDev commented on June 25, 2024

@AlexanderDokuchaev _validate(self) already exists in LMWeightCompression:

[screenshot]

Git blame:

[screenshot]

AdiKsOnDev commented on June 25, 2024

@AlexanderDokuchaev I added the following code for INT8 support; do you want me to send a PR?

def compress(self) -> None:
    if self.backend == BackendType.FP32:
        return
    elif self.backend == BackendType.TORCH:
        start_time = time.perf_counter()

        tokenizer = transformers.AutoTokenizer.from_pretrained(self.model_id)
        self.model = transformers.AutoModelForCausalLM.from_pretrained(
            self.model_id, torch_dtype=torch.float16, device_map="cpu"
        )

        text = "The TinyLlama project aims to pretrain a 1.1B Llama model on 3 trillion tokens."
        token = tokenizer(text, max_length=500, return_tensors="pt", truncation=True)
        inputs = {"input_ids": token["input_ids"], "attention_mask": token["attention_mask"]}

        # Pass (function, args) so that memory_usage runs the compression
        # and records its peak memory.
        self.run_info.compression_memory_usage = memory_usage(
            (self._compress_torch, (inputs,)), max_usage=True
        )
        self.run_info.time_compression = time.perf_counter() - start_time

        return

    print("Weight compression...")
    start_time = time.perf_counter()
    self.run_info.compression_memory_usage = memory_usage(self._compress, max_usage=True)
    self.run_info.time_compression = time.perf_counter() - start_time

def _compress_torch(self, inputs):
    self.compressed_model = nncf.compress_weights(self.model, dataset=nncf.Dataset([inputs]))
