Comments (9)
cc @Vaibhavs10 @sanchit-gandhi
Hi @SuperKogito,
The fine-tuned checkpoints can be used for inference in multiple ways; the simplest is perhaps to use them as part of the ASR pipeline, as shown in the snippet below:
from transformers import pipeline

whisper_asr = pipeline(
    "automatic-speech-recognition",
    model="MODEL_CHECKPOINT_NAME_HERE"
)
whisper_asr("AUDIO_FILE_NAME.mp3")
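If your audio runs longer than Whisper's 30-second context window, the pipeline can also chunk it for you. A minimal sketch, where the chunk_length_s value is an assumption you may want to tune:

from transformers import pipeline

# chunk long audio into 30 s windows before transcription
# (30 s is Whisper's receptive field; chunk_length_s here is an assumed setting)
whisper_asr = pipeline(
    "automatic-speech-recognition",
    model="MODEL_CHECKPOINT_NAME_HERE",
    chunk_length_s=30,
)
print(whisper_asr("AUDIO_FILE_NAME.mp3")["text"])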
If you want more fine-grained control over generation, you can also use the processor + the model directly. For that, you can do something like this:
import torch
from transformers import WhisperForConditionalGeneration, WhisperProcessor

torch.cuda.empty_cache()
device = "cuda" if torch.cuda.is_available() else "cpu"

model = WhisperForConditionalGeneration.from_pretrained("MODEL_CHECKPOINT_NAME_HERE").to(device)
processor = WhisperProcessor.from_pretrained("MODEL_CHECKPOINT_NAME_HERE")

# extract log-Mel features from a 16 kHz audio array (here: one example from a streamed dataset)
inputs = processor.feature_extractor(next(iter(common_voice_es))["audio"]["array"], return_tensors="pt", sampling_rate=16_000).input_features.to(device)

# force the language/task tokens for generation
forced_decoder_ids = processor.get_decoder_prompt_ids(language=LANGUAGE_HERE, task="transcribe")
predicted_ids = model.generate(inputs, max_length=448, forced_decoder_ids=forced_decoder_ids)
processor.tokenizer.batch_decode(predicted_ids, skip_special_tokens=True, normalize=False)[0]
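As an example of that finer control, the same generate call accepts the standard generation parameters; a sketch continuing the snippet above (the num_beams value is an assumption):

# decode with beam search instead of greedy decoding
predicted_ids = model.generate(
    inputs,
    max_length=448,
    num_beams=5,
    forced_decoder_ids=forced_decoder_ids,
)
transcription = processor.tokenizer.batch_decode(predicted_ids, skip_special_tokens=True, normalize=False)[0]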
I created a notebook earlier as part of the event to showcase these inference methods; you can find it here: https://github.com/Vaibhavs10/notebooks/blob/main/Infer_Whisper_🤗transformers_edition.ipynb
To answer your last question about converting Transformers checkpoints to the OpenAI Whisper format: we don't have an officially supported utility for it, but there are some community scripts that can help you do that: https://github.com/bayartsogt-ya/whisper-multiple-hf-datasets
# install multiple_datasets
!pip install git+https://github.com/bayartsogt-ya/whisper-multiple-hf-datasets.git

from multiple_datasets.hub_default_utils import convert_hf_whisper

model_name_or_path = 'openai/whisper-tiny'
whisper_checkpoint_path = './whisper-tiny-checkpoint.pt'

# convert the Hugging Face checkpoint to an OpenAI-format .pt file
convert_hf_whisper(model_name_or_path, whisper_checkpoint_path)

# now transcribe
import whisper

model = whisper.load_model(whisper_checkpoint_path)
result = model.transcribe('loooong_audio_path.wav')  # probably longer than 10 min? hour?
print(result['text'])
Let me know if you have any other questions, happy transcribing! 🤗
Hey @SuperKogito,
I think the problem is in the way you are defining the path to your model. You should be able to infer from the checkpoint directly via the below code:
from transformers import pipeline

whisper_asr = pipeline(
    "automatic-speech-recognition",
    model="whisper-finetuned/checkpoint-40000"
)
whisper_asr("AUDIO_FILE_NAME.mp3")
Just make sure to pass the path to the specific checkpoint so that the pipeline picks up all the required files. This way you also won't need to convert your checkpoint to the OpenAI Whisper format.
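If the pipeline still can't find the tokenizer or feature-extractor files (they are typically written to the parent output_dir, not to the checkpoint folders themselves), one workaround, as a sketch assuming that directory layout, is to load the processor separately and hand its parts to the pipeline:

from transformers import pipeline, WhisperProcessor

# tokenizer/feature-extractor files live in the parent output dir (assumed layout)
processor = WhisperProcessor.from_pretrained("whisper-finetuned")
whisper_asr = pipeline(
    "automatic-speech-recognition",
    model="whisper-finetuned/checkpoint-40000",  # holds config.json + pytorch_model.bin
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
)
whisper_asr("AUDIO_FILE_NAME.mp3")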
Do let me know if it doesn't work.
Hey @SuperKogito!
When we use from_pretrained with a model name or path, we load the weights from this path into our model. So we need to make sure that our model path contains:
- Model weights (pytorch_model.bin)
- Config (config.json)
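A quick way to check whether a given path satisfies this, as a sketch (the checkpoint path below is hypothetical, adjust it to your own):

import os

# hypothetical checkpoint directory
ckpt = "whisper-small-finetuned-de-2023-01-03/checkpoint-40000"
for fname in ("config.json", "pytorch_model.bin"):
    path = os.path.join(ckpt, fname)
    print(fname, "found" if os.path.isfile(path) else "MISSING")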
We'd expect to see the final model weights saved under your output_dir (whisper-small-finetuned-de-2023-01-03) at the end of training. We can see that the weights are saved every save_steps (4000 steps) during training, but the final weights are missing from your output_dir (whisper-small-finetuned-de-2023-01-03).
This could be because trainer.save_model() is only under the control flow for when we resume training from a checkpoint:
# start training
print("start training")
if checkpoint is None:
    train_result = trainer.train()
else:
    print("-> Training from checkpoint")
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
    trainer.save_model()
This means that we only save the final model if we're resuming training from a checkpoint.
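A minimal fix, as a sketch, is to de-indent trainer.save_model() out of the else branch so it runs in both cases:

# start training
print("start training")
if checkpoint is None:
    train_result = trainer.train()
else:
    print("-> Training from checkpoint")
    train_result = trainer.train(resume_from_checkpoint=checkpoint)

# always save the final weights (+ config.json) under output_dir
trainer.save_model()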
In terms of the other files in our directory, there is one file related to the feature extractor:
├── preprocessor_config.json
And several files related to the tokenizer (no need for tokenizer.pt):
├── added_tokens.json
├── merges.txt
├── normalizer.json
├── special_tokens_map.json
├── tokenizer_config.json
└── vocab.json
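Once the final weights and config sit alongside those files, loading should work end-to-end; a quick sanity check, as a sketch using the output_dir above:

from transformers import WhisperForConditionalGeneration, WhisperProcessor

# both calls read from the same directory: weights + config for the model,
# preprocessor/tokenizer files for the processor
model = WhisperForConditionalGeneration.from_pretrained("whisper-small-finetuned-de-2023-01-03")
processor = WhisperProcessor.from_pretrained("whisper-small-finetuned-de-2023-01-03")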
Amazing - happy to hear that @SuperKogito! Enjoy using your fine-tuned model 🤗
Thank you for your response!
Unfortunately, none of these worked with my resulting checkpoints :(
The first two snippets result in the following error:
whisper-small-finetuned-de-2023-01-03 does not appear to have a file named config.json. Checkout 'https://huggingface.co/whisper-small-finetuned-de-2023-01-03/None' for available files.
As for the second, it causes the following:
Whisper(...
...)' is the correct path to a directory containing a config.json file
My checkpoint structure looks as follows:
whisper-small-finetuned-de-2023-01-03/
├── added_tokens.json
├── all_results.json
├── checkpoint-12000
│   ├── config.json
│   ├── optimizer.pt
│   ├── preprocessor_config.json
│   ├── pytorch_model.bin
│   ├── rng_state.pth
│   ├── scaler.pt
│   ├── scheduler.pt
│   ├── trainer_state.json
│   └── training_args.bin
├── checkpoint-16000
│   ├── config.json
│   ├── optimizer.pt
│   ├── preprocessor_config.json
│   ├── pytorch_model.bin
│   ├── rng_state.pth
│   ├── scaler.pt
│   ├── scheduler.pt
│   ├── trainer_state.json
│   └── training_args.bin
├── checkpoint-20000
│   ├── config.json
│   ├── optimizer.pt
│   ├── preprocessor_config.json
│   ├── pytorch_model.bin
│   ├── rng_state.pth
│   ├── scaler.pt
│   ├── scheduler.pt
│   ├── trainer_state.json
│   └── training_args.bin
├── checkpoint-24000
│   ├── config.json
│   ├── optimizer.pt
│   ├── preprocessor_config.json
│   ├── pytorch_model.bin
│   ├── rng_state.pth
│   ├── scaler.pt
│   ├── scheduler.pt
│   ├── trainer_state.json
│   └── training_args.bin
├── checkpoint-28000
│   ├── config.json
│   ├── optimizer.pt
│   ├── preprocessor_config.json
│   ├── pytorch_model.bin
│   ├── rng_state.pth
│   ├── scaler.pt
│   ├── scheduler.pt
│   ├── trainer_state.json
│   └── training_args.bin
├── checkpoint-32000
│   ├── config.json
│   ├── optimizer.pt
│   ├── preprocessor_config.json
│   ├── pytorch_model.bin
│   ├── rng_state.pth
│   ├── scaler.pt
│   ├── scheduler.pt
│   ├── trainer_state.json
│   └── training_args.bin
├── checkpoint-36000
│   ├── config.json
│   ├── optimizer.pt
│   ├── preprocessor_config.json
│   ├── pytorch_model.bin
│   ├── rng_state.pth
│   ├── scaler.pt
│   ├── scheduler.pt
│   ├── trainer_state.json
│   └── training_args.bin
├── checkpoint-4000
│   ├── config.json
│   ├── optimizer.pt
│   ├── preprocessor_config.json
│   ├── pytorch_model.bin
│   ├── rng_state.pth
│   ├── scaler.pt
│   ├── scheduler.pt
│   ├── trainer_state.json
│   └── training_args.bin
├── checkpoint-40000
│   ├── config.json
│   ├── optimizer.pt
│   ├── preprocessor_config.json
│   ├── pytorch_model.bin
│   ├── rng_state.pth
│   ├── scaler.pt
│   ├── scheduler.pt
│   ├── trainer_state.json
│   └── training_args.bin
├── checkpoint-8000
│   ├── config.json
│   ├── optimizer.pt
│   ├── preprocessor_config.json
│   ├── pytorch_model.bin
│   ├── rng_state.pth
│   ├── scaler.pt
│   ├── scheduler.pt
│   ├── trainer_state.json
│   └── training_args.bin
├── eval_results.json
├── merges.txt
├── normalizer.json
├── preprocessor_config.json
├── special_tokens_map.json
├── tokenizer_config.json
├── train_results.json
└── vocab.json
I am not sure why it is not recognizing any of my config.json files. I am also missing a tokenizer.pt; is that normal? Am I doing something wrong when training?
My fine-tuning code is the following:
import os

# redirect the Hugging Face cache
os.environ["HF_HOME"] = "/trainingdata/chris/.cache/huggingface"
os.environ["TRANSFORMERS_CACHE"] = "/trainingdata/chris/.cache/huggingface/hub"

import torch

# specify the gpu to use
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
torch.cuda.device_count()  # prints 1

import evaluate
from transformers import WhisperTokenizer
from transformers import WhisperProcessor
from transformers import WhisperFeatureExtractor
from transformers import WhisperForConditionalGeneration
from datasets import Dataset, load_dataset, DatasetDict, Audio, Features, Value

# prepare feature extractor and tokenizer
feature_extractor = WhisperFeatureExtractor.from_pretrained("openai/whisper-small")
tokenizer = WhisperTokenizer.from_pretrained("openai/whisper-small", language="de", task="transcribe")
processor = WhisperProcessor.from_pretrained("openai/whisper-small", language="de", task="transcribe")


def verify_tokenizer(common_voice):
    input_str = common_voice["train"][0]["sentence"]
    labels = tokenizer(input_str).input_ids
    decoded_with_special = tokenizer.decode(labels, skip_special_tokens=False)
    decoded_str = tokenizer.decode(labels, skip_special_tokens=True)
    print(f"Input: {input_str}")
    print(f"Decoded w/ special: {decoded_with_special}")
    print(f"Decoded w/out special: {decoded_str}")
    print(f"Are equal: {input_str == decoded_str}")


def prepare_dataset(batch):
    # load and resample audio data from 48 to 16kHz
    audio = batch["audio"]
    # compute log-Mel input features from input audio array
    batch["input_features"] = feature_extractor(audio["array"], sampling_rate=audio["sampling_rate"]).input_features[0]
    # encode target text to label ids
    batch["labels"] = tokenizer(batch["sentence"]).input_ids
    return batch


def compute_metrics(pred):
    pred_ids = pred.predictions
    label_ids = pred.label_ids
    # replace -100 with the pad_token_id
    label_ids[label_ids == -100] = tokenizer.pad_token_id
    # we do not want to group tokens when computing the metrics
    pred_str = tokenizer.batch_decode(pred_ids, skip_special_tokens=True)
    label_str = tokenizer.batch_decode(label_ids, skip_special_tokens=True)
    wer = 100 * metric.compute(predictions=pred_str, references=label_str)
    print("WER: ", wer)
    return {"wer": wer}


from dataclasses import dataclass
from typing import Any, Dict, List, Union


@dataclass
class DataCollatorSpeechSeq2SeqWithPadding:
    processor: Any

    def __call__(self, features: List[Dict[str, Union[List[int], torch.Tensor]]]) -> Dict[str, torch.Tensor]:
        # split inputs and labels since they have to be of different lengths and need different padding methods
        # first treat the audio inputs by simply returning torch tensors
        input_features = [{"input_features": feature["input_features"]} for feature in features]
        batch = self.processor.feature_extractor.pad(input_features, return_tensors="pt")
        # get the tokenized label sequences
        label_features = [{"input_ids": feature["labels"]} for feature in features]
        # pad the labels to max length
        labels_batch = self.processor.tokenizer.pad(label_features, return_tensors="pt")
        # replace padding with -100 to ignore loss correctly
        labels = labels_batch["input_ids"].masked_fill(labels_batch.attention_mask.ne(1), -100)
        # if bos token is appended in previous tokenization step,
        # cut bos token here as it's appended later anyways
        if (labels[:, 0] == self.processor.tokenizer.bos_token_id).all().cpu().item():
            labels = labels[:, 1:]
        batch["labels"] = labels
        return batch


# read data
features = Features(
    {
        "audio": Audio(sampling_rate=16000),
        "sentence": Value("string")
    }
)
common_voice = load_dataset(
    'csv', data_files={
        'train': '100k_parsed_eml_train_data.csv',
        'test': '30k_parsed_eml_test_data.csv'
    }
)
print("Loaded data: ", common_voice)

# read audio
common_voice["train"] = common_voice["train"].cast_column("audio", Audio(sampling_rate=16000))
common_voice["test"] = common_voice["test"].cast_column("audio", Audio(sampling_rate=16000))
print("Formatted train data: ", common_voice["train"][0])
print("Formatted test data: ", common_voice["test"][0])

# verify tokenizer
verify_tokenizer(common_voice)

# extract features
common_voice = common_voice.map(prepare_dataset, remove_columns=common_voice.column_names["train"], num_proc=8)
data_collator = DataCollatorSpeechSeq2SeqWithPadding(processor=processor)

# config metrics
metric = evaluate.load("wer")

# import model
print("load model")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
#model.config.forced_decoder_ids = None
model.config.forced_decoder_ids = processor.get_decoder_prompt_ids(language="de", task="transcribe")
model.config.suppress_tokens = []

# define training config
from transformers import Seq2SeqTrainingArguments, Seq2SeqTrainer

output_dir = "./whisper-small-finetuned-de-2023-01-03"
training_args = Seq2SeqTrainingArguments(
    output_dir=output_dir,  # change to a repo name of your choice
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,  # increase by 2x for every 2x decrease in batch size
    learning_rate=1e-5,
    warmup_steps=5000,
    max_steps=40000,
    gradient_checkpointing=True,
    fp16=True,
    evaluation_strategy="steps",
    per_device_eval_batch_size=4,
    predict_with_generate=True,
    generation_max_length=225,
    save_steps=4000,
    eval_steps=4000,
    logging_steps=250,
    logging_dir="logs",
    report_to=["tensorboard"],
    load_best_model_at_end=True,
    metric_for_best_model="wer",
    greater_is_better=False,
)

# config trainer
trainer = Seq2SeqTrainer(
    args=training_args,
    model=model,
    train_dataset=common_voice["train"],
    eval_dataset=common_voice["test"],
    data_collator=data_collator,
    compute_metrics=compute_metrics,
    tokenizer=processor.feature_extractor,
)

# save processor
processor.save_pretrained(training_args.output_dir)

# load checkpoints
from transformers.trainer_utils import get_last_checkpoint

last_checkpoint = get_last_checkpoint(training_args.output_dir)
print("checkpoints: ", last_checkpoint)
checkpoint = last_checkpoint

# start training
print("start training")
if checkpoint is None:
    train_result = trainer.train()
else:
    print("-> Training from checkpoint")
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
    trainer.save_model()

# evaluate
small_train_dataset = common_voice["train"]
small_eval_dataset = common_voice["test"]

# compute train results
metrics = train_result.metrics
max_train_samples = len(small_train_dataset)
metrics["train_samples"] = min(max_train_samples, len(small_train_dataset))

# save train results
trainer.log_metrics("train", metrics)
trainer.save_metrics("train", metrics)

# compute evaluation results
metrics = trainer.evaluate()
max_val_samples = len(small_eval_dataset)
metrics["eval_samples"] = min(max_val_samples, len(small_eval_dataset))

# save evaluation results
trainer.log_metrics("eval", metrics)
trainer.save_metrics("eval", metrics)
I still cannot test the checkpoints directly, but I figured out the conversion issue and rewrote the code to be more user-friendly:
"""
The following code is based on:
- https://github.com/bayartsogt-ya/whisper-multiple-hf-datasets
"""
import re
import sys
import torch
import argparse
from transformers import WhisperForConditionalGeneration
whisper_mappings = {
"layers": "blocks",
"fc1": "mlp.0",
"fc2": "mlp.2",
"final_layer_norm": "mlp_ln",
".self_attn.q_proj": ".attn.query",
".self_attn.k_proj": ".attn.key",
".self_attn.v_proj": ".attn.value",
".self_attn_layer_norm": ".attn_ln",
".self_attn.out_proj": ".attn.out",
".encoder_attn.q_proj": ".cross_attn.query",
".encoder_attn.k_proj": ".cross_attn.key",
".encoder_attn.v_proj": ".cross_attn.value",
".encoder_attn_layer_norm": ".cross_attn_ln",
".encoder_attn.out_proj": ".cross_attn.out",
"decoder.layer_norm.": "decoder.ln.",
"encoder.layer_norm.": "encoder.ln_post.",
"embed_tokens": "token_embedding",
"encoder.embed_positions.weight": "encoder.positional_embedding",
"decoder.embed_positions.weight": "decoder.positional_embedding",
"layer_norm": "ln_post",
}
def format_key(key, verbose=False):
# format replacements
rep_sorted = sorted(whisper_mappings, key=len, reverse=True)
rep_escaped = map(re.escape, rep_sorted)
# Create a big OR regex that matches any of the substrings to replace
pattern = re.compile("|".join(rep_escaped))
# For each match, look up the new string in the replacements, being the key the normalized old string
new_key = pattern.sub(lambda m: whisper_mappings[m.group(0)], key)
# debug
if verbose:
print(f"{key} -> {new_key}")
return new_key
def convert_hf_checkpoints_to_whisper(checkpoints_path, generated_whisper_model_path, verbose):
try:
# load checkpoints
transformer_model = WhisperForConditionalGeneration.from_pretrained(checkpoints_path)
config = transformer_model.config
# build dims
dims = {
"n_mels": config.num_mel_bins,
"n_vocab": config.vocab_size,
"n_audio_ctx": config.max_source_positions,
"n_audio_state": config.d_model,
"n_audio_head": config.encoder_attention_heads,
"n_audio_layer": config.encoder_layers,
"n_text_ctx": config.max_target_positions,
"n_text_state": config.d_model,
"n_text_head": config.decoder_attention_heads,
"n_text_layer": config.decoder_layers,
}
# convert
hf_state_dict = transformer_model.model.state_dict()
whisper_state_dict = { format_key(hf_key, verbose): hf_value for hf_key, hf_value in hf_state_dict.items() }
# save model
torch.save({"dims": dims, "model_state_dict": whisper_state_dict}, generated_whisper_model_path)
print("-> whisper-like model is exported under ", generated_whisper_model_path)
except Exception as e:
print(str(e))
print("ConversionError: could not convert checkpoints.")
def main():
# init parser
parser = argparse.ArgumentParser()
parser.add_argument(
"--hf_checkpoints_path",
type=str,
default=None,
help="csv file with data to use for testing.",
)
parser.add_argument(
"--exported_model_path",
type=str,
default=None,
help="path to whisper model.",
)
parser.add_argument(
"--verbose",
default=False,
help="logs verbosity (if True).",
action="store_true"
)
# check args
args = parser.parse_args()
if not args.hf_checkpoints_path:
print('You need to specify the checkpoints path via "the --hf_checkpoints_path flag."')
sys.exit(1)
if not args.exported_model_path:
print('You need to specify a path for the generated model via "the --exported_model_path flag."')
sys.exit(1)
# export model
convert_hf_checkpoints_to_whisper(args.hf_checkpoints_path, args.exported_model_path, args.verbose)
if __name__ == "__main__":
main()
This can be used as follows:
python export.py --hf_checkpoints_path whisper-finetuned/checkpoint-40000 --exported_model_path finetuned_model.pt
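The exported .pt file can then be loaded with the reference openai-whisper package; a sketch, assuming pip install openai-whisper and a German fine-tune as discussed above:

import whisper

# load_model accepts a local path to a converted checkpoint
model = whisper.load_model("finetuned_model.pt")
result = model.transcribe("audio.wav", language="de")  # the language hint is an assumption
print(result["text"])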
@Vaibhavs10 and @sanchit-gandhi, thank you both for your time and help ❤️
@sanchit-gandhi was right about my code. Fixing that made my checkpoints load correctly :))
Hi @sanchit-gandhi @SuperKogito, can you please share how you fixed the issue? It's still confusing to me.