Comments (10)
Thanks both!
I figured out how to support sharded models and the safetensors format.
@BlahBlah314 You should be able to use your "BlahBlah314/whisper_LargeV3FR_ft-V1" model with the new version (1.14.4)
from whisper-timestamped.
Can you give some code to reproduce?
from transformers import WhisperForConditionalGeneration

model = WhisperForConditionalGeneration.from_pretrained("your_model_dir")
model.save_pretrained("your_model_dir", safe_serialization=False, max_shard_size='10GB')
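As a quick sanity check (a sketch of mine, assuming the usual transformers file layout, not something from this thread), you can verify that the re-saved folder ended up with a single pytorch_model.bin and no shard index:

```python
from pathlib import Path

def has_single_bin(model_dir: str) -> bool:
    # A non-sharded checkpoint has pytorch_model.bin and no
    # pytorch_model.bin.index.json (the index only exists for sharded saves).
    files = {p.name for p in Path(model_dir).iterdir()}
    return "pytorch_model.bin" in files and "pytorch_model.bin.index.json" not in files
```

If this returns False after saving, the chosen max_shard_size was smaller than the model and transformers split the weights into shards.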
What is "your_model_dir" in your case?
You mentioned a "finetuned whisper model on hugging face", so I thought you were using a model hosted on Hugging Face.
Also, there is no whisper_timestamped in your code.
Could you share the code showing how you load the model in whisper_timestamped? (the code that throws an error about a missing "pytorch_model.bin" file, I guess)
I think I found a way to reproduce:
import shutil
from transformers import WhisperForConditionalGeneration
import whisper_timestamped as whisper
audio_file = "XXX.wav" # use an audio file here
shutil.rmtree("tmp_model", ignore_errors=True)
model = WhisperForConditionalGeneration.from_pretrained("qanastek/whisper-tiny-french-cased")
model.save_pretrained("tmp_model", safe_serialization=False, max_shard_size='100MB')
model = whisper.load_model("qanastek/whisper-tiny-french-cased")
expected = whisper.transcribe(model, audio_file)
model = whisper.load_model("tmp_model")
output = whisper.transcribe(model, audio_file)
assert expected == output
The second loading of the model fails (whisper.load_model("tmp_model")).
It happens because the model is sharded. I will investigate that.
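As a stopgap (my own sketch, assuming PyTorch and the standard transformers index layout; this is not the fix that landed in whisper-timestamped), the shards could be merged back into a single state dict before loading:

```python
import json
import torch
from pathlib import Path

def merge_sharded_checkpoint(model_dir: str) -> dict:
    # Read pytorch_model.bin.index.json, load every shard it references,
    # and merge them into one state dict on CPU.
    index = json.loads((Path(model_dir) / "pytorch_model.bin.index.json").read_text())
    state_dict = {}
    for shard in sorted(set(index["weight_map"].values())):
        state_dict.update(torch.load(Path(model_dir) / shard, map_location="cpu"))
    return state_dict
```

The merged dict could then be saved as a single pytorch_model.bin with torch.save.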
@LaurinmyReha Can you just check that your folder "your_model_dir" contains something similar to this:
config.json generation_config.json pytorch_model-00001-of-00003.bin pytorch_model-00002-of-00003.bin pytorch_model-00003-of-00003.bin pytorch_model.bin.index.json
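For context (a sketch of mine, not from the thread): when transformers shards a checkpoint, pytorch_model.bin.index.json carries a "weight_map" from each parameter name to the shard file that holds it, which is how a sharded folder can be told apart programmatically. The parameter names and byte count below are illustrative, not taken from a real checkpoint:

```python
import json
from pathlib import Path

# Illustrative index content; a real file maps every parameter name.
index = {
    "metadata": {"total_size": 6173286400},  # hypothetical byte count
    "weight_map": {
        "model.encoder.conv1.weight": "pytorch_model-00001-of-00003.bin",
        "model.decoder.embed_tokens.weight": "pytorch_model-00003-of-00003.bin",
    },
}

def shard_files(model_dir: str):
    # List the distinct shard files referenced by the index.
    data = json.loads((Path(model_dir) / "pytorch_model.bin.index.json").read_text())
    return sorted(set(data["weight_map"].values()))
```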
Exactly. Increase max_shard_size to something larger and you should be fine, I think. 10GB as in the example will definitely be enough :)
model.save_pretrained("your_model_dir", safe_serialization=False, max_shard_size='10GB')
Can you give some code to reproduce?
Hi!
Here is my code:
import whisper_timestamped as whisper
audio = whisper.load_audio("4d8b691a-7529-47d1-a3e3-00ce32f430c2.wav")
model = whisper.load_model("BlahBlah314/whisper_LargeV3FR_ft-V1", device="cuda")
result = whisper.transcribe(model, audio, language="fr")
I should mention that I don't have a .bin model, but a safetensors one. So maybe I need to convert my safetensors model somehow? Or is there a config option to specify in whisper-timestamped?
from transformers import WhisperForConditionalGeneration

model = WhisperForConditionalGeneration.from_pretrained("BlahBlah314/whisper_LargeV3FR_ft-V1")
model.save_pretrained("path_to_where_you_want_to_safe_your_bin", safe_serialization=False, max_shard_size='10GB')
Exactly. I think this should do the conversion... there is probably a better way, but this seemed easiest to me. After that, add the .bin file to where your safetensors file resides (in your case the Hugging Face Hub) and you should be able to load the model :)
Thank you! I'll try it that way.
Very cool!! Thanks for the quick update!