huggingface / community-events
Place where folks can contribute to 🤗 community events
Hi!
I have been trying for a long time to run inference on my fine-tuned model, but it keeps throwing an error saying that the tokenizer is missing.
Steps to reproduce:
from transformers import pipeline

pipe = pipeline("automatic-speech-recognition", model="nodlehs/whisper_finetune")  # change to "your-username/the-name-you-picked"

def transcribe(audio):
    text = pipe(audio)["text"]
    return text
It seems a tokenizer file is missing, but no such file was uploaded during the Whisper fine-tuning run.
Could someone please help me out?
P.S. This is my model on HF: https://huggingface.co/nodlehs/whisper_finetune/tree/main
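In case it helps, one common cause is that only the model weights were pushed, not the processor/tokenizer files. A minimal sketch of uploading them separately (the base checkpoint, language, and task below are assumptions; use whatever you fine-tuned from):

from transformers import WhisperProcessor

# Recreate the processor from the base checkpoint used for fine-tuning,
# then push it to the same repo as the model weights.
processor = WhisperProcessor.from_pretrained(
    "openai/whisper-small", language="english", task="transcribe"
)
processor.push_to_hub("nodlehs/whisper_finetune")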
The fine-tuning script run_speech_recognition_seq2seq_streaming.py uses the interleave_datasets function to combine the train and validation splits. But I think what we really want is concatenate_datasets, because according to the docs, interleave_datasets stops when one of the source datasets runs out of examples (in the default mode).
For example, if the train split has 100 entries and the validation split has 10 entries, the result would contain only 10 entries from the validation split and 10 from the train split. That means we waste most of the existing train split.
As an illustration:
>>> from datasets import Dataset, interleave_datasets, concatenate_datasets
>>> d1 = Dataset.from_dict({"a": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]})
>>> d2 = Dataset.from_dict({"a": [10, 11, 12]})
>>> print(interleave_datasets([d1, d2])['a'])
[0, 10, 1, 11, 2, 12]
>>> print(concatenate_datasets([d1, d2])['a'])
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
Hi,
In file "interleave_streaming_datasets.ipynb" the rename_column and remove_column methods are used and it will throw an error with this line of code:
dataset = dataset.remove_columns(set(dataset.features.keys()) - set(["audio", "sentence"]))
The error occurs because dataset.features becomes None. This is a bug mentioned here.
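Until the upstream bug is fixed, one possible workaround (a sketch, assuming a streaming IterableDataset) is to infer the column names from the first example instead of from dataset.features:

# dataset.features can be None for streaming datasets, so derive the
# column names from an actual example instead.
first_example = next(iter(dataset))
columns_to_remove = set(first_example.keys()) - {"audio", "sentence"}
dataset = dataset.remove_columns(list(columns_to_remove))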
Just dumping ideas here related to this repo cc @NielsRogge @nateraw
PyTorchModelHubMixin has not been widely tested, so I'm a bit worried people will face issues.
Hi, I tried to upload my dataset to the Hub but I am getting this error message:
HTTPError: 403 Client Error: Forbidden for url: https://huggingface.co/api/repos/create - You don't have the rights to create a dataset under this namespace
I tried changing the permissions of the token (write and read), but it didn't work. Any help would be appreciated.
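For reference, a minimal sketch of what usually resolves this: authenticate with a write-scoped token and create the dataset repo under your own username rather than another organization's namespace (the repo id below is a placeholder):

from huggingface_hub import login, create_repo

login()  # paste a token with *write* access when prompted
create_repo("your-username/your-dataset-name", repo_type="dataset")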
Hi,
I have custom text data for plant disease names and plant names like this:
uuid, context
1er1hhaj13, The Rhododendron, a popular ornamental plant, often suffers from Phytophthora ramorum, a challenging disease to manage and pronounce. This pathogen causes Sudden Oak Death, which can lead to extensive damage and mortality in infected plants.
I used text-to-speech APIs to convert this context into audio WAV files, choosing 10 speakers with mostly American and British accents. This gave me around 5k samples for training and 2k samples for testing.
I followed the same steps from "Fast whisper finetuning" to fine-tune the PEFT version of Whisper large-v2. The training and validation losses look good:
Step | Training Loss | Validation Loss
250 | 0.413000 | 0.102663
500 | 0.109900 | 0.130888
750 | 0.116500 | 0.102719
1000 | 0.092800 | 0.099153
1250 | 0.068800 | 0.075613
1500 | 0.042500 | 0.085680
1750 | 0.047500 | 0.076951
2000 | 0.027500 | 0.065127
2250 | 0.023700 | 0.061832
2500 | 0.012500 | 0.062658
2750 | 0.011500 | 0.061922
3000 | 0.008500 | 0.061463
3250 | 0.005300 | 0.060227
3500 | 0.003800 | 0.060712
3750 | 0.002700 | 0.060332
4000 | 0.002300 | 0.060496
When I calculated the WER on the test data, it also looked good. However, during real-time testing with an Indian English-speaking audience, the accuracy for plant names and disease names was not satisfactory. What strategies could we employ to improve accuracy in real-time settings?
Any guidance or suggestions on this matter would be greatly appreciated. Thank you!
In the mixin code (https://github.com/huggingface/huggingface_hub/blob/9e0ac58813df4e0414d6fd494040953f053dbe0d/src/huggingface_hub/hub_mixin.py#L93) from_pretrained calls _from_pretrained but doesn't pass in a use_auth_token argument.
In the LightweightGAN code this argument is required: https://github.com/huggingface/community-events/blob/main/huggan/pytorch/lightweight_gan/lightweight_gan.py#L854
Notebook showing how this manifests for a user: https://colab.research.google.com/drive/1Lc42pRp0-ZxFKbhfU420ZrpXfA8Q-k-e?usp=sharing (includes how I worked around this for now). Currently if you follow the example usage at e.g. https://huggingface.co/ceyda/butterfly_cropped_uniq1K_512 you'll get an error.
The suggested fix is to just add a default of use_auth_token=None in the LightweightGAN _from_pretrained method, but I'm creating an issue in case someone wants to do a more thorough fix. This code is very rarely used, but I've had at least one keen learner stuck on this.
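For illustration, the suggested fix would look roughly like this (the parameter list is abridged from the linked LightweightGAN code and may not match it exactly; treat this as a sketch, not the actual diff):

@classmethod
def _from_pretrained(cls, model_id, revision, cache_dir, force_download,
                     proxies, resume_download, local_files_only,
                     use_auth_token=None, **model_kwargs):
    # Defaulting use_auth_token to None lets the mixin's from_pretrained
    # call _from_pretrained without passing the argument explicitly.
    ...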
After fine-tuning, how can I deploy / use the checkpoints?
Or how do I export the checkpoints into a model that I can load and deploy, as with whisper?
import whisper
model = whisper.load_model("base")
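Hugging Face checkpoints are loaded differently from openai-whisper ones. A minimal sketch (the checkpoint path / repo id is a placeholder for your fine-tuning output_dir or Hub repo):

from transformers import pipeline

# Point at the output_dir of your fine-tuning run, or the repo you pushed to the Hub.
pipe = pipeline("automatic-speech-recognition", model="your-username/your-finetuned-whisper")
print(pipe("sample.wav")["text"])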
Hi, I'm trying to fine-tune Whisper with multiple GPUs, and I don't know what RANK to set.
I just set WORLD_SIZE to the number of GPUs, MASTER_ADDR to localhost, and MASTER_PORT to an idle port.
When WORLD_SIZE is more than 2 and RANK is set to 0, training hangs. It probably hangs in the torch.distributed.TCPStore() setup.
Has anyone solved this problem? Please let me know any hints.
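If it helps: each process must be launched with its own distinct RANK (0 through WORLD_SIZE-1). If only a RANK=0 process is running, the TCPStore rendezvous waits forever for the missing ranks, which matches the hang described above. A minimal sketch of the per-process environment (values are illustrative, not taken from the repo):

import os

# Process 0 of 2; a second process with RANK=1 must also be started,
# otherwise torch.distributed's rendezvous blocks waiting for it.
os.environ["MASTER_ADDR"] = "localhost"
os.environ["MASTER_PORT"] = "29500"
os.environ["WORLD_SIZE"] = "2"
os.environ["RANK"] = "0"

In practice, a launcher such as torchrun --nproc_per_node=<num_gpus> sets all of these variables for each process automatically.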
As discussed in #10, package up the huggan/ dir so it's pip-installable and the components within it are easier to share.
Can I use whisper parameters like beam_size and temperature while running my fine-tuned HF model?
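For what it's worth, a hedged sketch of passing equivalent generation parameters to a transformers pipeline (num_beams roughly corresponds to openai-whisper's beam_size; the model id is a placeholder):

from transformers import pipeline

pipe = pipeline("automatic-speech-recognition", model="your-username/your-finetuned-model")
# Generation parameters are forwarded via generate_kwargs;
# temperature takes effect when sampling is enabled.
result = pipe("sample.wav", generate_kwargs={"num_beams": 5})
print(result["text"])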
Hello there
I participated in the Whisper fine-tuning event held last December. As a result, I trained some models for Catalan, fine-tuned on Common Voice 11. Here are the models that we trained:
They score well in the WER evaluation produced by the script provided by HuggingFace.
However, when I evaluate these fine-tuned models on real audio, they perform worse than the original OpenAI models. The real audio consists of 4 recordings, 1 to 5 minutes long, transcribed by humans.
More details:
I tested quickly with the Spanish models, and the fine-tuned models also perform worse than the original OpenAI models.
From what I observed in the case of the Catalan models, the fine-tuned models seem to overfit quickly.
Additionally, I do not know if you have also seen this article by Nickolay Shmyrev: https://alphacephei.com/nsh/2023/01/15/whisper-finetuning.html
My questions are:
Let me know if you need more details. Thanks in advance!
@sanchit-gandhi I have seen that the Python script has been changed to support non-streaming mode. Could you please add instructions to the README (which parameters to use) for non-streaming mode?
Hi,
Great tutorial! I had a question regarding the data-processing step for this tutorial, where the label tokens are padded with -100 before being passed to the model. Running the debugger, I see that the model makes correct predictions but predicts tokenizer.pad_token_id (which corresponds to 50256 for Whisper), and this leads to different losses depending on what value the padding is done with.
Should the padding not correspond to 50256, rather than -100?
One of the comments says # replace padding with -100 to ignore loss correctly, but doing so actually yields a higher loss for a prediction that is correct (before fine-tuning has even begun) but has pad-token-ids at the end instead of -100, as expected in the output tensor.
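For context, -100 is the default ignore_index of PyTorch's cross-entropy loss, so positions labelled -100 contribute nothing to the loss, while positions labelled with the real pad token id (50256) are scored like any other target. A small self-contained illustration with toy values:

import torch
import torch.nn.functional as F

logits = torch.randn(1, 4, 51865)  # (batch, seq_len, vocab), random toy values
labels_ignored = torch.tensor([[7, 9, -100, -100]])   # padding masked out of the loss
labels_padded = torch.tensor([[7, 9, 50256, 50256]])  # padding scored as real targets

# cross_entropy expects (batch, vocab, seq_len); ignore_index=-100 is the default.
print(F.cross_entropy(logits.transpose(1, 2), labels_ignored))
print(F.cross_entropy(logits.transpose(1, 2), labels_padded))  # generally a different, often higher, loss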
I have been trying to fine-tune Whisper base on an English-only dataset to improve the model's transcription, as the WER seemed high without fine-tuning. I have a total of 560 audio files for fine-tuning.
The picture above shows the base model's performance: there was a very large gap between the training and validation loss, so I used LoRA. But after using it, I ran into the issue you can see in the image above.
When I used the large-v2 model, I faced the same issue shown in the image above; there too I used LoRA.
I have been stuck on this. What is the issue, and how can I solve it?
Thank you in advance!
@Vaibhavs10 @sanchit-gandhi
That way it's visible in GitHub and people can fix issues/typos/etc.
Blank license/dataset tags will block pushing to the Hub via git. We should maybe remove them, or see if we can comment them out so folks can fill them in as needed.
Hi, we have 12M names and we would like to fine-tune Whisper on them. Also, I am happy to share the results with you.
The question: is it better to fine-tune Whisper using the entire spoken name, or using individual names with a recorded snippet of each name spoken?
Thanks for providing the code for fine-tuning!
Issue:
I am running into an issue when I call trainer.train(): I get a super large number of epochs, as shown below.
I have tried specifying num_train_epochs, which didn't work.
Training context:
I am fine-tuning in Colab, referencing this script: the tiny model for Chinese (zh-TW), with the code modified only where necessary. My script is here: colab link
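A hedged guess at the cause: when training streams data (an IterableDataset with no known length) and max_steps is set, the Trainer cannot derive an epoch count from the dataset size, so num_train_epochs is ignored and the displayed epoch number can be a huge sentinel value; training still stops at max_steps. A sketch with illustrative placeholder values:

from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-tiny-zh-TW",  # placeholder
    max_steps=4000,  # with streamed data, this bounds training, not num_train_epochs
    per_device_train_batch_size=16,
)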
I get an error when I try to use the Lightweight GAN implementation. Here is the traceback:
Traceback (most recent call last):
File "cli.py", line 166, in <module>
main()
File "cli.py", line 163, in main
fire.Fire(train_from_folder)
File "/opt/conda/lib/python3.8/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/opt/conda/lib/python3.8/site-packages/fire/core.py", line 466, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/opt/conda/lib/python3.8/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "cli.py", line 160, in train_from_folder
run_training(model_args, data, load_from, new, num_train_steps, name, seed)
File "cli.py", line 53, in run_training
model.train(G, D, D_aug)
File "/notebooks/community-events/huggan/pytorch/lightweight_gan/lightweight_gan.py", line 1074, in train
real_output, real_output_32x32, real_aux_loss = D_aug(image_batch, calc_aux_loss = True, **aug_kwargs)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/notebooks/community-events/huggan/pytorch/lightweight_gan/lightweight_gan.py", line 285, in forward
return self.D(images, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/notebooks/community-events/huggan/pytorch/lightweight_gan/lightweight_gan.py", line 648, in forward
x = net(x)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/notebooks/community-events/huggan/pytorch/lightweight_gan/lightweight_gan.py", line 171, in forward
return sum(map(lambda fn: fn(x), self.branches))
File "/notebooks/community-events/huggan/pytorch/lightweight_gan/lightweight_gan.py", line 171, in <lambda>
return sum(map(lambda fn: fn(x), self.branches))
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/container.py", line 141, in forward
input = module(input)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 447, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 443, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor
I tried multiple datasets, even the default one. Looking at the code for the problem, it seems that the data and the model are not on the same device in the Discriminator's forward method.
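A possible workaround, using the variable names from the traceback (a sketch of a local patch at that step, not a verified upstream fix): move the image batch onto the discriminator's device before the forward pass.

# In the training loop, before the discriminator call that fails:
device = next(D_aug.parameters()).device  # device the model weights live on
image_batch = image_batch.to(device)
real_output, real_output_32x32, real_aux_loss = D_aug(image_batch, calc_aux_loss=True, **aug_kwargs)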
Here is my 🤗 accelerate config:
In the README of whisper-fine-tuning-event, the link to A Complete Guide To Audio Datasets is broken. Kindly update the README accordingly.
The code reaches the eval step, prints num_example: unknown, and gets stuck.
I didn't change anything in the example code, and I tried both the Google Colab and the Python variant on a Google VM.
I tried:
1. Changing the split to the same one as train.
2. Disabling predict_with_generate and do_normalize_eval.
I am trying to prepare a dataset for Whisper fine-tuning, and I have a lot of small segment clips, most of them less than 6 seconds. I read the paper, but I didn't understand this paragraph:
“ When a final transcript segment is only partially included in the current 30- second audio chunk, we predict only its start time token for the segment when in timestamp mode, to indicate that the subsequent decoding should be performed on an audio window aligned with that time, otherwise we truncate the audio to not include the segment”
So when should I include the final segment if it is only partially contained in the current 30-second chunk, when should I truncate the chunk without it, and if I include it, how do I extract only the relevant transcription?
To make it clear:
| window | window |
|segment|-----segment---|--segment--|
assume that every window is 30 seconds, how to get the correct relevant transcription of the partially included segments?
Could anyone help?
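For what it's worth, here is a sketch of one reading of that paragraph (my interpretation, not official Whisper preprocessing code): a segment that only partially fits in the window either contributes just its start-time token (in timestamp mode) or causes the audio to be truncated just before it starts.

WINDOW = 30.0  # seconds

def window_labels(segments, window_start, timestamp_mode=True):
    """segments: list of {"start": s, "end": e, "text": t}, sorted by start time."""
    window_end = window_start + WINDOW
    labels = []
    for seg in segments:
        if seg["start"] >= window_end:
            break
        if seg["end"] <= window_end:
            labels.append(seg)  # fully inside the window: keep the full text
        elif timestamp_mode:
            # Partially included: emit only the start-time token, no text;
            # the next window should then begin at seg["start"].
            labels.append({"start": seg["start"], "end": None, "text": ""})
        # else: truncate the audio at seg["start"] and drop this segment
    return labels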
Hi,
I've recently created a dataset using speech-to-text APIs on custom documents. The dataset consists of 1,000 audio samples, with 700 designated for training and 300 for testing. In total, this equates to about 4 hours of audio, where each clip is approximately 30 seconds long.
I'm attempting to fine-tune the Whisper small model with the help of HuggingFace's script, following the tutorial they've provided Fine-Tune Whisper For Multilingual ASR with 🤗 Transformers.
Before diving into the fine-tuning, I evaluated the WER on OpenAI's pre-trained model, which stood at WER = 23.078%.
However, as my fine-tuning progresses, I'm observing some unexpected behavior:
As shown, the validation loss and WER are both rising during fine-tuning. I'm at a bit of a loss here: why might this be happening? Any insights or recommendations would be greatly appreciated.
Thank you in advance!
@Vaibhavs10 @sanchit-gandhi
Hi there,
I'm trying to fine-tune a Whisper model, but there is a problem: the decoder positional embedding size (in the small model's case, [448, 768]) means the target sequence must not exceed 448 (the first dim).
I have two questions:
Q1) When I use a WAV file 10 seconds or longer, this problem stops training. Is it related to file size?
The problematic code is below:
# embed positions
positions = self.embed_positions(input_ids, past_key_values_length=past_key_values_length)
hidden_states = inputs_embeds + positions
The line where it stops is transformers/models/whisper/modeling_whisper.py:872.
If I change max_target_positions, I end up with a randomly initialized embedding layer instead of the existing Whisper embedding layer.
Q2) Does anyone know a solution?
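One hedged workaround is to keep target sequences within the decoder's limit by truncating the labels (or by splitting long audio into shorter clips with correspondingly shorter transcripts). A sketch, assuming a standard tokenizer/dataset setup where the hypothetical prepare_labels is applied with dataset.map:

# Whisper's decoder accepts at most model.config.max_target_positions (448) label tokens.
max_label_length = model.config.max_target_positions

def prepare_labels(batch):
    batch["labels"] = tokenizer(
        batch["sentence"], truncation=True, max_length=max_label_length
    ).input_ids
    return batch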
I tried running the following script and it deleted all the files and folders in my current working directory.
https://github.com/huggingface/community-events/tree/main/whisper-fine-tuning-event#python-script
There is an --overwrite_output_dir flag, but I'd expect its behaviour to be to delete a folder inside the current working directory, not all the files in that directory.
This should probably be fixed, since deleting folders and subfolders on someone's computer is dangerous; I trusted the code and let it run on my machine.
Add optional support for visualization of ControlNet results using wandb.Table in train_controlnet_flax.py. This can be used for summarizing results during training and inference.
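A minimal sketch of what this could look like (the column names and the validation_results iterable are hypothetical placeholders; it assumes wandb.init() has already been called by the script):

import wandb

table = wandb.Table(columns=["step", "prompt", "conditioning", "generated"])
for step, prompt, cond_image, gen_image in validation_results:  # assumed iterable
    table.add_data(step, prompt, wandb.Image(cond_image), wandb.Image(gen_image))
wandb.log({"controlnet_samples": table})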