
community-events's Introduction

Community Events @ 🤗

A central repository for all community events organized by 🤗 HuggingFace. Come one, come all! We're constantly finding ways to democratise the use of ML across modalities and languages. This repo contains information about all past, present and upcoming events.

Hugging Events

| Event Name | Dates | Status |
| --- | --- | --- |
| Open Source AI Game Jam 🎮 (First Edition) | July 7th - 9th, 2023 | Finished |
| Whisper Fine Tuning Event | Dec 5th - 19th, 2022 | Finished |
| Computer Vision Study Group | Ongoing | Monthly |
| ML for Audio Study Group | Ongoing | Monthly |
| Gradio Blocks | May 16th - 31st, 2022 | Finished |
| HugGAN | Apr 4th - 17th, 2022 | Finished |
| Keras Sprint | June, 2022 | Finished |

community-events's People

Contributors

ak391, alekseykorshuk, andsteing, anuragshas, arampacha, arig23498, cgarciae, eschivo, frugile, johko, johnowhitaker, kingabzpro, leticiaisilveira, merveenoyan, nateraw, nielsrogge, nsanghi, osanseviero, parambharat, pcuenca, pmysl, ronsor, sanchit-gandhi, sayakpaul, simoninithomas, theanimeguru, vaibhavs10, yiyixuxu

community-events's Issues

Fine-tuned Whisper models perform worse than OpenAI

Hello there

I participated in the Whisper fine-tuning event held last December. As a result, I trained some models for the Catalan language, fine-tuned on Common Voice 11. Here are the models that we trained:

They score well in the WER evaluation produced by the script provided by HuggingFace.

However, when I evaluate these fine-tuned models on real audio, they perform worse than the original OpenAI models. The test set is 4 audio files, 1 to 5 minutes long, transcribed by humans.

More details:

  • As we know, the HuggingFace library does not yet handle Whisper well for audio longer than 30 seconds (see the sketch after this list)
  • We use the https://github.com/ggerganov/whisper.cpp library, which converts HuggingFace models to its own format
    • Their converter is solid: when you run the conversion from the huggingface/openai checkpoints, you get the same results as with the original OpenAI models
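
For the 30-second limitation specifically, here is a hedged sketch (not the setup used in this evaluation, and with a placeholder model id): the 🤗 pipeline can transcribe long audio by processing it in chunks.

```python
from transformers import pipeline

# Placeholder model id; substitute one of the fine-tuned Catalan checkpoints.
pipe = pipeline(
    "automatic-speech-recognition",
    model="your-username/whisper-small-ca",
    chunk_length_s=30,  # split long audio into 30-second chunks with striding
)
print(pipe("long_interview.wav")["text"])
```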

I tested quickly with the Spanish models, and the fine-tuned models also perform worse than the original OpenAI models.

From what I observed for the case of Catalan models, the fine-tuned models seem to quickly overfit.

Additionally, I do not know if you have also seen this article by Nickolay Shmyrev: https://alphacephei.com/nsh/2023/01/15/whisper-finetuning.html

My questions are:

  • Has anybody been using the fine-tuned models for real use cases?
  • Has anybody observed these problems?

Let me know if you need more details. Thanks in advance!

Super large number of epoch

Thanks for providing the code for fine-tuning!

Issue:
I am running into an issue: when I call trainer.train(), I get a super large number of epochs, as shown below.
[screenshot: trainer progress reporting an unexpectedly large epoch count]

I have tried specifying num_train_epochs, which didn't work.
[screenshot: training arguments with num_train_epochs set]

Training context:
I am fine-tuning in Colab, referencing this script. I am fine-tuning the tiny model for Chinese (zh-TW) and only modified the code where necessary. My script is here: colab link
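
One plausible explanation, offered as a hedged note rather than a confirmed diagnosis: when the dataset is streamed (an IterableDataset), the Trainer cannot infer its length, so num_train_epochs is effectively ignored and the run is governed by max_steps, which can make the reported epoch counter look strange. A minimal sketch of the relevant arguments (names follow transformers.Seq2SeqTrainingArguments; values are placeholders):

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-tiny-zh-TW",  # placeholder
    per_device_train_batch_size=16,
    max_steps=4000,                     # controls training length when the dataset length is unknown
    evaluation_strategy="steps",
    eval_steps=500,
    save_steps=500,
)
```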

Increasing WER & Validation Loss During Whisper Fine-Tuning

Hi,
I've recently created a dataset by running text-to-speech APIs on custom documents. The dataset consists of 1,000 audio samples, with 700 designated for training and 300 for testing. In total, this equates to about 4 hours of audio, where each clip is approximately 30 seconds long.

I'm attempting to fine-tune the Whisper small model with the help of HuggingFace's script, following the tutorial they've provided, Fine-Tune Whisper For Multilingual ASR with 🤗 Transformers.

Before diving into the fine-tuning, I evaluated the WER on OpenAI's pre-trained model, which stood at WER = 23.078%.
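
For reference, here is a minimal sketch of how WER is typically computed with the 🤗 evaluate library (this is not the author's exact script; the strings are toy values):

```python
import evaluate

wer_metric = evaluate.load("wer")

# toy values; in practice these are model transcriptions and reference transcripts
predictions = ["the cat sat on the mat"]
references = ["the cat sat on a mat"]

wer = 100 * wer_metric.compute(predictions=predictions, references=references)
print(f"WER: {wer:.3f}%")
```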

However, as my fine-tuning progresses, I'm observing some unexpected behavior:

[screenshot: training log showing validation loss and WER both increasing over training steps]

As visible, the Validation Loss and WER are both on the rise during the fine-tuning phase. I'm at a bit of a loss here. Why might this be happening? Any insights or recommendations would be greatly appreciated.

Thank you in advance!
@Vaibhavs10 @sanchit-gandhi

WhisperPositionalEmbedding

Hi there
I'm trying to fine-tune a Whisper model, but there is a problem: the decoder positional embedding size (for the small model it is [448, 768]) must not exceed 448 in the first dimension.
I have two questions.
Q1) When I use a wav file 10 seconds or longer, this problem stops training. Is it related to the file size?

The problematic code is below:

        # embed positions
        positions = self.embed_positions(input_ids, past_key_values_length=past_key_values_length)

        hidden_states = inputs_embeds + positions

The line where it stops is transformers/models/whisper/modeling_whisper.py:872.
If I change max_target_positions, I end up with a randomly initialized embedding layer instead of Whisper's existing pretrained one.
Q2) Is there any known solution?
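
A hedged sketch of one common workaround, assuming the failure comes from label sequences longer than the decoder's 448-position limit: drop such examples before training. The `dataset` with a precomputed "labels" column is an assumption about the preprocessing pipeline.

```python
from transformers import WhisperForConditionalGeneration

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
max_label_length = model.config.max_target_positions  # 448 for released Whisper checkpoints

def labels_in_range(labels):
    return len(labels) < max_label_length

# `dataset` is assumed to already have a tokenized "labels" column
dataset = dataset.filter(labels_in_range, input_columns=["labels"])
```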

Whisper parameters

Can I use Whisper parameters like beam_size and temperature while running my fine-tuned HF model?
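
A short answer with a hedged sketch: beam_size and temperature map onto the generate() arguments num_beams and temperature (the latter only takes effect when sampling is enabled). The model id below is a placeholder.

```python
from transformers import pipeline

pipe = pipeline("automatic-speech-recognition", model="your-username/whisper-finetuned")

# beam search, roughly equivalent to OpenAI's beam_size=5
out = pipe("sample.wav", generate_kwargs={"num_beams": 5})

# temperature only matters when sampling is enabled
out = pipe("sample.wav", generate_kwargs={"do_sample": True, "temperature": 0.7})
print(out["text"])
```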

Poor Real-Time Performance of Whisper Models Fine-Tuned on Synthetic Data

Hi,

I have custom text data for plant disease names and plant names like this:

uuid, context 
1er1hhaj13, The Rhododendron, a popular ornamental plant, often suffers from Phytophthora ramorum, a challenging disease to manage and pronounce. This pathogen causes Sudden Oak Death, which can lead to extensive damage and mortality in infected plants.

I used text-to-speech APIs to convert this context into audio WAV files, choosing 10 speakers with mostly American and British accents. This way I created around ~5k samples for training and ~2k samples for testing.

I followed the same steps from "Fast whisper finetuning" to fine-tune the PEFT version of Whisper Large-v2. The training and validation loss look good:

| Step | Training Loss | Validation Loss |
| --- | --- | --- |
| 250 | 0.413000 | 0.102663 |
| 500 | 0.109900 | 0.130888 |
| 750 | 0.116500 | 0.102719 |
| 1000 | 0.092800 | 0.099153 |
| 1250 | 0.068800 | 0.075613 |
| 1500 | 0.042500 | 0.085680 |
| 1750 | 0.047500 | 0.076951 |
| 2000 | 0.027500 | 0.065127 |
| 2250 | 0.023700 | 0.061832 |
| 2500 | 0.012500 | 0.062658 |
| 2750 | 0.011500 | 0.061922 |
| 3000 | 0.008500 | 0.061463 |
| 3250 | 0.005300 | 0.060227 |
| 3500 | 0.003800 | 0.060712 |
| 3750 | 0.002700 | 0.060332 |
| 4000 | 0.002300 | 0.060496 |

When I calculated WER on the test data:

  • OpenAI Whisper APIs: 22.03 WER on test data
  • Finetuned model: 0.3 WER on test data

Which looks good. However, during real-time testing with an Indian English-speaking audience, the accuracy for plant names and disease names was not satisfactory. What strategies could we employ to improve accuracy in real-time settings?
Any guidance or suggestions on this matter would be greatly appreciated. Thank you!

The fine-tuning script run_speech_recognition_seq2seq_streaming.py uses interleave_datasets, which truncates the train split

The fine-tuning script run_speech_recognition_seq2seq_streaming.py uses the interleave_datasets function to combine the train and validation splits. But I think what we really want is concatenate_datasets, because according to the docs, interleave_datasets stops when one of the source datasets runs out of examples (in the default mode).
For example, if the train split has 100 entries and the validation split has 10 entries, the result would contain only 10 entries from the validation split and 10 from the train split. That means we waste most of the existing train split.

For example:

>>> from datasets import Dataset, interleave_datasets, concatenate_datasets
>>> d1 = Dataset.from_dict({"a": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]})
>>> d2 = Dataset.from_dict({"a": [10, 11, 12]})
>>> print(interleave_datasets([d1, d2])['a'])
[0, 10, 1, 11, 2, 12]
>>> print(concatenate_datasets([d1, d2])['a'])
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]

huggan.pytorch.lightweight_gan.lightweight_gan.LightweightGAN _from_pretrained requires use_auth_token but this is not passed by the from_pretrained method inherited from ModelHubMixin

In the mixin code (https://github.com/huggingface/huggingface_hub/blob/9e0ac58813df4e0414d6fd494040953f053dbe0d/src/huggingface_hub/hub_mixin.py#L93) from_pretrained calls _from_pretrained but doesn't pass in a use_auth_token argument.
In the LightweightGAN code this argument is required: https://github.com/huggingface/community-events/blob/main/huggan/pytorch/lightweight_gan/lightweight_gan.py#L854

Notebook showing how this manifests for a user: https://colab.research.google.com/drive/1Lc42pRp0-ZxFKbhfU420ZrpXfA8Q-k-e?usp=sharing (includes how I worked around this for now). Currently if you follow the example usage at e.g. https://huggingface.co/ceyda/butterfly_cropped_uniq1K_512 you'll get an error.

The suggested fix is to just add a default of use_auth_token=None in the LightweightGAN _from_pretrained method (a sketch follows below), but I'm creating an issue in case someone wants to do a more thorough fix. This code is very rarely used, but I've had at least one keen learner stuck on this.
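
A hedged sketch of that one-line change (the parameter list here is abridged; the real _from_pretrained takes several more arguments):

```python
from huggingface_hub import ModelHubMixin

class LightweightGAN(ModelHubMixin):
    @classmethod
    def _from_pretrained(cls, model_id, use_auth_token=None, **kwargs):
        # use_auth_token now defaults to None, so the inherited
        # ModelHubMixin.from_pretrained can call this method without
        # forwarding the argument.
        ...
```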

Script deletes files and subfolders on my machine

I tried running the following script, and it deleted all the files and folders in my current working directory.

https://github.com/huggingface/community-events/tree/main/whisper-fine-tuning-event#python-script

There is an --overwrite_output_dir flag, but I'm guessing the behaviour should be to delete a folder inside the current working directory, not all the files in the working directory itself.

This should probably be rewritten, since deleting folders and subfolders on someone's computer is dangerous; I trusted the code and let it run on my machine.

This is more a question than an issue

Hi, we have 12M names and we would like to fine-tune Whisper on them. I am also happy to share the results with you.

The question is: is it better to fine-tune Whisper using the entire spoken name, or is it better to fine-tune using individual names and recorded snippets of each name spoken?

Lightweight GAN input type error

I get one error when I try to use the Lightweight GAN implementation. Here is the traceback:

Traceback (most recent call last):
  File "cli.py", line 166, in <module>
    main()
  File "cli.py", line 163, in main
    fire.Fire(train_from_folder)
  File "/opt/conda/lib/python3.8/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/opt/conda/lib/python3.8/site-packages/fire/core.py", line 466, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/opt/conda/lib/python3.8/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "cli.py", line 160, in train_from_folder
    run_training(model_args, data, load_from, new, num_train_steps, name, seed)
  File "cli.py", line 53, in run_training
    model.train(G, D, D_aug)
  File "/notebooks/community-events/huggan/pytorch/lightweight_gan/lightweight_gan.py", line 1074, in train
    real_output, real_output_32x32, real_aux_loss = D_aug(image_batch,  calc_aux_loss = True, **aug_kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/notebooks/community-events/huggan/pytorch/lightweight_gan/lightweight_gan.py", line 285, in forward
    return self.D(images, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/notebooks/community-events/huggan/pytorch/lightweight_gan/lightweight_gan.py", line 648, in forward
    x = net(x)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/notebooks/community-events/huggan/pytorch/lightweight_gan/lightweight_gan.py", line 171, in forward
    return sum(map(lambda fn: fn(x), self.branches))
  File "/notebooks/community-events/huggan/pytorch/lightweight_gan/lightweight_gan.py", line 171, in <lambda>
    return sum(map(lambda fn: fn(x), self.branches))
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/container.py", line 141, in forward
    input = module(input)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 447, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 443, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor

I tried with multiple datasets and even with the default one. I am looking at the code to find where the problem is; it seems that the data and the model are not on the same device in the Discriminator forward method.
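
If that diagnosis is right, here is a hedged guess at a workaround (variable names follow the traceback above; `accelerator` assumes the training loop has access to the 🤗 Accelerate object used by this port):

```python
# move the batch onto the same device as the discriminator weights
image_batch = image_batch.to(accelerator.device)
real_output, real_output_32x32, real_aux_loss = D_aug(image_batch, calc_aux_loss=True, **aug_kwargs)
```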

Here is my 🤗 accelerate config:

  • This machine (Paperspace RTX A5000),
  • no distributed training,
  • not CPU-only,
  • no DeepSpeed,
  • 1 total process,
  • no fp16 or bf16

Colab runtime crash

[screenshot: Colab session crashing after exhausting available RAM]

I am trying to fine-tune Whisper small in a Colab notebook using a T4 GPU. The issue is that when I run this snippet, RAM usage is maxed out and the notebook crashes. Any suggestions or explanations as to why that happens?
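
If it is system RAM that fills up while preparing the data, streaming the dataset (load_dataset(..., streaming=True)) is the usual mitigation. If instead the crash happens during training, a common memory-saving configuration to try is sketched below, using standard Seq2SeqTrainingArguments parameters and placeholder values:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-finetuned",  # placeholder
    per_device_train_batch_size=8,           # smaller batches use less memory
    gradient_accumulation_steps=2,           # keeps the effective batch size at 16
    gradient_checkpointing=True,             # trade compute for memory
    fp16=True,                               # half precision on the T4
)
```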

Padding conflict in loss computation

Hi,

Great tutorial! I had a question regarding the data-processing step for this tutorial, where the label tokens are padded with -100 before being passed on to the model. Running the debugger, I see that the model makes correct predictions, but it predicts tokenizer.pad_token_id (which corresponds to 50256 for Whisper) in the padded positions, which leads to different losses depending on what value this padding is done with.

Shouldn't the padding correspond to 50256 rather than -100?

One of the comments says # replace padding with -100 to ignore loss correctly, but doing so actually yields a higher loss for a prediction that is correct (before fine-tuning has even begun) but has pad token ids at the end of the output tensor instead of -100.
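
For context, a hedged note on why the tutorial uses -100: it is the default ignore_index of torch.nn.CrossEntropyLoss, so padded label positions contribute nothing to the loss, and in compute_metrics the -100s are swapped back to the pad token id before decoding. The sketch below follows the tutorial's pattern; `tokenizer` and `metric` are assumed to be the Whisper tokenizer and a loaded WER metric.

```python
def compute_metrics(pred):
    pred_ids = pred.predictions
    label_ids = pred.label_ids

    # undo the -100 masking so the tokenizer can decode the reference labels
    label_ids[label_ids == -100] = tokenizer.pad_token_id

    pred_str = tokenizer.batch_decode(pred_ids, skip_special_tokens=True)
    label_str = tokenizer.batch_decode(label_ids, skip_special_tokens=True)

    wer = 100 * metric.compute(predictions=pred_str, references=label_str)
    return {"wer": wer}
```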

Misc ideas based on discussions in Slack and TODOs

Just dumping ideas here related to this repo cc @NielsRogge @nateraw

Issue uploading the dataset

Hi, I tried to upload my dataset to the Hub, but I am getting this error message:

HTTPError: 403 Client Error: Forbidden for url: https://huggingface.co/api/repos/create - You don't have the rights to create a dataset under this namespace

I tried changing the permissions of the token (write and read) but it didn't work. Any help will be appreciated.
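
In case it helps others hitting the same 403, here is a minimal sketch assuming a datasets Dataset/DatasetDict and a token with the write role; the repo id must be under your own username (or an org you belong to):

```python
from huggingface_hub import login
from datasets import load_dataset

login()  # paste a token created with the "write" role

# placeholder loading step; any Dataset/DatasetDict works here
dataset = load_dataset("audiofolder", data_dir="./my_audio")

# pushing under someone else's namespace raises the 403 above
dataset.push_to_hub("your-username/my-dataset")
```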

How to prepare audio dataset for whisper fine-tuning with timestamps?

I am trying to prepare a dataset for Whisper fine-tuning, and I have a lot of small segment clips, most of them less than 6 seconds. I read the paper but didn't understand this paragraph:

"When a final transcript segment is only partially included in the current 30-second audio chunk, we predict only its start time token for the segment when in timestamp mode, to indicate that the subsequent decoding should be performed on an audio window aligned with that time, otherwise we truncate the audio to not include the segment"

So when should I include the final segment if it is only partially included in the current 30-second chunk, when should I truncate the chunk without it, and if I include it, how do I extract only the relevant transcription?

To make it clear:

|           window           |           window           |
|segment|-----segment---|--segment--|

Assume that every window is 30 seconds: how do I get the correct relevant transcription for the partially included segments?
Can anyone help?

Lambda platform doesn't support TensorFlow GPU

@merveenoyan ,

NVIDIA-SMI 515.65.01
Driver Version: 515.65.01
CUDA Version: 11.7
TensorFlow: 2.11.0

[screenshot: GPU environment details]
The PyTorch library detects the GPU, but TensorFlow doesn't.

TF:

import tensorflow as tf

tf.config.list_logical_devices("GPU")

Output:

[]

Torch:

import torch
torch.cuda.is_available()

Output:

True
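
A hedged diagnostic sketch for narrowing this down: check whether the installed TensorFlow build has CUDA support at all, and which physical devices it can see; if is_built_with_cuda() returns False, a CPU-only build was installed and the driver is not the problem.

```python
import tensorflow as tf

print(tf.__version__)
print("Built with CUDA:", tf.test.is_built_with_cuda())
print("Physical GPUs:", tf.config.list_physical_devices("GPU"))
```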

Using finetuned whisper checkpoints for inference

Hi!
I have been trying for a long time to run inference on the fine-tuned model, but it keeps throwing an error saying that the tokenizer is missing.

Steps to reproduce:

from transformers import pipeline

pipe = pipeline("automatic-speech-recognition",model="nodlehs/whisper_finetune")  # change to "your-username/the-name-you-picked"

def transcribe(audio):
    text = pipe(audio)["text"]
    return text

It seems I am missing a tokenizer file, but while running the Whisper fine-tuning, no such file was uploaded.
Could someone please help me out?

P.S. This is my model on HF: https://huggingface.co/nodlehs/whisper_finetune/tree/main
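
A hedged guess at a fix, since the error suggests only the model weights were pushed: saving or pushing the WhisperProcessor (tokenizer + feature extractor) to the same repo usually lets the pipeline load it. The base checkpoint and language below are assumptions; adjust them to match what was actually fine-tuned.

```python
from transformers import WhisperProcessor

# assumed base checkpoint and language; change to match the fine-tuning run
processor = WhisperProcessor.from_pretrained(
    "openai/whisper-small", language="en", task="transcribe"
)
processor.push_to_hub("nodlehs/whisper_finetune")
```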

Whisper finetune

Hi, I'm trying to fine-tune Whisper with multiple GPUs,
and I don't know what RANK to set.
I just set WORLD_SIZE to the number of GPUs, MASTER_ADDR to localhost, and MASTER_PORT to an idle port.
When WORLD_SIZE is more than 2 and RANK is set to 0, training hangs.
It probably hangs in the torch.distributed.TCPStore() setup.

Has anyone solved this problem?
Please let me know any hints.
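
Not a confirmed fix, but the hang described is consistent with every process exporting the same RANK: with env:// initialization, each of the WORLD_SIZE processes must use a unique rank (0, 1, ..., WORLD_SIZE-1), otherwise init_process_group waits forever for the missing peers. A minimal sketch of the per-process environment follows (in practice a launcher such as torchrun sets these variables for you):

```python
import os
import torch.distributed as dist

os.environ["MASTER_ADDR"] = "localhost"
os.environ["MASTER_PORT"] = "29500"   # any idle port
os.environ["WORLD_SIZE"] = "2"        # total number of processes / GPUs
os.environ["RANK"] = "0"              # process 0 exports 0, process 1 exports 1, ...

dist.init_process_group(backend="nccl")  # blocks until all WORLD_SIZE ranks have joined
```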
