Comments (14)
It works! Thanks very much.
from espnet.
Thanks for your report.
@simpleoier, can you answer it for me?
from espnet.
Hi @ily6, did you specify freeze_param in your config? It is used to freeze the original parameters, which is needed after the new Adapter interface added by @Stanwang1210.
from espnet.
Thanks for your reply! I didn't specify freeze_param; I just followed the original "conf/tuning/train_asr_whisper_medium_lora_finetune.yaml". What should I set freeze_param to?
from espnet.
Can you try the following?
unused_parameters: true
freeze_param: [
    "encoder",
    "decoder",
]
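For reference, a minimal PyTorch sketch of the effect of freeze_param (an illustration only, not the ESPnet implementation; the prefixes mirror the config above):

# Illustration only: freeze every parameter whose name starts with one of the
# given prefixes, which is roughly what ESPnet's freeze_param option does.
import torch.nn as nn

def freeze_by_prefix(model: nn.Module, prefixes=("encoder", "decoder")) -> None:
    for name, param in model.named_parameters():
        if any(name == p or name.startswith(p + ".") for p in prefixes):
            param.requires_grad = False

# After freezing, only the remaining (e.g. LoRA/adapter) parameters stay trainable:
# trainable = [n for n, p in model.named_parameters() if p.requires_grad]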
from espnet.
Hi, I've encountered a new issue. When I use Whisper for zero-shot evaluation, I directly modify the config.yaml file with:
max_epoch: 1
optim_conf:
    lr: 0.0
But the CER (%) on the Aishell1 test set was 174; the hypotheses are shown in the attached screenshot.
I would like to know if there are any solutions to this issue. Alternatively, is there any run.sh file available that I can use directly for zero-shot evaluation?
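As a cross-check outside ESPnet, here is a minimal zero-shot sketch using the openai-whisper package (the wav path is a placeholder for one test utterance; this is only a reference point, not the ESPnet recipe):

# Zero-shot decoding of a single utterance with the openai-whisper package.
import whisper

model = whisper.load_model("medium")
result = model.transcribe(
    "path/to/test_utt.wav",  # placeholder: one AISHELL-1 test wav
    language="zh",
    task="transcribe",
)
print(result["text"])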
from espnet.
Did you initialize the LoRA parameters without fine-tuning them?
from espnet.
No, I use asr_config=conf/tuning/train_asr_whisper_medium_full_finetune.yaml and inference_config=conf/tuning/decode_asr_whisper_noctc_beam10.yaml.
from espnet.
Did you correctly set --token_type to whisper_multilingual? See here.
From the figure you provided, the Chinese transcriptions look reasonable.
Therefore, I suspect that something is wrong with the tokenizer, which may lead to the high CER.
Also, from my own experience, the version of the transformers module is quite important. Using the wrong version of transformers may lead to the same issue.
Please try transformers==4.28.1 or a version close to it.
If the issue can't be fixed, please provide more information. It will help a lot.
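If it helps, a quick tokenizer sanity check (a sketch assuming the openai-whisper package, which the whisper_multilingual token type wraps; the example sentence is arbitrary):

# Round-trip a Chinese sentence through the multilingual Whisper tokenizer;
# a failed round trip would point to a tokenizer problem rather than the model.
from whisper.tokenizer import get_tokenizer

tok = get_tokenizer(multilingual=True, language="zh", task="transcribe")
text = "欢迎使用语音识别"
ids = tok.encode(text)
print(ids)
print(tok.decode(ids))  # should print the original sentence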
from espnet.
Thank you for your response. My training files are as follows:
normalize: null

encoder: whisper
encoder_conf:
    whisper_model: medium
    dropout_rate: 0.0
    use_specaug: true
    specaug_conf:
        apply_time_warp: true
        time_warp_window: 5
        time_warp_mode: bicubic
        apply_freq_mask: true
        freq_mask_width_range:
        - 0
        - 40
        num_freq_mask: 2
        apply_time_mask: true
        time_mask_width_ratio_range:
        - 0.
        - 0.12
        num_time_mask: 5

decoder: whisper
decoder_conf:
    whisper_model: medium
    dropout_rate: 0.0

preprocessor: default
preprocessor_conf:
    whisper_language: "zh"
    whisper_task: "transcribe"

model_conf:
    ctc_weight: 0.0
    lsm_weight: 0.1
    length_normalized_loss: false
    extract_feats_in_collect_stats: false
    sym_sos: "<|startoftranscript|>"
    sym_eos: "<|endoftext|>"
    # do_pad_trim: true  # should be set when doing zero-shot inference

frontend: null
input_size: 1  # to prevent build_model() from complaining

seed: 2022
log_interval: 100
num_att_plot: 0
num_workers: 4
sort_in_batch: descending  # how to sort data in making batch
sort_batch: descending  # how to sort created batches
batch_type: numel
batch_bins: 3000000  # good for 8 * RTX 3090 24G
accum_grad: 16
max_epoch: 1
patience: none
init: none
best_model_criterion:
-   - valid
    - acc
    - max
keep_nbest_models: 1
use_amp: true
cudnn_deterministic: false
cudnn_benchmark: false

optim: adamw
grad_clip: 1.0
optim_conf:
    lr: 0.0
scheduler: warmuplr
scheduler_conf:
    warmup_steps: 1500
And my run.sh:
#!/usr/bin/env bash
# Set bash to 'debug' mode, it will exit on :
# -e 'error', -u 'undefined variable', -o ... 'error in pipeline', -x 'print commands',
set -e
set -u
set -o pipefail

train_set=train
valid_set=dev
test_sets="test"

asr_config=conf/tuning/train_asr_whisper_medium_finetune.yaml
inference_config=conf/tuning/decode_asr_whisper_noctc_beam10.yaml
lm_config=conf/train_lm_transformer.yaml
use_lm=false
use_wordlm=false

# speed perturbation related
# (train_set will be "${train_set}_sp" if speed_perturb_factors is specified)
speed_perturb_factors="0.9 1.0 1.1"

./asr_test.sh \
    --nj 32 \
    --gpu_inference true \
    --inference_nj 1 \
    --lang zh \
    --token_type whisper_multilingual \
    --feats_normalize "" \
    --audio_format "wav" \
    --feats_type raw \
    --use_lm ${use_lm} \
    --use_word_lm ${use_wordlm} \
    --lm_config "${lm_config}" \
    --cleaner whisper_basic \
    --asr_config "${asr_config}" \
    --inference_config "${inference_config}" \
    --train_set "${train_set}" \
    --valid_set "${valid_set}" \
    --test_sets "${test_sets}" \
    --speed_perturb_factors "${speed_perturb_factors}" \
    --asr_speech_fold_length 512 \
    --asr_text_fold_length 150 \
    --lm_fold_length 150 \
    --lm_train_text "data/${train_set}/text" "$@"
I found that the decoding results of Aishell are repetitive, for example:
BAC009S0764W0121 甚至出现交易几乎停滞的情况甚至出现交易几乎停滞的情况甚至出现交易几乎停滞的情况甚至出现交易几乎停滞的情况甚至出
BAC009S0764W0123 但因為聚集了過多公共事源,但因為聚集了過多公共事源,但因為聚集了過多公共事源,但因為聚集了過多公共事源,但因為聚集了過多
This leads to many insertion errors, and the final result is as follows:
| SPKR | # Snt # Wrd | Corr Sub Del Ins Err S.Err |
| Sum/Avg | 7176 104765 | 80.3 15.6 4.2 154.5 174.2 90.6 |
My transformers version is "4.40.2", and with this version I achieved results consistent with the official ESPnet results when performing LoRA fine-tuning and full fine-tuning experiments on the Aishell1 dataset. Therefore, could it be that the issue is not related to the transformers version?
Would setting the learning rate to 0 and training for one epoch cause any problems? Or could it be the parameter "do_pad_trim: true"?
from espnet.
Sorry for the late reply.
Could you please check whether the parameters in your checkpoint still match the original Whisper checkpoint? That way we can rule out the effect of setting the learning rate to 0 and training for one epoch.
Given the information you provided, it may indeed not be related to the transformers version.
From your inference samples, it looks like you are running into the hallucination problem described here.
If that's the case, the problem will not be easy to solve. The official Whisper code implements some post-processing to deal with that issue; you can take a look at it.
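A rough way to do that checkpoint comparison (a sketch only; the paths are placeholders, and the ESPnet state-dict keys use different prefixes from the original Whisper checkpoint, so a key mapping is needed):

# Compare the fine-tuned ESPnet checkpoint against the original Whisper weights.
# If all shared tensors match, the lr=0 single-epoch "training" did not change
# the model and can be ruled out as the cause.
import torch

espnet_sd = torch.load("exp/asr_xxx/valid.acc.ave.pth", map_location="cpu")    # placeholder path to the ESPnet checkpoint
whisper_sd = torch.load("medium.pt", map_location="cpu")["model_state_dict"]   # original Whisper checkpoint

for name, ref in whisper_sd.items():
    key = name  # placeholder: map the Whisper key to the corresponding ESPnet key here
    if key in espnet_sd and not torch.allclose(espnet_sd[key].float(), ref.float()):
        print("mismatch:", name)

For the repetition itself, the post-processing mentioned above corresponds to options such as condition_on_previous_text=False and the compression_ratio_threshold fallback exposed by Whisper's own transcribe().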
from espnet.
Hi, I am also using this recipe, but in asr.sh stage 5 I get a Whisper attribute error: "tokenizer object has no attribute tokenizer". I am using espnet 202402 and openai-whisper 202311.
from espnet.
@Yuanyuan-888 The quick fix is to try an earlier version of whisper, 20230308. Whisper has changed their tokenizer API.
from espnet.
@Yuanyuan-888 The quick fix is to try an earlier version of whisper, 20230308. Whisper has changed their tokenizer API.

Hi! Thank you for your answer. Then it will not be able to use Whisper large-v3 for decoding anymore.
from espnet.