
x-lance / slam-llm

Speech, Language, Audio, Music Processing with Large Language Model

License: MIT License

Python 99.02% Shell 0.83% Dockerfile 0.15%
audio-processing large-language-model multimodal-large-language-models music-processing peft speech-processing

slam-llm's People

Contributors

activescott, amitsangani, anshikavermag, avi-cenna, awgu, chauhang, cmiller01, cwx-worst-one, ddlbojack, hamidshojanazeri, irajmoradi, jeffxtang, johnbwilliams, lauragpt, lchu-ibm, luobots, mreso, philparzer, polym, rohan-varma, sekyondameta, shijie-wu, thuwyh, tim-a-davis, varunfb, wangtianrui, yanghaha0908, zhikangniu, zszheng147, zzasdf


slam-llm's Issues

Avoid putting a bos token before answer?

Hello,
I don't quite understand why a bos token is not added here: "(example = prompt + answer # FIX(MZY): avoid putting a bos token before answer.)". How can autoregressive training be implemented without adding bos?
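For intuition, here is a minimal sketch (hypothetical field names and an illustrative tokenizer, not the repository's exact code) of how prompt + answer is typically tokenized: bos is added once at the start of the prompt, and the labels mask the prompt positions, so autoregressive training still works without a second bos before the answer.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("lmsys/vicuna-7b-v1.5")  # illustrative model

prompt = "Transcribe the speech: "
answer = "HELLO WORLD"

# bos appears exactly once, in front of the prompt.
prompt_ids = [tokenizer.bos_token_id] + tokenizer.encode(prompt, add_special_tokens=False)
# The answer simply continues the same sequence, so no second bos is inserted before it.
answer_ids = tokenizer.encode(answer, add_special_tokens=False) + [tokenizer.eos_token_id]

input_ids = prompt_ids + answer_ids
# Loss is computed only on the answer; prompt positions are masked with -100.
labels = [-100] * len(prompt_ids) + answer_ids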

Do you have any plans for Speech-to-Text or Speech-to-Speech end-to-end models?

🚀 The feature, motivation and pitch

As we all know, GPT-4o is an end-to-end multimodal model that supports speech-to-text and speech-to-speech. I have some ideas about it:

  1. Speech to Text: Could we try combining a pretrained ASR encoder with a trainable linear projection to enable speech-to-text? (A rough sketch of this idea appears after this list.)
  2. Speech to Speech: Align a pretrained ASR decoder with the main LLM backbone.
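A hedged sketch of idea 1 (module and variable names are illustrative, not SLAM-LLM's actual implementation): a frozen ASR encoder feeds a trainable linear projection whose outputs are concatenated with the LLM's text embeddings.

import torch
import torch.nn as nn

class LinearProjector(nn.Module):
    # Trainable bridge from frozen speech-encoder features to the LLM embedding space.
    def __init__(self, encoder_dim: int, llm_dim: int):
        super().__init__()
        self.proj = nn.Linear(encoder_dim, llm_dim)

    def forward(self, speech_feats: torch.Tensor) -> torch.Tensor:
        # speech_feats: (batch, frames, encoder_dim) from the frozen ASR encoder
        return self.proj(speech_feats)  # (batch, frames, llm_dim)

# Sketch of the forward pass; speech_encoder, llm, text_embeds, and labels are assumed to exist.
# speech_embeds = LinearProjector(encoder_dim=1024, llm_dim=4096)(speech_encoder(waveform))
# inputs_embeds = torch.cat([speech_embeds, text_embeds], dim=1)
# loss = llm(inputs_embeds=inputs_embeds, labels=labels).loss  # next-token prediction on the transcript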

Alternatives

No response

Additional context

No response

Mismatch Issue in the EAT Checkpoint Dictionary for the AAC Inference Task

System Info

Consistent with the official repository's environment requirements

Information

  • The official example scripts
  • My own modified scripts

🐛 Describe the bug

I encountered a bug during inference as shown in the Error logs:

AssertionError: Could not infer task type from {'_name': 'mae_image_classification', 'data': '/hpc_stor03/sjtu_home/wenxi.chen/mydata/audio/AS2M', 'multi_data': None, 'input_size': 224, 'local_cache_path': None, 'key': 'imgs', 'beit_transforms': False, 'target_transform': False, 'no_transform': False, 'rebuild_batches': True, 'precompute_mask_config': None, 'subsample': 1.0, 'seed': 1, 'dataset_type': 'imagefolder', 'audio_mae': True, 'h5_format': True, 'downsr_16hz': True, 'target_length': 1024, 'flexible_mask': False, 'esc50_eval': False, 'spcv2_eval': False, 'AS2M_finetune': True, 'spcv1_finetune': False, 'roll_aug': True, 'noise': False, 'weights_file': '/hpc_stor03/sjtu_home/wenxi.chen/mydata/audio/AS2M/weight_train_all.csv', 'num_samples': 200000, 'is_finetuning': False, 'label_descriptors': 'label_descriptors.csv', 'labels': 'lbl'}. Available argparse tasks: dict_keys(['sentence_prediction', 'hubert_pretraining', 'speech_unit_modeling', 'translation', 'online_backtranslation', 'language_modeling', 'speech_to_text', 'text_to_speech', 'cross_lingual_lm', 'translation_multi_simple_epoch', 'denoising', 'multilingual_denoising', 'multilingual_translation', 'legacy_masked_lm', 'masked_lm', 'sentence_prediction_adapters', 'sentence_ranking', 'translation_from_pretrained_bart', 'speech_to_speech', 'translation_from_pretrained_xlm', 'multilingual_masked_lm', 'frm_text_to_speech', 'audio_pretraining', 'audio_finetuning', 'multilingual_language_modeling', 'translation_lev', 'simul_speech_to_text', 'simul_text_to_text', 'semisupervised_translation', 'dummy_lm', 'dummy_masked_lm', 'dummy_mt']). Available hydra tasks: dict_keys(['sentence_prediction', 'hubert_pretraining', 'speech_unit_modeling', 'translation', 'language_modeling', 'masked_lm', 'sentence_prediction_adapters', 'translation_from_pretrained_xlm', 'audio_pretraining', 'audio_finetuning', 'multilingual_language_modeling', 'translation_lev', 'simul_text_to_text', 'dummy_lm', 'dummy_masked_lm'])

After pinpointing the issue, I found that the problem occurred at this step:
SLAM-LLM/src/slam_llm/models/encoder.py, line 77, in load: EATEncoder, cfg, task = fairseq.checkpoint_utils.load_model_ensemble_and_task([model_config.encoder_path])

Further analysis revealed that the mismatch described in the error logs occurs because checkpoint['cfg']['task'] in the pre-trained EAT checkpoint I downloaded from the repository link does not match the code.

How should I modify the dictionary values in the EAT checkpoint to ensure it runs correctly?
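One possible workaround (an assumption on my part, not a confirmed fix from the maintainers): the custom EAT task may simply not be registered with fairseq at load time, so importing the EAT code as a fairseq user module before calling the loader might let the 'mae_image_classification' task be resolved:

from argparse import Namespace

from fairseq import checkpoint_utils, utils

# Assumption: a local clone of the EAT repository (what encoder_fairseq_dir points to in the configs).
fairseq_eat_path = "/path/to/EAT"
utils.import_user_module(Namespace(user_dir=fairseq_eat_path))

# With EAT's custom task registered, the checkpoint's task config should now be resolvable.
models, cfg, task = checkpoint_utils.load_model_ensemble_and_task(["/path/to/EAT_checkpoint.pt"])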

Error logs

Same assertion error as shown in the bug description above.

Expected behavior

An EAT checkpoint that correctly matches the inference script and code.

Query on Metrics Reported in VSR Sub-Project Test Phase

System Info

Excellent work! May I kindly inquire what metrics are reported in the test phase of the VSR sub-project? Is it WER?

Information

  • The official example scripts
  • My own modified scripts

🐛 Describe the bug

Same question as above.

Error logs

Same question as above.

Expected behavior

More details.

License

Hi, thanks for releasing SLAM LLM! Would you mind adding a license? Thanks!

Checkpoint file download permission

I would like to try reproducing the inference results of the VSR and a few other models, but I don't have download permission for the relevant checkpoints. Could you please grant access?
Thanks!

About FSDP and DeepSpeed

Hello,
May I know whether the current FSDP and DeepSpeed support is stable and available for use? Do they support multi-machine, multi-GPU training and LoRA fine-tuning?

Deepspeed training dataset does not have sampler

System Info

torch 2.0.1
torchaudio 2.0.2
torchvision 0.15.2

Information

  • The official example scripts
  • My own modified scripts

🐛 Describe the bug

When training with DeepSpeed, compared with DDP training under the same configuration, the total number of steps per epoch increases by a factor of N (where N is the number of GPUs). Printing the dataloader configuration shows that there is no sampler.
DDP: {'sampler': <torch.utils.data.distributed.DistributedSampler object at 0x7fc99032c640>, 'batch_size': 6, 'drop_last': True, 'collate_fn': <bound method SpeechDatasetJsonl.collator of <speech_dataset.py.SpeechDatasetJsonl object at 0x7fc275f34130>>}
Deepspeed: {'batch_size': 6, 'drop_last': True, 'collate_fn': <bound method SpeechDatasetJsonl.collator of <speech_dataset.py.SpeechDatasetJsonl object at 0x7fbee2e324c0>>}
This may cause each GPU to read exactly the same data.
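For reference, a hedged sketch (standard PyTorch APIs; not necessarily the repository's exact dataloader code) of attaching a DistributedSampler so each rank reads a distinct shard when DeepSpeed is used:

import torch.distributed as dist
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

def build_train_loader(dataset, batch_size, collate_fn, use_distributed: bool):
    # Without a sampler, every rank iterates the full dataset, inflating steps per epoch by world_size.
    sampler = None
    if use_distributed and dist.is_initialized():
        sampler = DistributedSampler(
            dataset,
            num_replicas=dist.get_world_size(),
            rank=dist.get_rank(),
            shuffle=True,
        )
    return DataLoader(
        dataset,
        batch_size=batch_size,
        sampler=sampler,
        shuffle=(sampler is None),
        drop_last=True,
        collate_fn=collate_fn,
    )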

Error logs

the same as above

Expected behavior

Thank you for your outstanding work. I hope this problem can be fixed, and that you can share the time required for DeepSpeed and DDP to train one epoch with the default LibriSpeech configuration. Thanks a lot! :D

Request for additional checkpoints of SLAM-ASR

Hi! Thank you for sharing this interesting work!

I would like to ask whether you can share the additional trained model checkpoints included in the paper.
In particular, I would appreciate the SLAM-ASR checkpoints with the Whisper encoder, which are included in the middle rows of Table 4.

Thank you!

[Question] Does it support the combination of hubert-large + linear-projector + tinyllama?

Motivation

I am trying to replicate your experiments in the paper. Due to resource constraints, I'd like to start with the combination hubert-large + linear projector + tiny-llama-1.1b-chat.

Error

I revised examples/librispeech_asr/scripts/finetune_hubert_xtralarge_linear_vicuna_7b.sh, specifying hubert-large as the speech encoder and tiny-llama-chat as the LLM, and changed the encoder dim accordingly.
Unfortunately, the following error arises during training:

[screenshot of the training error]

In slam_llm/models/slam_model.py, it seems that the length of modality_mask (i.e., audio_length) is shorter than the encoder outputs.
I only changed the speech encoder here, so such an error shouldn't occur.
From my understanding, hubert-large and hubert-xtralarge share the same subsampling rate, so this change shouldn't affect training.
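As a hedged sanity check (variable names and rates here are assumptions for illustration, not SLAM-LLM's exact ones), one way to see whether the mask length really disagrees with the encoder output is to compare the two directly:

import torch

# Assumed values for illustration; both HuBERT large and xtralarge use a ~20 ms frame shift (320x at 16 kHz).
sample_rate = 16000
encoder_ds_rate = 320
encoder_projector_ds_rate = 5  # additional downsampling in the linear projector, per the example configs

waveform = torch.randn(1, sample_rate * 10)  # 10 seconds of dummy audio

# Length that modality_mask (audio_length) would be derived from.
expected_frames = waveform.shape[-1] // encoder_ds_rate // encoder_projector_ds_rate
print("expected projected frames:", expected_frames)

# With the real encoder loaded:
# actual_frames = speech_encoder(waveform).shape[1] // encoder_projector_ds_rate
# If actual_frames > expected_frames, the modality_mask ends up shorter than the encoder outputs,
# which matches the error described above.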

Question

Does the examples/librispeech_asr support the combination mentioned above? Thanks a lot!

LoRA weights and config are not generated when finetuning the model for AAC task with peft

System Info

OS: Ubuntu 22.04.3 LTS
Python Version: 3.10.12
PyTorch Version: 2.0.1
GPU: 1x Nvidia RTX A6000
CUDA Version: 12.4

Information

  • The official example scripts
  • My own modified scripts

🐛 Describe the bug

I am trying to finetune the model for AAC task using LoRA. I use the official finetuning script examples/aac_audiocaps/scripts/finetune_eat_audiocaps.sh with slight modifications to the config params. The training and inference are ok when PEFT is disabled (i.e., finetuning the linear layer only). When I enable PEFT, however, LoRA weights (adapter_model.safetensors) and config (adapter_config.json) are not stored in the output directory.

Here is the output directory for both cases (with and without LoRA) after training is over:

output_dir
+--- .hydra
|    +--- config.yaml
|    +--- hydra.yaml
|    +--- overrides.yaml
+--- aac_epoch_x_step_y
|    +--- model.pt
+--- finetune_aac.log
+--- train.log

Here is the config:

hydra_args="
hydra.run.dir=$output_dir \
++model_config.llm_name=vicuna-7b-v1.5 \
++model_config.llm_path=$llm_path \
++model_config.llm_dim=4096 \
++model_config.encoder_fairseq_dir=$fairseq_eat_path \
++model_config.encoder_name=eat \
++model_config.encoder_ds_rate=2 \
++model_config.encoder_projector_ds_rate=$encoder_projector_ds_rate \
++model_config.encoder_path=$audio_encoder_path \
++model_config.encoder_dim=768 \
++model_config.encoder_projector=linear \
++dataset_config.encoder_projector_ds_rate=${encoder_projector_ds_rate} \
++dataset_config.dataset=audio_dataset \
++dataset_config.train_data_path=$train_jsonl_path \
++dataset_config.val_data_path=$val_jsonl_path \
++dataset_config.input_type=mel \
++dataset_config.fbank_mean=-4.268 \
++dataset_config.fbank_std=4.569 \
++dataset_config.model_name=eat \
++dataset_config.fixed_length=true \
++dataset_config.target_length=1024 \
++train_config.num_epochs=1 \
++train_config.model_name=aac \
++train_config.freeze_encoder=true \
++train_config.freeze_llm=true \
++train_config.batching_strategy=custom \
++train_config.warmup_steps=1000 \
++train_config.total_steps=100000 \
++train_config.lr=$lr \
++train_config.validation_interval=1 \
++train_config.batch_size_training=$btz \
++train_config.val_batch_size=$btz \
++train_config.num_workers_dataloader=4 \
++train_config.use_fp16=true \
++train_config.output_dir=$output_dir \
++train_config.seed=${seed} \
++train_config.use_peft=true \
++log_config.log_file="${output_dir}/train.log" \
++metric=acc \
"

As a side note, the training and validation accuracy improve when PEFT is enabled, suggesting that training with PEFT is actually working, but the learned weights are not stored on disk.
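For reference, a hedged sketch (assuming the LLM is wrapped as a HuggingFace peft.PeftModel; this is not necessarily how SLAM-LLM's checkpoint saving is organized) of how LoRA adapter weights and config are normally written to disk:

import os

def save_lora_adapter(peft_model, output_dir: str, tag: str):
    # Writes adapter_model.safetensors and adapter_config.json into output_dir/tag.
    adapter_dir = os.path.join(output_dir, tag)
    os.makedirs(adapter_dir, exist_ok=True)
    peft_model.save_pretrained(adapter_dir)

# Example usage (hypothetical): save_lora_adapter(model.llm, output_dir, "aac_epoch_x_step_y")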

Error logs

Training is completed without any error, but LoRA weights and config files are not generated.

Expected behavior

The training script should generate the LoRA files along with model.pt.

Valid model.pt for ckpt_path -- is it an open-source model?

System Info

I am trying to run this: bash decode_wavlm_large_linear_vicuna_7b.sh

However, I am not sure what should be given for ckpt_path, since I currently do not have a model.pt. Where do I get this? Is it some open-source model available on Hugging Face, etc.? Please let me know. Currently it is failing with the error below. Thanks @byrTony-Frankzyq

FileNotFoundError: [Errno 2] No such file or directory: '/mnt/efs/manju/if/repos/prompt/slam/output/vicuna-7b-v1.5-librispeech-linear-steplrwarmupkeep1e-4-wavlm-large-20240426/asr_epoch_1_step_1000/model.pt'
Thanks

Information

  • The official example scripts
  • My own modified scripts

🐛 Describe the bug

bash decode_wavlm_large_linear_vicuna_7b.sh

Error logs

FileNotFoundError: [Errno 2] No such file or directory: '/mnt/efs/manju/if/repos/prompt/slam/output/vicuna-7b-v1.5-librispeech-linear-steplrwarmupkeep1e-4-wavlm-large-20240426/asr_epoch_1_step_1000/model.pt'

Expected behavior

It is expected to produce output

The batch decoding results are inconsistent with the non-batch decoding results

System Info

torch 2.0.1
torchaudio 2.0.2
torchvision 0.15.2

Information

  • The official example scripts
  • My own modified scripts

🐛 Describe the bug

Using scripts/decode_hubert_xtralarge_linear_vicuna_7b.sh to decode test-clean.
With batch size 1, the decoding result for 50 utterances is:
1089-134686-0000 HE HOPED THERE WOULD BE STEW FOR DINNER TURNIPS AND CARROTS AND BRUISED POTATOES AND FAT MUTTON PIECES TO BE LADLED OUT IN THICK PEPPERED FLOUR FATTENED SAUCE 1089-134686-0001 STUFF IT INTO YOU HIS BELLY COUNSELLED HIM 1089-134686-0002 AFTER EARLY NIGHTFALL THE YELLOW LAMPS WOULD LIGHT UP HERE AND THERE THE SQUALID QUARTER OF THE BROTHELS 1089-134686-0003 HULLO BERTY ANY GOOD IN YOUR MIND 1089-134686-0004 NUMBER TEN FRESH NELLIE IS WAITING ON YOU GOOD NIGHT HUSBAND 1089-134686-0005 THE MUSIC CAME NEARER AND HE RECALLED THE WORDS THE WORDS OF SHELLEY'S FRAGMENT UPON THE MOON WANDERING COMPANIONLESS PALE FOR WEARINESS 1089-134686-0006 THE DULL LIGHT FELL MORE FAINTLY UPON THE PAGE WHEREON ANOTHER EQUATION BEGAN TO UNFOLD ITSELF SLOWLY AND TO SPREAD ABROAD ITS WIDENING TAIL 1089-134686-0007 A COLD LUCID INDIFFERENCE REIGNED IN HIS SOUL 1089-134686-0008 THE CHAOS IN WHICH HIS ARDOUR EXTINGUISHED ITSELF WAS A COLD INDIFFERENT KNOWLEDGE OF HIMSELF 1089-134686-0009 AT MOST BY AN ALMS GIVEN TO A BEGGAR WHOSE BLESSING HE FLED FROM HE MIGHT HOPE WEARILY TO WIN FOR HIMSELF SOME MEASURE OF ACTUAL GRACE 1089-134686-0010 WELL NOW ENNIS I DECLARE YOU HAVE A HEAD AND SO HAS MY STICK 1089-134686-0011 ON SATURDAY MORNINGS WHEN THE SODALITY MET IN THE CHAPEL TO RECITE THE LITTLE OFFICE HIS PLACE WAS A CUSHIONED KNEELING DESK AT THE RIGHT OF THE ALTAR FROM WHICH HE LED HIS WING OF BOYS THROUGH THE RESPONSES 1089-134686-0012 HER EYES SEEMED TO REGARD HIM WITH MILD PITY HER HOLINESS A STRANGE LIGHT GLOWING FAINTLY UPON HER FRAIL FLESH DID NOT HUMILIATE THE SINNER WHO APPROACHED HER 1089-134686-0013 IF EVER HE WAS IMPELLED TO CAST SIN FROM HIM AND TO REPENT THE IMPULSE THAT MOVED HIM WAS THE WISH TO BE HER KNIGHT 1089-134686-0014 HE TRIED TO THINK HOW IT COULD BE 1089-134686-0015 BUT THE DUSK DEEPENING IN THE SCHOOLROOM COVERED OVER HIS THOUGHTS THE BELL RANG 1089-134686-0016 THEN YOU CAN ASK HIM QUESTIONS ON THE CATECHISM DEDALUS 1089-134686-0017 STEPHEN LEANING BACK AND DRAWING IDLY ON HIS SCRIBLER LISTENED TO THE TALK ABOUT HIM WHICH HARREN CHECKED FROM TIME TO TIME BY SAYING 1089-134686-0018 IT WAS STRANGE TOO THAT HE FOUND AN ARID PLEASURE IN FOLLOWING UP TO THE END THE RIGID LINES OF THE DOCTRINES OF THE CHURCH AND PENETRATING INTO OBSCURE SILENCES ONLY TO HEAR AND FEEL THE MORE DEEPLY HIS OWN CONDEMNATION 1089-134686-0019 THE SENTENCE OF SAINT JAMES'S WHICH SAYS THAT HE WHO OFFENDS AGAINST ONE COMMANDMENT BECOMES GUILTY OF ALL HAD SEEMED TO HIM FIRST A SWOLLEN PHRASE UNTIL HE HAD BEGUN TO GROPE IN THE DARKNESS OF HIS OWN STATE 1089-134686-0020 IF A MAN HAD STOLEN A POUND IN HIS YOUTH AND HAD USED THAT POUND TO AMASS A HUGE FORTUNE HOW MUCH WAS HE OBLIGED TO GIVE BACK THE POUND HE HAD STOLEN ONLY OR THE POUND TOGETHER WITH THE COMPOUND INTEREST ACCRUEING UPON IT OR ALL HIS HUGE FORTUNE 1089-134686-0021 IF A LAYMAN IN GIVING BAPTISM POUR THE WATER BEFORE SAYING THE WORDS IS THE CHILD BAPTIZED 1089-134686-0022 HOW COMES IT THAT WHILE THE FIRST BEATITUDE PROMISES THE KINGDOM OF HEAVEN TO THE POOR OF HEART THE SECOND BEATITUDE PROMISES ALSO TO THE MEEK THAT THEY SHALL POSSESS THE LAND 1089-134686-0023 WHY WAS THE SACRAMENT OF THE EUCHARIST INSTITUTED UNDER THE TWO SPECIES OF BREAD AND WINE IF JESUS CHRIST BE PRESENT BODY AND BLOOD SOUL AND DIVINITY IN THE BREAD ALONE AND IN THE WINE ALONE 1089-134686-0024 IF THE WINE CHANGE INTO VINEGAR AND THE HOST CRUMBLE INTO CORRUPTION AFTER THEY HAVE BEEN CONSECRATED IS JESUS CHRIST STILL PRESENT UNDER THEIR SPECIES AS GOD AND AS MAN 1089-134686-0025 A 
GENTLE KICK FROM THE TALL BOY IN THE BENCH BEHIND URGED STEPHEN TO ASK A DIFFICULT QUESTION 1089-134686-0026 THE RECTOR DID NOT ASK FOR A CATECHISM TO HEAR THE LESSON FROM 1089-134686-0027 HE CLASPED HIS HANDS ON THE DESK AND SAID 1089-134686-0028 THE RETREAT WILL BEGIN ON WEDNESDAY AFTERNOON IN HONOR OF SAINT FRANCIS XAVIER WHOSE FEAST DAY IS SATURDAY 1089-134686-0029 ON FRIDAY CONFESSION WILL BE HEARD ALL THE AFTERNOON AFTER BEADS 1089-134686-0030 BEWARE OF MAKING THAT MISTAKE 1089-134686-0031 STEPHEN'S HEART BEGAN SLOWLY TO FOLD AND FADE WITH FEAR LIKE A WITHERING FLOWER 1089-134686-0032 HE IS CALLED AS YOU KNOW THE APOSTLE OF THE INDIES 1089-134686-0033 A GREAT SAINT SAINT FRANCIS XAVIER 1089-134686-0034 THE RECTOR PAUSED AND THEN SHAKING HIS CLASPED HANDS BEFORE HIM WENT ON 1089-134686-0035 HE HAD THE FAITH IN HIM THAT MOVES MOUNTAINS 1089-134686-0036 A GREAT SAINT SAINT FRANCIS XAVIER 1089-134686-0037 IN THE SILENCE THEIR DARK FIRE KINDLED THE DUSK INTO A TAWNY GLOW 1089-134691-0000 HE COULD WAIT NO LONGER 1089-134691-0001 FOR A FULL HOUR HE HAD PACED UP AND DOWN WAITING BUT HE COULD WAIT NO LONGER 1089-134691-0002 HE SET OFF ABRUPTLY FOR THE BULL WALKING RAPIDLY LEST HIS FATHER'S SHRILL WHISTLE MIGHT CALL HIM BACK AND IN A FEW MOMENTS HE HAD ROUNDED THE CURVE AT THE POLICE BARRACK AND WAS SAFE 1089-134691-0003 THE UNIVERSITY 1089-134691-0004 PRIDE AFTER SATISFACTION UPLIFTED HIM LIKE LONG SLOW WAVES 1089-134691-0005 WHOSE FEET ARE AS THE FEET OF HEARTS AND UNDERNEATH THE EVERLASTING ARMS 1089-134691-0006 THE PRIDE OF THAT DIM IMAGE BROUGHT BACK TO HIS MIND THE DIGNITY OF THE OFFICE HE HAD REFUSED 1089-134691-0007 SOON THE WHOLE BRIDGE WAS TREMBLING AND RESOUNDING 1089-134691-0008 THE UNCOUTH FACES PASSED HIM TWO BY TWO STAINED YELLOW OR RED OR LIVID BY THE SEA AND AS HE STROVE TO LOOK AT THEM WITH EASE AND INDIFFERENCE A FAINT STAIN OF PERSONAL SHAME AND COMMISERATION ROSE TO HIS OWN FACE 1089-134691-0009 ANGRY WITH HIMSELF HE TRIED TO HIDE HIS FACE FROM THEIR EYES BY GAZING DOWN SIDEWAYS INTO THE SHALLOW SWIRLING WATER UNDER THE BRIDGE BUT HE STILL SAW A REFLECTION THEREIN OF THEIR TOP HEAVY SILK HATS AND HUMBLE TAPE LIKE COLLARS AND LOOSELY HANGING CLOTHES BROTHER HICKEY 1089-134691-0010 BROTHER MCARDLE BROTHER KIEFF 1089-134691-0011 THEIR PIETY WOULD BE LIKE THEIR NAMES LIKE THEIR FACES LIKE THEIR CLOTHES AND IT WAS IDLE FOR HIM TO TELL HIMSELF THAT THEIR HUMBLE AND CONTRITE HEARTS IT MIGHT BE PAID A FAR RICHER TRIBUTE OF DEVOTION THAN HIS HAD EVER BEEN A GIFT TENFOLD MORE ACCEPTABLE THAN HIS ELABORATE ADORATION
WER 1.33%

With batch size 10, the result is:
1089-134686-0000 HE HOPED THERE WOULD BE STEW FOR DINNER TURNIPS AND CARROTS AND BRUISED POTATOES AND FAT MUTTON PIECES TO BE LADLED OUT IN THICK PEPPERED FLOUR FATTENED SAUCE 1089-134686-0001 STUFF IT INTO YOU HIS BELLY COUNSELLED HIM 1089-134686-0002 AFTER EARLY NIGHTFALL THE YELLOW LAMPS WOULD LIGHT UP HERE AND THERE THE SQUALID QUARTER OF THE BROTHELS 1089-134686-0003 HULLO BERTY ANY GOOD IN YOUR MIND 1089-134686-0004 NUMBER TEN FRESH NELLIE IS WAITING ON YOU GOOD NIGHT HUSBAND 1089-134686-0005 THE MUSIC CAME NEARER AND HE RECALLED THE WORDS THE WORDS OF SHELLEY'S FRAGMENT UPON THE MOON WANDERING COMPANIONLESS PALE FOR WEARINESS 1089-134686-0006 THE DULL LIGHT FELL MORE FAINTLY UPON THE PAGE WHEREON ANOTHER EQUATION BEGAN TO UNFOLD ITSELF SLOWLY AND TO SPREAD ABROAD ITS WIDENING TAIL 1089-134686-0007 A COLD LUCID INDIFFERENCE REIGNED IN HIS SOUL 1089-134686-0008 THE CHAOS IN WHICH HIS ARDOUR EXTINGUISHED ITSELF WAS A COLD INDIFFERENT KNOWLEDGE OF HIMSELF 1089-134686-0009 AT MOST BY AN ALMS GIVEN TO A BEGGAR WHOSE BLESSING HE FLED FROM HE MIGHT HOPE WEARILY TO WIN FOR HIMSELF SOME MEASURE OF ACTUAL GRACE 1089-134686-0010 WELL NOW ENNIS I DECLARE YOU HAVE A HEAD AND SO HAS MY STICK 1089-134686-0011 ON SATURDAY MORNINGS WHEN THE SODALITY MET IN THE CHAPEL TO RECITE THE LITTLE OFFICE HIS PLACE WAS A CUSHIONED KNEELING DESK AT THE RIGHT OF THE ALTAR FROM WHICH HE LED HIS WING OF BOYS THROUGH THE RESPONSES 1089-134686-0012 HER EYES SEEMED TO REGARD HIM WITH MILD PITY HER HOLINESS A STRANGE LIGHT GLOWING FAINTLY UPON HER FRAIL FLESH DID NOT HUMILIATE THE SINNER WHO APPROACHED HER 1089-134686-0013 IF EVER HE WAS IMPELLED TO CAST SIN FROM HIM AND TO REPENT THE IMPULSE THAT MOVED HIM WAS THE WISH TO BE HER KNIGHT 1089-134686-0014 TRIED TO THINK HOW IT COULD BE 1089-134686-0015 DEEPENING IN THE SCHOOLROOM COVERED OVER HIS THOUGHTS THE BELL RANG 1089-134686-0016 THEN YOU CAN ASK HIM QUESTIONS ON THE CATECHISM DEDALUS 1089-134686-0017 STEPHEN LEANING BACK AND DRAWING IDLY ON HIS SCRIBLER LISTENED TO THE TALK ABOUT HIM WHICH HARREN CHECKED FROM TIME TO TIME BY SAYING 1089-134686-0018 IT WAS STRANGE TOO THAT HE FOUND AN ARID PLEASURE IN FOLLOWING UP TO THE END THE RIGID LINES OF THE DOCTRINES OF THE CHURCH AND PENETRATING INTO OBSCURE SILENCES ONLY TO HEAR AND FEEL THE MORE DEEPLY HIS OWN CONDEMNATION 1089-134686-0019 THE SENTENCE OF SAINT JAMES'S WHICH SAYS THAT HE WHO OFFENDS AGAINST ONE COMMANDMENT BECOMES GUILTY OF ALL HAD SEEMED TO HIM FIRST A SWOLLEN PHRASE UNTIL HE HAD BEGUN TO GROPE IN THE DARKNESS OF HIS OWN STATE 1089-134686-0020 IF A MAN HAD STOLEN A POUND IN HIS YOUTH AND HAD USED THAT POUND TO AMASS A HUGE FORTUNE HOW MUCH WAS HE OBLIGED TO GIVE BACK THE POUND HE HAD STOLEN ONLY OR THE POUND TOGETHER WITH THE COMPOUND INTEREST ACCRUEING UPON IT OR ALL HIS HUGE FORTUNE 1089-134686-0021 IF A LAYMAN IN GIVING BAPTISM POUR THE WATER BEFORE SAYING THE WORDS IS THE CHILD BAPTIZED 1089-134686-0022 HOW COMES IT THAT WHILE THE FIRST BEATITUDE PROMISES THE KINGDOM OF HEAVEN TO THE POOR OF HEART THE SECOND BEATITUDE PROMISES ALSO TO THE MEEK THAT THEY SHALL POSSESS THE LAND 1089-134686-0023 WHY WAS THE SACRAMENT OF THE EUCHARIST INSTITUTED UNDER THE TWO SPECIES OF BREAD AND WINE IF JESUS CHRIST BE PRESENT BODY AND BLOOD SOUL AND DIVINITY IN THE BREAD ALONE AND IN THE WINE ALONE 1089-134686-0024 IF THE WINE CHANGE INTO VINEGAR AND THE HOST CRUMBLE INTO CORRUPTION AFTER THEY HAVE BEEN CONSECRATED IS JESUS CHRIST STILL PRESENT UNDER THEIR SPECIES AS GOD AND AS MAN 1089-134686-0025 GENTLE KICK FROM THE 
TALL BOY IN THE BENCH BEHIND URGED STEPHEN TO ASK A DIFFICULT QUESTION 1089-134686-0026 THE RECTOR DID NOT ASK FOR A CATECHISM TO HEAR THE LESSON FROM 1089-134686-0027 1089-134686-0028 RETREAT WILL BEGIN ON WEDNESDAY AFTERNOON IN HONOR OF SAINT FRANCIS XAVIER WHOSE FEAST DAY IS SATURDAY 1089-134686-0029 ON FRIDAY CONFESSION WILL BE HEARD ALL THE AFTERNOON AFTER BEADS 1089-134686-0030 BEWARE OF MAKING THAT MISTAKE 1089-134686-0031 STEPHEN'S HEART BEGAN SLOWLY TO FOLD AND FADE WITH FEAR LIKE A WITHERING FLOWER 1089-134686-0032 IS CALLED AS YOU KNOW THE APOSTLE OF THE INDIES 1089-134686-0033 A GREAT SAINT SAINT FRANCIS XAVIER 1089-134686-0034 THE RECTOR PAUSED AND THEN SHAKING HIS CLASPED HANDS BEFORE HIM WENT ON 1089-134686-0035 HE HAD THE FAITH IN HIM THAT MOVES MOUNTAINS 1089-134686-0036 A GREAT SAINT SAINT FRANCIS XAVIER 1089-134686-0037 IN THE SILENCE THEIR DARK FIRE KINDLED THE DUSK INTO A TAWNY GLOW 1089-134691-0000 HE COULD WAIT NO LONGER 1089-134691-0001 FULL HOUR HE HAD PACED UP AND DOWN WAITING BUT HE COULD WAIT NO LONGER 1089-134691-0002 OFF ABRUPTLY FOR THE BULL WALKING RAPIDLY LEST HIS FATHER'S SHRILL WHISTLE MIGHT CALL HIM BACK AND IN A FEW MOMENTS HE HAD ROUNDED THE CURVE AT THE POLICE BARRACK AND WAS SAFE 1089-134691-0003 THE UNIVERSITY 1089-134691-0004 PRIDE AFTER SATISFACTION UPLIFTED HIM LIKE LONG SLOW WAVES 1089-134691-0005 WHOSE FEET ARE AS THE FEET OF HEARTS AND UNDERNEATH THE EVERLASTING ARMS 1089-134691-0006 THE PRIDE OF THAT DIM IMAGE BROUGHT BACK TO HIS MIND THE DIGNITY OF THE OFFICE HE HAD REFUSED 1089-134691-0007 SOON THE WHOLE BRIDGE WAS TREMBLING AND RESOUNDING 1089-134691-0008 UNCOUTH FACES PASSED HIM TWO BY TWO STAINED YELLOW OR RED OR LIVID BY THE SEA AND AS HE STROVE TO LOOK AT THEM WITH EASE AND INDIFFERENCE A FAINT STAIN OF PERSONAL SHAME AND COMMISERATION ROSE TO HIS OWN FACE 1089-134691-0009 ANGRY WITH HIMSELF HE TRIED TO HIDE HIS FACE FROM THEIR EYES BY GAZING DOWN SIDEWAYS INTO THE SHALLOW SWIRLING WATER UNDER THE BRIDGE BUT HE STILL SAW A REFLECTION THEREIN OF THEIR TOP HEAVY SILK HATS AND HUMBLE TAPE LIKE COLLARS AND LOOSELY HANGING CLOTHES BROTHER HICKEY 1089-134691-0010 BROTHER MCARDLE BROTHER KIEFF 1089-134691-0011 THEIR PIETY WOULD BE LIKE THEIR NAMES LIKE THEIR FACES LIKE THEIR CLOTHES AND IT WAS IDLE FOR HIM TO TELL HIMSELF THAT THEIR HUMBLE AND CONTRITE HEARTS IT MIGHT BE PAID A FAR RICHER TRIBUTE OF DEVOTION THAN HIS HAD EVER BEEN A GIFT TENFOLD MORE ACCEPTABLE THAN HIS ELABORATE ADORATION
WER 3.48%
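One common cause of such a gap (offered as an assumption, not a confirmed diagnosis of this repository) is padding in batched generation with a decoder-only LLM: right-padded prompts change which positions the model conditions on, so left padding plus an explicit attention mask is usually required. A minimal sketch with a plain text model:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "lmsys/vicuna-7b-v1.5"  # illustrative
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.padding_side = "left"            # decoder-only models should be left-padded for batching
tokenizer.pad_token = tokenizer.eos_token  # define a pad token if the tokenizer lacks one

model = AutoModelForCausalLM.from_pretrained(model_name)

prompts = ["Transcribe: <audio one>", "Transcribe: <audio two>"]
batch = tokenizer(prompts, return_tensors="pt", padding=True)

# Passing the attention mask keeps padded positions from influencing generation.
outputs = model.generate(
    input_ids=batch["input_ids"],
    attention_mask=batch["attention_mask"],
    max_new_tokens=256,
)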

Error logs

same as above

Expected behavior

None

FSDP training raise "KeyError: 'ShardingStrategy.NO_SHARD'"

System Info

torch 2.0.1
torchaudio 2.0.2
torchvision 0.15.2

Information

  • The official example scripts
  • My own modified scripts

🐛 Describe the bug

Hi, I can train the asr_librispeech finetuning code with DDP; however, when I switch to FSDP, an exception is raised.

Error logs

Traceback (most recent call last):
File "examples/asr_librispeech/finetune_asr.py", line 41, in main_hydra
train(kwargs)
File "/SLAM-LLM/src/slam_llm/pipeline/finetune.py", line 167, in main
model = FSDP(
File "/usr/local/lib/python3.8/dist-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 391, in init
_auto_wrap(auto_wrap_kwargs, fsdp_kwargs, FullyShardedDataParallel)
File "/usr/local/lib/python3.8/dist-packages/torch/distributed/fsdp/_wrap_utils.py", line 73, in _auto_wrap
_recursive_wrap(**auto_wrap_kwargs, **fsdp_kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/distributed/fsdp/wrap.py", line 370, in _recursive_wrap
wrapped_child, num_wrapped_params = _recursive_wrap(
File "/usr/local/lib/python3.8/dist-packages/torch/distributed/fsdp/wrap.py", line 370, in _recursive_wrap
wrapped_child, num_wrapped_params = _recursive_wrap(
File "/usr/local/lib/python3.8/dist-packages/torch/distributed/fsdp/wrap.py", line 370, in _recursive_wrap
wrapped_child, num_wrapped_params = _recursive_wrap(
[Previous line repeated 3 more times]
File "/usr/local/lib/python3.8/dist-packages/torch/distributed/fsdp/wrap.py", line 388, in _recursive_wrap
return _wrap(module, wrapper_cls, **kwargs), nonwrapped_numel
File "/usr/local/lib/python3.8/dist-packages/torch/distributed/fsdp/wrap.py", line 317, in _wrap
return wrapper_cls(module, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 408, in init
_init_param_handle_from_module(
File "/usr/local/lib/python3.8/dist-packages/torch/distributed/fsdp/_init_utils.py", line 429, in _init_param_handle_from_module
_init_param_handle_from_params(state, managed_params, fully_sharded_module)
File "/usr/local/lib/python3.8/dist-packages/torch/distributed/fsdp/_init_utils.py", line 529, in _init_param_handle_from_params
SHARDING_STRATEGY_MAP[state.sharding_strategy],
KeyError: 'ShardingStrategy.NO_SHARD'

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2031656) of binary: /usr/bin/python3
Traceback (most recent call last):
File "/usr/local/bin/torchrun", line 8, in
sys.exit(main())
File "/usr/local/lib/python3.8/dist-packages/torch/distributed/elastic/multiprocessing/errors/init.py", line 346, in wrapper
return f(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/distributed/run.py", line 794, in main
run(args)
File "/usr/local/lib/python3.8/dist-packages/torch/distributed/run.py", line 785, in run
elastic_launch(
File "/usr/local/lib/python3.8/dist-packages/torch/distributed/launcher/api.py", line 134, in call
return launch_agent(self._config, self._entrypoint, list(args))
File "/usr/local/lib/python3.8/dist-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
raise ChildFailedError(

Expected behavior

How should I modify the configuration so that FSDP can be used for speedup? Thanks a lot! :D
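The KeyError suggests the sharding strategy reaches FSDP as the string 'ShardingStrategy.NO_SHARD' rather than the enum member. A hedged sketch (assuming the value comes from a string-valued config field; not the repository's exact code) of normalizing it before constructing FSDP:

from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp import ShardingStrategy

def resolve_sharding_strategy(value):
    # Accept either a ShardingStrategy member or a string such as "NO_SHARD" or "ShardingStrategy.NO_SHARD".
    if isinstance(value, ShardingStrategy):
        return value
    name = str(value).split(".")[-1]
    return ShardingStrategy[name]

# model = FSDP(model, sharding_strategy=resolve_sharding_strategy(train_config.sharding_strategy), ...)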
