
Bing BERT (microsoft/DeepSpeedExamples, open, 28 comments)

microsoft commented on August 25, 2024
Bing BERT


Comments (28)

jeffra commented on August 25, 2024

Hi @Rachnas and @tomekrut, we have uploaded our pre-processing script for the raw bookcorpus and wikipedia datasets to get them into our numpy compatible format. We haven't written up a tutorial yet on how to use them but feel free to check out the script here: https://github.com/microsoft/DeepSpeedExamples/blob/jeffra/bert_preprocessing/bing_bert/turing/bert_pretrain_data.py
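
As a rough illustration of the idea (the field names and shard layout below are hypothetical; the actual Turing format is defined in the script linked above), the pre-processing boils down to tokenizing sentence pairs into fixed-length integer arrays and saving them with numpy:

```python
# Hypothetical sketch of producing a "numpy compatible" shard; the real
# format lives in bert_pretrain_data.py linked above.
import numpy as np
from transformers import BertTokenizer  # assumes a HuggingFace WordPiece tokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

def encode(sent_a, sent_b, max_seq_len=128):
    enc = tokenizer(sent_a, sent_b, max_length=max_seq_len,
                    padding="max_length", truncation=True)
    return enc["input_ids"], enc["token_type_ids"], enc["attention_mask"]

# Raw wikipedia/bookcorpus sentence pairs would be streamed in here.
pairs = [("First sentence.", "Next sentence.")]
ids, segs, masks = zip(*(encode(a, b) for a, b in pairs))
np.savez("shard_0000.npz",  # hypothetical shard name
         input_ids=np.array(ids, dtype=np.int32),
         segment_ids=np.array(segs, dtype=np.int32),
         input_mask=np.array(masks, dtype=np.int32))
```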

tjruwase commented on August 25, 2024

@piyushghai We are pleased to announce that support for training Bing BERT with the Nvidia dataset has been added in #27. Please give it a try.

liuyq47 commented on August 25, 2024

Hi, thanks for adding the NVIDIA dataset support. After trying it out, I sometimes see spikes in step time during training, like the one shown below. The spikes happen in the allreduce step.
[Screenshot: training log, 2020-07-27, showing the step-time spike]

I don't have the original dataset, so I don't know whether it shows similar behavior.

tjruwase commented on August 25, 2024

Thanks for trying out DeepSpeed. Unfortunately, these datasets are not yet publicly available. We are working on resolving this. Apologies for the inconvenience.

oliverhu commented on August 25, 2024

Any update on the dataset?

sriramsrao commented on August 25, 2024

Can we run the DeepSpeed BERT trainer on the NVIDIA-generated HDF5 data?

jeffra commented on August 25, 2024

We'll be open sourcing the pre-processing scripts we used to get the data in this format very soon. However, if you're both at LinkedIn we can probably figure out a way for you to just download our datasets directly. Send me an email internally.

In theory you should be able to run on the NVIDIA HDF5 format, but it will take some code changes to support it, which we have not done.
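
For anyone attempting that change, the NVIDIA shards are plain HDF5 and can be inspected with h5py; the key names below follow NVIDIA's DeepLearningExamples data generator, but treat them as an assumption and verify against your own shards:

```python
# Inspect one NVIDIA-style BERT pretraining shard (the filename is hypothetical).
import h5py

with h5py.File("wiki_training_shard_0000.hdf5", "r") as f:
    print(list(f.keys()))  # expect keys like input_ids, input_mask, segment_ids,
                           # masked_lm_positions, masked_lm_ids, next_sentence_labels
    input_ids = f["input_ids"][:]  # shape: (num_samples, max_seq_len)
    print(input_ids.shape)
```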

oliverhu commented on August 25, 2024

@jeffra that's awesome, thanks :) Sending the email now.

Rachnas commented on August 25, 2024

I am also looking for these datasets for pre-training a BERT model. Any update on data availability?

Rachnas commented on August 25, 2024

@jeffra Thank you!

piyushghai commented on August 25, 2024

@jeffra
I was trying to run Bing BERT and hit the same issue, where the dataset is missing.

  1. Do you know if I can leverage a dataset created with https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/LanguageModeling/BERT/data to run Bing BERT?

  2. Or do you know when a tutorial will be available for the pre-processing script for the wiki and book corpus?

tjruwase commented on August 25, 2024

@sriramsrao, @oliverhu, @tomekrut We have added support for training with the Nvidia dataset. Thanks for your patience. We would really appreciate feedback on your experience trying it out. Thanks!

oliverhu commented on August 25, 2024

thank you @tjruwase

tjruwase commented on August 25, 2024

@liuyq47 Thanks for trying out the new dataset.

Can you be more specific about the timer names and values showing the spikes? The highlighted section of the screenshot looks fine to me, except it seems you are running with a gradient accumulation step of 1 and an effective batch size of 4K (instead of 64K).
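
For reference, the effective batch size is the per-GPU micro-batch times the gradient accumulation steps times the number of GPUs; assuming the 64-GPU, micro-batch-64 setup reported later in this thread, that is where the 4K figure comes from:

```python
# Effective batch size arithmetic; the GPU count and micro-batch size are
# assumptions based on the hardware and batch size reported below.
micro_batch_per_gpu = 64
grad_accum_steps = 1
num_gpus = 64  # 8 DGX-1 nodes x 8 V100s each

print(micro_batch_per_gpu * grad_accum_steps * num_gpus)  # 4096, i.e. "4K"
print(micro_batch_per_gpu * 16 * num_gpus)                # 65536, i.e. "64K"
```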

liuyq47 commented on August 25, 2024

I was comparing the time to the steps above and below the highlighted section. Normally the backward pass takes around 400ms and the backward_allreduce step takes around 229ms, but the highlighted section has a much higher backward pass time, which is due to higher backward_allreduce time.

tjruwase commented on August 25, 2024

Thanks for the clarification. So to confirm, you are observing occasional spikes in allreduce time from ~229ms to ~415ms. Yes, that does look odd. To help reproduce it for a quick sanity check, can you please share your JSON config and hardware details (GPU type and count)?

liuyq47 commented on August 25, 2024

I'm using 8 DGX-1 nodes (64 V100-SXM2 GPUs), PyTorch 1.5.0, and CUDA 10.1.

deepspeed_bsz64k_lamb_config_seq128.json.txt

bert_large_lamb_nvidia_data.json.txt

tjruwase commented on August 25, 2024

Awesome. Thanks!

tjruwase commented on August 25, 2024

@liuyq47 I can confirm that I see occasional spikes in all-reduce latency as well with a similar setup. In my case, I used a single DGX-2 node (16 GPUs) and saw a min/max of 20msec/37msec. I don't know what could cause such spikes, and don't want to speculate at this point. While these spikes should not affect convergence, I am curious whether they have a noticeable impact on your training speed, especially as you increase the number of nodes. Is this the case? Can you try increasing the gradient accumulation steps (and reducing the number of nodes)?
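
Concretely, that means raising gradient_accumulation_steps in the DeepSpeed config. The three batch-size keys below are standard DeepSpeed settings, shown here as a Python dict mirroring the attached JSON; the values are only illustrative:

```python
# Illustrative DeepSpeed batch settings. Invariant: train_batch_size ==
#   train_micro_batch_size_per_gpu * gradient_accumulation_steps * num_gpus
ds_config = {
    "train_batch_size": 65536,
    "train_micro_batch_size_per_gpu": 64,
    "gradient_accumulation_steps": 16,  # 64 * 16 * 64 GPUs = 65536
}
```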

liuyq47 commented on August 25, 2024

I've seen the spikes with gradient accumulation too (8 nodes with a batch size of 64 and gradient accumulation of 16) and with a higher number of nodes (64 DGX-1). Normal all-reduce time is 200ms, but sometimes I saw >300ms, or even 500ms. Is this spike much longer than what you see (20msec/37msec)? It does not affect training accuracy but does affect training time. I saw these spikes happening around 20% of the time.

tjruwase commented on August 25, 2024

@liuyq47 Thanks for confirming that this issue shows up with gradient accumulation. Now I suspect it has to do with the nvidia dataset, as I don't believe we have previously seen this with the bing dataset. One difference I notice is that the nvidia dataset uses a random data sampler whereas the bing dataset uses a distributed sampler.

Regarding the spikes, 200ms/500ms in your case versus 20ms/37ms in mine, I am more concerned about the relative size. In other words, you are seeing a 2.5X spike, which is very significant, while it is a lower 1.8X for me. More concerning is that allreduce was already the slowest portion of your computation (compared to forward/backward/optimizer), so a 2.5X spike 20% of the time is quite significant. We will take a closer look into this. Thanks so much for helping to diagnose this far.
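
For readers following along, the sampler difference mentioned above maps onto PyTorch's two stock samplers. A minimal sketch (the actual dataloader wiring lives in the Bing BERT code):

```python
# RandomSampler draws indices independently in each process; DistributedSampler
# partitions the dataset across ranks into disjoint, evenly sized shards.
import torch
from torch.utils.data import RandomSampler, TensorDataset
from torch.utils.data.distributed import DistributedSampler

dataset = TensorDataset(torch.arange(1024))

random_sampler = RandomSampler(dataset)
# In real use DistributedSampler reads num_replicas/rank from torch.distributed;
# they are passed explicitly here so the sketch runs standalone.
dist_sampler = DistributedSampler(dataset, num_replicas=64, rank=0, shuffle=True)
print(len(list(random_sampler)), len(list(dist_sampler)))  # 1024 16
```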

vgaraujov commented on August 25, 2024

> Hi @Rachnas and @tomekrut, we have uploaded our pre-processing script for the raw bookcorpus and wikipedia datasets to get them into our numpy compatible format. We haven't written up a tutorial yet on how to use them but feel free to check out the script here: https://github.com/microsoft/DeepSpeedExamples/blob/jeffra/bert_preprocessing/bing_bert/turing/bert_pretrain_data.py

Hi @jeffra,
Are you still willing to share your original dataset? I am really interested in replicating your results.

Thanks

huahuaai commented on August 25, 2024

How can I download the dataset from Nvidia?

dancingpipi commented on August 25, 2024

@tjruwase
"The scripts assume that the datasets are available in the path /workspace/bert"
could you show me the directory tree of /workspace/bert ? I have download nvidia wiki data, and formatted them to hdf5. but don't know how to put them to data dir.

tjruwase commented on August 25, 2024

@dancingpipi, sorry, I have not run this in a long time and don't have the datasets set up on my box. But can you try
/workspace/bert/data/128
/workspace/bert/data/512

The related configuration setting is here.

Let me know if that works.
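
In other words, a layout along these lines (the shard names are hypothetical; the 128 and 512 directories correspond to the two sequence lengths):

```
/workspace/bert
└── data
    ├── 128/   # seq-len 128 HDF5 shards, e.g. wiki_training_0000.hdf5
    └── 512/   # seq-len 512 HDF5 shards
```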

dancingpipi commented on August 25, 2024

> @dancingpipi, sorry, I have not run this in a long time and don't have the datasets set up on my box. But can you try
> /workspace/bert/data/128
> /workspace/bert/data/512
>
> The related configuration setting is here.
>
> Let me know if that works.

Thanks for your reply, I'll give it a try.

dancingpipi commented on August 25, 2024

> @dancingpipi, sorry, I have not run this in a long time and don't have the datasets set up on my box. But can you try
> /workspace/bert/data/128
> /workspace/bert/data/512
>
> The related configuration setting is here.
>
> Let me know if that works.

It works!~

zyz0000 commented on August 25, 2024

@jeffra Could you send me an email to share your datasets for bert pretraining? Thank you so much!
