Comments (5)
@Rooders Please check whether update_cycle is set to 1. If it is, then I think the training speed is abnormal. Usually, each training step takes less than 1 second with the default parameters (model=Transformer, update_cycle=1, device_list=[0], batch_size=4096). The most likely reason is that your training program is running on the CPU rather than the GPU. Please make sure device_list is set to the index of the GPU you are going to use.
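For context on why update_cycle matters for step time, here is a rough sketch of the arithmetic. It assumes that update_cycle means gradient accumulation (several batches per parameter update) and that batch_size is counted in tokens per device; neither is stated in this thread, and the function name `tokens_per_update` is made up for illustration.

```python
def tokens_per_update(batch_size, update_cycle, num_devices=1):
    """Approximate tokens processed per parameter update.

    Assumption (not confirmed in this thread): each step accumulates
    gradients over `update_cycle` batches of `batch_size` tokens on
    each of `num_devices` devices.
    """
    return batch_size * update_cycle * num_devices


# Default parameters quoted above versus the UserManual.pdf settings
# reported later in this thread.
default_tokens = tokens_per_update(batch_size=4096, update_cycle=1)
manual_tokens = tokens_per_update(batch_size=6250, update_cycle=4)

print("default:", default_tokens)
print("manual:", manual_tokens)
print("ratio:", manual_tokens / default_tokens)
```

Under these assumptions, a step with update_cycle=4 and batch_size=6250 processes roughly six times as many tokens as a default step, so the two step times are not directly comparable.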
from thumt.
Sorry, my default parameters are the recommended best parameters from UserManual.pdf: update_cycle=4, batch_size=6250.
But I just followed your advice and set update_cycle=1, batch_size=4096, device_list=[0], and it is still slow: each training step takes about 2.6 seconds. For this run, my GPU is a single Tesla P40 22G. I have checked the device index and it is available, but training did not use the GPU. Is my TensorFlow version wrong? My TensorFlow version is tensorflow-gpu=1.15.
from thumt.
THUMT-TensorFlow can run with tensorflow-gpu=1.15. You can run a simple TensorFlow GPU program (for example, a matrix multiplication) to check whether it can use the GPU. If it cannot, check the CUDA version and the driver version to make sure they match.
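A minimal sketch of such a check is below. It is not part of THUMT; the function name `check_tensorflow_gpu` and the report keys are made up for illustration. It uses the `tf.compat.v1` session API, which should be available in TensorFlow 1.15, and falls back to the CPU when no GPU is visible so the script still finishes.

```python
def check_tensorflow_gpu(size=1024):
    """Report whether TensorFlow can see a GPU and run a matmul on it.

    Illustrative helper, not part of THUMT. Returns a dict with:
      tensorflow_installed: True if `import tensorflow` succeeded
      gpu_device: device name such as "/device:GPU:0", or "" if none
      matmul_ok: True if the matrix multiplication ran to completion
    """
    report = {"tensorflow_installed": False, "gpu_device": "", "matmul_ok": False}
    try:
        import tensorflow as tf
    except ImportError:
        return report
    report["tensorflow_installed"] = True

    # Empty string means TensorFlow found no usable GPU.
    report["gpu_device"] = tf.test.gpu_device_name()
    device = report["gpu_device"] or "/cpu:0"

    # Build the matmul in an explicit graph, pinned to the chosen device.
    graph = tf.Graph()
    with graph.as_default():
        with tf.device(device):
            a = tf.random.normal([size, size])
            b = tf.random.normal([size, size])
            c = tf.matmul(a, b)

    # log_device_placement prints which device each op actually ran on.
    config = tf.compat.v1.ConfigProto(log_device_placement=True)
    with tf.compat.v1.Session(graph=graph, config=config) as sess:
        result = sess.run(c)

    report["matmul_ok"] = result.shape == (size, size)
    return report


if __name__ == "__main__":
    print(check_tensorflow_gpu())
```

If `gpu_device` comes back empty even though the GPU is physically present, the TensorFlow log printed at import or session creation usually names the CUDA library it failed to load, which is a good place to start when checking the CUDA and driver versions.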
from thumt.
Thank you very much, the issue has been solved. It was because the CUDA version didn't match the TensorFlow version.
By the way, if I set update_cycle=1, batch_size=4096, what BLEU score can I expect on the validation corpus when training a zh-en model on 200 million sentence pairs?
from thumt.
@Rooders Sorry, we did not record the BLEU scores under this setting.
from thumt.
Related Issues (20)
- what is the hparams_set for benchmark transformer model?
- translator.py generates an empty file, and the program reports no error
- Model training fails to converge
- pytorch version ? Providing a bool or integral fill value without setting the optional `dtype` or `out` arguments is currently unsupported. In PyTorch 1.7,
- TypeError: Can't instantiate abstract class MapDataset with abstract methods _inputs, set_inputs
- Question about translating with CPU
- Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
- On the WMT17 zh-en dataset, the result is not as good as on WMT14 en-de
- Request for Chinese documentation
- Some questions
- Training hangs with no response and no error
- No eval folder is generated during training, and no validation information appears in the log
- Error: TypeError: Expected 'Iterator' as the return annotation for `__iter__` of Dataset, but found thumt.data.iterator.Iterator
- A question
- Some questions
- Code Problem and Potential Solution: Inference with CPU
- Why does the TensorFlow version only append eos at the end of the target side, without adding bos at the start?
- how to fine tuning with pre_trained model
- get_relevance raises a "cast float to string" error