Comments (8)
https://arxiv.org/abs/1806.00920
Referring to the official DRCD paper, the reported BERT baseline is an F1 score of 89.59% and an Exact Match score of 82.34%.
from chinese-bert-wwm.
Could you share the experiment code for the reading-comprehension tasks? With the official BERT code, a single GPU cannot fit a batch size of 64.
See: https://github.com/ymcui/CMRC2018-DRCD-BERT/
On top of that code, only the learning-rate schedule was tuned to improve the baseline results.
python run_cmrc2018_drcd_baseline.py \
  --vocab_file=${PATH_TO_BERT}/multi_cased_L-12_H-768_A-12/vocab.txt \
  --bert_config_file=${PATH_TO_BERT}/multi_cased_L-12_H-768_A-12/bert_config.json \
  --init_checkpoint=${PATH_TO_BERT}/multi_cased_L-12_H-768_A-12/bert_model.ckpt \
  --do_train=True \
  --train_file=${DATA_DIR}/cmrc2018_train.json \
  --do_predict=True \
  --predict_file=${DATA_DIR}/cmrc2018_dev.json \
  --train_batch_size=32 \
  --num_train_epochs=2 \
  --max_seq_length=512 \
  --doc_stride=128 \
  --learning_rate=3e-5 \
  --save_checkpoints_steps=1000 \
  --output_dir=${MODEL_DIR} \
  --do_lower_case=False \
  --use_tpu=False
Were the results in the paper obtained with this configuration?
Please use the hyperparameters from the technical report (batch size 64, learning rate 3e-5, 2 epochs, max length 512); anything not mentioned uses the default configuration.
Also, the command you posted is for multilingual BERT, not Chinese BERT. do_lower_case needs to be set to True, since the Chinese BERT is an uncased model.
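As background on that flag: with do_lower_case=True, BERT's basic tokenizer lowercases the input and strips accent marks before WordPiece; Chinese characters pass through unchanged. A minimal pure-Python sketch of that preprocessing step (an illustration, not the official implementation):

```python
import unicodedata

def bert_lowercase(text: str) -> str:
    # Mimic BERT's do_lower_case=True preprocessing: lowercase the text,
    # then drop combining accent marks after NFD decomposition.
    text = text.lower()
    text = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")

print(bert_lowercase("BERT基线 Résumé"))  # → "bert基线 resume"
```

Chinese characters have no case and no combining marks, so only the embedded Latin text is affected.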
With the code at https://github.com/ymcui/CMRC2018-DRCD-BERT/, a single 32 GB V100 cannot fit a batch size of 64. What GPUs were the experiments in the paper run on? Was the batch size scaled up with synchronized multi-GPU training?
TPU v2
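For anyone hitting the single-GPU memory limit above, a common workaround (not what was done for the paper, which ran on a TPU v2) is gradient accumulation: run several micro-batches, scale each loss, and step once, so the gradient averages over the effective batch. A minimal PyTorch sketch with a hypothetical toy model:

```python
import torch

# Simulate an effective batch of 64 on hardware that only fits
# micro-batches of 16, using gradient accumulation.
torch.manual_seed(0)
model = torch.nn.Linear(8, 1)  # hypothetical stand-in for BERT
optimizer = torch.optim.SGD(model.parameters(), lr=3e-5)

accum_steps = 4   # 4 micro-batches of 16 -> effective batch of 64
micro_batch = 16
data = torch.randn(accum_steps * micro_batch, 8)
target = torch.randn(accum_steps * micro_batch, 1)

optimizer.zero_grad()
for i in range(accum_steps):
    x = data[i * micro_batch:(i + 1) * micro_batch]
    y = target[i * micro_batch:(i + 1) * micro_batch]
    loss = torch.nn.functional.mse_loss(model(x), y)
    # Scale by accum_steps so the summed gradients equal the
    # mean-loss gradient over all 64 samples.
    (loss / accum_steps).backward()
optimizer.step()  # one update for the whole effective batch
```

Note this matches a large batch exactly only for losses that average over samples; layers with batch statistics (e.g. BatchNorm) still see micro-batches.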
reopen if necessary
Related Issues (20)
- Loss issue during pre-training HOT 2
- How many TPUs were used for training? HOT 2
- Does pre-training tokenization use the "##"-prefixed tokens in the Chinese vocab? If so, can a whole-word-masking pre-trained model be used in downstream tasks without word segmentation? HOT 4
- A bit confused: what does "the open-source release does not include the MLM task weights" mean? HOT 3
- RoBERTa-wwm-ext-large does not converge when applied to a brand-new domain HOT 4
- How to download chinese-roberta-wwm-ext.pt ? HOT 1
- Is there a pre-trained model for CJRC? HOT 1
- How the WWM strategy works in code HOT 2
- Help: broken download link HOT 2
- Loss explodes when fine-tuning RoBERTa-wwm-ext-large HOT 2
- Continued pre-training HOT 2
- How to extract word embeddings from a specific layer? HOT 2
- Pre-training data HOT 2
- Can the wwm models be used for word-level fill-mask prediction? HOT 2
- NER issue HOT 2
- Is there any sharing about phoneme-BERT pretrained? HOT 2
- Is an ONNX model available? HOT 2
- How much GPU memory does each model need for inference? HOT 1
- Where can the PyTorch version of chinese-roberta-wwm-ext-large be downloaded reliably and quickly? Hugging Face is too slow and keeps disconnecting. HOT 1
- After switching from BERT to this model, training fails with missing parameters; how should this be fixed? HOT 1