jayyip / m3tl
BERT for Multitask Learning
Home Page: https://jayyip.github.io/m3tl/
License: Apache License 2.0
Hi, when using the optimizer from this repo, I found that the total number of training steps does not decrease. With multiple GPUs, the number of training steps should in theory decrease linearly. Have you run into this issue? Thanks.
Hi, thanks for your great work! I am wondering if I can use it for multi-task classification.
I have a few questions about the training process. My task is a "cls1&cls2&cls3" task. 1. For the classification task, the model uses pre-trained BERT to obtain a sentence representation of each input; how is this representation generated (how is it pooled)? 2. What is the loss function of the classification task? 3. Is the loss used for backpropagation the mean of the three classification losses? 4. During backpropagation, does the model update the entire model (including BERT) or only the top layers?
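For reference, a minimal sketch of how such a setup is commonly wired in TensorFlow; this is illustrative and not this repo's exact implementation, and the pooling choice, per-task heads, and mean-of-losses reduction are assumptions:

import tensorflow as tf

# Hypothetical sketch: pool the [CLS] position, attach one softmax head per task,
# and average the per-task cross-entropy losses for backpropagation.
def multitask_cls_loss(sequence_output, labels_per_task, num_classes_per_task):
    pooled = sequence_output[:, 0, :]  # representation of the [CLS] token
    losses = []
    for task, num_classes in num_classes_per_task.items():
        logits = tf.keras.layers.Dense(num_classes, name=task + '_logits')(pooled)
        loss = tf.keras.losses.sparse_categorical_crossentropy(
            labels_per_task[task], logits, from_logits=True)
        losses.append(tf.reduce_mean(loss))
    # Mean of the per-task losses; gradients flow back into BERT as well
    # unless the encoder is frozen.
    return tf.add_n(losses) / len(losses)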
The introduction mentions that this project supports the seq2seq_text task:
seq2seq_text: Sequence to Sequence text generation problem
Could you explain how the data input (data_preprocessing) function should be written? Thanks!
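Not an authoritative answer, but a minimal sketch of what a preprocessing function for a seq2seq_text problem might look like, assuming the @preprocessing_fn decorator pattern used elsewhere in the package; the import path, problem name, and return convention below are assumptions to verify against data_preprocessing/ in the repo:

from bert_multitask_learning import preprocessing_fn  # import path is an assumption

@preprocessing_fn
def toy_seq2seq(params, mode):
    # Return parallel lists of source texts and target texts; the library is
    # expected to tokenize them and write TFRecords on its own.
    inputs = ['hello world', 'this is a test']
    targets = ['bonjour le monde', 'ceci est un test']
    return inputs, targets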
Sorry to bother you again. When training multiple tasks with freeze_step set to > 0, the exception described in the title is raised. I tried debugging it myself but could not resolve it, so I would appreciate your advice.
Environment:
When the ValueError occurs:
Traceback (most recent call last):
File "debug_freeze.py", line 161, in
model_dir=model_dir)
File "/home/jp/anaconda3/envs/jp_test/lib/python3.6/site-packages/bert_multitask_learning/run_bert_multitask.py", line 126, in train_bert_multitask
train_and_evaluate(estimator, train_spec, eval_spec)
File "/home/jp/anaconda3/envs/jp_test/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/training.py", line 473, in train_and_evaluate
return executor.run()
File "/home/jp/anaconda3/envs/jp_test/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/training.py", line 613, in run
return self.run_local()
File "/home/jp/anaconda3/envs/jp_test/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/training.py", line 714, in run_local
saving_listeners=saving_listeners)
File "/home/jp/anaconda3/envs/jp_test/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 370, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/home/jp/anaconda3/envs/jp_test/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1159, in _train_model
return self._train_model_distributed(input_fn, hooks, saving_listeners)
File "/home/jp/anaconda3/envs/jp_test/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1222, in _train_model_distributed
self._config._train_distribute, input_fn, hooks, saving_listeners)
File "/home/jp/anaconda3/envs/jp_test/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1302, in _actual_train_model_distributed
self.config))
File "/home/jp/anaconda3/envs/jp_test/lib/python3.6/site-packages/tensorflow_core/python/distribute/distribute_lib.py", line 1810, in call_for_each_replica
return self._call_for_each_replica(fn, args, kwargs)
File "/home/jp/anaconda3/envs/jp_test/lib/python3.6/site-packages/tensorflow_core/python/distribute/mirrored_strategy.py", line 662, in _call_for_each_replica
fn, args, kwargs)
File "/home/jp/anaconda3/envs/jp_test/lib/python3.6/site-packages/tensorflow_core/python/distribute/mirrored_strategy.py", line 196, in _call_for_each_replica
coord.join(threads)
File "/home/jp/anaconda3/envs/jp_test/lib/python3.6/site-packages/tensorflow_core/python/training/coordinator.py", line 389, in join
six.reraise(*self._exc_info_to_raise)
File "/home/jp/anaconda3/envs/jp_test/lib/python3.6/site-packages/six.py", line 703, in reraise
raise value
File "/home/jp/anaconda3/envs/jp_test/lib/python3.6/site-packages/tensorflow_core/python/training/coordinator.py", line 297, in stop_on_exception
yield
File "/home/jp/anaconda3/envs/jp_test/lib/python3.6/site-packages/tensorflow_core/python/distribute/mirrored_strategy.py", line 880, in run
self.main_result = self.main_fn(*self.main_args, **self.main_kwargs)
File "/home/jp/anaconda3/envs/jp_test/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1149, in _call_model_fn
model_fn_results = self._model_fn(features=features, **kwargs)
File "/home/jp/anaconda3/envs/jp_test/lib/python3.6/site-packages/bert_multitask_learning/model_fn.py", line 496, in model_fn
features, hidden_feature, loss_eval_pred, mode, warm_start)
File "/home/jp/anaconda3/envs/jp_test/lib/python3.6/site-packages/bert_multitask_learning/model_fn.py", line 457, in create_spec
train_scaffold)
File "/home/jp/anaconda3/envs/jp_test/lib/python3.6/site-packages/bert_multitask_learning/model_fn.py", line 384, in create_train_spec
aggregation_method=tf.AggregationMethod.EXPERIMENTAL_TREE)
File "/home/jp/anaconda3/envs/jp_test/lib/python3.6/site-packages/tensorflow_core/python/ops/gradients_impl.py", line 158, in gradients
unconnected_gradients)
File "/home/jp/anaconda3/envs/jp_test/lib/python3.6/site-packages/tensorflow_core/python/ops/gradients_util.py", line 679, in _GradientsHelper
lambda: grad_fn(op, *out_grads))
File "/home/jp/anaconda3/envs/jp_test/lib/python3.6/site-packages/tensorflow_core/python/ops/gradients_util.py", line 350, in _MaybeCompile
return grad_fn() # Exit early
File "/home/jp/anaconda3/envs/jp_test/lib/python3.6/site-packages/tensorflow_core/python/ops/gradients_util.py", line 679, in
lambda: grad_fn(op, *out_grads))
File "/home/jp/anaconda3/envs/jp_test/lib/python3.6/site-packages/tensorflow_core/python/ops/control_flow_grad.py", line 84, in _SwitchGrad
return merge(grad, name="cond_grad")[0], None
File "/home/jp/anaconda3/envs/jp_test/lib/python3.6/site-packages/tensorflow_core/python/ops/control_flow_ops.py", line 413, in merge
nest.assert_same_structure(inputs[0], v, expand_composites=True)
File "/home/jp/anaconda3/envs/jp_test/lib/python3.6/site-packages/tensorflow_core/python/util/nest.py", line 326, in assert_same_structure
% (str(e), str1, str2))
ValueError: The two structures don't have the same nested structure.
First structure: type=IndexedSlices str=IndexedSlices(indices=Tensor("gradients/cond/Merge_grad/cond_grad/Switch_1:0", shape=(?,), dtype=int64, device=/replica:0/task:0/device:GPU:0), values=Tensor("gradients/cond/Merge_grad/cond_grad/Switch:0", shape=(?, ?, 768), dtype=float32, device=/replica:0/task:0/device:GPU:0), dense_shape=Tensor("gradients/cond/Merge_grad/cond_grad/Switch_2:0", shape=(3,), dtype=int64, device=/replica:0/task:0/device:GPU:0))
Second structure: type=IndexedSlices str=IndexedSlices(indices=Tensor("gradients/cond/Switch_1_grad/cond_grad/Cast:0", shape=(?,), dtype=int64, device=/replica:0/task:0/device:GPU:0), values=Tensor("gradients/zeros:0", shape=(?, ?, 768), dtype=float32, device=/replica:0/task:0/device:GPU:0), dense_shape=Tensor("gradients/cond/Switch_1_grad/cond_grad/Shape:0", shape=(3,), dtype=int32, device=/replica:0/task:0/device:GPU:0))
More specifically: Incompatible CompositeTensor TypeSpecs: type=IndexedSlicesSpec str=IndexedSlicesSpec(TensorShape([Dimension(None), Dimension(None), Dimension(768)]), tf.float32, tf.int64, tf.int64, TensorShape([Dimension(None)])) vs. type=IndexedSlicesSpec str=IndexedSlicesSpec(TensorShape([Dimension(None), Dimension(None), Dimension(768)]), tf.float32, tf.int64, tf.int32, TensorShape([Dimension(None)]))
Entire first structure:
.
Entire second structure:
.
Do you apply WordPiece tokenization during processing? In the example, the input data is already tokenized, so I am wondering about this.
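For what it's worth, a quick way to check what WordPiece does to already-whitespace-tokenized input, shown here with the Hugging Face tokenizer rather than this repo's internal preprocessing (an illustrative sketch only):

from transformers import BertTokenizer

tok = BertTokenizer.from_pretrained('bert-base-cased')
words = ['Everyone', 'should', 'be', 'happy', '.']
# Each whitespace token may be split into several WordPiece sub-tokens.
subwords = [piece for w in words for piece in tok.tokenize(w)]
print(subwords)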
Have you compared training time between multi-GPU and single-GPU?
I implemented BERT's run_squad following this approach and found that 2 GPUs take about the same time as a single GPU; 2 GPUs even took a bit less time than 4 GPUs.
Using TF 1.13; the GPUs are 1080s.
With multiple GPUs on a single machine, throughput improves over a single GPU; but after switching to multi-machine multi-GPU, throughput drops dramatically.
in model_fn.py line 284:
train_op = optimizer.apply_gradients(
    zip(grads, tvars), global_step=global_step)
but there is no apply_gradients in optimizer.py
Dear author:
I see that you have implemented MirroredStrategy to train BERT on multiple GPUs. Now I want to train on multiple GPUs with collective all-reduce instead. I use tf.contrib.distribute.CollectiveAllReduceStrategy(num_gpus_per_worker=2) to set the distribution strategy, then use train_and_evaluate to start training, but I hit the error: unsupported operand type(s) for +: 'PerReplica' and 'str'. I don't know how to solve it. (I have 2 GPUs, both V100.)
With version 0.3.4, running the notebook example "Run Self Defined Problem" produces the following error:
InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: `cycle_length` must be > 0
[[node ExperimentalParallelInterleaveDataset (defined at /home/appadmin/anaconda3/envs/multi_task/lib/python3.6/site-packages/bert_multitask_learning-0.3.4-py3.6.egg/bert_multitask_learning/read_write_tfrecord.py:520) ]]
[[MultiDeviceIteratorToStringHandle/_8865]]
(1) Invalid argument: `cycle_length` must be > 0
[[node ExperimentalParallelInterleaveDataset (defined at /home/appadmin/anaconda3/envs/multi_task/lib/python3.6/site-packages/bert_multitask_learning-0.3.4-py3.6.egg/bert_multitask_learning/read_write_tfrecord.py:520) ]]
0 successful operations.
0 derived errors ignored.
The runtime environment is as follows:
tensorflow==1.14.0
keras==2.3.1
tensor2tensor==1.15.5
How can I solve this?
Collecting googleapis-common-protos (from tensorflow-metadata->tensorflow-datasets->tensor2tensor->bert-multitask-learning)
Downloading https://mirrors.aliyun.com/pypi/packages/eb/ee/e59e74ecac678a14d6abefb9054f0bbcb318a6452a30df3776f133886d7d/googleapis-common-protos-1.6.0.tar.gz
ERROR: Complete output from command python setup.py egg_info:
ERROR: Traceback (most recent call last):
File "", line 1, in
File "/usr/lib/python3/dist-packages/setuptools/init.py", line 11, in
from setuptools.extern.six.moves import filterfalse, map
File "/usr/lib/python3/dist-packages/setuptools/extern/init.py", line 1, in
from pkg_resources.extern import VendorImporter
File "/usr/lib/python3/dist-packages/pkg_resources/init.py", line 2927, in
@_call_aside
File "/usr/lib/python3/dist-packages/pkg_resources/init.py", line 2913, in _call_aside
f(*args, **kwargs)
File "/usr/lib/python3/dist-packages/pkg_resources/init.py", line 2952, in _initialize_master_working_set
add_activation_listener(lambda dist: dist.activate())
File "/usr/lib/python3/dist-packages/pkg_resources/init.py", line 956, in subscribe
callback(dist)
File "/usr/lib/python3/dist-packages/pkg_resources/init.py", line 2952, in
add_activation_listener(lambda dist: dist.activate())
File "/usr/lib/python3/dist-packages/pkg_resources/init.py", line 2515, in activate
declare_namespace(pkg)
File "/usr/lib/python3/dist-packages/pkg_resources/init.py", line 2097, in declare_namespace
_handle_ns(packageName, path_item)
File "/usr/lib/python3/dist-packages/pkg_resources/init.py", line 2047, in _handle_ns
_rebuild_mod_path(path, packageName, module)
File "/usr/lib/python3/dist-packages/pkg_resources/init.py", line 2066, in _rebuild_mod_path
orig_path.sort(key=position_in_sys_path)
AttributeError: '_NamespacePath' object has no attribute 'sort'
----------------------------------------
ERROR: Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-install-s47a0wqk/googleapis-common-protos/
ub16c9@ub16c9-gpu:/media/ub16c9/fcd84300-9270-4bbd-896a-5e04e79203b7/ub16_prj/bert-multitask-learning$
Hello, I found your repo when searching GitHub for BERT multitask. At the moment I can't really find anyone on GitHub who has integrated a BERT multitask multi-modal architecture; I found BERT multi-modal, but without multitask, so it would be perfect if this project could also support multi-modal input. Thanks!
Hey, I have been trying to use a sentiment analysis dataset with the imdb class (mentioned in the notebook) as a multitask.
This is the sample format of the sentiment data:
train_data = [['I', 'am', 'going', 'to', 'school', '.'], ['I', 'am', 'not', 'feeling', 'good', '.']]
train_labels = [0, 1]
test_data = [['I', 'wass', 'so', 'sick', 'yesterday', '.']]
test_labels = [1]
Unfortunately, this runs into the error:
ValueError: `generator` yielded an element of shape (48,) where an element of shape () was expected.
Can you kindly help me solve this issue?
I found a bug in generating the label set for the classification task.
The label set always has one extra label: [PAD]. In utils.fit() (lines 33 and 35), [PAD] is added to the label set.
Could you please check this issue?
I tried to run the notebook Run Pre-defined problems.ipynb
after
train_bert_multitask(problem='weibo_ner&weibo_cws', num_gpus=1, num_epochs=3)
I got the error message:
Traceback (most recent call last):
File "/cluster/kappa/90-days-archive///g_transformer/git/bert-multitask-learning/bert_multitask_learning/params.py", line 206, in assign_problem
self.get_data_info(self.problem_list, self.ckpt_dir)
File "/cluster/kappa/90-days-archive///g_transformer/git/bert-multitask-learning/bert_multitask_learning/params.py", line 270, in get_data_info
list(self.read_data_fn[problem](self, 'train')))
File "/cluster/kappa/90-days-archive///g_transformer/git/bert-multitask-learning/bert_multitask_learning/create_generators.py", line 300, in create_single_problem_generator
example_list=example) for example in example_list
File "/cluster/tufts//lib/anaconda3/envs/1001-nlp/lib/python3.7/site-packages/joblib/parallel.py", line 1017, in call
self.retrieve()
File "/cluster/tufts//lib/anaconda3/envs/1001-nlp/lib/python3.7/site-packages/joblib/parallel.py", line 909, in retrieve
self._output.extend(job.get(timeout=self.timeout))
File "/cluster/tufts//lib/anaconda3/envs/1001-nlp/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 562, in wrap_future_result
return future.result(timeout=timeout)
File "/cluster/tufts//lib/anaconda3/envs/1001-nlp/lib/python3.7/concurrent/futures/_base.py", line 435, in result
return self.__get_result()
File "/cluster/tufts/**/lib/anaconda3/envs/1001-nlp/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
raise self._exception
joblib.externals.loky.process_executor.TerminatedWorkerError: A worker process managed by the executor was unexpectedly terminated. This could be caused by a segmentation fault while calling the function or by an excessive memory usage causing the Operating System to kill the worker. The exit codes of the workers are {SIGKILL(-9)}
How much RAM do I need?
I have wanted to do multi-task learning with BERT for a while. I have separately done single-sentence multi-label classification with BERT and NER with BERT. My understanding of this project is that I can, for example, run one fine-tuning pass covering both of these tasks and obtain a fairly general fine-tuned model that performs reasonably well on both classification and NER. Is my understanding correct?
From the code, cws & NER joins two tasks that take the same form of input (while | joins two tasks with different forms of input).
I would like to ask whether I can do (multilabel_classification | NER) in this way. Could you explain how the data-loading part should handle reading in two tasks with these different input forms? I don't fully understand it myself. Many thanks.
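As context for the question above, the README's problem-string convention chains tasks with & when they share the same input and with | when the inputs differ; a hypothetical combination for the case being asked about could look like this (the problem names and the omitted self-defined problem setup are placeholders):

from bert_multitask_learning import train_bert_multitask

# '&' joins tasks that read the same input; '|' joins tasks with different inputs.
# 'multilabel_cls' and 'my_ner' are hypothetical self-defined problems that would
# also need problem_type_dict / processing_fn_dict entries.
problem = 'multilabel_cls|my_ner'
model = train_bert_multitask(problem=problem, num_epochs=3)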
'''
WARNING:root:bert_config not exists. will load model from huggingface checkpoint.
Traceback (most recent call last):
File "run_weibo_ner_cws.py", line 31, in
train_bert_multitask(problem='weibo_ner&weibo_cws', params=params, problem_type_dict=problem_type_dict,
File "/data/home/likai/.conda/envs/lkai_tf2/lib/python3.8/site-packages/bert_multitask_learning/run_bert_multitask.py", line 113, in train_bert_multitask
params.assign_problem(problem, gpu=int(num_gpus),
File "/data/home/likai/.conda/envs/lkai_tf2/lib/python3.8/site-packages/bert_multitask_learning/params.py", line 221, in assign_problem
self.prepare_dir(base_dir, dir_name, self.problem_list)
File "/data/home/likai/.conda/envs/lkai_tf2/lib/python3.8/site-packages/bert_multitask_learning/params.py", line 491, in prepare_dir
tokenizer = load_transformer_tokenizer(
File "/data/home/likai/.conda/envs/lkai_tf2/lib/python3.8/site-packages/bert_multitask_learning/utils.py", line 278, in load_transformer_tokenizer
tok = getattr(transformers, load_module_name).from_pretrained(
File "/data/home/likai/.conda/envs/lkai_tf2/lib/python3.8/site-packages/transformers/tokenization_auto.py", line 188, in from_pretrained
config = AutoConfig.from_pretrained(pretrained_model_name_or_path, **kwargs)
File "/data/home/likai/.conda/envs/lkai_tf2/lib/python3.8/site-packages/transformers/configuration_auto.py", line 289, in from_pretrained
raise ValueError(
ValueError: Unrecognized model in models/weibo_cws_weibo_ner_ckpt/tokenizer. Should have a model_type
key in its config.json, or contain one of the following strings in its name: retribert, t5, mobilebert, distilbert, albert, bert-generation, camembert, xlm-roberta, pegasus, marian, mbart, bart, reformer, longformer, roberta, flaubert, bert, openai-gpt, gpt2, transfo-xl, xlnet, xlm, ctrl, electra, encoder-decoder, funnel, lxmert
'''
The file 'config.json' under the path 'models/weibo_cws_weibo_ner_ckpt/tokenizer' is updated each time I run the program, and it contains no model_type key. Do you know what the problem is? Looking forward to a response, thank you.
Hello, I am still trying to train an NER task, but every time I get a very low loss after 100 steps, and after evaluation I get:
Acc Score: 0.839600
Precision Score: 0.125084
Recall Score: 0.216169
F1 Score: 0.158471
The scores still don't change after 1000 or more steps.
Can you help me? Maybe my data processing function is bad, but I compared its outputs with the predefined functions and everything looks fine.
optimizer.apply_gradients() increments the global step once when the global_step parameter is not set to None. So in this line, the global step is incremented again, which doesn't make sense. This issue can affect the number of training steps when using the max_steps or steps parameter in estimator.train().
BTW, thanks for sharing the multi-GPU optimization implementation for BERT.
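A condensed sketch of the double increment being described, using a stand-in optimizer so it runs on its own; whether to drop the manual assign or the global_step argument depends on the optimizer actually in use:

import tensorflow as tf  # TF 1.x graph-mode style, matching the snippets in this thread

global_step = tf.train.get_or_create_global_step()
optimizer = tf.train.GradientDescentOptimizer(0.01)  # stand-in for AdamWeightDecayOptimizer
loss = tf.Variable(1.0) * 2.0
grads_and_vars = optimizer.compute_gradients(loss)

# apply_gradients bumps the step once when global_step is passed...
train_op = optimizer.apply_gradients(grads_and_vars, global_step=global_step)
# ...so this extra manual update makes each training step count twice:
new_global_step = global_step + 1
train_op = tf.group(train_op, [global_step.assign(new_global_step)])
# Either pass global_step=None above and keep the manual assign, or keep
# global_step in apply_gradients and drop the manual assign.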
Just as you warned, "Therefore, in a particular batch, some tasks might not be sampled, and their loss could be 0 in this batch." Now I get the error:
tensorflow.python.training.basic_session_run_hooks.NanLossDuringTrainingError: NaN loss during training.
with a multitask problem. How can I solve this?
Hi~
I'm a bit confused about the results in baseline.md:
Does "baseline" mean the BERT model without multi-task learning, and "multitask_label_transfer_first_train" mean the BERT model with multi-task learning?
It seems that all the results show that multitask_label_transfer_first_train is not as good as the baseline?
Hope to get your reply. Thanks a lot.
Hello, thank you for this great implementation. I want to use a different vocab and init checkpoint, but in DynamicBatchSizeParams() there is only a 'decode_vocab_file' attribute. Is that what adds the vocab?
Hi, I'm new to TensorFlow and not quite familiar with multi-GPU training using tf.estimator. Could you explain the key points of how you modified the original code to implement multi-GPU support, especially in optimizer.py? By the way, are there any big changes in estimator.py compared with the original code? Thanks in advance!
As the title says, thanks.
from .ner_data import *
from .test_data import *
from .test_data import *
Excuse me, is there a small error in the file at 'bert_multitask_learning/predefined_problems/__init__.py'? Should one of the test_data imports be cws_data?
Hello, thank you very much for sharing this.
While using your project, I found that tensor2tensor seems to support only TF 2.2 and above, while your project targets TF 1.13. After installing TF 1.15 and running the code in your demo, I got this error:
File "/root/data/glusterfs_sharing_04_v3/11117720/bert-multitask-learning-master/bert_multitask_learning/run_bert_multitask.py", line 120, in train_bert_multitask
input_fn=train_input_fn, max_steps=params.train_steps, hooks=[train_hook])
File "/root/miniconda3/envs/tf1.15/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/training.py", line 163, in new
'Must specify max_steps > 0, given: {}'.format(max_steps))
ValueError: Must specify max_steps > 0, given: 0
I'm using this code: https://github.com/JayYip/bert-multitask-learning/blob/master/notebooks/Run%20Pre-defined%20problems.ipynb
I hope you can help me figure this out, many thanks.
Hello, thanks for sharing the code! I'd like to ask: we are doing multi-task learning with several tasks, and when training them together with this framework we get a NaN error, but training each task separately works fine. Do you have any idea where the problem might be? (I see a comment in the code: # WARNING: Potential nan created here! # TODO: Fix this.) Thanks!
Thank you very much, your repo has given me great material for learning multi-task learning.
I am planning to use pre-trained BERT for multi-task learning on text classification and NER (the two tasks have different inputs). I understand that hard-parameter-sharing multi-task learning has two training modes:
I want to study the performance of the two tasks above with joint training / alternating training. I see in your documentation that when the inputs differ, the problem parameter must use |. I have two questions:
I think your patch broke notebook files such as Run Self Defined Problem with Modified Model.ipynb.
It tries to import create_single_problem_generator, which does not exist anymore.
Can you fix it?
Hi, is this only for fine-tuning?
I see all examples are about fine-tuning.
Does it support pre-training? Any examples?
Thanks for sharing the code. As a beginner, may I ask what runtime environment this code requires? I previously ran BERT fine-tuning on Python 2.7 with TF 1.12.0 without problems, but when running this code I got this error:
def train_eval_input_fn(config: Params, mode='train', epoch=None):
SyntaxError: invalid syntax
The "config : Params" part: I have never seen this syntax before. Am I missing a package, or is my runtime environment wrong? Thanks.
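For context, that is Python 3 function annotation syntax (PEP 3107 / PEP 484 type hints); it is a SyntaxError on Python 2.7, and no extra package is needed, only Python 3. A minimal illustration:

class Params:  # stub so the annotation resolves in this snippet
    pass

# Valid only on Python 3: ': Params' annotates the type of the parameter.
def train_eval_input_fn(config: Params, mode='train', epoch=None):
    pass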
Code:
from bert_multitask_learning import train_bert_multitask, eval_bert_multitask, predict_bert_multitask
problem_type_dict = {'toy_cls': 'cls', 'toy_seq_tag': 'seq_tag'}
problem = 'toy_cls&toy_seq_tag'
model = train_bert_multitask(
    problem=problem,
    num_epochs=1,
    problem_type_dict=problem_type_dict,
    processing_fn_dict=processing_fn_dict,
    # continue_training=True
)
Error:
/root/.local/lib/python3.7/site-packages/bert_multitask_learning/run_bert_multitask.py in train_bert_multitask(problem, num_gpus, num_epochs, model_dir, params, problem_type_dict, processing_fn_dict, model, create_tf_record_only, steps_per_epoch, warmup_ratio, continue_training, mirrored_strategy)
257
258 model = create_keras_model(
--> 259 mirrored_strategy=mirrored_strategy, params=params, mode=mode, inputs_to_build_model=one_batch)
260
261 _train_bert_multitask_keras_model(
/root/.local/lib/python3.7/site-packages/bert_multitask_learning/run_bert_multitask.py in create_keras_model(mirrored_strategy, params, mode, inputs_to_build_model, model)
91 if mirrored_strategy is not None:
92 with mirrored_strategy.scope():
---> 93 model = _get_model_wrapper(params, mode, inputs_to_build_model, model)
94 else:
95 model = _get_model_wrapper(params, mode, inputs_to_build_model, model)
/root/.local/lib/python3.7/site-packages/bert_multitask_learning/run_bert_multitask.py in _get_model_wrapper(params, mode, inputs_to_build_model, model)
51 def _get_model_wrapper(params, mode, inputs_to_build_model, model):
52 if model is None:
---> 53 model = BertMultiTask(params)
54 # model.run_eagerly = True
55 if mode == 'resume':
/root/.local/lib/python3.7/site-packages/bert_multitask_learning/model_fn.py in __init__(self, params, name)
261 self.params = params
262 # initialize body model, aka transformers
--> 263 self.body = BertMultiTaskBody(params=self.params)
264 # mlm might need word embedding from bert
265 # build sub-model
/root/.local/lib/python3.7/site-packages/bert_multitask_learning/model_fn.py in __init__(self, params, name)
63 super(BertMultiTaskBody, self).__init__(name=name)
64 self.params = params
---> 65 self.bert = MultiModalBertModel(params=self.params)
66 if self.params.custom_pooled_hidden_size:
67 self.custom_pooled_layer = tf.keras.layers.Dense(
/root/.local/lib/python3.7/site-packages/bert_multitask_learning/modeling.py in __init__(self, params, use_one_hot_embeddings)
40 # multimodal input dense
41 embedding_dim = get_embedding_table_from_model(
---> 42 self.bert_model).shape[-1]
43 self.modal_name_list = ['image', 'others']
44 self.multimodal_dense = {modal_name: tf.keras.layers.Dense(
/root/.local/lib/python3.7/site-packages/bert_multitask_learning/utils.py in get_embedding_table_from_model(model)
397 def get_embedding_table_from_model(model):
398 base_model = get_transformer_main_model(model)
--> 399 return base_model.embeddings.word_embeddings
400
401
AttributeError: 'TFBertEmbeddings' object has no attribute 'word_embeddings'
In the classification example in the notebook, is params.init_checkpoint = 'models/cased_L-12_H-768_A-12' the original BERT model? Also, is there any model documentation or a paper for this project? Thanks.
Hi! Great to see such tremendous work done. One question: why do you consider such large changes to the project architecture necessary? Why would simply passing a DistributionStrategy to the estimator not be enough? And did you try Horovod from Uber for the same purpose?
Hi, I am trying to run the classification tasks. I can successfully run my code the first time, but I get an error when I remove the results of the first run and rerun the model. The error is:
Adding new problem country_cls, problem type: cls
Adding new problem gender_cls, problem type: cls
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-9-50089851ce6e> in <module>
4 train_bert_multitask(problem='country_cls&gender_cls', num_gpus=2,
5 num_epochs=2, params=params,
----> 6 problem_type_dict=new_problem_type, processing_fn_dict=new_problem_process_fn_dict)
/lustre03/project/6007993/chiyu94/bert-multitask-learning-master/bert_multitask_learning/run_bert_multitask.py in train_bert_multitask(problem, num_gpus, num_epochs, model_dir, params, problem_type_dict, processing_fn_dict, model)
106 problem_name=new_problem, problem_type=problem_type_dict[new_problem], processing_fn=new_problem_processing_fn)
107 params.assign_problem(problem, gpu=int(num_gpus),
--> 108 base_dir=base_dir, dir_name=dir_name)
109 params.to_json()
110
/lustre03/project/6007993/chiyu94/bert-multitask-learning-master/bert_multitask_learning/params.py in assign_problem(self, flag_string, gpu, base_dir, dir_name, is_serve)
191 self.prepare_dir(base_dir, dir_name, self.problem_list)
192
--> 193 self.get_data_info(self.problem_list, self.ckpt_dir)
194
195 if not is_serve:
/lustre03/project/6007993/chiyu94/bert-multitask-learning-master/bert_multitask_learning/params.py in get_data_info(self, problem_list, base)
255
256 self.data_num_dict[problem] = len(
--> 257 list(self.read_data_fn[problem](self, 'train')))
258 self.data_num += self.data_num_dict[problem]
259 else:
/lustre03/project/6007993/chiyu94/bert-multitask-learning-master/bert_multitask_learning/data_preprocessing/preproc_decorator.py in wrapper(params, mode)
13 if os.path.exists(pickle_file) and params.multiprocess:
14 label_encoder = get_or_make_label_encoder(
---> 15 params, problem=problem, mode=mode)
16 return create_single_problem_generator(
17 func.__name__,
/lustre03/project/6007993/chiyu94/bert-multitask-learning-master/bert_multitask_learning/utils.py in get_or_make_label_encoder(params, problem, mode, label_list, zero_class)
136 label_encoder = LabelEncoder()
137
--> 138 label_encoder.fit(label_list, zero_class=zero_class)
139 label_encoder.dump(le_path)
140 else:
/lustre03/project/6007993/chiyu94/bert-multitask-learning-master/bert_multitask_learning/utils.py in fit(self, y, zero_class)
28 self.encode_dict = {}
29 self.decode_dict = {}
---> 30 label_set = set(y)
31 if zero_class is None:
32 zero_class = '[PAD]'
TypeError: 'NoneType' object is not iterable
It would be useful to create a TopLayer for regression-type problems, where the label is a score, for instance similarity metrics between embeddings.
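A minimal sketch of what such a regression top layer could look like, assuming access to a pooled hidden feature and a float label; this is not part of the library, and the layer name and the MSE loss choice are illustrative:

import tensorflow as tf

class RegressionTop(tf.keras.layers.Layer):
    # Predicts a single score per example and adds an MSE training loss.
    def __init__(self, name='regression_top'):
        super().__init__(name=name)
        self.dense = tf.keras.layers.Dense(1)

    def call(self, pooled_hidden, labels=None, training=False):
        score = tf.squeeze(self.dense(pooled_hidden), axis=-1)
        if training and labels is not None:
            mse = tf.reduce_mean(tf.square(score - tf.cast(labels, tf.float32)))
            self.add_loss(mse)
        return score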
NameError: name 'input_list' is not defined
@JayYip
--> https://github.com/JayYip/bert-multitask-learning/blob/9c21e432ca1afd54423cff0bbfd16cc966156d21/bert_multitask_learning/data_preprocessing/ner_data.py#L131-L141
ValueError: You must specify an aggregation method to update a MirroredVariable in Tower Context
How should I modify the optimizer?
Hi, I have a dataset that involves 2 sequences and the task is classifying the sequence pair. I am not sure how to prepare the input in this case. So far, I have been working with only one sequence where I used the following format:
["Everyone", "should", "be", "happy", "."]
How do I extend this for 2 sequences? Do I have to insert a "SEP" token myself?
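For reference, BERT-style tokenizers can build the pair encoding themselves when given two texts, inserting [CLS]/[SEP] and the segment ids; whether this library accepts a tuple of token lists for a pair problem is an assumption to verify against its data_preprocessing code, but the underlying encoding looks like this:

from transformers import BertTokenizer

tok = BertTokenizer.from_pretrained('bert-base-cased')
enc = tok('Everyone should be happy .', 'Nobody should be sad .')
# Both segments end up in one sequence separated by [SEP], and token_type_ids
# marks which tokens belong to the first vs. the second sentence.
print(tok.convert_ids_to_tokens(enc['input_ids']))
print(enc['token_type_ids'])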
The multi-GPU-related issues under this project have helped me a lot.
Based on the original code of bert, with batch_size of 24 and tensorflow of 1.13.1, I recently used the AdamWeightDecayOptimizer in your project to successfully train a classifier with bert-large-uncased in 2 x Tesla P40, and the prediction looks fine.
But when I increased the batch_size to 32, I got the following OOM error. I then increased the number of GPUs to 3 but still got OOM; it feels as if MirroredStrategy is not making data parallelism work. When I reduced the number of GPUs to 1 and the batch_size to 24, no OOM occurred.
Do you have any clues to solve this problem pls? Thank you very much!
error message:
WARNING:tensorflow:Efficient allreduce is not supported for IndexedSlices.
INFO:tensorflow:batch_all_reduce invoked for batches size = 1 with algorithm = hierarchical_copy, num_packs = 0, agg_small_grads_max_bytes = 0 and agg_small_grads_max_group = 10
...
Limit: 22654317364
InUse: 22621908992
MaxInUse: 22621909760
NumAllocs: 13050
MaxAllocSize: 247209984
...
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[4096,4096] and type float on /job:localhost/replica:0/task:0/device:GPU:1 by allocator GPU_1_bfc
[[node replica_1/gradients/replica_1/bert/encoder/layer_20/intermediate/dense/Pow_grad/Pow (defined at /scripts/bert/custom_optimization.py:74) ]]
mirrored strategy:
dist_strategy = tf.contrib.distribute.MirroredStrategy(
    cross_device_ops=AllReduceCrossDeviceOps('nccl'))
log_every_n_steps = 8
run_config = RunConfig(
    train_distribute=dist_strategy,
    eval_distribute=dist_strategy,
    log_step_count_steps=log_every_n_steps,
    model_dir=FLAGS.output_dir,
    save_checkpoints_steps=FLAGS.save_checkpoints_steps)
estimator = Estimator(
    model_fn=model_fn,
    params={},
    config=run_config)
...
train_file = os.path.join(FLAGS.output_dir, "train.tf_record")
file_based_convert_examples_to_features(
    train_examples, label_list, FLAGS.max_seq_length, tokenizer, train_file)
train_input_fn = file_based_input_fn_builder(
    input_file=train_file,
    seq_length=FLAGS.max_seq_length,
    is_training=True,
    drop_remainder=True,
    batch_size=FLAGS.train_batch_size)
estimator.train(input_fn=train_input_fn, max_steps=num_train_steps)
custom_optimization:
def create_optimizer(loss, init_lr, num_train_steps, num_warmup_steps):
    """Creates an optimizer training op."""
    global_step = tf.train.get_or_create_global_step()
    learning_rate = tf.constant(value=init_lr, shape=[], dtype=tf.float32)
    # Implements linear decay of the learning rate.
    learning_rate = tf.train.polynomial_decay(
        learning_rate,
        global_step,
        num_train_steps,
        end_learning_rate=0.0,
        power=1.0,
        cycle=False)
    # Implements linear warmup. I.e., if global_step < num_warmup_steps, the
    # learning rate will be `global_step/num_warmup_steps * init_lr`.
    if num_warmup_steps:
        global_steps_int = tf.cast(global_step, tf.int32)
        warmup_steps_int = tf.constant(num_warmup_steps, dtype=tf.int32)
        global_steps_float = tf.cast(global_steps_int, tf.float32)
        warmup_steps_float = tf.cast(warmup_steps_int, tf.float32)
        warmup_percent_done = global_steps_float / warmup_steps_float
        warmup_learning_rate = init_lr * warmup_percent_done
        is_warmup = tf.cast(global_steps_int < warmup_steps_int, tf.float32)
        learning_rate = (
            (1.0 - is_warmup) * learning_rate + is_warmup * warmup_learning_rate)
    # It is recommended that you use this optimizer for fine tuning, since this
    # is how the model was trained (note that the Adam m/v variables are NOT
    # loaded from init_checkpoint.)
    optimizer = AdamWeightDecayOptimizer(
        learning_rate=learning_rate,
        weight_decay_rate=0.01,
        beta_1=0.9,
        beta_2=0.999,
        epsilon=1e-6,
        exclude_from_weight_decay=["LayerNorm", "layer_norm", "bias"])
    tvars = tf.trainable_variables()
    grads = tf.gradients(loss, tvars)
    # This is how the model was pre-trained.
    (grads, _) = tf.clip_by_global_norm(grads, clip_norm=1.0)
    train_op = optimizer.apply_gradients(
        zip(grads, tvars), global_step=global_step)
    # Normally the global step update is done inside of `apply_gradients`.
    # However, `AdamWeightDecayOptimizer` doesn't do this. But if you use
    # a different optimizer, you should probably take this line out.
    new_global_step = global_step + 1
    train_op = tf.group(train_op, [global_step.assign(new_global_step)])
    return train_op
class AdamWeightDecayOptimizer(Optimizer):
    """A basic Adam optimizer that includes "correct" L2 weight decay."""

    def __init__(self,
                 learning_rate,
                 weight_decay_rate=0.0,
                 beta_1=0.9,
                 beta_2=0.999,
                 epsilon=1e-6,
                 exclude_from_weight_decay=None,
                 name="AdamWeightDecayOptimizer"):
        """Constructs a AdamWeightDecayOptimizer."""
        super(AdamWeightDecayOptimizer, self).__init__(False, name)
        self.learning_rate = learning_rate
        self.weight_decay_rate = weight_decay_rate
        self.beta_1 = beta_1
        self.beta_2 = beta_2
        self.epsilon = epsilon
        self.exclude_from_weight_decay = exclude_from_weight_decay

    def _prepare(self):
        self.learning_rate_t = ops.convert_to_tensor(
            self.learning_rate, name='learning_rate')
        self.weight_decay_rate_t = ops.convert_to_tensor(
            self.weight_decay_rate, name='weight_decay_rate')
        self.beta_1_t = ops.convert_to_tensor(self.beta_1, name='beta_1')
        self.beta_2_t = ops.convert_to_tensor(self.beta_2, name='beta_2')
        self.epsilon_t = ops.convert_to_tensor(self.epsilon, name='epsilon')

    def _create_slots(self, var_list):
        for v in var_list:
            self._zeros_slot(v, 'm', self._name)
            self._zeros_slot(v, 'v', self._name)

    def _apply_dense(self, grad, var):
        learning_rate_t = math_ops.cast(
            self.learning_rate_t, var.dtype.base_dtype)
        beta_1_t = math_ops.cast(self.beta_1_t, var.dtype.base_dtype)
        beta_2_t = math_ops.cast(self.beta_2_t, var.dtype.base_dtype)
        epsilon_t = math_ops.cast(self.epsilon_t, var.dtype.base_dtype)
        weight_decay_rate_t = math_ops.cast(
            self.weight_decay_rate_t, var.dtype.base_dtype)
        m = self.get_slot(var, 'm')
        v = self.get_slot(var, 'v')
        # Standard Adam update.
        next_m = (
            tf.multiply(beta_1_t, m) +
            tf.multiply(1.0 - beta_1_t, grad))
        next_v = (
            tf.multiply(beta_2_t, v) + tf.multiply(1.0 - beta_2_t,
                                                   tf.square(grad)))
        update = next_m / (tf.sqrt(next_v) + epsilon_t)
        if self._do_use_weight_decay(var.name):
            update += weight_decay_rate_t * var
        update_with_lr = learning_rate_t * update
        next_param = var - update_with_lr
        return control_flow_ops.group(*[var.assign(next_param),
                                        m.assign(next_m),
                                        v.assign(next_v)])

    def _resource_apply_dense(self, grad, var):
        learning_rate_t = math_ops.cast(
            self.learning_rate_t, var.dtype.base_dtype)
        beta_1_t = math_ops.cast(self.beta_1_t, var.dtype.base_dtype)
        beta_2_t = math_ops.cast(self.beta_2_t, var.dtype.base_dtype)
        epsilon_t = math_ops.cast(self.epsilon_t, var.dtype.base_dtype)
        weight_decay_rate_t = math_ops.cast(
            self.weight_decay_rate_t, var.dtype.base_dtype)
        m = self.get_slot(var, 'm')
        v = self.get_slot(var, 'v')
        # Standard Adam update.
        next_m = (
            tf.multiply(beta_1_t, m) +
            tf.multiply(1.0 - beta_1_t, grad))
        next_v = (
            tf.multiply(beta_2_t, v) + tf.multiply(1.0 - beta_2_t,
                                                   tf.square(grad)))
        update = next_m / (tf.sqrt(next_v) + epsilon_t)
        if self._do_use_weight_decay(var.name):
            update += weight_decay_rate_t * var
        update_with_lr = learning_rate_t * update
        next_param = var - update_with_lr
        return control_flow_ops.group(*[var.assign(next_param),
                                        m.assign(next_m),
                                        v.assign(next_v)])

    def _apply_sparse_shared(self, grad, var, indices, scatter_add):
        learning_rate_t = math_ops.cast(
            self.learning_rate_t, var.dtype.base_dtype)
        beta_1_t = math_ops.cast(self.beta_1_t, var.dtype.base_dtype)
        beta_2_t = math_ops.cast(self.beta_2_t, var.dtype.base_dtype)
        epsilon_t = math_ops.cast(self.epsilon_t, var.dtype.base_dtype)
        weight_decay_rate_t = math_ops.cast(
            self.weight_decay_rate_t, var.dtype.base_dtype)
        m = self.get_slot(var, 'm')
        v = self.get_slot(var, 'v')
        m_t = state_ops.assign(m, m * beta_1_t,
                               use_locking=self._use_locking)
        m_scaled_g_values = grad * (1 - beta_1_t)
        with ops.control_dependencies([m_t]):
            m_t = scatter_add(m, indices, m_scaled_g_values)
        v_scaled_g_values = (grad * grad) * (1 - beta_2_t)
        v_t = state_ops.assign(v, v * beta_2_t, use_locking=self._use_locking)
        with ops.control_dependencies([v_t]):
            v_t = scatter_add(v, indices, v_scaled_g_values)
        update = m_t / (math_ops.sqrt(v_t) + epsilon_t)
        if self._do_use_weight_decay(var.name):
            update += weight_decay_rate_t * var
        update_with_lr = learning_rate_t * update
        var_update = state_ops.assign_sub(var,
                                          update_with_lr,
                                          use_locking=self._use_locking)
        return control_flow_ops.group(*[var_update, m_t, v_t])

    def _apply_sparse(self, grad, var):
        return self._apply_sparse_shared(
            grad.values, var, grad.indices,
            lambda x, i, v: state_ops.scatter_add(  # pylint: disable=g-long-lambda
                x, i, v, use_locking=self._use_locking))

    def _resource_scatter_add(self, x, i, v):
        with ops.control_dependencies(
                [resource_variable_ops.resource_scatter_add(
                    x.handle, i, v)]):
            return x.value()

    def _resource_apply_sparse(self, grad, var, indices):
        return self._apply_sparse_shared(
            grad, var, indices, self._resource_scatter_add)

    def _do_use_weight_decay(self, param_name):
        """Whether to use L2 weight decay for `param_name`."""
        if not self.weight_decay_rate:
            return False
        if self.exclude_from_weight_decay:
            for r in self.exclude_from_weight_decay:
                if re.search(r, param_name) is not None:
                    return False
        return True
model_fn:
is_training = (mode == tf.estimator.ModeKeys.TRAIN)
(total_loss, per_example_loss, logits, probabilities) = create_model(
    bert_config, is_training, input_ids, input_mask, segment_ids, label_ids,
    num_labels, use_one_hot_embeddings)
tvars = tf.trainable_variables()
initialized_variable_names = {}
scaffold_fn = None
if init_checkpoint:
    (assignment_map, initialized_variable_names
     ) = modeling.get_assignment_map_from_checkpoint(tvars, init_checkpoint)
    tf.train.init_from_checkpoint(init_checkpoint, assignment_map)
tf.logging.info("**** Trainable Variables ****")
for var in tvars:
    init_string = ""
    if var.name in initialized_variable_names:
        init_string = ", *INIT_FROM_CKPT*"
    tf.logging.info("  name = %s, shape = %s%s", var.name, var.shape,
                    init_string)
if mode == tf.estimator.ModeKeys.TRAIN:
    train_op = custom_optimization.create_optimizer(
        total_loss, learning_rate, num_train_steps, num_warmup_steps)
    output_spec = tf.estimator.EstimatorSpec(
        mode=mode,
        loss=total_loss,
        train_op=train_op,
        scaffold=scaffold_fn)
...
Let me revise my question:
1. Hello, when I run Run Pre-defined problems.ipynb, load my own model, and set batch size to 1 and multiprocess to False with a single task, the entire GPU memory is used up.
I actually tried various batch sizes and it's always like this; I don't know where the problem is. This is also how I modified your code:
Besides installing via pip, I also tried downloading your code and running it directly, with the same problem. In addition, apart from batch size and multiprocess, which other parameters can be adjusted to reduce GPU memory usage?
2. Also, when calling train_bert_multitask, eval_feature_desc.json is not created.
I finally traced the error to train_and_evaluate(estimator, train_spec, eval_spec).
As the title says.
Hi,
I am trying to run the example in "Run Self Defined Problem", but I got an error in the "Train Model" part. I am using Python 3.6.3 and bert-multitask-learning 0.2.7.
Adding new problem imdb_cls, problem type: cls
INFO:tensorflow:Saving preprocessing files to tmp/imdb_cls_train_data.pkl
---------------------------------------------------------------------------
_RemoteTraceback Traceback (most recent call last)
_RemoteTraceback:
'''
Traceback (most recent call last):
File "/home/chiyu94/bert_multitask/lib/python3.6/site-packages/joblib/externals/loky/process_executor.py", line 391, in _process_worker
call_item = call_queue.get(block=True, timeout=timeout)
File "/cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/python/3.6.3/lib/python3.6/multiprocessing/queues.py", line 113, in get
return _ForkingPickler.loads(res)
File "/home/chiyu94/bert_multitask/lib/python3.6/site-packages/bert_multitask_learning/__init__.py", line 4, in <module>
from .model_fn import *
File "/home/chiyu94/bert_multitask/lib/python3.6/site-packages/bert_multitask_learning/model_fn.py", line 5, in <module>
from .bert import modeling
ModuleNotFoundError: No module named 'bert_multitask_learning.bert'
'''
The above exception was the direct cause of the following exception:
BrokenProcessPool Traceback (most recent call last)
<ipython-input-7-3420e0b145e9> in <module>
4 train_bert_multitask(problem='imdb_cls', num_gpus=0,
5 num_epochs=10, params=params,
----> 6 problem_type_dict=new_problem_type, processing_fn_dict=new_problem_process_fn_dict)
/lustre03/project/6007993/chiyu94/bert-multitask-learning-master/bert_multitask_learning/run_bert_multitask.py in train_bert_multitask(problem, num_gpus, num_epochs, model_dir, params, problem_type_dict, processing_fn_dict, model)
106 problem_name=new_problem, problem_type=problem_type_dict[new_problem], processing_fn=new_problem_processing_fn)
107 params.assign_problem(problem, gpu=int(num_gpus),
--> 108 base_dir=base_dir, dir_name=dir_name)
109 params.to_json()
110
/lustre03/project/6007993/chiyu94/bert-multitask-learning-master/bert_multitask_learning/params.py in assign_problem(self, flag_string, gpu, base_dir, dir_name, is_serve)
191 self.prepare_dir(base_dir, dir_name, self.problem_list)
192
--> 193 self.get_data_info(self.problem_list, self.ckpt_dir)
194
195 if not is_serve:
/lustre03/project/6007993/chiyu94/bert-multitask-learning-master/bert_multitask_learning/params.py in get_data_info(self, problem_list, base)
255
256 self.data_num_dict[problem] = len(
--> 257 list(self.read_data_fn[problem](self, 'train')))
258 self.data_num += self.data_num_dict[problem]
259 else:
/lustre03/project/6007993/chiyu94/bert-multitask-learning-master/bert_multitask_learning/create_generators.py in create_single_problem_generator(problem, inputs_list, target_list, label_encoder, params, tokenizer, mode)
295
296 return_dict_list_list = Parallel(num_process)(delayed(partial_fn)(
--> 297 example_list=example) for example in example_list
298 )
299
~/bert_multitask/lib/python3.6/site-packages/joblib/parallel.py in __call__(self, iterable)
932
933 with self._backend.retrieval_context():
--> 934 self.retrieve()
935 # Make sure that we get a last message telling us we are done
936 elapsed_time = time.time() - self._start_time
~/bert_multitask/lib/python3.6/site-packages/joblib/parallel.py in retrieve(self)
831 try:
832 if getattr(self._backend, 'supports_timeout', False):
--> 833 self._output.extend(job.get(timeout=self.timeout))
834 else:
835 self._output.extend(job.get())
~/bert_multitask/lib/python3.6/site-packages/joblib/_parallel_backends.py in wrap_future_result(future, timeout)
519 AsyncResults.get from multiprocessing."""
520 try:
--> 521 return future.result(timeout=timeout)
522 except LokyTimeoutError:
523 raise TimeoutError()
/cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/python/3.6.3/lib/python3.6/concurrent/futures/_base.py in result(self, timeout)
430 raise CancelledError()
431 elif self._state == FINISHED:
--> 432 return self.__get_result()
433 else:
434 raise TimeoutError()
/cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/python/3.6.3/lib/python3.6/concurrent/futures/_base.py in __get_result(self)
382 def __get_result(self):
383 if self._exception:
--> 384 raise self._exception
385 else:
386 return self._result
BrokenProcessPool: A task has failed to un-serialize. Please ensure that the arguments of the function are all picklable.
Hello! I am working on training BERT on multiple GPUs these days (using the official code released by Google Research). After specifying the RunConfig of the estimator with MirroredStrategy, I encountered "ValueError: You must specify an aggregation method to update a MirroredVariable in Tower Context." This error is the same as the one in https://github.com/tensorflow/tensorflow/issues/23986#issuecomment-444389363, where I found your reply. You said "You can take my implementation as a reference:", and that is how I came to this repo.
I modified the original optimization.py with reference to your src/optimizer.py but still got the same error. Could you give me some advice about how to re-implement the optimizer in the original optimization.py?
The original official code of optimization.py is here https://github.com/google-research/bert/blob/master/optimization.py
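Not an authoritative fix, but the workaround most often discussed for this error is to stop assigning to the global step manually inside the replica (tower) context and let apply_gradients own the increment; a hedged sketch of the tail of create_optimizer() in the stock optimization.py:

# The manual global_step.assign(...) below runs inside the tower context and is
# what typically raises "must specify an aggregation method to update a
# MirroredVariable". Letting apply_gradients handle the step avoids that assign;
# verify afterwards that the step counter still advances as expected.
train_op = optimizer.apply_gradients(
    zip(grads, tvars), global_step=global_step)
# new_global_step = global_step + 1
# train_op = tf.group(train_op, [global_step.assign(new_global_step)])
return train_op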