
huawei-noah / pretrained-language-model

3.0K stars · 56 watchers · 622 forks · 29.69 MB

Pretrained language models and related optimization techniques developed by Huawei Noah's Ark Lab.

Languages: Dockerfile 0.05% · Python 96.19% · Shell 3.22% · C++ 0.52% · Cython 0.02%
Topics: knowledge-distillation, model-compression, quantization, pretrained-models, large-scale-distributed

pretrained-language-model's People

Contributors

dependabot[bot], ghaddarabs, gowtham1997, itachiuchihavictor, itsucks, jacobrxz, jiaxin-wen, jingmu123, jxfeb, leoeaton, mengxj08, mifei, sirily, xuqiongkai, zbravo, zwjyyc, zyy-g


pretrained-language-model's Issues

In TinyBERT, how are the per-layer weights in the task_distill loss function initialized?

Hi, in Eq. (6) of the paper, each layer's loss is multiplied by the corresponding per-layer hyper-parameter λ_m before the losses are summed, i.e. roughly L = Σ_m λ_m · L_layer(S_m, T_g(m)). My understanding is that λ_m is an initialized value between 0 and 1, and that all the λ sum to 1.
1. Is that actually how the experiments implement it? I did not find any λ-like variable in task_distill.py; it seems the per-layer losses are simply added up?
2. If such a λ_m really exists in the implementation, how was it initialized?
3. Does the initialization seed have a large effect on the final converged values of λ?

Thanks for your answer!
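
For reference, a minimal sketch of the weighting being asked about, in PyTorch; the function and variable names are hypothetical, and passing no weights reproduces the plain sum that task_distill.py appears to use:

import torch.nn.functional as F

def weighted_layer_loss(student_states, teacher_states, lambdas=None):
    # student_states / teacher_states: matched per-layer tensors of
    # shape (batch, seq_len, hidden).
    # lambdas: optional per-layer weights lambda_m; None means every
    # layer gets weight 1.0, i.e. the losses are simply summed.
    if lambdas is None:
        lambdas = [1.0] * len(student_states)
    total = 0.0
    for lam, s, t in zip(lambdas, student_states, teacher_states):
        total = total + lam * F.mse_loss(s, t)
    return total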

Release other variants of general TinyBERT

Hi,

I found in the paper that you experimented with four different variants of TinyBERT (with different numbers of layers and hidden dimensions), of which two have released general distilled models. Would it be possible to release general TinyBERT models for the other two variants (4-layer-768-dim and 6-layer-312-dim)?

Thanks

[TinyBERT] Error running task_distill during task-specific distillation for a Chinese task

The error happens during task-specific distillation; the traceback is at the end. The fine-tuned BERT model was generated with the transformers package from the bert-base-chinese model included in that package.

Is this because the released TinyBERT model was trained on a corpus without Chinese?

The fine-tuning command using transformers is as follows:

python run_glue.py --model_type bert \
  --model_name_or_path bert-base-chinese \
  --task_name sst-2 \
  --do_train \
  --do_eval \
  --do_lower_case \
  --data_dir /home/vigosser/nvidia/bert/data/final \
  --max_seq_length 128 \
  --per_gpu_train_batch_size 8 \
  --learning_rate 15e-6 \
  --num_train_epochs 3.0 \
  --output_dir /home/vigosser/TinyBERT/FT_bert

Traceback

Traceback (most recent call last):
  File "C:\Program Files\JetBrains\PyCharm 2019.2.2\helpers\pydev\pydevd.py", line 2066, in <module>
    main()
  File "C:\Program Files\JetBrains\PyCharm 2019.2.2\helpers\pydev\pydevd.py", line 2060, in main
    globals = debugger.run(setup['file'], None, None, is_module)
  File "C:\Program Files\JetBrains\PyCharm 2019.2.2\helpers\pydev\pydevd.py", line 1411, in run
    return self._exec(is_module, entry_point_fn, module_name, file, globals, locals)
  File "C:\Program Files\JetBrains\PyCharm 2019.2.2\helpers\pydev\pydevd.py", line 1418, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "C:\Program Files\JetBrains\PyCharm 2019.2.2\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "D:/github/TinyBERT/task_distill.py", line 1154, in <module>
    main()
  File "D:/github/TinyBERT/task_distill.py", line 1013, in main
    teacher_logits, teacher_atts, teacher_reps = teacher_model(input_ids, segment_ids, input_mask)
  File "C:\Users\vigosser\Anaconda3\envs\vai\lib\site-packages\torch\nn\modules\module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "D:\github\TinyBERT\transformer\modeling.py", line 1133, in forward
    output_all_encoded_layers=True, output_att=True)
  File "C:\Users\vigosser\Anaconda3\envs\vai\lib\site-packages\torch\nn\modules\module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "D:\github\TinyBERT\transformer\modeling.py", line 832, in forward
    embedding_output = self.embeddings(input_ids, token_type_ids)
  File "C:\Users\vigosser\Anaconda3\envs\vai\lib\site-packages\torch\nn\modules\module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "D:\github\TinyBERT\transformer\modeling.py", line 357, in forward
    words_embeddings = self.word_embeddings(input_ids)
  File "C:\Users\vigosser\Anaconda3\envs\vai\lib\site-packages\torch\nn\modules\module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\vigosser\Anaconda3\envs\vai\lib\site-packages\torch\nn\modules\sparse.py", line 114, in forward
    self.norm_type, self.scale_grad_by_freq, self.sparse)
  File "C:\Users\vigosser\Anaconda3\envs\vai\lib\site-packages\torch\nn\functional.py", line 1484, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: index out of range: Tried to access index 21397 out of table with 21127 rows. at C:\w\1\s\tmp_conda_3.7_112106\conda\conda-bld\pytorch_1572952932150\work\aten\src\TH/generic/THTensorEvenMoreMath.cpp:418

load_tf_weights_in_bert fails during step 1 of Task-specific Distillation with the problem below

Initialize PyTorch weight ['bert', 'encoder', 'layer_9', 'attention', 'self', 'key', 'bias']
Initialize PyTorch weight ['bert', 'encoder', 'layer_9', 'attention', 'self', 'key', 'kernel']
Initialize PyTorch weight ['bert', 'encoder', 'layer_9', 'attention', 'self', 'query', 'bias']
Initialize PyTorch weight ['bert', 'encoder', 'layer_9', 'attention', 'self', 'query', 'kernel']
Initialize PyTorch weight ['bert', 'encoder', 'layer_9', 'attention', 'self', 'value', 'bias']
Initialize PyTorch weight ['bert', 'encoder', 'layer_9', 'attention', 'self', 'value', 'kernel']
Initialize PyTorch weight ['bert', 'encoder', 'layer_9', 'intermediate', 'dense', 'bias']
Initialize PyTorch weight ['bert', 'encoder', 'layer_9', 'intermediate', 'dense', 'kernel']
Initialize PyTorch weight ['bert', 'encoder', 'layer_9', 'output', 'LayerNorm', 'beta']
Initialize PyTorch weight ['bert', 'encoder', 'layer_9', 'output', 'LayerNorm', 'gamma']
Initialize PyTorch weight ['bert', 'encoder', 'layer_9', 'output', 'dense', 'bias']
Initialize PyTorch weight ['bert', 'encoder', 'layer_9', 'output', 'dense', 'kernel']
Initialize PyTorch weight ['bert', 'pooler', 'dense', 'bias']
Initialize PyTorch weight ['bert', 'pooler', 'dense', 'kernel']
Skipping cls/predictions/output_bias
Skipping cls/predictions/output_bias
Skipping cls/predictions/output_bias
Traceback (most recent call last):
  File "task_distill.py", line 1162, in <module>
    main()
  File "task_distill.py", line 927, in main
    teacher_model = TinyBertForSequenceClassification.from_pretrained(args.teacher_model, num_labels=num_labels, from_tf=True)
  File "/mnt/disk0/home/xx/project/demo/tinybert/TinyBERT/transformer/modeling.py", line 706, in from_pretrained
    return load_tf_weights_in_bert(model, weights_path)
  File "/mnt/disk0/home/xx/project/demo/tinybert/TinyBERT/transformer/modeling.py", line 119, in load_tf_weights_in_bert
    assert pointer.shape == array.shape
  File "/home/xx/install/anaconda3/envs/torch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 576, in __getattr__
    type(self).__name__, name))
AttributeError: 'TinyBertForSequenceClassification' object has no attribute 'shape'

Some errors in "run_classifier_ner.py"?

The indentation of lines 1105 to 1115 in run_classifier_ner.py looks wrong, doesn't it? Shouldn't that block be inside the else branch at line 1076?

result = estimator.evaluate(input_fn=predict_input_fn, checkpoint_path=FLAGS.init_checkpoint)
# predict
output_predict_file = os.path.join(FLAGS.output_dir, "test_results.tsv")
eval_re_path = os.path.join(FLAGS.output_dir, "eval")
if not os.path.exists(eval_re_path):
    os.mkdir(eval_re_path)
output_test_file = os.path.join(eval_re_path, "test_results.txt")
with tf.gfile.GFile(output_test_file, "w") as writer:
    tf.logging.info("***** Test results *****")
    for key in sorted(result.keys()):
        tf.logging.info("  %s = %s", key, str(result[key]))
        writer.write("%s = %s\n" % (key, str(result[key])))

Question about TinyBERT Data Augmentation ${GLOVE_EMB}$

Hi, all

In the Data Augmentation part I saw "--glove_embs ${GLOVE_EMB}$", and I am wondering what I should use to replace "${GLOVE_EMB}$".

From the code in data_augmentation.py, I noticed it refers to the GloVe embedding file, so presumably we should replace "${GLOVE_EMB}$" with the location of that file.

May I know where we can get the GloVe embedding file? Could you provide a link?
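
For anyone who lands here: GloVe vectors can be downloaded from https://nlp.stanford.edu/projects/glove/, and the file is plain text with one token and its vector per line. A minimal loading sketch, assuming that standard format (the file name below is a hypothetical example):

import numpy as np

def load_glove(path):
    # Each line is "<token> <v1> <v2> ... <vD>" in plain text.
    embs = {}
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            embs[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return embs

# glove_embs = load_glove("glove.42B.300d.txt")  # hypothetical local path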

Chinese distilled model

  1. Does the team have any recent plans to release a Chinese version of the model?
  2. Has anyone distilled a Chinese model themselves? How well does the distilled model perform on downstream tasks?

Multi-language support?

I was wondering whether it is possible to use this model for other languages (e.g. French)?

I checked vocab.txt; presumably it is English-only for now?

TypeError: string indices must be integers when run_classifier.py writes test_results.tsv

While running a text classification task, I wanted to print out the test results, so I uncommented the final block of run_classifier.py and ran it, which failed with: probabilities = prediction["probabilities"] TypeError: string indices must be integers.
It turned out I had confused it with the result used to print "test_results.txt": the line

result = estimator.evaluate(input_fn=predict_input_fn, checkpoint_path=FLAGS.init_checkpoint)

is the model-evaluation result, while printing per-example predictions should use estimator.predict:

result1 = estimator.predict(input_fn=predict_input_fn)
for (i, prediction) in enumerate(result1):
    probabilities = prediction["probabilities"]

With that change, test_results.tsv ends up under the task directory in output.
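
To make this concrete, a minimal sketch of writing test_results.tsv via estimator.predict, loosely following the commented-out block mentioned above; the helper name and paths are hypothetical:

import os
import tensorflow as tf

def write_test_results(estimator, predict_input_fn, output_dir):
    # estimator.predict yields one dict per example; "probabilities"
    # is the key produced by the classification model_fn.
    output_file = os.path.join(output_dir, "test_results.tsv")
    with tf.gfile.GFile(output_file, "w") as writer:
        for prediction in estimator.predict(input_fn=predict_input_fn):
            probabilities = prediction["probabilities"]
            writer.write("\t".join(str(p) for p in probabilities) + "\n")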

TinyBERT distillation on downstream tasks: why 2 steps?

From the paper:

In our experiments, we firstly perform intermediate layer distillation (M ≥ m ≥ 0), then perform
the prediction-layer distillation (m = M + 1).


Why is it necessary to perform the downstream-task distillation in 2 separate steps?

Is it possible to distill on the downstream task in one single step, using both losses at the same time, as in the sketch below?
Or would that make it more difficult for the model to converge?
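
For concreteness, a minimal sketch of the single-step combination being asked about, assuming the per-batch loss terms are already computed; alpha is a hypothetical trade-off weight, not something from the paper:

import torch

def combined_distill_loss(layer_losses, pred_loss, alpha=1.0):
    # layer_losses: list of intermediate-layer loss tensors
    # (attention and hidden-state MSEs); pred_loss: the
    # prediction-layer loss. The paper instead minimizes the two
    # groups in separate sequential phases.
    return alpha * torch.stack(layer_losses).sum() + pred_loss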

inference time

In the paper, the inference time is given in seconds (s); it surely isn't tens of seconds, shouldn't it be milliseconds (ms)?

Two questions about TinyBERT pre-training distillation

  1. The pre-training distillation seems to have only the attention and encoder-layer losses, with no masked-LM loss?
  2. Without a masked-LM loss, how do you directly evaluate the quality of the distilled small model?

TinyBERT general model with `cased`

Hello,

Have you done general distillation using the bert-base-cased model?
And would you have the General_TinyBERT_v2 (4layer-312dim) cased model available?

When trying python3 task_distill.py --teacher_model $FT_BERT_BASE_DIR --student_model $GENERAL_TINYBERT_DIR ... on a fine-tuned model that is 'bert-cased',
a CUDA error is thrown.

I would like to invite you to test NEZHA or TinyBERT on CLUE Benchmark!

Hi! Thank you for your great contribution. I am a founding member of CLUE (Language Understanding Evaluation benchmark for Chinese), a group aimed at promoting the development of Chinese language models.
Recently, we opened a leaderboard that includes 10 different and varied Chinese datasets. We hope you can test NEZHA or TinyBERT on these tasks on our platform and promote the development of Chinese language models together :)
Thank you again!

Our Github: https://github.com/CLUEbenchmark/CLUE
Leaderboard system: https://www.cluebenchmarks.com/

Vocabulary sizes do not match

In the Task-specific Distillation stage, the teacher is a fine-tuned BERT base and the student is General_TinyBERT; both are derived from BERT base. The BERT base vocabulary size is 21128, so why does the downloaded General_TinyBERT have a 30522-word vocabulary? How can the two be aligned?
In the task-specific distillation stage, the student's vocabulary is larger, so feeding its input to the teacher causes an out-of-range index.

student_logits, student_atts, student_reps = student_model(input_ids, segment_ids, input_mask, is_student=True)
teacher_logits, teacher_atts, teacher_reps = teacher_model(input_ids, segment_ids, input_mask)

The role of fit_size when the teacher's and the student's hidden_size differ

Suppose the teacher's and the student's hidden sizes are d and d', respectively.
When d is not equal to d', the student's fit_dense layer maps d' to the same dimension as d, so the hidden-state loss between student and teacher can be computed.
But when d and d' are equal, the hidden-state loss could be computed directly, without the fit_dense mapping. Yet the code gates this with an "if is_student" check; shouldn't it actually check whether d equals d'? (See the sketch below.)
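
As a reference for this discussion, a minimal sketch of such a projection in PyTorch; the class name is hypothetical, and the dimension check stands in for the is_student flag mentioned above:

import torch.nn as nn
import torch.nn.functional as F

class FitDense(nn.Module):
    # Projects student hidden states (width d') to the teacher
    # width (d) so the hidden-state MSE is well defined.
    def __init__(self, student_hidden, teacher_hidden):
        super().__init__()
        self.proj = nn.Linear(student_hidden, teacher_hidden)

    def hidden_loss(self, student_rep, teacher_rep):
        if student_rep.size(-1) != teacher_rep.size(-1):  # only when d' != d
            student_rep = self.proj(student_rep)
        return F.mse_loss(student_rep, teacher_rep)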

Error loading the TensorFlow fine-tuned BERT parameters when running task_distill

The load_tf_weights_in_bert method in modeling.py raises an error; the message is as follows:

Traceback (most recent call last):
  File "E:/TinyBERT/task_distill_training.py", line 1123, in <module>
    main()
  File "E:/TinyBERT/task_distill_training.py", line 888, in main
    teacher_model = TinyBertForSequenceClassification.from_pretrained(args.teacher_model, num_labels=num_labels)
  File "E:\TinyBERT\transformer\modeling.py", line 702, in from_pretrained
    return load_tf_weights_in_bert(model, weights_path)
  File "E:\TinyBERT\transformer\modeling.py", line 112, in load_tf_weights_in_bert
    pointer = pointer[num]
  File "C:\Users\ThinkPad\Anaconda3\envs\python36\lib\site-packages\torch\nn\modules\container.py", line 137, in __getitem__
    return self._modules[self._get_abs_string_index(idx)]
  File "C:\Users\ThinkPad\Anaconda3\envs\python36\lib\site-packages\torch\nn\modules\container.py", line 128, in _get_abs_string_index
    raise IndexError('index {} is out of range'.format(idx))
IndexError: index 10 is out of range

I first suspected the Python version, but I have tried both Python 3.7 and 3.6 with the same error, and torch is 1.0.1 in both cases. I also tried loading only the original, not fine-tuned, BERT parameters, but I get the same error.

Is there a plan to release a TF version of TinyBERT?

Hi!
Thanks for the great work! I'm looking for a pretrained model like BERT but with faster inference. So I wonder whether the PyTorch TinyBERT model can be used with NEZHA, or whether there is a plan to release a TF version of TinyBERT?
Thank you.

TinyBERT: why use several linear layers / activations to get the logits?

In TinyBERT, you get the prediction logits by applying a ReLU activation followed by a linear layer:

logits = self.classifier(torch.relu(pooled_output))

But why is this necessary?

Because pooled_output is already the output of a linear layer followed by an activation function:

pooled_output = self.dense(pooled_output)
pooled_output = self.activation(pooled_output)
return pooled_output

Questions about TinyBERT

After reading the TinyBERT paper, I would like to ask the following:
(1) In the pre-training distillation stage, is the student TinyBERT distilled while the teacher BERT is being pre-trained, e.g. once per epoch or on some other schedule? From the schematic figure (omitted here) I initially thought distillation happens at the same time as pre-training.
Or is it that, after BERT is pre-trained, the teacher BERT is fixed and the same pre-training corpus is fed to both the teacher BERT and the TinyBERT being distilled, optimizing the objectives one by one?
(2) The paper does not seem to report the resource consumption of the pre-training and fine-tuning stages, e.g. how much time the two stages took in total?
Thanks!

TinyBERT prediction-layer distillation loss: why not KLDiv?

For the prediction-layer distillation loss, you used the soft cross-entropy loss.

However, other distillation setups (DistilBERT, BERT-PKD) use a KLDiv loss for the prediction layer.


What is the reason behind this choice?
Did you try KLDiv and it gave worse results? Was it an empirical choice?
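
One thing worth noting when comparing the two: for a fixed teacher, KL(p_T || p_S) = H(p_T, p_S) - H(p_T), and H(p_T) is constant with respect to the student, so the two losses yield the same gradients. A minimal sketch in PyTorch (temperature omitted for brevity; function names are hypothetical):

import torch.nn.functional as F

def soft_cross_entropy(student_logits, teacher_logits):
    # H(p_T, p_S): soft cross-entropy between teacher and student.
    p_t = F.softmax(teacher_logits, dim=-1)
    return -(p_t * F.log_softmax(student_logits, dim=-1)).sum(dim=-1).mean()

def kl_distill(student_logits, teacher_logits):
    # KL(p_T || p_S) = H(p_T, p_S) - H(p_T); same student gradients.
    p_t = F.softmax(teacher_logits, dim=-1)
    return F.kl_div(F.log_softmax(student_logits, dim=-1), p_t,
                    reduction="batchmean")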

Data Augmentation

In the Data Augmentation phase, pretrained_bert_model is General_TinyBERT in data_augmentation.py, but the description calls it the "pre-trained language model BERT".

Question about the effectiveness of distillation

The original teacher network reaches roughly 93% accuracy after fine-tuning. I fed a large amount of unlabeled data through the teacher network to obtain pseudo-labeled data and used it to train a four-layer BERT as the student network, in two settings:

(1) without the intermediate-layer losses (attention, embedding, encoder, etc.), using only the student's hard-label loss: 89% accuracy;
(2) with intermediate-layer distillation losses added: 90% accuracy.
Does this mean the intermediate-layer losses have only a small influence on the student network's final accuracy? Have you measured how much adding each loss term affects TinyBERT's distillation accuracy? Or is something wrong with my distillation?

INFO:tensorflow:Error recorded from training_loop: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for ../nezha/model.ckpt

The problem occurs when running the Peoples-daily-NER task: INFO:tensorflow:Error recorded from training_loop: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for ../nezha/model.ckpt

(tensorflow-gpu2) [wgpu@localhost scripts]$ sh run_seq_labelling.sh
/home/wgpu/.conda/envs/tensorflow-gpu2/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:526: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
[the same numpy FutureWarning repeats for quint8, qint16, quint16, qint32 and np_resource]
INFO:tensorflow:***********label_list of this task is ['O', 'B-PER', 'I-PER', 'B-ORG', 'I-ORG', 'B-LOC', 'I-LOC', 'X', '[CLS]', '[SEP]']
WARNING:tensorflow:Estimator's model_fn (<function model_fn_builder.<locals>.model_fn at 0x7f21fb2a8ea0>) includes params argument, but params are not passed to Estimator.
INFO:tensorflow:Using config: {'_model_dir': '../output/peoples-daily-ner/', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 100, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true graph_options { rewrite_options { meta_optimizer_iterations: ONE } } , '_keep_checkpoint_max': 0, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f21fac42160>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop=1000, num_shards=8, num_cores_per_replica=None, per_host_input_for_training=3, tpu_job_name=None, initial_infeed_sleep_secs=None, input_partition_dims=None), '_cluster': None}
INFO:tensorflow:_TPUContext: eval_on_tpu True
WARNING:tensorflow:eval_on_tpu ignored because use_tpu is False.
INFO:tensorflow:Writing example 0 of 230
[per-example feature dumps (tokens, input_ids, input_mask, segment_ids, label_ids) for train-0 .. train-4 omitted]
INFO:tensorflow:***** Running training *****
INFO:tensorflow:  Num examples = 230
INFO:tensorflow:  Batch size = 16
INFO:tensorflow:  Num steps = 143
INFO:tensorflow:Writing example 0 of 50
[per-example feature dumps for dev-0 .. dev-4 omitted]
INFO:tensorflow:***** Running evaluation *****
INFO:tensorflow:  Num examples = 50
INFO:tensorflow:  Batch size = 16
INFO:tensorflow:Not using Distribute Coordinator.
INFO:tensorflow:Running training and evaluation locally (non-distributed).
INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps 100 or save_checkpoints_secs None.
[tensorflow deprecation warnings (colocate_with, map_and_batch, to_int32, dropout keep_prob, dense) omitted]
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Running train on CPU
INFO:tensorflow:*** Features ***
INFO:tensorflow:  name = input_ids, shape = (16, 256)
INFO:tensorflow:  name = input_mask, shape = (16, 256)
INFO:tensorflow:  name = label_ids, shape = (16, 256)
INFO:tensorflow:  name = segment_ids, shape = (16, 256)
INFO:tensorflow:use_relative_position: True
INFO:tensorflow:Error recorded from training_loop: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for ../nezha/model.ckpt
INFO:tensorflow:training_loop marked as finished
WARNING:tensorflow:Reraising captured error
Traceback (most recent call last):
  File "../run_classifier_ner.py", line 1124, in <module>
    tf.app.run()
  File "/home/wgpu/.conda/envs/tensorflow-gpu2/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "../run_classifier_ner.py", line 1035, in main
    tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
  File "/home/wgpu/.conda/envs/tensorflow-gpu2/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/training.py", line 471, in train_and_evaluate
    return executor.run()
  File "/home/wgpu/.conda/envs/tensorflow-gpu2/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/training.py", line 611, in run
    return self.run_local()
  File "/home/wgpu/.conda/envs/tensorflow-gpu2/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/training.py", line 712, in run_local
    saving_listeners=saving_listeners)
  File "/home/wgpu/.conda/envs/tensorflow-gpu2/lib/python3.6/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2457, in train
    rendezvous.raise_errors()
  File "/home/wgpu/.conda/envs/tensorflow-gpu2/lib/python3.6/site-packages/tensorflow/contrib/tpu/python/tpu/error_handling.py", line 128, in raise_errors
    six.reraise(typ, value, traceback)
  File "/home/wgpu/.conda/envs/tensorflow-gpu2/lib/python3.6/site-packages/six.py", line 696, in reraise
    raise value
  File "/home/wgpu/.conda/envs/tensorflow-gpu2/lib/python3.6/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2452, in train
    saving_listeners=saving_listeners)
  File "/home/wgpu/.conda/envs/tensorflow-gpu2/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 358, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/home/wgpu/.conda/envs/tensorflow-gpu2/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1124, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/home/wgpu/.conda/envs/tensorflow-gpu2/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1154, in _train_model_default
    features, labels, model_fn_lib.ModeKeys.TRAIN, self.config)
  File "/home/wgpu/.conda/envs/tensorflow-gpu2/lib/python3.6/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2251, in _call_model_fn
    config)
  File "/home/wgpu/.conda/envs/tensorflow-gpu2/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1112, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "/home/wgpu/.conda/envs/tensorflow-gpu2/lib/python3.6/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2534, in _model_fn
    features, labels, is_export_mode=is_export_mode)
  File "/home/wgpu/.conda/envs/tensorflow-gpu2/lib/python3.6/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 1323, in call_without_tpu
    return self._call_model_fn(features, labels, is_export_mode=is_export_mode)
  File "/home/wgpu/.conda/envs/tensorflow-gpu2/lib/python3.6/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 1593, in _call_model_fn
    estimator_spec = self._model_fn(features=features, **kwargs)
  File "../run_classifier_ner.py", line 705, in model_fn
    ) = modeling.get_assignment_map_from_checkpoint(tvars, init_checkpoint)
  File "/home/wgpu/deep/Pretrained-Language-Model/NEZHA/modeling.py", line 336, in get_assignment_map_from_checkpoint
    init_vars = tf.train.list_variables(init_checkpoint)
  File "/home/wgpu/.conda/envs/tensorflow-gpu2/lib/python3.6/site-packages/tensorflow/python/training/checkpoint_utils.py", line 95, in list_variables
    reader = load_checkpoint(ckpt_dir_or_file)
  File "/home/wgpu/.conda/envs/tensorflow-gpu2/lib/python3.6/site-packages/tensorflow/python/training/checkpoint_utils.py", line 64, in load_checkpoint
    return pywrap_tensorflow.NewCheckpointReader(filename)
  File "/home/wgpu/.conda/envs/tensorflow-gpu2/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 326, in NewCheckpointReader
    return CheckpointReader(compat.as_bytes(filepattern), status)
  File "/home/wgpu/.conda/envs/tensorflow-gpu2/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.NotFoundError: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for ../nezha/model.ckpt
[the numpy FutureWarning lines repeat on the follow-up run of read_tf_events.py]
Traceback (most recent call last):
  File "../read_tf_events.py", line 24, in <module>
    events_name_list = os.listdir(os.path.join(args.task_output_dir, "eval"))
FileNotFoundError: [Errno 2] No such file or directory: '../output/peoples-daily-ner/eval'

Chinese TinyBERT

Is the currently released BERT English-only? When will the Chinese version be released?

Isn't TinyBERT equivalent to copying the teacher BERT's attention/hidden weights?

Looking at Eqs. 7-9 in the paper (https://arxiv.org/pdf/1909.10351.pdf), and assuming the student and teacher models have the same dimensionality (i.e. d = d'), how is TinyBERT any different from (better than) initializing a 4-layer BERT model with the hidden, embedding, and attention weights of the corresponding teacher layers? Since Eqs. 7-9 use an MSE loss, the minimum of that loss is achieved (assuming d = d', M < N) when A_i^S = A_i^T, H^S = H^T, and E^S = E^T. So what is the advantage of using TinyBERT for general distillation (GD), where you don't do the prediction-layer Hinton distillation, over simply cloning the selected teacher layers onto the student? Maybe you can clarify?
