alibaba / easytransfer

EasyTransfer is designed to make the development of transfer learning in NLP applications easier.

Home Page: https://www.yuque.com/easytransfer/cn/

License: Apache License 2.0

Python 88.20% Shell 7.99% Jupyter Notebook 3.82%

bert nlp-applications knowledge-distillation transfer-learning

easytransfer's Introduction

EasyTransfer: A Simple and Scalable Deep Transfer Learning Platform for NLP Applications

Intro

Deep Transfer Learning (TL) has been applied successfully in many real-world NLP applications, yet building an easy-to-use TL toolkit remains difficult. EasyTransfer is designed to bridge this gap and make it easy to apply deep TL to NLP applications. It was developed at Alibaba in early 2017, has been used by the major business units (BUs) in Alibaba Group, and has achieved very good results in 20+ business scenarios. It supports a ModelZoo of mainstream pre-trained models, including pre-trained language models (PLMs) and multi-modal models on the PAI platform, integrates SOTA models for mainstream NLP applications in AppZoo, and supports knowledge distillation for PLMs. With EasyTransfer, users can quickly start model training, evaluation, offline prediction, and online deployment. It also provides rich APIs to make the development of NLP and transfer learning easier.

Main Features

Language model pre-training tool: a comprehensive pre-training tool for language models such as T5 and BERT. With it, users can train models that achieve strong results on benchmark leaderboards such as CLUE, GLUE, and SuperGLUE.
ModelZoo with rich and high-quality pre-trained models: supports continual pre-training and fine-tuning of mainstream LMs such as BERT, ALBERT, RoBERTa, and T5. It also includes FashionBERT, a multi-modal model developed using fashion-domain data at Alibaba.
AppZoo with rich and easy-to-use applications: supports mainstream NLP applications and models developed inside Alibaba, e.g., HCNN for text matching and BERT-HAE for MRC.
Automatic knowledge distillation: supports task-adaptive knowledge distillation, which distills knowledge from a teacher model into a small task-specific student model to reduce parameter size while keeping comparable performance.
Easy-to-use and high-performance distributed strategy: based on in-house PAI features, it provides an easy-to-use, high-performance distributed strategy for multi-CPU/GPU training.
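
To make the knowledge distillation feature more concrete, here is a minimal sketch of a generic temperature-softened distillation loss in plain Python. It is not EasyTransfer's implementation: the function names, the temperature value, and the choice of KL divergence are assumptions made for illustration only.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions (hypothetical helper)."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

teacher = [2.0, -1.0]
print(distillation_loss(teacher, [2.0, -1.0]))  # identical logits: zero loss
print(distillation_loss(teacher, [-1.0, 2.0]))  # disagreeing student: positive loss
```

A student whose logits match the teacher's incurs no loss, and the loss grows as the two distributions diverge, which is the signal a small student model is trained on.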

Architecture

Installation

You can either install from pip:

$ pip install easytransfer

or set up from source:

$ git clone https://github.com/alibaba/EasyTransfer.git
$ cd EasyTransfer
$ python setup.py install

This repo is tested on Python 3.6/2.7 and TensorFlow 1.12.3.

Quick Start

Now let's show how to use just 30 lines of code to build a text classification model based on BERT.

from easytransfer import base_model, layers, model_zoo, preprocessors
from easytransfer.datasets import CSVReader, CSVWriter
from easytransfer.losses import softmax_cross_entropy
from easytransfer.evaluators import classification_eval_metrics

class TextClassification(base_model):
    def __init__(self, **kwargs):
        super(TextClassification, self).__init__(**kwargs)
        self.pretrained_model_name = "google-bert-base-en"
        self.num_labels = 2

    def build_logits(self, features, mode=None):
        preprocessor = preprocessors.get_preprocessor(self.pretrained_model_name)
        model = model_zoo.get_pretrained_model(self.pretrained_model_name)
        dense = layers.Dense(self.num_labels)
        input_ids, input_mask, segment_ids, label_ids = preprocessor(features)
        _, pooled_output = model([input_ids, input_mask, segment_ids], mode=mode)
        return dense(pooled_output), label_ids

    def build_loss(self, logits, labels):
        return softmax_cross_entropy(labels, self.num_labels, logits)
    
    def build_eval_metrics(self, logits, labels):
        return classification_eval_metrics(logits, labels, self.num_labels)
        
app = TextClassification()
train_reader = CSVReader(input_glob=app.train_input_fp, is_training=True, batch_size=app.train_batch_size)
eval_reader = CSVReader(input_glob=app.eval_input_fp, is_training=False, batch_size=app.eval_batch_size)              
app.run_train_and_evaluate(train_reader=train_reader, eval_reader=eval_reader)

You can find more details or play with the code in our Jupyter/Notebook PAI-DSW.

You can also use the AppZoo Command Line Tools to quickly train an App model. Take text classification on the SST-2 dataset as an example: first download train.tsv, dev.tsv, and test.tsv, then start training:

$ easy_transfer_app --mode train \
    --inputTable=./train.tsv,./dev.tsv \
    --inputSchema=content:str:1,label:str:1 \
    --firstSequence=content \
    --sequenceLength=128 \
    --labelName=label \
    --labelEnumerateValues=0,1 \
    --checkpointDir=./sst2_models/ \
    --numEpochs=3 \
    --batchSize=32 \
    --optimizerType=adam \
    --learningRate=2e-5 \
    --modelName=text_classify_bert \
    --advancedParameters='pretrain_model_name_or_path=google-bert-base-en'

And then predict:

$ easy_transfer_app --mode predict \
    --inputTable=./test.tsv \
    --outputTable=./test.pred.tsv \
    --inputSchema=id:str:1,content:str:1 \
    --firstSequence=content \
    --appendCols=content \
    --outputSchema=predictions,probabilities,logits \
    --checkpointPath=./sst2_models/

To learn more about the usage of AppZoo, please refer to our documentation.
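
The --inputSchema values in the commands above appear to follow a name:type:length pattern, one entry per column. As a rough sketch under that assumption, the helper below (parse_input_schema is a hypothetical name, not part of EasyTransfer) splits such a string so the expected column count can be checked against a data file before training.

```python
from collections import namedtuple

# One column description: name, declared type, declared length.
Field = namedtuple("Field", ["name", "dtype", "length"])

def parse_input_schema(schema):
    """Split 'name:type:length,...' into Field tuples (format inferred from the examples)."""
    fields = []
    for item in schema.split(","):
        name, dtype, length = item.split(":")
        fields.append(Field(name, dtype, int(length)))
    return fields

train_schema = parse_input_schema("content:str:1,label:str:1")
print([f.name for f in train_schema])  # column names in file order
print(len(train_schema))               # number of columns each row should have
```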

Tutorials

EasyNLP for CLUE Benchmark

Here is the CLUE benchmark example

You can find more benchmarks in https://www.yuque.com/easytransfer/cn/rkm4p7

Links

Tutorials: https://www.yuque.com/easytransfer/itfpm9/qtzvuc

ModelZoo: https://www.yuque.com/easytransfer/itfpm9/oszcof

AppZoo: https://www.yuque.com/easytransfer/itfpm9/ky6hky

API docs: http://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/eztransfer_docs/html/index.html

Contact Us

Scan the following QR code to join the DingTalk discussion group. Discussions are mostly in Chinese, but English is also welcome.

You can also scan the following QR code to join the WeChat discussion group.

Citation

@article{easytransfer,
    author = {Minghui Qiu and
              Peng Li and
              Chengyu Wang and
              Haojie Pan and
              An Wang and
              Cen Chen and
              Xianyan Jia and
              Yaliang Li and
              Jun Huang and
              Deng Cai and
              Wei Lin},
    title = {EasyTransfer - A Simple and Scalable Deep Transfer Learning Platform for NLP Applications},
    journal = {CIKM 2021},
    url = {https://arxiv.org/abs/2011.09463},
    year = {2021}
}

easytransfer's Issues

classification_report error

File "multitask_finetune.py", line 160, in main
tnews_report = classification_report(all_tnews_gts, all_tnews_preds, digits=4)
File "/usr/local/lib64/python3.6/site-packages/sklearn/metrics/classification.py", line 1568, in classification_report
name_width = max(len(cn) for cn in target_names)
ValueError: max() arg is an empty sequence
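
One way this ValueError can arise is when both the ground-truth and prediction lists are empty, so there are no label names to measure. The sketch below guards against that case. safe_report is a hypothetical wrapper, the stand-in report function is for illustration, and whether empty lists are the actual cause in multitask_finetune.py is an assumption.

```python
def safe_report(y_true, y_pred, report_fn, **kwargs):
    """Call report_fn only when there is at least one sample to report on."""
    if len(y_true) == 0 and len(y_pred) == 0:
        return "no samples collected; skipping report"
    return report_fn(y_true, y_pred, **kwargs)

def count_report(y_true, y_pred, **kwargs):
    """Stand-in for a real report function such as sklearn's classification_report."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return "%d/%d correct" % (correct, len(y_true))

print(safe_report([], [], count_report))
print(safe_report([0, 1, 1], [0, 1, 0], count_report))
```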

Knowledge distillation

When fine-tuning the searched model and running prediction, the following error occurs:
TypeError: Input 'y' of 'Mul' Op has type float32 that does not match type int64 of argument 'x'

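Folks! When will TensorFlow be upgraded to 2.0+?!

I previously used bert4keras and keras-xlnet for ALBERT and XLNet, and a dependency conflict there gave me a real headache; using transformers for BERT meant jumping back and forth between torch and TensorFlow 2+. I was glad to see Alibaba integrate these BERT-family models in one place, and in a TensorFlow environment too, but unfortunately it only supports TF 1.x. So I am asking the maintainers here: could you find the time to upgrade TensorFlow to 2.0+? Many thanks!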

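Poor scaling efficiency with 8 GPUs

When running FashionBERT on multiple GPUs, I found that performance with 8 GPUs is about the same as with 4 GPUs.
Is this caused by the data-reading code that uses BundleCSVReader?
Has this code been verified in a multi-GPU setting?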

ERROR: File system scheme 'oss' not implemented

Hello,
I tried to use meta-finetune and ran into the problem shown below:

"predict_config": {
"predict_checkpoint_path": "oss://pai-wcy/easytransfer/model/google-bert-base-en/model.ckpt",
"predict_input_fp": "amazon_train.tsv",
"predict_batch_size": 32,
"predict_output_fp": null
}

The following error message always appears: "tensorflow.python.framework.errors_impl.UnimplementedError: File system scheme 'oss' not implemented (file: 'oss://pai-wcy/easytransfer/model/google-bert-base-en/train_config.json')".
So I would like to know what directory predict_checkpoint_path should point to here.
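
The error message indicates that the running TensorFlow build has no handler for paths starting with oss://. As a small sketch, the check below tells such paths apart from plain local ones before they reach TensorFlow. path_scheme is a hypothetical helper, and the local path in the example is an assumed location of a downloaded checkpoint, not a path confirmed by the project.

```python
from urllib.parse import urlparse

def path_scheme(path):
    """Return the URI scheme of a path, or '' for a plain local path."""
    return urlparse(path).scheme

candidates = [
    "oss://pai-wcy/easytransfer/model/google-bert-base-en/model.ckpt",
    "./google-bert-base-en/model.ckpt",  # assumed local copy, for illustration
]
for p in candidates:
    scheme = path_scheme(p)
    print(p, "->", scheme or "local")
```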

WeChat QR code is outdated

Please update it, thank you.

The DingTalk group QR code has expired

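EasyTransfer/examples/clue_quick_start.ipynb fails when run on Colab

EasyTransfer/examples/clue_quick_start.ipynb only runs with one dataset; switching to another dataset makes it fail with a CSV decoding error. Could the authors please check where the problem is?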

LayerNorm/beta not found in checkpoint

I tried with several pretrained models, including: google-bert-base-en, google-albert-base-en, pai-albert-base-en, they all have the same kind of error:

Key bert_pre_trained_model/bert/encoder/layer_10/attention/output/LayerNorm/beta not found in checkpoint
        [[node save/RestoreV2 (defined at /home/jinzy/.conda/envs/et/lib/python3.6/site-packages/easytransfer/engines/model.py:658)  = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT64], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

Dingtalk QR code is outdated

Question about fashionBERT Rank@K, AUC

rank@K is computed from the sorted TIA scores.
However, the code does not seem right, because the scores are compared without applying softmax.

I think this makes the comparison of TIA scores unfair.

So I think the function in read_batch_result_file.py should be changed as follows:

# step 2: rank@K
for idx in range(len(text_prod_ids)):
    query_id = text_prod_ids[idx] if type == 'txt2img' else image_prod_ids[idx]
    doc_id   = image_prod_ids[idx] if type == 'txt2img' else text_prod_ids[idx]
    # proposed change: was predictions[idx, 1]
    dscore   = softmax(predictions[idx], axis=-1)[1]
    dlabel   = labels[idx]
    doc      = Doc(id=doc_id, score=dscore, label=dlabel)
    if query_id in query_dict:
        query_dict[query_id].append(doc)
    else:
        query_dict[query_id] = [doc]

The AUC computation looks odd too:

# step 3: AUC
for idx in range(len(text_prod_ids)):
    # proposed change: was y_preds.append(predictions[idx, 1])
    y_preds.append(softmax(predictions[idx], axis=-1)[1])
    y_trues.append(labels[idx])

Is that right?

Thank you
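
To illustrate the concern raised in this issue, the sketch below shows that ranking documents by the raw class-1 logit and ranking them by the softmax probability of class 1 can give different orders. It assumes that predictions holds two-class logits, which is not confirmed here; the document names and values are made up.

```python
import math

def positive_probability(logits):
    """Softmax probability of class 1 for a two-class logit pair."""
    l0, l1 = logits
    return 1.0 / (1.0 + math.exp(l0 - l1))  # equals softmax(logits)[1]

# Toy (class-0 logit, class-1 logit) pairs for two candidate documents.
predictions = {"doc_a": (5.0, 4.0), "doc_b": (0.0, 1.0)}

by_raw_logit = sorted(predictions, key=lambda d: predictions[d][1], reverse=True)
by_probability = sorted(predictions, key=lambda d: positive_probability(predictions[d]), reverse=True)

print(by_raw_logit)    # ['doc_a', 'doc_b']
print(by_probability)  # ['doc_b', 'doc_a']
```

The softmax probability depends on the difference between the two logits, not on the class-1 logit alone, which is why the two orderings can disagree.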

fashion_bert: masked_lm_ids

Hello,

I would like to train fashion_bert on a different dataset.
To construct the data, I need to input "masked_lm_ids".
I was wondering what this is?

Thank you for your help.
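
For context, in the original BERT pre-training data format, masked_lm_ids holds the vocabulary ids of the original tokens at the masked positions, alongside masked_lm_positions and masked_lm_weights. The sketch below builds these fields for a toy sequence. Whether FashionBERT expects exactly this layout is an assumption, the helper name is hypothetical, and the masking is simplified: every selected token becomes [MASK], without the random or keep-original variants BERT also uses.

```python
import random

MASK_ID = 103  # [MASK] id in the English uncased BERT vocabulary (assumed to apply here)

def mask_tokens(input_ids, num_to_mask, max_predictions, rng):
    """Return masked input ids plus padded positions, original ids, and weights."""
    # Skip the first and last tokens, assumed to be [CLS] and [SEP].
    positions = sorted(rng.sample(range(1, len(input_ids) - 1), num_to_mask))
    masked_input = list(input_ids)
    masked_lm_ids = []
    for pos in positions:
        masked_lm_ids.append(input_ids[pos])  # remember the original token id
        masked_input[pos] = MASK_ID           # hide it from the model
    pad = max_predictions - len(positions)
    return (masked_input,
            positions + [0] * pad,
            masked_lm_ids + [0] * pad,
            [1.0] * len(positions) + [0.0] * pad)

toy_ids = [101, 2023, 2003, 1037, 2417, 4377, 102]  # toy token ids
masked, lm_positions, lm_ids, lm_weights = mask_tokens(toy_ids, 2, 4, random.Random(0))
print(masked, lm_positions, lm_ids, lm_weights)
```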

InvalidArgumentError when I run "clue_quick_start.ipynb" in Colab with datasets other than “CLUEWSC”

InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: {{function_node _inference_tf_data_experimental_map_and_batch_124135}} Expect 2 fields but have 3 in record 0
[[{{node DecodeCSV}}]]
[[IteratorGetNext]]
[[classification_regression_preprocessor/StringToNumber_3/_4105]]
(1) Invalid argument: {{function_node _inference_tf_data_experimental_map_and_batch_124135}} Expect 2 fields but have 3 in record 0
[[{{node DecodeCSV}}]]
[[IteratorGetNext]]
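
The message "Expect 2 fields but have 3 in record 0" suggests that the rows of the data file have more columns than the schema passed to the reader. A quick pre-check along these lines can locate such rows before training. find_field_mismatches is a hypothetical helper, and the tab delimiter and sample rows are assumptions.

```python
import csv
import io

def find_field_mismatches(lines, expected_fields, delimiter="\t", limit=5):
    """Return (row_index, field_count) for the first rows that do not match."""
    mismatches = []
    reader = csv.reader(lines, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    for index, row in enumerate(reader):
        if len(row) != expected_fields:
            mismatches.append((index, len(row)))
            if len(mismatches) >= limit:
                break
    return mismatches

# Two rows with three tab-separated columns, checked against a two-column schema.
sample = io.StringIO("sentence one\tsentence two\t1\nsentence three\tsentence four\t0\n")
print(find_field_mismatches(sample, expected_fields=2))  # [(0, 3), (1, 3)]
```

In real use, an open file handle for the dataset would take the place of the in-memory sample.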

Meta Finetune problem

Hello,
I followed the Meta Fine-tune tutorial and fine-tuned the model (using the command in bash).
However, I cannot find:

where the test set (amazon) is
how to run prediction (this part is missing from the tutorial)

I also tried PAI-modelzoom, but the parameter names are so different that it is still hard to tell how to run prediction.

Could you please provide the amazon test set and the bash file for the prediction part of the tutorial?

Thanks

Meta-fine problem

When I use meta-fine in the prediction stage, neither the AppZoo command line nor the released code works.

Could you please provide, for meta-fine, either:

the AppZoo command line and the corresponding config file for the prediction stage
or
the code and the corresponding config file for the prediction stage

Using the pai-bert-tiny-zh model raises ValueError: not enough values to unpack (expected 5, got 3), while pai-imagebert-base-en runs to completion

What is the cause of this?

How was fashionbert_fashiongen_patch_train_1K generated?

I printed one of its rows; why does it not match the shapes in input_schema? Could you release the preprocessing code?

FashionBert errors

I tried to run the scripts provided for FashionBERT and I'm getting this error:

Traceback (most recent call last):
  File "pretrain_main.py", line 194, in <module>
    main()
  File "pretrain_main.py", line 177, in main
    yield_single_examples=False):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/estimator/estimator.py", line 577, in predict
    features, None, model_fn_lib.ModeKeys.PREDICT, self.config)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/estimator/estimator.py", line 1195, in _call_model_fn
     model_fn_results = self._model_fn(features=features, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/easytransfer/engines/model.py", line 578, in model_fn
    output = self.build_logits(features, mode=mode)
  File "pretrain_main.py", line 50, in build_logits
    input_sequence_length=_APP_FLAGS.input_sequence_length)
  File "/usr/local/lib/python3.6/dist-packages/easytransfer/model_zoo/__init__.py", line 38, in get_pretrained_model
     return ImageBertPreTrainedModel.get(pretrain_model_name_or_path, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/easytransfer/model_zoo/modeling_utils.py", line 99, in get
    model(model.dummy_inputs(kwargs.get('input_sequence_length', 512)), mode='eval', output_features=False)
  File "/usr/local/lib/python3.6/dist-packages/easytransfer/model_zoo/modeling_utils.py", line 69, in dummy_inputs
    input_ids = [[1]*seq_length]
  TypeError: can't multiply sequence by non-int of type 'NoneType'

After running:

python3 pretrain_main.py \
  --workerGPU=1 \
  --type=txt2img  \
  --mode=predict \
  --predict_input_fp=eval_txt2img_list.list_csv  \
  --predict_batch_size=64  \
  --output_dir=./fashionbert_out  \
  --pretrain_model_name_or_path=./pai-imagebert-base-en/model.ckpt  \
  --predict_checkpoint_path=./fashionbert_pretrain_model_fin/model.ckpt-54198  \
  --image_feature_size=131072  \
  --input_schema="image_feature:float:131072,image_mask:int:64,input_ids:int:64,input_mask:int:64,segment_ids:int:64,nx_sent_labels:int:1,prod_desc:str:1,text_prod_id:str:1,image_prod_id:str:1,prod_img_id:str:1"

I'm using easytransfer 0.1.2 (since the latest version causes another error, as reported in #21) and tensorflow 1.12.3.

Can you give some advice on how to run the scripts? Are there other implementations of FashionBERT (PyTorch or TensorFlow)?
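
The last frames of the traceback use the pattern kwargs.get('input_sequence_length', 512). A default given to dict.get only applies when the key is absent, not when the key is present with the value None. The sketch below reproduces that behaviour in isolation. That an unset flag is passed through as None is a guess about the cause, not something confirmed by the maintainers; note that the command in this issue does not pass --input_sequence_length.

```python
def dummy_input_ids(**kwargs):
    """Mimics the pattern in the traceback: dict.get with a default value."""
    seq_length = kwargs.get("input_sequence_length", 512)
    return [[1] * seq_length]

print(len(dummy_input_ids()[0]))  # key absent: the default 512 is used

try:
    dummy_input_ids(input_sequence_length=None)  # key present but None
except TypeError as err:
    print("TypeError:", err)
```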

Where is the AppZoo Command Line Tools?

I ran the "easy_transfer_app" command directly on Windows, and it returned an error.

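Can a BERT model trained with EasyTransfer be used for inference with huggingface transformers?

As the title says: how should this be done?

The tsv files do not download when clicked

train.tsv, dev.tsv and test.tsv
This is the place: they still look like hyperlinks, but nothing happens when they are clicked.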

FashionBERT pretrained model error

Hello,

Thanks for sharing the code and the pretrained model. However, I get the following invalid argument error when using it.

link_to_error_image

I am using the following script:

python pretrain_main.py \
  --workerGPU=1 \
  --type=img2txt \
  --mode=predict \
  --predict_input_fp=eval_img2txt_list.list_csv \
  --predict_batch_size=64 \
  --input_sequence_length=64 \
  --output_dir=./fashionbert_out \
  --pretrain_model_name_or_path=fashionbert_pretrain_model_fin \
  --image_feature_size=131072 \
  --predict_checkpoint_path=./fashionbert_pretrain_model_fin/model.ckpt-54198 \
  --input_schema="image_feature:float:131072,image_mask:int:64,input_ids:int:64,input_mask:int:64,segment_ids:int:64,nx_sent_labels:int:1,prod_desc:str:1,text_prod_id:str:1,image_prod_id:str:1,prod_img_id:str:1"

Thanks & Regards
@dannygao1984 @minghui

How to visualize the results

It seems that you didn't provide the original pictures?

Some questions about adabert

In the AdaBERT paper, model performance is evaluated with accuracy (e.g., Table 3), including 78.7 on the MRPC task.

But here it is reported as an F1 score? And the accuracy of 70.8 is still lower than the 78.7 in the paper.
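
Since the question turns on accuracy versus F1, the short sketch below computes both on the same toy predictions to show that the two numbers are not interchangeable. The labels are made up, and the helper is a plain-Python illustration rather than the evaluation code used by the project.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Compute accuracy and binary F1 for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1

# Toy labels: a positive-heavy set where the model predicts positive everywhere.
y_true = [1, 1, 1, 0]
y_pred = [1, 1, 1, 1]
acc, f1 = accuracy_and_f1(y_true, y_pred)
print(acc)  # 0.75
print(f1)   # 6/7, higher than the accuracy
```

On a positive-heavy dataset, F1 can sit well above accuracy, so a reported score should always state which metric it is.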

The pai-bert-tiny-zh model does not converge

This model simply cannot be trained... it does worse than random initialization.

Error when testing the hcnn model under TensorFlow 1.12.3

The test command is as follows:
easy_transfer_app --mode=train --inputSchema=query:str:1,doc:str:1,label:str:1 --inputTable=./train_lcqmc.csv,.dev_lcqmc.csv --firstSequence=query --secondSequence=doc --labelName=label --labelEnumerateValues=0,1 --batchSize=32 --numEpochs=1 --optimizerType=adam --learningRate=0.001 --modelName=text_match_hcnn --checkpointDir=./hcnn_match_models --advancedParameters='first_sequence_length=40 second_sequence_length=40 pretrain_word_embedding_name_or_path=./sgns.zhihu.char.300.bin fix_embedding=true max_vocab_size=30000 embedding_size=300 hidden_size=300'
The error message is as follows:
INFO:tensorflow:Initialize word embedding from pretrained
Traceback (most recent call last):
File "/usr/local/anaconda3/envs/tf12.3/bin/easy_transfer_app", line 8, in
sys.exit(main())
File "/usr/local/anaconda3/envs/tf12.3/lib/python3.6/site-packages/easytransfer/app_zoo_cli.py", line 99, in main
app.run()
File "/usr/local/anaconda3/envs/tf12.3/lib/python3.6/site-packages/easytransfer/app_zoo/app_utils.py", line 168, in wrapper
func(*args, **kw)
File "/usr/local/anaconda3/envs/tf12.3/lib/python3.6/site-packages/easytransfer/app_zoo/base.py", line 44, in run
getattr(self, self.config.mode.replace("_on_the_fly", ""))()
File "/usr/local/anaconda3/envs/tf12.3/lib/python3.6/site-packages/easytransfer/app_zoo/base.py", line 113, in train_and_evaluate
self.run_train_and_evaluate(train_reader=train_reader, eval_reader=eval_reader)
File "/usr/local/anaconda3/envs/tf12.3/lib/python3.6/site-packages/easytransfer/engines/model.py", line 608, in run_train_and_evaluate
eval_spec=eval_spec)
File "/usr/local/anaconda3/envs/tf12.3/lib/python3.6/site-packages/tensorflow/python/estimator/training.py", line 471, in train_and_evaluate
return executor.run()
File "/usr/local/anaconda3/envs/tf12.3/lib/python3.6/site-packages/tensorflow/python/estimator/training.py", line 610, in run
return self.run_local()
File "/usr/local/anaconda3/envs/tf12.3/lib/python3.6/site-packages/tensorflow/python/estimator/training.py", line 711, in run_local
saving_listeners=saving_listeners)
File "/usr/local/anaconda3/envs/tf12.3/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 354, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/usr/local/anaconda3/envs/tf12.3/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1207, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/usr/local/anaconda3/envs/tf12.3/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1237, in _train_model_default
features, labels, model_fn_lib.ModeKeys.TRAIN, self.config)
File "/usr/local/anaconda3/envs/tf12.3/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1195, in _call_model_fn
model_fn_results = self._model_fn(features=features, **kwargs)
File "/usr/local/anaconda3/envs/tf12.3/lib/python3.6/site-packages/easytransfer/engines/model.py", line 530, in model_fn
logits, labels = self.build_logits(features, mode=mode)
File "/usr/local/anaconda3/envs/tf12.3/lib/python3.6/site-packages/easytransfer/app_zoo/text_match.py", line 618, in build_logits
filter_size=self.config.filter_size)([a_embeds, b_embeds, text_a_masks, text_b_masks])
File "/usr/local/anaconda3/envs/tf12.3/lib/python3.6/site-packages/tensorflow/python/layers/base.py", line 374, in call
outputs = super(Layer, self).call(inputs, *args, **kwargs)
File "/usr/local/anaconda3/envs/tf12.3/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 757, in call
outputs = self.call(inputs, *args, **kwargs)
File "/usr/local/anaconda3/envs/tf12.3/lib/python3.6/site-packages/easytransfer/layers/cnn.py", line 276, in call
(a_length / 4 / 3 / 2) * (b_length / 4 / 3 / 2)])
File "/usr/local/anaconda3/envs/tf12.3/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py", line 6482, in reshape
"Reshape", tensor=tensor, shape=shape, name=name)
File "/usr/local/anaconda3/envs/tf12.3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 609, in _apply_op_helper
param_name=input_name)
File "/usr/local/anaconda3/envs/tf12.3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 60, in _SatisfiesTypeConstraint
", ".join(dtypes.as_dtype(x).name for x in allowed_list)))
TypeError: Value passed to parameter 'shape' has DataType float32 not in list of allowed values: int32, int64

The above was run with the officially specified version installed, tf1.12.3.
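
The final frames of this traceback show a reshape whose size is computed as (a_length / 4 / 3 / 2) * (b_length / 4 / 3 / 2). Under Python 3, / between integers returns a float, which matches the "DataType float32 not in list of allowed values" complaint; under Python 2 the same expression would have produced an integer. The sketch below contrasts / with floor division // using the sequence lengths from the command. Whether floor division gives the sizes the layer actually needs is an assumption, not the maintainers' fix.

```python
# first_sequence_length and second_sequence_length from the command above
a_length, b_length = 40, 40

float_dim = (a_length / 4 / 3 / 2) * (b_length / 4 / 3 / 2)
int_dim = (a_length // 4 // 3 // 2) * (b_length // 4 // 3 // 2)

print(type(float_dim).__name__, float_dim)  # float, about 2.78
print(type(int_dim).__name__, int_dim)      # int 1
```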

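Source code for the paper Learning to Expand: Reinforced Response Expansion for Information-seeking Conversations

Where is the source code for the paper Learning to Expand: Reinforced Response Expansion for Information-seeking Conversations? I could not find it.

Was pai-imagebert-base-en trained on an English <text, image> corpus? Is there a model pre-trained on a Chinese dataset?

I would like to use it for fine-tuning, because my own dataset is very small.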

How to use multi-gpus?

As shown in the figure above, I want to use multiple GPUs to run my job, but I get the following error:

Traceback (most recent call last):
File "/data/yangxiaohan/tool/python3.6/lib/python3.6/site-packages/tensorflow/python/training/coordinator.py", line 297, in stop_on_exception
yield
File "/data/yangxiaohan/tool/python3.6/lib/python3.6/site-packages/tensorflow/contrib/distribute/python/mirrored_strategy.py", line 177, in _call_for_each_tower
**merge_kwargs)
File "/data/yangxiaohan/tool/python3.6/lib/python3.6/site-packages/tensorflow/python/training/checkpoint_utils.py", line 193, in _init_from_checkpoint
ckpt_file = _get_checkpoint_filename(ckpt_dir_or_file)
File "/data/yangxiaohan/tool/python3.6/lib/python3.6/site-packages/tensorflow/python/training/checkpoint_utils.py", line 280, in _get_checkpoint_filename
if gfile.IsDirectory(ckpt_dir_or_file):
File "/data/yangxiaohan/tool/python3.6/lib/python3.6/site-packages/tensorflow/python/lib/io/file_io.py", line 467, in is_directory
return pywrap_tensorflow.IsDirectory(compat.as_bytes(dirname), status)
File "/data/yangxiaohan/tool/python3.6/lib/python3.6/site-packages/tensorflow/python/util/compat.py", line 61, in as_bytes
(bytes_or_text,))
TypeError: Expected binary or unicode string, got PerDevice({'/replica:0/task:0/device:GPU:0': '/data/yangxiaohan/.eztransfer_modelzoo/bert/google-bert-base-zh/model.ckpt', '/replica:0/task:0/device:GPU:1': '/data/yangxiaohan/.eztransfer_modelzoo/bert/google-bert-base-zh/model.ckpt'})
Traceback (most recent call last):
File "src/fit.py", line 179, in
tf.app.run()
File "/data/yangxiaohan/tool/python3.6/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "src/fit.py", line 168, in main
train()
File "src/fit.py", line 24, in train
app.run_train(reader=train_reader)
File "/data/yangxiaohan/tool/python3.6/lib/python3.6/site-packages/easytransfer/engines/model.py", line 616, in run_train
max_steps=self.train_steps)
File "/data/yangxiaohan/tool/python3.6/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 354, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/data/yangxiaohan/tool/python3.6/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1205, in _train_model
return self._train_model_distributed(input_fn, hooks, saving_listeners)
File "/data/yangxiaohan/tool/python3.6/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1316, in _train_model_distributed
self.config)
File "/data/yangxiaohan/tool/python3.6/lib/python3.6/site-packages/tensorflow/python/training/distribute.py", line 721, in call_for_each_tower
return self._call_for_each_tower(fn, *args, **kwargs)
File "/data/yangxiaohan/tool/python3.6/lib/python3.6/site-packages/tensorflow/contrib/distribute/python/mirrored_strategy.py", line 556, in _call_for_each_tower
return _call_for_each_tower(self, fn, *args, **kwargs)
File "/data/yangxiaohan/tool/python3.6/lib/python3.6/site-packages/tensorflow/contrib/distribute/python/mirrored_strategy.py", line 183, in _call_for_each_tower
coord.join(threads)
File "/data/yangxiaohan/tool/python3.6/lib/python3.6/site-packages/tensorflow/python/training/coordinator.py", line 389, in join
six.reraise(*self._exc_info_to_raise)
File "/data/yangxiaohan/tool/python3.6/lib/python3.6/site-packages/six.py", line 693, in reraise
raise value
File "/data/yangxiaohan/tool/python3.6/lib/python3.6/site-packages/tensorflow/python/training/coordinator.py", line 297, in stop_on_exception
yield
File "/data/yangxiaohan/tool/python3.6/lib/python3.6/site-packages/tensorflow/contrib/distribute/python/mirrored_strategy.py", line 177, in _call_for_each_tower
**merge_kwargs)
File "/data/yangxiaohan/tool/python3.6/lib/python3.6/site-packages/tensorflow/python/training/checkpoint_utils.py", line 193, in _init_from_checkpoint
ckpt_file = _get_checkpoint_filename(ckpt_dir_or_file)
File "/data/yangxiaohan/tool/python3.6/lib/python3.6/site-packages/tensorflow/python/training/checkpoint_utils.py", line 280, in _get_checkpoint_filename
if gfile.IsDirectory(ckpt_dir_or_file):
File "/data/yangxiaohan/tool/python3.6/lib/python3.6/site-packages/tensorflow/python/lib/io/file_io.py", line 467, in is_directory
return pywrap_tensorflow.IsDirectory(compat.as_bytes(dirname), status)
File "/data/yangxiaohan/tool/python3.6/lib/python3.6/site-packages/tensorflow/python/util/compat.py", line 61, in as_bytes
(bytes_or_text,))
TypeError: Expected binary or unicode string, got PerDevice({'/replica:0/task:0/device:GPU:0': '/data/yangxiaohan/.eztransfer_modelzoo/bert/google-bert-base-zh/model.ckpt', '/replica:0/task:0/device:GPU:1': '/data/yangxiaohan/.eztransfer_modelzoo/bert/google-bert-base-zh/model.ckpt'})

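Which version of the BERT vocab is used when converting text to ids?

Can meta-finetune be used for multi-label classification?

Hello! I would like to ask whether meta-finetune can be used for multi-label classification. Looking at the source code, the part that computes centroids does not seem to consider the multi-label case. Thanks!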

FashionBert - Generate image features

Why does the image_feature_extract.py code for patch feature generation use num_patches = (4, 4) at line 475? Shouldn't it be (8, 8)?

Also, the paper states that patch features are generated using ResNeXt-101, while the script you provide uses ResNet-50. Which one is correct?

I'm asking because I ran some experiments: I tried to generate features from the FashionGen validation dataset using your image_feature_extract.py code, and I get strange results. For a given product, the features I get for its image do not match the features in your CSV evaluation files. Maybe the images need some preprocessing? The FashionGen validation dataset has 256x256 RGB images; is this the correct size?

I'm using this code to extract features:

feature_extractor = PatchFeatureExtractor("/content/resnet_v1_50")
results_batch = feature_extractor.predict([img], batch_size=1)
feature = results_batch[0]['feature']

Thank you for your help
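
A quick arithmetic check may help with the (4, 4) versus (8, 8) question. The FashionBERT commands quoted elsewhere on this page use image_feature:float:131072 and image_mask:int:64 in their input_schema. Assuming one mask slot per patch, the sketch below derives the per-patch feature width and the flattened size each grid would give. The one-slot-per-patch reading is an assumption.

```python
image_feature_size = 131072  # from image_feature:float:131072 in the input_schema
num_patch_slots = 64         # from image_mask:int:64 in the input_schema

per_patch_dim = image_feature_size // num_patch_slots
print(per_patch_dim)  # 2048

for grid in [(4, 4), (8, 8)]:
    patches = grid[0] * grid[1]
    print(grid, patches, patches * per_patch_dim)
```

Under these assumptions only the (8, 8) grid reproduces the 131072-wide feature. The 2048-wide per-patch vector matches the pooled output of both ResNet-50 and ResNeXt-101, so this check does not settle which backbone was used.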

AssertionError: assert FLAGS.mode is not None

Traceback (most recent call last):
File "/home/share/NLP/ali_easytransformer/text_cls.py", line 33, in <module>
app = TextClassification()
File "/home/share/NLP/ali_easytransformer/text_cls.py", line 14, in __init__
super(TextClassification, self).__init__(**kwargs)
File "/home/nxz/anaconda3/envs/simbert_env/lib/python3.6/site-packages/easytransfer/engines/model.py", line 627, in __init__
assert FLAGS.mode is not None
AssertionError

Under the same resource, which model is the best?

Based on your experiments. Thank you very much!

Exporting to pb models raised an error: "TypeError: __int__ returned non-int (type NoneType)"

When I export my hcnn text match model checkpoint using:
easy_transfer_app --mode=export --checkpointPath=./hcnn_match_models/model.ckpt-290901 --exportType=app_model --exportDirBase=./pb_models

the following error occurs:
INFO:tensorflow:<easytransfer.app_zoo.text_match.HCNNTextMatch object at 0x7f45410a8fd0>
INFO:tensorflow:Calling model_fn.
Traceback (most recent call last):
File "/usr/local/anaconda3/envs/tf1.12.3/bin/easy_transfer_app", line 33, in
sys.exit(load_entry_point('easytransfer==0.1.4', 'console_scripts', 'easy_transfer_app')())
File "/usr/local/anaconda3/envs/tf1.12.3/lib/python3.6/site-packages/easytransfer-0.1.4-py3.6.egg/easytransfer/app_zoo_cli.py", line 99, in main
File "/usr/local/anaconda3/envs/tf1.12.3/lib/python3.6/site-packages/easytransfer-0.1.4-py3.6.egg/easytransfer/app_zoo/app_utils.py", line 168, in wrapper
File "/usr/local/anaconda3/envs/tf1.12.3/lib/python3.6/site-packages/easytransfer-0.1.4-py3.6.egg/easytransfer/app_zoo/base.py", line 44, in run
File "/usr/local/anaconda3/envs/tf1.12.3/lib/python3.6/site-packages/easytransfer-0.1.4-py3.6.egg/easytransfer/app_zoo/base.py", line 164, in export
File "/usr/local/anaconda3/envs/tf1.12.3/lib/python3.6/site-packages/easytransfer-0.1.4-py3.6.egg/easytransfer/engines/model.py", line 658, in export_model
File "/usr/local/anaconda3/envs/tf1.12.3/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 663, in export_savedmodel
mode=model_fn_lib.ModeKeys.PREDICT)
File "/usr/local/anaconda3/envs/tf1.12.3/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 789, in _export_saved_model_for_mode
strip_default_attrs=strip_default_attrs)
File "/usr/local/anaconda3/envs/tf1.12.3/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 907, in _export_all_saved_models
mode=model_fn_lib.ModeKeys.PREDICT)
File "/usr/local/anaconda3/envs/tf1.12.3/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 984, in _add_meta_graph_for_mode
config=self.config)
File "/usr/local/anaconda3/envs/tf1.12.3/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1195, in _call_model_fn
model_fn_results = self._model_fn(features=features, **kwargs)
File "/usr/local/anaconda3/envs/tf1.12.3/lib/python3.6/site-packages/easytransfer-0.1.4-py3.6.egg/easytransfer/engines/model.py", line 574, in model_fn
File "/usr/local/anaconda3/envs/tf1.12.3/lib/python3.6/site-packages/easytransfer-0.1.4-py3.6.egg/easytransfer/app_zoo/text_match.py", line 618, in build_logits
File "/usr/local/anaconda3/envs/tf1.12.3/lib/python3.6/site-packages/tensorflow/python/layers/base.py", line 374, in call
outputs = super(Layer, self).call(inputs, *args, **kwargs)
File "/usr/local/anaconda3/envs/tf1.12.3/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 757, in call
outputs = self.call(inputs, *args, **kwargs)
File "/usr/local/anaconda3/envs/tf1.12.3/lib/python3.6/site-packages/easytransfer-0.1.4-py3.6.egg/easytransfer/layers/cnn.py", line 204, in call
TypeError: __int__ returned non-int (type NoneType)

It also happens at the last epoch when saving the model.

My environment: tensorflow 1.15.0

Error raised during AdaBERT fine-tuning

Python: 3.6
tensorflow: 1.12.3

https://github.com/alibaba/EasyTransfer/tree/master/scripts/knowledge_distillation#33-finetune%E5%B9%B6%E9%A2%84%E6%B5%8B

error message:

Parameters:
  arch_l2_reg=0.001
  arch_opt_lr=0.0001
  arch_path=./adabert_models/search/best/arch.json
  checkpointDir=
  config=None
  distribution_strategy=None
  emb_pathes=./adabert_models/search/best/wemb.npy,./adabert_models/search/best/pemb.npy
  embed_size=128
  f=
  h=False
  help=False
  helpfull=False
  helpshort=False
  is_pair_task=1
  is_training=True
  job_name=worker
  loss_beta=4.0
  loss_gamma=0.8
  max_save=1
  mode=None
  modelZooBasePath=/root/.eztransfer_modelzoo
  model_dir=./adabert_models/finetune/
  model_l2_reg=0.0003
  model_opt_lr=5e-06
  num_classes=2
  num_core_per_host=1
  num_token=30522
  open_ess=None
  outputs=None
  save_steps=30
  searched_model=./adabert_models/search/best
  seq_length=128
  tables=None
  task_index=0
  temp_decay_steps=18000
  train_batch_size=32
  train_file=./mrpc/train_mrpc_output_logits.txt,./mrpc/dev_mrpc_output_logits.txt
  train_steps=30
  usePAI=False
  workerCPU=1
  workerCount=1
  workerGPU=1
  worker_hosts=localhost:5001

INFO:tensorflow:Using config: {'_model_dir': './adabert_models/finetune/', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 30, '_save_checkpoints_secs': None, '_session_config': intra_op_parallelism_threads: 64
inter_op_parallelism_threads: 64
gpu_options {
  per_process_gpu_memory_fraction: 1.0
  allow_growth: true
  force_gpu_compatible: true
}
allow_soft_placement: true
graph_options {
  rewrite_options {
    constant_folding: OFF
  }
}
, '_keep_checkpoint_max': 1, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fb74ca82518>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
WARNING:tensorflow:Estimator's model_fn (<function get_model_fn.<locals>.model_fn at 0x7fb747b5ca60>) includes params argument, but params are not passed to Estimator.
INFO:tensorflow:num_parallel_batches 8
INFO:tensorflow:shuffle_buffer_size 1024
INFO:tensorflow:prefetch_buffer_size 1
INFO:tensorflow:batch_size 32
INFO:tensorflow:distribution_strategy None
INFO:tensorflow:num_micro_batches 1
INFO:tensorflow:input_schema labels:int:1,ids:int:128,mask:int:128,seg_ids:int:128,prob_logits:float:26
INFO:tensorflow:./mrpc/train_mrpc_output_logits.txt, total number of training examples 3668
INFO:tensorflow:num_parallel_batches 8
INFO:tensorflow:shuffle_buffer_size 1024
INFO:tensorflow:prefetch_buffer_size 1
INFO:tensorflow:batch_size 32
INFO:tensorflow:distribution_strategy None
INFO:tensorflow:num_micro_batches 1
INFO:tensorflow:input_schema labels:int:1,ids:int:128,mask:int:128,seg_ids:int:128,prob_logits:float:26
INFO:tensorflow:./mrpc/dev_mrpc_output_logits.txt, total number of eval examples 408
INFO:tensorflow:Not using Distribute Coordinator.
INFO:tensorflow:Running training and evaluation locally (non-distributed).
INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps 30 or save_checkpoints_secs None.
INFO:tensorflow:Calling model_fn.
====== 1 cells ======
searched op distributions ======>
{(0, 2): array([0.09165421, 0.10016984, 0.09765881, 0.10739392, 0.11919089,
       0.11196633, 0.0886544 , 0.0918505 , 0.09910965, 0.09235148],
      dtype=float32), (1, 2): array([0.09706013, 0.09356507, 0.0981027 , 0.12304117, 0.11524444,
       0.11587463, 0.08445919, 0.08487304, 0.09398084, 0.09379876],
      dtype=float32), (0, 3): array([0.09431954, 0.09327073, 0.09237881, 0.11603428, 0.11155295,
       0.10067917, 0.08867473, 0.09511623, 0.10352677, 0.10444669],
      dtype=float32), (1, 3): array([0.10270569, 0.09890872, 0.09799318, 0.11776961, 0.1114557 ,
       0.10589372, 0.08784777, 0.08607294, 0.09638216, 0.09497037],
      dtype=float32), (2, 3): array([0.10986389, 0.10315704, 0.09444612, 0.10787328, 0.10679101,
       0.1009643 , 0.09225149, 0.09034712, 0.09656148, 0.09774421],
      dtype=float32), (0, 4): array([0.09107016, 0.08650941, 0.08797061, 0.11874099, 0.10631699,
       0.11541508, 0.09348091, 0.10446716, 0.10441516, 0.09161358],
      dtype=float32), (1, 4): array([0.09745023, 0.09436949, 0.08907194, 0.12406871, 0.12098379,
       0.10303614, 0.08979508, 0.0915589 , 0.0945749 , 0.09509072],
      dtype=float32), (2, 4): array([0.10559002, 0.09695413, 0.09311736, 0.10958813, 0.10113393,
       0.10502651, 0.10361147, 0.08967911, 0.09141986, 0.10387935],
      dtype=float32), (3, 4): array([0.10880414, 0.10283509, 0.10102597, 0.1103332 , 0.10920084,
       0.09756382, 0.087904  , 0.09060738, 0.09158863, 0.10013673],
      dtype=float32)}
derived arch ======>
{(1, 2): 3, (0, 2): 4, (1, 3): 3, (0, 3): 3, (1, 4): 3, (0, 4): 3}

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 510, in _apply_op_helper
    preferred_dtype=default_dtype)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1146, in internal_convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 983, in _TensorTensorConversionFunction
    (dtype.name, t.dtype.name, str(t)))
ValueError: Tensor conversion requested dtype float32 for Tensor with dtype int64: 'Tensor("c0/node2/edge0to2/Reshape_1:0", shape=(1, 1, 1, 1), dtype=int64)'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "main_adabert.py", line 322, in <module>
    tf.app.run()
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "main_adabert.py", line 280, in main
    estimator, train_spec=train_spec, eval_spec=eval_spec)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/estimator/training.py", line 471, in train_and_evaluate
    return executor.run()
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/estimator/training.py", line 610, in run
    return self.run_local()
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/estimator/training.py", line 711, in run_local
    saving_listeners=saving_listeners)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 354, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1207, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1237, in _train_model_default
    features, labels, model_fn_lib.ModeKeys.TRAIN, self.config)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1195, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "main_adabert.py", line 183, in model_fn
    given_arch=given_arch)
  File "/usr/local/lib/python3.6/site-packages/easytransfer-0.1.1-py3.6.egg/easytransfer/model_zoo/modeling_adabert.py", line 117, in __init__
  File "/usr/local/lib/python3.6/site-packages/easytransfer-0.1.1-py3.6.egg/easytransfer/model_zoo/modeling_adabert.py", line 270, in _build_graph
  File "/usr/local/lib/python3.6/site-packages/easytransfer-0.1.1-py3.6.egg/easytransfer/model_zoo/modeling_adabert.py", line 471, in build_cell
  File "/usr/local/lib/python3.6/site-packages/easytransfer-0.1.1-py3.6.egg/easytransfer/model_zoo/modeling_adabert.py", line 501, in build_node
  File "/usr/local/lib/python3.6/site-packages/easytransfer-0.1.1-py3.6.egg/easytransfer/model_zoo/modeling_adabert.py", line 561, in build_edge
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py", line 866, in binary_op_wrapper
    return func(x, y, name=name)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py", line 1131, in _mul_dispatch
    return gen_math_ops.mul(x, y, name=name)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/gen_math_ops.py", line 5042, in mul
    "Mul", x=x, y=y, name=name)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 546, in _apply_op_helper
    inferred_from[input_arg.type_attr]))
TypeError: Input 'y' of 'Mul' Op has type int64 that does not match type float32 of argument 'x'.

tensorflow.python.framework.errors_impl.InvalidArgumentError: [[{{node DecodeCSV}}]] [[IteratorGetNext]]

The script reports an error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Expect 4 fields but have 3 in record 0
[[{{node DecodeCSV}}]]
[[IteratorGetNext]]
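When converting CSV to TFRecord for the NLP task, the train file converts normally, but the dev file conversion fails and the output is 0 KB.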
https://github.com/livingbody/EasyTransfer/tree/master/scripts/%E5%A4%A9%E6%B1%A0%E5%A4%A7%E8%B5%9B%E4%B8%93%E5%8C%BA
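
The message "Expect 4 fields but have 3 in record 0" suggests that at least one row in the input has fewer columns than the input schema declares. A minimal sketch for locating such rows before conversion is shown below. It assumes a tab-separated file and a schema with four fields; the helper name `find_bad_records` and the sample data are hypothetical and not part of EasyTransfer. Note that the record index in the TensorFlow message may refer to a position within a batch rather than a line in the file.

```python
import csv
import io


def find_bad_records(lines, expected_fields, delimiter="\t"):
    """Return (record_index, field_count) for each record with the wrong field count.

    `lines` is any iterable of text lines, e.g. an open file handle.
    The tab delimiter is an assumption; change it to match the data file.
    """
    reader = csv.reader(lines, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    return [
        (index, len(row))
        for index, row in enumerate(reader)
        if len(row) != expected_fields
    ]


# Made-up sample: the first record has 3 fields, the second has 4.
sample = io.StringIO("a\tb\tc\n1\t2\t3\t4\n")
print(find_bad_records(sample, expected_fields=4))
```

For a real file, pass an open handle instead of the sample, for example `open(path, newline="", encoding="utf-8")`. If the dev file has one column fewer than the train file (a missing label column, for instance), every record will be reported.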

Question about FashionBERT

Thanks for sharing your code! I have the following questions about FashionBERT.

(1) Have you evaluated the performance of FashionBERT without pretraining, that is, a model trained only on the image-text matching task with MLM and MPM removed? Also, there is no fine-tuning on a downstream task (e.g., image-text matching), so why do you call this pretraining rather than training?

(2) Do you resize the image patches (32x32 -> 224x224) for image feature extraction?

(3) Which backbone network is used for image feature extraction in your released pretrained model: ResNet50 or ResNeXt101?

A bug that needs fixing

TypeError: avgloss_logger_hook() takes 4 positional arguments but 5 were given

The call in easytransfer/engines/model.py passes five arguments:

    avgloss_hook = avgloss_logger_hook(self.train_steps,
                                       total_loss,
                                       self.model_dir,
                                       self.config.log_step_count_steps,
                                       self.config.task_index)

but the definition in easytransfer/utils/hooks.py accepts only four:

    def avgloss_logger_hook(max_steps, loss, model_dir, log_step_count_steps):

I installed EasyTransfer via pip install.
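
To confirm which signature an installed copy actually exposes, the function can be inspected directly. The sketch below is only illustrative: it uses a stand-in function with the four-parameter signature quoted above (its body is a placeholder, not the library's implementation), and the helper `accepts_n_positional` is a hypothetical name.

```python
import inspect


# Stand-in with the four-parameter signature quoted from
# easytransfer/utils/hooks.py; the body is a placeholder.
def avgloss_logger_hook(max_steps, loss, model_dir, log_step_count_steps):
    return None


def accepts_n_positional(func, n):
    """Return True if `func` can be called with `n` positional arguments."""
    try:
        # bind() checks the arguments against the signature without calling func.
        inspect.signature(func).bind(*([None] * n))
        return True
    except TypeError:
        return False


print(inspect.signature(avgloss_logger_hook))
print(accepts_n_positional(avgloss_logger_hook, 4))
print(accepts_n_positional(avgloss_logger_hook, 5))
```

Against a real installation, the stand-in would be replaced by importing `avgloss_logger_hook` from `easytransfer.utils.hooks`. If the five-argument check fails there, the installed hooks.py and model.py are out of sync, and the fix belongs in one of those two files rather than in user code.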