
alibaba / easytransfer

EasyTransfer is designed to make the development of transfer learning in NLP applications easier.

Home Page: https://www.yuque.com/easytransfer/cn/

License: Apache License 2.0

Python 88.20% Shell 7.99% Jupyter Notebook 3.82%
bert nlp-applications knowledge-distillation transfer-learning

easytransfer's Introduction

EasyTransfer: A Simple and Scalable Deep Transfer Learning Platform for NLP Applications

Intro

The literature has witnessed the success of applying deep Transfer Learning (TL) to many real-world NLP applications, yet building an easy-to-use TL toolkit for such applications is not easy. To bridge this gap, EasyTransfer is designed to make it easy for users to apply deep TL to NLP applications. It was developed at Alibaba in early 2017 and has been used by the major business units (BUs) of Alibaba Group, achieving very good results in more than 20 business scenarios. It supports a mainstream pre-trained ModelZoo, including pre-trained language models (PLMs) and multi-modal models on the PAI platform; integrates state-of-the-art (SOTA) models for mainstream NLP applications in AppZoo; and supports knowledge distillation for PLMs. EasyTransfer makes it convenient for users to quickly start model training, evaluation, offline prediction, and online deployment. It also provides rich APIs that make developing NLP and transfer learning applications easier.

Main Features

  • Language model pre-training tool: a comprehensive tool for pre-training language models such as T5 and BERT. With it, users can easily train models that achieve strong results on benchmark leaderboards such as CLUE, GLUE, and SuperGLUE;
  • ModelZoo with rich, high-quality pre-trained models: supports continual pre-training and fine-tuning of mainstream language models such as BERT, ALBERT, RoBERTa, and T5. It also supports FashionBERT, a multi-modal model developed at Alibaba using fashion-domain data;
  • AppZoo with rich, easy-to-use applications: supports mainstream NLP applications as well as models developed in-house at Alibaba, e.g., HCNN for text matching and BERT-HAE for machine reading comprehension (MRC);
  • Automatic knowledge distillation: supports task-adaptive knowledge distillation, which distills knowledge from a teacher model into a small, task-specific student model to reduce parameter size while keeping comparable performance;
  • Easy-to-use, high-performance distributed strategy: based on in-house PAI features, it provides an easy-to-use, high-performance distributed strategy for multi-CPU/GPU training.
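The sketch below is a generic illustration of the knowledge distillation idea behind the feature listed above. It implements the classic soft-target loss in NumPy: the student is trained to match the teacher's temperature-softened output distribution. It is not EasyTransfer's task-adaptive (AdaBERT) procedure or API. The function names and the temperature of 2.0 are assumptions chosen for illustration.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    """Row-wise softmax with temperature scaling (numerically stable)."""
    z = np.asarray(logits, dtype=np.float64) / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions,
    averaged over the batch and scaled by T^2."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = np.sum(p_teacher * (np.log(p_teacher) - np.log(p_student)), axis=-1)
    return float(np.mean(kl) * temperature ** 2)


# Toy logits for one example with three classes (made-up numbers).
teacher = np.array([[2.0, 0.5, -1.0]])
student = np.array([[1.0, 1.0, 0.0]])

print(distillation_loss(student, teacher))  # > 0: student differs from teacher
print(distillation_loss(teacher, teacher))  # 0.0: identical distributions
```

Scaling by T^2 keeps gradient magnitudes comparable across temperatures. In practice this term is usually combined with the ordinary cross-entropy on the hard labels.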

Architecture

[Figure: EasyTransfer architecture]

Installation

You can install either from pip:

$ pip install easytransfer

or set up from source:

$ git clone https://github.com/alibaba/EasyTransfer.git
$ cd EasyTransfer
$ python setup.py install

This repo is tested on Python 3.6/2.7 with TensorFlow 1.12.3.

Quick Start

The following example shows how to build a BERT-based text classification model in just 30 lines of code.

from easytransfer import base_model, layers, model_zoo, preprocessors
from easytransfer.datasets import CSVReader, CSVWriter
from easytransfer.losses import softmax_cross_entropy
from easytransfer.evaluators import classification_eval_metrics

class TextClassification(base_model):
    def __init__(self, **kwargs):
        super(TextClassification, self).__init__(**kwargs)
        self.pretrained_model_name = "google-bert-base-en"
        self.num_labels = 2

    def build_logits(self, features, mode=None):
        # Preprocessor and encoder are looked up by the same pretrained model name
        preprocessor = preprocessors.get_preprocessor(self.pretrained_model_name)
        model = model_zoo.get_pretrained_model(self.pretrained_model_name)
        dense = layers.Dense(self.num_labels)
        input_ids, input_mask, segment_ids, label_ids = preprocessor(features)
        # Classify from the pooled ([CLS]) output of the encoder
        _, pooled_output = model([input_ids, input_mask, segment_ids], mode=mode)
        return dense(pooled_output), label_ids

    def build_loss(self, logits, labels):
        return softmax_cross_entropy(labels, self.num_labels, logits)

    def build_eval_metrics(self, logits, labels):
        return classification_eval_metrics(logits, labels, self.num_labels)

app = TextClassification()
train_reader = CSVReader(input_glob=app.train_input_fp, is_training=True, batch_size=app.train_batch_size)
eval_reader = CSVReader(input_glob=app.eval_input_fp, is_training=False, batch_size=app.eval_batch_size)
app.run_train_and_evaluate(train_reader=train_reader, eval_reader=eval_reader)

You can find more details, or play with the code, in our Jupyter notebook on PAI-DSW.

You can also use the AppZoo command-line tool to quickly train an App model. Taking text classification on the SST-2 dataset as an example, first download train.tsv, dev.tsv, and test.tsv, then start training:

$ easy_transfer_app --mode train \
    --inputTable=./train.tsv,./dev.tsv \
    --inputSchema=content:str:1,label:str:1 \
    --firstSequence=content \
    --sequenceLength=128 \
    --labelName=label \
    --labelEnumerateValues=0,1 \
    --checkpointDir=./sst2_models/ \
    --numEpochs=3 \
    --batchSize=32 \
    --optimizerType=adam \
    --learningRate=2e-5 \
    --modelName=text_classify_bert \
    --advancedParameters='pretrain_model_name_or_path=google-bert-base-en'

And then predict:

$ easy_transfer_app --mode predict \
    --inputTable=./test.tsv \
    --outputTable=./test.pred.tsv \
    --inputSchema=id:str:1,content:str:1 \
    --firstSequence=content \
    --appendCols=content \
    --outputSchema=predictions,probabilities,logits \
    --checkpointPath=./sst2_models/ 
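The schemas used on this page include content:str:1, input_ids:int:64 and image_feature:float:131072. Judging from these, each --inputSchema entry appears to be a name:type:length triple, listed in the same order as the columns of the input file. The helper below is a hypothetical sketch for inspecting such a string. It is not part of EasyTransfer.

```python
from typing import List, NamedTuple


class Field(NamedTuple):
    name: str
    dtype: str
    length: int


def parse_input_schema(schema: str) -> List[Field]:
    """Split a schema such as 'content:str:1,label:str:1' into fields.

    Assumes every comma-separated entry has the form name:type:length.
    """
    fields = []
    for entry in schema.split(","):
        name, dtype, length = entry.strip().split(":")
        fields.append(Field(name, dtype, int(length)))
    return fields


for field in parse_input_schema("content:str:1,label:str:1"):
    print(field)
# Field(name='content', dtype='str', length=1)
# Field(name='label', dtype='str', length=1)
```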

To learn more about the usage of AppZoo, please refer to our documentation.

Tutorials

Here is the CLUE benchmark example

You can find more benchmarks at https://www.yuque.com/easytransfer/cn/rkm4p7

Links

Tutorials: https://www.yuque.com/easytransfer/itfpm9/qtzvuc

ModelZoo: https://www.yuque.com/easytransfer/itfpm9/oszcof

AppZoo: https://www.yuque.com/easytransfer/itfpm9/ky6hky

API docs: http://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/eztransfer_docs/html/index.html

Contact Us

Scan the following QR codes to join the DingTalk discussion group. Discussions in the group are mostly in Chinese, but English is also welcome.

You can also scan the following QR code to join the WeChat discussion group.

Citation

@article{easytransfer,
    author = {Minghui Qiu and
              Peng Li and
              Chengyu Wang and
              Haojie Pan and
              An Wang and
              Cen Chen and
              Xianyan Jia and
              Yaliang Li and
              Jun Huang and
              Deng Cai and
              Wei Lin},
    title = {EasyTransfer - A Simple and Scalable Deep Transfer Learning Platform for NLP Applications},
    journal = {CIKM 2021},
    url = {https://arxiv.org/abs/2011.09463},
    year = {2021}
}



easytransfer's People

Contributors

aeloyq, alibaba-oss, benchen4395, bobmayuze, chywang, dannygao1984, jerryli1981, joytianya, linbojin, minghui, scarletpan, wellinxu, wenmengzhou, ztl-35

easytransfer's Issues

classification_report error

File "multitask_finetune.py", line 160, in main
tnews_report = classification_report(all_tnews_gts, all_tnews_preds, digits=4)
File "/usr/local/lib64/python3.6/site-packages/sklearn/metrics/classification.py", line 1568, in classification_report
name_width = max(len(cn) for cn in target_names)
ValueError: max() arg is an empty sequence

Knowledge distillation

When fine-tuning and predicting with the searched model, the following error occurs:
TypeError: Input 'y' of 'Mul' Op has type float32 that does not match type int64 of argument 'x'

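Guys! When will you upgrade TensorFlow to 2.0+?!

Previously, when I used bert4keras and keras-xlnet for ALBERT and XLNet, a dependency conflict gave me a real headache, and using transformers for BERT meant bouncing back and forth between torch and TensorFlow 2+. I was happy to see that Alibaba had integrated these BERT-style models in a single TensorFlow environment, but unfortunately it only supports TF 1.x. So I am begging all of you experts: could you find time in your busy schedules to upgrade it to TensorFlow 2.0+? Thanks a million!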

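Poor scaling efficiency with 8 GPUs

When running FashionBERT on multiple GPUs, I found that performance with 8 GPUs is about the same as with 4 GPUs.
Is this a problem with the data-reading code that uses BundleCSVReader?
Has this code been verified in a multi-GPU setting?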

ERROR: File system scheme 'oss' not implemented

Hello,
I am trying to use meta-finetune and ran into the problem shown below:

"predict_config": {
"predict_checkpoint_path": "oss://pai-wcy/easytransfer/model/google-bert-base-en/model.ckpt",
"predict_input_fp": "amazon_train.tsv",
"predict_batch_size": 32,
"predict_output_fp": null
}

The following error message always appears: "tensorflow.python.framework.errors_impl.UnimplementedError: File system scheme 'oss' not implemented (file: 'oss://pai-wcy/easytransfer/model/google-bert-base-en/train_config.json')".
So I would like to know which directory predict_checkpoint_path should point to here.

LayerNorm/beta not found in checkpoint

I tried several pretrained models, including google-bert-base-en, google-albert-base-en, and pai-albert-base-en, and they all produce the same kind of error:

Key bert_pre_trained_model/bert/encoder/layer_10/attention/output/LayerNorm/beta not found in checkpoint
        [[node save/RestoreV2 (defined at /home/jinzy/.conda/envs/et/lib/python3.6/site-packages/easytransfer/engines/model.py:658)  = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT64], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]
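When a restore fails with "Key ... not found in checkpoint", a useful first step is to compare the variable names the graph expects with the names actually stored in the checkpoint. TensorFlow's tf.train.list_variables(ckpt_path) returns (name, shape) pairs for a checkpoint. The helper below is a hypothetical sketch that diffs the two name lists. The variable lists in the example are made up for illustration.

```python
def missing_checkpoint_keys(required, available):
    """Return the variable names the graph needs that the checkpoint lacks.

    required:  names the model tries to restore.
    available: names stored in the checkpoint, e.g.
               [name for name, _ in tf.train.list_variables(ckpt_path)].
    """
    available = set(available)
    return sorted(name for name in required if name not in available)


prefix = "bert_pre_trained_model/bert/encoder/layer_10/attention/output/LayerNorm/"
# Made-up lists for illustration only.
required = [prefix + "beta", prefix + "gamma"]
available = [prefix + "gamma"]

print(missing_checkpoint_keys(required, available))
# ['bert_pre_trained_model/bert/encoder/layer_10/attention/output/LayerNorm/beta']
```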

Question about fashionBERT Rank@K, AUC

When calculating rank@K, the result is based on sorting by the TIA score.
However, the code does not seem right, because the scores are compared without applying softmax.

I think this makes the comparison of TIA scores unfair.

So I think the code in read_batch_result_file.py should be changed:

# step 2: rank@K
for idx in range(len(text_prod_ids)):
        query_id = text_prod_ids[idx] if type == 'txt2img' else image_prod_ids[idx]
        doc_id   = image_prod_ids[idx] if type == 'txt2img' else text_prod_ids[idx]
        dscore   = predictions[idx, 1]    --> softmax(predictions[idx], axis=-1)[1]
        dlabel   = labels[idx] 
        doc      = Doc(id=doc_id, score = dscore, label=dlabel)
        if query_id in query_dict:
            query_dict[query_id].append(doc)
        else:
            docs = []
            docs.append(doc)
            query_dict[query_id] = docs

The AUC computation also looks wrong:

# step 3: AUC
for idx in range(len(text_prod_ids)):
        y_preds.append(predictions[idx, 1]) -->  y_preds.append(softmax(predictions[idx], axis=-1)[1])
        y_trues.append(labels[idx])

Is that right?

Thank you
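For a two-class output, softmax(p)[1] equals sigmoid(p[1] - p[0]). Ranking by the raw logit p[1] alone can therefore order documents differently from ranking by the match probability. The toy example below uses made-up logits and a local softmax helper (not the repo's code) to show the two orderings disagreeing.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = np.asarray(x, dtype=np.float64)
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


# Made-up two-class logits [no-match, match] for two candidate documents.
predictions = np.array([[0.0, 2.0],
                        [-3.0, 1.5]])

raw_scores = predictions[:, 1]            # ranking key in the current code
prob_scores = softmax(predictions)[:, 1]  # proposed key: P(match)

print(np.argsort(-raw_scores))   # [0 1]: document 0 ranked first
print(np.argsort(-prob_scores))  # [1 0]: document 1 ranked first
```

AUC depends only on the ordering of the scores, so it is unchanged by any strictly increasing transform of them. However, softmax(p)[1] is not a monotonic function of p[1] alone, so switching to it can change AUC as well.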

fashion_bert: masked_lm_ids

Hello,

I would like to train fashion_bert on a different dataset.
To construct the data, I need to provide "masked_lm_ids".
What exactly does this field contain?

Thank you for your help.
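The FashionBERT data scripts are not shown here, but the original BERT pre-training data uses a clear convention. masked_lm_positions holds the indices of the masked tokens. masked_lm_ids holds the original token ids at those positions, which are the labels the masked-LM head must predict. Both lists are padded to max_predictions_per_seq and accompanied by masked_lm_weights. Assuming FashionBERT's text side follows the same convention (not confirmed here), the following simplified sketch produces the first three fields. The [MASK] id of 103 is taken from the uncased English BERT vocab, and the token ids are hypothetical.

```python
from typing import List, Tuple

MASK_ID = 103  # [MASK] in the uncased English BERT vocab; check your vocab.txt


def mask_tokens(input_ids: List[int],
                positions: List[int]) -> Tuple[List[int], List[int], List[int]]:
    """Replace the tokens at `positions` with [MASK].

    Returns (masked_input_ids, masked_lm_positions, masked_lm_ids), where
    masked_lm_ids are the ORIGINAL ids the model must predict.
    Simplified: real BERT picks positions at random, applies the 80/10/10
    mask/random/keep rule, and pads the outputs.
    """
    masked = list(input_ids)
    lm_ids = []
    for pos in positions:
        lm_ids.append(input_ids[pos])
        masked[pos] = MASK_ID
    return masked, list(positions), lm_ids


ids = [101, 7592, 2088, 3609, 102]  # hypothetical token ids
print(mask_tokens(ids, [2]))
# ([101, 7592, 103, 3609, 102], [2], [2088])
```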

InvalidArgumentError when I run "clue_quick_start.ipynb" in Colab with datasets other than "CLUEWSC"

InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: {{function_node _inference_tf_data_experimental_map_and_batch_124135}} Expect 2 fields but have 3 in record 0
[[{{node DecodeCSV}}]]
[[IteratorGetNext]]
[[classification_regression_preprocessor/StringToNumber_3/_4105]]
(1) Invalid argument: {{function_node _inference_tf_data_experimental_map_and_batch_124135}} Expect 2 fields but have 3 in record 0
[[{{node DecodeCSV}}]]
[[IteratorGetNext]]
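TensorFlow's CSV decoder reports "Expect 2 fields but have 3 in record 0" when the reader is configured for two columns but the first record contains three. A common cause is a dataset file whose column layout differs from the input schema, for example an extra id column. The helper below is a hypothetical sketch for finding such rows locally. It uses Python's csv module, whose quoting rules may differ slightly from TensorFlow's decoder, and the schema in the example is made up.

```python
import csv
import io


def check_field_counts(schema: str, tsv_text: str, delimiter: str = "\t"):
    """Return (row_index, n_columns) for every row whose column count
    differs from the number of comma-separated schema fields."""
    expected = len(schema.split(","))
    reader = csv.reader(io.StringIO(tsv_text), delimiter=delimiter)
    return [(i, len(row)) for i, row in enumerate(reader) if len(row) != expected]


schema = "sentence:str:1,label:str:1"           # 2 fields (hypothetical)
data = "some text\t1\tEXTRA\nother text\t0\n"   # row 0 has 3 columns
print(check_field_counts(schema, data))  # [(0, 3)]
```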

Meta Finetune problem

Hello,
I followed the Meta Fine-tune tutorial and fine-tuned the model (using the bash commands).
However, I cannot find:

  1. where the test set (amazon) is;
  2. how to predict (this part is missing from the tutorial).
    Even when I try to use PAI-ModelZoo, the parameter names are so different that it is still hard to know how to predict.

Could you please add the test set (amazon) and the bash file for the prediction step to the tutorial?

Thanks

Meta-finetune problem

When I use meta-finetune in the prediction stage, neither the AppZoo command line nor the released code works.

Could you please provide, for meta-finetune:

  1. the AppZoo command line and the corresponding config file for the prediction stage,
    or
  2. the code and the corresponding config file for the prediction stage.

FashionBert errors

I tried to run the scripts provided for FashionBERT and I'm getting this error:

Traceback (most recent call last):
  File "pretrain_main.py", line 194, in <module>
    main()
  File "pretrain_main.py", line 177, in main
    yield_single_examples=False):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/estimator/estimator.py", line 577, in predict
    features, None, model_fn_lib.ModeKeys.PREDICT, self.config)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/estimator/estimator.py", line 1195, in _call_model_fn
     model_fn_results = self._model_fn(features=features, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/easytransfer/engines/model.py", line 578, in model_fn
    output = self.build_logits(features, mode=mode)
  File "pretrain_main.py", line 50, in build_logits
    input_sequence_length=_APP_FLAGS.input_sequence_length)
  File "/usr/local/lib/python3.6/dist-packages/easytransfer/model_zoo/__init__.py", line 38, in get_pretrained_model
     return ImageBertPreTrainedModel.get(pretrain_model_name_or_path, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/easytransfer/model_zoo/modeling_utils.py", line 99, in get
    model(model.dummy_inputs(kwargs.get('input_sequence_length', 512)), mode='eval', output_features=False)
  File "/usr/local/lib/python3.6/dist-packages/easytransfer/model_zoo/modeling_utils.py", line 69, in dummy_inputs
    input_ids = [[1]*seq_length]
  TypeError: can't multiply sequence by non-int of type 'NoneType'

After running:

python3 pretrain_main.py \
  --workerGPU=1 \
  --type=txt2img  \
  --mode=predict \
  --predict_input_fp=eval_txt2img_list.list_csv  \
  --predict_batch_size=64  \
  --output_dir=./fashionbert_out  \
  --pretrain_model_name_or_path=./pai-imagebert-base-en/model.ckpt  \
  --predict_checkpoint_path=./fashionbert_pretrain_model_fin/model.ckpt-54198  \
  --image_feature_size=131072  \
  --input_schema="image_feature:float:131072,image_mask:int:64,input_ids:int:64,input_mask:int:64,segment_ids:int:64,nx_sent_labels:int:1,prod_desc:str:1,text_prod_id:str:1,image_prod_id:str:1,prod_img_id:str:1"

I'm using easytransfer 0.1.2 (since the latest version causes another error as reported here #21) and tensorflow 1.12.3.

Can you give some advice on how to run the scripts? Are there other implementations of FashionBERT (PyTorch or TensorFlow)?
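The traceback ends in input_ids = [[1]*seq_length] with seq_length equal to None. This happens even though the value comes from kwargs.get('input_sequence_length', 512). dict.get returns its default only when the key is absent. If the key is present with the value None, it returns None. A plausible but unconfirmed explanation is that the unset --input_sequence_length flag is forwarded as None. The command above does not set that flag, whereas the FashionBERT command in another issue on this page passes --input_sequence_length=64. The snippet below only demonstrates the Python behaviour, and its function names are hypothetical.

```python
def get_seq_len(kwargs):
    """Same pattern as the traceback: kwargs.get(key, default)."""
    return kwargs.get("input_sequence_length", 512)


def get_seq_len_safe(kwargs, default=512):
    """Fall back to the default when the key is missing OR explicitly None."""
    value = kwargs.get("input_sequence_length")
    return default if value is None else value


print(get_seq_len({}))                                    # 512
print(get_seq_len({"input_sequence_length": None}))       # None
print(get_seq_len_safe({"input_sequence_length": None}))  # 512
print(get_seq_len_safe({"input_sequence_length": 64}))    # 64
```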

FashionBERT pretrained model error

Hello,

Thanks for sharing the code and the pretrained model. However, I get the following InvalidArgument error when using it:
[Screenshot: fashionBERTerror_01]

I am using the following script:
python pretrain_main.py \
  --workerGPU=1 \
  --type=img2txt \
  --mode=predict \
  --predict_input_fp=eval_img2txt_list.list_csv \
  --predict_batch_size=64 \
  --input_sequence_length=64 \
  --output_dir=./fashionbert_out \
  --pretrain_model_name_or_path=fashionbert_pretrain_model_fin \
  --image_feature_size=131072 \
  --predict_checkpoint_path=./fashionbert_pretrain_model_fin/model.ckpt-54198 \
  --input_schema="image_feature:float:131072,image_mask:int:64,input_ids:int:64,input_mask:int:64,segment_ids:int:64,nx_sent_labels:int:1,prod_desc:str:1,text_prod_id:str:1,image_prod_id:str:1,prod_img_id:str:1"

Thanks & Regards
@dannygao1984 @minghui

Some questions about adabert

In the AdaBERT paper, model performance is evaluated by accuracy (e.g., Table 3), including 78.7 on the MRPC task.

But here, is it reported as an F1 score? The accuracy of 70.8 is still lower than the 78.7 reported in the paper.
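Accuracy and F1 are computed differently and generally give different numbers for the same predictions. An F1 score should therefore not be compared directly with an accuracy reported elsewhere. The minimal pure-Python sketch below uses toy labels, not MRPC data. It assumes F1 is computed for the positive class, which is the usual convention for MRPC (the paraphrase class).

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_positive(y_true, y_pred, positive=1):
    """F1 score of the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example: a classifier that always predicts "paraphrase" (1).
y_true = [1, 1, 0, 0]
y_pred = [1, 1, 1, 1]
print(accuracy(y_true, y_pred))     # 0.5
print(f1_positive(y_true, y_pred))  # 0.666... (precision 0.5, recall 1.0)
```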

Error when testing the hcnn model with TensorFlow 1.12.3

The test command is as follows:
easy_transfer_app --mode=train --inputSchema=query:str:1,doc:str:1,label:str:1 --inputTable=./train_lcqmc.csv,.dev_lcqmc.csv --firstSequence=query --secondSequence=doc --labelName=label --labelEnumerateValues=0,1 --batchSize=32 --numEpochs=1 --optimizerType=adam --learningRate=0.001 --modelName=text_match_hcnn --checkpointDir=./hcnn_match_models --advancedParameters='first_sequence_length=40 second_sequence_length=40 pretrain_word_embedding_name_or_path=./sgns.zhihu.char.300.bin fix_embedding=true max_vocab_size=30000 embedding_size=300 hidden_size=300'
The error message is as follows:
INFO:tensorflow:Initialize word embedding from pretrained
Traceback (most recent call last):
File "/usr/local/anaconda3/envs/tf12.3/bin/easy_transfer_app", line 8, in <module>
sys.exit(main())
File "/usr/local/anaconda3/envs/tf12.3/lib/python3.6/site-packages/easytransfer/app_zoo_cli.py", line 99, in main
app.run()
File "/usr/local/anaconda3/envs/tf12.3/lib/python3.6/site-packages/easytransfer/app_zoo/app_utils.py", line 168, in wrapper
func(*args, **kw)
File "/usr/local/anaconda3/envs/tf12.3/lib/python3.6/site-packages/easytransfer/app_zoo/base.py", line 44, in run
getattr(self, self.config.mode.replace("_on_the_fly", ""))()
File "/usr/local/anaconda3/envs/tf12.3/lib/python3.6/site-packages/easytransfer/app_zoo/base.py", line 113, in train_and_evaluate
self.run_train_and_evaluate(train_reader=train_reader, eval_reader=eval_reader)
File "/usr/local/anaconda3/envs/tf12.3/lib/python3.6/site-packages/easytransfer/engines/model.py", line 608, in run_train_and_evaluate
eval_spec=eval_spec)
File "/usr/local/anaconda3/envs/tf12.3/lib/python3.6/site-packages/tensorflow/python/estimator/training.py", line 471, in train_and_evaluate
return executor.run()
File "/usr/local/anaconda3/envs/tf12.3/lib/python3.6/site-packages/tensorflow/python/estimator/training.py", line 610, in run
return self.run_local()
File "/usr/local/anaconda3/envs/tf12.3/lib/python3.6/site-packages/tensorflow/python/estimator/training.py", line 711, in run_local
saving_listeners=saving_listeners)
File "/usr/local/anaconda3/envs/tf12.3/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 354, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/usr/local/anaconda3/envs/tf12.3/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1207, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/usr/local/anaconda3/envs/tf12.3/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1237, in _train_model_default
features, labels, model_fn_lib.ModeKeys.TRAIN, self.config)
File "/usr/local/anaconda3/envs/tf12.3/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1195, in _call_model_fn
model_fn_results = self._model_fn(features=features, **kwargs)
File "/usr/local/anaconda3/envs/tf12.3/lib/python3.6/site-packages/easytransfer/engines/model.py", line 530, in model_fn
logits, labels = self.build_logits(features, mode=mode)
File "/usr/local/anaconda3/envs/tf12.3/lib/python3.6/site-packages/easytransfer/app_zoo/text_match.py", line 618, in build_logits
filter_size=self.config.filter_size)([a_embeds, b_embeds, text_a_masks, text_b_masks])
File "/usr/local/anaconda3/envs/tf12.3/lib/python3.6/site-packages/tensorflow/python/layers/base.py", line 374, in call
outputs = super(Layer, self).call(inputs, *args, **kwargs)
File "/usr/local/anaconda3/envs/tf12.3/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 757, in call
outputs = self.call(inputs, *args, **kwargs)
File "/usr/local/anaconda3/envs/tf12.3/lib/python3.6/site-packages/easytransfer/layers/cnn.py", line 276, in call
(a_length / 4 / 3 / 2) * (b_length / 4 / 3 / 2)])
File "/usr/local/anaconda3/envs/tf12.3/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py", line 6482, in reshape
"Reshape", tensor=tensor, shape=shape, name=name)
File "/usr/local/anaconda3/envs/tf12.3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 609, in _apply_op_helper
param_name=input_name)
File "/usr/local/anaconda3/envs/tf12.3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 60, in _SatisfiesTypeConstraint
", ".join(dtypes.as_dtype(x).name for x in allowed_list)))
TypeError: Value passed to parameter 'shape' has DataType float32 not in list of allowed values: int32, int64

The above is based on installing the officially specified version, tf1.12.3.
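The last frames show easytransfer/layers/cnn.py computing a reshape dimension as (a_length / 4 / 3 / 2) * (b_length / 4 / 3 / 2). In Python 3, / always yields a float, and tf.reshape only accepts an int32 or int64 shape, which matches the final TypeError. Under Python 2.7, which the README also lists as tested, / between ints is floor division. The snippet below illustrates the difference, assuming a_length and b_length are plain ints equal to the sequence length of 40 from the command. Whether // reproduces the layer's intended output size depends on its pooling arithmetic and is an assumption.

```python
a_length, b_length = 40, 40  # first/second_sequence_length in the command above

# Python 3: "/" is true division and always returns a float.
true_div = (a_length / 4 / 3 / 2) * (b_length / 4 / 3 / 2)
# "//" is floor division and keeps ints as ints (like Python 2's "/" on ints).
floor_div = (a_length // 4 // 3 // 2) * (b_length // 4 // 3 // 2)

print(type(true_div).__name__)              # float
print(type(floor_div).__name__, floor_div)  # int 1
```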

How to use multi-gpus?

[Screenshot]

As shown in the figure above, I want to use multiple GPUs to run my job, but I get the following error:

Traceback (most recent call last):
File "/data/yangxiaohan/tool/python3.6/lib/python3.6/site-packages/tensorflow/python/training/coordinator.py", line 297, in stop_on_exception
yield
File "/data/yangxiaohan/tool/python3.6/lib/python3.6/site-packages/tensorflow/contrib/distribute/python/mirrored_strategy.py", line 177, in _call_for_each_tower
**merge_kwargs)
File "/data/yangxiaohan/tool/python3.6/lib/python3.6/site-packages/tensorflow/python/training/checkpoint_utils.py", line 193, in _init_from_checkpoint
ckpt_file = _get_checkpoint_filename(ckpt_dir_or_file)
File "/data/yangxiaohan/tool/python3.6/lib/python3.6/site-packages/tensorflow/python/training/checkpoint_utils.py", line 280, in _get_checkpoint_filename
if gfile.IsDirectory(ckpt_dir_or_file):
File "/data/yangxiaohan/tool/python3.6/lib/python3.6/site-packages/tensorflow/python/lib/io/file_io.py", line 467, in is_directory
return pywrap_tensorflow.IsDirectory(compat.as_bytes(dirname), status)
File "/data/yangxiaohan/tool/python3.6/lib/python3.6/site-packages/tensorflow/python/util/compat.py", line 61, in as_bytes
(bytes_or_text,))
TypeError: Expected binary or unicode string, got PerDevice({'/replica:0/task:0/device:GPU:0': '/data/yangxiaohan/.eztransfer_modelzoo/bert/google-bert-base-zh/model.ckpt', '/replica:0/task:0/device:GPU:1': '/data/yangxiaohan/.eztransfer_modelzoo/bert/google-bert-base-zh/model.ckpt'})
Traceback (most recent call last):
File "src/fit.py", line 179, in <module>
tf.app.run()
File "/data/yangxiaohan/tool/python3.6/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "src/fit.py", line 168, in main
train()
File "src/fit.py", line 24, in train
app.run_train(reader=train_reader)
File "/data/yangxiaohan/tool/python3.6/lib/python3.6/site-packages/easytransfer/engines/model.py", line 616, in run_train
max_steps=self.train_steps)
File "/data/yangxiaohan/tool/python3.6/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 354, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/data/yangxiaohan/tool/python3.6/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1205, in _train_model
return self._train_model_distributed(input_fn, hooks, saving_listeners)
File "/data/yangxiaohan/tool/python3.6/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1316, in _train_model_distributed
self.config)
File "/data/yangxiaohan/tool/python3.6/lib/python3.6/site-packages/tensorflow/python/training/distribute.py", line 721, in call_for_each_tower
return self._call_for_each_tower(fn, *args, **kwargs)
File "/data/yangxiaohan/tool/python3.6/lib/python3.6/site-packages/tensorflow/contrib/distribute/python/mirrored_strategy.py", line 556, in _call_for_each_tower
return _call_for_each_tower(self, fn, *args, **kwargs)
File "/data/yangxiaohan/tool/python3.6/lib/python3.6/site-packages/tensorflow/contrib/distribute/python/mirrored_strategy.py", line 183, in _call_for_each_tower
coord.join(threads)
File "/data/yangxiaohan/tool/python3.6/lib/python3.6/site-packages/tensorflow/python/training/coordinator.py", line 389, in join
six.reraise(*self._exc_info_to_raise)
File "/data/yangxiaohan/tool/python3.6/lib/python3.6/site-packages/six.py", line 693, in reraise
raise value
File "/data/yangxiaohan/tool/python3.6/lib/python3.6/site-packages/tensorflow/python/training/coordinator.py", line 297, in stop_on_exception
yield
File "/data/yangxiaohan/tool/python3.6/lib/python3.6/site-packages/tensorflow/contrib/distribute/python/mirrored_strategy.py", line 177, in _call_for_each_tower
**merge_kwargs)
File "/data/yangxiaohan/tool/python3.6/lib/python3.6/site-packages/tensorflow/python/training/checkpoint_utils.py", line 193, in _init_from_checkpoint
ckpt_file = _get_checkpoint_filename(ckpt_dir_or_file)
File "/data/yangxiaohan/tool/python3.6/lib/python3.6/site-packages/tensorflow/python/training/checkpoint_utils.py", line 280, in _get_checkpoint_filename
if gfile.IsDirectory(ckpt_dir_or_file):
File "/data/yangxiaohan/tool/python3.6/lib/python3.6/site-packages/tensorflow/python/lib/io/file_io.py", line 467, in is_directory
return pywrap_tensorflow.IsDirectory(compat.as_bytes(dirname), status)
File "/data/yangxiaohan/tool/python3.6/lib/python3.6/site-packages/tensorflow/python/util/compat.py", line 61, in as_bytes
(bytes_or_text,))
TypeError: Expected binary or unicode string, got PerDevice({'/replica:0/task:0/device:GPU:0': '/data/yangxiaohan/.eztransfer_modelzoo/bert/google-bert-base-zh/model.ckpt', '/replica:0/task:0/device:GPU:1': '/data/yangxiaohan/.eztransfer_modelzoo/bert/google-bert-base-zh/model.ckpt'})

FashionBert - Generate image features

Why does the image_feature_extract.py code for patch feature generation use num_patches = (4, 4) at line 475? Shouldn't it be (8, 8)?

Also, the paper states that patch features are generated using ResNeXt-101, while the script you provide uses ResNet-50. Which one is correct?

I'm asking because I ran some experiments. I tried to generate features from the FashionGen validation dataset using your image_feature_extract.py code and got strange results: for a given product, the features I get for its image do not match the features in the CSV evaluation files. Is there some preprocessing to apply to the images? The FashionGen validation dataset has 256x256 RGB images; are these the correct sizes?

I'm using this code to extract features:

feature_extractor = PatchFeatureExtractor("/content/resnet_v1_50")
results_batch = feature_extractor.predict([img], batch_size=1)
feature = results_batch[0]['feature']

Thank you for your help
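The FashionBERT commands on this page use --image_feature_size=131072 and image_mask:int:64. Assume each patch is represented by a 2048-dimensional pooled feature, which is the output size of both ResNet-50 and ResNeXt-101 before the classifier. Under that assumption, the numbers are consistent with 64 patches, i.e. an 8x8 grid rather than 4x4. This is only an arithmetic check, not a statement about how the released features were produced.

```python
FEATURE_DIM = 2048           # assumption: pooled ResNet-50 / ResNeXt-101 feature size per patch
IMAGE_FEATURE_SIZE = 131072  # --image_feature_size in the FashionBERT commands on this page

num_patches = IMAGE_FEATURE_SIZE // FEATURE_DIM
print(num_patches)           # 64, matching image_mask:int:64
print(8 * 8 == num_patches)  # True: an 8x8 grid
print(4 * 4 * FEATURE_DIM)   # 32768: what a 4x4 grid would give
```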

AssertionError: assert FLAGS.mode is not None

Traceback (most recent call last):
File "/home/share/NLP/ali_easytransformer/text_cls.py", line 33, in <module>
app = TextClassification()
File "/home/share/NLP/ali_easytransformer/text_cls.py", line 14, in __init__
super(TextClassification, self).__init__(**kwargs)
File "/home/nxz/anaconda3/envs/simbert_env/lib/python3.6/site-packages/easytransfer/engines/model.py", line 627, in __init__
assert FLAGS.mode is not None
AssertionError

Exporting to a pb model raises "TypeError: __int__ returned non-int (type NoneType)"

When I export my HCNN text match model checkpoints using:
easy_transfer_app --mode=export --checkpointPath=./hcnn_match_models/model.ckpt-290901 --exportType=app_model --exportDirBase=./pb_models

the following error occurs:
INFO:tensorflow:<easytransfer.app_zoo.text_match.HCNNTextMatch object at 0x7f45410a8fd0>
INFO:tensorflow:Calling model_fn.
Traceback (most recent call last):
File "/usr/local/anaconda3/envs/tf1.12.3/bin/easy_transfer_app", line 33, in <module>
sys.exit(load_entry_point('easytransfer==0.1.4', 'console_scripts', 'easy_transfer_app')())
File "/usr/local/anaconda3/envs/tf1.12.3/lib/python3.6/site-packages/easytransfer-0.1.4-py3.6.egg/easytransfer/app_zoo_cli.py", line 99, in main
File "/usr/local/anaconda3/envs/tf1.12.3/lib/python3.6/site-packages/easytransfer-0.1.4-py3.6.egg/easytransfer/app_zoo/app_utils.py", line 168, in wrapper
File "/usr/local/anaconda3/envs/tf1.12.3/lib/python3.6/site-packages/easytransfer-0.1.4-py3.6.egg/easytransfer/app_zoo/base.py", line 44, in run
File "/usr/local/anaconda3/envs/tf1.12.3/lib/python3.6/site-packages/easytransfer-0.1.4-py3.6.egg/easytransfer/app_zoo/base.py", line 164, in export
File "/usr/local/anaconda3/envs/tf1.12.3/lib/python3.6/site-packages/easytransfer-0.1.4-py3.6.egg/easytransfer/engines/model.py", line 658, in export_model
File "/usr/local/anaconda3/envs/tf1.12.3/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 663, in export_savedmodel
mode=model_fn_lib.ModeKeys.PREDICT)
File "/usr/local/anaconda3/envs/tf1.12.3/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 789, in _export_saved_model_for_mode
strip_default_attrs=strip_default_attrs)
File "/usr/local/anaconda3/envs/tf1.12.3/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 907, in _export_all_saved_models
mode=model_fn_lib.ModeKeys.PREDICT)
File "/usr/local/anaconda3/envs/tf1.12.3/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 984, in _add_meta_graph_for_mode
config=self.config)
File "/usr/local/anaconda3/envs/tf1.12.3/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1195, in _call_model_fn
model_fn_results = self._model_fn(features=features, **kwargs)
File "/usr/local/anaconda3/envs/tf1.12.3/lib/python3.6/site-packages/easytransfer-0.1.4-py3.6.egg/easytransfer/engines/model.py", line 574, in model_fn
File "/usr/local/anaconda3/envs/tf1.12.3/lib/python3.6/site-packages/easytransfer-0.1.4-py3.6.egg/easytransfer/app_zoo/text_match.py", line 618, in build_logits
File "/usr/local/anaconda3/envs/tf1.12.3/lib/python3.6/site-packages/tensorflow/python/layers/base.py", line 374, in call
outputs = super(Layer, self).call(inputs, *args, **kwargs)
File "/usr/local/anaconda3/envs/tf1.12.3/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 757, in call
outputs = self.call(inputs, *args, **kwargs)
File "/usr/local/anaconda3/envs/tf1.12.3/lib/python3.6/site-packages/easytransfer-0.1.4-py3.6.egg/easytransfer/layers/cnn.py", line 204, in call
TypeError: __int__ returned non-int (type NoneType)

It also happens when the model is saved at the last epoch.

My environment: TensorFlow 1.15.0

Error raised during AdaBERT finetuning

Python: 3.6
tensorflow: 1.12.3

https://github.com/alibaba/EasyTransfer/tree/master/scripts/knowledge_distillation#33-finetune%E5%B9%B6%E9%A2%84%E6%B5%8B

error message:

Parameters:
  arch_l2_reg=0.001
  arch_opt_lr=0.0001
  arch_path=./adabert_models/search/best/arch.json
  checkpointDir=
  config=None
  distribution_strategy=None
  emb_pathes=./adabert_models/search/best/wemb.npy,./adabert_models/search/best/pemb.npy
  embed_size=128
  f=
  h=False
  help=False
  helpfull=False
  helpshort=False
  is_pair_task=1
  is_training=True
  job_name=worker
  loss_beta=4.0
  loss_gamma=0.8
  max_save=1
  mode=None
  modelZooBasePath=/root/.eztransfer_modelzoo
  model_dir=./adabert_models/finetune/
  model_l2_reg=0.0003
  model_opt_lr=5e-06
  num_classes=2
  num_core_per_host=1
  num_token=30522
  open_ess=None
  outputs=None
  save_steps=30
  searched_model=./adabert_models/search/best
  seq_length=128
  tables=None
  task_index=0
  temp_decay_steps=18000
  train_batch_size=32
  train_file=./mrpc/train_mrpc_output_logits.txt,./mrpc/dev_mrpc_output_logits.txt
  train_steps=30
  usePAI=False
  workerCPU=1
  workerCount=1
  workerGPU=1
  worker_hosts=localhost:5001

INFO:tensorflow:Using config: {'_model_dir': './adabert_models/finetune/', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 30, '_save_checkpoints_secs': None, '_session_config': intra_op_parallelism_threads: 64
inter_op_parallelism_threads: 64
gpu_options {
  per_process_gpu_memory_fraction: 1.0
  allow_growth: true
  force_gpu_compatible: true
}
allow_soft_placement: true
graph_options {
  rewrite_options {
    constant_folding: OFF
  }
}
, '_keep_checkpoint_max': 1, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fb74ca82518>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
WARNING:tensorflow:Estimator's model_fn (<function get_model_fn.<locals>.model_fn at 0x7fb747b5ca60>) includes params argument, but params are not passed to Estimator.
INFO:tensorflow:num_parallel_batches 8
INFO:tensorflow:shuffle_buffer_size 1024
INFO:tensorflow:prefetch_buffer_size 1
INFO:tensorflow:batch_size 32
INFO:tensorflow:distribution_strategy None
INFO:tensorflow:num_micro_batches 1
INFO:tensorflow:input_schema labels:int:1,ids:int:128,mask:int:128,seg_ids:int:128,prob_logits:float:26
INFO:tensorflow:./mrpc/train_mrpc_output_logits.txt, total number of training examples 3668
INFO:tensorflow:num_parallel_batches 8
INFO:tensorflow:shuffle_buffer_size 1024
INFO:tensorflow:prefetch_buffer_size 1
INFO:tensorflow:batch_size 32
INFO:tensorflow:distribution_strategy None
INFO:tensorflow:num_micro_batches 1
INFO:tensorflow:input_schema labels:int:1,ids:int:128,mask:int:128,seg_ids:int:128,prob_logits:float:26
INFO:tensorflow:./mrpc/dev_mrpc_output_logits.txt, total number of eval examples 408
INFO:tensorflow:Not using Distribute Coordinator.
INFO:tensorflow:Running training and evaluation locally (non-distributed).
INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps 30 or save_checkpoints_secs None.
INFO:tensorflow:Calling model_fn.
====== 1 cells ======
searched op distributions ======>
{(0, 2): array([0.09165421, 0.10016984, 0.09765881, 0.10739392, 0.11919089,
       0.11196633, 0.0886544 , 0.0918505 , 0.09910965, 0.09235148],
      dtype=float32), (1, 2): array([0.09706013, 0.09356507, 0.0981027 , 0.12304117, 0.11524444,
       0.11587463, 0.08445919, 0.08487304, 0.09398084, 0.09379876],
      dtype=float32), (0, 3): array([0.09431954, 0.09327073, 0.09237881, 0.11603428, 0.11155295,
       0.10067917, 0.08867473, 0.09511623, 0.10352677, 0.10444669],
      dtype=float32), (1, 3): array([0.10270569, 0.09890872, 0.09799318, 0.11776961, 0.1114557 ,
       0.10589372, 0.08784777, 0.08607294, 0.09638216, 0.09497037],
      dtype=float32), (2, 3): array([0.10986389, 0.10315704, 0.09444612, 0.10787328, 0.10679101,
       0.1009643 , 0.09225149, 0.09034712, 0.09656148, 0.09774421],
      dtype=float32), (0, 4): array([0.09107016, 0.08650941, 0.08797061, 0.11874099, 0.10631699,
       0.11541508, 0.09348091, 0.10446716, 0.10441516, 0.09161358],
      dtype=float32), (1, 4): array([0.09745023, 0.09436949, 0.08907194, 0.12406871, 0.12098379,
       0.10303614, 0.08979508, 0.0915589 , 0.0945749 , 0.09509072],
      dtype=float32), (2, 4): array([0.10559002, 0.09695413, 0.09311736, 0.10958813, 0.10113393,
       0.10502651, 0.10361147, 0.08967911, 0.09141986, 0.10387935],
      dtype=float32), (3, 4): array([0.10880414, 0.10283509, 0.10102597, 0.1103332 , 0.10920084,
       0.09756382, 0.087904  , 0.09060738, 0.09158863, 0.10013673],
      dtype=float32)}
derived arch ======>
{(1, 2): 3, (0, 2): 4, (1, 3): 3, (0, 3): 3, (1, 4): 3, (0, 4): 3}

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 510, in _apply_op_helper
    preferred_dtype=default_dtype)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1146, in internal_convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 983, in _TensorTensorConversionFunction
    (dtype.name, t.dtype.name, str(t)))
ValueError: Tensor conversion requested dtype float32 for Tensor with dtype int64: 'Tensor("c0/node2/edge0to2/Reshape_1:0", shape=(1, 1, 1, 1), dtype=int64)'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "main_adabert.py", line 322, in <module>
    tf.app.run()
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "main_adabert.py", line 280, in main
    estimator, train_spec=train_spec, eval_spec=eval_spec)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/estimator/training.py", line 471, in train_and_evaluate
    return executor.run()
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/estimator/training.py", line 610, in run
    return self.run_local()
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/estimator/training.py", line 711, in run_local
    saving_listeners=saving_listeners)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 354, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1207, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1237, in _train_model_default
    features, labels, model_fn_lib.ModeKeys.TRAIN, self.config)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1195, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "main_adabert.py", line 183, in model_fn
    given_arch=given_arch)
  File "/usr/local/lib/python3.6/site-packages/easytransfer-0.1.1-py3.6.egg/easytransfer/model_zoo/modeling_adabert.py", line 117, in __init__
  File "/usr/local/lib/python3.6/site-packages/easytransfer-0.1.1-py3.6.egg/easytransfer/model_zoo/modeling_adabert.py", line 270, in _build_graph
  File "/usr/local/lib/python3.6/site-packages/easytransfer-0.1.1-py3.6.egg/easytransfer/model_zoo/modeling_adabert.py", line 471, in build_cell
  File "/usr/local/lib/python3.6/site-packages/easytransfer-0.1.1-py3.6.egg/easytransfer/model_zoo/modeling_adabert.py", line 501, in build_node
  File "/usr/local/lib/python3.6/site-packages/easytransfer-0.1.1-py3.6.egg/easytransfer/model_zoo/modeling_adabert.py", line 561, in build_edge
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py", line 866, in binary_op_wrapper
    return func(x, y, name=name)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py", line 1131, in _mul_dispatch
    return gen_math_ops.mul(x, y, name=name)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/gen_math_ops.py", line 5042, in mul
    "Mul", x=x, y=y, name=name)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 546, in _apply_op_helper
    inferred_from[input_arg.type_attr]))
TypeError: Input 'y' of 'Mul' Op has type int64 that does not match type float32 of argument 'x'.
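The log prints both the searched op distributions and the derived arch. Assuming AdaBERT derives the architecture the way DARTS does (keep the two strongest incoming edges per node, ranking each edge by the probability of its best op, then take the argmax op on each kept edge), this short sketch reproduces the logged derived arch from the logged distributions. It is a reconstruction for reading the log, not the code in modeling_adabert.py. It does not exclude any candidate op, because the log does not say which of the ten ops, if any, is a zero op.

```python
# Searched op distributions copied from the log (10 candidate ops per edge).
OP_DISTS = {
    (0, 2): [0.09165421, 0.10016984, 0.09765881, 0.10739392, 0.11919089,
             0.11196633, 0.0886544, 0.0918505, 0.09910965, 0.09235148],
    (1, 2): [0.09706013, 0.09356507, 0.0981027, 0.12304117, 0.11524444,
             0.11587463, 0.08445919, 0.08487304, 0.09398084, 0.09379876],
    (0, 3): [0.09431954, 0.09327073, 0.09237881, 0.11603428, 0.11155295,
             0.10067917, 0.08867473, 0.09511623, 0.10352677, 0.10444669],
    (1, 3): [0.10270569, 0.09890872, 0.09799318, 0.11776961, 0.1114557,
             0.10589372, 0.08784777, 0.08607294, 0.09638216, 0.09497037],
    (2, 3): [0.10986389, 0.10315704, 0.09444612, 0.10787328, 0.10679101,
             0.1009643, 0.09225149, 0.09034712, 0.09656148, 0.09774421],
    (0, 4): [0.09107016, 0.08650941, 0.08797061, 0.11874099, 0.10631699,
             0.11541508, 0.09348091, 0.10446716, 0.10441516, 0.09161358],
    (1, 4): [0.09745023, 0.09436949, 0.08907194, 0.12406871, 0.12098379,
             0.10303614, 0.08979508, 0.0915589, 0.0945749, 0.09509072],
    (2, 4): [0.10559002, 0.09695413, 0.09311736, 0.10958813, 0.10113393,
             0.10502651, 0.10361147, 0.08967911, 0.09141986, 0.10387935],
    (3, 4): [0.10880414, 0.10283509, 0.10102597, 0.1103332, 0.10920084,
             0.09756382, 0.087904, 0.09060738, 0.09158863, 0.10013673],
}


def derive_arch(op_dists, edges_per_node=2):
    """Assumed DARTS-style derivation: keep the strongest incoming edges per node
    and choose the argmax op on each kept edge."""
    incoming = {}
    for (src, dst), probs in op_dists.items():
        incoming.setdefault(dst, []).append((src, probs))
    arch = {}
    for dst in sorted(incoming):
        # An edge's strength is the probability of its single best op.
        ranked = sorted(incoming[dst], key=lambda edge: max(edge[1]), reverse=True)
        for src, probs in ranked[:edges_per_node]:
            arch[(src, dst)] = max(range(len(probs)), key=probs.__getitem__)
    return arch


print(derive_arch(OP_DISTS))
# {(1, 2): 3, (0, 2): 4, (1, 3): 3, (0, 3): 3, (1, 4): 3, (0, 4): 3}
```

The output matches the logged derived arch in both content and key order. Edges from intermediate nodes ((2, 3), (2, 4), (3, 4)) are dropped because their best op is weaker than those on the edges from the two input nodes.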

Question about FashionBERT

Thanks for sharing your code! I have the following questions about FashionBERT.

(1) Have you evaluated the performance of FashionBERT without pre-training, that is, a model trained only on the image-text matching task with MLM and MPM removed? Also, there is no fine-tuning on a downstream task (e.g. image-text matching), so why do you call this pre-training rather than training?

(2) Did you resize the image patches (32×32 → 224×224) for image feature extraction?

(3) Which backbone network did you use for image feature extraction in the released pre-trained model: ResNet-50 or ResNeXt-101?

A bug that needs to be fixed

TypeError: avgloss_logger_hook() takes 4 positional arguments but 5 were given

In easytransfer/engines/model.py, the hook is called with five arguments:
avgloss_hook = avgloss_logger_hook(self.train_steps,
                                   total_loss,
                                   self.model_dir,
                                   self.config.log_step_count_steps,
                                   self.config.task_index)

but easytransfer/utils/hooks.py defines it with only four parameters:
def avgloss_logger_hook(max_steps, loss, model_dir, log_step_count_steps):

I installed EasyTransfer via pip install.
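The traceback is a plain arity mismatch: model.py passes five positional arguments (including self.config.task_index) while hooks.py declares four. The report does not make clear which of the two files is out of date in the pip release. A minimal local workaround is to give the hook an optional trailing task_index parameter so both call styles work. The sketch uses stub bodies; avgloss_logger_hook_old and avgloss_logger_hook_patched are hypothetical names, and the real hook's body is not reproduced here.

```python
def avgloss_logger_hook_old(max_steps, loss, model_dir, log_step_count_steps):
    """Stub with the four-parameter signature quoted from easytransfer/utils/hooks.py."""
    return (max_steps, loss, model_dir, log_step_count_steps)


def avgloss_logger_hook_patched(max_steps, loss, model_dir, log_step_count_steps,
                                task_index=0):
    """Hypothetical patch: an optional trailing task_index accepts both call styles.

    The default of 0 is an assumption, not taken from EasyTransfer.
    """
    return (max_steps, loss, model_dir, log_step_count_steps, task_index)


# Reproduce the reported error: five positional arguments against four parameters.
try:
    avgloss_logger_hook_old(1000, 0.5, "./model_dir", 100, 0)
except TypeError as err:
    print(err)  # avgloss_logger_hook_old() takes 4 positional arguments but 5 were given

# The patched signature accepts both the old four-argument and the new five-argument call.
print(avgloss_logger_hook_patched(1000, 0.5, "./model_dir", 100))
print(avgloss_logger_hook_patched(1000, 0.5, "./model_dir", 100, 0))
```

Keeping the new parameter last with a default leaves every existing four-argument caller working unchanged.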
