santhoshkolloju / abstractive-summarization-with-transfer-learning


Abstractive summarisation using Bert as encoder and Transformer Decoder

Jupyter Notebook 35.88% Python 62.32% Perl 0.69% Makefile 0.32% Batchfile 0.29% Shell 0.49%
bert summarization abstractive-summarization abstractive-text-summarization bert-model transformer transfer-learning nlp nlg

abstractive-summarization-with-transfer-learning's Introduction

Abstractive summarization using BERT as the encoder and a Transformer decoder

I have used a text generation library called Texar. It is a beautiful library with a lot of abstractions; I would describe it as the scikit-learn of text generation problems.

The main idea behind this architecture is to transfer learning from pretrained BERT, a masked language model: the encoder is replaced with the BERT encoder, and the decoder is trained from scratch.

One advantage of using Transformer networks is that training is much faster than with LSTM-based models, since the sequential processing of the input is eliminated.

Transformer-based models also tend to generate more grammatically correct and coherent sentences.

To run the model

wget https://storage.googleapis.com/bert_models/2018_10_18/uncased_L-12_H-768_A-12.zip 
unzip uncased_L-12_H-768_A-12.zip

Place the story and summary files under the data folder with the following names: train_story.txt, train_summ.txt, eval_story.txt, eval_summ.txt. Each story and summary must be on a single line (see the sample text provided).

Step 1: Run preprocessing: python preprocess.py

This creates two tfrecord files under the data folder.

Step 2: Run training: python main.py

Configurations for the model can be changed in the config.py file.

Step 3: Inference. Run the command python inference.py. This starts a Flask server. Use Postman to send a POST request to http://your_ip_address:1118/results with two form parameters: story and summary.
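For reference, the same request can be sent from Python with the requests library. This is a minimal sketch; the host and port come from the step above, and the response handling simply prints whatever the server returns:

    import requests

    # Hypothetical client for the Flask server started by inference.py.
    url = "http://your_ip_address:1118/results"
    payload = {
        "story": "the full article text on a single line ...",
        "summary": "a reference summary, if available ...",
    }
    # Form-encoded fields, matching the Postman form parameters above.
    response = requests.post(url, data=payload)
    print(response.status_code)
    print(response.text)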

abstractive-summarization-with-transfer-learning's People

Contributors

pidugusundeep, santhoshkolloju


abstractive-summarization-with-transfer-learning's Issues

Changes to the readme

The earlier readme had good code showing how to see the verbal output.
It would be really helpful if you could include it again.

Position embedding not added to the BERT encoder input

    # Creates segment embeddings for each type of tokens.
    segment_embedder = tx.modules.WordEmbedder(
        vocab_size=bert_config.type_vocab_size,
        hparams=bert_config.segment_embed)
    segment_embeds = segment_embedder(src_segment_ids)

    input_embeds = word_embeds + segment_embeds

As per the BERT paper, the input embeddings are the sum of the token embedding lookup, the segment embedding, and the position embedding. As we can see in input_embeds = word_embeds + segment_embeds, the position embedding is missing.
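A possible fix, sketched here under the assumption that the config follows the Texar BERT example (i.e. bert_config exposes position_size and position_embed, and batch_size/src_input_ids are available in scope), is to add a PositionEmbedder term:

    import tensorflow as tf
    import texar as tx

    # Sketch only: the hparams names follow the Texar BERT example and are
    # assumptions, not confirmed against this repo's config.
    position_embedder = tx.modules.PositionEmbedder(
        position_size=bert_config.position_size,
        hparams=bert_config.position_embed)

    # One position per input token, up to the sequence length of the batch.
    seq_length = tf.ones([batch_size], tf.int32) * tf.shape(src_input_ids)[1]
    pos_embeds = position_embedder(sequence_length=seq_length)

    # Sum of token, segment, and position embeddings, as in the BERT paper.
    input_embeds = word_embeds + segment_embeds + pos_embeds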

Length of Input Sequence

I observed that with the CNN/Daily Mail dataset, many of the inputs are longer than 512 tokens.
So although I have trained for over 200,000 steps, the BLEU score seems to be around 0.4.

Any ideas on how to overcome this problem?

Can't load save_path when it is None.

I was training the model and it was taking longer than expected, so I killed the process.
However, when I run inference.py, I get an error:
Traceback (most recent call last):
File "inference.py", line 92, in
saver.restore(sess, tf.train.latest_checkpoint(model_dir))
File "C:\Users\AKHIL\Anaconda3\lib\site-packages\tensorflow_core\python\training\saver.py", line 1277, in restore
raise ValueError("Can't load save_path when it is None.")
ValueError: Can't load save_path when it is None.
Can anyone please look into it?
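A workaround, as a minimal sketch assuming model_dir points at the same directory that training writes checkpoints to, is to verify that a checkpoint actually exists before calling restore:

    import tensorflow as tf

    # Guard against restoring when no checkpoint has been written yet.
    ckpt_path = tf.train.latest_checkpoint(model_dir)
    if ckpt_path is None:
        raise RuntimeError(
            "No checkpoint found in %s; let training save at least one "
            "checkpoint before running inference.py." % model_dir)
    saver.restore(sess, ckpt_path)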

Network is not optimised.

The summarization network does not seem to be optimizing: the loss stays too high and does not decrease much.

Eval method seems to be using data from the train dataset

Although I use

    feed_dict = {
        iterator.handle: iterator.get_handle(sess, 'eval'),
        tx.global_mode(): tf.estimator.ModeKeys.EVAL,
    }

in the _eval_epoch method, I observed that it also uses a few examples from the train dataset.
Is this the desired behavior, given that FeedableDataIterator is supposed to iterate through multiple datasets and switch between them?

If so, could you please explain why such behavior is necessary?
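One thing worth ruling out, as a sketch assuming iterator is the same Texar FeedableDataIterator used in the training loop later on this page, is that the eval split was never restarted before the evaluation pass, so the handle may still be positioned mid-stream:

    # Restart the eval split before each evaluation pass so that _eval_epoch
    # iterates over exactly one pass of the eval data.
    iterator.restart_dataset(sess, 'eval')
    feed_dict = {
        iterator.handle: iterator.get_handle(sess, 'eval'),
        tx.global_mode(): tf.estimator.ModeKeys.EVAL,
    }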

AssertionError: model name:bert/encoder/layer_0/ffn/intermediate/bias not exists!

Hi. While executing the file model.py I am getting the following error on line 109.

AssertionError: model name:bert/encoder/layer_0/ffn/intermediate/bias not exists!

I am stuck here. What should I do to remove this error?
Additionally, I added reuse=tf.AUTO_REUSE to tf.variable_scope at line 75 to remove the following error:
ValueError: Variable bert/word_embeddings/w already exists, disallowed. Did you mean to set reuse=True or reuse=tf.AUTO_REUSE in VarScope? Originally defined at:
I mention this in case it has anything to do with the AssertionError.
Please help me solve this issue. I am stuck here.

Requirements.txt

Creating a requirements.txt file might help users for dependencies :)

Facing memory exhausted while running inference

I've partially trained the model, but when I tested it by running inference.py with a static story and summary in the script, TensorFlow raised an insufficient-memory error: tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[512,10,50,512] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
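Two things that often help here, sketched under the assumption that the decoding batch size and beam width are configurable in config.py (the exact option names are not confirmed), are shrinking the decoding workload and letting TensorFlow allocate GPU memory on demand:

    import tensorflow as tf

    # Allocate GPU memory incrementally instead of grabbing it all up front.
    sess_config = tf.ConfigProto()
    sess_config.gpu_options.allow_growth = True

    with tf.Session(config=sess_config) as sess:
        ...  # run inference as before

    # In config.py (option names assumed), reducing the decoding load also helps:
    # beam_width = 1          # greedy decoding instead of a wide beam
    # test_batch_size = 1     # decode one story at a time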

Is there an error inside the _eval_epoch function?


After debugging, I found that this function enters an endless loop: no operation ever signals the end of the data, and the batch fetch never seems to be exhausted. Is there a mistake in my setup?

Looking forward to your reply!
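If the loop is a plain while True around sess.run, one common cause is that the end-of-dataset exception is never caught (or the dataset was built to repeat forever). A minimal sketch of the usual pattern, assuming the eval loop drives the FeedableDataIterator the same way the training code does:

    import tensorflow as tf
    import texar as tx

    # Restart the eval split, then iterate until it is exhausted.
    iterator.restart_dataset(sess, 'eval')
    while True:
        try:
            feed_dict = {
                iterator.handle: iterator.get_handle(sess, 'eval'),
                tx.global_mode(): tf.estimator.ModeKeys.EVAL,
            }
            sess.run(fetches, feed_dict=feed_dict)  # 'fetches' stands for the eval ops
        except tf.errors.OutOfRangeError:
            # Raised once one full pass over the eval data has been consumed.
            break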

ValueError: Unknown hyperparameter: position_embedder_type. Only hyperparameters named 'kwargs' hyperparameters can contain new entries undefined in default hyperparameters.


I cloned the code and ran main.py without any code changes. The error points at model.py, line 90:
encoder = tx.modules.TransformerEncoder(hparams=bert_config.encoder)

What's wrong? What should I do next?
Thanks.

ImportError: cannot import name 'gfile' from 'tensorflow'

Hello i got this error how can i fix it ?

Traceback (most recent call last):
File "preprocess.py", line 5, in <module>
from config import *
File "E:\project\Abstractive-Summarization-With-Transfer-Learning\config.py", line 1, in <module>
import texar as tx
File "texar_repo\texar\__init__.py", line 24, in <module>
from texar.module_base import *
File "texar_repo\texar\module_base.py", line 26, in <module>
from texar.utils.exceptions import TexarError
File "texar_repo\texar\utils\__init__.py", line 31, in <module>
from texar.utils.utils_io import *
File "texar_repo\texar\utils\utils_io.py", line 32, in <module>
from tensorflow import gfile
ImportError: cannot import name 'gfile' from 'tensorflow' (C:\Users\Admin\Anaconda3\lib\site-packages\tensorflow\__init__.py)
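This usually means a TensorFlow 2.x installation: from tensorflow import gfile only works in 1.x. Two usual ways out, sketched here (the exact TensorFlow 1.x version the repo expects is not stated, so the pin below is an assumption):

    # Option 1: run the repo in a TensorFlow 1.x environment, e.g.
    #   pip install "tensorflow-gpu<2.0"
    #
    # Option 2: if you must stay on TF 2.x, patch the import in
    # texar_repo/texar/utils/utils_io.py to use the compat module:
    from tensorflow.compat.v1 import gfile  # replaces: from tensorflow import gfile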

The generated summary is always the same, without any change

Thank you. I used your code on a Chinese dataset, but there is a problem in the prediction phase: the generated summary is always the same, without any change. How can I solve this problem?
央 行 : 加 快 推 进 利 率 市 场 化 建 立 存 款 保 险 制 度 (Reference Abstract)
济 南 公 租 房 首 日 免 费 体 验 (Generated Abstract)

天 津 自 贸 区 等 待 最 后 批 复 将 成 京 津 冀 融 合 实 验 区 (Reference Abstract)
济 南 公 租 房 首 日 免 费 体 验 (Generated Abstract)

韩 媒 : 油 价 每 下 跌 10 % 中 国 gdp 增 长 0 . 15 % (Reference Abstract)
济 南 公 租 房 首 日 免 费 体 验 (Generated Abstract)

Requirements file missing

Can you add a requirements.txt file to this project? I am getting various issues related to incompatible versions of different modules.

decoder embedding

@santhoshkolloju When you train this model, do you use the BERT embedding as the abstract (summary) embedding? That is, do both the article and the abstract go through BERT when training this summarization model?

AttributeError: 'dict' object has no attribute 'src_txt'

Getting a 500 error when using Postman on /results.

/preprocess.py", line 170, in convert_single_example
tokens_a = tokenizer.tokenize(example.src_txt)
AttributeError: 'dict' object has no attribute 'src_txt'

Is this because I'm using python3?
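This looks less like a Python 3 problem and more like the inference path handing a plain dict to code that expects an object with src_txt/tgt_txt attributes. A hedged sketch of a small adapter (the field names src_txt and tgt_txt come from the traceback, the story/summary keys match the README's form parameters, and the wrapper itself is an assumption rather than the repo's own fix):

    from collections import namedtuple

    # Minimal wrapper so dict-based requests satisfy code that uses attribute access.
    Example = namedtuple("Example", ["src_txt", "tgt_txt"])

    def to_example(d):
        """Convert {'story': ..., 'summary': ...} form data into an Example."""
        return Example(src_txt=d["story"], tgt_txt=d.get("summary", ""))

    # tokenizer.tokenize(example.src_txt) then works unchanged.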

Double-check the initialization part

Thanks for sharing such a helpful repo~

I want to double-check with the author about the initialization part.

According to my understanding, Encoder is initialized with pre-trained BERT and Decoder is initialized from scratch.

Getting error module 'texar_repo.examples.bert.utils.model_utils' has no attribute 'transform_bert_to_texar_config'

I am getting a "module has no attribute" error while running:
bert_config = model_utils.transform_bert_to_texar_config(
os.path.join(bert_pretrain_dir, 'bert_config.json'))

AttributeError Traceback (most recent call last)
in ()
bert_config = model_utils.transform_bert_to_texar_config(
os.path.join(bert_pretrain_dir, 'bert_config.json'))

AttributeError: module 'texar_repo.examples.bert.utils.model_utils' has no attribute 'transform_bert_to_texar_config'

Please help

batch size problem

What specific value should be given to test_batch_size? Could anyone suggest one?

The bleu score

@santhoshkolloju Hi, I'm using your code to train on my own data, and I find that the BLEU score in your code is multiplied by 100. I am wondering why. Could you give me some clue on that problem? Thanks.

The Result on CNN and Daily Mail

Hello, thanks for providing the Transformer-based seq2seq model for abstractive text summarization; it helps me a lot.
I ran it on the CNN/Daily Mail dataset and obtained the following results:

1 ROUGE-1 Average_R: 0.40213 (95%-conf.int. 0.39962 - 0.40466)
1 ROUGE-1 Average_P: 0.40580 (95%-conf.int. 0.40310 - 0.40855)
1 ROUGE-1 Average_F: 0.39289 (95%-conf.int. 0.39072 - 0.39516)

1 ROUGE-2 Average_R: 0.17639 (95%-conf.int. 0.17417 - 0.17878)
1 ROUGE-2 Average_P: 0.17982 (95%-conf.int. 0.17756 - 0.18227)
1 ROUGE-2 Average_F: 0.17305 (95%-conf.int. 0.17094 - 0.17527)

1 ROUGE-L Average_R: 0.27810 (95%-conf.int. 0.27581 - 0.28035)
1 ROUGE-L Average_P: 0.27940 (95%-conf.int. 0.27701 - 0.28185)
1 ROUGE-L Average_F: 0.27099 (95%-conf.int. 0.26895 - 0.27300)

ROUGE-1/2/L: 39.29/17.30/27.10

I adopted the default settings but found that the results are far from those reported in previous work. For example (ROUGE-1/2/L):
In "Text Summarization with Pretrained Encoders": TransformerABS - 40.21 / 17.76 / 37.09

In particular, the ROUGE-L result is much worse than the others, so I suspect I made some mistake during training. I trained on 1 GPU for 3 days, for a total of 170k steps with batch size = 32.

Has anyone obtained good results on the CNN/Daily Mail dataset, or does anyone know what might be wrong during training?
Many thanks!

Never ending training

I'm running your code on the CNN/Dailymail dataset.

However, training never ends; it keeps displaying:

Batch #X

with X growing indefinitely. I waited a long time, then killed the process.

But now, when I run the inference code, the produced summary is very bad. Example:

the two - year - year - year - old cate - old cat was found in the animal .

What did I do wrong? Is anyone else in the same situation who managed to fix the code? (@Vibha111094)

While running this block i.e. the last block


    #tx.utils.maybe_create_dir(model_dir)
    #logging_file = os.path.join(model_dir, 'logging.txt')

    model_dir = "gs://bert_summ/models/"  # uncased_L-12_H-768_A-12/bert_model.ckpt
    logging_file = "logging.txt"
    logger = utils.get_logger(logging_file)
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        sess.run(tf.local_variables_initializer())
        sess.run(tf.tables_initializer())

        smry_writer = tf.summary.FileWriter(model_dir, graph=sess.graph)

        if run_mode == 'train_and_evaluate':
            logger.info('Begin running with train_and_evaluate mode')

            if tf.train.latest_checkpoint(model_dir) is not None:
                logger.info('Restore latest checkpoint in %s' % model_dir)
                saver.restore(sess, tf.train.latest_checkpoint(model_dir))

            iterator.initialize_dataset(sess)

            step = 5000
            for epoch in range(max_train_epoch):
                iterator.restart_dataset(sess, 'train')
                step = _train_epoch(sess, epoch, step, smry_writer)

        elif run_mode == 'test':
            logger.info('Begin running with test mode')

            logger.info('Restore latest checkpoint in %s' % model_dir)
            saver.restore(sess, tf.train.latest_checkpoint(model_dir))

            _eval_epoch(sess, 0, mode='test')

        else:
            raise ValueError('Unknown mode: {}'.format(run_mode))

The error I am getting is:

PermissionDeniedError Traceback (most recent call last)
in
10 sess.run(tf.tables_initializer())
11
---> 12 smry_writer = tf.summary.FileWriter(model_dir, graph=sess.graph)
13
14 if run_mode == 'train_and_evaluate':

~/anaconda3/envs/tf-1.8/lib/python3.6/site-packages/tensorflow/python/summary/writer/writer.py in __init__(self, logdir, graph, max_queue, flush_secs, graph_def, filename_suffix)
350
351 event_writer = EventFileWriter(logdir, max_queue, flush_secs,
--> 352 filename_suffix)
353 super(FileWriter, self).__init__(event_writer, graph, graph_def)
354

~/anaconda3/envs/tf-1.8/lib/python3.6/site-packages/tensorflow/python/summary/writer/event_file_writer.py in __init__(self, logdir, max_queue, flush_secs, filename_suffix)
65 self._logdir = logdir
66 if not gfile.IsDirectory(self._logdir):
---> 67 gfile.MakeDirs(self._logdir)
68 self._event_queue = six.moves.queue.Queue(max_queue)
69 self._ev_writer = pywrap_tensorflow.EventsWriter(

~/anaconda3/envs/tf-1.8/lib/python3.6/site-packages/tensorflow/python/lib/io/file_io.py in recursive_create_dir(dirname)
372 """
373 with errors.raise_exception_on_not_ok_status() as status:
--> 374 pywrap_tensorflow.RecursivelyCreateDir(compat.as_bytes(dirname), status)
375
376

~/anaconda3/envs/tf-1.8/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py in __exit__(self, type_arg, value_arg, traceback_arg)
517 None, None,
518 compat.as_text(c_api.TF_Message(self.status.status)),
--> 519 c_api.TF_GetCode(self.status.status))
520 # Delete the underlying status object from memory otherwise it stays alive
521 # as there is a reference to status from this from the traceback due to

PermissionDeniedError: Error executing an HTTP request (HTTP response code 401, error code 0, error message ''), response '{
"error": {
"errors": [
{
"domain": "global",
"reason": "required",
"message": "Anonymous caller does not have storage.objects.get access to bert_summ/models/.",
"locationType": "header",
"location": "Authorization"
}
],
"code": 401,
"message": "Anonymous caller does not have storage.objects.get access to bert_summ/models/."
}
}
'
when reading metadata of gs://bert_summ/models/
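The 401 comes from pointing model_dir at a Google Cloud Storage bucket (gs://bert_summ/models/) that an anonymous caller cannot read. A hedged sketch of the usual fix is simply to use a local directory instead (the path below is arbitrary; alternatively, authenticate against a bucket you own):

    import os

    # Write checkpoints and TensorBoard summaries to a local directory instead of
    # someone else's GCS bucket; create it if it does not exist.
    model_dir = "./models/"
    os.makedirs(model_dir, exist_ok=True)
    logging_file = os.path.join(model_dir, "logging.txt")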

Taking way too long for Training

I am trying to train with 10k documents, with an additional 1k documents for the eval cycle.

Even for this small number of documents, it is projecting around 4 days of training time on a Tesla M60 GPU.

I have changed the config to 10 docs per step, with a maximum of 10,000 steps over 10 epochs. It takes around 34 seconds per step, which works out to around 4 days of training time.

Am I doing something wrong?

train.tf_record not found

Can you provide train.tf_record file?
NotFoundError: Error executing an HTTP request: HTTP response code 404 with body '{ "error": { "errors": [ { "domain": "global", "reason": "notFound", "message": "No such object: bert_summarization/train.tf_record" } ], "code": 404, "message": "No such object: bert_summarization/train.tf_record" } } ' when reading metadata of gs://bert_summarization/train.tf_record [[node IteratorGetNext_1 (defined at texar_repo/texar/data/data/data_iterators.py:401) ]]
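This error indicates the data paths in the config still point at the author's GCS bucket (gs://bert_summarization/). A hedged sketch of the fix; the variable names below are assumptions, so check config.py for the real ones:

    # In config.py (names assumed): point the TFRecord paths at the files
    # produced locally by `python preprocess.py` instead of a gs:// bucket.
    train_out_file = "data/train.tf_record"
    eval_out_file = "data/eval.tf_record"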

Implement NER fine-tuned BERT model

I really like what you've done here.

I have a BERT model fine-tuned for NER and would like to implement it using your architecture here.

My intention is to bypass the fine-tuning section where you use stories and to use my fine-tuned model directly in its place.

Do you have any tips?

got an unexpected keyword argument 'embedding'

In model.py line 114

decoder = tx.tf.modules.TransformerDecoder(embedding=tgt_embedding,
                             hparams=dcoder_config)

I am getting a TypeError saying that an unexpected keyword argument embedding was passed into the TransformerDecoder. How did people resolve this? I see that TransformerDecoder takes (vocab_size=None, output_layer=None, hparams=None), so I'm not sure what the embedding refers to here.

Any guidance would be appreciated.
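In newer Texar releases, TransformerDecoder no longer accepts an embedding argument; instead the output projection is passed in and the embedding lookup happens outside the decoder. A hedged sketch of the adaptation, modeled on Texar's own transformer example (tgt_embedding and dcoder_config come from the snippet above; vocab_size stands for the integer size of the target vocabulary and is an assumption):

    import tensorflow as tf
    import texar as tx

    # Newer Texar API: pass the vocabulary size explicitly and tie the output
    # layer to the (transposed) target embedding matrix.
    decoder = tx.modules.TransformerDecoder(
        vocab_size=vocab_size,
        output_layer=tf.transpose(tgt_embedding, (1, 0)),
        hparams=dcoder_config)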

Setup error

ImportError: DLL load failed while importing _pywrap_tensorflow_internal: The specified module could not be found.

This happens when I try running the preprocess.py file.

ValueError during the init of pretrained BERT

Hello!
I tried your code in a Google Colab notebook and encountered a problem I wasn't able to solve.
During the initialization of the BERT encoder in your ipynb:
https://github.com/santhoshkolloju/Abstractive-Summarization-With-Transfer-Learning/blob/master/BERT_SUMM.ipynb

in cell 15, the following error occurs:

Intializing the Bert Encoder Graph
loading the bert pretrained weights

ValueError Traceback (most recent call last)
in ()
35 init_checkpoint = os.path.join(bert_pretrain_dir, 'bert_model.ckpt')
36 #init_checkpoint = "gs://cloud-tpu-checkpoints/bert/uncased_L-12_H-768_A-12/bert_model.ckpt"
---> 37 model_utils.init_bert_checkpoint(init_checkpoint)

5 frames
/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/checkpoint_utils.py in _init_from_checkpoint(ckpt_dir_or_file, assignment_map)
344 "Assignment map with scope only name {} should map to scope only "
345 "{}. Should be 'scope/': 'other_scope/'.".format(
--> 346 scopes, tensor_name_in_ckpt))
347 # If scope to scope mapping was provided, find all variables in the scope
348 # and create variable to variable mapping.

ValueError: Assignment map with scope only name bert/position_embeddings should map to scope only bert/embeddings/position_embeddings. Should be 'scope/': 'other_scope/'.

I already checked the Texar GitHub repo and found this issue:
asyml/texar#127

Basically, the code for the encoder and decoder changed in the newer version of Texar, but I don't know how to adjust the code.
