santhoshkolloju / abstractive-summarization-with-transfer-learning


Abstractive summarisation using Bert as encoder and Transformer Decoder

Jupyter Notebook 35.88% Python 62.32% Perl 0.69% Makefile 0.32% Batchfile 0.29% Shell 0.49%
bert summarization abstractive-summarization abstractive-text-summarization bert-model transformer transfer-learning nlp nlg

abstractive-summarization-with-transfer-learning's Introduction

Abstractive summarization using BERT as the encoder and a Transformer decoder

I have used a text generation library called Texar. It is a beautiful library with a lot of abstractions; I would describe it as the scikit-learn of text generation problems.

The main idea behind this architecture is to transfer learning from pretrained BERT, a masked language model: the encoder is replaced with the BERT encoder, and the decoder is trained from scratch.

One advantage of using Transformer networks is that training is much faster than with LSTM-based models, since the sequential processing of the input is eliminated.

Transformer-based models also tend to generate more grammatically correct and coherent sentences.

To run the model

wget https://storage.googleapis.com/bert_models/2018_10_18/uncased_L-12_H-768_A-12.zip 
unzip uncased_L-12_H-768_A-12.zip

Place the story and summary files under the data folder with the following names: train_story.txt, train_summ.txt, eval_story.txt, eval_summ.txt. Each story and summary must be on a single line (see the sample text provided).

Step 1: Run preprocessing: python preprocess.py

This creates two tfrecord files under the data folder.

Step 2: Run training: python main.py

Configurations for the model can be changed in the config.py file.

Step 3: Inference. Run the command python inference.py. This starts a Flask server. Use Postman to send a POST request to http://your_ip_address:1118/results with two form parameters: story and summary.
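For reference, the same request can be sent from Python with the requests library. This is a minimal sketch; the host and port come from the step above, and the response handling simply prints whatever the server returns:

    import requests

    # Hypothetical client for the Flask server started by inference.py.
    url = "http://your_ip_address:1118/results"
    payload = {
        "story": "the full article text on a single line ...",
        "summary": "a reference summary, if available ...",
    }
    # Form-encoded fields, matching the Postman form parameters above.
    response = requests.post(url, data=payload)
    print(response.status_code)
    print(response.text)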

abstractive-summarization-with-transfer-learning's People

Contributors

pidugusundeep, santhoshkolloju


abstractive-summarization-with-transfer-learning's Issues

Changes to the readme

The earlier readme had good code showing how to see the verbal output.
It would be really helpful if you could include it again.

Position embedding not added to the BERT encoder input

    # Creates segment embeddings for each type of tokens.
    segment_embedder = tx.modules.WordEmbedder(
        vocab_size=bert_config.type_vocab_size,
        hparams=bert_config.segment_embed)
    segment_embeds = segment_embedder(src_segment_ids)

    input_embeds = word_embeds + segment_embeds

As per the BERT paper, the input embeddings are the sum of the token embedding lookup, the segment embedding, and the position embedding. As we can see in input_embeds = word_embeds + segment_embeds, the position embedding is missing.
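A possible fix, sketched here under the assumption that the config follows the Texar BERT example (i.e. bert_config exposes position_size and position_embed, and batch_size/src_input_ids are available in scope), is to add a PositionEmbedder term:

    import tensorflow as tf
    import texar as tx

    # Sketch only: the hparams names follow the Texar BERT example and are
    # assumptions, not confirmed against this repo's config.
    position_embedder = tx.modules.PositionEmbedder(
        position_size=bert_config.position_size,
        hparams=bert_config.position_embed)

    # One position per input token, up to the sequence length of the batch.
    seq_length = tf.ones([batch_size], tf.int32) * tf.shape(src_input_ids)[1]
    pos_embeds = position_embedder(sequence_length=seq_length)

    # Sum of token, segment, and position embeddings, as in the BERT paper.
    input_embeds = word_embeds + segment_embeds + pos_embeds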

Length of Input Sequence

I observed that with the CNN/Daily Mail dataset, many of the inputs are longer than 512 tokens.
So although I have trained for over 200,000 steps, the BLEU score seems to be around 0.4.

Any ideas on how to overcome this problem?

Can't load save_path when it is None.

I was training the model and it was taking longer than expected, so I killed the process.
However, when I run inference.py, I get an error:
Traceback (most recent call last):
File "inference.py", line 92, in
saver.restore(sess, tf.train.latest_checkpoint(model_dir))
File "C:\Users\AKHIL\Anaconda3\lib\site-packages\tensorflow_core\python\training\saver.py", line 1277, in restore
raise ValueError("Can't load save_path when it is None.")
ValueError: Can't load save_path when it is None.
Can anyone please look into it?
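A workaround, as a minimal sketch assuming model_dir points at the same directory that training writes checkpoints to, is to verify that a checkpoint actually exists before calling restore:

    import tensorflow as tf

    # Guard against restoring when no checkpoint has been written yet.
    ckpt_path = tf.train.latest_checkpoint(model_dir)
    if ckpt_path is None:
        raise RuntimeError(
            "No checkpoint found in %s; let training save at least one "
            "checkpoint before running inference.py." % model_dir)
    saver.restore(sess, ckpt_path)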

Network is not optimised.

The summarization network does not seem to be optimizing: the loss stays too high and does not decrease much.

Eval method seems to be using data from the train dataset

Although I use

    feed_dict = {
        iterator.handle: iterator.get_handle(sess, 'eval'),
        tx.global_mode(): tf.estimator.ModeKeys.EVAL,
    }

in the _eval_epoch method, I observed that it also uses a few examples from the train dataset.
Is this the desired behavior, given that FeedableDataIterator is supposed to iterate through multiple datasets and switch between them?

If so, could you please explain why such behavior is necessary?
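One thing worth ruling out, as a sketch assuming iterator is the same Texar FeedableDataIterator used in the training loop later on this page, is that the eval split was never restarted before the evaluation pass, so the handle may still be positioned mid-stream:

    # Restart the eval split before each evaluation pass so that _eval_epoch
    # iterates over exactly one pass of the eval data.
    iterator.restart_dataset(sess, 'eval')
    feed_dict = {
        iterator.handle: iterator.get_handle(sess, 'eval'),
        tx.global_mode(): tf.estimator.ModeKeys.EVAL,
    }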

AssertionError: model name:bert/encoder/layer_0/ffn/intermediate/bias not exists!

Hi. While executing the file model.py I am getting the following error on line 109.

AssertionError: model name:bert/encoder/layer_0/ffn/intermediate/bias not exists!

I am stuck here. What should I do to remove this error?
Additionally, I added reuse=tf.AUTO_REUSE to tf.variable_scope at line 75 to remove the following error:
ValueError: Variable bert/word_embeddings/w already exists, disallowed. Did you mean to set reuse=True or reuse=tf.AUTO_REUSE in VarScope? Originally defined at:
I mention this in case it has anything to do with the AssertionError.
Please help me solve this issue. I am stuck here.

Requirements.txt

Creating a requirements.txt file might help users for dependencies :)

Facing memory exhausted while running inference

I've partially trained the model, but when I tested it by running inference.py with a static story and summary in the script, TensorFlow raised an insufficient-memory error: tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[512,10,50,512] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
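Two things that often help here, sketched under the assumption that the decoding batch size and beam width are configurable in config.py (the exact option names are not confirmed), are shrinking the decoding workload and letting TensorFlow allocate GPU memory on demand:

    import tensorflow as tf

    # Allocate GPU memory incrementally instead of grabbing it all up front.
    sess_config = tf.ConfigProto()
    sess_config.gpu_options.allow_growth = True

    with tf.Session(config=sess_config) as sess:
        ...  # run inference as before

    # In config.py (option names assumed), reducing the decoding load also helps:
    # beam_width = 1          # greedy decoding instead of a wide beam
    # test_batch_size = 1     # decode one story at a time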

Is there an error inside the _eval_epoch function?


After debugging, I found that this function enters an endless loop: no operation ever signals the end of the data, and the batch fetch never seems to be exhausted. Is there a mistake in my setup?

Looking forward to your reply!
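If the loop is a plain while True around sess.run, one common cause is that the end-of-dataset exception is never caught (or the dataset was built to repeat forever). A minimal sketch of the usual pattern, assuming the eval loop drives the FeedableDataIterator the same way the training code does:

    import tensorflow as tf
    import texar as tx

    # Restart the eval split, then iterate until it is exhausted.
    iterator.restart_dataset(sess, 'eval')
    while True:
        try:
            feed_dict = {
                iterator.handle: iterator.get_handle(sess, 'eval'),
                tx.global_mode(): tf.estimator.ModeKeys.EVAL,
            }
            sess.run(fetches, feed_dict=feed_dict)  # 'fetches' stands for the eval ops
        except tf.errors.OutOfRangeError:
            # Raised once one full pass over the eval data has been consumed.
            break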

ValueError: Unknown hyperparameter: position_embedder_type. Only hyperparameters named 'kwargs' hyperparameters can contain new entries undefined in default hyperparameters.


I cloned the code and ran main.py without any code changes. The error points at model.py, line 90:
encoder = tx.modules.TransformerEncoder(hparams=bert_config.encoder)

What's wrong? What should I do next?
Thanks.

ImportError: cannot import name 'gfile' from 'tensorflow'

Hello i got this error how can i fix it ?

Traceback (most recent call last):
File "preprocess.py", line 5, in <module>
from config import *
File "E:\project\Abstractive-Summarization-With-Transfer-Learning\config.py", line 1, in <module>
import texar as tx
File "texar_repo\texar\__init__.py", line 24, in <module>
from texar.module_base import *
File "texar_repo\texar\module_base.py", line 26, in <module>
from texar.utils.exceptions import TexarError
File "texar_repo\texar\utils\__init__.py", line 31, in <module>
from texar.utils.utils_io import *
File "texar_repo\texar\utils\utils_io.py", line 32, in <module>
from tensorflow import gfile
ImportError: cannot import name 'gfile' from 'tensorflow' (C:\Users\Admin\Anaconda3\lib\site-packages\tensorflow\__init__.py)
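This usually means a TensorFlow 2.x installation: from tensorflow import gfile only works in 1.x. Two usual ways out, sketched here (the exact TensorFlow 1.x version the repo expects is not stated, so the pin below is an assumption):

    # Option 1: run the repo in a TensorFlow 1.x environment, e.g.
    #   pip install "tensorflow-gpu<2.0"
    #
    # Option 2: if you must stay on TF 2.x, patch the import in
    # texar_repo/texar/utils/utils_io.py to use the compat module:
    from tensorflow.compat.v1 import gfile  # replaces: from tensorflow import gfile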

The generated summary is always the same, without any change

Thank you. I used your code on a Chinese dataset, but there is a problem in the prediction phase: the generated summary is always the same, without any change. How can I solve this problem?
央 行 : 加 快 推 进 利 率 市 场 化 建 立 存 款 保 险 制 度 (Reference Abstract)
济 南 公 租 房 首 日 免 费 体 验 (Generated Abstract)

天 津 自 贸 区 等 待 最 后 批 复 将 成 京 津 冀 融 合 实 验 区 (Reference Abstract)
济 南 公 租 房 首 日 免 费 体 验 (Generated Abstract)

韩 媒 : 油 价 每 下 跌 10 % 中 国 gdp 增 长 0 . 15 % (Reference Abstract)
济 南 公 租 房 首 日 免 费 体 验 (Generated Abstract)

Requirements file missing

Can you add a requirements.txt file to this project? I am getting various issues related to incompatible versions of different modules.

decoder embedding

@santhoshkolloju When you train this model, do you use the BERT embedding as the abstract (summary) embedding? That is, do both the article and the abstract go through BERT when training this summarization model?

AttributeError: 'dict' object has no attribute 'src_txt'

Getting a 500 error when using Postman on /results.

/preprocess.py", line 170, in convert_single_example
tokens_a = tokenizer.tokenize(example.src_txt)
AttributeError: 'dict' object has no attribute 'src_txt'

Is this because I'm using python3?
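This looks less like a Python 3 problem and more like the inference path handing a plain dict to code that expects an object with src_txt/tgt_txt attributes. A hedged sketch of a small adapter (the field names src_txt and tgt_txt come from the traceback, the story/summary keys match the README's form parameters, and the wrapper itself is an assumption rather than the repo's own fix):

    from collections import namedtuple

    # Minimal wrapper so dict-based requests satisfy code that uses attribute access.
    Example = namedtuple("Example", ["src_txt", "tgt_txt"])

    def to_example(d):
        """Convert {'story': ..., 'summary': ...} form data into an Example."""
        return Example(src_txt=d["story"], tgt_txt=d.get("summary", ""))

    # tokenizer.tokenize(example.src_txt) then works unchanged.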

Double-check the initialization part

Thanks for sharing such a helpful repo~

I want to double-check with the author about the initialization part.

According to my understanding, Encoder is initialized with pre-trained BERT and Decoder is initialized from scratch.

Getting error module 'texar_repo.examples.bert.utils.model_utils' has no attribute 'transform_bert_to_texar_config'

I am getting a "module has no attribute" error while running:
bert_config = model_utils.transform_bert_to_texar_config(
os.path.join(bert_pretrain_dir, 'bert_config.json'))

AttributeError Traceback (most recent call last)
in ()
bert_config = model_utils.transform_bert_to_texar_config(
os.path.join(bert_pretrain_dir, 'bert_config.json'))

AttributeError: module 'texar_repo.examples.bert.utils.model_utils' has no attribute 'transform_bert_to_texar_config'

Please help

batch size problem

What specific value should be given to test_batch_size? Could anyone suggest one?

The bleu score

@santhoshkolloju Hi, I'm using your code to train on my own data, and I find that the BLEU score in your code is multiplied by 100. I am wondering why. Could you give me some clue on that problem? Thanks.

The Result on CNN and Daily Mail

Hello, thanks for providing the Transformer-based seq2seq model for abstractive text summarization; it helps me a lot.
I ran it on the CNN/Daily Mail dataset and obtained the following results:

1 ROUGE-1 Average_R: 0.40213 (95%-conf.int. 0.39962 - 0.40466)
1 ROUGE-1 Average_P: 0.40580 (95%-conf.int. 0.40310 - 0.40855)
1 ROUGE-1 Average_F: 0.39289 (95%-conf.int. 0.39072 - 0.39516)

1 ROUGE-2 Average_R: 0.17639 (95%-conf.int. 0.17417 - 0.17878)
1 ROUGE-2 Average_P: 0.17982 (95%-conf.int. 0.17756 - 0.18227)
1 ROUGE-2 Average_F: 0.17305 (95%-conf.int. 0.17094 - 0.17527)

1 ROUGE-L Average_R: 0.27810 (95%-conf.int. 0.27581 - 0.28035)
1 ROUGE-L Average_P: 0.27940 (95%-conf.int. 0.27701 - 0.28185)
1 ROUGE-L Average_F: 0.27099 (95%-conf.int. 0.26895 - 0.27300)

ROUGE-1/2/L: 39.29/17.30/27.10

I adopted the default settings but found that the results are far from those reported in previous work. For example (ROUGE-1/2/L):
In "Text Summarization with Pretrained Encoders": TransformerABS - 40.21 / 17.76 / 37.09

In particular, the ROUGE-L result is much worse than the others, so I suspect I made some mistake during training. I trained on 1 GPU for 3 days, for a total of 170k steps with batch size = 32.

Has anyone obtained good results on the CNN/Daily Mail dataset, or does anyone know what might be wrong during training?
Many thanks!

Never ending training

I'm running your code on the CNN/Dailymail dataset.

However, training never ends; it keeps displaying:

Batch #X

with X growing indefinitely. I waited a long time, then killed the process.

But now, when I run the inference code, the produced summary is very bad. Example:

the two - year - year - year - old cate - old cat was found in the animal .

What did I do wrong? Is anyone else in the same situation who managed to fix the code? (@Vibha111094)

While running this block i.e. the last block


    #tx.utils.maybe_create_dir(model_dir)
    #logging_file = os.path.join(model_dir, 'logging.txt')

    model_dir = "gs://bert_summ/models/"  # uncased_L-12_H-768_A-12/bert_model.ckpt
    logging_file = "logging.txt"
    logger = utils.get_logger(logging_file)
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        sess.run(tf.local_variables_initializer())
        sess.run(tf.tables_initializer())

        smry_writer = tf.summary.FileWriter(model_dir, graph=sess.graph)

        if run_mode == 'train_and_evaluate':
            logger.info('Begin running with train_and_evaluate mode')

            if tf.train.latest_checkpoint(model_dir) is not None:
                logger.info('Restore latest checkpoint in %s' % model_dir)
                saver.restore(sess, tf.train.latest_checkpoint(model_dir))

            iterator.initialize_dataset(sess)

            step = 5000
            for epoch in range(max_train_epoch):
                iterator.restart_dataset(sess, 'train')
                step = _train_epoch(sess, epoch, step, smry_writer)

        elif run_mode == 'test':
            logger.info('Begin running with test mode')

            logger.info('Restore latest checkpoint in %s' % model_dir)
            saver.restore(sess, tf.train.latest_checkpoint(model_dir))

            _eval_epoch(sess, 0, mode='test')

        else:
            raise ValueError('Unknown mode: {}'.format(run_mode))

The error I am getting is:

PermissionDeniedError Traceback (most recent call last)
in
10 sess.run(tf.tables_initializer())
11
---> 12 smry_writer = tf.summary.FileWriter(model_dir, graph=sess.graph)
13
14 if run_mode == 'train_and_evaluate':

~/anaconda3/envs/tf-1.8/lib/python3.6/site-packages/tensorflow/python/summary/writer/writer.py in __init__(self, logdir, graph, max_queue, flush_secs, graph_def, filename_suffix)
350
351 event_writer = EventFileWriter(logdir, max_queue, flush_secs,
--> 352 filename_suffix)
353 super(FileWriter, self).__init__(event_writer, graph, graph_def)
354

~/anaconda3/envs/tf-1.8/lib/python3.6/site-packages/tensorflow/python/summary/writer/event_file_writer.py in __init__(self, logdir, max_queue, flush_secs, filename_suffix)
65 self._logdir = logdir
66 if not gfile.IsDirectory(self._logdir):
---> 67 gfile.MakeDirs(self._logdir)
68 self._event_queue = six.moves.queue.Queue(max_queue)
69 self._ev_writer = pywrap_tensorflow.EventsWriter(

~/anaconda3/envs/tf-1.8/lib/python3.6/site-packages/tensorflow/python/lib/io/file_io.py in recursive_create_dir(dirname)
372 """
373 with errors.raise_exception_on_not_ok_status() as status:
--> 374 pywrap_tensorflow.RecursivelyCreateDir(compat.as_bytes(dirname), status)
375
376

~/anaconda3/envs/tf-1.8/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py in __exit__(self, type_arg, value_arg, traceback_arg)
517 None, None,
518 compat.as_text(c_api.TF_Message(self.status.status)),
--> 519 c_api.TF_GetCode(self.status.status))
520 # Delete the underlying status object from memory otherwise it stays alive
521 # as there is a reference to status from this from the traceback due to

PermissionDeniedError: Error executing an HTTP request (HTTP response code 401, error code 0, error message ''), response '{
"error": {
"errors": [
{
"domain": "global",
"reason": "required",
"message": "Anonymous caller does not have storage.objects.get access to bert_summ/models/.",
"locationType": "header",
"location": "Authorization"
}
],
"code": 401,
"message": "Anonymous caller does not have storage.objects.get access to bert_summ/models/."
}
}
'
when reading metadata of gs://bert_summ/models/
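The 401 comes from pointing model_dir at a Google Cloud Storage bucket (gs://bert_summ/models/) that an anonymous caller cannot read. A hedged sketch of the usual fix is simply to use a local directory instead (the path below is arbitrary; alternatively, authenticate against a bucket you own):

    import os

    # Write checkpoints and TensorBoard summaries to a local directory instead of
    # someone else's GCS bucket; create it if it does not exist.
    model_dir = "./models/"
    os.makedirs(model_dir, exist_ok=True)
    logging_file = os.path.join(model_dir, "logging.txt")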

Taking way too long for Training

I am trying to train with 10k documents, with an additional 1k documents for the eval cycle.

Even for this small number of documents, it is projecting around 4 days of training time on a Tesla M60 GPU.

I have changed the config to 10 docs per step, with a maximum of 10,000 steps over 10 epochs. It takes around 34 seconds per step, which works out to around 4 days of training time.

Am I doing something wrong?

train.tf_record not found

Can you provide train.tf_record file?
NotFoundError: Error executing an HTTP request: HTTP response code 404 with body '{ "error": { "errors": [ { "domain": "global", "reason": "notFound", "message": "No such object: bert_summarization/train.tf_record" } ], "code": 404, "message": "No such object: bert_summarization/train.tf_record" } } ' when reading metadata of gs://bert_summarization/train.tf_record [[node IteratorGetNext_1 (defined at texar_repo/texar/data/data/data_iterators.py:401) ]]
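This error indicates the data paths in the config still point at the author's GCS bucket (gs://bert_summarization/). A hedged sketch of the fix; the variable names below are assumptions, so check config.py for the real ones:

    # In config.py (names assumed): point the TFRecord paths at the files
    # produced locally by `python preprocess.py` instead of a gs:// bucket.
    train_out_file = "data/train.tf_record"
    eval_out_file = "data/eval.tf_record"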

Implement NER fine-tuned BERT model

I really like what you've done here.

I have a BERT model fine-tuned for NER and would like to implement it using your architecture here.

My intention is to bypass the fine-tuning section where you use stories and to use my fine-tuned model directly in its place.

Do you have any tips?

got an unexpected keyword argument 'embedding'

In model.py line 114

decoder = tx.tf.modules.TransformerDecoder(embedding=tgt_embedding,
                             hparams=dcoder_config)

I am getting a TypeError saying that an unexpected keyword argument embedding was passed into the TransformerDecoder. How did people resolve this? I see that TransformerDecoder takes (vocab_size=None, output_layer=None, hparams=None), so I'm not sure what the embedding refers to here.

Any guidance would be appreciated.
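In newer Texar releases, TransformerDecoder no longer accepts an embedding argument; instead the output projection is passed in and the embedding lookup happens outside the decoder. A hedged sketch of the adaptation, modeled on Texar's own transformer example (tgt_embedding and dcoder_config come from the snippet above; vocab_size stands for the integer size of the target vocabulary and is an assumption):

    import tensorflow as tf
    import texar as tx

    # Newer Texar API: pass the vocabulary size explicitly and tie the output
    # layer to the (transposed) target embedding matrix.
    decoder = tx.modules.TransformerDecoder(
        vocab_size=vocab_size,
        output_layer=tf.transpose(tgt_embedding, (1, 0)),
        hparams=dcoder_config)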

Setup error

ImportError: DLL load failed while importing _pywrap_tensorflow_internal: The specified module could not be found.

This happens when I try running the preprocess.py file.

ValueError during the init of pretrained BERT

Hello!
I tried your code in a Google Colab notebook and encountered a problem I wasn't able to solve.
During the initialization of the BERT encoder in your ipynb:
https://github.com/santhoshkolloju/Abstractive-Summarization-With-Transfer-Learning/blob/master/BERT_SUMM.ipynb

in cell 15, the following error occurs:

Intializing the Bert Encoder Graph
loading the bert pretrained weights

ValueError Traceback (most recent call last)
in ()
35 init_checkpoint = os.path.join(bert_pretrain_dir, 'bert_model.ckpt')
36 #init_checkpoint = "gs://cloud-tpu-checkpoints/bert/uncased_L-12_H-768_A-12/bert_model.ckpt"
---> 37 model_utils.init_bert_checkpoint(init_checkpoint)

5 frames
/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/checkpoint_utils.py in _init_from_checkpoint(ckpt_dir_or_file, assignment_map)
344 "Assignment map with scope only name {} should map to scope only "
345 "{}. Should be 'scope/': 'other_scope/'.".format(
--> 346 scopes, tensor_name_in_ckpt))
347 # If scope to scope mapping was provided, find all variables in the scope
348 # and create variable to variable mapping.

ValueError: Assignment map with scope only name bert/position_embeddings should map to scope only bert/embeddings/position_embeddings. Should be 'scope/': 'other_scope/'.

I already checked the Texar GitHub repo and found this issue:
asyml/texar#127

Basically, the code for the encoder and decoder changed in the newer version of Texar, but I don't know how to adjust the code.
