
jrc1995 / abstractive-summarization

167 stars · 10 watchers · 59 forks · 338 KB

Implementation of abstractive summarization using LSTM in the encoder-decoder architecture with local attention.

License: MIT

Language: Jupyter Notebook (100.00%)
Topics: abstractive-text-summarization, nlp, deep-learning, encoder-decoder, attention-mechanism, local-attention, tensorflow, lstm

abstractive-summarization's People

Contributors

jrc1995

abstractive-summarization's Issues

Pretrained model

Actually, I was studying the implementation but could not get the pretrained model. I need the Seq2seq_summarization.ckpt file. I downloaded the zip from the repository, but the pretrained model was not in it. Please, I need it.

How to get context?

Please help me understand how the next word to be predicted is selected, and how the context for it is obtained.
Thank you
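
The repository's description says local attention is used, so (as a non-authoritative sketch) the context is plausibly a Luong-style local-attention context vector, and the next word the argmax of the decoder's output distribution. All names below (`local_attention_context`, `encoder_states`, `decoder_state`, `p_t`, `D`) are illustrative, not the repo's actual variables:

```python
import numpy as np

def local_attention_context(encoder_states, decoder_state, p_t, D=2):
    """Sketch of Luong-style local attention.

    encoder_states: (seq_len, hidden) encoder outputs
    decoder_state:  (hidden,) current decoder hidden state
    p_t:            centre position of the attention window
    D:              half-width of the window
    """
    seq_len = encoder_states.shape[0]
    lo, hi = max(0, p_t - D), min(seq_len, p_t + D + 1)
    window = encoder_states[lo:hi]              # at most 2*D + 1 states

    scores = window @ decoder_state             # dot-product alignment
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                    # softmax over the window

    return weights @ window                     # context vector, (hidden,)

# Toy usage: the context is concatenated with the decoder state, projected,
# and softmaxed; the next word is then typically chosen greedily (argmax).
enc = np.random.randn(12, 8)
dec = np.random.randn(8)
ctx = local_attention_context(enc, dec, p_t=6)
```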

Word2vec function and GloVe

The word2vec function is used to find the vector representation of a word using GloVe, right? So what happens if a particular word is not found in GloVe? Is a nearest-neighbor search used to find the closest vector? Is GloVe only used during preprocessing? During training, are we substituting the actual word with its closest match?
Thank you
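
This page doesn't confirm exactly how the repo handles OOV words, but a common pattern matching the question is sketched below: a GloVe lookup during preprocessing, a fallback vector for misses, and a nearest-neighbor search to map vectors back to words. All names are hypothetical:

```python
import numpy as np

def word2vec(word, vocab, embedding):
    """Sketch: look up a GloVe vector, with a fallback for OOV words."""
    if word in vocab:
        return embedding[vocab.index(word)]
    # One possible OOV strategy: a fixed vector (zeros or random) that
    # effectively plays the role of <UNK>.
    return np.zeros(embedding.shape[1])

def nearest_word(vector, vocab, embedding):
    """Sketch: map an arbitrary vector to its closest vocabulary word."""
    dists = np.linalg.norm(embedding - vector, axis=1)  # Euclidean distance
    return vocab[int(np.argmin(dists))]

# Toy usage:
vocab = ["the", "cat", "sat"]
embedding = np.random.randn(3, 50)
print(nearest_word(word2vec("cat", vocab, embedding), vocab, embedding))  # cat
```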

Don't know where to get these files

Please tell me where I can get these files:

vec_summaries
vec_texts
vocab_limit
embd_limit

Thanks! I'm really eager to test this fully...
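
For what it's worth, these look like pickled outputs of the preprocessing notebook rather than files shipped with the repo. Once preprocessing has been run, they would be loaded along these lines (file names from the question; contents assumed):

```python
import pickle

for name in ["vec_summaries", "vec_texts", "vocab_limit", "embd_limit"]:
    with open(name, "rb") as fp:   # assumes preprocessing already wrote it
        obj = pickle.load(fp)
    print(name, type(obj))
```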

vec_summaries_reduced and vec_texts_reduced returning blank list []

When I print the imported summaries from the pickled file, everything is fine: they are fully printed in their encoded form. vec_texts also prints fine. But vec_summaries_reduced and vec_texts_reduced return a blank list, so train_len is printed as 0.
Also, the notebook reports "Percentage of the dataset with text length less than window size: 99.848". Isn't that bad, since 99% of the texts are shorter than the window size and therefore get reduced away? With D = 1 (window size 3) it is 44%, yet still no data comes out of vec_summaries_reduced. Please tell me what to do.
P.S. I don't know if it has any significance, but I had not properly implemented the clean function from your code (it gave some errors), so I only lowercased everything, nothing else.

Update: when I set maxlen_summary to 80 instead of 7, train_len = 14400. But is setting it to 80 wrong?
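
A hypothetical sketch of the kind of length-based reduction step being described (not the repo's exact code): if the local-attention window (2*D + 1) is wider than most texts, or maxlen_summary is very small (e.g. 7), nearly every pair is filtered out and the reduced lists come back empty, which matches the 99.848% figure and train_len = 0 above:

```python
# Dummy stand-ins for the pickled token-id sequences:
vec_texts = [[1] * 40, [2] * 5, [3] * 60]
vec_summaries = [[9] * 6, [8] * 90, [7] * 10]

D = 10
window_size = 2 * D + 1      # texts shorter than this get dropped
maxlen_summary = 80          # raising this from 7 keeps far more pairs

vec_texts_reduced, vec_summaries_reduced = [], []
for text, summary in zip(vec_texts, vec_summaries):
    if len(text) >= window_size and len(summary) <= maxlen_summary:
        vec_texts_reduced.append(text)
        vec_summaries_reduced.append(summary)

train_len = len(vec_texts_reduced)   # 0 means everything was filtered out
print(train_len)
```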

Testing

In your code there is no testing yet, but I see the testing data already exists. How do I implement it? Is it the same as validation?
Can you help me with the steps I need to follow?

ZeroDivisionError

Trying your method on a different dataset, I am getting a ZeroDivisionError in the Training and Validation section. I assume that something is not loading properly, because there should be no zero values.
Here is the code:

```python
import pickle
import random

import tensorflow as tf

# Note: the graph ops (cross_entropy, outputs, accuracy, train_op), the
# placeholders (tf_text, tf_embd, tf_summary, tf_true_summary_len, tf_train),
# epochs, embd, idx2vocab and the batch lists are defined earlier in the
# notebook.

with tf.Session() as sess:  # Start TensorFlow session
    display_step = 100
    patience = 5

    load = input("\nLoad checkpoint? y/n: ")
    print("")
    saver = tf.train.Saver()

    if load.lower() == 'y':

        print('Loading pre-trained weights for the model...')

        saver.restore(sess, r'C:\Users\james\Desktop\Title Generation - SENG 6245\Dataset250K.csv')
        sess.run(tf.global_variables())
        sess.run(tf.tables_initializer())

        with open(r'C:\Users\james\Desktop\Title Generation - SENG 6245\Dataset250K.csv', 'rb') as fp:
            train_data = pickle.load(fp)

        covered_epochs = train_data['covered_epochs']
        best_loss = train_data['best_loss']
        impatience = 0

        print('\nRESTORATION COMPLETE\n')

    else:
        best_loss = 2**30
        impatience = 0
        covered_epochs = 0

        init = tf.global_variables_initializer()
        sess.run(init)
        sess.run(tf.tables_initializer())

    epoch = 0
    while (epoch + covered_epochs) < epochs:

        print("\n\nSTARTING TRAINING\n\n")

        batches_indices = [i for i in range(0, len(train_batches_text))]
        random.shuffle(batches_indices)

        total_train_acc = 0
        total_train_loss = 0

        for i in range(0, len(train_batches_text)):

            j = int(batches_indices[i])

            cost, prediction, \
                acc, _ = sess.run([cross_entropy,
                                   outputs,
                                   accuracy,
                                   train_op],
                                  feed_dict={tf_text: train_batches_text[j],
                                             tf_embd: embd,
                                             tf_summary: train_batches_summary[j],
                                             tf_true_summary_len: train_batches_true_summary_len[j],
                                             tf_train: True})

            total_train_acc += acc
            total_train_loss += cost

            if i % display_step == 0:
                print("Iter " + str(i) + ", Cost= " +
                      "{:.3f}".format(cost) + ", Acc = " +
                      "{:.2f}%".format(acc * 100))

            if i % 500 == 0:

                idx = random.randint(0, len(train_batches_text[j]) - 1)

                text = " ".join([idx2vocab.get(vec, "<UNK>") for vec in train_batches_text[j][idx]])
                predicted_summary = [idx2vocab.get(vec, "<UNK>") for vec in prediction[idx]]
                actual_summary = [idx2vocab.get(vec, "<UNK>") for vec in train_batches_summary[j][idx]]

                print("\nSample Text\n")
                print(text)
                print("\nSample Predicted Summary\n")
                for word in predicted_summary:
                    if word == '<EOS>':
                        break
                    else:
                        print(word, end=" ")
                print("\n\nSample Actual Summary\n")
                for word in actual_summary:
                    if word == '<EOS>':
                        break
                    else:
                        print(word, end=" ")
                print("\n\n")

        print("\n\nSTARTING VALIDATION\n\n")

        total_val_loss = 0
        total_val_acc = 0

        for i in range(0, len(val_batches_text)):

            if i % 100 == 0:
                print("Validating data # {}".format(i))

            cost, prediction, \
                acc = sess.run([cross_entropy,
                                outputs,
                                accuracy],
                               feed_dict={tf_text: val_batches_text[i],
                                          tf_embd: embd,
                                          tf_summary: val_batches_summary[i],
                                          tf_true_summary_len: val_batches_true_summary_len[i],
                                          tf_train: False})

            total_val_loss += cost
            total_val_acc += acc

        # The issue starts here
        try:
            avg_val_loss = total_val_loss / len(val_batches_text)
        except ZeroDivisionError:
            avg_val_loss = 0

        print("\n\nEpoch: {}\n\n".format(epoch + covered_epochs))
        print("Average Training Loss: {:.3f}".format(total_train_loss / len(train_batches_text)))
        print("Average Training Accuracy: {:.2f}".format(100 * total_train_acc / len(train_batches_text)))
        print("Average Validation Loss: {:.3f}".format(avg_val_loss))
        print("Average Validation Accuracy: {:.2f}".format(100 * total_val_acc / len(val_batches_text)))

        if avg_val_loss < best_loss:
            best_loss = avg_val_loss
            save_data = {'best_loss': best_loss, 'covered_epochs': covered_epochs + epoch + 1}
            impatience = 0
            with open('Model_Backup/Seq2seq_summarization.pkl', 'wb') as fp:
                pickle.dump(save_data, fp)
            saver.save(sess, 'Model_Backup/Seq2seq_summarization.ckpt')
            print("\nModel saved\n")

        else:
            impatience += 1

        if impatience > patience:
            break

        epoch += 1
```

I can get rid of the error with exception handling, but I was wondering if you had an idea of why it's not working in the first place.
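
For reference, the only division inside the try is total_val_loss / len(val_batches_text), so the exception means the validation batch list is empty, i.e. the filtering/batching step upstream dropped every example. A minimal pre-flight check, assuming the notebook's batch lists are in scope:

```python
# Note: the validation-accuracy print below the try/except divides by
# len(val_batches_text) again, unguarded, so it would fail the same way.
assert len(train_batches_text) > 0, "no training batches were created"
assert len(val_batches_text) > 0, "no validation batches were created"
```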

Where does the abstraction come from?

Hi, I looked at the Amazon Fine Food Reviews data. It doesn't seem that the review articles are accompanied by summaries. Are they?

Is the abstractive summary part based on any paper, such as Abigail See's pointer-generator? I only see references to LSTM papers.

There is no mention of how you dealt with OOV words. Can you explain how you handled them?

ValueError: The passed save_path is not a valid checkpoint:

I'm trying to run this in Google Colab (GPU runtime) with TensorFlow 2.0. It shows the error at the following line:

saver.restore(sess, "/content/drive/My Drive/Colab Notebooks/Data/release/Model_Backup/Seq2seq_summarization.ckpt")

Here is the code:
```python
with tf.Session() as sess:  # Start TensorFlow session
    display_step = 100
    patience = 5

    load = input("\nLoad checkpoint? y/n: ")
    print("")
    saver = tf.train.Saver(tf.global_variables())

    if load.lower() == 'y':

        print('Loading pre-trained weights for the model...')
        saver.restore(sess, "/content/drive/My Drive/Colab Notebooks/Data/release/Model_Backup/Seq2seq_summarization.ckpt")
        sess.run(tf.global_variables())
        sess.run(tf.tables_initializer())
```
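
One likely complication: tf.Session and tf.train.Saver are TensorFlow 1.x APIs, so on Colab with TensorFlow 2.0 this graph-mode code needs the compatibility layer (or a pinned TF 1.x install). A minimal sketch:

```python
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()   # restores Sessions, placeholders, v1 Savers

# saver.restore also expects the checkpoint *prefix* written by saver.save
# ("...Seq2seq_summarization.ckpt"), with its .index and .data-* files
# alongside; "not a valid checkpoint" usually means those files are
# missing or the prefix/path is wrong.
```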

ValueError: setting an array element with a sequence

I am trying to run the code on another dataset following the instructions in the README, but I get 'ValueError: setting an array element with a sequence' when I try to train.

Do you have any idea where the problem could be?
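
This ValueError typically appears when a ragged batch (sequences of unequal length) is handed to NumPy or a TensorFlow feed. A minimal reproduction and the usual fix, padding every sequence in a batch to one length:

```python
import numpy as np

ragged = [[1, 2, 3], [4, 5]]           # sequences of unequal length
try:
    np.array(ragged, dtype=np.int32)   # what building the feed array does
except ValueError as e:
    print(e)                           # setting an array element with a sequence

maxlen = max(len(s) for s in ragged)
padded = np.array([s + [0] * (maxlen - len(s)) for s in ragged],
                  dtype=np.int32)
print(padded)                          # [[1 2 3] [4 5 0]]
```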

Using the model, Complete Training and Optimize Hyperparameters

First, I want to thank you for your work.
Reading the future work to be done I got a lil bit confused: Complete Training and Optimize Hyperparameters isn't training already done? am seeing that U have already used the validation dataset but as a training dataset. Can U tell a lil bit more about the last part of the code "Training and Validation".

I want to try the model on a new article, however am new to tensorflow. I didn't get how to use the model generated on a new input. I used a usual command line model.predict (the one used for sklearn for example) but didn't get a result. Can U lead me to the right command to use to model files generated?
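
Since this is a TF1 graph model, there is no model.predict; inference is a sess.run of the output op with tf_train: False, mirroring the validation loop in the issue above. A hedged sketch, assuming the notebook's graph, placeholders, embd and idx2vocab are in scope and that `batch_text` is a padded batch of token ids for the new article (depending on the graph, tf_summary and tf_true_summary_len may also need dummy values in feed_dict):

```python
with tf.Session() as sess:
    saver = tf.train.Saver()
    saver.restore(sess, 'Model_Backup/Seq2seq_summarization.ckpt')
    sess.run(tf.tables_initializer())

    prediction = sess.run(outputs,
                          feed_dict={tf_text: batch_text,
                                     tf_embd: embd,
                                     tf_train: False})

    # Map the first sample's predicted ids back to words.
    summary_words = [idx2vocab.get(idx, "<UNK>") for idx in prediction[0]]
    print(" ".join(summary_words))
```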
