This project compares the KL-Sum, TF-IDF, LexRank, and LSA algorithms for text summarization.
All of these algorithms employ an extractive summarization methodology: important sentences are selected from the original document and concatenated to form a summary.
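To illustrate the extractive approach in general terms, here is a minimal sketch of a word-frequency-based sentence scorer. This is an illustrative toy, not the scoring used by KL-Sum, TF-IDF, LexRank, or LSA; each of those uses its own, more sophisticated sentence-ranking criterion.

```python
import re
from collections import Counter

def extractive_summary(text, num_sentences=2):
    """Toy extractive summarizer: score each sentence by the total
    corpus frequency of its words, then keep the top-scoring
    sentences in their original order."""
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    freq = Counter(re.findall(r'\w+', text.lower()))
    scored = [(sum(freq[w] for w in re.findall(r'\w+', s.lower())), i, s)
              for i, s in enumerate(sentences)]
    top = sorted(scored, reverse=True)[:num_sentences]
    # Re-sort the selected sentences by position and concatenate.
    return ' '.join(s for _, _, s in sorted(top, key=lambda t: t[1]))
```

The key property shared with the project's algorithms is that the summary consists only of sentences copied verbatim from the input, never newly generated text.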
The paper associated with this project was published in the peer-reviewed journal IJCST and can be found here: http://www.ijcstjournal.org/volume-4/issue-3/IJCST-V4I3P63.pdf
- Install Python 3
- Run pip install -r requirements.txt
- Open a Python shell and run these commands:
- import nltk
- nltk.download('punkt')
- nltk.download('stopwords')
- Open each algorithm's folder and run its Python script to generate summaries of the text files in the input folder.
- Finally, run the comparison scripts.
The input files have word counts ranging from 500 to 25,000.
The CSV files for each algorithm contain the word count of each text file and the time required to generate its summary.
The corresponding output files contain the automatically generated summaries.
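A benchmark row of this kind can be sketched with the standard-library csv module. The column names below (file, word_count, time_seconds) are assumptions for illustration; the project's actual CSV headers may differ.

```python
import csv
import io

def record_run(writer, filename, word_count, seconds):
    """Append one benchmark row: which input file, its word count,
    and how long summary generation took (hypothetical schema)."""
    writer.writerow([filename, word_count, f"{seconds:.3f}"])

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["file", "word_count", "time_seconds"])
record_run(writer, "input1.txt", 500, 0.012)
record_run(writer, "input2.txt", 25000, 0.8419)
```

Writing word count alongside timing makes it straightforward to plot generation time against document size when comparing the algorithms.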
Natural Language Toolkit (NLTK) -
Open source Python library for Natural Language Processing.
http://www.nltk.org/
Sumy -
Python library and command-line utility (version 0.4.1) used for
extracting summaries from HTML pages and plain-text documents.
https://pypi.python.org/pypi/sumy
For larger files (those with a greater word count), LSA is faster than LexRank; for smaller files (those with a smaller word count), LexRank is faster.
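A comparison of this kind can be measured with a small timing wrapper. This is a generic sketch using time.perf_counter; the stand-in "summarizer" below is a placeholder, not any of the project's algorithms.

```python
import time

def timed(fn, *args):
    """Run one summarization call and measure its wall-clock
    duration, returning (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

# Placeholder "summarizer" that just takes the first sentence.
summary, elapsed = timed(lambda text: text.split(". ")[0],
                         "First sentence. Second sentence.")
```

Recording elapsed time per input file, across a range of word counts, is what makes the LSA-vs-LexRank speed crossover observable.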
Future work includes comparing the quality of the generated summaries, in addition to speed, for greater accuracy and ease of summarization.