Text-Summarization

This project compares the KL-SUM, TF-IDF, Lexrank and LSA algorithms for text summarization.
All of these algorithms employ an extractive summarization methodology, i.e. important sentences from the original document are selected and concatenated to form a summary.
The paper associated with this project was published in the peer-reviewed journal IJCST and is available at http://www.ijcstjournal.org/volume-4/issue-3/IJCST-V4I3P63.pdf
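
As a rough illustration of the extractive idea (score sentences, keep the best ones, join them in their original order), here is a minimal sketch. It is not one of the algorithms compared in this project: the scoring rule (average word frequency), the regex-based sentence splitter and the function name `extractive_summary` are all assumptions made for the example.

```python
import re
from collections import Counter


def extractive_summary(text, sentence_count=2):
    """Pick the highest-scoring sentences and return them in original order."""
    # Naive split on ., ! or ? followed by whitespace (an assumption; the
    # project itself relies on NLTK's punkt tokenizer).
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        # Average word frequency, so a sentence is not favoured for length alone.
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    # Rank sentence indices by score, keep the top ones, restore document order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:sentence_count])
    return " ".join(sentences[i] for i in chosen)


sample = "Cats sleep. Cats eat fish. Dogs bark loudly at night."
print(extractive_summary(sample, sentence_count=2))
```

The selected sentences are copied verbatim from the input; nothing is rephrased, which is what distinguishes extractive from abstractive summarization.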

Installation

  1. Install Python 3.
  2. Run pip install -r requirements.txt
  3. Open a Python shell and run these commands:
  • import nltk
  • nltk.download('punkt')
  • nltk.download('stopwords')
  4. Open each folder and run the Python file it contains to generate summaries of the text files in the input folder.
  5. Finally, run the comparison files.

Implementation

The input files have word counts ranging from 500 to 25,000.
The CSV file for each algorithm contains the word count of each text file and the time required to generate its summary.
The corresponding output files contain the automatically generated summaries.
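
The measurement described above (word count plus generation time per file, written to CSV) could be sketched as follows. This is an assumption about how such a harness might look, not the project's actual scripts: the column names, the helper names `time_summary` and `write_timings`, and the stand-in summarizer are all hypothetical.

```python
import csv
import io
import time


def time_summary(summarize, text):
    """Return (word_count, seconds, summary) for one call to `summarize`."""
    word_count = len(text.split())
    start = time.perf_counter()
    summary = summarize(text)
    elapsed = time.perf_counter() - start
    return word_count, elapsed, summary


def write_timings(rows, out):
    """Write (file_name, word_count, seconds) rows as CSV to a file-like object."""
    writer = csv.writer(out)
    writer.writerow(["file", "word_count", "seconds"])  # hypothetical column names
    for name, word_count, seconds in rows:
        writer.writerow([name, word_count, f"{seconds:.6f}"])


def first_sentence(text):
    """Stand-in summarizer for the demo: keep only the first sentence."""
    return text.split(".")[0] + "."


count, seconds, summary = time_summary(first_sentence, "One two three. Four five.")
buffer = io.StringIO()
write_timings([("example.txt", count, seconds)], buffer)
print(buffer.getvalue())
```

`time.perf_counter()` is used because it is a monotonic, high-resolution clock intended for measuring short durations.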

Python packages used

Natural Language Toolkit (NLTK) -
Open source Python library for Natural Language Processing.
http://www.nltk.org/
Sumy -
Python library and command-line utility (version 0.4.1) for extracting summaries from HTML pages and plain text documents.
https://pypi.python.org/pypi/sumy
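
For orientation, the sketch below shows one way Sumy's summarizers can be called from Python. It assumes Sumy and the NLTK data from the installation steps are available, and that the module paths below match the installed Sumy version. The wrapper function `summarize` and the `SUMMARIZERS` table are hypothetical helpers for this example, not part of Sumy or of this project.

```python
import importlib

# (module, class) pairs for three Sumy summarizers; the dictionary keys are
# labels chosen for this example.
SUMMARIZERS = {
    "lsa": ("sumy.summarizers.lsa", "LsaSummarizer"),
    "lexrank": ("sumy.summarizers.lex_rank", "LexRankSummarizer"),
    "kl": ("sumy.summarizers.kl", "KLSummarizer"),
}


def summarize(text, algorithm="lsa", sentence_count=3, language="english"):
    """Return the selected sentences as a list of strings."""
    # Imported here so this file still loads when Sumy is not installed.
    from sumy.nlp.stemmers import Stemmer
    from sumy.nlp.tokenizers import Tokenizer
    from sumy.parsers.plaintext import PlaintextParser
    from sumy.utils import get_stop_words

    module_name, class_name = SUMMARIZERS[algorithm]
    summarizer_cls = getattr(importlib.import_module(module_name), class_name)
    summarizer = summarizer_cls(Stemmer(language))
    summarizer.stop_words = get_stop_words(language)

    parser = PlaintextParser.from_string(text, Tokenizer(language))
    return [str(sentence) for sentence in summarizer(parser.document, sentence_count)]
```

Calling `summarize(open("input/some_file.txt").read(), algorithm="lexrank")` would then return the chosen sentences; the file path here is only a placeholder.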

Conclusion

For larger files (those with a greater word count), LSA is faster than Lexrank; for smaller files, Lexrank is faster.
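
A per-file comparison of this kind can be computed from two timing tables. The sketch below is only illustrative: the function name `faster_algorithm` is hypothetical, and the numbers in the usage lines are made-up placeholders, not measurements from this project.

```python
def faster_algorithm(timings_a, timings_b, name_a="algo_a", name_b="algo_b"):
    """Map each shared word count to the name of the algorithm with the lower time.

    Both arguments are dicts of {word_count: seconds}. Ties go to name_b.
    """
    result = {}
    for word_count in sorted(timings_a.keys() & timings_b.keys()):
        a, b = timings_a[word_count], timings_b[word_count]
        result[word_count] = name_a if a < b else name_b
    return result


# Placeholder values purely for demonstration (NOT real measurements).
demo_a = {500: 0.30, 25000: 2.00}
demo_b = {500: 0.10, 25000: 5.00}
print(faster_algorithm(demo_a, demo_b))
```

Only word counts present in both tables are compared, so files measured for just one algorithm are ignored rather than guessed at.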

Future Scope

Future work could compare the quality of the generated summaries in addition to the speed of generation, giving a fuller picture of each algorithm's accuracy and ease of summarization.

Contributors

haxkd
