
lunaticprakash / text-summarization


Text-Summarization

Uses the spaCy and NLTK modules with the TF-IDF algorithm for text summarization. This code gives you a summary of the input article. You can input the text by typing (or copy-pasting) it, or from a .txt file, a PDF file, or a Wikipedia page URL.

Purpose :-

To save time while reading by summarizing a large article or text into fewer lines.

Description :-

It uses the Term Frequency-Inverse Document Frequency (TF-IDF) algorithm to summarize the article.
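The README does not show how the TF-IDF scores become a summary, so here is a minimal, standard-library-only sketch of one common extractive approach, not the code in Text-Summarizer.py. It assumes each sentence is treated as a "document": tf-idf(w, s) = tf(w, s) × log(N / df(w)), where N is the number of sentences and df(w) the number of sentences containing w; each sentence is scored by the mean tf-idf of its words and the top-scoring sentences are kept in their original order. The regex sentence splitter and the tiny stop-word list are stand-ins for spaCy/NLTK, and the names summarize, split_sentences, tokenize and STOP_WORDS are hypothetical.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; the project itself uses spaCy/NLTK).
STOP_WORDS = {
    "a", "an", "the", "and", "or", "of", "to", "in", "on", "is", "are",
    "was", "were", "it", "this", "that", "for", "with", "as", "by", "be",
}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lower-cased word tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z0-9']+", sentence.lower()) if w not in STOP_WORDS]


def summarize(text, num_sentences=3):
    """Return the num_sentences highest-scoring sentences, in their original order.

    tf is a word's relative frequency in a sentence, idf is log(N / df) with
    N sentences and df the number of sentences containing the word. A
    sentence's score is the mean tf-idf over its distinct words.
    """
    sentences = split_sentences(text)
    tokenized = [tokenize(s) for s in sentences]
    n = len(sentences)

    # Document frequency: how many sentences contain each word.
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))

    scores = []
    for tokens in tokenized:
        if not tokens:
            scores.append(0.0)
            continue
        counts = Counter(tokens)
        total = sum(
            (count / len(tokens)) * math.log(n / doc_freq[word])
            for word, count in counts.items()
        )
        scores.append(total / len(counts))

    # Take the best-scoring sentences, then restore document order for readability.
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    chosen = sorted(ranked[:num_sentences])
    return " ".join(sentences[i] for i in chosen)
```

Words that occur in every sentence get idf = log(1) = 0, so they add nothing to any score; that is what lets sentences with distinctive vocabulary rise to the top.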

Features :-

The program can read the text of your long article in 4 ways :-


  • By typing the text yourself (or copy-pasting it).
  • Reading the text from a .txt file.
  • Reading the text from a .pdf file (you can summarize either the entire PDF or a selected page interval).


  • Reading the text from a Wikipedia page (all you have to do is provide the URL of the page; the program will automatically scrape the text and summarize it for you).
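The PDF page interval and the Wikipedia scraping can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's code: pdf_page_range_text, ParagraphCollector, paragraph_text and fetch_html are hypothetical names. The PDF helper expects a reader shaped like PyPDF2's PdfReader (recent 2.x/3.x releases expose a pages sequence whose items have extract_text(); old 1.x releases used PdfFileReader and getPage() instead). For the web page, the standard-library html.parser is swapped in for Beautiful Soup so the sketch runs without extra installs; it keeps only text inside <p> elements, so footnote markers such as [1] are not stripped.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


def pdf_page_range_text(reader, first_page=None, last_page=None):
    """Join the text of pages first_page..last_page (1-based, inclusive).

    `reader` is assumed to behave like PyPDF2's PdfReader: a `pages` sequence
    whose items provide `extract_text()`. Omitting both bounds reads every page.
    """
    total = len(reader.pages)
    first = 1 if first_page is None else first_page
    last = total if last_page is None else last_page
    if not 1 <= first <= last <= total:
        raise ValueError(f"page interval must satisfy 1 <= first <= last <= {total}")
    # Convert the 1-based inclusive interval into 0-based list indices.
    return "\n".join(reader.pages[i].extract_text() or "" for i in range(first - 1, last))


class ParagraphCollector(HTMLParser):
    """Collect the text inside <p> elements, one string per paragraph."""

    def __init__(self):
        super().__init__()
        self.depth = 0          # how many <p> elements we are currently inside
        self.paragraphs = []
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "p" and self.depth:
            self.depth -= 1
            if self.depth == 0:
                text = "".join(self._chunks).strip()
                if text:
                    self.paragraphs.append(text)
                self._chunks = []

    def handle_data(self, data):
        if self.depth:
            self._chunks.append(data)


def paragraph_text(html):
    """Return the paragraph text of an HTML page as one newline-separated string."""
    parser = ParagraphCollector()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.paragraphs)


def fetch_html(url):
    """Download a page with urllib; a browser-like User-Agent avoids some 403 responses."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request) as response:
        return response.read().decode("utf-8", errors="replace")
```

For example, pdf_page_range_text(PdfReader("article.pdf"), 2, 5) would return the text of pages 2 to 5, and paragraph_text(fetch_html("https://en.wikipedia.org/wiki/Automatic_summarization")) the paragraphs of that article, ready to pass to a summarizer.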

Don't worry about the code length xD. It might look lengthy, but there are a lot of comments explaining the code (almost 70) and extra spacing for readability.

Output :-

  • The summary of the input text.

  • A word-count comparison of the original content vs the summarized content.

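The README does not say how the word counts in the comparison are computed. A minimal sketch, assuming words are simply runs of letters, digits or apostrophes (the project may count tokens via spaCy or NLTK instead); word_count and compare_lengths are hypothetical names:

```python
import re


def word_count(text):
    """Count words as runs of letters, digits or apostrophes."""
    return len(re.findall(r"[A-Za-z0-9']+", text))


def compare_lengths(original, summary):
    """Return the original and summary word counts and the percentage reduction."""
    original_words = word_count(original)
    summary_words = word_count(summary)
    # Guard against division by zero for empty input.
    reduction = 0.0 if original_words == 0 else 100 * (1 - summary_words / original_words)
    return {
        "original_words": original_words,
        "summary_words": summary_words,
        "reduction_percent": round(reduction, 1),
    }
```

A 4-word original summarized to 2 words gives a reduction_percent of 50.0.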

Requirements :-

  • Python3
  • spaCy module with an English model (small, medium, or large; any of them is sufficient)
  • NLTK Module
  • PyPDF2
  • Beautiful Soup (bs4)
  • urllib (part of the Python standard library, so no separate installation is needed)

How to install Requirements :-

  1. Python3 can be installed from the official site https://www.python.org/, or you can use an Anaconda environment.
  2. spaCy can be installed as follows. For an Anaconda environment >
conda install -c conda-forge spacy

python3 -m spacy download en

For other environments >

pip3 install spacy

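python3 -m spacy download en

Note: spaCy 3.x removed the "en" shortcut, so on newer versions download a named model instead, e.g. python3 -m spacy download en_core_web_sm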
  3. NLTK can be installed as follows. For an Anaconda environment >
conda install -c anaconda nltk

For other environments >

pip3 install nltk
  4. PyPDF2 can be installed as follows. For an Anaconda environment >
conda install -c conda-forge pypdf2

For other environments >

pip3 install PyPDF2
  5. Beautiful Soup (bs4) can be installed as follows. For an Anaconda environment >
conda install -c anaconda beautifulsoup4

For other environments >

pip3 install beautifulsoup4
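To confirm the installs worked before running the program, a small check like the one below can help. It is a hypothetical helper, not part of the repository; it maps each requirement above to its import name (beautifulsoup4 is imported as bs4) and uses importlib.util.find_spec, which returns None for a top-level module that cannot be found.

```python
import importlib.util

# Import names for the requirements listed above (pip package -> import name):
# spacy -> spacy, nltk -> nltk, PyPDF2 -> PyPDF2, beautifulsoup4 -> bs4.
REQUIRED_MODULES = ("spacy", "nltk", "PyPDF2", "bs4", "urllib")


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All requirements are installed.")
```

This only checks that the packages are importable; it does not verify that a spaCy language model has been downloaded.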

Getting Started :-

  • Download or clone the repository.

  • Open cmd or a terminal in the directory where Text-Summarizer.py is stored and run it with the following command :-

python3 Text-Summarizer.py
  • Now just follow along with the program.

Bugs and Improvements :-

  • No known bugs. The summary can't be as good as one written by a human.
  • An audio feature will be added soon, so that you can listen to the summary too if you want.

Dev :- Prakash Gupta
