
codekhal / inshorts-nlp


Analysed the syntax and semantics of a corpus of text documents retrieved by web scraping news articles from Inshorts, following the standard NLP workflow of the CRISP-DM model.

License: MIT License

Python 0.17% Jupyter Notebook 99.83%
nlp tokenizer stemming lemmatization bagging postagging python3 shallow-parsing data-mining webscraping

inshorts-nlp's Introduction

Inshorts-NLP


📒 Index

🔰 About

An NLP-based project that scrapes news articles in three main categories:

  • Technology
  • Sports
  • World

from Inshorts using the website URLs. After numerous preprocessing steps such as text wrangling, removing accented characters, removing HTML tags, lemmatization, and stemming, a text normalizer is built to create the dataset used for sentiment analysis.
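A minimal sketch of such a text normalizer, using only the Python standard library. It covers HTML-tag removal, accent removal, lowercasing, special-character removal and whitespace cleanup; the function names (`strip_html_tags`, `remove_accented_chars`, `normalize_text`) are hypothetical, and stemming and lemmatization are deliberately left out here, since the project relies on libraries such as NLTK and spaCy for those steps.

```python
import re
import unicodedata
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text content of an HTML fragment, dropping the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html_tags(text):
    # Join text chunks with spaces so words in adjacent tags do not merge.
    parser = _TextExtractor()
    parser.feed(text)
    parser.close()
    return " ".join(parser.parts)


def remove_accented_chars(text):
    # Decompose characters (é -> e + combining accent), then drop the non-ASCII marks.
    return (
        unicodedata.normalize("NFKD", text)
        .encode("ascii", "ignore")
        .decode("utf-8", "ignore")
    )


def normalize_text(text):
    """Hypothetical normalizer: no stemming or lemmatization in this sketch."""
    text = strip_html_tags(text)
    text = remove_accented_chars(text)
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", "", text)  # keep letters, digits, whitespace
    return re.sub(r"\s+", " ", text).strip()  # collapse runs of whitespace


if __name__ == "__main__":
    print(normalize_text("<p>Café <b>sales</b> rose 5%!</p>"))  # cafe sales rose 5
```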

Sentiment analysis is perhaps one of the most popular applications of NLP.

The key aspect of sentiment analysis is to analyze a body of text to understand the opinion it expresses. Typically, this sentiment is quantified with a positive or negative value called polarity.
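To make polarity concrete, here is a minimal sketch of lexicon-based scoring in the spirit of AFINN: each word carries a signed score, a text's score is the sum over its words, and the sign of that sum gives the label. The tiny lexicon below is made up for illustration and does not reproduce the real AFINN values; `polarity_score` and `polarity_label` are hypothetical names.

```python
# Toy lexicon: hypothetical word scores, NOT the actual AFINN word list.
TOY_LEXICON = {
    "win": 3,
    "great": 3,
    "good": 2,
    "loss": -2,
    "bad": -3,
    "crisis": -3,
}


def polarity_score(text, lexicon=TOY_LEXICON):
    """Sum the lexicon scores of all tokens; unknown words contribute 0."""
    tokens = text.lower().split()
    return sum(lexicon.get(token, 0) for token in tokens)


def polarity_label(score):
    """Map a polarity score to one of three sentiment labels."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    headlines = [
        "great win for india",
        "markets face crisis after loss",
        "team arrives today",
    ]
    for headline in headlines:
        score = polarity_score(headline)
        print(f"{score:+d} {polarity_label(score)}: {headline}")
```

Real lexicons differ mainly in vocabulary size and score ranges; the summing-and-thresholding step stays the same.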

This project can be used as a base for the following key features:

  • Building a text summarizer using RNNs and LSTMs.
  • Filtering for a particular sentiment, be it positive or negative.
  • Emojifier: suggesting appropriate reaction emojis from the extracted sentiments.
  • Building a tone detector like the one Grammarly (Beta) provides.

Build this project to learn the nuances of handling text data in NLP.

🔌 Installation

📦 Commands

Packages that should be installed:

  • pandas
  • NumPy
  • seaborn
  • NLTK
  • AFINN
  • TextBlob
  • Beautiful Soup
  • requests
  • spaCy language models

Note: spaCy can raise many errors if it is not installed properly, so make sure its installation (including the language models) succeeds. For further details, refer to requirements.txt.
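Since spaCy in particular can fail to install cleanly, a quick way to check which of the packages above are importable is sketched below. The import names are an assumption based on each package's usual module name (for example, Beautiful Soup is imported as `bs4`), and `missing_packages` is a hypothetical helper. It only checks that each top-level module can be found; it does not verify that the spaCy language models are downloaded.

```python
import importlib.util

# Display name -> import name. Assumed mapping; several import names differ
# from the package names (e.g. Beautiful Soup is imported as bs4).
REQUIRED_MODULES = {
    "pandas": "pandas",
    "NumPy": "numpy",
    "seaborn": "seaborn",
    "NLTK": "nltk",
    "AFINN": "afinn",
    "TextBlob": "textblob",
    "Beautiful Soup": "bs4",
    "requests": "requests",
    "spaCy": "spacy",
}


def missing_packages(modules=REQUIRED_MODULES):
    """Return the display names of packages whose top-level module cannot be found."""
    return [
        name
        for name, module in modules.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All listed packages are importable.")
```

`importlib.util.find_spec` locates a top-level module without importing it, so the check is fast and has no side effects.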

To run the project on your local machine, make sure you install all the packages listed in requirements.txt:

  • Clone the repository
$ git clone https://github.com/codekhal/Inshorts-NLP
  • Install the dependencies
$ cd Inshorts-NLP
$ pip install -r requirements.txt
  • Then, in your terminal, using an appropriate conda environment, open the notebook in Jupyter or any other preferred editor
$ jupyter notebook NLP_main.ipynb

📂 File Structure

  • File structure with the basic details about files and directories.
.__Inshorts-NLP__
├── contractions.py
├── img
│   ├── scraping.png
│   ├── Sentiment_Score_News_Category.png
│   ├── sentiments.png
│   ├── stemming.png
│   ├── Visualizing_Sentiments_Box_Plot.png
│   └── workflow.png
├── LICENSE
├── news.csv
├── NLP_main.ipynb
├── __pycache__
│   └── contractions.cpython-35.pyc
├── README.md
└── requirements.txt

2 directories, 13 files

- Brief Description

Built a web scraper that scraped news articles from the Inshorts website URLs, then cleaned the data for further processing using numerous text-preprocessing techniques. The next step was sentiment analysis on the cleaned data. Various popular lexicons are available for sentiment analysis, including the following:

  • AFINN lexicon
  • Bing Liu’s lexicon
  • MPQA subjectivity lexicon
  • SentiWordNet
  • VADER lexicon
  • TextBlob lexicon

This project uses the NLTK, AFINN, and TextBlob libraries, and presents the results with both data visualization tools and pandas DataFrame techniques.
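The scraping step can be sketched with the standard library's `html.parser`. This is an offline, hypothetical sketch: it assumes each headline sits in an element with `itemprop="headline"` and each article body in an element with `itemprop="articleBody"`, markup that may not match the live Inshorts site. The project itself lists requests and Beautiful Soup for this step. Fetching the page (for example with `requests.get(url).text`) is left out so the example runs without network access, and the output keys (`news_headline`, `news_article`, `news_category`) are illustrative names.

```python
from html.parser import HTMLParser

# Hypothetical markup: assumes headlines and article bodies are tagged with
# itemprop="headline" / itemprop="articleBody"; the live site may differ.
TARGET_PROPS = {"headline", "articleBody"}
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input", "wbr"}

SAMPLE_HTML = """
<div class="news-card">
  <span itemprop="headline">Team wins final</span>
  <div itemprop="articleBody">The team won the <b>final</b> on Sunday.</div>
</div>
"""


class NewsCardParser(HTMLParser):
    """Collects the text of elements whose itemprop is in TARGET_PROPS."""

    def __init__(self):
        super().__init__()
        self.found = {prop: [] for prop in TARGET_PROPS}
        self._prop = None  # itemprop of the element currently being captured
        self._depth = 0    # nesting depth inside that element
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return  # void elements have no end tag, so they are not counted
        if self._prop is not None:
            self._depth += 1  # nested tag inside the captured element
            return
        prop = dict(attrs).get("itemprop")
        if prop in TARGET_PROPS:
            self._prop, self._depth, self._buffer = prop, 1, []

    def handle_data(self, data):
        if self._prop is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if self._prop is None or tag in VOID_TAGS:
            return
        self._depth -= 1
        if self._depth == 0:
            # Join the captured pieces and collapse runs of whitespace.
            self.found[self._prop].append(" ".join("".join(self._buffer).split()))
            self._prop = None


def parse_articles(html, category):
    """Pair the i-th headline with the i-th body (illustrative key names)."""
    parser = NewsCardParser()
    parser.feed(html)
    parser.close()
    return [
        {"news_headline": h, "news_article": b, "news_category": category}
        for h, b in zip(parser.found["headline"], parser.found["articleBody"])
    ]


if __name__ == "__main__":
    for article in parse_articles(SAMPLE_HTML, "sports"):
        print(article)
```

Running this once per category page (technology, sports, world) and concatenating the results would yield one row per article, ready for the normalization step.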

📷 Info Gallery

The sentiment scores of the different news categories are shown in the following plots.

Box Plot
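A box plot summarizes each category's score distribution by its quartiles. The sketch below computes those numbers with `statistics.quantiles`, assuming the scores are already grouped by category in a plain dict; the scores are made-up illustration values, not results from `news.csv`, and `five_number_summary` is a hypothetical helper. Plotting libraries place the whiskers differently (seaborn extends them to 1.5 × IQR by default and draws points beyond that as outliers), so the min and max here are the raw extremes.

```python
import statistics


def five_number_summary(scores):
    """Min, Q1, median, Q3, max of a list of at least two scores."""
    # "inclusive" uses linear interpolation between data points for the quartiles.
    q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return min(scores), q1, median, q3, max(scores)


if __name__ == "__main__":
    # Hypothetical per-category sentiment scores, for illustration only.
    scores_by_category = {
        "technology": [2, 4, 0, 5, 3],
        "sports": [6, -1, 3, 1, 4],
        "world": [-4, -2, 0, 1, -3],
    }
    for category, scores in scores_by_category.items():
        print(category, five_number_summary(scores))
```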

Lastly, the counts of the three sentiments across the different news categories are shown with a factor (bar) plot.

Factor Plot
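A factor (bar) plot like this is drawn from a table of counts per (category, sentiment) pair. A minimal standard-library sketch of that aggregation follows; the example rows are made up, and `sentiment_counts` is a hypothetical helper. In a pandas-based notebook the same table would more naturally come from `groupby` or `value_counts`, but that is an assumption, not a copy of the project's code.

```python
from collections import Counter

SENTIMENTS = ("positive", "neutral", "negative")


def sentiment_counts(rows):
    """Count sentiment labels per news category.

    rows: iterable of (category, sentiment) pairs. Returns a dict mapping each
    category (sorted) to the count of every label in SENTIMENTS, zeros
    included, which is the shape a grouped bar chart needs.
    """
    rows = list(rows)  # iterated twice below
    counts = Counter(rows)
    categories = sorted({category for category, _ in rows})
    return {
        category: {label: counts[(category, label)] for label in SENTIMENTS}
        for category in categories
    }


if __name__ == "__main__":
    # Hypothetical labelled articles, for illustration only.
    rows = [
        ("sports", "positive"),
        ("sports", "positive"),
        ("sports", "negative"),
        ("world", "negative"),
        ("world", "neutral"),
        ("technology", "positive"),
    ]
    for category, by_sentiment in sentiment_counts(rows).items():
        print(category, by_sentiment)
```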

📜 Guidelines

  • Contribution Guidelines

Future work that could be done:

  • Flask App Deployment - Deploy the app so that it can be used efficiently.

  • Use of Deep Learning - One may try using deep learning to build a text summarizer and a tone detector.

Kindly follow the Contribution Guidelines before you create any pull requests or issues. That said, feel free to contribute in any form.
Open Source <3

📄 Resources

🌟 Present Contributors

Contributors

Want to share your ideas?

Feel free to reach out to me

Telegram

🔒 License

License

