
datahappy1 / czech_language_sentiment_analyzer

Czech sentiment analyzer

Home Page: http://czester.herokuapp.com

License: MIT License

Python 81.77% HTML 17.23% JavaScript 0.93% Procfile 0.07%
czech-language czech sentiment-analysis sentiment python python-3 scraper flask naive-bayes logistic-regression

czech_language_sentiment_analyzer's Introduction

10000 ft. Overview

Data Collection

56k Czech movie reviews were collected using the /data_preparation/data_collector_movie_review_scraper.py multithreaded HTML scraping module. Reviews written in Slovak were then removed using the langdetect module, and Czech stopwords were stripped from the remaining text. To balance the dataset with an equal number of negative and positive reviews, it was reduced to 11.5k positive and 11.5k negative reviews. The collected data was also stemmed before training the models.
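A minimal sketch of the filtering and balancing steps described above. It is not the project's own code: `prepare_reviews` is a hypothetical name, the `detect_language` callable stands in for langdetect (whose `detect` function returns language codes such as `"cs"` for Czech and `"sk"` for Slovak), the whitespace tokenizer is a simplification, and stemming is omitted.

```python
import random
from typing import Callable, Iterable, List, Set, Tuple

Review = Tuple[str, str]  # (text, label); label is "pos" or "neg"


def prepare_reviews(
    reviews: Iterable[Review],
    stopwords: Set[str],
    detect_language: Callable[[str], str],
    seed: int = 0,
) -> List[Review]:
    """Keep Czech reviews, strip stopwords, then balance the two classes."""
    # 1. Keep only reviews detected as Czech ("cs"); Slovak ("sk") and others are dropped.
    czech = [(text, label) for text, label in reviews if detect_language(text) == "cs"]

    # 2. Lowercase, split on whitespace and drop stopwords (simplified tokenizer).
    cleaned = []
    for text, label in czech:
        tokens = [tok for tok in text.lower().split() if tok not in stopwords]
        cleaned.append((" ".join(tokens), label))

    # 3. Downsample both classes to the size of the smaller one.
    positive = [r for r in cleaned if r[1] == "pos"]
    negative = [r for r in cleaned if r[1] == "neg"]
    n = min(len(positive), len(negative))
    rng = random.Random(seed)  # seeded so the sample is reproducible
    balanced = rng.sample(positive, n) + rng.sample(negative, n)
    rng.shuffle(balanced)
    return balanced
```

With real data this could be called as `prepare_reviews(raw_reviews, czech_stopwords, langdetect.detect)`; setting `langdetect.DetectorFactory.seed` first makes langdetect's results reproducible between runs.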

ML Models

Naive Bayes, Logistic Regression and Support Vector Machine models from the Scikit-Learn Python library were trained and tested for text sentiment analysis. The scripts for training and testing are located here:

The overall sentiment score for the input text is calculated as a weighted average of the predictions of these 3 models, where each model's weight is based on its precision score.
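A minimal sketch of a precision-weighted average of this kind, assuming each model's prediction is encoded as 1.0 (positive) or 0.0 (negative) and that each precision score comes from that model's test run. The function name `overall_sentiment`, the model keys and all numbers in the usage lines are hypothetical, not results from this project.

```python
from typing import Mapping


def overall_sentiment(predictions: Mapping[str, float],
                      precisions: Mapping[str, float]) -> float:
    """Average the per-model predictions, weighting each by its precision score."""
    total_weight = sum(precisions[name] for name in predictions)
    if total_weight <= 0:
        raise ValueError("at least one model needs a positive precision weight")
    weighted = sum(predictions[name] * precisions[name] for name in predictions)
    return weighted / total_weight


# Usage with made-up predictions and precision scores (not real project results):
example = overall_sentiment(
    {"naive_bayes": 1.0, "logistic_regression": 1.0, "svm": 0.0},
    {"naive_bayes": 0.80, "logistic_regression": 0.85, "svm": 0.75},
)
print(round(example, 4))  # 0.6875
```

Here two positive votes from the more precise models outweigh one negative vote. How the app turns the score into a final positive/negative label is not described above, so no threshold is assumed.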

Flask web application

The Flask web application is currently hosted at https://czester.herokuapp.com, and its source code is in /flask_webapp/. The backend is written in Python using the Flask framework, and the templates are styled with Bootstrap. The app also provides users with a simple API. The stats module integrates Chart.js with Flask, and its persistence layer for the statistics data is either Sqlite3 or Heroku Postgres. If the DATABASE_URL environment variable is set to a Heroku Postgres URL such as postgres://YourPostgresUrl, the remote Heroku Postgres database is used; otherwise, a local Sqlite3 database is used.
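The backend selection rule described above can be sketched as follows, assuming the stats module only needs a DB-API connection. `choose_backend`, `connect_stats_db` and the default `stats.db` file name are hypothetical, and `psycopg2` is assumed as the Postgres driver; the app's actual code may differ.

```python
import os
import sqlite3
from typing import Mapping, Optional, Tuple

# Hypothetical default file name for the local Sqlite3 stats database.
DEFAULT_SQLITE_PATH = "stats.db"


def choose_backend(env: Optional[Mapping[str, str]] = None,
                   sqlite_path: str = DEFAULT_SQLITE_PATH) -> Tuple[str, str]:
    """Return ("postgres", url) if DATABASE_URL is set and non-empty, else ("sqlite", path)."""
    env = os.environ if env is None else env
    url = env.get("DATABASE_URL")
    if url:
        return "postgres", url
    return "sqlite", sqlite_path


def connect_stats_db(env: Optional[Mapping[str, str]] = None,
                     sqlite_path: str = DEFAULT_SQLITE_PATH):
    """Open a DB-API connection to whichever backend choose_backend selects."""
    backend, target = choose_backend(env, sqlite_path)
    if backend == "postgres":
        # Imported lazily so local Sqlite3 runs do not need a Postgres driver installed.
        import psycopg2
        return psycopg2.connect(target)
    return sqlite3.connect(target)
```

Keeping the decision in a small function that takes the environment as a mapping makes the fallback easy to test without touching the real process environment.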

Input text dataflow diagram:

How to run this Flask app in a local environment
  1. Create and activate a standard Python virtual environment or a pipenv environment
  2. Install the requirements: pip3 install -r requirements.txt
  3. Set the working directory to the path where you cloned this repo (make sure it is the directory where the Heroku Procfile is located)

TODOs / Future ideas
  • Remove reviews written in Slovak
  • Verify that the input text is written in Czech
  • Add Flask web app tests
  • Add a Czech word stemmer module
  • Use ensembling instead of the precision-weighted model average for the overall sentiment
  • Replace Sqlite3 / Postgres with Redis
  • Migrate from Heroku to AWS

Useful links

czech_language_sentiment_analyzer's People

Contributors

datahappy1, dependabot[bot], pprud

Forkers

kakamband
