brash6 / topicmodelingtrustpilotreviews

Scraping of Trustpilot reviews, Analysis of the evolution of topics discussed in reviews over time.

Jupyter Notebook 66.06% HTML 33.90% Python 0.04%
nlp nlp-machine-learning python topic-modeling scraper jupyter-notebook data-science machine-learning trustpilot reviewsanalysis-nlp

topicmodelingtrustpilotreviews's Introduction

Topic analysis of UK train company reviews

Which topics are discussed in reviews of UK train companies on Trustpilot?

What can UK train companies learn from these topics to improve their services?

Description

This project analyzes the topics discussed in Trustpilot reviews of UK train companies.

The goal is to understand how the discussed topics evolve over time, in order to give UK train companies insights they can use to improve their services.

Configuration

To run this project on your machine, you can create a virtual environment using the okra_env.yml file.

Data

We use reviews of 17 UK train companies scraped from Trustpilot. For each review, we extract:

  • date
  • train company
  • rating (number of stars)
  • title
  • body
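As a rough illustration of the fields listed above, one review could be represented as a small record type. This is only a sketch: the class name `Review`, the attribute names, and the `text` helper are assumptions made for this example and are not taken from the project's code or data files.

```python
import datetime as dt
from dataclasses import dataclass


@dataclass(frozen=True)
class Review:
    """One scraped review. Names are illustrative, not the project's column names."""
    date: dt.date
    company: str
    rating: int  # number of stars, from 1 to 5
    title: str
    body: str

    def __post_init__(self):
        # Reject ratings outside the 1 to 5 star range
        if not 1 <= self.rating <= 5:
            raise ValueError(f"rating must be between 1 and 5, got {self.rating}")

    @property
    def text(self) -> str:
        """Title and body joined into a single string."""
        return f"{self.title}. {self.body}".strip()


# Made-up example values, not real data
example = Review(dt.date(2020, 1, 15), "Example Rail", 2,
                 "Late again", "The train was 40 minutes late.")
print(example.text)
```

Joining title and body into one string is a design choice for this sketch; the notebook may well treat them separately.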

The data files are in the data folder. Unzip them there before running the code.

Scripts

Scraping

The script trustpilot_scraping.py, used to scrape the Trustpilot reviews, is in the scraping folder.

Analysis

You will find the visualizations and results in the TopicsAnalysisReviewsClean.ipynb file. To look at the results and visualizations without running anything, open TopicsAnalysisReviewsClean.html.

Some visualizations (PyLDAvis) need to be opened in a web browser. These visualizations are in the viz folder.

To keep the notebook clean, the functions are defined in separate files.

In the constants.py file, you will find all the constants used in this project.

In the data_cleaning.py file, you will find all the cleaning functions. These functions clean the dataframe in order to make it ready for Latent Dirichlet Allocation.
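To give an idea of what cleaning text for Latent Dirichlet Allocation can involve, here is a minimal sketch using only the standard library. It is not the code from data_cleaning.py: the function name `clean_review`, the tiny stop-word list, and the minimum token length are all assumptions chosen for illustration.

```python
import re
from typing import List

# Tiny illustrative stop-word list (an assumption); a real pipeline would use a fuller one
STOP_WORDS = {"a", "an", "and", "the", "was", "is", "to", "of",
              "in", "on", "for", "it", "i", "my"}


def clean_review(text: str) -> List[str]:
    """Lowercase the text, drop non-letters, and remove stop words and short tokens."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # keep only letters and whitespace
    tokens = text.split()
    return [t for t in tokens if t not in STOP_WORDS and len(t) > 2]


tokens = clean_review("The train was 40 minutes LATE, and my refund is still pending!")
print(tokens)
```

Topic models such as LDA work on bags of tokens, so the output here is a list of words rather than a cleaned sentence.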

In the visualizations.py file, you will find all the visualization functions.

In the modelization.py file, you will find all the modeling functions.
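Since the project looks at how topics evolve over time, the sketch below shows one simple way to compute the share of each topic per month once every review has been assigned a dominant topic (for example by an LDA model). The function name `topic_share_by_month`, the input format, and the sample values are hypothetical; the notebook may compute this differently.

```python
from collections import Counter, defaultdict


def topic_share_by_month(assignments):
    """Return {month: {topic_id: share}} from (iso_date, topic_id) pairs.

    iso_date is a 'YYYY-MM-DD' string; topic_id is the review's dominant topic.
    """
    counts = defaultdict(Counter)
    for iso_date, topic in assignments:
        month = iso_date[:7]  # 'YYYY-MM'
        counts[month][topic] += 1

    shares = {}
    for month in sorted(counts):
        total = sum(counts[month].values())
        shares[month] = {topic: n / total
                         for topic, n in sorted(counts[month].items())}
    return shares


# Made-up assignments, not results from the project
sample = [("2020-01-03", 0), ("2020-01-20", 1), ("2020-01-25", 0), ("2020-02-02", 1)]
shares = topic_share_by_month(sample)
print(shares)
```

Using shares rather than raw counts keeps months with many reviews comparable to months with few.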

Advice for UK train companies based on the topic analysis

  • Keep working on the reliability of your train schedules
  • If not done already, develop efficient social media services for booking, refunds, journey information, etc. (WhatsApp, Facebook and Twitter)
  • Improve the effectiveness of your website, especially for ticket refunds, ticket changes and railcard cancellation
  • Offer the choice between a voucher and a refund when a train is cancelled
  • Deliver significantly higher-quality services for first class passengers (cleanliness, seats, drinks, lounge, more accessible carriages, etc.)
  • Emphasize helpfulness and kindness as key behaviours in your staff management, especially for staff in stations and on board
  • If not done already, modernize your trains
    • Power outlets for smartphones
    • Space to work with a laptop, wifi
    • Space for bikes
    • Dedicated spaces for disabled passengers
