Glassdoor_reviews

This repository contains a data science project that collects and analyzes Glassdoor reviews.

Planning

  • Research question and objective: What are the most common skills, qualifications, and experiences that applicants for a particular job have?
  • Data collection and cleaning:
    • Use web scraping tools or APIs to extract the reviews from Glassdoor.
    • Filter the reviews by the job title and location of interest, and remove any duplicates, missing values, or irrelevant information.
  • Exploratory data analysis (EDA): Use descriptive statistics, visualizations, and natural language processing techniques to understand the distribution, trends, and patterns of the reviews.
  • Sentiment analysis on the reviews: Use sentiment analysis tools or models to classify the reviews into positive, negative, or neutral categories based on their tone and emotion.
  • Extract features and insights from the reviews:
    • Use text mining techniques such as topic modeling, keyword extraction, named entity recognition, or word embeddings to identify the main themes, skills, qualifications, experiences, or attributes mentioned in the reviews.
    • Use machine learning techniques such as clustering, classification, or regression to find relationships and correlations among the features and the sentiment of the reviews, or to make predictions from them.
  • Communicate and visualize findings: Use dashboards, reports, or presentations to summarize and communicate the results, with word clouds, charts, graphs, or maps to highlight the key insights and recommendations from the analysis.
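
The cleaning step in the plan above (filter by job title and location, drop missing values and duplicates) could look roughly like the following minimal sketch. It uses only the Python standard library. The record fields `job_title`, `location`, and `text`, and the helper `clean_reviews`, are assumptions made for illustration; the repository does not specify a data schema.

```python
# Hypothetical review records; the field names are assumptions for illustration.
def clean_reviews(reviews, job_title, location):
    """Keep reviews matching job title and location; drop incomplete and duplicate rows."""
    wanted_title = job_title.strip().lower()
    wanted_location = location.strip().lower()
    seen = set()
    cleaned = []
    for review in reviews:
        title = (review.get("job_title") or "").strip()
        place = (review.get("location") or "").strip()
        text = (review.get("text") or "").strip()
        # Skip rows with a missing value in any required field.
        if not (title and place and text):
            continue
        # Keep only the job title and location of interest (case-insensitive).
        if title.lower() != wanted_title or place.lower() != wanted_location:
            continue
        # Treat reviews whose text matches after lowercasing and
        # whitespace normalization as duplicates.
        key = " ".join(text.lower().split())
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"job_title": title, "location": place, "text": text})
    return cleaned


sample = [
    {"job_title": "Data Analyst", "location": "Berlin", "text": "Great team and good pay."},
    {"job_title": "data analyst", "location": "berlin", "text": "Great team  and good pay."},
    {"job_title": "Data Analyst", "location": "Berlin", "text": None},
    {"job_title": "Data Analyst", "location": "Paris", "text": "Long hours."},
    {"job_title": "Data Analyst", "location": "Berlin", "text": "Poor management."},
]
cleaned = clean_reviews(sample, "Data Analyst", "Berlin")
print(len(cleaned))  # 2
```

Exact-match deduplication on normalized text is the simplest choice here; near-duplicate detection would need a similarity measure instead.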
graph TB
    A((Start)) --> B[Define question]
    B --> C[Collect and clean data]
    C --> D[EDA and sentiment]
    D --> E[Extract features]
    E --> F{Choose technique}
    F -->|Clustering| G[Find groups]
    F -->|Classification| H[Classify reviews]
    F -->|Regression| I[Predict outcomes]
    G --> J[Summarize findings]
    H --> J
    I --> J
    J --> K((End))
    C -- Web scraping or APIs --> L((Data))
    L -.-> C
    D -- Sentiment tools or models --> M((Categories))
    M -.-> D
    E --> N[Text mining]
    N --> Q[Themes]
    N --> R[Skills]
    N --> S[Qualifications]
    N --> T[Experiences]
    Q -.-> E
    R -.-> E
    S -.-> E
    T -.-> E
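
The "EDA and sentiment" and "Text mining" steps in the flow above can be illustrated with a small lexicon-based sketch. This is not the project's actual method: the word lists, the stopword set, and the helpers `classify_sentiment` and `top_keywords` are all hypothetical, and a real analysis would likely use a fuller lexicon or a trained model.

```python
import re
from collections import Counter

# Tiny illustrative lexicons (assumptions); a real project would use a
# fuller lexicon or a trained sentiment model.
POSITIVE = {"great", "good", "supportive", "friendly", "flexible"}
NEGATIVE = {"poor", "bad", "toxic", "stressful", "long"}
STOPWORDS = {"a", "an", "and", "the", "is", "are", "but", "of", "to", "with", "very"}


def tokenize(text):
    """Lowercase the text and split it into word tokens (apostrophes kept)."""
    return re.findall(r"[a-z']+", text.lower())


def classify_sentiment(text):
    """Label a review by comparing counts of positive and negative lexicon words."""
    tokens = tokenize(text)
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def top_keywords(texts, n=3):
    """Return the n most frequent non-stopword tokens across all reviews."""
    counts = Counter(
        token for text in texts for token in tokenize(text) if token not in STOPWORDS
    )
    return counts.most_common(n)


reviews = [
    "Great team and flexible hours",
    "Poor management and long hours",
    "The office is in the city centre",
]
labels = [classify_sentiment(r) for r in reviews]
print(labels)                       # ['positive', 'negative', 'neutral']
print(top_keywords(reviews, n=1))   # [('hours', 2)]
```

Word counting like this ignores negation and context ("not good" scores as positive), which is one reason the plan also lists model-based sentiment tools.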

Timeline

gantt
    title Glassdoor reviews Data Science Project Timeline
    dateFormat YYYY-MM-DD
    axisFormat %b %d
    section Define question
        Define research question and objectives: 2023-06-01, 2023-06-03
    section Collect and clean data
        Research available web scraping methods: 2023-06-03, 2023-06-04
        Collect Glassdoor reviews data: 2023-06-04, 2023-06-05
        Clean and preprocess data: 2023-06-05, 2023-06-08
    section EDA and sentiment
        Perform descriptive statistics and visualizations: 2023-06-08, 2023-06-11
        Perform sentiment analysis on reviews: 2023-06-11, 2023-06-18
    section Extract features
        Apply text mining techniques: 2023-06-18, 2023-06-22
        Extract features from reviews: 2023-06-22, 2023-06-25
    section Choose technique
        Choose machine learning technique: 2023-06-26, 1d
        Find relationships or predictions from reviews: 2023-06-27, 2023-06-30
    section Communicate findings
        Create summary of results: 2023-07-01, 2023-07-02
