
iliaromanov / google-play-app-rating-prediction


A project done with a partner: using data analysis, visualization, and ML to predict the rating an app would get on the Google Play Store.

Jupyter Notebook 100.00%
data-science data-visualization machine-learning rating-prediction

google-play-app-rating-prediction's Introduction

Hi, I'm Ilia!

  • 🎓 Fourth-year Computer Science student at the University of Waterloo.
  • 👨‍💻 Interested in backend, computer networking, and algorithmic trading.
  • 🏢 Looking for Winter/Summer 2025 internships or 2025 new-grad software engineering opportunities.


google-play-app-rating-prediction's People

Contributors

iliaromanov, sovima


Forkers

sovima

google-play-app-rating-prediction's Issues

Remove Unneeded Columns

First, we want to choose which columns we will keep for our app_data_x variable (the DataFrame we will feed to the model to predict the rating). Probably Category, Size, Type or Price (not sure if we should keep both), Content Rating, Genres, and maybe Android Ver.

Then we should remove the unneeded columns.
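A minimal pandas sketch of this step, assuming the column names of the public Kaggle "googleplaystore.csv" file (App, Category, Rating, Reviews, Size, Type, Price, Content Rating, Genres, Android Ver, and so on). The feature list is just the set of candidates named above, and `split_features` is a hypothetical helper, not code from the repo.

```python
from typing import Tuple

import pandas as pd

# Assumption: column names follow the Kaggle "googleplaystore.csv" layout.
# The feature list mirrors the candidates named in this issue; it is not final.
FEATURE_COLUMNS = ["Category", "Size", "Type", "Price", "Content Rating", "Genres", "Android Ver"]
TARGET_COLUMN = "Rating"


def split_features(app_data: pd.DataFrame) -> Tuple[pd.DataFrame, pd.Series]:
    """Return (app_data_x, app_data_y): the kept feature columns and the rating target."""
    app_data_x = app_data[FEATURE_COLUMNS].copy()  # every other column is left out
    app_data_y = app_data[TARGET_COLUMN].copy()
    return app_data_x, app_data_y


# Tiny made-up frame for illustration only.
app_data = pd.DataFrame({
    "App": ["App A", "App B"],
    "Category": ["GAME", "TOOLS"],
    "Rating": [4.5, 3.9],
    "Reviews": ["100", "20"],
    "Size": ["19M", "512k"],
    "Type": ["Free", "Paid"],
    "Price": ["0", "$0.99"],
    "Content Rating": ["Everyone", "Teen"],
    "Genres": ["Action;Action & Adventure", "Tools"],
    "Android Ver": ["4.0.3 and up", "4.1 and up"],
})

app_data_x, app_data_y = split_features(app_data)
```

Selecting the columns to keep is equivalent to dropping the rest with `app_data.drop(columns=[...])`; listing the kept columns explicitly just makes the model's inputs easier to see at a glance.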

Clean Genres

Since each entry in the Genres column is a semicolon-separated list of genres stored as a single string, we should find a way to consider each individual genre in the list.

I am not really sure how or when we should do this.

Maybe create a duplicate row for each genre in that row's list?

We could drop Genres altogether if this is too difficult and just use Category instead.
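Both ideas above can be sketched in pandas, assuming the Genres values look like "Action;Action & Adventure". Option 1 is the "duplicate row per genre" idea using `Series.str.split` plus `DataFrame.explode`; option 2 is a swapped-in alternative (one 0/1 indicator column per genre via `Series.str.get_dummies`) that keeps one row per app. The sample data is made up.

```python
import pandas as pd

# Made-up sample; real values are assumed to be ';'-separated genre strings.
app_data = pd.DataFrame({
    "App": ["App A", "App B"],
    "Genres": ["Action;Action & Adventure", "Tools"],
})

# Option 1: one row per genre; the rest of the row is duplicated for each genre.
exploded = (
    app_data.assign(Genres=app_data["Genres"].str.split(";"))
    .explode("Genres")
    .reset_index(drop=True)
)

# Option 2: keep one row per app and add a 0/1 indicator column for each genre.
genre_dummies = app_data["Genres"].str.get_dummies(sep=";")
app_with_genres = app_data.drop(columns="Genres").join(genre_dummies)
```

One trade-off worth noting: option 1 gives apps with several genres more rows, so they carry more weight during training, while option 2 keeps every app counted once.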

Clean Rating

We need to remove all rows with a null value in the Rating column.
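A minimal pandas sketch, assuming the DataFrame has a column literally named "Rating"; the sample rows are made up.

```python
import numpy as np
import pandas as pd

app_data = pd.DataFrame({
    "App": ["App A", "App B", "App C"],
    "Rating": [4.5, np.nan, 3.9],
})

# Drop every row whose Rating is null (NaN/None), then renumber the index.
app_data = app_data.dropna(subset=["Rating"]).reset_index(drop=True)
```

Passing `subset=["Rating"]` matters: without it, `dropna` would also drop rows that are only missing values in other columns.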

Clean Size

Convert the values in the Size column to numeric and remove rows with unusable values in this column, such as "varies with device" or "nan".

I kinda want to do this one, since it's one of the only ones I think I have a clear idea for.
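One possible sketch, assuming sizes are strings like "19M" (megabytes) or "512k" (kilobytes), as in the Kaggle Google Play Store data, and assuming 1 MB = 1024 kB. `parse_size_mb` is a hypothetical helper; anything it can't parse becomes NaN so those rows can be dropped in one pass.

```python
import numpy as np
import pandas as pd

# Assumptions: "M" means megabytes, "k" means kilobytes, and 1 MB = 1024 kB.
UNIT_TO_MB = {"M": 1.0, "k": 1.0 / 1024}


def parse_size_mb(value):
    """Convert a size string such as '19M' or '512k' to megabytes.

    Anything unusable ("Varies with device", NaN, malformed text) becomes NaN.
    """
    if not isinstance(value, str):
        return np.nan  # real NaN or any other non-string value
    value = value.strip()
    unit = value[-1:]
    if unit not in UNIT_TO_MB:
        return np.nan  # e.g. "Varies with device"
    try:
        return float(value[:-1]) * UNIT_TO_MB[unit]
    except ValueError:
        return np.nan  # e.g. a bare "M" or other malformed text


# Made-up sample rows.
app_data = pd.DataFrame({
    "App": ["App A", "App B", "App C", "App D"],
    "Size": ["19M", "512k", "Varies with device", np.nan],
})

app_data["Size"] = app_data["Size"].map(parse_size_mb)
app_data = app_data.dropna(subset=["Size"]).reset_index(drop=True)
```

Mapping bad values to NaN first, then dropping them, keeps the parsing and the filtering as two separate, easy-to-check steps.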

Clean Android Ver

If we are going to keep the Android Ver column for our app_data_x variable, we should probably either convert it to numeric or combine similar values (e.g. merge "4.0.3 and up" and "4.0.3" into one category). I'm thinking combining similar values would be better, because the Android version seems more like categorical data.

Then we have to remove any unusable values from this column, such as "nan" and others of that sort.
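A sketch of the "combine similar values" approach, assuming the only variant to merge is a trailing " and up" suffix and that "varies with device" (like in the Size column) is among the unusable values; both are assumptions, and `normalize_android_ver` is a hypothetical helper. Values are kept as strings and stored as a pandas categorical column.

```python
import numpy as np
import pandas as pd

SUFFIX = " and up"
# Lower-cased values treated as unusable; "varies with device" is an assumed example.
UNUSABLE = {"", "nan", "varies with device"}


def normalize_android_ver(value):
    """Merge '4.0.3 and up' with '4.0.3'; return NaN for missing or unusable values."""
    if not isinstance(value, str):
        return np.nan  # real NaN or any other non-string value
    value = value.strip()
    if value.endswith(SUFFIX):
        value = value[: -len(SUFFIX)]
    if value.lower() in UNUSABLE:
        return np.nan
    return value  # kept as a string so the column stays categorical


# Made-up sample rows.
app_data = pd.DataFrame({
    "App": ["App A", "App B", "App C", "App D", "App E"],
    "Android Ver": ["4.0.3 and up", "4.0.3", "4.1 and up", "Varies with device", np.nan],
})

app_data["Android Ver"] = app_data["Android Ver"].map(normalize_android_ver)
app_data = app_data.dropna(subset=["Android Ver"]).reset_index(drop=True)
app_data["Android Ver"] = app_data["Android Ver"].astype("category")
```

Values in other formats (for example version ranges) pass through unchanged, so it may be worth checking `app_data["Android Ver"].unique()` on the real data before relying on this.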

Remove Duplicates

Remove apps with duplicate names to make the data cleaner and easier to read.
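A minimal pandas sketch, assuming the app name lives in a column called "App"; the sample rows are made up.

```python
import pandas as pd

app_data = pd.DataFrame({
    "App": ["App A", "App B", "App A"],
    "Rating": [4.5, 3.9, 4.4],
})

# Keep only the first row for each app name; later rows with the same name are dropped.
app_data = app_data.drop_duplicates(subset="App", keep="first").reset_index(drop=True)
```

`keep="first"` is a choice rather than a requirement: if duplicate rows for the same app differ, `keep="last"` or sorting before deduplicating would decide which version survives.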
