psanghal / machine_learning

Prediction Model, Bias and Uncertainty

License: MIT License

Jupyter Notebook 100.00%
auc-roc cross-validation-score dash-plotly dashboard knn-classification logistic-regression machine-learning random-forest support-vector-machines learning-analytics

machine_learning

1. Open University Learning Analytics (OULAD) dataset

In this project, we perform four analytical tasks to help the Office of Innovative Learning (OIL) at the University of Michigan provide advisory interventions to students who are at risk of dropping a course or unlikely to succeed.

  1. Conduct Exploratory Data Analysis: Examine variation in student engagement and performance across Open University courses. What are the pass rate, the dropout rate, and the average number of clicks students log per week in each course?

  2. Build a Prediction Model of Student Success: Identify which students are likely to fail or succeed by day 100. Use demographic, engagement, and assessment data to build a predictive model. Consider how accurately the data captures a student's knowledge, skills, and abilities.

  3. Investigate Biases of Prediction Models: Identify biases and shortcomings of the prediction model. Is a model with a mean AUC-ROC score above 89% also fair? To answer this question, I used the Absolute Between-ROC Area (ABROCA) to quantify fairness in predictive models.

  4. Build a Dashboard: Communicate the data product to a specific audience, so that it helps instructors without technical training understand who is likely to succeed (or not) on day 100 of their course. Which metrics will be useful in assessing the dashboard's use and impact?
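
A minimal sketch of how the ABROCA statistic from task 3 could be computed, assuming binary labels, predicted scores, and a group attribute are available as arrays. The function names `roc_points` and `abroca`, the group labels "a" and "b", and the synthetic data are hypothetical and are not taken from the project notebooks. ABROCA is taken here as the area between the ROC curves of a baseline group and a comparison group, that is, the absolute difference in true positive rate integrated over false positive rates from 0 to 1.

```python
import numpy as np


def roc_points(y_true, scores):
    """Return (fpr, tpr) arrays for one group, from (0, 0) up to (1, 1).

    Assumes the group contains at least one positive and one negative label.
    """
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="mergesort")  # sort by descending score
    y_sorted = y_true[order]
    s_sorted = scores[order]
    # Keep one ROC point per distinct score (last index of each run of ties).
    last_of_run = np.r_[np.nonzero(np.diff(s_sorted))[0], y_sorted.size - 1]
    tps = np.cumsum(y_sorted == 1)[last_of_run]
    fps = np.cumsum(y_sorted == 0)[last_of_run]
    tpr = np.r_[0.0, tps / tps[-1]]
    fpr = np.r_[0.0, fps / fps[-1]]
    return fpr, tpr


def abroca(y_true, scores, group, baseline, comparison, grid_size=1001):
    """Area between the ROC curves of two groups on a common FPR grid."""
    y_true, scores, group = (np.asarray(a) for a in (y_true, scores, group))
    grid = np.linspace(0.0, 1.0, grid_size)
    curves = []
    for g in (baseline, comparison):
        mask = group == g
        fpr, tpr = roc_points(y_true[mask], scores[mask])
        curves.append(np.interp(grid, fpr, tpr))
    gap = np.abs(curves[0] - curves[1])
    # Trapezoidal rule, written out to avoid depending on a NumPy version.
    return float(np.sum((gap[1:] + gap[:-1]) / 2.0 * np.diff(grid)))


# Synthetic illustration only: scores are noisier for group "b" than for "a".
rng = np.random.default_rng(0)
n = 400
group = rng.choice(["a", "b"], size=n)
y = rng.integers(0, 2, size=n)
noise_scale = np.where(group == "a", 0.5, 1.5)
scores = y + rng.normal(0.0, 1.0, size=n) * noise_scale

demo_abroca = abroca(y, scores, group, baseline="a", comparison="b")
print(f"ABROCA between groups a and b: {demo_abroca:.3f}")
```

A value of 0 means the two ROC curves coincide. Larger values mean the model separates success from failure better for one group than the other at some thresholds, even when the overall AUC-ROC looks high.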

Data Source:

Kuzilek, J., Hlosta, M., & Zdrahal, Z. (2017). Open University Learning Analytics dataset. Scientific Data, 4, 170171. https://www.nature.com/articles/sdata2017171

2. Boston Housing dataset

In this case, we quantify uncertainty in the linear model by bootstrapping a model predictor 5,000 times to build a sampling distribution. The mean of the sampling distribution comes with a standard error, so it is critical to present this model uncertainty to stakeholders prior to deployment.
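
The bootstrap procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the notebook's code: "bootstrapping a model predictor" is read here as refitting a one-predictor linear model on resampled rows and recording the slope each time, the data are synthetic stand-ins rather than the Boston data, and the helper name `bootstrap_slopes` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for one predictor and the target (not the real dataset).
n = 200
x = rng.normal(6.0, 0.7, size=n)
y = -30.0 + 8.0 * x + rng.normal(0.0, 5.0, size=n)


def bootstrap_slopes(x, y, n_boot=5000, rng=None):
    """Refit y ~ x on n_boot resamples (with replacement) and collect slopes."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(x)
    slopes = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # row indices drawn with replacement
        slopes[b], _ = np.polyfit(x[idx], y[idx], deg=1)  # [slope, intercept]
    return slopes


slopes = bootstrap_slopes(x, y, n_boot=5000, rng=rng)

boot_mean = slopes.mean()
boot_se = slopes.std(ddof=1)  # bootstrap standard error of the slope
ci_low, ci_high = np.percentile(slopes, [2.5, 97.5])  # percentile interval

print(f"slope mean: {boot_mean:.3f}, standard error: {boot_se:.3f}")
print(f"95% percentile interval: [{ci_low:.3f}, {ci_high:.3f}]")
```

The standard deviation of the 5,000 bootstrap slopes serves as the standard error, and the percentile interval is one simple way to report the uncertainty to stakeholders.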

Data Source:

http://lib.stat.cmu.edu/datasets/boston

The two predictors used from the dataset are:

  1. RM: Average number of rooms per dwelling
  2. LSTAT: Percentage of lower-status population, defined as the average of the proportion of adults without some high school education and the proportion of male workers classified as labourers

Visualize Model Uncertainty:

Now, let's visualize model uncertainty using Hypothetical Outcome Plots (HOPs) and spaghetti plots, both of which simulate different possible outcomes of the model.
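
A rough sketch of how the lines for a spaghetti plot could be generated, assuming each simulated outcome is a linear fit on a bootstrap resample. The helper names `bootstrap_lines` and `plot_spaghetti`, the output file name, and the synthetic data are hypothetical; the columns of the actual CSV file are not assumed here.

```python
import numpy as np


def bootstrap_lines(x, y, x_grid, n_lines=50, rng=None):
    """Return an (n_lines, len(x_grid)) array of fitted lines, one per resample."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(x)
    lines = np.empty((n_lines, len(x_grid)))
    for i in range(n_lines):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        slope, intercept = np.polyfit(x[idx], y[idx], deg=1)
        lines[i] = intercept + slope * x_grid
    return lines


def plot_spaghetti(x, y, x_grid, lines, path="spaghetti.png"):
    """Overlay all simulated fits on the data (needs matplotlib installed)."""
    import matplotlib.pyplot as plt  # imported lazily so the rest runs without it

    fig, ax = plt.subplots()
    ax.scatter(x, y, s=12, color="grey", label="observations")
    for line in lines:
        ax.plot(x_grid, line, color="tab:blue", alpha=0.2)
    ax.set_xlabel("predictor")
    ax.set_ylabel("outcome")
    ax.legend()
    fig.savefig(path)
    plt.close(fig)


# Synthetic illustration only.
rng = np.random.default_rng(1)
x = rng.uniform(1.0, 10.0, size=30)
y = 25000.0 + 9500.0 * x + rng.normal(0.0, 6000.0, size=30)
x_grid = np.linspace(x.min(), x.max(), 50)
lines = bootstrap_lines(x, y, x_grid, n_lines=50, rng=rng)
```

Calling `plot_spaghetti(x, y, x_grid, lines)` draws all fits at once, which is the spaghetti view. A Hypothetical Outcome Plot would show the same lines one at a time as animation frames, for example with `matplotlib.animation.FuncAnimation`.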

Data Source:

File 'Salary_Data.csv' in the folder Dataset/Housing
