Welcome to the sleep-data-analysis repository

This is our Mini-Project for SC1015 (Introduction to Data Science and Artificial Intelligence).

Contributors

shinghao (Soh Shing Hao) - Data Preparation & Cleaning, Exploratory Analysis, Presentation
leechunyang98 (Lee Chun Yang) - Data Resampling, Machine Learning Models
czhi-heng (Cheung Zhi Heng) - Research, Data Analysis, Video Recording and Editing, Presentation

Practical Motivation

We are often told that we need at least 7 hours of sleep to be well-rested. However, we often still feel tired and insufficiently rested even after sleeping for at least 7 hours. Are variables other than the duration of our sleep affecting our sleep quality?

Problem Definition

Can we predict a person's sleep quality using information on their sleep cycles, the time they go to sleep, and their activities throughout the day?

We will only use data where the person has slept for at least 7 hours.
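The 7-hour filter can be sketched roughly as below. This is a minimal illustration, not the project's actual code: it assumes the export has a "Time in bed" column holding durations as "H:MM" strings, treats time in bed as a stand-in for time slept, and uses made-up rows and hypothetical helper names (`time_in_bed_minutes`, `filter_min_sleep`).

```python
def time_in_bed_minutes(text):
    """Convert an 'H:MM' duration string such as '8:32' to minutes."""
    hours, minutes = text.strip().split(":")
    return int(hours) * 60 + int(minutes)


def filter_min_sleep(records, min_hours=7):
    """Keep only records whose time in bed is at least min_hours."""
    threshold = min_hours * 60
    return [r for r in records if time_in_bed_minutes(r["Time in bed"]) >= threshold]


# Made-up rows for illustration only; they are not taken from the dataset.
# The "Time in bed" column name and its format are assumptions.
sample = [
    {"Time in bed": "8:32", "Sleep quality": "100%"},
    {"Time in bed": "6:44", "Sleep quality": "72%"},
    {"Time in bed": "7:00", "Sleep quality": "85%"},
]

kept = filter_min_sleep(sample)
for row in kept:
    print(row["Time in bed"], row["Sleep quality"])
```

Rows of exactly 7 hours are kept, matching the "at least 7 hours" wording above.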

Dataset Source

The dataset we will be using is the Sleep Data dataset created and shared by Dana Diotte on Kaggle. The dataset consists only of Dana Diotte's own sleep information, which he acquired between 2014 and 2018 and collected through the Sleep Cycle app from Northcube on iOS. The dataset can be found here: https://www.kaggle.com/danagerous/sleep-data

Machine Learning Models Used

Linear Regression
Random Forest
Polynomial Regression

Sampling Methods

Random Sampling
K-Folds
Repeated K-Folds
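To show how K-Folds and Repeated K-Folds differ, here is a small standard-library sketch of the index splitting. It is an illustration only and is not taken from the project's notebooks; the function names are hypothetical. In practice, scikit-learn's `KFold` and `RepeatedKFold` classes in `sklearn.model_selection` provide the same idea.

```python
import random


def k_fold_indices(n_samples, k, seed=None):
    """Yield (train, test) index lists for k folds over n_samples rows."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def repeated_k_fold_indices(n_samples, k, repeats, seed=0):
    """Run k-fold several times, reshuffling with a different seed each time."""
    for r in range(repeats):
        yield from k_fold_indices(n_samples, k, seed=seed + r)


splits = list(k_fold_indices(10, 5, seed=42))
repeated = list(repeated_k_fold_indices(10, 5, repeats=3))
print(len(splits), "splits from one pass;", len(repeated), "splits from three passes")
```

Every row appears in exactly one test fold per pass, so each pass evaluates the model on all of the data once. Repeating the pass with different shuffles gives more scores to average, which can help when the dataset is small.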

Conclusion

More sleep cycles and having a stressful day are associated with better sleep quality
The remaining variables show little correlation with sleep quality
Of the 3 models we used, Linear Regression produced the best results
However, since all 3 models produced low accuracy, we conclude that sleep quality cannot be accurately predicted with just sleep cycle, time of going to sleep and lifestyle. To accurately predict sleep quality, other variables or models have to be explored.

Recommendations

Create a more balanced response variable through methods such as resampling. In our data, the response variable, Sleep quality, is skewed towards the higher values (representing higher sleep quality).
Collect more data, as the current dataset may be too small to measure the actual accuracy of the models.
Since this data covers only one person, it may be biased, which makes accurate analysis difficult. It would be better to collect sleep information from a specific range or group of people, or from a research centre or sleep centre, which would generate more interesting insights.
Since the variables we used show little correlation with sleep quality, we recommend considering other variables that could also affect sleep quality.

What Did We Learn?

Random Forest Model
Polynomial Regression
Encoding Categorical data with Label Encoding
Sampling data with K-Folds and repeated K-Folds
Representing data as a time series
Experimented with other machine learning models - Gradient Boosting Decision Tree, Histogram-Based Gradient Boosting, AdaBoost, K-Nearest Neighbour
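As a reminder of what Label Encoding does, the sketch below maps each distinct category to an integer code. It is a plain-Python illustration with hypothetical function names and made-up note values, not the project's code; scikit-learn's `LabelEncoder` behaves similarly, assigning codes in sorted order of the class labels.

```python
def fit_label_encoding(values):
    """Map each distinct category to an integer code, in sorted order."""
    return {category: code for code, category in enumerate(sorted(set(values)))}


def encode(values, mapping):
    """Replace each category with its integer code."""
    return [mapping[value] for value in values]


# Made-up sleep notes for illustration only; they are not taken from the dataset.
notes = ["Stressful day", "Drank coffee", "Worked out", "Stressful day"]

mapping = fit_label_encoding(notes)
encoded = encode(notes, mapping)
print(mapping)
print(encoded)
```

The integer codes carry an arbitrary order, which linear models may read as a ranking, so this encoding suits tree-based models such as Random Forest better than regression models.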
