jammaladeyemi / cardiovascularheartdisease

This is a separate personal repo. If you want to see the full work check: https://github.com/JammalAdeyemi/MSc-Team16

cardiovascularheartdisease's Introduction

CVD

Problem Statement

Cardiovascular disease (CVD) is a leading cause of mortality globally, and early identification and management of risk factors can significantly reduce its burden. The UCI heart dataset has been widely used in recent studies to develop predictive models for CVD. However, its limited size and the lack of diversity in its sources raise concerns about how well models trained on it generalize. In addition, joining datasets from different repositories led to the removal of a substantial amount of data because of duplicates. To address these issues, I propose using a new dataset that includes objective medical information, results of medical examinations, and subjective information given by patients. The objective is to develop a model that accurately predicts the risk of CVD, expressed as a percentage, using this new dataset.

Evaluation

The evaluation metric is the F1 score, which ranges from 0 (total failure) to 1 (perfect score). The closer a model's score is to 1, the better the model.

  1. F1 Score: A performance score that combines precision and recall as their harmonic mean. Formula: 2 * Precision * Recall / (Precision + Recall)
  2. Precision: The proportion of items identified as positive that are actually positive. Formula: TP / (TP + FP)
  3. Recall / Sensitivity / True Positive Rate (TPR): The proportion of actual positives that are correctly identified as positive. Formula: TP / (TP + FN)

Where:

  • TP = True Positive
  • FP = False Positive
  • TN = True Negative
  • FN = False Negative
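
As a rough illustration of the formulas above, the sketch below computes precision, recall, and F1 from binary labels in plain Python. The function names and the toy labels are made up for this example and are not taken from the project notebooks; returning 0.0 when a denominator is zero is an assumed convention.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (1 = CVD, 0 = no CVD)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def precision_recall_f1(tp, fp, fn):
    """Apply the formulas above; return 0.0 where a denominator is zero (assumed convention)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical labels, for illustration only (not project data).
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 1, 0, 0]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
print(f"TP={tp} FP={fp} TN={tn} FN={fn}")
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Note that TN does not appear in any of the three formulas, so F1 ignores how many negatives are classified correctly.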

Folders

The project has three folders:

  1. Data: Contains all the datasets used in the project.
  2. Models: Contains the final models that will be used in the web interface to make predictions on unseen data.
  3. Notebooks: Contains three sub-folders, each with a self-explanatory name.

Tools and Technologies

The project was implemented using Python and the following libraries:

  1. Pandas and NumPy for data manipulation and analysis.
  2. Matplotlib and Seaborn for data visualization.
  3. PyCaret to automate the machine learning workflow and get a general overview of which models to use on the dataset.
  4. Scikit-learn for training and evaluating the machine learning models.
  5. XGBoost and LightGBM for training models on the dataset.

I hope that this project can contribute to the field of CVD risk prediction and encourage further research in this area.
