Cancer Mortality Rates Model

Overview

The "Cancer Mortality Rates Model" project is a comprehensive data science and machine learning endeavor aimed at understanding the relationship between socioeconomic status and cancer mortality rates in the United States from 2010 to 2016. This project combines data analysis, preprocessing, modeling, and evaluation to derive valuable insights and build predictive models for assessing the impact of various attributes on cancer mortality rates.

For more information, see our Presentation and Report.

Dataset

The project uses a dataset aggregated from the data.world website, including data from the American Community Survey. It covers US counties and contains socioeconomic indicators alongside cancer mortality rates. For detailed information on the dataset, data sources, and preprocessing, please refer to the Dataset.

Usage

To use the project, follow the Jupyter notebooks in the Model directory or run the Python script at Model Script. The notebooks guide you through the entire data analysis and modeling process. You can also use the Google Colab notebook, which already includes the results.

Used Libraries

We used several Python libraries, including pandas, matplotlib, seaborn, NumPy, SciPy, and statsmodels.

Data Exploration and Analysis

This exploration helped us identify the nature of each attribute and determine which variables are quantitative and which are categorical. We found that there are only two categorical variables: the city and the district from which the data was collected. Through visualization, we also noticed the discrepancy between each county's real and expected death rates.

Data Cleaning

Some attributes have null values, so we tested four methods to determine which works best for our data:

  1. Filling all null values with zero
  2. Removing the rows that have null values
  3. Filling null values with the mean of the values that are present in their column
  4. Applying the forward fill method
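
The four strategies above can be compared side by side with pandas. The following is a minimal sketch on a small made-up frame; the column names and values are hypothetical and are not taken from the project's dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names are illustrative only.
df = pd.DataFrame({
    "median_income": [52000.0, np.nan, 61000.0, 48000.0],
    "poverty_percent": [12.5, 18.0, np.nan, 15.5],
})

filled_zero = df.fillna(0)            # 1. replace nulls with zero
dropped = df.dropna()                 # 2. drop rows that contain any null
filled_mean = df.fillna(df.mean())    # 3. replace nulls with the column mean
filled_ffill = df.ffill()             # 4. carry the previous row's value forward

# Report the resulting shape and the number of nulls left by each strategy.
for name, result in [
    ("zero", filled_zero),
    ("drop", dropped),
    ("mean", filled_mean),
    ("ffill", filled_ffill),
]:
    print(name, result.shape, int(result.isna().sum().sum()))
```

Note that forward fill cannot fill a null in the first row, since there is no earlier value to carry forward, and that filling with zero can distort attributes for which zero is not a plausible value.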

Handling the outliers

Visualizing the data showed that it contained many outliers that need to be handled.
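
This README does not say which outlier rule the project applied. As an illustration only, the sketch below uses the common interquartile range (IQR) rule, which is an assumption and may differ from the method in the notebooks; the `remove_iqr_outliers` helper and the `death_rate` column are hypothetical.

```python
import pandas as pd


def remove_iqr_outliers(df: pd.DataFrame, column: str, k: float = 1.5) -> pd.DataFrame:
    """Keep rows whose value in `column` lies within [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = df[column].quantile(0.25)
    q3 = df[column].quantile(0.75)
    iqr = q3 - q1
    # `between` is inclusive on both ends by default.
    mask = df[column].between(q1 - k * iqr, q3 + k * iqr)
    return df[mask]


# Hypothetical toy data with one obvious outlier (400.0).
rates = pd.DataFrame({"death_rate": [160.0, 170.0, 175.0, 180.0, 185.0, 190.0, 400.0]})
cleaned = remove_iqr_outliers(rates, "death_rate")
print(len(rates), "->", len(cleaned))
```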

Feature Engineering

Feature engineering was a crucial step in selecting and engineering attributes for our machine learning models. After removing the outliers, we chose the attributes believed to most strongly affect the target. First, we used pandas to calculate the Pearson correlation coefficient between each pair of numeric attributes. Then, we plotted these correlations as a heatmap using the seaborn Python package.
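
The correlation step described above can be sketched as follows. The data is synthetic and the column names (`median_income`, `poverty_percent`, `death_rate`) are hypothetical stand-ins for the project's attributes; the plotting libraries are imported inside the helper so that the correlation step runs without them.

```python
import numpy as np
import pandas as pd

# Synthetic toy data; the real project uses county-level attributes.
rng = np.random.default_rng(0)
income = rng.normal(50_000, 8_000, size=200)
df = pd.DataFrame({
    "median_income": income,
    "poverty_percent": 40 - income / 2_000 + rng.normal(0, 1, size=200),
    "death_rate": 250 - income / 1_000 + rng.normal(0, 5, size=200),
})

# Pairwise Pearson correlation, restricted to the numeric columns.
corr = df.select_dtypes(include="number").corr(method="pearson")

# Rank features by absolute correlation with the (hypothetical) target column.
ranked = corr["death_rate"].drop("death_rate").abs().sort_values(ascending=False)
print(ranked)


def plot_heatmap(corr_matrix: pd.DataFrame) -> None:
    """Draw the correlation matrix as an annotated heatmap."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    sns.heatmap(corr_matrix, annot=True, cmap="coolwarm", vmin=-1, vmax=1)
    plt.show()
```

Calling `plot_heatmap(corr)` displays the matrix; attributes with a large absolute correlation to the target are natural candidates to keep.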

Model Building

We experimented with various machine learning algorithms and techniques to build predictive models for cancer mortality rates. We used the scikit-learn package to fit a multivariable regression model that predicts the target death rate from all other features.
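
A multivariable linear regression of this kind might look like the sketch below. It assumes scikit-learn's `LinearRegression` with a simple train/test split; the features, coefficients, and data are synthetic and hypothetical, not the project's actual model or results.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic features and target; names and coefficients are illustrative only.
rng = np.random.default_rng(42)
n = 300
X = pd.DataFrame({
    "median_income": rng.normal(50, 8, size=n),    # in thousands of dollars
    "poverty_percent": rng.normal(15, 4, size=n),
})
y = 200 - 0.8 * X["median_income"] + 1.5 * X["poverty_percent"] + rng.normal(0, 3, size=n)

# Hold out 20% of the rows to check how the model generalizes.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = LinearRegression()
model.fit(X_train, y_train)

print(dict(zip(X.columns, model.coef_)))
print("R^2 on held-out data:", model.score(X_test, y_test))
```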

Evaluation

We evaluated model performance using appropriate metrics and conducted hypothesis testing to assess the significance of our findings.

Contributors

zoz-hf

Acknowledgments

We extend our gratitude to our supervisors, Dr. Ibrahim and Eng. Merna, for their invaluable guidance and support throughout this project.
