titanic-survival-prediction

Titanic - Learning from Disaster

Kaggle Competition: Titanic - Machine Learning from Disaster

Model Results

  • Best Decision Tree Accuracy: 0.84

Classification Report

              precision    recall  f1-score   support

           0       0.88      0.86      0.87       114
           1       0.76      0.80      0.78        65

    accuracy                           0.84       179

Confusion Matrix (rows = actual class, columns = predicted class)

[[98 16]
 [13 52]]

Metrics for the positive class (1 = survived):

  • Precision: 0.76
  • Recall: 0.80
  • F1 Score: 0.78
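As a quick consistency check, the reported metrics can be recomputed from the confusion matrix alone. The sketch below assumes scikit-learn and builds synthetic label vectors whose counts match the matrix (TN = 98, FP = 16, FN = 13, TP = 52); it does not use the project's data or model.

```python
# Consistency check: rebuild label vectors that reproduce the reported
# confusion matrix and recompute the metrics with scikit-learn.
# The vectors are synthetic; only their counts come from the README.
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

tn, fp, fn, tp = 98, 16, 13, 52

# Actual labels: 114 non-survivors (0) followed by 65 survivors (1).
y_true = [0] * (tn + fp) + [1] * (fn + tp)
# Predictions aligned with y_true, grouped by outcome within each actual class.
y_pred = [0] * tn + [1] * fp + [0] * fn + [1] * tp

cm = confusion_matrix(y_true, y_pred)  # rows = actual, columns = predicted
metrics = {
    "accuracy": accuracy_score(y_true, y_pred),
    "precision": precision_score(y_true, y_pred),  # positive class = 1
    "recall": recall_score(y_true, y_pred),
    "f1": f1_score(y_true, y_pred),
}

print(cm)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Rounded to two decimals, the recomputed values agree with the classification report, which confirms that the report and the confusion matrix describe the same predictions.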

The goal of this analysis is to predict whether a passenger survived the Titanic sinking. Before modeling, the dataset was preprocessed and feature-engineered: imputation filled missing values, standardization rescaled numeric features, and one-hot encoding converted categorical features into numeric indicator columns.

Feature Engineering and Preprocessing

The preprocessing pipeline included:

  • Imputing missing values with the median for numeric features and the most frequent value for categorical features.
  • Standardizing numeric features using StandardScaler.
  • One-hot encoding categorical features with OneHotEncoder.
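The preprocessing steps above could be expressed with scikit-learn roughly as follows. This is a minimal sketch: the ColumnTransformer layout, the numeric/categorical column lists (taken from Kaggle's train.csv column names), and the toy DataFrame are assumptions for illustration, not the project's code.

```python
# Minimal sketch of the described preprocessing, assuming scikit-learn and
# Kaggle's train.csv column names. The column lists are assumptions.
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

NUMERIC_FEATURES = ["Age", "SibSp", "Parch", "Fare"]  # assumed split
CATEGORICAL_FEATURES = ["Pclass", "Sex", "Embarked"]  # assumed split

numeric_pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # fill NaNs with the column median
    ("scale", StandardScaler()),                   # zero mean, unit variance
])
categorical_pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),  # fill NaNs with the mode
    ("onehot", OneHotEncoder(handle_unknown="ignore")),   # one indicator column per category
])

preprocessor = ColumnTransformer(
    [
        ("num", numeric_pipeline, NUMERIC_FEATURES),
        ("cat", categorical_pipeline, CATEGORICAL_FEATURES),
    ],
    sparse_threshold=0.0,  # always return a dense array
)

# Tiny hand-made frame with missing values, for illustration only.
sample = pd.DataFrame({
    "Age": [22.0, np.nan, 35.0],
    "SibSp": [1, 0, 0],
    "Parch": [0, 0, 2],
    "Fare": [7.25, 71.28, np.nan],
    "Pclass": [3, 1, 3],
    "Sex": ["male", "female", np.nan],
    "Embarked": ["S", "C", np.nan],
})

X = preprocessor.fit_transform(sample)
print(X.shape)
```

On this three-row sample the output has 4 scaled numeric columns plus 6 one-hot columns (two observed categories each for Pclass, Sex, and Embarked). Fitting the imputers and scaler inside a pipeline means their statistics are learned from whatever data the pipeline is fit on, which keeps test-set values out of training when the pipeline is fit on the training split only.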

Model Selection - Decision Tree

A Decision Tree model was chosen for its interpretability and its ability to capture non-linear relationships and interactions between features. Trained first without hyperparameter tuning, the model reached an accuracy of 83%.
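An untuned baseline tree might look like the sketch below. It assumes scikit-learn and uses synthetic data from make_classification as a stand-in for the preprocessed Titanic features, so its accuracy is unrelated to the 83% reported here. The depth-2 tree printed by export_text illustrates the interpretability argument: a fitted tree reads as nested if/else rules.

```python
# Hypothetical baseline: a DecisionTreeClassifier with default hyperparameters,
# trained on synthetic data standing in for the preprocessed Titanic features.
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=500, n_features=6, n_informative=4,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

baseline = DecisionTreeClassifier(random_state=0)  # default settings, no tuning
baseline.fit(X_train, y_train)
accuracy = accuracy_score(y_test, baseline.predict(X_test))
print(f"baseline accuracy: {accuracy:.2f}")

# A shallow tree prints as a short, human-readable set of split rules.
shallow = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X_train, y_train)
rules = export_text(shallow, feature_names=[f"x{i}" for i in range(X.shape[1])])
print(rules)
```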

Hyperparameter Tuning - Randomized Grid Search

To improve the model's performance, a randomized grid search sampled candidate settings from a hyperparameter grid covering criterion, splitter, maximum depth, minimum samples per split, minimum samples per leaf, and maximum features.
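A randomized search over those six hyperparameters could be set up with scikit-learn's RandomizedSearchCV as sketched below. The candidate values, n_iter=20, cv=5, and the synthetic data are assumptions; the project's actual grid values are not listed.

```python
# Sketch of a randomized search over the six hyperparameters named above.
# Candidate values, n_iter, cv, and the synthetic data are assumptions.
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=8, random_state=0)

param_distributions = {
    "criterion": ["gini", "entropy"],
    "splitter": ["best", "random"],
    "max_depth": [None, 3, 5, 7, 10],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
    "max_features": [None, "sqrt", "log2"],
}

search = RandomizedSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=20,           # sampled combinations, out of 540 in the full grid
    cv=5,                # 5-fold cross-validation per combination
    scoring="accuracy",
    random_state=0,
)
search.fit(X, y)

print(search.best_params_)
print(f"best CV accuracy: {search.best_score_:.2f}")
best_model = search.best_estimator_  # refit on all of X, y (refit=True by default)
```

An exhaustive grid search would fit all 540 combinations here (each 5 times); the randomized search fits only n_iter of them, trading some coverage for a much shorter run.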

The best model from the search was then evaluated on the held-out test set of 179 passengers.
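The test-set evaluation can be sketched as follows, assuming an 80/20 train_test_split. Kaggle's train.csv has 891 rows, and scikit-learn rounds a 20% test fraction up to 179 rows, which matches the support in the classification report; that the project used exactly this split is an assumption. Synthetic data (with about 62% negatives, also an assumption) and a fixed max_depth=5 tree stand in for the real features and the tuned model.

```python
# Sketch of hold-out evaluation, assuming an 80/20 split of 891 rows.
# Synthetic data and a fixed-depth tree stand in for the real pipeline.
import math

from sklearn.datasets import make_classification
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# 891 synthetic rows, roughly 62% class 0, mimicking the size of train.csv.
X, y = make_classification(n_samples=891, n_features=8, weights=[0.62],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

print(len(y_test), math.ceil(0.2 * 891))  # test size is rounded up

model = DecisionTreeClassifier(max_depth=5, random_state=0)  # stand-in for the tuned model
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

cm = confusion_matrix(y_test, y_pred)         # rows = actual, columns = predicted
report = classification_report(y_test, y_pred)
print(cm)
print(report)
```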

Model Evaluation Metrics

The following metrics, computed for the positive class (1 = survived), were used to evaluate the Decision Tree model:

  • Precision: 0.76
  • Recall: 0.80
  • F1 Score: 0.78

These metrics describe how well the model separates survivors from non-survivors. Precision is the fraction of passengers predicted to survive who actually survived, recall is the fraction of actual survivors the model correctly identified, and the F1 score is the harmonic mean of precision and recall.
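With survivors as the positive class, the confusion matrix in Model Results gives TN = 98, FP = 16, FN = 13, and TP = 52, and the reported metrics follow directly:

```latex
\begin{aligned}
\text{Precision} &= \frac{TP}{TP + FP} = \frac{52}{52 + 16} = \frac{52}{68} \approx 0.76 \\
\text{Recall}    &= \frac{TP}{TP + FN} = \frac{52}{52 + 13} = \frac{52}{65} = 0.80 \\
F_1              &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
                  = \frac{2\,TP}{2\,TP + FP + FN} = \frac{104}{133} \approx 0.78 \\
\text{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN} = \frac{52 + 98}{179} = \frac{150}{179} \approx 0.84
\end{aligned}
```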

In conclusion, the tuned Decision Tree predicts passenger survival with 84% accuracy on the test set, a one-point gain over the untuned model's 83%, with a recall of 0.80 and a precision of 0.76 for survivors.

Data Dictionary

Variable   Definition                                       Key
survival   Survival                                         0 = No, 1 = Yes
pclass     Ticket class                                     1 = 1st, 2 = 2nd, 3 = 3rd
sex        Sex
age        Age in years
sibsp      Number of siblings / spouses aboard the Titanic
parch      Number of parents / children aboard the Titanic
ticket     Ticket number
fare       Passenger fare
cabin      Cabin number
embarked   Port of Embarkation                              C = Cherbourg, Q = Queenstown, S = Southampton

Variable Notes

  • pclass: A proxy for socio-economic status (SES)

    • 1st = Upper
    • 2nd = Middle
    • 3rd = Lower
  • age: Age is fractional if less than 1. If the age is estimated, it is in the form of xx.5.

  • sibsp: The dataset defines family relations in this way...

    • Sibling = brother, sister, stepbrother, stepsister
    • Spouse = husband, wife (mistresses and fiancés were ignored)
  • parch: The dataset defines family relations in this way...

    • Parent = mother, father
    • Child = daughter, son, stepdaughter, stepson
    • Some children traveled only with a nanny, so parch = 0 for them.
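The age encoding can be made concrete with a small helper. describe_age is a hypothetical function written for illustration: it checks the below-1 rule before the xx.5 rule, so an infant's fractional age is not mistaken for an estimate, and it reports NaN as missing, since the preprocessing imputes missing numeric values.

```python
# Hypothetical helper interpreting the age notes: ages below 1 are fractional,
# and estimated ages take the form xx.5.
import math


def describe_age(age: float) -> str:
    """Classify a Titanic age value by how it was recorded."""
    if math.isnan(age):
        return "missing"
    if age < 1:
        return "infant (fractional age)"
    if age % 1 == 0.5:
        return "estimated"
    return "exact"


for value in [0.42, 28.5, 22.0, float("nan")]:
    print(value, "->", describe_age(value))
```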

Contributors

  • dneshp
