ML-Practice-On-Fertility-Data-Set

Practicing Various Models on Fertility Data Set

Abstract

100 volunteers provided a semen sample that was analyzed according to the WHO 2010 criteria. Sperm concentration is related to socio-demographic data, environmental factors, health status, and life habits.

  • Data Set Characteristics: Multivariate
  • Attribute Characteristics: Real
  • Area: Life
  • Number Of Attributes: 10
  • Number Of Records: 100

To get the data set for yourself click here
To know more about the data click here

Data Set Attribute Information

  • Season in which the analysis was performed. 1) winter, 2) spring, 3) summer, 4) fall. (-1, -0.33, 0.33, 1)
  • Age at the time of analysis. 18-36 (0, 1)
  • Childhood diseases (i.e., chicken pox, measles, mumps, polio) 1) yes, 2) no. (0, 1)
  • Accident or serious trauma 1) yes, 2) no. (0, 1)
  • Surgical intervention 1) yes, 2) no. (0, 1)
  • High fevers in the last year 1) less than three months ago, 2) more than three months ago, 3) no. (-1, 0, 1)
  • Frequency of alcohol consumption 1) several times a day, 2) every day, 3) several times a week, 4) once a week, 5) hardly ever or never (0, 1)
  • Smoking habit 1) never, 2) occasional, 3) daily. (-1, 0, 1)
  • Number of hours spent sitting per day. 1-16 (0, 1)
  • Output: Diagnosis normal (N), altered (O)
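
The encoded values in parentheses can be mapped back to their labels with a few lookups. The sketch below is illustrative only: it assumes the codes are assigned in the order the categories are listed (so winter is -1 and fall is 1) and that age is scaled linearly from 18-36 onto 0-1. Neither assumption is confirmed by this README, and the names SEASON, FEVER, SMOKING and scale_age are hypothetical.

```python
# ASSUMPTION: codes follow the order in which the categories are listed above.
SEASON = {-1.0: "winter", -0.33: "spring", 0.33: "summer", 1.0: "fall"}
FEVER = {
    -1: "less than three months ago",
    0: "more than three months ago",
    1: "no",
}
SMOKING = {-1: "never", 0: "occasional", 1: "daily"}


def scale_age(age, lo=18, hi=36):
    """Map an age in years onto [0, 1]. ASSUMPTION: linear min-max scaling."""
    return (age - lo) / (hi - lo)


def unscale_age(value, lo=18, hi=36):
    """Inverse of scale_age: map a value in [0, 1] back to years."""
    return lo + value * (hi - lo)


print(SEASON[-0.33], SMOKING[1], scale_age(27))
```

Under these assumptions, an age of 27 sits at the midpoint of the 18-36 range and is therefore encoded as 0.5.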

Methodology

Four models have been used so far:

  • KNN
  • LogisticRegression
  • SVC
  • ANN

Procedure

  1. Data was loaded into Python using the Pandas library.
  2. Data was split into training and testing sets using the train_test_split function from sklearn's model_selection module.
  3. For KNN, a loop was used to try n_neighbors values from 1 to 40. The mean error rate was plotted against n_neighbors; every value greater than 1 gave the same lowest error, so the smallest such value, n_neighbors = 2, was chosen.
  4. For SVC, GridSearchCV from sklearn's model_selection module was used to find the best combination of the SVC parameters C and gamma.
    The best combination of parameters was found to be 'C': 0.1 & 'gamma': 0.1
  5. Using this best estimator, the SVC model was fit on the training data and evaluated on the test data.
  6. For Logistic Regression, the model was simply fit on the training data and evaluated on the test data.
  7. The models were evaluated with sklearn's metrics module. The classification_report and confusion_matrix functions were used to check the accuracy of each model.
  8. For ANN, a simple shallow neural network was built with one hidden layer. The Keras library was used with the 'accuracy' metric.
  9. All of the models reached the same accuracy score of 91%.
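
Steps 2 to 7 can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code. It runs on synthetic stand-in data (100 rows, 9 inputs, an assumed 12 positive labels) rather than the real file, and the parameter grid, cv=3, random_state and the 33-row test split are assumptions chosen for the example. The Keras step is left out to keep the sketch short.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# ASSUMPTION: synthetic stand-in with the same shape as the data set
# (100 records, 9 input attributes, few positive labels).
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 9))
y = np.zeros(100, dtype=int)
y[:12] = 1  # assumed class balance, for illustration only
rng.shuffle(y)

# Step 2: hold out 33 records for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=33, random_state=42
)

# Step 3: record the test error rate for each candidate n_neighbors.
error_rates = []
for k in range(1, 40):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    error_rates.append(np.mean(knn.predict(X_test) != y_test))

# Steps 4-5: grid-search C and gamma, then predict with the best estimator.
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [1, 0.1, 0.01, 0.001]}  # assumed grid
grid = GridSearchCV(SVC(), param_grid, cv=3)
grid.fit(X_train, y_train)
svc_pred = grid.predict(X_test)

# Step 6: logistic regression with default settings.
logreg = LogisticRegression().fit(X_train, y_train)
logreg_pred = logreg.predict(X_test)

# Step 7: confusion matrix and per-class report for each model.
for name, pred in [("SVC", svc_pred), ("LogisticRegression", logreg_pred)]:
    print(name, grid.best_params_ if name == "SVC" else "")
    print(confusion_matrix(y_test, pred, labels=[0, 1]))
    print(classification_report(y_test, pred, zero_division=0))
```

Passing labels=[0, 1] to confusion_matrix keeps the matrix 2x2 even when one class is missing from the test split, and zero_division=0 avoids warnings when a class is never predicted.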

Final Remarks

What I think went wrong here is that the test split does not contain enough Class 1 samples, as the confusion matrix shows:

n=33            Pred Class 0  Pred Class 1
Actual Class 0  30            3
Actual Class 1  0             0

So these models might not be good at predicting true Class 1 cases. Also, the data set is very small.
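
The numbers behind this remark can be checked directly from the confusion matrix. The short sketch below uses only the four counts reported above; the variable names are illustrative.

```python
# Rows are actual classes, columns are predicted classes (counts from above).
cm = [[30, 3],
      [0, 0]]

total = sum(sum(row) for row in cm)          # number of test samples
accuracy = (cm[0][0] + cm[1][1]) / total     # correct predictions / total

# Recall for Class 1 needs at least one actual Class 1 sample in the test set.
support_class1 = sum(cm[1])
recall_class1 = cm[1][1] / support_class1 if support_class1 else None

print(f"n={total}, accuracy={accuracy:.2f}, class 1 support={support_class1}")
```

The accuracy works out to 30/33, which rounds to the reported 91%, while recall for Class 1 cannot be computed at all because the test split contains no Class 1 samples. One possible mitigation, not used in the original procedure, would be to pass stratify=y to train_test_split so that both classes appear in the test split.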
