thom-j-h / capstone2_harvard_edx

Testing 21 models (machine learning ensemble) on the WDBC data set

Capstone2_Harvard_edX

In partial fulfillment of the requirements for the Harvard edX: Data Science Professional Certificate, this repository contains the following files:

And the original Breast Cancer Wisconsin (Diagnostic) data set (WDBC), available from the UCI Machine Learning Repository, Center for Machine Learning and Intelligent Systems, University of California, Irvine:

The script (and RMD) import the data set from the UCI source, so there is no need to download it first.

Please note:
The RMD will take at least 40 minutes, and more likely over an hour, to run. It also requires that the user has installed a number of ML packages for R, consistent with those used for ensemble modelling in the Harvard edX course on Machine Learning.

The script largely runs silently (its output is captured). Any warnings or error messages may be safely ignored. Not every model works perfectly under each testing condition/variation, which is the point of testing the various models against similar controlled conditions.

Thank you,
Thom J. Haslam
March 12, 2019

 
Run Two: Visual Overview

Update: 2019-03-14

I thank the Harvard edX peer and staff reviewers for their encouraging and helpful comments. One suggestion was to change the loading procedure in the RMD from

  • library(tidyverse)
  • library(caret) # etc

To a conditional load, which will ensure that if someone is missing the needed packages, the packages will be installed from CRAN so that the RMD runs without terminating with an error. (Please see Packages_Required_Set_up.R.)
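
As a rough illustration of that suggestion, and not necessarily what Packages_Required_Set_up.R actually contains, a conditional load might look like the sketch below. The helper names `missing_packages` and `load_or_install` and the CRAN mirror URL are assumptions made for this example; only `tidyverse` and `caret` come from the loading code quoted above.

```r
# Hypothetical helper: return the packages in `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Hypothetical helper: install anything missing from CRAN, then attach
# every package in `pkgs`. The mirror URL is an assumption.
load_or_install <- function(pkgs, repos = "https://cloud.r-project.org") {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) {
    install.packages(to_install, repos = repos)
  }
  for (pkg in pkgs) {
    library(pkg, character.only = TRUE)
  }
  invisible(pkgs)
}

# Example call (commented out because it may download packages):
# load_or_install(c("tidyverse", "caret"))
```

Checking with `requireNamespace()` before installing means the first run pays the installation cost once, and later runs simply attach the packages.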

This is an excellent suggestion, so I will update the script and the RMD (by 15 March 2019) for future use/reference. I will also take one last crack at fixing any typos or infelicities of expression in the report, even though the project has received full marks (50 out of 50) and for all practical purposes is done: certificate earned!

Otherwise, I will leave this Machine Learning project up as an archive, as part of what I hope will be a growing R for Data Science portfolio.

 
PCA Graphs 1-2, 4-5
