lrjoe / ucsd_dse200_finalproject

The purpose of this project, for our first-quarter Python class, was to merge dataframes and use features of both datasets to predict a target and, more importantly, to show our approaches and convey our findings in well-documented code.

DSE 200 Final Project

Horse Racing Odds: Merging Dataframes to Predict Odds on the Racetrack

Leslie's final project with horse racing data from Kaggle

About this project

The purpose of this project, for our first-quarter Python class, was to merge dataframes and use features of both datasets to predict a target and, more importantly, to show our approaches and convey our findings in well-documented code.

The data I chose were "Triple Crown Races (2005 - 2019)" by Joseph (jmolitoris) and "Horse Racing Data from 1990" by Nikolay Kashavkin (hwaitt), both on Kaggle. I merged them on either the horse trainers' names or the jockeys' names. Both resulting dataframes were run through the three machine learning models we had learned to use: linear regression, elastic net, and decision tree regressor. The results were reported as mean squared errors (MSE) and coefficients of determination ($r^2$), the evaluation metrics we had used in class thus far.
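As a rough illustration of that workflow, here is a minimal sketch that merges two dataframes on a trainer-name column and scores the three model types with MSE and $r^2$. It is not the notebook's actual code: the data are synthetic, and the column names (`trainer`, `post_position`, `avg_finish`, `win_rate`, `odds`), the one-row-per-trainer history table, the train/test split, and the model hyperparameters are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import ElasticNet, LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Synthetic stand-ins for the two Kaggle datasets (all columns are hypothetical).
trainers = [f"trainer_{i}" for i in range(20)]
triple_crown = pd.DataFrame({
    "trainer": rng.choice(trainers, size=200),
    "post_position": rng.integers(1, 15, size=200),
    "odds": rng.uniform(1.0, 50.0, size=200),
})
racing_history = pd.DataFrame({
    "trainer": trainers,
    "avg_finish": rng.uniform(1.0, 10.0, size=20),
    "win_rate": rng.uniform(0.0, 0.4, size=20),
})

# An inner merge keeps only rows whose trainer name appears in both dataframes.
merged = triple_crown.merge(racing_history, on="trainer", how="inner")

# Features come from both datasets; the target is the odds column.
X = merged[["post_position", "avg_finish", "win_rate"]]
y = merged["odds"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    "linear regression": LinearRegression(),
    "elastic net": ElasticNet(alpha=0.1),
    "decision tree regressor": DecisionTreeRegressor(max_depth=4, random_state=0),
}

# Fit each model and record its MSE and coefficient of determination.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    results[name] = {
        "mse": mean_squared_error(y_test, predictions),
        "r2": r2_score(y_test, predictions),
    }

for name, scores in results.items():
    print(f"{name}: MSE={scores['mse']:.2f}, R^2={scores['r2']:.3f}")
```

Merging on jockey names would follow the same pattern with a different key column. Because the synthetic features here are random, the scores only demonstrate the mechanics and say nothing about the real datasets.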

While other classes have since taught us other machine learning models, methods of determining model robustness, and standard ways of organizing a project, this project was a good opportunity to show my understanding of Jupyter notebooks and Python code, with an emphasis on the pandas, numpy, matplotlib, and seaborn libraries.

The notebook was submitted for school on December 3rd, 2021.

(Note: In the "Final Project_Fall2021.ipynb" notebook, grey font indicates text written by the instructor.)

Notebook outline

  • Finding datasets to use
    • "Triple Crown Races (2005 - 2019)" and "Horse Racing Data from 1990"
      • by Joseph (jmolitoris) and by Nikolay Kashavkin (hwaitt) on Kaggle
      • Listed column descriptions
  • Reading in the datasets
    • Read in both datasets twice: once for the trainer-name merge and once for the jockey-name merge
  • Cleaning and understanding the data
    • Examining the columns
      • Removed columns unnecessary for my analysis (duplicate, object, etc.)
      • Converted object types to int and float types
      • Removed outliers
      • Graphed distributions and relationships between columns
    • Took note of potential correlations between columns
  • Can the odds be predicted with this data?
    • Merging dataframes along trainer or jockey names
      • Performed three times each with differing columns (features) included
    • Setting up the machine learning models
      • Linear regression
      • Elastic net
      • Decision tree regressor
    • Executing the ML models
      • Modeling based on trainer names
      • Modeling based on jockey names
  • Observing the results
    • Visual (plotted) observations of the predicted vs actual output
    • Mean squared errors and coefficients of determination
  • Conclusions
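
The cleaning steps named in the outline (converting object columns to numeric types and removing outliers) could look roughly like the sketch below. The outline does not say how outliers were identified, so the 1.5 × IQR rule here is an assumption, as are the helper names `to_numeric_columns` and `drop_iqr_outliers`, the column names, and the sample values.

```python
import pandas as pd

# Hypothetical raw data: numeric values stored as strings (object dtype).
raw = pd.DataFrame({
    "odds": ["2.5", "4.0", "3.5", "5.0", "4.5", "250.0", "n/a"],
    "weight": ["126", "126", "121", "124", "126", "126", "122"],
})


def to_numeric_columns(df, columns):
    """Return a copy with the given columns converted to a numeric dtype.

    Values that cannot be parsed become NaN instead of raising an error.
    """
    out = df.copy()
    for col in columns:
        out[col] = pd.to_numeric(out[col], errors="coerce")
    return out


def drop_iqr_outliers(df, column, k=1.5):
    """Keep only rows whose value lies within k * IQR of the quartiles."""
    q1 = df[column].quantile(0.25)
    q3 = df[column].quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return df[df[column].between(lower, upper)]


clean = to_numeric_columns(raw, ["odds", "weight"])
clean = clean.dropna(subset=["odds"])     # drop rows whose odds could not be parsed
clean = drop_iqr_outliers(clean, "odds")  # drop the extreme odds value

print(clean.dtypes)
print(clean)
```

Coercing unparseable values to NaN before filtering keeps the type conversion and the outlier rule as separate, inspectable steps.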

How to run:

  1. Download this repository
  2. Start JupyterLab
  3. Run the Jupyter notebook "Final Project_Fall2021.ipynb"
