
ttaylor14 / pybaseball-database


Pybaseball Database: Using the Pybaseball Python Package to create a historical baseball database for any use.

Python 88.29% R 11.71%
python pybaseball baseball database statistics python-3 historical-data statcast-data lahman-baseball-database lahman

Pybaseball Database

Step 1:

Required Python Packages:

  • pandas
  • pybaseball
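
Before running the scripts, it can help to confirm that both packages are importable. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical and not part of the repository.

```python
from importlib.util import find_spec

# Packages listed in Step 1 of this README.
REQUIRED_PACKAGES = ["pandas", "pybaseball"]


def missing_packages(names):
    """Return the package names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED_PACKAGES)
    if missing:
        print("Missing packages, install with: pip install " + " ".join(missing))
    else:
        print("All required packages are available.")
```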

Step 2:

Run file: 'Baseball-Data.py'

Currently set to run for the years 1950-2020

  • Can easily be adjusted at the top of the file
  • Takes approximately 1 hour to run as currently written
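
As an illustration of what an adjustable year range at the top of a script might look like, here is a small sketch. The names `START_YEAR`, `END_YEAR`, and `season_range` are assumptions made for this example; the actual variable names in 'Baseball-Data.py' may differ.

```python
# Hypothetical settings block; edit these two values to change the range.
START_YEAR = 1950
END_YEAR = 2020


def season_range(start_year, end_year):
    """Return every season from start_year through end_year, inclusive."""
    if start_year > end_year:
        raise ValueError("start_year must not be later than end_year")
    return list(range(start_year, end_year + 1))


SEASONS = season_range(START_YEAR, END_YEAR)
```

Narrowing the range is the simplest way to shorten the run, since each season is pulled separately.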

This file completes the following:

  • Creates Data Directories
  • Downloads the Lahman Database
  • Pulls all Batting Stats
  • Pulls all Pitching Stats
  • Pulls all Team Batting Stats
  • Pulls all Team Pitching Stats
  • Pulls all Statcast Exit Velocity Data
  • Pulls Top Prospect Data
  • Pulls FanGraphs Batting Data
  • Pulls FanGraphs Pitching Data
  • Pulls FanGraphs Team Batting Data
  • Pulls FanGraphs Team Pitching Data
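
The general shape of these steps (create a year directory, pull each table, write it to disk) can be sketched as follows. This is not the code from 'Baseball-Data.py': the function names `pull_season` and `default_fetchers`, the file naming, and the particular pybaseball calls chosen are all assumptions for illustration. The pybaseball import is kept inside `default_fetchers` so the rest of the sketch can be read and tried without the package installed.

```python
from pathlib import Path


def default_fetchers():
    """Map output names to pybaseball calls (an assumed selection, not the script's)."""
    import pybaseball as pyb  # imported lazily; requires pybaseball to be installed

    return {
        "fangraphs_batting": pyb.batting_stats,
        "fangraphs_pitching": pyb.pitching_stats,
        "team_batting": pyb.team_batting,
        "team_pitching": pyb.team_pitching,
    }


def pull_season(year, fetchers, data_dir="data"):
    """Fetch every table for one season and write each to data_dir/<year>/."""
    year_dir = Path(data_dir) / str(year)
    year_dir.mkdir(parents=True, exist_ok=True)  # creates the data directories
    written = []
    for name, fetch in fetchers.items():
        table = fetch(year)  # each fetcher takes the season as its first argument
        path = year_dir / f"{name}_{year}.csv"
        table.to_csv(path, index=False)
        written.append(path)
    return written


# Usage (needs network access and pybaseball installed):
# for year in range(1950, 2021):
#     pull_season(year, default_fetchers())
```

Passing the fetchers in as a dictionary keeps the download loop independent of any one data source, so a failing source can be dropped or swapped without touching the loop.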

Files are organized into folders by year.

Each year folder contains:

  • All Batting Stats
  • All Pitching Stats
  • All Batting and Pitching Stats Combined
  • All Team Batting Stats
  • All Team Pitching Stats

After 2007:

  • Statcast Exit Velocity Data is added
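
The per-year layout described above can be expressed as a small lookup. This is a sketch only: the file names are hypothetical, and the 2008 cutoff is taken from the "After 2007" note in this README.

```python
# First season for which the exit velocity file is added (per the note above).
EXIT_VELOCITY_FIRST_YEAR = 2008

# Hypothetical file stems for the five tables every year folder contains.
BASE_FILES = ["batting", "pitching", "combined", "team_batting", "team_pitching"]


def files_for_year(year):
    """Return the CSV file names expected in one year folder."""
    names = list(BASE_FILES)
    if year >= EXIT_VELOCITY_FIRST_YEAR:
        names.append("exit_velocity")
    return [f"{name}_{year}.csv" for name in names]
```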

Data is also combined into files covering all the selected years.

These files include:

  • All Batting Stats
  • All Pitching Stats
  • All Combined Stats
  • All Exit Velocity Stats
  • All FanGraphs Stats
  • All Team Batting Data
  • All Team Pitching Data
  • Top Prospect Data
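
One way to build an all-years file from the per-year folders is to read each year's CSV and concatenate the results with pandas. The sketch below assumes a hypothetical layout of `<data_dir>/<year>/batting_<year>.csv`; the function name `combine_years` and the file pattern are not taken from the repository.

```python
from pathlib import Path

import pandas as pd


def combine_years(data_dir, pattern, years):
    """Concatenate one table across seasons, skipping years with no file."""
    frames = []
    for year in years:
        path = Path(data_dir) / str(year) / pattern.format(year=year)
        if not path.exists():
            continue  # e.g. exit velocity files do not exist for early seasons
        frames.append(pd.read_csv(path, low_memory=False))
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)


# Usage:
# all_batting = combine_years("data", "batting_{year}.csv", range(1950, 2021))
# all_batting.to_csv("data/all_batting.csv", index=False)
```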

Data to include later:

[ ] - Team Standings (currently pulled as a long list; needs to be transformed into a DataFrame)

[ ] - Amateur Draft (code works, but more research is needed to pull all data) (additional file?)

[ ] - Fielding Data (the pybaseball fielding code was not working; may need to use FanGraphs or Lahman)

[ ] - Add FanGraphs data to each year folder?

[ ] - Possibly add Lahman data to each year folder?
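
For the Team Standings item, one possible approach is sketched below. It assumes the "long list" is a list of per-division tables, which is what pybaseball's `standings(season)` is documented to return. The helper name `flatten_standings` and the added `season` and `division_index` columns are hypothetical.

```python
import pandas as pd


def flatten_standings(division_tables, season):
    """Stack a list of per-division DataFrames into one DataFrame."""
    frames = []
    for index, table in enumerate(division_tables):
        frame = table.copy()
        # Record which list entry (division) and season each row came from.
        frame.insert(0, "division_index", index)
        frame.insert(0, "season", season)
        frames.append(frame)
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)


# Usage (needs network access and pybaseball installed):
# from pybaseball import standings
# standings_2019 = flatten_standings(standings(2019), 2019)
```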

Step 3:

Run file: 'StatcastPull.py'

  • A safeguard is in place, but note that Statcast data begins in 2008
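
A safeguard of this kind might look like the following sketch. The function name `check_statcast_season` is hypothetical, and the actual check in 'StatcastPull.py' may be written differently.

```python
# Earliest season this README says is available.
STATCAST_FIRST_SEASON = 2008


def check_statcast_season(year):
    """Return the year unchanged, or raise if it is before the first available season."""
    if year < STATCAST_FIRST_SEASON:
        raise ValueError(
            f"Statcast data starts in {STATCAST_FIRST_SEASON}; got {year}"
        )
    return year
```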

This file pulls all Statcast data for one season.

Data is placed in a separate statcast folder, with a separate file created for each team. Data is also combined into an All Statcast file for the given year.
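
The per-team part of this step could be sketched as below. The `statcast` function in pybaseball accepts `start_dt`, `end_dt`, and `team` arguments; everything else here is an assumption for illustration, including the function names, the folder and file naming, and the March 1 to November 30 date window used to approximate a season.

```python
from pathlib import Path


def pybaseball_fetch(start, end, team):
    """Fetch one team's Statcast rows for a date range via pybaseball."""
    from pybaseball import statcast  # imported lazily; requires pybaseball

    return statcast(start_dt=start, end_dt=end, team=team)


def pull_statcast_season(year, teams, fetch, out_dir="statcast"):
    """Write one CSV per team for a single season and return the paths."""
    season_dir = Path(out_dir) / str(year)
    season_dir.mkdir(parents=True, exist_ok=True)
    # Assumed window that is wide enough to cover a regular season and playoffs.
    start, end = f"{year}-03-01", f"{year}-11-30"
    paths = []
    for team in teams:
        table = fetch(start, end, team)
        path = season_dir / f"statcast_{team}_{year}.csv"
        table.to_csv(path, index=False)
        paths.append(path)
    return paths


# Usage (needs network access; team abbreviations here are examples only):
# pull_statcast_season(2019, ["NYY", "BOS"], pybaseball_fetch)
```

Writing each team to its own file means only one team's data needs to be held in memory at a time.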

This was originally incorporated in the file from Step 2; however, it caused a significant increase in run time and memory usage.

Code is included to combine all Statcast files into one historical Statcast data file. (However, memory limits were causing problems.)
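
One possible way around the memory problem, not the approach used in the repository, is to stream the files row by row into a single output instead of loading them all into DataFrames first. The sketch below uses only the standard library and assumes every input file has exactly the same header; the function name `append_csv_files` is hypothetical.

```python
import csv


def append_csv_files(sources, destination):
    """Stream several CSV files into one, writing the header only once.

    Returns the number of data rows written. Only one row is held in
    memory at a time, so total file size does not matter.
    """
    header = None
    rows_written = 0
    with open(destination, "w", newline="", encoding="utf-8") as out_file:
        writer = csv.writer(out_file)
        for source in sources:
            with open(source, newline="", encoding="utf-8") as in_file:
                reader = csv.reader(in_file)
                file_header = next(reader, None)
                if file_header is None:
                    continue  # skip empty files
                if header is None:
                    header = file_header
                    writer.writerow(header)
                elif file_header != header:
                    raise ValueError(f"{source} has a different header")
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

If column sets differ between seasons, this sketch raises an error rather than guessing how to align them.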

It is highly recommended to run only a few years at a time to reduce run time. Each year takes approximately 30 minutes to 1 hour.

Future Additions/changes

[ ] - Combining Lahman Database data into 1 complete file

[ ] - Creating an additional file to add a single new season to the database, reducing run time for updates after the 2021 season

[ ] - Investigate columns with mixed types and resolve the issues
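
As a starting point for the mixed-types item, the sketch below lists columns whose non-missing values have more than one Python type. This is likely related to the `DtypeWarning` pandas emits from `read_csv` when a column mixes types, though that connection is an assumption here. The helper name `mixed_type_columns` is hypothetical.

```python
import pandas as pd


def mixed_type_columns(df):
    """Map each column with more than one value type to the type names found."""
    mixed = {}
    for column in df.columns:
        # Missing values are ignored so that NaN does not count as a type.
        kinds = sorted({type(value).__name__ for value in df[column].dropna()})
        if len(kinds) > 1:
            mixed[column] = kinds
    return mixed


# Usage:
# df = pd.read_csv("data/all_batting.csv", low_memory=False)
# print(mixed_type_columns(df))
```

Once the affected columns are known, passing an explicit `dtype` for them to `read_csv` is one common way to make the types consistent.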
