This project forked from paul-reiners/sabermetrics-101

Extra (non-assignment) code I've written for BUx: SABR101x Sabermetrics 101: Introduction to Baseball Analytics

Sabermetrics 101 Questions

This is a set of questions I looked at while taking BUx: SABR101x Sabermetrics 101: Introduction to Baseball Analytics, taught by Professor Andy Andres.

  1. Why did the Players League have so many R/G?
  2. Do Canadian-born baseball players use a bat like they use a hockey stick?

Why did the Players League have so many R/G?

In the "Different Eras IV" video, Professor Andres asked:

Maybe there's a difference here with the Players League. It might be different than the other leagues in that era, that actual year. But it's hard to tell. And again, if you're interested in this, seeing this difference, you might explore it. What could be the potential cause for this difference? Do you think it actually is a significant difference or not?

Here is a zoomed-in plot showing this:

[Plot: runs per game by league]

I decided to investigate this question. Recall that the Players League lasted for only one year, 1890. I thought that the reason might be that the players in the Players League were simply better than the players in the other leagues. To test this hypothesis, I did the following:

  1. Found those players that batted in the Players League and who also batted (in some other league) in the year before, 1889, and in the year after, 1891.
  2. Found the ratio for each hitter between their R/G in the PL and their R/G in the preceding and succeeding years.
  3. Took the mean of all these ratios.
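
The three steps above can be sketched in R roughly as follows. This is a minimal illustration on invented numbers, not the repository's actual code: the data frame `batting`, its columns (`playerID`, `yearID`, `lgID`, `R`, `G`, modeled on the Lahman Batting table), and the choice to pool 1889 and 1891 into a single non-PL rate are all assumptions.

```r
# Toy stand-in for a season-level batting table (all values are invented).
batting <- data.frame(
  playerID = c("a", "a", "a", "b", "b", "b", "c", "c"),
  yearID   = c(1889, 1890, 1891, 1889, 1890, 1891, 1889, 1890),
  lgID     = c("NL", "PL", "NL", "AA", "PL", "NL", "NL", "PL"),
  R        = c(80, 110, 90, 50, 60, 70, 40, 55),
  G        = c(100, 110, 120, 100, 100, 100, 80, 100),
  stringsAsFactors = FALSE
)

pl    <- batting[batting$yearID == 1890 & batting$lgID == "PL", ]
other <- batting[batting$yearID %in% c(1889, 1891) & batting$lgID != "PL", ]

# Step 1: keep players who batted in the PL and in both surrounding years.
both_years <- tapply(other$yearID, other$playerID,
                     function(y) all(c(1889, 1891) %in% y))
keep <- intersect(pl$playerID, names(both_years)[both_years])

# Step 2: ratio of PL R/G to pooled 1889 + 1891 R/G, per player.
# Summing R and G first also handles players with several stints in a year.
rg <- function(d) {
  totals <- aggregate(cbind(R, G) ~ playerID, data = d, FUN = sum)
  setNames(totals$R / totals$G, totals$playerID)
}
pl_advantage <- rg(pl)[keep] / rg(other)[keep]

# Step 3: mean of the per-player ratios.
mean_advantage <- mean(pl_advantage)
print(round(pl_advantage, 4))
print(round(mean_advantage, 4))
```

Player "c" is dropped in this toy data because he has no 1891 season, which mirrors the requirement that a hitter appear in all three years.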

The mean of the ratios was 1.2360. So these players had higher R/G in the PL than they had in other leagues: on average, a player in this sample scored 1.2360 times as many R/G in the PL as he did outside it.

For brevity, let's call this ratio the "PL advantage".

Now, some of these players probably just happened to reach their peak year in 1890. How many? The average career length for an MLB player is currently 5.6 years, so as a rough estimate about 1 / 5.6 of the players (roughly 18%) would have been at their peak in 1890. What happens if we eliminate the top 1 / 5.6 of the players (ordered descending by their PL advantage) from our data pool? The mean ratio drops only to 1.2257, which is still well above 1.
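
The trimming step can be sketched as below. The vector `pl_advantage` holds invented per-player ratios, and rounding the number of dropped players up with `ceiling` is an assumption; the text above does not say how the cutoff was rounded.

```r
# Invented per-player PL-advantage ratios, for illustration only.
pl_advantage <- c(1.90, 1.45, 1.30, 1.22, 1.15, 1.10, 1.05, 0.98, 0.95, 0.90)

# Share of players assumed to be at their career peak in any given year.
peak_share <- 1 / 5.6

# Number of players to drop; ceiling() guarantees at least one is dropped.
n_drop <- ceiling(length(pl_advantage) * peak_share)

# Drop the n_drop largest ratios, then average what remains.
trimmed <- sort(pl_advantage, decreasing = TRUE)[-seq_len(n_drop)]

mean_all     <- mean(pl_advantage)
mean_trimmed <- mean(trimmed)
print(c(all = mean_all, trimmed = mean_trimmed))
```

If the trimmed mean stays well above 1, a handful of peak seasons cannot explain the whole advantage.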

It is still possible that part of the reason for the higher R/G is that there were better players in the PL. This is something I plan on looking at. But at least part of the reason, as I established above, is that players had more R/G in the PL than they had in the other leagues in the two years surrounding the year of the PL.

So what was going on in the PL that caused players to have a higher R/G? Here are a couple of things that might have been different in the PL from the other leagues in 1890:

  • Rules
  • Parks

These are things I would like to look at when I have time.

You can see my code and a derived table here:

I encourage people to look at my code and let me know if you find any problems or anything that can be improved.

My next task will be to establish whether PL players had higher R/G than non-PL players on average.

Perhaps the problem was that the pitching wasn't so hot?

The ERA for the PL was definitely higher than that of the AA and NL:

SELECT yearID, lgID, AVG(ERA)
FROM Teams 
WHERE 1889 <= yearID AND yearID <= 1891
GROUP BY yearID, lgID;

yearID  lgID  AVG(ERA)
1889    AA    3.845
1889    NL    4.0325
1890    AA    3.91777777778
1890    NL    3.5625
1890    PL    4.23875
1891    AA    3.61333333333
1891    NL    3.34375

However, ERA is more or less the flip side of R/G, I would think: the runs batters score are the runs pitchers give up. So I'm not sure this proves anything. Is there a measure of pitching ability that would work better?

I need to do the following:

  • Decide on one or more pitching stats analogous to the R/G batting stat. Perhaps this is ERA; perhaps not.
  • Compare the performance of PL pitchers with their performance in other leagues in 1889 and 1891 (for those pitchers who pitched in the majors all three seasons).
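
A minimal sketch of the second item, assuming ERA ends up being the chosen stat. The data frame `pitching` and its columns (`playerID`, `yearID`, `lgID`, `ER`, `IPouts`) follow the layout of the Lahman Pitching table, but the rows are invented. ERA is computed as 27 * ER / IPouts, that is, earned runs per nine innings.

```r
# Toy stand-in for a season-level pitching table (all values are invented).
pitching <- data.frame(
  playerID = c("p1", "p1", "p1", "p2", "p2", "p2", "p3", "p3"),
  yearID   = c(1889, 1890, 1891, 1889, 1890, 1891, 1890, 1891),
  lgID     = c("NL", "PL", "NL", "AA", "PL", "AA", "PL", "NL"),
  ER       = c(100, 120, 90, 80, 110, 85, 60, 50),
  IPouts   = c(900, 900, 810, 600, 750, 675, 450, 450),
  stringsAsFactors = FALSE
)

# ERA per pitcher from summed earned runs and outs recorded.
era <- function(d) {
  totals <- aggregate(cbind(ER, IPouts) ~ playerID, data = d, FUN = sum)
  setNames(27 * totals$ER / totals$IPouts, totals$playerID)
}

in_pl  <- pitching$yearID == 1890 & pitching$lgID == "PL"
in_oth <- pitching$yearID %in% c(1889, 1891) & pitching$lgID != "PL"

# Pitchers with a PL season and non-PL seasons in both 1889 and 1891.
years_seen <- tapply(pitching$yearID[in_oth], pitching$playerID[in_oth],
                     function(y) all(c(1889, 1891) %in% y))
keep <- intersect(pitching$playerID[in_pl], names(years_seen)[years_seen])

comparison <- data.frame(
  playerID  = keep,
  era_pl    = unname(era(pitching[in_pl, ])[keep]),
  era_other = unname(era(pitching[in_oth, ])[keep]),
  stringsAsFactors = FALSE
)
comparison$ratio <- comparison$era_pl / comparison$era_other
print(comparison)
```

A ratio above 1 means the pitcher allowed more earned runs per nine innings in the PL than in the surrounding seasons. Like the league averages, this cannot by itself separate weaker pitching from stronger hitting.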

So here is a plot of league ERAs:

[Plot: league ERAs]

Yes, the ERA was noticeably higher for the PL: in the table above, the 1890 average is 4.24 for the PL, against 3.92 for the AA and 3.56 for the NL. So you could say the pitching was worse in the PL. Or is this caused by the batting being better in the PL?

Handedness in Canadian-born players versus U.S.-born players

I know that Canadian-born hockey players mostly shoot left-handed, while U.S.-born hockey players mostly shoot right-handed. There was an article in the NY Times about this a few years ago.

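I wondered if this carried over to batting. It does: in the table below (B = both, L = left, R = right), about 46% of Canadian-born players bat left-handed, versus about 28% of U.S.-born players.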

         B    L    R
  CAN   10   96  104
  USA  881 4115 9454
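
To put numbers on the difference, the counts in the table above can be converted to row proportions and checked with a chi-squared test of independence. This is only a sketch: the counts are copied from the table, and the choice of test is an assumption rather than something taken from the original analysis.

```r
# Counts copied from the table above (rows: birth country, columns: bats).
bats <- matrix(c(10,   96,  104,
                 881, 4115, 9454),
               nrow = 2, byrow = TRUE,
               dimnames = list(country = c("CAN", "USA"),
                               bats    = c("B", "L", "R")))

# Share of each batting side within each country (each row sums to 1).
shares <- prop.table(bats, margin = 1)
print(round(shares, 3))

# Chi-squared test of independence between birth country and batting side.
test <- chisq.test(bats)
print(test)
```

The test treats each player as an independent observation, so a small p-value says only that batting side and birth country are associated in this table, not why.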

If the side you bat from is somewhat independent of your dominant hand, then it follows that batting side is a learned skill. (I'm assuming that the distribution of hand dominance in ball players is the same as it is in the general population, but I don't know that for a fact.)

From that it seems to follow that you could be taught (at least if you started young enough) to be a switch-hitter.

Here is the SQL code:

SELECT playerID, birthCountry, bats
FROM baseball.master;

Here is the R code:

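# Load the exported query result (one row per player).
handedness <- read.csv("./data/handedness.csv")
# Cross-tabulate birth country against batting side (B, L, R).
table(handedness$birthCountry, handedness$bats)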
