beanumber / baseball_r
This project forked from maxtoki/baseball_r
Companion to Analyzing Baseball Data with R, 2nd edition
Working on Chapter 7 and pitch framing. A few others and I have had issues scraping with pitchRx. Is there a workaround?
Having issues with the first step in 7.2 where we create an empty SQLite database using src_sqlite()
db <- src_sqlite("~/Desktop/Data Analysis/Analyzing Baseball Data with R/baseball_R/data/pitchrx.sqlite", create = TRUE)
this code leads to the error:
Error in (function (cond) :
error in evaluating the argument 'drv' in selecting a method for function 'dbConnect': there is no package called ‘RSQLite’
I then tried the tbl() function, which was suggested because src_sqlite() is no longer available:
db <- tbl("~/Desktop/Data Analysis/Analyzing Baseball Data with R/baseball_R/data/pitchrx.sqlite", create = TRUE)
which then leads to the error:
Error in UseMethod("tbl") :
no applicable method for 'tbl' applied to an object of class "character"
I'm curious if I entered the wrong arguments in the tbl() function, or if there is another way to get through this.
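The first error says that the RSQLite package is not installed, and the second arises because tbl() expects a data source (such as a database connection), not a file path. A minimal sketch of one workaround, assuming the DBI and RSQLite packages are installed; the temporary path and the `demo` table are placeholders, not part of the book's code:

```r
# Assumes the DBI and RSQLite packages are installed:
# install.packages(c("DBI", "RSQLite"))
library(DBI)

# Hypothetical path; substitute the location of your own pitchrx.sqlite file.
db_path <- file.path(tempdir(), "pitchrx.sqlite")

# dbConnect() creates the SQLite file if it does not exist yet.
con <- dbConnect(RSQLite::SQLite(), dbname = db_path)

# Write a small placeholder table to confirm that the connection works.
dbWriteTable(con, "demo", data.frame(x = 1:3), overwrite = TRUE)
tables <- dbListTables(con)
print(tables)

dbDisconnect(con)
```

With dplyr and dbplyr loaded, tbl() then takes the open connection as its first argument and a table name as its second, for example tbl(con, "demo").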
Hi Ben,
When I tried to duplicate the script at the beginning of Chapter 5, it seems that the "all2016.csv" file has no data in it. The "ros2016.csv" works fine. I tried this for other years as well (1950, as mentioned in the appendix, and 2018): "all1950.csv" and "all2018.csv" have no data in them, but the roster files do.
Attached is a screenshot for your reference. Thanks!
Hi, I'm working my way through Analyzing Baseball Data with R. When working on the first exercise of Chapter 2, I get the following error message even after typing the code straight from the answer given here (which is what I had done in the first place). What am I doing wrong?
SB.Attempt = SB + CS
Error in SB + CS : non-numeric argument to binary operator
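This error usually means that at least one of the two vectors is not numeric, for instance because its values were typed with quotes or read in as text. A small sketch of how to check and convert; the values below are illustrative stand-ins, not necessarily the exercise's data:

```r
# Toy stand-ins for the exercise vectors; the values are illustrative only.
SB <- c("1406", "938", "897")   # character, e.g. typed with quotes or read as text
CS <- c(335, 307, 212)

# Check the types first: "character" here explains the error.
print(class(SB))
print(class(CS))

# Convert to numeric before doing arithmetic.
SB <- as.numeric(SB)
SB.Attempt <- SB + CS
print(SB.Attempt)
```

If class() reports "function" or "list" instead, the vector was probably never created or was overwritten, and re-running the lines that define SB and CS is the first thing to try.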
Hi, I have been working through Chapter 3 on graphics. I am on page 86, which provides code to read the all1998 Retrosheet data file, but that file doesn't exist in the data folder. Am I missing it somewhere?
Greetings,
This is the code I'm using.
library(tidyverse)
library(retrosheet)
# download the zipped event files for one season
download_retrosheet <- function(season) {
  download.file(
    url = paste0(
      "http://www.retrosheet.org/events/", season, "eve.zip"),
    destfile = file.path("retrosheet", "zipped",
                         paste0(season, "eve.zip"))
  )
}
# unzip the downloaded archive
unzip_retrosheet <- function(season) {
  unzip(file.path("retrosheet", "zipped",
                  paste0(season, "eve.zip")),
        exdir = file.path("retrosheet", "unzipped"))
}
# run the cwevent command-line tool to build the play-by-play csv
create_csv_file <- function(season) {
  wd <- getwd()
  setwd("retrosheet/unzipped")
  cmd <- paste0("cwevent -y ", season, " -f 0-96 ",
                season, "*.EV*", " > all", season, ".csv")
  message(cmd)
  if (.Platform$OS.type == "unix") {
    system(cmd)
  } else {
    shell(cmd)
  }
  setwd(wd)
}
# combine the team roster files into a single csv
create_csv_roster <- function(season) {
  rosters <- list.files(
    path = file.path("retrosheet", "unzipped"),
    pattern = paste0(season, ".ROS"),
    full.names = TRUE)
  rosters %>%
    map_df(read_csv,
           col_names = c("PlayerID", "LastName", "FirstName",
                         "Bats", "Pitches", "Team")) %>%
    write_csv(path = file.path("retrosheet",
                               "unzipped",
                               paste0("roster", season, ".csv")))
}
# delete the intermediate files
cleanup <- function() {
  files <- list.files(
    path = file.path("retrosheet", "unzipped"),
    pattern = "(.EV|.ROS|TEAM*)",
    full.names = TRUE
  )
  unlink(files)
  zips <- list.files(
    path = file.path("retrosheet", "zipped"),
    pattern = "*.zip",
    full.names = TRUE
  )
  unlink(zips)
}
# run all of the steps for one season
parse_retrosheet_pbp <- function(season) {
  download_retrosheet(season)
  unzip_retrosheet(season)
  create_csv_file(season)
  create_csv_roster(season)
  cleanup()
}
After running the function parse_retrosheet_pbp(1950), RStudio gives me the following message:
cwevent -y 1950 -f 0-96 1950*.EV* > all1950.csv
'cwevent' is not recognized as an internal or external command,
operable program or batch file.
Warning messages:
1: In download.file(url = paste0("http://www.retrosheet.org/events/", :
URL http://www.retrosheet.org/events/1950eve.zip: cannot open destfile 'retrosheet/zipped/1950eve.zip', reason 'No such file or directory'
2: In download.file(url = paste0("http://www.retrosheet.org/events/", :
download had nonzero exit status
3: In unzip(file.path("retrosheet", "zipped", paste0(season, "eve.zip")), :
error 1 in extracting from zip file
4: In shell(cmd) :
'cwevent -y 1950 -f 0-96 1950*.EV* > all1950.csv' execution failed with error code 1
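The messages above point to two separate problems: the destination folders retrosheet/zipped and retrosheet/unzipped do not exist, and the cwevent program (part of the Chadwick tools) cannot be found on the PATH. A minimal sketch of both checks in base R, using a temporary directory as a stand-in for the project folder:

```r
# Create the folders that download_retrosheet() and unzip_retrosheet() write to.
# A temporary directory is used here as a stand-in for the project folder.
project_dir  <- tempdir()
zipped_dir   <- file.path(project_dir, "retrosheet", "zipped")
unzipped_dir <- file.path(project_dir, "retrosheet", "unzipped")
dir.create(zipped_dir, recursive = TRUE, showWarnings = FALSE)
dir.create(unzipped_dir, recursive = TRUE, showWarnings = FALSE)

# Sys.which() returns "" when a program cannot be found on the PATH.
cwevent_path <- Sys.which("cwevent")
if (!nzchar(cwevent_path)) {
  message("cwevent not found: install the Chadwick tools and add them to the PATH.")
} else {
  message("cwevent found at ", cwevent_path)
}
```

In a real project the folders would be created relative to the working directory that parse_retrosheet_pbp() is run from, and R usually needs to be restarted after the PATH is changed.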
jboardman found a couple of small errors:
Having issues with creating the run expectancy matrix.
Here's my code
data2016 %>%
  mutate(BASES =
           paste(ifelse(BASE1_RUN_ID > '', 1, 0),
                 ifelse(BASE2_RUN_ID > '', 1, 0),
                 ifelse(BASE3_RUN_ID > '', 1, 0), sep = ""),
         STATE = paste(BASES, OUTS_CT)) ->
  data2016
This is the error warning message that I think is the root to my issue:
Warning messages:
1: Problem with `mutate()` column `BASES`.
ℹ `BASES = paste(...)`.
ℹ ‘>’ not meaningful for factors
2: Problem with `mutate()` column `BASES`.
ℹ `BASES = paste(...)`.
ℹ ‘>’ not meaningful for factors
3: Problem with `mutate()` column `BASES`.
ℹ `BASES = paste(...)`.
ℹ ‘>’ not meaningful for factors
I have attached a screenshot of what it looks like in the data frame
I have also attached what the output looks like in 5.5 when we analyze Jose Altuve
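The warning indicates that the three runner columns are stored as factors, for which `>` is not defined. One possible fix is to convert them to character before building the states. The sketch below uses a made-up three-row stand-in for data2016 and writes the book's mutate() logic in base R:

```r
# Toy stand-in for a few rows of data2016; the runner IDs are made up.
data2016 <- data.frame(
  BASE1_RUN_ID = factor(c("", "abcde001", "")),
  BASE2_RUN_ID = factor(c("", "", "fghij002")),
  BASE3_RUN_ID = factor(c("", "", "")),
  OUTS_CT = c(0, 1, 2)
)

# Convert the runner columns from factor to character so that `>` is defined.
runner_cols <- c("BASE1_RUN_ID", "BASE2_RUN_ID", "BASE3_RUN_ID")
data2016[runner_cols] <- lapply(data2016[runner_cols], as.character)

# Same logic as the mutate() call, written in base R.
data2016$BASES <- paste(
  ifelse(data2016$BASE1_RUN_ID > "", 1, 0),
  ifelse(data2016$BASE2_RUN_ID > "", 1, 0),
  ifelse(data2016$BASE3_RUN_ID > "", 1, 0),
  sep = ""
)
data2016$STATE <- paste(data2016$BASES, data2016$OUTS_CT)
print(data2016$STATE)
```

Because the factor comparison returns NA rather than stopping, the original code still runs but fills BASES with "NANANA", which would explain odd-looking states downstream.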
It contains bibliographical information on every player and manager who have appeared at the Major League Baseball level and of all people who have been inducted in the Baseball Hall of Fame.
Table 1.4 displays statistics from the data file Pitching.csv for the seasons where Ruth was a pitcher.
The following questions can be answered with Lahman's database.
Replace "per games" with "per game" (two times).
Replace this paragraph:
This table displays team statistics \footnote{Some of the less important statistics, such as Catcher Interference, have been omitted in Table \ref{tab:gamelog}} as well as the players' identities and fielding positions for the home team; similar statistics and player information are available for the visitor team.
with this paragraph:
This table displays team statistics \footnote{Some other team statistics, such as Stolen Bases and Caught Stealings, omitted in Table \ref{tab:gamelog}, are reported in Game log files.} as well as the players' identities and fielding positions for the home team; similar statistics and player information are available for the visitor team.
rstudio.org should be rstudio.com
Change "features of R" to "feature of R"
"ball in play" should be "balls in play"
Change "350 Wins" to "350-Wins" (three times)
hof <- read.csv("hofbatting.csv")
hof$MidCareer <- with(hof, (From + To) / 2)
hof$Era <- cut(hof$MidCareer,
               breaks = c(1800, 1900, 1919, 1941, 1960, 1976, 1993, 2050),
               labels = c("19th Century", "Dead Ball", "Lively Ball",
                          "Integration", "Expansion", "Free Agency",
                          "Long Ball"))
"dotplot" should be "dot plot"
In many places, "runs expectancy" should be replaced with "run expectancy". Similarly replace "runs value" with "run value" throughout this chapter.
Replace "Dotplot" with "Stripchart".
Replace "group argument" with "groups argument"
Replace "appearance of the line" with "appearance of the line,"
Replace "pitching statistics as from his MLB" with "pitching statistics from his MLB"
Replace "In addition, one adds the difference between the fielding position values of the two players." with "In addition, one subtracts the absolute value of the difference between the fielding position values of the two players. ".
Replace "field.csv" with "fields.csv"
I'm a beginner here, so please excuse my naivety.
It appears that the book uses outdated files from the Lahman database? The Lahman database that I've downloaded contains only a 'HallOfFame.csv' file and no longer has the separate files the book implies (hofbatting.csv and hofpitching.csv)?
In order to get around this I read in the data through:
hof <- read_csv("Documents/R Project/Baseball/Lahman/core/HallOfFame.csv")
But the next code in the sequence I can't seem to navigate around. The message I keep receiving is "Error: object 'From' not found" for this code:
hof$MidCareer <- with(hof, (From + To) / 2)
hof$Era <- cut(hof$MidCareer,
breaks = c(1800, 1900, 1919, 1941, 1960, 1976, 1993, 2050),
labels = c("19th Century", "Dead Ball", "Lively Ball", "Integration", "Expansion", "Free Agency", "Long Ball"))
I've tried to find answers online about the 'From + To' part of the code but can't seem to find anything relevant. There must be an easy solution to this, but I'm a beginner so I'm unaware of any easy fixes. Any help with this problem would be much appreciated. Thanks.
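In the code above, From and To are column names (a player's first and last seasons), not functions, and with() looks them up inside hof. Lahman's HallOfFame.csv records voting results and has no such columns, which would explain the error. A hedged sketch of one way to derive them from season-level data; the toy `batting` data frame stands in for Lahman's Batting table, and the player IDs are made up:

```r
# Toy stand-in for Lahman's Batting table (real file: Batting.csv).
batting <- data.frame(
  playerID = c("playera01", "playera01", "playera01", "playerb01", "playerb01"),
  yearID   = c(1914, 1925, 1935, 1954, 1976)
)

# First and last season for each player.
from <- aggregate(yearID ~ playerID, data = batting, FUN = min)
to   <- aggregate(yearID ~ playerID, data = batting, FUN = max)
careers <- merge(from, to, by = "playerID", suffixes = c(".from", ".to"))
names(careers) <- c("playerID", "From", "To")

# The same two steps as in the book's code, now that From and To exist.
careers$MidCareer <- with(careers, (From + To) / 2)
careers$Era <- cut(careers$MidCareer,
                   breaks = c(1800, 1900, 1919, 1941, 1960, 1976, 1993, 2050),
                   labels = c("19th Century", "Dead Ball", "Lively Ball",
                              "Integration", "Expansion", "Free Agency",
                              "Long Ball"))
print(careers)
```

The simpler route, assuming the file is available in the data folder of the book's repository, is to read hofbatting.csv from there instead of from the Lahman download.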
The following code for creating a run expectancy matrix in Chapter 5 gives me an error saying "‘>’ not meaningful for factors" and fails to create the BASES and STATE variables.
data2016 %>%
  mutate(BASES =
           paste(ifelse(BASE1_RUN_ID > ' ', 1, 0),
                 ifelse(BASE2_RUN_ID > ' ', 1, 0),
                 ifelse(BASE3_RUN_ID > ' ', 1, 0), sep = ' '),
         STATE = paste(BASES, OUTS_CT)) ->
  data2016
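One way to avoid the factor problem at its source is to keep text columns as character when the file is read. In versions of R before 4.0.0, read.csv() converted strings to factors by default. The sketch below writes a tiny made-up stand-in for the season file to a temporary location and reads it back:

```r
# Write a tiny stand-in for the season file to a temporary location (made-up rows).
csv_path <- tempfile(fileext = ".csv")
writeLines(c("BASE1_RUN_ID,OUTS_CT", ",0", "abcde001,1"), csv_path)

# stringsAsFactors = FALSE keeps text columns as character.
data_demo <- read.csv(csv_path, stringsAsFactors = FALSE)
print(class(data_demo$BASE1_RUN_ID))

# With character data the comparison against the empty string works.
on_first <- ifelse(data_demo$BASE1_RUN_ID > "", 1, 0)
print(on_first)
```

Note also that the version of this code quoted elsewhere on this page uses sep = "" and compares against '', which gives states such as "000 0"; with sep = ' ' the same state would be written "0 0 0 0".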
The career trajectory chart and line fit should exclude active players. The fitted curve descends surprisingly rapidly, which is explained by the inclusion of players who are in the middle of their careers and have not yet reached their peak.
I'm using the 2021 data, but the concept should be the same, and my "before" plot looks similar to the one in the book.
Original plot, from L247 of trajectories.R
After keeping only players whose final year was earlier than 2018
This can be seen by calculating the final year for each playerID and plotting the number of players per final year: there is a large spike in the most recent year, which makes sense intuitively.
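A minimal sketch of the filtering step described above, on a made-up stand-in for the batting data. The 2018 cutoff follows the post; treating any player whose final season falls before the cutoff as retired is an assumption:

```r
# Toy stand-in for the batting data; IDs and years are made up.
batting <- data.frame(
  playerID = c("p1", "p1", "p2", "p2", "p3"),
  yearID   = c(2005, 2012, 2016, 2021, 2020)
)

# Final season observed for each player.
final_year <- tapply(batting$yearID, batting$playerID, max)

# Assumption: a player whose final season is before `cutoff` has retired.
cutoff <- 2018
retired_ids <- names(final_year)[final_year < cutoff]
batting_retired <- batting[batting$playerID %in% retired_ids, ]

# Count of players by final season; active players pile up in the latest years.
print(table(final_year))
print(batting_retired)
```

The trajectory models would then be fitted to batting_retired rather than to the full data.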
Getting this error from chapter 8.4.1:
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
0 (non-NA) cases
I copied the code exactly from this website:
midcareers <- batting_2000 %>%
  group_by(playerID) %>%
  summarize(Midyear = (min(yearID) + max(yearID)) / 2,
            AB.total = first(Career.AB))
batting_2000 %>%
  inner_join(midcareers, by = "playerID") -> batting_2000
models <- batting_2000 %>%
  split(pull(., playerID)) %>%
  map(~lm(OPS ~ I(Age - 30) + I((Age - 30)^2), data = .)) %>%
  map_df(tidy, .id = "playerID")
I don't know why this error is appearing, everything else from the chapter has worked perfectly so far.
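lm() reports "0 (non-NA) cases" when every row passed to it has a missing value in the response or a predictor. One possible cause, which cannot be confirmed from the post, is that at least one player has only NA values for OPS or Age. A base R sketch that drops incomplete rows and skips players with too few seasons before fitting, on made-up data:

```r
# Toy stand-in for batting_2000; one player has no usable OPS values.
batting_demo <- data.frame(
  playerID = rep(c("p1", "p2"), each = 4),
  Age = rep(c(26, 28, 30, 32), times = 2),
  OPS = c(0.70, 0.80, 0.85, 0.78, NA, NA, NA, NA)
)

# Drop rows with missing Age or OPS, then split by player.
complete_rows <- batting_demo[complete.cases(batting_demo[c("Age", "OPS")]), ]
by_player <- split(complete_rows, complete_rows$playerID)

# A quadratic needs at least three seasons; skip players with fewer.
by_player <- Filter(function(d) nrow(d) >= 3, by_player)

# Fit the same quadratic model as in the chapter, one player at a time.
models <- lapply(by_player, function(d) {
  lm(OPS ~ I(Age - 30) + I((Age - 30)^2), data = d)
})
print(names(models))
```

In the tidyverse pipeline, the equivalent would be a filter() on non-missing Age and OPS placed before the split() step.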
Hey man,
I am working through the lessons and homework and I am having trouble running the graph in section 3.8.3. I keep getting the following error.
ggplot(hr_ytd, aes(Date, cumHR, linetype = nameLast)) +
  geom_line() +
  geom_hline(yintercept = 62, color = crcblue) +
  annotate("text", ymd("1998-04-15"), 65,
           label = "62", color = crcblue) +
  ylab("Home Runs in the Season")
Error in xj[i] : object of type 'closure' is not subsettable
I believe it has something to do with R not being able to subset a function. I am thinking the error is being caused by this line of code.
library(lubridate)
cum_hr <- function(d) {
  d %>%
    mutate(Date = ymd(str_sub(GAME_ID, 4, 11))) %>%
    arrange(Date) %>%
    mutate(HR = ifelse(EVENT_CD == 23, 1, 0),
           cumHR = cumsum(HR)) %>%
    select(Date, cumHR)
}
I am brand new to R so if I am making an obvious mistake I apologize in advance :D
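This error generally means that R tried to index a function. One possible cause, offered as an assumption since the post does not show how hr_ytd was built, is that a name used in aes() is not a column of hr_ytd, so the lookup falls through to a function with the same name. Note that cum_hr() returns only Date and cumHR, so nameLast has to be added in a later step. A quick check, using a hypothetical helper and a made-up stand-in for hr_ytd:

```r
# Hypothetical helper: report which of the expected columns are absent.
missing_cols <- function(data, expected) {
  setdiff(expected, names(data))
}

# Toy stand-in for hr_ytd that lacks the nameLast column (values made up).
hr_ytd_demo <- data.frame(
  Date = as.Date(c("1998-04-01", "1998-04-02")),
  cumHR = c(1, 2)
)

absent <- missing_cols(hr_ytd_demo, c("Date", "cumHR", "nameLast"))
print(absent)

# is.function() shows whether a name currently refers to a function.
print(is.function(mean))
```

Running missing_cols(hr_ytd, c("Date", "cumHR", "nameLast")) on the real data frame, and confirming that hr_ytd itself is a data frame with is.data.frame(hr_ytd), would narrow down which step went wrong.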