nflfastR is a set of functions to efficiently scrape NFL play-by-play
data. nflfastR expands upon the features of nflscrapR:
- The package contains NFL play-by-play data back to 1999
- As suggested by the package name, it obtains games much faster
- Includes completion probability (cp) and completion percentage over expected (cpoe) in play-by-play going back to 2006
- Includes drive information, including drive starting position and drive result
- Includes series information, including series number and series success
- Hosts a repository of play-by-play data going back to 1999 for very quick access
- Features new and enhanced models for Expected Points, Win Probability, and Completion Probability (see section below)
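As a rough illustration of how cp and cpoe relate, the sketch below computes completion percentage over expected on a small made-up table. It assumes that cpoe is the completion indicator minus cp, expressed in percentage points, and that completions are flagged in a column named complete_pass; both the formula and the toy values are assumptions for illustration, not output from the package.

```r
# Toy passes: complete_pass is 1 for a completion, 0 otherwise (assumed column name);
# cp is the modeled completion probability for each pass (made-up values).
passes <- data.frame(
  passer        = c("A", "A", "A", "B", "B"),
  complete_pass = c(1, 0, 1, 1, 0),
  cp            = c(0.70, 0.40, 0.55, 0.90, 0.65),
  stringsAsFactors = FALSE
)

# ASSUMPTION: cpoe = 100 * (actual completion - expected completion probability).
passes$cpoe <- 100 * (passes$complete_pass - passes$cp)

# Average cpoe per passer: positive means more completions than the model expected.
by_passer <- aggregate(cpoe ~ passer, data = passes, FUN = mean)
print(by_passer)
```

Averaging per passer is only one possible summary; the same column can be grouped by team, season, or target depth.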
We owe a debt of gratitude to the original nflscrapR team, Maksim
Horowitz, Ronald Yurko, and Samuel Ventura, without whose contributions
and inspiration this package would not exist.
You can install nflfastR from GitHub with:
# If 'devtools' isn't installed run
# install.packages("devtools")
devtools::install_github("mrcaseb/nflfastR")
We have provided some application examples under vignette("examples").
However, these require a basic knowledge of R. For this reason we have
the nflfastR beginner’s guide in vignette("beginners_guide"),
which we recommend to all those who are looking for an introduction to
nflfastR with R.
Even though nflfastR is very fast, for historical games we recommend
downloading the data from here. These data sets
include play-by-play data of complete seasons going back to 1999 and we
will update them in 2020 once the season starts. The files contain both
regular season and postseason data, and one can use game_type or week
to figure out which games occurred in the postseason. Data are available
as .csv.gz, .parquet, or .rds.
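The following is a minimal sketch of reading a season file and separating postseason games, using base R only. To stay self-contained it writes a tiny stand-in table to temporary files instead of using a real download; the rows, the file names, and the assumption that regular-season games carry game_type equal to "REG" are all illustrative. Reading a .parquet file would need an additional package and is not shown.

```r
# Toy stand-in for a downloaded season file; real files have many more columns.
# ASSUMPTION: regular-season rows have game_type == "REG".
pbp_toy <- data.frame(
  game_id   = c("g1", "g1", "g2", "g3"),
  week      = c(1, 1, 17, 18),
  game_type = c("REG", "REG", "REG", "WC"),
  stringsAsFactors = FALSE
)

# .rds: written here only so the example runs without a download.
rds_path <- tempfile(fileext = ".rds")
saveRDS(pbp_toy, rds_path)
pbp_rds <- readRDS(rds_path)

# .csv.gz: read.csv() decompresses gzip files transparently.
csv_path <- tempfile(fileext = ".csv.gz")
con <- gzfile(csv_path, "w")
write.csv(pbp_toy, con, row.names = FALSE)
close(con)
pbp_csv <- read.csv(csv_path, stringsAsFactors = FALSE)

# Split plays by game_type.
regular    <- pbp_rds[pbp_rds$game_type == "REG", ]
postseason <- pbp_rds[pbp_rds$game_type != "REG", ]
print(unique(postseason$game_id))
```

Filtering on week instead would require knowing the last regular-season week for the season in question, so game_type is likely the simpler choice when it is present.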
nflfastR uses its own models for Expected Points, Win Probability,
Completion Probability, and Expected Yards After the Catch. To read
about the models, please see vignette("nflfastR-models"). For a more
detailed description of Expected Points models, we highly recommend this
paper from the nflscrapR team located here.
Here is a visualization of the Expected Points model by down and yardline.
Here is a visualization of the Completion Probability model by air yards and pass direction.
nflfastR includes two win probability models: one with and one without
incorporating the pre-game spread.
- To Nick Shoemaker for finding and making available JSON-formatted NFL play-by-play back to 1999 (nflfastR uses this source for 1999-2010)
- To Lau Sze Yui for developing a scraping function to access JSON-formatted NFL play-by-play beginning in 2011
- To Lee Sharpe for curating a resource for game information
- To Timo Riske, Lau Sze Yui, Sean Clement, and Daniel Houston for many helpful discussions regarding the development of the new nflfastR models
- To Zach Feldman and Josh Hermsmeyer for many helpful discussions about CPOE models as well as Peter Owen for many helpful suggestions for the CP model
- To Florian Schmitt for the logo design
- The many users who found and reported bugs in nflfastR 1.0
- And of course, the original nflscrapR team, Maksim Horowitz, Ronald Yurko, and Samuel Ventura, whose work represented a dramatic step forward for the state of public NFL research