Using exploratory data analysis and k-means clustering to analyze competitive balance in soccer/football



Analyzing the Competitive Balance of Different Soccer Leagues

  • AUTHOR: TARA NGUYEN
  • Project for the course Exploratory Data Analysis and Visualization at UCLA Extension
  • Completed in December 2020

Abstract

Background: Competitive balance, which refers to the degree of uncertainty regarding the outcome of a competition, is frequently debated among soccer fans and has received considerable attention both in and outside academia.

Data and research question: In this project I analyzed team performances (points per game, win proportions, etc.) in four soccer leagues from the 2015/2016 season to the 2019/2020 season. The main research question was: Which soccer league is the most competitive?

Method and findings: The project was completed in R, except for part of the data cleaning process, which was done in Excel. Through exploratory data analysis and k-means clustering, I found that, in general, Major League Soccer was more competitive than the Bundesliga, La Liga, and the Premier League.
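As a rough illustration of the clustering step, here is a minimal sketch of k-means applied to team performance metrics in base R. It is not the code from leaguescompetitiveness_Analysis.R: the data are synthetic, the league labels "A" and "B" are placeholders, and the column names (`ppg`, `win_prop`) and the choice of 3 clusters are assumptions made only for this example.

```r
# Hypothetical illustration: synthetic team-season data, NOT the repo's data set.
set.seed(42)

n_teams <- 40
teams <- data.frame(
  league   = rep(c("A", "B"), each = n_teams / 2),  # placeholder league labels
  win_prop = runif(n_teams, min = 0.1, max = 0.8)   # made-up win proportions
)

# Points per game: 3 points per win, plus a small random amount standing in
# for points earned from draws.
teams$ppg <- 3 * teams$win_prop + runif(n_teams, min = 0, max = 0.4)

# Standardize the features so neither one dominates the Euclidean distance.
features <- scale(teams[, c("ppg", "win_prop")])

# k-means with k = 3 (an arbitrary choice for this sketch); nstart reruns the
# algorithm from several random starts and keeps the best solution.
fit <- kmeans(features, centers = 3, nstart = 25)
teams$cluster <- fit$cluster

# Cross-tabulate clusters by league to see how each league's teams spread
# across the performance clusters.
cluster_by_league <- table(teams$league, teams$cluster)
print(cluster_by_league)
```

One way to read such a table: if a league's teams fall mostly into a single cluster, their performances are similar to one another, whereas a league spread across distinct high- and low-performing clusters is more stratified.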

For a complete report, see the wiki page.

List of files and directories in the repo

Plots - directory for plots created during data visualization

README.md - the document you are currently reading

References - directory for academic articles on competitive balance

all-form-leaguetables.csv - final data set

all-leaguetables.xlsx - season-end league tables in all 5 seasons of all 4 leagues

form-bundesliga.csv, form-epl.csv, form-laliga.csv, form-mls.csv - form tables in all 5 seasons of the Bundesliga, the EPL, La Liga, and MLS, respectively

leaguescompetitiveness_Analysis.R - main R script for data wrangling, visualization, and statistical analyses

leaguesfinaldat_DataWrangling.R - R script for creating the final data set

Usage Note

The dataset and R scripts are free for download and use, provided that proper credit is given.

If you mention or use any part of my research report, please provide a link to this repo.
