Soccer Players Data Analyst and Similar Players Finder


Clone the repo

git clone https://github.com/mchien15/datascience.git

Then navigate to the repo's folder.

Install required packages

pip install -r requirements.txt

This application also requires Docker. If it is not already installed on your computer, please follow the Docker installation instructions first.

Prepare the data

Start our data lake infrastructure

docker compose -f docker-compose.yml up -d

Clean the data (drop duplicate columns, rename columns, convert CSV files to Parquet)

python clean_data.py
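The renaming step can be pictured with a small sketch. It is not the code in clean_data.py; it is a guess at a convention that would produce the column names in the table definition later in this README (for example Cmp, Cmp1, CmpPct, TklPlusInt). The function name `normalize_headers` and the sample headers are hypothetical.

```python
from collections import Counter


def normalize_headers(headers):
    """Return SQL-friendly, unique column names for a list of CSV headers."""
    seen = Counter()
    result = []
    for raw in headers:
        # Replace symbols that are awkward in SQL identifiers.
        name = raw.strip().replace("%", "Pct").replace("+", "Plus")
        count = seen[name]
        seen[name] += 1
        # The first occurrence keeps its name; repeats get a numeric suffix.
        result.append(name if count == 0 else f"{name}{count}")
    return result


# Sample headers are made up for illustration.
print(normalize_headers(["Cmp", "Att", "Cmp%", "Cmp", "Att", "Cmp%", "Tkl+Int"]))
# ['Cmp', 'Att', 'CmpPct', 'Cmp1', 'Att1', 'CmpPct1', 'TklPlusInt']
```

This sketch does not guard against a suffixed name colliding with a header that already exists (such as a literal Cmp1 column), which a real cleaning script would need to handle.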

Generate the data and push it to MinIO

python utils/export_data_to_datalake.py
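For orientation, an upload of this kind might look roughly like the sketch below. It is not the contents of utils/export_data_to_datalake.py. The bucket name and the players prefix are taken from the schema statements in this README; the endpoint, the credentials, and all function names are placeholders, and the real values would come from docker-compose.yml.

```python
from pathlib import Path

BUCKET = "data-big-5-leagues"  # bucket name from the CREATE SCHEMA statement
PREFIX = "players"             # folder used as the table's external_location


def object_name_for(file_path, root, prefix=PREFIX):
    """Map a local file to its object key: <prefix>/<path relative to root>."""
    relative = Path(file_path).relative_to(root).as_posix()
    return f"{prefix}/{relative}"


def upload_parquet_files(client, root, bucket=BUCKET, prefix=PREFIX):
    """Upload every .parquet file under root and return the object keys written."""
    if not client.bucket_exists(bucket_name=bucket):
        client.make_bucket(bucket_name=bucket)
    uploaded = []
    for path in sorted(Path(root).rglob("*.parquet")):
        key = object_name_for(path, root, prefix)
        client.fput_object(bucket_name=bucket, object_name=key, file_path=str(path))
        uploaded.append(key)
    return uploaded


def connect(endpoint="localhost:9000", access_key="minio_access_key",
            secret_key="minio_secret_key"):
    """Create a MinIO client. Endpoint and credentials are placeholders."""
    from minio import Minio  # lazy import: requires `pip install minio`
    return Minio(endpoint=endpoint, access_key=access_key,
                 secret_key=secret_key, secure=False)
```

With the data lake running, `upload_parquet_files(connect(), "path/to/parquet/folder")` would push the files; the folder path here is also a placeholder.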

Create data schema

After pushing your files to MinIO, run the following command to open a shell inside the Trino container:

docker exec -ti datalake-trino bash

Once you are inside the Trino container, run trino to enter interactive mode.

Then copy and run the following statements to register a new schema and table for the data:

CREATE SCHEMA IF NOT EXISTS datalake.data_big_5_leagues WITH (location = 's3://data-big-5-leagues/');

CREATE TABLE IF NOT EXISTS datalake.data_big_5_leagues.all_leagues (
    Date VARCHAR,
    Name VARCHAR,
    Round VARCHAR,
    Venue VARCHAR,
    Result VARCHAR,
    Squad VARCHAR,
    Opponent VARCHAR,
    Start VARCHAR,
    Pos VARCHAR,
    Min DOUBLE,
    Cmp DOUBLE,
    PassAtt DOUBLE,
    CmpPct DOUBLE,
    PassTotDist DOUBLE,
    PassPrgDist DOUBLE,
    Cmp1 DOUBLE,
    Att1 DOUBLE,
    CmpPct1 DOUBLE,
    Cmp2 DOUBLE,
    Att2 DOUBLE,
    CmpPct2 DOUBLE,
    Cmp3 DOUBLE,
    Att3 DOUBLE,
    CmpPct3 DOUBLE,
    Ast DOUBLE,
    xAG DOUBLE,
    xA DOUBLE,
    KP DOUBLE,
    PassFinThird DOUBLE,
    PPA DOUBLE,
    CrsPA DOUBLE,
    PrgP DOUBLE,
    ID VARCHAR,
    SCA DOUBLE,
    PassLiveShot DOUBLE,
    PassDeadShot DOUBLE,
    TO DOUBLE,
    ShLSh DOUBLE,
    Fld DOUBLE,
    DefShot DOUBLE,
    GCA DOUBLE,
    PassLiveGoal DOUBLE,
    PassDeadGoal DOUBLE,
    TO1 DOUBLE,
    ShGoal DOUBLE,
    FldGoal DOUBLE,
    DefGoal DOUBLE,
    Tkl DOUBLE,
    TklW DOUBLE,
    TacklesDef3rd DOUBLE,
    TacklesMid3rd DOUBLE,
    TacklesAtt3rd DOUBLE,
    DribTackled DOUBLE,
    DribContest DOUBLE,
    DribTackledPct DOUBLE,
    Lost DOUBLE,
    Blocks DOUBLE,
    BlockSh DOUBLE,
    Pass DOUBLE,
    Int DOUBLE,
    TklPlusInt DOUBLE,
    Clr DOUBLE,
    Err DOUBLE,
    Touches DOUBLE,
    DefPen DOUBLE,
    TouchDef3rd DOUBLE,
    TouchMid3rd DOUBLE,
    TouchAtt3rd DOUBLE,
    AttPen DOUBLE,
    Live DOUBLE,
    Att DOUBLE,
    Succ DOUBLE,
    SuccPct DOUBLE,
    Tkld DOUBLE,
    TkldPct DOUBLE,
    Carries DOUBLE,
    TotDist DOUBLE,
    PrgDist DOUBLE,
    PrgC DOUBLE,
    CarriesFinThird DOUBLE,
    CPA DOUBLE,
    Mis DOUBLE,
    Dis DOUBLE,
    Rec DOUBLE,
    PrgR DOUBLE,
    Gls DOUBLE,
    PK DOUBLE,
    PKatt DOUBLE,
    Sh DOUBLE,
    SoT DOUBLE,
    CrdY DOUBLE,
    CrdR DOUBLE,
    xG DOUBLE,
    npxG DOUBLE
) WITH (
    external_location = 's3://data-big-5-leagues/players/',
    format = 'PARQUET'
);
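Once the table exists, a quick query is a reasonable way to confirm that Trino can read the Parquet files. The sketch below uses the Trino Python client, which is an assumption about the tooling rather than something this README prescribes. The host, port 8080, and user name are assumed defaults that may differ from the docker-compose.yml settings, and the function names are hypothetical.

```python
TABLE = "datalake.data_big_5_leagues.all_leagues"  # from the CREATE TABLE statement


def top_scorers_sql(limit=5):
    """Build a small aggregate query to confirm the table is readable."""
    limit = int(limit)  # make sure only an integer ends up in the SQL text
    return (
        "SELECT Name, Squad, SUM(Gls) AS goals "
        f"FROM {TABLE} "
        "GROUP BY Name, Squad "
        "ORDER BY goals DESC "
        f"LIMIT {limit}"
    )


def run_query(sql, host="localhost", port=8080, user="trino"):
    """Execute sql through the Trino Python client and return all rows."""
    from trino.dbapi import connect  # lazy import: requires `pip install trino`
    cursor = connect(host=host, port=port, user=user).cursor()
    cursor.execute(sql)
    return cursor.fetchall()
```

Calling `run_query(top_scorers_sql())` with the containers running should return a few rows if the files under players/ were uploaded and match the declared columns.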

Run the Streamlit app

Open a new terminal, or run exit twice (once to leave the Trino CLI and once to leave the container), then run this command:

streamlit run Main_Page.py

Visit the URL displayed in the terminal (usually http://localhost:8501) to interact with the app

Some of the features of the app

Scouting Report and Similar Players Finder

Radar chart for player comparison

Each position uses a different set of stats to compare players. For example, the plots in the repository compare Messi with Neymar and Thiago Silva with Van Dijk.
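The idea of position-specific stats can be sketched as a lookup from position to metric list, followed by scaling so that two players fit on the same radar axes. The position groups, the metric choices, and the scaling rule below are all illustrative assumptions, not the app's actual configuration; only the column names come from the table schema.

```python
# Hypothetical position groups and metrics; column names are from the table schema.
METRICS_BY_POSITION = {
    "FW": ["Gls", "xG", "Sh", "SoT", "SCA"],
    "MF": ["Ast", "KP", "PrgP", "PrgC", "Tkl"],
    "DF": ["Tkl", "Int", "Clr", "Blocks", "PrgP"],
}


def radar_values(player_a, player_b, position):
    """Scale each metric so the larger of the two players' values becomes 1.0."""
    metrics = METRICS_BY_POSITION[position]
    scaled_a, scaled_b = [], []
    for metric in metrics:
        a, b = player_a[metric], player_b[metric]
        top = max(a, b)
        # Avoid dividing by zero when both players have 0 for a metric.
        scaled_a.append(a / top if top else 0.0)
        scaled_b.append(b / top if top else 0.0)
    return metrics, scaled_a, scaled_b
```

The returned lists can be passed to any plotting library as the axis labels and the two polygons of the radar chart. Scaling against the pair's maximum is only one option; scaling against league-wide percentiles would make charts comparable across different pairs of players.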

Scatter plot for metrics comparison

Contributors

mchien15, nnminh322
