eliekawerk / data-engineer-camp-elt-chess-project

This project forked from ericbjames/elt-chess-project

Shell 6.11% Python 78.71% Makefile 8.52% Dockerfile 6.66%

CHESS_ELT_PROJECT

Introduction

Welcome to my data engineering project, where I use the Lichess API to analyze chess games played on the platform. Chess has seen a historic rise in popularity in recent years, driven by quick, accessible online games that let players face opponents from all around the world. Among the most popular forms of online chess are blitz and bullet, fast formats in which each player has only a few minutes (or less) on the clock.

Lichess, a non-profit and open-source online chess platform, has quickly become a favorite among chess enthusiasts due to its commitment to fairness, openness, and transparency. As an open-source platform, Lichess provides an API that allows developers to access a wealth of data about the games played on their platform.

In this data engineering project, I use the Lichess API to collect, transform, and visualize data on chess games played on the platform. By analyzing this data, I hope to gain insight into the openings used by top players and to identify trends and patterns that could help players improve.

Project Architecture

Data Source

The Lichess API comes with limitations that shape how data can be retrieved. These include:

  • Game data retrieval is throttled to 20 games per second.
  • Only one request can be made at a time. If this limit is exceeded (the API answers with HTTP 429), wait a full minute before sending further requests.
  • Computer evaluation data is only available for games on which "Run Computer Analysis" has already been requested.

Knowing these limitations up front lets us design the ingestion process to stay within them and avoid API usage problems.
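As a rough illustration of working within these limits, the sketch below sends one request at a time and pauses for a full minute when the limit is hit. It is not the project's actual code: `fetch_with_backoff`, the `fetch` callable, and `max_attempts` are hypothetical names, and it assumes the API signals an exceeded limit with an HTTP 429 status.

```python
import time
from typing import Callable, Tuple

RETRY_WAIT_SECONDS = 60  # the one-minute pause after exceeding the limit


def fetch_with_backoff(
    fetch: Callable[[], Tuple[int, str]],
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch() sequentially and wait a minute after each HTTP 429.

    `fetch` is a hypothetical callable that performs a single HTTP request
    and returns (status_code, body).
    """
    for attempt in range(1, max_attempts + 1):
        status, body = fetch()
        if status == 200:
            return body
        if status == 429 and attempt < max_attempts:
            # Rate limit hit: back off before the next (single) request.
            sleep(RETRY_WAIT_SECONDS)
            continue
        raise RuntimeError(f"request failed with HTTP {status}")
    raise RuntimeError("max_attempts must be at least 1")
```

Because requests are issued strictly one after another inside the loop, the "one request at a time" constraint holds as long as only one such loop runs per client.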

Ingestion

In the ingestion phase of the project, a Python script fetches data from the Lichess API, focusing on the bullet, blitz, and ultrabullet games of user 'penguingim1'. The script polls at 21-second intervals (to comply with rate limits) for a duration of 10 minutes. The collected data is published to a Kafka topic and then loaded into Snowflake via a Confluent connector, setting the stage for dbt modeling and analysis.
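The polling loop described here could look roughly like the following. This is a minimal sketch rather than the producer in this repository: `run_ingestion`, `fetch_games`, `publish`, and `start_ms` are hypothetical names, the HTTP call and the Kafka client are injected as plain callables, and the use of the `perfType` and `since` query parameters and of newline-delimited JSON responses reflects my reading of the Lichess games export endpoint.

```python
import json
import time
from typing import Callable, List
from urllib.parse import urlencode

USERNAME = "penguingim1"
PERF_TYPES = "ultraBullet,bullet,blitz"
POLL_INTERVAL_SECONDS = 21      # interval stated in the text
RUN_DURATION_SECONDS = 10 * 60  # total run time stated in the text


def build_games_url(username: str, since_ms: int) -> str:
    """Build the export URL for games created at or after since_ms."""
    query = urlencode({"perfType": PERF_TYPES, "since": since_ms})
    return f"https://lichess.org/api/games/user/{username}?{query}"


def parse_ndjson(body: str) -> List[dict]:
    """Parse a newline-delimited JSON body into one dict per game."""
    return [json.loads(line) for line in body.splitlines() if line.strip()]


def run_ingestion(
    fetch_games: Callable[[str], str],
    publish: Callable[[dict], None],
    start_ms: int,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll every 21 seconds for 10 minutes; return the number of games published.

    fetch_games (hypothetical): GETs the URL and returns the NDJSON body.
    publish (hypothetical): sends one game record to the Kafka topic.
    """
    deadline = clock() + RUN_DURATION_SECONDS
    since_ms = start_ms
    published = 0
    while clock() < deadline:
        for game in parse_ndjson(fetch_games(build_games_url(USERNAME, since_ms))):
            publish(game)
            published += 1
            # Move the cursor past this game so it is not fetched again.
            since_ms = max(since_ms, game.get("createdAt", since_ms) + 1)
        sleep(POLL_INTERVAL_SECONDS)
    return published
```

Injecting `fetch_games` and `publish` keeps the loop independent of any particular HTTP or Kafka library, so the timing and cursor logic can be exercised without network access.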

Dimensional Model

Codebase

infrastructure/producer

The infrastructure/producer folder holds the code for the custom Kafka producer I created for data ingestion. This includes the Dockerfile I use to host the producer on AWS, separately from my dbt project.

warehouse/snowflake

The warehouse/snowflake folder holds the full dbt project, which is also dockerized for AWS.

DBT Lineage

Final Dashboard

Before creating a dashboard, ask yourself a few questions:

  • What is the primary focus?
  • What type of decisions will be made based on the data?
  • How detailed does your dashboard need to be?

The primary focus of this dashboard is to help the viewer choose new chess openings to try, based on what the selected strong player uses.

The completed visualization includes a few visuals:

  • Win percentage by chess opening, sorted by play count (table)
  • Chess rating over time
  • Win rate by day
  • Play rate of the top 10 chess openings
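To make the first metric concrete, here is a small Python sketch of "win percentage by opening, sorted by play count". It only illustrates the calculation and is not how the dbt models or the dashboard compute it; the function name and the row fields `opening` and `result` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def win_rate_by_opening(rows: Iterable[Dict[str, str]]) -> List[Tuple[str, int, float]]:
    """Return (opening, games_played, win_percentage), most played first.

    Each row is assumed to carry a hypothetical "opening" name and a
    "result" of "win", "loss", or "draw" from the tracked player's side.
    """
    played: Counter = Counter()
    won: Counter = Counter()
    for row in rows:
        played[row["opening"]] += 1
        if row["result"] == "win":
            won[row["opening"]] += 1
    table = [
        (opening, count, round(100 * won[opening] / count, 1))
        for opening, count in played.items()
    ]
    # Sort by play count, descending, as in the dashboard table.
    return sorted(table, key=lambda entry: entry[1], reverse=True)
```

Slicing the result with `[:10]` gives the ten most played openings, which is the population the play-rate chart is based on.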

Contributors

ericbjames
