jquintanac / proj-etl

Class project: an ETL process in which I cleaned a DnD Kaggle dataset and enriched it with Wikipedia information obtained through web scraping.

Jupyter Notebook 100.00%
dnd etl python webscraping

ETL PROJECT

The goal of this project is to practice the three core processes of data engineering: extracting, transforming and loading data. I extracted data about Dungeons and Dragons (DnD, a fantasy tabletop role-playing game first published in 1974 and updated several times since) to create playable characters based on characters from the Lord of the Rings movie.

Requirements ✔

♦ Extract data from 3 different sources using 2 different tools.

♦ Use functions, list comprehensions, string operations, scraping...

Sources and tools 🛠

For this project, my 3️⃣ sources were:

  • Five .csv files from Kaggle about DnD to build the database: classes, races, equipment, monsters and spells.

  • DnD API source to enrich the database.

  • Wikipedia, to get the information about the movie characters.

The tools I used were:

  • GET requests to the API source.

  • Web scraping with Selenium.

Step-by-step process 🏃‍♂️

1️⃣ I started by cleaning the Kaggle data: classes.csv, races.csv and equipment.csv. Because the cleaning was laborious, I decided to focus on the main characteristics and skipped monsters and spells. The result was three clean .csv files plus a new alignment file derived from the race descriptions.

2️⃣ Thanks to the API source, I could add new data such as stats, subclasses, subraces and languages. These tables hold additional information about the character and its background that you need when building a character.
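As a rough illustration of the API step above, here is a minimal sketch of a GET request against the DnD API. It is not the project's actual code: it uses the standard library's `urllib` rather than any particular HTTP client, the helper names `fetch_json` and `names_from_listing` are hypothetical, and the response shape (a `results` list whose entries carry a `name`) is an assumption about the API's list endpoints.

```python
import json
from urllib.request import Request, urlopen

# Host taken from the API source listed in this README.
BASE_URL = "https://www.dnd5eapi.co"


def fetch_json(path, timeout=10):
    """Send a GET request to BASE_URL + path and decode the JSON body."""
    request = Request(BASE_URL + path, headers={"Accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


def names_from_listing(payload):
    """Return the 'name' of every entry in a list-style response.

    Assumed shape: {"count": ..., "results": [{"name": ...}, ...]}.
    A payload without a "results" key yields an empty list.
    """
    return [entry["name"] for entry in payload.get("results", [])]
```

With network access, something like `names_from_listing(fetch_json("/api/subraces"))` should return the subrace names, and the same pattern would apply to subclasses and languages, assuming those paths match the API documentation.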

3️⃣ I realized that some of my tables, such as classes, stats and races, had no direct relationship in my data even though they are related in the game. So I did some research to fix it:

  • I wrote some functions to build the stats: a six-sided die function (a basic d6 roll), a random stats generator (based on the DnD handbook*1, where you roll four dice and sum the highest three to establish the value of a stat) and a stat modifier function (based on the DnD handbook*1, where you subtract 10 from the stat and divide the result by 2, rounding down, to get the modifier). Now I could generate any stat I wanted, but I could not relate the stats to the other data, so I had to establish predefined stats by class, based on the handbook*2. This way, stats and classes were related, but... what about races?

  • It was hard to join races and classes because you can choose any class for any race, although some classes are more appropriate for some races. So I searched for more information online and found a survey*3 of about 100,000 built characters with the number of cases for each class/race combination. I worked through the table and entered the information in an Excel document, converting the cases to frequencies and classifying them as 'infrequent' (low frequencies), 'recommended' (high frequencies) and 'neutral' for the rest. After assigning the ids for races and classes, I had the combo table that I needed.
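The two bullets above could be sketched roughly as follows. This is an illustration under stated assumptions, not the notebook's actual code: the function names are hypothetical, and the `low` and `high` cut-offs in `classify_combos` are placeholder values, since the thresholds used in the project are not given here.

```python
import random


def roll_d6(rng=random):
    """Roll one six-sided die."""
    return rng.randint(1, 6)


def roll_stat(rng=random):
    """Roll four d6 and add up the highest three."""
    rolls = sorted(roll_d6(rng) for _ in range(4))
    return sum(rolls[1:])  # rolls[0] is the lowest roll and is dropped


def stat_modifier(stat):
    """Subtract 10 from the stat, divide by 2 and round down."""
    return (stat - 10) // 2


def classify_combos(cases, low=0.01, high=0.05):
    """Label each race/class combination by its share of all cases.

    `cases` maps a (race, class) pair to its number of surveyed characters.
    `low` and `high` are hypothetical cut-offs chosen for illustration.
    """
    total = sum(cases.values())
    labels = {}
    for combo, count in cases.items():
        frequency = count / total
        if frequency < low:
            labels[combo] = "infrequent"
        elif frequency > high:
            labels[combo] = "recommended"
        else:
            labels[combo] = "neutral"
    return labels
```

Floor division keeps the rounding consistent for low stats too: a stat of 15 gives a modifier of 2, while a stat of 8 or 9 gives -1. Passing a seeded `random.Random` instance as `rng` makes the rolls reproducible.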

4️⃣ With my 10 DnD tables, I could build the database, so I started with the ERD in MySQL to check the relationships and... it worked!
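To make the idea of the relationships concrete, here is a small sketch of how a combo table can link races and classes through foreign keys. It swaps in Python's built-in `sqlite3` for MySQL so that it runs without a server, and it covers only three tables. All table and column names are hypothetical, not the project's schema.

```python
import sqlite3

# Hypothetical three-table slice of the schema: two lookup tables and
# a combo table that references both of them.
SCHEMA = """
CREATE TABLE races   (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE classes (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE race_class_combo (
    race_id  INTEGER NOT NULL REFERENCES races(id),
    class_id INTEGER NOT NULL REFERENCES classes(id),
    label    TEXT NOT NULL CHECK (label IN ('infrequent', 'neutral', 'recommended')),
    PRIMARY KEY (race_id, class_id)
);
"""


def build_database(connection):
    """Create the tables with foreign-key enforcement switched on."""
    connection.execute("PRAGMA foreign_keys = ON")
    connection.executescript(SCHEMA)


def classes_for_race(connection, race_name, label):
    """List the class names that carry `label` for the given race."""
    rows = connection.execute(
        """
        SELECT c.name
        FROM race_class_combo AS rc
        JOIN races   AS r ON r.id = rc.race_id
        JOIN classes AS c ON c.id = rc.class_id
        WHERE r.name = ? AND rc.label = ?
        ORDER BY c.name
        """,
        (race_name, label),
    )
    return [name for (name,) in rows]
```

With foreign keys enforced, inserting a combo row that points at a missing race or class id fails, which is one way to check that the relationships hold.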

[Image: ERD of the database]

5️⃣ I just needed the info about the character to build his character sheet so... time to scrape Wikipedia! I focused on the movie 'The Lord of the Rings: The Fellowship of the Ring' and, thanks to Selenium scraping, I got two main pieces of data: character name and race. I took three characters as examples: Frodo Baggins (a Hobbit), Gimli (a Dwarf) and Legolas (an Elf).
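A scraping step like the one above might look roughly like this. It is a sketch under several assumptions, not the project's code: the page URL and CSS selector are left for the caller to supply, the helper names are hypothetical, and `parse_character` assumes each scraped line reads like 'Actor as Character: description', which may not match the real page layout.

```python
import re

RACES = ("Hobbit", "Dwarf", "Elf")  # the three races named above


def scrape_texts(url, css_selector):
    """Open `url` in Chrome and return the text of every matching element."""
    # Imported here so the parsing helper below works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        elements = driver.find_elements(By.CSS_SELECTOR, css_selector)
        return [element.text for element in elements]
    finally:
        driver.quit()  # always close the browser, even on errors


def parse_character(line, races=RACES):
    """Extract (character name, race) from 'Actor as Character: description'.

    Returns None when the line does not follow that assumed pattern, and a
    race of None when no known race word appears in the description.
    """
    head, separator, description = line.partition(":")
    if not separator or " as " not in head:
        return None
    name = head.split(" as ", 1)[1].strip()
    for race in races:
        # Whole-word match with an optional plural "s", case-insensitive.
        if re.search(rf"\b{re.escape(race)}s?\b", description, re.IGNORECASE):
            return name, race
    return name, None
```

Keeping the parsing separate from the browser call means the string handling can be checked on sample lines without launching a browser. The whole-word match avoids false hits such as "elf" inside "himself", but it would miss irregular plurals like "Elves".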

6️⃣ Time to create! I made a new table in MySQL with all the parameters I considered important for the character sheet and the result was...

[Image: resulting character sheet table]

Now you can check all the possibilities you have to build your DnD character based on a LOTR character!

Useful Resources 💻

https://docs.python.org/3.7/howto/functional.html

https://docs.python.org/3/tutorial/datastructures.html#list-comprehensions

https://docs.python.org/3/tutorial/errors.html

https://stackoverflow.com/questions/tagged/string+python

Kaggle source: https://www.kaggle.com/datasets/shadowtime2000/dungeons-dragons

API source: https://www.dnd5eapi.co/docs/#get-/api/subraces/-index-

Scraping source: https://en.wikipedia.org/wiki/

*1 https://media.wizards.com/2016/downloads/DND/SRD-OGL_V5.1.pdf

*2 https://rpgbot.net/dnd5/characters/classes/

*3 https://www.enworld.org/attachments/db-vkqsw4aaajke-jpg.96949/

“You step into the road, and if you don’t keep your feet, there is no knowing where you might be swept off to.” J.R.R. Tolkien
