st-olz / metal_archives

What are the predominant genres and themes in metal music? (web scraping, sql database, normalize text data, geographic heatmap, word cloud)

metal_archives's Introduction

What are the predominant genres and themes in metal music, and where are they most popular?

In this project we scrape, clean and analyse data about metal bands from all over the world. The goal is to answer the following questions:

  • Which countries have the most metal bands, in absolute numbers and relative to population?
  • What are the most popular sub-genres and lyrical themes?
  • Do the answers to the questions above differ between independent bands and bands signed to a label?
  • Do the answers to the questions above differ between world regions?

The data is scraped from Encyclopaedia Metallum: The Metal Archives (https://www.metal-archives.com/), an online encyclopedia of musical artists who predominantly perform heavy metal music and its various sub-genres. It attempts to provide comprehensive information on each band, such as a discography, logos, pictures, lyrics, line-ups, biography, trivia and user-submitted reviews. The site also provides a system for submitting bands to the archives. The website is free of advertisements and is run completely independently.

The country population data is sourced from https://www.cia.gov/the-world-factbook and stored in the CSV file worlds_popultion.csv.

To generate a geographic heat map with the Python library GeoPandas, a shapefile was downloaded from https://www.naturalearthdata.com/downloads/10m-cultural-vectors/10m-admin-0-details/.

Web scraping metal-archives.com into a SQLite database

  1. To create the SQLite database, run create_metal_db.py, a Python script that uses the sqlite3 library. It also creates the table "band_info", into which the scraped data is later committed. The table contains the following columns:
  • band_id: an integer, the auto-incremented primary key
  • band_name: the band's name, which does not have to be unique
  • url: the unique URL of the band's page on metal-archives.com
  • country_of_origin: the country in which the band was founded
  • location: the city or region in which the band is located
  • status: activity status (active, on hold, split up, changed name, unknown)
  • formed_in: year of founding
  • genre: the style of metal played; can contain more than one genre
  • lyrical_themes: topics of the lyrics; can contain more than one theme
  • current_label: the label the band is currently signed to, or unsigned/independent
  • years_active: the time ranges in which the band was active, sometimes with additional information
  • data_retrieved: the date on which the record was retrieved from the given URL
  2. To scrape the data, run web_scraping_metal_archives.py. To get the list of bands and the URLs of their pages, the script uses the Selenium library. Selenium opens the pages in a browser, which is necessary to run JavaScript; only this way can the script access some of the required data and automate moving from page to page by clicking. To use Selenium in this script you need the Chrome browser and the latest version of ChromeDriver, which controls the browser. To collect the data from the individual band pages, the script uses the Beautiful Soup library. For further information, see the comments in the Python scripts. The data in the deposited database was retrieved between 15.12.2022 and 17.12.2022.

In addition to the libraries highlighted above, you also need: String, Time, Request and Datetime
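
The table layout from step 1 can be sketched with the standard sqlite3 module as follows. This is a minimal illustration, not the contents of create_metal_db.py. The column names, the auto-incremented primary key and the unique URL come from the list above. The column types, the helper names create_db and save_band, and the choice to skip already-stored URLs with INSERT OR IGNORE are assumptions.

```python
import sqlite3
from datetime import date

# Column types are assumptions; the README only names the columns and states
# that band_id is an auto-incremented primary key and that url is unique.
SCHEMA = """
CREATE TABLE IF NOT EXISTS band_info (
    band_id           INTEGER PRIMARY KEY AUTOINCREMENT,
    band_name         TEXT NOT NULL,
    url               TEXT NOT NULL UNIQUE,
    country_of_origin TEXT,
    location          TEXT,
    status            TEXT,
    formed_in         TEXT,
    genre             TEXT,
    lyrical_themes    TEXT,
    current_label     TEXT,
    years_active      TEXT,
    data_retrieved    TEXT
)
"""

# Columns filled from the scraped page; data_retrieved is added on insert.
COLUMNS = (
    "band_name", "url", "country_of_origin", "location", "status",
    "formed_in", "genre", "lyrical_themes", "current_label", "years_active",
)


def create_db(path=":memory:"):
    """Open (or create) the database and make sure band_info exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    conn.commit()
    return conn


def save_band(conn, band):
    """Store one scraped band (a dict keyed by column name).

    Missing keys are stored as NULL. A band whose URL is already in the
    table is silently skipped (hypothetical policy, chosen for this sketch).
    """
    values = [band.get(col) for col in COLUMNS] + [date.today().isoformat()]
    placeholders = ", ".join("?" for _ in values)
    conn.execute(
        f"INSERT OR IGNORE INTO band_info ({', '.join(COLUMNS)}, data_retrieved) "
        f"VALUES ({placeholders})",
        values,
    )
    conn.commit()
```

With this sketch, a scraping loop would call save_band once per band page. Because url is declared UNIQUE, running the scraper again does not duplicate rows.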

Clean and analyse the web scraped data

The process of cleaning and analysing the data is described in the Jupyter notebook analyse_metal_archives.ipynb.
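
Because the genre column can hold more than one genre per band, counting sub-genres requires splitting and normalising the raw text first. The sketch below shows one possible approach and is not taken from the notebook. The separators (slash, comma, semicolon), the removal of parenthesised remarks, and the function names split_genres and count_genres are all assumptions.

```python
import re
from collections import Counter


def split_genres(raw):
    """Split one raw genre string into lower-cased, de-duplicated labels."""
    # Drop parenthesised remarks such as "(early)" or "(later)" (assumed format).
    cleaned = re.sub(r"\([^)]*\)", "", raw)
    # Assumed separators between genres: slash, comma and semicolon.
    parts = re.split(r"[/,;]", cleaned)
    labels = []
    for part in parts:
        # Collapse repeated whitespace and normalise case.
        label = " ".join(part.split()).lower()
        if label and label not in labels:
            labels.append(label)
    return labels


def count_genres(raw_values):
    """Count how many bands mention each normalised genre label."""
    counts = Counter()
    for raw in raw_values:
        counts.update(split_genres(raw))
    return counts
```

The split is deliberately naive: a value such as "Death/Black Metal" yields the labels "death" and "black metal", so real data would need further rules to re-attach the shared "metal" suffix. The same pattern could be applied to lyrical_themes.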

Active metal bands per 100k inhabitants (World 2022)
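
The per-capita metric named above is the number of active bands divided by the population and scaled to 100,000 inhabitants. A minimal sketch of that arithmetic follows. The function name bands_per_100k, the dictionary inputs and the rule of skipping countries without a population figure are assumptions, and the numbers in the usage example are invented for illustration only.

```python
def bands_per_100k(band_counts, populations):
    """Return {country: active bands per 100,000 inhabitants}.

    Countries with a missing or zero population are skipped, because no
    rate can be computed for them.
    """
    rates = {}
    for country, n_bands in band_counts.items():
        population = populations.get(country)
        if not population:
            continue
        rates[country] = n_bands * 100_000 / population
    return rates


# Invented example values, not real data.
example = bands_per_100k(
    {"Country A": 50, "Country B": 10, "Country C": 3},
    {"Country A": 1_000_000, "Country B": 10_000_000},
)
print(example)  # {'Country A': 5.0, 'Country B': 0.1}
```

Skipping countries without a population figure keeps them out of the map rather than showing them with a misleading zero.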
