GithubHelp home page GithubHelp logo

data's Introduction

Data Repository

This repository contains data for my personal projects, which is usually a blog post on my website, a submission to #TidyTuesday, something to practice web scraping and data cleaning on, or some combination of these.

Feel free to use this data, but none of it belongs to me, so please do so in accordance with the dataset's license. You don't need to cite me if you use something I scraped or cleaned. (But I would love to hear about what you're doing with it!)

And if you have any questions, please @ me on Twitter (or DM if you prefer). It makes my day whenever I'm able to help someone, even if it's just answering a quick question!

Repository Structure

Each dataset has its own folder with the following structure. Bold text signifies a folder. Normal text represents files:

data-folder

  • README.md (description, data dictionary, and link to original source)
  • clean data files (usually in CSV format)
  • raw
    • original data files (if the raw folder exists, these files needed to be cleaned)
  • ref
    • reference documents (e.g., license, data dictionary from the data source)
  • src
    • scripts (to scrape or clean the original data)

Dataset Descriptions

Dataset Description Suited for... Blog Post
Air Raids on Britain in World War II Location, timing, and casualties from German air raids on the United Kingdom during World War II. Approximately 32,000 records. Mapping, animation with gganimate
Broadway Weekly Grosses Weekly box office grosses from Broadway shows from 1985-2020. Includes weekly grosses, seats sold, and average and top ticket prices for over 1,000 shows. (Featured in TidyTuesday) Time series analysis, Forecasting with forecast ✒️
Caribou Location Tracking Time-stamped location tracking of 260 individual caribou from herds in northern British Columbia, Canada, from 1988-2016. (Featured in TidyTuesday) Mapping, animation with gganimate
Children's Book Ratings Ratings (1-5) for over 9,000 children's books, as well as title, author, and publishing details. Empirical Bayes estimation ✒️
Duke Lemur Center Ages, weights, litter sizes, and other data for over 3,700 individual animals from 27 species of lemur. Survival analysis
Fictional Character Personalities Crowdsourced scores of personality traits for 800 characters from fictional works (e.g., Game of Thrones, The Wizard of Oz). There are 268 "traits" on an adjective-pair spectrum (e.g., cruel / kind​). Principal Component Analysis (PCA), Clustering ✒️
Himalayan Climbing Expeditions Records for over 10,000 expeditions from 1905-2019 that have climbed the Nepal Himalayas (which includes Mt. Everest), like dates, outcomes (success/failure), highpoints reached, and details on individual climbers, like sex, age, citizenship, and (if applicable) cause of injury or death. (Featured in TidyTuesday) Exploratory Data Analysis (EDA), Classification ✒️
Japanese Mascots (Yuru-chara) Records of about 4,000 Japanese mascots (yuru-chara), including name, affiliation, description, and rankings in the Yuru-Chara Grand Prix, an annual competition held since 2011. Natural Language Processing (NLP) with tidytext
Sakura (Cherry Blossom) Flowering Two datasets: (1) flowering dates of the sakura (cherry blossom) tree in Kyoto, from 812-2015; (2) flowering and full bloom dates of the sakura across Japan, from 1953-2019, which tracks the sakura zensen (cherry blossom front). Mapping, animation with gganimate ✒️
TV Tropes Full list of tropes (and short descriptions) from tvtropes.org, along with instances of tropes in over 2,000 American TV series. Network analysis, Principal Component Analysis (PCA)
U.S. Diplomatic Gifts Diplomatic gifts foreign governments given to U.S. government employees -- such as when a a foreign leader visits the White House -- from 1999-2018. Includes recipient, donor, gift description, and estimated dollar value. Natural Language Processing (NLP) with tidytext ✒️

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.