marcolussetti / opendotadump-tools

Analysis & Tools for the 2011-2016 OpenDota Dump (~1.2 billion matches)


OpenDotaDump Tools

This repository hosts tools and analyses for the second OpenDota Data Dump.

These tools and analyses were presented as a poster at the TRU Undergraduate Research & Innovation Conference 2019.

We also presented these results in two class presentations and at the TRU Computing Science Showcase - Winter 2019.

For our class, we also submitted a report that expands on this work.

Data Set

The data set used for this analysis is the matches file from the OpenDota Data Dump mentioned above. It includes over one billion matches from March 2011 to March 2016. You can download a sample (3GB) or the full file (157GB compressed, 1.2TB uncompressed); both are CSV files with the same structure. We suggest exploring the data with the sample file before switching to the full file.
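Because the full file is about 1.2TB uncompressed, it is best read row by row rather than loaded at once. The following is a minimal Python sketch, not one of this repository's tools. It assumes a CSV with a header row and treats a `.gz` suffix as gzip compression; the actual compression format of the download is an assumption here, and the function name `iter_matches` is hypothetical.

```python
import csv
import gzip
from typing import Iterator


def iter_matches(path: str) -> Iterator[dict]:
    """Yield matches one at a time as dicts keyed by the CSV header.

    Streaming keeps memory use flat no matter how large the file is.
    Treating a ".gz" suffix as gzip is an assumption about the download.
    """
    # JSON columns such as pgroup can be long; raise csv's default field limit.
    csv.field_size_limit(2**31 - 1)
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", newline="", encoding="utf-8") as handle:
        yield from csv.DictReader(handle)
```

For example, `itertools.islice(iter_matches("matches_sample.csv"), 3)` yields the first three matches of a local copy of the sample (the file name is hypothetical). Because the generator holds only one row in memory at a time, the same code works on the sample and on the full file.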

Although they are not currently used in this project, the data dump also contains two other files: player_matches, whose full version is no longer available due to a lack of seeds, and match_skill, which contains further metadata about matches.

The format of the dataset is described on the OpenDota project wiki.

Sample of data

match_id,match_seq_num,radiant_win,start_time,duration,tower_status_radiant,tower_status_dire,barracks_status_radiant,barracks_status_dire,cluster,first_blood_time,lobby_type,human_players,leagueid,positive_votes,negative_votes,game_mode,engine,picks_bans,parse_status,chat,objectives,radiant_gold_adv,radiant_xp_adv,teamfights,version,pgroup
2304340261,2019317886,t,1461013929,1701,1975,4,63,3,155,100,0,10,0,0,0,1,1,,3,,,,,,,"{""0"":{""account_id"":4294967295,""hero_id"":93,""player_slot"":0},""1"":{""account_id"":4294967295,""hero_id"":75,""player_slot"":1},""2"":{""account_id"":4294967295,""hero_id"":19,""player_slot"":2},""3"":{""account_id"":4294967295,""hero_id"":44,""player_slot"":3},""4"":{""account_id"":4294967295,""hero_id"":7,""player_slot"":4},""128"":{""account_id"":4294967295,""hero_id"":46,""player_slot"":128},""129"":{""account_id"":45475622,""hero_id"":38,""player_slot"":129},""130"":{""account_id"":4294967295,""hero_id"":52,""player_slot"":130},""131"":{""account_id"":4294967295,""hero_id"":43,""player_slot"":131},""132"":{""account_id"":4294967295,""hero_id"":60,""player_slot"":132}}"
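The last column, pgroup, is a JSON object keyed by player slot, and it is CSV-quoted with its inner quotes doubled. The minimal Python sketch below parses a row with the standard `csv` module and then decodes pgroup with `json`. It assumes the Dota 2 convention that slots 0-4 are Radiant and 128-132 are Dire, and that radiant_win is stored as `t`/`f` as in the sample row. The helper names are hypothetical, and the demo row is trimmed to two columns and two of the ten players.

```python
import csv
import io
import json


def radiant_won(row: dict) -> bool:
    """radiant_win appears as "t"/"f" in the dump (see the sample row)."""
    return row["radiant_win"] == "t"


def picks_by_side(row: dict) -> dict:
    """Split the hero ids in a match's pgroup column into Radiant and Dire."""
    players = json.loads(row["pgroup"])
    sides = {"radiant": [], "dire": []}
    for player in players.values():
        # Assumed convention: slots 0-4 are Radiant, 128-132 are Dire.
        side = "dire" if player["player_slot"] >= 128 else "radiant"
        sides[side].append(player["hero_id"])
    return sides


# Trimmed version of the sample row: two columns, two of the ten players.
demo = io.StringIO(
    'radiant_win,pgroup\n'
    't,"{""0"":{""account_id"":4294967295,""hero_id"":93,""player_slot"":0},'
    '""128"":{""account_id"":4294967295,""hero_id"":46,""player_slot"":128}}"\n'
)
row = next(csv.DictReader(demo))
print(radiant_won(row), picks_by_side(row))  # True {'radiant': [93], 'dire': [46]}
```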

Generated (condensed) Data Sets

The condensed versions produced with our tools (described below) are available in this repository.

Tools

Matches Condenser

Matches Condenser is a Java tool that condenses the matches file through dimensionality and granularity reduction. Depending on its arguments, it produces a JSON file of either picks per hero per day or wins and losses per hero per day. We intend to extend the tool to retain more metadata, particularly match duration. The last stable release, available as a JAR, only produces picks per day and is meant to be used with the JSON to CSV tool.
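To make the two output modes concrete, here is a Python sketch of the same kind of reduction. The actual tool is written in Java and its code is not reproduced here. The sketch assumes rows shaped like the CSV sample above, buckets matches by UTC calendar day, and emits a hypothetical `{day: {hero_id: value}}` structure whose value is either a pick count or a `[wins, losses]` pair; the real tool's JSON layout and day boundaries may differ. A player counts as winning when their side (slot below 128 for Radiant) matches radiant_win.

```python
import json
from collections import defaultdict
from datetime import datetime, timezone


def match_day(start_time: str) -> str:
    """Map a Unix start_time to a UTC calendar day such as '2016-04-18'."""
    return datetime.fromtimestamp(int(start_time), tz=timezone.utc).date().isoformat()


def condense(rows, with_results: bool = False) -> dict:
    """Reduce match rows to per-day, per-hero counts (hypothetical layout).

    with_results=False: each cell is a pick count.
    with_results=True: each cell is a [wins, losses] pair.
    """
    if with_results:
        table = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    else:
        table = defaultdict(lambda: defaultdict(int))
    for row in rows:
        day = match_day(row["start_time"])
        radiant_won = row["radiant_win"] == "t"
        for player in json.loads(row["pgroup"]).values():
            hero = str(player["hero_id"])
            if not with_results:
                table[day][hero] += 1
                continue
            # Assumed convention: slots below 128 are Radiant.
            on_radiant = player["player_slot"] < 128
            won = on_radiant == radiant_won
            table[day][hero][0 if won else 1] += 1
    # Convert the nested defaultdicts to plain dicts before serialising.
    return {day: dict(heroes) for day, heroes in table.items()}
```

Memory use grows with days × heroes rather than with the number of matches, which is the point of the reduction: over a billion rows collapse into a table of a few thousand days by roughly a hundred heroes. `json.dumps(condense(rows))` then gives a file in the hypothetical layout above.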

JSON to CSV

Our JSON to CSV tool cleans the data, labels the heroes, and converts the output of the Matches Condenser into a straightforward CSV file. Unfortunately, it does not yet support the new [wins, losses] format.

In the meantime, two Jupyter Notebooks handle the [wins, losses] format: OpenDota_Picks_JSON_to_CSV.ipynb produces a standard picks-only CSV from the win-ratio JSON, and OpenDota_Picks_JSON_to_CSV_winratio.ipynb produces a CSV that retains win ratios but does not fully clean the data. We intend to fold these back into the tool shortly.
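The conversion these notebooks perform can be sketched in Python as follows. This is an illustration under assumptions, not the notebooks' code: it assumes the win-ratio JSON has the shape `{day: {hero_id: [wins, losses]}}`, uses a hypothetical and deliberately partial `HERO_NAMES` table for labelling, and writes one long-format row per day and hero containing both the pick count (wins + losses) and the win ratio. It performs no other data cleaning.

```python
import csv
import io
import json

# Hypothetical, partial lookup table; a real run needs an entry for every hero id.
HERO_NAMES = {"1": "Anti-Mage", "2": "Axe"}


def results_json_to_csv(results_json: str) -> str:
    """Flatten {day: {hero_id: [wins, losses]}} JSON into a long-format CSV."""
    results = json.loads(results_json)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["date", "hero_id", "hero", "picks", "win_ratio"])
    for day in sorted(results):
        # Sort hero ids numerically so "10" does not come before "2".
        for hero_id in sorted(results[day], key=int):
            wins, losses = results[day][hero_id]
            picks = wins + losses
            # Leave the ratio blank rather than divide by zero.
            ratio = round(wins / picks, 4) if picks else ""
            # Label heroes, with a placeholder for ids missing from the table.
            name = HERO_NAMES.get(hero_id, f"Unknown ({hero_id})")
            writer.writerow([day, hero_id, name, picks, ratio])
    return out.getvalue()
```

Keeping the pick count next to the ratio is deliberate: a ratio computed from only a few games is noisy, so the analysis can filter on picks before plotting win ratios.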

Analysis

Our analysis was done in Jupyter Notebooks, which, we must note, are not yet well cleaned up: the graph generation for our poster and the exploratory data analysis are still mixed together in the same notebooks, and generating the graphs takes quite a while. We intend to clean up these notebooks in the future.

Please also note that, because the library versions Plotnine requires differ from those provided in Google Colab, you will need to run the pip commands at the top of the notebook and restart the runtime once before running the rest of the notebook. If you run the entire notebook, be aware that the size and complexity of the graphs mean it may take quite a while to complete.

  • LookupSpikes: This notebook is the closest thing to a usable analysis tool. It is a work-in-progress cleanup of the original GraphPicks notebook (see below), but it still needs a lot of cleaning.
  • GraphPicks: This notebook was the main exploratory analysis tool and produced the early graphs.
  • GraphWinRatios: This notebook was the main exploratory tool for looking at win ratios.
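The exact method used by the LookupSpikes notebook is not described here. As one simple way to flag sudden jumps in a hero's daily pick counts, the sketch below applies a trailing z-score rule: a day counts as a spike when its value exceeds the mean of the previous `window` days by more than `k` population standard deviations. The function name `find_spikes` and its default parameters are assumptions chosen for illustration.

```python
from statistics import mean, pstdev


def find_spikes(series, window: int = 7, k: float = 3.0) -> list:
    """Return the indices whose value exceeds mean + k * stdev of the prior window.

    This is a trailing z-score rule, used here for illustration only; it
    is not necessarily how the LookupSpikes notebook defines a spike.
    """
    spikes = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu, sigma = mean(history), pstdev(history)
        if series[i] > mu + k * sigma:
            spikes.append(i)
    return spikes
```

For example, `find_spikes([10, 11, 10, 9, 10, 11, 10, 50, 10])` flags only index 7. After a perfectly flat window (standard deviation 0), any increase counts as a spike, so a minimum absolute jump may be worth adding in practice.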

Other Components

Poster

As mentioned above, we had the privilege of presenting a poster at the TRU Undergraduate Research & Innovation Conference 2019. The tools and source files used to generate the poster are included in this repository.

Prints

We also produced a number of smaller prints for explanatory purposes that we brought along to the conference.

Citation

If you wish to cite this work, we suggest citing the poster presentation:

Lussetti, M., & Fraser, D. (2019, March 29). Big Data Reduction: Lessons Learned from Analyzing One Billion Dota 2 Matches. Presented at the 14th annual TRU Undergraduate Research & Innovation Conference, Kamloops, Canada. Retrieved from https://digitalcommons.library.tru.ca/urc/2019/postersb/26
