joshvarty / haliteio

An attempt at making a Halite III agent trained with reinforcement learning

License: MIT License

CMake 1.66% C++ 95.72% Shell 0.79% Batchfile 0.88% Python 0.95%


haliteio's Issues

Convert map into 64x64 inputs

Input for ship and spawn

- 64x64 plane for Halite           
- 64x64 plane for current unit      
- 64x64 plane for my other units
- 64x64 plane for my units halite
- 64x64 plane for my shipyard
- 64x64 plane for my dropoff
- 64x64 plane for enemy units
- 64x64 plane for enemy units
- 64x64 plane for enemy units
- 64x64 plane for enemy units halite
- 64x64 plane for enemy units halite
- 64x64 plane for enemy units halite

- 64x64 plane for my total score
- 64x64 plane for enemy score
- 64x64 plane for enemy score
- 64x64 plane for enemy score

- 64x64 plane for number of steps (maybe how many steps are left?)
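The list above adds up to 17 planes. A minimal sketch of one way to hold them in a single flat buffer follows; the struct name, the plane constants and the plane order (taken from the list) are assumptions for illustration, not what the repo actually does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed layout: 17 planes of 64x64 floats, in the order of the list above.
constexpr std::size_t kMapSize = 64;
constexpr std::size_t kNumPlanes = 17;
constexpr std::size_t kPlaneCells = kMapSize * kMapSize;

// Illustrative plane indices (hypothetical names).
constexpr std::size_t kHalite = 0;
constexpr std::size_t kMyScore = 12;
constexpr std::size_t kStepsRemaining = 16;

struct InputPlanes {
    // Row-major storage: plane, then y, then x.
    std::vector<float> data = std::vector<float>(kNumPlanes * kPlaneCells, 0.0f);

    // Access a single cell of a single plane.
    float& at(std::size_t plane, std::size_t y, std::size_t x) {
        assert(plane < kNumPlanes && y < kMapSize && x < kMapSize);
        return data[(plane * kMapSize + y) * kMapSize + x];
    }

    // Broadcast a scalar (score, steps remaining) over a whole plane.
    void fill(std::size_t plane, float value) {
        assert(plane < kNumPlanes);
        auto first = data.begin() + static_cast<std::ptrdiff_t>(plane * kPlaneCells);
        std::fill(first, first + static_cast<std::ptrdiff_t>(kPlaneCells), value);
    }
};
```

Scalar features such as score and steps go through `fill`, while per-cell features such as Halite go through `at`.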

2018.12.16

  • Finish removing spawn stuff (go back to fixed schedule)
  • Correct MyBot.cpp for new approach
  • Visualize current performance
  • Remove dropoff action?
  • Set up GCloud?

Don't train on every rollout item

I'm worried about high correlations between rollouts generated from a single game. (Halite is identical, dropoffs identical, ships usually in the same space etc.)

Consider training on every second or third example and just play more games to create enough rollout items.
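A rough sketch of the "every second or third example" idea, assuming rollout items are stored in a `std::vector` in the order they were generated. The function name `keep_every_nth` is made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Keeps items 0, stride, 2*stride, ... and drops the rest.
// With stride == 1 nothing is dropped.
template <typename T>
std::vector<T> keep_every_nth(const std::vector<T>& items, std::size_t stride) {
    assert(stride > 0);
    std::vector<T> kept;
    kept.reserve((items.size() + stride - 1) / stride);
    for (std::size_t i = 0; i < items.size(); i += stride) {
        kept.push_back(items[i]);
    }
    return kept;
}
```

A stride of 3 means we need roughly three times as many games to collect the same number of training items.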

Normalize/Scale input layers to network

One of the reasons I think we are getting large outputs is the large variation in scale between the different input layers of our network (e.g. Halite on the map ranges from 0 to 1,000, while the presence of a player is 0 or 1).

For this reason I propose we normalize the inputs of our network.

| # | Feature | Current range | Normalized range |
|---|---|---|---|
| 1 | Map Halite | [0, inf) | [-0.5, ~0.5] |
| 2 | Steps Remaining | [0, 500] | [-0.5, 0.5] |
| 3 | My Ships | [0, 1] | [0, 1] |
| 4 | My Ships' Halite | [0, 1000] | [-0.5, 0.5] |
| 5 | My Dropoffs | [0, 1] | [0, 1] |
| 6 | My Score | [0, inf) | [-0.5, ~0.5] |
| 7 | Enemy Ships | [0, 1] | [0, 1] |
| 8 | Enemy Ships' Halite | [0, 1000] | [-0.5, 0.5] |
| 9 | Enemy Dropoff | [0, 1] | [0, 1] |
| 10 | Enemy Score | [0, inf) | [-0.5, 0.5] |
| 11 | Current Entity Location | [0, 1] | [0, 1] |
| 12 | Current Entity Halite | [0, 1000] | [-1, 1] |
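A minimal sketch of the two scalings in the table, assuming a simple divide-and-shift. The function names are hypothetical, and for the unbounded features (map Halite, scores) the choice of a nominal maximum is an assumption: values above it overshoot 0.5, which is how I read the "~0.5" entries.

```cpp
#include <cassert>

// Maps [0, max_value] to [-0.5, 0.5]. Inputs above max_value land above 0.5.
inline float center_scale(float value, float max_value) {
    assert(max_value > 0.0f);
    return value / max_value - 0.5f;
}

// Maps [0, max_value] to [-1, 1], as in row 12 of the table.
inline float symmetric_scale(float value, float max_value) {
    assert(max_value > 0.0f);
    return 2.0f * value / max_value - 1.0f;
}
```

For example, a ship carrying 500 Halite would become `center_scale(500.0f, 1000.0f)`, which is 0.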

Try just using score as reward

We should try using the score as the reward and see if optimizing directly for score helps us create better agents.

I'm still worried that the agents will learn to cooperate rather than compete, but it's worth a shot since learning seems to be slow with +1 and -1 rewards.

Plan for 2018.11.28

Things I would like to do today:

  • Finish implementing forward() for the model
  • Finish calculating losses
  • Run a few training epochs
  • (stretch) Print out the losses and scores our agents are attaining

Approach for testing out different hyperparameters

We need an approach for testing out different hyperparameters. Here are some things we should implement/do:

  • Print out all hyperparameters at beginning of run
  • Print out network information at beginning of run
  • Print out mean losses
  • Average metrics over longer periods of time (Right now it's 10, try 50?)
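For the last item, a small sketch of a moving average with a configurable window (10 now, possibly 50). The class name `WindowedMean` is made up here, and it is only one of several reasonable ways to smooth the metrics.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Mean over the most recent `window` values that were added.
class WindowedMean {
public:
    explicit WindowedMean(std::size_t window) : window_(window) {
        assert(window > 0);
    }

    void add(double value) {
        values_.push_back(value);
        sum_ += value;
        // Drop the oldest value once the window is full.
        if (values_.size() > window_) {
            sum_ -= values_.front();
            values_.pop_front();
        }
    }

    // Returns 0 when nothing has been added yet.
    double mean() const {
        return values_.empty() ? 0.0 : sum_ / static_cast<double>(values_.size());
    }

private:
    std::size_t window_;
    std::deque<double> values_;
    double sum_ = 0.0;
};
```

Keeping one instance per metric (loss, score, steps) would let the window size itself be printed with the other hyperparameters.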

2018.12.17

  • Visualize current progress on one agent
  • Set up GCloud
  • Run grid search on GCloud
  • Visualize input layers
  • Plot input layers

Compress state representation

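2000 rollouts * 12 depth * 64 height * 64 width = 98,304,000 values, which is about 98 MB at one byte per value (roughly 393 MB if stored as 32-bit floats).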

We should use a more compact state representation and expand it only when we need it.

We may also wish to investigate using sparse tensors here.
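A rough sketch of the "store compact, expand on demand" idea, under the assumption that ships can be stored as small records and turned into a dense plane only when a batch is built. `CompactShip`, `dense_bytes` and `expand_ship_halite` are hypothetical names, not existing code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed compact record: 4 bytes per ship instead of a full 64x64 plane.
struct CompactShip {
    std::uint8_t x;
    std::uint8_t y;
    std::uint16_t halite;
};

// Bytes needed to store every rollout as dense planes.
constexpr std::size_t dense_bytes(std::size_t rollouts, std::size_t depth,
                                  std::size_t height, std::size_t width,
                                  std::size_t bytes_per_value) {
    return rollouts * depth * height * width * bytes_per_value;
}

// Expands the compact records into a dense, row-major size x size Halite plane.
std::vector<float> expand_ship_halite(const std::vector<CompactShip>& ships,
                                      std::size_t size) {
    std::vector<float> plane(size * size, 0.0f);
    for (const CompactShip& ship : ships) {
        assert(ship.x < size && ship.y < size);
        plane[ship.y * size + ship.x] = static_cast<float>(ship.halite);
    }
    return plane;
}
```

The map Halite plane is dense by nature, so this mostly helps the ship, dropoff and score planes.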

Save the best weights to disk

Whenever we get a new "best" network, we should save that network's weights so we can load it from another implementation and watch the replays.

Compare results from grid search

Let's start looking at the results and ruling out certain hyperparameter ranges.

Discount Rounds Batch Rollout LR Steps Score Graphs
0.99 3 32 1000 1e-7 356.5 313.065 Img
0.99 3 32 1000 1e-8 11.79 - Img
0.99 3 32 1000 1e-9 4.8 - Img
0.99 3 32 5000 1e-7 331 63.38 Img
0.99 3 32 5000 1e-8 323** 149 Img
0.99 3 32 5000 1e-9 6 - Img
0.99 3 64 1000 1e-7 346 71 Img
0.99 3 64 1000 1e-8 6.6 - Img
0.99 3 64 1000 1e-9 4.77 - Img
0.99 3 64 5000 1e-7 360 92.2 Img
0.99 3 64 5000 1e-8 247.1** 75.1 Img
0.99 3 64 5000 1e-9 5.2 - Img
0.99 3 128 1000 1e-7 338.9 108 Img
0.99 3 128 1000 1e-8 5.3 - Img
0.99 3 128 1000 1e-9 4.7 - Img
0.99 3 128 5000 1e-7 396.6 46.0 Img
0.99 3 128 5000 1e-8 96.9** 150.8 Img
0.99 3 128 5000 1e-9 5.1 - Img
0.99 5 32 1000 1e-7 362 139.8 Img
0.99 5 32 1000 1e-8 122.7 139.3 Img
0.99 5 32 1000 1e-9 5.0 - Img
0.99 5 32 5000 1e-7 395.0 113.4 Img
0.99 5 32 5000 1e-8 363.1 230.4 Img
0.99 5 32 5000 1e-9 7.0 - Img
0.99 5 64 1000 1e-7 366 44.4 Img
0.99 5 64 1000 1e-8 12.0 - Img
0.99 5 64 1000 1e-9 5.0 - Img
0.99 5 64 5000 1e-7 387.5 125.0 Img
0.99 5 64 5000 1e-8 Img
0.99 5 64 5000 1e-9 Img
0.99 5 128 1000 1e-7 Img
0.99 5 128 1000 1e-8 Img
0.99 5 128 1000 1e-9 Img
0.99 5 128 5000 1e-7 Img
0.99 5 128 5000 1e-8 Img
0.99 5 128 5000 1e-9 Img
0.99 10 32 1000 1e-7 Img
0.99 10 32 1000 1e-8 Img
0.99 10 32 1000 1e-9 Img
0.99 10 32 5000 1e-7 Img
0.99 10 32 5000 1e-8 Img
0.99 10 32 5000 1e-9 Img
0.99 10 64 1000 1e-7 Img
0.99 10 64 1000 1e-8 Img
0.99 10 64 1000 1e-9 Img
0.99 10 64 5000 1e-7 Img
0.99 10 64 5000 1e-8 Img
0.99 10 64 5000 1e-9 Img
0.99 10 128 1000 1e-7 Img
0.99 10 128 1000 1e-8 Img
0.99 10 128 1000 1e-9 Img
0.99 10 128 5000 1e-7 Img
0.99 10 128 5000 1e-8 Img
0.99 10 128 5000 1e-9 Img
0.995 3 32 1000 1e-7 356 81.995 Img
0.995 3 32 1000 1e-8 16 - Img
0.995 3 32 1000 1e-9 5 - Img
0.995 3 32 5000 1e-7 339 75 Img
0.995 3 32 5000 1e-8 295.7** 165.8 Img
0.995 3 32 5000 1e-9 5 - Img
0.995 3 64 1000 1e-7 325 128 Img
0.995 3 64 1000 1e-8 9.0 - Img
0.995 3 64 1000 1e-9 4.83 - Img
0.995 3 64 5000 1e-7 353 80.7 Img
0.995 3 64 5000 1e-8 295.4 110.2 Img
0.995 3 64 5000 1e-9 5 - Img
0.995 3 128 1000 1e-7 300 117.6 Img
0.995 3 128 1000 1e-8 5 - Img
0.995 3 128 1000 1e-9 4.7 - Img
0.995 3 128 5000 1e-7 377.8 123 Img
0.995 3 128 5000 1e-8 137.1** 140.215 Img
0.995 3 128 5000 1e-9 4.9 - Img
0.995 5 32 1000 1e-7 349.1 79.3 Img
0.995 5 32 1000 1e-8 44.0 - Img
0.995 5 32 1000 1e-9 5.1 - Img
0.995 5 32 5000 1e-7 371.2 142.2 Img
0.995 5 32 5000 1e-8 342.7 97.5 Img
0.995 5 32 5000 1e-9 7.7 - Img
0.995 5 64 1000 1e-7 359.7 114.1 Img
0.995 5 64 1000 1e-8 9.9 - Img
0.995 5 64 1000 1e-9 5.0 - Img
0.995 5 64 5000 1e-7 380.8 111.7 Img
0.995 5 64 5000 1e-8 Img
0.995 5 64 5000 1e-9 Img
0.995 5 128 1000 1e-7 Img
0.995 5 128 1000 1e-8 Img
0.995 5 128 1000 1e-9 Img
0.995 5 128 5000 1e-7 Img
0.995 5 128 5000 1e-8 Img
0.995 5 128 5000 1e-9 Img
0.995 10 32 1000 1e-7 Img
0.995 10 32 1000 1e-8 Img
0.995 10 32 1000 1e-9 Img
0.995 10 32 5000 1e-7 Img
0.995 10 32 5000 1e-8 Img
0.995 10 32 5000 1e-9 Img
0.995 10 64 1000 1e-7 Img
0.995 10 64 1000 1e-8 Img
0.995 10 64 1000 1e-9 Img
0.995 10 64 5000 1e-7 Img
0.995 10 64 5000 1e-8 Img
0.995 10 64 5000 1e-9 Img
0.995 10 128 1000 1e-7 Img
0.995 10 128 1000 1e-8 Img
0.995 10 128 1000 1e-9 Img
0.995 10 128 5000 1e-7 Img
0.995 10 128 5000 1e-8 Img
0.995 10 128 5000 1e-9 Img

* High score likely due to opponent failing very quickly
** Appears to still be improving

Track training time for grid search

It is probably useful to know roughly how long it takes to train with different hyperparameters. This will only be an approximation, since training time varies with machine load, but it might still be useful to have.
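A minimal sketch of how a run could be timed, assuming each grid search configuration is wrapped in a callable. `time_seconds` is a hypothetical helper; it uses `std::chrono::steady_clock` so that wall-clock adjustments don't skew the result.

```cpp
#include <chrono>

// Runs fn once and returns the elapsed wall time in seconds.
template <typename F>
double time_seconds(F&& fn) {
    const auto start = std::chrono::steady_clock::now();
    fn();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

The returned value could be logged next to the hyperparameters of the run it belongs to.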

2018.12.05

Things we'd like to look at today:

  • Create script that plots scores and steps for completed runs when given a log file
  • Visualize a few games with hyperparameters that seemed to do well

Chart results from grid search

We're running a pretty costly grid search over hyperparameters right now. It would help to have a script that went through the output files and charted results from each search.

Scale down to 32x32

After fixing #6 we should scale down the map to see if that improves anything. At the very least it may make simulations run much faster.

2018.11.29

Things I would like to do today:

  • Verify that from_blob is working properly for sampled_log_probs and others
  • #8
  • Try to guarantee backprop works properly

Track number of self-crashes

We should measure how many self-crashes occur to observe how this number changes when we use different hyperparameters.
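One possible way to count these per turn, sketched under the assumption that we know the destination cell of each of our own ships after moves are resolved, and that a self-crash means two or more of our ships ending on the same cell. `count_self_crashes` and `Cell` are made-up names.

```cpp
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

using Cell = std::pair<int, int>;  // (x, y)

// Returns how many of our own ships end the turn on a cell shared with
// at least one other ship of ours.
std::size_t count_self_crashes(const std::vector<Cell>& destinations) {
    std::map<Cell, std::size_t> occupancy;
    for (const Cell& cell : destinations) {
        ++occupancy[cell];
    }
    std::size_t crashed = 0;
    for (const auto& entry : occupancy) {
        if (entry.second > 1) {
            crashed += entry.second;
        }
    }
    return crashed;
}
```

Summing this over a game gives one number per game that can be logged alongside score and steps.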

2018.12.18

  • Build a ResNet
  • Test ResNet on MNIST
  • Count/Plot input layer values
  • Optionally change output

Figure out how to restart game engine from agent

Our agent needs to be able to restart the game in order to generate new rollouts.

It seems to be something similar to the following:

// NOTE: These seem to be the steps necessary to start a new game!
// Build a map of the requested size and populate it.
hlt::Map map(map_parameters.width, map_parameters.height);
hlt::mapgen::Generator::generate(map, map_parameters);

// Make sure the replay directory ends with a path separator.
// (The empty() check avoids calling back() on an empty string.)
std::string replay_directory = replay_arg.getValue();
if (!replay_directory.empty() && replay_directory.back() != SEPARATOR) replay_directory.push_back(SEPARATOR);

// Fresh statistics and replay objects for the new game.
hlt::GameStatistics game_statistics;
hlt::Replay replay{game_statistics, map_parameters.num_players, map_parameters.seed, map};
Logging::log("Map seed is " + std::to_string(map_parameters.seed));

// Construct the game itself from the pieces above.
hlt::Halite game(map, game_statistics, replay);

2018.11.26

Tasks we would like to work on today:

  • Finish Batcher class

    • Batcher should take a list of rollouts and allow us to randomly sample batches of them
  • Investigate dones

    • Realized last night that we might be representing done incorrectly
  • Begin train_network() method
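A minimal sketch of what the Batcher described above could look like, assuming rollouts are copyable values and that "randomly sample" means shuffling once per pass and slicing into batches. The interface is a guess for illustration, not the class in the repo.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

template <typename Rollout>
class Batcher {
public:
    Batcher(std::vector<Rollout> rollouts, std::size_t batch_size, unsigned seed)
        : rollouts_(std::move(rollouts)), batch_size_(batch_size), rng_(seed) {
        assert(batch_size_ > 0);
    }

    // Shuffles the rollout order and splits it into batches.
    // Every rollout appears exactly once; the last batch may be smaller.
    std::vector<std::vector<Rollout>> shuffled_batches() {
        std::vector<std::size_t> order(rollouts_.size());
        std::iota(order.begin(), order.end(), std::size_t{0});
        std::shuffle(order.begin(), order.end(), rng_);

        std::vector<std::vector<Rollout>> batches;
        for (std::size_t start = 0; start < order.size(); start += batch_size_) {
            const std::size_t end = std::min(start + batch_size_, order.size());
            std::vector<Rollout> batch;
            for (std::size_t i = start; i < end; ++i) {
                batch.push_back(rollouts_[order[i]]);
            }
            batches.push_back(std::move(batch));
        }
        return batches;
    }

private:
    std::vector<Rollout> rollouts_;
    std::size_t batch_size_;
    std::mt19937 rng_;
};
```

Taking the seed as a constructor argument keeps runs reproducible, which should help when comparing hyperparameters.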
