joshvarty / haliteio

An attempt at making a Halite III agent trained with reinforcement learning.

License: MIT License
Input for ship and spawn
- 64x64 plane for halite on the map
- 64x64 plane for the current unit
- 64x64 plane for my other units
- 64x64 plane for my units' halite
- 64x64 plane for my shipyard
- 64x64 plane for my dropoffs
- 3 x 64x64 planes for enemy units (one per enemy)
- 3 x 64x64 planes for enemy units' halite (one per enemy)
- 64x64 plane for my total score
- 3 x 64x64 planes for enemy scores (one per enemy)
- 64x64 plane for number of steps (maybe how many steps are left?)
MyBot.cpp for new approach

I'm worried about high correlations between rollouts generated from a single game. (Halite is identical, dropoffs are identical, ships are usually in the same space, etc.)
Consider training on every second or third example and just play more games to create enough rollout items.
One of the reasons I think we are getting large outputs is the large variation in scale between the different input layers of our network. (E.g. halite on the map ranges from 0 to 1,000, while the presence of a player is 0 or 1.)
For this reason I propose we normalize the inputs to our network.
# | Feature | Current Range | Normalized Range |
---|---|---|---|
1 | Map Halite | [0, inf) | [-0.5, ~0.5] |
2 | Steps Remaining | [0, 500] | [-0.5, 0.5] |
3 | My Ships | [0, 1] | [0, 1] |
4 | My Ships' Halite | [0, 1000] | [-0.5, 0.5] |
5 | My Dropoffs | [0, 1] | [0, 1] |
6 | My Score | [0, inf) | [-0.5, ~0.5] |
7 | Enemy Ships | [0, 1] | [0, 1] |
8 | Enemy Ships' Halite | [0, 1000] | [-0.5, 0.5] |
9 | Enemy Dropoffs | [0, 1] | [0, 1] |
10 | Enemy Score | [0, inf) | [-0.5, 0.5] |
11 | Current Entity Location | [0, 1] | [0, 1] |
12 | Current Entity Halite | [0, 1000] | [-1, 1] |
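The normalization above could be sketched roughly as follows. The function name, plane indices, and the (12, 64, 64) shape are assumptions based on the table and input list in this repo, not the actual implementation:

```python
import torch

def normalize_state(state):
    """Scale selected input planes into the ranges proposed above.

    `state` is assumed to be a (12, 64, 64) float tensor whose plane
    order matches the table; the indices here are illustrative.
    """
    s = state.clone()
    s[0] = s[0] / 1000.0 - 0.5   # map halite: roughly [0, 1000] -> [-0.5, ~0.5]
    s[1] = s[1] / 500.0 - 0.5    # steps remaining: [0, 500] -> [-0.5, 0.5]
    s[3] = s[3] / 1000.0 - 0.5   # my ships' halite: [0, 1000] -> [-0.5, 0.5]
    # binary presence planes (ships, dropoffs, current entity) stay in [0, 1]
    return s
```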
Double-check that our calculation of log_probs is correct and equivalent to PyTorch's distributions module.
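One way to check this: compute the log-probability by hand with `log_softmax` and compare against `torch.distributions.Categorical`. The logits and action values here are arbitrary test inputs:

```python
import torch
from torch.distributions import Categorical

logits = torch.tensor([[1.0, 2.0, 0.5]])
action = torch.tensor([1])

# Manual computation: log_softmax over actions, then gather the chosen one.
manual = torch.log_softmax(logits, dim=-1).gather(1, action.unsqueeze(1)).squeeze(1)

# Reference computation via PyTorch's distributions module.
dist = Categorical(logits=logits)
reference = dist.log_prob(action)

assert torch.allclose(manual, reference)
```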
We should try using the score as the reward and see if optimizing directly for score helps us create better agents.
I'm still worried that the agents will learn to cooperate rather than compete, but it's worth a shot since learning seems to slow with +1 and -1 rewards.
I've littered 12,64,64 all over our code; we should declare these dimensions in a single place so it won't be painful if we change the input structure.
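A minimal sketch of what a single home for these dimensions might look like; the module and constant names are hypothetical:

```python
# config.py -- hypothetical single home for the state-tensor shape,
# so changing the input structure only touches one file.
STATE_DEPTH = 12
MAP_HEIGHT = 64
MAP_WIDTH = 64
STATE_SHAPE = (STATE_DEPTH, MAP_HEIGHT, MAP_WIDTH)
```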
500 steps is a lot. Maybe we should use a larger discount rate, e.g. 0.999?
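Some quick arithmetic shows why this matters over a 500-step episode:

```python
# With 500-step episodes, gamma = 0.99 almost erases rewards from the
# end of the game by the time they are discounted back to step 0,
# while gamma = 0.999 keeps them visible.
print(0.99 ** 500)   # ~0.0066: a terminal reward barely reaches the first step
print(0.999 ** 500)  # ~0.61: still substantial credit across the episode
```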
In order to give the agent experience in different situations, consider randomizing where the dropoff spawns for each player.
Things I would like to do today:

forward() for the model

Similar to #1, I need to convert the board into an input for my neural network.
Gradient clipping might help us avoid exploding gradients without having to use such a small learning rate. We should try to implement it.
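A sketch of how this could fit into the training step, using PyTorch's built-in `clip_grad_norm_`. The tiny linear model here is a stand-in for the actual policy network:

```python
import torch

model = torch.nn.Linear(8, 4)          # stand-in for the policy network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

loss = model(torch.randn(2, 8)).pow(2).sum()
optimizer.zero_grad()
loss.backward()
# Rescale gradients in place so their global L2 norm is at most 1.0.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```

The `max_norm=1.0` threshold is just a starting point; it would need tuning alongside the learning rate.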
We need an approach for testing out different hyperparameters. Here are some things we should implement/do:
Everyone says you should have it, so let's add it and see how it performs.
If we would like to have varied states, we could consider having random offsets for each game. We would have to properly handle wrapping around the edges of the board though.
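Since the board is a torus, `np.roll` handles the wrap-around for free. A sketch, assuming the state is a (12, 64, 64) array of planes:

```python
import numpy as np

rng = np.random.default_rng(0)

def shift_state(state):
    """Apply a random toroidal shift to every plane of a (12, 64, 64) state.

    np.roll wraps values around the edges, which matches Halite's
    wrap-around board topology.
    """
    dy = int(rng.integers(0, state.shape[1]))
    dx = int(rng.integers(0, state.shape[2]))
    return np.roll(state, shift=(dy, dx), axis=(1, 2))
```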
2000 rollouts * 12 depth * 64 height * 64 width is roughly 98 million values, which is about 393 MB as 32-bit floats.
We should use a more compact state representation and expand it only when we need it.
We may also wish to investigate using sparse tensors here.
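Most planes are nearly all zeros (ship and dropoff presence especially), so a COO sparse tensor stores only the nonzero coordinates and values. A small illustration:

```python
import torch

dense = torch.zeros(12, 64, 64)
dense[2, 10, 10] = 1.0   # e.g. one friendly ship on an otherwise empty plane

# Compact storage: only nonzero coordinates and values are kept.
sparse = dense.to_sparse()

# Expand back to a dense tensor only when feeding the network.
restored = sparse.to_dense()

assert torch.equal(dense, restored)
```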
Whenever we get a new "best" network, we should save that network's weights so we can load it from another implementation and watch the replays.
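A sketch of such a hook; the function, the tracking variable, and the file name are all hypothetical:

```python
import torch

# Hypothetical hook: whenever evaluation produces a new best score,
# persist the weights so replays can be generated elsewhere.
best_score = float("-inf")

def maybe_save(model, score, path="best_network.pt"):
    global best_score
    if score > best_score:
        best_score = score
        torch.save(model.state_dict(), path)  # reload via load_state_dict()
        return True
    return False
```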
Let's start looking at the results and ruling out certain hyperparameter ranges.
Discount | Rounds | Batch | Rollout | LR | Steps | Score | Graphs |
---|---|---|---|---|---|---|---|
0.99 | 3 | 32 | 1000 | 1e-7 | 356.5 | 313.065 | Img |
0.99 | 3 | 32 | 1000 | 1e-8 | 11.79 | - | Img |
0.99 | 3 | 32 | 1000 | 1e-9 | 4.8 | - | Img |
0.99 | 3 | 32 | 5000 | 1e-7 | 331 | 63.38 | Img |
0.99 | 3 | 32 | 5000 | 1e-8 | 323** | 149 | Img |
0.99 | 3 | 32 | 5000 | 1e-9 | 6 | - | Img |
0.99 | 3 | 64 | 1000 | 1e-7 | 346 | 71 | Img |
0.99 | 3 | 64 | 1000 | 1e-8 | 6.6 | - | Img |
0.99 | 3 | 64 | 1000 | 1e-9 | 4.77 | - | Img |
0.99 | 3 | 64 | 5000 | 1e-7 | 360 | 92.2 | Img |
0.99 | 3 | 64 | 5000 | 1e-8 | 247.1** | 75.1 | Img |
0.99 | 3 | 64 | 5000 | 1e-9 | 5.2 | - | Img |
0.99 | 3 | 128 | 1000 | 1e-7 | 338.9 | 108 | Img |
0.99 | 3 | 128 | 1000 | 1e-8 | 5.3 | - | Img |
0.99 | 3 | 128 | 1000 | 1e-9 | 4.7 | - | Img |
0.99 | 3 | 128 | 5000 | 1e-7 | 396.6 | 46.0 | Img |
0.99 | 3 | 128 | 5000 | 1e-8 | 96.9** | 150.8 | Img |
0.99 | 3 | 128 | 5000 | 1e-9 | 5.1 | - | Img |
0.99 | 5 | 32 | 1000 | 1e-7 | 362 | 139.8 | Img |
0.99 | 5 | 32 | 1000 | 1e-8 | 122.7 | 139.3 | Img |
0.99 | 5 | 32 | 1000 | 1e-9 | 5.0 | - | Img |
0.99 | 5 | 32 | 5000 | 1e-7 | 395.0 | 113.4 | Img |
0.99 | 5 | 32 | 5000 | 1e-8 | 363.1 | 230.4 | Img |
0.99 | 5 | 32 | 5000 | 1e-9 | 7.0 | - | Img |
0.99 | 5 | 64 | 1000 | 1e-7 | 366 | 44.4 | Img |
0.99 | 5 | 64 | 1000 | 1e-8 | 12.0 | - | Img |
0.99 | 5 | 64 | 1000 | 1e-9 | 5.0 | - | Img |
0.99 | 5 | 64 | 5000 | 1e-7 | 387.5 | 125.0 | Img |
0.99 | 5 | 64 | 5000 | 1e-8 | - | - | Img |
0.99 | 5 | 64 | 5000 | 1e-9 | - | - | Img |
0.99 | 5 | 128 | 1000 | 1e-7 | - | - | Img |
0.99 | 5 | 128 | 1000 | 1e-8 | - | - | Img |
0.99 | 5 | 128 | 1000 | 1e-9 | - | - | Img |
0.99 | 5 | 128 | 5000 | 1e-7 | - | - | Img |
0.99 | 5 | 128 | 5000 | 1e-8 | - | - | Img |
0.99 | 5 | 128 | 5000 | 1e-9 | - | - | Img |
0.99 | 10 | 32 | 1000 | 1e-7 | - | - | Img |
0.99 | 10 | 32 | 1000 | 1e-8 | - | - | Img |
0.99 | 10 | 32 | 1000 | 1e-9 | - | - | Img |
0.99 | 10 | 32 | 5000 | 1e-7 | - | - | Img |
0.99 | 10 | 32 | 5000 | 1e-8 | - | - | Img |
0.99 | 10 | 32 | 5000 | 1e-9 | - | - | Img |
0.99 | 10 | 64 | 1000 | 1e-7 | - | - | Img |
0.99 | 10 | 64 | 1000 | 1e-8 | - | - | Img |
0.99 | 10 | 64 | 1000 | 1e-9 | - | - | Img |
0.99 | 10 | 64 | 5000 | 1e-7 | - | - | Img |
0.99 | 10 | 64 | 5000 | 1e-8 | - | - | Img |
0.99 | 10 | 64 | 5000 | 1e-9 | - | - | Img |
0.99 | 10 | 128 | 1000 | 1e-7 | - | - | Img |
0.99 | 10 | 128 | 1000 | 1e-8 | - | - | Img |
0.99 | 10 | 128 | 1000 | 1e-9 | - | - | Img |
0.99 | 10 | 128 | 5000 | 1e-7 | - | - | Img |
0.99 | 10 | 128 | 5000 | 1e-8 | - | - | Img |
0.99 | 10 | 128 | 5000 | 1e-9 | - | - | Img |
0.995 | 3 | 32 | 1000 | 1e-7 | 356 | 81.995 | Img |
0.995 | 3 | 32 | 1000 | 1e-8 | 16 | - | Img |
0.995 | 3 | 32 | 1000 | 1e-9 | 5 | - | Img |
0.995 | 3 | 32 | 5000 | 1e-7 | 339 | 75 | Img |
0.995 | 3 | 32 | 5000 | 1e-8 | 295.7** | 165.8 | Img |
0.995 | 3 | 32 | 5000 | 1e-9 | 5 | - | Img |
0.995 | 3 | 64 | 1000 | 1e-7 | 325 | 128 | Img |
0.995 | 3 | 64 | 1000 | 1e-8 | 9.0 | - | Img |
0.995 | 3 | 64 | 1000 | 1e-9 | 4.83 | - | Img |
0.995 | 3 | 64 | 5000 | 1e-7 | 353 | 80.7 | Img |
0.995 | 3 | 64 | 5000 | 1e-8 | 295.4 | 110.2 | Img |
0.995 | 3 | 64 | 5000 | 1e-9 | 5 | - | Img |
0.995 | 3 | 128 | 1000 | 1e-7 | 300 | 117.6 | Img |
0.995 | 3 | 128 | 1000 | 1e-8 | 5 | - | Img |
0.995 | 3 | 128 | 1000 | 1e-9 | 4.7 | - | Img |
0.995 | 3 | 128 | 5000 | 1e-7 | 377.8 | 123 | Img |
0.995 | 3 | 128 | 5000 | 1e-8 | 137.1** | 140.215 | Img |
0.995 | 3 | 128 | 5000 | 1e-9 | 4.9 | - | Img |
0.995 | 5 | 32 | 1000 | 1e-7 | 349.1 | 79.3 | Img |
0.995 | 5 | 32 | 1000 | 1e-8 | 44.0 | - | Img |
0.995 | 5 | 32 | 1000 | 1e-9 | 5.1 | - | Img |
0.995 | 5 | 32 | 5000 | 1e-7 | 371.2 | 142.2 | Img |
0.995 | 5 | 32 | 5000 | 1e-8 | 342.7 | 97.5 | Img |
0.995 | 5 | 32 | 5000 | 1e-9 | 7.7 | - | Img |
0.995 | 5 | 64 | 1000 | 1e-7 | 359.7 | 114.1 | Img |
0.995 | 5 | 64 | 1000 | 1e-8 | 9.9 | - | Img |
0.995 | 5 | 64 | 1000 | 1e-9 | 5.0 | - | Img |
0.995 | 5 | 64 | 5000 | 1e-7 | 380.8 | 111.7 | Img |
0.995 | 5 | 64 | 5000 | 1e-8 | - | - | Img |
0.995 | 5 | 64 | 5000 | 1e-9 | - | - | Img |
0.995 | 5 | 128 | 1000 | 1e-7 | - | - | Img |
0.995 | 5 | 128 | 1000 | 1e-8 | - | - | Img |
0.995 | 5 | 128 | 1000 | 1e-9 | - | - | Img |
0.995 | 5 | 128 | 5000 | 1e-7 | - | - | Img |
0.995 | 5 | 128 | 5000 | 1e-8 | - | - | Img |
0.995 | 5 | 128 | 5000 | 1e-9 | - | - | Img |
0.995 | 10 | 32 | 1000 | 1e-7 | - | - | Img |
0.995 | 10 | 32 | 1000 | 1e-8 | - | - | Img |
0.995 | 10 | 32 | 1000 | 1e-9 | - | - | Img |
0.995 | 10 | 32 | 5000 | 1e-7 | - | - | Img |
0.995 | 10 | 32 | 5000 | 1e-8 | - | - | Img |
0.995 | 10 | 32 | 5000 | 1e-9 | - | - | Img |
0.995 | 10 | 64 | 1000 | 1e-7 | - | - | Img |
0.995 | 10 | 64 | 1000 | 1e-8 | - | - | Img |
0.995 | 10 | 64 | 1000 | 1e-9 | - | - | Img |
0.995 | 10 | 64 | 5000 | 1e-7 | - | - | Img |
0.995 | 10 | 64 | 5000 | 1e-8 | - | - | Img |
0.995 | 10 | 64 | 5000 | 1e-9 | - | - | Img |
0.995 | 10 | 128 | 1000 | 1e-7 | - | - | Img |
0.995 | 10 | 128 | 1000 | 1e-8 | - | - | Img |
0.995 | 10 | 128 | 1000 | 1e-9 | - | - | Img |
0.995 | 10 | 128 | 5000 | 1e-7 | - | - | Img |
0.995 | 10 | 128 | 5000 | 1e-8 | - | - | Img |
0.995 | 10 | 128 | 5000 | 1e-9 | - | - | Img |
*High score likely due to opponent failing very quickly
** Appears to still be improving
It is probably useful to know roughly how long it takes to train different algorithms. This will only be an approximation since training time varies with machine load, but it would be useful to have.
Things we'd like to look at today:
We're running a pretty costly grid search over hyperparameters right now. It would help to have a script that went through the output files and charted results from each search.
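A sketch of the summarizing half of such a script. The `results/*.csv` layout and the `step`/`score` column names are purely hypothetical; the real output format would need to be substituted:

```python
import csv
import glob

def summarize(pattern="results/*.csv"):
    """Collect the best score per run from grid-search output files.

    Assumed layout (hypothetical): each run writes a CSV with
    "step,score" rows. Returns {path: best_score}, ready to sort
    or feed into a charting library.
    """
    summary = {}
    for path in glob.glob(pattern):
        with open(path) as f:
            scores = [float(row["score"]) for row in csv.DictReader(f)]
        if scores:
            summary[path] = max(scores)
    return summary
```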
After fixing #6 we should scale down the map to see if that improves anything. At the very least it may make simulations run much faster.
Things I would like to do today:

Check that from_blob is working properly for sampled_log_probs and others

Now that we're seeing signs of life, we should use a better neural network to control our actions and policy.
I'm thinking to use ResNet-18 from: https://github.com/pytorch/vision/blob/master/torchvision/models/resnet.py
We should measure how many self-crashes occur so we can observe how this number changes when we use different hyperparameters.
If our agent crashes its own ships into one another, it might be useful to punish this action with a -0.1 or -0.05 reward on that step.
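The penalty could be folded into the per-step reward with something like the sketch below; the function name and signature are illustrative, not from the codebase:

```python
def shaped_reward(base_reward, self_crashes, penalty=0.05):
    """Subtract a small penalty for each friendly collision this step.

    A `penalty` of 0.05-0.1 per crash is the range suggested above;
    this helper is hypothetical.
    """
    return base_reward - penalty * self_crashes
```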
Our agent needs to be able to restart the game in order to generate new rollouts.
It seems to be something similar to the following:

```cpp
// NOTE: These seem to be the steps necessary to start a new game!
hlt::Map map(map_parameters.width, map_parameters.height);
hlt::mapgen::Generator::generate(map, map_parameters);

std::string replay_directory = replay_arg.getValue();
if (replay_directory.back() != SEPARATOR) replay_directory.push_back(SEPARATOR);

hlt::GameStatistics game_statistics;
hlt::Replay replay{game_statistics, map_parameters.num_players, map_parameters.seed, map};
Logging::log("Map seed is " + std::to_string(map_parameters.seed));

hlt::Halite game(map, game_statistics, replay);
```
We could add a very small negative reward to turtles who move when they aren't supposed to be able to move. Something like -0.002 might work.
If it has any hope of learning to return halite to the base, it will do so when there is only one turtle spawned. We should make it spawn only one turtle and see how it performs.
We'll do this after a grid search of hyperparameters.
Tasks we would like to work on today:
- Finish Batcher class: Batcher should take a list of rollouts and allow us to randomly sample batches of them
- Investigate dones: we may be setting done incorrectly
- Begin train_network() method
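The Batcher described above could be sketched as follows. This is a minimal version that yields shuffled index batches, assuming the rollout data (states, actions, returns) is indexable by position:

```python
import numpy as np

class Batcher:
    """Shuffle rollout indices and yield minibatches without replacement.

    A minimal sketch of the Batcher described above; index the actual
    rollout arrays (states, actions, returns) with each yielded batch.
    """
    def __init__(self, batch_size, num_items, seed=0):
        self.batch_size = batch_size
        self.num_items = num_items
        self.rng = np.random.default_rng(seed)

    def __iter__(self):
        order = self.rng.permutation(self.num_items)
        for start in range(0, self.num_items, self.batch_size):
            yield order[start:start + self.batch_size]
```

Each pass over the Batcher reshuffles, so successive training rounds see the rollout items in a different order.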