joshvarty / haliteio

An attempt at making a Halite III agent trained with reinforcement learning.

License: MIT License
Input for ship and spawn
- 64x64 plane for halite on the map
- 64x64 plane for the current unit
- 64x64 plane for my other units
- 64x64 plane for my units' halite
- 64x64 plane for my shipyard
- 64x64 plane for my dropoffs
- 3 x 64x64 planes for enemy units (one per enemy)
- 3 x 64x64 planes for enemy units' halite (one per enemy)
- 64x64 plane for my total score
- 3 x 64x64 planes for enemy scores (one per enemy)
- 64x64 plane for number of steps (maybe how many steps are left?)
MyBot.cpp for new approach

I'm worried about high correlations between rollouts generated from a single game. (Halite is identical, dropoffs are identical, ships are usually in the same space, etc.)
Consider training on every second or third example and just play more games to create enough rollout items.
One of the reasons I think we are getting large outputs is the large variation in scale between the different input layers of our network. (E.g. halite on the map ranges from 0 to 1,000, while the presence of a player is 0 or 1.)
For this reason I propose we normalize the inputs to our network.
# | Feature | Current Range | Normalized Range |
---|---|---|---|
1 | Map Halite | [0, inf) | [-0.5, ~0.5] |
2 | Steps Remaining | [0, 500] | [-0.5, 0.5] |
3 | My Ships | [0, 1] | [0, 1] |
4 | My Ships' Halite | [0, 1000] | [-0.5, 0.5] |
5 | My Dropoffs | [0, 1] | [0, 1] |
6 | My Score | [0, inf) | [-0.5, ~0.5] |
7 | Enemy Ships | [0, 1] | [0, 1] |
8 | Enemy Ships' Halite | [0, 1000] | [-0.5, 0.5] |
9 | Enemy Dropoffs | [0, 1] | [0, 1] |
10 | Enemy Score | [0, inf) | [-0.5, 0.5] |
11 | Current Entity Location | [0, 1] | [0, 1] |
12 | Current Entity Halite | [0, 1000] | [-1, 1] |
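The normalization above could be sketched roughly as follows. The function name, plane indices, and the (12, 64, 64) shape are assumptions based on the table and input list in this repo, not the actual implementation:

```python
import torch

def normalize_state(state):
    """Scale selected input planes into the ranges proposed above.

    `state` is assumed to be a (12, 64, 64) float tensor whose plane
    order matches the table; the indices here are illustrative.
    """
    s = state.clone()
    s[0] = s[0] / 1000.0 - 0.5   # map halite: roughly [0, 1000] -> [-0.5, ~0.5]
    s[1] = s[1] / 500.0 - 0.5    # steps remaining: [0, 500] -> [-0.5, 0.5]
    s[3] = s[3] / 1000.0 - 0.5   # my ships' halite: [0, 1000] -> [-0.5, 0.5]
    # binary presence planes (ships, dropoffs, current entity) stay in [0, 1]
    return s
```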
Double-check that our calculation of log_probs is correct and equivalent to PyTorch's distributions module.
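One way to check this: compute the log-probability by hand with `log_softmax` and compare against `torch.distributions.Categorical`. The logits and action values here are arbitrary test inputs:

```python
import torch
from torch.distributions import Categorical

logits = torch.tensor([[1.0, 2.0, 0.5]])
action = torch.tensor([1])

# Manual computation: log_softmax over actions, then gather the chosen one.
manual = torch.log_softmax(logits, dim=-1).gather(1, action.unsqueeze(1)).squeeze(1)

# Reference computation via PyTorch's distributions module.
dist = Categorical(logits=logits)
reference = dist.log_prob(action)

assert torch.allclose(manual, reference)
```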
We should try using the score as the reward and see if optimizing directly for score helps us create better agents.
I'm still worried that the agents will learn to cooperate rather than compete, but it's worth a shot since learning seems to slow with +1 and -1 rewards.
I've littered 12,64,64 all over our code; we should declare these dimensions in a single place so it won't be painful if we change the input structure.
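A minimal sketch of what a single home for these dimensions might look like; the module and constant names are hypothetical:

```python
# config.py -- hypothetical single home for the state-tensor shape,
# so changing the input structure only touches one file.
STATE_DEPTH = 12
MAP_HEIGHT = 64
MAP_WIDTH = 64
STATE_SHAPE = (STATE_DEPTH, MAP_HEIGHT, MAP_WIDTH)
```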
500 steps is a lot. Maybe we should use a larger discount rate, e.g. 0.999?
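Some quick arithmetic shows why this matters over a 500-step episode:

```python
# With 500-step episodes, gamma = 0.99 almost erases rewards from the
# end of the game by the time they are discounted back to step 0,
# while gamma = 0.999 keeps them visible.
print(0.99 ** 500)   # ~0.0066: a terminal reward barely reaches the first step
print(0.999 ** 500)  # ~0.61: still substantial credit across the episode
```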
In order to give the agent experience in different situations, consider randomizing where the dropoff spawns for each player.
Things I would like to do today:

forward() for the model

Similar to #1, I need to convert the board into an input for my neural network.
Gradient clipping might help us avoid exploding gradients without having to use such a small learning rate. We should try to implement it.
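A sketch of how this could fit into the training step, using PyTorch's built-in `clip_grad_norm_`. The tiny linear model here is a stand-in for the actual policy network:

```python
import torch

model = torch.nn.Linear(8, 4)          # stand-in for the policy network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

loss = model(torch.randn(2, 8)).pow(2).sum()
optimizer.zero_grad()
loss.backward()
# Rescale gradients in place so their global L2 norm is at most 1.0.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```

The `max_norm=1.0` threshold is just a starting point; it would need tuning alongside the learning rate.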
We need an approach for testing out different hyperparameters. Here are some things we should implement/do:
Everyone says you should have it, so let's add it and see how it performs.
If we would like to have varied states, we could consider having random offsets for each game. We would have to properly handle wrapping around the edges of the board though.
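Since the board is a torus, `np.roll` handles the wrap-around for free. A sketch, assuming the state is a (12, 64, 64) array of planes:

```python
import numpy as np

rng = np.random.default_rng(0)

def shift_state(state):
    """Apply a random toroidal shift to every plane of a (12, 64, 64) state.

    np.roll wraps values around the edges, which matches Halite's
    wrap-around board topology.
    """
    dy = int(rng.integers(0, state.shape[1]))
    dx = int(rng.integers(0, state.shape[2]))
    return np.roll(state, shift=(dy, dx), axis=(1, 2))
```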
2000 rollouts * 12 depth * 64 height * 64 width is roughly 98 million values, which is about 393 MB as 32-bit floats.
We should use a more compact state representation and expand it only when we need it.
We may also wish to investigate using sparse tensors here.
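Most planes are nearly all zeros (ship and dropoff presence especially), so a COO sparse tensor stores only the nonzero coordinates and values. A small illustration:

```python
import torch

dense = torch.zeros(12, 64, 64)
dense[2, 10, 10] = 1.0   # e.g. one friendly ship on an otherwise empty plane

# Compact storage: only nonzero coordinates and values are kept.
sparse = dense.to_sparse()

# Expand back to a dense tensor only when feeding the network.
restored = sparse.to_dense()

assert torch.equal(dense, restored)
```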
Whenever we get a new "best" network, we should save that network's weights so we can load it from another implementation and watch the replays.
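A sketch of such a hook; the function, the tracking variable, and the file name are all hypothetical:

```python
import torch

# Hypothetical hook: whenever evaluation produces a new best score,
# persist the weights so replays can be generated elsewhere.
best_score = float("-inf")

def maybe_save(model, score, path="best_network.pt"):
    global best_score
    if score > best_score:
        best_score = score
        torch.save(model.state_dict(), path)  # reload via load_state_dict()
        return True
    return False
```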
Let's start looking at the results and ruling out certain hyperparameter ranges.
Discount | Rounds | Batch | Rollout | LR | Steps | Score | Graphs |
---|---|---|---|---|---|---|---|
0.99 | 3 | 32 | 1000 | 1e-7 | 356.5 | 313.065 | Img |
0.99 | 3 | 32 | 1000 | 1e-8 | 11.79 | - | Img |
0.99 | 3 | 32 | 1000 | 1e-9 | 4.8 | - | Img |
0.99 | 3 | 32 | 5000 | 1e-7 | 331 | 63.38 | Img |
0.99 | 3 | 32 | 5000 | 1e-8 | 323** | 149 | Img |
0.99 | 3 | 32 | 5000 | 1e-9 | 6 | - | Img |
0.99 | 3 | 64 | 1000 | 1e-7 | 346 | 71 | Img |
0.99 | 3 | 64 | 1000 | 1e-8 | 6.6 | - | Img |
0.99 | 3 | 64 | 1000 | 1e-9 | 4.77 | - | Img |
0.99 | 3 | 64 | 5000 | 1e-7 | 360 | 92.2 | Img |
0.99 | 3 | 64 | 5000 | 1e-8 | 247.1** | 75.1 | Img |
0.99 | 3 | 64 | 5000 | 1e-9 | 5.2 | - | Img |
0.99 | 3 | 128 | 1000 | 1e-7 | 338.9 | 108 | Img |
0.99 | 3 | 128 | 1000 | 1e-8 | 5.3 | - | Img |
0.99 | 3 | 128 | 1000 | 1e-9 | 4.7 | - | Img |
0.99 | 3 | 128 | 5000 | 1e-7 | 396.6 | 46.0 | Img |
0.99 | 3 | 128 | 5000 | 1e-8 | 96.9** | 150.8 | Img |
0.99 | 3 | 128 | 5000 | 1e-9 | 5.1 | - | Img |
0.99 | 5 | 32 | 1000 | 1e-7 | 362 | 139.8 | Img |
0.99 | 5 | 32 | 1000 | 1e-8 | 122.7 | 139.3 | Img |
0.99 | 5 | 32 | 1000 | 1e-9 | 5.0 | - | Img |
0.99 | 5 | 32 | 5000 | 1e-7 | 395.0 | 113.4 | Img |
0.99 | 5 | 32 | 5000 | 1e-8 | 363.1 | 230.4 | Img |
0.99 | 5 | 32 | 5000 | 1e-9 | 7.0 | - | Img |
0.99 | 5 | 64 | 1000 | 1e-7 | 366 | 44.4 | Img |
0.99 | 5 | 64 | 1000 | 1e-8 | 12.0 | - | Img |
0.99 | 5 | 64 | 1000 | 1e-9 | 5.0 | - | Img |
0.99 | 5 | 64 | 5000 | 1e-7 | 387.5 | 125.0 | Img |
0.99 | 5 | 64 | 5000 | 1e-8 | - | - | Img |
0.99 | 5 | 64 | 5000 | 1e-9 | - | - | Img |
0.99 | 5 | 128 | 1000 | 1e-7 | - | - | Img |
0.99 | 5 | 128 | 1000 | 1e-8 | - | - | Img |
0.99 | 5 | 128 | 1000 | 1e-9 | - | - | Img |
0.99 | 5 | 128 | 5000 | 1e-7 | - | - | Img |
0.99 | 5 | 128 | 5000 | 1e-8 | - | - | Img |
0.99 | 5 | 128 | 5000 | 1e-9 | - | - | Img |
0.99 | 10 | 32 | 1000 | 1e-7 | - | - | Img |
0.99 | 10 | 32 | 1000 | 1e-8 | - | - | Img |
0.99 | 10 | 32 | 1000 | 1e-9 | - | - | Img |
0.99 | 10 | 32 | 5000 | 1e-7 | - | - | Img |
0.99 | 10 | 32 | 5000 | 1e-8 | - | - | Img |
0.99 | 10 | 32 | 5000 | 1e-9 | - | - | Img |
0.99 | 10 | 64 | 1000 | 1e-7 | - | - | Img |
0.99 | 10 | 64 | 1000 | 1e-8 | - | - | Img |
0.99 | 10 | 64 | 1000 | 1e-9 | - | - | Img |
0.99 | 10 | 64 | 5000 | 1e-7 | - | - | Img |
0.99 | 10 | 64 | 5000 | 1e-8 | - | - | Img |
0.99 | 10 | 64 | 5000 | 1e-9 | - | - | Img |
0.99 | 10 | 128 | 1000 | 1e-7 | - | - | Img |
0.99 | 10 | 128 | 1000 | 1e-8 | - | - | Img |
0.99 | 10 | 128 | 1000 | 1e-9 | - | - | Img |
0.99 | 10 | 128 | 5000 | 1e-7 | - | - | Img |
0.99 | 10 | 128 | 5000 | 1e-8 | - | - | Img |
0.99 | 10 | 128 | 5000 | 1e-9 | - | - | Img |
0.995 | 3 | 32 | 1000 | 1e-7 | 356 | 81.995 | Img |
0.995 | 3 | 32 | 1000 | 1e-8 | 16 | - | Img |
0.995 | 3 | 32 | 1000 | 1e-9 | 5 | - | Img |
0.995 | 3 | 32 | 5000 | 1e-7 | 339 | 75 | Img |
0.995 | 3 | 32 | 5000 | 1e-8 | 295.7** | 165.8 | Img |
0.995 | 3 | 32 | 5000 | 1e-9 | 5 | - | Img |
0.995 | 3 | 64 | 1000 | 1e-7 | 325 | 128 | Img |
0.995 | 3 | 64 | 1000 | 1e-8 | 9.0 | - | Img |
0.995 | 3 | 64 | 1000 | 1e-9 | 4.83 | - | Img |
0.995 | 3 | 64 | 5000 | 1e-7 | 353 | 80.7 | Img |
0.995 | 3 | 64 | 5000 | 1e-8 | 295.4 | 110.2 | Img |
0.995 | 3 | 64 | 5000 | 1e-9 | 5 | - | Img |
0.995 | 3 | 128 | 1000 | 1e-7 | 300 | 117.6 | Img |
0.995 | 3 | 128 | 1000 | 1e-8 | 5 | - | Img |
0.995 | 3 | 128 | 1000 | 1e-9 | 4.7 | - | Img |
0.995 | 3 | 128 | 5000 | 1e-7 | 377.8 | 123 | Img |
0.995 | 3 | 128 | 5000 | 1e-8 | 137.1** | 140.215 | Img |
0.995 | 3 | 128 | 5000 | 1e-9 | 4.9 | - | Img |
0.995 | 5 | 32 | 1000 | 1e-7 | 349.1 | 79.3 | Img |
0.995 | 5 | 32 | 1000 | 1e-8 | 44.0 | - | Img |
0.995 | 5 | 32 | 1000 | 1e-9 | 5.1 | - | Img |
0.995 | 5 | 32 | 5000 | 1e-7 | 371.2 | 142.2 | Img |
0.995 | 5 | 32 | 5000 | 1e-8 | 342.7 | 97.5 | Img |
0.995 | 5 | 32 | 5000 | 1e-9 | 7.7 | - | Img |
0.995 | 5 | 64 | 1000 | 1e-7 | 359.7 | 114.1 | Img |
0.995 | 5 | 64 | 1000 | 1e-8 | 9.9 | - | Img |
0.995 | 5 | 64 | 1000 | 1e-9 | 5.0 | - | Img |
0.995 | 5 | 64 | 5000 | 1e-7 | 380.8 | 111.7 | Img |
0.995 | 5 | 64 | 5000 | 1e-8 | - | - | Img |
0.995 | 5 | 64 | 5000 | 1e-9 | - | - | Img |
0.995 | 5 | 128 | 1000 | 1e-7 | - | - | Img |
0.995 | 5 | 128 | 1000 | 1e-8 | - | - | Img |
0.995 | 5 | 128 | 1000 | 1e-9 | - | - | Img |
0.995 | 5 | 128 | 5000 | 1e-7 | - | - | Img |
0.995 | 5 | 128 | 5000 | 1e-8 | - | - | Img |
0.995 | 5 | 128 | 5000 | 1e-9 | - | - | Img |
0.995 | 10 | 32 | 1000 | 1e-7 | - | - | Img |
0.995 | 10 | 32 | 1000 | 1e-8 | - | - | Img |
0.995 | 10 | 32 | 1000 | 1e-9 | - | - | Img |
0.995 | 10 | 32 | 5000 | 1e-7 | - | - | Img |
0.995 | 10 | 32 | 5000 | 1e-8 | - | - | Img |
0.995 | 10 | 32 | 5000 | 1e-9 | - | - | Img |
0.995 | 10 | 64 | 1000 | 1e-7 | - | - | Img |
0.995 | 10 | 64 | 1000 | 1e-8 | - | - | Img |
0.995 | 10 | 64 | 1000 | 1e-9 | - | - | Img |
0.995 | 10 | 64 | 5000 | 1e-7 | - | - | Img |
0.995 | 10 | 64 | 5000 | 1e-8 | - | - | Img |
0.995 | 10 | 64 | 5000 | 1e-9 | - | - | Img |
0.995 | 10 | 128 | 1000 | 1e-7 | - | - | Img |
0.995 | 10 | 128 | 1000 | 1e-8 | - | - | Img |
0.995 | 10 | 128 | 1000 | 1e-9 | - | - | Img |
0.995 | 10 | 128 | 5000 | 1e-7 | - | - | Img |
0.995 | 10 | 128 | 5000 | 1e-8 | - | - | Img |
0.995 | 10 | 128 | 5000 | 1e-9 | - | - | Img |
*High score likely due to opponent failing very quickly
** Appears to still be improving
It is probably useful to know roughly how long it takes to train different algorithms. This will only be an approximation since training time varies with machine load, but it would be useful to have.
Things we'd like to look at today:
We're running a pretty costly grid search over hyperparameters right now. It would help to have a script that went through the output files and charted results from each search.
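A sketch of the summarizing half of such a script. The `results/*.csv` layout and the `step`/`score` column names are purely hypothetical; the real output format would need to be substituted:

```python
import csv
import glob

def summarize(pattern="results/*.csv"):
    """Collect the best score per run from grid-search output files.

    Assumed layout (hypothetical): each run writes a CSV with
    "step,score" rows. Returns {path: best_score}, ready to sort
    or feed into a charting library.
    """
    summary = {}
    for path in glob.glob(pattern):
        with open(path) as f:
            scores = [float(row["score"]) for row in csv.DictReader(f)]
        if scores:
            summary[path] = max(scores)
    return summary
```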
After fixing #6 we should scale down the map to see if that improves anything. At the very least it may make simulations run much faster.
Things I would like to do today:

Check that from_blob is working properly for sampled_log_probs and others

Now that we're seeing signs of life, we should use a better neural network to control our actions and policy.
I'm thinking to use ResNet-18 from: https://github.com/pytorch/vision/blob/master/torchvision/models/resnet.py
We should measure how many self-crashes occur so we can observe how this number changes when we use different hyperparameters.
If our agent crashes its own ships into one another, it might be useful to punish this action with a -0.1 or -0.05 reward on that step.
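The penalty could be folded into the per-step reward with something like the sketch below; the function name and signature are illustrative, not from the codebase:

```python
def shaped_reward(base_reward, self_crashes, penalty=0.05):
    """Subtract a small penalty for each friendly collision this step.

    A `penalty` of 0.05-0.1 per crash is the range suggested above;
    this helper is hypothetical.
    """
    return base_reward - penalty * self_crashes
```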
Our agent needs to be able to restart the game in order to generate new rollouts.
It seems to be something similar to the following:

```cpp
// NOTE: These seem to be the steps necessary to start a new game!
hlt::Map map(map_parameters.width, map_parameters.height);
hlt::mapgen::Generator::generate(map, map_parameters);

std::string replay_directory = replay_arg.getValue();
if (replay_directory.back() != SEPARATOR) replay_directory.push_back(SEPARATOR);

hlt::GameStatistics game_statistics;
hlt::Replay replay{game_statistics, map_parameters.num_players, map_parameters.seed, map};
Logging::log("Map seed is " + std::to_string(map_parameters.seed));

hlt::Halite game(map, game_statistics, replay);
```
We could add a very small negative reward to turtles who move when they aren't supposed to be able to move. Something like -0.002 might work.
If it has any hope of learning to return halite to the base, it will do so when there is only one turtle spawned. We should make it spawn only one turtle and see how it performs.
We'll do this after a grid search of hyperparameters.
Tasks we would like to work on today:
- Finish Batcher class: Batcher should take a list of rollouts and allow us to randomly sample batches of them
- Investigate dones: we may be setting done incorrectly
- Begin train_network() method
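The Batcher described above could be sketched as follows. This is a minimal version that yields shuffled index batches, assuming the rollout data (states, actions, returns) is indexable by position:

```python
import numpy as np

class Batcher:
    """Shuffle rollout indices and yield minibatches without replacement.

    A minimal sketch of the Batcher described above; index the actual
    rollout arrays (states, actions, returns) with each yielded batch.
    """
    def __init__(self, batch_size, num_items, seed=0):
        self.batch_size = batch_size
        self.num_items = num_items
        self.rng = np.random.default_rng(seed)

    def __iter__(self):
        order = self.rng.permutation(self.num_items)
        for start in range(0, self.num_items, self.batch_size):
            yield order[start:start + self.batch_size]
```

Each pass over the Batcher reshuffles, so successive training rounds see the rollout items in a different order.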