DeepRacer Journal/Model Bank

Latest Update: 2022-11-29

Disclaimer: At the time of writing, I am employed by Amazon as an SSA (Specialist Solutions Architect) in Beijing, China. However, the notes, opinions, and thoughts on DeepRacer modeling shared here are my own, not those of my employer. In cases where I borrow ideas or methods from other people, I try to make that clear with appropriate references.

The idea to keep a DeepRacer journal comes from Scott Pletcher's awesome repo, where he documents some of the reward functions he has tried and the logic behind them.

Starting Out

I started by taking the (free) AWS DeepRacer: Driven By Reinforcement Learning course on AWS Skill Builder.

The course runs through the basics of setting up the physical DeepRacer car, using the DeepRacer console, and training and evaluating a model. It also explains what the model's hyperparameters are, and what parameters you can use in the reward function to reward (or punish) your car for taking certain actions.
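For orientation, a reward function is a Python function named `reward_function` that receives a dictionary of input parameters and returns a float. The sketch below reads a few of the documented parameters. The placeholder logic at the end is an assumption for illustration only, not one of the functions trained in this journal.

```python
def reward_function(params):
    """Skeleton reward function: read a few inputs, return a float."""
    # A few of the documented input parameters
    on_track = params["all_wheels_on_track"]    # bool
    speed = params["speed"]                     # m/s
    steering_angle = params["steering_angle"]   # degrees
    distance = params["distance_from_center"]   # meters
    track_width = params["track_width"]         # meters

    # Placeholder logic (an assumption): near-zero reward off the track
    reward = 1.0 if on_track else 1e-3
    return float(reward)
```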

During the course, I used a few of the example reward functions provided in the DeepRacer documentation and trained the car on the Jennens Family Speedway track, typically for 1 hour at a time.

Observations

After playing with some of the provided functions, I learned that:

Complicated reward functions aren't always better (in fact, making a lot of assumptions about what the car "should" be doing seems to be a bad thing)
Some reward functions can be trained more quickly than others (in general, following the centerline leads to fast convergence)

With these observations in mind, I opted to start with simpler reward functions, only adding new pieces as necessary to coax the model towards desired behaviors it had not acquired on its own.

1. The First Few Models

Model	Purpose
Follow the line	Just follow the centerline
Don't wiggle	Follow the centerline, but penalize the car for steering angles > 15 degrees
Stay on track	Reward the car based on its ability to stay on the track and reasonably close to the centerline

All three of these models were trained on the Jennens Family Speedway track. None of them achieved a time under 1 minute, and only the Don't wiggle model managed to make it around the track without any resets.
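The three models in the table can be sketched roughly as follows. These are modeled on the example reward functions in the AWS DeepRacer documentation. The band widths, the 0.8 steering penalty, and the 0.05 m border margin are assumptions carried over from those examples and may differ from what was actually trained. The function names are hypothetical; in the console, the chosen function would need to be named `reward_function`.

```python
def follow_the_line(params):
    """Reward staying close to the centerline, in three bands."""
    track_width = params["track_width"]
    distance_from_center = params["distance_from_center"]

    if distance_from_center <= 0.1 * track_width:
        return 1.0
    if distance_from_center <= 0.25 * track_width:
        return 0.5
    if distance_from_center <= 0.5 * track_width:
        return 0.1
    return 1e-3  # beyond half the track width: likely off track


def dont_wiggle(params, steering_threshold=15.0):
    """Centerline reward, reduced when steering exceeds the threshold."""
    reward = follow_the_line(params)
    if abs(params["steering_angle"]) > steering_threshold:
        reward *= 0.8  # assumed penalty factor
    return float(reward)


def stay_on_track(params):
    """Reward keeping all wheels on track with some margin to the border."""
    on_track = params["all_wheels_on_track"]
    margin = 0.5 * params["track_width"] - params["distance_from_center"]
    if on_track and margin >= 0.05:
        return 1.0
    return 1e-3
```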

2. Trying For More Speed

The obvious next step was to try to get the car moving faster. To do this, I cloned my three models again, but with a modified reward function that penalized low speeds. Specifically, I penalized the car for speeds below 1 m/s by reducing the reward by a factor of 0.8 (for the Don't wiggle model, the reward was reduced even further if the car's steering angle was > 15 degrees). I also updated the maximum allowed speed in the Action space settings, from 1 m/s to 2 m/s.

Model	Purpose
Follow the line, fast	Same as Follow the line, but with a low speed penalty
Don't wiggle, fast	Same as before (over-steering penalty), but with a low speed penalty added as well
Stay on track, fast	Same as before, but with a low speed penalty
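A minimal sketch of the low speed penalty, assuming that "reducing the reward by a factor of 0.8" means multiplying the reward by 0.8. The helper name `apply_low_speed_penalty` and the stand-in base reward are hypothetical.

```python
def apply_low_speed_penalty(base_reward, speed, speed_threshold=1.0, penalty=0.8):
    """Scale the reward down when the car is slower than the threshold."""
    if speed < speed_threshold:
        return base_reward * penalty
    return base_reward


def reward_function(params):
    """Stand-in base reward with the low speed penalty applied on top."""
    # Stand-in base reward (an assumption): 1.0 on track, near zero otherwise
    base_reward = 1.0 if params["all_wheels_on_track"] else 1e-3
    return float(apply_low_speed_penalty(base_reward, params["speed"]))
```

Because the penalty multiplies the base reward, it does not change which positions on the track are preferred. It only makes slow steps worth less than fast ones.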

The results were good: the car did speed up, and most models could complete the track in under a minute (under 40 seconds, in some cases).

Still More Speed?

Rather than punishing low speeds, I cloned my models again and tried directly adding a scaling factor to the reward, which scaled up as the car traveled faster. I also raised the car's maximum speed to 3 m/s.

The scaling factor looked something like this:

# Give a bonus for high speeds
reward += speed / 3.0

I was expecting good results, so I was surprised when the models performed badly. Instead of speeding around the track, the models were flying off it! Perhaps the reward for speed was overwhelming the other rewards for staying on the track and/or staying near the center.
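To see how the speed term could swamp the rest of the reward, the sketch below adds the `speed / 3.0` bonus to a banded centerline reward and compares a few situations. The base reward bands, the 1.0 m track width, and the scenario values are all assumptions chosen for illustration. They are not measurements from the discarded models.

```python
def centerline_reward(distance_from_center, track_width):
    """Banded centerline reward (assumed base reward)."""
    if distance_from_center <= 0.1 * track_width:
        return 1.0
    if distance_from_center <= 0.25 * track_width:
        return 0.5
    if distance_from_center <= 0.5 * track_width:
        return 0.1
    return 1e-3


def with_speed_bonus(base_reward, speed, max_speed=3.0):
    """Additive bonus, as in: reward += speed / 3.0"""
    return base_reward + speed / max_speed


TRACK_WIDTH = 1.0  # hypothetical track width in meters

# (label, distance from center in m, speed in m/s)
scenarios = [
    ("centered, 1 m/s", 0.05, 1.0),
    ("mid-lane, 3 m/s", 0.20, 3.0),
    ("near edge, 3 m/s", 0.45, 3.0),
    ("past the edge, 3 m/s", 0.60, 3.0),
]

rewards = {}
for label, distance, speed in scenarios:
    base = centerline_reward(distance, TRACK_WIDTH)
    rewards[label] = with_speed_bonus(base, speed)
    print(f"{label:>22}: base={base:.3f} total={rewards[label]:.3f}")
```

Under these assumed numbers, the bonus at full speed is worth as much as perfect centering, so a fast car in the middle band scores higher per step than a slow, centered one, and even a car past the edge still collects about 1.0. That is consistent with the guess above, though it does not prove it.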

I ended up throwing these models away and going back to the drawing board.

3. Answering Some Burning Questions

Since all the models seemed to get better at navigating the track as I trained them longer, I started to wonder: would any reward function work? What about a constant reward of 1? Or -1? What if I just fed the model a 0? What if I set the discount factor (a measure of how important future rewards are) to 0? I ran a few tests to tease out what would happen.

My previous experience told me that simpler models need more time to train, so I gave each of the models below a full 4 hours of training time.

Model	Purpose
Good dog	Constant reward of 1, regardless of what the car does
Bad dog	Constant reward of -1, regardless of what the car does
Existentialist dog	No reward of any kind, regardless of what the car does
In the moment	Default centerline following model, but with discount factor set to 0
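As background for these experiments, the sketch below computes the discounted return of an episode, the quantity whose expectation a reinforcement learning agent tries to maximize. The episode lengths and the discount factor of 0.99 are illustrative assumptions, not the settings used in training.

```python
def discounted_return(rewards, gamma):
    """Sum of per-step rewards, each weighted by gamma ** t."""
    total = 0.0
    for t, reward in enumerate(rewards):
        total += (gamma ** t) * reward
    return total


SHORT_EPISODE = 50   # hypothetical number of steps before a reset
LONG_EPISODE = 200   # hypothetical number of steps for a longer run

for name, constant in [("Good dog", 1.0), ("Bad dog", -1.0), ("Existentialist dog", 0.0)]:
    short = discounted_return([constant] * SHORT_EPISODE, gamma=0.99)
    long_ = discounted_return([constant] * LONG_EPISODE, gamma=0.99)
    print(f"{name:>18}: short={short:8.3f} long={long_:8.3f}")

# With a discount factor of 0, only the immediate reward counts
in_the_moment = discounted_return([1.0, 0.5, 0.1], gamma=0.0)
print(f"{'In the moment':>18}: return={in_the_moment:.3f}")
```

With a constant positive reward, a longer episode yields a larger return. With a constant negative reward, a longer episode yields a more negative one. With zero reward, every episode scores the same. With a discount factor of 0, the return collapses to the first reward alone.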
