
Status: Archive (code is provided as-is, no updates expected)

Large-Scale Study of Curiosity-Driven Learning

Yuri Burda*, Harri Edwards*, Deepak Pathak*,
Amos Storkey, Trevor Darrell, Alexei A. Efros
(* alphabetical ordering, equal contribution)

University of California, Berkeley
OpenAI
University of Edinburgh

This is a TensorFlow-based implementation for our paper on a large-scale study of curiosity-driven learning across 54 environments. Curiosity is a type of intrinsic reward function that uses prediction error as the reward signal. In this paper, we perform the first large-scale study of purely curiosity-driven learning, i.e. without any extrinsic rewards, across 54 standard benchmark environments. We further investigate the effect of using different feature spaces for computing prediction error and show that random features are sufficient for many popular RL game benchmarks, but learned features appear to generalize better (e.g. to novel game levels in Super Mario Bros.). If you find this work useful in your research, please cite:

@inproceedings{largeScaleCuriosity2018,
    Author = {Burda, Yuri and Edwards, Harri and
              Pathak, Deepak and Storkey, Amos and
              Darrell, Trevor and Efros, Alexei A.},
    Title = {Large-Scale Study of Curiosity-Driven Learning},
    Booktitle = {arXiv:1808.04355},
    Year = {2018}
}
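
As a rough illustration of the core idea (a minimal sketch, not the implementation in this repo; the shapes and random "features" are placeholders), the curiosity reward is the error of a forward model's prediction in some feature space:

import numpy as np

rng = np.random.default_rng(0)

# Placeholder feature embeddings: phi(s_{t+1}) as produced by the feature
# network, and the forward model's prediction of those features from (s_t, a_t).
batch, feat_dim = 8, 32
phi_next = rng.normal(size=(batch, feat_dim))  # actual next-state features
phi_pred = rng.normal(size=(batch, feat_dim))  # forward-model prediction

# Curiosity-style intrinsic reward: per-transition prediction error.
intrinsic_reward = np.mean((phi_pred - phi_next) ** 2, axis=-1)
print(intrinsic_reward.shape)  # (8,) -- one reward per transition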

Installation and Usage

The following command should train a pure exploration agent on Breakout with default experiment parameters.

python run.py

To use more than one GPU/machine, use MPI. For example, on an 8-GPU machine:

mpiexec -n 8 python run.py

This collects experience with 1024 parallel environments (8 processes x the default 128 environments per process) instead of the default 128.

Saving Agents

The policy is automatically saved during training. Specify a path to your desired folder with the --saved_model_dir flag, e.g.:

python run.py --saved_model_dir YOUR_SAVED_MODEL_FOLDER

The model filename is hardcoded to model.ckpt, but because the model is saved periodically, your folder will contain model.ckpt-{INTEGER} files, where the integer identifies a specific save.
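
For example, after a run with periodic saves, the folder might contain something like the following (the step numbers here are hypothetical; the .index/.meta/.data files are standard TensorFlow checkpoint artifacts):

checkpoint
model.ckpt-100.data-00000-of-00001
model.ckpt-100.index
model.ckpt-100.meta
model.ckpt-200.data-00000-of-00001
model.ckpt-200.index
model.ckpt-200.meta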

Restoring Saved Agents

Once you've run and saved an agent, you can restore it by specifying the saved model folder and the model name with the --saved_model_dir and --model_name flags. The model name should be model.ckpt-{INTEGER} without the .index suffix, so model.ckpt-100 would be a valid model name.
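
For example, combining the two flags (the folder and checkpoint number are placeholders from the sections above):

python run.py --saved_model_dir YOUR_SAVED_MODEL_FOLDER --model_name model.ckpt-100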

Evaluating a Restored Agent

To evaluate, enable the -eval flag. You can specify the evaluation length with the --n_eval_steps argument, which defaults to 512 steps. Note: the default number of workers is 128, set by --envs_per_process, so the default configuration produces 128 parallel rollouts of 512 steps each (65,536 environment steps in total).
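
For example, a shorter evaluation with fewer workers might look like this (the flags are the ones named above; the values and paths are placeholders):

python run.py -eval --n_eval_steps 256 --envs_per_process 32 --saved_model_dir ./tmp/ --model_name model.ckpt-100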

Running evaluation automatically generates a dataset of saved actions. You can also save image observations from the evaluation run by enabling the -include_images flag. If you include images, the data file can get quite large (~1.8 GB for 128 workers x 512 steps per worker).

Here's an example command for running evaluation and collecting a dataset that includes images:

python run.py -eval -include_images --saved_model_dir ./tmp/ --model_name model.ckpt-100

Visualizing the Evaluation Dataset

Now that you've saved your data, you should have a file named {env_name}_data.npy (for example, BreakoutNoFrameskip-v4_data.npy) in your current directory. To visualize it, open the Visualization.ipynb notebook and run it. If you included images, you should get a video of the saved rollouts. Sense-check these to make sure the curiosity code is working correctly.
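
If you prefer to inspect the file outside the notebook, here is a minimal sketch (the filename matches the example above; allow_pickle=True is an assumption in case the array stores Python objects rather than plain numbers):

import numpy as np

# Load the evaluation dataset written by run.py -eval.
data = np.load("BreakoutNoFrameskip-v4_data.npy", allow_pickle=True)

# Inspect shape/dtype before indexing; see Visualization.ipynb for the
# exact structure this repo saves.
print(data.shape, data.dtype)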

Data for plots in paper

Data for Figure-2: raw game score data along with the plotting code used to generate Figure 2 in the paper.

Other helpful pointers

  • Paper: https://arxiv.org/abs/1808.04355
