juliareinforcementlearning / juliareinforcementlearning.github.io

5.0 3.0 4.0 9.6 MB

Documentation for JuliaReinforcementLearning

Home Page: https://JuliaReinforcementLearning.org/

License: MIT License

Languages: HTML 75.53%, TeX 1.88%, JavaScript 22.58%
Topics: franklin

juliareinforcementlearning.github.io's People

Contributors: henrideh


juliareinforcementlearning.github.io's Issues

facing problem using tensorboard

I am facing the following problem while running:

(base) nabanita07@nabanita07:~$ tensorboard --logdir /home/nabanita07/checkpoints/JuliaRL_BasicDQN_CartPole_20200807180019/tb_log

/home/nabanita07/.local/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/nabanita07/.local/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/nabanita07/.local/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/nabanita07/.local/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/nabanita07/.local/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/nabanita07/.local/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
Traceback (most recent call last):
  File "/home/nabanita07/anaconda3/bin/tensorboard", line 6, in <module>
    from tensorboard.main import run_main
  File "/home/nabanita07/anaconda3/lib/python3.7/site-packages/tensorboard/main.py", line 40, in <module>
    from tensorboard import default
  File "/home/nabanita07/anaconda3/lib/python3.7/site-packages/tensorboard/default.py", line 39, in <module>
    from tensorboard.plugins.beholder import beholder_plugin_loader
  File "/home/nabanita07/anaconda3/lib/python3.7/site-packages/tensorboard/plugins/beholder/__init__.py", line 22, in <module>
    from tensorboard.plugins.beholder.beholder import Beholder
  File "/home/nabanita07/anaconda3/lib/python3.7/site-packages/tensorboard/plugins/beholder/beholder.py", line 199, in <module>
    class BeholderHook(tf.estimator.SessionRunHook):
  File "/home/nabanita07/.local/lib/python3.7/site-packages/tensorflow/python/util/deprecation_wrapper.py", line 106, in __getattr__
    attr = getattr(self._dw_wrapped_module, name)
AttributeError: module 'tensorflow' has no attribute 'estimator'

As an alternative to the Python tensorboard tooling for logging, we could use TensorBoardLogger.jl to make it more Julian.
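A minimal sketch of that suggestion, assuming TensorBoardLogger.jl is installed: it writes TensorBoard event files directly from Julia through the standard Logging interface. The directory name tb_log and the reward values are placeholders, not taken from the JuliaRL experiment. Viewing the files still requires the tensorboard command, so the Python environment error above would need to be fixed separately.

```julia
using Logging            # stdlib: provides with_logger
using TensorBoardLogger  # assumption: installed via Pkg.add("TensorBoardLogger")

# Write event files into ./tb_log; tb_overwrite clears any previous run there.
lg = TBLogger("tb_log", tb_overwrite)

# Hypothetical per-episode rewards standing in for real training output.
rewards = [10.0, 12.5, 20.0]

with_logger(lg) do
    for r in rewards
        # Each @info call is recorded as the scalar "training/reward" at successive steps.
        @info "training" reward = r
    end
end
```

The resulting directory can then be opened with tensorboard --logdir tb_log.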

Typo in "How to write a customized environment?" page

At the end of the "The minimal interfaces to implement" section of the "How to write a customized environment?" page, there is the following test code:

using ReinforcementLearning
hook = TotalRewardPerEpisode()
run(
    Agent(
        ;policy = RandomPolicy(env),
        trajectory = VectorialCompactSARTSATrajectory(
            state_type=Bool,
            action_type=Any,
            reward_type=Int,
            terminal_type=Bool,
        ),
    ),
    LotteryEnv(),
    StopAfterEpisode(1_000),
    hook
)

println(sum(hook.rewards) / 1_000)

Should VectorialCompactSARTSATrajectory be replaced by VectCompactSARTSATrajectory? It looks like VectorialCompactSARTSATrajectory is not defined in ReinforcementLearningCore (v0.4.5).

Also, the output displayed for this snippet is

UndefVarError: env not defined

which seems unintentional: env is passed to RandomPolicy(env) but is never defined. There is another UndefVarError at the beginning of the "Traits of environments" section too.

How to contribute to this website

It looks like there is only one commit right now, and it seems to have been auto-generated. Am I missing something here? How would one contribute to the website?

Minor changes to blog

I will add minor issues here as I go through the Get Started tutorial.

  • The tutorial does not mention precompiling the Flux, Random and BSON packages, which are needed to run the CartPole example.
    Suggestion: precompile Flux, Random and BSON in the same step where ReinforcementLearning is precompiled.

How to understand different kinds of trajectories?

This question was asked on Slack.

  1. A trajectory is designed to behave like a NamedTuple.
  2. A trajectory contains multiple traces, and each trace can use a different container type (Vector, CircularArrayBuffer, ElasticArray).
  3. A SharedTrajectory is a special trajectory in which different traces share the same container, each using a different part of it.
  4. A CombinedTrajectory combines different trajectories (similar to merging NamedTuples), so that trajectories can be composed as needed.
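To make points 1 to 4 concrete, here is a minimal sketch in plain Julia (Base only). It does not use ReinforcementLearningCore's actual types: traj, shared, extra and combined are hypothetical names, and two views into one vector stand in for a shared container.

```julia
# Hypothetical illustration only, not ReinforcementLearningCore's implementation.

# Points 1-2: a trajectory as a NamedTuple of traces, each with its own container.
traj = (state = Float32[], action = Int[], reward = Float32[])
push!(traj.state, 0.1f0)
push!(traj.action, 2)
push!(traj.reward, 1.0f0)
traj[:action]    # [2], NamedTuple-style indexing by trace name

# Point 3: one buffer of states; :state and :next_state are views into
# overlapping parts of the same container.
states = Float32[0.1, 0.2, 0.3, 0.4]    # s1, s2, s3, s4
shared = (state = view(states, 1:length(states)-1),
          next_state = view(states, 2:length(states)))

# Point 4: combining trajectories behaves like merging NamedTuples.
extra = (legal_actions_mask = BitVector[],)
combined = merge(traj, extra)
keys(combined)   # (:state, :action, :reward, :legal_actions_mask)
```

Because the views share memory, writing to states[2] changes both shared.state[2] and shared.next_state[1], so each state is stored only once.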

Some commonly used trajectories are:

  • CircularCompactSARTSATrajectory: usually used as the experience replay buffer in DQN.
  • CircularCompactSALRTSALTrajectory: adds two more traces to the one above: legal_actions and legal_actions_mask.
  • CircularCompactPSARTSATrajectory: used as a prioritized experience replay buffer.
  • CircularCompactPSALRTSALTrajectory: adds two more traces to the one above: legal_actions and legal_actions_mask.
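The Circular* trajectories above rely on a fixed-capacity buffer that overwrites its oldest entries once full, which is what an experience replay buffer needs. The following Base-only sketch shows that behaviour; SimpleCircularBuffer is a hypothetical type for illustration, not the CircularArrayBuffer API.

```julia
# Hypothetical fixed-capacity circular buffer (not the CircularArrayBuffers.jl API).
mutable struct SimpleCircularBuffer{T}
    data::Vector{T}
    head::Int   # storage index of the oldest element
    len::Int    # number of elements currently stored
end

# Create an empty buffer that can hold `capacity` elements.
SimpleCircularBuffer{T}(capacity::Int) where {T} =
    SimpleCircularBuffer{T}(Vector{T}(undef, capacity), 1, 0)

function Base.push!(b::SimpleCircularBuffer, x)
    cap = length(b.data)
    if b.len < cap
        # Not full yet: write into the next free slot after the stored elements.
        b.data[mod1(b.head + b.len, cap)] = x
        b.len += 1
    else
        # Full: overwrite the oldest element and advance the head.
        b.data[b.head] = x
        b.head = mod1(b.head + 1, cap)
    end
    return b
end

Base.length(b::SimpleCircularBuffer) = b.len

# Logical index 1 is always the oldest stored element.
function Base.getindex(b::SimpleCircularBuffer, i::Int)
    1 <= i <= b.len || throw(BoundsError(b, i))
    return b.data[mod1(b.head + i - 1, length(b.data))]
end

Base.collect(b::SimpleCircularBuffer) = [b[i] for i in 1:b.len]

# Usage: with capacity 3, pushing 1:5 keeps only the three most recent values.
buf = SimpleCircularBuffer{Int}(3)
for x in 1:5
    push!(buf, x)
end
collect(buf)   # [3, 4, 5]
```

Memory use stays constant no matter how long training runs, at the cost of forgetting the oldest transitions.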

JuliaReinforcementLearning/ReinforcementLearning.jl#92

ElasticCompactSARTSATrajectory is very similar to CircularCompactSARTSATrajectory, except that it uses an ElasticArray instead of a CircularArrayBuffer as its container, so it grows as new steps arrive instead of overwriting the oldest ones.

Example usage:

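using ReinforcementLearning

env = CartPoleEnv()

# The original snippet never defines `policy`; a RandomPolicy is assumed here
# for illustration only, so the number of collected steps will differ per run.
policy = RandomPolicy(env)

# Float32 states of size (4,), stored in an ElasticArray that grows as needed.
traj = ElasticCompactSARTSATrajectory(;
    state_type = Float32,
    state_size = (4,),
)

agent = Agent(; policy = policy, trajectory = traj)

# Collect two episodes of CartPole interaction.
run(agent, env, StopAfterEpisode(2))

# All stored states, one column per step.
traj[:state]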
#=
4×58 view(::ElasticArrays.ElasticArray{Float32,2,1,Array{Float32,1}}, :, 1:58) with eltype Float32:
 -0.0457848   -0.0452089   -0.0485339   …  -0.0813302  -0.101226  -0.125046
  0.0287953   -0.16625     -0.361306       -0.994798   -1.19099   -0.997598
 -0.00532384  -0.00462615   0.00189152      0.109161    0.143715   0.18476
  0.0348848    0.325883     0.617104        1.72769     2.05225    1.80727
=#
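An ElasticArray keeps growing along its last dimension, which is why traj[:state] above comes back as a 4×N matrix whose column count matches the number of stored steps. A rough Base-only sketch of the same idea, assuming random Float32 vectors as stand-ins for CartPole states (the ElasticArrays.jl package itself is not used here):

```julia
# Hypothetical sketch of the ElasticArray idea using only Base: keep a flat
# vector, append one 4-element state per step, then view it as a 4×N matrix.
state_size = 4
flat = Float32[]                       # grows without a fixed capacity

for step in 1:3
    s = rand(Float32, state_size)      # stand-in for an observed CartPole state
    append!(flat, s)                   # append one column's worth of data
end

# Reshape once collection is done: column t holds the state from step t.
states = reshape(flat, state_size, :)
size(states)                           # (4, 3)
```

Unlike the circular buffer, nothing is ever overwritten here, so memory grows with the number of collected steps.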
