Documentation for JuliaReinforcementLearning

Home Page: https://JuliaReinforcementLearning.org/

License: MIT License

franklin

juliareinforcementlearning.github.io's People

Contributors

findmyway dnabanita7 quattroporte616 drozzy

juliareinforcementlearning.github.io's Issues

Update the guide on how to write a customized environment

Add more examples and descriptions:

Ref: JuliaReinforcementLearning/ReinforcementLearning.jl#64

Facing a problem using TensorBoard

I am facing the following problem while running:

(base) nabanita07@nabanita07:~$ tensorboard --logdir /home/nabanita07/checkpoints/JuliaRL_BasicDQN_CartPole_20200807180019/tb_log

/home/nabanita07/.local/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/nabanita07/.local/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/nabanita07/.local/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/nabanita07/.local/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/nabanita07/.local/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/nabanita07/.local/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
Traceback (most recent call last):
  File "/home/nabanita07/anaconda3/bin/tensorboard", line 6, in <module>
    from tensorboard.main import run_main
  File "/home/nabanita07/anaconda3/lib/python3.7/site-packages/tensorboard/main.py", line 40, in <module>
    from tensorboard import default
  File "/home/nabanita07/anaconda3/lib/python3.7/site-packages/tensorboard/default.py", line 39, in <module>
    from tensorboard.plugins.beholder import beholder_plugin_loader
  File "/home/nabanita07/anaconda3/lib/python3.7/site-packages/tensorboard/plugins/beholder/__init__.py", line 22, in <module>
    from tensorboard.plugins.beholder.beholder import Beholder
  File "/home/nabanita07/anaconda3/lib/python3.7/site-packages/tensorboard/plugins/beholder/beholder.py", line 199, in <module>
    class BeholderHook(tf.estimator.SessionRunHook):
  File "/home/nabanita07/.local/lib/python3.7/site-packages/tensorflow/python/util/deprecation_wrapper.py", line 106, in __getattr__
    attr = getattr(self._dw_wrapped_module, name)
AttributeError: module 'tensorflow' has no attribute 'estimator'

As an alternative to using tensorboard for logging, we can use TensorBoardLogger.jl, which would make the workflow more Julian.

Document how to render an environment

Update Slack invitation link

https://discourse.julialang.org/t/https-slackinvite-julialang-org-deprecated-and-replaced-with-https-julialang-org-slack/49088

Typo in "How to write a customized environment?" page

At the end of the "The minimal interfaces to implement" section of the "How to write a customized environment?" page, there is the following test code.

using ReinforcementLearning
hook = TotalRewardPerEpisode()
run(
    Agent(
        ;policy = RandomPolicy(env),
        trajectory = VectorialCompactSARTSATrajectory(
            state_type=Bool,
            action_type=Any,
            reward_type=Int,
            terminal_type=Bool,
        ),
    ),
    LotteryEnv(),
    StopAfterEpisode(1_000),
    hook
)

println(sum(hook.rewards) / 1_000)

Should VectorialCompactSARTSATrajectory be replaced by VectCompactSARTSATrajectory? It looks like VectorialCompactSARTSATrajectory is not defined in ReinforcementLearningCore (v0.4.5).

Also, its output is shown as

UndefVarError: env not defined

which seems unintentional. There is another UndefVarError at the beginning of the Traits of environments section too.

"exploiting" doc

Just reading docs: https://juliareinforcementlearning.org/blog/an_introduction_to_reinforcement_learning_jl_design_implementations_thoughts/

What does this sentence mean?
"Until now, a policy is still very general. We don't know it's actually updating or exploiting."

Specifically the word "exploiting"?

How to contribute to this website

It looks like there's only one commit right now, and it seems to have been auto-generated. Am I missing something here? How would one contribute to the website?

Explain the batched environment in A2C

JuliaReinforcementLearning/ReinforcementLearning.jl#92

Minor changes to blog

I will be adding minor issues while going through the get-started tutorial guide.

The guide does not mention precompiling the Flux, Random, and BSON libraries before running the CartPole environment.
Suggestion: add Flux, Random, and BSON at the step where ReinforcementLearning is precompiled.

A trajectory is designed to behave like a NamedTuple.
A trajectory contains multiple traces. Each trace can use a different container (Vector, CircularArrayBuffer, ElasticArray).
A SharedTrajectory is a special trajectory in which different traces share different parts of the same container.
A CombinedTrajectory is used to combine different trajectories (similar to merge on NamedTuples), so that we can compose different trajectories as we wish.
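
The NamedTuple analogy above can be made concrete in plain Julia. The sketch below is only an illustration under assumptions: it does not use the ReinforcementLearning.jl trajectory types, and the `buffer`, `shared`, and `combined` names are hypothetical. It shows traces stored in different containers, two traces sharing one container through views, and two groups of traces combined with `merge`.

```julia
# Illustrative only: plain Base Julia, not the ReinforcementLearning.jl API.

# A "trajectory" as a NamedTuple of traces; each trace has its own container.
traj = (reward = Float32[1, 0, 1], terminal = Bool[false, false, true])

# "Shared" traces: state and next_state are views into one underlying buffer.
buffer = [10, 11, 12, 13]                 # hypothetical states s1..s4
shared = (
    state      = view(buffer, 1:3),       # s1..s3
    next_state = view(buffer, 2:4),       # s2..s4, same memory as `state`
)

# "Combined" trajectory: merge the two NamedTuples into one.
combined = merge(shared, traj)

println(keys(combined))   # (:state, :next_state, :reward, :terminal)

# A write to the buffer is visible through both views.
buffer[2] = 99
println(combined.state[2], " ", combined.next_state[1])   # 99 99
```

Because `state` and `next_state` are views rather than copies, the overlapping states are stored once, which is presumably the motivation for sharing a container between traces.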

Some commonly used trajectories are:

CircularCompactSARTSATrajectory: Usually used in DQN as the experience replay buffer.
CircularCompactSALRTSALTrajectory: Adds two more traces to the one above: legal_actions and legal_actions_mask.
CircularCompactPSARTSATrajectory: Used as a prioritized experience replay buffer.
CircularCompactPSALRTSALTrajectory: Adds two more traces to the one above: legal_actions and legal_actions_mask.

JuliaReinforcementLearning/ReinforcementLearning.jl#92

ElasticCompactSARTSATrajectory is very similar to CircularCompactSARTSATrajectory, except that it uses an ElasticArray instead of a CircularArrayBuffer as the container.

Example usage:

using ReinforcementLearning

env = CartPoleEnv()

traj = ElasticCompactSARTSATrajectory(;
       state_type = Float32,
       state_size = (4,)
       )

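# Assumption: the original snippet left `policy` undefined; a random policy is used here.
policy = RandomPolicy(env)

agent = Agent(;policy=policy, trajectory=traj)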

run(agent, env, StopAfterEpisode(2))

traj[:state]
#=
4×58 view(::ElasticArrays.ElasticArray{Float32,2,1,Array{Float32,1}}, :, 1:58) with eltype Float32:
 -0.0457848   -0.0452089   -0.0485339   …  -0.0813302  -0.101226  -0.125046
  0.0287953   -0.16625     -0.361306       -0.994798   -1.19099   -0.997598
 -0.00532384  -0.00462615   0.00189152      0.109161    0.143715   0.18476
  0.0348848    0.325883     0.617104        1.72769     2.05225    1.80727
=#
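
The difference between a circular container and an elastic one, as described above, can be sketched with Base Julia stand-ins. This is a minimal sketch under assumptions: `RingBuffer` and `contents` are hypothetical helpers written for this illustration, not the CircularArrayBuffer or ElasticArray implementations. A fixed-capacity buffer overwrites its oldest entries, while a growable container keeps every entry.

```julia
# Illustrative only: Base Julia stand-ins, not CircularArrayBuffer or ElasticArray.

# Hypothetical fixed-capacity buffer that overwrites its oldest entry when full.
mutable struct RingBuffer{T}
    data::Vector{T}
    first::Int      # index of the oldest element
    len::Int        # number of stored elements
end
RingBuffer{T}(capacity::Int) where {T} = RingBuffer{T}(Vector{T}(undef, capacity), 1, 0)

function Base.push!(rb::RingBuffer, x)
    cap = length(rb.data)
    if rb.len < cap
        rb.len += 1
    else
        rb.first = mod1(rb.first + 1, cap)   # drop the oldest element
    end
    rb.data[mod1(rb.first + rb.len - 1, cap)] = x
    return rb
end

# Elements from oldest to newest.
contents(rb::RingBuffer) =
    [rb.data[mod1(rb.first + i - 1, length(rb.data))] for i in 1:rb.len]

circular = RingBuffer{Int}(3)   # keeps only the 3 most recent entries
elastic = Int[]                 # grows without bound
for x in 1:5
    push!(circular, x)
    push!(elastic, x)
end

println(contents(circular))     # [3, 4, 5]
println(elastic)                # [1, 2, 3, 4, 5]
```

In this picture a circular container suits a bounded replay buffer, while an elastic one suits cases where the whole history of a run should be kept.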
