juliareinforcementlearning / juliareinforcementlearning.github.io
Documentation for JuliaReinforcementLearning
Home Page: https://JuliaReinforcementLearning.org/
License: MIT License
Add more examples and descriptions:
I am facing the following problem while running:
(base) nabanita07@nabanita07:~$ tensorboard --logdir /home/nabanita07/checkpoints/JuliaRL_BasicDQN_CartPole_20200807180019/tb_log
/home/nabanita07/.local/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/nabanita07/.local/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/nabanita07/.local/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/nabanita07/.local/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/nabanita07/.local/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/nabanita07/.local/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
np_resource = np.dtype([("resource", np.ubyte, 1)])
Traceback (most recent call last):
File "/home/nabanita07/anaconda3/bin/tensorboard", line 6, in <module>
from tensorboard.main import run_main
File "/home/nabanita07/anaconda3/lib/python3.7/site-packages/tensorboard/main.py", line 40, in <module>
from tensorboard import default
File "/home/nabanita07/anaconda3/lib/python3.7/site-packages/tensorboard/default.py", line 39, in <module>
from tensorboard.plugins.beholder import beholder_plugin_loader
File "/home/nabanita07/anaconda3/lib/python3.7/site-packages/tensorboard/plugins/beholder/__init__.py", line 22, in <module>
from tensorboard.plugins.beholder.beholder import Beholder
File "/home/nabanita07/anaconda3/lib/python3.7/site-packages/tensorboard/plugins/beholder/beholder.py", line 199, in <module>
class BeholderHook(tf.estimator.SessionRunHook):
File "/home/nabanita07/.local/lib/python3.7/site-packages/tensorflow/python/util/deprecation_wrapper.py", line 106, in __getattr__
attr = getattr(self._dw_wrapped_module, name)
AttributeError: module 'tensorflow' has no attribute 'estimator'
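The traceback mixes two installation prefixes (`/home/nabanita07/anaconda3/...` for tensorboard and `/home/nabanita07/.local/...` for tensorflow), which is a common cause of this `AttributeError`. A small stdlib-only sketch to check where each package actually resolves from (the package names to probe are the only assumption):

```python
# Print the file each package would be imported from, to spot mixed installs.
import importlib.util

for name in ("tensorflow", "tensorboard"):
    spec = importlib.util.find_spec(name)
    print(name, "->", spec.origin if spec else "not installed")
```

If the two origins live under different prefixes, reinstalling both packages into the same environment usually resolves the error.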
As an alternative to using TensorBoard for logging, we can use TensorBoardLogger.jl to make it more Julian.
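A minimal sketch of logging scalars with TensorBoardLogger.jl (assuming the package is installed; the run directory and the loss values are made up for illustration):

```julia
using TensorBoardLogger, Logging

# Write TensorBoard event files under tb_logs/run (hypothetical path)
lg = TBLogger("tb_logs/run", min_level = Logging.Info)

with_logger(lg) do
    for step in 1:100
        loss = 1 / step               # placeholder metric
        @info "training" loss = loss  # logged as the scalar training/loss
    end
end
```

The resulting directory can then be viewed with `tensorboard --logdir tb_logs` as usual.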
At the end of the "The minimal interfaces to implement" section in the "How to write a customized environment?" page, there is the following test code:
using ReinforcementLearning
hook = TotalRewardPerEpisode()
run(
Agent(;
policy = RandomPolicy(env),
trajectory = VectorialCompactSARTSATrajectory(
state_type=Bool,
action_type=Any,
reward_type=Int,
terminal_type=Bool,
),
),
LotteryEnv(),
StopAfterEpisode(1_000),
hook
)
println(sum(hook.rewards) / 1_000)
Should VectorialCompactSARTSATrajectory be replaced by VectCompactSARTSATrajectory? It looks like VectorialCompactSARTSATrajectory is not defined in ReinforcementLearningCore (v0.4.5).
Also, its output is shown as
UndefVarError: env not defined
which seems unintentional. There is another UndefVarError at the beginning of the "Traits of environments" section too.
Just reading docs: https://juliareinforcementlearning.org/blog/an_introduction_to_reinforcement_learning_jl_design_implementations_thoughts/
What does this sentence mean?
"Until now, a policy is still very general. We don't know it's actually updating or exploiting."
Specifically, what does the word "exploiting" mean here?
It looks like there's only one commit right now, which seems to have been auto-generated. Am I missing something here? How would one contribute to the website?
I will be adding minor issues while going through the get-started tutorial guide.
Sometimes people try this package in a global environment, where some old dependencies can force a downgrade of this package, resulting in errors when trying the examples here.
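One way to avoid that (a sketch using the standard Pkg API; the project name is made up): work in a per-project environment so this package's versions are resolved in isolation from the global one:

```julia
using Pkg

Pkg.activate("my_rl_project")     # hypothetical project dir; its Project.toml is created on first add
Pkg.add("ReinforcementLearning")  # resolved against only this project's dependencies
Pkg.status()                      # verify the installed version before running the examples
```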
This is asked on slack.
A trajectory stores its traces in a NamedTuple of containers (a Vector, a CircularArrayBuffer, or an ElasticArray). SharedTrajectory is a special trajectory in which different traces share the same container in different parts. CombinedTrajectory is used to combine different trajectories (similar to merge of different NamedTuples), so that we can compose different trajectories as we wish. Some commonly used trajectories are:
- CircularCompactSARTSATrajectory: usually used in DQN as the experience replay buffer.
- CircularCompactSALRTSALTrajectory: adds another two traces to the above one: legal_actions and legal_actions_mask.
- CircularCompactPSARTSATrajectory: used as the prioritized experience replay buffer.
- CircularCompactPSALRTSALTrajectory: adds another two traces to the above one: legal_actions and legal_actions_mask.
See JuliaReinforcementLearning/ReinforcementLearning.jl#92.
ElasticCompactSARTSATrajectory is very similar to CircularCompactSARTSATrajectory, except that it uses an ElasticArray instead of a CircularArrayBuffer as the container.
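The merge analogy can be seen with plain NamedTuples (Base Julia only; the field names below are made up for illustration):

```julia
# CombinedTrajectory composes trajectories roughly the way merge composes NamedTuples:
sartsa   = (state = [0.1, 0.2], action = [1, 2])
extra    = (priority = [0.5, 0.9],)
combined = merge(sartsa, extra)
keys(combined)  # (:state, :action, :priority)
```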
using ReinforcementLearning

env = CartPoleEnv()
traj = ElasticCompactSARTSATrajectory(;
    state_type = Float32,
    state_size = (4,)
)
policy = RandomPolicy(env)  # the original snippet left `policy` undefined; any policy works here
agent = Agent(; policy = policy, trajectory = traj)
run(agent, env, StopAfterEpisode(2))
traj[:state]
#=
4×58 view(::ElasticArrays.ElasticArray{Float32,2,1,Array{Float32,1}}, :, 1:58) with eltype Float32:
-0.0457848 -0.0452089 -0.0485339 … -0.0813302 -0.101226 -0.125046
0.0287953 -0.16625 -0.361306 -0.994798 -1.19099 -0.997598
-0.00532384 -0.00462615 0.00189152 0.109161 0.143715 0.18476
0.0348848 0.325883 0.617104 1.72769 2.05225 1.80727
=#
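For context on the trade-off, a sketch using ElasticArrays.jl directly (not the RL.jl API): an ElasticArray grows along its last dimension on every append, whereas a CircularArrayBuffer overwrites its oldest entries once full.

```julia
using ElasticArrays

a = ElasticArray{Float32}(undef, 4, 0)   # 4-row matrix with zero columns
for i in 1:3
    append!(a, rand(Float32, 4))         # each append adds one 4-element column
end
size(a)  # (4, 3) — keeps growing with every step, unlike a circular buffer
```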