ai4finance-foundation / elegantrl Goto Github PK

View Code? Open in Web Editor NEW

3.4K 52.0 799.0 106.53 MB

Massively Parallel Deep Reinforcement Learning. 🔥

Home Page: https://ai4finance.org

License: Other

Python 82.49% Jupyter Notebook 17.51%

pytorch reinforcement-learning ppo sac td3 dqn ddpg stable lightweight efficient

elegantrl's Introduction

ElegantRL “小雅”: Massively Parallel Deep Reinforcement Learning

“小雅”源于《诗经·小雅·鹤鸣》，旨在「他山之石，可以攻玉」。

ElegantRL (website) is developed for users/developers with the following advantages:

Cloud-native: follows a cloud-native paradigm through micro-service architecture and containerization, and supports ElegantRL-Podracer and FinRL-Podracer.
Scalable: fully exploits the parallelism of DRL algorithms, making it easily scale out to hundreds or thousands of computing nodes on a cloud platform, say, a DGX SuperPOD platform with thousands of GPUs.
Elastic: allows to elastically and automatically allocate computing resources on the cloud.
Lightweight: the core codes have <1,000 lines (check Elegantrl_Helloworld).
Efficient: in many testing cases (e.g., single-GPU/multi-GPU/GPU-cloud), we find it more efficient than Ray RLlib.
Stable: much much much more stable than Stable Baselines 3 by utilizing various methods such as the Hamiltonian term.
Practical: used in multipe projects (RLSolver, FinRL, FinRL-Meta, TransportRL etc.)

ElegantRL implements the following model-free deep reinforcement learning (DRL) algorithms:

DDPG, TD3, SAC, PPO, REDQ for continuous actions in single-agent environment,
DQN, Double DQN, D3QN for discrete actions in single-agent environment,
QMIX, VDN, MADDPG, MAPPO, MATD3 in multi-agent environment.

For more details of DRL algorithms, please refer to the educational webpage OpenAI Spinning Up.

ElegantRL supports the following simulators:

Isaac Gym for massively parallel simulations,
OpenAI Gym, MuJoCo, PyBullet, FinRL for benchmarking.

News
ElegantRL-Helloworld
File Structure
Experimental Demos
Requirements
Citation

Tutorials

[Towardsdatascience] A New Era of Massively Parallel Simulation: A Practical Tutorial Using ElegantRL, Nov. 2, 2022.
[MLearning.ai] ElegantRL: Much More Stable Deep Reinforcement Learning Algorithms than Stable-Baseline3, Mar. 3, 2022.
[Towardsdatascience] ElegantRL-Podracer: A Scalable and Elastic Library for Cloud-Native Deep Reinforcement Learning, Dec. 11, 2021.
[Towardsdatascience] ElegantRL: Mastering PPO Algorithms, May. 3, 2021.
[MLearning.ai] ElegantRL Demo: Stock Trading Using DDPG (Part II), Apr. 19, 2021.
[MLearning.ai] ElegantRL Demo: Stock Trading Using DDPG (Part I), Mar. 28, 2021.
[Towardsdatascience] ElegantRL-Helloworld: A Lightweight and Stable Deep Reinforcement Learning Library, Mar. 4, 2021.

ElegantRL-Helloworld

For beginners, we maintain ElegantRL-Helloworld as a tutorial. Its goal is to get hands-on experience with ELegantRL.

One sentence summary: an agent (agent.py) with Actor-Critic networks (net.py) is trained (run.py) by interacting with an environment (env.py).

File Structure

elegantrl # main folder
- agents # a collection of DRL algorithms
  - AgentXXX.py # a collection of one kind of DRL algorithms
  - net.py # a collection of network architectures
- envs # a collection of environments
  - XxxEnv.py # a training environment for RL
- train # a collection of training programs - demo.py # a collection of demos
  - config.py # configurations (hyper-parameter)
  - run.py # training loop
  - worker.py # the worker class (explores the env, saving the data to replay buffer)
  - learner.py # the learner class (update the networks, using the data in replay buffer)
  - evaluator.py # the evaluator class (evaluate the cumulative rewards of policy network)
  - replay_buffer.py # the buffer class (save sequences of transitions for training)
elegantrl_helloworld # tutorial version
- config.py # configurations (hyper-parameter)
- agent.py # DRL algorithms
- net.py # network architectures
- run.py # training loop
- env.py # environments for RL training
examples # a collection of example codes
ready-to-run Google-Colab notebooks
- quickstart_Pendulum_v1.ipynb
- tutorial_BipedalWalker_v3.ipynb
- tutorial_Creating_ChasingVecEnv.ipynb
- tutorial_LunarLanderContinuous_v2.ipynb
unit_tests # a collection of tests

Experimental Demos

More efficient than Ray RLlib

Experiments on Ant (MuJoCo), Humainoid (MuJoCo), Ant (Isaac Gym), Humanoid (Isaac Gym) # from left to right

ElegantRL fully supports Isaac Gym that runs massively parallel simulation (e.g., 4096 sub-envs) on one GPU.

More stable than Stable-baseline 3

Experiment on Hopper-v2 # ElegantRL achieves much smaller variance (average over 8 runs).

Also, PPO+H in ElegantRL completed the training process of 5M samples about 6x faster than Stable-Baseline3.

Testing and Contributing

Our tests are written with the built-in unittest Python module for easy access. In order to run a specific test file (for example, test_training_agents.py), use the following command from the root directory:

python -m unittest unit_tests/test_training_agents.py

In order to run all the tests sequentially, you can use the following command:

python -m unittest discover

Please note that some of the tests require Isaac Gym to be installed on your system. If it is not, any tests related to Isaac Gym will fail.

We welcome any contributions to the codebase, but we ask that you please do not submit/push code that breaks the tests. Also, please shy away from modifying the tests just to get your proposed changes to pass them. As it stands, the tests on their own are quite minimal (instantiating environments, training agents for one step, etc.), so if they're breaking, it's almost certainly a problem with your code and not with the tests.

We're actively working on refactoring and trying to make the codebase cleaner and more performant as a whole. If you'd like to help us clean up some code, we'd strongly encourage you to also watch Uncle Bob's clean coding lessons if you haven't already.

Requirements

Necessary:
| Python 3.6+     |
| PyTorch 1.6+    |

Not necessary:
| Numpy 1.18+     | For ReplayBuffer. Numpy will be installed along with PyTorch.
| gym 0.17.0      | For env. Gym provides tutorial env for DRL training. (env.render() bug in gym==0.18 pyglet==1.6. Change to gym==0.17.0, pyglet==1.5)
| pybullet 2.7+   | For env. We use PyBullet (free) as an alternative of MuJoCo (not free).
| box2d-py 2.3.8  | For gym. Use pip install Box2D (instead of box2d-py)
| matplotlib 3.2  | For plots.

pip3 install gym==0.17.0 pybullet Box2D matplotlib # or pip install -r requirements.txt

To install StarCraftII env,
bash ./elegantrl/envs/installsc2.sh
pip install -r sc2_requirements.txt

Citation:

To cite this repository:

@misc{erl,
  author = {Liu, Xiao-Yang and Li, Zechu and Zhu, Ming and Wang, Zhaoran and Zheng, Jiahao},
  title = {{ElegantRL}: Massively Parallel Framework for Cloud-native Deep Reinforcement Learning},
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/AI4Finance-Foundation/ElegantRL}},
}

@article{liu2021elegantrl,
  title={ElegantRL-Podracer: Scalable and elastic library for cloud-native deep reinforcement learning},
  author={Liu, Xiao-Yang and Li, Zechu and Yang, Zhuoran and Zheng, Jiahao and Wang, Zhaoran and Walid, Anwar and Guo, Jian and Jordan, Michael I},
  journal={NeurIPS, Workshop on Deep Reinforcement Learning},
  year={2021}
}

elegantrl's People

Contributors

Stargazers

Watchers

Forkers

jiangyancao thumxs jiefu-zhang guanshiting shaanxicode ysssghtz andymaple timor7hao wormpartner denn-m wzq-a bcmq wisdomyz colaotaku xuforr jindongwang fangwudi czh513 dukasguo zhangtjtongxue zhq-air silent790 whitelilis sunhao12121 lighterhouse zjuyichen landoufulxf blueyyc0 peide oldbrostable electronicelephant jh-ott panpan0111 jingxfei staminatang f2-song dannycooper126 qiuweimin1332499 birdfine karry5 smithqwe davidkillerhahaha sachiel321 zdaotian tzeyangx greensun0830 jiahuisun 2590400709 guoyaq 2017wxyzwxyz binyao2020 ydlu neverpy renzhenxuexidemaimai yuguoxinls ljnaaa destinyddx chinglohsiu andyyue1893 zzpdapeng marsprj yangletliu mistcarryyou yechao1009 hyyh28 moneypi wuyadie zhangmaojun yuanzi1501040205 silencewsl montshasta2020 viphi xianyisanren hah404 weileze yue19960831 winnull xiao000l vivicoco taogz pangpang97 singmeasong rgs6655 15738897318 zir0ne tracycuiyating yyan99 fdoperezi dotzou lukemshannonhill murilobd tonylibing wojiaojd ltargaryen z4z5 asi212 pplonski ngduyanhece zeta1999 clintg

elegantrl's Issues

RNN

Having RNN in this project will be great, do you plan on creating something like LSTM?

installation issue: setup.py, docs, readme.md

In the homepage README.md:
pip3 install gym==1.17.0 pybullet Box2D matplotlib

in the docs, it says:
cd ElegantRL
pip install .
this will install the latest version of gym (!= 1.17.0), and will not install Box2D.

setup.py needs to be update?
Thanks.

Can't run in colab

I meet the error below

How to crack the BipedalWalkerHardcore using eRL in colab?

How to repeat success in video{https://www.bilibili.com/video/BV1wi4y187tC} using eRL in colab?

I tried:

args = Arguments(if_on_policy=False)
args.agent = AgentSAC()
args.env = PreprocessEnv(gym.make('BipedalWalkerHardcore-v3'))
args.reward_scale = 2 ** -1 
args.gamma = 0.95
args.rollout_num = 2
args.if_remove = False
train_and_evaluate_mp(args)

After a million step (3 hours), the agent scored 37 points (MaxR)

Notebook eRL_demo_StockTrading.ipynb doen't work

Hi,

It seems that notebook eRL_demo_StockTrading.ipynb is deprecated and broken. It throughs the following error:

Traceback (most recent call last):
File "/usr/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
self.run()
File "/usr/lib/python3.7/multiprocessing/process.py", line 99, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.7/dist-packages/elegantrl/run.py", line 340, in run
agent.states = [env.reset(), ]
File "/usr/local/lib/python3.7/dist-packages/elegantrl/envs/FinRL/StockTrading.py", line 47, in reset
self.amount = self.initial_capital * rd.uniform(0.95, 1.05) - (self.stocks * price).sum()
ValueError: operands could not be broadcast together with shapes (74,) (73,)

YonV: Don't open Issues like this 不建议像这样开问题 → init() takes 1 positional argument but 2 were given

Thank you!

env error

Thank you for such a wonderful program.
I was trying to run "eRL_demo_StockTrading.ipynb" under Colab
via "cut and paste".

At the very end and after training, I ran into this error:

Last code block:

args = Arguments(if_on_policy=True)
args.agent = AgentPPO()
args.env = StockTradingEnv(cwd='./', if_eval=True)
args.if_remove = False
args.cwd = './AgentPPO/StockTradingEnv-v1_0'
args.init_before_training()

env.draw_cumulative_return(args, torch)

Error Message under Colab:

GPU id: 0, cwd: ./AgentPPO/StockTradingEnv-v1_0

NameError Traceback (most recent call last)
in ()
6 args.init_before_training()
7
----> 8 env.draw_cumulative_return(args, torch)

NameError: name 'env' is not defined

How could I solve this?? Pls help!

Main.py can not work with DEMO 2

My python version is 3.6.9

and I just uncomment DEMO2 and comment DEMO3

Then I got the error message below:

/usr/local/lib/python3.6/dist-packages/gym/logger.py:30: UserWarning: WARN: Box bound precision lowered by casting to float32
warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
| env_name: LunarLanderContinuous-v2, action space: Continuous
| state_dim: 8, action_dim: 2, action_max: 1.0, target_reward: 200
| GPU id: n, cwd: ./AgentModSAC/LunarLanderContinuous-v2_n
| Remove history
Traceback (most recent call last):
File "Main.py", line 312, in
main()
File "Main.py", line 77, in main
train_and_evaluate(args)
File "Main.py", line 134, in train_and_evaluate
agent = rl_agent(net_dim, state_dim, action_dim)
File "/home/oycq/gym/ElegantRL/Agent.py", line 468, in init
AgentSAC.init(net_dim, state_dim, action_dim, learning_rate)
File "/home/oycq/gym/ElegantRL/Agent.py", line 406, in init
super().init()
TypeError: super(type, obj): obj must be an instance or subtype of type

eRL_demo_StockTrading file error at train_and_evaluate_mp function

When I execute the commandline "train_and_evalutate_mp(args)" I get the error:

RuntimeError: cuda runtime error (801) : operation not supported at ..\torch/csrc/generic/StorageSharing.cpp:258

and comments on google says "we cannot use multiprocessing CUDA on Windows"
Do I need linux for this?

Offline data support.

Would you happen to have offline example for stock? Some tickers are not available on yahoo finance.

eRL_demo_StockTrading fails with error -

eRL_demo_StockTrading fails in the last cell with error -

TypeError                                 Traceback (most recent call last)
<ipython-input-5-e8a0261f66cc> in <module>()
      4 args.if_remove = False
      5 args.cwd = './AgentPPO/StockTradingEnv-v1_0'
----> 6 args.init_before_training()
      7 
      8 env.draw_cumulative_return(args, torch)

TypeError: init_before_training() missing 1 required positional argument: 'if_main'

Could be due to latest commit

Tried setting if_main=True and changing env.draw to args.env.draw. I get another error

AttributeError: 'AgentPPO' object has no attribute 'save_load_model'

Could be from the same commit

Why rewrite `def explore_env` for PPO?

Hi,

Why rewrite the def explore_env for agentPPO? Is it to store the action instead of action.tanh()? Can I ask why?

YonV: Don't open Issues like this 不建议像这样开问题 → Found dtype Double but expected Float

This problem seems to appear in a lot of the code I downloaded, is my tool version wrong? Please help me to solve it, thank you!

installation issue: setup.py, docs, readme.md

In the homepage README.md:
pip3 install gym==1.17.0 pybullet Box2D matplotlib

in the docs, it says:
cd ElegantRL
pip install .
this will install the latest version of gym (!= 1.17.0), and will not install Box2D.

setup.py needs to be update?
Thanks.

Find some bugs

Thanks for your wonderful work!

I found 2 bugs from the repo.

In https://github.com/AI4Finance-LLC/ElegantRL/blob/master/elegantrl/agent.py and other agent.py files,
method get_obj_critic_per() may compute the td-error wrong.

td_error = (q_label - torch.min(q1, q1).detach()).abs()
should be
td_error = (q_label - torch.min(q1, q2).detach()).abs()
Also in agent.py: class AgentSAC,AgentModSAC, and AgentSharedSAC.

Their method update_net() return the following

return alpha.item(), obj_critic.item()

It may be

return obj_actor.item(), obj_critic.item()

something wrong with readme

python3 AgentRun.py

You can see run__demo(gpu_id=0, cwd='AC_BasicAC') in AgentRun.py.

python3: can't open file 'AgentRun.py': [Errno 2] No such file or directory

chengdu-pijialonggui-CV

BufferArray bug: sampling range not updated

In class BufferArray, random_sample(), the indices are sampled from self.now_len, which does not change as new memory is added. This would result in training on old initial memories unless the whole buffer is full and the index starts from 0 again.

Only small modifications needed, but this can be critical sometimes.

unexpected keyword argument 'if_on_policy' in eRL_demo_StockTrading Part 5

Hi, I just discovered elegance RL and I am testing the jupyter notebooks.
When I run part 5 on eRL_demo_StockTrading, jupyter throws me:

TypeError: init() got an unexpected keyword argument 'if_on_policy'

Looking a bit at the run.py code in tutorial folder, I see

class Arguments: def __init__(self, agent=None, env=None, if_on_policy=False):

And I have seen that the Arguments class has been updated and now it does not have "if_on_policy" nor "None" as default:

class Arguments: # [ElegantRL.2021.10.21] def __init__(self, env, agent):

I'm trying to get the code to work but all I'm doing is skipping necessary parts and doesn't work for me.

I'm missing something?

Support for dict observation space

Hello. Thanks for this useful DRL library. I am implementing my own gym environment and I use the dict space as my observation space(consists of images and some low-dimension states). It seems that ElegantRL does't support this kind of space now. Do you have the plan to add this feature in the future or do I need to encode this observation into space.Box by myself?

WinOS: 'os' has no attribute 'mknod', no module named 'cv2' || original title"跑不通咋办"

兄弟你这介绍的好像不太详细呀，而且有的程序导入的包都不见了，请问有没有更好的版本？

Need more examples

Hello, @Yonv1943

Please make more examples in repo.

多举几个例子吧，大兄弟。如果能像stable-baselines那样方便套gym就更好了。

ppo and entropy

In your AgentZoo.py class AgentGAE
you estimate policy entropy use mean

            # surrogate objective of TRPO
            ratio = (new_log_prob - old_log_prob).exp()
            surrogate_obj0 = advantage * ratio
            surrogate_obj1 = advantage * ratio.clamp(1 - clip, 1 + clip)
            surrogate_obj = -torch.min(surrogate_obj0, surrogate_obj1).mean()
            # policy entropy
            loss_entropy = (new_log_prob.exp() * new_log_prob).mean()

But in the definition of entropy, you should use sum
I want to know the difference of mean and sum to estimate policy entropy.
What impact will they have on the algorithm.

PPO : why use 1/(r_sum.std()) as coef when updating the critic?

Is it some kind of TTUR?

Save and load issue

Thanks so much for the contribution to reinforcement learning.
Might I ask that is there any save and load functions?
Sometimes we might want to stop the training process and continue the training later.

SAC : why actor has a target network? Why ModSAC has a Reliable lamdba and TTUR?

你好我看到在代码中，sac的actor也有target_net。这个在其他implementation，比如stable_baseline3, spinning_up都没有出现。 Spinning Up: SAC中也有强调，

Unlike in TD3, the next-state actions used in the target come from the current policy instead of a target policy.

请问下，加上target network是为了得到更稳定的actor吗？

Runtime error

Can you help resolve this error?

  File "DelayDDPG.py", line 8, in <module>
    from yonv_utils import load_cifar10_data
ModuleNotFoundError: No module named 'yonv_utils'

CUDA error: out of memory

Hello,

Running the run.py in both the main directory and in the MultiGPU directory leads me to have an error:

Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "elegantrl/AgentZoo/ElegantRL-MultiGPU/run.py", line 522, in mp_explore
    agent.init(net_dim, state_dim, action_dim)
  File "/usr/local/lib/python3.8/dist-packages/elegantrl/agent.py", line 687, in init
    self.act = ActorPPO(net_dim, state_dim, action_dim).to(self.device)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 673, in to
    return self._apply(convert)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 387, in _apply
    module._apply(fn)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 387, in _apply
    module._apply(fn)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 409, in _apply
    param_applied = fn(param)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 671, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
RuntimeError: CUDA error: out of memory

This error persists no matter what batch size I assign or net size I specify and doesnt matter if I try to use Multi-GPU or the main elegantrl/run.py file.

I am running this on an Nvidia Quadro 4000 with 8gb of RAM.

# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Nov_30_19:08:53_PST_2020
Cuda compilation tools, release 11.2, V11.2.67
Build cuda_11.2.r11.2/compiler.29373293_0

>>> torch.__version__
'1.8.0'

In order to get the examples to work, I have to specify the GPU ID of "-1"

EDIT:
If I set the rollout_num to 1 the error changes to this:

Process Process-3:
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "elegantrl/AgentZoo/ElegantRL-MultiGPU/run.py", line 546, in mp_explore
    agent.explore_env(env, buffer, exp_step, reward_scale, gamma)
  File "/code/ElegantRL/elegantrl/agent.py", line 714, in explore_env
    action, noise = self.select_action(state)
  File "/code/ElegantRL/elegantrl/agent.py", line 703, in select_action
    actions, noises = self.act.get_action_noise(states)
  File "/code/ElegantRL/elegantrl/net.py", line 144, in get_action_noise
    a_avg = self.net(state)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/container.py", line 119, in forward
    input = module(input)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/linear.py", line 94, in forward
    return F.linear(input, self.weight, self.bias)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/functional.py", line 1753, in linear
    return torch._C._nn.linear(input, weight, bias)
RuntimeError: CUDA error: CUBLAS_STATUS_INTERNAL_ERROR when calling `cublasCreate(handle)`

Researching it still appears that it is a memory problem, but as far as I can tell the "explore" process should take up about 1.5gb of memory but I have almost 4gb free.

ReplayBuffer Error When gpu can't use

In run.py, line 140.
If I set if_gpu False, the training process error because some variable is np.ndarray. But neural network needs tensor.

Running demo.py, demo2_continuous_action_space_off_policy() and set replaybuffer if_gpu False could show this problems. Error happens:

Traceback (most recent call last):
  File "ElegantRL/elegantrl/demo.py", line 413, in <module>
    demo2_continuous_action_space_off_policy()


  File "ElegantRL/elegantrl/demo.py", line 73, in demo2_continuous_action_space_off_policy
    train_and_evaluate(args)


  File "ElegantRL/elegantrl/run.py", line 153, in train_and_evaluate
    agent.update_net(buffer, target_step, batch_size, repeat_times)  # pre-training and hard update


  File "ElegantRL/elegantrl/agent.py", line 562, in update_net
    obj_critic, state = self.get_obj_critic(buffer, batch_size, alpha)


  File "ElegantRL/elegantrl/agent.py", line 508, in get_obj_critic_raw
    next_a, next_logprob = self.act.get_action_logprob(next_s)


  File "ElegantRL/elegantrl/net.py", line 211, in get_action_logprob
    t_tmp = self.net_state(state)


  File "anaconda3/envs/py3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)


  File "anaconda3/envs/py3/lib/python3.8/site-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)


  File "anaconda3/envs/py3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)


  File "anaconda3/envs/py3/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 96, in forward
    return F.linear(input, self.weight, self.bias)


  File "anaconda3/envs/py3/lib/python3.8/site-packages/torch/nn/functional.py", line 1847, in linear
    return torch._C._nn.linear(input, weight, bias)


TypeError: linear(): argument 'input' (position 1) must be Tensor, not numpy.ndarray

Some lines number may have bias because I change some code.

Recurrent Neural Network and lookback window

How can this project implement a Recurrent Neural Network? Because all stock or crypto are time-series and lookback is a very important aspect of trading

RuntimeError: mat1 dim 1 must match mat2 dim 0

Thanks for your work.

I am facing a problem when I run demo.py (demo_discrete_action_off_policy())
I set if_train_cart_pole = 1, if_train_lunar_lander = 0 and if_train_cart_pole = 0, if_train_lunar_lander = 1
I got the same error:

Traceback (most recent call last):
  File "/home/lq/ElegantRL/elegantrl/demo.py", line 154, in <module>
    demo_discrete_action_off_policy()
  File "/home/lq/ElegantRL/elegantrl/demo.py", line 126, in demo_discrete_action_off_policy
    train_and_evaluate(args)
  File "/home/lq/ElegantRL/elegantrl/run.py", line 177, in train_and_evaluate
    array_tuple = agent.explore_env(env, target_step)
  File "/home/lq/ElegantRL/elegantrl/agent.py", line 215, in explore_env
    action = self.select_action(state)  # assert isinstance(action, int)
  File "/home/lq/ElegantRL/elegantrl/agent.py", line 271, in select_action
    actions = self.act(states)
  File "/home/lq/anaconda3/envs/py3.6-elegantrl/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/lq/ElegantRL/elegantrl/net.py", line 50, in forward
    tmp = self.net_state(state)
  File "/home/lq/anaconda3/envs/py3.6-elegantrl/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/lq/anaconda3/envs/py3.6-elegantrl/lib/python3.6/site-packages/torch/nn/modules/container.py", line 117, in forward
    input = module(input)
  File "/home/lq/anaconda3/envs/py3.6-elegantrl/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/lq/anaconda3/envs/py3.6-elegantrl/lib/python3.6/site-packages/torch/nn/modules/linear.py", line 93, in forward
    return F.linear(input, self.weight, self.bias)
  File "/home/lq/anaconda3/envs/py3.6-elegantrl/lib/python3.6/site-packages/torch/nn/functional.py", line 1692, in linear
    output = input.matmul(weight.t())
RuntimeError: mat1 dim 1 must match mat2 dim 0

AttributeError: 'AgentPPO' object has no attribute 'ClassCri'

While Running the last command on eRL_demo_StockTrading.ipynb on colab

args.init_before_training(if_main=False)
args.env.draw_cumulative_return(args, torch)

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-18-c641199acf36> in <module>()
----> 1 args.env.draw_cumulative_return(args, torch)

2 frames
/usr/local/lib/python3.7/dist-packages/elegantrl/envs/FinRL/StockTrading.py in draw_cumulative_return(self, args, _torch)
    240         cwd = args.cwd
    241 
--> 242         agent.init(net_dim, state_dim, action_dim)
    243         agent.save_load_model(cwd=cwd, if_save=False)
    244         act = agent.act

/usr/local/lib/python3.7/dist-packages/elegantrl/agent.py in init(self, net_dim, state_dim, action_dim, learning_rate, if_use_gae, env_num, agent_id)
    617 
    618     def init(self, net_dim, state_dim, action_dim, learning_rate=1e-4, if_use_gae=False, env_num=1, agent_id=0):
--> 619         super().init(net_dim, state_dim, action_dim, learning_rate, if_use_gae, env_num, agent_id)
    620         self.traj_list = [list() for _ in range(env_num)]
    621 

/usr/local/lib/python3.7/dist-packages/elegantrl/agent.py in init(self, net_dim, state_dim, action_dim, learning_rate, if_per_or_gae, env_num, agent_id)
     52         self.device = torch.device(f"cuda:{agent_id}" if (torch.cuda.is_available() and (agent_id >= 0)) else "cpu")
     53 
---> 54         self.cri = self.ClassCri(int(net_dim * 1.25), state_dim, action_dim).to(self.device)
     55         self.act = self.ClassAct(net_dim, state_dim, action_dim).to(self.device) if self.ClassAct else self.cri
     56         self.cri_target = deepcopy(self.cri) if self.if_use_cri_target else self.cri

AttributeError: 'AgentPPO' object has no attribute 'ClassCri'

But I am able to see that AgentPPO has ClassCri with CriticAdv

class AgentPPO(AgentBase):
    def __init__(self):
        super().__init__()
        self.ClassAct = ActorPPO
        self.ClassCri = CriticAdv

Actual Issue -

Attachments

TypeError: not a sequence

运行程序：
args=Arguments(if_on_policy=False)
args.agent=AgentDDPG()
env=gym.make('Pendulum-v0')
env.target_reward=-200
args.env=PreprocessEnv(env=env)
args.reward_scale=2**-3
args.net_dim=27
args.batch_size=27

train_and_evaluate(args)

发生错误：
File "F:/engine_research_20210901/ElegantRL/ElegantRL-master/elegantrl/run.py", line 717, in demo_continuous_action
train_and_evaluate(args)
File "F:/engine_research_20210901/ElegantRL/ElegantRL-master/elegantrl/run.py", line 190, in train_and_evaluate
steps, r_exp = update_buffer(trajectory)
File "F:/engine_research_20210901/ElegantRL/ElegantRL-master/elegantrl/run.py", line 172, in update_buffer
ary_other = torch.as_tensor([item[1] for item in _trajectory])
TypeError: not a sequence

cannot import elegantrl.tutorial.run.train_and_evaluate_mp

Hi,

When tried the multiprocess training with train_and_evaluate_mp in elegantrl.tutorial.run, it can't import properly.

_Python 3.8.0 (default, Feb 25 2021, 22:10:10)
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> from elegantrl.tutorial.run import Arguments, train_and_evaluate

>>> from elegantrl.tutorial.run import train_and_evaluate_mp
Traceback (most recent call last):
File "", line 1, in
ImportError: cannot import name 'train_and_evaluate_mp' from 'elegantrl.tutorial.run' (/home/user/elegantrl/ElegantRL/elegantrl/tutorial/run.py)
>>>_

Could you help to fix it?

AgentPPO中的explore_one_env

请教下AgentPPO中的explore_one_env中的splice list 的作用是什么

` def explore_one_env(self, env, target_step):
traj_temp = list()

    state = self.states[0]
    last_done = 0
    for i in range(target_step):
        action, noise = [ary[0] for ary in self.select_actions((state,))]
        next_state, reward, done, _ = env.step(np.tanh(action))
        traj_temp.append((state, reward, done, action, noise))
        if done:
            state = env.reset()
            last_done = i
        else:
            state = next_state
    self.states[0] = state

    '''splice list'''
    traj_list = self.traj_list[0] + traj_temp[:last_done + 1]
    self.traj_list[0] = traj_temp[last_done:]
    return traj_list`

I think reward scaling not working in FinRL env --> early end to training

Hello,
I am using the FinRL environment with a custom dataset (I have 20,000 time periods of training data -- np array shape is [200000, 6*43])

Right now, the training stops after 20,000 time peroids/1 epoch. I think this is because the MaxR becomes more than TargetR, so the training stops.

I try setting args.reward_scale = 1e-20, a very very low number, but the reward for the epoch does not change. It is the same reward as when args.reward_scale = 1.

I checked, and the args.reward_scale variable is passed to run.py. After that, I have a difficult time understanding the code.

Also, is there a way to manually set the # of training epochs? Or the # of time steps? Thank you

`from elegantrl.run import Arguments, train_and_evaluate, train_and_evaluate__multiprocessing
from elegantrl.tutorial.env import PreprocessEnv
import gym
gym.logger.set_level(40) # Block warning

args = Arguments(if_on_policy=True)
'''choose an DRL algorithm'''
from elegantrl.tutorial.agent import AgentPPO
args.agent = AgentPPO()

from elegantrl.env import FinanceMultiStockEnv # a standard env for ElegantRL, not need PreprocessEnv()
args.env = FinanceMultiStockEnv(stock_dim=6, if_train=True, max_stock=1e2, transaction_fee_percent=1e-3,
train_beg=0, train_len=20000, info_col_length=43)
args.env_eval = FinanceMultiStockEnv(stock_dim=6,if_train=False, max_stock=1e2, transaction_fee_percent=1e-3,
train_beg=0, train_len=20000, info_col_length=43) # eva_len = 1699 - train_len
args.reward_scale = 1e-20 #2 ** 0 # RewardRange: 0 < 1.0 < 1.25 <
args.break_step = int(5e6)
args.max_step = args.env.max_step
args.max_memo = (args.max_step - 1) * 8
args.batch_size = 2 ** 11
"TotalStep: 2e5, TargetReward: 1.25, UsedTime: 200s"

'''train and evaluate'''
train_and_evaluate(args)

args.rollout_num = 8

#train_and_evaluate__multiprocessing(args) # try multiprocessing in formal version`

Deal with discrete observation space when computing `state_dim`

Hi! Thank you for this awesome library, it helps me a lot.

I'm new to rl and not sure whether I'm missing something, but it seems that ElegantRL doesn't deal with environment with discrete observation space (like Frozenlake, of which observation_space.shape is ()):

https://github.com/AI4Finance-LLC/ElegantRL/blob/d82f33d960c356bebce4178f881d518197686f06/elegantrl/env.py#L198-L199

Do you think it will be better to change here to:

if isinstance(env.observation_space, gym.spaces.Discrete):
    state_dim = env.observation_space.n
elif isinstance(env.observation_space, gym.spaces.Box):
    state_shape = env.observation_space.shape
    state_dim = state_shape[0] if len(state_shape) == 1 else state_shape  # sometimes state_dim is a list

Thanks.

A2C

Hi! Could I ask if A2C is implemented in EleganRL?

how to continue training an agent after stopping (paused)?

the function " train_and_evaluate__multiprocessing(args) ", when restarted, erases all previous learning process.

Param "args.if_remove = False" - Does not help
Func "agent.save_load_model( )" - Does not help

Maybe added - train_and_evaluate__multiprocessing(args, if_continue=True, start_weight = 'old_save_weight.pth')?

How to make a video for result?

I am inquiring if there is any example code to help us make a video for the result with the help of env.render()?

CriticTwin class didn't use minimum of q1 and q2 as output

in AgentNet.py, in class CriticTwin, def forward(xxx):
x = torch.cat((state, action), dim=1)
q_value = self.net1(x)
return q_value
It seems the CriticTwin is using only self.net1 as output. When this class is used to initialize AgentSAC.cri, the self.cri will only output net1 result when it's called.

When we train with AgentSAC,This will have no influence in computing target, since function get__q1_q2() is called there;but when we compute actor loss, self.cri is called, then there might be an issue. I suggest to replace the above 3 lines into:
x = torch.cat((state, action), dim=1)
q_value1 = self.net1(x)
q_value2 = self.net2(x)
q12 = torch.cat((q_value1,q_value2), dim=1)
q_value_min = torch.min(q12,dim=1,keepdim=True,out=None)[0]
return q_value_min

I've noticed the both the current version and revised version works well with SAC agent on 'BipedalWalker-v3', 'MountainCarContinuous-v0'. I haven't tested on other envs.

AgentPPO has no attribute save_load_model

Executing
args.env.draw_cumulative_return(args, torch) I got the following error message:

AttributeError Traceback (most recent call last)
in ()
6 args.init_before_training(if_main=False)
7
----> 8 args.env.draw_cumulative_return(args, torch)

/usr/local/lib/python3.7/dist-packages/elegantrl/envs/FinRL/StockTrading.py in draw_cumulative_return(self, args, _torch)
241
242 agent.init(net_dim, state_dim, action_dim)
--> 243 agent.save_load_model(cwd=cwd, if_save=False)
244 act = agent.act
245 device = agent.device

AttributeError: 'AgentPPO' object has no attribute 'save_load_model'

ModuleNotFoundError: No module named 'elegantrl'

I followed the installation instructions
When I try to run elegantrl\run.py on Windows and Linux I get this error

Traceback (most recent call last):
File "run.py", line 8, in
from elegantrl.replay import ReplayBuffer, ReplayBufferMP
ModuleNotFoundError: No module named 'elegantrl'

A run error in env file while processing data

There is an error in the file: elegantrl/envs/FinRL/StockTrading.py, line 225, in function: convert_df_to_ary():

tech_items = [item[tech].values.tolist() for tech in tech_indicator_list]
AttributeError: 'numpy.float64' object has no attribute 'values'

If remove '.values':
--- tech_items = [item[tech].values.tolist() for tech in tech_indicator_list]
+++ tech_items = [item[tech].tolist() for tech in tech_indicator_list]

then get a new error:
tech_items_flatten = sum(tech_items, [])
TypeError: can only concatenate list (not "float") to list

Please have a look, thanks!

base env info:
OS: win10 or Ubuntu20.04
Python: 3.8, 3.9
numpy: 1.21, 1.20, 1.19. 1.18 and 1.17

Test case (pick up from elegantrl/envs/FinRL/StockTrading.py::convert_df_to_ary() ):

import pandas as pd
import numpy as np

def convert_df_to_ary(df, tech_indicator_list):

tech_ary = list()
price_ary = list() 
for day in range(len(df.index.unique())): 

    item = df.loc[day]                                                                                        
    tech_items = [item[tech].values.tolist() for tech in tech_indicator_list] 
    tech_items_flatten = sum(tech_items, [])
    tech_ary.append(tech_items_flatten)
    price_ary.append(item.close)  # adjusted close price (adjcp)
                                                                                   
price_ary = np.array(price_ary) 
tech_ary = np.array(tech_ary) 
print(f'| price_ary.shape: {price_ary.shape}, tech_ary.shape: {tech_ary.shape}') 
return price_ary, tech_ary

#for test:
df = pd.read_csv('test.csv')
tech_indicator_list = ['close_30_sma', 'close_60_sma']

convert_df_to_ary(df, tech_indicator_list)

Sample data:

$ head test.csv
date,open,high,low,close,volume,tic,day,close_30_sma,close_60_sma,turbulence
2012-01-03,14.621429443359375,14.73214340209961,14.60714340209961,12.610315322875977,302220800,AAPL,1,11.903397591908773,12.052827676137289,0.7520241067006971
2012-01-04,14.64285659790039,14.8100004196167,14.617142677307129,12.678085327148438,260022000,AAPL,2,11.942750962575277,12.07513124148051,0.06970371670868289
2012-01-05,14.819643020629883,14.948213577270508,14.738213539123535,12.818838119506836,271269600,AAPL,3,11.992857360839844,12.090065081914267,0.3731384027883833
2012-01-06,14.991786003112793,15.098214149475098,14.972143173217773,12.952840805053711,318292800,AAPL,4,12.039764340718587,12.101365041732787,0.3256607277450752
2012-01-09,15.196429252624512,15.276785850524902,15.048213958740234,12.93229866027832,394024400,AAPL,0,12.095717589060465,12.111351569493612,0.025847654177537545
2012-01-10,15.211071014404297,15.214285850524902,15.053570747375488,12.978597640991211,258196400,AAPL,1,12.15670992533366,12.118920628229777,0.023686138936847546
2012-01-11,15.09571361541748,15.101785659790039,14.975357055664062,12.957439422607422,215084800,AAPL,2,12.20416882832845,12.119201691945394,0.024991091810340274

multiple trading condition

How to set multiple trading condition like Candlestick pattern, RSI,Pivots as a input to choose the trend reversal & continuation for "higher Risk to reward entry"?Only close value is used as input in your setup.

No actor and actor_target init in AgentTD3

Hi,

Found the init of acotor and actor_target lost in AgentTD3

self.act = Actor(net_dim, state_dim, action_dim).to(self.device)
self.act_target = deepcopy(self.act)

which leads the error:

  File "/export/home/wepe2320/reference-track/elegantrl/agent.py", line 345, in init
    self.act_optimizer = torch.optim.Adam(self.act.parameters(), lr=self.learning_rate)
AttributeError: 'NoneType' object has no attribute 'parameters'

State to contain last N days

A feature request, if it makes sense for you.
Do you think that if the current state would contain information from the last N days(timestamps) will improve the results when comparing with the current method, that uses only 1 day as current state ?

The alpha loss calculating of SAC is different from other repo

The alpha loss calculated in this repo via:

alpha_loss = (self.alpha_log * (log_prob - self.target_entropy).detach()).mean()

and self.target_entropy is initialized with self.target_entropy = np.log(action_dim)

https://github.com/Yonv1943/ElegantRL/blob/05c720e6b84ff393d38c929066da982dbe59957c/AgentZoo.py#L405

I found another repo which implement the SAC with auto alpha adjustment:

https://github.com/dongminlee94/deep_rl/blob/3cfd41d7e3b7f3cf4a3ee5d6a8ac8f49cbe36d93/agents/sac.py#L154

and that repo calculate alpha_loss with:

alpha_loss = -(self.log_alpha * (log_pi + self.target_entropy).detach()).mean()

and self.target_entropy is initialized with -np.prod((act_dim,)).item()

Why is there such a difference? Am I overlook something?

您好，Actor-Critic Methods方法是不是没法用于离散动作环境

您好，如题，在CartPole中，使用SAC等算法会有RuntimeError: mat1 dim 1 must match mat2 dim 0
的错误，我想请问一下是有类似离散连续动作的参数来控制模型输出离散还是连续动作，还是需要自己修改模型呢

Question about the implementation of advantage function

In eRL_demo_PPOinSingleFile.py there are two get_reward_sum functions:

   def get_reward_sum_raw(self, buf_len, buf_reward, buf_mask, buf_value) -> (torch.Tensor, torch.Tensor):
        buf_r_sum = torch.empty(buf_len, dtype=torch.float32, device=self.device)  # reward sum

        pre_r_sum = 0
        for i in range(buf_len - 1, -1, -1):
            buf_r_sum[i] = buf_reward[i] + buf_mask[i] * pre_r_sum
            pre_r_sum = buf_r_sum[i]
        buf_advantage = buf_r_sum - (buf_mask * buf_value[:, 0])
        return buf_r_sum, buf_advantage

    def get_reward_sum_gae(self, buf_len, ten_reward, ten_mask, ten_value) -> (torch.Tensor, torch.Tensor):
        buf_r_sum = torch.empty(buf_len, dtype=torch.float32, device=self.device)  # old policy value
        buf_advantage = torch.empty(buf_len, dtype=torch.float32, device=self.device)  # advantage value

        pre_r_sum = 0
        pre_advantage = 0  # advantage value of previous step
        for i in range(buf_len - 1, -1, -1):
            buf_r_sum[i] = ten_reward[i] + ten_mask[i] * pre_r_sum
            pre_r_sum = buf_r_sum[i]
            buf_advantage[i] = ten_reward[i] + ten_mask[i] * (pre_advantage - ten_value[i])  # fix a bug here
            pre_advantage = ten_value[i] + buf_advantage[i] * self.lambda_gae_adv
        return buf_r_sum, buf_advantage

They both mutiply current state's value by a mask. And in update_buffer function we can see the mask contains a gamma
ten_mask = (1.0 - torch.as_tensor(_trajectory[2], dtype=torch.float32)) * gamma

That means V(st) is mutiplied by gamma, but we don't need a gamma for current state's value. Does it right? I am having trouble in understanding this, please help me out, thanks.

env is not defined at the last line of eRL_demo_StockTrading.ipynb

I was running eRL_demo_StockTrading.ipynb. Everything is fine except at the end, it complained "env" is not defined. Do you have any ideas how to fix this?