
ai4finance-foundation / finrl-trading

1.9K stars, 96 watchers, 690 forks, 141.95 MB

For trading. Please star.

Home Page: https://ai4finance.org

License: MIT License

Languages: Jupyter Notebook 91.64%, Python 8.25%, Dockerfile 0.12%
Topics: deep-reinforcement-learning, stock-trading, a2c-algorithm, ppo, ddpg, openai-gym, ensemble-strategy, stock-trading-strategy, automated-stock-trading, sharpe-ratio

finrl-trading's People

Contributors

avoydatta, bruceyanghy, dependabot[bot], kanishkshah, pit-storm, pitola, robinbg, yangletliu, yeewahchan, zhumingpassional


finrl-trading's Issues

Rebalance window

Hi,
You said the rebalance window is the number of months used to retrain the model, but I cannot understand why the value was 63. The retraining period ran from 2015/10/01 to 2020/05/08 and included a validation process.
I could not work out how you arrived at your rebalance window and validation window values so that I can apply them to my own data.
Thanks,
Thy
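For context: 63 trading days is the standard approximation of one quarter (252 trading days per year divided by 4), so a rebalance window of 63 presumably corresponds to quarterly retraining, and likewise for the validation window. A trivial sketch of that arithmetic, as an assumption about the authors' choice rather than their stated reasoning:

# Assumed rationale for the value 63: one quarter of trading days.
TRADING_DAYS_PER_YEAR = 252
QUARTERS_PER_YEAR = 4

rebalance_window = TRADING_DAYS_PER_YEAR // QUARTERS_PER_YEAR   # 63 trading days ~ one quarter
validation_window = rebalance_window                            # validate over the following quarter
print(rebalance_window, validation_window)                      # 63 63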

hyperparameter tuning

Hey, thanks for your helpful documentation.
I have a question I was wondering if you could answer:
have you tried hyperparameter tuning, such as Bayesian optimization, and why haven't you used it in this project?

thanks

turbulence_threshold

Hi,
I'm following your great work :-) . In models.py I came across turbulence_threshold; can you please explain how you calculated the turbulence_threshold values of 140, 100, and 90 for the different time periods?
thank you,
Nethanel Koler
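For background: the turbulence index in the ensemble-strategy paper is a Mahalanobis-distance-style measure of how far the current day's cross-sectional returns sit from their historical distribution, and the concrete thresholds are then read off as high quantiles of the in-sample index (the ensemble code later in this page uses np.quantile(..., 0.90)). A rough sketch of such an index, under those assumptions rather than the repository's exact implementation:

import numpy as np
import pandas as pd

def turbulence_index(returns: pd.DataFrame, lookback: int = 252) -> pd.Series:
    """Mahalanobis-style turbulence: (r_t - mu) @ Sigma^-1 @ (r_t - mu),
    where mu and Sigma come from a trailing window of daily returns
    (one column per ticker, one row per date)."""
    values = []
    for t in range(len(returns)):
        if t < lookback:
            values.append(0.0)                      # not enough history yet
            continue
        hist = returns.iloc[t - lookback:t]
        mu = hist.mean().values
        cov_inv = np.linalg.pinv(hist.cov().values) # pseudo-inverse for stability
        diff = returns.iloc[t].values - mu
        values.append(float(diff @ cov_inv @ diff))
    return pd.Series(values, index=returns.index, name="turbulence")

# A threshold such as 140/100/90 would then correspond to something like
# np.quantile(turbulence_index(insample_returns), 0.90) for that period
# (an assumption, not the repository's code).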

Problem with stockenvtrain, stockenvvalidation and stockenvtrade

good evening

I have problems executing the following code (in the "model" folder):

customized env

from EnvMultipleStock_train import StockEnvTrain
from EnvMultipleStock_validation import StockEnvValidation
from EnvMultipleStock_trade import StockEnvTrade

I don't see on Github how to install the above.

Could you help me?! Thank you!
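These classes are not pip-installable packages; they are plain Python files (EnvMultipleStock_train.py, etc.) that ship with the repository, and the repo's models.py imports them from an env folder. A minimal sketch, assuming you have cloned the repository locally and the files sit in that env folder (adjust the path to your checkout):

import sys
from pathlib import Path

# Assumption: local clone of the repository; change this to wherever you cloned it.
REPO_ROOT = Path("~/finrl-trading").expanduser()

# Make both "from env.EnvMultipleStock_train import ..." and
# "from EnvMultipleStock_train import ..." resolvable.
sys.path.insert(0, str(REPO_ROOT))
sys.path.insert(0, str(REPO_ROOT / "env"))

from EnvMultipleStock_train import StockEnvTrain
from EnvMultipleStock_validation import StockEnvValidation
from EnvMultipleStock_trade import StockEnvTrade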

GPU Utilization

Hi, great work on this paper.

I was wondering why there is no option to utilize GPU to train the models quicker. Wouldn't that converge quicker and allow you to use a more complex neural net or sample more data?

Thanks

A Synergistic Exploration of RL-Based Stock Trading & Quantitative Finance Strategies for an Enhanced Understanding of Trading

Discussed in #71

Originally posted by Aditya-dom March 2, 2024

Greetings
I'm eager to showcase my projects and contribute to the exciting realm of AI4Finance. Your guidance on how to proceed would be greatly appreciated.

1 - RL-Based Stock trading

  • This project was done for learning purposes. The idea is to investigate how various Reinforcement Learning algorithms apply to the domain of Stock Trading.

As part of learning, I read around 10-15 research papers covering many major and minor aspects of Deep Learning and Reinforcement Learning. We explored OpenAI Gym to gain insights into how to set up our stock trading environment.

The algorithms we discuss are:

  • Deep Q Learning
  • Deep Double Q Learning
  • Duelling Deep Double Q Learning
  • Deep Deterministic Policy Gradient
  • Deep Recurrent Double Reinforcement Learning

Stock Trading is a really challenging RL problem, so instead of jumping straight to the Stock Trading part, we proceed step by step, starting from low-state-space control problems, moving on to high-state-space Atari games, and only then to Stock Trading.

Control Problems

We started with various control problems (lower state space) to test the architectures/algorithms and get a firm grip on the algorithms and Reinforcement Learning as a whole. The problems include -

  • Pendulum
  • Cart Pole
  • Mountain Car

Atari Games

Having tried our algorithms on lower-state-space problems, we moved on to Atari games, which have a really large state space.
The games include -

  • Breakout
  • Pong

The results are reported below -

We implemented both Feedforward Neural Nets and Conv Nets for RAM and image versions of the game respectively. But due to lack of computational resources, we were forced to train on the RAM version. Using OpenCV, we obtained the pixels of the game play and made a video out of it which is shown below.

Our best score on Breakout was close to that reported by DeepMind, but certainly less than the best score reported by the OpenAI team, because we couldn't train for long due to a lack of computational resources and time.

Stock Trading

We now shift to the main part of the project i.e. Stock Trading.

We started reading a lot of research papers and articles on algorithms and their applications. We also came across some articles and reports on Stock Trading using DRQN, DDPG and DDRPG algorithms. We explored various aspects of the algorithms and settled on the relevant ones.

We began by implementing the environment for Stock Trading using OpenAI Gym. This was a basic single-stock version of the environment for our first DDPG Single Stock Agent. We observed some surprisingly good performance that made us rethink the implementations of the agent and environment.

Having fixed the errors in the single-stock environment and the agent, we trained the agent on it and achieved a decent performance, which is reported here.

Having gained some intuition of how things work, we modularised our environment code to tackle the multi-stock scenario. We developed our DDPG agent with a better architecture and also modularised it to work with any multi-stock environment. The model was trained on the environment and the results are reported here.

We planned to use recurrent layers in our model, and for that we completed the Sequence Modelling course offered by DeepLearning.ai on Coursera. We trained a DRQN model on the single-stock environment, as applying it to a multi-stock environment would have made the code too complex to finish in the limited number of days available. The results are reported here.

The basic idea of using the recurrent layers was to somehow make the agent remember past data so that it can use that information to make informed decisions. This is what LSTM promises to do.

Authors

2 - Quantitative finance Strategies

Finance

Introduction

Welcome to 'Quantfinance with backtesting Strategies' - a comprehensive collection of more than 200 Python programs designed for quantitative finance enthusiasts and professionals. This repository is your go-to resource for gathering, manipulating, and analyzing stock market data, leveraging the power of Python to unlock insights in the financial markets.

Organization

Our repository is organized into several key sections:

find_stocks

Programs to screen stocks based on technical and fundamental analysis.

machine_learning

Introductory machine learning applications for stock classification and prediction.

portfolio_strategies

Simulations of trading strategies and portfolio analysis tools.

stock_analysis

Detailed analysis tools for individual stock assessment.

stock_data

Tools for collecting stock price action and company data via APIs and web scraping.

technical_indicators

Visual tools for popular technical indicators like Bollinger Bands, RSI, and MACD.

Installation

To get started, clone the repository and install the required dependencies:

git clone https://github.com/aditya-dom/Quantfinance-with-backtesting.git
cd Finance
pip install -r requirements.txt

Usage

Detailed instructions on how to use each program can be found within their respective directories. Explore different modules to discover their functionalities.

Each script in this collection is stand-alone. Here's how you can run a sample program:

python example_program.py

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

Authors

Acknowledgements

Disclaimer

The material in this repository is for educational purposes only and should not be considered professional investment advice.

multi stock data training

From what I understand, the model needs data for 30 stocks to train. If there are many more stocks (> 30), I want to use all of them to train the model. How can I do that?

For example, first I use data for 30 stocks, then call the reset() function, read the next 30 stocks' data, and continue training the model parameters; the key point is that I do not re-initialize the model.

Will this work?
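If the goal is simply to keep training the same network on successive 30-stock batches without re-initializing it, one pattern that works with Stable-Baselines-style agents is to swap the environment and call learn again with reset_num_timesteps=False. A minimal sketch under that assumption; StockEnvTrain and make_batches are illustrative stand-ins for your own environment and data-splitting helper, and every batch must produce the same observation and action dimensions (here, 30 stocks each):

from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv

model = None
for batch_df in make_batches(all_stock_data, batch_size=30):   # hypothetical helper
    env = DummyVecEnv([lambda df=batch_df: StockEnvTrain(df)]) # bind df per batch
    if model is None:
        model = PPO("MlpPolicy", env, verbose=0)
    else:
        model.set_env(env)                                     # reuse the same weights
    # continue training without resetting the timestep counter / re-initializing
    model.learn(total_timesteps=50_000, reset_num_timesteps=False)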

SyntaxError

Hello, I cannot run the code because I get this error:

(venv) linus@linux:~/trading/Deep-Reinforcement-Learning-for-Automated-Stock-Trading-Ensemble-Strategy-ICAIF-2020$ python run_DRL.py
WARNING:tensorflow:
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.

Traceback (most recent call last):
  File "run_DRL.py", line 12, in <module>
    from model.models import *
  File "/home/linus/trading/Deep-Reinforcement-Learning-for-Automated-Stock-Trading-Ensemble-Strategy-ICAIF-2020/model/models.py", line 23, in <module>
    from env.EnvMultipleStock_train import StockEnvTrain
  File "/home/linus/trading/Deep-Reinforcement-Learning-for-Automated-Stock-Trading-Ensemble-Strategy-ICAIF-2020/venv/lib/python3.6/site-packages/env.py", line 51
    print k
          ^
SyntaxError: Missing parentheses in call to 'print'. Did you mean print(k)?


When I bring my own stock data into training, the actions produce NaN

The resulting problems are as follows:

/home/xie/Tacrypto/DCmaster/env/EnvMultipleStock_validation.py:162: RuntimeWarning: invalid value encountered in greater
buy_index = argsort_actions[::-1][:np.where(actions > 0)[0].shape[0]]
/Deep-Reinforcement-Learning-for-Automated-Stock-Trading-Ensemble-Strategy-ICAIF-2020-master/env/EnvMultipleStock_validation.py:136: RuntimeWarning: invalid value encountered in double_scalars
df_total_value['daily_return'].std()
/Deep-Reinforcement-Learning-for-Automated-Stock-Trading-Ensemble-Strategy-ICAIF-2020-master/model/models.py:128: RuntimeWarning: invalid value encountered in double_scalars
df_total_value['daily_return'].std()
A2C Sharpe Ratio: nan
======PPO Training========
Training time (PPO): 5.048629434903463 minutes
======PPO Validation from: 20201126 to 20210303
PPO Sharpe Ratio: nan
======DDPG Training========
Training time (DDPG): 0.9256621718406677 minutes
======DDPG Validation from: 20201126 to 20210303
DDPG Sharpe Ratio: 0.07492627731887684

My data does not contain missing values. Of the algorithms, only DDPG runs successfully and produces results; A2C and PPO (and even ACER and TD3) produce NaN in the action space after training for a certain amount of time. Why is this?

Integrate with current CNN model

Hi Bruce,

Undoubtedly this is a great contribution to DRL stock trading. I am new to DRL, so it would be a great help if you could clear up the doubt below:

Currently, we are using our custom-trained CNN model to predict a Buy, Sell, or Hold classification. Can we incorporate this model somewhere in the DRL automated trading process? Or does DRL only work with its own algorithms, like DQN or PPO, to train and predict the Buy, Sell, or Hold classification?

Regards,
Ankit

No module named 'env'

When I run the code, it reports:
"
old_repo_ensemble_strategy/model/models.py", line 23, in
from env.EnvMultipleStock_train import StockEnvTrain
ModuleNotFoundError: No module named 'env'
"
Did you forget to upload this file?

deployment page down

The page for deploying the model on a trading platform seems to be down. Are any other options available? Thank you.

ERROR: Could not find a version that satisfies the requirement scikit-learn==0.21.0 (from -r requirements.txt (line 5))

When i run "pip3 install -r requirements.txt"

i get the following error

Downloading stockstats-0.3.2-py2.py3-none-any.whl (13 kB)
ERROR: Could not find a version that satisfies the requirement scikit-learn==0.21.0 (from -r requirements.txt (line 5)) (from versions: 0.9, 0.10, 0.11, 0.12, 0.12.1, 0.13, 0.13.1, 0.14, 0.14.1, 0.15.0b1, 0.15.0b2, 0.15.0, 0.15.1, 0.15.2, 0.16b1, 0.16.0, 0.16.1, 0.17b1, 0.17, 0.17.1, 0.18, 0.18.1, 0.18.2, 0.19b2, 0.19.0, 0.19.1, 0.19.2, 0.20rc1, 0.20.0, 0.20.1, 0.20.2, 0.20.3, 0.20.4, 0.21rc2, 0.21.1, 0.21.2, 0.21.3, 0.22rc2.post1, 0.22rc3, 0.22, 0.22.1, 0.22.2, 0.22.2.post1, 0.23.0rc1, 0.23.0, 0.23.1, 0.23.2, 0.24.dev0, 0.24.0rc1, 0.24.0)
ERROR: No matching distribution found for scikit-learn==0.21.0 (from -r requirements.txt (line 5))

Any idea how to resolve it ?

Thank you.

What is StockTradingRLEnv.StockEnv for?

Hi @BruceYanghy,

As you know, I'm extending your work for my master's thesis. While doing so, I came across the following class:

StockEnv

As far as I can tell from scanning the code, this env isn't used anywhere. Can you please explain why this orphan env is there and, more importantly, why you built three different envs for training, validation and trading?

Would be great to hear from you :-)

Kind regards!

Problem with DRL_prediction

Hi everyone,

I think this code line, and especially the DRL_prediction call, needs to be changed:

df_account_value, df_actions = DRLAgent.DRL_prediction(model=trained_sac, test_data = trade, test_env = env_trade, test_obs = obs_trade)

since in the library the function is now declared as:

def DRL_prediction(model, environment):

Thank you in advance
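Based on the signature quoted above, the call would become something like the following sketch, where e_trade_gym stands for the unwrapped trading environment built from the trade DataFrame (the construction kwargs are assumptions, adjust to your setup):

# assumed environment construction for the new-style call
e_trade_gym = StockTradingEnv(df=trade, **env_kwargs)

df_account_value, df_actions = DRLAgent.DRL_prediction(
    model=trained_sac,
    environment=e_trade_gym,
)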


Format of original data

I assume that dow_30_2009_2020.csv is the original data, but I have trouble understanding what each column represents. Would you mind clarifying that so I can try to replicate the results on different stocks?

Including more variables into the algorithm

I spoke with Bruce on LinkedIn and mentioned the idea of adding more volume-related data such as time & sales. I believe the current OHLC format is the simplest data readily available online to retail traders. However, I feel that the RL agents would improve vastly with the additional input of data such as time & sales (TS). TS data is a table, indexed by time, of the volume of every single buy and sell order. I think these additional variables, which together form the depth of market (DOM), would be highly valuable information to an RL agent.

I am trying to run this code but get NameError: name 'DRLEnsembleAgent' is not defined

rebalance_window = 63 # rebalance_window is the number of days to retrain the model
validation_window = 63 # validation_window is the number of days to do validation and trading (e.g. if validation_window=63, then both validation and trading period will be 63 days)
train_start = '2009-01-01'
train_end = '2020-04-01'
val_test_start = '2020-04-01'
val_test_end = '2021-07-20'

ensemble_agent = DRLEnsembleAgent(df=processed,
                                  train_period=(train_start, train_end),
                                  val_test_period=(val_test_start, val_test_end),
                                  rebalance_window=rebalance_window,
                                  validation_window=validation_window,
                                  **env_kwargs)

ERROR:

NameError Traceback (most recent call last)
in ()
6 val_test_end = '2021-07-20'
7
----> 8 ensemble_agent = DRLEnsembleAgent(df=processed,
9 train_period=(train_start,train_end),
10 val_test_period=(val_test_start,val_test_end),

NameError: name 'DRLEnsembleAgent' is not defined
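The NameError simply means the class was never imported in the notebook before this cell ran. The module path has moved between FinRL releases, so treat the following as an assumption and check your installed version; a hedged sketch:

# The module path varies across FinRL releases; both of these layouts have existed.
try:
    from finrl.agents.stablebaselines3.models import DRLEnsembleAgent  # newer layout (assumed)
except ImportError:
    from finrl.model.models import DRLEnsembleAgent                    # older layout (assumed)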

What do the data column names mean?

Hello,

Excellent work on this. I wanted to evaluate the performance of your models on some other index. I took a look at your data csv files, and found the column names (",datadate,tic,prccd,ajexdi,prcod,prchd,prcld,cshtrd")
I understand what the ticker and datadate represent, but could you elaborate on the other ones?

I would also appreciate a more detailed explanation of the backtesting process; I found the script a little sparse. Are you simply comparing the model result CSVs against the index average?

Thank you!

Backtesting results

The DRL algorithm systematically invests the initial_amount until it is exhausted. However, during validation/trading the entire amount may never be invested, due to hmax and the high initial_amount (1 million). For the baseline, on the other hand, we use point-to-point close-price returns. This means that the baseline assumes the entire initial amount is invested, whereas in the backtest the entire amount may never be invested. Does this mean the backtest will usually underperform the baseline?

Should the baseline also use a similar investing method? In practice no one would invest 1 million on day 1, even if it is an index. Alternatively, for the baseline we could allocate the initial amount to the index and only allow sell actions?

backtest_plot Error with Crypto or any stock not listed in yfinance

So I was testing out one of the crypto notebooks, and it came to my attention that the new project master branch has a new naming convention for the backtest methods, so old notebooks will need to be updated accordingly.

I also noticed a few issues with the data intake of the methods. They only accept data formatted in a specific way; for example, the date column needs to be named "date" (unlike "time" in crypto).

One of the major issues with the project right now is the non-standardized methods, classes, inputs and naming conventions.

Also, there is redundancy in the operations. For example, the method below uses get_baseline, which typically re-downloads data you have already downloaded at the beginning of the notebook. Is there something I am not seeing? Can we not create a deepcopy of the data once at the beginning and use that as the baseline?

All of the above is just me writing down my thoughts. Please let me know what you all think and the best solution, so we can work on it together and reduce inefficiency and redundant fixes. I would truly love to hear your thoughts!

@XiaoYangLiu-FinRL @BruceYanghy

def backtest_plot(
    account_value,
    baseline_start=config.START_TRADE_DATE,
    baseline_end=config.END_DATE,
    baseline_ticker="^DJI",
    value_col_name="account_value",
):

    df = deepcopy(account_value)
    test_returns = get_daily_return(df, value_col_name=value_col_name)

    baseline_df = get_baseline(
        ticker=baseline_ticker, start=baseline_start, end=baseline_end
    )

    baseline_returns = get_daily_return(baseline_df, value_col_name="close")
    with pyfolio.plotting.plotting_context(font_scale=1.1):
        pyfolio.create_full_tear_sheet(
            returns=test_returns, benchmark_rets=baseline_returns, set_context=False
        )

def get_baseline(ticker, start, end):
    dji = YahooDownloader(
        start_date=start, end_date=end, ticker_list=[ticker]
    ).fetch_data()
    return dji
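One way to avoid the repeated download described above is to let the plotting helper accept an already-fetched baseline DataFrame and only fall back to get_baseline when none is supplied. A sketch of that idea, reusing the helpers shown above; this is not the project's current API:

from copy import deepcopy

def backtest_plot_cached(account_value, baseline_df=None, baseline_ticker="^DJI",
                         baseline_start=None, baseline_end=None,
                         value_col_name="account_value"):
    """Like backtest_plot, but reuses a baseline DataFrame fetched earlier in the
    notebook instead of downloading it again (sketch, not the FinRL API)."""
    df = deepcopy(account_value)
    test_returns = get_daily_return(df, value_col_name=value_col_name)

    if baseline_df is None:
        # fall back to the old behaviour: download the baseline series
        baseline_df = get_baseline(ticker=baseline_ticker,
                                   start=baseline_start, end=baseline_end)
    baseline_returns = get_daily_return(deepcopy(baseline_df), value_col_name="close")

    with pyfolio.plotting.plotting_context(font_scale=1.1):
        pyfolio.create_full_tear_sheet(returns=test_returns,
                                       benchmark_rets=baseline_returns,
                                       set_context=False)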

Rebalancing for Individual Models

Hi @BruceYanghy, I was wondering if you used this approach when training your individual models (eg PPO):

Step 1. We use a growing window of n months to retrain our three agents concurrently. In this paper we retrain our three agents at every three months.
Step 2. We validate all three agents by using a 3-month validation rolling window after training window to pick the best performing agent with the highest Sharpe ratio [42]. 
Step 3. After the best agent is picked, we use it to predict and trade for the next quarter.

Is this approach used strictly for choosing the best model of an ensemble, or does it also relate to being able to "redo" previous states, much like how RL agents need multiple runs to explore? Also, out of curiosity, which script did you use to run the individual models?

adjusting prices with 'ajexdi'

Great work!
I just have a couple of questions.

I'm looking at the preprocessing code and see that you used the field 'ajexdi' from the raw file 'dow_30_2009_2020.csv' to adjust the prices.
How did you come up with the values of the 'ajexdi' field in that file?
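For context, and as an assumption rather than the maintainers' answer: 'ajexdi' looks like the Compustat daily cumulative adjustment factor, which ships with the raw data rather than being computed by the repo, and preprocessing of this kind typically divides the close price by it to obtain an adjusted close. A minimal sketch of that adjustment, using the column names from the csv:

import pandas as pd

df = pd.read_csv("dow_30_2009_2020.csv")

# ajexdi: cumulative adjustment factor from the raw (Compustat-style) data.
# The adjusted close is assumed to be close price divided by this factor.
df["ajexdi"] = df["ajexdi"].replace(0, 1)   # guard against zero factors
df["adjcp"] = df["prccd"] / df["ajexdi"]    # adjusted close price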

Trying to integrate model with SPY stocks

Hello, I'm trying to integrate the current ensemble strategy with SPY stocks data. The data is distributed minute-wise from 2009 to 2022. Can somebody help me with that? What change do I need to make?

Is the cost updated with the new number of shares or the old one?

When the selling of a stock happens and the cost is updated, I think this update uses the new number of shares.
I mean, self.state[index+STOCK_DIM+1] is updated first, and that updated value then participates in computing the cost:

self.cost += self.state[index+1]*min(abs(action), self.state[index+STOCK_DIM+1]) * \
             TRANSACTION_FEE_PERCENT

Where am I wrong? Is the number of shares (self.state[index+STOCK_DIM+1]) used in the cost the new one or the old one?

Here is the full version of the code from the training environment:

def _sell_stock(self, index, action):
    # perform sell action based on the sign of the action
    if self.state[index+STOCK_DIM+1] > 0:
        # update balance with the sale proceeds, net of the transaction fee
        self.state[0] += \
        self.state[index+1]*min(abs(action), self.state[index+STOCK_DIM+1]) * \
         (1 - TRANSACTION_FEE_PERCENT)

        # reduce the number of shares held
        self.state[index+STOCK_DIM+1] -= min(abs(action), self.state[index+STOCK_DIM+1])
        # accumulate the transaction cost (note: the share count was already decremented above)
        self.cost += self.state[index+1]*min(abs(action), self.state[index+STOCK_DIM+1]) * \
         TRANSACTION_FEE_PERCENT
        self.trades += 1
    else:
        pass
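Reading the method as written: self.state[index+STOCK_DIM+1] is decremented before the cost line runs, so the fee is computed with the new (already reduced) share count. If the intent is to charge the fee on the shares actually sold, one option is to compute the traded quantity once before mutating the state. A sketch of that variant, based on my reading of the code rather than an official patch:

def _sell_stock(self, index, action):
    # perform sell action based on the sign of the action
    shares_held = self.state[index + STOCK_DIM + 1]
    if shares_held > 0:
        shares_sold = min(abs(action), shares_held)     # fixed before any state update
        proceeds = self.state[index + 1] * shares_sold

        self.state[0] += proceeds * (1 - TRANSACTION_FEE_PERCENT)   # update balance
        self.state[index + STOCK_DIM + 1] -= shares_sold            # update holdings
        self.cost += proceeds * TRANSACTION_FEE_PERCENT             # fee on shares actually sold
        self.trades += 1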

improve model

Thanks for your amazing article.
Is there a new update of this model to be published?
What are your tips for improving this model?

What can I do for Crypto trading?

Thanks for sharing a great library for DRL trading. I am trying to find a way to do crypto trading with DRL, but all my trials have failed so far.

Firstly, I tried Tutorial/3-Practical/Demo_MultiCrypto_Trading.ipynb, but it fails with an arguments-importing error that was already reported in the GitHub repository issues.

Secondly, I tried Tutorial/3-Practical/FinRL_Paper_Traing_Demo.ipynb. It runs well trading stocks, so I modified the ticker list for crypto, but it failed at downloading data. It seems to require modifying the env to import CryptoEnv, and I am afraid I would have to go through the same process that already failed in the first trial with Demo_MultiCrypto_Trading.ipynb.

What would be the best way to run crypto trading in this situation? I don't know when the first tutorial script above will be fixed. I am looking for Colab scripts for crypto to avoid paying for a cloud server.

Please give me any suggestions or thoughts.
Thanks in advance.

Question regarding the observation space in the Env

Why is self.observation_space bounded from 0 to np.inf when the technical indicators (e.g. MACD) can take negative values?

Should it not be:

self.observation_space = spaces.Box(low=-np.inf, high=np.inf, shape = (Obs_Dim,))

instead of

self.observation_space = spaces.Box(low=0, high=np.inf, shape = (Obs_Dim,))

with Obs_Dim = Balance + num_stocks + num_shares_held_per_stock + num_tech_indicators*num_stocks

Thanks

DRL_prediction error

Hi, great article once again,

I am trying to predict using trained models but get the same error in both single and multi stock scenarios.
When I run prediction:

trade = data_split(processed_full, '2019-01-01','2021-01-01')
e_trade_gym = StockTradingEnv(df = trade, turbulence_threshold = 332, **env_kwargs)
df_account_value, df_actions = DRLAgent.DRL_prediction(model=model_sac, environment = e_trade_gym)

the code above gives this error
ValueError: Length mismatch: Expected axis has 0 elements, new values have 1 elements

the error happens in this location:
~\Anaconda3\envs\py36\lib\site-packages\finrl\model\models.py in DRL_prediction(model, environment)
78 action, _states = model.predict(test_obs)
79 account_memory = test_env.env_method(method_name="save_asset_memory")
---> 80 actions_memory = test_env.env_method(method_name="save_action_memory")
81 test_obs, rewards, dones, info = test_env.step(action)
82 if dones[0]:

I am using the Jupyter notebook downloaded as of Feb 9, 2021; something must have changed in the modules since you ran it.
Thanks
Vadim

I have added 2 other algorithms to this ensemble strategy but I am getting an error

# DRL models from Stable Baselines 3
from __future__ import annotations

import time

import numpy as np
import pandas as pd
from stable_baselines3 import A2C
from stable_baselines3 import DDPG
from stable_baselines3 import PPO
from stable_baselines3 import SAC
from stable_baselines3 import TD3
from stable_baselines3.common.callbacks import BaseCallback
from stable_baselines3.common.noise import NormalActionNoise
from stable_baselines3.common.noise import OrnsteinUhlenbeckActionNoise
from stable_baselines3.common.vec_env import DummyVecEnv

from finrl import config
from finrl.meta.env_stock_trading.env_stocktrading import StockTradingEnv
from finrl.meta.preprocessor.preprocessors import data_split

MODELS = {"a2c": A2C, "ddpg": DDPG, "td3": TD3, "sac": SAC, "ppo": PPO}

MODEL_KWARGS = {x: config.__dict__[f"{x.upper()}_PARAMS"] for x in MODELS.keys()}

NOISE = {
"normal": NormalActionNoise,
"ornstein_uhlenbeck": OrnsteinUhlenbeckActionNoise,
}

class TensorboardCallback(BaseCallback):
"""
Custom callback for plotting additional values in tensorboard.
"""

def __init__(self, verbose=0):
    super().__init__(verbose)

def _on_step(self) -> bool:
    try:
        self.logger.record(key="train/reward", value=self.locals["rewards"][0])
    except BaseException:
        self.logger.record(key="train/reward", value=self.locals["reward"][0])
    return True

class DRLAgent:
"""Provides implementations for DRL algorithms

Attributes
----------
    env: gym environment class
        user-defined class

Methods
-------
    get_model()
        setup DRL algorithms
    train_model()
        train DRL algorithms in a train dataset
        and output the trained model
    DRL_prediction()
        make a prediction in a test dataset and get results
"""

def __init__(self, env):
    self.env = env

def get_model(
    self,
    model_name,
    policy="MlpPolicy",
    policy_kwargs=None,
    model_kwargs=None,
    verbose=1,
    seed=None,
    tensorboard_log=None,
):
    if model_name not in MODELS:
        raise NotImplementedError("NotImplementedError")

    if model_kwargs is None:
        model_kwargs = MODEL_KWARGS[model_name]

    if "action_noise" in model_kwargs:
        n_actions = self.env.action_space.shape[-1]
        model_kwargs["action_noise"] = NOISE[model_kwargs["action_noise"]](
            mean=np.zeros(n_actions), sigma=0.1 * np.ones(n_actions)
        )
    print(model_kwargs)
    return MODELS[model_name](
        policy=policy,
        env=self.env,
        tensorboard_log=tensorboard_log,
        verbose=verbose,
        policy_kwargs=policy_kwargs,
        seed=seed,
        **model_kwargs,
    )

def train_model(self, model, tb_log_name, total_timesteps=5000):
    model = model.learn(
        total_timesteps=total_timesteps,
        tb_log_name=tb_log_name,
        callback=TensorboardCallback(),
    )
    return model

@staticmethod
def DRL_prediction(model, environment, deterministic=True):
    test_env, test_obs = environment.get_sb_env()
    """make a prediction"""
    account_memory = []
    actions_memory = []
    #         state_memory=[] #add memory pool to store states
    test_env.reset()
    for i in range(len(environment.df.index.unique())):
        action, _states = model.predict(test_obs, deterministic=deterministic)
        # account_memory = test_env.env_method(method_name="save_asset_memory")
        # actions_memory = test_env.env_method(method_name="save_action_memory")
        test_obs, rewards, dones, info = test_env.step(action)
        if i == (len(environment.df.index.unique()) - 2):
            account_memory = test_env.env_method(method_name="save_asset_memory")
            actions_memory = test_env.env_method(method_name="save_action_memory")
        #                 state_memory=test_env.env_method(method_name="save_state_memory") # add current state to state memory
        if dones[0]:
            print("hit end!")
            break
    return account_memory[0], actions_memory[0]

@staticmethod
def DRL_prediction_load_from_file(model_name, environment, cwd, deterministic=True):
    if model_name not in MODELS:
        raise NotImplementedError("NotImplementedError")
    try:
        # load agent
        model = MODELS[model_name].load(cwd)
        print("Successfully load model", cwd)
    except BaseException:
        raise ValueError("Fail to load agent!")

    # test on the testing env
    state = environment.reset()
    episode_returns = []  # the cumulative_return / initial_account
    episode_total_assets = [environment.initial_total_asset]
    done = False
    while not done:
        action = model.predict(state, deterministic=deterministic)[0]
        state, reward, done, _ = environment.step(action)

        total_asset = (
            environment.amount
            + (environment.price_ary[environment.day] * environment.stocks).sum()
        )
        episode_total_assets.append(total_asset)
        episode_return = total_asset / environment.initial_total_asset
        episode_returns.append(episode_return)

    print("episode_return", episode_return)
    print("Test Finished!")
    return episode_total_assets

class DRLEnsembleAgent:
@staticmethod
def get_model(
model_name,
env,
policy="MlpPolicy",
policy_kwargs=None,
model_kwargs=None,
seed=None,
verbose=1,
):

    if model_name not in MODELS:
        raise NotImplementedError("NotImplementedError")

    if model_kwargs is None:
        temp_model_kwargs = MODEL_KWARGS[model_name]
    else:
        temp_model_kwargs = model_kwargs.copy()

    if "action_noise" in temp_model_kwargs:
        n_actions = env.action_space.shape[-1]
        temp_model_kwargs["action_noise"] = NOISE[
            temp_model_kwargs["action_noise"]
        ](mean=np.zeros(n_actions), sigma=0.1 * np.ones(n_actions))
    print(temp_model_kwargs)
    return MODELS[model_name](
        policy=policy,
        env=env,
        tensorboard_log=f"{config.TENSORBOARD_LOG_DIR}/{model_name}",
        verbose=verbose,
        policy_kwargs=policy_kwargs,
        seed=seed,
        **temp_model_kwargs,
    )

@staticmethod
def train_model(model, model_name, tb_log_name, iter_num, total_timesteps=5000):
    model = model.learn(
        total_timesteps=total_timesteps,
        tb_log_name=tb_log_name,
        callback=TensorboardCallback(),
    )
    model.save(
        f"{config.TRAINED_MODEL_DIR}/{model_name.upper()}_{total_timesteps // 1000}k_{iter_num}"
    )
    return model

@staticmethod
def get_validation_sharpe(iteration, model_name):
    """Calculate Sharpe ratio based on validation results"""
    df_total_value = pd.read_csv(
        f"results/account_value_validation_{model_name}_{iteration}.csv"
    )
    # If the agent did not make any transaction
    if df_total_value["daily_return"].var() == 0:
        if df_total_value["daily_return"].mean() > 0:
            return np.inf
        else:
            return 0.0
    else:
        return (
            (7**0.5)
            * df_total_value["daily_return"].mean()
            / df_total_value["daily_return"].std()
        )

def __init__(
    self,
    df,
    train_period,
    val_test_period,
    rebalance_window,
    validation_window,
    stock_dim,
    hmax,
    initial_amount,
    buy_cost_pct,
    sell_cost_pct,
    reward_scaling,
    state_space,
    action_space,
    tech_indicator_list,
    print_verbosity,
):

    self.df = df
    self.train_period = train_period
    self.val_test_period = val_test_period

    self.unique_trade_date = df[
        (df.date > val_test_period[0]) & (df.date <= val_test_period[1])
    ].date.unique()
    self.rebalance_window = rebalance_window
    self.validation_window = validation_window

    self.stock_dim = stock_dim
    self.hmax = hmax
    self.initial_amount = initial_amount
    self.buy_cost_pct = buy_cost_pct
    self.sell_cost_pct = sell_cost_pct
    self.reward_scaling = reward_scaling
    self.state_space = state_space
    self.action_space = action_space
    self.tech_indicator_list = tech_indicator_list
    self.print_verbosity = print_verbosity

def DRL_validation(self, model, test_data, test_env, test_obs):
    """validation process"""
    for _ in range(len(test_data.index.unique())):
        action, _states = model.predict(test_obs)
        test_obs, rewards, dones, info = test_env.step(action)

def DRL_prediction(
    self, model, name, last_state, iter_num, turbulence_threshold, initial
):
    """make a prediction based on trained model"""

    ## trading env
    trade_data = data_split(
        self.df,
        start=self.unique_trade_date[iter_num - self.rebalance_window],
        end=self.unique_trade_date[iter_num],
    )
    trade_env = DummyVecEnv(
        [
            lambda: StockTradingEnv(
                df=trade_data,
                stock_dim=self.stock_dim,
                hmax=self.hmax,
                initial_amount=self.initial_amount,
                num_stock_shares=[0] * self.stock_dim,
                buy_cost_pct=[self.buy_cost_pct] * self.stock_dim,
                sell_cost_pct=[self.sell_cost_pct] * self.stock_dim,
                reward_scaling=self.reward_scaling,
                state_space=self.state_space,
                action_space=self.action_space,
                tech_indicator_list=self.tech_indicator_list,
                turbulence_threshold=turbulence_threshold,
                initial=initial,
                previous_state=last_state,
                model_name=name,
                mode="trade",
                iteration=iter_num,
                print_verbosity=self.print_verbosity,
            )
        ]
    )

    trade_obs = trade_env.reset()

    for i in range(len(trade_data.index.unique())):
        action, _states = model.predict(trade_obs)
        trade_obs, rewards, dones, info = trade_env.step(action)
        if i == (len(trade_data.index.unique()) - 2):
            # print(env_test.render())
            last_state = trade_env.render()

    df_last_state = pd.DataFrame({"last_state": last_state})
    df_last_state.to_csv(f"results/last_state_{name}_{i}.csv", index=False)
    return last_state

def run_ensemble_strategy(
    self, A2C_model_kwargs, PPO_model_kwargs, DDPG_model_kwargs,SAC_model_kwargs,TD3_model_kwargs, timesteps_dict
):
    """Ensemble Strategy that combines PPO, A2C and DDPG"""
    print("============Start Ensemble Strategy============")
    # for ensemble model, it's necessary to feed the last state
    # of the previous model to the current model as the initial state
    last_state_ensemble = []

    ppo_sharpe_list = []
    ddpg_sharpe_list = []
    a2c_sharpe_list = []
    td3_sharpe_list = []
    sac_sharpe_list = []

    model_use = []
    validation_start_date_list = []
    validation_end_date_list = []
    iteration_list = []

    insample_turbulence = self.df[
        (self.df.date < self.train_period[1])
        & (self.df.date >= self.train_period[0])
    ]
    insample_turbulence_threshold = np.quantile(
        insample_turbulence.turbulence.values, 0.90
    )

    start = time.time()
    for i in range(
        self.rebalance_window + self.validation_window,
        len(self.unique_trade_date),
        self.rebalance_window,
    ):
        validation_start_date = self.unique_trade_date[
            i - self.rebalance_window - self.validation_window
        ]
        validation_end_date = self.unique_trade_date[i - self.rebalance_window]

        validation_start_date_list.append(validation_start_date)
        validation_end_date_list.append(validation_end_date)
        iteration_list.append(i)

        print("============================================")
        ## initial state is empty
        if i - self.rebalance_window - self.validation_window == 0:
            # inital state
            initial = True
        else:
            # previous state
            initial = False

        # Tuning trubulence index based on historical data
        # Turbulence lookback window is one quarter (63 days)
        end_date_index = self.df.index[
            self.df["date"]
            == self.unique_trade_date[
                i - self.rebalance_window - self.validation_window
            ]
        ].to_list()[-1]
        start_date_index = end_date_index - 63 + 1

        historical_turbulence = self.df.iloc[
            start_date_index : (end_date_index + 1), :
        ]

        historical_turbulence = historical_turbulence.drop_duplicates(
            subset=["date"]
        )

        historical_turbulence_mean = np.mean(
            historical_turbulence.turbulence.values
        )

        # print(historical_turbulence_mean)

        if historical_turbulence_mean > insample_turbulence_threshold:
            # if the mean of the historical data is greater than the 90% quantile of insample turbulence data
            # then we assume that the current market is volatile,
            # therefore we set the 90% quantile of insample turbulence data as the turbulence threshold
            # meaning the current turbulence can't exceed the 90% quantile of insample turbulence data
            turbulence_threshold = insample_turbulence_threshold
        else:
            # if the mean of the historical data is less than the 90% quantile of insample turbulence data
            # then we tune up the turbulence_threshold, meaning we lower the risk
            turbulence_threshold = np.quantile(
                insample_turbulence.turbulence.values, 1
            )

        turbulence_threshold = np.quantile(
            insample_turbulence.turbulence.values, 0.99
        )
        print("turbulence_threshold: ", turbulence_threshold)

        ############## Environment Setup starts ##############
        ## training env
        train = data_split(
            self.df,
            start=self.train_period[0],
            end=self.unique_trade_date[
                i - self.rebalance_window - self.validation_window
            ],
        )
        self.train_env = DummyVecEnv(
            [
                lambda: StockTradingEnv(
                    df=train,
                    stock_dim=self.stock_dim,
                    hmax=self.hmax,
                    initial_amount=self.initial_amount,
                    num_stock_shares=[0] * self.stock_dim,
                    buy_cost_pct=[self.buy_cost_pct] * self.stock_dim,
                    sell_cost_pct=[self.sell_cost_pct] * self.stock_dim,
                    reward_scaling=self.reward_scaling,
                    state_space=self.state_space,
                    action_space=self.action_space,
                    tech_indicator_list=self.tech_indicator_list,
                    print_verbosity=self.print_verbosity,
                )
            ]
        )

        validation = data_split(
            self.df,
            start=self.unique_trade_date[
                i - self.rebalance_window - self.validation_window
            ],
            end=self.unique_trade_date[i - self.rebalance_window],
        )
        ############## Environment Setup ends ##############

        ############## Training and Validation starts ##############
        print(
            "======Model training from: ",
            self.train_period[0],
            "to ",
            self.unique_trade_date[
                i - self.rebalance_window - self.validation_window
            ],
        )
        # print("training: ",len(data_split(df, start=20090000, end=test.datadate.unique()[i-rebalance_window]) ))
        # print("==============Model Training===========")
        print("======A2C Training========")
        model_a2c = self.get_model(
            "a2c", self.train_env, policy="MlpPolicy", model_kwargs=A2C_model_kwargs
        )
        model_a2c = self.train_model(
            model_a2c,
            "a2c",
            tb_log_name=f"a2c_{i}",
            iter_num=i,
            total_timesteps=timesteps_dict["a2c"],
        )  # 100_000

        print(
            "======A2C Validation from: ",
            validation_start_date,
            "to ",
            validation_end_date,
        )
        val_env_a2c = DummyVecEnv(
            [
                lambda: StockTradingEnv(
                    df=validation,
                    stock_dim=self.stock_dim,
                    hmax=self.hmax,
                    initial_amount=self.initial_amount,
                    num_stock_shares=[0] * self.stock_dim,
                    buy_cost_pct=[self.buy_cost_pct] * self.stock_dim,
                    sell_cost_pct=[self.sell_cost_pct] * self.stock_dim,
                    reward_scaling=self.reward_scaling,
                    state_space=self.state_space,
                    action_space=self.action_space,
                    tech_indicator_list=self.tech_indicator_list,
                    turbulence_threshold=turbulence_threshold,
                    iteration=i,
                    model_name="A2C",
                    mode="validation",
                    print_verbosity=self.print_verbosity,
                )
            ]
        )
        val_obs_a2c = val_env_a2c.reset()
        self.DRL_validation(
            model=model_a2c,
            test_data=validation,
            test_env=val_env_a2c,
            test_obs=val_obs_a2c,
        )
        sharpe_a2c = self.get_validation_sharpe(i, model_name="A2C")
        print("A2C Sharpe Ratio: ", sharpe_a2c)

        print("======PPO Training========")
        model_ppo = self.get_model(
            "ppo", self.train_env, policy="MlpPolicy", model_kwargs=PPO_model_kwargs
        )
        model_ppo = self.train_model(
            model_ppo,
            "ppo",
            tb_log_name=f"ppo_{i}",
            iter_num=i,
            total_timesteps=timesteps_dict["ppo"],
        )  # 100_000
        print(
            "======PPO Validation from: ",
            validation_start_date,
            "to ",
            validation_end_date,
        )
        val_env_ppo = DummyVecEnv(
            [
                lambda: StockTradingEnv(
                    df=validation,
                    stock_dim=self.stock_dim,
                    hmax=self.hmax,
                    initial_amount=self.initial_amount,
                    num_stock_shares=[0] * self.stock_dim,
                    buy_cost_pct=[self.buy_cost_pct] * self.stock_dim,
                    sell_cost_pct=[self.sell_cost_pct] * self.stock_dim,
                    reward_scaling=self.reward_scaling,
                    state_space=self.state_space,
                    action_space=self.action_space,
                    tech_indicator_list=self.tech_indicator_list,
                    turbulence_threshold=turbulence_threshold,
                    iteration=i,
                    model_name="PPO",
                    mode="validation",
                    print_verbosity=self.print_verbosity,
                )
            ]
        )
        val_obs_ppo = val_env_ppo.reset()
        self.DRL_validation(
            model=model_ppo,
            test_data=validation,
            test_env=val_env_ppo,
            test_obs=val_obs_ppo,
        )
        sharpe_ppo = self.get_validation_sharpe(i, model_name="PPO")
        print("PPO Sharpe Ratio: ", sharpe_ppo)
        
        print("======SAC Training========")

model_sac = self.get_model(
"sac", self.train_env, policy="MlpPolicy", model_kwargs=SAC_model_kwargs
)
model_sac = self.train_model(
model_sac,
"sac",
tb_log_name=f"sac_{i}",
iter_num=i,
total_timesteps=timesteps_dict["sac"],
) # 100_000

        print(
            "======SAC Validation from: ",
            validation_start_date,
            "to ",
            validation_end_date,
        )
        val_env_sac = DummyVecEnv(
            [
                lambda: StockTradingEnv(
                    df=validation,
                    stock_dim=self.stock_dim,
                    hmax=self.hmax,
                    initial_amount=self.initial_amount,
                    num_stock_shares=[0] * self.stock_dim,
                    buy_cost_pct=[self.buy_cost_pct] * self.stock_dim,
                    sell_cost_pct=[self.sell_cost_pct] * self.stock_dim,
                    reward_scaling=self.reward_scaling,
                    state_space=self.state_space,
                    action_space=self.action_space,
                    tech_indicator_list=self.tech_indicator_list,
                    turbulence_threshold=turbulence_threshold,
                    iteration=i,
                    model_name="SAC",
                    mode="validation",
                    print_verbosity=self.print_verbosity,
                )
            ]
        )
        val_obs_sac = val_env_sac.reset()
        self.DRL_validation(
            model=model_sac,
            test_data=validation,
            test_env=val_env_sac,
            test_obs=val_obs_sac,
        )
        sharpe_sac = self.get_validation_sharpe(i, model_name="SAC")
        print("SAC Sharpe Ratio: ", sharpe_sac)

print("======TD3 Training========")

model_td3 = self.get_model(
"td3", self.train_env, policy="MlpPolicy", model_kwargs=TD3_model_kwargs
)
model_td3 = self.train_model(
model_td3,
"td3",
tb_log_name=f"td3_{i}",
iter_num=i,
total_timesteps=timesteps_dict["td3"],
) # 100_000

        print(
            "======TD3 Validation from: ",
            validation_start_date,
            "to ",
            validation_end_date,
        )
        val_env_td3 = DummyVecEnv(
            [
                lambda: StockTradingEnv(
                    df=validation,
                    stock_dim=self.stock_dim,
                    hmax=self.hmax,
                    initial_amount=self.initial_amount,
                    num_stock_shares=[0] * self.stock_dim,
                    buy_cost_pct=[self.buy_cost_pct] * self.stock_dim,
                    sell_cost_pct=[self.sell_cost_pct] * self.stock_dim,
                    reward_scaling=self.reward_scaling,
                    state_space=self.state_space,
                    action_space=self.action_space,
                    tech_indicator_list=self.tech_indicator_list,
                    turbulence_threshold=turbulence_threshold,
                    iteration=i,
                    model_name="TD3”,
                    mode="validation",
                    print_verbosity=self.print_verbosity,
                )
            ]
        )
        val_obs_td3 = val_env_td3.reset()
        self.DRL_validation(
            model=model_td3,
            test_data=validation,
            test_env=val_env_td3,
            test_obs=val_obs_td3,
        )
        sharpe_td3 = self.get_validation_sharpe(i, model_name="TD3")
        print("TD3 Sharpe Ratio: ", sharpe_td3)


        print("======DDPG Training========")
        model_ddpg = self.get_model(
            "ddpg",
            self.train_env,
            policy="MlpPolicy",
            model_kwargs=DDPG_model_kwargs,
        )
        model_ddpg = self.train_model(
            model_ddpg,
            "ddpg",
            tb_log_name=f"ddpg_{i}",
            iter_num=i,
            total_timesteps=timesteps_dict["ddpg"],
        )  # 50_000
        print(
            "======DDPG Validation from: ",
            validation_start_date,
            "to ",
            validation_end_date,
        )
        val_env_ddpg = DummyVecEnv(
            [
                lambda: StockTradingEnv(
                    df=validation,
                    stock_dim=self.stock_dim,
                    hmax=self.hmax,
                    initial_amount=self.initial_amount,
                    num_stock_shares=[0] * self.stock_dim,
                    buy_cost_pct=[self.buy_cost_pct] * self.stock_dim,
                    sell_cost_pct=[self.sell_cost_pct] * self.stock_dim,
                    reward_scaling=self.reward_scaling,
                    state_space=self.state_space,
                    action_space=self.action_space,
                    tech_indicator_list=self.tech_indicator_list,
                    turbulence_threshold=turbulence_threshold,
                    iteration=i,
                    model_name="DDPG",
                    mode="validation",
                    print_verbosity=self.print_verbosity,
                )
            ]
        )
        val_obs_ddpg = val_env_ddpg.reset()
        self.DRL_validation(
            model=model_ddpg,
            test_data=validation,
            test_env=val_env_ddpg,
            test_obs=val_obs_ddpg,
        )
        sharpe_ddpg = self.get_validation_sharpe(i, model_name="DDPG")

        ppo_sharpe_list.append(sharpe_ppo)
        a2c_sharpe_list.append(sharpe_a2c)
        sac_sharpe_list.append(sharpe_sac)
        td3_sharpe_list.append(sharpe_td3)
        ddpg_sharpe_list.append(sharpe_ddpg)

        print(
            "======Best Model Retraining from: ",
            self.train_period[0],
            "to ",
            self.unique_trade_date[i - self.rebalance_window],
        )
        # Environment setup for model retraining up to first trade date
        # train_full = data_split(self.df, start=self.train_period[0], end=self.unique_trade_date[i - self.rebalance_window])
        # self.train_full_env = DummyVecEnv([lambda: StockTradingEnv(train_full,
        #                                                    self.stock_dim,
        #                                                    self.hmax,
        #                                                    self.initial_amount,
        #                                                    self.buy_cost_pct,
        #                                                    self.sell_cost_pct,
        #                                                    self.reward_scaling,
        #                                                    self.state_space,
        #                                                    self.action_space,
        #                                                    self.tech_indicator_list,
        #                                                    print_verbosity=self.print_verbosity)])
        # Model Selection based on sharpe ratio
        if (sharpe_ppo >= sharpe_a2c) & (sharpe_ppo >= sharpe_ddpg) & (sharpe_ppo >= sharpe_sac) & (sharpe_ppo >= sharpe_td3):
            model_use.append("PPO")
            model_ensemble = model_ppo

            # model_ensemble = self.get_model("ppo",self.train_full_env,policy="MlpPolicy",model_kwargs=PPO_model_kwargs)
            # model_ensemble = self.train_model(model_ensemble, "ensemble", tb_log_name="ensemble_{}".format(i), iter_num = i, total_timesteps=timesteps_dict['ppo']) #100_000
        elif (sharpe_a2c > sharpe_ppo) & (sharpe_a2c > sharpe_ddpg) & (sharpe_a2c >= sharpe_sac) & (sharpe_a2c >= sharpe_td3):
            model_use.append("A2C")
            model_ensemble = model_a2c

            # model_ensemble = self.get_model("a2c",self.train_full_env,policy="MlpPolicy",model_kwargs=A2C_model_kwargs)
            # model_ensemble = self.train_model(model_ensemble, "ensemble", tb_log_name="ensemble_{}".format(i), iter_num = i, total_timesteps=timesteps_dict['a2c']) #100_000
        elif (sharpe_td3 > sharpe_ppo) & (sharpe_td3 > sharpe_ddpg) & (sharpe_td3 >= sharpe_sac) & (sharpe_td3 >= sharpe_a2c):
            model_use.append("TD3")
            model_ensemble = model_td3

            # model_ensemble = self.get_model("td3",self.train_full_env,policy="MlpPolicy",model_kwargs=TD3_model_kwargs)
            # model_ensemble = self.train_model(model_ensemble, "ensemble", tb_log_name="ensemble_{}".format(i), iter_num = i, total_timesteps=timesteps_dict['td3']) #100_000
        elif (sharpe_sac > sharpe_ppo) & (sharpe_sac > sharpe_ddpg) & (sharpe_sac >= sharpe_td3) & (sharpe_sac >= sharpe_a2c):
            model_use.append("SAC")
            model_ensemble = model_sac

            # model_ensemble = self.get_model("sac",self.train_full_env,policy="MlpPolicy",model_kwargs=SAC_model_kwargs)
            # model_ensemble = self.train_model(model_ensemble, "ensemble", tb_log_name="ensemble_{}".format(i), iter_num = i, total_timesteps=timesteps_dict['sac']) #100_000
        else:
            model_use.append("DDPG")
            model_ensemble = model_ddpg

            # model_ensemble = self.get_model("ddpg",self.train_full_env,policy="MlpPolicy",model_kwargs=DDPG_model_kwargs)
            # model_ensemble = self.train_model(model_ensemble, "ensemble", tb_log_name="ensemble_{}".format(i), iter_num = i, total_timesteps=timesteps_dict['ddpg']) #50_000

        ############## Training and Validation ends ##############

        ############## Trading starts ##############
        print(
            "======Trading from: ",
            self.unique_trade_date[i - self.rebalance_window],
            "to ",
            self.unique_trade_date[i],
        )
        # print("Used Model: ", model_ensemble)
        last_state_ensemble = self.DRL_prediction(
            model=model_ensemble,
            name="ensemble",
            last_state=last_state_ensemble,
            iter_num=i,
            turbulence_threshold=turbulence_threshold,
            initial=initial,
        )
        ############## Trading ends ##############

    end = time.time()
    print("Ensemble Strategy took: ", (end - start) / 60, " minutes")

    df_summary = pd.DataFrame(
        [
            iteration_list,
            validation_start_date_list,
            validation_end_date_list,
            model_use,
            a2c_sharpe_list,
            ppo_sharpe_list,
            ddpg_sharpe_list,
            sac_sharpe_list,
            td3_sharpe_list,
        ]
    ).T
    df_summary.columns = [
        "Iter",
        "Val Start",
        "Val End",
        "Model Used",
        "A2C Sharpe",
        "PPO Sharpe",
        "DDPG Sharpe",
        "SAC Sharpe",
        "TD3 Sharpe",
    ]

    return df_summary
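For reference, a sketch of how the five-model ensemble above might be invoked. All hyperparameter values are placeholders rather than tuned or recommended settings, and processed, the date variables, and env_kwargs are assumed to have been prepared earlier in the notebook:

# Illustrative invocation of the 5-model ensemble defined above (placeholder values).
A2C_model_kwargs = {"n_steps": 5, "ent_coef": 0.005, "learning_rate": 0.0007}
PPO_model_kwargs = {"ent_coef": 0.01, "n_steps": 2048, "learning_rate": 0.00025, "batch_size": 128}
DDPG_model_kwargs = {"buffer_size": 10_000, "learning_rate": 0.0005, "batch_size": 64}
SAC_model_kwargs = {"buffer_size": 10_000, "learning_rate": 0.0001, "batch_size": 64}
TD3_model_kwargs = {"buffer_size": 10_000, "learning_rate": 0.0001, "batch_size": 100}
timesteps_dict = {"a2c": 10_000, "ppo": 10_000, "ddpg": 10_000, "sac": 10_000, "td3": 10_000}

ensemble_agent = DRLEnsembleAgent(
    df=processed,                      # preprocessed price/indicator DataFrame
    train_period=(train_start, train_end),
    val_test_period=(val_test_start, val_test_end),
    rebalance_window=rebalance_window,
    validation_window=validation_window,
    **env_kwargs,
)
df_summary = ensemble_agent.run_ensemble_strategy(
    A2C_model_kwargs, PPO_model_kwargs, DDPG_model_kwargs,
    SAC_model_kwargs, TD3_model_kwargs, timesteps_dict,
)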


docker-image

Nice work! Is it possible to create a Docker image?

Cannot copy sequence with size... even with correct observation space dim.

I'm using the correct observation space dim, calculated by:
Shape = [Current Balance]+[prices 1-30]+[owned shares 1-30] +[macd 1-30]+ [rsi 1-30] + [cci 1-30] + [adx 1-30]

But as soon as the program starts, I get ValueError: cannot copy sequence with size 17 to array axis with dimension 19. When I change it to 17, the program runs, but when the DDPG validation starts I get the error again (this time with the correct dimension, as calculated above): ValueError: cannot copy sequence with size 19 to array axis with dimension 17.

It seems my dimension is not constant across iterations. What could be the possible reasons for this?
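One way to sanity-check this is to recompute the expected state dimension from the formula quoted above and compare it per data split; a mismatch usually means the number of tickers (or indicators) differs between the training, validation and trade DataFrames. A small hedged sketch; train, validation, trade, tech_indicator_list and the "tic" column name are assumptions about your setup:

def expected_state_dim(stock_dim: int, n_indicators: int) -> int:
    # [balance] + [prices per stock] + [owned shares per stock] + [each indicator per stock]
    return 1 + 2 * stock_dim + n_indicators * stock_dim

for name, df in {"train": train, "validation": validation, "trade": trade}.items():
    n_tickers = df["tic"].nunique()   # assumed ticker column name
    print(name,
          "tickers:", n_tickers,
          "expected state dim:", expected_state_dim(n_tickers, len(tech_indicator_list)))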
