pythonlessons / rl-bitcoin-trading-bot
Trying to create Reinforcement Learning powered Bitcoin trading bot
License: MIT License
Hello,
First, thank you so much for your source code.
But there seems to be a memory leak during training:
training stops at around the 2000th episode.
My PC has 32 GB of RAM.
Below is the last part of the output:
...
episode: 2074 [000020] worker: 0 net worth: 1035202.66 average: 1177863.37 orders: 38
episode: 2075 [000020] worker: 5 net worth: 1458406.87 average: 1163223.70 orders: 10
episode: 2076 [000020] worker: 5 net worth: 467591.76 average: 1156877.23 orders: 1
unable to alloc 3840000 bytes
Hi,
There is a big mistake right here:
https://github.com/pythonlessons/RL-Bitcoin-trading-bot/blob/main/RL-Bitcoin-trading-bot_7/RL-Bitcoin-trading-bot_7.py#L309
Basically, you set the reward as the difference in value between the trades, which is fine.
The value of a trade is the amount of the thing multiplied by the price of the thing. Fine.
However, in this particular line you have made a mistake (by copy-pasting your code, I believe; it happens).
You set the amount of the buy trade to the amount of the previous sell trade, which bypasses the fees.
It is only one character at line 309, yet it dramatically changes your results and the way the model learns.
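To make the one-character difference concrete, here is a minimal sketch with made-up numbers. The dictionary keys follow the trade records used in get_reward; the two function names and the values are hypothetical. The sketch only shows how far the two formulas can diverge. It does not settle which one is the right reward.

```python
def reward_as_written(prev_sell, buy):
    # Both terms use the amount of the previous sell trade.
    return (prev_sell["total"] * prev_sell["current_price"]
            - prev_sell["total"] * buy["current_price"])


def reward_with_own_amount(prev_sell, buy):
    # The buy term uses the amount recorded for the buy trade itself.
    return (prev_sell["total"] * prev_sell["current_price"]
            - buy["total"] * buy["current_price"])


# Hypothetical trades: sell 1.0 coin at 100, then buy 1.25 coins at 80.
prev_sell = {"type": "sell", "total": 1.0, "current_price": 100.0}
buy = {"type": "buy", "total": 1.25, "current_price": 80.0}

as_written = reward_as_written(prev_sell, buy)
with_own_amount = reward_with_own_amount(prev_sell, buy)
print(as_written, with_own_amount)
```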
Still, I would like to thank you for providing your lessons and code online. Sincerely.
I am doing other personal projects using PyTorch, and you are a great inspiration.
It is a great experience for me to try to translate your code and other YouTubers' work into PyTorch.
You have no idea how much faster I learned thanks to you.
I really like your blog posts and mathematical explanations.
Keep it up.
Hello!
I was reviewing your code base and considered using it as part of a demo for a class I teach. The initial run didn't seem to be learning much. I went into the function AddIndicators and added 2 new indicators:
# Add a magic indicator that will tell you tomorrow's return
df['magic'] = df['Close'].pct_change().shift(-1)
df['magic8'] = df['Close'].pct_change(8).shift(-8)
So, the idea here is that I will tell the bot what the return will be in 1 hour and in 8 hours. With this information, a human trader could make a huge return. I've done this same test on 3 other code bases, and only 1 could actually learn this.
After implementing this change and re-running the bot, for thousands of episodes, it does not seem to have learned much. The average return and episodic return aren't zooming up like I would expect.
episode: 26340 worker: 21 net worth: 943.79 average: 1046.29 orders: 2
episode: 26341 worker: 14 net worth: 846.84 average: 1044.38 orders: 6
episode: 26342 worker: 26 net worth: 1019.90 average: 1045.11 orders: 34
episode: 26343 worker: 16 net worth: 1661.54 average: 1051.10 orders: 92
episode: 26344 worker: 20 net worth: 1020.38 average: 1051.17 orders: 49
episode: 26345 worker: 24 net worth: 989.14 average: 1052.00 orders: 3
episode: 26346 worker: 19 net worth: 990.24 average: 1052.65 orders: 4
This is very similar to the first few episodes, except generally the number of orders has declined.
This might be due to the convolution layer 'blurring out' or averaging away the ability of the bot to notice that one of its features is very helpful.
Expected behavior: Return should get much higher when the bot is provided with perfect information from the future.
Actual behavior: Doesn't seem to change anything.
Thank you!
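One way to put a number on "much higher" is a hand-written baseline that uses the same look-ahead feature. The sketch below is an assumption-laden illustration, not code from the repository: next_step_returns mirrors pct_change().shift(-1) on a plain list, and oracle_net_worth, the fee value and the price series are hypothetical.

```python
def next_step_returns(closes):
    # Return of the following step, aligned with the current row.
    # The last row has no future value, so it gets None.
    returns = [closes[i + 1] / closes[i] - 1 for i in range(len(closes) - 1)]
    return returns + [None]


def oracle_net_worth(closes, start_balance=1000.0, fee=0.001):
    # All-in rule: buy when the next return covers a round trip of fees,
    # sell when the next return is negative.
    balance, held = start_balance, 0.0
    for price, nxt in zip(closes, next_step_returns(closes)):
        if nxt is None:
            break
        if held == 0.0 and nxt > 2 * fee:
            held = balance * (1 - fee) / price
            balance = 0.0
        elif held > 0.0 and nxt < 0:
            balance = held * price * (1 - fee)
            held = 0.0
    return balance + held * closes[-1]


closes = [100.0, 110.0, 99.0, 108.9]  # made-up prices
oracle = oracle_net_worth(closes)
buy_and_hold = 1000.0 * closes[-1] / closes[0]
print(oracle, buy_and_hold)
```

If a rule like this clearly beats buy-and-hold in the environment while the agent does not, the feature is probably reaching the environment intact and the problem is more likely on the learning side.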
Hi, I want to ask about this:
Why do you use np.random.choice(self.action_space, p=prediction) and not np.argmax()?
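For context, policy-gradient agents are usually trained with actions sampled from the predicted distribution, so that lower-probability actions still get tried. This is the usual rationale, not a statement from the author. The sketch below uses only the standard library; the probabilities and function names are hypothetical.

```python
import random
from collections import Counter

action_space = [0, 1, 2]
prediction = [0.2, 0.5, 0.3]  # hypothetical policy output


def sample_action(prediction, rng):
    # Stochastic choice: each action is drawn in proportion to its probability.
    return rng.choices(action_space, weights=prediction, k=1)[0]


def greedy_action(prediction):
    # Deterministic choice: always the most probable action (argmax).
    return max(action_space, key=lambda a: prediction[a])


rng = random.Random(0)
counts = Counter(sample_action(prediction, rng) for _ in range(10_000))
print(counts, greedy_action(prediction))
```

Sampling visits all three actions, while the greedy rule returns the same action for as long as the prediction stays the same.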
Hi!
I read your articles on pylessons, thanks a lot! I have a suggestion, since I feel you are missing two very important parts of the "trading environment": the bid-ask spread and exchange fees. Especially the first one will MASSIVELY impact how your algorithm performs in real life. As far as I can tell from your code, you currently determine the "current price" by just selecting the Open price of the current time window.
I don't know how familiar you are with crypto trading or trading in general (I am a beginner at this myself, so don't take everything here to be correct), but when you want to buy or sell something, you have to find someone who is willing to buy or sell at the price you suggest. In trading, this is usually done by your broker (i.e. the exchange in crypto) via the order book. Orders (bids and asks) with a target price for buying or selling are placed there, and if there is a matching price, a trade is made. However, the difference between the current stock/coin price and the bids and asks can be very big, and it differs dramatically from exchange to exchange and over time. The difference between the best bid and ask price in the order book is called the bid-ask spread. For bitcoin you can, for example, look at the historical bid-ask spread here: https://data.bitcoinity.org/markets/spread/2y/USD?c=e&st=log&t=l - however, for smaller coins the spread is usually much higher since the market isn't that big. For MOB/USD on the FTX exchange, for example, the bid-ask spread is currently 0.8%. That is HUGE. It means that every time you buy or sell you will lose 0.8% of your money.
You usually have a couple of choices to place an order, but the two most basic choices are:
Now, on top of that, you have trading fees. They are exchange-dependent, and there are two different ones:
There are tricks to get maker fees with limit orders too (the POST option) but they usually don't guarantee an immediate sale either.
The performance of your algorithm will be heavily impacted by the bid-ask spread of the market it tries to trade. The bid/ask fees are more or less static, but the bid-ask spread is a problem. For comparison, I implemented some simple hand-written bots and traded on a sample of DOGE/USD with them - once assuming a 0.1% cost per trade and once a 1% cost. It literally meant the difference between a 457.68% increase with 0.1% assumed trading costs and 0.00003% left of the original 100% with 1% costs, because the bots were trading far too often for the 1% cost per trade to be feasible.
So the bid-ask spread and proper ordering should be part of your model to make it realistic, and something your agents should be able to observe and learn from.
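As a rough illustration of the point above, trading costs can be folded into the price at which an order is filled. Everything in this sketch is an assumption: the function names, the default cost values, and the simplification that spread and fee act as one proportional cost per trade.

```python
def fill_price(quoted_price, side, spread_cost=0.004, fee=0.001):
    # Price effectively paid (buy) or received (sell) per coin once
    # a proportional spread cost and exchange fee are applied.
    cost = spread_cost + fee
    if side == "buy":
        return quoted_price * (1 + cost)
    if side == "sell":
        return quoted_price * (1 - cost)
    raise ValueError("side must be 'buy' or 'sell'")


def capital_left(cost_per_trade, n_trades):
    # Fraction of capital left after n_trades if each trade loses
    # cost_per_trade and prices do not move at all.
    return (1 - cost_per_trade) ** n_trades


cheap = capital_left(0.001, 500)   # 0.1% per trade
costly = capital_left(0.01, 500)   # 1% per trade
print(fill_price(100.0, "buy"), fill_price(100.0, "sell"), cheap, costly)
```

Because the cost compounds with every order, a bot that trades often is far more sensitive to the assumed cost than one that trades rarely.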
C:\Users\G15\Desktop\tradingbotRL\env\lib\site-packages\keras\src\engine\training_v1.py:2359: UserWarning: Model.state_updates
will be removed in a future version. This property should not be used in TensorFlow 2.0, as updates
are applied automatically.
updates=self.state_updates,
2023-11-30 19:38:24.841936: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE SSE2 SSE3 SSE4.1 SSE4.2 AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-11-30 19:38:24.863100: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:388] MLIR V1 optimization pass is not enabled
2023-11-30 19:38:24.898453: W tensorflow/c/c_api.cc:305] Operation '{name:'dense_15/kernel/Assign' id:566 op device:{requested: '', assigned: ''} def:{{{node dense_15/kernel/Assign}} = AssignVariableOp[_has_manual_control_dependencies=true, dtype=DT_FLOAT, validate_shape=false](dense_15/kernel, dense_15/kernel/Initializer/stateless_random_uniform)}}' was changed by setting attribute after it was run by a session. This mutation will have no effect, and will trigger an error in the future. Either don't modify nodes after running them or create a new session.
2023-11-30 19:38:25.816994: W tensorflow/c/c_api.cc:305] Operation '{name:'dense_7/BiasAdd' id:272 op device:{requested: '', assigned: ''} def:{{{node dense_7/BiasAdd}} = BiasAdd[T=DT_FLOAT, _has_manual_control_dependencies=true, data_format="NHWC"](dense_7/MatMul, dense_7/BiasAdd/ReadVariableOp)}}' was changed by setting attribute after it was run by a session. This mutation will have no effect, and will trigger an error in the future. Either don't modify nodes after running them or create a new session.
Traceback (most recent call last):
File "C:\Users\G15\Desktop\tradingbotRL\bot.py", line 313, in <module>
train_agent(train_env, visualize=False, train_episodes=20000, training_batch_size=500)
File "C:\Users\G15\Desktop\tradingbotRL\bot.py", line 273, in train_agent
env.replay(states, actions, rewards, predictions, dones, next_states)
File "C:\Users\G15\Desktop\tradingbotRL\bot.py", line 213, in replay
a_loss = self.Actor.Actor.fit(states, y_true, epochs=self.epochs, verbose=0, shuffle=True)
File "C:\Users\G15\Desktop\tradingbotRL\env\lib\site-packages\keras\src\engine\training_v1.py", line 856, in fit
return func.fit(
File "C:\Users\G15\Desktop\tradingbotRL\env\lib\site-packages\keras\src\engine\training_arrays_v1.py", line 734, in fit
return fit_loop(
File "C:\Users\G15\Desktop\tradingbotRL\env\lib\site-packages\keras\src\engine\training_arrays_v1.py", line 192, in model_iteration
f = _make_execution_function(model, mode)
File "C:\Users\G15\Desktop\tradingbotRL\env\lib\site-packages\keras\src\engine\training_arrays_v1.py", line 620, in _make_execution_function
return model._make_execution_function(mode)
File "C:\Users\G15\Desktop\tradingbotRL\env\lib\site-packages\keras\src\engine\training_v1.py", line 2366, in _make_execution_function
self._make_train_function()
File "C:\Users\G15\Desktop\tradingbotRL\env\lib\site-packages\keras\src\engine\training_v1.py", line 2284, in _make_train_function
updates = self.optimizer.get_updates(
AttributeError: 'Adam' object has no attribute 'get_updates'
How would I prevent the model from updating its weights and biases during testing? I'd like to get one model with better generalization, and I think I can do that if it stops updating every nth batch size while testing.
I have one question. Shouldn't the indicators be for more days? I mean if you get 1h candles and the RSI is calculated with 14 days, shouldn't it be 24 x 14 = 336 margin rows? Because I understand that the library takes the days as rows of the data frame, as if the rows were days.
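Indicator windows are normally counted in rows (candles), not calendar days, so a 14-period indicator on 1h candles spans 14 hours. The sketch below uses a plain moving average as a stand-in for RSI to show how many warm-up rows each window size needs. The helper name and the data are hypothetical.

```python
def simple_moving_average(values, window):
    # Rolling mean over `window` rows; rows without enough history get None.
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            chunk = values[i + 1 - window:i + 1]
            out.append(sum(chunk) / window)
    return out


hourly_closes = [float(i) for i in range(400)]  # made-up 1h closes

fourteen_rows = simple_moving_average(hourly_closes, 14)       # 14 hours
fourteen_days = simple_moving_average(hourly_closes, 14 * 24)  # 336 hours

print(fourteen_rows.count(None), fourteen_days.count(None))
```

Whether 14 rows or 14 days is the better setting is a modelling choice. The larger window simply needs more margin rows before the first valid value.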
# Calculate reward
def get_reward(self):
    if self.episode_orders > 1 and self.episode_orders > self.prev_episode_orders:
        self.prev_episode_orders = self.episode_orders
        if self.trades[-1]['type'] == "buy" and self.trades[-2]['type'] == "sell":
            reward = self.trades[-2]['total']*self.trades[-2]['current_price'] - self.trades[-2]['total']*self.trades[-1]['current_price']
            self.trades[-1]["Reward"] = reward
            return reward
        elif self.trades[-1]['type'] == "sell" and self.trades[-2]['type'] == "buy":
            reward = self.trades[-1]['total']*self.trades[-1]['current_price'] - self.trades[-2]['total']*self.trades[-2]['current_price']
            self.trades[-1]["Reward"] = reward
            return reward
        # return needed
    else:
        return 0
This code sometimes returns None, seemingly without rhyme or reason. I noticed there is no else branch in the nested conditional to return anything, and I believe this is the source of the error, as my comment in the code above suggests.
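A minimal sketch of one possible fix: start from a default reward of 0 and return it on every path, so the function cannot fall through and return None. The reward formulas are kept exactly as quoted above. The RewardDemo class is a hypothetical stand-in for the environment, with only the attributes the method needs.

```python
class RewardDemo:
    # Hypothetical stand-in for the trading environment.
    def __init__(self):
        self.trades = []
        self.episode_orders = 0
        self.prev_episode_orders = 0

    def get_reward(self):
        reward = 0  # default for every path that matches no branch
        if self.episode_orders > 1 and self.episode_orders > self.prev_episode_orders:
            self.prev_episode_orders = self.episode_orders
            last, prev = self.trades[-1], self.trades[-2]
            if last['type'] == "buy" and prev['type'] == "sell":
                reward = prev['total']*prev['current_price'] - prev['total']*last['current_price']
                last["Reward"] = reward
            elif last['type'] == "sell" and prev['type'] == "buy":
                reward = last['total']*last['current_price'] - prev['total']*prev['current_price']
                last["Reward"] = reward
        return reward


# Two consecutive buys: the case that previously fell through to None.
env = RewardDemo()
env.trades = [
    {"type": "buy", "total": 1.0, "current_price": 100.0},
    {"type": "buy", "total": 1.0, "current_price": 105.0},
]
env.episode_orders = 2
same_side_reward = env.get_reward()
print(same_side_reward)
```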
Hey Rokas!
Thanks for your code. I have been playing with it for a while. What I see is that the machine does not seem to learn. It starts with results that are better than the ones it achieves after a while. In the meantime, it alternates, as if chasing a sinusoidal pattern.
Do you have any idea why?
deltas = [r + gamma * (1 - d) * nv - v for r, d, nv, v in zip(rewards, dones, next_values, values)]
TypeError: unsupported operand type(s) for +: 'NoneType' and 'float'
Hi, I have been watching your work here from time to time, and I'm happy to see someone trying to build a tensor trading bot. I'm not familiar with tensor, but I remember from my studies that we built a technical creature that was supposed to walk forward. We gave the NN only this one job, and after many epochs it did move forward ... in a way that made us smile.
So maybe the same can happen here: the bot can teach itself to find the best strategy for learning how to get food (for example BTC).
I don't know whether this is possible ... but it is how I think an AI should work: learning to get food in a live scenario ...
Hello, I was trying to work this out on my end from scratch. I got to the point of training and visualizing the model, but it seems to crash in the middle of the training session without saving the model.
VC:
Python : 3.8.10
tensorflow = 2.3.1
Windows = 11
No IDLE; using script mode from a Windows PowerShell virtual env.
Below is the complete Traceback of the error I received.
2022-03-07 04:17:43.095316: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-03-07 04:17:43.100610: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263]
Traceback (most recent call last):
File "RL-Bitcoin-trading-bot_7.py", line 501, in <module>
train_multiprocessing(CustomEnv, agent, train_df, train_df_nomalized, num_worker = 5, training_batch_size=50, visualize=True, EPISODES=5)
File "D:\Mine\RLCurrent\multiprocessing_env.py", line 95, in train_multiprocessing
a_loss, c_loss = agent.replay(states[worker_id], actions[worker_id], rewards[worker_id], predictions[worker_id], dones[worker_id], next_states[worker_id])
File "RL-Bitcoin-trading-bot_7.py", line 121, in replay
advantages, target = self.get_gaes(rewards, dones, np.squeeze(values), np.squeeze(next_values))
File "RL-Bitcoin-trading-bot_7.py", line 93, in get_gaes
deltas = [r + gamma * (1 - d) * nv - v for r, d, nv, v in zip(rewards, dones, next_values, values)]
File "RL-Bitcoin-trading-bot_7.py", line 93, in <listcomp>
deltas = [r + gamma * (1 - d) * nv - v for r, d, nv, v in zip(rewards, dones, next_values, values)]
TypeError: unsupported operand type(s) for +: 'NoneType' and 'float'
Any sort of help is highly appreciated. If needed I'll post code snippets as well for more clarity.
Thanks.
I run the default script on Windows 10 with:
test_multiprocessing(CustomEnv, CustomAgent, test_df, test_df_nomalized, num_worker = 16, visualize=True, (...)
in main.py.
When it reaches this line in utils.py:
img = img.reshape(self.fig.canvas.get_width_height()[::-1] + (3,))
I get this error:
Traceback (most recent call last):
File "C:\Users\Tomek\anaconda3\lib\multiprocessing\process.py", line 315, in _bootstrap
self.run()
File "c:\Users\Tomek\Documents\binance_bot\RL-Bitcoin-trading-bot-main_my_version\multiprocessing_env.py", line 35, in run
self.env.render(self.visualize)
File "c:\Users\Tomek\Documents\binance_bot\RL-Bitcoin-trading-bot-main_my_version\RL-Bitcoin-trading-bot_7.py", line 324, in render
img = self.visualization.render(self.df.loc[self.current_step], self.net_worth, self.trades)
File "c:\Users\Tomek\Documents\binance_bot\RL-Bitcoin-trading-bot-main_my_version\utils.py", line 231, in render
img = img.reshape(self.fig.canvas.get_width_height()[::-1] + (3,))
ValueError: cannot reshape array of size 15360000 into shape (800,1600,3)
The prints are as follows:
print("2", img, "len", len(img))
print("3", self.fig.canvas.get_width_height())
print("4", self.fig.canvas.get_width_height()[::-1])
2 [255 255 255 ... 255 255 255] len 15360000
3 (1600, 800)
4 (800, 1600)
What should I do? figsize is:
# figsize attribute allows us to specify the width and height of a figure in unit inches
fig = plt.figure(figsize=(16,8))
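The numbers in the printout are consistent with a canvas rendered at twice the logical size: 15360000 = 3200 x 1600 x 3, while get_width_height() reports 1600 x 800. This can happen on displays with scaling enabled, although that is an assumption about this machine. A hedged sketch of inferring the shape from the buffer length instead of trusting the logical size (infer_rgb_shape is a hypothetical helper):

```python
import math


def infer_rgb_shape(buffer_len, logical_width, logical_height, channels=3):
    # Return (height, width, channels) matching buffer_len, assuming the
    # canvas was rendered at an integer multiple of its logical size.
    base = logical_width * logical_height * channels
    if buffer_len % base != 0:
        raise ValueError("buffer length is not a multiple of the logical size")
    ratio = buffer_len // base
    scale = math.isqrt(ratio)
    if scale * scale != ratio:
        raise ValueError("buffer length does not match a uniform scale factor")
    return (logical_height * scale, logical_width * scale, channels)


# Values taken from the printout above.
shape = infer_rgb_shape(15360000, 1600, 800)
print(shape)
```

The resulting tuple could then be passed to img.reshape(...) in place of get_width_height()[::-1] + (3,).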
Hello,
I played around with your code. Fantastic job really... Thanks for sharing.
The trained agent can beat not only the historical data it was trained on but also data it never saw.
I ran some tests on bull, bear and sideways markets. The models it learned can bring a profit of 10-20% in one week (I am looking at 5-minute candlesticks and testing on a 1-week period, around 2000 points). After I got satisfying results, I decided to write some code so that I can try this in real life with a small amount of money.
But I am a little unsure about the approach...
What I am thinking now is the following:
As I said I am a little unsure if this is the correct approach.
Any thoughts?
P.S. I cannot by any means claim to be a coder. I can read, understand and alter code, but the things I try are probably far from efficient :)
Anyways... Hope you guys have some thoughts so that we can take this fantastic code one step further...
Not sure if it's a graphical error or not, but the red and green buy/sell markers end up being placed on the same candlestick for me. I've been through the code and it doesn't seem like that should be possible.
I am not sure why you did the random prediction this way: action = np.random.choice(self.action_space, p=prediction). Also, you pick the random outcome as the chosen action during both testing and training. I can understand why during training, but why during testing as well?
Here's the full traceback I am getting when normalizing the data; I made it print column_list.
['Unnamed: 0', 'Date', 'Open', 'Close', 'High', 'Low', 'Volume', 'sma7', 'sma25', 'sma99', 'bb_bbm', 'bb_bbh', 'bb_bbl', 'psar', 'MACD', 'RSI']
Traceback (most recent call last):
File "D:\RL-Bitcoin-trading-bot_7\tazz\lib\site-packages\pandas\core\ops\array_ops.py", line 143, in na_arithmetic_op
result = expressions.evaluate(op, left, right)
File "D:\RL-Bitcoin-trading-bot_7\tazz\lib\site-packages\pandas\core\computation\expressions.py", line 233, in evaluate
return _evaluate(op, op_str, a, b) # type: ignore
File "D:\RL-Bitcoin-trading-bot_7\tazz\lib\site-packages\pandas\core\computation\expressions.py", line 68, in _evaluate_standard
return op(a, b)
TypeError: unsupported operand type(s) for -: 'str' and 'float'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "RL-Bitcoin-trading-bot_7.py", line 563, in <module>
df_nomalized = Normalizing(df[99:])[1:].dropna()
File "D:\RL-Bitcoin-trading-bot_7\utils.py", line 290, in Normalizing
df[column] = df[column] - df[column].shift(1)
File "D:\RL-Bitcoin-trading-bot_7\tazz\lib\site-packages\pandas\core\ops\common.py", line 65, in new_method
return method(self, other)
File "D:\RL-Bitcoin-trading-bot_7\tazz\lib\site-packages\pandas\core\ops\__init__.py", line 343, in wrapper
result = arithmetic_op(lvalues, rvalues, op)
File "D:\RL-Bitcoin-trading-bot_7\tazz\lib\site-packages\pandas\core\ops\array_ops.py", line 190, in arithmetic_op
res_values = na_arithmetic_op(lvalues, rvalues, op)
File "D:\RL-Bitcoin-trading-bot_7\tazz\lib\site-packages\pandas\core\ops\array_ops.py", line 150, in na_arithmetic_op
result = masked_arith_op(left, right, op)
File "D:\RL-Bitcoin-trading-bot_7\tazz\lib\site-packages\pandas\core\ops\array_ops.py", line 92, in masked_arith_op
result[mask] = op(xrav[mask], yrav[mask])
TypeError: unsupported operand type(s) for -: 'str' and 'str'
Any sort of help would be appreciated.
Thanks.
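The 'str' operands in the traceback suggest that a text column (for example 'Date') reached the subtraction in Normalizing. One hedged way around this is to difference only the numeric columns. The helper below is a sketch using pandas select_dtypes; the function name and sample frame are hypothetical, and it is not the repository's Normalizing function.

```python
import pandas as pd


def difference_numeric_columns(df):
    # Subtract the previous row from each numeric column and
    # leave non-numeric columns (such as date strings) untouched.
    out = df.copy()
    numeric_cols = out.select_dtypes(include="number").columns
    out[numeric_cols] = out[numeric_cols].diff()
    return out


sample = pd.DataFrame({
    "Date": ["2021-01-01", "2021-01-02", "2021-01-03"],
    "Close": [1.0, 3.0, 6.0],
})
result = difference_numeric_columns(sample)
print(result)
```

An 'Unnamed: 0' column usually comes from a CSV saved together with its index. If it is numeric it would be differenced too, so dropping it beforehand may be preferable.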
I noticed that you are resetting the env at the beginning of each episode!
By doing this you never move forward in your dataset!
Check the train_agent function.
That's why you can train your model for 50000 episodes and nothing goes wrong! (Your whole dataset is smaller than 50000; it is 23450.)
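Whether a reset at every episode prevents progress through the data depends on where reset places the starting step. The sketch below shows one common design, assumed here for illustration rather than taken from the repository: each reset draws a random window, so many episodes together cover the whole dataset. The function name and the sizes are hypothetical.

```python
import random


def pick_episode_window(total_rows, lookback, episode_len, rng=random):
    # Draw a random start so that the lookback rows before it and the
    # whole episode after it both fit inside the dataset.
    start = rng.randint(lookback, total_rows - episode_len)
    return start, start + episode_len


rng = random.Random(0)
windows = [pick_episode_window(23450, 50, 500, rng) for _ in range(1000)]
print(windows[:3])
```

If reset always starts at the same row instead, every episode replays the same stretch of data, which matches the concern raised above.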
Hi Rokas,
Thank you for the lesson and fantastic work on this tutorial. I have a quick question about why you set the current price equal to the 'open' price rather than the Close. The agent is able to see the close price for any time step, but is still able to execute at the open price. Doesn't this provide information that a real trader would not have? In other words, isn't the agent able to see that, for example, the close is higher than the open, and therefore should buy? I think this introduces some form of lookahead bias. Do you know what your results look like if you set the current price to the close?
Thanks
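A small sketch of one way to avoid the look-ahead described above: the agent observes only candles that have already closed, and the order is filled at the open of the following candle. The function name and the candle values are hypothetical, and this is not how the repository currently works.

```python
def observe_and_fill(candles, step):
    # The agent sees candles up to and including `step`, all of them closed.
    observed = candles[:step + 1]
    # The order is executed at the open of the next candle.
    fill_price = candles[step + 1]["Open"]
    return observed, fill_price


candles = [
    {"Open": 100.0, "Close": 104.0},
    {"Open": 104.5, "Close": 103.0},
    {"Open": 102.5, "Close": 106.0},
]
observed, fill_price = observe_and_fill(candles, step=1)
print(len(observed), fill_price)
```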
Hi,
I just finished your tutorial, and it's really interesting,
I'm just wondering why you are using np.random.choice()
to select your action from the prediction, is it not better to get the maximum value instead of using random?
predictions_list = agent.Actor.actor_predict(np.reshape(state, [num_worker]+[_ for _ in state[0].shape]))
actions_list = [np.random.choice(agent.action_space, p=i) for i in predictions_list]
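A possible middle ground is to keep sampling for training and allow a greedy choice for evaluation. The sketch below assumes predictions_list is a batch of probability vectors, as in the snippet above. choose_actions and its greedy flag are hypothetical names, not part of the repository.

```python
import numpy as np


def choose_actions(predictions_list, action_space, greedy=False, rng=None):
    # Pick one action per worker from its probability vector.
    if greedy:
        # Evaluation: take the most probable action (argmax).
        return [int(action_space[int(np.argmax(p))]) for p in predictions_list]
    if rng is None:
        rng = np.random.default_rng()
    # Training: sample in proportion to the predicted probabilities.
    return [int(rng.choice(action_space, p=p)) for p in predictions_list]


action_space = np.array([0, 1, 2])
predictions_list = [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]]  # made-up outputs

greedy_actions = choose_actions(predictions_list, action_space, greedy=True)
sampled_actions = choose_actions(
    predictions_list, action_space, rng=np.random.default_rng(0)
)
print(greedy_actions, sampled_actions)
```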
Hi, thx for lesson
...
thx!
You are basically keeping the fees for yourself, but they should go to the exchange provider.
The line self.balance -= self.crypto_bought * current_price should be
self.balance = 0
as the comment "buy with 100% of current balance" says: the balance should become zero after the purchase. In your code you subtract from the balance an amount already reduced by the fee, so afterwards the balance still contains the fee, which should instead go to the exchange provider:
RL-Bitcoin-trading-bot/RL-Bitcoin-trading-bot_7/RL-Bitcoin-trading-bot_7.py
Lines 265 to 268 in a01c8c8
The line self.crypto_held -= self.crypto_sold should be
self.crypto_held = 0
as you are selling all your coins here. Instead, you subtract the amount of coins reduced by the fee, so the "fee" amount of coins is left in your wallet when it should go to the exchange provider.
RL-Bitcoin-trading-bot/RL-Bitcoin-trading-bot_7/RL-Bitcoin-trading-bot_7.py
Lines 274 to 278 in a01c8c8
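The accounting proposed above can be sketched as two small functions. This is an illustration under assumptions (all-in orders, one proportional fee). The names buy_all and sell_all are hypothetical and do not appear in the repository.

```python
def buy_all(balance, price, fee):
    # Spend the whole balance; the fee reduces the coins received,
    # and nothing stays behind in the balance.
    crypto_bought = balance / price * (1 - fee)
    return 0.0, crypto_bought  # (new balance, coins received)


def sell_all(crypto_held, price, fee):
    # Sell every coin; the fee reduces the proceeds,
    # and no coins stay behind in the wallet.
    proceeds = crypto_held * price * (1 - fee)
    return proceeds, 0.0  # (balance received, coins left)


balance, coins = buy_all(1000.0, 100.0, 0.001)
proceeds, coins_left = sell_all(coins, 100.0, 0.001)
print(balance, coins, proceeds, coins_left)
```

With an unchanged price, the round trip ends with less than the starting balance, and the difference is exactly what went to the exchange.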
Hello,
I am a bit confused about how your TradingGraph.Render function deals with the deque render_data.
Your code is the following:
self.render_data.append([Date, Open, High, Low, Close])
# Clear the frame rendered last step
self.ax1.clear()
candlestick_ohlc(self.ax1, **self.render_data**, width=0.8/24, colorup='green', colordown='red', alpha=0.8)
For this to work on my end, I have to index the deque like so:
self.render_data.append([Date, Open, High, Low, Close])
# Clear the frame rendered last step
self.ax1.clear()
candlestick_ohlc(self.ax1, **self.render_data[0]**, width=0.8/24, colorup='green', colordown='red', alpha=0.8)
If I don't do this and use your method I get the following error:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
or this, if I pass my data in the same way using df.loc[current_step]:
ValueError: not enough values to unpack (expected 5, got 0)
Any suggestions?
Is there any way to speed up training and get the program to use more GPU?