Comments (3)
Hello,
it is best to start from a working example.
That being said, there might be a bug too.
Tagging @kronion and @vwxyzjn, as they have actually worked with it.
from stable-baselines3-contrib.
It might be a bug, but it's hard to say from the description. Could you share the code to reproduce? And could you show an example of how the mask is being split weirdly? My initial impression is that the (128, 360) shape is intended because each row corresponds to an env in the vecenv.
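To illustrate why a (128, 360) mask would be expected, here is a minimal NumPy-only sketch. It assumes 128 sub-environments and 360 discrete actions (the numbers from the thread); `action_masks_for_env` is a hypothetical stand-in for each env's mask function, not the library's actual code. Stacking one 1D mask per env yields one row per env:

```python
import numpy as np

N_ENVS = 128     # number of sub-environments in the vecenv (from the thread)
N_ACTIONS = 360  # size of the discrete action space (from the thread)


def action_masks_for_env(env_index: int) -> np.ndarray:
    """Hypothetical per-env mask of shape (360,): True marks a valid action."""
    mask = np.ones(N_ACTIONS, dtype=bool)
    # Arbitrary rule for illustration: each env forbids exactly one action.
    mask[env_index % N_ACTIONS] = False
    return mask


# Stack the per-env masks row by row: row i belongs to env i.
batched_masks = np.stack([action_masks_for_env(i) for i in range(N_ENVS)])
print(batched_masks.shape)
```

Under these assumptions each individual env still returns a mask of shape (360,); the leading dimension of 128 only appears once the masks of all envs are collected.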
BUT: the shape of the mask is not (360,) or (1,360) but instead it is (128, 360)
This actually looks good to me: we need to retrieve one mask per env.
Does it produce an error?
If so, please provide a minimal example that reproduces the issue, along with the traceback.
(FYI, I think we expect a 1D mask from the env even for multi-discrete action spaces (see #80 (comment)); the algorithm reshapes it afterward.)
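As a rough sketch of what a 1D mask for a multi-discrete action space can look like: the sizes in `nvec` and the mask values below are made up for illustration, and the split at the end only mimics the kind of reshaping mentioned above rather than reproducing the library's internals. The env concatenates one boolean mask per action dimension into a single flat array:

```python
import numpy as np

# Hypothetical multi-discrete action space: 4 choices, then 3 choices.
nvec = np.array([4, 3])


def flat_action_mask() -> np.ndarray:
    """Hypothetical env-side mask: one flat 1D array of length sum(nvec)."""
    mask_dim0 = np.array([True, True, False, True])  # first action dimension
    mask_dim1 = np.array([False, True, True])        # second action dimension
    return np.concatenate([mask_dim0, mask_dim1])


flat = flat_action_mask()
print(flat.shape)

# Recover one mask per action dimension by splitting at the cumulative sizes.
per_dimension = np.split(flat, np.cumsum(nvec)[:-1])
for size, mask in zip(nvec, per_dimension):
    print(size, mask)
```

With a vectorized env, one such flat mask per env would then be stacked, giving a 2D array with one row per env.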