Comments (14)
I think ideally we want to follow a path similar to Leela Zero: find an architecture that shows good potential (which is arguably applying human heuristics, lol), write a robust implementation, and start a distributed effort. It's really quite straightforward, but at the moment we're still trying to fix bugs, validate the models, and wait for more contributors. Tomorrow I'll start looking into a C++ implementation of this project; you're welcome to join in if you want to expedite the process.
from chess-alpha-zero.
Hello, @simsim314.
- @benediamond, @Akababa and I tried to enlarge the input planes, but so far we have failed to make the model converge with the 177 (or so) planes DeepMind uses. You are welcome to try this yourself; we are still working on it.
- We could certainly try different model structures. Maybe this two-layer CNN is even the reason we cannot get convergence with a richer input encoding.
- Yes. This is a general issue DeepMind resolved by using 1,000 TPU cards. In our project a distributed version is ready to use, but we have not yet started to make use of this feature.
- You can easily develop the supervised learning that way. If you get good results I will merge your pull request ;).
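As a concrete starting point for the input-plane discussion above, here is a dependency-free sketch of the 12 basic one-hot piece planes, built from the board field of a FEN string. The real DeepMind input adds many more feature planes (repetitions, castling rights, move history, etc.), and the function name here is mine, not the project's:

```python
# Illustrative sketch: 12 one-hot piece planes (white PNBRQK, black pnbrqk)
# from the board field of a FEN string. No external dependencies.
PIECES = "PNBRQKpnbrqk"

def piece_planes(fen_board):
    """Return a 12x8x8 nested list of 0/1 one-hot piece planes."""
    planes = [[[0] * 8 for _ in range(8)] for _ in PIECES]
    for rank, row in enumerate(fen_board.split("/")):
        file = 0
        for ch in row:
            if ch.isdigit():
                file += int(ch)  # a digit encodes a run of empty squares
            else:
                planes[PIECES.index(ch)][rank][file] = 1
                file += 1
    return planes

# Starting position as an example:
planes = piece_planes("rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR")
```

The extra scalar planes (side to move, move count, ...) would then be appended as constant-valued 8x8 planes on top of these.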
Regards.
Hi @simsim314, thanks for sharing your thoughts.
This project is still under active construction, just yesterday I wrote a much faster MCTS, and I'll optimize and test it more today. As @Zeta36 said, you're always welcome to contribute and ask questions here if you need any help.
@simsim314, in a fork, I have implemented:
- The DeepMind-style 119 planes of input (see here).
- The DeepMind-style NN architecture, with 19 residual layers (see here).
- ...see Akababa's comment...!
- Only the n most recent play-data files are loaded into memory during optimization, on a "rolling" basis, to ease memory consumption (see here).
Unfortunately, with this setup I have failed to achieve convergence, even during supervised learning. Please feel free to help investigate why.
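The "rolling" loading described in the list above can be sketched as a small helper that keeps only the most recent files; the function name is hypothetical, and the real fork presumably reads pickled play-data records rather than arbitrary files:

```python
# Hedged sketch of rolling data loading: return only the n most recently
# modified files so the optimizer never holds the full history in memory.
import os

def newest_files(directory, n):
    """Return paths of the n most recently modified files, newest first."""
    paths = [os.path.join(directory, f) for f in os.listdir(directory)]
    paths.sort(key=os.path.getmtime, reverse=True)
    return paths[:n]
```

During optimization one would call this each epoch and reload only the files that changed.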
@benediamond, could you try a quick check?
Why don't you add (as @simsim314 suggested) some more CNN layers to the model, like this:
```python
x = Conv2D(filters=mc.cnn_filter_num, kernel_size=mc.cnn_filter_size, padding="same",
           data_format="channels_first", kernel_regularizer=l2(mc.l2_reg))(x)
x = BatchNormalization(axis=1)(x)
x = Activation("relu")(x)
```
before applying the residual blocks?
Maybe the lack of convergence in your NN with so many input planes is due to the current limitation of two Conv2D layers in our model configuration.
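For reference, a DeepMind-style residual block in the same Keras style as the snippet above might look as follows. The 18-plane input and the filter counts are illustrative assumptions here, not the project's actual configuration:

```python
# Sketch of one AlphaZero-style residual block (assumed shapes, not the
# project's exact config): conv -> BN -> relu -> conv -> BN -> skip -> relu.
from tensorflow.keras.layers import Input, Conv2D, BatchNormalization, Activation, Add
from tensorflow.keras.models import Model
from tensorflow.keras.regularizers import l2

def residual_block(x, filters=256, reg=1e-4):
    inp = x
    x = Conv2D(filters, 3, padding="same", data_format="channels_first",
               kernel_regularizer=l2(reg))(x)
    x = BatchNormalization(axis=1)(x)
    x = Activation("relu")(x)
    x = Conv2D(filters, 3, padding="same", data_format="channels_first",
               kernel_regularizer=l2(reg))(x)
    x = BatchNormalization(axis=1)(x)
    x = Add()([x, inp])          # skip connection: output is f(x) + x
    return Activation("relu")(x)

inputs = Input(shape=(18, 8, 8))  # 18 input planes, channels first (assumed)
x = Conv2D(256, 3, padding="same", data_format="channels_first")(inputs)
x = BatchNormalization(axis=1)(x)
x = Activation("relu")(x)
x = residual_block(x)
model = Model(inputs, x)
```

The skip connection is the key difference from simply stacking more Conv2D layers: the block can fall back to the identity, which is what makes 19 of them trainable at all.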
@Zeta36 I'm not sure I understand. As it stands, following DeepMind, we already have a residual tower consisting of
- A convolutional layer of 256 filters
- 19 residual blocks, each of which contains two convolutional layers of 256 filters each.
There are then further convolutions in the policy and value heads.
Let me know if you still think something should be changed.
You are right, @benediamond: we already have two convolutional layers in each residual block. My mistake. I don't really know why neither your model nor mine is able to converge when we introduce new input planes :(.
I could not even get a 14-plane input to converge with your one-hot-encoded pieces (??). I wonder if we could at least converge a model with a few more scalar planes (without one-hot encoding), like current player color, move number, etc., while leaving the piece planes as ord() integer values.
I don't know why, but I have the feeling the problem comes from the one-hot planes.
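The two encodings under discussion can be contrasted per square. Piece letters follow FEN conventions, and both helper names are illustrative only:

```python
# Two candidate per-square encodings: a single ord() integer versus a
# 12-dimensional one-hot indicator vector. Illustrative sketch only.
def scalar_code(piece):
    """One integer per square (0 for an empty square)."""
    return ord(piece) if piece else 0

def one_hot_code(piece):
    """12-dim indicator vector per square (all zeros for an empty square)."""
    pieces = "PNBRQKpnbrqk"
    vec = [0] * 12
    if piece:
        vec[pieces.index(piece)] = 1
    return vec
```

The scalar form packs the board into one plane but imposes an arbitrary ordering on piece codes, which the convolution has to untangle; the one-hot form is 12x larger but linearly separable per piece type.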
@benediamond Have you tried playing with your model yourself to see if it's qualitatively "getting better"? I'm worried about us falling into the trap of mixing validation and training data.
Also, I'm not sure the concept of the loss converging to 0 is a sound one, because (a) you have regularization, and (b) the loss can't be lower than the Shannon entropy of the training data, if that makes sense.
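The entropy lower bound can be made concrete with a tiny numeric example: if the same position occurs in the training data with two different played moves, no predictor can push the expected cross-entropy below the entropy of that label distribution:

```python
# Numeric illustration: expected cross-entropy >= Shannon entropy of the
# labels, so the loss cannot reach 0 whenever labels are ambiguous.
import math

def entropy(probs):
    """Shannon entropy in nats of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# A position seen twice in the data, with two different played moves:
label_dist = [0.5, 0.5]
best_possible_loss = entropy(label_dist)  # = ln 2 ~ 0.693 nats, not 0
```

On top of this floor, the L2 regularization term adds a strictly positive contribution, so a total loss near 0 would actually be a red flag.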
OK, I see everything's there, except that it doesn't work well. From a practical standpoint, I think we should first reach a point where AlphaZero is not giving away its queen or other pieces for free.
How about using an existing engine to train the network on its blunders in specific positions instead of on whole games? This would reduce the training noise significantly. Once it starts to play reasonably well, we can use self-play to improve.
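The blunder-mining idea can be sketched as a filter over engine evaluations. The tuple format below is a made-up stand-in for real engine output (e.g. centipawn scores from a UCI engine via python-chess):

```python
# Hedged sketch of blunder mining: keep positions where the played move
# scores much worse than the engine's best move. Evaluations are assumed
# to be centipawns from the mover's point of view; the data format is
# hypothetical, not the project's.
def find_blunders(moves_with_evals, threshold=150):
    """moves_with_evals: iterable of (position, best_cp, played_cp)."""
    return [pos for pos, best_cp, played_cp in moves_with_evals
            if best_cp - played_cp >= threshold]
```

The resulting positions would then serve as high-signal training examples, with the engine's best move as the policy target.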
@simsim314 see the comments on this thread. I'm working on a new version that addresses the "policy flipping" issue; I think Akababa might already have one.
@simsim314 Good ideas. I'm currently adjudicating games based on material, but as you say it's probably faster to train the naked network on Stockfish outputs or something similar. If you're doing any self-play or evaluation, though, I have a multithreaded MCTS implementation which is much faster. (I will rewrite it in C++ when I get the chance, or maybe someone can help me with this.)
I started a wiki page on supervised methods so we can organize our thoughts.
I think we should be realistic about our access to good hardware. Google used 5,000 TPUs to generate self-play; we can safely assume we will not get anything like that soon. So we should focus on using existing games, and even then make the best of them, because running on 40 million games with a single GPU would currently take years. We probably need to analyze blunders in positions and teach our network to avoid making them.
Another point: the MCTS AlphaZero uses evaluates 80K positions per second on 4 TPUs. This is equivalent to about 720 TFLOPs, or roughly 100-200 strong GPUs. On my GPU it runs ~800 positions per second, and the question is whether such a low simulation count can produce something that plays well, with the aim of playing as well as some engines (above 2500).
The alternative would be to run each line not to the end, but to a point where the evaluation is certain, thereby perhaps adding certainty about the score to our policy.
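The throughput gap above can be checked with quick arithmetic, using the figures from the comment plus a rough per-GPU TFLOPs assumption (the 5 TFLOPs figure is my guess for a strong consumer GPU of the time, not a measured value):

```python
# Back-of-the-envelope check of the hardware gap described above.
tpu_positions_per_s = 80_000   # AlphaZero search throughput on 4 TPUs
gpu_positions_per_s = 800      # measured locally on one consumer GPU
gap = tpu_positions_per_s / gpu_positions_per_s      # 100x fewer simulations

tflops_total = 720             # claimed 4-TPU compute budget
tflops_per_gpu = 5             # assumed per strong consumer GPU
gpus_equivalent = tflops_total / tflops_per_gpu      # within the 100-200 range
```

Either way you slice it, a single GPU sees roughly two orders of magnitude fewer simulations per move, which motivates the truncated-line idea in the comment above.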
@Zeta36: Regarding speed, I expect the bottleneck is the gameplay, which is written in Python. I am happy to help you with a minimal C implementation of that part; let me know if you're interested. I think your code is beautiful, and rewriting all of it in C++ would be a bad idea, but a C portion for just the speed-critical gameplay seems appropriate.
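Before porting anything to C, it may be worth confirming with the standard-library profiler that gameplay really is the hot path. The function being profiled below is just a stand-in for one self-play game:

```python
# Sketch: profile a callable with cProfile and return its result plus a
# short report of the top functions by cumulative time.
import cProfile
import io
import pstats

def profile_call(fn, *args):
    """Run fn(*args) under cProfile; return (result, report text)."""
    pr = cProfile.Profile()
    pr.enable()
    result = fn(*args)
    pr.disable()
    buf = io.StringIO()
    pstats.Stats(pr, stream=buf).sort_stats("cumulative").print_stats(5)
    return result, buf.getvalue()
```

Running this around a single self-play game would show whether move generation, MCTS bookkeeping, or the Keras forward pass dominates, and hence which part deserves the C treatment.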
Yes!! Of course your collaboration is welcome. Please check out the code, and as soon as you get stable results, open a pull request :).