
Let's make it practical · chess-alpha-zero (OPEN, 14 comments)

zeta36 commented on May 22, 2024

Let's make it practical

from chess-alpha-zero.

Comments (14)

Akababa commented on May 22, 2024

I think ideally we want to follow a path similar to Leela Zero: find an architecture that shows good potential (which is arguably applying human heuristics lol), write a robust implementation, and start a distributed effort. It's really quite straightforward, but at the moment we're still trying to fix bugs, validate the models, and wait for more contributors... tomorrow I'll start looking into a C++ implementation of this project; you're welcome to join in if you want to expedite the process.


Zeta36 commented on May 22, 2024

Hello, @simsim314 .

  1. @benediamond, @Akababa and I tried to enlarge the input planes, but so far we have failed to make the model converge with the 119 (or so) planes DeepMind uses. You are welcome to try this; we are still working on it.
  2. We could certainly try different model structures. Maybe this two-layer CNN is even the reason we cannot get convergence with a richer input encoding.
  3. Yes. This is a general issue that DeepMind resolved by using thousands of TPU cards. A distributed version is ready to be used in this project, but we have not yet started to make use of that feature.
  4. You can easily develop the supervised learning that way. If you get good results I will merge your pull request ;).
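Point 4 boils down to a plain data-preparation step: turning each recorded game into (state, policy target, value target) triples. A minimal sketch, with hypothetical helper names (`initial_state` and `apply_move` are placeholders, not project code):

```python
def to_training_triples(moves, result, initial_state, apply_move):
    """Turn one recorded game into (state, policy_target, value_target) triples.

    `result` is the game outcome from White's point of view: +1 / 0 / -1.
    The policy target is simply the move that was actually played; the value
    target is the outcome seen from the side to move.
    """
    triples = []
    state, side = initial_state, 1  # side: +1 = White to move, -1 = Black
    for move in moves:
        triples.append((state, move, result * side))
        state = apply_move(state, move)
        side = -side
    return triples
```

Feeding these triples to the optimizer is then an ordinary supervised fit against the policy and value heads.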

Regards.


Akababa commented on May 22, 2024

Hi @simsim314, thanks for sharing your thoughts.

This project is still under active development; just yesterday I wrote a much faster MCTS, and I'll optimize and test it further today. As @Zeta36 said, you're always welcome to contribute and to ask questions here if you need any help.


benediamond commented on May 22, 2024

@simsim314, In a fork, I have implemented:

  1. The DeepMind-style 119 planes of input (see here).
  2. The DeepMind-style NN architecture, with 19 residual layers (see here).
  3. ...see Akababa's comment...!
  4. Only the n most recent play-data files are loaded into memory during optimization, on a "rolling" basis, to ease memory consumption (see here).

Unfortunately, with this setup I have failed to achieve convergence, even during supervised learning. Please feel free to help investigate why.


Zeta36 commented on May 22, 2024

@benediamond, could you try a fast check?

Why don't you add (as @simsim314 suggested) some additional CNN layers like this to the model:

        # (assumes the Keras imports already used in the model code:
        #  from keras.layers import Conv2D, BatchNormalization, Activation
        #  from keras.regularizers import l2)
        x = Conv2D(filters=mc.cnn_filter_num, kernel_size=mc.cnn_filter_size, padding="same",
                   data_format="channels_first", kernel_regularizer=l2(mc.l2_reg))(x)
        x = BatchNormalization(axis=1)(x)  # axis=1 because of channels_first
        x = Activation("relu")(x)

before applying the residual blocks?

Maybe the lack of convergence in your NN with so many input planes is due to the current limit of two Conv2D layers in our model configuration.


benediamond commented on May 22, 2024

@Zeta36 I'm not sure I understand. As it stands, following DeepMind, we already have a residual tower consisting of

  1. A convolutional layer of 256 filters
  2. 19 residual blocks, each of which contains two convolutional layers of 256 filters each.

There are then further convolutions in the policy and value heads.

Let me know if you still think something should be changed.
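For a sense of scale, the tower described above is dominated by the residual blocks. A back-of-the-envelope weight count, assuming 3x3 kernels and the 119 input planes mentioned earlier (biases, batch-norm parameters, and the policy/value heads ignored):

```python
filters, kernel, in_planes, blocks = 256, 3, 119, 19

# Weights of one conv layer = in_channels * out_channels * kernel_height * kernel_width
initial_conv = in_planes * filters * kernel * kernel   # 274,176
per_block = 2 * filters * filters * kernel * kernel    # two 256->256 convs per block
tower_total = initial_conv + blocks * per_block

print(tower_total)  # 22,687,488: ~22.7 million weights in the tower alone
```

That many parameters is one plausible reason convergence is slow on small training sets.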


Zeta36 commented on May 22, 2024

You are right, @benediamond. We already have two convolutional layers in each residual block; my mistake. I don't really know why neither your model nor mine is able to converge when we introduce the new input planes :(.

I could not even get a 14-plane input to converge with your one-hot piece encoding (??). I wonder if we could at least converge a model with a few more linear planes (without one-hot encoding), such as the current player color, the move number, etc., while leaving the piece planes as ord() integer values.

I don't know why, but I have the feeling that the problem comes from the one-hot planes.
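To make the two encodings being compared concrete, here is a minimal self-contained sketch (pure Python, not project code) of the 12 one-hot piece planes versus a single ord() integer plane, built from a FEN board string:

```python
PIECES = "PNBRQKpnbrqk"  # 12 piece types -> 12 one-hot planes

def fen_board(fen):
    """Expand the board field of a FEN string into an 8x8 grid of chars ('.' = empty)."""
    board = []
    for row in fen.split()[0].split("/"):
        r = []
        for ch in row:
            if ch.isdigit():
                r.extend(["."] * int(ch))  # digit = run of empty squares
            else:
                r.append(ch)
        board.append(r)
    return board

def one_hot_planes(board):
    """12 binary 8x8 planes, one per piece type."""
    return [[[1 if board[r][c] == p else 0 for c in range(8)] for r in range(8)]
            for p in PIECES]

def ord_plane(board):
    """Single integer 8x8 plane using ord() values, as in the original scheme."""
    return [[0 if board[r][c] == "." else ord(board[r][c]) for c in range(8)]
            for r in range(8)]

start = fen_board("rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1")
planes = one_hot_planes(start)
```

The one-hot version is sparse (exactly 32 ones across 768 entries for the starting position), while the ord() version packs the same information into one dense plane with arbitrary magnitudes, which affects what the first convolution has to learn.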


Akababa commented on May 22, 2024

@benediamond Have you tried playing with your model yourself to see if it's qualitatively "getting better"? I'm worried about us falling into the trap of mixing validation and training data.

Also, I don't know if the concept of the loss converging to 0 is a sound one, because a) you have regularization, and b) the loss can't be lower than the "Shannon entropy" of the training data, if that makes sense.
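Point b) can be made precise: cross-entropy loss is bounded below by the entropy of the targets, so it cannot reach 0 unless the targets are deterministic. A tiny numeric check:

```python
import math

def cross_entropy(p, q):
    """Cross-entropy H(p, q) in nats; p are the targets, q the predictions."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

targets = [0.7, 0.2, 0.1]                  # a soft policy target
entropy = cross_entropy(targets, targets)  # H(p, p) = Shannon entropy of p, ~0.80 nats
prediction = [0.5, 0.3, 0.2]

# Any prediction other than the targets themselves gives a strictly larger loss.
assert cross_entropy(targets, prediction) > entropy > 0
```

So on noisy training data (the same position labeled with different moves in different games), a nonzero floor on the loss is expected and is not by itself a sign of failure.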


simsim314 commented on May 22, 2024

OK, I see everything's there, except that it doesn't work well. From a practical standpoint, I think we should first reach a point where alpha-zero is not giving away its queen or other pieces for free.

How about using an engine to train alpha-zero on its blunders in specific positions instead of on whole games? This would reduce the training noise significantly. Once it starts to play reasonably well, we can use self-play to improve.
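One hedged sketch of the blunder-mining idea, assuming engine evaluations (in pawns, from White's perspective) are already available for each position of a game; the function and threshold are hypothetical, not existing project code:

```python
def find_blunders(evals, threshold=2.0):
    """Return indices of moves whose evaluation swing exceeds `threshold` pawns.

    evals[i] is the engine score after i moves, so move i leads from evals[i]
    to evals[i + 1]. White moves on even i, so a drop hurts White; a rise
    hurts Black. Flagged positions become extra training examples.
    """
    blunders = []
    for i in range(len(evals) - 1):
        swing = evals[i + 1] - evals[i]
        white_to_move = (i % 2 == 0)
        if (white_to_move and swing < -threshold) or (not white_to_move and swing > threshold):
            blunders.append(i)
    return blunders
```

Training on these positions (with the engine's preferred move as the policy target) concentrates the signal exactly where the network currently fails.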


benediamond commented on May 22, 2024

@simsim314 see the comments on this thread. I'm working on a new version that addresses the "policy flipping" issue; I think Akababa might already have one.


Akababa commented on May 22, 2024

@simsim314 Good ideas. I'm currently adjudicating games based on material, but as you say it's probably faster to train the naked network on Stockfish outputs or something. If you're doing any self-play or evaluation, though, I have a multithreaded MCTS implementation which is much faster. (I will rewrite it in C++ when I get the chance, or maybe someone can help me with this.)
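For anyone joining the MCTS work, the selection rule at its heart (the PUCT formula used by AlphaZero) fits in a few lines. This is a hypothetical sketch, not Akababa's actual implementation; the node representation is invented for illustration:

```python
import math

def puct_select(children, c_puct=1.5):
    """Pick the child move maximizing Q + U, with U = c * P * sqrt(N_parent) / (1 + N_child).

    `children` maps move -> stats dict with prior P, visit count N, total value W.
    High-prior, rarely-visited moves get a large exploration bonus U;
    well-explored moves are judged mostly by their mean value Q = W / N.
    """
    total_n = sum(ch["N"] for ch in children.values())

    def score(ch):
        q = ch["W"] / ch["N"] if ch["N"] > 0 else 0.0
        u = c_puct * ch["P"] * math.sqrt(total_n) / (1 + ch["N"])
        return q + u

    return max(children, key=lambda move: score(children[move]))
```

Each simulation walks the tree with this rule, expands a leaf, evaluates it with the network, and backs the value up along the path.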

I started a wiki page on supervised methods so we can organize our thoughts.


simsim314 commented on May 22, 2024

I think we should be realistic about our access to good hardware. Google used 5,000 TPUs to generate self-play; I think we can safely assume we will not get something like that soon. So we should focus on using existing games, and even there make the best of them, because running over 40 million games on a single GPU would currently take years. So we probably need to analyze blunders in positions and teach our network to avoid making them.

Another point: the MCTS AlphaZero uses evaluates 80K positions per second on 4 TPUs. That is equivalent to 720 TFLOPs, or about 100-200 strong GPUs. On my GPU it runs at ~800 positions per second, and the question is whether it's possible to use such a low simulation count and still get something that plays well, aiming to play as well as some engines (above 2500).

The alternative here would be to run each line not to the end, but to a point where the evaluation is certain, thus perhaps adding certainty of our score to the policy.
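The simulation-count gap above is easy to quantify from the figures quoted:

```python
alphazero_pos_per_s = 80_000  # quoted AlphaZero throughput on 4 TPUs
single_gpu_pos_per_s = 800    # throughput quoted above for one consumer GPU

gap = alphazero_pos_per_s / single_gpu_pos_per_s
print(gap)  # 100.0: roughly one hundred such GPUs to match AlphaZero's search speed
```

This is consistent with the "100-200 strong GPUs" estimate, and it is why a distributed effort (or training shortcuts like the blunder analysis above) matters so much here.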


lucasart commented on May 22, 2024

@Zeta36: Regarding speed, I expect the bottleneck is in the gameplay, which is written in Python. I am happy to help you with a minimal C implementation of that part. Let me know if you're interested. I think your code is beautiful, and rewriting all of it in C++ would be a bad idea, but just a C portion for the speed-critical gameplay seems appropriate.


Zeta36 commented on May 22, 2024

Yes!! Of course your collaboration will be welcome. Please check out the project, and as soon as you get stable results open a pull request :).
