kobanium / tamago

52.0 4.0 11.0 1.91 MB

Computer go engine using Monte-Carlo Tree Search written in Python3.

License: Apache License 2.0

Python 99.91% Shell 0.09%

baduk go weiqi mcts monte-carlo-tree-search deep-learning go-text-protocol alphago alphago-zero alphagozero

tamago's People

cglemon muzudho fyanuck tkzwhr soumyak4 bleu48 me11060 koshikawa kaorahi yuito-it tantakn

tamago's Issues

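Newline characters are not handled correctly when running on a remote server

When TamaGo on a Linux machine is launched directly over SSH from Windows, the client does not work correctly because of the difference in newline codes.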
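One way to make a GTP reader tolerant of CRLF input is to apply the input preprocessing described in the GTP version 2 specification (drop control characters other than HT and LF, strip comments, convert HT to a space, discard blank lines). The following is a minimal sketch of that preprocessing; the function name `preprocess_gtp_line` is hypothetical and is not taken from TamaGo, and whether this is the actual cause of the reported problem is an assumption.

```python
from typing import Optional

def preprocess_gtp_line(raw: str) -> Optional[str]:
    """Clean one raw input line; return None if nothing is left to execute."""
    # Drop control characters (including CR) except HT and LF.
    cleaned = "".join(
        ch for ch in raw
        if ch in "\t\n" or (ord(ch) >= 32 and ord(ch) != 127)
    )
    # Remove a trailing comment introduced by '#'.
    cleaned = cleaned.split("#", 1)[0]
    # Convert HT to SPACE, then trim surrounding whitespace and the newline.
    cleaned = cleaned.replace("\t", " ").strip()
    # Empty or whitespace-only lines are discarded.
    return cleaned or None
```

With this, `"genmove b\r\n"` sent from a Windows client and `"genmove b\n"` sent from a Linux client reduce to the same command string.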

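Visualizing MCTS

I am experimenting with visualizing MCTS by slightly modifying TamaGo. If I tidy up the current quick-and-dirty implementation somewhat, would there be any chance of it being merged?

It seems good for education and demos, but please feel free to reject it if it does not fit the aims of the project. I had never seen the "real thing" as opposed to a conceptual diagram, so I personally find it interesting.

Animation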

https://github.com/kaorahi/TamaGo/tree/mcts_step

mcts.mp4

Color = win rate (relative to the value at the same player's previous turn)
Font size = visits (relative to the parent)

(Based on kaorahi/lizgoban#100.)

Tree view

https://github.com/kaorahi/TamaGo/tree/treeviz

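100 visits

Node size = visits
Node shape = side to move (□ aims for a high winrate (blue), ○ aims for a low winrate (red))
Sibling order = left to right in order of visits (following the leftmost nodes gives the principal variation)
Color = winrate (average of the NN winrates of the node itself and its descendants)
Border color = NN winrate (raw neural network output without search)
Edge thickness = policy

1000 visits

10000 visits (only around the principal variation)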

A minor mistake in the network structure.

In nn/network/res_block.py, the forward pass is

def forward(self, input_plane: torch.Tensor) -> torch.Tensor:
    hidden_1 = self.relu(self.bn1(self.conv1(input_plane)))
    hidden_2 = self.relu(self.bn2(self.conv2(hidden_1)))
    return input_plane + hidden_2

The correct forward pass for the block should be

def forward(self, input_plane: torch.Tensor) -> torch.Tensor:
    hidden_1 = self.relu(self.bn1(self.conv1(input_plane)))
    hidden_2 = self.bn2(self.conv2(hidden_1))
    return self.relu(input_plane + hidden_2)

Command line option for mini-batch size

The program crashes when a value greater than 1 is specified for the --batch-size option.

May be a bug

In nn/feature.py
def generate_rl_target_data(board: GoBoard, improved_policy_data: str, sym: int=0) -> np.ndarray:
    split_data = improved_policy_data.split(" ")[1:]
    target_data = [1e-18] * len(board.board)
    for datum in split_data[1:]:
        pos, target = datum.split(":")
        coord = board.coordinate.convert_from_gtp_format(pos)
        target_data[coord] = float(target)

Maybe the second [1:] should be removed. As written, the leading field (the number of child nodes) appears to be stripped twice, so the second slice also drops the first real entry.

1. In `pucb.py`, the value should not be zero when there are no visits.

exploration = np.divide(value_sum, children_visits, \
  out=np.zeros_like(value_sum), where=(children_visits != 0))

Should the value be 0.5 (the draw value) instead?

exploration = np.divide(value_sum, children_visits, \
  out=np.zeros_like(value_sum) + 0.5, where=(children_visits != 0))
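The effect of the `out` and `where` arguments in the snippets above can be checked in isolation. The sketch below is a standalone illustration, not TamaGo code; the function name `mean_child_value` and the `default` parameter are hypothetical. Entries where `where` is false are left at whatever `out` already holds, which is why the initial contents of `out` decide the value of unvisited children.

```python
import numpy as np

def mean_child_value(value_sum: np.ndarray, children_visits: np.ndarray,
                     default: float = 0.5) -> np.ndarray:
    """Mean value per child; unvisited children keep `default`."""
    # Pre-fill the output so that skipped (unvisited) entries hold `default`.
    out = np.full_like(value_sum, default, dtype=np.float64)
    return np.divide(value_sum, children_visits, out=out,
                     where=(children_visits != 0))
```

Calling it with `default=0.0` reproduces the current behavior (unvisited children are treated as certain losses), while `default=0.5` treats them as draws.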

2. In `tree.py`, update the nodes via `reverse_path`, not `path`.

    def process_mini_batch(self, board: GoBoard): # pylint: disable=R0914
    ......
            if path:
                value = value_dist[0] + value_dist[1] * 0.5

                reverse_path = list(reversed(path))
                leaf = reverse_path[0]

                self.node[leaf[0]].set_leaf_value(leaf[1], value)

                for index, child_index in path:
                    self.node[index].update_child_value(child_index, value)
                    self.node[index].update_node_value(value)
                    value = 1.0 - value

Should the loop use reverse_path instead of path?

    def process_mini_batch(self, board: GoBoard): # pylint: disable=R0914
    ......
            if path:
                value = value_dist[0] + value_dist[1] * 0.5

                reverse_path = list(reversed(path))
                leaf = reverse_path[0]

                self.node[leaf[0]].set_leaf_value(leaf[1], value)

                for index, child_index in reverse_path:
                    self.node[index].update_child_value(child_index, value)
                    self.node[index].update_node_value(value)
                    value = 1.0 - value
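To see why the iteration order matters, here is a minimal standalone sketch of the backup step. It is not TamaGo code: `backup_values` is a hypothetical helper that only records which value each (node, child) pair would receive, assuming `path` is ordered from root to leaf as in the excerpt above.

```python
from typing import Dict, List, Tuple

def backup_values(path: List[Tuple[int, int]],
                  leaf_value: float) -> Dict[Tuple[int, int], float]:
    """Assign values from the leaf back to the root, flipping the side each ply."""
    updates = {}
    value = leaf_value
    # Start at the leaf edge and walk toward the root.
    for node_index, child_index in reversed(path):
        updates[(node_index, child_index)] = value
        # The parent sees the position from the opponent's point of view.
        value = 1.0 - value
    return updates
```

For a path of two edges, `[(0, 2), (5, 1)]`, with a leaf value of 0.8, the leaf edge `(5, 1)` receives 0.8 and the root edge `(0, 2)` receives 0.2. Iterating over `path` in forward order would swap these two values. In this sketch the two orders give the same values only when the path has an odd number of edges.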

Running genmove in a position with few legal moves crashes with "Cannot save move record."

I cannot use TamaGo with GoGui on Mac

I cannot use TamaGo with GoGui on Mac.
I tried to register the command python3 /Users/user/TamaGo-main/main.py --model model/rl-model.bin, but it failed with the message "The Go program does not respond to any command. ..." (translated from the Japanese message).

Is it better to use mixed value approximation?

In the paper (Appendix D), DeepMind used the mixed value approximation instead of the simple one. Your implementation appears to use the simple one. In my experience, the simple one works on 9x9 but fails on 19x19. So might the mixed value approximation be the better choice?

    def calculate_completed_q_value(self) -> np.array:

        ~~~~~~~~~~~~~~

        sum_prob = np.sum(policy)
        v_pi = np.sum(policy * q_value)

        return np.where(self.children_visits[:self.num_children] > 0, q_value, v_pi / sum_prob)
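For comparison, the mixed value approximation can be sketched as follows. This is a standalone illustration under stated assumptions, not TamaGo code: the function names, the `raw_value` argument (the value network output for the parent node), and the array layout are hypothetical. It follows my reading of the formula in Appendix D, where the policy-weighted Q of the visited children is blended with the raw network value in proportion to the total visit count.

```python
import numpy as np

def mixed_value(raw_value: float, policy: np.ndarray,
                q_value: np.ndarray, visits: np.ndarray) -> float:
    """Blend the raw network value with the policy-weighted Q of visited children."""
    visited = visits > 0
    total_visits = float(visits.sum())
    if total_visits == 0:
        # Nothing has been searched yet, so only the network value is available.
        return float(raw_value)
    # Policy-weighted mean of Q over visited children only.
    weighted_q = np.sum(policy[visited] * q_value[visited]) / np.sum(policy[visited])
    return float((raw_value + total_visits * weighted_q) / (1.0 + total_visits))

def completed_q(raw_value: float, policy: np.ndarray,
                q_value: np.ndarray, visits: np.ndarray) -> np.ndarray:
    """Use the searched Q where available and the mixed value elsewhere."""
    v_mix = mixed_value(raw_value, policy, q_value, visits)
    return np.where(visits > 0, q_value, v_mix)
```

The difference from the excerpt above is that unvisited children are filled with a value anchored to the network's own estimate, rather than with a policy-weighted average over all children.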

time_left is not accepted for White only

Incorrect response to invalid GTP commands

Expected

? unknown command

Current

= ? unknown command

Already fixed in #80.
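In GTP version 2, a successful response starts with `=` and a failure response starts with `?`, each optionally followed by the command id, and every response ends with two newlines. A minimal sketch of a formatter that keeps the two cases apart follows; the function name `format_gtp_response` is hypothetical and is not TamaGo's actual API.

```python
from typing import Optional

def format_gtp_response(success: bool, message: str = "",
                        command_id: Optional[int] = None) -> str:
    """Build a GTP response string terminated by an empty line."""
    prefix = "=" if success else "?"
    # The id, if the command carried one, follows the prefix directly.
    head = prefix if command_id is None else f"{prefix}{command_id}"
    body = f" {message}" if message else ""
    return f"{head}{body}\n\n"
```

With this helper, an unknown command is reported as `format_gtp_response(False, "unknown command")`, which yields the expected form rather than `= ? unknown command`.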

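Animating the MCTS process

See #91 (a quick-and-dirty prototype exists).

This prototype is lz-analyze with the following tweaks.

Each time the search advances, output the searched sequence while falsely presenting it as the "principal variation"
Output intermediate results with suitable sleeps in between, so that it is animated as if the moves were played one at a time

The advantage is that no GUI-side support is needed (the "always show the PV on a sub-board" feature of Lizzie, LizzieYzy, and LizGoban works as is). The drawback is that, since this is a hack that fools the GUI, minor glitches have to be accepted (in LizGoban the visits graph gets garbled).

The alternative is to properly introduce a dedicated custom command. The advantage is that work that really belongs to the GUI, such as "animating one move at a time", can be left to the GUI. The drawback is that the GUI-side implementation takes a fair amount of effort, so in practice it would probably end up being LizGoban-only.

Either way, I am somewhat unsure whether it is worth cleaning up. In actual use it would probably amount to playing it for a few seconds as a demo when explaining MCTS, or watching it a few times myself and thinking "huh"...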

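Reading an SGF string

See #91 for background.

I would like tamago-read_sgf <SGF string>, a variant of GTP's load_sgf <SGF file name>.

When analyzing an arbitrary position for debugging and the like, it would be convenient to edit the game record in a GUI → copy the SGF to the clipboard → paste it into a terminal and run the following.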

SGF='(;SZ[9]KM[7];B[fe];W[de];B[ec])'
echo "tamago-read_sgf $SGF\n..." | python3 main.py ...

At present, one has to either create an SGF file each time or prepare a sequence of GTP commands such as "play b ...", which is tedious.
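For reference, the "sequence of GTP commands" workaround can be automated with a small converter. The sketch below is not the implementation referenced in this issue; `sgf_to_gtp_commands` is a hypothetical helper, and it assumes a simple SGF string containing only a main line of B/W moves (no variations, setup stones, or escaped brackets).

```python
import re
from typing import List

# GTP column letters skip "I".
GTP_COLUMNS = "ABCDEFGHJKLMNOPQRST"

def sgf_to_gtp_commands(sgf: str) -> List[str]:
    """Convert a simple main-line SGF string into GTP setup and play commands."""
    size_match = re.search(r"SZ\[(\d+)\]", sgf)
    size = int(size_match.group(1)) if size_match else 19
    commands = [f"boardsize {size}", "clear_board"]
    komi_match = re.search(r"KM\[([-\d.]+)\]", sgf)
    if komi_match:
        commands.append(f"komi {komi_match.group(1)}")
    for color, coord in re.findall(r";\s*([BW])\[([a-s]{2}|)\]", sgf):
        if coord == "":
            vertex = "pass"  # an empty value denotes a pass
        else:
            column = ord(coord[0]) - ord("a")
            # SGF counts rows from the top, GTP from the bottom.
            row = size - (ord(coord[1]) - ord("a"))
            vertex = f"{GTP_COLUMNS[column]}{row}"
        commands.append(f"play {color.lower()} {vertex}")
    return commands
```

Joining the result with newlines gives text that can be piped to the engine's standard input, much like the echo example above.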

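An example implementation is kaorahi/TamaGo@841fadd. I plan to add to the README and send a PR.

By the way, on closer inspection, the standard GTP command for loading SGF is loadsgf, not load_sgf. For compatibility, it might be cleaner to support both loadsgf and load_sgf as aliases and to name the new command tamago-readsgf.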

http://www.lysator.liu.se/~gunnar/gtp/gtp2-spec-draft2/gtp2-spec.html#SECTION00073500000000000000

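Resuming search

When using Lizzie, if I pause the search with the space key and then press the space key again, the search starts over from 0 visits. How about making it behave like Leela Zero and KataGo, that is, "if the board has not changed, continue the search from where it left off"?

I am used to this behavior in KataGo, so I tend to press the space key casually and often regret it.

(Prototype: depends on #106)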
https://github.com/kaorahi/TamaGo/tree/continue_search

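To be greedier, there are also things like reusing the subtree after advancing one move and caching neural network outputs, but I doubt they are essential enough to justify the added code complexity.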

Fix unit notation (sec -> s)

Question about C_SCALE value.

According to the paper, the best C_SCALE value on a 9x9 board is 0.1. Why did you use 1? Did you tune it through your own experiments?
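For context, the sketch below shows where such a scale factor enters, assuming C_SCALE corresponds to c_scale in the paper's monotone transformation of Q-values, sigma(q) = (c_visit + max_b N(b)) * c_scale * q. The function name `sigma` and the default arguments are illustrative assumptions and are not taken from TamaGo.

```python
def sigma(q_value: float, max_child_visits: int,
          c_visit: float = 50.0, c_scale: float = 1.0) -> float:
    """Scale a Q-value so that it can be added to logits (assumed form)."""
    # More visits make the Q-values count for more relative to the prior logits.
    return (c_visit + max_child_visits) * c_scale * q_value
```

Under this form, reducing c_scale from 1 to 0.1 shrinks the Q-value term tenfold, so the prior logits (and the Gumbel noise at the root) keep more influence on move selection.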

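When no time_left command arrives, the time spent is not deducted from the remaining time.

Specifying the exact number of visits

See #91 for background.

When I want to draw something like "an image of the tree at 1000 visits", I would like a way to keep the search from being cut short even after the best move has been settled.

An example implementation is kaorahi/TamaGo@c454642. I plan to add to the README and send a PR.

Error in handling the --window-size option

get_final_status.py is missing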

Would you consider adding support for a built-in GUI in the future?

TamaGo is based on Ubuntu. It is not friendly to Windows users who want to use TamaGo with a GUI. In my experience from an earlier project, the best option for Windows users is a built-in GUI. Would you consider adding support for one? If the answer is yes, I would be glad to work on it, and I would like to know which framework you prefer: tkinter, pygame, or something else. Thanks!

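Change the search tree from an array to a transposition table

Support CGOS player mode.

Take the current thinking time, not just the number of searches, into account in time management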

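I have a question about the implementation of search_sequential_halving

In the implementation of search_sequential_halving,
select_move_by_sequential_halving_for_root applies np.argmax at the root position and is called once per candidate, but
in Algorithm 2 (Sequential Halving with Gumbel) of "POLICY IMPROVEMENT BY PLANNING WITH GUMBEL",
argtop takes the m candidates all at once.
In effect, I think that what should be sampling without replacement has become sampling with replacement.
In addition, I think this loses the computational-cost benefit of the Gumbel-Top-k trick and the diversity in the low-node-count regime.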
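The Gumbel-Top-k trick mentioned above can be sketched in a few lines. This is a standalone illustration, not TamaGo code; the function name `gumbel_top_k` is hypothetical. One Gumbel noise vector is drawn once, and the m largest entries of (noise + logits) are taken together, which yields m distinct actions.

```python
import numpy as np

def gumbel_top_k(logits: np.ndarray, m: int, rng: np.random.Generator):
    """Sample m distinct actions without replacement via perturbed logits."""
    # Draw one Gumbel(0, 1) sample per action, once.
    gumbel = rng.gumbel(size=logits.shape)
    # Indices of the m largest perturbed logits, best first.
    top = np.argsort(-(gumbel + logits))[:m]
    return top, gumbel
```

Because the selection is a single top-m over one noise vector, no index can appear twice. The same noise vector is returned so that it can be reused when the candidates are later compared, as I understand the algorithm in the paper to do.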

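Division by zero occurs in cgos-genmove_analyze when little time remains

It appears to occur in the get_analyze method.

Implement the undo command

Already implemented in #80. Whether a more efficient implementation is possible still needs to be examined.

LizGoban support

The processing side has already been handled in #80.
Instructions for registering the GUI will be added to README.md.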

Unrecognized superko

http://www.yss-aya.com/cgos/viewer.cgi?9x9/SGF/2023/06/06/1440151.sgf

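The get_num_liberties method miscounts liberties

Implementation mistake.

GoGui analyze commands crash when using a GPU

The following source code needs to be replaced.

Location to fix: forward_with_softmax
Method to call after the fix: inference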

I'm curious if noise takes effect in search_best_move

I observed that the policy is set with noise in "expand_node", but "update_policy", which is used during inference (in "process_mini_batch"), overwrites the policy directly with the result of the network calculation, so there is no randomness at all except in self-play games.
