facebookresearch / darkforestgo

DarkForest, the Facebook Go engine.

License: Other

C 64.25% Lua 33.14% Shell 0.58% Objective-C 0.15% C++ 1.88%

darkforestgo's Introduction

DarkForest, the Facebook Go engine

Update [12/11/2017]: DarkForestGo has been incorporated into the ELF platform.

Update: The training code is open source now. See below for detailed instructions.

DarkForest is a Go game engine powered by Deep Learning and developed at Facebook AI Research.

We hope that releasing the source code and pre-trained models is beneficial to the community.

Details of the engine are given in our paper and poster. If you use our engine in future research, please cite our paper:

Better Computer Go Player with Neural Network and Long-term Prediction, ICLR 2016
Yuandong Tian, Yan Zhu

@article{tian2015better,
  title={Better Computer Go Player with Neural Network and Long-term Prediction},
  author={Tian, Yuandong and Zhu, Yan},
  journal={arXiv preprint arXiv:1511.06410},
  year={2015}
}

Although DarkForest is standalone and does not depend on external libraries, some portions of the tactics and pattern code were inspired by the Pachi engine.

Build

Dependencies:

Install torch7.
Install CUDA / CuDNN
Install a few packages

luarocks install class
luarocks install image
luarocks install tds
luarocks install cudnn

This program supports 1 to 4 GPUs.

Then just compile with the following command:

sh ./compile.sh

GCC 4.8+ is required. If your C++ compiler is installed in a non-default location, change the script accordingly. Tested on CentOS 6.5 and on Ubuntu 14.04 and 15.04.

If your default GCC is too new for your CUDA version, install gcc-4.9 as a second compiler and symlink it into the CUDA bin directory:

sudo ln -s /usr/bin/gcc-4.9 /usr/local/cuda/bin/gcc
sudo ln -s /usr/bin/g++-4.9 /usr/local/cuda/bin/g++

During the installation of torch and cudnn, either change the build script or replace the symlink at /usr/bin/cc with:

sudo ln -s /usr/bin/gcc-4.9 /usr/bin/cc

More info at (http://stackoverflow.com/questions/6622454/cuda-incompatible-with-my-gcc-version)

After compilation, the cc symlink can be reverted to the latest compiler version.

If you get an error like:

These bindings are for version 5005 or above ...

download the latest cuDNN from NVIDIA at (https://developer.nvidia.com/rdp/cudnn-download); registration is required.
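The number in that message is cuDNN's packed version number, computed as major * 1000 + minor * 100 + patchlevel, so 5005 corresponds to cuDNN 5.0.5. The following is a minimal sketch for checking which version is installed. It assumes the header defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL directly (newer cuDNN releases may keep them in a separate header) and that the header lives somewhere like /usr/local/cuda/include/cudnn.h. The function name cudnn_version is hypothetical.

```shell
# cudnn_version HEADER: print the packed cuDNN version found in HEADER.
# Hypothetical helper; not part of DarkForest or cuDNN.
cudnn_version() {
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END { print major * 1000 + minor * 100 + patch }
    ' "$1"
}

# Usage (the header path is an assumption; adjust it to your install):
#   cudnn_version /usr/local/cuda/include/cudnn.h
```

If the printed value is below 5005, the installed cuDNN is older than the bindings expect.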

Usage

Step 1: Download the models.

Create a ./models directory and download the trained models into it.

Step 2: First run the GPU server

cd ./local_evaluator
sh cnn_evaluator.sh [num_gpu] [pipe file path]

num_gpu: the number of GPUs (1-8) available on the current machine.
pipe file path: the directory in which the pipe file is created. The default is /data/local/go. If you use a different path, specify the same path when running cnnPlayerMCTSV2.lua.

Example: sh cnn_evaluator.sh 4 /data/local/go
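The server script does not appear to create the pipe directory itself: the "Directory nonexistent" report further down shows cnn_evaluator.sh failing to write its log when /data/local/go is missing. A minimal sketch of a pre-flight check follows. The helper name ensure_pipe_dir is hypothetical, and creating a directory under /data may require elevated permissions on your system.

```shell
# ensure_pipe_dir DIR: create DIR if needed and check that it is writable.
# Hypothetical helper; not part of the DarkForest scripts.
ensure_pipe_dir() {
    # mkdir -p succeeds silently when the directory already exists
    mkdir -p "$1" && [ -d "$1" ] && [ -w "$1" ]
}

# Usage (run before starting the server; use the same path for the client):
#   ensure_pipe_dir /data/local/go && sh cnn_evaluator.sh 4 /data/local/go
```

Any writable directory should do, as long as the server and the client are given the same path.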

Step 3: Run the main program

cd ./cnnPlayerV2
th cnnPlayerMCTSV2.lua [options]

See cnnPlayerV2/cnnPlayerMCTSV2.lua for the full set of options. For a simple first run, replace [num_gpu] with the number of GPUs you have (for example, 4) and use:

th cnnPlayerMCTSV2.lua --num_gpu [num_gpu] --time_limit 10

or (if you want to use a set of plausibly good parameters):

th cnnPlayerMCTSV2.lua --use_formal_params --num_gpu [num_gpu] --time_limit 10

To load an existing game up to move 23:

th cnnPlayerMCTSV2.lua [other_options] --setup_board "/path/to/sgf 23"

When you are in the interactive environment, type

clear_board to clear the board
genmove b to generate a move for Black.
play w Q4 to play a move at Q4 for the given color (here, White).
quit to quit.

A complete game may look like:

clear_board
[MCTS initialization ...]
place_free_handicap 3
genmove b
[MCTS generates moves..e.g., it returns Q16]
play w D4
genmove b
[MCTS generates moves...]
quit

For more commands, please use the list_commands command, check the details of the GTP protocol, or take a look at the source code.
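Because the interactive environment takes one command per line, a session like the one above can in principle be scripted. The sketch below assumes that the player reads its commands from standard input, which the README does not state explicitly. The helper name gtp_session is hypothetical; the commands themselves are the ones listed above.

```shell
# gtp_session: print a short command sequence, one command per line.
# Hypothetical helper; not part of the DarkForest scripts.
gtp_session() {
    printf '%s\n' \
        'clear_board' \
        'genmove b' \
        'play w D4' \
        'genmove b' \
        'quit'
}

# Usage (assumes the GPU server is already running and that stdin is read):
#   cd ./cnnPlayerV2
#   gtp_session | th cnnPlayerMCTSV2.lua --num_gpu 1 --time_limit 10
```

Note that some of the reports further down describe genmove ending a scripted --exec run early, so a scripted session may not behave exactly like an interactive one.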

Training

To train the policy network from scratch, please run ./train.sh. One GPU is needed. Please install torchnet first (e.g., luarocks install torchnet).

Differences with the award-winning versions

The differences between this open source version (A) and the one used on KGS and in competitions (B) are the following:

(A) runs on a single machine and uses a pipe for client/server communication. (B) uses Thrift RPC services to communicate.
(B) uses more computational resources.
The parameters for (B) may have been tuned extensively, but those for (A) were not. We will provide parameter-tuning tips soon.

Troubleshooting

Q: My program hangs on genmove/quit. What happened?
A: Make sure that you run the GPU server under ./local_evaluator, that the server remains active, and that the pipe file path matches between the server and the client.

If you have any questions or find any bugs, please open a GitHub issue by clicking the "Issues" tab and then "New Issue".

Code Overview

The system consists of the following parts.

./cnnPlayerV2 Lua (terminal) interface for Go.

CNNPlayerV3.lua Run the pure-DCNN player
cnnPlayerMCTSV2.lua Run the player with DCNN + MCTS

./board Board data structure, board evaluation, and the different playout policies.
./mctsv2 Implementation of Monte Carlo Tree Search
./local_evaluator Simple GPU-based server. Communicates with the search threads via a pipe.
./utils Simple utilities, e.g., read/write sgf files.
./test Test utilities.
./train Training code
./dataset Dataset used for training. Please download them here and save to the ./dataset directory.
./models All pre-trained models. Please download them here and save to the ./models directory.
./sgfs Some exemplar sgf files.

License

Please check the LICENSE file for the license of Facebook DarkForest Go engine.

darkforestgo's Issues

Unclear dependencies in readme

Hi,

I'm trying to follow the instructions to install the dependencies:

Install torch7.
Install luarocks: class, image, tds, cudnn

But none of those luarocks can be found, at least on luarocks.org, AFAICT:
https://luarocks.org/search?q=class
https://luarocks.org/search?q=image
https://luarocks.org/search?q=tds
https://luarocks.org/search?q=cudnn

Please bear in mind I have absolutely zero experience with Lua or Luarocks, I'm a python/ruby coder mostly. I might very well just be ignorant of some basic information every Lua coder knows by heart that has been omitted as "obvious".

Can not setup_board

from Readme.md
th cnnPlayerMCTSV2.lua [other_options] --setup_board "/path/to/sgf 23"

I used it, but it produced an error:

bio1607b@bio1607b-MS-7817:~/Downloads/darkforestGo/cnnPlayerV2$ th cnnPlayerMCTSV2.lua --save_sgf_per_move --num_gpu 1 --pipe_path /home/bio1607b/Downloads/darkforestGo/local_evaluator --setup_board "/home/bio1607b/Downloads/darkforestGo/cnnPlayerV2/18760408.sgf 12"
Pattern file ../models/playout-model.bin loaded!
CNNPlayerV2MCTSParams is NULL, set default parameters.
Pattern file ../models/playout-model.bin loaded!
---- PatternV2 -----
#hash_size: 1048576, NUM_PRIOR: 17, LEN_PRIOR: 49
Verbose: 1, cnt_threshold: 1, alpha: 0.015000, batch_size: 8, temperature: 0.125000, ply_fraction: 0.001000
neighbor: true, nakade: true, resp: true, save_atari: true, kill_other: true, global: false, ko: true, put_group_to_atari: true, eye: true
#Pattern: 754901, collision: 198968663
Sample from topn: -1
-- End Patternv2 ---
New MCTS game, signature: 2016-11-15_17-24-48 ------------ Parameters for Search -----------------
Local Pipe path: /home/bio1607b/Downloads/darkforestGo/local_evaluator
Verbose: 1
PrintSearchTree: false
#GPU: 1
#Use CPU rollout only: false
Komi: 6.5
dynkomi_factor: 0.00
Rule: chinese
Use heuristic time management: 0, max_time_spent: 0.000000, min_time_spent: 0.000000
+++++++++++ Tree #0 ++++++++++++
Verbose: 1
#Threads: 16
#Receivers: 1
Sigma: 0.05, over n: false
Async mode: false
RAVE: false
UCT: PUCT
num_rollout: 1000
num_rollout_per_move: 1000
num_playout_per_rollout: 1
num_rollout_peekable: 20000
num_dcnn_per_move: 1000
rcv_acc_percent_thres: 80
rcv_max_num_move: 20
rcv_min_num_move: 1
expand_n_thres: 0
decision_mixture_ratio: 5.0
Use pondering: false
Time limit: 0
% of threads running playout when expanding node: 0
single_move_return: false
default_policy: PATTERN_V2 [-1, T: 0.125]
+++++++++++ End Tree ++++++++++++
--------- End parameters for Search --------------
/home/bio1607b/torch/install/bin/luajit: ../cnnPlayerV2/cnnPlayerV2Framework.lua:216: attempt to concatenate local 'donnot_flip_vertical' (a boolean value)
stack traceback:
../cnnPlayerV2/cnnPlayerV2Framework.lua:216: in function 'setup_board'
../cnnPlayerV2/cnnPlayerV2Framework.lua:877: in function '__init'
/home/bio1607b/torch/install/share/lua/5.1/class/init.lua:164: in function 'CNNPlayerV2'
cnnPlayerMCTSV2.lua:336: in main chunk
[C]: in function 'dofile'
...607b/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670
Segmentation fault (core dumped)

Training code BUG?

In line 311 in darkforestGo/train/rl_framework/infra/bundle.lua:
self.params[i]:add(-learning_rate, self.gparams[i])

I think this is odd, since adding the learning rate to the gradient makes no sense. Maybe the author meant to use 'addcmul' instead of 'add'?
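For reference, in Torch7 the form tensor:add(value, other) is a scaled in-place add: it multiplies other by the scalar value and adds the result to tensor. Under that reading, with learning rate \eta and gradient g_i for parameter \theta_i, the quoted line is an ordinary gradient-descent step:

```latex
\theta_i \leftarrow \theta_i + (-\eta)\, g_i = \theta_i - \eta\, g_i
```

For example, with \theta_i = 1.0, \eta = 0.1 and g_i = 2.0, the update gives 1.0 - 0.1 \times 2.0 = 0.8.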

The engine does not know the command kgs-game_over, so the game cannot finish and a new game cannot start.

Feb 07, 2017 12:16:42 AM com.gokgs.client.gtp.GtpClient d
FINEST: Got successful response to command "genmove w": = G7
Feb 07, 2017 12:16:42 AM com.gokgs.client.gtp.a a
FINEST: Submitting move g7 to server
timeleft -- color: w, num_seconds: 49, num_moves: 0
Feb 07, 2017 12:16:42 AM com.gokgs.client.gtp.GtpClient d
FINEST: Command sent to engine: time_left w 49 0
Feb 07, 2017 12:16:42 AM com.gokgs.client.gtp.GtpClient d
FINEST: Got successful response to command "time_left w 49 0": =
Feb 07, 2017 12:21:28 AM com.gokgs.client.gtp.a b
WARNING: Opponent has not returned. Leaving game.
Feb 07, 2017 12:21:28 AM com.gokgs.client.gtp.GtpClient d
FINEST: Command sent to engine: kgs-game_over
Feb 07, 2017 12:21:28 AM com.gokgs.client.gtp.GtpClient d
SEVERE: Got malformed response from engine: Warning: Ignoring unknown command - kgs-game_over
? ???. nil
Feb 07, 2017 12:21:28 AM com.gokgs.client.gtp.GtpClient c
FINE: Game ended. Starting another.
Feb 07, 2017 12:21:28 AM com.gokgs.client.gtp.GtpClient a
FINEST: Still an outstanding command, will wait until the system is idle before making a new game.

Problem in run_cmds?

When I try to write a script for run_cmds, I find that g and genmove can only run once, after which the program exits. Other commands don't have this problem. In cnnPlayerV2Framework.lua, line 664, g or genmove returns the "win_rate" (between 0 and 1), and at line 339 win_rate is read as quit, so it always evaluates to true and the program exits. When I delete the "win_rate" in line 664, it seems to work fine.

Can DarkForest be used with GoGui?

I want to use GoGui, but I don't know how.
Please help!

Errors when trying to compile darkforestGo on MacOS 10.11.6 without CUDA but with torch installed.

ld: unknown option: -export-dynamic
clang: error: linker command failed with exit code 1 (use -v to see invocation)
ld: unknown option: -export-dynamic
clang: error: linker command failed with exit code 1 (use -v to see invocation)
ld: unknown option: -export-dynamic
clang: error: linker command failed with exit code 1 (use -v to see invocation)
ld: unknown option: -export-dynamic
clang: error: linker command failed with exit code 1 (use -v to see invocation)
ld: unknown option: -export-dynamic
clang: error: linker command failed with exit code 1 (use -v to see invocation)
ld: unknown option: -export-dynamic
clang: error: linker command failed with exit code 1 (use -v to see invocation)
Create libplayout_multithread.so
clang: error: no such file or directory: 'event_count.o'
Create liblocalexchanger.so
Undefined symbols for architecture x86_64:
"_error", referenced from:
Play in board.o
IsSelfAtari in board.o
CheckLadderUseSearch(Board, unsigned char, int, int) in board.o
EmptyGroup(Board*, unsigned short) in board.o
_find_only_liberty in board.o
_find_two_liberties in board.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
Compile all test codes
clang: warning: treating 'c' input as 'c++' when in C++ mode, this behavior is deprecated
clang: error: no such file or directory: 'event_count.o'
clang: warning: -O4 is equivalent to -O3
Put all .so file into directory so that lua could load
cp: libboard.so: No such file or directory
cp: libownermap.so: No such file or directory
cp: libpattern_v2.so: No such file or directory
cp: libdefault_policy.so: No such file or directory
cp: libcomm.so: No such file or directory
cp: libcommon.so: No such file or directory
cp: libplayout_multithread.so: No such file or directory
cp: libmoggy.so: No such file or directory
cp: liblocalexchanger.so: No such file or directory

What should I do?

Timeline for learn module and distributed evaluator

Is there an expected release date for the learning code base?

Also, will the distributed configuration used in tournaments be shared as well?

Problem in --exec

th cnnPlayerMCTSV2.lua --setup_board /usr/path/game-0.sgf --exec /usr/path/dfcmds
My dfcmds file is:
g
save_sgf /usr/path/game-0.sgf

After darkforest executes 'g', it quits
without running the second command to save the SGF file.
Using 'genmove' instead of 'g' gives the same result (it also quits).
What is causing this?

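Question about how to make darkforest play games automatically

Hello, after setting up darkforestGo I ran into some problems. I would like to run automatic games with default_policy set to v2 and to pachi respectively, that is, without having to type 'g' every time after each clear_board.
Question 1: How do I set the default_policy for White and for Black?
Question 2: How do I make the games proceed automatically and stop after a specified number of games (without typing 'g' every time)?
I would appreciate any pointers. Thanks!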

the program is perfect

high CPU usage of GPU server (local_evaluator)

After running
sh cnn_evaluator.sh 1 /home/zhu/darkforestGo/data/

the CPU usage stays at 200% all the time. Is that normal?

PID	USER	PR	NI	VIRT	RES	SHR	S	%CPU	%MEM	TIME+	COMMAND
3840	zhu	20	0	20.935g	357256	87472	R	199.7	4.4	0:48.94	/home/zhu/torch/install/bin/luajit -e package+

Then I set opt.use_pondering = false in the function load_params_for_formal_game(), and the engine ran OK with the command: th cnnPlayerV2/cnnPlayerMCTSV2.lua --use_formal_params --num_gpu 1 --time_limit 10

However, the CPU usage is still 200% when it is not the engine's turn.

Test with Alpha Go of Deep Mind

Would DarkForest be able to beat DeepMind's AlphaGo?

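Cannot connect to Dropbox, so the models cannot be downloaded

Could some kind person send me a copy? Many thanks.

Professor Tian, how can I build my own dataset? I have many game records that I would like to add.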

System requirement and GTP setup

I tested cnnPlayerMCTSV2.lua on a 750 Ti. It responded normally for up to 6 moves; after that it stopped responding and started freezing the computer.

Also, when using cnnPlayerMCTSV2.lua with GoGui, I think the debug messages and their line endings interfere with the GTP protocol and trigger the timeout in GoGui.

A guide on GTP setup and the expected system requirements would be nice.

Attempt to call field 'hasFastHalfInstructions' (a nil value)

Every step works fine but the final step has a problem like this

bill@Darkstar:~/fbgo/darkforestGo/cnnPlayerV2$ th cnnPlayerMCTSV2.lua
/home/kangqi/workspace/distro/install/bin/luajit: ...qi/workspace/distro/install/share/lua/5.1/trepl/init.lua:384: ...qi/workspace/distro/install/share/lua/5.1/cudnn/init.lua:98: attempt to call field 'hasFastHalfInstructions' (a nil value)
stack traceback:
	[C]: in function 'error'
	...qi/workspace/distro/install/share/lua/5.1/trepl/init.lua:384: in function 'require'
	../utils/utils.lua:523: in function 'require_cutorch'
	cnnPlayerMCTSV2.lua:15: in main chunk
	[C]: in function 'dofile'
	...ace/distro/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
	[C]: at 0x00405ea0

How much memory do I need to train

Hi,

First of all, thanks for your nice work.

I was trying to run your go engine on my server, which has about 120 GiB memory.

It went all right until I tried to train with provided dataset.

The output is as follows:

[root@localhost darkforestGo]# ./train.sh
{
  nstep = 3,
  optim = "supervised",
  loss = "policy",
  progress = false,
  nthread = 4,
  model_name = "model-12-parallel-384-n-output-bn",
  data_augmentation = true,
  actor = "policy",
  nGPU = 1,
  sampling = "replay",
  intermediate_step = 50,
  userank = true,
  alpha = 0.05,
  num_forward_models = 2048,
  batchsize = 256,
  epoch_size_test = 128000,
  feature_type = "extended",
  epoch_size = 128000,
  datasource = "kgs"
}	
fm_init: function: 0x4076e7c8	
fm_gen: function: 0x410f4a58	
fm_postprocess: nil	
rl.Dataset.__init(): forward_model_init is set, run it
rl.Dataset.__init(): forward_model_init is set, run it
rl.Dataset.__init(): forward_model_init is set, run it
| IndexedDataset: loaded ./dataset with 144748 examples
rl.Dataset.__init(): #forward model = 2048, batchsize = 256
| IndexedDataset: loaded ./dataset with 144748 examples
rl.Dataset.__init(): #forward model = 2048, batchsize = 256
rl.Dataset.__init(): forward_model_init is set, run it
| IndexedDataset: loaded ./dataset with 144748 examples
rl.Dataset.__init(): #forward model = 2048, batchsize = 256
| IndexedDataset: loaded ./dataset with 144748 examples
rl.Dataset.__init(): #forward model = 2048, batchsize = 256
rl.Dataset.__init(): forward_model_init is set, run it
| IndexedDataset: loaded ./dataset with 26814 examples
rl.Dataset.__init(): #forward model = 2048, batchsize = 256
rl.Dataset.__init(): forward_model_init is set, run it
| IndexedDataset: loaded ./dataset with 26814 examples
rl.Dataset.__init(): #forward model = 2048, batchsize = 256
rl.Dataset.__init(): forward_model_init is set, run it
| IndexedDataset: loaded ./dataset with 26814 examples
rl.Dataset.__init(): #forward model = 2048, batchsize = 256
rl.Dataset.__init(): forward_model_init is set, run it
| IndexedDataset: loaded ./dataset with 26814 examples
rl.Dataset.__init(): #forward model = 2048, batchsize = 256
THCudaCheck FAIL file=/tmp/luarocks_cutorch-scm-1-4547/cutorch/lib/THC/generic/THCStorage.cu line=66 error=2 : out of memory
/root/torch/install/bin/luajit: /root/torch/install/share/lua/5.1/nn/Container.lua:67: 
In 1 module of nn.Sequential:
In 9 module of nn.Sequential:
/root/torch/install/share/lua/5.1/nn/THNN.lua:110: cuda runtime error (2) : out of memory at /tmp/luarocks_cutorch-scm-1-4547/cutorch/lib/THC/generic/THCStorage.cu:66
stack traceback:
	[C]: in function 'v'
	/root/torch/install/share/lua/5.1/nn/THNN.lua:110: in function 'BatchNormalization_updateOutput'
	/root/torch/install/share/lua/5.1/nn/BatchNormalization.lua:124: in function </root/torch/install/share/lua/5.1/nn/BatchNormalization.lua:113>
	[C]: in function 'xpcall'
	/root/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
	/root/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function </root/torch/install/share/lua/5.1/nn/Sequential.lua:41>
	[C]: in function 'xpcall'
	/root/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
	/root/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
	./train/rl_framework/infra/bundle.lua:161: in function 'forward'
	./train/rl_framework/infra/agent.lua:46: in function 'optimize'
	./train/rl_framework/infra/engine.lua:114: in function 'train'
	./train/rl_framework/infra/framework.lua:304: in function 'run_rl'
	train.lua:155: in main chunk
	[C]: in function 'dofile'
	/root/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
	[C]: at 0x004064f0

WARNING: If you see a stack trace below, it doesn't point to the place where this error occurred. Please use only the one above.
stack traceback:
	[C]: in function 'error'
	/root/torch/install/share/lua/5.1/nn/Container.lua:67: in function 'rethrowErrors'
	/root/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
	./train/rl_framework/infra/bundle.lua:161: in function 'forward'
	./train/rl_framework/infra/agent.lua:46: in function 'optimize'
	./train/rl_framework/infra/engine.lua:114: in function 'train'
	./train/rl_framework/infra/framework.lua:304: in function 'run_rl'
	train.lua:155: in main chunk
	[C]: in function 'dofile'
	/root/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
	[C]: at 0x004064f0

I ran the "free" command before training. It turns out like this:

[root@localhost darkforestGo]# free
              total        used        free      shared  buff/cache   available
Mem:      115383448     1317128   112506528       10744     1559792   113786336
Swap:      67108860           0    67108860

It seems that I'm facing an "out of memory" issue.

May I ask how much memory I need for training?

Or is there anything wrong elsewhere?

Thanks in advance

Completion of error handling

Would you like to add more error handling for return values from functions like the following?

malloc ⇒ wouldbe_ladder
printf ⇒ main

reserved identifier violation

I would like to point out that identifiers like "_BOARD_H_" and "_PACKAGE_H_" do not fit to the expected naming convention of the C language standard.
Would you like to adjust your selection for unique names?

Can darkforest return a genmove result with a single command?

I have a project that uses PHP to control darkforest from my website.
My approach is to have darkforest load the SGF file of the latest game and return the AI's genmove result,
e.g.:
th cnnPlayerMCTSV2.lua --num_gpu 1 --pipe_path /home/bio1607b/darkforestGo/local_evaluator --setup_board "/home/bio1607b/darkforestGo/sgfs/alphago_leesedol_1.sgf 12"
but I don't know how to make darkforest return the move it chose.

I need help!

create a stateless interface to genmove(board_state)

It would be great to have a stateless interface to the engine that can be exposed, e.g., via REST on the web. It probably requires implementing and documenting a call to genmove(board_state) and showing how to call it via a framework like lapis.

When I enter genmove and quit....

When I enter genmove and quit, the program stops.
What is the problem?
Please help.

What's the system requirements of training code?

I tried training the model with the provided dataset, but it runs out of memory on my machine, which has 16 GB of memory.

I can't find machine information in the original paper. Is there any data on how much memory is required in the training phase?

Does CNNPlayerMCTSV2.lua run player only with MCTS, not with DCNN?

Hello,
According to the README (https://github.com/facebookresearch/darkforestGo#code-overview), CNNPlayerMCTSV2.lua runs the player with DCNN + MCTS.
I dug into the code at https://github.com/facebookresearch/darkforestGo/blob/master/cnnPlayerV2/cnnPlayerMCTSV2.lua#L266; that function calls https://github.com/facebookresearch/darkforestGo/blob/master/mctsv2/tree_search.c#L1291 to play, but it does not seem to call the CNN for evaluation. Does that mean it uses only MCTS?
Thanks.

Crash in training

I attempted training (KGS data) with train.sh. I installed the most recent version of torch, and CUDA 8.0.
Training seems to end soon with the error below:

| IndexedDataset: loaded ./dataset with 26814 examples
rl.Dataset.__init(): #forward model = 2048, batchsize = 256
| IndexedDataset: loaded ./dataset with 26814 examples
rl.Dataset.__init(): #forward model = 2048, batchsize = 256
rl.Dataset.__init(): forward_model_init is set, run it
| IndexedDataset: loaded ./dataset with 26814 examples
rl.Dataset.__init(): #forward model = 2048, batchsize = 256
/home/lauri/go/engines/darkf/torch/install/bin/luajit: ./train/rl_framework/infra/bundle.lua:187: invalid arguments: CudaTensor CudaLongTensor 
expected arguments: [*CudaByteTensor*] CudaTensor float | *CudaTensor* CudaTensor float | [*CudaByteTensor*] CudaTensor CudaTensor | *CudaTensor* CudaTensor CudaTensor
stack traceback:
        [C]: in function 'eq'
        ./train/rl_framework/infra/bundle.lua:187: in function 'get_top5'
        ./train/rl_framework/infra/bundle.lua:242: in function 'backward_prepare'
        ./train/rl_framework/infra/agent.lua:47: in function 'optimize'
        ./train/rl_framework/infra/engine.lua:114: in function 'train'
        ./train/rl_framework/infra/framework.lua:304: in function 'run_rl'
        train.lua:155: in main chunk
        [C]: in function 'dofile'
        ...arkf/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
        [C]: at 0x004063a0

Please explain the parameter

I am adjusting parameters such as the surewin threshold,
but I don't understand what they mean.
Please explain the parameters.

crash in training

I attempted training (KGS data) with train.sh. I installed the most recent version of torch, and CUDA 8.0.
Training seems to end soon with the error below:
{
nstep = 3,
optim = "supervised",
loss = "policy",
progress = false,
nthread = 4,
model_name = "model-12-parallel-384-n-output-bn",
data_augmentation = true,
actor = "policy",
nGPU = 1,
sampling = "replay",
intermediate_step = 50,
userank = true,
alpha = 0.05,
num_forward_models = 2048,
batchsize = 256,
epoch_size_test = 128000,
feature_type = "extended",
epoch_size = 128000,
datasource = "kgs"
}
fm_init: function: 0x40af2138
fm_gen: function: 0x41d64210
fm_postprocess: nil
rl.Dataset.__init(): forward_model_init is set, run it
| IndexedDataset: loaded ./dataset with 144748 examples
rl.Dataset.__init(): #forward model = 2048, batchsize = 256
rl.Dataset.__init(): forward_model_init is set, run it
| IndexedDataset: loaded ./dataset with 144748 examples
rl.Dataset.__init(): #forward model = 2048, batchsize = 256
rl.Dataset.__init(): forward_model_init is set, run it
| IndexedDataset: loaded ./dataset with 144748 examples
rl.Dataset.__init(): #forward model = 2048, batchsize = 256
rl.Dataset.__init(): forward_model_init is set, run it
| IndexedDataset: loaded ./dataset with 144748 examples
rl.Dataset.__init(): #forward model = 2048, batchsize = 256
rl.Dataset.__init(): forward_model_init is set, run it
| IndexedDataset: loaded ./dataset with 26814 examples
rl.Dataset.__init(): #forward model = 2048, batchsize = 256
rl.Dataset.__init(): forward_model_init is set, run it
| IndexedDataset: loaded ./dataset with 26814 examples
rl.Dataset.__init(): #forward model = 2048, batchsize = 256
rl.Dataset.__init(): forward_model_init is set, run it
| IndexedDataset: loaded ./dataset with 26814 examples
rl.Dataset.__init(): #forward model = 2048, batchsize = 256
rl.Dataset.__init(): forward_model_init is set, run it
| IndexedDataset: loaded ./dataset with 26814 examples
rl.Dataset.__init(): #forward model = 2048, batchsize = 256
THCudaCheck FAIL file=/tmp/luarocks_cutorch-scm-1-6693/cutorch/lib/THC/generic/THCStorage.cu line=65 error=2 : out of memory
/home/lin/torch/install/bin/luajit: /home/lin/torch/install/share/lua/5.1/nn/Container.lua:67:
In 1 module of nn.Sequential:
In 5 module of nn.Sequential:
/home/lin/torch/install/share/lua/5.1/cudnn/Pointwise.lua:15: cuda runtime error (2) : out of memory at /tmp/luarocks_cutorch-scm-1-6693/cutorch/lib/THC/generic/THCStorage.cu:65
stack traceback:
[C]: in function 'resizeAs'
/home/lin/torch/install/share/lua/5.1/cudnn/Pointwise.lua:15: in function 'createIODescriptors'
/home/lin/torch/install/share/lua/5.1/cudnn/Pointwise.lua:41: in function </home/lin/torch/install/share/lua/5.1/cudnn/Pointwise.lua:40>
[C]: in function 'xpcall'
/home/lin/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
/home/lin/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function </home/lin/torch/install/share/lua/5.1/nn/Sequential.lua:41>
[C]: in function 'xpcall'
/home/lin/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
/home/lin/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
./train/rl_framework/infra/bundle.lua:161: in function 'forward'
./train/rl_framework/infra/agent.lua:46: in function 'optimize'
./train/rl_framework/infra/engine.lua:114: in function 'train'
./train/rl_framework/infra/framework.lua:304: in function 'run_rl'
train.lua:155: in main chunk
[C]: in function 'dofile'
.../lin/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00405d50

WARNING: If you see a stack trace below, it doesn't point to the place where this error occurred. Please use only the one above.
stack traceback:
[C]: in function 'error'
/home/lin/torch/install/share/lua/5.1/nn/Container.lua:67: in function 'rethrowErrors'
/home/lin/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
./train/rl_framework/infra/bundle.lua:161: in function 'forward'
./train/rl_framework/infra/agent.lua:46: in function 'optimize'
./train/rl_framework/infra/engine.lua:114: in function 'train'
./train/rl_framework/infra/framework.lua:304: in function 'run_rl'
train.lua:155: in main chunk
[C]: in function 'dofile'
.../lin/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00405d50

cannot create /data/local/go/cnn_eval-1.log: Directory nonexistent

yixuan@yixuan-All-Series:~/code/facebook/darkforestGo/local_evaluator$ sh cnn_evaluator.sh 1 /data/local/go
num of gpu used = 1
other parameters =
output path = /data/local/go
cnn_evaluator.sh: 22: cnn_evaluator.sh: cannot create /data/local/go/cnn_eval-1.log: Directory nonexistent
6036
cnn_evaluator.sh: 23: cnn_evaluator.sh: cannot create /data/local/go/cnn_eval-1.log: Directory nonexistent
grep: /data/local/go/cnn_eval-1.log: No such file or directory
grep: /data/local/go/cnn_eval-1.log: No such file or directory
grep: /data/local/go/cnn_eval-1.log: No such file or directory
grep: /data/local/go/cnn_eval-1.log: No such file or directory
grep: /data/local/go/cnn_eval-1.log: No such file or directory
grep: /data/local/go/cnn_eval-1.log: No such file or directory
grep: /data/local/go/cnn_eval-1.log: No such file or directory
grep: /data/local/go/cnn_eval-1.log: No such file or directory
grep: /data/local/go/cnn_eval-1.log: No such file or directory

The README says "pipe file path The path that the pipe file is settled. Default is /data/local/go",
but I don't have the directory /data/local/go. What does this mean? Should I install something?
My operating system is Ubuntu 16.04.

'simpleko' dcnn feature issue ?

Hi,

I was looking at the input features for the dcnn and it looks like the simpleko feature actually returns the player's stones (like the stones feature).

in board.lua:

function board.get_stones(b, player)
    local stones = torch.FloatTensor(19, 19)
    C.GetStones(b, player, stones:data())
    return stones
end

function board.get_simple_ko(b, player)
    local simple_ko = torch.FloatTensor(19, 19)
    C.GetStones(b, player, simple_ko:data())
    return simple_ko
end

Maybe GetSimpleKo() was meant instead of GetStones() ?

Acknowledge Pachi?

Hi! I'm glad you found Pachi useful when developing darkforest. It seems to me that besides the unused-by-default pachi-tactics/ code, pieces of the board/ code also were inspired by Pachi, in the pattern and policy modules in particular, is that a correct impression?

I don't think what you did is a licence problem wrt. GPL or anything, for a variety of specific reasons. I'm happy it helped. But if you agree that a mention in the README near the bottom like "Some portions of the tactics and pattern code were inspired by the Pachi engine." would be appropriate, I'd be glad to see it mentioned. :-)

Cannot open pipe /home/ubuntu/darkforestGo/GO/./pipe-0-0 (client) !

How can I fix this error??
fs/alphago_leesedol_1.sgf 12"
Pattern file ../models/playout-model.bin loaded!
CNNPlayerV2MCTSCannot open pipe /home/ubuntu/darkforestGo/GO/./pipe-0-0 (client) !
/home/ubuntu/torch/install/bin/luajit:

#define ⇒ enum?

Would you like to replace more defines for constant values by enumerations to stress their relationships?

Segmentation Fault on --cpu_only

I'm getting the following error:

th cnnPlayerMCTSV2.lua --cpu_only
Pattern file ../models/playout-model.bin loaded!
CNNPlayerV2MCTSclear_board
Params is NULL, set default parameters.
Pattern file ../models/playout-model.bin loaded!
---- PatternV2 -----
#hash_size: 1048576, NUM_PRIOR: 17, LEN_PRIOR: 49
Verbose: 1, cnt_threshold: 1, alpha: 0.015000, batch_size: 8, temperature: 0.125000, ply_fraction: 0.001000
neighbor: true, nakade: true, resp: true, save_atari: true, kill_other: true, global: false, ko: true, put_group_to_atari: true, eye: true
#Pattern: 754901, collision: 198968663
Sample from topn: -1
-- End Patternv2 ---
Segmentation fault

Any ideas how to fix it?

--cpu_only crashes ?

Hi,

Trying to run darkforest in mcts-only mode (no dcnn):

$ th cnnPlayerMCTSV2.lua --cpu_only --time_limit 10
Pattern file ../models/playout-model.bin loaded!
CNNPlayerV2MCTS

boardsize 19
= 

clear_board
Params is NULL, set default parameters.
Pattern file ../models/playout-model.bin loaded!
---- PatternV2 -----
#hash_size: 1048576, NUM_PRIOR: 17, LEN_PRIOR: 49
Verbose: 1, cnt_threshold: 1, alpha: 0.015000, batch_size: 8, temperature: 0.125000, ply_fraction: 0.001000
neighbor: true, nakade: true, resp: true, save_atari: true, kill_other: true, global: false, ko: true, put_group_to_atari: true, eye: true
#Pattern: 754901, collision: 198968663
Sample from topn: -1
-- End Patternv2 ---
Segmentation fault (core dumped)

Am i doing something wrong, or is this a bug ?

Using latest git (ee97607), Ubuntu 14.04

luarocks package not found

luarocks install class
luarocks install image
luarocks install tds
luarocks install cudnn

All of these commands failed:

Error: No results matching query were found.

Cannot open pipe /data/local/go//./pipe-0-0 (client) !

I have started the server, and I have tried using another directory for the pipe, but I always get this when I type clear_board:

bill@Darkstar:~/fbgo/darkforestGo/cnnPlayerV2$ th cnnPlayerMCTSV2.lua
Pattern file ../models/playout-model.bin loaded!
CNNPlayerV2MCTSclear_board
Cannot open pipe /data/local/go//./pipe-0-0 (client) !
/home/kangqi/workspace/distro/install/bin/luaji

The dataset can't be downloaded.

I'm in China and can't download the dataset from the "here" link in:
Please download them here and save to the ./dataset directory.

Please provide new links, as you did in Issue 17.
Thanks.

The models link can't be accessed now.

In Usage step 1, I can't download the models. The link is invalid.
Can anyone help with that?