cglemon / sayuri Goto Github PK

View Code? Open in Web Editor NEW

65.0 2.0 7.0 14.73 MB

AlphaZero based engine for the game of Go (圍棋/围棋).

License: GNU General Public License v3.0

CMake 1.36% C++ 78.03% C 0.29% Cuda 3.44% Python 16.57% Shell 0.30%

mcts weiqi baduk alphago deeplearning sayuri alphazero gumbel-alphazero

sayuri's Introduction

Sayuri

Let's ROCK!

Sayuri is a GTP-compliant go engine based on Deep Convolutional Neural Network and Monte Carlo Tree Search. Learning the game of Go without strategic knowledge from human with AlphaZero-based algorithm. She is strongly inspired by Leela Zero and KataGo. The board data structure, search algorithm and network format are borrowed from Leela Zero in the beginning. Current version follows the KataGo research, the engine supports variable komi and board size now. Some methods or reports you may see my articles (some are chinese).

Quick Start via Terminal

First, you need a executable weights. Download the old v0.6 weights here or last v0.7 weights here. Some description of weights and RL progression is here. If you want to use the older network, please use the v0.5 engine at the save-last-v050 branch.

Then start the program with GTP mode via the terminal/PowerShell, using 1 thread and 400 visits per move with optimistic policy, please enter

$ ./sayuri -w <weights file> -t 1 -p 400 --use-optimistic-policy

You will see the diagnostic verbose. If the verbose includes Network Verison information, it means you success to execute the program with GPT mode. For more arguments, please give the --help option.

$ ./sayuri --help

Execute Engine via Graphical Interface

Sayuri is not complete engine. You need a graphical interface for playing with her. She supports any GTP (version 2) interface application. Sabaki and GoGui are recommended because Sayuri supports some specific analysis commands.

Sabaki analysis mode

GoGui analysis commands

Build From Source

Please see this section. If you are Windows platform, you may download the executable file from release page.

Reinforcement Learning

Sayuri is a fairly fast self-play learning system for the game of Go. The pictute shows the estimated computation of v0.7 engine (purple line) versus KataGo and LeelaZero. Compare sayuri with ELF OpenGo, achieving a around 250x reduction in computation. In detail, spending 3 months on a single RTX4080 device. The result is apparently better than KataGo g104 which claims 50x reduction in computation.

Here will describe how to run the self-play loop.

Todo

Support NHWC format.
Support distributed computation.
Support KataGo analysis mode.

Other Resources

Go Text Protocol, https://www.gnu.org/software/gnugo/gnugo_19.html
Leela Zero, https://github.com/leela-zero/leela-zero
KataGo methods, https://github.com/lightvector/KataGo/blob/master/docs/KataGoMethods.md
YouTube, playing with Pachi.
Supported analysis commands, analyze.
AlphaZero 之加速演算法實作 (v0.4~v0.5), describe some methods for old version.
Journal

License

The code is released under the GPLv3, except for threadpool.h, cppattributes.h, Eigen and Fast Float, which have specific licenses mentioned in those files.

Contact

[email protected] (Hung-Tse Lin)

sayuri's People

Contributors

Stargazers

Watchers

Forkers

jerry-github-cloud 6xfuquap7r7w shengkelong countingtls misaka10032s xieren58 owaisk4

sayuri's Issues

Issues with spaces in paths

It seems Sayuri (on Windows, anyway) cannot launch if its path contains any spaces.

Also, I believe it cannot load weights if the location of the weights contains any spaces.

Any plans for a openCl backend?

Hi,

I would like to try out sayuri but only have igpu hd520 graphics. Is openCl possible in the future? Thanks.

The description is odd.

// CPU-only version
$ nvcc main.cc config.cc version.cc game/board.cc game/book.cc game/game_state.cc game/gtp.cc game/iterator.cc game/pattern_board.cc game/sgf.cc game/strings.cc game/symmetry.cc game/zobrist.cc mcts/node.cc mcts/search.cc mcts/time_control.cc neural/description.cc neural/encoder.cc neural/loader.cc neural/network.cc neural/training_data.cc neural/winograd_helper.cc neural/blas/batchnorm.cc neural/blas/biases.cc neural/blas/blas.cc neural/blas/blas_forward_pipe.cc neural/blas/convolution.cc neural/blas/fullyconnect.cc neural/blas/se_unit.cc neural/blas/sgemm.cc neural/blas/winograd_convolution3.cc neural/cuda/cuda_common.cc neural/cuda/cuda_forward_pipe.cc neural/cuda/cuda_layers.cc neural/cuda/cuda_kernels.cu pattern/gammas_dict.cc pattern/mm.cc pattern/mm_trainer.cc pattern/pattern.cc selfplay/engine.cc selfplay/pipe.cc summary/accuracy.cc utils/filesystem.cc utils/gogui_helper.cc utils/gzip_helper.cc utils/komi.cc utils/log.cc utils/option.cc utils/parse_float.cc utils/random.cc utils/splitter.cc utils/time.cc -o sayuri -I . -DNDEBUG -DWIN32 -DNOMINMAX -DUSE_CUDA -lcudart -lcublas -O3 -Xcompiler /O2 -Xcompiler /std:c++14

// GPU version
$ nvcc main.cc config.cc version.cc game/board.cc game/book.cc game/game_state.cc game/gtp.cc game/iterator.cc game/pattern_board.cc game/sgf.cc game/strings.cc game/symmetry.cc game/zobrist.cc mcts/node.cc mcts/search.cc mcts/time_control.cc neural/description.cc neural/encoder.cc neural/loader.cc neural/network.cc neural/training_data.cc neural/winograd_helper.cc neural/blas/batchnorm.cc neural/blas/biases.cc neural/blas/blas.cc neural/blas/blas_forward_pipe.cc neural/blas/convolution.cc neural/blas/fullyconnect.cc neural/blas/se_unit.cc neural/blas/sgemm.cc neural/blas/winograd_convolution3.cc neural/cuda/cuda_common.cc neural/cuda/cuda_forward_pipe.cc neural/cuda/cuda_layers.cc neural/cuda/cuda_kernels.cu pattern/gammas_dict.cc pattern/mm.cc pattern/mm_trainer.cc pattern/pattern.cc selfplay/engine.cc selfplay/pipe.cc summary/accuracy.cc utils/filesystem.cc utils/gogui_helper.cc utils/gzip_helper.cc utils/komi.cc utils/log.cc utils/option.cc utils/parse_float.cc utils/random.cc utils/splitter.cc utils/time.cc -o sayuri -I . -DNDEBUG -DWIN32 -DNOMINMAX -DUSE_CUDA -lcudart -lcublas -O3 -Xcompiler /O2 -Xcompiler /std:c++14

问题棋局

Sayuri 执黑对 Zen7，很精彩的一局棋。309手收官后黑胜0.5目，白棋pass 黑棋自填2手后pass，导致输1.5目，不知为何会这样？
sgf.zip

make -j error.

/home/guest/Sayuri/src/neural/cuda/cuda_layers.cc: In member function ‘void CUDA::Convolution::Forward(int, float*, float*, void*, void*, size_t)’:
/home/guest/Sayuri/src/neural/cuda/cuda_layers.cc:126:15: warning: unused variable ‘board_size’ [-Wunused-variable]
126 | const int board_size = (width_ + height_) / 2;
| ^~~~~~~~~~
/home/mong/Sayuri/src/neural/cuda/cuda_layers.cc: At top level:
cc1plus: warning: unrecognized command line option ‘-Wno-mismatched-tags’
cc1plus: warning: unrecognized command line option ‘-Wno-mismatched-tags’

Is it possible to do a cuda compile?

I'm using it well. Is it possible to do a cuda compile? for window use.

Thank you.

About the strength of sayuri

Hello, I just tried running sayuri with LizzieYzy.
It worked, but it recommended a strange location.
Both the analysis and the game are strange.
Does that mean sayuri isn't strong enough yet?
The engine commands used are:
"Sayuri-v0.6.1-eigen-windows-x64.exe" -w zero-swa-2200k.bin.txt -t 1 -b 1 -p 400

selfplay 不使用gpu

按照默认的各个参数，
$ cp -r bash selfplay-course
$ cd selfplay-course
$ bash setup.sh -s ..
$ bash selfplay.sh
有gpu但是没使用，是哪里需要改设置吗

windows 10 compile error.

g++ -std=c++14 -ffast-math -I . -Wall -Wextra -lpthread .cc utils/.cc accuracy/.cc game/.cc mcts/.cc neural/.cc neural/blas/.cc neural/cuda/.cc pattern/.cc selfplay/.cc -o Sayuri -O3 -DNDEBUG -DWIN32 -I ../third_party/Eigen -DUSE_BLAS -DUSE_EIGEN
neural/blas/winograd_convolution3.cc: In lambda function:
neural/blas/winograd_convolution3.cc:53:29: error: 'SQ2' is not captured
auto i3m1_2 = i3 * (SQ2) + i1 * (-SQ2 / 2.0f);
^~~
neural/blas/winograd_convolution3.cc:42:31: note: the lambda has no capture-default
const auto multiply_bt = [](float& o0, float& o1, float& o2,
^
neural/blas/winograd_convolution3.cc:18:20: note: 'constexpr const double SQ2' declared here
constexpr auto SQ2 = kSqrt2;
^~~
neural/blas/winograd_convolution3.cc: In lambda function:
neural/blas/winograd_convolution3.cc:189:34: error: 'SQ2' is not captured
auto t3m4 = (i3 - i4) * (SQ2);
^~~
neural/blas/winograd_convolution3.cc:182:31: note: the lambda has no capture-default
const auto multiply_at = [](float& o0, float& o1, float& o2, float& o3,
^
neural/blas/winograd_convolution3.cc:174:20: note: 'constexpr const double SQ2' declared here
constexpr auto SQ2 = kSqrt2;
^~~
What kind of error is it?

mingw64 : gcc version 8.1.0 (x86_64)

About --relative-rank?

I'm curious about the --relative-rank option.
What about the -int- range?
Is it affected much by the -p(playouts) option?
Do you have an example you can refer to?

0.6.0 version compile error.

clang-16: error: no such file or directory: 'accuracy/*.cc'

delete.. accuracy/*.cc

After deleting it, an error appears as follows.

$ g++ -std=c++14 -ffast-math -I . -lpthread .cc utils/.cc game/.cc mcts/.cc neural/.cc neural/blas/.cc neural/cuda/.cc pattern/.cc selfplay/*.cc -o Sayuri -O3 -DNDEBUG -DWIN32 -I ../third_party/Eigen -DUSE_BLAS -DUSE_EIGEN
In file included from config.cc:1:
./utils/option.h:162:9: warning: expression result unused [-Wunused-value]
(T)(*this);
^ ~~~~~~~
./utils/option.h:52:20: note: in instantiation of function template specialization 'Option::FancyPush<bool, void>' requested here
FancyPush(val);
^
./utils/option.h:239:20: note: in instantiation of function template specialization 'Option::Option<bool, void>' requested here
auto out = Option(t, val,
^
config.cc:17:36: note: in instantiation of function template specialization 'Option::SetOption<bool, void>' requested here
kOptionsMap["help"] << Option::SetOption(false);
^
In file included from config.cc:1:
./utils/option.h:162:9: warning: expression result unused [-Wunused-value]
(T)(*this);
^ ~~~~~~~
./utils/option.h:52:20: note: in instantiation of function template specialization 'Option::FancyPush<int, void>' requested here
FancyPush(val);
^
./utils/option.h:239:20: note: in instantiation of function template specialization 'Option::Option<int, void>' requested here
auto out = Option(t, val,
^
config.cc:30:50: note: in instantiation of function template specialization 'Option::SetOption<int, void>' requested here
kOptionsMap["fixed_nn_boardsize"] << Option::SetOption(0);
^
In file included from config.cc:1:
./utils/option.h:162:9: warning: expression result unused [-Wunused-value]
(T)(*this);
^ ~~~~~~~
./utils/option.h:52:20: note: in instantiation of function template specialization 'Option::FancyPush<float, void>' requested here
FancyPush(val);
^
./utils/option.h:239:20: note: in instantiation of function template specialization 'Option::Option<float, void>' requested here
auto out = Option(t, val,
^
config.cc:32:44: note: in instantiation of function template specialization 'Option::SetOption<float, void>' requested here
kOptionsMap["defualt_komi"] << Option::SetOption(kDefaultKomi);
^
3 warnings generated.
In file included from main.cc:1:
In file included from C:/Program Files/mingw64/include/c++/v1/memory:898:
In file included from C:/Program Files/mingw64/include/c++/v1/__memory/shared_ptr.h:31:
C:/Program Files/mingw64/include/c++/v1/__memory/unique_ptr.h:65:5: warning: delete called on 'NetworkForwardPipe' that is abstract but has non-virtual destructor [-Wdelete-abstract-non-virtual-dtor]
delete __ptr;
^
C:/Program Files/mingw64/include/c++/v1/__memory/unique_ptr.h:297:7: note: in instantiation of member function 'std::default_delete::operator()' requested here
_ptr.second()(__tmp);
^
C:/Program Files/mingw64/include/c++/v1/__memory/unique_ptr.h:263:75: note: in instantiation of member function 'std::unique_ptr::reset' requested here
_LIBCPP_INLINE_VISIBILITY _LIBCPP_CONSTEXPR_SINCE_CXX23 ~unique_ptr() { reset(); }
^
./neural/network.h:15:7: note: in instantiation of member function 'std::unique_ptr::~unique_ptr' requested here
class Network {
^
C:/Program Files/mingw64/include/c++/v1/__memory/unique_ptr.h:297:7: note: in instantiation of member function 'std::default_deleteGtpLoop::Agent::operator()' requested here
_ptr.second()(__tmp);
^
C:/Program Files/mingw64/include/c++/v1/__memory/unique_ptr.h:263:75: note: in instantiation of member function 'std::unique_ptrGtpLoop::Agent::reset' requested here
_LIBCPP_INLINE_VISIBILITY _LIBCPP_CONSTEXPR_SINCE_CXX23 ~unique_ptr() { reset(); }
^
./game/gtp.h:53:5: note: in instantiation of member function 'std::unique_ptrGtpLoop::Agent::~unique_ptr' requested here
GtpLoop() {
^
1 warning generated.
game/board.cc:1817:17: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
std::remove(std::begin(strings_head),
^~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~
game/board.cc:2068:17: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
std::remove(std::begin(epmty_head),
^~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~
2 warnings generated.
In file included from game/gtp.cc:1:
In file included from ./game/gtp.h:3:
In file included from ./game/game_state.h:3:
In file included from C:/Program Files/mingw64/include/c++/v1/vector:3359:
In file included from C:/Program Files/mingw64/include/c++/v1/algorithm:1747:
In file included from C:/Program Files/mingw64/include/c++/v1/__algorithm/inplace_merge.h:28:
C:/Program Files/mingw64/include/c++/v1/__memory/unique_ptr.h:65:5: warning: delete called on 'NetworkForwardPipe' that is abstract but has non-virtual destructor [-Wdelete-abstract-non-virtual-dtor]
delete __ptr;
^
C:/Program Files/mingw64/include/c++/v1/__memory/unique_ptr.h:297:7: note: in instantiation of member function 'std::default_delete::operator()' requested here
_ptr.second()(__tmp);
^
C:/Program Files/mingw64/include/c++/v1/__memory/unique_ptr.h:263:75: note: in instantiation of member function 'std::unique_ptr::reset' requested here
_LIBCPP_INLINE_VISIBILITY _LIBCPP_CONSTEXPR_SINCE_CXX23 ~unique_ptr() { reset(); }
^
./neural/network.h:15:7: note: in instantiation of member function 'std::unique_ptr::~unique_ptr' requested here
class Network {
^
C:/Program Files/mingw64/include/c++/v1/__memory/unique_ptr.h:297:7: note: in instantiation of member function 'std::default_deleteGtpLoop::Agent::operator()' requested here
_ptr.second()(__tmp);
^
C:/Program Files/mingw64/include/c++/v1/__memory/unique_ptr.h:263:75: note: in instantiation of member function 'std::unique_ptrGtpLoop::Agent::reset' requested here
_LIBCPP_INLINE_VISIBILITY _LIBCPP_CONSTEXPR_SINCE_CXX23 ~unique_ptr() { reset(); }
^
./game/gtp.h:53:5: note: in instantiation of member function 'std::unique_ptrGtpLoop::Agent::~unique_ptr' requested here
GtpLoop() {
^
1 warning generated.
In file included from neural/network.cc:6:
In file included from ../third_party/Eigen/Eigen/Dense:1:
In file included from ../third_party/Eigen/Eigen/Core:43:
In file included from C:/Program Files/mingw64/include/c++/v1/complex:243:
In file included from C:/Program Files/mingw64/include/c++/v1/sstream:191:
In file included from C:/Program Files/mingw64/include/c++/v1/istream:165:
In file included from C:/Program Files/mingw64/include/c++/v1/ostream:168:
In file included from C:/Program Files/mingw64/include/c++/v1/__memory/shared_ptr.h:31:
C:/Program Files/mingw64/include/c++/v1/__memory/unique_ptr.h:65:5: warning: delete called on 'NetworkForwardPipe' that is abstract but has non-virtual destructor [-Wdelete-abstract-non-virtual-dtor]
delete __ptr;
^
C:/Program Files/mingw64/include/c++/v1/__memory/unique_ptr.h:297:7: note: in instantiation of member function 'std::default_delete::operator()' requested here
_ptr.second()(__tmp);
^
C:/Program Files/mingw64/include/c++/v1/__memory/unique_ptr.h:241:5: note: in instantiation of member function 'std::unique_ptr::reset' requested here
reset(_u.release());
^
neural/network.cc:68:11: note: in instantiation of function template specialization 'std::unique_ptr::operator=<BlasForwardPipe, std::default_delete, void, void>' requested here
pipe = std::make_unique();
^
1 warning generated.
In file included from selfplay/engine.cc:1:
In file included from ./selfplay/engine.h:3:
In file included from ./game/game_state.h:3:
In file included from C:/Program Files/mingw64/include/c++/v1/vector:3359:
In file included from C:/Program Files/mingw64/include/c++/v1/algorithm:1747:
In file included from C:/Program Files/mingw64/include/c++/v1/__algorithm/inplace_merge.h:28:
C:/Program Files/mingw64/include/c++/v1/__memory/unique_ptr.h:65:5: warning: delete called on 'NetworkForwardPipe' that is abstract but has non-virtual destructor [-Wdelete-abstract-non-virtual-dtor]
delete __ptr;
^
C:/Program Files/mingw64/include/c++/v1/__memory/unique_ptr.h:297:7: note: in instantiation of member function 'std::default_delete::operator()' requested here
_ptr.second()(__tmp);
^
C:/Program Files/mingw64/include/c++/v1/__memory/unique_ptr.h:263:75: note: in instantiation of member function 'std::unique_ptr::reset' requested here
_LIBCPP_INLINE_VISIBILITY LIBCPP_CONSTEXPR_SINCE_CXX23 ~unique_ptr() { reset(); }
^
./neural/network.h:15:7: note: in instantiation of member function 'std::unique_ptr::~unique_ptr' requested here
class Network {
^
selfplay/engine.cc:25:25: note: in instantiation of function template specialization 'std::make_unique' requested here
network = std::make_unique();
^
1 warning generated.
In file included from selfplay/pipe.cc:1:
In file included from ./selfplay/pipe.h:3:
In file included from ./selfplay/engine.h:3:
In file included from ./game/game_state.h:3:
In file included from C:/Program Files/mingw64/include/c++/v1/vector:3359:
In file included from C:/Program Files/mingw64/include/c++/v1/algorithm:1747:
In file included from C:/Program Files/mingw64/include/c++/v1/__algorithm/inplace_merge.h:28:
C:/Program Files/mingw64/include/c++/v1/__memory/unique_ptr.h:65:5: warning: delete called on 'NetworkForwardPipe' that is abstract but has non-virtual destructor [-Wdelete-abstract-non-virtual-dtor]
delete __ptr;
^
C:/Program Files/mingw64/include/c++/v1/__memory/unique_ptr.h:297:7: note: in instantiation of member function 'std::default_delete::operator()' requested here
_ptr.second()(__tmp);
^
C:/Program Files/mingw64/include/c++/v1/__memory/unique_ptr.h:263:75: note: in instantiation of member function 'std::unique_ptr::reset' requested here
_LIBCPP_INLINE_VISIBILITY _LIBCPP_CONSTEXPR_SINCE_CXX23 ~unique_ptr() { reset(); }
^
./neural/network.h:15:7: note: in instantiation of member function 'std::unique_ptr::~unique_ptr' requested here
class Network {
^
C:/Program Files/mingw64/include/c++/v1/__memory/unique_ptr.h:297:7: note: in instantiation of member function 'std::default_delete::operator()' requested here
_ptr.second()(__tmp);
^
C:/Program Files/mingw64/include/c++/v1/__memory/unique_ptr.h:263:75: note: in instantiation of member function 'std::unique_ptr::reset' requested here
_LIBCPP_INLINE_VISIBILITY _LIBCPP_CONSTEXPR_SINCE_CXX23 ~unique_ptr() { reset(); }
^
./selfplay/engine.h:10:7: note: in instantiation of member function 'std::unique_ptr::~unique_ptr' requested here
class Engine {
^
In file included from selfplay/pipe.cc:1:
In file included from ./selfplay/pipe.h:3:
In file included from ./selfplay/engine.h:5:
In file included from ./mcts/search.h:4:
In file included from ./mcts/parameters.h:4:
./utils/option.h:162:9: warning: expression result unused [-Wunused-value]
(T)(*this);
^ ~~~~~~~
./utils/option.h:179:9: note: in instantiation of function template specialization 'Option::FancyPush<bool, void>' requested here
FancyPush(val);
^
./utils/option.h:276:20: note: in instantiation of function template specialization 'Option::Set' requested here
it->second.Set(val);
^
selfplay/pipe.cc:15:5: note: in instantiation of function template specialization 'SetOption<bool, void>' requested here
SetOption("analysis_verbose", false);
^
In file included from selfplay/pipe.cc:1:
In file included from ./selfplay/pipe.h:3:
In file included from ./selfplay/engine.h:5:
In file included from ./mcts/search.h:4:
In file included from ./mcts/parameters.h:4:
./utils/option.h:162:9: warning: expression result unused [-Wunused-value]
(T)(*this);
^ ~~~~~~~
./utils/option.h:179:9: note: in instantiation of function template specialization 'Option::FancyPush<int, void>' requested here
FancyPush(val);
^
./utils/option.h:276:20: note: in instantiation of function template specialization 'Option::Set' requested here
it->second.Set(val);
^
selfplay/pipe.cc:18:5: note: in instantiation of function template specialization 'SetOption<int, void>' requested here
SetOption("threads", 1);
^
3 warnings generated.
ld.lld: error: undefined symbol: ComputeNetAccuracy(Network&, std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator>)

referenced by C:/Users/mong/AppData/Local/Temp/gtp-b164cf.o:(GtpLoop::Execute(Splitter&, bool&))

ld.lld: error: undefined symbol: ComputeSelfplayAccumulation(std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator>)

referenced by C:/Users/mong/AppData/Local/Temp/gtp-b164cf.o:(GtpLoop::Execute(Splitter&, bool&))
clang-16: error: linker command failed with exit code 1 (use -v to see invocation)

关于和leelazero对照的问题

作者您好，我一直在关注sayuri的训练过程（因为对gumbel是否能真正提升训练很感兴趣），我注意到您在最新的日志中和leelazero的早期网络进行了对比，但是据我所知leelazero在早期很长一段时间的训练有比较严重的问题（具体是什么忘了，很久以前的事了），所以如果想要对比训练速度的话和sai比leelazero更合适，考虑到sayuri使用了一些kata的算法来改进训练，如果想要证明gumbel有效可能和kata的早期网络对比是最合适的。

在Windows系统下编译Cuda引擎时报错

具体是从这开始报错的：
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\cuda_fp16.hpp(1973): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("ld.global.nc.b32 %0, [%1];" : "=r"(*(reinterpret_cast<unsigned int *>(&(ret)))) : "r"(ptr));
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\cuda_fp16.hpp(1979): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("ld.global.nc.b16 %0, [%1];" : "=h"(*(reinterpret_cast<unsigned short *>(&(ret)))) : "r"(ptr));
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\cuda_fp16.hpp(1985): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("ld.global.cg.b32 %0, [%1];" : "=r"(*(reinterpret_cast<unsigned int *>(&(ret)))) : "r"(ptr));
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\cuda_fp16.hpp(1991): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("ld.global.cg.b16 %0, [%1];" : "=h"(*(reinterpret_cast<unsigned short *>(&(ret)))) : "r"(ptr));
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\cuda_fp16.hpp(1997): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("ld.global.ca.b32 %0, [%1];" : "=r"(*(reinterpret_cast<unsigned int *>(&(ret)))) : "r"(ptr));
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\cuda_fp16.hpp(2003): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("ld.global.ca.b16 %0, [%1];" : "=h"(*(reinterpret_cast<unsigned short *>(&(ret)))) : "r"(ptr));
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\cuda_fp16.hpp(2009): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("ld.global.cs.b32 %0, [%1];" : "=r"(*(reinterpret_cast<unsigned int *>(&(ret)))) : "r"(ptr));
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\cuda_fp16.hpp(2015): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("ld.global.cs.b16 %0, [%1];" : "=h"(*(reinterpret_cast<unsigned short *>(&(ret)))) : "r"(ptr));
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\cuda_fp16.hpp(2021): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("ld.global.lu.b32 %0, [%1];" : "=r"(*(reinterpret_cast<unsigned int *>(&(ret)))) : "r"(ptr) : "memory");
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\cuda_fp16.hpp(2027): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("ld.global.lu.b16 %0, [%1];" : "=h"(*(reinterpret_cast<unsigned short *>(&(ret)))) : "r"(ptr) : "memory");
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\cuda_fp16.hpp(2033): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("ld.global.cv.b32 %0, [%1];" : "=r"(*(reinterpret_cast<unsigned int *>(&(ret)))) : "r"(ptr) : "memory");
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\cuda_fp16.hpp(2039): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("ld.global.cv.b16 %0, [%1];" : "=h"(*(reinterpret_cast<unsigned short *>(&(ret)))) : "r"(ptr) : "memory");
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\cuda_fp16.hpp(2044): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("st.global.wb.b32 [%0], %1;" :: "r"(ptr), "r"(*(reinterpret_cast<const unsigned int *>(&(value)))) : "memory");
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\cuda_fp16.hpp(2048): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("st.global.wb.b16 [%0], %1;" :: "r"(ptr), "h"(*(reinterpret_cast<const unsigned short *>(&(value)))) : "memory");
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\cuda_fp16.hpp(2052): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("st.global.cg.b32 [%0], %1;" :: "r"(ptr), "r"(*(reinterpret_cast<const unsigned int *>(&(value)))) : "memory");
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\cuda_fp16.hpp(2056): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("st.global.cg.b16 [%0], %1;" :: "r"(ptr), "h"(*(reinterpret_cast<const unsigned short *>(&(value)))) : "memory");
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\cuda_fp16.hpp(2060): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("st.global.cs.b32 [%0], %1;" :: "r"(ptr), "r"(*(reinterpret_cast<const unsigned int *>(&(value)))) : "memory");
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\cuda_fp16.hpp(2064): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("st.global.cs.b16 [%0], %1;" :: "r"(ptr), "h"(*(reinterpret_cast<const unsigned short *>(&(value)))) : "memory");
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\cuda_fp16.hpp(2068): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("st.global.wt.b32 [%0], %1;" :: "r"(ptr), "r"(*(reinterpret_cast<const unsigned int *>(&(value)))) : "memory");
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\cuda_fp16.hpp(2072): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("st.global.wt.b16 [%0], %1;" :: "r"(ptr), "h"(*(reinterpret_cast<const unsigned short *>(&(value)))) : "memory");
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\cuda_fp16.hpp(3495): error: asm operand type size(8) does not match type/size implied by constraint 'r'
: "r"(address), "h"(*(reinterpret_cast<const unsigned short *>(&(val))))
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\cuda_bf16.hpp(1897): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("ld.global.nc.b32 %0, [%1];" : "=r"(*(reinterpret_cast<unsigned int *>(&(ret)))) : "r"(ptr));
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\cuda_bf16.hpp(1903): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("ld.global.nc.b16 %0, [%1];" : "=h"(*(reinterpret_cast<unsigned short *>(&(ret)))) : "r"(ptr));
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\cuda_bf16.hpp(1909): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("ld.global.cg.b32 %0, [%1];" : "=r"(*(reinterpret_cast<unsigned int *>(&(ret)))) : "r"(ptr));
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\cuda_bf16.hpp(1915): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("ld.global.cg.b16 %0, [%1];" : "=h"(*(reinterpret_cast<unsigned short *>(&(ret)))) : "r"(ptr));
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\cuda_bf16.hpp(1921): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("ld.global.ca.b32 %0, [%1];" : "=r"(*(reinterpret_cast<unsigned int *>(&(ret)))) : "r"(ptr));
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\cuda_bf16.hpp(1927): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("ld.global.ca.b16 %0, [%1];" : "=h"(*(reinterpret_cast<unsigned short *>(&(ret)))) : "r"(ptr));
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\cuda_bf16.hpp(1933): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("ld.global.cs.b32 %0, [%1];" : "=r"(*(reinterpret_cast<unsigned int *>(&(ret)))) : "r"(ptr));
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\cuda_bf16.hpp(1939): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("ld.global.cs.b16 %0, [%1];" : "=h"(*(reinterpret_cast<unsigned short *>(&(ret)))) : "r"(ptr));
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\cuda_bf16.hpp(1945): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("ld.global.lu.b32 %0, [%1];" : "=r"(*(reinterpret_cast<unsigned int *>(&(ret)))) : "r"(ptr) : "memory");
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\cuda_bf16.hpp(1951): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("ld.global.lu.b16 %0, [%1];" : "=h"(*(reinterpret_cast<unsigned short *>(&(ret)))) : "r"(ptr) : "memory");
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\cuda_bf16.hpp(1957): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("ld.global.cv.b32 %0, [%1];" : "=r"(*(reinterpret_cast<unsigned int *>(&(ret)))) : "r"(ptr) : "memory");
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\cuda_bf16.hpp(1963): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("ld.global.cv.b16 %0, [%1];" : "=h"(*(reinterpret_cast<unsigned short *>(&(ret)))) : "r"(ptr) : "memory");
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\cuda_bf16.hpp(1969): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("st.global.wb.b32 [%0], %1;" :: "r"(ptr), "r"(*(reinterpret_cast<const unsigned int *>(&(value)))) : "memory");
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\cuda_bf16.hpp(1973): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("st.global.wb.b16 [%0], %1;" :: "r"(ptr), "h"(*(reinterpret_cast<const unsigned short *>(&(value)))) : "memory");
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\cuda_bf16.hpp(1977): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("st.global.cg.b32 [%0], %1;" :: "r"(ptr), "r"(*(reinterpret_cast<const unsigned int *>(&(value)))) : "memory");
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\cuda_bf16.hpp(1981): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("st.global.cg.b16 [%0], %1;" :: "r"(ptr), "h"(*(reinterpret_cast<const unsigned short *>(&(value)))) : "memory");
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\cuda_bf16.hpp(1985): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("st.global.cs.b32 [%0], %1;" :: "r"(ptr), "r"(*(reinterpret_cast<const unsigned int *>(&(value)))) : "memory");
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\cuda_bf16.hpp(1989): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("st.global.cs.b16 [%0], %1;" :: "r"(ptr), "h"(*(reinterpret_cast<const unsigned short *>(&(value)))) : "memory");
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\cuda_bf16.hpp(1993): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("st.global.wt.b32 [%0], %1;" :: "r"(ptr), "r"(*(reinterpret_cast<const unsigned int *>(&(value)))) : "memory");
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\cuda_bf16.hpp(1997): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("st.global.wt.b16 [%0], %1;" :: "r"(ptr), "h"(*(reinterpret_cast<const unsigned short *>(&(value)))) : "memory");
^

41 errors detected in the compilation of "neural/cuda/cuda_kernels.cu"

编译使用的命令行如下：
nvcc main.cc config.cc version.cc game/board.cc game/book.cc game/game_state.cc game/gtp.cc game/iterator.cc game/pattern_board.cc game/sgf.cc game/strings.cc game/symmetry.cc game/zobrist.cc mcts/node.cc mcts/search.cc mcts/time_control.cc neural/description.cc neural/encoder.cc neural/loader.cc neural/network.cc neural/training_data.cc neural/winograd_helper.cc neural/blas/batchnorm.cc neural/blas/biases.cc neural/blas/blas.cc neural/blas/blas_forward_pipe.cc neural/blas/convolution.cc neural/blas/fullyconnect.cc neural/blas/se_unit.cc neural/blas/sgemm.cc neural/blas/winograd_convolution3.cc neural/cuda/cuda_common.cc neural/cuda/cuda_forward_pipe.cc neural/cuda/cuda_layers.cc neural/cuda/cuda_kernels.cu pattern/gammas_dict.cc pattern/mm.cc pattern/mm_trainer.cc pattern/pattern.cc selfplay/engine.cc selfplay/pipe.cc summary/accuracy.cc utils/filesystem.cc utils/gogui_helper.cc utils/gzip_helper.cc utils/komi.cc utils/log.cc utils/option.cc utils/parse_float.cc utils/random.cc utils/splitter.cc utils/time.cc -o sayuri -I . -DNDEBUG -DWIN32 -DNOMINMAX -DUSE_CUDA -lcudart -lcublas -O3 -Xcompiler /O2 -Xcompiler /std:c++14

请问如何解决？谢谢

cuda compile error

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\include\cuda_fp16.hpp(1906): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("ld.global.nc.b32 %0, [%1];" : "=r"(*(reinterpret_cast<unsigned int *>(&(ret)))) : "r"(ptr));
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\include\cuda_fp16.hpp(1912): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("ld.global.nc.b16 %0, [%1];" : "=h"(*(reinterpret_cast<unsigned short *>(&(ret)))) : "r"(ptr));
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\include\cuda_fp16.hpp(1918): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("ld.global.cg.b32 %0, [%1];" : "=r"(*(reinterpret_cast<unsigned int *>(&(ret)))) : "r"(ptr));
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\include\cuda_fp16.hpp(1924): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("ld.global.cg.b16 %0, [%1];" : "=h"(*(reinterpret_cast<unsigned short *>(&(ret)))) : "r"(ptr));
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\include\cuda_fp16.hpp(1930): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("ld.global.ca.b32 %0, [%1];" : "=r"(*(reinterpret_cast<unsigned int *>(&(ret)))) : "r"(ptr));
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\include\cuda_fp16.hpp(1936): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("ld.global.ca.b16 %0, [%1];" : "=h"(*(reinterpret_cast<unsigned short *>(&(ret)))) : "r"(ptr));
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\include\cuda_fp16.hpp(1942): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("ld.global.cs.b32 %0, [%1];" : "=r"(*(reinterpret_cast<unsigned int *>(&(ret)))) : "r"(ptr));
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\include\cuda_fp16.hpp(1948): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("ld.global.cs.b16 %0, [%1];" : "=h"(*(reinterpret_cast<unsigned short *>(&(ret)))) : "r"(ptr));
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\include\cuda_fp16.hpp(1954): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("ld.global.lu.b32 %0, [%1];" : "=r"(*(reinterpret_cast<unsigned int *>(&(ret)))) : "r"(ptr) : "memory");
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\include\cuda_fp16.hpp(1960): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("ld.global.lu.b16 %0, [%1];" : "=h"(*(reinterpret_cast<unsigned short *>(&(ret)))) : "r"(ptr) : "memory");
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\include\cuda_fp16.hpp(1966): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("ld.global.cv.b32 %0, [%1];" : "=r"(*(reinterpret_cast<unsigned int *>(&(ret)))) : "r"(ptr) : "memory");
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\include\cuda_fp16.hpp(1972): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("ld.global.cv.b16 %0, [%1];" : "=h"(*(reinterpret_cast<unsigned short *>(&(ret)))) : "r"(ptr) : "memory");
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\include\cuda_fp16.hpp(1977): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("st.global.wb.b32 [%0], %1;" :: "r"(ptr), "r"(*(reinterpret_cast<const unsigned int *>(&(value)))) : "memory");
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\include\cuda_fp16.hpp(1981): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("st.global.wb.b16 [%0], %1;" :: "r"(ptr), "h"(*(reinterpret_cast<const unsigned short *>(&(value)))) : "memory");
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\include\cuda_fp16.hpp(1985): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("st.global.cg.b32 [%0], %1;" :: "r"(ptr), "r"(*(reinterpret_cast<const unsigned int *>(&(value)))) : "memory");
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\include\cuda_fp16.hpp(1989): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("st.global.cg.b16 [%0], %1;" :: "r"(ptr), "h"(*(reinterpret_cast<const unsigned short *>(&(value)))) : "memory");
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\include\cuda_fp16.hpp(1993): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("st.global.cs.b32 [%0], %1;" :: "r"(ptr), "r"(*(reinterpret_cast<const unsigned int *>(&(value)))) : "memory");
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\include\cuda_fp16.hpp(1997): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("st.global.cs.b16 [%0], %1;" :: "r"(ptr), "h"(*(reinterpret_cast<const unsigned short *>(&(value)))) : "memory");
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\include\cuda_fp16.hpp(2001): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("st.global.wt.b32 [%0], %1;" :: "r"(ptr), "r"(*(reinterpret_cast<const unsigned int *>(&(value)))) : "memory");
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\include\cuda_fp16.hpp(2005): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("st.global.wt.b16 [%0], %1;" :: "r"(ptr), "h"(*(reinterpret_cast<const unsigned short *>(&(value)))) : "memory");
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\include\cuda_fp16.hpp(3428): error: asm operand type size(8) does not match type/size implied by constraint 'r'
: "r"(address), "h"(*(reinterpret_cast<const unsigned short *>(&(val))))
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\include\cuda_bf16.hpp(1830): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("ld.global.nc.b32 %0, [%1];" : "=r"(*(reinterpret_cast<unsigned int *>(&(ret)))) : "r"(ptr));
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\include\cuda_bf16.hpp(1836): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("ld.global.nc.b16 %0, [%1];" : "=h"(*(reinterpret_cast<unsigned short *>(&(ret)))) : "r"(ptr));
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\include\cuda_bf16.hpp(1842): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("ld.global.cg.b32 %0, [%1];" : "=r"(*(reinterpret_cast<unsigned int *>(&(ret)))) : "r"(ptr));
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\include\cuda_bf16.hpp(1848): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("ld.global.cg.b16 %0, [%1];" : "=h"(*(reinterpret_cast<unsigned short *>(&(ret)))) : "r"(ptr));
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\include\cuda_bf16.hpp(1854): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("ld.global.ca.b32 %0, [%1];" : "=r"(*(reinterpret_cast<unsigned int *>(&(ret)))) : "r"(ptr));
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\include\cuda_bf16.hpp(1860): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("ld.global.ca.b16 %0, [%1];" : "=h"(*(reinterpret_cast<unsigned short *>(&(ret)))) : "r"(ptr));
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\include\cuda_bf16.hpp(1866): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("ld.global.cs.b32 %0, [%1];" : "=r"(*(reinterpret_cast<unsigned int *>(&(ret)))) : "r"(ptr));
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\include\cuda_bf16.hpp(1872): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("ld.global.cs.b16 %0, [%1];" : "=h"(*(reinterpret_cast<unsigned short *>(&(ret)))) : "r"(ptr));
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\include\cuda_bf16.hpp(1878): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("ld.global.lu.b32 %0, [%1];" : "=r"(*(reinterpret_cast<unsigned int *>(&(ret)))) : "r"(ptr) : "memory");
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\include\cuda_bf16.hpp(1884): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("ld.global.lu.b16 %0, [%1];" : "=h"(*(reinterpret_cast<unsigned short *>(&(ret)))) : "r"(ptr) : "memory");
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\include\cuda_bf16.hpp(1890): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("ld.global.cv.b32 %0, [%1];" : "=r"(*(reinterpret_cast<unsigned int *>(&(ret)))) : "r"(ptr) : "memory");
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\include\cuda_bf16.hpp(1896): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("ld.global.cv.b16 %0, [%1];" : "=h"(*(reinterpret_cast<unsigned short *>(&(ret)))) : "r"(ptr) : "memory");
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\include\cuda_bf16.hpp(1902): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("st.global.wb.b32 [%0], %1;" :: "r"(ptr), "r"(*(reinterpret_cast<const unsigned int *>(&(value)))) : "memory");
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\include\cuda_bf16.hpp(1906): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("st.global.wb.b16 [%0], %1;" :: "r"(ptr), "h"(*(reinterpret_cast<const unsigned short *>(&(value)))) : "memory");
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\include\cuda_bf16.hpp(1910): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("st.global.cg.b32 [%0], %1;" :: "r"(ptr), "r"(*(reinterpret_cast<const unsigned int *>(&(value)))) : "memory");
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\include\cuda_bf16.hpp(1914): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("st.global.cg.b16 [%0], %1;" :: "r"(ptr), "h"(*(reinterpret_cast<const unsigned short *>(&(value)))) : "memory");
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\include\cuda_bf16.hpp(1918): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("st.global.cs.b32 [%0], %1;" :: "r"(ptr), "r"(*(reinterpret_cast<const unsigned int *>(&(value)))) : "memory");
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\include\cuda_bf16.hpp(1922): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("st.global.cs.b16 [%0], %1;" :: "r"(ptr), "h"(*(reinterpret_cast<const unsigned short *>(&(value)))) : "memory");
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\include\cuda_bf16.hpp(1926): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("st.global.wt.b32 [%0], %1;" :: "r"(ptr), "r"(*(reinterpret_cast<const unsigned int *>(&(value)))) : "memory");
^

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\include\cuda_bf16.hpp(1930): error: asm operand type size(8) does not match type/size implied by constraint 'r'
asm ("st.global.wt.b16 [%0], %1;" :: "r"(ptr), "h"(*(reinterpret_cast<const unsigned short *>(&(value)))) : "memory");
^

41 errors detected in the compilation of "neural/cuda/cuda_kernels.cu".

os : windows 10
cuda : 12.3
Visual Studio 2019 Developer Command Prompt v16.11.11
Copyright (c) 2021 Microsoft Corporation

compile error with gcc 13.2

I tried to build on Ubuntu 24.04 LTS, RTX 4090, CUDA 12.4, cuDNN 8.9.7.

$ git submodule update --init --recursive
$ mkdir build && cd build
$ cmake .. -DBLAS_BACKEND=CUDNN
$ make -j 4
...
In file included from /home/yss/go/sayuri/src/utils/parse_float.cc:1:
/home/yss/go/sayuri/src/utils/parse_float.h:8:33: error: 'std::uint32_t' has not been declared
8 | bool MatchFloat32(float f, std::uint32_t n);

$ gcc --version
gcc (Ubuntu 13.2.0-23ubuntu4) 13.2.0

Add this one line to these two files seems ok.
#include <cstdint>
src/utils/parse_float.h
src/utils/filesystem.h

window10 compile error

C:\Users\mong\AppData\Local\Temp\ccCYLsP0.o:convolution.cc:(.text+0x0): multiple definition of `Convolution<1u>::Forward(unsigned long long, unsigned long long, unsigned long long, std::vector<float, std::allocator > const&, std::vector<float, std::allocator > const&, std::vector<float, std::allocator >&, std::vector<float, std::allocator >&)'
C:\Users\mong\AppData\Local\Temp\cc84MHSl.o:blas_forward_pipe.cc:(.text$ZN11ConvolutionILj1EE7ForwardEyyyRKSt6vectorIfSaIfEES5_RS3_S6[ZN11ConvolutionILj1EE7ForwardEyyyRKSt6vectorIfSaIfEES5_RS3_S6]+0x0): first defined here
collect2.exe: error: ld returned 1 exit status

In file included from /home/suerya/工程/Sayuri/src/pattern/gammas_dict.cc:1:
/home/suerya/工程/Sayuri/src/pattern/pattern.h:14:13: error: ‘uint64_t’ in namespace ‘std’ does not name a type
   14 | extern std::uint64_t PatternHash[8][4][kMaxPatternArea];
      |             ^~~~~~~~
In file included from /home/suerya/工程/Sayuri/src/pattern/gammas_dict.cc:2:
/home/suerya/工程/Sayuri/src/pattern/gammas_dict.h:9:26: error: ‘std::string’ has not been declared
    9 |     void Initialize(std::string filename);
      |                          ^~~~~~
/home/suerya/工程/Sayuri/src/pattern/gammas_dict.h:11:28: error: ‘std::uint64_t’ has not been declared
   11 |     bool ProbePattern(std::uint64_t hash, float &val) const;
      |                            ^~~~~~~~
/home/suerya/工程/Sayuri/src/pattern/gammas_dict.h:12:28: error: ‘std::uint64_t’ has not been declared
   12 |     bool ProbeFeature(std::uint64_t hash, float &val) const;
      |                            ^~~~~~~~
/home/suerya/工程/Sayuri/src/pattern/gammas_dict.h:15:29: error: ‘std::uint64_t’ has not been declared
   15 |     bool InsertPattern(std::uint64_t hash, float val);
      |                             ^~~~~~~~
/home/suerya/工程/Sayuri/src/pattern/gammas_dict.h:16:29: error: ‘std::uint64_t’ has not been declared
   16 |     bool InsertFeature(std::uint64_t hash, float val);
      |                             ^~~~~~~~
/home/suerya/工程/Sayuri/src/pattern/gammas_dict.h:18:29: error: ‘uint64_t’ is not a member of ‘std’
   18 |     std::unordered_map<std::uint64_t, float> pattern_dict_;
      |                             ^~~~~~~~
/home/suerya/工程/Sayuri/src/pattern/gammas_dict.h:18:29: error: ‘uint64_t’ is not a member of ‘std’
/home/suerya/工程/Sayuri/src/pattern/gammas_dict.h:18:44: error: template argument 1 is invalid
   18 |     std::unordered_map<std::uint64_t, float> pattern_dict_;
      |                                            ^
/home/suerya/工程/Sayuri/src/pattern/gammas_dict.h:18:44: error: template argument 3 is invalid
/home/suerya/工程/Sayuri/src/pattern/gammas_dict.h:18:44: error: template argument 4 is invalid
/home/suerya/工程/Sayuri/src/pattern/gammas_dict.h:18:44: error: template argument 5 is invalid
/home/suerya/工程/Sayuri/src/pattern/gammas_dict.h:19:29: error: ‘uint64_t’ is not a member of ‘std’
   19 |     std::unordered_map<std::uint64_t, float> feature_dict_;
      |                             ^~~~~~~~
/home/suerya/工程/Sayuri/src/pattern/gammas_dict.h:19:29: error: ‘uint64_t’ is not a member of ‘std’
/home/suerya/工程/Sayuri/src/pattern/gammas_dict.h:19:44: error: template argument 1 is invalid
   19 |     std::unordered_map<std::uint64_t, float> feature_dict_;
      |                                            ^
/home/suerya/工程/Sayuri/src/pattern/gammas_dict.h:19:44: error: template argument 3 is invalid
/home/suerya/工程/Sayuri/src/pattern/gammas_dict.h:19:44: error: template argument 4 is invalid
/home/suerya/工程/Sayuri/src/pattern/gammas_dict.h:19:44: error: template argument 5 is invalid
/home/suerya/工程/Sayuri/src/pattern/gammas_dict.cc:15:6: error: no declaration matches ‘void GammasDict::Initialize(std::string)’
   15 | void GammasDict::Initialize(std::string filename) {
      |      ^~~~~~~~~~
In file included from /home/suerya/工程/Sayuri/src/pattern/gammas_dict.cc:2:
/home/suerya/工程/Sayuri/src/pattern/gammas_dict.h:9:10: note: candidate is: ‘void GammasDict::Initialize(int)’
    9 |     void Initialize(std::string filename);
      |          ^~~~~~~~~~
/home/suerya/工程/Sayuri/src/pattern/gammas_dict.h:5:7: note: ‘class GammasDict’ defined here
    5 | class GammasDict {
      |       ^~~~~~~~~~
/home/suerya/工程/Sayuri/src/pattern/gammas_dict.cc:73:6: error: no declaration matches ‘bool GammasDict::ProbePattern(uint64_t, float&) const’
   73 | bool GammasDict::ProbePattern(std::uint64_t hash, float &val) const {
      |      ^~~~~~~~~~
In file included from /home/suerya/工程/Sayuri/src/pattern/gammas_dict.cc:2:
/home/suerya/工程/Sayuri/src/pattern/gammas_dict.h:11:10: note: candidate is: ‘bool GammasDict::ProbePattern(int, float&) const’
   11 |     bool ProbePattern(std::uint64_t hash, float &val) const;
      |          ^~~~~~~~~~~~
/home/suerya/工程/Sayuri/src/pattern/gammas_dict.h:5:7: note: ‘class GammasDict’ defined here
    5 | class GammasDict {
      |       ^~~~~~~~~~
/home/suerya/工程/Sayuri/src/pattern/gammas_dict.cc:82:6: error: no declaration matches ‘bool GammasDict::ProbeFeature(uint64_t, float&) const’
   82 | bool GammasDict::ProbeFeature(std::uint64_t hash, float &val) const {
      |      ^~~~~~~~~~
In file included from /home/suerya/工程/Sayuri/src/pattern/gammas_dict.cc:2:
/home/suerya/工程/Sayuri/src/pattern/gammas_dict.h:12:10: note: candidate is: ‘bool GammasDict::ProbeFeature(int, float&) const’
   12 |     bool ProbeFeature(std::uint64_t hash, float &val) const;
      |          ^~~~~~~~~~~~
/home/suerya/工程/Sayuri/src/pattern/gammas_dict.h:5:7: note: ‘class GammasDict’ defined here
    5 | class GammasDict {
      |       ^~~~~~~~~~
/home/suerya/工程/Sayuri/src/pattern/gammas_dict.cc:91:6: error: no declaration matches ‘bool GammasDict::InsertPattern(uint64_t, float)’
   91 | bool GammasDict::InsertPattern(std::uint64_t hash, float val) {
      |      ^~~~~~~~~~
In file included from /home/suerya/工程/Sayuri/src/pattern/gammas_dict.cc:2:
/home/suerya/工程/Sayuri/src/pattern/gammas_dict.h:15:10: note: candidate is: ‘bool GammasDict::InsertPattern(int, float)’
   15 |     bool InsertPattern(std::uint64_t hash, float val);
      |          ^~~~~~~~~~~~~
/home/suerya/工程/Sayuri/src/pattern/gammas_dict.h:5:7: note: ‘class GammasDict’ defined here
    5 | class GammasDict {
      |       ^~~~~~~~~~
/home/suerya/工程/Sayuri/src/pattern/gammas_dict.cc:100:6: error: no declaration matches ‘bool GammasDict::InsertFeature(uint64_t, float)’
  100 | bool GammasDict::InsertFeature(std::uint64_t hash, float val) {
      |      ^~~~~~~~~~
In file included from /home/suerya/工程/Sayuri/src/pattern/gammas_dict.cc:2:
/home/suerya/工程/Sayuri/src/pattern/gammas_dict.h:16:10: note: candidate is: ‘bool GammasDict::InsertFeature(int, float)’
   16 |     bool InsertFeature(std::uint64_t hash, float val);
      |          ^~~~~~~~~~~~~
/home/suerya/工程/Sayuri/src/pattern/gammas_dict.h:5:7: note: ‘class GammasDict’ defined here
    5 | class GammasDict {

是这个系统不满足什么需求吗？debian和ubuntu之间的什么差距才会导致这个问题发生呢？

Is there any option to remove all dead stones?

Hi,
I run cgf2023 weight on CGOS.
But It seems Sayuri passes without removing all dead stones.
http://www.yss-aya.com/cgos/viewer.cgi?19x19/SGF/2023/08/24/876080.sgf
Is there any option to remove all dead stones?

I use RTX 3090, cuda-11.2, Ubuntu 20.04.2, Ryzen 7 3700X 8-Core. Compile and command are

$ git clone https://github.com/CGLemon/Sayuri
$ cd Sayuri
$ git submodule update --init --recursive
$ cd ..
$ wget https://github.com/CGLemon/Sayuri/archive/refs/tags/cgf2023.tar.gz
$ tar xvf cgf2023.tar.gz
$ cd Sayuri-cgf2023
$ cp -p -r ../Sayuri/third_party ./
$ mkdir build && cd build
$ cmake .. -DBLAS_BACKEND=CUDA -DCMAKE_CUDA_ARCHITECTURES=75
$ make -j 6
$ ./Sayuri -t 4 -p 10000 -w ../network/cgf2023-swa.bin.txt

windows version

can we expect a windows version this year?