If I have
llm-chain-llama-sys = "0.12.3"
in Cargo.toml, the program runs fine, but
llm-chain-llama-sys = { version = "0.12.3", features = ["cuda"] }
causes it to segfault.
I tracked down where it happens by adding
.arg("-DCMAKE_BUILD_TYPE=Debug")
at llm-chain-llama-sys/build.rs:84
to tell CMake to build llama.cpp with debug symbols. Then, stepping through the program in gdb, I found that the segfault (valgrind says it tries to "Jump to the invalid address stated on the next line 0x0: ???") happens at what I believe is the first FFI call the program makes.
My Rust code calls llm_chain_llama::Executor::new_with_options(options), which eventually reaches llm-chain-llama/src/context.rs:42, an unsafe block that calls the FFI function llama_context_default_params, defined starting at llama.cpp/llama.cpp:864.
When gdb enters llama_context_default_params, running bt shows a correct backtrace leading back into the Rust program. But after stepping over the struct initialization, bt shows the program trying to return to 0x00000000. I assume this is because the stack frame is getting corrupted. The C++ function llama_context_default_params just returns a struct by value, so the struct size is probably what's wrong.
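To illustrate why a by-value return with disagreeing sizes trashes the stack, here is a minimal sketch (the field names and the array lengths are assumptions for illustration, not the real llama_context_params layout): the caller reserves space for the struct size *it* was compiled with, while the callee writes the struct size *it* was compiled with, clobbering whatever lies beyond the reserved slot.

```rust
// Sketch: the "same" struct as the non-CUDA and CUDA builds would see it.
// Fields and array lengths are hypothetical, for illustration only.
#[repr(C)]
struct ParamsNoCuda {
    seed: u32,
    n_ctx: i32,
    tensor_split: [f32; 1], // LLAMA_MAX_DEVICES == 1 without CUDA (assumed)
}

#[repr(C)]
struct ParamsCuda {
    seed: u32,
    n_ctx: i32,
    tensor_split: [f32; 16], // a larger LLAMA_MAX_DEVICES with CUDA (assumed)
}

fn main() {
    // If the Rust caller reserves size_of::<ParamsNoCuda>() bytes for the
    // return value but the C++ callee writes size_of::<ParamsCuda>() bytes
    // into it, the extra bytes overwrite adjacent stack memory -- often the
    // saved return address, which would explain the return to 0x0.
    println!("no-cuda size = {}", std::mem::size_of::<ParamsNoCuda>());
    println!("cuda size    = {}", std::mem::size_of::<ParamsCuda>());
    assert!(std::mem::size_of::<ParamsCuda>() > std::mem::size_of::<ParamsNoCuda>());
}
```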
I think I found the problem. Before the struct initialization at llama.cpp:864 I added the line
printf("HELLO FROM CPP ASDFASDF struct size is %zu\n", sizeof(llama_context_params));
and in my Rust code added
eprintln!("HELLO FROM RUST struct size = {}", std::mem::size_of::<llm_chain_llama_sys::llama_context_params>());
let executor = llm_chain_llama::Executor::new_with_options(options)?;
eprintln!("GOT PAST FUNCTION");
If I don't enable features = ["cuda"], they print the same size of 48:
HELLO FROM RUST struct size = 48
HELLO FROM CPP ASDFASDF struct size is 48
GOT PAST FUNCTION
... program runs fine
but if I do have features = ["cuda"]:
HELLO FROM RUST struct size = 48
HELLO FROM CPP ASDFASDF struct size is 112
Segmentation fault (core dumped)
which I think is the problem.
The struct llama_context_params is defined at llama.h:74, and I think the problem is that it has a member float tensor_split[LLAMA_MAX_DEVICES].
When cuda is enabled, build.rs:88 passes the build flag -DLLAMA_CUBLAS=ON, and at llama.h:5 some preprocessor #ifdefs change the value of LLAMA_MAX_DEVICES. That changes the size of the struct, so the compiled C++ and the generated Rust bindings disagree about its layout, the stack gets corrupted on return, and the program segfaults. I think it has something to do with how bindgen handles the preprocessor definitions — judging by the sizes above, the bindings were generated without the CUDA define.
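If that diagnosis is right, a fix would be to make bindgen parse llama.h with the same preprocessor definitions the CMake build uses. Here is a sketch of what that could look like in the -sys crate's build.rs, assuming it generates bindings with the bindgen crate; the define name GGML_USE_CUBLAS is an assumption and would need to be checked against what -DLLAMA_CUBLAS=ON actually causes llama.h to see:

```rust
// build.rs sketch (hypothetical): forward the CUDA define to bindgen so the
// generated Rust bindings compute the same LLAMA_MAX_DEVICES (and therefore
// the same struct sizes) as the compiled llama.cpp.
use std::path::PathBuf;

fn main() {
    let mut builder = bindgen::Builder::default().header("llama.cpp/llama.h");

    // Cargo exposes enabled features to build scripts via CARGO_FEATURE_<NAME>.
    if std::env::var("CARGO_FEATURE_CUDA").is_ok() {
        // Must match the macro that the CUDA build defines before llama.h
        // evaluates LLAMA_MAX_DEVICES (name assumed; verify in llama.h).
        builder = builder.clang_arg("-DGGML_USE_CUBLAS");
    }

    let bindings = builder.generate().expect("failed to generate bindings");
    let out = PathBuf::from(std::env::var("OUT_DIR").unwrap()).join("bindings.rs");
    bindings.write_to_file(out).expect("failed to write bindings");
}
```

With the define forwarded, the two sizes printed by the probes above should agree again.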