Comments (5)
I feel this is not unexpected behavior, even with the temperature set to 0. The tricky bit here is numerical stability: some of the CUDA algorithms may be non-deterministic, but even besides this, candle and PyTorch don't apply exactly the same ops, e.g. we accumulate with f32 in the softmax whereas PyTorch may well do something slightly different.
Overall, as the generated text seems legit, I would think it's fine, but I would not expect the generation or the generated logits to line up perfectly.
from candle.
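Not from the thread itself, but a minimal sketch of how accumulation precision alone can shift softmax outputs. The vocabulary size and logit values below are made up for illustration; the `acc_dtype` parameter stands in for whatever accumulator an implementation happens to use:

```python
import numpy as np

# Hypothetical logits for a large vocabulary; values are made up.
rng = np.random.default_rng(0)
logits = rng.normal(0.0, 4.0, size=32_000).astype(np.float32)

def softmax(x, acc_dtype):
    # Subtract the max for numerical stability, then accumulate the
    # normalizer in the given dtype, as an implementation might.
    shifted = (x - x.max()).astype(acc_dtype)
    exps = np.exp(shifted)
    return exps / exps.sum(dtype=acc_dtype)

p32 = softmax(logits, np.float32)
p64 = softmax(logits, np.float64)

# The two results agree only up to rounding error, not bit-for-bit.
print(np.abs(p32 - p64).max())
```

The difference is tiny per token, but it is exactly the kind of discrepancy that keeps logits from lining up exactly across frameworks.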
That makes sense! I suspected as much. My concern comes from consistently lower performance in my internal benchmarks (averaged across ~17 datasets), where candle scores 1% to 2% lower than the reference Python implementation on all tested models. However, I suppose there's no easy fix for that.
That's interesting, what is the benchmark, MMLU or something else? For MMLU, 1 or 2% seems within noise, but it's a bit annoying if it's consistently worse; it might be good to measure perplexity if that's not already what you're doing.
Overall, numerical differences can lead to lower performance, as PyTorch will be consistent between training and inference while we wouldn't be, but it's hard to say by how much. Any number you can put on this would be greatly appreciated (it may well be a bug on the candle side).
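One way to put a number on it, as suggested above, is to compare perplexity between the two implementations on the same text. A minimal sketch of perplexity computed from raw logits; the logits and target ids here are toy values, not real model output:

```python
import numpy as np

def perplexity(logits, targets):
    """Perplexity from raw logits, one row per sequence position."""
    # Log-softmax with the usual max-subtraction for stability.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Average negative log-likelihood of the observed tokens.
    nll = -log_probs[np.arange(len(targets)), targets].mean()
    return float(np.exp(nll))

# Toy example: 4 positions over a 3-token vocabulary.
logits = np.array([[2.0, 0.1, 0.1],
                   [0.1, 2.0, 0.1],
                   [0.1, 0.1, 2.0],
                   [2.0, 0.1, 0.1]])
targets = np.array([0, 1, 2, 0])
print(perplexity(logits, targets))
```

Running both implementations' logits through the same function like this isolates the numerical gap from any sampling or benchmark-harness differences.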
This is actually an interesting topic, thanks for sharing it @hugoabonizio. Even though numerical imprecision is naturally present across different implementations, I would expect these differences to be minimal and therefore have no impact on the actual token generation (the probabilities might differ slightly in precision, but the sampled token should be the same, assuming one fixes the random seed for sampling). Any thoughts on this @LaurentMazare @hugoabonizio?
@LaurentMazare Unfortunately, this result is based on an internal benchmark suite, and not all of the datasets are public. However, I'll try to run the same kind of evaluation on public datasets to make it reproducible.
@jorgeantonio21 I wouldn't expect sampled outputs to be identical, because many factors affecting the sampling process differ between the implementations. However, with greedy sampling I was expecting the results to match, since the output probabilities should (hopefully) be the same.
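A tiny illustration of why even greedy decoding can diverge: when the top two logits are nearly tied, a perturbation on the order of float32 rounding is enough to flip the argmax. The logit values below are made up:

```python
import numpy as np

# Two near-tied logits, as can happen at any decoding step.
logits_a = np.array([5.000001, 5.000000, 1.0], dtype=np.float32)
# The "same" logits after a slightly different op ordering.
logits_b = logits_a + np.array([-3e-6, 0.0, 0.0], dtype=np.float32)

# Greedy sampling picks the argmax; the tiny perturbation flips it.
print(int(np.argmax(logits_a)), int(np.argmax(logits_b)))
```

Once a single token flips, the whole continuation diverges, because every later step conditions on a different prefix, which is consistent with generations that look plausible but don't line up token-for-token.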
Related Issues (20)
- Improving the versatility of Tensor::slice_assign
- Automatically upcasting GGUF values
- Improve extracting values from `gguf_file::Value`
- [question] difference with tvm-unity / mlc-llm
- Linear layer with same weights, biases, and inputs gives different output than Pytorch
- Dynamic linking feature breaks pyo3 wrappers
- unsupported op_type STFT for op
- ONNX: MaxPool with pads != 0
- Qwen2: can not run with the latest Qwen2 models
- How to Implement New Operators Using CUDA Host Functions Along with Thrust and CUB Libraries
- Implement unfold function
- Implement torch.scatter
- Provide a simple Stable Diffusion 3 (SD3) inference example
- Status of Apple silicon M3 GPU support?
- Quantized-t5 models on Cuda
- WASM library examples require HTTPS
- How to select which GPU to use
- Metal memory leak multiplying matrices
- Is it advisable to avoid variable shadowing when using Candle?
- candle-flash-attn infinite compile time