huggingface / candle

Minimalist ML framework for Rust

License: Apache License 2.0

Rust 78.89% Cuda 4.42% Makefile 0.01% Python 4.40% HTML 2.98% JavaScript 1.07% C++ 2.70% Shell 0.03% Metal 5.50%

candle's People

Contributors

bayedieng, danielclough, danieldk, dependabot[bot], drbh, ericlbuehler, evgenyigumnov, fl33tw00d, grzuy, ivarflakstad, jamjamjon, janimo, jbochi, jorgeantonio21, kgrewal1, laurentmazare, lichin-lin, llukas22, lucasavila00, narsil, nkypy, olivierdehaene, patrickvonplaten, radames, rocketknight1, santiagomed, ssslakter, toluclassics, tomsanbear, yinqiwen


candle's Issues

Specify the device only once

This looks like a great project!

I have a question: why is it necessary to specify the device for every Tensor? Wouldn't it be possible to set the device once, so that all subsequent allocations are made on that device?
This would work like a global allocator in Rust: https://doc.rust-lang.org/std/alloc/index.html

The drawback is that you can't use multiple backends at the same time easily.

What are your thoughts?
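
For context, a minimal sketch of how allocations look today, with the device passed explicitly to each constructor (shapes and dtype are just illustrative):

use candle_core::{DType, Device, Result, Tensor};

fn main() -> Result<()> {
    // Every allocation currently names its device explicitly.
    let device = Device::Cpu;
    let a = Tensor::zeros((2, 3), DType::F32, &device)?;
    let b = Tensor::ones((2, 3), DType::F32, &device)?;
    println!("{}", (a + b)?);
    Ok(())
}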

Why is `candle_nn` not re-exported by `candle_core`?

It took me a while to figure out that I should add candle_nn as a separate dependency to get access to types such as VarBuilder. It was additionally confusing because the examples use crates with different names, such as candle instead of candle_core: https://github.com/huggingface/candle/blob/main/candle-examples/examples/mnist-training/main.rs#L7

Is the intent to have candle_nn as its own crate or is this an oversight? Please share your insight.
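
For reference, a minimal sketch of the setup that works, assuming candle-core and candle-nn are both declared as dependencies in Cargo.toml:

// Assumes Cargo.toml lists both crates separately, e.g.:
//   candle-core = "0.1"
//   candle-nn = "0.1"
use candle_core::{DType, Device};
use candle_nn::{VarBuilder, VarMap};

fn main() {
    let varmap = VarMap::new();
    // VarBuilder lives in candle_nn, not candle_core.
    let _vb = VarBuilder::from_varmap(&varmap, DType::F32, &Device::Cpu);
}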

Compiling candle-examples v0.1.0 error

warning: some crates are on edition 2021 which defaults to resolver = "2", but virtual workspaces default to resolver = "1"
note: to keep the current resolver, specify workspace.resolver = "1" in the workspace root's manifest
note: to use the edition 2021 resolver, specify workspace.resolver = "2" in the workspace root's manifest
Compiling candle-examples v0.1.0 (~/github.com/huggingface/candle/candle-examples)
error[E0308]: mismatched types
--> candle-examples/examples/whisper/main.rs:174:21
|
174 | .decode(tokens.clone(), true)
| ------ ^^^^^^^^^^^^^^ expected &[u32], found Vec<u32>
| |
| arguments to this method are incorrect
|
= note: expected reference &[u32]
found struct Vec<u32>
note: method defined here
--> ~/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.13.4/src/tokenizer/mod.rs:814:12
|
814 | pub fn decode(&self, ids: &[u32], skip_special_tokens: bool) -> Result {
| ^^^^^^
help: consider borrowing here
|
174 | .decode(&tokens.clone(), true)
| +

For more information about this error, try rustc --explain E0308.
error: could not compile candle-examples (example "whisper") due to previous error

Tensor::avg_pool2d not working as expected

I'm an ML beginner and a Rust beginner, and I don't know if there's something wrong with my usage or understanding, but the avg_pool2d function doesn't seem to work as expected!

main.rs

use candle_core::{Device, Tensor};
fn main() {
    let device = Device::Cpu;
    
    let data: Vec<f32> = vec![1., 1., 1., 1., 0., 0., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,];
    let t = Tensor::from_vec(data, (1, 1, 4, 4), &device).unwrap();
    
    let pool = t.avg_pool2d((2, 2), (2, 2)).unwrap();
    
    println!("{}", t.to_string());
    println!("{}", pool.to_string());
}

output

scale:0.25
sum:2
sum:2
sum:2
sum:2
[[[[1., 1., 1., 1.],
   [0., 0., 1., 1.],
   [1., 1., 1., 1.],
   [1., 1., 1., 1.]]]]
Tensor[[1, 1, 4, 4], f32]

[[[[0.5000, 0.5000],
   [0.5000, 0.5000]]]]
Tensor[[1, 1, 2, 2], f32]
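
For reference, averaging each non-overlapping 2x2 window by hand gives (1 + 1 + 0 + 0) / 4 = 0.5 for the top-left window and (1 + 1 + 1 + 1) / 4 = 1.0 for the other three, so the expected pooled output would be [[0.5, 1.0], [1.0, 1.0]] rather than 0.5 everywhere.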

Quick Tensor creation functions

Converting a Tensor to a vector is pretty straightforward. Is there an easy way to take a vector and convert it into a Tensor?
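
For reference, a minimal sketch of what already seems possible, going from a Vec (or a slice) to a Tensor:

use candle_core::{Device, Result, Tensor};

fn main() -> Result<()> {
    let device = Device::Cpu;
    // From an owned vector, with an explicit shape.
    let v: Vec<f32> = vec![1., 2., 3., 4., 5., 6.];
    let t = Tensor::from_vec(v, (2, 3), &device)?;
    // From a slice or array literal.
    let u = Tensor::new(&[1u32, 2, 3], &device)?;
    println!("{t}\n{u}");
    Ok(())
}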

What is the minimum required Intel MKL version?

Hello, Thanks for the great work!

I've got an error while compiling with the --features mkl option.
For example cargo install --git https://github.com/huggingface/candle.git candle-examples --examples bert -F mkl

The error said

  = note: /usr/bin/ld: /workspaces/Kuberian/searcher/target/debug/deps/libcandle_core-0afc8671b4dae8af.rlib(candle_core-0afc8671b4dae8af.candle_core.b11884625c01537d-cgu.13.rcgu.o): in function `candle_core::mkl::hgemm':
          /usr/local/cargo/git/checkouts/candle-0c2b4fa9e5801351/60cd155/candle-core/src/mkl.rs:162: undefined reference to `hgemm_'
          collect2: error: ld returned 1 exit status
          
  = note: some `extern` functions couldn't be found; some native libraries may need to be installed or have their path specified
  = note: use the `-l` flag to specify native libraries to link
  = note: use the `cargo:rustc-link-lib` directive to specify the native libraries to link with Cargo (see https://doc.rust-lang.org/cargo/reference/build-scripts.html#cargorustc-link-libkindname)

I initially thought that I did not install intel mkl libs properly, but I found that

  1. intel-mkl-src automatically downloads the required library from ghcr
  2. intel mkl 2020.01, which is automatically downloaded from there, simply does not implement hgemm, while it does implement sgemm and dgemm
  3. the latest version of intel mkl does implement hgemm

So I tried the latest version of intel mkl, but it seems intel-mkl-src does not support it.

I'm wondering which Intel MKL version you use in your development environment?

AMD hardware support for training and Inference

Hi,

This library is cool. Rust for deep learning is nice, and great work from Hugging Face. I am curious whether there are plans for AMD hardware support for training and inference.

Thanks

error: failed to run custom build command for candle-kernels

cargo run --example stable-diffusion --features cuda --features image -- --prompt "a rusty robot holding a fire torch"
Compiling candle-kernels v0.1.0 (HOME/candle/candle-kernels)
error: failed to run custom build command for candle-kernels v0.1.0 (HOME/candle/candle-kernels)
note: To improve backtraces for build dependencies, set the CARGO_PROFILE_DEV_BUILD_OVERRIDE_DEBUG=true environment variable to enable debug information generation.

Caused by:
process didn't exit successfully: HOME/candle/target/debug/build/candle-kernels-c1d996e014c93c27/build-script-build (exit status: 101)
--- stdout
cargo:rerun-if-changed=build.rs
cargo:rustc-env=CUDA_INCLUDE_DIR=/usr/include
cargo:rerun-if-changed=src/
cargo:rerun-if-env-changed=CUDA_COMPUTE_CAP
cargo:rustc-env=CUDA_COMPUTE_CAP=sm_75

--- stderr
src/compatibility.cuh(11): error: identifier "__hmax" is undefined

....

stderr

', candle-kernels/build.rs:207:13
stack backtrace:
0: rust_begin_unwind
at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/panicking.rs:593:5
1: core::panicking::panic_fmt
at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/core/src/panicking.rs:67:14
2: build_script_build::cuda::build_ptx
3: build_script_build::main
4: core::ops::function::FnOnce::call_once

Randn only generates positive numbers

I can't seem to get randn to generate any negative numbers. Any ideas what is happening here?

Code:

use anyhow::Result;
use candle_core::{Device, Tensor};

fn main() -> Result<()> {
    let n = 200;
    let t = Tensor::randn(0f32, 1f32, n, &Device::Cpu)?;

    let count = Tensor::sum_all(&Tensor::gt(&t, &Tensor::zeros_like(&t)?)?)?;

    println!("{count} out of {n} elements are > 0");

    Ok(())
}

Output:

[200]
Tensor[[], u8] out of 200 elements are > 0
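
Note that the value printed above is itself a rank-0 tensor rather than a number; a hedged sketch of extracting it as a plain scalar (assuming to_dtype and to_scalar are available in the candle version in use):

use candle_core::{DType, Device, Result, Tensor};

fn main() -> Result<()> {
    let n = 200usize;
    let t = Tensor::randn(0f32, 1f32, n, &Device::Cpu)?;

    // Cast the u8 comparison mask to f32 before summing so the count cannot
    // wrap, then extract the rank-0 result as a scalar for printing.
    let count = t
        .gt(&t.zeros_like()?)?
        .to_dtype(DType::F32)?
        .sum_all()?
        .to_scalar::<f32>()?;

    println!("{count} out of {n} elements are > 0");
    Ok(())
}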

Add community requirements

I don't think this project is at the point of needing a code of conduct, but setting up a contribution guide and a set of guidelines for reporting compilation errors would alleviate a lot of back-and-forth on this project in the future. Open to suggestions, and if this is not a concern please feel free to close the issue.

Potentially incorrect outputs for q4_0 llama 7B

When running the ggml example for llama, using llama-2-7b.ggmlv3.q4_0.bin, I get the following output:

My favorite theorem is 100% of the time. nobody knows what it means. everybody knows it. nobody knows it. nobody knows it. It's a theorem.
I's a theorem.
I's a theorem
I's a theorem
I'm a theorem
I'm a theorem
I'm a theorem
I'm a theorem
I'm a theorem
I'm a theorem
I'm a theorem
I'm a theorem

This doesn't seem correct, so there might be some mismatches in the operations. I'm not using any temperature setting.

Good Video Tutorials

It is possible that this will be the first framework for a lot of people entering the field. Would it be possible to create a video tutorial series, such as Andrej Karpathy's, for newcomers? This would improve adoption by a huge margin.

Equivalent of torch.add function

I am trying to write code similar to the following:

import torch
a = torch.ones((4, 8, 32))
b = torch.ones((8,32))
print(a+b)

For the following code:

use candle::*;
fn main() -> Result<()> {
    let a = Tensor::randn(0f32, 1., (4,8, 32), &Device::Cpu)?;
    let b = Tensor::randn(0f32, 1., (8, 32), &Device::Cpu)?;

    let c = a.add(&b)?;
    println!("{c}");
    Ok(())
}

I am getting this error. Am I missing something?

ShapeMismatchBinaryOp { lhs: [4, 8, 32], rhs: [8, 32], op: "add" }
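
A hedged sketch of the PyTorch-style broadcasting behaviour, assuming a broadcast_add method is available in the candle version in use:

use candle_core::{Device, Result, Tensor};

fn main() -> Result<()> {
    let a = Tensor::randn(0f32, 1., (4, 8, 32), &Device::Cpu)?;
    let b = Tensor::randn(0f32, 1., (8, 32), &Device::Cpu)?;

    // `add` expects identical shapes; `broadcast_add` mirrors `a + b` in PyTorch.
    let c = a.broadcast_add(&b)?;
    println!("{:?}", c.shape());
    Ok(())
}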

Connection Failed: tls connection init failed (os error 10054)

No audio file submitted: Downloading https://huggingface.co/datasets/Narsil/candle_demo/blob/main/samples_jfk.wav
Error: request error: https://huggingface.co/datasets/Narsil/candle-examples/resolve/main/samples_jfk.wav: Connection Failed: tls connection init failed: The remote host forcibly closed an existing connection(os error 10054)

Caused by:
0: https://huggingface.co/datasets/Narsil/candle-examples/resolve/main/samples_jfk.wav: Connection Failed: tls connection init failed: The remote host forcibly closed an existing connection (os error 10054)
1: The remote host forcibly closed an existing connection (os error 10054)
error: process didn't exit successfully: target\release\examples\whisper.exe (exit code: 1)

Does anybody know how to fix this?

WebGPU support

Is WebGPU support on the roadmap as an alternative GPU-accelerated backend? This would be especially useful for inference on the web or for non-CUDA environments.

License

Hi, is the License file missing?

Bigram Model

Hello there,

Newbie here. I am trying to reproduce the "Let's build GPT" lecture from Andrej Karpathy in candle. At the 31-minute mark in that video, he implements a bigram model using embeddings.

This is my Rust implementation:

#[derive(Debug)]
pub struct BigramLanguageModel {
    token_embedding_table: Embedding,
}

impl BigramLanguageModel {
    // Constructor
    pub fn new(vocab_size: usize) -> Result<Self> {
        let vb = candle_nn::VarBuilder::from_varmap(&candle_nn::VarMap::new(), DType::F32, &Device::Cpu);
        let token_embedding_table = embedding(vocab_size, vocab_size, vb)?;
        Ok(BigramLanguageModel {
            token_embedding_table,
        })
    }

    // Forward pass
    pub fn forward(&self, idx: &Tensor, targets: &Tensor) -> (Tensor, Tensor) {

        let logits = self.token_embedding_table.forward(idx);
        let logits = logits.unwrap();
        
        let shape = logits.shape().dims();
        let logits = logits.reshape(&[shape[0]*shape[1], shape[2]]).unwrap();
        
        println!("shape: {:?}", logits.shape());
        println!("targets shape: {:?}", targets.shape().dims()[0]);
        if targets.shape().dims()[0] != 1 {
            let targets = targets.reshape(&[shape[0]*shape[1]]).unwrap();
            let loss = cross_entropy(&logits, &targets).unwrap();
            (logits, loss)
        }else{
            let loss = Tensor::zeros((1, 1), DType::F32, &Device::Cpu).unwrap();
            (logits, loss)
        }
        
    }
}

But during training, the loss does not decrease from -ln(1/65). Is my implementation incorrect?

Also, do you have any tips you could give a newcomer to make adoption easier?

Generating the same sentence embedding for `all-MiniLM-L6-v2` using `candle`

// Apply some avg-pooling by taking the mean embedding value for all tokens (including padding)
let (_n_sentence, n_tokens, _hidden_size) = embeddings.dims3()?;
let embeddings = (embeddings.sum(1)? / (n_tokens as f64))?;
println!("pooled embeddings {:?}", embeddings.shape());

The BERT sentence embedding example is using a pooling strategy that generates a different sentence embedding compared to using either the HuggingFace API, or alternative ways of running the model locally.

I would be interested in getting the same result, and I suspect it's in the pooling strategy that should be used.

Any pointers would be helpful.

Thanks!
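
One thing worth checking (a hedged sketch, not necessarily the root cause): the sentence-transformers pipeline for all-MiniLM-L6-v2 L2-normalizes the pooled embeddings, so a comparison against the HuggingFace API usually needs the same step.

use candle_core::{Result, Tensor};

// Hypothetical helper: L2-normalize pooled sentence embeddings
// (shape [n_sentences, hidden_size]) along the hidden dimension.
fn l2_normalize(embeddings: &Tensor) -> Result<Tensor> {
    let norm = embeddings.sqr()?.sum_keepdim(1)?.sqrt()?;
    embeddings.broadcast_div(&norm)
}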

Llama2: starting a test server gives an error

Description

Following the example to start a test server for llama2 gives an error.

# README.MD
For llama2, run the following command to retrieve the weight files and start a test server:

cd candle-wasm-examples/llama2-c
wget https://karpathy.ai/llama2c/model.bin
wget https://github.com/karpathy/llama2.c/raw/master/tokenizer.bin
trunk serve --release --public-url /candle-llama2/ --port 8081
And then browse to http://localhost:8081/candle-llama2.

Expected behavior

Server starts successfully.

Current behavior

2023-08-10T08:44:41.267930Z  INFO 📦 starting build
2023-08-10T08:44:41.268424Z  INFO spawning asset pipelines
2023-08-10T08:44:41.268905Z ERROR ❌ error
error from HTML pipeline

Caused by:
    0: error getting canonical path for "/hosted/workspace/1_user/...../candle/candle-wasm-examples/llama2-c/tokenizer.json"
    1: No such file or directory (os error 2)
2023-08-10T08:44:41.269446Z  INFO 📡 serving static assets at -> /candle-llama2/
2023-08-10T08:44:41.269498Z  INFO 📡 server listening at http://127.0.0.1:8081

Steps to reproduce

git clone https://github.com/huggingface/candle.git

cd candle

cd candle-wasm-examples/llama2-c
wget https://karpathy.ai/llama2c/model.bin
wget https://github.com/karpathy/llama2.c/raw/master/tokenizer.bin
trunk serve --release --public-url /candle-llama2/ --port 8081

Diagnosis

  • tokenizer.json not found
# https://github.com/huggingface/candle/blob/main/candle-wasm-examples/llama2-c/index.html
# line no. 7
 <link data-trunk rel="copy-file" href="tokenizer.json" />

Is `compatibility.cuh` taken from dfdx?

I was reading through some of the issues and came across #353 which mentions compatibility.cuh. I remember writing a file with the same name for dfdx.

Was the file taken from dfdx and changed?
I am not worried about attribution. I don't think it is necessary as per the relevant licenses.
Instead, I am interested in whether anything was changed that dfdx could also benefit from.

Maybe there should be a separate library for cuda kernels so that both libraries could benefit from improvements and bug fixes. Let me know what you think.

Support for quantisation

rustformers/llm supports Q2 to Q8 quants in various flavors. Would it be possible to quantize the existing models and run them in this repo?

Compilation failed for flash-attn kernels

Hi, amazing job done here!

I am trying the example provided for llama-2 multi-device inference using the following command:

cargo run --example llama_multiprocess --release --features "cuda nccl flash-attn"

which yields the following error messages while building the candle-flash-attn crate:

  kernels/flash_fwd_hdim160_fp16_sm80.cu(26): here

  kernels/flash_fwd_kernel.h(325): error: argument list for class template "cute::Tensor" is missing
            detected during:
              instantiation of "void flash::compute_attn<Kernel_traits,Is_dropout,Is_causal,Is_even_N,Is_even_K,Return_softmax,Params>(const Params &) [with Kernel_traits=Flash_fwd_kernel_traits<160, 64, 64, 4, false, false, cutlass::half_t, Flash_kernel_traits<160, 64, 64, 4, cutlass::half_t>>, Is_dropout=true, Is_causal=true, Is_even_N=true, Is_even_K=true, Return_softmax=true, Params=Flash_fwd_params]"
  kernels/flash_fwd_launch_template.h(15): here
              instantiation of "void flash_fwd_kernel<Kernel_traits,Is_dropout,Is_causal,Is_even_N,Is_even_K,Return_softmax>(Flash_fwd_params) [with Kernel_traits=Flash_fwd_kernel_traits<160, 64, 64, 4, false, false, cutlass::half_t, Flash_kernel_traits<160, 64, 64, 4, cutlass::half_t>>, Is_dropout=true, Is_causal=true, Is_even_N=true, Is_even_K=true, Return_softmax=true]"
  kernels/flash_fwd_launch_template.h(34): here
              instantiation of "void run_flash_fwd<Kernel_traits,Is_dropout,Is_causal>(Flash_fwd_params &, cudaStream_t) [with Kernel_traits=Flash_fwd_kernel_traits<160, 64, 64, 4, false, false, cutlass::half_t, Flash_kernel_traits<160, 64, 64, 4, cutlass::half_t>>, Is_dropout=true, Is_causal=true]"
  kernels/flash_fwd_launch_template.h(155): here
              instantiation of "void run_mha_fwd_hdim160<T>(Flash_fwd_params &, cudaStream_t) [with T=cutlass::half_t]"
  kernels/flash_fwd_hdim160_fp16_sm80.cu(26): here

  Error limit reached.
  100 errors detected in the compilation of "kernels/flash_fwd_hdim160_fp16_sm80.cu".
  Compilation terminated.
  Error: nvcc error while compiling:

  # stdout


  # stderr

warning: build failed, waiting for other jobs to finish...

The NVIDIA-related environment is as follows:

bin /home/xuzhangda/.mamba/envs/llm/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda118.so
CUDA SETUP: CUDA runtime path found: /home/xuzhangda/.mamba/envs/llm/lib/libcudart.so.11.0
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 118
CUDA SETUP: Loading binary /home/xuzhangda/.mamba/envs/llm/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda118.so...

I am not sure why cute::Tensor is missing here; is it because the cutlass submodule is not up to date? The current commit hash is c4f6b8c. Thanks!

Add ONNX support

Any plans to support ONNX? An ONNX converter would be very helpful, but implementation details may vary.

I know that burn tries to generate Rust code from ONNX and then include it as a module. Codegen provides some performance benefits.

On the other hand, it is possible to create a model from ONNX at runtime, similar to tract.

Converting to ONNX also requires some effort, as branches and loops can introduce errors.

Can't run example - access rights problem

Execute: cargo run --example llama

I get this error:

Running on CPU, to run on GPU, build this example with --features cuda
loading the model weights from meta-llama/Llama-2-7b-hf
Error: request error: https://huggingface.co/meta-llama/Llama-2-7b-hf/resolve/main/tokenizer.json: status code 401

I have an account (https://huggingface.co/ievnsk), created a token, put it in the token file C:\Users\igumn.cache\huggingface\token, and executed again. I get a new error:

loading the model weights from meta-llama/Llama-2-7b-hf
Error: I/O error The client does not hold the required privileges. (os error 1314)

Caused by:
The client does not hold the required privileges. (os error 1314)

Stack backtrace:
0: backtrace::backtrace::dbghelp::trace
at C:\Users\igumn.cargo\registry\src\index.crates.io-6f17d22bba15001f\backtrace-0.3.68\src\backtrace\dbghelp.rs:98
1: backtrace::backtrace::trace_unsynchronizedanyhow::backtrace::capture::impl$4::create::closure_env$0
at C:\Users\igumn.cargo\registry\src\index.crates.io-6f17d22bba15001f\backtrace-0.3.68\src\backtrace\mod.rs:66
2: backtrace::backtrace::traceanyhow::backtrace::capture::impl$4::create::closure_env$0
at C:\Users\igumn.cargo\registry\src\index.crates.io-6f17d22bba15001f\backtrace-0.3.68\src\backtrace\mod.rs:53
3: anyhow::backtrace::capture::Backtrace::create
at C:\Users\igumn.cargo\registry\src\index.crates.io-6f17d22bba15001f\anyhow-1.0.72\src\backtrace.rs:216
4: anyhow::backtrace::capture::Backtrace::capture
at C:\Users\igumn.cargo\registry\src\index.crates.io-6f17d22bba15001f\anyhow-1.0.72\src\backtrace.rs:204
5: anyhow::error::impl$1::from<enum2$<hf_hub::api::sync::ApiError> >
at C:\Users\igumn.cargo\registry\src\index.crates.io-6f17d22bba15001f\anyhow-1.0.72\src\error.rs:547
6: core::result::impl$27::from_residual<tuple$<>,enum2$<hf_hub::api::sync::ApiError>,anyhow::Error>
at /rustc/eb26296b556cef10fb713a38f3d16b9886080f26\library\core\src\result.rs:1961
7: llama::main
at .\candle-examples\examples\llama\main.rs:168

Link website to github project

It took me a while to notice that this repository has a book 😓. Would it be possible to link the website to the GitHub project in the sidebar?

Error while linking with accelerate on mac

I enabled the accelerate feature in my Cargo.toml file

[dependencies]
candle = { path = "../candle/candle-core", version = "0.1.0", package = "candle-core", features=["accelerate"]}
candle-datasets = { path = "../candle/candle-datasets", version = "0.1.0" }
candle-nn = { path = "../candle/candle-nn", version = "0.1.0" }
candle-transformers = { path = "../candle/candle-transformers", version = "0.1.0" }
safetensors = "*"
serde = "*"
serde_json = "*"
num-traits = "*"
half = "*"
rand =  "*"
rand_chacha =  "*"

I am getting the following error

Undefined symbols for architecture arm64:
            "_dgemm_", referenced from:
                candle_core::accelerate::dgemm::h1b71a038552bcabe in libcandle_core-8c2363c344682bad.rlib(candle_core-8c2363c344682bad.3cylqiepw2bvor3t.rcgu.o)
            "_sgemm_", referenced from:
                candle_core::accelerate::sgemm::h2cf21c592cba3c47 in libcandle_core-8c2363c344682bad.rlib(candle_core-8c2363c344682bad.3cylqiepw2bvor3t.rcgu.o)
          ld: symbol(s) not found for architecture arm64

Am I doing something wrong?

request error: https://huggingface.co/meta-llama/Llama-2-7b-hf/resolve/main/tokenizer.json

$ cargo run --example llama --release
Finished release [optimized] target(s) in 0.09s
Running target/release/examples/llama
Running on CPU, to run on GPU, build this example with --features cuda
loading the model weights from meta-llama/Llama-2-7b-hf
Error: request error: https://huggingface.co/meta-llama/Llama-2-7b-hf/resolve/main/tokenizer.json: status code 401

Caused by:
https://huggingface.co/meta-llama/Llama-2-7b-hf/resolve/main/tokenizer.json: status code 401

Stack backtrace:
0: llama::main
1: std::sys_common::backtrace::__rust_begin_short_backtrace
2: std::rt::lang_start::{{closure}}
3: core::ops::function::impls::<impl core::ops::function::FnOnce for &F>::call_once
at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/core/src/ops/function.rs:284:13
std::panicking::try::do_call
at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/panicking.rs:500:40
std::panicking::try
at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/panicking.rs:464:19
std::panic::catch_unwind
at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/panic.rs:142:14
std::rt::lang_start_internal::{{closure}}
at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/rt.rs:148:48
std::panicking::try::do_call
at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/panicking.rs:500:40
std::panicking::try
at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/panicking.rs:464:19
std::panic::catch_unwind
at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/panic.rs:142:14
std::rt::lang_start_internal
at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/rt.rs:148:20
4: main
5: __libc_start_call_main
at ./csu/../sysdeps/nptl/libc_start_call_main.h:58:16
6: __libc_start_main_impl
at ./csu/../csu/libc-start.c:392:3
7: _start

Equivalent to pytorch tril / masked_fill

How would you implement the following PyTorch code in candle?

wei = torch.ones(T,T)
tril = torch.tril(torch.ones(T,T))
wei = wei.masked_fill(tril==0, float('-inf'))
wei = F.softmax(wei, dim=-1)
out = wei @ x
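
A hedged sketch of one way to express this with candle's current ops, building the lower-triangular mask by hand and assuming where_cond, Tensor::full, and candle_nn::ops::softmax are available:

use candle_core::{DType, Device, Result, Tensor};

fn main() -> Result<()> {
    let t = 4usize;
    let device = Device::Cpu;

    // Hand-built lower-triangular 0/1 mask, standing in for torch.tril.
    let mask: Vec<u8> = (0..t)
        .flat_map(|i| (0..t).map(move |j| u8::from(j <= i)))
        .collect();
    let mask = Tensor::from_vec(mask, (t, t), &device)?;

    // wei = ones(T, T), -inf above the diagonal, then softmax over each row.
    let wei = Tensor::ones((t, t), DType::F32, &device)?;
    let neg_inf = Tensor::full(f32::NEG_INFINITY, (t, t), &device)?;
    let wei = mask.where_cond(&wei, &neg_inf)?;
    let wei = candle_nn::ops::softmax(&wei, 1)?;

    let x = Tensor::randn(0f32, 1f32, (t, 2), &device)?;
    let out = wei.matmul(&x)?;
    println!("{out}");
    Ok(())
}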

Support for signed integer

It would be useful to have a signed integer DType.
candle_core::dtype already supports many float types and unsigned integers, but there is no signed integer option.
I suggest we add i32.

Parameter Grouping

Optimizers like LARS and LAMB do per-layer weight updates. Is there functionality to group certain parameters together? The PyTorch equivalent would be nn.ModuleDict.

Version 0.2.0 Features

  • Having Q4 (GGML like) inference
  • Stable Diffusion example (not necessarily highly optimized, but functional on CUDA + CPU)
    • Make it reasonably fast (CUDA at least :) )
  • More unit tests
  • More benchmark points, (M1, CPU, MKL, CUDA, WASM) x (Whisper, Llama2) at least
  • Optimize CPU backend
  • Better errors for older GPUs (#353)
  • More complete book (Training loop, adding Cuda debugging/profiling section, adding CPU tracing information at least)
  • VarBuilder reconsideration (We know it's not a great abstraction, we just don't have a good replacement idea yet)

Using your own fine tuned weights

How do you provide a path to your own fine-tuned weights for the other models? It's clear how to do this for llama2, but not for the rest.

Contributing

Cool project, I would love to contribute in my free time. What needs to be done at this point?

Error while building on arm64 due to candle-gemm-f16

I'm using candle-core and candle-nn as dependencies and I cannot build my project on an arm64 machine (version 0.1.1).

I have only included the final part of the log.

All the errors seem to point to the file candle-gemm-f16/src/microkernel.rs.

...
note: instantiated into assembly here
   --> <inline asm>:1:2
    |
1   |     fmul v4.8h, v8.8h, v3.8h
    |     ^

error: instruction requires: fullfp16
   --> /code/vendor/candle-gemm-f16/src/microkernel.rs:364:18
    |
364 |                 "fmul {0:v}.8h, {1:v}.8h, {2:v}.8h",
    |                  ^
    |
note: instantiated into assembly here
   --> <inline asm>:1:2
    |
1   |     fmul v2.8h, v8.8h, v1.8h
    |     ^

error: instruction requires: fullfp16
   --> /code/vendor/candle-gemm-f16/src/microkernel.rs:364:18
    |
364 |                 "fmul {0:v}.8h, {1:v}.8h, {2:v}.8h",
    |                  ^
    |
note: instantiated into assembly here
   --> <inline asm>:1:2
    |
1   |     fmul v1.8h, v8.8h, v0.8h
    |     ^

...

error: could not compile `candle-gemm-f16` (lib) due to 1141 previous errors
warning: build failed, waiting for other jobs to finish...

There seems to be no way to disable f16 support; please let me know if I'm wrong. That would be nice since I'm not using it, but I don't know if there is another way to compile the project for arm/arm64 devices.

Thank you for the help.

Serialize/Deserialize Varmap in memory

Is there any way to serialize a VarMap instance into a Vec<u8> instead of writing to a file? If not, can it be added?

I'm using candle 0.1.0 and I'm unable to accomplish that.

Thank you in advance.

Support for broadcast matmul

If I understand correctly, for now matmul needs the batch dimensions to be exactly the same. Taken from the candle_core::tensor doc:
b1, b2, ..., bi, m, k x b1, b2, ..., bi, k, n -> b1, b2, ..., bi, m, n
This is batch matrix multiplication (like torch.bmm).

It would be great to support broadcast, as in torch.matmul: "For example, if input is a (j×1×n×n) tensor and other is a (k×n×n) tensor, out will be a (j×k×n×n) tensor."
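
In the meantime, a hedged sketch of emulating the broadcasting behaviour by expanding both operands to a common batch shape first (assuming broadcast_as and contiguous are available; shapes follow the torch.matmul example):

use candle_core::{Device, Result, Tensor};

fn main() -> Result<()> {
    let (j, k, n) = (2usize, 3usize, 4usize);
    let device = Device::Cpu;

    let a = Tensor::randn(0f32, 1f32, (j, 1, n, n), &device)?;
    let b = Tensor::randn(0f32, 1f32, (k, n, n), &device)?;

    // Expand both operands to the common batch shape (j, k, n, n), make them
    // contiguous, then use the existing equal-batch-dims matmul.
    let a = a.broadcast_as((j, k, n, n))?.contiguous()?;
    let b = b.broadcast_as((j, k, n, n))?.contiguous()?;
    let out = a.matmul(&b)?;
    println!("{:?}", out.shape()); // (j, k, n, n)
    Ok(())
}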

whisper example: Error: No such file or directory (os error 2) when binary copied to another directory

The whisper example runs fine from ./target:

$ ./target/release/examples/whisper --input ~/git/whisper-burn/output.wav
Running on CPU, to run on GPU, build this example with `--features cuda`
loaded mel filters [80, 201]
loaded wav data: Header { audio_format: 1, channel_count: 1, sampling_rate: 16000, bytes_per_second: 32000, bytes_per_sample: 2, bits_per_sample: 16 }
pcm data loaded 131413
loaded mel: [1, 80, 3000]
audio features: [1, 1500, 384]
3000: Segment { start: 0.0, duration: 30.0, dr: DecodingResult { tokens: [50257, 50363, 770, 318, 281, 1672, 3809, 8296, 1223, 1244, 1682, 910, 611, 314, 8296, 1223, 319, 616, 2342, 11, 314, 892, 428, 318, 703, 340, 561, 1210, 503, 13, 50763, 50256], text: " This is an example voice recording something might actually say if I recording something on my watch, I think this is how it would turn out.", avg_logprob: -0.37165226448053545, no_speech_prob: 0.09571712464094162, temperature: 0.0, compression_ratio: NaN } }, in 3.080288417s

But it crashes with an unfound file when I run from another directory:

$ RUST_BACKTRACE=full ./whisper --input ~/git/whisper-burn/output.wav
Running on CPU, to run on GPU, build this example with `--features cuda`
Error: No such file or directory (os error 2)

Stack backtrace:
   0: backtrace::backtrace::libunwind::trace
             at /Users/n8henrie/.cargo/registry/src/index.crates.io-6f17d22bba15001f/backtrace-0.3.68/src/backtrace/libunwind.rs:93:5
      backtrace::backtrace::trace_unsynchronized
             at /Users/n8henrie/.cargo/registry/src/index.crates.io-6f17d22bba15001f/backtrace-0.3.68/src/backtrace/mod.rs:66:5
   1: backtrace::backtrace::trace
             at /Users/n8henrie/.cargo/registry/src/index.crates.io-6f17d22bba15001f/backtrace-0.3.68/src/backtrace/mod.rs:53:14
   2: anyhow::backtrace::capture::Backtrace::create
             at /Users/n8henrie/.cargo/registry/src/index.crates.io-6f17d22bba15001f/anyhow-1.0.72/src/backtrace.rs:216:13
   3: anyhow::backtrace::capture::Backtrace::capture
             at /Users/n8henrie/.cargo/registry/src/index.crates.io-6f17d22bba15001f/anyhow-1.0.72/src/backtrace.rs:204:17
   4: anyhow::error::<impl core::convert::From<E> for anyhow::Error>::from
             at /Users/n8henrie/.cargo/registry/src/index.crates.io-6f17d22bba15001f/anyhow-1.0.72/src/error.rs:547:25
   5: <core::result::Result<T,F> as core::ops::try_trait::FromResidual<core::result::Result<core::convert::Infallible,E>>>::from_residual
             at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/core/src/result.rs:1961:27
   6: whisper::main
             at /Users/n8henrie/git/candle/candle-examples/examples/whisper/main.rs:304:32
   7: core::ops::function::FnOnce::call_once
             at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/core/src/ops/function.rs:250:5
   8: std::sys_common::backtrace::__rust_begin_short_backtrace
             at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/sys_common/backtrace.rs:135:18
   9: std::rt::lang_start::{{closure}}
             at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/rt.rs:166:18
  10: core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once
             at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/core/src/ops/function.rs:284:13
      std::panicking::try::do_call
             at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/panicking.rs:500:40
      std::panicking::try
             at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/panicking.rs:464:19
      std::panic::catch_unwind
             at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/panic.rs:142:14
      std::rt::lang_start_internal::{{closure}}
             at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/rt.rs:148:48
      std::panicking::try::do_call
             at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/panicking.rs:500:40
      std::panicking::try
             at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/panicking.rs:464:19
      std::panic::catch_unwind
             at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/panic.rs:142:14
      std::rt::lang_start_internal
             at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/rt.rs:148:20
  11: std::rt::lang_start
             at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/rt.rs:165:17
  12: _main

Is there a way to run this as a standalone binary? Are there other files I could include via include_bytes! or the like?
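
On the last question, a hedged sketch of the include_bytes! route; the filename is a placeholder assumption for whichever asset the example loads from a relative path at runtime:

// Hypothetical: embed the asset in the binary instead of reading it from a
// path relative to the working directory at runtime.
const MEL_FILTERS: &[u8] = include_bytes!("melfilters.bytes");

fn main() {
    println!("embedded {} bytes of mel filters", MEL_FILTERS.len());
}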

Trainable batch normalization

I am trying to translate some code I wrote with tch-rs into candle as an experiment to see what the library is like.
It looks like I stumbled into a roadblock almost immediately. I have a convolutional neural network made up of many residual blocks. Each residual block internally uses batch normalization.

In tch-rs, I could use nn::batch_norm_2d. Is batch normalization not implemented in candle yet?

Cannot run examples with --features cuda option

CARGO_PROFILE_RELEASE_BUILD_OVERRIDE_DEBUG=true
warning: some crates are on edition 2021 which defaults to resolver = "2", but virtual workspaces default to resolver = "1"
note: to keep the current resolver, specify workspace.resolver = "1" in the workspace root's manifest
note: to use the edition 2021 resolver, specify workspace.resolver = "2" in the workspace root's manifest
Compiling libc v0.2.147
Compiling autocfg v1.1.0
Compiling crossbeam-utils v0.8.16
Compiling proc-macro2 v1.0.66
Compiling unicode-ident v1.0.11
Compiling rayon-core v1.11.0
Compiling memchr v2.5.0
Compiling libm v0.2.7
Compiling cfg-if v1.0.0
Compiling pkg-config v0.3.27
Compiling paste v1.0.14
Compiling serde v1.0.183
Compiling serde_derive v1.0.183
Compiling scopeguard v1.2.0
Compiling syn v1.0.109
Compiling serde_json v1.0.104
Compiling seq-macro v0.3.5
Compiling vcpkg v0.2.15
Compiling crc32fast v1.3.2
Compiling ident_case v1.0.1
Compiling strsim v0.10.0
Compiling fnv v1.0.7
Compiling thiserror v1.0.44
Compiling either v1.9.0
Compiling glob v0.3.1
Compiling openssl v0.10.56
Compiling rustls v0.21.6
Compiling anyhow v1.0.72
Compiling cudarc v0.9.13
Compiling portable-atomic v1.4.2
Compiling native-tls v0.2.11
Compiling esaxx-rs v0.1.8
Compiling adler v1.0.2
Compiling rustix v0.38.7
Compiling gimli v0.27.3
Compiling macro_rules_attribute-proc_macro v0.1.3
Compiling rustc-demangle v0.1.23
Compiling miniz_oxide v0.7.1
Compiling heck v0.4.1
Compiling flate2 v1.0.26
Compiling memoffset v0.9.0
Compiling crossbeam-epoch v0.9.15
Compiling num-traits v0.2.16
Compiling zip v0.6.6
Compiling crossbeam-channel v0.5.8
Compiling aho-corasick v1.0.2
Compiling object v0.31.1
Compiling nom v7.1.3
Compiling aho-corasick v0.7.20
Compiling quote v1.0.32
Compiling macro_rules_attribute v0.1.3
Compiling syn v2.0.28
Compiling crossbeam-deque v0.8.3
Compiling num_cpus v1.16.0
Compiling getrandom v0.2.10
Compiling dirs-sys v0.4.1
Compiling console v0.15.7
Compiling memmap2 v0.7.1
Compiling regex-automata v0.3.6
Compiling cc v1.0.82
Compiling dirs v5.0.1
Compiling rand_core v0.6.4
Compiling num-complex v0.4.3
Compiling rand_chacha v0.3.1
Compiling indicatif v0.17.6
Compiling rand v0.8.5
Compiling addr2line v0.20.0
Compiling rayon v1.7.0
Compiling is-terminal v0.4.9
Compiling ring v0.16.20
Compiling openssl-sys v0.9.91
Compiling rand_distr v0.4.3
Compiling backtrace v0.3.68
Compiling onig_sys v69.8.1
Compiling anstream v0.3.2
Compiling clap_builder v4.3.21
Compiling half v2.3.1
Compiling spm_precompiled v0.1.4
Compiling regex v1.9.3
Compiling darling_core v0.14.4
Compiling fancy-regex v0.10.0
Compiling candle-kernels v0.1.0 (/mnt/source1/djbGR/ruststuffs/candle/candle-kernels)
Compiling candle-gemm-common v0.15.5
Compiling rayon-cond v0.1.0
Compiling candle-gemm-f32 v0.15.5
Compiling candle-gemm-f64 v0.15.5
Compiling candle-gemm-c64 v0.15.5
Compiling candle-gemm-c32 v0.15.5
Compiling safetensors v0.3.2
Compiling candle-examples v0.1.0 (/mnt/source1/djbGR/ruststuffs/candle/candle-examples)
Compiling tracing-chrome v0.7.1
Compiling candle-gemm-f16 v0.15.5
error: failed to run custom build command for candle-kernels v0.1.0 (/mnt/source1/djbGR/ruststuffs/candle/candle-kernels)

Caused by:
process didn't exit successfully: /mnt/source1/djbGR/ruststuffs/candle/target/release/build/candle-kernels-e21ab5b8e8daaf0a/build-script-build (exit status: 101)
--- stdout
cargo:rerun-if-changed=build.rs
cargo:rustc-env=CUDA_INCLUDE_DIR=/usr/local/cuda/include
cargo:rerun-if-changed=src/
cargo:rerun-if-env-changed=CUDA_COMPUTE_CAP
cargo:rustc-env=CUDA_COMPUTE_CAP=sm_61

--- stderr
src/compatibility.cuh(19): error: function "__hmax_nan(__half, __half)" has already been defined
__attribute__((device)) inline __attribute__((always_inline)) __half __hmax_nan(__half a, __half b) {
^

src/compatibility.cuh(22): error: function "__hmin_nan(__half, __half)" has already been defined
__attribute__((device)) inline __attribute__((always_inline)) __half __hmin_nan(__half a, __half b) {
^

src/compatibility.cuh(19): error: function "__hmax_nan(__half, __half)" has already been defined
__attribute__((device)) inline __attribute__((always_inline)) __half __hmax_nan(__half a, __half b) {
^

src/compatibility.cuh(22): error: function "__hmin_nan(__half, __half)" has already been defined
__attribute__((device)) inline __attribute__((always_inline)) __half __hmin_nan(__half a, __half b) {
^

src/compatibility.cuh(19): error: function "__hmax_nan(__half, __half)" has already been defined
__attribute__((device)) inline __attribute__((always_inline)) __half __hmax_nan(__half a, __half b) {
^

src/compatibility.cuh(22): error: function "__hmin_nan(__half, __half)" has already been defined
__attribute__((device)) inline __attribute__((always_inline)) __half __hmin_nan(__half a, __half b) {
^

src/compatibility.cuh(19): error: function "__hmax_nan(__half, __half)" has already been defined
__attribute__((device)) inline __attribute__((always_inline)) __half __hmax_nan(__half a, __half b) {
^

src/compatibility.cuh(22): error: function "__hmin_nan(__half, __half)" has already been defined
__attribute__((device)) inline __attribute__((always_inline)) __half __hmin_nan(__half a, __half b) {
^

src/compatibility.cuh(19): error: function "__hmax_nan(__half, __half)" has already been defined
__attribute__((device)) inline __attribute__((always_inline)) __half __hmax_nan(__half a, __half b) {
^

src/compatibility.cuh(22): error: function "__hmin_nan(__half, __half)" has already been defined
__attribute__((device)) inline __attribute__((always_inline)) __half __hmin_nan(__half a, __half b) {
^

2 errors detected in the compilation of "src/indexing.cu".
src/compatibility.cuh(19): error: function "__hmax_nan(__half, __half)" has already been defined
__attribute__((device)) inline __attribute__((always_inline)) __half __hmax_nan(__half a, __half b) {
^

src/compatibility.cuh(22): error: function "__hmin_nan(__half, __half)" has already been defined
__attribute__((device)) inline __attribute__((always_inline)) __half __hmin_nan(__half a, __half b) {
^

2 errors detected in the compilation of "src/affine.cu".
src/compatibility.cuh(19): error: function "__hmax_nan(__half, __half)" has already been defined
__attribute__((device)) inline __attribute__((always_inline)) __half __hmax_nan(__half a, __half b) {
^

src/compatibility.cuh(22): error: function "__hmin_nan(__half, __half)" has already been defined
__attribute__((device)) inline __attribute__((always_inline)) __half __hmin_nan(__half a, __half b) {
^

2 errors detected in the compilation of "src/cast.cu".
2 errors detected in the compilation of "src/reduce.cu".
2 errors detected in the compilation of "src/conv.cu".
src/compatibility.cuh(19): error: function "__hmax_nan(__half, __half)" has already been defined
__attribute__((device)) inline __attribute__((always_inline)) __half __hmax_nan(__half a, __half b) {
^

src/compatibility.cuh(22): error: function "__hmin_nan(__half, __half)" has already been defined
__attribute__((device)) inline __attribute__((always_inline)) __half __hmin_nan(__half a, __half b) {
^

2 errors detected in the compilation of "src/ternary.cu".
2 errors detected in the compilation of "src/unary.cu".
2 errors detected in the compilation of "src/binary.cu".
thread 'main' panicked at 'nvcc error while compiling "src/affine.cu":

stdout

stderr

', candle-kernels/build.rs:207:13
stack backtrace:
0: 0x557f8498d0b1 - std::backtrace_rs::backtrace::libunwind::trace::hb01a67340c9cfb71
at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/std/src/../../backtrace/src/backtrace/libunwind.rs:93:5
1: 0x557f8498d0b1 - std::backtrace_rs::backtrace::trace_unsynchronized::h896aca561948c930
at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
2: 0x557f8498d0b1 - std::sys_common::backtrace::_print_fmt::h8627be5b68fbde29
at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/std/src/sys_common/backtrace.rs:65:5
3: 0x557f8498d0b1 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h1b7758da45f4cd22
at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/std/src/sys_common/backtrace.rs:44:22
4: 0x557f849b282c - core::fmt::rt::Argument::fmt::h0eb38586043a01ca
at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/core/src/fmt/rt.rs:138:9
5: 0x557f849b282c - core::fmt::write::h68b52f8aa598961e
at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/core/src/fmt/mod.rs:1094:21
6: 0x557f8498949e - std::io::Write::write_fmt::hc5568929b662da92
at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/std/src/io/mod.rs:1714:15
7: 0x557f8498cec5 - std::sys_common::backtrace::_print::h65aecbff12ca83c8
at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/std/src/sys_common/backtrace.rs:47:5
8: 0x557f8498cec5 - std::sys_common::backtrace::print::hf75ac9d60598d247
at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/std/src/sys_common/backtrace.rs:34:9
9: 0x557f8498e483 - std::panicking::default_hook::{{closure}}::hc2cb8da3be7476b0
at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/std/src/panicking.rs:269:22
10: 0x557f8498e19d - std::panicking::default_hook::hefa49c86da66275b
at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/std/src/panicking.rs:288:9
11: 0x557f8498ea09 - std::panicking::rust_panic_with_hook::hd4c3b0056ba96951
at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/std/src/panicking.rs:705:13
12: 0x557f8498e907 - std::panicking::begin_panic_handler::{{closure}}::he487675683e9a525
at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/std/src/panicking.rs:597:13
13: 0x557f8498d516 - std::sys_common::backtrace::__rust_end_short_backtrace::hcff58b9b81620321
at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/std/src/sys_common/backtrace.rs:151:18
14: 0x557f8498e652 - rust_begin_unwind
at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/std/src/panicking.rs:593:5
15: 0x557f848b9333 - core::panicking::panic_fmt::h1b81548733a03bd5
at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/core/src/panicking.rs:67:14
16: 0x557f848c3323 - build_script_build::cuda::build_ptx::ha488acce3cd701b3
at /mnt/source1/djbGR/ruststuffs/candle/candle-kernels/build.rs:207:13
17: 0x557f848c0878 - build_script_build::main::h2523e6c20b65fa04
at /mnt/source1/djbGR/ruststuffs/candle/candle-kernels/build.rs:6:33
18: 0x557f848d40cb - core::ops::function::FnOnce::call_once::h385ddf31127d3e12
at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/core/src/ops/function.rs:250:5
19: 0x557f848ccbae - std::sys_common::backtrace::__rust_begin_short_backtrace::h1cfd550c72c3e194
at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/std/src/sys_common/backtrace.rs:135:18
20: 0x557f848e0130 - std::rt::lang_start::{{closure}}::h70dc5fa7783a03f7
at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/std/src/rt.rs:166:18
21: 0x557f8498541b - core::ops::function::impls::<impl core::ops::function::FnOnce for &F>::call_once::h9eccf02cf11756f6
at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/core/src/ops/function.rs:284:13
22: 0x557f8498541b - std::panicking::try::do_call::hc95b838862bbb45a
at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/std/src/panicking.rs:500:40
23: 0x557f8498541b - std::panicking::try::h82935254d12a76fc
at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/std/src/panicking.rs:464:19
24: 0x557f8498541b - std::panic::catch_unwind::h7fd9d11cd70fc350
at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/std/src/panic.rs:142:14
25: 0x557f8498541b - std::rt::lang_start_internal::{{closure}}::h0ddb191e68b650a4
at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/std/src/rt.rs:148:48
26: 0x557f8498541b - std::panicking::try::do_call::h17d4693c7a6e120c
at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/std/src/panicking.rs:500:40
27: 0x557f8498541b - std::panicking::try::h684fc020e1305912
at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/std/src/panicking.rs:464:19
28: 0x557f8498541b - std::panic::catch_unwind::h757da538db515116
at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/std/src/panic.rs:142:14
29: 0x557f8498541b - std::rt::lang_start_internal::ha6b1625a1e9a4f5b
at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/std/src/rt.rs:148:20
30: 0x557f848e010a - std::rt::lang_start::h0d1360f20fc735dd
at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/std/src/rt.rs:165:17
31: 0x557f848c43fe - main
32: 0x7fd8be429d90 - __libc_start_call_main
at ./csu/../sysdeps/nptl/libc_start_call_main.h:58:16
33: 0x7fd8be429e40 - __libc_start_main_impl
at ./csu/../csu/libc-start.c:392:3
34: 0x557f848b9a15 - _start
35: 0x0 -

Cannot run llama example: access to source requires login credentials

cargo run --example llama --release
warning: some crates are on edition 2021 which defaults to resolver = "2", but virtual workspaces default to resolver = "1"
note: to keep the current resolver, specify workspace.resolver = "1" in the workspace root's manifest
note: to use the edition 2021 resolver, specify workspace.resolver = "2" in the workspace root's manifest
Finished release [optimized] target(s) in 0.17s
Running target/release/examples/llama
Running on CPU, to run on GPU, build this example with --features cuda
loading the model weights from meta-llama/Llama-2-7b-hf
Error: request error: https://huggingface.co/meta-llama/Llama-2-7b-hf/resolve/main/tokenizer.json: status code 401

Caused by:
https://huggingface.co/meta-llama/Llama-2-7b-hf/resolve/main/tokenizer.json: status code 401

llama convert_checkpoint error

all_dicts = {k: v.numpy() for k, v in all_dicts.items()}

Traceback (most recent call last):
  File "/home/zhengwu/workspace/Github/candle/candle-examples/examples/llama/convert_checkpoint.py", line 199, in <module>
    main()
  File "/home/zhengwu/workspace/Github/candle/candle-examples/examples/llama/convert_checkpoint.py", line 191, in main
    write_model(
  File "/home/zhengwu/workspace/Github/candle/candle-examples/examples/llama/convert_checkpoint.py", line 173, in write_model
    all_dicts = {k: v.numpy() for k, v in all_dicts.items()}
  File "/home/zhengwu/workspace/Github/candle/candle-examples/examples/llama/convert_checkpoint.py", line 173, in <dictcomp>
    all_dicts = {k: v.numpy() for k, v in all_dicts.items()}
TypeError: Got unsupported ScalarType BFloat16

NumPy does not support bfloat16.

If we convert bfloat16 to float32, we can't support float16 anymore.

all_dicts = {k: v.numpy() if v.dtype != torch.bfloat16 else v.to(torch.float32).numpy() for k, v in all_dicts.items()}

Maybe there is a more elegant method.

Error building candle-kernels

Hi,
Exciting project!
I'm having some issues building candle-kernels on Windows 10 with CUDA 11.7 when trying to run the examples. Any thoughts?

   Compiling candle-kernels v0.1.0 (D:\candle\candle-kernels)
error: failed to run custom build command for `candle-kernels v0.1.0 (D:\candle\candle-kernels)`
note: To improve backtraces for build dependencies, set the CARGO_PROFILE_DEV_BUILD_OVERRIDE_DEBUG=true environment variable to enable debug information generation.

Caused by:
  process didn't exit successfully: `D:\candle\target\debug\build\candle-kernels-68d6aa5feaf84d2d\build-script-build` (exit code: 101)
  --- stdout
  cargo:rerun-if-changed=build.rs
  cargo:rustc-env=CUDA_INCLUDE_DIR=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\include
  cargo:rerun-if-changed=src/
  cargo:rerun-if-env-changed=CUDA_COMPUTE_CAP
  cargo:rustc-env=CUDA_COMPUTE_CAP=sm_86
  D:\candle\candle-kernels\src\unary.cu(34): error: identifier "M_2_SQRTPI" is undefined
            detected during instantiation of "T gelu_fwd(T) [with T=__nv_bfloat16]"
  (54): here

  D:\candle\candle-kernels\src\unary.cu(34): error: identifier "M_SQRT1_2" is undefined
            detected during instantiation of "T gelu_fwd(T) [with T=__nv_bfloat16]"
  (54): here

  D:\candle\candle-kernels\src\unary.cu(34): error: identifier "M_2_SQRTPI" is undefined
            detected during instantiation of "T gelu_fwd(T) [with T=__half]"
  (68): here

  D:\candle\candle-kernels\src\unary.cu(34): error: identifier "M_SQRT1_2" is undefined
            detected during instantiation of "T gelu_fwd(T) [with T=__half]"
  (68): here

  D:\candle\candle-kernels\src\unary.cu(34): error: identifier "M_2_SQRTPI" is undefined
            detected during instantiation of "T gelu_fwd(T) [with T=float]"
  (92): here

  D:\candle\candle-kernels\src\unary.cu(34): error: identifier "M_SQRT1_2" is undefined
            detected during instantiation of "T gelu_fwd(T) [with T=float]"
  (92): here

  D:\candle\candle-kernels\src\unary.cu(34): error: identifier "M_2_SQRTPI" is undefined
            detected during instantiation of "T gelu_fwd(T) [with T=double]"
  (93): here

  D:\candle\candle-kernels\src\unary.cu(34): error: identifier "M_SQRT1_2" is undefined
            detected during instantiation of "T gelu_fwd(T) [with T=double]"
  (93): here

  8 errors detected in the compilation of "src/unary.cu".
  unary.cu

  --- stderr
  thread 'main' panicked at 'nvcc error while compiling "src\\unary.cu":

  # stdout


  ', candle-kernels\build.rs:207:13
  stack backtrace:
     0: std::panicking::begin_panic_handler
               at /rustc/a2b1646c597329d0a25efa3889b66650f65de1de/library\std\src\panicking.rs:578
     1: core::panicking::panic_fmt
               at /rustc/a2b1646c597329d0a25efa3889b66650f65de1de/library\core\src\panicking.rs:67
     2: build_script_build::cuda::build_ptx
     3: <[T] as core::fmt::Debug>::fmt
     4: core::ops::function::FnOnce::call_once
  note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
PS D:\candle> cargo run --example whisper -- --input samples_jfk.wav
   Compiling candle-kernels v0.1.0 (D:\candle\candle-kernels)
error: failed to run custom build command for `candle-kernels v0.1.0 (D:\candle\candle-kernels)`
  (same eight M_2_SQRTPI / M_SQRT1_2 errors and nvcc panic as above)
PS D:\candle> $env:CARGO_PROFILE_DEV_BUILD_OVERRIDE_DEBUG=true
PS D:\candle> cargo run --example whisper -- --input samples_jfk.wav
   Compiling candle-kernels v0.1.0 (D:\candle\candle-kernels)
error: failed to run custom build command for `candle-kernels v0.1.0 (D:\candle\candle-kernels)`
  (same eight M_2_SQRTPI / M_SQRT1_2 errors and nvcc panic as above)

Steps to use on browser/WASM

Hello, thank you for sharing this crate!

Would it be possible to get the steps/code to reproduce the llama2.c web example https://laurentmazare.github.io/candle-llama2/? Compiling to wasm seems OK, but I am struggling to generate the corresponding JS glue code to make it all work.

Again, thank you for your hard work, really appreciated. :)
