
Bigram Model (candle) · 12 comments · CLOSED

huggingface commented on May 10, 2024
Bigram Model

Comments (12)

LaurentMazare commented on May 10, 2024

It's certainly early days for this project, so the documentation is scarce and the APIs are likely to change. There is some ongoing work on the candle book, but it's also a work in progress.
When it comes to your issue, the model implementation looks reasonable. Maybe the issue is with how you set up the optimizer: you have to pass it the variables used in your model. A fully reproducible example would make this easier to diagnose.
I would also suggest looking at the mnist training example for inspiration.
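
To make that optimizer point concrete, here is a minimal sketch of the wiring being described (not from the thread: the vocab size is arbitrary, and SGD::new is called the way the code later in this thread calls it; newer candle versions return a Result there):

use anyhow::Result;
use candle::{DType, Device};
use candle_nn::{embedding, VarBuilder, VarMap};

fn main() -> Result<()> {
    let dev = Device::Cpu;
    // The VarMap owns the trainable variables.
    let varmap = VarMap::new();
    // Build the model's layers from a VarBuilder backed by that same VarMap,
    // so the embedding weights get registered in the map.
    let vb = VarBuilder::from_varmap(&varmap, DType::F32, &dev);
    let _token_embedding_table = embedding(100, 100, vb)?; // vocab_size = 100, arbitrary
    // Only now does varmap.all_vars() contain the model's weights, so the
    // optimizer actually has something to update on each backward_step.
    let _optimizer = candle_nn::SGD::new(varmap.all_vars(), 0.003);
    Ok(())
}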

known-samy commented on May 10, 2024

Update Readme Section

okpatil4u commented on May 10, 2024

Thanks @LaurentMazare. I am trying to build a few simple tutorials that could onboard newcomers to the candle framework.

This is my optimizer code. Could you review it by any chance?

let varmap = VarMap::new();
let optimizer = candle_nn::SGD::new(varmap.all_vars(), 0.003);

for _ in 0..100 {
    let (x, y) = get_batch(&train_data);
    let xb: Tensor = Tensor::new(&x, &Device::Cpu).unwrap();
    let yb: Tensor = Tensor::new(&y, &Device::Cpu).unwrap();
    let (_, loss) = m.forward(&xb, &yb);
    optimizer.backward_step(&loss).unwrap();
    println!("loss: {:?}", loss);
}

I am picking up xb and yb randomly.

fn get_batch(data: &[u32]) -> ([[u32; BLOCK_SIZE]; BATCH_SIZE], [[u32; BLOCK_SIZE]; BATCH_SIZE]) {
    let mut rng = ChaCha8Rng::seed_from_u64(SEED);

    let mut xx = [[0u32; BLOCK_SIZE]; BATCH_SIZE];
    let mut yy = [[0u32; BLOCK_SIZE]; BATCH_SIZE];

    for batch_index in 0..BATCH_SIZE {
        let start = rng.gen_range(0..data.len() - BLOCK_SIZE);

        for block_index in 0..BLOCK_SIZE {
            xx[batch_index][block_index] = data[start + block_index];
            yy[batch_index][block_index] = data[start + block_index + 1];
        }
    }
    
    (xx, yy)
}

LaurentMazare commented on May 10, 2024

Here is a script that should work based on your code. The tricky bit is that the optimizer has to be given the variables of your model (via varmap.all_vars()), so you want to use the same VarMap that was used to create the embedding layer.
Note that I've also removed the unwraps in favor of propagating errors. I would suggest as possible improvements:

  • Using AdamW rather than SGD might make tweaking the learning rate easier.
  • Maybe get_batch could be a Rust iterator? (a rough sketch of both ideas follows after the script)

use anyhow::Result;
use candle::{DType, Device, Tensor};
use candle_nn::{embedding, Embedding, VarBuilder, VarMap};
use rand::{Rng, SeedableRng};

pub const BATCH_SIZE: usize = 64;
pub const BLOCK_SIZE: usize = 128;
pub const VOCAB_SIZE: usize = 100;
pub const SEED: u64 = 299792458;

fn get_batch(
    data: &[u32],
) -> (
    [[u32; BLOCK_SIZE]; BATCH_SIZE],
    [[u32; BLOCK_SIZE]; BATCH_SIZE],
) {
    let mut rng = rand::rngs::StdRng::seed_from_u64(SEED);

    let mut xx = [[0u32; BLOCK_SIZE]; BATCH_SIZE];
    let mut yy = [[0u32; BLOCK_SIZE]; BATCH_SIZE];

    for batch_index in 0..BATCH_SIZE {
        let start = rng.gen_range(0..data.len() - BLOCK_SIZE);

        for block_index in 0..BLOCK_SIZE {
            xx[batch_index][block_index] = data[start + block_index];
            yy[batch_index][block_index] = data[start + block_index + 1];
        }
    }

    (xx, yy)
}

#[derive(Debug)]
pub struct BigramLanguageModel {
    token_embedding_table: Embedding,
}

impl BigramLanguageModel {
    // Constructor
    pub fn new(vocab_size: usize, vb: VarBuilder) -> Result<Self> {
        let token_embedding_table = embedding(vocab_size, vocab_size, vb)?;
        Ok(BigramLanguageModel {
            token_embedding_table,
        })
    }

    // Forward pass
    pub fn forward(&self, idx: &Tensor, targets: &Tensor) -> Result<(Tensor, Tensor)> {
        let logits = self.token_embedding_table.forward(idx)?;

        let shape = logits.shape().dims();
        let logits = logits.reshape(&[shape[0] * shape[1], shape[2]])?;

        println!("shape: {:?}", logits.shape());
        println!("targets shape: {:?}", targets.shape().dims()[0]);
        if targets.shape().dims()[0] != 1 {
            let targets = targets.reshape(&[shape[0] * shape[1]])?;
            let loss = candle_nn::loss::cross_entropy(&logits, &targets)?;
            Ok((logits, loss))
        } else {
            let loss = Tensor::zeros((1, 1), DType::F32, &Device::Cpu)?;
            Ok((logits, loss))
        }
    }
}

fn main() -> Result<()> {
    let dev = Device::Cpu;
    let varmap = VarMap::new();
    let vb = VarBuilder::from_varmap(&varmap, DType::F32, &dev);
    let m = BigramLanguageModel::new(VOCAB_SIZE, vb)?;
    let optimizer = candle_nn::SGD::new(varmap.all_vars(), 0.3);

    let train_data = (0..1000).map(|i| i % VOCAB_SIZE as u32).collect::<Vec<_>>();

    for _ in 0..100 {
        let (x, y) = get_batch(&train_data);
        let xb: Tensor = Tensor::new(&x, &dev)?;
        let yb: Tensor = Tensor::new(&y, &dev)?;
        let (_, loss) = m.forward(&xb, &yb)?;
        optimizer.backward_step(&loss)?;
        println!("loss: {:?}", loss);
    }
    Ok(())
}
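
A rough sketch of the two suggestions above, reusing the BLOCK_SIZE, BATCH_SIZE, and SEED constants from the script. The Batches struct and the AdamW parameters are illustrative and not from the thread; the AdamW/ParamsAdamW names assume candle_nn's optimizer API and may differ by version.

use rand::{rngs::StdRng, Rng, SeedableRng}; // imports repeated here for completeness

// Hypothetical iterator that yields the same random (x, y) batches as get_batch.
struct Batches<'a> {
    data: &'a [u32],
    rng: StdRng,
}

impl<'a> Batches<'a> {
    fn new(data: &'a [u32], seed: u64) -> Self {
        Self { data, rng: StdRng::seed_from_u64(seed) }
    }
}

impl<'a> Iterator for Batches<'a> {
    type Item = (
        [[u32; BLOCK_SIZE]; BATCH_SIZE],
        [[u32; BLOCK_SIZE]; BATCH_SIZE],
    );

    fn next(&mut self) -> Option<Self::Item> {
        let mut xx = [[0u32; BLOCK_SIZE]; BATCH_SIZE];
        let mut yy = [[0u32; BLOCK_SIZE]; BATCH_SIZE];
        for batch_index in 0..BATCH_SIZE {
            // Sample a random window and shift the targets by one token.
            let start = self.rng.gen_range(0..self.data.len() - BLOCK_SIZE);
            for block_index in 0..BLOCK_SIZE {
                xx[batch_index][block_index] = self.data[start + block_index];
                yy[batch_index][block_index] = self.data[start + block_index + 1];
            }
        }
        Some((xx, yy))
    }
}

// The training loop then becomes a bounded iteration over batches:
// for (x, y) in Batches::new(&train_data, SEED).take(100) { ... }
//
// And the SGD line could be swapped for AdamW along these lines (illustrative):
// let params = candle_nn::ParamsAdamW { lr: 1e-3, ..Default::default() };
// let mut optimizer = candle_nn::AdamW::new(varmap.all_vars(), params)?;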

okpatil4u commented on May 10, 2024

Thanks @LaurentMazare, this is very useful. AdamW is my next step. I just wanted to build everything from scratch so that different functions could be introduced at different steps, so a Rust iterator will be my next step as well.

I am curious about the inspiration behind VarBuilder. Does it act as efficient storage for the neural network weights even when the scope of the training function changes?

Also, I am not seeing any multi-core usage. How do I enable it? This is my htop output during training.

[Screenshot: htop output during training, 2023-08-14 11:27 AM]

LaurentMazare commented on May 10, 2024

The VarBuilder is used to provide a model with variables, so you typically pass it to the functions that create model components. These functions can then retrieve a variable if it's loaded from disk, or generate a random variable if the model is to be initialized. Variables are kept together with their path, e.g. encoder.layer1.mlp.weight, which is used both when reading weights from disk and when saving trained weights back to disk.

Typical example derived from the mnist example:

use candle_nn::Linear;
// This is actually already in candle_nn::linear.
fn linear(in_dim: usize, out_dim: usize, vs: VarBuilder) -> Result<Linear> {
    let ws = vs.get_or_init((out_dim, in_dim), "weight", candle_nn::init::ZERO)?;
    let bs = vs.get_or_init(out_dim, "bias", candle_nn::init::ZERO)?;
    Ok(Linear::new(ws, Some(bs)))
}

struct Mlp {
    ln1: Linear,
    ln2: Linear,
}

impl Mlp {
    fn new(vs: VarBuilder) -> Result<Self> {
        let ln1 = linear(IMAGE_DIM, 100, vs.pp("ln1"))?;
        let ln2 = linear(100, LABELS, vs.pp("ln2"))?;
        Ok(Self { ln1, ln2 })
    }

    fn forward(&self, xs: &Tensor) -> Result<Tensor> {
        let xs = self.ln1.forward(xs)?;
        let xs = xs.relu()?;
        self.ln2.forward(&xs)
    }
}

This means that you can create a new Mlp by using a VarBuilder that is backed by a file for inference, e.g.:

let weights = unsafe { candle::safetensors::MmapedFile::new(weights_filename)? };
let weights = weights.deserialize()?;
let vb = VarBuilder::from_safetensors(vec![weights], DType::F32, &dev);
let model = Mlp::new(vb)?;

Or use a VarBuilder backed by a fresh VarMap if you want to train a model via:

// For training
let varmap = VarMap::new();
let vb = VarBuilder::from_varmap(&varmap, DType::F32, &dev);
let model = Mlp::new(vb)?;
...train...
varmap.save("mlp.safetensors")?;

The embedding layer is by default not multithreaded as it's usually more memory bound than CPU bound (we're likely to revisit this when polishing things if it adds some performance). Matrix multiplications, convolutions, and other intensive ops should be using multiple cores.

okpatil4u commented on May 10, 2024

Thank you @LaurentMazare. That makes sense.

Could you help me out with a few more queries?

  1. How would you use a manual seed in randn initialization?
  2. Assuming that let x = Tensor::zeros((B, T, C), DType::F32, &Device::Cpu)?;, how would you change the values at x.i((b, t))?
  3. Any chance you could prioritize the MPS implementation? There is a large audience that will be moving from Llama.cpp and Rustformers/llm to this repo, looking at training and inference on a local device. An Apple Silicon backend will definitely encourage those users to use this repo.

LaurentMazare commented on May 10, 2024

  • Manual seeding is a work in progress but is not available at the moment. We're trying to figure out a good way to do it (the rng should probably be part of the device, but it's annoying to have devices that are not Copy and that you have to pass by reference, so we have to think a bit more about it).
  • You cannot mutate tensors, and that's on purpose: only variables can be mutated. Maybe you want to use something like mask.where_cond(x, y)? (see the sketch after this list)
  • We actually added support for accelerate over the last couple of days. It's probably not as good as having full metal support, but it should already provide a good speed-up. You can use it with --features accelerate; please let us know if you see anything weird going on with it as it's certainly less tested than the rest.
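
A minimal sketch of the where_cond pattern mentioned in the second bullet (not from the thread; the shapes, mask, and values are made up for illustration). where_cond produces a new tensor rather than mutating anything in place.

use candle::{DType, Device, Tensor};

fn main() -> candle::Result<()> {
    let dev = Device::Cpu;
    // The (2, 3) tensor whose entries we would like to "change".
    let x = Tensor::zeros((2, 3), DType::F32, &dev)?;
    // Integer mask marking the positions to overwrite (non-zero = overwrite).
    let mask = Tensor::new(&[[1u8, 0, 0], [0, 1, 0]], &dev)?;
    // Replacement values with the same shape and dtype as x.
    let new_vals = Tensor::new(&[[7f32, 7., 7.], [7., 7., 7.]], &dev)?;
    // Picks from new_vals where the mask is non-zero, from x otherwise,
    // producing a new tensor; x itself stays immutable.
    let y = mask.where_cond(&new_vals, &x)?;
    println!("{y}");
    Ok(())
}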

okpatil4u commented on May 10, 2024

Thanks @LaurentMazare. I will check it out.

One more question: are you using accelerate for BLAS, LAPACK, or BNNS? What kind of speed-ups have you observed?

okpatil4u commented on May 10, 2024

Apologies, I guess you are using accelerate-src to get this going. I will go into the implementation as I get a better command of the framework. Should I close this issue?

LaurentMazare commented on May 10, 2024

Right, on my Linux x86 box I see mkl being regularly 2x or 3x faster. With accelerate, on the matmul from cpu_benchmarks.rs I saw a 5x acceleration, but that may be a very specific case. Feel free to close this issue if the original question was answered (and obviously to open new ones if you have further/different questions, or to re-open if you have more comments on the same topic).

okpatil4u commented on May 10, 2024

Thanks @LaurentMazare
