Behaviour with small dataset (char-rnn, closed)

karpathy commented on June 15, 2024
Behaviour with small dataset

from char-rnn.

Comments (3)

karpathy commented on June 15, 2024

Hello, I'm guessing you need a much smaller batch size; maybe try 10, or even go down to 1. I think the issue is that this small dataset yields too few batches. Also try setting seq_length to 50.

I'll add checks in the code to give more appropriate error messages.
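To make the batch-size advice concrete, here is a rough Python sketch of how the batch count shrinks as batch_size grows. It assumes each batch holds batch_size * seq_length characters and that leftover characters are cut off, which is what the "cutting off end of data" log message suggests; the function name `count_batches` and the corpus size are made up for illustration and are not taken from char-rnn.

```python
def count_batches(num_chars: int, batch_size: int, seq_length: int) -> int:
    """Number of full batches, assuming each batch consumes
    batch_size * seq_length characters and the remainder is dropped."""
    return num_chars // (batch_size * seq_length)


# Hypothetical corpus size, not taken from the thread.
num_chars = 100_000

for batch_size in (100, 10, 1):
    n = count_batches(num_chars, batch_size, seq_length=50)
    print(f"batch_size={batch_size:>3} -> {n} batches")
```

With very few batches in total, the validation and test splits can end up with almost nothing in them, so a smaller batch_size leaves more batches to divide between the splits.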

hnykda commented on June 15, 2024

I guess the problem is that there are no batches in the test split. A negative number is nonsense...

~/char-rnn (remakes) $ th train.lua -data_dir data/sample -checkpoint_dir cvs/sample -gpuid 0 -eval_val_every 10 -num_layers 1 -rnn_size 50 -val_frac 0.15 -batch_size 100
using CUDA on GPU 0...  
loading data files...   
cutting off end of data so that the batches/sequences divide evenly     
reshaping tensor...     
data load done. Number of batches in train: 18, val: 2, test: -1
vocab size: 7   
creating an LSTM with 1 layers  
number of parameters in the model: 12157        
cloning rnn     
cloning criterion       
1/540 (epoch 0.056), train_loss = 1.91737684, grad/param norm = 2.1561e+00, time/batch = 0.07s  
2/540 (epoch 0.111), train_loss = 1.89370414, grad/param norm = 2.2051e+00, time/batch = 0.04s  
3/540 (epoch 0.167), train_loss = 1.85842120, grad/param norm = 2.4545e+00, time/batch = 0.04s  
4/540 (epoch 0.222), train_loss = 1.78653944, grad/param norm = 3.3634e+00, time/batch = 0.04s  
5/540 (epoch 0.278), train_loss = 1.66507999, grad/param norm = 2.8590e+00, time/batch = 0.04s  
6/540 (epoch 0.333), train_loss = 1.61959934, grad/param norm = 1.6471e+00, time/batch = 0.04s  
7/540 (epoch 0.389), train_loss = 1.61162854, grad/param norm = 1.7974e+00, time/batch = 0.04s  
8/540 (epoch 0.444), train_loss = 1.58403392, grad/param norm = 1.0281e+00, time/batch = 0.04s  
9/540 (epoch 0.500), train_loss = 1.60601129, grad/param norm = 1.7299e+00, time/batch = 0.04s  
evaluating loss over split index 2      
1/2...  
/home/kotrfa/torch/install/bin/luajit: /home/kotrfa/char-rnn/train.lua:131: attempt to index local 'x' (a nil value)
stack traceback:
        /home/kotrfa/char-rnn/train.lua:131: in function 'eval_split'
        /home/kotrfa/char-rnn/train.lua:234: in main chunk
        [C]: in function 'dofile'
        ...trfa/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:131: in main chunk
        [C]: at 0x00406670

I don't really understand what the test split is for. I thought the data was sliced into two parts: train and evaluation. The train loss is printed at every step, while the evaluation loss is printed every eval_val_every iterations. So what is the role of test?
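One plausible explanation for the "test: -1" in the log is plain split arithmetic. The sketch below assumes the train and val counts are each the floor of a fraction of the total, and that test is whatever is left over. The total of 19 batches and the train fraction of 0.95 are assumptions chosen because they reproduce the logged counts; only -val_frac 0.15 actually appears in the command. The function name `split_sizes` is hypothetical.

```python
import math


def split_sizes(nbatches: int, train_frac: float, val_frac: float):
    """Split nbatches into (train, val, test) counts.

    Assumed scheme: train and val are floored fractions of the total,
    and test receives the remainder, which may be negative.
    """
    ntrain = math.floor(nbatches * train_frac)
    nval = math.floor(nbatches * val_frac)
    ntest = nbatches - ntrain - nval
    return ntrain, nval, ntest


# Assumed values: 19 batches in total, train fraction 0.95.
print(split_sizes(19, 0.95, 0.15))  # -> (18, 2, -1)

# Fractions that sum to at most 1 leave a non-negative remainder.
print(split_sizes(19, 0.80, 0.15))  # -> (15, 2, 2)
```

Under these assumptions the negative count comes from the two fractions adding up to more than 1, so lowering the train fraction when raising -val_frac would leave room for a test split.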

amangupta199334 commented on June 15, 2024

What code do we use for testing?
