facebookarchive / torchnet

Torch on steroids

License: Other

CMake 0.15% Lua 99.85%

torchnet's Issues

Why the software history was not kept?

Hi there,

I'm a researcher studying software evolution. As part of my current research, I'm studying the implications of open-sourcing proprietary software, for instance, whether the project succeeds in attracting newcomers. However, I observed that some projects, like torchnet, deleted their software history.

9b7759d

Knowing that software history is indispensable for developers (e.g., developers need to refer to history several times a day), I would like to ask torchnet developers the following four brief questions:

Why did you decide not to keep the software history?
Did the core developers face any problems when trying to refer to the old history? If so, how did they solve these problems?
Did newcomers face any problems when trying to refer to the old history? If so, how did they solve these problems?
How did the lack of history impact software evolution? Did it place any burden on understanding and evolving the software?

Thanks in advance for your collaboration,

Gustavo Pinto, PhD
http://www.gustavopinto.org

Segmentation fault (core dumped)

I tried to use torchnet, but I get this error when I call the function getIterator.
Could you please help me solve this issue?
Here is my code.

local function getData(fname)
    local hdf5 = require 'hdf5'
    local f = hdf5.open(fname, 'r')
    
    local X1 = f:read('X1'):all()
    local X2 = f:read('X2'):all()
    local X3 = f:read('X3'):all()
    local labels = f:read('labels'):all()
    f:close()
    return {X1 = X1, X2 = X2, X3 = X3, labels = labels}
end

 
-- function that sets up the dataset iterator:


local function getIterator(mode)
   return tnt.ParallelDatasetIterator{
      nthread = 1,
      init    = function() require 'torchnet' end,
      closure = function()

         -- load dataset:

         local dataset = getData('data/en/A/' .. mode .. '.h5')

         -- return batches of data:
         return tnt.BatchDataset{
            batchsize = 128,
            dataset = tnt.ListDataset{  -- replace this by your own dataset
               list = torch.range(1, dataset.X1:size(1)):long(),
               load = function(idx)
                  return {
                     input  = { dataset.X1[idx], dataset.X2[idx], dataset.X3[idx] },
                     target = torch.LongTensor{dataset.labels[idx] + 1},
                  }  -- sample contains input and target
               end,
            }
         }
      end,
   }
end

Wrapping issue when passing tables as first parameter with argcheck

Lua handles a brace call by passing the whole table constructor as the first argument. There is no indicator that the function has been called using my_function{arg1 = "hello", arg2 = "world"} rather than my_function({arg1 = "hello", arg2 = "world"}). This makes it impossible for argcheck to know how to parse the first argument when it is of type = "table", as in the NDCGMeter. Here's a test case:

function test.NDCGMeter()
   local mtr = tnt.NDCGMeter{K = {6}}

   -- From: https://en.wikipedia.org/wiki/Discounted_cumulative_gain#Normalized_DCG
   local relevance = torch.DoubleTensor{3,2,3,0,1,2}
   local output = torch.linspace(relevance:size(1), 1, relevance:size(1)):double()
   mtr:add(output, relevance)

   local est = mtr:value()
   tester:eq(est[6], 0.932, "nDCG with K=6", 10^-3)
end

As there is only one parameter it is easy to add a workaround (add at line 96):

         if (K.K) then
            K = K.K
         end

Unfortunately, as I see it, there is no elegant Lua solution, so I've proposed that argcheck add a table wrapper class. This lets the user wrap the table in a class that argcheck can then easily identify. The downside is added complexity, but we believed that for the torch-dataframe package this was the best solution, as there are plenty of instances where passing a table as the first argument makes sense.
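
The ambiguity described above can be reproduced in plain Lua, without argcheck or torchnet. The snippet below is only a sketch: describe and unwrapK are hypothetical helper names, and unwrapK simply restates the workaround from this issue as a standalone function.

```lua
-- A function that reports what it received as its first argument.
local function describe(first, ...)
   return type(first), select('#', ...)
end

-- Brace call and explicit call: the callee sees exactly the same thing,
-- one table argument and no extra arguments.
local t1, n1 = describe{K = {6}}
local t2, n2 = describe({K = {6}})
print(t1, n1, t2, n2)

-- The workaround from the issue as a standalone (hypothetical) helper:
-- if the table carries a named field K, use that field instead.
local function unwrapK(K)
   if type(K) == 'table' and K.K then
      K = K.K
   end
   return K
end

local fromNamed = unwrapK({K = {6}})   -- came from tnt.NDCGMeter{K = {6}}
local fromPlain = unwrapK({6})         -- came from tnt.NDCGMeter({6})
print(fromNamed[1], fromPlain[1])
```

The workaround only works because a list of cutoffs has no string key K; a table argument that may legitimately contain a field with the same name as the parameter cannot be disambiguated this way, which is the case for the wrapper class.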

evaluation mode for models

For models that contain layers like batch normalization or dropout, does the API take care of calling model:evaluate() in the test engine or is it the user's responsibility to take care of this?

MNIST example

Hi,
I just want to point out a minor problem in the mnist.lua example related to the criterion and database iterator.

If you redefine the model and the criterion (lines 49 and 50) with the following lines, the script will fail.

local net = nn.Sequential():add(nn.Linear(784,10))
net:add(nn.LogSoftMax())

local criterion = nn.ClassNLLCriterion()

It looks like the database iterator returns a 2D target tensor instead of the 1D tensor required by ClassNLLCriterion(). One simple solution is to reshape the sample during training:

engine.hooks.onSample = function(state)
    state.sample.target = state.sample.target:view(state.sample.target:nElement())
end

Error when requiring torchnet

When doing

tnt = require 'torchnet'

I'm getting the following error:

/Users/marioyc/torch/install/bin/lua: /Users/marioyc/torch/install/share/lua/5.2/trepl/init.lua:384: /Users/marioyc/torch/install/share/lua/5.2/trepl/init.lua:384: ...rs/marioyc/torch/install/share/lua/5.2/sundown/ascii.lua:227: attempt to index global 'bit' (a nil value)
stack traceback:
    [C]: in function 'error'
    /Users/marioyc/torch/install/share/lua/5.2/trepl/init.lua:384: in function 'require'
    torchnet_test.lua:3: in main chunk
    [C]: in function 'dofile'
    ...ioyc/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: in ?

I've been able to reduce it to the fact that the following fails:

local sdascii

pcall(function()
         sdascii = require 'sundown.ascii'
      end)
local test_str = "- "
sdascii.render(test_str)

and this ends up producing an error when requiring for example batchdataset.

Saving network after epoch

I try to save my network after each epoch by doing:

engine.hooks.onEndEpoch = function(state)
        state.network:clearState()
        torch.save('model.t7', state.network)
end

When continuing with a new epoch, I get the following error:

$ Error: cuda runtime error (77) : an illegal memory access was encountered at /users/visics/dneven/torch_recent/extra/cutorch/lib/THC/generic/THCStorage.c:158
THCudaCheck FAIL file=/users/visics/dneven/torch_recent/extra/cutorch/lib/THC/generic/THCStorage.c line=158 error=77 : an illegal memory access was encountered

Should I use a different method to clear my model before saving?

returning vector in ListDataset problem.

With the latest torchnet, I tried to run examples/mnist.lua with the criterion changed to MSECriterion.
To satisfy its requirements, I made the target a vector, like this:

target = torch.LongTensor(1,10) or target = torch.LongTensor(10, 1)
instead of
target = torch.LongTensor{dataset.label[idx] + 1}
(I didn't assign the label; I just wanted to check whether it works.)

But it fails.

I suspect that returning a vector in getIterator(mode) {...} does not work
(specifically, that the 'load' function in ListDataset cannot return a vector as the target).

Do you plan to fix this any time soon?

Tracking progress in an epoch

How do we track the progress in an epoch? I generally use xlua to see how many images have been processed. How can we do that here?

We can use state.t to get the number of batches/samples processed so far. But how do we calculate the total number of batches? Could iterators or datasets have a generic size attribute?

Would you provide an example for ListDataset(self, filename, load[, maxload][, path])?

I'm now trying to use the following combination:
ParallelDatasetIterator
BatchDataset
ListDataset(self, filename, load[, maxload][, path])

Could you provide an example for the ListDataset() variant that takes a filename string as input?

meter.MultilabelConfusionMeter invalid argument error

Hi.
I tried to use the MultilabelConfusionMeter with the following code:

	local tnt = require 'torchnet'
	meter = tnt.MultiLabelConfusionMeter(#classes, false)

and got this error message:

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
tnt.MultiLabelConfusionMeter(self, k[, normalized])

{
   self       = tnt.MultiLabelConfusionMeter  -- 
   k          = number                        -- 
  [normalized = boolean]                      --  [default=true]
}


The tnt.MultiLabelConfusionMeter constructs a confusion matrix for multi-
label, multi-class classification problems. In constructing the confusion
matrix, the number of positive predictions is assumed to be equal to the
number of positive labels in the ground-truth. Correct predictions (that
is, labels in the prediction set that are also in the ground-truth set) are
added to the diagonal of the confusion matrix. Incorrect predictions (that
is, labels in the prediction set that are not in the ground-truth set) are
equally divided over all non-predicted labels in the ground-truth set.

At initialization time, the k parameter that indicates the number of
classes in the classification problem under consideration must be
specified. Additionally, an optional parameter normalized (default = false)
may be specified that determines whether or not the confusion matrix is
normalized (that is, it contains percentages) or not (that is, it contains
counts).

The add(output, target) method takes as input an NxK tensor output that
contains the output scores obtained from the model for N examples and K
classes, and a corresponding NxK-tensor target that provides the targets
for the N examples using one-hot vectors (that is, vectors that contain
only zeros and a single one at the location of the target value to be
encoded).

The value() method has no parameters and returns the confusion matrix in a
KxK tensor. In the confusion matrix, rows correspond to ground-truth
targets and columns correspond to predicted targets.

Got: tnt.MultiLabelConfusionMeter, number, boolean

invalid arguments!

It looks like tnt.MultiLabelConfusionMeter, number, boolean is passed, as it should be.
But then why do I see this message?

Sunwoo

imagenet example

hi guys, did anyone write an imagenet training script like this https://github.com/soumith/imagenet-multiGPU.torch
for torchnet?

Main documentation not updated

tnt.SplitDataset treats partition values as fractions only when they are < 1, otherwise treating them as absolute partition size.

The documentation states:
The sum of the partition weights may or may not sum to one (tnt.SplitDataset will make them sum to one!).

This is confusing: if someone gives the partitions as percentages, like
{ train = 70, test = 30 }, the 70 and 30 are treated as absolute sizes rather than fractions.

Edit: Also, the partition weights should either sum to 1 or be exact sizes.
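
To make the reported behaviour concrete, here is a minimal plain-Lua sketch of the rule as described in this issue (values below 1 are read as fractions, anything else as an absolute size). It is not the torchnet implementation: partitionSizes is a hypothetical helper, and rounding to the nearest integer is an assumption made for the example.

```lua
-- Hypothetical helper illustrating the rule described in this issue;
-- it does not call into tnt.SplitDataset.
local function partitionSizes(partitions, n)
   local sizes = {}
   for name, value in pairs(partitions) do
      if value < 1 then
         -- treated as a fraction of the n samples (rounding is an assumption)
         sizes[name] = math.floor(value * n + 0.5)
      else
         -- treated as an absolute number of samples
         sizes[name] = value
      end
   end
   return sizes
end

local n = 1000
local asFractions   = partitionSizes({train = 0.7, test = 0.3}, n)
local asPercentages = partitionSizes({train = 70, test = 30}, n)

print(asFractions.train, asFractions.test)
print(asPercentages.train, asPercentages.test)
```

Under this rule, the percentage-style call selects only 100 of the 1000 samples in total, which is the surprise this issue points out.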

Error using MultiLabelConfusionMeter()

Hi,

I am running the Torch demo face detector code with a different dataset - one that has 158 classes. Here is a snippet from my train.lua file.

local tnt = require 'torchnet' 

local confusion = tnt.MultiLabelConfusionMeter{k = opt.numClasses}

       --create closure to evaluate f(X) and df/dX
       local eval_E = function(w)
       for i = 1,opt.batchSize do 
          confusion:add(y[i],yt[i])
       end

Etc.

When I run the code, the error is:

/home/uni/torch/install/bin/luajit: /home/uni/torch/install/share/lua/5.1/torch/Tensor.lua:462: Wrong size for view. Input size: 158. Output size: 1x1
stack traceback:
[C]: in function 'error'
/home/uni/torch/install/share/lua/5.1/torch/Tensor.lua:462: in function 'view'
...hare/lua/5.1/torchnet/meter/multilabelconfusionmeter.lua:80: in function 'add'
./train.lua:150: in function 'opfunc'
/home/uni/torch/install/share/lua/5.1/optim/sgd.lua:44: in function 'sgd'
./train.lua:158: in function 'train'
run.lua:76: in main chunk
[C]: in function 'dofile'
.../uni/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670

Can you please help?

I looked into the code and the problem seems to be coming from the following snippet from multilabelconfusionmeter.lua:

if output:nDimension() == 1 then
    output = output:view(1, output:size(1))
end
if target:nDimension() == 1 then
    target = target:view(1, output:size(1))
end

Since my output and target are 158-dimensional tensors, reshaping output to 1x158 changes output:size(1) from 158 to 1. When target is then reshaped with target:view(1, output:size(1)), it is asked to become 1x1, which raises the error.

Thank you.
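
One way to avoid the problem described above is to reshape each tensor using its own size. The following is only a sketch under that assumption, not the fix that was applied in torchnet; asRow is a hypothetical helper name and the snippet needs Torch7.

```lua
local torch = require 'torch'

-- Hypothetical helper: turn a 1D tensor of length K into a 1xK tensor,
-- reading the length from the tensor being reshaped.
local function asRow(t)
   if t:nDimension() == 1 then
      return t:view(1, t:size(1))
   end
   return t
end

-- Same shapes as in the report: 158 classes, one example.
local output = torch.rand(158)
local target = torch.zeros(158)

output = asRow(output)
target = asRow(target)

print(output:size(1), output:size(2))
print(target:size(1), target:size(2))
```

Because each call reads the size before the view is taken, the order in which output and target are reshaped no longer matters.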

Errors when running the code provided in the paper

File.lua:141: Unwritable object at <?>.callback.closure.mnist.testdataset.createdataset.readlush.torch.cat

slow and out of mem with threads on imagenet

Hi guys, maybe you can suggest some help here?
https://github.com/karandwivedi42/imagenet-multiGPU.torchnet/issues/5
thanks a lot
E

CUDA MNIST example is missing

In the paper, there is also an example of how to run on the GPU.
It may be worth including it in the example folder.

About this example, I'm not sure why you perform a resize() on the torch.CudaTensor() in every onSample call. Isn't this resize() necessary only once?

How can i use MSE criterion?

When running examples/mnist.lua, I can't simply change the criterion to the MSE criterion.
Can you tell me how to do this?

Documentation incorrect about "transform.perm"

The documentation uses the wrong function name, "transform.perm".

I checked the source code. The correct name should be "transform.randperm".

problems on the mnist example

@lvdmaaten I did not install the mnist dataset into my torch directory with the luarocks install mnist command. Instead, I downloaded mnist to a separate location, path_to_mnist/, and then tried to modify the path in the example code provided by the repo. I inserted the following code at https://github.com/torchnet/torchnet/blob/master/example/mnist.lua#L11:
local mnist_pkg_path = '/home/jack/public_datasets/?/init.lua'
-- prepend to the package path
package.path = mnist_pkg_path .. ';' .. package.path
I left the remaining code of 'mnist.lua' unchanged.
However, I always get an error and I don't know why. Can you give me some tips? I also tried other methods, such as directly modifying the path in the require command, but they failed as well.
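
As a debugging aid, the sketch below shows in plain Lua how a package.path template is expanded for a module name, so the files that require 'mnist' would look for can be printed and compared with what is on disk. candidates is a hypothetical helper, not part of Lua or torchnet. One possible cause, which is an assumption and not confirmed in this thread: the example calls require 'mnist' inside the ParallelDatasetIterator closure, which runs in a worker thread, so a package.path change made only in the main script may not be visible there.

```lua
-- Hypothetical helper: list the file names that a path string such as
-- package.path would yield for a given module name.
local function candidates(name, path)
   local out = {}
   -- dots in module names map to directory separators
   local fname = name:gsub('%.', '/')
   for template in path:gmatch('[^;]+') do
      -- each '?' in the template is replaced by the module name
      out[#out + 1] = (template:gsub('%?', fname))
   end
   return out
end

local mnist_pkg_path = '/home/jack/public_datasets/?/init.lua'
local files = candidates('mnist', mnist_pkg_path .. ';./?.lua')

for _, f in ipairs(files) do
   print(f)
end
```

If the first printed file does not exist, the package.path entry does not match the directory layout of the download; if it does exist, it may be worth setting package.path inside the iterator's init or closure function as well.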

Problem running the example

I may have missed something obvious, but I failed to run the provided examples.

If I run th example/mnist.lua from the torchnet directory, it gives the following error output:

$ th example/mnist.lua 
running on CPU  
/home/joe/torch/install/bin/luajit: /home/joe/torch/install/share/lua/5.1/threads/threads.lua:264: 
[thread 1 callback] example/mnist.lua:26: module 'mnist' not found:
    no field package.preload['mnist']
    no file [some paths in my computer]
    ...

Where could I get the mnist dataset lua library?

Improve ParallelDatasetIterator documentation

Hey,
This is not really an issue; I would like some more details about how this iterator handles threading. I ran into situations where the original dataset was loaded n times, where n is the number of threads. What I would like to do is load my dataset on the fly and let torchnet do the multitasking. The problem is that, because I don't understand how and what torchnet parallelizes, I can't write my iterator efficiently.

I tried to get my hands into the code but it's really hard to understand how Threads operates. I hope someone will find the time to answer this question. Maybe we can improve the documentation to make it less error-prone 😃

for ListDataset, add an onComplete argument

This would be useful for closing any underlying data streams for connections that were opened in order to fetch data for the list data set.

For example, let's say you want to return a ListDataset from a method. This ListDataset loads its data from an IndexedDataSetReader. When the ListDataset is done loading, it should be able to close the IndexedDataSetReader stream.

This would look like:

function getDataSet()

  local dataSetReader = tnt.IndexedDatasetReader('dataset.index', 'dataset.data')

  local someList = {1, 2, 3}

  local list = tnt.ListDataset{
    list = someList,
    load = function(idx) return dataSetReader:get(idx) end,
    -- ... other arguments omitted ...
    onComplete = function() dataSetReader:close() end  -- proposed argument
  }
  return list
end

Breaking out of a ParallelDatasetIterator may lead to issues

The following code example produces incorrect results in the second run through the iterator iff we break out of the first run through the iterator:

local tnt = require 'torchnet'

local producebug = true

local N = 20
local iterator = tnt.ParallelDatasetIterator{
   nthread = 3,
   init    = function() require 'torchnet' end,
   closure = function()
      local list = torch.range(1, N):long()
      return tnt.ListDataset{
         list = list,
         load = function(idx)
            return {input  = torch.LongTensor{idx}}
         end,
      }
   end,
}

print('| run that we are breaking out:')
for sample in iterator() do
   print(' (1) -> ' .. sample.input[1])
   if producebug then break end
end

print('| run that may contain erroneous samples:')
for sample in iterator() do
   print(' (2) -> ' .. sample.input[1])
end

Documentation generation with chapter support

I was thinking to split the documentation into chapters, and relocate it into a doc folder, like we have done for the other packages of Torch.
Would this be an acceptable PR?

Problem when using multiple GPUs

Sorry for posting long code here, but it is a pretty weird problem and this is the simplest example I could find that reproduces it.

I'm trying to use torchnet's ParallelDatasetIterator, but I ran into a problem when using multiple GPUs. For the following code, the output should not depend on torchnet, since the iterator is created but never used.

The program works normally when using single GPU:

$ th train_debug.lua -nGPU 1
Number of parameters:   11184650    
iteration 1: loss=2.289696  
iteration 2: loss=2.209115  
iteration 3: loss=2.136883  
iteration 4: loss=2.071370  
iteration 5: loss=2.011046

However, when using 2 GPUs, the loss goes to nan after the first iteration.

$ th train_debug.lua -nGPU 2
Number of parameters:   11184650    
iteration 1: loss=2.952482  
iteration 2: loss=nan   
iteration 3: loss=nan   
iteration 4: loss=nan   
iteration 5: loss=nan

Although the code should not really depend on torchnet, when I comment out the torchnet related code, the code works normally again.

Code:

require 'nn'
require 'cunn'
require 'cudnn'
local tnt = require 'torchnet'
local optim = require 'optim'

cmd = torch.CmdLine()
cmd:option('-nGPU', 1, 'GPU ID (only using cuda)')
cmd:option('-learning_rate', 1e-5, 'lr')
opt = cmd:parse(arg)

function dpt_model(nGPU, model)
  if nGPU > 1 then
    local gpus = torch.range(1, nGPU):totable()

    model = nn.DataParallelTable(1, true, false)
      :add(model, gpus)
      :threads(function()
        local cudnn = require 'cudnn'
      end)
    model.gradInput = nil
  end
  return model:cuda()
end


------------------------------------
-- start of torchnet related code
-- Create torchnet ParallelDatasetIterator, but not using it
local index = {}
for i=1,10000 do table.insert(index, {path='12345'}) end 
get_data_iterator_func = function ()
  return tnt.ParallelDatasetIterator{
    nthread = 8,
    init    = function() 
      require 'torchnet'
      torch.setdefaulttensortype('torch.FloatTensor')
    end,
    closure = function()
      return tnt.ListDataset{
        list = index,
        load = function (data)
          return {
            input = torch.zeros(3,224,224),
            label = torch.LongTensor{1},
          }
        end
      }:batch(16, 'skip-last')
    end
  }
end
get_data_iterator = get_data_iterator_func()
data_iterator = get_data_iterator()
-- end of torchnet related code
------------------------------------


-- modify the network a little bit (to trigger the error)
local model = torch.load('models/resnet-18.t7')
model:remove(#model)
model:add(nn.Linear(512, 10):cuda())

model = dpt_model(opt.nGPU, model)
criterion = nn.CrossEntropyCriterion():cuda()

local params, grad_params = model:getParameters()
print('Number of parameters: ', params:size(1))

-- same data in each iteration
local x = opt.nGPU > 1 and cutorch.createCudaHostTensor() or torch.CudaTensor()
local y = torch.ones(16):cuda()
x = x:resize(16,3,224,224):normal(0, 1)

function feval(xx)
  if xx ~= params then params:copy(xx) end
  grad_params:zero()
  model:training()
  local outputs = model:forward(x)
  local f = criterion:forward(outputs, y)
  local df_do = criterion:backward(outputs, y)
  model:backward(x, df_do)
  return f, grad_params
end


local optim_state = {learningRate=opt.learning_rate}
for i = 1,5 do
  local __, loss = optim.adam(feval, params, optim_state)
  cutorch.synchronize()
  print(('iteration %d: loss=%f'):format(i, loss[1]))
end

models/resnet-18.t7 is downloaded from the pretrained models of fb.resnet.torch. Very weirdly, the error only triggers when the model file is put in a subdirectory. I'm able to reproduce the error on two different machines. Occasionally it runs without any problem on one of the machines that I tested on, but the problem occurs most of the time.

It may also be that I'm using DataParallelTable incorrectly. Please let me know if this is the case.

Thanks in advance!

How to shuffle dataset after ParallelDatasetIterator ?

Thanks in advance :)

local tnt = require 'torchnet'

local d = tnt.TableDataset{data = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}}

local iterator = tnt.ParallelDatasetIterator{
    nthread = 3,
    init = function() require 'torchnet' end,
    closure = function()
        return d
    end,
    ordered = true
}

print('print 1')
for sample in iterator() do
    print(sample)
end

-- executed on the main thread
d = d:shuffle()

print('print 2')
for sample in iterator() do
    print(sample)
end

print('print 3')
for sample in d:iterator()() do
    print(sample)
end

-- executed on each thread, so it returns a table
-- containing 3 datasets to the main thread.
-- But how can d be updated on the threads?
iterator:exec('shuffle')

print('print 4')
for sample in iterator() do
    print(sample)
end

Bug report: not entering into iterator until thorough depth.

In this case,

local function getIterator(mode)
   return tnt.ParallelDatasetIterator{
         return tnt.BatchDataset{
            dataset = tnt.ListDataset{
                [place 1]

When I set the batch size to a large value (e.g. 1000),
execution never reaches [place 1].

Strangely, the program still runs without any error,
and it produces a 'nan' training loss.

How to select different batch samples in each iteration?

The training code is like:

engine:train{
    network = model,
    iterator = getIterator('train'),
    criterion = criterion,
    optimMethod = optim.sgd,
    config = tablex.deepcopy(cnnopt),
    maxepoch = cnnopt.max_epoch,
}

The getIterator() function is as follows and is only called once:

local function getIterator(mode)
    return tnt.ParallelDatasetIterator{
        nthread = 1,
        init = function()
            require 'torchnet'
            require 'image'
            require 'nn'
        end,
        closure = function()
            local dataset = provider[mode..'Data']

            local list_dataset = tnt.ListDataset{
                list = torch.range(1, dataset.labels:numel()):long(),
                load = function(idx)
                    return {
                        input = dataset.data[idx]:float(),
                        target = torch.LongTensor{dataset.labels[idx]},
                    }
                end,
            }
            if mode == 'train' then
                return list_dataset
                :transform{
                    input = tnt.transform.compose{
                        cnnopt.hflip and hflip,
                        cnnopt.randomcrop > 0 and randomcrop,
                    }
                }
                :batch(cnnopt.batchSize, 'skip-last')
            elseif mode == 'test' then
                return list_dataset
                :batch(cnnopt.batchSize, 'include-last')
            end
        end
    }
end

It seems that the DatasetIterator() will return a fixed list of batches for iterating. If I want to select batch samples at each iteration, how should I change the code? or are there other interfaces I should use?

This error is unclear - what is the problem with my code that is causing this?

qlua: ...hare/lua/5.1/torchnet/meter/multilabelconfusionmeter.lua:107: attempt to index local 'pos' (a nil value)
stack traceback:
[C]: in function '__index'
...hare/lua/5.1/torchnet/meter/multilabelconfusionmeter.lua:107: in function 'add'
main.lua:198: in function 'hooks'
...ch/install/share/lua/5.1/torchnet/engine/optimengine.lua:105: in function 'train'
main.lua:218: in main chunk

Is it necessary to shallow copy the transforms in transform.lua

I think this line is redundant here, and removing it should be okay.

Continuously load data from disk in separate thread

Hi,

After playing a bit with torchnet, it is still unclear to me how to properly tackle the following problem: suppose my training data lies in a very big file on disk (not loadable into memory at once), with each line of the file being one example point. I would like to build a data iterator that runs on a separate thread (or several threads) and provides mini-batches to the main thread that trains the network. I would also like to do multiple epochs over the training data, so the training file needs to be reopened once it has been read to the end.

I tried using ParallelDatasetIterator, but as far as I understand, the closure is run once per thread and the returned dataset is expected to have a finite size. Can someone please explain or give an example on this issue ? Thanks a lot.

IndexedDataset using string as index for large dataset

Hi !

I would like to extract features from a large dataset of images and to store them in memory.
IndexedDataset provides a really nice way to handle large dataset, however indexes must be integer:
first_tensor = dataset:get(1)

For now, I am using a tds.Hash to store the image names as keys and the corresponding indexes as values. Thus I am doing:
first_tensor = dataset:get(tdshash[first_image_name])

Do you know a better way to handle this using torchnet ?

Thank you for your precious help.

Greetings,
Remi
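
The name-to-index lookup described above can be hidden behind a small wrapper so that callers pass either an integer or an image name. This is only a plain-Lua sketch: NamedIndex is a hypothetical class, not part of torchnet, and the stand-in dataset below merely mimics a get(idx) method. For very large name lists a tds.Hash, as already used in the question, may be preferable to a plain Lua table for memory reasons.

```lua
-- Hypothetical wrapper: maps string names to integer indices and
-- forwards to the wrapped dataset's get(idx) method.
local NamedIndex = {}
NamedIndex.__index = NamedIndex

function NamedIndex.new(dataset, names)
   local self = setmetatable({dataset = dataset, index = {}}, NamedIndex)
   for i, name in ipairs(names) do
      self.index[name] = i
   end
   return self
end

function NamedIndex:get(key)
   local idx = key
   if type(key) == 'string' then
      idx = self.index[key]
   end
   assert(idx, 'unknown key: ' .. tostring(key))
   return self.dataset:get(idx)
end

-- Stand-in for an indexed dataset (assumption: it only needs get(idx)).
local fake = {items = {'features of cat', 'features of dog'}}
function fake:get(i) return self.items[i] end

local named = NamedIndex.new(fake, {'cat.jpg', 'dog.jpg'})
print(named:get('dog.jpg'))
print(named:get(1))
```

This does not remove the need for the mapping itself; it only keeps the lookup in one place.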

Impossible to save a Dataset when using torch/image

Please try the following code.

local tnt = require 'torchnet'
local image = require 'image'

local mode = 'train'
local mnist = require 'mnist'
local dataset = mnist[mode .. 'dataset']()

local listdataset = tnt.ListDataset{ 
   list = torch.range(1, dataset.data:size(1)):long(),
   load = function(idx)
      return {
         input  = image.scale(dataset.data[idx],10,10),
         target = torch.LongTensor{dataset.label[idx] + 1},
      } 
   end,
}

torch.save('image.scale', image.scale)     -- works
torch.save('listdataset.tnt', listdataset) -- doesnt work
torch.save('image',image)                  -- doesnt work

/home/cadene/torch/install/bin/luajit: /home/cadene/torch/install/share/lua/5.1/torch/File.lua:141: Unwritable object <function> at <?>.tnt.ListDataset.load.image.float.scaleBicubic
stack traceback:
    [C]: in function 'error'
    /home/cadene/torch/install/share/lua/5.1/torch/File.lua:141: in function 'writeObject'
    /home/cadene/torch/install/share/lua/5.1/torch/File.lua:235: in function 'writeObject'
    /home/cadene/torch/install/share/lua/5.1/torch/File.lua:235: in function 'writeObject'
    /home/cadene/torch/install/share/lua/5.1/torch/File.lua:235: in function 'writeObject'
    /home/cadene/torch/install/share/lua/5.1/torch/File.lua:235: in function 'writeObject'
    /home/cadene/torch/install/share/lua/5.1/torch/File.lua:200: in function 'writeObject'
    /home/cadene/torch/install/share/lua/5.1/torch/File.lua:235: in function 'writeObject'
    /home/cadene/torch/install/share/lua/5.1/torch/File.lua:220: in function 'writeObject'
    /home/cadene/torch/install/share/lua/5.1/torch/File.lua:388: in function 'save'
    src/bugserialize.lua:18: in main chunk
    [C]: in function 'dofile'
    ...dene/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x00405ce0

putting vector as target

The MSE criterion needs a vector (1D tensor) as input.
But in the example code, I could only find cases in which scalar values are used as the target.
How can I use a vector as the target with the MSE criterion?

GPU Sparse Support

Good news. Does the new version support sparse inputs on the GPU?

Installation error on OS X 10.11 El Capitan with Xcode 8.0

When installing torchnet via luarocks on OS X 10.11 El Capitan, I receive the below error. This error appears to occur because the installation is searching for MacOSX10.11.sdk when only MacOSX.sdk and MacOSX10.12.sdk are installed by default with Xcode 8.0, even though I am running OS X 10.11. Supposedly, this is due to the issue of the latest Xcode only shipping with the latest SDK.

Scanning dependencies of target ads
[ 16%] Building C object CMakeFiles/tds.dir/tds_utils.c.o
[ 33%] Building C object CMakeFiles/tds.dir/tds_elem.c.o
[ 50%] Building C object CMakeFiles/tds.dir/tds_hash.c.o
[ 66%] Building C object CMakeFiles/tds.dir/tds_vec.c.o
[ 83%] Building C object CMakeFiles/tds.dir/tds_atomic_counter.c.o
make[2]: *** No rule to make target `/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.11.sdk/System/Library/Frameworks/Accelerate.framework', needed by `libtds.so'.  Stop.
make[1]: *** [CMakeFiles/tds.dir/all] Error 2
make: *** [all] Error 2

Error: Failed installing dependency: https://raw.githubusercontent.com/torch/rocks/master/tds-scm-1.rockspec - Build error: Failed building.

In the meantime, I will address the issue by downgrading Xcode to 7.3.1.

Transforms

Recently I found a very nice set of preprocessing transformations in a project by Facebook implementing ResNet (of which you are probably aware). I guess adapting and incorporating transforms.lua will be a nice addition to torchnet.

OptimEngine.test not implemented

Why is test() not implemented for the OptimEngine class? This is rather surprising to me. Seems like it should share the same functionality as SgdEngine.test(), so that that function should be used as Engine.test(), and inherited by both engines.

Happy to do this myself. Just wondering why something so necessary, e.g. for early stopping when the validation error stops going down, is missing.

Garbage Collection

In ParallelDatasetIterator, moving collectgarbage() from Line 109-110

https://github.com/torchnet/torchnet/blob/master/dataset/paralleldatasetiterator.lua#L109-L110

to Line 114 (i.e. in second function)
https://github.com/karandwivedi42/torchnet/blob/mem-usage/dataset/paralleldatasetiterator.lua#L112-L113

reduces the memory usage.

Without moving, memory usage peaks at around 30GB (4 threads, 256 batchSize, fb.resnet.torch pre-processing) and after moving to second function, it peaks at around 10GB.

fatal thread panic on parallelDatasetIterator

I get this message:

FATAL THREAD PANIC: (read) /Users/genovese/torch/install/share/lua/5.1/torch/File.lua:343: unknown Torch class <package.torchnet>

This is relevant code:

local tnt = require 'torchnet'

function getIterator(dataset)
    return tnt.ParallelDatasetIterator{
    init    = function() 
            tnt = require 'torchnet' 
            optParser = require 'opts'
            opt = optParser.parse(arg) end,
    nthread = opt.nThreads,
    closure = function(dataset)
            return tnt.DatasetIterator{
                dataset = tnt.BatchDataset{
                    batchsize = opt.batchsize,
                    dataset = dataset
                }
            } end
        }
end

Torch net install problem

After I did:
luarocks install torchnet

I created a new file "torchtest.lua" as follows:
require 'nn'
local tnt = require 'torchnet'
local mnist = require 'mnist'

run
th torchtest.lua

I got the errors below:
robysmac:lua test zhaochangkai$ th torchtest.lua
/Users/zhaochangkai/torch/install/bin/luajit: .../zhaochangkai/torch/install/share/lua/5.1/trepl/init.lua:384: .../zhaochangkai/torch/install/share/lua/5.1/trepl/init.lua:384: .../zhaochangkai/torch/install/share/lua/5.1/trepl/init.lua:384: module 'tds' not found:No LuaRocks module found for tds
no field package.preload['tds']
no file '/Users/zhaochangkai/.luarocks/share/lua/5.1/tds.lua'
no file '/Users/zhaochangkai/.luarocks/share/lua/5.1/tds/init.lua'
no file '/Users/zhaochangkai/torch/install/share/lua/5.1/tds.lua'
no file '/Users/zhaochangkai/torch/install/share/lua/5.1/tds/init.lua'
no file './tds.lua'
no file '/Users/zhaochangkai/torch/install/share/luajit-2.1.0-beta1/tds.lua'
no file '/usr/local/share/lua/5.1/tds.lua'
no file '/usr/local/share/lua/5.1/tds/init.lua'
no file '/Users/zhaochangkai/.luarocks/lib/lua/5.1/tds.so'
no file '/Users/zhaochangkai/torch/install/lib/lua/5.1/tds.so'
no file './tds.so'
no file '/usr/local/lib/lua/5.1/tds.so'
no file '/usr/local/lib/lua/5.1/loadall.so'
stack traceback:
[C]: in function 'error'
.../zhaochangkai/torch/install/share/lua/5.1/trepl/init.lua:384: in function 'require'
torchtest.lua:2: in main chunk
[C]: in function 'dofile'
...gkai/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x0109718d40

I use a Mac with a CUDA-capable graphics card.

I switched clang to version 6 in order to use cuda.
robysmac:lua test zhaochangkai$ clang -v
Apple LLVM version 6.1.0 (clang-602.0.53) (based on LLVM 3.6.0svn)
Target: x86_64-apple-darwin15.4.0
Thread model: posix

The AUC-meter evaluates differently from classical statistics

I've finished writing a basic test-suite for the meters and apart from issue #41 I've encountered an unexpected problem with the tnt.AUCMeter. The following test-case should hopefully implement classical AUC-calculation based on this paper:

function test.AUCMeter()
   local mtr = tnt.AUCMeter()

   -- From http://stats.stackexchange.com/questions/145566/how-to-calculate-area-under-the-curve-auc-or-the-c-statistic-by-hand
   local samples = torch.Tensor{
      {33,6,6,11,2}, --normal
      {3,2,2,11,33} -- abnormal
   }
   for i=1,samples:size(2) do
      local target = torch.Tensor():resize(samples:narrow(2,i,1):sum()):zero()
      target:narrow(1,1,samples[2][i]):fill(1)
      local output = torch.Tensor(target:size(1)):fill(i)
      mtr:add(output, target)
   end

   local error, tpr, fpr = mtr:value()

   tester:assert(math.abs(error - 0.8931711) < 10^-3,
      ("The AUC error does not match: %.3f is not equal to 0.893"):format(error))
end

Unfortunately the AUC is lower (0.704) than the expected 0.893. I'm not familiar enough with ML to know whether the ML AUC differs in some significant way, but the value 0.704 seems intuitively low (my apologies if I missed something in the coding). Looking at how the AUC is calculated, a zero is appended that could possibly be pulling the value down.
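
For reference, the expected value can be reproduced by hand without torchnet. The sketch below is plain Lua and assumes the classical tie-corrected (Mann-Whitney) definition of AUC: the probability that a randomly chosen abnormal case gets a higher rating than a randomly chosen normal case, with ties counted as one half. The function name classicalAuc is made up for this illustration and is not part of tnt.

```lua
-- Counts per rating category (1..5), copied from the test case above.
local normal   = {33, 6, 6, 11, 2}
local abnormal = {3, 2, 2, 11, 33}

-- Hypothetical helper, not part of torchnet.
-- neg[i] / pos[i]: number of negative / positive cases with rating i.
-- Both tables must have the same length.
local function classicalAuc(neg, pos)
   local nNeg, nPos = 0, 0
   for i = 1, #neg do
      nNeg = nNeg + neg[i]
      nPos = nPos + pos[i]
   end

   local concordant = 0   -- positive/negative pairs won by the positive
   local negBelow = 0     -- negatives in strictly lower categories
   for i = 1, #pos do
      -- every positive in category i beats all negatives below it
      -- and ties (worth 0.5) with the negatives in the same category
      concordant = concordant + pos[i] * (negBelow + 0.5 * neg[i])
      negBelow = negBelow + neg[i]
   end

   return concordant / (nNeg * nPos)
end

local auc = classicalAuc(normal, abnormal)
print(string.format('%.4f', auc))
```

With these counts the concordant total is 2642 out of 58 * 51 = 2958 pairs, which agrees with the 0.8931711 used in the assertion. If this is the definition the meter is meant to follow, the gap to 0.704 would have to come from how the curve is built rather than from the data.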

Validation at the end of every epoch.

We can create two engines and do validation. But is there a better solution than this?

Fail to require torchnet when debugging using eclipseLDT

I am a bit new to torch. I set up eclipseLDT according to http://www.lighting-torch.com/2015/07/27/configuring-eclipse-with-torch/

I can debug code that does not require torchnet, but once I require torchnet, I get the following errors:

qlua: .../torch/install/share/lua/5.1/argcheck/usage.lua:83: bad argument #2 to 'isatty' (FILE* expected, got table)
stack traceback:
[C]: at 0x7f30ab2179c0
[C]: in function 'isatty'
.../torch/install/share/lua/5.1/argcheck/usage.lua:83: in function 'render'
.../torch/install/share/lua/5.1/argcheck/init.lua:102: in function 'argcheck'
.../torch/install/share/lua/5.1/torchnet/utils/table.lua:35: in main chunk
[C]: in function 'require'
.../torch/install/share/lua/5.1/torchnet/utils/init.lua:23: in main chunk
[C]: in function 'require'
.../torch/install/share/lua/5.1/torchnet/transform.lua:12: in main chunk
[C]: in function 'require'
...h/install/share/lua/5.1/torchnet/dataset/listdataset.lua:13: in main chunk
[C]: in function 'require'
.../torch/install/share/lua/5.1/torchnet/init.lua:70: in main chunk
[C]: in function 'require'
.../LDT_workspace/test1/src/main.lua:14: in main chunk

ClassErrorMeter throwing size mismatch error

I get the following error "qlua: ...install/share/lua/5.1/torchnet/meter/classerrormeter.lua:79: target and output do not match"

My target and network.output values are as below:

I initialize the error meter like this:
local clerr = tnt.ClassErrorMeter{topk = {1}}

target values
20
11
12
7
12
12
12
12
16
10
15
12
12
1
12
12
[torch.CudaTensor of size 16x1]

output size values
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
[torch.CudaTensor of size 16x1]

I checked the source code of ClassErrorMeter to see what condition caused the error, but the error message doesn't say much.

Is there a plan for multi-GPU support?

Thanks.

Allocated memory estimation

How can I estimate the amount of allocated memory?

I am using a single GPU and a dataset file (*.t7) that is loaded into memory.
I observed the following cases (number of threads: 1):

case 1: dataset is 13GB, datatype is double, allocated memory: 13GB
case 2: dataset is 7GB (about half the size of case 1), datatype is double, allocated memory: 55GB
case 3: dataset is 13GB, datatype is float, allocated memory: 13GB
case 4: dataset is 7.7GB, datatype is float, allocated memory: 24GB

I don't see the relationship between dataset size and allocated memory (I expected it to be linear).

How to shuffle the selected partition after SplitDataset?

local tnt = require 'torchnet'

local d = tnt.TableDataset{data = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}}

d = d:split{
  train = 0.7,
  val   = 0.3
}

d:select('val')
print('val')
for sample in d:iterator()() do
  print(sample)
end

d:select('train')
print('train')
for sample in d:iterator()() do
  print(sample)
end

d=d:shuffle() -- gives a new dataset
print('train2')
for sample in d:iterator()() do
  print(sample)
end

d:select('val') -- no more select
print('val2')
for sample in d:iterator()() do
  print(sample)
end

FATAL THREAD PANIC after resuming training

For some reason ParallelDatasetIterator throws a "FATAL THREAD PANIC" error after I torch.load() (resume) a network from disk. It looks for a custom dataset class that is included in the init() and is found when simply running without resuming from a saved model. Any ideas?
