nitishsrivastava / deepnet

893 stars · 437 forks · 41.93 MB

Implementation of some deep learning algorithms.

License: BSD 3-Clause "New" or "Revised" License

Python 85.46% Shell 1.84% C++ 12.50% C 0.19%

deepnet's People

Contributors

dnouri, nitishsrivastava, yanshanjing


deepnet's Issues

I had a problem when using normalize parameter in input layer

As the title says: when I add normalize: true to the hyperparams of the input layer, I get an error.
Does anyone have the same problem?

Traceback (most recent call last):
File "/usr/local/lib/python2.7/pdb.py", line 1314, in main
pdb._runscript(mainpyfile)
File "/usr/local/lib/python2.7/pdb.py", line 1233, in _runscript
self.run(statement)
File "/usr/local/lib/python2.7/bdb.py", line 387, in run
exec cmd in globals, locals
File "", line 1, in
File "../../trainer.py", line 1, in
from neuralnet import *
File "../../trainer.py", line 51, in main
model.Train()
File "../../neuralnet.py", line 450, in Train
self.SetUpTrainer()
File "../../neuralnet.py", line 418, in SetUpTrainer
self.SetUpData()
File "../../neuralnet.py", line 408, in SetUpData
verbose=self.verbose)
File "../../datahandler.py", line 479, in GetDataHandles
handlers.append(DataHandler(op, name_list, hyp_list, frac=proportions[i]))
File "../../datahandler.py", line 616, in init
self.gpu_cache.SetDataStats(i, stats_file)
File "../../datahandler.py", line 380, in SetDataStats
assert os.path.exists(stats_file), 'Stats file %s not found.' % stats_file
AssertionError: Stats file not found.
Uncaught exception. Entering post mortem debugging
Running 'cont' or 'step' will restart the program

/home/chengyangfu/deepnet/deepnet/datahandler.py(380)SetDataStats()
-> assert os.path.exists(stats_file), 'Stats file %s not found.' % stats_file

About example of ae error

Hi nitish,
This is the error I met when I ran the ae example; could you help me with it? Many thanks!

Autoencoder 1
Train Step: 0Traceback (most recent call last):
File "../../trainer.py", line 60, in
main()
File "../../trainer.py", line 54, in main
model.Train()
File "/home/hchen/deepnet/deepnet/neuralnet.py", line 632, in Train
losses = self.TrainOneBatch(step)
File "/home/hchen/deepnet/deepnet/neuralnet.py", line 330, in TrainOneBatch
losses2 = self.BackwardPropagate(step)
File "/home/hchen/deepnet/deepnet/neuralnet.py", line 316, in BackwardPropagate
loss = self.ComputeDown(node, step)
File "/home/hchen/deepnet/deepnet/neuralnet.py", line 234, in ComputeDown
self.UpdateLayerParams(layer, step)
File "/home/hchen/deepnet/deepnet/neuralnet.py", line 291, in UpdateLayerParams
layer.Update('bias', step, no_reg=True) # By default, do not regularize bias.
File "/home/hchen/deepnet/deepnet/parameter.py", line 84, in Update
w = self.params[param_name] # Parameter to be updated.
KeyError: 'bias'

Bug in DataHandler

Hi,
I think I found a bug in DataHandler when dealing with multiple data files. It is in GPUCache.Get():

  def Get(self, batchsize, get_last_piece=False):
    """Return 'batchsize' data points from the cache.

    May return fewer points towards the end of the dataset when there are fewer
    than batchsize left.
    """
    skip = False
    if self._pos == self.datasize:
      self._pos = 0
      self.empty = True         # This line: load the next batch. 
    if self._pos == 0:
      if self.empty or self._maxpos < self.parent._maxpos:
        self.LoadData()
        self.empty = False
      if self.randomize and self._maxpos == self.parent._maxpos:
        # Shuffle if randomize is True and parent has not already shuffled it.
        self.ShuffleData()
    start = self._pos
    end = self._pos + batchsize
    if end > self.datasize:
      end = self.datasize
      skip = not get_last_piece
    self._pos = end
    if skip:
      return self.Get(batchsize, get_last_piece=get_last_piece)
    else:
      batch = [d.slice(start, end) for d in self.data]
      for i in range(self.num_data):
        if self.add_noise[i]:
          self.AddNoise(batch, i)
        if self.translate[i]:
          self.TranslateData(batch, i)
      return batch

Should the 10th line be there? Without it, the code never loads the next files into the cache and just reuses the already loaded data.
Please confirm whether this is a bug; I am using the 10th line as a fix...
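
For what it's worth, the wrap-around behavior is easy to reproduce in isolation. Below is a toy simulation (hypothetical names, not deepnet code) of the logic above: without the questioned line, the cache never marks itself empty after a full pass, so LoadData is never called again and the first file's data is returned forever.

class ToyCache(object):
  def __init__(self, chunks, mark_empty_on_wrap):
    self.chunks = chunks                    # stands in for the data files on disk
    self.next_chunk = 0
    self.data = []
    self.datasize = 0
    self._pos = 0
    self.empty = True
    self.mark_empty_on_wrap = mark_empty_on_wrap

  def LoadData(self):
    self.data = self.chunks[self.next_chunk]
    self.next_chunk = (self.next_chunk + 1) % len(self.chunks)
    self.datasize = len(self.data)

  def Get(self, batchsize):
    if self._pos == self.datasize:
      self._pos = 0
      if self.mark_empty_on_wrap:           # the questioned 10th line
        self.empty = True
    if self._pos == 0 and self.empty:
      self.LoadData()
      self.empty = False
    start, end = self._pos, min(self._pos + batchsize, self.datasize)
    self._pos = end
    return self.data[start:end]

buggy = ToyCache([[1, 2], [3, 4]], mark_empty_on_wrap=False)
fixed = ToyCache([[1, 2], [3, 4]], mark_empty_on_wrap=True)
print [buggy.Get(2) for _ in range(3)]      # [[1, 2], [1, 2], [1, 2]] -- file 2 never loaded
print [fixed.Get(2) for _ in range(3)]      # [[1, 2], [3, 4], [1, 2]]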

Parameters for Replicated Softmax

There are some parameters in replicated_softmax_layer.py: multiplicative_prior, additive_prior, adaptive_prior. Does anyone know what these parameters mean and how to set them?
In addition, it seems the ApplyActivation function does not follow the formula in the paper [1]. Is there any document explaining the mathematics behind the related code (e.g., how to compute p(h|v), p(v|h) and the sufficient statistics)?

Thanks in advance.

[1]:Replicated Softmax: an Undirected Topic Model
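
I don't know what the prior parameters do either, but for reference, the conditionals from [1] can be written down directly. Here is a numpy sketch of my own reading of the paper (an assumption, not deepnet's code): the hidden bias is scaled by the document length, and the visible units are reconstructed as multinomial draws.

import numpy as np

def sigmoid(x):
  return 1.0 / (1.0 + np.exp(-x))

def p_h_given_v(v, W, a):
  # v: (K,) word counts, W: (F, K) weights, a: (F,) hidden biases.
  # Per [1], the hidden bias is scaled by the document length N = sum(v).
  N = v.sum()
  return sigmoid(N * a + W.dot(v))

def p_v_given_h(h, W, b):
  # h: (F,) binary hidden state, b: (K,) visible biases.
  # Returns the multinomial word distribution; a document of length N
  # is reconstructed as N draws from it.
  logits = b + h.dot(W)
  logits -= logits.max()                  # numerical stability
  p = np.exp(logits)
  return p / p.sum()

rng = np.random.RandomState(0)
W = 0.01 * rng.randn(50, 2000)            # 50 hidden units, 2000-word vocabulary
a, b = np.zeros(50), np.zeros(2000)
v = rng.multinomial(100, np.ones(2000) / 2000)    # a 100-word document
h = (p_h_given_v(v, W, a) > rng.rand(50)).astype('float')
v_recon = rng.multinomial(100, p_v_given_h(h, W, b))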

multimodal joint layer question

As your multimodal demo shows, the joint layer dimension is twice that of the image_hidden layer and the text_hidden layer. If I set the joint dimension to the same size as the image_hidden and text_hidden layers, will the accuracy drop significantly?

Problems with Running examples

Hi Nitish,
I compiled cudamat on CentOS 6.3. When I run the tests in cudamat, the result of the test_pow_matrix function seems unstable: it sometimes fails and then succeeds when I try again.

Besides, after setting up the examples, I ran the rbm example; it failed with the message that no GPU was found, yet I can run cuda-convnet. I tried on both a K20 and a Quadro 4000; is this because of the system?

questions about the convolution pbtxt files

Your convolution demo fits the result perfectly (90% with only 100 training steps), but I don't understand what some parameters mean, such as

n_locs = (x_width + 2*padding - size)/stride + 1 = (28 + 2*2 - 5)/1 + 1 = 28

Is this a convolution layer or a subsampling layer? In a LeNet-style deep network the first convolution layer would be 24*24; if this is a conv layer, why does the size not change? Your comment is a little hard to understand; maybe you can tell me which paper you are referring to. Also, for this layer type, what do the parameters "shape" and "receptive_field_width" mean? I'm afraid Hinton's training guide does not mention them.
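
As far as I can tell, the formula itself answers the size question: with padding = 2 this is a "same" convolution, while LeNet uses no padding, which is where 24x24 comes from. A quick check, just evaluating the formula above in plain Python:

def n_locs(x_width, padding, size, stride):
  # Number of filter positions along one dimension.
  return (x_width + 2 * padding - size) / stride + 1

print n_locs(28, padding=2, size=5, stride=1)   # 28: padded, output size preserved
print n_locs(28, padding=0, size=5, stride=1)   # 24: LeNet-style, no padding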

When I tried to change the configuration file to match LeNet, I changed only the num_filters in the hidden1 layer, and this error occurred:
Traceback (most recent call last):
File "../../trainer.py", line 67, in
main()
File "../../trainer.py", line 62, in main
model.Train()
File "/home/chase/deepnet/deepnet/neuralnet.py", line 522, in Train
self.SetUpTrainer()
File "/home/chase/deepnet/deepnet/neuralnet.py", line 482, in SetUpTrainer
self.LoadModelOnGPU()
File "/home/chase/deepnet/deepnet/neuralnet.py", line 115, in LoadModelOnGPU
self.edge.append(self.EdgeClass(edge, node1, node2))
File "/home/chase/deepnet/deepnet/edge.py", line 34, in init
self.LoadParams()
File "/home/chase/deepnet/deepnet/edge.py", line 298, in LoadParams
n_locs = self.AllocateMemoryForConvolutions(param, node1, node2)
File "/home/chase/deepnet/deepnet/edge.py", line 243, in AllocateMemoryForConvolutions
assert numdims % num_colors == 0
AssertionError

Is it a bug?

Model for Convolutional network

Maybe I missed something, but could you provide a simple example of how to write a pbtxt file for training a convolutional NN? I think it is supported (since protos/deepnet.proto has some parameters for that), but I have some doubts, like whether I should have Parameter.Convolution in a Layer or in an Edge, and in that case, what the name of the parameter should be, etc. I think a simple example would clear up those doubts.
Btw, with some minor changes to the Make script and the CUDA code, I managed to compile the code successfully on CUDA 5. I can send you the modified code (or a description of what has been changed), if needed.

Thank you for making this awesome library available.

theory behind code

Dear @nitishsrivastava, please tell us the papers behind the various implementations, and maybe list them in the README; it would be very helpful for understanding the code. Thanks so much!

How to configure sharing weights in deepnet?

Hi nitish,
I want to use shared weights on some edges in deep models. Could you publish some examples of this, or just tell me how to configure it in the pbtxt files? Thanks in advance.

a bug & some question

My Nvidia display card is an old GT 9800, so I changed your trainer.py code like this:

def LockGPU(max_retries=10):
  cm.cublas_init()

so that I can run it.

I ran an experiment on small NORB in which I first resized all the images to 32x32 and pointed the test data and validation data at the same test file. Here is my result on the dbn model:
E_CE: 0.000 E_Acc: 0.000 (0/1000) E_CE: 18.559 E_Acc: 0.200 (199/1000)
I have read your code several times, and my question is: why are the evaluation results on the validation data and the test data different even though they point to the same data?

Some questions when using dropout

Hi Nitish,
I am using dropout in deepnet now, but I have 2 questions about the details of its usage. I wonder whether you or anyone else could help me.

  1. When I do layer-wise pretraining of a DBM with dropout (for example, a DBM consisting of an input layer and 2 hidden layers, h1 and h2, with dropout_prob 0.2, 0.5 and 0.5 respectively), after I have pretrained the bottom RBM (RBM1), is it correct to use the extracted representations of RBM1 directly to train the top RBM (RBM2)?
    What I mean is that, when extracting the representations of RBM1, the nodes of h1 are multiplied by (1-dropout_prob) after being activated (see the sketch below). When these data are then used as inputs to train RBM2, they are corrupted again by being multiplied by the dropout mask, which seems incorrect to me.
  2. I am training a multimodal DBM with dropout on 2 modalities. I will use the representations of the joint hidden layer as features for regression with SVR. Having read that dropout should not be used in the output layer of a DL model, I wonder whether in my case the joint hidden layer should be seen as an output layer. Is it proper to use dropout in it?
    Thanks in advance.
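
For reference on question 1, here is a minimal numpy sketch of the usual dropout convention (this is the general technique from the dropout paper, not necessarily what deepnet does when extracting representations): at extraction time the activations are scaled by (1 - dropout_prob) so that they match the training-time random mask in expectation.

import numpy as np

rng = np.random.RandomState(0)
h = rng.rand(100000)       # activations of h1 after the nonlinearity
q = 0.5                    # dropout_prob: probability of dropping a unit

mask = (rng.rand(h.size) > q).astype('float')
dropped = h * mask         # training-time behaviour: random binary mask
scaled = h * (1 - q)       # extraction-time behaviour: deterministic scaling

print dropped.mean(), scaled.mean()   # nearly equal: the scaling matches in expectation

Whether RBM2 should then apply its own mask on top of these already-scaled inputs is exactly the open question above.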

bug report in relu_layer.py

I asked questions about the relu RBM before. In the relu RBM, the error increases no matter how low I set the learning rate, so I assume there is a bug in relu_layer.

Set x = v*w + b.
In the Sample function in relu_layer.py, the sample is max(0, N(0, var(max(0, x)))).

However, in the paper "ReLUs improve RBMs" (ICML 2010), the sample is max(0, x + N(0, var(x))).
Obviously, the sampling function in relu_layer.py is wrong.
The following code is how I modified it.

def ApplyActivation(self, neg=False):
  if neg:
    state = self.neg_state
  else:
    state = self.state
  self.preState.assign(state)  # Store x; this variable is created in layer.py.
  state.lower_bound(0)

def Sample(self, neg=False):
  if neg:
    sample = self.neg_sample
    state = self.neg_state
  else:
    sample = self.sample
    state = self.state
  self.preState.sample_gaussian(target=sample, mult=1.0)  # N(0, var(x))
  sample.add(self.preState)                                # x + N(0, var(x))
  sample.lower_bound(0)                                    # max(0, x + N(0, var(x)))
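
To see the difference numerically, here is a small self-contained numpy comparison of the two sampling rules. It treats var(.) as the sigmoid noise variance from the Nair & Hinton paper, which is an assumption, since the exact variance is not spelled out above.

import numpy as np

rng = np.random.RandomState(0)

def sigmoid(x):
  return 1.0 / (1.0 + np.exp(-x))

def sample_paper(x):
  # ICML 2010 rule: max(0, x + N(0, var(x))).
  return np.maximum(0, x + rng.randn(*x.shape) * np.sqrt(sigmoid(x)))

def sample_reported(x):
  # What the issue says relu_layer.py does: max(0, N(0, var(max(0, x)))).
  # The mean term x is lost, so samples are centred at 0 instead of at x.
  state = np.maximum(0, x)
  return np.maximum(0, rng.randn(*x.shape) * np.sqrt(sigmoid(state)))

x = np.full(100000, 3.0)                 # a strongly-driven unit: x = v*w + b = 3
print sample_paper(x).mean()             # close to 3
print sample_reported(x).mean()          # close to 0.4: most of the signal is gone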

dbn example: 'DBN' object has no attribute 'layer'

Hi Nitish,
The rbm example works perfectly, but dbn example fails on rbm2 (after completing rbm1 training) with the following:
Traceback (most recent call last):
File "../../trainer.py", line 56, in
main()
File "../../trainer.py", line 51, in main
model.Train()
File "/deepnet/deepnet/neuralnet.py", line 497, in Train
self.CopyModelToCPU()
File "/deepnet/deepnet/neuralnet.py", line 340, in CopyModelToCPU
for layer in self.layer:
AttributeError: 'DBN' object has no attribute 'layer'

Thanks!

Sergey

question about rectified_linear

In the ff example, the rectified_linear activation function performs much better than the logistic function. Is there any method to pretrain rectified_linear neural nets?

Linear Layer error positive phase

Train Step: 0Traceback (most recent call last):
File "../../../deepnet/deepnet/trainer.py", line 56, in
main()
File "../../../deepnet/deepnet/trainer.py", line 51, in main
model.Train()
File "/na/homes/dhjelm/CUDANET/deepnet/deepnet/neuralnet.py", line 646, in Train
losses = self.TrainOneBatch(step)
File "/na/homes/dhjelm/CUDANET/deepnet/deepnet/dbm.py", line 270, in TrainOneBatch
losses1 = self.PositivePhase(train=True, step=step)
File "/na/homes/dhjelm/CUDANET/deepnet/deepnet/dbm.py", line 131, in PositivePhase
self.ComputeUp(node, train=train)
File "/na/homes/dhjelm/CUDANET/deepnet/deepnet/dbm.py", line 108, in ComputeUp
if layer.pos_phase:
AttributeError: 'LinearLayer' object has no attribute 'pos_phase'

Cudamat typeof compile error

I cannot install cudamat because of this error:

cudamat.cu(1055): error: identifier "typeof" is undefined

cudamat.cu(1055): error: expected a ";"

cudamat.cu(1056): error: identifier "res" is undefined

libeigenmat.so: cannot open shared object file

I'm getting this error when running ./runall.sh

_eigenmat = ct.cdll.LoadLibrary('libeigenmat.so')
File "/usr/lib/python2.7/ctypes/__init__.py", line 443, in LoadLibrary
return self._dlltype(name)
File "/usr/lib/python2.7/ctypes/__init__.py", line 365, in __init__
self._handle = _dlopen(self._name, mode)
OSError: libeigenmat.so: cannot open shared object file: No such file or directory

No GPU board available

I can run ./runall.sh under /example/rbm successfully the first time, and my GPU works normally. However, there is an error in the print function, due to a Python version incompatibility I think. After that, when I run the example again, it terminates with the message "No GPU board available."

It seems that the code locked my GPU last time and didn't free it. How can I unlock my GPU?
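
On the lock itself: deepnet acquires the board through cudamat's gpu_lock module (trainer.py does "from cudamat import gpu_lock"), and a crashed run can leave the lock held. A hedged sketch of freeing it manually, assuming that module's obtain_lock_id/free_lock interface (the board id 0 below is a guess; use whichever board the crashed run had locked):

from cudamat import gpu_lock

board = gpu_lock.obtain_lock_id()   # returns -1 when every board is marked taken
if board == -1:
  gpu_lock.free_lock(0)             # release the stale lock from the crashed run
  board = gpu_lock.obtain_lock_id()
print 'Using board', board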

segfault in dbn example

Hi Nitish,
I've run the dbn example over the weekend and it segfaulted on running Classifier. Here is the output:
./runall.sh: line 11: 15313 Segmentation fault (core dumped) ${train_deepnet} mnist_classifier.pbtxt train_classifier.pbtxt eval.pbtxt

real 3191m57.396s
user 3186m10.271s
sys 1m33.154s

And this is after 2 days of crunching :) Good thing all the checkpoints are saved.

Thanks!

Sergey

no field named "mcmc_steps" and "lower_model" when running multimodal_dbm example

hi nitish srivastava,
I have problems when running multimodal_dbm example, the error messages are:
google.protobuf.text_format.ParseError: 17:3 : Message type "deepnet.Hyperparams" has no field named "mcmc_steps".

google.protobuf.text_format.ParseError: 23:1 : Message type "deepnet.Model" has no field named "lower_model".

It seems that the multimodal_dbm example is not completely compatible with deepnet.

By the way, if I want to use dropout in multimodal_dbm, specifically dropout_prob=0.2 for the image input layer and dropout_prob=0.5 for the image hidden1 and hidden2 layers, is it correct to use dropout on the image hidden1 layer both when running image rbm1 and image rbm2? Or should I just use dropout on the visible layer of each rbm?

multimodal_dbn

Hi, nitish,
I am testing multimodal_dbn and encountered an error.
It says: AttributeError: /home/qianlong/deepnet/cudamat/libcudamat.so: undefined symbol: perturb_prob
Do I need to recompile cudamat?
See the attached screenshot (err1) if it provides any information.

multimodal_dbn assertion error

Hi Nitish,

With the multimodal_dbn script, I'm getting an error as it starts to train the joint layer:

AssertionError: Size of text_hidden2_train is not 850580

I'll go back and recheck things, but any ideas on this?

Thanks,
Ron

Code Development without GPU

Dear Nitish Srivastava

Is there any way to use this library if we do not have GPUs to compile the code?

Best Regards,

Sincerely,
Devendra Singh

CUDA error

We haven't changed our hardware or anything related to CUDA, but we are getting errors. What might cause this?

no CUDA-capable device is detected
Traceback (most recent call last):
File "../../../deepnet/deepnet/trainer.py", line 56, in
main()
File "../../../deepnet/deepnet/trainer.py", line 47, in main
board = LockGPU()
File "../../../deepnet/deepnet/trainer.py", line 20, in LockGPU
cm.cuda_set_device(board)
File "/na/homes/dhjelm/CUDANET/deepnet/cudamat/cudamat.py", line 1630, in cuda_set_device
raise generate_exception(err_code)
cudamat.cudamat.CUDAMatException: CUDA error: no CUDA-capable device is detected

rbm example problems

Hi Nitish,
I have successfully compiled your code following the instructions and I am trying to run the rbm example (deepnet/examples/rbm). I have two issues:

  1. ./runall.sh
    Traceback (most recent call last):
    File "../../trainer.py", line 7, in
    from cudamat import gpu_lock
    ImportError: cannot import name gpu_lock

Hmm! If I just fire up python and execute "from cudamat import gpu_lock", it works smoothly. So what I did: I just copied trainer.py into examples/rbm and edited train.pbtxt to account for the change. Now it runs, but see below.

  2. It fails on data import. I am printing filename_list in the Get method of the datahandler and it is empty, as you can see below.
    Thanks!

Train Step: 0[]
Traceback (most recent call last):
File "trainer.py", line 56, in
main()
File "trainer.py", line 51, in main
model.Train()
File "/na/homes/splis/soft/tools/deepnet/deepnet/neuralnet.py", line 463, in Train
self.GetTrainBatch()
File "/na/homes/splis/soft/tools/deepnet/deepnet/neuralnet.py", line 363, in GetTrainBatch
self.GetBatch(self.train_data_handler)
File "/na/homes/splis/soft/tools/deepnet/deepnet/dbm.py", line 229, in GetBatch
super(DBM, self).GetBatch(handler=handler)
File "/na/homes/splis/soft/tools/deepnet/deepnet/neuralnet.py", line 354, in GetBatch
data_list = handler.Get()
File "/na/homes/splis/soft/tools/deepnet/deepnet/datahandler.py", line 621, in Get
batch = self.gpu_cache.Get(self.batchsize, get_last_piece=self.get_last_piece)
File "/na/homes/splis/soft/tools/deepnet/deepnet/datahandler.py", line 397, in Get
self.LoadData()
File "/na/homes/splis/soft/tools/deepnet/deepnet/datahandler.py", line 313, in LoadData
data_cpu = self.parent.Get(self._maxpos)
File "/na/homes/splis/soft/tools/deepnet/deepnet/datahandler.py", line 250, in Get
self.LoadData()
File "/na/homes/splis/soft/tools/deepnet/deepnet/datahandler.py", line 222, in LoadData
self.data = self.parent.Get(self._maxpos)
File "/na/homes/splis/soft/tools/deepnet/deepnet/datahandler.py", line 65, in
Get
current_file = (self.last_read_file[i] + 1) % num_files
ZeroDivisionError: integer division or modulo by zero

AutoEncoder mirroring by transpose

Dear Nitish,

In your ae example, when we have 3 or more layers, will the second half automatically be mirrored using the transpose of the weights? My tests, visualizing and comparing the weights, concluded that it is not. In a simple case of e.g. [ly1] 768 - [ly2] 512 - [ly3] 768, [ly3] is declared tied_to [ly1], all obviously using a global SQUARED_LOSS, but the question is how to tell [ly2] to always take the transposed weights from the ly1<->ly2 edge. I found the "tied_transpose: true" directive, but it seems not to work as expected, or I don't know what it is doing.

Thank You.
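
For reference, a minimal numpy sketch of what the asked-for mirroring would mean (my own illustration, not deepnet's implementation): the ly2->ly3 edge reuses the transpose of the ly1->ly2 weights, so only one weight matrix plus the biases are free parameters.

import numpy as np

rng = np.random.RandomState(0)
W1 = 0.01 * rng.randn(768, 512)     # ly1 (768) -> ly2 (512)
b2 = np.zeros(512)                  # ly2 bias
b3 = np.zeros(768)                  # ly3 bias

def forward(x):
  h = 1.0 / (1.0 + np.exp(-(x.dot(W1) + b2)))   # encoder: ly1 -> ly2
  return h.dot(W1.T) + b3                       # decoder: ly2 -> ly3, W1 transposed

x = rng.rand(10, 768)
recon = forward(x)                  # reconstruction for SQUARED_LOSS against x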

Is it correct to set mult_factor in edge parameters when using dropout with pre-training?

Hi Nitish,
In your paper "Improving Neural Networks with Dropout", it says: "Dropout nets can also be pretrained using these techniques. The procedure is identical to standard pretraining except with a small modification - the weights obtained from pretraining should be scaled up by a factor of 1/p." Does this mean that I should set mult_factor equal to (1-dropout_prob) in the edge parameters in the pbtxt files when using dropout with pre-training?
Another question: if the two layers of an edge have different dropout_prob, for example an input layer with dropout_prob 0.2 and a hidden1 layer with dropout_prob 0.5, how should mult_factor be set in this case?
Thanks in advance.
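
For what it's worth, here is my reading of the 1/p rule as a sketch (an assumption about the paper's notation, not about deepnet's mult_factor semantics): in the paper, p is the probability of retaining a unit, i.e. p = 1 - dropout_prob, and the pretrained weights are divided by it.

import numpy as np

dropout_prob = 0.5          # probability of dropping a unit, in deepnet's terms
p = 1.0 - dropout_prob      # the paper's p: probability of retaining a unit

W_pretrained = 0.01 * np.random.randn(784, 512)
W_for_dropout_net = W_pretrained / p    # "scaled up by a factor of 1/p"

Which layer's dropout_prob supplies the p when the two endpoints of an edge differ is exactly the second question above.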

error with cudamat and cudamat_conv library

hi nitish srivastava,

I have a problem when I test the cudamat library by running "python test_cudamat.py": it just shows "ran 0 tests".

I then used the "nosetests" command in that directory and got an exception: "CUDAMatException: CUBLAS error".

I looked at the code in cudamat, and it seems that the exception was thrown when the copy_to_device function was called in cudamat.py, which does the work of copying a matrix to the GPU.
I'm new to cuda and really have no idea how to solve this. Any hints will be greatly appreciated.

ansyral

Sorry, I've figured out the reason. I'll just list it here in case someone else has the same problem:
the cuda device wasn't recognized because I didn't have the device files in /dev. All you need to do is execute this shell script:

#!/bin/bash

/sbin/modprobe nvidia

if [ "$?" -eq 0 ]; then

  # Count the number of NVIDIA controllers found.
  NVDEVS=`lspci | grep -i NVIDIA`
  N3D=`echo "$NVDEVS" | grep "3D controller" | wc -l`
  NVGA=`echo "$NVDEVS" | grep "VGA compatible controller" | wc -l`

  N=`expr $N3D + $NVGA - 1`
  for i in `seq 0 $N`; do
    mknod -m 666 /dev/nvidia$i c 195 $i
  done

  mknod -m 666 /dev/nvidiactl c 195 255

else
  exit 1
fi

The reference is here: http://docs.nvidia.com/cuda/cuda-getting-started-guide-for-linux/index.html

Did this package work well for Mac OS 10.8.2 GeForce 660M GPU?

I got libcudamat.dylib and libcudamat_conv.dylib,
then when I tried to test it with "python test_cudamat.py",
the result is:
Ran 0 tests in 0.000s
OK

Then, I tried to run sh runall.sh in the rbm folder and finally, it showed:

No GPU board available.

Could someone help figure out the issue?

Thanks so much!!!

Tied weights and TransposedCUDAMatrix

Hi,

I'm trying to modify the autoencoder model_layer1.pbtxt file to tie the weights from the hidden layer to the output layer to be the transpose of those from the input layer to the hidden layer. When I run this I get the error:

AttributeError: 'TransposedCUDAMatrix' object has no attribute 'shape'

Any ideas how to fix this? I tried making TransposedCUDAMatrix derive from CUDAMatrix instead of object, which fixed that problem, but it then died when trying to use the .T operator.

Thanks,
Ron

missing a library

Hi Nitish,

Thanks for putting up the code.
I tried to run setup_examples.py; however, it seems that deepnet.py is missing from the package. Can you please check that out?

Here is the error:
Traceback (most recent call last):
File "setup_examples.py", line 2, in
from deepnet import util
ImportError: No module named deepnet

Thanks,
-M

Data format for training different models

Hi,

I would like to know whether the data format used by your models is the same as that used by cuda-convnet?
I am dealing with different types of data, so I wanted to know if there is any pre-built function to convert this data into a format compatible with the different models.

Let me know.

Thanks.

What does the 'output path' mean in the running example?

After changing to deepnet/deepnet/examples and running the following command,
$ python setup_examples.py

I made an 'out' directory under examples and ran
$ python setup_examples.py out
but no file was added to the 'out' directory.

g++ version backdate? [Cudamat]

When trying to run the build.sh script during the cudamat_conv installation portion, I get the following error:

make: /usr/bin/g++-4.4: Command not found
make: *** [bin/linux/release/_ConvNet.so] Error 127

Is it necessary for me to backdate g++ to 4.4 or have I made a mistake elsewhere?

rbm/runall.sh execution problem

I am getting this error when I run the script:

Traceback (most recent call last):
File "../../trainer.py", line 5, in <module>
from sparse_coder import *
File "/home/retina18/deepnet/deepnet/sparse_coder.py", line 3, in <module>
import scipy.linalg
File "/usr/lib/python2.7/dist-packages/scipy/linalg/__init__.py", line 9, in <module>
from basic import *
File "/usr/lib/python2.7/dist-packages/scipy/linalg/basic.py", line 14, in <module>
from lapack import get_lapack_funcs
File "/usr/lib/python2.7/dist-packages/scipy/linalg/lapack.py", line 14, in <module>
from scipy.linalg import flapack
ImportError: /usr/lib/liblapack.so.3gf: undefined symbol: ATL_chemv

Any idea why?

Memory usage on AWS

Hi Nitish,

Thanks for making deepnet available. I'm looking forward to working with it.
I'm able to run most of the examples, but when I try multimodal_dbn, I get MemoryErrors while extract_rbm_representation.py is running. I'm running on an AWS Cluster GPU instance, which has 6 GB on the GPU and 22 GB of main memory. I've been progressively shrinking the gpu_mem value (from 4 to 2 to 1) and the main_mem value (20, 18, 16, 10, 8). With them set to 1G and 8G, respectively, the image layer 1 extract completed for the first time, and layer 2 trained, but then the extract on layer 2 failed. The error dump is appended.

Any suggestions on how to fix this? I notice that in extract_rbm_representation there is an additional memory=10G setting; do I need to make that no larger than the main_mem setting?

Thanks,
Ron

Writing to /vol/FlickrPreproc/flickr/dbn_reps/image_rbm2_LAST/train
998Traceback (most recent call last):
File "/home/ubuntu/src/deepnet-master/deepnet/extract_rbm_representation.py", line 81, in
main()
File "/home/ubuntu/src/deepnet-master/deepnet/extract_rbm_representation.py", line 76, in main
data_proto=data_proto)
File "/home/ubuntu/src/deepnet-master/deepnet/extract_rbm_representation.py", line 40, in ExtractRepresentations
layernames, output_dir, memory=memory, dataset=dataset, input_recon=True)
File "/home/ubuntu/src/deepnet-master/deepnet/dbm.py", line 360, in WriteRepresentationToDisk
datagetter()
File "/home/ubuntu/src/deepnet-master/deepnet/neuralnet.py", line 370, in GetTrainBatch
self.GetBatch(self.train_data_handler)
File "/home/ubuntu/src/deepnet-master/deepnet/dbm.py", line 229, in GetBatch
super(DBM, self).GetBatch(handler=handler)
File "/home/ubuntu/src/deepnet-master/deepnet/neuralnet.py", line 361, in GetBatch
data_list = handler.Get()
File "/home/ubuntu/src/deepnet-master/deepnet/datahandler.py", line 627, in Get
batch = self.gpu_cache.Get(self.batchsize, get_last_piece=self.get_last_piece)
File "/home/ubuntu/src/deepnet-master/deepnet/datahandler.py", line 396, in Get
self.LoadData()
File "/home/ubuntu/src/deepnet-master/deepnet/datahandler.py", line 332, in LoadData
self.data[i].overwrite(mat)
File "/home/ubuntu/src/deepnet-master/cudamat/cudamat.py", line 161, in overwrite
array = reformat(array)
File "/home/ubuntu/src/deepnet-master/cudamat/cudamat.py", line 1621, in reformat
return np.array(array, dtype=np.float32, order='F')
MemoryError
./runall_dbn.sh: line 71: 3880 Segmentation fault (core dumped) python ${extract_rep} ${model_output_dir}/image_rbm2_LAST trainers/dbn/train_CD_image_layer2.pbtxt image_hidden2 ${data_output_dir}/image_rbm2_LAST ${gpu_mem} ${cpu_mem}

pbtxt parameters

Hi, the pbtxt files are a bit difficult to configure. Some parts are intuitive given the examples; other parts are not. What are the defaults for each parameter, and what values are available (e.g., for activation: "logistic", "relu", "hyperbolic tangent")? Reading through the examples is easy but incomplete; reading through the code is difficult and still incomplete.

Unfortunately, some parameters are not intuitive even for a basic DBN. "shape" is one of them: what is this parameter in terms of the biases or interactions in the network, and why do I need it? Are these parameters eventually used for optimization? There are two of the same name, and they often change within the same layer (say hidden1 in the dbn training). In addition, sometimes shape (the first one) x shape (the second one) = dimensions (the number of hidden nodes?). Or is it shape^2 for the first one, with the second one being something else entirely?

Anyways, thanks in advance!
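
One way to enumerate the available parameters and their defaults without digging through the code is to inspect the protos directly. A sketch, assuming deepnet's generated protobuf module is importable as deepnet_pb2:

from google.protobuf import text_format
import deepnet_pb2

model = deepnet_pb2.Model()
with open('model.pbtxt') as f:
  text_format.Merge(f.read(), model)

print model.layer[0]                  # every field set on the first layer
# All fields defined for hyperparams, with their default values:
for field in deepnet_pb2.Hyperparams.DESCRIPTOR.fields:
  print field.name, field.default_value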

What is the difference between "compute_sparsity" in a hidden layer and "sparsity" in hyperparams?

I found in some model.pbtxt files (for example, the model.pbtxt in the mnist rbm example) that there are two kinds of parameters related to sparsity: one called "compute_sparsity" in the "layer" field, and another called "sparsity" in the "hyperparams" field. Sometimes the two parameters contradict each other, usually compute_sparsity=true but sparsity=false, which really confused me. What is the difference between them?

could you post the multimodal pretrained files?

Well, you know, the Flickr dataset is very large, and it is going to take a very, very long time for my old computer to train on it. Would you please post all the multimodal 'LAST' pretrained files to the git repo? I'm still confused about some details of the training procedure; if I could get the pretrained files, that would help me a lot.

Can't run rbm/runall.sh

It seems that this project does not support CUDA 5.5. When I compiled cudamat with CUDA 5.5, it reported that 'uint is undefined', so I added '#define uint unsigned int' in the header. Then, after I compiled libcudamat_conv.so and libcudamat.so, I failed on 'python test_cudamat.py'. I didn't get an error message, but I got the following output:

Ran 0 tests in 0.000s

I ignored this and tried to run the sample code, I mean the runall.sh in /deepnet/examples/rbm, and it told me 'No GPU board available'. I don't know how to solve this. By the way, I use Mac OS 10.8.4. Should I go back to an older version of CUDA (such as 4.2 or 5.0) or find a Linux OS? Thanks!
