dmlc / minerva Goto Github PK

Minerva: a fast and flexible tool for deep learning on multi-GPU. It provides an ndarray programming interface, just like NumPy. Both Python and C++ bindings are available. The resulting code can run on CPU or GPU, and multi-GPU support is easy to use.

License: Other

Python 37.08% C++ 49.64% CMake 1.90% C 0.05% Cuda 5.33% Shell 0.37% Protocol Buffer 5.63%

minerva's Introduction

Distributed Machine Learning Common Codebase

DMLC-Core is the backbone library that supports all DMLC projects and offers the building blocks for efficient and scalable distributed machine learning libraries.

Developer Channel

What's New

Note on Parameter Module for Machine Learning

Known Issues

The RecordIO format is not portable across processors with different endianness. It is therefore not possible to save a RecordIO file on an x86 machine and then load it on a SPARC machine, because x86 is little-endian while SPARC is big-endian.
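To make the byte-order problem concrete, here is a minimal sketch using Python's standard `struct` module. It does not reproduce the actual RecordIO layout; the 32-bit value is made up purely to show how the same integer is stored differently on little-endian and big-endian machines.

```python
import struct

value = 0x01020304  # arbitrary example value, not a real RecordIO field

# Pack the same 32-bit unsigned integer in both byte orders.
little = struct.pack("<I", value)  # layout a little-endian machine (x86) writes
big = struct.pack(">I", value)     # layout a big-endian machine (SPARC) writes

# Reading the little-endian bytes as if they were big-endian gives a different number.
misread = struct.unpack(">I", little)[0]

print(little.hex())   # 04030201
print(big.hex())      # 01020304
print(hex(misread))   # 0x4030201
```

A file format stays portable only if it fixes one byte order explicitly (as the `<` and `>` prefixes do here) instead of writing the host's native layout.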

Contributing

Contributions to dmlc-core are welcome! dmlc-core follows Google's C++ style guide. If you are interested in contributing, take a look at the feature wishlist and open a new issue if you would like to add something.

DMLC-Core uses the C++11 standard. Ensure that your C++ compiler supports C++11.
Try to introduce as few dependencies as possible.

Checklist before submitting code

Type make lint and fix all the style problems.
Type make doc and fix all the warnings.

NOTE

deps:

libcurl4-openssl-dev

minerva's People

Contributors

Stargazers

Watchers

Forkers

alienfeel trangle provemyself lqshixinlei spideryan uknowxk ilovefree2 njuhugn xianyi kai4729 voidexception jingtaow xingdi-eric-yuan darcy0511 yiiwood atomcat vangogh0318 bobye yuewong y-x-c weiyb chagge yjcelly yanshanjing niuzhiheng mapleyustat wenhuizhang yanqingmen dzhwinter xshhhm lawrencelj fancyerii leizi007 fudanimc tigerneil xaccc 0x0all clab snazz2001 amos-zq apprisi zuiwufenghua uwroute cuijianzhu thomasdic2000 gongzhen orangelpai exlsunshine wangdongfrank tandakun chandlerz josephwinston maymaomao ericxsun jethrotan david61 wycharry nagyistoce xubingyue itortoise nkhuyu txd866 vickyzhou jwmneu sandyfy zuiwanting caohao2008 elviswf kingnus evansky wellsoftware guzhaki lovi9573 dreadlord1984 phecy zhouxiazx zhangmeishan txliheng chengduozhao maorenxin yskamiya kitstar zkailinzhang acctforgh mrgloom athrunarthur kublai-jing zjucsxxd guker cequencer lukemetz poneyo malkocb evanhon yoavg wyvern92 meanmee deepdream-community tomzhang joseph1992yu

minerva's Issues

use_dag flag

Hello,
I am experiencing a crash when trying to launch without the dag from python.

$ python tt.py --use_dag=false
[22:16:29] /home/luke/Repos/minerva/minerva/system/minerva_system.cpp:89: dag engine disabled
*** Error in `python': free(): invalid pointer: 0x000000000102a498 ***
Aborted (core dumped)

I tracked it back to:
https://github.com/dmlc/minerva/blob/master/owl/owl/libowl.pyx#L46
where argv is being freed.

Commenting that out does fix the problem but most likely leaks.

I am running on Ubuntu 15.04 with a CPU-only build. Without any flags, it launches without an issue.

Thanks!

Unittest compilation failed

When compiling the unit tests, I got an error on my machine (GCC 4.9.2 prerelease, Arch Linux 2015.03.01, kernel 3.18.6). It is caused by the following changes in commit 2e886a4:

unittest_main is built as a shared library while for gtest, usually it only builds .a.
The -flto flag is not working with my GCC.

I suggest fixing them by changing unittest_main back to a static library and removing the -flto flag (or adding a check for whether the compiler supports it).

Difference between Minerva and MShadow

Dear all,

I was surprised to see that Minerva is now part of the umbrella project DMLC, so I'm a little lost among these awesome tools. What is the difference between them, especially between Minerva, mshadow and Cxxnet? I'm interested in implementing a distributed version of a triplet convolutional network that I have in Caffe.

Tutorial, API Documentation?

Hi,
Is there a tutorial for the API in this project? I've only found the sample code (mnist_mlp, mnist_cnn, ...).

Thanks!

automatically detect GPU compute capability and set -arch correctly

Currently we set -arch sm_35, which leads to "invalid device function" on older GPUs. There seems to be some effort toward automatically detecting the GPU compute capability and setting the flag correctly: https://public.kitware.com/Bug/view.php?id=11767

Let's see if we can use it in our CMakeLists.txt

Fast convolution using FFT

This paper, Fast Training of Convolutional Networks through FFTs, shows that FFT-based convolution is faster than the convolution-unrolling implementation in cuDNN.

It's implemented in Torch

Has anyone in the Minerva team tried this approach?

C++ Documentation?

Is there a plan to release C++ API docs?

For instance, I saw a concat function in Python. Is there an equivalent one in C++?

Thanks,
Kublai

Missing index 0 in CPU SoftmaxForward

The CPU implementation of SoftmaxForward in /minerva/op/impl/basic.cpp skips index 0 in the second (line 249) and third (line 255) loops.
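For reference, a softmax over one row can be sketched in plain Python as follows. This is not the code from basic.cpp (which is not reproduced here); it is only a hedged illustration of which loops must cover index 0. The max search may start at 1 because it is seeded with element 0, but the exponentiation and normalisation loops have to start at 0.

```python
import math

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    # Loop 1: find the maximum. Starting at 1 is fine here because
    # max_val is already initialised from row[0].
    max_val = row[0]
    for i in range(1, len(row)):
        max_val = max(max_val, row[i])

    # Loop 2: exponentiate and accumulate. This must start at index 0.
    exps = [0.0] * len(row)
    total = 0.0
    for i in range(0, len(row)):
        exps[i] = math.exp(row[i] - max_val)
        total += exps[i]

    # Loop 3: normalise. This must also start at index 0.
    for i in range(0, len(row)):
        exps[i] /= total
    return exps

probs = softmax([1.0, 2.0, 3.0])
```

If either of the last two loops started at 1, the first output would stay unnormalised or be left at 0, and the row would no longer sum to 1.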

Pooling output dimension is confusing if we give a non-square matrix as input

Hi,

In the MNIST_CNN example in owl, if we set the batch size to 4, the input dimension is something like [28, 28, 1, 4], since each picture is a 28*28 square matrix. However, I found that when the input matrix is non-square, the output dimension is confusing. I am wondering whether this is a bug or whether it is implemented that way intentionally.

For example, if input.shape is [4, 2, 1, 4] in owl format, I set "pooling = conv.Pooler(2, 2, 2, 2, 0, 0, conv.pool_op.max)" and call "pooling.ff(input)", I expect to get [2, 1, 1, 4]. But I actually get [1, 2, 1, 4] as the output dimension in owl. Should I expect [1, 2, 1, 4] as the result, or is something going wrong inside the pooling function?

I would greatly appreciate any comments and suggestions. Thanks!
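The expectation in this report can be checked with the usual pooling arithmetic. The sketch below is plain Python, not owl code, and it assumes floor-mode pooling with the same 2x2 window, stride 2 and zero padding on both spatial axes; how owl's Pooler orders its arguments internally is not assumed.

```python
def pooled_size(in_size, kernel, stride, pad=0):
    """Output length along one axis for floor-mode pooling."""
    return (in_size + 2 * pad - kernel) // stride + 1

def pooled_shape(shape, kernel, stride, pad=0):
    """Apply the same window to the first two axes of a 4-D shape."""
    first, second, channels, batch = shape
    return [
        pooled_size(first, kernel, stride, pad),
        pooled_size(second, kernel, stride, pad),
        channels,
        batch,
    ]

print(pooled_shape([4, 2, 1, 4], kernel=2, stride=2))  # [2, 1, 1, 4]
```

Under these assumptions the first axis shrinks from 4 to 2 and the second from 2 to 1, which matches the [2, 1, 1, 4] the reporter expected.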

Binary Classifier - Log loss function

Hi,
Fantastic Library!
I am trying to use the library for a binary classifier experiment, using the log loss function to train the model. This is for a university experiment on benchmarking different models. Would you have time to provide an example of how to use the library to achieve this?
It would also help to show a visualization of how the algorithm learns and decreases the error.
Many thanks,
Best,
Andrew
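As a library-independent starting point for this request, the log loss and a tiny training loop can be sketched in plain Python. Nothing here uses the Minerva or owl API; the toy data, the learning rate and the helper names (`sigmoid`, `log_loss`, `train`) are all made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(labels, probs, eps=1e-12):
    """Mean binary cross-entropy; probabilities are clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)

def train(xs, ys, lr=0.5, steps=50):
    """Full-batch gradient descent on a 1-D logistic model.

    Returns the weight, the bias and the loss recorded before each step.
    """
    w, b = 0.0, 0.0
    history = []
    for _ in range(steps):
        probs = [sigmoid(w * x + b) for x in xs]
        history.append(log_loss(ys, probs))
        # Gradient of the mean log loss with respect to w and b.
        grad_w = sum((p - y) * x for p, y, x in zip(probs, ys, xs)) / len(xs)
        grad_b = sum(p - y for p, y in zip(probs, ys)) / len(xs)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, history

xs = [-2.0, -1.0, 1.0, 2.0]  # toy inputs (illustrative only)
ys = [0, 0, 1, 1]            # binary labels
w, b, history = train(xs, ys)

# Print the loss every 10 steps to see it decrease.
for step in range(0, len(history), 10):
    print(step, round(history[step], 4))
```

The `history` list is what one would plot to visualise learning: the loss starts at ln 2 (all predictions are 0.5) and falls as the weight grows.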

softmax not implemented on CPU

Jiaxing's intern is working on LSTM on CPU. He needs softmax.

How is Minerva/Owl different from Theano?

What advantages does it have over Theano?

Does Minerva/Owl have automatic differentiation capability that Theano has?

I had thought that Minerva is more comparable to Caffe but it looks like Minerva is more similar to Theano than it is to Caffe. Is it correct?

Is it easy(possible) to implement RNN?

Could you provide a simple example?

Can't build apps

Does Minerva work with CUDA 7? I installed CUDA 7 and all the samples worked well, but building the Minerva apps fails with errors like undefined reference to curandGenerateNormal.

...
-- cmake generator: Unix Makefiles
-- cmake build tool: /usr/bin/make
-- cmake build type: Release
-- Found cuDNN (include: ~/cudnn2, library: ~/cudnn2/libcudnn.so)
-- Found BLAS (include: /usr/include, library: /opt/openblas/lib/libcblas.so)
-- build C++ applications              -- 1
-- build unit tests                    -- 0
-- build cpu-only version              -- 0
-- build with parameter server support -- 0
-- build with BLAS library for CPU     -- 1
-- Build CXX Applications:
--   mnist_cnn_2gpu
--   mnist_mlp
--   main
--   mnist_cnn
-- Configuring done
-- Generating done
-- Build files have been written to: ~/minerva/release
[ 16%] Built target gflags
[ 32%] Built target dmlc-core
[ 32%] Built target third-party
Linking CXX shared library ../lib/libminerva.so
[ 91%] Built target minerva
Linking CXX executable main
../lib/libminerva.so: undefined reference to `curandGenerateNormal'
../lib/libminerva.so: undefined reference to `curandSetPseudoRandomGeneratorSeed'
../lib/libminerva.so: undefined reference to `curandCreateGenerator'

Here's my config.in file

BUILD_DIR=release
CXX=g++
CC=gcc
CXXFLAGS=
CUDA_ROOT=/usr/local/cuda
CUDNN_ROOT=/home/zer0n/cudnn2
BUILD_TYPE=Release
BUILD_OWL=0
BUILD_CXX_APPS=1
BUILD_TESTS=0
BUILD_WITH_PS=0
PS_ROOT=
BUILD_CPU_ONLY=0
BUILD_WITH_BLAS=1
BLAS_ROOT=/opt/openblas

I have successfully installed Caffe with CUDA 7 using these instructions, though.

minerva with ps

Hi,

I am trying to compile Minerva with the PS (as suggested in https://github.com/dmlc/minerva/wiki/Integrate-with-PS), but I ran into some problems.
First, I cannot find the ps branch of Minerva. Did you publish it in the repository?
Second, I downloaded the PS from https://github.com/hjk41/parameter_server. After compiling, I obtain libminervaps.a and libminervaps.so (using a previous version of the Makefile). However, when I try to compile Minerva, I get the following errors:

...
Linking CXX shared library ../lib/libminerva.so
/usr/bin/ld: cannot find -lminervaps
collect2: error: ld returned 1 exit status
make[2]: *** [lib/libminerva.so] Error 1
make[1]: *** [minerva/CMakeFiles/minerva.dir/all] Error 2
make: *** [all] Error 2

I tried to include the PATH, but it doesn't work.
BTW, compiling the project without ps is fine.

Can you help me with those?

Thanks,
Tao

mnist_cnn failed at Epoch #0

Here is the error:
F0310 16:38:25.787619 9513 narray.cpp:126] Check failed: lhs.Size(1) == rhs.Size(0) (512 vs. -6833920) size must match
the acts[6] has size: [32845682 4 32 256 ], and minibatch_size is 256.

Images for README.md

Differences with MShadow

Hi all,
What's the difference between MShadow and Minerva? From the documents, I know that both of them can perform tensor operations in a unified form on both CPU and GPU, and that both are written in C++. So I'm wondering what the main difference between them is.

Request owl.elewise.pow/sqrt

Hi guys,

Would it be possible to provide an element-wise power/sqrt function for the Python interface? I'd like to adjust the weights using Adagrad or RMSprop, which requires such an operation. However, the NumPy solution does not work well in the multi-GPU case.
Alternatively, can you give me some instructions on how to add such operations?

Thanks.
Tao
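To show why this request needs an element-wise square root, here is a minimal sketch of one Adagrad step in plain Python. It operates on ordinary lists, not owl arrays, and the function name `adagrad_step` and its default hyperparameters are assumptions made for illustration.

```python
import math

def adagrad_step(weights, grads, cache, lr=0.01, eps=1e-8):
    """Update weights in place using per-element accumulated squared gradients."""
    for i, g in enumerate(grads):
        cache[i] += g * g                                   # element-wise square
        weights[i] -= lr * g / (math.sqrt(cache[i]) + eps)  # element-wise sqrt
    return weights, cache

weights = [1.0, -0.5]
cache = [0.0, 0.0]
weights, cache = adagrad_step(weights, [2.0, -1.0], cache)
```

Every element gets its own effective learning rate, so the square and the square root both have to be applied per element; on a GPU array that means a dedicated element-wise kernel rather than a round trip through NumPy.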

mnist walk through example is out-of-date with the new cython interface

Need to modify the wiki to make this consistent.

script to convert the pre-trained model to caffe format

I am very interested in the Minerva project and would be glad to deploy Minerva on our lab servers. But currently we have a lot of existing code that is largely compatible only with Caffe. I am wondering if you could provide a script that converts networks trained by Minerva back to Caffe format. I think it would help grab the attention of Caffe users and gradually turn them to Minerva :)

build failed on ubuntu14.04

File /home/ubgpu/github/DMLC/minerva/release/CMakeFiles/CMakeTmp/CheckSymbolExists.c:
/* */
#include <pthread.h>

int main(int argc, char** argv)
{
  (void)argv;
#ifndef pthread_create
  return ((int*)(&pthread_create))[argc];
#else
  (void)argc;
  return 0;
#endif
}

Determining if the function pthread_create exists in the pthreads failed with the following output:
Change Dir: /home/ubgpu/github/DMLC/minerva/release/CMakeFiles/CMakeTmp

Run Build Command:/usr/bin/make "cmTryCompileExec1004526741/fast"
/usr/bin/make -f CMakeFiles/cmTryCompileExec1004526741.dir/build.make CMakeFiles/cmTryCompileExec1004526741.dir/build
make[1]: Entering directory `/home/ubgpu/github/DMLC/minerva/release/CMakeFiles/CMakeTmp'
/usr/bin/cmake -E cmake_progress_report /home/ubgpu/github/DMLC/minerva/release/CMakeFiles/CMakeTmp/CMakeFiles 1
Building C object CMakeFiles/cmTryCompileExec1004526741.dir/CheckFunctionExists.c.o
/usr/bin/gcc -DCHECK_FUNCTION_EXISTS=pthread_create -o CMakeFiles/cmTryCompileExec1004526741.dir/CheckFunctionExists.c.o -c /usr/share/cmake-2.8/Modules/CheckFunctionExists.c
Linking C executable cmTryCompileExec1004526741
/usr/bin/cmake -E cmake_link_script CMakeFiles/cmTryCompileExec1004526741.dir/link.txt --verbose=1
/usr/bin/gcc -DCHECK_FUNCTION_EXISTS=pthread_create CMakeFiles/cmTryCompileExec1004526741.dir/CheckFunctionExists.c.o -o cmTryCompileExec1004526741 -rdynamic -lpthreads
/usr/bin/ld: cannot find -lpthreads
collect2: error: ld returned 1 exit status
make[1]: *** [cmTryCompileExec1004526741] Error 1
make[1]: Leaving directory `/home/ubgpu/github/DMLC/minerva/release/CMakeFiles/CMakeTmp'
make: *** [cmTryCompileExec1004526741/fast] Error 2

ubgpu@ubgpu:~/github/DMLC/minerva$

Does installation of Owl module require dmlc?

I followed the Build Owl module instruction here (https://github.com/dmlc/minerva/wiki/Install-Minerva) but the build fails

minerva/narray/narray.h:3:26: fatal error: dmlc/logging.h: No such file or directory
 #include <dmlc/logging.h>

Can't get libminervaps.a

I'm following the instructions to integrate Minerva with the parameter server, but I always fail to get libminervaps.a. I am confused by one step: I don't understand "then compile with make minerva". Should I run make in the parameter server sources or in Minerva? In the parameter server directory, it says there is no such target. In the Minerva directory, it does not produce libminervaps.a. I'm sure that I ran configure with the parameter server enabled.

Updates on `owl` API cheat sheet

elewise.mult is missing.
Add mapping of their corresponding C++ interface

Is there a functional google groups or email address for this project?

The emails I've sent to [email protected] bounce back.

My question is about recurrent functions and BPTT; does the C++/cuda codebase currently have implementations?

Does Owl trainer randomly shuffle data in each epoch?

If yes, how does it shuffle data for very large datasets (e.g. ImageNet)?

./main: symbol lookup error: ./main: undefined symbol: _ZN7minerva13MinervaSystem15CreateCpuDeviceEv

I can successfully run the given c++ sample code, but I ran into this problem when trying to build my own app.

g++ -std=c++11 -DHAS_CUDA -I./dmlc-core/include/ -I/usr/local/cuda-6.5/include -I./include/ -rdynamic ./libminerva.so -rdynamic ./libcudnn.so main.cpp -o main

when I run it, it says:

./main: symbol lookup error: ./main: undefined symbol: _ZN7minerva13MinervaSystem15CreateCpuDeviceEv

the main.cpp program is just:

#include <minerva.h>

using namespace std;
using namespace minerva;

int main(int argc, char** argv) {
  MinervaSystem::Initialize(&argc, &argv);
  MinervaSystem& ms = MinervaSystem::Instance();
  uint64_t cpuDevice = ms.CreateCpuDevice();
  ms.SetDevice(cpuDevice);
  return 0;
}

I find this strange, since I can see exactly CreateCpuDeviceEv in the binary "libminerva.so".
I am using CentOS 6.5

Minerva GPU-- free memory for a variable

Thanks for the great tool.
I am trying to use Minerva to run an RNN on a huge dataset. However, GPU memory usage increases gradually and the program crashes once it exceeds the GPU memory limit. My first question: is there any way or function to free the memory of unused variables? I have used wait_for_all, but it does not solve the problem.

I tried to write a memory-free function using cudaFree. The function is able to free the memory of the NArray variable, but the pointer to the NArray variable still exists. When I reuse the NArray variable for assignment repeatedly, e.g. data=owl.from_numpy(np.array([...])), I found that it works for a new array of a different size but crashes when the new array has the same size as the old one that was deleted. For example:

# does not work
data = owl.from_numpy(np.arange(10000).reshape(100, 100))
owl.free_memory(data)  # I wrote this function myself using cudaFree
data = owl.from_numpy(np.arange(10000).reshape(100, 100))
owl.free_memory(data)  # core dump here

# works
data = owl.from_numpy(np.arange(10000).reshape(100, 100))
owl.free_memory(data)
data = owl.from_numpy(np.arange(20000).reshape(200, 100))
owl.free_memory(data)

Could you please help me find out the reason? Thanks a lot.

broadcasting NArray + NArray

Hello. First, thanks for the really cool library!

I am experiencing odd issues with broadcasting when doing element wise operations between two arrays. I would expect all of the following to work but not all of them do.

owl.zeros((3,3)) + 10 # works
owl.zeros((3,3)) + np.array(10) # works
owl.zeros((3,3)) + np.array([10]) # works
owl.zeros((3,3)) + np.array([[10]]) # works


owl.zeros((3,3)) + owl.from_numpy(np.array([[10]])) # Fails

what():  [22:31:13] /home/luke/Apps/minerva/minerva/op/impl/cuda.cpp:223: Check failed: (closure.dims_to_replicate.NumDims()) == (1) currently do norm on one dimension only

owl.zeros((3,3)) + owl.from_numpy(np.array(10)) # Fails

*** RuntimeError: [22:32:19] /home/luke/Apps/minerva/minerva/narray/narray.cpp:197: Check failed: (lhs.Size().NumDims()) == (rhs.Size().NumDims()) #dimension mismatch

It appears that broadcasting does not work between two owl arrays. Is this correct? Is there a way to work around this?

Thanks!
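For comparison, the shape rule that NumPy applies when broadcasting can be written down in a few lines of plain Python. This is only a sketch of NumPy's documented rule, not of what owl does internally, and `broadcast_shape` is a hypothetical helper name.

```python
def broadcast_shape(a, b):
    """Return the NumPy-style broadcast of two shapes, or raise ValueError."""
    # Left-pad the shorter shape with 1s so both have the same rank.
    ndim = max(len(a), len(b))
    a = (1,) * (ndim - len(a)) + tuple(a)
    b = (1,) * (ndim - len(b)) + tuple(b)

    out = []
    for x, y in zip(a, b):
        if x == y or y == 1:
            out.append(x)
        elif x == 1:
            out.append(y)
        else:
            raise ValueError("shapes %r and %r are not broadcastable" % (a, b))
    return tuple(out)

print(broadcast_shape((3, 3), (1, 1)))  # (3, 3)
print(broadcast_shape((3, 3), ()))      # (3, 3)
```

Both failing cases in the report, a (1, 1) array and a zero-dimensional array added to a (3, 3) array, are valid under this rule, which is why the NumPy operands work while the owl-to-owl additions hit the dimension checks quoted above.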

self-defined elemwise function on GPU ?

Hi,

I've looked at the NIPS paper and the code snippet there presents a very convenient way of defining elem-wise function:

float Sigmoid(float x){
return 1.0 / (1.0 + exp(-x));
}

Then we just do:

Matrix z = (V * y + c).Map(&Sigmoid);

However, the current version seems to be completely different from the one presented in the paper, and I wonder why. How can I achieve this easily in the current version of the code (instead of writing CUDA code myself)? Sorry, I only started looking at this project recently, so I might have missed some context.

Thanks

`mnist_mlp.cpp` cannot compile

In the recent merge, I removed all code related to file_loader, since it can be completely replaced by the MakeNArray interface. As a result, apps/mnist_mlp.cpp and apps/ps/mnist_mlp.cpp no longer compile. These two applications need to be fixed by writing IO code similar to that in apps/mnist_cnn.cpp.

Failed to run MNIST example due to CudaPerformNormAddOnRow CUDA: invalid device function

(Owl Ready) ➜  mnist: python mnist_mlp.py
[18:48:23] /home/zer0n/minerva/minerva/system/minerva_system.cpp:86: dag engine enabled
Training data: 235 mini-batches
Test data: 10000 samples
(256, 10)
---Start epoch #0
[18:48:28] /home/zer0n/minerva/minerva/op/impl/cuda/cuda_perform.cu:136: Check failed: (e) == (cudaSuccess) CudaPerformNormAddOnRow CUDA: invalid device function
[1]    72399 abort (core dumped)  python mnist_mlp.py

My environment:

Minerva built with cmake 3.2.3 and g++ 4.9 on Ubuntu 12.04
CUDA 7
CuDNN 2

FYI, I ran Caffe's MNIST example using GPU just fine.

terminate called after throwing an instance of 'std::out_of_range'

Here is my Python code:

import owl, numpy
a = numpy.zeros((300,400))
b = owl.from_numpy(a)

It gave me the following error:
terminate called after throwing an instance of 'std::out_of_range'
what(): _Map_base::at
Aborted (core dumped)

help...

indexing reference of NArray?

What I really want to do is update the parameters of an LSTM.
My vocabulary is relatively large (about 500K), and I've found that keeping a vector of NArray is very inefficient (I don't know why). Syncing (by calling WaitForAll) after the following code takes a very long time:

int N = 600000;
int D = 128;
vector<NArray> A = vector<NArray>(N, NArray::Zeros({1, D}));
for (int i = 0; i < N; i++) {
  A[i] = NArray::Randn({1, D}, 0, 0.05);
}
// calling ms.WaitForAll() here takes a very long time

(Maybe this is because pushing NArrays into the vector one at a time makes a lot of malloc calls on the GPU, which is slow?)
So instead I am thinking about having one giant 2D matrix and doing the following:

NArray A = NArray::Randn({N,D},0,0.01);
NArray b = NArray::Randn({1,D},0,1);
A[10] = A[10] + b; // update a

It seems that this feature is not supported?
Any suggestions or comments would be much appreciated.

race condition in owl imports

Hello again,
I think I found a race condition in importing of owl.
I have a fairly simple test program that crashes maybe 1 in 3 times.

import owl
import owl.elewise as ele

c = owl.create_cpu_device()
owl.set_device(c)

x = owl.zeros((10, 10))
y = ele.relu(x)

The error received on a bad run looks something like:

○ → python tt.py
[22:55:26] /home/luke/Repos/minerva/minerva/system/minerva_system.cpp:86: dag engine enabled
[22:55:26] /home/luke/Repos/minerva/minerva/backend/dag/dag_scheduler.cpp:46: create new op node #1 on device #0
[22:55:26] /home/luke/Repos/minerva/minerva/backend/dag/dag_scheduler.cpp:149: node #1 running right after creation
[22:55:26] /home/luke/Repos/minerva/minerva/backend/dag/dag_scheduler.cpp:46: create new op node #3 on device #0
[22:55:26] /home/luke/Repos/minerva/minerva/backend/dag/dag_scheduler.cpp:176: dispatching node #1 to device #0
[22:55:26] /home/luke/Repos/minerva/minerva/device/device.cpp:95: CPU device #0 create output for task data #0
[22:55:26] /home/luke/Repos/minerva/minerva/device/data_store.cpp:18: create data #0 length 400
terminate called after throwing an instance of 'dmlc::Error'
  what():  [22:55:26] minerva/common/singleton.h:13: Check failed: data_ please initialize before use
Aborted (core dumped)

Sadly I cannot get a legit stack trace as when I run it under gdb I get no failure.

Adding a short sleep just after the import seems to fix the problem.

This is on CPU, Ubuntu 15.04, with the DAG enabled.

Thanks!

doesn't compile with cuDNN R2

cuDNN R2 has a different interface from R1. Our code does not compile with R2 and produces messages like "identifier "cudnnTensor4dDescriptor_t" is undefined". We should fix it or at least give a warning about this.
