dmlc / mshadow

Matrix Shadow:Lightweight CPU/GPU Matrix and Tensor Template Library in C++/CUDA for (Deep) Machine Learning

License: Other

mshadow's Introduction

Distributed Machine Learning Common Codebase

DMLC-Core is the backbone library that supports all DMLC projects; it offers the bricks to build efficient and scalable distributed machine learning libraries.

Developer Channel: Join the chat at https://gitter.im/dmlc/dmlc-core

Known Issues

  • RecordIO format is not portable across processors with different endianness. So it is not possible to save a RecordIO file on an x86 machine and then load it on a SPARC machine, because x86 is little-endian while SPARC is big-endian.

Contributing

Contributions to dmlc-core are welcome! dmlc-core follows Google's C++ style guide. If you are interested in contributing, take a look at the feature wishlist and open a new issue if you would like to add something.

  • DMLC-Core uses the C++11 standard. Ensure that your C++ compiler supports C++11.
  • Introduce minimal dependencies when possible.

Checklist before submitting code

  • Type make lint and fix all the style problems.
  • Type make doc and fix all the warnings.

NOTE

deps:

libcurl4-openssl-dev

mshadow's People

Contributors

antinucleon, apeforest, asmushetzel, cjolivier01, drustz, eric-haibin-lin, hjk41, jermainewang, jpauwels, larroy, lorrainexun, mli, piiswrong, pluskid, ptrendx, rahul003, reminisce, sinzero, stefanhenneking, sxjscience, szha, taolv, tornadomeet, tqchen, vchuravy, winstywang, yajiedesign, zhenlinluo, zhreshold, zihengjiang


mshadow's Issues

Why was the default GPU stream used?

When I compile and run mshadow/test/test.cu, it tells me:
"Default GPU stream was used when MSHADOW_FORCE_STREAM was on".
I don't know why the default stream was used or how I can avoid it. For now I have simply turned MSHADOW_FORCE_STREAM off. Will doing so make performance worse? Thanks.

Some misleading syntax

In dot_engine-inl.h, the default type of a template argument is written out directly:

template<typename Device, typename DType = default_real_t>

while in other places, like tensor.h, DType's default is set via the macro #define MSHADOW_DEFAULT_DTYPE = default_real_t.

Should we define them in a consistent way?
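
For reference, a minimal compilable sketch of the two spellings (assuming the macro is defined as in base.h; the struct names here are placeholders):

    typedef float default_real_t;                 // mshadow's default real type
    #define MSHADOW_DEFAULT_DTYPE = default_real_t

    // Form 1: the default argument written out directly (dot_engine-inl.h style).
    template<typename Device, typename DType = default_real_t>
    struct FormOne {};

    // Form 2: the "= default_real_t" hidden behind the macro (tensor.h style).
    // After preprocessing, this is exactly the same declaration as Form 1.
    template<typename Device, typename DType MSHADOW_DEFAULT_DTYPE>
    struct FormTwo {};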

Confusion about streams

Hey, I'm working on filling in the backward and forward functions of guide/convnet.cu.
While writing the functions, I noticed that the initial version of the .cu file has two temporary TensorContainers: tmp_col and tmp_dst. In my implementation I need more temporaries, so I imitated the existing code and added more TensorContainers, like TensorContainer<xpu, 4, real_t> tmp41, tmp42, tmp43, tmp44;, and set their stream in the init function.
But when I use them in forward and backward, I just can't use the -gpu option; the output is

Default GPU stream was used when MSHADOW_FORCE_STREAM was on

In the functions I Resize the temporary containers and assign return values like swapaxis or unpack_patch2col to them. I debugged the process and found that the error happens when I assign such a return value to a container.

Also, I'm a little confused about streams in general. A stream is used to queue the operations, right? My understanding of streams is poor.
@antinucleon @tqchen
Thank you.
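
A minimal sketch of binding every temporary to a stream (assumptions: set_stream() as declared in tensor.h, and `stream` is the Stream<xpu>* the guide example already creates during init):

    TensorContainer<xpu, 4, real_t> tmp41, tmp42, tmp43, tmp44;

    void Init(Stream<xpu> *stream) {
      // every temporary container must be bound to a non-default stream;
      // an unbound container falls back to the default GPU stream, which
      // triggers the MSHADOW_FORCE_STREAM error on its first assignment
      tmp41.set_stream(stream);
      tmp42.set_stream(stream);
      tmp43.set_stream(stream);
      tmp44.set_stream(stream);
    }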

Building on Mac OS 10.9.4

I managed to build cxxnet+mshadow on Mac OS X 10.9.4 after making several changes to the Makefile (which I can share if anyone's interested), mostly to deal with the well-known libc++/libstdc++ issue in Mavericks.
Clang gave me several harmless warnings about types being declared as structs and defined as classes (or the other way around). However, it was also failing to compile, and I had to change line 453 in tensor.h because it looks like it was missing the dimkeep parameter and Clang was unforgiving about it:

template<typename Saver, typename Reducer, int dimkeep, typename E, int etype>

I'm new to mshadow/cxxnet (I discovered it yesterday evening) so I apologize beforehand if my contribution is not correct.

Regards,

Steven

'sum_rows' error when the tensor's dimension is (4,1)

This error occurs when the assignment operator executes, and the failing line seems to be this line. CUDA reports no error, and the POSIX error string is 'File exists'.

I tested this simple program on CUDA 5.5 and CUDA 6, and both produce the error.

#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cerrno>
#include <iostream>
#include <mshadow/tensor.h>
#include <mshadow/tensor_container.h>
using namespace mshadow;
using namespace std;

// print any pending CUDA error (and errno) when the program exits
inline void onExitPrintError(){
    cudaError_t err = cudaGetLastError();
    if(err != cudaSuccess)
    {
        // print the CUDA error message
        printf("CUDA error: %s\n", cudaGetErrorString(err));
    }
    printf("Posix errno %s\n", strerror(errno));
}

int main(){
    InitTensorEngine(1);
    atexit(onExitPrintError);
    TensorContainer<cpu, 2> a;
    a.Resize(Shape2(4,1));

    a[0][0] = 0.0f;
    a[1][0] = 1.0f;
    a[2][0] = 1.0f;
    a[3][0] = 0.0f;

    TensorContainer<gpu, 2> gpu_a;
    gpu_a.Resize(Shape2(4,1));
    Copy(gpu_a,a);


    TensorContainer<gpu, 1> b;
    b.Resize(Shape1(1));

    b = sum_rows(gpu_a);

    TensorContainer<cpu, 1> c;
    c.Resize(b.shape);
    Copy(c,b);
    for(int i=0;i<c.shape[0];++i){
        cout<< c[i]<<endl;
    }

    ShutdownTensorEngine();
    return 0;
}

[question] Why not just replace the inline keyword with the predefined macro?

This is just a question about the source code. To improve performance, it is beneficial to use inline functions as much as possible. MSHADOW_XINLINE is a force-inline macro defined in base.h, but the plain inline keyword still appears here and there.
For example:
https://github.com/dmlc/mshadow/blob/master/mshadow/expression.h#L137
Why not just replace the inline keyword with the predefined macro? Does it matter for performance?
Or should the inline keyword be used only where we can be sure the function will execute on the CPU?
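
For context, a paraphrased sketch of what base.h does (check the header for the exact form): MSHADOW_XINLINE both forces inlining and, under nvcc, marks the function callable from device code, so it is not interchangeable with plain inline on host-only paths.

    #ifdef _MSC_VER
      #define MSHADOW_FORCE_INLINE __forceinline
    #else
      #define MSHADOW_FORCE_INLINE inline __attribute__((always_inline))
    #endif
    #ifdef __CUDACC__
      // callable from both host and device code when compiled by nvcc
      #define MSHADOW_XINLINE MSHADOW_FORCE_INLINE __device__ __host__
    #else
      #define MSHADOW_XINLINE MSHADOW_FORCE_INLINE
    #endif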

Compiling the NNET example

Hi,

When I try to compile the neural net example, I get the following error:

nvcc -o nnet_ps -O3 --use_fast_math -ccbin g++  -Xcompiler "-Wall -O3 -I../../ -fopenmp -msse3 -funroll-loops -Wno-unused-parameter -Wno-unknown-pragmas -I/usr/include/cuda/ -DMSHADOW_USE_CBLAS=1 -DMSHADOW_USE_MKL=0 -DMSHADOW_DIST_PS=0" -Xlinker "-lm -lm -lcudart -lcublas -lcurand -L/usr/lib64 -lopenblas -L/usr/lib64/atlas" nnet_ps.cu
/tmp/tmpxft_00000cc5_00000000-16_nnet_ps.o: In function `void NNet<mshadow::gpu>::SyncProc<1>(mshadow::Tensor<mshadow::gpu, 1, float>, mshadow::Tensor<mshadow::gpu, 1, float>, int)':
tmpxft_00000cc5_00000000-3_nnet_ps.cudafe1.cpp:(.text._ZN4NNetIN7mshadow3gpuEE8SyncProcILi1EEEvNS0_6TensorIS1_XT_EfEES5_i[_ZN4NNetIN7mshadow3gpuEE8SyncProcILi1EEEvNS0_6TensorIS1_XT_EfEES5_i]+0x108): undefined reference to `NNet<mshadow::gpu>::UpdateEntry::ApplyUpdate(mshadow::Stream<mshadow::gpu>*, void*)'
/tmp/tmpxft_00000cc5_00000000-16_nnet_ps.o: In function `void NNet<mshadow::gpu>::SyncProc<2>(mshadow::Tensor<mshadow::gpu, 2, float>, mshadow::Tensor<mshadow::gpu, 2, float>, int)':
tmpxft_00000cc5_00000000-3_nnet_ps.cudafe1.cpp:(.text._ZN4NNetIN7mshadow3gpuEE8SyncProcILi2EEEvNS0_6TensorIS1_XT_EfEES5_i[_ZN4NNetIN7mshadow3gpuEE8SyncProcILi2EEEvNS0_6TensorIS1_XT_EfEES5_i]+0x165): undefined reference to `NNet<mshadow::gpu>::UpdateEntry::ApplyUpdate(mshadow::Stream<mshadow::gpu>*, void*)'
/tmp/tmpxft_00000cc5_00000000-16_nnet_ps.o: In function `void NNet<mshadow::cpu>::SyncProc<1>(mshadow::Tensor<mshadow::cpu, 1, float>, mshadow::Tensor<mshadow::cpu, 1, float>, int)':
tmpxft_00000cc5_00000000-3_nnet_ps.cudafe1.cpp:(.text._ZN4NNetIN7mshadow3cpuEE8SyncProcILi1EEEvNS0_6TensorIS1_XT_EfEES5_i[_ZN4NNetIN7mshadow3cpuEE8SyncProcILi1EEEvNS0_6TensorIS1_XT_EfEES5_i]+0xe1): undefined reference to `NNet<mshadow::cpu>::UpdateEntry::ApplyUpdate(mshadow::Stream<mshadow::cpu>*, void*)'
/tmp/tmpxft_00000cc5_00000000-16_nnet_ps.o: In function `void NNet<mshadow::cpu>::SyncProc<2>(mshadow::Tensor<mshadow::cpu, 2, float>, mshadow::Tensor<mshadow::cpu, 2, float>, int)':
tmpxft_00000cc5_00000000-3_nnet_ps.cudafe1.cpp:(.text._ZN4NNetIN7mshadow3cpuEE8SyncProcILi2EEEvNS0_6TensorIS1_XT_EfEES5_i[_ZN4NNetIN7mshadow3cpuEE8SyncProcILi2EEEvNS0_6TensorIS1_XT_EfEES5_i]+0x13d): undefined reference to `NNet<mshadow::cpu>::UpdateEntry::ApplyUpdate(mshadow::Stream<mshadow::cpu>*, void*)'
collect2: error: ld returned 1 exit status
Makefile:34: recipe for target 'nnet_ps' failed

Any ideas what I can do to fix it? Is this example outdated?

Szymon

The usage of concat

Hi, @tqchen @antinucleon,
I notice there's a concat function in mshadow. AWESOME.
But how do I use it? I see the function's parameters are an LVALUE and an RVALUE, but how can I specify which dimension to concatenate along?
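
A hedged usage sketch (assumption: the dimension to concatenate along is the template argument of concat<>, counted from the lowest axis as in mshadow/extension/concat.h):

    TensorContainer<cpu, 4, real_t> a(Shape4(2, 3, 4, 5));
    TensorContainer<cpu, 4, real_t> b(Shape4(2, 3, 4, 5));
    TensorContainer<cpu, 4, real_t> c(Shape4(2, 3, 4, 10));
    // concat<0> stacks along the lowest (last) axis: (..., 5) + (..., 5) -> (..., 10)
    c = concat<0>(a, b);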

Question about reshaping a tensor

Hi,

I've run into a problem when trying to reshape a matrix from (1,2) to (2,1); my code is something like this:

Tensor<cpu, 2> mat1 = NewTensor<cpu, float>(Shape2(1,2), 1.0);
//reshape mat from 1*2 to 2*1
Tensor<cpu, 2> mat2(Shape2(2,1));
mat2 = reshape(mat1, mat2.shape_);

I get a segmentation fault while doing this. I think it's because I didn't allocate memory for mat2, but in my understanding a reshape operation needs no extra memory for handling the data, since mat2 should just share data with mat1. I wonder if this is my misunderstanding or whether I've done something wrong.

Thanks a lot.
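
A hedged sketch of the fix (assumption: assigning any expression, including reshape, evaluates it into dst's own buffer, so dst must be allocated first):

    Tensor<cpu, 2> mat2(Shape2(2, 1));
    AllocSpace(&mat2);                  // without this, writing into mat2 segfaults
    mat2 = reshape(mat1, mat2.shape_);  // copies mat1's elements in row-major order
    // ... use mat2 ...
    FreeSpace(&mat2);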

Is there a way to convert a 1x1 2D Tensor to a scalar?

Hi,
My code looks something like this; it computes the squared error of an NN:

real_t err = sumall_except_dim<2>(sum_rows(F<square>(outgrad))) / (real_t) pred.shape[1];

However, this does not work; it fails with the following error message:

error: cannot convert 'mshadow::expr::BinaryMapExp<mshadow::op::div, mshadow::expr::ReduceTo1DExp<mshadow::expr::ReduceTo1DExp<mshadow::expr::UnaryMapExp<square, mshadow::Tensor<mshadow::cpu, 2>, 1>, mshadow::red::sum, 0>, mshadow::red::sum, 2>, mshadow::expr::ScalarExp, 3>' to 'mshadow::real_t {aka float}' in initialization
real_t err = (sumall_except_dim<2>(sum_rows(F<square>(outgrad))) / (real_t)pred.shape[1]);

which basically says the types are different.

I've also tried

sumall_except_dim<2>(sum_rows(F<square>(outgrad)))[0]

but it says there is no operator[] defined for that expression type.
It seems that I have to explicitly iterate over the tensor and sum the results (which could be slow compared to optimized vectorized code)?

Thank you.
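
A hedged workaround sketch (assumption: expressions are lazy and have no operator[], so they must first be evaluated into a real one-element tensor by assignment; Shape1(1) matches the 1x1 case above):

    // evaluate the reduction into a 1-element container, then index it
    TensorContainer<cpu, 1, real_t> tmp(Shape1(1));
    tmp = sumall_except_dim<2>(sum_rows(F<square>(outgrad))) / (real_t) pred.shape[1];
    real_t err = tmp[0];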

Using the GPU to sample a random tensor errors when the shape is odd

The code that triggers the error looks like this:

InitTensorEngine(0);
TensorContainer<gpu,1> tmp;
tmp.Resize(Shape1(1));  // 1 or any odd value
Random<mshadow::gpu> rnd(0);
rnd.SampleGaussian(tmp, 0, 0.1);
ShutdownTensorEngine();

The error code is CURAND_STATUS_LENGTH_NOT_MULTIPLE. The failing line may be here.

[Discussion] Support sort in mshadow?

Do we need to do this? For CPU we can use std::sort; for GPU we can refer to the CUDA examples.

Sample API:

dst, idx = sort(src)

where idx is the argsort result.
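
A minimal CPU-side sketch of the proposed argsort, along the lines discussed (plain std::sort over an index array; the names are placeholders):

    #include <algorithm>
    #include <numeric>
    #include <vector>

    // returns idx such that src[idx[0]] <= src[idx[1]] <= ...
    std::vector<size_t> ArgSort(const float *src, size_t n) {
      std::vector<size_t> idx(n);
      std::iota(idx.begin(), idx.end(), 0);   // idx = 0, 1, ..., n-1
      std::sort(idx.begin(), idx.end(),
                [&](size_t a, size_t b) { return src[a] < src[b]; });
      return idx;                             // dst[i] = src[idx[i]]
    }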

CPU pending

After around 8 hours of running, the CPU can go idle.

<gpu> and <cpu> generate totally different results

Help.
I improved /guide/neuralnet/convnet.cu and added my own function.
When I use the -cpu parameter, things go well: the error rate declines.
However, -gpu generates totally different results: the error just stays at around 0.9 (it doesn't change at all)...
It may look like there's a bug in mshadow's GPU implementation, but I don't know.

I didn't add my own GPU code; I just used mshadow's (templated on <xpu>).

For CPU, I used a BLAS lib. My CUDA version is 7.0.
Please help.
@tqchen @antinucleon

half_t multiplication

/home/wallnuss/src/mxnet/mshadow/mshadow/././half.h: In instantiation of ‘mshadow::half::half_t mshadow::half::operator*(mshadow::half::half_t, T) [with T = mshadow::expr::CroppingExp<mshadow::expr::MakeTensorExp<mshadow::expr::UnPoolingExp<mshadow::red::maximum, mshadow::expr::MakeTensorExp<mshadow::expr::PaddingExp<mshadow::Tensor<mshadow::cpu, 4, mshadow::half::half_t>, mshadow::half::half_t, 4>, mshadow::Tensor<mshadow::cpu, 4, mshadow::half::half_t>, 4, mshadow::half::half_t>, mshadow::half::half_t, 4>, mshadow::expr::MakeTensorExp<mshadow::expr::PaddingExp<mshadow::Tensor<mshadow::cpu, 4, mshadow::half::half_t>, mshadow::half::half_t, 4>, mshadow::Tensor<mshadow::cpu, 4, mshadow::half::half_t>, 4, mshadow::half::half_t>, 4, mshadow::half::half_t>, mshadow::half::half_t, 4>]’:
src/operator/./pooling-inl.h:145:7:   required from ‘void mxnet::op::PoolingOp<xpu, Reducer, DType>::Backward(const mxnet::OpContext&, const std::vector<mshadow::TBlob>&, const std::vector<mshadow::TBlob>&, const std::vector<mshadow::TBlob>&, const std::vector<mxnet::OpReqType>&, const std::vector<mshadow::TBlob>&, const std::vector<mshadow::TBlob>&) [with xpu = mshadow::cpu; Reducer = mshadow::red::maximum; DType = mshadow::half::half_t]’
src/operator/pooling.cc:47:1:   required from here
/home/wallnuss/src/mxnet/mshadow/mshadow/././half.h:248:31: error: invalid cast from type ‘mshadow::expr::CroppingExp<mshadow::expr::MakeTensorExp<mshadow::expr::UnPoolingExp<mshadow::red::maximum, mshadow::expr::MakeTensorExp<mshadow::expr::PaddingExp<mshadow::Tensor<mshadow::cpu, 4, mshadow::half::half_t>, mshadow::half::half_t, 4>, mshadow::Tensor<mshadow::cpu, 4, mshadow::half::half_t>, 4, mshadow::half::half_t>, mshadow::half::half_t, 4>, mshadow::expr::MakeTensorExp<mshadow::expr::PaddingExp<mshadow::Tensor<mshadow::cpu, 4, mshadow::half::half_t>, mshadow::half::half_t, 4>, mshadow::Tensor<mshadow::cpu, 4, mshadow::half::half_t>, 4, mshadow::half::half_t>, 4, mshadow::half::half_t>, mshadow::half::half_t, 4>’ to type ‘float’
 MSHADOW_HALF_OPERATOR(half_t, *)
                               ^
/home/wallnuss/src/mxnet/mshadow/mshadow/././half.h:35:27: note: in definition of macro ‘MSHADOW_HALF_OPERATOR’
     return RTYPE(float(a) OP float(b));  /* NOLINT(*) */

I am encountering this in apache/mxnet#2280 for a case where I have constant * expr.
I am not that familiar with how mshadow works, so I am wondering if it might be some weird interaction where the half_t overload of multiplication takes precedence over the expression-template multiplication.

The CroppingExp for RValue

Hi,
this issue comes from: How to copy the data from the Tensor 'src' to the crop of Tensor 'dst'? I'm not familiar with mshadow, and this is what puzzled me:

I found that the extensions which support RValue (like SliceExp and ConcatExp) inherit from TRValue, but CroppingExp currently inherits from MakeTensorExp; should we change this?

Would someone give some more advice on doing this?
I guess we should add REval to the Plan struct, as slice and concat do; are there any other places to pay attention to?
Thanks~

Fix for Makefile on OS X

On my OS X system, the CUDA shared libraries are placed in CUDA_HOME/lib instead of CUDA_HOME/lib64. Please change the following line from

MSHADOW_LDFLAGS += -L$(USE_CUDA_PATH)/lib64

to

MSHADOW_LDFLAGS += -L$(USE_CUDA_PATH)/lib64 -L$(USE_CUDA_PATH)/lib

Potential CUDA kernel launching problem of the `reduce_except_dim<0,..>` operator

I find that if we use reduce_except_dim<0,..>, we ultimately call MapReduceKeepDim1 (https://github.com/dmlc/mshadow/blob/master/mshadow/tensor_gpu-inl.h#L153-L155), which may have problems for large matrices. In fact, in the implementation of MapReduceKeepDim1, dimGrid is set directly to p[1] (https://github.com/dmlc/mshadow/blob/master/mshadow/cuda/tensor_gpu-inl.cuh#L183-L184), which may exceed the grid-size limit of 65536 (https://github.com/dmlc/mshadow/blob/master/mshadow/cuda/tensor_gpu-inl.cuh#L45).

This problem does not exist for MapReduceKeepLowest, which uses MemUnits to set the kernel launch parameters: https://github.com/dmlc/mshadow/blob/master/mshadow/cuda/tensor_gpu-inl.cuh#L142-L143. Should we change the implementation of MapReduceKeepDim1 to be similar to MapReduceKeepLowest in the future?

Broadcasting along multiple axes

Sometimes we need to broadcast along several axes given in an std::vector -> broadcast_with_multi_axis. I'm not sure how to code this concisely for a Tensor with arbitrary ndim, since we may not know the number of broadcast axes a priori. One idea I have is to constrain the maximum number of axes of broadcast_with_multi_axis to a large number like 5. Is this solution acceptable?

Incorrect flag checking for -std=c++0x?

Using gcc 4.6.3, I found that I needed to change the flag check from
__GXX_EXPERIMENTAL_CXX0X
to
__GXX_EXPERIMENTAL_CXX0X__
in tensor_base.h
in order to detect the -std=c++0x flag correctly.

(I was using lambdas in my code, so I needed to add -std=c++0x to the compiler options, but then I got errors about constexpr not being used, which is what that flag check in tensor_base.h is supposed to guard ... so it looks like you need to add those trailing underscores to get the flag check to work.)
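
A sketch of the corrected guard (assumptions: the check gates constexpr support as described above, and the macro name follows base.h's MSHADOW_CONSTEXPR):

    // __GXX_EXPERIMENTAL_CXX0X__ (note the trailing underscores) is what
    // gcc actually defines under -std=c++0x
    #if defined(__GXX_EXPERIMENTAL_CXX0X__) || __cplusplus >= 201103L
      #define MSHADOW_CONSTEXPR constexpr
    #else
      #define MSHADOW_CONSTEXPR const
    #endif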

What does ``unpack_patch2col`` exactly do?

I know that this function vectorizes a matrix to prepare for convolution. In detail, the function receives the params (input_image, filter_height, filter_width, stride) and outputs a Tensor<2> whose shape is (channel * filter_height * filter_width, batch * output_height * output_width).

So I think this function converts the matrix to suit the "dot" function with the filter.

Thus, for example, if I input an image tensor ``IMG`` of shape 1 * 2 * 1 * 3, meaning 1 batch, 2 channels, 1 height, 3 width, where the first channel of IMG is (1,2,3) and the second is (4,5,6), and I then apply ``unpack_patch2col(IMG, 1, 2, 1)``, which means the filter is 1 * 2, then the output should look like (1, 2, 2, 3, 4, 5, 5, 6), but all I get is (6, 2, 6, 2, 6, 2, 6, 2).

Can anyone help me with this function?
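
For reference, a paraphrased sketch of how the guide's convnet uses this function (names as in guide/convnet.cu; the exact shapes are assumptions):

    // lay every receptive-field patch out as one column, so that the
    // convolution becomes a single matrix product with the flattened filters
    tmp_col = unpack_patch2col(in, ksize_y, ksize_x, kstride);
    // (num_filter, ch*ky*kx) x (ch*ky*kx, batch*oh*ow)
    tmp_dst = dot(wmat, tmp_col);
    out = reshape(tmp_dst, out.shape_);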

../mshadow/./base.h:145:20: fatal error: cuda.h: No such file or directory

When compiling the simple example basic.cpp:

marco@pc:~/mshadow/prove$ g++ -std=c++11 -DUSE_BLAS=openblas -I .. basic.cpp -obasic
In file included from ../mshadow/tensor.h:16:0,
from basic.cpp:2:
../mshadow/./base.h:145:20: fatal error: cuda.h: No such file or directory
#include <cuda.h>
^
compilation terminated.

What should I do?
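
A hedged fix sketch (assumption: MSHADOW_USE_CUDA is the compile-time switch declared in base.h with a default of 1, so on a machine without the CUDA headers it must be turned off explicitly):

    # CPU-only build: disable the CUDA include path entirely
    g++ -std=c++11 -DMSHADOW_USE_CUDA=0 -DMSHADOW_USE_CBLAS=1 \
        -I .. basic.cpp -o basic -lopenblas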

3D Tensor operations?

Is there an operation for 3D tensors (Tensor<cpu, 3>)?

For instance, I would like to take one 2D tensor and dot it with the first two dimensions of a 3D tensor:

Tensor<cpu, 2> ten2d(Shape2(100, 50));
Tensor<cpu, 3> ten3d(Shape3(50, 30, 20));

someop(ten2d, ten3d);

I would like the result to be a 3D Tensor with shape = (100, 30, 20).

Thanks.
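
A hedged workaround sketch (assumption: since dot() works on 2D tensors, you can fold the trailing two dims into one, multiply, and unfold; this is valid because reshape preserves the row-major layout):

    TensorContainer<cpu, 2> a(Shape2(100, 50));
    TensorContainer<cpu, 3> b(Shape3(50, 30, 20));
    TensorContainer<cpu, 2> b2(Shape2(50, 30 * 20));
    TensorContainer<cpu, 2> tmp(Shape2(100, 30 * 20));
    TensorContainer<cpu, 3> out(Shape3(100, 30, 20));
    b2 = reshape(b, b2.shape_);        // materialize the folded (50, 600) view
    tmp = dot(a, b2);                  // ordinary 2-D GEMM
    out = reshape(tmp, out.shape_);    // unfold back to (100, 30, 20)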

Is mshadow a C++ version of Theano?

I am currently using mshadow to develop some NN learning tools from scratch.

And I wonder: can I use mshadow as a C++ version of Theano?

I mean, nearly all the things that Theano is able to do, mshadow can do too in a similar way (faster at runtime, slower to code).

Is that correct?

Makefile error

The rule for $(BIN) is incorrect in the Makefile: the order of -o and LDFLAGS needs to be reversed, otherwise you get an error about unresolved symbols. That is, the target should be $(CXX) $(CFLAGS) -o $@ $(filter %.cpp %.o %.c, $^) $(LDFLAGS)

Multiple GPU support

I do not know whether mshadow supports multiple GPUs now.
Instead of pre-defining the device id, it could also be templated (with a default device number) to enable Tensors on multiple GPUs.
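
A hedged sketch of what already works with the current API (assumption: SetDevice<gpu>() as declared in tensor.h switches the active CUDA device, so allocations after the call land on that device):

    SetDevice<gpu>(0);
    TensorContainer<gpu, 2> t0(Shape2(3, 3));   // allocated on device 0
    SetDevice<gpu>(1);
    TensorContainer<gpu, 2> t1(Shape2(3, 3));   // allocated on device 1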

Question about operator= for Tensor and TensorContainer

Hi,
I'm confused by operator= for Tensor and TensorContainer.
Could someone explain how it works?
Many thanks :D

#include <mshadow/tensor.h>
#include <mshadow/tensor_container.h>
#include <iostream>

void TestTensor(){
  using namespace std;
  using namespace mshadow;
  TensorContainer<cpu, 3> tc3;
  TensorContainer<cpu, 2> tc2;
  tc3.Resize(Shape3(3, 2, 2));
  tc2.Resize(Shape2(2, 2));
  tc2[0][0] = 0;   tc2[0][1] = 0.1;
  tc2[1][0] = 1.0; tc2[1][1] = 1.1;
  for (index_t i = 0; i < 3; i++){
    //tc3[i] = tc2;  // this case fails
    Copy(tc3[i], tc2, tc3.stream_);  // this succeeds
  }
  for (index_t i = 0; i < 3; i++){
    cout << "channel " << i << endl;  // print the index of the slice being shown
    for (index_t j = 0; j < 2; j++){
      for (index_t k = 0; k < 2; k++){
        cout << tc3[i][j][k] << " ";
      }
    }
    cout << endl;
  }
}

Leverage cuDNN?

Is there anything from cuDNN that can be leveraged? It shows a ~24% speedup on a Titan Black.

Compile error on dot with TensorContainer

I added the following code to basic.cpp as a test, but it reports a compile error. Am I missing something? Thanks.

TensorContainer<cpu, 2, float> matc1(Shape2(2,3));
matc1[0]=2; matc1[1]=3;
matc1 = 1 / matc1;

TensorContainer<cpu, 2, float> matc2(Shape2(3,2));
matc2[0]=1; matc2[1]=2; matc2[2]=3;

// error: conversion from ‘mshadow::expr::DotExp<mshadow::Tensor<mshadow::cpu, 2, float>,
// mshadow::Tensor<mshadow::cpu, 2, float>, false, false, float>’ to non-scalar type
// ‘mshadow::TensorContainer<mshadow::cpu, 2, float>’ requested
TensorContainer<cpu, 2, float> matc3 = dot(matc1, matc2);

for (index_t i = 0; i < matc3.size(0); ++i) {
  for (index_t j = 0; j < matc3.size(1); ++j) {
    printf("%.2f ", matc3[i][j]);
  }
  printf("\n");
}
printf("\n");
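
A hedged workaround (assumption based on the error text: TensorContainer has no constructor taking an expression, but its operator= does evaluate expressions):

    // declare with an explicit shape first, then assign; assignment
    // evaluates the lazy DotExp into the container's own memory
    TensorContainer<cpu, 2, float> matc3(Shape2(2, 2));
    matc3 = dot(matc1, matc2);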

Compile error using cuda (basic_stream.cu, mshadow::InitTensorEngine)

nvcc -o basic_stream -O3 --use_fast_math -ccbin g++ -Xcompiler "-Wall -O3 -I../ -msse3 -funroll-loops -Wno-unused-parameter -Wno-unknown-pragmas -I/usr/local/cuda-7.0/include -DMSHADOW_USE_CBLAS=1 -DMSHADOW_USE_MKL=0 -DMSHADOW_RABIT_PS=0 -DMSHADOW_DIST_PS=0" -Xlinker "-lm -lm -lcudart -lcublas -lcurand -L/usr/local/cuda-7.0/lib64 -lcblas" basic_stream.cu
basic_stream.cu(10): error: no instance of function template "mshadow::InitTensorEngine" matches the argument list

basic_stream.cu(31): error: no instance of function template "mshadow::ShutdownTensorEngine" matches the argument list
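
A hedged fix sketch (assumption: in current mshadow these functions are templated on the device, so the calls need an explicit template argument):

    mshadow::InitTensorEngine<mshadow::gpu>(0);   // device id 0
    // ... work ...
    mshadow::ShutdownTensorEngine<mshadow::gpu>();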

Potential random issue with DTypes

Currently, random number generation only supports the float and double types using cuRAND. According to the cuRAND doc (CUDA 7.5; I haven't found the link for CUDA 8.0), half-type random numbers are not yet supported. A candidate solution is to create one extra float-type tensor, generate the values into it, and convert them into DTypes other than float or double.

Currently, Random is used in the dropout layer to create the mask, which might be an issue if we want to support DType.
@tqchen
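
A hedged sketch of the candidate workaround (assumptions: expr::tcast from extension/typecast.h, the pointer-taking SampleUniform of the current Random API, and placeholder names rnd/out):

    // sample into a float buffer, then cast element-wise into the target DType
    TensorContainer<gpu, 2, float> fbuf(out.shape_);
    rnd.SampleUniform(&fbuf, 0.0f, 1.0f);
    out = tcast<half::half_t>(fbuf);   // out: TensorContainer<gpu, 2, half::half_t>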

Question about `broadcast_with_axis`

Why is broadcast_with_axis designed to create a tensor of ndim+1? This is a little bit annoying sometimes. For example, if I need to broadcast a [1, 200] tensor to a [100, 200] tensor, I first need to broadcast it to a [1, 100, 200] tensor and then do a reshape to get rid of the redundant dim. Is there any special concern behind this design? I think a keepdim broadcast would be more convenient (i.e., [1, 200] directly to [100, 200]).
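
For concreteness, a hedged sketch of the two-step pattern described above (assuming broadcast_with_axis(src, axis, size) inserts the new axis as the issue describes):

    TensorContainer<cpu, 2> src(Shape2(1, 200));
    TensorContainer<cpu, 2> dst(Shape2(100, 200));
    // step 1: [1, 200] -> [1, 100, 200]; step 2: reshape drops the leading 1
    dst = reshape(broadcast_with_axis(src, 0, 100), dst.shape_);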

Can't get the correct answer from reduce_with_axis

I'm trying to implement a tensorflow-like reduce_sum operator for mxnet, based on mshadow::expr::reduce_with_axis. First I wrote some testing code but got the wrong answer; I'm not sure whether my usage is inappropriate. Here's the test code:

#define MSHADOW_STAND_ALONE 1
#include <iostream>
#include "../mshadow/tensor.h"
#include "../mshadow/extension/reduce_with_axis.h"
using namespace mshadow;
using namespace mshadow::expr;
using namespace std;

int main() {
    Tensor<cpu, 3, float> t3(Shape3(2, 2, 5));
    AllocSpace(&t3);
    t3 = 1.0f;

    Tensor<cpu, 2, float> t2(Shape2(2, 2));
    AllocSpace(&t2);
    t2 = reduce_with_axis<red::sum, false>(t3, 2);
    for (index_t i = 0; i < t2.size(0); ++i) {
        for (index_t j = 0; j < t2.size(1); ++j) {
            cout << t2[i][j] << ' ';
        }
        cout << endl;
    }
    // free the manually allocated tensors
    FreeSpace(&t2);
    FreeSpace(&t3);
    return 0;
}

The output varied between runs, from

5 5
7.3787e+19 0

to

5 5
4 0

or something else. However, if I reduce over dimension 0 or 1 rather than 2, the result is constant and correct.

I also read the source code of reduce_with_axis.h, but failed to grasp the idea.

@piiswrong

Sum all

Could there be a sum-all function that gives the sum of all elements in a tensor? Or have I missed it? It would be useful for MSE calculations. Otherwise loops like the following are inevitable, which may be inefficient:

    tpv = sumall_except_dim<0>(F<Abs>(V));
    tpv2 = sumall_except_dim<0>(F<sme>(V));
    for (int i = 0; i < V.shape.shape_[0]; i++){
        err += tpv[i];
        err2 += tpv2[i];
    }
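
A hedged alternative sketch (assumptions: reshape composes with sumall_except_dim, and Shape::Size() gives the total element count): fold everything into a single row first, so the kept dimension has size 1 and one read yields the full sum.

    TensorContainer<cpu, 1> total(Shape1(1));
    total = sumall_except_dim<0>(reshape(F<Abs>(V), Shape2(1, V.shape_.Size())));
    real_t err = total[0];   // sum of |V| over all elements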

Matrix transpose

Is there any operation that can handle matrix transpose efficiently? Thanks!
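
For 2D tensors there is a hedged option (assumption: the expression API's T() member from expression.h, which builds a lazy TransposeExp):

    TensorContainer<cpu, 2> A(Shape2(3, 4));
    TensorContainer<cpu, 2> At(Shape2(4, 3));
    At = A.T();              // evaluated lazily on assignment
    // inside dot(), T() maps to the BLAS transpose flag instead of a copy:
    // C = dot(A.T(), B);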
