intel / ideep
Intel® Optimization for Chainer*, a Chainer module providing numpy-like API and DNN acceleration using MKL-DNN.
License: MIT License
We noticed that IDEEP spends quite a bit of time reordering tensors. This problem is especially obvious when the input size is small (e.g. batch_size=1). After some profiling, I noticed that during our Caffe2 run, most of the time is spent reordering tensors, weight tensors in particular. This does not make sense: reordering should be done only once, as suggested in the MKL-DNN example (https://github.com/intel/mkl-dnn/blob/master/examples/simple_net.cpp#L798). Hopefully we can resolve this issue, as it is killing performance and makes IDEEP noncompetitive against the MKLML ops.
Code of interest:
ideep/include/ideep/computations.hpp
Lines 1080 to 1089 in a861d8c
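The fix suggested by the simple_net example is to reorder each weight tensor once and reuse the cached result on every subsequent run. A library-free sketch of that caching pattern follows; the key format and the `reorder_weights` function are hypothetical stand-ins for the real MKL-DNN reorder primitive and memory descriptor:

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the expensive MKL-DNN weight reorder: the real
// code would execute a mkldnn::reorder primitive into the blocked layout the
// convolution expects.
static int g_reorder_count = 0;
std::vector<float> reorder_weights(const std::vector<float>& plain) {
    ++g_reorder_count;  // count how many expensive reorders actually happen
    return plain;       // layout change elided in this sketch
}

// Cache keyed by a descriptor string (op name + shape + target format).
// Real code would key on the mkldnn memory descriptor instead.
const std::vector<float>& cached_weights(const std::string& key,
                                         const std::vector<float>& plain) {
    static std::map<std::string, std::vector<float>> cache;
    auto it = cache.find(key);
    if (it == cache.end())
        it = cache.emplace(key, reorder_weights(plain)).first;
    return it->second;  // every later call is a cheap lookup
}
```

With this in place, only the first inference pays the reorder cost; every later pass with the same weights hits the cache.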
I see oneDNN 3.4 was released last week, and I'm wondering if it is going into the upcoming PyTorch 2.3 release. Currently the ideep_pytorch branch points to oneDNN 3.3.5.
Hi Intel ideep team, thanks for the awesome work integrating mkl-dnn with PyTorch. 😃
I noticed that the pytorch_dnnl branch is used for this integration, and the latest commit shows that ideep uses version 1.5 of mkl-dnn.
$ cd /root/pytorch/third_party/ideep
$ git log
commit 938cc68897bb46b8d4b228966edd9e23e471cf3b (HEAD, origin/pytorch_dnnl)
Author: pinzhenx <[email protected]>
Date: Tue Jun 16 18:54:21 2020 +0000
bump onednn to v1.5
Our application runs PyTorch on Intel CPU architecture, and we rely on mkl-dnn for better performance.
We want to know when ideep will upgrade mkl-dnn for PyTorch. Do you have a timetable for doing so?
Hi, are there any test cases to validate the operators in include/ideep/operators on the pytorch_dnnl branch?
ideep contains an ancient copy of mkl-dnn (a 0.x version), while the latest version is oneDNN (1.x, or 2.x beta). What's the status of this project? And do you have plans to upgrade the embedded copy to oneDNN?
I'm asking this because I'm preparing packages for official Debian archive, and I don't want to deal with inactive projects.
pytorch/pytorch#37332
Thanks in advance :-)
CMake Error at cmake_install.cmake:45 (file):
file INSTALL cannot find "/opt/intel/mkl/include/../lib/libmklml_intel.so".
Makefile:88: recipe for target 'install' failed
make: *** [install] Error 1
building 'ideep4py._ideep4py' extension
swigging ideep4py/py/ideep4py.i to ideep4py/py/ideep4py_wrap.cpp
swig -python -c++ -builtin -modern -modernargs -Iideep4py/py/mm -Iideep4py/py/primitives -Iideep4py/py/swig_utils -o ideep4py/py/ideep4py_wrap.cpp ideep4py/py/ideep4py.i
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fdebug-prefix-map=/build/python3.6-EKG1lX/python3.6-3.6.5=. -specs=/usr/share/dpkg/no-pie-compile.specs -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -Iideep4py/include -Iideep4py/include/mklml -Iideep4py/include/ideep -Iideep4py/py/mm -Iideep4py/py/primitives -I/usr/include/python3.6m -I~/.local/lib/python3.6/site-packages/numpy/core/include -c ideep4py/py/ideep4py_wrap.cpp -o build/temp.linux-x86_64-3.6/ideep4py/py/ideep4py_wrap.o -std=c++11 -Wno-unknown-pragmas -march=native -mtune=native -D_TENSOR_MEM_ALIGNMENT_=4096 -fopenmp
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
In file included from ideep4py/py/ideep4py_wrap.cpp:3920:0:
ideep4py/py/mm/mdarray.h:41:10: fatal error: ideep.hpp: No such file or directory
#include "ideep.hpp"
^~~~~~~~~~~
compilation terminated.
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
sudo make install
It's odd that installing ideep always removes the MKL symbolic link.
Any further suggestions?
Hi!
I'd like to know if a Windows build is functional at the moment (since Windows is not among the recommended platforms).
I tried compiling and installing it by following the README instructions, but I'm stuck when installing the package with cd ../python && py -3 setup.py install.
Intel MKL is correctly found, but I'm getting LINK errors such as gemm_convolution.obj : error LNK2019: unresolved external symbol _cblas_sgemm.
I attached the full trace, so you can check MKL is found, as well as the different unresolved symbols.
trace.txt
There is no way to tell the version of ideep from the library itself. There should be.
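One conventional way for a header-only library to expose its version is a set of compile-time macros plus a queryable number. A hypothetical sketch of what such a header could look like; these macro and function names are illustrative, not ideep's actual API:

```cpp
// Hypothetical ideep_version.h: a compile-time version query.
// The 2.0.0 value is just an example.
#define IDEEP_VERSION_MAJOR 2
#define IDEEP_VERSION_MINOR 0
#define IDEEP_VERSION_PATCH 0

// Single comparable integer, e.g. 2.1.3 -> 20103, for feature checks
// like `#if ideep_version_number() >= 20100`.
constexpr int ideep_version_number() {
    return IDEEP_VERSION_MAJOR * 10000 + IDEEP_VERSION_MINOR * 100 +
           IDEEP_VERSION_PATCH;
}
```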
Hi folks,
I haven't dug very deep into either the IDEEP or MKL-DNN codebase yet. Just a high-level question: what is the relationship between IDEEP and MKL-DNN? Is IDEEP just a wrapper around MKL-DNN primitives to make writing DNN operators easier? Thanks.
Hi, folks,
I'm trying to make sense of this constraint for the ConvFusion (iattr::residue) op:
https://github.com/pytorch/pytorch/blob/150af6ac1eaedf8aa2ca2a1ca9938bfb3d24d1c5/caffe2/ideep/operators/conv_fusion_op.cc#L144
Basically, what it means is that we need to overwrite the input S with the output. How does this even work in inference? After one run, the value of my S has changed, and subsequent runs will observe different inputs. It doesn't look correct to me. Any thoughts? @4pao
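If the fused op really does write its output into the residual input, repeated inference over the same blob diverges. A toy demonstration of the aliasing concern, with a trivial `x + 1` standing in for the convolution:

```cpp
// Toy model of the ConvFusion "residue" constraint: the fused op writes its
// output into the residual input S, so S accumulates across runs.
float fused_conv_sum_inplace(float x, float& s) {
    s = (x + 1.0f) + s;  // output aliases the residual input
    return s;
}
```

Running the same input twice gives different results, which is exactly the inference-correctness worry raised above (unless the framework re-feeds a fresh S each run).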
Check pytorch/pytorch#8301 for a minimal repro. I hit a segfault on my machine, and it seems to originate from the JIT code of MKL-DNN.
Currently the iDeep wheels provided on PyPI depend on libpythonX.Y.so.1.
This violates the manylinux1 rule: https://www.python.org/dev/peps/pep-0513/#libpythonx-y-so-1
Moreover, they cannot be run in some environments.
Furthermore, explicit linking to libpython creates problems in the common configuration where Python is not built with --enable-shared. In particular, on Debian and Ubuntu systems, apt install pythonX.Y does not even install libpythonX.Y.so.1, meaning that any wheel that did depend on libpythonX.Y.so.1 could fail to import.
The 5th output of spatial BN (saved_variance) seems to be the reciprocal of that of the normal CPU spatial BN:
https://github.com/pytorch/pytorch/blob/cb98c5020a3400425a5c36ba0ddffcd6ccdb8b84/caffe2/python/ideep/spatial_bn_op_test.py#L91-L93
Any reason why?
Hi, could someone please provide details on (i) the timeline for oneDNN 3.x support in ideep, and (ii) whether ideep_dev_3.0 is the development branch for it?
Thank you!
I'd like to request a simple .cc example file using ideep on the pytorch_dnnl{,_dev} branch.
Developers of some distributions, say Debian, won't include the sources from git submodules when doing the packaging. In that case, these developers need a way of sanity testing to ensure that the packaged ideep indeed works with the separately packaged oneDNN.
Thank you very much :-)
When building PyTorch from source with the -DUSE_MKL=ON and -DUSE_IDEEP=ON flags, the compilation of MKL-DNN fails with GCC 8 because the submodule version of MKL-DNN is too old.
A bugfix for MKL-DNN was made recently (see oneapi-src/oneDNN#283) and should solve the issue. For now, the failure can be worked around by building with the -DCMAKE_CXX_FLAGS=-Wno-format-truncation flag.
When &X == &Y in ideep::direct_copy::compute(X, Y), what is the result of this computation? Is it safe to do so? Thanks.
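For reference, the conventional way to make the aliased case well-defined is an explicit self-copy check. A sketch on a plain std::vector, not ideep's actual implementation; whether ideep::direct_copy does this is exactly the open question:

```cpp
#include <vector>

// Direct copy that tolerates src and dst being the same object.
void direct_copy(const std::vector<float>& src, std::vector<float>& dst) {
    if (static_cast<const void*>(&src) == static_cast<const void*>(&dst))
        return;  // self-copy: nothing to do, and no aliasing hazard
    dst.assign(src.begin(), src.end());
}
```

Without the guard, a copy routine that first resizes or frees the destination buffer would read from freed memory when the two arguments alias.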
Hi folks,
We are debugging an ASAN use-after-free issue in IDEEP ops.
The offending part is
(4-byte-read-heap-use-after-free)
#0 0x7f727ea48ae2 in caffe2::Tensor::GetDeviceType() const caffe2/caffe2/core/tensor.h:177
#1 0x7f727f0bde16 in bool caffe2::Blob::IsType<caffe2::Tensor>(caffe2::DeviceType) const caffe2/caffe2/core/blob.h:72
#2 0x7f727f0bd81a in caffe2::CopyIDEEPToCPUOp::RunOnDevice() caffe2/caffe2/ideep/operators/utility_ops.cc:34
#3 0x7f727ee140cc in caffe2::IDEEPOperator::Run(int) caffe2/caffe2/ideep/utils/ideep_operator.h:54
#4 0x7f727ec60a1d in caffe2::SimpleNet::Run() caffe2/caffe2/core/net_simple.cc:63
And the memory that is being read is freed at
#0 0x43da00 in operator delete(void*) ()
#1 0x7f727ee2b29e in ideep::param::get_mkldnn_primitive_desc_t() const ideep/include/ideep/tensor.hpp:629
#2 0x7f727ee26049 in ideep::param::get_descriptor() const ideep/include/ideep/tensor.hpp:647
#3 0x7f727f08c42c in void ideep::batch_normalization_forward_inference::compute<ideep::utils::allocator>(ideep::tensor const&, ideep::tensor const&, ideep::tensor const&, ideep::tensor const&, ideep::tensor const&, ideep::tensor&, float) ideep/include/ideep/computations.hpp:2595
#4 0x7f727f08b773 in caffe2::IDEEPSpatialBNOp::RunOnDevice() caffe2/caffe2/ideep/operators/spatial_batch_norm_op.cc:38
#5 0x7f727ee140cc in caffe2::IDEEPOperator::Run(int) caffe2/caffe2/ideep/utils/ideep_operator.h:54
#6 0x7f727ec60a1d in caffe2::SimpleNet::Run() caffe2/caffe2/core/net_simple.cc:63
As I'm looking at the code, I don't understand why ideep::param::get_mkldnn_primitive_desc_t() const would induce a free. Any ideas?
Hey folks, I'm looking at this line:
ideep/include/ideep/lru_cache.hpp
Line 425 in fb1adc4
It looks like it's trying to squeeze out the leading zero bits. But I don't understand the next line,
bytestring(as_cstring, len);
where the address of as_cstring starts with zeros. Can someone help me understand what it is doing? Thanks.
And by the way, __builtin_clz(0) is undefined behavior:
https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html
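A common guard for that undefined case, assuming the GCC/Clang builtin (a sketch, not lru_cache.hpp's actual code):

```cpp
#include <cstdint>

// __builtin_clz(0) is undefined behavior, so handle the all-zero key
// explicitly before squeezing out leading zero bits.
inline int clz32_safe(std::uint32_t x) {
    return x == 0 ? 32 : __builtin_clz(x);
}
```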
I cannot install ideep4py on OSX.
I cloned the code on the master branch and followed README.md, but python setup.py install doesn't work. It returns ideep4py/py/mm/mdarray.h:37:10: fatal error: 'forward_list' file not found, so I tried CFLAGS=-stdlib=libc++ python setup.py install.
However, another error occurs, as below.
running install
Installing ...
CMake Warning (dev) at CMakeLists.txt:3 (project):
Policy CMP0048 is not set: project() command manages VERSION variables.
Run "cmake --help-policy CMP0048" for policy details. Use the cmake_policy
command to set the policy and suppress this warning.
The following variable(s) would be set to empty:
CMAKE_PROJECT_VERSION
CMAKE_PROJECT_VERSION_MAJOR
CMAKE_PROJECT_VERSION_MINOR
CMAKE_PROJECT_VERSION_PATCH
This warning is for project developers. Use -Wno-dev to suppress it.
-- VTune profiling environment is unset
CMake Deprecation Warning at mkl-dnn/CMakeLists.txt:21 (cmake_policy):
The OLD behavior for policy CMP0048 will be removed from a future version
of CMake.
The cmake-policies(7) manual explains that the OLD behaviors of all
policies are deprecated and that a policy should be set to OLD only under
specific short-term circumstances. Projects should be ported to the NEW
behavior and not rely on setting a policy to OLD.
CMake Deprecation Warning at mkl-dnn/CMakeLists.txt:22 (cmake_policy):
The OLD behavior for policy CMP0054 will be removed from a future version
of CMake.
The cmake-policies(7) manual explains that the OLD behaviors of all
policies are deprecated and that a policy should be set to OLD only under
specific short-term circumstances. Projects should be ported to the NEW
behavior and not rely on setting a policy to OLD.
-- CMAKE_BUILD_TYPE is unset, defaulting to Release
-- Detecting Intel(R) MKL: trying mklml_intel
-- Intel(R) MKL: include /Users/nogu-atsu/ideep/mkl-dnn/external/mklml_mac_2018.0.3.20180406/include
-- Intel(R) MKL: lib /Users/nogu-atsu/ideep/mkl-dnn/external/mklml_mac_2018.0.3.20180406/lib/libmklml.dylib
-- Intel(R) MKL: OpenMP lib /Users/nogu-atsu/ideep/mkl-dnn/external/mklml_mac_2018.0.3.20180406/lib/libiomp5.dylib
-- Could NOT find OpenMP_C (missing: OpenMP_C_FLAGS OpenMP_C_LIB_NAMES)
-- Could NOT find OpenMP_CXX (missing: OpenMP_CXX_FLAGS OpenMP_CXX_LIB_NAMES)
-- Could NOT find OpenMP (missing: OpenMP_C_FOUND OpenMP_CXX_FOUND)
-- VTune profiling environment is unset
-- Detecting Intel(R) MKL: trying mklml_intel
-- Intel(R) MKL: include /Users/nogu-atsu/ideep/mkl-dnn/external/mklml_mac_2018.0.3.20180406/include
-- Intel(R) MKL: lib /Users/nogu-atsu/ideep/mkl-dnn/external/mklml_mac_2018.0.3.20180406/lib/libmklml.dylib
-- Intel(R) MKL: OpenMP lib /Users/nogu-atsu/ideep/mkl-dnn/external/mklml_mac_2018.0.3.20180406/lib/libiomp5.dylib
-- Configuring done
-- Generating done
-- Build files have been written to: /Users/nogu-atsu/ideep/build
[ 0%] Linking CXX shared library libmkldnn.dylib
Undefined symbols for architecture x86_64:
"_cblas_gemm_s8u8s32", referenced from:
mkldnn::impl::cpu::_gemm_u8s8s32x_convolution_fwd_t<true, (mkldnn_data_type_t)1>::execute_forward() in gemm_u8s8s32x_convolution.cpp.o
mkldnn::impl::cpu::_gemm_u8s8s32x_convolution_fwd_t<true, (mkldnn_data_type_t)2>::execute_forward() in gemm_u8s8s32x_convolution.cpp.o
mkldnn::impl::cpu::_gemm_u8s8s32x_convolution_fwd_t<true, (mkldnn_data_type_t)5>::execute_forward() in gemm_u8s8s32x_convolution.cpp.o
mkldnn::impl::cpu::_gemm_u8s8s32x_convolution_fwd_t<true, (mkldnn_data_type_t)6>::execute_forward() in gemm_u8s8s32x_convolution.cpp.o
mkldnn::impl::cpu::_gemm_u8s8s32x_convolution_fwd_t<false, (mkldnn_data_type_t)1>::execute_forward() in gemm_u8s8s32x_convolution.cpp.o
mkldnn::impl::cpu::_gemm_u8s8s32x_convolution_fwd_t<false, (mkldnn_data_type_t)2>::execute_forward() in gemm_u8s8s32x_convolution.cpp.o
mkldnn::impl::cpu::_gemm_u8s8s32x_convolution_fwd_t<false, (mkldnn_data_type_t)5>::execute_forward() in gemm_u8s8s32x_convolution.cpp.o
...
"_cblas_saxpy", referenced from:
mkldnn::impl::cpu::gemm_inner_product_fwd_t<(mkldnn_data_type_t)1>::execute_forward() in gemm_inner_product.cpp.o
"_cblas_sgemm", referenced from:
mkldnn::impl::cpu::_gemm_convolution_fwd_t<true, false, (mkldnn::impl::cpu::cpu_isa_t)0>::execute_forward() in gemm_convolution.cpp.o
mkldnn::impl::cpu::_gemm_convolution_fwd_t<false, false, (mkldnn::impl::cpu::cpu_isa_t)0>::execute_forward() in gemm_convolution.cpp.o
mkldnn::impl::cpu::_gemm_convolution_bwd_data_t<false, (mkldnn::impl::cpu::cpu_isa_t)0>::execute_backward_data() in gemm_convolution.cpp.o
mkldnn::impl::cpu::_gemm_convolution_bwd_weights_t<false, (mkldnn::impl::cpu::cpu_isa_t)0>::execute_backward_weights() in gemm_convolution.cpp.o
mkldnn::impl::cpu::gemm_inner_product_fwd_t<(mkldnn_data_type_t)1>::execute_forward() in gemm_inner_product.cpp.o
mkldnn::impl::cpu::gemm_inner_product_bwd_data_t<(mkldnn_data_type_t)1>::execute_backward_data() in gemm_inner_product.cpp.o
mkldnn::impl::cpu::gemm_inner_product_bwd_weights_t<(mkldnn_data_type_t)1>::execute_backward_weights() in gemm_inner_product.cpp.o
...
"_cblas_sgemm_alloc", referenced from:
mkldnn::impl::cpu::_ref_rnn_common_t<(mkldnn_prop_kind_t)64>::pack_weights(int, int, int, int, int, int, int, float**, float const*) in ref_rnn.cpp.o
mkldnn::impl::cpu::_ref_rnn_common_t<(mkldnn_prop_kind_t)128>::pack_weights(int, int, int, int, int, int, int, float**, float const*) in ref_rnn.cpp.o
"_cblas_sgemm_compute", referenced from:
mkldnn::impl::cpu::_ref_rnn_common_t<(mkldnn_prop_kind_t)64>::packed_gemm(int, int, int, int, int, int, int, int, int, float const*, float*, float*, bool, float) in ref_rnn.cpp.o
mkldnn::impl::cpu::_ref_rnn_common_t<(mkldnn_prop_kind_t)128>::packed_gemm(int, int, int, int, int, int, int, int, int, float const*, float*, float*, bool, float) in ref_rnn.cpp.o
"_cblas_sgemm_free", referenced from:
mkldnn::impl::cpu::_ref_rnn_common_t<(mkldnn_prop_kind_t)64>::free_packed_weights(int, int, float**) in ref_rnn.cpp.o
mkldnn::impl::cpu::_ref_rnn_common_t<(mkldnn_prop_kind_t)128>::free_packed_weights(int, int, float**) in ref_rnn.cpp.o
"_cblas_sgemm_pack", referenced from:
mkldnn::impl::cpu::_ref_rnn_common_t<(mkldnn_prop_kind_t)64>::pack_weights(int, int, int, int, int, int, int, float**, float const*) in ref_rnn.cpp.o
mkldnn::impl::cpu::_ref_rnn_common_t<(mkldnn_prop_kind_t)128>::pack_weights(int, int, int, int, int, int, int, float**, float const*) in ref_rnn.cpp.o
"_cblas_sscal", referenced from:
mkldnn::impl::cpu::ref_softmax_fwd_t<(mkldnn_data_type_t)1>::execute_forward_dense() in ref_softmax.cpp.o
mkldnn::impl::cpu::ref_softmax_fwd_t<(mkldnn_data_type_t)1>::_scal(int, float, float*) in ref_softmax.cpp.o
"_vsExp", referenced from:
mkldnn::impl::cpu::ref_softmax_fwd_t<(mkldnn_data_type_t)1>::execute_forward_dense() in ref_softmax.cpp.o
mkldnn::impl::cpu::ref_softmax_fwd_t<(mkldnn_data_type_t)1>::_exp(int, float const*, float*) in ref_softmax.cpp.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make[2]: *** [mkl-dnn/src/libmkldnn.0.14.0.dylib] Error 1
make[1]: *** [mkl-dnn/src/CMakeFiles/mkldnn.dir/all] Error 2
make: *** [all] Error 2
running build
(same CMake warnings and linker errors as above)
running build_py
running build_ext
(same CMake warnings and linker errors as above)
building 'ideep4py._ideep4py' extension
swigging ideep4py/py/ideep4py.i to ideep4py/py/ideep4py_wrap.cpp
swig -python -c++ -builtin -modern -modernargs -Iideep4py/py/mm -Iideep4py/py/primitives -Iideep4py/py/swig_utils -o ideep4py/py/ideep4py_wrap.cpp ideep4py/py/ideep4py.i
gcc -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -stdlib=libc++ -Iideep4py/include -Iideep4py/include/mklml -Iideep4py/include/ideep -Iideep4py/py/mm -Iideep4py/py/primitives -I/Users/nogu-atsu/.pyenv/versions/anaconda3-4.3.1/include/python3.6m -I/Users/nogu-atsu/.pyenv/versions/anaconda3-4.3.1/lib/python3.6/site-packages/numpy/core/include -c ideep4py/py/ideep4py_wrap.cpp -o build/temp.macosx-10.7-x86_64-3.6/ideep4py/py/ideep4py_wrap.o -std=c++11 -Wno-unknown-pragmas -march=native -mtune=native -D_TENSOR_MEM_ALIGNMENT_=4096
In file included from ideep4py/py/ideep4py_wrap.cpp:3920:
ideep4py/py/mm/mdarray.h:41:10: fatal error: 'ideep.hpp' file not found
#include "ideep.hpp"
^~~~~~~~~~~
1 error generated.
error: command 'gcc' failed with exit status 1
How can I solve this?
Hi,
The wheel files for ideep4py on PyPI support Python versions only up to 3.7. Since 3.8 is the default Python version on Ubuntu 20.04, installing ideep via pip fails there.
Would it be possible to upload the whl files for Python 3.8 on PyPI?
Thanks!
Hi folks, I noticed that the following code throws an error:
ideep::tensor::resize(dims, itensor::data_type::f32);
if dims contains a zero dimension, for example (0, 2). This is a legitimate tensor shape that arises in the R-CNN use case. In fact, the old MKL-ML operators support such shapes. I don't know whether this is a regression in IDEEP or in MKL-DNN. Please help us take a look. Thanks.
You can use this tiny test case to reproduce the issue:
pytorch/pytorch#8459
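For context, zero-size shapes are well-defined in array libraries. A minimal illustration using NumPy (an analogue, not the ideep C++ API) shows that a (0, 2) tensor carries valid shape metadata and composes with ordinary operations:

```python
import numpy as np

# Zero-size shapes such as (0, 2) are valid tensors: the element count is
# zero, but the shape metadata is meaningful and ops compose normally.
x = np.empty((0, 2), dtype=np.float32)
print(x.shape)   # (0, 2)
print(x.size)    # 0

# Concatenating with a non-empty array along axis 0 still works.
y = np.concatenate([x, np.ones((3, 2), dtype=np.float32)])
print(y.shape)   # (3, 2)
```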
@4pao @gujinghui
Recently the ideep repository had a force push from https://github.com/pytorch/ideep/commit/fb1adc449de61b56e92f8a81e02b91c068209f47 to 526cf81.
This generally makes downstream users grumpy because it means that their submodule pointers stop working with:
fatal: reference is not a tree: fb1adc449de61b56e92f8a81e02b91c068209f47
Unable to checkout 'fb1adc449de61b56e92f8a81e02b91c068209f47' in submodule path
'third_party/ideep'
Generally, it's better to avoid force push. But if there is no other option, I'd recommend keeping the old commits around using a tag.
In file included from /usr/ports/math/ideep/work/ideep-2.0.0-119-gb57539e/include/ideep.hpp:44:
/usr/ports/math/ideep/work/ideep-2.0.0-119-gb57539e/include/ideep/computations.hpp:61:10: fatal error: 'mkl_vsl.h' file not found
#include <mkl_vsl.h>
^~~~~~~~~~~
1 error generated.
FreeBSD 12
Recently pytorch/pytorch#6699 brought IDEEP into PyTorch, but our gcc requirement is not that tight. In fact, I built with g++ 4.8 and it seems to build and run OK. Hence the question here.
Is there any update now that mkl-dnn has been renamed to DNNL?
See below:
C:\Users\circleci\project\build\win_tmp\bin\sccache-cl.exe /TP -DDNNL_ENABLE_CONCURRENT_EXEC -DDNNL_ENABLE_MAX_CPU_ISA -DDNNL_X64=1 -DIDEEP_USE_MKL -DNOMINMAX -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DTH_BLAS_MKL -DWIN32_LEAN_AND_MEAN -D_CRT_SECURE_NO_DEPRECATE=1 -D_OPENMP_NOFORCE_MANIFEST -D_WIN -D__STDC_CONSTANT_MACROS -D__STDC_LIMIT_MACROS -I..\cmake\..\third_party\benchmark\include -Icaffe2\contrib\aten -I..\third_party\onnx -Ithird_party\onnx -I..\third_party\foxi -Ithird_party\foxi -I..\third_party\ideep\mkl-dnn\include -Ithird_party\ideep\mkl-dnn\include -I..\third_party\ideep\mkl-dnn\src -I..\cmake\..\third_party\googletest\googlemock\include -I..\cmake\..\third_party\googletest\googletest\include -I..\third_party\protobuf\src -Iwin_tmp\mkl\include -I..\third_party -I..\cmake\..\third_party\eigen -IC:\Jenkins\Miniconda3\include -IC:\Jenkins\Miniconda3\lib\site-packages\numpy\core\include -I..\cmake\..\third_party\pybind11\include /DWIN32 /D_WINDOWS /GR /EHsc /w /bigobj -openmp:experimental -DNDEBUG -openmp:experimental /MP /wd4800 /wd4068 /wd4305 /wd4551 /wd4244 /MD /O2 /Ob2 /DNDEBUG /w /bigobj -DNDEBUG -DUSE_GCC_GET_CPUID -DUSE_AVX -DUSE_AVX2 -std:c++14 /showIncludes /Fothird_party\ideep\mkl-dnn\src\cpu\CMakeFiles\dnnl_cpu.dir\gemm_convolution_utils.cpp.obj /Fdthird_party\ideep\mkl-dnn\src\cpu\CMakeFiles\dnnl_cpu.dir\ /FS -c ..\third_party\ideep\mkl-dnn\src\cpu\gemm_convolution_utils.cpp
FAILED: third_party/ideep/mkl-dnn/src/cpu/CMakeFiles/dnnl_cpu.dir/gemm_convolution_utils.cpp.obj
C:\Users\circleci\project\build\win_tmp\bin\sccache-cl.exe /TP -DDNNL_ENABLE_CONCURRENT_EXEC -DDNNL_ENABLE_MAX_CPU_ISA -DDNNL_X64=1 -DIDEEP_USE_MKL -DNOMINMAX -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DTH_BLAS_MKL -DWIN32_LEAN_AND_MEAN -D_CRT_SECURE_NO_DEPRECATE=1 -D_OPENMP_NOFORCE_MANIFEST -D_WIN -D__STDC_CONSTANT_MACROS -D__STDC_LIMIT_MACROS -I..\cmake\..\third_party\benchmark\include -Icaffe2\contrib\aten -I..\third_party\onnx -Ithird_party\onnx -I..\third_party\foxi -Ithird_party\foxi -I..\third_party\ideep\mkl-dnn\include -Ithird_party\ideep\mkl-dnn\include -I..\third_party\ideep\mkl-dnn\src -I..\cmake\..\third_party\googletest\googlemock\include -I..\cmake\..\third_party\googletest\googletest\include -I..\third_party\protobuf\src -Iwin_tmp\mkl\include -I..\third_party -I..\cmake\..\third_party\eigen -IC:\Jenkins\Miniconda3\include -IC:\Jenkins\Miniconda3\lib\site-packages\numpy\core\include -I..\cmake\..\third_party\pybind11\include /DWIN32 /D_WINDOWS /GR /EHsc /w /bigobj -openmp:experimental -DNDEBUG -openmp:experimental /MP /wd4800 /wd4068 /wd4305 /wd4551 /wd4244 /MD /O2 /Ob2 /DNDEBUG /w /bigobj -DNDEBUG -DUSE_GCC_GET_CPUID -DUSE_AVX -DUSE_AVX2 -std:c++14 /showIncludes /Fothird_party\ideep\mkl-dnn\src\cpu\CMakeFiles\dnnl_cpu.dir\gemm_convolution_utils.cpp.obj /Fdthird_party\ideep\mkl-dnn\src\cpu\CMakeFiles\dnnl_cpu.dir\ /FS -c ..\third_party\ideep\mkl-dnn\src\cpu\gemm_convolution_utils.cpp
C:\Users\circleci\project\third_party\ideep\mkl-dnn\src\cpu\gemm_convolution_utils.cpp(401) : fatal error C1001: Internal compiler error.
(compiler file 'd:\agent\_work\7\s\src\vctools\Compiler\Utc\src\p2\main.c', line 195)
To work around this problem, try simplifying or changing the program near the locations listed above.
If possible please provide a repro here: https://developercommunity.visualstudio.com
Please choose the Technical Support command on the Visual C++
Help menu, or open the Technical Support help file for more information
cl!RaiseException()+0x69
cl!RaiseException()+0x69
cl!CloseTypeServerPDB()+0x22e6b
cl!CloseTypeServerPDB()+0xcd30a
Microsoft (R) C/C++ Optimizing Compiler Version 19.27.29111 for x64
Copyright (C) Microsoft Corporation. All rights reserved.
Would it be possible to update the version of MKL-DNN used by the ideep:pytorch branch to include oneapi-src/oneDNN#805?
I would really rather not use the third-party submodules. Is it possible to make CMakeLists.txt search for my locally installed packages instead?
Thank you
Hi, I'm trying to reproduce MLPerf/inference_results_v0.5, and I've run into some trouble with the ideep API.
For example, in auto in_format = dataOrder_ == "NCHW" ? ideep::format::nchw : ideep::format::nhwc;, I think ideep::format has been renamed to format_tag, and reinit has been renamed to init, right?
But I don't know all the correspondences. Should I use an older version of ideep as the third_party of PyTorch and rebuild PyTorch from source? I would really appreciate any help.
It is still in the preview stage, but the code is no longer being updated.
As more and more users have started using Python 3.7.x, it would be nice if you provided Python 3.7 wheels.
https://pypi.org/project/ideep4py/2.0.0.post3/#files
Thanks in advance!
Hi, I see several branches in ideep with PyTorch references, for example pytorch-internal and pytorch_dnnl.
Currently I have this PR against pytorch-internal, and I'm looking to get it into PyTorch 1.13. Please let me know if this is not the correct branch; I'd appreciate any feedback on the PR. Thank you!
It looks like DNNL is now oneDNN. Will ideep be upgraded to be compatible with oneDNN?
Thank you...
Hi! I tried to compile ideep with MSVC, but hit the following kinds of issues:
I was able to resolve the first three kinds of issues, but I'm blocked by the last one because I have little knowledge of assembly code. I really think you should support ideep on Windows, because it is heavily used by deep learning libraries like PyTorch [1]. Without that support, even if MKL-DNN itself is supported on Windows, we cannot actually use it from those frameworks. Doesn't that sound a little weird? What's more, the work shouldn't take too much time: I was able to solve issue kinds 1-3 in only 3-4 hours.
References:
[1] pytorch/pytorch#15982
Hi folks,
Shufflenet is a pretty popular model (https://arxiv.org/abs/1707.01083), and we tried to run it on IDEEP ops. The only thing MKL-DNN doesn't support is the ChannelShuffle primitive, so we ended up falling back to our own implementation and paying a context switch. We noticed that MKL-DNN's depthwise convolution really helps this model, but the performance is hampered by the context switch, which manifests as reordering of inputs at the conv op. Since channel shuffle is conceptually similar to a reorder, is there any possibility of adding support for it? Here is a reference implementation of ChannelShuffle, just to give an idea of what it does: https://github.com/pytorch/pytorch/blob/master/caffe2/operators/channel_shuffle_op.h#L14-L64
Conceptually, you can think of it as splitting the channel dimension c into two dimensions g and k (c = g * k), transposing g and k, and merging them back into c.
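To make the split-transpose-merge description concrete, here is a minimal NumPy sketch of channel shuffle (an illustrative analogue, not the Caffe2 or MKL-DNN implementation):

```python
import numpy as np

def channel_shuffle(x, groups):
    """Split channels c into (g, k), swap the two axes, merge back to c."""
    n, c, h, w = x.shape
    k = c // groups
    return (x.reshape(n, groups, k, h, w)   # c -> (g, k)
             .transpose(0, 2, 1, 3, 4)      # swap g and k
             .reshape(n, c, h, w))          # (k, g) -> c

# 6 channels in 2 groups: channel order becomes [0, 3, 1, 4, 2, 5]
x = np.arange(6, dtype=np.float32).reshape(1, 6, 1, 1)
print(channel_shuffle(x, 2).ravel())   # [0. 3. 1. 4. 2. 5.]
```

Because the operation is a pure permutation of channel indices, it maps naturally onto a reorder-style primitive, which is the point of the request above.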
It is nice if version of iDeep4py module can be accessed from code, e.g.,
>>> import ideep4py
>>> print(ideep4py.__version__)
2.0.0
so that we can embed the version information in chainer.print_runtime_info(). It helps when we support users who are having difficulties with installation.
https://docs.chainer.org/en/latest/performance.html#use-the-latest-version
mkl-dnn packages exist on many systems: https://repology.org/project/mkl-dnn/versions
Hi, folks,
We noticed some memory leakage when using IDEEP. Here is the top of the stack trace:
==1260204==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 1464320 byte(s) in 1 object(s) allocated from:
#0 0x385e10 in posix_memalign (/data/users/yinghai/fbsource/fbcode/buck-out/dev/gen/fblearner/predictor/model/tests/caffe2_xray_memory_leak+0x385e10)
#1 0x7f3ca066f72d in ideep::utils::scratch_allocator::mpool::malloc(unsigned long) third-party-buck/gcc-5-glibc-2.23/build/ideep/include/ideep/allocators.hpp:141
#2 0x7f3ca066ea3b in char* ideep::utils::scratch_allocator::malloc<ideep::computation>(unsigned long) third-party-buck/gcc-5-glibc-2.23/build/ideep/include/ideep/allocators.hpp:196
#3 0x7f3ca066dd55 in void ideep::param::init<ideep::utils::scratch_allocator, ideep::computation>(ideep::param::descriptor const&) third-party-buck/gcc-5-glibc-2.23/build/ideep/include/ideep/tensor.hpp:551
#4 0x7f3ca066cbac in ideep::param::reshape(std::vector<int, std::allocator<int> >) third-party-buck/gcc-5-glibc-2.23/build/ideep/include/ideep/tensor.hpp:651
#5 0x7f3ca099a14f in caffe2::IDEEPSqueezeOp::RunOnDevice() caffe2/caffe2/ideep/operators/squeeze_op.cc:58
So the memory is allocated at
ideep/include/ideep/allocators.hpp
Line 141 in 2f2994b
And looking around, its free function doesn't actually call free(); instead, it moves blocks to a free list.
ideep/include/ideep/allocators.hpp
Line 152 in 2f2994b
Am I missing something?
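This caching behavior is typical of pool allocators. As a rough Python sketch (a hypothetical analogue of the scratch allocator's mpool, not ideep's actual code), "freed" blocks go back onto a per-size free list for reuse, so LeakSanitizer still sees them as live allocations at process exit:

```python
class ToyMemoryPool:
    """Toy pool allocator: free() caches blocks instead of releasing them."""

    def __init__(self):
        self.free_lists = {}  # block size -> list of reusable blocks

    def malloc(self, size):
        blocks = self.free_lists.get(size)
        if blocks:
            return blocks.pop()      # reuse a cached block, no new allocation
        return bytearray(size)       # real allocation

    def free(self, block):
        # Not returned to the OS: pushed onto the free list for later reuse.
        self.free_lists.setdefault(len(block), []).append(block)

pool = ToyMemoryPool()
a = pool.malloc(16)
pool.free(a)
b = pool.malloc(16)
print(b is a)   # True: the "freed" block was recycled, never released
```

Under this design, whether the reported leak is benign caching or a genuine loss of blocks depends on whether every allocation eventually reaches the free list.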
Hi, we're using TBB in our project; in particular, we're using the MKL-DNN library compiled with TBB instead of OpenMP. I noticed there is explicit usage of OpenMP through pragmas in the ideep source files. What are the implications of using the TBB build of MKL-DNN with ideep?
If I understand the recent correspondence correctly, ideep will be the official frontend for mkl-dnn, not just for Chainer. The README should be updated.
I cannot install ideep4py on Windows.
Is it possible to install ideep4py on Windows? If it is, could you tell me how?
Intel MKL is already available on Windows.
Can you give me any advice?
Hi, I just wonder whether I can build ideep against an already-installed mkl-dnn.
Currently, I get the following error message if I don't git clone --recursive:
CMake Error at cmake/mkldnn.cmake:16 (add_subdirectory):
The source directory....../ideep/mkl-dnn
does not contain a CMakeLists.txt file.
Call Stack (most recent call first):
CMakeLists.txt:13 (include)
I would prefer to build ideep with the newest mkl-dnn, which is already installed.
Cheers,
Pei
1024 seems to be a pretty arbitrary number for the LRU cache size. Is there any way we can change it? Even at compile time would be fine.
Code:
ideep/include/ideep/lru_cache.hpp
Line 293 in a861d8c
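For reference, the capacity-bounded LRU idea can be sketched in a few lines of Python (a generic illustration with the capacity as a parameter, not ideep's C++ lru_cache):

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache; capacity is a parameter instead of a constant."""

    def __init__(self, capacity=1024):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)         # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict least recently used

cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")         # "a" becomes most recently used
cache.put("c", 3)      # capacity exceeded: evicts "b"
print(cache.get("b"))  # None
```

Exposing the capacity as a constructor argument (or at least a compile-time macro in the C++ case) is exactly the kind of knob the question above asks for.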