
About OpenBLAS (caffe-android-lib, 35 comments, closed)

sh1r0 commented on June 6, 2024

About OpenBLAS

from caffe-android-lib.

Comments (35)

nickwxwu commented on June 6, 2024

Hi sh1r0,
It seems that using Eigen on mobile is more popular than using OpenBLAS. Is Eigen more efficient than OpenBLAS?

sh1r0 commented on June 6, 2024

How did you get it to work? Did you cross-compile the OpenBLAS library with hard float support?
I tried this outdated pre-built one before and got it to work, though it was horribly slow. Also, I'm sure that the latest OpenBLAS can be built for Android and works when linked into executables. However, it's troublesome to use in jni calls (might be related to this). If you or anyone has any idea about dealing with this issue, please feel free to let me know.
Thanks.

sh1r0 commented on June 6, 2024

AFAIK, Eigen can simply be used as a header-only library, and it is quite competitive with other BLAS-like libraries (refer to the benchmark, and note that OpenBLAS is based on GotoBLAS). I'm not going to say that Eigen is the best choice in all cases, but it's a simple and great one, at least in my case.

bhack commented on June 6, 2024

There is a specific OpenBLAS branch for "deep learning" at https://github.com/xianyi/OpenBLAS/tree/optimized_for_deeplearning?files=1

nickwxwu commented on June 6, 2024

I changed the flag "-mfloat-abi=hard" to "softfp" (an error occurred when OpenBLAS was cross-compiled with hard float while caffe used softfp). @sh1r0

I tried the outdated pre-built one and https://github.com/xianyi/OpenBLAS/tree/optimized_for_deeplearning?files=1
and both failed.
I wonder whether the way I use them is OK...
I linked against libopenblas.so to produce libcaffe.so and libcaffe_jni.so, then used System.loadLibrary("caffe"); System.loadLibrary("caffe_jni") to load the two libraries.

sh1r0 commented on June 6, 2024

To use the pre-built OpenBLAS:

  1. get this and extract to android_lib/
  2. comment out android_lib/openblas-android/include/openblas_config.h:20
  3. remove all *.so* in android_lib/openblas-android/lib
  4. modify scripts/build_caffe.sh as shown below
@@ -19,7 +19,7 @@ OPENCV_ROOT=${ANDROID_LIB_ROOT}/opencv/sdk/native/jni
PROTOBUF_ROOT=${ANDROID_LIB_ROOT}/protobuf
GFLAGS_HOME=${ANDROID_LIB_ROOT}/gflags
BOOST_HOME=${ANDROID_LIB_ROOT}/boost_1.56.0
-export OpenBLAS_HOME=${ANDROID_LIB_ROOT}/openblas
+export OpenBLAS_HOME=${ANDROID_LIB_ROOT}/openblas-android
export EIGEN_HOME=${ANDROID_LIB_ROOT}/eigen3

rm -rf "${BUILD_DIR}"
@@ -40,7 +40,7 @@ cmake -DCMAKE_TOOLCHAIN_FILE="${WD}/android-cmake/android.toolchain.cmake" \
   -DUSE_LMDB=OFF \
   -DUSE_LEVELDB=OFF \
   -DUSE_HDF5=OFF \
-      -DBLAS=eigen \
+      -DBLAS=open \
   -DBOOST_ROOT="${BOOST_HOME}" \
   -DGFLAGS_INCLUDE_DIR="${GFLAGS_HOME}/include" \
   -DGFLAGS_LIBRARY="${GFLAGS_HOME}/lib/libgflags.a" \
  5. re-build caffe

On the other hand, regarding the master or optimized_for_deeplearning branch of OpenBLAS, hard float support is required. And as I said, it works for native executables but not for jni libs. If you want to build this project with hard float support, you can simply set the flag in the shell (export ANDROID_ABI="armeabi-v7a-hard with NEON") and re-build everything.

nickwxwu commented on June 6, 2024

Thank you very much, @sh1r0. With your help, it worked with OpenBLAS-0.2.15.tar.gz once I had compiled all dependencies with hard float support. But it seems that OpenBLAS is faster than Eigen in the forward pass of the caffe model (400-800 ms faster). I thought it may be because my Eigen version is 3.2.5, which is not the latest, while my OpenBLAS is the latest.
Later, I'll test this using the latest Eigen.
Thanks for everything.

nickwxwu commented on June 6, 2024

I used the latest version of Eigen (3.2.7) but got the same result... I wonder whether some flag (like "neon") needs to be set for Eigen when compiling caffe with it.

sh1r0 commented on June 6, 2024

Hi @wuxuewu, good to know. Do you mean that you have succeeded in getting jni to work with hard float? Could you share your experience? Thanks.
BTW, I think the version of Eigen might have only a minor effect on performance. :p

sh1r0 commented on June 6, 2024

@wuxuewu
I tried to run the cpp_classification example on my phone and simply used time for rough benchmarks. The results below are the best three from each build (both built with armeabi-v7a-hard with NEON).

=======  OpenBLAS  ======
0m10.57s real     0m4.76s user     0m4.83s system
0m10.68s real     0m4.35s user     0m4.81s system
0m11.03s real     0m4.46s user     0m4.73s system

=======   Eigen    ======
0m10.99s real     0m3.48s user     0m3.48s system
0m10.85s real     0m3.30s user     0m3.70s system
0m10.38s real     0m3.58s user     0m3.18s system


nickwxwu commented on June 6, 2024

Hi sh1r0,
Yes, I have succeeded in getting jni to work with hard float. I just followed your instructions in build.sh, compiling everything with "armeabi-v7a-hard with NEON".
The results you showed above seem to indicate that OpenBLAS is a bit slower than Eigen; I did not try the cpp_classification example. (What versions of OpenBLAS and Eigen did you use?)
I use the caffe lib with OpenBLAS and Eigen in the caffe-demo-for-android project. The caffe_mobile.cpp log output is below; I tested several times and the results did not change.

===== Eigen ========
Prediction time: 2043.39ms

===== OpenBLAS =====
Prediction time: 1458.48ms

note: caffe model, cpu mode, Eigen 3.2.7, OpenBLAS 0.2.15
Sorry, I also want to know whether Eigen should be compiled separately, or whether some compile flag should be set for Eigen in build_caffe.sh?

sh1r0 commented on June 6, 2024

Hi @wuxuewu,
Wow, that's weird. First, I use OpenBLAS v0.2.15 and Eigen v3.2.5.
Second, did you use build_openblas.sh to build?
In my experience, armeabi-v7a-hard with NEON is okay for building everything; however, at runtime the results are totally wrong. Could you provide some of your prediction results from jni calls?
(EDIT: caffe/examples/images/cat.jpg is a good candidate for the tests.)
For the last question, the answer is no. There is no need to build Eigen separately.

nickwxwu commented on June 6, 2024

Hi sh1r0,
I tested OpenBLAS and Eigen on the two phones I have (A and B) and got the results below:

---------- openblas - 8 ----------
phone A    phone B
 502ms     1330ms
 458ms     1280ms
 584ms     1530ms
4168ms     1400ms
4822ms     1420ms

---------- openblas - 4 ----------
phone A    phone B
 409ms     1300ms
 445ms     1490ms
 385ms     1410ms
 385ms     1360ms
 376ms     1410ms
 365ms     1340ms
 367ms     1440ms

------------- eigen --------------
phone A    phone B
 539ms     2170ms
 526ms     2100ms
 535ms     2160ms
 564ms     2220ms
 551ms     2160ms
 528ms     2210ms
 537ms     2140ms

phone A: AArch64, Android 6.0, 8 cores
phone B: Armv7 rev 1, Android 4.4.2, 4 cores
(phone C: Armv7 rev 5, Android 4.4.2, 8 cores; results same as phone B)
openblas - 8: compiled with TARGET=ARMV7 USE_THREAD=ON NUM_THREADS=8
openblas - 4: compiled with TARGET=ARMV7 USE_THREAD=ON NUM_THREADS=4

nickwxwu commented on June 6, 2024

I count the time with the following change in caffe_mobile.cpp, because I found that on phone A the function clock() was not accurate: the log printed "Prediction time: 3900ms" while I saw the app return results in less than one second. So I count the time the following way (the logs are printed, and the elapsed time can be read in the logcat window of Eclipse):

VLOG(1) << "wxw";
const vector<Blob<float>*>& result = caffe_net->Forward(dummy_bottom_vec, &loss);
VLOG(1) << "wxw";


sh1r0 commented on June 6, 2024

Hi @wuxuewu, it seems that your prediction results are correct? I mean, for example, caffe/examples/images/cat.jpg is classified as tabby cat (top-1), right? Could you provide your script for building OpenBLAS, and possibly your adaptations for building this project? It would be great to integrate them.
Regarding the forwarding time in caffe_mobile.cpp, I think it counts the total CPU time (summed across all cores) rather than the wall time; I'll try to fix this.
Thanks.

nickwxwu commented on June 6, 2024

Hi @sh1r0,
the script for building OpenBLAS is below:

#!/usr/bin/env sh

if [ -z "$NDK_ROOT" ] && [ "$#" -eq 0 ]; then
    echo 'Either $NDK_ROOT should be set or provided as argument'
    echo "e.g., 'export NDK_ROOT=/path/to/ndk' or"
    echo "      '${0} /path/to/ndk'"
    exit 1
else
    NDK_ROOT="${1:-${NDK_ROOT}}"
fi

#export OPENBLAS_NUM_THREADS=1
TOOLCHAIN_DIR=$NDK_ROOT/toolchains/arm-linux-androideabi-4.9/prebuilt/linux-x86_64/bin
WD=$(readlink -f "$(dirname "$0")/..")
INSTALL_DIR=${WD}/android_lib
N_JOBS=8

cd OpenBLAS

make clean
make -j${N_JOBS} \
    CC="$TOOLCHAIN_DIR/arm-linux-androideabi-gcc --sysroot=$NDK_ROOT/platforms/android-19/arch-arm" \
    CROSS_SUFFIX=$TOOLCHAIN_DIR/arm-linux-androideabi- \
    HOSTCC=gcc NO_LAPACK=1 TARGET=ARMV7 \
    USE_THREAD=ON NUM_THREADS=4

rm -rf "$INSTALL_DIR/openblas"
make PREFIX="$INSTALL_DIR/openblas" install

I used caffe/examples/images/cat.jpg to predict, but I did not focus on the prediction result. I modified the last layer of the caffe model to have only 4 outputs, but I did not change synset_words.txt, which still has 1000 classes. Does that matter?


sh1r0 commented on June 6, 2024

@wuxuewu
OK, it seems that your script is almost the same as mine.
Did you fine-tune the caffemodel for your own purpose?
I'm curious about the prediction results. Could you provide your results from PredictTopK when feeding caffe/examples/images/cat.jpg (through jni calls) into the standard caffenet model provided by bvlc?

BTW, what NDK version do you use?
Thanks.

nickwxwu commented on June 6, 2024

Hi @sh1r0,
I have not fine-tuned the caffemodel yet. Just now, I tested with the standard caffemodel, and caffe/examples/images/cat.jpg is classified as tabby, tabby cat (top-1). I used the top-1 result of the function predict_top_k in caffe_mobile.cpp.

My NDK version is r10e.

nickwxwu commented on June 6, 2024

And the time it took was almost the same as reported above for phone B (Armv7 rev 1, Android 4.4.2, 4 cores). I only tested it on phone B.

sh1r0 commented on June 6, 2024

Hi @wuxuewu,
That's weird; I always get incorrect results. Could you provide your prebuilt libcaffe.so and libcaffe_jni.so so I can check whether my device is the real problem?
Thanks.

sh1r0 commented on June 6, 2024

I just got another phone to test, and the results were (unsurprisingly?) incorrect, too. Perhaps the device is not the problem. My tests follow this ("armeabi-v7a-hard with NEON" is used in the 2nd step).
Did I miss anything special for reproducing your results? Also, could you try to build with the latest master branch (follow the steps in the link above) and let me know if that works for you?
Thanks.

Note: The attached image is my prediction result for caffe/examples/images/cat.jpg using the caffe-android-demo app with the substitute libs.

nickwxwu commented on June 6, 2024

I think maybe the key to the problem is the caffemodel. You could try another caffemodel... I use a caffemodel downloaded from http://dl.caffe.berkeleyvision.org/. Sorry, I cannot upload files because of my company's rules... But I'll try to build with the latest master branch and let you know.

sh1r0 commented on June 6, 2024

@wuxuewu,
I do not think the problem is the model. The cpp_classification example (executable) works fine with both the armeabi-v7a with NEON and armeabi-v7a-hard with NEON builds. Also, a clean caffe-android-demo (where the libs are built with armeabi-v7a with NEON) works. All I did to my demo app, as mentioned in the last comment, was to replace the jni libs with armeabi-v7a-hard with NEON ones.
To be specific, there are numeric issues when the native methods are called from Java, as the prediction results are "fixed" no matter what the input image is.
(My models are all downloaded using the scripts provided by official caffe.)

cd caffe
./scripts/download_model_binary.py models/bvlc_reference_caffenet


nickwxwu commented on June 6, 2024

I downloaded caffe on Dec. 22; the zip name is caffe-462c0b8e6575f72e50307ac61c116ea28c09eaad. I did not find any numeric issues when the native methods built with armeabi-v7a-hard with NEON are called from Java, because this branch version does not use jfloat. So I think maybe the problem is in the jni call...

sh1r0 commented on June 6, 2024

Why do you need to download caffe?

    Because this branch version does not use jfloat.

Sorry, I don't follow. jfloat is never used in official caffe; in this project, I wrote a jni wrapper for Java to call the native methods. And yes, all the problems should be related to the jni calls.

So, if possible, let me know the results of your build with the latest master branch.
Thanks.

sh1r0 commented on June 6, 2024

Hi @wuxuewu,
I think I found the problem: the OS! I just had a try on my MacBook, and it worked. Sorry for bothering you so much, and thanks for your help. Just a quick question: what kind of environment (OS) do you use? All my trials on Ubuntu 14.04 (both real and virtual machines) failed, which made me think that armeabi-v7a-hard with NEON builds did not work at all.
EDIT: I still cannot make OpenBLAS work, while armeabi-v7a-hard with NEON is okay for Eigen and produces correct results. I'm really confused. 😕

sh1r0 commented on June 6, 2024

Hi @wuxuewu,
I think I eventually found the problem: multi-thread support in OpenBLAS (NUM_THREADS). Setting NUM_THREADS=1 (single-threaded) makes it work.

I cannot figure out why multi-threading does not work on my devices; both of them are quad-core. It's really a pity that the computation power is not fully utilized.

bhack commented on June 6, 2024

@sh1r0 Is the issue related to the fact that "The JNI interface pointer (JNIEnv*) is only valid in the current thread"? Have you tested with the openmp flags? See https://github.com/xianyi/OpenBLAS/wiki/faq#multi-threaded

sh1r0 commented on June 6, 2024

@bhack According to the reports above from @wuxuewu, I think NUM_THREADS greater than 1 works for him. However, some people mentioned in OpenMathLib/OpenBLAS#363 that OpenBLAS for Android works only if single-threaded (?).
I've never used the openmp flag before. I'll probably give it a try later. Thanks.

bhack commented on June 6, 2024

If the native code in caffe called through jni uses threads, OpenBLAS needs to parallelize with OpenMP.

sh1r0 commented on June 6, 2024

@bhack Thanks for the information. I just updated the master branch to support OpenMP.

bhack commented on June 6, 2024

@sh1r0 As a next step, CUDA support on Android Tegra K1 and X1 could be very useful.

sh1r0 commented on June 6, 2024

@bhack
Recently, I got NVIDIA CodeWorks for Android 1R4, which contains a cuda toolkit for Tegra devices, but I failed to get it to work with cmake on my very first try. I'll investigate more deeply later (probably after #23).

xianyi commented on June 6, 2024

Great work! I also want to play with caffe on Android :)
