Comments (35)
Hi sh1r0:
It seems that using Eigen on mobile is more popular than using OpenBLAS. Is Eigen more efficient than OpenBLAS?
from caffe-android-lib.
How did you get it to work? Did you cross-compile the OpenBLAS library with hard-float support?
I tried this outdated pre-built one before and got it to work, though it was horribly slow. Also, I'm sure that the latest OpenBLAS can be built for Android and works when linked into executables. However, it's troublesome to use in JNI calls (might be related to this). If you or anyone has any idea about dealing with this issue, please feel free to let me know.
Thanks.
AFAIK, Eigen can be simply used as a header-only library, and is quite competitive with other BLAS-like libraries (refer to the benchmark, and note that OpenBLAS is based on GotoBLAS). I'm not going to say that Eigen is the best choice in all the cases, but it's a simple and great one at least in my case.
There is a specific openblas branch for "deep learning" at https://github.com/xianyi/OpenBLAS/tree/optimized_for_deeplearning?files=1
@sh1r0 I changed the flag "-mfloat-abi=hard" to "softfp" (an error occurred when OpenBLAS was cross-compiled with hard float while Caffe was built with softfp).
I tried both the outdated pre-built one and https://github.com/xianyi/OpenBLAS/tree/optimized_for_deeplearning?files=1, and both failed.
I wonder whether the way I use it is correct: I link libopenblas.so to produce libcaffe.so and libcaffe_jni.so, then load the two libraries with System.loadLibrary("caffe"); System.loadLibrary("caffe_jni").
To use the pre-built OpenBLAS:
- get this and extract it to android_lib/
- comment out android_lib/openblas-android/include/openblas_config.h:20
- remove all *.so* in android_lib/openblas-android/lib
- modify scripts/build_caffe.sh as shown below
@@ -19,7 +19,7 @@ OPENCV_ROOT=${ANDROID_LIB_ROOT}/opencv/sdk/native/jni
PROTOBUF_ROOT=${ANDROID_LIB_ROOT}/protobuf
GFLAGS_HOME=${ANDROID_LIB_ROOT}/gflags
BOOST_HOME=${ANDROID_LIB_ROOT}/boost_1.56.0
-export OpenBLAS_HOME=${ANDROID_LIB_ROOT}/openblas
+export OpenBLAS_HOME=${ANDROID_LIB_ROOT}/openblas-android
export EIGEN_HOME=${ANDROID_LIB_ROOT}/eigen3
rm -rf "${BUILD_DIR}"
@@ -40,7 +40,7 @@ cmake -DCMAKE_TOOLCHAIN_FILE="${WD}/android-cmake/android.toolchain.cmake" \
-DUSE_LMDB=OFF \
-DUSE_LEVELDB=OFF \
-DUSE_HDF5=OFF \
- -DBLAS=eigen \
+ -DBLAS=open \
-DBOOST_ROOT="${BOOST_HOME}" \
-DGFLAGS_INCLUDE_DIR="${GFLAGS_HOME}/include" \
-DGFLAGS_LIBRARY="${GFLAGS_HOME}/lib/libgflags.a" \
- re-build caffe
On the other hand, regarding the master or optimized_for_deeplearning branch of OpenBLAS, hard-float support is required. And as I said, it works for native executables but not for JNI libs. If you want to build this project with hard-float support, you can simply set the flag in the shell (export ANDROID_ABI="armeabi-v7a-hard with NEON") and re-build everything.
Thank you very much, @sh1r0. With your help, it worked with OpenBLAS-0.2.15.tar.gz once I had compiled all dependencies with hard-float support. However, it seems that OpenBLAS is faster than Eigen in the forward pass of the Caffe model (400-800 ms faster). It may be because my Eigen version was 3.2.5, which is not the latest, while my OpenBLAS was the latest.
Later, I'll test this with the latest Eigen.
Thanks, all.
I used the latest version of Eigen (3.2.7) but got the same result... I wonder whether some flag (like NEON) needs to be set for Eigen when compiling Caffe with it.
Hi @wuxuewu , good to know. Do you mean that you have succeeded in getting JNI to work with hard float? Could you share your experience? Thanks.
BTW, I think the Eigen version makes only a minor difference to performance. :p
@wuxuewu
I tried to run the cpp_classification example on my phone and simply used time to do simple benchmarks. The results below are the best three runs of each build (both built with "armeabi-v7a-hard with NEON").
======= OpenBLAS ======
0m10.57s real 0m4.76s user 0m4.83s system
0m10.68s real 0m4.35s user 0m4.81s system
0m11.03s real 0m4.46s user 0m4.73s system
======= Eigen ======
0m10.99s real 0m3.48s user 0m3.48s system
0m10.85s real 0m3.30s user 0m3.70s system
0m10.38s real 0m3.58s user 0m3.18s system
Hi sh1r0:
Yes, I have succeeded in getting JNI to work with hard float, just by following your instructions in build.sh and compiling everything with "armeabi-v7a-hard with NEON".
Your results above seem to show that OpenBLAS is a bit slower than Eigen; I did not try the cpp_classification example. (What versions of OpenBLAS and Eigen did you use?)
I used the Caffe lib with OpenBLAS and with Eigen in the caffe-demo-for-android project; the logs printed by caffe_mobile.cpp are below. I tested several times and the results did not change.
===== Eigen ========
Prediction time: 2043.39ms
===== OpenBLAS =====
Prediction time: 1458.48ms
note: Caffe model, CPU mode, Eigen 3.2.7, OpenBLAS 0.2.15
Sorry, one more question: does Eigen need to be compiled separately, or does some compile flag need to be set for Eigen in build_caffe.sh?
Hi @wuxuewu ,
Wow, that's weird. First, I use OpenBLAS v0.2.15 and Eigen v3.2.5.
Second, did you use build_openblas.sh to build?
In my experience, "armeabi-v7a-hard with NEON" is okay for building everything. However, at runtime the results are totally wrong. Could you provide some of your prediction results from JNI calls?
(EDIT: caffe/examples/images/cat.jpg is a good candidate for the tests.)
For the last question, the answer is no; there is no need to build Eigen separately.
Hi sh1r0,
I used to test the openblas and eigen with two mobile I have (A and B),and got results below:
phone A phone B
---------- openblas - 8 ----------
502ms 1330ms
458ms 1280ms
584ms 1530ms
4168ms 1400ms
4822ms 1420ms
------------openblas - 4 -----------
409ms 1300ms
445ms 1490ms
385ms 1410ms
385ms 1360ms
376ms 1410ms
365ms 1340ms
367ms 1440ms
------------- eigen -----------
539ms 2170ms
526ms 2100ms
535ms 2160ms
564ms 2220ms
551ms 2160ms
528ms 2210ms
537ms 2140ms
phone A: AArch64, Android 6.0, 8 cores
phone B: ARMv7 rev 1, Android 4.4.2, 4 cores
(phone C: ARMv7 rev 5, Android 4.4.2, 8 cores; results same as phone B)
openblas - 8: compiled with TARGET=ARMV7 USE_THREAD=ON NUM_THREADS=8
openblas - 4: compiled with TARGET=ARMV7 USE_THREAD=ON NUM_THREADS=4
I count the time with the following change in caffe_mobile.cpp, because I found that the clock() function was not precise when predicting on phone A: the log said "Prediction time: 3900ms" while I saw the app return results in less than one second. So I timed it the following way instead (the log is output and the time can be read in the logcat window of Eclipse):
VLOG(1) << "wxw";
const vector<Blob<float>*>& result = caffe_net->Forward(dummy_bottom_vec, &loss);
VLOG(1) << "wxw";
Hi @wuxuewu , it seems that your prediction results are correct? I mean, for example, caffe/examples/images/cat.jpg is classified as tabby cat (top-1), right? Could you provide your script for building OpenBLAS and possibly your adaptations for building this project? It would be great to integrate them.
Regarding the forwarding time in caffe_mobile.cpp, I think it counts the real CPU time (summed over all your cores) rather than the wall time; I'll try to fix this.
Thanks.
Hi @sh1r0 :
the script for building OpenBLAS is below:
#!/usr/bin/env sh
if [ -z "$NDK_ROOT" ] && [ "$#" -eq 0 ]; then
    echo 'Either $NDK_ROOT should be set or provided as argument'
    echo "e.g., 'export NDK_ROOT=/path/to/ndk' or"
    echo "      '${0} /path/to/ndk'"
    exit 1
else
    NDK_ROOT="${1:-${NDK_ROOT}}"
fi
#export OPENBLAS_NUM_THREADS=1
TOOLCHAIN_DIR=$NDK_ROOT/toolchains/arm-linux-androideabi-4.9/prebuilt/linux-x86_64/bin
WD=$(readlink -f "$(dirname "$0")/..")
INSTALL_DIR=${WD}/android_lib
N_JOBS=8
cd OpenBLAS
make clean
make -j${N_JOBS} \
    CC="$TOOLCHAIN_DIR/arm-linux-androideabi-gcc --sysroot=$NDK_ROOT/platforms/android-19/arch-arm" \
    CROSS_SUFFIX=$TOOLCHAIN_DIR/arm-linux-androideabi- \
    HOSTCC=gcc NO_LAPACK=1 TARGET=ARMV7 \
    USE_THREAD=ON NUM_THREADS=4
rm -rf "$INSTALL_DIR/openblas"
make PREFIX="$INSTALL_DIR/openblas" install
I used caffe/examples/images/cat.jpg for prediction, but I did not pay attention to the prediction result. I modified the last layer of the Caffe model to have only 4 outputs, but I did not change synset_words.txt, which still has 1000 classes. Does that matter?
from caffe-android-lib.
@wuxuewu
OK, it seems that your script is almost the same as mine.
Did you fine-tune the caffemodel for your own purpose?
I'm curious about the prediction results. Could you provide the results of PredictTopK when feeding caffe/examples/images/cat.jpg (through JNI calls) into the standard CaffeNet model provided by BVLC?
BTW, which NDK version do you use?
Thanks.
Hi @sh1r0 ,
I have not fine-tuned the caffemodel yet. Just now I tested with the standard caffemodel, and caffe/examples/images/cat.jpg is classified as tabby, tabby cat (top-1); I used the top-1 output of the predict_top_k function in caffe_mobile.cpp.
My NDK version is r10e.
And the time it took was almost the same as reported above for phone B (ARMv7 rev 1, Android 4.4.2, 4 cores). I only tested it on phone B.
Hi @wuxuewu ,
That's weird; I always get incorrect results. Could you provide your prebuilt libcaffe.so and libcaffe_jni.so so I can check whether my device is the real problem?
Thanks.
I just got another phone to test, and the results were (unsurprisingly?) incorrect, too; perhaps the device is not the problem. My tests follow this ("armeabi-v7a-hard with NEON" is used in the 2nd step).
Did I miss anything special about reproducing your results? Also, could you try building with the latest master branch (following the steps in the link above) and let me know if that works for you?
Thanks.
Note: the attached image is my prediction result for caffe/examples/images/cat.jpg using the caffe-android-demo app with the substituted libs.
I think the key to the problem may be the caffemodel; you could try another one. I use a caffemodel downloaded from http://dl.caffe.berkeleyvision.org/. Sorry, I cannot upload files because of my company's rules, but I'll try to build with the latest master branch and let you know.
@wuxuewu ,
I do not think the problem is the model. The cpp_classification example (executable) works fine with both the armeabi-v7a with NEON and the armeabi-v7a-hard with NEON builds. Also, a clean caffe-android-demo (where the libs are built with armeabi-v7a with NEON) works. All I did to my demo app, as mentioned in the last comment, was to swap in JNI libs built with armeabi-v7a-hard with NEON.
And to be specific, there are numeric issues when the native methods are called from Java, as the prediction results are "fixed" no matter what the input image is.
(My models are all downloaded using the scripts provided by official Caffe.)
cd caffe
./scripts/download_model_binary.py models/bvlc_reference_caffenet
I downloaded Caffe on Dec 22; the zip name is caffe-462c0b8e6575f72e50307ac61c116ea28c09eaad. I did not find any numeric issues when the native methods built with armeabi-v7a-hard with NEON were called from Java, because jfloat is not used in this branch version. So I think the problem may be the JNI call...
Why did you need to download Caffe?
> Because it does not use jfloat in this branch version.
Sorry, I cannot follow the idea. jfloat is never used in official Caffe; in this project, I wrote a JNI wrapper for Java to call the native methods. And yes, all problems should be related to the JNI calls.
So, if possible, let me know the results of your build with the latest master branch.
Thanks.
Hi @wuxuewu ,
I think I found the problem: the OS! I just had a try on my MacBook, and it worked. Sorry for bothering you so much, and thanks for your help. Just a quick question: what kind of environment (OS) do you use? All my trials on Ubuntu 14.04 (both real and virtual machines) failed and made me think that armeabi-v7a-hard with NEON builds did not work at all.
EDIT: I still cannot make OpenBLAS work, while armeabi-v7a-hard with NEON is okay for Eigen and produces correct results. I'm really confused. 😕
Hi @wuxuewu ,
I think I eventually found the problem, namely the multi-thread support of OpenBLAS (NUM_THREADS). Therefore, I set NUM_THREADS=1 to make it single-threaded.
I cannot figure out why multi-threading does not work on my devices; both of them are quad-core. It's really a pity that the computing power is not fully utilized.
@sh1r0 Is the issue related to the fact that "The JNI interface pointer (JNIEnv *) is only valid in the current thread."? Have you tested with openmp flags? See https://github.com/xianyi/OpenBLAS/wiki/faq#multi-threaded
@bhack According to the reports above from @wuxuewu , I think NUM_THREADS with a value greater than 1 works for him. However, some people mentioned in OpenMathLib/OpenBLAS#363 that OpenBLAS for Android works only when single-threaded (?).
I've never used the openmp flag before. I'll probably give it a try later. Thanks.
If the native code in Caffe called via JNI uses threads, OpenBLAS needs to parallelize with OpenMP.
@bhack Thanks for your information. I just updated the master branch to support OpenMP.
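For anyone following along, the change boils down to building OpenBLAS against its OpenMP backend instead of its internal thread pool. A hedged sketch of the make invocation (USE_OPENMP is the documented OpenBLAS switch; the toolchain variables are placeholders in the style of the earlier script):

```sh
# Build OpenBLAS with OpenMP threading; USE_OPENMP=1 replaces the
# internal thread pool so OpenBLAS cooperates with other OpenMP code.
make TARGET=ARMV7 HOSTCC=gcc NO_LAPACK=1 \
     USE_THREAD=1 USE_OPENMP=1 NUM_THREADS=4 \
     CC="$TOOLCHAIN_DIR/arm-linux-androideabi-gcc --sysroot=$NDK_ROOT/platforms/android-19/arch-arm" \
     CROSS_SUFFIX=$TOOLCHAIN_DIR/arm-linux-androideabi-
```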
@sh1r0 As a next step, CUDA support on Android Tegra K1 and X1 could be very useful.
@bhack
Recently, I got NVIDIA CodeWorks for Android 1R4, which contains the CUDA toolkit for Tegra devices, but I failed to get it working with CMake on my very first try. I'll investigate more deeply later (probably after #23).
Great work! I also want to play with Caffe on Android :)