Comments (14)
iOS is supported since we landed CMake configs.
from nnpack.
What do you mean by supporting iOS?
The PSIMD implementation can be built for any architecture supported by clang. Performance, of course, wouldn't be as good as native assembly.
from nnpack.
Actually, I don't understand what the PSIMD implementation is...
That's what I see in your README.md:
NNPACK can be build on OS X and Linux
...
and
Cross-compilation for Native Client
...
Neither is about compilation for iOS.
from nnpack.
It's not yet documented, but you can configure with the --enable-psimd option, and NNPACK will use kernels from src/psimd.
from nnpack.
Ok, I see, thank you.
We've tried this option on an iPhone 6S and got only a 10% performance boost. Is that expected, or did we do something wrong? Or maybe the current NNPACK doesn't use all the capabilities of the ARM architecture, and you could support ARM better in the future.
In short, do you think it is possible to get a bigger boost on iOS?
from nnpack.
The PSIMD branch is generic 128-bit SIMD, so it misses many ARM-specific optimization opportunities. It is possible to do much better on ARM, but currently I have no time to work on it.
However, you can get some perf boost on ARM with little effort: in src/psimd/blas, replace multiplication-accumulation with FMA (a * b + c -> vmlaq_f32).
from nnpack.
@joker512 Did you roll your own version of ARM-based intrinsics in lieu of @Maratyszcza's SSE/AVX2 versions?
I'm interested in a fast iOS port as well.
from nnpack.
@carlodelmundo There is some initial NEON support in master (I suppose mainly in BLAS).
from nnpack.
@Maratyszcza For iOS, I tried to compile NNPACK with the --enable-psimd option, but the linker reported an error: ld: warning: ignoring file *****/libnnpack.a, file was built for archive which is not the architecture being linked (arm64): *****/libnnpack.a. How can I solve this problem? Thanks.
from nnpack.
For iOS you'd have to create an Xcode project and do some porting. It's not supported out of the box.
from nnpack.
@Maratyszcza I see, thanks a lot for your reply.
from nnpack.
"The PSIMD branch is generic 128-bit SIMD, so it misses many ARM-specific optimization opportunities. It is possible to do much better on ARM, but currently I have no time to work on it."
@Maratyszcza I see that you have added optimized NEON intrinsic BLAS code under src/neon since your comment quoted above. However, the input and output Fourier transforms still use the generic 128-bit SIMD code from src/psimd.
Do you think it's possible to get a performance boost by writing ARM NEON intrinsics for the input and output transforms as well?
Thanks
from nnpack.
@vpatilvasu Yes, I believe so.
from nnpack.
You can potentially build it manually with the set of source files - here is how you can do it for example:
from nnpack.