Comments (7)
When testing large language models, you need to add these macros manually at build time:
-DMNN_LOW_MEMORY=ON
-DMNN_ARM82=ON
Certain acceleration instructions (the ARMv8.2 fp16 path) are only used when arm82 is enabled.
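For example, a configure-and-build step with these flags might look like the following on an arm64 target (a sketch; the generator, toolchain, and other options depend on your platform):

    cmake .. -DMNN_LOW_MEMORY=ON -DMNN_ARM82=ON
    make -j4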
Also, precision needs to be set to low for inference to use fp16; otherwise it runs in fp32.
Different mobile phones have different performance, right?
Yes, you are right, but I am testing with a more powerful SoC, i.e. the 8 Gen 2.
The APK release of mnn-llm can do 26 t/s decode on my 8 Gen 2, but the libs I built from source were doing 2 t/s. I found out that MNN is not built with the -DMNN_LOW_MEMORY CMake flag when using https://github.com/wangzhaode/mnn-llm/blob/master/script/android_build.sh
When I rebuilt the MNN libs with the low-memory flag I saw a huge boost in performance. It is still not as good as the release APK, though; I suspect I am still missing some of the flags and options that were used to compile the MNN libs shipped in the APK.
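For anyone else hitting this, the fix amounts to adding the flag to the Android cross-compile configure step. A sketch, assuming a standard NDK toolchain setup with $ANDROID_NDK pointing at the NDK; the exact invocation in android_build.sh may differ:

    cmake .. \
      -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \
      -DANDROID_ABI=arm64-v8a \
      -DMNN_ARM82=ON \
      -DMNN_LOW_MEMORY=ON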
Thank you very much for this information.
The MNN_LOW_MEMORY flag enables int4 weight support, but activations are still in fp32. Will precision=low make the activations fp16? Also, how can I enable precision=low?
Maybe by setting it in the backend config; let me check whether mnn-llm can do that.
Update: I am using low precision in the CPU backend config.
The specific settings (precision, threadNumber, etc.) are in llm.cpp. To run an LLM with MNN directly, you can enable the -DMNN_BUILD_LLM=ON macro and then use the resulting llm_demo executable to run the model.
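For reference, this is roughly what a low-precision CPU backend config looks like in MNN's C++ API. A minimal sketch of the kind of setup llm.cpp does, not its exact code; the thread count here is an arbitrary example:

    #include <MNN/Interpreter.hpp>

    // Build a CPU schedule that requests fp16 compute and low-memory mode.
    MNN::ScheduleConfig makeCpuSchedule(MNN::BackendConfig& backendConfig) {
        backendConfig.precision = MNN::BackendConfig::Precision_Low; // fp16 on ARM82-capable CPUs
        backendConfig.memory    = MNN::BackendConfig::Memory_Low;    // pairs with the MNN_LOW_MEMORY build flag

        MNN::ScheduleConfig schedule;
        schedule.type = MNN_FORWARD_CPU;
        schedule.numThread = 4;                  // threadNumber
        schedule.backendConfig = &backendConfig; // must outlive createSession()
        return schedule;
    }

The returned config is what you would pass to Interpreter::createSession().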
Thank you very much @v0jiuqi. You have saved my day and helped me fix the issue. I can confirm I now get the same performance as the released APK.
I can't thank you enough <3.
Thanks again and have a good day.