megengine / megpeak

License: Apache License 2.0

CMake 0.50% Shell 0.47% C 36.82% C++ 62.21%

megpeak's Introduction

MegEngine

Documentation | 中文文档

MegEngine is a fast, scalable, and user-friendly deep learning framework with three key features.

Unified framework for both training and inference
- Quantization, dynamic shape/image pre-processing, and even derivation with a single model.
- After training, put everything into your model and run inference on any platform with speed and precision. Check here for a quick guide.
The lowest hardware requirements
- GPU memory usage can be reduced to one-third of the original when the DTR algorithm is enabled.
- Run inference with the lowest memory usage by leveraging the Pushdown memory planner.
Efficient inference on all platforms
- Fast, high-precision inference on x86, Arm, CUDA, and ROCm.
- Supports Linux, Windows, iOS, Android, TEE, etc.
- Optimize performance and memory usage by leveraging our advanced features.

Installation

NOTE: MegEngine now supports Python installation on Linux-64bit/Windows-64bit/MacOS(CPU-Only)-10.14+/Android 7+(CPU-Only) platforms with Python from 3.6 to 3.9. On Windows 10 you can either install the Linux distribution through Windows Subsystem for Linux (WSL) or install the Windows distribution directly. Many other platforms are supported for inference.

Binaries

To install the pre-built binaries via pip wheels:

python3 -m pip install --upgrade pip
python3 -m pip install megengine -f https://megengine.org.cn/whl/mge.html

Building from Source

For CMake build details, please refer to BUILD_README.md.
For Python binding build details, please refer to BUILD_PYTHON_WHL_README.md.

How to Contribute

MegEngine adopts Contributor Covenant as a guideline to run our community. Please read the Code of Conduct.
Every contributor of MegEngine must sign a Contributor License Agreement (CLA) to clarify the intellectual property license granted with the contributions.
You can help to improve MegEngine in many ways:
- Write code.
- Improve documentation.
- Answer questions on MegEngine Forum, or Stack Overflow.
- Contribute new models in MegEngine Model Hub.
- Try a new idea on MegStudio.
- Report or investigate bugs and issues.
- Review Pull Requests.
- Star MegEngine repo.
- Cite MegEngine in your papers and articles.
- Recommend MegEngine to your friends.
- Any other form of contribution is welcome.

We strive to build an open and friendly community. We aim to power humanity with AI.

How to Contact Us

Issue: github.com/MegEngine/MegEngine/issues
Email: [email protected]
Forum: discuss.megengine.org.cn
QQ Group: 1029741705

Resources

MegEngine
MegStudio
mirror repo
- OPENI: openi.org.cn/MegEngine
- Gitee: gitee.com/MegEngine/MegEngine

License

MegEngine is licensed under the Apache License, Version 2.0

Citation

If you use MegEngine in your publication, please cite it using the following BibTeX entry.

@Misc{MegEngine,
  institution = {megvii},
  title =  {MegEngine:A fast, scalable and easy-to-use deep learning framework},
  howpublished = {\url{https://github.com/MegEngine/MegEngine}},
  year = {2020}
}

megpeak's Issues

OpenCL test fails on Cambricon MLU

When testing on a Cambricon MLU accelerator card, running ./megpeak -d opencl produces the following error:

use opencl error:/usr/lib64/libOpenCL.so
opencl error -1001: unknow error (clGetPlatformIDs(0,nullptr,&nr_platforms) at /data/....../MegPeak-main/src/opencl/common.cpp:OpenCLEnv:438)

Is this because the MLU does not support OpenCL?
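
For context on the log above: in the Khronos cl_khr_icd extension, -1001 is CL_PLATFORM_NOT_FOUND_KHR, which the ICD loader returns when it finds no registered OpenCL platform. That would be consistent with libOpenCL.so being present but no vendor driver being installed for the device, though the log alone does not prove it. A minimal sketch of a lookup that names the codes seen in these reports; the function name opencl_error_name is hypothetical and not part of MegPeak:

```cpp
#include <string>

// Maps a few OpenCL status codes to their symbolic names.
// Only codes that appear in these issue reports (plus two common ones) are
// listed; anything else falls through to "unknown error".
std::string opencl_error_name(int code) {
    switch (code) {
        case 0:
            return "CL_SUCCESS";
        case -1:
            return "CL_DEVICE_NOT_FOUND";
        case -32:
            return "CL_INVALID_PLATFORM";
        case -1001:
            // Defined by the cl_khr_icd extension, not the core API.
            return "CL_PLATFORM_NOT_FOUND_KHR";
        default:
            return "unknown error";
    }
}
```

With such a table, the message for -1001 would read CL_PLATFORM_NOT_FOUND_KHR instead of "unknow error".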

make error

Which source code do I need to change?

mkdir build && cd build
cmake ..
make
Consolidate compiler generated dependencies of target megpeak
[ 10%] Building CXX object CMakeFiles/megpeak.dir/src/cpu/x86_avx.cpp.o
/sh2/home/hanhao/MegPeak-main/src/cpu/x86_avx.cpp:25:34: error: attribute(target("avx512bw")) is unknown
     static int func##_throughput() {                               \
                                  ^
/sh2/home/hanhao/MegPeak-main/src/cpu/x86_avx.cpp:118:1: note: in expansion of macro ‘THROUGHPUT’
 THROUGHPUT(cb, vpmaddwd_512, "avx512bw")
 ^
/sh2/home/hanhao/MegPeak-main/src/cpu/x86_avx.cpp:42:31: error: attribute(target("avx512bw")) is unknown
     static int func##_latency() {        \
                               ^
/sh2/home/hanhao/MegPeak-main/src/cpu/x86_avx.cpp:121:1: note: in expansion of macro ‘LATENCY’
 LATENCY(cb, vpmaddwd_512, "avx512bw")
 ^
/sh2/home/hanhao/MegPeak-main/src/cpu/x86_avx.cpp:25:34: error: attribute(target("avx512bw")) is unknown
     static int func##_throughput() {                               \
                                  ^
/sh2/home/hanhao/MegPeak-main/src/cpu/x86_avx.cpp:125:1: note: in expansion of macro ‘THROUGHPUT’
 THROUGHPUT(cb, vpaddd_512, "avx512bw")
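
The errors above mean the compiler does not recognize target("avx512bw"), which typically points to a compiler that predates AVX-512BW support. One possible workaround is to compile the AVX-512BW kernels only when the compiler is new enough. The sketch below assumes GCC accepts this target from the 5.x series onward, which should be verified against the GCC release notes; the macro MEGPEAK_HAS_AVX512BW_TARGET and both function names are hypothetical and not part of MegPeak:

```cpp
// Assumption: GCC accepts target("avx512bw") from the 5.x series onward.
// Clang reports __GNUC__ as 4, so it is checked separately.
#if defined(__x86_64__) || defined(__i386__)
#if defined(__clang__) || (defined(__GNUC__) && __GNUC__ >= 5)
#define MEGPEAK_HAS_AVX512BW_TARGET 1
#endif
#endif
#ifndef MEGPEAK_HAS_AVX512BW_TARGET
#define MEGPEAK_HAS_AVX512BW_TARGET 0
#endif

#if MEGPEAK_HAS_AVX512BW_TARGET
// Placeholder for a kernel that would use AVX-512BW instructions.
__attribute__((target("avx512bw"))) int avx512bw_kernel_placeholder() {
    return 0;
}
#endif

// Reports whether the AVX-512BW benchmarks were compiled in.
bool avx512bw_benchmarks_compiled() {
    return MEGPEAK_HAS_AVX512BW_TARGET != 0;
}
```

A compile-time guard only decides whether the code builds; whether the CPU can execute the instructions still needs a runtime check.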

<feat> Can you guys add cpu fp16 benchmark for this awesome project?

[BUG] Detecting cpu architecture error

When trying to build MegPeak from sources on Termux, MGEPEAK_ARCH is incorrectly set to x86.

CMAKE SYSTEM PROCESSOR :aarch64.
-- CONFIG MGEPEAK_ARCH TO x86
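
Independently of the CMake logic, the architecture the compiler actually targets can be cross-checked from its predefined macros. A minimal sketch, assuming a GCC-, Clang-, or MSVC-compatible compiler; the function name target_arch and the returned labels are hypothetical and do not correspond to MegPeak's MGEPEAK_ARCH values:

```cpp
#include <string>

// Returns a label for the architecture this translation unit is compiled for,
// based on compiler-predefined macros.
std::string target_arch() {
#if defined(__aarch64__) || defined(_M_ARM64)
    return "aarch64";
#elif defined(__arm__) || defined(_M_ARM)
    return "armv7";
#elif defined(__x86_64__) || defined(_M_X64) || defined(__i386__) || \
        defined(_M_IX86)
    return "x86";
#elif defined(__riscv)
    return "riscv";
#else
    return "unknown";
#endif
}
```

On a Termux aarch64 build this would return "aarch64", which makes a mismatch with the configured value easy to spot.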

Print more information about current CPU

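The current implementation prints the total number of CPUs and the index of the current CPU.
Each CPU could print more information, especially when the CPUs are not all identical. For example, the prime, big, and little cores on Android Arm devices have different microarchitecture names.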

Question: when loading opencl library, why use lock (mutex + lock_guard)?

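Problem description

The load_library() function in MegPeak/opencl-stub/src/libopencl-wrap.h loads the OpenCL library. According to the Linux man page, dlopen() itself is thread-safe, so I do not quite understand the purpose of the mutex + lock_guard.

Could you describe a concrete scenario?

Code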

static void load_library() {
    static bool done = false;
    static std::mutex mtx;
    std::lock_guard<std::mutex> lg{mtx};

    if (done)
        return;

    void* handle = get_library_handle();
    for (size_t i = 0; i < NR_FUNC; ++i) {
        void* func;
        if (!handle) {
            func = nullptr;
        } else {
            func = resolve_library_func(handle, g_func_name[i]);
        }
        if (!func) {
            func = g_func_table_error[i];
        }
        __atomic_store_n(g_func_table + i, func, __ATOMIC_RELAXED);
    }
    done = true;
}
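
One plausible reading of the code above: dlopen() being thread-safe does not cover the plain static bool done or the loop that fills g_func_table. Without the lock, two threads calling load_library() at once would race on done (a data race on a non-atomic variable) and could both run the resolution loop. The same one-time-initialization guarantee can be sketched with std::call_once; resolve_all_symbols and run_concurrent_init below are hypothetical stand-ins, not MegPeak functions:

```cpp
#include <atomic>
#include <mutex>
#include <thread>
#include <vector>

// Counts how many times the initializer body actually runs.
static std::atomic<int> g_init_count{0};

// Hypothetical stand-in for the symbol-resolution loop.
static void resolve_all_symbols() {
    g_init_count.fetch_add(1);
}

// One-time initialization: concurrent callers block until the first one
// finishes, and the initializer runs exactly once.
static void load_library_once() {
    static std::once_flag flag;
    std::call_once(flag, resolve_all_symbols);
}

// Calls load_library_once() from several threads and returns how many times
// the initializer ran.
int run_concurrent_init(int nr_threads) {
    std::vector<std::thread> threads;
    for (int i = 0; i < nr_threads; ++i) {
        threads.emplace_back(load_library_once);
    }
    for (auto& t : threads) {
        t.join();
    }
    return g_init_count.load();
}
```

The mutex-plus-flag pattern in the original gives the same effect, at the cost of taking the lock on every call.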

opencl?

Does opencl here refer to the GPU?

Unclear about the unit

printf("bandwidth: %f Gbps\n",
       2 * NR_BYTES / (1024.0 * 1024.0 * 1024.0) * 1000 / used);

This should be GBps.
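
To make the unit question concrete: the expression divides a byte count by 1024^3, so it yields gibibytes per second (GiB/s), not gigabits per second. A minimal sketch of both conversions, assuming used is measured in milliseconds (suggested by the * 1000) and that the factor 2 counts one read plus one write per byte; the function names are hypothetical:

```cpp
#include <cstddef>

// Bandwidth in GiB/s for a transfer of nr_bytes that took used_ms
// milliseconds. The factor 2 assumes each byte is read once and written once.
double bandwidth_gib_per_s(std::size_t nr_bytes, double used_ms) {
    const double gib = 1024.0 * 1024.0 * 1024.0;
    return 2.0 * static_cast<double>(nr_bytes) / gib * 1000.0 / used_ms;
}

// The same measurement expressed in gigabits per second (10^9 bits per
// second), which is what the label "Gbps" would normally mean.
double bandwidth_gbit_per_s(std::size_t nr_bytes, double used_ms) {
    const double bits = 2.0 * static_cast<double>(nr_bytes) * 8.0;
    return bits / 1e9 * 1000.0 / used_ms;
}
```

For a 1 GiB buffer processed in 1000 ms, the first function gives 2.0 GiB/s while the second gives about 17.18 Gbit/s, so the label changes the reading by a large factor.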

Multiple threads for ARM big.LITTLE cpu scheduling

Hi, MegPeak authors:

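My understanding is that MegPeak's current implementation provides a reference performance upper bound for optimizing single-threaded CPU programs. I would like to discuss the multi-threaded case here.

Under ARM big.LITTLE CPU scheduling, a function implemented with multiple threads may be scheduled onto cores of different types. Is there any guidance on estimating the performance upper bound in that case?

For example, the Xiaomi 11 phone has one X1 prime core, three A78 big cores, and four A55 little cores. The cv::resize() implementation shipped with OpenCV for Android uses 2 threads by default, which may land on X1 + A78, A78 + A55, or A55 + A55. The elapsed time differs considerably across these three cases. Since apps in real-world scenarios usually cannot pin threads to cores, is there a deterministic way to estimate the performance upper bound here?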

err: can not find opencl

./megpeak -d opencl
err: can not find opencl
err: failed to load opencl func: clGetPlatformIDs
opencl error -32: CL_INVALID_PLATFORM (clGetPlatformIDs(0, nullptr, &nr_platforms) at /home/kuylv/Downloads/MegPeak-main/src/opencl/common.cpp:OpenCLEnv:438)

aarch64 Phytium
NVIDIA GeForce RTX 2060
515.65.01

clinfo

Platform Name NVIDIA CUDA
Number of devices 1
Device Name NVIDIA GeForce RTX 2060
Device Vendor NVIDIA Corporation
Device Vendor ID 0x10de
Device Version OpenCL 3.0 CUDA
Device UUID e38925ec-2f68-5836-3031-5170d70c9401
Driver UUID e38925ec-2f68-5836-3031-5170d70c9401
Valid Device LUID No
Device LUID 0000-000000000000
Device Node Mask 0
Device Numeric Version 0xc00000 (3.0.0)
Driver Version 515.65.01
Device OpenCL C Version OpenCL C 1.2
Device OpenCL C all versions OpenCL C 0x400000 (1.0.0)
OpenCL C 0x401000 (1.1.0)
OpenCL C 0x402000 (1.2.0)
OpenCL C 0xc00000 (3.0.0)
Device OpenCL C features __opencl_c_fp64 0xc00000 (3.0.0)
__opencl_c_images 0xc00000 (3.0.0)
__opencl_c_int64 0xc00000 (3.0.0)
__opencl_c_3d_image_writes 0xc00000 (3.0.0)
Latest comfornace test passed v2021-02-01-00
Device Type GPU
Device Topology (NV) PCI-E, 0000:05:00.0
Device Profile FULL_PROFILE
Device Available Yes
Compiler Available Yes
Linker Available Yes
Max compute units 30
Max clock frequency 1680MHz
Compute Capability (NV) 7.5
Device Partition (core)
Max number of sub-devices 1
Supported partition types None
Supported affinity domains (n/a)
Max work item dimensions 3
Max work item sizes 1024x1024x64
Max work group size 1024
Preferred work group size multiple (device) 32

make error

My cmake command is: cmake ..
Running make afterwards produces the following error:
In file included from /usr/local/gcc-5.4/include/c++/5.4.0/chrono:35:0,
from /data/ningwang11/MePeak-main/src/cpu/common.h:13
from /data/ningwang11/Mepeak-main/src/cpu/aarch64.cpp:11:
/usr/local/gcc-5.4/include/c++/5.4.0/bits/c++0x_warning.h:32:2: error: #error This file requires compiler and library support for the ISO C++ 2011 standard. This support must be enabled with the -std=c++11 or -std=gnu++11 compiler options.
#error This file requires compiler and library support \

In file included from /data/ningwang11/Mepeak-main/src/cpu/aarch64.cpp:11:0
/data/ningwang11/Mepeak-main/src/cpu/common.h:22:1:error: 'constexpr' does not name a type
constexpr static unint32_t RUNS = 800000

How can I resolve this error?

Failed to load opencl on MacOSX

With the latest commit of MegPeak, ./megpeak -d opencl fails on macOS; tested on both an Intel x64 Mac and an M1 Mac.

The FindOpenCL.cmake module shipped with CMake makes find_package(OpenCL) work, and it finds the system-bundled OpenCL framework.

find_package(OpenCL)
if(OpenCL_FOUND)
  message(STATUS "--- OpenCL_FOUND : ${OpenCL_FOUND}")
  message(STATUS "--- OpenCL_INCLUDE_DIRS : ${OpenCL_INCLUDE_DIRS}")
  message(STATUS "--- OpenCL_LIBRARIES : ${OpenCL_LIBRARIES}")
  message(STATUS "--- OpenCL_VERSION_STRING : ${OpenCL_VERSION_STRING}")
  message(STATUS "--- OpenCL_VERSION_MAJOR : ${OpenCL_VERSION_MAJOR}")
  message(STATUS "--- OpenCL_VERSION_MINOR : ${OpenCL_VERSION_MINOR}")
endif()

This sets OpenCL_LIBRARIES to:

/Library/Developer/CommandLineTools/SDKs/MacOSX12.3.sdk/System/Library/Frameworks/OpenCL.framework

So I manually modified libopencl.cpp of opencl-stub in MegPeak, from

#if defined(__APPLE__) || defined(__MACOSX)
static const char* default_so_paths[] = {
        "/System/Library/Frameworks/OpenCL.framework/OpenCL",
        "libOpenCL.so"};

to

#if defined(__APPLE__) || defined(__MACOSX)
static const char* default_so_paths[] = {
        "/System/Library/Frameworks/OpenCL.framework/OpenCL",
        "/Library/Developer/CommandLineTools/SDKs/MacOSX12.3.sdk/System/Library/Frameworks/OpenCL.framework/OpenCL", // newly added
        "libOpenCL.so"};

But it still fails:

(base) ➜  mac-arm64 git:(main) ✗ ./megpeak -d opencl
err: can not find opencl
err: failed to load opencl func: clGetPlatformIDs
opencl error -32: CL_INVALID_PLATFORM (clGetPlatformIDs(0, nullptr, &nr_platforms) at /Users/zz/work/MegPeak/src/opencl/common.cpp:OpenCLEnv:438)
(base) ➜  mac-arm64 git:(main) ✗
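
The message "err: can not find opencl" suggests that none of the candidate paths could be loaded. To see why each path fails, one option is to probe the paths directly with dlopen() and print dlerror() for each. A minimal POSIX-only sketch; ProbeResult and probe_libraries are hypothetical names and are not part of opencl-stub:

```cpp
#include <dlfcn.h>

#include <string>
#include <vector>

// Outcome of trying to dlopen() one candidate path.
struct ProbeResult {
    std::string path;
    bool loaded;
    std::string error;  // dlerror() text when loading failed, else empty
};

// Tries each path in order and records whether it loads and, if not, why.
std::vector<ProbeResult> probe_libraries(
        const std::vector<std::string>& paths) {
    std::vector<ProbeResult> results;
    for (const auto& p : paths) {
        dlerror();  // clear any stale error state
        void* handle = dlopen(p.c_str(), RTLD_LAZY);
        ProbeResult r{p, handle != nullptr, ""};
        if (handle) {
            dlclose(handle);
        } else {
            const char* msg = dlerror();
            r.error = msg ? msg : "unknown dlopen failure";
        }
        results.push_back(r);
    }
    return results;
}
```

Passing the entries of default_so_paths to such a probe would show whether the framework path is rejected by dlopen() itself or filtered out earlier by the stub.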