oaid / autokernel

AutoKernel is an easy-to-use, low-barrier tool for automatic operator optimization that improves the deployment efficiency of deep learning algorithms.

License: Apache License 2.0

CMake 0.31% C++ 87.33% Shell 0.38% C 9.84% Makefile 0.40% Python 1.46% Jupyter Notebook 0.29%
tensorflow pytorch tvm halide tengine deep-learning auto reinforcement-learning autosearch optimization

autokernel's Introduction

English | 简体中文

AutoKernel

Introduction

Neural networks are now used in a wide variety of applications, and executing them efficiently on diverse devices is critical to those applications. Given the rapid evolution of deep learning algorithms, only a limited number of programmers are qualified to write hand-optimized low-level kernels for the many hardware platforms in use, so using automatic optimization tools to generate high-performance implementations has become a promising solution.

AutoKernel began as a research project at OPEN AI LAB and is now open source. It is an operator optimization tool that automatically generates high-performance low-level code for diverse hardware backends, aiming to accelerate the development of high-performance operators on various hardware, including specialized accelerators.

AutoKernel Architecture

[Figure: AutoKernel architecture]

AutoKernel consists of three modules:

  • Operator Generator:

    This module uses the open source project Halide. Halide is a domain-specific language (DSL), embedded in C++, designed to make it easier to write high-performance image-processing code on modern machines. Halide separates the algorithm description from its schedule. The input of this module is the algorithm description of an operator, and the output is compiled, optimized assembly code or an object file for the corresponding backend (a minimal illustration follows this list).

  • AutoSearch:

    AutoSearch is an automatic module that searches for optimized schedules for Halide operators using multiple optimization algorithms (greedy search, reinforcement learning, machine learning, ...). It supports searching for optimized schedules on both CPU and GPU and can generate code files for different platforms (x86 or Arm). This module is still under development.

  • AutoKernel Plugin:

    AutoKernel Plugin integrates the auto-generated, optimized operator code into Tengine in one click, without modifying Tengine's core code base, enabling one-click deployment of the automatically generated operator implementations.
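
As a minimal illustration of what the Operator Generator consumes and produces (this is not AutoKernel's actual generator code; the operator, names, and schedule below are placeholders), the following Halide program writes the algorithm and the schedule separately and then compiles them ahead of time into a static library for the target backend:

    #include "Halide.h"
    using namespace Halide;

    int main() {
        ImageParam input(Float(32), 2);
        Var x("x"), y("y"), xi("xi"), yi("yi");

        // Algorithm: what to compute (a 3-tap horizontal average).
        Func blur_x("blur_x");
        blur_x(x, y) = (input(x, y) + input(x + 1, y) + input(x + 2, y)) / 3.0f;

        // Schedule: how to compute it (tile, vectorize, parallelize).
        blur_x.tile(x, y, xi, yi, 64, 16)
              .vectorize(xi, 8)
              .parallel(y);

        // Ahead-of-time compile to a static library plus header for the
        // current target; this compiled code is what the later stages consume.
        blur_x.compile_to_static_library("blur_x_op", {input}, "blur_x_op");
        return 0;
    }

Only the schedule lines need to change to retarget the same algorithm to a different CPU or to a GPU backend, which is exactly the space that AutoSearch explores.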

Features

  • Automated
  • Efficient
  • User-friendly

Docker

We provide the following Docker images with Halide and Tengine installed:

  • cpu: openailab/autokernel
  • cuda: openailab/autokernel:cuda
  • opencl: openailab/autokernel:opencl

For the detailed Dockerfiles, see Dockerfiles.

[NOTE]: If you use the CUDA image, you need to use nvidia-docker instead of docker; here is the nvidia-docker install guide.

nvidia-docker pull openailab/autokernel:cuda
nvidia-docker run -it openailab/autokernel:cuda /bin/bash
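
For the CPU image, the equivalent plain docker commands would be (image name as listed above):

docker pull openailab/autokernel
docker run -it openailab/autokernel /bin/bash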

License

AutoKernel is licensed under the Apache License 2.0.

Discussion

  • GitHub issues
  • QQ group: 829565581

autokernel's People

Contributors

bug1989, ccw1996, crouchggj, ddzhao91, hconk, lyuchuny3, qingchuanws, steve-selite, zhuguiqian


autokernel's Issues

AutoKernel Bounty Tasks, Round 1 (come grab the freebies~~)

Task rewards

  • Completing any one task makes you a certified Tengine open-source contributor and earns an activity contributor certificate;
  • Additional gifts are awarded according to the difficulty of the completed task:
    • Easy: T-shirt + multi-function charging cable
    • Medium: T-shirt + multi-function charging cable + hat
    • Hard: T-shirt + multi-function charging cable + mechanical keyboard

Task details

Operator tasks (C++, float32)

For adding these operators, refer to the document "How to quickly add an operator"; put test scripts in the autokernel_plugin/tests directory.

  • Based on the latest version, port avgpooling and verify correctness (difficulty: easy). Completed by chenjun2hao

  • Based on the latest version, port maxpooling and verify correctness (difficulty: easy). Completed by chenjun2hao

  • Based on the latest version, port the fc operator and verify correctness (difficulty: easy). Completed by QingChuanWS

  • Based on the latest version, port the softmax operator and verify correctness (difficulty: easy). Completed by ccw1996

  • Based on the latest version, port dw_conv (depthwise_convolution) and verify correctness (difficulty: easy). Completed by QingChuanWS

  • Based on the latest version, port the normalize operator and verify correctness (difficulty: easy). Completed by crouchggj

  • Other operators are to be determined; suggestions are welcome (difficulty: easy)

Backend tasks (C/C++)

Submit documentation in the doc directory.

  • Modify the cuda-matmul example so that it works for arbitrary matrix sizes, and provide test cases (difficulty: easy)
  • OpenCL backend demo, with code and accompanying documentation (difficulty: medium)

Optimization tasks (C/C++)

  • Implement a schedule optimization for any operator that reaches at least 70% of the performance of the latest Tengine-lite (difficulty: hard)

Testing/build tasks (C/C++)

  • Based on the latest version, improve the test code/build process and the user experience (difficulty: medium). Completed by Hconk
  • Based on the latest version, benchmark AutoKernel's operator performance and submit the results to doc/benchmark.md (difficulty: medium)
  • Other improvements to the build process or user experience are welcome

Documentation tasks (C/C++)

  • Based on the latest version, fix ≥ 3 text errors or add ≥ 1 piece of new content in an existing xxx.md (difficulty: easy)

Notes on the round-3 bounty tasks:

Before taking on a task, you can get to know the AutoKernel project through the AutoKernel beginner tutorial.

  1. All bounty tasks are submitted as Pull Requests to the AutoKernel project.
  2. Reward criteria: submission time and completion quality; the selection rules are decided by the Tengine open-source committee.
  3. Deadline for this round: 2020.1.10; rewards and contributor certificates will be sent out together on 2020.1.31.
  4. To receive rewards and join the contributor chat group, add the Tengine assistant on WeChat (ID: Tengine666) with the note "任务".
  5. The Tengine open-source committee reserves the right of final interpretation.

AutoSearch usage error

After pulling the latest code and running the test example in Docker, it fails because a shared-library dependency cannot be found:

/workspace/AutoKernel/AutoSearch/toolkit/demo_gen: error while loading shared libraries: libHalide.so.10: cannot open shared object file: No such file or directory

Running ./build/tests/tm_classification under autokernel_plugin fails

With the libautokernel.so library built successfully, running the test program ./build/tests/tm_classification in the AutoKernel/autokernel_plugin directory fails with:

./build/tests/tm_classification: symbol lookup error: ./build/src/libautokernel.so: undefined symbol: register_builtin_node_ops

PS: because the build.sh script in the src subdirectory failed to run, -ldl -lpthread -lz were added to the g++ commands in all of the build.sh scripts.

How can out-of-bounds indexing be avoided when the matrix sizes are small?

For the gemm example in the documentation, if the input matrices are changed to the following:

  int M = 20;
  int N = 30;
  int K = 40;

then every tiling-based optimization from step 2 onward triggers an out-of-bounds index error. Could you advise how to solve this? (A possible Halide-level workaround is sketched after the log below.)

Aborted
❯ ./06_build.sh 1
step =  1
M N K =  20  30  40     err 0.00        [rep 50] autokernel | blas      0.0112 ms       0.0013 ms
❯ ./06_build.sh 2
step =  2
terminate called after throwing an instance of 'Halide::RuntimeError'
  what():  Error: Input buffer b0 is accessed at 23, which is beyond the max (19) in dimension 1

Aborted
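
One common Halide-level remedy when the tile size does not divide the matrix dimensions is to schedule the split/tile with TailStrategy::GuardWithIf, which guards the tail iterations with a bounds check instead of reading past the input. A minimal, self-contained sketch (using a plain Halide gemm definition rather than the exact demo code; the 8x8 tile size is just an example):

    #include "Halide.h"
    #include <cstdio>
    using namespace Halide;

    int main() {
        const int M = 20, N = 30, K = 40;   // the sizes from the report above
        Buffer<float> A(K, M), B(N, K);     // hypothetical inputs
        A.fill(1.0f);
        B.fill(1.0f);

        Var x("x"), y("y");
        Func gemm("gemm");
        RDom k(0, K);
        gemm(x, y) = 0.0f;
        gemm(x, y) += A(k, y) * B(x, k);

        // Tile the update stage, guarding the tail with an 'if' so that an
        // 8x8 tile is legal even though it does not divide 30x20 evenly.
        Var xo("xo"), yo("yo"), xi("xi"), yi("yi");
        gemm.update().tile(x, y, xo, yo, xi, yi, 8, 8, TailStrategy::GuardWithIf);

        Buffer<float> out = gemm.realize({N, M});
        printf("out(0, 0) = %f\n", out(0, 0));
        return 0;
    }

Padding the inputs up to a multiple of the tile size is the other usual option; GuardWithIf trades a little edge-case performance for correctness on arbitrary sizes.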

AutoKernel usage issues

1. Halide.h uses C++17; autokernel_plugin's build.sh has to be changed to -std=c++17 before it compiles.
2. Line 175 of tools.py uses the path {HALIDE_HOME}/halide-build/inclue, which is inconsistent with the autokernel_plugin part, where HALIDE_ROOT is the Halide source directory and HALIDE_DIR is the Halide install location. It would be better to unify these; specifying {HALIDE_HOME}/halide-build here forces the build directory to be named halide-build when building from source, and different people may name it differently.
3. Line 64 of tools.py references {HALIDE_HOME}/bin, but the source tree has no such bin directory (possibly a different Halide version was pulled); the installed libraries should be under {HALIDE_DIR}/lib instead.
4. Running python3 tools.py --gen ../generator/batch_matmul.cpp -autotune -compute_time produces the following errors:
c++: error: ./temp/batch_1_0/0/*.registration.cpp: No such file or directory
c++: error: ./temp/batch_1_0/0/*.a: No such file or directory
c++: error: ./temp/batch_1_0/1/*.registration.cpp: No such file or directory
c++: error: ./temp/batch_1_0/1/*.a: No such file or directory
timeout: failed to run command ‘./temp/batch_1_0/0/bench’: No such file or directory
timeout: failed to run command ‘./temp/batch_1_0/1/bench’: No such file or directory
retrain_cost_model: /home/hebingshi/Downloads/autokernel-git-pr/AutoKernel/AutoSearch/src/adams2019/retrain_cost_model.cpp:351: std::map<int, {anonymous}::PipelineSample> {anonymous}::load_samples(const {anonymous}::Flags&): Assertion `dot != string::npos && best_path.substr(dot) == ".sample"' failed.
../src/adams2019/autotune_loop.sh: line 261: 294442 Done find ${SAMPLES} -name "*.sample"
294443 Aborted (core dumped) | ${AUTOSCHED_BIN}/retrain_cost_model --epochs=${BATCH_SIZE} --rates="0.0001" --num_cores=32 --initial_weights=${WEIGHTS} --weights_out=${WEIGHTS} --best_benchmark=${SAMPLES}/best.${PIPELINE}.benchmark.txt --best_schedule=${SAMPLES}/best.${PIPELINE}.schedule.h

AutoKernel Bounty Tasks, Round 2 (come grab the freebies~~)

Task rewards

  • Completing any one task makes you a certified Tengine open-source contributor and earns an activity contributor certificate;
  • Additional gifts and points are awarded according to the difficulty of the completed task:
    • Easy: T-shirt + multi-function charging cable, +1 point
    • Medium: T-shirt + multi-function charging cable + hat, +3 points
    • Hard: T-shirt + multi-function charging cable + mechanical keyboard, +5 points
  • Points can be redeemed for prizes:
    • 15 points: Khadas VIM3L
    • 25 points: Khadas VIM3

Task details

Automatic tuning on the Arm platform:

  • Based on the latest AutoKernel version, auto-tune the matmul operator on the Arm platform, and output the tuned weight data and the tuning results (difficulty: medium)

INT8 data type support:

  • Based on the latest AutoKernel version, auto-tune matmul for the INT8 data type on the CUDA platform, and output the generated operator code and the tuning results (difficulty: hard)

RISC-V backend optimization:

  • Based on the latest AutoKernel version, try the latest LLVM toolchain to automatically generate RISC-V vector instructions, and write a documentation report (submit it to the blog directory of autokernel-docs.git) (difficulty: hard)

Issue reporting:

  • Based on the latest AutoKernel version, report an AutoKernel runtime problem with debugging information via a GitHub issue (difficulty: easy)
  • Based on the latest AutoKernel version, report an AutoKernel runtime problem with debugging information via a GitHub issue (difficulty: easy)

Operator optimization support:

  • Based on the latest AutoKernel version, auto-tune another operator (any operator other than matmul); submit the operator generator file and documentation (auto-tuning results, comparison against hand-tuned or other implementations, instructions for reproducing the data, target platform) (difficulty: medium)
  • Based on the latest AutoKernel version, auto-tune another operator (any operator other than matmul); submit the operator generator file and documentation (auto-tuning results, comparison against hand-tuned or other implementations, instructions for reproducing the data, target platform) (difficulty: medium)

Optimization requirement collection

  • Report optimization requirement information via a GitHub issue; (difficulty: easy)
  • Report optimization requirement information via a GitHub issue; (difficulty: easy)
  • Report optimization requirement information via a GitHub issue. (difficulty: easy)
    Note:
    The optimization requirement must include the following information:
    1. Background project introduction
    Briefly describe the application scenario and the algorithm module (links to related open-source projects/documentation may be attached)
    2. Performance requirement description
    Current performance, target performance, target platform, and the dimensions (shape) of the test data
    3. Source code of the block to be optimized
    Perform an initial performance profile, identify the block most worth optimizing, and provide the baseline code of that module (a C/C++ or Python implementation), for example:
    void func();
    int main()
    {
       // benchmark here, using test data of the stated dimensions
       func();
       // print the current elapsed time
    }
    

Notes on this round of bounty tasks:

  1. All bounty tasks are submitted as Pull Requests;
  2. Reward criteria: submission time and completion quality; the selection rules are decided by the Tengine open-source committee;
  3. Deadline for this round: 2021.6.30; rewards and contributor certificates will be sent out together on 2021.6.30;
  4. To receive rewards and join the contributor chat group, add the Tengine assistant on WeChat (ID: Tengine666) with the note "任务";
  5. The Tengine open-source committee reserves the right of final interpretation.

build all plugin fails

/workspace/AutoKernel/autokernel_plugin/src/depthwise/halide_depthwise.s: Assembler messages:
/workspace/AutoKernel/autokernel_plugin/src/depthwise/halide_depthwise.s:808: Error: unknown vector operation: ` {z}'
/workspace/AutoKernel/autokernel_plugin/src/depthwise/halide_depthwise.s:823: Error: unknown vector operation: ` {z}'
/workspace/AutoKernel/autokernel_plugin/src/depthwise/halide_depthwise.s:854: Error: unknown vector operation: ` {z}'
/workspace/AutoKernel/autokernel_plugin/src/depthwise/halide_depthwise.s:1168: Error: unknown vector operation: ` {z}'
/workspace/AutoKernel/autokernel_plugin/src/depthwise/halide_depthwise.s:1170: Error: unknown vector operation: ` {z}'
/workspace/AutoKernel/autokernel_plugin/src/depthwise/halide_depthwise.s:1387: Error: unknown vector operation: ` {z}'
/workspace/AutoKernel/autokernel_plugin/src/depthwise/halide_depthwise.s:1397: Error: unknown vector operation: ` {z}'
/workspace/AutoKernel/autokernel_plugin/src/depthwise/halide_depthwise.s:1420: Error: unknown vector operation: ` {z}'
/workspace/AutoKernel/autokernel_plugin/src/depthwise/halide_depthwise.s:1459: Error: unknown vector operation: ` {z}'
/workspace/AutoKernel/autokernel_plugin/src/depthwise/halide_depthwise.s:1608: Error: unknown vector operation: ` {z}'
/workspace/AutoKernel/autokernel_plugin/src/depthwise/halide_depthwise.s:1622: Error: unknown vector operation: ` {z}'
/workspace/AutoKernel/autokernel_plugin/src/depthwise/halide_depthwise.s:1652: Error: unknown vector operation: ` {z}'
