
heterogeneity-aware-lowering-and-optimization's Introduction


HALO

Heterogeneity-Aware Lowering and Optimization (HALO) is a heterogeneous computing acceleration platform based on compiler technology. It exploits heterogeneous computing power for the deep learning field through an abstract, extensible interface called the Open Deep Learning API (ODLA). HALO provides a unified ahead-of-time compilation solution, auto-tailored for cloud, edge, and IoT scenarios.

HALO supports multiple compilation modes. In the ahead-of-time (AOT) compilation mode, HALO compiles an AI model into C/C++ code written against the ODLA APIs. The compiled model can be run on any supported platform with the corresponding ODLA runtime library. In addition, HALO is able to compile both host and heterogeneous device code simultaneously. The picture below shows the overall compilation flow:

HALO supports compiling models from the following frameworks:

  • Caffe
  • ONNX
  • TensorFlow
  • TFLite

More frameworks will be supported soon.

HALO supports Alibaba's first AI inference chip, the Hanguang-800 NPU, via its HgAI SDK. The Hanguang-800 NPU is designed by T-Head Semiconductor Co., Ltd. (also known as PingTouGe), a business entity of Alibaba Group.

A broad ODLA ecosystem is supported via the ODLA runtime library set targeting various heterogeneous accelerators/runtimes:

We welcome new accelerator platforms to join the ODLA community.

The ODLA API reference can be found here, and a detailed programming guide is coming soon.

Partners

We appreciate the support of ODLA runtimes from the following partners:

How to Use HALO

To build HALO, please follow the instructions here (a Chinese version is also available).

The workflow of deploying models using HALO includes:

  1. Use HALO to compile the model file(s) into an ODLA-based C/C++ source file.
  2. Use a C/C++ compiler to compile the generated C/C++ file into an object file.
  3. Link the object file, the weight binary, and the specific ODLA runtime library together.

A Simple Example

Let's start with a simple MNIST example based on the TensorFlow tutorial. The diagram below shows the overall workflow:

Brief explanations:

HALO generates 3 files:

  • mnist.h : the header file to be used by the application.
  • mnist.cc : the ODLA C++ file that represents the model.
  • mnist.bin : the weights in ELF format.

To the application, inference is simply a function call: mnist().

Note that, for portability, HALO always exports functions with the C convention even though the output file (mnist.cc in this example) is C++.
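For illustration, a minimal caller might look like the sketch below. The exact signature of mnist() comes from the generated mnist.h; the parameter names and the 1 x 28 x 28 input / 10-class output shapes here are assumptions for the standard MNIST model, not the generated code itself.

#include <cstdio>

#include "mnist.h" // generated by HALO; declares mnist() with C linkage

int main() {
  // Assumed shapes: one 28x28 grayscale image in, 10 class scores out.
  static float input[1 * 28 * 28];
  static float output[1 * 10];

  // ... fill `input` with a preprocessed image here ...

  mnist(input, output); // run inference via the generated entry function

  int best = 0;
  for (int i = 1; i < 10; ++i) {
    if (output[i] > output[best]) best = i;
  }
  std::printf("predicted digit: %d\n", best);
  return 0;
}

This file would be compiled together with the generated mnist.cc, then linked with mnist.bin and an ODLA runtime library (for example, -lodla_dnnl for a CPU build) to produce the final executable.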

More detailed explanations can be found here. Example code can be found here.

Please refer to HALO options list for all command line options.

More Examples

Contributing

We're always looking for help to improve HALO. See the Contributing Guide for more details. Thank you!

Resources

License

HALO is licensed under the Apache 2.0 License

heterogeneity-aware-lowering-and-optimization's People

Contributors

ahuizxc, alibaba-oss, alishenli, dj176050, dongjiyingdjy, hj-wei, jackzipu, jayzlee147, lingqingzz, lingyeai, littlefatfat, neozhangjianyu, peng2007, pengl, shuhand, tianboh, tjs2200120, wangcl15, weifengz2016, weimingzha0, xuhongyao, yanwei-gr, youbeny, zars19, zh-wei


heterogeneity-aware-lowering-and-optimization's Issues

[BUG] Sub-graph compilation doesn't work with the "outputs" option

Describe the bug
Sub-graph compilation should work with the "outputs" option, but when I set the output node to Conv/Conv2D for the TensorFlow version of LPRNet, the whole graph was still compiled.

To Reproduce
./halo -target cxx /home/LPRNet_Models/tensorflow/lprnet_new_v2.pb -reorder-data-layout=channel-last -disable-broadcasting -remove-input-transpose -remove-output-transpose -o lprnet_conv.cc -entry-func-name=lprnet_conv -outputs=Conv/Conv2D

Expected behavior
A sub-graph ending at the output node "Conv/Conv2D" should be compiled.

Screenshots
The output node was still d_predictions instead of Conv/Conv2D:
void lprnet_conv(const float input_1[1 * 24 * 94 * 3],
                 float out_d_predictions[1 * 88 * 71]) {
  lprnet_conv_init();
  static odla_context Ctx;
  if (Ctx == nullptr) {
    odla_CreateContext(&Ctx);
  };
  odla_BindToArgumentById((const odla_value_id) "input_1", input_1, Ctx);
  odla_BindToOutputById((const odla_value_id) "d_predictions",
                        out_d_predictions, Ctx);
  odla_ExecuteComputation(Comp, Ctx, ODLA_COMPUTE_INFERENCE, nullptr);
}

Additional context
I checked the code in driver.cc; the output node name has some format requirements, but node names with slashes are common in TF models, so it would be better for HALO to support node names containing a slash.
static llvm::cl::list<std::string> Outputs(
    "outputs",
    llvm::cl::desc("Specify output names like -outputs=foo, -outputs=bar:0"));

[BUG] Unsupported TFLite operation MaxPool2D

Describe the bug
When a TFLite model file includes the MaxPool2D operation, the following error occurs:
E0208 07:28:31.082795 93539 tflite_parser.cc:282] Convert function not found, Please check if it is supported: Op: [17], Index: [-1]

To Reproduce
./build/bin/halo --target cc --disable-broadcasting --print-mem-stats --emit-value-reset --fuse-conv-bias --fuse-matmul-bias --entry-func-name=uai_infer --api=odla_05 --emit-data-as-c model_quant.tflite -o .//uai_infer.cc

[BUG] Cannot disable TRT on a device that doesn't have a GPU

cmake -G Ninja -DDNNL_COMPILER=/host/gcc10/bin/gcc

then I get

-- Found GCC10: /host/gcc10/bin/gcc, build odla_dnnl with bf16 support!
-- Popart library not found, skip building ODLA for Popart
-- The CUDA compiler identification is unknown
CMake Error at ODLA/platforms/tensorrt/CMakeLists.txt:17 (enable_language):
  No CMAKE_CUDA_COMPILER could be found.

  Tell CMake where to find the compiler by setting either the environment
  variable "CUDACXX" or the CMake cache entry CMAKE_CUDA_COMPILER to the full
  path to the compiler, or to the compiler name if it is in the PATH.

[BUG] Wrong bias size parsed for TFLite model

Describe the bug
When parsing mobilenet_v2_1.0_224.tflite with HALO, the bias size is parsed as the tensor output size.
mobilenet model downloaded from https://storage.googleapis.com/download.tensorflow.org/models/tflite_11_05_08/mobilenet_v2_1.0_224.tgz

To Reproduce

heterogeneity-aware-lowering-and-optimization/build/bin/halo -exec-mode=interpret -emit-value-id-as-int -emit-data-as-c -target cxx ./mobilenet_v2_1.0_224.tflite -o ./mobilenet_v2_1.cc

Expected behavior
It generates: extern const float inst_112_bias_broadcasted_181[1 * 112 * 112 * 32];
however, the correct bias size should be 32 floats.


Support for ONNX Resize-13

ONNX Resize-13 has an optional "ROI" operand.
The operand list is "input, ROI (optional), scales (optional), sizes (optional)".

Even if ROI is not used, an empty tensor will be passed in order to specify a scales value.

Currently, HALO can't handle an empty tensor.

[Doc] Typo of Hanguang

Describe the bug
Typo of Hanguang

To Reproduce

Hanghuang-800 should be Hanguang-800.


[BUG] cast op not properly compiled

Describe the bug

Function: cast(x[FLOAT32: 1x64x768])
BasicBlock: bb0
Inst: y([invalid: ]) = tf_Cast(<x, 0>:[FLOAT32: 1x64x768]) {Attrs: <SrcT: 3, <DstT: 4, <Truncate: 0>}
Inst: y([invalid: ]) = fptosi(<x, 0>:[FLOAT32: 1x64x768]) {Attrs: <data_type: 4}
Inst: output() = return(<y, 0>:[invalid: ], <y, 0>:[invalid: ])

To Reproduce

node {
  name: "x"
  op: "Placeholder"
  attr {
    key: "dtype"
    value {
      type: DT_FLOAT
    }
  }
  attr {
    key: "shape"
    value {
      shape {
        dim {
          size: -1
        }
        dim {
          size: 64
        }
        dim {
          size: 768
        }
      }
    }
  }
}
node {
  name: "y"
  op: "Cast"
  input: "x"
  attr {
    key: "DstT"
    value {
      type: DT_INT8
    }
  }
  attr {
    key: "SrcT"
    value {
      type: DT_FLOAT
    }
  }
  attr {
    key: "Truncate"
    value {
      b: false
    }
  }
}


[BUG]: g++ compile error using halo's output

Describe the bug
The generated .cc file could not be compiled by g++ correctly; it reports that "<class_ 'type'>" could not be recognized.

To Reproduce
g++ -c -o .o .cc -I<INCLUDE_PATH>

Expected behavior
exit normally and output .o


[Feature] Serializing/Deserializing TensorRT execution engine

TensorRT supports serializing/deserializing the execution engine for the network.
It can speed up the initialization.

ODLA defines APIs such as odla_CompileExecution, odla_LoadExecution, and odla_SaveExecution, which are designed exactly for such scenarios.

[BUG] Bert frozen model compilation issue (batch size >=2)

Describe the bug
I see a BERT frozen model compilation issue when I use a batch size >= 2.

To Reproduce
cmd line: "halo -target cxx -batch-size=2 bert_frozen_model.pb -o bert.cc"

Expected behavior
The third model entry function parameter should be "const float input_type_ids[2 * 64 * 2]", however, "const float input_type_ids[2 * 2]" is generated.


[BUG] [DNNL] expand_dims

Describe the bug
expand_dims() seems incorrect

To Reproduce

If src has shape [C] and dst has shape [N, C, H, W], the current function will reshape src to [1, 1, 1, C], which is incorrect.
It should follow the broadcasting rule.
It should follow the broadcasting rule.

[BUG] Unsupported TFLite operation FullyConnected

Describe the bug
When a TFLite model file includes the FullyConnected operation, the following error occurs:
E0208 07:28:31.082795 93539 tflite_parser.cc:282] Convert function not found, Please check if it is supported: Op: [9], Index: [-1]

To Reproduce
./build/bin/halo --target cc --disable-broadcasting --print-mem-stats --emit-value-reset --fuse-conv-bias --fuse-matmul-bias --entry-func-name=uai_infer --api=odla_05 --emit-data-as-c model_quant.tflite -o .//uai_infer.cc

[BUG] mobilenet_v1_1.0_224.tflite model parse failure


To Reproduce
build/bin/halo -disable-broadcasting -fuse-conv-bias -exec-mode=interpret -emit-value-id-as-int -emit-data-as-c -target cxx ./mobilenet_v1_1.0_224.tflite -o ./mobilenet_v1_1.cc

Expected behavior
HALO could not parse mobilenet_v1_1.0_224.tflite (the file is attached; rename it to .tflite to run).

Screenshots
WARNING: Logging before InitGoogleLogging() is written to STDERR
E1207 03:39:29.349206 34502 tflite_parser.cc:273] Convert function not found, Please check if it is supported: Op: [43], Index: [-1]
root# /build/bin/halo -fuse-conv-bias -exec-mode=interpret -emit-value-id-as-int -emit-data-as-c -target cxx ./mobilenet_v1_1.0_224.tflite -o ./mobilenet_v1_1.cc
WARNING: Logging before InitGoogleLogging() is written to STDERR
E1207 03:40:06.268330 34503 tflite_parser.cc:273] Convert function not found, Please check if it is supported: Op: [43], Index: [-1]

mobilenet_v1_1.0_224.tflite.txt

[BUG]: dnnl missing "odla_Tile" implementation

Describe the bug
A link error occurred while compiling a Caffe model that includes the Tile op.

To Reproduce
$1: test file containing the main function
$2: model object (*.o) file
$3: model weights (*.bin) file
$4: output executable
g++ -O3 -DBATCH=128 -I/host/code-base/heterogeneity-aware-lowering-and-optimization/ODLA/include $1 $2 $3 -L/host/code-base/heterogeneity-aware-lowering-and-optimization/build/lib -lodla_dnnl -o $4

[BUG]: dnnl missing "odla_PRelu" implementation

Describe the bug
A link error occurred when compiling the face_recog Caffe model into an executable (ELF).

To Reproduce

$1: test file containing the main function
$2: model object (*.o) file
$3: model weights (*.bin) file
$4: output executable

g++ -O3 -DBATCH=128 -I/host/code-base/heterogeneity-aware-lowering-and-optimization/ODLA/include $1 $2 $3 -L/host/code-base/heterogeneity-aware-lowering-and-optimization/build/lib -lodla_dnnl -o $4

Expected behavior
exit normally without error


[BUG] Packed constants in TF not handled properly

Describe the bug
For values like [a, a, ...], TF saves them as a single {a}.
The TF parser doesn't handle this properly.


[BUG] DCE doesn't remove dead loops

Currently, DCE doesn't remove dead loops.
For example, for YOLO, if we specify output nodes before the loop, the current HALO still emits code for the loop body.

[BUG] inception-v3 model incorrect shape

Describe the bug

The shape in the inception-v3 model is parsed incorrectly, as highlighted in the HALO IR screenshot.

To Reproduce

Apply PR #124, remove "XFAIL: *" in models/vision/classification/inception/run_inception_v3.sh, then run "ninja check-halo-models".


Additional context
Fix ONNX weight parsing #104

[BUG] Need a "Fill" op

Describe the bug
Currently, HALO converts the fill op into constants by generating random data at compile time.

As the batch size increases, the .bin file grows dramatically for some models.

To Reproduce

For example, for Bert model,
./bin/halo bert_frozen_model.pb -target cxx -disable-broadcasting -batch-size=1 -o bert1.cc
./bin/halo bert_frozen_model.pb -target cxx -disable-broadcasting -batch-size=2 -o bert1.cc

The second command will generate a larger .bin file.

Expected behavior
The bin file size should remain constant.

[ODLA/DNNL] odla_BindToOutput and odla_BindToOutputById should follow the same logic

Currently, odla_BindToOutput still sets the data_handle of the DNNL memory directly:

odla_status odla_BindToOutput(odla_value value, odla_void* data_ptr,
                              odla_context context) {
  // Handle the case of output is constant due to compile-time optimization.
  if (value->is_const) {
    size_t len = value->mem.get_desc().get_size();
    if (value->elem_size == 8) {
      len *= 2;
    }
    memcpy(data_ptr, value->mem.get_data_handle(), len);
  } else {
    value->mem.set_data_handle(data_ptr); // ==> Old logic
  }
  return ODLA_SUCCESS;
}

odla_status odla_BindToOutputById(const odla_value_id value_id,
                                  odla_void* data_ptr, odla_context context) {
  std::string name((const char*)value_id);
  auto& outputs_v = context->comp->outputs_v;
  auto val = context->comp->outputs[name];
  outputs_v[name] = {val, data_ptr};
  return ODLA_SUCCESS;
}
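A minimal sketch of what unifying the two paths could look like: odla_BindToOutput keeps the constant-output copy, but otherwise records the binding the same way odla_BindToOutputById does instead of touching the DNNL memory handle. The value->name field used to look up the output name is an assumption about the internal value type, not the actual field.

odla_status odla_BindToOutput(odla_value value, odla_void* data_ptr,
                              odla_context context) {
  // Keep the existing fast path: outputs folded to constants at compile time
  // are copied out directly.
  if (value->is_const) {
    size_t len = value->mem.get_desc().get_size();
    if (value->elem_size == 8) {
      len *= 2;
    }
    memcpy(data_ptr, value->mem.get_data_handle(), len);
    return ODLA_SUCCESS;
  }
  // Otherwise record the binding, mirroring odla_BindToOutputById, and let
  // execution resolve it later instead of mutating the DNNL memory here.
  context->comp->outputs_v[value->name] = {value, data_ptr}; // value->name is assumed
  return ODLA_SUCCESS;
}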

[BUG]: resnet18 run failed on x86

Describe the bug
The car-detection ResNet18 model fails to run on x86.

To Reproduce
Link resnet18.cc with dnnl.so, and execute resnet18.exe.

Expected behavior
exit without error


Additional context
"terminate called after throwing an instance of 'dnnl::error'
what(): could not create a descriptor for a deconvolution forward propagation primitive
Aborted (core dumped)"

[BUG] Unit test ResultCheck only checks the first element

Describe the bug
Looks like Unit test ResultCheck only checks the first element

T* out_data = reinterpret_cast<T*>(out[i]);
size_t elem_size = sizeof(out_data) / sizeof(T); // ===> elem_size is 1 ?
for (size_t j = 0; j < elem_size; ++j) {
  bool nan_mismatch = (isnan(out_data[j]) ^ isnan(out_ref_data[j]));
  if (nan_mismatch || fabs(out_data[j] - out_ref_data[j]) > thre) {
#if DEBUG_PRINT
    oss << " result: FAIL [" << i << ", " << j << "]: " << out_data[j]
        << " expects: " << out_ref_data[j] << "\n";
    outfile << oss.str();
    outfile.close();
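For context, sizeof(out_data) here is the size of a pointer, so elem_size collapses to one or two elements regardless of the real output size. A hedged sketch of a fix, assuming the harness can supply the actual element count per output (the out_sizes array is hypothetical):

T* out_data = reinterpret_cast<T*>(out[i]);
// Use the real element count from the test metadata; sizeof(out_data) only
// measures the pointer, not the buffer it points to.
size_t elem_size = out_sizes[i]; // hypothetical: number of elements in output i
for (size_t j = 0; j < elem_size; ++j) {
  bool nan_mismatch = (isnan(out_data[j]) ^ isnan(out_ref_data[j]));
  if (nan_mismatch || fabs(out_data[j] - out_ref_data[j]) > thre) {
    // ... report the mismatch as before ...
  }
}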

[Feature] Loop op support for ONNX

Describe the bug
Currently, ONNX Loop is unsupported.
We need:

  1. HALO IR support for loops
  2. Parser support for loops
  3. CodeGen support for loops
  4. ODLA support for loops (we can start with TensorRT backend)

Print git revision info from "--version"

Currently, halo --version just prints a predefined version string.

Better to print the revision that the binary is built upon. It can:

  • help automation on releases
  • let users provide more details when they encounter an issue
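One common way to do this (a sketch of the general pattern, not HALO's actual build wiring) is to have the build system define the revision as a macro, e.g. from git rev-parse --short HEAD at configure time, and fall back to "unknown" when it is absent. HALO_VERSION_STRING and HALO_GIT_REVISION are hypothetical macro names.

#include <cstdio>

#ifndef HALO_VERSION_STRING
#define HALO_VERSION_STRING "0.1" // placeholder version string
#endif

#ifndef HALO_GIT_REVISION
#define HALO_GIT_REVISION "unknown" // defined by the build system in real use
#endif

// Print both the release version and the revision the binary was built from.
int main() {
  std::printf("HALO version %s (git %s)\n", HALO_VERSION_STRING, HALO_GIT_REVISION);
  return 0;
}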

DNNL max pooling: support the "ceil_mode" attribute

Describe the bug
https://github.com/alibaba/heterogeneity-aware-lowering-and-optimization/pull/146/checks?check_run_id=1736476571
terminate called after throwing an instance of 'dnnl::error'
what(): could not create a descriptor for a pooling forward propagation primitive
/host/heterogeneity-aware-lowering-and-optimization/halo/models/vision/classification/squeezenet/run_squeezenet_1_0.sh: line 25: 5032 Aborted (core dumped) python3 $curr_dir/../../invoke_halo.py --model $model_file --label-file $curr_dir/../1000_labels.txt --image-dir $image_dir --odla dnnl --convert-layout-to=nhwc

To Reproduce
execute vision/classification/squeezenet/run_squeezenet_1_0.sh
or ninja check-halo-models


Additional context
The ceil_mode attribute is not considered in DNNL's pooling_desc_init.
ceil_mode: whether to use ceil or floor (the default) to compute the output shape.
https://github.com/oneapi-src/oneDNN/blob/master/src/common/pooling.cpp, line 111
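For reference, the two modes differ only in how the output extent is rounded. A small sketch of the generic pooling shape rule (not HALO's or oneDNN's actual code):

#include <cmath>
#include <cstdio>

// Generic pooling output-size rule: in = input size, k = kernel size,
// pad = total padding (both sides combined), s = stride.
static int PoolOutDim(int in, int k, int pad, int s, bool ceil_mode) {
  double extent = static_cast<double>(in + pad - k) / s;
  int rounded = ceil_mode ? static_cast<int>(std::ceil(extent))
                          : static_cast<int>(std::floor(extent));
  return rounded + 1;
}

int main() {
  // Example: 112-wide input, 3x3 kernel, stride 2, no padding.
  std::printf("floor mode: %d, ceil mode: %d\n",
              PoolOutDim(112, 3, 0, 2, false),  // 55
              PoolOutDim(112, 3, 0, 2, true));  // 56
  return 0;
}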
