
heterogeneity-aware-lowering-and-optimization's Introduction


HALO

Heterogeneity-Aware Lowering and Optimization (HALO) is a heterogeneous computing acceleration platform based on compiler technology. It exploits heterogeneous computing power for the deep learning field through an abstract, extensible interface called the Open Deep Learning API (ODLA). HALO provides a unified ahead-of-time compilation solution, auto-tailored for cloud, edge, and IoT scenarios.

HALO supports multiple compilation modes. In the ahead-of-time (AOT) compilation mode, HALO compiles an AI model into C/C++ code written against the ODLA APIs. The compiled model can be run on any supported platform with the corresponding ODLA runtime library. In addition, HALO is able to compile both host and heterogeneous device code simultaneously. The picture below shows the overall compilation flow:

HALO supports compiling models from the following frameworks:

  • Caffe
  • ONNX
  • TensorFlow
  • TFLite

More frameworks will be supported soon.

HALO supports Alibaba's first AI inference chip, the Hanguang-800 NPU, via its HgAI SDK. The Hanguang-800 NPU is designed by T-Head Semiconductor Co., Ltd. (also known as PingTouGe), a business entity of Alibaba Group.

A broad ODLA ecosystem is supported via the ODLA runtime library set targeting various heterogeneous accelerators/runtimes:

We welcome new accelerator platforms to join the ODLA community.

The ODLA API reference can be found here, and a detailed programming guide is coming soon.

Partners

We appreciate the support of ODLA runtimes from the following partners:

How to Use HALO

To build HALO, please follow the instructions here (a Chinese version is also available).

The workflow of deploying models using HALO includes:

  1. Use HALO to compile the model file(s) into an ODLA-based C/C++ source file.
  2. Use a C/C++ compiler to compile the generated C/C++ file into an object file.
  3. Link the object file, the weight binary, and the specific ODLA runtime library together.

A Simple Example

Let's start with a simple MNIST example based on the TensorFlow tutorial. The diagram below shows the overall workflow:

Brief explanations:

HALO generates 3 files:

  • mnist.h : the header file to be used by the application.
  • mnist.cc : the ODLA C++ file that represents the model.
  • mnist.bin : the weights in ELF format.

To the application, inference is simply a function call: mnist().

Note that, for portability, HALO always exports functions with the C convention even though the output file (mnist.cc in this example) is C++.
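For illustration, a minimal caller might look like the sketch below. The exact signature of mnist() comes from the generated mnist.h; the parameter names and the 1 x 28 x 28 input / 10-class output shapes here are assumptions for the standard MNIST model, not the generated code itself.

#include <cstdio>

#include "mnist.h" // generated by HALO; declares mnist() with C linkage

int main() {
  // Assumed shapes: one 28x28 grayscale image in, 10 class scores out.
  static float input[1 * 28 * 28];
  static float output[1 * 10];

  // ... fill `input` with a preprocessed image here ...

  mnist(input, output); // run inference via the generated entry function

  int best = 0;
  for (int i = 1; i < 10; ++i) {
    if (output[i] > output[best]) best = i;
  }
  std::printf("predicted digit: %d\n", best);
  return 0;
}

This file would be compiled together with the generated mnist.cc, then linked with mnist.bin and an ODLA runtime library (for example, -lodla_dnnl for a CPU build) to produce the final executable.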

More detailed explanations can be found here. Example code can be found here.

Please refer to HALO options list for all command line options.

More Examples

Contributing

We're always looking for help to improve HALO. See the Contributing Guide for more details. Thank you!

Resources

License

HALO is licensed under the Apache 2.0 License

heterogeneity-aware-lowering-and-optimization's People

Contributors

ahuizxc, alibaba-oss, alishenli, dj176050, dongjiyingdjy, hj-wei, jackzipu, jayzlee147, lingqingzz, lingyeai, littlefatfat, neozhangjianyu, peng2007, pengl, shuhand, tianboh, tjs2200120, wangcl15, weifengz2016, weimingzha0, xuhongyao, yanwei-gr, youbeny, zars19, zh-wei


heterogeneity-aware-lowering-and-optimization's Issues

[BUG] Sub-graph compilation doesn't work with the "outputs" option

Describe the bug
Sub-graph compilation should work with the "outputs" option, but when I set the output node to Conv/Conv2D for the TensorFlow version of LPRNet, the whole graph was still compiled.

To Reproduce
./halo -target cxx /home/LPRNet_Models/tensorflow/lprnet_new_v2.pb -reorder-data-layout=channel-last -disable-broadcasting -remove-input-transpose -remove-output-transpose -o lprnet_conv.cc -entry-func-name=lprnet_conv -outputs=Conv/Conv2D

Expected behavior
A sub-graph ending at the output node "Conv/Conv2D" should be compiled.

Screenshots
The output node was still d_predictions instead of Conv/Conv2D:
void lprnet_conv(const float input_1[1 * 24 * 94 * 3],
                 float out_d_predictions[1 * 88 * 71]) {
  lprnet_conv_init();
  static odla_context Ctx;
  if (Ctx == nullptr) {
    odla_CreateContext(&Ctx);
  };
  odla_BindToArgumentById((const odla_value_id) "input_1", input_1, Ctx);
  odla_BindToOutputById((const odla_value_id) "d_predictions",
                        out_d_predictions, Ctx);
  odla_ExecuteComputation(Comp, Ctx, ODLA_COMPUTE_INFERENCE, nullptr);
}

Additional context
I checked the code in driver.cc; the output node name has some format requirements, but node names with slashes are common in TF models, so it would be better for HALO to support node names containing a slash.
static llvm::cl::list<std::string> Outputs(
    "outputs",
    llvm::cl::desc("Specify output names like -outputs=foo, -outputs=bar:0"));

[BUG] Unsupported TFLite operation MaxPool2D

Describe the bug
When a TFLite model file includes the MaxPool2D operation, the following error occurs:
E0208 07:28:31.082795 93539 tflite_parser.cc:282] Convert function not found, Please check if it is supported: Op: [17], Index: [-1]

To Reproduce
./build/bin/halo --target cc --disable-broadcasting --print-mem-stats --emit-value-reset --fuse-conv-bias --fuse-matmul-bias --entry-func-name=uai_infer --api=odla_05 --emit-data-as-c model_quant.tflite -o .//uai_infer.cc

[BUG] Cannot disable TRT on a device that doesn't have a GPU

cmake -G Ninja -DDNNL_COMPILER=/host/gcc10/bin/gcc

then I get

-- Found GCC10: /host/gcc10/bin/gcc, build odla_dnnl with bf16 support!
-- Popart library not found, skip building ODLA for Popart
-- The CUDA compiler identification is unknown
CMake Error at ODLA/platforms/tensorrt/CMakeLists.txt:17 (enable_language):
  No CMAKE_CUDA_COMPILER could be found.

  Tell CMake where to find the compiler by setting either the environment
  variable "CUDACXX" or the CMake cache entry CMAKE_CUDA_COMPILER to the full
  path to the compiler, or to the compiler name if it is in the PATH.

[BUG] Wrong bias size parsed for TFLite model

Describe the bug
When parsing mobilenet_v2_1.0_224.tflite with HALO, the bias size is parsed as the tensor output size.
mobilenet model downloaded from https://storage.googleapis.com/download.tensorflow.org/models/tflite_11_05_08/mobilenet_v2_1.0_224.tgz

To Reproduce

heterogeneity-aware-lowering-and-optimization/build/bin/halo -exec-mode=interpret -emit-value-id-as-int -emit-data-as-c -target cxx ./mobilenet_v2_1.0_224.tflite -o ./mobilenet_v2_1.cc

Expected behavior
It generates: extern const float inst_112_bias_broadcasted_181[1 * 112 * 112 * 32];
however, the correct bias size should be 32 floats.


Support for ONNX Resize-13

ONNX Resize-13 has an optional "ROI" operand.
The operand list is "input, ROI (optional), scales (optional), sizes (optional)".

Even if ROI is not used, an empty tensor will be passed in order to specify a scales value.

Currently, HALO can't handle an empty tensor.

[Doc] Typo of Hanguang

Describe the bug
Typo of Hanguang

To Reproduce

Hanghuang-800 should be Hanguang-800.


[BUG] cast op not properly compiled

Describe the bug

Function: cast(x[FLOAT32: 1x64x768])
BasicBlock: bb0
Inst: y([invalid: ]) = tf_Cast(<x, 0>:[FLOAT32: 1x64x768]) {Attrs: <SrcT: 3, <DstT: 4, <Truncate: 0>}
Inst: y([invalid: ]) = fptosi(<x, 0>:[FLOAT32: 1x64x768]) {Attrs: <data_type: 4}
Inst: output() = return(<y, 0>:[invalid: ], <y, 0>:[invalid: ])

To Reproduce

node {
  name: "x"
  op: "Placeholder"
  attr {
    key: "dtype"
    value {
      type: DT_FLOAT
    }
  }
  attr {
    key: "shape"
    value {
      shape {
        dim {
          size: -1
        }
        dim {
          size: 64
        }
        dim {
          size: 768
        }
      }
    }
  }
}
node {
  name: "y"
  op: "Cast"
  input: "x"
  attr {
    key: "DstT"
    value {
      type: DT_INT8
    }
  }
  attr {
    key: "SrcT"
    value {
      type: DT_FLOAT
    }
  }
  attr {
    key: "Truncate"
    value {
      b: false
    }
  }
}


[BUG]: g++ compile error using halo's output

Describe the bug
The generated .cc file could not be compiled by g++ correctly; it reports that "<class_ 'type'>" could not be recognized.

To Reproduce
g++ -c -o .o .cc -I<INCLUDE_PATH>

Expected behavior
exit normally and output .o


[Feature] Serializing/Deserializing TensorRT execution engine

TensorRT supports serializing/deserializing the execution engine for the network.
It can speed up the initialization.

ODLA defines APIs such as odla_CompileExecution, odla_LoadExecution, and odla_SaveExecution, which are designed exactly for such scenarios.

[BUG] Bert frozen model compilation issue (batch size >=2)

Describe the bug
I see a BERT frozen model compilation issue when I use a batch size >= 2.

To Reproduce
cmd line: "halo -target cxx -batch-size=2 bert_frozen_model.pb -o bert.cc"

Expected behavior
The third model entry function parameter should be "const float input_type_ids[2 * 64 * 2]", however, "const float input_type_ids[2 * 2]" is generated.


[BUG] [DNNL] expand_dims

Describe the bug
expand_dims() seems incorrect

To Reproduce

If src has shape [C] and dst has shape [N, C, H, W], the current function will reshape src to [1, 1, 1, C], which is incorrect.
It should follow the broadcasting rule.
It should follow the broadcasting rule.

[BUG] Unsupported TFLite operation FullyConnected

Describe the bug
When a TFLite model file includes the FullyConnected operation, the following error occurs:
E0208 07:28:31.082795 93539 tflite_parser.cc:282] Convert function not found, Please check if it is supported: Op: [9], Index: [-1]

To Reproduce
./build/bin/halo --target cc --disable-broadcasting --print-mem-stats --emit-value-reset --fuse-conv-bias --fuse-matmul-bias --entry-func-name=uai_infer --api=odla_05 --emit-data-as-c model_quant.tflite -o .//uai_infer.cc

[BUG] mobilenet_v1_1.0_224.tflite model parse failure


To Reproduce
build/bin/halo -disable-broadcasting -fuse-conv-bias -exec-mode=interpret -emit-value-id-as-int -emit-data-as-c -target cxx ./mobilenet_v1_1.0_224.tflite -o ./mobilenet_v1_1.cc

Expected behavior
HALO could not parse mobilenet_v1_1.0_224.tflite (the file is attached; rename it to .tflite to run).

Screenshots
WARNING: Logging before InitGoogleLogging() is written to STDERR
E1207 03:39:29.349206 34502 tflite_parser.cc:273] Convert function not found, Please check if it is supported: Op: [43], Index: [-1]
root# /build/bin/halo -fuse-conv-bias -exec-mode=interpret -emit-value-id-as-int -emit-data-as-c -target cxx ./mobilenet_v1_1.0_224.tflite -o ./mobilenet_v1_1.cc
WARNING: Logging before InitGoogleLogging() is written to STDERR
E1207 03:40:06.268330 34503 tflite_parser.cc:273] Convert function not found, Please check if it is supported: Op: [43], Index: [-1]

mobilenet_v1_1.0_224.tflite.txt

[BUG]: dnnl missing "odla_Tile" implementation

Describe the bug
A link error occurred while compiling a Caffe model that includes the Tile op.

To Reproduce
$1: test file containing the main function
$2: model object (*.o) file
$3: model weights (*.bin) file
$4: output executable
g++ -O3 -DBATCH=128 -I/host/code-base/heterogeneity-aware-lowering-and-optimization/ODLA/include $1 $2 $3 -L/host/code-base/heterogeneity-aware-lowering-and-optimization/build/lib -lodla_dnnl -o $4

[BUG]: dnnl missing "odla_PRelu" implementation

Describe the bug
A link error occurred when compiling the face_recog Caffe model into an executable (ELF).

To Reproduce

$1: test file containing the main function
$2: model object (*.o) file
$3: model weights (*.bin) file
$4: output executable

g++ -O3 -DBATCH=128 -I/host/code-base/heterogeneity-aware-lowering-and-optimization/ODLA/include $1 $2 $3 -L/host/code-base/heterogeneity-aware-lowering-and-optimization/build/lib -lodla_dnnl -o $4

Expected behavior
exit normally without error


[BUG] Packed constants in TF not handled properly

Describe the bug
For values like [a, a, ...], TF saves them as a single {a}.
The TF parser doesn't handle this properly.


[BUG] DCE doesn't remove dead loops

Currently, DCE doesn't remove dead loops.
For example, for YOLO, if we specify output nodes before the loop, the current HALO still emits code for the loop body.

[BUG] inception-v3 model incorrect shape

Describe the bug

The shape in the inception-v3 model is parsed incorrectly, as highlighted in the HALO IR screenshot.

To Reproduce

Apply PR #124, remove "XFAIL: *" in models/vision/classification/inception/run_inception_v3.sh, then run "ninja check-halo-models".


Additional context
Fix ONNX weight parsing #104

[BUG] Need a "Fill" op

Describe the bug
Currently, HALO converts the fill op into constants by generating random data at compile time.

As the batch size increases, the .bin file grows dramatically for some models.

To Reproduce

For example, for Bert model,
./bin/halo bert_frozen_model.pb -target cxx -disable-broadcasting -batch-size=1 -o bert1.cc
./bin/halo bert_frozen_model.pb -target cxx -disable-broadcasting -batch-size=2 -o bert1.cc

The second command will generate a larger .bin file.

Expected behavior
The bin file size should remain constant.

[ODLA/DNNL] odla_BindToOutput and odla_BindToOutputById should follow the same logic

Currently, odla_BindToOutput still sets the data_handle of the DNNL memory directly:

odla_status odla_BindToOutput(odla_value value, odla_void* data_ptr,
                              odla_context context) {
  // Handle the case of output is constant due to compile-time optimization.
  if (value->is_const) {
    size_t len = value->mem.get_desc().get_size();
    if (value->elem_size == 8) {
      len *= 2;
    }
    memcpy(data_ptr, value->mem.get_data_handle(), len);
  } else {
    value->mem.set_data_handle(data_ptr); // ==> Old logic
  }
  return ODLA_SUCCESS;
}

odla_status odla_BindToOutputById(const odla_value_id value_id,
                                  odla_void* data_ptr, odla_context context) {
  std::string name((const char*)value_id);
  auto& outputs_v = context->comp->outputs_v;
  auto val = context->comp->outputs[name];
  outputs_v[name] = {val, data_ptr};
  return ODLA_SUCCESS;
}
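A minimal sketch of what unifying the two paths could look like: odla_BindToOutput keeps the constant-output copy, but otherwise records the binding the same way odla_BindToOutputById does instead of touching the DNNL memory handle. The value->name field used to look up the output name is an assumption about the internal value type, not the actual field.

odla_status odla_BindToOutput(odla_value value, odla_void* data_ptr,
                              odla_context context) {
  // Keep the existing fast path: outputs folded to constants at compile time
  // are copied out directly.
  if (value->is_const) {
    size_t len = value->mem.get_desc().get_size();
    if (value->elem_size == 8) {
      len *= 2;
    }
    memcpy(data_ptr, value->mem.get_data_handle(), len);
    return ODLA_SUCCESS;
  }
  // Otherwise record the binding, mirroring odla_BindToOutputById, and let
  // execution resolve it later instead of mutating the DNNL memory here.
  context->comp->outputs_v[value->name] = {value, data_ptr}; // value->name is assumed
  return ODLA_SUCCESS;
}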

[BUG]: resnet18 run failed on x86

Describe the bug
The car-detection ResNet18 model fails to run on x86.

To Reproduce
Link resnet18.cc with dnnl.so, and execute resnet18.exe.

Expected behavior
exit without error


Additional context
"terminate called after throwing an instance of 'dnnl::error'
what(): could not create a descriptor for a deconvolution forward propagation primitive
Aborted (core dumped)"

[BUG] Unit test ResultCheck only checks the first element

Describe the bug
Looks like Unit test ResultCheck only checks the first element

T* out_data = reinterpret_cast<T*>(out[i]);
size_t elem_size = sizeof(out_data) / sizeof(T); // ===> elem_size is 1 ?
for (size_t j = 0; j < elem_size; ++j) {
  bool nan_mismatch = (isnan(out_data[j]) ^ isnan(out_ref_data[j]));
  if (nan_mismatch || fabs(out_data[j] - out_ref_data[j]) > thre) {
#if DEBUG_PRINT
    oss << " result: FAIL [" << i << ", " << j << "]: " << out_data[j]
        << " expects: " << out_ref_data[j] << "\n";
    outfile << oss.str();
    outfile.close();
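For context, sizeof(out_data) here is the size of a pointer, so elem_size collapses to one or two elements regardless of the real output size. A hedged sketch of a fix, assuming the harness can supply the actual element count per output (the out_sizes array is hypothetical):

T* out_data = reinterpret_cast<T*>(out[i]);
// Use the real element count from the test metadata; sizeof(out_data) only
// measures the pointer, not the buffer it points to.
size_t elem_size = out_sizes[i]; // hypothetical: number of elements in output i
for (size_t j = 0; j < elem_size; ++j) {
  bool nan_mismatch = (isnan(out_data[j]) ^ isnan(out_ref_data[j]));
  if (nan_mismatch || fabs(out_data[j] - out_ref_data[j]) > thre) {
    // ... report the mismatch as before ...
  }
}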

[Feature] Loop op support for ONNX

Describe the bug
Currently, ONNX Loop is unsupported.
We need:

  1. HALO IR support for loops
  2. Parser support for loops
  3. CodeGen support for loops
  4. ODLA support for loops (we can start with TensorRT backend)

Print git revision info from "--version"

Currently, halo --version just prints a predefined version string.

Better to print the revision that the binary is built upon. It can:

  • help automation on releases
  • let users provide more details when they encounter an issue
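One common way to do this (a sketch of the general pattern, not HALO's actual build wiring) is to have the build system define the revision as a macro, e.g. from git rev-parse --short HEAD at configure time, and fall back to "unknown" when it is absent. HALO_VERSION_STRING and HALO_GIT_REVISION are hypothetical macro names.

#include <cstdio>

#ifndef HALO_VERSION_STRING
#define HALO_VERSION_STRING "0.1" // placeholder version string
#endif

#ifndef HALO_GIT_REVISION
#define HALO_GIT_REVISION "unknown" // defined by the build system in real use
#endif

// Print both the release version and the revision the binary was built from.
int main() {
  std::printf("HALO version %s (git %s)\n", HALO_VERSION_STRING, HALO_GIT_REVISION);
  return 0;
}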

DNNL max pooling: support the "ceil_mode" attribute

Describe the bug
https://github.com/alibaba/heterogeneity-aware-lowering-and-optimization/pull/146/checks?check_run_id=1736476571
terminate called after throwing an instance of 'dnnl::error'
what(): could not create a descriptor for a pooling forward propagation primitive
/host/heterogeneity-aware-lowering-and-optimization/halo/models/vision/classification/squeezenet/run_squeezenet_1_0.sh: line 25: 5032 Aborted (core dumped) python3 $curr_dir/../../invoke_halo.py --model $model_file --label-file $curr_dir/../1000_labels.txt --image-dir $image_dir --odla dnnl --convert-layout-to=nhwc

To Reproduce
execute vision/classification/squeezenet/run_squeezenet_1_0.sh
or ninja check-halo-models


Additional context
The ceil_mode attribute is not considered in DNNL's pooling_desc_init.
ceil_mode: whether to use ceil or floor (the default) to compute the output shape.
https://github.com/oneapi-src/oneDNN/blob/master/src/common/pooling.cpp, line 111
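For reference, the two modes differ only in how the output extent is rounded. A small sketch of the generic pooling shape rule (not HALO's or oneDNN's actual code):

#include <cmath>
#include <cstdio>

// Generic pooling output-size rule: in = input size, k = kernel size,
// pad = total padding (both sides combined), s = stride.
static int PoolOutDim(int in, int k, int pad, int s, bool ceil_mode) {
  double extent = static_cast<double>(in + pad - k) / s;
  int rounded = ceil_mode ? static_cast<int>(std::ceil(extent))
                          : static_cast<int>(std::floor(extent));
  return rounded + 1;
}

int main() {
  // Example: 112-wide input, 3x3 kernel, stride 2, no padding.
  std::printf("floor mode: %d, ceil mode: %d\n",
              PoolOutDim(112, 3, 0, 2, false),  // 55
              PoolOutDim(112, 3, 0, 2, true));  // 56
  return 0;
}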
