mlc-ai / relax
143.0 143.0 74.0 79.89 MB

License: Apache License 2.0

Shell 0.67% JavaScript 0.04% C++ 36.40% Python 59.43% C 0.69% Objective-C 0.04% Java 0.35% Groovy 0.05% Go 0.19% Rust 0.69% TypeScript 0.32% Objective-C++ 0.16% Cuda 0.16% Makefile 0.12% HTML 0.01% CMake 0.52% RenderScript 0.01% Batchfile 0.01% Cython 0.05% Jinja 0.09%

relax's People

Contributors

andrewzhaoluo, anijain2305, areusch, comaniac, cyx-6, driazati, hzfengsy, icemelon, jroesch, junrushao, leandron, lhutton1, lunderberg, marisakirisame, masahi, masterjh5574, mehrdadh, merrymercy, mousius, siju-samuel, srkreddy1238, tkonolige, tmoreau89, tqchen, vinx13, wrongtest-intellif, yzhliu, zhiics, zihengjiang, zxybazh


relax's Issues

[Bug] "Please make sure you have the correct access rights" when cloning git@github.com:mlc-ai/relax.git

I'm trying to create a WASM model to run in the browser, and following the steps to do so in the documentation.

I am now at the "install TVM" step, following the "build from source" instructions on this page.

At the step git clone --recursive git@github.com:mlc-ai/relax.git tvm-unity && cd tvm-unity, I get a permission error.

Expected behavior

I expected the git repo to be cloned.

Actual behavior

I could not clone the repo because of a permission error.

Environment

macOS Monterey on a MacBook Pro (M1)

Steps to reproduce

Follow the steps above.

Triage

Please refer to the list of label tags here to find the relevant tags and add them below in a bullet format (example below).

  • needs-triage

Log details:

#                                                                                                                                   
# To activate this environment, use
#
#     $ conda activate tvm-build-venv
#
# To deactivate an active environment, use
#
#     $ conda deactivate

(base) ~/Downloads/ems$ conda activate tvm-build-venv
(tvm-build-venv) ~/Downloads/ems$ git clone --recursive git@github.com:mlc-ai/relax.git tvm-unity && cd tvm-unity
Cloning into 'tvm-unity'...
The authenticity of host 'github.com (140.82.121.3)' can't be established.
ED25519 key fingerprint is SHA256:+DiY3wvvV6TuJJhbpZisF/zLDA0zPMSvHdkr4UvCOqU.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added 'github.com' (ED25519) to the list of known hosts.
git@github.com: Permission denied (publickey).
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
(tvm-build-venv) ~/Downloads/ems$ rm -rf build && mkdir build && cd build
(tvm-build-venv) ~/Downloads/ems/build$ git clone --recursive git@github.com:mlc-ai/relax.git tvm-unity && cd tvm-unity
Cloning into 'tvm-unity'...
git@github.com: Permission denied (publickey).
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
(tvm-build-venv) ~/Downloads/ems/build$
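For anyone hitting the same wall: "Permission denied (publickey)" means GitHub rejected the SSH handshake, usually because no SSH key is registered with the account. Either register a key, or clone over HTTPS, which needs no credentials for a public repository. A sketch of the HTTPS route (the URL rewrite is the only assumption here):

```shell
# The repo is public, so HTTPS needs no SSH key at all.
# Derive the HTTPS form of the failing SSH URL:
ssh_url="git@github.com:mlc-ai/relax.git"
https_url="https://github.com/${ssh_url#git@github.com:}"
echo "$https_url"
# then clone with it (uncomment to run):
# git clone --recursive "$https_url" tvm-unity && cd tvm-unity
```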


[Tracking Issue] Relax high-level operator migration

Operator migration:

  • Unary / binary / ternary tensor arithmetic and comparison operators #61
  • Neural network operators
    • NN1. Conv2D, MaxPool2D, AdaptiveAvgPool2D, ReLU, GeLU, SiLU, Softmax #67
    • NN2. BatchNorm, LayerNorm, Matmul, Dropout, CrossEntropy #72
  • Tensor statistical operators #73
  • Image operators #73
  • Tensor manipulation operators
    • Manip1. Reshape, Transpose, ExpandDims, Squeeze, Flatten #78
    • Manip2. Concat, Split #80
    • Manip3. BroadcastTo #86
  • Tensor creation operators
    • Create1. Full/FullLike, Zeros/ZerosLike, Ones/OnesLike #79
    • Create2. Tril, Triu #86
  • Tensor indexing operators
    • Take, StridedSlice #85
  • Set operators
  • Searching operators

Frontend and transformation:

  • PyTorch translator #95
  • Operator legalizer #96

The branch with the latest work in progress is https://github.com/MasterJH5574/relax/commits/mlc-dev/2022-12-24-op-with-struct-info-backup
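For reviewers cross-checking semantics, most entries above are intended to match their numpy counterparts; a small sketch of two of them (numpy only, purely illustrative):

```python
import numpy as np

# Manip3. BroadcastTo: expand a (1, 3) tensor to (2, 3).
x = np.arange(3).reshape(1, 3)
y = np.broadcast_to(x, (2, 3))

# Create1. ZerosLike / FullLike: same shape and dtype as the input.
z = np.zeros_like(y)
f = np.full_like(y, 7)
print(y.shape, int(z.sum()), int(f[0, 0]))  # (2, 3) 0 7
```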

[Bug] Cannot use cross entropy for multi-class classification

Hi @Ubospica and @SiriusNEO!

Thank you for your hard work on adding training to TVM. It's amazing work! My student (@ndemashov) and I are working on some experiments related to training, and we were not able to run multi-class classifier training with cross entropy loss function.

In this commit, I added a test to the trainer with a cross-entropy loss function. If you run this test, you get the following error:

...
E             7: operator()
E                   at /home/echuraev/Workspace/OctoML/relax-training/src/runtime/relax_vm/vm.cc:659
E             6: tvm::runtime::relax_vm::VirtualMachineImpl::InvokeBytecode(long, std::vector<tvm::runtime::TVMRetValue, std::allocator<tvm::runtime::TVMRetValue> > const&)
E                   at /home/echuraev/Workspace/OctoML/relax-training/src/runtime/relax_vm/vm.cc:716
E             5: tvm::runtime::relax_vm::VirtualMachineImpl::RunLoop()
E                   at /home/echuraev/Workspace/OctoML/relax-training/src/runtime/relax_vm/vm.cc:808
E             4: tvm::runtime::relax_vm::VirtualMachineImpl::RunInstrCall(tvm::runtime::relax_vm::VMFrame*, tvm::runtime::relax_vm::Instruction)
E                   at /home/echuraev/Workspace/OctoML/relax-training/src/runtime/relax_vm/vm.cc:790
E             3: tvm::runtime::relax_vm::VirtualMachineImpl::InvokeClosurePacked(tvm::runtime::ObjectRef const&, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
E                   at /home/echuraev/Workspace/OctoML/relax-training/src/runtime/relax_vm/vm.cc:574
E             2: tvm::runtime::PackedFuncObj::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
E                   at /home/echuraev/Workspace/OctoML/relax-training/include/tvm/runtime/packed_func.h:1217
E             1: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<void (*)(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
E                   at /home/echuraev/Workspace/OctoML/relax-training/include/tvm/runtime/packed_func.h:1213
E             0: tvm::runtime::relax_vm::CheckTensorInfo(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
E                   at /home/echuraev/Workspace/OctoML/relax-training/src/runtime/relax_vm/builtin.cc:175
E             File "/home/echuraev/Workspace/OctoML/relax-training/src/runtime/relax_vm/builtin.cc", line 175
E           ValueError: Check failed: (ptr->dl_tensor.ndim == ndim) is false: ErrorContext(fn=loss_adjoint, loc=param[3], param=targets, annotation=R.Tensor((1,), dtype="int32"))  expect Tensor with ndim 1 but get 2

Could you please help us figure out what's wrong with this test? If the test is configured correctly, perhaps you could suggest to @ndemashov a possible direction for debugging and fixing the issue; he may be able to help you with this task.

In our experiments, we are using the latest state of the mlc branch.
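For context on the shape check: the error says targets is annotated R.Tensor((1,), dtype="int32") (ndim 1) but a 2-D tensor was passed, which is what happens if one-hot targets are fed where class indices are expected. A numpy sketch of the conversion (shapes are illustrative, not taken from the test):

```python
import numpy as np

# One-hot targets, shape (batch, num_classes): ndim == 2,
# which trips the ndim-1 check in the error above.
one_hot = np.array([[0, 0, 1, 0]], dtype="float32")

# Class-index targets, shape (batch,): ndim == 1, matching
# an annotation like R.Tensor((batch,), dtype="int32").
indices = one_hot.argmax(axis=-1).astype("int32")
print(indices.shape)  # (1,)
```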

[Bug] LLVM assertion "Cannot create binary operator with two operands of differing type!" failed

Environment

Hardware: Amdgpu gfx906
TVM Version: * mlc 6fd55bc [Unity][FIX] add init file to relax.backend.contrib (#15023) (#244)
Operating system: CentOS
LLVM version: 15.0.7

Steps to reproduce

Install relax:

1. git clone https://github.com/mlc-ai/relax.git --recursive
2. cp config.cmake build/, with -DUSE_ROCM=ON -DUSE_ROCBLAS=ON -DUSE_LLVM=ON
3. cmake ..
4. make
5. python mlp.py

import tvm
from tvm import relax, tir, topi
import numpy as np
from tvm.contrib import rocblas

def build_mlp(data, weight):
    bb = relax.BlockBuilder()

    with bb.function("mlp", [data, weight]):
        gv0 = bb.emit_te(tvm.contrib.rocblas.matmul, data, weight, transa=False, transb=False)
#        print(gv0)
        gv1 = bb.emit_te(topi.nn.relu, gv0)
        bb.emit_func_output(gv1)

    mod = bb.get()
    return mod


if __name__ == "__main__":
    # symbolic dimensions
    n, m = tir.Var("n", "int64"), tir.Var("m", "int64")
    # create data and weight variables
    data = relax.Var("data", relax.TensorStructInfo([n, m], "float32"))
    weight = relax.Var("weight", relax.TensorStructInfo([m, n], "float32"))

    # construct a mlp model
    mod = build_mlp(data, weight)

    # build and create vm executor
    target = tvm.target.Target("rocm", host="llvm")
    with tvm.target.Target(target):
        mod = tvm.tir.transform.DefaultGPUSchedule()(mod)
    #mod.show()
    ex = relax.build(mod, target)
    vm = relax.VirtualMachine(ex, tvm.rocm())

    # run the mlp model on relax vm
    data = tvm.nd.array((np.random.rand(16, 32).astype(np.float32)),tvm.rocm())
    weight = tvm.nd.array((np.random.rand(32, 16).astype(np.float32)),tvm.rocm())
    res = vm["mlp"](data, weight)
    print(res)

Error message

llvm-project-llvmorg-15.0.7/llvm/lib/IR/Instructions.cpp:2785: static llvm::BinaryOperator* llvm::BinaryOperator::Create(llvm::Instruction::BinaryOps, llvm::Value*, llvm::Value*, const llvm::Twine&, llvm::Instruction*): Assertion `S1->getType() == S2->getType() && "Cannot create binary operator with two operands of differing type!"' failed.
Aborted (core dumped)

gdb python --core=core_python_150871

#1  0x00007f599a690a78 in abort () from /lib64/libc.so.6
#2  0x00007f599a6881a6 in __assert_fail_base () from /lib64/libc.so.6
#3  0x00007f599a688252 in __assert_fail () from /lib64/libc.so.6
#4  0x00007f58edf5db33 in llvm::BinaryOperator::Create(llvm::Instruction::BinaryOps, llvm::Value*, llvm::Value*, llvm::Twine const&, llvm::Instruction*) [clone .part.680] () from /usr/local/bin/libtvm.so
#5  0x00007f58f31eef72 in llvm::BinaryOperator::Create(llvm::Instruction::BinaryOps, llvm::Value*, llvm::Value*, llvm::Twine const&, llvm::Instruction*) [clone .localalias.60] () from /usr/local/bin/libtvm.so
#6  0x00007f58f00679a2 in llvm::IRBuilderBase::CreateMul(llvm::Value*, llvm::Value*, llvm::Twine const&, bool, bool) ()
   from /usr/local/bin/libtvm.so
#7  0x00007f58f0053e3c in tvm::codegen::CodeGenLLVM::CreateMul(tvm::runtime::DataType, llvm::Value*, llvm::Value*) ()
   from /usr/local/bin/libtvm.so
#8  0x00007f58f0026d4d in tvm::tir::ExprFunctor<llvm::Value* (tvm::PrimExpr const&)>::VisitExpr(tvm::PrimExpr const&) ()
   from /usr/local/bin/libtvm.so
#9  0x00007f58f0054814 in tvm::codegen::CodeGenLLVM::VisitExpr_(tvm::tir::AddNode const*) () from /usr/local/bin/libtvm.so
#10 0x00007f58f0026d4d in tvm::tir::ExprFunctor<llvm::Value* (tvm::PrimExpr const&)>::VisitExpr(tvm::PrimExpr const&) ()
   from /usr/local/bin/libtvm.so
#11 0x00007f58f0054824 in tvm::codegen::CodeGenLLVM::VisitExpr_(tvm::tir::AddNode const*) () from /usr/local/bin/libtvm.so
#12 0x00007f58f0026d4d in tvm::tir::ExprFunctor<llvm::Value* (tvm::PrimExpr const&)>::VisitExpr(tvm::PrimExpr const&) ()
   from /usr/local/bin/libtvm.so
#13 0x00007f58f0053c55 in tvm::codegen::CodeGenLLVM::VisitExpr_(tvm::tir::LTNode const*) () from /usr/local/bin/libtvm.so
#14 0x00007f58f0026d4d in tvm::tir::ExprFunctor<llvm::Value* (tvm::PrimExpr const&)>::VisitExpr(tvm::PrimExpr const&) ()
   from /usr/local/bin/libtvm.so
#15 0x00007f58f005279b in tvm::codegen::CodeGenLLVM::VisitStmt_(tvm::tir::IfThenElseNode const*) () from /usr/local/bin/libtvm.so
#16 0x00007f58ee2278ac in tvm::tir::StmtFunctor<void (tvm::tir::Stmt const&)>::VisitStmt(tvm::tir::Stmt const&) ()
   from /usr/local/bin/libtvm.so
#17 0x00007f58f0062ebe in tvm::codegen::CodeGenLLVM::CreateSerialFor(llvm::Value*, llvm::Value*, llvm::Value*, tvm::tir::Var const&, tvm::tir::Stmt const&) () from /usr/local/bin/libtvm.so
#18 0x00007f58f006366b in tvm::codegen::CodeGenLLVM::VisitStmt_(tvm::tir::ForNode const*) () from /usr/local/bin/libtvm.so
#19 0x00007f58ee2278ac in tvm::tir::StmtFunctor<void (tvm::tir::Stmt const&)>::VisitStmt(tvm::tir::Stmt const&) ()
   from /usr/local/bin/libtvm.so
#20 0x00007f58f0064ab7 in tvm::codegen::CodeGenLLVM::VisitStmt_(tvm::tir::AttrStmtNode const*) () from /usr/local/bin/libtvm.so
#21 0x00007f58ee2278ac in tvm::tir::StmtFunctor<void (tvm::tir::Stmt const&)>::VisitStmt(tvm::tir::Stmt const&) ()
   from /usr/local/bin/libtvm.so
#22 0x00007f58f0064ab7 in tvm::codegen::CodeGenLLVM::VisitStmt_(tvm::tir::AttrStmtNode const*) () from /usr/local/bin/libtvm.so
#23 0x00007f58ee2278ac in tvm::tir::StmtFunctor<void (tvm::tir::Stmt const&)>::VisitStmt(tvm::tir::Stmt const&) ()
   from /usr/local/bin/libtvm.so
#24 0x00007f58f006626b in tvm::codegen::CodeGenLLVM::AddFunctionInternal(tvm::GlobalVar const&, tvm::tir::PrimFunc const&, bool) ()
   from /usr/local/bin/libtvm.so
#25 0x00007f58f002003c in tvm::codegen::CodeGenAMDGPU::AddFunction(tvm::GlobalVar const&, tvm::tir::PrimFunc const&) ()
   from /usr/local/bin/libtvm.so
#26 0x00007f58f0028f83 in _ZN3tvm7codegen11CodeGenLLVM19AddFunctionsOrderedINS_7runtime3MapINS_9GlobalVarENS_8BaseFuncEvvE8iteratorEZNS1_19AddFunctionsOrderedIS8_EEvT_SA_EUlSA_E_EEvSA_SA_T0_ () from /usr/local/bin/libtvm.so
#27 0x00007f58f001d798 in tvm::codegen::BuildAMDGPU(tvm::IRModule, tvm::Target) () from /usr/local/bin/libtvm.so
#28 0x00007f58ef5fe9b0 in tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<void tvm::runtime::TypedPackedFunc<tvm::runtime::Module (tvm::IRModule, tvm::Target)>::AssignTypedLambda<tvm::runtime::Module (*)(tvm::IRModule, tvm::Target)>(tvm::runtime::Module (*)(tvm::IRModule, tvm::Target), std::string)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) () from /usr/local/bin/libtvm.so
#29 0x00007f58ef5f6f6e in tvm::codegen::Build(tvm::IRModule, tvm::Target) () from /usr/local/bin/libtvm.so
#30 0x00007f58ee44b3e2 in tvm::TIRToRuntime(tvm::runtime::Map<tvm::Target, tvm::IRModule, void, void> const&, tvm::Target const&) ()
   from /usr/local/bin/libtvm.so
#31 0x00007f58ee452ce4 in tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<void tvm::runtime::TypedPackedFunc<tvm::runtime::Module (tvm::runtime::Map<tvm::Target, tvm::IRModule, void, void> const&, tvm::Target)>::AssignTypedLambda<tvm::{lambda(tvm::runtime::Map<tvm::Target, tvm::IRModule, void, void> const&, tvm::Target)#6}>(tvm::{lambda(tvm::runtime::Map<tvm::Target, tvm::IRModule,

[Bug] ACL Core Library no longer a part of Arm Compute Library -> Failure to Compile TVM's ACL Contrib. Lib.

What Happened

When I attempt to build TVM-Unity, I run into errors with the ACL library. I believe/hope it is because of this change not being reflected in TVM's ACL CMake config.

The two errors that trip up the build are:

  • <arm_compute/core/Types.h> (#include within the acl_utils.h file)
  • <arm_compute/runtime/IAllocator.h> (#include within the acl_allocator.h file)

All the files are in /src/runtime/contrib/arm_compute_lib/ of course.

Edit: This actually affects more than just the ACL contrib component; the Rust build also fails --

gmake[2]: *** [CMakeFiles/tvm_runtime_objs.dir/build.make:1098: CMakeFiles/tvm_runtime_objs.dir/src/runtime/contrib/arm_compute_lib/acl_runtime.cc.o] Error 1
error: failed to run custom build command for `tvm-sys v0.1.1-alpha (/Users/zack/.home/gitrepos/LLMLife/backend/tvm-unity/rust/tvm-sys)`

Environment

Mac 14.4, Silicon/Metal/MPS/aarm64, whatever you want to call it.
Because Relax doesn't have a TVM-Unity version number, I don't have that readily accessible, but it's the latest 😅
CMake 3.29.2, LLVM 18.1.6: CLANG, -std=c++17

Steps to reproduce

//Path to a library.
EXTERN_ACL_COMPUTE_CORE_LIB:FILEPATH=OFF

//Path to a library.
EXTERN_ACL_COMPUTE_GRAPH_LIB:FILEPATH=/Users/zack/.home/gitrepos/LLMLife/backend/Misc/ComputeLibrary-24.04/arm_compute/build/libarm_compute_graph.dylib

//Path to a library.
EXTERN_ACL_COMPUTE_LIB:FILEPATH=/Users/zack/.home/gitrepos/LLMLife/backend/Misc/ComputeLibrary-24.04/arm_compute/build/libarm_compute.dylib

There is still a core library; it just isn't shipped as a .dylib anymore, so one cannot provide the core lib path as the FILEPATH.

Notes

Besides the aforementioned issue, the other elephant in the room is that I am doing this on a Mac, which leads to a shaky from-source Arm Compute Library build without the full suite of functionality. A few folks have been asking them to provide prebuilt Mac packages (I mention it because I'm sure that would at least ensure an error-free compilation, with no loose ends).

I doubt these errors are related, and I imagine this is something that routinely causes issues in packages, but just to be thorough (and maybe get help 😂):

 --- stderr
  error: header '/Users/zack/.home/gitrepos/LLMLife/backend/tvm/include/tvm/runtime/c_backend_api.h' does not exist.
  Error: bindgen failed to generate the Rust bindings for the C API
gmake[2]: *** [CMakeFiles/rust_ext.dir/build.make:73: /Users/zack/.home/gitrepos/LLMLife/backend/tvm-unity/rust/target/release/libcompiler_ext.so] Error 101
gmake[1]: *** [CMakeFiles/Makefile2:152: CMakeFiles/rust_ext.dir/all] Error 2
gmake[1]: *** Waiting for unfinished jobs....
/Users/zack/.home/gitrepos/LLMLife/backend/tvm-unity/src/runtime/contrib/mps/gemm.mm:48:53: error: no member named 'GetCommandQueue' in 'tvm::runtime::metal::MetalWorkspace'
   48 |   id<MTLCommandQueue> queue = entry_ptr->metal_api->GetCommandQueue(A->device);
      |                               ~~~~~~~~~~~~~~~~~~~~  ^
1 error generated.
gmake[2]: *** [CMakeFiles/tvm_runtime_objs.dir/build.make:1000: CMakeFiles/tvm_runtime_objs.dir/src/runtime/contrib/mps/gemm.mm.o] Error 1
/Users/zack/.home/gitrepos/LLMLife/backend/tvm-unity/src/runtime/contrib/mps/conv.mm:36:25: error: 'CopyDataFromTo' is a protected member of 'tvm::runtime::metal::MetalWorkspace'
   36 |   entry_ptr->metal_api->CopyDataFromTo((__bridge void*)mtlbuf, 0, (__bridge void*)temp, 0,
      |                         ^
/Users/zack/.home/gitrepos/LLMLife/backend/tvm-unity/src/runtime/contrib/mps/../../metal/metal_common.h:187:8: note: declared protected here
  187 |   void CopyDataFromTo(const void* from, size_t from_size, void* to, size_t to_size, size_t size,
      |        ^
/Users/zack/.home/gitrepos/LLMLife/backend/tvm-unity/src/runtime/contrib/mps/conv.mm:72:25: error: 'CopyDataFromTo' is a protected member of 'tvm::runtime::metal::MetalWorkspace'
   72 |   entry_ptr->metal_api->CopyDataFromTo((__bridge void*)temp, 0, (__bridge void*)mtlbuf, 0,
      |                         ^
/Users/zack/.home/gitrepos/LLMLife/backend/tvm-unity/src/runtime/contrib/mps/../../metal/metal_common.h:187:8: note: declared protected here
  187 |   void CopyDataFromTo(const void* from, size_t from_size, void* to, size_t to_size, size_t size,
      |        ^
/Users/zack/.home/gitrepos/LLMLife/backend/tvm-unity/src/runtime/contrib/mps/conv.mm:106:53: error: no member named 'GetCommandQueue' in 'tvm::runtime::metal::MetalWorkspace'
  106 |   id<MTLCommandQueue> queue = entry_ptr->metal_api->GetCommandQueue(data->device);
      |                               ~~~~~~~~~~~~~~~~~~~~  ^
/Users/zack/.home/gitrepos/LLMLife/backend/tvm-unity/src/runtime/contrib/mps/conv.mm:115:25: error: 'CopyDataFromTo' is a protected member of 'tvm::runtime::metal::MetalWorkspace'
  115 |   entry_ptr->metal_api->CopyDataFromTo((__bridge void*)bufB, 0, (__bridge void*)tempB, 0,
      |                         ^
/Users/zack/.home/gitrepos/LLMLife/backend/tvm-unity/src/runtime/contrib/mps/../../metal/metal_common.h:187:8: note: declared protected here
  187 |   void CopyDataFromTo(const void* from, size_t from_size, void* to, size_t to_size, size_t size,
      |        ^
4 errors generated.

Going to look into this one online as mentioned, so I imagine I'll be able to clear it up.

Solution

Obviously I don't readily have one. I figure my best option is to use CMake functions to link my IAllocator.h and Types.h to the corresponding TVM contrib files.

I tried this, but I'm probably doing something wrong; it doesn't help that one or two of the problematic headers are used in multiple files, AFAIK/AFAIR.

find_package(/src/runtime/contrib/arm_compute_lib/acl_utils COMPONENTS arm_compute/core/Types)
target_link_libraries(/Users/zack/.home/gitrepos/LLMLife/backend/Misc/ComputeLibrary-24.04/arm_compute/core/Types PRIVATE acl_utils::arm_compute/core/Types)

find_package(/src/runtime/contrib/arm_compute_lib/acl_allocator COMPONENTS arm_compute/runtime/IAllocator)
target_link_libraries(/Users/zack/.home/gitrepos/LLMLife/backend/Misc/ComputeLibrary-24.04/arm_compute/runtime/IAllocator PRIVATE acl_allocator::arm_compute/runtime/IAllocator)
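For what it's worth, find_package expects a package name rather than a source path, and target_link_libraries expects CMake targets, so the calls above cannot work as written. Since the core library no longer ships as its own .dylib, one hedged alternative is to reuse the merged libarm_compute.dylib for the core-library cache entry and expose the ACL headers through an include path. ACL_ROOT below is a hypothetical variable standing for whichever directory contains the arm_compute/ header folder; this is a sketch, not a verified fix:

```cmake
# Hypothetical: directory that contains the arm_compute/ header folder.
set(ACL_ROOT /Users/zack/.home/gitrepos/LLMLife/backend/Misc/ComputeLibrary-24.04)

# The separate core .dylib is gone; point the core entry at the merged library.
set(EXTERN_ACL_COMPUTE_CORE_LIB ${ACL_ROOT}/arm_compute/build/libarm_compute.dylib
    CACHE FILEPATH "" FORCE)

# Make <arm_compute/core/Types.h> and <arm_compute/runtime/IAllocator.h> resolvable.
include_directories(${ACL_ROOT})
```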

[Bug] InitCCLPerWorker Fails when using AMD GPU Bridge

Expected behavior

MLC-LLM should load the sharded model across all 4 AMD Instinct MI100 GPUs and start inferring.
The issue is confirmed only with the bridge enabled; adding amdgpu.use_xgmi_p2p=0 to the GRUB config makes the issue stop with no other changes, though this falls back to PCIe P2P only.

Here is the output when attempting to run with NCCL_DEBUG=INFO
screenlog.txt

Actual behavior

/src/extlibs/rccl/build/hipify/src/transport/p2p.cc:287 NCCL WARN Cuda failure 'invalid argument'
terminate called after throwing an instance of 'tvm::runtime::InternalError'
  what():  [02:18:19] /workspace/tvm/src/runtime/disco/nccl/nccl.cc:196: rcclErrror: unhandled cuda error (run with NCCL_DEBUG=INFO for details)
Stack trace:
  0: _ZN3tvm7runtime6deta
  1: tvm::runtime::nccl::InitCCLPerWorker(tvm::runtime::ShapeTuple, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)
  2: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::TypedPackedFunc<void (tvm::runtime::ShapeTuple, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)>::AssignTypedLambda<void (*)(tvm::runtime::ShapeTuple, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)>(void (*)(tvm::runtime::ShapeTuple, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >), std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
  3: tvm::runtime::DiscoWorker::Impl::CallPacked(tvm::runtime::DiscoWorker*, long, tvm::runtime::PackedFunc, tvm::runtime::TVMArgs const&)
  4: tvm::runtime::DiscoWorker::Impl::MainLoop(tvm::runtime::DiscoWorker*)
  5: 0x00007ff61c0dc252
  6: start_thread
        at ./nptl/pthread_create.c:442
  7: 0x00007ff64cd2665f
        at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
  8: 0xffffffffffffffff

Environment

Platform (e.g. WebGPU/Vulkan/IOS/Android/CUDA): ROCm 6.0
Operating system (e.g. Ubuntu/Windows/MacOS/...): Ubuntu 22.04
Device (e.g. iPhone 12 Pro, PC+RTX 3090, ...): 4x AMD Instinct MI100
How you installed MLC-LLM (conda, source): conda
How you installed TVM-Unity (pip, source): pip
Python version (e.g. 3.10): 3.10.12
TVM Unity Hash Tag: unity.txt

Steps to reproduce

Install MLC-LLM.
Run the following Python code to start loading/inferring:

cm = ChatModule(
    model="goliath-120b-q4f16_1",
    chat_config=ChatConfig(
        max_gen_len=4096,
        conv_template="LM",
        temperature=0.75,
        repetition_penalty=1.1,
        top_p=0.9,
        tensor_parallel_shards=4,
        context_window_size=4096,
    ),
)

output = cm.generate(
    prompt="What is the meaning of life?",
    progress_callback=StreamToStdout(callback_interval=2),
)

[Bug] Downcast from runtime.String to Array failed

Thanks for participating in the TVM community! We use https://discuss.tvm.ai for any general usage questions and discussions. The issue tracker is used for actionable items such as feature proposals discussion, roadmaps, and bug tracking. You are always welcomed to post on the forum first 😸

Issues that are inactive for a period of time may get closed. We adopt this policy so that we won't lose track of actionable issues that may fall at the bottom of the pile. Feel free to reopen a new one if you feel there is an additional problem that needs attention when an old one gets closed.

Expected behavior

Show the tuning log and start auto-scheduling.

Actual behavior

Failed with the following error:

  File "e2e_auto_tir.py", line 254, in <module>
    main()
  File "e2e_auto_tir.py", line 195, in main
    db = ms.relax_integration.tune_relax(
  File "/home//mlc-relax/relax/python/tvm/meta_schedule/relax_integration.py", line 255, in tune_relax
    return tune_tasks(
  File "/home//mlc-relax/relax/python/tvm/meta_schedule/tune.py", line 118, in tune_tasks
    task_scheduler.tune(
  File "/home//mlc-relax/relax/python/tvm/meta_schedule/task_scheduler/task_scheduler.py", line 132, in tune
    _ffi_api.TaskSchedulerTune(  # type: ignore # pylint: disable=no-member
  File "/home//mlc-relax/relax/python/tvm/_ffi/_ctypes/packed_func.py", line 238, in __call__
    raise get_last_ffi_error()
tvm.error.InternalError: Traceback (most recent call last):
  [bt] (8) /home//mlc-relax/relax/build/libtvm.so(tvm::meta_schedule::TaskSchedulerNode::Tune(tvm::runtime::Array<tvm::meta_schedule::TuneContext, void>, tvm::runtime::Array<tvm::FloatImm, void>, int, int, int, tvm::meta_schedule::Builder, tvm::meta_schedule::Runner, tvm::runtime::Array<tvm::meta_schedule::MeasureCallback, void>, tvm::runtime::Optional<tvm::meta_schedule::Database>, tvm::runtime::Optional<tvm::meta_schedule::CostModel>)+0x62b) [0x7ff172abcaab]
  [bt] (7) /home//mlc-relax/relax/build/libtvm.so(tvm::meta_schedule::PostOrderApplyNode::GenerateDesignSpace(tvm::IRModule const&)+0xe93) [0x7ff172a98ed3]
  [bt] (6) /home//mlc-relax/relax/build/libtvm.so(tvm::meta_schedule::MultiLevelTilingNode::Apply(tvm::tir::Schedule const&, tvm::tir::BlockRV const&)+0x496) [0x7ff172a174a6]
  [bt] (5) /home//mlc-relax/relax/build/libtvm.so(tvm::meta_schedule::MultiLevelTilingNode::ApplySubRules(std::vector<tvm::meta_schedule::State, std::allocator<tvm::meta_schedule::State> >)+0x24a) [0x7ff172a1d9ea]
  [bt] (4) /home//mlc-relax/relax/build/libtvm.so(tvm::meta_schedule::MultiLevelTilingNode::AddWriteReuse(tvm::meta_schedule::State) const+0x1c2) [0x7ff172a1c032]
  [bt] (3) /home//mlc-relax/relax/build/libtvm.so(tvm::runtime::Optional<tvm::runtime::Array<tvm::Integer, void> > tvm::tir::GetAnn<tvm::runtime::Array<tvm::Integer, void> >(tvm::tir::StmtSRef const&, tvm::runtime::String const&)+0x26c) [0x7ff172a20e7c]
  [bt] (2) /home//mlc-relax/relax/build/libtvm.so(tvm::runtime::Array<tvm::Integer, void> tvm::runtime::Downcast<tvm::runtime::Array<tvm::Integer, void>, tvm::runtime::ObjectRef>(tvm::runtime::ObjectRef)+0x165) [0x7ff172a20bd5]
  [bt] (1) /home//mlc-relax/relax/build/libtvm.so(tvm::runtime::detail::LogFatal::Entry::Finalize()+0x3d) [0x7ff1725097dd]
  [bt] (0) /home//mlc-relax/relax/build/libtvm.so(tvm::runtime::Backtrace[abi:cxx11]()+0x2c) [0x7ff174821ecc]
  File "/home//mlc-relax/relax/include/tvm/runtime/object.h", line 920
InternalError: Check failed: (ref->template IsInstance<ContainerType>()) is false: Downcast from runtime.String to Array failed.

Environment

Ubuntu 20.04, latest mlc-relax.

Steps to reproduce

1. Change "tar" to "ndk" in the function tvm.meta_schedule.testing.custom_builder_runner.run_module_via_rpc.
2. Use relax_example/e2e_auto_tir.py as:
python3 e2e_auto_tir.py --workload=resnet_18 --target="opencl -device=mali"
This is just for tuning on an OpenCL mobile GPU.

Triage

Please refer to the list of label tags here to find the relevant tags and add them below in a bullet format (example below).

  • needs-triage

[Bug] ‘DeviceName’ was not declared in this scope


Expected behavior

The build completes successfully.

Actual behavior

/home/mlc-relax/relax/src/runtime/contrib/papi/papi.cc: In function ‘int tvm::runtime::profiling::component_for_device(tvm::Device)’:
/home/mlc-relax/relax/src/runtime/contrib/papi/papi.cc:76:58: error: ‘DeviceName’ was not declared in this scope; did you mean ‘kDeviceName’?
76 | LOG(WARNING) << "PAPI does not support device " << DeviceName(dev.device_type);

Environment

Ubuntu 20.04, latest relax version

Steps to reproduce

mkdir build
cd build
cmake ..
make -j8

Triage

Please refer to the list of label tags here to find the relevant tags and add them below in a bullet format (example below).

  • needs-triage

[Bug] dlight decode-gemv generating incorrect schedule

The unit test test_decode_gemv_4 in the dlight branch fails to build if I try to compile it with tvm.build(mod, target) at the end of the test case.

It throws the error:
ValueError: Check failed: n_deepest_reduction_loops == reduction_loops.size() (0 vs. 1) : Cross-thread reduction requires all the reduction-related loops to be the deepest among all statements outside the desired block. However, block matmul needs cross-thread reduction, while the reduction-related loops outside of it are not the deepest statements, which violates the condition.

cc @zxybazh

[Bug] Build Error: Undefined Symbol HexagonModuleCreate

Building on Linux aarch64 causes a build error:

[100%] Linking CXX shared library libtvm_allvisible.so
[100%] Linking CXX shared library libtvm.so
ld.lld: error: undefined symbol: tvm::runtime::HexagonModuleCreate(std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char>>, std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char>>, std::__ndk1::unordered_map<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char>>, tvm::runtime::FunctionInfo, std::__ndk1::hash<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char>>>, std::__ndk1::equal_to<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char>>>, std::__ndk1::allocator<std::__ndk1::pair<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char>> const, tvm::runtime::FunctionInfo>>>, std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char>>, std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char>>, std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char>>, std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char>>)
>>> referenced by codegen_hexagon.cc:649 (/data/data/com.termux/files/home/mlc-llm/3rdparty/tvm/src/target/llvm/codegen_hexagon.cc:649)
>>>               CMakeFiles/tvm_objs.dir/src/target/llvm/codegen_hexagon.cc.o:(tvm::codegen::BuildHexagon(tvm::IRModule, tvm::Target))
c++: error: linker command failed with exit code 1 (use -v to see invocation)

Hexagon isn't even enabled in config.cmake, so I don't know why parts of it are being compiled at all.

config.cmake pastebin

[Bug] rocm.py fails to correctly find LLVM without PATH being set

Expected behavior

TVM should find the LLVM ld.lld file

Actual behavior

When running mlc_llm chat with JIT compiling on, TVM fails to find the LLVM installation, throwing
RuntimeError: cannot find ld.lld, canditates are: ['ld.lld-17.0', 'ld.lld-17', 'ld.lld', '/opt/rocm/llvm/bin']

Environment

Testing in an MLC docker container with fresh installs of nightly

Steps to reproduce

Run mlc_llm chat HF://<model>; it will download the model, compile it, then crash when saving the .so file.

Triage

Line 55 of https://github.com/mlc-ai/relax/blob/mlc/python/tvm/contrib/rocm.py incorrectly forgets to add ld.lld (or whatever it finds in the lines above) to the /opt/rocm/llvm/bin path, which then returns None since os.path.isfile in https://github.com/mlc-ai/relax/blob/mlc/python/tvm/contrib/utils.py#L253 returns False when pointed at directories.
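A sketch of the corrected candidate construction (the function and version names here are illustrative, not the exact code in rocm.py): the directory must be joined with each binary name, since a bare directory never passes the os.path.isfile check in utils.py:

```python
import os

def lld_candidates(major=17, minor=0):
    # Candidate binary names in decreasing priority, mirroring rocm.py.
    names = [f"ld.lld-{major}.{minor}", f"ld.lld-{major}", "ld.lld"]
    # Buggy behavior: appending the directory itself, i.e. names + ["/opt/rocm/llvm/bin"].
    # Fixed behavior: join each binary name onto the ROCm LLVM bin directory.
    return names + [os.path.join("/opt/rocm/llvm/bin", n) for n in names]

print(lld_candidates())
```

With this change the file-existence check can succeed on the fully qualified paths instead of silently rejecting the directory entry.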

error when call tune_tir for Android OpenCL by RPC

Dear maintainers,
I am using RPC to tune my TIR function for OpenCL on an Android device.
My code is as below:

import tvm
from tvm import meta_schedule as ms

rpc_host = "127.0.0.1"
rpc_port = 9190
rpc_key = "android"

my_rpc_config = ms.runner.RPCConfig(
            tracker_host=rpc_host,
            tracker_port=rpc_port,
            tracker_key=rpc_key,
            session_timeout_sec=180,
        )
my_workers = my_rpc_config.count_num_servers(allow_missing=False)

def get_runner():
    runner_config = {
        "evaluator_config": ms.runner.EvaluatorConfig(
            number=3,
            repeat=1,
            min_repeat_ms=100,
            enable_cpu_cache_flush=False,
        ),
        "alloc_repeat": 5,
    }
    runner = ms.runner.RPCRunner(
        rpc_config=my_rpc_config, max_workers=my_workers, **runner_config
    )

    return runner

my_target = tvm.target.Target("opencl", host="llvm -mtriple=aarch64-linux-android")
database = ms.tune_tir(
    mod=MyModule,
    target=my_target,
    max_trials_global=64,
    num_trials_per_iter=64,
    work_dir="./",
    runner=get_runner(),
)

I got the error below:

2023-07-11 16:55:32 [INFO] [task_scheduler.cc:121] [Task #0: main] Trial #2: Error in running:
RPCRunner: An exception occurred
Traceback (most recent call last):
  File "/home/qq/work/tvm_llm/relax/python/tvm/exec/popen_worker.py", line 87, in main
    result = fn(*args, **kwargs)
  File "/home/qq/work/tvm_llm/relax/python/tvm/meta_schedule/runner/rpc_runner.py", line 392, in _worker_func
    rt_mod: Module = f_upload_module(session, local_path, remote_path)
  File "/home/qq/work/tvm_llm/relax/python/tvm/meta_schedule/runner/rpc_runner.py", line 451, in default_upload_module
    rt_mod: Module = session.load_module(remote_path)
  File "/home/qq/work/tvm_llm/relax/python/tvm/rpc/client.py", line 161, in load_module
    return _ffi_api.LoadRemoteModule(self._sess, path)
  File "/home/qq/work/tvm_llm/relax/python/tvm/_ffi/_ctypes/packed_func.py", line 238, in __call__
    raise get_last_ffi_error()
tvm.error.RPCError: Traceback (most recent call last):
  5: TVMFuncCall
  4: _ZN3tvm7runtime13PackedFuncObj9ExtractorINS0_16PackedFuncSubObjIZNS0_15TypedPackedFuncIFNS0_6ModuleES5_NSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEE17AssignTypedLambdaINS0_UlS5_SB_E3_EEEvT_SB_EUlRKNS0_7TVMArgsEPNS0_11TVMRetValueEE_EEE4CallEPKS1_SH_SL_
  3: tvm::runtime::RPCWrappedFunc::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
  2: tvm::runtime::RPCClientSession::CallFunc(void*, TVMValue const*, int const*, int, std::function<void (tvm::runtime::TVMArgs)> const&)
  1: tvm::runtime::RPCEndpoint::CallFunc(void*, TVMValue const*, int const*, int, std::function<void (tvm::runtime::TVMArgs)>)
  0: tvm::runtime::RPCEndpoint::HandleUntilReturnEvent(bool, std::function<void (tvm::runtime::TVMArgs)>)
  File "/home/qq/work/tvm_llm/relax/src/runtime/rpc/rpc_endpoint.cc", line 380
RPCError: Error caught from RPC call:
[16:55:22] /home/qq/work/tvm_llm/relax/apps/cpp_rpc/rpc_env.cc:310: tar: chdir './rpc/tmp/': No such file or directory

Could you please help check this?

[Bug] TVM build failed on aarch64 (RK3588 / Orange Pi 5+)

Expected behavior

built TVM runtime

Actual behavior

av1d@ubuntu:~/mlc-llm/tvm_unity/build$ cmake .. && cmake --build . --target runtime --parallel $(nproc) && cd ../..
CMake Error at build/config.cmake:478:
  Parse error.  Expected a command name, got unquoted argument with text
  "\nset".
Call Stack (most recent call first):
  CMakeLists.txt:18 (include)


-- Configuring incomplete, errors occurred!

Environment

Ubuntu 22.04.3, Python 3.11.0rc1, cmake version 3.22.1, latest TVM, RK3588

Steps to reproduce

# clone from GitHub
git clone --recursive https://github.com/mlc-ai/relax.git tvm_unity && cd tvm_unity/
# create build directory
mkdir -p build && cd build
# generate build configuration
cp ../cmake/config.cmake . && echo "set(CMAKE_BUILD_TYPE RelWithDebInfo)\nset(USE_OPENCL ON)" >> config.cmake
# build `TVM runtime`
cmake .. && cmake --build . --target runtime --parallel $(nproc) && cd ../..
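The parse error most likely comes from the echo step above, not from CMake itself: under bash, echo does not interpret "\n", so config.cmake ends up with one line containing a literal backslash-n, which CMake reports as the unquoted argument "\nset". A hedged fix is to append the two settings separately:

```shell
# Work in a scratch directory so this demo does not touch a real build tree.
cd "$(mktemp -d)"
# Append each setting on its own line instead of relying on echo "\n":
echo "set(CMAKE_BUILD_TYPE RelWithDebInfo)" >> config.cmake
echo "set(USE_OPENCL ON)" >> config.cmake
cat config.cmake
```

Alternatively, printf 'set(...)\nset(...)\n' >> config.cmake works, because printf (unlike bash's echo) interprets the \n escape.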

Triage

  • needs-triage

[Bug] [Unity] [Metascheduler] cannot tune relax linear/matmul with M > 1 for cuda

I am unable to tune a linear/matmul whose M dimension is larger than 1.

The error message differs from the one on the Unity branch, which is why I am filing this bug here: the changes in mlc-ai relax affected this use case, so it seems it should be fixed here as well, not only in Unity.

import tvm
from tvm import meta_schedule as ms
from tvm.relay.backend import Executor
from tvm import relax
from tvm.relax.testing import nn

# -------- Func definition
class Linear(nn.Module):
    def __init__(self, in_features, out_features, dtype: str, bias=False):
        self.in_features = in_features
        self.out_features = out_features
        self.weight = nn.Parameter(
            (out_features, in_features), dtype=dtype, name="linear_weight"
        )
        if bias:
            self.bias = nn.Parameter((out_features,), dtype=dtype, name="linear_bias")
        else:
            self.bias = None

    def forward(self, input: relax.Expr) -> relax.Var:
        return nn.emit(relax.op.linear(input, self.weight, self.bias))

bb = relax.BlockBuilder()
seq_len = 4
with bb.function("func1"):
    model = Linear(2048, 768, "float16")
    input = nn.Placeholder((seq_len, 2048), dtype="float16", name="input")
    with bb.dataflow():
        res = model(input)
        params = [
            input,
        ] + model.parameters()
        gv = bb.emit_output((res,))
    bb.emit_func_output(gv, params)

mod = bb.get()
gv = mod.get_global_var("func1")
bb.update_func(gv, mod[gv].with_attr("func1", 1))

mod = relax.pipeline.get_pipeline()(mod)
mod = relax.transform.LiftTransformParams()(mod)

mod = tvm.tir.transform.ForceNarrowIndexToInt32()(mod)

# ------ Metascheduler starts here
database = None

strategy_name = "evolutionary"
name = f"relax_linear_{seq_len}_2048_2048_768"
work_dir = f"./{name}/"
module_equality_name = "ignore-ndarray"

target = tvm.target.Target("nvidia/geforce-rtx-2060", host="llvm")
executor = Executor("graph")
mod = mod.with_attr("executor", executor)
ndk_builder = ms.builder.LocalBuilder(timeout_sec=60)
evaluator_config=ms.runner.EvaluatorConfig(
    number=3,
    repeat=1,
    min_repeat_ms=100,
    enable_cpu_cache_flush=False,
)
ms_rpc_runner = ms.runner.LocalRunner(evaluator_config=evaluator_config,
            alloc_repeat=1,
        )
ms.relax_integration.tune_relax(
    mod=mod,
    target=target,
    params={},
    work_dir=work_dir,
    max_trials_global=1024,
    strategy=strategy_name,
    builder=ndk_builder,
    runner=ms_rpc_runner,
    module_equality=module_equality_name,
)

how to update repo remotely?

I used
git clone --recursive https://github.com/mlc-ai/relax.git
to clone the repo.
But every time I run "git pull origin", it reports unmerged paths and many modified or both-modified files, even though I didn't modify anything.
So how can I update the code on my local machine?
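If the goal is simply to track the remote and any local edits are disposable, a fetch plus hard reset sidesteps the merge conflicts entirely (warning: this discards local changes). A self-contained demo of the pattern in a throwaway repo:

```shell
# Set up a tiny "upstream" repo and a clone of it in a temp dir.
tmp=$(mktemp -d) && cd "$tmp"
git init -q upstream && cd upstream
echo v1 > file.txt
git add file.txt
git -c user.email=demo@example.com -c user.name=demo commit -qm init
cd .. && git clone -q upstream clone && cd clone
# Simulate the "modified" files that make `git pull origin` complain.
echo local-change > file.txt
# Discard local state and match the remote exactly.
git fetch -q origin
git reset -q --hard origin/HEAD
cat file.txt
```

For a repo with submodules like this one, following up with git submodule update --init --recursive brings the 3rdparty checkouts back in sync as well.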

[Bug] Cmake Error: /.tvm_build_x86_64/main/source/cmake/libs/../../3rdparty/libbacktrace/configure: not found

Expected behavior

The Rust crate compiles on Unity branch revision f611983b2c7c320f854bdc239657538df7efa4cc.

Actual behavior

Cmake error:

--- stdout
  CMAKE_TOOLCHAIN_FILE_x86_64-unknown-linux-gnu = None
  CMAKE_TOOLCHAIN_FILE_x86_64_unknown_linux_gnu = None
  TARGET_CMAKE_TOOLCHAIN_FILE = None
  CMAKE_TOOLCHAIN_FILE = None
  CMAKE_PREFIX_PATH_x86_64-unknown-linux-gnu = None
  CMAKE_PREFIX_PATH_x86_64_unknown_linux_gnu = None
  TARGET_CMAKE_PREFIX_PATH = None
  CMAKE_PREFIX_PATH = None
  CMAKE_x86_64-unknown-linux-gnu = None
  CMAKE_x86_64_unknown_linux_gnu = None
  TARGET_CMAKE = None
  CMAKE = None
  running: cd "/home/rou/.tvm_build_x86_64/main/build/build" && CMAKE_PREFIX_PATH="" "cmake" "/home/rou/.tvm_build_x86_64/main/source" "-G" "Unix Makefiles" "-DCMAKE_SYSTEM_NAME=Linux" "-DCMAKE_SYSTEM_PROCESSOR=x86_64" "-DCMAKE_INSTALL_PREFIX=/home/rou/.tvm_build_x86_64/main/build" "-DCMAKE_C_FLAGS= -ffunction-sections -fdata-sections -fPIC -m64" "-DCMAKE_C_COMPILER=/usr/bin/x86_64-linux-gnu-gcc" "-DCMAKE_CXX_FLAGS= -ffunction-sections -fdata-sections -fPIC -m64" "-DCMAKE_CXX_COMPILER=/usr/bin/x86_64-linux-gnu-g++" "-DCMAKE_ASM_FLAGS= -ffunction-sections -fdata-sections -fPIC -m64" "-DCMAKE_ASM_COMPILER=/usr/bin/x86_64-linux-gnu-gcc" "-DCMAKE_BUILD_TYPE=Debug"
  -- Build in Debug mode
  -- Forbidding undefined symbols in shared library, using -Wl,--no-undefined on platform Linux
  -- Build with RPC support...
  -- Build with Graph Executor support...
  -- Build with profiler...
  -- Build with AOT Executor support...
  -- Could NOT find GTest (missing: GTEST_LIBRARY GTEST_INCLUDE_DIR GTEST_MAIN_LIBRARY) 
  -- Didn't find the path to CCACHE, disabling ccache
  -- VTA build with VTA_HW_PATH=/home/rou/.tvm_build_x86_64/main/source/3rdparty/vta-hw
  -- Build VTA runtime with target: sim
  -- Build with contrib.random
  -- Build with contrib.sort
  -- Build with contrib.hybriddump
  -- Git found: /usr/bin/git
  -- Found TVM_GIT_COMMIT_HASH=4267fbf6a173cd742acb293fab4f77693dc4b887
  -- Found TVM_GIT_COMMIT_TIME=2023-05-29 06:28:33 +0900
  -- Could NOT find LIBBACKTRACE (missing: LIBBACKTRACE_STATIC_LIBRARY LIBBACKTRACE_INCLUDE_DIR)    
  -- Building libbacktrace from 3rdparty/libbacktrace
  -- Building with TVM Map...
  -- Build with thread support...
  -- Configuring done
  -- Generating done
  -- Build files have been written to: /home/rou/.tvm_build_x86_64/main/build/build
  running: cd "/home/rou/.tvm_build_x86_64/main/build/build" && MAKEFLAGS="-j --jobserver-fds=11,13 --jobserver-auth=11,13" "cmake" "--build" "." "--target" "install" "--config" "Debug"
  Consolidate compiler generated dependencies of target tvm_libinfo_objs
  [  0%] Performing configure step for 'project_libbacktrace'
  [  0%] Built target tvm_libinfo_objs
  Consolidate compiler generated dependencies of target tvm_objs
  [ 91%] Built target tvm_objs

  --- stderr
  /bin/sh: 1: /home/rou/.tvm_build_x86_64/main/source/cmake/libs/../../3rdparty/libbacktrace/configure: not found
  make[2]: *** [CMakeFiles/project_libbacktrace.dir/build.make:131: libbacktrace/src/project_libbacktrace-stamp/project_libbacktrace-configure] Error 127
  make[1]: *** [CMakeFiles/Makefile2:229: CMakeFiles/project_libbacktrace.dir/all] Error 2
  make[1]: *** Waiting for unfinished jobs....
  make: *** [Makefile:136: all] Error 2
  thread 'main' panicked at '
  command did not execute successfully, got: exit status: 2

  build script failed, must exit now', /home/rou/.cargo/registry/src/github.com-1ecc6299db9ec823/cmake-0.1.50/src/lib.rs:1098:5
  note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Environment

WSL Ubuntu 20.04 X86

Steps to reproduce

Clone mlc-ai/relax on the mlc branch at revision f611983b2c7c320f854bdc239657538df7efa4cc, cd into tvm/rust, and run cargo build; or create an empty Rust project, import TVM, and run cargo build.
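The missing configure script usually indicates that the 3rdparty/libbacktrace submodule was never checked out (a guess from the error, not confirmed); running git submodule update --init --recursive in the checkout before cargo build is worth trying. A self-contained demo of how a plain clone leaves a submodule directory empty until that command runs:

```shell
tmp=$(mktemp -d) && cd "$tmp"
# Build a repo with a submodule, then clone it without --recursive.
git init -q sub
git -C sub -c user.email=demo@example.com -c user.name=demo commit -q --allow-empty -m init
git init -q super && cd super
git -c protocol.file.allow=always submodule add -q "$tmp/sub" 3rdparty/sub
git -c user.email=demo@example.com -c user.name=demo commit -qm add-sub
cd .. && git clone -q super work && cd work
# 3rdparty/sub exists here but is empty, like 3rdparty/libbacktrace in the report.
git -c protocol.file.allow=always submodule update --init --recursive
ls 3rdparty/sub
```

(The protocol.file.allow override is only needed because this demo uses local-path submodules; a normal https clone of mlc-ai/relax does not need it.)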

Triage

  • needs-triage

[Flaky Test] Numerical gradient test for `greater` op

Error log. https://ci.mlc.ai/blue/organizations/jenkins/mlc-ai-relax/detail/relax/45/pipeline

=================================== FAILURES ===================================
___________________ test_binary[greater-relax.greater-llvm] ____________________

target = 'llvm', dev = cpu(0)
binary_op_func = <function greater at 0x7f6df80c5cb0>
binary_op_name = 'relax.greater'

    @tvm.testing.parametrize_targets("llvm")
    def test_binary(target, dev, binary_op_func, binary_op_name):
        data1_numpy = np.random.uniform(1, 2, (3, 3)).astype(np.float32)
        data2_numpy = np.random.uniform(1, 2, (3, 3)).astype(np.float32)
        relax_check_gradients(
>           binary_op_func, binary_op_name, [data1_numpy, data2_numpy], target, dev, (3, 3)
        )

tests/python/relax/test_op_gradient_numeric.py:156: 

...

            if dist > atol * sqrt_n + rtol * grad_norm:
                raise AssertionError(
                    "Analytical and numerical grads wrt '{}' differ too much\n"
                    "analytical grad = {}\n numerical grad = {}\n"
                    "{}% of elements differ, first 10 of wrong positions: {}\n"
                    "distance > atol*sqrt(n) + rtol*grad_norm\n"
                    "distance {} > {}*{} + {}*{}".format(
                        x_name,
                        grad,
                        ngrad,
                        wrong_percentage,
                        wrong_positions[:10],
                        dist,
                        atol,
                        sqrt_n,
                        rtol,
>                       grad_norm,
                    )
                )
E               AssertionError: Analytical and numerical grads wrt '1' differ too much
E               analytical grad = [[0. 0. 0.]
E                [0. 0. 0.]
E                [0. 0. 0.]]
E                numerical grad = [[   0.        0.     -943.2258]
E                [   0.        0.        0.    ]
E                [   0.        0.        0.    ]]
E               11% of elements differ, first 10 of wrong positions: [(0, 2)]
E               distance > atol*sqrt(n) + rtol*grad_norm
E               distance 943.225830078125 > 0.01*3.0 + 0.1*943.225830078125

python/tvm/testing/utils.py:263: AssertionError
=========================== short test summary info ============================
FAILED tests/python/relax/test_op_gradient_numeric.py::test_binary[greater-relax.greater-llvm]
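A plausible reading of the failure (not confirmed in the log): greater is piecewise constant, so its analytical gradient is 0 wherever it is defined, but when the two randomly drawn inputs land nearly equal, the finite-difference probe crosses the comparison threshold and produces a huge spurious numerical gradient — exactly the single-element blow-up seen above:

```python
def numeric_grad(f, x, eps=1e-4):
    # Central finite difference, as a numerical gradient checker uses.
    return (f(x + eps) - f(x - eps)) / (2 * eps)

b = 1.50005
f = lambda x: float(x > b)   # greater(x, b): a step function, true gradient is 0

print(numeric_grad(f, 2.0))  # far from the threshold: 0.0, matches analytical
print(numeric_grad(f, 1.5))  # probe straddles the threshold: ~5000, spurious
```

Clamping the random inputs away from each other, or skipping the numerical check for comparison ops, would deflake the test.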

[Bug] Build tvm-unity on aarch64 device

Hi everyone, I want to build tvm-unity from source on an aarch64 device. When building libtvm with CMake, the build errors out at the 100% (linking) step. Can anyone give me some advice? Thank you.

Expected behavior

Successfully building tvm-unity

Actual behavior

Error when building up to 100%

Environment

Operating system: Ubuntu 20.04 aarch64
TVM version: git clone mlc-ai/relax (sha 43985e7)

Steps to reproduce

Build with OpenCL on

git clone --recursive https://github.com/mlc-ai/relax.git tvm_unity && cd tvm_unity/
mkdir -p build && cd build
cp ../cmake/config.cmake .
echo "set(CMAKE_BUILD_TYPE RelWithDebInfo)" >> config.cmake
echo "set(USE_LLVM "/usr/bin/llvm-config-16 --ignore-libllcm --link-static")" >> config.cmake
echo "set(HIDE_PRIVATE_SYMBOLS ON)" >> config.cmake
echo "set(USE_OPENCL ON)" >> config.cmake
cmake .. && cmake --build . --parallel 3 && cd ../..
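Independently of the link failure, two things in the steps above look suspicious: the nested double quotes in the USE_LLVM line are consumed by the shell, so config.cmake does not receive a quoted llvm-config command, and --ignore-libllcm is presumably a typo for --ignore-libllvm. A hedged correction using single outer quotes:

```shell
cd "$(mktemp -d)"   # scratch dir so the demo does not touch a real config.cmake
echo 'set(USE_LLVM "/usr/bin/llvm-config-16 --ignore-libllvm --link-static")' >> config.cmake
cat config.cmake
```

With the original nesting, CMake would see set(USE_LLVM /usr/bin/llvm-config-16 --ignore-libllcm --link-static) as separate unquoted arguments, which is unlikely to be what was intended.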

(screenshot of the build error omitted)

Did you forget to bind?

Expected behavior

generate a .so model file

Actual behavior

report error:
tvm._ffi.base.TVMError: Traceback (most recent call last):
[bt] (8) /home/mlc-relax/relax/build/libtvm.so(tvm::ApplyPasses(tvm::IRModule, tvm::transform::Sequential)+0x42) [0x7f5212c4d512]
[bt] (7) /home/mlc-relax/relax/build/libtvm.so(tvm::transform::Pass::operator()(tvm::IRModule) const+0x56) [0x7f5212d21dd6]
[bt] (6) /home/mlc-relax/relax/build/libtvm.so(tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const+0x347) [0x7f5212d21aa7]
[bt] (5) /home/mlc-relax/relax/build/libtvm.so(tvm::transform::SequentialNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const+0x44a) [0x7f5212d2429a]
[bt] (4) /home/mlc-relax/relax/build/libtvm.so(tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const+0x347) [0x7f5212d21aa7]
[bt] (3) /home/mlc-relax/relax/build/libtvm.so(tvm::transform::ModulePassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const+0x1f4) [0x7f5212d22b84]
[bt] (2) /home/mlc-relax/relax/build/libtvm.so(+0x256a123) [0x7f5213820123]
[bt] (1) /home/mlc-relax/relax/build/libtvm.so(tvm::runtime::detail::LogFatal::Entry::Finalize()+0x3d) [0x7f521298bfcd]
[bt] (0) /home/mlc-relax/relax/build/libtvm.so(tvm::runtime::Backtrace[abi:cxx11]()+0x2c) [0x7f5214c849dc]
Did you forget to bind?
Variable A is directly accessed by host memory (it is not contained in a thread environment or in the function arguments.
Variable T_relu is directly accessed by host memory (it is not contained in a thread environment or in the function arguments.
File "/home/mlc-relax/relax/src/tir/analysis/verify_memory.cc", line 205
RuntimeError: Memory verification failed with the following errors:
# from tvm.script import tir as T

Environment

Ubuntu 20.04, mlc-relax up to date, python3.8

Steps to reproduce

import tvm
from tvm import relax
from tvm.relax.testing import relay_translator

cross_compiler = "/home/AndroidSdk/ndk/23.2.8568313/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang++"
target = tvm.target.Target("opencl --device=mali", host="llvm --mtriple=aarch64-linux-gnu")
#target = tvm.target.Target("llvm --num-cores=4 --mtriple=aarch64-linux-android --mattr=+neon")
model_out = "./libs/mobilenet.so"
relay_mod, relay_param,_, _ = get_network("mobilenet", 1)
relax_mod = relay_translator.from_relay(relay_mod["main"], target, relay_param)
ex = relax.build(relax_mod, target)
ex.export_library(model_out, cc=cross_compiler)

In the function get_network, testing.mobilenet.get_workload is used to load the model.

Triage

  • backend::OpenCL

[Tracking Issue] Relax training M0 migration and polishment

In this fork @Ubospica and I collaborated on developing a training workflow on relax. We finished M0; the efforts included:

  • An automatic differentiation pass on relax high level operators.
  • Some relax operators for training.
  • Gradients registered for some relax operators.
  • A framework that supports training with different optimizers.
  • A trainer wrapper.

Now these results are being migrated to the new struct-info infra. During this migration, we are also doing our best to polish our previous work. Related PRs and progress:

  • (Dependency) Operator Legalizer
    • Legalizer V0 #96
    • Legalize more operators #101
  • (Dependency) TOPI collapse_sum patch #102
  • Some related operators
    • collapse_sum_to, collapse_sum_like #87
    • cross_entropy, log_softmax, nll_loss #94
    • exp #100
    • abs (For L1Loss) #113
  • Operator Gradients
    • Gradients in new infra. Gradients for log_softmax and nll_loss, Gradients for split and concat. collapse_sum erasure. #98
    • Gradients for nll_loss and conv2d #114
  • Automatic Differentiation Pass
    • CopyWithNewParams utils #91
    • AD Pass (Some detailed checkpoints can be found in this PR) #103
  • Optimizer & Trainer APIs
    • Optimizer #107
    • Refactor on Optimizer and Gradient #121
    • Loss functions #112
    • Trainer #115

[Bug] CrossEntropyLoss function doesn't work for training.

Hello @SiriusNEO, @Ubospica!

Thank you for your hard work on adding training to TVM. It is awesome work!
My student @ndemashov is using this functionality in his research, and he found that it wasn't possible to use the CrossEntropyLoss function for training.

Please correct us if we are using the CrossEntropyLoss function in an incorrect way. The commit with the failing test and some other minor changes is here: echuraev@8f7efa6

Please take a look at the commit message of this commit. It contains detailed description of all issues which were occurred while our experiments.

From my point of view, it is necessary to implement a gradient function for nll_loss. If that is true (I believe it is), it might be a good task for @ndemashov to fix; it will help him dive deeper into training in TVM. What do you think?
Do you have any ideas about the issue with different data types for the gradient function and how it should be fixed? I didn't spend much time looking into the formula of the gradient for nll_loss, so if you have any links, could you please share them with us?
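For reference, the backward of nll_loss in the common case (mean reduction, no class weights, no ignore_index) just scatters -1/N into the positions selected by the targets; a NumPy sketch under those assumptions:

```python
import numpy as np

def nll_loss_grad(log_probs, targets):
    """Gradient of loss = -mean(log_probs[i, targets[i]]) w.r.t. log_probs.

    log_probs: (N, C) array of log-probabilities; targets: (N,) class indices.
    """
    n = log_probs.shape[0]
    grad = np.zeros_like(log_probs)
    grad[np.arange(n), targets] = -1.0 / n   # only the selected entries get gradient
    return grad

lp = np.log(np.full((2, 3), 1.0 / 3.0))
print(nll_loss_grad(lp, np.array([0, 2])))
```

With class weights, the -1/N entries become -w[t_i]/sum(w[t_j]); entries equal to ignore_index contribute zero.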

Expected behavior

Training should work with the CrossEntropyLoss function

Actual behavior

It doesn't work.

Environment

Ubuntu Linux on one machine and WSL on another. The latest state of mlc-ai/relax branch.

Steps to reproduce

Test was added at this commit: echuraev@8f7efa6

[Bug] vm_profiler_load_executable doesn't support OpenCL

Expected behavior

print profiling data

Actual behavior

The OpenCL profiling data is 0.0 or nan, but if I use llvm (CPU) as the target, valid profiling data is printed.

Environment

Host PC: Ubuntu 20.04; mobile device: MTK9000

Steps to reproduce

tvm::runtime::Module mod_relax = tvm::runtime::Module::LoadFromFile(module_path);
tvm::runtime::PackedFunc vm_load_executable = mod_relax.GetFunction("vm_profiler_load_executable");
tvm::runtime::Module mod = vm_load_executable();
tvm::runtime::PackedFunc vm_initialization = mod.GetFunction("vm_initialization");
tvm::runtime::PackedFunc set_input = mod.GetFunction("set_input");
tvm::runtime::PackedFunc get_output = mod.GetFunction("get_output");
tvm::runtime::PackedFunc invoke = mod.GetFunction("invoke_stateful");
tvm::runtime::PackedFunc profile = mod.GetFunction("profile");

vm_initialization(4, 0, 2);
set_input("main", input0, input1, input2);
invoke("main");
output = get_output("main");
std::string prof = profile("main");

Triage

  • OpenCL profiling

[Bug] ValueError: Multiple tensors as index not yet supported

I installed web-stable-diffusion according to the official documentation, but the build fails. The failure seems to be caused by changes in relax.

Expected behavior

build successfully

Actual behavior

/web-stable-diffusion# python3 build.py --target cuda
Loading pipeline components...: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:00<00:00, 17.38it/s]
/root/anaconda3/envs/mlc/lib/python3.9/site-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py:181: FutureWarning: The configuration file of this scheduler: LCMScheduler {
"_class_name": "LCMScheduler",
"_diffusers_version": "0.24.0",
"beta_end": 0.012,
"beta_schedule": "scaled_linear",
"beta_start": 0.00085,
"clip_sample": false,
"clip_sample_range": 1.0,
"dynamic_thresholding_ratio": 0.995,
"num_train_timesteps": 1000,
"original_inference_steps": 50,
"prediction_type": "epsilon",
"rescale_betas_zero_snr": false,
"sample_max_value": 1.0,
"set_alpha_to_one": true,
"steps_offset": 0,
"thresholding": false,
"timestep_scaling": 10.0,
"timestep_spacing": "leading",
"trained_betas": null
}
is outdated. steps_offset should be set to 1 instead of 0. Please make sure to update the config accordingly as leaving steps_offset might led to incorrect results in future versions. If you have downloaded this checkpoint from the Hugging Face Hub, it would be very nice if you could open a Pull request for the scheduler/scheduler_config.json file
deprecate("steps_offset!=1", "1.0.0", deprecation_message, standard_warn=False)
Traceback (most recent call last):
File "/root/zrx/web-stable-diffusion/build.py", line 158, in
mod, params = trace_models(torch_dev_key)
File "/root/zrx/web-stable-diffusion/build.py", line 81, in trace_models
clip = trace.clip_to_text_embeddings(pipe)
File "/root/zrx/web-stable-diffusion/web_stable_diffusion/trace/model_trace.py", line 27, in clip_to_text_embeddings
mod = dynamo_capture_subgraphs(
File "/root/anaconda3/envs/mlc/lib/python3.9/site-packages/tvm/relax/frontend/torch/dynamo.py", line 198, in dynamo_capture_subgraphs
compiled_model(*params, **kwargs)
File "/root/anaconda3/envs/mlc/lib/python3.9/site-packages/torch/_dynamo/eval_frame.py", line 328, in _fn
return fn(*args, **kwargs)
File "/root/zrx/web-stable-diffusion/web_stable_diffusion/trace/model_trace.py", line 20, in forward
text_embeddings = self.clip(text_input_ids)[0]
File "/root/anaconda3/envs/mlc/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/anaconda3/envs/mlc/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/root/anaconda3/envs/mlc/lib/python3.9/site-packages/transformers/models/clip/modeling_clip.py", line 800, in forward
return self.text_model(
File "/root/anaconda3/envs/mlc/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/anaconda3/envs/mlc/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/root/anaconda3/envs/mlc/lib/python3.9/site-packages/transformers/models/clip/modeling_clip.py", line 697, in forward
causal_attention_mask = _create_4d_causal_attention_mask(
File "/root/anaconda3/envs/mlc/lib/python3.9/site-packages/torch/_dynamo/eval_frame.py", line 490, in catch_errors
return callback(frame, cache_entry, hooks, frame_state)
File "/root/anaconda3/envs/mlc/lib/python3.9/site-packages/torch/_dynamo/convert_frame.py", line 641, in _convert_frame
result = inner_convert(frame, cache_size, hooks, frame_state)
File "/root/anaconda3/envs/mlc/lib/python3.9/site-packages/torch/_dynamo/convert_frame.py", line 133, in _fn
return fn(*args, **kwargs)
File "/root/anaconda3/envs/mlc/lib/python3.9/site-packages/torch/_dynamo/convert_frame.py", line 389, in _convert_frame_assert
return _compile(
File "/root/anaconda3/envs/mlc/lib/python3.9/site-packages/torch/_dynamo/convert_frame.py", line 569, in _compile
guarded_code = compile_inner(code, one_graph, hooks, transform)
File "/root/anaconda3/envs/mlc/lib/python3.9/site-packages/torch/_dynamo/utils.py", line 189, in time_wrapper
r = func(*args, **kwargs)
File "/root/anaconda3/envs/mlc/lib/python3.9/site-packages/torch/_dynamo/convert_frame.py", line 491, in compile_inner
out_code = transform_code_object(code, transform)
File "/root/anaconda3/envs/mlc/lib/python3.9/site-packages/torch/_dynamo/bytecode_transformation.py", line 1028, in transform_code_object
transformations(instructions, code_options)
File "/root/anaconda3/envs/mlc/lib/python3.9/site-packages/torch/_dynamo/convert_frame.py", line 458, in transform
tracer.run()
File "/root/anaconda3/envs/mlc/lib/python3.9/site-packages/torch/_dynamo/symbolic_convert.py", line 2069, in run
super().run()
File "/root/anaconda3/envs/mlc/lib/python3.9/site-packages/torch/_dynamo/symbolic_convert.py", line 719, in run
and self.step()
File "/root/anaconda3/envs/mlc/lib/python3.9/site-packages/torch/_dynamo/symbolic_convert.py", line 683, in step
getattr(self, inst.opname)(inst)
File "/root/anaconda3/envs/mlc/lib/python3.9/site-packages/torch/_dynamo/symbolic_convert.py", line 2157, in RETURN_VALUE
self.output.compile_subgraph(
File "/root/anaconda3/envs/mlc/lib/python3.9/site-packages/torch/_dynamo/output_graph.py", line 857, in compile_subgraph
self.compile_and_call_fx_graph(tx, pass2.graph_output_vars(), root)
File "/root/anaconda3/envs/mlc/lib/python3.9/contextlib.py", line 79, in inner
return func(*args, **kwds)
File "/root/anaconda3/envs/mlc/lib/python3.9/site-packages/torch/_dynamo/output_graph.py", line 957, in compile_and_call_fx_graph
compiled_fn = self.call_user_compiler(gm)
File "/root/anaconda3/envs/mlc/lib/python3.9/site-packages/torch/_dynamo/utils.py", line 189, in time_wrapper
r = func(*args, **kwargs)
File "/root/anaconda3/envs/mlc/lib/python3.9/site-packages/torch/_dynamo/output_graph.py", line 1024, in call_user_compiler
raise BackendCompilerFailed(self.compiler_fn, e).with_traceback(
File "/root/anaconda3/envs/mlc/lib/python3.9/site-packages/torch/dynamo/output_graph.py", line 1009, in call_user_compiler
compiled_fn = compiler_fn(gm, self.example_inputs())
File "/root/anaconda3/envs/mlc/lib/python3.9/site-packages/torch/dynamo/repro/after_dynamo.py", line 117, in debug_wrapper
compiled_gm = compiler_fn(gm, example_inputs)
File "/root/anaconda3/envs/mlc/lib/python3.9/site-packages/torch/init.py", line 1607, in call
return self.compiler_fn(model, inputs, **self.kwargs)
File "/root/anaconda3/envs/mlc/lib/python3.9/site-packages/tvm/relax/frontend/torch/dynamo.py", line 184, in capture
mod = from_fx(
File "/root/anaconda3/envs/mlc/lib/python3.9/site-packages/tvm/relax/frontend/torch/fx_translator.py", line 1635, in from_fx
return TorchFXImporter().from_fx(
File "/root/anaconda3/envs/mlc/lib/python3.9/site-packages/tvm/relax/frontend/torch/fx_translator.py", line 1522, in from_fx
self.env[node] = self.convert_mapfunc_name
File "/root/anaconda3/envs/mlc/lib/python3.9/site-packages/tvm/relax/frontend/torch/fx_translator.py", line 1291, in _getitem
raise ValueError("Multiple tensors as index not yet supported")
torch._dynamo.exc.BackendCompilerFailed: backend='_capture' raised:
ValueError: Multiple tensors as index not yet supported

Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information

You can suppress this exception and fall back to eager by setting:
import torch._dynamo
torch._dynamo.config.suppress_errors = True

[12:20:18] /workspace/tvm/src/relax/ir/block_builder.cc:64: Warning: BlockBuilder destroyed with remaining blocks!
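For context, "multiple tensors as index" refers to advanced indexing with more than one index tensor at once, e.g. t[rows, cols], which the FX importer's _getitem handler rejects; a NumPy illustration of the indexing pattern that triggers it:

```python
import numpy as np

t = np.arange(12).reshape(3, 4)
rows = np.array([0, 2])
cols = np.array([1, 3])
# Two index arrays at once -> "multiple tensors as index".
print(t[rows, cols])   # picks t[0, 1] and t[2, 3]
```

Until the importer supports it, such an index has to be rewritten (e.g. via gather/take ops) or the offending subgraph left to eager execution.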

Environment

Linux, cuda11.8, torch 2.1, latest mlc-ai/relax or mlc-ai-nightly-cu118, lastest diffusers

Steps to reproduce

Run web-stable-diffusion's python3 build.py --target cuda against the latest relax, as shown in the log above.

Triage

  • frontend:pytorch

[Bug] Error in relax.op.masked_fill and _ffi_api.full_like function

Expected behavior

In the source code, values = _ffi_api.full_like(x, value)  # type: ignore is expected to work without a dtype argument.

def masked_fill(x: Expr, mask: Expr, value: Expr):
    """Fill a tensor by a specified value in places defined by a mask.
    Parameters
    ----------
    x : relax.Expr
        The input data to the operator.
    mask : relax.Expr
        The mask.
    value : relax.Expr
        The value to set in the input tensor.
    Returns
    -------
    result : relax.Expr
        The filled tensor.
    """
    values = _ffi_api.full_like(x, value)  # type: ignore
    return _ffi_api.where(mask, values, x)  # type: ignore

Actual behavior

During compilation, relax.op.full_like must receive 3 arguments. TVMError: Function relax.op.full_like(0: RelayExpr, 1: RelayExpr, 2: DataType) -> RelayExpr expects 3 arguments, but 2 were provided.

I can manually replace masked_fill(x, mask, value) with full_like(x, value, x.struct_info.dtype) and where(mask, value, x) to get the right result as a workaround.

    attention_scores = masked_fill(attn_weights, astype(attention_mask, dtype="bool"), fill_value)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/mlc-chat-venv/lib/python3.11/site-packages/tvm/relax/op/mask.py", line 37, in masked_fill
    values = _ffi_api.full_like(x, value)  # type: ignore
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "tvm/_ffi/_cython/./packed_func.pxi", line 331, in tvm._ffi._cy3.core.PackedFuncBase.__call__
  File "tvm/_ffi/_cython/./packed_func.pxi", line 262, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./packed_func.pxi", line 251, in tvm._ffi._cy3.core.FuncCall3
  File "tvm/_ffi/_cython/./base.pxi", line 181, in tvm._ffi._cy3.core.CHECK_CALL
tvm._ffi.base.TVMError: Traceback (most recent call last):
  3: TVMFuncCall
  2: _ZN3tvm7runtime13PackedFun
  1: tvm::runtime::TypedPackedFunc<tvm::RelayExpr (tvm::RelayExpr, tvm::RelayExpr, tvm::runtime::DataType)>::AssignTypedLambda<tvm::RelayExpr (*)(tvm::RelayExpr, tvm::RelayExpr, tvm::runtime::DataType)>(tvm::RelayExpr (*)(tvm::RelayExpr, tvm::RelayExpr, tvm::runtime::DataType), std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}::operator()(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*) const
  0: _ZN3tvm7runtime6deta
  File "/workspace/tvm/include/tvm/runtime/packed_func.h", line 1731
TVMError: Function relax.op.full_like(0: RelayExpr, 1: RelayExpr, 2: DataType) -> RelayExpr expects 3 arguments, but 2 were provided.

Environment

mlc-ai-nightly-cu116==0.12.dev1365
mlc-chat-nightly-cu116==0.1.dev309

Steps to reproduce

Calling relax.op.masked_fill(x, mask, value)

Triage

  • relax:op

[Bug] How to use tvm.rpc.RPCSession.get_function?

I want to use this method to run the C++ runtime function "vm.builtin.ndarray_cache.load" on my Android device over TVM RPC. My code is below:

from tvm.auto_scheduler.utils import request_remote
remote = request_remote("android", "127.0.0.1", 9190)
dev = remote.cl()
remote_load = remote.get_function("vm.builtin.ndarray_cache.load")
remote_load("/data/local/tmp/params", dev.device_type, dev.device_id)

After running this, I get the error below:
/data/projects/relax/apps/android_rpc/app/src/main/jni/../../../../../../include/../src/runtime/c_runtime_api.cc:131: InternalError: Check failed: (allow_missing) is false: Device API rpc is not enabled.

How can I fix it?
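One likely cause (a guess, to verify against your TVM version): devices obtained from an RPC session carry a session offset in their device_type (kRPCSessMask, 128 in TVM's c_runtime_api.h), so forwarding dev.device_type verbatim makes the remote side look up an "rpc" device API, which is exactly the failing check. The arithmetic can be sketched with plain integers; whether vm.builtin.ndarray_cache.load expects the unmasked type is an assumption:

```python
# kRPCSessMask from TVM's c_runtime_api.h; a remote device type is encoded as
# raw_type + session_id * RPC_SESS_MASK.
RPC_SESS_MASK = 128

kDLOpenCL = 4                                        # raw DLDeviceType for OpenCL
remote_device_type = kDLOpenCL + 1 * RPC_SESS_MASK   # e.g. 132 on session 1

# Strip the session mask before passing the type to a remote builtin:
raw_device_type = remote_device_type % RPC_SESS_MASK
print(raw_device_type)  # 4

# i.e. the call would become (hypothetical fix):
#   remote_load("/data/local/tmp/params", dev.device_type % RPC_SESS_MASK, dev.device_id)
```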

[Tracking Issue] CUTLASS Integration

Hi, I noticed a few PRs regarding CUTLASS and wanted to know what features you are interested in using. Also, if you have any questions please don't hesitate to post on our repo.

@hwu36 for vis
