
mlir-hlo's Introduction

MLIR-HLO: A Standalone "HLO" MLIR-based Compiler

The code here exists in two places: in this standalone repository and inside the TensorFlow repository (under tensorflow/compiler/mlir/hlo), with the two kept in sync.

This implements a self-contained compiler for a set of linear algebra operations inspired by the XLA HLO IR, using MLIR components. It is designed to provide an end-to-end flow independent of TensorFlow and XLA, while remaining usable inside these projects.

Coding practice and conventions in this repository follow the MLIR Developer Guide, since this repo is intended to act as an incubator for technology to upstream.

QuickStart: building and testing

These instructions work on Linux; you may have to adjust them for your platform.

To build the code in this repository, you need a clone of the LLVM/MLIR git repository:

$ git clone https://github.com/llvm/llvm-project.git

You need to make sure you have the right commit checked out in the LLVM repository (you need to do this every time you pull from this repo):

$ (cd llvm-project && git checkout $(cat ../build_tools/llvm_version.txt))

We provide a script to configure and build LLVM/MLIR:

$ build_tools/build_mlir.sh ${PWD}/llvm-project/ ${PWD}/llvm-build

Again, this is something to do every time you pull from this repository and the LLVM revision changes.

Finally you can build and test this repository:

$ mkdir build && cd build
$ cmake .. -GNinja \
   -DLLVM_ENABLE_LLD=ON \
   -DCMAKE_BUILD_TYPE=Release \
   -DLLVM_ENABLE_ASSERTIONS=ON \
   -DMLIR_DIR=${PWD}/../llvm-build/lib/cmake/mlir
$ ninja check-mlir-hlo

Overview

MLIR-HLO aims to provide an end-to-end compiler for CPU and GPU, as well as reusable building blocks for other accelerators. This is heavily inspired by the success of XLA.

XLA (Accelerated Linear Algebra) is a domain-specific compiler framework and execution environment for linear algebra, which powers code-generation for ML frameworks like TensorFlow, JAX, and others.

A cornerstone of XLA is the HLO (High Level Optimizer) IR, which offers a carefully selected, fixed list of operations that are mostly orthogonal to each other. It provides an efficient optimizer for computations expressed with this set of operations and generates code for hardware platforms such as CPUs, GPUs, and TPUs. Its goal is to provide a uniform interface to compile and execute these optimized HLO programs independently of the targeted device. It is not a front-end ML system like TensorFlow or JAX; rather, it is a backend framework that optimizes HLO and lowers it to machine code.

The HLO set of operations is closed and has well-defined semantics. HLO operations operate on immutable tensors with static shapes (bounded shapes, to be exact) and explicit broadcasts.

MLIR is a compiler infrastructure that intends to come with "batteries included"; as such, it aims to provide all the blocks required to assemble graph optimization and codegen pipelines. The longer-term roadmap for MLIR is to provide a Tensor Compute Primitive (TCP) dialect, which should hopefully be general enough to model what HLO represents today (see slides and recording for a technical discussion on this topic).

The work on MLIR-HLO can be seen as a stepping stone towards building TCP, while integrating intermediate components into XLA itself by relying on the well-proven HLO IR and introducing more pieces from upstream MLIR (Linalg, Vector, the GPU dialect, ...). This document provides more information on the current migration of the XLA GPU codegen.

MLIR Dialects for XLA-style compilation

This repository defines three dialects to support an HLO-like compilation pipeline using MLIR:

  • chlo: the "client" HLO dialect, intended to be closer to the frontend (including implicit broadcast semantics).
  • mhlo: the "meta"-HLO dialect; similar to xla_hlo, but with extensions for dynamic shape support.
  • lmhlo: the "late"-"meta"-HLO dialect; this is the IR after buffer allocation is performed. In XLA, buffer allocation is a side data structure that keeps track of this information, while this separate dialect materializes it in the IR.

We describe these in more detail below.
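
As a hedged illustration of the broadcast distinction (a sketch only; op spellings and attribute syntax vary across mlir-hlo revisions), chlo allows a tensor and a scalar tensor to be combined directly, while mhlo requires the broadcast to be materialized explicitly:

// chlo: implicit (numpy-style) broadcast between a 2x3 tensor and a scalar tensor.
func.func @chlo_add(%arg0: tensor<2x3xf32>, %arg1: tensor<f32>) -> tensor<2x3xf32> {
  %0 = chlo.broadcast_add %arg0, %arg1 : (tensor<2x3xf32>, tensor<f32>) -> tensor<2x3xf32>
  return %0 : tensor<2x3xf32>
}

// mhlo: the broadcast must be materialized before the elementwise add.
func.func @mhlo_add(%arg0: tensor<2x3xf32>, %arg1: tensor<f32>) -> tensor<2x3xf32> {
  %0 = "mhlo.broadcast_in_dim"(%arg1) {broadcast_dimensions = dense<> : tensor<0xi64>} : (tensor<f32>) -> tensor<2x3xf32>
  %1 = mhlo.add %arg0, %0 : tensor<2x3xf32>
  return %1 : tensor<2x3xf32>
}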

HLO Client Dialect: chlo.

  • It was originally designed to map the XLA client APIs (e.g., ops support implicit broadcast and are roughly modeled on the XlaBuilder API), modulo support for dynamic shapes and additional ops required to support dynamic client-side HLOs.
  • Ops can come either from the XlaBuilder or from XLA helper functions that are converted into ops (given the ambiguity in what constitutes these ops, there is some freedom to decide). The goal of this dialect is to correspond closely to the client level and to enable a thin layer between client use and op construction (making ops cheap to construct, and keeping optimizations on the dialect close to optimizations on the client ops).

Entry:

  • The vast majority of old "client" interactions are via the XlaBuilder APIs. These APIs are used by TF2XLA kernels, JAX, the PyTorch bridge, and direct users. The legalization path (described below) can also reuse the XlaBuilder's APIs to construct XLA Client HLO ops directly (this uses MlirXlaBuilder, which is a subclass of XlaBuilder).
  • The other entry point is during legalization from TensorFlow ops in the TF Graph Compiler and other tools (e.g., SavedModel lowering and TFCompile).

Exit:

  • MHLO
  • May be exported to xla::HloInstructionProto by invoking the XlaBuilder APIs (with a regular XlaBuilder)

The chlo dialect started originally as a mapping to the XLA client builder APIs. This enables it both to be constructed from and converted back to existing XLA interfaces using the XlaBuilder API. Due to the way that translation into and out of the dialect works, there is no expectation that this dialect roundtrips to XLA (i.e., it is only intended to be translated to MLIR and then legalized to another dialect, or translated to HloInstructionProto).

The export approach of reusing the XlaBuilders enables reusing a lot of logic that was already implemented, in terms of computing shapes, inserting broadcasts, etc.

An important topic here is that XLA Client HLO ops are not a well-defined set; in particular, what some would consider helper functions, others would consider ops. It should be easy to move between these, and so to define a new op along with its helper function, or to autogenerate the helper functions from the descriptions of the ops. For the former, a simple approach would be to consider the context in which the op is being constructed and, if it is an MLIR one, construct an op in the client dialect instead of making further calls into XlaBuilder. The latter could be implemented by adding the op and a legalization of the op to other known ops, from which a helper function could be generated and used as usual.

Status: exists, but needs to be cleaned up.

Meta HLO Dialect mhlo

  • The dialect is closer to the current HLO server ops (e.g., no implicit broadcast).
  • The MHLO dialect is where we can deviate from the requirements of the client or server dialects, in particular:
    • Control flow ops with implicit capture to enable simpler optimizations (e.g., generic LICM, unroll & jam, etc.)
    • Ops with multiple results (i.e., no tuples)
    • More ops (for example, unique op or assert op), and ops that don't need to be added to either the client or server dialect.
    • An op set not constrained by implementation (e.g., hlo.add operating on, say, i79 or !mydialect.weird_type is allowed even though no XLA backend supports it). Verification on types happens at the boundaries.
    • It does not need to preserve some deprecated XLA constructs (e.g., stateful RNG HLO).
    • More dynamic-shape-support ops, without the need to update all users/backends.
  • This dialect enables evolving HLO independently from XLA in order to experiment with features we'd like to upstream in MLIR TCP. In particular, it intends to be user-extensible through interfaces.
  • It should have no TensorFlow, proto, or other Google-internal dependencies.
  • It need not be a complete superset of ops compared to the XLA HLO dialect.

Entry:

  • Legalization from the chlo dialect or conversion from XLA HLO.
  • Directly emitted from the TF Graph Compiler.
  • Builder call (e.g., EDSL).

Exit:

  • LMHLO, Linalg (IREE), or directly used in codegen.
  • XLA HLO.

The MHLO dialect has no direct export format; it is only meant as an intermediate optimization dialect/format. It is also where we can experiment cheaply with new ops. This is where the representation can diverge from existing endpoints.

Status: exists, but needs to be cleaned up and evolved, in particular with respect to supporting dynamic shapes.

MHLO differs from the XLA HLO op set in multiple ways, including:

  1. MHLO While accepts multiple operands and may produce multiple results, instead of carrying a single tuple-typed value; see the sketch below.
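
As a hedged sketch (attribute spellings such as the comparison direction vary across mlir-hlo revisions), here is a two-value mhlo.while in generic form; the loop-carried values are passed directly as block arguments with no tuple packing, and constants such as %bound are implicitly captured:

func.func @while_example(%init0: tensor<i32>, %init1: tensor<f32>) -> (tensor<i32>, tensor<f32>) {
  %bound = mhlo.constant dense<10> : tensor<i32>
  %one = mhlo.constant dense<1> : tensor<i32>
  // First region: condition, yields a tensor<i1>.
  // Second region: body, yields the next values of both loop-carried operands.
  %res:2 = "mhlo.while"(%init0, %init1) ({
  ^bb0(%iter: tensor<i32>, %acc: tensor<f32>):
    %cond = "mhlo.compare"(%iter, %bound) {comparison_direction = #mhlo<comparison_direction LT>} : (tensor<i32>, tensor<i32>) -> tensor<i1>
    "mhlo.return"(%cond) : (tensor<i1>) -> ()
  }, {
  ^bb0(%iter: tensor<i32>, %acc: tensor<f32>):
    %next = mhlo.add %iter, %one : tensor<i32>
    "mhlo.return"(%next, %acc) : (tensor<i32>, tensor<f32>) -> ()
  }) : (tensor<i32>, tensor<f32>) -> (tensor<i32>, tensor<f32>)
  return %res#0, %res#1 : tensor<i32>, tensor<f32>
}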

LMHLO

LMHLO corresponds to "late" mhlo and operates on the buffer domain (e.g., memref) with side-effecting operations. Lowering from the mhlo dialect proceeds by way of scheduling, memory, and buffer allocation. The current mapping is directly onto XLA Client HLOs, but without implicit broadcast and operating on memrefs. This dialect will instead be rebased on the mhlo dialect, still operating on buffers.
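
As a rough illustration (a sketch; exact op spellings may differ across revisions), the same addition in mhlo has value semantics on tensors, while in lmhlo it is a side-effecting operation writing into a pre-allocated output buffer:

// mhlo: value semantics, the result is a new tensor value.
func.func @add_tensors(%lhs: tensor<4xf32>, %rhs: tensor<4xf32>) -> tensor<4xf32> {
  %0 = mhlo.add %lhs, %rhs : tensor<4xf32>
  return %0 : tensor<4xf32>
}

// lmhlo: after buffer allocation, operands and the output are memrefs.
func.func @add_buffers(%lhs: memref<4xf32>, %rhs: memref<4xf32>, %out: memref<4xf32>) {
  "lmhlo.add"(%lhs, %rhs, %out) : (memref<4xf32>, memref<4xf32>, memref<4xf32>) -> ()
  return
}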

Entry:

  • Post buffer assignment on mhlo dialect, or from XLA after buffer assignment.

Exit:

  • Codegen (LLVM IR in the common cases at the moment)

End-to-End pipeline

TODO

Alternative build setups

Building Python API

Building the MHLO Python API requires building as an LLVM external project. The below instructions presume that you have this mlir-hlo repo and an llvm-project repo checked out side by side.

Note that the Python package produced by this procedure includes the mlir package and is not suitable for deployment as-is (but it can be included in a larger aggregate).

mkdir build && cd build
cmake -GNinja -B. ${LLVM_SRC_DIR}/llvm \
    -DCMAKE_BUILD_TYPE=Release \
    -DLLVM_ENABLE_PROJECTS=mlir \
    -DLLVM_EXTERNAL_PROJECTS=mlir_hlo \
    -DLLVM_EXTERNAL_MLIR_HLO_SOURCE_DIR=${MLIR_HLO_SRC_DIR} \
    -DLLVM_TARGETS_TO_BUILD=host \
    -DPython3_EXECUTABLE=$(which python) \
    -DMLIR_ENABLE_BINDINGS_PYTHON=ON \
    -DMHLO_ENABLE_BINDINGS_PYTHON=ON

ninja MLIRHLOPythonModules
export PYTHONPATH=$PWD/tools/mlir_hlo/python_packages/mlir_hlo
python -c "import mlir.dialects.mhlo"

External projects that depend on mlir-hlo

External projects that need to depend on mlir-hlo (for example, via a git submodule) can use the following setting in their CMake configuration in order for find_package(MHLO) to import all mlir-hlo CMake targets into their build setup and have access to the required include and lib variables (see the generated MHLOConfig.cmake):

...
   -DMHLO_DIR=<path to mlir-hlo build dir>/lib/cmake/mlir-hlo
   ...


mlir-hlo's Issues

Failed to convert mhlo dialect to Linalg dialect

I want to convert the conv2d operator from mhlo to the linalg dialect. It seems that it can't handle the case where the conv2d layout is NCHW.
The mhlo dialect of conv2d.mlir is shown below:

module  {
  func @main(%arg0: tensor<64x3x7x7xf32>, %arg1: tensor<1x3x224x224xf32>) -> tuple<tensor<1x64x112x112xf32>> {
    %0 = "mhlo.convolution"(%arg1, %arg0) {batch_group_count = 1 : i64, dimension_numbers = {input_batch_dimension = 0 : i64, input_feature_dimension = 1 : i64, input_spatial_dimensions = dense<[2, 3]> : tensor<2xi64>, kernel_input_feature_dimension = 1 : i64, kernel_output_feature_dimension = 0 : i64, kernel_spatial_dimensions = dense<[2, 3]> : tensor<2xi64>, output_batch_dimension = 0 : i64, output_feature_dimension = 1 : i64, output_spatial_dimensions = dense<[2, 3]> : tensor<2xi64>}, feature_group_count = 1 : i64, lhs_dilation = dense<1> : tensor<2xi64>, padding = dense<3> : tensor<2x2xi64>, precision_config = ["DEFAULT", "DEFAULT"], rhs_dilation = dense<1> : tensor<2xi64>, window_strides = dense<2> : tensor<2xi64>} : (tensor<1x3x224x224xf32>, tensor<64x3x7x7xf32>) -> tensor<1x64x112x112xf32>
    %1 = "mhlo.tuple"(%0) : (tensor<1x64x112x112xf32>) -> tuple<tensor<1x64x112x112xf32>>
    return %1 : tuple<tensor<1x64x112x112xf32>>
  }
}

I use "./mlir-hlo-opt ./conv2d.mlir -hlo-legalize-to-linalg -o conv2d_linalg.mlir" to do the transformation.
I traced the workflow and found out when it checks "HasCanonicalDimensionNumbers()", it returns false.
legalize_to_linalg.cc

  if (dimension_numbers.input_batch_dimension().getInt() != 0 ||
      dimension_numbers.input_feature_dimension().getInt() !=
          (input_spatial_rank + 1)) {
    return false;
  }

build error /usr/bin/ld.lld: error: unknown argument: --push-state

-- Performing Test HAVE_STEADY_CLOCK
-- Performing Test HAVE_STEADY_CLOCK
-- Performing Test HAVE_STEADY_CLOCK -- success
-- Configuring done
-- Generating done
-- Build files have been written to: /home/e0004850/project_code/llvm_for_hlo/llvm-build

  • cmake --build /home/e0004850/project_code/mlir-hlo/../llvm_for_hlo/llvm-build --target all --target mlir-cpu-runner
    [383/2912] Linking C executable bin/count
    FAILED: bin/count
    : && /usr/bin/cc -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wno-missing-field-initializers -pedantic -Wno-long-long -Wno-comment -fdiagnostics-color -O2 -g -DNDEBUG -fuse-ld=lld -W
    l,--color-diagnostics -Wl,-O3 -Wl,--gc-sections utils/count/CMakeFiles/count.dir/count.c.o -o bin/count -Wl,-rpath,"$ORIGIN/../lib" -lpthread && :
    /usr/bin/ld.lld: error: unknown argument: --push-state
    /usr/bin/ld.lld: error: unknown argument: --pop-state
    /usr/bin/ld.lld: error: unknown argument: --push-state
    /usr/bin/ld.lld: error: unknown argument: --pop-state

    collect2: error: ld returned 1 exit status
    [416/2912] Building CXX object lib/DebugInfo/CodeView/CMakeFiles/LLVMDebugInfoCodeView.dir/EnumTables.cpp.o
    ninja: build stopped: subcommand failed.

How can this problem be solved?

Move CMake CI for this repo to be publicly visible

The project is tested continuously with CMake on Kokoro, but this is invisible outside of Google right now.
It would be great if this CI was part of the publicly visible Kokoro.

To begin with, what about updating copybara to publish in this repo the build script currently used by the Kokoro CI to build with CMake?

See also: llvm/torch-mlir#1835

Clarify pad legality

Is the following a legal pad?

func.func public @main(%arg0: tensor<2x3xi8> loc(unknown), %arg1: tensor<i8> loc(unknown)) -> tensor<2x1xi8> {
    %0 = "mhlo.pad"(%arg0, %arg1) {edge_padding_high = dense<[0, -1]> : tensor<2xi64>, edge_padding_low = dense<[0, -1]> : tensor<2xi64>, interior_padding = dense<0> : tensor<2xi64>} : (tensor<2x3xi8>, tensor<i8>) -> tensor<2x1xi8> loc(#loc1)
    return %0 : tensor<2x1xi8> loc(#loc0)
  }


Came up in iree-org/iree#9296

Inconsistent python dialect registration

The dialect registration functions for MHLO are of the form mhlo.register_mhlo_dialect(context) and chlo.register_chlo_dialect(context).

We don't have an upstream entry for this in the style guide (but maybe we should); however, everyone else has converged on mhlo.register_dialect(context). It would be nice to remove the redundant "mhlo" and "chlo". Since the originals are already in use, they can be deprecated.

When building mlir-hlo-opt, there is an error that a symbol cannot be found

When building mlir-hlo-opt, the link fails with undefined symbol references:
/usr/bin/ld: lib/libMhloPasses.a(symbolic_shape_optimization.cc.o): in function mlir::mhlo::(anonymous namespace)::SimplifyBroadcasts::matchAndRewrite(mlir::shape::BroadcastOp, mlir::PatternRewriter&) const':
symbolic_shape_optimization.cc:(.text._ZNK4mlir4mhlo12_GLOBAL__N_118SimplifyBroadcasts15matchAndRewriteENS_5shape11BroadcastOpERNS_15PatternRewriterE+0x11d): undefined reference to mlir::ShapeComponentAnalysis::GetValueInfo(mlir::Value)'
/usr/bin/ld: symbolic_shape_optimization.cc:(.text._ZNK4mlir4mhlo12_GLOBAL__N_118SimplifyBroadcasts15matchAndRewriteENS_5shape11BroadcastOpERNS_15PatternRewriterE+0x2a1): undefined reference to mlir::ShapeComponentAnalysis::SymbolicExpr::isConstant(long) const'
/usr/bin/ld: lib/libMhloPasses.a(symbolic_shape_optimization.cc.o): in function mlir::mhlo::(anonymous namespace)::AnnotateExpandingDimensionsInDynamicBroadcastInDim::matchAndRewrite(mlir::mhlo::DynamicBroadcastInDimOp, mlir::PatternRewriter&) const':
symbolic_shape_optimization.cc:(.text._ZNK4mlir4mhlo12_GLOBAL__N_150AnnotateExpandingDimensionsInDynamicBroadcastInDim15matchAndRewriteENS0_23DynamicBroadcastInDimOpERNS_15PatternRewriterE+0x20f): undefined reference to mlir::ShapeComponentAnalysis::GetShapeInfo(mlir::Value)'
/usr/bin/ld: symbolic_shape_optimization.cc:(.text._ZNK4mlir4mhlo12_GLOBAL__N_150AnnotateExpandingDimensionsInDynamicBroadcastInDim15matchAndRewriteENS0_23DynamicBroadcastInDimOpERNS_15PatternRewriterE+0x22e): undefined reference to mlir::ShapeComponentAnalysis::GetValueInfo(mlir::Value)'
/usr/bin/ld: symbolic_shape_optimization.cc:(.text._ZNK4mlir4mhlo12_GLOBAL__N_150AnnotateExpandingDimensionsInDynamicBroadcastInDim15matchAndRewriteENS0_23DynamicBroadcastInDimOpERNS_15PatternRewriterE+0x32a): undefined reference to mlir::ShapeComponentAnalysis::SymbolicExpr::isConstant(long) const'
/usr/bin/ld: symbolic_shape_optimization.cc:(.text._ZNK4mlir4mhlo12_GLOBAL__N_150AnnotateExpandingDimensionsInDynamicBroadcastInDim15matchAndRewriteENS0_23DynamicBroadcastInDimOpERNS_15PatternRewriterE+0x355): undefined reference to mlir::ShapeComponentAnalysis::SymbolicExpr::isConstant(long) const'
/usr/bin/ld: symbolic_shape_optimization.cc:(.text._ZNK4mlir4mhlo12_GLOBAL__N_150AnnotateExpandingDimensionsInDynamicBroadcastInDim15matchAndRewriteENS0_23DynamicBroadcastInDimOpERNS_15PatternRewriterE+0x68c): undefined reference to mlir::ShapeComponentAnalysis::SymbolicExpr::isKnownNotOne() const'
/usr/bin/ld: lib/libMhloPasses.a(symbolic_shape_optimization.cc.o): in function mlir::mhlo::(anonymous namespace)::BroadcastOpLowering::matchAndRewrite(mlir::shape::BroadcastOp, mlir::PatternRewriter&) const':
symbolic_shape_optimization.cc:(.text._ZNK4mlir4mhlo12_GLOBAL__N_119BroadcastOpLowering15matchAndRewriteENS_5shape11BroadcastOpERNS_15PatternRewriterE+0x154): undefined reference to mlir::ShapeComponentAnalysis::GetValueInfo(mlir::Value)'
/usr/bin/ld: symbolic_shape_optimization.cc:(.text._ZNK4mlir4mhlo12_GLOBAL__N_119BroadcastOpLowering15matchAndRewriteENS_5shape11BroadcastOpERNS_15PatternRewriterE+0x6fb): undefined reference to mlir::ShapeComponentAnalysis::SymbolicExpr::isConstant(long) const'
collect2: error: ld returned 1 exit status

'mhlo/IR/hlo_ops.h' file not found

Hi there,

we're using mlir-hlo in a downstream project as a submodule and include it into our project using add_subdirectory(third_party/mlir-hlo EXCLUDE_FROM_ALL).

We're trying to update to the green commit from Nov. 14th, but we're hitting the error mentioned in the title: 'mhlo/IR/hlo_ops.h' file not found.

We're linking against the following libraries:

target_link_libraries(${TARGET} PUBLIC
  # ...
  ChloOps
  HloOpsCommon
  MLIRMhloUtils
  MhloDialect
  MhloRegisterDialects
  StablehloAssemblyFormat
  StablehloBase
  StablehloBroadcastUtils
  StablehloOps
  StablehloRegister
  StablehloTypeInference
)

I would assume that the MHLO include directories are propagated through the libraries we link against. Interestingly enough, mlir_hlo/include and mlir_hlo/stablehlo are on the include path. Is there anything we're missing to get the missing include directories?

Failure to compile broadcast_in_dim

The following fails to compile

module @jit_encode_batch {
  func.func public @main(%arg0: tensor<i32>, %arg293: tensor<?x1024xi32>) -> tensor<?x1024xi32> {
    %0 = mhlo.constant dense<0> : tensor<i32>
    %1 = mhlo.convert %arg0 : tensor<i32>
    %2 = mhlo.reshape %1 : (tensor<i32>) -> tensor<1xi32>
    %3 = mhlo.constant dense<1024> : tensor<1xi32>
    %4 = "mhlo.concatenate"(%2, %3) {dimension = 0 : i64} : (tensor<1xi32>, tensor<1xi32>) -> tensor<2xi32>
    %5 = "mhlo.dynamic_broadcast_in_dim"(%arg293, %4) {broadcast_dimensions = dense<[0, 1]> : tensor<2xi64>} : (tensor<?x1024xi32>, tensor<2xi32>) -> tensor<?x1024xi32>
    return %5 : tensor<?x1024xi32>
 }
}

while the following does

module @jit_encode_batch {
  func.func public @main(%arg0: tensor<i32>, %arg293: tensor<?x1024xi32>) -> tensor<?x1024xi32> {
    %0 = mhlo.constant dense<0> : tensor<i32>
    %1 = mhlo.convert %arg0 : tensor<i32>
    %2 = mhlo.reshape %1 : (tensor<i32>) -> tensor<1xi32>
    %3 = mhlo.constant dense<1024> : tensor<1xi32>
    %4 = "mhlo.concatenate"(%2, %3) {dimension = 0 : i64} : (tensor<1xi32>, tensor<1xi32>) -> tensor<2xi32>
    %5 = "mhlo.dynamic_broadcast_in_dim"(%arg293, %4) {broadcast_dimensions = dense<[0, 1]> : tensor<2xi64>,
      known_expanding_dimensions = dense<[0]> : tensor<1xi64>,
      known_nonexpanding_dimensions = dense<[1]> : tensor<1xi64>
    } : (tensor<?x1024xi32>, tensor<2xi32>) -> tensor<?x1024xi32>
    return %5 : tensor<?x1024xi32>
 }
}

Not sure if this is a frontend issue (i.e., the first form shouldn't have been produced without those attributes), luck that it compiles with those attributes, or expected that it compiles without them.

MHLO CAPI is not actually a CAPI

Example:

In file included from /home/stella/src/iree/third_party/mlir-hlo/python/MlirHloModule.cpp:15:
/home/stella/src/iree/third_party/mlir-hlo/include/mlir-hlo-c/Attributes.h:186:32: warning: 'mlirMhloComparisonDirectionAttrGetDirection' has C-linkage specified, but returns user-defined type 'std::string' (aka 'basic_string<char>') which is incompatible with C [-Wreturn-type-c-linkage]
MLIR_CAPI_EXPORTED std::string mlirMhloComparisonDirectionAttrGetDirection(

Suggest adding a dummy .c test that at least includes all of the public headers and will hard-fail in such situations.

Deduplicate reduction subcomputations when converting from MHLO to HLO

See google/jax#7654

We should deduplicate reducers when converting from MHLO to HLO. e.g. compare:

In [1]: import jax

In [2]: import jax.numpy as jnp

In [3]: def f(x, y): return jnp.sum(x) + jnp.sum(y)

In [4]: print(jax.jit(f).lower(jnp.arange(10), jnp.arange(15)).compiler_ir())
module @jit_f.2 {
  func.func public @main(%arg0: tensor<10xi32>, %arg1: tensor<15xi32>) -> tensor<i32> {
    %0 = mhlo.constant dense<0> : tensor<i32>
    %1 = mhlo.reduce(%arg0 init: %0) across dimensions = [0] : (tensor<10xi32>, tensor<i32>) -> tensor<i32>
     reducer(%arg2: tensor<i32>, %arg3: tensor<i32>)  {
      %5 = mhlo.add %arg2, %arg3 : tensor<i32>
      "mhlo.return"(%5) : (tensor<i32>) -> ()
    }
    %2 = mhlo.constant dense<0> : tensor<i32>
    %3 = mhlo.reduce(%arg1 init: %2) across dimensions = [0] : (tensor<15xi32>, tensor<i32>) -> tensor<i32>
     reducer(%arg2: tensor<i32>, %arg3: tensor<i32>)  {
      %5 = mhlo.add %arg2, %arg3 : tensor<i32>
      "mhlo.return"(%5) : (tensor<i32>) -> ()
    }
    %4 = mhlo.add %1, %3 : tensor<i32>
    return %4 : tensor<i32>
  }
}

and

In [6]: print(jax.jit(f).lower(jnp.arange(10), jnp.arange(15)).compiler_ir(dialect="hlo").as_hlo_text())
HloModule jit_f.4, entry_computation_layout={(s32[10]{0},s32[15]{0})->s32[]}

region_0.4 {
  Arg_0.5 = s32[] parameter(0)
  Arg_1.6 = s32[] parameter(1)
  ROOT add.7 = s32[] add(Arg_0.5, Arg_1.6)
}

region_1.9 {
  Arg_0.10 = s32[] parameter(0)
  Arg_1.11 = s32[] parameter(1)
  ROOT add.12 = s32[] add(Arg_0.10, Arg_1.11)
}

ENTRY main.15 {
  Arg_0.1 = s32[10]{0} parameter(0)
  constant.3 = s32[] constant(0)
  reduce.8 = s32[] reduce(Arg_0.1, constant.3), dimensions={0}, to_apply=region_0.4
  Arg_1.2 = s32[15]{0} parameter(1)
  reduce.13 = s32[] reduce(Arg_1.2, constant.3), dimensions={0}, to_apply=region_1.9
  ROOT add.14 = s32[] add(reduce.8, reduce.13)
}

It would be great to merge region_0.4 and region_1.9 for readability of the HLO. Some computations end up with hundreds of reducers.

@cheshire

Using StableHLO for ONNX to HLO conversion

In light of the presentation on OpenXLA, where they touted StableHLO as the preferred representation for exchanges between compilers/tools, would it make sense to migrate our conversion support from ONNX-to-MHLO over to ONNX-to-StableHLO?

My understanding is that at this stage StableHLO might only be starting out, and initially it will be very close to MHLO; but as MHLO evolves to better support optimizations, StableHLO will remain more stable and focused on providing compatibility.

Does it make sense to the ONNX-MHLO contributors?

@ZihengJiang @rsuderman @lipracer @raikonenfnu (apologies if I missed some MHLO contributors)

Missing CMake deps for THLO and suspect layering

Can someone please land an equivalent of this change?

https://github.com/iree-org/iree-mhlo-fork/commit/a6a3308969278cf9ec36132fc9235314fd091c23

I only did a little bit of poking to get a minimum-size patch, but the dependency chain and code organization of these pieces seem wrong. Can it be improved? Many of us are not using mlir-hlo as part of XLA, where maybe this is tolerated, and it is counter-intuitive that the code is laid out with such a large transitive dependency set of compiler internals just to get the top-level dialect.

pretty print chlo.comparison*

I saw this op flying by in a review and it could be quite a bit more succinct:

chlo.broadcast_compare %[[T0]], %[[T1]] {compare_type = #chlo<comparison_type FLOAT>, comparison_direction = #chlo<comparison_direction LT>} : (tensor<?x?xf32>, tensor<64xf32>) -> tensor<?x?xi1>

could be

chlo.broadcast_compare %[[T0]], %[[T1]] FLOAT, LT : tensor<?x?xf32>, tensor<64xf32> -> tensor<?x?xi1>

Or just drop FLOAT

chlo.broadcast_compare LT, %[[T0]], %[[T1]] : tensor<?x?xf32>, tensor<64xf32> -> tensor<?x?xi1>

@GleasonK

mhlo.dot inferred shape is incompatible with return type

I am working with mhlo in C++ for some external bindings. I have this code to add a dot operation to an existing function:

mlir::Value MLIRFunction::DotOp(mlir::Value lhs, mlir::Value rhs) {
  module_->builder()->setInsertionPointToEnd(&func_->getBody().back());
  mlir::ArrayAttr emptyPrecisionAttr = module_->builder()->getArrayAttr({});
  auto op = module_->builder()->create<mlir::mhlo::DotOp>(module_->builder()->getUnknownLoc(), lhs.getType(), lhs, rhs, emptyPrecisionAttr);
  return op;
}

This produces this mlir module:

"builtin.module"() ({
  "func.func"() <{function_type = (tensor<1000000xf32>) -> tensor<1000000xf32>, sym_name = "main"}> ({
  ^bb0(%arg0: tensor<1000000xf32>):
    %0 = "mhlo.dot"(%arg0, %arg0) {precision_config = []} : (tensor<1000000xf32>, tensor<1000000xf32>) -> tensor<1000000xf32>
    "func.return"(%0) : (tensor<1000000xf32>) -> ()
  }) : () -> ()
}) : () -> ()

and trying to execute this results in this error:

** (RuntimeError) <unknown>:0: error: inferred shape '[]' is incompatible with return type of operation 'tensor<1000000xf32>'

Not sure why the shape inference is failing here.
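
For what it's worth, the error message indicates the op's shape inference derives a rank-0 result for a 1-D x 1-D dot (a dot product), while the builder call above reuses the lhs type as the result type. A hedged sketch of IR that should verify under that inference rule:

%0 = "mhlo.dot"(%arg0, %arg0) {precision_config = []} : (tensor<1000000xf32>, tensor<1000000xf32>) -> tensor<f32>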

Support Converting TF dialect to HLO dialect entirely?

Hi folks,

I'm converting a TF model to the HLO dialect based on the TensorFlow framework. The conversion sequence can be described as pbtxt -> executor -> functional:

tf-mlir-translate -graphdef-to-mlir -tf-enable-shape-inference-on-import=false saved_model.pbtxt -o resnet_50.mlir
tf-opt -tf-executor-to-functional-conversion resnet_50.mlir -o resnet_50-func.mlir

However, when converting further from the TF dialect to HLO via the tf-to-hlo-pipeline, the types in TF seem not to match those in MHLO.

lenet_tf_launch_gpu.mlir:49:14: error: 'mhlo.dynamic_broadcast_in_dim' op operand #0 must be tensor of floating-point or pred (AKA boolean or 1-bit integer) or 8/16/32/64-bit signless integer or 8/16/32/64-bit unsigned integer or complex type with 32-bit float or 64-bit float elements values, but got 'tensor<32x!tf_type.f32ref>'
%1 = "tf.BiasAdd"(%outputs_18, %outputs_12) {data_format = "NHWC"} : (tensor<?x?x?x32xf32>, tensor<32xf32>) -> tensor<?x?x?x32xf32>
^
lenet_tf_launch_gpu.mlir:49:14: note: see current operation: %38 = "mhlo.dynamic_broadcast_in_dim"(%6#0, %37) {broadcast_dimensions = dense<3> : tensor<1xi64>} : (tensor<32x!tf_type.f32ref>, tensor<4xindex>) -> tensor<?x?x?x32xf32>

So, my question is: can the TF dialect be lowered to HLO entirely at this point? Or is there any advice on how to convert a model to the HLO dialect?

Thanks

`erfinv(1.)` should be `inf`

erfinv was added in 221ac0e, and erfinv(1.) evaluates to a large positive float instead of inf (this surfaced in JAX when JAX 0.4.6 was released and broke a test in TensorFlow Probability that compares TF and JAX erfinv).

lhlo-copy-removal pass crash

I'm not sure whether issues can be posted on this repo. If not, I can move it to TensorFlow proper.

This can be reproduced with a recent commit (d4dcba1340f363762cc6003d4ed1f4db2df61858) and in all certainty with trunk as well. The input isn't really the expected one for this pass, but this is a bug apparently stemming from an assumption about the input. A check / bail-out would have been fine, for example.

Input:

func @func_op_long(%arg0: memref<4xf32>, %arg1: memref<4xf32>, %arg2: memref<4xf32>) {
    %0 = alloc() : memref<4xf32>
    affine.for %arg3 = 0 to 4 {
      %5 = affine.load %arg0[%arg3] : memref<4xf32>
      %6 = affine.load %arg1[%arg3] : memref<4xf32>
      %7 = cmpf "ogt", %5, %6 : f32
      %8 = select %7, %5, %6 : f32
      affine.store %8, %0[%arg3] : memref<4xf32>
    }
    "lmhlo.copy"(%0, %arg2) : (memref<4xf32>, memref<4xf32>) -> ()
    return
  }
$ mlir-hlo-opt -lhlo-copy-removal   /tmp/crash.mlir 
mlir-hlo-opt: external/llvm-project/mlir/lib/IR/Operation.cpp:330: bool mlir::Operation::isBeforeInBlock(mlir::Operation*): Assertion `other && other->block == block && "Expected other operation to have the same parent block."' failed.
PLEASE submit a bug report to  and include the crash backtrace.
Stack dump:
0.	Program arguments: bazel-bin/tensorflow/compiler/mlir/hlo/mlir-hlo-opt -lhlo-copy-removal /tmp/crash.mlir 
 #0 0x00000000014c1d7d llvm::sys::PrintStackTrace(llvm::raw_ostream&) (bazel-bin/tensorflow/compiler/mlir/hlo/mlir-hlo-opt+0x14c1d7d)
 #1 0x00000000014bfaed llvm::sys::RunSignalHandlers() (bazel-bin/tensorflow/compiler/mlir/hlo/mlir-hlo-opt+0x14bfaed)
 #2 0x00000000014c041d SignalHandler(int) (bazel-bin/tensorflow/compiler/mlir/hlo/mlir-hlo-opt+0x14c041d)
 #3 0x00007ff0c1bfcdd0 __restore_rt (/lib64/libpthread.so.0+0x12dd0)
 #4 0x00007ff0c164770f raise (/lib64/libc.so.6+0x3770f)
 #5 0x00007ff0c1631b25 abort (/lib64/libc.so.6+0x21b25)
 #6 0x00007ff0c16319f9 _nl_load_domain.cold.0 (/lib64/libc.so.6+0x219f9)
 #7 0x00007ff0c163fcc6 (/lib64/libc.so.6+0x2fcc6)
 #8 0x00000000014526f3 mlir::Operation::isBeforeInBlock(mlir::Operation*) (bazel-bin/tensorflow/compiler/mlir/hlo/mlir-hlo-opt+0x14526f3)
 #9 0x00000000009f0acf _ZN4llvm12function_refIFvPN4mlir9OperationEEE11callback_fnIZNS1_6detail14walkOperationsIZNS1_5lmhlo12_GLOBAL__N_119LhloCopyRemovalPass14runOnOperationEvEUlNS9_6CopyOpEE_SC_vEENSt9enable_ifIXaantsrSt7is_sameIT0_S3_E5valuesrSF_IT1_vE5valueESI_E4typeES3_OT_EUlS3_E_EEvlS3_ (bazel-bin/tensorflow/compiler/mlir/hlo/mlir-hlo-opt+0x9f0acf)
#10 0x00000000014821e7 mlir::detail::walkOperations(mlir::Operation*, llvm::function_ref<void (mlir::Operation*)>) (bazel-bin/tensorflow/compiler/mlir/hlo/mlir-hlo-opt+0x14821e7)
#11 0x00000000014821e7 mlir::detail::walkOperations(mlir::Operation*, llvm::function_ref<void (mlir::Operation*)>) (bazel-bin/tensorflow/compiler/mlir/hlo/mlir-hlo-opt+0x14821e7)
#12 0x00000000009f073f mlir::lmhlo::(anonymous namespace)::LhloCopyRemovalPass::runOnOperation() (bazel-bin/tensorflow/compiler/mlir/hlo/mlir-hlo-opt+0x9f073f)
#13 0x00000000013d37ce mlir::Pass::run(mlir::Operation*, mlir::AnalysisManager) (bazel-bin/tensorflow/compiler/mlir/hlo/mlir-hlo-opt+0x13d37ce)
#14 0x00000000013d38ba mlir::OpPassManager::run(mlir::Operation*, mlir::AnalysisManager) (bazel-bin/tensorflow/compiler/mlir/hlo/mlir-hlo-opt+0x13d38ba)
#15 0x00000000013da139 mlir::PassManager::run(mlir::ModuleOp) (bazel-bin/tensorflow/compiler/mlir/hlo/mlir-hlo-opt+0x13da139)
#16 0x0000000000c40320 performActions(llvm::raw_ostream&, bool, bool, llvm::SourceMgr&, mlir::MLIRContext*, mlir::PassPipelineCLParser const&) (.constprop.101) (bazel-bin/tensorflow/compiler/mlir/hlo/mlir-hlo-opt+0xc40320)
#17 0x0000000000c40b89 processBuffer(llvm::raw_ostream&, std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer> >, bool, bool, bool, bool, mlir::PassPipelineCLParser const&, mlir::DialectRegistry&) (bazel-bin/tensorflow/compiler/mlir/hlo/mlir-hlo-opt+0xc40b89)
#18 0x0000000000c40cd0 mlir::MlirOptMain(llvm::raw_ostream&, std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer> >, mlir::PassPipelineCLParser const&, mlir::DialectRegistry&, bool, bool, bool, bool, bool) (bazel-bin/tensorflow/compiler/mlir/hlo/mlir-hlo-opt+0xc40cd0)
#19 0x0000000000c4149d mlir::MlirOptMain(int, char**, llvm::StringRef, mlir::DialectRegistry&, bool) (bazel-bin/tensorflow/compiler/mlir/hlo/mlir-hlo-opt+0xc4149d)
#20 0x000000000096a885 main (bazel-bin/tensorflow/compiler/mlir/hlo/mlir-hlo-opt+0x96a885)
#21 0x00007ff0c16336a3 __libc_start_main (/lib64/libc.so.6+0x236a3)
#22 0x000000000096402e _start (bazel-bin/tensorflow/compiler/mlir/hlo/mlir-hlo-opt+0x96402e)
Aborted (core dumped)

@dfki-ehna, @joker-eph

Missing legalization for `mhlo.scatter` to standard MLIR

Is there a pass (or pass sequence) that can lower the mhlo.scatter operation to standard MLIR dialects, such as linalg or tensor?

The goal is to lower to the LLVM dialect and perform codegen with LLVM. I wasn't able to find a pass that converts the op out of the MLIR-HLO dialect domain.

Support i4 type in MHLO

MHLO currently limits operations to 8/16/32/64-bit types. We would like support for i4 to enable quantization work; see the hypothetical sketch below.
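
For illustration, a hypothetical sketch of the kind of IR this request would enable (not accepted today, given the type constraints above):

func.func @quantized_add(%arg0: tensor<4xi4>, %arg1: tensor<4xi4>) -> tensor<4xi4> {
  // Hypothetical: mhlo.add on an i4 element type.
  %0 = mhlo.add %arg0, %arg1 : tensor<4xi4>
  return %0 : tensor<4xi4>
}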

cmake build error

cmd: build_tools/build_mlir.sh ${PWD}/llvm-project/ ${PWD}/llvm-build

log:

-- Looking for strerror_r - found
-- Looking for strerror_s
-- Looking for strerror_s - not found
-- Looking for setenv
-- Looking for setenv - found
-- Looking for dlopen
-- Looking for dlopen - found
-- Looking for dladdr
-- Looking for dladdr - not found
-- Performing Test HAVE_STRUCT_STAT_ST_MTIMESPEC_TV_NSEC
-- Performing Test HAVE_STRUCT_STAT_ST_MTIMESPEC_TV_NSEC - Failed
-- Performing Test HAVE_STRUCT_STAT_ST_MTIM_TV_NSEC
-- Performing Test HAVE_STRUCT_STAT_ST_MTIM_TV_NSEC - Success
-- Looking for GLIBC
-- Looking for GLIBC - found
-- Looking for pthread_getname_np
-- Looking for pthread_getname_np - found
-- Looking for pthread_setname_np
-- Looking for pthread_setname_np - found
-- Performing Test HAVE_CXX_ATOMICS_WITHOUT_LIB
-- Performing Test HAVE_CXX_ATOMICS_WITHOUT_LIB - Success
-- Performing Test HAVE_CXX_ATOMICS64_WITHOUT_LIB
-- Performing Test HAVE_CXX_ATOMICS64_WITHOUT_LIB - Success
-- Performing Test LLVM_HAS_ATOMICS
-- Performing Test LLVM_HAS_ATOMICS - Success
-- Performing Test SUPPORTS_VARIADIC_MACROS_FLAG
-- Performing Test SUPPORTS_VARIADIC_MACROS_FLAG - Success
-- Performing Test SUPPORTS_GNU_ZERO_VARIADIC_MACRO_ARGUMENTS_FLAG
-- Performing Test SUPPORTS_GNU_ZERO_VARIADIC_MACRO_ARGUMENTS_FLAG - Failed
-- Performing Test HAS_MAYBE_UNINITIALIZED
-- Performing Test HAS_MAYBE_UNINITIALIZED - Success
-- Native target architecture is X86
-- Threads enabled.
-- Doxygen disabled.
-- Go bindings disabled.
-- Ninja version: 1.8.2
-- Found OCaml: /usr/bin/ocamlfind
-- OCaml bindings disabled, need ctypes >=0.4.
-- Could NOT find Python module pygments
-- Could NOT find Python module pygments.lexers.c_cpp
-- Could NOT find Python module yaml
-- LLVM host triple: x86_64-unknown-linux-gnu
-- LLVM default target triple: x86_64-unknown-linux-gnu
-- Performing Test CXX_SUPPORTS_CUSTOM_LINKER
-- Performing Test CXX_SUPPORTS_CUSTOM_LINKER - Failed
CMake Error at cmake/modules/HandleLLVMOptions.cmake:282 (message):
Host compiler does not support '-fuse-ld=lld'
Call Stack (most recent call first):
CMakeLists.txt:703 (include)

-- Configuring incomplete, errors occurred!
See also "/root/llvm_mlir/mlir-hlo/llvm-build/CMakeFiles/CMakeOutput.log".
See also "/root/llvm_mlir/mlir-hlo/llvm-build/CMakeFiles/CMakeError.log".

mlir-hlo:1c6c04f76b474f866a3a6b6df44c6afbd412697d
llvm:baa005c96ce610e9ee91ef55a3a1b1eacd5a0a27

Inclusive language (sanity -> safety)

With moves towards inclusive language:
https://www.ibm.com/blogs/think/2020/08/words-matter-driving-thoughtful-change-toward-inclusive-language-in-technology/
we'd like to request that references to "sanity" be changed to "safety" or another more inclusive term. "Sanity" isn't an inclusive term for those coming from neurodiverse backgrounds.

The two hits we find are at:

tiling_softmax.cc#tilePartialSoftmax: undefined reference to `mlir::gml_st::isSimpleBcastReduction'

Compiling mlir-hlo as a dep of torch-mlir I get a linker error:

/usr/bin/ld: lib/libGmlStPasses.a(tiling_softmax.cc.o): in function `mlir::gml_st::(anonymous namespace)::tilePartialSoftmax(mlir::TilingInterface, mlir::PatternRewriter&, llvm::function_ref<mlir::FailureOr<mlir::TilingResult> (mlir::Operation*, long)>)':
undefined reference to `mlir::gml_st::isSimpleBcastReduction(mlir::Operation*, long*, mlir::gml_st::SimpleBcastReduction*)'

This symbol is at gml_st/utils/linalg_utils.cc#L43.

Tracking it down, it seems gml_st/transforms/CMakeLists.txt#L61 depends on MLIRGmlStUtils but doesn't actually link it.

cc @powderluv

MHLO operation regions need to use scalar arguments

MHLO operations that have regions use zero-rank tensors to represent what are really scalar values. For example:

func @reduce_one_op_all_locs_same(%arg0: tensor<?x?xf32>, %arg1 : tensor<f32>) -> (tensor<?xf32>) {
  %0 = "mhlo.reduce"(%arg0, %arg1) ( {
  ^bb0(%arg2: tensor<f32> loc("foo"), %arg3: tensor<f32> loc("foo")):
    %1 = "mhlo.add"(%arg2, %arg3) : (tensor<f32>, tensor<f32>) -> tensor<f32> loc("foo")
    "mhlo.return"(%1) : (tensor<f32>) -> () loc("foo")
  }) {dimensions = dense<[1]> : tensor<1xi64>} : (tensor<?x?xf32>, tensor<f32>) -> tensor<?xf32> loc("foo")

  return %0: tensor<?xf32>
}

There are a couple of issues here.

  1. The region of the mhlo.reduce here has an mhlo.add. The way one would lower mhlo.add to, say, the linalg dialect is very different depending on whether the operation is within an mhlo op or at the top level. This seems to be a conflation between different uses of the mhlo.add operation. It would be much easier to handle if mhlo.add were only used at the top level and a different operation were used within mhlo operations.
  2. The region of the mhlo operation in this case is a sequence of computations that are really scalar. Using tensors of zero rank introduces additional complexity when translating this to the Linalg dialect, since it requires a type conversion of the arguments from zero-rank tensors to scalars. Having scalars before the conversion would remove a lot of the complexity; see the sketch after this list.
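
A hypothetical sketch of what point 2 asks for, with scalar block arguments instead of zero-rank tensors (not valid MHLO today; shown purely to illustrate the proposal):

%0 = "mhlo.reduce"(%arg0, %arg1) ({
^bb0(%arg2: f32, %arg3: f32):
  // Scalar computation, no tensor<f32> wrapping.
  %1 = addf %arg2, %arg3 : f32
  "mhlo.return"(%1) : (f32) -> ()
}) {dimensions = dense<[1]> : tensor<1xi64>} : (tensor<?x?xf32>, tensor<f32>) -> tensor<?xf32>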

Build warning spew

Hi, MHLO has a number of build warnings that are causing spew in the Torch-MLIR build.

 /main_checkout/torch-mlir/externals/mlir-hlo/lib/Dialect/mhlo/IR/hlo_ops.cc: In function ‘mlir::LogicalResult mlir::mhlo::sortOpInferDefaultDimension(mlir::mhlo::SortOp, mlir::PatternRewriter&)’:
  /main_checkout/torch-mlir/externals/mlir-hlo/lib/Dialect/mhlo/IR/hlo_ops.cc:6593:22: warning: comparison of integer expressions of different signedness: ‘uint64_t’ {aka ‘long unsigned int’} and ‘int’ [-Wsign-compare]
   6593 |   if (op.dimension() != -1) {
  /main_checkout/torch-mlir/externals/mlir-hlo/lib/Dialect/mhlo/IR/hlo_ops.cc: In member function ‘mlir::LogicalResult mlir::mhlo::ReduceWindowOp::verify()’:
  /main_checkout/torch-mlir/externals/mlir-hlo/lib/Dialect/mhlo/IR/hlo_ops.cc:4210:29: warning: comparison of integer expressions of different signedness: ‘int64_t’ {aka ‘long int’} and ‘size_t’ {aka ‘long unsigned int’} [-Wsign-compare]
   4210 |     if (inputType.getRank() != windowDims.size())
        |         ~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~
  /main_checkout/torch-mlir/externals/mlir-hlo/lib/Dialect/mhlo/IR/hlo_ops.cc:6011:26: warning: extra ‘;’ [-Wpedantic]
   6011 | BINARY_FOLDER(MinOp, Min);
        |                          ^

For full list search for "mhlo" in https://github.com/llvm/torch-mlir/runs/7598076835?check_suite_focus=true

Missing CMake Dependency

When building on a single core I get the following error:

cmake --build /home/erick/code/catalyst-latest/mlir//mlir-hlo/build --target check-mlir-hlo -j 1
[160/305] Building CXX object mhlo/analysis/CMakeFiles/obj.MhloTestAnalysis.dir/test_shape_component_analysis.cc.o
FAILED: mhlo/analysis/CMakeFiles/obj.MhloTestAnalysis.dir/test_shape_component_analysis.cc.o 
ccache /usr/bin/clang++ -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_LIBCPP_ENABLE_ASSERTIONS -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/erick/code/catalyst-latest/mlir/llvm-project/llvm/include -I/home/erick/code/catalyst-latest/mlir/llvm-project/build/include -I/home/erick/code/catalyst-latest/mlir/llvm-project/mlir/include -I/home/erick/code/catalyst-latest/mlir/llvm-project/build/tools/mlir/include -I/home/erick/code/catalyst-latest/mlir/mlir-hlo -I/home/erick/code/catalyst-latest/mlir/mlir-hlo/build -I/home/erick/code/catalyst-latest/mlir/mlir-hlo/stablehlo -I/home/erick/code/catalyst-latest/mlir/mlir-hlo/build/stablehlo -fPIC -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG   -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_LIBCPP_ENABLE_ASSERTIONS -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS  -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -std=gnu++17 -MD -MT mhlo/analysis/CMakeFiles/obj.MhloTestAnalysis.dir/test_shape_component_analysis.cc.o -MF mhlo/analysis/CMakeFiles/obj.MhloTestAnalysis.dir/test_shape_component_analysis.cc.o.d -o mhlo/analysis/CMakeFiles/obj.MhloTestAnalysis.dir/test_shape_component_analysis.cc.o -c /home/erick/code/catalyst-latest/mlir/mlir-hlo/mhlo/analysis/test_shape_component_analysis.cc
/home/erick/code/catalyst-latest/mlir/mlir-hlo/mhlo/analysis/test_shape_component_analysis.cc:24:10: fatal error: 'transforms/passes.h.inc' file not found
#include "transforms/passes.h.inc"
         ^~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
ninja: build stopped: subcommand failed.
make[1]: *** [Makefile:53: mhlo] Error 1
make[1]: Leaving directory '/home/erick/code/catalyst-latest/mlir'
make: *** [Makefile:44: mhlo] Error 2

Fixed with:

diff --git a/mhlo/analysis/CMakeLists.txt b/mhlo/analysis/CMakeLists.txt
index 4e15d73..a10d88d 100644
--- a/mhlo/analysis/CMakeLists.txt
+++ b/mhlo/analysis/CMakeLists.txt
@@ -13,6 +13,9 @@ add_mlir_library(MhloAnalysis
 add_mlir_library(MhloTestAnalysis
   test_shape_component_analysis.cc
 
+  DEPENDS
+  LMHLOTransformsPassIncGen
+
   LINK_COMPONENTS
   Core
 

mlir in tensorflow training?

Hi, is it possible to use mlir to produce LLVM IR for machine learning training, e.g., with GPU support?
I cannot find any code in tensorflow that uses mlir and turns back into the tensorflow executor. Is mlir therefore only useful for inference? I wonder if tensorflow could turn some ops into IR, process them, and then merge the IR result back into the tensorflow process?

mhlo-fusion bug

// CHECK-LABEL: func @elementwise_fusion
func @elementwise_fusion(%arg0: tensor<4x16xi32>, %arg1: tensor<4x16xi32>) -> tensor<2x4xi32> {
  %0 = "mhlo.add"(%arg0, %arg1) : (tensor<4x16xi32>, tensor<4x16xi32>) -> tensor<4x16xi32>
  %1 = "mhlo.subtract"(%0, %arg0) : (tensor<4x16xi32>, tensor<4x16xi32>) -> tensor<4x16xi32>
  %2 = "mhlo.slice"(%1) {limit_indices = dense<[2, 8]> : tensor<2xi64>, start_indices = dense<0> : tensor<2xi64>, strides = dense<1> : tensor<2xi64>} : (tensor<4x16xi32>) -> tensor<2x8xi32>
  %3 = "mhlo.multiply"(%0, %1) : (tensor<4x16xi32>, tensor<4x16xi32>) -> tensor<4x16xi32>
  %4 = "mhlo.slice"(%3) {limit_indices = dense<[2, 8]> : tensor<2xi64>, start_indices = dense<0> : tensor<2xi64>, strides = dense<[1, 2]> : tensor<2xi64>} : (tensor<4x16xi32>) -> tensor<2x4xi32>
  return %4 : tensor<2x4xi32>
}

Paste this snippet at the end of the test file mhlo-fusion.mlir and run ninja check-mlir-hlo.
Running the IR above through mhlo-fusion gives the wrong result; running the pass alone, the IR output is as follows:

loc("-":5:10): error: operand #0 does not dominate this use
// -----// IR Dump After MhloFusionPass Failed ('builtin.func' operation: @main) //----- //
"builtin.module"() ( {
  "builtin.func"() ( {
  ^bb0(%arg0: tensor<4x16xi32>, %arg1: tensor<4x16xi32>):  // no predecessors
    %0 = "mhlo.slice"(%1#0) {limit_indices = dense<[2, 8]> : tensor<2xi64>, start_indices = dense<0> : tensor<2xi64>, strides = dense<1> : tensor<2xi64>} : (tensor<4x16xi32>) -> tensor<2x8xi32>
    %1:2 = "mhlo.fusion"(%arg0, %arg1) ( {
      %3 = "mhlo.add"(%arg0, %arg1) : (tensor<4x16xi32>, tensor<4x16xi32>) -> tensor<4x16xi32>
      %4 = "mhlo.subtract"(%3, %arg0) : (tensor<4x16xi32>, tensor<4x16xi32>) -> tensor<4x16xi32>
      %5 = "mhlo.multiply"(%3, %4) : (tensor<4x16xi32>, tensor<4x16xi32>) -> tensor<4x16xi32>
      "mhlo.return"(%4, %5) : (tensor<4x16xi32>, tensor<4x16xi32>) -> ()
    }) : (tensor<4x16xi32>, tensor<4x16xi32>) -> (tensor<4x16xi32>, tensor<4x16xi32>)
    %2 = "mhlo.slice"(%1#1) {limit_indices = dense<[2, 8]> : tensor<2xi64>, start_indices = dense<0> : tensor<2xi64>, strides = dense<[1, 2]> : tensor<2xi64>} : (tensor<4x16xi32>) -> tensor<2x4xi32>
    "std.return"(%2) : (tensor<2x4xi32>) -> ()
  }) {sym_name = "main", type = (tensor<4x16xi32>, tensor<4x16xi32>) -> tensor<2x4xi32>} : () -> ()
}) : () -> ()

%0 = "mhlo.slice"(%1#0) should follow by fusion op

The OpBuilder is created by the statement OpBuilder b(pattern.back());. The fusion op is inserted after all of the fused ops; other consumers located between the fused ops need to be moved after it, in post order.

while loop simplifier missing

The XLA service has while_loop_simplifier.
MHLO already has loop-invariant code motion.
But the following optimizations are missing:

  • Eliminate a loop with a zero trip count:

while (operands) {
  cond(operands) {
    return false
  }
  body(operands) {
  }
}

replace with: operands

  • For a loop with a trip count of one, inline the while's body.

Currently mhlo::WhileOp is just lowered to scf::WhileOp; I'm not sure if scf::WhileOp has similar simplifications. If mhlo::WhileOp needs to support this functionality, it may need to simulate execution of the WhileOp's body, or use data-flow analysis to implement constant propagation to simulate body execution of the WhileOp. A sketch of the zero-trip-count case follows.

Failed to import repo as a third party dependency using bazel

Hi there, we're currently working on a graph compiler based on [some front graph]-mhlo-linalg-[some backend]. When we tried to use mhlo as a third-party dependency in another repo with the following BUILD:

cc_library(
  name = "utils",
  srcs = [
    "utils.cc",
  ],
  hdrs = [
    "utils.h",
  ],
  deps = [
    "@mlir-hlo//stablehlo:stablehlo_ops",
    "@mlir-hlo//stablehlo:chlo_ops",
    "@mlir-hlo//stablehlo:vhlo_ops",
    "@llvm-project//mlir:IR",
    "@llvm-project//mlir:Dialect",
    "@llvm-project//mlir:Support"
  ]
)

An error occurred in bazel build //target: "this rule is missing dependency declarations for the following files".

We found that this error may be caused by the attr strip_include_prefix = "." in the BUILD file; the extensive use of this attribute seems due to file includes, mainly for generated files:
https://github.com/tensorflow/mlir-hlo/blob/master/mhlo/IR/hlo_ops.cc#L87
Once all file includes use full paths, the problem is solved.

We are wondering: are these file includes written this way for some specific CMake purpose (we tried Bazel only), or are they just a known issue that needs to be rewritten in the future?

ld.lld: error: unable to find library -lMLIRTosa

Not sure if this is my fault or not. I followed the build instructions from the README, including selecting the specific LLVM commit, and when linking mlir-hlo-opt I get the following error:

ld.lld: error: unable to find library -lMLIRTosa

If I replace the name with MLIRTosaDialect, it links.

There doesn't seem to be an MLIRTosa.a library in the LLVM build dir (or CMake files for Tosa), neither at the specific LLVM commit nor on trunk, so I'm not sure how this builds for other people.

I also looked at the stream of commits and there are two new merges from TF's LLVM, tried them both, nope.

Am I doing something wrong?

Build & Test sections in readme are inaccurate

Hi,

I'm getting started with this project and found that the readme doesn't work well.

1st - building. It would help to add a step of building llvm with -DLLVM_ENABLE_PROJECTS="clang;lld" after checking it out, i.e., inside ./llvm-project. Otherwise error #4 appears. Alternatively, having llvm (clang + lld) in the prerequisites may work well enough.

2nd - testing. The last test produces an error:

[303/304] Running the mlir-hlo regression tests
llvm-lit: .../mlir-hlo/llvm-project/llvm/utils/lit/lit/llvm/subst.py:122: note: Did not find mlir-cpu-runner in .../mlir-hlo/llvm-build:.../mlir-hlo/build/bin

The reason is that mlir_binary_dir doesn't actually point to the .../bin subdirectory. Using llvm_tools_dir resolves the problem,
in ./tests/lit.cfg.py:

 tool_dirs = [
-    config.mlir_binary_dir,
+    config.llvm_tools_dir,
     config.mlir_hlo_tools_dir,
 ]

or maybe add llvm_tools_dir to mlir_hlo_tools_dir in lit.site.cfg.py.in

How to extend stablehlo to stablehlo_ext?

I want to extend stablehlo to stablehlo_ext to add some ops for my compiler.

...
def StableHloExt_Dialect : Dialect {
  let name = "stablehlo_ext";
  let cppNamespace = "::mlir::stablehlo_ext";
  let dependentDialects = ["mlir::stablehlo"];
}
...

It will generate some code in the .inc file, such as:

StableHloExtDialect::StableHloExtDialect(::mlir::MLIRContext *context)
    : ::mlir::Dialect(getDialectNamespace(), context, ::mlir::TypeID::get<StableHloExtDialect>()) {
  
    getContext()->loadDialect<xxx>();//xxx means mlir::stablehloDialect

  initialize();
}

But mlir-hlo does not have info about "mlir::stablehloDialect" in an hpp file like other dialects do. What should I do? Thanks.

[cmake build] MhloDialect target no longer carries correct binary include dir

After this latest change to mlir-hlo/mhlo/CMakeLists.txt, the MhloDialect target no longer has the correct binary include directories attached to its INTERFACE_INCLUDE_DIRECTORIES. I now have to do the following in my downstream project when building mlir-hlo via the "LLVM External Project" route:

get_target_property(mhlo_includes_ MhloDialect INTERFACE_INCLUDE_DIRECTORIES)
list(APPEND mhlo_includes_ $<BUILD_INTERFACE:${LLVM_BINARY_DIR}/tools/mlir-hlo>)
set_target_properties(MhloDialect PROPERTIES INTERFACE_INCLUDE_DIRECTORIES "${mhlo_includes_}")

Otherwise, the build will fail on downstream targets that depend on MhloDialect. The compiler will say:

 fatal error: 'mhlo/IR/hlo_ops_enums.h.inc' file not found

Note that this is different from #52 (which probably can now be closed).

Edit: fixed the workaround

cudnn call support

Is there any plan to support cudnn/cublas calls for convolution and dot computation?

release request

This project seems to be under active development, and the llvm-project commit it depends on keeps changing. Is there any release plan for this repo? It would be good to see several "stable" releases of this project.

Support lowering mhlo.gather to linalg

The only supported lowering of mhlo.gather to Linalg is via the subset that can be lowered to torch_index_select (which is then lowered to Linalg).

One approach would be to decompose this mondo-op with some transposes and reshapes to drop the extra attributes that basically just accomplish that. This should probably be a separate pass, since not all backends will necessarily be able to lower the result into performant code, but it would certainly make lowering to linalg easier.

I may work on this if we decide we want it badly enough in IREE in the near future.

mhlo.xor folding bug when lhs == rhs

(cc @rsuderman)

Looks like the way the 0 value is being created here is wrong:

https://github.com/tensorflow/tensorflow/blob/02f317584da257aafd262e3c1488b96b7722246e/tensorflow/compiler/mlir/hlo/lib/Dialect/mhlo/IR/hlo_ops.cc#L1363-L1366

Repro:

    func @bools_rgn_dispatch_0(%arg0: tensor<4xi1>) -> tensor<4xi1> {
      %0 = mhlo.xor %arg0, %arg0: tensor<4xi1>
      return %0 : tensor<4xi1>
    }

Results in this assert:

Assertion failed: ::isValidIntOrFloat(type.getElementType(), dataEltSize, isInt, isSigned), file D:\Dev\iree\third_party\llvm-project\mlir\lib\IR\Attributes.cpp, line 1114

It should be using builder.getZeroAttr(rType) instead.
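
For reference, a sketch of the expected fold once fixed: xor of a value with itself should fold to an all-false constant of the result type.

func @bools_rgn_dispatch_0(%arg0: tensor<4xi1>) -> tensor<4xi1> {
  // Folded: %arg0 xor %arg0 == 0 (false for i1).
  %0 = mhlo.constant dense<false> : tensor<4xi1>
  return %0 : tensor<4xi1>
}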

legalizing/converting mhlo.convolution to linalg.conv

Is there a way to convert/legalize mhlo.convolution to linalg.conv?
I see /mlir-hlo/tests/hlo-legalize-to-lhlo.mlir supporting a) conversion from "mhlo.convolution" to "lmhlo.convolution",
and b) further conversion from "lmhlo.convolution" to "linalg.conv" in /mlir-hlo/tests/lhlo-legalize-to-linalg.mlir.
The problem with conversion b) is that the linalg.conv is not inside a linalg.generic.

#map = affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>
module {
  func @conv(%arg0: memref<3x5x5x3xf32>, %arg1: memref<2x2x3x4xf32>, %arg2: memref<3x5x5x4xf32>) {
    %c0 = constant 0 : index
    %0 = alloc() : memref<3x5x5x4xf32>
    linalg.conv(%arg0, %arg1, %0) {dilations = [1, 2], padding = dense<[[0, 1], [0, 1]]> : tensor<2x2xi64>, strides = [2, 1]} : memref<3x5x5x3xf32>, memref<2x2x3x4xf32>, memref<3x5x5x4xf32>
    linalg.conv(%arg0, %arg1, %0) {dilations = [1, 1], strides = [2, 1]} : memref<3x5x5x3xf32>, memref<2x2x3x4xf32>, memref<3x5x5x4xf32>
    linalg.generic {indexing_maps = [#map, #map], iterator_types = ["parallel", "parallel", "parallel", "parallel"]} ins(%0 : memref<3x5x5x4xf32>) outs(%arg2 : memref<3x5x5x4xf32>) {
    ^bb0(%arg3: f32, %arg4: f32): // no predecessors
      linalg.yield %arg3 : f32
    }
    "lmhlo.terminator"() : () -> ()
  }
}

I also see the linalg-fusion-for-tensor-ops pass, which can fuse HLO pointwise operators into a region, e.g., add/mul fused as std operations inside a generic op:

func @float_add(%lhs: tensor<2x2xf32>, %rhs: tensor<2x2xf32>) -> tensor<2x2xf32> {
  %0 = "mhlo.add"(%lhs, %rhs) : (tensor<2x2xf32>, tensor<2x2xf32>) -> tensor<2x2xf32>
  %1 = "mhlo.multiply"(%lhs, %0) : (tensor<2x2xf32>, tensor<2x2xf32>) -> tensor<2x2xf32>
  return %1 : tensor<2x2xf32>
}

module {
  func @float_add(%arg0: tensor<2x2xf32>, %arg1: tensor<2x2xf32>) -> tensor<2x2xf32> {
    %0 = linalg.init_tensor [2, 2] : tensor<2x2xf32>
    %1 = linalg.generic {indexing_maps = [#map, #map, #map], iterator_types = ["parallel", "parallel"]} ins(%arg0, %arg1 : tensor<2x2xf32>, tensor<2x2xf32>) outs(%0 : tensor<2x2xf32>) {
    ^bb0(%arg2: f32, %arg3: f32, %arg4: f32): // no predecessors
      %4 = addf %arg2, %arg3 : f32
      linalg.yield %4 : f32
    } -> tensor<2x2xf32>
    %2 = linalg.init_tensor [2, 2] : tensor<2x2xf32>
    %3 = linalg.generic {indexing_maps = [#map, #map, #map], iterator_types = ["parallel", "parallel"]} ins(%arg0, %1 : tensor<2x2xf32>, tensor<2x2xf32>) outs(%2 : tensor<2x2xf32>) {
    ^bb0(%arg2: f32, %arg3: f32, %arg4: f32): // no predecessors
      %4 = mulf %arg2, %arg3 : f32
      linalg.yield %4 : f32
    } -> tensor<2x2xf32>
    return %3 : tensor<2x2xf32>
  }
}

After fusion, this becomes:

#map = affine_map<(d0, d1) -> (d0, d1)>
module {
  func @float_add(%arg0: tensor<2x2xf32>, %arg1: tensor<2x2xf32>) -> tensor<2x2xf32> {
    %0 = linalg.init_tensor [2, 2] : tensor<2x2xf32>
    %1 = linalg.generic {indexing_maps = [#map, #map, #map], iterator_types = ["parallel", "parallel"]} ins(%arg0, %arg1 : tensor<2x2xf32>, tensor<2x2xf32>) outs(%0 : tensor<2x2xf32>) {
    ^bb0(%arg2: f32, %arg3: f32, %arg4: f32): // no predecessors
      %2 = addf %arg2, %arg3 : f32
      %3 = mulf %arg2, %2 : f32
      linalg.yield %3 : f32
    } -> tensor<2x2xf32>
    return %1 : tensor<2x2xf32>
  }
}

I would like to fuse linalg Conv and Relu inside a region. How do I get something like the following, where the Conv gets inside the block?
Can the methodology used by pointwise operators be extended to Conv, MatMul, etc.?

linalg.generic {indexing_maps = [#map, #map], iterator_types = ["parallel", "parallel", "parallel", "parallel"]} ins(%0 : memref<3x5x5x4xf32>) outs(%arg2 : memref<3x5x5x4xf32>) {
^bb0(%arg3: f32, %arg4: f32):  // no predecessors
    linalg.conv(%arg0, %arg1, %0) {dilations = [1, 1], strides = [2, 1]} : memref<3x5x5x3xf32>, memref<2x2x3x4xf32>, memref<3x5x5x4xf32>
  linalg.yield %arg3 : f32
}

Are there any examples of tensorflow training a model via mlir?

I want to train a model through tensorflow+mlir, but I can't find a complete model training example. I see some examples converting TensorFlow to the MHLO dialect in the tensorflow/tensorflow/compiler/mlir/tensorflow/tests directory, but they only cover inference, not training. Where can I find training examples?
