enzymead / enzyme.jl

Julia bindings for the Enzyme automatic differentiator

Home Page: https://enzyme.mit.edu

License: MIT License

Julia 99.92% Python 0.08%
ad automatic-differentiation compiler differentiable-programming enzyme julia llvm machine-learning

enzyme.jl's People

Contributors

aviatesk, bitmyte, devmotion, enigne, gaurav-arya, gdalle, giordano, github-actions[bot], jgreener64, jlk9, lassepe, mcabbott, metab0t, michel2323, milescranmer, motabbara, oschulz, pchintalapudi, sethaxen, simsurace, sriharikrishna, st--, swilliamson7, tgymnich, vaibhavdixit02, vchuravy, viralbshah, wsmoses, yingboma, zusez4

enzyme.jl's Issues

Enzyme autodiff breaking on tanh function

The following example crashes Julia on my system

using Enzyme
autodiff(tanh, Active(1.0))

version info of my system:
Julia Version 1.5.2
Commit 539f3ce943 (2020-09-23 23:17 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin18.7.0)
CPU: Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-9.0.1 (ORCJIT, skylake)

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

Support Julia 1.6

Julia Version 1.6.0-beta1
Commit b84990e1ac (2021-01-08 12:42 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin19.6.0)
  CPU: Apple M1
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.0 (ORCJIT, westmere)
using Distributions, Enzyme
f(x) = logpdf(Normal(0.0, 1.0), x)
autodiff(f, Active(1.0))

Enzyme_jll for 1.6

To capture the discussion I had with @staticfloat on the topic.

 julia> using Base.BinaryPlatforms
       platforms = Dict(
           Platform("x86_64", "linux"; julia_version=v"1.5") => "info for 1.5",
           Platform("x86_64", "linux"; julia_version=v"1.6") => "info for 1.6",
           Platform("x86_64", "linux"; julia_version=v"1.7") => "info for 1.7",
       )
       select_platform(platforms, Platform("x86_64", "linux"; julia_version=v"1.6"))

So what I’m showcasing here is that (1) you can have arbitrary tags assigned to Platform objects, and
(2) you can pattern-match with Base.BinaryPlatforms.select_platform to select the Platform key in a dictionary that best matches the platform you pass in to it.
So if we have a JLL that has a bunch of mappings in its Artifacts.toml, each of which has a julia_version key, it will automatically choose the right one,
and we don’t have to write any custom code on the JLL side because the default HostPlatform already encodes the Julia version:

julia> HostPlatform()
Linux x86_64 {cxxstring_abi=cxx11, julia_version=1.6.0, libc=glibc, libgfortran_version=4.0.0, libstdcxx_version=3.4.26}

We do this because I consider the Julia version a property of the environment.

From the BB perspective, I think what needs to happen is we need to do away with the julia_compat kwarg to build_tarballs() and instead present this information via the julia_version tag on the platform objects.

And then we need to add an expand_julia_versions(platforms, julia_versions) function so that we can do e.g. expand_julia_versions(platforms, [v"1.6", v"1.7"]). Also (and here’s the real kicker), this will only work for v1.6+ because v1.5 doesn’t know how to deal with extended platforms.
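
The proposed expand_julia_versions helper could look roughly like the sketch below. It is an illustration only, not an existing BinaryBuilder function: it rebuilds each platform from just its architecture and OS, so any other tags (libc, cxxstring_abi, and so on) are dropped, and it assumes Julia v1.6+ for Base.BinaryPlatforms.

```julia
using Base.BinaryPlatforms

# Hypothetical sketch of the helper proposed above; not an existing BinaryBuilder function.
# For every (platform, version) pair, build a platform carrying a julia_version tag.
# Simplification: only arch and os are kept; other tags on the input are dropped.
function expand_julia_versions(platforms, julia_versions)
    expanded = Platform[]
    for p in platforms, v in julia_versions
        push!(expanded, Platform(arch(p), os(p); julia_version=v))
    end
    return expanded
end

# Two base platforms times two Julia versions gives four tagged platforms.
base = [Platform("x86_64", "linux"), Platform("aarch64", "linux")]
expanded = expand_julia_versions(base, [v"1.6", v"1.7"])
```

A fuller version would presumably copy every existing tag across rather than only arch and os.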

Okay, so this is a kind of hacky strategy that Stefan and I talked about a long time ago:
https://github.com/JuliaLang/Pkg.jl/blob/v1.5.3/src/Artifacts.jl#L554-L555
This is how v1.5 selects the Artifacts.toml block that gets downloaded.
In v1.5, if you add extended attributes, they will be ignored,
so unpack_platform() will give the same result, ignoring the julia_version key,
and you’ll get two keys that are the same:

julia> Dict("a" => 1, "b" => 2, "a" => 3)
Dict{String, Int64} with 2 entries:
  "b" => 2
  "a" => 3

So you need to make sure that the Julia v1.5 entry is the last one in the file.
So basically, write out all the entries into the TOML file, but then ensure that the Julia v1.5 mappings are after the Julia v1.6 mappings. I don’t really like it, but it should function.
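
The ordering trick can be modelled with a plain Dict, as a minimal sketch. This is not Pkg's actual selection code; the keys and build names below are made up for illustration, and the only assumption is the one stated above: v1.5 ignores julia_version, so both entries collapse to the same key.

```julia
# Hypothetical Artifacts.toml entries in file order. The key is the platform as
# v1.5 sees it (julia_version ignored); the value names the build (illustrative).
entries = [
    "x86_64-linux" => "build for Julia 1.6",
    "x86_64-linux" => "build for Julia 1.5",   # must come last for v1.5
]

# Filling a Dict in file order mimics the collision: a later entry overwrites an earlier one.
seen_by_v15 = Dict{String,String}()
for (key, build) in entries
    seen_by_v15[key] = build
end
```

With the v1.5 entry written last, the single surviving key maps to the v1.5 build; swapping the two entries would leave v1.5 with the v1.6 build instead.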

cc @giordano

Active integer differentiation

I think NamedTuples are allowed as Actives, but currently we get the following (this may be a known problem):

julia> autodiff(x -> x.a * x.b, Active((a = 2, b = 3)))
((a = 0, b = 0),)
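
For reference, the mathematically expected gradient of x.a * x.b at (a = 2, b = 3) is (a = 3, b = 2). The sketch below checks that value with central finite differences on Float64 fields; it does not call Enzyme. That the zero result comes from the fields being Int is an assumption based on the issue title, not something the report confirms.

```julia
f(x) = x.a * x.b

# Central finite differences in each field, using Float64 values.
h = 1e-6
x = (a = 2.0, b = 3.0)
dfda = (f((a = x.a + h, b = x.b)) - f((a = x.a - h, b = x.b))) / (2h)
dfdb = (f((a = x.a, b = x.b + h)) - f((a = x.a, b = x.b - h))) / (2h)
```

Here dfda approximates the partial derivative with respect to a (which equals b) and dfdb the one with respect to b (which equals a).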

Debug build complains about missing DICompileUnit

julia> using Enzyme

julia> f(x) = 1.0 + x
f (generic function with 1 method)

julia> g(x) = Enzyme.autodiff(f, x)
g (generic function with 1 method)

julia> @code_llvm g(1.0)
saw metadata for diffe_out
DICompileUnit not listed in llvm.dbg.cu
!33 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !32, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !4)
ERROR: LLVM error: Broken module found, compilation aborted!
Stacktrace:
 [1] handle_error(::Cstring) at /home/vchuravy/src/LLVM/src/core/context.jl:103
 [2] _dump_function_linfo(::Core.MethodInstance, ::UInt64, ::Bool, ::Bool, ::Bool, ::Bool, ::Symbol, ::Bool, ::Symbol, ::Base.CodegenParams) at /home/vchuravy/builds/julia-debug/usr/share/julia/stdlib/v1.4/InteractiveUtils/src/codeview.jl:102
 [3] _dump_function(::Any, ::Any, ::Bool, ::Bool, ::Bool, ::Bool, ::Symbol, ::Bool, ::Symbol, ::Base.CodegenParams) at /home/vchuravy/builds/julia-debug/usr/share/julia/stdlib/v1.4/InteractiveUtils/src/codeview.jl:84
 [4] _dump_function at /home/vchuravy/builds/julia-debug/usr/share/julia/stdlib/v1.4/InteractiveUtils/src/codeview.jl:71 [inlined]
 [5] code_llvm at /home/vchuravy/builds/julia-debug/usr/share/julia/stdlib/v1.4/InteractiveUtils/src/codeview.jl:132 [inlined]
 [6] #code_llvm#7 at /home/vchuravy/builds/julia-debug/usr/share/julia/stdlib/v1.4/InteractiveUtils/src/codeview.jl:134 [inlined]
 [7] (::InteractiveUtils.var"#kw##code_llvm")(::NamedTuple{(:raw, :dump_module, :optimize, :debuginfo),Tuple{Bool,Bool,Bool,Symbol}}, ::typeof(code_llvm), ::Base.TTY, ::Function, ::Type) at /home/vchuravy/builds/julia-debug/usr/share/julia/stdlib/v1.4/InteractiveUtils/src/codeview.jl:134
 [8] #code_llvm#8(::Bool, ::Bool, ::Bool, ::Symbol, ::typeof(code_llvm), ::Any, ::Any) at /home/vchuravy/builds/julia-debug/usr/share/julia/stdlib/v1.4/InteractiveUtils/src/codeview.jl:136
 [9] code_llvm(::Any, ::Any) at /home/vchuravy/builds/julia-debug/usr/share/julia/stdlib/v1.4/InteractiveUtils/src/codeview.jl:136
 [10] top-level scope at REPL[4]:1

julia>

Pre-optimization:

julia> llvmf, mod = Enzyme.emit(typeof(f), (Float64,)); mod
; ModuleID = 'f'
source_filename = "f"
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128-ni:10:11:12:13"
target triple = "x86_64-pc-linux-gnu"

%jl_value_t = type opaque

; Function Attrs: sspstrong
define double @julia_f_2(double) #0 !dbg !5 {
top:
  %1 = call %jl_value_t*** @julia.ptls_states()
  %2 = bitcast %jl_value_t*** %1 to %jl_value_t addrspace(10)**
  %3 = getelementptr inbounds %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)** %2, i64 4
  %4 = bitcast %jl_value_t addrspace(10)** %3 to i64**
  %5 = load i64*, i64** %4, !tbaa !13, !invariant.load !4
  %x = alloca double
  store double %0, double* %x
  call void @llvm.dbg.declare(metadata double* %x, metadata !12, metadata !DIExpression()), !dbg !16
  %6 = fadd double 1.000000e+00, %0, !dbg !17
  ret double %6, !dbg !16
}

declare %jl_value_t*** @julia.ptls_states()

; Function Attrs: allocsize(1)
declare noalias nonnull %jl_value_t addrspace(10)* @julia.gc_alloc_obj(i8*, i64, %jl_value_t addrspace(10)*) #1

; Function Attrs: nounwind readnone speculatable
declare void @llvm.dbg.declare(metadata, metadata, metadata) #2

; Function Attrs: alwaysinline
define double @enzyme_entry(double) #3 {
entry:
  %1 = call double (i8*, ...) @__enzyme_autodiff(i8* bitcast (double (double)* @julia_f_2 to i8*), metadata !"diffe_out", double %0)
  ret double %1
}

declare double @__enzyme_autodiff(i8*, ...)

attributes #0 = { sspstrong }
attributes #1 = { allocsize(1) }
attributes #2 = { nounwind readnone speculatable }
attributes #3 = { alwaysinline }

!llvm.module.flags = !{!0, !1}
!llvm.dbg.cu = !{!2}

!0 = !{i32 2, !"Dwarf Version", i32 4}
!1 = !{i32 1, !"Debug Info Version", i32 3}
!2 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !3, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !4)
!3 = !DIFile(filename: "REPL[2]", directory: ".")
!4 = !{}
!5 = distinct !DISubprogram(name: "f", linkageName: "julia_f_17137", scope: null, file: !3, line: 1, type: !6, isLocal: false, isDefinition: true, scopeLine: 1, isOptimized: true, unit: !2, variables: !9)
!6 = !DISubroutineType(types: !7)
!7 = !{!8, !8}
!8 = !DIBasicType(name: "Float64", size: 64, encoding: DW_ATE_unsigned)
!9 = !{!10, !12}
!10 = !DILocalVariable(name: "#self#", arg: 1, scope: !5, file: !3, line: 1, type: !11)
!11 = !DICompositeType(tag: DW_TAG_structure_type, name: "#f", align: 8, elements: !4, runtimeLang: DW_LANG_Julia, identifier: "49701")
!12 = !DILocalVariable(name: "x", arg: 2, scope: !5, file: !3, line: 1, type: !8)
!13 = !{!14, !14, i64 0, i64 1}
!14 = !{!"jtbaa_const", !15, i64 0}
!15 = !{!"jtbaa"}
!16 = !DILocation(line: 1, scope: !5)
!17 = !DILocation(line: 401, scope: !18, inlinedAt: !16)
!18 = distinct !DISubprogram(name: "+;", linkageName: "+", scope: !19, file: !19, type: !20, isLocal: false, isDefinition: true, isOptimized: true, unit: !2, variables: !4)
!19 = !DIFile(filename: "float.jl", directory: ".")
!20 = !DISubroutineType(types: !4)

Post-optimization:

julia> Enzyme.optimize!(mod)
saw metadata for diffe_out
true

julia> mod
; ModuleID = 'f'
source_filename = "f"
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128-ni:10:11:12:13"
target triple = "x86_64-pc-linux-gnu"

%jl_value_t = type opaque

; Function Attrs: sspstrong
define double @julia_f_2(double) #0 !dbg !5 {
top:
  call void @llvm.dbg.value(metadata double %0, metadata !12, metadata !DIExpression()), !dbg !13
  %1 = fadd double %0, 1.000000e+00, !dbg !14
  ret double %1, !dbg !13
}

; Function Attrs: allocsize(1)
declare noalias nonnull %jl_value_t addrspace(10)* @julia.gc_alloc_obj(i8*, i64, %jl_value_t addrspace(10)*) #1

; Function Attrs: nounwind readnone speculatable
declare void @llvm.dbg.declare(metadata, metadata, metadata) #2

; Function Attrs: alwaysinline
define double @enzyme_entry(double) #3 {
entry:
  %1 = call { double } @diffejulia_f_2(double %0, double 1.000000e+00)
  %2 = extractvalue { double } %1, 0
  ret double %2
}

declare double @__enzyme_autodiff(i8*, ...)

; Function Attrs: argmemonly nounwind
declare void @llvm.lifetime.start.p0i8(i64, i8* nocapture) #4

; Function Attrs: argmemonly nounwind
declare void @llvm.lifetime.end.p0i8(i64, i8* nocapture) #4

; Function Attrs: inaccessiblemem_or_argmemonly
declare void @jl_gc_queue_root(%jl_value_t addrspace(10)*) #5

; Function Attrs: allocsize(1)
declare noalias nonnull %jl_value_t addrspace(10)* @jl_gc_pool_alloc(i8*, i32, i32) #1

; Function Attrs: allocsize(1)
declare noalias nonnull %jl_value_t addrspace(10)* @jl_gc_big_alloc(i8*, i64) #1

; Function Attrs: nounwind readnone speculatable
declare void @llvm.dbg.value(metadata, metadata, metadata) #2

; Function Attrs: sspstrong
define double @preprocess_julia_f_2(double) #0 !dbg !18 {
top:
  call void @llvm.dbg.value(metadata double %0, metadata !21, metadata !DIExpression()), !dbg !22
  %1 = fadd double %0, 1.000000e+00, !dbg !23
  ret double %1, !dbg !22
}

; Function Attrs: sspstrong
define internal { double } @diffejulia_f_2(double, double %differeturn) #0 !dbg !24 {
top:
  call void @llvm.dbg.value(metadata double %0, metadata !27, metadata !DIExpression()), !dbg !28
  %"'de" = alloca double
  store double 0.000000e+00, double* %"'de"
  %"'de1" = alloca double
  store double 0.000000e+00, double* %"'de1"
  %1 = fadd double %0, 1.000000e+00, !dbg !29
  br label %inverttop, !dbg !28

inverttop:                                        ; preds = %top
  store double %differeturn, double* %"'de"
  %2 = load double, double* %"'de"
  store double 0.000000e+00, double* %"'de"
  %3 = load double, double* %"'de1"
  %4 = fadd fast double %3, %2
  store double %4, double* %"'de1"
  %5 = load double, double* %"'de1"
  %6 = insertvalue { double } undef, double %5, 0
  ret { double } %6
}

attributes #0 = { sspstrong }
attributes #1 = { allocsize(1) }
attributes #2 = { nounwind readnone speculatable }
attributes #3 = { alwaysinline }
attributes #4 = { argmemonly nounwind }
attributes #5 = { inaccessiblemem_or_argmemonly }

!llvm.module.flags = !{!0, !1}
!llvm.dbg.cu = !{!2}

!0 = !{i32 2, !"Dwarf Version", i32 4}
!1 = !{i32 1, !"Debug Info Version", i32 3}
!2 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !3, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !4)
!3 = !DIFile(filename: "REPL[2]", directory: ".")
!4 = !{}
!5 = distinct !DISubprogram(name: "f", linkageName: "julia_f_17137", scope: null, file: !3, line: 1, type: !6, isLocal: false, isDefinition: true, scopeLine: 1, isOptimized: true, unit: !2, variables: !9)
!6 = !DISubroutineType(types: !7)
!7 = !{!8, !8}
!8 = !DIBasicType(name: "Float64", size: 64, encoding: DW_ATE_unsigned)
!9 = !{!10, !12}
!10 = !DILocalVariable(name: "#self#", arg: 1, scope: !5, file: !3, line: 1, type: !11)
!11 = !DICompositeType(tag: DW_TAG_structure_type, name: "#f", align: 8, elements: !4, runtimeLang: DW_LANG_Julia, identifier: "49701")
!12 = !DILocalVariable(name: "x", arg: 2, scope: !5, file: !3, line: 1, type: !8)
!13 = !DILocation(line: 1, scope: !5)
!14 = !DILocation(line: 401, scope: !15, inlinedAt: !13)
!15 = distinct !DISubprogram(name: "+;", linkageName: "+", scope: !16, file: !16, type: !17, isLocal: false, isDefinition: true, isOptimized: true, unit: !2, variables: !4)
!16 = !DIFile(filename: "float.jl", directory: ".")
!17 = !DISubroutineType(types: !4)
!18 = distinct !DISubprogram(name: "f", linkageName: "julia_f_17137", scope: null, file: !3, line: 1, type: !6, isLocal: false, isDefinition: true, scopeLine: 1, isOptimized: true, unit: !2, variables: !19)
!19 = !{!20, !21}
!20 = !DILocalVariable(name: "#self#", arg: 1, scope: !18, file: !3, line: 1, type: !11)
!21 = !DILocalVariable(name: "x", arg: 2, scope: !18, file: !3, line: 1, type: !8)
!22 = !DILocation(line: 1, scope: !18)
!23 = !DILocation(line: 401, scope: !15, inlinedAt: !22)
!24 = distinct !DISubprogram(name: "f", linkageName: "julia_f_17137", scope: null, file: !3, line: 1, type: !6, isLocal: false, isDefinition: true, scopeLine: 1, isOptimized: true, unit: !2, variables: !25)
!25 = !{!26, !27}
!26 = !DILocalVariable(name: "#self#", arg: 1, scope: !24, file: !3, line: 1, type: !11)
!27 = !DILocalVariable(name: "x", arg: 2, scope: !24, file: !3, line: 1, type: !8)
!28 = !DILocation(line: 1, scope: !24)
!29 = !DILocation(line: 401, scope: !15, inlinedAt: !28)

Crashes with BandedMatrices.jl

I'm looking for a reverse mode autodiff package that supports in-place mutation and @ChrisRackauckas suggested this. Unfortunately it crashed on first attempt:

julia> using FillArrays, BandedMatrices, Enzyme

julia> f = x -> BandedMatrix(0 => Fill(x^2,10))[1,1]
#1 (generic function with 1 method)

julia> f(0.1)
0.010000000000000002

julia> autodiff(f, Active(0.1))
mod: ; ModuleID = 'text'
source_filename = "text"
target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128-ni:10:11:12:13"
target triple = "x86_64-apple-darwin20.4.0"

@_j_const1 = private unnamed_addr constant [2 x i64] [i64 10, i64 10]
@_j_const2 = private unnamed_addr constant [2 x i64] zeroinitializer
@exception.38 = private unnamed_addr constant [10 x i8] c"exception\00", align 1
@exception.44 = private unnamed_addr constant [13 x i8] c"bounds error\00", align 1

define internal fastcc nonnull {} addrspace(10)* @julia_dec_1477(i64 zeroext %0, i8 zeroext %1) unnamed_addr !dbg !65 {
top:
  %2 = call {}*** @julia.ptls_states()
  %3 = call fastcc i64 @julia_ndigits0zpb_1459(i64 zeroext %0), !dbg !67
  %.not = icmp slt i64 %3, 1, !dbg !74
  %4 = select i1 %.not, i64 1, i64 %3, !dbg !77
  %5 = and i8 %1, 1, !dbg !80
  %6 = zext i8 %5 to i64, !dbg !80
  %7 = add i64 %4, %6, !dbg !91
  %8 = icmp sgt i64 %7, -1, !dbg !93
  br i1 %8, label %L33, label %L25, !dbg !95

L25:                                              ; preds = %top
  %9 = call fastcc nonnull {} addrspace(10)* @julia_throw_inexacterror_1461(i64 signext %7) #6, !dbg !95
  unreachable, !dbg !95

L33:                                              ; preds = %top
  %10 = call nonnull {} addrspace(10)* @jl_alloc_string(i64 %7), !dbg !105
  %11 = call nonnull {} addrspace(10)* @jl_string_to_array({} addrspace(10)* nonnull %10), !dbg !112
  %12 = icmp slt i64 %7, 2, !dbg !114
  br i1 %12, label %L56, label %pass.lr.ph, !dbg !119

Crash with LoopVectorization

Enzyme currently (current master branch) crashes with LoopVectorization. The last line in this example

using Enzyme
using LoopVectorization

function mymul_simd!(R, A, B)
    @assert axes(A,2) == axes(B,1) && axes(R,1) == axes(A,1) && axes(R,2) == axes(B,2)
    @inbounds @simd for i in eachindex(R)
        R[i] = 0
    end
    @inbounds for j in axes(B, 2), i in axes(A, 1)
        @inbounds @simd for k in axes(A,2)
            R[i,j] += A[i,k] * B[k,j]
        end
    end
    nothing
end

function mymul_turbo!(R, A, B)
    @assert axes(A,2) == axes(B,1) && axes(R,1) == axes(A,1) && axes(R,2) == axes(B,2)
    @inbounds @turbo for i in eachindex(R)
        R[i] = 0
    end
    @inbounds @turbo for j in axes(B, 2), i in axes(A, 1), k in axes(A,2)
            R[i,j] += A[i,k] * B[k,j]
    end
    nothing
end

A = rand(500, 300)
B = rand(300, 700)
R = zeros(size(A,1), size(B,2))

@assert (fill!(R, NaN); mymul_simd!(R, A, B); R ≈ A * B)
@assert (fill!(R, NaN); mymul_turbo!(R, A, B); R ≈ A * B)

dA = similar(A); dB = similar(B); dR = similar(R)

fill!(R, NaN); fill!(dR, 1); fill!(dA, 0); fill!(dB, 0)
Enzyme.autodiff(mymul_simd!, Duplicated(R, dR), Duplicated(A, dA), Duplicated(B, dB))

fill!(R, NaN); fill!(dR, 1); fill!(dA, 0); fill!(dB, 0)
Enzyme.autodiff(mymul_turbo!, Duplicated(R, dR), Duplicated(A, dA), Duplicated(B, dB))

results in

  call:   %45 = call token (...) @llvm.julia.gc_preserve_begin({} addrspace(10)* nonnull %0), !dbg !77
 +   %53 = call nonnull align 8 {}* @julia.pointer_from_objref({} addrspace(11)* %52) #6, !dbg !240
 +   %3 = call {}*** @julia.ptls_states()
 +   %42 = call nonnull align 8 {}* @julia.pointer_from_objref({} addrspace(11)* %41) #6, !dbg !78
 +   call fastcc void @julia__turbo___6288(i64 signext %33, i64 signext %24, i64 signext %8, i64 zeroext %55, i64 zeroext %59, i64 zeroext %51, i64 signext %res.i5.i, i64 signext %res.i4.i, i64 signext %res.i6.i), !dbg !239
 +   %57 = call nonnull align 8 {}* @julia.pointer_from_objref({} addrspace(11)* %56) #6, !dbg !240
julia: /workspace/srcdir/Enzyme/enzyme/Enzyme/AdjointGenerator.h:3553: void AdjointGenerator<AugmentedReturnType>::visitCallInst(llvm::CallInst&) [with AugmentedReturnType = const AugmentedReturn*]: Assertion `uncacheable_args_map.find(&call) != uncacheable_args_map.end()' failed.

signal (6): Aborted

Reflection broken

function f(x)
         LinearAlgebra.BLAS.dot(length(x), x, 1, x, 1)
end

julia> Enzyme.Compiler.enzyme_code_llvm(f, Tuple{Enzyme.Duplicated{Vector{Float64}}}, run_enzyme=false, second_stage=false)
ERROR: UndefVarError: wrapper! not defined
Stacktrace:
 [1] reflect(::Any, ::Any; optimize::Bool, run_enzyme::Bool, second_stage::Bool) at /home/vchuravy/.julia/packages/Enzyme/XmiBH/src/compiler/reflection.jl:13
 [2] enzyme_code_llvm(::Base.TTY, ::Any, ::Any; optimize::Bool, run_enzyme::Bool, second_stage::Bool, raw::Bool, debuginfo::Symbol, dump_module::Bool) at /home/vchuravy/.julia/packages/Enzyme/XmiBH/src/compiler/reflection.jl:26
 [3] enzyme_code_llvm(::Any, ::Any; kwargs::Base.Iterators.Pairs{Symbol,Bool,Tuple{Symbol,Symbol},NamedTuple{(:run_enzyme, :second_stage),Tuple{Bool,Bool}}}) at /home/vchuravy/.julia/packages/Enzyme/XmiBH/src/compiler/reflection.jl:33
 [4] top-level scope at REPL[12]:1

Decide on ABI

We are currently mimicking the julia_ API since we use GPUCompiler and it is meant for environments where there is no dynamic execution. I am slightly worried that we will run into ABI mismatches and it might be better to default to the jl_ ABI and set the types passed to ccall to Any. We would need to make sure that Enzyme can deal with the wrapper code.

Matrix pullback

The following doesn't give up, but hangs forever on the master branch:

julia> moo(x) = fill(x, 10, 10)
moo (generic function with 1 method)

julia> Enzyme.pullback(moo, 2.1)(rand(10, 10))
^C^C^C^C^C^CWARNING: Force throwing a SIGINT
ERROR: InterruptException:
Stacktrace:
 [1] top-level scope
   @ REPL[21]:1

Crash with Tullio

Enzyme currently (current master branch) crashes with Tullio. The last line in this example

using Enzyme
using Tullio

function mymul_simd!(R, A, B)
    @assert axes(A,2) == axes(B,1) && axes(R,1) == axes(A,1) && axes(R,2) == axes(B,2)
    @inbounds @simd for i in eachindex(R)
        R[i] = 0
    end
    @inbounds for j in axes(B, 2), i in axes(A, 1)
        @inbounds @simd for k in axes(A,2)
            R[i,j] += A[i,k] * B[k,j]
        end
    end
    nothing
end

function mymul_tullio!(R, A, B)
    @tullio R[i,j] = A[i,k] * B[k,j]
    nothing
end

A = rand(500, 300)
B = rand(300, 700)
R = zeros(size(A,1), size(B,2))

@assert (fill!(R, NaN); mymul_simd!(R, A, B); R ≈ A * B)
@assert (fill!(R, NaN); mymul_tullio!(R, A, B); R ≈ A * B)

dA = similar(A); dB = similar(B); dR = similar(R)

fill!(R, NaN); fill!(dR, 1); fill!(dA, 0); fill!(dB, 0)
Enzyme.autodiff(mymul_simd!, Duplicated(R, dR), Duplicated(A, dA), Duplicated(B, dB))

fill!(R, NaN); fill!(dR, 1); fill!(dA, 0); fill!(dB, 0)
Enzyme.autodiff(mymul_tullio!, Duplicated(R, dR), Duplicated(A, dA), Duplicated(B, dB))

results in

not handling more than 6 pointer lookups deep dt:{[-1]:Pointer, [-1,0]:Pointer, [-1,0,0]:Pointer, [-1,0,0,0]:Pointer, [-1,0,0,0,56]:Integer, [-1,0,0,0,57]:Integer, [-1,0,0,8]:Pointer, [-1,0,0,8,0]:Pointer, [-1,0,0,8,0,56]:Integer, [-1,0,0,8,0,57]:Integer, [-1,0,0,8,8]:Pointer, [-1,0,0,8,8,56]:Integer, [-1,0,0,8,8,57]:Integer, [-1,0,0,56]:Integer, [-1,0,0,57]:Integer, [-1,0,8]:Pointer, [-1,0,8,0]:Pointer, [-1,0,8,0,0]:Pointer, [-1,0,8,0,0,56]:Integer, [-1,0,8,0,0,57]:Integer, [-1,0,8,0,8]:Pointer, [-1,0,8,0,8,0]:Pointer} adding v: [-1,0,8,0,8,0,56]: Integer
not handling more than 6 pointer lookups deep dt:{[-1]:Pointer, [-1,0]:Pointer, [-1,0,0]:Pointer, [-1,0,0,0]:Pointer, [-1,0,0,0,56]:Integer, [-1,0,0,0,57]:Integer, [-1,0,0,8]:Pointer, [-1,0,0,8,0]:Pointer, [-1,0,0,8,0,56]:Integer, [-1,0,0,8,0,57]:Integer, [-1,0,0,8,8]:Pointer, [-1,0,0,8,8,56]:Integer, [-1,0,0,8,8,57]:Integer, [-1,0,0,56]:Integer, [-1,0,0,57]:Integer, [-1,0,8]:Pointer, [-1,0,8,0]:Pointer, [-1,0,8,0,0]:Pointer, [-1,0,8,0,0,56]:Integer, [-1,0,8,0,0,57]:Integer, [-1,0,8,0,8]:Pointer, [-1,0,8,0,8,0]:Pointer} adding v: [-1,0,8,0,8,0,57]: Integer
not handling more than 6 pointer lookups deep dt:{[-1]:Pointer, [-1,0]:Pointer, [-1,0,0]:Pointer, [-1,0,0,0]:Pointer, [-1,0,0,0,56]:Integer, [-1,0,0,0,57]:Integer, [-1,0,0,8]:Pointer, [-1,0,0,8,0]:Pointer, [-1,0,0,8,0,56]:Integer, [-1,0,0,8,0,57]:Integer, [-1,0,0,8,8]:Pointer, [-1,0,0,8,8,56]:Integer, [-1,0,0,8,8,57]:Integer, [-1,0,0,56]:Integer, [-1,0,0,57]:Integer, [-1,0,8]:Pointer, [-1,0,8,0]:Pointer, [-1,0,8,0,0]:Pointer, [-1,0,8,0,0,56]:Integer, [-1,0,8,0,0,57]:Integer, [-1,0,8,0,8]:Pointer, [-1,0,8,0,8,0]:Pointer} adding v: [-1,0,8,0,8,0,56]: Integer
not handling more than 6 pointer lookups deep dt:{[-1]:Pointer, [-1,0]:Pointer, [-1,0,0]:Pointer, [-1,0,0,0]:Pointer, [-1,0,0,0,56]:Integer, [-1,0,0,0,57]:Integer, [-1,0,0,8]:Pointer, [-1,0,0,8,0]:Pointer, [-1,0,0,8,0,56]:Integer, [-1,0,0,8,0,57]:Integer, [-1,0,0,8,8]:Pointer, [-1,0,0,8,8,56]:Integer, [-1,0,0,8,8,57]:Integer, [-1,0,0,56]:Integer, [-1,0,0,57]:Integer, [-1,0,8]:Pointer, [-1,0,8,0]:Pointer, [-1,0,8,0,0]:Pointer, [-1,0,8,0,0,56]:Integer, [-1,0,8,0,0,57]:Integer, [-1,0,8,0,8]:Pointer, [-1,0,8,0,8,0]:Pointer} adding v: [-1,0,8,0,8,0,57]: Integer
not handling more than 6 pointer lookups deep dt:{[-1]:Pointer, [-1,0]:Pointer, [-1,0,0]:Pointer, [-1,0,0,0]:Pointer, [-1,0,0,0,0]:Pointer, [-1,0,0,0,0,56]:Integer, [-1,0,0,0,0,57]:Integer, [-1,0,0,0,8]:Pointer, [-1,0,0,0,8,0]:Pointer} adding v: [-1,0,0,0,8,0,56]: Integer
not handling more than 6 pointer lookups deep dt:{[-1]:Pointer, [-1,0]:Pointer, [-1,0,0]:Pointer, [-1,0,0,0]:Pointer, [-1,0,0,0,0]:Pointer, [-1,0,0,0,0,56]:Integer, [-1,0,0,0,0,57]:Integer, [-1,0,0,0,8]:Pointer, [-1,0,0,0,8,0]:Pointer} adding v: [-1,0,0,0,8,0,57]: Integer
...

Tullio support would of course be awesome, if feasible (easy CPU+GPU compatible code, etc.). While Tullio provides pullbacks for ChainRulesCore and Zygote, they do have significant overhead for smaller arrays, so code with smaller arrays and several Tullio loops may profit a lot from using Enzyme.

This should be unrelated to #98 (Tullio is running without LoopVectorization here, though Tullio plus LoopVectorization is even more awesome, naturally).
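
Once the crash is resolved, the accumulated dA and dB could be compared against the closed-form pullback of a matrix product. The sketch below is an independent reference in plain Julia and does not call Enzyme or Tullio; the small sizes and the names dA_ref and dB_ref are illustrative choices, not taken from the issue.

```julia
using LinearAlgebra

# Small sizes for a quick check (the issue uses 500x300 and 300x700).
A = rand(5, 3)
B = rand(3, 7)
dR = ones(size(A, 1), size(B, 2))   # seed, matching fill!(dR, 1) in the example

# Closed-form pullback of R = A * B for the seed dR.
dA_ref = dR * B'
dB_ref = A' * dR
```

With an all-ones seed, each entry dA_ref[i, k] is the sum of row k of B, and each entry dB_ref[k, j] is the sum of column k of A.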

Differentiate `dsyevr_64_`

I'm not sure why it complains that the return type is Union{}; the return type of this foo function is quite simple. I got:

julia> foo(x) = sum(exp(fill(x, 10, 10)))
foo (generic function with 1 method)

julia> Enzyme.autodiff(foo, Active(2.0))
ERROR: return type is Union{}, giving up.
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:33
 [2] enzyme!(job::GPUCompiler.CompilerJob{Enzyme.Compiler.EnzymeTarget, Enzyme.Compiler.EnzymeCompilerParams, GPUCompiler.FunctionSpec{typeof(Cassette.overdub), Tuple{Cassette.Context{nametype(EnzymeCtx), Nothing, Nothing, Enzyme.Compiler.var"##PassType#257", Nothing, Cassette.DisableHooks}, typeof(foo), Float64}}}, mod::LLVM.Module, primalf::LLVM.Function, adjoint::GPUCompiler.FunctionSpec{typeof(foo), Tuple{Active{Float64}}}, split::Bool, parallel::Bool)
   @ Enzyme.Compiler ~/.julia/packages/Enzyme/A97If/src/compiler.jl:143
 [3] codegen(output::Symbol, job::GPUCompiler.CompilerJob{Enzyme.Compiler.EnzymeTarget, Enzyme.Compiler.EnzymeCompilerParams, GPUCompiler.FunctionSpec{typeof(Cassette.overdub), Tuple{Cassette.Context{nametype(EnzymeCtx), Nothing, Nothing, Enzyme.Compiler.var"##PassType#257", Nothing, Cassette.DisableHooks}, typeof(foo), Float64}}}; libraries::Bool, deferred_codegen::Bool, optimize::Bool, strip::Bool, validate::Bool, only_entry::Bool, parent_job::Nothing)
   @ Enzyme.Compiler ~/.julia/packages/Enzyme/A97If/src/compiler.jl:435
 [4] _thunk(job::GPUCompiler.CompilerJob{Enzyme.Compiler.EnzymeTarget, Enzyme.Compiler.EnzymeCompilerParams, GPUCompiler.FunctionSpec{typeof(Cassette.overdub), Tuple{Cassette.Context{nametype(EnzymeCtx), Nothing, Nothing, Enzyme.Compiler.var"##PassType#257", Nothing, Cassette.DisableHooks}, typeof(foo), Float64}}})
   @ Enzyme.Compiler ~/.julia/packages/Enzyme/A97If/src/compiler.jl:761
 [5] callback(orc_ref::Ptr{LLVM.API.LLVMOrcOpaqueJITStack}, callback_ctx::Ptr{Nothing})
   @ Enzyme.Compiler ~/.julia/packages/Enzyme/A97If/src/compiler.jl:822
 [6] top-level scope
   @ REPL[11]:1

julia> @code_warntype foo(2.0)
Variables
  #self#::Core.Const(foo)
  x::Float64

Body::Float64
1 ─ %1 = Main.fill(x, 10, 10)::Matrix{Float64}
│   %2 = Main.exp(%1)::Matrix{Float64}
│   %3 = Main.sum(%2)::Float64
└──      return %3
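
A closed form for foo gives a reference value for checking any derivative that eventually comes back. The sketch below derives it in plain Julia without Enzyme; foo_closed and dfoo_closed are names introduced here. The issue title presumably refers to the LAPACK symmetric eigensolver that exp reaches for this symmetric input, but that link is an assumption, not something the report states.

```julia
using LinearAlgebra

foo(x) = sum(exp(fill(x, 10, 10)))

# With J the 10x10 all-ones matrix, J^2 = 10J, so
#   exp(x*J) = I + J * (exp(10x) - 1) / 10
# and summing all entries gives 10 + 10 * (exp(10x) - 1) = 10 * exp(10x).
foo_closed(x) = 10 * exp(10x)
dfoo_closed(x) = 100 * exp(10x)   # derivative of the closed form
```

A correct autodiff result for foo at 2.0 should therefore agree with dfoo_closed(2.0) up to floating-point error.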

libLLVM-11jl.so: cannot open shared object file

On Arch Linux, the following problem in Enzyme breaks DiffEqSensitivity, DiffEqFlux, NeuralPDE...

ERROR: LoadError: LoadError: InitError: could not load library "/home/seadra/.julia/artifacts/106bae7ef747d3f252feb096f357f676565fcde0/lib/libEnzyme-11.so"
libLLVM-11jl.so: cannot open shared object file: No such file or directory

Arch Linux uses LLVM 12.0.0.

Undubb'd tanh analysis fails

define internal fastcc double @preprocess_julia_tanh_1253(double) unnamed_addr !dbg !109 {
top:
  %1 = call %jl_value_t*** @julia.ptls_states()
  %2 = fcmp ord double %0, 0.000000e+00, !dbg !110
  br i1 %2, label %L4, label %L3, !dbg !112

L3:                                               ; preds = %L22, %top
  ret double %0, !dbg !113

L4:                                               ; preds = %top
  %3 = fsub double %0, %0, !dbg !114
  %4 = fcmp oeq double %3, 0.000000e+00, !dbg !118
  br i1 %4, label %L19, label %L17, !dbg !117

L17:                                              ; preds = %L4
  %5 = bitcast double %0 to i64, !dbg !120
  %6 = and i64 %5, -9223372036854775808, !dbg !120
  %7 = or i64 %6, 4607182418800017408, !dbg !120
  %8 = bitcast i64 %7 to double, !dbg !121
  ret double %8, !dbg !121

L19:                                              ; preds = %L4
  %9 = call double @llvm.fabs.f64(double %0), !dbg !122
  %10 = fcmp uge double %9, 2.200000e+01, !dbg !124
  br i1 %10, label %L42, label %L22, !dbg !125

L22:                                              ; preds = %L19
  %11 = fcmp uge double %9, 0x3E30000000000000, !dbg !126
  br i1 %11, label %L26, label %L3, !dbg !127

L26:                                              ; preds = %L22
  %12 = fcmp ult double %9, 1.000000e+00, !dbg !128
  br i1 %12, label %L34, label %L28, !dbg !130

L28:                                              ; preds = %L26
  %13 = fmul double %9, 2.000000e+00, !dbg !131
  %14 = call cc37 nonnull %jl_value_t addrspace(10)* bitcast (%jl_value_t addrspace(10)* (%jl_value_t addrspace(10)*, %jl_value_t addrspace(10)**, i32)* @jl_f_tuple to %jl_value_t addrspace(10)* (%jl_value_t addrspace(10)*, %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)*)*)(%jl_value_t addrspace(10)* addrspacecast (%jl_value_t* null to %jl_value_t addrspace(10)*), %jl_value_t addrspace(10)* addrspacecast (%jl_value_t* inttoptr (i64 139873808773664 to %jl_value_t*) to %jl_value_t addrspace(10)*), %jl_value_t addrspace(10)* addrspacecast (%jl_value_t* inttoptr (i64 139873788832384 to %jl_value_t*) to %jl_value_t addrspace(10)*)), !dbg !133
  %15 = call %jl_value_t addrspace(10)* @julia.typeof(%jl_value_t addrspace(10)* nonnull %14), !dbg !133
  %16 = call %jl_value_t addrspace(10)* @julia.typeof(%jl_value_t addrspace(10)* nonnull %15), !dbg !133
  %17 = icmp eq %jl_value_t addrspace(10)* %16, addrspacecast (%jl_value_t* inttoptr (i64 139873741882160 to %jl_value_t*) to %jl_value_t addrspace(10)*), !dbg !133
  br i1 %17, label %pass, label %fail, !dbg !133

L34:                                              ; preds = %L26
  %18 = fmul double %9, -2.000000e+00, !dbg !134
  %19 = call cc37 nonnull %jl_value_t addrspace(10)* bitcast (%jl_value_t addrspace(10)* (%jl_value_t addrspace(10)*, %jl_value_t addrspace(10)**, i32)* @jl_f_tuple to %jl_value_t addrspace(10)* (%jl_value_t addrspace(10)*, %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)*)*)(%jl_value_t addrspace(10)* addrspacecast (%jl_value_t* null to %jl_value_t addrspace(10)*), %jl_value_t addrspace(10)* addrspacecast (%jl_value_t* inttoptr (i64 139873808773664 to %jl_value_t*) to %jl_value_t addrspace(10)*), %jl_value_t addrspace(10)* addrspacecast (%jl_value_t* inttoptr (i64 139873788832384 to %jl_value_t*) to %jl_value_t addrspace(10)*)), !dbg !136
  %20 = call %jl_value_t addrspace(10)* @julia.typeof(%jl_value_t addrspace(10)* nonnull %19), !dbg !136
  %21 = call %jl_value_t addrspace(10)* @julia.typeof(%jl_value_t addrspace(10)* nonnull %20), !dbg !136
  %22 = icmp eq %jl_value_t addrspace(10)* %21, addrspacecast (%jl_value_t* inttoptr (i64 139873741882160 to %jl_value_t*) to %jl_value_t addrspace(10)*), !dbg !136
  br i1 %22, label %pass7, label %fail6, !dbg !136

L42:                                              ; preds = %pass11, %pass4, %L19
  %value_phi5 = phi double [ %43, %pass4 ], [ %58, %pass11 ], [ 1.000000e+00, %L19 ]
  %23 = bitcast double %value_phi5 to i64, !dbg !137
  %24 = bitcast double %0 to i64, !dbg !137
  %25 = and i64 %24, -9223372036854775808, !dbg !137
  %26 = and i64 %23, 9223372036854775807, !dbg !137
  %27 = or i64 %26, %25, !dbg !137
  %28 = bitcast i64 %27 to double, !dbg !138
  ret double %28, !dbg !138

fail:                                             ; preds = %L28
  %29 = addrspacecast %jl_value_t addrspace(10)* %15 to %jl_value_t addrspace(12)*, !dbg !133
  call void @jl_type_error(i8* nonnull inttoptr (i64 41541232 to i8*), %jl_value_t addrspace(10)* addrspacecast (%jl_value_t* inttoptr (i64 139873741882160 to %jl_value_t*) to %jl_value_t addrspace(10)*), %jl_value_t addrspace(12)* %29), !dbg !133
  unreachable, !dbg !133

pass:                                             ; preds = %L28
  %30 = addrspacecast %jl_value_t addrspace(10)* %15 to %jl_value_t addrspace(11)*, !dbg !133
  %31 = bitcast %jl_value_t addrspace(11)* %30 to %jl_value_t addrspace(10)* addrspace(11)*, !dbg !133
  %32 = load %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)* addrspace(11)* %31, align 8, !dbg !133, !tbaa !86, !invariant.load !4
  %33 = addrspacecast %jl_value_t addrspace(10)* %32 to %jl_value_t addrspace(12)*, !dbg !133
  %34 = icmp eq %jl_value_t addrspace(12)* %33, addrspacecast (%jl_value_t* inttoptr (i64 139873742360480 to %jl_value_t*) to %jl_value_t addrspace(12)*), !dbg !133
  br i1 %34, label %pass2, label %fail1, !dbg !133

fail1:                                            ; preds = %pass
  %35 = addrspacecast %jl_value_t addrspace(10)* %14 to %jl_value_t addrspace(12)*, !dbg !133
  call void @jl_type_error(i8* nonnull inttoptr (i64 41541232 to i8*), %jl_value_t addrspace(10)* addrspacecast (%jl_value_t* inttoptr (i64 139873742360560 to %jl_value_t*) to %jl_value_t addrspace(10)*), %jl_value_t addrspace(12)* %35), !dbg !133
  unreachable, !dbg !133

pass2:                                            ; preds = %pass
  %36 = bitcast %jl_value_t addrspace(10)* %14 to i64 addrspace(10)*, !dbg !133
  %37 = load i64, i64 addrspace(10)* %36, align 8, !dbg !133, !tbaa !90
  %38 = icmp eq i64 %37, 0, !dbg !133
  br i1 %38, label %fail3, label %pass4, !dbg !133

fail3:                                            ; preds = %pass2
  call void @jl_throw(%jl_value_t addrspace(12)* addrspacecast (%jl_value_t* inttoptr (i64 139873749195808 to %jl_value_t*) to %jl_value_t addrspace(12)*)), !dbg !133
  unreachable, !dbg !133

pass4:                                            ; preds = %pass2
  %39 = inttoptr i64 %37 to double (double)*, !dbg !133
  %40 = call double %39(double %13), !dbg !133
  %41 = fadd double %40, 2.000000e+00, !dbg !139
  %42 = fdiv double 2.000000e+00, %41, !dbg !141
  %43 = fsub double 1.000000e+00, %42, !dbg !142
  br label %L42, !dbg !140

fail6:                                            ; preds = %L34
  %44 = addrspacecast %jl_value_t addrspace(10)* %20 to %jl_value_t addrspace(12)*, !dbg !136
  call void @jl_type_error(i8* nonnull inttoptr (i64 41541232 to i8*), %jl_value_t addrspace(10)* addrspacecast (%jl_value_t* inttoptr (i64 139873741882160 to %jl_value_t*) to %jl_value_t addrspace(10)*), %jl_value_t addrspace(12)* %44), !dbg !136
  unreachable, !dbg !136

pass7:                                            ; preds = %L34
  %45 = addrspacecast %jl_value_t addrspace(10)* %20 to %jl_value_t addrspace(11)*, !dbg !136
  %46 = bitcast %jl_value_t addrspace(11)* %45 to %jl_value_t addrspace(10)* addrspace(11)*, !dbg !136
  %47 = load %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)* addrspace(11)* %46, align 8, !dbg !136, !tbaa !86, !invariant.load !4
  %48 = addrspacecast %jl_value_t addrspace(10)* %47 to %jl_value_t addrspace(12)*, !dbg !136
  %49 = icmp eq %jl_value_t addrspace(12)* %48, addrspacecast (%jl_value_t* inttoptr (i64 139873742360480 to %jl_value_t*) to %jl_value_t addrspace(12)*), !dbg !136
  br i1 %49, label %pass9, label %fail8, !dbg !136

fail8:                                            ; preds = %pass7
  %50 = addrspacecast %jl_value_t addrspace(10)* %19 to %jl_value_t addrspace(12)*, !dbg !136
  call void @jl_type_error(i8* nonnull inttoptr (i64 41541232 to i8*), %jl_value_t addrspace(10)* addrspacecast (%jl_value_t* inttoptr (i64 139873742360560 to %jl_value_t*) to %jl_value_t addrspace(10)*), %jl_value_t addrspace(12)* %50), !dbg !136
  unreachable, !dbg !136

pass9:                                            ; preds = %pass7
  %51 = bitcast %jl_value_t addrspace(10)* %19 to i64 addrspace(10)*, !dbg !136
  %52 = load i64, i64 addrspace(10)* %51, align 8, !dbg !136, !tbaa !90
  %53 = icmp eq i64 %52, 0, !dbg !136
  br i1 %53, label %fail10, label %pass11, !dbg !136

fail10:                                           ; preds = %pass9
  call void @jl_throw(%jl_value_t addrspace(12)* addrspacecast (%jl_value_t* inttoptr (i64 139873749195808 to %jl_value_t*) to %jl_value_t addrspace(12)*)), !dbg !136
  unreachable, !dbg !136

pass11:                                           ; preds = %pass9
  %54 = inttoptr i64 %52 to double (double)*, !dbg !136
  %55 = call double %54(double %18), !dbg !136
  %56 = fsub double -0.000000e+00, %55, !dbg !143
  %57 = fadd double %55, 2.000000e+00, !dbg !145
  %58 = fdiv double %56, %57, !dbg !146
  br label %L42, !dbg !146
}
@code_llvm tanh(2.0)

;  @ special/hyperbolic.jl:138 within `tanh'
define double @julia_tanh_135(double) {
top:
;  @ special/hyperbolic.jl:151 within `tanh'
; ┌ @ float.jl:536 within `isnan'
; │┌ @ float.jl:456 within `!='
    %1 = fcmp ord double %0, 0.000000e+00
; └└
  br i1 %1, label %L4, label %L3

L3:                                               ; preds = %L22, %top
;  @ special/hyperbolic.jl:152 within `tanh'
  ret double %0

L4:                                               ; preds = %top
;  @ special/hyperbolic.jl:153 within `tanh'
; ┌ @ float.jl:564 within `isinf'
; │┌ @ float.jl:554 within `isfinite'
; ││┌ @ float.jl:403 within `-'
     %2 = fsub double %0, %0
; ││└
; ││┌ @ float.jl:488 within `==' @ float.jl:454
     %3 = fcmp oeq double %2, 0.000000e+00
; └└└
  br i1 %3, label %L19, label %L17

L17:                                              ; preds = %L4
;  @ special/hyperbolic.jl:154 within `tanh'
; ┌ @ floatfuncs.jl:5 within `copysign'
   %4 = bitcast double %0 to i64
   %5 = and i64 %4, -9223372036854775808
   %6 = or i64 %5, 4607182418800017408
; └
  %7 = bitcast i64 %6 to double
  ret double %7

L19:                                              ; preds = %L4
;  @ special/hyperbolic.jl:157 within `tanh'
; ┌ @ float.jl:528 within `abs'
   %8 = call double @llvm.fabs.f64(double %0)
; └
;  @ special/hyperbolic.jl:158 within `tanh'
; ┌ @ float.jl:458 within `<'
   %9 = fcmp uge double %8, 2.200000e+01
; └
  br i1 %9, label %L42, label %L22

L22:                                              ; preds = %L19
;  @ special/hyperbolic.jl:160 within `tanh'
; ┌ @ float.jl:458 within `<'
   %10 = fcmp uge double %8, 0x3E30000000000000
; └
  br i1 %10, label %L26, label %L3

L26:                                              ; preds = %L22
;  @ special/hyperbolic.jl:163 within `tanh'
; ┌ @ operators.jl:350 within `>='
; │┌ @ float.jl:460 within `<='
    %11 = fcmp ult double %8, 1.000000e+00
; └└
  br i1 %11, label %L34, label %L28

L28:                                              ; preds = %L26
;  @ special/hyperbolic.jl:165 within `tanh'
; ┌ @ float.jl:405 within `*'
   %12 = fmul double %8, 2.000000e+00
; └
; ┌ @ math.jl:366 within `expm1'
   %13 = call double inttoptr (i64 140691799529184 to double (double)*)(double %12)
; └
;  @ special/hyperbolic.jl:166 within `tanh'
; ┌ @ float.jl:401 within `+'
   %14 = fadd double %13, 2.000000e+00
; └
; ┌ @ float.jl:407 within `/'
   %15 = fdiv double 2.000000e+00, %14
; └
; ┌ @ float.jl:403 within `-'
   %16 = fsub double 1.000000e+00, %15
; └
  br label %L42

L34:                                              ; preds = %L26
;  @ special/hyperbolic.jl:169 within `tanh'
; ┌ @ float.jl:405 within `*'
   %17 = fmul double %8, -2.000000e+00
; └
; ┌ @ math.jl:366 within `expm1'
   %18 = call double inttoptr (i64 140691799529184 to double (double)*)(double %17)
; └
;  @ special/hyperbolic.jl:170 within `tanh'
; ┌ @ float.jl:393 within `-'
   %19 = fsub double -0.000000e+00, %18
; └
; ┌ @ float.jl:401 within `+'
   %20 = fadd double %18, 2.000000e+00
; └
; ┌ @ float.jl:407 within `/'
   %21 = fdiv double %19, %20
   br label %L42

L42:                                              ; preds = %L34, %L28, %L19
   %value_phi1 = phi double [ %16, %L28 ], [ %21, %L34 ], [ 1.000000e+00, %L19 ]
; └
;  @ special/hyperbolic.jl:176 within `tanh'
; ┌ @ floatfuncs.jl:5 within `copysign'
   %22 = bitcast double %value_phi1 to i64
   %23 = bitcast double %0 to i64
   %24 = and i64 %23, -9223372036854775808
   %25 = and i64 %22, 9223372036854775807
   %26 = or i64 %25, %24
; └
  %27 = bitcast i64 %26 to double
  ret double %27
}
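
For readers who do not want to trace the IR by hand, here is a rough Julia reading of the control flow in the `@code_llvm tanh(2.0)` dump above. It is a sketch reconstructed from the branches and constants in that IR (22.0, 0x3E30000000000000 = 2^-28, the two `expm1` calls), not a copy of the Base source, and the name `tanh_sketch` is made up for illustration.

```julia
# Sketch of the branch structure visible in the IR; `tanh_sketch` is a hypothetical name.
function tanh_sketch(x::Float64)
    isnan(x) && return x                    # block L3: NaN is returned as-is
    isinf(x) && return copysign(1.0, x)     # block L17: ±Inf maps to ±1
    absx = abs(x)
    if absx < 22.0
        absx < 2.0^-28 && return x          # tiny |x|: tanh(x) ≈ x
        if absx >= 1.0
            t = expm1(2.0 * absx)           # block L28
            z = 1.0 - 2.0 / (t + 2.0)
        else
            t = expm1(-2.0 * absx)          # block L34
            z = -t / (t + 2.0)
        end
    else
        z = 1.0                             # saturated for |x| >= 22
    end
    return copysign(z, x)                   # block L42: restore the sign of x
end
```

The two `expm1` calls are the `inttoptr` calls in the plain IR; in the Enzyme-processed module they appear as indirect calls through a looked-up pointer, which is where the `jl_f_tuple`/`typeof` checks come from.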

Algorithms which branch to simpler implementations

I was wondering what happens when there is a branch in the function, to use a simpler algorithm when (say) some variable is zero.

Below is a toy example, which seems to lead to wrong answers. What I was working towards when I hit #112 was the fact that 5-arg mul! takes shortcuts when α == 0; LinearAlgebra is full of such things which e.g. branch to a simpler algorithm when ishermitian(A).

Is this a problem which Enzyme could conceivably detect and avoid? ForwardDiff does not do so at present, but I think that essentially changing iszero(::Dual) can fix it, discussed here: JuliaDiff/ForwardDiff.jl#480 . Enzyme seems like black magic to me, but if it knows which numbers are Active, can it somehow use this to avoid measure-zero branches?

julia> using Enzyme, ForwardDiff

julia> x = Float32[1 2 0 3 4];

julia> dx = zero(x); autodiff(prod, Active, Duplicated(x, dx)); dx  # ok
1×5 Matrix{Float32}:
 0.0  0.0  24.0  0.0  0.0

# implementations of prod from https://github.com/JuliaDiff/ForwardDiff.jl/issues/480

julia> function prod1(xs::Array)
           p = one(eltype(xs))
           for x in xs
               p = p * x
           end
           p
       end;

julia> function prod2(xs::Array)
           p = one(eltype(xs))
           for x in xs
               p = p * x
               p == 0 && break  # exit early once you know the answer
           end
           p
       end;

julia> dx = zero(x); autodiff(prod1, Active, Duplicated(x, dx)); dx  # ok
1×5 Matrix{Float32}:
 0.0  0.0  24.0  0.0  0.0

julia> dx = zero(x); autodiff(prod2, Active, Duplicated(x, dx)); dx  # wrong
1×5 Matrix{Float32}:
 0.0  0.0  2.0  0.0  0.0

julia> ForwardDiff.gradient(prod2, x)  # same mistake
1×5 Matrix{Float32}:
 0.0  0.0  2.0  0.0  0.0
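
To see where the 2.0 comes from, it may help to write the forward-mode product rule out by hand. The sketch below uses plain Julia only (no Enzyme or ForwardDiff); `prod_tangent` and its `early_exit` keyword are hypothetical names introduced for this illustration.

```julia
# Hand-written forward-mode pass: carry the value p and its derivative dp
# with respect to xs[k] through the loop. `prod_tangent` is a hypothetical helper.
function prod_tangent(xs, k; early_exit::Bool)
    p, dp = one(eltype(xs)), zero(eltype(xs))
    for (i, x) in enumerate(xs)
        dx = i == k ? one(x) : zero(x)      # seed: d xs[i] / d xs[k]
        p, dp = p * x, dp * x + p * dx      # product rule, using the old p and dp
        early_exit && p == 0 && break       # the shortcut taken by prod2
    end
    return dp
end

x = Float32[1, 2, 0, 3, 4]
full  = prod_tangent(x, 3; early_exit=false)   # 24.0f0, matches prod1
short = prod_tangent(x, 3; early_exit=true)    # 2.0f0, matches prod2
```

The early exit is correct for the value (it is already zero) but it drops the factors 3 and 4 that the derivative still needs, so any AD tool that differentiates the executed path reproduces the truncated result.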

Handling exceptions


julia> function f(x)
         y = 0.0
         try
           y = 2*x
           error("")
         catch
         end 
         y
       end
f (generic function with 1 method)

julia> Enzyme.autodiff(f, Active(2.0))

`AssertionError: rt <: Union{AbstractFloat, Nothing}`

I was playing around with Enzyme.jl to see if I can auto-diff through OceanTurb.jl and array mutation, but I hit this error:

julia> using Statistics, Enzyme

julia> function g(x)  # Basically g(x) = x^2
           a = x * ones(10)
           for n in 1:length(a)
               a[n] = x^2
           end
           return mean(a)
       end
g (generic function with 1 method)

julia> autodiff(g, Active(5.0))
ERROR: AssertionError: rt <: Union{AbstractFloat, Nothing}
Stacktrace:
 [1] fspec at /home/alir/.julia/packages/Enzyme/XmiBH/src/compiler.jl:89 [inlined]
 [2] thunk(::typeof(g), ::Type{Tuple{Active{Float64}}}, ::Val{false}) at /home/alir/.julia/packages/Enzyme/XmiBH/src/compiler/thunk.jl:64 (repeats 2 times)
 [3] autodiff(::Function, ::Active{Float64}) at /home/alir/.julia/packages/Enzyme/XmiBH/src/Enzyme.jl:46
 [4] top-level scope at REPL[27]:1
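
For reference, the value a working `autodiff(g, Active(5.0))` should produce can be checked independently with a finite difference. This is only a sanity check in plain Julia; `central_diff` is a hypothetical helper and the step size is an arbitrary choice.

```julia
using Statistics

# Same function as in the report: effectively g(x) = x^2.
function g(x)
    a = x * ones(10)
    for n in 1:length(a)
        a[n] = x^2
    end
    return mean(a)
end

# Central finite difference; `central_diff` is a hypothetical helper for this check.
central_diff(f, x; h=1e-6) = (f(x + h) - f(x - h)) / (2h)

dg = central_diff(g, 5.0)   # should be close to 2 * 5.0
```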

Environment

(@v1.5) pkg> status Enzyme
Status `~/.julia/environments/v1.5/Project.toml`
  [7da242da] Enzyme v0.3.0

julia> versioninfo()
Julia Version 1.5.2
Commit 539f3ce943 (2020-09-23 23:17 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) Silver 4214 CPU @ 2.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, cascadelake)

Support for ccall

Enzyme likes names, Julia likes pointers. This needs to be solved before Enzyme's MPI support can be used.

Structured activity annotations.

It could be very handy for users if Enzyme supported activity annotations nested in NamedTuples/structs. Let's say we have

foo(a::Real, b::NamedTuple) = exp(log(a) + log(sum(b.x)) + log(b.y.q * b.y.r))

foo(4.2, (x = [4.0, 5.0], y = (q = 0.1, r = 0.2)))

and let's assume the user knows some parts of this are const. This currently doesn't work:

using Enzyme
a = Active(4.2)
b = (x = Const([4.0, 5.0]), y = (q = Active(0.1), r = Active(0.2)))
autodiff(foo, a, b)
ERROR: return type is Union{}, giving up.

But Enzyme is capable of handling nested mixed const/active/duplicated structures internally, of course. So the semantic equivalent

expanded_foo(a, b_x, b_y_q, b_y_r) = foo(a, (x = b_x, y = (q = b_y_q, r = b_y_r)))
da, d_y_q, d_y_r = autodiff(expanded_foo, a, b.x, b.y.q, b.y.r)
(a = da, y = (q = d_y_q, r = d_y_r))

works fine - it's just a bit hard on the user. :-)

Could autodiff(foo, a, b) be supported by just making some limited changes to Enzyme's "input formatting" stage?
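
As a rough idea of what such an "input formatting" stage might do, the sketch below flattens a nested NamedTuple into a list of (path, leaf) pairs, which is the information needed to build something like `expanded_foo` automatically. It is plain Julia and does not touch Enzyme; `flatten_leaves` is a hypothetical helper, not part of any existing API.

```julia
# Hypothetical helper: walk a nested NamedTuple and collect (path, leaf) pairs.
# Anything that is not a NamedTuple is treated as a leaf.
flatten_leaves(x, path=()) = Any[(path, x)]

function flatten_leaves(nt::NamedTuple, path=())
    out = Any[]
    for k in keys(nt)
        append!(out, flatten_leaves(getfield(nt, k), (path..., k)))
    end
    return out
end

b = (x = [4.0, 5.0], y = (q = 0.1, r = 0.2))
leaves = flatten_leaves(b)
paths = [p for (p, _) in leaves]    # (:x,), (:y, :q), (:y, :r)
```

Each leaf could then carry its own annotation, and the returned gradients could be reassembled into a NamedTuple using the recorded paths.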

README example segfaults

julia> autodiff(f1, Active(1.0))
julia: /workspace/srcdir/Enzyme/enzyme/Enzyme/Enzyme.cpp:103: void HandleAutoDiff(llvm::CallInst*, llvm::TargetLibraryInfo&, llvm::AAResults&): Assertion `0 && "illegal diffe metadata string"' failed.

signal (6): Aborted
in expression starting at REPL[5]:1
gsignal at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
abort at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x7fdadb292748)
__assert_fail at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
HandleAutoDiff at /workspace/srcdir/Enzyme/enzyme/Enzyme/Enzyme.cpp:103
Allocations: 37717581 (Pool: 37701286; Big: 16295); GC: 43
[1]    20710 abort (core dumped)  julia

Do I need to build Enzyme locally first?

Custom pullbacks for PP AD example

Just to clarify my question: the way my AD objective is set up, there's a layer of indirection between the targets I'm taking the gradient with respect to, and the computation of the objective function.
E.g. say I'm taking the gradient with respect to a random choice (and let's ignore hierarchical calls, and just focus on a single call)

@inline function (ctx::ChoiceBackpropagateContext)(call::typeof(trace), 
                                                   addr::T, 
                                                   d::Distribution{K}) where {T <: Address, K}
    haskey(ctx.target, addr) || return get_value(get_sub(ctx.call, addr))
    s = read_choice(ctx, addr)
    increment!(ctx, logpdf(d, s))
    return s
end

The user provides a selection of "what choices do you want grads for" and then this context will re-trace the program. read_choice has a custom adjoint, and the objective is kept in the ctx - here increment! accumulates the logpdf of the choice onto the objective.
So, AD is sort of messy in this case.

Then, everything is tied together by calling pullback here:

function accumulate_choice_gradients!(fillables::S, initial_params::P, choice_grads, choice_target::K, cl::DynamicCallSite, ret_grad...) where {S <: AddressMap, P <: AddressMap, K <: Target}
    fn = (args, choices) -> begin
        ctx = ChoiceBackpropagate(cl, fillables, initial_params, choices, choice_grads, choice_target)
        ret = ctx(cl.fn, args...)
        (ctx.weight, ret)
    end
    blank = Store()
    _, back = Zygote.pullback(fn, cl.args, blank)
    arg_grads, grad_ref = back((1.0, ret_grad...))
    choice_vals = filter_acc!(choice_grads, cl, grad_ref, choice_target)
    return choice_vals, arg_grads
end

So here, the function which I pull back takes care of instantiating the context, then accumulates the objective (here, ctx.weight), etc. And the "glue" pullbacks for read_choice make sure that only terms which the user targets are accumulated.

I think this is an "advanced" Julia AD example which might be interesting to think about.

Crash with broadcast

(Not sure whether to report this here or on the main Enzyme repo.)

Enzyme seems to crash with (non-allocating) broadcast!. The explicit-loop version works fine:

using Enzyme

function foo_nobc!(R, A, B)
    @assert eachindex(R) == eachindex(A) == eachindex(B)
    @inbounds @simd for i in eachindex(R)
        R[i] = A[i] + B[i]
    end
    return nothing
end

function foo_bc!(R, A, B)
    broadcast!(+, R, A, B)
    return nothing
end


A = rand(10); B = rand(10); R = similar(A)
dA = zero(A); dB = zero(B); dR = fill!(similar(R), 1)

foo_nobc!(R, A, B)
foo_bc!(R, A, B)

autodiff(foo_nobc!, Duplicated(R, dR), Duplicated(A, dA), Duplicated(B, dB))

But

autodiff(foo_bc!, Duplicated(R, dR), Duplicated(A, dA), Duplicated(B, dB))

goes boom:

mod: ; ModuleID = 'text'
source_filename = "text"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128-ni:10:11:12:13"
target triple = "x86_64-pc-linux-gnu"

@exception.4 = private unnamed_addr constant [10 x i8] c"exception\00", align 1

; Function Attrs: readnone
declare {}*** @julia.ptls_states() local_unnamed_addr #0

declare nonnull {} addrspace(10)* @jl_invoke({} addrspace(10)*, {} addrspace(10)** nocapture readonly, i32, {} addrspace(10)*) local_unnamed_addr
[...]

In principle, Enzyme should be able to support (non-allocating) broadcasts, right?
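
For comparison, the reverse pass one would expect for `R .= A .+ B` is simple enough to write by hand. The sketch below is plain Julia and makes no claim about what Enzyme emits internally; `foo_pullback!` is a hypothetical name, and it assumes the adjoint of `R` is simply accumulated into the adjoints of `A` and `B`.

```julia
# Hand-written reverse pass for R[i] = A[i] + B[i]: each input receives the
# adjoint of the corresponding output entry. `foo_pullback!` is hypothetical.
function foo_pullback!(dA, dB, dR)
    @assert eachindex(dA) == eachindex(dB) == eachindex(dR)
    for i in eachindex(dR)
        dA[i] += dR[i]
        dB[i] += dR[i]
    end
    return nothing
end

dA = zeros(10); dB = zeros(10); dR = ones(10)
foo_pullback!(dA, dB, dR)   # dA and dB are now all ones
```

With `dR` seeded to ones as in the report, both `dA` and `dB` should come out as all ones, for the loop version and the broadcast version alike.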

Symbol lookup failure

(@wsmoses not sure whether to report this here or at Enzyme itself)

using Enzyme

a = 2
c = 3

f(a, c) = a * √c
∂f_∂a = autodiff(f, Active(a), c)

results in

name = "__memmove_ssse3_back"
ERROR: Enzyme: Symbol lookup failed. Aborting!
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:33
  [2] resolver(name::Cstring, ctx::Ptr{Nothing})
    @ Enzyme.Compiler /user/.julia/dev/Enzyme/src/compiler.jl:826

Doesn't happen when using c instead of √c.

Tested using Enzyme.jl v0.6.0 with official Julia Linux x86_64 binaries, system info:

julia> versioninfo()
Julia Version 1.6.1
Commit 6aaedecc44 (2021-04-23 05:59 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, skylake)

GC support

@noinline escape(x) = Base.inferencebarrier(nothing)::Nothing

function f(x)
    r=Ref{Float64}()
    r[] = 2*x
    escape(r)
    r[]
 end

Enzyme.Compiler.enzyme_code_llvm(f, Tuple{Active{Float64}}, run_enzyme=false, dump_module=true)

with https://github.com/wsmoses/Enzyme.jl/tree/gc

`sphericalbessely` makes Enzyme unhappy due to exceptions.

using Enzyme, SpecialFunctions

Enzyme.autodiff(sphericalbessely, Active(0.3), Const(0.5))

Fails due to the presence of:

i8* ()* asm "movq %fs:0, $0", "=r"

IIUC this occurs primarily because there are exceptions in the code.
We also seem to have opted not to test bessel anymore: https://github.com/wsmoses/Enzyme.jl/blob/5f213fe779e918ef1a465cbd6bf685638d13809d/test/runtests.jl#L176-L183

Might be fixed by #51 since that will allow us to reason about the ptls getter directly.

cc: @emmanuellujan

GC addresses

Preprocess:

define internal fastcc nonnull {} addrspace(10)* @preprocess_julia___1390(double %0, {} addrspace(10)* nonnull align 16 dereferenceable(40) %1) unnamed_addr !dbg !749 {
top:
  %malloccall = tail call noalias i8* @malloc(i64 8), !enzyme_fromstack !4
  %2 = bitcast i8* %malloccall to [1 x [1 x i64]]*
  %malloccall2 = tail call noalias i8* @malloc(i64 8), !enzyme_fromstack !4
  %3 = bitcast i8* %malloccall2 to [1 x [1 x i64]]*
  %4 = call {}*** @julia.ptls_states()
  call void @llvm.dbg.value(metadata {} addrspace(10)* null, metadata !753, metadata !DIExpression(DW_OP_deref)), !dbg !754
  call void @llvm.dbg.value(metadata double %0, metadata !752, metadata !DIExpression()), !dbg !754
  call void @llvm.dbg.value(metadata {} addrspace(10)* %1, metadata !753, metadata !DIExpression(DW_OP_deref)), !dbg !754
  %5 = bitcast {} addrspace(10)* %1 to {} addrspace(10)* addrspace(10)*, !dbg !755
  %6 = addrspacecast {} addrspace(10)* addrspace(10)* %5 to {} addrspace(10)* addrspace(11)*, !dbg !755
  %7 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %6, i64 3, !dbg !755
  %8 = bitcast {} addrspace(10)* addrspace(11)* %7 to i64 addrspace(11)*, !dbg !755
  %9 = load i64, i64 addrspace(11)* %8, align 8, !dbg !755, !tbaa !101, !range !104
  %10 = getelementptr inbounds [1 x [1 x i64]], [1 x [1 x i64]]* %2, i64 0, i64 0, i64 0, !dbg !762
  store i64 %9, i64* %10, align 8, !dbg !762, !tbaa !109
  %11 = call nonnull {} addrspace(10)* @jl_alloc_array_1d({} addrspace(10)* addrspacecast ({}* inttoptr (i64 140550914329872 to {}*) to {} addrspace(10)*), i64 %9), !dbg !764
  %12 = bitcast {} addrspace(10)* %11 to {} addrspace(10)* addrspace(10)*, !dbg !772
  %13 = addrspacecast {} addrspace(10)* addrspace(10)* %12 to {} addrspace(10)* addrspace(11)*, !dbg !772
  %14 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %13, i64 3, !dbg !772
  %15 = bitcast {} addrspace(10)* addrspace(11)* %14 to i64 addrspace(11)*, !dbg !772
  %16 = load i64, i64 addrspace(11)* %15, align 8, !dbg !772, !tbaa !101, !range !104
  switch i64 %16, label %L22 [
    i64 0, label %L12
    i64 1, label %L16
  ], !dbg !776

L12:                                              ; preds = %top
  %17 = icmp eq i64 %9, 0, !dbg !779
  br i1 %17, label %L30, label %L110, !dbg !774

L16:                                              ; preds = %top
  %18 = icmp eq i64 %9, 1, !dbg !782
  br i1 %18, label %L30, label %L110, !dbg !774

L22:                                              ; preds = %top
  %19 = icmp eq i64 %16, %9, !dbg !785
  br i1 %19, label %L30, label %L110, !dbg !774

L30:                                              ; preds = %L22, %L16, %L12
  %.not15 = icmp eq {} addrspace(10)* %11, %1, !dbg !787
  br i1 %.not15, label %L56, label %L33, !dbg !787

L33:                                              ; preds = %L30
  %20 = load i8, i8* inttoptr (i64 140550914329945 to i8*), align 1, !dbg !793, !tbaa !172, !invariant.load !4
  %21 = and i8 %20, 1, !dbg !795
  %.not19.not = icmp eq i8 %21, 0, !dbg !795
  br i1 %.not19.not, label %L39, label %L56, !dbg !795

L39:                                              ; preds = %L33
  %22 = addrspacecast {} addrspace(10)* %11 to {} addrspace(11)*, !dbg !797
  %23 = call nonnull align 8 {}* @julia.pointer_from_objref({} addrspace(11)* %22) #6, !dbg !797
  %24 = bitcast {}* %23 to i64*, !dbg !797
  %25 = load i64, i64* %24, align 8, !dbg !797, !tbaa !181, !range !183
  %26 = addrspacecast {} addrspace(10)* %1 to {} addrspace(11)*, !dbg !797
  %27 = call nonnull align 8 {}* @julia.pointer_from_objref({} addrspace(11)* %26) #6, !dbg !797
  %28 = bitcast {}* %27 to i64*, !dbg !797
  %29 = load i64, i64* %28, align 8, !dbg !797, !tbaa !181, !range !183
  %.not = icmp eq i64 %25, %29, !dbg !800
  br i1 %.not, label %L51, label %L56, !dbg !796

L51:                                              ; preds = %L39
  %30 = call nonnull {} addrspace(10)* @jl_array_copy({} addrspace(10)* nonnull %1), !dbg !803
  br label %L56, !dbg !796

L56:                                              ; preds = %L51, %L39, %L33, %L30
  %value_phi2 = phi {} addrspace(10)* [ %30, %L51 ], [ %1, %L39 ], [ %1, %L30 ], [ %1, %L33 ]
  %.not16 = icmp eq i64 %9, 0, !dbg !805
  br i1 %.not16, label %L120, label %L77.lr.ph, !dbg !806

L77.lr.ph:                                        ; preds = %L56
  %31 = bitcast {} addrspace(10)* %value_phi2 to {} addrspace(10)* addrspace(10)*, !dbg !808
  %32 = addrspacecast {} addrspace(10)* addrspace(10)* %31 to {} addrspace(10)* addrspace(11)*, !dbg !808
  %33 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %32, i64 3, !dbg !808
  %34 = bitcast {} addrspace(10)* addrspace(11)* %33 to i64 addrspace(11)*, !dbg !808
  %35 = load i64, i64 addrspace(11)* %34, align 8, !dbg !808, !tbaa !101, !range !104
  %.not18 = icmp eq i64 %35, 1, !dbg !812
  %36 = bitcast {} addrspace(10)* %value_phi2 to double addrspace(13)* addrspace(10)*, !dbg !816
  %37 = addrspacecast double addrspace(13)* addrspace(10)* %36 to double addrspace(13)* addrspace(11)*, !dbg !816
  %38 = load double addrspace(13)*, double addrspace(13)* addrspace(11)* %37, align 8, !dbg !816, !tbaa !181, !nonnull !4
  %39 = bitcast {} addrspace(10)* %11 to double addrspace(13)* addrspace(10)*, !dbg !824
  %40 = addrspacecast double addrspace(13)* addrspace(10)* %39 to double addrspace(13)* addrspace(11)*, !dbg !824
  %41 = load double addrspace(13)*, double addrspace(13)* addrspace(11)* %40, align 8, !dbg !824, !tbaa !181, !nonnull !4
  br i1 %.not18, label %L77.us.preheader, label %L77.preheader, !dbg !825

L77.preheader:                                    ; preds = %L77.lr.ph
  br label %L77, !dbg !825

L77.us.preheader:                                 ; preds = %L77.lr.ph
  br label %L77.us, !dbg !825

L77.us:                                           ; preds = %L77.us.preheader, %L77.us
  %tiv = phi i64 [ 0, %L77.us.preheader ], [ %tiv.next, %L77.us ]
  %tiv.next = add nuw nsw i64 %tiv, 1, !dbg !826
  %42 = load double, double addrspace(13)* %38, align 8, !dbg !826, !tbaa !227
  %43 = fmul double %42, %0, !dbg !827
  %44 = getelementptr inbounds double, double addrspace(13)* %41, i64 %tiv, !dbg !830
  store double %43, double addrspace(13)* %44, align 8, !dbg !830, !tbaa !227
  %45 = add nuw nsw i64 %tiv, 1, !dbg !831
  %exitcond25.not = icmp eq i64 %45, %9, !dbg !833
  br i1 %exitcond25.not, label %L120.loopexit, label %L77.us, !dbg !825, !llvm.loop !834

L77:                                              ; preds = %L77.preheader, %L77
  %tiv3 = phi i64 [ 0, %L77.preheader ], [ %tiv.next4, %L77 ]
  %tiv.next4 = add nuw nsw i64 %tiv3, 1, !dbg !826
  %46 = getelementptr inbounds double, double addrspace(13)* %38, i64 %tiv3, !dbg !826
  %47 = load double, double addrspace(13)* %46, align 8, !dbg !826, !tbaa !227
  %48 = fmul double %47, %0, !dbg !827
  %49 = getelementptr inbounds double, double addrspace(13)* %41, i64 %tiv3, !dbg !830
  store double %48, double addrspace(13)* %49, align 8, !dbg !830, !tbaa !227
  %50 = add nuw nsw i64 %tiv3, 1, !dbg !831
  %exitcond.not = icmp eq i64 %50, %9, !dbg !833
  br i1 %exitcond.not, label %L120.loopexit1, label %L77, !dbg !825, !llvm.loop !834

L110:                                             ; preds = %L22, %L16, %L12
  %51 = getelementptr inbounds [1 x [1 x i64]], [1 x [1 x i64]]* %3, i64 0, i64 0, i64 0, !dbg !835
  store i64 %16, i64* %51, align 8, !dbg !835, !tbaa !109
  %52 = addrspacecast [1 x [1 x i64]]* %3 to [1 x [1 x i64]] addrspace(11)*, !dbg !774
  %53 = addrspacecast [1 x [1 x i64]]* %2 to [1 x [1 x i64]] addrspace(11)*, !dbg !774
  %54 = call fastcc nonnull {} addrspace(10)* @julia_throwdm_1395([1 x [1 x i64]] addrspace(11)* nocapture nonnull readonly align 8 dereferenceable(8) %52, [1 x [1 x i64]] addrspace(11)* nocapture nonnull readonly align 8 dereferenceable(8) %53) #12, !dbg !774
  unreachable, !dbg !774

L120.loopexit:                                    ; preds = %L77.us
  br label %L120, !dbg !761

L120.loopexit1:                                   ; preds = %L77
  br label %L120, !dbg !761

L120:                                             ; preds = %L120.loopexit1, %L120.loopexit, %L56
  ret {} addrspace(10)* %11, !dbg !761
}

Augmented:

@augmented_julia___1390(double %0, {} addrspace(10)* nonnull align 16 dereferenceable(40) %1, {} addrspace(10)* %"'") unnamed_addr !dbg !836 {
top:
  %2 = alloca { { {} addrspace(10)*, {} addrspace(10)*, i8*, i8*, i64, i64, {} addrspace(10)*, {} addrspace(10)*, double addrspace(13)*, i64, i1, i1, i1, double*, double* }, {} addrspace(10)*, {} addrspace(10)* }, align 8
  %3 = getelementptr inbounds { { {} addrspace(10)*, {} addrspace(10)*, i8*, i8*, i64, i64, {} addrspace(10)*, {} addrspace(10)*, double addrspace(13)*, i64, i1, i1, i1, double*, double* }, {} addrspace(10)*, {} addrspace(10)* }, { { {} addrspace(10)*, {} addrspace(10)*, i8*, i8*, i64, i64, {} addrspace(10)*, {} addrspace(10)*, double addrspace(13)*, i64, i1, i1, i1, double*, double* }, {} addrspace(10)*, {} addrspace(10)* }* %2, i32 0, i32 0
  %"iv'ac" = alloca i64, align 8
  %"iv6'ac" = alloca i64, align 8
  %_cache = alloca double*, align 8
  %_cache19 = alloca double*, align 8
  %malloccall = tail call noalias i8* @malloc(i64 8), !enzyme_fromstack !4
  %4 = getelementptr inbounds { {} addrspace(10)*, {} addrspace(10)*, i8*, i8*, i64, i64, {} addrspace(10)*, {} addrspace(10)*, double addrspace(13)*, i64, i1, i1, i1, double*, double* }, { {} addrspace(10)*, {} addrspace(10)*, i8*, i8*, i64, i64, {} addrspace(10)*, {} addrspace(10)*, double addrspace(13)*, i64, i1, i1, i1, double*, double* }* %3, i32 0, i32 3
  store i8* %malloccall, i8** %4, align 8
  %5 = bitcast i8* %malloccall to [1 x [1 x i64]]*
  %malloccall2 = tail call noalias i8* @malloc(i64 8), !enzyme_fromstack !4
  %6 = getelementptr inbounds { {} addrspace(10)*, {} addrspace(10)*, i8*, i8*, i64, i64, {} addrspace(10)*, {} addrspace(10)*, double addrspace(13)*, i64, i1, i1, i1, double*, double* }, { {} addrspace(10)*, {} addrspace(10)*, i8*, i8*, i64, i64, {} addrspace(10)*, {} addrspace(10)*, double addrspace(13)*, i64, i1, i1, i1, double*, double* }* %3, i32 0, i32 2
  store i8* %malloccall2, i8** %6, align 8
  %7 = bitcast i8* %malloccall2 to [1 x [1 x i64]]*
  %8 = bitcast {} addrspace(10)* %1 to {} addrspace(10)* addrspace(10)*, !dbg !841
  %9 = addrspacecast {} addrspace(10)* addrspace(10)* %8 to {} addrspace(10)* addrspace(11)*, !dbg !841
  %10 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %9, i64 3, !dbg !841
  %11 = bitcast {} addrspace(10)* addrspace(11)* %10 to i64 addrspace(11)*, !dbg !841
  %12 = load i64, i64 addrspace(11)* %11, align 8, !dbg !841, !tbaa !101, !range !104
  %13 = getelementptr inbounds { {} addrspace(10)*, {} addrspace(10)*, i8*, i8*, i64, i64, {} addrspace(10)*, {} addrspace(10)*, double addrspace(13)*, i64, i1, i1, i1, double*, double* }, { {} addrspace(10)*, {} addrspace(10)*, i8*, i8*, i64, i64, {} addrspace(10)*, {} addrspace(10)*, double addrspace(13)*, i64, i1, i1, i1, double*, double* }* %3, i32 0, i32 9, !dbg !848
  store i64 %12, i64* %13, align 8, !dbg !848
  %14 = getelementptr inbounds [1 x [1 x i64]], [1 x [1 x i64]]* %5, i64 0, i64 0, i64 0, !dbg !848
  store i64 %12, i64* %14, align 8, !dbg !848, !tbaa !109
  %15 = call nonnull {} addrspace(10)* @jl_alloc_array_1d({} addrspace(10)* addrspacecast ({}* inttoptr (i64 140550914329872 to {}*) to {} addrspace(10)*), i64 %12), !dbg !850
  %16 = getelementptr inbounds { {} addrspace(10)*, {} addrspace(10)*, i8*, i8*, i64, i64, {} addrspace(10)*, {} addrspace(10)*, double addrspace(13)*, i64, i1, i1, i1, double*, double* }, { {} addrspace(10)*, {} addrspace(10)*, i8*, i8*, i64, i64, {} addrspace(10)*, {} addrspace(10)*, double addrspace(13)*, i64, i1, i1, i1, double*, double* }* %3, i32 0, i32 1, !dbg !858
  store {} addrspace(10)* %15, {} addrspace(10)** %16, align 8, !dbg !858
  %17 = call {} addrspace(10)* @jl_alloc_array_1d({} addrspace(10)* addrspacecast ({}* inttoptr (i64 140550914329872 to {}*) to {} addrspace(10)*), i64 %12), !dbg !858
  %18 = getelementptr inbounds { {} addrspace(10)*, {} addrspace(10)*, i8*, i8*, i64, i64, {} addrspace(10)*, {} addrspace(10)*, double addrspace(13)*, i64, i1, i1, i1, double*, double* }, { {} addrspace(10)*, {} addrspace(10)*, i8*, i8*, i64, i64, {} addrspace(10)*, {} addrspace(10)*, double addrspace(13)*, i64, i1, i1, i1, double*, double* }* %3, i32 0, i32 0, !dbg !858
  store {} addrspace(10)* %17, {} addrspace(10)** %18, align 8, !dbg !858
  %19 = mul i64 %12, 8, !dbg !858
  %20 = bitcast {} addrspace(10)* %17 to i8* addrspace(10)*, !dbg !858
  %21 = load i8*, i8* addrspace(10)* %20, align 8, !dbg !858
  call void @llvm.memset.p0i8.i64(i8* %21, i8 0, i64 %19, i1 false), !dbg !858
  %22 = bitcast {} addrspace(10)* %15 to {} addrspace(10)* addrspace(10)*, !dbg !858
  %23 = addrspacecast {} addrspace(10)* addrspace(10)* %22 to {} addrspace(10)* addrspace(11)*, !dbg !858
  %24 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %23, i64 3, !dbg !858
  %25 = bitcast {} addrspace(10)* addrspace(11)* %24 to i64 addrspace(11)*, !dbg !858
  %26 = load i64, i64 addrspace(11)* %25, align 8, !dbg !858, !tbaa !101, !range !104
  switch i64 %26, label %L22 [
    i64 0, label %L12
    i64 1, label %L16
  ], !dbg !862

L12:                                              ; preds = %top
  %27 = icmp eq i64 %12, 0, !dbg !865
  br i1 %27, label %L30, label %L110, !dbg !860

L16:                                              ; preds = %top
  %28 = icmp eq i64 %12, 1, !dbg !868
  br i1 %28, label %L30, label %L110, !dbg !860

L22:                                              ; preds = %top
  %29 = icmp eq i64 %26, %12, !dbg !871
  br i1 %29, label %L30, label %L110, !dbg !860

L30:                                              ; preds = %L22, %L16, %L12
  %.not15 = icmp eq {} addrspace(10)* %15, %1, !dbg !873
  br i1 %.not15, label %L56, label %L33, !dbg !873

L33:                                              ; preds = %L30
  %30 = load i8, i8* inttoptr (i64 140550914329945 to i8*), align 1, !dbg !879, !tbaa !172, !invariant.load !4
  %31 = and i8 %30, 1, !dbg !881
  %.not19.not = icmp eq i8 %31, 0, !dbg !881
  %32 = getelementptr inbounds { {} addrspace(10)*, {} addrspace(10)*, i8*, i8*, i64, i64, {} addrspace(10)*, {} addrspace(10)*, double addrspace(13)*, i64, i1, i1, i1, double*, double* }, { {} addrspace(10)*, {} addrspace(10)*, i8*, i8*, i64, i64, {} addrspace(10)*, {} addrspace(10)*, double addrspace(13)*, i64, i1, i1, i1, double*, double* }* %3, i32 0, i32 10, !dbg !881
  store i1 %.not19.not, i1* %32, align 1, !dbg !881
  br i1 %.not19.not, label %L39, label %L56, !dbg !881

L39:                                              ; preds = %L33
  %"'ipc11" = addrspacecast {} addrspace(10)* %17 to {} addrspace(11)*, !dbg !883
  %33 = addrspacecast {} addrspace(10)* %15 to {} addrspace(11)*, !dbg !883
  %34 = call {}* @julia.pointer_from_objref({} addrspace(11)* %"'ipc11"), !dbg !883
  %35 = call nonnull align 8 {}* @julia.pointer_from_objref({} addrspace(11)* %33) #6, !dbg !883
  %"'ipc9" = bitcast {}* %34 to i64*, !dbg !883
  %36 = bitcast {}* %35 to i64*, !dbg !883
  %"'ipl10" = load i64, i64* %"'ipc9", align 8, !dbg !883
  %37 = getelementptr inbounds { {} addrspace(10)*, {} addrspace(10)*, i8*, i8*, i64, i64, {} addrspace(10)*, {} addrspace(10)*, double addrspace(13)*, i64, i1, i1, i1, double*, double* }, { {} addrspace(10)*, {} addrspace(10)*, i8*, i8*, i64, i64, {} addrspace(10)*, {} addrspace(10)*, double addrspace(13)*, i64, i1, i1, i1, double*, double* }* %3, i32 0, i32 5, !dbg !883
  store i64 %"'ipl10", i64* %37, align 8, !dbg !883
  %38 = load i64, i64* %36, align 8, !dbg !883, !tbaa !181, !range !183
  %"'ipc8" = addrspacecast {} addrspace(10)* %"'" to {} addrspace(11)*, !dbg !883
  %39 = addrspacecast {} addrspace(10)* %1 to {} addrspace(11)*, !dbg !883
  %40 = call {}* @julia.pointer_from_objref({} addrspace(11)* %"'ipc8"), !dbg !883
  %41 = call nonnull align 8 {}* @julia.pointer_from_objref({} addrspace(11)* %39) #6, !dbg !883
  %"'ipc" = bitcast {}* %40 to i64*, !dbg !883
  %42 = bitcast {}* %41 to i64*, !dbg !883
  %"'ipl" = load i64, i64* %"'ipc", align 8, !dbg !883
  %43 = getelementptr inbounds { {} addrspace(10)*, {} addrspace(10)*, i8*, i8*, i64, i64, {} addrspace(10)*, {} addrspace(10)*, double addrspace(13)*, i64, i1, i1, i1, double*, double* }, { {} addrspace(10)*, {} addrspace(10)*, i8*, i8*, i64, i64, {} addrspace(10)*, {} addrspace(10)*, double addrspace(13)*, i64, i1, i1, i1, double*, double* }* %3, i32 0, i32 4, !dbg !883
  store i64 %"'ipl", i64* %43, align 8, !dbg !883
  %44 = load i64, i64* %42, align 8, !dbg !883, !tbaa !181, !range !183
  %.not = icmp eq i64 %38, %44, !dbg !886
  %45 = getelementptr inbounds { {} addrspace(10)*, {} addrspace(10)*, i8*, i8*, i64, i64, {} addrspace(10)*, {} addrspace(10)*, double addrspace(13)*, i64, i1, i1, i1, double*, double* }, { {} addrspace(10)*, {} addrspace(10)*, i8*, i8*, i64, i64, {} addrspace(10)*, {} addrspace(10)*, double addrspace(13)*, i64, i1, i1, i1, double*, double* }* %3, i32 0, i32 11, !dbg !882
  store i1 %.not, i1* %45, align 1, !dbg !882
  br i1 %.not, label %L51, label %L56, !dbg !882

L51:                                              ; preds = %L39
  call void @jl_error(i8* getelementptr inbounds ([54 x i8], [54 x i8]* @0, i32 0, i32 0)), !dbg !889
  %46 = call nonnull {} addrspace(10)* @jl_array_copy({} addrspace(10)* nonnull %1), !dbg !889
  %47 = getelementptr inbounds { {} addrspace(10)*, {} addrspace(10)*, i8*, i8*, i64, i64, {} addrspace(10)*, {} addrspace(10)*, double addrspace(13)*, i64, i1, i1, i1, double*, double* }, { {} addrspace(10)*, {} addrspace(10)*, i8*, i8*, i64, i64, {} addrspace(10)*, {} addrspace(10)*, double addrspace(13)*, i64, i1, i1, i1, double*, double* }* %3, i32 0, i32 7, !dbg !882
  store {} addrspace(10)* %46, {} addrspace(10)** %47, align 8, !dbg !882
  %48 = getelementptr inbounds { {} addrspace(10)*, {} addrspace(10)*, i8*, i8*, i64, i64, {} addrspace(10)*, {} addrspace(10)*, double addrspace(13)*, i64, i1, i1, i1, double*, double* }, { {} addrspace(10)*, {} addrspace(10)*, i8*, i8*, i64, i64, {} addrspace(10)*, {} addrspace(10)*, double addrspace(13)*, i64, i1, i1, i1, double*, double* }* %3, i32 0, i32 6, !dbg !882
  store {} addrspace(10)* %46, {} addrspace(10)** %48, align 8, !dbg !882
  br label %L56, !dbg !882

L56:                                              ; preds = %L51, %L39, %L33, %L30
  %49 = phi {} addrspace(10)* [ %46, %L51 ], [ %"'", %L39 ], [ %"'", %L30 ], [ %"'", %L33 ]
  %value_phi2 = phi {} addrspace(10)* [ %46, %L51 ], [ %1, %L39 ], [ %1, %L30 ], [ %1, %L33 ]
  %.not16 = icmp eq i64 %12, 0, !dbg !891
  br i1 %.not16, label %L120, label %L77.lr.ph, !dbg !892

L77.lr.ph:                                        ; preds = %L56
  %50 = bitcast {} addrspace(10)* %value_phi2 to {} addrspace(10)* addrspace(10)*, !dbg !894
  %51 = addrspacecast {} addrspace(10)* addrspace(10)* %50 to {} addrspace(10)* addrspace(11)*, !dbg !894
  %52 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %51, i64 3, !dbg !894
  %53 = bitcast {} addrspace(10)* addrspace(11)* %52 to i64 addrspace(11)*, !dbg !894
  %54 = load i64, i64 addrspace(11)* %53, align 8, !dbg !894, !tbaa !101, !range !104
  %.not18 = icmp eq i64 %54, 1, !dbg !898
  %55 = getelementptr inbounds { {} addrspace(10)*, {} addrspace(10)*, i8*, i8*, i64, i64, {} addrspace(10)*, {} addrspace(10)*, double addrspace(13)*, i64, i1, i1, i1, double*, double* }, { {} addrspace(10)*, {} addrspace(10)*, i8*, i8*, i64, i64, {} addrspace(10)*, {} addrspace(10)*, double addrspace(13)*, i64, i1, i1, i1, double*, double* }* %3, i32 0, i32 12, !dbg !902
  store i1 %.not18, i1* %55, align 1, !dbg !902
  %"'ipc15" = bitcast {} addrspace(10)* %49 to double addrspace(13)* addrspace(10)*, !dbg !902
  %56 = bitcast {} addrspace(10)* %value_phi2 to double addrspace(13)* addrspace(10)*, !dbg !902
  %"'ipc16" = addrspacecast double addrspace(13)* addrspace(10)* %"'ipc15" to double addrspace(13)* addrspace(11)*, !dbg !902
  %57 = addrspacecast double addrspace(13)* addrspace(10)* %56 to double addrspace(13)* addrspace(11)*, !dbg !902
  %"'ipl17" = load double addrspace(13)*, double addrspace(13)* addrspace(11)* %"'ipc16", align 8, !dbg !902
  %58 = getelementptr inbounds { {} addrspace(10)*, {} addrspace(10)*, i8*, i8*, i64, i64, {} addrspace(10)*, {} addrspace(10)*, double addrspace(13)*, i64, i1, i1, i1, double*, double* }, { {} addrspace(10)*, {} addrspace(10)*, i8*, i8*, i64, i64, {} addrspace(10)*, {} addrspace(10)*, double addrspace(13)*, i64, i1, i1, i1, double*, double* }* %3, i32 0, i32 8, !dbg !902
  store double addrspace(13)* %"'ipl17", double addrspace(13)** %58, align 8, !dbg !902
  %59 = load double addrspace(13)*, double addrspace(13)* addrspace(11)* %57, align 8, !dbg !902, !tbaa !181, !nonnull !4
  %"'ipc12" = bitcast {} addrspace(10)* %17 to double addrspace(13)* addrspace(10)*, !dbg !910
  %60 = bitcast {} addrspace(10)* %15 to double addrspace(13)* addrspace(10)*, !dbg !910
  %"'ipc13" = addrspacecast double addrspace(13)* addrspace(10)* %"'ipc12" to double addrspace(13)* addrspace(11)*, !dbg !910
  %61 = addrspacecast double addrspace(13)* addrspace(10)* %60 to double addrspace(13)* addrspace(11)*, !dbg !910
  %"'ipl14" = load double addrspace(13)*, double addrspace(13)* addrspace(11)* %"'ipc13", align 8, !dbg !910
  %62 = load double addrspace(13)*, double addrspace(13)* addrspace(11)* %61, align 8, !dbg !910, !tbaa !181, !nonnull !4
  br i1 %.not18, label %L77.us.preheader, label %L77.preheader, !dbg !911

L77.preheader:                                    ; preds = %L77.lr.ph
  %63 = add nsw i64 %12, -1, !dbg !911
  %64 = add nuw i64 %63, 1, !dbg !911
  %mallocsize20 = mul nuw nsw i64 %64, 8
  %malloccall21 = tail call noalias nonnull i8* @malloc(i64 %mallocsize20)
  %_malloccache22 = bitcast i8* %malloccall21 to double*
  %65 = getelementptr inbounds { {} addrspace(10)*, {} addrspace(10)*, i8*, i8*, i64, i64, {} addrspace(10)*, {} addrspace(10)*, double addrspace(13)*, i64, i1, i1, i1, double*, double* }, { {} addrspace(10)*, {} addrspace(10)*, i8*, i8*, i64, i64, {} addrspace(10)*, {} addrspace(10)*, double addrspace(13)*, i64, i1, i1, i1, double*, double* }* %3, i32 0, i32 14, !dbg !911
  store double* %_malloccache22, double** %65, align 8, !dbg !911
  store double* %_malloccache22, double** %_cache19, align 8, !dbg !911, !invariant.group !912
  br label %L77, !dbg !911

L77.us.preheader:                                 ; preds = %L77.lr.ph
  %66 = add nsw i64 %12, -1, !dbg !911
  %67 = add nuw i64 %66, 1, !dbg !911
  %mallocsize = mul nuw nsw i64 %67, 8
  %malloccall18 = tail call noalias nonnull i8* @malloc(i64 %mallocsize)
  %_malloccache = bitcast i8* %malloccall18 to double*
  %68 = getelementptr inbounds { {} addrspace(10)*, {} addrspace(10)*, i8*, i8*, i64, i64, {} addrspace(10)*, {} addrspace(10)*, double addrspace(13)*, i64, i1, i1, i1, double*, double* }, { {} addrspace(10)*, {} addrspace(10)*, i8*, i8*, i64, i64, {} addrspace(10)*, {} addrspace(10)*, double addrspace(13)*, i64, i1, i1, i1, double*, double* }* %3, i32 0, i32 13, !dbg !911
  store double* %_malloccache, double** %68, align 8, !dbg !911
  store double* %_malloccache, double** %_cache, align 8, !dbg !911, !invariant.group !913
  br label %L77.us, !dbg !911

L77.us:                                           ; preds = %L77.us, %L77.us.preheader
  %iv = phi i64 [ %iv.next, %L77.us ], [ 0, %L77.us.preheader ]
  %iv.next = add nuw nsw i64 %iv, 1, !dbg !914
  %69 = load double, double addrspace(13)* %59, align 8, !dbg !914, !tbaa !227
  %70 = fmul double %69, %0, !dbg !915
  %71 = getelementptr inbounds double, double addrspace(13)* %62, i64 %iv, !dbg !918
  store double %70, double addrspace(13)* %71, align 8, !dbg !918, !tbaa !227
  %72 = load double*, double** %_cache, align 8, !dbg !919, !dereferenceable !920, !invariant.group !913
  %73 = getelementptr inbounds double, double* %72, i64 %iv, !dbg !919
  store double %69, double* %73, align 8, !dbg !919, !invariant.group !921
  %exitcond25.not = icmp eq i64 %iv.next, %12, !dbg !919
  br i1 %exitcond25.not, label %L120.loopexit, label %L77.us, !dbg !911, !llvm.loop !922

L77:                                              ; preds = %L77, %L77.preheader
  %iv6 = phi i64 [ %iv.next7, %L77 ], [ 0, %L77.preheader ]
  %iv.next7 = add nuw nsw i64 %iv6, 1, !dbg !914
  %74 = getelementptr inbounds double, double addrspace(13)* %59, i64 %iv6, !dbg !914
  %75 = load double, double addrspace(13)* %74, align 8, !dbg !914, !tbaa !227
  %76 = fmul double %75, %0, !dbg !915
  %77 = getelementptr inbounds double, double addrspace(13)* %62, i64 %iv6, !dbg !918
  store double %76, double addrspace(13)* %77, align 8, !dbg !918, !tbaa !227
  %78 = load double*, double** %_cache19, align 8, !dbg !919, !dereferenceable !920, !invariant.group !912
  %79 = getelementptr inbounds double, double* %78, i64 %iv6, !dbg !919
  store double %75, double* %79, align 8, !dbg !919, !invariant.group !923
  %exitcond.not = icmp eq i64 %iv.next7, %12, !dbg !919
  br i1 %exitcond.not, label %L120.loopexit1, label %L77, !dbg !911, !llvm.loop !922

L110:                                             ; preds = %L22, %L16, %L12
  %80 = getelementptr inbounds [1 x [1 x i64]], [1 x [1 x i64]]* %7, i64 0, i64 0, i64 0, !dbg !924
  store i64 %26, i64* %80, align 8, !dbg !924, !tbaa !109
  %81 = addrspacecast [1 x [1 x i64]]* %7 to [1 x [1 x i64]] addrspace(11)*, !dbg !860
  %82 = addrspacecast [1 x [1 x i64]]* %5 to [1 x [1 x i64]] addrspace(11)*, !dbg !860
  %83 = call fastcc nonnull {} addrspace(10)* @julia_throwdm_1395([1 x [1 x i64]] addrspace(11)* nocapture nonnull readonly align 8 dereferenceable(8) %81, [1 x [1 x i64]] addrspace(11)* nocapture nonnull readonly align 8 dereferenceable(8) %82) #12, !dbg !860
  unreachable, !dbg !860

L120.loopexit:                                    ; preds = %L77.us
  br label %L120, !dbg !847

L120.loopexit1:                                   ; preds = %L77
  br label %L120, !dbg !847

L120:                                             ; preds = %L120.loopexit1, %L120.loopexit, %L56
  %84 = insertvalue { i8*, {} addrspace(10)*, {} addrspace(10)* } undef, {} addrspace(10)* %15, 1, !dbg !847
  %85 = getelementptr inbounds { { {} addrspace(10)*, {} addrspace(10)*, i8*, i8*, i64, i64, {} addrspace(10)*, {} addrspace(10)*, double addrspace(13)*, i64, i1, i1, i1, double*, double* }, {} addrspace(10)*, {} addrspace(10)* }, { { {} addrspace(10)*, {} addrspace(10)*, i8*, i8*, i64, i64, {} addrspace(10)*, {} addrspace(10)*, double addrspace(13)*, i64, i1, i1, i1, double*, double* }, {} addrspace(10)*, {} addrspace(10)* }* %2, i32 0, i32 1, !dbg !847
  store {} addrspace(10)* %15, {} addrspace(10)** %85, align 8, !dbg !847
  %86 = getelementptr inbounds { { {} addrspace(10)*, {} addrspace(10)*, i8*, i8*, i64, i64, {} addrspace(10)*, {} addrspace(10)*, double addrspace(13)*, i64, i1, i1, i1, double*, double* }, {} addrspace(10)*, {} addrspace(10)* }, { { {} addrspace(10)*, {} addrspace(10)*, i8*, i8*, i64, i64, {} addrspace(10)*, {} addrspace(10)*, double addrspace(13)*, i64, i1, i1, i1, double*, double* }, {} addrspace(10)*, {} addrspace(10)* }* %2, i32 0, i32 2, !dbg !847
  store {} addrspace(10)* %17, {} addrspace(10)** %86, align 8, !dbg !847
  %87 = load { { {} addrspace(10)*, {} addrspace(10)*, i8*, i8*, i64, i64, {} addrspace(10)*, {} addrspace(10)*, double addrspace(13)*, i64, i1, i1, i1, double*, double* }, {} addrspace(10)*, {} addrspace(10)* }, { { {} addrspace(10)*, {} addrspace(10)*, i8*, i8*, i64, i64, {} addrspace(10)*, {} addrspace(10)*, double addrspace(13)*, i64, i1, i1, i1, double*, double* }, {} addrspace(10)*, {} addrspace(10)* }* %2, align 8, !dbg !847
  ret { { {} addrspace(10)*, {} addrspace(10)*, i8*, i8*, i64, i64, {} addrspace(10)*, {} addrspace(10)*, double addrspace(13)*, i64, i1, i1, i1, double*, double* }, {} addrspace(10)*, {} addrspace(10)* } %87, !dbg !847
}

Unknown binary operator

Here's a MWE for an error I experienced:

julia> using Enzyme

julia> function f(x)
           y = (x + im)*(x - im)
           real(y)
       end
f (generic function with 1 method)

julia> autodiff(f, Active(2.0))
4.0

julia> function g(x)
           y = (x + im)*(1 - im*x)
           real(y)
       end
g (generic function with 1 method)

julia> autodiff(g, Active(2.0))
define dso_local double @preprocess_julia_overdub_1835(double) local_unnamed_addr {
top:
  %1 = bitcast double %0 to i64
  %2 = and i64 %1, -9223372036854775808
  %3 = bitcast i64 %2 to double
  %4 = fsub double 1.000000e+00, %3
  %5 = fmul double %4, %0
  %6 = fadd double %5, %0
  ret double %6
}
 constantinst[  %1 = bitcast double %0 to i64] = 0 val:0 type: {[-1]:Float@double}
 constantinst[  %2 = and i64 %1, -9223372036854775808] = 0 val:0 type: {[-1]:Float@double}
 constantinst[  ret double %6] = 1 val:1 type: {}
 constantinst[  %3 = bitcast i64 %2 to double] = 0 val:0 type: {[-1]:Float@double}
 constantinst[  %4 = fsub double 1.000000e+00, %3] = 0 val:0 type: {[-1]:Float@double}
 constantinst[  %5 = fmul double %4, %0] = 0 val:0 type: {[-1]:Float@double}
 constantinst[  %6 = fadd double %5, %0] = 0 val:0 type: {[-1]:Float@double}
cannot handle unknown binary operator:   %2 = and i64 %1, -9223372036854775808
ERROR: LLVM error: unknown binary operator
Stacktrace:
 [1] handle_error(::Cstring) at /home/mason/.julia/packages/LLVM/dVU7J/src/core/context.jl:105
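
For readers of the IR above: the constant -9223372036854775808 is the Float64 sign-bit mask (0x8000000000000000) read as a signed 64-bit integer, so the `and` that Enzyme rejects simply isolates the sign of the input. The sketch below uses plain Julia without Enzyme; the names `SIGN_MASK` and `signbit_only` are made up for illustration. It also checks that g(x) reduces to 2x, so the derivative one would expect once the operator is supported is 2.

```julia
# Sign-bit mask that appears in the IR as `and i64 %1, -9223372036854775808`.
const SIGN_MASK = 0x8000000000000000   # 16 hex digits, so this is a UInt64

# The i64 constant in the IR is the same bit pattern read as a signed integer.
mask_as_int = reinterpret(Int64, SIGN_MASK)

# Keep only the sign bit of a Float64, mirroring %1..%3 in the IR
# (hypothetical helper, not part of Enzyme or Base).
signbit_only(x::Float64) = reinterpret(Float64, reinterpret(UInt64, x) & SIGN_MASK)

# Same g as in the report.
g(x) = real((x + im) * (1 - im * x))

println(mask_as_int == typemin(Int64))   # the mask is typemin(Int64)
println(signbit_only(-2.0) === -0.0)     # only the sign survives
println(g(2.0))                          # real part of (2 + im) * (1 - 2im)
```

Since g(x) = 2x for finite x, a working `autodiff(g, Active(2.0))` should presumably return 2.0.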

Broken installation on Mac

Hello,

I was curious to test Enzyme to see whether it is faster than my custom-written rules, but the installation on Mac is broken:

julia> autodiff(f1, Active(1.0))
ERROR: could not load library "LLVMEnzyme-9.dylib"
dlopen(LLVMEnzyme-9.dylib, 1): image not found
Stacktrace:
 [1] enzyme! at /Users/tomas.pevny/.julia/packages/Enzyme/QQP8b/src/compiler/optimize.jl:2 [inlined]
 [2] (::Enzyme.Compiler.var"#2#3"{Bool,LLVM.Module,String,LLVM.TargetMachine})(::LLVM.ModulePassManager) at /Users/tomas.pevny/.julia/packages/Enzyme/QQP8b/src/compiler/optimize.jl:82
 [3] LLVM.ModulePassManager(::Enzyme.Compiler.var"#2#3"{Bool,LLVM.Module,String,LLVM.TargetMachine}) at /Users/tomas.pevny/.julia/packages/LLVM/KITdB/src/passmanager.jl:28
 [4] optimize!(::LLVM.Module, ::LLVM.Function; run_enzyme::Bool) at /Users/tomas.pevny/.julia/packages/Enzyme/QQP8b/src/compiler/optimize.jl:13
 [5] Enzyme.LLVMThunk(::Function, ::Tuple{DataType}; optimize::Bool, run_enzyme::Bool) at /Users/tomas.pevny/.julia/packages/Enzyme/QQP8b/src/Enzyme.jl:120
 [6] LLVMThunk at /Users/tomas.pevny/.julia/packages/Enzyme/QQP8b/src/Enzyme.jl:38 [inlined]
 [7] Enzyme.Thunk(::Function, ::Tuple{DataType}) at /Users/tomas.pevny/.julia/packages/Enzyme/QQP8b/src/Enzyme.jl:130
 [8] autodiff(::Function, ::Active{Float64}) at /Users/tomas.pevny/.julia/packages/Enzyme/QQP8b/src/Enzyme.jl:185
 [9] top-level scope at REPL[6]:1

Do you have any idea what could be going wrong?
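
A quick way to narrow this down is to ask the dynamic loader directly whether it can find the library named in the error. This is only a diagnostic sketch using the `Libdl` standard library; the library name is copied from the error message, and nothing here is specific to how Enzyme locates its binaries.

```julia
using Libdl

# Name taken verbatim from the error message above.
libname = "LLVMEnzyme-9"

# find_library returns an empty string when no candidate can be dlopen'ed.
path = Libdl.find_library([libname])

if isempty(path)
    println("`$libname` is not on the loader search path")
else
    println("found `$libname` at: ", Libdl.dlpath(path))
end
```

If the library is reported as missing, the binary was most likely never downloaded or unpacked for this platform, which would point at the build step rather than at `autodiff` itself.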

Enzyme.pullback: user defined struct support

Currently this errors because gradient asserts that every input argument is an AbstractFloat, so a user-defined struct is rejected:

julia> struct MyComplex
           a::Float64
           b::Float64
       end

julia> x = MyComplex(1.0, 2.0)
MyComplex(1.0, 2.0)

julia> Base.abs(x::MyComplex) = x.a + x.b

julia> Enzyme.pullback(abs, x)(1.0)
ERROR: AssertionError: arg isa AbstractFloat
Stacktrace:
 [1] #8
   @ ~/.julia/packages/Enzyme/A97If/src/Enzyme.jl:126 [inlined]
 [2] ntuple
   @ ./ntuple.jl:48 [inlined]
 [3] pack
   @ ~/.julia/packages/Enzyme/A97If/src/Enzyme.jl:123 [inlined]
 [4] gradient(f::Function, args::MyComplex)
   @ Enzyme ~/.julia/packages/Enzyme/A97If/src/Enzyme.jl:140
 [5] (::Enzyme.var"#12#14"{typeof(abs), Tuple{MyComplex}})(c::Float64)
   @ Enzyme ~/.julia/packages/Enzyme/A97If/src/Enzyme.jl:151
 [6] top-level scope
   @ REPL[6]:1

Type Analysis mismatch

function bad(domain::Vector{Float64}, dx::Integer)
    @inbounds begin
        data = view(domain, 1:dx)
        buf = isbits(data) ? MPI.Buffer(Ref(data)) : MPI.Buffer(data)

        req = MPI.Request()
        # int MPI_Isend(const void* buf, int count, MPI_Datatype datatype, int dest,
        #               int tag, MPI_Comm comm, MPI_Request *request)
        ccall((:MPI_Isend, MPI.libmpi), Cint,
              (MPI.MPIPtr, Cint, MPI.MPI_Datatype, Cint, Cint, MPI.MPI_Comm, Ptr{MPI.MPI_Request}),
                      buf.data, buf.count, buf.datatype, 0, 0, MPI.COMM_WORLD, req)
        req.buffer = buf
    end
    nothing
end

Setjmp handling

@aviatesk asked me what happens when Enzyme tries to differentiate through exceptional control flow (try/catch).

Version: Enzyme v0.6.2.

julia> using Enzyme

julia> function f(cond, x)
           try
               x = x*x
               if cond
                   error("Why")
               end
               x = x*x
           catch
           end
           x
       end
f (generic function with 1 method)

julia> autodiff(f, Const(false), Active(2.0))
inlinable function call in a function with debug info must have a !dbg location
  %2 = call fastcc double @julia_f_1802.inner(i8 %0, double %1)
define double @preprocess_julia_f_1802(i8 zeroext %0, double %1) local_unnamed_addr !dbg !56 {
entry:
  %2 = call fastcc double @julia_f_1802.inner(i8 %0, double %1)
  ret double %2
}

ERROR: LLVM error: function failed verification (1)
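
For reference, the values one would expect from a working differentiation can be worked out by hand: with cond = false the function computes x^4 (derivative 4x^3, so 32 at x = 2), and with cond = true the exception is thrown after the first squaring, so the result is x^2 (derivative 2x, so 4 at x = 2). The sketch below checks both numerically in plain Julia, without Enzyme; `central_diff` and its `δ` keyword are names made up for this example.

```julia
# Same function as in the report.
function f(cond, x)
    try
        x = x * x
        if cond
            error("Why")   # jumps to `catch`; x already holds the squared value
        end
        x = x * x
    catch
    end
    x
end

# Central finite difference (hypothetical helper for this sketch).
central_diff(fn, x; δ=1e-6) = (fn(x + δ) - fn(x - δ)) / (2δ)

d_false = central_diff(x -> f(false, x), 2.0)   # analytic value: 4 * 2^3 = 32
d_true  = central_diff(x -> f(true, x), 2.0)    # analytic value: 2 * 2   = 4

println(d_false)
println(d_true)
```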

Julia crash with Enzyme on product of sums

Enzyme (current master) crashes Julia when auto-diffing a product of sums.

A single sum works fine:

using Enzyme

x = rand(10)
dx = zero(x)

foo(x::AbstractVector{<:Real}) = sum(x)
foo(x)
autodiff(foo, Duplicated(x, dx))

But

bar(x::AbstractVector{<:Real}) = sum(x) * sum(x)
bar(x)
autodiff(bar, Duplicated(x, dx))

goes boom:

julia: /buildworker/worker/package_linux64/build/src/llvm-late-gc-lowering.cpp:857: std::vector<int, std::allocator<int> > LateLowerGCFrame::NumberAllBase(State&, llvm::Value*): Assertion `Tracked.size() == Numbers.size()' failed.

signal (6): Aborted
in expression starting at REPL[9]:1
gsignal at /lib64/libc.so.6 (unknown line)
abort at /lib64/libc.so.6 (unknown line)
__assert_fail_base at /lib64/libc.so.6 (unknown line)
__assert_fail at /lib64/libc.so.6 (unknown line)
...
Aborted (core dumped)
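
For anyone verifying a fix: writing S = sum(x), bar(x) = S^2, so every component of the gradient should equal 2S. The sketch below is plain Julia without Enzyme. It uses a fixed input instead of rand(10) so the numbers are reproducible, and `expected_gradient` is a name invented for this example.

```julia
bar(x::AbstractVector{<:Real}) = sum(x) * sum(x)

# Closed form: with S = sum(x), bar(x) = S^2, so d bar / d x_i = 2S for every i.
expected_gradient(x) = fill(2 * sum(x), length(x))

x = collect(1.0:10.0)            # deterministic stand-in for rand(10); S = 55
dx_expected = expected_gradient(x)

# Sanity check using the exact expansion bar(x + t*e_i) = S^2 + 2*S*t + t^2.
t = 0.5
xp = copy(x)
xp[3] += t
slope = (bar(xp) - bar(x) - t^2) / t

println(dx_expected[1])
println(slope)
```

After a successful `autodiff(bar, Duplicated(x, dx))`, `dx` would be expected to match `dx_expected`, assuming `dx` started at zero as in the report.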

Enzyme crash for simple user struct type

This crashes on both master and the latest release:

julia> using Enzyme

julia> struct Foo
           x::Float64
       end

julia> function (f::Foo)(x::Float64)
           return f.x + x
       end

julia> Enzyme.pullback(Foo(1.0), 2.0)(1.0)
julia: /workspace/srcdir/Enzyme/enzyme/Enzyme/CApi.cpp:253: LLVMOpaqueValue* EnzymeCreatePrimalAndGradient(EnzymeLogicRef, LLVMValueRef, CDIFFE_TYPE, CDIFFE_TYPE*, size_t, EnzymeTypeAnalysisRef, uint8_t, uint8_t, uint8_t, LLVMTypeRef, CFnTypeInfo, uint8_t*, size_t, EnzymeAugmentedReturnPtr, uint8_t, uint8_t): Assertion `argnum < uncacheable_args_size' failed.

signal (6): Aborted
in expression starting at REPL[5]:1
gsignal at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
abort at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x7f557877f728)
__assert_fail at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
EnzymeCreatePrimalAndGradient at /workspace/srcdir/Enzyme/enzyme/Enzyme/CApi.cpp:253
EnzymeCreatePrimalAndGradient at /home/roger/.julia/packages/Enzyme/A97If/src/api.jl:83
enzyme! at /home/roger/.julia/packages/Enzyme/A97If/src/compiler.jl:249
unknown function (ip: 0x7f551575c205)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2237 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2419
#codegen#30 at /home/roger/.julia/packages/Enzyme/A97If/src/compiler.jl:435
codegen##kw at /home/roger/.julia/packages/Enzyme/A97If/src/compiler.jl:386 [inlined]
_thunk at /home/roger/.julia/packages/Enzyme/A97If/src/compiler.jl:761
unknown function (ip: 0x7f5515cc51f1)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2237 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2419
callback at /home/roger/.julia/packages/Enzyme/A97If/src/compiler.jl:822
unknown function (ip: 0x7f5515cbe801)
_ZN12_GLOBAL__N_134CompileCallbackMaterializationUnit11materializeEN4llvm3orc29MaterializationResponsibilityE at /home/roger/packages/julias/julia-1.6/bin/../lib/julia/libLLVM-11jl.so (unknown line)
_ZN4llvm3orc16ExecutionSession26materializeOnCurrentThreadESt10unique_ptrINS0_19MaterializationUnitESt14default_deleteIS3_EENS0_29MaterializationResponsibilityE at /home/roger/packages/julias/julia-1.6/bin/../lib/julia/libLLVM-11jl.so (unknown line)
_ZNSt17_Function_handlerIFvSt10unique_ptrIN4llvm3orc19MaterializationUnitESt14default_deleteIS3_EENS2_29MaterializationResponsibilityEEPS8_E9_M_invokeERKSt9_Any_dataOS6_OS7_ at /home/roger/packages/julias/julia-1.6/bin/../lib/julia/libLLVM-11jl.so (unknown line)
_ZN4llvm3orc16ExecutionSession17runOutstandingMUsEv at /home/roger/packages/julias/julia-1.6/bin/../lib/julia/libLLVM-11jl.so (unknown line)
_ZN4llvm3orc16ExecutionSession6lookupENS0_10LookupKindERKSt6vectorISt4pairIPNS0_8JITDylibENS0_19JITDylibLookupFlagsEESaIS8_EENS0_15SymbolLookupSetENS0_11SymbolStateENS_15unique_functionIFvNS_8ExpectedINS_8DenseMapINS0_15SymbolStringPtrENS_18JITEvaluatedSymbolENS_12DenseMapInfoISI_EENS_6detail12DenseMapPairISI_SJ_EEEEEEEEESt8functionIFvRKNSH_IS6_NS_8DenseSetISI_SL_EENSK_IS6_EENSN_IS6_SV_EEEEEE at /home/roger/packages/julias/julia-1.6/bin/../lib/julia/libLLVM-11jl.so (unknown line)
_ZN4llvm3orc16ExecutionSession6lookupERKSt6vectorISt4pairIPNS0_8JITDylibENS0_19JITDylibLookupFlagsEESaIS7_EERKNS0_15SymbolLookupSetENS0_10LookupKindENS0_11SymbolStateESt8functionIFvRKNS_8DenseMapIS5_NS_8DenseSetINS0_15SymbolStringPtrENS_12DenseMapInfoISK_EEEENSL_IS5_EENS_6detail12DenseMapPairIS5_SN_EEEEEE at /home/roger/packages/julias/julia-1.6/bin/../lib/julia/libLLVM-11jl.so (unknown line)
_ZN4llvm3orc16ExecutionSession6lookupERKSt6vectorISt4pairIPNS0_8JITDylibENS0_19JITDylibLookupFlagsEESaIS7_EENS0_15SymbolStringPtrENS0_11SymbolStateE at /home/roger/packages/julias/julia-1.6/bin/../lib/julia/libLLVM-11jl.so (unknown line)
_ZN4llvm3orc25JITCompileCallbackManager22executeCompileCallbackEm at /home/roger/packages/julias/julia-1.6/bin/../lib/julia/libLLVM-11jl.so (unknown line)
_ZN4llvm6detail18UniqueFunctionBaseIvJmNS_15unique_functionIKFvmEEEEE8CallImplIKZNS_3orc30LocalJITCompileCallbackManagerINS7_14OrcX86_64_SysVEEC4ERNS7_16ExecutionSessionEmRNS_5ErrorEEUlmS4_E_EEvPvmRS4_ at /home/roger/packages/julias/julia-1.6/bin/../lib/julia/libLLVM-11jl.so (unknown line)
_ZN4llvm3orc19LocalTrampolinePoolINS0_14OrcX86_64_SysVEE7reenterEPvS4_ at /home/roger/packages/julias/julia-1.6/bin/../lib/julia/libLLVM-11jl.so (unknown line)
unknown function (ip: 0x7f5578976043)
unknown function (ip: 0x7f5578975ff5)
#12 at /home/roger/.julia/packages/Enzyme/A97If/src/Enzyme.jl:151
unknown function (ip: 0x7f5515cbc787)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2237 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2419
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1703 [inlined]
do_call at /buildworker/worker/package_linux64/build/src/interpreter.c:115
eval_value at /buildworker/worker/package_linux64/build/src/interpreter.c:204
eval_stmt_value at /buildworker/worker/package_linux64/build/src/interpreter.c:155 [inlined]
eval_body at /buildworker/worker/package_linux64/build/src/interpreter.c:562
jl_interpret_toplevel_thunk at /buildworker/worker/package_linux64/build/src/interpreter.c:670
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:877
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:825
jl_toplevel_eval_in at /buildworker/worker/package_linux64/build/src/toplevel.c:929
eval at ./boot.jl:360 [inlined]
eval_user_input at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:139
repl_backend_loop at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:200
start_repl_backend at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:185
#run_repl#42 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:317
run_repl at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:305
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2237 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2419
#874 at ./client.jl:387
jfptr_YY.874_41532.clone_1 at /home/roger/packages/julias/julia-1.6/lib/julia/sys.so (unknown line)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2237 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2419
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1703 [inlined]
jl_f__call_latest at /buildworker/worker/package_linux64/build/src/builtins.c:714
#invokelatest#2 at ./essentials.jl:708 [inlined]
invokelatest at ./essentials.jl:706 [inlined]
run_main_repl at ./client.jl:372
exec_options at ./client.jl:302
_start at ./client.jl:485
jfptr__start_34289.clone_1 at /home/roger/packages/julias/julia-1.6/lib/julia/sys.so (unknown line)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2237 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2419
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1703 [inlined]
true_main at /buildworker/worker/package_linux64/build/src/jlapi.c:560
repl_entrypoint at /buildworker/worker/package_linux64/build/src/jlapi.c:702
main at julia (unknown line)
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x4007d8)
Allocations: 33613790 (Pool: 33600323; Big: 13467); GC: 45
[1]    9172 abort (core dumped)  julia --project

deps/usr/lib not created in the right order

build.log

Downloading release asset enzyme.v0.0.1.x86_64-linux-gnu-gcc4.tar.gz to /home/wmoses/autodiff/Enzyme.jl/deps/usr/downloads/enzyme.v0.0.1.x86_64-linux-gnu-gcc4.tar.gz
Downloaded /home/wmoses/autodiff/Enzyme.jl/deps/usr/downloads/enzyme.v0.0.1.x86_64-linux-gnu-gcc4.tar.gz
Download of release assets complete
Release asset checksum verified for /home/wmoses/autodiff/Enzyme.jl/deps/usr/downloads/enzyme.v0.0.1.x86_64-linux-gnu-gcc4.tar.gz
[ Info: Directory /home/wmoses/autodiff/Enzyme.jl/deps/usr/lib does not exist!
[ Info: Destination file /home/wmoses/autodiff/Enzyme.jl/deps/usr/downloads/enzyme.v0.0.1.x86_64-linux-gnu-gcc4.tar.gz already exists, verifying...
[ Info: No hash cache found
[ Info: Calculated hash 9f2d8a9011348c6b167fc773a1090ce0190c2331d21f5ab8eea6dc2745bdf0ce for file /home/wmoses/autodiff/Enzyme.jl/deps/usr/downloads/enzyme.v0.0.1.x86_64-linux-gnu-gcc4.tar.gz
[ Info: Installing /home/wmoses/autodiff/Enzyme.jl/deps/usr/downloads/enzyme.v0.0.1.x86_64-linux-gnu-gcc4.tar.gz into /home/wmoses/autodiff/Enzyme.jl/deps/usr
[ Info: Found a valid dl path LLVMEnzyme.so while looking for LLVMEnzyme
[ Info: /home/wmoses/autodiff/Enzyme.jl/deps/usr/lib/LLVMEnzyme.so matches our search criteria of LLVMEnzyme
[ Info: /home/wmoses/autodiff/Enzyme.jl/deps/usr/lib/LLVMEnzyme.so cannot be dlopen'ed
[ Info: Could not locate LLVMEnzyme inside /home/wmoses/autodiff/Enzyme.jl/deps/usr/lib
ERROR: Error while loading expression starting at /home/wmoses/autodiff/Enzyme.jl/deps/build.jl:68
caused by [exception 1]
LibraryProduct(nothing, ["LLVMEnzyme"], :libenzyme, "Prefix(/home/wmoses/autodiff/Enzyme.jl/deps/usr)") is not satisfied, cannot generate deps.jl!
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] #write_deps_file#152(::Bool, ::typeof(write_deps_file), ::String, ::Array{LibraryProduct,1}) at /home/wmoses/.julia/packages/BinaryProvider/TcAwt/src/Products.jl:414
 [3] (::getfield(BinaryProvider, Symbol("#kw##write_deps_file")))(::NamedTuple{(:verbose,),Tuple{Bool}}, ::typeof(write_deps_file), ::String, ::Array{LibraryProduct,1}) at ./none:0
 [4] top-level scope at /home/wmoses/autodiff/Enzyme.jl/deps/build.jl:68
 [5] include at ./boot.jl:328 [inlined]
 [6] include_relative(::Module, ::String) at ./loading.jl:1094
 [7] include(::Module, ::String) at ./Base.jl:31
 [8] include(::String) at ./client.jl:431
 [9] top-level scope at none:5

Note that after the failure the directory does exist, which suggests that the build steps run in the wrong order.

Error adding Enzyme

Hi! I encountered the following error when I tried to install Enzyme. Any suggestion on what might have caused it or how to solve the error would be much appreciated! Many thanks!

   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.6.2 (2021-07-14)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> using Pkg

julia> Pkg.add("Enzyme")
    Updating registry at `~/.julia/registries/General`
    Updating git-repo `https://github.com/JuliaRegistries/General.git`
   Resolving package versions...
   Installed Enzyme_jll ─ v0.0.17+0
  Downloaded artifact: Enzyme
  Downloaded artifact: Enzyme
ERROR: Unable to automatically install 'Enzyme' from '/home/users/adamjcz/.julia/packages/Enzyme_jll/i2IWB/Artifacts.toml'
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:33
  [2] ensure_artifact_installed(name::String, meta::Dict{String, Any}, artifacts_toml::String; platform::Base.BinaryPlatforms.Platform, verbose::Bool, quiet_download::Bool, io::Base.TTY)
    @ Pkg.Artifacts /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/Artifacts.jl:445
  [3] ensure_all_artifacts_installed(artifacts_toml::String; platform::Base.BinaryPlatforms.Platform, pkg_uuid::Nothing, include_lazy::Bool, verbose::Bool, quiet_download::Bool, io::Base.TTY)
    @ Pkg.Artifacts /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/Artifacts.jl:510
  [4] download_artifacts(ctx::Pkg.Types.Context, pkg_roots::Vector{String}; platform::Base.BinaryPlatforms.Platform, verbose::Bool, io::Base.TTY)
    @ Pkg.Operations /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/Operations.jl:709
  [5] download_artifacts(ctx::Pkg.Types.Context, pkgs::Vector{Pkg.Types.PackageSpec}; platform::Base.BinaryPlatforms.Platform, julia_version::VersionNumber, verbose::Bool, io::Base.TTY)
    @ Pkg.Operations /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/Operations.jl:686
  [6] add(ctx::Pkg.Types.Context, pkgs::Vector{Pkg.Types.PackageSpec}, new_git::Vector{Base.UUID}; preserve::Pkg.Types.PreserveLevel, platform::Base.BinaryPlatforms.Platform)
    @ Pkg.Operations /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/Operations.jl:1241
  [7] add(ctx::Pkg.Types.Context, pkgs::Vector{Pkg.Types.PackageSpec}; preserve::Pkg.Types.PreserveLevel, platform::Base.BinaryPlatforms.Platform, kwargs::Base.Iterators.Pairs{Symbol, Base.TTY, Tuple{Symbol}, NamedTuple{(:io,), Tuple{Base.TTY}}})
    @ Pkg.API /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/API.jl:204
  [8] add(pkgs::Vector{Pkg.Types.PackageSpec}; io::Base.TTY, kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ Pkg.API /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/API.jl:80
  [9] add(pkgs::Vector{Pkg.Types.PackageSpec})
    @ Pkg.API /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/API.jl:78
 [10] #add#23
    @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/API.jl:76 [inlined]
 [11] add
    @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/API.jl:76 [inlined]
 [12] #add#22
    @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/API.jl:75 [inlined]
 [13] add(pkg::String)
    @ Pkg.API /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/API.jl:75
 [14] top-level scope
    @ REPL[2]:1

Type inference fails on LinearAlgebra.mul!

using LinearAlgebra
using Enzyme

function la_mul!(R, A, B)
    LinearAlgebra.mul!(R, A, B)
    nothing
end

A = rand(1024, 64)
B = rand(64, 512)

R = zeros(size(A,1), size(B,2))
∂z_∂R = rand(size(R)...)  # Some gradient/tangent passed to us

∂z_∂A = zero(A)
∂z_∂B = zero(B)

Enzyme.autodiff(la_mul!,
                Duplicated(R, ∂z_∂R),
                Duplicated(A, ∂z_∂A),
                Duplicated(B, ∂z_∂B))

This currently crashes on master because type inference fails to infer some integer value.
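For reference, the gradients that the call above should eventually produce can be written down by hand. The following is a minimal sketch that does not use Enzyme at all: it applies the standard reverse-mode rule for a matrix product (for R = A * B with incoming cotangent ∂z_∂R, the cotangents are ∂z_∂R * B' and A' * ∂z_∂R) and checks it with a directional difference. The smaller matrix sizes, the helper `z`, and the `_ref`, `dA`, `dB`, and `check_*` names are illustrative assumptions, not part of the issue.

```julia
using LinearAlgebra

A = rand(8, 4)
B = rand(4, 6)
∂z_∂R = rand(8, 6)          # incoming cotangent for R = A * B

# Closed-form reverse-mode rule for a matrix product.
∂z_∂A_ref = ∂z_∂R * B'      # same shape as A
∂z_∂B_ref = A' * ∂z_∂R      # same shape as B

# Hypothetical scalar whose gradient with respect to R is exactly ∂z_∂R.
z(A, B) = dot(∂z_∂R, A * B)

# z is linear in A and in B, so a directional difference matches the
# gradient contraction up to floating-point rounding.
dA = rand(size(A)...)
dB = rand(size(B)...)
check_A = isapprox(z(A + dA, B) - z(A, B), dot(∂z_∂A_ref, dA))
check_B = isapprox(z(A, B + dB) - z(A, B), dot(∂z_∂B_ref, dB))
```

Once the inference failure is fixed, these reference values could be compared against the shadow arrays that were passed in via `Duplicated`.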

Support dynamic code

We could rewrite calls to jl_apply_generic so that they call a method in Enzyme that produces the gradient. That would let us handle dynamic code.

Things to design:

  • How do we deal with Enzyme argument annotations?
  • How do we pass the Julia world through?

Supporting functions that cannot be inferred

using Enzyme
using LinearAlgebra

A = [1 2; 3 4.0]; B = [5 6; 7 8.0]
da = zero(A); c = similar(A)
autodiff((A,B)->sum(mul!(c, A, B)), Active, Duplicated(A, da), Const(B))

This fails because c is a non-const global, so the body of the closure cannot be inferred. Annotating the return value as Active{Float64} doesn't work either, because of the safety checks.
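The inference problem can be seen without Enzyme. The sketch below compares the inferred return type of a closure that captures a non-const global buffer with one that takes the buffer as an argument. The names `c_global`, `f_global`, `f_arg`, `rt_global`, and `rt_arg` are hypothetical and only serve the illustration. `Base.return_types` is an introspection helper, and its exact output may vary between Julia versions.

```julia
using LinearAlgebra

# Non-const global: its type may change at any time, so inference
# cannot assume anything about it inside a function body.
c_global = zeros(2, 2)

# Mirrors the failing pattern: the closure reads the non-const global.
f_global = (A, B) -> sum(mul!(c_global, A, B))

# Same computation with the output buffer passed in explicitly.
f_arg = (c, A, B) -> sum(mul!(c, A, B))

M = Matrix{Float64}
rt_global = Base.return_types(f_global, (M, M))
rt_arg = Base.return_types(f_arg, (M, M, M))

println("capturing a non-const global: ", rt_global)
println("buffer passed as an argument: ", rt_arg)
```

Passing the buffer as an argument, or declaring the global `const`, gives inference a concrete type to work with.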

`dgemm_64` support missing

using Enzyme
using LinearAlgebra

A = [1 2; 3 4.0]; B = [5 6; 7 8.0]
da = zero(A); c = similar(A)
autodiff((c, A,B)->sum(mul!(c, A, B)), Active, Duplicated(c, zero(c)), Duplicated(A, da), Const(B))

Fails with:

julia-debug: /workspace/srcdir/Enzyme/enzyme/Enzyme/GradientUtils.cpp:2883: llvm::Value* GradientUtils::invertPointerM(llvm::Value*, llvm::IRBuilder<>&): Assertion `0 && "cannot find deal with ptr that isnt arg"' failed.

signal (6): Aborted
in expression starting at REPL[6]:1
gsignal at /usr/lib/libc.so.6 (unknown line)
abort at /usr/lib/libc.so.6 (unknown line)
__assert_fail_base.cold at /usr/lib/libc.so.6 (unknown line)
__assert_fail at /usr/lib/libc.so.6 (unknown line)
invertPointerM at /workspace/srcdir/Enzyme/enzyme/Enzyme/GradientUtils.cpp:2883
visitCallInst at /workspace/srcdir/Enzyme/enzyme/Enzyme/AdjointGenerator.h:4841
delegateCallInst at /opt/x86_64-linux-gnu/x86_64-linux-gnu/sys-root/usr/local/include/llvm/IR/InstVisitor.h:299 [inlined]
