GithubHelp home page GithubHelp logo

Comments (27)

ltrzesniewski avatar ltrzesniewski commented on May 22, 2024 1

I took a quick look and I actually got better results with my implementation:

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.18363.720 (1909/November2018Update/19H2)
Intel Core i7-6700 CPU 3.40GHz (Skylake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=3.1.200
[Host] : .NET Core 3.1.2 (CoreCLR 4.700.20.6602, CoreFX 4.700.20.6702), X64 RyuJIT
Job-BSBVUK : .NET Core 3.1.2 (CoreCLR 4.700.20.6602, CoreFX 4.700.20.6702), X64 RyuJIT DEBUG

BuildConfiguration=Bench LaunchCount=1 RunStrategy=Throughput

Method Mean Error StdDev Code Size
GuidEqualsMethod 0.9211 ns 0.0263 ns 0.0220 ns 140 B
GuidBitwiseEqualsAltMethod 1.9558 ns 0.0449 ns 0.0375 ns 83 B
GuidBitwiseEqualsMethod 2.3840 ns 0.0428 ns 0.0400 ns 83 B

Here's my EqualsAlt method:

public static bool EqualsAlt<G>(in T first, in G second)
    where G : struct
{
    if (UnsafeTools.SizeOf<T>() != UnsafeTools.SizeOf<G>())
        return false;

    switch (UnsafeTools.SizeOf<T>())
    {
        case 0:
            return true;

        case 1:
            return UnsafeTools.AsInRef<T, byte>(first) == UnsafeTools.AsInRef<G, byte>(second);

        case 2:
            return UnsafeTools.AsInRef<T, short>(first) == UnsafeTools.AsInRef<G, short>(second);

        case 4:
            return UnsafeTools.AsInRef<T, int>(first) == UnsafeTools.AsInRef<G, int>(second);

        case 8:
            return UnsafeTools.AsInRef<T, long>(first) == UnsafeTools.AsInRef<G, long>(second);

        default:
            return Intrinsics.EqualsAligned(
                ref UnsafeTools.AsInRef<T, byte>(first),
                ref UnsafeTools.AsInRef<G, byte>(second),
                UnsafeTools.SizeOf<T>()
            );
    }
}

Here's UnsafeTools:

internal static class UnsafeTools
{
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static int SizeOf<T>()
    {
        Sizeof(typeof(T));
        return Return<int>();
    }

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static ref TTo AsInRef<TFrom, TTo>(in TFrom source)
    {
        Ldarg(nameof(source));
        return ref ReturnRef<TTo>();
    }
}

I've changed the benchmarks so they return the values, and also updated to BenchmarkDotNet 0.12.1 in order to get a better disassembly with [DisassemblyDiagnoser]:

[Benchmark]
public bool GuidEqualsMethod()
    => NonEmptyGuid.Equals(default);

[Benchmark]
public bool GuidBitwiseEqualsMethod()
    => BitwiseComparer<Guid>.Equals<Guid>(NonEmptyGuid, default);

[Benchmark]
public bool GuidBitwiseEqualsAltMethod()
    => BitwiseComparer<Guid>.EqualsAlt<Guid>(NonEmptyGuid, default);

The difference comes from the fact that the JIT is able to emit a tail call to Intrinsics.EqualsAligned in my version:

; DotNext.BitwiseComparer`1[[System.Guid, System.Private.CoreLib]].EqualsAlt[[System.Guid, System.Private.CoreLib]](System.Guid ByRef, System.Guid ByRef)
       mov       r8d,10
       mov       rax,offset DotNext.Runtime.Intrinsics.EqualsAligned(Byte ByRef, Byte ByRef, Int64)
       jmp       rax
       movzx     eax,al
       ret
; Total bytes of code 23

Whereas it emits a normal call in yours:

; DotNext.BitwiseComparer`1[[System.Guid, System.Private.CoreLib]].Equals[[System.Guid, System.Private.CoreLib]](System.Guid ByRef, System.Guid ByRef)
       sub       rsp,28
       mov       r8d,10
       call      DotNext.Runtime.Intrinsics.EqualsAligned(Byte ByRef, Byte ByRef, Int64)
       movzx     eax,al
       add       rsp,28
       ret
; Total bytes of code 23

I'm pretty sure you'd get the same results if you tweaked the IL a bit, but if the more readable version has the same perf, there's no real reason not to take it. 😉

from inlineil.fody.

sakno avatar sakno commented on May 22, 2024 1

dotnet crashed with SIGABRT Unfortunately disassembler is not working, at least as easy as on Windows 😢

from inlineil.fody.

sakno avatar sakno commented on May 22, 2024 1

Yes, sure. I need to inspect crash dump before this. It takes some time.

Also, you was right about configuration. I fixed everything and found the explanation why optimized build demonstrates worse result. It was because of my unawareness about how to correctly write benchmarks. If benchmark checks something that has return value then benchmark method should have return type as well.

from inlineil.fody.

ltrzesniewski avatar ltrzesniewski commented on May 22, 2024

Hello!

About the in modifier:

I won't change Ldarg as I want the opcode methods to be mapped "cleanly" to what they do in IL, but I can enhance the helper methods in the IL class. IL.Push already has a ref overload, but changing that to an in would be a breaking change. I suppose I could add additional IL.PushIn<T>(in T value) and IL.PushOut<T>(out T value) helpers for this, so you could use them instead of Ldarg.

By the way, I'm not getting any warnings on your Bitcast example, are you talking about actual compiler warnings, or are you getting analyzer warnings in the IDE?

About method calls:

I actually wanted to have something like your proposed T Arg<T>() method, so I implemented a T IL.Pop<T>() method at some point, but I quickly realized that it didn't play well with the code that Roslyn generates. It was very unreliable, as I couldn't really depend on the position in the evaluation stack where the call gets inserted, especially when you began to call it inside larger expressions, so it could easily retrieve an unexpected value.

I replaced that with void IL.Pop<T>(out T value) (and some other void methods), which is much more reliable as it is necessarily in a single statement, so I could work with what Roslyn generates. But it requires a local variable or argument to output its value to.

I added a couple things to reduce the verbosity: if your method has no overloads, you don't need to specify any parameter type, and the single method will get picked up. Besides, I don't think the following is that bad:

Ldc_I4(10);
Ldc_I4(2);
Call(new MethodRef(typeof(Math), nameof(Math.Max), typeof(int), typeof(int)));
return Return<int>();

Can you point me to some code where this is a burden? Maybe I could figure something out.

from inlineil.fody.

sakno avatar sakno commented on May 22, 2024

By the way, I'm not getting any warnings on your Bitcast example, are you talking about actual compiler warnings, or are you getting analyzer warnings in the IDE?

It depends on the FxCop rule set. I'm using custom rule set which warns about unused parameters

I suppose I could add additional IL.PushIn<T>(in T value) and IL.PushOut<T>(out T value)

I'm ok with this.

so I implemented a T IL.Pop<T>()

Yes, it's unreliable. Anyway, when you using pure IL code you're on dangerous path by default. IMO, Arg<T>() just can be replaced with literally nothing when translating to IL because this instruction is just a placeholder. Therefore, translation and analysis are very simple. Another drawback of Call method is a loose of compile-time check for the method presence.

I replaced that with void IL.Pop<T>(out T value)

This way produces unnecessary local variable.

Of course, the original proposal can be modified. For instance, Arg placeholder can be marked with EditorBrowsable attribute to avoid highlights from IDE. Or you can name it as ArgUnsafe to highlight unsafe and unreliable nature of this method. Or just place it to another class named as UnsafeIL or even to different inner class or namespace with Pro, Unsafe words in the name.

from inlineil.fody.

sakno avatar sakno commented on May 22, 2024

One more thing: what about to add generic versions of ldobj, stobj, sizeof etc? It will be less verbose than using typeof keyword. Of course, overloaded methods with Type parameter should still exist to cover ref value types.

from inlineil.fody.

ltrzesniewski avatar ltrzesniewski commented on May 22, 2024

Oh, ok so that's FxCop. I suppose IL.PushIn and IL.PushOut would help then. I just added an implementation (it's not released yet).

IMO, Arg<T>() just can be replaced with literally nothing when translating to IL because this instruction is just a placeholder.

That's exactly the implementation of IL.Push, and the former implementation of IL.Pop 😉

Another drawback of Call method is a loose of compile-time check for the method presence.

You don't lose it, InlineIL checks for the method presence and fails the build if it's not there.

Of course, the original proposal can be modified. For instance, Arg placeholder can be marked with EditorBrowsable attribute to avoid highlights from IDE. Or you can name it as ArgUnsafe to highlight unsafe and unreliable nature of this method. Or just place it to another class named as UnsafeIL or even to different inner class or namespace with Pro, Unsafe words in the name.

Yes, that would be a very unsafe method indeed. It would only work if you call a method where all the parameters are Arg<T>().

Honestly I wouldn't be comfortable using it, ever. I'm not even comfortable providing something like that in the API. Unless maybe if I add a check for correct usage (which would make sure you use it for all of the parameters of a method call).

One more thing: what about to add generic versions of ldobj, stobj, sizeof etc? It will be less verbose than using typeof keyword.

I could add that. I'm not sure I'll like the syntax though, it may not feel very consistent with the existing methods.

from inlineil.fody.

sakno avatar sakno commented on May 22, 2024

For Arg<T> I can show you actual example. Here is my code. A little bit verbose, isn't? Arg<T> is perfect in this case because arguments are on the evaluation stack, not in parameters or locals. Moreover, I need to reinterpret managed pointer type (which is allowed by ECMA). Here I'm fully understand what is going on and safety is my problem. Additionally, I'm always inspecting generated IL code with ikdasm.

from inlineil.fody.

sakno avatar sakno commented on May 22, 2024

Also, I see obvious usefulness for ldftn, ldvirtftn, ldtoken. With Arg<T> it's possible to reference the method without MethodRef.

from inlineil.fody.

ltrzesniewski avatar ltrzesniewski commented on May 22, 2024

Ok, here are some thoughts about your Compare<G> method, and how to make it shorter:

  1. You can replace new TR(typeof(byte)).MakeByRefType() with typeof(byte).MakeByRefType(). You're not using any TypeRef feature here, so you don't need it.

(As an aside: I kind of regret introducing constructors for TypeRef, MethodRef etc, as I don't like having that new keyword everywhere, but now we're stuck with it. I may add more static factories on these classes later on.)

  1. To me it looks like you've unnecessarily written the following part in IL:
            Sizeof(typeof(T));
            Pop(out uint size);
            Push(size);
            Sizeof(typeof(G));
            Ceq();
            Dup();
            Brfalse(methodExit);
            Pop();

This could have been replaced with C# code like:

if (Unsafe.SizeOf<T>() == Unsafe.SizeOf<G>())

The JIT will always optimize a call to Unsafe.SizeOf<T>() to the equivalent sizeof instruction (especially since the method is flagged for aggressive inlining), and it will evaluate the condition as a constant at JIT time and eliminate one of the branches. You can of course embed the SizeOf method in your assembly. Anyway the JIT will produce the same output.

  1. sizeof is basically free - it's always constant, so there's no need to store it in a local variable.

  2. Similarly, with a custom helper method, you could write something like:

return Intrinsics.Compare(
    ref Unsafe.AsInRef<T, byte>(first),
    ref Unsafe.AsInRef<G, byte>(second),
    Unsafe.SizeOf<T>()
);

You'd need to write an equivalent to Unsafe.As which takes an in parameter instead of ref, something like:

[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static ref TTo AsInRef<TFrom, TTo>(in TFrom source)
{
    Ldarg(nameof(source));
    return ref IL.ReturnRef<TTo>();
}

I'd argue this would be more readable than:

Ldarg(nameof(first));
Ldarg(nameof(second));
Sizeof(typeof(T));
Conv_I8();
Push(Intrinsics.Compare(ref ArgRef<byte>(), ref ArgRef<byte>(), Arg<long>())

Also, I see obvious usefulness for ldftn, ldvirtftn, ldtoken. With Arg<T> it's possible to reference the method without MethodRef.

I'm not sure I understand what you mean here.

from inlineil.fody.

ltrzesniewski avatar ltrzesniewski commented on May 22, 2024

One more thing: what about to add generic versions of ldobj, stobj, sizeof etc? It will be less verbose than using typeof keyword.

I added that. I actually like this syntax. 👍

from inlineil.fody.

sakno avatar sakno commented on May 22, 2024

and it will evaluate the condition as a constant at JIT time and eliminate one of the branches.

Generally, it's not always working in this way. Compare cannot be inlined by itself. As a result, sizeof will ask runtime to return the size of the generic argument. Generics are actually passed as hidden arguments as described in .NET ABI. So it's constant value at runtime, but not in assembly code generally. To solve this issue I have to wait for this.

I'm not sure I understand what you mean here.

I meant this:

Ldtoken(Console.WriteLine(Arg<string>(), Arg<object[]>());

But I forgot that many methods have void return type.

from inlineil.fody.

sakno avatar sakno commented on May 22, 2024

I agree that my code can be rewritten with pure C#. However, I saw somewhere in the open issues for CLR that AggressiveInlining may be ignored in some circumstances (even if the size of IL code is less than 36 bytes). This is like inline in C which recommends the compiler to inline the method body at call site but not enforce this. That's why GCC has extension in the form of always_inline attribute.

from inlineil.fody.

ltrzesniewski avatar ltrzesniewski commented on May 22, 2024

The issue you linked does not apply to this case: the T and G type parameters of Compare are constrained to struct types, which means there won't be native code sharing: the JIT will generate a different native method for each combination of type arguments. You're in the best possible scenario for the JIT to optimize out all branches except one (at least if you had written switch (Unsafe.SizeOf<T>()), I'm not sure if the JIT will be able to optimize out switch (size) because of the variable).

Even if the Compare method could handle reference types, and therefore the JIT emitted shareable native code, sizeof would still yield a constant, as it returns the pointer size when asked for the size of any reference type, and struct types get their own native code anyway.

Also, I wouldn't worry too much about AggressiveInlining being ignored in this case: they made a ton of optimizations in .NET Core 3.0 by using Unsafe in many hot paths. These methods are very simple, and none of them gets inlining blocked, it would be a major regression if that happened. Of course, if you're still unsure, you can take a look at the native code emitted by the JIT.

from inlineil.fody.

sakno avatar sakno commented on May 22, 2024

Let's recap our discussion:

  • PushIn and PushOut are mostly expected methods
  • Arg<T> is nice to have. Some API design work required.

from inlineil.fody.

ltrzesniewski avatar ltrzesniewski commented on May 22, 2024

Yes, PushIn and PushOut are implemented. But I think I'll rename them to PushInRef and PushOutRef as the current names seem confusing.

Overloads like Stobj<T>() instead of Stobj(TypeRef) are also implemented.

As for Arg<T>... I don't think that's a good idea. 😉

from inlineil.fody.

sakno avatar sakno commented on May 22, 2024

You decide, you're an author of this add-in 😉

Interesting fact: I tested Compare using my benchmarks when it's fully re-implemented without InlineIL using SizeOf<T> and can tell that it's a slower than roundtrip with local variable. I think that JIT unable to apply constant propagation in combination with switch table.

from inlineil.fody.

ltrzesniewski avatar ltrzesniewski commented on May 22, 2024

Yes, that's interesting... I wrote a quick SharpLab test which seems to show that Roslyn always goes through a local variable when using the switch IL instruction, and maybe the JIT optimizes for that. The native code is always optimized to a single branch in that case.

But that means your rewritten code still should use a local variable for the switch... Could you send me that benchmark please, if you still have it? I'd be interested in playing a bit with it.

from inlineil.fody.

sakno avatar sakno commented on May 22, 2024

Yup, benchmark result for the current implementation: https://sakno.github.io/dotNext/benchmarks.html
Benchmark code: https://github.com/sakno/dotNext/blob/master/src/DotNext.Benchmarks/BitwiseEqualityBenchmark.cs
You can run benchmark project using dotnet run -c Bench. Bench configuration is the same as Release but doesn't publish packages for NuGet.
The main loss is in GuidBitwiseEqualsMethod bench method. The current implementation can save 4-5 nanos (which is x2) in comparison with the implementation using Unsafe.SizeOf<T>() and other helper methods you mentioned above.

Another proof that constant propagation will not lead to inlining of Equals or Compare is EnumEqualsUsingBitwiseComparer bench. In case of constant propagation, the method should be optimized to a few assembly instructions and have the same performance as EqualityComparer.Default but it's not.

from inlineil.fody.

ltrzesniewski avatar ltrzesniewski commented on May 22, 2024

Thanks, I'll take a look at this (maybe tomorrow).

Just a quick note about EqualityComparer<T>.Default: don't expect to be able to beat it for things like enums. It's a devirtualized JIT intrinsic. 😄

Compare this demo in .NET Core and .NET Framework targets and see the difference.

from inlineil.fody.

sakno avatar sakno commented on May 22, 2024

Many thanks for such investigation!! I owe you a beer 🍺 I didn't use multiple returns due to absence of this optimization. Now I see that it's not a problem because just one branch survive after optimization.

from inlineil.fody.

ltrzesniewski avatar ltrzesniewski commented on May 22, 2024

You're welcome! 😄 I didn't do much BTW, I just stopped after the first iteration. 😉

Actually, I thought about it a bit more and noticed it was strange that there was no inlining happening. Then, I noticed that your benchmark code actually runs in Debug mode! 😱

See, you added WithCustomBuildConfiguration("Bench") to the benchmark configuration, but BenchmarkDotNet builds its own assembly from a generated project, which does not know about the Bench configuration, so you end up with the default Debug mode.

You can add WithArguments(new[] { new MsBuildArgument("/p:Optimize=true") }) to force it to optimize, or specify a condition for the Bench configuration in a Directory.Build.props file.

I'm actually getting worse results with that change, but I didn't look into it further just yet.

from inlineil.fody.

sakno avatar sakno commented on May 22, 2024

Hmm, WithCustomBuildConfiguration("Bench") doing exactly what I expected. Otherwise, I would see warnings from DotNetBenchmark. Optimization flag for Bench build config is described in csproj file.

Disassembly diagnoser is a luxury for me because I'm sitting on Linux and it is not supported for .NET Core, only for Mono.

from inlineil.fody.

ltrzesniewski avatar ltrzesniewski commented on May 22, 2024

Yes, I didn't notice it at first because I expected BDN to complain, but I guess it doesn't since you're basically doing a custom build. It writes X64 RyuJIT DEBUG in the output though..

The disassembly diagnoser has been massively improved and is now cross-platform in BDN 0.12.1. 😉

from inlineil.fody.

sakno avatar sakno commented on May 22, 2024

Also I found that switch expression produces worse result rather than switch statement. In IL I found that Roslyn generates redundant temporary variables.

from inlineil.fody.

ltrzesniewski avatar ltrzesniewski commented on May 22, 2024

I guess you should report an issue then, as it was supposed to work. Maybe some configuration in your benchmarks is preventing it from working.

from inlineil.fody.

sakno avatar sakno commented on May 22, 2024

@ltrzesniewski , thanks a lot!

from inlineil.fody.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.