herumi / xbyak

A JIT assembler for x86 (IA-32) / x64 (AMD64, x86-64) supporting MMX/SSE/SSE2/SSE3/SSSE3/SSE4/FPU/AVX/AVX2/AVX-512, provided as a C++ header

License: BSD 3-Clause "New" or "Revised" License

xbyak's Introduction

Xbyak 7.06

A C++ JIT assembler for x86 (IA32), x64 (AMD64, x86-64)

Abstract

Xbyak is a C++ header-only library for dynamically assembling x86 (IA32) and x64 (AMD64, x86-64) mnemonics at run time.

The pronunciation of Xbyak is kəi-bja-k. It is named after the Japanese word 開闢, which means the beginning of the world.

Features

  • header file only
  • Intel/MASM-like syntax
  • full support for AVX-512
  • support for APX/AVX10

Note: Use and_(), or_(), ... instead of and(), or(), because and, or, etc. are reserved alternative operator tokens in C++. If you want to use and(), or(), specify the -fno-operator-names option to gcc/clang.

Derived Projects

News

  • support RAO-INT for APX
  • support AVX10 detection, AESKLE, WIDE_KL, KEYLOCKER, KEYLOCKER_WIDE
  • support APX except for a few instructions
  • add amx_fp16/avx_vnni_int8/avx_ne_convert/avx-ifma
  • add movdiri, movdir64b, clwb, cldemote
  • WAITPKG instructions (tpause, umonitor, umwait) are supported.
  • MmapAllocator supports memfd with user-defined strings. see sample/memfd.cpp
  • strictly check that the address offset disp32 fits in a signed 32-bit integer. e.g., ptr[(void*)0xffffffff] causes an error.
    • define XBYAK_OLD_DISP_CHECK if you need the old check, but the option will be removed.
  • add jmp(mem, T_FAR), call(mem, T_FAR), retf() for far absolute indirect jump.
  • vnni instructions such as vpdpbusd support VEX encoding.
  • (breaks backward compatibility) push(byte, imm) (resp. push(word, imm)) forces imm to be cast to 8 (resp. 16) bits.
  • (Windows) #include <winsock2.h> has been removed from xbyak.h, so add it explicitly if you need it.
  • support exception-less mode; see Exception-less mode.
  • XBYAK_USE_MMAP_ALLOCATOR will be defined on Linux/macOS unless XBYAK_DONT_USE_MMAP_ALLOCATOR is defined.

Supported OS

  • Windows (XP, Vista, 7, 10, 11) (32 / 64 bit)
  • Linux (32 / 64 bit)
  • macOS (Intel CPU)

Supported Compilers

Almost all C++03 or later compilers for x86/x64, such as Visual Studio, g++, clang++, Intel C++ compiler, and g++ on mingw/cygwin.

License

BSD-3-Clause License

Author

光成滋生 Mitsunari Shigeo

GitHub | Website (Japanese)

Sponsors welcome

GitHub Sponsor

xbyak's People

Contributors

80c535, akharito, atafra, constellation, cota, cursey, densamoilov, doyagu, electronicsarchiver, gibbed, herumi, igorsafo, inolen, jrmwng, keszybz, koscrob, kwiersch, mgouicem, nagamani71, nivas-x86, nshustrov, orz--, ryan-rsm-mckenzie, scribam, shelleygoel, sonicadvance1, tachi107, tyfkda, wunkolo, xuxinzen

xbyak's Issues

vcmp missing broadcast flags

The vcmppd and vcmpps instructions don't have the (respectively) T_B64 and T_B32 broadcast operand type flags set, even though the corresponding instructions are legal (at least on my machine/extension set).

Thanks.

Programmable Label & Jump

In addition to the existing declarative labels, I propose the concept and an implementation of programmable Label & Jump objects.

For example, as in the V8 assembler:

void AsmSmallProcedure(Label* bailout) {
    asm(...);
    asm(...);
    cmp(...);
    je(bailout);  // bind
    ...;
}

Label bailout;
AsmSmallProcedure(&bailout);
...
bind(&bailout);  // bind jump target here.

Doing this makes it easier to modularize asm fragments and to pass labels around.
For example, compared with generating unique strings as iv/lv5/breaker does (https://github.com/Constellation/iv/blob/master/iv/lv5/breaker/compiler.h#L2966),
I think labels would become much easier to handle. Labels could then be used non-declaratively, which is very convenient when you want to emit code dynamically from some dynamic input, especially for a JIT compiler.

At the same time, I also propose a Jump object, as in the JSC Macro Assembler:

Jump AsmSmallProcedure() {
    asm(...);
    asm(...);
    cmp(...);
    Jump jump = je();
    ...;
    return jump;
}

Jump jump = AsmSmallProcedure();
...
link(&jump);  // link jump target here.

Being able to write code like this would make jumps much easier to handle.

Combining Label and Jump:

Jump AsmSmallProcedure() {
    asm(...);
    asm(...);
    cmp(...);
    Jump jump = je();
    ...;
    return jump;
}

jmp(".start");
// unreachable

Label label;
// reach here later
bind(&label);
...

L(".start");
Jump jump = AsmSmallProcedure();
...
link(&jump, &label);  // link jump target to label

If something like this were possible, I think we could have programmable labels and jumps in addition to the current declarative labels.

I am not sure whether the API should be gen.link(&jump, &label) / gen.link(&jump), or something like jump.link(gen, label).
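
A minimal standalone sketch of the Jump/link idea, assuming a plain byte buffer instead of Xbyak's CodeGenerator. MiniAsm, Jump, and link are hypothetical names used only for illustration; the point is that a Jump merely remembers where its rel32 field sits so it can be patched once the target is known.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for illustration; none of these names exist in Xbyak.
struct Jump {
    std::size_t dispPos; // offset of the rel32 field that still needs patching
};

struct MiniAsm {
    std::vector<std::uint8_t> code;

    void nop() { code.push_back(0x90); }

    // Emit "je rel32" (0F 84 cd) with a zero placeholder and hand back a Jump.
    Jump je() {
        code.push_back(0x0F);
        code.push_back(0x84);
        Jump j{code.size()};
        code.resize(code.size() + 4, 0x00);
        return j;
    }

    // Patch the jump so that it lands on byte offset `target`.
    void link(const Jump& j, std::size_t target) {
        // rel32 is measured from the end of the jump instruction.
        const std::int64_t rel = static_cast<std::int64_t>(target)
                               - static_cast<std::int64_t>(j.dispPos + 4);
        const std::uint32_t u = static_cast<std::uint32_t>(rel);
        for (int i = 0; i < 4; i++) {
            code[j.dispPos + i] = static_cast<std::uint8_t>(u >> (8 * i)); // little-endian
        }
    }

    // Patch the jump so that it lands on the current end of the buffer.
    void link(const Jump& j) { link(j, code.size()); }
};
```

Because the Jump is an ordinary value, it can be returned from a helper such as AsmSmallProcedure() and linked later by the caller, which is the modularity the proposal is after.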

CodeArray::setSize can throw when it shouldn't

Hi again, just a small issue this time:

setSize throws an error when size >= maxSize_.
However "size_ == maxSize_" is a valid state that occurs naturally as you fill an autoGrow-enabled CodeArray.

Therefore, the following valid (in my opinion) code has a small chance to fail due to setSize throwing an error:

size_t prevPos = getSize ();
setSize ();
<...do stuff...>
setSize (prevPos); // <-- This may throw.

I think setSize should only throw if size > maxSize_ instead.

Thanks.

EDIT: Oh, and thanks for creating xbyak - it's been a great help to me.
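
A minimal sketch of the bound check suggested above, assuming a simplified buffer class. Buffer is a hypothetical stand-in, not Xbyak's CodeArray; only the comparison against maxSize_ is the point.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical stand-in for the relevant part of CodeArray; not Xbyak's actual code.
class Buffer {
    std::size_t size_;
    std::size_t maxSize_;
public:
    explicit Buffer(std::size_t maxSize) : size_(0), maxSize_(maxSize) {}
    std::size_t getSize() const { return size_; }
    void setSize(std::size_t size) {
        // A completely full buffer (size == maxSize_) is a valid state,
        // so only sizes strictly beyond the capacity are rejected.
        if (size > maxSize_) throw std::out_of_range("offset is too big");
        size_ = size;
    }
};
```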

fix nop()

In nop(), the following for loop

for (size_t i = 0; i < len; i++) {
    db(seq[i]);
}

can be rewritten as a single line:

db(seq, len);

I also have a request: could db be changed to accept the code bytes as variadic arguments?

I would like to rewrite

db(0x90); db(0x90); db(0x90);

as

db(0x90, 0x90, 0x90);

Thank you.
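
One way the requested variadic db could look, as a sketch only. It assumes C++11 variadic templates (Xbyak itself also targets C++03 compilers), and ByteSink is a hypothetical byte sink rather than Xbyak's CodeArray.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical byte sink for illustration; Xbyak's real db() lives in CodeArray.
struct ByteSink {
    std::vector<std::uint8_t> bytes;

    // Base case: append a single byte.
    void db(int code) { bytes.push_back(static_cast<std::uint8_t>(code)); }

    // Variadic case: append the first byte, then recurse on the rest.
    template<typename... Rest>
    void db(int first, Rest... rest) {
        db(first);
        db(rest...);
    }
};
```

With this overload set, db(0x90, 0x90, 0x90) appends three bytes in order, while a single-argument call still resolves to the non-template version.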

Keep getting 'ERR: Can't Protect', but not in gdb mode

$ ./bf64 hello.bf
64bit mode
ERR:can't protect

Is there any explanation for this? I can't debug it with gdb, since it runs fine in debug mode.
I'm using 64-bit CentOS 6.4 with gcc 4.4.7.

FYI, I also tried on Win7 with msvc 10 and did not get the error there either.

And FWIW, the output of less /proc/cpuinfo on this machine is listed below:

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 23
model name      : Intel(R) Core(TM)2 Quad CPU    Q8400  @ 2.66GHz
stepping        : 10
cpu MHz         : 2657.908
cache size      : 2048 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 4
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm dts tpr_shadow vnmi flexpriority
bogomips        : 5315.81
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

..and the other 3 cores..

[Intel MPX] BND prefix(F2h) is missing.

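According to the specification, bnd0-bnd3 are cleared when a JMP/Jcc/CALL/RET instruction does not start with the BND prefix.
Currently I work around this by adding a function that emits the bnd prefix. Could this be added?
void bnd(){ db(0xF2); }

In addition, bndmk and bndldx/bndstx (mib encoding) interpret base and index in a special way, so the base and index registers must be clearly distinguished, and the SIB optimization (e.g. [rcx*2] ⇒ [rcx+rcx]) must be suppressed for them.
For now I handle this by emitting raw bytes where needed, but please consider supporting it as well.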

how to use vpcmpgtb

May I ask one question about how to use void vpcmpgtb(const Xmm& x1, const Xmm& x2, const Operand& op).

For example:
xmm_src with 16*s8 = [-2, 0, 3, -1, ....];
and I want to get xmm_dst like [0, 0, 3, 0, ...]

It can work with function void vpcmpgtb(const Opmask& k, const Xmm& x, const Operand& op)

      Opmask k;
      vpcmpgtb(k, xmm_src, xmm_zero); 
      vpmovm2b(xmm_tmp, k); 
      vandps(xmm_dst, xmm_src, xmm_tmp); 

But when I try

      vpcmpgtb(xmm_tmp, xmm_src, xmm_zero); 
      vandps(xmm_dst, xmm_src, xmm_tmp); 

It throws an error:

terminate called after throwing an instance of 'Xbyak::Error'
what(): evex is invalid

Although there are other ways to do this, I am still confused about why this error is thrown.

Thanks very much

YMM support on vptest ?

Hello,

My project currently uses version 4.84 of Xbyak. I'm looking to upgrade to the latest version, but I saw that we made a small local change to support YMM registers on vptest.
Here is the local hack:

--- a/plugins/GSdx/xbyak/xbyak_mnemonic.h
+++ b/plugins/GSdx/xbyak/xbyak_mnemonic.h
@@ -946,7 +974,7 @@ void vpmovzxdq(const Xmm& xm, const Operand& op) { opAVX_X_XM_IMM(xm, op, MM_0F3
 void vpshufd(const Xmm& xm, const Operand& op, uint8 imm) { opAVX_X_XM_IMM(xm, op, MM_0F | PP_66, 0x70, true, -1, imm); }
 void vpshufhw(const Xmm& xm, const Operand& op, uint8 imm) { opAVX_X_XM_IMM(xm, op, MM_0F | PP_F3, 0x70, true, -1, imm); }
 void vpshuflw(const Xmm& xm, const Operand& op, uint8 imm) { opAVX_X_XM_IMM(xm, op, MM_0F | PP_F2, 0x70, true, -1, imm); }
-void vptest(const Xmm& xm, const Operand& op) { opAVX_X_XM_IMM(xm, op, MM_0F38 | PP_66, 0x17, false, -1); }
+void vptest(const Xmm& xm, const Operand& op) { opAVX_X_XM_IMM(xm, op, MM_0F38 | PP_66, 0x17, true, -1); }
 void vrcpps(const Xmm& xm, const Operand& op) { opAVX_X_XM_IMM(xm, op, MM_0F, 0x53, true, -1); }
 void vrsqrtps(const Xmm& xm, const Operand& op) { opAVX_X_XM_IMM(xm, op, MM_0F, 0x52, true, -1); }
 void vsqrtpd(const Xmm& xm, const Operand& op) { opAVX_X_XM_IMM(xm, op, MM_0F | PP_66, 0x51, true, -1); }

Looking at the new code, I think it is still missing (I don't see T_YMM). I don't know if the flag is enough.

Thank you.

Missing BSF/BSR instructions

The BSF/BSR instructions appear to be missing from xbyak.h.
Sorry - didn't realize they were in xbyak_mnemonic.h - intellisense deceived me to think they weren't anywhere.

RIP-relative addressing with labels - issues with opcodes that have immediates

Hi,

Sorry for not catching this in the "RIP-relative addressing with labels" issue but it seems like the feature doesn't work correctly for opcodes that have an extra immediate after the modrm data (like cmpss, pshufd, test and many others).

The following testcase demonstrates it (check the resulting assembly):

    Xbyak::Label label;
    cmpss (xmm0, ptr[rip + label], 0);
    test (dword[rip + label], 33);
    bt (dword[rip + label ], 3);
    vblendpd (xmm0, dword[rip + label], 3);
    vpalignr (xmm0, qword[rip + label], 4);
    vextractf128 (dword[rip + label], ymm3, 12);
    vperm2i128 (ymm0, ymm1, qword[rip + label], 13);
    vcvtps2ph (ptr[rip + label], xmm2, 44);
    mov (dword[rip + label], 0x1234);
    shl (dword[rip + label], 3);
    shr (dword[rip + label], 1);
    shld (dword[rip + label], rax, 3);
    imul (rax, qword[rip + label], 21);
    rorx (rax, qword[rip + label], 21);
    ret ();

    L (label);
    dq (0x123456789abcdef0);

(Note: some of these write to ptr[rip + label], which I'm not sure of the legality of)

I don't see any nice fix for this, but I've prepared a rather cumbersome fix that adds optional arguments such as "int imm = NONE" and "int endOffset = 0" to handle this case, in case it helps:
https://gist.github.com/whyisthisfieldhere/3bcf044bcd06a2c474fd

What do you think?
I'd understand if you'd prefer to remove the feature if you don't consider it useful enough.
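
To make the failure mode concrete, here is a small sketch of the displacement arithmetic. It is not Xbyak's implementation; ripDisp32 and its parameters are hypothetical, and all positions are assumed to be byte offsets inside one code buffer. RIP-relative displacements are measured from the end of the instruction, so any immediate bytes that follow the disp32 field have to be included.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Illustrative helper, not part of Xbyak: compute the disp32 for a
// RIP-relative operand. All positions are byte offsets within one code buffer.
//   dispPos : offset of the 4-byte displacement field
//   immSize : number of immediate bytes that follow the displacement (0 if none)
//   target  : offset of the label being addressed
inline std::int32_t ripDisp32(std::size_t dispPos, std::size_t immSize, std::size_t target) {
    // RIP holds the address of the next instruction, i.e. the end of this one.
    const std::size_t nextInsn = dispPos + 4 + immSize;
    return static_cast<std::int32_t>(static_cast<std::int64_t>(target)
                                   - static_cast<std::int64_t>(nextInsn));
}
```

For an instruction such as cmpss with a one-byte immediate, forgetting immSize makes the result off by exactly one byte, which matches the kind of error described in this issue.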

AVX512 support

Hi,

Are you working on, or planning to add, AVX-512 instruction set support in xbyak?

Thanks for developing this great project.

Regards,
Jacek

Support for segment registers

This feature is missing, and it would be extremely convenient for me and maybe others as well. Are there any plans to implement it? If not, could you please give some guidelines on how you would like such a thing to be integrated into Xbyak? Thank you very much. :-)

Support for ptr[rip + label + 1234] addressing.

I am trying to replicate code that reads from a data section label plus an offset. In YASM code looks like this:

pat_idct32_pass1: dw ...
...
lea rax, [pat_idct32_pass1 + 352]

I am loading other data in xbyak using [rip + label] addressing. This may not be the best way but it works. I think it should be possible to do ptr[rip + pat_idct32_pass1 + 352] as above. This will not compile though (no suitable operator+). Can xbyak be made to do this? (thanks for fantastic tool btw)

x64 addressing problem

How can I move a variable value to register on x64?
I am trying to do:

uint64_t var;
mov(rax, ptr[&var]);

I got error message: offset is too big

I tried:
mov(rax, qword[(size_t)&var]);
But it moves the var address as immediate value to rax.

How can I write: mov rax, qword[var]?

Generate multiple functions with one instance of CodeGenerator

Is there any way to generate more than one function with a single instance of CodeGenerator?

I need to generate many functions on the fly, but I found that the constructor of CodeGenerator has very significant overhead and is unsuitable for being called repeatedly.

PVS Studio Analyzer

More warnings came up. These are rated low or medium and may just be false positives.

V566 The integer constant is converted to pointer. Possibly an error or a bad coding style: (void *) 1 xbyak.h 752
V690 The 'Pack' class implements a copy constructor, but lacks the '=' operator. It is dangerous to use such a class. xbyak_util.h 309
V524 It is odd that the body of 'cmovnbe' function is fully equivalent to the body of 'cmova' function. xbyak_mnemonic.h 67
V524 It is odd that the body of 'cmovna' function is fully equivalent to the body of 'cmovbe' function. xbyak_mnemonic.h 64
V524 It is odd that the body of 'cmovz' function is fully equivalent to the body of 'cmove' function. xbyak_mnemonic.h 83
V524 It is odd that the body of 'cmovnle' function is fully equivalent to the body of 'cmovg' function. xbyak_mnemonic.h 73
V524 It is odd that the body of 'cmovnl' function is fully equivalent to the body of 'cmovge' function. xbyak_mnemonic.h 72
V524 It is odd that the body of 'cmovnge' function is fully equivalent to the body of 'cmovl' function. xbyak_mnemonic.h 71
V524 It is odd that the body of 'cmovng' function is fully equivalent to the body of 'cmovle' function. xbyak_mnemonic.h 70
V524 It is odd that the body of 'cmovnz' function is fully equivalent to the body of 'cmovne' function. xbyak_mnemonic.h 77
V524 It is odd that the body of 'cmovpo' function is fully equivalent to the body of 'cmovnp' function. xbyak_mnemonic.h 81
V524 It is odd that the body of 'cmovpe' function is fully equivalent to the body of 'cmovp' function. xbyak_mnemonic.h 80
V524 It is odd that the body of 'jnbe' function is fully equivalent to the body of 'ja' function. xbyak_mnemonic.h 350
V524 It is odd that the body of 'jna' function is fully equivalent to the body of 'jbe' function. xbyak_mnemonic.h 338
V524 It is odd that the body of 'jz' function is fully equivalent to the body of 'je' function. xbyak_mnemonic.h 414
V524 It is odd that the body of 'jnle' function is fully equivalent to the body of 'jg' function. xbyak_mnemonic.h 374
V524 It is odd that the body of 'jnl' function is fully equivalent to the body of 'jge' function. xbyak_mnemonic.h 370
V524 It is odd that the body of 'jnge' function is fully equivalent to the body of 'jl' function. xbyak_mnemonic.h 366
V524 It is odd that the body of 'jng' function is fully equivalent to the body of 'jle' function. xbyak_mnemonic.h 362
V524 It is odd that the body of 'jnz' function is fully equivalent to the body of 'jne' function. xbyak_mnemonic.h 390
V524 It is odd that the body of 'jpo' function is fully equivalent to the body of 'jnp' function. xbyak_mnemonic.h 406

PVS Studio Analyzer Errors

I get the following errors which show V792:

V792 xbyak.h 409 The 'isREG' function located to the right of the operator '|' will be called regardless of the value of the left operand. Perhaps, it is better to use '||'.

V792 xbyak.h 409 The 'isExtIdx' function located to the right of the operator '|' will be called regardless of the value of the left operand. Perhaps, it is better to use '||'.

EVEX disp8 granularity for vinsertps

Hi,
I noticed that the EVEX disp8 granularity for vinsertps doesn't seem to be set correctly in xbyak_mnemonic.h (it should be 4). I changed line 1026 to add the T_N4 flag and it works correctly now:

opAVX_X_X_XM(x1, x2, op, T_66 | T_0F3A | T_W0 | T_EW0 | T_EVEX | T_N4, 0x21, imm);

vextractps already has the T_N4 flag, so it's OK.
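
For context, a sketch of how EVEX compressed displacement (disp8*N) works, which is why the granularity flag matters. compressDisp8 is a hypothetical helper, not Xbyak's code: the stored byte is implicitly multiplied by N when the instruction is decoded, so the assembler must divide by the same N the CPU will use (4 for vinsertps, per this issue).

```cpp
#include <cassert>
#include <cstdint>

// Illustrative helper, not Xbyak's implementation: try to encode `disp` as an
// EVEX compressed disp8, where the stored byte is implicitly multiplied by N.
// Returns true and writes the byte to `out` when the short form is usable.
inline bool compressDisp8(std::int32_t disp, int N, std::int8_t& out) {
    if (disp % N != 0) return false;                 // must be a multiple of N
    const std::int32_t scaled = disp / N;
    if (scaled < -128 || scaled > 127) return false; // must fit in a signed byte
    out = static_cast<std::int8_t>(scaled);
    return true;
}
```

If the assembler assumed N = 16 for a displacement of 16, it would store 1, and hardware decoding with N = 4 would then read the operand at offset 4 instead of 16.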

Silent truncation of size_t to uint32 in mov (<mem>, <size_t>) case

Hi.

A small issue I noticed - "void mov(const Operand& op, size_t imm)", in the case where "op.isMEM()", the size_t is cast to uint32, silently truncating it.
It'd be great if this would throw an exception if the value doesn't fit in an int32 instead (like I see some other places in the code already do).
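
A sketch of the kind of check being requested, assuming a 64-bit memory destination where the CPU sign-extends the 32-bit immediate. checkedImm32 is a hypothetical helper, not an Xbyak function, and the exception type is an arbitrary choice for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Illustrative helper, not Xbyak's code: return the 32 bits to encode for a
// sign-extended imm32, or throw if the 64-bit value would be truncated.
inline std::uint32_t checkedImm32(std::uint64_t imm) {
    // Representable values are [0, 0x7fffffff] and the sign-extended
    // negatives [0xffffffff80000000, 0xffffffffffffffff].
    const bool fits = imm <= 0x7fffffffULL || imm >= 0xffffffff80000000ULL;
    if (!fits) throw std::range_error("imm does not fit in a signed 32-bit immediate");
    return static_cast<std::uint32_t>(imm);
}
```

For a 32-bit destination a value such as 0xffffffff would be acceptable as well, so a real check would probably need to take the operand size into account.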

[Feature Request] Allow getting Label's pointer

In my JIT, I compile several blocks of code that link and jump to each-other. For that I keep Labels in an array and do a L() call at the beginning of each block. However I also need to call into these blocks from external compiled code (as a function pointer). This forces me to also keep a separate array of function pointers and call getCurr() at each block together with the L(), even though the Labels already keep track of that.

So I think it would be useful to be able to get a pointer to the location represented by a Label, after compilation is done and all label locations have already been resolved.

Recursive label resolution

Quoted from here. https://github.com/herumi/misc/blob/master/local-label.txt

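Proposal for extending label handling in Xbyak

Current specification

L("AAA"); // global label

L(".AAA"); // local label

jne(".AAA");

Local labels inside a region enclosed by inLocalLabel(); and outLocalLabel();
cannot see the labels outside that region.

Proposed extension
Add an int argument as the second (third?) argument of the jump instructions.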

inLocalLabel();
L(".AAA"); // L1

  inLocalLabel();
  L(".AAA"); // L2

    inLocalLabel();
    L(".AAA"); // L3

    jmp(".AAA"); // same as before
    jmp(".AAA", 1); // resolves to the label one scope out (L2)
    jmp(".AAA", 2); // resolves to L1

    outLocalLabel();
  outLocalLabel();
outLocalLabel();

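Would it give more flexibility if inLocalLabel() returned the current stack depth?

The current signature is
jne(const std::string& label, LabelType type = T_AUTO);
How should this be extended?

Idea: treat a negative second argument as the extension described above.
-1 means one scope out, -2 means two scopes out. In that case, would it be fine to always use T_NEAR?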
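
The scoped lookup in this proposal can be modelled with a stack of maps. The sketch below is a hypothetical model, not Xbyak's label manager: LocalLabelScopes, define, and resolve are invented names, and only labels that are already defined are handled (forward references would need a pending list on top of this).

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical model of the proposal; the names are not Xbyak's API.
class LocalLabelScopes {
    std::vector<std::map<std::string, std::size_t> > scopes_;
public:
    void inLocalLabel() { scopes_.push_back(std::map<std::string, std::size_t>()); }
    void outLocalLabel() {
        if (scopes_.empty()) throw std::logic_error("no open scope");
        scopes_.pop_back();
    }
    // Define `name` at `addr` in the innermost scope.
    void define(const std::string& name, std::size_t addr) {
        if (scopes_.empty()) throw std::logic_error("no open scope");
        scopes_.back()[name] = addr;
    }
    // Look up `name` in the scope `up` levels outside the innermost one.
    std::size_t resolve(const std::string& name, std::size_t up = 0) const {
        if (up >= scopes_.size()) throw std::out_of_range("no such scope");
        const std::map<std::string, std::size_t>& s = scopes_[scopes_.size() - 1 - up];
        std::map<std::string, std::size_t>::const_iterator it = s.find(name);
        if (it == s.end()) throw std::out_of_range("label not found");
        return it->second;
    }
};
```

With three nested scopes each defining ".AAA", resolve(".AAA") returns the innermost address, resolve(".AAA", 1) the one a level out, and resolve(".AAA", 2) the outermost, mirroring the L3/L2/L1 example.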

Question: how to use RIP to access "external" C structure

Hello,

I would like to use RIP-relative addressing to load from and store to a C structure, but I don't understand how to do it, since I would need the rip value. Did I miss something? How should I write the code? Thank you.

Here a small example:

// In my C++ program
struct S {
    char v[16];
} data;

// In my JIT generation
// Goal: move data.v to xmm1

First attempt: this really adds the address of data to rip, as expected, but that is not what I want.

vmovdqa(xmm1, ptr [ rip + (size_t)&data]);

Second attempt: can't use - with a label

// So you need to manually compute the address of the next instruction so something like this
Label next;
vmovdqa(xmm1, ptr [ rip - next + (size_t)&data]);
L(next);

Third attempt: manually compute the displacement, but this requires knowing the size of the instruction in advance.

vmovdqa(xmm1, ptr [ rip + (size_t)&data - top_ - size_ - 8]);

convert 8-bit reg to high 8-bit

Is there something similar to cvt8 but for converting to the high 8-bit representation? Or perhaps a function for converting a low 8-bit representation to the high 8-bit representation? Sorry in advance if I'm missing something obvious.

ERR_OFFSET_IS_TOO_BIG when calling function in x64

sub(rsp, 40);

// setup lua stack
mov(edx, p->maxstacksize);
// state is already in rcx
call(luaD_growstack);

mov(eax, 0);
add(rsp, 40);
ret();

When I use this code, an ERR_OFFSET_IS_TOO_BIG exception is thrown in VerifyInInt32.

The x86 equivalent works perfectly fine

push(ebp);
mov(ebp, esp);
mov(ecx, ptr[esp + 8]);

// setup lua stack
push(p->maxstacksize);
push(ecx); // state
call(luaD_growstack);
add(esp, 4 * 2);

mov(eax, 0);
pop(ebp);
ret();

clang-cl support

Hello,
during a recent compilation of xbyak with clang-cl I found the following problem:
error LNK2019: reference to exernal symbol __xgetbv unresolved in function "public: __cdecl Xbyak::util::Cpu::Cpu(void)" (??0Cpu@util@Xbyak@@QEAA@XZ)

clang-cl can be recognized by the programmer because it defines both _MSC_VER and __clang__, therefore a possible fix to this issue seems to be:

--- a/xbyak/xbyak_util.h
+++ b/xbyak/xbyak_util.h
@@ -49,7 +49,9 @@
 #endif

 #ifdef _MSC_VER
-extern "C" unsigned __int64 __xgetbv(int);
+       #ifndef __clang__
+               extern "C" unsigned __int64 __xgetbv(int);
+       #endif
 #endif

 namespace Xbyak { namespace util {
@@ -118,7 +120,11 @@ public:
        static inline uint64 getXfeature()
        {
 #ifdef _MSC_VER
+       #ifndef __clang__
                return __xgetbv(0);
+       #else
+               return _xgetbv(0);
+       #endif
 #else
                unsigned int eax, edx;
                // xgetvb is not support on gcc 4.2

This compiles without problems, not sure if this causes other problems though.
Another option might be checking for (_MSC_VER && !__clang__) instead of only _MSC_VER.

Instructions that were not found in the kit

Hello,

The following instructions are missing. If they were available, the engine would be more flexible.

arpl
bound
call
clac
cldemote
clflushopt
clts
clwb
cmovcc
cmps
cmpsd
encls
enter
fbld
fbstp
fclex
fcmovcc
fldenv
fnclex
fnclex*
fninit*
fnsave
fnsave*
fnstcw
fnstcw*
fnstenv
fnstenv*
fnstsw
fnstsw*
frstor
fsave
fstenv
fstsw
fxrstor
fxrstor64
fxsave
fxsave64
hlt
ibvpermilps
in
ins
insb
insd
insw
int
int n
int1
int3
into
invd
invept
invlpg
invpcid
invvpid
iret
iretd
iretq
lar
lds
leave
les
lfs
lgdt
lgs
lidt
lldt
lmsw
lods
lodsb
lodsd
lodsq
lodsw
loop
loopcc
loope
loopne
lsl
lss
ltr
mov
movdir64b
movdiri
movq
movs
movsd
nop
out
outs
outsb
outsd
outsw
pabsq
pmaxsq
pmaxuq
pminsq
pminuq
pmovsx
pmovzx
pmullq
pop
popfq
prefetchh
psraq
ptwrite
push
pushfq
rdfsbase
rdgsbase
rdpid
rdpkru
repe
repne
repnz
repz
rsm
scas
setcc
sgdt
sidt
sldt
smsw
stos
str
swapgs
syscall
sysenter
sysexit
sysret
test
tpause
ud
umonitor
umwait
vbroadcast
verr
verw
vextractf32x4
vextracti32x4
vgatherdpd
vgatherdps
vgatherqpd
vgatherqps
vmaskmov
vmcall
vmclear
vmfunc
vmlaunch
vmptrld
vmptrst
vmread
vmresume
vmwrite
vmxoff
vmxon
vpbroadcast
vpbroadcastm
vpgatherdd
vpgatherdq
vpgatherqd
vpgatherqq
vpmaskmov
vshuff32x4
vshuff64x2
vshufi32x4
vshufi64x2
wrfsbase
wrgsbase
wrpkru
xabort
xacquire
xbegin
xchg
xend
xlat
xrelease
xrstor
xrstor64
xrstors
xrstors64
xsave
xsave64
xsavec
xsavec64
xsaveopt
xsaveopt64
xsaves
xsaves64
xsetbv
xtest
xtest

ENCLS
ENCLS[EADD]
ENCLS[EAUG]
ENCLS[EBLOCK]
ENCLS[ECREATE]
ENCLS[EDBGRD]
ENCLS[EDBGWR]
ENCLS[EEXTEND]
ENCLS[EINIT]
ENCLS[ELBUC]
ENCLS[ELDBC]
ENCLS[ELDB]
ENCLS[ELDU]
ENCLS[EMODPR]
ENCLS[EMODT]
ENCLS[EPA]
ENCLS[ERDINFO]
ENCLS[EREMOVE]
ENCLS[ETRACKC]
ENCLS[ETRACK]
ENCLS[EWB]
ENCLU
ENCLU[EACCEPTCOPY]
ENCLU[EACCEPT]
ENCLU[EENTER]
ENCLU[EEXIT]
ENCLU[EGETKEY]
ENCLU[EMODPE]
ENCLU[EREPORT]
ENCLU[ERESUME]
ENCLV
ENCLV[EDECVIRTCHILD]
ENCLV[EINCVIRTCHILD]
ENCLV[ESETCONTEXT]

GETSEC[CAPABILITIES]
GETSEC[ENTERACCS]
GETSEC[EXITAC]
GETSEC[PARAMETERS]
GETSEC[SENTER]
GETSEC[SEXIT]
GETSEC[SMCTRL]
GETSEC[WAKEUP]

INVEPT
INVVPID
VMCALL
VMCLEAR
VMFUNC
VMLAUNCH
VMPTRLD
VMPTRST
VMREAD
VMRESUME
VMWRITE
VMXOFF
VMXON

Conditional jumps w/ absolute code destinations

Currently the code only allows Labels as conditional branch destinations, e.g.

void jns(std::string label, LabelType type = T_AUTO) { opJmp(label, type, 0x79, 0x89, 0x0F); }
void jns(const Label& label, LabelType type = T_AUTO) { opJmp(label, type, 0x79, 0x89, 0x0F); }

For my use case I need to branch to an arbitrary destination, so I need an entry like

void jns(void* dst, LabelType type = T_NEAR) { opJmpAbs(dst, type, 0x79, 0x89, 0x0F); }

Is it possible to add this variation in the generator?

ld: symbol(s) not found for architecture i386 on Mac

I'm having trouble installing on a Mac with clang. Any idea why it's complaining about i386, when I don't think I have any flags set for i386?

:info:build       std::_Hashtable<int, std::pair<int const, Xbyak::JmpLabel const>, std::allocator<std::pair<int const, Xbyak::JmpLabel const> >, std::__detail::_Select1st, std::equal_to<int>, std::hash<int>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, false> >::_M_insert_multi_node(std::__detail::_Hash_node<std::pair<int const, Xbyak::JmpLabel const>, false>*, unsigned long, std::__detail::_Hash_node<std::pair<int const, Xbyak::JmpLabel const>, false>*) in ccFELKh2.o
:info:build       _main in ccFELKh2.o
:info:build   "___cxa_call_unexpected", referenced from:
:info:build       Xbyak::Error::what() const in ccFELKh2.o
:info:build   "___cxa_end_catch", referenced from:
:info:build       std::_Hashtable<int, std::pair<int const, Xbyak::JmpLabel const>, std::allocator<std::pair<int const, Xbyak::JmpLabel const> >, std::__detail::_Select1st, std::equal_to<int>, std::hash<int>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, false> >::_M_insert_multi_node(std::__detail::_Hash_node<std::pair<int const, Xbyak::JmpLabel const>, false>*, unsigned long, std::__detail::_Hash_node<std::pair<int const, Xbyak::JmpLabel const>, false>*) in ccFELKh2.o
:info:build       _main in ccFELKh2.o
:info:build   "___cxa_free_exception", referenced from:
:info:build       Xbyak::RegExp::RegExp(Xbyak::Reg const&, int) in ccFELKh2.o
:info:build       Xbyak::operator+(Xbyak::RegExp const&, Xbyak::RegExp const&) in ccFELKh2.o
:info:build       Xbyak::CodeArray::db(int) in ccFELKh2.o
:info:build       Xbyak::Address::verify() const in ccFELKh2.o
:info:build       Xbyak::CodeGenerator::rex(Xbyak::Operand const&, Xbyak::Operand const&) in ccFELKh2.o
:info:build       Xbyak::CodeGenerator::CodeGenerator(unsigned long, void*, Xbyak::Allocator*) in ccFELKh2.o
:info:build       Xbyak::CodeGenerator::opModM(Xbyak::Address const&, Xbyak::Reg const&, int, int, int, int) in ccFELKh2.o
:info:build       ...
:info:build   "___cxa_rethrow", referenced from:
:info:build       std::_Hashtable<int, std::pair<int const, Xbyak::JmpLabel const>, std::allocator<std::pair<int const, Xbyak::JmpLabel const> >, std::__detail::_Select1st, std::equal_to<int>, std::hash<int>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, false> >::_M_insert_multi_node(std::__detail::_Hash_node<std::pair<int const, Xbyak::JmpLabel const>, false>*, unsigned long, std::__detail::_Hash_node<std::pair<int const, Xbyak::JmpLabel const>, false>*) in ccFELKh2.o
:info:build   "___cxa_throw", referenced from:
:info:build       Xbyak::RegExp::RegExp(Xbyak::Reg const&, int) in ccFELKh2.o
:info:build       Xbyak::operator+(Xbyak::RegExp const&, Xbyak::RegExp const&) in ccFELKh2.o
:info:build       Xbyak::CodeArray::db(int) in ccFELKh2.o
:info:build       Xbyak::Address::verify() const in ccFELKh2.o
:info:build       Xbyak::CodeGenerator::rex(Xbyak::Operand const&, Xbyak::Operand const&) in ccFELKh2.o
:info:build       Xbyak::CodeGenerator::CodeGenerator(unsigned long, void*, Xbyak::Allocator*) in ccFELKh2.o
:info:build       Xbyak::CodeGenerator::opModM(Xbyak::Address const&, Xbyak::Reg const&, int, int, int, int) in ccFELKh2.o
:info:build       ...
:info:build   "___gxx_personality_v0", referenced from:
:info:build       Dwarf Exception Unwind Info (__eh_frame) in ccFELKh2.o
:info:build ld: symbol(s) not found for architecture i386
:info:build collect2: error: ld returned 1 exit status
:info:build make[1]: *** [quantize] Error 1
:info:build make[1]: *** Waiting for unfinished jobs....

Add template overload for the call function

When passing a function pointer to call, the compiler complains about the implicit conversion to void*. This could be fixed (when variadic templates are available) by adding an overload that uses a variadic template:

template<typename Result, typename... Values>
void call(Result(*addr)(Values...)) { opJmpAbs((const void*)addr, T_NEAR, 0, B11101000); }
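
The same template pattern can be tried outside Xbyak. The sketch below assumes C++11; codeAddress and add are hypothetical names used only for illustration, and the helper just shows that a variadic template can accept any plain function pointer and perform the cast in one place.

```cpp
#include <cassert>

// Illustrative helper, not Xbyak's API: accept any plain function pointer and
// return its address as const void*, so callers need no explicit cast.
template<typename Result, typename... Args>
const void* codeAddress(Result (*fn)(Args...)) {
    // Function-pointer-to-object-pointer conversion is only conditionally
    // supported by the standard, but mainstream x86/x64 compilers accept it.
    return reinterpret_cast<const void*>(fn);
}

// Example target with an arbitrary signature.
inline int add(int a, int b) { return a + b; }
```

A call such as codeAddress(add) then compiles without a cast at the call site, which is the convenience the proposed call overload would provide.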

VSIB encoding errors with vector index registers 16-31

Hi!
First, thanks for all the work making xbyak, it's a great library to work with. I noticed an encoding bug in AVX-512 gather/scatter instructions when the vector index register is in the high half, one of [x,y,z]mm16-31. For instance,
vgatherdps zmm0{k1}, [rax + ymm18]
would be encoded as
vgatherdps zmm0{k1}, [rax + ymm2].
The problem is that the EVEX.V' bit is not set properly in these situations (it is always 1). I have a patch that appears to fix the issue, but I'm not an expert in xbyak's internals, so it might not be the right approach.

Thanks,
Peter
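
To illustrate why ymm18 turns into ymm2, here is a small model of where the 5-bit vector index number is stored when an EVEX instruction uses VSIB addressing. This is a sketch, not Xbyak's encoder; VsibIndexBits and the two functions are hypothetical names. The EVEX.X and EVEX.V' bits are stored inverted.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative model, not Xbyak's code: the three places where a 5-bit
// vector index register number lives in an EVEX instruction using VSIB.
struct VsibIndexBits {
    std::uint8_t sibIndex; // SIB.index, low 3 bits of the register number
    std::uint8_t evexX;    // EVEX.X, bit 3 of the number, stored inverted
    std::uint8_t evexVp;   // EVEX.V', bit 4 of the number, stored inverted
};

inline VsibIndexBits encodeVsibIndex(int reg) {
    VsibIndexBits b;
    b.sibIndex = static_cast<std::uint8_t>(reg & 7);
    b.evexX = static_cast<std::uint8_t>(((reg >> 3) & 1) ^ 1);
    b.evexVp = static_cast<std::uint8_t>(((reg >> 4) & 1) ^ 1);
    return b;
}

inline int decodeVsibIndex(const VsibIndexBits& b) {
    return b.sibIndex | ((b.evexX ^ 1) << 3) | ((b.evexVp ^ 1) << 4);
}
```

Encoding register 18 gives sibIndex = 2 with V' stored as 0. If V' is left at 1, as described in this issue, the same bits decode as register 2.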

Missing cmpxchg

Although cmpxchg8b/16b seem to exist, the standard cmpxchg instructions appear to be missing.

`push(byte[edx + ecx])` dereferences a dword instead.

(Values here are from my incident, and not made-up ones to minimize mistakes I may have made).

Suppose the dword at address edx + ecx is 0x01000100, in x86 a value of 0x20001.
Calling push(byte[edx + ecx]) should push only the first byte, so 0x01, however Xbyak seems to emit code that pushes the entire dword onto the stack, which is unintended behavior, I think.

I may be wrong, although if I wanted to push a dword, I would've thought to use the dword AddressFrame, and not byte.

different addresses are equal

I had some code that was doing something to the effect of:

if (result_operand != arg0_operand) {
  mov(result_operand, arg0_operand);
}

However, that broke down when result and arg0 were instances of Xbyak::Address, as the equality operator overload in Xbyak::Operand returns true for different displacements.

It seems the Xbyak::Address::disp_ and Xbyak::Operand::idx_ fields could be merged into a single field to fix this somewhat sanely.

namespace Xbyak::util missing some AddressFrames

Hi herumi,

AddressFrame xword, yword, zword, ptr_b, xword_b, yword_b, zword_b; are missing from this namespace. EvexModifierRounding and EvexModifierZero are missing as well.

Thank you for your time.

Too many Xbyak::Allocator::alloc call produces mprotect failure

Hi, thank you for your great library.

I found that mprotect sometimes fails to modify the page attributes.
For example, in this test, we observe it several times: https://travis-ci.org/Constellation/iv/jobs/37578037.
In this test, the test script invokes xbyak's code generation a huge number of times (16 ** 4 = 65536 times).

After investigating this problem, I found that mprotect with valid parameters (valid address and size) fails with ENOMEM. It seems that this is because the available map count drops to zero (I think).
ref:

Since xbyak's dynamic memory allocator calls mprotect but never calls munmap, the map count is never released.
To prevent it, I suggest the following plan.

  • Allocate memory with mmap instead of posix_memalign
  • Manage the allocated size with the address
  • Free the allocated memory with munmap

What do you think about it?
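
The three steps of the plan could look roughly like the sketch below. It is POSIX-only, assumes a 4096-byte page size, and uses a hypothetical class name `MmapSketchAllocator`; it is an illustration of the proposal, not xbyak's actual allocator.

```cpp
#include <sys/mman.h>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <unordered_map>

// Hypothetical allocator sketch (POSIX only); not xbyak's implementation.
class MmapSketchAllocator {
    std::unordered_map<uintptr_t, size_t> sizes_; // address -> mapped size
public:
    // Step 1: allocate with mmap instead of posix_memalign.
    uint8_t *alloc(size_t size)
    {
        const size_t pageSize = 4096; // assumed page size
        size = (size + pageSize - 1) & ~(pageSize - 1); // round up to pages
        void *p = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) throw std::bad_alloc();
        // Step 2: remember the mapped size, keyed by address.
        sizes_[reinterpret_cast<uintptr_t>(p)] = size;
        return static_cast<uint8_t*>(p);
    }
    // Step 3: release the whole mapping with munmap.
    void free(uint8_t *p)
    {
        if (p == nullptr) return;
        auto it = sizes_.find(reinterpret_cast<uintptr_t>(p));
        assert(it != sizes_.end()); // only pointers from alloc() are valid
        munmap(p, it->second);
        sizes_.erase(it);
    }
    size_t liveMappings() const { return sizes_.size(); }
};
```

Because munmap needs the length of the region, the size has to be stored somewhere; a map keyed by address is one simple choice. Each freed buffer then gives its mapping back to the kernel instead of leaving a region with changed protection behind.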

changeBit sometimes produces incorrect 8-bit register

Problem:

crc32(ebx, bpl);
// Produces: f2 40 0f 38 f0 dd   // crc ebx, bpl
// ○

crc32(ebx, rbp.changeBit(8));
// Produces: f2 0f 38 f0 dd   // crc ebx, ch
// ×
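
The two byte sequences above differ only in the 0x40 REX prefix. In 64-bit mode, the r/m field of the ModRM byte 0xdd is 5, which selects bpl when a REX prefix is present and ch when it is not. A small lookup sketch of that rule follows; the names `kLegacy8`, `kRex8` and `reg8Name` are hypothetical and not taken from xbyak.

```cpp
// Hypothetical lookup tables, not xbyak code: the 8-bit register selected
// by a 3-bit register field (0..7) in 64-bit mode.
constexpr const char *kLegacy8[8] = {"al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"};
constexpr const char *kRex8[8]    = {"al", "cl", "dl", "bl", "spl", "bpl", "sil", "dil"};

// Without a REX prefix, fields 4..7 mean ah/ch/dh/bh; with any REX prefix
// (even a plain 0x40) they mean spl/bpl/sil/dil instead.
constexpr const char *reg8Name(unsigned field, bool hasRex)
{
    return hasRex ? kRex8[field & 7] : kLegacy8[field & 7];
}

// True if an 8-bit register with this field value can only be reached
// by emitting a REX prefix (spl, bpl, sil, dil).
constexpr bool low8NeedsRex(unsigned field)
{
    return (field & 7) >= 4;
}
```

This is why the 8-bit conversion of rbp has to carry the information that a REX prefix is required: the register number alone (5) is ambiguous between ch and bpl.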

Suggestion:

--- a/xbyak/xbyak.h
+++ b/xbyak/xbyak.h
@@ -528,7 +528,7 @@ class Reg : public Operand {
 public:
        Reg() { }
        Reg(int idx, Kind kind, int bit = 0, bool ext8bit = false) : Operand(idx, kind, bit, ext8bit) { }
-       Reg changeBit(int bit) const { return Reg(getIdx(), getKind(), bit, isExt8bit()); }
+       Reg changeBit(int bit) const;
        uint8 getRexW() const { return isREG(64) ? 8 : 0; }
        uint8 getRexR() const { return isExtIdx() ? 4 : 0; }
        uint8 getRexX() const { return isExtIdx() ? 2 : 0; }
@@ -650,6 +650,18 @@ struct RegRip {
 };
 #endif

+inline Reg Reg::changeBit(int bit) const
+{
+       switch (bit) {
+       case 8: return cvt8();
+       case 16: return cvt16();
+       case 32: return cvt32();
+#ifdef XBYAK64
+       case 64: return cvt64();
+#endif
+       }
+       throw Error(ERR_CANT_CONVERT);
+}
 inline Reg8 Reg::cvt8() const
 {
        const int idx = getIdx();

Add VNNI instruction support

Hi, Herumi,
Sorry for bothering you.
I am working on adding the INT8 VNNI instruction VPDPBUSD to xbyak so that I can run some VNNI tests in my environment. The VNNI spec is from http://kib.kiev.ua/x86docs/SDMs/319433-030.pdf.
What I plan to do:
I plan to add a new API in xbyak_mnemonic.h, as below:
void vpdpbusd(const Zmm& z1, const Zmm& z2, const Operand& op) { opAVX_X_X_XM(z1, z2, op, T_66 | T_0F38 | T_EW0 | T_YMM | T_MUST_EVEX | T_N4 | T_N_VL, 0x50); }
And I have 2 questions:
1. Is this the only change needed, leaving aside the unit test code for now, or is there anything else I need to change?
2. Is T_N4 | T_N_VL needed or correctly used in this case?
Thank you very much in advance for your guidance.

Multi-nop align is incorrect

Hi herumi,

	if (useMultiByteNop) {
		nop(size_t(getCurr()) % x);
		return;
	}

If getCurr() % 16 == 2 this would produce a 2 byte nop. This would then result in the new getCurr() % 16 being 4, which is incorrect.

This should be:

	if (useMultiByteNop) {
		if (size_t(getCurr()) % x == 0) return;
		nop(x - size_t(getCurr()) % x);
		return;
	}

Thanks.
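
The arithmetic behind the proposed fix can be checked in isolation. The sketch below uses a hypothetical helper `paddingFor`, which is not an xbyak function, to compute how many padding bytes bring an address up to the next multiple of the alignment.

```cpp
#include <cstddef>

// Hypothetical helper: number of padding bytes needed so that addr becomes
// a multiple of alignment (alignment must be non-zero).
// The outer modulo maps the "already aligned" case to 0 instead of alignment.
constexpr std::size_t paddingFor(std::size_t addr, std::size_t alignment)
{
    return (alignment - addr % alignment) % alignment;
}

// What the original code computes: the remainder itself, which is the
// distance from the previous boundary, not to the next one.
constexpr std::size_t remainderOnly(std::size_t addr, std::size_t alignment)
{
    return addr % alignment;
}
```

For an address with `addr % 16 == 2`, `remainderOnly` gives 2 (ending at offset 4, still misaligned), while `paddingFor` gives 14 (ending at offset 16, aligned).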

RIP-relative addressing with labels

Hi.

I noticed that while I can create a RIP-relative address via ptr[rip + ], this doesn't allow me to easily create RIP-relative addresses pointing to other locations in the code buffer (e.g. to a bunch of data I will put right after the code).

I would want to be able to write things like:

Label myLabel;
padd (xmm0, ptr[rip + myLabel]);
<...>
L (myLabel);
db (myData, 16);

I ended up changing xbyak to support this. In case it helps, I've uploaded the result here:
http://codepad.org/QTKeKx3T
(WARNING: It's only tested for my specific use case; I could upload this as a pull request if you want.)

Thanks.
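
For reference, the displacement that a `ptr[rip + label]` form would have to encode can be sketched as below. RIP points at the next instruction while the current one executes, so the 32-bit displacement is measured from the end of the instruction. The helper name `ripDisp` and the offset-based parameters are assumptions made for illustration; they are not part of xbyak or of the linked patch.

```cpp
#include <cstdint>

// Hypothetical helper: disp32 for a RIP-relative operand, with all
// positions given as byte offsets from the start of the code buffer.
//   instrOffset: where the instruction starts
//   instrSize:   total encoded length of the instruction
//   labelOffset: where the label (e.g. the data block) is placed
constexpr int32_t ripDisp(uint64_t instrOffset, uint64_t instrSize,
                          uint64_t labelOffset)
{
    return static_cast<int32_t>(
        static_cast<int64_t>(labelOffset)
        - static_cast<int64_t>(instrOffset + instrSize));
}
```

When the label is defined after the instruction, as in the example above, the displacement is not known at emit time, so an assembler has to reserve the four bytes and patch them once the label's offset is fixed.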
