
Xbyak 4.90 ; JIT assembler for x86(IA32), x64(AMD64, x86-64) by C++
=============

Abstract
-------------

This is a header-only library that dynamically assembles x86(IA32) and x64(AMD64, x86-64) mnemonics.

Feature
-------------
Header file only: you can use Xbyak's functions as soon as xbyak.h is included.

### Supported Instruction Sets

MMX/MMX2/SSE/SSE2/SSE3/SSSE3/SSE4/FPU(*partial*)/AVX/AVX2/FMA/VEX-encoded GPR

### Supported OS

* Windows XP, Vista, Windows 7(32bit, 64bit)
* Linux(32bit, 64bit)
* Intel Mac ready

### Supported Compilers

* Visual Studio C++ VC2008 Pro, VC2010, VC2012
* gcc 4.7
* clang 3.3
* cygwin gcc 4.5.3
* icc 7.2

>Note: Xbyak uses and(), or(), xor(), not() functions, so the "-fno-operator-names" option is required on gcc.
Or define XBYAK_NO_OP_NAMES and use and_(), or_(), xor_(), not_() instead.
and_(), or_(), xor_(), not_() are also available if XBYAK_NO_OP_NAMES is not defined.

Install
-------------

The following files are necessary. Please add their path to your compiler's include directories.

* xbyak.h
* xbyak_bin2hex.h
* xbyak_mnemonic.h

Linux:

    make install

These files are copied into /usr/local/include/xbyak

Break backward compatibility
-------------
* The type of Xbyak::Error has changed from an enum to a class.
    * Get the enum value by casting to int.
* The (old) Reg32e class is split into the (new) Reg32e class and the (new) RegExp class.
The (new) Reg32e class is Reg32 or Reg64.
The (new) RegExp class deals with 'Reg32e + Reg32e * scale + disp'.
Please rename Reg32e to RegExp if you use the (old) Reg32e as RegExp.

New Feature
-------------

* Use MmapAllocator if XBYAK_USE_MMAP_ALLOCATOR is defined.
The default allocator calls posix_memalign on Linux, and the following mprotect reduces the available map count.
The max value is written in ```/proc/sys/vm/max_map_count```.
The max number of instances of ```Xbyak::CodeGenerator``` is limited to that value.
See ```test/mprotect_test.cpp```.
Use MmapAllocator if you want to avoid this restriction (this behavior may become the default in the future).

* AutoGrow mode lets Xbyak grow the code buffer automatically when necessary.
Call ready() before calling getCode() so that the addresses of jmp are calculated.

```
    struct Code : Xbyak::CodeGenerator {
      Code()
        : Xbyak::CodeGenerator(4096, Xbyak::AutoGrow) // 4096 is the default max code size
      {
         ...
      }
    };
    Code c;
    c.ready(); // Don't forget to call this function
```

>Don't use the address returned by getCurr() before calling ready().
>It may be an invalid address.
>RESTRICTION : rip addressing is not supported in AutoGrow mode.

Syntax
-------------

Derive a class from Xbyak::CodeGenerator, write your code in a class method, then get the function pointer by calling getCode() and casting the return value.

    NASM              Xbyak
    mov eax, ebx  --> mov(eax, ebx);
    inc ecx           inc(ecx);
    ret           --> ret();

### Addressing

    (ptr|dword|word|byte) [base + index * (1|2|4|8) + displacement]
                          [rip + 32bit disp] ; x64 only

    NASM                   Xbyak
    mov eax, [ebx+ecx] --> mov (eax, ptr[ebx+ecx]);
    test byte [esp], 4 --> test (byte [esp], 4);

How to use Selector(Segment Register)

>Note: Segment class is not derived from Operand.

```
mov eax, [fs:eax] --> putSeg(fs); mov(eax, ptr [eax]);
mov ax, cs        --> mov(ax, cs);
```

>You can use ptr for almost all memory accesses unless you need to specify the size of memory.

>dword, word and byte are member variables, so don't use dword as a type name (for unsigned int, for example).

### AVX

You can omit the destination for almost all 3-operand mnemonics.

    vaddps(xmm1, xmm2, xmm3); // xmm1 <- xmm2 + xmm3
    vaddps(xmm2, xmm3); // xmm2 <- xmm2 + xmm3
    vaddps(xmm2, xmm3, ptr [rax]); // use ptr to access memory
    vgatherdpd(xmm1, ptr [ebp+123+xmm2*4], xmm3);

### Label

    L("L1");
      jmp ("L1");

      jmp ("L2");
      ...
      a few mnemonics(8-bit displacement jmp)
      ...
    L("L2");

      jmp ("L3", T_NEAR);
      ...
      a lot of mnemonics(32-bit displacement jmp)
      ...
    L("L3");

>Call hasUndefinedLabel() to verify that your code has no undefined labels.
>You can use a label as an immediate value of mov, such as mov (eax, "L2");

#### 1. support @@, @f, @b like MASM

    L("@@"); // <A>
      jmp("@b"); // jmp to <A>
      jmp("@f"); // jmp to <B>
    L("@@"); // <B>
      jmp("@b"); // jmp to <B>
      mov(eax, "@b");
      jmp(eax); // jmp to <B>

#### 2. localization of labels by calling inLocalLabel(), outLocalLabel().

Labels beginning with a period between inLocalLabel() and outLocalLabel() are treated as local labels.
inLocalLabel() and outLocalLabel() can be nested.

    void func1()
    {
        inLocalLabel();
      L(".lp"); // <A> ; local label
        ...
        jmp(".lp"); // jmp to <A>
      L("aaa"); // global label
        outLocalLabel();
    }

    void func2()
    {
        inLocalLabel();
      L(".lp"); // <B> ; local label
        func1();
        jmp(".lp"); // jmp to <B>
        outLocalLabel();
    }

### New Label class

L() and jxx() functions support a new Label class.

      Label label1, label2;
    L(label1);
      ...
      jmp(label1);
      ...
      jmp(label2);
      ...
    L(label2);

Moreover, the assignL(dstLabel, srcLabel) method binds dstLabel to srcLabel.

      Label label1, label2;
    L(label1);
      ...
      jmp(label2);
      ...
      assignL(label2, label1); // label2 <= label1

The above jmp opcode jumps to label1.

* Restriction:
    * srcLabel must be used in L().
    * dstLabel must not be used in L().

### Code size
The default max code size is 4096 bytes. Please set it in the constructor of CodeGenerator() if you want to use a larger size.

    class Quantize : public Xbyak::CodeGenerator {
    public:
      Quantize()
        : CodeGenerator(8192)
      {
      }
      ...
    };

### use user allocated memory

You can make jit code on prepared memory.

    class Sample : public Xbyak::CodeGenerator {
    public:
        Sample(void *userPtr, size_t size)
            : Xbyak::CodeGenerator(size, userPtr)
        {
            ...
        }
    };

    const size_t codeSize = 1024;
    uint8 buf[codeSize + 16];

    // get 16-byte aligned address
    uint8 *p = Xbyak::CodeArray::getAlignedAddress(buf);

    // append executable attribute to the memory
    Xbyak::CodeArray::protect(p, codeSize, true);

    // construct your jit code on the memory
    Sample s(p, codeSize);

>See *sample/test0.cpp*

Macro
-------------

* **XBYAK32** is defined on 32bit.
* **XBYAK64** is defined on 64bit.
* **XBYAK64_WIN** is defined on 64bit Windows(VC)
* **XBYAK64_GCC** is defined on 64bit gcc, cygwin

Sample
-------------

* test0.cpp ; tiny sample of Xbyak(x86, x64)
* quantize.cpp ; JIT optimized quantization by fast division(x86 only)
* calc.cpp ; assemble and evaluate a given polynomial(x86, x64)
* bf.cpp ; JIT brainfuck(x86, x64)

Remark
-------------

The current version does not support 3D Now!, 80bit FPU load/store and some special mnemonics.
Please mail me if you need them.

License
-------------

modified new BSD License
http://opensource.org/licenses/BSD-3-Clause

The files under test/cybozu/ are copied from cybozulib(https://github.com/herumi/cybozulib/),
which is licensed under BSD-3-Clause, and are used only for tests.
The header files under xbyak/ are independent of cybozulib.

History
-------------

* 2016/Feb/04 ver 4.90 add jcc(const void *addr);
* 2016/Jan/30 ver 4.89 vpblendvb supports ymm reg(thanks to John Funnell)
* 2016/Jan/24 ver 4.88 lea, cmov supports 16-bit register(thanks to whyisthisfieldhere)
* 2015/Oct/05 ver 4.87 support segment selectors
* 2015/Aug/18 ver 4.86 fix [rip + label] addressing with immediate value(thanks to whyisthisfieldhere)
* 2015/Aug/10 ver 4.85 Address::operator==() is not correct(thanks to inolen)
* 2015/Jun/22 ver 4.84 call() support variadic template if available(thanks to randomstuff)
* 2015/Jun/16 ver 4.83 support movbe(thanks to benvanik)
* 2015/May/24 ver 4.82 support detection of F16C
* 2015/Apr/25 ver 4.81 fix the condition to throw exception for setSize(thanks to whyisthisfieldhere)
* 2015/Apr/22 ver 4.80 rip supports label(thanks to whyisthisfieldhere)
* 2015/Jan/28 ver 4.71 support adcx, adox, cmpxchg, rdseed, stac
* 2014/Oct/14 ver 4.70 support MmapAllocator
* 2014/Jun/13 ver 4.62 disable warning of VC2014
* 2014/May/30 ver 4.61 support bt, bts, btr, btc
* 2014/May/28 ver 4.60 support vcvtph2ps, vcvtps2ph
* 2014/Apr/11 ver 4.52 add detection of rdrand
* 2014/Mar/25 ver 4.51 remove state information of unreferenced labels
* 2014/Mar/16 ver 4.50 support new Label
* 2014/Mar/05 ver 4.40 fix wrong detection of BMI/enhanced rep on VirtualBox
* 2013/Dec/03 ver 4.30 support Reg::cvt8(), cvt16(), cvt32(), cvt64()
* 2013/Oct/16 ver 4.21 label support std::string
* 2013/Jul/30 ver 4.20 [break backward compatibility] split Reg32e class into RegExp(base+index*scale+disp) and Reg32e(means Reg32 or Reg64)
* 2013/Jul/04 ver 4.10 [break backward compatibility] change the type of Xbyak::Error from enum to a class
* 2013/Jun/21 ver 4.02 add putL(LABEL) function to put the address of the label
* 2013/Jun/21 ver 4.01 vpsllw, vpslld, vpsllq, vpsraw, vpsrad, vpsrlw, vpsrld, vpsrlq support (ymm, ymm, xmm). support vpbroadcastb, vpbroadcastw, vpbroadcastd, vpbroadcastq(thanks to Gabest).
* 2013/May/30 ver 4.00 support AVX2, VEX-encoded GPR-instructions
* 2013/Mar/27 ver 3.80 support mov(reg, "label");
* 2013/Mar/13 ver 3.76 add cqo(), jcxz(), jecxz(), jrcxz()
* 2013/Jan/15 ver 3.75 add setSize() to modify generated code
* 2013/Jan/12 ver 3.74 add CodeGenerator::reset() ; add Allocator::useProtect()
* 2013/Jan/06 ver 3.73 use unordered_map if possible
* 2012/Dec/04 ver 3.72 eax, ebx, ... are member variables of CodeGenerator(revert), Xbyak::util::eax, ... are static const.
* 2012/Nov/17 ver 3.71 and_(), or_(), xor_(), not_() are available if XBYAK_NO_OP_NAMES is not defined.
* 2012/Nov/17 change eax, ebx, ptr and so on in CodeGenerator to static members; aliases of them are defined in Xbyak::util.
* 2012/Nov/09 ver 3.70 XBYAK_NO_OP_NAMES macro is added to use and_() instead of and() (thanks to Mattias)
* 2012/Nov/01 ver 3.62 add fwait/fnwait/finit/fninit
* 2012/Nov/01 ver 3.61 add fldcw/fstcw
* 2012/May/03 ver 3.60 change interface of Allocator
* 2012/Mar/23 ver 3.51 fix userPtr mode
* 2012/Mar/19 ver 3.50 support AutoGrow mode
* 2011/Nov/09 ver 3.05 fix bit property of rip addressing / support movsxd
* 2011/Aug/15 ver 3.04 fix dealing with imm8 such as add(dword [ebp-8], 0xda); (thanks to lolcat)
* 2011/Jun/16 ver 3.03 fix __GNUC_PREREQ macro for Mac gcc(thanks to t_teruya)
* 2011/Apr/28 ver 3.02 do not use xgetbv on Mac gcc
* 2011/May/24 ver 3.01 fix typo of OSXSAVE
* 2011/May/23 ver 3.00 add vcmpeqps and so on
* 2011/Feb/16 ver 2.994 beta add vmovq for 32-bit mode(I forgot it)
* 2011/Feb/16 ver 2.993 beta remove cvtReg to avoid thread unsafe
* 2011/Feb/10 ver 2.992 beta support one argument syntax for fadd like nasm
* 2011/Feb/07 ver 2.991 beta fix pextrw reg, xmm, imm(Thanks to Gabest)
* 2011/Feb/04 ver 2.99 beta support AVX
* 2010/Dec/08 ver 2.31 fix ptr [rip + 32bit offset], support rdtscp
* 2010/Oct/19 ver 2.30 support pclmulqdq, aesdec, aesdeclast, aesenc, aesenclast, aesimc, aeskeygenassist
* 2010/Jun/07 ver 2.29 fix call(