aboutsummaryrefslogtreecommitdiffhomepage
diff options
context:
space:
mode:
-rw-r--r--.github/FUNDING.yml1
-rw-r--r--doc/changelog.md175
-rw-r--r--doc/install.md14
-rw-r--r--doc/usage.md409
-rw-r--r--readme.md640
5 files changed, 630 insertions, 609 deletions
diff --git a/.github/FUNDING.yml b/.github/FUNDING.yml
new file mode 100644
index 0000000..f6612cf
--- /dev/null
+++ b/.github/FUNDING.yml
@@ -0,0 +1 @@
+github: herumi \ No newline at end of file
diff --git a/doc/changelog.md b/doc/changelog.md
new file mode 100644
index 0000000..c329558
--- /dev/null
+++ b/doc/changelog.md
@@ -0,0 +1,175 @@
+# History
+
+* 2022/Apr/05 ver 6.04 add tpause, umonitor, umwait
+* 2022/Mar/08 ver 6.03 MmapAllocator supports memfd with user-defined strings.
+* 2022/Jan/28 ver 6.02 strict check the range of 32-bit dispacement
+* 2021/Dec/14 ver 6.01 support T_FAR jump/call and retf
+* 2021/Sep/14 ver 6.00 fully support AVX512-FP16
+* 2021/Sep/09 ver 5.997 fix vrndscale* to support {sae}
+* 2021/Sep/03 ver 5.996 fix v{add,sub,mul,div,max,min}{sd,ss} to support T_rd_sae.
+* 2021/Aug/15 ver 5.995 add a label to /proc/self/maps if XBYAK_USE_MEMFD is defined on Linux
+* 2021/Jun/17 ver 5.994 add alias of vcmpXX{ps,pd,ss,sd} with mask register
+* 2021/Jun/06 ver 5.993 strict check of gather/scatter register combination
+* 2021/May/09 ver 5.992 support endbr32 and endbr64
+* 2020/Nov/16 ver 5.991 disable constexpr for gcc-5 with -std=c++-14
+* 2020/Oct/19 ver 5.99 support VNNI instructions(Thanks to akharito)
+* 2020/Oct/17 ver 5.98 support the form of [scale * reg]
+* 2020/Sep/08 ver 5.97 replace uint32 with uint32_t etc.
+* 2020/Aug/28 ver 5.95 some constructors of register classes support constexpr if C++14 or later
+* 2020/Aug/04 ver 5.941 `CodeGenerator::reset()` calls `ClearError()`.
+* 2020/Jul/28 ver 5.94 remove #include <winsock2.h> (only windows)
+* 2020/Jul/21 ver 5.93 support exception-less mode
+* 2020/Jun/30 ver 5.92 support Intel AMX instruction set (Thanks to nshustrov)
+* 2020/Jun/22 ver 5.913 fix mov(r64, imm64) on 32-bit env with XBYAK64
+* 2020/Jun/19 ver 5.912 define MAP_JIT on macOS regardless of Xcode version (Thanks to rsdubtso)
+* 2020/May/10 ver 5.911 XBYAK_USE_MMAP_ALLOCATOR is defined unless XBYAK_DONT_USE_MMAP_ALLOCATOR is defined.
+* 2020/Apr/20 ver 5.91 accept mask register k0 (it means no mask)
+* 2020/Apr/09 ver 5.90 kmov{b,d,w,q} throws exception for an unsupported register
+* 2020/Feb/26 ver 5.891 fix typo of type
+* 2020/Jan/03 ver 5.89 fix error of vfpclasspd
+* 2019/Dec/20 ver 5.88 fix compile error on Windows
+* 2019/Dec/19 ver 5.87 add setDefaultJmpNEAR(), which deals with `jmp` of an undefined label as T_NEAR if no type is specified.
+* 2019/Dec/13 ver 5.86 [changed] revert to the behavior before v5.84 if -fno-operator-names is defined (and() is available)
+* 2019/Dec/07 ver 5.85 append MAP_JIT flag to mmap for macOS mojave or later
+* 2019/Nov/29 ver 5.84 [changed] XBYAK_NO_OP_NAMES is defined unless XBYAK_USE_OP_NAMES is defined
+* 2019/Oct/12 ver 5.83 exit(1) was removed
+* 2019/Sep/23 ver 5.82 support monitorx, mwaitx, clzero (thanks to @MagurosanTeam)
+* 2019/Sep/14 ver 5.81 support some generic mnemonics.
+* 2019/Aug/01 ver 5.802 fix detection of AVX512_BF16 (thanks to vpirogov)
+* 2019/May/27 support vp2intersectd, vp2intersectq (not tested)
+* 2019/May/26 ver 5.80 support vcvtne2ps2bf16, vcvtneps2bf16, vdpbf16ps
+* 2019/Apr/27 ver 5.79 vcmppd/vcmpps supports ptr_b(thanks to jkopinsky)
+* 2019/Apr/15 ver 5.78 rewrite Reg::changeBit() (thanks to MerryMage)
+* 2019/Mar/06 ver 5.77 fix number of cores that share LLC cache by densamoilov
+* 2019/Jan/17 ver 5.76 add Cpu::getNumCores() by shelleygoel
+* 2018/Oct/31 ver 5.751 recover Xbyak::CastTo for compatibility
+* 2018/Oct/29 ver 5.75 unlink LabelManager from Label when msg is destroyed
+* 2018/Oct/21 ver 5.74 support RegRip +/- int. Xbyak::CastTo is removed
+* 2018/Oct/15 util::AddressFrame uses push/pop instead of mov
+* 2018/Sep/19 ver 5.73 fix evex encoding of vpslld, vpslldq, vpsllw, etc for (reg, mem, imm8)
+* 2018/Sep/19 ver 5.72 fix the encoding of vinsertps for disp8N(Thanks to petercaday)
+* 2018/Sep/04 ver 5.71 L() returns a new label instance
+* 2018/Aug/27 ver 5.70 support setProtectMode() and DontUseProtect for read/exec setting
+* 2018/Aug/24 ver 5.68 fix wrong VSIB encoding with vector index >= 16(thanks to petercaday)
+* 2018/Aug/14 ver 5.67 remove mutable in Address ; fix setCacheHierarchy for cloud vm
+* 2018/Jul/26 ver 5.661 support mingw64
+* 2018/Jul/24 ver 5.66 add CodeArray::PROTECT_RE to mode of protect()
+* 2018/Jun/26 ver 5.65 fix push(qword [mem])
+* 2018/Mar/07 ver 5.64 fix zero division in Cpu() on some cpu
+* 2018/Feb/14 ver 5.63 fix Cpu::setCacheHierarchy() and fix EvexModifierZero for clang<3.9(thanks to mgouicem)
+* 2018/Feb/13 ver 5.62 Cpu::setCacheHierarchy() by mgouicem and rsdubtso
+* 2018/Feb/07 ver 5.61 vmov* supports mem{k}{z}(I forgot it)
+* 2018/Jan/24 ver 5.601 add xword, yword, etc. into Xbyak::util namespace
+* 2018/Jan/05 ver 5.60 support AVX-512 for Ice lake(319433-030.pdf)
+* 2017/Aug/22 ver 5.53 fix mpx encoding, add bnd() prefix
+* 2017/Aug/18 ver 5.52 fix align (thanks to MerryMage)
+* 2017/Aug/17 ver 5.51 add multi-byte nop and align() uses it(thanks to inolen)
+* 2017/Aug/08 ver 5.50 add mpx(thanks to magurosan)
+* 2017/Aug/08 ver 5.45 add sha(thanks to magurosan)
+* 2017/Aug/08 ver 5.44 add prefetchw(thanks to rsdubtso)
+* 2017/Jul/12 ver 5.432 reduce warnings of PVS studio
+* 2017/Jul/09 ver 5.431 fix hasRex() (no affect) (thanks to drillsar)
+* 2017/May/14 ver 5.43 fix CodeGenerator::resetSize() (thanks to gibbed)
+* 2017/May/13 ver 5.42 add movs{b,w,d,q}
+* 2017/Jan/26 ver 5.41 add prefetchwt1 and support for scale == 0(thanks to rsdubtso)
+* 2016/Dec/14 ver 5.40 add Label::getAddress() method to get the pointer specified by the label
+* 2016/Dec/09 ver 5.34 fix handling of negative offsets when encoding disp8N(thanks to rsdubtso)
+* 2016/Dec/08 ver 5.33 fix encoding of vpbroadcast{b,w,d,q}, vpinsr{b,w}, vpextr{b,w} for disp8N
+* 2016/Dec/01 ver 5.32 rename __xgetbv() to _xgetbv() to support clang for Visual Studio(thanks to freiro)
+* 2016/Nov/27 ver 5.31 rename AVX512_4VNNI to AVX512_4VNNIW
+* 2016/Nov/27 ver 5.30 add AVX512_4VNNI, AVX512_4FMAPS instructions(thanks to rsdubtso)
+* 2016/Nov/26 ver 5.20 add detection of AVX512_4VNNI and AVX512_4FMAPS(thanks to rsdubtso)
+* 2016/Nov/20 ver 5.11 lost vptest for ymm(thanks to gregory38)
+* 2016/Nov/20 ver 5.10 add addressing [rip+&var]
+* 2016/Sep/29 ver 5.03 fix detection ERR_INVALID_OPMASK_WITH_MEMORY(thanks to PVS-Studio)
+* 2016/Aug/15 ver 5.02 xbyak does not include xbyak_bin2hex.h
+* 2016/Aug/15 ver 5.011 fix detection of version of gcc 5.4
+* 2016/Aug/03 ver 5.01 disable omitted operand
+* 2016/Jun/24 ver 5.00 support avx-512 instruction set
+* 2016/Jun/13 avx-512 add mask instructions
+* 2016/May/05 ver 4.91 add detection of AVX-512 to Xbyak::util::Cpu
+* 2016/Mar/14 ver 4.901 comment to ready() function(thanks to skmp)
+* 2016/Feb/04 ver 4.90 add jcc(const void *addr);
+* 2016/Jan/30 ver 4.89 vpblendvb supports ymm reg(thanks to John Funnell)
+* 2016/Jan/24 ver 4.88 lea, cmov supports 16-bit register(thanks to whyisthisfieldhere)
+* 2015/Oct/05 ver 4.87 support segment selectors
+* 2015/Aug/18 ver 4.86 fix [rip + label] addressing with immediate value(thanks to whyisthisfieldhere)
+* 2015/Aug/10 ver 4.85 Address::operator==() is not correct(thanks to inolen)
+* 2015/Jun/22 ver 4.84 call() support variadic template if available(thanks to randomstuff)
+* 2015/Jun/16 ver 4.83 support movbe(thanks to benvanik)
+* 2015/May/24 ver 4.82 support detection of F16C
+* 2015/Apr/25 ver 4.81 fix the condition to throw exception for setSize(thanks to whyisthisfieldhere)
+* 2015/Apr/22 ver 4.80 rip supports label(thanks to whyisthisfieldhere)
+* 2015/Jar/28 ver 4.71 support adcx, adox, cmpxchg, rdseed, stac
+* 2014/Oct/14 ver 4.70 support MmapAllocator
+* 2014/Jun/13 ver 4.62 disable warning of VC2014
+* 2014/May/30 ver 4.61 support bt, bts, btr, btc
+* 2014/May/28 ver 4.60 support vcvtph2ps, vcvtps2ph
+* 2014/Apr/11 ver 4.52 add detection of rdrand
+* 2014/Mar/25 ver 4.51 remove state information of unreferenced labels
+* 2014/Mar/16 ver 4.50 support new Label
+* 2014/Mar/05 ver 4.40 fix wrong detection of BMI/enhanced rep on VirtualBox
+* 2013/Dec/03 ver 4.30 support Reg::cvt8(), cvt16(), cvt32(), cvt64()
+* 2013/Oct/16 ver 4.21 label support std::string
+* 2013/Jul/30 ver 4.20 [break backward compatibility] split Reg32e class into RegExp(base+index*scale+disp) and Reg32e(means Reg32 or Reg64)
+* 2013/Jul/04 ver 4.10 [break backward compatibility] change the type of Xbyak::Error from enum to a class
+* 2013/Jun/21 ver 4.02 add putL(LABEL) function to put the address of the label
+* 2013/Jun/21 ver 4.01 vpsllw, vpslld, vpsllq, vpsraw, vpsrad, vpsrlw, vpsrld, vpsrlq support (ymm, ymm, xmm). support vpbroadcastb, vpbroadcastw, vpbroadcastd, vpbroadcastq(thanks to Gabest).
+* 2013/May/30 ver 4.00 support AVX2, VEX-encoded GPR-instructions
+* 2013/Mar/27 ver 3.80 support mov(reg, "label");
+* 2013/Mar/13 ver 3.76 add cqo(), jcxz(), jecxz(), jrcxz()
+* 2013/Jan/15 ver 3.75 add setSize() to modify generated code
+* 2013/Jan/12 ver 3.74 add CodeGenerator::reset() ; add Allocator::useProtect()
+* 2013/Jan/06 ver 3.73 use unordered_map if possible
+* 2012/Dec/04 ver 3.72 eax, ebx, ... are member variables of CodeGenerator(revert), Xbyak::util::eax, ... are static const.
+* 2012/Nov/17 ver 3.71 and_(), or_(), xor_(), not_() are available if XBYAK_NO_OP_NAMES is not defined.
+* 2012/Nov/17 change eax, ebx, ptr and so on in CodeGenerator as static member and alias of them are defined in Xbyak::util.
+* 2012/Nov/09 ver 3.70 XBYAK_NO_OP_NAMES macro is added to use and_() instead of and() (thanks to Mattias)
+* 2012/Nov/01 ver 3.62 add fwait/fnwait/finit/fninit
+* 2012/Nov/01 ver 3.61 add fldcw/fstcw
+* 2012/May/03 ver 3.60 change interface of Allocator
+* 2012/Mar/23 ver 3.51 fix userPtr mode
+* 2012/Mar/19 ver 3.50 support AutoGrow mode
+* 2011/Nov/09 ver 3.05 fix bit property of rip addresing / support movsxd
+* 2011/Aug/15 ver 3.04 fix dealing with imm8 such as add(dword [ebp-8], 0xda); (thanks to lolcat)
+* 2011/Jun/16 ver 3.03 fix __GNUC_PREREQ macro for Mac gcc(thanks to t_teruya)
+* 2011/Apr/28 ver 3.02 do not use xgetbv on Mac gcc
+* 2011/May/24 ver 3.01 fix typo of OSXSAVE
+* 2011/May/23 ver 3.00 add vcmpeqps and so on
+* 2011/Feb/16 ver 2.994 beta add vmovq for 32-bit mode(I forgot it)
+* 2011/Feb/16 ver 2.993 beta remove cvtReg to avoid thread unsafe
+* 2011/Feb/10 ver 2.992 beta support one argument syntax for fadd like nasm
+* 2011/Feb/07 ver 2.991 beta fix pextrw reg, xmm, imm(Thanks to Gabest)
+* 2011/Feb/04 ver 2.99 beta support AVX
+* 2010/Dec/08 ver 2.31 fix ptr [rip + 32bit offset], support rdtscp
+* 2010/Oct/19 ver 2.30 support pclmulqdq, aesdec, aesdeclast, aesenc, aesenclast, aesimc, aeskeygenassist
+* 2010/Jun/07 ver 2.29 fix call(<label>)
+* 2010/Jun/17 ver 2.28 move some member functions to public
+* 2010/Jun/01 ver 2.27 support encoding of mov(reg64, imm) like yasm(not nasm)
+* 2010/May/24 ver 2.26 fix sub(rsp, 1000)
+* 2010/Apr/26 ver 2.25 add jc/jnc(I forgot to implement them...)
+* 2010/Apr/16 ver 2.24 change the prototype of rewrite() method
+* 2010/Apr/15 ver 2.23 fix align() and xbyak_util.h for Mac
+* 2010/Feb/16 ver 2.22 fix inLocalLabel()/outLocalLabel()
+* 2009/Dec/09 ver 2.21 support cygwin(gcc 4.3.2)
+* 2009/Nov/28 support a part of FPU
+* 2009/Jun/25 fix mov(qword[rax], imm); (thanks to Martin)
+* 2009/Mar/10 fix redundant REX.W prefix on jmp/call reg64
+* 2009/Feb/24 add movq reg64, mmx/xmm; movq mmx/xmm, reg64
+* 2009/Feb/13 movd(xmm7, dword[eax]) drops 0x66 prefix (thanks to Gabest)
+* 2008/Dec/30 fix call in short relative address(thanks to kato san)
+* 2008/Sep/18 support @@, @f, @b and localization of label(thanks to nobu-q san)
+* 2008/Sep/18 support (ptr[rip + 32bit offset]) (thanks to Dango-Chu san)
+* 2008/Jun/03 fix align(). mov(ptr[eax],1) throws ERR_MEM_SIZE_IS_NOT_SPECIFIED.
+* 2008/Jun/02 support memory interface allocated by user
+* 2008/May/26 fix protect() to avoid invalid setting(thanks to shinichiro_h san)
+* 2008/Apr/30 add cmpxchg16b, cdqe
+* 2008/Apr/29 support x64
+* 2008/Apr/14 code refactoring
+* 2008/Mar/12 add bsr/bsf
+* 2008/Feb/14 fix output of sub eax, 1234 (thanks to Robert)
+* 2007/Nov/5 support lock, xadd, xchg
+* 2007/Nov/2 support SSSE3/SSE4 (thanks to Dango-Chu san)
+* 2007/Feb/4 fix the bug that exception doesn't occur under the condition which the offset of jmp mnemonic without T_NEAR is over 127.
+* 2007/Jan/21 fix the bug to create address like [disp] select smaller representation for mov (eax|ax|al, [disp])
+* 2007/Jan/4 first version
diff --git a/doc/install.md b/doc/install.md
new file mode 100644
index 0000000..ddc1a10
--- /dev/null
+++ b/doc/install.md
@@ -0,0 +1,14 @@
+# Install
+
+The following files are necessary. Please add the path to your compile directory.
+
+* xbyak.h
+* xbyak_mnemonic.h
+* xbyak_util.h
+
+Linux:
+```
+make install
+```
+
+These files are copied into `/usr/local/include/xbyak`.
diff --git a/doc/usage.md b/doc/usage.md
new file mode 100644
index 0000000..7dad245
--- /dev/null
+++ b/doc/usage.md
@@ -0,0 +1,409 @@
+# Usage
+
+Inherit `Xbyak::CodeGenerator` class and make the class method.
+```
+#include <xbyak/xbyak.h>
+
+struct Code : Xbyak::CodeGenerator {
+ Code(int x)
+ {
+ mov(eax, x);
+ ret();
+ }
+};
+```
+Or you can pass the instance of CodeGenerator without inheriting.
+```
+void genCode(Xbyak::CodeGenerator& code, int x) {
+ using namespace Xbyak::util;
+ code.mov(eax, x);
+ code.ret();
+}
+```
+
+Make an instance of the class and get the function
+pointer by calling `getCode()` and call it.
+```
+Code c(5);
+int (*f)() = c.getCode<int (*)()>();
+printf("ret=%d\n", f()); // ret = 5
+```
+
+## Syntax
+Similar to MASM/NASM syntax with parentheses.
+
+```
+NASM Xbyak
+mov eax, ebx --> mov(eax, ebx);
+inc ecx inc(ecx);
+ret --> ret();
+```
+
+## Addressing
+Use `qword`, `dword`, `word` and `byte` if it is necessary to specify the size of memory,
+otherwise use `ptr`.
+
+```
+(ptr|qword|dword|word|byte) [base + index * (1|2|4|8) + displacement]
+ [rip + 32bit disp] ; x64 only
+
+NASM Xbyak
+mov eax, [ebx+ecx] --> mov(eax, ptr [ebx+ecx]);
+mov al, [ebx+ecx] --> mov(al, ptr [ebx + ecx]);
+test byte [esp], 4 --> test(byte [esp], 4);
+inc qword [rax] --> inc(qword [rax]);
+```
+**Note**: `qword`, ... are member variables, then don't use `dword` as unsigned int type.
+
+### How to use Selector (Segment Register)
+```
+mov eax, [fs:eax] --> putSeg(fs);
+ mov(eax, ptr [eax]);
+mov ax, cs --> mov(ax, cs);
+```
+**Note**: Segment class is not derived from `Operand`.
+
+## AVX
+
+```
+vaddps(xmm1, xmm2, xmm3); // xmm1 <- xmm2 + xmm3
+vaddps(xmm2, xmm3, ptr [rax]); // use ptr to access memory
+vgatherdpd(xmm1, ptr [ebp + 256 + xmm2*4], xmm3);
+```
+
+**Note**:
+If `XBYAK_ENABLE_OMITTED_OPERAND` is defined, then you can use two operand version for backward compatibility.
+But the newer version will not support it.
+```
+vaddps(xmm2, xmm3); // xmm2 <- xmm2 + xmm3
+```
+
+## AVX-512
+
+```
+vaddpd zmm2, zmm5, zmm30 --> vaddpd(zmm2, zmm5, zmm30);
+vaddpd xmm30, xmm20, [rax] --> vaddpd(xmm30, xmm20, ptr [rax]);
+vaddps xmm30, xmm20, [rax] --> vaddps(xmm30, xmm20, ptr [rax]);
+vaddpd zmm2{k5}, zmm4, zmm2 --> vaddpd(zmm2 | k5, zmm4, zmm2);
+vaddpd zmm2{k5}{z}, zmm4, zmm2 --> vaddpd(zmm2 | k5 | T_z, zmm4, zmm2);
+vaddpd zmm2{k5}{z}, zmm4, zmm2,{rd-sae} --> vaddpd(zmm2 | k5 | T_z, zmm4, zmm2 | T_rd_sae);
+ vaddpd(zmm2 | k5 | T_z | T_rd_sae, zmm4, zmm2); // the position of `|` is arbitrary.
+vcmppd k4{k3}, zmm1, zmm2, {sae}, 5 --> vcmppd(k4 | k3, zmm1, zmm2 | T_sae, 5);
+
+vaddpd xmm1, xmm2, [rax+256] --> vaddpd(xmm1, xmm2, ptr [rax+256]);
+vaddpd xmm1, xmm2, [rax+256]{1to2} --> vaddpd(xmm1, xmm2, ptr_b [rax+256]);
+vaddpd ymm1, ymm2, [rax+256]{1to4} --> vaddpd(ymm1, ymm2, ptr_b [rax+256]);
+vaddpd zmm1, zmm2, [rax+256]{1to8} --> vaddpd(zmm1, zmm2, ptr_b [rax+256]);
+vaddps zmm1, zmm2, [rax+rcx*8+8]{1to16} --> vaddps(zmm1, zmm2, ptr_b [rax+rcx*8+8]);
+vmovsd [rax]{k1}, xmm4 --> vmovsd(ptr [rax] | k1, xmm4);
+
+vcvtpd2dq xmm16, oword [eax+33] --> vcvtpd2dq(xmm16, xword [eax+33]); // use xword for m128 instead of oword
+ vcvtpd2dq(xmm16, ptr [eax+33]); // default xword
+vcvtpd2dq xmm21, [eax+32]{1to2} --> vcvtpd2dq(xmm21, ptr_b [eax+32]);
+vcvtpd2dq xmm0, yword [eax+33] --> vcvtpd2dq(xmm0, yword [eax+33]); // use yword for m256
+vcvtpd2dq xmm19, [eax+32]{1to4} --> vcvtpd2dq(xmm19, yword_b [eax+32]); // use yword_b to broadcast
+
+vfpclassps k5{k3}, zword [rax+64], 5 --> vfpclassps(k5|k3, zword [rax+64], 5); // specify m512
+vfpclasspd k5{k3}, [rax+64]{1to2}, 5 --> vfpclasspd(k5|k3, xword_b [rax+64], 5); // broadcast 64-bit to 128-bit
+vfpclassps k5{k3}, [rax+64]{1to4}, 5 --> vfpclassps(k5|k3, yword_b [rax+64], 5); // broadcast 64-bit to 256-bit
+
+vpdpbusd(xm0, xm1, xm2); // default encoding is EVEX
+vpdpbusd(xm0, xm1, xm2, EvexEncoding); // same as the above
+vpdpbusd(xm0, xm1, xm2, VexEncoding); // VEX encoding
+```
+### Remark
+* `k1`, ..., `k7` are opmask registers.
+ - `k0` is dealt as no mask.
+ - e.g. `vmovaps(zmm0|k0, ptr[rax]);` and `vmovaps(zmm0|T_z, ptr[rax]);` are same to `vmovaps(zmm0, ptr[rax]);`.
+* use `| T_z`, `| T_sae`, `| T_rn_sae`, `| T_rd_sae`, `| T_ru_sae`, `| T_rz_sae` instead of `,{z}`, `,{sae}`, `,{rn-sae}`, `,{rd-sae}`, `,{ru-sae}`, `,{rz-sae}` respectively.
+* `k4 | k3` is different from `k3 | k4`.
+* use `ptr_b` for broadcast `{1toX}`. X is automatically determined.
+* specify `xword`/`yword`/`zword(_b)` for m128/m256/m512 if necessary.
+
+## Label
+Two kinds of Label are supported. (String literal and Label class).
+
+### String literal
+```
+L("L1");
+ jmp("L1");
+
+ jmp("L2");
+ ...
+ a few mnemonics (8-bit displacement jmp)
+ ...
+L("L2");
+
+ jmp("L3", T_NEAR);
+ ...
+ a lot of mnemonics (32-bit displacement jmp)
+ ...
+L("L3");
+```
+
+* Call `hasUndefinedLabel()` to verify your code has no undefined label.
+* you can use a label for immediate value of mov like as `mov(eax, "L2")`.
+
+### Support `@@`, `@f`, `@b` like MASM
+
+```
+L("@@"); // <A>
+ jmp("@b"); // jmp to <A>
+ jmp("@f"); // jmp to <B>
+L("@@"); // <B>
+ jmp("@b"); // jmp to <B>
+ mov(eax, "@b");
+ jmp(eax); // jmp to <B>
+```
+
+### Local label
+
+Label symbols beginning with a period between `inLocalLabel()` and `outLocalLabel()`
+are treated as a local label.
+`inLocalLabel()` and `outLocalLabel()` can be nested.
+
+```
+void func1()
+{
+ inLocalLabel();
+ L(".lp"); // <A> ; local label
+ ...
+ jmp(".lp"); // jmp to <A>
+ L("aaa"); // global label <C>
+ outLocalLabel();
+
+ inLocalLabel();
+ L(".lp"); // <B> ; local label
+ func1();
+ jmp(".lp"); // jmp to <B>
+ inLocalLabel();
+ jmp("aaa"); // jmp to <C>
+}
+```
+
+### short and long jump
+Xbyak deals with jump mnemonics of an undefined label as short jump if no type is specified.
+So if the size between jmp and label is larger than 127 byte, then xbyak will cause an error.
+
+```
+jmp("short-jmp"); // short jmp
+// small code
+L("short-jmp");
+
+jmp("long-jmp");
+// long code
+L("long-jmp"); // throw exception
+```
+Then specify T_NEAR for jmp.
+```
+jmp("long-jmp", T_NEAR); // long jmp
+// long code
+L("long-jmp");
+```
+Or call `setDefaultJmpNEAR(true);` once, then the default type is set to T_NEAR.
+```
+jmp("long-jmp"); // long jmp
+// long code
+L("long-jmp");
+```
+
+### Label class
+
+`L()` and `jxx()` support Label class.
+
+```
+ Xbyak::Label label1, label2;
+L(label1);
+ ...
+ jmp(label1);
+ ...
+ jmp(label2);
+ ...
+L(label2);
+```
+
+Use `putL` for jmp table
+```
+ Label labelTbl, L0, L1, L2;
+ mov(rax, labelTbl);
+ // rdx is an index of jump table
+ jmp(ptr [rax + rdx * sizeof(void*)]);
+L(labelTbl);
+ putL(L0);
+ putL(L1);
+ putL(L2);
+L(L0);
+ ....
+L(L1);
+ ....
+```
+
+`assignL(dstLabel, srcLabel)` binds dstLabel with srcLabel.
+
+```
+ Label label2;
+ Label label1 = L(); // make label1 ; same to Label label1; L(label1);
+ ...
+ jmp(label2); // label2 is not determined here
+ ...
+ assignL(label2, label1); // label2 <- label1
+```
+The `jmp` in the above code jumps to label1 assigned by `assignL`.
+
+**Note**:
+* srcLabel must be used in `L()`.
+* dstLabel must not be used in `L()`.
+
+`Label::getAddress()` returns the address specified by the label instance and 0 if not specified.
+```
+// not AutoGrow mode
+Label label;
+assert(label.getAddress() == 0);
+L(label);
+assert(label.getAddress() == getCurr());
+```
+
+### Rip ; relative addressing
+```
+Label label;
+mov(eax, ptr [rip + label]); // eax = 4
+...
+
+L(label);
+dd(4);
+```
+```
+int x;
+...
+ mov(eax, ptr[rip + &x]); // throw exception if the difference between &x and current position is larger than 2GiB
+```
+
+## Far jump
+
+Use `word|dword|qword` instead of `ptr` to specify the address size.
+
+### 32 bit mode
+```
+jmp(word[eax], T_FAR); // jmp m16:16(FF /5)
+jmp(dword[eax], T_FAR); // jmp m16:32(FF /5)
+```
+
+### 64 bit mode
+```
+jmp(word[rax], T_FAR); // jmp m16:16(FF /5)
+jmp(dword[rax], T_FAR); // jmp m16:32(FF /5)
+jmp(qword[rax], T_FAR); // jmp m16:64(REX.W FF /5)
+```
+The same applies to `call`.
+
+## Code size
+The default max code size is 4096 bytes.
+Specify the size in constructor of `CodeGenerator()` if necessary.
+
+```
+class Quantize : public Xbyak::CodeGenerator {
+public:
+ Quantize()
+ : CodeGenerator(8192)
+ {
+ }
+ ...
+};
+```
+
+## User allocated memory
+
+You can make jit code on prepared memory.
+
+Call `setProtectModeRE` yourself to change memory mode if using the prepared memory.
+
+```
+uint8_t alignas(4096) buf[8192]; // C++11 or later
+
+struct Code : Xbyak::CodeGenerator {
+ Code() : Xbyak::CodeGenerator(sizeof(buf), buf)
+ {
+ mov(rax, 123);
+ ret();
+ }
+};
+
+int main()
+{
+ Code c;
+ c.setProtectModeRE(); // set memory to Read/Exec
+ printf("%d\n", c.getCode<int(*)()>()());
+}
+```
+
+**Note**: See [../sample/test0.cpp](../sample/test0.cpp).
+
+### AutoGrow
+
+The memory region for jit is automatically extended if necessary when `AutoGrow` is specified in a constructor of `CodeGenerator`.
+
+Call `ready()` or `readyRE()` before calling `getCode()` to fix jump address.
+```
+struct Code : Xbyak::CodeGenerator {
+ Code()
+ : Xbyak::CodeGenerator(<default memory size>, Xbyak::AutoGrow)
+ {
+ ...
+ }
+};
+Code c;
+// generate code for jit
+c.ready(); // mode = Read/Write/Exec
+```
+
+**Note**:
+* Don't use the address returned by `getCurr()` before calling `ready()` because it may be invalid address.
+
+### Read/Exec mode
+Xbyak set Read/Write/Exec mode to memory to run jit code.
+If you want to use Read/Exec mode for security, then specify `DontSetProtectRWE` for `CodeGenerator` and
+call `setProtectModeRE()` after generating jit code.
+
+```
+struct Code : Xbyak::CodeGenerator {
+ Code()
+ : Xbyak::CodeGenerator(4096, Xbyak::DontSetProtectRWE)
+ {
+ mov(eax, 123);
+ ret();
+ }
+};
+
+Code c;
+c.setProtectModeRE();
+...
+
+```
+Call `readyRE()` instead of `ready()` when using `AutoGrow` mode.
+See [protect-re.cpp](../sample/protect-re.cpp).
+
+## Exception-less mode
+If `XBYAK_NO_EXCEPTION` is defined, then gcc/clang can compile xbyak with `-fno-exceptions`.
+In stead of throwing an exception, `Xbyak::GetError()` returns non-zero value (e.g. `ERR_BAD_ADDRESSING`) if there is something wrong.
+The status will not be changed automatically, then you should reset it by `Xbyak::ClearError()`.
+`CodeGenerator::reset()` calls `ClearError()`.
+
+## Macro
+
+* **XBYAK32** is defined on 32bit.
+* **XBYAK64** is defined on 64bit.
+* **XBYAK64_WIN** is defined on 64bit Windows(VC).
+* **XBYAK64_GCC** is defined on 64bit gcc, cygwin.
+* define **XBYAK_USE_OP_NAMES** on gcc with `-fno-operator-names` if you want to use `and()`, ....
+* define **XBYAK_ENABLE_OMITTED_OPERAND** if you use omitted destination such as `vaddps(xmm2, xmm3);`(deprecated in the future).
+* define **XBYAK_UNDEF_JNL** if Bessel function jnl is defined as macro.
+* define **XBYAK_NO_EXCEPTION** for a compiler option `-fno-exceptions`.
+* define **XBYAK_USE_MEMFD** on Linux then /proc/self/maps shows the area used by xbyak.
+* define **XBYAK_OLD_DISP_CHECK** if the old disp check is necessary (deprecated in the future).
+
+## Sample
+
+* [test0.cpp](../sample/test0.cpp) ; tiny sample (x86, x64)
+* [quantize.cpp](../sample/quantize.cpp) ; JIT optimized quantization by fast division (x86 only)
+* [calc.cpp](../sample/calc.cpp) ; assemble and estimate a given polynomial (x86, x64)
+* [bf.cpp](../sample/bf.cpp) ; JIT brainfuck (x86, x64)
diff --git a/readme.md b/readme.md
index 415bf60..118bdde 100644
--- a/readme.md
+++ b/readme.md
@@ -1,6 +1,13 @@
-[![Build Status](https://github.com/herumi/xbyak/actions/workflows/main.yml/badge.svg)](https://github.com/herumi/xbyak/actions/workflows/main.yml)
-# Xbyak 6.04 ; JIT assembler for x86(IA32), x64(AMD64, x86-64) by C++
+# Xbyak 6.04 [![Badge Build]][Build Status]
+
+*A C++ JIT assembler for x86 (IA32), x64 (AMD64, x86-64)*
+
+## Menu
+
+- [Install]
+- [Usage]
+- [Changelog]
## Abstract
@@ -10,15 +17,17 @@ The pronunciation of Xbyak is `kəi-bja-k`.
It is named from a Japanese word [開闢](https://translate.google.com/?hl=ja&sl=ja&tl=en&text=%E9%96%8B%E9%97%A2&op=translate), which means the beginning of the world.
## Feature
-* header file only
-* Intel/MASM like syntax
-* fully support AVX-512
+
+- header file only
+- Intel/MASM like syntax
+- fully support AVX-512
**Note**:
Use `and_()`, `or_()`, ... instead of `and()`, `or()`.
If you want to use them, then specify `-fno-operator-names` option to gcc/clang.
### News
+
- WAITPKG instructions (tpause, umonitor, umwait) are supported.
- MmapAllocator supports memfd with user-defined strings. see sample/memfd.cpp
- strictly check address offset disp32 in a signed 32-bit integer. e.g., `ptr[(void*)0xffffffff]` causes an error.
@@ -32,621 +41,34 @@ If you want to use them, then specify `-fno-operator-names` option to gcc/clang.
### Supported OS
-* Windows Xp, Vista, Windows 7, Windows 10(32bit, 64bit)
-* Linux(32bit, 64bit)
-* Intel macOS
+- Windows (Xp, Vista, 7, 10, 11) (32 / 64 bit)
+- Linux (32 / 64 bit)
+- macOS (Intel CPU)
### Supported Compilers
Almost C++03 or later compilers for x86/x64 such as Visual Studio, g++, clang++, Intel C++ compiler and g++ on mingw/cygwin.
-## Install
-
-The following files are necessary. Please add the path to your compile directory.
-
-* xbyak.h
-* xbyak_mnemonic.h
-* xbyak_util.h
-
-Linux:
-```
-make install
-```
-
-These files are copied into `/usr/local/include/xbyak`.
-
-## How to use it
-
-Inherit `Xbyak::CodeGenerator` class and make the class method.
-```
-#include <xbyak/xbyak.h>
-
-struct Code : Xbyak::CodeGenerator {
- Code(int x)
- {
- mov(eax, x);
- ret();
- }
-};
-```
-Or you can pass the instance of CodeGenerator without inheriting.
-```
-void genCode(Xbyak::CodeGenerator& code, int x) {
- using namespace Xbyak::util;
- code.mov(eax, x);
- code.ret();
-}
-```
-
-Make an instance of the class and get the function
-pointer by calling `getCode()` and call it.
-```
-Code c(5);
-int (*f)() = c.getCode<int (*)()>();
-printf("ret=%d\n", f()); // ret = 5
-```
-
-## Syntax
-Similar to MASM/NASM syntax with parentheses.
-
-```
-NASM Xbyak
-mov eax, ebx --> mov(eax, ebx);
-inc ecx inc(ecx);
-ret --> ret();
-```
-
-## Addressing
-Use `qword`, `dword`, `word` and `byte` if it is necessary to specify the size of memory,
-otherwise use `ptr`.
-
-```
-(ptr|qword|dword|word|byte) [base + index * (1|2|4|8) + displacement]
- [rip + 32bit disp] ; x64 only
-
-NASM Xbyak
-mov eax, [ebx+ecx] --> mov(eax, ptr [ebx+ecx]);
-mov al, [ebx+ecx] --> mov(al, ptr [ebx + ecx]);
-test byte [esp], 4 --> test(byte [esp], 4);
-inc qword [rax] --> inc(qword [rax]);
-```
-**Note**: `qword`, ... are member variables, then don't use `dword` as unsigned int type.
-
-### How to use Selector (Segment Register)
-```
-mov eax, [fs:eax] --> putSeg(fs);
- mov(eax, ptr [eax]);
-mov ax, cs --> mov(ax, cs);
-```
-**Note**: Segment class is not derived from `Operand`.
-
-## AVX
-
-```
-vaddps(xmm1, xmm2, xmm3); // xmm1 <- xmm2 + xmm3
-vaddps(xmm2, xmm3, ptr [rax]); // use ptr to access memory
-vgatherdpd(xmm1, ptr [ebp + 256 + xmm2*4], xmm3);
-```
-
-**Note**:
-If `XBYAK_ENABLE_OMITTED_OPERAND` is defined, then you can use two operand version for backward compatibility.
-But the newer version will not support it.
-```
-vaddps(xmm2, xmm3); // xmm2 <- xmm2 + xmm3
-```
-
-## AVX-512
-
-```
-vaddpd zmm2, zmm5, zmm30 --> vaddpd(zmm2, zmm5, zmm30);
-vaddpd xmm30, xmm20, [rax] --> vaddpd(xmm30, xmm20, ptr [rax]);
-vaddps xmm30, xmm20, [rax] --> vaddps(xmm30, xmm20, ptr [rax]);
-vaddpd zmm2{k5}, zmm4, zmm2 --> vaddpd(zmm2 | k5, zmm4, zmm2);
-vaddpd zmm2{k5}{z}, zmm4, zmm2 --> vaddpd(zmm2 | k5 | T_z, zmm4, zmm2);
-vaddpd zmm2{k5}{z}, zmm4, zmm2,{rd-sae} --> vaddpd(zmm2 | k5 | T_z, zmm4, zmm2 | T_rd_sae);
- vaddpd(zmm2 | k5 | T_z | T_rd_sae, zmm4, zmm2); // the position of `|` is arbitrary.
-vcmppd k4{k3}, zmm1, zmm2, {sae}, 5 --> vcmppd(k4 | k3, zmm1, zmm2 | T_sae, 5);
-
-vaddpd xmm1, xmm2, [rax+256] --> vaddpd(xmm1, xmm2, ptr [rax+256]);
-vaddpd xmm1, xmm2, [rax+256]{1to2} --> vaddpd(xmm1, xmm2, ptr_b [rax+256]);
-vaddpd ymm1, ymm2, [rax+256]{1to4} --> vaddpd(ymm1, ymm2, ptr_b [rax+256]);
-vaddpd zmm1, zmm2, [rax+256]{1to8} --> vaddpd(zmm1, zmm2, ptr_b [rax+256]);
-vaddps zmm1, zmm2, [rax+rcx*8+8]{1to16} --> vaddps(zmm1, zmm2, ptr_b [rax+rcx*8+8]);
-vmovsd [rax]{k1}, xmm4 --> vmovsd(ptr [rax] | k1, xmm4);
-
-vcvtpd2dq xmm16, oword [eax+33] --> vcvtpd2dq(xmm16, xword [eax+33]); // use xword for m128 instead of oword
- vcvtpd2dq(xmm16, ptr [eax+33]); // default xword
-vcvtpd2dq xmm21, [eax+32]{1to2} --> vcvtpd2dq(xmm21, ptr_b [eax+32]);
-vcvtpd2dq xmm0, yword [eax+33] --> vcvtpd2dq(xmm0, yword [eax+33]); // use yword for m256
-vcvtpd2dq xmm19, [eax+32]{1to4} --> vcvtpd2dq(xmm19, yword_b [eax+32]); // use yword_b to broadcast
-
-vfpclassps k5{k3}, zword [rax+64], 5 --> vfpclassps(k5|k3, zword [rax+64], 5); // specify m512
-vfpclasspd k5{k3}, [rax+64]{1to2}, 5 --> vfpclasspd(k5|k3, xword_b [rax+64], 5); // broadcast 64-bit to 128-bit
-vfpclassps k5{k3}, [rax+64]{1to4}, 5 --> vfpclassps(k5|k3, yword_b [rax+64], 5); // broadcast 64-bit to 256-bit
-
-vpdpbusd(xm0, xm1, xm2); // default encoding is EVEX
-vpdpbusd(xm0, xm1, xm2, EvexEncoding); // same as the above
-vpdpbusd(xm0, xm1, xm2, VexEncoding); // VEX encoding
-```
-### Remark
-* `k1`, ..., `k7` are opmask registers.
- - `k0` is dealt as no mask.
- - e.g. `vmovaps(zmm0|k0, ptr[rax]);` and `vmovaps(zmm0|T_z, ptr[rax]);` are same to `vmovaps(zmm0, ptr[rax]);`.
-* use `| T_z`, `| T_sae`, `| T_rn_sae`, `| T_rd_sae`, `| T_ru_sae`, `| T_rz_sae` instead of `,{z}`, `,{sae}`, `,{rn-sae}`, `,{rd-sae}`, `,{ru-sae}`, `,{rz-sae}` respectively.
-* `k4 | k3` is different from `k3 | k4`.
-* use `ptr_b` for broadcast `{1toX}`. X is automatically determined.
-* specify `xword`/`yword`/`zword(_b)` for m128/m256/m512 if necessary.
-
-## Label
-Two kinds of Label are supported. (String literal and Label class).
-
-### String literal
-```
-L("L1");
- jmp("L1");
-
- jmp("L2");
- ...
- a few mnemonics (8-bit displacement jmp)
- ...
-L("L2");
-
- jmp("L3", T_NEAR);
- ...
- a lot of mnemonics (32-bit displacement jmp)
- ...
-L("L3");
-```
-
-* Call `hasUndefinedLabel()` to verify your code has no undefined label.
-* you can use a label for immediate value of mov like as `mov(eax, "L2")`.
-
-### Support `@@`, `@f`, `@b` like MASM
-
-```
-L("@@"); // <A>
- jmp("@b"); // jmp to <A>
- jmp("@f"); // jmp to <B>
-L("@@"); // <B>
- jmp("@b"); // jmp to <B>
- mov(eax, "@b");
- jmp(eax); // jmp to <B>
-```
-
-### Local label
-
-Label symbols beginning with a period between `inLocalLabel()` and `outLocalLabel()`
-are treated as a local label.
-`inLocalLabel()` and `outLocalLabel()` can be nested.
-
-```
-void func1()
-{
- inLocalLabel();
- L(".lp"); // <A> ; local label
- ...
- jmp(".lp"); // jmp to <A>
- L("aaa"); // global label <C>
- outLocalLabel();
-
- inLocalLabel();
- L(".lp"); // <B> ; local label
- func1();
- jmp(".lp"); // jmp to <B>
- inLocalLabel();
- jmp("aaa"); // jmp to <C>
-}
-```
-
-### short and long jump
-Xbyak deals with jump mnemonics of an undefined label as short jump if no type is specified.
-So if the size between jmp and label is larger than 127 byte, then xbyak will cause an error.
-
-```
-jmp("short-jmp"); // short jmp
-// small code
-L("short-jmp");
-
-jmp("long-jmp");
-// long code
-L("long-jmp"); // throw exception
-```
-Then specify T_NEAR for jmp.
-```
-jmp("long-jmp", T_NEAR); // long jmp
-// long code
-L("long-jmp");
-```
-Or call `setDefaultJmpNEAR(true);` once, then the default type is set to T_NEAR.
-```
-jmp("long-jmp"); // long jmp
-// long code
-L("long-jmp");
-```
-
-### Label class
-
-`L()` and `jxx()` support Label class.
-
-```
- Xbyak::Label label1, label2;
-L(label1);
- ...
- jmp(label1);
- ...
- jmp(label2);
- ...
-L(label2);
-```
-
-Use `putL` for jmp table
-```
- Label labelTbl, L0, L1, L2;
- mov(rax, labelTbl);
- // rdx is an index of jump table
- jmp(ptr [rax + rdx * sizeof(void*)]);
-L(labelTbl);
- putL(L0);
- putL(L1);
- putL(L2);
-L(L0);
- ....
-L(L1);
- ....
-```
-
-`assignL(dstLabel, srcLabel)` binds dstLabel with srcLabel.
-
-```
- Label label2;
- Label label1 = L(); // make label1 ; same to Label label1; L(label1);
- ...
- jmp(label2); // label2 is not determined here
- ...
- assignL(label2, label1); // label2 <- label1
-```
-The `jmp` in the above code jumps to label1 assigned by `assignL`.
-
-**Note**:
-* srcLabel must be used in `L()`.
-* dstLabel must not be used in `L()`.
-
-`Label::getAddress()` returns the address specified by the label instance and 0 if not specified.
-```
-// not AutoGrow mode
-Label label;
-assert(label.getAddress() == 0);
-L(label);
-assert(label.getAddress() == getCurr());
-```
-
-### Rip ; relative addressing
-```
-Label label;
-mov(eax, ptr [rip + label]); // eax = 4
-...
-
-L(label);
-dd(4);
-```
-```
-int x;
-...
- mov(eax, ptr[rip + &x]); // throw exception if the difference between &x and current position is larger than 2GiB
-```
-
-## Far jump
-
-Use `word|dword|qword` instead of `ptr` to specify the address size.
-
-### 32 bit mode
-```
-jmp(word[eax], T_FAR); // jmp m16:16(FF /5)
-jmp(dword[eax], T_FAR); // jmp m16:32(FF /5)
-```
-
-### 64 bit mode
-```
-jmp(word[rax], T_FAR); // jmp m16:16(FF /5)
-jmp(dword[rax], T_FAR); // jmp m16:32(FF /5)
-jmp(qword[rax], T_FAR); // jmp m16:64(REX.W FF /5)
-```
-The same applies to `call`.
-
-## Code size
-The default max code size is 4096 bytes.
-Specify the size in constructor of `CodeGenerator()` if necessary.
-
-```
-class Quantize : public Xbyak::CodeGenerator {
-public:
- Quantize()
- : CodeGenerator(8192)
- {
- }
- ...
-};
-```
-
-## User allocated memory
-
-You can make jit code on prepared memory.
-
-Call `setProtectModeRE` yourself to change memory mode if using the prepared memory.
-
-```
-uint8_t alignas(4096) buf[8192]; // C++11 or later
-
-struct Code : Xbyak::CodeGenerator {
- Code() : Xbyak::CodeGenerator(sizeof(buf), buf)
- {
- mov(rax, 123);
- ret();
- }
-};
-
-int main()
-{
- Code c;
- c.setProtectModeRE(); // set memory to Read/Exec
- printf("%d\n", c.getCode<int(*)()>()());
-}
-```
-
-**Note**: See [sample/test0.cpp](sample/test0.cpp).
-
-### AutoGrow
-
-The memory region for jit is automatically extended if necessary when `AutoGrow` is specified in a constructor of `CodeGenerator`.
-
-Call `ready()` or `readyRE()` before calling `getCode()` to fix jump address.
-```
-struct Code : Xbyak::CodeGenerator {
- Code()
- : Xbyak::CodeGenerator(<default memory size>, Xbyak::AutoGrow)
- {
- ...
- }
-};
-Code c;
-// generate code for jit
-c.ready(); // mode = Read/Write/Exec
-```
-
-**Note**:
-* Don't use the address returned by `getCurr()` before calling `ready()` because it may be invalid address.
-
-### Read/Exec mode
-Xbyak set Read/Write/Exec mode to memory to run jit code.
-If you want to use Read/Exec mode for security, then specify `DontSetProtectRWE` for `CodeGenerator` and
-call `setProtectModeRE()` after generating jit code.
-
-```
-struct Code : Xbyak::CodeGenerator {
- Code()
- : Xbyak::CodeGenerator(4096, Xbyak::DontSetProtectRWE)
- {
- mov(eax, 123);
- ret();
- }
-};
-
-Code c;
-c.setProtectModeRE();
-...
-
-```
-Call `readyRE()` instead of `ready()` when using `AutoGrow` mode.
-See [protect-re.cpp](sample/protect-re.cpp).
-
-## Exception-less mode
-If `XBYAK_NO_EXCEPTION` is defined, then gcc/clang can compile xbyak with `-fno-exceptions`.
-In stead of throwing an exception, `Xbyak::GetError()` returns non-zero value (e.g. `ERR_BAD_ADDRESSING`) if there is something wrong.
-The status will not be changed automatically, then you should reset it by `Xbyak::ClearError()`.
-`CodeGenerator::reset()` calls `ClearError()`.
+## License
-## Macro
+[BSD-3-Clause License](http://opensource.org/licenses/BSD-3-Clause)
-* **XBYAK32** is defined on 32bit.
-* **XBYAK64** is defined on 64bit.
-* **XBYAK64_WIN** is defined on 64bit Windows(VC).
-* **XBYAK64_GCC** is defined on 64bit gcc, cygwin.
-* define **XBYAK_USE_OP_NAMES** on gcc with `-fno-operator-names` if you want to use `and()`, ....
-* define **XBYAK_ENABLE_OMITTED_OPERAND** if you use omitted destination such as `vaddps(xmm2, xmm3);`(deprecated in the future).
-* define **XBYAK_UNDEF_JNL** if Bessel function jnl is defined as macro.
-* define **XBYAK_NO_EXCEPTION** for a compiler option `-fno-exceptions`.
-* define **XBYAK_USE_MEMFD** on Linux then /proc/self/maps shows the area used by xbyak.
-* define **XBYAK_OLD_DISP_CHECK** if the old disp check is necessary (deprecated in the future).
+## Author
-## Sample
+#### 光成滋生 Mitsunari Shigeo
+ [GitHub](https//github.com/herumi) | [Website (Japanese)](http://herumi.in.coocan.jp/) | [[email protected]](mailto:[email protected])
-* [test0.cpp](sample/test0.cpp) ; tiny sample (x86, x64)
-* [quantize.cpp](sample/quantize.cpp) ; JIT optimized quantization by fast division (x86 only)
-* [calc.cpp](sample/calc.cpp) ; assemble and estimate a given polynomial (x86, x64)
-* [bf.cpp](sample/bf.cpp) ; JIT brainfuck (x86, x64)
+## Sponsors welcome
+[GitHub Sponsor](https://github.com/sponsors/herumi)
-## License
+<!----------------------------------------------------------------------------->
-modified new BSD License
-http://opensource.org/licenses/BSD-3-Clause
+[Badge Build]: https://github.com/herumi/xbyak/actions/workflows/main.yml/badge.svg
+[Build Status]: https://github.com/herumi/xbyak/actions/workflows/main.yml
-## History
-* 2022/Apr/05 ver 6.04 add tpause, umonitor, umwait
-* 2022/Mar/08 ver 6.03 MmapAllocator supports memfd with user-defined strings.
-* 2022/Jan/28 ver 6.02 strict check the range of 32-bit dispacement
-* 2021/Dec/14 ver 6.01 support T_FAR jump/call and retf
-* 2021/Sep/14 ver 6.00 fully support AVX512-FP16
-* 2021/Sep/09 ver 5.997 fix vrndscale* to support {sae}
-* 2021/Sep/03 ver 5.996 fix v{add,sub,mul,div,max,min}{sd,ss} to support T_rd_sae.
-* 2021/Aug/15 ver 5.995 add a label to /proc/self/maps if XBYAK_USE_MEMFD is defined on Linux
-* 2021/Jun/17 ver 5.994 add alias of vcmpXX{ps,pd,ss,sd} with mask register
-* 2021/Jun/06 ver 5.993 strict check of gather/scatter register combination
-* 2021/May/09 ver 5.992 support endbr32 and endbr64
-* 2020/Nov/16 ver 5.991 disable constexpr for gcc-5 with -std=c++-14
-* 2020/Oct/19 ver 5.99 support VNNI instructions(Thanks to akharito)
-* 2020/Oct/17 ver 5.98 support the form of [scale * reg]
-* 2020/Sep/08 ver 5.97 replace uint32 with uint32_t etc.
-* 2020/Aug/28 ver 5.95 some constructors of register classes support constexpr if C++14 or later
-* 2020/Aug/04 ver 5.941 `CodeGenerator::reset()` calls `ClearError()`.
-* 2020/Jul/28 ver 5.94 remove #include <winsock2.h> (only windows)
-* 2020/Jul/21 ver 5.93 support exception-less mode
-* 2020/Jun/30 ver 5.92 support Intel AMX instruction set (Thanks to nshustrov)
-* 2020/Jun/22 ver 5.913 fix mov(r64, imm64) on 32-bit env with XBYAK64
-* 2020/Jun/19 ver 5.912 define MAP_JIT on macOS regardless of Xcode version (Thanks to rsdubtso)
-* 2020/May/10 ver 5.911 XBYAK_USE_MMAP_ALLOCATOR is defined unless XBYAK_DONT_USE_MMAP_ALLOCATOR is defined.
-* 2020/Apr/20 ver 5.91 accept mask register k0 (it means no mask)
-* 2020/Apr/09 ver 5.90 kmov{b,d,w,q} throws exception for an unsupported register
-* 2020/Feb/26 ver 5.891 fix typo of type
-* 2020/Jan/03 ver 5.89 fix error of vfpclasspd
-* 2019/Dec/20 ver 5.88 fix compile error on Windows
-* 2019/Dec/19 ver 5.87 add setDefaultJmpNEAR(), which deals with `jmp` of an undefined label as T_NEAR if no type is specified.
-* 2019/Dec/13 ver 5.86 [changed] revert to the behavior before v5.84 if -fno-operator-names is defined (and() is available)
-* 2019/Dec/07 ver 5.85 append MAP_JIT flag to mmap for macOS mojave or later
-* 2019/Nov/29 ver 5.84 [changed] XBYAK_NO_OP_NAMES is defined unless XBYAK_USE_OP_NAMES is defined
-* 2019/Oct/12 ver 5.83 exit(1) was removed
-* 2019/Sep/23 ver 5.82 support monitorx, mwaitx, clzero (thanks to @MagurosanTeam)
-* 2019/Sep/14 ver 5.81 support some generic mnemonics.
-* 2019/Aug/01 ver 5.802 fix detection of AVX512_BF16 (thanks to vpirogov)
-* 2019/May/27 support vp2intersectd, vp2intersectq (not tested)
-* 2019/May/26 ver 5.80 support vcvtne2ps2bf16, vcvtneps2bf16, vdpbf16ps
-* 2019/Apr/27 ver 5.79 vcmppd/vcmpps supports ptr_b(thanks to jkopinsky)
-* 2019/Apr/15 ver 5.78 rewrite Reg::changeBit() (thanks to MerryMage)
-* 2019/Mar/06 ver 5.77 fix number of cores that share LLC cache by densamoilov
-* 2019/Jan/17 ver 5.76 add Cpu::getNumCores() by shelleygoel
-* 2018/Oct/31 ver 5.751 recover Xbyak::CastTo for compatibility
-* 2018/Oct/29 ver 5.75 unlink LabelManager from Label when msg is destroyed
-* 2018/Oct/21 ver 5.74 support RegRip +/- int. Xbyak::CastTo is removed
-* 2018/Oct/15 util::AddressFrame uses push/pop instead of mov
-* 2018/Sep/19 ver 5.73 fix evex encoding of vpslld, vpslldq, vpsllw, etc for (reg, mem, imm8)
-* 2018/Sep/19 ver 5.72 fix the encoding of vinsertps for disp8N(Thanks to petercaday)
-* 2018/Sep/04 ver 5.71 L() returns a new label instance
-* 2018/Aug/27 ver 5.70 support setProtectMode() and DontUseProtect for read/exec setting
-* 2018/Aug/24 ver 5.68 fix wrong VSIB encoding with vector index >= 16(thanks to petercaday)
-* 2018/Aug/14 ver 5.67 remove mutable in Address ; fix setCacheHierarchy for cloud vm
-* 2018/Jul/26 ver 5.661 support mingw64
-* 2018/Jul/24 ver 5.66 add CodeArray::PROTECT_RE to mode of protect()
-* 2018/Jun/26 ver 5.65 fix push(qword [mem])
-* 2018/Mar/07 ver 5.64 fix zero division in Cpu() on some cpu
-* 2018/Feb/14 ver 5.63 fix Cpu::setCacheHierarchy() and fix EvexModifierZero for clang<3.9(thanks to mgouicem)
-* 2018/Feb/13 ver 5.62 Cpu::setCacheHierarchy() by mgouicem and rsdubtso
-* 2018/Feb/07 ver 5.61 vmov* supports mem{k}{z}(I forgot it)
-* 2018/Jan/24 ver 5.601 add xword, yword, etc. into Xbyak::util namespace
-* 2018/Jan/05 ver 5.60 support AVX-512 for Ice lake(319433-030.pdf)
-* 2017/Aug/22 ver 5.53 fix mpx encoding, add bnd() prefix
-* 2017/Aug/18 ver 5.52 fix align (thanks to MerryMage)
-* 2017/Aug/17 ver 5.51 add multi-byte nop and align() uses it(thanks to inolen)
-* 2017/Aug/08 ver 5.50 add mpx(thanks to magurosan)
-* 2017/Aug/08 ver 5.45 add sha(thanks to magurosan)
-* 2017/Aug/08 ver 5.44 add prefetchw(thanks to rsdubtso)
-* 2017/Jul/12 ver 5.432 reduce warnings of PVS studio
-* 2017/Jul/09 ver 5.431 fix hasRex() (no affect) (thanks to drillsar)
-* 2017/May/14 ver 5.43 fix CodeGenerator::resetSize() (thanks to gibbed)
-* 2017/May/13 ver 5.42 add movs{b,w,d,q}
-* 2017/Jan/26 ver 5.41 add prefetchwt1 and support for scale == 0(thanks to rsdubtso)
-* 2016/Dec/14 ver 5.40 add Label::getAddress() method to get the pointer specified by the label
-* 2016/Dec/09 ver 5.34 fix handling of negative offsets when encoding disp8N(thanks to rsdubtso)
-* 2016/Dec/08 ver 5.33 fix encoding of vpbroadcast{b,w,d,q}, vpinsr{b,w}, vpextr{b,w} for disp8N
-* 2016/Dec/01 ver 5.32 rename __xgetbv() to _xgetbv() to support clang for Visual Studio(thanks to freiro)
-* 2016/Nov/27 ver 5.31 rename AVX512_4VNNI to AVX512_4VNNIW
-* 2016/Nov/27 ver 5.30 add AVX512_4VNNI, AVX512_4FMAPS instructions(thanks to rsdubtso)
-* 2016/Nov/26 ver 5.20 add detection of AVX512_4VNNI and AVX512_4FMAPS(thanks to rsdubtso)
-* 2016/Nov/20 ver 5.11 lost vptest for ymm(thanks to gregory38)
-* 2016/Nov/20 ver 5.10 add addressing [rip+&var]
-* 2016/Sep/29 ver 5.03 fix detection ERR_INVALID_OPMASK_WITH_MEMORY(thanks to PVS-Studio)
-* 2016/Aug/15 ver 5.02 xbyak does not include xbyak_bin2hex.h
-* 2016/Aug/15 ver 5.011 fix detection of version of gcc 5.4
-* 2016/Aug/03 ver 5.01 disable omitted operand
-* 2016/Jun/24 ver 5.00 support avx-512 instruction set
-* 2016/Jun/13 avx-512 add mask instructions
-* 2016/May/05 ver 4.91 add detection of AVX-512 to Xbyak::util::Cpu
-* 2016/Mar/14 ver 4.901 comment to ready() function(thanks to skmp)
-* 2016/Feb/04 ver 4.90 add jcc(const void *addr);
-* 2016/Jan/30 ver 4.89 vpblendvb supports ymm reg(thanks to John Funnell)
-* 2016/Jan/24 ver 4.88 lea, cmov supports 16-bit register(thanks to whyisthisfieldhere)
-* 2015/Oct/05 ver 4.87 support segment selectors
-* 2015/Aug/18 ver 4.86 fix [rip + label] addressing with immediate value(thanks to whyisthisfieldhere)
-* 2015/Aug/10 ver 4.85 Address::operator==() is not correct(thanks to inolen)
-* 2015/Jun/22 ver 4.84 call() support variadic template if available(thanks to randomstuff)
-* 2015/Jun/16 ver 4.83 support movbe(thanks to benvanik)
-* 2015/May/24 ver 4.82 support detection of F16C
-* 2015/Apr/25 ver 4.81 fix the condition to throw exception for setSize(thanks to whyisthisfieldhere)
-* 2015/Apr/22 ver 4.80 rip supports label(thanks to whyisthisfieldhere)
-* 2015/Jar/28 ver 4.71 support adcx, adox, cmpxchg, rdseed, stac
-* 2014/Oct/14 ver 4.70 support MmapAllocator
-* 2014/Jun/13 ver 4.62 disable warning of VC2014
-* 2014/May/30 ver 4.61 support bt, bts, btr, btc
-* 2014/May/28 ver 4.60 support vcvtph2ps, vcvtps2ph
-* 2014/Apr/11 ver 4.52 add detection of rdrand
-* 2014/Mar/25 ver 4.51 remove state information of unreferenced labels
-* 2014/Mar/16 ver 4.50 support new Label
-* 2014/Mar/05 ver 4.40 fix wrong detection of BMI/enhanced rep on VirtualBox
-* 2013/Dec/03 ver 4.30 support Reg::cvt8(), cvt16(), cvt32(), cvt64()
-* 2013/Oct/16 ver 4.21 label support std::string
-* 2013/Jul/30 ver 4.20 [break backward compatibility] split Reg32e class into RegExp(base+index*scale+disp) and Reg32e(means Reg32 or Reg64)
-* 2013/Jul/04 ver 4.10 [break backward compatibility] change the type of Xbyak::Error from enum to a class
-* 2013/Jun/21 ver 4.02 add putL(LABEL) function to put the address of the label
-* 2013/Jun/21 ver 4.01 vpsllw, vpslld, vpsllq, vpsraw, vpsrad, vpsrlw, vpsrld, vpsrlq support (ymm, ymm, xmm). support vpbroadcastb, vpbroadcastw, vpbroadcastd, vpbroadcastq(thanks to Gabest).
-* 2013/May/30 ver 4.00 support AVX2, VEX-encoded GPR-instructions
-* 2013/Mar/27 ver 3.80 support mov(reg, "label");
-* 2013/Mar/13 ver 3.76 add cqo(), jcxz(), jecxz(), jrcxz()
-* 2013/Jan/15 ver 3.75 add setSize() to modify generated code
-* 2013/Jan/12 ver 3.74 add CodeGenerator::reset() ; add Allocator::useProtect()
-* 2013/Jan/06 ver 3.73 use unordered_map if possible
-* 2012/Dec/04 ver 3.72 eax, ebx, ... are member variables of CodeGenerator(revert), Xbyak::util::eax, ... are static const.
-* 2012/Nov/17 ver 3.71 and_(), or_(), xor_(), not_() are available if XBYAK_NO_OP_NAMES is not defined.
-* 2012/Nov/17 change eax, ebx, ptr and so on in CodeGenerator as static member and alias of them are defined in Xbyak::util.
-* 2012/Nov/09 ver 3.70 XBYAK_NO_OP_NAMES macro is added to use and_() instead of and() (thanks to Mattias)
-* 2012/Nov/01 ver 3.62 add fwait/fnwait/finit/fninit
-* 2012/Nov/01 ver 3.61 add fldcw/fstcw
-* 2012/May/03 ver 3.60 change interface of Allocator
-* 2012/Mar/23 ver 3.51 fix userPtr mode
-* 2012/Mar/19 ver 3.50 support AutoGrow mode
-* 2011/Nov/09 ver 3.05 fix bit property of rip addresing / support movsxd
-* 2011/Aug/15 ver 3.04 fix dealing with imm8 such as add(dword [ebp-8], 0xda); (thanks to lolcat)
-* 2011/Jun/16 ver 3.03 fix __GNUC_PREREQ macro for Mac gcc(thanks to t_teruya)
-* 2011/Apr/28 ver 3.02 do not use xgetbv on Mac gcc
-* 2011/May/24 ver 3.01 fix typo of OSXSAVE
-* 2011/May/23 ver 3.00 add vcmpeqps and so on
-* 2011/Feb/16 ver 2.994 beta add vmovq for 32-bit mode(I forgot it)
-* 2011/Feb/16 ver 2.993 beta remove cvtReg to avoid thread unsafe
-* 2011/Feb/10 ver 2.992 beta support one argument syntax for fadd like nasm
-* 2011/Feb/07 ver 2.991 beta fix pextrw reg, xmm, imm(Thanks to Gabest)
-* 2011/Feb/04 ver 2.99 beta support AVX
-* 2010/Dec/08 ver 2.31 fix ptr [rip + 32bit offset], support rdtscp
-* 2010/Oct/19 ver 2.30 support pclmulqdq, aesdec, aesdeclast, aesenc, aesenclast, aesimc, aeskeygenassist
-* 2010/Jun/07 ver 2.29 fix call(<label>)
-* 2010/Jun/17 ver 2.28 move some member functions to public
-* 2010/Jun/01 ver 2.27 support encoding of mov(reg64, imm) like yasm(not nasm)
-* 2010/May/24 ver 2.26 fix sub(rsp, 1000)
-* 2010/Apr/26 ver 2.25 add jc/jnc(I forgot to implement them...)
-* 2010/Apr/16 ver 2.24 change the prototype of rewrite() method
-* 2010/Apr/15 ver 2.23 fix align() and xbyak_util.h for Mac
-* 2010/Feb/16 ver 2.22 fix inLocalLabel()/outLocalLabel()
-* 2009/Dec/09 ver 2.21 support cygwin(gcc 4.3.2)
-* 2009/Nov/28 support a part of FPU
-* 2009/Jun/25 fix mov(qword[rax], imm); (thanks to Martin)
-* 2009/Mar/10 fix redundant REX.W prefix on jmp/call reg64
-* 2009/Feb/24 add movq reg64, mmx/xmm; movq mmx/xmm, reg64
-* 2009/Feb/13 movd(xmm7, dword[eax]) drops 0x66 prefix (thanks to Gabest)
-* 2008/Dec/30 fix call in short relative address(thanks to kato san)
-* 2008/Sep/18 support @@, @f, @b and localization of label(thanks to nobu-q san)
-* 2008/Sep/18 support (ptr[rip + 32bit offset]) (thanks to Dango-Chu san)
-* 2008/Jun/03 fix align(). mov(ptr[eax],1) throws ERR_MEM_SIZE_IS_NOT_SPECIFIED.
-* 2008/Jun/02 support memory interface allocated by user
-* 2008/May/26 fix protect() to avoid invalid setting(thanks to shinichiro_h san)
-* 2008/Apr/30 add cmpxchg16b, cdqe
-* 2008/Apr/29 support x64
-* 2008/Apr/14 code refactoring
-* 2008/Mar/12 add bsr/bsf
-* 2008/Feb/14 fix output of sub eax, 1234 (thanks to Robert)
-* 2007/Nov/5 support lock, xadd, xchg
-* 2007/Nov/2 support SSSE3/SSE4 (thanks to Dango-Chu san)
-* 2007/Feb/4 fix the bug that exception doesn't occur under the condition which the offset of jmp mnemonic without T_NEAR is over 127.
-* 2007/Jan/21 fix the bug to create address like [disp] select smaller representation for mov (eax|ax|al, [disp])
-* 2007/Jan/4 first version
+[License]: COPYRIGHT
-## Author
-MITSUNARI Shigeo([email protected])
+[Changelog]: doc/changelog.md
+[Install]: doc/install.md
+[Usage]: doc/usage.md
-## Sponsors welcome
-[GitHub Sponsor](https://github.com/sponsors/herumi)