Age | Commit message | Author |
|
|
|
Redo primary context and fix various long-standing bugs around this API
|
|
|
|
|
|
underying -> underlying
|
|
The link that should be for AMD Adrenalin was pointing to ROCm linux info
|
|
Add sign extension support to prmt, allow set.<op>.f16x2.f16x2, add more BLAS mappings
|
|
uderlying -> underlying
|
|
|
|
|
|
|
|
|
|
Too many changes to list, but broadly:
* Remove Intel GPU support from the compiler
* Add AMD GPU support to the compiler
* Remove Intel GPU host code
* Add AMD GPU host code
* More device instructions: from 40 to 68
* More host functions: from 48 to 184
* Add proof of concept implementation of OptiX framework
* Add minimal support of cuDNN, cuBLAS, cuSPARSE, cuFFT, NCCL, NVML
* Improve ZLUDA launcher for Windows
|
|
|
|
|
|
* Update ze_loader.lib to the newest version
* Export _ptsz/_ptds for which we have legacy stream implementations
* Stop producing build logs if we are not looking at them anyway
|
|
|
|
* Use official GPU driver packages for building on Linux
* Start building on Windows
* Start uploading artifacts
|
|
Improve injector and redirector so it's no longer required to manually mess with files if the application links nvcuda.dll. Additionally, inject into child processes
|
|
This fixes the last remaining bug preventing an end-to-end GeekBench run, so also update the GeekBench results in README
|
|
zluda_dump can already create traces of GPU execution; this script can replay those traces.
Additionally, added just enough code in core ZLUDA to support simple PyCUDA execution
|
|
|
|
|
|
This function is required by recent versions of CUDA runtime on Windows
|
|
In one of the previous commits we made a change to mark ld/st as aligned. This change was not propagated to test files
|
|
Fixes issues pointed out in #27:
* spirv_tools-sys was built in non-test profiles
* By default ZLUDA dll has a wrong name
* We relied on third-party OpenCL installation on Windows
* We encouraged building debug configuration
* We didn't provide build information for developers (cmake, python, submodules)
|
|
Fix various bugs in injector and redirector, make them more robust and enable building them by default
|
|
|
|
36b69b9 Make Detours MinGW Clang-compatible
git-subtree-dir: ext/detours
git-subtree-split: 36b69b971888b2ca0c5913563bae011efaa4a42e
|
|
|
|
git-subtree-dir: ext/detours
git-subtree-split: 39aa864d2985099c8d847e29a5fb86618039b9c4
|
|
Testing isn't working yet because some tests require a live Intel GPU and a live NVIDIA GPU
|
|
Two changes:
* Fixes to builtins generation that I forgot to include in #21
* Marking of ld/st as aligned - this gives a big performance boost in GeekBench SFFT
|
|
We currently directly map PTX special registers: %ntid, %tid, etc. to SPIR-V builtins with type OpTypeVector %uint 4.
This is wrong and leads to silent corruption, which fails e.g. Depth of Field in GeekBench
|
|
Current code has a problem with handling vector members: "b.x" in "mov.u32 a, b.x". This functionality has been kinda tacked-on and has annoying issues:
* vector member support is limited to being the source of movs (so "add.u32 a.x, b.x, c.y" will not work)
* the width of "b" in "b.x" is not known, which led to some "interesting" workarounds
* passes can either convert all member accesses to other member accesses or to temporaries. No way to convert some member accesses to temporaries (which we need for an important fix)
This commit solves all this
|
|
Fix small typo
|
|
fix typo in readme
|
|
|
|
|
|
|
|
|
|
name
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|