From f533cfe03ffa5c77884b6ffeb3d96045eb6a5dc3 Mon Sep 17 00:00:00 2001
From: Andrzej Janik
Date: Fri, 20 Dec 2024 22:13:00 +0100
Subject: Update README

---
 CONTRIBUTING.md | 61 ---------------------------------------------------------
 README.md       | 54 +++++++++++++++++++++++++++++++++++---------------
 2 files changed, 38 insertions(+), 77 deletions(-)
 delete mode 100644 CONTRIBUTING.md

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
deleted file mode 100644
index 59899a8..0000000
--- a/CONTRIBUTING.md
+++ /dev/null
@@ -1,61 +0,0 @@
-# Dependencies
-
-Development builds of ZLUDA requires following dependencies:
-
-* CMake
-* Python 3
-
-Additionally the repository has to be cloned with Git submodules initalized. If you cloned the repo without initalizing submodules, do this:
-```
-git submodule update --init --recursive
-```
-
-# Tests
-
-Tests should be executed with `--workspace` option to test non-default targets:
-```
-cargo test --workspace
-```
-
-# Debugging
-
-## Debuggging CUDA applications
-
-When running an application with ZLUDA quite often you will run into subtle bugs or incompatibilities in the generated GPU code. The best way to debug an application's GPU CUDA code is to use ZLUDA dumper.
-
-Library `zluda_dump` can be injected into a CUDA application and produce a trace which, for every launched GPU function contains:
-* PTX source
-* Launch arguments (block size, grid size, shared memory size)
-* Dump of function arguments. Both after and before
-
-Example use with GeekBench:
-```
-set ZLUDA_DUMP_KERNEL=knn_match
-set ZLUDA_DUMP_DIR=C:\temp\zluda_dump
-"\zluda_with.exe" "\zluda_dump.dll" -- "geekbench_x86_64.exe" --compute CUDA
-```
-
-The example above, for every execution of GPU function `knn_match`, will save its details into the directory `C:\temp\zluda_dump`
-
-This dump can be replayed with `replay.py` script from `zluda_dump` source directory. Use it like this:
-```
-python replay.py "C:\temp\zluda_dump\geekbench_x86_64.exe"
-```
-You must copy (or symlink) ZLUDA `nvcuda.dll` into PyCUDA directory, so it will run using ZLUDA. Example output:
-```
-Intel(R) Graphics [0x3e92] [github.com/vosen/ZLUDA]
-C:\temp\zluda_dump\geekbench_x86_64.exe\4140_scale_pyramid
-C:\temp\zluda_dump\geekbench_x86_64.exe\4345_convolve_1d_vertical_grayscale
- Skipping, launch block size (512) bigger than maximum block size (256)
-C:\temp\zluda_dump\geekbench_x86_64.exe\4480_scale_pyramid
-6:
-Arrays are not equal
-
-Mismatched elements: 1200 / 19989588 (0.006%)
-Max absolute difference: 255
-Max relative difference: 255.
- x: array([ 7, 6, 8, ..., 193, 195, 193], dtype=uint8)
- y: array([ 7, 6, 8, ..., 193, 195, 193], dtype=uint8)
-```
-From this output one can observe that in kernel launch 4480, 6th argument to function `scale_pyramid` differs between what was executed on an NVIDIA GPU using CUDA and Intel GPU using ZLUDA.
-__Important__: It's impossible to infer what was the type (and semantics) of argument passed to a GPU function. At our level it's a buffer of bytes and by default `replay.py` simply checks if two buffers are byte-equal. That means you will have a ton of false negatives when running `replay.py`. You should override them for your particular case in `replay.py` - it already contains some overrides for GeekBench kernels
\ No newline at end of file
diff --git a/README.md b/README.md
index 73df97f..2b6a815 100644
--- a/README.md
+++ b/README.md
@@ -6,16 +6,17 @@ ZLUDA is a drop-in replacement for CUDA on non-NVIDIA GPU. ZLUDA allows to run u
 ZLUDA is work in progress. Follow development here and say hi on [Discord](https://discord.gg/sg6BNzXuc7). For more details see the announcement: https://vosen.github.io/ZLUDA/blog/zludas-third-life/
 
 ## Usage
 
-**Warning**: ZLUDA is under heavy development (see news [here](https://vosen.github.io/ZLUDA/blog/zludas-third-life/)). Instructions below might not work.
+**Warning**: This version of ZLUDA is under heavy development (more [here](https://vosen.github.io/ZLUDA/blog/zludas-third-life/)) and right now only supports Geekbench. ZLUDA probably will not work with your application just yet.
 
 ### Windows
 
-You should have the most recent ROCm installed.\
-Run your application like this:
-```
-<ZLUDA_DIRECTORY>\zluda_with.exe -- <APPLICATION> <APPLICATION_ARGUMENTS>
-```
+You should have a recent AMD GPU driver ("AMD Software: Adrenalin Edition") installed.\
+To run your application you should either:
+* (Recommended approach) Copy the ZLUDA-provided `nvcuda.dll` and `nvml.dll` into a path which your application uses to load CUDA. Paths vary from application to application, but usually it is the directory where the .exe file is located.
+* Use the ZLUDA launcher as shown below. The launcher is known to be buggy and unfinished.
+  ```
+  <ZLUDA_DIRECTORY>\zluda_with.exe -- <APPLICATION> <APPLICATION_ARGUMENTS>
+  ```
 
 ### Linux
@@ -24,25 +25,35 @@ Run your application like this:
 LD_LIBRARY_PATH=<ZLUDA_DIRECTORY> <APPLICATION> <APPLICATION_ARGUMENTS>
 ```
+where `<ZLUDA_DIRECTORY>` is the directory which contains the ZLUDA-provided `libcuda.so`: `target/release` if you built from source or `zluda` if you downloaded a prebuilt package.
+
 ### MacOS
 
 Not supported
 
 ## Building
 
-**Warning**: ZLUDA is under heavy development (see news [here](https://vosen.github.io/ZLUDA/blog/zludas-third-life/)). Instructions below might not work.
-_Note_: This repo has submodules. Make sure to recurse submodules when cloning this repo, e.g.: `git clone --recursive https://github.com/vosen/ZLUDA.git`
+### Dependencies
- You should have a relatively recent version of Rust installed, then you just do:
+* Git
+* CMake
+* Python 3
+* Rust compiler (recent version)
+* C++ compiler
+* (Optional, but recommended) [Ninja build system](https://ninja-build.org/)
+
+### Build steps
+
+* Git clone the repo (make sure to use the `--recursive` option to fetch submodules):
+`git clone --recursive https://github.com/vosen/ZLUDA.git`
+* Enter the freshly cloned `ZLUDA` directory and build with cargo (this takes a while):
+`cargo build --release`
 
-```
-cargo build --release
-```
-in the main directory of the project.
 ### Linux
 
 If you are building on Linux you must also symlink the ZLUDA output binaries after ZLUDA build finishes:
 ```
+cd target/release
-ln -s libnvcuda.so target/release/libcuda.so
-ln -s libnvcuda.so target/release/libcuda.so.1
-ln -s libnvml.so target/release/libnvidia-ml.so
+ln -s libnvcuda.so libcuda.so
+ln -s libnvcuda.so libcuda.so.1
+ln -s libnvml.so libnvidia-ml.so
@@ -50,7 +61,18 @@ ln -s libnvml.so target/release/libnvidia-ml.so
 
 ## Contributing
 
-If you want to develop ZLUDA itself, read [CONTRIBUTING.md](CONTRIBUTING.md), it contains instructions how to set up dependencies and run tests
+The ZLUDA project has commercial backing and _does not_ accept donations.
+The ZLUDA project accepts pull requests and other non-monetary contributions.
+
+If you want to contribute a code fix or documentation update, feel free to open a Pull Request.
+
+### Getting started
+
+There's no architecture document (yet). The two most important crates in ZLUDA are `ptx` (the PTX compiler) and `zluda` (the AMD GPU runtime). A good starting point for tinkering with the project is to run one of the `ptx` unit tests under a debugger and understand what it is doing. `cargo test -p ptx -- ::add_hip` is a simple test that adds two numbers.
+
+GitHub issues tagged with ["help wanted"](https://github.com/vosen/ZLUDA/issues?q=is%3Aissue+is%3Aopen+label%3A%22help+wanted%22) are tasks that are self-contained. Their level of difficulty varies and they are not always great starting points, but they have a relatively clear definition of "done".
+
+If you have questions, feel free to ask on the [#devtalk channel on Discord](https://discord.com/channels/1273316903783497778/1303329281409159270).
 
 ## License
--
cgit v1.2.3
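To make the Linux usage instructions in the patched README more concrete, a launch could look like the sketch below. This is only an illustration: it assumes ZLUDA was built from source in the repository root (so `<ZLUDA_DIRECTORY>` is `target/release`, as the README states), and `./my_cuda_app` with its argument is a hypothetical application, not part of the repo.

```
# Sketch only: my_cuda_app and --benchmark are placeholders for your own CUDA application.
# target/release must already contain the libcuda.so symlinks created in the "Building" section.
LD_LIBRARY_PATH="$PWD/target/release" ./my_cuda_app --benchmark
```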
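The "Getting started" section suggests running one of the `ptx` unit tests under a debugger. One possible way to do that on Linux is sketched below; it assumes `gdb` is available, and the exact test binary name under `target/debug/deps` (written here as `ptx-<hash>`) differs from build to build, so check the `ls` output first.

```
# Build the ptx test binary without executing the tests.
cargo test -p ptx --no-run
# Find the freshly built test binary; the hash suffix varies per build.
ls target/debug/deps/ptx-*
# Run only the tests whose names contain "add_hip" under gdb.
gdb --args target/debug/deps/ptx-<hash> add_hip
```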