
Commit 3be3fbf

Tan Jerry authored and thewilsonator committed
update documentation dflags, full example kernel
1 parent 595abfc commit 3be3fbf

File tree

3 files changed: +86 −10 lines changed


README.md

Lines changed: 11 additions & 9 deletions
@@ -51,19 +51,21 @@ To build DCompute you will need:
 * a SPIRV capable LLVM (available [here](https://github.com/thewilsonator/llvm/tree/compute)) to build LDC with SPIRV support (required for OpenCL).
 * or LDC built with any LLVM 3.9.1 or greater that has the NVPTX backend enabled, to support CUDA.
 * [dub](https://github.com/dlang/dub), then just run `dub build`.
+
 Alternatively, you can include dcompute as a dependency, as shown below:
 * add
 ```json
-    "dcompute": {
-        "version": "~>0.1.1",
-        "dflags": [
-            "-mdcompute-targets=cuda-800",
-            "-mdcompute-targets=ocl-300",
-            "-oq"
-        ]
-    }
+    "dependencies": {
+        "dcompute": {
+            "version": "~>0.1.1"
+        }
+    },
 ```
-to your `dub.json` under `dependencies`. The dflags will be passed to LDC to generate code for the specified targets. You can run `ldc2 --help` to look for that flag. Use `ocl-xy0` for OpenCL x.y and `cuda-xy0` for CUDA Compute Capability x.y. So the above flags are for OpenCL 3.0 and CUDA CC 8.0. The two flags must be included separately as shown in the `dub.json`.
+to your `dub.json`. You should also include the following dub flags under `dflags-ldc`, which are passed to the compiler:
+```json
+"dflags-ldc": ["-mdcompute-targets=cuda-800", "-mdcompute-targets=ocl-300", "-version=LDC_DCompute", "-oq"],
+```
+The dflags are passed to LDC to generate code for the specified targets; run `ldc2 --help` for details on the flag. Use `ocl-xy0` for OpenCL x.y and `cuda-xy0` for CUDA Compute Capability x.y, so the flags above target OpenCL 3.0 and CUDA Compute Capability 8.0. The two target flags must be given separately, as shown above.
 * If you get an error saying `Need to use a DCompute enabled compiler`, you likely forgot the `-mdcompute-targets` flags.
 * Check NVIDIA's [website](https://developer.nvidia.com/cuda-gpus) for your CUDA Compute Capability.
 * Alternatively, add the equivalent to your `dub.sdl` (`dependency "dcompute" version="~>0.1.1"`) and include the dflags there as well.
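
The `dub.sdl` equivalent mentioned in the last bullet is not spelled out in this commit. As a rough, illustrative sketch only: the project name below is a placeholder, and the `platform="ldc"` attribute is assumed to be the SDL counterpart of the JSON `dflags-ldc` key.

```sdl
name "myproject"
dependency "dcompute" version="~>0.1.1"
dflags "-mdcompute-targets=cuda-800" "-mdcompute-targets=ocl-300" "-version=LDC_DCompute" "-oq" platform="ldc"
```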

docs/05-driver/00-intro.md

Lines changed: 69 additions & 0 deletions
@@ -44,3 +44,72 @@ context's devices.
 
 **Event:** Represents a future return value from executing an asynchronous operation, such
 as a data transfer or kernel launch.
+
+# Running a Kernel
+
+Now, let's run the `mykernel` kernel that we have built up (see `04-std/01-index.md`). Recall
+that the kernel code should live in a separate file. For our main function, we can have something
+like the code shown below. This assumes compilation for the CUDA backend. Note that we import the
+`mykernels` module containing our kernel code as well as the DCompute driver for CUDA.
+
+```d
+import std.stdio;
+import ldc.dcompute;
+import std.algorithm;
+import std.file;
+import std.traits;
+import std.meta;
+import std.exception : enforce;
+import std.experimental.allocator;
+import std.array;
+import mykernels;
+import dcompute.driver.cuda;
+
+int main()
+{
+    enum size_t N = 128;
+    float c = 5.0;
+    float[N] res, x;
+    foreach (i; 0 .. N)
+    {
+        x[i] = i;
+    }
+
+    Platform.initialise();
+
+    auto devs = Platform.getDevices(theAllocator);
+    auto dev  = devs[0];
+    auto ctx  = Context(dev); scope(exit) ctx.detach();
+
+    // Change the file to match your GPU.
+    Program.globalProgram = Program.fromFile("kernels_cuda800_64.ptx");
+    auto q = Queue(false);
+
+    Buffer!(float) b_res, b_x;
+    b_res = Buffer!(float)(res[]); scope(exit) b_res.release();
+    b_x   = Buffer!(float)(x[]);   scope(exit) b_x.release();
+
+    b_x.copy!(Copy.hostToDevice);
+
+    q.enqueue!(mykernel)
+        ([N, 1, 1], [1, 1, 1])
+        (b_res, b_x, c);
+    b_res.copy!(Copy.deviceToHost);
+
+    foreach (i; 0 .. N)
+        enforce(res[i] == x[i] + c);
+    writeln(res[]);
+
+    return 0;
+}
+```
+It is important to change the file path in the `Program.fromFile("kernels_cuda800_64.ptx")` line
+to the .ptx file generated by the compilation step. Depending on how you set up dub, it may be in
+`./.dub/obj` or just your project directory. You should verify that your kernels actually show
+up in the .ptx file after running `dub build` (the file is plain text).
+
+With the above example, we should get a successful run, with the values 5 through 132 printed, since
+our kernel adds c (5 in this case) to the input vector, which holds 0 through 127.
+
+See `source/dcompute/tests` for an example of a slightly more complicated kernel and of running with the OpenCL driver.
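
The `mykernels` module imported above is the one built up in `04-std/01-index.md` and is not part of this diff. A minimal sketch of what it might contain, assuming the kernel simply adds `c` to each element of `x` (the name and signature are inferred from the host code above):

```d
@compute(CompileFor.deviceOnly) module mykernels;

import ldc.dcompute;
import dcompute.std.index;

// One work item per element: res[i] = x[i] + c.
@kernel void mykernel(GlobalPointer!(float) res,
                      GlobalPointer!(float) x,
                      float c)
{
    auto i = GlobalIndex.x;
    res[i] = x[i] + c;
}
```

Whatever the real kernel looks like, its parameter list has to match the `q.enqueue!(mykernel)(...)(b_res, b_x, c)` call in the host code.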

docs/README.md

Lines changed: 6 additions & 1 deletion
@@ -21,7 +21,12 @@ These docs are designed to help getting started installing & using DCompute.
 4.1 index
 5. The compute API driver
 
-## D
+You can find the corresponding Readme for each of the listed items in the parent `docs` directory, labelled with names
+starting with 00 through 05. For the device standard library and the compute API driver, look in the
+subdirectories `04-std` and `05-driver`, respectively. These instructions will help you install and execute
+your first kernel with DCompute.
+
+## D Basics Refresher
 
 This guide assumes that the reader is familiar with the basics of D, although anyone
 familiar with the C family of languages should be able to understand most of it.
