|
| 1 | + |
| 2 | + |
1 | 3 | CUDA Path Tracer |
2 | 4 | ================ |
3 | 5 |
|
@@ -70,6 +72,125 @@ A single choice between three options |
70 | 72 | ``` |
71 | 73 | ## 4. Performance benchmark |
72 | 74 |
|
| 75 | +#### 4.1 Stream Compaction on open and closed scenes |
| 76 | + |
| 77 | + |
| 78 | +Sample output showing the number of rays remaining after stream compaction at each depth of an iteration. Tested on scenes/cornell_refraction.json |
| 79 | + |
| 80 | +Results 1: |
| 81 | +Tested with cornell_refraction.json, which has a total of 7 materials and 9 geometries, all of them either boxes or spheres. |
| 82 | + |
| 83 | +  |
| 84 | + ``` |
| 85 | + num_paths: 588297 at depth 1 of iter18 |
| 86 | + num_paths: 481950 at depth 2 of iter18 |
| 87 | + num_paths: 398397 at depth 3 of iter18 |
| 88 | + num_paths: 326704 at depth 4 of iter18 |
| 89 | + num_paths: 266102 at depth 5 of iter18 |
| 90 | + num_paths: 217403 at depth 6 of iter18 |
| 91 | + num_paths: 178307 at depth 7 of iter18 |
| 92 | + num_paths: 0 at depth 8 of iter18 |
| 93 | + num_paths: 588302 at depth 1 of iter19 |
| 94 | + num_paths: 482180 at depth 2 of iter19 |
| 95 | + num_paths: 397826 at depth 3 of iter19 |
| 96 | + num_paths: 326109 at depth 4 of iter19 |
| 97 | + num_paths: 265074 at depth 5 of iter19 |
| 98 | + num_paths: 216725 at depth 6 of iter19 |
| 99 | + num_paths: 177920 at depth 7 of iter19 |
| 100 | + num_paths: 0 at depth 8 of iter19 |
| 101 | + num_paths: 588302 at depth 1 of iter20 |
| 102 | + num_paths: 482791 at depth 2 of iter20 |
| 103 | + num_paths: 398975 at depth 3 of iter20 |
| 104 | + num_paths: 327015 at depth 4 of iter20 |
| 105 | + num_paths: 266191 at depth 5 of iter20 |
| 106 | + num_paths: 217700 at depth 6 of iter20 |
| 107 | + num_paths: 178610 at depth 7 of iter20 |
| 108 | + num_paths: 0 at depth 8 of iter20 |
| 109 | + ``` |
| 110 | +
|
| 111 | +Results 2: |
| 112 | +Tested with cornell_refraction_close.json. Compared to cornell_refraction.json, it has one more cube behind the camera, so light can't escape the room. |
| 113 | +
|
| 114 | +  |
| 115 | + ``` |
| 116 | + num_paths: 588103 at depth 1 of iter18 |
| 117 | + num_paths: 552057 at depth 2 of iter18 |
| 118 | + num_paths: 522357 at depth 3 of iter18 |
| 119 | + num_paths: 493064 at depth 4 of iter18 |
| 120 | + num_paths: 464186 at depth 5 of iter18 |
| 121 | + num_paths: 437812 at depth 6 of iter18 |
| 122 | + num_paths: 413273 at depth 7 of iter18 |
| 123 | + num_paths: 0 at depth 8 of iter18 |
| 124 | + num_paths: 588112 at depth 1 of iter19 |
| 125 | + num_paths: 551931 at depth 2 of iter19 |
| 126 | + num_paths: 522363 at depth 3 of iter19 |
| 127 | + num_paths: 492947 at depth 4 of iter19 |
| 128 | + num_paths: 464198 at depth 5 of iter19 |
| 129 | + num_paths: 437899 at depth 6 of iter19 |
| 130 | + num_paths: 413139 at depth 7 of iter19 |
| 131 | + num_paths: 0 at depth 8 of iter19 |
| 132 | + num_paths: 588110 at depth 1 of iter20 |
| 133 | + num_paths: 552150 at depth 2 of iter20 |
| 134 | + num_paths: 522766 at depth 3 of iter20 |
| 135 | + num_paths: 493317 at depth 4 of iter20 |
| 136 | + num_paths: 464677 at depth 5 of iter20 |
| 137 | + num_paths: 438364 at depth 6 of iter20 |
| 138 | + num_paths: 413396 at depth 7 of iter20 |
| 139 | + num_paths: 0 at depth 8 of iter20 |
| 140 | + ``` |
| 141 | +
|
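| 142 | +The average render time is higher with the closed-room scene. Far fewer paths terminate at each depth (about 413k paths are still alive at depth 7, versus about 178k in the open scene), so each render pass has more live paths to process and has to launch more thread blocks, which raises the render time. |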
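
To make the `num_paths` counts above concrete, here is a minimal host-side sketch of the compaction step. It uses `std::partition` as a CPU stand-in for a device-side partition; the `PathSegment` struct and the `compactPaths` helper are hypothetical names for illustration, not the renderer's actual code.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical, simplified path record (assumption: the real struct
// carries more state, such as the ray and accumulated color).
struct PathSegment {
    int pixelIndex;
    int remainingBounces;  // 0 means the path has terminated
};

// Moves live paths to the front of the first numPaths entries and
// returns how many are still alive. Only those need work at the next depth.
int compactPaths(std::vector<PathSegment>& paths, int numPaths) {
    auto isAlive = [](const PathSegment& p) { return p.remainingBounces > 0; };
    auto firstDead = std::partition(paths.begin(), paths.begin() + numPaths, isAlive);
    return static_cast<int>(firstDead - paths.begin());
}
```

The returned count plays the role of `num_paths` in the logs: in an open scene it shrinks quickly as rays leave the room, while in a closed scene it stays high, so every later kernel launch covers more paths.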
| 143 | +
|
| 144 | +### BVH Results
| 145 | +
|
| 146 | +The BVH is constructed on the CPU; the ray-scene intersection kernel then traverses the BVH tree iteratively to find the closest hit. The BVH builder accepts three arguments: the number of bins to split per axis (used when constructing the BVH), the maximum acceptable leaf size, and the maximum acceptable depth. |
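
As a rough illustration of how these three arguments could steer construction, the sketch below shows a stop criterion and a bin lookup. All names (`BvhBuildConfig`, `shouldMakeLeaf`, `binIndex`) are hypothetical, and the uniform binning of centroids is an assumption about a common approach, not a description of this project's builder.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical container for the three build arguments described above.
struct BvhBuildConfig {
    int binsPerAxis;  // number of bins tested per axis when choosing a split
    int maxLeafSize;  // stop splitting at or below this many primitives
    int maxDepth;     // stop splitting once this depth is reached
};

// A node becomes a leaf when either limit is hit. To constrain by only one
// limit, the other can be set to a value that never triggers.
bool shouldMakeLeaf(int primitiveCount, int depth, const BvhBuildConfig& cfg) {
    return primitiveCount <= cfg.maxLeafSize || depth >= cfg.maxDepth;
}

// Maps a primitive centroid coordinate to one of binsPerAxis uniform bins
// spanning [lo, hi] along a single axis (assumed binning scheme).
int binIndex(float centroid, float lo, float hi, int binsPerAxis) {
    if (hi <= lo) return 0;  // degenerate extent: everything lands in bin 0
    int i = static_cast<int>(binsPerAxis * (centroid - lo) / (hi - lo));
    return std::min(std::max(i, 0), binsPerAxis - 1);
}
```

Under this sketch, tightening `maxLeafSize` produces deeper trees and capping `maxDepth` produces larger leaves, which is the trade-off the two experiments below explore.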
| 147 | +
|
| 148 | +*The test scene is cow.obj, which contains 2903 vertices, 5804 faces, and 1 diffuse material.* |
| 149 | +
|
| 150 | +*Config 0 uses a single bounding volume containing the whole mesh.* |
| 151 | +
|
| 152 | +#### Using max leaf count as BVH constraint |
| 153 | +
|
| 154 | +``` |
| 155 | +Max leaf count is set, max depth is generated on the fly. |
| 156 | +config 0: Single volume |
| 157 | +config 1: Bins per axis: 32, Max depth: 33, largest leaf size: 10 |
| 158 | +config 2: Bins per axis: 32, Max depth: 22, largest leaf size: 27 |
| 159 | +config 3: Bins per axis: 32, Max depth: 11, largest leaf size: 63 |
| 160 | +config 4: Bins per axis: 32, Max depth: 10, largest leaf size: 120 |
| 161 | +config 5: Bins per axis: 32, Max depth: 8, largest leaf size: 229 |
| 162 | +``` |
| 163 | +  |
| 164 | +
|
| 165 | +#### Using max depth as BVH constraint |
| 166 | +``` |
| 167 | +Max depth is set, max leaf count is generated on the fly. |
| 168 | +config 0: Single volume, 2995 |
| 169 | +config 1: Bins per axis: 32, Max depth: 4, largest leaf size: 1253 |
| 170 | +config 2: Bins per axis: 32, Max depth: 10, largest leaf size: 102 |
| 171 | +config 3: Bins per axis: 32, Max depth: 20, largest leaf size: 30 |
| 172 | +config 4: Bins per axis: 32, Max depth: 38, largest leaf size: 10 |
| 173 | +config 5: Bins per axis: 32, Max depth: 100, largest leaf size: 162 |
| 174 | +``` |
| 175 | +  |
| 176 | +
|
| 177 | +#### BVH Conclusion |
| 178 | +BVH performance varies a lot across tree construction configurations. Overall, the |
| 179 | +best BVH result is slightly worse than simple bounding volume culling. I think the BVH will show an improvement over a plain bounding volume on a much denser mesh with 10k or more vertices. |
| 180 | +
|
| 181 | +### Sorting by material type |
| 182 | +
|
| 183 | +Another option is to sort rays by material type during each render pass so that the threads in a block do similar work. The results below were measured on the cornell_refraction scene. |
| 184 | + ``` |
| 185 | + No sort: Not sorting by material type. |
| 186 | + One partition: Only group one type of material. Used for performance comparison |
| 187 | + Full Partition: All materials are grouped by type. Using thrust::partition |
| 188 | + Stable Partition: All materials are grouped by type. Using thrust::stable_partition |
| 189 | + Full sort: All material types are sorted using thrust::stable_sort |
| 190 | + ``` |
| 191 | +  |
| 192 | +
|
| 193 | + Turning on material sort hinders performance on this scene. I expect that scenes with many more material types will see a noticeable benefit from this option. |
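
The difference between the "One partition" and "Full sort" variants can be sketched on the host with the standard-library counterparts of the Thrust calls listed above. The `ShadeableIntersection` struct and both helper names are hypothetical and only illustrate the reordering; on the GPU, the path data would have to be reordered the same way as the intersections.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical, simplified hit record keyed by material.
struct ShadeableIntersection {
    int materialId;
    float t;  // hit distance along the ray
};

// "One partition": bring every hit with the given material to the front,
// preserving relative order. Returns how many hits matched.
int groupOneMaterial(std::vector<ShadeableIntersection>& hits, int materialId) {
    auto mid = std::stable_partition(
        hits.begin(), hits.end(),
        [materialId](const ShadeableIntersection& h) { return h.materialId == materialId; });
    return static_cast<int>(mid - hits.begin());
}

// "Full sort": order all hits by material id so that neighbouring
// threads would shade the same material.
void sortByMaterial(std::vector<ShadeableIntersection>& hits) {
    std::stable_sort(
        hits.begin(), hits.end(),
        [](const ShadeableIntersection& a, const ShadeableIntersection& b) {
            return a.materialId < b.materialId;
        });
}
```

Either reordering is extra work on every bounce, which is consistent with the slowdown reported above for a scene with only a few materials.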
73 | 194 |
|
74 | 195 | ### 3rd-party code used |
75 | 196 |
|
|