| 1 | +# Fuzzing plan for intel/llvm |
| 2 | + |
| 3 | +Fuzzing (or fuzz testing) is an automated testing approach where inputs are |
| 4 | +randomly generated. It often leads to passing unexpected and invalid inputs |
| 5 | +which in turn uncovers various corner cases that weren't considered during |
| 6 | +development or regular testing. |
| 7 | + |
| 8 | +The main product developed in the intel/llvm repo is a SYCL |
| 9 | +implementation. At a high level, it consists of two components: a compiler and a |
| 10 | +runtime, so this document is divided into two major sections |
| 11 | +covering them. Those components are essentially the only entry |
| 12 | +points through which a user can interact with our product. |
| 13 | + |
| 14 | +## SYCL Runtime |
| 15 | + |
| 16 | +The SYCL runtime is a library which implements the SYCL API, but besides that API it |
| 17 | +also has multiple configuration options which can be tweaked through environment |
| 18 | +variables and a config file. |
| 19 | + |
| 20 | +### Fuzzing environment variables |
| 21 | + |
| 22 | +Every environment variable in the [documentation][sycl-rt-env-variables] should |
| 23 | +be fuzzed. |
| 24 | + |
| 25 | +The most interesting environment variables are the ones which expect data in |
| 26 | +a certain format, like `ONEAPI_DEVICE_SELECTOR` or `SYCL_CACHE_THRESHOLD`. |
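One possible starting point is a small mutation-based driver that runs an already-built SYCL test program with a mutated value in one variable at a time. The sketch below is illustrative only: the seed values are guesses at well-formed inputs, the command to run is supplied by the caller, and "killed by a signal or timed out" is an assumed definition of failure that relies on POSIX return codes.

```python
import os
import random
import subprocess

# Illustrative seeds only; a real corpus would be derived from the documentation.
SEEDS = {
    "ONEAPI_DEVICE_SELECTOR": ["level_zero:gpu", "opencl:cpu", "*:*"],
    "SYCL_CACHE_THRESHOLD": ["0", "7"],
}
ALPHABET = ":;,*!.0123456789abcxyz \t-"


def mutate(value, rng):
    """Apply one random edit: insert, delete, replace, or duplicate."""
    choice = rng.randrange(4)
    if choice == 0:
        pos = rng.randrange(len(value) + 1)
        return value[:pos] + rng.choice(ALPHABET) + value[pos:]
    if choice == 1 and value:
        pos = rng.randrange(len(value))
        return value[:pos] + value[pos + 1:]
    if choice == 2 and value:
        pos = rng.randrange(len(value))
        return value[:pos] + rng.choice(ALPHABET) + value[pos + 1:]
    # Duplication; also the fallback when there is nothing to delete or replace.
    return value + value


def survives(cmd, name, value, timeout=10.0):
    """Run cmd with name=value set; False means a signal-kill or a hang."""
    env = dict(os.environ)
    env[name] = value
    try:
        proc = subprocess.run(
            cmd,
            env=env,
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False
    # On POSIX a negative return code means the process died from a signal.
    return proc.returncode >= 0


def fuzz(cmd, iterations, seed=0):
    """Return the (variable, value) pairs that made cmd crash or hang."""
    rng = random.Random(seed)
    failures = []
    for _ in range(iterations):
        name = rng.choice(sorted(SEEDS))
        value = rng.choice(SEEDS[name])
        for _ in range(rng.randrange(1, 5)):
            value = mutate(value, rng)
        if not survives(cmd, name, value):
            failures.append((name, value))
    return failures
```

A call such as `fuzz(["./sycl-test-app"], 1000)` (the binary name is hypothetical) would return the inputs worth triaging. A coverage-guided fuzzer would likely find more than this blind mutation loop, but the loop needs no special build of the runtime.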
| 27 | + |
| 28 | +[sycl-rt-env-variables]: https://github.com/intel/llvm/blob/sycl/sycl/doc/EnvironmentVariables.md |
| 29 | + |
| 30 | +### Fuzzing sycl config file |
| 31 | + |
| 32 | +Instead of tweaking SYCL Runtime behavior through environment variables, the |
| 33 | +same can be done by providing a config file. We don't seem to have any |
| 34 | +documentation for it other than the source code, which is |
| 35 | +located in the `sycl/source/detail/config.cpp` file. |
| 36 | + |
| 37 | +There is a prototype for the sycl config file fuzzer available at |
| 38 | +https://github.com/intel/llvm/pull/16308 |
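Independently of that prototype, a corpus of malformed config files can be generated with a few lines of script. The sketch below assumes a line-oriented `NAME=VALUE` format, which should be checked against `config.cpp`; the key list and the malformed shapes are illustrative choices, and `NOT_A_REAL_KEY` is deliberately made up.

```python
import os
import random
import tempfile

# Whether these names are accepted in a config file is an assumption.
KEYS = ["ONEAPI_DEVICE_SELECTOR", "SYCL_CACHE_THRESHOLD", "NOT_A_REAL_KEY"]
VALUE_CHARS = "01:*gpu ,;=#"


def random_line(rng):
    """Return one config line, either well-formed or malformed in a known way."""
    key = rng.choice(KEYS)
    value = "".join(rng.choice(VALUE_CHARS) for _ in range(rng.randrange(12)))
    shapes = [
        f"{key}={value}",                                # assumed well-formed shape
        f"{key} = {value}",                              # whitespace around separator
        key,                                             # missing separator
        f"={value}",                                     # empty key
        f"{key}=" + "A" * rng.choice([255, 256, 4096]),  # oversized value
        "",                                              # blank line
    ]
    return rng.choice(shapes)


def random_config(rng, max_lines=8):
    """Return the bytes of a random config file with a random line ending."""
    lines = [random_line(rng) for _ in range(rng.randrange(max_lines + 1))]
    ending = rng.choice(["\n", "\r\n", ""])
    return ("\n".join(lines) + ending).encode("ascii")


def write_config(data, directory):
    """Write data to a fresh .conf file in directory and return its path."""
    fd, path = tempfile.mkstemp(suffix=".conf", dir=directory)
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    return path
```

The generated files can seed a coverage-guided fuzzer or be fed directly to a test program, using whatever mechanism the runtime provides for locating its config file.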
| 39 | + |
| 40 | +### Fuzzing API entry points |
| 41 | + |
| 42 | +TODO: think more about this section, i.e. whether or not we want to fuzz SYCL |
| 43 | +APIs. Not all of them accept "raw" data; many instead expect SYCL |
| 44 | +objects returned from previous API calls. However, there are still plenty of |
| 45 | +APIs which accept raw pointers and other fundamental data types. Note: to |
| 46 | +properly fuzz them, structure-aware fuzzing may be needed. |
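To make the structure-aware idea concrete: rather than handing raw bytes to an API, the fuzzer's bytes are decoded into a sequence of calls that only ever refer to objects created by earlier calls. The sketch below is a language-neutral illustration of that decoding step, not a SYCL harness. The three operations (`alloc`, `free`, `copy`) are hypothetical stand-ins; in a real harness they might map onto USM allocation, deallocation, and memcpy calls.

```python
class ByteReader:
    """Consumes fuzzer-provided bytes one at a time."""

    def __init__(self, data):
        self._data = data
        self._pos = 0

    def empty(self):
        return self._pos >= len(self._data)

    def take_int(self, upper):
        """Return a value in [0, upper); yields 0 once the input is exhausted."""
        if upper <= 0:
            raise ValueError("upper must be positive")
        if self.empty():
            return 0
        byte = self._data[self._pos]
        self._pos += 1
        return byte % upper


def decode_ops(data, max_ops=32):
    """Turn raw bytes into operations that only reference live allocations."""
    reader = ByteReader(data)
    live = []     # ids of allocations that have not been freed yet
    ops = []
    next_id = 0
    while not reader.empty() and len(ops) < max_ops:
        kind = reader.take_int(3)
        if kind == 0 or not live:
            # Nothing to free or copy yet, so allocation is the only valid step.
            ops.append(("alloc", next_id, reader.take_int(256)))
            live.append(next_id)
            next_id += 1
        elif kind == 1:
            victim = live.pop(reader.take_int(len(live)))
            ops.append(("free", victim))
        else:
            dst = live[reader.take_int(len(live))]
            src = live[reader.take_int(len(live))]
            ops.append(("copy", dst, src, reader.take_int(256)))
    return ops
```

For example, `decode_ops(bytes([1, 5, 1, 0]))` yields an allocation followed by a free of that same allocation. Because object lifetimes stay valid by construction, any crash points at the argument values (sizes, overlapping ranges) rather than at a trivially invalid handle.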
| 47 | + |
| 48 | +## SYCL Compiler |
| 49 | + |
| 50 | +The SYCL compiler is based on the [upstream LLVM compiler project][llvm-project], |
| 51 | +which is an enormous codebase. Some LLVM components have been re-used |
| 52 | +without any modifications at all. Others were slightly |
| 53 | +tweaked or significantly modified, and there are components which are completely |
| 54 | +new and only exist in our implementation. |
| 55 | + |
| 56 | +For every re-used component we should be able to benefit from the existing fuzz |
| 57 | +tests written for it. The upstream fuzzers are documented |
| 58 | +[here][llvm-fuzzers]. |
| 59 | + |
| 60 | +[llvm-fuzzers]: https://llvm.org/docs/FuzzingLLVM.html |
| 61 | +[llvm-project]: https://github.com/llvm/llvm-project |
| 62 | + |
| 63 | +However, even though we could re-use existing fuzzers, we can't just rely on |
| 64 | +someone else running them on the upstream codebase, because those runs won't |
| 65 | +cover any customizations we made (including new components like optimization |
| 66 | +passes which we added only in our downstream). |
| 67 | + |
| 68 | +There are also some unique components which may require special fuzzers. |
| 69 | +The sections below go through the components that we have and describe in more |
| 70 | +detail what we should fuzz and whether we already have an existing fuzzer for |
| 71 | +it. |
| 72 | + |
| 73 | +There is also the [intel/yarpgen](https://github.com/intel/yarpgen) project that |
| 74 | +can be used to fuzz SYCL compilers. It generates random programs (of a certain |
| 75 | +structure) to detect weaknesses and bugs in optimization passes. |
| 76 | + |
| 77 | +### Command line options |
| 78 | + |
| 79 | +There are plenty of SYCL-specific command line options, and several of |
| 80 | +them are not mere flags but expect a user-provided value in a certain |
| 81 | +format. |
| 82 | + |
| 83 | +Those options should be fuzzed as well to ensure proper error handling of |
| 84 | +various weird inputs. |
| 85 | + |
| 86 | +### SYCL-specific passes |
| 87 | + |
| 88 | +We have developed a number of passes to implement different SYCL features. They |
| 89 | +can all be found in the `llvm/lib/SYCLLowerIR` folder. We don't need a dedicated |
| 90 | +fuzzer for every pass; instead, we can re-use the existing LLVM fuzzer intended |
| 91 | +for compiler passes to cover them. |
| 92 | + |
| 93 | +### SYCL-specific tools |
| 94 | + |
| 95 | +As of now, we still use the legacy offloading flow, which involves multiple custom |
| 96 | +tools and some custom data formats to communicate information between compiler |
| 97 | +phases. |
| 98 | + |
| 99 | +Strictly speaking, we should probably fuzz those as well. However, we are |
| 100 | +going to replace that flow with the so-called new offloading model, which significantly |
| 101 | +simplifies it by reducing the number of tools we have and therefore the number of |
| 102 | +custom data formats used to communicate between them. |
| 103 | + |
| 104 | +#### SPIRV-LLVM-Translator |
| 105 | + |
| 106 | +Going forward, this tool may be replaced by a SPIR-V backend, but ultimately the |
| 107 | +step of translating LLVM IR into SPIR-V format will stay in place. |
| 108 | + |
| 109 | +SPIR-V is a much stricter format and it moves at a slower pace than LLVM does, so |
| 110 | +it is important that we have fuzzing for this phase as well. |