| 1 | +- name: "CompilerResearchCon 2025 (day 2)" |
| 2 | + date: 2025-11-13 15:00:00 +0200 |
| 3 | + time_cest: "15:00" |
| 4 | + connect: "[Link to zoom](https://princeton.zoom.us/j/97915651167?pwd=MXJ1T2lhc3Z5QWlYbUFnMTZYQlNRdz09)" |
| 5 | + label: crcon25_part_2 |
| 6 | + agenda: |
| 7 | + - title: "Implementing Debugging Support for xeus-cpp" |
| 8 | + speaker: |
| 9 | + name: "Abhinav Kumar" |
| 10 | + time_cest: "15:00 - 15:20" |
| 11 | + description: | |
| 12 | + This proposal outlines integrating debugging into the xeus-cpp kernel |
| 13 | + for Jupyter using LLDB and its Debug Adapter Protocol (lldb-dap). |
| 14 | + Modeled after xeus-python, it leverages LLDB’s Clang and JIT debugging |
| 15 | + support to enable breakpoints, variable inspection, and step-through |
| 16 | + execution. The modular design ensures compatibility with Jupyter’s |
| 17 | + frontend, enhancing interactive C++ development in notebooks. |
| 18 | +
| 19 | + This project achieved DAP integration with xeus-cpp. Users can |
| 20 | + use JupyterLab’s debugger panel to debug C++ JIT code. Setting and |
| 21 | + hitting breakpoints and stepping in and out of functions are supported in |
| 22 | + xeus-cpp. Additionally, during this project I refactored |
| 23 | + the Out-of-Process JIT execution, which was a major part of implementing |
| 24 | + the debugger. |
| 25 | +
| 26 | +
| 27 | + # slides: /assets/presentations/... |
| 28 | + |
| 29 | + - title: "Activity analysis for reverse-mode differentiation of (CUDA) GPU kernels" |
| 30 | + speaker: |
| 31 | + name: "Maksym Andriichuk" |
| 32 | + time_cest: "15:20 - 15:40" |
| 33 | + description: | |
| 34 | + Clad is a Clang plugin designed to provide automatic differentiation (AD) for C++ |
| 35 | + mathematical functions. It generates derivative-computing code by modifying the |
| 36 | + Abstract Syntax Tree (AST) using LLVM compiler features. It can perform advanced |
| 37 | + program optimizations by implementing sophisticated analyses, because it has |
| 38 | + access to a rich program representation – the Clang AST. |
| 39 | +
| 40 | + The project succeeded in optimizing code that contains potential data-race |
| 41 | + conditions, significantly speeding up execution. Thread Safety Analysis is a static |
| 42 | + analysis that detects possible data races, enabling a reduction of atomic |
| 43 | + operations in the Clad-produced code. |
| 44 | + |
| 45 | + # slides: /assets/presentations/... |
| 46 | + |
| 47 | + - title: "Enable automatic differentiation of OpenMP programs with Clad" |
| 48 | + speaker: |
| 49 | + name: "Jiayang Li" |
| 50 | + time_cest: "15:40 - 16:00" |
| 51 | + description: | |
| 52 | + This project extends Clad, a Clang-based automatic differentiation tool for C++, to |
| 53 | + support OpenMP programs. This project enables Clad to parse and differentiate |
| 54 | + functions with OpenMP directives, thereby enabling gradient computation in |
| 55 | + multi-threaded environments. |
| 56 | +
| 57 | + This project added Clad support for both forward and reverse mode differentiation |
| 58 | + of common OpenMP directives (parallel, parallel for) and clauses (private, |
| 59 | + firstprivate, lastprivate, shared, atomic, reduction) by implementing OpenMP-related |
| 60 | + AST parsing and designing corresponding differentiation strategies. Additional |
| 61 | + contributions include example applications and comprehensive tests. |
| 62 | +
| 63 | + |
| 64 | + # slides: /assets/presentations/... |
| 65 | + |
| 66 | + - title: "Using ROOT in the field of Genome Sequencing" |
| 67 | + speaker: |
| 68 | + name: "Aditya Pandey" |
| 69 | + time_cest: "16:00 - 16:20" |
| 70 | + description: | |
| 71 | + The project extends ROOT, CERN's petabyte-scale data processing framework, to address |
| 72 | + the critical challenge of managing genomic data, which can reach up to 200GB per human |
| 73 | + genome. By leveraging ROOT's big data expertise and introducing the next-generation |
| 74 | + RNTuple columnar storage format specifically optimized for genomic sequences, the |
| 75 | + project eliminates the traditional trade-off between compression efficiency and |
| 76 | + access speed in bioinformatics. |
| 77 | +
| 78 | + The project achieved comprehensive genomic data support: it validated GeneROOT |
| 79 | + baseline performance benchmarks against the BAM/SAM formats, implemented an |
| 80 | + RNTuple-based RAM (ROOT Alignment Maps) format with full SAM/BAM field support and |
| 81 | + smart reference management, demonstrated 23.5% smaller file sizes than CRAM while |
| 82 | + delivering 1.9x faster large region queries and 3.2x faster full chromosome scans, |
| 83 | + and optimized FASTQ compression from 14.2GB to 6.8GB. We also developed chromosome- |
| 84 | + based file splitting for large genome files so that per-chromosome data can be extracted. |
| 85 | +
| 86 | + |
| 87 | + # slides: /assets/presentations/... |
| 88 | + |
| 89 | +- name: "CompilerResearchCon 2025 (day 1)" |
| 90 | + date: 2025-10-30 15:00:00 +0200 |
| 91 | + time_cest: "15:00" |
| 92 | + connect: "[Link to zoom](https://princeton.zoom.us/j/97915651167?pwd=MXJ1T2lhc3Z5QWlYbUFnMTZYQlNRdz09)" |
| 93 | + label: crcon25_part_1 |
| 94 | + agenda: |
| 95 | + - title: "CARTopiaX: an Agent-Based Simulation of CAR T-Cell Therapy built on BioDynaMo" |
| 96 | + speaker: |
| 97 | + name: "Salvador de la Torre Gonzalez" |
| 98 | + time_cest: "15:00 - 15:20" |
| 99 | + description: | |
| 100 | + CAR T-cell therapy is a form of cancer immunotherapy that engineers a |
| 101 | + patient’s T cells to recognize and eliminate malignant cells. Although |
| 102 | + highly effective in leukemias and other hematological cancers, this therapy |
| 103 | + faces significant challenges in solid tumors due to the complex and |
| 104 | + heterogeneous tumor microenvironment. CARTopiaX is an advanced agent-based |
| 105 | + model developed to address this challenge, using the mathematical framework |
| 106 | + proposed in the Nature paper “In silico study of heterogeneous tumour-derived |
| 107 | + organoid response to CAR T-cell therapy,” successfully replicating its core |
| 108 | + results. Built on BioDynaMo, a high-performance, open-source platform for |
| 109 | + large-scale and modular biological modeling, CARTopiaX enables detailed |
| 110 | + exploration of complex biological interactions, hypothesis testing, and |
| 111 | + data-driven discovery within solid tumor microenvironments. |
| 112 | +
| 113 | + The project achieved major milestones, including simulations that run more than |
| 114 | + twice as fast as the previous model, allowing rapid scenario exploration and robust |
| 115 | + hypothesis validation; high-quality, well-structured, and maintainable C++ code |
| 116 | + developed following modern software engineering principles; and a scalable, |
| 117 | + modular, and extensible architecture that fosters collaboration, customization, |
| 118 | + and the continuous evolution of an open-source ecosystem. Altogether, this work |
| 119 | + represents a meaningful advancement in computational biology, providing |
| 120 | + researchers with a powerful tool to investigate CAR T-cell dynamics in solid |
| 121 | + tumors and accelerating scientific discovery while reducing the time and cost |
| 122 | + associated with experimental wet-lab research. |
| 123 | +
| 124 | + # slides: /assets/presentations/... |
| 125 | + |
| 126 | + - title: "Efficient LLM Training in C++ via Compiler-Level Autodiff with Clad" |
| 127 | + speaker: |
| 128 | + name: "Rohan Timmaraju" |
| 129 | + time_cest: "15:20 - 15:40" |
| 130 | + description: | |
| 131 | + The computational demands of Large Language Model (LLM) training are |
| 132 | + often constrained by the performance of Python frameworks. This project |
| 133 | + tackles these bottlenecks by developing a high-performance LLM training |
| 134 | + pipeline in C++ using Clad, a Clang plugin for compiler-level automatic |
| 135 | + differentiation. The core of this work involved creating cladtorch, a new |
| 136 | + C++ tensor library with a PyTorch-style API designed for compatibility |
| 137 | + with Clad's differentiation capabilities. This library provides a more |
| 138 | + user-friendly interface for building and training neural networks while |
| 139 | + enabling Clad to automatically generate gradient computations for |
| 140 | + backpropagation. |
| 141 | +
| 142 | + Throughout the project, I successfully developed two distinct LLM training |
| 143 | + implementations. The first, using the cladtorch library, established a |
| 144 | + functional and flexible framework for Clad-driven AD. To further push |
| 145 | + performance boundaries, I then developed a second, highly optimized |
| 146 | + implementation inspired by llm.c, which utilizes pre-allocated memory buffers |
| 147 | + and custom kernels. This optimized C-style approach, when benchmarked for |
| 148 | + GPT-2 training on a multithreaded CPU, outperformed the equivalent PyTorch |
| 149 | + implementation. This work successfully demonstrates the viability and |
| 150 | + performance benefits of compiler-based AD for deep learning in C++ and |
| 151 | + provides a strong foundation for future hardware acceleration, such as porting |
| 152 | + the implementation to CUDA. |
| 153 | + |
| 154 | + # slides: /assets/presentations/... |
| 155 | + |
| 156 | + - title: "Implement and improve an efficient, layered tape with prefetching capabilities" |
| 157 | + speaker: |
| 158 | + name: "Aditi Milind Joshi" |
| 159 | + time_cest: "15:40 - 16:00" |
| 160 | + description: | |
| 161 | + Clad relies on a tape data structure to store intermediate values during reverse |
| 162 | + mode differentiation. This project focuses on enhancing the core tape implementation |
| 163 | + in Clad to make it more efficient and scalable. Key deliverables include replacing |
| 164 | + the existing dynamic array-based tape with a slab allocation approach and small |
| 165 | + buffer optimization, enabling multilayer storage, and introducing thread safety to |
| 166 | + support concurrent access. |
| 167 | +
| 168 | + The current implementation replaces the dynamic array with a slab-based structure |
| 169 | + and a small static buffer, eliminating costly reallocations. Thread-safe access |
| 170 | + functions have been added through a mutex locking mechanism, ensuring safe parallel |
| 171 | + tape operations. Ongoing work includes developing a multilayer tape system with |
| 172 | + offloading capabilities, which will allow only the most recent slabs to remain in |
| 173 | + memory. |
| 174 | +
| 175 | + |
| 176 | + # slides: /assets/presentations/... |
| 177 | + |
| 178 | + - title: "Support usage of Thrust API in Clad" |
| 179 | + speaker: |
| 180 | + name: "Abdelrhman Elrawy" |
| 181 | + time_cest: "16:00 - 16:20" |
| 182 | + description: | |
| 183 | + This project integrates NVIDIA's Thrust library into Clad, a Clang-based automatic |
| 184 | + differentiation tool for C++. By extending Clad's source-to-source transformation |
| 185 | + engine to recognize and differentiate Thrust parallel algorithms, the project |
| 186 | + enables automatic gradient generation for GPU-accelerated scientific computing |
| 187 | + and machine learning applications. |
| 188 | +
| 189 | + The project added Thrust support in Clad by implementing custom derivatives |
| 190 | + for core algorithms including thrust::reduce, thrust::transform, |
| 191 | + thrust::transform_reduce, thrust::inner_product, thrust::copy, scan operations |
| 192 | + (inclusive/exclusive), thrust::adjacent_difference, and sorting primitives. |
| 193 | + Additional contributions include support for Thrust data containers such as thrust::device_vector, |
| 194 | + generic functor handling for transformations, demonstration applications, and |
| 195 | + comprehensive unit tests. |
| 196 | + |
| 197 | + # slides: /assets/presentations/... |