
Commit 2e922b9

Merge branch 'master' into master
2 parents e3d0e0f + 8504fdf

10 files changed: +166 -0 lines changed

.github/actions/spelling/allow/names.txt

Lines changed: 2 additions & 0 deletions
@@ -74,6 +74,7 @@ Svrin
 Tadel
 Taras
 Thessaloniki
+Timmaraju
 Universitat
 Unveristy
 Uppili
@@ -196,6 +197,7 @@ tapaswenipathak
 tfransham
 thakkar
 tharun
+timmaraju
 tlattner
 vaibhav
 vassil

.github/actions/spelling/allow/terms.txt

Lines changed: 5 additions & 0 deletions
@@ -4,12 +4,15 @@ CINT
 CMSSW
 Cppyy
 Debian
+EPC
 GPGPU
+GPT
 GSo
 GSoC
 HSF
 JIT'd
 Jacobians
+LLMs
 LLVM
 NVIDIA
 NVMe
@@ -30,12 +33,14 @@ gitlab
 gridlay
 gsoc
 gpu
+llm
 llvm
 pushforward
 linkedin
 microenvironments
 pythonized
 ramview
+reoptimize
 samtools
 sitemap
 softsusy

_data/contributors.yml

Lines changed: 28 additions & 0 deletions
@@ -334,6 +334,34 @@
       proposal: /assets/docs/de_la_torre_gonzalez_salvador_proposal_gsoc_2025.pdf
       mentors: Vassil Vassilev, Lukas Breitwieser
 
+- name: Rohan Timmaraju
+  photo: Rohan_Timmaraju.jpg
+  info: "Google Summer of Code 2025 Contributor"
+
+  education: "B.S. Computer Science, Columbia University"
+  github: "https://github.com/Rohan-T144"
+  active: 1
+  linkedin: "https://www.linkedin.com/in/rohan-timmaraju-650ba3221/"
+  projects:
+    - title: "Enhancing LLM Training Efficiency with Clad for Automatic Differentiation"
+      status: Ongoing
+      description: |
+        Training Large Language Models is computationally expensive, often
+        limited by the performance limitations of Python-based frameworks. This
+        project addresses this challenge by enhancing LLM training efficiency
+        within a C++ environment through the integration of Clad, a Clang/LLVM
+        compiler plugin for automatic differentiation (AD). We will develop a
+        custom C++ tensor library specifically designed for optimal interaction
+        with Clad. The core objective is to replace traditional runtime or
+        manual gradient computations with Clad's efficient compile-time
+        differentiation for key LLM operations within a GPT-2 training pipeline.
+        This involves investigating effective strategies to bridge Clad's static
+        analysis with dynamic neural network computations, benchmarking the
+        resulting performance gains in speed and memory usage against a non-Clad
+        baseline, and leveraging OpenMP for further parallelization.
+      proposal: /assets/docs/Rohan_Timmaraju_Proposal_2025.pdf
+      mentors: Vassil Vassilev, David Lange, Jonas Rembser, Christina Koutsou
+
 - name: Abdelrhman Elrawy
   photo: Abdelrhman.jpg
   info: "Google Summer of Code 2025 Contributor"

_pages/team/rohan-timmaraju.md

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
---
title: "Compiler Research - Team - Rohan Timmaraju"
layout: gridlay
excerpt: "Compiler Research: Team members"
sitemap: false
permalink: /team/RohanTimmaraju

---

{% include team-profile.html %}
Lines changed: 69 additions & 0 deletions
@@ -0,0 +1,69 @@
---
title: "Advanced symbol resolution and re-optimization for Clang-Repl"
layout: post
excerpt: "Advanced symbol resolution and re-optimization for Clang-Repl is a Google Summer of Code 2025 project. It aims to improve Clang-Repl and ORC JIT by adding support for automatically loading dynamic libraries when symbols are missing, removing the need for users to load libraries manually."
sitemap: false
author: Sahil Patidar
permalink: blogs/gsoc25_sahil_introduction_blog/
banner_image: /images/blog/gsoc_clang_repl.jpeg
date: 2025-05-18
tags: gsoc LLVM clang-repl ORC-JIT auto-loading
---

### Introduction

I am Sahil Patidar, a student participating in Google Summer of Code 2025. I will be
working on the project "Advanced symbol resolution and re-optimization for Clang-Repl".

**Mentors**: Vassil Vassilev

### Overview of the Project

[Clang-Repl](https://clang.llvm.org/docs/ClangRepl.html) is a powerful interactive C++ interpreter that leverages LLVM's ORC JIT to support incremental compilation and execution. Currently, users must manually load dynamic libraries when their code references external symbols, as Clang-Repl lacks the ability to automatically resolve symbols from dynamic libraries.
To address this limitation, we propose a solution to enable **auto-loading of dynamic libraries for unresolved symbols** within ORC JIT, which is central to Clang-Repl's runtime infrastructure.

Another part of this project is to add **re-optimization support** to Clang-Repl. Currently, Clang-Repl cannot optimize hot functions at runtime. With this feature, Clang-Repl will be able to detect frequently called functions and re-optimize them once a runtime call threshold is exceeded.

### Objectives

* Implement **auto-loading** of dynamic libraries in ORC JIT.
* Add **re-optimization support** to Clang-Repl for hot functions.

### Implementation Details and Plans

The primary objective of this project is to enable **automatic loading of dynamic libraries for unresolved symbols** in Clang-Repl. Since Clang-Repl relies heavily on LLVM's **ORC JIT** for incremental compilation and execution, our work focuses on extending ORC JIT to support this capability in an out-of-process execution environment.

Currently, ORC JIT handles dynamic library symbol resolution through the `DynamicLibrarySearchGenerator`, which is registered for each loaded dynamic library. This generator is responsible for symbol lookup and interacts with the **Executor Process Control** layer to resolve symbols during execution. Specifically, it uses a `DylibHandle` to identify which dynamic library to search for the unresolved symbol. On the executor side, the `SimpleExecutorDylibManager` API performs the actual lookup using this handle.

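For context, here is a rough sketch of what that per-library registration looks like today with ORC's in-process `LLJIT` APIs; the library name is illustrative (Linux-specific) and error handling is reduced to `ExitOnError`. This is exactly the manual step the project aims to make unnecessary.

```cpp
// Sketch only: registering one DynamicLibrarySearchGenerator per library by hand.
#include "llvm/ExecutionEngine/Orc/ExecutionUtils.h"
#include "llvm/ExecutionEngine/Orc/LLJIT.h"
#include "llvm/Support/TargetSelect.h"

using namespace llvm;
using namespace llvm::orc;

int main() {
  InitializeNativeTarget();
  InitializeNativeTargetAsmPrinter();
  ExitOnError ExitOnErr;

  auto J = ExitOnErr(LLJITBuilder().create());
  auto &JD = J->getMainJITDylib();

  // One generator per dynamic library, added explicitly by the user.
  // "libm.so.6" is just an example; any shared library works the same way.
  JD.addGenerator(ExitOnErr(DynamicLibrarySearchGenerator::Load(
      "libm.so.6", J->getDataLayout().getGlobalPrefix())));

  // Lookups in this JITDylib can now fall back to symbols exported by libm.
  auto Sym = ExitOnErr(J->lookup("cos"));
  (void)Sym;
  return 0;
}
```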
To support **auto-loading in out-of-process execution**, Lang Hames proposed a design involving two new components:

* **`ExecutorResolver` API**: This is an abstract interface for resolving symbols on the executor side (see the sketch after this list). It can be implemented in different ways, for example:

  * `PerDylibResolver`, which wraps a native handle for a specific library.
  * `AutoLoadDylibResolver`, which attempts to load libraries automatically when a symbol is unresolved.

  The `SimpleExecutorDylibManager` will be responsible for creating and managing these resolvers, returning a `ResolverHandle` instead of the traditional `DylibHandle`.

* **`ExecutorSymbolResolutionGenerator`**: This generator replaces the existing `EPCDynamicLibrarySearchGenerator` for out-of-process execution. Unlike the previous design, which relied on `DylibHandle`, this generator will use the new `ResolverHandle` to resolve symbols via the `ResolverHandle->resolve()` interface.

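Since these components are still at the design stage, the snippet below is only a rough sketch of what the executor-side interface could look like; the result type, callback shape, and member details are assumptions made for illustration, not the actual ORC code.

```cpp
// Hypothetical executor-side resolver interface (names beyond ExecutorResolver,
// PerDylibResolver, and AutoLoadDylibResolver are invented for this sketch).
#include <cstdint>
#include <dlfcn.h>
#include <functional>
#include <string>
#include <vector>

// One address per requested symbol; 0 means "not found".
using ResolveResult = std::vector<uint64_t>;

// Abstract executor-side resolver: map symbol names to addresses.
class ExecutorResolver {
public:
  virtual ~ExecutorResolver() = default;
  virtual void resolveAsync(const std::vector<std::string> &Symbols,
                            std::function<void(ResolveResult)> OnResolved) = 0;
};

// Resolver bound to one already-loaded library (wraps its native handle).
class PerDylibResolver : public ExecutorResolver {
  void *Handle; // e.g. obtained from dlopen
public:
  explicit PerDylibResolver(void *H) : Handle(H) {}
  void resolveAsync(const std::vector<std::string> &Symbols,
                    std::function<void(ResolveResult)> OnResolved) override {
    ResolveResult R;
    for (const auto &S : Symbols)
      R.push_back(reinterpret_cast<uintptr_t>(dlsym(Handle, S.c_str())));
    OnResolved(std::move(R));
  }
};

// An AutoLoadDylibResolver would instead search candidate libraries, dlopen
// the one that exports the missing symbol, and then resolve it.
```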
In out-of-process execution, **per-library lookup** requires an RPC call for each dynamic library when resolving a symbol. If the symbol is in the **(N-1)th** library, **N-1 RPC calls** are made, introducing significant overhead.
In **auto-loading mode**, only one RPC call is made, but it scans all libraries, which is also inefficient if the symbol is missing.

To reduce this overhead, we propose using a **Bloom filter** to quickly check for symbol presence in both modes before making costly lookups. The main challenge lies in designing an efficient and accurate filtering approach.

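To make the idea concrete, here is a minimal, self-contained sketch of such a filter, built from a library's exported symbol names and consulted before any expensive lookup; the bit-array size and hash functions are arbitrary choices for illustration, not what ORC would necessarily use.

```cpp
// Minimal Bloom-filter sketch for "might this library export this symbol?".
#include <bitset>
#include <cstdint>
#include <functional>
#include <string>

class SymbolBloomFilter {
  static constexpr std::size_t NumBits = 1 << 16; // 64 Kbit, illustrative
  std::bitset<NumBits> Bits;

  static std::size_t hash1(const std::string &S) {
    return std::hash<std::string>{}(S) % NumBits;
  }
  static std::size_t hash2(const std::string &S) {
    // FNV-1a, used here only to get a second independent hash.
    std::uint64_t H = 1469598103934665603ull;
    for (unsigned char C : S) { H ^= C; H *= 1099511628211ull; }
    return static_cast<std::size_t>(H % NumBits);
  }

public:
  void add(const std::string &Sym) { Bits.set(hash1(Sym)); Bits.set(hash2(Sym)); }

  // false means "definitely not exported"; true means "maybe", so only then
  // do we pay for the RPC / dlsym lookup.
  bool mayContain(const std::string &Sym) const {
    return Bits.test(hash1(Sym)) && Bits.test(hash2(Sym));
  }
};
```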
The second goal of this project is to add **re-optimization support** for Clang-Repl. Since ORC JIT is the core component used by Clang-Repl for runtime compilation and execution, we will build on its existing capabilities. ORC JIT supports runtime re-optimization using the `ReOptimizeLayer` and `RedirectableManager`.

At a high level, the `ReOptimizeLayer` emits boilerplate "sugar" code into the IR module. This code triggers a call to `__orc_rt_reoptimize_tag` when a threshold count is exceeded. This call is handled by `ReOptimizeLayer::rt_reoptimize`, which is triggered by the ORC runtime to generate an optimized version of a "hot" function. The `RedirectableManager` then updates the function's stub pointer to point to the new optimized version. To achieve this, we will implement a custom `ReOptFunc`. If runtime profiling is needed to detect hot functions, we may also need to make small changes to the ORC runtime to collect this data.

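The following stand-alone C++ sketch mimics that counter-and-redirect flow without using the real `ReOptimizeLayer` or ORC runtime APIs; the function names and the threshold value are invented purely for illustration.

```cpp
// Conceptual model: a per-function call counter plus a redirectable stub.
#include <atomic>
#include <cstdio>

using FnPtr = int (*)(int);

int slowVersion(int X) { return X * 2; }  // initial, unoptimized body
int fastVersion(int X) { return X << 1; } // stand-in for the re-optimized body

static std::atomic<FnPtr> Stub{slowVersion};   // callers always go through the stub
static std::atomic<unsigned> CallCount{0};
constexpr unsigned ReoptThreshold = 1000;

void reoptimize() {            // plays the role of the rt_reoptimize handler
  Stub.store(fastVersion);     // redirect the stub to the optimized version
  std::puts("function re-optimized");
}

int instrumentedEntry(int X) { // the emitted "sugar" around the function body
  if (CallCount.fetch_add(1) + 1 == ReoptThreshold)
    reoptimize();
  return Stub.load()(X);
}

int main() {
  long Sum = 0;
  for (int I = 0; I < 2000; ++I)
    Sum += instrumentedEntry(I);
  std::printf("sum = %ld\n", Sum);
}
```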
### Conclusion

Upon completion of this project, ORC JIT will gain the ability to **automatically load dynamic libraries** to resolve previously unresolved symbols. Additionally, the integration of **filter-based optimizations** on the controller side will significantly reduce the overhead of unnecessary RPC calls.
Overall, this work enhances the flexibility and performance of ORC JIT and improves the user experience in tools like Clang-Repl that rely on it.

### Related Links

- [LLVM Repository](https://github.com/llvm/llvm-project)
- [Project Description](https://discourse.llvm.org/t/gsoc2025-advanced-symbol-resolution-and-reoptimization-for-clang-repl/84624/3)
- [My GitHub Profile](https://github.com/SahilPatidar)

Lines changed: 52 additions & 0 deletions
@@ -0,0 +1,52 @@
---
title: "Enhancing LLM Training Efficiency with Clad for Automatic Differentiation"
layout: post
excerpt: "This GSoC project leverages Clad to optimize LLM training in C++, aiming to boost efficiency by developing a custom tensor library and integrating Clad for compiler-level gradient calculations."
sitemap: true
author: Rohan Timmaraju
permalink: blogs/gsoc25_rohan_introduction_blog/
banner_image: /images/blog/LLM_project_banner.jpg
date: 2025-05-21
tags: gsoc c++ clang clad llm
---

### Introduction

I am Rohan Timmaraju, a Computer Science student at Columbia University. During Google Summer of Code 2025, I will be working on the "Enhancing LLM Training Efficiency with Clad for Automatic Differentiation" project with the Compiler Research group.

**Mentors**: Vassil Vassilev, David Lange, Jonas Rembser, Christina Koutsou

### About LLM Training

Large Language Models (LLMs) like ChatGPT have revolutionized AI, but their training is incredibly computationally intensive. Currently, Python-based frameworks such as PyTorch and TensorFlow are the go-to tools. While they offer excellent flexibility and a rich ecosystem, their reliance on interpreted execution and dynamic computation graphs can lead to performance bottlenecks and high memory consumption. This is particularly noticeable when we consider deploying or training these models in resource-constrained environments or within C++-centric high-performance computing (HPC) setups, which are common in scientific research.

While C++ provides the tools for fine-grained control over system resources and has proven its capabilities in efficient LLM inference (as seen with projects like [llama.cpp](https://github.com/ggml-org/llama.cpp)), the critical component for *training*, namely flexible and efficient Automatic Differentiation (AD), remains an ongoing challenge for C++ solutions.

### Why Use Clad?

This project proposes to tackle this challenge by integrating Clad, an Automatic Differentiation plugin for the Clang compiler. Unlike traditional AD libraries that often operate at runtime, Clad performs source-to-source transformation. It analyzes the C++ Abstract Syntax Tree (AST) at compile time and generates optimized C++ code for computing derivatives. This compiler-level approach has the potential to reduce runtime overhead and improve memory efficiency compared to dynamic methods.

To facilitate this integration, I am developing a custom C++ tensor library to be used in neural network training. Inspired by the powerful approaches of libraries such as [llm.c](https://github.com/karpathy/llm.c) and [pytorch](https://docs.pytorch.org/cppdocs/), this library is being designed from the ground up with Clad compatibility in mind. The core idea is to replace manual or internally managed gradient computations with Clad's reverse-mode AD (as in `clad::gradient`) for key LLM operations like matrix multiplications, activation functions, normalization layers, and the final loss function.

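As a small, self-contained illustration of the kind of call the tensor library is being designed around, the toy example below differentiates a scalar squared-error loss with `clad::gradient`. It assumes a Clang invocation with the Clad plugin enabled, and the function is a stand-in for real LLM operations rather than anything from the project code.

```cpp
// Toy example: reverse-mode AD of a squared-error "loss" with Clad.
// Build with Clang and the Clad plugin, e.g.:
//   clang++ -fplugin=/path/to/libclad.so -I<clad>/include example.cpp
#include "clad/Differentiator/Differentiator.h"
#include <cstdio>

double loss(double w, double b, double x, double y) {
  double pred = w * x + b; // a one-parameter "linear layer"
  double diff = pred - y;
  return diff * diff;      // squared error
}

int main() {
  // Generate d(loss)/dw and d(loss)/db at compile time.
  auto dloss = clad::gradient(loss, "w, b");

  double dw = 0.0, db = 0.0;
  dloss.execute(/*w=*/1.0, /*b=*/0.0, /*x=*/2.0, /*y=*/3.0, &dw, &db);

  // Expected: dw = 2*(w*x+b-y)*x = -4, db = 2*(w*x+b-y) = -2.
  std::printf("dw = %f, db = %f\n", dw, db);
}
```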
### Implementation Plan

1. **Foundation & Baseline:** We will start by implementing a complete GPT-2 training loop in C++ *without* Clad. This will serve as our performance baseline. GPT-2 is chosen here as a relatively simple open-source LLM architecture capable of being trained on local devices, and the approach could later be extended to other architectures like Llama or Mistral.
2. **Core Clad Integration Strategy:** We will investigate and evaluate different strategies for applying Clad to tensor network gradient calculations, also identifying areas where Clad itself could be enhanced for deep learning workloads.
3. **Expanding Integration:** Once a promising strategy is identified and validated on simpler operations, we'll systematically integrate Clad into more complex components of the GPT-2 architecture.
4. **Benchmarking & Optimization:** Benchmarking against our baseline will be crucial to quantify the performance gains (speed, memory). We'll also use profiling tools to identify bottlenecks and optimize the tensor library with Clad. OpenMP may be employed for parallelization to further boost performance.
5. **Documentation & Potential Extensions:** Thorough documentation of the tensor library, the Clad integration process, and our findings will also be a primary focus. Time permitting, we'll explore extending this work to other LLM architectures like Llama.

### Conclusion

By successfully integrating Clad into a C++ LLM training pipeline, we aim to:
* **Demonstrate Performance Gains:** Show tangible improvements in training speed and memory efficiency.
* **Clad for ML:** Provide a significant real-world use case, potentially identifying areas for Clad's improvement in supporting ML tasks.
* **Offer a C++ Alternative:** Provide a foundation for more efficient, compiler-driven LLM training within the C++ ecosystem.
* **Learn and Share:** Gain insights into the practicalities of applying compiler-based AD to complex ML problems and share these learnings with the community.

I believe this project has the potential to make a valuable contribution to both the compiler research field and the ongoing efforts to make powerful AI models more accessible and efficient to train.

### Related Links

- [Project Description](https://hepsoftwarefoundation.org/gsoc/2025/proposal_Clad-LLM.html)
- [Clad Repository](https://github.com/vgvassilev/clad)
- [My GitHub Profile](https://github.com/Rohan-T144)

187 KB
Binary file not shown.

images/blog/LLM_project_banner.jpg

354 KB

images/blog/gsoc_clang_repl.jpeg

99 KB

images/team/Rohan_Timmaraju.jpg

294 KB
