
@lee2716 lee2716 commented Nov 25, 2025


This PR adds support for RMSNorm (Root Mean Square Normalization) operation to the Deeploy framework's Generic platform. RMSNorm is a critical normalization technique used in modern Transformer architectures and large language models. To enable RMSNorm deployment on embedded systems, this PR implements the necessary mathematical primitives (Pow and Sqrt operations) and integrates them into Deeploy's compilation pipeline.

The implementation follows Deeploy's operator decomposition approach, where RMSNorm is constructed from basic mathematical operations rather than as a monolithic kernel. This design provides flexibility and maintainability while supporting both float32 and float16 precision for resource-constrained embedded devices.
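
As a sketch of the decomposition described above (the exact ONNX graph in testFloatRMSNorm is not shown here, and the `eps` placement and weight handling are assumptions for illustration), RMSNorm built from Pow, ReduceMean, Add, Sqrt, Div, and Mul nodes can be modeled in NumPy:

```python
import numpy as np

def rmsnorm_decomposed(x: np.ndarray, weight: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """RMSNorm expressed through basic primitives, mirroring the
    operator-decomposition approach (Pow and Sqrt are the ops this PR adds)."""
    sq = np.power(x, 2.0)                  # Pow node (exponent = 2)
    ms = sq.mean(axis=-1, keepdims=True)   # ReduceMean node
    rms = np.sqrt(ms + eps)                # Add + Sqrt nodes
    return x / rms * weight                # Div + Mul nodes

x = np.array([[3.0, 4.0]], dtype=np.float32)
w = np.ones(2, dtype=np.float32)
out = rmsnorm_decomposed(x, w, eps=0.0)
# mean(x^2) = 12.5, so each element is divided by sqrt(12.5)
```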

Added

  • Pow (Power) operation support

    • FloatPowTemplate.py: Mako template for C code generation
    • Pow_fp32.c, Pow_fp16.c: Kernel implementations for both precisions
    • kernel/Pow.h: Kernel interface definitions
    • Parser, Layer, and Binding classes for framework integration
  • Sqrt (Square Root) operation support

    • FloatSqrtTemplate.py: Mako template for C code generation
    • Sqrt_fp32.c, Sqrt_fp16.c: Kernel implementations for both precisions
    • kernel/Sqrt.h: Kernel interface definitions
    • Complete framework integration components
  • Comprehensive test suites

    • testFloatPow: Pow operator tests with ONNX models and reference data
    • testFloatSqrt: Sqrt operator tests
    • testFloatRMSNorm: End-to-end RMSNorm tests demonstrating operator composition

Changed

  • Framework integration files

    • Deeploy/Targets/Generic/Parsers.py: Added PowParser and SqrtParser for ONNX graph parsing
    • Deeploy/Targets/Generic/Layers.py: Added corresponding Layer classes for both operations
    • Deeploy/Targets/Generic/Bindings.py: Added type checking and binding registration
    • Deeploy/Targets/Generic/Platform.py: Registered new operations in platform mapping
  • Runtime library headers

    • TargetLibraries/Generic/inc/DeeployBasicMath.h: Extended with Pow and Sqrt function declarations
    • TargetLibraries/Generic/inc/types.h: Updated type definitions for consistency
  • CI/CD configuration

    • .github/workflows/ci-platform-generic.yml: Updated to include new test cases in automated testing pipeline

Fixed

  • N/A (This is a feature addition PR with no bug fixes)

PR Merge Checklist

  1. The PR is rebased on the latest devel commit and pointing to devel.
  2. Your PR is reviewed and approved.
  3. All checks are passing.
  4. The CHANGELOG.md file has been updated.
  5. If the Docker setup was modified, change its link back after review.


coderabbitai bot commented Nov 25, 2025

Important

Review skipped

Review was skipped as selected files did not have any reviewable changes.

💤 Files selected but had no reviewable changes (3)
  • DeeployTest/Tests/testFloatRMSNorm/inputs.npz
  • DeeployTest/Tests/testFloatRMSNorm/network.onnx
  • DeeployTest/Tests/testFloatRMSNorm/outputs.npz


📝 Walkthrough

Summary by CodeRabbit

  • New Features
    • Added Power (Pow) operation support with both array and scalar exponent modes.
    • Added Square Root (Sqrt) operation support for floating-point tensors.
    • Extended CI workflow with new test cases.


Walkthrough

The pull request introduces Pow and Sqrt operation support to the generic platform. Changes include adding new parser classes, layer definitions, bindings, templates with kernel selection logic, and C kernel implementations for float32 operations. CI tests for these new operations are also added.

Changes

  • CI Configuration: .github/workflows/ci-platform-generic.yml
    Adds three new test cases (testFloatPow, testFloatSqrt, testFloatRMSNorm) to the generic-kernels CI workflow
  • Parsers & Layers: Deeploy/Targets/Generic/Parsers.py, Deeploy/Targets/Generic/Layers.py
    Introduces PowParser and SqrtParser classes for validating and extracting node context; adds PowLayer and SqrtLayer ONNXLayer subclasses
  • Bindings: Deeploy/Targets/Generic/Bindings.py
    Adds BasicPowBindings and BasicSqrtBindings containing NodeBindings for pointer-class float32 inputs; updates import statements with FloatPowTemplate and FloatSqrtTemplate
  • Platform Integration: Deeploy/Targets/Generic/Platform.py
    Integrates new parsers, layers, and bindings; creates PowMapper and SqrtMapper; registers Pow and Sqrt nodes in GenericMapping
  • Templates: Deeploy/Targets/Generic/Templates/FloatPowTemplate.py, Deeploy/Targets/Generic/Templates/FloatSqrtTemplate.py
    Implements _PowTemplate and _SqrtTemplate with alignToContext methods; PowTemplate handles scalar and array exponent dispatch; both expose referenceTemplate instances with kernel selection logic
  • Kernel Headers: TargetLibraries/Generic/inc/DeeployBasicMath.h, TargetLibraries/Generic/inc/kernel/Pow.h, TargetLibraries/Generic/inc/kernel/Sqrt.h
    Adds include guards and declarations for Pow_fp32_fp32_fp32, Pow_fp32_scalar_fp32, and Sqrt_fp32_fp32 kernel functions
  • Kernel Implementations: TargetLibraries/Generic/src/Pow_fp32.c, TargetLibraries/Generic/src/Sqrt_fp32.c
    Implements element-wise power and square-root kernels using powf and sqrtf with simple for-loop iteration
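
As a review aid, here is a hedged Python reference model of what the walkthrough says the C kernels compute (the function names Pow_fp32_fp32_fp32, Pow_fp32_scalar_fp32, and Sqrt_fp32_fp32 come from the header summary above; these Python signatures are illustrative, not the C interfaces):

```python
import math

def pow_elementwise_ref(data_in, exponents):
    # Reference for Pow_fp32_fp32_fp32: per-element exponent tensor
    return [math.pow(v, e) for v, e in zip(data_in, exponents)]

def pow_scalar_ref(data_in, exponent):
    # Reference for Pow_fp32_scalar_fp32: one scalar exponent applied to every element
    return [math.pow(v, exponent) for v in data_in]

def sqrt_ref(data_in):
    # Reference for Sqrt_fp32_fp32: sqrtf applied element-wise
    return [math.sqrt(v) for v in data_in]
```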

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • PowParser constant vs. non-constant exponent handling logic — verify validation, error handling, and correct path selection
  • FloatPowTemplate scalar detection and kernel dispatch — ensure size computation, is_scalar determination, and conditional template selection are correct
  • Integration wiring in Platform.py — confirm all parsers, layers, bindings, and mappers are properly connected and exported
  • Kernel implementations — review loop bounds, restrict qualifiers, and element-wise operation correctness

Suggested labels

Feature

Suggested reviewers

  • Victor-Jung
  • Xeratec
  • lukamac

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
  • Title check (⚠️ Warning): The PR title 'Rmsnorm' does not reflect the actual changes made. The PR primarily adds Pow and Sqrt operations; RMSNorm itself is not implemented as a standalone operation, only mentioned as end-goal context. Resolution: revise the title to reflect the primary changes, e.g., 'Add Pow and Sqrt operations for operator composition' or 'Add math primitives (Pow/Sqrt) for RMSNorm support'.
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 8.00%, which is below the required threshold of 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (1 passed)
  • Description check (✅ Passed): The PR description clearly relates to the changeset, explaining the addition of Pow and Sqrt operations, their integration into the framework, and the motivation for RMSNorm support.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (6)
Deeploy/DeeployTypes.py (1)

327-349: Docstring matches behavior; minor cleanup possible in visited set

The updated “live ancestors” wording matches the breadth‑first walk over the alias graph and better explains what’s being checked. One small implementation nit: visited = set(self.name) builds a set of characters rather than a set of buffer names; using {self.name} would make the intent clearer and avoid mixing types in visited, even though it doesn’t currently break correctness.

TargetLibraries/Generic/src/Sqrt_fp32.c (1)

1-13: Elementwise fp32 sqrt kernel looks correct

The Sqrt_fp32_fp32 implementation is straightforward and type‑consistent with float32_t/int32_t, doing an elementwise sqrtf over the input range. Assuming sqrtf is declared via the transitive includes from DeeployBasicMath.h, there are no correctness issues here.

TargetLibraries/Generic/src/Pow_fp16.c (1)

1-26: Pow_fp16 implementation is correct for integer exponents; consider faster exponentiation

The kernel correctly handles zero and negative integer exponents and writes elementwise base^exponent into data_out. For typical small exponents this is fine, but the linear for (j = 0; j < exp; j++) loop makes runtime proportional to |exponent|. If you expect larger exponents or care about worst‑case latency, consider switching to exponentiation‑by‑squaring on a promoted float accumulator for better performance and numerical behavior, while preserving the float16_t I/O interface.
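
The exponentiation-by-squaring alternative suggested above could look like the following sketch (a Python model of the idea, not the kernel's current code; a C version would use the same loop structure on a promoted float accumulator):

```python
def pow_int(base: float, exp: int) -> float:
    """base ** exp for integer exp via exponentiation by squaring:
    O(log |exp|) multiplies instead of the O(|exp|) linear loop."""
    if exp < 0:
        base, exp = 1.0 / base, -exp
    result = 1.0
    while exp:
        if exp & 1:          # multiply in the current power of base
            result *= base   # when the low bit of exp is set
        base *= base         # square base for the next bit
        exp >>= 1
    return result
```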

Deeploy/Targets/Generic/Layers.py (1)

230-240: PowLayer/SqrtLayer wiring is minimal and consistent with existing layers

The new PowLayer and SqrtLayer classes correctly follow the existing pattern of thin ONNXLayer wrappers around mappers. For current usage this is sufficient. If accurate op‑count reporting or explicit broadcasting for Pow becomes important, you may later want to override computeOps (e.g., proportional to tensor size) and, if needed, computeShapes similar to AddLayer/MulLayer.

Deeploy/Targets/Generic/Parsers.py (1)

1967-2001: Duplicate PowParser/SqrtParser definitions and mismatched exponent field

There are two separate definitions of PowParser and SqrtParser in this file: one here and another at lines 2813–2869. The later definitions override these ones at import time, so this block is effectively dead code and also:

  • Triggers lints (PowParser/SqrtParser redefinition, undefined ConstantBuffer on Line 1990).
  • Uses exponent_value instead of exponent, which doesn’t match FloatPowTemplate.alignToContext, where exponent is expected and exponent_value is derived there.

To avoid confusion and static-analysis noise, I’d consolidate to a single implementation (the newer one) and delete this earlier block entirely. A minimal fix would look like:

-class PowParser(NodeParser):
-    ...
-
-
-class SqrtParser(NodeParser):
-    ...
-

leaving only the final PowParser/SqrtParser definitions at the bottom of the file.

Also applies to: 2003-2023

Deeploy/Targets/Generic/Templates/FloatSqrtTemplate.py (1)

14-28: FloatSqrtTemplate matches kernels; consider removing unused data_out

The template and alignToContext correctly:

  • Infer data_type from data_in and
  • Dispatch to Sqrt_fp32_fp32 / Sqrt_fp16_fp16 with the right arguments.

The only nit is that data_out = ctxt.lookup(operatorRepresentation['data_out']) is never used in alignToContext; you can safely drop that line to quiet Ruff and keep the function minimal.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e07cd13 and 30cfabb.

📒 Files selected for processing (16)
  • .github/workflows/ci-platform-generic.yml (1 hunks)
  • Deeploy/DeeployTypes.py (3 hunks)
  • Deeploy/Targets/Generic/Bindings.py (2 hunks)
  • Deeploy/Targets/Generic/Layers.py (1 hunks)
  • Deeploy/Targets/Generic/Parsers.py (2 hunks)
  • Deeploy/Targets/Generic/Platform.py (3 hunks)
  • Deeploy/Targets/Generic/Templates/FloatPowTemplate.py (1 hunks)
  • Deeploy/Targets/Generic/Templates/FloatSqrtTemplate.py (1 hunks)
  • TargetLibraries/Generic/inc/DeeployBasicMath.h (1 hunks)
  • TargetLibraries/Generic/inc/kernel/Pow.h (1 hunks)
  • TargetLibraries/Generic/inc/kernel/Sqrt.h (1 hunks)
  • TargetLibraries/Generic/inc/types.h (1 hunks)
  • TargetLibraries/Generic/src/Pow_fp16.c (1 hunks)
  • TargetLibraries/Generic/src/Pow_fp32.c (1 hunks)
  • TargetLibraries/Generic/src/Sqrt_fp16.c (1 hunks)
  • TargetLibraries/Generic/src/Sqrt_fp32.c (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (7)
TargetLibraries/Generic/inc/kernel/Sqrt.h (3)
TargetLibraries/Generic/src/Sqrt_fp32.c (1)
  • Sqrt_fp32_fp32 (9-13)
DeeployTest/testUtils/dmaUtils.py (1)
  • size (72-73)
TargetLibraries/Generic/src/Sqrt_fp16.c (1)
  • Sqrt_fp16_fp16 (9-13)
TargetLibraries/Generic/inc/kernel/Pow.h (2)
TargetLibraries/Generic/src/Pow_fp32.c (1)
  • Pow_fp32_int32_fp32 (9-27)
TargetLibraries/Generic/src/Pow_fp16.c (1)
  • Pow_fp16_int32_fp16 (8-26)
Deeploy/Targets/Generic/Layers.py (1)
Deeploy/DeeployTypes.py (2)
  • ONNXLayer (1819-2147)
  • NodeMapper (1660-1816)
Deeploy/Targets/Generic/Parsers.py (1)
Deeploy/Targets/Snitch/Parsers.py (3)
  • parseNode (15-26)
  • parseNodeCtxt (28-42)
  • parseNodeCtxt (60-74)
Deeploy/Targets/Generic/Templates/FloatPowTemplate.py (3)
Deeploy/DeeployTypes.py (2)
  • NetworkContext (508-1020)
  • NodeTemplate (87-229)
Deeploy/Targets/Generic/Templates/FloatSqrtTemplate.py (1)
  • alignToContext (14-28)
Deeploy/AbstractDataTypes.py (1)
  • typeName (312-313)
Deeploy/Targets/Generic/Platform.py (3)
Deeploy/Targets/Generic/Layers.py (2)
  • PowLayer (230-233)
  • SqrtLayer (236-239)
Deeploy/Targets/Generic/Parsers.py (4)
  • PowParser (1967-2000)
  • PowParser (2814-2846)
  • SqrtParser (2003-2023)
  • SqrtParser (2849-2869)
Deeploy/DeeployTypes.py (1)
  • NodeMapper (1660-1816)
Deeploy/Targets/Generic/Bindings.py (2)
Deeploy/CommonExtensions/DataTypes.py (2)
  • float16_t (67-71)
  • float32_t (74-78)
Deeploy/DeeployTypes.py (2)
  • CodeTransformation (2290-2324)
  • NodeBinding (1512-1657)
🪛 Ruff (0.14.5)
Deeploy/Targets/Generic/Parsers.py

1978-1978: Unused method argument: channels_first

(ARG002)


1990-1990: Undefined name ConstantBuffer

(F821)


1995-1996: Prefer TypeError exception for invalid type

(TRY004)


1995-1996: Avoid specifying long messages outside the exception class

(TRY003)


2014-2014: Unused method argument: channels_first

(ARG002)


2814-2814: Redefinition of unused PowParser from line 1967

(F811)


2825-2825: Unused method argument: channels_first

(ARG002)


2849-2849: Redefinition of unused SqrtParser from line 2003

(F811)


2860-2860: Unused method argument: channels_first

(ARG002)

Deeploy/Targets/Generic/Templates/FloatSqrtTemplate.py

19-19: Local variable data_out is assigned to but never used

Remove assignment to unused variable data_out

(F841)

Deeploy/Targets/Generic/Templates/FloatPowTemplate.py

19-19: Local variable data_out is assigned to but never used

Remove assignment to unused variable data_out

(F841)


30-30: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (9)
.github/workflows/ci-platform-generic.yml (1)

76-81: New generic-kernels tests are wired correctly in CI list

The added Pow/Sqrt/RMSNorm tests fit the existing naming pattern and placement in the float test block; no workflow logic changes. Please just confirm the test targets are defined under those exact names so CI doesn’t fail on a typo.

Deeploy/DeeployTypes.py (1)

2565-2589: _mapNode refactor cleanly decouples engine selection from layer construction

The new _mapNode that iterates engines and calls engine.Mapping[node.op](node) looks correct and keeps NetworkContainer._bindLayers simpler by returning either an ONNXLayer or side‑effect‑only mappings (e.g., Constant hoisting). The isinstance(layer, ONNXLayer) guard in _bindLayers ensures only real layers are scheduled, while Constant mappings still run via their lambda. This preserves multi‑engine support and fails fast with a clear RuntimeError when no mapping exists.

TargetLibraries/Generic/inc/DeeployBasicMath.h (1)

47-55: Pow/Sqrt kernel includes are correctly integrated

Including kernel/Pow.h and kernel/Sqrt.h alongside the other basic math kernels is consistent and sufficient to expose the new operations to users of DeeployBasicMath.h.

TargetLibraries/Generic/src/Sqrt_fp16.c (1)

1-13: fp16 sqrt kernel is consistent with the fp32 path

Sqrt_fp16_fp16 mirrors the fp32 implementation, applying sqrtf elementwise and relying on the float16_t typedef for the actual storage type. This is a reasonable, simple implementation for FP16 support and aligns with the new type definition.

TargetLibraries/Generic/inc/types.h (1)

13-21: float16_t typedef is sensible and keeps non-FP16 platforms building

Defining float16_t as _Float16 when compiler support is detected, and otherwise aliasing it to float, gives the new Pow/Sqrt FP16 kernels a consistent type while preserving buildability on targets without native FP16. The surrounding comments clearly document this fallback behavior.

TargetLibraries/Generic/inc/kernel/Sqrt.h (1)

1-24: Sqrt kernel header matches implementations

The include guard, DeeployBasicMath dependency, and fp32/fp16 prototypes are consistent with the corresponding C kernels; no issues from a correctness or integration perspective.

TargetLibraries/Generic/inc/kernel/Pow.h (1)

1-25: Pow kernel header is consistent with C implementations

Prototypes and guard are well-formed and match the Pow_fp32/Pow_fp16 C kernels; nothing blocking here.

Deeploy/Targets/Generic/Bindings.py (1)

10-11: Pow/Sqrt bindings are wired consistently with templates and types

The new BasicPowBindings/BasicSqrtBindings correctly:

  • Use float32_t/float16_t pointer types for inputs/outputs.
  • Bind to FloatPowTemplate.referenceTemplate and FloatSqrtTemplate.referenceTemplate.
  • Reuse DummyChecker and BasicTransformer in line with nearby float ops.

Once the Pow parser/template exponent checks are tightened as discussed, these bindings look sound.

Also applies to: 18-22, 121-133

Deeploy/Targets/Generic/Platform.py (1)

10-17: Pow/Sqrt integration into Generic platform is coherent

The new imports, PowMapper/SqrtMapper definitions, and 'Pow'/'Sqrt' entries in GenericMapping line up correctly with:

  • BasicPowBindings / BasicSqrtBindings,
  • PowLayer / SqrtLayer, and
  • The Pow/Sqrt kernels exposed via DeeployBasicMath.h.

Assuming DeeployBasicMath.h now includes the new kernel/Pow.h and kernel/Sqrt.h, the end‑to‑end wiring looks correct.

Please double‑check that DeeployBasicMath.h actually includes the new Pow/Sqrt kernel headers so generated code has the necessary prototypes.

Also applies to: 20-22, 27-29, 56-57, 104-105

Comment on lines 2811 to 2869
############################


class PowParser(NodeParser):

    def __init__(self):
        super().__init__()

    def parseNode(self, node: gs.Node) -> bool:
        return node.op == 'Pow' and len(node.inputs) == 2 and len(node.outputs) == 1

    def parseNodeCtxt(self,
                      ctxt: NetworkContext,
                      node: gs.Node,
                      channels_first: bool = True) -> Tuple[NetworkContext, bool]:

        data_in = ctxt.lookup(node.inputs[0].name)
        exponent = node.inputs[1]
        data_out = ctxt.lookup(node.outputs[0].name)

        self.operatorRepresentation['data_in'] = data_in.name
        self.operatorRepresentation['data_out'] = data_out.name

        # Check if exponent is a constant
        if isinstance(exponent, gs.Constant):
            exp_value = float(exponent.values)
            self.operatorRepresentation['exponent'] = exp_value
            self.operatorRepresentation['is_constant_exp'] = True
        else:
            exp_tensor = ctxt.lookup(exponent.name)
            self.operatorRepresentation['exponent'] = exp_tensor.name
            self.operatorRepresentation['is_constant_exp'] = False

        self.operatorRepresentation['size'] = int(np.prod(data_in.shape))

        return ctxt, True


class SqrtParser(NodeParser):

    def __init__(self):
        super().__init__()

    def parseNode(self, node: gs.Node) -> bool:
        return node.op == 'Sqrt' and len(node.inputs) == 1 and len(node.outputs) == 1

    def parseNodeCtxt(self,
                      ctxt: NetworkContext,
                      node: gs.Node,
                      channels_first: bool = True) -> Tuple[NetworkContext, bool]:

        data_in = ctxt.lookup(node.inputs[0].name)
        data_out = ctxt.lookup(node.outputs[0].name)

        self.operatorRepresentation['data_in'] = data_in.name
        self.operatorRepresentation['data_out'] = data_out.name
        self.operatorRepresentation['size'] = int(np.prod(data_in.shape))

        return ctxt, True

🛠️ Refactor suggestion | 🟠 Major

Tighten PowParser/SqrtParser to the supported “scalar constant integer exponent” subset

With the current wiring:

  • PowParser.parseNodeCtxt stores a float exponent for gs.Constant inputs, or a tensor name otherwise.
  • FloatPowTemplate.alignToContext then casts that float to int, only rejecting tensor exponents if the value is a string.

Given the C kernels (Pow_fp32_int32_fp32/Pow_fp16_int32_fp16) only support integer exponents, this means any non‑integer constant exponent will be silently truncated before codegen, which is a functional divergence from general ONNX Pow.

I’d recommend tightening PowParser here to enforce what the backend actually supports:

  • Require the exponent input to be a scalar gs.Constant.
  • Extract its scalar value and store it as a Python number in operatorRepresentation['exponent'].
  • Reject any tensor/broadcast exponents up front.

For example:

-        data_in = ctxt.lookup(node.inputs[0].name)
-        exponent = node.inputs[1]
+        data_in = ctxt.lookup(node.inputs[0].name)
+        exponent = node.inputs[1]
         data_out = ctxt.lookup(node.outputs[0].name)
@@
-        self.operatorRepresentation['data_in'] = data_in.name
-        self.operatorRepresentation['data_out'] = data_out.name
-
-        # Check if exponent is a constant
-        if isinstance(exponent, gs.Constant):
-            exp_value = float(exponent.values)
-            self.operatorRepresentation['exponent'] = exp_value
-            self.operatorRepresentation['is_constant_exp'] = True
-        else:
-            exp_tensor = ctxt.lookup(exponent.name)
-            self.operatorRepresentation['exponent'] = exp_tensor.name
-            self.operatorRepresentation['is_constant_exp'] = False
+        self.operatorRepresentation['data_in'] = data_in.name
+        self.operatorRepresentation['data_out'] = data_out.name
+
+        # Only scalar constant exponents are supported
+        if not isinstance(exponent, gs.Constant):
+            raise TypeError(f"Pow: exponent input for node {node.name} must be a scalar Constant")
+
+        exp_vals = np.asarray(exponent.values).reshape(-1)
+        if exp_vals.size != 1:
+            raise ValueError(f"Pow: only scalar exponents are supported, got shape {exp_vals.shape}")
+
+        self.operatorRepresentation['exponent'] = float(exp_vals[0])
@@
-        self.operatorRepresentation['size'] = int(np.prod(data_in.shape))
-
-        return ctxt, True
+        self.operatorRepresentation['size'] = int(np.prod(data_in.shape))
+        return ctxt, True

You can keep SqrtParser here and drop the earlier duplicate, as mentioned in the previous comment.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-############################
-
-
-class PowParser(NodeParser):
-
-    def __init__(self):
-        super().__init__()
-
-    def parseNode(self, node: gs.Node) -> bool:
-        return node.op == 'Pow' and len(node.inputs) == 2 and len(node.outputs) == 1
-
-    def parseNodeCtxt(self,
-                      ctxt: NetworkContext,
-                      node: gs.Node,
-                      channels_first: bool = True) -> Tuple[NetworkContext, bool]:
-
-        data_in = ctxt.lookup(node.inputs[0].name)
-        exponent = node.inputs[1]
-        data_out = ctxt.lookup(node.outputs[0].name)
-
-        self.operatorRepresentation['data_in'] = data_in.name
-        self.operatorRepresentation['data_out'] = data_out.name
-
-        # Check if exponent is a constant
-        if isinstance(exponent, gs.Constant):
-            exp_value = float(exponent.values)
-            self.operatorRepresentation['exponent'] = exp_value
-            self.operatorRepresentation['is_constant_exp'] = True
-        else:
-            exp_tensor = ctxt.lookup(exponent.name)
-            self.operatorRepresentation['exponent'] = exp_tensor.name
-            self.operatorRepresentation['is_constant_exp'] = False
-
-        self.operatorRepresentation['size'] = int(np.prod(data_in.shape))
-
-        return ctxt, True
-
-
-class SqrtParser(NodeParser):
-
-    def __init__(self):
-        super().__init__()
-
-    def parseNode(self, node: gs.Node) -> bool:
-        return node.op == 'Sqrt' and len(node.inputs) == 1 and len(node.outputs) == 1
-
-    def parseNodeCtxt(self,
-                      ctxt: NetworkContext,
-                      node: gs.Node,
-                      channels_first: bool = True) -> Tuple[NetworkContext, bool]:
-
-        data_in = ctxt.lookup(node.inputs[0].name)
-        data_out = ctxt.lookup(node.outputs[0].name)
-
-        self.operatorRepresentation['data_in'] = data_in.name
-        self.operatorRepresentation['data_out'] = data_out.name
-        self.operatorRepresentation['size'] = int(np.prod(data_in.shape))
-
-        return ctxt, True
+############################
+
+
+class PowParser(NodeParser):
+
+    def __init__(self):
+        super().__init__()
+
+    def parseNode(self, node: gs.Node) -> bool:
+        return node.op == 'Pow' and len(node.inputs) == 2 and len(node.outputs) == 1
+
+    def parseNodeCtxt(self,
+                      ctxt: NetworkContext,
+                      node: gs.Node,
+                      channels_first: bool = True) -> Tuple[NetworkContext, bool]:
+
+        data_in = ctxt.lookup(node.inputs[0].name)
+        exponent = node.inputs[1]
+        data_out = ctxt.lookup(node.outputs[0].name)
+
+        self.operatorRepresentation['data_in'] = data_in.name
+        self.operatorRepresentation['data_out'] = data_out.name
+
+        # Only scalar constant exponents are supported
+        if not isinstance(exponent, gs.Constant):
+            raise TypeError(f"Pow: exponent input for node {node.name} must be a scalar Constant")
+
+        exp_vals = np.asarray(exponent.values).reshape(-1)
+        if exp_vals.size != 1:
+            raise ValueError(f"Pow: only scalar exponents are supported, got shape {exp_vals.shape}")
+
+        self.operatorRepresentation['exponent'] = float(exp_vals[0])
+        self.operatorRepresentation['size'] = int(np.prod(data_in.shape))
+        return ctxt, True
+
+
+class SqrtParser(NodeParser):
+
+    def __init__(self):
+        super().__init__()
+
+    def parseNode(self, node: gs.Node) -> bool:
+        return node.op == 'Sqrt' and len(node.inputs) == 1 and len(node.outputs) == 1
+
+    def parseNodeCtxt(self,
+                      ctxt: NetworkContext,
+                      node: gs.Node,
+                      channels_first: bool = True) -> Tuple[NetworkContext, bool]:
+
+        data_in = ctxt.lookup(node.inputs[0].name)
+        data_out = ctxt.lookup(node.outputs[0].name)
+
+        self.operatorRepresentation['data_in'] = data_in.name
+        self.operatorRepresentation['data_out'] = data_out.name
+        self.operatorRepresentation['size'] = int(np.prod(data_in.shape))
+
+        return ctxt, True

Comment on lines 14 to 38
    def alignToContext(self, ctxt: NetworkContext,
                       operatorRepresentation: OperatorRepresentation) -> Tuple[NetworkContext, Dict, List[str]]:

        # Get input and output tensors
        data_in = ctxt.lookup(operatorRepresentation['data_in'])
        data_out = ctxt.lookup(operatorRepresentation['data_out'])

        # Get data type (fp32 or fp16)
        data_type = data_in._type.typeName
        operatorRepresentation['data_type'] = data_type

        # Exponent must be a constant integer
        if 'exponent' in operatorRepresentation:
            exponent_input = operatorRepresentation['exponent']
            if isinstance(exponent_input, str):
                # It's a tensor name - not supported for integer exponent version
                raise ValueError("Tensor exponent not supported. Use constant integer exponent.")
            else:
                # Convert to integer
                operatorRepresentation['exponent_value'] = int(exponent_input)

        # Calculate size
        operatorRepresentation['size'] = int(np.prod(data_in.shape))

        return ctxt, operatorRepresentation, []

⚠️ Potential issue | 🟠 Major

Enforce scalar integer exponent and remove unused local in FloatPowTemplate

Two things here:

  1. Exponent semantics / silent truncation

alignToContext currently does:

  • Accepts any numeric operatorRepresentation['exponent'].
  • Casts it directly to int without checking integer‑ness.
  • Rejects only string exponents (tensors) at runtime.

Combined with the C kernels’ int32_t exponent parameter, this means a constant exponent of 2.7 will be silently interpreted as 2, which is a correctness bug for general Pow.

To make the limitation explicit and safe, I’d strongly suggest:

  • Requiring exponent to be present and non‑string (i.e., constant, not a tensor).
  • Validating that it’s integer‑valued before casting.
  • Raising a clear error otherwise.

For example:

-        # Exponent must be a constant integer
-        if 'exponent' in operatorRepresentation:
-            exponent_input = operatorRepresentation['exponent']
-            if isinstance(exponent_input, str):
-                # It's a tensor name - not supported for integer exponent version
-                raise ValueError("Tensor exponent not supported. Use constant integer exponent.")
-            else:
-                # Convert to integer
-                operatorRepresentation['exponent_value'] = int(exponent_input)
+        # Exponent must be a scalar constant integer
+        exponent_input = operatorRepresentation.get('exponent')
+        if exponent_input is None:
+            raise ValueError("Pow: missing 'exponent' in operatorRepresentation")
+        if isinstance(exponent_input, str):
+            # Tensor exponents are not supported by the integer-exponent kernel
+            raise TypeError("Pow: tensor exponents are not supported; exponent must be a scalar constant")
+
+        exp_float = float(exponent_input)
+        if not exp_float.is_integer():
+            raise ValueError(f"Pow: only integer exponents are supported, got {exponent_input!r}")
+
+        operatorRepresentation['exponent_value'] = int(exp_float)

This aligns codegen behavior with the actual kernel capabilities and avoids silent truncation.

  1. Minor: unused data_out local

data_out = ctxt.lookup(operatorRepresentation['data_out']) is never used. You can safely drop that line to satisfy Ruff and keep alignToContext minimal.

Committable suggestion skipped: line range outside the PR's diff.


🤖 Prompt for AI Agents
In Deeploy/Targets/Generic/Templates/FloatPowTemplate.py around lines 14–38,
alignToContext currently looks up an unused data_out and silently truncates
non-integer numeric exponents; remove the unused data_out = ctxt.lookup(...)
line, require that 'exponent' exists and is not a tensor (string), validate that
the provided constant exponent is integer-valued (e.g., reject floats that are
not whole numbers) before casting to int, store the integer in
operatorRepresentation['exponent_value'], and raise a clear ValueError if the
exponent is missing, a tensor, or not integer-valued.

Comment on lines 7 to 27
#include "DeeployBasicMath.h"

void Pow_fp32_int32_fp32(float32_t *data_in, int32_t exponent,
                         float32_t *data_out, int32_t size) {
  for (int i = 0; i < size; i++) {
    float32_t result = 1.0f;
    int32_t exp = exponent;
    float32_t base = data_in[i];

    if (exp < 0) {
      base = 1.0f / base;
      exp = -exp;
    }

    for (int32_t j = 0; j < exp; j++) {
      result *= base;
    }

    data_out[i] = result;
  }
}

⚠️ Potential issue | 🟠 Major

Kernel assumes integer scalar exponents; ensure front‑end enforces this

This implementation is correct for integer exponents (including negatives), but it fundamentally implements base**int(exponent) via repeated multiplication. When wired through the current Python template, any non-integer constant exponent will be truncated to an integer before reaching this kernel, silently diverging from ONNX Pow semantics.

Please make sure the Python side (parser/template) explicitly:

  • Restricts Pow to scalar, constant, integer exponents, and
  • Raises a clear error if the exponent is non-integer or a tensor,

so users don’t get silent wrong results for general Pow nodes.

🤖 Prompt for AI Agents
In TargetLibraries/Generic/src/Pow_fp32.c around lines 7 to 27, the C kernel
implements power by repeated multiplication and thus only supports integer
scalar exponents; update the Python front-end (parser/template) to validate Pow
nodes before emitting this kernel: require the exponent to be a scalar constant
integer (not a tensor and not a non-integer), and if that check fails raise a
clear, descriptive exception (e.g., "Pow operator only supported for scalar
integer constant exponents in this backend") so users are not silently given
truncated/non-ONNX-compliant results.

@lee2716 lee2716 marked this pull request as draft November 25, 2025 17:09
The current implementation does not support true FP16 arithmetic. Instead, data is cast to FP32 internally. These tests are being removed to avoid misleading results until native half-precision support is implemented.
@diaconuccalin diaconuccalin left a comment

Good job overall, but there are some changes that we need to address.

Most of them concern the following:

  • Remove integer exponent enforcement for Pow
  • Remove constant exponent enforcement for Pow
  • Remove all traces of the FP16 version for Generic, since this platform's compiler doesn't support the format. (As we discussed privately, we will add it directly in Snitch; here it would only help us create the proper infrastructure, like bindings and parsers, and we've already done that with FP32.)


You have two versions each of PowParser and SqrtParser. Please keep only one of each. The Sqrt ones are identical, and for the Pow operation the first one (above the #### line) looks cleaner.

return ctxt, False


############################

Remove this comment line as well


All the changes on this page should be reverted, it looks like a leftover from a rebase

void Pow_fp32_int32_fp32(float32_t *data_in, int32_t exponent,
                         float32_t *data_out, int32_t size);

void Pow_fp16_int32_fp16(float16_t *data_in, int32_t exponent,

As we discussed, I think it's ok to remove the fp16 version from the generic platform, since there is no compiler support for this data format.


void Sqrt_fp32_fp32(float32_t *data_in, float32_t *data_out, int32_t size);

void Sqrt_fp16_fp16(float16_t *data_in, float16_t *data_out, int32_t size);

Same here, we can remove the fp16 for generic because of the lack of compiler support.

operatorRepresentation['data_type'] = data_type

# Exponent must be a constant integer
if 'exponent' in operatorRepresentation:

At this point, we should be certain that the exponent is in the opRep, no need for this check.

self.operatorRepresentation['data_out'] = data_out.name

# Extract exponent value from the constant tensor
if isinstance(exponent_tensor, ConstantBuffer):

Same here, let's remove the constant enforcement


#include "DeeployBasicMath.h"

void Pow_fp32_int32_fp32(float32_t *data_in, int32_t exponent,

For removing the enforcement of constant and int values, we will need to change the data type of exponent to const float *__restrict__.

                         float32_t *data_out, int32_t size) {
  for (int i = 0; i < size; i++) {
    float32_t result = 1.0f;
    int32_t exp = exponent;

We will also have to update the kernel for float exponent support.


Please create a new test to better test the RMSNorm function. Right now, pretty much all values go to either -1 or 1 because the function normalizes on the last axis and we only have a single element on that one (the input dimension is 1024x1). Let's do a test with something like a 128x128 input, so it's not too big, but has more than one element on the last axis.

Let's also use a value a little more interesting than 1 for the weight, to check that operation as well.

This commit addresses code review feedback:
- Refactor Pow kernel to use 'powf' from math.h to support floating-point exponents.
- Update PowParser to allow tensor exponents instead of forcing constants.
- Remove Generic FP16 support and revert types.h changes.
- Remove duplicate PowParser/SqrtParser classes.
- Enhance RMSNorm tests with larger shapes and non-trivial weights.
@lee2716 lee2716 marked this pull request as ready for review November 28, 2025 20:33
@lee2716 lee2716 marked this pull request as draft November 28, 2025 20:35
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (3)
Deeploy/Targets/Generic/Templates/FloatSqrtTemplate.py (1)

12-13: Remove unused variable lookup.

The data_out variable is looked up but never used in alignToContext. Since the template only needs data_in to infer the data type and compute size, you can safely remove this line.

Apply this diff:

     # Get input and output tensors
     data_in = ctxt.lookup(operatorRepresentation['data_in'])
-    data_out = ctxt.lookup(operatorRepresentation['data_out'])
     
     # Get data type (fp32)
Deeploy/Targets/Generic/Templates/FloatPowTemplate.py (1)

12-14: Remove unused variable lookup.

The data_out variable is looked up but never used in alignToContext. The method only needs data_in and exponent to infer the data type and compute sizes.

Apply this diff:

     # Get input and output tensors
     data_in = ctxt.lookup(operatorRepresentation['data_in'])
     exponent = ctxt.lookup(operatorRepresentation['exponent'])
-    data_out = ctxt.lookup(operatorRepresentation['data_out'])
     
     # Get data type (fp32)
Deeploy/Targets/Generic/Bindings.py (1)

121-129: Consider more specific type checkers for Pow and Sqrt.

The bindings use DummyChecker which provides minimal type validation. While this may be intentional for flexibility, you might want to define dedicated PowChecker and SqrtChecker classes (similar to AddChecker, MulChecker, etc.) to provide more specific type validation for these operations.

This can be deferred if the current approach aligns with the project's type-checking strategy. The bindings are otherwise correctly structured.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 30cfabb and 8f90620.

📒 Files selected for processing (8)
  • .github/workflows/ci-platform-generic.yml (1 hunks)
  • Deeploy/Targets/Generic/Bindings.py (2 hunks)
  • Deeploy/Targets/Generic/Parsers.py (3 hunks)
  • Deeploy/Targets/Generic/Templates/FloatPowTemplate.py (1 hunks)
  • Deeploy/Targets/Generic/Templates/FloatSqrtTemplate.py (1 hunks)
  • TargetLibraries/Generic/inc/kernel/Pow.h (1 hunks)
  • TargetLibraries/Generic/inc/kernel/Sqrt.h (1 hunks)
  • TargetLibraries/Generic/src/Pow_fp32.c (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • .github/workflows/ci-platform-generic.yml
🧰 Additional context used
🧬 Code graph analysis (5)
TargetLibraries/Generic/inc/kernel/Sqrt.h (2)
TargetLibraries/Generic/src/Sqrt_fp32.c (1)
  • Sqrt_fp32_fp32 (9-13)
DeeployTest/testUtils/dmaUtils.py (1)
  • size (72-73)
Deeploy/Targets/Generic/Bindings.py (3)
Deeploy/CommonExtensions/DataTypes.py (1)
  • float32_t (74-78)
Deeploy/DeeployTypes.py (2)
  • CodeTransformation (2290-2324)
  • NodeBinding (1512-1657)
Deeploy/AbstractDataTypes.py (1)
  • PointerClass (536-559)
TargetLibraries/Generic/inc/kernel/Pow.h (2)
TargetLibraries/Generic/src/Pow_fp32.c (2)
  • Pow_fp32_fp32_fp32 (10-17)
  • Pow_fp32_scalar_fp32 (19-26)
DeeployTest/testUtils/dmaUtils.py (1)
  • size (72-73)
Deeploy/Targets/Generic/Parsers.py (1)
Deeploy/DeeployTypes.py (7)
  • NetworkContext (508-1020)
  • NodeParser (1023-1198)
  • VariableBuffer (232-360)
  • ConstantBuffer (393-430)
  • parseNode (1033-1048)
  • inputs (2503-2520)
  • parseNodeCtxt (1051-1076)
Deeploy/Targets/Generic/Templates/FloatSqrtTemplate.py (1)
Deeploy/Targets/Generic/Templates/FloatPowTemplate.py (1)
  • alignToContext (9-34)
🪛 Ruff (0.14.6)
Deeploy/Targets/Generic/Templates/FloatPowTemplate.py

14-14: Local variable data_out is assigned to but never used

Remove assignment to unused variable data_out

(F841)

Deeploy/Targets/Generic/Parsers.py

1978-1978: Unused method argument: channels_first

(ARG002)


1995-1996: Prefer TypeError exception for invalid type

(TRY004)


1995-1996: Avoid specifying long messages outside the exception class

(TRY003)


2799-2799: Unused method argument: channels_first

(ARG002)

Deeploy/Targets/Generic/Templates/FloatSqrtTemplate.py

13-13: Local variable data_out is assigned to but never used

Remove assignment to unused variable data_out

(F841)

🔇 Additional comments (6)
TargetLibraries/Generic/inc/kernel/Sqrt.h (1)

20-20: LGTM!

The function signature is correct for an element-wise square root operation. The naming convention follows the pattern seen in other kernels and the parameters are appropriate.

TargetLibraries/Generic/inc/kernel/Pow.h (1)

16-24: LGTM!

Both function signatures correctly use float32_t for the exponent parameter(s), which allows the kernels to support general floating-point exponents via powf. The const and restrict qualifiers are appropriate.

TargetLibraries/Generic/src/Pow_fp32.c (1)

10-26: LGTM!

Both kernel implementations correctly use powf which supports general floating-point exponents. The array-based and scalar-based variants are implemented appropriately for broadcasting scenarios.

Deeploy/Targets/Generic/Parsers.py (2)

2788-2808: LGTM!

The SqrtParser implementation is straightforward and correct for a unary square root operation. It properly extracts the input/output tensors and computes the size.

Note: The channels_first parameter is unused (flagged by static analysis), but this is likely required by the NodeParser interface.


1990-1996: Incorrect exponent handling: casting to int loses precision and enforcing constants limits functionality.

There are two critical issues here:

  1. Integer casting loses precision: Line 1991 casts the exponent to int, but the C kernel Pow_fp32_fp32_fp32 and Pow_fp32_scalar_fp32 use powf which supports floating-point exponents. For example, an exponent of 2.5 would be silently truncated to 2, producing incorrect results.

  2. Constant enforcement is too restrictive: Lines 1994-1996 reject non-constant (variable tensor) exponents, but this unnecessarily limits the operator's functionality. Per past review feedback and the ONNX Pow specification, variable exponents should be supported.

Apply this diff to support float exponents and remove constant enforcement:

     # Extract exponent value from the constant tensor
     if isinstance(exponent_tensor, ConstantBuffer):
-        exp_value = int(exponent_tensor.values.flatten()[0])
-        self.operatorRepresentation['exponent_value'] = exp_value
-    else:
-        # Tensor exponent not supported
-        raise ValueError(f"Node {node.name}: Exponent must be a constant. "
-                         f"Variable tensor exponents are not supported.")
+        exp_value = float(exponent_tensor.values.flatten()[0])
+        self.operatorRepresentation['exponent_value'] = exp_value
+    # Variable tensor exponents are now supported via the array-based kernel

Based on learnings from past reviews requesting float exponent support and removal of constant enforcement.

Likely an incorrect or invalid review comment.

Deeploy/Targets/Generic/Templates/FloatPowTemplate.py (1)

25-34: LGTM!

The scalar broadcasting logic is well-implemented. The template correctly distinguishes between scalar and array exponents, selecting the appropriate kernel (Pow_fp32_scalar_fp32 vs Pow_fp32_fp32_fp32) and constructing the proper variable reference for scalar exponents.

@lee2716 lee2716 marked this pull request as ready for review November 28, 2025 21:23
@lee2716 lee2716 requested a review from diaconuccalin December 1, 2025 09:54