@jgw0915 jgw0915 commented Dec 18, 2025

Fix false load-use hazards by gating rs1/rs2 usage in ID stage

Background

While analyzing control hazard waveforms, I discovered that the current load-use hazard detection logic can spuriously trigger stalls for certain instruction patterns.

Specifically, the control logic in Control.scala compares rd_ex against both rs1_id and rs2_id unconditionally. However, for several instruction types, the bit positions corresponding to rs1 or rs2 do not represent architectural source registers.

Root Cause

[Image: Snipaste_2025-12-15_23-58-59]
  • For I-type load instructions (e.g., lw a0, 16(a5)), only rs1 is architecturally used.

  • rs2_id is still wired directly from instruction bits [24:20], which encode imm[4:0] for I-type instructions.

  • When the low five bits of the immediate coincidentally equal a register index (e.g., imm[4:0] = 16, which aliases x16), the condition

    rd_ex == rs2_id
    

    may incorrectly evaluate to true.

  • This causes io_pc_stall, io_if_stall, and io_id_flush to assert, inserting an unnecessary bubble even though no real data dependency exists.

This behavior is technically consistent with the existing implementation but reflects a decode-level limitation: the hazard unit does not know whether the ID-stage instruction actually uses rs1 or rs2.
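The aliasing can be reproduced outside the simulator. The following is a minimal plain-Scala sketch (no Chisel dependency) that encodes `lw a0, 16(a5)` and extracts the raw rs1/rs2 bit fields the way a type-agnostic decoder would. The object and function names are hypothetical and are not taken from the repository.

```scala
object FieldAliasing {
  // Encode an RV32I I-type instruction: imm[11:0] | rs1 | funct3 | rd | opcode.
  def encodeIType(imm: Int, rs1: Int, funct3: Int, rd: Int, opcode: Int): Int =
    ((imm & 0xfff) << 20) | ((rs1 & 0x1f) << 15) | ((funct3 & 0x7) << 12) |
      ((rd & 0x1f) << 7) | (opcode & 0x7f)

  // Raw field extraction, done without knowing the instruction type.
  def rs1Bits(inst: Int): Int = (inst >>> 15) & 0x1f
  def rs2Bits(inst: Int): Int = (inst >>> 20) & 0x1f // imm[4:0] for I-type

  def main(args: Array[String]): Unit = {
    // lw a0, 16(a5): a0 = x10, a5 = x15, LOAD opcode = 0x03, funct3 = 0b010
    val lw = encodeIType(imm = 16, rs1 = 15, funct3 = 2, rd = 10, opcode = 0x03)
    println(f"inst = 0x$lw%08x")
    println(s"rs1 bits = ${rs1Bits(lw)}") // a5, a real source register
    println(s"rs2 bits = ${rs2Bits(lw)}") // 16: immediate bits that look like x16
  }
}
```

If a preceding load in EX writes x16, the "rs2" value extracted here matches rd_ex even though the instruction has no second source operand.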

Fixes in this PR

Explicit rs1 / rs2 usage signals

  • Added decode-time signals to indicate whether an instruction uses rs1 and/or rs2
    • Identified that jal, auipc, and lui do not use rs1 (the same bit positions encode immediates)
    • rs2 is only considered for instruction classes that architecturally use a second source register, including:
      • R-type ALU instructions (e.g., add, sub, and, or)
      • S-type store instructions (e.g., sw)
      • B-type branch instructions (e.g., beq, bne)
  • These signals follow the added design hint mentioned in CA25 Exercise 19
  • Enables the control logic to reason about architectural source operands correctly
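As a rough illustration of the classification above, the usage flags can be modelled as functions of the major opcode. This is a plain-Scala sketch under stated assumptions, not the decoder in the PR: it covers only the RV32I base opcodes named in the list, and it conservatively treats every opcode other than lui, auipc, and jal as using rs1. The names `OperandUsage`, `usesRs1`, and `usesRs2` are hypothetical.

```scala
object OperandUsage {
  // RV32I major opcodes (instruction bits [6:0]).
  val Lui = 0x37
  val Auipc = 0x17
  val Jal = 0x6f
  val Branch = 0x63
  val Store = 0x23
  val Op = 0x33

  def opcode(inst: Int): Int = inst & 0x7f

  // lui, auipc, and jal encode immediate bits where rs1 would be.
  // Assumption: all other opcodes are treated as reading rs1.
  def usesRs1(inst: Int): Boolean = opcode(inst) match {
    case Lui | Auipc | Jal => false
    case _                 => true
  }

  // Only R-type ALU, store, and branch instructions read rs2.
  def usesRs2(inst: Int): Boolean = opcode(inst) match {
    case Op | Store | Branch => true
    case _                   => false
  }

  def main(args: Array[String]): Unit = {
    val lw  = 0x0107a503 // lw a0, 16(a5)
    val add = 0x003100b3 // add x1, x2, x3
    println(s"lw : usesRs1=${usesRs1(lw)}, usesRs2=${usesRs2(lw)}")
    println(s"add: usesRs1=${usesRs1(add)}, usesRs2=${usesRs2(add)}")
  }
}
```

Defaulting unknown opcodes to "uses rs1" errs toward an unnecessary stall rather than a missed hazard, which is the safer direction for a sketch like this.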

Result

[Image: Snipaste_2025-12-16_01-02-28]
  • Eliminates false-positive load-use stalls
  • Preserves correct handling of genuine hazards
  • Improves correctness and precision of control hazard detection
  • Waveform behavior after the fix matches architectural expectations and avoids unnecessary pipeline stalls

The full implementation is available in Control.scala in my GitHub repo.

@jserv jserv left a comment

The pull request adds uses_rs1_id and uses_rs2_id signals and wires them to Control, but Control.scala still has ? placeholders. The hazard detection logic is not modified to use these signals.
Looking at the author's full implementation in their fork, the intended fix is:

  val hazard_ex_rs1 = io.uses_rs1_id && (io.rd_ex === io.rs1_id)
  val hazard_ex_rs2 = io.uses_rs2_id && (io.rd_ex === io.rs2_id)

  when(
    ((io.memory_read_enable_ex || io.jump_instruction_id) &&
     (io.rd_ex =/= 0.U) &&
     (hazard_ex_rs1 || hazard_ex_rs2))
    ||
    (io.jump_instruction_id &&
     io.memory_read_enable_mem &&
     (io.rd_mem =/= 0.U) &&
     ((io.uses_rs1_id && (io.rd_mem === io.rs1_id)) ||
      (io.uses_rs2_id && (io.rd_mem === io.rs2_id))))
  ) { ... }

The PR needs to include this Control.scala change or the fix is non-functional.
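To see the effect of the gating without running the pipeline, the condition can be mirrored in plain Scala with Booleans and Ints. This is a software sketch only: the `Inputs` case class and function names are hypothetical, and the ungated variant assumes the original logic is the same expression with both usage flags forced to true.

```scala
object HazardModel {
  // Hypothetical software mirror of the Control inputs used by the condition.
  final case class Inputs(
      rdEx: Int,
      rs1Id: Int,
      rs2Id: Int,
      usesRs1Id: Boolean,
      usesRs2Id: Boolean,
      memoryReadEnableEx: Boolean,
      jumpInstructionId: Boolean = false,
      memoryReadEnableMem: Boolean = false,
      rdMem: Int = 0
  )

  // Gated stall condition, following the expression quoted in the review.
  def stallGated(i: Inputs): Boolean = {
    val hazardExRs1  = i.usesRs1Id && i.rdEx == i.rs1Id
    val hazardExRs2  = i.usesRs2Id && i.rdEx == i.rs2Id
    val hazardMemRs1 = i.usesRs1Id && i.rdMem == i.rs1Id
    val hazardMemRs2 = i.usesRs2Id && i.rdMem == i.rs2Id
    val exClause = (i.memoryReadEnableEx || i.jumpInstructionId) &&
      i.rdEx != 0 && (hazardExRs1 || hazardExRs2)
    val memClause = i.jumpInstructionId && i.memoryReadEnableMem &&
      i.rdMem != 0 && (hazardMemRs1 || hazardMemRs2)
    exClause || memClause
  }

  // Assumption: the ungated logic behaves as if both operands were always used.
  def stallUngated(i: Inputs): Boolean =
    stallGated(i.copy(usesRs1Id = true, usesRs2Id = true))

  def main(args: Array[String]): Unit = {
    // EX: lw a6, 0(a5) (rd = x16). ID: lw a7, 16(a5) (rs1 = x15, bits [24:20] = 16).
    val falseHazard = Inputs(rdEx = 16, rs1Id = 15, rs2Id = 16,
      usesRs1Id = true, usesRs2Id = false, memoryReadEnableEx = true)
    // EX: lw a6, 0(a5). ID: add x1, x16, x3, which really reads x16.
    val trueHazard = Inputs(rdEx = 16, rs1Id = 16, rs2Id = 3,
      usesRs1Id = true, usesRs2Id = true, memoryReadEnableEx = true)
    println(s"false hazard: ungated=${stallUngated(falseHazard)}, gated=${stallGated(falseHazard)}")
    println(s"true hazard : ungated=${stallUngated(trueHazard)}, gated=${stallGated(trueHazard)}")
  }
}
```

In this model the gated condition drops the stall only for the aliased-immediate case and still stalls when the ID-stage instruction genuinely reads the load's destination.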

@jserv jserv changed the title Fix/false load-use stall by gating rs1 amd rs2 harzard check Fix/false load-use stall by gating rs1 amd rs2 hazard check Dec 19, 2025

jserv commented Dec 19, 2025

Exercise 19 Intent: This is part of CA25 Exercise 19. The PR adds the usage signals which is good guidance, but should:

  1. Keep the exercise structure intact (placeholders remain for students)
  2. Add the signals as "hints" without solving the exercise completely

Since this PR is meant to be a complete fix, the Control.scala hazard logic needs updating.


jserv commented Dec 19, 2025

Consider contributing a test case demonstrating that the false stall is eliminated:

lw  x16, 16(x0)   # imm[4:0] = 16 = x16
add x1, x2, x3    # rs2 = x3, should NOT stall despite x16 match


jgw0915 commented Dec 24, 2025

So, in the requested change, I should put my full implementation of Control.scala in this PR (replace placeholders), modify camelCase to snake_case, append missing edge cases, and contribute false stall test cases, right?

@jgw0915 jgw0915 force-pushed the Control-Hazard-Logic branch from b7ace8c to 0b2c4ff Compare December 24, 2025 09:36

jserv commented Dec 25, 2025

So, in the requested change, I should put my full implementation of Control.scala in this PR (replace placeholders), modify camelCase to snake_case, append missing edge cases, and contribute false stall test cases, right?

Yes, that is the purpose of review.

@jgw0915 jgw0915 force-pushed the Control-Hazard-Logic branch from d148958 to bbd344d Compare December 25, 2025 09:30
jserv

This comment was marked as outdated.

@jgw0915 jgw0915 force-pushed the Control-Hazard-Logic branch 3 times, most recently from a6d445e to 81839a6 Compare December 25, 2025 10:19

jgw0915 commented Dec 25, 2025

I have added a test case for the false load-use stall in Section 11 of hazard_extended.S. Section 11 checks mem[0x38] == 3 because the test is designed to detect only the presence of an extra (spurious) stall, not to count absolute clock cycles.

In hazard_extended.S, Section 11 measures a local cycle window using two csrr cycle instructions surrounding two back-to-back lws:

csrr a2, cycle
lw   a6, 0(a5)
lw   a7, 16(a5)
csrr a3, cycle
sub  a3, a3, a2      # a3 = a3 - a2 (cycle delta)

Assume the first csrr a2, cycle reads cycle = 1000.

Between the two CSR reads, there are two intervening instructions (lw a6, 0(a5) and lw a7, 16(a5)), which execute in the next two cycles:

  • after first lw: cycle = 1001
  • after second lw: cycle = 1002

The second csrr a3, cycle itself executes in the following cycle and therefore reads cycle = 1003.

Thus, the measured delta is:

a3 - a2 = 1003 - 1000 = 3

This value (3) represents the expected baseline when no extra bubble is inserted.

The purpose of this test is to detect whether the hazard unit inserts an additional stall cycle due to a false-positive load-use hazard. If rs2 is incorrectly considered for I-type instructions (i.e., imm[4:0] is treated as rs2), the control logic asserts pc_stall / if_stall / id_flush, inserting one extra bubble. In that case, the second csrr is delayed by one more cycle and reads 1004, yielding:

a3 - a2 = 4

Therefore:

  • mem[0x38] == 3 ⇒ no false stall (correct behavior)
  • mem[0x38] == 4 ⇒ one spurious stall inserted (buggy behavior)

Section 11 is thus a regression test that isolates decode-level false hazard detection, rather than a test of absolute cycle timing.
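The arithmetic above can be checked with a small idealized model. This is a plain-Scala sketch, assuming one instruction advances the cycle counter by one unless a bubble is inserted in front of it; `CycleWindow` and its functions are hypothetical names, not part of the test suite.

```scala
object CycleWindow {
  // Cycle value seen by each instruction after the first csrr.
  // bubbleBefore(k) is true when a stall bubble precedes instruction k.
  def readCycles(start: Int, bubbleBefore: Seq[Boolean]): Seq[Int] =
    bubbleBefore.scanLeft(start) { (cycle, bubble) =>
      cycle + 1 + (if (bubble) 1 else 0)
    }

  // Difference between the second and the first csrr reads.
  def delta(bubbleBefore: Seq[Boolean]): Int = {
    val cycles = readCycles(1000, bubbleBefore)
    cycles.last - cycles.head
  }

  def main(args: Array[String]): Unit = {
    // Instructions after the first csrr: lw a6, lw a7, csrr a3.
    val noStall    = Seq(false, false, false)
    val falseStall = Seq(false, true, false) // bubble in front of lw a7
    println(s"no stall   : cycles=${readCycles(1000, noStall)}, delta=${delta(noStall)}")
    println(s"false stall: cycles=${readCycles(1000, falseStall)}, delta=${delta(falseStall)}")
  }
}
```

Because only the difference is stored, the check is independent of the absolute cycle count at which Section 11 starts.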

Here is the waveform observed without gating on uses_rs1 and uses_rs2. The false stall occurs at 301~305 ps:

((io.memory_read_enable_ex || io.jump_instruction_id) && // Either:
      // - Jump in ID needs register value, OR
      // - Load in EX (load-use hazard)
      (io.rd_ex =/= 0.U) &&                                 // Destination is not x0
      ((io.rd_ex === io.rs1_id) || (io.rd_ex === io.rs2_id))) // Destination matches ID source
[Image: waveform without gating]

After implementing the gating with uses_rs1 and uses_rs2, the false stall is successfully eliminated in the test case.

    ((io.memory_read_enable_ex || io.jump_instruction_id) && // Either:
      // - Jump in ID needs register value, OR
      // - Load in EX (load-use hazard)
      (io.rd_ex =/= 0.U) &&                                 // Destination is not x0
      ((io.uses_rs1_id && (io.rd_ex === io.rs1_id) || io.uses_rs2_id && (io.rd_ex === io.rs2_id)))) // Destination matches ID source
[Image: waveform with gating]

@jgw0915 jgw0915 requested a review from jserv December 25, 2025 12:52
jserv

This comment was marked as outdated.

@jgw0915 jgw0915 force-pushed the Control-Hazard-Logic branch from 81839a6 to e49f798 Compare December 25, 2025 15:33
@jgw0915 jgw0915 requested a review from jserv December 25, 2025 15:34
jserv

This comment was marked as resolved.

@jgw0915 jgw0915 force-pushed the Control-Hazard-Logic branch from e49f798 to 8ffca19 Compare December 26, 2025 02:07
@jgw0915 jgw0915 requested a review from jserv December 26, 2025 02:08

@jserv jserv left a comment

Run 'git rebase -i' to squash commits and enforce the rules described in https://cbea.ms/git-commit/ .

Read the above carefully!


jgw0915 commented Dec 26, 2025

My apologies, I thought the request was to shorten the commit message. Instead, I should squash the commits into one or a few meaningful commits, right?


jserv commented Dec 26, 2025

My apologies, I thought the request was to shorten the commit message. Instead, I should squash the commits into one or a few meaningful commits, right?

Don't repeat my words.

@jgw0915 jgw0915 force-pushed the Control-Hazard-Logic branch from 8ffca19 to c07e80c Compare December 26, 2025 03:18
@jgw0915 jgw0915 requested a review from jserv December 26, 2025 03:18
jserv

This comment was marked as resolved.

Load-use detection compared rd_ex against rs1_id/rs2_id without checking
whether the instruction actually consumes those operands. For I-type
loads, rs2 encodes imm[4:0]; when it matched rd_ex the control unit
stalled PC/IF and flushed ID, inserting a bubble even though rs2 was
not a real source register.

Decode previously exposed raw rs1/rs2 bits, so the hazard logic could
not distinguish immediates from architectural sources.

Add uses_rs1/uses_rs2 from decode and gate hazard checks on these flags.
Also incorporate mem-read and jump enables to avoid false positives
while preserving true hazards.

Section 11 of hazard_extended.S now reports mem[0x38] == 3 (delta 3),
rather than 4, indicating the extra bubble is removed.
@jgw0915 jgw0915 force-pushed the Control-Hazard-Logic branch from c07e80c to 5be6751 Compare December 26, 2025 04:16
@jgw0915 jgw0915 requested a review from jserv December 26, 2025 04:18
@jserv jserv merged commit 3ca52d2 into sysprog21:main Dec 26, 2025

jserv commented Dec 26, 2025

Thanks, @jgw0915, for contributing!