Skip to content

[LoopInterchange] Ignore the cost-model, force interchange if legal #148858

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Jul 18, 2025
Merged
24 changes: 20 additions & 4 deletions llvm/lib/Transforms/Scalar/LoopInterchange.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -78,6 +78,7 @@ enum class RuleTy {
PerLoopCacheAnalysis,
PerInstrOrderCost,
ForVectorization,
Ignore
};

} // end anonymous namespace
Expand Down Expand Up @@ -106,14 +107,20 @@ static cl::list<RuleTy> Profitabilities(
clEnumValN(RuleTy::PerInstrOrderCost, "instorder",
"Prioritize the IVs order of each instruction"),
clEnumValN(RuleTy::ForVectorization, "vectorize",
"Prioritize vectorization")));
"Prioritize vectorization"),
clEnumValN(RuleTy::Ignore, "ignore",
"Ignore profitability, force interchange (does not "
"work with other options)")));

#ifndef NDEBUG
static bool noDuplicateRules(ArrayRef<RuleTy> Rules) {
static bool noDuplicateRulesAndIgnore(ArrayRef<RuleTy> Rules) {
SmallSet<RuleTy, 4> Set;
for (RuleTy Rule : Rules)
for (RuleTy Rule : Rules) {
if (!Set.insert(Rule).second)
return false;
if (Rule == RuleTy::Ignore)
return false;
}
return true;
}

Expand Down Expand Up @@ -1286,6 +1293,13 @@ std::optional<bool> LoopInterchangeProfitability::isProfitableForVectorization(
bool LoopInterchangeProfitability::isProfitable(
const Loop *InnerLoop, const Loop *OuterLoop, unsigned InnerLoopId,
unsigned OuterLoopId, CharMatrix &DepMatrix, CacheCostManager &CCM) {

// Return true if interchange is forced and the cost-model ignored.
if (Profitabilities.size() == 1 && Profitabilities[0] == RuleTy::Ignore)
return true;
assert(noDuplicateRulesAndIgnore(Profitabilities) &&
"Duplicate rules and option 'ignore' are not allowed");

// isProfitable() is structured to avoid endless loop interchange. If the
// highest priority rule (isProfitablePerLoopCacheAnalysis by default) could
// decide the profitability then, profitability check will stop and return the
Expand All @@ -1294,7 +1308,6 @@ bool LoopInterchangeProfitability::isProfitable(
// second highest priority rule (isProfitablePerInstrOrderCost by default).
// Likewise, if it failed to analysis the profitability then only, the last
// rule (isProfitableForVectorization by default) will decide.
assert(noDuplicateRules(Profitabilities) && "Detect duplicate rules");
std::optional<bool> shouldInterchange;
for (RuleTy RT : Profitabilities) {
switch (RT) {
Expand All @@ -1311,6 +1324,9 @@ bool LoopInterchangeProfitability::isProfitable(
shouldInterchange =
isProfitableForVectorization(InnerLoopId, OuterLoopId, DepMatrix);
break;
case RuleTy::Ignore:
llvm_unreachable("Option 'ignore' is not supported with other options");
break;
}

// If this rule could determine the profitability, don't call subsequent
Expand Down
43 changes: 43 additions & 0 deletions llvm/test/Transforms/LoopInterchange/force-interchange.ll
Original file line number Diff line number Diff line change
@@ -0,0 +1,43 @@
; RUN: opt < %s -passes=loop-interchange -pass-remarks-output=%t -disable-output -loop-interchange-profitabilities=ignore -S
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

orce interchange can occur odd behavior in some cases. Can you add a test to show it? Here is an example:

int A[4][4][4];
for (int i = 0; i < 4; i++)
  for (int j = 1; j < 4; j++)
    for (int k = 0; k < 4; k++)
      A[3-i][j-1][k] = A[i][j][k];

IIUIC, the following scenario happens due to the bubble-sort based heuristic:

1. Try to interchange the 2nd loop (j-loop) and 3rd loop (k-loop). It is legal, so they are interchanged.

2. Try to interchange the 1st loop (i-loop) and 2nd loop (k-loop, because the j-loop and k-loop were interchanged). It is illegal, so they are NOT interchanged.

3. Try to interchange the 2nd loop (k-loop) and 3rd loop (j-loop). It is legal indeed, so they are interchanged again.

Consequently, the first interchange is reverted, and the final loop order remains the same as the original. I believe this is acceptable, considering that the force interchange option is intended for testing purposes. However, it is worth clarifying this behavior with an example.

Could you please add this? It's fine to do it in another PR.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, thanks, I will commit this and look at that suggestion early next week.
This allows me to run more testing over the weekend.
I have already found a couple of interchange problems, but can raise issues when this option is available.

; RUN: FileCheck --input-file=%t %s

; There should be no reason to interchange this, unless it is forced.
;
; for (int i = 0; i<1024; i++)
; for (int j = 0; j<1024; j++)
; A[i][j] = 42;
;
; CHECK: --- !Passed
; CHECK-NEXT: Pass: loop-interchange
; CHECK-NEXT: Name: Interchanged
; CHECK-NEXT: Function: f
; CHECK-NEXT: Args:
; CHECK-NEXT: - String: Loop interchanged with enclosing loop.
; CHECK-NEXT: ...

@A = dso_local local_unnamed_addr global [1024 x [1024 x i32]] zeroinitializer, align 4

define dso_local void @f() local_unnamed_addr #0 {
entry:
br label %outer.header

outer.header:
%i = phi i64 [ 0, %entry ], [ %i.next, %inner.header ]
br label %inner.body

inner.header:
%i.next = add nuw nsw i64 %i, 1
%exitcond20.not = icmp eq i64 %i.next, 1024
br i1 %exitcond20.not, label %exit, label %outer.header

inner.body:
%j = phi i64 [ 0, %outer.header ], [ %j.next, %inner.body ]
%arrayidx6 = getelementptr inbounds nuw [1024 x [1024 x i32]], ptr @A, i64 0, i64 %i, i64 %j
store i32 42, ptr %arrayidx6, align 4
%j.next = add nuw nsw i64 %j, 1
%exitcond.not = icmp eq i64 %j.next, 1024
br i1 %exitcond.not, label %inner.header, label %inner.body

exit:
ret void
}