
[CIR] Add BinOpOverflowOp and basic pointer arithmetic support #133118

Open · wants to merge 3 commits into base: main
59 changes: 59 additions & 0 deletions clang/include/clang/CIR/Dialect/IR/CIROps.td
@@ -889,6 +889,65 @@ def BinOp : CIR_Op<"binop", [Pure,
let hasVerifier = 1;
}


//===----------------------------------------------------------------------===//
// BinOpOverflowOp
//===----------------------------------------------------------------------===//

def BinOpOverflowKind : I32EnumAttr<
"BinOpOverflowKind",
"checked binary arithmetic operation kind",
[BinOpKind_Add, BinOpKind_Sub, BinOpKind_Mul]> {
let cppNamespace = "::cir";
}

def BinOpOverflowOp : CIR_Op<"binop.overflow", [Pure, SameTypeOperands]> {
let summary = "Perform binary integral arithmetic with overflow checking";
let description = [{
`cir.binop.overflow` performs binary arithmetic operations with overflow
checking on integral operands.

The `kind` argument specifies the kind of arithmetic operation to perform.
It can be either `add`, `sub`, or `mul`. The `lhs` and `rhs` arguments
specify the input operands of the arithmetic operation. The types of `lhs`
and `rhs` must be the same.

`cir.binop.overflow` produces two SSA values. `result` is the result of the
arithmetic operation truncated to its specified type. `overflow` is a
boolean value indicating whether overflow happens during the operation.

    The exact semantics of this operation are as follows:

- `lhs` and `rhs` are promoted to an imaginary integral type that has
infinite precision.
- The arithmetic operation is performed on the promoted operands.
- The infinite-precision result is truncated to the type of `result`. The
truncated result is assigned to `result`.
- If the truncated result is equal to the un-truncated result, `overflow`
is assigned to false. Otherwise, `overflow` is assigned to true.
}];

let arguments = (ins Arg<BinOpOverflowKind, "arithmetic kind">:$kind,
CIR_IntType:$lhs, CIR_IntType:$rhs);
let results = (outs CIR_IntType:$result, CIR_BoolType:$overflow);

let assemblyFormat = [{
`(` $kind `,` $lhs `,` $rhs `)` `:` type($lhs) `,`
`(` type($result) `,` type($overflow) `)`
attr-dict
}];

let builders = [
OpBuilder<(ins "cir::IntType":$resultTy,
"cir::BinOpOverflowKind":$kind,
"mlir::Value":$lhs,
"mlir::Value":$rhs), [{
auto overflowTy = cir::BoolType::get($_builder.getContext());
build($_builder, $_state, resultTy, overflowTy, kind, lhs, rhs);
}]>
];
}

//===----------------------------------------------------------------------===//
// GlobalOp
//===----------------------------------------------------------------------===//
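For illustration, here is a minimal sketch of creating the new op through the builder declared above. The helper function, value names, and exact headers are assumptions made for the example, not part of this patch.

#include "clang/CIR/Dialect/IR/CIRDialect.h" // assumed header declaring the CIR ops and types
#include "mlir/IR/Builders.h"

static void buildCheckedAdd(mlir::OpBuilder &builder, mlir::Location loc,
                            mlir::Value lhs, mlir::Value rhs) {
  // Both operands share the same CIR integer type; reuse it as the result type.
  auto resultTy = mlir::cast<cir::IntType>(lhs.getType());
  // Uses the OpBuilder declared in the TableGen definition above.
  auto op = builder.create<cir::BinOpOverflowOp>(
      loc, resultTy, cir::BinOpOverflowKind::Add, lhs, rhs);
  mlir::Value sum = op.getResult();          // arithmetic result, truncated to resultTy
  mlir::Value overflowed = op.getOverflow(); // cir.bool overflow flag
  (void)sum;
  (void)overflowed;
}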
1 change: 1 addition & 0 deletions clang/include/clang/CIR/MissingFeatures.h
@@ -109,6 +109,7 @@ struct MissingFeatures {
static bool cgFPOptionsRAII() { return false; }
static bool metaDataNode() { return false; }
static bool fastMathFlags() { return false; }
static bool vlas() { return false; }
Review comment (Collaborator):

Suggested change:
-  static bool vlas() { return false; }
+  static bool variableLengthArrays() { return false; }

vla has a handful of other meanings that this could get confused for. Also, it gives me an immediate gag-reflex to see vla in code, so spelling it out eases me into them better :)


// Missing types
static bool dataMemberType() { return false; }
125 changes: 122 additions & 3 deletions clang/lib/CIR/CodeGen/CIRGenExprScalar.cpp
@@ -936,8 +936,107 @@ getUnwidenedIntegerType(const ASTContext &astContext, const Expr *e) {
static mlir::Value emitPointerArithmetic(CIRGenFunction &cgf,
const BinOpInfo &op,
bool isSubtraction) {
cgf.cgm.errorNYI(op.loc, "pointer arithmetic");
return {};
// Must have binary (not unary) expr here. Unary pointer
// increment/decrement doesn't use this path.
const BinaryOperator *expr = cast<BinaryOperator>(op.e);

mlir::Value pointer = op.lhs;
Expr *pointerOperand = expr->getLHS();
mlir::Value index = op.rhs;
Expr *indexOperand = expr->getRHS();

// In a subtraction, the LHS is always the pointer.
if (!isSubtraction && !mlir::isa<cir::PointerType>(pointer.getType())) {
Review comment (Collaborator):

In the case of subtraction index is no longer accurate either, right? Since some_ptr - other_ptr is valid.

std::swap(pointer, index);
std::swap(pointerOperand, indexOperand);
}

bool isSigned = indexOperand->getType()->isSignedIntegerOrEnumerationType();

// Some versions of glibc and gcc use idioms (particularly in their malloc
// routines) that add a pointer-sized integer (known to be a pointer value)
// to a null pointer in order to cast the value back to an integer or as
// part of a pointer alignment algorithm. This is undefined behavior, but
// we'd like to be able to compile programs that use it.
//
// Normally, we'd generate a GEP with a null-pointer base here in response
// to that code, but it's also UB to dereference a pointer created that
// way. Instead (as an acknowledged hack to tolerate the idiom) we will
// generate a direct cast of the integer value to a pointer.
//
// The idiom (p = nullptr + N) is not met if any of the following are true:
//
// The operation is subtraction.
// The index is not pointer-sized.
// The pointer type is not byte-sized.
//
if (BinaryOperator::isNullPointerArithmeticExtension(
cgf.getContext(), op.opcode, expr->getLHS(), expr->getRHS()))
return cgf.getBuilder().createIntToPtr(index, pointer.getType());
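// Illustrative example, not part of this change: the tolerated idiom is roughly
//   void *p = (char *)0 + n;   // n is an integer known to hold a pointer value
// which the branch above lowers to a single int-to-pointer cast instead of a
// GEP with a null base.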

// Differently from LLVM codegen, ABI bits for index sizes is handled during
Review comment (Contributor):

This comment could use some clarification. I was going to say that I didn't think the "Differently from LLVM codegen" part of the comment added anything useful, but then before I even submitted my comment I found I had to go to the LLVM IR codegen to see what the comment was talking about. Anyway, here's my suggested rewording.

"The LLVM IR codegen compares the size of the index variable to what the data layout requires, and extends it if necessary. Like other ABI-specific processing, this is deferred until lowering to the LLVM dialect."

// LLVM lowering.

// If this is subtraction, negate the index.
if (isSubtraction)
index = cgf.getBuilder().createNeg(index);
Review comment (Collaborator):

as I mentioned above, this assumes that the index is integral, which isn't necessarily true. Or did we do a conversion somewhere?


if (cgf.sanOpts.has(SanitizerKind::ArrayBounds))
cgf.cgm.errorNYI("array bounds sanitizer");

const PointerType *pointerType =
pointerOperand->getType()->getAs<PointerType>();
if (!pointerType) {
cgf.cgm.errorNYI("ObjC");
Review comment (Collaborator):

would love this to be slightly more detailed of a message here. Just saying ObjC is pretty vague.

return {};
}

QualType elementType = pointerType->getPointeeType();
if (const VariableArrayType *vla =
cgf.getContext().getAsVariableArrayType(elementType)) {

// The element count here is the total number of non-VLA elements.
// TODO(cir): Get correct VLA size here
assert(!cir::MissingFeatures::vlas());
mlir::Value numElements = cgf.getBuilder().getConstAPInt(
cgf.getLoc(op.loc), cgf.getBuilder().getUInt64Ty(), llvm::APInt(64, 0));

// GEP indexes are signed, and scaling an index isn't permitted to
Review comment (Contributor):

The first line of this comment from classic codegen was lost in the transition to the incubator. I think it's relevant and should be added here. The wording of the comment is also a bit awkward. How about this?

// Effectively, the multiply by the VLA size is part of the GEP.
// GEP indexes are signed, and signed-overflow while scaling an
// index isn't permitted, so we set the NSW flag on our explicit
// multiply if overflow is undefined behavior.

Review comment (Collaborator):

there are a few extra words perhaps in Andy's suggestion, but I'm generally in favor of clarifying this.

// signed-overflow, so we use the same semantics for our explicit
// multiply. We suppress this if overflow is not undefined behavior.
mlir::Type elemTy = cgf.convertTypeForMem(vla->getElementType());

index = cgf.getBuilder().createCast(cir::CastKind::integral, index,
numElements.getType());
index = cgf.getBuilder().createMul(index.getLoc(), index, numElements);
Review comment (Contributor):

Based on the comment above (with or without my rewording), this should sunk into the if-else below and should be createNSWMul in the else case. That's what happens in classic codegen.


if (cgf.getLangOpts().isSignedOverflowDefined()) {
assert(!cir::MissingFeatures::ptrStrideOp());
cgf.cgm.errorNYI("pointer stride");
} else {
pointer = cgf.emitCheckedInBoundsGEP(elemTy, pointer, index, isSigned,
isSubtraction, op.e->getExprLoc());
}

return pointer;
}
// Explicitly handle GNU void* and function pointer arithmetic extensions. The
// GNU void* casts amount to no-ops since our void* type is i8*, but this is
// future proof.
mlir::Type elemTy;
if (elementType->isVoidType() || elementType->isFunctionType())
elemTy = cgf.UInt8Ty;
Review comment (Contributor):

This looks a bit suspect. Before the transition to opaque pointers in ClangIR, the code here in the classic codegen did this:

if (elementType->isVoidType() || elementType->isFunctionType()) {
  Value *result = CGF.EmitCastToVoidPtr(pointer);
  result = CGF.Builder.CreateGEP(CGF.Int8Ty, result, index, "add.ptr");
  return CGF.Builder.CreateBitCast(result, pointer->getType());
}

That looks much more consistent with the comment above. The original code here was added in this commit:

42a8cd3

We should probably look at the test cases there to make sure this is doing the right thing.

else
elemTy = cgf.convertTypeForMem(elementType);

if (cgf.getLangOpts().isSignedOverflowDefined()) {
assert(!cir::MissingFeatures::ptrStrideOp());
cgf.cgm.errorNYI("pointer stride");
return pointer;
}

return cgf.emitCheckedInBoundsGEP(elemTy, pointer, index, isSigned,
isSubtraction, op.e->getExprLoc());
}

mlir::Value ScalarExprEmitter::emitMul(const BinOpInfo &ops) {
@@ -1106,7 +1205,7 @@ mlir::Value ScalarExprEmitter::emitSub(const BinOpInfo &ops) {
// See more in `EmitSub` in CGExprScalar.cpp.
assert(!cir::MissingFeatures::ptrDiffOp());
cgf.cgm.errorNYI("ptrdiff");
return {};
return cgf.createDummyValue(loc, ops.fullType);
}

mlir::Value ScalarExprEmitter::emitShl(const BinOpInfo &ops) {
@@ -1402,3 +1501,23 @@ mlir::Value CIRGenFunction::emitScalarPrePostIncDec(const UnaryOperator *e,
return ScalarExprEmitter(*this, builder)
.emitScalarPrePostIncDec(e, lv, isInc, isPre);
}

mlir::Value CIRGenFunction::emitCheckedInBoundsGEP(
mlir::Type elemTy, mlir::Value ptr, ArrayRef<mlir::Value> idxList,
bool signedIndices, bool isSubtraction, SourceLocation loc) {
assert(!cir::MissingFeatures::ptrStrideOp());
if (idxList.size() != 1)
cgm.errorNYI("multi-index ptr arithmetic");

// TODO(cir): This should be a PtrStrideOp. For now we simply return the base
// pointer
mlir::Value gepVal = ptr;
Review comment (Contributor):

It seems like we should be emitting an NYI error here.


// If the pointer overflow sanitizer isn't enabled, do nothing.
if (!sanOpts.has(SanitizerKind::PointerOverflow))
return gepVal;

assert(!cir::MissingFeatures::pointerOverflowSanitizer());
cgm.errorNYI("pointer overflow sanitizer");
return gepVal;
}
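As context for the new emitPointerArithmetic path above, here is a minimal sketch of the source-level expressions it handles; names are illustrative and not taken from this patch's tests.

void pointer_arith_examples(int *p, long i) {
  int *q1 = p + i; // pointer + integer
  int *q2 = i + p; // integer + pointer: the operands are swapped so the pointer is on the left
  int *q3 = p - i; // pointer - integer: the index is negated before forming the GEP
  (void)q1; (void)q2; (void)q3;
  // Note: p - q (pointer minus pointer) is a pointer difference and goes through
  // emitSub's ptrdiff handling instead, which this patch still leaves as NYI.
}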
10 changes: 10 additions & 0 deletions clang/lib/CIR/CodeGen/CIRGenFunction.h
@@ -223,6 +223,16 @@ class CIRGenFunction : public CIRGenTypeCache {

void emitDecl(const clang::Decl &d);

/// Same as IRBuilder::CreateInBoundsGEP, but additionally emits a check to
/// detect undefined behavior when the pointer overflow sanitizer is enabled.
/// \p SignedIndices indicates whether any of the GEP indices are signed.
/// \p IsSubtraction indicates whether the expression used to form the GEP
/// is a subtraction.
mlir::Value emitCheckedInBoundsGEP(mlir::Type elemTy, mlir::Value ptr,
Review comment (Contributor):

I'm reorganizing the emit functions in #133017. Can we hold this until that is merged?

Review comment (Author):

Sure 👍

llvm::ArrayRef<mlir::Value> idxList,
bool signedIndices, bool isSubtraction,
SourceLocation loc);

void emitScalarInit(const clang::Expr *init, mlir::Location loc,
LValue lvalue, bool capturedByInit = false);

118 changes: 118 additions & 0 deletions clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp
@@ -20,6 +20,7 @@
#include "mlir/Dialect/LLVMIR/LLVMDialect.h"
#include "mlir/IR/BuiltinDialect.h"
#include "mlir/IR/BuiltinOps.h"
#include "mlir/IR/Types.h"
#include "mlir/Pass/Pass.h"
#include "mlir/Pass/PassManager.h"
#include "mlir/Target/LLVMIR/Dialect/Builtin/BuiltinToLLVMIRTranslation.h"
@@ -1117,6 +1118,122 @@ mlir::LogicalResult CIRToLLVMBinOpLowering::matchAndRewrite(
return mlir::LogicalResult::success();
}

mlir::LogicalResult CIRToLLVMBinOpOverflowOpLowering::matchAndRewrite(
cir::BinOpOverflowOp op, OpAdaptor adaptor,
mlir::ConversionPatternRewriter &rewriter) const {
mlir::Location loc = op.getLoc();
BinOpOverflowKind arithKind = op.getKind();
IntType operandTy = op.getLhs().getType();
IntType resultTy = op.getResult().getType();

EncompassedTypeInfo encompassedTyInfo =
computeEncompassedTypeWidth(operandTy, resultTy);
mlir::IntegerType encompassedLLVMTy =
rewriter.getIntegerType(encompassedTyInfo.width);

mlir::Value lhs = adaptor.getLhs();
mlir::Value rhs = adaptor.getRhs();
if (operandTy.getWidth() < encompassedTyInfo.width) {
if (operandTy.isSigned()) {
lhs = rewriter.create<mlir::LLVM::SExtOp>(loc, encompassedLLVMTy, lhs);
rhs = rewriter.create<mlir::LLVM::SExtOp>(loc, encompassedLLVMTy, rhs);
} else {
lhs = rewriter.create<mlir::LLVM::ZExtOp>(loc, encompassedLLVMTy, lhs);
rhs = rewriter.create<mlir::LLVM::ZExtOp>(loc, encompassedLLVMTy, rhs);
}
}

std::string intrinName = getLLVMIntrinName(arithKind, encompassedTyInfo.sign,
encompassedTyInfo.width);
auto intrinNameAttr = mlir::StringAttr::get(op.getContext(), intrinName);

mlir::IntegerType overflowLLVMTy = rewriter.getI1Type();
auto intrinRetTy = mlir::LLVM::LLVMStructType::getLiteral(
rewriter.getContext(), {encompassedLLVMTy, overflowLLVMTy});

auto callLLVMIntrinOp = rewriter.create<mlir::LLVM::CallIntrinsicOp>(
loc, intrinRetTy, intrinNameAttr, mlir::ValueRange{lhs, rhs});
mlir::Value intrinRet = callLLVMIntrinOp.getResult(0);

mlir::Value result = rewriter
.create<mlir::LLVM::ExtractValueOp>(
loc, intrinRet, ArrayRef<int64_t>{0})
.getResult();
mlir::Value overflow = rewriter
.create<mlir::LLVM::ExtractValueOp>(
loc, intrinRet, ArrayRef<int64_t>{1})
.getResult();

if (resultTy.getWidth() < encompassedTyInfo.width) {
mlir::Type resultLLVMTy = getTypeConverter()->convertType(resultTy);
auto truncResult =
rewriter.create<mlir::LLVM::TruncOp>(loc, resultLLVMTy, result);

// Extend the truncated result back to the encompassing type to check for
// any overflows during the truncation.
mlir::Value truncResultExt;
if (resultTy.isSigned())
truncResultExt = rewriter.create<mlir::LLVM::SExtOp>(
loc, encompassedLLVMTy, truncResult);
else
truncResultExt = rewriter.create<mlir::LLVM::ZExtOp>(
loc, encompassedLLVMTy, truncResult);
auto truncOverflow = rewriter.create<mlir::LLVM::ICmpOp>(
loc, mlir::LLVM::ICmpPredicate::ne, truncResultExt, result);

result = truncResult;
overflow = rewriter.create<mlir::LLVM::OrOp>(loc, overflow, truncOverflow);
}

mlir::Type boolLLVMTy =
getTypeConverter()->convertType(op.getOverflow().getType());
if (boolLLVMTy != rewriter.getI1Type())
overflow = rewriter.create<mlir::LLVM::ZExtOp>(loc, boolLLVMTy, overflow);

rewriter.replaceOp(op, mlir::ValueRange{result, overflow});

return mlir::success();
}
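// Illustrative shape of the lowering above for a 32-bit signed add where the
// operand and result widths already match (so no extension or truncation is
// emitted); the textual syntax is approximate:
//   %pair = llvm.call_intrinsic "llvm.sadd.with.overflow.i32"(%lhs, %rhs)
//             : (i32, i32) -> !llvm.struct<(i32, i1)>
//   %res  = llvm.extractvalue %pair[0] : !llvm.struct<(i32, i1)>
//   %ovf  = llvm.extractvalue %pair[1] : !llvm.struct<(i32, i1)>
// %ovf is then zero-extended if cir.bool converts to a wider integer type.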

std::string CIRToLLVMBinOpOverflowOpLowering::getLLVMIntrinName(
cir::BinOpOverflowKind opKind, bool isSigned, unsigned width) {
// The intrinsic name is `@llvm.{s|u}{opKind}.with.overflow.i{width}`

std::string name = "llvm.";
Review comment (Author):

I wonder if this should use llvm::raw_string_ostream or Twine.

Review comment (Contributor):

I don't think so, but it could use llvm::Intrinsic::getName() with logic to figure out the needed intrinsic ID instead of building the name as a composite.

Review comment (Member):

Good point, incubator's CIRGenFunction::emitBuiltinExpr has a few examples on that

Review comment (Collaborator):

There is definitely value to at least doing a reserve here if the above suggestions don't help.


if (isSigned)
name.push_back('s');
else
name.push_back('u');

switch (opKind) {
case cir::BinOpOverflowKind::Add:
name.append("add.");
break;
case cir::BinOpOverflowKind::Sub:
name.append("sub.");
break;
case cir::BinOpOverflowKind::Mul:
name.append("mul.");
break;
}

name.append("with.overflow.i");
name.append(std::to_string(width));

return name;
}

CIRToLLVMBinOpOverflowOpLowering::EncompassedTypeInfo
CIRToLLVMBinOpOverflowOpLowering::computeEncompassedTypeWidth(
cir::IntType operandTy, cir::IntType resultTy) {
bool sign = operandTy.getIsSigned() || resultTy.getIsSigned();
unsigned width =
std::max(operandTy.getWidth() + (sign && operandTy.isUnsigned()),
resultTy.getWidth() + (sign && resultTy.isUnsigned()));
return {sign, width};
}
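// Worked example (illustrative): operand type u16, result type s32.
//   sign  = isSigned(u16) || isSigned(s32)                   = true
//   width = max(16 + 1 /* u16 needs a sign bit */, 32 + 0)   = 32
// The lowering therefore zero-extends the 16-bit operands to i32, calls
// llvm.sadd.with.overflow.i32 (for an add), and skips the truncation step
// because the result type is already 32 bits wide.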

static void prepareTypeConverter(mlir::LLVMTypeConverter &converter,
mlir::DataLayout &dataLayout) {
converter.addConversion([&](cir::PointerType type) -> mlir::Type {
@@ -1256,6 +1373,7 @@ void ConvertCIRToLLVMPass::runOnOperation() {
patterns.add<
// clang-format off
CIRToLLVMBinOpLowering,
CIRToLLVMBinOpOverflowOpLowering,
CIRToLLVMBrCondOpLowering,
CIRToLLVMBrOpLowering,
CIRToLLVMFuncOpLowering,