[CIR] Add BinOpOverflowOp and basic pointer arithmetic support #133118
@@ -936,8 +936,107 @@ getUnwidenedIntegerType(const ASTContext &astContext, const Expr *e) {
static mlir::Value emitPointerArithmetic(CIRGenFunction &cgf,
                                         const BinOpInfo &op,
                                         bool isSubtraction) {
  cgf.cgm.errorNYI(op.loc, "pointer arithmetic");
  return {};
  // Must have binary (not unary) expr here. Unary pointer
  // increment/decrement doesn't use this path.
  const BinaryOperator *expr = cast<BinaryOperator>(op.e);

  mlir::Value pointer = op.lhs;
  Expr *pointerOperand = expr->getLHS();
  mlir::Value index = op.rhs;
  Expr *indexOperand = expr->getRHS();

  // In a subtraction, the LHS is always the pointer.
  if (!isSubtraction && !mlir::isa<cir::PointerType>(pointer.getType())) {
    std::swap(pointer, index);
    std::swap(pointerOperand, indexOperand);
  }

  bool isSigned = indexOperand->getType()->isSignedIntegerOrEnumerationType();

  // Some versions of glibc and gcc use idioms (particularly in their malloc
  // routines) that add a pointer-sized integer (known to be a pointer value)
  // to a null pointer in order to cast the value back to an integer or as
  // part of a pointer alignment algorithm. This is undefined behavior, but
  // we'd like to be able to compile programs that use it.
  //
  // Normally, we'd generate a GEP with a null-pointer base here in response
  // to that code, but it's also UB to dereference a pointer created that
  // way. Instead (as an acknowledged hack to tolerate the idiom) we will
  // generate a direct cast of the integer value to a pointer.
  //
  // The idiom (p = nullptr + N) is not met if any of the following are true:
  //
  // The operation is subtraction.
  // The index is not pointer-sized.
  // The pointer type is not byte-sized.
  //
  if (BinaryOperator::isNullPointerArithmeticExtension(
          cgf.getContext(), op.opcode, expr->getLHS(), expr->getRHS()))
    return cgf.getBuilder().createIntToPtr(index, pointer.getType());

  // Differently from LLVM codegen, ABI bits for index sizes is handled during
  // LLVM lowering.

Review comment: This comment could use some clarification. I was going to say that I didn't think the "Differently from LLVM codegen" part of the comment added anything useful, but then before I even submitted my comment I found I had to go to the LLVM IR codegen to see what the comment was talking about. Anyway, here's my suggested rewording: "The LLVM IR codegen compares the size of the index variable to what the data layout requires, and extends it if necessary. Like other ABI-specific processing, this is deferred until lowering to the LLVM dialect."

  // If this is subtraction, negate the index.
  if (isSubtraction)
    index = cgf.getBuilder().createNeg(index);

Review comment: As I mentioned above, this assumes that the …

  if (cgf.sanOpts.has(SanitizerKind::ArrayBounds))
    cgf.cgm.errorNYI("array bounds sanitizer");

  const PointerType *pointerType =
      pointerOperand->getType()->getAs<PointerType>();
  if (!pointerType) {
    cgf.cgm.errorNYI("ObjC");

Review comment: Would love this to be a slightly more detailed message here. Just saying …

    return {};
  }

  QualType elementType = pointerType->getPointeeType();
  if (const VariableArrayType *vla =
          cgf.getContext().getAsVariableArrayType(elementType)) {
    // The element count here is the total number of non-VLA elements.
    // TODO(cir): Get correct VLA size here
    assert(!cir::MissingFeatures::vlas());
    mlir::Value numElements = cgf.getBuilder().getConstAPInt(
        cgf.getLoc(op.loc), cgf.getBuilder().getUInt64Ty(), llvm::APInt(64, 0));

    // GEP indexes are signed, and scaling an index isn't permitted to
    // signed-overflow, so we use the same semantics for our explicit
    // multiply. We suppress this if overflow is not undefined behavior.
    mlir::Type elemTy = cgf.convertTypeForMem(vla->getElementType());

Review comment: The first line of this comment from classic codegen was lost in the transition to the incubator. I think it's relevant and should be added here. The wording of the comment is also a bit awkward. How about this? "// Effectively, the multiply by the VLA size is part of the GEP."

Reply: There are a few extra words perhaps in Andy's suggestion, but I'm generally in favor of clarifying this.

    index = cgf.getBuilder().createCast(cir::CastKind::integral, index,
                                        numElements.getType());
    index = cgf.getBuilder().createMul(index.getLoc(), index, numElements);

Review comment: Based on the comment above (with or without my rewording), this should be sunk into the if-else below and should be …

    if (cgf.getLangOpts().isSignedOverflowDefined()) {
      assert(!cir::MissingFeatures::ptrStrideOp());
      cgf.cgm.errorNYI("pointer stride");
    } else {
      pointer = cgf.emitCheckedInBoundsGEP(elemTy, pointer, index, isSigned,
                                           isSubtraction, op.e->getExprLoc());
    }

    return pointer;
  }

  // Explicitly handle GNU void* and function pointer arithmetic extensions. The
  // GNU void* casts amount to no-ops since our void* type is i8*, but this is
  // future proof.
  mlir::Type elemTy;
  if (elementType->isVoidType() || elementType->isFunctionType())
    elemTy = cgf.UInt8Ty;
  else
    elemTy = cgf.convertTypeForMem(elementType);

Review comment: This looks a bit suspect. Before the transition to opaque pointers in ClangIR, the code here in the classic codegen did this: … That looks much more consistent with the comment above. The original code here was added in this commit: … We should probably look at the test cases there to make sure this is doing the right thing.

  if (cgf.getLangOpts().isSignedOverflowDefined()) {
    assert(!cir::MissingFeatures::ptrStrideOp());
    cgf.cgm.errorNYI("pointer stride");
    return pointer;
  }

  return cgf.emitCheckedInBoundsGEP(elemTy, pointer, index, isSigned,
                                    isSubtraction, op.e->getExprLoc());
}
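
For orientation, here is a minimal source-level sketch (illustrative only; the function and variable names below are invented, not part of this PR) of the expression forms emitPointerArithmetic is meant to handle:

#include <cstddef>
#include <cstdint>

void pointer_arith_examples(int *p, std::ptrdiff_t i, std::uintptr_t n) {
  int *a = p + i;  // pointer + index
  int *b = i + p;  // index + pointer: operands are swapped above
  int *c = p - i;  // subtraction: the index is negated before the GEP
  // Tolerated glibc/gcc idiom (undefined behavior, but accepted): a null
  // pointer plus a pointer-sized integer is emitted as a plain
  // int-to-pointer cast instead of a GEP.
  char *d = (char *)0 + n;
  (void)a; (void)b; (void)c; (void)d;
}

Pointer arithmetic on variable-length-array element types (the VLA branch above) additionally scales the index by the runtime element count, which is still a placeholder in this patch.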

mlir::Value ScalarExprEmitter::emitMul(const BinOpInfo &ops) {
@@ -1106,7 +1205,7 @@ mlir::Value ScalarExprEmitter::emitSub(const BinOpInfo &ops) {
  // See more in `EmitSub` in CGExprScalar.cpp.
  assert(!cir::MissingFeatures::ptrDiffOp());
  cgf.cgm.errorNYI("ptrdiff");
  return {};
  return cgf.createDummyValue(loc, ops.fullType);
}

mlir::Value ScalarExprEmitter::emitShl(const BinOpInfo &ops) {
@@ -1402,3 +1501,23 @@ mlir::Value CIRGenFunction::emitScalarPrePostIncDec(const UnaryOperator *e,
  return ScalarExprEmitter(*this, builder)
      .emitScalarPrePostIncDec(e, lv, isInc, isPre);
}

mlir::Value CIRGenFunction::emitCheckedInBoundsGEP(
    mlir::Type elemTy, mlir::Value ptr, ArrayRef<mlir::Value> idxList,
    bool signedIndices, bool isSubtraction, SourceLocation loc) {
  assert(!cir::MissingFeatures::ptrStrideOp());
  if (idxList.size() != 1)
    cgm.errorNYI("multi-index ptr arithmetic");

  // TODO(cir): This should be a PtrStrideOp. For now we simply return the base
  // pointer.
  mlir::Value gepVal = ptr;

Review comment: It seems like we should be emitting an NYI error here.

  // If the pointer overflow sanitizer isn't enabled, do nothing.
  if (!sanOpts.has(SanitizerKind::PointerOverflow))
    return gepVal;

  assert(!cir::MissingFeatures::pointerOverflowSanitizer());
  cgm.errorNYI("pointer overflow sanitizer");
  return gepVal;
}
@@ -223,6 +223,16 @@ class CIRGenFunction : public CIRGenTypeCache {

  void emitDecl(const clang::Decl &d);

  /// Same as IRBuilder::CreateInBoundsGEP, but additionally emits a check to
  /// detect undefined behavior when the pointer overflow sanitizer is enabled.
  /// \p SignedIndices indicates whether any of the GEP indices are signed.
  /// \p IsSubtraction indicates whether the expression used to form the GEP
  /// is a subtraction.
  mlir::Value emitCheckedInBoundsGEP(mlir::Type elemTy, mlir::Value ptr,
                                     llvm::ArrayRef<mlir::Value> idxList,
                                     bool signedIndices, bool isSubtraction,
                                     SourceLocation loc);

Review comment: I'm reorganizing the emit functions in #133017. Can we hold this until that is merged?

Reply: Sure 👍
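
For context, a hedged usage sketch mirroring the call site in emitPointerArithmetic (every name here other than emitCheckedInBoundsGEP itself is a placeholder):

  // Single-index GEP; the pointer-overflow check is only added when the
  // corresponding sanitizer is enabled.
  mlir::Value elementPtr =
      emitCheckedInBoundsGEP(elemTy, basePtr, {index},
                             /*signedIndices=*/true,
                             /*isSubtraction=*/false, expr->getExprLoc());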

  void emitScalarInit(const clang::Expr *init, mlir::Location loc,
                      LValue lvalue, bool capturedByInit = false);
@@ -20,6 +20,7 @@
#include "mlir/Dialect/LLVMIR/LLVMDialect.h" | ||
#include "mlir/IR/BuiltinDialect.h" | ||
#include "mlir/IR/BuiltinOps.h" | ||
#include "mlir/IR/Types.h" | ||
#include "mlir/Pass/Pass.h" | ||
#include "mlir/Pass/PassManager.h" | ||
#include "mlir/Target/LLVMIR/Dialect/Builtin/BuiltinToLLVMIRTranslation.h" | ||
|
@@ -1117,6 +1118,122 @@ mlir::LogicalResult CIRToLLVMBinOpLowering::matchAndRewrite(
  return mlir::LogicalResult::success();
}

mlir::LogicalResult CIRToLLVMBinOpOverflowOpLowering::matchAndRewrite(
    cir::BinOpOverflowOp op, OpAdaptor adaptor,
    mlir::ConversionPatternRewriter &rewriter) const {
  mlir::Location loc = op.getLoc();
  BinOpOverflowKind arithKind = op.getKind();
  IntType operandTy = op.getLhs().getType();
  IntType resultTy = op.getResult().getType();

  EncompassedTypeInfo encompassedTyInfo =
      computeEncompassedTypeWidth(operandTy, resultTy);
  mlir::IntegerType encompassedLLVMTy =
      rewriter.getIntegerType(encompassedTyInfo.width);

  mlir::Value lhs = adaptor.getLhs();
  mlir::Value rhs = adaptor.getRhs();
  if (operandTy.getWidth() < encompassedTyInfo.width) {
    if (operandTy.isSigned()) {
      lhs = rewriter.create<mlir::LLVM::SExtOp>(loc, encompassedLLVMTy, lhs);
      rhs = rewriter.create<mlir::LLVM::SExtOp>(loc, encompassedLLVMTy, rhs);
    } else {
      lhs = rewriter.create<mlir::LLVM::ZExtOp>(loc, encompassedLLVMTy, lhs);
      rhs = rewriter.create<mlir::LLVM::ZExtOp>(loc, encompassedLLVMTy, rhs);
    }
  }

  std::string intrinName = getLLVMIntrinName(arithKind, encompassedTyInfo.sign,
                                             encompassedTyInfo.width);
  auto intrinNameAttr = mlir::StringAttr::get(op.getContext(), intrinName);

  mlir::IntegerType overflowLLVMTy = rewriter.getI1Type();
  auto intrinRetTy = mlir::LLVM::LLVMStructType::getLiteral(
      rewriter.getContext(), {encompassedLLVMTy, overflowLLVMTy});

  auto callLLVMIntrinOp = rewriter.create<mlir::LLVM::CallIntrinsicOp>(
      loc, intrinRetTy, intrinNameAttr, mlir::ValueRange{lhs, rhs});
  mlir::Value intrinRet = callLLVMIntrinOp.getResult(0);

  mlir::Value result = rewriter
                           .create<mlir::LLVM::ExtractValueOp>(
                               loc, intrinRet, ArrayRef<int64_t>{0})
                           .getResult();
  mlir::Value overflow = rewriter
                             .create<mlir::LLVM::ExtractValueOp>(
                                 loc, intrinRet, ArrayRef<int64_t>{1})
                             .getResult();

  if (resultTy.getWidth() < encompassedTyInfo.width) {
    mlir::Type resultLLVMTy = getTypeConverter()->convertType(resultTy);
    auto truncResult =
        rewriter.create<mlir::LLVM::TruncOp>(loc, resultLLVMTy, result);

    // Extend the truncated result back to the encompassing type to check for
    // any overflows during the truncation.
    mlir::Value truncResultExt;
    if (resultTy.isSigned())
      truncResultExt = rewriter.create<mlir::LLVM::SExtOp>(
          loc, encompassedLLVMTy, truncResult);
    else
      truncResultExt = rewriter.create<mlir::LLVM::ZExtOp>(
          loc, encompassedLLVMTy, truncResult);
    auto truncOverflow = rewriter.create<mlir::LLVM::ICmpOp>(
        loc, mlir::LLVM::ICmpPredicate::ne, truncResultExt, result);

    result = truncResult;
    overflow = rewriter.create<mlir::LLVM::OrOp>(loc, overflow, truncOverflow);
  }

  mlir::Type boolLLVMTy =
      getTypeConverter()->convertType(op.getOverflow().getType());
  if (boolLLVMTy != rewriter.getI1Type())
    overflow = rewriter.create<mlir::LLVM::ZExtOp>(loc, boolLLVMTy, overflow);

  rewriter.replaceOp(op, mlir::ValueRange{result, overflow});

  return mlir::success();
}
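
For a concrete picture of what this pattern handles, here is a hedged source-level example, assuming (as in the ClangIR incubator) that cir::BinOpOverflowOp is emitted for the __builtin_*_overflow family; this PR only adds the op's LLVM lowering, so the corresponding CIRGen may land separately:

// Expected to lower, via cir::BinOpOverflowOp, to a call to
// @llvm.sadd.with.overflow.i32 followed by extractvalue of the result
// (index 0) and the overflow flag (index 1).
bool checked_add(int a, int b, int *out) {
  return __builtin_add_overflow(a, b, out);
}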

std::string CIRToLLVMBinOpOverflowOpLowering::getLLVMIntrinName(
    cir::BinOpOverflowKind opKind, bool isSigned, unsigned width) {
  // The intrinsic name is `@llvm.{s|u}{opKind}.with.overflow.i{width}`

  std::string name = "llvm.";

Review comment: I wonder if this should use …

Reply: I don't think so, but it could use llvm::Intrinsic::getName() with logic to figure out the needed intrinsic ID instead of building the name as a composite.

Reply: Good point, incubator's …

Reply: There is definitely value to at least doing a …

  if (isSigned)
    name.push_back('s');
  else
    name.push_back('u');

  switch (opKind) {
  case cir::BinOpOverflowKind::Add:
    name.append("add.");
    break;
  case cir::BinOpOverflowKind::Sub:
    name.append("sub.");
    break;
  case cir::BinOpOverflowKind::Mul:
    name.append("mul.");
    break;
  }

  name.append("with.overflow.i");
  name.append(std::to_string(width));

  return name;
}
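
Worked examples of the names this helper builds, read straight off the logic above:

// (Add, isSigned=true,  width=32) -> "llvm.sadd.with.overflow.i32"
// (Sub, isSigned=false, width=16) -> "llvm.usub.with.overflow.i16"
// (Mul, isSigned=false, width=33) -> "llvm.umul.with.overflow.i33"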

CIRToLLVMBinOpOverflowOpLowering::EncompassedTypeInfo
CIRToLLVMBinOpOverflowOpLowering::computeEncompassedTypeWidth(
    cir::IntType operandTy, cir::IntType resultTy) {
  bool sign = operandTy.getIsSigned() || resultTy.getIsSigned();
  unsigned width =
      std::max(operandTy.getWidth() + (sign && operandTy.isUnsigned()),
               resultTy.getWidth() + (sign && resultTy.isUnsigned()));
  return {sign, width};
}
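
A few worked values for the encompassing-type computation above (s/u denote signed/unsigned cir.int widths):

// operand u32, result s32 -> sign = true,  width = max(32 + 1, 32 + 0) = 33
// operand s16, result s32 -> sign = true,  width = max(16 + 0, 32 + 0) = 32
// operand u32, result u64 -> sign = false, width = max(32 + 0, 64 + 0) = 64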

static void prepareTypeConverter(mlir::LLVMTypeConverter &converter,
                                 mlir::DataLayout &dataLayout) {
  converter.addConversion([&](cir::PointerType type) -> mlir::Type {

@@ -1256,6 +1373,7 @@ void ConvertCIRToLLVMPass::runOnOperation() {
  patterns.add<
      // clang-format off
      CIRToLLVMBinOpLowering,
      CIRToLLVMBinOpOverflowOpLowering,
      CIRToLLVMBrCondOpLowering,
      CIRToLLVMBrOpLowering,
      CIRToLLVMFuncOpLowering,

Review comment: vla has a handful of other meanings that this could get confused for. Also, it gives me an immediate gag-reflex to see vla in code, so spelling it out eases me into them better :)