[LangRef] Require that vscale be a power of two #145098
Conversation
This change proposes that we require vscale to be a power of two. This is already true for both in-tree backends which support scalable types.

We had two mechanisms for exposing this to the optimizer. First, we specify that any function with vscale_range must have a power-of-two vscale. In practice, clang always emits this attribute for RISCV and AArch64. Note there's a semantic oddity here: vscale is required to be a global constant, and yet attributes are per function. Second, we have a TTI hook. Again, both targets which support scalable vectors returned true for this hook.

This change removes both the TTI hook and the vscale_range check in ValueTracking (since they're no longer needed), and deletes the very small amount of code which becomes dead once the results of the TTI hook are constant. As can be seen from the test diffs, at least some places hadn't been using the existing hooks. This might be an artifact of the way the tests were written; I haven't checked closely.

The biggest argument against this change is that we might someday want to support a non-power-of-two vscale. I argue that this is false generality, and we should simplify for near-term reality. If a piece of hardware comes along with a register size which is not a power of two (as the original SVE specification allowed before it was revised), we can reverse this change at that time.
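Several of the test diffs below show `urem` by a shifted vscale being rewritten into bit masks. The underlying identity is only valid for power-of-two divisors, which is exactly what this change guarantees. A minimal sketch of that identity (plain Python, not LLVM code; the function name is illustrative):

```python
def urem_as_and(x, p):
    # Valid only when p is a power of two: x % p == x & (p - 1).
    assert p > 0 and (p & (p - 1)) == 0, "p must be a power of two"
    return x & (p - 1)

# With vscale guaranteed to be a power of two, shifts of vscale are too,
# so a urem by such a value can become a cheaper bitwise and.
for vscale in (1, 2, 4, 8, 16):
    shift = vscale << 2  # mirrors "shl nuw nsw i64 %vscale, 2" in the tests
    assert urem_as_and(1024, shift) == 1024 % shift
```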
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-llvm-ir Author: Philip Reames (preames). Changes: as described above. Patch is 118.44 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/145098.diff 26 Files Affected:
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index cc72a37f68599..eaee1d177986b 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -4442,10 +4442,11 @@ elementtype may be any integer, floating-point, pointer type, or a sized
target extension type that has the ``CanBeVectorElement`` property. Vectors
of size zero are not allowed. For scalable vectors, the total number of
elements is a constant multiple (called vscale) of the specified number
-of elements; vscale is a positive integer that is unknown at compile time
-and the same hardware-dependent constant for all scalable vectors at run
-time. The size of a specific scalable vector type is thus constant within
-IR, even if the exact size in bytes cannot be determined until run time.
+of elements; vscale is a positive power-of-two integer that is unknown
+at compile time and the same hardware-dependent constant for all scalable
+vectors at run time. The size of a specific scalable vector type is thus
+constant within IR, even if the exact size in bytes cannot be determined
+until run time.
:Examples:
@@ -30398,8 +30399,8 @@ vectors such as ``<vscale x 16 x i8>``.
Semantics:
""""""""""
-``vscale`` is a positive value that is constant throughout program
-execution, but is unknown at compile time.
+``vscale`` is a positive power-of-two value that is constant throughout
+program execution, but is unknown at compile time.
If the result value does not fit in the result type, then the result is
a :ref:`poison value <poisonvalues>`.
diff --git a/llvm/include/llvm/Analysis/TargetTransformInfo.h b/llvm/include/llvm/Analysis/TargetTransformInfo.h
index ba47cef274bec..d322f8a3dbc03 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfo.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfo.h
@@ -1220,9 +1220,6 @@ class TargetTransformInfo {
/// \return the value of vscale to tune the cost model for.
LLVM_ABI std::optional<unsigned> getVScaleForTuning() const;
- /// \return true if vscale is known to be a power of 2
- LLVM_ABI bool isVScaleKnownToBeAPowerOfTwo() const;
-
/// \return True if the vectorization factor should be chosen to
/// make the vector of the smallest element type match the size of a
/// vector register. For wider element types, this could result in
diff --git a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
index 640766cf8cd10..c5fde665274f3 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
@@ -591,7 +591,6 @@ class TargetTransformInfoImplBase {
virtual std::optional<unsigned> getVScaleForTuning() const {
return std::nullopt;
}
- virtual bool isVScaleKnownToBeAPowerOfTwo() const { return false; }
virtual bool
shouldMaximizeVectorBandwidth(TargetTransformInfo::RegisterKind K) const {
diff --git a/llvm/include/llvm/CodeGen/BasicTTIImpl.h b/llvm/include/llvm/CodeGen/BasicTTIImpl.h
index 90a75c3d352e4..5fac89d7c31c9 100644
--- a/llvm/include/llvm/CodeGen/BasicTTIImpl.h
+++ b/llvm/include/llvm/CodeGen/BasicTTIImpl.h
@@ -864,7 +864,6 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {
std::optional<unsigned> getVScaleForTuning() const override {
return std::nullopt;
}
- bool isVScaleKnownToBeAPowerOfTwo() const override { return false; }
/// Estimate the overhead of scalarizing an instruction. Insert and Extract
/// are set if the demanded result elements need to be inserted and/or
diff --git a/llvm/include/llvm/CodeGen/TargetLowering.h b/llvm/include/llvm/CodeGen/TargetLowering.h
index dd44afd0855a5..e48e6e1fe2c19 100644
--- a/llvm/include/llvm/CodeGen/TargetLowering.h
+++ b/llvm/include/llvm/CodeGen/TargetLowering.h
@@ -623,9 +623,6 @@ class LLVM_ABI TargetLoweringBase {
return BypassSlowDivWidths;
}
- /// Return true only if vscale must be a power of two.
- virtual bool isVScaleKnownToBeAPowerOfTwo() const { return false; }
-
/// Return true if Flow Control is an expensive operation that should be
/// avoided.
bool isJumpExpensive() const { return JumpIsExpensive; }
diff --git a/llvm/lib/Analysis/TargetTransformInfo.cpp b/llvm/lib/Analysis/TargetTransformInfo.cpp
index 8cc7f8a9d2ab2..ef26588d4f27a 100644
--- a/llvm/lib/Analysis/TargetTransformInfo.cpp
+++ b/llvm/lib/Analysis/TargetTransformInfo.cpp
@@ -795,10 +795,6 @@ std::optional<unsigned> TargetTransformInfo::getVScaleForTuning() const {
return TTIImpl->getVScaleForTuning();
}
-bool TargetTransformInfo::isVScaleKnownToBeAPowerOfTwo() const {
- return TTIImpl->isVScaleKnownToBeAPowerOfTwo();
-}
-
bool TargetTransformInfo::shouldMaximizeVectorBandwidth(
TargetTransformInfo::RegisterKind K) const {
return TTIImpl->shouldMaximizeVectorBandwidth(K);
diff --git a/llvm/lib/Analysis/ValueTracking.cpp b/llvm/lib/Analysis/ValueTracking.cpp
index 73320b556f825..5252eef3b298f 100644
--- a/llvm/lib/Analysis/ValueTracking.cpp
+++ b/llvm/lib/Analysis/ValueTracking.cpp
@@ -2474,11 +2474,9 @@ bool llvm::isKnownToBeAPowerOfTwo(const Value *V, bool OrZero,
if (!I)
return false;
- if (Q.CxtI && match(V, m_VScale())) {
- const Function *F = Q.CxtI->getFunction();
- // The vscale_range indicates vscale is a power-of-two.
- return F->hasFnAttribute(Attribute::VScaleRange);
- }
+ // vscale is a power-of-two by definition
+ if (match(V, m_VScale()))
+ return true;
// 1 << X is clearly a power of two if the one is not shifted off the end. If
// it is shifted off the end then the result is undefined.
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
index 5d8db8be9731f..1568875cb37ce 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -4660,7 +4660,6 @@ bool SelectionDAG::isKnownToBeAPowerOfTwo(SDValue Val, unsigned Depth) const {
// vscale(power-of-two) is a power-of-two for some targets
if (Val.getOpcode() == ISD::VSCALE &&
- getTargetLoweringInfo().isVScaleKnownToBeAPowerOfTwo() &&
isKnownToBeAPowerOfTwo(Val.getOperand(0), Depth + 1))
return true;
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.h b/llvm/lib/Target/AArch64/AArch64ISelLowering.h
index e0b6c1b8c0baf..6aa8d7afffc01 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.h
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.h
@@ -517,8 +517,6 @@ class AArch64TargetLowering : public TargetLowering {
SDValue Chain, SDValue InGlue, unsigned Condition,
SDValue PStateSM = SDValue()) const;
- bool isVScaleKnownToBeAPowerOfTwo() const override { return true; }
-
// Normally SVE is only used for byte size vectors that do not fit within a
// NEON vector. This changes when OverrideNEON is true, allowing SVE to be
// used for 64bit and 128bit vectors as well.
diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
index 470af01be3154..62c86db2fa9c0 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
@@ -156,8 +156,6 @@ class AArch64TTIImpl final : public BasicTTIImplBase<AArch64TTIImpl> {
return ST->getVScaleForTuning();
}
- bool isVScaleKnownToBeAPowerOfTwo() const override { return true; }
-
bool shouldMaximizeVectorBandwidth(
TargetTransformInfo::RegisterKind K) const override;
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index 139fa7ba35625..d7c065bf9527f 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -23434,18 +23434,6 @@ const MCExpr *RISCVTargetLowering::LowerCustomJumpTableEntry(
return MCSymbolRefExpr::create(MBB->getSymbol(), Ctx);
}
-bool RISCVTargetLowering::isVScaleKnownToBeAPowerOfTwo() const {
- // We define vscale to be VLEN/RVVBitsPerBlock. VLEN is always a power
- // of two >= 64, and RVVBitsPerBlock is 64. Thus, vscale must be
- // a power of two as well.
- // FIXME: This doesn't work for zve32, but that's already broken
- // elsewhere for the same reason.
- assert(Subtarget.getRealMinVLen() >= 64 && "zve32* unsupported");
- static_assert(RISCV::RVVBitsPerBlock == 64,
- "RVVBitsPerBlock changed, audit needed");
- return true;
-}
-
bool RISCVTargetLowering::getIndexedAddressParts(SDNode *Op, SDValue &Base,
SDValue &Offset,
ISD::MemIndexedMode &AM,
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.h b/llvm/lib/Target/RISCV/RISCVISelLowering.h
index f67d7f155c9d0..a4f9fe088f38d 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.h
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.h
@@ -394,8 +394,6 @@ class RISCVTargetLowering : public TargetLowering {
unsigned uid,
MCContext &Ctx) const override;
- bool isVScaleKnownToBeAPowerOfTwo() const override;
-
bool getIndexedAddressParts(SDNode *Op, SDValue &Base, SDValue &Offset,
ISD::MemIndexedMode &AM, SelectionDAG &DAG) const;
bool getPreIndexedAddressParts(SDNode *N, SDValue &Base, SDValue &Offset,
diff --git a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h
index 83ac71ed9da69..1afc9c659276b 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h
+++ b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h
@@ -335,10 +335,6 @@ class RISCVTTIImpl final : public BasicTTIImplBase<RISCVTTIImpl> {
bool isLegalMaskedCompressStore(Type *DataTy, Align Alignment) const override;
- bool isVScaleKnownToBeAPowerOfTwo() const override {
- return TLI->isVScaleKnownToBeAPowerOfTwo();
- }
-
/// \returns How the target needs this vector-predicated operation to be
/// transformed.
TargetTransformInfo::VPLegalization
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index f28c2ce0acc98..dcb18cffbeada 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -2429,20 +2429,6 @@ Value *InnerLoopVectorizer::createIterationCountCheck(ElementCount VF,
// check is known to be true, or known to be false.
CheckMinIters = Builder.CreateICmp(P, Count, Step, "min.iters.check");
} // else step known to be < trip count, use CheckMinIters preset to false.
- } else if (VF.isScalable() && !TTI->isVScaleKnownToBeAPowerOfTwo() &&
- !isIndvarOverflowCheckKnownFalse(Cost, VF, UF) &&
- Style != TailFoldingStyle::DataAndControlFlowWithoutRuntimeCheck) {
- // vscale is not necessarily a power-of-2, which means we cannot guarantee
- // an overflow to zero when updating induction variables and so an
- // additional overflow check is required before entering the vector loop.
-
- // Get the maximum unsigned value for the type.
- Value *MaxUIntTripCount =
- ConstantInt::get(CountTy, cast<IntegerType>(CountTy)->getMask());
- Value *LHS = Builder.CreateSub(MaxUIntTripCount, Count);
-
- // Don't execute the vector loop if (UMax - n) < (VF * UF).
- CheckMinIters = Builder.CreateICmp(ICmpInst::ICMP_ULT, LHS, CreateStep());
}
return CheckMinIters;
}
@@ -3830,7 +3816,7 @@ LoopVectorizationCostModel::computeMaxVF(ElementCount UserVF, unsigned UserIC) {
MaxFactors.FixedVF.getFixedValue();
if (MaxFactors.ScalableVF) {
std::optional<unsigned> MaxVScale = getMaxVScale(*TheFunction, TTI);
- if (MaxVScale && TTI.isVScaleKnownToBeAPowerOfTwo()) {
+ if (MaxVScale) {
MaxPowerOf2RuntimeVF = std::max<unsigned>(
*MaxPowerOf2RuntimeVF,
*MaxVScale * MaxFactors.ScalableVF.getKnownMinValue());
diff --git a/llvm/test/Transforms/InstCombine/rem-mul-shl.ll b/llvm/test/Transforms/InstCombine/rem-mul-shl.ll
index 920497c07e380..ea86750e762db 100644
--- a/llvm/test/Transforms/InstCombine/rem-mul-shl.ll
+++ b/llvm/test/Transforms/InstCombine/rem-mul-shl.ll
@@ -859,7 +859,8 @@ define i64 @urem_shl_vscale() {
; CHECK-LABEL: @urem_shl_vscale(
; CHECK-NEXT: [[VSCALE:%.*]] = call i64 @llvm.vscale.i64()
; CHECK-NEXT: [[SHIFT:%.*]] = shl nuw nsw i64 [[VSCALE]], 2
-; CHECK-NEXT: [[REM:%.*]] = urem i64 1024, [[SHIFT]]
+; CHECK-NEXT: [[TMP1:%.*]] = add nuw i64 [[SHIFT]], 2047
+; CHECK-NEXT: [[REM:%.*]] = and i64 [[TMP1]], 1024
; CHECK-NEXT: ret i64 [[REM]]
;
%vscale = call i64 @llvm.vscale.i64()
diff --git a/llvm/test/Transforms/InstSimplify/po2-shift-add-and-to-zero.ll b/llvm/test/Transforms/InstSimplify/po2-shift-add-and-to-zero.ll
index 54dd1688ad916..8a5adb3396c2c 100644
--- a/llvm/test/Transforms/InstSimplify/po2-shift-add-and-to-zero.ll
+++ b/llvm/test/Transforms/InstSimplify/po2-shift-add-and-to-zero.ll
@@ -61,27 +61,6 @@ define i64 @test_pow2_or_zero(i64 %arg) {
ret i64 %rem
}
-;; Make sure it doesn't work if the value isn't known to be a power of 2.
-;; In this case a vscale without a `vscale_range` attribute on the function.
-define i64 @no_pow2() {
-; CHECK-LABEL: define i64 @no_pow2() {
-; CHECK-NEXT: entry:
-; CHECK-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
-; CHECK-NEXT: [[TMP1:%.*]] = shl i64 [[TMP0]], 4
-; CHECK-NEXT: [[TMP2:%.*]] = shl i64 [[TMP0]], 3
-; CHECK-NEXT: [[TMP3:%.*]] = add i64 [[TMP2]], -1
-; CHECK-NEXT: [[REM:%.*]] = and i64 [[TMP1]], [[TMP3]]
-; CHECK-NEXT: ret i64 [[REM]]
-;
-entry:
- %0 = call i64 @llvm.vscale.i64()
- %1 = shl i64 %0, 4
- %2 = shl i64 %0, 3
- %3 = add i64 %2, -1
- %rem = and i64 %1, %3
- ret i64 %rem
-}
-
;; Make sure it doesn't work if the shift on the -1 side is greater
define i64 @minus_shift_greater(i64 %arg) {
; CHECK-LABEL: define i64 @minus_shift_greater
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/sve-inductions.ll b/llvm/test/Transforms/LoopVectorize/AArch64/sve-inductions.ll
index 4f0637fd8db2f..ce8674bb02630 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/sve-inductions.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/sve-inductions.ll
@@ -20,9 +20,8 @@ define void @cond_ind64(ptr noalias nocapture %a, ptr noalias nocapture readonly
; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
; CHECK: vector.ph:
; CHECK-NEXT: [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
-; CHECK-NEXT: [[TMP3:%.*]] = shl nuw i64 [[TMP2]], 2
-; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], [[TMP3]]
-; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
+; CHECK-NEXT: [[DOTNEG:%.*]] = mul i64 [[TMP2]], -4
+; CHECK-NEXT: [[N_VEC:%.*]] = and i64 [[N]], [[DOTNEG]]
; CHECK-NEXT: [[TMP4:%.*]] = call i64 @llvm.vscale.i64()
; CHECK-NEXT: [[TMP5:%.*]] = shl nuw i64 [[TMP4]], 2
; CHECK-NEXT: [[TMP6:%.*]] = call <vscale x 4 x i64> @llvm.stepvector.nxv4i64()
@@ -42,7 +41,7 @@ define void @cond_ind64(ptr noalias nocapture %a, ptr noalias nocapture readonly
; CHECK-NEXT: [[TMP12:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
; CHECK-NEXT: br i1 [[TMP12]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
; CHECK: middle.block:
-; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N_MOD_VF]], 0
+; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]
; CHECK-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
; CHECK: scalar.ph:
;
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/sve-interleaved-accesses.ll b/llvm/test/Transforms/LoopVectorize/AArch64/sve-interleaved-accesses.ll
index 8c2958769a615..a72e7729a4160 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/sve-interleaved-accesses.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/sve-interleaved-accesses.ll
@@ -1373,9 +1373,8 @@ define void @interleave_deinterleave_factor3(ptr writeonly noalias %dst, ptr rea
; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
; CHECK: vector.ph:
; CHECK-NEXT: [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
-; CHECK-NEXT: [[TMP3:%.*]] = shl nuw i64 [[TMP2]], 2
-; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 1024, [[TMP3]]
-; CHECK-NEXT: [[N_VEC:%.*]] = sub nuw nsw i64 1024, [[N_MOD_VF]]
+; CHECK-NEXT: [[DOTNEG:%.*]] = mul i64 [[TMP2]], 2044
+; CHECK-NEXT: [[N_VEC:%.*]] = and i64 [[DOTNEG]], 1024
; CHECK-NEXT: [[TMP4:%.*]] = call i64 @llvm.vscale.i64()
; CHECK-NEXT: [[TMP5:%.*]] = shl nuw i64 [[TMP4]], 2
; CHECK-NEXT: [[TMP6:%.*]] = call <vscale x 4 x i64> @llvm.stepvector.nxv4i64()
@@ -1411,8 +1410,8 @@ define void @interleave_deinterleave_factor3(ptr writeonly noalias %dst, ptr rea
; CHECK-NEXT: [[TMP18:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
; CHECK-NEXT: br i1 [[TMP18]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP41:![0-9]+]]
; CHECK: middle.block:
-; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N_MOD_VF]], 0
-; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END:%.*]], label [[SCALAR_PH]]
+; CHECK-NEXT: [[CMP_N_NOT:%.*]] = icmp eq i64 [[N_VEC]], 0
+; CHECK-NEXT: br i1 [[CMP_N_NOT]], label [[SCALAR_PH]], label [[FOR_END:%.*]]
; CHECK: scalar.ph:
;
entry:
@@ -1467,9 +1466,8 @@ define void @interleave_deinterleave(ptr writeonly noalias %dst, ptr readonly %a
; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
; CHECK: vector.ph:
; CHECK-NEXT: [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
-; CHECK-NEXT: [[TMP3:%.*]] = shl nuw i64 [[TMP2]], 2
-; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 1024, [[TMP3]]
-; CHECK-NEXT: [[N_VEC:%.*]] = sub nuw nsw i64 1024, [[N_MOD_VF]]
+; CHECK-NEXT: [[DOTNEG:%.*]] = mul i64 [[TMP2]], 2044
+; CHECK-NEXT: [[N_VEC:%.*]] = and i64 [[DOTNEG]], 1024
; CHECK-NEXT: [[TMP4:%.*]] = call i64 @llvm.vscale.i64()
; CHECK-NEXT: [[TMP5:%.*]] = shl nuw i64 [[TMP4]], 2
; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
@@ -1500,8 +1498,8 @@ define void @interleave_deinterleave(ptr writeonly noalias %dst, ptr readonly %a
; CHECK-NEXT: [[TMP25:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
; CHECK-NEXT: br i1 [[TMP25]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP43:![0-9]+]]
; CHECK: middle.block:
-; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N_MOD_VF]], 0
-; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_COND_CLEANUP:%.*]], label [[SCALAR_PH]]
+; CHECK-NEXT: [[CMP_N_NOT:%.*]] = icmp eq i64 [[N_VEC]], 0
+; CHECK-NEXT: br i1 [[CMP_N_NOT]], label [[SCALAR_PH]], label [[FOR_END:%.*]]
; CHECK: scalar.ph:
;
entry:
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/sve-interleaved-masked-accesses.ll b/llvm/test/Transforms/LoopVectorize/AArch64/sve-interleaved-masked-accesses.ll
index f152dd308cb69..5d09549f33267 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/sve-interleaved-masked-accesses.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/sve-interleaved-masked-accesses.ll
@@ -30,44 +30,43 @@ define dso_local void @masked_strided1(ptr noalias nocapture readonly %p, ptr no
; SCALAR_TAIL_FOLDING-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ugt i32 [[TMP0]], 64
; SCALAR_TAIL_FOLDING-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
; SCALAR_TAIL_FOLDING: vector.ph:
+; SCALAR_TAIL_FOLDING-NEXT: [[TMP1:%.*]] = call i32 @llvm.vscale.i32()
+; SCALAR_TAIL_FOLDING-NEXT: [[DOTNEG:%.*]] = mul i32 [[TMP1]], 2032
+; SCALAR_TAIL_FOLDING-NEXT: [[N_VEC:%.*]] = and i32 [[DOTNEG]], 1024
; SCALAR_TAIL_FOLDING-NEXT: [[TMP2:%.*]] = call i32 @llvm.vscale.i32()
; SCALAR_TAIL_FOLDING-NEXT: [[TMP3:%.*]] = shl nuw i32 [[TMP2]], 4
-; SCALAR_TAIL_FOLDING-NEXT: [[N_MOD_VF:%.*]] = urem i32 1024, [[TMP3]]
-; SCALAR_TAIL_FOLDING-NEXT: [[N_VEC:%.*]] = sub nuw nsw i32 1024, [[N_MOD_VF]]
-; SCALAR_TAIL_FOLDING-NEXT: [[TMP4:%.*]] = call i32 @llvm.vscale.i32()
-; SCALAR_TAIL_FOLDING-NEXT: [[TMP5:%.*]] = shl nuw i32 [[TMP4]], 4
; SCALAR_TAIL_FOLDING-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <vscale x 16 x i32> poison, i32 [[CONV]], i64 0
; SCALAR_TAIL_FOLDING-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevect...
[truncated]
Just to highlight - with this change, we no longer need a context instruction with which to find the function. This is important both because having two different functions in the same module with different power-of-two vscales is more than a bit weird, and because many (most?) callers of this API do not pass a context. As such, the version without the context is significantly more powerful in practice.
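The LoopVectorize hunk above drops an overflow guard for a related reason: a power-of-two step always divides 2^N, so an unsigned induction variable stepping by VF * UF wraps to exactly zero rather than skipping past its exit value. A small model of that property (plain Python with a hypothetical 8-bit counter, not LLVM code):

```python
BITS = 8
MOD = 1 << BITS

def hits_exit_exactly(step):
    # Walk an unsigned BITS-bit counter by `step`; return True if it lands
    # back on 0 exactly (no partial wrap that could skip an exit compare).
    i = 0
    while True:
        i = (i + step) % MOD
        if i == 0:
            return True
        if i < step:  # wrapped past zero without landing on it
            return False

# Every power-of-two step divides 2^BITS, so the counter wraps exactly.
assert all(hits_exit_exactly(1 << k) for k in range(BITS))
# A non-power-of-two step (e.g. 6) does not: 256 % 6 != 0.
assert not hits_exit_exactly(6)
```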