MerryMage
5debb411cc
block_of_code: Explicitly delete copy constructor
2020-04-22 21:02:46 +01:00
MerryMage
605a43d23e
Suppress MSVC warning C4702: unreachable code
2020-04-22 21:02:46 +01:00
Lioncash
8316d231e9
A32: Implement barrier instructions introduced in ARMv7
...
Provides basic implementations of the barrier instruction introduced
within ARMv7. Currently these simply mirror the behavior of the AArch64
equivalents.
2020-04-22 21:02:46 +01:00
Merry
09ee64ea98
Merge pull request #482 from lioncash/fixedfp
...
A64: Handle half-precision variants of FP->Fixed instructions
2020-04-22 21:02:45 +01:00
MerryMage
1e1e9c17c7
emit_x64_data_processing: Remove INVALID_REG
...
INVALID_REG.cvt8() now throws
2020-04-22 21:02:45 +01:00
Lioncash
604f39f00a
frontend/ir_emitter: Add half-precision->fixed-point opcodes
2020-04-22 21:01:45 +01:00
Lioncash
96356fac93
frontend/ir_emitter: Add half-precision opcode variant of FPVectorRSqrtStepFused
2020-04-22 21:01:45 +01:00
Merry
45864133f5
Merge pull request #478 from lioncash/stepfused
...
A64: Handle half-precision variants of FRECPE and FRECPS
2020-04-22 21:01:44 +01:00
Lioncash
824c551ba2
frontend/ir_emitter: Add half-precision opcode variant of FPRSqrtStepFused
2020-04-22 21:01:44 +01:00
Lioncash
037acb17b9
frontend/ir_emitter: Add half-precision opcode variant for FPVectorRSqrtEstimate
2020-04-22 21:01:44 +01:00
Lioncash
5dba99b4f4
frontend/ir_emitter: Add half-precision opcode variant for FPRSqrtEstimate
2020-04-22 21:01:44 +01:00
Lioncash
825a3ea16f
frontend/ir_emitter: Add half-precision opcode for FPVectorRecipEstimate
2020-04-22 21:01:44 +01:00
Lioncash
2184d24e8f
frontend/ir_emitter: Add half-precision opcode for FPRecipEstimate
2020-04-22 21:01:44 +01:00
Lioncash
5d5c9f149f
frontend/ir_emitter: Add half-precision opcode for FPVectorRecipStepFused
2020-04-22 21:01:44 +01:00
Lioncash
6da0411111
frontend/ir_emitter: Add half-precision opcode for FPRecipStepFused
2020-04-22 21:01:44 +01:00
Lioncash
5b4673da4b
frontend/ir_emitter: Add half-precision variant of FPVectorRoundInt
2020-04-22 21:01:44 +01:00
Lioncash
ad0c698f89
frontend/ir_emitter: Add half-precision variant of FPRoundInt
2020-04-22 21:01:44 +01:00
Merry
cb9a1b18b6
Merge pull request #475 from lioncash/muladd
...
A64: Enable half-precision variants of floating-point multiply-add instructions
2020-04-22 21:01:44 +01:00
Merry
13f421c27d
Merge pull request #473 from lioncash/sqshlu
...
A64: Implement SQSHLU
2020-04-22 21:01:44 +01:00
Merry
d7da53a74b
Merge pull request #472 from lioncash/exception
...
general: Mark hash functions as noexcept
2020-04-22 21:01:44 +01:00
Lioncash
a4cadf1cd9
frontend/ir_emitter: Add opcodes for signed saturated left shifts with unsigned saturation
2020-04-22 21:01:44 +01:00
Lioncash
ec6b3ae084
ir/frontend: Add half-precision opcode for FPVectorMulAdd
2020-04-22 21:01:44 +01:00
Lioncash
bd82513199
frontend/ir_emitter: Add half-precision opcode for FPMulAdd
2020-04-22 21:01:44 +01:00
Lioncash
7bb5440507
general: Mark hash functions as noexcept
...
Generally hash functions shouldn't throw exceptions. It's also a
requirement for the standard library-provided hash functions to not
throw exceptions.
An exception to this rule is made for user-defined specializations,
however we can just be consistent with the standard library on this to
allow it to play nicer with it.
While we're at it, we can also make the std::less specializations
noexcpet as well, since they also can't throw.
2020-04-22 21:01:43 +01:00
Lioncash
fe95575b95
general: Replace unreachable-imitating assertions with UNREACHABLE()
...
We can just use the self-documenting assertion for indicating
unreachable paths, instead of manually passing false and providing a
message.
2020-04-22 21:01:43 +01:00
Lioncash
b37279f65c
backend/x64/emit_x64_vector: Prevent undefined behavior within VectorSignedSaturatedShiftLeft
...
Avoids undefined behavior by potentially left-shifting a signed negative
value.
2020-04-22 21:00:47 +01:00
MerryMage
13e8b7b516
emit_x64_floating_point: F16C implementation of FPSingleToHalf
2020-04-22 20:58:17 +01:00
MerryMage
d32d6fe598
emit_x64_floating_point: F16C implementation of FPHalfToSingle and FPHalfToDouble
2020-04-22 20:58:12 +01:00
MerryMage
a53ba12be2
emit_x64_floating_point: Factor out ConvertRoundingModeToX64Immediate
2020-04-22 20:58:12 +01:00
MerryMage
5a2adc6629
backend/x64: Expose FPCR in EmitContext instead of its subcomponents
2020-04-22 20:58:12 +01:00
Merry
01bb1cdd88
Merge pull request #458 from lioncash/float-op
...
A64: Handle half-precision floating point in FABS, FNEG, and scalar FMOV
2020-04-22 20:58:12 +01:00
Lioncash
8309ec7a9f
frontend/ir_emitter: Add half-precision variant of FPAbs
2020-04-22 20:58:12 +01:00
Lioncash
e4c259d69f
frontend/ir_emitter: Add half->{single, double} and {double, single}->half conversion opcodes
2020-04-22 20:58:12 +01:00
Lioncash
c97efcb978
frontend/ir_emitter: Add half-precision variant of FPNeg
2020-04-22 20:58:12 +01:00
Lioncash
bd892ec4ef
frontend/ir/ir_emitter: Amend FPRecipExponent to handle half-precision floating point
2020-04-22 20:58:11 +01:00
Merry
bbd5330ad2
Merge pull request #447 from lioncash/flag
...
A64: Implement CFINV, RMIF, AXFlag and XAFlag
2020-04-22 20:58:11 +01:00
Merry
fb039e232c
Merge pull request #442 from lioncash/fcvtxn
...
A64: Implement scalar and vector variants of FCVTXN
2020-04-22 20:58:11 +01:00
Lioncash
597a8be5d5
ir: Add A64-specific opcodes for getting and setting raw NZCV values
...
This will be necessary to implement the flag manipulation and flag
format instructions.
2020-04-22 20:58:11 +01:00
Lioncash
5cf1478620
frontend/ir: Add opcodes for vector square roots
2020-04-22 20:58:10 +01:00
Lioncash
7c81a58ed3
frontend/ir/ir_emitter: Alter parameters of FPDoubleToSingle() and FPSingleToDouble() to pass along desired rounding mode
...
This will be necessary to special-case the non-IEEE Von Neumann rounding
to odd rounding mode.
2020-04-22 20:58:10 +01:00
Merry
9f11720a69
Merge pull request #437 from lioncash/frecpx
...
A64: Implement FRECPX (single, double precision)
2020-04-22 20:58:10 +01:00
Lioncash
9cf3c25811
frontend/ir/ir_emitter: Add opcodes for floating point reciprocal exponents
2020-04-22 20:58:10 +01:00
Lioncash
2e180a7f14
backend/x64/a32_interface: Mark Context move constructor and move assignment as noexcept
...
Provides a more "correct" move constructor/assignment operator, since
these relevant functions shouldn't throw exceptions.
Has the benefit of playing nicely with std::move_if_noexcept and other
noexcept library facilities.
2020-04-22 20:58:09 +01:00
Lioncash
deb9dd4acc
block_of_code: Replace cast with [[maybe_unused]] in DoesCpuSupport()
2020-04-22 20:58:09 +01:00
Lioncash
3290a9fdc2
common: Remove address_range.h
...
The AddressRange structure isn't used anywhere within the codebase, so
this can be removed. Particularly because there's no real appeal/heavy
potential use of it in the future that isn't trivial to add back if
needed.
2020-04-22 20:57:38 +01:00
Lioncash
93351c7efb
a64_emit_x64: Make constness of loop elements explicit within GenFastmemFallbacks()
2020-04-22 20:57:37 +01:00
Lioncash
7752ffc50c
a64_emit_x64: Convert std::vector instances in GenFastmemFallbacks() to std::array
...
Given these are quite small, we can avoid the need to heap allocate
here.
2020-04-22 20:57:37 +01:00
MerryMage
7c8fcaef26
emit_x64_vector_floating_point: AVX && DN implementation of EmitFPVectorMulX
2020-04-22 20:57:37 +01:00
MerryMage
fa8925c4df
IR: Implement FPVectorMulX
2020-04-22 20:57:37 +01:00
V.Kalyuzhny
764a93bf5a
Switch boost::optional to std::optional
2020-04-22 20:57:37 +01:00
Lioncash
d69fceec55
value: Move ImmediateToU64() to be a part of Value's interface
...
This'll make it slightly nicer to do basic constant folding for 32-bit
and 64-bit variants of the same IR opcode type. By that, I mean it's
possible to inspect immediate values without a bunch of conditional
checks beforehand to verify that it's possible to call GetU32() or
GetU64, etc.
2020-04-22 20:55:50 +01:00
MerryMage
ca603c1215
reg_alloc: Emit AVX instructions where able
...
Smaller codesize.
2020-04-22 20:55:50 +01:00
MerryMage
e2358af5ef
abi: Emit AVX instructions where able
...
Smaller codesize.
2020-04-22 20:55:50 +01:00
MerryMage
7c0378f56d
a64_exclusive_monitor: Loosen memory ordering requirements
...
It is not necessary to be as strict as it was.
2020-04-22 20:55:50 +01:00
MerryMage
f0920c0ded
Fix VShift terminology
...
An arithmetic shift is by definition a signed shift, and a logical shift is by definition an unsigned shift.
- Rename VectorLogicalVShiftS* -> VectorArithmeticVShift*
- Rename VectorLogicalVShiftU* -> VectorLogicalVShift*
2020-04-22 20:55:50 +01:00
MerryMage
b51dae790d
emit_x64_vector: AVX512 implementation of EmitVectorLogicalVShiftS16
2020-04-22 20:55:50 +01:00
MerryMage
bd47f2ca8f
emit_x64_vector: AVX512 implementation of EmitVectorLogicalVShiftS64
2020-04-22 20:55:50 +01:00
MerryMage
3bf183d7e8
emit_x64_vector: AVX2 implementation of EmitVectorLogicalVShiftS32
2020-04-22 20:55:50 +01:00
MerryMage
94f9d402eb
emit_x64_vector: AVX512 implementation of EmitVectorLogicalVShiftU16()
2020-04-22 20:55:50 +01:00
MerryMage
6d9639e3b0
emit_x64_vector: AVX2 implementation of EmitVectorLogicalVShiftU64()
2020-04-22 20:55:50 +01:00
MerryMage
bbc066a266
emit_x64_vector: AVX2 implementation of EmitVectorLogicalVShiftU32()
2020-04-22 20:55:50 +01:00
Lioncash
da2e7fad87
emit_x64_vector: SSSE3 variant of EmitVectorCountLeadingZeros8()
...
pshufb lyfe
2020-04-22 20:55:50 +01:00
MerryMage
238f2f2cd0
a64_emit_x64: Lowercase PAGE_SIZE
...
PAGE_SIZE is defined as a macro by musl.
2020-04-22 20:55:50 +01:00
MerryMage
7162f6f254
emit_x64_vector_floating_point: SSE4.1 implementation of EmitFPVectorToFixed
2020-04-22 20:55:50 +01:00
MerryMage
e7a5592699
emit_x64_vector_floating_point: EmitFPVectorRoundInt: Use FCODE
2020-04-22 20:55:50 +01:00
MerryMage
b8fde48732
emit_x64_vector: AVX implementation for EmitVectorCountLeadingZeros8
2020-04-22 20:55:50 +01:00
MerryMage
fd37b637aa
emit_x64_vector: SSE implementation of EmitVectorCountLeadingZeros16
2020-04-22 20:55:50 +01:00
MerryMage
03ad2072a7
emit_x64_floating_point: Reduce fallback LUT code in EmitFPToFixed
2020-04-22 20:55:06 +01:00
MerryMage
f9129db6fd
A64: Implement FCVTZS, FCVTZU, UCVTF, SCVTF (vector, fixed-point), vector variant
2020-04-22 20:55:06 +01:00
Lioncash
d426dfe942
ir: Add opcodes for unsigned saturating left shifts
2020-04-22 20:55:06 +01:00
MerryMage
02150bc0b7
IR: Add fbits argument to FPVectorFrom{Signed,Unsigned}Fixed
2020-04-22 20:55:06 +01:00
MerryMage
90193b0e3d
IR: Add fbits argument to FixedToFP-related opcodes
2020-04-22 20:55:06 +01:00
Lioncash
b14eaaec46
ir: Add opcodes for left signed saturated shifts
2020-04-22 20:55:06 +01:00
Lioncash
a2cd643525
emit_x64_vector: Make EmitVectorUnsignedSaturatedAccumulateSigned() internally linked
...
Given this is just an internal helper function, it can be marked static.
2020-04-22 20:55:06 +01:00
Lioncash
c39ea2e3c9
perf_map: Use std::string_view instead of std::string for PerfMapRegister()
...
We can just use a non-owning view into a string in this case instead of
potentially allocating a std::string instance.
2020-04-22 20:55:06 +01:00
MerryMage
12243692f5
A64: Implement SQRDMULH (vector), vector variant
2020-04-22 20:55:06 +01:00
MerryMage
3e447614c6
IR: Add VectorSignedSaturatedDoublingMultiplyLong
2020-04-22 20:55:06 +01:00
MerryMage
06b31448aa
emit_x64_vector: Changes to VectorSignedSaturatedDoublingMultiply
...
* Return both the upper and lower parts of the multiply if required
* SSE2 does not support the pmuldq instruction, do sign correction to an unsigned result instead
* Improve port utilisation where possible (punpck instructions were a bottleneck)
2020-04-22 20:55:06 +01:00
MerryMage
08c0e017a5
IR: Implement Vector{Signed,Unsigned}Multiply{16,32}
2020-04-22 20:55:06 +01:00
Lioncash
b6df34cdde
backend_x64/a64_interface: Re-enable the constant folding pass
...
This was disabled for debugging, but never re-enabled. Just to be sure,
testing was done downstream in yuzu to make sure this didn't happen to
break anything (which seems to be the case).
2020-04-22 20:55:06 +01:00
MerryMage
06ba397af2
emit_x64_vector_floating_point: Hardware FMA implementation for RSqrtStepFused
2020-04-22 20:55:06 +01:00
MerryMage
e553c4fe8d
emit_x64_vector_floating_point: Hardware FMA implementation of FPVectorRecipStepFused
2020-04-22 20:55:06 +01:00
MerryMage
3caeb62ef1
emit_x64_floating_point: Hardware FMA implementation of FPRSqrtStepFused
2020-04-22 20:55:06 +01:00
MerryMage
344ee76aba
emit_x64_floating_point: Hardware FMA implementation of FPRecipStepFused{32,64}
2020-04-22 20:55:06 +01:00
MerryMage
1492573267
emit_x64_vector: SSE implementation of VectorSignedSaturatedAccumulateUnsigned{8,16,32}
2020-04-22 20:55:06 +01:00
Lioncash
26df6e5e7b
emit_x64_vector: Correct static asserts for < 64-bit type checks in saturated accumulate fallbacks
...
I had initially meant to use BitSize() here, not sizeof()
2020-04-22 20:55:06 +01:00
MerryMage
a4a26ac226
emit_x64_vector: EmitVectorSignedSaturatedAccumulateUnsigned64: SSE implementation
2020-04-22 20:55:06 +01:00
MerryMage
a7c66d2d28
emit_x64_vector: Simplify fpsr_qc related code
...
Move the bool conversion into A64JitState::GetFpsr so we don't have to continuously
pay the cost of conversion for every saturation instruction.
2020-04-22 20:55:06 +01:00
Lioncash
e739624296
ir: Add opcodes for vector CLZ operations
...
We can optimize these cases further for with the use of a fair bit of
shuffling via pshufb and the use of masks, but given the uncommon use of
this instruction, I wouldn't consider it to be beneficial in terms of
amount of code to be worth it over a simple manageable naive solution
like this.
If we ever do hit a case where vectorized CLZ happens to be a
bottleneck, then we can revisit this. At least with AVX-512CD, this can
be done with a single instruction for the 32-bit word case.
2020-04-22 20:55:05 +01:00
Lioncash
5653e7637e
emit_x64_vector: Remove unnecessary [[maybe_unused]] attributes
...
These were unintentionally left in when introducing SUQADD and USQADD
2020-04-22 20:55:05 +01:00
Lioncash
d4a76aaa04
ir: Add opcodes form unsigned saturated accumulations of signed values
2020-04-22 20:55:05 +01:00
Lioncash
6f911a26da
ir: Add opcodes for signed saturated accumulations of unsigned values
2020-04-22 20:55:05 +01:00
Lioncash
b6e74fd17d
ir: Add opcodes for performing unsigned reciprocal square root estimates
2020-04-22 20:55:05 +01:00
Lioncash
af83360f89
ir: Add opcodes for unsigned reciprocal estimate
2020-04-22 20:55:05 +01:00
Lioncash
fca7eddb9e
A64: Add opcodes for signed saturating negations
2020-04-22 20:53:46 +01:00
Lioncash
f1ebbcd7bc
emit_x64_vector: Simplify "position == 0" case for EmitVectorExtract()
...
In the event position is zero, we can just treat it as a NOP, given
there's no need to move the data.
2020-04-22 20:53:46 +01:00
Lioncash
87372917f9
emit_x64_vector: Simplify "position == 0" case for EmitVectorExtractLower()
...
In the event position == 0, we can just treat it as a simple movq,
clearing the upper half of the XMM register. This also makes that case
use only one register.
2020-04-22 20:53:46 +01:00
MerryMage
8f9206901d
backend/x64: Do not clear fast_dispatch_table if not enabled
...
There is no need to pay for the cost of setting a large block of memory if we're not using it.
2020-04-22 20:53:46 +01:00
MerryMage
9b65100660
A64: Implement FastDispatchHint
2020-04-22 20:53:46 +01:00
MerryMage
f96c43d422
A32: Implement FastDispatchHint
2020-04-22 20:53:46 +01:00