Commit graph

258 commits

Author SHA1 Message Date
MerryMage
7162f6f254 emit_x64_vector_floating_point: SSE4.1 implementation of EmitFPVectorToFixed 2020-04-22 20:55:50 +01:00
MerryMage
e7a5592699 emit_x64_vector_floating_point: EmitFPVectorRoundInt: Use FCODE 2020-04-22 20:55:50 +01:00
MerryMage
b8fde48732 emit_x64_vector: AVX implementation for EmitVectorCountLeadingZeros8 2020-04-22 20:55:50 +01:00
MerryMage
fd37b637aa emit_x64_vector: SSE implementation of EmitVectorCountLeadingZeros16 2020-04-22 20:55:50 +01:00
MerryMage
03ad2072a7 emit_x64_floating_point: Reduce fallback LUT code in EmitFPToFixed 2020-04-22 20:55:06 +01:00
MerryMage
f9129db6fd A64: Implement FCVTZS, FCVTZU, UCVTF, SCVTF (vector, fixed-point), vector variant 2020-04-22 20:55:06 +01:00
Lioncash
d426dfe942 ir: Add opcodes for unsigned saturating left shifts 2020-04-22 20:55:06 +01:00
MerryMage
02150bc0b7 IR: Add fbits argument to FPVectorFrom{Signed,Unsigned}Fixed 2020-04-22 20:55:06 +01:00
MerryMage
90193b0e3d IR: Add fbits argument to FixedToFP-related opcodes 2020-04-22 20:55:06 +01:00
Lioncash
b14eaaec46 ir: Add opcodes for left signed saturated shifts 2020-04-22 20:55:06 +01:00
Lioncash
a2cd643525 emit_x64_vector: Make EmitVectorUnsignedSaturatedAccumulateSigned() internally linked
Given this is just an internal helper function, it can be marked static.
2020-04-22 20:55:06 +01:00
Lioncash
c39ea2e3c9 perf_map: Use std::string_view instead of std::string for PerfMapRegister()
We can just use a non-owning view into a string in this case instead of
potentially allocating a std::string instance.
2020-04-22 20:55:06 +01:00
MerryMage
12243692f5 A64: Implement SQRDMULH (vector), vector variant 2020-04-22 20:55:06 +01:00
MerryMage
3e447614c6 IR: Add VectorSignedSaturatedDoublingMultiplyLong 2020-04-22 20:55:06 +01:00
MerryMage
06b31448aa emit_x64_vector: Changes to VectorSignedSaturatedDoublingMultiply
* Return both the upper and lower parts of the multiply if required
* SSE2 does not support the pmuldq instruction, do sign correction to an unsigned result instead
* Improve port utilisation where possible (punpck instructions were a bottleneck)
2020-04-22 20:55:06 +01:00
MerryMage
08c0e017a5 IR: Implement Vector{Signed,Unsigned}Multiply{16,32} 2020-04-22 20:55:06 +01:00
Lioncash
b6df34cdde backend_x64/a64_interface: Re-enable the constant folding pass
This was disabled for debugging, but never re-enabled. Just to be sure,
testing was done downstream in yuzu to make sure this didn't happen to
break anything (which seems to be the case).
2020-04-22 20:55:06 +01:00
MerryMage
06ba397af2 emit_x64_vector_floating_point: Hardware FMA implementation for RSqrtStepFused 2020-04-22 20:55:06 +01:00
MerryMage
e553c4fe8d emit_x64_vector_floating_point: Hardware FMA implementation of FPVectorRecipStepFused 2020-04-22 20:55:06 +01:00
MerryMage
3caeb62ef1 emit_x64_floating_point: Hardware FMA implementation of FPRSqrtStepFused 2020-04-22 20:55:06 +01:00
MerryMage
344ee76aba emit_x64_floating_point: Hardware FMA implementation of FPRecipStepFused{32,64} 2020-04-22 20:55:06 +01:00
MerryMage
1492573267 emit_x64_vector: SSE implementation of VectorSignedSaturatedAccumulateUnsigned{8,16,32} 2020-04-22 20:55:06 +01:00
Lioncash
26df6e5e7b emit_x64_vector: Correct static asserts for < 64-bit type checks in saturated accumulate fallbacks
I had initially meant to use BitSize() here, not sizeof()
2020-04-22 20:55:06 +01:00
MerryMage
a4a26ac226 emit_x64_vector: EmitVectorSignedSaturatedAccumulateUnsigned64: SSE implementation 2020-04-22 20:55:06 +01:00
MerryMage
a7c66d2d28 emit_x64_vector: Simplify fpsr_qc related code
Move the bool conversion into A64JitState::GetFpsr so we don't have to continuously
pay the cost of conversion for every saturation instruction.
2020-04-22 20:55:06 +01:00
Lioncash
e739624296 ir: Add opcodes for vector CLZ operations
We can optimize these cases further for with the use of a fair bit of
shuffling via pshufb and the use of masks, but given the uncommon use of
this instruction, I wouldn't consider it to be beneficial in terms of
amount of code to be worth it over a simple manageable naive solution
like this.

If we ever do hit a case where vectorized CLZ happens to be a
bottleneck, then we can revisit this. At least with AVX-512CD, this can
be done with a single instruction for the 32-bit word case.
2020-04-22 20:55:05 +01:00
Lioncash
5653e7637e emit_x64_vector: Remove unnecessary [[maybe_unused]] attributes
These were unintentionally left in when introducing SUQADD and USQADD
2020-04-22 20:55:05 +01:00
Lioncash
d4a76aaa04 ir: Add opcodes form unsigned saturated accumulations of signed values 2020-04-22 20:55:05 +01:00
Lioncash
6f911a26da ir: Add opcodes for signed saturated accumulations of unsigned values 2020-04-22 20:55:05 +01:00
Lioncash
b6e74fd17d ir: Add opcodes for performing unsigned reciprocal square root estimates 2020-04-22 20:55:05 +01:00
Lioncash
af83360f89 ir: Add opcodes for unsigned reciprocal estimate 2020-04-22 20:55:05 +01:00
Lioncash
fca7eddb9e A64: Add opcodes for signed saturating negations 2020-04-22 20:53:46 +01:00
Lioncash
f1ebbcd7bc emit_x64_vector: Simplify "position == 0" case for EmitVectorExtract()
In the event position is zero, we can just treat it as a NOP, given
there's no need to move the data.
2020-04-22 20:53:46 +01:00
Lioncash
87372917f9 emit_x64_vector: Simplify "position == 0" case for EmitVectorExtractLower()
In the event position == 0, we can just treat it as a simple movq,
clearing the upper half of the XMM register. This also makes that case
use only one register.
2020-04-22 20:53:46 +01:00
MerryMage
8f9206901d backend/x64: Do not clear fast_dispatch_table if not enabled
There is no need to pay for the cost of setting a large block of memory if we're not using it.
2020-04-22 20:53:46 +01:00
MerryMage
9b65100660 A64: Implement FastDispatchHint 2020-04-22 20:53:46 +01:00
MerryMage
f96c43d422 A32: Implement FastDispatchHint 2020-04-22 20:53:46 +01:00
MerryMage
aa8d826c13 ir/terminal: Add FastDispatchHint 2020-04-22 20:53:46 +01:00
Lioncash
7ebfd0f31c ir: Add opcodes for scalar signed saturated doubling multiplies 2020-04-22 20:53:46 +01:00
Lioncash
a0231e5546 ir: Add opcodes for signed saturated doubling multiplies 2020-04-22 20:53:46 +01:00
Lioncash
0507e47420 ir: Add opcodes for signed saturated absolute values 2020-04-22 20:53:46 +01:00
MerryMage
27427595b7 emit_x64_floating_point: EmitFPToFixed: maxsd optimization
maxsd is not required when doing a signed conversion, because x64
produces a 0x80...00 value for out of range values.
2020-04-22 20:53:46 +01:00
MerryMage
1abf82ac4a emit_x64_floating_point: ZeroIfNaN: pxor -> xorps
xorps is shorter and more appropriate here.
2020-04-22 20:53:46 +01:00
Lioncash
4507627905 emit_x64_vector: Provide AVX path for EmitVectorMinU64() 2020-04-22 20:53:46 +01:00
Lioncash
fd49a62b06 emit_x64_vector: Provide AVX path for EmitVectorMinS64() 2020-04-22 20:53:46 +01:00
Lioncash
770723f449 emit_x64_vector: Provide AVX path for EmitVectorMaxU64() 2020-04-22 20:53:46 +01:00
Lioncash
8fb90c0cf1 emit_x64_vector: Provide AVX path for EmitVectorMaxS64() 2020-04-22 20:53:46 +01:00
Lioncash
2cac6ad129 emit_x64_vector: Simplify EmitVectorLogicalLeftShift8()
Similar to EmitVectorLogicalRightShift8(), we can determine a mask ahead
of time and just and the results of a halfword left shift.
2020-04-22 20:53:46 +01:00
Lioncash
135107279d emit_x64_vector: Simplify EmitVectorLogicalShiftRight8()
We can generate the mask and AND it against the result of a halfword
shift instead of looping.
2020-04-22 20:53:46 +01:00
Lioncash
2952b46b16 emit_x64_vector: Amend value definition in SSE 4.1 path for EmitVectorSignExtend16()
We should be defining the value after the results have been calculated
to be consistent with the rest of the code.
2020-04-22 20:53:46 +01:00
Lioncash
fda19095ea emit_x64_vector: Remove fallback in EmitVectorSignExtend64()
This is fairly trivial to do manually.
2020-04-22 20:53:46 +01:00
Lioncash
39593fcd26 emit_x64_vector: Remove fallback for EmitVectorSignExtend32()
We can just do the extension manually, which gets rid of the need to
fall back here.
2020-04-22 20:53:46 +01:00
MerryMage
a12854857b A32: Add define_unpredictable_behaviour option 2020-04-22 20:53:46 +01:00
MerryMage
d5b9c4a4bb block_of_code: Hide NX support behind compiler flag
Systems that require W^X can use the DYNARMIC_ENABLE_NO_EXECUTE_SUPPORT cmake option.
2020-04-22 20:53:46 +01:00
MerryMage
de4494ffa5 Implement perfmap 2020-04-22 20:53:46 +01:00
MerryMage
f73104633b a32_emit_x64: Fix incorrect BMI2 implementation for SetCpsr
* The MSB for each byte in cpsr_ge were not being appropriately set.
* We also expand test coverage to test this case.
* We fix the disassembly of the MSR (imm) and MSR (reg) instructions as well.
2020-04-22 20:53:46 +01:00
MerryMage
3432a08e0a backend/x64: Support W^X systems
Closes #176.
2020-04-22 20:53:46 +01:00
BreadFish64
2a65442933 Backend: Create "backend" folder
similar to the "frontend" folder
2020-04-22 20:53:46 +01:00