MerryMage
3caeb62ef1
emit_x64_floating_point: Hardware FMA implementation of FPRSqrtStepFused
2020-04-22 20:55:06 +01:00
MerryMage
344ee76aba
emit_x64_floating_point: Hardware FMA implementation of FPRecipStepFused{32,64}
2020-04-22 20:55:06 +01:00
MerryMage
1492573267
emit_x64_vector: SSE implementation of VectorSignedSaturatedAccumulateUnsigned{8,16,32}
2020-04-22 20:55:06 +01:00
Lioncash
26df6e5e7b
emit_x64_vector: Correct static asserts for < 64-bit type checks in saturated accumulate fallbacks
...
I had initially meant to use BitSize() here, not sizeof()
2020-04-22 20:55:06 +01:00
MerryMage
a4a26ac226
emit_x64_vector: EmitVectorSignedSaturatedAccumulateUnsigned64: SSE implementation
2020-04-22 20:55:06 +01:00
MerryMage
a7c66d2d28
emit_x64_vector: Simplify fpsr_qc related code
...
Move the bool conversion into A64JitState::GetFpsr so we don't have to continuously
pay the cost of conversion for every saturation instruction.
2020-04-22 20:55:06 +01:00
Lioncash
112cff9ab9
A64: Implement CLZ's vector variant
2020-04-22 20:55:06 +01:00
Lioncash
e739624296
ir: Add opcodes for vector CLZ operations
...
We can optimize these cases further for with the use of a fair bit of
shuffling via pshufb and the use of masks, but given the uncommon use of
this instruction, I wouldn't consider it to be beneficial in terms of
amount of code to be worth it over a simple manageable naive solution
like this.
If we ever do hit a case where vectorized CLZ happens to be a
bottleneck, then we can revisit this. At least with AVX-512CD, this can
be done with a single instruction for the 32-bit word case.
2020-04-22 20:55:05 +01:00
MerryMage
d4c37a68a8
A64/translate: VectorZeroUpper for V(64) stores
...
Ensures correctness.
2020-04-22 20:55:05 +01:00
MerryMage
b8daa4feac
simd_two_register_misc: FNEG (vector) with Q == 0 had dirty upper
2020-04-22 20:55:05 +01:00
Lioncash
5653e7637e
emit_x64_vector: Remove unnecessary [[maybe_unused]] attributes
...
These were unintentionally left in when introducing SUQADD and USQADD
2020-04-22 20:55:05 +01:00
Lioncash
14e026a7f0
A64: Implement USQADD's scalar and vector variants
2020-04-22 20:55:05 +01:00
Lioncash
d4a76aaa04
ir: Add opcodes form unsigned saturated accumulations of signed values
2020-04-22 20:55:05 +01:00
Lioncash
18ad7f237d
A64: Implement SUQADD's scalar and vector variants
2020-04-22 20:55:05 +01:00
Lioncash
6f911a26da
ir: Add opcodes for signed saturated accumulations of unsigned values
2020-04-22 20:55:05 +01:00
Lioncash
9a3d38d2ee
A64: Implement SMLAL{2}, SMLSL{2}, UMLAL{2}, and UMLSL{2}'s vector by-element variants
...
We can simply modify the general function made for SMULL{2} and
UMULL{2}'s by-element variants to also handle the other multiply-based
by-element variants.
2020-04-22 20:55:05 +01:00
Lioncash
6ccfbc9b39
A64: Implement UMULL{2}'s vector by-element variant
2020-04-22 20:55:05 +01:00
Lioncash
58e21f175c
A64: Implement SMULL{2}'s vector by-element variant
2020-04-22 20:55:05 +01:00
Lioncash
134bb02e19
ir/value: Replace includes with forward declarations
...
enum classes are still considered complete types when forward declared
(as the compiler knows the exact size of the type from the declaration
alone). The only difference in this case being that the members of the
enum class aren't visible. Given we don't use the members within this
header in any way, we can simply forward declare them here and remove
the inclusions.
2020-04-22 20:55:05 +01:00
Lioncash
2c8e07e7d0
ir/cond: Migrate to C++17 nested namespace specifiers
2020-04-22 20:55:05 +01:00
Lioncash
c3b7819a55
CMakeLists: Add missing cond.h header to file listing
...
Allows the file to show up within IDEs more easily.
2020-04-22 20:55:05 +01:00
Lioncash
0a3976059f
A64: Implement URSQRTE
2020-04-22 20:55:05 +01:00
Lioncash
b6e74fd17d
ir: Add opcodes for performing unsigned reciprocal square root estimates
2020-04-22 20:55:05 +01:00
Lioncash
bd3582e811
A64: Implement URECPE
2020-04-22 20:55:05 +01:00
Lioncash
af83360f89
ir: Add opcodes for unsigned reciprocal estimate
2020-04-22 20:55:05 +01:00
Lioncash
740ffa52ae
A64: Implement SQNEG's scalar and vector variant
2020-04-22 20:53:46 +01:00
Lioncash
fca7eddb9e
A64: Add opcodes for signed saturating negations
2020-04-22 20:53:46 +01:00
Lioncash
f1ebbcd7bc
emit_x64_vector: Simplify "position == 0" case for EmitVectorExtract()
...
In the event position is zero, we can just treat it as a NOP, given
there's no need to move the data.
2020-04-22 20:53:46 +01:00
Lioncash
87372917f9
emit_x64_vector: Simplify "position == 0" case for EmitVectorExtractLower()
...
In the event position == 0, we can just treat it as a simple movq,
clearing the upper half of the XMM register. This also makes that case
use only one register.
2020-04-22 20:53:46 +01:00
Lioncash
f5fb496e7e
A64: Implement SQDMULH's by-element scalar variant
2020-04-22 20:53:46 +01:00
Lioncash
40f0576995
A64: Implement SQDMULH's by-element vector variant
2020-04-22 20:53:46 +01:00
MerryMage
8f9206901d
backend/x64: Do not clear fast_dispatch_table if not enabled
...
There is no need to pay for the cost of setting a large block of memory if we're not using it.
2020-04-22 20:53:46 +01:00
MerryMage
9b65100660
A64: Implement FastDispatchHint
2020-04-22 20:53:46 +01:00
MerryMage
f96c43d422
A32: Implement FastDispatchHint
2020-04-22 20:53:46 +01:00
MerryMage
aa8d826c13
ir/terminal: Add FastDispatchHint
2020-04-22 20:53:46 +01:00
Lioncash
1a69a61cb4
A64: Implement SQDMULH's scalar variant
2020-04-22 20:53:46 +01:00
Lioncash
7ebfd0f31c
ir: Add opcodes for scalar signed saturated doubling multiplies
2020-04-22 20:53:46 +01:00
Lioncash
9c03311fed
A64: Implement SQDMULH's vector variant
2020-04-22 20:53:46 +01:00
Lioncash
a0231e5546
ir: Add opcodes for signed saturated doubling multiplies
2020-04-22 20:53:46 +01:00
Lioncash
db24e1f09b
A64: Implement SQABS' scalar variant
2020-04-22 20:53:46 +01:00
Lioncash
bda5d14c7f
A64: Implement SQABS' vector variant.
2020-04-22 20:53:46 +01:00
Lioncash
0507e47420
ir: Add opcodes for signed saturated absolute values
2020-04-22 20:53:46 +01:00
MerryMage
27427595b7
emit_x64_floating_point: EmitFPToFixed: maxsd optimization
...
maxsd is not required when doing a signed conversion, because x64
produces a 0x80...00 value for out of range values.
2020-04-22 20:53:46 +01:00
MerryMage
1abf82ac4a
emit_x64_floating_point: ZeroIfNaN: pxor -> xorps
...
xorps is shorter and more appropriate here.
2020-04-22 20:53:46 +01:00
MerryMage
3415828fb4
IR: Simplify FP{Single,Double}ToFixed{U,S}{32,64}
2020-04-22 20:53:46 +01:00
Lioncash
e30f9816ec
A32/decoder: Add missing <algorithm> includes
...
These includes should be present, as we use std::find_if() within these headers.
2020-04-22 20:53:46 +01:00
Lioncash
4507627905
emit_x64_vector: Provide AVX path for EmitVectorMinU64()
2020-04-22 20:53:46 +01:00
Lioncash
fd49a62b06
emit_x64_vector: Provide AVX path for EmitVectorMinS64()
2020-04-22 20:53:46 +01:00
Lioncash
770723f449
emit_x64_vector: Provide AVX path for EmitVectorMaxU64()
2020-04-22 20:53:46 +01:00
Lioncash
8fb90c0cf1
emit_x64_vector: Provide AVX path for EmitVectorMaxS64()
2020-04-22 20:53:46 +01:00