MerryMage
506e544bfe
IR: Implement FPRSqrtStepFused
2020-04-22 20:46:22 +01:00
MerryMage
6eb069e80d
fp: Implement FPRSqrtStepFused
2020-04-22 20:46:22 +01:00
MerryMage
b0ff35fcd1
fp: Implement FPNeg
2020-04-22 20:46:22 +01:00
MerryMage
ca6774ccce
process_nan: Add two operand variant
2020-04-22 20:46:22 +01:00
Lioncash
ace7d2ba50
A64: Implement FMAXP, FMINP, FMAXNMP and FMINNMP's scalar double/single-precision variant
2020-04-22 20:46:21 +01:00
MerryMage
66bb05fc0a
emit_x64_floating_point: Fixup special NaN case in FMA FPMulAdd implementation
2020-04-22 20:46:21 +01:00
Lioncash
070637e0f6
fp: Use a forward declaration in fused.h
...
It's permissible to forward declare here, so we can do so and eliminate
a direct header dependency
2020-04-22 20:46:21 +01:00
Lioncash
030820f649
u128: Implement comparison operators in terms of one another
...
We can just implement the comparisons in terms of operator< and
implement inequality with the negation of operator==.
2020-04-22 20:46:21 +01:00
MerryMage
76b07d6646
u128: StickyLogicalShiftRight requires special-casing for amount == 64
...
In this case (128 - amount) == 64, and this invokes undefined behaviour
2020-04-22 20:46:21 +01:00
Lioncash
49c7edf7c6
A64: Implement FMLA and FMLS (by element)'s double/single-precision scalar variant
2020-04-22 20:46:21 +01:00
Lioncash
c704acafe4
A64: Implement FMUL (by element)'s scalar double/single-precision variant
2020-04-22 20:46:21 +01:00
MerryMage
0ce11b7b15
emit_x64_floating_point: Implement accurate fallback for FPMulAdd{32,64}
2020-04-22 20:46:21 +01:00
MerryMage
e199887fbc
fp: Implement FPMulAdd
2020-04-22 20:46:21 +01:00
MerryMage
53a8c15d12
process_nan: Add FPProcessNaNs3
2020-04-22 20:46:21 +01:00
MerryMage
1c8e93e74d
block_of_code: Add SysV ABI fifth and sixth parameters
2020-04-22 20:46:21 +01:00
MerryMage
1fe8f51c54
u128: Add StickyLogicalShiftRight
2020-04-22 20:46:21 +01:00
MerryMage
b0afd53ea7
u128: Add Multiply64To128
2020-04-22 20:46:21 +01:00
MerryMage
5566fab29a
u128: Add u128::Bit
2020-04-22 20:46:21 +01:00
MerryMage
3e62fea003
u128: Add comparison operators
2020-04-22 20:46:21 +01:00
MerryMage
f17cd6f2c5
unpacked: Use ResidualErrorOnRightShift in FPRoundBase
...
Fixes a bug relating to exponents that are severely out of range.
2020-04-22 20:46:21 +01:00
MerryMage
805428e35e
fp: Remove MantissaT
2020-04-22 20:46:21 +01:00
MerryMage
bda86fd167
FPRSqrtEstimate: Improve documentation of RecipSqrtEstimate
2020-04-22 20:46:21 +01:00
Lioncash
0a64a66b26
FPRSqrtEstimate: Deduplicate array bounds
...
Dehardcodes a few constants in the loops.
2020-04-22 20:46:21 +01:00
Lioncash
b7bd70fd19
A64: Implement FMAXV, FMINV, FMAXNMV, and FMINNMV
2020-04-22 20:46:21 +01:00
Lioncash
664fb12e21
FPRSqrtEstimate: Use forward declarations where applicable
2020-04-22 20:46:21 +01:00
Lioncash
3447c82656
translate: Return by bool in helpers where applicable
...
Gets rid of a bit of duplication regarding the early-out cases and makes
all helpers functions consistent (previously some had a return type of
bool, while others had a return type of void).
2020-04-22 20:46:21 +01:00
Lioncash
d65b056eba
Simplify fallback case for EmitVectorSetElement64()
2020-04-22 20:46:21 +01:00
MerryMage
6087c2af6f
emit_x64_floating_point: s/Esimate/Estimate/
2020-04-22 20:46:21 +01:00
MerryMage
f837ce8e78
simd_scalar_two_register_misc: Implement FRSQRTE, scalar variant
2020-04-22 20:46:21 +01:00
MerryMage
bde58b04d4
IR: Implement FPRSqrtEstimate
2020-04-22 20:46:21 +01:00
MerryMage
16061c28f3
simd_vector_x_indexed_element: Implement FMUL (by element), vector variant
2020-04-22 20:46:21 +01:00
MerryMage
55eaa16615
a64_emit_x64: Ensure host has updated ticks in EmitA64GetCNTPCT
...
Discovered by @Subv.
Fixes incomplete fix begun in 5a91c94dca47c9702dee20fbd5ae1f4c07eef9df.
That fix fails to take into account that LinkBlock doesn't update ticks until there
are no remaining ticks to be executed.
Test added to confirm fix.
2020-04-22 20:46:21 +01:00
MerryMage
edd795e991
a64_emit_x64: Fix stack misalignment on Windows for 128-bit exclusive writes
...
Discovered by @Subv.
Includes a test to ensure this codepath is exercised on Windows.
2020-04-22 20:46:21 +01:00
Lioncash
04b4c8b0cf
emit_x64_aes: Eliminate extraneous usage of a scratch register in EmitAESInverseMixColumns()
...
We can just use the same register the data is in as the result register,
eliminating the need to use a completely separate register to store the
result.
2020-04-22 20:46:21 +01:00
Lioncash
e5d80e998e
A64: Implement SADDLV
2020-04-22 20:46:21 +01:00
Lioncash
a1bc8ddb53
A64: Implement UADDLV
2020-04-22 20:46:21 +01:00
Lioncash
1dc1e3dcd8
fp: Use forward declarations where applicable
...
Minimizes the amount of files that need to be rebuilt if the headers
ever change.
2020-04-22 20:46:21 +01:00
Lioncash
46cb0d813b
emit_x64_vector: Append 'v' prefix onto movq in AVX path
...
This is something I missed when adding in the AVX broadcast code.
2020-04-22 20:46:21 +01:00
Subv
4606a081c9
A64: The A64SetTPIDR IR instruction writes to a system register and should not be eliminated by the dead code elimination pass.
...
Previously this instruction was alway eliminated, resulting in incorrect values for TPIDR_EL0.
2020-04-22 20:46:21 +01:00
MerryMage
b53127600b
fp: A64::FPCR -> FP::FPCR
2020-04-22 20:46:21 +01:00
MerryMage
084bf63a10
bit_util: Implement ClearBits and ModifyBits
2020-04-22 20:46:21 +01:00
MerryMage
699c5f36d5
system: Simplify static_cast
2020-04-22 20:46:21 +01:00
MerryMage
3f602129f4
system: Ensure value of CNTPCT_EL0 is accurate
...
Since we currently only update the host's tick count at the end of a
block, we force an end-of-block before executing a MRS %, CNTPCT_ELO
instruction.
2020-04-22 20:46:21 +01:00
Lioncash
84affdb260
safe_ops: Avoid cases where shift bases are invalid with signed values
...
For example, say the converted signed type is s64, shifting left by 63
bits would be undefined behavior.
However, given an ASL is essentially the same behavior as an LSL
we can just use an unsigned type instead of converting to a signed type.
2020-04-22 20:46:21 +01:00
Lioncash
d0274f412a
safe_ops: Avoid signed overflow in Negate()
...
Negation of values such as -9223372036854775808 can't be represented in
signed equivalents (such as long long), leading to signed overflow.
Therefore, we can just invert bits and add 1 to perform this behavior
with unsigned arithmetic.
2020-04-22 20:46:21 +01:00
Lioncash
af3e23b224
simd_scalar_shift_by_immediate: Implement FCVT{ZS, ZU} (vector, fixed-point)'s scalar double/single-precision variant
2020-04-22 20:46:21 +01:00
Lioncash
91abf87169
simd_scalar_two_register_misc: Implement FCVT{AS, AU, MS, MU, NS, NU, PS, PU, ZS, ZU} (vector)'s scalar double/single-precision variants
...
We can simply implement this in terms of the fixed-point IR opcodes.
2020-04-22 20:46:21 +01:00
Lioncash
0ec8dac660
emit_x64: Remove FPSCR_RoundTowardsZero() virtual function from EmitContext struct
...
This code was bugged in that we were comparing if the rounding mode was
not equal to rounding towards zero. Fortunately, however, nothing uses
this function anymore, and there's already the more general
FPSCR_RMode() available, so this can be removed entirely.
2020-04-22 20:46:21 +01:00
Lioncash
fd92e2f186
emit_x64: Add missing <array> include
...
Commit 755adef62e504a8d616de9dda8937d2428a9471b introduced a helper
alias for std::array, eliminating the need to manually type out sizes
for them, however I forgot to add the include for <array>
2020-04-22 20:46:21 +01:00
Lioncash
f939bd0228
emit_x64_vector{_floating_point}: Add helper alias for sizing arrays relative to vector width
...
Avoids needing to remember to specify the proper size of the arrays, all
that's needed is to specify the type of the array and the size will
automatically be deduced from it. This helps prevent potential oversized
or undersized arrays from being specified.
2020-04-22 20:46:21 +01:00
MerryMage
58f3399032
A64/PopRSBHint: Prevent RETing to a guest PC of ~0ull from crashing the jit
2020-04-22 20:46:21 +01:00
MerryMage
e18fca17dc
A64: Implement FABD in terms of existing IR instructions
...
Fixes NaN issue. Closes #306 .
2020-04-22 20:46:21 +01:00
MerryMage
1dbe9d95e6
FPRoundInt: Final FPRound based on new sign
...
While this shouldn't change any of the results in theory, it's just logically more consistent
2020-04-22 20:46:21 +01:00
MerryMage
83be491875
emit_x64_floating_point: SSE4.1 implementation of EmitFPRound
2020-04-22 20:46:20 +01:00
MerryMage
a40127a054
A64: Implement FRINTX, FRINTI (scalar)
2020-04-22 20:46:20 +01:00
MerryMage
962fa3b65e
A64: Implement FRINTP, FRINTM, FRINTZ (scalar)
2020-04-22 20:46:20 +01:00
MerryMage
5200bf41cf
A64: Implement FRINTN (scalar)
2020-04-22 20:46:20 +01:00
MerryMage
8718dc1692
A64: Implement FRINTA (scalar)
2020-04-22 20:46:20 +01:00
MerryMage
b228694012
IR: Implement FPRoundInt
2020-04-22 20:46:20 +01:00
MerryMage
e24054f4d7
fp: Implement FPRoundInt
2020-04-22 20:46:20 +01:00
MerryMage
f876e4afa2
fp: Implement FPProcessNaN
2020-04-22 20:46:20 +01:00
MerryMage
591adee443
fp/info: Add DefaultNaN
2020-04-22 20:46:20 +01:00
MerryMage
797e18cd97
fp: Move FPToFixed to its own file
2020-04-22 20:46:20 +01:00
MerryMage
295deb4035
a64_jit_state: Add FPSR.QC flag
2020-04-22 20:46:20 +01:00
Lioncash
7797bc2fb2
emit_x64_vector: Use non-scratch Use* variants of registers within EmitVectorUnsignedAbsoluteDifference()
...
In some cases, a register isn't modified, depending on the branch taken,
so we can signify this by using the non-scratch variants in certain
cases.
2020-04-22 20:46:20 +01:00
Lioncash
f7f83b76b7
simd_scalar_two_register_misc: Implement scalar double/single-precision variants of FCM{EQ, GE, GT, LE, LT} (zero)
2020-04-22 20:46:20 +01:00
Lioncash
9db6d1e98b
translate_arm: Remove unnecessary rotr() function
...
We already have RotateRight() in our common code, so we can remove this
function and replace it with it. We can also implement ArmExpandImm_C()
in terms of ArmExpandImm().
2020-04-22 20:46:20 +01:00
Lioncash
9f8a44c982
cast_util: Remove unnecessary typename
...
Given we use std::aligned_storage_t, we don't need to specify
typename here. If we used std::aligned_storage, then we would need to.
2020-04-22 20:46:19 +01:00
MerryMage
89e43867c1
A64: Implement FADDP (scalar)
2020-04-22 20:46:19 +01:00
MerryMage
33fa65de23
A64: Implement FADDP (vector)
2020-04-22 20:46:19 +01:00
MerryMage
9dba273a8c
A64: Implement SADDLP
2020-04-22 20:46:19 +01:00
MerryMage
70ff2d73b5
A64: Implement UADDLP
2020-04-22 20:46:19 +01:00
MerryMage
5563bbbd79
A64: Implement EXT
2020-04-22 20:46:19 +01:00
MerryMage
304cc7f61e
emit_x64_floating_point: SSE4.1 implementation for FP{Double,Single}ToFixed{S,U}{32,64}
2020-04-22 20:46:19 +01:00
MerryMage
3d9677d094
A64: Implement FCVTMU (scalar)
2020-04-22 20:46:19 +01:00
MerryMage
79c9018d60
A64: Implement FCVTMS (scalar)
2020-04-22 20:46:19 +01:00
MerryMage
49c4499a87
A64: Implement FCVTPU (scalar)
2020-04-22 20:46:19 +01:00
MerryMage
af661ef5a6
A64: Implement FCVTPS (scalar)
2020-04-22 20:46:19 +01:00
MerryMage
27319822bb
A64: Implement FCVTAU (scalar)
2020-04-22 20:46:19 +01:00
MerryMage
c0c7a26314
A64: Implement FCVTAS (scalar)
2020-04-22 20:46:19 +01:00
MerryMage
a1965a74a0
A64: Implement FCVTNU (scalar)
2020-04-22 20:46:19 +01:00
MerryMage
7d36dbcdfd
A64: Implement FCVTNS (scalar)
2020-04-22 20:46:19 +01:00
MerryMage
617ca0adf0
floating_point_conversion_integer: Refactor implementation of FCVTZS_float_int and FCVTZU_float_int
2020-04-22 20:46:19 +01:00
MerryMage
caaf36dfd6
IR: Initial implementation of FP{Double,Single}ToFixed{S,U}{32,64}
...
This implementation just falls-back to the software floating point implementation.
2020-04-22 20:46:19 +01:00
MerryMage
760cc3ca89
EmitContext: Expose FPCR
2020-04-22 20:46:19 +01:00
MerryMage
9571269552
fp/op: Implement FPToFixed
2020-04-22 20:46:19 +01:00
MerryMage
8087e8df05
mantissa_util: Implement ResidualErrorOnRightShift
...
Accurately calculate residual error that is shifted out
2020-04-22 20:46:19 +01:00
MerryMage
8668d61881
fp/unpacked: Implement FPRound
2020-04-22 20:46:19 +01:00
MerryMage
55d590c01f
FPCR: Add AHP setter and FZ16 getter
2020-04-22 20:46:19 +01:00
MerryMage
7360a2579b
mp: Implement metaprogramming library
2020-04-22 20:46:19 +01:00
MerryMage
4ab029c114
fp: Implement FPUnpack
2020-04-22 20:46:19 +01:00
MerryMage
4875658917
fp: Implement FPProcessException
2020-04-22 20:46:19 +01:00
MerryMage
3cb98e1560
fp: Move fp_util to fp/util
2020-04-22 20:46:19 +01:00
MerryMage
c41a38b13e
fp: Add FPSR
2020-04-22 20:46:19 +01:00
MerryMage
66381352f3
fp: Add FPInfo
...
Provides information about floating-point format for various bit sizes
2020-04-22 20:46:19 +01:00
MerryMage
d21659152c
safe_ops: Implement safe shifting operations
...
Implement shifiting operations that perform consistently across architectures
without running into undefined or implemented-defined behaviour.
2020-04-22 20:46:19 +01:00
MerryMage
b00fe23b91
bit_util: Implement MostSignificantBit
2020-04-22 20:46:19 +01:00
MerryMage
95ad0d0a66
bit_util: Use Ones to implement Bits
2020-04-22 20:46:19 +01:00
MerryMage
62b640b2fa
bit_util: Add ClearBit and ModifyBit
2020-04-22 20:46:19 +01:00
MerryMage
8651c2d10e
u128: Implement u128
...
For when we need a 128-bit integer
2020-04-22 20:46:19 +01:00
Lioncash
e7409fdfe4
A64: Implement UCVTF (vector, integer)'s double/single-precision variant
2020-04-22 20:46:19 +01:00
Lioncash
4aa4885ba7
ir: Add opcodes for vector conversion of u32/u64 to floating-point
2020-04-22 20:46:19 +01:00
Lioncash
fcae4e2418
simd_three_different: Deduplicate common implementations
...
Generally, the only difference between the signed variants and the
unsigned variants is whether or not we use a sign-extension or
zero-extension, so we can simply use common functions to implement both
cases without totally duplicating code twice here.
2020-04-22 20:46:19 +01:00
Lioncash
9c0d5cf15c
floating_point_conversion_integer: Handle S64/U64 -> F32 conversions in SCVTF_float_int and UCVTF_float_int
2020-04-22 20:46:19 +01:00
Lioncash
7a84b6e8d8
ir: Add opcodes for converting S64 and U64 to single-precision floating-point values
2020-04-22 20:46:19 +01:00
Lioncash
066061fa50
constant_pool: Remove unnecessary std::memset from constructor
...
AllocateFromCodeSpace() already zeroes out the allocated memory.
2020-04-22 20:46:19 +01:00
Lioncash
a1d6a86e8c
A64: Implement ADDV
2020-04-22 20:46:19 +01:00
Lioncash
35026a6ce3
emit_x64_vector: Vectorize fallback path for EmitVectorMaxU32()
2020-04-22 20:46:19 +01:00
Lioncash
245c903129
simd_three_same: Join FPAbsoluteComparison() into FPCompareRegister()
...
These are part of the same comparison family, so there's no real point
in keeping them separate.
2020-04-22 20:46:19 +01:00
Lioncash
9912836b59
A64: Implement scalar double/single-precision variants of FACGE, FACGT, FCMEQ, FCMGE, FCMGT
2020-04-22 20:46:18 +01:00
MerryMage
0b97e9bd8d
emit_x64_floating_point: Fix EmitFPU64ToDouble for TowardsMinusInfinity rounding mode
2020-04-22 20:46:18 +01:00
MerryMage
a2eb9a02e0
backend_x86: Add FPSCR_RMode to EmitContext
2020-04-22 20:46:18 +01:00
MerryMage
d875c08ebf
fp: Extract common RoundingMode enum
2020-04-22 20:46:18 +01:00
Lioncash
3714bc0ed4
floating_point_conversion_integer: Use FPS64ToDouble and FPU64ToDouble in SCVTF_float_int and UCVTF_float_int
...
The opcodes introduced in 979b6f39f1621b80bd463645ec5b08661cb6b1bf can
also be used here, avoiding more falling back to the interpreter.
2020-04-22 20:46:18 +01:00
Lioncash
b97358075e
simd_scalar_two_register_misc: Handle 64-bit case in SCVTF and UCVTF's scalar double/single-precision variant
...
Avoids falling back to the interpreter in the 64-bit case.
2020-04-22 20:46:18 +01:00
Lioncash
7252293184
emit_x64_floating_point: Correct use of UseGpr() in EmitFPU32ToDouble() and EmitFPU32ToSingle()
...
In the non-AVX512 path, the following code is present:
code.mov(from.cvt32(), from.cvt32());
since this potentially modifies 'from', we should be using
UseScratchGpr() instead.
2020-04-22 20:46:18 +01:00
Lioncash
fbd7623fe5
emit_x64_floating_point: Add AVX512F conversion operations to EmitFPU32ToSingle() and EmitFPU32ToDouble()
...
AVX-512F provides convenient instructions for these kinds of conversions
directly
2020-04-22 20:46:18 +01:00
Lioncash
3a41465eaf
ir: Add opcodes for converting S64 and U64 to double-precision values
2020-04-22 20:46:18 +01:00
MerryMage
436ca80bcd
Merge branch 'global_monitor'
2020-04-22 20:46:18 +01:00
Lioncash
0f4bf26e05
simd_two_register_misc: Utilize FPVectorAbs in FABS implementations
...
Since we already have opcodes introduced to implement FACGE and FACGT,
we can reutilize it for the FABS implementations.
2020-04-22 20:46:18 +01:00
MerryMage
821cff1227
A64: Add ClearExclusiveState method
2020-04-22 20:46:18 +01:00
Lioncash
81e572c78c
ir: Extend FPVectorAbs opcode to also handle 16-bit elements for FP16
2020-04-22 20:46:18 +01:00
MerryMage
2a8de5f733
a64_emit_x64: Clear exclusive state in EmitA64CallSupervisor
...
The kernel would have to execute an ERET instruction to return to
userland; this clears exclusive state.
2020-04-22 20:46:18 +01:00
Lioncash
53dbb6a92a
A64: Implement FACGE's vector single/double precision variants
2020-04-22 20:46:18 +01:00
MerryMage
57f7c7e1b0
Implement global exclusive monitor
2020-04-22 20:46:18 +01:00
Lioncash
6912a02d9b
A64: Implement FACGT's vector single/double precision variants
2020-04-22 20:46:18 +01:00
MerryMage
85234338d3
a64_emit_x64: Simplify EmitExclusiveWrite
2020-04-22 20:46:18 +01:00
Lioncash
fc731dddae
ir: Add opcodes for performing vector absolute floating-point values
...
This will be usable for implementing FACGE and FACGT
2020-04-22 20:46:18 +01:00
MerryMage
2fc6b33829
CMakeLists: Add missing files
2020-04-22 20:46:18 +01:00
Lioncash
0bee648b4f
emit_x64_vector: Deduplicate a bit of code in EmitVectorSetElement{8, 32, 64} functions
...
Given both branches are the same, we can hoist out the common code.
2020-04-22 20:46:18 +01:00
Lioncash
d86fea0d28
A64: Implement FCMEQ (zero)'s vector single and double precision variant
2020-04-22 20:46:18 +01:00
Lioncash
593eca7fb1
A64: Implement load/store single structure instructions
...
Implements LD{1, 2, 3, 4}, LD{1, 2, 3, 4}R, and ST{1, 2, 3, 4} single
structure variants.
2020-04-22 20:46:18 +01:00
Lioncash
9bec354791
A64: Implement FCMEQ (register)'s vector single and double precision variant
2020-04-22 20:46:18 +01:00
Lioncash
b6e223fc58
emit_x64_vector: Deduplicate a bit of code within EmitVectorGetElement8()
...
Given both branches use the same destination register size, we can hoist
the common code out.
2020-04-22 20:46:18 +01:00
Lioncash
5ce187a54e
ir: Add opcodes for floating-point vector equalities
2020-04-22 20:46:18 +01:00
MerryMage
be354dbfd0
ir/basic_block: Add missing U16 immediate type to DumpBlock
2020-04-22 20:46:18 +01:00
Lioncash
cf188448d4
emit_x64_vector: Vectorize fallback case in EmitVectorMultiply64()
...
Gets rid of the need to perform a fallback.
2020-04-22 20:46:18 +01:00
MerryMage
5503ff28c3
llvm_disassemble: Allow disassembly of invalid AArch64 instructions
2020-04-22 20:46:18 +01:00
Lioncash
954deff2d4
emit_x64_vector: Add break to final case in EmitVectorRoundingHalvingAddUnsigned()
...
This doesn't alter behavior but does make the code better if anything
else is ever added to this function in the future.
2020-04-22 20:46:18 +01:00
Lioncash
11a92eaaef
A64: Implement SRHADD and URHADD
2020-04-22 20:46:18 +01:00
Lioncash
9e75d08860
A64: Implement FABD's scalar single/double precision variant
2020-04-22 20:46:18 +01:00
Lioncash
bc718c5b28
ir: Add opcodes for performing rounding halving adds
2020-04-22 20:46:18 +01:00
Lioncash
d898d1779d
A64: Implement FABD's vector single/double precision variant
2020-04-22 20:46:18 +01:00
Lioncash
054549da35
emit_x64_vector: Simplify AVX-512 codepath in EmitVectorMultiply64
...
I realized I introduced a helper for simple AVX operation emitting, so
use that instead of writing it all out long-form.
2020-04-22 20:46:18 +01:00
Lioncash
8a4f8aed06
ir: Add opcode for performing FP vector absolute differences
2020-04-22 20:46:18 +01:00
Lioncash
cb456f914b
A64: Implement UMLAL{2}, UMLSL{2}, and UMULL{2}
...
Now that we have the helper function set up for the signed variants, we
can also modify it to be used with the unigned ones by performing a zero
extension instead of a sign extension.
2020-04-22 20:46:18 +01:00
MerryMage
ba84e7a8de
A64: Implement FNMSUB
2020-04-22 20:46:18 +01:00
Lioncash
3576c02d91
A64: Implement SMLSL{2}
2020-04-22 20:46:18 +01:00
MerryMage
a1042cfcd8
A64: Implement FNMADD
2020-04-22 20:46:18 +01:00
Lioncash
ada5c0b2fa
A64: Implement SMLAL{2}
2020-04-22 20:46:18 +01:00
MerryMage
0d83032a6f
A64: Implement FMSUB
2020-04-22 20:46:18 +01:00
Lioncash
2d1aca25e6
A64: Implement SMULL{2}
2020-04-22 20:46:18 +01:00
MerryMage
69e00d225c
A64: Implement FMADD
2020-04-22 20:46:18 +01:00
MerryMage
8c90fcf58e
IR: Implement FPMulAdd
2020-04-22 20:46:18 +01:00
Lioncash
c5ae9107a9
A64: Implement SABAL/SABAL2 and SABDL/SABDL2
...
Now that we have a helper function for the unsigned variants, we can
modify it to also be usable with the signed variants.
2020-04-22 20:46:18 +01:00
Lioncash
24e3299276
A64: Implement FCMGT, FCMGE (register) vector double and single precision variants
2020-04-22 20:46:18 +01:00
Lioncash
26d4473851
A64: Implement UABAL/UABAL2
2020-04-22 20:46:18 +01:00
Lioncash
350bc70be8
A64: Implement FCMGT, FCMGE, FCMLE, FCMLT (zero) vector double and single precision variants.
2020-04-22 20:46:18 +01:00
Lioncash
3397742c74
A64: Implement UABDL/UABDL2
2020-04-22 20:46:18 +01:00
Lioncash
c695da1cf3
ir: Add opcode for floating-point GE and GT comparisons
...
The rest of the comparisons can be implemented in terms of these two
2020-04-22 20:46:18 +01:00
Lioncash
6de5ed96e5
emit_x64_vector: Emit VPMULLQ in EmitVectorMultiply64 on AVX-512{DQ, VL} capable CPUs
...
Shortens code-gen down to a single instruction in the 64-bit path.
2020-04-22 20:46:18 +01:00
Lioncash
9054d1c20b
A64: Implement LDR (literal, SIMD&FP)
2020-04-22 20:46:18 +01:00
Lioncash
0da5e949a8
Correct typo in DataCacheOperation enum
...
Fixes a typo for the InvalidateByVAToPoC enum entry. Given yuzu is the
only known user of 64-bit mode and it doesn't use this value, we can get
away with changing this.
2020-04-22 20:46:18 +01:00
Lioncash
9736e2cce2
A64: Implement FABS' half-precision variant
2020-04-22 20:46:18 +01:00
Lioncash
6e5750e4ec
A64: Implement FABS' single and double precision variant
2020-04-22 20:46:18 +01:00
Lioncash
7bce8d8757
A64: Implement URSHR (scalar) and URSRA (scalar)
...
Now that the utility function is all set up from implementing SRSRA, the
unsigned variants can now be trivially implemented by modifying the
utility function to perform a logical shift right instead of an
arithmetical shift right for the unsigned case.
2020-04-22 20:46:18 +01:00
Lioncash
1e70a589b0
A64: Implement SRSRA (scalar)
2020-04-22 20:46:18 +01:00
Lioncash
998aef07f6
A64: Implement SRSHR (scalar)
2020-04-22 20:46:17 +01:00
Lioncash
7c0250e9f8
A64: Implement SABA
2020-04-22 20:46:17 +01:00
Lioncash
f00789e6f7
A64: Implement SABD
2020-04-22 20:46:17 +01:00
Lioncash
1e10017f4b
ir: Add opcodes for signed absolute differences
2020-04-22 20:46:17 +01:00
Tillmann Karras
d3b44c1b5a
decoder_detail: use structured bindings
2020-04-22 20:46:17 +01:00
Lioncash
f745eb28bf
simd_two_register_misc: Handle 64-bit case for SCVTF_int_4
2020-04-22 20:46:17 +01:00
Lioncash
3f6c529da2
ir: Add opcode to perform the vector conversion S64->F64
...
Unfortunately x86 prior to AVX-512 doesn't really give us any convenient instruction to do the work for us
2020-04-22 20:46:17 +01:00
Lioncash
0e61ee6bf6
A64: Implement SHLL/SHLL2
2020-04-22 20:46:17 +01:00
Lioncash
43e6e98c3b
A64: Add missing decoding for PRFM (unscaled offset)
2020-04-22 20:46:17 +01:00
Lioncash
f2a85d5601
A64: Implement UHSUB
2020-04-22 20:46:17 +01:00
Lioncash
b33360a324
A64: Implement SHSUB
2020-04-22 20:46:17 +01:00
Lioncash
44a5f8095a
ir: Add opcodes for performing vector halving subtracts
2020-04-22 20:46:17 +01:00
Lioncash
4f37c0ec5a
A64: Implement SM4EKEY
2020-04-22 20:46:17 +01:00
Lioncash
3bde3347a5
A64: Implement SM4E
2020-04-22 20:46:17 +01:00
Lioncash
b312d28295
ir: Add an opcode for doing an SM4 lookup table query
2020-04-22 20:46:17 +01:00
Lioncash
27a6d5f6ce
emit_x64_vector: Use VPOPCNTB in EmitVectorPopulationCount() if AVX-512 BITALG is available
2020-04-22 20:46:17 +01:00
Lioncash
4dcc7724e0
A64: Implement UHADD
2020-04-22 20:46:17 +01:00
Lioncash
f8714f7250
A64: Implement SHADD
2020-04-22 20:46:17 +01:00
Lioncash
089096948a
ir: Add opcodes for performing halving adds
2020-04-22 20:46:17 +01:00
Lioncash
3d00dd63b4
emit_x64_vector: Emit VPMINSQ and VPMINUQ for 64-bit vector min operations if AVX-512VL is available
2020-04-22 20:46:17 +01:00
Lioncash
b97b71b8aa
emit_x64_vector: Emit VPMAXSQ and VPMAXUQ for 64-bit vector max operations if AVX-512VL is available
2020-04-22 20:46:17 +01:00
Lioncash
033e400df0
emit_x64_vector_floating_point: Deduplicate accurate NaN handling code
...
Allows the code to both be used from the 32 bit and 64 bit operations without duplicating code.
2020-04-22 20:46:17 +01:00
Lioncash
0f067b7330
emit_x64_vector: Emit VPABSQ in EmitVectorAbs() for the 64-bit case if AVX-512VL is available
2020-04-22 20:46:17 +01:00
Lioncash
d4ee878cbd
emit_x64_vector: Use VPSRAQ in EmitVectorArithmeticShiftRight64() if AVX-512VL is available
2020-04-22 20:46:17 +01:00
Lioncash
b38dd191bd
disassembler_arm: Remove rotation helper function in favor of Common::RotateRight
...
Mildly reduces the amount of duplicated behavior
2020-04-22 20:46:17 +01:00
Lioncash
51e4f1d9db
emit_x64_vector: Vectorize fallback path of EmitVectorMaxS32()
2020-04-22 20:46:17 +01:00
Lioncash
c692ccdd6d
emit_x64_vector: Vectorize fallback path of EmitVectorMaxS8()
2020-04-22 20:46:17 +01:00
Lioncash
b194313d8c
emit_x64_vector: Vectorize fallback path in EmitVectorMinU32()
2020-04-22 20:46:17 +01:00
Lioncash
7ceda6d919
emit_x64_vector: Vectorize fallback path in EmitVectorMinU16()
2020-04-22 20:46:17 +01:00
Lioncash
cda85a1da0
emit_x64_vector: Vectorize fallback path in EmitVectorMinS32()
2020-04-22 20:46:17 +01:00
Lioncash
6e08eed210
emit_x64_vector: Vectorize fallback path in EmitVectorMinS8()
2020-04-22 20:46:17 +01:00
Lioncash
0fb6dce689
emit_x64_vector: Remove unnecessary if constexpr expression in LogicalVShift
...
This can simply be merged with the previous one.
2020-04-22 20:46:17 +01:00
Lioncash
5b71b1337b
emit_x64_vector: Avoid left shift of negative value in LogicalVShift
...
Now that we handle the signed variants, we also have to be careful about left shifts with negative values,
as this is considered undefined behavior.
2020-04-22 20:46:17 +01:00
Lioncash
9954d28868
a64_jitstate: Zero SP and PC on construction of A64JitState
...
Given we zero out/reset everything else in the struct, do the same for these members to keep initialization consistent
2020-04-22 20:46:17 +01:00
Lioncash
4efbd40ea4
backend_x64/callback: Default virtual destructor in the cpp file
...
Prevents the vtable being generated in each translation unit that includes the header (and silences -Wweak-vtables warnings)
2020-04-22 20:46:17 +01:00
Lioncash
edd0b5c8c7
a32_interface/a64_interface: Change reinterpret_casts to static_casts in GetCurrentBlock thunks
...
It's well-defined to static_cast a void* to its proper type.
2020-04-22 20:46:17 +01:00
Lioncash
e71612d394
A64: Implement SSHL (scalar)
2020-04-22 20:46:17 +01:00
Lioncash
ef1e69a1e3
A64: Implement SSHL (vector)
2020-04-22 20:46:17 +01:00
Lioncash
21974ee57e
backend_x64/ir: Amend generic LogicalVShift() template to also handle signed variants
...
Also adds IR opcodes to dispatch said variants
2020-04-22 20:46:17 +01:00
Lioncash
9fc89f0a0e
emit_x64_vector_floating_point: Use arrays for retrieving size instead of hardcoding the size
...
Similar changes were done in emit_x64_vector, but these were missed.
2020-04-22 20:46:17 +01:00
Lioncash
af28e89a13
emit_x64_vector: Vectorize fallback path in EmitVectorMaxU16()
2020-04-22 20:46:17 +01:00
Lioncash
cda75e2079
A64: Implement CMTST's scalar variant
2020-04-22 20:46:17 +01:00
Lioncash
0d20423ad5
emit_x64_vector: Vectorize non-SSE4.1 fallback path for VectorMultiply32()
2020-04-22 20:46:17 +01:00
Lioncash
d70ee7c0d1
emit_x64_vector: Use VBPROADCAST where applicable and available
...
Uses the instruction that does what it says in its name if available. Allows avoiding the use
of a scratch register in EmitVectorBroadcast8() and EmitVectorBroadcastLower8()'s SSSE3 path.
2020-04-22 20:46:17 +01:00
Lioncash
bebe7235ae
A64: Implement UZP1 and UZP2
2020-04-22 20:46:17 +01:00
Lioncash
26d77c6f09
ir: Add opcodes for performing vector deinterleaving
2020-04-22 20:46:17 +01:00
Lioncash
d6f9ed47d9
A64: Implement FNEG (half-precision)
2020-04-22 20:46:17 +01:00
Lioncash
7efbd73bac
A64: Implement USHL (scalar)
2020-04-22 20:46:17 +01:00
Lioncash
41f4717f2b
A64: Implement FNEG (vector)
2020-04-22 20:46:17 +01:00
Lioncash
ba1cc6366d
A64: Implement RSUBHN/RSUBHN2
2020-04-22 20:46:17 +01:00
Lioncash
e41640fe33
A64: Implement RADDHN/RADDHN2
2020-04-22 20:46:17 +01:00
Lioncash
b719a6b3f7
A64: Implement XAR
2020-04-22 20:46:17 +01:00
Lioncash
0b1b131ec2
simd_two_register_misc: Factor out common comparison code
...
Gets rid of a tiny bit of duplicated code.
2020-04-22 20:46:17 +01:00
Lioncash
ed0b84da70
A64: Implement CMLE (zero)'s vector variant
2020-04-22 20:46:17 +01:00
Lioncash
b595a68ffa
A64: Implement CMTST (vector)
2020-04-22 20:46:17 +01:00
Lioncash
48c7f8630c
A64: Implement ADDHN{2} and SUBHN{2}
2020-04-22 20:46:17 +01:00
Lioncash
3acd9c9200
translate: zero extend result in Vpart when storing to lower part of vector
2020-04-22 20:46:17 +01:00
Lioncash
87ca63699f
emit_x64_vector: Emit PMAXUD in EmitVectorMaxU32 on SSE4.1-capable CPUs
2020-04-22 20:46:17 +01:00
Lioncash
f17702f608
emit_x64_vector: Emit PMINUD in EmitVectorMinU32 on SSE4.1-capable CPUs
2020-04-22 20:46:17 +01:00
Lioncash
596a8dd1dd
emit_x64_vector: Emit PMINSD in EmitVectorMinS32 on SSE4.1-capable CPUs
...
Provides a better alternative to a fallback operation.
2020-04-22 20:46:17 +01:00
Lioncash
75fd4eaaaa
emit_x64_vector: Get rid of some magic numbers in loop bounds
2020-04-22 20:46:17 +01:00
Lioncash
7b80ac25eb
emit_x64_vector: Generify variable shift functions
2020-04-22 20:46:17 +01:00
Lioncash
4ec735f707
A64: Implement CMLE (zero)'s scalar variant
2020-04-22 20:46:17 +01:00
Lioncash
6534184df2
A64: Implement CMLT (zero)'s scalar single/double-precision variant
2020-04-22 20:46:17 +01:00
Lioncash
8863c9bb4b
A64: Implement SHA512H2
2020-04-22 20:46:17 +01:00
Lioncash
033b890e25
A64: Implement SHA512H
2020-04-22 20:46:17 +01:00
Lioncash
d1f5b084b4
A64: Handle S32->F32 case for SCVTF (vector)
2020-04-22 20:46:17 +01:00
Lioncash
38fa984b53
IR: Add opcode for packed word->f32 conversions
2020-04-22 20:46:16 +01:00
Lioncash
b8587d8e34
A64: Implement SHA512SU1
2020-04-22 20:46:16 +01:00
Lioncash
44d846045a
A64: Implement SHA512SU0
2020-04-22 20:46:16 +01:00
Lioncash
ca903c1585
A64: Implement SHA256H and SHA256H2
2020-04-22 20:46:16 +01:00
MerryMage
e4237c44eb
A64: Implement SCVTF (vector, integer), scalar varaint
2020-04-22 20:46:16 +01:00
MerryMage
bfba38d0b6
impl: Reorganize scalar two-register misc instructions
2020-04-22 20:46:16 +01:00
Lioncash
ea582b17cc
A64: Implement SHA256SU1
2020-04-22 20:46:16 +01:00
Lioncash
06c5dcaf5e
simd_two_register_misc: Add missing zeroing of the vector for CMGT and CMLT
2020-04-22 20:46:16 +01:00
Lioncash
0d50d7314b
A64: Implement CMGE (zero)'s vector variant
2020-04-22 20:46:16 +01:00
Lioncash
ab35dc0e78
A64: Implement MLS (by element)
2020-04-22 20:46:16 +01:00
Lioncash
1651e60462
A64: Implement MUL (by element)
2020-04-22 20:46:16 +01:00
MerryMage
a86d4093cd
A64: Implement MLA (by element)
2020-04-22 20:46:16 +01:00
Lioncash
7f47402609
A64: Implement ABS (scalar)
2020-04-22 20:46:16 +01:00
Lioncash
c8eb4528be
A64: Implement SHA256SU0
2020-04-22 20:46:16 +01:00
Lioncash
181c3b0790
A64: Implement SHA1M
2020-04-22 20:46:16 +01:00
Lioncash
47bc97a71b
A64: Implement SHA1P
2020-04-22 20:46:16 +01:00