Lioncash
3d465e2c36
A64: Implement SQXTN, SQXTUN, and UQXTN's scalar variants
...
We can implement these in terms of the vector variants
2020-04-22 20:53:45 +01:00
Lioncash
4ff39c6ea8
A64: Implement SDOT and UDOT's (by element) variants
...
Gets all of the dot product instructions out of the way.
2020-04-22 20:53:45 +01:00
MerryMage
21df1fb539
emit_x64_vector: Don't load zero constant from memory in EmitVectorTableLookup
2020-04-22 20:53:45 +01:00
MerryMage
3bbcca8757
emit_x64_vector: Special-case is_defaults_zero && table_size == 2 in EmitVectorTableLookup
2020-04-22 20:53:45 +01:00
MerryMage
9cc00f900c
emit_x64_vector: Release registers when possible in EmitVectorTableLookup
2020-04-22 20:53:45 +01:00
MerryMage
a12afd1065
reg_alloc: Add the ability to Release an allocation early
2020-04-22 20:53:45 +01:00
MerryMage
e68bd3c6c1
emit_x64_vector: Special-case table_size == 1 in EmitVectorTableLookup
2020-04-22 20:53:45 +01:00
MerryMage
a4e1f8a63a
emit_x64_vector: SSE4.1 implementation of EmitVectorTableLookup
2020-04-22 20:53:45 +01:00
MerryMage
0c18b85c27
A64: Implement TBL and TBX
2020-04-22 20:53:45 +01:00
MerryMage
89d08c7d61
IR: Add VectorTable and VectorTableLookup IR instructions
2020-04-22 20:53:45 +01:00
MerryMage
0288974512
opcodes: Cleanup opcodes table
...
* Remove T:: prefix from types.
* Add another column for a 4th argument.
2020-04-22 20:53:45 +01:00
Lioncash
d9fc6cf31f
A64: Implement SDOT and UDOT's vector variant
2020-04-22 20:53:45 +01:00
Lioncash
cb5e5c5d49
A64: Implement SADALP and UADALP
...
While we're at it we can join the code for SADDLP and UADDLP with these
instructions, since the only difference is we do an accumulate at the
end of the operation.
2020-04-22 20:53:45 +01:00
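The sharing described in cb5e5c5d49 can be pictured roughly as follows. This is a minimal sketch over plain arrays, not the project's IR: `PairedAddLongSketch`, the `Lanes16`/`Lanes32` aliases, and the fixed 16-bit to 32-bit lane sizes are illustrative assumptions, not code from the commit.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Lanes16 = std::array<std::int16_t, 8>;
using Lanes32 = std::array<std::int32_t, 4>;

// Hypothetical helper: adds each adjacent pair of signed 16-bit lanes into a
// 32-bit lane. When `accumulate` is true the sum is added onto `acc`
// (SADALP-like); otherwise `acc` is ignored (SADDLP-like).
inline Lanes32 PairedAddLongSketch(const Lanes16& operand, const Lanes32& acc, bool accumulate) {
    Lanes32 result{};
    for (std::size_t i = 0; i < result.size(); ++i) {
        const std::int32_t sum = std::int32_t{operand[2 * i]} + std::int32_t{operand[2 * i + 1]};
        // Accumulate in unsigned arithmetic so that overflow wraps instead of being undefined.
        const std::uint32_t wrapped = static_cast<std::uint32_t>(acc[i]) + static_cast<std::uint32_t>(sum);
        result[i] = accumulate ? static_cast<std::int32_t>(wrapped) : sum;
    }
    return result;
}

// Usage: the same pairwise sums, with and without the final accumulate.
inline void PairedAddLongExample() {
    const Lanes16 operand{1, 2, 3, 4, -5, -6, 32767, 32767};
    const Lanes32 acc{10, 20, 30, 40};
    assert((PairedAddLongSketch(operand, acc, false) == Lanes32{3, 7, -11, 65534}));
    assert((PairedAddLongSketch(operand, acc, true) == Lanes32{13, 27, 19, 65574}));
}
```

The only branch between the two behaviours is the final accumulate, which is why one code path can plausibly serve both instruction families.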
Lioncash
29f8b30634
A64: Implement SRSHL and URSHL
...
Implements both scalar and vector variants.
2020-04-22 20:53:45 +01:00
Lioncash
0efa2ce3b0
ir: Add opcodes for performing rounding left shifts
2020-04-22 20:53:45 +01:00
MerryMage
656ceff225
emit_x64_floating_point: Fix smallest normal check in EmitFPMulAdd
2020-04-22 20:53:45 +01:00
Lioncash
f3f60cd179
A64: Implement ISB
...
Given we want to ensure that all instructions are fetched again, we can
treat an ISB instruction as a code cache flush.
2020-04-22 20:53:45 +01:00
Lioncash
be53e356a2
A64: Implement FCVTN{2}
2020-04-22 20:53:45 +01:00
Lioncash
4c3d7c5a8d
A64: Implement FCVTL{2}
2020-04-22 20:53:45 +01:00
Lioncash
7eb6be7a6a
A64: Implement FMAXNM and FMINNM vector variants.
...
Currently we can implement these in terms of the scalar IR variants.
2020-04-22 20:53:45 +01:00
Lioncash
8b65ea68c0
A64: Implement FMAXP, FMAXNMP, FMINP, and FMINNMP's vector variants
...
We can just implement these in terms of scalars for the time being.
2020-04-22 20:53:45 +01:00
MerryMage
ec76f95f5a
emit_x64_vector_floating_point: Correct value of smallest_normal_number
2020-04-22 20:53:45 +01:00
MerryMage
e60d6c0d20
fp/info: Incorrect point_position in FPValue
2020-04-22 20:53:45 +01:00
MerryMage
8a3b6364c2
load_store_exclusive: Define s == t state to be Constraint_NONE
...
Downstream (yuzu) mentioned that the instruction:
STXR W9, W9, [X0]
was executed in the program "Crash N-Sane Trilogy".
2020-04-22 20:53:45 +01:00
MerryMage
cd40e4dae0
A64/translate: Allow for unpredictable behaviour to be defined
2020-04-22 20:53:45 +01:00
MerryMage
d1d6f4feb5
system: Implement MRS CNTFRQ_EL0
2020-04-22 20:53:45 +01:00
Lioncash
7ef7def661
A64: Implement SQ{ADD, SUB}, and UQ{ADD, SUB}'s vector variants
...
Currently we implement these in terms of the scalar variants. Falling
back to the interpreter is slow enough that this is still the more
effective option.
2020-04-22 20:46:23 +01:00
Lioncash
a4b0e2ace6
A64: Implement UQADD/UQSUB's scalar variants
2020-04-22 20:46:23 +01:00
Lioncash
acbaf04fef
ir: Add opcodes for unsigned saturating add and subtract
2020-04-22 20:46:23 +01:00
Lioncash
c41b5a3492
x64/reg_alloc: Use type alias for array returned by GetArgumentInfo()
...
This way if the number ever changes, we don't need to change the type in
other places.
2020-04-22 20:46:23 +01:00
Lioncash
2188765e28
ir/value: Use type alias CoprocessorInfo for std::array<u8, 8>
...
Provides a more descriptive label for the interface, and avoids the need
to hardcode the array size in multiple places.
2020-04-22 20:46:23 +01:00
MerryMage
71e137715d
status_register_access: Add support for bits 0 and 1 of mask to MSR
2020-04-22 20:46:23 +01:00
MerryMage
ac51c2547d
A32/translate/load_store: Correct detection of writeback
2020-04-22 20:46:23 +01:00
MerryMage
d345220251
A32/translate: Add TranslateSingleInstruction
2020-04-22 20:46:23 +01:00
MerryMage
5fc197c564
A32/ir_emitter: Bug fix: IREmitter::ExceptionRaised using incorrect opcode
2020-04-22 20:46:23 +01:00
MerryMage
ff3805e332
A32/decoders: Split instruction list into include file
2020-04-22 20:46:23 +01:00
MerryMage
3f4d118d73
microinstruction: Improve assert messages
2020-04-22 20:46:23 +01:00
MerryMage
a7e6f2a235
emit_x64_vector: EmitVectorNarrow16: AVX512 implementation
2020-04-22 20:46:23 +01:00
MerryMage
b6350e3947
emit_x64_vector: EmitVectorNarrow32: prefer pblendw to loading constant
2020-04-22 20:46:23 +01:00
MerryMage
8fdba189cb
emit_x64_vector: packusdw is SSE4.1
2020-04-22 20:46:23 +01:00
MerryMage
1ef388d1cd
emit_x64_vector_floating_point: Simplify FPVector{Min,Max}
2020-04-22 20:46:23 +01:00
MerryMage
4a1ce797cb
emit_x64_vector_floating_point: Simplify Get*Vector functions
2020-04-22 20:46:23 +01:00
MerryMage
bcaced297a
emit_x64_floating_point: Remove EmitProcessNaNs
2020-04-22 20:46:23 +01:00
MerryMage
2e0885388e
devirtualize: Replace DEVIRT macro with function template
2020-04-22 20:46:23 +01:00
Lioncash
54d8552177
a32_emit_x64: std::move A32::UserConfig in the constructor
...
This avoids a few redundant atomic increments and decrements,
considering the UserConfig instance contains a std::array of
std::shared_ptr<Coprocessor> instances.
2020-04-22 20:46:23 +01:00
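The effect described in 54d8552177 can be illustrated with a small stand-alone sketch. `CoprocessorSketch`, `ConfigSketch`, and `EmitterSketch` are hypothetical stand-ins for the real types, and the array size of 16 is an assumption made only for this example.

```cpp
#include <array>
#include <cassert>
#include <memory>
#include <utility>

struct CoprocessorSketch {};

// Hypothetical configuration type holding shared ownership of coprocessors.
struct ConfigSketch {
    std::array<std::shared_ptr<CoprocessorSketch>, 16> coprocessors;
};

class EmitterSketch {
public:
    // Taking the config by value and moving it into the member transfers each
    // shared_ptr instead of copying it, so no reference counts are touched.
    explicit EmitterSketch(ConfigSketch config) : conf(std::move(config)) {}

    const ConfigSketch& Config() const { return conf; }

private:
    ConfigSketch conf;
};

// Usage: after moving the config in, the emitter holds the only reference.
inline void MoveConfigExample() {
    ConfigSketch config;
    config.coprocessors[0] = std::make_shared<CoprocessorSketch>();
    const EmitterSketch emitter{std::move(config)};
    assert(emitter.Config().coprocessors[0].use_count() == 1);
}
```

Copying the member instead (`conf(config)`) would increment every non-null reference count and decrement it again when the parameter is destroyed, which is the redundant atomic traffic the commit message refers to.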
MerryMage
b098c650df
emit_x64_floating_point: Use EmitPostProcessNaNs in EmitFPMulX
2020-04-22 20:46:23 +01:00
MerryMage
c1babf41b2
emit_x64_floating_point: Remove unnecessary DenormalsAreZero from EmitFPSingleToDouble and EmitFPDoubleToSingle
2020-04-22 20:46:23 +01:00
MerryMage
700088408d
emit_x64_floating_point: Simplify EmitFP{Min,Max}{,Numeric}{32,64}
2020-04-22 20:46:23 +01:00
MerryMage
07e0585994
emit_x64_floating_point: Reduce NaN processing overhead
2020-04-22 20:46:23 +01:00
MerryMage
f5e11d117a
A64: Implement FMULX, scalar single/double variant
2020-04-22 20:46:23 +01:00
MerryMage
17f73974f2
IR: Implement FPMulX IR instruction
2020-04-22 20:46:23 +01:00
Lioncash
391e16be64
emit_x64_vector: Vectorize 32-bit variants of paired min/max
...
Gets rid of the fallbacks for these cases.
2020-04-22 20:46:23 +01:00
MerryMage
5ae045d67e
emit_x64_vector: Improve code emission of VectorGetElement* for index == 0
2020-04-22 20:46:23 +01:00
MerryMage
e9ab7f7664
reg_alloc: Do a UseScratch if a Use destination is too small
2020-04-22 20:46:23 +01:00
MerryMage
90f8dda966
emit_x64_floating_point: AVX implementation of ForceToDefaultNaN
2020-04-22 20:46:23 +01:00
MerryMage
dfb660cd16
emit_x64_vector_floating_point: Prefer blendvp{s,d} to vblendvp{s,d} where possible
...
It's a cheaper instruction.
2020-04-22 20:46:23 +01:00
MerryMage
476c0f15da
backend_x64: Remove all use of xmm0
2020-04-22 20:46:23 +01:00
MerryMage
8252efd7b1
emit_x64_vector_floating_point: AVX implementation of ForceToDefaultNaN
2020-04-22 20:46:23 +01:00
MerryMage
746dc521b9
emit_x64_vector_floating_point: Reduce codesize of ForceToDefaultNaN
2020-04-22 20:46:23 +01:00
MerryMage
7731dcdca9
emit_x64_vector_floating_point: Reduce codesize of EmitTwoOpVectorOperation
2020-04-22 20:46:23 +01:00
MerryMage
bb93353f94
emit_x64_vector_floating_point: Correct FMA in FTZ mode
...
x64 rounds before flushing to zero
AArch64 rounds after flushing to zero
This difference of behaviour is noticeable if something would round to a smallest normalized number
2020-04-22 20:46:23 +01:00
MerryMage
8ef195db3c
emit_x64_floating_point: DenormalsAreZero is redundant as hardware already does DAZ
...
Exceptions: F{MIN,MAX}{,NM}
2020-04-22 20:46:23 +01:00
MerryMage
de9d8c461c
emit_x64_floating_point: FlushToZero is redundant as hardware already does FTZ
2020-04-22 20:46:23 +01:00
MerryMage
822fd4a875
backend_x64: Fix FPVectorMulAdd and FPMulAdd NaN handling with denormals
...
Denormals should be treated as zero in NaN handler
2020-04-22 20:46:23 +01:00
MerryMage
b393e15ab6
backend_x64: Fix bugs when FPCR.FZ=1
...
Bugs:
* DenormalsAreZero flushed to positive zero instead of preserving sign.
* FMAXNM/FMINNM (scalar) should perform DAZ *before* special zero handling.
* FMAX/FMIN/FMAXNM/FMINNM (vector) did not DAZ.
2020-04-22 20:46:23 +01:00
MerryMage
5e88d66470
fp/info: Deduplicate functions
2020-04-22 20:46:23 +01:00
MerryMage
2019d32743
emit_x64_floating_point: Deduplicate EmitFPMulAdd implementation
2020-04-22 20:46:23 +01:00
MerryMage
e038fe72df
emit_x64_floating_point: Deduplicate code
2020-04-22 20:46:23 +01:00
MerryMage
ec82a845b7
emit_x64_vector_floating_point: Fix FPVector{Max,Min} when FPCR.DN = 1
2020-04-22 20:46:23 +01:00
MerryMage
7f27945411
emit_x64_floating_point: Fix FP{Max,Min} when FPCR.DN = 1
2020-04-22 20:46:23 +01:00
MerryMage
21a28c2545
IR: SSE4.1 implementation of FPVectorRoundInt
2020-04-22 20:46:23 +01:00
MerryMage
9669e49817
A64: Implement FRINT{N,M,P,Z,A,X,I} (vector), single/double variant
2020-04-22 20:46:23 +01:00
MerryMage
f976c47008
IR: Initial implementation of FPVectorRoundInt
2020-04-22 20:46:23 +01:00
MerryMage
f2393488fe
A64: Implement SQADD and SQSUB, scalar variant
2020-04-22 20:46:23 +01:00
MerryMage
10e196480f
IR: Generalise SignedSaturated{Add,Sub} to support more bitwidths
2020-04-22 20:46:23 +01:00
MerryMage
71db0e67ae
a64_emit_x64: Bugfix EmitA64OrQC - Incorrect argument
2020-04-22 20:46:23 +01:00
Lioncash
d0fdd3c6e6
simd_three_same: Extract non-paired SMAX, SMIN, UMAX, UMIN code to a common function
...
Deduplicates a bit of code and makes its layout consistent with the
paired variants
2020-04-22 20:46:23 +01:00
Lioncash
2bea2d0512
A64: Implement SMAXP, SMINP, UMAXP, UMINP
2020-04-22 20:46:23 +01:00
Lioncash
463b9a3d02
ir: Add opcodes for vector paired maximum and minimums
...
For the time being, we can just do a naive implementation which avoids
falling back to the interpreter a bit. Horizontal operations aren't
necessarily x86 SIMD's forte anyways.
2020-04-22 20:46:23 +01:00
Lioncash
43344c5400
A64: Implement SMAXV, SMINV, UMAXV, and UMINV
2020-04-22 20:46:23 +01:00
Lioncash
2501bfbfae
ir: Add opcodes for performing scalar integral min/max
2020-04-22 20:46:23 +01:00
Lioncash
7fdd8b0197
A64: Implement PMULL{2}
2020-04-22 20:46:23 +01:00
Lioncash
5ebf496d4e
translate: Deduplicate GetDataSize() functions
...
Avoids defining the same function multiple times in different files.
2020-04-22 20:46:22 +01:00
Lioncash
f83cd2da9a
floating_point_{conditional}_compare: Deduplicate code
...
Deduplicates the implementation code of instructions by extracting the
code to a common function.
2020-04-22 20:46:22 +01:00
MerryMage
f9c6d5e1a0
common: Move all cryptographic functions to common/crypto
2020-04-22 20:46:22 +01:00
MerryMage
5dc23e49d7
a32_emit_x64: BMI2 implementation of A32SetCpsr
2020-04-22 20:46:22 +01:00
MerryMage
0f85305933
a32_emit_x64: Shorten EmitA32GetCpsr
2020-04-22 20:46:22 +01:00
MerryMage
9fe2bf8733
a32_emit_x64: Assert that memory layout assumption in EmitA32GetCpsr is valid
2020-04-22 20:46:22 +01:00
Lioncash
b48fb8ca6b
A64: Implement PMUL
2020-04-22 20:46:22 +01:00
Lioncash
affa312d1d
ir: Add opcode for performing polynomial multiplication
2020-04-22 20:46:22 +01:00
MerryMage
dd4ac86f8e
A64: Implement FCVT{N,M,A,P}{U,S} (vector), FCVTZU (vector, integer), single/double variant
2020-04-22 20:46:22 +01:00
MerryMage
28b38916a8
A64: Implement FCVTZS (vector, integer), single/double variant
2020-04-22 20:46:22 +01:00
MerryMage
507bcd8b8b
IR: Implement FPVectorTo{Signed,Unsigned}Fixed
2020-04-22 20:46:22 +01:00
MerryMage
8f75a1fe04
fp/info: Replace constant value generators with FPValue
...
Instead of having multiple different functions we can just have one.
2020-04-22 20:46:22 +01:00
MerryMage
da261772ea
emit_x64_vector_floating_point: AVX implementation of FPVector{Max,Min}
2020-04-22 20:46:22 +01:00
MerryMage
a0d6f0de57
emit_x64_vector_floating_point: Remove unnecessary double jump in HandleNaNs
2020-04-22 20:46:22 +01:00
Lioncash
c778c7b868
A64: Implement FMAX's vector single and double precision variants
2020-04-22 20:46:22 +01:00
Lioncash
009879d92b
A64: Implement FMIN's vector single and double precision variants
2020-04-22 20:46:22 +01:00
MerryMage
7b03da86c2
IR: Implement FPVector{Max,Min}
2020-04-22 20:46:22 +01:00
MerryMage
e76e1186bb
FPRecipEstimate: Move offset out of function
...
MSVC has weird lambda capturing rules.
2020-04-22 20:46:22 +01:00
MerryMage
ddcff86f9c
microinstruction: Update ReadsFromAndWritesToFPSRCumulativeExceptionBits
2020-04-22 20:46:22 +01:00
MerryMage
10de36394e
A64: Implement FRECPS, vector/scalar single/double variants
2020-04-22 20:46:22 +01:00
MerryMage
901bd9b4e2
IR: Implement FPRecipStepFused, FPVectorRecipStepFused
2020-04-22 20:46:22 +01:00
MerryMage
f66f61d8ab
A64: Implement FRECPE, vector single/double variant
2020-04-22 20:46:22 +01:00
MerryMage
939f5f5c7a
IR: Implement FPVectorRecipEstimate
2020-04-22 20:46:22 +01:00
MerryMage
27c73dd56a
A64: Implement FRECPE, scalar single/double variant
2020-04-22 20:46:22 +01:00
MerryMage
fc2d33ae7b
IR: Implement FPRecipEstimate
2020-04-22 20:46:22 +01:00
MerryMage
c1dcfe29f7
IR: Implement FPRecipEstimate
2020-04-22 20:46:22 +01:00
MerryMage
7a673a8a43
fp: Change FPUnpacked to a normalized representation
...
Having a known position for the highest set bit makes writing algorithms easier
2020-04-22 20:46:22 +01:00
MerryMage
3fe45c6d8e
block_of_code: Add ABI_PARAMS array
2020-04-22 20:46:22 +01:00
MerryMage
642b6c31d2
A64: Implement MLA, MLS (by element), vector single/double variant
2020-04-22 20:46:22 +01:00
MerryMage
0de37b11ad
A64: Implement FMLS (vector), single/double variant
2020-04-22 20:46:22 +01:00
MerryMage
64c2f698a2
emit_x64_vector_floating_point: Specify NanHandler::function_type explicitly
...
MSVC doesn't like dealing with auto return types
2020-04-22 20:46:22 +01:00
MerryMage
2ef59b4f03
emit_x64_vector_floating_point: ChooseOnFsize arguments maybe_unused
2020-04-22 20:46:22 +01:00
MerryMage
04f325a05e
IR: Implement FPVectorNeg
2020-04-22 20:46:22 +01:00
MerryMage
934132e0c5
A64: Implement FMLA (vector), single/double variant
2020-04-22 20:46:22 +01:00
MerryMage
771a4fc20b
IR: Implement FPVectorMulAdd
2020-04-22 20:46:22 +01:00
MerryMage
3218bb9890
emit_x64_vector_floating_point: Standardize naming scheme
2020-04-22 20:46:22 +01:00
MerryMage
8f72be0a02
emit_x64_floating_point: Simplify indexers
2020-04-22 20:46:22 +01:00
MerryMage
25b28bb234
emit_x64_vector_floating_point: Simplify EmitVectorOperation*
2020-04-22 20:46:22 +01:00
MerryMage
1edd0125b2
mp: rename mp.h to mp/function_info.h
2020-04-22 20:46:22 +01:00
MerryMage
0921678edb
emit_x64_vector: Slightly improve ArithmeticShiftRightByte
2020-04-22 20:46:22 +01:00
MerryMage
43407c4bb4
emit_x64_vector: Simplify VectorShuffleImpl
2020-04-22 20:46:22 +01:00
MerryMage
ecbf9dbae5
IR: Implement A64OrQC
2020-04-22 20:46:22 +01:00
MerryMage
f0fecf2615
A64: Implement UQSHRN, UQRSHRN (vector)
2020-04-22 20:46:22 +01:00
MerryMage
8f4c1a8558
emit_x64_vector: -0x80000000 isn't -0x80000000
2020-04-22 20:46:22 +01:00
MerryMage
b455b566e7
A64: Implement UQXTN (vector)
2020-04-22 20:46:22 +01:00
MerryMage
e686a81612
emit_x64_vector: Fix non-SSE4.1 saturated narrowing reconstruction comparison
...
Allows non-SSE4.1 to produce the correct FPSR.QC flag
2020-04-22 20:46:22 +01:00
MerryMage
3874cb37e3
A64: Implement SQXTN (vector)
2020-04-22 20:46:22 +01:00
MerryMage
8ef114d48f
emit_x64_vector: packusdw requires SSE4.1
...
In EmitVectorSignedSaturatedNarrowToUnsigned32.
2020-04-22 20:46:22 +01:00
MerryMage
712c6c1d7e
A64: Implement SQSHRUN, SQRSHRUN (vector)
2020-04-22 20:46:22 +01:00
MerryMage
c5722ec963
simd_shift_by_immediate: Simplify ShiftRight
2020-04-22 20:46:22 +01:00
MerryMage
f020dbe4ed
A64: Implement SQXTUN
2020-04-22 20:46:22 +01:00
MerryMage
6918ef7360
microinstruction: Reorganize FPSCR related instruction queries
2020-04-22 20:46:22 +01:00
Lioncash
a639fa5534
microinstruction: Add missing FP scalar opcodes to ReadsFromFPSCR() and WritesToFPSCR()
...
These were forgotten when the opcodes were added.
2020-04-22 20:46:22 +01:00
Lioncash
3ca18d8a6d
u128: Make Bit() a const-qualified member function
...
This function doesn't modify the struct members, so it can be made
const.
2020-04-22 20:46:22 +01:00
MerryMage
b2e4c16ef8
A64: Implement FRSQRTS (vector), single/double variant
2020-04-22 20:46:22 +01:00
MerryMage
45dc5f74f3
A64: Implement FRSQRTE (vector), single/double variant
2020-04-22 20:46:22 +01:00
MerryMage
b74d5520f9
A64: Implement FRSQRTS (scalar), single/double variant
2020-04-22 20:46:22 +01:00
MerryMage
506e544bfe
IR: Implement FPRSqrtStepFused
2020-04-22 20:46:22 +01:00
MerryMage
6eb069e80d
fp: Implement FPRSqrtStepFused
2020-04-22 20:46:22 +01:00
MerryMage
b0ff35fcd1
fp: Implement FPNeg
2020-04-22 20:46:22 +01:00
MerryMage
ca6774ccce
process_nan: Add two operand variant
2020-04-22 20:46:22 +01:00
Lioncash
ace7d2ba50
A64: Implement FMAXP, FMINP, FMAXNMP and FMINNMP's scalar double/single-precision variant
2020-04-22 20:46:21 +01:00
MerryMage
66bb05fc0a
emit_x64_floating_point: Fixup special NaN case in FMA FPMulAdd implementation
2020-04-22 20:46:21 +01:00
Lioncash
070637e0f6
fp: Use a forward declaration in fused.h
...
It's permissible to forward declare here, so we can do so and eliminate
a direct header dependency
2020-04-22 20:46:21 +01:00
Lioncash
030820f649
u128: Implement comparison operators in terms of one another
...
We can just implement the comparisons in terms of operator< and
implement inequality with the negation of operator==.
2020-04-22 20:46:21 +01:00
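The approach described in 030820f649 can be sketched as below. `Wide128` is a hypothetical stand-in for the project's 128-bit type, and the use of `std::tie` for the two base operators is an assumption of this example, not something the commit states.

```cpp
#include <cassert>
#include <cstdint>
#include <tuple>

// Hypothetical 128-bit value made of two 64-bit halves.
struct Wide128 {
    std::uint64_t lower;
    std::uint64_t upper;
};

// The two base operators: compare the upper half first, then the lower half.
inline bool operator<(Wide128 a, Wide128 b) {
    return std::tie(a.upper, a.lower) < std::tie(b.upper, b.lower);
}
inline bool operator==(Wide128 a, Wide128 b) {
    return std::tie(a.upper, a.lower) == std::tie(b.upper, b.lower);
}

// Everything else is derived from operator< and operator==.
inline bool operator>(Wide128 a, Wide128 b) { return b < a; }
inline bool operator<=(Wide128 a, Wide128 b) { return !(b < a); }
inline bool operator>=(Wide128 a, Wide128 b) { return !(a < b); }
inline bool operator!=(Wide128 a, Wide128 b) { return !(a == b); }

// Usage: the upper half dominates the ordering.
inline void Wide128CompareExample() {
    const Wide128 small{0xFFFFFFFFFFFFFFFFull, 0};
    const Wide128 large{0, 1};
    assert(small < large);
    assert(large >= small);
    assert(small != large);
}
```

Deriving the remaining operators this way keeps the ordering logic in one place, so a fix to `operator<` carries over to the rest.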
MerryMage
76b07d6646
u128: StickyLogicalShiftRight requires special-casing for amount == 64
...
In this case (128 - amount) == 64, and shifting a 64-bit value by 64 invokes undefined behaviour
2020-04-22 20:46:21 +01:00
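The special case described in 76b07d6646 can be sketched as follows. `U128Sketch` and `StickyShiftRightSketch` are hypothetical names, and the branch layout is an assumption for illustration; only the `amount == 64` hazard itself comes from the commit message.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 128-bit value made of two 64-bit halves.
struct U128Sketch {
    std::uint64_t lower;
    std::uint64_t upper;
};

// Logical shift right in which any non-zero bits shifted out are ORed into
// bit 0 of the result (the "sticky" bit).
inline U128Sketch StickyShiftRightSketch(U128Sketch v, int amount) {
    if (amount <= 0) {
        return v;
    }
    if (amount < 64) {
        const bool sticky = (v.lower << (64 - amount)) != 0;
        const std::uint64_t lower = (v.lower >> amount) | (v.upper << (64 - amount));
        return U128Sketch{lower | static_cast<std::uint64_t>(sticky), v.upper >> amount};
    }
    if (amount == 64) {
        // Handled separately: the branch below would evaluate
        // v.upper << (128 - 64), and a 64-bit shift by 64 is undefined.
        const bool sticky = v.lower != 0;
        return U128Sketch{v.upper | static_cast<std::uint64_t>(sticky), 0};
    }
    if (amount < 128) {
        const bool sticky = v.lower != 0 || (v.upper << (128 - amount)) != 0;
        return U128Sketch{(v.upper >> (amount - 64)) | static_cast<std::uint64_t>(sticky), 0};
    }
    // Everything is shifted out; only the sticky bit can remain.
    const bool sticky = v.lower != 0 || v.upper != 0;
    return U128Sketch{static_cast<std::uint64_t>(sticky), 0};
}

// Usage: shifting by exactly 64 moves the upper half down and keeps the sticky bit.
inline void StickyShiftExample() {
    const U128Sketch r = StickyShiftRightSketch(U128Sketch{1, 2}, 64);
    assert(r.lower == 3 && r.upper == 0);
}
```

Without the dedicated branch, `amount == 64` would fall into the `amount < 128` case and shift a 64-bit value by its full width.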
Lioncash
49c7edf7c6
A64: Implement FMLA and FMLS (by element)'s double/single-precision scalar variant
2020-04-22 20:46:21 +01:00
Lioncash
c704acafe4
A64: Implement FMUL (by element)'s scalar double/single-precision variant
2020-04-22 20:46:21 +01:00