MerryMage
fa8925c4df
IR: Implement FPVectorMulX
2020-04-22 20:57:37 +01:00
MerryMage
f0920c0ded
Fix VShift terminology
...
An arithmetic shift is by definition a signed shift, and a logical shift is by definition an unsigned shift.
- Rename VectorLogicalVShiftS* -> VectorArithmeticVShift*
- Rename VectorLogicalVShiftU* -> VectorLogicalVShift*
2020-04-22 20:55:50 +01:00
Lioncash
d426dfe942
ir: Add opcodes for unsigned saturating left shifts
2020-04-22 20:55:06 +01:00
MerryMage
02150bc0b7
IR: Add fbits argument to FPVectorFrom{Signed,Unsigned}Fixed
2020-04-22 20:55:06 +01:00
MerryMage
8051f60db0
opcodes.inc: Align columns to a tabstop of 4
2020-04-22 20:55:06 +01:00
MerryMage
90193b0e3d
IR: Add fbits argument to FixedToFP-related opcodes
2020-04-22 20:55:06 +01:00
Lioncash
b14eaaec46
ir: Add opcodes for left signed saturated shifts
2020-04-22 20:55:06 +01:00
MerryMage
3e447614c6
IR: Add VectorSignedSaturatedDoublingMultiplyLong
2020-04-22 20:55:06 +01:00
MerryMage
06b31448aa
emit_x64_vector: Changes to VectorSignedSaturatedDoublingMultiply
...
* Return both the upper and lower parts of the multiply if required
* SSE2 does not support the pmuldq instruction, do sign correction to an unsigned result instead
* Improve port utilisation where possible (punpck instructions were a bottleneck)
2020-04-22 20:55:06 +01:00
MerryMage
08c0e017a5
IR: Implement Vector{Signed,Unsigned}Multiply{16,32}
2020-04-22 20:55:06 +01:00
Lioncash
e739624296
ir: Add opcodes for vector CLZ operations
...
We can optimize these cases further for with the use of a fair bit of
shuffling via pshufb and the use of masks, but given the uncommon use of
this instruction, I wouldn't consider it to be beneficial in terms of
amount of code to be worth it over a simple manageable naive solution
like this.
If we ever do hit a case where vectorized CLZ happens to be a
bottleneck, then we can revisit this. At least with AVX-512CD, this can
be done with a single instruction for the 32-bit word case.
2020-04-22 20:55:05 +01:00
Lioncash
d4a76aaa04
ir: Add opcodes form unsigned saturated accumulations of signed values
2020-04-22 20:55:05 +01:00
Lioncash
6f911a26da
ir: Add opcodes for signed saturated accumulations of unsigned values
2020-04-22 20:55:05 +01:00
Lioncash
b6e74fd17d
ir: Add opcodes for performing unsigned reciprocal square root estimates
2020-04-22 20:55:05 +01:00
Lioncash
af83360f89
ir: Add opcodes for unsigned reciprocal estimate
2020-04-22 20:55:05 +01:00
Lioncash
fca7eddb9e
A64: Add opcodes for signed saturating negations
2020-04-22 20:53:46 +01:00
Lioncash
7ebfd0f31c
ir: Add opcodes for scalar signed saturated doubling multiplies
2020-04-22 20:53:46 +01:00
Lioncash
a0231e5546
ir: Add opcodes for signed saturated doubling multiplies
2020-04-22 20:53:46 +01:00
Lioncash
0507e47420
ir: Add opcodes for signed saturated absolute values
2020-04-22 20:53:46 +01:00
MerryMage
89d08c7d61
IR: Add VectorTable and VectorTableLookup IR instructions
2020-04-22 20:53:45 +01:00
MerryMage
0288974512
opcodes: Cleanup opcodes table
...
* Remove T:: prefix from types.
* Add another column for a 4th argument.
2020-04-22 20:53:45 +01:00
Lioncash
0efa2ce3b0
ir: Add opcodes for performing rounding left shifts
2020-04-22 20:53:45 +01:00
Lioncash
f3f60cd179
A64: Implement ISB
...
Given we want to ensure that all instructions are fetched again, we can
treat an ISB instruction as a code cache flush.
2020-04-22 20:53:45 +01:00
MerryMage
d1d6f4feb5
system: Implement MRS CNTFRQ_EL0
2020-04-22 20:53:45 +01:00
Lioncash
acbaf04fef
ir: Add opcodes for unsigned saturating add and subtract
2020-04-22 20:46:23 +01:00
MerryMage
17f73974f2
IR: Implement FPMulX IR instruction
2020-04-22 20:46:23 +01:00
MerryMage
f976c47008
IR: Initial implementation of FPVectorRoundInt
2020-04-22 20:46:23 +01:00
MerryMage
10e196480f
IR: Generalise SignedSaturated{Add,Sub} to support more bitwidths
2020-04-22 20:46:23 +01:00
Lioncash
463b9a3d02
ir: Add opcodes for vector paired maximum and minimums
...
For the time being, we can just do a naive implementation which avoids
falling back to the interpreter a bit. Horizontal operations aren't
necessarily x86 SIMD's forte anyways.
2020-04-22 20:46:23 +01:00
Lioncash
2501bfbfae
ir: Add opcodes for performing scalar integral min/max
2020-04-22 20:46:23 +01:00
Lioncash
7fdd8b0197
A64: Implement PMULL{2}
2020-04-22 20:46:23 +01:00
Lioncash
affa312d1d
ir: Add opcode for performing polynomial multiplication
2020-04-22 20:46:22 +01:00
MerryMage
507bcd8b8b
IR: Implement FPVectorTo{Signed,Unsigned}Fixed
2020-04-22 20:46:22 +01:00
MerryMage
7b03da86c2
IR: Implement FPVector{Max,Min}
2020-04-22 20:46:22 +01:00
MerryMage
901bd9b4e2
IR: Implement FPRecipStepFused, FPVectorRecipStepFused
2020-04-22 20:46:22 +01:00
MerryMage
939f5f5c7a
IR: Implement FPVectorRecipEstimate
2020-04-22 20:46:22 +01:00
MerryMage
c1dcfe29f7
IR: Implement FPRecipEstimate
2020-04-22 20:46:22 +01:00
MerryMage
04f325a05e
IR: Implement FPVectorNeg
2020-04-22 20:46:22 +01:00
MerryMage
771a4fc20b
IR: Implement FPVectorMulAdd
2020-04-22 20:46:22 +01:00
MerryMage
ecbf9dbae5
IR: Implement A64OrQC
2020-04-22 20:46:22 +01:00
MerryMage
b455b566e7
A64: Implement UQXTN (vector)
2020-04-22 20:46:22 +01:00
MerryMage
3874cb37e3
A64: Implement SQXTN (vector)
2020-04-22 20:46:22 +01:00
MerryMage
f020dbe4ed
A64: Implement SQXTUN
2020-04-22 20:46:22 +01:00
MerryMage
b2e4c16ef8
A64: Implement FRSQRTS (vector), single/double variant
2020-04-22 20:46:22 +01:00
MerryMage
45dc5f74f3
A64: Implement FRSQRTE (vector), single/double variant
2020-04-22 20:46:22 +01:00
MerryMage
506e544bfe
IR: Implement FPRSqrtStepFused
2020-04-22 20:46:22 +01:00
MerryMage
bde58b04d4
IR: Implement FPRSqrtEstimate
2020-04-22 20:46:21 +01:00
MerryMage
e18fca17dc
A64: Implement FABD in terms of existing IR instructions
...
Fixes NaN issue. Closes #306 .
2020-04-22 20:46:21 +01:00
MerryMage
b228694012
IR: Implement FPRoundInt
2020-04-22 20:46:20 +01:00
MerryMage
33fa65de23
A64: Implement FADDP (vector)
2020-04-22 20:46:19 +01:00
MerryMage
9dba273a8c
A64: Implement SADDLP
2020-04-22 20:46:19 +01:00
MerryMage
70ff2d73b5
A64: Implement UADDLP
2020-04-22 20:46:19 +01:00
MerryMage
caaf36dfd6
IR: Initial implementation of FP{Double,Single}ToFixed{S,U}{32,64}
...
This implementation just falls-back to the software floating point implementation.
2020-04-22 20:46:19 +01:00
Lioncash
4aa4885ba7
ir: Add opcodes for vector conversion of u32/u64 to floating-point
2020-04-22 20:46:19 +01:00
Lioncash
7a84b6e8d8
ir: Add opcodes for converting S64 and U64 to single-precision floating-point values
2020-04-22 20:46:19 +01:00
Lioncash
3a41465eaf
ir: Add opcodes for converting S64 and U64 to double-precision values
2020-04-22 20:46:18 +01:00
Lioncash
81e572c78c
ir: Extend FPVectorAbs opcode to also handle 16-bit elements for FP16
2020-04-22 20:46:18 +01:00
Lioncash
fc731dddae
ir: Add opcodes for performing vector absolute floating-point values
...
This will be usable for implementing FACGE and FACGT
2020-04-22 20:46:18 +01:00
Lioncash
8a4f8aed06
ir: Add opcode for performing FP vector absolute differences
2020-04-22 20:46:18 +01:00
MerryMage
8c90fcf58e
IR: Implement FPMulAdd
2020-04-22 20:46:18 +01:00
Lioncash
c695da1cf3
ir: Add opcode for floating-point GE and GT comparisons
...
The rest of the comparisons can be implemented in terms of these two
2020-04-22 20:46:18 +01:00
Lioncash
5ce187a54e
ir: Add opcodes for floating-point vector equalities
2020-04-22 20:46:18 +01:00
Lioncash
bc718c5b28
ir: Add opcodes for performing rounding halving adds
2020-04-22 20:46:18 +01:00
Lioncash
1e10017f4b
ir: Add opcodes for signed absolute differences
2020-04-22 20:46:17 +01:00
Lioncash
3f6c529da2
ir: Add opcode to perform the vector conversion S64->F64
...
Unfortunately x86 prior to AVX-512 doesn't really give us any convenient instruction to do the work for us
2020-04-22 20:46:17 +01:00
Lioncash
44a5f8095a
ir: Add opcodes for performing vector halving subtracts
2020-04-22 20:46:17 +01:00
Lioncash
b312d28295
ir: Add an opcode for doing an SM4 lookup table query
2020-04-22 20:46:17 +01:00
Lioncash
089096948a
ir: Add opcodes for performing halving adds
2020-04-22 20:46:17 +01:00
Lioncash
21974ee57e
backend_x64/ir: Amend generic LogicalVShift() template to also handle signed variants
...
Also adds IR opcodes to dispatch said variants
2020-04-22 20:46:17 +01:00
Lioncash
26d77c6f09
ir: Add opcodes for performing vector deinterleaving
2020-04-22 20:46:17 +01:00
Lioncash
38fa984b53
IR: Add opcode for packed word->f32 conversions
2020-04-22 20:46:16 +01:00
Lioncash
64b1f2d468
ir: Add opcode for reversing bits in a vector
2020-04-22 20:46:15 +01:00
Lioncash
e33dcce14a
ir: Add opcodes for performing vector absolute values
2020-04-22 20:46:15 +01:00
MerryMage
3472f371df
IR: Implement VectorExtract, VectorExtractLower IR instructions
2020-04-22 20:46:15 +01:00
MerryMage
5c47f03888
A64: Implement FMUL (vector)
2020-04-22 20:46:15 +01:00
Lioncash
ad5cf584ce
ir: Add opcodes for performing vector unsigned absolute differences
2020-04-22 20:46:15 +01:00
Lioncash
701f43d61e
IR: Add opcodes for interleaving upper-order bytes/halfwords/words/doublewords
...
I should have added this when I introduced the functions for interleaving
low-order equivalents for consistency in the interface.
2020-04-22 20:46:15 +01:00
Lioncash
6b0010c940
ir: Add IR opcodes for emitting vector shuffles
...
This uses the ARM terminology for sizes (Halfword -> 2 bytes, Word -> 4 bytes)
as opposed to the x86 terminology of (Word -> 2 bytes, Double word -> 4 bytes)
2020-04-22 20:46:15 +01:00
MerryMage
49cc6d7fad
A64: Implement FDIV (vector)
2020-04-22 20:46:15 +01:00
MerryMage
147284427b
A64: Implement USHL
2020-04-22 20:46:15 +01:00
MerryMage
e4697b1676
A64: Implement system register TPIDR_EL0
2020-04-22 20:46:15 +01:00
MerryMage
e3da92024e
A64: Implement system registers FPCR and FPSR
2020-04-22 20:46:15 +01:00
MerryMage
9e4e4e9c1d
A64: Implement system register CNTPCT_EL0
2020-04-22 20:46:15 +01:00
MerryMage
1e15283d00
A64: Implement system register CTR_EL0
2020-04-22 20:46:15 +01:00
MerryMage
710d09471b
IR: Add IR instruction ZeroVector
2020-04-22 20:46:15 +01:00
MerryMage
0575e7421b
A64: Implement FMINNM (scalar)
2020-04-22 20:46:15 +01:00
MerryMage
1c9804ea07
A64: Implement FMAXNM (scalar)
2020-04-22 20:46:15 +01:00
MerryMage
47c0ad0fc8
IR: Implement Vector{Max,Min}{Signed,Unsigned}
2020-04-22 20:46:14 +01:00
MerryMage
f4775910f5
IR: Implement VectorGreaterSigned
2020-04-22 20:46:14 +01:00
MerryMage
8698f057d0
A64: Implement STXP, STLXP, LDXP, LDAXP
2020-04-22 20:46:14 +01:00
MerryMage
b7a2c1a7df
A64: Implement STXRB, STXRH, STXR, STLXRB, STLXRH, STLXR, LDXRB, LDXRH, LDXR, LDAXRB, LDAXRH, LDAXR
2020-04-22 20:46:14 +01:00
MerryMage
8756487554
A64: Partially implement MRS
2020-04-22 20:46:14 +01:00
MerryMage
bfd65bedfe
A64: Implement DSB, DMB
2020-04-22 20:46:14 +01:00
MerryMage
5edd623b9d
Implement DC instructions
2020-04-22 20:46:14 +01:00
MerryMage
2cb0a699ba
IR: Implement FPMax, FPMin
2020-04-22 20:46:14 +01:00
MerryMage
98c8e7d1af
IR: Implement FPVectorAdd
2020-04-22 20:46:14 +01:00
MerryMage
eae518a338
IR: Implement VectorSignExtend
2020-04-22 20:46:14 +01:00
MerryMage
b9cd345ddc
IR: Implement FPVectorSub
2020-04-22 20:46:14 +01:00
MerryMage
303088a51e
IR: Implement VectorPopulationCount
2020-04-22 20:46:14 +01:00
MerryMage
b6de612e01
IR: Implement VectorMultiply
2020-04-22 20:46:14 +01:00