dynarmic

Author	SHA1	Message	Date
MerryMage	06b31448aa	emit_x64_vector: Changes to VectorSignedSaturatedDoublingMultiply: return both the upper and lower parts of the multiply if required; SSE2 does not support the pmuldq instruction, so do sign correction to an unsigned result instead; improve port utilisation where possible (punpck instructions were a bottleneck)	2020-04-22 20:55:06 +01:00
MerryMage	08c0e017a5	IR: Implement Vector{Signed,Unsigned}Multiply{16,32}	2020-04-22 20:55:06 +01:00
Lioncash	e739624296	ir: Add opcodes for vector CLZ operations. We can optimize these cases further with a fair bit of shuffling via pshufb and the use of masks, but given how uncommon this instruction is, I wouldn't consider the extra code worth it over a simple, manageable naive solution like this. If we ever do hit a case where vectorized CLZ happens to be a bottleneck, then we can revisit this. At least with AVX-512CD, this can be done with a single instruction for the 32-bit word case.	2020-04-22 20:55:05 +01:00
Lioncash	d4a76aaa04	ir: Add opcodes for unsigned saturated accumulations of signed values	2020-04-22 20:55:05 +01:00
Lioncash	6f911a26da	ir: Add opcodes for signed saturated accumulations of unsigned values	2020-04-22 20:55:05 +01:00
Lioncash	b6e74fd17d	ir: Add opcodes for performing unsigned reciprocal square root estimates	2020-04-22 20:55:05 +01:00
Lioncash	af83360f89	ir: Add opcodes for unsigned reciprocal estimate	2020-04-22 20:55:05 +01:00
Lioncash	fca7eddb9e	A64: Add opcodes for signed saturating negations	2020-04-22 20:53:46 +01:00
Lioncash	7ebfd0f31c	ir: Add opcodes for scalar signed saturated doubling multiplies	2020-04-22 20:53:46 +01:00
Lioncash	a0231e5546	ir: Add opcodes for signed saturated doubling multiplies	2020-04-22 20:53:46 +01:00
Lioncash	0507e47420	ir: Add opcodes for signed saturated absolute values	2020-04-22 20:53:46 +01:00
MerryMage	3415828fb4	IR: Simplify FP{Single,Double}ToFixed{U,S}{32,64}	2020-04-22 20:53:46 +01:00
Lioncash	053175f69b	ir_emitter: Rename fpscr_controlled parameters to fpcr_controlled. Part of addressing #333.	2020-04-22 20:53:46 +01:00
MerryMage	89d08c7d61	IR: Add VectorTable and VectorTableLookup IR instructions	2020-04-22 20:53:45 +01:00
Lioncash	0efa2ce3b0	ir: Add opcodes for performing rounding left shifts	2020-04-22 20:53:45 +01:00
Lioncash	acbaf04fef	ir: Add opcodes for unsigned saturating add and subtract	2020-04-22 20:46:23 +01:00
MerryMage	17f73974f2	IR: Implement FPMulX IR instruction	2020-04-22 20:46:23 +01:00
MerryMage	f976c47008	IR: Initial implementation of FPVectorRoundInt	2020-04-22 20:46:23 +01:00
MerryMage	10e196480f	IR: Generalise SignedSaturated{Add,Sub} to support more bitwidths	2020-04-22 20:46:23 +01:00
Lioncash	463b9a3d02	ir: Add opcodes for vector paired maximum and minimums. For the time being, we can just do a naive implementation which avoids falling back to the interpreter a bit. Horizontal operations aren't necessarily x86 SIMD's forte anyways.	2020-04-22 20:46:23 +01:00
Lioncash	2501bfbfae	ir: Add opcodes for performing scalar integral min/max	2020-04-22 20:46:23 +01:00
Lioncash	7fdd8b0197	A64: Implement PMULL{2}	2020-04-22 20:46:23 +01:00
Lioncash	affa312d1d	ir: Add opcode for performing polynomial multiplication	2020-04-22 20:46:22 +01:00
MerryMage	507bcd8b8b	IR: Implement FPVectorTo{Signed,Unsigned}Fixed	2020-04-22 20:46:22 +01:00
MerryMage	7b03da86c2	IR: Implement FPVector{Max,Min}	2020-04-22 20:46:22 +01:00
MerryMage	901bd9b4e2	IR: Implement FPRecipStepFused, FPVectorRecipStepFused	2020-04-22 20:46:22 +01:00
MerryMage	939f5f5c7a	IR: Implement FPVectorRecipEstimate	2020-04-22 20:46:22 +01:00
MerryMage	c1dcfe29f7	IR: Implement FPRecipEstimate	2020-04-22 20:46:22 +01:00
MerryMage	04f325a05e	IR: Implement FPVectorNeg	2020-04-22 20:46:22 +01:00
MerryMage	771a4fc20b	IR: Implement FPVectorMulAdd	2020-04-22 20:46:22 +01:00
MerryMage	b455b566e7	A64: Implement UQXTN (vector)	2020-04-22 20:46:22 +01:00
MerryMage	3874cb37e3	A64: Implement SQXTN (vector)	2020-04-22 20:46:22 +01:00
MerryMage	f020dbe4ed	A64: Implement SQXTUN	2020-04-22 20:46:22 +01:00
MerryMage	b2e4c16ef8	A64: Implement FRSQRTS (vector), single/double variant	2020-04-22 20:46:22 +01:00
MerryMage	45dc5f74f3	A64: Implement FRSQRTE (vector), single/double variant	2020-04-22 20:46:22 +01:00
MerryMage	506e544bfe	IR: Implement FPRSqrtStepFused	2020-04-22 20:46:22 +01:00
MerryMage	bde58b04d4	IR: Implement FPRSqrtEstimate	2020-04-22 20:46:21 +01:00
MerryMage	e18fca17dc	A64: Implement FABD in terms of existing IR instructions. Fixes NaN issue. Closes #306.	2020-04-22 20:46:21 +01:00
MerryMage	b228694012	IR: Implement FPRoundInt	2020-04-22 20:46:20 +01:00
MerryMage	33fa65de23	A64: Implement FADDP (vector)	2020-04-22 20:46:19 +01:00
MerryMage	9dba273a8c	A64: Implement SADDLP	2020-04-22 20:46:19 +01:00
MerryMage	70ff2d73b5	A64: Implement UADDLP	2020-04-22 20:46:19 +01:00
MerryMage	caaf36dfd6	IR: Initial implementation of FP{Double,Single}ToFixed{S,U}{32,64}. This implementation just falls back to the software floating-point implementation.	2020-04-22 20:46:19 +01:00
Lioncash	4aa4885ba7	ir: Add opcodes for vector conversion of u32/u64 to floating-point	2020-04-22 20:46:19 +01:00
Lioncash	7a84b6e8d8	ir: Add opcodes for converting S64 and U64 to single-precision floating-point values	2020-04-22 20:46:19 +01:00
Lioncash	3a41465eaf	ir: Add opcodes for converting S64 and U64 to double-precision values	2020-04-22 20:46:18 +01:00
Lioncash	fc731dddae	ir: Add opcodes for performing vector absolute floating-point values. This will be usable for implementing FACGE and FACGT.	2020-04-22 20:46:18 +01:00
Lioncash	8a4f8aed06	ir: Add opcode for performing FP vector absolute differences	2020-04-22 20:46:18 +01:00
MerryMage	8c90fcf58e	IR: Implement FPMulAdd	2020-04-22 20:46:18 +01:00
Lioncash	c695da1cf3	ir: Add opcode for floating-point GE and GT comparisons. The rest of the comparisons can be implemented in terms of these two.	2020-04-22 20:46:18 +01:00

197 commits