dynarmic

Author	SHA1	Message	Date
Wunk	3e932ca55d	emit_x64_vector: Fix ArithmeticShiftRightByte zero_extend constant Should be shifting in _bytes_ of `0x80`. Not bits.	2020-11-09 09:47:51 -08:00
Wunkolo	ec52922dae	emit_x64_vector: Use explicit 64-bit mask constant Exchange `~0ull` with `0xFFFFFFFFFFFFFFFF` when generating the `zero_extend` constant.	2020-11-07 15:29:12 +00:00
Wunkolo	490160ef43	emit_x64_vector: GFNI implementation of ArithmeticShiftRightByte The bit-matrix is generated up-front and added to the constant-pool. I'm using an embedded 64-bit broadcast here (m64bcst), which is the particular EVEX-encoded version of the instruction with AVX512VL+GFNI. If it ever really matters, then we would ideally detect specific host features like bare-GFNI and specific subsets of AVX512 and emit the assembly based on that rather than by the entire Icelake uarch.	2020-11-07 15:29:12 +00:00
Wunkolo	7df235aefb	emit_x64_vector: GFNI implementation of EmitVectorLogicalShiftLeft8 Same principle as EmitVectorLogicalShiftRight8. An 8x8 galois identity matrix is bit-shifted to allow for arbitrary 8-bit-lane shifts.	2020-11-07 15:29:12 +00:00
Wunkolo	5cc646ffed	emit_x64_vector: GFNI implementation of EmitVectorLogicalShiftRight8 Bitshifts of the GFNI identity matrix generate a new matrix that applies lane-wise bitshifts as well. This allows for a fast single-instruction implementation of a byte-lane bitshift.	2020-11-07 15:29:12 +00:00
Wunkolo	6bb49726f4	emit_x64_vector: GFNI+SSSE3 implementation of EmitVectorReverseBits Performs a full 128-bit bit-reversal using only two instructions. First by reversing all the bits of each byte using a galois matrix multiplication (vgf2p8affineqb, Icelake), and then by reversing the bytes themselves (pshufb, SSSE3).	2020-10-02 05:56:59 +01:00
MerryMage	f35aaa017c	IR: Add VectorDeinterleave{Even,Odd}Lower	2020-07-04 11:04:10 +01:00
Merry	b1ff971a92	backend/x64: Temporarily avoid use of DefineValue(Argument&) Issues with inappropriate values in upper bits of values	2020-06-27 10:52:59 +01:00
MerryMage	7d1e103ff5	IR: Implement VectorTranspose	2020-06-21 12:14:13 +01:00
MerryMage	8bbc9fdbb6	A32: Implement ASIMD VTBX	2020-06-20 22:35:31 +01:00
MerryMage	87f6e412d0	emit_x64_vector: SSE4.1 implementation of EmitVectorPolynomialMultiply{Long}8	2020-06-18 18:44:00 +01:00
MerryMage	f5b41aabc6	emit_x64_vector: Implement EmitVectorPolynomialMultiplyLong64 in terms of pclmulqdq	2020-06-18 18:04:23 +01:00
MerryMage	f495018f53	block_of_code: Encapsulate CPU feature detection code	2020-06-09 21:25:57 +01:00
MerryMage	b47adaee1d	emit_x64_vector: SSSE3 implementation of EmitVectorExtract	2020-06-01 15:41:36 +01:00
MerryMage	5c0bb5cc63	Remove unreachable code (MSVC warnings)	2020-04-23 16:36:34 +01:00
MerryMage	a8a712c801	Relicense to 0BSD	2020-04-23 15:45:57 +01:00
MerryMage	325808949f	backend/x64: Rename namespace BackendX64 -> Backend::X64	2020-04-22 21:06:17 +01:00
MerryMage	bd88286b21	cast_util: Add FptrCast Reduce unnecessary type duplication when casting a lambda to a function pointer.	2020-04-22 21:06:17 +01:00
MerryMage	81fcb4e537	mp: Migrate to shared version of mp library	2020-04-22 21:06:17 +01:00
Merry	f252a62c1b	Merge pull request #502 from lioncash/header General: Remove unnecessary includes	2020-04-22 21:04:22 +01:00
Lioncash	349d4b577a	General: Remove unnecessary includes Removes unnecessary header dependencies that have accumulated over time as changes have been made. Lessens the amount of files that need to be rebuilt when the headers change.	2020-04-22 21:04:22 +01:00
Lioncash	cba9351b82	backend/x64/emit_*: Apply const where applicable	2020-04-22 21:04:22 +01:00
Lioncash	87083af733	general: Remove trailing spaces General code-related cleanup. Gets rid of trailing spaces in the codebase.	2020-04-22 21:04:21 +01:00
Lioncash	675f67e41d	emit_x64_vector: Use const on locals where applicable Normalizes the use of const in the source file.	2020-04-22 21:02:47 +01:00
Lioncash	a4cadf1cd9	frontend/ir_emitter: Add opcodes for signed saturated left shifts with unsigned saturation	2020-04-22 21:01:44 +01:00
Lioncash	b37279f65c	backend/x64/emit_x64_vector: Prevent undefined behavior within VectorSignedSaturatedShiftLeft Avoids undefined behavior by potentially left-shifting a signed negative value.	2020-04-22 21:00:47 +01:00
MerryMage	f0920c0ded	Fix VShift terminology An arithmetic shift is by definition a signed shift, and a logical shift is by definition an unsigned shift. - Rename VectorLogicalVShiftS* -> VectorArithmeticVShift* - Rename VectorLogicalVShiftU* -> VectorLogicalVShift*	2020-04-22 20:55:50 +01:00
MerryMage	b51dae790d	emit_x64_vector: AVX512 implementation of EmitVectorLogicalVShiftS16	2020-04-22 20:55:50 +01:00
MerryMage	bd47f2ca8f	emit_x64_vector: AVX512 implementation of EmitVectorLogicalVShiftS64	2020-04-22 20:55:50 +01:00
MerryMage	3bf183d7e8	emit_x64_vector: AVX2 implementation of EmitVectorLogicalVShiftS32	2020-04-22 20:55:50 +01:00
MerryMage	94f9d402eb	emit_x64_vector: AVX512 implementation of EmitVectorLogicalVShiftU16()	2020-04-22 20:55:50 +01:00
MerryMage	6d9639e3b0	emit_x64_vector: AVX2 implementation of EmitVectorLogicalVShiftU64()	2020-04-22 20:55:50 +01:00
MerryMage	bbc066a266	emit_x64_vector: AVX2 implementation of EmitVectorLogicalVShiftU32()	2020-04-22 20:55:50 +01:00
Lioncash	da2e7fad87	emit_x64_vector: SSSE3 variant of EmitVectorCountLeadingZeros8() pshufb lyfe	2020-04-22 20:55:50 +01:00
MerryMage	b8fde48732	emit_x64_vector: AVX implementation for EmitVectorCountLeadingZeros8	2020-04-22 20:55:50 +01:00
MerryMage	fd37b637aa	emit_x64_vector: SSE implementation of EmitVectorCountLeadingZeros16	2020-04-22 20:55:50 +01:00
Lioncash	d426dfe942	ir: Add opcodes for unsigned saturating left shifts	2020-04-22 20:55:06 +01:00
Lioncash	b14eaaec46	ir: Add opcodes for left signed saturated shifts	2020-04-22 20:55:06 +01:00
Lioncash	a2cd643525	emit_x64_vector: Make EmitVectorUnsignedSaturatedAccumulateSigned() internally linked Given this is just an internal helper function, it can be marked static.	2020-04-22 20:55:06 +01:00
MerryMage	12243692f5	A64: Implement SQRDMULH (vector), vector variant	2020-04-22 20:55:06 +01:00
MerryMage	3e447614c6	IR: Add VectorSignedSaturatedDoublingMultiplyLong	2020-04-22 20:55:06 +01:00
MerryMage	06b31448aa	emit_x64_vector: Changes to VectorSignedSaturatedDoublingMultiply * Return both the upper and lower parts of the multiply if required * SSE2 does not support the pmuldq instruction, do sign correction to an unsigned result instead * Improve port utilisation where possible (punpck instructions were a bottleneck)	2020-04-22 20:55:06 +01:00
MerryMage	08c0e017a5	IR: Implement Vector{Signed,Unsigned}Multiply{16,32}	2020-04-22 20:55:06 +01:00
MerryMage	1492573267	emit_x64_vector: SSE implementation of VectorSignedSaturatedAccumulateUnsigned{8,16,32}	2020-04-22 20:55:06 +01:00
Lioncash	26df6e5e7b	emit_x64_vector: Correct static asserts for < 64-bit type checks in saturated accumulate fallbacks I had initially meant to use BitSize() here, not sizeof()	2020-04-22 20:55:06 +01:00
MerryMage	a4a26ac226	emit_x64_vector: EmitVectorSignedSaturatedAccumulateUnsigned64: SSE implementation	2020-04-22 20:55:06 +01:00
MerryMage	a7c66d2d28	emit_x64_vector: Simplify fpsr_qc related code Move the bool conversion into A64JitState::GetFpsr so we don't have to continuously pay the cost of conversion for every saturation instruction.	2020-04-22 20:55:06 +01:00
Lioncash	e739624296	ir: Add opcodes for vector CLZ operations We can optimize these cases further with a fair bit of shuffling via pshufb and the use of masks, but given the uncommon use of this instruction, I wouldn't consider the extra code to be worth it over a simple, manageable naive solution like this. If we ever do hit a case where vectorized CLZ happens to be a bottleneck, then we can revisit this. At least with AVX-512CD, this can be done with a single instruction for the 32-bit word case.	2020-04-22 20:55:05 +01:00
Lioncash	5653e7637e	emit_x64_vector: Remove unnecessary [[maybe_unused]] attributes These were unintentionally left in when introducing SUQADD and USQADD	2020-04-22 20:55:05 +01:00
Lioncash	d4a76aaa04	ir: Add opcodes for unsigned saturated accumulations of signed values	2020-04-22 20:55:05 +01:00
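
The GFNI commits in the log above describe building a per-lane shift by bit-shifting the 8x8 identity matrix and feeding it to a single affine instruction. The following is a minimal scalar sketch of that idea, not dynarmic's actual code: it models one byte lane of `gf2p8affineqb` with an immediate of 0, and every function and constant name in it is hypothetical. Note that shifting the matrix left by whole bytes produces a right shift of the lane, and that the arithmetic variant fills the vacated matrix bytes with `0x80` so the sign bit is replicated.

```cpp
#include <cstdint>

// Identity matrix for the affine transform: row for result bit i selects source bit i.
constexpr std::uint64_t kIdentityMatrix = 0x0102040810204080ULL;
// A row of 0x80 selects source bit 7 (the sign bit).
constexpr std::uint64_t kSignRows = 0x8080808080808080ULL;

// Scalar model of one byte lane of gf2p8affineqb with imm8 = 0 (hypothetical helper).
// Result bit i is the parity of (matrix byte [7 - i] AND source byte).
std::uint8_t Gf2p8AffineByte(std::uint64_t matrix, std::uint8_t x) {
    std::uint8_t result = 0;
    for (int i = 0; i < 8; ++i) {
        const auto row = static_cast<std::uint8_t>(matrix >> (8 * (7 - i)));
        auto masked = static_cast<std::uint8_t>(row & x);
        // Fold the byte down to its parity bit.
        masked ^= static_cast<std::uint8_t>(masked >> 4);
        masked ^= static_cast<std::uint8_t>(masked >> 2);
        masked ^= static_cast<std::uint8_t>(masked >> 1);
        result = static_cast<std::uint8_t>(result | ((masked & 1) << i));
    }
    return result;
}

// Logical right shift of each byte lane by n bits: move the rows up by n bytes.
std::uint64_t LogicalShiftRightMatrix(unsigned n) {
    return n < 8 ? kIdentityMatrix << (8 * n) : 0;
}

// Logical left shift of each byte lane by n bits: move the rows down by n bytes.
std::uint64_t LogicalShiftLeftMatrix(unsigned n) {
    return n < 8 ? kIdentityMatrix >> (8 * n) : 0;
}

// Arithmetic right shift: as the logical case, but the n vacated low bytes
// are filled with 0x80 so the top n result bits copy the sign bit.
std::uint64_t ArithmeticShiftRightMatrix(unsigned n) {
    if (n == 0) {
        return kIdentityMatrix;  // avoids an undefined 64-bit shift by 64 below
    }
    if (n >= 8) {
        return kSignRows;
    }
    return (kIdentityMatrix << (8 * n)) | (kSignRows >> (64 - 8 * n));
}
```

In a real emitter the matrix would be computed once per shift amount and placed in a constant pool, as the commit messages describe; the scalar model here is only meant for checking that a given matrix has the intended effect.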

68 commits