Wunk
3e932ca55d
emit_x64_vector: Fix ArithmeticShiftRightByte zero_extend constant
...
Should be shifting in _bytes_ of `0x80`, not bits.
2020-11-09 09:47:51 -08:00
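To make the fix above concrete, here is a minimal scalar sketch of an arithmetic-shift-right matrix. It is a software model, not the emitter's code: `AffineByte` and `ArithmeticShiftRightMatrix` are hypothetical names, and the model assumes the documented gf2p8affineqb behaviour (with imm8 = 0) where output bit i is the parity of matrix byte 7 - i ANDed with the input byte.

```cpp
#include <cstdint>

// Software model of one byte lane of gf2p8affineqb with imm8 = 0.
// Assumed semantics: output bit i = parity(matrix byte [7 - i] AND x).
constexpr std::uint8_t AffineByte(std::uint64_t matrix, std::uint8_t x) {
    std::uint8_t result = 0;
    for (int i = 0; i < 8; ++i) {
        std::uint8_t bits = static_cast<std::uint8_t>(matrix >> ((7 - i) * 8)) & x;
        bits ^= bits >> 4;  // fold down to a single parity bit
        bits ^= bits >> 2;
        bits ^= bits >> 1;
        result |= static_cast<std::uint8_t>((bits & 1) << i);
    }
    return result;
}

constexpr std::uint64_t kIdentity = 0x0102040810204080ULL;
constexpr std::uint64_t kSignRows = 0x8080808080808080ULL;

// Hypothetical helper: the rows vacated by the shift are filled with whole
// bytes of 0x80, each of which selects the input's sign bit.
constexpr std::uint64_t ArithmeticShiftRightMatrix(unsigned shift) {
    if (shift == 0) return kIdentity;
    if (shift >= 8) return kSignRows;
    return (kIdentity << (shift * 8)) | (kSignRows >> (64 - shift * 8));
}

static_assert(AffineByte(ArithmeticShiftRightMatrix(1), 0x80) == 0xC0,
              "-128 >> 1 == -64");
```

Each matrix byte is one row, so filling with single bits rather than full `0x80` bytes would not replicate the sign bit into every vacated output bit.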
Wunkolo
ec52922dae
emit_x64_vector: Use explicit 64-bit mask constant
...
Exchange `~0ull` with `0xFFFFFFFFFFFFFFFF` when generating
the `zero_extend` constant.
2020-11-07 15:29:12 +00:00
Wunkolo
490160ef43
emit_x64_vector: GFNI implementation of ArithmeticShiftRightByte
...
The bit-matrix is generated up-front and added to the constant-pool.
I'm using an embedded 64-bit broadcast here (m64bcst), which is the particular
EVEX-encoded version of the instruction with AVX512VL+GFNI.
If it ever really matters, then we would ideally detect specific host
features like bare-GFNI and specific subsets of AVX512 and emit
the assembly based on that rather than by the entire Icelake uarch.
2020-11-07 15:29:12 +00:00
Wunkolo
7df235aefb
emit_x64_vector: GFNI implementation of EmitVectorLogicalShiftLeft8
...
Same principle as EmitVectorLogicalShiftRight8. An 8x8 galois identity
matrix is bit-shifted to allow for arbitrary 8-bit-lane shifts.
2020-11-07 15:29:12 +00:00
Wunkolo
5cc646ffed
emit_x64_vector: GFNI implementation of EmitVectorLogicalShiftRight8
...
Bitshifts of the GFNI identity matrix generate a new matrix that
applies lane-wise bitshifts as well. This allows for a fast
single-instruction implementation of a byte-lane bitshift.
2020-11-07 15:29:12 +00:00
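A minimal scalar sketch of the idea described above: shifting the 8x8 identity matrix by whole bytes yields a matrix that shifts every byte lane. The names are hypothetical and this is a software model under the assumption that gf2p8affineqb (imm8 = 0) computes output bit i as the parity of matrix byte 7 - i ANDed with the input byte; it is not the code emitted by the backend.

```cpp
#include <cstdint>

// Software model of one byte lane of gf2p8affineqb with imm8 = 0.
constexpr std::uint8_t AffineByte(std::uint64_t matrix, std::uint8_t x) {
    std::uint8_t result = 0;
    for (int i = 0; i < 8; ++i) {
        std::uint8_t bits = static_cast<std::uint8_t>(matrix >> ((7 - i) * 8)) & x;
        bits ^= bits >> 4;  // fold down to a single parity bit
        bits ^= bits >> 2;
        bits ^= bits >> 1;
        result |= static_cast<std::uint8_t>((bits & 1) << i);
    }
    return result;
}

// Identity matrix: row for output bit i selects input bit i.
constexpr std::uint64_t kIdentity = 0x0102040810204080ULL;

// Moving the rows up by `shift` bytes makes output bit i read input bit
// i + shift, which is a logical right shift of the lane.
constexpr std::uint64_t LogicalShiftRightMatrix(unsigned shift) {
    return shift < 8 ? kIdentity << (shift * 8) : 0;
}

// Moving the rows down makes output bit i read input bit i - shift,
// which is a left shift of the lane.
constexpr std::uint64_t LogicalShiftLeftMatrix(unsigned shift) {
    return shift < 8 ? kIdentity >> (shift * 8) : 0;
}

static_assert(AffineByte(LogicalShiftRightMatrix(3), 0xF0) == 0x1E, "0xF0 >> 3");
static_assert(AffineByte(LogicalShiftLeftMatrix(3), 0x1F) == 0xF8, "0x1F << 3");
```

Because the matrix depends only on the shift amount, it can be computed once as a constant and applied to all lanes with a single instruction.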
Wunkolo
6bb49726f4
emit_x64_vector: GFNI+SSSE3 implementation of EmitVectorReverseBits
...
Performs a full 128-bit bit-reversal using only two instructions.
First by reversing all the bits of each byte using a galois matrix
multiplication (vgf2p8affineqb, Icelake), and then by reversing the bytes
themselves (pshufb, SSSE3).
2020-10-02 05:56:59 +01:00
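The two steps described above can be modelled in scalar code as follows. This is only a sketch: `Vec128`, `AffineByte` and `ReverseBits128` are hypothetical names, the affine step assumes output bit i is the parity of matrix byte 7 - i ANDed with the input byte, and the byte-reversal loop stands in for the pshufb shuffle.

```cpp
#include <cstdint>

// Software model of one byte lane of gf2p8affineqb with imm8 = 0.
constexpr std::uint8_t AffineByte(std::uint64_t matrix, std::uint8_t x) {
    std::uint8_t result = 0;
    for (int i = 0; i < 8; ++i) {
        std::uint8_t bits = static_cast<std::uint8_t>(matrix >> ((7 - i) * 8)) & x;
        bits ^= bits >> 4;  // fold down to a single parity bit
        bits ^= bits >> 2;
        bits ^= bits >> 1;
        result |= static_cast<std::uint8_t>((bits & 1) << i);
    }
    return result;
}

// Row for output bit i selects input bit 7 - i, reversing bits in a byte.
constexpr std::uint64_t kReverseBitsMatrix = 0x8040201008040201ULL;

// Hypothetical 128-bit vector, bytes stored least significant first.
struct Vec128 {
    std::uint8_t bytes[16];
};

constexpr Vec128 ReverseBits128(const Vec128& in) {
    Vec128 out{};
    for (int i = 0; i < 16; ++i) {
        // Step 1: reverse the bits inside the byte (affine transform).
        // Step 2: mirror the byte position (models the pshufb shuffle).
        out.bytes[15 - i] = AffineByte(kReverseBitsMatrix, in.bytes[i]);
    }
    return out;
}

static_assert(ReverseBits128(Vec128{{0x01}}).bytes[15] == 0x80,
              "bit 0 of the vector moves to bit 127");
```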
MerryMage
f35aaa017c
IR: Add VectorDeinterleave{Even,Odd}Lower
2020-07-04 11:04:10 +01:00
Merry
b1ff971a92
backend/x64: Temporarily avoid use of DefineValue(Argument&)
...
Issues with inappropriate values in upper bits of values
2020-06-27 10:52:59 +01:00
MerryMage
7d1e103ff5
IR: Implement VectorTranspose
2020-06-21 12:14:13 +01:00
MerryMage
8bbc9fdbb6
A32: Implement ASIMD VTBX
2020-06-20 22:35:31 +01:00
MerryMage
87f6e412d0
emit_x64_vector: SSE4.1 implementation of EmitVectorPolynomialMultiply{Long}8
2020-06-18 18:44:00 +01:00
MerryMage
f5b41aabc6
emit_x64_vector: Implement EmitVectorPolynomialMultiplyLong64 in terms of pclmulqdq
2020-06-18 18:04:23 +01:00
MerryMage
f495018f53
block_of_code: Encapsulate CPU feature detection code
2020-06-09 21:25:57 +01:00
MerryMage
b47adaee1d
emit_x64_vector: SSSE3 implementation of EmitVectorExtract
2020-06-01 15:41:36 +01:00
MerryMage
5c0bb5cc63
Remove unreachable code (MSVC warnings)
2020-04-23 16:36:34 +01:00
MerryMage
a8a712c801
Relicense to 0BSD
2020-04-23 15:45:57 +01:00
MerryMage
325808949f
backend/x64: Rename namespace BackendX64 -> Backend::X64
2020-04-22 21:06:17 +01:00
MerryMage
bd88286b21
cast_util: Add FptrCast
...
Reduce unnecessary type duplication when casting a lambda to a function pointer.
2020-04-22 21:06:17 +01:00
MerryMage
81fcb4e537
mp: Migrate to shared version of mp library
2020-04-22 21:06:17 +01:00
Merry
f252a62c1b
Merge pull request #502 from lioncash/header
...
General: Remove unnecessary includes
2020-04-22 21:04:22 +01:00
Lioncash
349d4b577a
General: Remove unnecessary includes
...
Removes unnecessary header dependencies that have accumulated over time
as changes have been made. Reduces the number of files that need to be
rebuilt when the headers change.
2020-04-22 21:04:22 +01:00
Lioncash
cba9351b82
backend/x64/emit_*: Apply const where applicable
2020-04-22 21:04:22 +01:00
Lioncash
87083af733
general: Remove trailing spaces
...
General code-related cleanup. Gets rid of trailing spaces in the
codebase.
2020-04-22 21:04:21 +01:00
Lioncash
675f67e41d
emit_x64_vector: Use const on locals where applicable
...
Normalizes the use of const in the source file.
2020-04-22 21:02:47 +01:00
Lioncash
a4cadf1cd9
frontend/ir_emitter: Add opcodes for signed saturated left shifts with unsigned saturation
2020-04-22 21:01:44 +01:00
Lioncash
b37279f65c
backend/x64/emit_x64_vector: Prevent undefined behavior within VectorSignedSaturatedShiftLeft
...
Avoids the undefined behavior of potentially left-shifting a negative
signed value.
2020-04-22 21:00:47 +01:00
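One common way to sidestep this class of undefined behavior is to perform the shift on the unsigned representation. The sketch below is an assumption about the general technique, not a copy of the change in this commit; `ShiftLeftWrapping` is a hypothetical name.

```cpp
#include <cstdint>

// Left-shifting a negative signed value is undefined behavior before C++20.
// Shifting the unsigned bit pattern is always defined for amount < 32; the
// conversion back to signed is implementation-defined before C++20 and
// wraps modulo 2^32 on mainstream compilers.
constexpr std::int32_t ShiftLeftWrapping(std::int32_t value, unsigned amount) {
    return static_cast<std::int32_t>(static_cast<std::uint32_t>(value) << amount);
}

static_assert(ShiftLeftWrapping(-1, 4) == -16, "bit pattern 0xFFFFFFF0");
```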
MerryMage
f0920c0ded
Fix VShift terminology
...
An arithmetic shift is by definition a signed shift, and a logical shift is by definition an unsigned shift.
- Rename VectorLogicalVShiftS* -> VectorArithmeticVShift*
- Rename VectorLogicalVShiftU* -> VectorLogicalVShift*
2020-04-22 20:55:50 +01:00
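The distinction behind the rename can be shown on a single 8-bit lane. This is a scalar sketch with hypothetical function names; it illustrates only the signed versus unsigned fill behaviour, not the per-lane variable shift amounts the opcodes take.

```cpp
#include <cstdint>

// Logical shift: the lane is treated as unsigned and zeros are shifted in.
constexpr std::uint8_t LogicalShiftRight8(std::uint8_t value, unsigned amount) {
    return amount < 8 ? static_cast<std::uint8_t>(value >> amount) : 0;
}

// Arithmetic shift: the lane is treated as signed and copies of the sign bit
// are shifted in. Written via complements so it does not rely on the
// pre-C++20 implementation-defined right shift of negative values.
constexpr std::int8_t ArithmeticShiftRight8(std::int8_t value, unsigned amount) {
    const unsigned s = amount < 7 ? amount : 7;
    return static_cast<std::int8_t>(value < 0 ? ~(~value >> s) : value >> s);
}

static_assert(LogicalShiftRight8(0x80, 1) == 0x40, "zero fill");
static_assert(ArithmeticShiftRight8(-128, 1) == -64, "sign fill");
```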
MerryMage
b51dae790d
emit_x64_vector: AVX512 implementation of EmitVectorLogicalVShiftS16
2020-04-22 20:55:50 +01:00
MerryMage
bd47f2ca8f
emit_x64_vector: AVX512 implementation of EmitVectorLogicalVShiftS64
2020-04-22 20:55:50 +01:00
MerryMage
3bf183d7e8
emit_x64_vector: AVX2 implementation of EmitVectorLogicalVShiftS32
2020-04-22 20:55:50 +01:00
MerryMage
94f9d402eb
emit_x64_vector: AVX512 implementation of EmitVectorLogicalVShiftU16()
2020-04-22 20:55:50 +01:00
MerryMage
6d9639e3b0
emit_x64_vector: AVX2 implementation of EmitVectorLogicalVShiftU64()
2020-04-22 20:55:50 +01:00
MerryMage
bbc066a266
emit_x64_vector: AVX2 implementation of EmitVectorLogicalVShiftU32()
2020-04-22 20:55:50 +01:00
Lioncash
da2e7fad87
emit_x64_vector: SSSE3 variant of EmitVectorCountLeadingZeros8()
...
pshufb lyfe
2020-04-22 20:55:50 +01:00
MerryMage
b8fde48732
emit_x64_vector: AVX implementation for EmitVectorCountLeadingZeros8
2020-04-22 20:55:50 +01:00
MerryMage
fd37b637aa
emit_x64_vector: SSE implementation of EmitVectorCountLeadingZeros16
2020-04-22 20:55:50 +01:00
Lioncash
d426dfe942
ir: Add opcodes for unsigned saturating left shifts
2020-04-22 20:55:06 +01:00
Lioncash
b14eaaec46
ir: Add opcodes for left signed saturated shifts
2020-04-22 20:55:06 +01:00
Lioncash
a2cd643525
emit_x64_vector: Make EmitVectorUnsignedSaturatedAccumulateSigned() internally linked
...
Given this is just an internal helper function, it can be marked static.
2020-04-22 20:55:06 +01:00
MerryMage
12243692f5
A64: Implement SQRDMULH (vector), vector variant
2020-04-22 20:55:06 +01:00
MerryMage
3e447614c6
IR: Add VectorSignedSaturatedDoublingMultiplyLong
2020-04-22 20:55:06 +01:00
MerryMage
06b31448aa
emit_x64_vector: Changes to VectorSignedSaturatedDoublingMultiply
...
* Return both the upper and lower parts of the multiply if required
* SSE2 does not support the pmuldq instruction, so do sign correction to an unsigned result instead
* Improve port utilisation where possible (punpck instructions were a bottleneck)
2020-04-22 20:55:06 +01:00
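The sign correction mentioned in the list above can be illustrated in scalar form: the high half of a signed 32x32 multiply can be recovered from the unsigned product by subtracting one operand whenever the other is negative. This is a sketch of that identity with a hypothetical function name, not the vector code in the commit.

```cpp
#include <cstdint>

// High 32 bits of the signed product a * b, computed from an unsigned
// multiply. Treating a negative a as unsigned adds 2^32 to it, which adds
// b * 2^32 to the product, so b is subtracted from the high half (and
// symmetrically for a negative b). Arithmetic wraps modulo 2^32.
constexpr std::int32_t SignedMulHigh32(std::int32_t a, std::int32_t b) {
    const std::uint32_t ua = static_cast<std::uint32_t>(a);
    const std::uint32_t ub = static_cast<std::uint32_t>(b);
    std::uint32_t hi = static_cast<std::uint32_t>(
        (static_cast<std::uint64_t>(ua) * ub) >> 32);
    if (a < 0) hi -= ub;
    if (b < 0) hi -= ua;
    return static_cast<std::int32_t>(hi);
}

static_assert(SignedMulHigh32(-2, 3) == -1, "-6 has an all-ones high half");
```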
MerryMage
08c0e017a5
IR: Implement Vector{Signed,Unsigned}Multiply{16,32}
2020-04-22 20:55:06 +01:00
MerryMage
1492573267
emit_x64_vector: SSE implementation of VectorSignedSaturatedAccumulateUnsigned{8,16,32}
2020-04-22 20:55:06 +01:00
Lioncash
26df6e5e7b
emit_x64_vector: Correct static asserts for < 64-bit type checks in saturated accumulate fallbacks
...
I had initially meant to use BitSize() here, not sizeof()
2020-04-22 20:55:06 +01:00
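The difference between the two checks is easy to see in isolation. The `BitSize` helper below is a hypothetical stand-in written for this sketch; the point is only that `sizeof` counts bytes, so a "less than 64" test on it accepts every lane type.

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a bit-width helper: size of T in bits.
template <typename T>
constexpr std::size_t BitSize() {
    return sizeof(T) * CHAR_BIT;
}

// sizeof yields 8 for a 64-bit type, so this check can never reject it.
static_assert(sizeof(std::uint64_t) < 64, "passes even for 64-bit lanes");

// The bit-width check accepts narrower lanes and rejects 64-bit ones.
static_assert(BitSize<std::uint32_t>() < 64, "32-bit lanes are accepted");
static_assert(!(BitSize<std::uint64_t>() < 64), "64-bit lanes are rejected");
```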
MerryMage
a4a26ac226
emit_x64_vector: EmitVectorSignedSaturatedAccumulateUnsigned64: SSE implementation
2020-04-22 20:55:06 +01:00
MerryMage
a7c66d2d28
emit_x64_vector: Simplify fpsr_qc related code
...
Move the bool conversion into A64JitState::GetFpsr so we don't have to continuously
pay the cost of conversion for every saturation instruction.
2020-04-22 20:55:06 +01:00
Lioncash
e739624296
ir: Add opcodes for vector CLZ operations
...
We can optimize these cases further with a fair bit of shuffling via
pshufb and the use of masks, but given how uncommon this instruction
is, I don't consider the extra code worth it over a simple, manageable
naive solution like this.
If we ever do hit a case where vectorized CLZ happens to be a
bottleneck, then we can revisit this. At least with AVX-512CD, this can
be done with a single instruction for the 32-bit word case.
2020-04-22 20:55:05 +01:00
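For reference, a naive per-lane count-leading-zeros of the kind described above might look like the following. This is a sketch under assumed names (`CountLeadingZeros`, `VectorCountLeadingZeros`), not the fallback code from the commit.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Counts zero bits from the most significant end; returns the lane width
// when the value is zero.
template <typename T>
constexpr unsigned CountLeadingZeros(T value) {
    static_assert(std::is_unsigned<T>::value, "lanes are treated as unsigned");
    constexpr unsigned bits = sizeof(T) * 8;
    unsigned count = 0;
    while (count < bits && ((value >> (bits - 1 - count)) & 1) == 0) {
        ++count;
    }
    return count;
}

// Applies the scalar routine to every lane in place.
template <typename T, std::size_t N>
constexpr void VectorCountLeadingZeros(T (&lanes)[N]) {
    for (std::size_t i = 0; i < N; ++i) {
        lanes[i] = static_cast<T>(CountLeadingZeros(lanes[i]));
    }
}

static_assert(CountLeadingZeros<std::uint8_t>(0x01) == 7, "seven leading zeros");
```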
Lioncash
5653e7637e
emit_x64_vector: Remove unnecessary [[maybe_unused]] attributes
...
These were unintentionally left in when introducing SUQADD and USQADD
2020-04-22 20:55:05 +01:00
Lioncash
d4a76aaa04
ir: Add opcodes for unsigned saturated accumulations of signed values
2020-04-22 20:55:05 +01:00