Commit graph

1013 commits

Author SHA1 Message Date
Lioncash
c692ccdd6d emit_x64_vector: Vectorize fallback path of EmitVectorMaxS8() 2020-04-22 20:46:17 +01:00
Lioncash
b194313d8c emit_x64_vector: Vectorize fallback path in EmitVectorMinU32() 2020-04-22 20:46:17 +01:00
Lioncash
7ceda6d919 emit_x64_vector: Vectorize fallback path in EmitVectorMinU16() 2020-04-22 20:46:17 +01:00
Lioncash
cda85a1da0 emit_x64_vector: Vectorize fallback path in EmitVectorMinS32() 2020-04-22 20:46:17 +01:00
Lioncash
6e08eed210 emit_x64_vector: Vectorize fallback path in EmitVectorMinS8() 2020-04-22 20:46:17 +01:00
Lioncash
0fb6dce689 emit_x64_vector: Remove unnecessary if constexpr expression in LogicalVShift
This can simply be merged with the previous one.
2020-04-22 20:46:17 +01:00
Lioncash
5b71b1337b emit_x64_vector: Avoid left shift of negative value in LogicalVShift
Now that we handle the signed variants, we also have to be careful about left shifts with negative values,
as this is considered undefined behavior.
2020-04-22 20:46:17 +01:00
Lioncash
9954d28868 a64_jitstate: Zero SP and PC on construction of A64JitState
Given we zero out/reset everything else in the struct, do the same for these members to keep initialization consistent
2020-04-22 20:46:17 +01:00
Lioncash
4efbd40ea4 backend_x64/callback: Default virtual destructor in the cpp file
Prevents the vtable being generated in each translation unit that includes the header (and silences -Wweak-vtables warnings)
2020-04-22 20:46:17 +01:00
Lioncash
edd0b5c8c7 a32_interface/a64_interface: Change reinterpret_casts to static_casts in GetCurrentBlock thunks
It's well-defined to static_cast a void* to its proper type.
2020-04-22 20:46:17 +01:00
Lioncash
e71612d394 A64: Implement SSHL (scalar) 2020-04-22 20:46:17 +01:00
Lioncash
ef1e69a1e3 A64: Implement SSHL (vector) 2020-04-22 20:46:17 +01:00
Lioncash
21974ee57e backend_x64/ir: Amend generic LogicalVShift() template to also handle signed variants
Also adds IR opcodes to dispatch said variants
2020-04-22 20:46:17 +01:00
Lioncash
9fc89f0a0e emit_x64_vector_floating_point: Use arrays for retrieving size instead of hardcoding the size
Similar changes were done in emit_x64_vector, but these were missed.
2020-04-22 20:46:17 +01:00
Lioncash
af28e89a13 emit_x64_vector: Vectorize fallback path in EmitVectorMaxU16() 2020-04-22 20:46:17 +01:00
Lioncash
cda75e2079 A64: Implement CMTST's scalar variant 2020-04-22 20:46:17 +01:00
Lioncash
0d20423ad5 emit_x64_vector: Vectorize non-SSE4.1 fallback path for VectorMultiply32() 2020-04-22 20:46:17 +01:00
Lioncash
d70ee7c0d1 emit_x64_vector: Use VBPROADCAST where applicable and available
Uses the instruction that does what it says in its name if available. Allows avoiding the use
of a scratch register in EmitVectorBroadcast8() and EmitVectorBroadcastLower8()'s SSSE3 path.
2020-04-22 20:46:17 +01:00
Lioncash
bebe7235ae A64: Implement UZP1 and UZP2 2020-04-22 20:46:17 +01:00
Lioncash
26d77c6f09 ir: Add opcodes for performing vector deinterleaving 2020-04-22 20:46:17 +01:00
Lioncash
d6f9ed47d9 A64: Implement FNEG (half-precision) 2020-04-22 20:46:17 +01:00
Lioncash
7efbd73bac A64: Implement USHL (scalar) 2020-04-22 20:46:17 +01:00
Lioncash
41f4717f2b A64: Implement FNEG (vector) 2020-04-22 20:46:17 +01:00
Lioncash
ba1cc6366d A64: Implement RSUBHN/RSUBHN2 2020-04-22 20:46:17 +01:00
Lioncash
e41640fe33 A64: Implement RADDHN/RADDHN2 2020-04-22 20:46:17 +01:00
Lioncash
b719a6b3f7 A64: Implement XAR 2020-04-22 20:46:17 +01:00
Lioncash
0b1b131ec2 simd_two_register_misc: Factor out common comparison code
Gets rid of a tiny bit of duplicated code.
2020-04-22 20:46:17 +01:00
Lioncash
ed0b84da70 A64: Implement CMLE (zero)'s vector variant 2020-04-22 20:46:17 +01:00
Lioncash
b595a68ffa A64: Implement CMTST (vector) 2020-04-22 20:46:17 +01:00
Lioncash
48c7f8630c A64: Implement ADDHN{2} and SUBHN{2} 2020-04-22 20:46:17 +01:00
Lioncash
3acd9c9200 translate: zero extend result in Vpart when storing to lower part of vector 2020-04-22 20:46:17 +01:00
Lioncash
87ca63699f emit_x64_vector: Emit PMAXUD in EmitVectorMaxU32 on SSE4.1-capable CPUs 2020-04-22 20:46:17 +01:00
Lioncash
f17702f608 emit_x64_vector: Emit PMINUD in EmitVectorMinU32 on SSE4.1-capable CPUs 2020-04-22 20:46:17 +01:00
Lioncash
596a8dd1dd emit_x64_vector: Emit PMINSD in EmitVectorMinS32 on SSE4.1-capable CPUs
Provides a better alternative to a fallback operation.
2020-04-22 20:46:17 +01:00
Lioncash
75fd4eaaaa emit_x64_vector: Get rid of some magic numbers in loop bounds 2020-04-22 20:46:17 +01:00
Lioncash
7b80ac25eb emit_x64_vector: Generify variable shift functions 2020-04-22 20:46:17 +01:00
Lioncash
4ec735f707 A64: Implement CMLE (zero)'s scalar variant 2020-04-22 20:46:17 +01:00
Lioncash
6534184df2 A64: Implement CMLT (zero)'s scalar single/double-precision variant 2020-04-22 20:46:17 +01:00
Lioncash
8863c9bb4b A64: Implement SHA512H2 2020-04-22 20:46:17 +01:00
Lioncash
033b890e25 A64: Implement SHA512H 2020-04-22 20:46:17 +01:00
Lioncash
d1f5b084b4 A64: Handle S32->F32 case for SCVTF (vector) 2020-04-22 20:46:17 +01:00
Lioncash
38fa984b53 IR: Add opcode for packed word->f32 conversions 2020-04-22 20:46:16 +01:00
Lioncash
b8587d8e34 A64: Implement SHA512SU1 2020-04-22 20:46:16 +01:00
Lioncash
44d846045a A64: Implement SHA512SU0 2020-04-22 20:46:16 +01:00
Lioncash
ca903c1585 A64: Implement SHA256H and SHA256H2 2020-04-22 20:46:16 +01:00
MerryMage
e4237c44eb A64: Implement SCVTF (vector, integer), scalar varaint 2020-04-22 20:46:16 +01:00
MerryMage
bfba38d0b6 impl: Reorganize scalar two-register misc instructions 2020-04-22 20:46:16 +01:00
Lioncash
ea582b17cc A64: Implement SHA256SU1 2020-04-22 20:46:16 +01:00
Lioncash
06c5dcaf5e simd_two_register_misc: Add missing zeroing of the vector for CMGT and CMLT 2020-04-22 20:46:16 +01:00
Lioncash
0d50d7314b A64: Implement CMGE (zero)'s vector variant 2020-04-22 20:46:16 +01:00
Lioncash
ab35dc0e78 A64: Implement MLS (by element) 2020-04-22 20:46:16 +01:00
Lioncash
1651e60462 A64: Implement MUL (by element) 2020-04-22 20:46:16 +01:00
MerryMage
a86d4093cd A64: Implement MLA (by element) 2020-04-22 20:46:16 +01:00
Lioncash
7f47402609 A64: Implement ABS (scalar) 2020-04-22 20:46:16 +01:00
Lioncash
c8eb4528be A64: Implement SHA256SU0 2020-04-22 20:46:16 +01:00
Lioncash
181c3b0790 A64: Implement SHA1M 2020-04-22 20:46:16 +01:00
Lioncash
47bc97a71b A64: Implement SHA1P 2020-04-22 20:46:16 +01:00
Lioncash
718f3e9bb4 A64: Implement scalar variants of CMEQ, CMGT, and CMGE zero comparison instructions
These can trivially use the ScalarCompare helper function.
2020-04-22 20:46:16 +01:00
Lioncash
3ad4e547e4 A64: Implement scalar variant of NEG 2020-04-22 20:46:16 +01:00
Lioncash
b4f3051e4b simd: Relocate REV16, REV32 and REV64 vector variants to the proper file
These aren't scalar instruction variants.
2020-04-22 20:46:16 +01:00
Lioncash
19e276d10f A64: Implement CMEQ (register, scalar) 2020-04-22 20:46:16 +01:00
Lioncash
5b8c9e5146 A64: Implement CMHS (register, scalar) 2020-04-22 20:46:16 +01:00
Lioncash
78bb12276a A64: Implement CMHI (register, scalar) 2020-04-22 20:46:16 +01:00
Lioncash
c18b20b8d1 A64: Implement CMGE (register, scalar) 2020-04-22 20:46:16 +01:00
Lioncash
755981d0da A64: Implement CMGT (register, scalar) 2020-04-22 20:46:16 +01:00
Lioncash
da6627124b A64: Implement SHA1C 2020-04-22 20:46:16 +01:00
Lioncash
3c013bd9f8 A64: Implement SLI (scalar) 2020-04-22 20:46:16 +01:00
Lioncash
154cac594a A64: Implement SRI (scalar) 2020-04-22 20:46:16 +01:00
Lioncash
6bcfdba1ad general: Remove unused lambda captures
Resolves warnings that occur in Xcode 9.3
2020-04-22 20:46:16 +01:00
Lioncash
205ca6b4cb A64: Implement SHA1SU1 2020-04-22 20:46:16 +01:00
Lioncash
16a001b9ff A64: Implement SHA1SU0 2020-04-22 20:46:16 +01:00
Lioncash
3b6db59850 A64: Implement TRN2 2020-04-22 20:46:16 +01:00
Lioncash
30e158f8d0 A64: Implement TRN1 2020-04-22 20:46:16 +01:00
Lioncash
52cad2d9d0 A64: Implement SSRA (scalar) 2020-04-22 20:46:16 +01:00
Lioncash
255a33936d A64: Implement SSHR (scalar) 2020-04-22 20:46:16 +01:00
Lioncash
6723b00497 A64: Implement USRA (scalar) 2020-04-22 20:46:16 +01:00
Lioncash
d56fa8f735 A64: Implement USHR (scalar) 2020-04-22 20:46:16 +01:00
Lioncash
870e418b0b A64: Implement SHL (scalar) 2020-04-22 20:46:16 +01:00
Lioncash
97f2bea4f2 A64: Implement SM3PARTW1 2020-04-22 20:46:16 +01:00
Lioncash
e268b110f0 simd_sha512: Simplify RAX1
Now that the vector rotation helpers are in, replace the explicit
shifting with the relevant helper function that does the same thing.

Simply tidies up code; no behavioral changes are made.
2020-04-22 20:46:16 +01:00
Lioncash
20d2491267 A64: Implement SM3PARTW2 2020-04-22 20:46:16 +01:00
Lioncash
e1b662e90c ir: Add helper functions for vector rotation 2020-04-22 20:46:16 +01:00
Lioncash
8a60a63a8b A64: Implement SM3TT2B 2020-04-22 20:46:16 +01:00
Lioncash
b3d4c02098 A64: Implement SM3TT2A 2020-04-22 20:46:16 +01:00
Lioncash
7fbccabd81 A64: Implement SM3TT1B 2020-04-22 20:46:16 +01:00
Lioncash
769373b3ed A64: Implement SM3TT1A 2020-04-22 20:46:16 +01:00
Lioncash
2d269fdcc7 simd_shift_by_immediate: Merge signed/unsigned helper functions
Gets rid of a little more code duplication.
2020-04-22 20:46:16 +01:00
Lioncash
d5461be6b4 A64: Implement SM3SS1 2020-04-22 20:46:16 +01:00
Lioncash
2db032ac83 A64: Implement SRI (vector) 2020-04-22 20:46:16 +01:00
Lioncash
11005cfe26 A64: Implement SLI (vector) 2020-04-22 20:46:16 +01:00
Lioncash
e3d9bf55e7 A64: Implement SRSRA (vector) 2020-04-22 20:46:16 +01:00
Lioncash
bc6016cad7 A64: Implement SRSHR (vector) 2020-04-22 20:46:16 +01:00
MerryMage
6c9c829a08 imm: Add additional bit position checks to Imm::Bits 2020-04-22 20:46:16 +01:00
MerryMage
be907a61f7 math_util: rvalue references for std::forward 2020-04-22 20:46:16 +01:00
Lioncash
a2f8cdf0a3 A64: Implement SSUBL/SSUBL2 2020-04-22 20:46:16 +01:00
Lioncash
d456fb85c8 A64: Implement SADDL/SADDL2 2020-04-22 20:46:16 +01:00
Lioncash
5c9e7f328d A64: Implement USUBL/USUBL2 2020-04-22 20:46:16 +01:00
Lioncash
88d70e3b8a A64: Implement UADDL/UADDL2 2020-04-22 20:46:16 +01:00
Lioncash
4b3d70de5f simd_shift_by_immediate: Factor out common code in shift instructions
Gets rid of partial duplication of the same code for instructions that only have a small behavior difference to them.

e.g. The only difference between SSHR and SSRA is that SSRA adds an accumulator before storing the result.
2020-04-22 20:46:16 +01:00
Lioncash
56803f5203 A64: Implement URSRA (vector) 2020-04-22 20:46:16 +01:00