1703 commits

Author SHA1 Message Date
Lioncash
b37279f65c backend/x64/emit_x64_vector: Prevent undefined behavior within VectorSignedSaturatedShiftLeft
Avoids undefined behavior by potentially left-shifting a signed negative
value.
2020-04-22 21:00:47 +01:00
Lioncash
46eae8cf2f common/fp/op/FPRecipExponent: Prevent undefined behavior from shifting a negative value
Due to promotion rules (types < int, even if unsigned, get promoted to
int when arithmetic is performed on them), this is a potential spot for
undefined behavior.
2020-04-22 21:00:47 +01:00
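The promotion rule mentioned above can be illustrated with a minimal sketch. This is not the code from the commit: `ShiftLeftUnsigned` is a hypothetical helper, and the sketch assumes a platform where `int` is 32 bits wide. Before C++20, left-shifting a negative signed value, or shifting a bit into the sign position, is undefined behavior, so widening to an unsigned type first keeps the shift well defined.

```cpp
#include <cstdint>
#include <type_traits>

// Operands narrower than int are promoted to int before arithmetic, so
// shifting a 16-bit unsigned value really shifts a signed int.
static_assert(std::is_same_v<decltype(std::uint16_t{1} << 1), int>);

// Hypothetical helper: cast to an unsigned 32-bit type before shifting so the
// operation is performed on an unsigned value. `amount` must be below 32.
constexpr std::uint32_t ShiftLeftUnsigned(std::uint16_t value, unsigned amount) {
    return static_cast<std::uint32_t>(value) << amount;
}
```

With this shape, `ShiftLeftUnsigned(0xFFFF, 16)` moves bits into the top of the word without ever producing a signed intermediate.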
MerryMage
13e8b7b516 emit_x64_floating_point: F16C implementation of FPSingleToHalf 2020-04-22 20:58:17 +01:00
MerryMage
d32d6fe598 emit_x64_floating_point: F16C implementation of FPHalfToSingle and FPHalfToDouble 2020-04-22 20:58:12 +01:00
MerryMage
a53ba12be2 emit_x64_floating_point: Factor out ConvertRoundingModeToX64Immediate 2020-04-22 20:58:12 +01:00
MerryMage
5a2adc6629 backend/x64: Expose FPCR in EmitContext instead of its subcomponents 2020-04-22 20:58:12 +01:00
Merry
01bb1cdd88 Merge pull request #458 from lioncash/float-op
A64: Handle half-precision floating point in FABS, FNEG, and scalar FMOV
2020-04-22 20:58:12 +01:00
Lioncash
28a8b4d210 A64: Handle half-precision floating point in scalar FMOV
This is simply performing a scalar value transfer between registers
without conversions, so this is trivial to handle as-is.
2020-04-22 20:58:12 +01:00
Lioncash
d7ac5a664f A64: Handle half-precision floating point in FCVTL
Like FCVTN, now that we have half-precision floating point conversion
functions available, we can go ahead and use those to eliminate the
interpreter fallback.
2020-04-22 20:58:12 +01:00
Lioncash
fe84ecb780 A64: Handle half-precision floating point in scalar FABS
Now that we have the half-precision variant of the opcode added, we can
simply handle the instruction instead of treating it as undefined.
2020-04-22 20:58:12 +01:00
Lioncash
fac9224d5e A64: Handle half-precision floating point in FCVTN
Now that we have IR instructions for performing conversions with
half-precision floating point, we can also handle half-precision values
within FCVTN.
2020-04-22 20:58:12 +01:00
Lioncash
8309ec7a9f frontend/ir_emitter: Add half-precision variant of FPAbs 2020-04-22 20:58:12 +01:00
Lioncash
16de99d3e3 A64: Enable FCVT floating-point conversions for half-precision
With this, we no longer have to fall back to the interpreter in any of
the FCVT floating-point conversion instructions.
2020-04-22 20:58:12 +01:00
Lioncash
10abc77fad A64: Handle half-precision floating point in scalar FNEG
With the half-precision variant of the FPNeg opcode added, we can
utilize it here to emulate the half-precision variant of FNEG.
2020-04-22 20:58:12 +01:00
Lioncash
e4c259d69f frontend/ir_emitter: Add half->{single, double} and {double, single}->half conversion opcodes 2020-04-22 20:58:12 +01:00
Lioncash
c97efcb978 frontend/ir_emitter: Add half-precision variant of FPNeg 2020-04-22 20:58:12 +01:00
Lioncash
dff5da1063 common/fp/unpacked: Amend behavior of FPUnpackCV
This is supposed to call FPUnpackBase instead of FPUnpack. This would
result in alternate half-precision representations being misinterpreted
when it comes to dealing with NaNs.
2020-04-22 20:58:12 +01:00
Merry
f01afc5ae6 Merge pull request #456 from lioncash/mov
A64: Enable FMOV (general) for half-precision floating point
2020-04-22 20:58:12 +01:00
Lioncash
03bc2334fe common/fp/op/FPConvert: Amend off-by-one in double NaN case in FPConvertNaN
Avoids potentially clobbering the intended sign bit value during
conversions to double-precision values. The other conversion types are
already properly handled, so those don't need to be addressed.
2020-04-22 20:58:12 +01:00
Lioncash
c57b146fb2 common/fp/op/FPConvert: Add half-precision instantiations to FPConvert 2020-04-22 20:58:12 +01:00
Merry
c1ce94872d Merge pull request #455 from lioncash/sqrdmulh-scalar
A64: Implement SQRDMULH and SQDMULL's scalar indexed variants
2020-04-22 20:58:11 +01:00
Lioncash
25a7256ee1 A64: Enable FMOV (general) for half-precision floating point
This just transfers values between vector registers and general-purpose
registers with no conversions performed, so this is trivial to add
support for half-precision to.
2020-04-22 20:58:11 +01:00
Lioncash
97dd3d0596 A64: Implement SQRDMULH's scalar indexed element variant 2020-04-22 20:58:11 +01:00
Lioncash
49b51e34f1 simd_vector_x_indexed_element: Deduplicate index and Vm operand construction 2020-04-22 20:58:11 +01:00
Lioncash
692aba91b6 A64: Implement SQDMULL{2}'s scalar indexed element variant 2020-04-22 20:58:11 +01:00
Lioncash
c043b831d5 A64: Implement SQDMULL{2}'s by-element variant 2020-04-22 20:58:11 +01:00
Lioncash
72af5a3dff simd_scalar_x_indexed_element: Factor out index and Vm argument construction
This will be useful in the implementations of SQRDMULH and SQDMULL{2} as
well.
2020-04-22 20:58:11 +01:00
Lioncash
224ff0afaa A64: Implement SQRDMULH's by-index vector variant 2020-04-22 20:58:11 +01:00
Lioncash
3a3542414b A64: Implement FRECPX's half-precision floating point variant 2020-04-22 20:58:11 +01:00
Lioncash
bd892ec4ef frontend/ir/ir_emitter: Amend FPRecipExponent to handle half-precision floating point 2020-04-22 20:58:11 +01:00
Lioncash
974fbf0677 frontend/ir/value: Add U16U32U64 type to represent floating point types 2020-04-22 20:58:11 +01:00
Lioncash
eb3e0d5908 common/fp/op/FPRecipExponent: Add half-precision floating point specialization 2020-04-22 20:58:11 +01:00
Lioncash
a829c93406 common/fp/unpacked: Correct edge-cases within FPUnpack for half-precision floating point
This corrects one case where floating-point exceptions could be set when
they're not supposed to be.

This also corrects a case where values were being treated as NaNs when
they weren't supposed to be.
2020-04-22 20:58:11 +01:00
Lioncash
7030b9af95 common/fp/process_nan: Add half-precision instantiations for NaN processing functions 2020-04-22 20:58:11 +01:00
Lioncash
14f55d7476 common/fp/unpacked: Add half-precision instantiation of FPRoundBase 2020-04-22 20:58:11 +01:00
Lioncash
7e814de445 common/fp/unpacked: Handle half-precision unpacking in FPUnpackBase 2020-04-22 20:58:11 +01:00
Lioncash
8f9fe8690a common/fp/unpacked: Adjust FPUnpack to operate like ARM pseudocode
This function is defined as always disabling the AHP bit in the fpcr
before performing any operations.

At the same time, rename the original FPUnpack function to FPUnpackBase
to match the pseudocode in the ARM reference manual.
2020-04-22 20:58:11 +01:00
Merry
37c4c39d62 Merge pull request #448 from lioncash/saturate
A64: Implement SQSHRN, SQSHRUN, and UQSHRN's scalar variants
2020-04-22 20:58:11 +01:00
Merry
f5d774bdbd Merge pull request #449 from lioncash/hp
common/fp/info: Add specialization of FPInfo for half-precision floating point
2020-04-22 20:58:11 +01:00
Lioncash
126c29a9e9 A64: Implement SQSHRN, SQSHRUN, and UQSHRN's scalar variants
These can just be implemented in terms of the vector variants for the
time being.
2020-04-22 20:58:11 +01:00
Lioncash
0b67b94b6c common/fp/info: Add specialization of FPInfo for half-precision floating point
Puts the necessary info struct in place for further use.
2020-04-22 20:58:11 +01:00
Lioncash
dd7433f9d3 A64: Amend prototypes of some SIMD scalar shift by immediate opcodes
These take a vector for a destination.
2020-04-22 20:58:11 +01:00
Lioncash
99c494bae9 common/fp/unpacked: Add FPRoundCV
Corresponds to the equivalent pseudocode within the ARMv8 reference
manual. This will be necessary for supporting half-precision
floating-point.

This also makes use of it within FPConvert
2020-04-22 20:58:11 +01:00
Merry
bbd5330ad2 Merge pull request #447 from lioncash/flag
A64: Implement CFINV, RMIF, AXFlag and XAFlag
2020-04-22 20:58:11 +01:00
Lioncash
490bebbd9a common/fp/unpacked: Add FPUnpackCV
Adds a template function that performs the same behavior as in the ARM
pseudocode, and utilizes it in FPConvert, which will be necessary for
half-float support.
2020-04-22 20:58:11 +01:00
Merry
fb039e232c Merge pull request #442 from lioncash/fcvtxn
A64: Implement scalar and vector variants of FCVTXN
2020-04-22 20:58:11 +01:00
Lioncash
6aed4036ef ir_opt/a64_get_set_elimination_pass: Add handling for NZCV raw get and set operations 2020-04-22 20:58:11 +01:00
Merry
4f937c1ee1 Merge pull request #446 from lioncash/sqshl
A64: Implement scalar variants of SQSHL (register) and UQSHL (register)
2020-04-22 20:58:11 +01:00
Lioncash
aa22db534b A64: Implement AXFlag and XAFlag 2020-04-22 20:58:11 +01:00
Merry
d74cccbc84 Merge pull request #445 from lioncash/sqrt
A64: Implement single and double-precision vector variant of FSQRT
2020-04-22 20:58:11 +01:00
Lioncash
20ffe568d0 A64: Implement RMIF 2020-04-22 20:58:11 +01:00
Merry
6d7e7c3269 Merge pull request #443 from lioncash/flag
A64: Rearrange flag format/manipulation instructions
2020-04-22 20:58:11 +01:00
Lioncash
51b526e453 A64: Implement CFINV 2020-04-22 20:58:11 +01:00
Merry
5d01f1b462 Merge pull request #441 from lioncash/constexpr
common/bit_util: Mark a few functions as constexpr
2020-04-22 20:58:11 +01:00
Lioncash
597a8be5d5 ir: Add A64-specific opcodes for getting and setting raw NZCV values
This will be necessary to implement the flag manipulation and flag
format instructions.
2020-04-22 20:58:11 +01:00
Merry
743c52fdc5 Merge pull request #440 from lioncash/include
common/fp: Remove unnecessary includes
2020-04-22 20:58:11 +01:00
Lioncash
d3515279df A64: Implement the vector version of FCVTXN 2020-04-22 20:58:10 +01:00
Lioncash
17aea0b997 A64: Implement UQSHL (register)'s scalar variant
This can be implemented in terms of the vector variant.
2020-04-22 20:58:10 +01:00
Lioncash
c99d4b762e A64: Implement single and double-precision vector variant of FSQRT 2020-04-22 20:58:10 +01:00
Lioncash
54e0b487f3 A64: Rearrange flag format/manipulation instructions
Gives these instructions better categorical labeling.
2020-04-22 20:58:10 +01:00
Lioncash
88d1977cb9 common/bit_util: Mark a few functions as constexpr
These four functions can be made constexpr with no issue.
2020-04-22 20:58:10 +01:00
Lioncash
f33e5939b7 common/fp: Remove unnecessary includes 2020-04-22 20:58:10 +01:00
Lioncash
302f56b36a A64: Fall back to interpreting for FCADD and FCMLA half-precision variants
Rather than straight-up treating them as undefined, we can fall back to an
interpreter in this case.
2020-04-22 20:58:10 +01:00
Lioncash
4339a8fff6 A64: Implement the scalar version of FCVTXN 2020-04-22 20:58:10 +01:00
Lioncash
35ddf68ad5 A64: Implement SQSHL (register)'s scalar variant
We can implement this in terms of the vector variant.
2020-04-22 20:58:10 +01:00
Lioncash
5cf1478620 frontend/ir: Add opcodes for vector square roots 2020-04-22 20:58:10 +01:00
Lioncash
36027ebef5 frontend/ir/microinstruction: Add missing cases for FPRecipExponent{32,64} for ReadsFromAndWritesToFPSRCumulativeExceptionBits()
This was intended to be added within #437, but was missed
2020-04-22 20:58:10 +01:00
Merry
40b081438a Merge pull request #439 from lioncash/fcmla
A64: Implement FCADD and FCMLA
2020-04-22 20:58:10 +01:00
Lioncash
7c81a58ed3 frontend/ir/ir_emitter: Alter parameters of FPDoubleToSingle() and FPSingleToDouble() to pass along desired rounding mode
This will be necessary to special-case the non-IEEE Von Neumann rounding
to odd rounding mode.
2020-04-22 20:58:10 +01:00
Merry
d91192681a Merge pull request #438 from lioncash/fmulx
A64: Implement scalar double/single precision FMULX (by element)
2020-04-22 20:58:10 +01:00
Lioncash
ed29ef8cca A64: Implement FCMLA 2020-04-22 20:58:10 +01:00
Lioncash
95af9dafbe common/fp/op: Add FP conversion functions 2020-04-22 20:58:10 +01:00
Merry
9f11720a69 Merge pull request #437 from lioncash/frecpx
A64: Implement FRECPX (single, double precision)
2020-04-22 20:58:10 +01:00
Lioncash
bdcea0b0dc A64: Implement scalar double/single precision FMULX (by element) 2020-04-22 20:58:10 +01:00
Lioncash
5ce17574f9 A64: Implement FCADD 2020-04-22 20:58:10 +01:00
Merry
34d917f34e Merge pull request #436 from lioncash/no-alloc
A64: Implement LDNP/STNP
2020-04-22 20:58:10 +01:00
Lioncash
e44730ba6d A64: Implement FRECPX (single, double precision) 2020-04-22 20:58:10 +01:00
Lioncash
bfaeb08d3c A64: Implement LDNP/STNP
LDNP and STNP indicate that a memory access is non-temporal/streaming
(i.e. unlikely to be repeated), allowing data caching to not be
performed. However, given this is only a hint, we can treat these two
instructions as regular LDP and STP instructions for the time being.
2020-04-22 20:58:10 +01:00
Lioncash
9cf3c25811 frontend/ir/ir_emitter: Add opcodes for floating point reciprocal exponents 2020-04-22 20:58:10 +01:00
Merry
dbf47db713 Merge pull request #434 from lioncash/format
A32/translate_arm: Formatting/tidying up
2020-04-22 20:58:10 +01:00
Lioncash
b168c2a9f9 common/fp/op: Add operations for floating-point reciprocal exponents 2020-04-22 20:58:10 +01:00
Lioncash
05a6ab691d translate_arm/coprocessor: Minor tidying up 2020-04-22 20:58:10 +01:00
Lioncash
1e32a09c03 translate_arm/vfp2: Invert conditionals where applicable 2020-04-22 20:58:10 +01:00
Lioncash
e209b31073 translate_arm/synchronization: Invert conditionals where applicable 2020-04-22 20:58:10 +01:00
Lioncash
9514e3602e translate_arm/status_register_access: Invert conditionals where applicable 2020-04-22 20:58:10 +01:00
Lioncash
c6aa1a708a translate_arm/saturated: Invert conditionals where applicable 2020-04-22 20:58:10 +01:00
Lioncash
a72813599a translate_arm/reversal: Invert conditionals where applicable 2020-04-22 20:58:10 +01:00
Lioncash
7be56e6b67 translate_arm/parallel: Invert conditionals where applicable 2020-04-22 20:58:10 +01:00
Lioncash
3c00a616d6 translate_arm/packing: Invert conditionals where applicable 2020-04-22 20:58:10 +01:00
Lioncash
c711188f46 translate_arm/multiply: Invert conditionals where applicable 2020-04-22 20:58:10 +01:00
Lioncash
c8dad40d81 translate_arm/misc: Invert conditionals where applicable 2020-04-22 20:58:10 +01:00
Lioncash
a7bf5ff77d translate_arm/load_store: Invert conditionals where applicable 2020-04-22 20:58:10 +01:00
Lioncash
2e180a7f14 backend/x64/a32_interface: Mark Context move constructor and move assignment as noexcept
Provides a more "correct" move constructor/assignment operator, since
these relevant functions shouldn't throw exceptions.

Has the benefit of playing nicely with std::move_if_noexcept and other
noexcept library facilities.
2020-04-22 20:58:09 +01:00
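The interaction with std::move_if_noexcept can be sketched as follows. The two structs are hypothetical stand-ins, not the actual Context class; the static assertions only show how the standard library distinguishes the two cases.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical type whose move constructor is not marked noexcept.
struct ThrowingMove {
    ThrowingMove() = default;
    ThrowingMove(const ThrowingMove&) = default;
    ThrowingMove(ThrowingMove&&) {}
};

// Hypothetical type whose move constructor is marked noexcept.
struct NoexceptMove {
    NoexceptMove() = default;
    NoexceptMove(const NoexceptMove&) = default;
    NoexceptMove(NoexceptMove&&) noexcept {}
};

static_assert(!std::is_nothrow_move_constructible_v<ThrowingMove>);
static_assert(std::is_nothrow_move_constructible_v<NoexceptMove>);

// std::move_if_noexcept yields a const lvalue reference (forcing a copy) when
// the move may throw and a copy constructor exists, and an rvalue reference
// (allowing a move) otherwise.
static_assert(std::is_same_v<
    decltype(std::move_if_noexcept(std::declval<ThrowingMove&>())),
    const ThrowingMove&>);
static_assert(std::is_same_v<
    decltype(std::move_if_noexcept(std::declval<NoexceptMove&>())),
    NoexceptMove&&>);
```

Containers such as std::vector rely on the same check when reallocating, which is why a noexcept move can avoid copies there.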
Lioncash
f4b19a7393 translate_arm/extension: Invert conditionals where applicable 2020-04-22 20:58:09 +01:00
Lioncash
deb9dd4acc block_of_code: Replace cast with [[maybe_unused]] in DoesCpuSupport() 2020-04-22 20:58:09 +01:00
Lioncash
c2de6ecfd0 translate_arm/exception_generating: Invert conditionals where applicable 2020-04-22 20:58:09 +01:00
Lioncash
d8a8d3b073 translate_arm/data_processing: Invert conditionals where applicable 2020-04-22 20:58:09 +01:00
Lioncash
df5c51ff47 translate_arm/branch: Invert conditionals where applicable
Allows unindenting code a bit.
2020-04-22 20:58:09 +01:00
Lioncash
3290a9fdc2 common: Remove address_range.h
The AddressRange structure isn't used anywhere within the codebase, so
this can be removed. Particularly because there's no real appeal/heavy
potential use of it in the future that isn't trivial to add back if
needed.
2020-04-22 20:57:38 +01:00
Lioncash
ee973f13c7 frontend/A32/ir_emitter: Mark PC() and AlignPC() as const-qualified member functions
These don't modify instance state, so they can be const-qualified member
functions.
2020-04-22 20:57:38 +01:00
Lioncash
3a2dd09122 frontend/A64/ir_emitter: Mark PC() and AlignPC() as const qualified member functions
These don't actually alter any instance state.
2020-04-22 20:57:38 +01:00
Lioncash
575ae852a9 constant_propagation_pass: Fold byte reversal opcodes where applicable
These are reasonably trivial to fold away when applicable. We just
perform the swap and replace the instruction with the constant value.
2020-04-22 20:57:37 +01:00
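As a rough illustration of why these opcodes fold easily: once the operand is a known constant, the swap can be evaluated ahead of time. The function names below are assumptions for this sketch and are not taken from the pass itself.

```cpp
#include <cstdint>

// Reverse the two bytes of a 16-bit value. The operand is promoted to int,
// so the result is truncated back to 16 bits explicitly.
constexpr std::uint16_t Swap16(std::uint16_t v) {
    return static_cast<std::uint16_t>((v << 8) | (v >> 8));
}

// Reverse the four bytes of a 32-bit value.
constexpr std::uint32_t Swap32(std::uint32_t v) {
    return ((v & 0xFF000000u) >> 24) | ((v & 0x00FF0000u) >> 8) |
           ((v & 0x0000FF00u) << 8) | ((v & 0x000000FFu) << 24);
}

// With a constant operand the whole operation is a compile-time constant,
// which is the value a folding pass would substitute for the instruction.
static_assert(Swap32(0x11223344u) == 0x44332211u);
```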
Merry
2c53f354ab Merge pull request #418 from lioncash/fold-op
constant_propagation_pass: Handle folding for Least/MostSignificant{Bit, Byte, Half, Word} opcodes
2020-04-22 20:57:37 +01:00
Merry
ad14a33672 Merge pull request #417 from lioncash/swap
common: Move byte swapping functions to bit_utils.h
2020-04-22 20:57:37 +01:00
Lioncash
d302d9bd0c constant_propagation_pass: Handle folding for Least/MostSignificant{Bit, Byte, Half, Word} opcodes
These are quite trivial to fold.
2020-04-22 20:57:37 +01:00
Lioncash
7139942976 common: Move byte swapping functions to bit_utils.h
These are quite general functions, so they can just be moved into common
instead of recreating a namespace here.
2020-04-22 20:57:37 +01:00
MerryMage
7c8fcaef26 emit_x64_vector_floating_point: AVX && DN implementation of EmitFPVectorMulX 2020-04-22 20:57:37 +01:00
MerryMage
e3898e628e A64: Implement FMULX (by element), single and double precision variants 2020-04-22 20:57:37 +01:00
Lioncash
93351c7efb a64_emit_x64: Make constness of loop elements explicit within GenFastmemFallbacks() 2020-04-22 20:57:37 +01:00
MerryMage
c106d8cedf A64: Implement FMULX, vector single-precision and double-precision variant 2020-04-22 20:57:37 +01:00
Lioncash
7752ffc50c a64_emit_x64: Convert std::vector instances in GenFastmemFallbacks() to std::array
Given these are quite small, we can avoid the need to heap allocate
here.
2020-04-22 20:57:37 +01:00
MerryMage
fa8925c4df IR: Implement FPVectorMulX 2020-04-22 20:57:37 +01:00
Michał Janiszewski
bbd8abaa25 Provide justification for always-true condition (#412) 2020-04-22 20:57:37 +01:00
Michał Janiszewski
7d0e918b51 Add missing include guards 2020-04-22 20:57:37 +01:00
V.Kalyuzhny
764a93bf5a Switch boost::optional to std::optional 2020-04-22 20:57:37 +01:00
Lioncash
07c197e8d0 constant_propagation_pass: Add 64-bit variants of shifts to the pass
These optimizations can also apply to the 64-bit variants of the shift
opcodes; we just need to check if the instruction has an associated
pseudo-op before performing the 32-bit variant's specifics.

While we're at it, we can also relocate the code to its own function
like the rest of the cases to keep organization consistent.
2020-04-22 20:57:37 +01:00
Lioncash
8248999c5d constant_propagation_pass: Fold division operations where applicable
We can fold division operations if:

1. The divisor is zero, in which case we can replace the result with zero
(as this is how ARM platforms expect it).
2. Both values are known, in which case we can just do the operation and
store the result.
3. The divisor is 1, in which case we can just return the other operand.
2020-04-22 20:57:37 +01:00
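The three division cases listed above can be sketched on a simplified model. Everything here is hypothetical (`Operand`, `FoldKind`, `FoldResult`, `FoldUnsignedDivide`); the real pass works on IR values rather than std::optional, and only the unsigned 32-bit case is shown.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical model: an operand is either a known constant or unknown.
using Operand = std::optional<std::uint32_t>;

enum class FoldKind { None, Constant, UseDividend };

struct FoldResult {
    FoldKind kind = FoldKind::None;
    std::uint32_t constant = 0;  // Only meaningful when kind == Constant.
};

FoldResult FoldUnsignedDivide(const Operand& dividend, const Operand& divisor) {
    // Case 1: division by zero yields zero on ARM, whatever the dividend is.
    if (divisor && *divisor == 0) {
        return {FoldKind::Constant, 0};
    }
    // Case 2: both operands are known, so compute the quotient directly.
    if (dividend && divisor) {
        return {FoldKind::Constant, *dividend / *divisor};
    }
    // Case 3: dividing by one leaves the dividend unchanged.
    if (divisor && *divisor == 1) {
        return {FoldKind::UseDividend, 0};
    }
    return {};  // Nothing can be folded.
}
```

Checking the zero divisor first matters: it guards the host division in the second case against dividing by zero.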
Merry
d83eae2004 Merge pull request #406 from lioncash/mul
constant_propagation_pass: Fold Mul32 and Mul64 cases where applicable
2020-04-22 20:57:37 +01:00
Merry
73d9393300 Merge pull request #405 from lioncash/inst
a64: Add ARMv8.4+ instructions encodings to the encoding table
2020-04-22 20:57:37 +01:00
Lioncash
7ad6981437 constant_propagation_pass: deduplicate common 32/64 bit checking for results in folding functions
It's common for a folding operation to apply to both the 32-bit and
64-bit variant of the same opcode, which leads to checking which kind of
result we need to store the value as. This moves it to its own function,
so that we don't need to duplicate it in various functions.
2020-04-22 20:57:37 +01:00
Lioncash
f1a66c37ba a64: Add ARMv8.4+ instructions encodings to the encoding table
Keeps the table up to date with the ARM specification.
2020-04-22 20:57:37 +01:00
Lioncash
72daf37208 constant_propagation_pass: Fold Mul32 and Mul64 cases where applicable
Multiplication operations can currently be folded if:

1. Both arguments are known constant values
2. Either operand is zero (in which case the result is also zero)
3. Either operand is one (in which case the result is the non-one
operand).
2020-04-22 20:57:37 +01:00
Lioncash
43b2eb4688 constant_propagation_pass: Fold SignExtend{Type}ToLong opcodes if possible 2020-04-22 20:57:37 +01:00
Lioncash
2da2cf9058 constant_propagation_pass: Fold SignExtend{Type}ToWord opcodes if possible 2020-04-22 20:57:37 +01:00
Lioncash
0583d401e3 ir/value: Add IsSignedImmediate() and IsUnsignedImmediate() functions to Value's interface
This allows testing against arbitrary values while also simultaneously
eliminating the need to check IsImmediate() all the time in expressions.
2020-04-22 20:57:37 +01:00
Lioncash
c42f6ea184 constant_propagation_pass: Fold ZeroExtend{Type}ToLong opcodes if possible
These are equivalent to the ZeroExtendXToWord variants, so we can
trivially do this as well.
2020-04-22 20:57:37 +01:00
Lioncash
e3258e8525 ir/value: Add a GetImmediateAsS64() function
Provides a signed analogue to GetImmediateAsU64() for consistency with
both integral classes when it comes to signed/unsigned.
2020-04-22 20:57:37 +01:00
Lioncash
2274214ff0 constant_propagation_pass: Combine zero-extension folding code into its own function
Separates the behavior from the actual switch statement and gets rid of
duplication, now that we can use the general GetImmediateAsU64()
function.
2020-04-22 20:57:37 +01:00
Lioncash
4a3c064b15 ir/value: Add an IsZero() member function to Value's interface
By far, one of the most common things to check for is whether or not a
value is zero, as it typically allows folding away unnecessary
operations (other close contenders that can help with eliding operations are 1 and -1).

So instead of requiring a check for an immediate and then actually
retrieving the integral value and checking it, we can wrap it within a
function to make it more convenient.
2020-04-22 20:57:37 +01:00
Merry
c649f11c0a Merge pull request #401 from lioncash/folding
constant_propagation_pass: Fold &, |, ^, and ~ operations where applicable
2020-04-22 20:56:01 +01:00
MerryMage
2524d536b0 A32/ir_emitter: Bugfix: ExceptionRaised was producing incorrect PC
Use actual PC and not pipelined PC.
2020-04-22 20:56:01 +01:00
Lioncash
c09f4cf28e constant_propagation_pass: Fold NOT operations 2020-04-22 20:55:50 +01:00
Lioncash
d69fceec55 value: Move ImmediateToU64() to be a part of Value's interface
This'll make it slightly nicer to do basic constant folding for 32-bit
and 64-bit variants of the same IR opcode type. By that, I mean it's
possible to inspect immediate values without a bunch of conditional
checks beforehand to verify that it's possible to call GetU32() or
GetU64(), etc.
2020-04-22 20:55:50 +01:00
Lioncash
8013548bbb constant_propagation_pass: Fold OR operations 2020-04-22 20:55:50 +01:00
MerryMage
ca603c1215 reg_alloc: Emit AVX instructions where able
Smaller codesize.
2020-04-22 20:55:50 +01:00
Lioncash
898d096e39 constant_propagation_pass: Fold AND operations 2020-04-22 20:55:50 +01:00
MerryMage
e2358af5ef abi: Emit AVX instructions where able
Smaller codesize.
2020-04-22 20:55:50 +01:00
Lioncash
f40fcda1f6 ir/value: Add member function to check whether or not all bits of a contained value are set
This is useful when we wish to know if a contained value is something
like 0xFFFFFFFF, as this helps perform constant folding. For example the
operation: x & 0xFFFFFFFF can be folded to just x in the 32-bit case.
2020-04-22 20:55:50 +01:00
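A minimal sketch of the check described above, using a hypothetical free function `AllBitsSet` rather than the actual member function on Value. The width of the type matters: 0xFFFFFFFF has all bits set as a 32-bit value but not as a 64-bit one.

```cpp
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical helper: true when every bit of an unsigned value is set.
template <typename T>
constexpr bool AllBitsSet(T value) {
    static_assert(std::is_unsigned_v<T>, "only unsigned types are modelled");
    return value == std::numeric_limits<T>::max();
}

// x & mask is an identity operation exactly when the mask has all bits set
// for the operand width, so the AND can be replaced by x.
template <typename T>
constexpr bool AndIsIdentity(T mask) {
    return AllBitsSet(mask);
}
```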
MerryMage
7c0378f56d a64_exclusive_monitor: Loosen memory ordering requirements
It is not necessary to be as strict as it was.
2020-04-22 20:55:50 +01:00
Lioncash
0ea99b7d59 constant_propagation_pass: Fold EOR operations
It's possible to fold cases of exclusive OR operations if they can be
known to be an identity operation, or if both operands happen to be known
immediates, in which case we can just store the result of the
exclusive-OR directly.
2020-04-22 20:55:50 +01:00
MerryMage
f0920c0ded Fix VShift terminology
An arithmetic shift is by definition a signed shift, and a logical shift is by definition an unsigned shift.

- Rename VectorLogicalVShiftS* -> VectorArithmeticVShift*
- Rename VectorLogicalVShiftU* -> VectorLogicalVShift*
2020-04-22 20:55:50 +01:00
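The distinction behind the rename can be sketched on scalars. The helper names are assumptions for illustration and unrelated to the IR opcodes; both functions work on the raw 32-bit pattern so that the behavior does not depend on how the compiler treats right shifts of negative signed values. Shift amounts must be below 32.

```cpp
#include <cstdint>

// Logical shift right: vacated high bits are filled with zeros.
constexpr std::uint32_t LogicalShiftRight(std::uint32_t v, unsigned n) {
    return v >> n;
}

// Arithmetic shift right: vacated high bits are filled with copies of the
// sign bit, which preserves the sign of a two's complement value.
constexpr std::uint32_t ArithmeticShiftRight(std::uint32_t v, unsigned n) {
    const std::uint32_t shifted = v >> n;
    const std::uint32_t sign_fill = (v & 0x80000000u) ? ~(0xFFFFFFFFu >> n) : 0u;
    return shifted | sign_fill;
}
```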
MerryMage
b51dae790d emit_x64_vector: AVX512 implementation of EmitVectorLogicalVShiftS16 2020-04-22 20:55:50 +01:00
MerryMage
bd47f2ca8f emit_x64_vector: AVX512 implementation of EmitVectorLogicalVShiftS64 2020-04-22 20:55:50 +01:00
MerryMage
3bf183d7e8 emit_x64_vector: AVX2 implementation of EmitVectorLogicalVShiftS32 2020-04-22 20:55:50 +01:00
MerryMage
94f9d402eb emit_x64_vector: AVX512 implementation of EmitVectorLogicalVShiftU16() 2020-04-22 20:55:50 +01:00
MerryMage
6d9639e3b0 emit_x64_vector: AVX2 implementation of EmitVectorLogicalVShiftU64() 2020-04-22 20:55:50 +01:00
MerryMage
bbc066a266 emit_x64_vector: AVX2 implementation of EmitVectorLogicalVShiftU32() 2020-04-22 20:55:50 +01:00
Lioncash
da2e7fad87 emit_x64_vector: SSSE3 variant of EmitVectorCountLeadingZeros8()
pshufb lyfe
2020-04-22 20:55:50 +01:00
VelocityRa
c30b8dbe99 decoders: Cast to correctly-sized type before shifting
Fixes decoding for 64-bit instructions

Does not help/apply to any currently supported ARM versions (since
all are 32-bit length or below), it's for future-proofing should
such an arch be supported.
2020-04-22 20:55:50 +01:00
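The underlying issue can be sketched as follows: the literal `1` has type `int`, so shifting it by more than the width of `int` is undefined regardless of the type the result is stored in. `BitAt` is a hypothetical helper for this sketch and not the decoder code itself.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// The type of a shift expression follows the promoted left operand, not the
// variable the result is assigned to.
static_assert(std::is_same_v<decltype(1 << 3), int>);

// Hypothetical helper: cast the one to the destination type first, so the
// shift is carried out at the full width of T. `index` must be smaller than
// the bit width of T.
template <typename T>
constexpr T BitAt(std::size_t index) {
    return static_cast<T>(static_cast<T>(1) << index);
}
```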
MerryMage
238f2f2cd0 a64_emit_x64: Lowercase PAGE_SIZE
PAGE_SIZE is defined as a macro by musl.
2020-04-22 20:55:50 +01:00
MerryMage
7162f6f254 emit_x64_vector_floating_point: SSE4.1 implementation of EmitFPVectorToFixed 2020-04-22 20:55:50 +01:00
MerryMage
e7a5592699 emit_x64_vector_floating_point: EmitFPVectorRoundInt: Use FCODE 2020-04-22 20:55:50 +01:00
MerryMage
b8fde48732 emit_x64_vector: AVX implementation for EmitVectorCountLeadingZeros8 2020-04-22 20:55:50 +01:00
MerryMage
fd37b637aa emit_x64_vector: SSE implementation of EmitVectorCountLeadingZeros16 2020-04-22 20:55:50 +01:00
MerryMage
09bf273bc8 A64: Implement SCVTF, UCVTF (vector, fixed-point), scalar variant 2020-04-22 20:55:06 +01:00
MerryMage
03ad2072a7 emit_x64_floating_point: Reduce fallback LUT code in EmitFPToFixed 2020-04-22 20:55:06 +01:00
MerryMage
f9129db6fd A64: Implement FCVTZS, FCVTZU, UCVTF, SCVTF (vector, fixed-point), vector variant 2020-04-22 20:55:06 +01:00
Lioncash
48df9b9a7d A64: Implement UQSHL's vector immediate and register variants 2020-04-22 20:55:06 +01:00
Lioncash
d426dfe942 ir: Add opcodes for unsigned saturating left shifts 2020-04-22 20:55:06 +01:00
Lioncash
ab60720418 A64/translate/impl: Make signatures consistent for unimplemented by-element SIMD variants
Makes them all consistent, so it isn't necessary to change the
prototypes over when implementing them.
2020-04-22 20:55:06 +01:00
Lioncash
6b5ea6ee66 A64: Implement BRK
Currently, we can just implement this as part of the exception
interface, similar to how it's done for the A32 interface with BKPT.
2020-04-22 20:55:06 +01:00
Lioncash
b915364c16 A64/imm: Add full range of comparison operators to Imm template
Makes the comparison interface consistent by providing all of the
relevant members. This also modifies the comparison operators to take
the Imm instance by value, as it's really only a u32 under the covers,
and it's cheaper to shuffle around a u32 than a 64-bit pointer address.
2020-04-22 20:55:06 +01:00
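A reduced sketch of what a full set of by-value comparison operators could look like. This is an illustrative stand-in, not the actual Imm template: the member names, the missing range checks, and the 32-bit limit are assumptions.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical immediate wrapper around a u32.
template <std::size_t bit_size>
class Imm {
    static_assert(bit_size > 0 && bit_size <= 32, "illustrative limit");

public:
    constexpr explicit Imm(std::uint32_t value) : value_(value) {}

    constexpr std::uint32_t ZeroExtend() const { return value_; }

    // Operands are taken by value: the object is only a u32 underneath, so
    // copying it is at least as cheap as passing a reference.
    friend constexpr bool operator==(Imm a, Imm b) { return a.value_ == b.value_; }
    friend constexpr bool operator!=(Imm a, Imm b) { return !(a == b); }
    friend constexpr bool operator<(Imm a, Imm b) { return a.value_ < b.value_; }
    friend constexpr bool operator<=(Imm a, Imm b) { return !(b < a); }
    friend constexpr bool operator>(Imm a, Imm b) { return b < a; }
    friend constexpr bool operator>=(Imm a, Imm b) { return !(a < b); }

private:
    std::uint32_t value_;
};
```

Defining the remaining operators in terms of `==` and `<` keeps the six relations consistent with one another.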
MerryMage
02150bc0b7 IR: Add fbits argument to FPVectorFrom{Signed,Unsigned}Fixed 2020-04-22 20:55:06 +01:00
MerryMage
027b0ef725 A64: Implement SCVTF, UCVTF (scalar, fixed-point) 2020-04-22 20:55:06 +01:00
MerryMage
8051f60db0 opcodes.inc: Align columns to a tabstop of 4 2020-04-22 20:55:06 +01:00
MerryMage
90193b0e3d IR: Add fbits argument to FixedToFP-related opcodes 2020-04-22 20:55:06 +01:00
Lioncash
616a153c16 A64: Implement SQSHL's vector immediate variant 2020-04-22 20:55:06 +01:00
Lioncash
e8b0f25dff A64: Implement SQSHL's vector register variant 2020-04-22 20:55:06 +01:00
Lioncash
b14eaaec46 ir: Add opcodes for left signed saturated shifts 2020-04-22 20:55:06 +01:00
Lioncash
da55ed7b31 branch: Make variables const where applicable 2020-04-22 20:55:06 +01:00
Lioncash
867b666285 move_wide: Make variables const where applicable 2020-04-22 20:55:06 +01:00
Lioncash
78024a9dc4 load_store_register_unprivileged: Make variables const where applicable 2020-04-22 20:55:06 +01:00
Lioncash
e45e5da610 load_store_register_immediate: Place conditional bodies on their own line
Makes the conditionals visually consistent with the rest of the
codebase.
2020-04-22 20:55:06 +01:00
Lioncash
b586cf3f56 load_store_load_literal: Make variables const where applicable 2020-04-22 20:55:06 +01:00
Lioncash
c3a3b9687e data_processing_logical: Move datasize declarations after early-exit conditionals
While we're at it, make variables const where applicable.
2020-04-22 20:55:06 +01:00
Lioncash
ed797e6540 data_processing_conditional_select: Make variables const where applicable
Makes CSEL's function consistent with all of the others.
2020-04-22 20:55:06 +01:00
Lioncash
c82fa5ec5a data_processing_addsub: Move datasize declarations after early-exit conditionals
While we're at it, also make relevant variables const where applicable
2020-04-22 20:55:06 +01:00
Lioncash
f4a66d2477 data_processing_bitfield: Move datasize variables after early-exit conditionals
Moves the declaration of datasize to the scope that it's used within.
This also takes the opportunity to apply const where applicable, and
make early-exits all vertically consistent with one another.
2020-04-22 20:55:06 +01:00
Lioncash
2e0fcd6161 A64: Implement CLS's vector variant
Leverages CLZ like the integral variant does.
2020-04-22 20:55:06 +01:00
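One common way to express CLS through CLZ is sketched below for a single 32-bit element. The exact formulation used in the translator is not shown in this log, so treat this as an assumption: XOR-ing the value with itself shifted left by one marks every position where adjacent bits differ, and the leading zeros of that pattern count the bits that match the sign bit.

```cpp
#include <cstdint>

// Count leading zero bits of a 32-bit value (returns 32 for zero).
constexpr unsigned CountLeadingZeros32(std::uint32_t value) {
    unsigned count = 0;
    for (std::uint32_t mask = 0x80000000u; mask != 0 && (value & mask) == 0; mask >>= 1) {
        ++count;
    }
    return count;
}

// Count the bits below the top bit that are equal to it. Forcing bit 0 on
// caps the result at 31, since the sign bit itself is never counted.
constexpr unsigned CountLeadingSignBits32(std::uint32_t value) {
    return CountLeadingZeros32((value ^ (value << 1)) | 1u);
}
```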
Lioncash
a2cd643525 emit_x64_vector: Make EmitVectorUnsignedSaturatedAccumulateSigned() internally linked
Given this is just an internal helper function, it can be marked static.
2020-04-22 20:55:06 +01:00
Lioncash
c39ea2e3c9 perf_map: Use std::string_view instead of std::string for PerfMapRegister()
We can just use a non-owning view into a string in this case instead of
potentially allocating a std::string instance.
2020-04-22 20:55:06 +01:00
MerryMage
12243692f5 A64: Implement SQRDMULH (vector), vector variant 2020-04-22 20:55:06 +01:00
MerryMage
a9ffcf08b1 A64: Implement SQDMULL (vector), vector variant 2020-04-22 20:55:06 +01:00
MerryMage
3e447614c6 IR: Add VectorSignedSaturatedDoublingMultiplyLong 2020-04-22 20:55:06 +01:00
MerryMage
06b31448aa emit_x64_vector: Changes to VectorSignedSaturatedDoublingMultiply
* Return both the upper and lower parts of the multiply if required
* SSE2 does not support the pmuldq instruction, do sign correction to an unsigned result instead
* Improve port utilisation where possible (punpck instructions were a bottleneck)
2020-04-22 20:55:06 +01:00
MerryMage
08c0e017a5 IR: Implement Vector{Signed,Unsigned}Multiply{16,32} 2020-04-22 20:55:06 +01:00
Lioncash
b6df34cdde backend_x64/a64_interface: Re-enable the constant folding pass
This was disabled for debugging, but never re-enabled. Just to be sure,
testing was done downstream in yuzu to make sure this didn't break
anything (and it doesn't appear to).
2020-04-22 20:55:06 +01:00
MerryMage
06ba397af2 emit_x64_vector_floating_point: Hardware FMA implementation for RSqrtStepFused 2020-04-22 20:55:06 +01:00
MerryMage
e553c4fe8d emit_x64_vector_floating_point: Hardware FMA implementation of FPVectorRecipStepFused 2020-04-22 20:55:06 +01:00
MerryMage
3caeb62ef1 emit_x64_floating_point: Hardware FMA implementation of FPRSqrtStepFused 2020-04-22 20:55:06 +01:00
MerryMage
344ee76aba emit_x64_floating_point: Hardware FMA implementation of FPRecipStepFused{32,64} 2020-04-22 20:55:06 +01:00
MerryMage
1492573267 emit_x64_vector: SSE implementation of VectorSignedSaturatedAccumulateUnsigned{8,16,32} 2020-04-22 20:55:06 +01:00
Lioncash
26df6e5e7b emit_x64_vector: Correct static asserts for < 64-bit type checks in saturated accumulate fallbacks
I had initially meant to use BitSize() here, not sizeof()
2020-04-22 20:55:06 +01:00
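The difference between the two checks can be sketched as follows. The signature of `BitSize` here is an assumption made for this sketch. sizeof reports bytes, so comparing it against 64 is true for every ordinary integer type and the assertion never rejects anything.

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>

// Assumed shape of a BitSize helper: the width of T in bits.
template <typename T>
constexpr std::size_t BitSize() {
    return sizeof(T) * CHAR_BIT;
}

// sizeof counts bytes: a 64-bit type still passes a "< 64" check.
static_assert(sizeof(std::uint64_t) < 64);

// BitSize counts bits: the same check now rejects 64-bit types as intended.
static_assert(BitSize<std::uint32_t>() < 64);
static_assert(!(BitSize<std::uint64_t>() < 64));
```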
MerryMage
a4a26ac226 emit_x64_vector: EmitVectorSignedSaturatedAccumulateUnsigned64: SSE implementation 2020-04-22 20:55:06 +01:00
MerryMage
a7c66d2d28 emit_x64_vector: Simplify fpsr_qc related code
Move the bool conversion into A64JitState::GetFpsr so we don't have to continuously
pay the cost of conversion for every saturation instruction.
2020-04-22 20:55:06 +01:00
Lioncash
112cff9ab9 A64: Implement CLZ's vector variant 2020-04-22 20:55:06 +01:00
Lioncash
e739624296 ir: Add opcodes for vector CLZ operations
We can optimize these cases further with a fair bit of shuffling via
pshufb and the use of masks, but given how uncommon this instruction
is, I don't consider the extra code worth it over a simple, manageable
naive solution like this.

If we ever do hit a case where vectorized CLZ happens to be a
bottleneck, then we can revisit this. At least with AVX-512CD, this can
be done with a single instruction for the 32-bit word case.
2020-04-22 20:55:05 +01:00
MerryMage
d4c37a68a8 A64/translate: VectorZeroUpper for V(64) stores
Ensures correctness.
2020-04-22 20:55:05 +01:00
MerryMage
b8daa4feac simd_two_register_misc: FNEG (vector) with Q == 0 had dirty upper 2020-04-22 20:55:05 +01:00
Lioncash
5653e7637e emit_x64_vector: Remove unnecessary [[maybe_unused]] attributes
These were unintentionally left in when introducing SUQADD and USQADD
2020-04-22 20:55:05 +01:00