Lioncash
fe84ecb780
A64: Handle half-precision floating point in scalar FABS
...
Now that we have the half-precision variant of the opcode added, we can
simply handle the instruction instead of treating it as undefined.
2020-04-22 20:58:12 +01:00
Lioncash
fac9224d5e
A64: Handle half-precision floating point in FCVTN
...
Now that we have IR instructions for performing conversions with
half-precision floating point, we can also handle half-precision values
within FCVTN.
2020-04-22 20:58:12 +01:00
Lioncash
8309ec7a9f
frontend/ir_emitter: Add half-precision variant of FPAbs
2020-04-22 20:58:12 +01:00
Lioncash
16de99d3e3
A64: Enable FCVT floating-point conversions for half-precision
...
With this, we no longer have to fall back to the interpreter in any of
the FCVT floating-point conversion instructions.
2020-04-22 20:58:12 +01:00
Lioncash
10abc77fad
A64: Handle half-precision floating point in scalar FNEG
...
With the half-precision variant of the FPNeg opcode added, we can
utilize it here to emulate the half-precision variant of FNEG.
2020-04-22 20:58:12 +01:00
Lioncash
e4c259d69f
frontend/ir_emitter: Add half->{single, double} and {double, single}->half conversion opcodes
2020-04-22 20:58:12 +01:00
Lioncash
c97efcb978
frontend/ir_emitter: Add half-precision variant of FPNeg
2020-04-22 20:58:12 +01:00
Lioncash
dff5da1063
common/fp/unpacked: Amend behavior of FPUnpackCV
...
FPUnpackCV is supposed to call FPUnpackBase instead of FPUnpack. Calling
FPUnpack resulted in alternate half-precision representations being
misinterpreted when dealing with NaNs.
2020-04-22 20:58:12 +01:00
Merry
f01afc5ae6
Merge pull request #456 from lioncash/mov
...
A64: Enable FMOV (general) for half-precision floating point
2020-04-22 20:58:12 +01:00
Lioncash
03bc2334fe
common/fp/op/FPConvert: Amend off-by-one in double NaN case in FPConvertNaN
...
Avoids potentially clobbering the intended sign bit value during
conversions to double-precision values. The other conversion types are
already properly handled, so those don't need to be addressed.
2020-04-22 20:58:12 +01:00
Lioncash
c57b146fb2
common/fp/op/FPConvert: Add half-precision instantiations to FPConvert
2020-04-22 20:58:12 +01:00
Merry
c1ce94872d
Merge pull request #455 from lioncash/sqrdmulh-scalar
...
A64: Implement SQRDMULH and SQDMULL's scalar indexed variants
2020-04-22 20:58:11 +01:00
Lioncash
25a7256ee1
A64: Enable FMOV (general) for half-precision floating point
...
This just transfers values between vector registers and general-purpose
registers with no conversions performed, so adding half-precision support
is trivial.
2020-04-22 20:58:11 +01:00
Lioncash
97dd3d0596
A64: Implement SQRDMULH's scalar indexed element variant
2020-04-22 20:58:11 +01:00
Lioncash
49b51e34f1
simd_vector_x_indexed_element: Deduplicate index and Vm operand construction
2020-04-22 20:58:11 +01:00
Lioncash
692aba91b6
A64: Implement SQDMULL{2}'s scalar indexed element variant
2020-04-22 20:58:11 +01:00
Lioncash
c043b831d5
A64: Implement SQDMULL{2}'s by-element variant
2020-04-22 20:58:11 +01:00
Lioncash
72af5a3dff
simd_scalar_x_indexed_element: Factor out index and Vm argument construction
...
This will be useful in the implementations of SQRDMULH and SQDMULL{2} as
well.
2020-04-22 20:58:11 +01:00
Lioncash
224ff0afaa
A64: Implement SQRDMULH's by-index vector variant
2020-04-22 20:58:11 +01:00
Lioncash
3a3542414b
A64: Implement FRECPX's half-precision floating point variant
2020-04-22 20:58:11 +01:00
Lioncash
bd892ec4ef
frontend/ir/ir_emitter: Amend FPRecipExponent to handle half-precision floating point
2020-04-22 20:58:11 +01:00
Lioncash
974fbf0677
frontend/ir/value: Add U16U32U64 type to represent floating point types
2020-04-22 20:58:11 +01:00
Lioncash
eb3e0d5908
common/fp/op/FPRecipExponent: Add half-precision floating point specialization
2020-04-22 20:58:11 +01:00
Lioncash
a829c93406
common/fp/unpacked: Correct edge-cases within FPUnpack for half-precision floating point
...
This corrects one case where floating-point exceptions could be set when
they're not supposed to be.
This also corrects a case where values were being treated as NaNs when
they weren't supposed to be.
2020-04-22 20:58:11 +01:00
Lioncash
7030b9af95
common/fp/process_nan: Add half-precision instantiations for NaN processing functions
2020-04-22 20:58:11 +01:00
Lioncash
14f55d7476
common/fp/unpacked: Add half-precision instantiation of FPRoundBase
2020-04-22 20:58:11 +01:00
Lioncash
7e814de445
common/fp/unpacked: Handle half-precision unpacking in FPUnpackBase
2020-04-22 20:58:11 +01:00
Lioncash
8f9fe8690a
common/fp/unpacked: Adjust FPUnpack to operate like ARM pseudocode
...
This function is defined as always disabling the AHP bit in the fpcr
before performing any operations.
At the same time, rename the original FPUnpack function to FPUnpackBase
to match the pseudocode in the ARM reference manual.
2020-04-22 20:58:11 +01:00
Merry
37c4c39d62
Merge pull request #448 from lioncash/saturate
...
A64: Implement SQSHRN, SQSHRUN, and UQSHRN's scalar variants
2020-04-22 20:58:11 +01:00
Merry
f5d774bdbd
Merge pull request #449 from lioncash/hp
...
common/fp/info: Add specialization of FPInfo for half-precision floating point
2020-04-22 20:58:11 +01:00
Lioncash
126c29a9e9
A64: Implement SQSHRN, SQSHRUN, and UQSHRN's scalar variants
...
These can just be implemented in terms of the vector variants for the
time being.
2020-04-22 20:58:11 +01:00
Lioncash
0b67b94b6c
common/fp/info: Add specialization of FPInfo for half-precision floating point
...
Puts the necessary info struct in place for further use.
2020-04-22 20:58:11 +01:00
Lioncash
dd7433f9d3
A64: Amend prototypes of some SIMD scalar shift by immediate opcodes
...
These take a vector for a destination.
2020-04-22 20:58:11 +01:00
Lioncash
99c494bae9
common/fp/unpacked: Add FPRoundCV
...
Corresponds to the equivalent pseudocode within the ARMv8 reference
manual. This will be necessary for supporting half-precision
floating-point.
This also makes use of it within FPConvert.
2020-04-22 20:58:11 +01:00
Merry
bbd5330ad2
Merge pull request #447 from lioncash/flag
...
A64: Implement CFINV, RMIF, AXFlag and XAFlag
2020-04-22 20:58:11 +01:00
Lioncash
490bebbd9a
common/fp/unpacked: Add FPUnpackCV
...
Adds a template function that performs the same behavior as in the ARM
pseudocode, and utilizes it in FPConvert, which will be necessary for
half-float support.
2020-04-22 20:58:11 +01:00
Merry
fb039e232c
Merge pull request #442 from lioncash/fcvtxn
...
A64: Implement scalar and vector variants of FCVTXN
2020-04-22 20:58:11 +01:00
Lioncash
6aed4036ef
ir_opt/a64_get_set_elimination_pass: Add handling for NZCV raw get and set operations
2020-04-22 20:58:11 +01:00
Merry
4f937c1ee1
Merge pull request #446 from lioncash/sqshl
...
A64: Implement scalar variants of SQSHL (register) and UQSHL (register)
2020-04-22 20:58:11 +01:00
Lioncash
aa22db534b
A64: Implement AXFlag and XAFlag
2020-04-22 20:58:11 +01:00
Merry
d74cccbc84
Merge pull request #445 from lioncash/sqrt
...
A64: Implement single and double-precision vector variant of FSQRT
2020-04-22 20:58:11 +01:00
Lioncash
20ffe568d0
A64: Implement RMIF
2020-04-22 20:58:11 +01:00
Merry
6d7e7c3269
Merge pull request #443 from lioncash/flag
...
A64: Rearrange flag format/manipulation instructions
2020-04-22 20:58:11 +01:00
Lioncash
51b526e453
A64: Implement CFINV
2020-04-22 20:58:11 +01:00
Merry
5d01f1b462
Merge pull request #441 from lioncash/constexpr
...
common/bit_util: Mark a few functions as constexpr
2020-04-22 20:58:11 +01:00
Lioncash
597a8be5d5
ir: Add A64-specific opcodes for getting and setting raw NZCV values
...
This will be necessary to implement the flag manipulation and flag
format instructions.
2020-04-22 20:58:11 +01:00
Merry
743c52fdc5
Merge pull request #440 from lioncash/include
...
common/fp: Remove unnecessary includes
2020-04-22 20:58:11 +01:00
Lioncash
d3515279df
A64: Implement the vector version of FCVTXN
2020-04-22 20:58:10 +01:00
Lioncash
17aea0b997
A64: Implement UQSHL (register)'s scalar variant
...
This can be implemented in terms of the vector variant.
2020-04-22 20:58:10 +01:00
Lioncash
c99d4b762e
A64: Implement single and double-precision vector variant of FSQRT
2020-04-22 20:58:10 +01:00
Lioncash
54e0b487f3
A64: Rearrange flag format/manipulation instructions
...
Gives these instructions better categorical labeling.
2020-04-22 20:58:10 +01:00
Lioncash
88d1977cb9
common/bit_util: Mark a few functions as constexpr
...
These four functions can be made constexpr with no issue.
2020-04-22 20:58:10 +01:00
Lioncash
f33e5939b7
common/fp: Remove unnecessary includes
2020-04-22 20:58:10 +01:00
Lioncash
302f56b36a
A64: Fall back to interpreting for FCADD and FCMLA half-precision variants
...
Rather than straight-up treating them as undefined, we can fall back to an
interpreter in this case.
2020-04-22 20:58:10 +01:00
Lioncash
4339a8fff6
A64: Implement the scalar version of FCVTXN
2020-04-22 20:58:10 +01:00
Lioncash
35ddf68ad5
A64: Implement SQSHL (register)'s scalar variant
...
We can implement this in terms of the vector variant.
2020-04-22 20:58:10 +01:00
Lioncash
5cf1478620
frontend/ir: Add opcodes for vector square roots
2020-04-22 20:58:10 +01:00
Lioncash
36027ebef5
frontend/ir/microinstruction: Add missing cases for FPRecipExponent{32,64} for ReadsFromAndWritesToFPSRCumulativeExceptionBits()
...
This was intended to be added within #437, but was missed.
2020-04-22 20:58:10 +01:00
Merry
40b081438a
Merge pull request #439 from lioncash/fcmla
...
A64: Implement FCADD and FCMLA
2020-04-22 20:58:10 +01:00
Lioncash
7c81a58ed3
frontend/ir/ir_emitter: Alter parameters of FPDoubleToSingle() and FPSingleToDouble() to pass along desired rounding mode
...
This will be necessary to special-case the non-IEEE round-to-odd
(Von Neumann) rounding mode.
2020-04-22 20:58:10 +01:00
Merry
d91192681a
Merge pull request #438 from lioncash/fmulx
...
A64: Implement scalar double/single precision FMULX (by element)
2020-04-22 20:58:10 +01:00
Lioncash
ed29ef8cca
A64: Implement FCMLA
2020-04-22 20:58:10 +01:00
Lioncash
95af9dafbe
common/fp/op: Add FP conversion functions
2020-04-22 20:58:10 +01:00
Merry
9f11720a69
Merge pull request #437 from lioncash/frecpx
...
A64: Implement FRECPX (single, double precision)
2020-04-22 20:58:10 +01:00
Lioncash
bdcea0b0dc
A64: Implement scalar double/single precision FMULX (by element)
2020-04-22 20:58:10 +01:00
Lioncash
5ce17574f9
A64: Implement FCADD
2020-04-22 20:58:10 +01:00
Merry
34d917f34e
Merge pull request #436 from lioncash/no-alloc
...
A64: Implement LDNP/STNP
2020-04-22 20:58:10 +01:00
Lioncash
e44730ba6d
A64: Implement FRECPX (single, double precision)
2020-04-22 20:58:10 +01:00
Lioncash
bfaeb08d3c
A64: Implement LDNP/STNP
...
LDNP and STNP indicate that a memory access is non-temporal/streaming
(i.e. unlikely to be repeated), allowing data caching to not be
performed. However, given this is only a hint, we can treat these two
instructions as regular LDP and STP instructions for the time being.
2020-04-22 20:58:10 +01:00
Lioncash
9cf3c25811
frontend/ir/ir_emitter: Add opcodes for floating point reciprocal exponents
2020-04-22 20:58:10 +01:00
Merry
dbf47db713
Merge pull request #434 from lioncash/format
...
A32/translate_arm: Formatting/tidying up
2020-04-22 20:58:10 +01:00
Lioncash
b168c2a9f9
common/fp/op: Add operations for floating-point reciprocal exponents
2020-04-22 20:58:10 +01:00
Lioncash
05a6ab691d
translate_arm/coprocessor: Minor tidying up
2020-04-22 20:58:10 +01:00
Lioncash
1e32a09c03
translate_arm/vfp2: Invert conditionals where applicable
2020-04-22 20:58:10 +01:00
Lioncash
e209b31073
translate_arm/synchronization: Invert conditionals where applicable
2020-04-22 20:58:10 +01:00
Lioncash
9514e3602e
translate_arm/status_register_access: Invert conditionals where applicable
2020-04-22 20:58:10 +01:00
Lioncash
c6aa1a708a
translate_arm/saturated: Invert conditionals where applicable
2020-04-22 20:58:10 +01:00
Lioncash
a72813599a
translate_arm/reversal: Invert conditionals where applicable
2020-04-22 20:58:10 +01:00
Lioncash
7be56e6b67
translate_arm/parallel: Invert conditionals where applicable
2020-04-22 20:58:10 +01:00
Lioncash
3c00a616d6
translate_arm/packing: Invert conditionals where applicable
2020-04-22 20:58:10 +01:00
Lioncash
c711188f46
translate_arm/multiply: Invert conditionals where applicable
2020-04-22 20:58:10 +01:00
Lioncash
c8dad40d81
translate_arm/misc: Invert conditionals where applicable
2020-04-22 20:58:10 +01:00
Lioncash
a7bf5ff77d
translate_arm/load_store: Invert conditionals where applicable
2020-04-22 20:58:10 +01:00
Lioncash
2e180a7f14
backend/x64/a32_interface: Mark Context move constructor and move assignment as noexcept
...
Provides a more "correct" move constructor/assignment operator, since
these relevant functions shouldn't throw exceptions.
Has the benefit of playing nicely with std::move_if_noexcept and other
noexcept library facilities.
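As a minimal sketch of why this matters (using a hypothetical stand-in type, not the real Context class): for a copyable type, std::move_if_noexcept only yields an rvalue when the move constructor is noexcept.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical stand-in for a copyable type with user-declared move operations.
struct ExampleContext {
    ExampleContext() = default;
    ExampleContext(const ExampleContext&) = default;
    ExampleContext& operator=(const ExampleContext&) = default;
    ExampleContext(ExampleContext&&) noexcept = default;
    ExampleContext& operator=(ExampleContext&&) noexcept = default;

    std::vector<int> regs;
};

// Because the move constructor is noexcept, move_if_noexcept selects a move
// (an rvalue reference) rather than falling back to a copy.
static_assert(std::is_nothrow_move_constructible_v<ExampleContext>);
static_assert(std::is_same_v<decltype(std::move_if_noexcept(std::declval<ExampleContext&>())),
                             ExampleContext&&>);
```

Containers such as std::vector rely on the same check when relocating elements during reallocation.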
2020-04-22 20:58:09 +01:00
Lioncash
f4b19a7393
translate_arm/extension: Invert conditionals where applicable
2020-04-22 20:58:09 +01:00
Lioncash
deb9dd4acc
block_of_code: Replace cast with [[maybe_unused]] in DoesCpuSupport()
2020-04-22 20:58:09 +01:00
Lioncash
c2de6ecfd0
translate_arm/exception_generating: Invert conditionals where applicable
2020-04-22 20:58:09 +01:00
Lioncash
d8a8d3b073
translate_arm/data_processing: Invert conditionals where applicable
2020-04-22 20:58:09 +01:00
Lioncash
df5c51ff47
translate_arm/branch: Invert conditionals where applicable
...
Allows unindenting code a bit.
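A rough before/after sketch of the pattern (hypothetical function names; the counter merely stands in for emitting IR):

```cpp
#include <cassert>

// Hypothetical stand-in for the translator's condition check.
inline bool ExampleConditionPassed(bool cond) {
    return cond;
}

// Before: the body is nested inside the conditional.
inline bool TranslateNested(bool cond, int& emitted) {
    if (ExampleConditionPassed(cond)) {
        emitted += 1;
    }
    return true;
}

// After: the conditional is inverted into an early exit,
// so the body sits one indentation level shallower.
inline bool TranslateEarlyExit(bool cond, int& emitted) {
    if (!ExampleConditionPassed(cond)) {
        return true;
    }

    emitted += 1;
    return true;
}
```

Both forms behave identically; only the layout changes.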
2020-04-22 20:58:09 +01:00
Lioncash
3290a9fdc2
common: Remove address_range.h
...
The AddressRange structure isn't used anywhere within the codebase, so
this can be removed. Particularly because there's no real appeal/heavy
potential use of it in the future that isn't trivial to add back if
needed.
2020-04-22 20:57:38 +01:00
Lioncash
ee973f13c7
frontend/A32/ir_emitter: Mark PC() and AlignPC() as const-qualified member functions
...
These don't modify instance state, so they can be const-qualified member
functions.
2020-04-22 20:57:38 +01:00
Lioncash
3a2dd09122
frontend/A64/ir_emitter: Mark PC() and AlignPC() as const qualified member functions
...
These don't actually alter any instance state.
2020-04-22 20:57:38 +01:00
Lioncash
575ae852a9
constant_propagation_pass: Fold byte reversal opcodes where applicable
...
These are reasonably trivial to fold away when applicable. We just
perform the swap and replace the instruction with the constant value.
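A minimal sketch of the idea, assuming a known constant is modelled as a std::optional (the names here are hypothetical, not the pass's actual helpers):

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Reverses the byte order of a 32-bit value.
constexpr std::uint32_t ExampleSwap32(std::uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) | ((v << 8) & 0x00FF0000u) | (v << 24);
}

// Hypothetical fold: if the operand is a known constant, return the swapped
// constant that would replace the instruction; otherwise return nullopt,
// meaning the instruction has to stay.
inline std::optional<std::uint32_t> FoldByteReverseWord(std::optional<std::uint32_t> operand) {
    if (!operand) {
        return std::nullopt;
    }
    return ExampleSwap32(*operand);
}
```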
2020-04-22 20:57:37 +01:00
Merry
2c53f354ab
Merge pull request #418 from lioncash/fold-op
...
constant_propagation_pass: Handle folding for Least/MostSignificant{Bit, Byte, Half, Word} opcodes
2020-04-22 20:57:37 +01:00
Merry
ad14a33672
Merge pull request #417 from lioncash/swap
...
common: Move byte swapping functions to bit_utils.h
2020-04-22 20:57:37 +01:00
Lioncash
d302d9bd0c
constant_propagation_pass: Handle folding for Least/MostSignificant{Bit, Byte, Half, Word} opcodes
...
These are quite trivial to fold.
2020-04-22 20:57:37 +01:00
Lioncash
7139942976
common: Move byte swapping functions to bit_utils.h
...
These are quite general functions, so they can just be moved into common
instead of recreating a namespace here.
2020-04-22 20:57:37 +01:00
MerryMage
7c8fcaef26
emit_x64_vector_floating_point: AVX && DN implementation of EmitFPVectorMulX
2020-04-22 20:57:37 +01:00
MerryMage
e3898e628e
A64: Implement FMULX (by element), single and double precision variants
2020-04-22 20:57:37 +01:00
Lioncash
93351c7efb
a64_emit_x64: Make constness of loop elements explicit within GenFastmemFallbacks()
2020-04-22 20:57:37 +01:00
MerryMage
c106d8cedf
A64: Implement FMULX, vector single-precision and double-precision variant
2020-04-22 20:57:37 +01:00
Lioncash
7752ffc50c
a64_emit_x64: Convert std::vector instances in GenFastmemFallbacks() to std::array
...
Given these are quite small, we can avoid the need to heap allocate
here.
2020-04-22 20:57:37 +01:00
MerryMage
fa8925c4df
IR: Implement FPVectorMulX
2020-04-22 20:57:37 +01:00
Michał Janiszewski
bbd8abaa25
Provide justification for always-true condition (#412)
2020-04-22 20:57:37 +01:00
Michał Janiszewski
7d0e918b51
Add missing include guards
2020-04-22 20:57:37 +01:00
V.Kalyuzhny
764a93bf5a
Switch boost::optional to std::optional
2020-04-22 20:57:37 +01:00
Lioncash
07c197e8d0
constant_propagation_pass: Add 64-bit variants of shifts to the pass
...
These optimizations can also apply to the 64-bit variants of the shift
opcodes; we just need to check if the instruction has an associated
pseudo-op before performing the 32-bit variant's specifics.
While we're at it, we can also relocate the code to its own function
like the rest of the cases to keep organization consistent.
2020-04-22 20:57:37 +01:00
Lioncash
8248999c5d
constant_propagation_pass: Fold division operations where applicable
...
We can fold division operations if:
1. The divisor is zero, in which case we can replace the result with zero
(as this is how ARM platforms expect it).
2. Both values are known, in which case we can just do the operation and
store the result.
3. The divisor is 1, in which case we just return the other operand.
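The three rules can be sketched for the unsigned case roughly as follows. This is an illustration only: the std::optional model of "known" values and every name in it are assumptions, not the pass's real code.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical model: nullopt means the operand is not a known constant.
using Known = std::optional<std::uint64_t>;

enum class DivFoldKind { None, Constant, UseDividend };

struct DivFold {
    DivFoldKind kind;
    std::uint64_t constant;  // only meaningful when kind == Constant
};

inline DivFold FoldUnsignedDiv(Known dividend, Known divisor) {
    // Rule 1: division by zero yields zero on ARM.
    if (divisor && *divisor == 0) {
        return {DivFoldKind::Constant, 0};
    }
    // Rule 2: both operands known, so compute the result directly.
    if (dividend && divisor) {
        return {DivFoldKind::Constant, *dividend / *divisor};
    }
    // Rule 3: dividing by one returns the dividend unchanged.
    if (divisor && *divisor == 1) {
        return {DivFoldKind::UseDividend, 0};
    }
    return {DivFoldKind::None, 0};
}
```

Checking the zero divisor first means rule 2 never performs a host division by zero.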
2020-04-22 20:57:37 +01:00
Merry
d83eae2004
Merge pull request #406 from lioncash/mul
...
constant_propagation_pass: Fold Mul32 and Mul64 cases where applicable
2020-04-22 20:57:37 +01:00
Merry
73d9393300
Merge pull request #405 from lioncash/inst
...
a64: Add ARMv8.4+ instructions encodings to the encoding table
2020-04-22 20:57:37 +01:00
Lioncash
7ad6981437
constant_propagation_pass: deduplicate common 32/64 bit checking for results in folding functions
...
It's common for a folding operation to apply to both the 32-bit and
64-bit variant of the same opcode, which leads to checking which kind of
result we need to store the value as. This moves it to its own function,
so that we don't need to duplicate it in various functions.
2020-04-22 20:57:37 +01:00
Lioncash
f1a66c37ba
a64: Add ARMv8.4+ instructions encodings to the encoding table
...
Keeps the table up to date with the ARM specification.
2020-04-22 20:57:37 +01:00
Lioncash
72daf37208
constant_propagation_pass: Fold Mul32 and Mul64 cases where applicable
...
Multiplication operations can currently be folded if:
1. Both arguments are known constant values
2. Either operand is zero (in which case the result is also zero)
3. Either operand is one (in which case the result is the non-one
operand).
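As a hedged sketch of these cases (hypothetical names; the 32-bit truncation is an assumption about how a 32-bit result would be stored):

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical model: nullopt means the operand is not a known constant.
using Known = std::optional<std::uint64_t>;

enum class MulFoldKind { None, Constant, UseLhs, UseRhs };

struct MulFold {
    MulFoldKind kind;
    std::uint64_t constant;  // only meaningful when kind == Constant
};

inline MulFold FoldMul(Known lhs, Known rhs, bool is_32_bit) {
    // Keep only the low 32 bits when folding the 32-bit variant.
    const auto truncate = [is_32_bit](std::uint64_t v) {
        return is_32_bit ? static_cast<std::uint64_t>(static_cast<std::uint32_t>(v)) : v;
    };

    if (lhs && rhs) {
        return {MulFoldKind::Constant, truncate(*lhs * *rhs)};
    }
    if ((lhs && *lhs == 0) || (rhs && *rhs == 0)) {
        return {MulFoldKind::Constant, 0};
    }
    if (lhs && *lhs == 1) {
        return {MulFoldKind::UseRhs, 0};
    }
    if (rhs && *rhs == 1) {
        return {MulFoldKind::UseLhs, 0};
    }
    return {MulFoldKind::None, 0};
}
```

Unlike division, multiplication is commutative, so the zero and one cases are checked on both sides.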
2020-04-22 20:57:37 +01:00
Lioncash
43b2eb4688
constant_propagation_pass: Fold SignExtend{Type}ToLong opcodes if possible
2020-04-22 20:57:37 +01:00
Lioncash
2da2cf9058
constant_propagation_pass: Fold SignExtend{Type}ToWord opcodes if possible
2020-04-22 20:57:37 +01:00
Lioncash
0583d401e3
ir/value: Add IsSignedImmediate() and IsUnsignedImmediate() functions to Value's interface
...
This allows testing against arbitrary values while also simultaneously
eliminating the need to check IsImmediate() all the time in expressions.
2020-04-22 20:57:37 +01:00
Lioncash
c42f6ea184
constant_propagation_pass: Fold ZeroExtend{Type}ToLong opcodes if possible
...
These are equivalent to the ZeroExtendXToWord variants, so we can
trivially do this as well.
2020-04-22 20:57:37 +01:00
Lioncash
e3258e8525
ir/value: Add a GetImmediateAsS64() function
...
Provides a signed analogue to GetImmediateAsU64() for consistency with
both integral classes when it comes to signed/unsigned.
2020-04-22 20:57:37 +01:00
Lioncash
2274214ff0
constant_propagation_pass: Combine zero-extension folding code into its own function
...
Separates the behavior from the actual switch statement and gets rid of
duplication, now that we can use the general GetImmediateAsU64()
function.
2020-04-22 20:57:37 +01:00
Lioncash
4a3c064b15
ir/value: Add an IsZero() member function to Value's interface
...
By far, one of the most common things to check for is whether or not a
value is zero, as it typically allows folding away unnecessary
operations (other close contenders that can help with eliding operations are 1 and -1).
So instead of requiring a check for an immediate and then actually
retrieving the integral value and checking it, we can wrap it within a
function to make it more convenient.
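In heavily simplified form, the convenience looks something like this (a hypothetical stand-in struct, not the real Value class):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, stripped-down stand-in for an IR value.
struct ExampleValue {
    bool is_immediate;
    std::uint64_t immediate;  // only meaningful when is_immediate is true

    bool IsImmediate() const {
        return is_immediate;
    }

    std::uint64_t GetImmediateAsU64() const {
        return immediate;
    }

    // Wraps the two-step "is it an immediate, and is that immediate zero?" check.
    bool IsZero() const {
        return IsImmediate() && GetImmediateAsU64() == 0;
    }
};
```

Callers can then write `if (value.IsZero())` instead of spelling out both checks each time.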
2020-04-22 20:57:37 +01:00
Merry
c649f11c0a
Merge pull request #401 from lioncash/folding
...
constant_propagation_pass: Fold &, |, ^, and ~ operations where applicable
2020-04-22 20:56:01 +01:00
MerryMage
2524d536b0
A32/ir_emitter: Bugfix: ExceptionRaised was producing incorrect PC
...
Use actual PC and not pipelined PC.
2020-04-22 20:56:01 +01:00
Lioncash
c09f4cf28e
constant_propagation_pass: Fold NOT operations
2020-04-22 20:55:50 +01:00
Lioncash
d69fceec55
value: Move ImmediateToU64() to be a part of Value's interface
...
This'll make it slightly nicer to do basic constant folding for 32-bit
and 64-bit variants of the same IR opcode type. By that, I mean it's
possible to inspect immediate values without a bunch of conditional
checks beforehand to verify that it's possible to call GetU32() or
GetU64(), etc.
2020-04-22 20:55:50 +01:00
Lioncash
8013548bbb
constant_propagation_pass: Fold OR operations
2020-04-22 20:55:50 +01:00
MerryMage
ca603c1215
reg_alloc: Emit AVX instructions where able
...
Smaller codesize.
2020-04-22 20:55:50 +01:00
Lioncash
898d096e39
constant_propagation_pass: Fold AND operations
2020-04-22 20:55:50 +01:00
MerryMage
e2358af5ef
abi: Emit AVX instructions where able
...
Smaller codesize.
2020-04-22 20:55:50 +01:00
Lioncash
f40fcda1f6
ir/value: Add member function to check whether or not all bits of a contained value are set
...
This is useful when we wish to know if a contained value is something
like 0xFFFFFFFF, as this helps perform constant folding. For example the
operation: x & 0xFFFFFFFF can be folded to just x in the 32-bit case.
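A small sketch of the check and the identity it enables (hypothetical free functions; the real member function lives on Value):

```cpp
#include <cassert>
#include <cstdint>

// Returns true if every bit of the immediate is set at the given width.
constexpr bool ExampleHasAllBitsSet(std::uint64_t imm, bool is_32_bit) {
    return is_32_bit ? static_cast<std::uint32_t>(imm) == 0xFFFFFFFFu
                     : imm == 0xFFFFFFFFFFFFFFFFull;
}

// `x & imm` can be replaced by plain `x` when the mask keeps every bit.
constexpr bool AndIsIdentity(std::uint64_t imm, bool is_32_bit) {
    return ExampleHasAllBitsSet(imm, is_32_bit);
}
```

Note that 0xFFFFFFFF is only an identity mask for the 32-bit variant; for the 64-bit variant it would clear the upper half.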
2020-04-22 20:55:50 +01:00
MerryMage
7c0378f56d
a64_exclusive_monitor: Loosen memory ordering requirements
...
It is not necessary to be as strict as it was.
2020-04-22 20:55:50 +01:00
Lioncash
0ea99b7d59
constant_propagation_pass: Fold EOR operations
...
It's possible to fold cases of exclusive OR operations if they can be
known to be an identity operation, or if both operands happen to be known
immediates, in which case we can just store the result of the
exclusive-OR directly.
2020-04-22 20:55:50 +01:00
MerryMage
f0920c0ded
Fix VShift terminology
...
An arithmetic shift is by definition a signed shift, and a logical shift is by definition an unsigned shift.
- Rename VectorLogicalVShiftS* -> VectorArithmeticVShift*
- Rename VectorLogicalVShiftU* -> VectorLogicalVShift*
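To illustrate the distinction behind the rename, here is a scalar sketch with hypothetical helper names (not the vector opcodes themselves). Both helpers operate on raw bit patterns and assume a shift amount below 32.

```cpp
#include <cassert>
#include <cstdint>

// Logical (unsigned) right shift: vacated upper bits are filled with zeros.
constexpr std::uint32_t ExampleLogicalShiftRight(std::uint32_t bits, unsigned amount) {
    return bits >> amount;
}

// Arithmetic (signed) right shift: vacated upper bits are filled with copies
// of the sign bit. Implemented on the unsigned bit pattern for portability.
constexpr std::uint32_t ExampleArithmeticShiftRight(std::uint32_t bits, unsigned amount) {
    return (bits & 0x80000000u) ? ~(~bits >> amount) : bits >> amount;
}
```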
2020-04-22 20:55:50 +01:00
MerryMage
b51dae790d
emit_x64_vector: AVX512 implementation of EmitVectorLogicalVShiftS16
2020-04-22 20:55:50 +01:00
MerryMage
bd47f2ca8f
emit_x64_vector: AVX512 implementation of EmitVectorLogicalVShiftS64
2020-04-22 20:55:50 +01:00
MerryMage
3bf183d7e8
emit_x64_vector: AVX2 implementation of EmitVectorLogicalVShiftS32
2020-04-22 20:55:50 +01:00
MerryMage
94f9d402eb
emit_x64_vector: AVX512 implementation of EmitVectorLogicalVShiftU16()
2020-04-22 20:55:50 +01:00
MerryMage
6d9639e3b0
emit_x64_vector: AVX2 implementation of EmitVectorLogicalVShiftU64()
2020-04-22 20:55:50 +01:00
MerryMage
bbc066a266
emit_x64_vector: AVX2 implementation of EmitVectorLogicalVShiftU32()
2020-04-22 20:55:50 +01:00
Lioncash
da2e7fad87
emit_x64_vector: SSSE3 variant of EmitVectorCountLeadingZeros8()
...
pshufb lyfe
2020-04-22 20:55:50 +01:00
VelocityRa
c30b8dbe99
decoders: Cast to correctly-sized type before shifting
...
Fixes decoding for 64-bit instructions
Does not help/apply to any currently supported ARM versions (since
all are 32-bit length or below), it's for future-proofing should
such an arch be supported.
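The underlying pitfall can be sketched as follows (a hypothetical helper, not the decoder's code): a literal such as `1` has type int, so `1 << 40` shifts past the width of int, while casting first makes the shift happen at the destination width.

```cpp
#include <cassert>
#include <cstdint>

// Builds a value with only the given bit set. The literal is converted to T
// before shifting, so the shift is performed at T's width rather than int's.
template <typename T>
constexpr T ExampleBitAt(unsigned position) {
    return static_cast<T>(T{1} << position);
}
```

For instance, `ExampleBitAt<std::uint64_t>(40)` is well-defined, whereas the plain int expression is not.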
2020-04-22 20:55:50 +01:00
MerryMage
238f2f2cd0
a64_emit_x64: Lowercase PAGE_SIZE
...
PAGE_SIZE is defined as a macro by musl.
2020-04-22 20:55:50 +01:00
MerryMage
7162f6f254
emit_x64_vector_floating_point: SSE4.1 implementation of EmitFPVectorToFixed
2020-04-22 20:55:50 +01:00
MerryMage
e7a5592699
emit_x64_vector_floating_point: EmitFPVectorRoundInt: Use FCODE
2020-04-22 20:55:50 +01:00
MerryMage
b8fde48732
emit_x64_vector: AVX implementation for EmitVectorCountLeadingZeros8
2020-04-22 20:55:50 +01:00
MerryMage
fd37b637aa
emit_x64_vector: SSE implementation of EmitVectorCountLeadingZeros16
2020-04-22 20:55:50 +01:00
MerryMage
09bf273bc8
A64: Implement SCVTF, UCVTF (vector, fixed-point), scalar variant
2020-04-22 20:55:06 +01:00
MerryMage
03ad2072a7
emit_x64_floating_point: Reduce fallback LUT code in EmitFPToFixed
2020-04-22 20:55:06 +01:00
MerryMage
f9129db6fd
A64: Implement FCVTZS, FCVTZU, UCVTF, SCVTF (vector, fixed-point), vector variant
2020-04-22 20:55:06 +01:00
Lioncash
48df9b9a7d
A64: Implement UQSHL's vector immediate and register variants
2020-04-22 20:55:06 +01:00
Lioncash
d426dfe942
ir: Add opcodes for unsigned saturating left shifts
2020-04-22 20:55:06 +01:00
Lioncash
ab60720418
A64/translate/impl: Make signatures consistent for unimplemented by-element SIMD variants
...
Makes them all consistent, so it isn't necessary to change the
prototypes over when implementing them.
2020-04-22 20:55:06 +01:00
Lioncash
6b5ea6ee66
A64: Implement BRK
...
Currently, we can just implement this as part of the exception
interface, similar to how it's done for the A32 interface with BKPT.
2020-04-22 20:55:06 +01:00
Lioncash
b915364c16
A64/imm: Add full range of comparison operators to Imm template
...
Makes the comparison interface consistent by providing all of the
relevant members. This also modifies the comparison operators to take
the Imm instance by value, as it's really only a u32 under the covers,
and it's cheaper to shuffle around a u32 than a 64-bit pointer address.
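A simplified sketch of what such an interface might look like (a hypothetical class, much smaller than the real Imm template):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a fixed-width immediate wrapper.
template <std::size_t bit_size>
class ExampleImm {
public:
    constexpr explicit ExampleImm(std::uint32_t bits) : bits_(bits) {}

    constexpr std::uint32_t ZeroExtend() const {
        return bits_;
    }

    // Operands are taken by value: the object is just a u32 internally.
    friend constexpr bool operator==(ExampleImm a, ExampleImm b) { return a.bits_ == b.bits_; }
    friend constexpr bool operator!=(ExampleImm a, ExampleImm b) { return !(a == b); }
    friend constexpr bool operator<(ExampleImm a, ExampleImm b) { return a.bits_ < b.bits_; }
    friend constexpr bool operator<=(ExampleImm a, ExampleImm b) { return !(b < a); }
    friend constexpr bool operator>(ExampleImm a, ExampleImm b) { return b < a; }
    friend constexpr bool operator>=(ExampleImm a, ExampleImm b) { return !(a < b); }

private:
    std::uint32_t bits_;
};
```

Defining the remaining operators in terms of `==` and `<` keeps them consistent with one another.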
2020-04-22 20:55:06 +01:00
MerryMage
02150bc0b7
IR: Add fbits argument to FPVectorFrom{Signed,Unsigned}Fixed
2020-04-22 20:55:06 +01:00
MerryMage
027b0ef725
A64: Implement SCVTF, UCVTF (scalar, fixed-point)
2020-04-22 20:55:06 +01:00
MerryMage
8051f60db0
opcodes.inc: Align columns to a tabstop of 4
2020-04-22 20:55:06 +01:00
MerryMage
90193b0e3d
IR: Add fbits argument to FixedToFP-related opcodes
2020-04-22 20:55:06 +01:00
Lioncash
616a153c16
A64: Implement SQSHL's vector immediate variant
2020-04-22 20:55:06 +01:00
Lioncash
e8b0f25dff
A64: Implement SQSHL's vector register variant
2020-04-22 20:55:06 +01:00
Lioncash
b14eaaec46
ir: Add opcodes for left signed saturated shifts
2020-04-22 20:55:06 +01:00
Lioncash
da55ed7b31
branch: Make variables const where applicable
2020-04-22 20:55:06 +01:00
Lioncash
867b666285
move_wide: Make variables const where applicable
2020-04-22 20:55:06 +01:00
Lioncash
78024a9dc4
load_store_register_unprivileged: Make variables const where applicable
2020-04-22 20:55:06 +01:00
Lioncash
e45e5da610
load_store_register_immediate: Place conditional bodies on their own line
...
Makes the conditionals visually consistent with the rest of the
codebase.
2020-04-22 20:55:06 +01:00
Lioncash
b586cf3f56
load_store_load_literal: Make variables const where applicable
2020-04-22 20:55:06 +01:00
Lioncash
c3a3b9687e
data_processing_logical: Move datasize declarations after early-exit conditionals
...
While we're at it, make variables const where applicable.
2020-04-22 20:55:06 +01:00
Lioncash
ed797e6540
data_processing_conditional_select: Make variables const where applicable
...
Makes CSEL's function consistent with all of the others.
2020-04-22 20:55:06 +01:00
Lioncash
c82fa5ec5a
data_processing_addsub: Move datasize declarations after early-exit conditionals
...
While we're at it, also make relevant variables const where applicable.
2020-04-22 20:55:06 +01:00
Lioncash
f4a66d2477
data_processing_bitfield: Move datasize variables after early-exit conditionals
...
Moves the declaration of datasize to the scope that it's used within.
This also takes the opportunity to apply const where applicable, and
make early-exits all vertically consistent with one another.
2020-04-22 20:55:06 +01:00
Lioncash
2e0fcd6161
A64: Implement CLS's vector variant
...
Leverages CLZ like the integral variant does.
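One common way to express CLS in terms of CLZ, shown here as a scalar sketch for 32-bit values (hypothetical helpers; this is the general identity, not necessarily the exact IR sequence used):

```cpp
#include <cassert>
#include <cstdint>

// Naive count of leading zero bits; returns 32 for an input of zero.
constexpr unsigned ExampleCountLeadingZeros32(std::uint32_t v) {
    unsigned count = 0;
    for (std::uint32_t mask = 0x80000000u; mask != 0 && (v & mask) == 0; mask >>= 1) {
        ++count;
    }
    return count;
}

// Counts how many bits directly below the sign bit are equal to it.
// XORing the value with itself arithmetically shifted right by one turns
// every run of repeated sign bits into zeros, which CLZ can then count.
constexpr unsigned ExampleCountLeadingSignBits32(std::uint32_t v) {
    const std::uint32_t sign_fill = (v & 0x80000000u) ? 0x80000000u : 0u;
    const std::uint32_t asr1 = (v >> 1) | sign_fill;
    return ExampleCountLeadingZeros32(v ^ asr1) - 1;
}
```

The XOR result always has its top bit clear, so the CLZ count is at least one and the subtraction cannot underflow.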
2020-04-22 20:55:06 +01:00
Lioncash
a2cd643525
emit_x64_vector: Make EmitVectorUnsignedSaturatedAccumulateSigned() internally linked
...
Given this is just an internal helper function, it can be marked static.
2020-04-22 20:55:06 +01:00
Lioncash
c39ea2e3c9
perf_map: Use std::string_view instead of std::string for PerfMapRegister()
...
We can just use a non-owning view into a string in this case instead of
potentially allocating a std::string instance.
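A small sketch of the benefit, using a hypothetical function rather than PerfMapRegister() itself: a std::string_view parameter accepts string literals and std::string arguments alike without creating a temporary std::string at the call site.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical consumer of a name; the "jit_" prefix is made up for this example.
// The only allocation is for the returned string, not for the argument.
inline std::string ExampleDecorateName(std::string_view friendly_name) {
    std::string result{"jit_"};
    result.append(friendly_name.data(), friendly_name.size());
    return result;
}
```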
2020-04-22 20:55:06 +01:00
MerryMage
12243692f5
A64: Implement SQRDMULH (vector), vector variant
2020-04-22 20:55:06 +01:00
MerryMage
a9ffcf08b1
A64: Implement SQDMULL (vector), vector variant
2020-04-22 20:55:06 +01:00
MerryMage
3e447614c6
IR: Add VectorSignedSaturatedDoublingMultiplyLong
2020-04-22 20:55:06 +01:00
MerryMage
06b31448aa
emit_x64_vector: Changes to VectorSignedSaturatedDoublingMultiply
...
* Return both the upper and lower parts of the multiply if required
* SSE2 does not support the pmuldq instruction, do sign correction to an unsigned result instead
* Improve port utilisation where possible (punpck instructions were a bottleneck)
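The sign correction can be illustrated with scalar arithmetic. This is a sketch of the general identity under the assumption that only an unsigned 32x32 to 64-bit multiply is available; it is not the emitted SSE2 sequence.

```cpp
#include <cassert>
#include <cstdint>

// Computes the 64-bit signed product of two 32-bit values (given as raw bit
// patterns) using only an unsigned multiply. Interpreting a negative operand
// as unsigned adds 2^32 to it, which adds (other operand << 32) to the
// product; subtracting that term again restores the signed result mod 2^64.
constexpr std::uint64_t ExampleSignedMulViaUnsigned(std::uint32_t a, std::uint32_t b) {
    std::uint64_t product = std::uint64_t{a} * b;
    if (a & 0x80000000u) {
        product -= std::uint64_t{b} << 32;
    }
    if (b & 0x80000000u) {
        product -= std::uint64_t{a} << 32;
    }
    return product;
}
```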
2020-04-22 20:55:06 +01:00
MerryMage
08c0e017a5
IR: Implement Vector{Signed,Unsigned}Multiply{16,32}
2020-04-22 20:55:06 +01:00
Lioncash
b6df34cdde
backend_x64/a64_interface: Re-enable the constant folding pass
...
This was disabled for debugging, but never re-enabled. Just to be sure,
testing was done downstream in yuzu to make sure this doesn't break
anything (and it doesn't appear to).
2020-04-22 20:55:06 +01:00
MerryMage
06ba397af2
emit_x64_vector_floating_point: Hardware FMA implementation for RSqrtStepFused
2020-04-22 20:55:06 +01:00
MerryMage
e553c4fe8d
emit_x64_vector_floating_point: Hardware FMA implementation of FPVectorRecipStepFused
2020-04-22 20:55:06 +01:00
MerryMage
3caeb62ef1
emit_x64_floating_point: Hardware FMA implementation of FPRSqrtStepFused
2020-04-22 20:55:06 +01:00
MerryMage
344ee76aba
emit_x64_floating_point: Hardware FMA implementation of FPRecipStepFused{32,64}
2020-04-22 20:55:06 +01:00
MerryMage
1492573267
emit_x64_vector: SSE implementation of VectorSignedSaturatedAccumulateUnsigned{8,16,32}
2020-04-22 20:55:06 +01:00
Lioncash
26df6e5e7b
emit_x64_vector: Correct static asserts for < 64-bit type checks in saturated accumulate fallbacks
...
I had initially meant to use BitSize() here, not sizeof()
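The difference is easy to see in a sketch (ExampleBitSize is a hypothetical stand-in for the project's helper): sizeof() counts bytes, so comparing it against 64 is true for every ordinary integer type and checks nothing.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a BitSize() helper: size of T in bits.
template <typename T>
constexpr std::size_t ExampleBitSize() {
    return sizeof(T) * CHAR_BIT;
}

// Byte count: passes even for a 64-bit type, so it rejects nothing.
static_assert(sizeof(std::uint64_t) < 64);

// Bit count: accepts narrower types and rejects 64-bit ones, as intended.
static_assert(ExampleBitSize<std::uint32_t>() < 64);
static_assert(!(ExampleBitSize<std::uint64_t>() < 64));
```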
2020-04-22 20:55:06 +01:00
MerryMage
a4a26ac226
emit_x64_vector: EmitVectorSignedSaturatedAccumulateUnsigned64: SSE implementation
2020-04-22 20:55:06 +01:00
MerryMage
a7c66d2d28
emit_x64_vector: Simplify fpsr_qc related code
...
Move the bool conversion into A64JitState::GetFpsr so we don't have to continuously
pay the cost of conversion for every saturation instruction.
2020-04-22 20:55:06 +01:00
Lioncash
112cff9ab9
A64: Implement CLZ's vector variant
2020-04-22 20:55:06 +01:00
Lioncash
e739624296
ir: Add opcodes for vector CLZ operations
...
We can optimize these cases further with a fair bit of shuffling via
pshufb and the use of masks, but given the uncommon use of this
instruction, I don't consider the extra code to be worth it over a
simple, manageable naive solution like this.
If we ever do hit a case where vectorized CLZ happens to be a
bottleneck, then we can revisit this. At least with AVX-512CD, this can
be done with a single instruction for the 32-bit word case.
2020-04-22 20:55:05 +01:00
MerryMage
d4c37a68a8
A64/translate: VectorZeroUpper for V(64) stores
...
Ensures correctness.
2020-04-22 20:55:05 +01:00
MerryMage
b8daa4feac
simd_two_register_misc: FNEG (vector) with Q == 0 had dirty upper
2020-04-22 20:55:05 +01:00
Lioncash
5653e7637e
emit_x64_vector: Remove unnecessary [[maybe_unused]] attributes
...
These were unintentionally left in when introducing SUQADD and USQADD
2020-04-22 20:55:05 +01:00
Lioncash
14e026a7f0
A64: Implement USQADD's scalar and vector variants
2020-04-22 20:55:05 +01:00
Lioncash
d4a76aaa04
ir: Add opcodes for unsigned saturated accumulations of signed values
2020-04-22 20:55:05 +01:00
Lioncash
18ad7f237d
A64: Implement SUQADD's scalar and vector variants
2020-04-22 20:55:05 +01:00
Lioncash
6f911a26da
ir: Add opcodes for signed saturated accumulations of unsigned values
2020-04-22 20:55:05 +01:00
Lioncash
9a3d38d2ee
A64: Implement SMLAL{2}, SMLSL{2}, UMLAL{2}, and UMLSL{2}'s vector by-element variants
...
We can simply modify the general function made for SMULL{2} and
UMULL{2}'s by-element variants to also handle the other multiply-based
by-element variants.
2020-04-22 20:55:05 +01:00
Lioncash
6ccfbc9b39
A64: Implement UMULL{2}'s vector by-element variant
2020-04-22 20:55:05 +01:00
Lioncash
58e21f175c
A64: Implement SMULL{2}'s vector by-element variant
2020-04-22 20:55:05 +01:00
Lioncash
134bb02e19
ir/value: Replace includes with forward declarations
...
enum classes are still considered complete types when forward declared
(as the compiler knows the exact size of the type from the declaration
alone). The only difference in this case being that the members of the
enum class aren't visible. Given we don't use the members within this
header in any way, we can simply forward declare them here and remove
the inclusions.
2020-04-22 20:55:05 +01:00
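The forward-declaration point in the entry above can be illustrated with a minimal sketch. The names `OpcodeSketch`, `TypeSketch`, and `ValueSketch` are hypothetical and are not taken from the project's headers; the sketch only shows that an opaque `enum class` declaration is enough to declare members of that type.

```cpp
#include <cassert>

// Opaque (forward) declarations. A scoped enum with no explicit
// underlying type defaults to int, so its size is already known here.
enum class OpcodeSketch;
enum class TypeSketch : unsigned char;

// The struct can hold both enums even though no enumerators are visible.
struct ValueSketch {
    OpcodeSketch op;
    TypeSketch type;
};

static_assert(sizeof(OpcodeSketch) == sizeof(int), "default underlying type is int");
static_assert(sizeof(TypeSketch) == 1, "explicit underlying type is unsigned char");

// The full definitions can live in another header; they appear here only
// so that the sketch is self-contained.
enum class OpcodeSketch { Add, Sub };
enum class TypeSketch : unsigned char { U8, U32 };

inline ValueSketch MakeAddSketch() {
    return ValueSketch{OpcodeSketch::Add, TypeSketch::U32};
}
```

Only code that names an enumerator (such as `MakeAddSketch` here) needs the full definition to be visible.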
Lioncash
2c8e07e7d0
ir/cond: Migrate to C++17 nested namespace specifiers
2020-04-22 20:55:05 +01:00
Lioncash
c3b7819a55
CMakeLists: Add missing cond.h header to file listing
...
Allows the file to show up within IDEs more easily.
2020-04-22 20:55:05 +01:00
Lioncash
0a3976059f
A64: Implement URSQRTE
2020-04-22 20:55:05 +01:00
Lioncash
b6e74fd17d
ir: Add opcodes for performing unsigned reciprocal square root estimates
2020-04-22 20:55:05 +01:00
Lioncash
bd3582e811
A64: Implement URECPE
2020-04-22 20:55:05 +01:00
Lioncash
af83360f89
ir: Add opcodes for unsigned reciprocal estimate
2020-04-22 20:55:05 +01:00
Lioncash
740ffa52ae
A64: Implement SQNEG's scalar and vector variant
2020-04-22 20:53:46 +01:00
Lioncash
fca7eddb9e
A64: Add opcodes for signed saturating negations
2020-04-22 20:53:46 +01:00
Lioncash
f1ebbcd7bc
emit_x64_vector: Simplify "position == 0" case for EmitVectorExtract()
...
In the event position is zero, we can just treat it as a NOP, given
there's no need to move the data.
2020-04-22 20:53:46 +01:00
Lioncash
87372917f9
emit_x64_vector: Simplify "position == 0" case for EmitVectorExtractLower()
...
In the event position == 0, we can just treat it as a simple movq,
clearing the upper half of the XMM register. This also makes that case
use only one register.
2020-04-22 20:53:46 +01:00
Lioncash
f5fb496e7e
A64: Implement SQDMULH's by-element scalar variant
2020-04-22 20:53:46 +01:00
Lioncash
40f0576995
A64: Implement SQDMULH's by-element vector variant
2020-04-22 20:53:46 +01:00
MerryMage
8f9206901d
backend/x64: Do not clear fast_dispatch_table if not enabled
...
There is no need to pay for the cost of setting a large block of memory if we're not using it.
2020-04-22 20:53:46 +01:00
MerryMage
9b65100660
A64: Implement FastDispatchHint
2020-04-22 20:53:46 +01:00
MerryMage
f96c43d422
A32: Implement FastDispatchHint
2020-04-22 20:53:46 +01:00
MerryMage
aa8d826c13
ir/terminal: Add FastDispatchHint
2020-04-22 20:53:46 +01:00
Lioncash
1a69a61cb4
A64: Implement SQDMULH's scalar variant
2020-04-22 20:53:46 +01:00
Lioncash
7ebfd0f31c
ir: Add opcodes for scalar signed saturated doubling multiplies
2020-04-22 20:53:46 +01:00
Lioncash
9c03311fed
A64: Implement SQDMULH's vector variant
2020-04-22 20:53:46 +01:00
Lioncash
a0231e5546
ir: Add opcodes for signed saturated doubling multiplies
2020-04-22 20:53:46 +01:00
Lioncash
db24e1f09b
A64: Implement SQABS' scalar variant
2020-04-22 20:53:46 +01:00
Lioncash
bda5d14c7f
A64: Implement SQABS' vector variant.
2020-04-22 20:53:46 +01:00
Lioncash
0507e47420
ir: Add opcodes for signed saturated absolute values
2020-04-22 20:53:46 +01:00
MerryMage
27427595b7
emit_x64_floating_point: EmitFPToFixed: maxsd optimization
...
maxsd is not required when doing a signed conversion, because x64
produces a 0x80...00 value for out of range values.
2020-04-22 20:53:46 +01:00
MerryMage
1abf82ac4a
emit_x64_floating_point: ZeroIfNaN: pxor -> xorps
...
xorps is shorter and more appropriate here.
2020-04-22 20:53:46 +01:00
MerryMage
3415828fb4
IR: Simplify FP{Single,Double}ToFixed{U,S}{32,64}
2020-04-22 20:53:46 +01:00
Lioncash
e30f9816ec
A32/decoder: Add missing <algorithm> includes
...
These includes should be present, as we use std::find_if() within these headers.
2020-04-22 20:53:46 +01:00
Lioncash
4507627905
emit_x64_vector: Provide AVX path for EmitVectorMinU64()
2020-04-22 20:53:46 +01:00
Lioncash
fd49a62b06
emit_x64_vector: Provide AVX path for EmitVectorMinS64()
2020-04-22 20:53:46 +01:00
Lioncash
770723f449
emit_x64_vector: Provide AVX path for EmitVectorMaxU64()
2020-04-22 20:53:46 +01:00
Lioncash
8fb90c0cf1
emit_x64_vector: Provide AVX path for EmitVectorMaxS64()
2020-04-22 20:53:46 +01:00
Lioncash
2cac6ad129
emit_x64_vector: Simplify EmitVectorLogicalLeftShift8()
...
Similar to EmitVectorLogicalRightShift8(), we can determine a mask ahead
of time and just AND the results of a halfword left shift.
2020-04-22 20:53:46 +01:00
Lioncash
135107279d
emit_x64_vector: Simplify EmitVectorLogicalShiftRight8()
...
We can generate the mask and AND it against the result of a halfword
shift instead of looping.
2020-04-22 20:53:46 +01:00
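The mask approach described in the two 8-bit shift entries above can be modelled in scalar code. This is a sketch under assumptions, not the emitted x64 code: it treats one 16-bit lane as two packed bytes, and the function names are hypothetical. x86 SSE has no per-byte shift, so a halfword shift is used and the bits that crossed a byte boundary are masked off.

```cpp
#include <cassert>
#include <cstdint>

// Shifts both bytes packed in a 16-bit lane right by `shift` (0..7).
inline std::uint16_t ShiftRightBytesSketch(std::uint16_t lane, unsigned shift) {
    // Per-byte mask of the bits that remain valid after the shift,
    // replicated into both byte positions.
    const std::uint8_t byte_mask = static_cast<std::uint8_t>(0xFFu >> shift);
    const std::uint16_t mask = static_cast<std::uint16_t>(byte_mask * 0x0101u);
    return static_cast<std::uint16_t>((lane >> shift) & mask);
}

// Shifts both bytes packed in a 16-bit lane left by `shift` (0..7).
inline std::uint16_t ShiftLeftBytesSketch(std::uint16_t lane, unsigned shift) {
    const std::uint8_t byte_mask = static_cast<std::uint8_t>(0xFFu << shift);
    const std::uint16_t mask = static_cast<std::uint16_t>(byte_mask * 0x0101u);
    return static_cast<std::uint16_t>((static_cast<unsigned>(lane) << shift) & mask);
}
```

Because the mask depends only on the shift amount, it can be computed once ahead of time, which is what removes the need for a per-element loop.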
Lioncash
2952b46b16
emit_x64_vector: Amend value definition in SSE 4.1 path for EmitVectorSignExtend16()
...
We should be defining the value after the results have been calculated
to be consistent with the rest of the code.
2020-04-22 20:53:46 +01:00
Lioncash
fda19095ea
emit_x64_vector: Remove fallback in EmitVectorSignExtend64()
...
This is fairly trivial to do manually.
2020-04-22 20:53:46 +01:00
Lioncash
39593fcd26
emit_x64_vector: Remove fallback for EmitVectorSignExtend32()
...
We can just do the extension manually, which gets rid of the need to
fall back here.
2020-04-22 20:53:46 +01:00
Lioncash
053175f69b
ir_emitter: Rename fpscr_controlled parameters to fpcr_controlled
...
Part of addressing #333
2020-04-22 20:53:46 +01:00
MerryMage
f0184c4b8d
a32/exception_generating: BKPT: Define unpredictable behaviour
...
Define unpredictable behaviour to be BKPT executes conditionally
2020-04-22 20:53:46 +01:00
MerryMage
a12854857b
A32: Add define_unpredictable_behaviour option
2020-04-22 20:53:46 +01:00
MerryMage
b0abaa8312
A32/location_descriptor: Change formatting to use hex
2020-04-22 20:53:46 +01:00
MerryMage
ccbf6c7f63
microinstruction: A32ExceptionRaised causes CPU exception
2020-04-22 20:53:46 +01:00
MerryMage
6595e49a31
A32/types: CondToString: Add nv
2020-04-22 20:53:46 +01:00
MerryMage
d5b9c4a4bb
block_of_code: Hide NX support behind compiler flag
...
Systems that require W^X can use the DYNARMIC_ENABLE_NO_EXECUTE_SUPPORT cmake option.
2020-04-22 20:53:46 +01:00
MerryMage
de4494ffa5
Implement perfmap
2020-04-22 20:53:46 +01:00
MerryMage
f73104633b
a32_emit_x64: Fix incorrect BMI2 implementation for SetCpsr
...
* The MSBs for each byte in cpsr_ge were not being appropriately set.
* We also expand test coverage to test this case.
* We fix the disassembly of the MSR (imm) and MSR (reg) instructions as well.
2020-04-22 20:53:46 +01:00
MerryMage
3432a08e0a
backend/x64: Support W^X systems
...
Closes #176.
2020-04-22 20:53:46 +01:00
BreadFish64
2a65442933
Backend: Create "backend" folder
...
similar to the "frontend" folder
2020-04-22 20:53:46 +01:00
MerryMage
3b13f1eb12
A64/translate: Standardize arguments of helper functions
...
Don't pass in IREmitter when TranslatorVisitor is already available.
2020-04-22 20:53:45 +01:00
MerryMage
a4e556d59c
A64/translate: Standardize TranslatorVisitor abbreviation
...
Prefer v to tv.
2020-04-22 20:53:45 +01:00
MerryMage
9a0dc61efd
emit_x64_vector: Avoid recalculating addresses in EmitVectorTableLookup
2020-04-22 20:53:45 +01:00
Lioncash
3d465e2c36
A64: Implement SQXTN, SQXTUN, and UQXTN's scalar variants
...
We can implement these in terms of the vector variants
2020-04-22 20:53:45 +01:00
Lioncash
4ff39c6ea8
A64: Implement SDOT and UDOT's (by element) variants
...
Gets all of the dot product instructions out of the way.
2020-04-22 20:53:45 +01:00
MerryMage
21df1fb539
emit_x64_vector: Don't load zero constant from memory in EmitVectorTableLookup
2020-04-22 20:53:45 +01:00
MerryMage
3bbcca8757
emit_x64_vector: Special-case is_defaults_zero && table_size == 2 in EmitVectorTableLookup
2020-04-22 20:53:45 +01:00
MerryMage
9cc00f900c
emit_x64_vector: Release registers when possible in EmitVectorTableLookup
2020-04-22 20:53:45 +01:00
MerryMage
a12afd1065
reg_alloc: Add the ability to Release an allocation early
2020-04-22 20:53:45 +01:00
MerryMage
e68bd3c6c1
emit_x64_vector: Special-case table_size == 1 in EmitVectorTableLookup
2020-04-22 20:53:45 +01:00
MerryMage
a4e1f8a63a
emit_x64_vector: SSE4.1 implementation of EmitVectorTableLookup
2020-04-22 20:53:45 +01:00
MerryMage
0c18b85c27
A64: Implement TBL and TBX
2020-04-22 20:53:45 +01:00
MerryMage
89d08c7d61
IR: Add VectorTable and VectorTableLookup IR instructions
2020-04-22 20:53:45 +01:00
MerryMage
0288974512
opcodes: Cleanup opcodes table
...
* Remove T:: prefix from types.
* Add another column for a 4th argument.
2020-04-22 20:53:45 +01:00
Lioncash
d9fc6cf31f
A64: Implement SDOT and UDOT's vector variant
2020-04-22 20:53:45 +01:00
Lioncash
cb5e5c5d49
A64: Implement SADALP and UADALP
...
While we're at it we can join the code for SADDLP and UADDLP with these
instructions, since the only difference is we do an accumulate at the
end of the operation.
2020-04-22 20:53:45 +01:00
Lioncash
29f8b30634
A64: Implement SRSHL and URSHL
...
Implements both scalar and vector variants.
2020-04-22 20:53:45 +01:00
Lioncash
0efa2ce3b0
ir: Add opcodes for performing rounding left shifts
2020-04-22 20:53:45 +01:00
MerryMage
656ceff225
emit_x64_floating_point: Fix smallest normal check in EmitFPMulAdd
2020-04-22 20:53:45 +01:00
Lioncash
f3f60cd179
A64: Implement ISB
...
Given we want to ensure that all instructions are fetched again, we can
treat an ISB instruction as a code cache flush.
2020-04-22 20:53:45 +01:00
Lioncash
be53e356a2
A64: Implement FCVTN{2}
2020-04-22 20:53:45 +01:00
Lioncash
4c3d7c5a8d
A64: Implement FCVTL{2}
2020-04-22 20:53:45 +01:00
Lioncash
7eb6be7a6a
A64: Implement FMAXNM and FMINNM vector variants.
...
Currently we can implement these in terms of the scalar IR variants.
2020-04-22 20:53:45 +01:00
Lioncash
8b65ea68c0
A64: Implement FMAXP, FMAXNMP, FMINP, and FMINNMP's vector variants
...
We can just implement these in terms of scalars for the time being.
2020-04-22 20:53:45 +01:00
MerryMage
ec76f95f5a
emit_x64_vector_floating_point: Correct value of smallest_normal_number
2020-04-22 20:53:45 +01:00
MerryMage
e60d6c0d20
fp/info: Incorrect point_position in FPValue
2020-04-22 20:53:45 +01:00
MerryMage
8a3b6364c2
load_store_exclusive: Define s == t state to be Constraint_NONE
...
Downstream (yuzu) mentioned that the instruction:
STXR W9, W9, [X0]
was executed in the program "Crash N-Sane Trilogy".
2020-04-22 20:53:45 +01:00
MerryMage
cd40e4dae0
A64/translate: Allow for unpredictable behaviour to be defined
2020-04-22 20:53:45 +01:00
MerryMage
d1d6f4feb5
system: Implement MRS CNTFRQ_EL0
2020-04-22 20:53:45 +01:00
Lioncash
7ef7def661
A64: Implement SQ{ADD, SUB}, and UQ{ADD, SUB}'s vector variants
...
Currently we implement these in terms of the scalar variants. Falling
back to the interpreter is slow enough that even this is more effective
than doing that.
2020-04-22 20:46:23 +01:00
Lioncash
a4b0e2ace6
A64: Implement UQADD/UQSUB's scalar variants
2020-04-22 20:46:23 +01:00
Lioncash
acbaf04fef
ir: Add opcodes for unsigned saturating add and subtract
2020-04-22 20:46:23 +01:00
Lioncash
c41b5a3492
x64/reg_alloc: Use type alias for array returned by GetArgumentInfo()
...
This way if the number ever changes, we don't need to change the type in
other places.
2020-04-22 20:46:23 +01:00
Lioncash
2188765e28
ir/value: Use type alias CoprocessorInfo for std::array<u8, 8>
...
Provides a more descriptive label for the interface, and avoids the need
to hardcode the array size in multiple places.
2020-04-22 20:46:23 +01:00
MerryMage
71e137715d
status_register_access: Add support for bits 0 and 1 of mask to MSR
2020-04-22 20:46:23 +01:00
MerryMage
ac51c2547d
A32/translate/load_store: Correct detection of writeback
2020-04-22 20:46:23 +01:00
MerryMage
d345220251
A32/translate: Add TranslateSingleInstruction
2020-04-22 20:46:23 +01:00
MerryMage
5fc197c564
A32/ir_emitter: Bug fix: IREmitter::ExceptionRaised using incorrect opcode
2020-04-22 20:46:23 +01:00
MerryMage
ff3805e332
A32/decoders: Split instruction list into include file
2020-04-22 20:46:23 +01:00
MerryMage
3f4d118d73
microinstruction: Improve assert messages
2020-04-22 20:46:23 +01:00
MerryMage
a7e6f2a235
emit_x64_vector: EmitVectorNarrow16: AVX512 implementation
2020-04-22 20:46:23 +01:00
MerryMage
b6350e3947
emit_x64_vector: EmitVectorNarrow32: prefer pblendw to loading constant
2020-04-22 20:46:23 +01:00
MerryMage
8fdba189cb
emit_x64_vector: packusdw is SSE4.1
2020-04-22 20:46:23 +01:00
MerryMage
1ef388d1cd
emit_x64_vector_floating_point: Simplify FPVector{Min,Max}
2020-04-22 20:46:23 +01:00
MerryMage
4a1ce797cb
emit_x64_vector_floating_point: Simplify Get*Vector functions
2020-04-22 20:46:23 +01:00
MerryMage
bcaced297a
emit_x64_floating_point: Remove EmitProcessNaNs
2020-04-22 20:46:23 +01:00
MerryMage
2e0885388e
devirtualize: Replace DEVIRT macro with function template
2020-04-22 20:46:23 +01:00
Lioncash
54d8552177
a32_emit_x64: std::move A32::UserConfig in the constructor
...
This avoids a few redundant atomic increments and decrements,
considering the UserConfig instance contains a std::array of
std::shared_ptr<Coprocessor> instances.
2020-04-22 20:46:23 +01:00
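The reference-count effect mentioned in the entry above can be shown with a small sketch. `ConfigSketch` and the two emitter types are hypothetical stand-ins and do not reproduce the project's `A32::UserConfig` or emitter classes; the array size of 4 is arbitrary.

```cpp
#include <array>
#include <cassert>
#include <memory>
#include <utility>

// Stand-in for a config object that owns an array of shared pointers.
struct ConfigSketch {
    std::array<std::shared_ptr<int>, 4> coprocessors;
};

// Copies the by-value parameter into the member: every shared_ptr in the
// array has its reference count atomically incremented, then decremented
// again when the parameter is destroyed.
struct CopyingEmitterSketch {
    explicit CopyingEmitterSketch(ConfigSketch c) : conf(c) {}
    ConfigSketch conf;
};

// Moves the by-value parameter into the member: ownership is transferred
// without touching the reference counts.
struct MovingEmitterSketch {
    explicit MovingEmitterSketch(ConfigSketch c) : conf(std::move(c)) {}
    ConfigSketch conf;
};
```

Both constructors leave the member in the same state; the moving version just avoids the redundant increment and decrement pairs.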
MerryMage
b098c650df
emit_x64_floating_point: Use EmitPostProcessNaNs in EmitFPMulX
2020-04-22 20:46:23 +01:00
MerryMage
c1babf41b2
emit_x64_floating_point: Remove unnecessary DenormalsAreZero from EmitFPSingleToDouble and EmitFPDoubleToSingle
2020-04-22 20:46:23 +01:00
MerryMage
700088408d
emit_x64_floating_point: Simplify EmitFP{Min,Max}{,Numeric}{32,64}
2020-04-22 20:46:23 +01:00
MerryMage
07e0585994
emit_x64_floating_point: Reduce NaN processing overhead
2020-04-22 20:46:23 +01:00
MerryMage
f5e11d117a
A64: Implement FMULX, scalar single/double variant
2020-04-22 20:46:23 +01:00
MerryMage
17f73974f2
IR: Implement FPMulX IR instruction
2020-04-22 20:46:23 +01:00
Lioncash
391e16be64
emit_x64_vector: Vectorize 32-bit variants of paired min/max
...
Gets rid of the fallbacks for these cases.
2020-04-22 20:46:23 +01:00
MerryMage
5ae045d67e
emit_x64_vector: Improve code emission of VectorGetElement* for index == 0
2020-04-22 20:46:23 +01:00
MerryMage
e9ab7f7664
reg_alloc: Do a UseScratch if a Use destination is too small
2020-04-22 20:46:23 +01:00
MerryMage
90f8dda966
emit_x64_floating_point: AVX implementation of ForceToDefaultNaN
2020-04-22 20:46:23 +01:00
MerryMage
dfb660cd16
emit_x64_vector_floating_point: Prefer blendvp{s,d} to vblendvp{s,d} where possible
...
It's a cheaper instruction.
2020-04-22 20:46:23 +01:00
MerryMage
476c0f15da
backend_x64: Remove all use of xmm0
2020-04-22 20:46:23 +01:00
MerryMage
8252efd7b1
emit_x64_vector_floating_point: AVX implementation of ForceToDefaultNaN
2020-04-22 20:46:23 +01:00
MerryMage
746dc521b9
emit_x64_vector_floating_point: Reduce codesize of ForceToDefaultNaN
2020-04-22 20:46:23 +01:00
MerryMage
7731dcdca9
emit_x64_vector_floating_point: Reduce codesize of EmitTwoOpVectorOperation
2020-04-22 20:46:23 +01:00
MerryMage
bb93353f94
emit_x64_vector_floating_point: Correct FMA in FTZ mode
...
x64 rounds before flushing to zero
AArch64 rounds after flushing to zero
This difference of behaviour is noticeable if something would round to the smallest normalized number
2020-04-22 20:46:23 +01:00
MerryMage
8ef195db3c
emit_x64_floating_point: DenormalsAreZero is redundant as hardware already does DAZ
...
Exceptions: F{MIN,MAX}{,NM}
2020-04-22 20:46:23 +01:00
MerryMage
de9d8c461c
emit_x64_floating_point: FlushToZero is redundant as hardware already does FTZ
2020-04-22 20:46:23 +01:00
MerryMage
822fd4a875
backend_x64: Fix FPVectorMulAdd and FPMulAdd NaN handling with denormals
...
Denormals should be treated as zero in NaN handler
2020-04-22 20:46:23 +01:00
MerryMage
b393e15ab6
backend_x64: Fix bugs when FPCR.FZ=1
...
Bugs:
* DenormalsAreZero flushed to positive zero instead of preserving sign.
* FMAXNM/FMINNM (scalar) should perform DAZ *before* special zero handling.
* FMAX/FMIN/FMAXNM/FMINNM (vector) did not DAZ.
2020-04-22 20:46:23 +01:00
MerryMage
5e88d66470
fp/info: Deduplicate functions
2020-04-22 20:46:23 +01:00
MerryMage
2019d32743
emit_x64_floating_point: Deduplicate EmitFPMulAdd implementation
2020-04-22 20:46:23 +01:00
MerryMage
e038fe72df
emit_x64_floating_point: Deduplicate code
2020-04-22 20:46:23 +01:00
MerryMage
ec82a845b7
emit_x64_vector_floating_point: Fix FPVector{Max,Min} when FPCR.DN = 1
2020-04-22 20:46:23 +01:00
MerryMage
7f27945411
emit_x64_floating_point: Fix FP{Max,Min} when FPCR.DN = 1
2020-04-22 20:46:23 +01:00
MerryMage
21a28c2545
IR: SSE4.1 implementation of FPVectorRoundInt
2020-04-22 20:46:23 +01:00
MerryMage
9669e49817
A64: Implement FRINT{N,M,P,Z,A,X,I} (vector), single/double variant
2020-04-22 20:46:23 +01:00
MerryMage
f976c47008
IR: Initial implementation of FPVectorRoundInt
2020-04-22 20:46:23 +01:00
MerryMage
f2393488fe
A64: Implement SQADD and SQSUB, scalar variant
2020-04-22 20:46:23 +01:00
MerryMage
10e196480f
IR: Generalise SignedSaturated{Add,Sub} to support more bitwidths
2020-04-22 20:46:23 +01:00
MerryMage
71db0e67ae
a64_emit_x64: Bugfix EmitA64OrQC - Incorrect argument
2020-04-22 20:46:23 +01:00
Lioncash
d0fdd3c6e6
simd_three_same: Extract non-paired SMAX, SMIN, UMAX, UMIN code to a common function
...
Deduplicates a bit of code and makes its layout consistent with the
paired variants
2020-04-22 20:46:23 +01:00
Lioncash
2bea2d0512
A64: Implement SMAXP, SMINP, UMAXP, UMINP
2020-04-22 20:46:23 +01:00
Lioncash
463b9a3d02
ir: Add opcodes for vector paired maximum and minimums
...
For the time being, we can just do a naive implementation which avoids
falling back to the interpreter a bit. Horizontal operations aren't
necessarily x86 SIMD's forte anyways.
2020-04-22 20:46:23 +01:00
Lioncash
43344c5400
A64: Implement SMAXV, SMINV, UMAXV, and UMINV
2020-04-22 20:46:23 +01:00
Lioncash
2501bfbfae
ir: Add opcodes for performing scalar integral min/max
2020-04-22 20:46:23 +01:00
Lioncash
7fdd8b0197
A64: Implement PMULL{2}
2020-04-22 20:46:23 +01:00
Lioncash
5ebf496d4e
translate: Deduplicate GetDataSize() functions
...
Avoids defining the same function multiple times in different files.
2020-04-22 20:46:22 +01:00
Lioncash
f83cd2da9a
floating_point_{conditional}_compare: Deduplicate code
...
Deduplicates the implementation code of instructions by extracting the
code to a common function.
2020-04-22 20:46:22 +01:00
MerryMage
f9c6d5e1a0
common: Move all cryptographic functions to common/crypto
2020-04-22 20:46:22 +01:00
MerryMage
5dc23e49d7
a32_emit_x64: BMI2 implementation of A32SetCpsr
2020-04-22 20:46:22 +01:00
MerryMage
0f85305933
a32_emit_x64: Shorten EmitA32GetCpsr
2020-04-22 20:46:22 +01:00
MerryMage
9fe2bf8733
a32_emit_x64: Assert that memory layout assumption in EmitA32GetCpsr is valid
2020-04-22 20:46:22 +01:00
Lioncash
b48fb8ca6b
A64: Implement PMUL
2020-04-22 20:46:22 +01:00
Lioncash
affa312d1d
ir: Add opcode for performing polynomial multiplication
2020-04-22 20:46:22 +01:00
MerryMage
dd4ac86f8e
A64: Implement FCVT{N,M,A,P}{U,S} (vector), FCVTZU (vector, integer), single/double variant
2020-04-22 20:46:22 +01:00
MerryMage
28b38916a8
A64: Implement FCVTZS (vector, integer), single/double variant
2020-04-22 20:46:22 +01:00
MerryMage
507bcd8b8b
IR: Implement FPVectorTo{Signed,Unsigned}Fixed
2020-04-22 20:46:22 +01:00
MerryMage
8f75a1fe04
fp/info: Replace constant value generators with FPValue
...
Instead of having multiple different functions we can just have one.
2020-04-22 20:46:22 +01:00
MerryMage
da261772ea
emit_x64_vector_floating_point: AVX implementation of FPVector{Max,Min}
2020-04-22 20:46:22 +01:00
MerryMage
a0d6f0de57
emit_x64_vector_floating_point: Remove unnecessary double jump in HandleNaNs
2020-04-22 20:46:22 +01:00
Lioncash
c778c7b868
A64: Implement FMAX's vector single and double precision variants
2020-04-22 20:46:22 +01:00
Lioncash
009879d92b
A64: Implement FMIN's vector single and double precision variants
2020-04-22 20:46:22 +01:00
MerryMage
7b03da86c2
IR: Implement FPVector{Max,Min}
2020-04-22 20:46:22 +01:00
MerryMage
e76e1186bb
FPRecipEstimate: Move offset out of function
...
MSVC has weird lambda capturing rules.
2020-04-22 20:46:22 +01:00
MerryMage
ddcff86f9c
microinstruction: Update ReadsFromAndWritesToFPSRCumulativeExceptionBits
2020-04-22 20:46:22 +01:00
MerryMage
10de36394e
A64: Implement FRECPS, vector/scalar single/double variants
2020-04-22 20:46:22 +01:00
MerryMage
901bd9b4e2
IR: Implement FPRecipStepFused, FPVectorRecipStepFused
2020-04-22 20:46:22 +01:00
MerryMage
f66f61d8ab
A64: Implement FRECPE, vector single/double variant
2020-04-22 20:46:22 +01:00
MerryMage
939f5f5c7a
IR: Implement FPVectorRecipEstimate
2020-04-22 20:46:22 +01:00
MerryMage
27c73dd56a
A64: Implement FRECPE, scalar single/double variant
2020-04-22 20:46:22 +01:00
MerryMage
fc2d33ae7b
IR: Implement FPRecipEstimate
2020-04-22 20:46:22 +01:00
MerryMage
c1dcfe29f7
IR: Implement FPRecipEstimate
2020-04-22 20:46:22 +01:00
MerryMage
7a673a8a43
fp: Change FPUnpacked to a normalized representation
...
Having a known position for the highest set bit makes writing algorithms easier
2020-04-22 20:46:22 +01:00
MerryMage
3fe45c6d8e
block_of_code: Add ABI_PARAMS array
2020-04-22 20:46:22 +01:00
MerryMage
642b6c31d2
A64: Implement MLA, MLS (by element), vector single/double variant
2020-04-22 20:46:22 +01:00
MerryMage
0de37b11ad
A64: Implement FMLS (vector), single/double variant
2020-04-22 20:46:22 +01:00
MerryMage
64c2f698a2
emit_x64_vector_floating_point: Specify NanHandler::function_type explicitly
...
MSVC doesn't like dealing with auto return types
2020-04-22 20:46:22 +01:00
MerryMage
2ef59b4f03
emit_x64_vector_floating_point: ChooseOnFsize arguments maybe_unused
2020-04-22 20:46:22 +01:00
MerryMage
04f325a05e
IR: Implement FPVectorNeg
2020-04-22 20:46:22 +01:00
MerryMage
934132e0c5
A64: Implement FMLA (vector), single/double variant
2020-04-22 20:46:22 +01:00
MerryMage
771a4fc20b
IR: Implement FPVectorMulAdd
2020-04-22 20:46:22 +01:00
MerryMage
3218bb9890
emit_x64_vector_floating_point: Standardize naming scheme
2020-04-22 20:46:22 +01:00
MerryMage
8f72be0a02
emit_x64_floating_point: Simplify indexers
2020-04-22 20:46:22 +01:00
MerryMage
25b28bb234
emit_x64_vector_floating_point: Simplify EmitVectorOperation*
2020-04-22 20:46:22 +01:00
MerryMage
1edd0125b2
mp: rename mp.h to mp/function_info.h
2020-04-22 20:46:22 +01:00
MerryMage
0921678edb
emit_x64_vector: Slightly improve ArithmeticShiftRightByte
2020-04-22 20:46:22 +01:00
MerryMage
43407c4bb4
emit_x64_vector: Simplify VectorShuffleImpl
2020-04-22 20:46:22 +01:00
MerryMage
ecbf9dbae5
IR: Implement A64OrQC
2020-04-22 20:46:22 +01:00
MerryMage
f0fecf2615
A64: Implement UQSHRN, UQRSHRN (vector)
2020-04-22 20:46:22 +01:00
MerryMage
8f4c1a8558
emit_x64_vector: -0x80000000 isn't -0x80000000
2020-04-22 20:46:22 +01:00
MerryMage
b455b566e7
A64: Implement UQXTN (vector)
2020-04-22 20:46:22 +01:00
MerryMage
e686a81612
emit_x64_vector: Fix non-SSE4.1 saturated narrowing reconstruction comparison
...
Allows non-SSE4.1 to produce the correct FPSR.QC flag
2020-04-22 20:46:22 +01:00
MerryMage
3874cb37e3
A64: Implement SQXTN (vector)
2020-04-22 20:46:22 +01:00
MerryMage
8ef114d48f
emit_x64_vector: packusdw requires SSE4.1
...
In EmitVectorSignedSaturatedNarrowToUnsigned32.
2020-04-22 20:46:22 +01:00
MerryMage
712c6c1d7e
A64: Implement SQSHRUN, SQRSHRUN (vector)
2020-04-22 20:46:22 +01:00
MerryMage
c5722ec963
simd_shift_by_immediate: Simplify ShiftRight
2020-04-22 20:46:22 +01:00
MerryMage
f020dbe4ed
A64: Implement SQXTUN
2020-04-22 20:46:22 +01:00
MerryMage
6918ef7360
microinstruction: Reorganize FPSCR related instruction queries
2020-04-22 20:46:22 +01:00
Lioncash
a639fa5534
microinstruction: Add missing FP scalar opcodes to ReadsFromFPSCR() and WritesToFPSCR()
...
These were forgotten when the opcodes were added.
2020-04-22 20:46:22 +01:00
Lioncash
3ca18d8a6d
u128: Make Bit() a const-qualified member function
...
This function doesn't modify the struct members, so it can be made
const.
2020-04-22 20:46:22 +01:00
MerryMage
b2e4c16ef8
A64: Implement FRSQRTS (vector), single/double variant
2020-04-22 20:46:22 +01:00
MerryMage
45dc5f74f3
A64: Implement FRSQRTE (vector), single/double variant
2020-04-22 20:46:22 +01:00
MerryMage
b74d5520f9
A64: Implement FRSQRTS (scalar), single/double variant
2020-04-22 20:46:22 +01:00
MerryMage
506e544bfe
IR: Implement FPRSqrtStepFused
2020-04-22 20:46:22 +01:00
MerryMage
6eb069e80d
fp: Implement FPRSqrtStepFused
2020-04-22 20:46:22 +01:00
MerryMage
b0ff35fcd1
fp: Implement FPNeg
2020-04-22 20:46:22 +01:00
MerryMage
ca6774ccce
process_nan: Add two operand variant
2020-04-22 20:46:22 +01:00
Lioncash
ace7d2ba50
A64: Implement FMAXP, FMINP, FMAXNMP and FMINNMP's scalar double/single-precision variant
2020-04-22 20:46:21 +01:00
MerryMage
66bb05fc0a
emit_x64_floating_point: Fixup special NaN case in FMA FPMulAdd implementation
2020-04-22 20:46:21 +01:00
Lioncash
070637e0f6
fp: Use a forward declaration in fused.h
...
It's permissible to forward declare here, so we can do so and eliminate
a direct header dependency
2020-04-22 20:46:21 +01:00
Lioncash
030820f649
u128: Implement comparison operators in terms of one another
...
We can just implement the comparisons in terms of operator< and
implement inequality with the negation of operator==.
2020-04-22 20:46:21 +01:00
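The pattern described in the entry above, with all orderings derived from `operator<` and inequality from `operator==`, might look like the following sketch. `Wide128Sketch` is a hypothetical two-word struct and is not the project's `u128` type.

```cpp
#include <cassert>
#include <cstdint>
#include <tuple>

// Hypothetical 128-bit value stored as two 64-bit halves.
struct Wide128Sketch {
    std::uint64_t lower;
    std::uint64_t upper;
};

// The only two operators with real logic.
inline bool operator<(Wide128Sketch a, Wide128Sketch b) {
    // Compare the upper halves first, then the lower halves.
    return std::tie(a.upper, a.lower) < std::tie(b.upper, b.lower);
}
inline bool operator==(Wide128Sketch a, Wide128Sketch b) {
    return a.upper == b.upper && a.lower == b.lower;
}

// Everything else is expressed in terms of the two above.
inline bool operator>(Wide128Sketch a, Wide128Sketch b) { return b < a; }
inline bool operator<=(Wide128Sketch a, Wide128Sketch b) { return !(b < a); }
inline bool operator>=(Wide128Sketch a, Wide128Sketch b) { return !(a < b); }
inline bool operator!=(Wide128Sketch a, Wide128Sketch b) { return !(a == b); }
```

Keeping the logic in two places means a fix to the ordering only has to be made once.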
MerryMage
76b07d6646
u128: StickyLogicalShiftRight requires special-casing for amount == 64
...
In this case (128 - amount) == 64, and this invokes undefined behaviour
2020-04-22 20:46:21 +01:00
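The special case named in the entry above can be sketched as follows. This is a simplified model that assumes "sticky" means any 1 bits shifted out are ORed into bit 0 of the result; `U128Sketch` and `StickyShiftRightSketch` are hypothetical names, not the project's implementation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 128-bit value stored as two 64-bit halves.
struct U128Sketch {
    std::uint64_t lower;
    std::uint64_t upper;
};

inline U128Sketch StickyShiftRightSketch(U128Sketch v, unsigned amount) {
    const std::uint64_t one = 1;
    if (amount == 0) {
        return v;
    }
    if (amount < 64) {
        // Shift counts here are in 1..63, so every shift is well defined.
        const bool lost = (v.lower << (64 - amount)) != 0;
        const std::uint64_t lower = (v.lower >> amount) | (v.upper << (64 - amount));
        return U128Sketch{lower | (lost ? one : 0), v.upper >> amount};
    }
    if (amount == 64) {
        // Without this branch, the path below would evaluate
        // v.upper << (128 - amount) with a shift count of 64, which is
        // undefined behaviour for a 64-bit operand.
        return U128Sketch{v.upper | (v.lower != 0 ? one : 0), 0};
    }
    if (amount < 128) {
        const bool lost = v.lower != 0 || (v.upper << (128 - amount)) != 0;
        return U128Sketch{(v.upper >> (amount - 64)) | (lost ? one : 0), 0};
    }
    // Everything is shifted out; only the sticky bit can remain.
    return U128Sketch{(v.lower != 0 || v.upper != 0) ? one : 0, 0};
}
```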
Lioncash
49c7edf7c6
A64: Implement FMLA and FMLS (by element)'s double/single-precision scalar variant
2020-04-22 20:46:21 +01:00
Lioncash
c704acafe4
A64: Implement FMUL (by element)'s scalar double/single-precision variant
2020-04-22 20:46:21 +01:00
MerryMage
0ce11b7b15
emit_x64_floating_point: Implement accurate fallback for FPMulAdd{32,64}
2020-04-22 20:46:21 +01:00
MerryMage
e199887fbc
fp: Implement FPMulAdd
2020-04-22 20:46:21 +01:00
MerryMage
53a8c15d12
process_nan: Add FPProcessNaNs3
2020-04-22 20:46:21 +01:00
MerryMage
1c8e93e74d
block_of_code: Add SysV ABI fifth and sixth parameters
2020-04-22 20:46:21 +01:00
MerryMage
1fe8f51c54
u128: Add StickyLogicalShiftRight
2020-04-22 20:46:21 +01:00
MerryMage
b0afd53ea7
u128: Add Multiply64To128
2020-04-22 20:46:21 +01:00
MerryMage
5566fab29a
u128: Add u128::Bit
2020-04-22 20:46:21 +01:00
MerryMage
3e62fea003
u128: Add comparison operators
2020-04-22 20:46:21 +01:00
MerryMage
f17cd6f2c5
unpacked: Use ResidualErrorOnRightShift in FPRoundBase
...
Fixes a bug relating to exponents that are severely out of range.
2020-04-22 20:46:21 +01:00
MerryMage
805428e35e
fp: Remove MantissaT
2020-04-22 20:46:21 +01:00
MerryMage
bda86fd167
FPRSqrtEstimate: Improve documentation of RecipSqrtEstimate
2020-04-22 20:46:21 +01:00
Lioncash
0a64a66b26
FPRSqrtEstimate: Deduplicate array bounds
...
Dehardcodes a few constants in the loops.
2020-04-22 20:46:21 +01:00
Lioncash
b7bd70fd19
A64: Implement FMAXV, FMINV, FMAXNMV, and FMINNMV
2020-04-22 20:46:21 +01:00
Lioncash
664fb12e21
FPRSqrtEstimate: Use forward declarations where applicable
2020-04-22 20:46:21 +01:00
Lioncash
3447c82656
translate: Return by bool in helpers where applicable
...
Gets rid of a bit of duplication regarding the early-out cases and makes
all helper functions consistent (previously some had a return type of
bool, while others had a return type of void).
2020-04-22 20:46:21 +01:00
Lioncash
d65b056eba
Simplify fallback case for EmitVectorSetElement64()
2020-04-22 20:46:21 +01:00
MerryMage
6087c2af6f
emit_x64_floating_point: s/Esimate/Estimate/
2020-04-22 20:46:21 +01:00
MerryMage
f837ce8e78
simd_scalar_two_register_misc: Implement FRSQRTE, scalar variant
2020-04-22 20:46:21 +01:00
MerryMage
bde58b04d4
IR: Implement FPRSqrtEstimate
2020-04-22 20:46:21 +01:00
MerryMage
16061c28f3
simd_vector_x_indexed_element: Implement FMUL (by element), vector variant
2020-04-22 20:46:21 +01:00
MerryMage
55eaa16615
a64_emit_x64: Ensure host has updated ticks in EmitA64GetCNTPCT
...
Discovered by @Subv.
Fixes incomplete fix begun in 5a91c94dca47c9702dee20fbd5ae1f4c07eef9df.
That fix fails to take into account that LinkBlock doesn't update ticks until there
are no remaining ticks to be executed.
Test added to confirm fix.
2020-04-22 20:46:21 +01:00
MerryMage
edd795e991
a64_emit_x64: Fix stack misalignment on Windows for 128-bit exclusive writes
...
Discovered by @Subv.
Includes a test to ensure this codepath is exercised on Windows.
2020-04-22 20:46:21 +01:00
Lioncash
04b4c8b0cf
emit_x64_aes: Eliminate extraneous usage of a scratch register in EmitAESInverseMixColumns()
...
We can just use the same register the data is in as the result register,
eliminating the need to use a completely separate register to store the
result.
2020-04-22 20:46:21 +01:00
Lioncash
e5d80e998e
A64: Implement SADDLV
2020-04-22 20:46:21 +01:00
Lioncash
a1bc8ddb53
A64: Implement UADDLV
2020-04-22 20:46:21 +01:00
Lioncash
1dc1e3dcd8
fp: Use forward declarations where applicable
...
Minimizes the amount of files that need to be rebuilt if the headers
ever change.
2020-04-22 20:46:21 +01:00
Lioncash
46cb0d813b
emit_x64_vector: Append 'v' prefix onto movq in AVX path
...
This is something I missed when adding in the AVX broadcast code.
2020-04-22 20:46:21 +01:00
Subv
4606a081c9
A64: The A64SetTPIDR IR instruction writes to a system register and should not be eliminated by the dead code elimination pass.
...
Previously this instruction was always eliminated, resulting in incorrect values for TPIDR_EL0.
2020-04-22 20:46:21 +01:00
MerryMage
b53127600b
fp: A64::FPCR -> FP::FPCR
2020-04-22 20:46:21 +01:00
MerryMage
084bf63a10
bit_util: Implement ClearBits and ModifyBits
2020-04-22 20:46:21 +01:00
MerryMage
699c5f36d5
system: Simplify static_cast
2020-04-22 20:46:21 +01:00
MerryMage
3f602129f4
system: Ensure value of CNTPCT_EL0 is accurate
...
Since we currently only update the host's tick count at the end of a
block, we force an end-of-block before executing an MRS %, CNTPCT_EL0
instruction.
2020-04-22 20:46:21 +01:00
Lioncash
84affdb260
safe_ops: Avoid cases where shift bases are invalid with signed values
...
For example, say the converted signed type is s64, shifting left by 63
bits would be undefined behavior.
However, given that an ASL has essentially the same behavior as an LSL,
we can just use an unsigned type instead of converting to a signed type.
2020-04-22 20:46:21 +01:00
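The idea in the entry above, performing the left shift on the unsigned equivalent of the type, can be sketched like this. `ArithmeticShiftLeftSketch` is a hypothetical helper and not the project's safe_ops code; returning zero for oversized shift amounts is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <type_traits>

template <typename T>
constexpr T ArithmeticShiftLeftSketch(T value, unsigned amount) {
    static_assert(std::is_integral<T>::value && std::is_signed<T>::value,
                  "sketch is written for signed integer types");
    using U = typename std::make_unsigned<T>::type;
    // Shifting by the full width or more is undefined, so handle it here.
    if (amount >= static_cast<unsigned>(std::numeric_limits<U>::digits)) {
        return T{0};
    }
    // The shift itself happens on the unsigned type, where bits that fall
    // off the top are simply discarded. Converting the result back to T is
    // modular since C++20 and implementation-defined (but two's complement
    // on mainstream compilers) before that.
    return static_cast<T>(static_cast<U>(static_cast<U>(value) << amount));
}
```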
Lioncash
d0274f412a
safe_ops: Avoid signed overflow in Negate()
...
Negation of values such as -9223372036854775808 can't be represented in
signed equivalents (such as long long), leading to signed overflow.
Therefore, we can just invert bits and add 1 to perform this behavior
with unsigned arithmetic.
2020-04-22 20:46:21 +01:00
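The "invert bits and add 1" approach from the entry above can be sketched as follows. `NegateWrappingSketch` is a hypothetical name and this is not the project's `Negate()`; it only demonstrates the unsigned-arithmetic technique.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <type_traits>

template <typename T>
constexpr T NegateWrappingSketch(T value) {
    static_assert(std::is_integral<T>::value && std::is_signed<T>::value,
                  "sketch is written for signed integer types");
    using U = typename std::make_unsigned<T>::type;
    // Two's complement negation done entirely in unsigned arithmetic,
    // which wraps instead of overflowing. Converting back to T is modular
    // since C++20 and implementation-defined before that.
    return static_cast<T>(static_cast<U>(~static_cast<U>(value) + U{1}));
}
```

With this formulation, negating the minimum value yields the minimum value again rather than triggering signed overflow.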
Lioncash
af3e23b224
simd_scalar_shift_by_immediate: Implement FCVT{ZS, ZU} (vector, fixed-point)'s scalar double/single-precision variant
2020-04-22 20:46:21 +01:00
Lioncash
91abf87169
simd_scalar_two_register_misc: Implement FCVT{AS, AU, MS, MU, NS, NU, PS, PU, ZS, ZU} (vector)'s scalar double/single-precision variants
...
We can simply implement this in terms of the fixed-point IR opcodes.
2020-04-22 20:46:21 +01:00
Lioncash
0ec8dac660
emit_x64: Remove FPSCR_RoundTowardsZero() virtual function from EmitContext struct
...
This code was bugged in that we were comparing if the rounding mode was
not equal to rounding towards zero. Fortunately, however, nothing uses
this function anymore, and there's already the more general
FPSCR_RMode() available, so this can be removed entirely.
2020-04-22 20:46:21 +01:00
Lioncash
fd92e2f186
emit_x64: Add missing <array> include
...
Commit 755adef62e504a8d616de9dda8937d2428a9471b introduced a helper
alias for std::array, eliminating the need to manually type out sizes
for them; however, I forgot to add the include for <array>.
2020-04-22 20:46:21 +01:00
Lioncash
f939bd0228
emit_x64_vector{_floating_point}: Add helper alias for sizing arrays relative to vector width
...
Avoids needing to remember to specify the proper size of the arrays; all
that's needed is to specify the type of the array and the size will
automatically be deduced from it. This helps prevent potential oversized
or undersized arrays from being specified.
2020-04-22 20:46:21 +01:00
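One way such an alias could look is sketched below. The names `VectorArray` and `vector_width_bytes` are assumptions made for illustration, as is the 128-bit vector width; the commit does not show the actual definition.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Assumption for this sketch: vectors are 128 bits (16 bytes) wide.
constexpr std::size_t vector_width_bytes = 16;

// Hypothetical alias: the element count is derived from the element type,
// so callers only name the type and cannot pick a mismatched size.
template <typename T>
using VectorArray = std::array<T, vector_width_bytes / sizeof(T)>;
```

Under these assumptions `VectorArray<std::uint32_t>` has four elements and `VectorArray<std::uint8_t>` has sixteen, and both occupy exactly one vector's worth of storage.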
MerryMage
58f3399032
A64/PopRSBHint: Prevent RETing to a guest PC of ~0ull from crashing the jit
2020-04-22 20:46:21 +01:00
MerryMage
e18fca17dc
A64: Implement FABD in terms of existing IR instructions
...
Fixes NaN issue. Closes #306.
2020-04-22 20:46:21 +01:00
MerryMage
1dbe9d95e6
FPRoundInt: Final FPRound based on new sign
...
While this shouldn't change any of the results in theory, it's just logically more consistent
2020-04-22 20:46:21 +01:00
MerryMage
83be491875
emit_x64_floating_point: SSE4.1 implementation of EmitFPRound
2020-04-22 20:46:20 +01:00
MerryMage
a40127a054
A64: Implement FRINTX, FRINTI (scalar)
2020-04-22 20:46:20 +01:00
MerryMage
962fa3b65e
A64: Implement FRINTP, FRINTM, FRINTZ (scalar)
2020-04-22 20:46:20 +01:00
MerryMage
5200bf41cf
A64: Implement FRINTN (scalar)
2020-04-22 20:46:20 +01:00
MerryMage
8718dc1692
A64: Implement FRINTA (scalar)
2020-04-22 20:46:20 +01:00
MerryMage
b228694012
IR: Implement FPRoundInt
2020-04-22 20:46:20 +01:00
MerryMage
e24054f4d7
fp: Implement FPRoundInt
2020-04-22 20:46:20 +01:00
MerryMage
f876e4afa2
fp: Implement FPProcessNaN
2020-04-22 20:46:20 +01:00
MerryMage
591adee443
fp/info: Add DefaultNaN
2020-04-22 20:46:20 +01:00
MerryMage
797e18cd97
fp: Move FPToFixed to its own file
2020-04-22 20:46:20 +01:00
MerryMage
295deb4035
a64_jit_state: Add FPSR.QC flag
2020-04-22 20:46:20 +01:00
Lioncash
7797bc2fb2
emit_x64_vector: Use non-scratch Use* variants of registers within EmitVectorUnsignedAbsoluteDifference()
...
In some cases, a register isn't modified, depending on the branch taken,
so we can signify this by using the non-scratch variants in certain
cases.
2020-04-22 20:46:20 +01:00
Lioncash
f7f83b76b7
simd_scalar_two_register_misc: Implement scalar double/single-precision variants of FCM{EQ, GE, GT, LE, LT} (zero)
2020-04-22 20:46:20 +01:00
Lioncash
9db6d1e98b
translate_arm: Remove unnecessary rotr() function
...
We already have RotateRight() in our common code, so we can remove this
function and use that instead. We can also implement ArmExpandImm_C()
in terms of ArmExpandImm().
2020-04-22 20:46:20 +01:00
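The relationship between the two functions named in the commit above can be sketched as follows. The signatures, the `ImmAndCarry` struct, and this `RotateRight` are assumptions written for illustration and may differ from the project's own code; the semantics follow the ARM modified-immediate encoding, in which an 8-bit value is rotated right by twice a 4-bit rotate field.

```cpp
#include <cstdint>

// Stand-in for a common rotate helper (assumed signature).
constexpr std::uint32_t RotateRight(std::uint32_t value, unsigned amount) {
    return (amount % 32) == 0
               ? value
               : (value >> (amount % 32)) | (value << (32 - (amount % 32)));
}

// Expands an ARM modified immediate: imm8 rotated right by 2 * rotate.
constexpr std::uint32_t ArmExpandImm(unsigned rotate, std::uint8_t imm8) {
    return RotateRight(imm8, rotate * 2);
}

// Hypothetical result type for the carry-producing variant.
struct ImmAndCarry {
    std::uint32_t imm32;
    bool carry;
};

// Built on top of ArmExpandImm: with no rotation the incoming carry is
// passed through, otherwise the carry is bit 31 of the rotated result.
constexpr ImmAndCarry ArmExpandImm_C(unsigned rotate, std::uint8_t imm8, bool carry_in) {
    return {ArmExpandImm(rotate, imm8),
            rotate == 0 ? carry_in : (ArmExpandImm(rotate, imm8) >> 31) != 0};
}
```

Layering the carry variant over the plain one keeps the rotation logic in a single place, which is the deduplication the commit describes.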
Lioncash
9f8a44c982
cast_util: Remove unnecessary typename
...
Given we use std::aligned_storage_t, we don't need to specify
typename here. If we used std::aligned_storage, then we would need to.
2020-04-22 20:46:19 +01:00
MerryMage
89e43867c1
A64: Implement FADDP (scalar)
2020-04-22 20:46:19 +01:00
MerryMage
33fa65de23
A64: Implement FADDP (vector)
2020-04-22 20:46:19 +01:00
MerryMage
9dba273a8c
A64: Implement SADDLP
2020-04-22 20:46:19 +01:00
MerryMage
70ff2d73b5
A64: Implement UADDLP
2020-04-22 20:46:19 +01:00
MerryMage
5563bbbd79
A64: Implement EXT
2020-04-22 20:46:19 +01:00
MerryMage
304cc7f61e
emit_x64_floating_point: SSE4.1 implementation for FP{Double,Single}ToFixed{S,U}{32,64}
2020-04-22 20:46:19 +01:00
MerryMage
3d9677d094
A64: Implement FCVTMU (scalar)
2020-04-22 20:46:19 +01:00
MerryMage
79c9018d60
A64: Implement FCVTMS (scalar)
2020-04-22 20:46:19 +01:00
MerryMage
49c4499a87
A64: Implement FCVTPU (scalar)
2020-04-22 20:46:19 +01:00
MerryMage
af661ef5a6
A64: Implement FCVTPS (scalar)
2020-04-22 20:46:19 +01:00
MerryMage
27319822bb
A64: Implement FCVTAU (scalar)
2020-04-22 20:46:19 +01:00
MerryMage
c0c7a26314
A64: Implement FCVTAS (scalar)
2020-04-22 20:46:19 +01:00
MerryMage
a1965a74a0
A64: Implement FCVTNU (scalar)
2020-04-22 20:46:19 +01:00
MerryMage
7d36dbcdfd
A64: Implement FCVTNS (scalar)
2020-04-22 20:46:19 +01:00
MerryMage
617ca0adf0
floating_point_conversion_integer: Refactor implementation of FCVTZS_float_int and FCVTZU_float_int
2020-04-22 20:46:19 +01:00
MerryMage
caaf36dfd6
IR: Initial implementation of FP{Double,Single}ToFixed{S,U}{32,64}
...
This implementation just falls back to the software floating point implementation.
2020-04-22 20:46:19 +01:00
MerryMage
760cc3ca89
EmitContext: Expose FPCR
2020-04-22 20:46:19 +01:00
MerryMage
9571269552
fp/op: Implement FPToFixed
2020-04-22 20:46:19 +01:00
MerryMage
8087e8df05
mantissa_util: Implement ResidualErrorOnRightShift
...
Accurately calculate residual error that is shifted out
2020-04-22 20:46:19 +01:00
MerryMage
8668d61881
fp/unpacked: Implement FPRound
2020-04-22 20:46:19 +01:00
MerryMage
55d590c01f
FPCR: Add AHP setter and FZ16 getter
2020-04-22 20:46:19 +01:00
MerryMage
7360a2579b
mp: Implement metaprogramming library
2020-04-22 20:46:19 +01:00
MerryMage
4ab029c114
fp: Implement FPUnpack
2020-04-22 20:46:19 +01:00
MerryMage
4875658917
fp: Implement FPProcessException
2020-04-22 20:46:19 +01:00
MerryMage
3cb98e1560
fp: Move fp_util to fp/util
2020-04-22 20:46:19 +01:00
MerryMage
c41a38b13e
fp: Add FPSR
2020-04-22 20:46:19 +01:00
MerryMage
66381352f3
fp: Add FPInfo
...
Provides information about floating-point format for various bit sizes
2020-04-22 20:46:19 +01:00
MerryMage
d21659152c
safe_ops: Implement safe shifting operations
...
Implement shifting operations that perform consistently across architectures
without running into undefined or implementation-defined behaviour.
2020-04-22 20:46:19 +01:00
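As a rough sketch of what "safe" shifting can mean here: in C++, shifting by an amount greater than or equal to the operand's bit width is undefined, and hardware disagrees on the result. The helpers below are hypothetical (the names and the choice to return zero for oversized shifts are assumptions, not the project's actual API) and simply make that case explicit.

```cpp
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical helper: logical shift left that yields 0 once every bit has
// been shifted out, instead of relying on undefined behaviour.
template <typename T>
constexpr T SafeLogicalShiftLeft(T value, unsigned shift) {
    static_assert(std::is_unsigned<T>::value, "T must be an unsigned type");
    return shift >= static_cast<unsigned>(std::numeric_limits<T>::digits)
               ? T{0}
               : static_cast<T>(value << shift);
}

// Hypothetical helper: logical shift right with the same guarantee.
template <typename T>
constexpr T SafeLogicalShiftRight(T value, unsigned shift) {
    static_assert(std::is_unsigned<T>::value, "T must be an unsigned type");
    return shift >= static_cast<unsigned>(std::numeric_limits<T>::digits)
               ? T{0}
               : static_cast<T>(value >> shift);
}
```

Because the oversized case is handled in portable code, the result no longer depends on how the host CPU masks shift counts.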
MerryMage
b00fe23b91
bit_util: Implement MostSignificantBit
2020-04-22 20:46:19 +01:00
MerryMage
95ad0d0a66
bit_util: Use Ones to implement Bits
2020-04-22 20:46:19 +01:00
MerryMage
62b640b2fa
bit_util: Add ClearBit and ModifyBit
2020-04-22 20:46:19 +01:00
MerryMage
8651c2d10e
u128: Implement u128
...
For when we need a 128-bit integer
2020-04-22 20:46:19 +01:00
Lioncash
e7409fdfe4
A64: Implement UCVTF (vector, integer)'s double/single-precision variant
2020-04-22 20:46:19 +01:00
Lioncash
4aa4885ba7
ir: Add opcodes for vector conversion of u32/u64 to floating-point
2020-04-22 20:46:19 +01:00
Lioncash
fcae4e2418
simd_three_different: Deduplicate common implementations
...
Generally, the only difference between the signed variants and the
unsigned variants is whether or not we use a sign-extension or
zero-extension, so we can simply use common functions to implement both
cases without totally duplicating code twice here.
2020-04-22 20:46:19 +01:00
Lioncash
9c0d5cf15c
floating_point_conversion_integer: Handle S64/U64 -> F32 conversions in SCVTF_float_int and UCVTF_float_int
2020-04-22 20:46:19 +01:00
Lioncash
7a84b6e8d8
ir: Add opcodes for converting S64 and U64 to single-precision floating-point values
2020-04-22 20:46:19 +01:00
Lioncash
066061fa50
constant_pool: Remove unnecessary std::memset from constructor
...
AllocateFromCodeSpace() already zeroes out the allocated memory.
2020-04-22 20:46:19 +01:00
Lioncash
a1d6a86e8c
A64: Implement ADDV
2020-04-22 20:46:19 +01:00
Lioncash
35026a6ce3
emit_x64_vector: Vectorize fallback path for EmitVectorMaxU32()
2020-04-22 20:46:19 +01:00
Lioncash
245c903129
simd_three_same: Join FPAbsoluteComparison() into FPCompareRegister()
...
These are part of the same comparison family, so there's no real point
in keeping them separate.
2020-04-22 20:46:19 +01:00
Lioncash
9912836b59
A64: Implement scalar double/single-precision variants of FACGE, FACGT, FCMEQ, FCMGE, FCMGT
2020-04-22 20:46:18 +01:00
MerryMage
0b97e9bd8d
emit_x64_floating_point: Fix EmitFPU64ToDouble for TowardsMinusInfinity rounding mode
2020-04-22 20:46:18 +01:00
MerryMage
a2eb9a02e0
backend_x86: Add FPSCR_RMode to EmitContext
2020-04-22 20:46:18 +01:00
MerryMage
d875c08ebf
fp: Extract common RoundingMode enum
2020-04-22 20:46:18 +01:00
Lioncash
3714bc0ed4
floating_point_conversion_integer: Use FPS64ToDouble and FPU64ToDouble in SCVTF_float_int and UCVTF_float_int
...
The opcodes introduced in 979b6f39f1621b80bd463645ec5b08661cb6b1bf can
also be used here, avoiding more falling back to the interpreter.
2020-04-22 20:46:18 +01:00
Lioncash
b97358075e
simd_scalar_two_register_misc: Handle 64-bit case in SCVTF and UCVTF's scalar double/single-precision variant
...
Avoids falling back to the interpreter in the 64-bit case.
2020-04-22 20:46:18 +01:00
Lioncash
7252293184
emit_x64_floating_point: Correct use of UseGpr() in EmitFPU32ToDouble() and EmitFPU32ToSingle()
...
In the non-AVX512 path, the following code is present:
code.mov(from.cvt32(), from.cvt32());
since this potentially modifies 'from', we should be using
UseScratchGpr() instead.
2020-04-22 20:46:18 +01:00
Lioncash
fbd7623fe5
emit_x64_floating_point: Add AVX512F conversion operations to EmitFPU32ToSingle() and EmitFPU32ToDouble()
...
AVX-512F provides convenient instructions for these kinds of conversions
directly
2020-04-22 20:46:18 +01:00
Lioncash
3a41465eaf
ir: Add opcodes for converting S64 and U64 to double-precision values
2020-04-22 20:46:18 +01:00
MerryMage
436ca80bcd
Merge branch 'global_monitor'
2020-04-22 20:46:18 +01:00
Lioncash
0f4bf26e05
simd_two_register_misc: Utilize FPVectorAbs in FABS implementations
...
Since we already have opcodes introduced to implement FACGE and FACGT,
we can reuse them for the FABS implementations.
2020-04-22 20:46:18 +01:00
MerryMage
821cff1227
A64: Add ClearExclusiveState method
2020-04-22 20:46:18 +01:00
Lioncash
81e572c78c
ir: Extend FPVectorAbs opcode to also handle 16-bit elements for FP16
2020-04-22 20:46:18 +01:00
MerryMage
2a8de5f733
a64_emit_x64: Clear exclusive state in EmitA64CallSupervisor
...
The kernel would have to execute an ERET instruction to return to
userland; this clears exclusive state.
2020-04-22 20:46:18 +01:00
Lioncash
53dbb6a92a
A64: Implement FACGE's vector single/double precision variants
2020-04-22 20:46:18 +01:00
MerryMage
57f7c7e1b0
Implement global exclusive monitor
2020-04-22 20:46:18 +01:00
Lioncash
6912a02d9b
A64: Implement FACGT's vector single/double precision variants
2020-04-22 20:46:18 +01:00
MerryMage
85234338d3
a64_emit_x64: Simplify EmitExclusiveWrite
2020-04-22 20:46:18 +01:00
Lioncash
fc731dddae
ir: Add opcodes for performing vector absolute floating-point values
...
This will be usable for implementing FACGE and FACGT
2020-04-22 20:46:18 +01:00
MerryMage
2fc6b33829
CMakeLists: Add missing files
2020-04-22 20:46:18 +01:00
Lioncash
0bee648b4f
emit_x64_vector: Deduplicate a bit of code in EmitVectorSetElement{8, 32, 64} functions
...
Given both branches are the same, we can hoist out the common code.
2020-04-22 20:46:18 +01:00
Lioncash
d86fea0d28
A64: Implement FCMEQ (zero)'s vector single and double precision variant
2020-04-22 20:46:18 +01:00
Lioncash
593eca7fb1
A64: Implement load/store single structure instructions
...
Implements LD{1, 2, 3, 4}, LD{1, 2, 3, 4}R, and ST{1, 2, 3, 4} single
structure variants.
2020-04-22 20:46:18 +01:00
Lioncash
9bec354791
A64: Implement FCMEQ (register)'s vector single and double precision variant
2020-04-22 20:46:18 +01:00
Lioncash
b6e223fc58
emit_x64_vector: Deduplicate a bit of code within EmitVectorGetElement8()
...
Given both branches use the same destination register size, we can hoist
the common code out.
2020-04-22 20:46:18 +01:00
Lioncash
5ce187a54e
ir: Add opcodes for floating-point vector equalities
2020-04-22 20:46:18 +01:00
MerryMage
be354dbfd0
ir/basic_block: Add missing U16 immediate type to DumpBlock
2020-04-22 20:46:18 +01:00
Lioncash
cf188448d4
emit_x64_vector: Vectorize fallback case in EmitVectorMultiply64()
...
Gets rid of the need to perform a fallback.
2020-04-22 20:46:18 +01:00
MerryMage
5503ff28c3
llvm_disassemble: Allow disassembly of invalid AArch64 instructions
2020-04-22 20:46:18 +01:00
Lioncash
954deff2d4
emit_x64_vector: Add break to final case in EmitVectorRoundingHalvingAddUnsigned()
...
This doesn't alter behavior but does make the code better if anything
else is ever added to this function in the future.
2020-04-22 20:46:18 +01:00
Lioncash
11a92eaaef
A64: Implement SRHADD and URHADD
2020-04-22 20:46:18 +01:00
Lioncash
9e75d08860
A64: Implement FABD's scalar single/double precision variant
2020-04-22 20:46:18 +01:00
Lioncash
bc718c5b28
ir: Add opcodes for performing rounding halving adds
2020-04-22 20:46:18 +01:00
Lioncash
d898d1779d
A64: Implement FABD's vector single/double precision variant
2020-04-22 20:46:18 +01:00
Lioncash
054549da35
emit_x64_vector: Simplify AVX-512 codepath in EmitVectorMultiply64
...
I realized I introduced a helper for simple AVX operation emitting, so
use that instead of writing it all out long-form.
2020-04-22 20:46:18 +01:00
Lioncash
8a4f8aed06
ir: Add opcode for performing FP vector absolute differences
2020-04-22 20:46:18 +01:00
Lioncash
cb456f914b
A64: Implement UMLAL{2}, UMLSL{2}, and UMULL{2}
...
Now that we have the helper function set up for the signed variants, we
can also modify it to be used with the unsigned ones by performing a zero
extension instead of a sign extension.
2020-04-22 20:46:18 +01:00
MerryMage
ba84e7a8de
A64: Implement FNMSUB
2020-04-22 20:46:18 +01:00
Lioncash
3576c02d91
A64: Implement SMLSL{2}
2020-04-22 20:46:18 +01:00
MerryMage
a1042cfcd8
A64: Implement FNMADD
2020-04-22 20:46:18 +01:00
Lioncash
ada5c0b2fa
A64: Implement SMLAL{2}
2020-04-22 20:46:18 +01:00
MerryMage
0d83032a6f
A64: Implement FMSUB
2020-04-22 20:46:18 +01:00
Lioncash
2d1aca25e6
A64: Implement SMULL{2}
2020-04-22 20:46:18 +01:00
MerryMage
69e00d225c
A64: Implement FMADD
2020-04-22 20:46:18 +01:00
MerryMage
8c90fcf58e
IR: Implement FPMulAdd
2020-04-22 20:46:18 +01:00
Lioncash
c5ae9107a9
A64: Implement SABAL/SABAL2 and SABDL/SABDL2
...
Now that we have a helper function for the unsigned variants, we can
modify it to also be usable with the signed variants.
2020-04-22 20:46:18 +01:00
Lioncash
24e3299276
A64: Implement FCMGT, FCMGE (register) vector double and single precision variants
2020-04-22 20:46:18 +01:00
Lioncash
26d4473851
A64: Implement UABAL/UABAL2
2020-04-22 20:46:18 +01:00
Lioncash
350bc70be8
A64: Implement FCMGT, FCMGE, FCMLE, FCMLT (zero) vector double and single precision variants.
2020-04-22 20:46:18 +01:00
Lioncash
3397742c74
A64: Implement UABDL/UABDL2
2020-04-22 20:46:18 +01:00
Lioncash
c695da1cf3
ir: Add opcode for floating-point GE and GT comparisons
...
The rest of the comparisons can be implemented in terms of these two
2020-04-22 20:46:18 +01:00
Lioncash
6de5ed96e5
emit_x64_vector: Emit VPMULLQ in EmitVectorMultiply64 on AVX-512{DQ, VL} capable CPUs
...
Shortens code-gen down to a single instruction in the 64-bit path.
2020-04-22 20:46:18 +01:00
Lioncash
9054d1c20b
A64: Implement LDR (literal, SIMD&FP)
2020-04-22 20:46:18 +01:00
Lioncash
0da5e949a8
Correct typo in DataCacheOperation enum
...
Fixes a typo for the InvalidateByVAToPoC enum entry. Given yuzu is the
only known user of 64-bit mode and it doesn't use this value, we can get
away with changing this.
2020-04-22 20:46:18 +01:00
Lioncash
9736e2cce2
A64: Implement FABS' half-precision variant
2020-04-22 20:46:18 +01:00
Lioncash
6e5750e4ec
A64: Implement FABS' single and double precision variant
2020-04-22 20:46:18 +01:00
Lioncash
7bce8d8757
A64: Implement URSHR (scalar) and URSRA (scalar)
...
Now that the utility function is all set up from implementing SRSRA, the
unsigned variants can now be trivially implemented by modifying the
utility function to perform a logical shift right instead of an
arithmetical shift right for the unsigned case.
2020-04-22 20:46:18 +01:00
Lioncash
1e70a589b0
A64: Implement SRSRA (scalar)
2020-04-22 20:46:18 +01:00
Lioncash
998aef07f6
A64: Implement SRSHR (scalar)
2020-04-22 20:46:17 +01:00
Lioncash
7c0250e9f8
A64: Implement SABA
2020-04-22 20:46:17 +01:00
Lioncash
f00789e6f7
A64: Implement SABD
2020-04-22 20:46:17 +01:00
Lioncash
1e10017f4b
ir: Add opcodes for signed absolute differences
2020-04-22 20:46:17 +01:00
Tillmann Karras
d3b44c1b5a
decoder_detail: use structured bindings
2020-04-22 20:46:17 +01:00
Lioncash
f745eb28bf
simd_two_register_misc: Handle 64-bit case for SCVTF_int_4
2020-04-22 20:46:17 +01:00
Lioncash
3f6c529da2
ir: Add opcode to perform the vector conversion S64->F64
...
Unfortunately x86 prior to AVX-512 doesn't really give us any convenient instruction to do the work for us
2020-04-22 20:46:17 +01:00
Lioncash
0e61ee6bf6
A64: Implement SHLL/SHLL2
2020-04-22 20:46:17 +01:00
Lioncash
43e6e98c3b
A64: Add missing decoding for PRFM (unscaled offset)
2020-04-22 20:46:17 +01:00
Lioncash
f2a85d5601
A64: Implement UHSUB
2020-04-22 20:46:17 +01:00
Lioncash
b33360a324
A64: Implement SHSUB
2020-04-22 20:46:17 +01:00
Lioncash
44a5f8095a
ir: Add opcodes for performing vector halving subtracts
2020-04-22 20:46:17 +01:00
Lioncash
4f37c0ec5a
A64: Implement SM4EKEY
2020-04-22 20:46:17 +01:00
Lioncash
3bde3347a5
A64: Implement SM4E
2020-04-22 20:46:17 +01:00
Lioncash
b312d28295
ir: Add an opcode for doing an SM4 lookup table query
2020-04-22 20:46:17 +01:00
Lioncash
27a6d5f6ce
emit_x64_vector: Use VPOPCNTB in EmitVectorPopulationCount() if AVX-512 BITALG is available
2020-04-22 20:46:17 +01:00
Lioncash
4dcc7724e0
A64: Implement UHADD
2020-04-22 20:46:17 +01:00
Lioncash
f8714f7250
A64: Implement SHADD
2020-04-22 20:46:17 +01:00
Lioncash
089096948a
ir: Add opcodes for performing halving adds
2020-04-22 20:46:17 +01:00
Lioncash
3d00dd63b4
emit_x64_vector: Emit VPMINSQ and VPMINUQ for 64-bit vector min operations if AVX-512VL is available
2020-04-22 20:46:17 +01:00
Lioncash
b97b71b8aa
emit_x64_vector: Emit VPMAXSQ and VPMAXUQ for 64-bit vector max operations if AVX-512VL is available
2020-04-22 20:46:17 +01:00
Lioncash
033e400df0
emit_x64_vector_floating_point: Deduplicate accurate NaN handling code
...
Allows the code to both be used from the 32 bit and 64 bit operations without duplicating code.
2020-04-22 20:46:17 +01:00
Lioncash
0f067b7330
emit_x64_vector: Emit VPABSQ in EmitVectorAbs() for the 64-bit case if AVX-512VL is available
2020-04-22 20:46:17 +01:00
Lioncash
d4ee878cbd
emit_x64_vector: Use VPSRAQ in EmitVectorArithmeticShiftRight64() if AVX-512VL is available
2020-04-22 20:46:17 +01:00
Lioncash
b38dd191bd
disassembler_arm: Remove rotation helper function in favor of Common::RotateRight
...
Mildly reduces the amount of duplicated behavior
2020-04-22 20:46:17 +01:00
Lioncash
51e4f1d9db
emit_x64_vector: Vectorize fallback path of EmitVectorMaxS32()
2020-04-22 20:46:17 +01:00
Lioncash
c692ccdd6d
emit_x64_vector: Vectorize fallback path of EmitVectorMaxS8()
2020-04-22 20:46:17 +01:00
Lioncash
b194313d8c
emit_x64_vector: Vectorize fallback path in EmitVectorMinU32()
2020-04-22 20:46:17 +01:00
Lioncash
7ceda6d919
emit_x64_vector: Vectorize fallback path in EmitVectorMinU16()
2020-04-22 20:46:17 +01:00
Lioncash
cda85a1da0
emit_x64_vector: Vectorize fallback path in EmitVectorMinS32()
2020-04-22 20:46:17 +01:00
Lioncash
6e08eed210
emit_x64_vector: Vectorize fallback path in EmitVectorMinS8()
2020-04-22 20:46:17 +01:00
Lioncash
0fb6dce689
emit_x64_vector: Remove unnecessary if constexpr expression in LogicalVShift
...
This can simply be merged with the previous one.
2020-04-22 20:46:17 +01:00
Lioncash
5b71b1337b
emit_x64_vector: Avoid left shift of negative value in LogicalVShift
...
Now that we handle the signed variants, we also have to be careful about left shifts with negative values,
as this is considered undefined behavior.
2020-04-22 20:46:17 +01:00
Lioncash
9954d28868
a64_jitstate: Zero SP and PC on construction of A64JitState
...
Given we zero out/reset everything else in the struct, do the same for these members to keep initialization consistent
2020-04-22 20:46:17 +01:00
Lioncash
4efbd40ea4
backend_x64/callback: Default virtual destructor in the cpp file
...
Prevents the vtable being generated in each translation unit that includes the header (and silences -Wweak-vtables warnings)
2020-04-22 20:46:17 +01:00
Lioncash
edd0b5c8c7
a32_interface/a64_interface: Change reinterpret_casts to static_casts in GetCurrentBlock thunks
...
It's well-defined to static_cast a void* to its proper type.
2020-04-22 20:46:17 +01:00
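A small sketch of the kind of thunk the commit above refers to. `FakeJit` and this `GetCurrentBlockThunk` signature are stand-ins invented for illustration; only the cast pattern is the point.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the object that a callback receives as an opaque pointer.
struct FakeJit {
    std::uint64_t current_block;
};

// Hypothetical thunk: the void* originally came from a FakeJit*, so
// static_cast back to FakeJit* is well defined and reinterpret_cast is
// not needed.
std::uint64_t GetCurrentBlockThunk(void* this_voidptr) {
    assert(this_voidptr != nullptr);
    auto* jit = static_cast<FakeJit*>(this_voidptr);
    return jit->current_block;
}
```

The cast is only valid when the pointer really did originate from an object of the target type, which is the case for a thunk that is always registered together with its own `this` pointer.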
Lioncash
e71612d394
A64: Implement SSHL (scalar)
2020-04-22 20:46:17 +01:00
Lioncash
ef1e69a1e3
A64: Implement SSHL (vector)
2020-04-22 20:46:17 +01:00
Lioncash
21974ee57e
backend_x64/ir: Amend generic LogicalVShift() template to also handle signed variants
...
Also adds IR opcodes to dispatch said variants
2020-04-22 20:46:17 +01:00
Lioncash
9fc89f0a0e
emit_x64_vector_floating_point: Use arrays for retrieving size instead of hardcoding the size
...
Similar changes were done in emit_x64_vector, but these were missed.
2020-04-22 20:46:17 +01:00
Lioncash
af28e89a13
emit_x64_vector: Vectorize fallback path in EmitVectorMaxU16()
2020-04-22 20:46:17 +01:00
Lioncash
cda75e2079
A64: Implement CMTST's scalar variant
2020-04-22 20:46:17 +01:00
Lioncash
0d20423ad5
emit_x64_vector: Vectorize non-SSE4.1 fallback path for VectorMultiply32()
2020-04-22 20:46:17 +01:00
Lioncash
d70ee7c0d1
emit_x64_vector: Use VPBROADCAST where applicable and available
...
Uses the instruction that does what it says in its name if available. Allows avoiding the use
of a scratch register in EmitVectorBroadcast8() and EmitVectorBroadcastLower8()'s SSSE3 path.
2020-04-22 20:46:17 +01:00
Lioncash
bebe7235ae
A64: Implement UZP1 and UZP2
2020-04-22 20:46:17 +01:00
Lioncash
26d77c6f09
ir: Add opcodes for performing vector deinterleaving
2020-04-22 20:46:17 +01:00
Lioncash
d6f9ed47d9
A64: Implement FNEG (half-precision)
2020-04-22 20:46:17 +01:00
Lioncash
7efbd73bac
A64: Implement USHL (scalar)
2020-04-22 20:46:17 +01:00
Lioncash
41f4717f2b
A64: Implement FNEG (vector)
2020-04-22 20:46:17 +01:00
Lioncash
ba1cc6366d
A64: Implement RSUBHN/RSUBHN2
2020-04-22 20:46:17 +01:00
Lioncash
e41640fe33
A64: Implement RADDHN/RADDHN2
2020-04-22 20:46:17 +01:00
Lioncash
b719a6b3f7
A64: Implement XAR
2020-04-22 20:46:17 +01:00
Lioncash
0b1b131ec2
simd_two_register_misc: Factor out common comparison code
...
Gets rid of a tiny bit of duplicated code.
2020-04-22 20:46:17 +01:00
Lioncash
ed0b84da70
A64: Implement CMLE (zero)'s vector variant
2020-04-22 20:46:17 +01:00
Lioncash
b595a68ffa
A64: Implement CMTST (vector)
2020-04-22 20:46:17 +01:00
Lioncash
48c7f8630c
A64: Implement ADDHN{2} and SUBHN{2}
2020-04-22 20:46:17 +01:00
Lioncash
3acd9c9200
translate: zero extend result in Vpart when storing to lower part of vector
2020-04-22 20:46:17 +01:00
Lioncash
87ca63699f
emit_x64_vector: Emit PMAXUD in EmitVectorMaxU32 on SSE4.1-capable CPUs
2020-04-22 20:46:17 +01:00
Lioncash
f17702f608
emit_x64_vector: Emit PMINUD in EmitVectorMinU32 on SSE4.1-capable CPUs
2020-04-22 20:46:17 +01:00
Lioncash
596a8dd1dd
emit_x64_vector: Emit PMINSD in EmitVectorMinS32 on SSE4.1-capable CPUs
...
Provides a better alternative to a fallback operation.
2020-04-22 20:46:17 +01:00
Lioncash
75fd4eaaaa
emit_x64_vector: Get rid of some magic numbers in loop bounds
2020-04-22 20:46:17 +01:00
Lioncash
7b80ac25eb
emit_x64_vector: Generify variable shift functions
2020-04-22 20:46:17 +01:00
Lioncash
4ec735f707
A64: Implement CMLE (zero)'s scalar variant
2020-04-22 20:46:17 +01:00
Lioncash
6534184df2
A64: Implement CMLT (zero)'s scalar single/double-precision variant
2020-04-22 20:46:17 +01:00
Lioncash
8863c9bb4b
A64: Implement SHA512H2
2020-04-22 20:46:17 +01:00
Lioncash
033b890e25
A64: Implement SHA512H
2020-04-22 20:46:17 +01:00
Lioncash
d1f5b084b4
A64: Handle S32->F32 case for SCVTF (vector)
2020-04-22 20:46:17 +01:00
Lioncash
38fa984b53
IR: Add opcode for packed word->f32 conversions
2020-04-22 20:46:16 +01:00
Lioncash
b8587d8e34
A64: Implement SHA512SU1
2020-04-22 20:46:16 +01:00
Lioncash
44d846045a
A64: Implement SHA512SU0
2020-04-22 20:46:16 +01:00
Lioncash
ca903c1585
A64: Implement SHA256H and SHA256H2
2020-04-22 20:46:16 +01:00
MerryMage
e4237c44eb
A64: Implement SCVTF (vector, integer), scalar variant
2020-04-22 20:46:16 +01:00
MerryMage
bfba38d0b6
impl: Reorganize scalar two-register misc instructions
2020-04-22 20:46:16 +01:00
Lioncash
ea582b17cc
A64: Implement SHA256SU1
2020-04-22 20:46:16 +01:00
Lioncash
06c5dcaf5e
simd_two_register_misc: Add missing zeroing of the vector for CMGT and CMLT
2020-04-22 20:46:16 +01:00
Lioncash
0d50d7314b
A64: Implement CMGE (zero)'s vector variant
2020-04-22 20:46:16 +01:00
Lioncash
ab35dc0e78
A64: Implement MLS (by element)
2020-04-22 20:46:16 +01:00
Lioncash
1651e60462
A64: Implement MUL (by element)
2020-04-22 20:46:16 +01:00
MerryMage
a86d4093cd
A64: Implement MLA (by element)
2020-04-22 20:46:16 +01:00
Lioncash
7f47402609
A64: Implement ABS (scalar)
2020-04-22 20:46:16 +01:00
Lioncash
c8eb4528be
A64: Implement SHA256SU0
2020-04-22 20:46:16 +01:00
Lioncash
181c3b0790
A64: Implement SHA1M
2020-04-22 20:46:16 +01:00
Lioncash
47bc97a71b
A64: Implement SHA1P
2020-04-22 20:46:16 +01:00
Lioncash
718f3e9bb4
A64: Implement scalar variants of CMEQ, CMGT, and CMGE zero comparison instructions
...
These can trivially use the ScalarCompare helper function.
2020-04-22 20:46:16 +01:00
Lioncash
3ad4e547e4
A64: Implement scalar variant of NEG
2020-04-22 20:46:16 +01:00
Lioncash
b4f3051e4b
simd: Relocate REV16, REV32 and REV64 vector variants to the proper file
...
These aren't scalar instruction variants.
2020-04-22 20:46:16 +01:00
Lioncash
19e276d10f
A64: Implement CMEQ (register, scalar)
2020-04-22 20:46:16 +01:00
Lioncash
5b8c9e5146
A64: Implement CMHS (register, scalar)
2020-04-22 20:46:16 +01:00
Lioncash
78bb12276a
A64: Implement CMHI (register, scalar)
2020-04-22 20:46:16 +01:00
Lioncash
c18b20b8d1
A64: Implement CMGE (register, scalar)
2020-04-22 20:46:16 +01:00
Lioncash
755981d0da
A64: Implement CMGT (register, scalar)
2020-04-22 20:46:16 +01:00
Lioncash
da6627124b
A64: Implement SHA1C
2020-04-22 20:46:16 +01:00
Lioncash
3c013bd9f8
A64: Implement SLI (scalar)
2020-04-22 20:46:16 +01:00
Lioncash
154cac594a
A64: Implement SRI (scalar)
2020-04-22 20:46:16 +01:00
Lioncash
6bcfdba1ad
general: Remove unused lambda captures
...
Resolves warnings that occur in Xcode 9.3
2020-04-22 20:46:16 +01:00