Commit graph

258 commits

Author SHA1 Message Date
Lioncash
cba9351b82 backend/x64/emit_*: Apply const where applicable 2020-04-22 21:04:22 +01:00
Lioncash
23f56bdb67 x64/exception_handler_windows: Join namespace declaration
Uses a nested namespace declaration like the rest of the codebase.
2020-04-22 21:04:22 +01:00
Lioncash
3bbb06c34a a64_emit_x64: Apply [[maybe_unused]] to unused lambda parameter
This can result in an unused variable warning on Windows otherwise.
2020-04-22 21:04:22 +01:00
Lioncash
bfa8035414 A32/A64: Make public header inclusions consistent
For all public header inclusions, we use the <> form of including them
as opposed to "", which we typically use for internal headers.
2020-04-22 21:04:22 +01:00
Lioncash
ccf923305c a32_interface: Remove duplicated documentation comments
These already exist in the header, so these ones can be removed.
2020-04-22 21:04:22 +01:00
Lioncash
182ceb2807 General: Make parameter names from declarations and implementations consistent
Most of the time when this occurs, it's a bug. Thankfully this isn't the
case. However, we can resolve these cases to make the codebase more
consistent.
2020-04-22 21:04:22 +01:00
Lioncash
6b9bf7868a General: Correct typos is code comments 2020-04-22 21:04:22 +01:00
Lioncash
6187de7ca7 a32_interface: std::move UserConfig where applicable
UserConfig instances contain up to 16 std::shared_ptr<Coprocessor>
instances. We can std::move here to avoid performing 16 redundant atomic
reference increment and decrement operations.

Mostly inconsequential on x64, but we may as well signify intent.
2020-04-22 21:04:22 +01:00
Lioncash
b13b6610b5 a32_interface: Default destructor in the cpp file
Makes it more consistent with code throughout the codebase.
2020-04-22 21:04:22 +01:00
MerryMage
5f8eb7c51c A32/location_descriptor: Add CPSR.IT to A32::LocationDescriptor 2020-04-22 21:04:22 +01:00
MerryMage
6e2cd35e4f a32_jitstate: Optimize runtime location descriptor calculation
Calculation is now one unaligned 64-bit load.
2020-04-22 21:04:22 +01:00
MerryMage
0de3993373 a32_jitstate: Remove fpsr_idc
We do not really have accurate FPSR state in any case.
2020-04-22 21:04:22 +01:00
MerryMage
6f49c0ef8e {a32,a64}_jitstate: Rename CPSR_* to cpsr_* 2020-04-22 21:04:22 +01:00
MerryMage
8cd7837839 a32_jitstate: Remove old_FPSCR 2020-04-22 21:04:22 +01:00
MerryMage
b3bb544bca a32_jitstate: Rename FPSCR_nzcv to fpsr_nzcv 2020-04-22 21:04:22 +01:00
MerryMage
76f986979d a32_jitstate: Rename FPSCR_mode to fpcr_mode 2020-04-22 21:04:22 +01:00
MerryMage
49fca15f90 {a32,a64}_jitstate: Rename FPSCR_IDC to fpsr_idc 2020-04-22 21:04:21 +01:00
MerryMage
622c02f537 {a32,a64}_jitstate: Remove FPSCR_UFC 2020-04-22 21:04:21 +01:00
MerryMage
366d63f4b4 a32_jitstate: Enable SSE FTZ and DAZ 2020-04-22 21:04:21 +01:00
MerryMage
f178562ee7 a32_jitstate: Remove exception trap enables from FPSCR_MODE_MASK
We don't currently use this for anything (we do not currently trap
floating point exceptions).

This frees these bits up for other purposes.
2020-04-22 21:04:21 +01:00
Merry
fd6222f0a1 Merge pull request #500 from lioncash/cbz
A32: Implement Thumb-1's CBZ/CBNZ instructions
2020-04-22 21:04:21 +01:00
Merry
bab4e29075 Merge pull request #498 from lioncash/ahp
A32/location_descriptor: Add AHP bit to the FPSCR mask
2020-04-22 21:04:21 +01:00
Lioncash
87083af733 general: Remove trailing spaces
General code-related cleanup. Gets rid of trailing spaces in the
codebase.
2020-04-22 21:04:21 +01:00
Lioncash
fdbafbc1ae x64/reg_alloc: Remove reference qualifier to variable in GetArgumentInfo()
The result of GetArg() is returned by value, so this is essentially
still a copy. While the previous code *is* valid, this communicates what
is actually happening a little more explicitly.
2020-04-22 21:04:20 +01:00
Lioncash
d02a4e6fc9 A32/location_descriptor: Add AHP bit to the FPSCR mask
Ensures the alternate half-precision state is preserved within the
location descriptors, which will be necessary when implementing the
half-precision extensions for VFP and NEON.
2020-04-22 21:02:47 +01:00
Lioncash
bd755ae494 frontend/ir/ir_emitter: Add A32 equivalent to A64's SetCheckBit
This will be used in a subsequent change to implement ARMv6T2's CBZ/CBNZ
Thumb-1 instructions.
2020-04-22 21:02:47 +01:00
Lioncash
c6e1fd1416 a64_emit_x64: Use const on locals where applicable
Normalizes the use of const in the source file.
2020-04-22 21:02:47 +01:00
Merry
f6f0b6da65 Merge pull request #497 from lioncash/boost
A32/coprocessor: Remove boost from public interface
2020-04-22 21:02:47 +01:00
Lioncash
fb437080be a32_emit_x64: Use const on locals where applicable
Normalizes the use of const in the source file.
2020-04-22 21:02:47 +01:00
Lioncash
5f9ba970b9 emit_x64: Use const on locals where applicable 2020-04-22 21:02:47 +01:00
Lioncash
92daae9513 A32/coprocessor: Remove boost from public interface
Removes a boost header from the public includes in favor of using the
standard-provided std::variant.

The use of boost in public interfaces is often a dealbreaker for some
people. Given we use std::optional in the header already, we can
transition over to std::variant from boost::variant.

With this removal, this makes all of our dependencies internal to the
library itself.
2020-04-22 21:02:47 +01:00
Lioncash
a40b921cb5 emit_x64: Remove unnecessary typename in GetBasicBlock()
This can be deduced from the name alone.
2020-04-22 21:02:47 +01:00
Lioncash
675f67e41d emit_x64_vector: Use const on locals where applicable
Normalizes the use of const in the source file.
2020-04-22 21:02:47 +01:00
Lioncash
cccbc7fd0e emit_x64_saturation: Use const on locals where applicable
Normalizes the use of const in the source file.
2020-04-22 21:02:47 +01:00
Lioncash
7316fa47b3 emit_x64_packed: Use const on locals where applicable
Normalizes the use of const across the source file.
2020-04-22 21:02:47 +01:00
Lioncash
3a9c2f81d0 block_of_code: Use variable template variants of type traits
Now all type traits are using the variable template variants where
applicable.
2020-04-22 21:02:47 +01:00
Lioncash
9b783a5527 emit_x64_data_processing: Use const on locals where applicable
Normalizes the use of const across the source file.
2020-04-22 21:02:47 +01:00
MerryMage
5debb411cc block_of_code: Explicitly delete copy constructor 2020-04-22 21:02:46 +01:00
MerryMage
605a43d23e Suppress MSVC warning C4702: unreachable code 2020-04-22 21:02:46 +01:00
Lioncash
8316d231e9 A32: Implement barrier instructions introduced in ARMv7
Provides basic implementations of the barrier instruction introduced
within ARMv7. Currently these simply mirror the behavior of the AArch64
equivalents.
2020-04-22 21:02:46 +01:00
Merry
09ee64ea98 Merge pull request #482 from lioncash/fixedfp
A64: Handle half-precision variants of FP->Fixed instructions
2020-04-22 21:02:45 +01:00
MerryMage
1e1e9c17c7 emit_x64_data_processing: Remove INVALID_REG
INVALID_REG.cvt8() now throws
2020-04-22 21:02:45 +01:00
Lioncash
604f39f00a frontend/ir_emitter: Add half-precision->fixed-point opcodes 2020-04-22 21:01:45 +01:00
Lioncash
96356fac93 frontend/ir_emitter: Add half-precision opcode variant of FPVectorRSqrtStepFused 2020-04-22 21:01:45 +01:00
Merry
45864133f5 Merge pull request #478 from lioncash/stepfused
A64: Handle half-precision variants of FRECPE and FRECPS
2020-04-22 21:01:44 +01:00
Lioncash
824c551ba2 frontend/ir_emitter: Add half-precision opcode variant of FPRSqrtStepFused 2020-04-22 21:01:44 +01:00
Lioncash
037acb17b9 frontend/ir_emitter: Add half-precision opcode variant for FPVectorRSqrtEstimate 2020-04-22 21:01:44 +01:00
Lioncash
5dba99b4f4 frontend/ir_emitter: Add half-precision opcode variant for FPRSqrtEstimate 2020-04-22 21:01:44 +01:00
Lioncash
825a3ea16f frontend/ir_emitter: Add half-precision opcode for FPVectorRecipEstimate 2020-04-22 21:01:44 +01:00
Lioncash
2184d24e8f frontend/ir_emitter: Add half-precision opcode for FPRecipEstimate 2020-04-22 21:01:44 +01:00
Lioncash
5d5c9f149f frontend/ir_emitter: Add half-precision opcode for FPVectorRecipStepFused 2020-04-22 21:01:44 +01:00
Lioncash
6da0411111 frontend/ir_emitter: Add half-precision opcode for FPRecipStepFused 2020-04-22 21:01:44 +01:00
Lioncash
5b4673da4b frontend/ir_emitter: Add half-precision variant of FPVectorRoundInt 2020-04-22 21:01:44 +01:00
Lioncash
ad0c698f89 frontend/ir_emitter: Add half-precision variant of FPRoundInt 2020-04-22 21:01:44 +01:00
Merry
cb9a1b18b6 Merge pull request #475 from lioncash/muladd
A64: Enable half-precision variants of floating-point multiply-add instructions
2020-04-22 21:01:44 +01:00
Merry
13f421c27d Merge pull request #473 from lioncash/sqshlu
A64: Implement SQSHLU
2020-04-22 21:01:44 +01:00
Merry
d7da53a74b Merge pull request #472 from lioncash/exception
general: Mark hash functions as noexcept
2020-04-22 21:01:44 +01:00
Lioncash
a4cadf1cd9 frontend/ir_emitter: Add opcodes for signed saturated left shifts with unsigned saturation 2020-04-22 21:01:44 +01:00
Lioncash
ec6b3ae084 ir/frontend: Add half-precision opcode for FPVectorMulAdd 2020-04-22 21:01:44 +01:00
Lioncash
bd82513199 frontend/ir_emitter: Add half-precision opcode for FPMulAdd 2020-04-22 21:01:44 +01:00
Lioncash
7bb5440507 general: Mark hash functions as noexcept
Generally hash functions shouldn't throw exceptions. It's also a
requirement for the standard library-provided hash functions to not
throw exceptions.

An exception to this rule is made for user-defined specializations,
however we can just be consistent with the standard library on this to
allow it to play nicer with it.

While we're at it, we can also make the std::less specializations
noexcpet as well, since they also can't throw.
2020-04-22 21:01:43 +01:00
Lioncash
fe95575b95 general: Replace unreachable-imitating assertions with UNREACHABLE()
We can just use the self-documenting assertion for indicating
unreachable paths, instead of manually passing false and providing a
message.
2020-04-22 21:01:43 +01:00
Lioncash
b37279f65c backend/x64/emit_x64_vector: Prevent undefined behavior within VectorSignedSaturatedShiftLeft
Avoids undefined behavior by potentially left-shifting a signed negative
value.
2020-04-22 21:00:47 +01:00
MerryMage
13e8b7b516 emit_x64_floating_point: F16C implementation of FPSingleToHalf 2020-04-22 20:58:17 +01:00
MerryMage
d32d6fe598 emit_x64_floating_point: F16C implementation of FPHalfToSingle and FPHalfToDouble 2020-04-22 20:58:12 +01:00
MerryMage
a53ba12be2 emit_x64_floating_point: Factor out ConvertRoundingModeToX64Immediate 2020-04-22 20:58:12 +01:00
MerryMage
5a2adc6629 backend/x64: Expose FPCR in EmitContext instead of its subcomponents 2020-04-22 20:58:12 +01:00
Merry
01bb1cdd88 Merge pull request #458 from lioncash/float-op
A64: Handle half-precision floating point in FABS, FNEG, and scalar FMOV
2020-04-22 20:58:12 +01:00
Lioncash
8309ec7a9f frontend/ir_emitter: Add half-precision variant of FPAbs 2020-04-22 20:58:12 +01:00
Lioncash
e4c259d69f frontend/ir_emitter: Add half->{single, double} and {double, single}->half conversion opcodes 2020-04-22 20:58:12 +01:00
Lioncash
c97efcb978 frontend/ir_emitter: Add half-precision variant of FPNeg 2020-04-22 20:58:12 +01:00
Lioncash
bd892ec4ef frontend/ir/ir_emitter: Amend FPRecipExponent to handle half-precision floating point 2020-04-22 20:58:11 +01:00
Merry
bbd5330ad2 Merge pull request #447 from lioncash/flag
A64: Implement CFINV, RMIF, AXFlag and XAFlag
2020-04-22 20:58:11 +01:00
Merry
fb039e232c Merge pull request #442 from lioncash/fcvtxn
A64: Implement scalar and vector variants of FCVTXN
2020-04-22 20:58:11 +01:00
Lioncash
597a8be5d5 ir: Add A64-specific opcodes for getting and setting raw NZCV values
This will be necessary to implement the flag manipulation and flag
format instructions.
2020-04-22 20:58:11 +01:00
Lioncash
5cf1478620 frontend/ir: Add opcodes for vector square roots 2020-04-22 20:58:10 +01:00
Lioncash
7c81a58ed3 frontend/ir/ir_emitter: Alter parameters of FPDoubleToSingle() and FPSingleToDouble() to pass along desired rounding mode
This will be necessary to special-case the non-IEEE Von Neumann rounding
to odd rounding mode.
2020-04-22 20:58:10 +01:00
Merry
9f11720a69 Merge pull request #437 from lioncash/frecpx
A64: Implement FRECPX (single, double precision)
2020-04-22 20:58:10 +01:00
Lioncash
9cf3c25811 frontend/ir/ir_emitter: Add opcodes for floating point reciprocal exponents 2020-04-22 20:58:10 +01:00
Lioncash
2e180a7f14 backend/x64/a32_interface: Mark Context move constructor and move assignment as noexcept
Provides a more "correct" move constructor/assignment operator, since
these relevant functions shouldn't throw exceptions.

Has the benefit of playing nicely with std::move_if_noexcept and other
noexcept library facilities.
2020-04-22 20:58:09 +01:00
Lioncash
deb9dd4acc block_of_code: Replace cast with [[maybe_unused]] in DoesCpuSupport() 2020-04-22 20:58:09 +01:00
Lioncash
3290a9fdc2 common: Remove address_range.h
The AddressRange structure isn't used anywhere within the codebase, so
this can be removed. Particularly because there's no real appeal/heavy
potential use of it in the future that isn't trivial to add back if
needed.
2020-04-22 20:57:38 +01:00
Lioncash
93351c7efb a64_emit_x64: Make constness of loop elements explicit within GenFastmemFallbacks() 2020-04-22 20:57:37 +01:00
Lioncash
7752ffc50c a64_emit_x64: Convert std::vector instances in GenFastmemFallbacks() to std::array
Given these are quite small, we can avoid the need to heap allocate
here.
2020-04-22 20:57:37 +01:00
MerryMage
7c8fcaef26 emit_x64_vector_floating_point: AVX && DN implementation of EmitFPVectorMulX 2020-04-22 20:57:37 +01:00
MerryMage
fa8925c4df IR: Implement FPVectorMulX 2020-04-22 20:57:37 +01:00
V.Kalyuzhny
764a93bf5a Switch boost::optional to std::optional 2020-04-22 20:57:37 +01:00
Lioncash
d69fceec55 value: Move ImmediateToU64() to be a part of Value's interface
This'll make it slightly nicer to do basic constant folding for 32-bit
and 64-bit variants of the same IR opcode type. By that, I mean it's
possible to inspect immediate values without a bunch of conditional
checks beforehand to verify that it's possible to call GetU32() or
GetU64, etc.
2020-04-22 20:55:50 +01:00
MerryMage
ca603c1215 reg_alloc: Emit AVX instructions where able
Smaller codesize.
2020-04-22 20:55:50 +01:00
MerryMage
e2358af5ef abi: Emit AVX instructions where able
Smaller codesize.
2020-04-22 20:55:50 +01:00
MerryMage
7c0378f56d a64_exclusive_monitor: Loosen memory ordering requirements
It is not necessary to be as strict as it was.
2020-04-22 20:55:50 +01:00
MerryMage
f0920c0ded Fix VShift terminology
An arithmetic shift is by definition a signed shift, and a logical shift is by definition an unsigned shift.

- Rename VectorLogicalVShiftS* -> VectorArithmeticVShift*
- Rename VectorLogicalVShiftU* -> VectorLogicalVShift*
2020-04-22 20:55:50 +01:00
MerryMage
b51dae790d emit_x64_vector: AVX512 implementation of EmitVectorLogicalVShiftS16 2020-04-22 20:55:50 +01:00
MerryMage
bd47f2ca8f emit_x64_vector: AVX512 implementation of EmitVectorLogicalVShiftS64 2020-04-22 20:55:50 +01:00
MerryMage
3bf183d7e8 emit_x64_vector: AVX2 implementation of EmitVectorLogicalVShiftS32 2020-04-22 20:55:50 +01:00
MerryMage
94f9d402eb emit_x64_vector: AVX512 implementation of EmitVectorLogicalVShiftU16() 2020-04-22 20:55:50 +01:00
MerryMage
6d9639e3b0 emit_x64_vector: AVX2 implementation of EmitVectorLogicalVShiftU64() 2020-04-22 20:55:50 +01:00
MerryMage
bbc066a266 emit_x64_vector: AVX2 implementation of EmitVectorLogicalVShiftU32() 2020-04-22 20:55:50 +01:00
Lioncash
da2e7fad87 emit_x64_vector: SSSE3 variant of EmitVectorCountLeadingZeros8()
pshufb lyfe
2020-04-22 20:55:50 +01:00
MerryMage
238f2f2cd0 a64_emit_x64: Lowercase PAGE_SIZE
PAGE_SIZE is defined as a macro by musl.
2020-04-22 20:55:50 +01:00
MerryMage
7162f6f254 emit_x64_vector_floating_point: SSE4.1 implementation of EmitFPVectorToFixed 2020-04-22 20:55:50 +01:00
MerryMage
e7a5592699 emit_x64_vector_floating_point: EmitFPVectorRoundInt: Use FCODE 2020-04-22 20:55:50 +01:00
MerryMage
b8fde48732 emit_x64_vector: AVX implementation for EmitVectorCountLeadingZeros8 2020-04-22 20:55:50 +01:00
MerryMage
fd37b637aa emit_x64_vector: SSE implementation of EmitVectorCountLeadingZeros16 2020-04-22 20:55:50 +01:00
MerryMage
03ad2072a7 emit_x64_floating_point: Reduce fallback LUT code in EmitFPToFixed 2020-04-22 20:55:06 +01:00
MerryMage
f9129db6fd A64: Implement FCVTZS, FCVTZU, UCVTF, SCVTF (vector, fixed-point), vector variant 2020-04-22 20:55:06 +01:00
Lioncash
d426dfe942 ir: Add opcodes for unsigned saturating left shifts 2020-04-22 20:55:06 +01:00
MerryMage
02150bc0b7 IR: Add fbits argument to FPVectorFrom{Signed,Unsigned}Fixed 2020-04-22 20:55:06 +01:00
MerryMage
90193b0e3d IR: Add fbits argument to FixedToFP-related opcodes 2020-04-22 20:55:06 +01:00
Lioncash
b14eaaec46 ir: Add opcodes for left signed saturated shifts 2020-04-22 20:55:06 +01:00
Lioncash
a2cd643525 emit_x64_vector: Make EmitVectorUnsignedSaturatedAccumulateSigned() internally linked
Given this is just an internal helper function, it can be marked static.
2020-04-22 20:55:06 +01:00
Lioncash
c39ea2e3c9 perf_map: Use std::string_view instead of std::string for PerfMapRegister()
We can just use a non-owning view into a string in this case instead of
potentially allocating a std::string instance.
2020-04-22 20:55:06 +01:00
MerryMage
12243692f5 A64: Implement SQRDMULH (vector), vector variant 2020-04-22 20:55:06 +01:00
MerryMage
3e447614c6 IR: Add VectorSignedSaturatedDoublingMultiplyLong 2020-04-22 20:55:06 +01:00
MerryMage
06b31448aa emit_x64_vector: Changes to VectorSignedSaturatedDoublingMultiply
* Return both the upper and lower parts of the multiply if required
* SSE2 does not support the pmuldq instruction, do sign correction to an unsigned result instead
* Improve port utilisation where possible (punpck instructions were a bottleneck)
2020-04-22 20:55:06 +01:00
MerryMage
08c0e017a5 IR: Implement Vector{Signed,Unsigned}Multiply{16,32} 2020-04-22 20:55:06 +01:00
Lioncash
b6df34cdde backend_x64/a64_interface: Re-enable the constant folding pass
This was disabled for debugging, but never re-enabled. Just to be sure,
testing was done downstream in yuzu to make sure this didn't happen to
break anything (which seems to be the case).
2020-04-22 20:55:06 +01:00
MerryMage
06ba397af2 emit_x64_vector_floating_point: Hardware FMA implementation for RSqrtStepFused 2020-04-22 20:55:06 +01:00
MerryMage
e553c4fe8d emit_x64_vector_floating_point: Hardware FMA implementation of FPVectorRecipStepFused 2020-04-22 20:55:06 +01:00
MerryMage
3caeb62ef1 emit_x64_floating_point: Hardware FMA implementation of FPRSqrtStepFused 2020-04-22 20:55:06 +01:00
MerryMage
344ee76aba emit_x64_floating_point: Hardware FMA implementation of FPRecipStepFused{32,64} 2020-04-22 20:55:06 +01:00
MerryMage
1492573267 emit_x64_vector: SSE implementation of VectorSignedSaturatedAccumulateUnsigned{8,16,32} 2020-04-22 20:55:06 +01:00
Lioncash
26df6e5e7b emit_x64_vector: Correct static asserts for < 64-bit type checks in saturated accumulate fallbacks
I had initially meant to use BitSize() here, not sizeof()
2020-04-22 20:55:06 +01:00
MerryMage
a4a26ac226 emit_x64_vector: EmitVectorSignedSaturatedAccumulateUnsigned64: SSE implementation 2020-04-22 20:55:06 +01:00
MerryMage
a7c66d2d28 emit_x64_vector: Simplify fpsr_qc related code
Move the bool conversion into A64JitState::GetFpsr so we don't have to continuously
pay the cost of conversion for every saturation instruction.
2020-04-22 20:55:06 +01:00
Lioncash
e739624296 ir: Add opcodes for vector CLZ operations
We can optimize these cases further for with the use of a fair bit of
shuffling via pshufb and the use of masks, but given the uncommon use of
this instruction, I wouldn't consider it to be beneficial in terms of
amount of code to be worth it over a simple manageable naive solution
like this.

If we ever do hit a case where vectorized CLZ happens to be a
bottleneck, then we can revisit this. At least with AVX-512CD, this can
be done with a single instruction for the 32-bit word case.
2020-04-22 20:55:05 +01:00
Lioncash
5653e7637e emit_x64_vector: Remove unnecessary [[maybe_unused]] attributes
These were unintentionally left in when introducing SUQADD and USQADD
2020-04-22 20:55:05 +01:00
Lioncash
d4a76aaa04 ir: Add opcodes form unsigned saturated accumulations of signed values 2020-04-22 20:55:05 +01:00
Lioncash
6f911a26da ir: Add opcodes for signed saturated accumulations of unsigned values 2020-04-22 20:55:05 +01:00
Lioncash
b6e74fd17d ir: Add opcodes for performing unsigned reciprocal square root estimates 2020-04-22 20:55:05 +01:00
Lioncash
af83360f89 ir: Add opcodes for unsigned reciprocal estimate 2020-04-22 20:55:05 +01:00
Lioncash
fca7eddb9e A64: Add opcodes for signed saturating negations 2020-04-22 20:53:46 +01:00
Lioncash
f1ebbcd7bc emit_x64_vector: Simplify "position == 0" case for EmitVectorExtract()
In the event position is zero, we can just treat it as a NOP, given
there's no need to move the data.
2020-04-22 20:53:46 +01:00
Lioncash
87372917f9 emit_x64_vector: Simplify "position == 0" case for EmitVectorExtractLower()
In the event position == 0, we can just treat it as a simple movq,
clearing the upper half of the XMM register. This also makes that case
use only one register.
2020-04-22 20:53:46 +01:00
MerryMage
8f9206901d backend/x64: Do not clear fast_dispatch_table if not enabled
There is no need to pay for the cost of setting a large block of memory if we're not using it.
2020-04-22 20:53:46 +01:00
MerryMage
9b65100660 A64: Implement FastDispatchHint 2020-04-22 20:53:46 +01:00
MerryMage
f96c43d422 A32: Implement FastDispatchHint 2020-04-22 20:53:46 +01:00
MerryMage
aa8d826c13 ir/terminal: Add FastDispatchHint 2020-04-22 20:53:46 +01:00
Lioncash
7ebfd0f31c ir: Add opcodes for scalar signed saturated doubling multiplies 2020-04-22 20:53:46 +01:00
Lioncash
a0231e5546 ir: Add opcodes for signed saturated doubling multiplies 2020-04-22 20:53:46 +01:00
Lioncash
0507e47420 ir: Add opcodes for signed saturated absolute values 2020-04-22 20:53:46 +01:00
MerryMage
27427595b7 emit_x64_floating_point: EmitFPToFixed: maxsd optimization
maxsd is not required when doing a signed conversion, because x64
produces a 0x80...00 value for out of range values.
2020-04-22 20:53:46 +01:00
MerryMage
1abf82ac4a emit_x64_floating_point: ZeroIfNaN: pxor -> xorps
xorps is shorter and more appropriate here.
2020-04-22 20:53:46 +01:00
Lioncash
4507627905 emit_x64_vector: Provide AVX path for EmitVectorMinU64() 2020-04-22 20:53:46 +01:00
Lioncash
fd49a62b06 emit_x64_vector: Provide AVX path for EmitVectorMinS64() 2020-04-22 20:53:46 +01:00
Lioncash
770723f449 emit_x64_vector: Provide AVX path for EmitVectorMaxU64() 2020-04-22 20:53:46 +01:00
Lioncash
8fb90c0cf1 emit_x64_vector: Provide AVX path for EmitVectorMaxS64() 2020-04-22 20:53:46 +01:00
Lioncash
2cac6ad129 emit_x64_vector: Simplify EmitVectorLogicalLeftShift8()
Similar to EmitVectorLogicalRightShift8(), we can determine a mask ahead
of time and just and the results of a halfword left shift.
2020-04-22 20:53:46 +01:00
Lioncash
135107279d emit_x64_vector: Simplify EmitVectorLogicalShiftRight8()
We can generate the mask and AND it against the result of a halfword
shift instead of looping.
2020-04-22 20:53:46 +01:00
Lioncash
2952b46b16 emit_x64_vector: Amend value definition in SSE 4.1 path for EmitVectorSignExtend16()
We should be defining the value after the results have been calculated
to be consistent with the rest of the code.
2020-04-22 20:53:46 +01:00