Lioncash
2274214ff0
constant_propagation_pass: Combine zero-extension folding code into its own function
...
Separates the behavior from the actual switch statement and gets rid of
duplication, now that we can use the general GetImmediateAsU64()
function.
2020-04-22 20:57:37 +01:00
Lioncash
4a3c064b15
ir/value: Add an IsZero() member function to Value's interface
...
By far, one of the most common things to check for is whether or not a
value is zero, as it typically allows folding away unnecesary
operations (other close contenders that can help with eliding operations are 1 and -1).
So instead of requiring a check for an immediate and then actually
retrieving the integral value and checking it, we can wrap it within a
function to make it more convenient.
2020-04-22 20:57:37 +01:00
Merry
c649f11c0a
Merge pull request #401 from lioncash/folding
...
constant_propagation_pass: Fold &, |, ^, and ~ operations where applicable
2020-04-22 20:56:01 +01:00
MerryMage
2524d536b0
A32/ir_emitter: Bugfix: ExceptionRaised was producing incorrect PC
...
Use actual PC and not pipelined PC.
2020-04-22 20:56:01 +01:00
Lioncash
c09f4cf28e
constant_propagation_pass: Fold NOT operations
2020-04-22 20:55:50 +01:00
Lioncash
d69fceec55
value: Move ImmediateToU64() to be a part of Value's interface
...
This'll make it slightly nicer to do basic constant folding for 32-bit
and 64-bit variants of the same IR opcode type. By that, I mean it's
possible to inspect immediate values without a bunch of conditional
checks beforehand to verify that it's possible to call GetU32() or
GetU64, etc.
2020-04-22 20:55:50 +01:00
Lioncash
8013548bbb
constant_propagation_pass: Fold OR operations
2020-04-22 20:55:50 +01:00
MerryMage
ca603c1215
reg_alloc: Emit AVX instructions where able
...
Smaller codesize.
2020-04-22 20:55:50 +01:00
Lioncash
898d096e39
constant_propagation_pass: Fold AND operations
2020-04-22 20:55:50 +01:00
MerryMage
e2358af5ef
abi: Emit AVX instructions where able
...
Smaller codesize.
2020-04-22 20:55:50 +01:00
Lioncash
f40fcda1f6
ir/value: Add member function to check whether or not all bits of a contained value are set
...
This is useful when we wish to know if a contained value is something
like 0xFFFFFFFF, as this helps perform constant folding. For example the
operation: x & 0xFFFFFFFF can be folded to just x in the 32-bit case.
2020-04-22 20:55:50 +01:00
MerryMage
7c0378f56d
a64_exclusive_monitor: Loosen memory ordering requirements
...
It is not necessary to be as strict as it was.
2020-04-22 20:55:50 +01:00
Lioncash
0ea99b7d59
constant_propagation_pass: Fold EOR operations
...
It's possible to fold cases of exclusive OR operations if they can be
known to be an identity operation, or if both operands happen to be known
immediates, in which case we can just store the result of the
exclusive-OR directly.
2020-04-22 20:55:50 +01:00
MerryMage
f0920c0ded
Fix VShift terminology
...
An arithmetic shift is by definition a signed shift, and a logical shift is by definition an unsigned shift.
- Rename VectorLogicalVShiftS* -> VectorArithmeticVShift*
- Rename VectorLogicalVShiftU* -> VectorLogicalVShift*
2020-04-22 20:55:50 +01:00
MerryMage
b51dae790d
emit_x64_vector: AVX512 implementation of EmitVectorLogicalVShiftS16
2020-04-22 20:55:50 +01:00
MerryMage
bd47f2ca8f
emit_x64_vector: AVX512 implementation of EmitVectorLogicalVShiftS64
2020-04-22 20:55:50 +01:00
MerryMage
3bf183d7e8
emit_x64_vector: AVX2 implementation of EmitVectorLogicalVShiftS32
2020-04-22 20:55:50 +01:00
MerryMage
94f9d402eb
emit_x64_vector: AVX512 implementation of EmitVectorLogicalVShiftU16()
2020-04-22 20:55:50 +01:00
MerryMage
6d9639e3b0
emit_x64_vector: AVX2 implementation of EmitVectorLogicalVShiftU64()
2020-04-22 20:55:50 +01:00
MerryMage
bbc066a266
emit_x64_vector: AVX2 implementation of EmitVectorLogicalVShiftU32()
2020-04-22 20:55:50 +01:00
Lioncash
da2e7fad87
emit_x64_vector: SSSE3 variant of EmitVectorCountLeadingZeros8()
...
pshufb lyfe
2020-04-22 20:55:50 +01:00
VelocityRa
c30b8dbe99
decoders: Cast to correctly-sized type before shifting
...
Fixes decoding for 64-bit instructions
Does not help/apply to any currently supported ARM versions (since
all are 32-bit length or below), it's for future-proofing should
such an arch be supported.
2020-04-22 20:55:50 +01:00
MerryMage
238f2f2cd0
a64_emit_x64: Lowercase PAGE_SIZE
...
PAGE_SIZE is defined as a macro by musl.
2020-04-22 20:55:50 +01:00
MerryMage
7162f6f254
emit_x64_vector_floating_point: SSE4.1 implementation of EmitFPVectorToFixed
2020-04-22 20:55:50 +01:00
MerryMage
e7a5592699
emit_x64_vector_floating_point: EmitFPVectorRoundInt: Use FCODE
2020-04-22 20:55:50 +01:00
MerryMage
b8fde48732
emit_x64_vector: AVX implementation for EmitVectorCountLeadingZeros8
2020-04-22 20:55:50 +01:00
MerryMage
fd37b637aa
emit_x64_vector: SSE implementation of EmitVectorCountLeadingZeros16
2020-04-22 20:55:50 +01:00
MerryMage
09bf273bc8
A64: Implement SCVTF, UCVTF (vector, fixed-point), scalar variant
2020-04-22 20:55:06 +01:00
MerryMage
03ad2072a7
emit_x64_floating_point: Reduce fallback LUT code in EmitFPToFixed
2020-04-22 20:55:06 +01:00
MerryMage
f9129db6fd
A64: Implement FCVTZS, FCVTZU, UCVTF, SCVTF (vector, fixed-point), vector variant
2020-04-22 20:55:06 +01:00
Lioncash
48df9b9a7d
A64: Implement UQSHL's vector immediate and register variants
2020-04-22 20:55:06 +01:00
Lioncash
d426dfe942
ir: Add opcodes for unsigned saturating left shifts
2020-04-22 20:55:06 +01:00
Lioncash
ab60720418
A64/translate/impl: Make signatures consistent for unimplemented by-element SIMD variants
...
Makes them all consistent, so it isn't necessary to change the
prototypes over when implementing them.
2020-04-22 20:55:06 +01:00
Lioncash
6b5ea6ee66
A64: Implement BRK
...
Currently, we can just implement this as part of the exception
interface, similar to how it's done for the A32 interface with BKPT.
2020-04-22 20:55:06 +01:00
Lioncash
b915364c16
A64/imm: Add full range of comparison operators to Imm template
...
Makes the comparison interface consistent by providing all of the
relevant members. This also modifies the comparison operators to take
the Imm instance by value, as it's really only a u32 under the covers,
and it's cheaper to shuffle around a u32 than a 64-bit pointer address.
2020-04-22 20:55:06 +01:00
MerryMage
02150bc0b7
IR: Add fbits argument to FPVectorFrom{Signed,Unsigned}Fixed
2020-04-22 20:55:06 +01:00
MerryMage
027b0ef725
A64: Implement SCVTF, UCVTF (scalar, fixed-point)
2020-04-22 20:55:06 +01:00
MerryMage
8051f60db0
opcodes.inc: Align columns to a tabstop of 4
2020-04-22 20:55:06 +01:00
MerryMage
90193b0e3d
IR: Add fbits argument to FixedToFP-related opcodes
2020-04-22 20:55:06 +01:00
Lioncash
616a153c16
A64: Implement SQSHL's vector immediate variant
2020-04-22 20:55:06 +01:00
Lioncash
e8b0f25dff
A64: Implement SQSHL's vector register variant
2020-04-22 20:55:06 +01:00
Lioncash
b14eaaec46
ir: Add opcodes for left signed saturated shifts
2020-04-22 20:55:06 +01:00
Lioncash
da55ed7b31
branch: Make variables const where applicable
2020-04-22 20:55:06 +01:00
Lioncash
867b666285
move_wide: Make variables const where applicable
2020-04-22 20:55:06 +01:00
Lioncash
78024a9dc4
load_store_register_unprivileged: Make variables const where applicable
2020-04-22 20:55:06 +01:00
Lioncash
e45e5da610
load_store_register_immediate: Place conditional bodies on their own line
...
Makes the conditionals visually consistent with the rest of the
codebase.
2020-04-22 20:55:06 +01:00
Lioncash
b586cf3f56
load_store_load_literal: Make variables const where applicable
2020-04-22 20:55:06 +01:00
Lioncash
c3a3b9687e
data_processing_logical: Move datasize declarations after early-exit conditionals
...
While we're at it, make variables const where applicable.
2020-04-22 20:55:06 +01:00
Lioncash
ed797e6540
data_processing_conditional_select: Make variables const where applicable
...
Makes CSEL's function consistent with all of the others.
2020-04-22 20:55:06 +01:00
Lioncash
c82fa5ec5a
data_processing_addsub: Move datasize declarations after early-exit conditionals
...
While we're at it, also make relevant variables const where applicable
2020-04-22 20:55:06 +01:00
Lioncash
f4a66d2477
data_processing_bitfield: Move datasize variables after early-exit conditionals
...
Moves the declaration of datasize to the scope that it's used within.
This also takes the opportunity to apply const where applicable, and
make early-exits all vertically consistent with one another.
2020-04-22 20:55:06 +01:00
Lioncash
2e0fcd6161
A64: Implement CLS's vector variant
...
Leverages CLZ like the integral variant does.
2020-04-22 20:55:06 +01:00
Lioncash
a2cd643525
emit_x64_vector: Make EmitVectorUnsignedSaturatedAccumulateSigned() internally linked
...
Given this is just an internal helper function, it can be marked static.
2020-04-22 20:55:06 +01:00
Lioncash
c39ea2e3c9
perf_map: Use std::string_view instead of std::string for PerfMapRegister()
...
We can just use a non-owning view into a string in this case instead of
potentially allocating a std::string instance.
2020-04-22 20:55:06 +01:00
MerryMage
12243692f5
A64: Implement SQRDMULH (vector), vector variant
2020-04-22 20:55:06 +01:00
MerryMage
a9ffcf08b1
A64: Implement SQDMULL (vector), vector variant
2020-04-22 20:55:06 +01:00
MerryMage
3e447614c6
IR: Add VectorSignedSaturatedDoublingMultiplyLong
2020-04-22 20:55:06 +01:00
MerryMage
06b31448aa
emit_x64_vector: Changes to VectorSignedSaturatedDoublingMultiply
...
* Return both the upper and lower parts of the multiply if required
* SSE2 does not support the pmuldq instruction, do sign correction to an unsigned result instead
* Improve port utilisation where possible (punpck instructions were a bottleneck)
2020-04-22 20:55:06 +01:00
MerryMage
08c0e017a5
IR: Implement Vector{Signed,Unsigned}Multiply{16,32}
2020-04-22 20:55:06 +01:00
Lioncash
b6df34cdde
backend_x64/a64_interface: Re-enable the constant folding pass
...
This was disabled for debugging, but never re-enabled. Just to be sure,
testing was done downstream in yuzu to make sure this didn't happen to
break anything (which seems to be the case).
2020-04-22 20:55:06 +01:00
MerryMage
06ba397af2
emit_x64_vector_floating_point: Hardware FMA implementation for RSqrtStepFused
2020-04-22 20:55:06 +01:00
MerryMage
e553c4fe8d
emit_x64_vector_floating_point: Hardware FMA implementation of FPVectorRecipStepFused
2020-04-22 20:55:06 +01:00
MerryMage
3caeb62ef1
emit_x64_floating_point: Hardware FMA implementation of FPRSqrtStepFused
2020-04-22 20:55:06 +01:00
MerryMage
344ee76aba
emit_x64_floating_point: Hardware FMA implementation of FPRecipStepFused{32,64}
2020-04-22 20:55:06 +01:00
MerryMage
1492573267
emit_x64_vector: SSE implementation of VectorSignedSaturatedAccumulateUnsigned{8,16,32}
2020-04-22 20:55:06 +01:00
Lioncash
26df6e5e7b
emit_x64_vector: Correct static asserts for < 64-bit type checks in saturated accumulate fallbacks
...
I had initially meant to use BitSize() here, not sizeof()
2020-04-22 20:55:06 +01:00
MerryMage
a4a26ac226
emit_x64_vector: EmitVectorSignedSaturatedAccumulateUnsigned64: SSE implementation
2020-04-22 20:55:06 +01:00
MerryMage
a7c66d2d28
emit_x64_vector: Simplify fpsr_qc related code
...
Move the bool conversion into A64JitState::GetFpsr so we don't have to continuously
pay the cost of conversion for every saturation instruction.
2020-04-22 20:55:06 +01:00
Lioncash
112cff9ab9
A64: Implement CLZ's vector variant
2020-04-22 20:55:06 +01:00
Lioncash
e739624296
ir: Add opcodes for vector CLZ operations
...
We can optimize these cases further for with the use of a fair bit of
shuffling via pshufb and the use of masks, but given the uncommon use of
this instruction, I wouldn't consider it to be beneficial in terms of
amount of code to be worth it over a simple manageable naive solution
like this.
If we ever do hit a case where vectorized CLZ happens to be a
bottleneck, then we can revisit this. At least with AVX-512CD, this can
be done with a single instruction for the 32-bit word case.
2020-04-22 20:55:05 +01:00
MerryMage
d4c37a68a8
A64/translate: VectorZeroUpper for V(64) stores
...
Ensures correctness.
2020-04-22 20:55:05 +01:00
MerryMage
b8daa4feac
simd_two_register_misc: FNEG (vector) with Q == 0 had dirty upper
2020-04-22 20:55:05 +01:00
Lioncash
5653e7637e
emit_x64_vector: Remove unnecessary [[maybe_unused]] attributes
...
These were unintentionally left in when introducing SUQADD and USQADD
2020-04-22 20:55:05 +01:00
Lioncash
14e026a7f0
A64: Implement USQADD's scalar and vector variants
2020-04-22 20:55:05 +01:00
Lioncash
d4a76aaa04
ir: Add opcodes form unsigned saturated accumulations of signed values
2020-04-22 20:55:05 +01:00
Lioncash
18ad7f237d
A64: Implement SUQADD's scalar and vector variants
2020-04-22 20:55:05 +01:00
Lioncash
6f911a26da
ir: Add opcodes for signed saturated accumulations of unsigned values
2020-04-22 20:55:05 +01:00
Lioncash
9a3d38d2ee
A64: Implement SMLAL{2}, SMLSL{2}, UMLAL{2}, and UMLSL{2}'s vector by-element variants
...
We can simply modify the general function made for SMULL{2} and
UMULL{2}'s by-element variants to also handle the other multiply-based
by-element variants.
2020-04-22 20:55:05 +01:00
Lioncash
6ccfbc9b39
A64: Implement UMULL{2}'s vector by-element variant
2020-04-22 20:55:05 +01:00
Lioncash
58e21f175c
A64: Implement SMULL{2}'s vector by-element variant
2020-04-22 20:55:05 +01:00
Lioncash
134bb02e19
ir/value: Replace includes with forward declarations
...
enum classes are still considered complete types when forward declared
(as the compiler knows the exact size of the type from the declaration
alone). The only difference in this case being that the members of the
enum class aren't visible. Given we don't use the members within this
header in any way, we can simply forward declare them here and remove
the inclusions.
2020-04-22 20:55:05 +01:00
Lioncash
2c8e07e7d0
ir/cond: Migrate to C++17 nested namespace specifiers
2020-04-22 20:55:05 +01:00
Lioncash
c3b7819a55
CMakeLists: Add missing cond.h header to file listing
...
Allows the file to show up within IDEs more easily.
2020-04-22 20:55:05 +01:00
Lioncash
0a3976059f
A64: Implement URSQRTE
2020-04-22 20:55:05 +01:00
Lioncash
b6e74fd17d
ir: Add opcodes for performing unsigned reciprocal square root estimates
2020-04-22 20:55:05 +01:00
Lioncash
bd3582e811
A64: Implement URECPE
2020-04-22 20:55:05 +01:00
Lioncash
af83360f89
ir: Add opcodes for unsigned reciprocal estimate
2020-04-22 20:55:05 +01:00
Lioncash
740ffa52ae
A64: Implement SQNEG's scalar and vector variant
2020-04-22 20:53:46 +01:00
Lioncash
fca7eddb9e
A64: Add opcodes for signed saturating negations
2020-04-22 20:53:46 +01:00
Lioncash
f1ebbcd7bc
emit_x64_vector: Simplify "position == 0" case for EmitVectorExtract()
...
In the event position is zero, we can just treat it as a NOP, given
there's no need to move the data.
2020-04-22 20:53:46 +01:00
Lioncash
87372917f9
emit_x64_vector: Simplify "position == 0" case for EmitVectorExtractLower()
...
In the event position == 0, we can just treat it as a simple movq,
clearing the upper half of the XMM register. This also makes that case
use only one register.
2020-04-22 20:53:46 +01:00
Lioncash
f5fb496e7e
A64: Implement SQDMULH's by-element scalar variant
2020-04-22 20:53:46 +01:00
Lioncash
40f0576995
A64: Implement SQDMULH's by-element vector variant
2020-04-22 20:53:46 +01:00
MerryMage
8f9206901d
backend/x64: Do not clear fast_dispatch_table if not enabled
...
There is no need to pay for the cost of setting a large block of memory if we're not using it.
2020-04-22 20:53:46 +01:00
MerryMage
9b65100660
A64: Implement FastDispatchHint
2020-04-22 20:53:46 +01:00
MerryMage
f96c43d422
A32: Implement FastDispatchHint
2020-04-22 20:53:46 +01:00
MerryMage
aa8d826c13
ir/terminal: Add FastDispatchHint
2020-04-22 20:53:46 +01:00
Lioncash
1a69a61cb4
A64: Implement SQDMULH's scalar variant
2020-04-22 20:53:46 +01:00
Lioncash
7ebfd0f31c
ir: Add opcodes for scalar signed saturated doubling multiplies
2020-04-22 20:53:46 +01:00
Lioncash
9c03311fed
A64: Implement SQDMULH's vector variant
2020-04-22 20:53:46 +01:00
Lioncash
a0231e5546
ir: Add opcodes for signed saturated doubling multiplies
2020-04-22 20:53:46 +01:00
Lioncash
db24e1f09b
A64: Implement SQABS' scalar variant
2020-04-22 20:53:46 +01:00
Lioncash
bda5d14c7f
A64: Implement SQABS' vector variant.
2020-04-22 20:53:46 +01:00
Lioncash
0507e47420
ir: Add opcodes for signed saturated absolute values
2020-04-22 20:53:46 +01:00
MerryMage
27427595b7
emit_x64_floating_point: EmitFPToFixed: maxsd optimization
...
maxsd is not required when doing a signed conversion, because x64
produces a 0x80...00 value for out of range values.
2020-04-22 20:53:46 +01:00
MerryMage
1abf82ac4a
emit_x64_floating_point: ZeroIfNaN: pxor -> xorps
...
xorps is shorter and more appropriate here.
2020-04-22 20:53:46 +01:00
MerryMage
3415828fb4
IR: Simplify FP{Single,Double}ToFixed{U,S}{32,64}
2020-04-22 20:53:46 +01:00
Lioncash
e30f9816ec
A32/decoder: Add missing <algorithm> includes
...
These includes should be present, as we use std::find_if() within these headers.
2020-04-22 20:53:46 +01:00
Lioncash
4507627905
emit_x64_vector: Provide AVX path for EmitVectorMinU64()
2020-04-22 20:53:46 +01:00
Lioncash
fd49a62b06
emit_x64_vector: Provide AVX path for EmitVectorMinS64()
2020-04-22 20:53:46 +01:00
Lioncash
770723f449
emit_x64_vector: Provide AVX path for EmitVectorMaxU64()
2020-04-22 20:53:46 +01:00
Lioncash
8fb90c0cf1
emit_x64_vector: Provide AVX path for EmitVectorMaxS64()
2020-04-22 20:53:46 +01:00
Lioncash
2cac6ad129
emit_x64_vector: Simplify EmitVectorLogicalLeftShift8()
...
Similar to EmitVectorLogicalRightShift8(), we can determine a mask ahead
of time and just and the results of a halfword left shift.
2020-04-22 20:53:46 +01:00
Lioncash
135107279d
emit_x64_vector: Simplify EmitVectorLogicalShiftRight8()
...
We can generate the mask and AND it against the result of a halfword
shift instead of looping.
2020-04-22 20:53:46 +01:00
Lioncash
2952b46b16
emit_x64_vector: Amend value definition in SSE 4.1 path for EmitVectorSignExtend16()
...
We should be defining the value after the results have been calculated
to be consistent with the rest of the code.
2020-04-22 20:53:46 +01:00
Lioncash
fda19095ea
emit_x64_vector: Remove fallback in EmitVectorSignExtend64()
...
This is fairly trivial to do manually.
2020-04-22 20:53:46 +01:00
Lioncash
39593fcd26
emit_x64_vector: Remove fallback for EmitVectorSignExtend32()
...
We can just do the extension manually, which gets rid of the need to
fall back here.
2020-04-22 20:53:46 +01:00
Lioncash
053175f69b
ir_emitter: Rename fpscr_controlled parameters to fpcr_controlled
...
Part of addressing #333
2020-04-22 20:53:46 +01:00
MerryMage
f0184c4b8d
a32/exception_generating: BPKT: Define unpredictable behaviour
...
Define unpredictable behaviour to be BKPT executes conditionally
2020-04-22 20:53:46 +01:00
MerryMage
a12854857b
A32: Add define_unpredictable_behaviour option
2020-04-22 20:53:46 +01:00
MerryMage
b0abaa8312
A32/location_descriptor: Change formatting to use hex
2020-04-22 20:53:46 +01:00
MerryMage
ccbf6c7f63
microinstruction: A32ExceptionRaised causes CPU exception
2020-04-22 20:53:46 +01:00
MerryMage
6595e49a31
A32/types: CondToString: Add nv
2020-04-22 20:53:46 +01:00
MerryMage
d5b9c4a4bb
block_of_code: Hide NX support behind compiler flag
...
Systems that require W^X can use the DYNARMIC_ENABLE_NO_EXECUTE_SUPPORT cmake option.
2020-04-22 20:53:46 +01:00
MerryMage
de4494ffa5
Implement perfmap
2020-04-22 20:53:46 +01:00
MerryMage
f73104633b
a32_emit_x64: Fix incorrect BMI2 implementation for SetCpsr
...
* The MSB for each byte in cpsr_ge were not being appropriately set.
* We also expand test coverage to test this case.
* We fix the disassembly of the MSR (imm) and MSR (reg) instructions as well.
2020-04-22 20:53:46 +01:00
MerryMage
3432a08e0a
backend/x64: Support W^X systems
...
Closes #176 .
2020-04-22 20:53:46 +01:00
BreadFish64
2a65442933
Backend: Create "backend" folder
...
similar to the "frontend" folder
2020-04-22 20:53:46 +01:00
MerryMage
3b13f1eb12
A64/translate: Standardize arguments of helper functions
...
Don't pass in IREmitter when TranslatorVisitor is already available.
2020-04-22 20:53:45 +01:00
MerryMage
a4e556d59c
A64/translate: Standardize TranslatorVisitor abbreviation
...
Prefer v to tv.
2020-04-22 20:53:45 +01:00
MerryMage
9a0dc61efd
emit_x64_vector: Avoid recalculating addresses in EmitVectorTableLookup
2020-04-22 20:53:45 +01:00
Lioncash
3d465e2c36
A64: Implement SQXTN, SQXTUN, and UQXTN's scalar variants
...
We can implement these in terms of the vector variants
2020-04-22 20:53:45 +01:00
Lioncash
4ff39c6ea8
A64: Implement SDOT and UDOT's (by element) variants
...
Gets all of the dot product instructions out of the way.
2020-04-22 20:53:45 +01:00
MerryMage
21df1fb539
emit_x64_vector: Don't load zero constant from memory in EmitVectorTableLookup
2020-04-22 20:53:45 +01:00
MerryMage
3bbcca8757
emit_x64_vector: Special-case is_defaults_zero && table_size == 2 in EmitVectorTableLookup
2020-04-22 20:53:45 +01:00
MerryMage
9cc00f900c
emit_x64_vector: Release registers when possible in EmitVectorTableLookup
2020-04-22 20:53:45 +01:00
MerryMage
a12afd1065
reg_alloc: Add the ability to Release an allocation early
2020-04-22 20:53:45 +01:00
MerryMage
e68bd3c6c1
emit_x64_vector: Special-case table_size == 1 in EmitVectorTableLookup
2020-04-22 20:53:45 +01:00
MerryMage
a4e1f8a63a
emit_x64_vector: SSE4.1 implementation of EmitVectorTableLookup
2020-04-22 20:53:45 +01:00
MerryMage
0c18b85c27
A64: Implement TBL and TBX
2020-04-22 20:53:45 +01:00
MerryMage
89d08c7d61
IR: Add VectorTable and VectorTableLookup IR instructions
2020-04-22 20:53:45 +01:00
MerryMage
0288974512
opcodes: Cleanup opcodes table
...
* Remove T:: prefix from types.
* Add another column for a 4th argument.
2020-04-22 20:53:45 +01:00
Lioncash
d9fc6cf31f
A64: Implement SDOT and UDOT's vector variant
2020-04-22 20:53:45 +01:00
Lioncash
cb5e5c5d49
A64: Implement SADALP and UADALP
...
While we're at it we can join the code for SADDLP and UADDLP with these
instructions, since the only difference is we do an accumulate at the
end of the operation.
2020-04-22 20:53:45 +01:00
Lioncash
29f8b30634
A64: Implement SRSHL and URSHL
...
Implements both scalar and vector variants.
2020-04-22 20:53:45 +01:00
Lioncash
0efa2ce3b0
ir: Add opcodes for performing rounding left shifts
2020-04-22 20:53:45 +01:00
MerryMage
656ceff225
emit_x64_floating_point: Fix smallest normal check in EmitFPMulAdd
2020-04-22 20:53:45 +01:00
Lioncash
f3f60cd179
A64: Implement ISB
...
Given we want to ensure that all instructions are fetched again, we can
treat an ISB instruction as a code cache flush.
2020-04-22 20:53:45 +01:00
Lioncash
be53e356a2
A64: Implement FCVTN{2}
2020-04-22 20:53:45 +01:00
Lioncash
4c3d7c5a8d
A64: Implement FCVTL{2}
2020-04-22 20:53:45 +01:00