Commit graph

246 commits

Author SHA1 Message Date
MerryMage
7ea521b8bf a32_emit_x64: Change ExclusiveWriteMemory64 to require a single U64 argument 2020-06-16 13:32:50 +01:00
MerryMage
4e90754873 IR: Implement VectorSaturated{Signed,Unsigned}{Add,Sub} 2020-05-30 15:55:32 +01:00
MerryMage
07108246cf A32/IR: Add SetVector and GetVector 2020-05-28 20:39:19 +01:00
MerryMage
d0b45f6150 A32: Implement ARMv8 VST{1-4} (multiple) 2020-05-17 17:01:39 +01:00
Fernando Sahmkow
97b9d3e058 Exclusive Monitor: Rework exclusive monitor interface. 2020-05-03 01:40:37 +01:00
MerryMage
f59b9fb020 IR: Add ReplicateBit microinstruction 2020-04-22 21:07:09 +01:00
MerryMage
09d3c77d74 IR: Add masked shift IR instructions
Also use these in the A64 frontend to avoid the need to mask the shift amount.
2020-04-22 21:06:17 +01:00
Lioncash
43fd2b400a frontend/ir_emitter: Add half-precision opcode for FPVectorEquals 2020-04-22 21:04:22 +01:00
Lioncash
bd755ae494 frontend/ir/ir_emitter: Add A32 equivalent to A64's SetCheckBit
This will be used in a subsequent change to implement ARMv6T2's CBZ/CBNZ
Thumb-1 instructions.
2020-04-22 21:02:47 +01:00
Lioncash
8316d231e9 A32: Implement barrier instructions introduced in ARMv7
Provides basic implementations of the barrier instruction introduced
within ARMv7. Currently these simply mirror the behavior of the AArch64
equivalents.
2020-04-22 21:02:46 +01:00
Merry
09ee64ea98 Merge pull request #482 from lioncash/fixedfp
A64: Handle half-precision variants of FP->Fixed instructions
2020-04-22 21:02:45 +01:00
Lioncash
604f39f00a frontend/ir_emitter: Add half-precision->fixed-point opcodes 2020-04-22 21:01:45 +01:00
Lioncash
96356fac93 frontend/ir_emitter: Add half-precision opcode variant of FPVectorRSqrtStepFused 2020-04-22 21:01:45 +01:00
Merry
45864133f5 Merge pull request #478 from lioncash/stepfused
A64: Handle half-precision variants of FRECPE and FRECPS
2020-04-22 21:01:44 +01:00
Lioncash
824c551ba2 frontend/ir_emitter: Add half-precision opcode variant of FPRSqrtStepFused 2020-04-22 21:01:44 +01:00
Lioncash
037acb17b9 frontend/ir_emitter: Add half-precision opcode variant for FPVectorRSqrtEstimate 2020-04-22 21:01:44 +01:00
Lioncash
5dba99b4f4 frontend/ir_emitter: Add half-precision opcode variant for FPRSqrtEstimate 2020-04-22 21:01:44 +01:00
Lioncash
825a3ea16f frontend/ir_emitter: Add half-precision opcode for FPVectorRecipEstimate 2020-04-22 21:01:44 +01:00
Lioncash
2184d24e8f frontend/ir_emitter: Add half-precision opcode for FPRecipEstimate 2020-04-22 21:01:44 +01:00
Lioncash
5d5c9f149f frontend/ir_emitter: Add half-precision opcode for FPVectorRecipStepFused 2020-04-22 21:01:44 +01:00
Lioncash
6da0411111 frontend/ir_emitter: Add half-precision opcode for FPRecipStepFused 2020-04-22 21:01:44 +01:00
Lioncash
5b4673da4b frontend/ir_emitter: Add half-precision variant of FPVectorRoundInt 2020-04-22 21:01:44 +01:00
Lioncash
ad0c698f89 frontend/ir_emitter: Add half-precision variant of FPRoundInt 2020-04-22 21:01:44 +01:00
Merry
cb9a1b18b6 Merge pull request #475 from lioncash/muladd
A64: Enable half-precision variants of floating-point multiply-add instructions
2020-04-22 21:01:44 +01:00
Lioncash
a4cadf1cd9 frontend/ir_emitter: Add opcodes for signed saturated left shifts with unsigned saturation 2020-04-22 21:01:44 +01:00
Lioncash
ec6b3ae084 ir/frontend: Add half-precision opcode for FPVectorMulAdd 2020-04-22 21:01:44 +01:00
Lioncash
bd82513199 frontend/ir_emitter: Add half-precision opcode for FPMulAdd 2020-04-22 21:01:44 +01:00
Merry
01bb1cdd88 Merge pull request #458 from lioncash/float-op
A64: Handle half-precision floating point in FABS, FNEG, and scalar FMOV
2020-04-22 20:58:12 +01:00
Lioncash
8309ec7a9f frontend/ir_emitter: Add half-precision variant of FPAbs 2020-04-22 20:58:12 +01:00
Lioncash
e4c259d69f frontend/ir_emitter: Add half->{single, double} and {double, single}->half conversion opcodes 2020-04-22 20:58:12 +01:00
Lioncash
c97efcb978 frontend/ir_emitter: Add half-precision variant of FPNeg 2020-04-22 20:58:12 +01:00
Lioncash
bd892ec4ef frontend/ir/ir_emitter: Amend FPRecipExponent to handle half-precision floating point 2020-04-22 20:58:11 +01:00
Merry
bbd5330ad2 Merge pull request #447 from lioncash/flag
A64: Implement CFINV, RMIF, AXFlag and XAFlag
2020-04-22 20:58:11 +01:00
Merry
fb039e232c Merge pull request #442 from lioncash/fcvtxn
A64: Implement scalar and vector variants of FCVTXN
2020-04-22 20:58:11 +01:00
Lioncash
597a8be5d5 ir: Add A64-specific opcodes for getting and setting raw NZCV values
This will be necessary to implement the flag manipulation and flag
format instructions.
2020-04-22 20:58:11 +01:00
Lioncash
5cf1478620 frontend/ir: Add opcodes for vector square roots 2020-04-22 20:58:10 +01:00
Lioncash
7c81a58ed3 frontend/ir/ir_emitter: Alter parameters of FPDoubleToSingle() and FPSingleToDouble() to pass along desired rounding mode
This will be necessary to special-case the non-IEEE Von Neumann rounding
to odd rounding mode.
2020-04-22 20:58:10 +01:00
Lioncash
9cf3c25811 frontend/ir/ir_emitter: Add opcodes for floating point reciprocal exponents 2020-04-22 20:58:10 +01:00
MerryMage
fa8925c4df IR: Implement FPVectorMulX 2020-04-22 20:57:37 +01:00
MerryMage
f0920c0ded Fix VShift terminology
An arithmetic shift is by definition a signed shift, and a logical shift is by definition an unsigned shift.

- Rename VectorLogicalVShiftS* -> VectorArithmeticVShift*
- Rename VectorLogicalVShiftU* -> VectorLogicalVShift*
2020-04-22 20:55:50 +01:00
Lioncash
d426dfe942 ir: Add opcodes for unsigned saturating left shifts 2020-04-22 20:55:06 +01:00
MerryMage
02150bc0b7 IR: Add fbits argument to FPVectorFrom{Signed,Unsigned}Fixed 2020-04-22 20:55:06 +01:00
MerryMage
8051f60db0 opcodes.inc: Align columns to a tabstop of 4 2020-04-22 20:55:06 +01:00
MerryMage
90193b0e3d IR: Add fbits argument to FixedToFP-related opcodes 2020-04-22 20:55:06 +01:00
Lioncash
b14eaaec46 ir: Add opcodes for left signed saturated shifts 2020-04-22 20:55:06 +01:00
MerryMage
3e447614c6 IR: Add VectorSignedSaturatedDoublingMultiplyLong 2020-04-22 20:55:06 +01:00
MerryMage
06b31448aa emit_x64_vector: Changes to VectorSignedSaturatedDoublingMultiply
* Return both the upper and lower parts of the multiply if required
* SSE2 does not support the pmuldq instruction, do sign correction to an unsigned result instead
* Improve port utilisation where possible (punpck instructions were a bottleneck)
2020-04-22 20:55:06 +01:00
MerryMage
08c0e017a5 IR: Implement Vector{Signed,Unsigned}Multiply{16,32} 2020-04-22 20:55:06 +01:00
Lioncash
e739624296 ir: Add opcodes for vector CLZ operations
We can optimize these cases further for with the use of a fair bit of
shuffling via pshufb and the use of masks, but given the uncommon use of
this instruction, I wouldn't consider it to be beneficial in terms of
amount of code to be worth it over a simple manageable naive solution
like this.

If we ever do hit a case where vectorized CLZ happens to be a
bottleneck, then we can revisit this. At least with AVX-512CD, this can
be done with a single instruction for the 32-bit word case.
2020-04-22 20:55:05 +01:00
Lioncash
d4a76aaa04 ir: Add opcodes form unsigned saturated accumulations of signed values 2020-04-22 20:55:05 +01:00