dynarmic

Author	SHA1	Message	Date
Lioncash	b2f7a0e7ba	A32: Implement ARM-mode SDIV/UDIV Now that we have Unicorn in place, we can freely implement instructions introduced in newer versions of the ARM architecture.	2020-04-22 21:02:46 +01:00
Lioncash	c0ae23bbb7	A32/translate_thumb: Clean up formatting Performs a similar tidying up of the Thumb translator, like what was done with the regular ARM translator, to make it consistent with the rest of the codebase. The A32 backend (both Thumb and ARM) will likely see more changes in the near future, so this just acts as a "dusting off".	2020-04-22 21:02:46 +01:00
Merry	837c23a8ec	Merge pull request #483 from lioncash/invert frontend/ir/cond: Remove unused invert() function	2020-04-22 21:02:46 +01:00
Lioncash	d12e375481	common/fp/op/FPConvert: Remove unnecessary casts in FPConvert() These were made unnecessary in 2c2fdb435cf8e358a0c5b907ce8131e434df3f22, but were missed during the initial removal.	2020-04-22 21:02:46 +01:00
Merry	09ee64ea98	Merge pull request #482 from lioncash/fixedfp A64: Handle half-precision variants of FP->Fixed instructions	2020-04-22 21:02:45 +01:00
MerryMage	1e1e9c17c7	emit_x64_data_processing: Remove INVALID_REG INVALID_REG.cvt8() now throws	2020-04-22 21:02:45 +01:00
Lioncash	06ec6ab0da	frontend/ir/cond: Remove unused invert() function This is no longer used by anything in the codebase, so it can be removed.	2020-04-22 21:01:46 +01:00
Merry	d71f51b0da	Merge pull request #481 from lioncash/alloc ir/basic_block: Forward declare headers where applicable	2020-04-22 21:01:46 +01:00
Lioncash	64e3d233f4	A64: Handle half-precision variants of FP->Fixed-point instructions	2020-04-22 21:01:45 +01:00
Lioncash	4fc531f71b	ir/basic_block: Forward declare headers where applicable Now that the constructor and destructors have been placed within the cpp file, we can forward declare the memory pool data structures. Now, a change to the memory pool code won't ripple across the entirety of the IR emitter.	2020-04-22 21:01:45 +01:00
Lioncash	427b7afd66	frontend/ir/microinstruction: Add missing fixed-point opcodes to ReadsFromAndWritesToFPSRCumulativeExceptionBits()	2020-04-22 21:01:45 +01:00
Lioncash	c9777ef997	common/fp/info: Make half-precision info struct functions return correctly sized types While initially done to potentially prevent creating bugs due to C++ having a silly type-promotion mechanism involving types < sizeof(int) and unsignedness, given that the bulk of these functions' usages are on exit paths, these can return the correct type to avoid the need to cast at every usage point.	2020-04-22 21:01:45 +01:00
Lioncash	9309d95b17	ir/block: Default ctor and dtor in the cpp file Prevents potentially inlining allocation code everywhere. While we're at it, also explicitly delete/default the copy/move constructor/assignment operators to be explicit about them.	2020-04-22 21:01:45 +01:00
Lioncash	604f39f00a	frontend/ir_emitter: Add half-precision->fixed-point opcodes	2020-04-22 21:01:45 +01:00
Lioncash	4ecfbc14de	common/fp/op/FPToFixed: Add half-precision specialization of FPToFixed	2020-04-22 21:01:45 +01:00
Lioncash	471eb77bc9	A64: Implement FRSQRTS' half-precision vector variant	2020-04-22 21:01:45 +01:00
Lioncash	f9b2862217	A64: Implement FRSQRTS' half-precision scalar variant With the necessary machinery in place, we can now handle the half-precision variant.	2020-04-22 21:01:45 +01:00
Lioncash	96356fac93	frontend/ir_emitter: Add half-precision opcode variant of FPVectorRSqrtStepFused	2020-04-22 21:01:45 +01:00
Merry	45864133f5	Merge pull request #478 from lioncash/stepfused A64: Handle half-precision variants of FRECPE and FRECPS	2020-04-22 21:01:44 +01:00
Lioncash	824c551ba2	frontend/ir_emitter: Add half-precision opcode variant of FPRSqrtStepFused	2020-04-22 21:01:44 +01:00
Lioncash	3739d92097	A64: Implement half-precision vector variant of FRECPE	2020-04-22 21:01:44 +01:00
Lioncash	e3b2eb57b5	common/fp/op/FPRSqrtStepFused: Add half-precision specialization for FPRSqrtStepFused	2020-04-22 21:01:44 +01:00
Lioncash	7b212ec8ae	A64: Implement half-precision variant of FRSQRTE's vector variant	2020-04-22 21:01:44 +01:00
Lioncash	0945a491bd	A64: Implement half-precision scalar variant of FRECPE	2020-04-22 21:01:44 +01:00
Lioncash	77c84bcf9b	A64: Implement half-precision variant of FRSQRTE's scalar variant	2020-04-22 21:01:44 +01:00
Lioncash	86b7626a2f	A64: Implement half-precision vector variant of FRECPS	2020-04-22 21:01:44 +01:00
Lioncash	037acb17b9	frontend/ir_emitter: Add half-precision opcode variant for FPVectorRSqrtEstimate	2020-04-22 21:01:44 +01:00
Lioncash	de43f011a7	A64: Implement half-precision scalar variant of FRECPS	2020-04-22 21:01:44 +01:00
Lioncash	5dba99b4f4	frontend/ir_emitter: Add half-precision opcode variant for FPRSqrtEstimate	2020-04-22 21:01:44 +01:00
Lioncash	825a3ea16f	frontend/ir_emitter: Add half-precision opcode for FPVectorRecipEstimate	2020-04-22 21:01:44 +01:00
Lioncash	726b9914c5	common/fp/op/FPRSqrtEstimate: Add half-precision specialization for FPRSqrtEstimate	2020-04-22 21:01:44 +01:00
Lioncash	2184d24e8f	frontend/ir_emitter: Add half-precision opcode for FPRecipEstimate	2020-04-22 21:01:44 +01:00
Lioncash	af2e5afed6	common/fp/op: Add half-precision specialization for FPRecipEstimate	2020-04-22 21:01:44 +01:00
Lioncash	d7f394fc1a	A64: Enable half-precision vector FRINT* variants	2020-04-22 21:01:44 +01:00
Lioncash	5d5c9f149f	frontend/ir_emitter: Add half-precision opcode for FPVectorRecipStepFused	2020-04-22 21:01:44 +01:00
Lioncash	24f583c498	A64: Enable half-precision variants of floating-point FRINT* variants With all the backing machinery in place, we can remove the fallback check for half-precision.	2020-04-22 21:01:44 +01:00
Lioncash	6da0411111	frontend/ir_emitter: Add half-precision opcode for FPRecipStepFused	2020-04-22 21:01:44 +01:00
Lioncash	fb829b9525	frontend/microinstruction: Add FPVectorRoundInt types to ReadsFromAndWritesToFPSRCumulativeExceptionBits() All variants were previously missing from this.	2020-04-22 21:01:44 +01:00
Lioncash	68d8cd2b13	common/fp/op: Add half-precision specialization for FPRecipStepFused	2020-04-22 21:01:44 +01:00
Lioncash	5b4673da4b	frontend/ir_emitter: Add half-precision variant of FPVectorRoundInt	2020-04-22 21:01:44 +01:00
Lioncash	ad0c698f89	frontend/ir_emitter: Add half-precision variant of FPRoundInt	2020-04-22 21:01:44 +01:00
Lioncash	61cec94a19	fp/op/FPRoundInt: Add half-precision specialization of FPRoundInt	2020-04-22 21:01:44 +01:00
Merry	cb9a1b18b6	Merge pull request #475 from lioncash/muladd A64: Enable half-precision variants of floating-point multiply-add instructions	2020-04-22 21:01:44 +01:00
Merry	d6db7ad46c	Merge pull request #474 from lioncash/bracing load_store_*: Make bracing consistent and variables const where applicable	2020-04-22 21:01:44 +01:00
Merry	1b6520f5dd	A64/location_descriptor: Ensure FZ16 is included in the FPCR mask	2020-04-22 21:01:44 +01:00
Merry	13f421c27d	Merge pull request #473 from lioncash/sqshlu A64: Implement SQSHLU	2020-04-22 21:01:44 +01:00
Lioncash	b5bf890584	load_store_*: Make bracing consistent and variables const where applicable Makes bracing consistent, and variables const where applicable to be consistent with the rest of the codebase. In most bracing cases, they'd need to be added to conditionals that would involve checking stack pointer alignment in the future anyways.	2020-04-22 21:01:44 +01:00
Lioncash	9a58c3f1c7	A64: Implement FMLA/FMLS' half-precision vector indexed variants	2020-04-22 21:01:44 +01:00
Merry	d7da53a74b	Merge pull request #472 from lioncash/exception general: Mark hash functions as noexcept	2020-04-22 21:01:44 +01:00
Lioncash	9dcc04e106	A64: Implement SQSHLU's scalar variant	2020-04-22 21:01:44 +01:00
Merry	b91c6c8bae	Merge pull request #471 from lioncash/sqrdmulh A64: Implement SQRDMULH's scalar vector variant	2020-04-22 21:01:44 +01:00
Lioncash	1fdd3ef8a0	A64: Implement FMLA/FMLS' half-precision scalar indexed variants	2020-04-22 21:01:44 +01:00
Lioncash	2d59d10ac8	A64: Implement SQSHLU's vector variant The vector shift by immediate category is now fully implemented.	2020-04-22 21:01:44 +01:00
Merry	b5e25959d9	Merge pull request #470 from lioncash/assert general: Replace unreachable-imitating assertions with UNREACHABLE()	2020-04-22 21:01:44 +01:00
Lioncash	d6606deda2	A64: Implement half-precision vector variants of FMLA/FMLS	2020-04-22 21:01:44 +01:00
Lioncash	a4cadf1cd9	frontend/ir_emitter: Add opcodes for signed saturated left shifts with unsigned saturation	2020-04-22 21:01:44 +01:00
Lioncash	ec6b3ae084	ir/frontend: Add half-precision opcode for FPVectorMulAdd	2020-04-22 21:01:44 +01:00
Lioncash	5f74d25bf7	A64: Enable half-precision floating point variants of FP data-processing three register instructions This handles half-precision floating point for: - FMADD - FMSUB - FNMADD - FNMSUB	2020-04-22 21:01:44 +01:00
Lioncash	bd82513199	frontend/ir_emitter: Add half-precision opcode for FPMulAdd	2020-04-22 21:01:44 +01:00
Lioncash	79a892d23c	fp/op/FPMulAdd: Add half-precision floating-point specialization	2020-04-22 21:01:44 +01:00
Lioncash	7bb5440507	general: Mark hash functions as noexcept Generally hash functions shouldn't throw exceptions. It's also a requirement for the standard library-provided hash functions to not throw exceptions. An exception to this rule is made for user-defined specializations, however we can just be consistent with the standard library on this to allow it to play nicer with it. While we're at it, we can also make the std::less specializations noexcept as well, since they also can't throw.	2020-04-22 21:01:43 +01:00
Lioncash	3b46b4a37d	A64: Implement SQRDMULH's scalar vector variant Implements the scalar variant in terms of the vector variant for the time being.	2020-04-22 21:01:43 +01:00
Lioncash	fe95575b95	general: Replace unreachable-imitating assertions with UNREACHABLE() We can just use the self-documenting assertion for indicating unreachable paths, instead of manually passing false and providing a message.	2020-04-22 21:01:43 +01:00
Merry	4a3d808354	Merge pull request #468 from lioncash/const ir_opt: Mark locals as const where applicable	2020-04-22 21:01:43 +01:00
Lioncash	64de80839e	A64/impl: Reorganize peculiar void use in V_scalar To a reader this might look particularly strange, given the function itself has a void return value, but this is actually valid, given the function in the return statement also has a void return value. This instead alters it to be a little easier to parse and potentially be a little less confusing at a glance.	2020-04-22 21:01:43 +01:00
Merry	9a4e3b24e4	Merge pull request #467 from lioncash/reserved A64: Handle reserved instruction cases more specifically where applicable	2020-04-22 21:01:43 +01:00
Merry	0b794cbcea	Merge pull request #466 from lioncash/fcmla A64: Implement FCMLA's indexed element variant	2020-04-22 21:01:43 +01:00
Merry	994349d154	Merge pull request #465 from neobrain/master CMakeLists: Allow importing dynarmic build trees into other CMake projects	2020-04-22 21:01:43 +01:00
Lioncash	cfd7513a7d	ir_opt/verification_pass: Mark locals as const where applicable Makes our immutable state a little more explicit.	2020-04-22 21:01:40 +01:00
Lioncash	8309d49588	A64: Handle reserved instruction cases more specifically where applicable These are cases that are defined as reserved within the ARMv8 reference manual, so we can handle them as such instead of as unallocated encodings. While this doesn't actually change emulated behavior, it does at least allow the JIT to generate the more appropriate exception.	2020-04-22 21:00:47 +01:00
Lioncash	6c2c68bce6	A64: Implement FCMLA's indexed element variant With this, all of the instructions introduced with ARMv8.3-CompNum have an implementation.	2020-04-22 21:00:47 +01:00
Tony Wasserka	7d99a6c00f	CMakeLists: Allow importing dynarmic build trees into other CMake projects	2020-04-22 21:00:47 +01:00
Lioncash	1a45f35b9c	ir_opt/a64_callback_config_pass: Mark locals as const where applicable Makes our immutable state a little more explicit.	2020-04-22 21:00:47 +01:00
Lioncash	7bc7042104	simd_scalar_shift_by_immediate: Change UnallocatedEncoding() path in SaturatingShiftLeft to ReservedValue() Strictly speaking, immh being zero is defined as reserved in the ARMv8 reference manual. This was just an error on my part when introducing the SQSHL immediate scalar variant.	2020-04-22 21:00:47 +01:00
Lioncash	dc97977576	ir_opt/a32_get_set_elimination_pass: Mark local variables as const where applicable Makes our intended immutable state slightly more explicit.	2020-04-22 21:00:47 +01:00
Lioncash	b1b4487e4d	A64: Implement UQSHL (immediate)'s scalar variant Like SQSHL's immediate scalar variant, we can also implement UQSHL's immediate scalar variant in terms of the vector variant for the time being.	2020-04-22 21:00:47 +01:00
Lioncash	3649dc6d9a	A64: Implement scalar variant of SQSHL (immediate) This can be handled in terms of the vector variant for the time being.	2020-04-22 21:00:47 +01:00
Lioncash	7d535eaba6	ir_opt/a32_constant_memory_reads_pass: Apply const where applicable to locals Makes immutable state just slightly more explicit.	2020-04-22 21:00:47 +01:00
Lioncash	e1b4ff1068	simd_scalar_shift_by_immediate: Migrate SQSHL implementation to file-scope function This will allow it to be reused for the implementation of UQSHL.	2020-04-22 21:00:47 +01:00
Lioncash	b37279f65c	backend/x64/emit_x64_vector: Prevent undefined behavior within VectorSignedSaturatedShiftLeft Avoids undefined behavior by potentially left-shifting a signed negative value.	2020-04-22 21:00:47 +01:00
Lioncash	46eae8cf2f	common/fp/op/FPRecipExponent: Prevent undefined behavior from shifting a negative value Due to promotion rules (types < int, even if unsigned, get promoted to int when arithmetic is performed on them), this is a potential spot for undefined behavior.	2020-04-22 21:00:47 +01:00
MerryMage	13e8b7b516	emit_x64_floating_point: F16C implementation of FPSingleToHalf	2020-04-22 20:58:17 +01:00
MerryMage	d32d6fe598	emit_x64_floating_point: F16C implementation of FPHalfToSingle and FPHalfToDouble	2020-04-22 20:58:12 +01:00
MerryMage	a53ba12be2	emit_x64_floating_point: Factor out ConvertRoundingModeToX64Immediate	2020-04-22 20:58:12 +01:00
MerryMage	5a2adc6629	backend/x64: Expose FPCR in EmitContext instead of its subcomponents	2020-04-22 20:58:12 +01:00
Merry	01bb1cdd88	Merge pull request #458 from lioncash/float-op A64: Handle half-precision floating point in FABS, FNEG, and scalar FMOV	2020-04-22 20:58:12 +01:00
Lioncash	28a8b4d210	A64: Handle half-precision floating point in scalar FMOV This is simply performing a scalar value transfer between registers without conversions, so this is trivial to handle as-is.	2020-04-22 20:58:12 +01:00
Lioncash	d7ac5a664f	A64: Handle half-precision floating point in FCVTL Like FCVTN, now that we have half-precision floating point conversion functions available, we can go ahead and use those to eliminate the interpreter fallback.	2020-04-22 20:58:12 +01:00
Lioncash	fe84ecb780	A64: Handle half-precision floating point in scalar FABS Now that we have the half-precision variant of the opcode added, we can simply handle the instruction instead of treating it as undefined.	2020-04-22 20:58:12 +01:00
Lioncash	fac9224d5e	A64: Handle half-precision floating point in FCVTN Now that we have IR instructions for performing conversions with half-precision floating point, we can also handle half-precision values within FCVTN.	2020-04-22 20:58:12 +01:00
Lioncash	8309ec7a9f	frontend/ir_emitter: Add half-precision variant of FPAbs	2020-04-22 20:58:12 +01:00
Lioncash	16de99d3e3	A64: Enable FCVT floating-point conversions for half-precision With this, we no longer have to fall back to the interpreter in any of the FCVT floating-point conversion instructions.	2020-04-22 20:58:12 +01:00
Lioncash	10abc77fad	A64: Handle half-precision floating point in scalar FNEG With the half-precision variant of the FPNeg opcode added, we can utilize it here to emulate the half-precision variant of FNEG.	2020-04-22 20:58:12 +01:00
Lioncash	e4c259d69f	frontend/ir_emitter: Add half->{single, double} and {double, single}->half conversion opcodes	2020-04-22 20:58:12 +01:00
Lioncash	c97efcb978	frontend/ir_emitter: Add half-precision variant of FPNeg	2020-04-22 20:58:12 +01:00
Lioncash	dff5da1063	common/fp/unpacked: Amend behavior of FPUnpackCV This is supposed to call FPUnpackBase instead of FPUnpack. This would result in alternate half-precision representations being misinterpreted when it comes to dealing with NaNs.	2020-04-22 20:58:12 +01:00
Merry	f01afc5ae6	Merge pull request #456 from lioncash/mov A64: Enable FMOV (general) for half-precision floating point	2020-04-22 20:58:12 +01:00
Lioncash	03bc2334fe	common/fp/op/FPConvert: Amend off-by-one in double NaN case in FPConvertNaN Avoids potentially clobbering the intended sign bit value during conversions to double-precision values. The other conversion types are already properly handled, so those don't need to be addressed.	2020-04-22 20:58:12 +01:00
Lioncash	c57b146fb2	common/fp/op/FPConvert: Add half-precision instantiations to FPConvert	2020-04-22 20:58:12 +01:00
Merry	c1ce94872d	Merge pull request #455 from lioncash/sqrdmulh-scalar A64: Implement SQRDMULH and SQDMULL's scalar indexed variants	2020-04-22 20:58:11 +01:00
Lioncash	25a7256ee1	A64: Enable FMOV (general) for half-precision floating point This just transfers values between vector registers and general-purpose registers with no conversions performed, so this is trivial to add support for half-precision to.	2020-04-22 20:58:11 +01:00
Lioncash	97dd3d0596	A64: Implement SQRDMULH's scalar indexed element variant	2020-04-22 20:58:11 +01:00
Lioncash	49b51e34f1	simd_vector_x_indexed_element: Deduplicate index and Vm operand construction	2020-04-22 20:58:11 +01:00
Lioncash	692aba91b6	A64: Implement SQDMULL{2}'s scalar indexed element variant	2020-04-22 20:58:11 +01:00
Lioncash	c043b831d5	A64: Implement SQDMULL{2}'s by-element variant	2020-04-22 20:58:11 +01:00
Lioncash	72af5a3dff	simd_scalar_x_indexed_element: Factor out index and Vm argument construction This will be useful in the implementations of SQRDMULH and SQDMULL{2} as well.	2020-04-22 20:58:11 +01:00
Lioncash	224ff0afaa	A64: Implement SQRDMULH's by-index vector variant	2020-04-22 20:58:11 +01:00
Lioncash	3a3542414b	A64: Implement FRECPX's half-precision floating point variant	2020-04-22 20:58:11 +01:00
Lioncash	bd892ec4ef	frontend/ir/ir_emitter: Amend FPRecipExponent to handle half-precision floating point	2020-04-22 20:58:11 +01:00
Lioncash	974fbf0677	frontend/ir/value: Add U16U32U64 type to represent floating point types	2020-04-22 20:58:11 +01:00
Lioncash	eb3e0d5908	common/fp/op/FPRecipExponent: Add half-precision floating point specialization	2020-04-22 20:58:11 +01:00
Lioncash	a829c93406	common/fp/unpacked: Correct edge-cases within FPUnpack for half-precision floating point This corrects one case where floating-point exceptions could be set when they're not supposed to be. This also corrects a case where values were being treated as NaNs when they weren't supposed to be.	2020-04-22 20:58:11 +01:00
Lioncash	7030b9af95	common/fp/process_nan: Add half-precision instantiations for NaN processing functions	2020-04-22 20:58:11 +01:00
Lioncash	14f55d7476	common/fp/unpacked: Add half-precision instantiation of FPRoundBase	2020-04-22 20:58:11 +01:00
Lioncash	7e814de445	common/fp/unpacked: Handle half-precision unpacking in FPUnpackBase	2020-04-22 20:58:11 +01:00
Lioncash	8f9fe8690a	common/fp/unpacked: Adjust FPUnpack to operate like ARM pseudocode This function is defined as always disabling the AHP bit in the fpcr before performing any operations. At the same time, rename the original FPUnpack function to FPUnpackBase to match the pseudocode in the ARM reference manual.	2020-04-22 20:58:11 +01:00
Merry	37c4c39d62	Merge pull request #448 from lioncash/saturate A64: Implement SQSHRN, SQSHRUN, and UQSHRN's scalar variants	2020-04-22 20:58:11 +01:00
Merry	f5d774bdbd	Merge pull request #449 from lioncash/hp common/fp/info: Add specialization of FPInfo for half-precision floating point	2020-04-22 20:58:11 +01:00
Lioncash	126c29a9e9	A64: Implement SQSHRN, SQSHRUN, and UQSHRN's scalar variants These can just be implemented in terms of the vector variants for the time being.	2020-04-22 20:58:11 +01:00
Lioncash	0b67b94b6c	common/fp/info: Add specialization of FPInfo for half-precision floating point Puts the necessary info struct in place for further use.	2020-04-22 20:58:11 +01:00
Lioncash	dd7433f9d3	A64: Amend prototypes of some SIMD scalar shift by immediate opcodes These take a vector for a destination.	2020-04-22 20:58:11 +01:00
Lioncash	99c494bae9	common/fp/unpacked: Add FPRoundCV Corresponds to the equivalent pseudocode within the ARMv8 reference manual. This will be necessary for supporting half-precision floating-point. This also makes use of it within FPConvert	2020-04-22 20:58:11 +01:00
Merry	bbd5330ad2	Merge pull request #447 from lioncash/flag A64: Implement CFINV, RMIF, AXFlag and XAFlag	2020-04-22 20:58:11 +01:00
Lioncash	490bebbd9a	common/fp/unpacked: Add FPUnpackCV Adds a template function that performs the same behavior as in the ARM pseudocode, and utilizes it in FPConvert, which will be necessary for half-float support.	2020-04-22 20:58:11 +01:00
Merry	fb039e232c	Merge pull request #442 from lioncash/fcvtxn A64: Implement scalar and vector variants of FCVTXN	2020-04-22 20:58:11 +01:00
Lioncash	6aed4036ef	ir_opt/a64_get_set_elimination_pass: Add handling for NZCV raw get and set operations	2020-04-22 20:58:11 +01:00
Merry	4f937c1ee1	Merge pull request #446 from lioncash/sqshl A64: Implement scalar variants of SQSHL (register) and UQSHL (register)	2020-04-22 20:58:11 +01:00
Lioncash	aa22db534b	A64: Implement AXFlag and XAFlag	2020-04-22 20:58:11 +01:00
Merry	d74cccbc84	Merge pull request #445 from lioncash/sqrt A64: Implement single and double-precision vector variant of FSQRT	2020-04-22 20:58:11 +01:00
Lioncash	20ffe568d0	A64: Implement RMIF	2020-04-22 20:58:11 +01:00
Merry	6d7e7c3269	Merge pull request #443 from lioncash/flag A64: Rearrange flag format/manipulation instructions	2020-04-22 20:58:11 +01:00
Lioncash	51b526e453	A64: Implement CFINV	2020-04-22 20:58:11 +01:00
Merry	5d01f1b462	Merge pull request #441 from lioncash/constexpr common/bit_util: Mark a few functions as constexpr	2020-04-22 20:58:11 +01:00
Lioncash	597a8be5d5	ir: Add A64-specific opcodes for getting and setting raw NZCV values This will be necessary to implement the flag manipulation and flag format instructions.	2020-04-22 20:58:11 +01:00
Merry	743c52fdc5	Merge pull request #440 from lioncash/include common/fp: Remove unnecessary includes	2020-04-22 20:58:11 +01:00
Lioncash	d3515279df	A64: Implement the vector version of FCVTXN	2020-04-22 20:58:10 +01:00
Lioncash	17aea0b997	A64: Implement UQSHL (register)'s scalar variant This can be implemented in terms of the vector variant.	2020-04-22 20:58:10 +01:00
Lioncash	c99d4b762e	A64: Implement single and double-precision vector variant of FSQRT	2020-04-22 20:58:10 +01:00
Lioncash	54e0b487f3	A64: Rearrange flag format/manipulation instructions Gives these instructions better categorical labeling.	2020-04-22 20:58:10 +01:00
Lioncash	88d1977cb9	common/bit_util: Mark a few functions as constexpr These four functions can be made constexpr with no issue.	2020-04-22 20:58:10 +01:00
Lioncash	f33e5939b7	common/fp: Remove unnecessary includes	2020-04-22 20:58:10 +01:00
Lioncash	302f56b36a	A64: Fall back to interpreting for FCADD and FCMLA half-precision variants Rather than straight-up treating them as undefined, we can fall back to an interpreter in this case.	2020-04-22 20:58:10 +01:00
Lioncash	4339a8fff6	A64: Implement the scalar version of FCVTXN	2020-04-22 20:58:10 +01:00
Lioncash	35ddf68ad5	A64: Implement SQSHL (register)'s scalar variant We can implement this in terms of the vector variant.	2020-04-22 20:58:10 +01:00
Lioncash	5cf1478620	frontend/ir: Add opcodes for vector square roots	2020-04-22 20:58:10 +01:00
Lioncash	36027ebef5	frontend/ir/microinstruction: Add missing cases for FPRecipExponent{32,64} for ReadsFromAndWritesToFPSRCumulativeExceptionBits() This was intended to be added within #437, but was missed	2020-04-22 20:58:10 +01:00
Merry	40b081438a	Merge pull request #439 from lioncash/fcmla A64: Implement FCADD and FCMLA	2020-04-22 20:58:10 +01:00
Lioncash	7c81a58ed3	frontend/ir/ir_emitter: Alter parameters of FPDoubleToSingle() and FPSingleToDouble() to pass along desired rounding mode This will be necessary to special-case the non-IEEE Von Neumann rounding to odd rounding mode.	2020-04-22 20:58:10 +01:00
Merry	d91192681a	Merge pull request #438 from lioncash/fmulx A64: Implement scalar double/single precision FMULX (by element)	2020-04-22 20:58:10 +01:00
Lioncash	ed29ef8cca	A64: Implement FCMLA	2020-04-22 20:58:10 +01:00
Lioncash	95af9dafbe	common/fp/op: Add FP conversion functions	2020-04-22 20:58:10 +01:00
Merry	9f11720a69	Merge pull request #437 from lioncash/frecpx A64: Implement FRECPX (single, double precision)	2020-04-22 20:58:10 +01:00
Lioncash	bdcea0b0dc	A64: Implement scalar double/single precision FMULX (by element)	2020-04-22 20:58:10 +01:00
Lioncash	5ce17574f9	A64: Implement FCADD	2020-04-22 20:58:10 +01:00
Merry	34d917f34e	Merge pull request #436 from lioncash/no-alloc A64: Implement LDNP/STNP	2020-04-22 20:58:10 +01:00
Lioncash	e44730ba6d	A64: Implement FRECPX (single, double precision)	2020-04-22 20:58:10 +01:00
Lioncash	bfaeb08d3c	A64: Implement LDNP/STNP LDNP and STNP indicate that a memory access is non-temporal/streaming (i.e. unlikely to be repeated), allowing data caching to not be performed. However, given this is only a hint, we can treat these two instructions as regular LDP and STP instructions for the time being.	2020-04-22 20:58:10 +01:00
Lioncash	9cf3c25811	frontend/ir/ir_emitter: Add opcodes for floating point reciprocal exponents	2020-04-22 20:58:10 +01:00
Merry	dbf47db713	Merge pull request #434 from lioncash/format A32/translate_arm: Formatting/tidying up	2020-04-22 20:58:10 +01:00
Lioncash	b168c2a9f9	common/fp/op: Add operations for floating-point reciprocal exponents	2020-04-22 20:58:10 +01:00
Lioncash	05a6ab691d	translate_arm/coprocessor: Minor tidying up	2020-04-22 20:58:10 +01:00
Lioncash	1e32a09c03	translate_arm/vfp2: Invert conditionals where applicable	2020-04-22 20:58:10 +01:00
Lioncash	e209b31073	translate_arm/synchronization: Invert conditionals where applicable	2020-04-22 20:58:10 +01:00
Lioncash	9514e3602e	translate_arm/status_register_access: Invert conditionals where applicable	2020-04-22 20:58:10 +01:00
Lioncash	c6aa1a708a	translate_arm/saturated: Invert conditionals where applicable	2020-04-22 20:58:10 +01:00
Lioncash	a72813599a	translate_arm/reversal: Invert conditionals where applicable	2020-04-22 20:58:10 +01:00
Lioncash	7be56e6b67	translate_arm/parallel: Invert conditionals where applicable	2020-04-22 20:58:10 +01:00
Lioncash	3c00a616d6	translate_arm/packing: Invert conditionals where applicable	2020-04-22 20:58:10 +01:00
Lioncash	c711188f46	translate_arm/multiply: Invert conditionals where applicable	2020-04-22 20:58:10 +01:00
Lioncash	c8dad40d81	translate_arm/misc: Invert conditionals where applicable	2020-04-22 20:58:10 +01:00
Lioncash	a7bf5ff77d	translate_arm/load_store: Invert conditionals where applicable	2020-04-22 20:58:10 +01:00
Lioncash	2e180a7f14	backend/x64/a32_interface: Mark Context move constructor and move assignment as noexcept Provides a more "correct" move constructor/assignment operator, since these relevant functions shouldn't throw exceptions. Has the benefit of playing nicely with std::move_if_noexcept and other noexcept library facilities.	2020-04-22 20:58:09 +01:00
Lioncash	f4b19a7393	translate_arm/extension: Invert conditionals where applicable	2020-04-22 20:58:09 +01:00
Lioncash	deb9dd4acc	block_of_code: Replace cast with [[maybe_unused]] in DoesCpuSupport()	2020-04-22 20:58:09 +01:00
Lioncash	c2de6ecfd0	translate_arm/exception_generating: Invert conditionals where applicable	2020-04-22 20:58:09 +01:00
Lioncash	d8a8d3b073	translate_arm/data_processing: Invert conditionals where applicable	2020-04-22 20:58:09 +01:00
Lioncash	df5c51ff47	translate_arm/branch: Invert conditionals where applicable Allows unindenting code a bit.	2020-04-22 20:58:09 +01:00
Lioncash	3290a9fdc2	common: Remove address_range.h The AddressRange structure isn't used anywhere within the codebase, so this can be removed. Particularly because there's no real appeal/heavy potential use of it in the future that isn't trivial to add back if needed.	2020-04-22 20:57:38 +01:00
Lioncash	ee973f13c7	frontend/A32/ir_emitter: Mark PC() and AlignPC() as const-qualified member functions These don't modify instance state, so they can be const-qualified member functions.	2020-04-22 20:57:38 +01:00
Lioncash	3a2dd09122	frontend/A64/ir_emitter: Mark PC() and AlignPC() as const qualified member functions These don't actually alter any instance state.	2020-04-22 20:57:38 +01:00
Lioncash	575ae852a9	constant_propagation_pass: Fold byte reversal opcodes where applicable These are reasonably trivial to fold away when applicable. We just perform the swap and replace the instruction with the constant value.	2020-04-22 20:57:37 +01:00
Merry	2c53f354ab	Merge pull request #418 from lioncash/fold-op constant_propagation_pass: Handle folding for Least/MostSignificant{Bit, Byte, Half, Word} opcodes	2020-04-22 20:57:37 +01:00
Merry	ad14a33672	Merge pull request #417 from lioncash/swap common: Move byte swapping functions to bit_utils.h	2020-04-22 20:57:37 +01:00
Lioncash	d302d9bd0c	constant_propagation_pass: Handle folding for Least/MostSignificant{Bit, Byte, Half, Word} opcodes These are quite trivial to fold.	2020-04-22 20:57:37 +01:00
Lioncash	7139942976	common: Move byte swapping functions to bit_utils.h These are quite general functions, so they can just be moved into common instead of recreating a namespace here.	2020-04-22 20:57:37 +01:00
MerryMage	7c8fcaef26	emit_x64_vector_floating_point: AVX && DN implementation of EmitFPVectorMulX	2020-04-22 20:57:37 +01:00
MerryMage	e3898e628e	A64: Implement FMULX (by element), single and double precision variants	2020-04-22 20:57:37 +01:00
Lioncash	93351c7efb	a64_emit_x64: Make constness of loop elements explicit within GenFastmemFallbacks()	2020-04-22 20:57:37 +01:00
MerryMage	c106d8cedf	A64: Implement FMULX, vector single-precision and double-precision variant	2020-04-22 20:57:37 +01:00
Lioncash	7752ffc50c	a64_emit_x64: Convert std::vector instances in GenFastmemFallbacks() to std::array Given these are quite small, we can avoid the need to heap allocate here.	2020-04-22 20:57:37 +01:00
MerryMage	fa8925c4df	IR: Implement FPVectorMulX	2020-04-22 20:57:37 +01:00
Michał Janiszewski	bbd8abaa25	Provide justification for always-true condition (#412)	2020-04-22 20:57:37 +01:00
Michał Janiszewski	7d0e918b51	Add missing include guards	2020-04-22 20:57:37 +01:00
V.Kalyuzhny	764a93bf5a	Switch boost::optional to std::optional	2020-04-22 20:57:37 +01:00
Lioncash	07c197e8d0	constant_propagation_pass: Add 64-bit variants of shifts to the pass These optimizations can also apply to the 64-bit variants of the shift opcodes; we just need to check if the instruction has an associated pseudo-op before performing the 32-bit variant's specifics. While we're at it, we can also relocate the code to its own function like the rest of the cases to keep organization consistent.	2020-04-22 20:57:37 +01:00
Lioncash	8248999c5d	constant_propagation_pass: Fold division operations where applicable We can fold division operations if: 1. The divisor is zero, in which case we can replace the result with zero (as this is how ARM platforms expect it). 2. Both values are known, in which case we can just do the operation and store the result. 3. The divisor is 1, in which case we just return the other operand.	2020-04-22 20:57:37 +01:00
Merry	d83eae2004	Merge pull request #406 from lioncash/mul constant_propagation_pass: Fold Mul32 and Mul64 cases where applicable	2020-04-22 20:57:37 +01:00
Merry	73d9393300	Merge pull request #405 from lioncash/inst a64: Add ARMv8.4+ instructions encodings to the encoding table	2020-04-22 20:57:37 +01:00
Lioncash	7ad6981437	constant_propagation_pass: deduplicate common 32/64 bit checking for results in folding functions It's common for a folding operation to apply to both the 32-bit and 64-bit variant of the same opcode, which leads to checking which kind of result we need to store the value as. This moves it to its own function, so that we don't need to duplicate it in various functions.	2020-04-22 20:57:37 +01:00
Lioncash	f1a66c37ba	a64: Add ARMv8.4+ instructions encodings to the encoding table Keeps the table up to date with the ARM specification.	2020-04-22 20:57:37 +01:00
Lioncash	72daf37208	constant_propagation_pass: Fold Mul32 and Mul64 cases where applicable Multiplication operations can currently be folded if: 1. Both arguments are known constant values 2. Either operand is zero (in which case the result is also zero) 3. Either operand is one (in which case the result is the non-one operand).	2020-04-22 20:57:37 +01:00
Lioncash	43b2eb4688	constant_propagation_pass: Fold SignExtend{Type}ToLong opcodes if possible	2020-04-22 20:57:37 +01:00
Lioncash	2da2cf9058	constant_propagation_pass: Fold SignExtend{Type}ToWord opcodes if possible	2020-04-22 20:57:37 +01:00
Lioncash	0583d401e3	ir/value: Add IsSignedImmediate() and IsUnsignedImmediate() functions to Value's interface This allows testing against arbitrary values while also simultaneously eliminating the need to check IsImmediate() all the time in expressions.	2020-04-22 20:57:37 +01:00
Lioncash	c42f6ea184	constant_propagation_pass: Fold ZeroExtend{Type}ToLong opcodes if possible These are equivalent to the ZeroExtendXToWord variants, so we can trivially do this as well.	2020-04-22 20:57:37 +01:00
Lioncash	e3258e8525	ir/value: Add a GetImmediateAsS64() function Provides a signed analogue to GetImmediateAsU64() for consistency with both integral classes when it comes to signed/unsigned.	2020-04-22 20:57:37 +01:00
Lioncash	2274214ff0	constant_propagation_pass: Combine zero-extension folding code into its own function Separates the behavior from the actual switch statement and gets rid of duplication, now that we can use the general GetImmediateAsU64() function.	2020-04-22 20:57:37 +01:00
Lioncash	4a3c064b15	ir/value: Add an IsZero() member function to Value's interface By far, one of the most common things to check for is whether or not a value is zero, as it typically allows folding away unnecessary operations (other close contenders that can help with eliding operations are 1 and -1). So instead of requiring a check for an immediate and then actually retrieving the integral value and checking it, we can wrap it within a function to make it more convenient.	2020-04-22 20:57:37 +01:00
Merry	c649f11c0a	Merge pull request #401 from lioncash/folding constant_propagation_pass: Fold &, \|, ^, and ~ operations where applicable	2020-04-22 20:56:01 +01:00
MerryMage	2524d536b0	A32/ir_emitter: Bugfix: ExceptionRaised was producing incorrect PC Use actual PC and not pipelined PC.	2020-04-22 20:56:01 +01:00
Lioncash	c09f4cf28e	constant_propagation_pass: Fold NOT operations	2020-04-22 20:55:50 +01:00
Lioncash	d69fceec55	value: Move ImmediateToU64() to be a part of Value's interface This'll make it slightly nicer to do basic constant folding for 32-bit and 64-bit variants of the same IR opcode type. By that, I mean it's possible to inspect immediate values without a bunch of conditional checks beforehand to verify that it's possible to call GetU32() or GetU64, etc.	2020-04-22 20:55:50 +01:00
Lioncash	8013548bbb	constant_propagation_pass: Fold OR operations	2020-04-22 20:55:50 +01:00
MerryMage	ca603c1215	reg_alloc: Emit AVX instructions where able Smaller codesize.	2020-04-22 20:55:50 +01:00
Lioncash	898d096e39	constant_propagation_pass: Fold AND operations	2020-04-22 20:55:50 +01:00
MerryMage	e2358af5ef	abi: Emit AVX instructions where able Smaller codesize.	2020-04-22 20:55:50 +01:00
Lioncash	f40fcda1f6	ir/value: Add member function to check whether or not all bits of a contained value are set This is useful when we wish to know if a contained value is something like 0xFFFFFFFF, as this helps perform constant folding. For example the operation: x & 0xFFFFFFFF can be folded to just x in the 32-bit case.	2020-04-22 20:55:50 +01:00
MerryMage	7c0378f56d	a64_exclusive_monitor: Loosen memory ordering requirements It is not necessary to be as strict as it was.	2020-04-22 20:55:50 +01:00
Lioncash	0ea99b7d59	constant_propagation_pass: Fold EOR operations It's possible to fold cases of exclusive OR operations if they can be known to be an identity operation, or if both operands happen to be known immediates, in which case we can just store the result of the exclusive-OR directly.	2020-04-22 20:55:50 +01:00
MerryMage	f0920c0ded	Fix VShift terminology An arithmetic shift is by definition a signed shift, and a logical shift is by definition an unsigned shift. - Rename VectorLogicalVShiftS* -> VectorArithmeticVShift* - Rename VectorLogicalVShiftU* -> VectorLogicalVShift*	2020-04-22 20:55:50 +01:00
MerryMage	b51dae790d	emit_x64_vector: AVX512 implementation of EmitVectorLogicalVShiftS16	2020-04-22 20:55:50 +01:00
MerryMage	bd47f2ca8f	emit_x64_vector: AVX512 implementation of EmitVectorLogicalVShiftS64	2020-04-22 20:55:50 +01:00
MerryMage	3bf183d7e8	emit_x64_vector: AVX2 implementation of EmitVectorLogicalVShiftS32	2020-04-22 20:55:50 +01:00
MerryMage	94f9d402eb	emit_x64_vector: AVX512 implementation of EmitVectorLogicalVShiftU16()	2020-04-22 20:55:50 +01:00
MerryMage	6d9639e3b0	emit_x64_vector: AVX2 implementation of EmitVectorLogicalVShiftU64()	2020-04-22 20:55:50 +01:00
MerryMage	bbc066a266	emit_x64_vector: AVX2 implementation of EmitVectorLogicalVShiftU32()	2020-04-22 20:55:50 +01:00
Lioncash	da2e7fad87	emit_x64_vector: SSSE3 variant of EmitVectorCountLeadingZeros8() pshufb lyfe	2020-04-22 20:55:50 +01:00
VelocityRa	c30b8dbe99	decoders: Cast to correctly-sized type before shifting Fixes decoding for 64-bit instructions. Does not help/apply to any currently supported ARM versions (since all are 32-bit length or below), it's for future-proofing should such an arch be supported.	2020-04-22 20:55:50 +01:00
MerryMage	238f2f2cd0	a64_emit_x64: Lowercase PAGE_SIZE PAGE_SIZE is defined as a macro by musl.	2020-04-22 20:55:50 +01:00
MerryMage	7162f6f254	emit_x64_vector_floating_point: SSE4.1 implementation of EmitFPVectorToFixed	2020-04-22 20:55:50 +01:00
MerryMage	e7a5592699	emit_x64_vector_floating_point: EmitFPVectorRoundInt: Use FCODE	2020-04-22 20:55:50 +01:00
MerryMage	b8fde48732	emit_x64_vector: AVX implementation for EmitVectorCountLeadingZeros8	2020-04-22 20:55:50 +01:00
MerryMage	fd37b637aa	emit_x64_vector: SSE implementation of EmitVectorCountLeadingZeros16	2020-04-22 20:55:50 +01:00
MerryMage	09bf273bc8	A64: Implement SCVTF, UCVTF (vector, fixed-point), scalar variant	2020-04-22 20:55:06 +01:00
MerryMage	03ad2072a7	emit_x64_floating_point: Reduce fallback LUT code in EmitFPToFixed	2020-04-22 20:55:06 +01:00
MerryMage	f9129db6fd	A64: Implement FCVTZS, FCVTZU, UCVTF, SCVTF (vector, fixed-point), vector variant	2020-04-22 20:55:06 +01:00
Lioncash	48df9b9a7d	A64: Implement UQSHL's vector immediate and register variants	2020-04-22 20:55:06 +01:00
Lioncash	d426dfe942	ir: Add opcodes for unsigned saturating left shifts	2020-04-22 20:55:06 +01:00
Lioncash	ab60720418	A64/translate/impl: Make signatures consistent for unimplemented by-element SIMD variants Makes them all consistent, so it isn't necessary to change the prototypes over when implementing them.	2020-04-22 20:55:06 +01:00
Lioncash	6b5ea6ee66	A64: Implement BRK Currently, we can just implement this as part of the exception interface, similar to how it's done for the A32 interface with BKPT.	2020-04-22 20:55:06 +01:00
Lioncash	b915364c16	A64/imm: Add full range of comparison operators to Imm template Makes the comparison interface consistent by providing all of the relevant members. This also modifies the comparison operators to take the Imm instance by value, as it's really only a u32 under the covers, and it's cheaper to shuffle around a u32 than a 64-bit pointer address.	2020-04-22 20:55:06 +01:00
MerryMage	02150bc0b7	IR: Add fbits argument to FPVectorFrom{Signed,Unsigned}Fixed	2020-04-22 20:55:06 +01:00
MerryMage	027b0ef725	A64: Implement SCVTF, UCVTF (scalar, fixed-point)	2020-04-22 20:55:06 +01:00
MerryMage	8051f60db0	opcodes.inc: Align columns to a tabstop of 4	2020-04-22 20:55:06 +01:00
MerryMage	90193b0e3d	IR: Add fbits argument to FixedToFP-related opcodes	2020-04-22 20:55:06 +01:00
Lioncash	616a153c16	A64: Implement SQSHL's vector immediate variant	2020-04-22 20:55:06 +01:00
Lioncash	e8b0f25dff	A64: Implement SQSHL's vector register variant	2020-04-22 20:55:06 +01:00
Lioncash	b14eaaec46	ir: Add opcodes for left signed saturated shifts	2020-04-22 20:55:06 +01:00
Lioncash	da55ed7b31	branch: Make variables const where applicable	2020-04-22 20:55:06 +01:00
Lioncash	867b666285	move_wide: Make variables const where applicable	2020-04-22 20:55:06 +01:00
Lioncash	78024a9dc4	load_store_register_unprivileged: Make variables const where applicable	2020-04-22 20:55:06 +01:00
Lioncash	e45e5da610	load_store_register_immediate: Place conditional bodies on their own line Makes the conditionals visually consistent with the rest of the codebase.	2020-04-22 20:55:06 +01:00
Lioncash	b586cf3f56	load_store_load_literal: Make variables const where applicable	2020-04-22 20:55:06 +01:00
Lioncash	c3a3b9687e	data_processing_logical: Move datasize declarations after early-exit conditionals While we're at it, make variables const where applicable.	2020-04-22 20:55:06 +01:00
Lioncash	ed797e6540	data_processing_conditional_select: Make variables const where applicable Makes CSEL's function consistent with all of the others.	2020-04-22 20:55:06 +01:00
Lioncash	c82fa5ec5a	data_processing_addsub: Move datasize declarations after early-exit conditionals While we're at it, also make relevant variables const where applicable	2020-04-22 20:55:06 +01:00
Lioncash	f4a66d2477	data_processing_bitfield: Move datasize variables after early-exit conditionals Moves the declaration of datasize to the scope that it's used within. This also takes the opportunity to apply const where applicable, and make early-exits all vertically consistent with one another.	2020-04-22 20:55:06 +01:00
Lioncash	2e0fcd6161	A64: Implement CLS's vector variant Leverages CLZ like the integral variant does.	2020-04-22 20:55:06 +01:00
Lioncash	a2cd643525	emit_x64_vector: Make EmitVectorUnsignedSaturatedAccumulateSigned() internally linked Given this is just an internal helper function, it can be marked static.	2020-04-22 20:55:06 +01:00
Lioncash	c39ea2e3c9	perf_map: Use std::string_view instead of std::string for PerfMapRegister() We can just use a non-owning view into a string in this case instead of potentially allocating a std::string instance.	2020-04-22 20:55:06 +01:00
MerryMage	12243692f5	A64: Implement SQRDMULH (vector), vector variant	2020-04-22 20:55:06 +01:00
MerryMage	a9ffcf08b1	A64: Implement SQDMULL (vector), vector variant	2020-04-22 20:55:06 +01:00
MerryMage	3e447614c6	IR: Add VectorSignedSaturatedDoublingMultiplyLong	2020-04-22 20:55:06 +01:00
MerryMage	06b31448aa	emit_x64_vector: Changes to VectorSignedSaturatedDoublingMultiply * Return both the upper and lower parts of the multiply if required * SSE2 does not support the pmuldq instruction, do sign correction to an unsigned result instead * Improve port utilisation where possible (punpck instructions were a bottleneck)	2020-04-22 20:55:06 +01:00
MerryMage	08c0e017a5	IR: Implement Vector{Signed,Unsigned}Multiply{16,32}	2020-04-22 20:55:06 +01:00
Lioncash	b6df34cdde	backend_x64/a64_interface: Re-enable the constant folding pass This was disabled for debugging, but never re-enabled. Just to be sure, testing was done downstream in yuzu to make sure this didn't happen to break anything (which seems to be the case).	2020-04-22 20:55:06 +01:00
MerryMage	06ba397af2	emit_x64_vector_floating_point: Hardware FMA implementation for RSqrtStepFused	2020-04-22 20:55:06 +01:00
MerryMage	e553c4fe8d	emit_x64_vector_floating_point: Hardware FMA implementation of FPVectorRecipStepFused	2020-04-22 20:55:06 +01:00
MerryMage	3caeb62ef1	emit_x64_floating_point: Hardware FMA implementation of FPRSqrtStepFused	2020-04-22 20:55:06 +01:00
MerryMage	344ee76aba	emit_x64_floating_point: Hardware FMA implementation of FPRecipStepFused{32,64}	2020-04-22 20:55:06 +01:00
MerryMage	1492573267	emit_x64_vector: SSE implementation of VectorSignedSaturatedAccumulateUnsigned{8,16,32}	2020-04-22 20:55:06 +01:00
Lioncash	26df6e5e7b	emit_x64_vector: Correct static asserts for < 64-bit type checks in saturated accumulate fallbacks I had initially meant to use BitSize() here, not sizeof()	2020-04-22 20:55:06 +01:00
MerryMage	a4a26ac226	emit_x64_vector: EmitVectorSignedSaturatedAccumulateUnsigned64: SSE implementation	2020-04-22 20:55:06 +01:00
MerryMage	a7c66d2d28	emit_x64_vector: Simplify fpsr_qc related code Move the bool conversion into A64JitState::GetFpsr so we don't have to continuously pay the cost of conversion for every saturation instruction.	2020-04-22 20:55:06 +01:00
Lioncash	112cff9ab9	A64: Implement CLZ's vector variant	2020-04-22 20:55:06 +01:00
Lioncash	e739624296	ir: Add opcodes for vector CLZ operations We can optimize these cases further with the use of a fair bit of shuffling via pshufb and the use of masks, but given the uncommon use of this instruction, I wouldn't consider it to be beneficial in terms of amount of code to be worth it over a simple manageable naive solution like this. If we ever do hit a case where vectorized CLZ happens to be a bottleneck, then we can revisit this. At least with AVX-512CD, this can be done with a single instruction for the 32-bit word case.	2020-04-22 20:55:05 +01:00
MerryMage	d4c37a68a8	A64/translate: VectorZeroUpper for V(64) stores Ensures correctness.	2020-04-22 20:55:05 +01:00
MerryMage	b8daa4feac	simd_two_register_misc: FNEG (vector) with Q == 0 had dirty upper	2020-04-22 20:55:05 +01:00
Lioncash	5653e7637e	emit_x64_vector: Remove unnecessary [[maybe_unused]] attributes These were unintentionally left in when introducing SUQADD and USQADD	2020-04-22 20:55:05 +01:00
Lioncash	14e026a7f0	A64: Implement USQADD's scalar and vector variants	2020-04-22 20:55:05 +01:00
Lioncash	d4a76aaa04	ir: Add opcodes for unsigned saturated accumulations of signed values	2020-04-22 20:55:05 +01:00
Lioncash	18ad7f237d	A64: Implement SUQADD's scalar and vector variants	2020-04-22 20:55:05 +01:00
Lioncash	6f911a26da	ir: Add opcodes for signed saturated accumulations of unsigned values	2020-04-22 20:55:05 +01:00
Lioncash	9a3d38d2ee	A64: Implement SMLAL{2}, SMLSL{2}, UMLAL{2}, and UMLSL{2}'s vector by-element variants We can simply modify the general function made for SMULL{2} and UMULL{2}'s by-element variants to also handle the other multiply-based by-element variants.	2020-04-22 20:55:05 +01:00
Lioncash	6ccfbc9b39	A64: Implement UMULL{2}'s vector by-element variant	2020-04-22 20:55:05 +01:00
Lioncash	58e21f175c	A64: Implement SMULL{2}'s vector by-element variant	2020-04-22 20:55:05 +01:00
Lioncash	134bb02e19	ir/value: Replace includes with forward declarations enum classes are still considered complete types when forward declared (as the compiler knows the exact size of the type from the declaration alone). The only difference in this case being that the members of the enum class aren't visible. Given we don't use the members within this header in any way, we can simply forward declare them here and remove the inclusions.	2020-04-22 20:55:05 +01:00
Lioncash	2c8e07e7d0	ir/cond: Migrate to C++17 nested namespace specifiers	2020-04-22 20:55:05 +01:00
Lioncash	c3b7819a55	CMakeLists: Add missing cond.h header to file listing Allows the file to show up within IDEs more easily.	2020-04-22 20:55:05 +01:00
Lioncash	0a3976059f	A64: Implement URSQRTE	2020-04-22 20:55:05 +01:00
Lioncash	b6e74fd17d	ir: Add opcodes for performing unsigned reciprocal square root estimates	2020-04-22 20:55:05 +01:00
Lioncash	bd3582e811	A64: Implement URECPE	2020-04-22 20:55:05 +01:00
Lioncash	af83360f89	ir: Add opcodes for unsigned reciprocal estimate	2020-04-22 20:55:05 +01:00
Lioncash	740ffa52ae	A64: Implement SQNEG's scalar and vector variant	2020-04-22 20:53:46 +01:00
Lioncash	fca7eddb9e	A64: Add opcodes for signed saturating negations	2020-04-22 20:53:46 +01:00
Lioncash	f1ebbcd7bc	emit_x64_vector: Simplify "position == 0" case for EmitVectorExtract() In the event position is zero, we can just treat it as a NOP, given there's no need to move the data.	2020-04-22 20:53:46 +01:00
Lioncash	87372917f9	emit_x64_vector: Simplify "position == 0" case for EmitVectorExtractLower() In the event position == 0, we can just treat it as a simple movq, clearing the upper half of the XMM register. This also makes that case use only one register.	2020-04-22 20:53:46 +01:00
Lioncash	f5fb496e7e	A64: Implement SQDMULH's by-element scalar variant	2020-04-22 20:53:46 +01:00
Lioncash	40f0576995	A64: Implement SQDMULH's by-element vector variant	2020-04-22 20:53:46 +01:00
MerryMage	8f9206901d	backend/x64: Do not clear fast_dispatch_table if not enabled There is no need to pay for the cost of setting a large block of memory if we're not using it.	2020-04-22 20:53:46 +01:00
MerryMage	9b65100660	A64: Implement FastDispatchHint	2020-04-22 20:53:46 +01:00
MerryMage	f96c43d422	A32: Implement FastDispatchHint	2020-04-22 20:53:46 +01:00
MerryMage	aa8d826c13	ir/terminal: Add FastDispatchHint	2020-04-22 20:53:46 +01:00
Lioncash	1a69a61cb4	A64: Implement SQDMULH's scalar variant	2020-04-22 20:53:46 +01:00
Lioncash	7ebfd0f31c	ir: Add opcodes for scalar signed saturated doubling multiplies	2020-04-22 20:53:46 +01:00
Lioncash	9c03311fed	A64: Implement SQDMULH's vector variant	2020-04-22 20:53:46 +01:00
Lioncash	a0231e5546	ir: Add opcodes for signed saturated doubling multiplies	2020-04-22 20:53:46 +01:00
Lioncash	db24e1f09b	A64: Implement SQABS' scalar variant	2020-04-22 20:53:46 +01:00
Lioncash	bda5d14c7f	A64: Implement SQABS' vector variant.	2020-04-22 20:53:46 +01:00
Lioncash	0507e47420	ir: Add opcodes for signed saturated absolute values	2020-04-22 20:53:46 +01:00
MerryMage	27427595b7	emit_x64_floating_point: EmitFPToFixed: maxsd optimization maxsd is not required when doing a signed conversion, because x64 produces a 0x80...00 value for out of range values.	2020-04-22 20:53:46 +01:00
MerryMage	1abf82ac4a	emit_x64_floating_point: ZeroIfNaN: pxor -> xorps xorps is shorter and more appropriate here.	2020-04-22 20:53:46 +01:00
MerryMage	3415828fb4	IR: Simplify FP{Single,Double}ToFixed{U,S}{32,64}	2020-04-22 20:53:46 +01:00
Lioncash	e30f9816ec	A32/decoder: Add missing <algorithm> includes These includes should be present, as we use std::find_if() within these headers.	2020-04-22 20:53:46 +01:00
Lioncash	4507627905	emit_x64_vector: Provide AVX path for EmitVectorMinU64()	2020-04-22 20:53:46 +01:00
Lioncash	fd49a62b06	emit_x64_vector: Provide AVX path for EmitVectorMinS64()	2020-04-22 20:53:46 +01:00
Lioncash	770723f449	emit_x64_vector: Provide AVX path for EmitVectorMaxU64()	2020-04-22 20:53:46 +01:00
Lioncash	8fb90c0cf1	emit_x64_vector: Provide AVX path for EmitVectorMaxS64()	2020-04-22 20:53:46 +01:00
Lioncash	2cac6ad129	emit_x64_vector: Simplify EmitVectorLogicalLeftShift8() Similar to EmitVectorLogicalRightShift8(), we can determine a mask ahead of time and just AND the results of a halfword left shift.	2020-04-22 20:53:46 +01:00
Lioncash	135107279d	emit_x64_vector: Simplify EmitVectorLogicalShiftRight8() We can generate the mask and AND it against the result of a halfword shift instead of looping.	2020-04-22 20:53:46 +01:00
Lioncash	2952b46b16	emit_x64_vector: Amend value definition in SSE 4.1 path for EmitVectorSignExtend16() We should be defining the value after the results have been calculated to be consistent with the rest of the code.	2020-04-22 20:53:46 +01:00
Lioncash	fda19095ea	emit_x64_vector: Remove fallback in EmitVectorSignExtend64() This is fairly trivial to do manually.	2020-04-22 20:53:46 +01:00
Lioncash	39593fcd26	emit_x64_vector: Remove fallback for EmitVectorSignExtend32() We can just do the extension manually, which gets rid of the need to fall back here.	2020-04-22 20:53:46 +01:00
Lioncash	053175f69b	ir_emitter: Rename fpscr_controlled parameters to fpcr_controlled Part of addressing #333	2020-04-22 20:53:46 +01:00
MerryMage	f0184c4b8d	a32/exception_generating: BKPT: Define unpredictable behaviour Define unpredictable behaviour to be BKPT executes conditionally	2020-04-22 20:53:46 +01:00
MerryMage	a12854857b	A32: Add define_unpredictable_behaviour option	2020-04-22 20:53:46 +01:00
MerryMage	b0abaa8312	A32/location_descriptor: Change formatting to use hex	2020-04-22 20:53:46 +01:00
MerryMage	ccbf6c7f63	microinstruction: A32ExceptionRaised causes CPU exception	2020-04-22 20:53:46 +01:00
MerryMage	6595e49a31	A32/types: CondToString: Add nv	2020-04-22 20:53:46 +01:00
MerryMage	d5b9c4a4bb	block_of_code: Hide NX support behind compiler flag Systems that require W^X can use the DYNARMIC_ENABLE_NO_EXECUTE_SUPPORT cmake option.	2020-04-22 20:53:46 +01:00
MerryMage	de4494ffa5	Implement perfmap	2020-04-22 20:53:46 +01:00
MerryMage	f73104633b	a32_emit_x64: Fix incorrect BMI2 implementation for SetCpsr * The MSB for each byte in cpsr_ge were not being appropriately set. * We also expand test coverage to test this case. * We fix the disassembly of the MSR (imm) and MSR (reg) instructions as well.	2020-04-22 20:53:46 +01:00
MerryMage	3432a08e0a	backend/x64: Support W^X systems Closes #176.	2020-04-22 20:53:46 +01:00
BreadFish64	2a65442933	Backend: Create "backend" folder similar to the "frontend" folder	2020-04-22 20:53:46 +01:00
MerryMage	3b13f1eb12	A64/translate: Standardize arguments of helper functions Don't pass in IREmitter when TranslatorVisitor is already available.	2020-04-22 20:53:45 +01:00
MerryMage	a4e556d59c	A64/translate: Standardize TranslatorVisitor abbreviation Prefer v to tv.	2020-04-22 20:53:45 +01:00
MerryMage	9a0dc61efd	emit_x64_vector: Avoid recalculating addresses in EmitVectorTableLookup	2020-04-22 20:53:45 +01:00
Lioncash	3d465e2c36	A64: Implement SQXTN, SQXTUN, and UQXTN's scalar variants We can implement these in terms of the vector variants	2020-04-22 20:53:45 +01:00
Lioncash	4ff39c6ea8	A64: Implement SDOT and UDOT's (by element) variants Gets all of the dot product instructions out of the way.	2020-04-22 20:53:45 +01:00
MerryMage	21df1fb539	emit_x64_vector: Don't load zero constant from memory in EmitVectorTableLookup	2020-04-22 20:53:45 +01:00
MerryMage	3bbcca8757	emit_x64_vector: Special-case is_defaults_zero && table_size == 2 in EmitVectorTableLookup	2020-04-22 20:53:45 +01:00
MerryMage	9cc00f900c	emit_x64_vector: Release registers when possible in EmitVectorTableLookup	2020-04-22 20:53:45 +01:00
MerryMage	a12afd1065	reg_alloc: Add the ability to Release an allocation early	2020-04-22 20:53:45 +01:00
MerryMage	e68bd3c6c1	emit_x64_vector: Special-case table_size == 1 in EmitVectorTableLookup	2020-04-22 20:53:45 +01:00
MerryMage	a4e1f8a63a	emit_x64_vector: SSE4.1 implementation of EmitVectorTableLookup	2020-04-22 20:53:45 +01:00
MerryMage	0c18b85c27	A64: Implement TBL and TBX	2020-04-22 20:53:45 +01:00
MerryMage	89d08c7d61	IR: Add VectorTable and VectorTableLookup IR instructions	2020-04-22 20:53:45 +01:00
MerryMage	0288974512	opcodes: Cleanup opcodes table * Remove T:: prefix from types. * Add another column for a 4th argument.	2020-04-22 20:53:45 +01:00
Lioncash	d9fc6cf31f	A64: Implement SDOT and UDOT's vector variant	2020-04-22 20:53:45 +01:00
Lioncash	cb5e5c5d49	A64: Implement SADALP and UADALP While we're at it we can join the code for SADDLP and UADDLP with these instructions, since the only difference is we do an accumulate at the end of the operation.	2020-04-22 20:53:45 +01:00
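
The folding rules spelled out in the messages for 8248999c5d (division) and 72daf37208 (multiplication) can be sketched roughly as follows. This is only an illustration of the stated rules, not dynarmic's code: `Operand`, `Kind`, `FoldResult`, `FoldUnsignedDiv32` and `FoldMul32` are hypothetical names, and an operand is modelled as an optional 32-bit immediate rather than a real IR value.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical stand-in for an IR operand: a known immediate, or unknown (nullopt).
using Operand = std::optional<std::uint32_t>;

enum class Kind { NoFold, Constant, UseLhs, UseRhs };

// Outcome of a fold attempt: a constant, one of the inputs passed through, or nothing.
struct FoldResult {
    Kind kind;
    std::uint32_t value = 0;
};

// Division rules as listed in the commit message (unsigned 32-bit case only).
FoldResult FoldUnsignedDiv32(Operand lhs, Operand rhs) {
    if (rhs && *rhs == 0) return {Kind::Constant, 0};          // divide by zero yields zero
    if (lhs && rhs) return {Kind::Constant, *lhs / *rhs};      // both operands known
    if (rhs && *rhs == 1) return {Kind::UseLhs};               // x / 1 == x
    return {Kind::NoFold};
}

// Multiplication rules as listed in the commit message (32-bit case only).
FoldResult FoldMul32(Operand lhs, Operand rhs) {
    if (lhs && rhs) return {Kind::Constant, *lhs * *rhs};      // both operands known
    if ((lhs && *lhs == 0) || (rhs && *rhs == 0)) {
        return {Kind::Constant, 0};                            // x * 0 == 0
    }
    if (lhs && *lhs == 1) return {Kind::UseRhs};               // 1 * x == x
    if (rhs && *rhs == 1) return {Kind::UseLhs};               // x * 1 == x
    return {Kind::NoFold};
}
```

The zero-divisor check comes first so that the "both known" case never divides by zero.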

1932 commits