`map` is an ordered structure with O(log n)-time searches.
`unordered_map` has O(1) average-time searches and O(n) in the worst
case, where a bucket contains colliding hashes and has to start chaining.
The unordered version should speed up the general case of looking up
constants.
I've added a trivial order-dependent hash (_(0, 1) and (1, 0) will return
different hashes_) to combine a 128-bit constant into a
64-bit hash that generally will not collide, using a bit-rotate to
preserve entropy.
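For illustration, a minimal sketch of such a hash combine (the names and the rotation amount here are hypothetical, not dynarmic's actual code):
```cpp
#include <cstddef>
#include <cstdint>
#include <utility>

// Illustrative order-dependent combine of a 128-bit constant (two u64 halves)
// into a 64-bit hash. Rotating one half before XOR keeps (0, 1) and (1, 0)
// from colliding and preserves entropy from both halves.
struct ConstantHash {
    std::size_t operator()(const std::pair<std::uint64_t, std::uint64_t>& constant) const noexcept {
        const std::uint64_t lo = constant.first;
        const std::uint64_t hi = constant.second;
        const std::uint64_t rotated_hi = (hi << 13) | (hi >> (64 - 13));  // rotate left by 13
        return static_cast<std::size_t>(lo ^ rotated_hi);
    }
};

// Usage: std::unordered_map<std::pair<std::uint64_t, std::uint64_t>, Value, ConstantHash>
```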
In MSVC, having files with identical filenames results in massive slowdowns when compiling.
The approach I have taken to resolve this is renaming the identically named files in frontend/(A32, A64) to (a32, a64)_filename.cpp/h.
This makes dynarmic installable, and also adds a CMake package config
file that allows projects to use `find_package(dynarmic)` to import the
library.
I know #636 adds the same thing, but while experimenting with the
different install options in
https://github.com/merryhime/dynarmic/pull/636#discussion_r725656034
I ended up with a working patch, so I'm proposing this as well. This
implements solution 2.
This adds versioning information to the built library.
When building the shared library on Linux systems, a new object will
be created: `libdynarmic.so.5`.
This is really useful when talking about ABI compatibility.
The variables `dynarmic_VERSION` and `dynarmic_VERSION_MAJOR`
are implicitly created when calling `project(dynarmic VERSION x.y.z)`.
Adds all elements of a vector and puts the result into the lowest element.
Accelerates the `addv` instruction with a vectorized implementation
rather than a serial one.
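As a sketch of the idea (SSE2 intrinsics, not the actual emitted code), a pairwise reduction of four 32-bit lanes looks like this:
```cpp
#include <emmintrin.h>

// Sketch: sum four u32 lanes with two pairwise adds instead of extracting
// and adding each lane serially. Lane 0 of the result holds the total.
__m128i HorizontalAddU32(__m128i x) {
    x = _mm_add_epi32(x, _mm_shuffle_epi32(x, _MM_SHUFFLE(1, 0, 3, 2)));  // [a+c, b+d, c+a, d+b]
    x = _mm_add_epi32(x, _mm_shuffle_epi32(x, _MM_SHUFFLE(2, 3, 0, 1)));  // total in every lane
    return x;  // addv writes the sum to the lowest element; upper lanes would also need zeroing
}
```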
The lane-splatting variants of `FMUL` and `FMLA` are very
common in instruction streams when implementing things like
matrix multiplication. When they are used, they are used very densely.
https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/coding-for-neon---part-3-matrix-multiplication
The way this is currently implemented is by grabbing the particular lane
into a general-purpose register and then broadcasting it into a SIMD
register through `VectorGetElement` and `VectorBroadcast`.
```cpp
const IR::U128 operand2 = v.ir.VectorBroadcast(esize, v.ir.VectorGetElement(esize, v.V(idxdsize, Vm), index));
```
What could be done instead is to keep the value within
the vector register and use a permute/shuffle to "splat" the particular
lane across all other lanes, removing the GPR round-trip.
This is implemented as the new IR instruction `VectorBroadcastElement`:
```cpp
const IR::U128 operand2 = v.ir.VectorBroadcastElement(esize, v.V(idxdsize, Vm), index);
```
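For reference, a 32-bit lane splat can stay entirely in the vector domain on x64 with a single shuffle. This is a hedged sketch with SSE intrinsics, not the backend's actual lowering:
```cpp
#include <xmmintrin.h>

// Sketch: broadcast lane `index` of a vector of f32 across all lanes without
// a GPR round-trip. The shuffle immediate must be a compile-time constant,
// hence the switch over literal immediates.
__m128 BroadcastLane32(__m128 v, int index) {
    switch (index) {
    case 0:  return _mm_shuffle_ps(v, v, _MM_SHUFFLE(0, 0, 0, 0));
    case 1:  return _mm_shuffle_ps(v, v, _MM_SHUFFLE(1, 1, 1, 1));
    case 2:  return _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 2, 2, 2));
    default: return _mm_shuffle_ps(v, v, _MM_SHUFFLE(3, 3, 3, 3));
    }
}
```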
Recursive calls to `Replicate` beyond the first call might
cause an unintentional up-casting to an `int` type due
to `|` and `<<` operations on types such as `uint8_t` and `uint16_t`.
This makes sure calls such as `Replicate<u8>` stay as the `u8` type
throughout.
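A minimal sketch of the pitfall and the fix (illustrative only; dynarmic's actual `Replicate` has a different signature):
```cpp
#include <cstddef>
#include <cstdint>

// Sketch: replicate an element_bits-wide pattern until it fills total_bits.
// Without the static_cast, `value | (value << element_bits)` promotes u8/u16
// operands to int, so the recursion would silently operate on a widened type.
template<typename T>
constexpr T ReplicateSketch(T value, std::size_t element_bits, std::size_t total_bits) {
    if (element_bits >= total_bits) {
        return value;
    }
    const T doubled = static_cast<T>(value | (value << element_bits));  // stay as T
    return ReplicateSketch<T>(doubled, element_bits * 2, total_bits);
}

static_assert(ReplicateSketch<std::uint8_t>(0x5, 4, 8) == 0x55);
```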
xbyak is intended to be installed in /usr/local/include/xbyak.
Since we prefer not to install xbyak before using it, we copy the headers
to the appropriate directory structure and use that instead.
AVX512 introduces _unsigned_ variants of the float-to-integer conversion
functions via `vcvttps2udq` and `vcvttpd2uqq`. In the case that a value is not
representable as an unsigned integer, it will result in `0xFFFFF...`,
which can be utilized to get "free" saturation when the floating point
value exceeds the unsigned range, after masking away negative values.
https://www.felixcloutier.com/x86/vcvttps2udq
https://www.felixcloutier.com/x86/vcvttpd2uqq
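A sketch of the saturation trick for f32 -> u32 (requires AVX512F+AVX512VL; illustrative, not the emitted code):
```cpp
#include <immintrin.h>

// Sketch: truncating f32 -> u32 conversion with "free" upper saturation.
// Out-of-range inputs convert to 0xFFFFFFFF, which is exactly the saturated
// value; negatives are clamped to zero first so they don't also hit 0xFFFFFFFF.
__m128i ConvertF32ToU32Saturated(__m128 x) {
    const __m128 non_negative = _mm_max_ps(x, _mm_setzero_ps());  // mask away negative values
    return _mm_cvttps_epu32(non_negative);                        // vcvttps2udq
}
```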
This PR also speeds up the _signed_ conversion function for fp64->int64
https://www.felixcloutier.com/x86/vcvttpd2qq
`And(a, Not(b))` is a common enough operation that this can
be fused into a single `AndNot` operation. On x64 this is also
a single `pandn` instruction rather than two.
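For reference, a sketch of the x64 mapping (note that `pandn` negates its first operand, so the operands are swapped relative to the IR):
```cpp
#include <emmintrin.h>

// Sketch: And(a, Not(b)) as a single instruction.
// _mm_andnot_si128(b, a) computes (~b) & a.
__m128i AndNotSketch(__m128i a, __m128i b) {
    return _mm_andnot_si128(b, a);
}
```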
This implementation exists within the unsafe optimization paths and
utilizes the 14-bit-precision `vrsqrt14*` and `vrcp14*`
instructions provided by AVX512F+VL. These are _more_ accurate than
the fallback path and the current `rsqrt`-based unsafe code-path,
but still fall in line with what is expected of the
`Unsafe_ReducedErrorFP` optimization flag.
With AVX512 available, these functions have 14 bits of precision.
Without AVX512, they have 11 bits of precision.
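A sketch of the approximation intrinsics involved (the 14-bit forms require AVX512F+AVX512VL; illustrative only):
```cpp
#include <immintrin.h>

// Sketch: the 14-bit approximations used when AVX512 is available,
// next to the lower-precision legacy approximations used otherwise.
__m128 RecipApprox14(__m128 x) { return _mm_rcp14_ps(x); }    // AVX512: ~14 bits of precision
__m128 RSqrtApprox14(__m128 x) { return _mm_rsqrt14_ps(x); }  // AVX512: ~14 bits of precision
__m128 RecipApprox11(__m128 x) { return _mm_rcp_ps(x); }      // fallback rcpps, lower precision
__m128 RSqrtApprox11(__m128 x) { return _mm_rsqrt_ps(x); }    // fallback rsqrtps, lower precision
```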
Removes dependency on the constants at the top of some files
such as `f16_negative_zero` and `f32_non_sign_mask` in favor
of the `FPInfo` trait-type.
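A minimal sketch of what such a trait-type provides (the names here are illustrative, not dynarmic's actual `FPInfo`):
```cpp
#include <cstdint>

// Illustrative trait: per-format constants that previously lived as
// file-local values such as f16_negative_zero and f32_non_sign_mask.
template<typename FPT>
struct FPInfoSketch;

template<>
struct FPInfoSketch<std::uint16_t> {  // IEEE 754 binary16
    static constexpr std::uint16_t sign_mask = 0x8000;
    static constexpr std::uint16_t non_sign_mask = 0x7FFF;
    static constexpr std::uint16_t negative_zero = sign_mask;
};

template<>
struct FPInfoSketch<std::uint32_t> {  // IEEE 754 binary32
    static constexpr std::uint32_t sign_mask = 0x80000000;
    static constexpr std::uint32_t non_sign_mask = 0x7FFFFFFF;
    static constexpr std::uint32_t negative_zero = sign_mask;
};
```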
Also removes bypass delays by selecting between instructions
such as `pand`, `andps`, or `andpd` depending on the type,
keeping them in their respective uop domains.
See https://www.agner.org/optimize/instruction_tables.pdf for
more info on bypass delays.
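As an illustration of the domain-aware selection (a sketch using xbyak directly; the real emitter plumbing differs):
```cpp
#include <xbyak/xbyak.h>

// Sketch: pick the bitwise-AND encoding that matches the operand's uop domain,
// avoiding a bypass delay when the result feeds integer or FP instructions.
enum class Domain { Integer, Float32, Float64 };

void EmitAnd(Xbyak::CodeGenerator& code, const Xbyak::Xmm& dst, const Xbyak::Xmm& src, Domain domain) {
    switch (domain) {
    case Domain::Integer: code.pand(dst, src);  break;  // integer domain
    case Domain::Float32: code.andps(dst, src); break;  // single-precision FP domain
    case Domain::Float64: code.andpd(dst, src); break;  // double-precision FP domain
    }
}
```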
Currently, every usage of `gf2p8affineqb` is guarded by the
`AVX512F + AVX512VL + GFNI` requirement, when really
we only need `GFNI` on its own.
This will allow `GFNI`-only chips to emit GFNI instructions without
needing to have AVX512 as well.
There _are_ chips in existence currently that strictly ship with GFNI and
have no implementation of AVX1/AVX2/AVX512 (and thus no VEX/EVEX
encoding), such as Tremont (Lakefield) chips.
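For example, the legacy-SSE encoding of `gf2p8affineqb` needs only GFNI. A hedged sketch of a byte-wise bit reversal using it (compiles with just `-mgfni`, no AVX required; illustrative, not dynarmic's code):
```cpp
#include <immintrin.h>

// Sketch: reverse the bits of every byte via a GF(2) affine transform.
// 0x8040201008040201 is the bit-reversal matrix for gf2p8affineqb.
__m128i ReverseBitsPerByte(__m128i x) {
    const __m128i matrix = _mm_set1_epi64x(static_cast<long long>(0x8040201008040201ULL));
    return _mm_gf2p8affine_epi64_epi8(x, matrix, 0);
}
```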
We might want to allocate different sizes for each of them,
e.g. for the unsafe fastmem approach without bounds checking,
or for using the full 48-bit address range (with mirrors) by allocating our real arena as close to 1<<47 as possible.
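A rough sketch of that second idea (Linux-only and hypothetical; the kernel is free to ignore the hint address):
```cpp
#include <sys/mman.h>
#include <cstddef>
#include <cstdint>

// Sketch: reserve (not commit) a large arena near the top of the 48-bit
// user address space so mirrored guest views can live below it.
void* ReserveArenaNearTop(std::size_t size) {
    void* const hint = reinterpret_cast<void*>((std::uint64_t{1} << 47) - size);
    // PROT_NONE + MAP_NORESERVE reserves address space without committing memory.
    return mmap(hint, size, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
}
```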