Commit graph

2780 commits

Author SHA1 Message Date
merry
95a1ebfb97 backend/x64: Bugfix: A32 frontend also uses FPSCR.QC 2022-02-12 21:46:45 +00:00
merry
473bbd422e test_arm_instructions: Add vmsr/vcmp/vmrs test 2022-02-12 21:43:05 +00:00
Fernando Sahmkow
a8cbfd9af4 X86_Backend: set fences correctly for memory barriers and synchronization. 2022-02-01 14:27:54 +00:00
Alexandre Bouvier
0cafcc1af9 cmake: Always build static externals 2022-01-08 14:23:34 +00:00
Mai M
1635958d06
Merge pull request #658 from liushuyu/master
disassembler_thumb: fix formatting issues with fmt 8.1.x
2022-01-06 00:17:16 -05:00
liushuyu
40afbe1927
disassembler_thumb: fix formatting issues with fmt 8.1.x
fmt 8.1.0 added more formatting checks, and Cond can no longer be formatted
directly.
2022-01-05 21:49:51 -07:00
Wunkolo
ad5465d6ce constant_pool: Use tsl::robin_map rather than unordered_map
Finding a larger improvement with `robin_map` than with `unordered_map`.

`map`:
```
[master] % hyperfine -r 100 "./dynarmic_tests --durations yes"
Benchmark 1: ./dynarmic_tests --durations yes
  Time (mean ± σ):     567.0 ms ±   6.9 ms    [User: 513.1 ms, System: 53.2 ms]
  Range (min … max):   554.4 ms … 588.1 ms    100 runs
```

`unordered_map`:
```
[opt_const_pool] % hyperfine -r 100 "./dynarmic_tests --durations yes"
Benchmark 1: ./dynarmic_tests --durations yes
  Time (mean ± σ):     561.1 ms ±   4.5 ms    [User: 508.1 ms, System: 52.3 ms]
  Range (min … max):   552.6 ms … 574.2 ms    100 runs
```

`tsl::robin_map`:
```
[opt_const_pool] % hyperfine -r 100 "./dynarmic_tests --durations yes"
Benchmark 1: ./dynarmic_tests --durations yes
  Time (mean ± σ):     553.5 ms ±   5.6 ms    [User: 500.7 ms, System: 52.1 ms]
  Range (min … max):   545.7 ms … 569.3 ms    100 runs
```
2022-01-01 12:13:13 +00:00
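For scale, the mean times reported above work out to the following relative improvements. This is plain arithmetic on the means only; the σ and range figures are not taken into account.

```math
\begin{aligned}
\texttt{unordered\_map} \text{ vs. } \texttt{map}: &\quad \frac{567.0 - 561.1}{567.0} \approx 1.0\% \\
\texttt{robin\_map} \text{ vs. } \texttt{unordered\_map}: &\quad \frac{561.1 - 553.5}{561.1} \approx 1.4\% \\
\texttt{robin\_map} \text{ vs. } \texttt{map}: &\quad \frac{567.0 - 553.5}{567.0} \approx 2.4\%
\end{aligned}
```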
Wunkolo
e57bb0569a constant_pool: Convert hashtype from tuple to pair 2022-01-01 12:13:13 +00:00
Wunkolo
befc22a61e constant_pool: Use unordered_map rather than map
`map` is an ordered structure with O(log n) searches.
`unordered_map` has O(1) average-time searches and O(n) searches in the worst
case, when many keys hash to the same bucket and a lookup has to walk the chain.
The unordered version should speed up the general case of looking up
constants.

I've added a trivial order-dependent hash (_(0,1) and (1,0) return
different hashes_) that combines a 128-bit constant into a
64-bit hash that generally will not collide, using a bit-rotate to
preserve entropy.
2022-01-01 12:13:13 +00:00
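A minimal sketch of the kind of hash described above, assuming the 128-bit constant is stored as a `std::pair` of two 64-bit halves (as in the follow-up "tuple to pair" commit). The names `Constant128`, `Constant128Hash`, `RotateLeft` and `ConstantPoolMap` are hypothetical, and rotating the upper half by 1 is an illustrative choice, not necessarily the exact expression in the commit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>

// Hypothetical: a 128-bit constant as (lower 64 bits, upper 64 bits).
using Constant128 = std::pair<std::uint64_t, std::uint64_t>;

// Rotate left by `amount` bits; valid for 0 < amount < 64.
constexpr std::uint64_t RotateLeft(std::uint64_t x, unsigned amount) {
    return (x << amount) | (x >> (64 - amount));
}

// Order-dependent hash: rotating one half before XOR-ing means (a, b) and
// (b, a) generally hash differently, whereas a plain a ^ b would always
// make them collide.
struct Constant128Hash {
    std::size_t operator()(const Constant128& c) const noexcept {
        return static_cast<std::size_t>(c.first ^ RotateLeft(c.second, 1));
    }
};

// Hypothetical constant pool: maps each 128-bit constant to an offset.
using ConstantPoolMap = std::unordered_map<Constant128, std::size_t, Constant128Hash>;
```

Because the hasher is a separate template argument, swapping the container for `tsl::robin_map`, which takes a hasher in the same position, can keep the hash unchanged.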
Morph
28714ee75a general: Rename files with duplicate names
In MSVC, having files with identical filenames results in massive slowdowns when compiling.
The approach I have taken to resolve this is renaming the identically named files in frontend/(A32, A64) to (a32, a64)_filename.cpp/h
2021-12-23 11:38:58 +00:00
Andrea Pappacoda
4dcebc1822 build(cmake): add install target
This makes dynarmic installable, and also adds a CMake package config
file, that allows projects to use `find_package(dynarmic)` to import the
library.

I know #636 adds the same thing, but while experimenting with the
different install options in
https://github.com/merryhime/dynarmic/pull/636#discussion_r725656034
I ended up with a working patch, so I'm proposing this as well. This
implements solution 2.
2021-10-30 19:03:23 +01:00
Mai M
cce7e4ee5d
Merge pull request #651 from ameerj/fmt-cmake
externals/cmake: Fix fmt target check
2021-10-12 14:33:36 -04:00
ameerj
4cfbbe3df2 externals/cmake: Fix fmt target check 2021-10-11 13:44:19 -04:00
Andrea Pappacoda
b87a889d98 build(cmake): add version and soversion to the library
This adds versioning information to the built library.

When building the shared library on Linux systems, a new object will
be created: libdynarmic.so.5

This is useful for tracking ABI compatibility, since the major version is part of the shared object name.

The variables dynarmic_VERSION and dynarmic_VERSION_MAJOR
are implicitly created when calling project(dynarmic VERSION x.y.z)
2021-10-11 06:53:05 +01:00
ameerj
55bede81f8 CMake: Fix fmt target check 2021-10-11 06:52:52 +01:00
Fernando S
e4146ec3a1
x64 Interface: Allow for asynchronous invalidation (#647)
* x64 Interface: Make Invalidation asynchronous.

* Apply suggestions from code review
2021-10-05 15:06:41 +01:00
Wunkolo
5e7d2afe0f IR: Introduce VectorReduceAdd{8,16,32,64} opcode
Adds all elements of a vector and places the result in the lowest element.
Accelerates the `addv` instruction into a vectorized implementation
rather than a serial one.
2021-09-27 19:54:11 +01:00
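As a rough illustration, a scalar reference model of what a reduce-add computes over the lanes of a 128-bit vector. `ReduceAddModel` is a hypothetical name; the 8/16/32/64 variants correspond to `Lane` being `uint8_t`/`uint16_t`/`uint32_t`/`uint64_t` with 16/8/4/2 lanes. Zeroing the remaining lanes is an assumption made here, mirroring how A64 `ADDV` writes a scalar result into the low element of the destination.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar model: sum every lane with wrap-around (as integer lanes do in
// hardware) and place the total in lane 0.
// Assumption for this sketch: all other lanes of the result are zero.
template <typename Lane, std::size_t N>
std::array<Lane, N> ReduceAddModel(const std::array<Lane, N>& v) {
    Lane sum = 0;
    for (Lane x : v) {
        // Narrow lanes are promoted to int for the addition; casting back
        // wraps the result modulo 2^(bits of Lane).
        sum = static_cast<Lane>(sum + x);
    }
    std::array<Lane, N> result{};  // value-initialized: every lane is zero
    result[0] = sum;
    return result;
}
```

A serial implementation performs N - 1 dependent additions, which is what the vectorized `addv` path is meant to avoid.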
Wunkolo
69b831d7d2 tests: Add {S,V}ADD{V,P} tests
These are the instructions emitted for each variant of the `vaddv{q}_{s}{8,16,32,64}` intrinsic.
2021-09-27 19:54:11 +01:00
Marshall Mohror
0b8fd755d8 Fix signal_stack_size for glibc 2.34
`SIGSTKSZ` is now defined as `sysconf(_SC_SIGSTKSZ)`, which is not constexpr and returns a `long`, which breaks `std::max` template argument deduction.
2021-09-22 20:38:11 +01:00
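A minimal sketch of one way to compute the size under these constraints: evaluate it at run time and name the `std::max` template argument explicitly so mixed `long`/`int` operands no longer need to deduce a common type. `SignalStackSize` and `kMinSignalStackSize` are hypothetical names, and the 2 MiB floor is an illustrative value, not taken from the commit.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <signal.h>  // SIGSTKSZ
#include <unistd.h>  // sysconf, which glibc >= 2.34 may use to define SIGSTKSZ

// Illustrative minimum; the project's actual value may differ.
constexpr std::size_t kMinSignalStackSize = 2 * 1024 * 1024;

// Not constexpr: with glibc 2.34, SIGSTKSZ may expand to a runtime call.
std::size_t SignalStackSize() {
    // SIGSTKSZ may be an int constant or a long returned by sysconf.
    const long reported = static_cast<long>(SIGSTKSZ);
    if (reported <= 0) {
        // sysconf reports failure as -1; fall back to the minimum.
        return kMinSignalStackSize;
    }
    // The explicit <std::size_t> avoids deducing T from a long and an int.
    return std::max<std::size_t>(static_cast<std::size_t>(reported), kMinSignalStackSize);
}
```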
Ben
6ce8bfaf32
Add API function to retrieve disassembly as vector of strings (#644)
Co-authored-by: ben <Avuxo@users.noreply.github.com>
2021-09-16 16:45:20 -04:00
Macchiarch
f88aa570a3
cpu_info: remove tSSE4a and tSSE5 (#643)
tSSE4a and tSSE5 have been removed from xbyak
2021-09-06 20:49:10 +01:00
merry
1697902948
Merge pull request #641 from abouvier/unbundle
CMakeLists: Add options to unbundle most external libraries
2021-08-25 07:56:12 +01:00
Alexandre Bouvier
352898e88b cmake: Add options to unbundle Zydis 2021-08-24 12:28:44 +02:00
Merry
517e35f845 decoder_detail: Avoid MSVC ICE
MSVC has an internal compiler error when assume is present in this constexpr function
2021-08-15 19:32:05 +01:00
Merry
2e4f99ae3d CMakeLists: Expose DYNARMIC_IGNORE_ASSERTS option 2021-08-15 16:09:37 +01:00
Merry
3b4459d112 CMakeLists: Enable C++20 support 2021-08-15 15:17:01 +01:00
Merry
4988d9fab3 disassembler_arm: Fix format strings for vfp_VMOV_from_i{8,16} 2021-08-15 15:16:53 +01:00
Merry
615ce8c7c5 IR: Remove A32 IR instructions Get{N,Z,V}Flag 2021-08-12 13:06:15 +01:00
Alexandre Bouvier
04b1c78166 cmake: Add checks for projects using dynarmic as subproject 2021-08-10 16:16:02 +02:00
Alexandre Bouvier
33b89cca08 cmake: Add options to unbundle some externals 2021-08-10 16:05:38 +02:00
Merry
72f8abe11d externals: Update mp to latest
Merge commit '163b59390c32745f95838b121be3ef5e2cf08e8c'
2021-08-10 12:30:46 +01:00
Merry
163b59390c Squashed 'externals/mp/' changes from 649fde1e..b50053ce
b50053ce function_info: Implement equivalent_function_type_with_class

git-subtree-dir: externals/mp
git-subtree-split: b50053cef50385419c59fb3aebb78974547318bc
2021-08-10 12:30:46 +01:00
Merry
2bc86209bd catch: Correct include directory 2021-08-08 12:52:55 +01:00
Wunkolo
1e94acff66 ir: Add VectorBroadcastElement{Lower} IR instruction
The lane-splatting variant of `FMUL` and `FMLA` is very
common in instruction streams when implementing things like
matrix multiplication, and where it is used, it is used very densely.

https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/coding-for-neon---part-3-matrix-multiplication

The way this is currently implemented is by grabbing the particular lane
into a general purpose register and then broadcasting it into a simd
register through `VectorGetElement` and `VectorBroadcast`.

```cpp
    const IR::U128 operand2 = v.ir.VectorBroadcast(esize, v.ir.VectorGetElement(esize, v.V(idxdsize, Vm), index));
```

What could be done instead is to keep it within
the vector-register and use a permute/shuffle to "splat" the particular
lane across all other lanes, removing the GPR-round-trip.

This is implemented as the new IR instruction `VectorBroadcastElement`:

```cpp
    const IR::U128 operand2 = v.ir.VectorBroadcastElement(esize, v.V(idxdsize, Vm), index);
```
2021-08-07 23:03:57 +01:00
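A small scalar model can show that the two IR sequences above compute the same values and differ only in codegen. `GetElement`, `Broadcast` and `BroadcastElement` here are hypothetical stand-ins for the IR operations, modelled on `std::array` lanes; they are not dynarmic's API.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

template <typename Lane, std::size_t N>
using Vec = std::array<Lane, N>;

// Two-step form: extract one lane (on x64 this is the GPR round-trip),
// then broadcast that scalar to every lane.
template <typename Lane, std::size_t N>
Lane GetElement(const Vec<Lane, N>& v, std::size_t index) {
    return v.at(index);
}

template <typename Lane, std::size_t N>
Vec<Lane, N> Broadcast(Lane value) {
    Vec<Lane, N> out{};
    out.fill(value);
    return out;
}

// One-step form: "splat" lane `index` across the vector without leaving
// the vector domain. Same result, different instruction selection.
template <typename Lane, std::size_t N>
Vec<Lane, N> BroadcastElement(const Vec<Lane, N>& v, std::size_t index) {
    Vec<Lane, N> out{};
    for (auto& lane : out) {
        lane = v.at(index);
    }
    return out;
}
```

On x64 the one-step form maps naturally onto a single shuffle; for 32-bit lanes, for example, `pshufd` with the immediate `index * 0x55` replicates lane `index` into all four lanes.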
Wunkolo
46b8cfabc0 bit_util: Protect Replicate from automatic up-casting
Recursive calls to `Replicate` beyond the first call might
cause an unintentional up-casting to an `int` type due
to `|` and `<<` operations on types such as `uint8_t` and `uint16_t`

This makes sure calls such as `Replicate<u8>` stay as the `u8` type
throughout.
2021-08-07 23:03:57 +01:00
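A minimal sketch of a doubling `Replicate` that guards against promotion, assuming it repeats the low `element_size` bits of `value` across the whole type. The signature and `BitSize` helper are assumptions for illustration and may not match `bit_util` exactly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <type_traits>

// Number of value bits in an unsigned integer type.
template <typename T>
constexpr std::size_t BitSize() {
    return static_cast<std::size_t>(std::numeric_limits<T>::digits);
}

// Sketch: repeat the low `element_size` bits of `value` across all of T.
// Assumed preconditions: element_size > 0, BitSize<T>() is a multiple of
// element_size, and the bits of `value` above element_size are zero.
template <typename T>
constexpr T Replicate(T value, std::size_t element_size) {
    static_assert(std::is_unsigned_v<T>, "this sketch assumes unsigned T");
    if (element_size >= BitSize<T>()) {
        return value;
    }
    // For uint8_t/uint16_t, `value | value << element_size` has type int
    // (integer promotion). Casting back to T keeps the recursive call
    // instantiated with T; without the cast it would deduce T = int and
    // keep doubling up to 32 bits on a signed type.
    return Replicate(static_cast<T>(value | (value << element_size)), element_size * 2);
}
```

For example, `Replicate<std::uint8_t>(0b01, 2)` doubles 2 → 4 → 8 bits and returns `0x55`, with every step typed as `uint8_t`.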
Wunkolo
f171ce7859 tests: Add FMLA(lane) test
Math operations such as matrix multiplication use these particular
instructions often enough that they deserve some unit tests of their own.
The lane-splatting forms of the FMUL and FMLA instructions are of particular
interest, and I've found them to be very common in retail game binaries
such as Pokemon Sword.

https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/coding-for-neon---part-3-matrix-multiplication

I'm primarily adding this unit test so that I can ensure compatibility
while I tune and optimize them.
2021-08-07 23:03:57 +01:00
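For reference, the lane (by-element) form of FMLA on four single-precision lanes computes `d[i] = d[i] + n[i] * m[index]` for every lane `i`, with one lane of `m` splatted across the vector. The scalar model below is a hypothetical sketch of that semantic (FPCR rounding modes and flush-to-zero are ignored), using `std::fma` so the multiply-add is fused with a single rounding.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

using Float4 = std::array<float, 4>;

// Hypothetical scalar model of FMLA (by element), 4 x float32:
//   d[i] = d[i] + n[i] * m[index]   for each lane i
Float4 FmlaByElement(Float4 d, const Float4& n, const Float4& m, std::size_t index) {
    const float scalar = m.at(index);  // the lane that gets splatted
    for (std::size_t i = 0; i < d.size(); ++i) {
        d[i] = std::fma(n[i], scalar, d[i]);  // fused: one rounding step
    }
    return d;
}
```

In a 4x4 matrix-vector product, this pattern accumulates each column of the matrix scaled by one element of the vector, which is why the instruction tends to appear in dense runs.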
Merry
d41bc492fe {a32,a64}_jitstate: Remove unnecessary headers 2021-08-07 19:35:33 +01:00
Merry
07b5734fb0 xbyak: Correct xbyak include directory
xbyak is intended to be installed in /usr/local/include/xbyak.
Since we do not want to install xbyak before using it, we copy the headers
to the appropriate directory structure and use that instead.
2021-08-07 15:13:49 +01:00
Merry
31cefb22a0 fuzz_with_unicorn: Correct printing of vectors 2021-08-06 15:29:43 +01:00
Merry
59fb568b27 tests: Use Zydis for disassembly 2021-08-06 15:29:43 +01:00
Wunkolo
f33bd69ec2 emit_x64_vector_floating_point: AVX512 implementation of EmitFPVectorToFixed
AVX512 introduces the _unsigned_ variant of float-to-integer conversion
functions via `vcvttp{sd}2u{dq}q`. In the case that a value is not
representable as an unsigned integer, it will result in `0xFFFFF...`
which can be utilized to get "free" saturation when the floating point
value exceeds the unsigned range, after masking away negative values.

https://www.felixcloutier.com/x86/vcvttps2udq
https://www.felixcloutier.com/x86/vcvttpd2uqq

This PR also speeds up the _signed_ conversion function for fp64->int64
https://www.felixcloutier.com/x86/vcvttpd2qq
2021-07-17 22:13:11 +01:00
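A scalar sketch of the "free saturation" trick, with zero fractional bits and a single 32-bit lane. `ModelVcvttps2udq` is a hypothetical model of one lane of `vcvttps2udq`: truncate toward zero, and return all ones when the result is not representable. `FloatToU32Saturating` then masks negative (and, in this sketch, NaN) inputs to 0. A64 `FCVTZU` produces 0 for both. How the real emitter handles NaN is not stated above, so that part is an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Model of one lane of vcvttps2udq: truncate toward zero; if the result is
// not representable as uint32_t (NaN, values <= -1.0, values >= 2^32),
// produce all ones.
std::uint32_t ModelVcvttps2udq(float x) {
    if (std::isnan(x)) return 0xFFFFFFFFu;
    const double t = std::trunc(static_cast<double>(x));
    if (t < 0.0 || t > 4294967295.0) return 0xFFFFFFFFu;
    return static_cast<std::uint32_t>(t);
}

// Saturating float -> uint32 conversion built on the model above:
//  - too-large positive inputs already come back as 0xFFFFFFFF
//  - negative inputs are masked to 0 (they would otherwise give all ones)
//  - NaN is also masked to 0 here (assumption for this sketch)
std::uint32_t FloatToU32Saturating(float x) {
    const std::uint32_t converted = ModelVcvttps2udq(x);
    const bool mask_to_zero = std::isnan(x) || x < 0.0f;
    return mask_to_zero ? 0u : converted;
}
```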
SachinVin
048da372e9 block_of_code.cpp: remove redundant align() 2021-07-17 22:12:31 +01:00
Kappamalone
6ca6461450
docs/Design: Fix links (#633) 2021-07-11 19:22:46 +01:00
Merry
65309eb6bc gitignore: Update mig path 2021-07-11 11:38:43 +01:00
Wunkolo
5971361160 IR: Add AndNot{32,64} IR instruction
Also includes BMI1-acceleration for x64, when available
2021-07-02 22:27:29 +01:00
Wunkolo
49d00634f9 IR: Add VectorAndNot IR instruction
And(a, Not(b)) is a common enough operation that this can
be fused into a single `AndNot` operation. On x64 this is also
a single `pandn` instruction rather than two.
2021-07-02 22:27:29 +01:00
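A small sketch of the `AndNot` semantic and of one x64 detail to keep in mind: `pandn` (like BMI1 `andn`) negates its *first* operand, computing `~first & second`. So `AndNot(a, b)` corresponds to `pandn` with `b` in the destination register and `a` as the source. The function names below are hypothetical models, not dynarmic's IR emitter code.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// AndNot(a, b) = a & ~b : the fused form of And(a, Not(b)).
template <typename T>
constexpr T AndNot(T a, T b) {
    return static_cast<T>(a & ~b);
}

// Model of PANDN on one 64-bit half: dest = ~dest & src.
constexpr std::uint64_t PandnModel(std::uint64_t dest, std::uint64_t src) {
    return ~dest & src;
}

// 128-bit vector version, modelled as two 64-bit halves.
using U128 = std::array<std::uint64_t, 2>;

U128 VectorAndNot(const U128& a, const U128& b) {
    return {AndNot(a[0], b[0]), AndNot(a[1], b[1])};
}
```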
Wunkolo
253713baf1 opcodes.inc: Disable clang format 2021-07-02 22:27:29 +01:00
Wunkolo
1fc96fd0c2 emit_x64{_vector}_floating_point: Unsafe AVX512 implementation of Emit{RSqrt,Recip}Estimate
This implementation lives within the unsafe optimization paths and
utilizes the 14-bit-precision `vrsqrt14*` and `vrcp14p*`
instructions provided by AVX512F+VL. These are _more_ accurate than
the fallback path and the current `rsqrt`-based unsafe code-path
but still fall in line with what is expected of the
`Unsafe_ReducedErrorFP` optimization flag.

With AVX512 available, these functions have 14 bits of precision.
Without AVX512, they have 11 bits of precision.
2021-06-27 11:18:58 +01:00
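The "bits of precision" figures follow from the maximum relative errors Intel documents for these instructions: at most 1.5 × 2^-12 for `rsqrtps`/`rcpps`, and less than 2^-14 for `vrsqrt14*`/`vrcp14*`. Bits of precision is then -log2(relative error). `PrecisionBits` is a hypothetical helper that shows the arithmetic.

```cpp
#include <cassert>
#include <cmath>

// Convert a bound on relative error into bits of precision:
//   bits = -log2(relative_error)
double PrecisionBits(double max_relative_error) {
    return -std::log2(max_relative_error);
}
```

For `rsqrtps`, -log2(1.5 × 2^-12) = 12 - log2(1.5) ≈ 11.4, so 11 whole bits. For the AVX512 `*14` variants the bound 2^-14 gives 14 bits.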
MerryMage
ea02a7d05d conditional_state: Break from translation when invalid NV instruction is hit 2021-06-25 22:09:39 +01:00
merry
7946868af4
Merge pull request #629 from lioncash/fmt
externals: Update fmt to 8.0.0
2021-06-23 13:52:31 +01:00