Commit graph

79 commits

Author SHA1 Message Date
merry
593de127d2 a64_emit_x64: Clear fastmem patch information on ClearCache 2022-02-27 19:50:05 +00:00
Merry
c90173151e backend/x64: Split off memory emitters 2022-02-26 21:25:09 +00:00
Merry
19a423034e block_of_code: Fix inaccurate size reporting in SpaceRemaining
Typo: getCode should be getCurr: Instead of comparing against the current pointer,
we were incorrectly comparing against the start of memory
2022-02-26 16:09:11 +00:00
Merry
ea08a389b4 emit_x64_floating_point: EmitFPToFixed: No need to round if rounding_mode == TowardsZero
cvttsd2si truncates during operation
2022-02-23 20:44:02 +00:00
merry
b34214f953 emit_x64_floating_point: Improve EmitFPToFixed codegen 2022-02-23 19:42:15 +00:00
merry
5fe274f510 emit_x64_floating_point: Deinterlace 64-bit FPToFixed signed/unsigned codepaths 2022-02-23 19:14:41 +00:00
merry
b8dd1c7510 emit_x64_floating_point: Correct dead-code warning in MSVC 2019 2022-02-12 22:07:26 +00:00
merry
95a1ebfb97 backend/x64: Bugfix: A32 frontend also uses FPSCR.QC 2022-02-12 21:46:45 +00:00
Fernando Sahmkow
a8cbfd9af4 X86_Backend: set fences correctly for memory barriers and synchronization. 2022-02-01 14:27:54 +00:00
liushuyu
40afbe1927
disassembler_thumb: fix formatting issues with fmt 8.1.x ...
... fmt 8.1.0 added more formatting checks and Cond can't be formatted
directly now
2022-01-05 21:49:51 -07:00
Wunkolo
ad5465d6ce constant_pool: Use tsl::robin_map rather than unordered_map
Finding a much more drastic improvement with `robin_map`.

`map`:
```
[master] % hyperfine -r 100 "./dynarmic_tests --durations yes"
Benchmark 1: ./dynarmic_tests --durations yes
  Time (mean ± σ):     567.0 ms ±   6.9 ms    [User: 513.1 ms, System: 53.2 ms]
  Range (min … max):   554.4 ms … 588.1 ms    100 runs
```

`unordered_map`:
```
[opt_const_pool] % hyperfine -r 100 "./dynarmic_tests --durations yes"
Benchmark 1: ./dynarmic_tests --durations yes
  Time (mean ± σ):     561.1 ms ±   4.5 ms    [User: 508.1 ms, System: 52.3 ms]
  Range (min … max):   552.6 ms … 574.2 ms    100 runs
```

`tsl::robin_map`:
```
[opt_const_pool] % hyperfine -r 100 "./dynarmic_tests --durations yes"
Benchmark 1: ./dynarmic_tests --durations yes
  Time (mean ± σ):     553.5 ms ±   5.6 ms    [User: 500.7 ms, System: 52.1 ms]
  Range (min … max):   545.7 ms … 569.3 ms    100 runs
```
2022-01-01 12:13:13 +00:00
Wunkolo
e57bb0569a constant_pool: Convert hashtype from tuple to pair 2022-01-01 12:13:13 +00:00
Wunkolo
befc22a61e constant_pool: Use unordered_map rather than map
`map` is an ordered structure with O(log n) searches.
`unordered_map` has O(1) average-time searches and O(n) in the worst
case, where colliding hashes land in one bucket and it has to start chaining.
The unordered version should speed up our general case when looking up
constants.

I've added a trivial order-dependent hash (_(0,1) and (1,0) will return a
different hash_) that combines a 128-bit constant into a
64-bit hash that generally will not collide, using a bit-rotate to
preserve entropy.
2022-01-01 12:13:13 +00:00
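The order-dependent hash described in befc22a61e can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the names `Constant128`, `ConstantHash`, `ConstantMap` and `RotL`, the rotate amount of 1, and the use of XOR to combine the halves are all assumptions.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>

// A 128-bit constant stored as two 64-bit halves (hypothetical alias).
using Constant128 = std::pair<std::uint64_t, std::uint64_t>;

// Rotate left; n must be in the range 1..63.
constexpr std::uint64_t RotL(std::uint64_t value, unsigned n) {
    return (value << n) | (value >> (64 - n));
}

// Hypothetical hasher: rotating one half before combining makes the result
// order-dependent, so (0,1) and (1,0) hash differently, and a rotate (unlike
// a plain shift) does not discard any bits of that half.
struct ConstantHash {
    std::size_t operator()(const Constant128& c) const noexcept {
        return static_cast<std::size_t>(c.first ^ RotL(c.second, 1));
    }
};

// Maps each constant to an illustrative slot index in the pool.
using ConstantMap = std::unordered_map<Constant128, std::size_t, ConstantHash>;
```

A plain `first ^ second` would give (0,1) and (1,0) the same hash, which is the collision the rotate avoids.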
Morph
28714ee75a general: Rename files with duplicate names
In MSVC, having files with identical filenames results in massive slowdowns when compiling.
The approach I have taken to resolve this is renaming the identically named files in frontend/(A32, A64) to (a32, a64)_filename.cpp/h
2021-12-23 11:38:58 +00:00
Andrea Pappacoda
4dcebc1822 build(cmake): add install target
This makes dynarmic installable, and also adds a CMake package config
file, that allows projects to use `find_package(dynarmic)` to import the
library.

I know #636 adds the same thing, but while experimenting with the
different install options in
https://github.com/merryhime/dynarmic/pull/636#discussion_r725656034
I ended up with a working patch, so I'm proposing this as well. This
implements solution 2.
2021-10-30 19:03:23 +01:00
Andrea Pappacoda
b87a889d98 build(cmake): add version and soversion to the library
This adds versioning information to the built library.

When building the shared library on Linux systems, a new object will
be created: libdynarmic.so.5

This is really useful when talking about ABI compatibility.

The variables dynarmic_VERSION and dynarmic_VERSION_MAJOR
are implicitly created when calling project(dynarmic VERSION x.y.z)
2021-10-11 06:53:05 +01:00
Fernando S
e4146ec3a1
x64 Interface: Allow for asynchronous invalidation (#647)
* x64 Interface: Make Invalidation asynchronous.

* Apply suggestions from code review
2021-10-05 15:06:41 +01:00
Wunkolo
5e7d2afe0f IR: Introduce VectorReduceAdd{8,16,32,64} opcode
Adds all elements of a vector and puts the result into the lowest element.
Accelerates the `addv` instruction into a vectorized implementation
rather than a serial one.
2021-09-27 19:54:11 +01:00
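As a scalar reference for what 5e7d2afe0f describes, the reduction can be modelled like this. `VectorReduceAddModel` is a hypothetical name, and zeroing the lanes above the lowest one is an assumption of this sketch; the commit message only says where the sum ends up.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Scalar model: sum every lane with wrap-around arithmetic and place the
// sum in lane 0. The other lanes are zeroed here (an assumption).
template <typename T, std::size_t N>
std::array<T, N> VectorReduceAddModel(const std::array<T, N>& v) {
    T sum = 0;
    for (const T lane : v) {
        sum = static_cast<T>(sum + lane);  // cast back: u8/u16 promote to int
    }
    std::array<T, N> result{};
    result[0] = sum;
    return result;
}
```

For four 32-bit lanes holding 1, 2, 3 and 4, lane 0 of the result is 10.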
Marshall Mohror
0b8fd755d8 Fix signal_stack_size for glibc 2.34
`SIGSTKSZ` is now defined as `sysconf(_SC_SIGSTKSZ)` which is not constexpr, and returns a long which throws off the `std::max` template deduction.
2021-09-22 20:38:11 +01:00
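The deduction problem mentioned in 0b8fd755d8 can be reproduced without glibc. In this sketch `QuerySigStackSize` is a hypothetical stand-in for the new `SIGSTKSZ` expansion, and the 2 MiB floor and the returned value are illustrative only.

```cpp
#include <algorithm>
#include <cstddef>

// Stand-in for a SIGSTKSZ that expands to a runtime sysconf() call:
// not constexpr, and of type long. The value is illustrative.
inline long QuerySigStackSize() {
    return 8192;
}

inline std::size_t SignalStackSize() {
    constexpr std::size_t minimum = 2 * 1024 * 1024;  // illustrative floor
    // std::max(QuerySigStackSize(), minimum) does not compile, because T
    // cannot be deduced as both long and std::size_t. Naming T explicitly
    // sidesteps deduction. The result is computed at run time.
    return std::max<std::size_t>(static_cast<std::size_t>(QuerySigStackSize()), minimum);
}
```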
Ben
6ce8bfaf32
Add API function to retrieve disassembly as vector of strings (#644)
Co-authored-by: ben <Avuxo@users.noreply.github.com>
2021-09-16 16:45:20 -04:00
Merry
517e35f845 decoder_detail: Avoid MSVC ICE
MSVC has an internal compiler error when assume is present in this constexpr function
2021-08-15 19:32:05 +01:00
Merry
2e4f99ae3d CMakeLists: Expose DYNARMIC_IGNORE_ASSERTS option 2021-08-15 16:09:37 +01:00
Merry
4988d9fab3 disassembler_arm: Fix format strings for vfp_VMOV_from_i{8,16} 2021-08-15 15:16:53 +01:00
Merry
615ce8c7c5 IR: Remove A32 IR instructions Get{N,Z,V}Flag 2021-08-12 13:06:15 +01:00
Wunkolo
1e94acff66 ir: Add VectorBroadcastElement{Lower} IR instruction
The lane-splatting variant of `FMUL` and `FMLA` is very
common in instruction streams when implementing things like
matrix multiplication. When used, they are used very densely.

https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/coding-for-neon---part-3-matrix-multiplication

The way this is currently implemented is by grabbing the particular lane
into a general purpose register and then broadcasting it into a simd
register through `VectorGetElement` and `VectorBroadcast`.

```cpp
    const IR::U128 operand2 = v.ir.VectorBroadcast(esize, v.ir.VectorGetElement(esize, v.V(idxdsize, Vm), index));
```

What could be done instead is to keep it within
the vector-register and use a permute/shuffle to "splat" the particular
lane across all other lanes, removing the GPR-round-trip.

This is implemented as the new IR instruction `VectorBroadcastElement`:

```cpp
    const IR::U128 operand2 = v.ir.VectorBroadcastElement(esize, v.V(idxdsize, Vm), index);
```
2021-08-07 23:03:57 +01:00
Wunkolo
46b8cfabc0 bit_util: Protect Replicate from automatic up-casting
Recursive calls to `Replicate` beyond the first call might
cause an unintentional up-casting to an `int` type due
to `|` and `<<` operations on types such as `uint8_t` and `uint16_t`

This makes sure calls such as `Replicate<u8>` stay as the `u8` type
throughout.
2021-08-07 23:03:57 +01:00
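The up-casting issue in 46b8cfabc0 comes from C++ integer promotion. The following is a simplified sketch of a `Replicate`-style helper, not the code in `bit_util`; it assumes an unsigned `T` and a power-of-two `element_size`.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Repeats the low `element_size` bits of `value` across the whole of T.
template <typename T>
constexpr T Replicate(T value, std::size_t element_size) {
    static_assert(std::is_unsigned_v<T>, "sketch assumes an unsigned type");
    if (element_size >= sizeof(T) * 8) {
        return value;
    }
    // For u8/u16 operands, `value | (value << n)` has type int. Casting back
    // and naming T explicitly keeps every recursive call in the same type.
    return Replicate<T>(static_cast<T>(value | (value << element_size)), element_size * 2);
}

// The promotion itself: OR-ing two u8 values yields an int.
static_assert(std::is_same_v<decltype(std::uint8_t{1} | std::uint8_t{2}), int>);
static_assert(Replicate<std::uint8_t>(0b01, 2) == 0b01010101);
```

Without the explicit `<T>` and the cast, the recursive call would deduce `T = int` and return a wider pattern than the caller asked for.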
Merry
d41bc492fe {a32,a64}_jitstate: Remove unnecessary headers 2021-08-07 19:35:33 +01:00
Merry
07b5734fb0 xbyak: Correct xbyak include directory
xbyak is intended to be installed in /usr/local/include/xbyak.
Since we desire not to install xbyak before using it, we copy the headers
to the appropriate directory structure and use that instead
2021-08-07 15:13:49 +01:00
Merry
59fb568b27 tests: Use Zydis for disassembly 2021-08-06 15:29:43 +01:00
Wunkolo
f33bd69ec2 emit_x64_vector_floating_point: AVX512 implementation of EmitFPVectorToFixed
AVX512 introduces the _unsigned_ variant of float-to-integer conversion
functions via `vcvttp{sd}2u{dq}q`. In the case that a value is not
representable as an unsigned integer, it will result in `0xFFFFF...`
which can be utilized to get "free" saturation when the floating point
value exceeds the unsigned range, after masking away negative values.

https://www.felixcloutier.com/x86/vcvttps2udq
https://www.felixcloutier.com/x86/vcvttpd2uqq

This PR also speeds up the _signed_ conversion function for fp64->int64
https://www.felixcloutier.com/x86/vcvttpd2qq
2021-07-17 22:13:11 +01:00
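The "free" saturation mentioned in f33bd69ec2 can be modelled on a single scalar lane. Both function names are hypothetical. How the commit actually masks negative lanes is not shown here, and mapping NaN to zero is an assumption of this sketch.

```cpp
#include <cmath>
#include <cstdint>

// Model of the behaviour the commit describes for the hardware conversion:
// inputs whose truncation is not representable as u32 give all-ones.
inline std::uint32_t TruncToU32HardwareModel(float x) {
    if (std::isnan(x) || x <= -1.0f || x >= 4294967296.0f) {
        return 0xFFFFFFFFu;
    }
    return static_cast<std::uint32_t>(x);  // truncates toward zero
}

// Saturating conversion: clear negative (and, by assumption, NaN) inputs
// first. The all-ones overflow result then already equals the saturated
// maximum, so no separate clamp of the upper bound is needed.
inline std::uint32_t ToU32Saturated(float x) {
    const float masked = (x > 0.0f) ? x : 0.0f;  // false for NaN and negatives
    return TruncToU32HardwareModel(masked);
}
```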
SachinVin
048da372e9 block_of_code.cpp: remove redundant align() 2021-07-17 22:12:31 +01:00
Wunkolo
5971361160 IR: Add AndNot{32,64} IR instruction
Also includes BMI1-acceleration for x64, when available
2021-07-02 22:27:29 +01:00
Wunkolo
49d00634f9 IR: Add VectorAndNot IR instruction
And(a, Not(b)) is a common enough operation that this can
be fused into a single `AndNot` operation. On x64 this is also
a single `pandn` instruction rather than two.
2021-07-02 22:27:29 +01:00
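For the two `AndNot` commits above (5971361160 and 49d00634f9), the IR-level meaning and the x86 operand order can be compared in a short scalar sketch. `X86AndnModel` is a hypothetical helper. It models the fact that `pandn` and the BMI1 `andn` invert their first source operand, so the operands are swapped relative to `And(a, Not(b))`.

```cpp
#include <cstdint>

// IR-level meaning as described in the commit message: And(a, Not(b)).
constexpr std::uint64_t AndNot(std::uint64_t a, std::uint64_t b) {
    return a & ~b;
}

// Model of the x86 instructions: the FIRST source is the inverted one.
constexpr std::uint64_t X86AndnModel(std::uint64_t src1, std::uint64_t src2) {
    return ~src1 & src2;
}

static_assert(AndNot(0b1100, 0b1010) == 0b0100);
// Same result once the operands are swapped for the x86 form.
static_assert(AndNot(0b1100, 0b1010) == X86AndnModel(0b1010, 0b1100));
```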
Wunkolo
253713baf1 opcodes.inc: Disable clang format 2021-07-02 22:27:29 +01:00
Wunkolo
1fc96fd0c2 emit_x64{_vector}_floating_point: Unsafe AVX512 implementation of Emit{RSqrt,Recip}Estimate
This implementation exists within the unsafe optimization paths and
utilizes the 14-bit-precision `vrsqrt14*` and `vrcp14p*`
instructions provided by AVX512F+VL. These are _more_ accurate than
the fallback path and the current `rsqrt`-based unsafe code-path
but still fall in line with what is expected of the
`Unsafe_ReducedErrorFP` optimization flag.

Having AVX512 available will mean this function has 14 bits of precision.
Not having AVX512 available will mean these functions have 11 bits of precision.
2021-06-27 11:18:58 +01:00
MerryMage
ea02a7d05d conditional_state: Break from translation when invalid NV instruction is hit 2021-06-25 22:09:39 +01:00
Lioncash
9bb464a203 externals: Update fmt to 8.0.0 2021-06-23 05:04:53 -04:00
Wunkolo
c6125082ea emit_x64_floating_point: AVX512 implementation of EmitFPMinMaxNumeric 2021-06-20 10:12:27 +01:00
SachinVin
a626a2ec63 ir_emitter: Remove 32-bit-only SubWithCarry 2021-06-11 17:27:34 +01:00
Wunkolo
776208742b emit_x64_{vector_}floating_point: Centralize implementation of FP{Vector}{Abs,Neg}
Removes dependency on the constants at the top of some files
such as `f16_negative_zero` and `f32_non_sign_mask` in favor
of the `FPInfo` trait-type.

Also removes bypass delays by selecting between instructions
such as `pand`, `andps`, or `andpd` depending on the type
and keeps them in their respective uop domain.

See https://www.agner.org/optimize/instruction_tables.pdf for
more info on bypass delays.
2021-06-10 00:04:57 +01:00
Wunkolo
58ffde23f9 bit_util: Make Replicate constexpr 2021-06-10 00:04:57 +01:00
SachinVin
ccf27f9c8c ir_emitter: Remove 32-bit-only AddWithCarry 2021-06-09 01:54:03 +01:00
Wunkolo
5385edcc66 emit_x64_vector_floating_point: AVX512 implementation of EmitFPVector{Min,Max}{32,64} 2021-06-08 17:50:28 +01:00
Wunkolo
0c67b913fe backend/x64: Add vcmp constants 2021-06-08 17:50:28 +01:00
Wunkolo
8fde505943 backend/x64: Add vfpclass constants
Bit-wise constants for use with the `vfpclass` instruction.
2021-06-08 17:50:28 +01:00
Wunkolo
c82e29ed82 backend/x64: Add vrange constants
Adds compile-time `FpRangeLUT` for generating the 8-bit
immediate LUT value for the `vrange*` instruction
2021-06-08 17:50:28 +01:00
MerryMage
c1d5a7977e Add Unsafe_IgnoreStandardFPCRValue optimization 2021-06-08 17:26:45 +01:00
Wunkolo
c157dfcc4c emit_x64_vector: Reduce gf2p8affineqb requirement to GFNI
Currently, every usage of `gf2p8affineqb` is guarded by the
`AVX512F + AVX512VL + GFNI` requirement, when really
we only need `GFNI` on its own.

This will allow `GFNI`-only chips to emit GFNI features without
needing to have AVX512 as well.
There _are_ chips in existence currently that strictly ship with GFNI and
have no implementation of AVX1/AVX2/AVX512 (and thus no VEX/EVEX
encoding) such as Tremont (Lakefield) chips.
2021-06-08 14:00:00 +01:00
Wunkolo
e47d0d11c3 emit_x64_vector: AVX512 implementation of EmitVectorNot
Single in-place ternary logic instruction.
2021-06-08 03:11:38 +01:00
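A ternary logic instruction such as `vpternlogq` takes an 8-bit immediate that acts as a truth table over its three inputs, which is why a NOT fits in one instruction. The sketch below is a software model of one 64-bit lane. The names `kTernA`, `kTernB`, `kTernC` and `TernaryLogic64` are hypothetical, and which immediate e47d0d11c3 actually emits is not shown here.

```cpp
#include <cstdint>

// Truth-table columns for the three inputs; an immediate for any boolean
// function is built by applying that function to these columns.
constexpr std::uint8_t kTernA = 0xF0;
constexpr std::uint8_t kTernB = 0xCC;
constexpr std::uint8_t kTernC = 0xAA;

// For each bit position, the three input bits form an index into imm8.
constexpr std::uint64_t TernaryLogic64(std::uint8_t imm8, std::uint64_t a,
                                       std::uint64_t b, std::uint64_t c) {
    std::uint64_t result = 0;
    for (int bit = 0; bit < 64; ++bit) {
        const unsigned index = static_cast<unsigned>(
            (((a >> bit) & 1) << 2) | (((b >> bit) & 1) << 1) | ((c >> bit) & 1));
        result |= static_cast<std::uint64_t>((imm8 >> index) & 1) << bit;
    }
    return result;
}

// NOT of the third input is simply the complement of its column.
constexpr std::uint8_t kNotC = static_cast<std::uint8_t>(~kTernC);
static_assert(kNotC == 0x55);
static_assert(TernaryLogic64(kNotC, 0, 0, 0x00FF00FF00FF00FFull) == 0xFF00FF00FF00FF00ull);
```

Because the result depends only on the third input, the first two operands can be any register, which allows the operation to be done in place.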
Markus Wick
0c12614d1a A64/config.h: Split fastmem and page_table options.
We might want to allocate different sizes for each of them,
e.g. for the unsafe fastmem approach without bounds checking,
or for using the full 48-bit address range (with mirrors) by allocating our real arena as close to 1<<47 as possible.
2021-06-06 17:25:51 +01:00