* externals: Add oaknut submodule
Used for emitting ARM64 assembly
* common: Implement aarch64 ABI
Utilize oaknut to implement a stack frame.
* tests: Allow shader-jit tests for x64 and a64
Run the shader-jit tests for both x86_64 and arm64 targets
* video_core: Initialize arm64 shader-jit backend
Passes all current unit tests!
* shader_jit_a64: protect/unprotect memory when jit-ing
Required on macOS. Memory needs to be fully unprotected before writing
and then re-protected afterwards, or there will be memory access errors
on macOS.
* shader_jit_a64: Fix ARM64-Imm overflow
These conditionals were throwing exceptions since the immediate values
were overflowing the available space in the `EOR` instructions. Instead
they are generated from `MOV` and then `EOR`-ed after.
* shader_jit_a64: Fix Geometry shader conditional
* shader_jit_a64: Replace `ADRL` with `MOVP2R`
Fixes some immediate-generation exceptions.
* common/aarch64: Fix CallFarFunction
* shader_jit_a64: Optimize `SanitizedMul`
Co-authored-by: merryhime <merryhime@users.noreply.github.com>
* shader_jit_a64: Fix address register offset behavior
Based on https://github.com/citra-emu/citra/pull/6942
Passes unit tests.
* shader_jit_a64: Fix `RET` address offset
The A64 stack is 16-byte aligned rather than 8-byte aligned, so a direct
port of the x64 code won't work. Fixes weird branches into invalid memory
for any shaders with subroutines.
* shader_jit_a64: Increase max program size
Tuned for A64 program size.
* shader_jit_a64: Use `UBFX` for extracting loop-state
Co-authored-by: JosJuice <JosJuice@users.noreply.github.com>
* shader_jit_a64: Optimize `SUB+CMP` to `SUBS`
Co-authored-by: JosJuice <JosJuice@users.noreply.github.com>
* shader_jit_a64: Optimize `CMP+B` to `CBNZ`
Co-authored-by: JosJuice <JosJuice@users.noreply.github.com>
* shader_jit_a64: Use `FMOV` for `ONE` vector
Co-authored-by: JosJuice <JosJuice@users.noreply.github.com>
* shader_jit_a64: Remove x86-specific documentation
* shader_jit_a64: Use `UBFX` to extract exponent
Co-authored-by: JosJuice <JosJuice@users.noreply.github.com>
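As an illustration of what `UBFX` does in the loop-state and exponent changes, here is a small software model. It assumes the value being inspected is an IEEE-754 binary32 float, which this log does not spell out. `Ubfx` and `FloatExponentBits` are hypothetical names, not functions from the JIT.

```cpp
#include <cstdint>
#include <cstring>

// Software model of `UBFX Wd, Wn, #lsb, #width`: extract `width` bits
// starting at bit `lsb` and zero-extend them.
std::uint32_t Ubfx(std::uint32_t value, unsigned lsb, unsigned width) {
    const std::uint32_t mask =
        width >= 32 ? ~std::uint32_t{0} : (std::uint32_t{1} << width) - 1;
    return (value >> lsb) & mask;
}

// Assumption: binary32 layout, with the biased exponent in bits [30:23].
std::uint32_t FloatExponentBits(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof(bits));
    return Ubfx(bits, 23, 8);
}
```

A single `UBFX` replaces the shift-and-mask pair that a direct port would otherwise emit.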
* shader_jit_a64: Remove redundant MIN/MAX `SRC2`-NaN check
Special handling only needs to check SRC1 for NaN, not SRC2.
It would work as follows in the four possible cases:
No NaN: No special handling needed.
Only SRC1 is NaN: The special handling is triggered because SRC1 is NaN, and SRC2 is picked.
Only SRC2 is NaN: FMAX automatically picks SRC2 because it always picks the NaN if there is one.
Both SRC1 and SRC2 are NaN: The special handling is triggered because SRC1 is NaN, and SRC2 is picked.
Co-authored-by: JosJuice <JosJuice@users.noreply.github.com>
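The four cases above can be checked against a small model. This is a sketch under two assumptions: that `FMAX` propagates a NaN input, and that the reference behavior is "pick SRC2 whenever SRC1 is NaN". All function names are hypothetical, and signed-zero handling is deliberately left out.

```cpp
#include <cmath>
#include <limits>

// Model of the A64 FMAX NaN behavior: a NaN input wins.
float ArmFmax(float a, float b) {
    if (std::isnan(a)) {
        return a;
    }
    if (std::isnan(b)) {
        return b;
    }
    return a > b ? a : b;
}

// Model of the emitted sequence: only SRC1 gets an explicit NaN check.
float JitMax(float src1, float src2) {
    if (std::isnan(src1)) {
        return src2; // special handling: SRC2 is picked
    }
    // If SRC2 is NaN, FMAX already returns it, so no second check is needed.
    return ArmFmax(src1, src2);
}
```

In every case that involves a NaN the result is SRC2, which is why the second check is redundant.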
* shader_jit/tests: Add catch-stringifier for vec2f/vec3f
* shader_jit/tests: Add Dest Mask unit test
* shader_jit_a64: Fix Dest-Mask `BSL` operand order
Passes the dest-mask unit tests now.
* shader_jit_a64: Use `MOVI` for DestEnable mask
Accelerate certain cases of masking with MOVI as well
Co-authored-by: JosJuice <JosJuice@users.noreply.github.com>
* shader_jit/tests: Add source-swizzle unit test
This is not exhaustive. Generating all `4^4` cases seems to make Catch2
crash. So I've added some component-masking (non-reordering) tests based
on the Dest-Mask unit-test and some additional ones to test
broadcasts/splats and component re-ordering.
* shader_jit_a64: Fix swizzle index generation
This was still generating `SHUFPS` indices and not the ones that we wanted for the `TBL` instruction. Passes all unit tests now.
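To show how the two index formats differ, here is a sketch for a vector of four 32-bit components. A `SHUFPS`-style immediate packs one 2-bit selector per lane, while `TBL` takes one byte index per destination byte. `Selectors`, `MakeShufpsImm` and `MakeTblIndices` are hypothetical names, not the JIT's actual helpers.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Source component (0..3) chosen for each destination lane.
using Selectors = std::array<std::uint8_t, 4>;

// x64 style: 2 bits per lane, packed into one 8-bit immediate.
std::uint8_t MakeShufpsImm(const Selectors& sel) {
    return static_cast<std::uint8_t>(sel[0] | (sel[1] << 2) | (sel[2] << 4) |
                                     (sel[3] << 6));
}

// A64 style: 16 byte indices, four consecutive bytes per 32-bit component.
std::array<std::uint8_t, 16> MakeTblIndices(const Selectors& sel) {
    std::array<std::uint8_t, 16> indices{};
    for (std::size_t lane = 0; lane < 4; ++lane) {
        for (std::size_t byte = 0; byte < 4; ++byte) {
            indices[lane * 4 + byte] =
                static_cast<std::uint8_t>(sel[lane] * 4 + byte);
        }
    }
    return indices;
}
```

For a full reversal (`wzyx`), the packed immediate is `0x1B`, while the `TBL` table starts with bytes 12, 13, 14, 15.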
* shader_jit/tests: Add `ShaderSetup` constructor to `ShaderTest`
Rather than using the direct output of `CompileShaderSetup`, allow a
`ShaderSetup` object to be passed in directly. This makes it possible to
emit assembly that is not directly supported by nihstro.
* shader_jit/tests: Add `CALL` unit-test
Tests nested `CALL` instructions to eventually reach an `EX2`
instruction.
EX2 is picked in particular since it is implemented as an even deeper
dispatch and ensures subroutines are properly implemented between `CALL`
instructions and implementation-calls.
* shader_jit_a64: Fix nested `BL` subroutines
`lr` was getting overwritten by nested calls to `BL`, causing undefined
behavior with mixtures of `CALL`, `EX2`, and `LG2` instructions.
Each usage of `BL` is now protected with a stack push/pop to preserve
and restore the `lr` register to allow nested subroutines to work
properly.
* shader_jit/tests: Allocate generated tests on heap
These generated shader-test objects were causing the stack to
overflow. Allocate each of the generated tests on the heap and use
unique_ptr so they only exist within the lifetime of the `REQUIRE`
statement.
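A minimal sketch of the pattern, using a hypothetical `BigTest` type in place of the real generated test objects: the temporary `std::unique_ptr` is destroyed at the end of the full expression, so the large object never occupies stack space and does not outlive the check that uses it.

```cpp
#include <array>
#include <memory>

// Hypothetical stand-in for a large generated shader-test object.
struct BigTest {
    std::array<unsigned char, 1 << 20> storage{}; // 1 MiB of state
    int Run(int input) const {
        return input * 2;
    }
};

int RunOnHeap(int input) {
    // The object lives on the heap and is freed when this expression ends.
    return std::make_unique<BigTest>()->Run(input);
}
```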
* shader_jit_a64: Preserve `lr` register from external function calls
`EMIT` makes an external function call, and should be preserving `lr`
* shader_jit/tests: Add `MAD` unit-test
The Inline Asm version requires an upstream fix:
https://github.com/neobrain/nihstro/issues/68
Instead, the program code is manually configured and added.
* shader_jit/tests: Fix uninitialized instructions
These `union`-type instruction-types were uninitialized, causing tests
to fail nondeterministically.
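As a sketch of the kind of fix involved, empty-brace initialization zeroes the first member of a union, so fields that are set afterwards start from a known state. The `Instruction` union below is a simplified, hypothetical stand-in and not nihstro's actual type.

```cpp
#include <cstdint>

// Hypothetical, simplified instruction word.
union Instruction {
    std::uint32_t hex;
    struct {
        std::uint16_t lo;
        std::uint16_t hi;
    } parts;
};

Instruction MakeInstruction(std::uint32_t opcode_bits) {
    Instruction instr{}; // zero-initializes `hex`; a bare `Instruction instr;` would not
    instr.hex |= opcode_bits;
    return instr;
}
```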
* shader_jit_a64: Remove unneeded `MOV`
Residue from the direct-port of x64 code.
* shader_jit_a64: Use `std::array` for `instr_table`
Add some type-safety and const-correctness around this type as well.
* shader_jit_a64: Avoid c-style offset casting
Add some more const-correctness to this function as well.
* video_core: Add arch preprocessor comments
* common/aarch64: Use X16 as the veneer register
https://developer.arm.com/documentation/102374/0101/Procedure-Call-Standard
* shader_jit/tests: Add uniform reading unit-test
Particularly to ensure that addresses are being properly truncated
* common/aarch64: Use `X0` as `ABI_RETURN`
`X8` is used as the indirect return result value in the case that the
result is bigger than 128-bits. Principally `X0` is the general-case
return register though.
* common/aarch64: Add veneer register note
`LR` is generally overwritten by `BLR` anyway, and would also be a safe
veneer register to use for far-calls.
* shader_jit_a64: Remove unneeded scratch register from `SanitizedMul`
* shader_jit_a64: Fix CALLU condition
Should be `EQ` not `NE`. Fixes the regression on Kid Icarus.
No known regressions anymore!
---------
Co-authored-by: merryhime <merryhime@users.noreply.github.com>
Co-authored-by: JosJuice <JosJuice@users.noreply.github.com>
* video_core: Abstract shader generators.
* shader: Extract common generator structures and move generators to specific namespaces.
* shader: Minor fixes and clean-up.
* sw_framebuffer: Take factors into account for min/max blending
* renderer_gl: Take factors into account for min/max blending
* Address review comments
* gl_shader_gen: Fix framebuffer fetch on qcom and mali
* renderer_opengl: Add fallback path for mesa
* gl_shader_gen: Avoid emitting blend emulation if minmax_factor is present
Xbyak has a complete utility-class for determining the host-processor's
ISA-features such as SSE4.1, AVX, AVX2, AVX512{F,VL,DQ,VBMI,etc}, and so
on for further potential optimizations.
* tests: add Sanity test for SplitFilename83
fix test
* disable `C4715:not all control paths return a value` for nihstro includes
nihstro: no warn
* Chore: Enable warnings as errors on msvc + fix warnings
fixes
some more warnings
clang-format
* more fixes
* Externals: Add target_compile_options `/W0` nihstro-headers and ...
Revert "disable `C4715:not all control paths return a value` for nihstro includes"
This reverts commit 606d79b55d3044b744fb835025b8eb0f4ea5b757.
* src\citra\config.cpp: ReadSetting: simplify type casting
* settings.cpp: Get*Name: remove superfluous logs
* externals: Update dynarmic
* settings: Introduce GraphicsAPI enum
* For now it's OpenGL only but will be expanded upon later
* citra_qt: Introduce backend agnostic context management
* Mostly a direct port from yuzu
* core: Simplify context acquire
* settings: Add option to create debug contexts
* renderer_opengl: Abstract initialization to Driver
* This commit also updates glad and adds some useful extensions which we will use in part 2
* Rasterizer construction is moved to the specific renderer instead of RendererBase.
Software rendering has been disabled to achieve this but will be brought back in the next commit.
* video_core: Remove Init/Shutdown methods from renderer
* The constructor and destructor can do the same job
* In addition, move OpenGL function loading to Qt since SDL already does this. Also remove ErrorVideoCore, which is never reached
* citra_qt: Decouple software renderer from opengl part 1
* citra: Decouple software renderer from opengl part 2
* android: Decouple software renderer from opengl part 3
* swrasterizer: Decouple software renderer from opengl part 4
* This commit simply enforces the renderer naming conventions in the software renderer
* video_core: Move RendererBase to VideoCore
* video_core: De-globalize screenshot state
* video_core: Pass system to the renderers
* video_core: Commonize shader uniform data
* video_core: Abstract backend agnostic rasterizer operations
* bootmanager: Remove references to OpenGL for macOS
The macOS OpenGL header definitions clash heavily with each other
* citra_qt: Proper title for api settings
* video_core: Reduce boost usage
* bootmanager: Fix hide mouse option
Remove event handlers from RenderWidget for events that are
already handled by the parent GRenderWindow.
Also enable mouse tracking on the RenderWidget.
* android: Remove software from graphics api list
* code: Address review comments
* citra: Port per-game settings read
* Having to update the default value for all backends is a pain, so let's centralize it
* android: Rename to OpenGLES
---------
Co-authored-by: MerryMage <MerryMage@users.noreply.github.com>
Co-authored-by: Vitor Kiguchi <vitor-kiguchi@hotmail.com>
xbyak is intended to be installed in /usr/local/include/xbyak.
Since we desire not to install xbyak before using it, we copy the headers
to the appropriate directory structure and use that instead
Co-authored-by: merry <git@mary.rs>
I made a request on the Xbyak issue tracker to allow some constructors
to be constexpr in order to avoid static constructors from needing to
execute for some of our register constants.
This request was implemented, so this updates Xbyak so that we can make
use of it.
We are going to add private members to ShaderSetup, which forbids the usage of offsetof. The JIT program only uses the uniform part of the setup, so we can just isolate it.
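A sketch of the idea, with hypothetical type and field names: `offsetof` is only guaranteed to work on standard-layout types, and a class that mixes public and private data members is not standard-layout. Keeping the uniforms in their own plain struct lets generated code keep computing offsets into it.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical uniform block; the field shapes are assumptions for illustration.
struct Uniforms {
    float f[96][4];
    bool b[16];
    unsigned char i[4][4];
};
static_assert(std::is_standard_layout<Uniforms>::value,
              "offsetof requires a standard-layout type");

// Hypothetical setup class: mixed access control makes it non-standard-layout.
class Setup {
public:
    Uniforms uniforms;
    int cached() const {
        return cached_state;
    }

private:
    int cached_state = 0;
};
static_assert(!std::is_standard_layout<Setup>::value,
              "Setup mixes public and private data members");

// Offsets are taken relative to the isolated Uniforms struct only.
std::size_t BoolUniformOffset(std::size_t index) {
    return offsetof(Uniforms, b) + index * sizeof(bool);
}
```

The generated code would then receive a pointer to the `Uniforms` part instead of the whole setup object.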