Allocating new textures has fairly high driver overhead.
We can avoid some of this by reusing the textures from destroyed surfaces since the game will probably create more textures with the same dimensions and format.
Some games (e.g. Pilotwings Resort) create many surfaces that are invalidated quickly but were never removed.
This occasionally lead to large lag spikes due to high lookup times and other data structure management overhead.
I made a request on the Xbyak issue tracker to allow some constructors
to be constexpr in order to avoid static constructors from needing to
execute for some of our register constants.
This request was implemented, so this updates Xbyak so that we can make
use of it.
fmt now automatically prints the numeric value of an enum class member by default, so we don't need to use casts any more.
Reduces the line noise in our code a bit.
Co-Authored-By: LC <712067+lioncash@users.noreply.github.com>
* Enable 'Accurate Multiplication' by default.
* Move 'Disk Shader Cache' to the 'Advanced' tab
* Prevent enabling 'Disk Shader Cache' when 'Enable Hardware Shader' or 'Accurate Multiplication' is disabled.
* Do not load 'Disk Shader Cache' when 'Accurate Multiplication' is disabled.
* Add a tooltip for 'Disk Shader Cache'.
Allows some implementations to avoid completely zeroing out the internal
buffer of the optional, and instead only set the validity byte within
the structure.
This also makes it consistent how we return empty optionals.
Co-Authored-By: LC <712067+lioncash@users.noreply.github.com>
The settings.h file doesn't actually need all of the definitions
on cam.h, only some of the enums. They can, therefore, be separated
into another file, which is included by settings.h instead.
The other changes are fixing files that included settings.h and
depended on indirect includes from includes of includes of cam.h
Some of the classes in this file already do this, so we can apply this
to the other ones to be consistent.
Allows these classes to play nicely and not churn copies when used with
standard containers or any other API that makes use of
std::move_if_noexcept.
* Avoid deadlock when stopping video dumping
* Use static_cast, make quit atomic
* One more atomic load
* Use suggested lock instead of atomic
* Fix locking
* gl_rasterizer_cache: Mark file-scope functions as static where applicable
Prevents -Wmissing-declaration warnings from occurring and also makes
these functions internally linked.
* gl_rasterizer_cache: Remove unused local std::string variable
Despite being unused, compilers are unable to completely remove any code
gen related to the construction and destruction of this variable, since
the destructor of std::string is non-trivial.
Thus, we can remove it to reduce a minor amount of unnecessary code
generation
* gl_rasterizer_cache: Mark hash implementation as noexcept
This shouldn't throw.
* gl_rasterizer_cache: Remove unused variable in ClearAll()
* gl_rasterizer_cache: Make use of const on references explicit
While declared as auto&, these actually behave as const auto& variables,
due to the constness of the container being iterated. We can make this
explicit for readability sake.
* gl_rasterizer_cache: Resolve truncation warnings
The size is forwarded to a std::memset call, which takes a std::size_t
as its size parameter, so we can just make this change to silence the
warnings.
* gl_rasterizer_cache: Resolve variable shadowing warnings
Prevents a -Wshadow warning from occurring.
Prevents the internal buffer in the std::optional from being zeroed out
unnecessarily and instead sets the validity byte only in some
implementations.
While we're at it, we can make use of std::move to eliminate unnecessary
heap reallocations from occurring.
Allows us to avoid even more string churn by allowing the AddLine
function to make use of fmt formatting so the string is formatted all at
once instead of concatenating multiple strings.
This is similar to how yuzu's decompiler works, which I've made function
the same way in the past.
Same behavior, no heap allocation.
strings returned from glGetString() are guaranteed to be static strings,
so this is safe to do. They're also guaranteed to be null-terminated.
Some implementations can use the std::nullopt_t constructor of
std::optional to avoid needing to completely zero out the internal
buffer of the optional and instead only set the validity byte within it.
e.g. Consider the following function:
std::optional<std::vector<ShaderDiskCacheRaw>> fn() {
return {};
}
With libc++ this will result in the following code generation on x86-64:
Fn():
mov rax, rdi
vxorps xmm0, xmm0, xmm0
vmovups ymmword ptr [rdi], ymm0
vzeroupper
ret
With libstdc++, we also get the similar equivalent:
Fn():
vpxor xmm0, xmm0, xmm0
mov rax, rdi
vmovdqu XMMWORD PTR [rdi], xmm0
vmovdqu XMMWORD PTR [rdi+16], xmm0
ret
If we change this function to return std::nullopt instead, then this
simplifies both the code gen from libc++ and libstdc++ down to:
Fn():
mov BYTE PTR [rdi+24], 0
mov rax, rdi
ret
Given how little of a change is necessary to result in better code
generation, this is essentially a "free" very minor optimization.
* video_core: reduce string allocations in shader decompiler
* use append for indentation instead of resize
Co-authored-by: Mat M. <mathew1800@gmail.com>
Same behavior, but doesn't result in an allocating copy of the passed in
string. Particularly given the string is only compared against other
existing strings.
Several standard constructors generally check if objects can be moved in
a non-throwing manner (usually via std::move_if_noexcept) to preserve
its exception guarantees. This means that if these were used with
certain containers any reallocations internally would cause resource
churn, as copies would be necessary instead of moves.
This way, if they're every used in that manner, the right behavior is
always performed.
Avoids copying the std::function when we don't need to. Particularly
given the std::function isn't actually stored anywhere, so there's no
need to move it.
This fixes#5067 by reverting a speculative change made in a previous PR.
From this one can conclude that, for disabled textures, black (0,0,0,1) is the correct colour and clear (0,0,0,0) is not.
* video_core/renderer_opengl/gl_rasterizer_cache: Create Format Reinterpretation Framework
Adds RGBA4 -> RGB5A1 reinterpretation commonly used by virtual console
If no matching surface can be found, ValidateSurface checks for a surface in the cache which is reinterpretable to the requested format.
If that fails, the cache is checked for any surface with a matching bit-width. If one is found, the region is flushed.
If not, the region is checked against dirty_regions to see if it was created entirely on the GPU.
If not, then the surface is flushed.
Co-Authored-By: James Rowe <jroweboy@users.noreply.github.com>
Co-Authored-By: Ben <b3n30@users.noreply.github.com>
temporary change to avoid merge conflicts with video dumping
* re-add D24S8->RGBA8 res_scale hack
* adress review comments
* fix dirty region check
* check for surfaces with invalid pixel format, and break logic into separate functions
* video_core/renderer_opengl: Move SurfaceParams into its own file
Some of its enums are needed outside of the rasterizer cache
and trying to use it caused circular dependencies.
* video_core/renderer_opengl: Overhaul the texture filter framework
This should make it less intrusive.
Now texture filtering doesn't have any mutable global state.
The texture filters now always upscale to the internal rendering resolution.
This simplifies the logic in UploadGLTexture and it simply takes the role of BlitTextures at the end of the function.
This also prevent extra blitting required when uploading to a framebuffer surface with a mismatched size.
* video_core/renderer_opengl: Use generated mipmaps for filtered textures
The filtered guest mipmaps often looked terrible.
* core/settings: Remove texture filter factor
* sdl/config: Remove texture filter factor
* qt/config: Remove texture filter factor
src/video_core/renderer_opengl/texture_filters/bicubic/bicubic.cpp:51:86: error: cannot initialize a parameter of type 'GLuint' (aka 'unsigned int') with an rvalue of type 'nullptr_t'
glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, NULL, 0);
^~~~
src/video_core/renderer_opengl/texture_filters/xbrz/xbrz_freescale.cpp:95:86: error: cannot initialize a parameter of type 'GLuint' (aka 'unsigned int') with an rvalue of type 'nullptr_t'
glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, NULL, 0);
^~~~
/usr/include/sys/_null.h:37:14: note: expanded from macro 'NULL'
#define NULL nullptr
^~~~~~~
The default is discrete_interval which has dynamic open-ness.
We only use right_open intervals anyway. In theory this could allow some compile-time optimizations.
This uses the mailbox model to move pixel downloading to its own thread, eliminating Nvidia's warnings and (possibly) making use of GPU copy engine.
To achieve this, we created a new mailbox type that is different from the presentation mailbox in that it never discards a rendered frame.
Also, I tweaked the projection matrix thing so that it can just draw the frame upside down instead of having the CPU flip it.
* videocore/renderer_opengl/gl_rasterizer_cache: Move bits per pixel table out of function
GCC and MSVC copy the table at runtime with the old implementation, which is wasteful and prevents inlining. Unfortunately, static constexpr variables are not legal in constexpr functions, so the table has to be external.
Also replaced non-standard assert with DEBUG_ASSERT_MSG.
* fix case of table name in assert
* set table to private
This is based on what was done using additional layouts, but modified
to have a variable to control rotation and making it so Single Screen
Layout behaves like Upright Single would, and Default Layout behaves
like Upright Double would, when the new variable is used.
Large Layout and Side Layout currently ignore the new variable.
New variable still currently doesn't have a hotkey.
Previously we would first attempt to use any buffer that was free,
meaning whichever buffer has already been displayed. This has poor
interactions when the operating system throttles the update rate of the
window, so if there isn't any free buffers available, just reuse the
oldest frame instead.
While QOpenGLWidget sounds like a good idea, it has issues which are
harder to debug due to how Qt manages the context behind the scenes. We
could probably work around any of these issues over time, but its
probably easier to do it ourselves with a QWindow directly.
Plus using QWindow + createWindowContainer is the easiest to use
configuration for Qt + Vulkan so this is probably much better in the
long run.
For better tracking of performance regressions on incoming changes, this
change adds a way to dump frametime to file by changing an ini config
option. This is intentionally hidden as its only useful to a small
number of individuals, and not really applicable to the general
userbase.
Two PBOs are used to speed up pixel copying process. To avoid getting the wrong speed/FPS, a new parameter is added to DrawScreens about whether to increase the frame count.
* Add Anaglyph 3D
Change 3D slider in-game
Change shaders while game is running
Move shader loading into function
Disable 3D slider setting when stereoscopy is off
The rest of the shaders
Address review issues
Documentation and minor fixups
Forgot clang-format
Fix shader release on SDL2-software rendering
Remove unnecessary state changes
Respect 3D factor setting regardless of stereoscopic rendering
Improve shader resolution passing
Minor setting-related improvements
Add option to toggle texture filtering
Rebase fixes
* One final clang-format
* Fix OpenGL problems
Since C++17, the introduction of deduction guides for locking facilities
means that we no longer need to hardcode the mutex type into the locks
themselves, making it easier to switch mutex types, should it ever be
necessary in the future.
Previously this check is in GetSurface (if (addr == 0)). This worked fine because GetTextureSurface directly forwarded the address value to GetSurface. However, now with mipmap support, GetTextureSurface would call GetSurface several times with different address offset, resulting some >0 but still invalid address in case the input is 0. We should error out early on invalid address instead of sending it furthor down which would cause invalid memory access
video_core: shorten GetGLSLVersionString
video_core: make GLES version and extensions consistent
video_core: move some logic to LoadShader
video_core: deduplicate fragment shader precision specifier
Allows capturing screenshot at the current internal resolution (native for software renderer), but a setting is available to capture it in other resolutions. The screenshot is saved to a single PNG in the current layout.
They were missed, and Copy is very high in profile here. It doesn't block the GPU,
but it stalls the driver thread. So with our bad GL instructions, this might block quite a while.