General moving to keep kernel object types separate from the direct
kernel code. Also essentially a preliminary cleanup before eliminating
global kernel state in the kernel code.
Kernel/HLE: Use a mutex to synchronize access to the HLE kernel state between the cpu thread and any other possible threads that might touch the kernel (network thread, etc).
This mutex is acquired in SVC::CallSVC, ie, as soon as the guest application enters the HLE kernel, and should be acquired by the aforementioned threads before modifying kernel structures.
The implementation is based on reverse engineering of the 3DS's kernel.
A mutex holder's priority will be temporarily boosted to the best priority among any threads that want to acquire any of its held mutexes.
When the holder releases the mutex, it's priority will be boosted to the best priority among the threads that want to acquire any of its remaining held mutexes.
Define a variable with the value of the sync timeout error code.
Use a boost::flat_map instead of an unordered_map to hold the equivalence of objects and wait indices in a WaitSynchN call.
Threads will now be awakened when the objects they're waiting on are signaled, instead of repeating the WaitSynchronization call every now and then.
The scheduler is now called once after every SVC call, and once after a thread is awakened from sleep by its timeout callback.
This new implementation is based off reverse-engineering of the real kernel.
See https://gist.github.com/Subv/02f29bd9f1e5deb7aceea1e8f019c8f4 for a more detailed description of how the real kernel handles rescheduling.
All handles obtained via srv::GetServiceHandle or svcConnectToPort are references to ClientSessions.
Service modules will wait on the counterpart of those ClientSessions (Called ServerSessions) using svcReplyAndReceive or svcWaitSynchronization[1|N], and will be awoken when a SyncRequest is performed.
HLE Interfaces are now ClientPorts which override the HandleSyncRequest virtual member function to perform command handling immediately.
The code now properly configures the process image to match the loaded
binary segments (code, rodata, data) instead of just blindly allocating
a large chunk of dummy memory.