Converted tests to use oaknut, and added some extra test cases. SSHL's
additional tests are targetd to make sure that the sign of the lowest
byte is used to determine shift direction, not the entire element. USHL
targets this as well as just having more negative (right shift) cases in
general.
Additionally changed the numeric values of the test vectors in the
32-bit element tests to match the pattern of the other tests - this
makes it easier to tell at a glance what elements are out of place if a
test fails.
Math operations such as Matrix multiplication utilize these particular
instructions enough that there should be some unit tests for thesein particular.
The lane-splatting form of FMUL and FMLA instructions are of particular
interest and I've found them to be very common in retail game binaries
such as Pokemon Sword.
https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/coding-for-neon---part-3-matrix-multiplication
I'm primarily adding this unit test so that I can ensure compatibility
while I tune and optimize them.
We might want to allocate different sizes for each of them.
e.g. for the unsafe fastmem approach without bounds checking.
Or for using the full 48bit adress range (with mirrors) by allocating our real arena as close to 1<<47 as possible.
Tests the TBL instruction with implementation with {1-4} register
lookups and the handling of out-of-bound indices.
Intended to target the implementation of VectorTableLookup128
* Return both the upper and lower parts of the multiply if required
* SSE2 does not support the pmuldq instruction, do sign correction to an unsigned result instead
* Improve port utilisation where possible (punpck instructions were a bottleneck)