docs: Update documentation (2018-02-05)

MerryMage 2018-02-05 22:30:39 +00:00
parent eb5591859c
commit ca43be4146
2 changed files with 102 additions and 106 deletions

@ -4,15 +4,18 @@ Dynarmic is a dynamic recompiler for the ARMv6K architecture. Future plans for d
support for other versions of the ARM architecture, having an interpreter mode, and adding support support for other versions of the ARM architecture, having an interpreter mode, and adding support
for other architectures. for other architectures.
Users of this library interact with it primarily through [`include/dynarmic/dynarmic.h`](../include/dynarmic/dynarmic.h). Users of this library interact with it primarily through the interface provided in
Users specify how dynarmic's CPU core interacts with the rest of their systems by setting members of the [`include/dynarmic`](../include/dynarmic). Users specify how dynarmic's CPU core interacts with
[`Dynarmic::UserCallbacks`](../include/dynarmic/callbacks.h) structure as appropriate. Users set up the CPU state using member functions of the rest of their system by providing an implementation of the relevant `UserCallbacks` interface.
`Dynarmic::Jit`, then call `Dynarmic::Jit::Execute` to start CPU execution. The callbacks defined on `UserCallbacks` Users set up the CPU state using member functions of `Jit`, then call `Jit::Execute` to start CPU
may be called from dynamically generated code, so users of the library should not depend on the stack being in a execution. The callbacks defined on `UserCallbacks` may be called from dynamically generated code,
walkable state for unwinding. so users of the library should not depend on the stack being in a walkable state for unwinding.
Dynarmic reads instructions from memory by calling `UserCallbacks::MemoryRead32`. These instructions then pass * A32: [`Jit`](../include/dynarmic/A32/a32.h), [`UserCallbacks`](../include/dynarmic/A32/config.h)
through several stages: * A64: [`Jit`](../include/dynarmic/A64/a64.h), [`UserCallbacks`](../include/dynarmic/A64/config.h)
Dynarmic reads instructions from memory by calling `UserCallbacks::MemoryReadCode`. These
instructions then pass through several stages:
1. Decoding (Identifying what type of instruction it is and breaking it up into fields) 1. Decoding (Identifying what type of instruction it is and breaking it up into fields)
2. Translation (Generation of high-level IR from the instruction) 2. Translation (Generation of high-level IR from the instruction)
@ -20,39 +23,39 @@ through several stages:
4. Emission (Generation of host-executable code into memory) 4. Emission (Generation of host-executable code into memory)
5. Execution (Host CPU jumps to the start of emitted code and runs it) 5. Execution (Host CPU jumps to the start of emitted code and runs it)
Using the x64 backend as an example: Using the A32 frontend with the x64 backend as an example:
* Decoding is done by [double dispatch](https://en.wikipedia.org/wiki/Visitor_pattern) in * Decoding is done by [double dispatch](https://en.wikipedia.org/wiki/Visitor_pattern) in
[`src/frontend/decoder/{arm.h,thumb16.h,thumb32.h}`](../src/frontend/decoder/). [`src/frontend/A32/decoder/{arm.h,thumb16.h,thumb32.h}`](../src/frontend/A32/decoder/).
* Translation is done by the visitors in `src/frontend/translate/translate_{arm,thumb}.cpp`. * Translation is done by the visitors in `src/frontend/A32/translate/translate_{arm,thumb}.cpp`.
The function [`IR::Block Translate(LocationDescriptor descriptor, MemoryRead32FuncType memory_read_32)`](../src/frontend/translate/translate.h) takes a The function [`Translate`](../src/frontend/A32/translate/translate.h) takes a starting memory location,
memory location, some CPU state, and memory reader callback and returns a basic block of IR. some CPU state, and memory reader callback and returns a basic block of IR.
* The IR can be found under [`src/frontend/ir/`](../src/frontend/ir/). * The IR can be found under [`src/frontend/ir/`](../src/frontend/ir/).
* Optimizations can be found under [`src/ir_opt/`](../src/ir_opt/). * Optimizations can be found under [`src/ir_opt/`](../src/ir_opt/).
* Emission is done by `EmitX64` which can be found in `src/backend_x64/emit_x64.{h,cpp}`. * Emission is done by `EmitX64` which can be found in `src/backend_x64/emit_x64.{h,cpp}`.
* Execution is performed by calling `BlockOfCode::RunCode` in `src/backend_x64/block_of_code.{h,cpp}`. * Execution is performed by calling `BlockOfCode::RunCode` in `src/backend_x64/block_of_code.{h,cpp}`.
## Decoder ## Decoder
The decoder is a double dispatch decoder. Each instruction is represented by a line in the relevant instruction table. The decoder is a double dispatch decoder. Each instruction is represented by a line in the relevant
Here is an example line from `g_arm_instruction_table`: instruction table. Here is an example line from [`arm.h`](../src/frontend/A32/decoder/arm.h):
INST(&V::arm_ADC_imm, "ADC (imm)", "cccc0010101Snnnnddddrrrrvvvvvvvv") INST(&V::arm_ADC_imm, "ADC (imm)", "cccc0010101Snnnnddddrrrrvvvvvvvv")
(Details on this instruction can be found in section A8.8.1 of the ARMv7-A manual. This is encoding A1.) (Details on this instruction can be found in section A8.8.1 of the ARMv7-A manual. This is encoding A1.)
The first argument to INST is the member function to call on the visitor. The second argument is a user-readable The first argument to INST is the member function to call on the visitor. The second argument is a user-readable
instruction name. The third argument is a bit-representation of the instruction. instruction name. The third argument is a bit-representation of the instruction.
### Instruction Bit-Representation ### Instruction Bit-Representation
Each character in the bitstring represents a bit. A `0` means that that bitposition **must** contain a zero. A `1` Each character in the bitstring represents a bit. A `0` means that that bitposition **must** contain a zero. A `1`
means that that bitposition **must** contain a one. A `-` means we don't care about the value at that bitposition. means that that bitposition **must** contain a one. A `-` means we don't care about the value at that bitposition.
A string of the same character represents a field. In the above example, the first four bits `cccc` represent the A string of the same character represents a field. In the above example, the first four bits `cccc` represent the
four-bit-long cond field of the ARM Add with Carry (immediate) instruction. four-bit-long cond field of the ARM Add with Carry (immediate) instruction.
The visitor would have to have a function named `arm_ADC_imm` with 6 arguments, one for each field (`cccc`, `S`, The visitor would have to have a function named `arm_ADC_imm` with 6 arguments, one for each field (`cccc`, `S`,
`nnnn`, `dddd`, `rrrr`, `vvvvvvvv`). If there is a mismatch of field number with argument number, a compile-time `nnnn`, `dddd`, `rrrr`, `vvvvvvvv`). If there is a mismatch of field number with argument number, a compile-time
error results. error results.
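As a rough illustration of how such a bitstring can drive decoding, the sketch below turns it into a mask/expected-value pair and extracts a named field. `Matcher`, `MakeMatcher` and `ExtractField` are hypothetical names invented for this example; they are not dynarmic's actual decoder code, which may be organised quite differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical helper types for illustration only.
struct Matcher {
    std::uint32_t mask = 0;    // bit positions that must match ('0' or '1')
    std::uint32_t expect = 0;  // required values at those positions

    constexpr bool Matches(std::uint32_t instruction) const {
        return (instruction & mask) == expect;
    }
};

// Builds a matcher from a bitstring written most significant bit first.
constexpr Matcher MakeMatcher(std::string_view bitstring) {
    Matcher m;
    for (char c : bitstring) {
        m.mask <<= 1;
        m.expect <<= 1;
        if (c == '0' || c == '1') {
            m.mask |= 1;
            if (c == '1') {
                m.expect |= 1;
            }
        }
        // Any other character ('-' or a field letter) is left unconstrained.
    }
    return m;
}

// Collects the bits labelled with `name` into a single field value.
constexpr std::uint32_t ExtractField(std::string_view bitstring, char name,
                                     std::uint32_t instruction) {
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < bitstring.size(); ++i) {
        if (bitstring[i] == name) {
            const std::uint32_t bit = (instruction >> (bitstring.size() - 1 - i)) & 1;
            value = (value << 1) | bit;
        }
    }
    return value;
}
```

With the `ADC (imm)` bitstring above, the fixed bits `0010101` occupy bits 27 to 21, so only those positions end up in the mask; every lettered position is free to vary and is read back with `ExtractField`.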
## Translator ## Translator
@ -62,9 +65,9 @@ help of the [`IREmitter` class](../src/frontend/ir/ir_emitter.h). An example of
bool ArmTranslatorVisitor::arm_ADC_imm(Cond cond, bool S, Reg n, Reg d, int rotate, Imm8 imm8) { bool ArmTranslatorVisitor::arm_ADC_imm(Cond cond, bool S, Reg n, Reg d, int rotate, Imm8 imm8) {
u32 imm32 = ArmExpandImm(rotate, imm8); u32 imm32 = ArmExpandImm(rotate, imm8);
// ADC{S}<c> <Rd>, <Rn>, #<imm> // ADC{S}<c> <Rd>, <Rn>, #<imm>
if (ConditionPassed(cond)) { if (ConditionPassed(cond)) {
auto result = ir.AddWithCarry(ir.GetRegister(n), ir.Imm32(imm32), ir.GetCFlag()); auto result = ir.AddWithCarry(ir.GetRegister(n), ir.Imm32(imm32), ir.GetCFlag());
@ -83,22 +86,22 @@ help of the [`IREmitter` class](../src/frontend/ir/ir_emitter.h). An example of
ir.SetVFlag(result.overflow); ir.SetVFlag(result.overflow);
} }
} }
return true; return true;
} }
where `ir` is an instance of the `IREmitter` class. Each member function of the `IREmitter` class constructs where `ir` is an instance of the `IREmitter` class. Each member function of the `IREmitter` class constructs
an IR microinstruction. an IR microinstruction.
## Intermediate Representation
Dynarmic uses an ordered SSA intermediate representation. It is very vaguely similar to those found in other
similar projects like redream, nucleus, and xenia. Major differences are: (1) the abundance of context microinstructions
whereas those projects generally only have two (`load_context`/`store_context`), (2) the explicit handling of
flags as their own values, and (3) very different basic block edge handling.
The intention of the context microinstructions and explicit flag handling is to allow for future optimizations. The ## Intermediate Representation
differences in the way edges are handled are a quirk of the current implementation and dynarmic will likely add a
Dynarmic uses an ordered SSA intermediate representation. It is very vaguely similar to those found in other
similar projects like redream, nucleus, and xenia. Major differences are: (1) the abundance of context
microinstructions whereas those projects generally only have two (`load_context`/`store_context`), (2) the
explicit handling of flags as their own values, and (3) very different basic block edge handling.
The intention of the context microinstructions and explicit flag handling is to allow for future optimizations. The
differences in the way edges are handled are a quirk of the current implementation and dynarmic will likely add a
function analyser in the medium-term future. function analyser in the medium-term future.
Dynarmic's intermediate representation is typed. Each microinstruction may take zero or more arguments and may Dynarmic's intermediate representation is typed. Each microinstruction may take zero or more arguments and may
@ -106,6 +109,8 @@ return zero or more arguments. A subset of the microinstructions available is do
A complete list of microinstructions can be found in [src/frontend/ir/opcodes.inc](../src/frontend/ir/opcodes.inc). A complete list of microinstructions can be found in [src/frontend/ir/opcodes.inc](../src/frontend/ir/opcodes.inc).
Some commonly used microinstructions are listed below.
### Immediate: Imm{U1,U8,U32,RegRef} ### Immediate: Imm{U1,U8,U32,RegRef}
<u1> ImmU1(u1 value) <u1> ImmU1(u1 value)
@ -120,13 +125,13 @@ by the IR.
<u32> GetRegister(<RegRef> reg) <u32> GetRegister(<RegRef> reg)
<void> SetRegister(<RegRef> reg, <u32> value) <void> SetRegister(<RegRef> reg, <u32> value)
Gets and sets `JitState::Reg[reg]`. Note that `SetRegister(Arm::Reg::R15, _)` is disallowed by `IREmitter`. Gets and sets `JitState::Reg[reg]`. Note that `SetRegister(Arm::Reg::R15, _)` is disallowed by `IREmitter`.
Use `{ALU,BX}WritePC` instead. Use `{ALU,BX}WritePC` instead.
Note that sequences like `SetRegister(R4, _)` followed by `GetRegister(R4)` are Note that sequences like `SetRegister(R4, _)` followed by `GetRegister(R4)` are
optimized away. optimized away.
### Context: {Get,Set}{N,Z,C,V}Flag ### Context: {Get,Set}{N,Z,C,V}Flag
<u1> GetNFlag() <u1> GetNFlag()
@ -143,7 +148,7 @@ Gets and sets bits in `JitState::Cpsr`. Similarly to registers redundant get/set
### Context: BXWritePC ### Context: BXWritePC
<void> BXWritePC(<u32> value) <void> BXWritePC(<u32> value)
This should probably be the last instruction in a translation block unless you're doing something fancy. This should probably be the last instruction in a translation block unless you're doing something fancy.
This microinstruction sets R15 and CPSR.T as appropriate. This microinstruction sets R15 and CPSR.T as appropriate.
@ -165,73 +170,73 @@ Extract a u16 and u8 respectively from a u32.
<u1> MostSignificantBit(<u32> value) <u1> MostSignificantBit(<u32> value)
<u1> IsZero(<u32> value) <u1> IsZero(<u32> value)
These are used to implement ARM flags N and Z. These can often be optimized away by the backend into a host flag read. These are used to implement ARM flags N and Z. These can often be optimized away by the backend into a host flag read.
### Calculation: LogicalShiftLeft ### Calculation: LogicalShiftLeft
(<u32> result, <u1> carry_out) LogicalShiftLeft(<u32> operand, <u8> shift_amount, <u1> carry_in) (<u32> result, <u1> carry_out) LogicalShiftLeft(<u32> operand, <u8> shift_amount, <u1> carry_in)
Pseudocode: Pseudocode:
if shift_amount == 0: if shift_amount == 0:
return (operand, carry_in) return (operand, carry_in)
x = operand * (2 ** shift_amount) x = operand * (2 ** shift_amount)
result = Bits<31,0>(x) result = Bits<31,0>(x)
carry_out = Bit<32>(x) carry_out = Bit<32>(x)
return (result, carry_out) return (result, carry_out)
This follows ARM semantics. Note that `shift_amount` is not masked to 5 bits, unlike the shift count of `SHL` on x64. This follows ARM semantics. Note that `shift_amount` is not masked to 5 bits, unlike the shift count of `SHL` on x64.
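A minimal C++ sketch of the pseudocode above, for readers who want to check edge cases such as shifts of 32 or more. `ResultAndCarry` and this free function are assumptions made for the example; they are not the IR microinstruction or the backend's emitted code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type for illustration.
struct ResultAndCarry {
    std::uint32_t result;
    bool carry_out;
};

constexpr ResultAndCarry LogicalShiftLeft(std::uint32_t operand, std::uint8_t shift_amount,
                                          bool carry_in) {
    if (shift_amount == 0) {
        return {operand, carry_in};
    }
    if (shift_amount > 32) {
        // Every operand bit, including the one that would land in bit 32, is shifted out.
        return {0, false};
    }
    // For 1..32 a 64-bit intermediate holds both the result and bit 32.
    const std::uint64_t x = std::uint64_t{operand} << shift_amount;
    return {static_cast<std::uint32_t>(x), ((x >> 32) & 1) != 0};
}
```

The explicit `shift_amount > 32` branch is what distinguishes this from a plain host shift, whose count would be masked.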
### Calculation: LogicalShiftRight ### Calculation: LogicalShiftRight
(<u32> result, <u1> carry_out) LogicalShiftRight(<u32> operand, <u8> shift_amount, <u1> carry_in) (<u32> result, <u1> carry_out) LogicalShiftRight(<u32> operand, <u8> shift_amount, <u1> carry_in)
Pseudocode: Pseudocode:
if shift_amount == 0: if shift_amount == 0:
return (operand, carry_in) return (operand, carry_in)
x = ZeroExtend(operand, from_size: 32, to_size: shift_amount+32) x = ZeroExtend(operand, from_size: 32, to_size: shift_amount+32)
result = Bits<shift_amount+31,shift_amount>(x) result = Bits<shift_amount+31,shift_amount>(x)
carry_out = Bit<shift_amount-1>(x) carry_out = Bit<shift_amount-1>(x)
return (result, carry_out) return (result, carry_out)
This follows ARM semantics. Note that `shift_amount` is not masked to 5 bits, unlike the shift count of `SHR` on x64. This follows ARM semantics. Note that `shift_amount` is not masked to 5 bits, unlike the shift count of `SHR` on x64.
### Calculation: ArithmeticShiftRight ### Calculation: ArithmeticShiftRight
(<u32> result, <u1> carry_out) ArithmeticShiftRight(<u32> operand, <u8> shift_amount, <u1> carry_in) (<u32> result, <u1> carry_out) ArithmeticShiftRight(<u32> operand, <u8> shift_amount, <u1> carry_in)
Pseudocode: Pseudocode:
if shift_amount == 0: if shift_amount == 0:
return (operand, carry_in) return (operand, carry_in)
x = SignExtend(operand, from_size: 32, to_size: shift_amount+32) x = SignExtend(operand, from_size: 32, to_size: shift_amount+32)
result = Bits<shift_amount+31,shift_amount>(x) result = Bits<shift_amount+31,shift_amount>(x)
carry_out = Bit<shift_amount-1>(x) carry_out = Bit<shift_amount-1>(x)
return (result, carry_out) return (result, carry_out)
This follows ARM semantics. Note that `shift_amount` is not masked to 5 bits, unlike the shift count of `SAR` on x64. This follows ARM semantics. Note that `shift_amount` is not masked to 5 bits, unlike the shift count of `SAR` on x64.
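The arithmetic shift pseudocode can be sketched in C++ as follows. The sign extension is done by hand on unsigned integers so the example does not rely on implementation-defined signed shifts; `ResultAndCarry` and the clamping of large shift amounts to 32 are choices made for this illustration, not a description of how the backend emits code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type for illustration.
struct ResultAndCarry {
    std::uint32_t result;
    bool carry_out;
};

constexpr ResultAndCarry ArithmeticShiftRight(std::uint32_t operand, std::uint8_t shift_amount,
                                              bool carry_in) {
    if (shift_amount == 0) {
        return {operand, carry_in};
    }
    // Sign-extend the operand to 64 bits.
    std::uint64_t x = operand;
    if ((operand & 0x80000000u) != 0) {
        x |= 0xFFFFFFFF00000000ull;
    }
    // Shifts of 32 or more all produce pure sign fill, so clamping to 32 is equivalent.
    const unsigned n = shift_amount > 32 ? 32u : shift_amount;
    const bool carry_out = ((x >> (n - 1)) & 1) != 0;
    return {static_cast<std::uint32_t>(x >> n), carry_out};
}
```

`LogicalShiftRight` can be sketched the same way by dropping the sign-extension step, except that shifts greater than 32 must then yield a zero carry.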
### Calculation: RotateRight ### Calculation: RotateRight
(<u32> result, <u1> carry_out) RotateRight(<u32> operand, <u8> shift_amount, <u1> carry_in) (<u32> result, <u1> carry_out) RotateRight(<u32> operand, <u8> shift_amount, <u1> carry_in)
Pseudocode: Pseudocode:
if shift_amount == 0: if shift_amount == 0:
return (operand, carry_in) return (operand, carry_in)
shift_amount %= 32 shift_amount %= 32
result = (operand >> shift_amount) | (operand << (32 - shift_amount)) result = (operand >> shift_amount) | (operand << (32 - shift_amount))
carry_out = Bit<31>(result) carry_out = Bit<31>(result)
return (result, carry_out) return (result, carry_out)
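A hedged C++ sketch of a right rotation with ARM-style carry out. It moves the low bits to the top with `operand >> n` combined with `operand << (32 - n)`, and treats a rotation by a multiple of 32 separately to avoid an out-of-range host shift. `ResultAndCarry` is a hypothetical type used only in this example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type for illustration.
struct ResultAndCarry {
    std::uint32_t result;
    bool carry_out;
};

constexpr ResultAndCarry RotateRight(std::uint32_t operand, std::uint8_t shift_amount,
                                     bool carry_in) {
    if (shift_amount == 0) {
        return {operand, carry_in};
    }
    const unsigned n = shift_amount % 32;
    // A rotation by a non-zero multiple of 32 leaves the value unchanged but still sets carry.
    const std::uint32_t result = n == 0 ? operand : (operand >> n) | (operand << (32 - n));
    return {result, (result >> 31) != 0};
}
```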
### Calculation: AddWithCarry ### Calculation: AddWithCarry
@ -243,7 +248,7 @@ a + b + carry_in
### Calculation: SubWithCarry ### Calculation: SubWithCarry
(<u32> result, <u1> carry_out, <u1> overflow) SubWithCarry(<u32> a, <u32> b, <u1> carry_in) (<u32> result, <u1> carry_out, <u1> overflow) SubWithCarry(<u32> a, <u32> b, <u1> carry_in)
This has equivalent semantics to `AddWithCarry(a, Not(b), carry_in)`. This has equivalent semantics to `AddWithCarry(a, Not(b), carry_in)`.
a - b - !carry_in a - b - !carry_in
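The relationship between the two microinstructions can be checked with a small C++ model. `AddResult` and these free functions are assumptions for illustration; they model the documented semantics rather than reproduce any dynarmic code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type for illustration.
struct AddResult {
    std::uint32_t result;
    bool carry_out;
    bool overflow;
};

constexpr AddResult AddWithCarry(std::uint32_t a, std::uint32_t b, bool carry_in) {
    const std::uint64_t wide = std::uint64_t{a} + std::uint64_t{b} + (carry_in ? 1 : 0);
    const std::uint32_t result = static_cast<std::uint32_t>(wide);
    const bool carry_out = (wide >> 32) != 0;
    // Signed overflow: both operands share a sign and the result's sign differs from it.
    const bool overflow = ((~(a ^ b) & (a ^ result)) >> 31) != 0;
    return {result, carry_out, overflow};
}

// a - b - !carry_in, expressed through the adder as described above.
constexpr AddResult SubWithCarry(std::uint32_t a, std::uint32_t b, bool carry_in) {
    return AddWithCarry(a, ~b, carry_in);
}
```

Note that for subtraction the carry flag acts as an inverted borrow: `SubWithCarry(5, 3, true)` computes `5 - 3` and sets carry because no borrow occurred.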
@ -251,17 +256,17 @@ a - b - !carry_in
### Calculation: And ### Calculation: And
<u32> And(<u32> a, <u32> b) <u32> And(<u32> a, <u32> b)
### Calculation: Eor ### Calculation: Eor
<u32> Eor(<u32> a, <u32> b) <u32> Eor(<u32> a, <u32> b)
Exclusive OR (i.e.: XOR) Exclusive OR (i.e.: XOR)
### Calculation: Or ### Calculation: Or
<u32> Or(<u32> a, <u32> b) <u32> Or(<u32> a, <u32> b)
### Calculation: Not ### Calculation: Not
<u32> Not(<u32> value) <u32> Not(<u32> value)
@ -282,17 +287,17 @@ Memory access.
### Terminal: Interpret ### Terminal: Interpret
SetTerm(IR::Term::Interpret{next}) SetTerm(IR::Term::Interpret{next})
This terminal instruction calls the interpreter, starting at `next`. This terminal instruction calls the interpreter, starting at `next`.
The interpreter must interpret exactly one instruction. The interpreter must interpret exactly one instruction.
### Terminal: ReturnToDispatch ### Terminal: ReturnToDispatch
SetTerm(IR::Term::ReturnToDispatch{}) SetTerm(IR::Term::ReturnToDispatch{})
This terminal instruction returns control to the dispatcher. This terminal instruction returns control to the dispatcher.
The dispatcher will use the value in R15 to determine what comes next. The dispatcher will use the value in R15 to determine what comes next.
### Terminal: LinkBlock ### Terminal: LinkBlock
SetTerm(IR::Term::LinkBlock{next}) SetTerm(IR::Term::LinkBlock{next})

@ -2,12 +2,14 @@
`HostLoc`s contain values. A `HostLoc` ("host value location") is either a host CPU register or a host spill location. `HostLoc`s contain values. A `HostLoc` ("host value location") is either a host CPU register or a host spill location.
Values once set cannot be changed. Values can however be moved by the register allocator between `HostLoc`s. This is handled by the register allocator itself and code that uses the register allocator need not and should not move values between registers. Values once set cannot be changed. Values can however be moved by the register allocator between `HostLoc`s. This is
handled by the register allocator itself and code that uses the register allocator need not and should not move values
between registers.
The register allocator is based on three concepts: `Use`, `Def` and `Scratch`. The register allocator is based on three concepts: `Use`, `Def` and `Scratch`.
* `Use`: The use of a value. * `Use`: The use of a value.
* `Def`: The definition of a value, this is the only time when a value is set. * `Define`: The definition of a value, this is the only time when a value is set.
* `Scratch`: Allocate a register that can be freely modified as one wishes. * `Scratch`: Allocate a register that can be freely modified as one wishes.
Note that `Use`ing a value decrements its `use_count` by one. When the `use_count` reaches zero the value is discarded and no longer exists. Note that `Use`ing a value decrements its `use_count` by one. When the `use_count` reaches zero the value is discarded and no longer exists.
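The use-counting rule can be modelled with a toy tracker. `UseCountTracker` and its member names are hypothetical and exist only for this sketch; the real register allocator tracks considerably more state (such as which `HostLoc` holds each value).

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Hypothetical model of use counting; not dynarmic's actual register allocator.
class UseCountTracker {
public:
    // Records a newly defined value together with the number of uses it will receive.
    void Define(int value_id, std::size_t use_count) {
        if (use_count > 0) {
            remaining_uses_[value_id] = use_count;
        }
    }

    // Consumes one use; the value is discarded once no uses remain.
    void Use(int value_id) {
        auto it = remaining_uses_.find(value_id);
        assert(it != remaining_uses_.end() && "value used after being discarded");
        if (--it->second == 0) {
            remaining_uses_.erase(it);
        }
    }

    bool IsLive(int value_id) const {
        return remaining_uses_.count(value_id) != 0;
    }

private:
    std::unordered_map<int, std::size_t> remaining_uses_;
};
```

A value defined with two expected uses stays live after the first `Use` and disappears after the second, which is why the location holding it can then be reused.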
@ -23,63 +25,52 @@ At runtime, allocate one of the registers in `desired_locations`. You are free t
### Pure `Use` ### Pure `Use`
Xbyak::Reg64 UseGpr(IR::Value use_value, HostLocList desired_locations = any_gpr); Xbyak::Reg64 UseGpr(Argument& arg);
Xbyak::Xmm UseXmm(IR::Value use_value, HostLocList desired_locations = any_xmm); Xbyak::Xmm UseXmm(Argument& arg);
OpArg UseOpArg(IR::Value use_value, HostLocList desired_locations); OpArg UseOpArg(Argument& arg);
void Use(Argument& arg, HostLoc host_loc);
At runtime, the value corresponding to `use_value` will be placed into one of the `HostLoc`s specified by `desired_locations`. The return value is the actual location. At runtime, the value corresponding to `arg` will be placed in a register. The actual register is determined by
which one of the above functions is called. `UseGpr` places it in an unused GPR, `UseXmm` places it
in an unused XMM register, `UseOpArg` may yield either a register or a memory location, and `Use` allows
you to specify a specific register (GPR or XMM) to use.
This register **must not** have its value changed. This register **must not** have its value changed.
* `UseGpr`: The location is a GPR.
* `UseXmm`: The location is an XMM register.
* `UseOpArg`: The location may be one of the locations specified by `desired_locations`, but may also be a host memory reference.
### `UseScratch` ### `UseScratch`
Xbyak::Reg64 UseScratchGpr(IR::Value use_value, HostLocList desired_locations = any_gpr) Xbyak::Reg64 UseScratchGpr(Argument& arg);
Xbyak::Xmm UseScratchXmm(IR::Value use_value, HostLocList desired_locations = any_xmm) Xbyak::Xmm UseScratchXmm(Argument& arg);
void UseScratch(Argument& arg, HostLoc host_loc);
At runtime, the value corresponding to `use_value` will be placed into one of the `HostLoc`s specified by `desired_locations`. The return value is the actual location. At runtime, the value corresponding to `arg` will be placed in a register. The actual register is determined by
which one of the above functions is called. `UseScratchGpr` places it in an unused GPR, `UseScratchXmm` places it
in an unused XMM register, and `UseScratch` allows you to specify a specific register (GPR or XMM) to use.
You are free to modify the register. The register is discarded at the end of the allocation scope. The return value is the register allocated to you.
### `Def` You are free to modify the value in the register. The register is discarded at the end of the allocation scope.
A `Def` is the definition of a value. This is the only time when a value may be set. ### `Define` as register
Xbyak::Xmm DefXmm(IR::Inst* def_inst, HostLocList desired_locations = any_xmm) A `Define` is the definition of a value. This is the only time when a value may be set.
Xbyak::Reg64 DefGpr(IR::Inst* def_inst, HostLocList desired_locations = any_gpr)
By calling `DefXmm` or `DefGpr`, you are stating that you wish to define the value for `def_inst`, and you wish to write the value to one of the `HostLoc`s specified by `desired_locations`. You must write the value to the register returned. void DefineValue(IR::Inst* inst, const Xbyak::Reg& reg);
### `AddDef` By calling `DefineValue`, you are stating that you wish to define the value for `inst`, and you have written the
value to the specified register `reg`.
Adding a `Def` to an existing value. ### `Define`ing as an alias of a different value
void RegisterAddDef(IR::Inst* def_inst, const IR::Value& use_inst); Adding a `Define` to an existing value.
You are declaring that the value for `def_inst` is the same as the value for `use_inst`. No host machine instructions are emitted. void DefineValue(IR::Inst* inst, Argument& arg);
### `UseDef` You are declaring that the value for `inst` is the same as the value for `arg`. No host machine instructions are
emitted.
Xbyak::Reg64 UseDefGpr(IR::Value use_value, IR::Inst* def_inst, HostLocList desired_locations = any_gpr)
Xbyak::Xmm UseDefXmm(IR::Value use_value, IR::Inst* def_inst, HostLocList desired_locations = any_xmm)
At runtime, the value corresponding to `use_value` will be placed into one of the `HostLoc`s specified by `desired_locations`. The return value is the actual location. You must write the value corresponding to `def_inst` by the end of the allocation scope.
### `UseDef` (OpArg variant)
std::tuple<OpArg, Xbyak::Reg64> UseDefOpArgGpr(IR::Value use_value, IR::Inst* def_inst, HostLocList desired_locations = any_gpr)
std::tuple<OpArg, Xbyak::Xmm> UseDefOpArgXmm(IR::Value use_value, IR::Inst* def_inst, HostLocList desired_locations = any_xmm)
These have the same semantics as `UseDefGpr` and `UseDefXmm` except `use_value` may not be present in the register, and may actually be in a host memory location.
## When to use each? ## When to use each?
The variety of different ways to `Use` and `Def` values are for performance reasons. * Prefer `Use` to `UseScratch` where possible.
* Prefer the `OpArg` variants where possible.
* `UseDef`: Instead of performing a `Use` and a `Def`, `UseDef` uses one less register in the case when this `Use` is the last `Use` of a value. * Prefer to **not** use the specific `HostLoc` variants where possible.
* `UseScratch`: Instead of performing a `Use` and a `Scratch`, `UseScratch` uses one less register in the case when this `Use` is the last `Use` of a value.
* `AddDef`: This drastically reduces the number of registers required when it can be used. It can be used when values are truncations of other values. For example, if `u8_value` contains the truncation of `u32_value`, `AddDef(u8_value, u32_value)` is a valid definition of `u8_value`.
* OpArg variants: Save host code-cache by merging memory loads into other instructions instead of the register allocator having to emit a `mov`.