About Stack Frames

If you have been viewing the source of various ASM codes, you may have come across this set of instructions...

stwu r1, -0x50 (r1)
stmw r14, 0x8 (r1)

lmw r14, 0x8 (r1)
addi r1, r1, 0x50

You may have also seen something similar to the above, which is this...

stwu r1, -0x80 (r1)
stmw r3, 0x8 (r1)

lmw r3, 0x8 (r1)
addi r1, r1, 0x80

Let's go over the details of what these sets of instructions do, plus some tips (if you are a beginner coder) for customizing your own 'type' of stack frame instead of using a 'generic' version.

It is recommended that you first go to this thread HERE and read the examples plus notes/rules for the following instructions: stwu, stmw, lmw, addi, stfd, lfd, stswi, lswi

Chapter 1. The Purpose

The purpose of creating a stack frame is to back up certain registers that could/will lose their values during the execution of the ASM code. It's used because it's a universal method that can work on any Wii game. Sure, you could use the Exception Vector Area, but there isn't much available space there and many other ASM codes already use that space for storing misc values. You could find some other place to store them, but then it may only work for the one Wii game you are using. The stack method will work for all Wii games.

Chapter 2. Details of Creating Frame, Storing Registers

stwu r1, -0x50 (r1)
stmw r14, 0x8 (r1)

lmw r14, 0x8 (r1)
addi r1, r1, 0x50

The above is a common list of instructions used in some MKWii ASM codes to store the non-volatile registers to the stack and load them back after a code is done. Usually, the default instruction of the code is placed before the stwu instruction or after the addi instruction. The contents of the ASM code itself are placed in the middle, between the two sets of stack instructions.

Register 1 (named sp in Dolphin) is the register that holds the address of the current stack pointer. The stack grows DOWNWARD, as in towards LOWER memory addresses. So if you were viewing memory on something like a RAM viewer, where lower addresses are listed at the top, the stack is actually growing upward in your view. A little confusing, I know.

Since the stack grows downward, anything below r1's current value (at lower addresses) is free space (more on stack limits later). So first we need to back up r1's value. At the same time, let's store that value at the address we want, and update r1 to hold the new address. This is all done via the first stwu instruction...

stwu r1, -0x50 (r1)

The old r1 value is stored at an offset of -0x50 relative to the address in r1, and r1 is then updated to that new address. So let's say r1's value is 0x80371550. With the above instruction, the value 0x80371550 is stored at 0x80371500, and r1 becomes 0x80371500. At this point, you are wondering why the stwu instruction uses an amount of -0x50 and why the stmw instruction uses an offset of 0x8. We will get to that in a second...
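To make the stwu step concrete, here is a small Python model of it (not PowerPC code). The names `regs`, `memory`, and the `stwu` function are made up for illustration, and the starting r1 value is the example value from above.

```python
# Toy model of "stwu rS, d(rA)": store rS at rA + d, then update rA to that address.
# `regs` and `memory` are illustrative stand-ins, not a real emulator.

def stwu(regs, memory, rs, d, ra):
    ea = (regs[ra] + d) & 0xFFFFFFFF   # effective address = rA + signed offset
    memory[ea] = regs[rs]              # the store uses the value rS held before the update
    regs[ra] = ea                      # rA now points at the new frame

regs = {"r1": 0x80371550}
memory = {}
stwu(regs, memory, "r1", -0x50, "r1")

print(hex(regs["r1"]))            # 0x80371500
print(hex(memory[0x80371500]))    # 0x80371550
```

So a single instruction both saves the old stack pointer and moves r1 to the new frame.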

stmw r14, 0x8 (r1)

This instruction will store all registers starting at r14 and going upward to r31 at the location of r1 plus offset 0x8. Thus, they are all now copied over to our new stack frame. Your code can now use these registers, as the lmw and addi instructions will later retrieve the values and restore r1 to its original value. We use stmw over multiple stw instructions for various reasons:

- Creates a code that's shorter in compiled length
- Reduces risk of loss of data during an interrupt

Interrupts are invisible. You won't see them, and one could occur during your stack frame creation. Thus it's much better to use stmw instead of a bunch of consecutive stw instructions.

Chapter 3. Illustration

Now that you understand how a new stack frame is created with the registers' values being stored, let's take a look at an illustration of what memory would look like after execution of the stwu and stmw instructions from above.

#      ...       #  Lower Address (Stack grows in the direction towards lower addresses aka downward)
#     New r1     #  0x80371500 (r1's old address value of 0x80371550 is here)
# Padding of 0x4 #  0x80371504
#      r14       #  0x80371508 (0x8)
#      r15       #  0x8037150C (0xC)
# r16 thru... 30 #  0x80371510  (0x10 thru 0x48)
#      r31       #  0x8037154C (0x4C)
#     Old r1     #  0x80371550
#      ...       #  Higher Address

As you can see, there is padding of 0x4 after new r1. This is ALWAYS done for any stack frame you create, hence why the stmw instruction is done with an offset of 0x8. At offset 0x0 (or at new r1) is old r1's value. The padding is there so that if you do happen to call another function (more about function calls HERE), then once that new function is called, the game stores the LR to that spot in the padding. Yes, I know it's not needed for most ASM codes, as most ASM codes don't do function calls, but adding the padding is a good habit and keeps things consistent.

The values in the parentheses are the starting offset values for each item on the illustration. r31 starts at offset 0x4C. Ofc since r31 is 4 bytes long, it uses offsets 0x4C thru 0x4F.

You can also see I didn't bother to list every register of r14 thru r31 separately on the illustration. What's important is that you see r31 sits at exactly the memory address right before the old r1. So the amount of space we used for our stack frame was the BARE minimum. Ofc you could add more stack space to have padding in between r31 and old r1, but try to only allocate the amount you need.
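If you want to double-check the addresses in the illustration, the following Python sketch works them out. The function name `stmw_layout` is hypothetical, and the base address is the example new r1 value from above.

```python
def stmw_layout(first_reg, offset, base):
    """Return {register name: address} for 'stmw r<first_reg>, offset(r1)' with r1 == base."""
    layout = {}
    for i, reg in enumerate(range(first_reg, 32)):   # stmw always runs through r31
        layout[f"r{reg}"] = base + offset + 4 * i    # each GPR takes 4 bytes
    return layout

layout = stmw_layout(14, 0x8, 0x80371500)
for name in ("r14", "r15", "r31"):
    # Print the address and its offset from new r1
    print(name, f"0x{layout[name]:08X}", f"(0x{layout[name] - 0x80371500:X})")
```

r31 lands at 0x8037154C, so its 4 bytes end right where old r1 (0x80371550) begins.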

Chapter 4. Stack Frame Size Rules/Limits

As mentioned moments ago, only allocate the amount of space you need. Personally, the most stack space I have seen allocated was around 0x300 bytes. But sometimes, a big stack allocation can cause a crash as the space might not be available.

Now at this point, you have a good idea of how much space you need. You need 0x4 for old r1's value stored at new r1's location. You need 0x4 of padding afterwards. Then you need space for your registers. Each register takes up 4 bytes of data ofc...

So here's a simple equation to calculate the space for your stack frame...

[4 x (# of Registers)] + 8 = Stack Space

Keep in mind we are using the stmw instruction for backing up registers. So if we want 3 free registers, the registers that would be backed up are r29, r30, and r31. Obviously with this instruction you need at least 2 registers (r30 and r31) or else trying to do a stmw instruction backing up just one register (r31) will output an error within the compiler.
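The equation, plus the "stmw always ends at r31" rule, can be sketched in Python like this. Both function names are made up for illustration, and this is the raw size before any alignment rule is applied.

```python
def raw_frame_size(num_gprs):
    """[4 x (# of registers)] + 8, before any alignment is applied."""
    return 4 * num_gprs + 8

def first_stmw_register(num_gprs):
    """stmw always ends at r31, so N registers means starting at r(32 - N)."""
    return 32 - num_gprs

for n in (3, 5, 18):
    print(n, "registers -> start at r%d, raw size 0x%X" % (first_stmw_register(n), raw_frame_size(n)))
```

For 18 registers (r14 thru r31) this gives 0x50, which is where the size in the earlier example comes from.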

So let's say we want 5 free registers to use (which would be r27 thru r31 due to the stmw instruction). Let's do the calculation to figure out how big of a stack frame to create.

4 x 5 = 20

20 + 8 = 28

That is a value of 0x1C in hex. Great, so if we were to create a stack frame, we would start with this....

stwu r1, -0x1C (r1)
stmw r27, 0x8 (r1)

So are we good? No, we aren't. PowerPC has specific rules for creating stack frames. One of those rules is that all stack frames created must have a size that is 16 byte (quadword) aligned. Because of this, the stack space you create (in hex value) must be divisible by 0x10 (aka the hex value ends in a zero). The value 0x1C does not end in zero, so we are breaking this rule.

We follow this rule because if an ASM code has a function call in it, the misalignment may cause a crash. Ofc, if the ASM code doesn't do any function calls at all, it doesn't matter. But once again, good habit and consistency is why.

Since our calculation for the stack frame size is 0x1C, we need to bump it up to the next hex number that ends in zero. So the stack space we will allocate is 0x20. Here are the instructions..

stwu r1, -0x20 (r1)
stmw r27, 0x8 (r1)
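Here is a small Python sketch of the "bump it up to the next value ending in zero" step. The helper names `aligned_frame_size` and `prologue` are hypothetical. It only handles the plain case from this chapter (GPRs saved via stmw, starting at offset 0x8).

```python
def aligned_frame_size(num_gprs, alignment=0x10):
    """Raw size (4 bytes per GPR plus 8) rounded up to the next multiple of `alignment`."""
    raw = 4 * num_gprs + 8
    return (raw + alignment - 1) & ~(alignment - 1)

def prologue(num_gprs):
    """Build the two frame-creation instructions as text, in this thread's notation."""
    size = aligned_frame_size(num_gprs)
    first = 32 - num_gprs              # stmw always ends at r31
    return [f"stwu r1, -0x{size:X} (r1)", f"stmw r{first}, 0x8 (r1)"]

for line in prologue(5):
    print(line)
```

With 5 registers, the raw size of 0x1C rounds up to 0x20, matching the instructions above.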

Chapter 5. Overview of Recovering Values From Stack

Now we have a stack frame created and the contents of our code have run, so we need to retrieve all the values...

lmw r27, 0x8 (r1)

The lmw instruction is simply the opposite of stmw. It loads the values beginning at offset 0x8 of r1 into the register listed in the instruction and every register after it, all the way to r31. Ofc you can't do this with r31 as the listed register (just like in the stmw instruction). Onto the final instruction..

addi r1, r1, 0x20

The value you use in the addi instruction is simply the positive value of what was used in the stwu instruction. That's all you need to know. This makes r1 have its old value again. And that's it. You don't need any instruction to remove the values that are left in memory, as the game will wipe/replace them the next time it creates a stack frame after your code has executed.
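To see the whole save/restore cycle in one place, here is a toy Python walk-through of the four instructions. It is only a model: the register values are arbitrary, `mem` is a plain dictionary, and the function name `run_round_trip` is made up for illustration.

```python
def run_round_trip():
    regs = {f"r{n}": n * 0x1111 for n in range(32)}   # arbitrary starting values
    regs["r1"] = 0x80371550
    saved = dict(regs)                                # snapshot to compare against
    mem = {}

    # stwu r1, -0x20 (r1)
    new_sp = regs["r1"] - 0x20
    mem[new_sp] = regs["r1"]
    regs["r1"] = new_sp

    # stmw r27, 0x8 (r1)
    for i, n in enumerate(range(27, 32)):
        mem[regs["r1"] + 0x8 + 4 * i] = regs[f"r{n}"]

    # Body of the code: free to trash r27 thru r31
    for n in range(27, 32):
        regs[f"r{n}"] = 0xDEADBEEF

    # lmw r27, 0x8 (r1)
    for i, n in enumerate(range(27, 32)):
        regs[f"r{n}"] = mem[regs["r1"] + 0x8 + 4 * i]

    # addi r1, r1, 0x20
    regs["r1"] += 0x20

    return regs == saved                              # True if everything was restored

print(run_round_trip())
```

Note that the lmw has to come before the addi, since the load addresses are relative to the new r1.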

Chapter 6. Using Store/Load String Word Immediate

All the stack-related instructions we have done so far are via stmw and lmw. Obviously, if you wanted to back up registers 17 thru 20 only, then stmw/lmw would not be an option.

Let's see what the stack frame instructions would be if we wanted to backup only r17 thru r19. First calculate the stack space...

4 x 3 = 12
12 + 8 = 20 (0x14 hex)

stwu r1, -0x20 (r1) #0x14 bumped up to 0x20 for quadword alignment

Due to the stswi/lswi instructions not allowing an offset value to be used with the address in a register, we need an extra instruction to put the exact address of new r1 + 8 into a register. Let's say r11 is available for use...

addi r11, r1, 0x8

Now we can do the store string word immediate....

stswi r17, r11, 12 #12 stands for 12 bytes starting at r17. 4 bytes r17, next 4 bytes of r18 then next 4 bytes of r19. Thus r17 thru r19 are stored.

And let's say our code is ending. Time to retrieve the values from the stack and update r1 with its old value...

addi r11, r1, 0x8 #You will need to use this instruction again if r11's value was wiped/replaced during your code...
lswi r17, r11, 12
addi r1, r1, 0x20
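The byte count in stswi/lswi is what decides how many registers get stored/loaded. Here is a small Python sketch of that relationship. The function name `string_word_registers` is hypothetical. It assumes the usual PowerPC behavior where a byte count of 0 means 32 bytes and the register sequence wraps from r31 back around to r0.

```python
def string_word_registers(first_reg, nb):
    """List the registers touched by 'stswi/lswi r<first_reg>, rA, nb'."""
    nbytes = 32 if nb == 0 else nb      # a byte count of 0 encodes 32 bytes
    count = (nbytes + 3) // 4           # 4 bytes per register, a partial register still counts
    return [f"r{(first_reg + i) % 32}" for i in range(count)]

print(string_word_registers(17, 12))    # ['r17', 'r18', 'r19']
```

So to back up N consecutive registers this way, the byte count is simply 4 x N.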

Chapter 7. Storing Floats to a Stack Frame

When storing floats, you should always store them in their double precision form instead of single.

Since double precision values are used, this changes up the stack calculation, as each double precision FPR takes up 0x8 bytes. Let's say you wanna store 7 registers (via stmw/lmw, so r25 thru r31) and 2 floating point registers (f1 and f2)... Here's the calculation...

4 x 7 = 28 #for GPRs
8 x 2 = 16 #for FPRs

28 + 16 = 44

44 + 8 = 52 (0x34 in hex)

The value 0x34 does not end in zero, so we bump it up to 0x40 for quadword alignment. Now at this point, you are wondering which do I store first, the FPRs or the GPRs. It's always easier to store the items that take up less total stack space. Since our two FPRs take up less space than the 7 GPRs, we will store them first. There are no stmw/lmw type instructions that work for floats. Same with store/load string word type instructions for floats. Each FPR must be stored/loaded one at a time, and each one needs its own 8 bytes, so f1 goes at offset 0x8 and f2 goes at offset 0x10. So with all of this in mind, let's look at creating the stack frame and storing the FPRs....

stwu r1, -0x40 (r1)
stfd f1, 0x8 (r1)
stfd f2, 0x10 (r1)

Alright, now we store the GPRs. The two FPRs occupy offsets 0x8 thru 0x17, so the GPRs start at 0x18....

stmw r25, 0x18 (r1)

Ofc, make sure all your offsets are correct so you don't wipe half of an FPR or a whole GPR when storing both floats and GPRs...

Alright, let's say we are at the end of our code. Let's recover the values and update r1 with its old value..

lmw r25, 0x18 (r1)
lfd f2, 0x10 (r1)
lfd f1, 0x8 (r1)
addi r1, r1, 0x40
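Since it's easy to get mixed FPR/GPR offsets wrong, here is a Python sketch that lays the saved items out back to back and reports each offset plus the aligned frame size. The function name `frame_layout` is made up for illustration. It assumes FPRs are stored first as 8 byte doubles starting at offset 0x8, followed by the GPRs via stmw.

```python
def frame_layout(fpr_names, gpr_count, alignment=0x10):
    """Return ({item: offset from new r1}, aligned frame size)."""
    offset = 0x8                        # 0x0 holds old r1, 0x4 is the padding/LR slot
    layout = {}
    for name in fpr_names:
        layout[name] = offset
        offset += 8                     # each double precision FPR takes 8 bytes
    first_gpr = 32 - gpr_count          # stmw always ends at r31
    layout[f"r{first_gpr}"] = offset    # offset to use in the stmw/lmw instructions
    offset += 4 * gpr_count             # each GPR takes 4 bytes
    size = (offset + alignment - 1) & ~(alignment - 1)
    return layout, size

layout, size = frame_layout(["f1", "f2"], 7)
for name, off in layout.items():
    print(name, hex(off))
print("frame size", hex(size))
```

Because every offset is derived from the previous item's size, two saved items can never overlap.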

Chapter 8. Conclusion

Alright, at this point you should know how to make your own stack frames 'from scratch'. Happy coding.
