Tutorial on the 'BL Trick' (ASM)
Efficiently Writing Long Strings Of Data to Memory ("BL Trick")

NOTICE: This tutorial is for coders who can already write loops but are dealing with a very large amount of data that has to be written from 'scratch' before a loop can load it.

There are multiple ways to load/store large strings of data. Sometimes a coder will push the stack, then use a lot of registers (up to r31) to hold custom (from scratch) data. Then he/she will use a stmw (store multiple words) instruction to store a bunch of words at once to a final spot in memory. However, once a string of data gets too long, the stmw style of assembly becomes inefficient/redundant, because every single word still costs a lis/ori pair to build...

Another not-so-efficient method is to use an empty spot in MKWii RAM: write the string of data into registers, store the data to the empty spot in memory, then finally use a loop to copy the custom string of data from that spot in RAM to another spot in memory.

Too many unnecessary functions...

Before explaining exactly what the BL Trick is, let's say we want to custom write the following large string of data (12 words)...

FFFF7777 AAAA1111 FFFF1111 22223333 44445555 66667777 88889999 AAAABBBB CCCCDDDD EEEEFFFF CCCC2222 11113333 

And our default list of instructions (the first instruction sits at our code address) is this...

80456CC0 lwz r4, 0 (r3)
80456CC4 stw r4, 0 (r5)
80456CC8 lwz r4, 0x4 (r3)
80456CCC stw r4, 0x4 (r5)
etc. until all 12 words are loaded and stored.

Alright, so a good way to make a code that manipulates what gets written to the memory at r5 is to write our data to the memory pointed to by r3, and let the game (on the ASM instructions that follow) load the data from r3's memory into r4, then store it from r4 to r5's memory.

r3 and r5 both hold dynamic addresses, so we cannot simply hard-code the memory address values for them.

Let's take a look at how an intermediate-level coder would attempt to write this assembly at code address 0x80456CC0...

##Set up for a stmw: we obviously need to push the stack first, then write all the data into r20 through r31 beforehand...##

stwu r1,-80(r1)
stmw r14,8(r1)

lis r20, 0xFFFF
ori r20, r20, 0x7777

lis r21, 0xAAAA
ori r21, r21, 0x1111

lis r22, 0xFFFF
ori r22, r22, 0x1111

lis r23, 0x2222
ori r23, r23, 0x3333

lis r24, 0x4444
ori r24, r24, 0x5555

lis r25, 0x6666
ori r25, r25, 0x7777

lis r26, 0x8888
ori r26, r26, 0x9999

lis r27, 0xAAAA
ori r27, r27, 0xBBBB

lis r28, 0xCCCC
ori r28, r28, 0xDDDD

lis r29, 0xEEEE
ori r29, r29, 0xFFFF

lis r30, 0xCCCC
ori r30, r30, 0x2222

lis r31, 0x1111
ori r31, r31, 0x3333

##Data writing is done, now write data to r3, to allow the game to normally load it into r4 then store it to r5##

stmw r20, 0 (r3)

##Now let the game do its work##

lmw r14,8(r1)
addi r1,r1,80

lwz r4, 0 (r3) #Default ASM

That snippet of code is very redundant: it takes 24 lis/ori instructions just to build 12 words. While the stmw is a good way to store the data of consecutive registers to a spot in memory, there's a much more efficient way to write this assembly.

We do this with the BL Trick.

What we do in the BL Trick is set up a Branch & Link (bl) instruction right before our string of data. The label that the bl jumps to sits immediately after the string of data. We will write the string of data without using any registers. At this point you are probably saying "how is this possible?".

It's possible thanks to what are called pseudo ops (assembler directives). Pseudo ops let you write strings of data into a spot of memory 100% manually, without using instructions or registers. To help you understand this more, we need to quickly talk about how Gecko (the universal code handler) applies your cheat codes from a GCT file.

When you have a cheat code in a GCT and the codes get hooked/loaded, what happens is that wherever your code address is, a branch (backwards) instruction jumps to an early spot of memory (800022A8 in the example below). At this point the ASM instructions of that code are executed one address at a time. Then a branch (forwards) instruction jumps back to where you left from (but one instruction later, of course).

For example:

We have this compiled code..
C24685A4 00000002
38600001 B06A0022
60000000 00000000

Here's what actually happens in the game when your ASM function gets called on...

Address     Instruction
804685A4   b => 0x800022A8
800022A8   li r3, 1
800022AC   sth r3, 0x22 (r10)
800022B0   nop
800022B4   b => 0x804685A8

And then you are back to your original routine at address 0x804685A8 and game continues as normal.

Strictly speaking, an "Insert ASM" compiled code doesn't insert anything. You can't squeeze extra instructions in between the ones that are already in memory. The code handler runs your instructions from its own subroutine instead.
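
To see how the compiled code above maps onto that subroutine, here is a rough Python sketch that splits a C2 code into its parts. It assumes the usual C2 layout (header word, line count, then two 32-bit words per line, ending in a 00000000 word that the code handler uses for the branch back). The helper names parse_c2 and decode_d_form are made up for this illustration and are not part of any Gecko tool.

```python
# Sketch only: split a Gecko C2 ("Insert ASM") code into hook address + instruction words.
# Assumed layout: C2XXXXXX NNNNNNNN, then NNNNNNNN lines of two words each,
# with the very last word being a 00000000 placeholder.

def parse_c2(code_text):
    words = [int(w, 16) for w in code_text.split()]
    header, line_count = words[0], words[1]
    assert header >> 24 == 0xC2, "not a C2 code"
    hook_address = 0x80000000 | (header & 0x00FFFFFF)
    body = words[2:2 + 2 * line_count]
    instructions = body[:-1]          # drop the trailing 00000000 placeholder
    return hook_address, instructions

def decode_d_form(word):
    """Return (primary opcode, rD or rS, rA, 16-bit immediate) of a D-form instruction."""
    return word >> 26, (word >> 21) & 0x1F, (word >> 16) & 0x1F, word & 0xFFFF

code = """
C24685A4 00000002
38600001 B06A0022
60000000 00000000
"""

hook, instrs = parse_c2(code)
print(hex(hook))                      # 0x804685a4
for w in instrs:
    # opcode 14 = addi (li is addi with rA = 0), 44 = sth, 24 = ori (60000000 is nop)
    print(f"{w:08X}", decode_d_form(w))
```

The three instruction words are what the code handler executes in its own space; the final 00000000 is where the branch back to 0x804685A8 ends up.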

Going back to the BL Trick: what we do is write our string of data inside the code handler's subroutine. Then we do some fancy Link Register stuff to get a pointer to that data, and finally we use a loop to load the data and store it to the memory at r3.

First let's write our BL function. Obviously backup the original Link register beforehand...

mflr r11 #OG LR value backed up

bl our_link

The bl will make us jump to the our_link label, and the address of whatever comes right after the bl instruction (which will be the start of our data) is stored in the Link Register.
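
If the address bookkeeping is hard to picture, this small Python sketch works it out with plain numbers. The subroutine address used here is a made-up example value, and bl_trick_addresses is just a helper name for this illustration.

```python
# Sketch only: where the Link Register and the label end up after "bl our_link".

def bl_trick_addresses(bl_address, data_bytes):
    link_register = bl_address + 4              # bl saves the address of the next 4-byte slot
    label_address = link_register + data_bytes  # our_link sits right after the data
    return link_register, label_address

bl_addr = 0x800022AC   # hypothetical: mflr r11 at 0x800022A8, the bl right after it
lr, target = bl_trick_addresses(bl_addr, data_bytes=12 * 4)   # 12 words of data
print(hex(lr), hex(target))   # 0x800022b0 0x800022e0
```

Any extra padding placed between the bl and the label simply adds to data_bytes; the Link Register value does not change.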

Next, we will use the pseudo ops to write out our string of data. There are all kinds of pseudo ops. Let's go over a few...

.long - write a word of data in Hex
.llong - write a double word (back to back words) of data in Hex
.space X - X = bytes of zero to write
.string - write a string in plain text/ascii. Ascii will be auto converted to Hex when code is compiled.

We will be using .llong and .space for our assembly.
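
To make the pseudo ops less abstract, here is a Python sketch of the raw bytes each one puts into memory. It assumes GNU as style behaviour on big-endian PowerPC (in particular, that .string adds a terminating zero byte); your assembler may differ, so treat it as a model and not a spec. The function names are invented for this example.

```python
import struct

def long_(value):    # .long: one 32-bit word, big-endian
    return struct.pack(">I", value)

def llong(value):    # .llong: one 64-bit double word, big-endian
    return struct.pack(">Q", value)

def space(count):    # .space N: N bytes of zero
    return bytes(count)

def string(text):    # .string: ASCII bytes plus a zero terminator (assumed GNU as behaviour)
    return text.encode("ascii") + b"\x00"

blob = space(4) + llong(0xFFFF7777AAAA1111)
print(blob.hex().upper())       # 00000000FFFF7777AAAA1111
print(long_(0x12345678).hex())  # 12345678
print(string("Hi!").hex())      # 48692100
```

Note that one .llong produces exactly the same bytes as two .long ops back to back, which is why it halves the number of lines you type.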

Before we start writing out our pseudo ops, let's think about the loop that we will use after the BL trick is done. Let's say our loop will be a bdnz type loop using the lwzu/stwu instructions with an update offset of 0x4. Each lwzu adds 0x4 to the address before it loads, so we need to compensate for that first offset of 0x4. We will need to write in a blank word of zeros first...

.space 4 #four bytes of zero written

Now we can finally write out our string of data....

.llong 0xFFFF7777AAAA1111
.llong 0xFFFF111122223333
.llong 0x4444555566667777
.llong 0x88889999AAAABBBB
.llong 0xCCCCDDDDEEEEFFFF
.llong 0xCCCC222211113333

Pseudo ops are completed. Keep in mind, everything must be aligned by '4' bytes, because the instructions that follow the data have to start on a word boundary. So let's pretend our '.space 4' op was a '.space 3': we would need to put a '.space 1' at the end for alignment purposes.
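
The padding amount is easy to compute if you ever lose track of it. A tiny Python sketch (padding_needed is a name made up for this example):

```python
# Sketch only: how many zero bytes must follow a data block so the next
# instruction lands on a 4-byte boundary.

def padding_needed(byte_count, alignment=4):
    return (-byte_count) % alignment

for size in (3, 4, 5, 52):
    print(size, padding_needed(size))
# 3 1
# 4 0
# 5 3
# 52 0
```

So a 3-byte block needs a '.space 1' after it, while a block that is already a multiple of 4 bytes needs nothing.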

Next will be our label name, then a mflr ASM instruction..

our_link:
mflr r12

What happens here is that the address immediately after the bl instruction (the start of our data inside the code handler subroutine) is moved from the Link Register to register 12. Register 12 now holds our loading address for the loop!

Now we can set up a simple bdnz type loop. The loading address is relative to r12, the storing address is relative to r3.

Let's setup the count register for the loop. Register 9 will be used...

li r9, 0xC #12 (0xC) words of data to be loaded then stored

mtctr r9 #Move value of 0xC to the count register

We obviously need a copy of r3's address with 0x4 subtracted from it, because the stwu in our loop adds 0x4 to the address before every store. Let's use register 9 for that; we'll restore r9's original value after the loop is done (we will pretend r9's original value beforehand is 2).....

addi r9, r3, -0x4

Now for the loop!

loop_back:
lwzu r4, 0x4 (r12) #First word loaded is the 0xFFFF7777 from the first .llong pseudo op
stwu r4, 0x4 (r9) #r9 plus 0x4 is the address in r3, which is where we want our first word stored
bdnz+ loop_back
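
If you want to convince yourself that the offsets line up, here is a Python sketch that mimics the loop on a fake memory. The two addresses are made-up values standing in for r12 and r3, and run_copy_loop is only a model of the three loop instructions, not real emulation.

```python
# Sketch only: model of the lwzu/stwu/bdnz copy loop on a dict-based "memory".

DATA = [0xFFFF7777, 0xAAAA1111, 0xFFFF1111, 0x22223333,
        0x44445555, 0x66667777, 0x88889999, 0xAAAABBBB,
        0xCCCCDDDD, 0xEEEEFFFF, 0xCCCC2222, 0x11113333]

def run_copy_loop(memory, r12, r3, count):
    r9 = r3 - 4                # addi r9, r3, -0x4
    ctr = count                # li r9 / mtctr r9 in the real code
    while True:
        r12 += 4               # lwzu: base register is updated to the address that gets loaded
        r4 = memory[r12]
        r9 += 4                # stwu: same idea for the store
        memory[r9] = r4
        ctr -= 1               # bdnz: decrement CTR, branch back while it is not zero
        if ctr == 0:
            break
    return r12, r9

src = 0x800022B0               # hypothetical r12: address right after the bl (the blank word)
dst = 0x90000000               # hypothetical r3
memory = {src: 0}              # the '.space 4' word
for i, word in enumerate(DATA):
    memory[src + 4 + 4 * i] = word

run_copy_loop(memory, src, dst, len(DATA))
copied = [memory[dst + 4 * i] for i in range(len(DATA))]
print(copied == DATA)          # True
```

The blank word at src is never loaded; it only exists so that the first lwzu (src plus 0x4) lands on 0xFFFF7777.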

Once the loop is over, restore the OG LR value and the OG value of r9 (r3 was never modified, so it needs no restoring), then execute the default ASM.

li r9, 0x2 #lets pretend r9's OG value was 0x2 before we started using it as a countdown register
mtlr r11
lwz r4, 0 (r3)

Here's an overview of every ASM function we wrote:

mflr r11
bl our_link

.space 4
.llong 0xFFFF7777AAAA1111
.llong 0xFFFF111122223333
.llong 0x4444555566667777
.llong 0x88889999AAAABBBB
.llong 0xCCCCDDDDEEEEFFFF
.llong 0xCCCC222211113333

our_link:
mflr r12

li r9, 0xC
mtctr r9

addi r9, r3, -0x4

loop_back:
lwzu r4, 0x4 (r12)
stwu r4, 0x4 (r9)
bdnz+ loop_back

li r9, 0x2
mtlr r11
lwz r4, 0 (r3)

Comparison Length of Code / Conclusion

In our first example of the assembly, we made redundant use of lis/ori with a stmw instruction. If we take the length of the compiled code (as a C2 type), it is a total of 17 lines of compiled code.

The length of our improved (BL-trick) assembly is 14 lines of code. That's 3 lines shorter than the original example.
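
Those two line counts can be double-checked with a little arithmetic. The Python sketch below assumes the usual C2 layout: one header line, two 32-bit words per line, a trailing 00000000 word, and a nop added when needed to fill the last line. c2_lines is a helper name invented for this check.

```python
# Sketch only: number of compiled C2 lines for a given count of 32-bit words.

def c2_lines(instruction_words):
    body_words = instruction_words + 1   # +1 for the trailing 00000000 word
    if body_words % 2:                   # odd count: a nop fills out the last line
        body_words += 1
    return 1 + body_words // 2           # +1 for the C2 header line

# stwu + stmw, 12 lis/ori pairs, stmw to r3, lmw + addi, default lwz
stmw_version = 2 + 12 * 2 + 1 + 2 + 1
# mflr + bl, blank word, 12 data words, then the 10 instructions from our_link onward
bl_version = 2 + 1 + 12 + 10

print(stmw_version, c2_lines(stmw_version))   # 30 17
print(bl_version, c2_lines(bl_version))       # 25 14
```

The saving grows with the data: every extra word costs two instructions in the lis/ori version but only one word in the BL-trick version.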

So in conclusion, if you are dealing with large strings of data (around 5 words or more), you can probably shorten/optimize your code using the BL trick. Thanks for reading! Happy coding!
