Assembly Tutorial
#1



This tutorial will teach you how to read and write basic PowerPC Assembly Language. After it, you should be able to read and write the majority of basic MKWii codes that have been made in Assembly. This tutorial is also a supplement to my thread 'How to Make your own Cheat Codes on Dolphin', which can be read HERE. It is designed specifically to teach Assembly for use in MKWii Cheat Codes. However, you can still use it to learn PowerPC Assembly in a general sense.


Chapter 1: Introduction

As mentioned earlier, this thread is related to the Dolphin Cheat Code Tutorial thread. Therefore, the requirements for reading this thread are the same as those listed in the Dolphin thread (under Chapter 1: Requirements).

Now let's begin...

All CPUs have an assembly language (ASM for short). It is a low-level computer programming language. Unlike higher-level languages (such as C++, Python, Java), ASM is not 'portable' to other types of CPUs. Of course, a company can produce a line of CPUs that share the same or nearly the same Assembly language, but taking ASM written for an old Intel CPU and trying to use it on a new non-Intel CPU would not be possible. So, learning one ASM language will not teach you the entirety of the ASM language of a different CPU.

This portability issue isn't a problem for us because all Wiis use the same CPU model. The ASM language that every Wii's CPU uses is called PowerPC ASM. PowerPC ASM was jointly created by IBM, Apple, and Motorola. The only computer programming language at a lower level than ASM is binary code (pure machine language). Thus, ASM allows the user to understand how MKWii functions at nearly its lowest level. The downside is that the more complex your code needs to be, the faster the difficulty of writing it grows. This is why very complex ASM codes are rare in MKWii.


Chapter 2: Registers

A register is an accessible storage location inside the Wii's CPU. Assembly language is just a set/list of instructions that utilize these locations in all sorts of ways. You can load, move, store, etc. various types of data within these Registers; the possibilities are endless. There are several types of Registers. First things first: there are 32 normal integer registers. These registers are referred to as the General Purpose Registers (GPR for short). Most MKWii codes only use the GPRs. For the beginner code creator, these registers are the only registers you need to know. However, let's go over some more types.

There are also 32 Floating Point Registers (FPR for short). They hold floating point values instead of normal integer values. The Count Register is used to help make loops. The Link Register holds the address to 'jump' back to after a subroutine finishes. It is used by certain branch instructions.


Chapter 3: Variables

There are 3 types of variables: words, halfwords, and bytes. Words are for 32 bit data, halfwords for 16 bit data, and bytes for 8 bit data. The general purpose registers (along with most other registers) are 32 bits in length. The majority of the time, words (the whole length of data of a register) are used in writing ASM for codes. Once a beginner code creator is familiar with using words, he/she can then implement halfwords and bytes for less basic codes.

Integers and memory address locations:

Ok, now that you understand what the Registers are and what data variables can be used within those registers, let's talk about how registers are mainly used for MKWii cheat codes.

The most common form of action in a Register is loading (setting) a value in a Register. This can be a word/halfword/byte that can represent a character, vehicle, track, item, etc etc. The second most common form of using a Register is the address or memory location. A code creator can set a Register (word value) to represent a memory address, which he/she can later use to store/write/move/load data to/from. Pretty simple stuff.

Characters/symbol set:

When you write out ASM instructions to later be compiled into a cheat code, various symbols are used for proper formatting. These allow the compiler (strictly speaking an assembler, but this tutorial calls it a compiler throughout) to read your ASM and compile it correctly.

List of symbols:
. (period)
: (colon)
, (comma)
() (parentheses)
+ (plus)
- (minus)
_ (underscore)
# (hash tag)
x (not multiply, this is for writing Hex values)

A basic code will only use commas, parentheses, and the x (if using Hex values). Before continuing further, let's talk about using Hex vs Decimal for writing ASM instructions.

Hex vs Decimal:

Compiled cheat codes are shown in Hex byte code, so you need to know Hex beforehand. When writing ASM, there are certain elements of an instruction that you can write in Hex. However, the downside is that all known PowerPC compilers/decompilers will decompile a cheat code using decimal. If you are not sure what to use, I recommend sticking with decimal for integer data (such as an item value). For memory addresses, however, Hex is vastly superior to decimal. Regardless, you need to know Hex to write ASM.


Chapter 4: Format for Writing ASM

Alright, now let's get into the proper format for writing ASM. Remember those registers I told you about? Any GPR is written as rX, where X is the register's number. Remember there are 32 general purpose registers. The first register is Register 0, aka r0. The last is Register 31, aka r31. Most of the ASM instructions in this tutorial have a destination Register. The destination Register is the Register that holds the result of an executed ASM instruction. A source Register is a Register whose value is used to compute that result. Some instructions will have one source register, while others will have two. The basic instructions covered here never have more than one destination Register.

Important Side NOTE About Viewing the Registers in Dolphin:
Every Register for PowerPC is displayed in Hex. When Registers are displayed, their entire 32-bit length of data is shown. Please also note that further ahead in this tutorial, values of Registers will have a '0x' in front of them just to remind the user they are in Hex. However, when you are viewing Registers in Dolphin, there will be no '0x' in front of the Register value. For example: Let's say Register 1 has a value of 0xFF, it will actually be displayed as just '000000FF'.

Now going back to the format of writing ASM, here is a basic example of two source registers with the destination register:

rD, rA, rB

rD = Destination Register
rA = 1st Source Register
rB = 2nd Source Register

Keep in mind this is not an actual ASM instruction, or an exact correct format. This is just to show you a very very very general look of any ASM instruction that uses two source registers to compute a value for the destination register. Now let's look at an example with just one source register..

rD, rA, VALUE

rD = Destination Register
rA = Source Register
VALUE = Any 16 bit Signed Decimal/Hex Value

VALUE is a decimal/hex number you wrote from scratch for use in an instruction.

What is signed?
When using values in instructions, they are either signed or unsigned. Unsigned values are NEVER negative, while signed values can be negative.

Example: Let's say we have the value -1 (negative one). Obviously this is a signed value. If this value was in a register, it would display in Dolphin as 0xFFFFFFFF. If an instruction treated that register as an unsigned value, 0xFFFFFFFF would not be -1. It would represent the large positive hex value 0xFFFFFFFF (4294967295 in decimal).

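For VALUE, this is the range of numbers that can be used (in decimal, -32768 through 32767).
0x0 through 0x7FFF (zero and the positive values)
0xFFFFFFFF through 0xFFFF8000 (-1 down to -32768, shown as they would appear in a 32 bit register)

So since VALUE is signed, you can never use a number such as 0x9FA0, because it is larger than 0x7FFF.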

Also, a code creator would never bother to write 0xFFFFFFFF for negative one, it's much easier to put -1 or -0x1.
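
To make the signed range above a bit more concrete, here is a small Python sketch (not ASM, just a model of the arithmetic). The function names to_signed16 and as_register are made up for this illustration. It shows how a negative number looks inside a 32 bit register, and what the 16 bits of 0x9FA0 would mean if they were read as a signed value.

```python
def to_signed16(value):
    """Read the low 16 bits of value as a signed number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def as_register(value):
    """Show a (possibly negative) number the way Dolphin shows a 32 bit register."""
    return format(value & 0xFFFFFFFF, "08X")


print(as_register(-1))       # FFFFFFFF
print(as_register(-0x8000))  # FFFF8000
print(as_register(0x7FFF))   # 00007FFF
print(to_signed16(0x9FA0))   # -24672, not the 40864 you might have intended
```

The last line is the reason 0x9FA0 is out of range: in a signed 16 bit field, that bit pattern is a negative number.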


Chapter 5: Integer ASM Instructions

Ok, at this point you should have a good understanding of the...
Registers
Symbols that can be used in ASM
General Format/Layout of ASM instructions

Let's go over actual real world instructions that a person would use to make codes. Here is one of the most basic ASM instructions....

Add (adds two source registers to compute the value of the destination register)

add rD, rA, rB

Very elementary. The value of rA is added with the value of rB. rD will hold the result of the two values added together. Notice the use of the comma (one of the symbols listed earlier). The commas will let the compiler know there are three registers being used.

Let's say we add the values of r1, and r25. The result of this value will be stored in r20. Our 'add' ASM instruction would be this...

add r20, r1, r25

For ASM instructions where the order of the two values doesn't affect the result (such as add), you can swap the source registers. So you can also write this as...

add r20, r25, r1

Imagine this as a basic math equation of 2 + 3 = 5. It doesn't matter if you swap the positions of 2 and 3, the result is always 5. You obviously can't change the spot where the destination register is within the instruction. Keep in mind certain ASM instructions won't allow this swapping of source registers (subtraction and division, for example, give a different result when the values are swapped). Let's move on to another very basic ASM instruction...

Add Immediate

addi r4, r30, 12

Notice the number 12. It doesn't have the letter 'r' before it. Therefore we know the 12 is a signed value instead of a source register. Thus this instruction adds the value of r30 and the value of 12. The result will be stored in r4. For the addi instruction, you CANNOT swap the positions of 12 and r30! If you wanted to write this same instruction in Hex, it would be like this..

addi r4, r30, 0xC

The '0x' must be put before any hex value, or the compiler will either read it as decimal or refuse to compile it at all (throw an error). You can of course put a minus (-) before your signed value to designate a negative number. So if we did.....

addi r4, r30, -12

This would be adding the value of r30 and negative 12. Thus we are actually subtracting 12 from the value in r30. To save ASM writers time, you can use what are called simplified mnemonics. A simplified mnemonic is a 'shortcut'/'simplified' version of an ASM instruction.

The simplified mnemonic for addi r4, r30, -12 is...

subi r4, r30, 12

Subi stands for Subtract Immediate. It is actually not a real ASM instruction per se, but compilers have been configured to read these 'ASM shortcuts'. Let's go over the most common simplified mnemonic of all....

Load Immediate

li r1, 0xFF

As you can see there are no source registers in this ASM instruction. It is a shortcut. Easier to write, easier to read, easier to understand. The actual ASM instruction for li r1, 0xFF is addi r1, 0, 0xFF. You will right away notice the 0 in the middle doesn't have an r in front of it...

Special note about r0:
In the ASM instructions addi & addis (add immediate shifted), r0 (when used as the source register) is always treated as the number 0, not as the value held in Register 0. This is because in order to load a value into a register with a real ASM instruction such as addi, a zero is needed in the instruction, and it sits in the middle between the destination register and the signed hex/decimal value. There are other instructions where r0 is treated differently from other registers, but that will be provided in a link to a Simple ASM Reference thread at the end of this tutorial.

Going back to our 'li' instruction...

This 'li' instruction simply sets a register to the designated signed value, which is 0xFF in our case. Therefore, after that ASM instruction is executed, register 1 is now 0xFF.

At this point, you are probably wondering about if data in Registers get erased after certain ASM instructions. In most ASM instructions, source Registers do NOT get their value erased when used in an ASM instruction. Only the Destination Register loses its original value. So in our earlier 'add r20, r1, r25' ASM instruction, r1 & r25 still retain their original values, they don't lose any data or reset to zero.
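
If it helps to see the three instructions from this chapter side by side, here is a rough Python model of them. This is only a sketch: the register file is a plain list of 32 numbers, the starting values are made up, and the helper names simply mirror the ASM mnemonics. It also models the special note about r0 and shows that source registers keep their values.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm):
    """Treat imm as a signed 16 bit value, like the VALUE field of addi."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def add(regs, rd, ra, rb):
    regs[rd] = (regs[ra] + regs[rb]) & MASK32


def addi(regs, rd, ra, imm):
    # A source register number of 0 means the number 0, not the contents of r0.
    base = 0 if ra == 0 else regs[ra]
    regs[rd] = (base + sign_extend16(imm)) & MASK32


def li(regs, rd, imm):
    # li rD, VALUE is just a shortcut for addi rD, 0, VALUE
    addi(regs, rd, 0, imm)


regs = [0] * 32
regs[1], regs[25], regs[30] = 0x10, 0x20, 100  # made-up starting values

add(regs, 20, 1, 25)    # add r20, r1, r25
addi(regs, 4, 30, -12)  # addi r4, r30, -12 (same as subi r4, r30, 12)
li(regs, 3, 0xFF)       # li r3, 0xFF

print(hex(regs[20]), regs[4], hex(regs[3]))  # 0x30 88 0xff
print(hex(regs[1]), hex(regs[25]))           # 0x10 0x20 (sources unchanged)
```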


Chapter 6: Store, Load ASM Instructions

Ok we've gone over the most basic instructions. Let's kick it up a notch. Now we are diving into the realm of loading and storing data to/from memory locations. Let's take a look at one of the most basic store-type instructions...

Store Word

stw rD, VALUE (rA)

In this instruction, the value in rD is stored to the memory location that rA points to. rA is treated as a memory location (address). The parentheses around the register let the compiler know that. rA is only read, so its data is not lost. This is why the register for the memory location is written as rA (source register), and not as rD. In store-type ASM instructions, rD is technically a source register too (many references write it as rS), so it will also not lose its data. VALUE is a 16 bit signed hex/decimal value that is added to rA's value to compute the exact memory location where rD's value is stored. Let's look at an example...

stw r3, 0x0020 (r28)

The word (32 bit; whole value) of r3 will be stored at the memory location (address) that r28 points to. r28 is not being used as normal data per se; its value is being used as a location in RAM. The signed value of 0x0020 is known as the 'offset'. This offset is added to the value in r28 to produce the finalized memory location where the word in r3 is stored.

So let's say our value in r3 is 0x0000200A, and r28 is 0x80001500. Ok, first we add the offset value to 0x80001500. We now have the finalized memory location value of 0x80001520. Let's say before the ASM instruction, the word at 0x80001520 was 0xFFEF1023. After the ASM instruction is executed, the word is now erased and replaced with 0x0000200A.
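
The same walkthrough can be sketched in Python. This is a simplified model, not how Dolphin works internally: RAM is just a dictionary of bytes, the helper names are made up, and the special handling of r0 as the address register is left out. It stores the word most significant byte first, which is how the Wii's CPU lays out words in memory.

```python
def sign_extend16(imm):
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def read_word(memory, address):
    """Read 4 bytes starting at address as one 32 bit word."""
    return int.from_bytes(bytes(memory.get(address + i, 0) for i in range(4)), "big")


def stw(regs, memory, rs, offset, ra):
    """Model of: stw rS, offset (rA). Returns the finalized memory location."""
    address = (regs[ra] + sign_extend16(offset)) & 0xFFFFFFFF
    for i, byte in enumerate(regs[rs].to_bytes(4, "big")):
        memory[address + i] = byte
    return address


regs = {3: 0x0000200A, 28: 0x80001500}
memory = {}
for i, byte in enumerate((0xFFEF1023).to_bytes(4, "big")):
    memory[0x80001520 + i] = byte  # the old word at 0x80001520

address = stw(regs, memory, 3, 0x0020, 28)  # stw r3, 0x0020 (r28)
print(format(address, "08X"), format(read_word(memory, address), "08X"))
# 80001520 0000200A
```

Note that both r3 and r28 still hold their original values afterwards; only the word in memory changed.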

Onto another ASM instruction...

Load Word & Zero

lwz rD, VALUE (rA)

This is simply the 'reverse' of stw. rA is treated as a memory location. VALUE is the offset that is added to rA for the finalized memory location that will be used. However, in a lwz instruction, the destination register will be replaced by whatever word is at the finalized memory location.

lwz r31, 0 (r1)

For this lwz instruction, the offset is 0 (no offset). Therefore, nothing is added to r1 to use as the finalized memory location to load the word from. Let's say r1 is 0x806553E4, and the word at that address is 0x00000001. After the ASM instruction is executed, r31 is now 0x00000001. The previous data in r31 is erased. The word at the address of r1 is NOT erased, the 0x00000001 at that address in RAM remains intact.
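
Here is the lwz example as a tiny Python model. It is a sketch under simplifying assumptions: RAM is a dictionary that maps an address straight to a whole word, the offset is passed as a normal Python number, and the starting value of r31 is made up.

```python
def lwz(regs, memory, rd, offset, ra):
    """Model of: lwz rD, offset (rA)."""
    address = (regs[ra] + offset) & 0xFFFFFFFF
    regs[rd] = memory[address]  # rD is overwritten, memory is only read


regs = {1: 0x806553E4, 31: 0x12345678}  # r31 holds some old data
memory = {0x806553E4: 0x00000001}

lwz(regs, memory, 31, 0, 1)  # lwz r31, 0 (r1)

print(format(regs[31], "08X"))            # 00000001
print(format(memory[0x806553E4], "08X"))  # 00000001 (still intact in RAM)
```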

Quick tip:
You are probably wondering at this point how to write a whole 32 bit value from scratch to a Register. This is useful for establishing memory locations to later use for store-type and load-type ASM instructions. So let's say we want to write the value of 0x80E6FF30 to Register 22. How do we do this? Simple, with just two ASM instructions like this...

First we write the 'top half' or 'left side' of the Register. This is known as the upper 16 bits. For example:

Load Immediate Shifted

lis r22, 0x80E6

This instruction is called Load Immediate Shifted. It is similar to the li instruction, but instead we are writing the upper 16 bits. Now what happens to the lower 16 bits, or right side of the register? They are CLEARED/DESTROYED. This means the lower 16 bits are always set to 0x0000 anytime 'lis' is executed.

So at this point, r22 has a value of 0x80E60000.

Ok, now let's write in the right side/bottom half (lower 16 bits). We do this with an instruction called Or Immediate. It takes an unsigned 16 bit value, does a logical OR with the source Register, and stores the result in the destination Register (here they are the same Register). Like this....

Or Immediate

ori r22, r22, 0xFF30

Now r22 will have our desired value of 0x80E6FF30. Simple to do!
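
A quick Python sketch of the lis + ori pair, and of why ori is the one to use for the lower half. The function names are made up and each one just returns the new register value. The last line shows what would happen if the same 16 bits of 0xFF30 were added as a signed value (the way addi treats its VALUE) instead of being OR'd in.

```python
def lis(imm16):
    """Upper 16 bits set to imm16, lower 16 bits cleared."""
    return (imm16 & 0xFFFF) << 16


def ori(value, imm16):
    """Logical OR with an unsigned 16 bit value."""
    return value | (imm16 & 0xFFFF)


def add_signed16(value, imm16):
    """What a signed add of the same 16 bits would do (for comparison only)."""
    imm16 &= 0xFFFF
    if imm16 & 0x8000:
        imm16 -= 0x10000
    return (value + imm16) & 0xFFFFFFFF


r22 = lis(0x80E6)
print(format(r22, "08X"))  # 80E60000
r22 = ori(r22, 0xFF30)
print(format(r22, "08X"))  # 80E6FF30

print(format(add_signed16(lis(0x80E6), 0xFF30), "08X"))  # 80E5FF30, not what we want
```

Because ori treats its value as unsigned, any lower half from 0x0000 through 0xFFFF can be written this way.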


Chapter 7: Branch, Compare ASM Instructions

Remember that list of symbols I showed you in Chapter 3? You already know about the use of commas and parentheses; we will go over some more in this chapter. This will also take us into intermediate level ASM instructions.

The earlier instructions were known as Integer, Load, and Store Instructions. Now let's cover Branch instructions. Let's look at the most basic branch ASM instruction......

Branch

b 0x8
li r1, 1
stw r1, 0 (r31)

The letter b is used for what is known as an unconditional branch. Unconditional meaning the branch is executed no matter what the conditions are. Think of it like a 'jump'. The branch will skip/jump over a certain amount of ASM below, thus not executing it. In the provided example, the 'li r1, 1' instruction would be skipped. Now, the '0x8' next to the branch is the amount to 'jump/skip', in bytes. Every PowerPC instruction is 4 bytes long, so 0x8 jumps two instructions ahead of the branch itself, landing on the 'stw'. Obviously, the larger the jump, the harder it would be to correctly calculate this amount. Therefore we use a trick called 'labels'.

Labels are just that, they are labels.

To allow any compiler to know you are using labels, you designate labels with two symbols. The underscore symbol and the colon symbol. To first establish a branch label name, you must implement an underscore somewhere in the name. Like this...

b the_label

You can name labels whatever you want as long as you use the underscore and do NOT use special characters like percent signs or dollar signs. Just stick to basic letters. Okay, you have set the label name, now all you need to do is put that same label name right before the first ASM instruction that you want executed after the jump. Put in the label name and add a colon afterwards like this...

b the_label
li r1, 1

the_label:
stw r1, 0 (r31)

Btw, you are not limited to just jumping 'forward/down', you can jump backwards/up too.

the_label:
add r1, r10, r20
lwz r1, 0 (r15)

b the_label
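
To show what the compiler does with labels behind the scenes, here is a rough Python sketch. It is not a real compiler, just the offset arithmetic: every instruction counts as 4 bytes, a label line counts as 0 bytes, and each branch to a label is rewritten as the distance in bytes from the branch to the label. The function name resolve_branches is made up for this illustration.

```python
def resolve_branches(lines):
    """Replace label operands with byte distances (4 bytes per instruction)."""
    labels, instructions = {}, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.endswith(":"):
            # A label marks the position of the next instruction.
            labels[line[:-1]] = 4 * len(instructions)
        else:
            instructions.append(line)

    resolved = []
    for index, text in enumerate(instructions):
        mnemonic, _, operand = text.partition(" ")
        if operand.strip() in labels:
            distance = labels[operand.strip()] - 4 * index
            sign = "-" if distance < 0 else ""
            text = f"{mnemonic} {sign}{abs(distance):#x}"
        resolved.append(text)
    return resolved


forward = ["b the_label", "li r1, 1", "", "the_label:", "stw r1, 0 (r31)"]
backward = ["the_label:", "add r1, r10, r20", "lwz r1, 0 (r15)", "", "b the_label"]

print(resolve_branches(forward))   # ['b 0x8', 'li r1, 1', 'stw r1, 0 (r31)']
print(resolve_branches(backward))  # ['add r1, r10, r20', 'lwz r1, 0 (r15)', 'b -0x8']
```

The forward jump comes out as the same 'b 0x8' from the first branch example, and the backward jump is simply a negative distance.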

Very simple to understand, very easy to implement. Now, the unconditional branches in the examples so far would be useless on their own. Why would you randomly skip over ASM instructions? Well, branches are needed if you want to create a subroutine. Think of your list of ASM instructions like a road. When the game is performing the list of instructions one after another, think of that like traffic driving on the road. However, you can now put a fork in the road, and tell the traffic which route to take. The two routes will then later merge back together.

Now that you have a mental image of how ASM instructions run, let's dive into Conditional Branches. Normal (unconditional) branches used only by themselves would not make logical sense. We need to create that 'fork' in the road. The easiest way to create that fork is conditional branching.

Conditional branches are branches that only execute based on an 'if'. For example, let's look at the 'branch if not equal' ASM instruction...

Branch If Not Equal

bne the_label

li r1, 1

the_label:
stw r1, 0 (r31)

Ok, the_label will only be 'jumped to' if the condition for the branch is true. In order to set up this 'if' for a conditional branch, we need to make a comparison. The most common ASM instruction to establish a comparison is the 'cmpwi' ASM instruction.

Compare Word Immediate

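cmpwi rA, VALUE

The value in rA is compared to VALUE. Nothing is written to a GPR, so there is no destination register here. The result of the comparison is recorded in the Condition Register, which is what the conditional branch checks afterwards.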

cmpwi r10, 0xA

The value in r10 will be compared to the value of 0xA. We have thus created our 'if statement'. So now add in the bne instruction from earlier....

cmpwi r10, 0xA
bne the_label

li r1, 1

the_label:
stw r1, 0 (r31)

So in this example, the value in r10 is compared to the value of 0xA. Then if the value in r10 is NOT equal to 0xA, you will 'jump' to the_label, thus skipping the 'li r1, 1' ASM instruction.

Let's look at another example using a different conditional branch...

Branch If Equal

cmpwi r10, 0xA
beq the_label

li r1, 1

b the_end

the_label:
stw r1, 0 (r31)

the_end:
stw r3, 0x0010 (r24)

As you can see, not only are we using 'beq' now, we are also adding an unconditional branch and a second label called the_end. You should quickly see why I added the unconditional branch. Remember the road analogy I used earlier... Let's take the first route of the fork in the road (if r10 does equal 0xA).

If r10 equals 0xA, we jump to the_label. We thus execute the first 'stw'.... Now remember the traffic/road analogy: we go right on to the next ASM instruction, the second 'stw'. The label name itself is NOT a barrier in our 'road' in any way, shape, or form. The labels are just names the compiler uses to calculate the branch offsets so you don't have to do the work.

Let's instead take the second route of the fork in the road. If r10 isn't equal to 0xA, we don't jump to the_label at all. We instead go straight down our road to the 'li' instruction. After that, we encounter our unconditional branch. This obviously means we take the branch/jump no matter what. We do this because why would we go to the_label when our r10 value wasn't equal to 0xA? That would make no sense. Therefore we jump to the_end, thus skipping the first stw instruction.
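
If you know any higher level language, the two routes above are just an if/else. Here is the beq example written as a Python sketch that returns the list of ASM instructions that would actually get executed for a given r10. The function name run is made up for this illustration.

```python
def run(r10):
    """Return the instructions executed by the beq example for a given r10."""
    executed = ["cmpwi r10, 0xA", "beq the_label"]
    if r10 == 0xA:
        # Branch taken: we land on the_label and keep driving down the road.
        executed.append("stw r1, 0 (r31)")
    else:
        # Branch not taken: fall through, then jump over the first stw.
        executed.append("li r1, 1")
        executed.append("b the_end")
    # the_end: both routes merge back together here.
    executed.append("stw r3, 0x0010 (r24)")
    return executed


print(run(0xA))
print(run(3))
```

Both routes finish with the second 'stw', but only the first route executes 'stw r1, 0 (r31)', and only the second route executes 'li r1, 1'.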

Here is a list of commonly used 'if-type' Branches..
beq - Branch If Equal
bne - Branch If Not Equal
bgt - Branch If Greater Than
blt - Branch If Less Than
bge - Branch If Greater Than Or Equal To
ble - Branch If Less Than Or Equal To

Let's go over another Compare ASM instruction really quick... 

Compare Word

cmpw rA, rB

This will simply compare the values of two registers (as signed values) instead of comparing a register to a signed immediate value.

cmpw r4, r8
bgt the_label

In this example, if the value in r4 is greater than the value in r8, then the jump to the_label will be taken.
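
One thing to watch out for: cmpw compares the two registers as signed values, so a register holding 0xFFFFFFFF counts as -1, not as a huge positive number. Here is a small Python sketch of that. The function names are made up, and cmpw_greater only models the 'greater than' result that bgt checks.

```python
def to_signed32(value):
    """Read a 32 bit register value as a signed number."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def cmpw_greater(ra, rb):
    """True when 'cmpw rA, rB' followed by 'bgt' would take the branch."""
    return to_signed32(ra) > to_signed32(rb)


print(cmpw_greater(5, 3))           # True, branch taken
print(cmpw_greater(0xFFFFFFFF, 1))  # False, because 0xFFFFFFFF is -1 here
```

PowerPC also has unsigned compare instructions (cmplw and cmplwi) for when the values should be treated as unsigned, but those are outside what this tutorial covers.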


Chapter 8: Illustration

Here's a picture I made to give you a visual guide of what ASM instructions do.... http://mkwii.org/pics/other/ASMpic.png


Chapter 9: Extra Stuff

Let's go over some more symbols that we haven't covered yet.

Period (.):

You can use the period to give a value its own unique label name. Btw, this has nothing to do with branch labels. Think of these like making definitions, or having 'macros'. The period is followed by the word 'set'. Just like branch labels, you need to incorporate an underscore. For example:

.set ITEM_MUSHROOM, 0x4

...some ASM here....

li r31, ITEM_MUSHROOM

This now allows the ASM writer to put ITEM_MUSHROOM any time he/she wants to use the value of 0x4. It is a very basic 'macro' per se. It can come in handy if you are writing lengthy ASM.

Plus & Minus (+ and -):

The plus and minus symbols are used for conditional branches. Whenever a branch is done, you can help the CPU by giving it a 'hint'. The plus symbol stands for more-likely, while the minus symbol stands for less-likely. For example....

cmpwi r8, 0xC
bne+ the_label

The plus symbol next to the 'bne' will tell the CPU that the branch is more-likely to occur.

Hash Tag (#):

Whenever someone is writing very lengthy ASM, it can be handy to add notes that will let that someone know why he/she wrote those instructions. Here's an example of using hash tags to add notes/comments:

lis r4, 0x8000 #Set upper 16 bits of the address to store the word to
stw r30, 0x157C (r4) #Store word to memory location 0x8000157C; the offset supplies the lower 16 bits of the address
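
The lis + offset trick above works as-is when the lower 16 bits of the address are 0x7FFF or less. Since the offset is a signed value, a lower half of 0x8000 or more has to be written as a negative offset, and the upper half bumped up by one to make up for it. Here is a Python sketch of that arithmetic. The function name split_address is made up, and r4/r30 are just the registers from the example above.

```python
def split_address(address):
    """Split a 32 bit address into (value for lis, signed offset for stw/lwz)."""
    low = address & 0xFFFF
    if low & 0x8000:
        low -= 0x10000  # lower half only fits as a negative offset
    high = ((address - low) >> 16) & 0xFFFF
    return high, low


for address in (0x8000157C, 0x80E6FF30):
    high, low = split_address(address)
    sign = "-" if low < 0 else ""
    print(f"lis r4, 0x{high:04X}")
    print(f"stw r30, {sign}0x{abs(low):X} (r4)")
```

For 0x8000157C this gives the same two instructions as above. For 0x80E6FF30 it gives lis r4, 0x80E7 with an offset of -0xD0, because 0x80E70000 minus 0xD0 is 0x80E6FF30.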


Chapter 10: Conclusion & Credits

Alright, this should help get you started writing PowerPC ASM for your cheat codes. For more ASM examples, visit this thread HERE.

Once you have created a couple of basic ASM codes, read the Wiibrew ASM tutorial HERE. It is more in depth and gives a more technical approach. Keep in mind, they are teaching ASM in a general sense for program creation, not for using ASM in codes specifically.

Credits:
IBM, Apple, and Motorola (creators of PowerPC ASM)
WiiBrew (a lot of information was gathered from there)
Star (taught me ASM)
#2
blyatful
#3
Wow, this guide is actually quite comprehensive and well-written. It helped me a lot in getting started on ASM and led me into learning Hex (which I didn't know beforehand, although it's actually pretty simple). Kudos
#4
Thank you for the kind words. Assembly by itself isn't that tough to learn, it's just very difficult coming up with code ideas from scratch and applying your ASM knowledge into making an actual cheat code.

I would suggest going through the codes forum and looking at the Source of basic ASM codes. Ones either written by me or Star. We put plenty of good comments in our Source to help others understand how the code(s) work.

EDIT:

Here is essentially the most basic ASM you can do. Writing a value in a register before that value gets stored.

https://mkwii.org/showthread.php?tid=848

I was looking at a value in the RAM Viewer. I noticed it would get written to whenever I did a certain action with my item. Therefore I set a Write BP (breakpoint). I used my item, the value in memory got written with a new value, the Write BP was hit, and the game paused. I saw in the Code view that the value in Register 31 was getting stored to a spot in memory.

This is easy to manipulate. As you can see in the Source, I simply load a custom value into Register 31 (replacing the legit value), and then include the game's default ASM to allow the game to store the new value to memory. Very simple.