Nintendo 64 Part 21: GP-Relative Addressing
In the MIPS ABI, register $gp (GPR 28) is called the “global area pointer”.
What does that mean, and how is it used?
How GP-Relative Addressing Works
You can see how it works by looking at the assembly code generated by the compiler. Here is a simple function that loads a global variable and returns it:
int variable;
int get_variable(void) {
    return variable;
}
Compiled with the -G 0 flag that I’ve been using, the assembly output is:
get_variable:
    lui $2,%hi(variable)
    jr $31
    lw $2,%lo(variable)($2)
It takes two instructions to load a global variable because the address is
32 bits and every MIPS instruction is also 32 bits, which leaves room for only
a 16-bit immediate. The variable’s address is therefore split across two
instructions.
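The first instruction, lui, loads the top 16 bits of the address into a
register, and the lw instruction loads the variable using the low 16 bits of
the address as an offset.
The lw appears after jr because it sits in the branch delay slot, so it still
executes before the function returns.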
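The split performed by %hi and %lo can be modeled in plain C. This is only a sketch, not toolchain code: the helper names hi16, lo16, rebuild, and check_split are made up for illustration. The one subtlety is that lw sign-extends its 16-bit offset, so the high half has to be rounded up whenever the low half would be read as a negative number.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers modeling the %hi and %lo relocation operators. */

/* Upper 16 bits, pre-adjusted for the sign extension of the low half. */
static uint16_t hi16(uint32_t addr) {
    return (uint16_t)((addr + 0x8000u) >> 16);
}

/* Lower 16 bits, read as the signed offset that lw will apply. */
static int16_t lo16(uint32_t addr) {
    int32_t low = (int32_t)(addr & 0xFFFFu);
    if (low >= 0x8000) {
        low -= 0x10000;
    }
    return (int16_t)low;
}

/* What lui followed by lw computes: (hi << 16) + sign_extend(lo). */
static uint32_t rebuild(uint16_t hi, int16_t lo) {
    return ((uint32_t)hi << 16) + (uint32_t)(int32_t)lo;
}

/* Sanity check: splitting and rebuilding must give back the address. */
static void check_split(uint32_t addr) {
    assert(rebuild(hi16(addr), lo16(addr)) == addr);
}
```

For an address such as 0x80018000, the low half is read as -0x8000, so the high half becomes 0x8002 rather than 0x8001.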
The way $gp works is simple:
1. Place global variables in a block of memory up to 64 KiB large.
2. Set $gp to point into the middle of that block.
3. Access global variables relative to the $gp register.
GCC will handle step #3 for us if we pass a different value for the -G flag.
With -G 4, variables up to four bytes in size will be accessed relative to $gp.
For example, the above code now compiles to this:
get_variable:
    jr $31
    lw $2,%gp_rel(variable)($28)
You can see that it only takes one instruction to load the variable now.
The limitation is that the offset in lw and other load/store instructions is
a signed 16-bit value, so we can only access 64 KiB of data this way.
Variables accessed with GP-relative addressing are also called “small”
objects, and the compiler will place them in the .sdata and .sbss sections:
the “small data” and “small BSS” sections.
Using GP-Relative Addresses
First, I’ll change the value of the -G flag to -G 4, just as a starting point.
This means that global variables that are 4 bytes or smaller will be accessed
relative to $gp.
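The size rule behind -G can be written down as a one-line predicate. The following is a rough model, not GCC's actual code: the names is_small_object and check_examples are hypothetical, and the treatment of zero-sized objects is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the -G size rule; not GCC's actual code. */
static bool is_small_object(size_t size, size_t g_threshold) {
    /* Assumption: objects of zero or unknown size are never small. */
    return size > 0 && size <= g_threshold;
}

/* Classify a few example types under -G 4 and -G 0. */
static void check_examples(void) {
    assert(is_small_object(sizeof(int32_t), 4));   /* 4 bytes: small */
    assert(is_small_object(sizeof(int16_t), 4));   /* 2 bytes: small */
    assert(!is_small_object(sizeof(int64_t), 4));  /* 8 bytes: not small */
    assert(!is_small_object(sizeof(int32_t), 0));  /* -G 0: nothing is small */
}
```

Under this model, raising the threshold moves more variables into the small sections, at the cost of using up the 64 KiB window faster.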
Next, I update the linker script to place the small data and small BSS
sections next to each other, and create a _gp symbol for the value of the $gp
register.
The _gp symbol name is special, and recognized by the MIPS linker as the value
for $gp.
Since the BSS section comes right after the data section, I can place
the small data at the end of the data section and the small BSS at the
beginning of the BSS section.
Here are the changes to the linker script:
.text : {
    ...
    *(.data .data.*)
    _gp = ALIGN(16) + 0x8000;
    *(.sdata .sdata.*)
    _text_end = .;
} >ram AT>rom
...
.bss (NOLOAD) : ALIGN(16) {
    _bss_start = .;
    *(.sbss .sbss.*)
    *(.bss .bss.*)
    ...
} >ram
Since -0x8000 is the lowest offset that you can put in a load or store
instruction, I put $gp exactly 0x8000 after the start of the small data section.
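To see why the 0x8000 bias makes the whole 64 KiB window usable, here is a small C model of the reachability rule. It is a sketch under assumptions: the function names are invented for illustration, and the example start address 0x80200000 used in the note below is arbitrary rather than taken from the linker script.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bias used in the linker script: _gp = start of small data + 0x8000. */
#define GP_BIAS 0x8000u

/* Hypothetical helper: value of $gp for a small data area at `start`. */
static uint32_t gp_for(uint32_t start) {
    return start + GP_BIAS;
}

/* True if `addr` can be reached from `gp` with a signed 16-bit offset. */
static bool gp_reachable(uint32_t gp, uint32_t addr) {
    int64_t delta = (int64_t)addr - (int64_t)gp;
    return delta >= -0x8000 && delta <= 0x7FFF;
}

/* Offset that a load/store would encode; the address must be reachable. */
static int16_t gp_offset(uint32_t gp, uint32_t addr) {
    assert(gp_reachable(gp, addr));
    return (int16_t)((int64_t)addr - (int64_t)gp);
}
```

With gp_for(0x80200000), the first byte of the area is at offset -0x8000 and the last reachable byte, 0x8020FFFF, is at offset 0x7FFF. If $gp pointed at the start of the area instead, only the first 32 KiB would be reachable and the negative half of the offset range would be wasted.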
I add code to the entry point to set the $gp register.
On a real operating system on MIPS, the toolchain will do this for you.
_start:
    ...
    # Set up global pointer
    la $gp, _gp
    ...
This is not enough!
The Nintendo 64 OS, LibUltra, is written with the assumption that you are not
using GP-relative addressing.
When you create a new thread, the new thread will have $gp set to 0, instead
of inheriting the value of $gp from the parent thread.
There are two ways I can solve this.
One is to just fix the value of $gp at the beginning of every thread:
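// This function must be called at the top of each thread, except
// boot.
// Declared static so that no external definition is needed if the
// compiler chooses not to inline it.
static inline void thread_init(void) {
    __asm__("la $gp, _gp");
}
void idle(void *arg) {
    thread_init();
    ...
}
void main(void *arg) {
    thread_init();
    ...
}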
Note: Inline assembly should normally have inputs, outputs, and side effects specified. However, since this assembly has no output operands, GCC treats it as implicitly volatile.
In general, an __asm__ block is not as simple as just “put assembly here”, so don’t think of it that way.
An alternative technique is to create a wrapper for osCreateThread which sets
the value of $gp in the new thread’s saved context.
This means I don’t have to remember to use thread_init() at the beginning of
every thread:
// Call this function instead of osCreateThread.
void thread_create(OSThread *thread, int thread_id,
                   void (*func)(void *arg), void *arg, void *stack,
                   int priority) {
    osCreateThread(thread, thread_id, func, arg, stack, priority);
    __asm__(
        ".set gp=64\n\t"
        "sd $gp, %0\n\t"
        ".set gp=default"
        : "=m"(thread->context.gp));
}
As one final gotcha, I need to ensure that my code doesn’t access any
global variables in LibUltra using GP-relative addressing, because LibUltra
was compiled with -G 0, so these variables won’t be placed in the small data
sections.
Fortunately, there is only one variable that I access this way, osTvType.
I can tell GCC that the variable is in a specific section, and this will
prevent GCC from using GP-relative addressing for that variable.
The exact section is not important; it’s just important that GCC knows
that the variable is not in .sdata or .sbss.
I add the following declaration to my code for osTvType:
// Avoid gprel access by declaring a different section.
extern s32 osTvType __attribute__((section(".data")));
It works! GP-relative addressing makes my code slightly smaller—the ROM shrinks by 64 bytes. Not worth the time I spent, but sometimes it’s just a learning experience.