Nintendo 64 Part 8: Files and Filesystems
We can’t embed tons of data into our main program. The Nintendo 64 only has 4 MiB of RAM (or 8 MiB with the Expansion Pak), so we’ll run out of memory pretty quickly. We need a way to organize data on the cartridge so we can load it into memory as needed.
This means creating something like a filesystem, but we don’t need much of a filesystem. Instead of referring to files by name, we’re going to refer to them by number.
Background: Linkers, Symbols, and Relocations
(Note: Skip to the next section if you already understand linkers, symbols, and relocations.)
Have you ever wondered what happens when you write code like this and compile it?
extern int my_global;
int get_my_global(void) {
return my_global;
}
Think about this:
- The compiler compiles this C code into assembler and then assembles it, producing machine code.
- Later, the linker resolves all references to functions and variables in other files.
In order to return my_global, the machine code needs to have the address of the my_global variable. But the compiler doesn’t know what the address is, so what does the compiler put there?
Let’s compile and look at the disassembly. We’re going to do this in x86 and then MIPS for comparison. Use -O2 to make the code shorter and easier to read.
x86 Code
$ cc -O2 -c example.c
$ objdump -d example.o

example.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <get_my_global>:
   0:   8b 05 00 00 00 00       mov    0x0(%rip),%eax        # 6 <get_my_global+0x6>
   6:   c3                      retq
At address 0 you see the instruction which loads my_global. The disassembly shows what address is written there: 0 (relative to %rip). The variable obviously isn’t at %rip. What’s happening here is that the compiler has created a relocation, which instructs the linker to fill in the correct address of my_global at link time. Relocations are also called “fixups” because the linker “fixes up” the code with the correct addresses.
You can think of the relocation as a hole in the program, where the linker has to put data in the hole.
We can look at these relocation entries with objdump:
$ objdump -r example.o

example.o:     file format elf64-x86-64

RELOCATION RECORDS FOR [.text]:
OFFSET           TYPE              VALUE
0000000000000002 R_X86_64_PC32     my_global-0x0000000000000004
There is one relocation in the .text section, which takes the 32-bit PC-relative address of my_global, subtracts 4, and stores the result at offset 2 in our code (which is where the zeroes are). The 4 is subtracted because %rip points at the end of the instruction when it executes, which is 4 bytes past the start of the hole.
This relocation happens at link time, and since it uses a relative address, the code can then be loaded anywhere in memory without modifying it: it’s position-independent code.
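To make the arithmetic concrete, here is a rough sketch of the fixup a linker performs for this relocation type. The x86-64 ELF ABI describes R_X86_64_PC32 as S + A - P (symbol address, plus addend, minus the address of the hole). The function names and the addresses in the usage note are made up for illustration; this is not how any real linker is structured.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Apply an R_X86_64_PC32-style fixup to `code`.
//   code_addr: (made-up) address where the first byte of `code` will live
//   offset:    offset of the 4-byte hole within `code`
//   sym_addr:  (made-up) final address of the symbol, S
//   addend:    addend from the relocation record, A (-4 in our example)
static void apply_pc32(uint8_t *code, size_t code_size, uint64_t code_addr,
                       size_t offset, uint64_t sym_addr, int64_t addend) {
    assert(offset + 4 <= code_size);
    uint64_t place = code_addr + offset;                   // P
    int64_t value = (int64_t)(sym_addr - place) + addend;  // S + A - P
    assert(value >= INT32_MIN && value <= INT32_MAX);      // must fit in 32 bits
    uint32_t bits = (uint32_t)(int32_t)value;
    for (int i = 0; i < 4; i++) {
        code[offset + i] = (uint8_t)(bits >> (8 * i));     // little-endian
    }
}

// Read the hole back as a signed 32-bit little-endian displacement.
static int32_t read_disp32(const uint8_t *code, size_t offset) {
    uint32_t bits = 0;
    for (int i = 0; i < 4; i++) {
        bits |= (uint32_t)code[offset + i] << (8 * i);
    }
    return (int32_t)bits;
}
```

If we pretend the code ends up at 0x401000 and my_global at 0x404010, the hole at offset 2 gets filled with 0x300a. The CPU adds that to the address of the next instruction (0x401006) and arrives at 0x404010.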
MIPS Code
$ mips32-elf-gcc -O2 -c example.c
$ mips32-elf-objdump -d example.o

example.o:     file format elf32-bigmips

Disassembly of section .text:

00000000 <get_my_global>:
   0:   03e00008        jr      ra
   4:   8f820000        lw      v0,0(gp)

$ mips32-elf-objdump -r example.o

example.o:     file format elf32-bigmips

RELOCATION RECORDS FOR [.text]:
OFFSET   TYPE              VALUE
00000004 R_MIPS_GPREL16    my_global
MIPS exercises your brain a bit when you read it. The function appears to return first and then load the value of the global afterwards. That works because MIPS executes the instruction that comes after certain jump instructions (the “branch delay slot”) even if the jump is taken.
However, this is showing $gp-relative addressing, which is not what we are using for our Nintendo 64 program. (Maybe we should?) Passing -G 0 turns it off:
$ mips32-elf-gcc -O2 -c example.c -G 0
$ mips32-elf-objdump -d example.o

example.o:     file format elf32-bigmips

Disassembly of section .text:

00000000 <get_my_global>:
   0:   3c020000        lui     v0,0x0
   4:   03e00008        jr      ra
   8:   8c420000        lw      v0,0(v0)

$ mips32-elf-objdump -r example.o

example.o:     file format elf32-bigmips

RELOCATION RECORDS FOR [.text]:
OFFSET   TYPE              VALUE
00000000 R_MIPS_HI16       my_global
00000008 R_MIPS_LO16       my_global
This is the full 32-bit relocation, which in MIPS must be split into the high 16 bits and low 16 bits, because MIPS needs two instructions to load a 32-bit constant. There’s even another instruction (the jr) in the middle of the load.
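The split has one wrinkle: lw sign-extends its 16-bit offset, so when the low half has its top bit set, the high half has to be bumped up by one to compensate. Here is a small sketch of that arithmetic. The function names are made up, the address in the usage note is arbitrary, and this ignores the addends a real linker would also read out of the instructions (they are zero in the listing above).

```c
#include <assert.h>
#include <stdint.h>

// High half, rounded so that adding the sign-extended low half gives `addr`.
static uint16_t reloc_hi16(uint32_t addr) {
    return (uint16_t)((addr + 0x8000u) >> 16);
}

// Low half, used as the signed offset of the lw instruction.
static uint16_t reloc_lo16(uint32_t addr) {
    return (uint16_t)(addr & 0xffffu);
}

// What the CPU computes: lui puts `hi` in the upper 16 bits, then lw adds
// the sign-extended `lo`.
static uint32_t combine_hi_lo(uint16_t hi, uint16_t lo) {
    return ((uint32_t)hi << 16) + (uint32_t)(int32_t)(int16_t)lo;
}

// Replace the 16-bit immediate field of a MIPS instruction word.
static uint32_t patch_imm16(uint32_t insn, uint16_t imm) {
    return (insn & 0xffff0000u) | imm;
}

// Split an address and check that the two halves recombine correctly.
static void split_address(uint32_t addr, uint16_t *hi, uint16_t *lo) {
    *hi = reloc_hi16(addr);
    *lo = reloc_lo16(addr);
    assert(combine_hi_lo(*hi, *lo) == addr);
}
```

For a made-up address of 0x8001a234, the halves come out as 0x8002 and 0xa234 (not 0x8001), and the two patched instruction words would be 0x3c028002 and 0x8c42a234.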
Tricky Linkers
So why do we care? Because we can use the linker to stick any value into a relocation, not just the address of a variable in memory. We are bamboozling the C compiler:
extern uint8_t my_symbol;
uintptr_t get_value(void) {
return (uintptr_t)&my_symbol;
}
This example works the same way as the examples above. The C compiler cannot calculate the address of my_symbol because it’s defined in some other file, so the C compiler puts some zeros in the object file and adds a relocation record to tell the linker how to fix it up.
But… we’re not actually reading my_symbol. It doesn’t need to exist, because all we’re doing is converting the address to an integer and returning that integer. We are going to use the linker to make it so that the symbol my_symbol is not the address of a variable, but just some number that we want to insert into our C program.
We can do that in two ways.
We can do it in the linker script:
my_symbol = 12345;
Or we can do it in an assembly language file:
.global my_symbol
my_symbol = 12345
Both of these techniques make it so the C code has the same effect as this:
uintptr_t get_value(void) {
return 12345;
}
Note that my_symbol is gone. It never really existed. It was just a fiction that we created in order to let us use the linker to stick numbers into our program.
Packing Up Our Data
Now we are going to pack up our data files and add them to our ROM image. We are going to use the minimum number of files to prove that our code works: two. Here’s the basic structure which we will create: a header, followed by the contents of each file.
The header will just be an array of these structures:
// Descriptor for an object in the pak.
struct pak_object {
uint32_t offset;
uint32_t size;
};
This is pretty easy for a custom tool to create. Our asset packaging tool will consume a manifest, read the listed files, and produce the entire block of data that we will embed in our ROM. The manifest looks like this:
IMG_CAT test/Ariella_32x32.dat
IMG_BALL test/Ball.dat
The first column is the identifier for the file and the second column is the file path. Since we access our files by index, we use the manifest to generate a header file with all the indexes in it. This is the header file we end up with:
/* This file is automatically generated. */
#pragma once
#define PAK_SIZE 2
#define IMG_CAT 1
#define IMG_BALL 2
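Generating that header is mostly string formatting. Here is a minimal sketch of that step, not the actual packaging tool: pak_write_header is a hypothetical name, and it takes the identifiers already parsed out of the manifest and writes the header text into a buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

// Write the generated header for `count` identifiers into `buf`.
// Indexes are one-based, in manifest order.
// Returns the length of the text, or -1 if it does not fit in `cap` bytes.
static int pak_write_header(char *buf, size_t cap,
                            const char *const *names, int count) {
    assert(buf != NULL && cap > 0 && count >= 0);
    int n = snprintf(buf, cap,
                     "/* This file is automatically generated. */\n"
                     "#pragma once\n"
                     "#define PAK_SIZE %d\n",
                     count);
    if (n < 0 || (size_t)n >= cap) {
        return -1;
    }
    size_t pos = (size_t)n;
    for (int i = 0; i < count; i++) {
        n = snprintf(buf + pos, cap - pos, "#define %s %d\n", names[i], i + 1);
        if (n < 0 || (size_t)n >= cap - pos) {
            return -1;
        }
        pos += (size_t)n;
    }
    return (int)pos;
}
```

Calling it with the two identifiers from the manifest, in order, reproduces the header shown above.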
We can embed our data file by appending it to our ROM before running makemask (this is inside a Bazel genrule, which is very similar to a Make recipe):
mips32-elf-objcopy -O binary $(location :Thornmarked.elf) $@
cat $(location //assets:assets.dat) >>$@
makemask $@
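For reference, the assets.dat blob appended here could be assembled along these lines. This is a sketch under a few assumptions rather than the real tool: the header fields are written big-endian (the toolchain output above is elf32-bigmips, and the header is loaded straight into a struct), offsets are measured from the end of the header (which is what load_pak_object expects further down), and each object is padded to 16 bytes. The names pak_input, pak_build, and PAK_ALIGN are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// One input file, already read into memory (hypothetical type).
struct pak_input {
    const uint8_t *data;
    uint32_t size;
};

enum { PAK_ALIGN = 16 };  // assumed alignment for each object

static uint32_t align_up(uint32_t n, uint32_t align) {
    return (n + align - 1) & ~(align - 1);
}

// Store a 32-bit value in big-endian byte order.
static void put_be32(uint8_t *p, uint32_t v) {
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

// Build the pak: `count` descriptors of 8 bytes each (offset, size),
// followed by the object data. Returns the number of bytes used.
static size_t pak_build(uint8_t *out, size_t out_cap,
                        const struct pak_input *files, uint32_t count) {
    size_t header_size = (size_t)count * 8;
    uint32_t offset = 0;  // relative to the end of the header
    assert(out != NULL && header_size <= out_cap);
    memset(out, 0, out_cap);  // padding bytes are zero
    for (uint32_t i = 0; i < count; i++) {
        offset = align_up(offset, PAK_ALIGN);
        assert(header_size + offset + files[i].size <= out_cap);
        put_be32(out + 8 * i, offset);
        put_be32(out + 8 * i + 4, files[i].size);
        memcpy(out + header_size + offset, files[i].data, files[i].size);
        offset += files[i].size;
    }
    return header_size + offset;
}
```

With two files of 3 and 2 bytes, the header takes 16 bytes, the first object sits at offset 0, and the second at offset 16, for 34 bytes in total.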
We need to know the location of our data in ROM, so we create an extra empty linker section at the end of the ROM file and export its load address (location in ROM) as a symbol:
pakdata : ALIGN(16) {
. = .;
} >rom
_pakdata_offset = LOADADDR(pakdata);
We set up some structures for DMA:
static OSMesgQueue dma_message_queue;
static OSMesg dma_message_buffer;
static OSIoMesg dma_io_message_buffer;
osCreateMesgQueue(&dma_message_queue, &dma_message_buffer, 1);
This brings us back to the trick with linkers earlier, so we can use the location of the Pak file in our program to load data using DMA.
// Offset in cartridge where data is stored.
extern u8 _pakdata_offset[];
// Load data relative to the start of the Pak data in ROM.
static void load_pak_data(void *dest, uint32_t offset,
uint32_t size) {
osWritebackDCache(dest, size);
osInvalDCache(dest, size);
dma_io_message_buffer = (OSIoMesg){
.hdr =
{
.pri = OS_MESG_PRI_NORMAL,
.retQueue = &dma_message_queue,
},
.dramAddr = dest,
.devAddr = (uint32_t)_pakdata_offset + offset,
.size = size,
};
osEPiStartDma(rom_handle, &dma_io_message_buffer, OS_READ);
osRecvMesg(&dma_message_queue, NULL, OS_MESG_BLOCK);
osInvalDCache(dest, size);
}
Once we can load data using these offsets, we can load the Pak header and write a function to load objects. Note that I am using one-based indexes for objects, because I want zero to be an invalid value. We are again aligning to 16 bytes so we don’t get cache tearing.
// Info for the pak objects, to be loaded from cartridge.
static struct pak_object pak_objects[PAK_SIZE]
__attribute__((aligned(16)));
// Load the pak object with the given one-based index into dest.
static void load_pak_object(void *dest, int index) {
struct pak_object obj = pak_objects[index - 1];
load_pak_data(dest, sizeof(pak_objects) + obj.offset, obj.size);
}
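The function above trusts its caller: a bad index or a destination buffer that is too small will read or write out of bounds. Here is a sketch of a more defensive variant that can be tried on a desktop machine. Everything specific to this sketch is an assumption: memcpy over a fake_rom array stands in for the DMA, the header is decoded from big-endian bytes by hand, and load_pak_object_checked is a made-up name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct pak_object {
    uint32_t offset;
    uint32_t size;
};

enum { PAK_SIZE = 2 };

// Stand-in for the pak data on the cartridge (host-side test double).
static const uint8_t *fake_rom;
static size_t fake_rom_size;

// Host replacement for the DMA-based load_pak_data.
static void load_pak_data(void *dest, uint32_t offset, uint32_t size) {
    assert((size_t)offset + size <= fake_rom_size);
    memcpy(dest, fake_rom + offset, size);
}

static struct pak_object pak_objects[PAK_SIZE];

static uint32_t get_be32(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

// Load the header and convert it from big-endian to host byte order.
static void load_pak_header(void) {
    uint8_t raw[PAK_SIZE * 8];
    load_pak_data(raw, 0, sizeof(raw));
    for (int i = 0; i < PAK_SIZE; i++) {
        pak_objects[i].offset = get_be32(raw + 8 * i);
        pak_objects[i].size = get_be32(raw + 8 * i + 4);
    }
}

// Like load_pak_object, but refuses invalid indexes (including zero) and
// objects that do not fit in the destination buffer.
static bool load_pak_object_checked(void *dest, size_t dest_size, int index) {
    if (index < 1 || index > PAK_SIZE) {
        return false;
    }
    struct pak_object obj = pak_objects[index - 1];
    if (obj.size > dest_size) {
        return false;
    }
    load_pak_data(dest, (uint32_t)sizeof(pak_objects) + obj.offset, obj.size);
    return true;
}
```

Whether the extra checks are worth it on the console is a judgment call. They are cheap, but the indexes come from a generated header, so a mismatch would mostly point to a stale build.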
Finally, this is all it takes to read data from cartridge memory. So easy!
static uint16_t img_cat[32 * 32] __attribute__((aligned(16)));
static uint16_t img_ball[32 * 32] __attribute__((aligned(16)));
// Load the pak header.
load_pak_data(pak_objects, 0, sizeof(pak_objects));
// Load objects.
load_pak_object(img_cat, IMG_CAT);
load_pak_object(img_ball, IMG_BALL);
We swap out one of the images in our drawing code for the new image and it works!
Just to check, it does work on hardware. Do remember that any data you want to DMA from cartridge memory must be properly aligned. I believe 2-byte alignment is good enough.
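One way to catch alignment mistakes early is to check the arguments before starting a transfer. The sketch below is only that, a sketch: the 2-byte figure for the cartridge address is the unverified belief stated above, the 16-byte figure for the destination simply mirrors the aligned(16) attributes used on the buffers, and the names are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// True if `value` is a multiple of `align`, which must be a power of two.
static bool is_aligned(uintptr_t value, uintptr_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (value & (align - 1)) == 0;
}

enum {
    DMA_RAM_ALIGN = 16,  // assumed: matches aligned(16) on the buffers
    DMA_ROM_ALIGN = 2,   // assumed: "2-byte alignment is good enough"
};

// Check both ends of a cartridge-to-RAM transfer.
static bool dma_args_aligned(const void *dest, uint32_t rom_addr) {
    return is_aligned((uintptr_t)dest, DMA_RAM_ALIGN) &&
           is_aligned(rom_addr, DMA_ROM_ALIGN);
}
```

In load_pak_data this could be a single assert at the top of the function, compiled out of release builds.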