Nintendo 64 Part 17: Loading Models
If I want to stay sane when making models for this game, I need a way to import models from a proper 3D modeling program like Blender. I recall that for JS13K 2019, I did all the models on graph paper, typing coordinates into text files, and I’m not about to repeat that experience.
Time to fire up Blender.
Modeling and Formats
FBX is the standard, it seems. I created a model in Blender and gave it four materials, each with a bright colorful diffuse color. I exported the result to FBX.
Importing and Converting With Assimp
FBX is a complicated format and I’m not about to parse it by hand, so I use a library called Assimp. Assimp is fairly powerful and still straightforward to use.
First, I iterate over the meshes in the scene and gather them into a single mesh, deduplicating the vertexes. Assimp creates a separate mesh for each of the four materials in my scene, so if I skipped this deduplication, I'd end up with more than twice as many vertexes as I need.
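The deduplication step can be sketched roughly as below. This is a minimal illustration, not the converter's actual code: the `ModelVertex` layout, the `VertexPool` type, and `vertex_pool_add` are hypothetical names, and the sketch assumes a vertex is identified by its position alone. Each vertex from each Assimp mesh would be passed through `vertex_pool_add`, and the returned index used when rewriting the triangles.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical vertex layout: position only (an assumption for this sketch).
typedef struct {
    int x, y, z;
} ModelVertex;

// A pool of unique vertexes backed by caller-provided storage.
typedef struct {
    ModelVertex *verts;
    size_t count;
    size_t capacity;
} VertexPool;

static int vertex_equal(const ModelVertex *a, const ModelVertex *b) {
    return a->x == b->x && a->y == b->y && a->z == b->z;
}

// Return the index of v in the pool, appending it if it is not present.
// Linear search is O(n) per lookup, which is fine for small models.
size_t vertex_pool_add(VertexPool *pool, const ModelVertex *v) {
    for (size_t i = 0; i < pool->count; i++) {
        if (vertex_equal(&pool->verts[i], v)) {
            return i;
        }
    }
    assert(pool->count < pool->capacity);
    pool->verts[pool->count] = *v;
    return pool->count++;
}
```

For larger models, a hash table keyed on the vertex data would avoid the quadratic cost of the linear search.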
Once the vertexes are deduplicated, I need to figure out how to convert the triangles into display lists in a reasonable fashion. I settle for a simple greedy algorithm which tries to minimize loading into the vertex cache.
You see, in a normal modern GPU, when you draw triangles on the screen, as the GPU plows through your vertexes, it keeps a copy of previously transformed vertexes in a cache in case a triangle later in the draw call uses them. This cache is called the post-transform cache.
The Nintendo 64 has an explicitly managed version of this. Instead of drawing triangles and getting better performance if you have cache hits, on the Nintendo 64, you have to explicitly load vertexes into the cache. For the F3DEX2 XBUS microcode, the cache contains 32 entries. My model has 64 vertexes. That means that I will need to load vertexes into the cache multiple times during the display list. Ideally, I can minimize the amount of loading, but I will settle for something relatively simple.
The Batching Algorithm
The algorithm I came up with creates a collection of triangle batches like this:
- Calculate the cost of drawing each triangle. A triangle which requires more vertexes to be added to the cache costs more.
- Add the lowest-cost triangle to the batch, recalculate the costs, and repeat until the vertex cache is full.
- Create a new batch, clear the vertex cache, and repeat until all triangles are processed.
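The steps above can be sketched as a single function that builds one batch at a time. This is an illustration under assumptions, not the converter's actual code: `Triangle`, `triangle_cost`, `build_batch`, and the `MAX_VERTS` limit are hypothetical, and the sketch starts every batch with an empty cache and ignores materials.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { MAX_VERTS = 256 }; // assumed upper bound on vertex indexes

typedef struct {
    int v[3]; // indexes into the deduplicated vertex array
} Triangle;

// Cost of a triangle: how many distinct vertexes it would add to the cache.
static int triangle_cost(const Triangle *t, const bool *in_cache) {
    int cost = 0;
    for (int i = 0; i < 3; i++) {
        assert(t->v[i] >= 0 && t->v[i] < MAX_VERTS);
        bool seen = in_cache[t->v[i]];
        for (int j = 0; j < i; j++) {
            if (t->v[j] == t->v[i]) seen = true; // repeated within the triangle
        }
        if (!seen) cost++;
    }
    return cost;
}

// Build one batch starting from an empty cache. Chosen triangles are marked
// in done[] and their indexes written to batch[]. Returns the batch length.
size_t build_batch(const Triangle *tris, size_t ntris, bool *done,
                   int cache_size, size_t *batch) {
    bool in_cache[MAX_VERTS] = { false };
    int used = 0;
    size_t nbatch = 0;
    for (;;) {
        size_t best = ntris;
        int best_cost = 4; // higher than any real cost
        for (size_t i = 0; i < ntris; i++) {
            if (done[i]) continue;
            int cost = triangle_cost(&tris[i], in_cache);
            if (cost < best_cost) {
                best = i;
                best_cost = cost;
            }
        }
        // Stop when nothing is left or the cheapest triangle no longer fits.
        if (best == ntris || used + best_cost > cache_size) break;
        for (int k = 0; k < 3; k++) in_cache[tris[best].v[k]] = true;
        used += best_cost;
        done[best] = true;
        batch[nbatch++] = best;
    }
    return nbatch;
}
```

Calling `build_batch` repeatedly until it returns 0 produces all the batches. The cache size must be at least 3, or the first triangle never fits and no progress is made.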
To share vertexes between batches, the batches alternate between adding vertexes to the bottom of the cache and the top of the cache. A batch which adds vertexes from the bottom (starting with index 0, counting up) can reuse vertexes from the previous batch (starting with index 31, counting down). Within each batch, the triangles are drawn in groups by material.
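One way to picture the alternating layout is the small sketch below. The function names are hypothetical and the details are assumptions: even-numbered batches fill slots from 0 upward, odd-numbered batches fill from 31 downward, and a vertex left by the previous batch stays usable only until the current batch's loads reach its slot.

```c
#include <assert.h>
#include <stdbool.h>

enum { CACHE_SIZE = 32 }; // cache entries, per the microcode described above

// Slot for the n-th vertex loaded by a batch (n counts from 0).
// Even batches fill upward from slot 0, odd batches downward from slot 31.
int cache_slot(int batch, int n) {
    assert(batch >= 0 && n >= 0 && n < CACHE_SIZE);
    return (batch % 2 == 0) ? n : CACHE_SIZE - 1 - n;
}

// True if a slot filled by the previous batch is still intact after the
// current batch has loaded new_count vertexes from the opposite end.
bool slot_survives(int batch, int new_count, int slot) {
    assert(batch >= 0);
    assert(new_count >= 0 && new_count <= CACHE_SIZE);
    assert(slot >= 0 && slot < CACHE_SIZE);
    if (batch % 2 == 0) {
        return slot >= new_count;         // current batch used slots 0..new_count-1
    }
    return slot < CACHE_SIZE - new_count; // current batch used the top new_count slots
}
```

Because the two directions meet in the middle, the vertexes a batch loaded last are the ones that survive longest into the next batch.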
A Model Format
The model itself just needs a display list (the Gfx commands) and the vertex data (an array of Vtx structures). The display list must contain pointers into the vertex array, but with segmented RSP addresses we can encode them as offsets. This means that the game can load the model data as a single block of data from the cartridge without doing any fixups on it, parsing it, or setting up any pointers. All the game has to do is load the model, set up a segment pointer with gSPSegment, and call gSPDisplayList.
// Set segment 1 to point to model data.
gSPSegment(dl++, 1, K0_TO_PHYS(logo_model));
// Invoke segment 1, offset 0.
gSPDisplayList(dl++, SEGMENT_ADDR(1, 0));
The model display list refers to vertex data by using segment 1, which means that the address is just an offset from the start of the model.
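To make the offset encoding concrete, here is a small sketch of how a segmented address can be packed and resolved. It assumes the usual layout (segment number in the top byte, byte offset in the low 24 bits, and a table of 16 segment bases). The functions `segment_addr` and `resolve_segment` are hypothetical stand-ins written for illustration, not the SDK macro or the RSP microcode itself.

```c
#include <assert.h>
#include <stdint.h>

// Pack a segment number and a byte offset into one 32-bit word:
// segment in the top byte, offset in the low 24 bits.
uint32_t segment_addr(uint32_t segment, uint32_t offset) {
    assert(segment < 16 && offset < (1u << 24));
    return (segment << 24) | offset;
}

// Resolve a segmented address against a table of 16 segment base addresses,
// the way a base set up for segment 1 would be applied to every reference.
uint32_t resolve_segment(const uint32_t bases[16], uint32_t addr) {
    return bases[(addr >> 24) & 0xF] + (addr & 0x00FFFFFF);
}
```

With the base for segment 1 set to the physical address of the loaded model, an address such as `segment_addr(1, 0x40)` resolves to 0x40 bytes past the start of the model, wherever the model happens to sit in RAM.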
It Works!
Here is the build:
This logo spins around and around.
Performance on real hardware is not very good. The frame time varies between about 6 ms and over 9 ms. This might just be due to the large size of the models on-screen and the overdraw, but it’s still a bit of a shock to see frame times that poor for such a simple program.
I can understand why most games don’t reach 60 fps on the Nintendo 64.