Wordle and Dictionary Compression
Now that Wordle is so popular, we’re starting to see a lot of clones appear. Why not try to port the game to an old computer or retro game console? It could be fun! More fun than actually playing the game‥. who wants to play a game anyway, when you can write programs?
Compressing the Dictionary
How can I fit the Wordle dictionary onto a cartridge for the Nintendo Entertainment System?
The easy solution is to use a cartridge that’s as large as necessary to hold the dictionary. Storage isn’t a problem these days, and NES cartridges from the 1980s and 1990s came in a variety of sizes… from the diminutive BurgerTime (24 KiB) to the gargantuan Metal Slader Glory (a whopping 1 MiB).
However, there is a natural choice for cartridge size: 40 KiB, which is the highest capacity that does not require bank switching. This style of cartridge is called “NROM”, which is simply the designation that Nintendo printed on the circuit boards. That 40 KiB is divided into 32 KiB of program data (called PRG) and 8 KiB for image data (called CHR).
We can’t get away with casually stuffing the Wordle dictionary onto an NROM cartridge. The dictionary is too large. In fact, the game uses two dictionaries:
- The dictionary of possible solutions, containing 2,315 more common words, but no “rude” words.
- A larger dictionary of 10,657 additional, less common words which are accepted as guesses.
Quick math says that encoding these 12,979 words with one byte per character would take us over 63 KiB of space, which is far more than the 32 KiB available for program data. However, with some simple compression, we can make the entire dictionary fit, and have enough space left over to store the game’s code.
Compression Details
The compression scheme is simple.
Break each word into a two-letter prefix and three-letter suffix,
so “WORDS
” becomes “WO
” and “RDS
”.
Many words will share the same prefix. Gather all the words with the same prefix in a list, and only store the suffixes. There are 26×26 = 676 possible prefixes, so we create a table with 676 pointers in it, pointing to the list of suffixes for each prefix. The prefix itself is not stored anywhere, it is only used as an index.
Here’s a picture of what it looks like:
In the above diagram, you can see the words “aback”, “abase”, “abate”, and “abbey” in the list for the “AB” prefix. No word in the dictionary starts with “AA”, so the pointer for that prefix is null.
Easy peasy. Since each letter can be encoded with five bits, the three-letter suffixes can each be packed into a two-byte field. The dictionary ends up taking a little over 17 bits per word—it turns out that the table of prefixes, with 26×26 entries, is small compared to the rest of the dictionary.
You can even encode the difference between the “guess” and “solution” words easily enough. Simply put the solution words at the beginning of each list, and store two lengths for each list: the number of possible solutions in that list, and the number of total words in the list.
With this system, the dictionary ended up taking 27,920 bytes. That’s a win! It leaves us less than 5 KB of space to write the game code, but it is a simple game, I’m sure I can figure it out.
NES Preview
I’m not done porting it to the NES. I’m probably not going to be the first person who ports it to the NES; at least two other people are working on this independently.
Here’s a short preview of the NES game in action, showing some basic UI functionality. There’s a large blank area where the guesses will go, which isn’t filled in yet.
Hopefully, I’ll be done working on the game within a day or two.