2^16

This experiment came out of chatting with colleagues at the ANU School of Cybernetics, exploring how to build interactive experiences that help explain how language models handle vocabularies and break sentences into tokens.

It’s like a digital version of a 4x4 keypad, where the on/off state of 16 keys gives 2^16 (65,536) combinations – enough to cover all 50,280 tokens used in training the original ChatGPT.

Each word gets its own unique combination of button presses. Try clicking different combinations to see what words pop up, or type words into the top bar to see how they’re tokenised and matched to different keys.
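The mapping itself is just binary: each of the 16 keys stands for one bit, so any token id up to 65,535 can be shown as a unique pattern of pressed keys. A minimal sketch of that idea in TypeScript (the function names here are hypothetical, not taken from the app's source):

```typescript
// Map a token id (0..65535) to the on/off state of 16 keys,
// one key per bit, least-significant bit first.
function tokenIdToKeys(id: number): boolean[] {
  if (!Number.isInteger(id) || id < 0 || id > 0xffff) {
    throw new RangeError("token id must fit in 16 bits");
  }
  return Array.from({ length: 16 }, (_, bit) => ((id >> bit) & 1) === 1);
}

// Inverse mapping: read the 16 key states back into a token id.
function keysToTokenId(keys: boolean[]): number {
  return keys.reduce((id, on, bit) => id | ((on ? 1 : 0) << bit), 0);
}
```

Feeding a real tokeniser's ids (e.g. from tiktoken) through `tokenIdToKeys` is then enough to light up the keypad for any word.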

On the tech side, it uses React, Tailwind, and the tiktoken JS/WASM library.