Tokens and Tokenization
Summary
Explains what a token is in an LLM and how tokenizers such as BPE operate. Covers byte-level versus character-level tokenization, vocabulary size as a design knob, and variants such as WordPiece and SentencePiece, and uses the strawberry problem to illustrate that a model perceives text at the token level rather than letter by letter.
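As a rough illustration of two of the ideas named above, the sketch below contrasts character-level and byte-level views of the same string and then mimics the strawberry problem. The sample text and the subword split `["str", "aw", "berry"]` are assumptions chosen for illustration; a real tokenizer may split the word differently.

```python
# Character-level vs byte-level views of the same text (illustrative sample).
text = "naïve 🍓"

chars = list(text)                    # one symbol per Unicode code point
byte_ids = list(text.encode("utf-8")) # one symbol per UTF-8 byte, each in 0..255

# Non-ASCII characters expand to several bytes, so the byte sequence is longer,
# but the base alphabet never exceeds 256 symbols.
num_chars = len(chars)
num_bytes = len(byte_ids)

# Strawberry problem: the model receives subword pieces, not individual letters.
# HYPOTHETICAL split, not the output of any specific tokenizer.
pieces = ["str", "aw", "berry"]
r_per_piece = [p.count("r") for p in pieces]

# Counting letters is trivial on the raw string, but the letter "r" is never
# presented to the model as its own unit in this split.
total_r = "".join(pieces).count("r")
```

Here the character view has fewer symbols than the byte view, which is the trade-off behind byte-level schemes: a fixed, tiny base alphabet in exchange for longer sequences before any merges are applied.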