Tokenization
How language models slice text into processable units before working on it — and why those units are what AI vendors charge you for.
Before a language model processes text, it converts that text into tokens — small units that are typically word fragments rather than whole words. The word "tokenization" becomes something like ["token", "ization"]. This matters for two concrete reasons: cost and capacity. Most AI APIs charge based on the number of tokens processed — both what you send in and what the model returns. And the context window, which limits how much the model can work with in one request, is also measured in tokens. Longer documents, longer conversations, and more complex prompts all consume more tokens, which means higher cost and potentially hitting the limit of what the model can handle in a single call.
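As a rough illustration of how token counts turn into cost, the sketch below estimates the price of a single request. It assumes the common rule of thumb of about four characters per token for English text; real tokenizers vary by model and language. The function names and the prices in the usage note are hypothetical placeholders, not any vendor's actual API or rates.

```python
import math

# Assumption: roughly 4 characters per token for English text.
# Real tokenizers differ by model and language; treat this as a ballpark only.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Approximate the number of tokens in a piece of text."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_cost(
    prompt: str,
    expected_output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Approximate the cost of one request, counting input and output tokens.

    Prices are expressed per one million tokens (hypothetical values).
    """
    input_tokens = estimate_tokens(prompt)
    total = (
        input_tokens * input_price_per_million
        + expected_output_tokens * output_price_per_million
    )
    return total / 1_000_000
```

For example, with placeholder prices of 3.00 per million input tokens and 15.00 per million output tokens, a 4,000-character prompt (about 1,000 tokens) with a 500-token reply would come to roughly 0.0105 per call under these assumptions. For real budgeting, swap the heuristic for the tokenizer that matches the model you are using.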
Token-based pricing makes AI infrastructure costs difficult to estimate without careful modeling. A use case that seems inexpensive in a single test can become costly at scale when documents are long, outputs are verbose, or conversations accumulate over time. Organizations that don't model token consumption before deployment often face cost surprises once usage grows. The context window limit has a separate consequence: when a conversation or document exceeds the limit, something has to be left out. Many chat applications silently drop the earliest content to make room, so the model doesn't raise an error; it simply stops "seeing" the beginning of the input. (Some APIs instead reject an oversized request outright, which is at least visible.) Silent truncation is a reliability problem in workflows that depend on the model having full context.
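Both effects described above can be sketched in a few lines: how input tokens pile up when every turn resends the whole conversation, and how a simple "keep the most recent messages" strategy silently drops the oldest ones. This is a minimal sketch under stated assumptions, not how any particular product works: the four-characters-per-token estimate is a rule of thumb, and `fit_to_window` and `cumulative_input_tokens` are hypothetical helper names.

```python
import math


def estimate_tokens(text: str) -> int:
    """Approximate token count (assumption: about 4 characters per token)."""
    return math.ceil(len(text) / 4)


def fit_to_window(messages: list[str], max_tokens: int) -> list[str]:
    """Keep the most recent messages that fit within max_tokens.

    Older messages are dropped without any error, which mirrors the
    silent truncation some chat applications perform.
    """
    kept: list[str] = []
    used = 0
    # Walk from newest to oldest, stopping at the first message that no longer fits.
    for message in reversed(messages):
        size = estimate_tokens(message)
        if used + size > max_tokens:
            break
        kept.append(message)
        used += size
    kept.reverse()  # restore chronological order
    return kept


def cumulative_input_tokens(messages: list[str]) -> int:
    """Total input tokens sent if each turn resends the full history so far."""
    total = 0
    history = 0
    for message in messages:
        history += estimate_tokens(message)
        total += history
    return total
```

With three 40-character messages (about 10 tokens each) and a 25-token window, `fit_to_window` returns only the last two messages, and the first is gone without any warning. Meanwhile `cumulative_input_tokens` reports 60 input tokens rather than 30, because each turn resends the history that came before it. That growth is one reason long conversations cost more than a single test suggests.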