Tokenization
How language models slice text into processable units before working on it — and why those units are what AI vendors charge you for.
Before a language model processes text, it converts that text into tokens: small units that are often word fragments rather than whole words. Common short words are usually a single token, while longer or rarer words are split into pieces, so "tokenization" becomes something like ["token", "ization"]. The exact split depends on the model's tokenizer. This matters for two concrete reasons: cost and capacity. Most AI APIs charge based on the number of tokens processed, counting both what you send in and what the model returns. The context window, which limits how much the model can work with in one request, is also measured in tokens. Longer documents, longer conversations, and more complex prompts all consume more tokens, which means higher cost and a greater chance of hitting the limit of what the model can handle in a single call.
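To make the cost side concrete, here is a minimal sketch of a per-request estimate. It assumes a rough rule of thumb of about four characters per token for English text, and the two prices are hypothetical placeholders rather than any vendor's actual rates. The function names are illustrative; a real estimate should use the tokenizer and price list for the specific model.

```python
import math

# Assumption: roughly 4 characters per token for English text.
# Real tokenizers vary by model and by language.
CHARS_PER_TOKEN = 4

# Hypothetical prices in dollars per million tokens; substitute real rates.
INPUT_PRICE_PER_MTOK = 3.00
OUTPUT_PRICE_PER_MTOK = 15.00


def estimate_tokens(text: str) -> int:
    """Approximate the token count of a text from its character length."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_request_cost(prompt: str, expected_output_tokens: int) -> float:
    """Approximate the dollar cost of one request: input plus output tokens."""
    input_tokens = estimate_tokens(prompt)
    total = (
        input_tokens * INPUT_PRICE_PER_MTOK
        + expected_output_tokens * OUTPUT_PRICE_PER_MTOK
    )
    return total / 1_000_000
```

Multiplying the result by expected request volume gives a first-pass budget figure. Because output tokens are often priced higher than input tokens, verbose responses can dominate the total even when prompts are short.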
Token-based pricing makes AI infrastructure costs difficult to estimate without careful modeling. A use case that seems inexpensive in a single test can become costly at scale when documents are long, outputs are verbose, or conversations accumulate over time. In a chat, the full history is typically resent with every turn, so the tokens billed grow faster than the conversation itself. Organizations that don't model token consumption before deployment often face cost surprises once usage grows. The context window limit has a separate consequence. When a conversation or document exceeds the limit, what happens depends on the system: many APIs reject the request with an error, while chat applications and orchestration layers often trim earlier content so the request fits. In the trimming case nothing fails visibly; the model simply stops "seeing" the beginning of the input. That's a reliability problem in workflows that depend on the model having full context.
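Both effects can be sketched in a few lines. The example below assumes the same rough heuristic of four characters per token and a chat setup in which every turn resends the whole history. The function names are hypothetical, and the trimming strategy (keep the newest messages that fit) is only one of several an application might use.

```python
import math

# Assumption: roughly 4 characters per token for English text.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Approximate the token count of a text from its character length."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def cumulative_input_tokens(messages: list[str]) -> int:
    """Total input tokens billed if each turn resends the full history so far."""
    total = 0
    history = 0
    for message in messages:
        history += estimate_tokens(message)
        total += history
    return total


def fit_to_window(
    messages: list[str], max_tokens: int
) -> tuple[list[str], list[str]]:
    """Keep the most recent messages that fit the budget.

    Returns (kept, dropped) so the caller can see what was cut
    instead of losing it silently.
    """
    kept: list[str] = []
    used = 0
    # Walk backwards from the newest message until the budget is exhausted.
    for message in reversed(messages):
        cost = estimate_tokens(message)
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()
    dropped = messages[: len(messages) - len(kept)]
    return kept, dropped
```

Returning the dropped messages is a deliberate choice: a workflow that depends on full context can log them, summarize them, or refuse to proceed, rather than continuing with an input the model only partly sees.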
Read next
Related concepts
Large Language Models
The AI models behind most generative tools today — capable of remarkable language tasks, and unreliable about facts they were never trained on.
Generative AI
Context Window
How much a language model can hold in mind at once — and why it matters more than it sounds.
Technical Concepts
Inference
Where training ends and the model starts doing actual work — producing outputs on real inputs, in real time.