Technical Concepts · Intermediate · Draft · pending human review

Tokenization

How language models slice text into processable units before working on it — and why those units are what AI vendors charge you for.

Before a language model processes text, it converts that text into tokens: small units that are often word fragments rather than whole words. The word "tokenization", for example, might be split into something like ["token", "ization"], though the exact split depends on the model's tokenizer. This matters for two concrete reasons: cost and capacity. Most AI APIs charge by the number of tokens processed, counting both what you send in and what the model returns. The context window, which limits how much the model can work with in one request, is also measured in tokens. Longer documents, longer conversations, and more complex prompts all consume more tokens, which raises cost and can push a request up against the limit of what the model can handle in a single call.
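To make the idea of word fragments concrete, here is a minimal sketch of a toy tokenizer in Python. It is not how any particular model tokenizes text: production tokenizers typically learn their vocabulary from data (for example with byte-pair encoding), whereas this example uses a tiny hand-written vocabulary and a greedy longest-match rule. The names `TOY_VOCAB` and `toy_tokenize` are hypothetical and exist only for illustration.

```python
# Hypothetical vocabulary, chosen only to keep the example small.
TOY_VOCAB = {"token", "ization", "iz", "ation", "s", "the", "a"}


def toy_tokenize(word: str, vocab: set[str] = TOY_VOCAB) -> list[str]:
    """Greedily split one lowercase word into the longest known pieces.

    Falls back to single characters where no vocabulary piece matches.
    """
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest candidate first, shrinking until something matches.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            # No known piece starts here: emit one character as its own token.
            pieces.append(word[i])
            i += 1
    return pieces


print(toy_tokenize("tokenization"))  # ['token', 'ization']
print(toy_tokenize("tokens"))        # ['token', 's']
```

The number of tokens for a piece of text is simply the length of the resulting list, and that count is what billing and context limits are measured against. To get real counts, use the tokenizer that belongs to the specific model you plan to call.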

Token-based pricing makes AI infrastructure costs difficult to estimate without careful modeling. A use case that seems inexpensive in a single test can become costly at scale when documents are long, outputs are verbose, or conversations accumulate over time. Organizations that don't model token consumption before deployment often face cost surprises once usage grows. The context window limit has a separate consequence. When a conversation or document exceeds the limit, what happens depends on the system: some APIs reject the request with an error, while many chat applications and frameworks silently drop or truncate earlier content so that the request fits. In the silent case nothing fails visibly; the model simply stops "seeing" the beginning of the input. That's a reliability problem in workflows that depend on the model having full context.
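A rough Python sketch can show both effects: how cost grows as a conversation accumulates, and how older content falls out of a fixed window. Everything here is an assumption made for illustration. The prices are hypothetical rather than any vendor's real rates, the names `Pricing`, `request_cost`, `conversation_cost`, and `fit_to_window` are invented for this example, and the sketch assumes each chat turn resends the full history as input, which is common for stateless chat APIs but not universal.

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    # Hypothetical prices in USD per one million tokens.
    input_per_million: float
    output_per_million: float


def request_cost(input_tokens: int, output_tokens: int, pricing: Pricing) -> float:
    """Cost of one request; input and output tokens are billed separately."""
    return (input_tokens * pricing.input_per_million
            + output_tokens * pricing.output_per_million) / 1_000_000


def conversation_cost(turn_input: int, turn_output: int, turns: int,
                      pricing: Pricing) -> float:
    """Cost of a chat in which every turn resends the whole history as input."""
    total = 0.0
    history = 0
    for _ in range(turns):
        prompt = history + turn_input          # old history plus the new message
        total += request_cost(prompt, turn_output, pricing)
        history = prompt + turn_output         # the reply joins the history
    return total


def fit_to_window(message_tokens: list[int], window: int) -> tuple[list[int], int]:
    """Keep the most recent messages that fit; return them and the dropped count."""
    kept = []
    used = 0
    for count in reversed(message_tokens):     # walk from newest to oldest
        if used + count > window:
            break
        kept.append(count)
        used += count
    kept.reverse()                             # restore chronological order
    return kept, len(message_tokens) - len(kept)


pricing = Pricing(input_per_million=2.0, output_per_million=10.0)
print(round(request_cost(500, 200, pricing), 6))           # one isolated request
print(round(conversation_cost(500, 200, 20, pricing), 6))  # 20 accumulating turns
print(fit_to_window([400, 300, 200, 100], window=500))     # ([200, 100], 2)
```

Under these assumed numbers, twenty accumulating turns cost several times more than twenty isolated requests, because the resent history dominates the input side. `fit_to_window` mimics the silent-truncation behavior: the two oldest messages disappear without any error being raised, so a production system would usually want to log or surface the dropped count.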

Related concepts

Generative AI
