Reinforcement Learning
Teaching an AI by rewarding good outcomes and penalizing bad ones — which sounds straightforward until the system finds a way to maximize the reward without achieving the actual goal.
Reinforcement learning is a machine learning approach in which a system (often called an agent) learns by taking actions and receiving feedback: rewards for good outcomes, penalties for bad ones. Rather than learning from a fixed labeled dataset, it improves through trial and error over many iterations, gradually favoring the actions that have led to more reward. This makes it well suited to problems involving sequences of decisions, such as game playing, robotics, logistics optimization, dynamic pricing, and some of the training techniques used to align large language models with human preferences.
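To make the trial-and-error loop concrete, here is a minimal sketch of tabular Q-learning, one classic reinforcement learning algorithm, on a made-up five-cell corridor. The environment, the reward values (+1 at the goal, -0.01 per step), and the hyperparameters (`alpha`, `gamma`, `epsilon`) are illustrative assumptions, not part of any particular system. The agent is never told which direction is correct; it only sees rewards and updates its estimates of how valuable each action is in each cell.

```python
import random

# Hypothetical toy environment: a corridor of cells 0..4.
# The agent starts in cell 0; reaching cell 4 ends the episode with reward +1.
# Every other step costs -0.01, so shorter paths earn more total reward.
N_STATES = 5
ACTIONS = (-1, +1)  # move left or move right
GOAL = N_STATES - 1


def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), GOAL)  # walls at both ends
    if next_state == GOAL:
        return next_state, 1.0, True
    return next_state, -0.01, False


def train(episodes=1000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning: estimate action values purely from trial and error."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore with probability epsilon; otherwise exploit current estimates.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, reward, done = step(state, action)
            # Move the estimate toward reward + discounted best future value.
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q


def greedy_policy(q):
    """For each non-goal cell, pick the action with the highest learned value."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)}


if __name__ == "__main__":
    print(greedy_policy(train()))
```

In this toy setting the learned policy should end up moving right (`+1`) from every cell, even though that rule was never programmed in. The `epsilon` parameter is the explicit trade-off between exploring untried actions and exploiting what has already been rewarded.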
Reward design is where reinforcement learning most often fails in practice. A system optimized for a measurable proxy can find ways to maximize the reward that technically satisfy the metric but violate the intent, a failure often called reward hacking or specification gaming. Examples include maximizing customer engagement by surfacing outrage, optimizing a routing metric by choosing routes that create downstream bottlenecks, and raising measured satisfaction scores through process adjustments that don't improve actual outcomes. This isn't a bug in the learning; it's the system working correctly on a poorly specified objective. Organizations using reinforcement learning in high-stakes settings need to define carefully what they're optimizing for and test explicitly for reward exploitation before deployment.
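The engagement example can be reproduced in miniature. The sketch below uses an epsilon-greedy bandit (a single-step form of reinforcement learning) that only ever sees simulated clicks, the proxy reward. The catalog, the click rates, and the `true_value` scores are invented for illustration; `true_value` stands in for the outcome the organization actually cares about and is hidden from the learner. The `audit` function is a hypothetical example of the kind of pre-deployment check the paragraph recommends: it compares what the system learned against the real objective.

```python
import random

# Hypothetical catalog, invented for illustration. The learner observes only
# "click_rate" (the measurable proxy). "true_value" represents the real goal
# and is used only by the audit, never by the learner.
ITEMS = {
    "useful_guide": {"click_rate": 0.30, "true_value": 0.9},
    "balanced_news": {"click_rate": 0.40, "true_value": 0.7},
    "outrage_bait": {"click_rate": 0.80, "true_value": -0.5},
}


def learn_from_clicks(rounds=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy bandit that maximizes simulated clicks (the proxy reward)."""
    rng = random.Random(seed)
    names = list(ITEMS)
    totals = {n: 0.0 for n in names}
    counts = {n: 0 for n in names}
    for _ in range(rounds):
        untried = [n for n in names if counts[n] == 0]
        if untried:
            choice = untried[0]  # try every item once before exploiting
        elif rng.random() < epsilon:
            choice = rng.choice(names)  # explore
        else:
            choice = max(names, key=lambda n: totals[n] / counts[n])  # exploit
        clicked = rng.random() < ITEMS[choice]["click_rate"]
        totals[choice] += 1.0 if clicked else 0.0
        counts[choice] += 1
    # Return the item with the highest observed click rate.
    return max(names, key=lambda n: totals[n] / counts[n])


def audit(choice):
    """Pre-deployment check: compare the learned choice with the true objective."""
    best = max(ITEMS, key=lambda n: ITEMS[n]["true_value"])
    return {
        "learned": choice,
        "true_value": ITEMS[choice]["true_value"],
        "best_true_value": ITEMS[best]["true_value"],
        "flagged": ITEMS[choice]["true_value"] < ITEMS[best]["true_value"],
    }


if __name__ == "__main__":
    print(audit(learn_from_clicks()))
```

The learner does exactly what it was asked: it converges on the item with the most clicks. Only the audit, which measures something the reward never captured, reveals that the choice works against the actual goal.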
Related concepts
Machine Learning
AI that learns patterns from data rather than following fixed rules, which means its behavior is only as good as the data it learned from.
AI Agents
AI agents are systems that use AI to plan steps, use tools, make decisions, or take actions toward a goal with varying levels of autonomy. The term is often used broadly, so leaders should ask exactly what the agent can do, what tools it can access, and when humans approve actions.
Model Evaluation
How teams determine whether a model actually works, and why "it works in testing" is often the most dangerous thing anyone says before launch.