Foundations · Foundational · Draft · pending human review

Reinforcement Learning

Teaching an AI by rewarding good outcomes and penalizing bad ones — which sounds straightforward until the system finds a way to maximize the reward without achieving the actual goal.

Reinforcement learning is a machine learning approach in which a system learns by taking actions and receiving feedback: rewards for good outcomes, penalties for bad ones. Rather than learning from a fixed labeled dataset, the system learns through trial and error over many iterations, balancing exploration of untried actions against repetition of actions that have already paid off. This makes it well-suited to problems involving sequences of decisions: game-playing, robotics, logistics optimization, dynamic pricing, and some of the training techniques used to align large language models with human preferences, such as reinforcement learning from human feedback (RLHF).
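To make the trial-and-error loop concrete, here is a minimal sketch of one of the simplest reinforcement learning setups, an epsilon-greedy "multi-armed bandit". It is an illustration only: the function name train_bandit, the three reward probabilities, and the parameter values are assumptions chosen for the example, not part of any particular product or library.

```python
import random


def train_bandit(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Learn the average reward of each action by trial and error.

    reward_probs is a hypothetical environment: action i pays a reward
    of 1 with probability reward_probs[i], otherwise 0.
    """
    rng = random.Random(seed)
    n_actions = len(reward_probs)
    counts = [0] * n_actions    # how often each action has been tried
    values = [0.0] * n_actions  # running estimate of each action's reward

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: occasionally try a random action.
            action = rng.randrange(n_actions)
        else:
            # Exploit: pick the action with the best estimate so far.
            action = max(range(n_actions), key=lambda a: values[a])

        # The environment responds with a reward (1) or nothing (0).
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0

        # Incrementally update the running average for that action.
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]

    return values


# Illustrative environment: the third action is the most rewarding.
values = train_bandit([0.2, 0.5, 0.8])
best_action = max(range(len(values)), key=lambda a: values[a])
print("learned estimates:", [round(v, 2) for v in values])
print("preferred action:", best_action)
```

The system is never told which action is best. It settles on one only because that action kept getting rewarded, which is also why the choice of reward matters so much.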

Reward design is where reinforcement learning most often fails in practice. A system optimized for a measurable proxy can find ways to maximize the reward that technically satisfy the metric but violate the intent, a failure often called reward hacking or specification gaming. Examples include maximizing customer engagement by surfacing outrage, optimizing a routing metric by choosing routes that create downstream bottlenecks, and increasing measured satisfaction scores through process adjustments that don't improve actual outcomes. This isn't a bug in the learning; it's the system working correctly on a poorly specified objective. Organizations using reinforcement learning in high-stakes settings need to define what they're optimizing for with care, and test explicitly for reward exploitation before deployment.
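The gap between a proxy reward and the real goal can be sketched in a few lines. The example below is a toy illustration of the engagement case above, not a real evaluation method: the Strategy class, the strategy names, and every score are hypothetical numbers invented for the example. It compares what an optimizer would choose when scored on the proxy with what the organization actually wanted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strategy:
    name: str
    proxy_reward: float  # what the system is rewarded for, e.g. engagement
    true_value: float    # what the organization actually wants (hypothetical)


# Hypothetical, made-up scores for illustration only.
STRATEGIES = [
    Strategy("show relevant content", proxy_reward=0.60, true_value=0.80),
    Strategy("surface outrage", proxy_reward=0.90, true_value=0.10),
    Strategy("show nothing new", proxy_reward=0.20, true_value=0.30),
]


def exploitation_gap(strategies):
    """Compare the proxy-optimal choice with the intended one.

    Returns the strategy an optimizer would pick on the proxy, the
    strategy that best serves the real goal, and how much real value
    is lost by optimizing the proxy instead.
    """
    chosen = max(strategies, key=lambda s: s.proxy_reward)
    intended = max(strategies, key=lambda s: s.true_value)
    return chosen, intended, intended.true_value - chosen.true_value


chosen, intended, gap = exploitation_gap(STRATEGIES)
print("optimizer picks:", chosen.name)
print("intended choice:", intended.name)
print("real value lost:", round(gap, 2))
```

In practice the true value is rarely available as a single number, which is the heart of the problem. A pre-deployment test for reward exploitation usually means measuring the learned behavior against independent checks the system was never rewarded on.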

Related concepts

Foundations

Machine Learning

AI that learns patterns from data rather than following fixed rules — which means its behavior is only as good as the data it learned from.

Generative AI

AI Agents

AI agents are systems that use AI to plan steps, use tools, make decisions, or take actions toward a goal with varying levels of autonomy. The term is often used broadly, so leaders should ask exactly what the agent can do, what tools it can access, and when humans approve actions.

Operations and Deployment

Model Evaluation

How teams determine whether a model actually works — and the reason 'it works in testing' is often the most dangerous thing anyone says before launch.
