AI for Executives
Foundations · Foundational · Draft · pending human review

Reinforcement Learning

Teaching an AI by rewarding good outcomes and penalizing bad ones — which sounds straightforward until the system finds a way to maximize the reward without achieving the actual goal.

Reinforcement learning is a machine learning approach in which a system learns by taking actions and receiving feedback: rewards for good outcomes, penalties for bad ones. Rather than learning from a fixed labeled dataset, the system learns through trial and error over many iterations, gradually favoring the actions that earn the most reward over time. This makes it well suited to problems involving sequences of decisions: game-playing, robotics, logistics optimization, dynamic pricing, and some of the training techniques used to align large language models with human preferences.
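To make the trial-and-error loop concrete, here is a minimal sketch of tabular Q-learning, one common reinforcement learning algorithm, chosen here purely for illustration (the text does not name a specific method). The five-cell corridor, the reward of 1 for reaching the goal, and the parameters `alpha` (learning rate), `gamma` (discount on future reward), and `epsilon` (how often to explore at random) are all hypothetical choices. The agent is never told "move right"; it discovers that policy because moving right is what eventually gets rewarded.

```python
import random

# Hypothetical toy environment: a corridor of cells 0..4; the goal is cell 4.
N_STATES = 5
GOAL = N_STATES - 1
ACTIONS = (-1, +1)  # -1 = move left, +1 = move right


def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), GOAL)  # walls at both ends
    if next_state == GOAL:
        return next_state, 1.0, True  # reward only for reaching the goal
    return next_state, 0.0, False


def greedy(q, state, rng):
    """Pick the highest-valued action, breaking ties at random."""
    best = max(q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if q[(state, a)] == best])


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn action values by trial and error (illustrative parameters)."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done, steps = 0, False, 0
        while not done and steps < 100:  # cap episode length
            # Usually exploit what has been learned; sometimes explore.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = greedy(q, state, rng)
            next_state, reward, done = step(state, action)
            # Q-learning update: nudge the estimate toward
            # reward + discounted value of the best next action.
            future = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
            state, steps = next_state, steps + 1
    return q


q = train()
# The learned policy: the best-valued action in each non-goal cell.
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)}
print(policy)
```

After training, the policy should choose `+1` (move right) in every non-goal cell. Real applications replace the lookup table with a learned model and the corridor with a far richer environment, but the loop of act, observe reward, and update is the same idea.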

Reward design is where reinforcement learning most often fails in practice. A system optimized for a measurable proxy can find ways to maximize the reward that technically satisfy the metric but violate the intent — maximizing customer engagement by surfacing outrage, optimizing a routing metric by choosing routes that create downstream bottlenecks, increasing measured satisfaction scores through process adjustments that don't improve actual outcomes. This isn't a bug in the learning; it's the system working correctly on a poorly specified objective. Organizations using reinforcement learning in high-stakes settings need to define what they're optimizing for with care, and test explicitly for reward exploitation before deployment.
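The engagement example can be reproduced in a few lines. This sketch assumes a hypothetical feed-ranking system with three made-up content options. Each option has an illustrative click rate, which is the proxy reward the system sees, and an illustrative "intended value", which is what the organization actually wants and which the learner never sees. An epsilon-greedy bandit (a simple reinforcement learning method, used here as an assumption) learns only from clicks; a separate check, run before deployment, compares the learned choice against the intended objective. All option names, numbers, and the `tolerance` threshold are hypothetical.

```python
import random

# Hypothetical options with illustrative numbers (not real data).
# proxy_click_rate: what the reward signal measures (engagement).
# intended_value: what the organization actually cares about.
OPTIONS = {
    "balanced_news": {"proxy_click_rate": 0.30, "intended_value": 0.8},
    "outrage_bait": {"proxy_click_rate": 0.60, "intended_value": 0.1},
    "how_to_guide": {"proxy_click_rate": 0.20, "intended_value": 0.7},
}


def learn_from_clicks(steps=5000, epsilon=0.1, seed=1):
    """Epsilon-greedy bandit that learns only from the proxy reward (clicks)."""
    rng = random.Random(seed)
    names = list(OPTIONS)
    counts = {n: 0 for n in names}
    values = {n: 0.0 for n in names}  # estimated click rate per option
    for _ in range(steps):
        if rng.random() < epsilon:
            choice = rng.choice(names)  # explore
        else:
            choice = max(names, key=values.get)  # exploit current best guess
        clicked = rng.random() < OPTIONS[choice]["proxy_click_rate"]
        counts[choice] += 1
        # Running average of observed clicks for this option.
        values[choice] += (float(clicked) - values[choice]) / counts[choice]
    return max(names, key=values.get)


def reward_exploitation_check(chosen, tolerance=0.2):
    """Hypothetical pre-deployment test: does the learned choice sacrifice
    the intended goal by more than `tolerance`? True means the proxy is
    being gamed."""
    best_intended = max(o["intended_value"] for o in OPTIONS.values())
    gap = best_intended - OPTIONS[chosen]["intended_value"]
    return gap > tolerance


chosen = learn_from_clicks()
print(chosen, reward_exploitation_check(chosen))
```

The learner is doing exactly what it was asked: it converges on the option with the highest click rate. The check flags the result only because it measures something the learner was not optimized on. In practice that independent signal might come from human review or a separate outcome metric rather than a single clean number.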

