Reinforcement Learning
Teaching an AI by rewarding good outcomes and penalizing bad ones — which sounds straightforward until the system finds a way to maximize the reward without achieving the actual goal.
Reinforcement learning is a machine learning approach in which a system learns by taking actions and receiving feedback: rewards for good outcomes, penalties for bad ones. Rather than learning from a fixed labeled dataset, the system learns through trial and error over many iterations. It gradually favors the actions that earn the most reward while still exploring alternatives it has not tried enough. This makes it well-suited to problems involving sequences of decisions, such as game-playing, robotics, logistics optimization, and dynamic pricing. It also underpins some of the training techniques used to align large language models with human preferences.
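To make the trial-and-error loop concrete, here is a minimal sketch of one of the simplest reinforcement learning setups, an epsilon-greedy multi-armed bandit. This is a single repeated decision with no state, so it leaves out the sequential aspect described above. The function name `epsilon_greedy`, the payoff probabilities, and the parameter values are hypothetical choices for illustration. With probability `epsilon` the learner explores a random action. Otherwise it exploits the action whose average reward so far is highest.

```python
import random

def epsilon_greedy(reward_fn, n_actions, steps=5000, epsilon=0.1, seed=0):
    """Learn average-reward estimates for each action by trial and error.

    reward_fn(action, rng) returns the numeric reward for taking `action`.
    (Hypothetical helper for illustration, not a library API.)
    """
    rng = random.Random(seed)          # seeded so runs are reproducible
    estimates = [0.0] * n_actions      # running average reward per action
    counts = [0] * n_actions           # how many times each action was tried
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)                            # explore
        else:
            action = max(range(n_actions), key=lambda a: estimates[a])  # exploit
        reward = reward_fn(action, rng)
        counts[action] += 1
        # Incremental mean: nudge the estimate toward the newly observed reward.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts

# Hypothetical environment: each action pays 1 with a hidden probability, else 0.
PAYOFF_PROBS = [0.2, 0.5, 0.8]  # made-up values, never shown to the learner

def bernoulli_reward(action, rng):
    return 1.0 if rng.random() < PAYOFF_PROBS[action] else 0.0

estimates, counts = epsilon_greedy(bernoulli_reward, n_actions=3)
best = max(range(3), key=lambda a: estimates[a])
print(best, [round(e, 2) for e in estimates], counts)
```

The learner is never told which action is best. It discovers that from the rewards it receives, and it ends up choosing the highest-paying action most of the time. Full reinforcement learning adds states and delayed rewards (Q-learning is one common approach), but the same explore/exploit trade-off carries over.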
Reward design is where reinforcement learning most often fails in practice. A system optimized for a measurable proxy can find ways to maximize the reward that technically satisfy the metric but violate the intent — maximizing customer engagement by surfacing outrage, optimizing a routing metric by choosing routes that create downstream bottlenecks, increasing measured satisfaction scores through process adjustments that don't improve actual outcomes. This isn't a bug in the learning; it's the system working correctly on a poorly specified objective. Organizations using reinforcement learning in high-stakes settings need to define what they're optimizing for with care, and test explicitly for reward exploitation before deployment.
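The failure mode described above can be reproduced in a few lines. The sketch below uses a simple epsilon-greedy learner on a hypothetical content-ranking choice. The learner is rewarded only for clicks (the proxy). A separate, invented `TRUE_VALUE` table stands in for what the organization actually wants. The action names, probabilities, values, and the `reward_exploitation_check` helper are all hypothetical. The learner works exactly as designed and still settles on the action that scores worst on the intended objective.

```python
import random

# Hypothetical content-ranking choices. All numbers are invented for illustration.
ACTIONS = ["informative", "neutral", "outrage"]
CLICK_PROB = [0.3, 0.2, 0.6]    # proxy: chance a user engages (what the system is rewarded for)
TRUE_VALUE = [1.0, 0.5, -1.0]   # intent: what the organization wants (never shown to the learner)

def engagement_reward(action, rng):
    """Proxy reward: 1 if the simulated user clicks, else 0."""
    return 1.0 if rng.random() < CLICK_PROB[action] else 0.0

def train_greedy(reward_fn, n_actions, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy learner; returns the action it ends up preferring."""
    rng = random.Random(seed)
    estimates, counts = [0.0] * n_actions, [0] * n_actions
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(n_actions)                            # explore
        else:
            a = max(range(n_actions), key=lambda i: estimates[i])  # exploit
        counts[a] += 1
        estimates[a] += (reward_fn(a, rng) - estimates[a]) / counts[a]
    return max(range(n_actions), key=lambda i: estimates[i])

def reward_exploitation_check(policy_action, true_value, tolerance=0.0):
    """Hypothetical pre-deployment test: score the learned choice on the intended objective."""
    gap = max(true_value) - true_value[policy_action]
    return gap <= tolerance, gap

learned = train_greedy(engagement_reward, len(ACTIONS))
ok, gap = reward_exploitation_check(learned, TRUE_VALUE)
print(f"learned policy: {ACTIONS[learned]}, passes check: {ok}, value gap: {gap}")
```

The check only works if you have some independent measure of the intended outcome. Here that measure is a made-up table. In a real system it would have to come from a separate evaluation, such as human review. A policy can score well on its training metric and still fail this kind of test.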