Training Data
The data a model learned from, which means everything the model knows, gets right, gets wrong, and embeds as bias traces back here.
Training data is the collection of examples a model learned from during the training process. The model finds patterns across that data and encodes them as the weights that determine its behavior. What the data contains — its subject matter, quality, time period, representation of different groups, and the accuracy of any labels — directly shapes what the model knows, where it performs well, and where it fails. A model trained on historical data will reflect historical patterns, including historical biases. A model trained on one customer population may not generalize to another.
Claims that an AI system is objective almost always overstate the case. Every model's behavior reflects the training data it was built on: its gaps, its historical biases, its geographic and demographic skews, and whatever assumptions were made during labeling. When a model produces biased outcomes (scoring certain groups unfairly, performing worse for underrepresented populations, reflecting outdated patterns), the root cause is usually in the training data. Asking "what was this model trained on, and does that data represent the people and situations it will be applied to?" is one of the highest-value governance questions an executive can ask before deploying an AI system.
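For teams that want to put a number on that question, one rough starting point is to compare how often each group appears in the training data with how often it appears in the population the model will serve. The sketch below is illustrative only: the function names, the `region` field, the 10-percentage-point threshold, and the sample rows are all assumptions made for the example, not part of any standard tool, and a real review would look at many more dimensions than one attribute.

```python
from collections import Counter


def group_shares(records, key):
    """Return each group's share of the records, as a fraction of the total."""
    counts = Counter(r[key] for r in records)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


def representation_gaps(training, deployment, key, threshold=0.10):
    """Flag groups whose training share differs from their deployment share.

    A positive gap means the group is over-represented in training;
    a negative gap means it is under-represented. The threshold is an
    arbitrary illustrative cutoff, not a recognized standard.
    """
    train = group_shares(training, key)
    deploy = group_shares(deployment, key)
    gaps = {}
    for group in sorted(set(train) | set(deploy)):
        diff = train.get(group, 0.0) - deploy.get(group, 0.0)
        if abs(diff) > threshold:
            gaps[group] = round(diff, 3)
    return gaps


# Hypothetical data: the model was trained mostly on one region
# but will be applied to an evenly split population.
training_rows = [{"region": "north"}] * 80 + [{"region": "south"}] * 20
deployment_rows = [{"region": "north"}] * 50 + [{"region": "south"}] * 50

gaps = representation_gaps(training_rows, deployment_rows, "region")
print(gaps)  # {'north': 0.3, 'south': -0.3}
```

A gap like this does not prove the model will be biased, and a clean result does not prove it is fair. It only indicates where the training data and the intended population diverge, which is where closer evaluation is most likely to pay off.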
Read next
Related concepts
Machine Learning
AI that learns patterns from data rather than following fixed rules — which means its behavior is only as good as the data it learned from.
Data and Analytics: Data Quality
How fit your data actually is for what you're trying to do with it — and the most common reason AI projects disappoint.
Governance and Risk: AI Bias
When an AI system consistently produces worse outcomes for certain groups — and the organization doesn't know it yet.