Foundations · Foundational · Draft · pending human review

Training Data

The data a model learned from — which means everything the model knows, gets right, gets wrong, and embeds as bias all traces back here.

Training data is the collection of examples a model learns from during training. The model finds patterns across that data and encodes them as weights, the numeric parameters that determine its behavior. What the data contains (its subject matter, quality, time period, representation of different groups, and the accuracy of any labels) directly shapes what the model knows, where it performs well, and where it fails. A model trained on historical data will reflect historical patterns, including historical biases. A model trained on one customer population may not generalize to another.
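To make the generalization point concrete, here is a minimal sketch in Python. Everything in it is hypothetical and chosen for illustration: the "model" is a single spending cutoff, and the two customer populations and their numbers are invented. It is a toy, not a description of how any production system is trained.

```python
from statistics import mean

def fit_threshold(examples):
    """Learn one cutoff: the midpoint between the average feature value of each class."""
    positives = [x for x, label in examples if label == 1]
    negatives = [x for x, label in examples if label == 0]
    return (mean(positives) + mean(negatives)) / 2

def accuracy(threshold, examples):
    """Share of examples where 'x >= threshold' matches the true label."""
    correct = sum((x >= threshold) == bool(label) for x, label in examples)
    return correct / len(examples)

# Hypothetical data: (monthly_spend, is_premium_customer).
# Population A is the only data the model ever sees during training.
population_a = [(80, 1), (90, 1), (100, 1), (20, 0), (30, 0), (40, 0)]
# Population B has the same kind of customers but a lower spending scale.
population_b = [(40, 1), (45, 1), (50, 1), (10, 0), (15, 0), (20, 0)]

threshold = fit_threshold(population_a)   # learned from A only

print(accuracy(threshold, population_a))  # 1.0
print(accuracy(threshold, population_b))  # 0.5
```

The cutoff learned from population A is perfect on A and no better than a coin flip on B, even though nothing about the model changed. Only the data it was applied to did.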

Claims of objectivity for AI systems almost always overstate what's actually true. Every model's behavior reflects the training data it was built on: its gaps, its historical biases, its geographic and demographic skews, and whatever assumptions were made during labeling. When a model produces biased outcomes — scoring certain groups unfairly, performing worse for underrepresented populations, reflecting outdated patterns — the root cause is usually in the training data. Asking "what was this model trained on, and does that data represent the people and situations it will be applied to?" is one of the highest-value governance questions an executive can ask before deploying an AI system.
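One way a team might begin to answer that governance question is to compare who appears in the training data with who the system will actually serve. The sketch below is a rough illustration under stated assumptions: the `region` field, the record counts, and the 10-point `tolerance` are all hypothetical, and a real review would look at many more attributes, as well as label quality and time period.

```python
from collections import Counter

def group_shares(records, key):
    """Fraction of records belonging to each group under the given key."""
    counts = Counter(record[key] for record in records)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}

def representation_gaps(train_records, deploy_records, key, tolerance=0.10):
    """Return groups whose share of the training data falls short of their
    share of the deployment population by more than `tolerance`."""
    train = group_shares(train_records, key)
    deploy = group_shares(deploy_records, key)
    gaps = {}
    for group, deploy_share in deploy.items():
        train_share = train.get(group, 0.0)  # a group may be absent from training data
        if deploy_share - train_share > tolerance:
            gaps[group] = (train_share, deploy_share)
    return gaps

# Hypothetical records: training data skews urban, deployment is evenly split.
training = [{"region": "urban"}] * 9 + [{"region": "rural"}] * 1
deployment = [{"region": "urban"}] * 5 + [{"region": "rural"}] * 5

gaps = representation_gaps(training, deployment, key="region")
print(gaps)  # {'rural': (0.1, 0.5)}
```

A flagged group is not proof of biased outcomes, but it marks where the model has seen the least evidence and where its performance most needs checking before deployment.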

Related concepts

via Machine Learning

via Data Quality

via AI Bias

via Model Evaluation