Data Quality
How fit your data actually is for what you're trying to do with it — and the most common reason AI projects disappoint.
Data quality refers to how well a dataset is fit for its intended use. It is commonly assessed along five dimensions: accuracy (are the values correct?), completeness (are the expected records and fields present?), consistency (is the same thing represented the same way across systems?), timeliness (is the data current enough?), and relevance (does it represent the population and conditions the model will operate in?). Data can be high-quality in one dimension and unfit in another: a complete dataset full of inaccurate values is still bad training data.
Model quality is bounded by data quality. A sophisticated model trained on poor data will produce unreliable outputs, and a simpler model trained on clean, representative data will often outperform it. This matters for resource allocation: investing in model complexity before fixing the underlying data is an expensive way to get the same bad answers faster. The most common AI project failure mode isn't a bad model; it's a bad dataset that nobody audited before the project started.
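A first-pass audit along the five dimensions can be sketched in a few lines. This is a minimal sketch, not a standard method. It assumes a hypothetical `Record` type with a country code, an age, and a last-updated date. The thresholds (ages 0 to 120, a 365-day freshness window), the allowed country codes, and the target deployment mix are all illustrative assumptions. Accuracy and relevance are only approximated here. A plausibility check cannot confirm that a value is correct, and total variation distance on a single attribute is a rough stand-in for "represents the population".

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional, Set


@dataclass
class Record:
    # Hypothetical customer record, used only for illustration.
    customer_id: str
    country: Optional[str]
    age: Optional[int]
    updated: date


def audit(records: List[Record], as_of: date, max_age_days: int,
          valid_countries: Set[str], target_mix: Dict[str, float]) -> Dict[str, float]:
    """Return one illustrative score per data quality dimension."""
    n = len(records)
    if n == 0:
        raise ValueError("cannot audit an empty dataset")
    # Completeness: share of records with every field populated.
    complete = sum(r.country is not None and r.age is not None for r in records)
    # Accuracy (proxy only): share of ages in a plausible range. Real accuracy
    # checks need a trusted reference source to compare against.
    plausible = sum(r.age is not None and 0 <= r.age <= 120 for r in records)
    # Consistency: share of country values drawn from the agreed code list,
    # e.g. "US" rather than "USA" or "United States".
    consistent = sum(r.country in valid_countries for r in records)
    # Timeliness: share of records updated within the allowed window.
    timely = sum((as_of - r.updated).days <= max_age_days for r in records)
    # Relevance (proxy only): total variation distance between the dataset's
    # country mix and the mix expected in deployment (0 means identical).
    counts = Counter(r.country for r in records if r.country in valid_countries)
    total = sum(counts.values()) or 1
    keys = set(counts) | set(target_mix)
    tvd = 0.5 * sum(abs(counts[k] / total - target_mix.get(k, 0.0)) for k in keys)
    return {
        "completeness": round(complete / n, 2),
        "accuracy_proxy": round(plausible / n, 2),
        "consistency": round(consistent / n, 2),
        "timeliness": round(timely / n, 2),
        "relevance_gap": round(tvd, 2),
    }


# Usage: four records, each with a different kind of defect.
records = [
    Record("c1", "US", 34, date(2024, 5, 1)),
    Record("c2", "USA", 29, date(2024, 4, 20)),  # inconsistent country code
    Record("c3", "DE", None, date(2024, 5, 2)),  # missing age
    Record("c4", "US", 212, date(2021, 1, 9)),   # implausible age, stale record
]
report = audit(records, as_of=date(2024, 5, 10), max_age_days=365,
               valid_countries={"US", "DE", "FR"},
               target_mix={"US": 0.4, "DE": 0.3, "FR": 0.3})
print(report)
# {'completeness': 0.75, 'accuracy_proxy': 0.5, 'consistency': 0.75, 'timeliness': 0.75, 'relevance_gap': 0.3}
```

The record with the inconsistent "USA" code still counts as complete. That is why each dimension is scored separately rather than folded into a single number.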
Related concepts
Training Data
The data a model learned from — which means everything the model knows, gets right, gets wrong, and embeds as bias all traces back here.
Machine Learning
AI that learns patterns from data rather than following fixed rules — which means its behavior is only as good as the data it learned from.
Data Pipelines
The plumbing that moves data from where it lives to where AI can use it — and a common reason AI projects fail in production.