Data Quality
How fit your data actually is for what you're trying to do with it — and the most common reason AI projects disappoint.
Data quality refers to how well a dataset fits its intended use. It spans five dimensions: accuracy (are the values correct?), completeness (are records and fields present?), consistency (do the same things mean the same thing across systems?), timeliness (is the data current enough?), and relevance (does it represent the population and conditions the model will operate in?). Data can be high-quality in one dimension and completely unfit in another: a complete dataset full of inaccurate values is still bad training data.
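Some of these dimensions can be scored directly. The sketch below runs on a tiny hypothetical record set, and every name, threshold, and reference value in it (`RECORDS`, `CANONICAL_COUNTRIES`, `ALIASES`, `AS_OF`, `MAX_AGE_DAYS`, `DEPLOYMENT_MIX`) is an illustrative assumption. Accuracy is approximated by a plausibility rule, because true accuracy can only be checked against ground truth. Relevance is measured as the total variation distance between the data's country mix and an assumed deployment mix.

```python
from datetime import date

# Hypothetical records merged from two source systems (illustrative only).
RECORDS = [
    {"id": 1, "age": 34,   "country": "US",  "updated": date(2024, 5, 1)},
    {"id": 2, "age": None, "country": "USA", "updated": date(2024, 4, 20)},
    {"id": 3, "age": 212,  "country": "DE",  "updated": date(2021, 1, 3)},
    {"id": 4, "age": 51,   "country": "US",  "updated": date(2024, 5, 2)},
]

REQUIRED = ("age", "country", "updated")
CANONICAL_COUNTRIES = {"US", "DE"}     # assumed reference vocabulary
ALIASES = {"USA": "US"}                # assumed mapping of known variants
AS_OF = date(2024, 5, 3)               # assumed "today" for the timeliness check
MAX_AGE_DAYS = 365                     # assumed freshness requirement
DEPLOYMENT_MIX = {"US": 0.5, "DE": 0.5}  # assumed population the model will serve


def completeness(rows):
    # Share of required cells that are actually filled in.
    cells = [row.get(f) for row in rows for f in REQUIRED]
    return sum(v is not None for v in cells) / len(cells)


def validity(rows):
    # Accuracy proxy: without ground truth we can only flag implausible values.
    ages = [r["age"] for r in rows if r["age"] is not None]
    return sum(0 <= a <= 120 for a in ages) / len(ages)


def consistency(rows):
    # Share of country codes already in the canonical vocabulary ("USA" fails).
    return sum(r["country"] in CANONICAL_COUNTRIES for r in rows) / len(rows)


def timeliness(rows):
    # Share of records updated within the freshness window.
    return sum((AS_OF - r["updated"]).days <= MAX_AGE_DAYS for r in rows) / len(rows)


def relevance_gap(rows):
    # Total variation distance between the data's country mix (after alias
    # normalization) and the deployment mix: 0 = identical, 1 = disjoint.
    counts = {}
    for r in rows:
        code = ALIASES.get(r["country"], r["country"])
        counts[code] = counts.get(code, 0) + 1
    share = {k: v / len(rows) for k, v in counts.items()}
    keys = set(share) | set(DEPLOYMENT_MIX)
    return 0.5 * sum(abs(share.get(k, 0) - DEPLOYMENT_MIX.get(k, 0)) for k in keys)


profile = {
    "completeness": completeness(RECORDS),
    "validity": validity(RECORDS),
    "consistency": consistency(RECORDS),
    "timeliness": timeliness(RECORDS),
    "relevance_gap": relevance_gap(RECORDS),
}
for name, score in profile.items():
    print(f"{name:>14}: {score:.2f}")
```

Each dimension catches a different record: id 2 is incomplete and uses a non-canonical country code, and id 3 has an implausible age and is stale. The first four scores are "higher is better," while `relevance_gap` is "lower is better." No single score summarizes fitness.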
Model quality is bounded by data quality. A sophisticated model trained on poor data will produce unreliable outputs, and a simpler model trained on clean, representative data will often outperform it. This matters for resource allocation: investing in model complexity before fixing the underlying data is an expensive way to get the same bad answers faster. The most common AI project failure mode isn't a bad model; it's a bad dataset that nobody audited before the project started.
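The bound can be shown with a deterministic toy. Everything in it is hypothetical: the true rule is "label 1 when x >= 5," and a systematic labeling bug records x = 5, 6, 7 as 0. A simple decision stump and a more flexible 1-nearest-neighbour memorizer are both trained on the buggy labels and then scored against the true rule.

```python
def true_label(x):
    # The (hypothetical) real-world rule the model is supposed to learn.
    return 1 if x >= 5 else 0


def fit_stump(xs, ys):
    # Choose the threshold t from the training x values that minimizes
    # training errors for the rule "predict 1 if x >= t".
    best_t, best_err = None, None
    for t in sorted(set(xs)):
        err = sum((1 if x >= t else 0) != y for x, y in zip(xs, ys))
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return lambda x: 1 if x >= best_t else 0


def fit_1nn(xs, ys):
    # Memorize the training set; predict the label of the nearest training x.
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def accuracy(model, xs):
    # Score predictions against the true rule, not the (possibly buggy) labels.
    return sum(model(x) == true_label(x) for x in xs) / len(xs)


train_x = list(range(10))
clean_y = [true_label(x) for x in train_x]
# Systematic labeling error: x = 5, 6, 7 recorded as 0 instead of 1.
noisy_y = [0 if x in (5, 6, 7) else y for x, y in zip(train_x, clean_y)]

test_x = [i + 0.25 for i in range(10)]
results = {
    "stump/clean": accuracy(fit_stump(train_x, clean_y), test_x),
    "stump/noisy": accuracy(fit_stump(train_x, noisy_y), test_x),
    "1nn/noisy": accuracy(fit_1nn(train_x, noisy_y), test_x),
}
print(results)  # {'stump/clean': 1.0, 'stump/noisy': 0.7, '1nn/noisy': 0.7}
```

The more flexible model does not rescue the buggy data. It fits the bad labels faithfully and lands at the same 0.7, while the simple stump trained on correct labels scores 1.0.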