Data and Analytics · Foundational · Draft · pending human review

Data Quality

How fit your data actually is for what you're trying to do with it — and the most common reason AI projects disappoint.

Data quality refers to how well a dataset is fit for its intended use. It spans five dimensions: accuracy (are the values correct?), completeness (are records and fields present?), consistency (does the same field or entity mean the same thing across systems?), timeliness (is the data current enough?), and relevance (does it represent the population and conditions the model will operate in?). Data can be high-quality in one dimension and completely unfit in another: a complete dataset full of inaccurate values is still bad training data.
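To make the five dimensions concrete, here is a minimal sketch in Python that scores a tiny dataset on each one. Everything in it is an illustrative assumption rather than a standard method: the sample records, the `audit` and `share` helpers, the 0 to 120 age range used as a stand-in for accuracy, the uppercase country-code convention used as a stand-in for consistency, the 365-day freshness window, and the target country set used as a stand-in for relevance. Real checks depend on the dataset and its intended use.

```python
from datetime import date

# Hypothetical sample records; None marks a missing field.
records = [
    {"id": 1, "age": 34, "country": "US", "updated": date(2024, 5, 1)},
    {"id": 2, "age": None, "country": "us", "updated": date(2024, 4, 20)},
    {"id": 3, "age": 212, "country": "DE", "updated": date(2021, 1, 15)},
    {"id": 4, "age": 51, "country": "FR", "updated": date(2024, 5, 3)},
]


def share(flags):
    """Return the fraction of True values in an iterable of booleans."""
    flags = list(flags)
    return sum(flags) / len(flags) if flags else 0.0


def audit(rows, today, max_age_days=365, target_countries=frozenset({"US", "DE"})):
    """Score each dimension from 0.0 to 1.0 using simple proxy checks (assumptions)."""
    with_age = [r for r in rows if r["age"] is not None]
    with_country = [r for r in rows if r["country"]]
    return {
        # Completeness: both fields are present.
        "completeness": share(
            r["age"] is not None and r["country"] is not None for r in rows
        ),
        # Accuracy proxy: the age is plausible. This cannot prove a value is correct.
        "accuracy": share(0 <= r["age"] <= 120 for r in with_age),
        # Consistency proxy: country codes follow one convention (uppercase).
        "consistency": share(r["country"] == r["country"].upper() for r in with_country),
        # Timeliness: the record was updated within the allowed window.
        "timeliness": share((today - r["updated"]).days <= max_age_days for r in rows),
        # Relevance proxy: the record belongs to the population the model will serve.
        "relevance": share((r["country"] or "").upper() in target_countries for r in rows),
    }


report = audit(records, today=date(2024, 6, 1))
for dimension, score in report.items():
    print(f"{dimension}: {score:.2f}")
```

Each dimension gets its own score rather than being folded into a single number, which keeps the point of the paragraph above visible: a dataset can score well on completeness and still fail on accuracy.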

Model quality is bounded by data quality. A sophisticated model trained on poor data will produce unreliable outputs; a simpler model trained on clean, representative data will often outperform it. This matters for resource allocation: investing in model complexity before fixing the underlying data is an expensive way to get the same bad answers faster. The most common AI project failure mode isn't a bad model; it's a bad dataset that nobody audited before the project started.
