The hidden cost of a bad data model
Most data debt looks like this: a field called status that’s been
re-purposed three times across two services and a dashboard, with no two
consumers agreeing on what its values mean. Nobody designed this. It accreted.
The cost of bad data modelling shows up later — usually in the form of a metric that quietly disagrees with another metric, and an analyst trying to reconstruct why on a Friday afternoon.
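To see how that disagreement plays out, here is a minimal sketch, assuming a hypothetical accounts table whose status column gained a 'paused' value halfway through its life; every name and value here is invented for illustration:

```sql
-- Two consumers of the same hypothetical accounts.status column,
-- disagreeing on what counts as "active". All names are invented.

-- Billing's definition of a live customer:
SELECT count(*) AS billable_accounts
FROM accounts
WHERE status = 'active';

-- The dashboard's definition, written after 'paused' was introduced:
SELECT count(*) AS active_accounts
FROM accounts
WHERE status IN ('active', 'paused');  -- paused users still count here

-- Same field, two numbers, and a Friday afternoon for an analyst.
```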
The compounding tax
Every consumer of an ambiguous field pays a tax: a CASE WHEN to handle the
weird cases, a comment, a Slack thread to confirm the intended meaning. Each
of those is small. But they multiply across teams and time, and they ossify —
nobody dares to fix the source because three downstream dashboards now depend
on its weirdness.
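A sketch of what that tax looks like in one consumer's query, assuming a hypothetical orders table whose status column carries legacy values; the table, the values, and the refunded_at column are all invented for illustration:

```sql
-- One downstream consumer re-deriving the meaning of a re-purposed
-- status column. Every name here is hypothetical.
SELECT
  order_id,
  CASE
    WHEN status = 'done'      THEN 'completed'  -- legacy value, pre-migration
    WHEN status = 'complete'  THEN 'completed'  -- current value
    WHEN status = 'cancelled' AND refunded_at IS NOT NULL
                              THEN 'refunded'   -- meaning confirmed in a Slack thread
    ELSE status
  END AS status_normalized
FROM orders;

-- Copies of this CASE WHEN, each slightly different, live in every
-- downstream model. Fixing the source would break all of them at once.
```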
Smells worth chasing early
A few patterns I’ve learned to take seriously:
- A field whose valid values change without a migration.
- A nullable column where NULL means three different things (sketched after this list).
- A timestamp where nobody can tell you the timezone.
- A boolean named after a feature that no longer exists.
- A “deleted” flag that’s set, but the row is still queried.
Each of these is a small fire. Most stay small. A few become the six-month metric drift that ends in a re-platforming project.
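The NULL smell is the one I'd single out. A minimal sketch, assuming a hypothetical subscriptions table where the same NULL has quietly picked up three meanings over time; all names are invented:

```sql
-- Three meanings hiding behind one NULL, on a hypothetical table.
SELECT
  subscription_id,
  CASE
    WHEN cancelled_at IS NULL AND created_at < '2021-01-01'
      THEN 'unknown: predates tracking'          -- backfill never happened
    WHEN cancelled_at IS NULL AND source = 'import'
      THEN 'unknown: importer drops the field'   -- third-party data
    WHEN cancelled_at IS NULL
      THEN 'still active'                        -- the intended meaning
  END AS what_null_means
FROM subscriptions;
```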
The fix is cultural, not technical
Schema reviews. Owned glossaries. Killing fields you don’t use anymore. Treating data contracts the way you’d treat an API contract. None of it is glamorous; all of it is cheaper than the cleanup.
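As one concrete shape a data contract can take, here is a minimal sketch in Postgres DDL, with invented names: the agreed meanings live in the schema itself, so a bad value fails a constraint instead of quietly drifting.

```sql
-- A hypothetical contract-as-schema. Names and values are illustrative.
CREATE TABLE accounts (
  account_id  bigint PRIMARY KEY,
  -- One enumerated meaning; adding a value requires a migration and a review.
  status      text NOT NULL
              CHECK (status IN ('trial', 'active', 'paused', 'churned')),
  -- The timezone is part of the type, not tribal knowledge.
  created_at  timestamptz NOT NULL,
  -- Soft deletion is explicit, and NULL means exactly one thing: not deleted.
  deleted_at  timestamptz
);

COMMENT ON COLUMN accounts.status IS
  'Lifecycle state. Owned by the billing team; see the glossary entry.';
```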