Data Quality Taxation

Aaron Reese: 10th October 2025

A Thought Experiment

Everybody wants good quality data, but no one wants to pay for it. Everybody wants good roads/schools/police, but no one wants to pay for them. For public infrastructure we have a solution: taxation! What if we did the same for our data?

We need a budget to pay for shared services that would otherwise be unaffordable (roads = infrastructure/email/M365 licences) or significantly more expensive if paid for individually (security and healthcare = reporting solutions, helpdesk).

Four Ways Government Raises Money

(ignoring borrowing)

Tax on Assets

Property tax, wealth tax

Tax on Income

Salary tax, business profits

Tax on Spending

VAT, sales tax

Fines on Rule Breakers

Penalties for non-compliance

What if Data Quality Costs Were Divided Along the Same Lines?

Data Asset Tax

If you have a lot of data (assets) then you pay a 'holding' tax.

The closest analogy I can come up with is Council Tax: a fee based on how 'big' your data repository is.

As well as the obvious physical costs of holding lots of data (disk storage, increased data processing costs and network traffic), we can assign an organisational cost to act as an incentive to review how much of your data really needs to be considered 'live': can we delete it, move it to more cost-effective storage, or find where we are holding duplicated data (i.e. do we need the database AND the Excel reports that were generated from it)? Doing this could focus us on reducing real costs as well as notional ones.
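
To make this concrete, here is a minimal sketch of how such a holding tax might be computed. The tier rates, duplicate surcharge and figures are purely illustrative assumptions, not a real chargeback scheme.

```python
# Hypothetical data 'holding tax': live ('hot') data is charged at a
# higher rate than archived ('cold') data, and known duplicate copies
# attract a surcharge. All rates below are illustrative assumptions.

HOT_RATE_PER_GB = 0.10      # notional monthly charge per GB of live data
COLD_RATE_PER_GB = 0.01     # far cheaper once archived to cold storage
DUPLICATE_SURCHARGE = 2.0   # duplicates taxed at a multiple of the hot rate

def holding_tax(hot_gb: float, cold_gb: float, duplicate_gb: float) -> float:
    """Return the notional monthly 'tax' for a team's data holdings."""
    return (hot_gb * HOT_RATE_PER_GB
            + cold_gb * COLD_RATE_PER_GB
            + duplicate_gb * HOT_RATE_PER_GB * DUPLICATE_SURCHARGE)

# 500 GB live + 2 TB archived + 100 GB of duplicated spreadsheets:
# the duplicates cost as much as the entire 2 TB archive.
print(holding_tax(hot_gb=500, cold_gb=2000, duplicate_gb=100))  # 90.0
```

The point of the surcharge is that deleting the duplicate, or moving data to the archive, immediately and visibly reduces the team's bill.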

Data Income Tax

If you generate or modify a lot of data then you pay 'income' tax.

If you create wealth (data) then the organisation incurs a cost for that. Do you really need your IoT devices to send you data by the minute, or would every half hour do? Does each team actually need its own copy of the Excel spreadsheet with its data, or could they use a shared report with security restrictions?
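
A quick back-of-the-envelope sketch of the IoT example, using an assumed payload size and fleet size, shows how much the reporting interval matters.

```python
# Hypothetical IoT telemetry volumes. Payload size, fleet size and
# retention below are illustrative assumptions only.

PAYLOAD_BYTES = 512      # assumed size of one telemetry message
DEVICES = 10_000         # assumed fleet size
DAYS_RETAINED = 365

def yearly_volume_gb(messages_per_hour: int) -> float:
    """Total telemetry stored over a year, in GB."""
    messages = DEVICES * messages_per_hour * 24 * DAYS_RETAINED
    return messages * PAYLOAD_BYTES / 1024**3

print(f"Every minute:    {yearly_volume_gb(60):,.0f} GB/year")  # ~2,506 GB
print(f"Every half hour: {yearly_volume_gb(2):,.0f} GB/year")   # ~84 GB
```

Dropping from per-minute to per-half-hour reporting is a 30-fold reduction in volume before anyone even asks whether the data is needed at all.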

Data Consumption Tax (VAT)

If you consume a lot of data then you pay 'VAT'.

Do you REALLY need that dashboard to update every minute? Do you actually look at that report every morning? Do you need to see the entire list of open jobs?

Less processing means lower costs and improved performance.
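
As a rough illustration, if each dashboard refresh runs a query with a fixed notional cost (all figures below are assumptions), the refresh interval alone dominates the bill.

```python
# Hypothetical consumption 'VAT': meter each dashboard refresh and
# charge per query executed. All figures are illustrative assumptions.

COST_PER_QUERY = 0.002        # assumed notional cost of one refresh query
BUSINESS_HOURS_PER_DAY = 10
WORKING_DAYS_PER_YEAR = 250

def yearly_refresh_cost(refresh_minutes: int) -> float:
    """Notional yearly cost of one auto-refreshing dashboard."""
    refreshes_per_day = BUSINESS_HOURS_PER_DAY * 60 // refresh_minutes
    return refreshes_per_day * WORKING_DAYS_PER_YEAR * COST_PER_QUERY

print(yearly_refresh_cost(1))    # refresh every minute: 300.0
print(yearly_refresh_cost(60))   # refresh every hour:     5.0
```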

Data Quality Carbon Credits

If you generate crappy data then you pay 'Carbon Credits', which someone else gets credited with when they sort it out.

This is the Biggie! Very often the people entering data are not the same people who rely on it. If you have no 'skin in the game', why would you incur additional cost and effort to ensure that the data you put into the system is timely, accurate and complete? Fixing data at entry/source only needs to be done once. Fixing data at the point of use has to be done every time! A large amount of time, effort (and money) is spent on defensive programming, calculating missing values, rejecting invalid records and finding corroborating data. BI/MI is regarded as expensive because they are paying to clean up your mess.

How about, instead of just complaining about it, we look at the source of the problem and give the producer a choice: clean up your own mess, or pay the BI/MI/DQ team to do it for you. This then becomes a cost/value proposition where the 'cost' to pollute keeps increasing until 'clean your own room' becomes the obvious play. Not only does that mean less 'work' for the BI team, but good data usually results in more reliable operations, as downstream processes don't need to second-guess the data they are working with.
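
One way to operationalise that choice is a simple ledger, sketched below under purely illustrative assumptions: every defect found downstream debits the producing team and credits whoever fixed it, and the rate ratchets upwards so that polluting gets steadily more expensive.

```python
# Minimal sketch of a data-quality 'carbon credit' ledger. Each defect
# found downstream debits the producer and credits the cleaner; the
# escalating rate is an illustrative assumption.

from collections import defaultdict

class QualityLedger:
    def __init__(self, base_rate: float = 1.0, escalation: float = 1.1):
        self.balances = defaultdict(float)
        self.rate = base_rate
        self.escalation = escalation  # pollution gets dearer over time

    def record_defect(self, producer: str, cleaner: str) -> None:
        """Debit the producer and credit the cleaner at the current rate."""
        self.balances[producer] -= self.rate
        self.balances[cleaner] += self.rate
        self.rate *= self.escalation  # each fix raises the next charge

ledger = QualityLedger()
for _ in range(3):
    ledger.record_defect(producer="sales_ops", cleaner="bi_team")
print(dict(ledger.balances))  # sales_ops ~ -3.31, bi_team ~ +3.31
```

At some point on the escalation curve it becomes cheaper for the producing team to add validation at data entry than to keep paying the BI team to patch things downstream.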

Incentives and Benefits

You can incentivise archiving and deduplication (e.g. don't copy all your transaction data into a monthly spreadsheet).

Producer-Consumer Negotiations

The negotiation for clean data happens between producer and consumer, backed by 'state' governance (the data governance programme).

Cost Recovery Models

The cost of clean data can be recovered either through a consumer tax, which pays for the data governance programme, or through Carbon Credits, where the consumer is compensated for having to clean up someone else's mess.
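
A toy comparison of the two models, with all figures assumed, shows why they land so differently:

```python
# Toy comparison of the two cost-recovery models (all figures assumed).

CLEANUP_COST = 5_000.0   # yearly cost of cleaning up bad data
CONSUMING_TEAMS = 10

# Model 1: consumer tax. Every consuming team funds governance equally,
# regardless of who caused the mess.
flat_tax_per_team = CLEANUP_COST / CONSUMING_TEAMS

# Model 2: carbon credits. Producers pay in proportion to the defects
# they caused, and the proceeds compensate whoever did the cleaning.
defects_by_producer = {"sales_ops": 300, "warehouse": 150, "finance": 50}
total_defects = sum(defects_by_producer.values())
charge_per_producer = {team: CLEANUP_COST * n / total_defects
                       for team, n in defects_by_producer.items()}

print(flat_tax_per_team)    # 500.0 each: everyone pays, polluters hide
print(charge_per_producer)  # {'sales_ops': 3000.0, 'warehouse': 1500.0, 'finance': 500.0}
```

The consumer tax is simpler to administer; the carbon-credit model puts the cost where the pollution happens.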

The Carbon Credits Analogy

The Carbon Credits analogy works particularly well if you are trying to build a centralised data warehouse/lake, as the 'value' provided by the ETL/BI team becomes crystallised.