Project Overwatch

What is it

At its heart, a Data Quality System (DQS) is a solution that allows you to apply repeatable, standard processes to validate and enforce data quality standards. Typically there will be a technology supporting the process, as this makes it more efficient and easier to implement new DQ checks, but that is not always a requirement. It should, however, stand alone from the source systems it is administering. The reasons for this are numerous, but in essence they boil down to two: the solution should be system agnostic (i.e. it should work the same way against ALL source systems, both now and in the future), and DQ history should not be lost when source systems are upgraded or replaced. In terms of standards and repeatability, there should be one (or perhaps a distinct few) ways to instantiate a new DQ check. This check should record information in a standard way that allows you to identify the following:

The individual records that are causing a DQ issue
The issue being identified
When the issue was first identified
The last time the issue was identified (i.e. has it been fixed)
Who created the issue and who fixed it

Together, these individual data captures should give the business some assurance about the number of new issues being created (and therefore whether training, front-end validation, or operating processes need addressing) and also the "mean time to fix" for DQ issues that have been flagged. Individual DQ checks should be runnable on demand or on a schedule: some errors need to be picked up in near real time, others daily, and some weekly, monthly or quarterly. The DQS should be able to run checks against different systems and also combine data from different sources (to enable discrepancies to be identified).
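As a rough illustration of how the "first identified" and "last identified" timestamps can feed a mean time to fix figure, here is a minimal Python sketch. The IssueRecord fields and the rule "an issue counts as fixed once it is absent from the latest run" are assumptions made for the example, not part of any particular DQS.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class IssueRecord(NamedTuple):
    """One captured DQ issue; field names are illustrative assumptions."""
    record_id: str        # identifier of the offending source record
    check_name: str       # which DQ check flagged it
    first_seen: datetime  # when the issue was first identified
    last_seen: datetime   # the last run in which it was still present


def mean_time_to_fix(issues, latest_run):
    """Average (last_seen - first_seen) over issues absent from the latest run.

    An issue is treated as fixed when its last_seen is earlier than the most
    recent run of the check. Returns None when nothing has been fixed yet.
    """
    fixed = [i for i in issues if i.last_seen < latest_run]
    if not fixed:
        return None
    total = sum((i.last_seen - i.first_seen for i in fixed), timedelta())
    return total / len(fixed)


latest_run = datetime(2024, 6, 10)
issues = [
    IssueRecord("A1", "missing_postcode", datetime(2024, 6, 1), datetime(2024, 6, 3)),
    IssueRecord("A2", "missing_postcode", datetime(2024, 6, 1), datetime(2024, 6, 5)),
    IssueRecord("A3", "missing_postcode", datetime(2024, 6, 4), datetime(2024, 6, 10)),  # still open
]
print(mean_time_to_fix(issues, latest_run))  # 3 days, 0:00:00
```

The precision of the figure depends on how often the check runs: a fix can only be detected at the next run after it was made.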

A DQS will typically have the following components and features:

A standard way of writing new DQ rules
A way to run these rules on demand or on a schedule
A way to record the results of the particular DQ check so that we can:
- Report on currently outstanding DQ issues
- Provide historical performance reporting on the number of issues created and fixed
A way to expose specific DQ issue record sets to the relevant audiences

The same DQ issue may need to be reported to multiple groups in different contexts:
- Housing Officers need to know the individual errors to be checked
- Area Managers need to know how many fixes each Housing Officer is responsible for and how old they are
- The Head of Service needs to know how many issues are outstanding in each area and what the trend is over the last 4 weeks
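
The three audiences above can all be served from one captured record set. The Python sketch below assumes a hypothetical list of enriched rows (record, housing officer, area) and shows only the counting; issue ages and the 4-week trend are left out for brevity.

```python
from collections import Counter

# Hypothetical enriched issue rows: (record_id, housing_officer, area)
rows = [
    ("T100", "ho_ann", "North"),
    ("T101", "ho_ann", "North"),
    ("T102", "ho_bob", "North"),
    ("T103", "ho_cat", "South"),
]


def for_housing_officer(rows, officer):
    """Individual records the officer has to check."""
    return [record_id for record_id, ho, _ in rows if ho == officer]


def for_area_manager(rows, area):
    """Number of outstanding fixes per housing officer in one area."""
    return Counter(ho for _, ho, a in rows if a == area)


def for_head_of_service(rows):
    """Number of outstanding issues per area."""
    return Counter(a for _, _, a in rows)


print(for_housing_officer(rows, "ho_ann"))  # ['T100', 'T101']
print(for_area_manager(rows, "North"))
print(for_head_of_service(rows))
```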

Where your source systems don't have sufficient capability, the DQS should also be able to identify business activity that requires a new action or response from another team or colleague. Where an action has not been recorded in a suitable time frame, it should escalate this up the chain of command or record the SLA breach as a DQ issue. For example, if a CRM call comes in for domestic violence, you may have an SLA of a support worker response within 90 minutes. The DQS could send an email to the relevant team alerting them to a new urgent requirement and, if no activity has been recorded within 60 minutes, send a follow-up email with high urgency AND an SMS to the Head of Service advising of an imminent SLA breach and safeguarding issue. If the action is still not recorded in time, the DQS should record it as a DQ issue so that we have a historical record of SLA breaches, even if the relevant action record is added at a future date with a retrospective action time: the action needs to be both taken and recorded within time to prevent an SLA breach.
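
The escalation example above can be sketched as a small decision function. This is a minimal Python illustration that assumes the 90 and 60 minute thresholds from the example; the action names and the recorded_at parameter (the time the action record was entered in the system, not the action time claimed on it) are hypothetical.

```python
from datetime import datetime, timedelta

SLA = timedelta(minutes=90)              # response window from the example
FOLLOW_UP_AFTER = timedelta(minutes=60)  # escalate if nothing is recorded by then


def escalation_actions(call_time, now, recorded_at=None):
    """Return every action that should have been triggered by `now`.

    recorded_at is when the response was entered in the system, or None if
    nothing has been recorded. A late entry does not undo a breach.
    """
    follow_up_at = call_time + FOLLOW_UP_AFTER
    deadline = call_time + SLA
    actions = ["email_team"]  # sent as soon as the call is logged

    def unrecorded_at(moment):
        return recorded_at is None or recorded_at > moment

    if now >= follow_up_at and unrecorded_at(follow_up_at):
        actions += ["urgent_email_team", "sms_head_of_service"]
    if now >= deadline and unrecorded_at(deadline):
        actions.append("record_sla_breach")
    return actions


call = datetime(2024, 6, 10, 9, 0)
print(escalation_actions(call, datetime(2024, 6, 10, 10, 5)))
# ['email_team', 'urgent_email_team', 'sms_head_of_service']
```

Keeping the decision in a pure function like this makes it easy to re-evaluate on every scheduled run without tracking which messages have already been sent; a real system would also need to de-duplicate notifications.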

Why do we need one

The short answer is that you probably already have one, but it will be informal and spread all over G_d and creation: a combination of system reports, DWH reports, system extracts, and fancy in-department-built Excel/Access reports with complex formulas, fragile pivot tables and VBA code. Each check will have been custom designed to address the individual problem that was identified, with no consideration given to system performance, historical tracking or providing feedback to the relevant departments and teams so that they can address (or, better, prevent) the issues occurring and persisting. This "shadow solution" presents the standard challenges of no IT support, infrequent and manual operation, and no oversight of how much effort is put into identifying, fixing and preventing errors.

These informal systems grow up as a result of not having a clearly defined and explained DQ strategy, report development timeframes that are too long, a conflation of the separate processes of capturing DQ issues and exposing the resulting misdemeanors to the relevant data owners, and the lack of a way to track performance. By introducing a formal DQS (and establishing the correct organisational structure to manage DQ) you can realise the following benefits:

A centralised repository of DQ checks so that we know that everything is in one place
A faster 'time to market' for new DQ checks as each one will follow the same design patterns for capture and exposure. This quicker T2M will encourage the business to 'share' more DQ issues as there will be tangible benefits from doing so
A shorter onboarding cycle for new DQ officers/developers as they are following 'standard' processes to capture and expose DQ issues
A standardised way for the business to receive DQ notifications that means they are no longer responsible for remembering to run the processes and then action the data
The business unit responsible for identifying a DQ issue that affects them is no longer necessarily responsible for fixing the data: one of the key causes of DQ errors is that the team that is affected has no 'skin in the game' for when the data is entered
A virtuous circle is established of flagging DQ issues early and providing feedback on how many are fixed, establishing a culture where DQ is a first-class citizen and consequently providing more trust in the system data

The DNA of a good DQS

The purpose of the DQS is to enable the business to quickly surface any newly identified DQ issues, and the way to do this is to continually think about the process in the smallest possible work units. One challenge of traditional DQ reporting solutions is that they try to include capture and presentation in the same work unit. Not only does this mean it takes longer to get business value (because you have to do twice the amount of development), but it also means that a lot of the work is duplicated: DQ issues to do with asset maintenance are likely to all go to the same people, so the process of joining the asset reference to its address, responsible officer and Area Manager is repeated for each issue identified.

Component model and smallest unit of work

The DQS needs to be designed so that the capture of an issue is the smallest possible piece of work. For us this means a single query that captures nothing but the issue at hand and the system identifier for that issue. The system identifier may be the Asset ID, Tenancy number, Repair Order ID etc. It should be a reference the user can enter into the system to access the records for correction. The record ID for a customer contact method would NOT be a good identifier, as you cannot navigate directly to the record; instead you need to know which customer it is for and traverse the customer to find the offending contact method.

These issues need to be captured and recorded in a simple way, preferably in a single location such as a database table. We are only storing the entity key so we also need a way to identify what sort of entity it is (Asset, Tenancy, Repair) so that we know how to tie it back to other relevant information. To do this you should have a list of the entities and their associated source systems.
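
One possible layout for that single location is sketched below in Python, with the standard-library sqlite3 module standing in for whatever database hosts the DQS. The table names, column names and the record_run helper are all assumptions for illustration. As a simplification, an issue that is fixed and later recurs simply has its last_seen refreshed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entity (
    entity_type   TEXT PRIMARY KEY,   -- Asset, Tenancy, Repair ...
    source_system TEXT NOT NULL
);
CREATE TABLE dq_result (
    check_name  TEXT NOT NULL,
    entity_type TEXT NOT NULL REFERENCES entity(entity_type),
    entity_key  TEXT NOT NULL,        -- the key a user can type into the source system
    first_seen  TEXT NOT NULL,
    last_seen   TEXT NOT NULL,
    PRIMARY KEY (check_name, entity_key)
);
INSERT INTO entity VALUES ('Asset', 'HousingSystem'), ('Repair', 'RepairsSystem');
""")


def record_run(conn, check_name, entity_type, keys, run_time):
    """Store one run of a check: insert new keys, refresh last_seen on known ones."""
    for key in keys:
        updated = conn.execute(
            "UPDATE dq_result SET last_seen = ? WHERE check_name = ? AND entity_key = ?",
            (run_time, check_name, key),
        ).rowcount
        if updated == 0:
            conn.execute(
                "INSERT INTO dq_result VALUES (?, ?, ?, ?, ?)",
                (check_name, entity_type, key, run_time, run_time),
            )
    conn.commit()


# The keys would normally come from the single query that defines the check.
record_run(conn, "asset_missing_postcode", "Asset", ["A1", "A2"], "2024-06-10T08:00")
record_run(conn, "asset_missing_postcode", "Asset", ["A2"], "2024-06-11T08:00")
stored = conn.execute(
    "SELECT entity_key, first_seen, last_seen FROM dq_result ORDER BY entity_key"
).fetchall()
print(stored)
```

After the second run, A1 keeps its original last_seen (so it reads as fixed) while A2 is still outstanding.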

Records about a specific entity type are likely to be reported to the same people and contain the same or similar information:

Assets will need their address, support officer, local authority, property type, current tenancy status
Tenants will need contact details, tenancy address, current arrears status
Repairs will need repair address, current status, appointment date, list of actions on the repair

It seems sensible to write these relationships once and store them in the system as views, so that every time we write a new DQ rule it can be attached to the correct view (or views), saving some valuable development time.

One DQ issue can use multiple views if it needs to go to different audiences
One view can be used for multiple DQ issues if the information needed to resolve the issue is mainly the same
It shouldn't matter if there are extra fields in the view which are not relevant for a particular DQ issue as these can be filtered out in the presentation layer
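
The idea of writing the enrichment once as a view and reusing it across checks can be sketched as follows. This is a minimal Python example using sqlite3 as a stand-in database; the table, view and column names are hypothetical, and the issues_for helper plays the part of the presentation layer choosing which fields it needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE asset (asset_id TEXT PRIMARY KEY, address TEXT, officer TEXT, area_manager TEXT);
CREATE TABLE dq_result (check_name TEXT, entity_type TEXT, entity_key TEXT);

-- Written once per entity type, then reused by every Asset check.
CREATE VIEW v_asset_enriched AS
SELECT r.check_name, a.asset_id, a.address, a.officer, a.area_manager
FROM dq_result AS r
JOIN asset AS a ON a.asset_id = r.entity_key
WHERE r.entity_type = 'Asset';

INSERT INTO asset VALUES ('A1', '1 High St', 'ho_ann', 'am_dev'),
                         ('A2', '2 Low Rd',  'ho_bob', 'am_dev');
INSERT INTO dq_result VALUES ('missing_gas_cert', 'Asset',  'A1'),
                             ('missing_epc',      'Asset',  'A2'),
                             ('overdue_repair',   'Repair', 'R9');
""")


def issues_for(conn, check_name, columns):
    """Read one check through the shared view, keeping only the wanted fields.

    columns is assumed to come from trusted rule configuration, not user input.
    """
    sql = f"SELECT {', '.join(columns)} FROM v_asset_enriched WHERE check_name = ?"
    return conn.execute(sql, (check_name,)).fetchall()


print(issues_for(conn, "missing_gas_cert", ["asset_id", "address"]))  # [('A1', '1 High St')]
print(issues_for(conn, "missing_epc", ["asset_id", "officer"]))       # [('A2', 'ho_bob')]
```

Two different checks share one view here, and each selects only the fields relevant to its audience.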
Presentation should be separate from capture

We may want to capture on an hourly basis, but report only once a day. We may want to report the same issue to multiple recipients in different formats but use the same capture source so that the data does not change if the reports are issued at different times
Presentation should also be based on the smallest unit of work possible

For us this means having a standard way of creating a report (or email) template that can be populated with the correct information, based on the DQ issue being reported and the view being attached, and a way to programmatically match the fields in the view with the placeholders in the reporting or email template. This makes the templates re-usable across multiple issues, again reducing the amount of development and maintenance required.
The preparation and presentation of reports/emails should also be scheduled so that they reach the recipients at the most appropriate time and this may be completely separate from the time at which the DQ issue was identified.
Rules about who should receive reports/emails should be flexible enough to allow permanent or temporary overrides of users based on absence or changes in job roles.
A history of all emails sent should be available so that we can evidence that users were properly informed of their issues.
General reporting should be available that shows the current state of all DQ issues and allows drill-down into their history to see the number of outstanding issues at any given time and how old they were, with statistical analysis of the mean time to fix.
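
Matching view fields to template placeholders, as described above, can be illustrated with Python's standard string.Template. The template text and the row contents are hypothetical; the only convention assumed is that placeholder names equal the view's column names.

```python
from string import Template

# Hypothetical reusable template: $-placeholders must match view column names.
TEMPLATE = Template("Dear $officer, asset $asset_id at $address needs attention: $issue")


def render(template, row):
    """Fill a template from one view row (a dict of column name -> value).

    Columns the template does not mention are ignored; a placeholder with no
    matching column raises KeyError, which surfaces a bad template/view pairing.
    """
    return template.substitute(row)


row = {
    "officer": "Ann",
    "asset_id": "A1",
    "address": "1 High St",
    "issue": "missing gas certificate",
    "area_manager": "Dev",  # extra field, not used by this template
}
print(render(TEMPLATE, row))
# Dear Ann, asset A1 at 1 High St needs attention: missing gas certificate
```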

Our Component Model

In the spirit of smallest unit of work and building reusable components that can be combined together in different ways we take pride in the following:

Identify separate source systems
Identify business entities and tie them to their source systems
Use linked servers and synonyms for source databases and table names: this makes your code portable from DEV to PROD
A DQ issue is a single query: the results of the query are stored in a single table that identifies when it was run and which records were captured. The same table is used for all results
A reusable scheduling system that can be used to control which DQ checks are run when
A set of standardised views that can be applied to any DQ check for an Entity to enrich the data
A set of standardised HTML email templates that can be used against multiple DQ checks
A set of rules that allow you to define the DQ Check, Enrichment View, Email Template and Schedule to be used for queuing emails
A centralised store of all emails to be sent with a list of who they are being sent to (and who they would have gone to if in a non-production environment)
Rules and Schedules for Event Management capture, Escalation Emails and alerting
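
To show how these components might fit together, the Python sketch below models a rule that ties a check, view, template and schedule together, and an email queue entry that records both the intended and the actual recipient. Every name, the schedule format and the test mailbox address are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Ties one DQ check to its enrichment view, email template and schedule."""
    check: str
    view: str
    template: str
    schedule: str  # schedule format is an assumption, e.g. "daily 07:00"


@dataclass(frozen=True)
class QueuedEmail:
    rule: Rule
    intended_recipient: str  # who it would go to in production
    actual_recipient: str    # who it is really sent to in this environment


def queue_email(rule, recipient, environment, test_mailbox="dq-test@example.org"):
    """Queue an email, redirecting to a test mailbox outside production."""
    actual = recipient if environment == "PROD" else test_mailbox
    return QueuedEmail(rule, recipient, actual)


rule = Rule("missing_gas_cert", "v_asset_enriched", "asset_issue_email", "daily 07:00")
dev_mail = queue_email(rule, "ann@example.org", "DEV")
prod_mail = queue_email(rule, "ann@example.org", "PROD")
print(dev_mail.intended_recipient, "->", dev_mail.actual_recipient)
# ann@example.org -> dq-test@example.org
```

Storing the intended recipient alongside the actual one lets you verify routing rules in a non-production environment without emailing real users.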

What is a Data Quality System and Why Do I need One

Aaron Reese: 10th June 2024
