I started working in data management in the dim and distant days of the mid 1990s when I was part of a small team in the Business Strategy function of a global telecommunications company. The team had been tasked with exposing the root causes of a problem that the CEO at that time had homed in on.
Although many millions had been spent on acquiring and deploying large-scale IT solutions to support the company’s operations, this investment was considered at least in part a failure. There was growing evidence that operational inefficiencies and process breakdowns were commonplace, even though the business cases for these investments had promised that the new IT would improve them. Worse still, in some cases all this investment had made the problems worse, not better.
After many interviews with key people across the company, two main themes consistently emerged. The first was poor management of IT requirements, which led to purchases of software that did not align with operational and process needs. The second was that although many new databases and data warehouses had been deployed to store data and make it more accessible to the business processes that depended on it, the data within them was, overall, not fit for purpose. Uncontrolled data duplication, missing data and inaccurate data were recurring issues raised time and again. We soon gathered a compelling body of evidence to show that our processes were failing all too often because the data they relied on was not of the quality expected. So poor data quality, and how to fix it, became the predominant challenge. I was the (un)lucky team member given the task of doing something about it.
Over the course of the next decade, data quality became my consuming passion. A business-wide data quality programme was created, many improvement projects were completed, and many benefits accrued, including reduced revenue losses, lower costs, better customer management, reduced legal and regulatory risks and so on. But though it was an acknowledged success both within and outside the company, it all took a long time, and consumed a lot of resources and effort.
But what has this piece of 20th century history got to do with today’s data management challenges? In my view, it is as relevant as ever. The data quality problems outlined above persist today in the great majority of organisations we in Global Data Strategy work with and talk to. On the positive side, many things have improved since those early days of data quality. The drive to automate processes as businesses seek to become increasingly digital has elevated data quality up the corporate agenda of many organisations, so that anyone promoting the data quality cause today is no longer viewed as a strange, swivel-eyed eccentric, as many of us were in the pioneering days. My first lesson in data quality was that you cannot automate or digitise a process unless the data supporting it is accurate and complete; this is as true today as ever, and a growing number of companies recognise it. Moreover, the software tools available to profile, analyse and enhance data quality have improved beyond all recognition. Today there are few large organisations that do not have at least one dedicated data quality tool somewhere in their armoury.
Despite this, the problems continue. There are many reasons for this. The main one, of course, is that today’s data challenges make those of the 1990s look like child’s play. The volume, complexity and speed of data processing have exploded. The range and scope of data platforms have expanded, now embracing both long-established data sources (data warehouses, operational data stores, CRM, etc.) and newer ones, including master data platforms, big data lakes, product lifecycle management tools, analytics platforms and so on. So although approaches, techniques and tools have made a giant leap forward, the chasm between data quality needs and the capacity to deliver them is as large as ever. And a giant leap forward is only worth taking if you jump far enough to bridge the gap; if not, you fall and you fail.
So what’s needed to ensure a successful data quality strategy and approach, given today’s formidable challenges? Reflecting on my early experiences, there were two things that could and would have helped us to deliver better data quality more quickly and efficiently. The first was a more rigorous method for prioritising data quality problems, so we could focus our resources more effectively. The second, related, missing element was a means to identify which data mattered most, either because it was crucial to business operations or because it was used across multiple processes and platforms. As we did not have a clear view of this in the early days, we tended to address issues bottom up: we identified a specific problem, put a team together to analyse the root causes, derived solutions, and delivered them. Sometimes, more by luck than judgement, we hit upon an issue that would benefit other areas of the business, and so duly gave it a higher priority. But it could be a hit-and-miss process.
Today every organisation has data quality problems, and the scope and scale of data are such that not all of them can realistically be tackled. So the same issues arise. Where should you start? What data is highest priority and why? In addition, what would ‘good’ look like, how do we define that, and how do we know when we achieve it?
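One way to make ‘good’ tangible is to express it as measurable targets agreed with the business, and then measure the data against them. Below is a minimal sketch of that idea in Python using pandas; the field names, the UK postcode pattern and the thresholds are illustrative assumptions, not a standard.

```python
# Minimal sketch: define 'good' as measurable targets and test the data against them.
# Field names, thresholds and the postcode pattern are illustrative assumptions.
import re
import pandas as pd

records = pd.DataFrame({
    "customer_id": ["C001", "C002", None, "C004"],
    "postcode":    ["AB1 2CD", "12345", "EF3 4GH", None],
})

# Hypothetical quality targets agreed with the business.
targets = {"completeness": 0.98, "postcode_validity": 0.95}

uk_postcode = re.compile(r"^[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}$")

# Completeness: fraction of records with no missing values.
completeness = records.notna().all(axis=1).mean()

# Validity: fraction of populated postcodes matching the expected format.
postcode_validity = (
    records["postcode"].dropna().apply(lambda p: bool(uk_postcode.match(p))).mean()
)

scores = {"completeness": completeness, "postcode_validity": postcode_validity}
for measure, target in targets.items():
    status = "met" if scores[measure] >= target else "not met"
    print(f"{measure}: {scores[measure]:.0%} (target {target:.0%}) -> {status}")
```

In practice such measures would be derived from agreed data standards and run routinely by a data quality tool, but the principle is the same: define the target, measure against it, and you know when you have achieved it.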
Another data management discipline has a great deal to contribute to answering these questions, yet it is all too often neglected in many companies. I am referring to business architecture generally and data architecture in particular. In my early telecommunications days we had no data models to refer to (other than detailed technical physical data models associated with specific platforms and systems), no methodical way of identifying the interrelationships between data and business processes, and no formal, agreed business definitions of key data entities and their attributes. These are all things that a sound, dynamic data architecture provides.
To tackle data quality more strategically (top down rather than bottom up), linking it closely with data architecture has huge value. To list some of the main benefits:
• Conceptual and logical data models highlight the most important data domains and entities, and so provide an ideal starting point for focusing data quality endeavours.
• Other architectural models (e.g. dataflow diagrams, process ‘swim lanes’ and so on) highlight the interdependencies between business processes and data, and so help to identify which processes would most benefit from specific data quality improvements, informing prioritisation and focus.
• Attributes identified in the data models help to define the data standards that specific data fields need to adhere to. This can help to specify the data quality improvement targets and thresholds required (in terms of both format and content), quantify the gap between desired and actual adherence, and form the foundation of the data quality business rules needed to clean up the data and maintain its quality (a simple sketch follows this list).
• Effective metadata management is an essential component of data quality improvement. Architectural artefacts are the starting point to provide this.
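To illustrate the points above, here is a minimal sketch of how architectural artefacts might drive data quality work: attribute definitions from a logical data model become executable business rules, and dependency counts from dataflow models give a crude prioritisation of data entities. All entity, attribute and process names here are invented for illustration; real inputs would come from your modelling and metadata tools rather than hard-coded dictionaries.

```python
import re

# 1. Attribute standards taken from a (hypothetical) logical data model,
#    expressed as executable data quality rules: field -> (standard, check).
attribute_rules = {
    "email": ("contains exactly one '@'",
              lambda v: isinstance(v, str) and v.count("@") == 1),
    "start_date": ("ISO 8601 date (YYYY-MM-DD)",
                   lambda v: isinstance(v, str)
                   and bool(re.fullmatch(r"\d{4}-\d{2}-\d{2}", v))),
}

record = {"email": "jane.doe@example.com", "start_date": "2024/01/15"}
for field, (standard, check) in attribute_rules.items():
    result = "pass" if check(record.get(field)) else f"fail (expected: {standard})"
    print(f"{field}: {result}")

# 2. Dataflow/process models show which processes depend on which data
#    entities; counting those dependencies is one crude way to prioritise.
process_dependencies = {
    "Billing": ["Customer", "Product", "Invoice"],
    "Order fulfilment": ["Customer", "Product", "Order"],
    "Marketing campaigns": ["Customer"],
}

usage_count = {}
for entities in process_dependencies.values():
    for entity in entities:
        usage_count[entity] = usage_count.get(entity, 0) + 1

for entity, count in sorted(usage_count.items(), key=lambda kv: -kv[1]):
    print(f"{entity}: used by {count} process(es)")
```

The pattern, rather than the code, is the point: rules derived from the models keep quality checks aligned with agreed business definitions, and dependency counts from process and dataflow models give an objective, if simple, basis for deciding which data to fix first.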
To conclude, tackling data quality problems in today’s enterprises requires a much more strategic and architecture-driven approach than was the norm at the dawn of data quality initiatives. Using architecture to frame and focus data quality efforts is essential. I wish I had known that back in the 1990s, but as with all things in life, it’s better late than never.
Nigel Turner will present his keynote 'Data Governance and Architecture – Making the connections' on June 9th during the Datawarehousing & Business Intelligence Summit.