16-10-2016

Trust your data


The motto of the Gartner BI Summit Sydney event, in February 2016, was: 'Empowering People with Trusted Data'. What trends are hidden behind these words and what are their consequences for Data Governance?

Does it make sense to use a desktop ETL tool or BI reporting platform? Actually it makes a lot of sense! The current generation of BI and ETL tools is finally approaching a level of usability where a BI analyst can prepare a data set and a sophisticated report on his or her own, without any assistance from the IT department. Spreadsheet lovers are reaching the column/row limits of their tools, and even very basic data discovery operations are becoming simply too cumbersome to do with them. I want to skim through these hundreds of columns quickly and blend these two data sets with simple joins. The new generation of tools is designed with these use cases in mind and handles them far more efficiently.
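
To make the "blend with simple joins" step concrete, here is a minimal sketch in pandas of the kind of self-service blending described above; the file names and column names are hypothetical examples, not taken from any specific tool.

import pandas as pd

# Two hypothetical extracts an analyst might pull onto the desktop.
sales = pd.read_csv("sales_extract.csv")          # e.g. customer_id, amount, order_date
customers = pd.read_csv("customer_extract.csv")   # e.g. customer_id, segment, region

# Skim the columns quickly instead of scrolling through a spreadsheet.
print(sales.columns.tolist())
print(sales.describe(include="all"))

# Blend the two data sets with a simple join on the shared key.
blended = sales.merge(customers, on="customer_id", how="left")

# An aggregate that quickly becomes painful at spreadsheet row limits.
print(blended.groupby("region")["amount"].sum())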

The classical BI platform was deployed centrally, at a huge cost, after considerable delay and with a call for a crusade against spreadsheets. After a heroic struggle the initiative usually failed to outpace the far nimbler sheets of rows and columns.

Today, the roles have reversed. New tools lure their way into business departments with desktop editions. They start small and quickly deliver tangible business results, and once the house is full of them, servers and large-scale projects take their turn.

Big Data is here to stay - should we trust it?
At the very beginning of the Big Data hype, these technologies were a synonym for a container for unstructured content and were expected to sit side by side with a classical data warehouse. Data warehouse professionals quickly realised their other potential. Hey guys, why store core system dumps in these databases where we pay thousands of dollars per CPU? We could store the full history of dumps and it would cost less. Actually, why load masses of simple transactions into the core layer of the warehouse with an ETL tool that costs thousands of dollars per CPU, when we can do it far quicker with much cheaper tools …
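
As an illustration only - a minimal PySpark sketch, with made-up paths and columns, of the offloading pattern hinted at above: keep the full history of raw core-system dumps on cheap storage instead of pushing every load through a per-CPU-licensed warehouse and its ETL tool.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("raw-history-offload").getOrCreate()

# Hypothetical landing area with today's raw dumps from the core system.
raw = spark.read.option("header", "true").csv("/landing/core_system/2016-10-16/*.csv")

# Tag the load date and append to the cheap, partitioned history store.
(raw.withColumn("load_date", F.lit("2016-10-16"))
    .write.mode("append")
    .partitionBy("load_date")
    .parquet("/lake/raw_history/core_system/"))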

Big Data is eating the data warehouse technologies from the bottom up. It’s terrific news for BI budget holders - especially when a data warehouse is built on expensive vendor-specific hardware. The savings can amount to dropping several zeros off the annual bill.

On the other hand, it may not be such great news for end users. Big Data technologies have two main drawbacks:

  • A lack of metadata - in many scenarios there is no systematic way to automatically find out what data is present in the storage, and developers will never document it on their own (the short sketch after this list illustrates how little a raw scan can recover).
  • The state of query languages - SQL emulations like Hive are painstakingly slow, and the native language for MapReduce jobs is light years away from being suitable for business users.
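
The metadata gap in the first point is easy to see in practice. Below is a minimal Python sketch, with a made-up data lake path, showing that all a raw scan of the storage can recover automatically is file names and sizes - the business meaning, the owner, and whether the data can be trusted have to come from somewhere else.

import os

DATA_LAKE_ROOT = "/data/lake/raw"  # hypothetical mount point

inventory = []
for dirpath, _dirnames, filenames in os.walk(DATA_LAKE_ROOT):
    for name in filenames:
        full_path = os.path.join(dirpath, name)
        inventory.append((full_path, os.path.getsize(full_path)))

# Everything a scan recovers automatically: paths and byte counts.
for path, size in sorted(inventory, key=lambda item: -item[1])[:20]:
    print(f"{size:>12}  {path}")

# Meaning, ownership, refresh schedule and quality are nowhere in here -
# which is exactly the governance problem described in the list above.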


So, several plus points for technological excellence and economic savings on the one hand, but several millions of minus points from business users on the other, because they don’t know what data is available or how to find this out.

What Big Data was expected to be:

[Image: Hora1.jpg]

What Big Data is becoming:

[Image: Hora2.jpg]

What does this mean for data governance?
Large bursts of reports, quickly and easily created by users, coupled with a huge number of measures stored cheaply, mean that the need for data governance is more compelling than ever before. The labour cost of producing new information (processing data and interpreting it in a report) is falling significantly.

The cost of finding information and making sure it can be trusted is surging. This should be a mission for the CDO - make the information more accessible and a bit clearer every day, so that business users are truly “empowered with reliable data”.

Peter Hora is Co-Founder at Semanta.

Note: This blog was posted earlier on the Semanta website with the title ‘Observations from the Gartner BI Summit Sydney event february 2016’, on March 14, 2016.

Semanta is marketed in the Netherlands by IntoDQ.
