Looking at the BI market place today, a lot of products and vendors are telling us that we don't need data warehouses to create those flashy and easy-to-use reports and dashboards. Although from a technical point of view this statement could be valid, in the real world you are not only using the data warehouse to gather information in one location. In the real world we see that a data warehouse does much more than that.
Next to just loading the data to a central store, data warehouses also validate data during ETL. This can be done in various ways. At the basic level, data will be checked whether or not it has the correct format or whether the business keys are indeed unique. When a relation should exist, data warehouses validate this relation and verify whether all mandatory fields are filled out.
Besides basic validations, also more advanced data quality checks could be performed. Think about validations to make sure that pricing is set correctly. Combine customer or other information from several sources and make sure that this information is merged correctly and in valid way. So both checking the quality of loaded data as improving or rating the quality are features of data warehouses. These data quality checks and improvements will assist in getting coherent, correct and structured information.
Lookup and relation constraint
Although data relations were briefly mentioned already in the previous paragraph, it is important enough to mention it again as a separate topic. Linking data correctly is the basic foundation for getting correct information and figures. If a relation is using wrong or non-existing codes, no warnings are given. Yet data will be missing in reports. For example: if you have sales records linked to a customer that is not part of the self-service solution, the sales figures will not be showing up on the reports or dashboard. As a user you will most likely not see that these records are missing, because the tools aren’t equipped to handle that kind of problems.
Having relations validated by performing lookups in the correct dimension tables will make sure that records are linked correctly. If the dimension information is missing but records are still linked, it most likely will turn out to be a dummy record. If all sales records are linked to existing dimensional information, all sales figures will be visualized in reports and dashboards.
Automated logging on load processes
After data quality has been checked and relational lookups are performed, we verify the consistency of our data warehouse. We log all data warehouse information to provide the principal users of the system with the necessary information on the loading and data quality. Delivering users with data quality reports enables them to take necessary actions to improve data at the source system, resulting in higher quality at the reporting side. Also, by showing quality issues to other users, they can work proactively on avoiding and fixing the problems. Giving key users a tool to work proactively on quality will give a better “feeling” to end users as opposed to when these users would need to inform the key users about problems with the data.
Next to having data available for analysis and standard reporting, a lot of other projects benefit from automated reports. Some examples that are only a small subset of the existing possibilities:
• Timely overviews and status information are mailed to managers
• Timely overviews are mailed to customers detailing the status of the cooperation
• Exception reporting is sent out when specific thresholds are reached
Simplified data model for reporting
Looking at operational systems we see that some information is very fragmented. Getting that fragmented information into a model will result in a more difficult reporting model. One of the major advantages of creating a data warehouse is that you can actually convert a complicated source model into an easy to use reporting model that provides information at the correct level.
Data warehouse and the one and only version
When looking back at the past, most will probably remember the excel hell where all business users were creating their own reports and giving identical names to different calculations. Next to reinventing the wheel over en over again, this could also lead to different implementations of that same wheel. Leading to different results. So, one of the main reasons for having a data warehouse is having one location where data is loaded consistently over time according to standardized processes. This gives you the same figures with the same co-notation in different reports.
As you can see, there are plenty of reasons to perform your self-service reporting on top of a data warehouse. Especially for data that is at the core of your organisation. This doesn’t mean of course that you can’t enrich data with external or your own created data. Extending the information will give added value and more insight.
Nico Verbaenen is BI-architect bij Kohera.
30 en 31 januari 2018 De wereld van business intelligence en datawarehousing hanteert een unieke terminologie en eigen verzameling technologieën, ontwerptechnieken en producten. Voor nieuwkomers kan dit overweldigend overkomen. Wa...
20 en 21 maart 2018 We worden al jaren geconfronteerd met een niet-aflatende stroom van nieuwe technologieën, ontwerptechnieken, architecturen en ideeën, zoals Hadoop, streaming, self-service data preparation, logical datawarehouse, data lake, de...
27 maart 2018 Wat zijn de mogelijkheden van low code application platformen en hoe kunt u deze succesvol inzetten in uw organisatie? Aan de hand van Microsoft PowerApps, Flow en de Common Data Service wordt besproken maar vooral ook live getoond hoe ...
12 april 2018 Praktische workshop Datavisualisatie en Data-driven Storytelling. Hoe gaat u van data naar inzicht? En hoe gaat u om met grote hoeveelheden data, de noodzaak van storytelling, data science en de data artist? Lex Pierik behandelt de...
17 en 18 april 2018 Het Logical Data Warehouse, een door Gartner geïntroduceerde architectuur, is gebaseerd op een ontkoppeling van rapportage en analyse enerzijds en gegevensbronnen anderzijds. Een flexibelere architectuur waarbij sneller ...
19 april 2018 De nieuwe Europese General Data Protection Regulation (GDPR of AVG) vervangt in mei 2018 onze Nederlandse privacywet, de Wet Bescherming Persoonsgegevens. Met hoge boetes, een toegenomen bedreiging van datalekken en hacks, de ...
14 en 15 mei 2018Een verhelderend seminar waarin kritisch en onafhankelijk de populaire technologieën voor het opslaan en bewerken big data onder de loep worden genomen door datawarehouse- en databasespecialist pur sang Rick van der Lans. A...
28 en 29 mei 2018Praktische tweedaagse workshop met internationaal gerenommeerde spreker Alec Sharp over het modelleren met Entity-Relationship vanuit business perspectief. De workshop wordt ondersteund met praktijkvoorbeelden en duidelijke...