In this final article of the series, I address maintaining the data warehouse that was designed, built, and deployed over the previous three articles. But maintenance is too small a word for what this phase entails!
In more traditional IT projects, once a successful system is tested, deployed, and in daily operation, its developers can usually sit back and take a well-deserved rest as users come on board, leaving ongoing maintenance to a small team of bug-fixers and providers of minor enhancements, at least until the start of the next major release cycle. Developers of today’s data warehouses have no such luxury.
The measure of success of a data warehouse is only partly defined by the number and satisfaction level of active users. The nature of creative decision-making support is that users are continuously discovering new business requirements, changing their minds about what data they need, and thus demanding new data elements and structures on a weekly or monthly basis. Indeed, in some cases, the demands may arrive daily!
This need for agility in regularly delivering new and updated data to the business through the data warehouse has long been recognized by vendors and practitioners in the space. Unfortunately, such agility has proven difficult to achieve in the past. Now, ongoing digitalization of business is driving ever higher demands for new and fresh data. Current market thinking, short-sighted in my view, is that a data lake filled with every conceivable sort of raw, loosely managed data will address these needs. That approach may work for non-critical, externally sourced social media and Internet of Things data. However, it really doesn’t help with the legally binding, historical, and (increasingly) real-time internally and externally sourced data currently delivered via the data warehouse.
Fortunately, the agile and automated characteristics of the Data Vault / data warehouse automation (DWA) approach described in the design, build, and operate phases discussed in earlier posts apply equally to the maintenance phase. In fact, it may be argued that these characteristics are even more important in the maintenance phase than in the earlier phases of data warehouse development.
One explicit design point of the Data Vault data model is agility. A key differentiator between Hub, Link, and Satellite tables is that they have very different usage types and temporal characteristics. This separation of concerns allows changes in data requirements (frequent and driven by business needs) and changes in data sources (less frequent, but often requiring deep “data archeology”) to be handled separately and more easily than in traditional designs. In effect, the data warehouse is structured according to good engineering principles, while the data marts flow with user needs. This structuring enables continuous, agile iteration of updates to the warehouse, and on through to the marts, by reducing or eliminating rework of existing tables when new needs are addressed. For a high-level explanation of how this works, see Sanjay Pande’s excellent “Agile Data Warehousing Using the Data Vault Architecture” article.
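To make that separation of concerns concrete, here is a minimal sketch of the three table types, expressed as illustrative Python dataclasses rather than any particular product’s DDL; every table and column name is hypothetical. The point is that a new business requirement is typically met by adding a new Satellite, leaving existing Hubs, Links, and Satellites untouched.

```python
# A minimal sketch of Data Vault separation of concerns (hypothetical names,
# not any vendor's actual schema).
from dataclasses import dataclass
from datetime import datetime

@dataclass
class CustomerHub:            # Hub: stable business key only
    customer_hk: str          # hash key derived from the business key
    customer_id: str          # business key
    load_dts: datetime
    record_source: str

@dataclass
class CustomerOrderLink:      # Link: relationships between business keys
    link_hk: str
    customer_hk: str
    order_hk: str
    load_dts: datetime
    record_source: str

@dataclass
class CustomerAddressSat:     # Satellite: descriptive, historized attributes
    customer_hk: str
    load_dts: datetime        # each change is a new row, so history is preserved
    street: str
    city: str
    record_source: str

# A new business requirement (say, loyalty data) is met by adding another
# Satellite on CustomerHub; nothing that already exists has to be reworked.
@dataclass
class CustomerLoyaltySat:
    customer_hk: str
    load_dts: datetime
    loyalty_tier: str
    points_balance: int
    record_source: str
```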
The engineered components and methodology of the Data Vault approach are particularly well suited to the application of DWA tools, as we saw in the design and build phases. However, it is in the maintain phase that the advantages of DWA become even more apparent. Widespread automation is essential for agility in the maintenance phase because it increases developer productivity, reduces cycle times, and eliminates many types of coding errors. WhereScape Data Vault Express incorporates key elements of the Data Vault approach within the structures, templates, and methodology it provides, helping a team make the most of the potential automation gains.
Furthermore, WhereScape’s metadata-driven approach means that all the design and development work done in preceding iterations of data warehouse/mart development is always immediately available to the developers of a subsequent iteration. This is provided through the extensive metadata that WhereScape stores in the relational database repository and makes available directly to developers of new tables and/or population procedures. This metadata plays an active role in the development and runtime processes of the data warehouse (and marts) and is thus guaranteed to be far more consistent and up-to-date than typical separate and manually maintained metadata stores such as spreadsheets or text documents.
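As a rough illustration of the metadata-driven principle (not WhereScape’s actual repository schema or templates), the sketch below stores a table definition as metadata and generates DDL from it. Because generated objects are always derived from the repository, the metadata cannot drift out of date the way a hand-maintained spreadsheet or document can. All names here are hypothetical.

```python
# A minimal sketch of metadata-driven generation: table definitions live in a
# repository structure, and DDL is generated from them on demand.
# Hypothetical table and column names throughout.

repository = {
    "sat_customer_loyalty": {
        "parent_hub": "hub_customer",
        "columns": [
            ("customer_hk", "CHAR(32)"),
            ("load_dts", "TIMESTAMP"),
            ("loyalty_tier", "VARCHAR(20)"),
            ("points_balance", "INTEGER"),
            ("record_source", "VARCHAR(50)"),
        ],
    }
}

def generate_ddl(table_name: str) -> str:
    """Generate CREATE TABLE DDL from the repository metadata."""
    cols = repository[table_name]["columns"]
    col_lines = ",\n  ".join(f"{name} {dtype}" for name, dtype in cols)
    return f"CREATE TABLE {table_name} (\n  {col_lines}\n);"

print(generate_ddl("sat_customer_loyalty"))
```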
In addition, WhereScape automatically generates and maintains documentation and related diagrams, including impact analysis and track-back/track-forward views. These artefacts aid understanding and reduce the risk of future changes to the warehouse by allowing developers to discover and avoid possible downstream impacts of any change being considered.
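The idea behind impact analysis and track-back/track-forward views can be illustrated with a simple sketch: if lineage is recorded as source-to-target edges, the downstream impact of a proposed change is just a graph traversal. The lineage below is entirely hypothetical and is not drawn from any specific WhereScape feature.

```python
# A minimal sketch of downstream impact analysis over a lineage graph.
# Edges map each object to the objects populated from it (hypothetical names).

lineage = {
    "stg_orders": ["hub_order", "sat_order_detail"],
    "hub_order": ["link_customer_order"],
    "sat_order_detail": ["dm_sales_fact"],
    "link_customer_order": ["dm_sales_fact"],
    "dm_sales_fact": ["sales_dashboard"],
}

def downstream_impact(table: str, graph: dict) -> set:
    """Return every object that depends, directly or indirectly, on `table`."""
    impacted, stack = set(), [table]
    while stack:
        for target in graph.get(stack.pop(), []):
            if target not in impacted:
                impacted.add(target)
                stack.append(target)
    return impacted

# A change to stg_orders ripples through the vault tables into the mart and dashboard.
print(downstream_impact("stg_orders", lineage))
```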
Another key factor in ensuring agility and success in the maintenance phase is the ongoing and committed involvement of business people. WhereScape’s automated, templated approach to the entire design, build and deployment process allows business users to be involved continuously and intimately during every stage of development and maintenance of the warehouse and marts.
With maintenance, we come to the end of our journey through the land of automating warehouses, marts, lakes, and vaults of data. At each step of the way, combining the use of the Data Vault approach with data warehouse automation tools simplifies technical procedures and eases the business path to data-driven decision making. WhereScape Data Vault Express represents a further major stride toward the goal of fully agile data delivery and use throughout the business.
If you missed the previous blog posts in this series, you can find them here:
Designing a Data Warehouse
Building a Data Warehouse
Operating a Data Warehouse
This blog is part of a series and has been published on WhereScape.com.