Data architecture is a challenging and sometimes confusing field. Part of the confusion comes from the term itself: data architecture means different things to different people, and there are many kinds and levels of data architecture, such as enterprise architecture and technical architecture. This article focuses on data management architecture, with attention to the processes, data stores, data flows, and other elements needed to collect, organize, harmonize, and use data to business advantage.
Nearly every organization today needs to rethink and refresh its data management architecture. Data and technology advances of the past decade bring new opportunities and new complexities to data management, yet most organizations continue to work with turn-of-the-century architecture from the BI era. Patching new components onto the surface of obsolete architecture (a band-aid and duct-tape approach) is not sustainable and won’t readily adapt to changes yet to come. Still, many organizations avoid stepping up to modern data management architecture because it is complex and difficult. The goal of this article is to provide guidance that helps you manage the complexities and minimize the difficulties.
Start with Business Capabilities
The first responsibility of data management is to enable the business to do what it needs to do to get maximum value from its data. Defining data management architecture doesn’t begin with data, or even with goals like “cloud first” or “streaming first.” Those technical goals must be subordinate to business goals. Begin by working with business stakeholders to develop a list of data-dependent business capabilities. Make them tangible by identifying the kinds of data deliverables that enable those capabilities. Start with the reference list shown in the table below, then refine and customize it to represent the needs of your business.
| Business Capability | Enabled With |
| Inform about … | scheduled reports, ad hoc reports |
| Inquire about … | managed query, ad hoc query |
| Analyze behavior of … | OLAP |
| Track … against goals | scorecards |
| Monitor current state of … | dashboards |
| Send/receive alerts about … | event monitoring |
| Examine alternatives for … | analytic models |
| Simulate behavior of … | simulation models |
| Explore patterns and trends of … | data mining models |
| Discover hidden insights of … | data mining models |
| Predict future state of … | predictive models |
| Recommend decisions for … | prescriptive models |
| Automate decisions for … | prescriptive models |
Refine and customize the list by brainstorming to add capabilities not shown here, changing terminology to match the language common in your organization, and removing any capabilities that you don’t need now and don’t expect to need in the future.
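If you keep the capability list in a machine-readable form, the three refinement moves described above (add, rename, remove) become simple, repeatable operations. The sketch below is only an illustration under stated assumptions: the dictionary holds a hypothetical subset of the reference table, and the `customize` function, the added capability, and the renamed term are all invented for the example.

```python
from typing import Optional

# Hypothetical subset of the reference table: each business capability
# maps to the data deliverables that enable it.
REFERENCE_CAPABILITIES: dict[str, list[str]] = {
    "Inform about": ["scheduled reports", "ad hoc reports"],
    "Inquire about": ["managed query", "ad hoc query"],
    "Track against goals": ["scorecards"],
    "Monitor current state of": ["dashboards"],
    "Simulate behavior of": ["simulation models"],
    "Predict future state of": ["predictive models"],
}


def customize(
    reference: dict[str, list[str]],
    add: Optional[dict[str, list[str]]] = None,
    rename: Optional[dict[str, str]] = None,
    remove: Optional[set[str]] = None,
) -> dict[str, list[str]]:
    """Return a refined copy of the reference list; the input is left unchanged."""
    add = add or {}
    rename = rename or {}
    remove = remove or set()
    refined: dict[str, list[str]] = {}
    for capability, deliverables in reference.items():
        if capability in remove:
            continue  # not needed now and not anticipated as a future need
        # Swap in the organization's own terminology where it differs.
        refined[rename.get(capability, capability)] = list(deliverables)
    # Capabilities surfaced by brainstorming that the reference list lacks.
    for capability, deliverables in add.items():
        refined.setdefault(capability, []).extend(deliverables)
    return refined


# Hypothetical refinement: one new capability, one renamed, one dropped.
ours = customize(
    REFERENCE_CAPABILITIES,
    add={"Benchmark performance of": ["peer comparison reports"]},
    rename={"Inquire about": "Look up"},
    remove={"Simulate behavior of"},
)
```

Returning a new dictionary rather than editing the reference in place keeps the original template available, so you can compare your customized list against it later.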
Explore Business Requirements
Good architecture is a tool that helps you to meet business requirements. It is impractical to undertake exhaustive and detailed business requirements analysis as part of architectural definition. You’ll get bogged down in requirements details and find it difficult to get back to working on architecture. Instead, work with representative groups of users to collect a few sample requirements for each business capability. For example:
• Inquire about order status.
• Inquire about employee compensation.
• Analyze the behavior of marketing campaigns.
• Track customer loyalty programs against goals.
• Simulate behavior of P&L for new product launch at various price points.
• Recommend decisions for discount offers to customers.
• Automate decisions for next best upsell offer to customers.
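A quick way to confirm that every business capability has at least one sample requirement is to group the samples by capability and look for empty groups. The sketch below assumes a hypothetical list of retained capabilities and records the sample requirements above as (capability, subject) pairs; the function names are illustrative only.

```python
# Hypothetical: the business capabilities kept after refinement.
CAPABILITIES = [
    "Inquire about",
    "Analyze behavior of",
    "Track against goals",
    "Simulate behavior of",
    "Predict future state of",
    "Recommend decisions for",
    "Automate decisions for",
]

# The sample requirements, recorded as (capability, subject) pairs.
SAMPLE_REQUIREMENTS = [
    ("Inquire about", "order status"),
    ("Inquire about", "employee compensation"),
    ("Analyze behavior of", "marketing campaigns"),
    ("Track against goals", "customer loyalty programs"),
    ("Simulate behavior of", "P&L for new product launch at various price points"),
    ("Recommend decisions for", "discount offers to customers"),
    ("Automate decisions for", "next best upsell offer to customers"),
]


def samples_by_capability(
    capabilities: list[str], requirements: list[tuple[str, str]]
) -> dict[str, list[str]]:
    """Group sample requirements under the capability each one exercises."""
    grouped: dict[str, list[str]] = {c: [] for c in capabilities}
    for capability, subject in requirements:
        if capability not in grouped:
            # A sample that fits no listed capability hints at a missing capability.
            raise ValueError(f"no capability named {capability!r}")
        grouped[capability].append(subject)
    return grouped


def capabilities_without_samples(grouped: dict[str, list[str]]) -> list[str]:
    """Capabilities that still need at least one sample requirement."""
    return [c for c, subjects in grouped.items() if not subjects]


grouped = samples_by_capability(CAPABILITIES, SAMPLE_REQUIREMENTS)
print(capabilities_without_samples(grouped))  # ['Predict future state of']
```

Raising an error for an unmatched sample is a deliberate choice: it surfaces requirements that the capability list does not yet cover instead of silently dropping them.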
Itemize Data Capabilities
Although they are not the place to begin, technical capabilities such as cloud capability are an important part of architectural definition. Work with technical stakeholders to develop the list of essential data capabilities. For each capability, identify the architectural components needed to support it. Start with the reference list shown in the table below, then refine and customize it to represent the needs of your organization.
| Data Capability | Architectural Components |
| Support all data use cases | data consumption layer of architecture |
| Support all data latencies | batch data capture & ingestion, change data capture (CDC), data stream processing |
| Support hybrid data ecosystem | microservices architecture |
| Sustain legacy data warehouse value | legacy warehouse ingestion into data lake |
| Easy access for all data consumers | data access layer of architecture |
| Work with all data types | data source layer of architecture, data source connectors, SQL and NoSQL databases |
| Scalable and elastic | cloud platforms |
| Smart and agile data pipelines | data fabric & pipeline automation technology, DataOps tools and techniques |
This reference table illustrates examples of needed data capabilities. You will almost certainly have needs that differ from those listed here. Two good resources to help you brainstorm data capabilities are Wayne Eckerson’s articles “Ten Characteristics of a Modern Data Architecture” and “Ten Things Companies Want from a Modern Data Architecture.”
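Once each data capability is mapped to components, inverting the mapping gives a component inventory that shows which capabilities depend on each component. That view is handy later, when deciding which components of a reference architecture you can safely drop. The sketch below uses a hypothetical excerpt of the table above; the function name is illustrative.

```python
from collections import defaultdict

# Hypothetical excerpt of the data capabilities table: each data capability
# maps to the architectural components that support it.
DATA_CAPABILITIES: dict[str, list[str]] = {
    "Support all data latencies": [
        "batch data capture & ingestion",
        "change data capture (CDC)",
        "data stream processing",
    ],
    "Easy access for all data consumers": ["data access layer"],
    "Work with all data types": [
        "data source layer",
        "data source connectors",
        "SQL and NoSQL databases",
    ],
    "Scalable and elastic": ["cloud platforms"],
}


def components_to_capabilities(
    capabilities: dict[str, list[str]],
) -> dict[str, list[str]]:
    """Invert the table: for each component, list the data capabilities that depend on it."""
    dependents = defaultdict(set)
    for capability, components in capabilities.items():
        for component in components:
            dependents[component].add(capability)
    return {component: sorted(caps) for component, caps in dependents.items()}


inventory = components_to_capabilities(DATA_CAPABILITIES)
# The component inventory the architecture must include.
print(sorted(inventory))
# Which data capability is weakened if CDC is left out.
print(inventory["change data capture (CDC)"])  # ['Support all data latencies']
```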
Adapt a Reference Architecture
Now that you’ve expressed architectural requirements as business capabilities and data capabilities, it is time to create a diagram that visually represents the architecture. This can be an intimidating task if you start with a blank page. A better approach is to work from a reference architecture and adapt it to support your list of needed capabilities. A reference architecture is a template that represents best practices and provides a starting place for architectural definition. A quick web search turns up many reference data architectures. Remember that the focus here is data management architecture, so be sure that the reference architecture you choose represents data management best practices. Of course, I recommend Eckerson Group’s reference data management architecture (see Figure 1).
Figure 1 – Eckerson Group Reference Data Management Architecture.
Remember that a reference architecture is a template: a starting place that you adapt to create the architecture that best matches your organization’s needs. As you prepare to adapt it, I suggest reading (or rereading) my article “Modernizing Data Management Architecture.” Then adapt the template by mapping its components to your lists of business and data capabilities. Remove any components that you don’t need, add any components that are missing, and adjust terminology to match the language used in your organization.
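The remove/add/rename steps above reduce to set differences between the components in the template and the components your capabilities actually require. The sketch below assumes hypothetical component names; substitute those of the reference architecture you chose. The `adapt` function and its return shape are illustrative, not part of any published method.

```python
from typing import Optional

# Hypothetical component names for a reference architecture.
REFERENCE_COMPONENTS = {
    "data source layer",
    "batch ingestion",
    "change data capture",
    "stream processing",
    "data lake",
    "data warehouse",
    "data science sandbox",
    "data access layer",
    "data consumption layer",
}


def adapt(
    reference: set[str],
    needed: set[str],
    renames: Optional[dict[str, str]] = None,
) -> tuple[set[str], set[str], set[str]]:
    """Return (adapted components, removed components, added components).

    `needed` is the set of components your business and data capabilities map to.
    """
    renames = renames or {}
    removed = reference - needed  # in the template but not needed
    added = needed - reference    # needed but missing from the template
    kept = reference & needed
    # Apply the organization's own terminology to the final component list.
    adapted = {renames.get(c, c) for c in kept | added}
    return adapted, removed, added


# Hypothetical needs derived from the capability lists.
needed = {
    "data source layer", "batch ingestion", "change data capture",
    "stream processing", "data lake", "data warehouse",
    "data access layer", "data consumption layer", "microservices architecture",
}
adapted, removed, added = adapt(
    REFERENCE_COMPONENTS, needed, renames={"data lake": "raw data zone"}
)
print(sorted(removed))  # ['data science sandbox']
print(sorted(added))    # ['microservices architecture']
```

Reporting `removed` and `added` separately, rather than only the final set, keeps a record of how far the adapted architecture departs from the template, which is useful when reviewing the design with stakeholders.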
Finally, revisit your collection of example business requirements. Walk each example through the architecture to test that the data, the processing, and the use case are all supported by the architecture. Continue to adjust the architecture iteratively until all of the example business requirements are supported without compromising data capabilities such as low latency, large data volumes, high throughput, etc.
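One lightweight way to walk example requirements through the architecture is to describe the architecture as a set of directed data flows and each requirement as the path its data must travel, from source through processing to use case. The sketch below is a hypothetical illustration: the components, flows, and requirement paths are invented, and it checks only connectivity, not qualities such as latency, volume, or throughput.

```python
from dataclasses import dataclass

# Hypothetical architecture expressed as directed data flows between components.
DATA_FLOWS = {
    ("order system", "change data capture"),
    ("change data capture", "data lake"),
    ("CRM", "batch ingestion"),
    ("batch ingestion", "data lake"),
    ("data lake", "data warehouse"),
    ("data warehouse", "data access layer"),
    ("data access layer", "managed query"),
}


@dataclass
class Requirement:
    description: str
    # Components the data passes through: source -> processing -> use case.
    path: list[str]


def unsupported_hops(
    req: Requirement, flows: set[tuple[str, str]]
) -> list[tuple[str, str]]:
    """Walk the requirement through the architecture; return hops with no data flow."""
    return [hop for hop in zip(req.path, req.path[1:]) if hop not in flows]


order_status = Requirement(
    "Inquire about order status",
    ["order system", "change data capture", "data lake",
     "data warehouse", "data access layer", "managed query"],
)
discounts = Requirement(
    "Recommend decisions for discount offers to customers",
    ["CRM", "batch ingestion", "data lake", "prescriptive models"],
)
for req in (order_status, discounts):
    print(req.description, unsupported_hops(req, DATA_FLOWS))
```

Here the discount requirement exposes a gap between the data lake and prescriptive models, which is the kind of finding that drives the next iteration of the architecture. Checks for latency, data volume, and throughput would need additional attributes on the flows.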
Dave Wells will present two keynotes during the Datawarehousing & Business Intelligence Summit:
'Cloud Data Warehousing: Planning for Data Warehouse Migration' on March 25th and
'Modernizing Data Governance for the Age of Self-Service Analytics' on March 26th.
He will also present a unique post-conference workshop, Cloud Data Warehousing, on March 30th and 31st.