Data warehousing in the cloud has become popular as companies contend with growing data volumes, higher service level expectations, and the need to integrate structured warehouse data with unstructured data in a data lake. Migrating an existing data warehouse to the cloud is a complex process of moving schema, data, and ETL. It is neither quick nor easy, but the benefits are real and substantial.
Despite declarations by some that data warehousing is dead, the data warehouse is alive and continues to be needed, but it also needs to be modernized. Data warehouse modernization can occur in any or all of three areas: platform, architecture, and management processes.
Yes, the data warehouse is alive, but it is not necessarily alive and well. With rapid growth of use cases, data sources, data types, and data volumes, legacy data warehouses face many challenges:
• Scaling up instead of scaling out is inadequate for today’s data volumes.
• Workload peaks and valleys become more extreme and tooling up for the peaks is costly.
• Cost and complexity of data center management and operations expand out of proportion to actual growth.
• Processing bottlenecks increase data latency and corresponding user dissatisfaction.
• Deploying new infrastructure capability and capacity is slow and costly, and it causes project delays.
• The data warehouse has become business critical — sometimes mission critical — yet it lacks fault tolerance and is usually not fully integrated into disaster recovery and business resumption plans.
• Security and governance — managed externally to warehouse operations — struggle to keep up with data growth, user growth, and the shift to self-service.
• Database management becomes increasingly difficult with growth in data volume and variety.
The challenges of traditional data warehousing make cloud data warehousing especially appealing.
• Cloud scalability responds to the challenges of growth management.
• Cloud elasticity eases the pain of workload management.
• Managed infrastructure shifts the burden of data center management and reduces data center costs.
• Cloud performance eliminates processing bottlenecks and reduces data latency.
• The cloud’s “instant infrastructure” gets projects underway much faster.
• Business risks and recovery concerns are reduced through the benefits of virtualization.
• Security and governance are enhanced through service provider features and virtual private cloud protections.
• RDBMS in the cloud reduces database management complexity without the need to rebuild using NoSQL.
Cloud migration is inviting, and it is a practical step along the path to modernization. But it is not as simple as moving your data from one platform to another. Tactically and technically, data warehouse migration is a multi-step process that must cover every component of the warehouse. (See Figure 1.)
Figure 1: Migrating a Data Warehouse to the Cloud.
When planning for cloud migration you’ll need to answer many questions.
• Migrating Schema. Before moving warehouse data, you’ll need to migrate table structures and specifications. Will you make any structural changes as part of the migration? Do you need to rethink indexing or partitioning?
• Migrating Data. Moving very large volumes of data is process intensive, network intensive, and time consuming. How long will it take to migrate and what can you do to accelerate? Did you restructure as part of schema migration? Do you need to transform data as part of the data migration? Can you transform in stream or should you pre-process and then migrate?
• Migrating ETL. Moving data may be the easy part when compared to migrating ETL processes. Will you need to change the code base to optimize for platform performance? Do you need to change data transformations to sync with data restructuring? Should data flows remain intact or should they be reorganized? Do you need to reduce data latency and deliver near real-time data as part of migration? Would it make sense to migrate ETL processing to the cloud as well? If so, is there a utility to convert your ETL code?
• Rebuilding Data Pipelines. With any substantive change to data flow or data transformation, rebuilding data pipelines may be a better choice than migrating existing ETL. Can individual data transformations be isolated and packaged as executable modules? Do you understand the dependencies among data transformations to construct optimum workflow? What advantages can you gain – performance, agility, reusability, and maintainability – by rebuilding ETL as modular data pipelines using modern, cloud-friendly technology?
• Migrating Metadata. Source to target metadata is a crucial part of managing a data warehouse, knowing data lineage, and tracing and troubleshooting when problems occur. How readily will this metadata transfer to a new cloud platform? Are all of the mappings, transform logic, dataflow, and workflow locked in proprietary tools or buried in SQL code? Will you be able to export and import? Can you reverse engineer the metadata? Or must you rebuild from scratch?
• Migrating Users and Applications. The final step in the process is migrating users and applications to the new cloud data warehouse with no interruption of business operations. What security and access authorizations need to be created or changed? Which BI and analytics tools should be connected? What communication is needed and with whom?
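The question about dependencies among data transformations can be made concrete with a small sketch. It assumes each transformation has been isolated as a named step and that you know which steps it depends on. The step names below are hypothetical, and the standard-library `graphlib` module (Python 3.9 or later) stands in for whatever orchestration tool you actually use. The sketch groups steps into batches, where every step in a batch can run in parallel once the previous batch has finished.

```python
from graphlib import TopologicalSorter

# Hypothetical transformation steps; each maps to the set of steps it depends on.
dependencies = {
    "load_customers": set(),
    "load_orders": set(),
    "clean_orders": {"load_orders"},
    "join_orders_customers": {"clean_orders", "load_customers"},
    "build_sales_mart": {"join_orders_customers"},
}


def plan_batches(deps):
    """Group steps into ordered batches; steps in one batch have no mutual dependencies."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies are circular
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every step whose prerequisites are done
        batches.append(ready)
        sorter.done(*ready)  # mark the batch complete to unlock dependent steps
    return batches


batches = plan_batches(dependencies)
for number, batch in enumerate(batches, start=1):
    print(number, batch)
```

A circular dependency surfaces as an exception at planning time rather than as a stalled workflow at run time, which is one practical argument for making dependencies explicit when rebuilding ETL as modular pipelines.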
The data warehouse migration process shown above fits into a bigger picture of migration planning and strategy. (See Figure 2.) A step-by-step approach includes several pre-migration steps that help to ensure success with migration tactics and execution.
Figure 2: The Big Picture for Cloud Migration.
• Define Goals and Business Case. Start the planning process with a clear picture of the reasons for migrating your data warehouse to the cloud. Consider goals such as agility, performance, growth, cost savings and labor savings.
• Assess the current data warehouse architecture. If the current architecture is sound, you can plan to migrate to the cloud without redesign and restructuring. If it is sufficient for current BI uses but limited for advanced analytics and big data integration, review and refine data models and processes as part of the migration effort. If it is deficient and struggles to meet current BI requirements, plan to redesign as you migrate to the cloud.
• Define the migration strategy. Taking a “lift and shift” approach is tempting. It seems easy and straightforward to simply move data and processing to the cloud. This approach, however, rarely succeeds. Changes are typically needed to adapt data structures, improve processing, and ensure compatibility with the chosen cloud platform. Incremental migration is more common and usually more successful.
• Select the technology. Determine the cloud platform to which you will migrate. Then determine which migration tools you’ll need. When choosing migration technology consider the cloud platform, characteristics of the current warehouse, and the migration strategy.
• Migrate and operationalize. As with any technology project it is wise to define test and acceptance criteria at the beginning of the project. Plan the testing, then execute the migration process to move schema, data, and processing. Execute the test plan, and upon successful testing operationalize the cloud data warehouse and migrate users and applications.
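One acceptance criterion that is easy to define up front is reconciliation: after data is moved, each migrated table should match its source on a few cheap measures. The sketch below is an illustration under stated assumptions, not a complete test plan. It uses two in-memory SQLite databases as stand-ins for the legacy and cloud warehouses, a hypothetical `orders` table, and compares only row count and key range. A real test plan would likely add column checksums and business-rule checks.

```python
import sqlite3


def table_profile(conn, table, key_column):
    """Return (row count, min key, max key) for one table."""
    # Table and column names are interpolated into the SQL text,
    # so they must come from trusted configuration, never from user input.
    query = f"SELECT COUNT(*), MIN({key_column}), MAX({key_column}) FROM {table}"
    return conn.execute(query).fetchone()


def reconcile(source, target, tables):
    """Return {table: (source profile, target profile)} for every table that differs."""
    mismatches = {}
    for table, key_column in tables.items():
        src = table_profile(source, table, key_column)
        tgt = table_profile(target, table, key_column)
        if src != tgt:
            mismatches[table] = (src, tgt)
    return mismatches


# Stand-ins for the legacy warehouse (source) and the cloud warehouse (target).
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
for conn in (source, target):
    conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount REAL)")

source.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10.0), (2, 20.0), (3, 30.0)])
target.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10.0), (2, 20.0)])  # one row missing

result = reconcile(source, target, {"orders": "order_id"})
print(result)
```

An empty result means every listed table passed these checks. Anything else names the tables to investigate before users and applications are moved.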
Migrating a data warehouse to the cloud is more than a technology project. Begin with the business case. Then (as with any journey) know your starting position, know your destination, map the path from beginning to end, and then navigate the course. Cloud migration isn’t easy but it can certainly be worthwhile.
Dave Wells will present two keynotes during the Datawarehousing & Business Intelligence Summit:
'Cloud Data Warehousing: Planning for Data Warehouse Migration' on June 9th,
'Modernizing Data Governance for the Age of Self-Service Analytics' on June 10th.