12-01-2020 Door: Dave Wells

Modernizing the Data Warehouse: Migrating to the Cloud

Deel dit bericht

Data warehousing in the cloud has become popular as companies are challenged with growing data volumes, higher service level expectations, and the need to integrate structured warehouse data with unstructured data in a data lake. Cloud data warehousing provides many benefits, but getting there isn’t easy. Migrating an existing data warehouse to the cloud is a complex process of moving schema, data, and ETL. Moving an existing data warehouse to the cloud is not quick, and it isn’t easy, but there are real and substantial benefits.

Despite declarations by some that data warehousing is dead, the data warehouse is alive and continues to be needed, but it also needs to be modernized. Data warehouse modernization can occur in any or all of three areas: platform, architecture, and management processes.

Yes the data warehouse is alive, but it is not necessarily alive and well. With rapid growth of use cases, data sources, data types, and data volumes legacy data warehouses face many challenges:

• Scale up instead of scale out is inadequate for today’s data volumes.
• Workload peaks and valleys become more extreme and tooling up for the peaks is costly.
• Cost and complexity of data center management and operations expand out of proportion to actual growth.
• Processing bottlenecks increase data latency and corresponding user dissatisfaction.
• Deploying new infrastructure capability and capacity is slow, it is costly, and it causes project delays.
• The data warehouse has become business critical — sometimes mission critical — yet it lacks fault tolerance and is usually not fully integrated into disaster recovery and business resumption plans.
• Security and governance — managed externally to warehouse operations — struggle to keep up with data growth, user growth, and the shift to self-service.
• Database management becomes increasingly difficult with growth in data volume and variety.

The challenges of traditional data warehousing make cloud data warehousing especially appealing.
• Cloud scalability responds to the challenges of growth management.
• Cloud elasticity eases the pain of workload management.
• Managed infrastructure shifts the burden of data center management and reduces data center costs.
• Cloud performance eliminates processing bottlenecks and reduces data latency.
• The cloud’s “instant infrastructure” gets projects underway much faster.
• Business risks and recovery concerns are reduced through the benefits of virtualization.
• Security and governance are enhanced through service provider features and virtual private cloud protections.
• RDBMS in the cloud reduces database management complexity without the need to rebuild using NoSQL.

Cloud migration is inviting and it is a practical step along the path to modernization. But it is not as simple as moving your data from one platform to another. Tactically and technically, data warehouse migration is a process of many steps to migrate all of the components. (See figure 1.)


Figure 1: Migrating a Data Warehouse to the Cloud.

When planning for cloud migration you’ll need to answer many questions.
Migrating Schema. Before moving warehouse data, you’ll need to migrate table structures and specifications. Will you make any structural changes as part of the migration? Do you need to rethink indexing or partitioning?
Migrating Data. Moving very large volumes of data is process intensive, network intensive, and time consuming. How long will it take to migrate and what can you do to accelerate? Did you restructure as part of schema migration? Do you need to transform data as part of the data migration? Can you transform in stream or should you pre-process and then migrate?
Migrating ETL. Moving data may be the easy part when compared to migrating ETL processes. Will you need to change the code base to optimize for platform performance? Do you need to change data transformations to sync with data restructuring? Should data flows remain intact or should they be reorganized? Do you need to reduce data latency and deliver near real-time data as part of migration? Would it make sense to migrate ETL processing to the cloud as well? If so, is there a utility to convert your ETL code?
Rebuilding Data Pipelines. With any substantive change to data flow or data transformation, rebuilding data pipelines may be a better choice than migrating existing ETL. Can individual data transformations be isolated and packaged as executable modules? Do you understand the dependencies among data transformations to construct optimum workflow? What advantages can you gain – performance, agility, reusability, and maintainability – by rebuilding ETL as modular data pipelines using modern, cloud-friendly technology?
Migrating Metadata. Source to target metadata is a crucial part of managing a data warehouse, knowing data lineage, and tracing and troubleshooting when problems occur. How readily will this metadata transfer to a new cloud platform? Are all of the mappings, transform logic, dataflow, and workflow locked in proprietary tools or buried in SQL code? Will you be able to export and import? Can you reverse engineer the metadata? Or must you rebuild from scratch?
Migrating Users and Applications. The final step in the process is migrating users and applications to the new cloud data warehouse with no interruption of business operations. What security and access authorizations need to be created or changed? Which BI and analytics tools should be connected? What communication is needed and with whom?

The data warehouse migration process shown above fits into a bigger picture of migration planning and strategy. (See Figure 2.) A step-by-step approach includes several pre-migration steps help to ensure success with migration tactics and execution.


Figure 2 – The Big Picture for Cloud Migration.

Define Goals and Business Case. Start the planning process with a clear picture of the reasons for migrating your data warehouse to the cloud. Consider goals such as agility, performance, growth, cost savings and labor savings.
Assess the current data warehouse architecture. If the current architecture is sound then you can plan to migrate to the cloud without redesign and restructuring. If architecturally sufficient for current BI uses but limited for advanced analytics and big data integration, review and refine data models and processes as part of the migration effort. If the current architecture is deficient and struggles to meet current BI requirements, plan to redesign as you migrate to the cloud.
Define the migration strategy. Taking a “lift and shift” approach is tempting. It seems easy and straightforward to simply move data and processing to the cloud. This approach, however, rarely succeeds. Changes are typically needed to adapt data structures, improve processing, and ensure compatibility with the chosen cloud platform. Incremental migration is more common and usually more successful.
Select the technology. Determine the cloud platform to which you will migrate. Then determine which migration tools you’ll need. When choosing migration technology consider the cloud platform, characteristics of the current warehouse, and the migration strategy.
Migrate and operationalize. As with any technology project it is wise to define test and acceptance criteria at the beginning of the project. Plan the testing, then execute the migration process to move schema, data, and processing. Execute the test plan, and upon successful testing operationalize the cloud data warehouse and migrate users and applications.
Migrating a data warehouse to the cloud is more than a technology project. Begin with the business case. Then (as with any journey) know your starting position, know your destination, map the path from beginning to end, and then navigate the course. Cloud migration isn’t easy but it can certainly be worthwhile.

Dave Wells will present two keynotes during the Datawarehousing & Business Intelligence Summit:
'Cloud Data Warehousing: Planning for Data Warehouse Migration' on March 25th
'Modernizing Data Governance for the Age of Self-Service Analytics' on March 26th.

Furthermore he will present an unique post-conference workshop: Cloud Data Warehousing on March 30 en 31.

Dave Wells

Dave Wells is the Data Management Practice Director at Eckerson Group, a data analytics research and consulting organization. He is an internationally recognized thought leader in data management, a frequent speaker at industry conferences, and a contributing author to industry publications. Dave brings a unique perspective to data management based on several decades of working with data in both technical and business roles. He works at the intersection of information management and business management, where real value is derived from data assets.

Alle blogs van deze auteur