Making the connection: how SQL on Hadoop brings together data for deeper insight


The fusing of analytics with leading technologies can unlock significant business value and bring new transformation opportunities for enterprises. To succeed, analytics-based initiatives such as AI and the Internet of Things (IoT) need massive amounts of big data, plus the right applications to uncover the hidden patterns, correlations and insights that drive better data-driven decisions.

Finding hidden data
There are many reasons why siloed, unanalyzed data exists. Businesses struggle to integrate data after mergers and acquisitions, when extending data warehouses to include external data, and when previously uncollected data starts landing in new repositories.

The good news: the pool of usable data is growing. Businesses are being challenged to go beyond capturing and analyzing traditional, transactional data to include new data types such as streaming audio and video, social media and call logs stored in databases, NoSQL, Hadoop or object stores.

To aggregate, un-silo and drive value from data, you need the right collection, access and query tools along with appropriate storage. When managing big data, the right SQL on Hadoop software should improve data exploration, discovery, testing and advanced querying. This can lead to better customer interactions, streamlined processes and system improvements. And because most companies already have an abundance of SQL skills, a SQL on Hadoop solution can deliver cost and time savings.

Bringing data together
Hadoop provides a cost-effective way to process big data from diverse internal and external sources, and in a wide variety of formats. The right SQL on Hadoop software gives your data scientists and analysts a richer pool of dispersed data, including sources such as HDFS, RDBMS and NoSQL databases, object stores and WebHDFS. This can drive:
Better insights. Build richer models for predictive analytics. You can produce greater accuracy and uncover previously unconsidered perspectives that could lead to new business opportunities.
Cost efficiencies. Processing data where it resides allows it to be stored in the best-fitting repository and to stay there, without additional effort. This can free up database administrators to focus on activities that add greater value.
Greater flexibility and speed. Accessing data where it resides means data can be used immediately once it is captured, without migration delays. In addition, better access to a wider variety of information means applications that deliver more value can be quickly prototyped.

SQL on Hadoop for the enterprise
The benefits of SQL on Hadoop are well known. Smaller organizations often use community editions, but these lack the features and functionality needed by larger businesses. Enterprise-ready SQL on Hadoop needs certain characteristics, such as query performance at massive scale, support for more concurrent users, low latency, hybrid cloud capabilities, added security and high-speed processing, for everything from ad hoc to complex queries. The right SQL on Hadoop solution meets these requirements, processing petabytes of data from various repositories and delivering comprehensive insights at the optimal moment for high-level data analytics.

Using SQL on Hadoop with IBM Db2 Big SQL
IBM Db2 Big SQL is an enterprise-grade, hybrid SQL on Hadoop engine. It delivers advanced, scalable and security-rich data querying for the enterprise. Its cost-based optimizer rewrites each query so that the execution plan is tuned to data location and to table and column statistics, providing peak resource utilization and high performance through massively parallel processing (MPP). In addition, you can reuse the SQL skills of your data scientists and analysts, saving the time otherwise spent retraining staff.
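As a rough sketch of what this looks like in practice, the statements below register a Hadoop-backed table and then query it with ordinary SQL. The schema, table and column names are illustrative assumptions, not taken from this article; consult the Db2 Big SQL documentation for the exact syntax supported in your release.

```sql
-- Hypothetical example: declare a table over data stored in HDFS
-- so the engine (and its cost-based optimizer) can plan over it
-- like any other table. Names here are made up for illustration.
CREATE HADOOP TABLE sales.web_clicks (
    click_time  TIMESTAMP,
    customer_id BIGINT,
    page_url    VARCHAR(500)
)
STORED AS PARQUET;

-- Standard SQL then runs directly against the Hadoop data; the
-- optimizer uses table and column statistics to choose a plan.
SELECT customer_id, COUNT(*) AS clicks
FROM sales.web_clicks
GROUP BY customer_id
ORDER BY clicks DESC
FETCH FIRST 10 ROWS ONLY;
```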

You can make advanced querying of disparate data in your organization straightforward with Db2 Big SQL, alleviating the need to shift, replicate and refresh data to gain insights. With a high level of ANSI SQL compatibility, Db2 Big SQL understands generic and vendor-specific dialects, including those found in Oracle and Netezza products. This can simplify the planning and execution of data warehouse offloading and reduce the effort needed to rewrite applications.
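To illustrate what dialect compatibility means for offloading, the sketch below shows the same lookup written in ANSI SQL and in an Oracle-style dialect. All table and column names are hypothetical; the point is that an engine which accepts both forms lets existing applications run without being rewritten.

```sql
-- ANSI SQL form
SELECT customer_id, COALESCE(region, 'UNKNOWN') AS region
FROM warehouse.customers;

-- Oracle-style form (NVL instead of COALESCE) that a compatible
-- engine can accept as-is, avoiding an application rewrite
SELECT customer_id, NVL(region, 'UNKNOWN') AS region
FROM warehouse.customers;
```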

Db2 Big SQL is a part of an extended network of IBM data management solutions, available as part of the IBM Hybrid Data Management Platform (HDMP). HDMP uses the common SQL engine across the IBM Db2 family of products. With a single license, HDMP allows users to choose the technology that makes the most sense for their workload. You can add or swap IBM Db2 technologies with ease between database, data warehouse and data lake, or you can utilize fast data capabilities. Db2 Big SQL is also supported on the Cloudera platforms Hortonworks Data Platform (HDP) and Cloudera's Distribution including Apache Hadoop (CDH).

Holly Vatter is Product Marketing Manager for Data Lake & Cloudera Partnership, IBM.
