Ask a CIO where their focus lies, and digital transformation and growth will enter the conversation quickly. The former drives growing investment in data analytics as organizations strive to become data-driven (45% of organizations expect to increase their spending in this area), while the latter is fueled by disruptive technology and the adoption of AI (41% of organizations name it as their game changer). Both rely almost entirely on how well the enterprise leverages its data.
Bringing either to a successful conclusion is a long-term challenge for most organizations; the fact that these goals have stayed among the top annual CIO priorities for several years is a testament to this. Increasing regulation now adds further complexity and barriers that appear to push achievement ever further away.
The root cause is firmly entrenched in legacy systems and traditional data governance challenges, which result not only in data silos but also in the misguided belief that data privacy is diametrically opposed to effective exploration of information. In the next sections, we’ll reveal what else is needed and how right-sizing the governance of more than just data helps organizations achieve their objectives.
Governing digital transformation
To achieve their goals of digital transformation and becoming data-driven, companies need more than just a better data warehouse or BI tool. They need a range of analytical capabilities from data engineering to data warehousing to operational databases and data science. Throughout their journey, they need to be able to fluidly move between these different analytics, exchanging data and gaining insights as they go.
The challenge is that most companies are faced with both a legacy of individual applications for each of these capabilities and a multitude of more recent point solutions. Because each solution covers just one type of analytics and integrates with only one or two other applications, organizations are still left with a patchwork of siloed systems that each carry their own data, metadata, security, and governance. Ensuring consistency of these aspects (the data context) between different systems typically falls to IT, which not only adds cost through increased staff overhead but also slows time to insight and increases risk for the business as a whole. The problem is only amplified as organizations start leveraging cloud and transient workloads.
These consequences are negated when the multi-disciplinary analytics are unified in a single platform where data context is shared regardless of the infrastructure on which the analytics run, giving complete flexibility. The tedium of maintaining a consistent context between disconnected systems and applications simply disappears. Governance can finally play the role it was always destined to play: the key to high-quality business decisions across the whole organization.
Governing for compliance
When GDPR became enforceable in May 2018, it was but the first in a global trend of data privacy and protection regulations. Countries and regions around the world are taking the opportunity to borrow from the spirit or the letter of the regulation to update their own. Multinational organizations will be faced with an increasingly complex data privacy landscape. Organizations with dedicated GDPR programs that deal with each new regulation in isolation will find themselves overwhelmed as they try to meet new requirements as they emerge.
By taking a step back and focusing on the shared principles and data governance objectives, organizations can establish a global privacy management program. The shared data context (shared both within and outside the single multi-disciplinary analytics platform) helps to increase the underlying architectural control of enterprise data and broadens compliance workflows to incorporate all enterprise stakeholders. By shifting from GDPR to global privacy management, organizations not only meet any and all of their regulatory requirements but are also better placed to meet their evolving business challenges.
Governing machine learning
As organizations seek growth through differentiation, AI and machine learning become commonplace, industrialized. But as ever more complex models and systems are self-taught, it becomes ever harder to rationalize their decisions, posing accountability problems. In addition to the compliance and ethical concerns this poses, it also has a business impact: how can you repeat what you cannot explain?
Data scientists in most enterprises live in an organizational bubble with isolated workflows and processes. They often work on their own laptops and in a multitude of languages. There is limited communication between them and other business units, and the wheel gets re-invented on a regular basis as a result. Experiments remain undocumented and model versioning is done haphazardly. Data remains siloed, with data scientists either having access to too little of it for their purposes or operating without any restrictions.
Since it’s impossible to govern the resulting algorithms, organizations that want to scale their AI efforts must govern the ingredients that produce them: the data as well as the data science process itself. Data scientists are the ultimate users of multi-disciplinary analytics. They have a continuous need for not just data science workloads but also data engineering and data warehouse capabilities. From a data perspective, a single platform with consistent context suits them to a tee.
For a data science initiative that is both compliant and scalable, it is key to manage not only the data but also the process itself. Doing so without restricting the freedom data scientists need to do their job is crucial. With fully governed data surfaced securely through the shared context, they must be able to work with the languages and frameworks of their choice without being held back by IT. Experiments must be captured in their entirety so they are not only documented but also become repeatable and replicable. Model deployment must be transparent and consistent.
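As an illustration of what capturing an experiment in its entirety might involve, the following minimal Python sketch records the parameters, random seed, a fingerprint of the input data, and the runtime environment alongside the resulting metrics. All names here (ExperimentRecord, run_experiment, fingerprint) are hypothetical and chosen for illustration only; they are not part of any Cloudera product, and the "training" step is a toy stand-in for a real model.

```python
import hashlib
import json
import platform
import random
import sys
from dataclasses import asdict, dataclass, field


def fingerprint(rows):
    """Return a SHA-256 digest of the input rows so the exact data can be identified later."""
    digest = hashlib.sha256()
    for row in rows:
        digest.update(json.dumps(row, sort_keys=True).encode("utf-8"))
    return digest.hexdigest()


@dataclass
class ExperimentRecord:
    """Hypothetical record of everything needed to repeat a run."""
    name: str
    params: dict
    seed: int
    data_sha256: str
    metrics: dict = field(default_factory=dict)
    python_version: str = field(default_factory=lambda: sys.version.split()[0])
    os_platform: str = field(default_factory=platform.platform)

    def to_json(self):
        # Serialize the full record so it can be stored next to the model artifact.
        return json.dumps(asdict(self), sort_keys=True, indent=2)


def run_experiment(name, rows, params, seed):
    """Toy stand-in for training: sample rows with a seeded RNG and record their mean."""
    rng = random.Random(seed)  # a local, seeded RNG keeps the run repeatable
    sample = rng.sample(rows, params["sample_size"])
    mean_value = sum(r["value"] for r in sample) / len(sample)
    record = ExperimentRecord(
        name=name, params=params, seed=seed, data_sha256=fingerprint(rows)
    )
    record.metrics["sample_mean"] = mean_value
    return record


rows = [{"id": i, "value": float(i)} for i in range(10)]
first = run_experiment("demo", rows, {"sample_size": 4}, seed=7)
second = run_experiment("demo", rows, {"sample_size": 4}, seed=7)
print(first.to_json())
```

Because the seed, parameters, and data fingerprint are stored with the metrics, a second run with the same inputs produces the same metrics, and a change in the underlying data shows up as a different fingerprint.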
Governing digital transformation and growth
On the face of it, digital transformation, regulatory compliance, and business growth are only distantly related, but governance runs through all three as a common thread. Organizations must, however, approach it differently from the data governance of isolated projects: they must focus on holistic governance of data as well as processes as the key to high-quality business decisions, regulatory compliance, and digital transformation. Applying governance as a fundamental enterprise ingredient makes data silos disappear, and data privacy and effective exploration of information are no longer mutually exclusive.
It is clear this is as much an organizational as it is a technological challenge; the right technology simplifies and unburdens the first aspect. The biggest realization for organizations is the connectedness of the tasks at hand and how the application of fluid, multi-disciplinary analytics with a consistent data context provides the foundation upon which enterprise-wide governance rests. Cloudera provides the base upon which organizations can build their evolved governance of data as well as process and systems.
Start governing today
Cloudera provides both the technology and the insight and experience needed to apply it successfully. With a common and consistent data context, crucial for governance of data and process, Cloudera Enterprise delivers secure, fully governed, self-service enterprise data analytics that people can run wherever data resides. Together with tailored services and training relevant to your needs and requirements, we provide the complete foundation to accelerate and attain your data-driven initiatives.
Wim Stoop is Senior Product Marketing Manager at Cloudera.