Cloudera Data Science Workbench: where innovation meets security, compliance and scale on the road to industrialized AI.
Gartner states that "By 2022, 75% of new end-user solutions leveraging machine learning (ML) and AI techniques will be built with commercial instead of open source platforms." Spoiler alert: it's not because data scientists will stop relying on open source for the latest innovation in ML algorithms and development environments. Rather, as businesses look to operationalize machine learning at scale, they will increasingly turn to commercial platforms with connectors to open source, because that is where investment in enterprise features such as collaboration, reuse, transparency, model management and data platform integration has been focused.
Data management for ML/AI – what’s the big deal?
Most would maintain that the majority of data scientists’ time is still spent collecting and preparing data for analysis. As open source and commercially available algorithms, and even pre-trained models, continue to evolve rapidly, cutting the time spent on data gathering and pre-processing only becomes more important. And once a model has been trained, tuned and optimized, data scientists want to put it to work for the business as soon as possible. Yet it can take months to deploy models to production, and we’ve met with more than a few organizations that report cases where even an experienced team’s models never reach production at all.
What emerges is the criticality of a data strategy and a core data management competency, covering both data and model management, to support enterprise ML initiatives. In recent technical advice on creating a data strategy for machine learning, Gartner concludes that “The data-preprocessing architecture that transports and integrates data for ML is the connective tissue of the data strategy. Without it, ML projects become disjointed and difficult to scale and maintain.”² Open source frameworks and standalone ML services can complement a data strategy for ML, but they are not a substitute for one. They do not provide a scalable pre- and post-processing architecture for the complete ML lifecycle, one that handles the complexities of big data while ensuring security and data quality for data science pipelines.
Build a future-proof AI factory on your foundation – Today
Cloudera customers can start building enterprise AI on their data management competencies today with the Cloudera Data Science Workbench (CDSW). CDSW gives data scientists the freedom to use their favorite open source and other vendor tools and libraries for the end-to-end ML workflow in addition to secure, self-service access to corporate data and distributed computing power, all managed efficiently and securely by IT. Data scientists and engineers can collaborate on shared projects for tasks ranging from data ingest and preparation to model training and deployment in production, all from one cohesive experience accessible from anywhere through a web browser.
As part of Cloudera’s data platform for unified, multi-function analytics on shared data anywhere, CDSW brings data science securely to your data and other analytics workflows. It builds on your foundational enterprise data management capabilities rather than creating silos with their associated costs and security risks.
And we couldn’t talk about future-proofing without mentioning the Cloudera Data Platform (CDP), Cloudera’s next-generation platform and the industry’s first Enterprise Data Cloud. CDP will deliver a new cloud-native machine learning service that provides all the benefits of CDSW as a serverless experience in the cloud, scaling seamlessly from simple R and Python analysis to distributed TensorFlow and Spark workloads. Stay tuned.
Bethann Noble is Director Product Marketing Machine Learning at Cloudera.