Cloudera Data Science Workbench: where innovation meets security, compliance and scale on the road to industrialized AI.
Gartner states that "By 2022, 75% of new end-user solutions leveraging machine learning (ML) and AI techniques will be built with commercial instead of open source platforms." Spoiler alert: it's not because data scientists will stop relying on open source for the latest innovation in ML algorithms and development environments. Rather, as businesses look to operationalize machine learning capabilities at scale, they’ll increasingly turn to commercial platforms with connectors to open source, because that is where investment in enterprise features such as collaboration, reuse, transparency, model management and data platform integration has been focused.
Data management for ML/AI – what’s the big deal?
Most would maintain that the majority of data scientists’ time is still spent collecting and preparing data for analysis. With the continuing rapid evolution of open source and commercially available algorithms, and even pre-trained models, slashing the time spent on data gathering and pre-processing only grows in importance. And once a model has been trained, tuned and optimized, data scientists want to put it to work for the business ASAP. Yet it can take months to deploy models to production, and we’ve met with more than a few organizations where even an experienced team’s models never make it to production at all.
What emerges is the criticality of a data strategy and a core data management competency, covering both data and model management, to support enterprise ML initiatives. In recent technical advice on creating a data strategy for machine learning, Gartner concludes that “The data-preprocessing architecture that transports and integrates data for ML is the connective tissue of the data strategy. Without it, ML projects become disjointed and difficult to scale and maintain.”² While open source frameworks and standalone ML services can complement a data strategy for ML, they are not a substitute for one. They do not deliver a scalable pre- and post-processing architecture for the complete ML lifecycle, one that handles the complexities of big data while ensuring security and data quality for data science pipelines.
Build a future-proof AI factory on your foundation – Today
Cloudera customers can start building enterprise AI on their data management competencies today with the Cloudera Data Science Workbench (CDSW). CDSW gives data scientists the freedom to use their favorite open source and third-party tools and libraries across the end-to-end ML workflow, along with secure, self-service access to corporate data and distributed computing power, all managed efficiently and securely by IT. Data scientists and engineers can collaborate on shared projects for tasks ranging from data ingest and preparation to model training and deployment in production, all from one cohesive experience accessible from anywhere through a web browser.
As part of Cloudera’s data platform for unified, multi-function analytics on shared data anywhere, CDSW brings data science securely to your data and other analytics workflows. It capitalizes on your foundational enterprise data management capabilities rather than creating silos and the costs and security risks that come with them.
And we couldn’t mention future-proofing without mentioning the Cloudera Data Platform (CDP), Cloudera’s next-generation platform and the industry’s first Enterprise Data Cloud. CDP will deliver a new cloud-native machine learning service that provides all the benefits of CDSW as a serverless experience in the cloud, scaling seamlessly from simple R and Python analysis to distributed TensorFlow and Spark workloads. Stay tuned.
Bethann Noble is Director Product Marketing Machine Learning at Cloudera.