As more companies adopt artificial intelligence (AI), placing machine learning (ML) models into the hands of developers is imperative. To that end, the Center for Open-Source Data & AI Technologies (CODAIT) launched IBM Model Asset eXchange (MAX) in 2018 to help data scientists and developers easily discover ready-to-use free and open source machine learning and deep learning models.
At OSCON 2019, we announced the launch of the IBM Data Asset eXchange (DAX), an online hub for developers and data scientists to find carefully curated free and open datasets under open data licenses. Developers adopting ML models need open data that they can use confidently under clearly defined open data licenses.
Where possible, datasets posted on DAX will use the Linux Foundation’s Community Data License Agreement (CDLA) open data licensing framework to enable data sharing and collaboration. Furthermore, DAX provides unique access to various IBM and IBM Research datasets. IBM plans to publish new datasets on the Data Asset eXchange regularly. The datasets on DAX will integrate with IBM Cloud and AI services as appropriate.
Trusted source of open datasets
For developers, DAX provides a trusted source for carefully curated open datasets for AI. These datasets are ready for use in enterprise AI applications, with related content such as tutorials to make getting started easier.
For staff responsible for dataset usage and vetting, DAX provides curation as well as standardized dataset formats and metadata. Most other open dataset resources, by contrast, tend to apply fewer checks on quality and licensing terms. As a result, DAX datasets are typically more straightforward to adopt within corporations.
Example of datasets in use
Two examples of the sorts of datasets we’re releasing are the Finance Proposition Bank and Contracts Proposition Bank datasets. These datasets are part of an active research program at IBM Research. This research project aims to improve the natural language understanding technologies behind multiple IBM product offerings, including Watson Natural Language Understanding and Watson Compare & Comply.
Our researchers created these datasets with input from Watson developers, matching the characteristics of the target text to those of the real-world documents that the system analyzes in production. The researchers used these datasets to train domain-specific versions of the parsers that extract semantic meaning from governing business documents such as legal agreements and financial reports.
IBM Research has a long history of doing this kind of work in the open, and we on the CODAIT team are proud to help IBM Research’s mission of openness by releasing this cutting-edge research data on the Data Asset eXchange.
Why DAX?
While there are many resources available online for finding open datasets, ranging from collections of links on GitHub to sites such as Kaggle Datasets, DAX is unique in its high level of quality and curation. DAX helps create end-to-end deep learning workflows (from using the data to train models to deploying models in standard ways), allowing developers to consume open data with confidence under clearly defined open data licenses.
Data you need to develop AI solutions
IBM designed the Data Asset eXchange repository to complement the Model Asset eXchange. The user interface for organizing the assets is consistent across the two platforms, and users can easily train models on MAX using data from the Data Asset eXchange.
The CODAIT team’s goal is to make it straightforward to use DAX and MAX assets in conjunction with IBM AI products as well as other hybrid, multicloud AI tooling, both proprietary and open source. We want to give data scientists and developers well-curated data starting points, so that it’s easier for them to start developing their AI applications and solutions.
Fred Reiss is Chief Architect at the IBM Center for Open-Source Data and AI Technologies (CODAIT).
Vijay Bommireddipalli is the program director of IBM CODAIT.
Gabriela de Queiroz is Senior Developer - Deep Learning/Machine Learning/AI Advocate, IBM CODAIT.