As more companies adopt artificial intelligence (AI), placing machine learning (ML) models into the hands of developers is imperative. To that end, the Center for Open-Source Data & AI Technologies (CODAIT) launched IBM Model Asset eXchange (MAX) in 2018 to help data scientists and developers easily discover ready-to-use free and open source machine learning and deep learning models.
At OSCON 2019, we announced the launch of the IBM Data Asset eXchange (DAX), an online hub for developers and data scientists to find carefully curated free and open datasets under open data licenses. Developers adopting ML models need open data that they can use confidently under clearly defined open data licenses.
Where possible, datasets posted on DAX will use the Linux Foundation’s Community Data License Agreement (CDLA) open data licensing framework to enable data sharing and collaboration. Furthermore, DAX provides unique access to various IBM and IBM Research datasets. IBM plans to publish new datasets on the Data Asset eXchange regularly. The datasets on DAX will integrate with IBM Cloud and AI services as appropriate.
Trusted source of open datasets
For developers, DAX provides a trusted source for carefully curated open datasets for AI. These datasets are ready for use in enterprise AI applications, with related content such as tutorials to make getting started easier.
For staff responsible for dataset usage and vetting, DAX provides curation as well as standardized dataset formats and metadata. Most other open dataset resources, by contrast, tend to apply fewer quality and licensing checks. As a result, DAX datasets are typically easier to adopt within corporations.
Example of datasets in use
Examples of the kinds of datasets we’re releasing are the Finance Proposition Bank and Contracts Proposition Bank datasets. These datasets are part of an active research program from IBM Research. This research project aims to improve the natural language understanding technologies behind multiple IBM product offerings, including Watson Natural Language Understanding and Watson Compare & Comply.
Our researchers created these datasets with input from Watson developers, matching the characteristics of the target text to those of the real-world documents that the system analyzes in production. The researchers used these datasets to train domain-specific versions of the parsers that extract semantic meaning from governing business documents such as legal agreements and financial reports.
IBM Research has a long history of doing this kind of work in the open, and we on the CODAIT team are proud to help IBM Research’s mission of openness by releasing this cutting-edge research data on the Data Asset eXchange.
Why DAX?
While there are many resources available online for finding open datasets, ranging from collections of links on GitHub to sites such as Kaggle Datasets, DAX is unique in its high level of quality and curation. DAX helps create end-to-end deep learning workflows (from using the data to train models to deploying models in standard ways), allowing developers to consume open data with confidence under clearly defined open data licenses.
Data you need to develop AI solutions
IBM designed the Data Asset eXchange repository to complement the Model Asset eXchange. The user interface for organizing the assets is consistent across the two platforms, and users can easily train models on MAX using data from the Data Asset eXchange.
The CODAIT team’s goal is to make it straightforward to use DAX and MAX assets in conjunction with IBM AI products as well as other hybrid, multicloud AI tooling, both proprietary and open source. We want to give data scientists and developers well-curated data starting points, so that it’s easier for them to start developing their AI applications and solutions.
Fred Reiss is Chief Architect at the IBM Center for Open-Source Data and AI Technologies (CODAIT).
Vijay Bommireddipalli is the program director of IBM CODAIT.
Gabriela de Queiroz is Senior Developer - Deep Learning/Machine Learning/AI Advocate, IBM CODAIT.