Data breaches have far-reaching consequences. They carry a significant financial cost in lost business, fines, and remediation, averaging 3.92 million USD per breach according to a study by the Ponemon Institute. Their impact on an organization's reputation can last for years. An organization's first step in protecting itself against breaches is identifying the personal data it holds that needs to be safeguarded.
Personal data protection regulations require that entities which collect an individual’s data be able to identify it, protect it, and use it only for the purposes for which it was collected. Enterprises collect a tremendous amount of data from a variety of sources, and any of these sources could contain personal data. Data is often relocated for warehousing, reporting, analytics, storage, testing, and application use, so the same data could be copied many times over, resulting in the proliferation of personal data across the enterprise. Gartner predicts that by the end of 2020, the backup and archiving of personal data will represent the largest area of privacy risk for 70 percent of organizations, up from 10 percent in 2018.
To understand how much personal data the enterprise holds, it is important to examine its entire data landscape, and to re-evaluate it periodically to mitigate privacy risks. Enterprises need to protect personal data and make sure all regulatory requirements for its lifecycle and correct usage are met. To achieve this, every possible data store must be examined to determine whether it contains personal data. This operation needs to be done at scale, covering millions of data assets, and it needs to be repeatable with confidence.
The following is a three-step process to discover and protect your sensitive data:
1. Create a glossary: A glossary contains terms, together with their definitions and descriptions, so that there is clarity around what personal data is, what characteristics could make data personal, and how to identify it. A glossary should be a living, continuously updated document that keeps up with changes to existing regulations and with new regulations an enterprise must adhere to.
2. Identify patterns: Common patterns that represent potential personal data should be documented. These are then used to classify data and match it with a term from the glossary created in the first step.
3. Tag assets: Use the glossary and the common patterns to connect a term in the glossary with a physical asset. For privacy regulations, it is imperative that every data store is cataloged and tagged to denote whether it contains personal data.
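The three steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a production classifier: the glossary entries, the regular expressions, the `Column` type, the `tag_column` function, and the 0.8 match threshold are all hypothetical choices made for the example.

```python
import re
from dataclasses import dataclass, field
from typing import List, Set

# Step 1: a glossary maps each business term to its description.
# (Hypothetical entries, for illustration only.)
GLOSSARY = {
    "Email Address": "An address that can be used to contact an individual.",
    "US Social Security Number": "A nine-digit US national identifier.",
}

# Step 2: documented patterns that suggest a glossary term.
# (Illustrative regular expressions, not an exhaustive rule set.)
PATTERNS = {
    "Email Address": re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$"),
    "US Social Security Number": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
}


@dataclass
class Column:
    """A physical asset: a named column with a sample of its values."""
    name: str
    samples: List[str]
    tags: Set[str] = field(default_factory=set)


def tag_column(column: Column, threshold: float = 0.8) -> Column:
    """Step 3: tag the column with each term whose pattern matches
    at least `threshold` of the non-empty sample values."""
    values = [v for v in column.samples if v]
    if not values:
        return column
    for term, pattern in PATTERNS.items():
        hits = sum(1 for v in values if pattern.match(v))
        if hits / len(values) >= threshold:
            column.tags.add(term)
    return column
```

Calling `tag_column(Column("contact", ["a@example.com", "b.c@example.org"]))` would tag the column with the "Email Address" term, while a column of free-text values would stay untagged. Matching on a sample with a threshold, rather than requiring every value to match, is one way to tolerate dirty data; the right threshold depends on the data store.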
However, the process of connecting a term to a physical asset is labor-intensive and time-consuming, and it needs to be repeated each time a new data store is added to the enterprise’s data landscape. When the glossary is updated in response to a regulation, the ability to apply those updates quickly is key to enabling an enterprise to respond promptly to compliance requests.
The IBM Watson Knowledge Catalog service on Cloud Pak for Data addresses this problem by using parallel processing to scan large numbers of assets, combining a rule-based and a cognitive approach to automate the task of connecting a term to a physical asset. Data stewards serve as the subject matter experts and have the final say, and the corrections they and other individuals provide are used to improve the reliability of the cognitive approach. An organization can use Watson Knowledge Catalog to scan large numbers of assets, catalog them, and make only the non-sensitive assets available to its data users. Thanks to its business-user-friendly data catalog and data shaping capabilities, it also streamlines the use of data by data scientists while helping to ensure that no sensitive data is used. In a well-governed organization, the catalog plays a vital role in cataloging and governing models as well. Watson Knowledge Catalog supports the governance of AI models by governing the data used to create them.
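To make the idea of parallel, rule-based scanning with steward review concrete, here is a minimal sketch using only the Python standard library. It is not the Watson Knowledge Catalog API and does not reflect how the product is implemented. The names `scan_asset`, `scan_all`, and `apply_steward_review`, the single email rule, and the asset names are all assumptions made for the example.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# One illustrative rule; a real rule set would cover many more terms.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(\.[\w-]+)+")


def scan_asset(asset):
    """Rule-based check on one asset, given as (name, rows).
    Returns (name, suggested term), or (name, None) if no rule matches."""
    name, rows = asset
    found = any(EMAIL.search(row) for row in rows)
    return name, ("Email Address" if found else None)


def scan_all(assets, max_workers=4):
    """Scan assets concurrently. The results are suggestions, not decisions."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(scan_asset, assets))


def apply_steward_review(suggestions, overrides):
    """Steward decisions take precedence over automated suggestions."""
    return {**suggestions, **overrides}
```

In this sketch, `scan_all([("crm.contacts", ["jane@example.com,Jane"]), ("ops.metrics", ["cpu,0.93"])])` suggests the "Email Address" term for the first asset and nothing for the second. A steward can then confirm or correct each suggestion through `apply_steward_review`. Threads are used on the assumption that real scans spend most of their time reading from data stores; for CPU-bound classification, a process pool would be the more natural choice.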
Learn more about IBM Watson Knowledge Catalog.
Sundari Vorunganti is Development Manager Cloud Pak for Data at IBM.