Data breaches have far-reaching consequences. They carry a significant financial cost in lost business, fines, and remediation, averaging 3.92 million USD according to a study by the Ponemon Institute, and their impact on an organization's reputation can last for many years. An organization's first step in protecting itself against breaches is identifying the personal data it holds that needs to be safeguarded.
Personal data protection regulations require that entities which collect an individual’s data are able to identify it, protect it, and use it only for the purposes for which it was collected. Enterprises collect a tremendous amount of data from a variety of sources, and any of these sources could contain personal data. Data is often relocated for warehousing, reporting, analytics, storage, testing, and application use, so the same data may be copied multiple times over, resulting in the proliferation of personal data across the enterprise. Gartner predicts that by the end of 2020, the backup and archiving of personal data will represent the largest area of privacy risk for 70 percent of organizations, up from 10 percent in 2018.
To understand how much personal data an enterprise holds, it is important to examine its entire data landscape, and to re-evaluate it periodically to mitigate privacy risks. Enterprises need to protect personal data and make sure all regulatory requirements for its lifecycle and correct usage are met. To achieve this, every possible data store must be examined to determine whether it contains personal data. This operation needs to be done at scale, covering millions of data assets, and must be repeatable with confidence.
The following is a three-step process to discover and protect your sensitive data:
1. Create a glossary: A glossary contains terms, together with their definitions and descriptions, so that there is clarity around what personal data is, what characteristics could make data personal, and how to identify it. A glossary should be a living, continuously updated document that keeps up with changes to existing regulations and with new regulations an enterprise must adhere to.
2. Identify patterns: Common patterns that represent potential personal data should be documented. These patterns are then used to classify data and match it to a term from the glossary created in the first step.
3. Tag assets: Use the taxonomy and the common patterns to connect a term in the glossary with a physical asset. For privacy regulations, it is imperative that every data store is cataloged and tagged to denote if it contains personal data.
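The three steps above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the glossary terms, the regular expressions, the 80 percent match threshold, and the function names `classify_column` and `tag_asset` are hypothetical and are not taken from any product or regulation.

```python
import re

# Step 1: a tiny glossary (hypothetical terms and definitions).
GLOSSARY = {
    "Email Address": "An address that identifies an individual's mailbox.",
    "US Social Security Number": "A nine-digit US government identifier.",
}

# Step 2: one documented pattern per glossary term (illustrative regexes only).
PATTERNS = {
    "Email Address": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "US Social Security Number": re.compile(r"\d{3}-\d{2}-\d{4}"),
}


def classify_column(values, threshold=0.8):
    """Return the first glossary term whose pattern fully matches at least
    `threshold` of the non-empty values, or None if no term qualifies."""
    non_empty = [v for v in values if v]
    if not non_empty:
        return None
    for term, pattern in PATTERNS.items():
        hits = sum(1 for v in non_empty if pattern.fullmatch(v))
        if hits / len(non_empty) >= threshold:
            return term
    return None


def tag_asset(columns):
    """Step 3: map each column that looks like personal data to a glossary term."""
    tags = {}
    for name, values in columns.items():
        term = classify_column(values)
        if term is not None:
            tags[name] = term
    return tags


# A made-up data asset: column name -> sampled values.
customers = {
    "contact": ["ana@example.com", "bo@example.org", ""],
    "tax_id": ["123-45-6789", "987-65-4321"],
    "city": ["Austin", "Leiden"],
}
tags = tag_asset(customers)
for column, term in tags.items():
    print(f"{column}: {term} ({GLOSSARY[term]})")
```

Keeping the glossary and the patterns as separate tables mirrors the process: when a regulation changes, a term or pattern can be updated in one place and the tagging step rerun.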
However, the process of connecting a term to a physical asset is labor-intensive and time-consuming, and it needs to be repeated each time a new data store is added to the enterprise’s data landscape. When the taxonomy is updated in response to a regulation, the ability to apply those updates quickly is key to enabling an enterprise to respond promptly to compliance requests.
IBM Watson Knowledge Catalog on Cloud Pak for Data addresses this problem by using parallel processing to scan large numbers of assets, combining a rule-based and a cognitive approach to automate the task of connecting a term to a physical asset. Data stewards serve as subject matter experts and have the final say, and any corrections provided by individuals are used to improve the reliability of the cognitive approach. An organization can use Watson Knowledge Catalog to scan large numbers of assets, catalog them, and make only the non-sensitive assets available to its data users. Thanks to its business-user-friendly data catalog and data shaping capabilities, it also streamlines the use of data by data scientists while helping to ensure that no sensitive data is used. In a well-governed organization, the catalog plays a vital role in cataloging and governing models as well. Watson Knowledge Catalog supports the governance of AI models by governing the data used to create them.
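The general idea of scanning many assets in parallel, letting a rule suggest a classification, and giving a data steward the final say can be sketched in a few lines of Python. This is not Watson Knowledge Catalog's API or implementation: the asset names, the single email rule, and the functions `scan_asset` and `scan_all` are hypothetical, and the cognitive (machine learning) part is left out entirely.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# One illustrative rule: a value that looks like an email address.
EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def scan_asset(asset):
    """Rule-based check on one asset: (name, sample) -> (name, is_sensitive)."""
    name, sample = asset
    suggested = any(EMAIL.fullmatch(value) for value in sample)
    return name, suggested


def scan_all(assets, overrides=None, max_workers=4):
    """Scan assets concurrently, then apply steward overrides on top."""
    overrides = overrides or {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = dict(pool.map(scan_asset, assets.items()))
    results.update(overrides)  # the steward's decision takes precedence
    return results


# Made-up assets: name -> sampled values.
assets = {
    "crm.contacts": ["ana@example.com", "bo@example.org"],
    "sales.regions": ["north", "south"],
    "hr.notes": ["call back Monday"],
}

# A steward marks hr.notes as sensitive even though the rule missed it.
flags = scan_all(assets, overrides={"hr.notes": True})
shareable = sorted(name for name, sensitive in flags.items() if not sensitive)
print(shareable)
```

Threads are used here on the assumption that sampling real data stores is dominated by I/O; a CPU-bound classifier would more likely call for separate processes or a distributed engine.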
Learn more about IBM Watson Knowledge Catalog.
Sundari Vorunganti is Development Manager Cloud Pak for Data at IBM.