It's 2018, the year GDPR will come into effect, but also, according to some, the year in which the first real enterprise blockchain applications will start rolling out. Unfortunately, GDPR was never designed with blockchain in mind. So how will GDPR affect blockchain applications? This article focuses on some aspects of GDPR that will have a direct impact on how blockchain applications should be built and used, and especially on how GDPR has the opposite effect in some ways when it comes to making blockchain compliant with the new EU regulation.
The Blockchain part
To explain what impact GDPR has on blockchain technology, some basic concepts need to be discussed first.
Encryption and hashing
Both encryption and hashing are fundamental to blockchain technologies. In short, hashing is a one-way transformation of data into an unreadable piece of data (the hash value). Hashing the same value always yields the same hash, but restoring the original value from the hash is practically impossible. Encryption is a two-way transformation: data can be encrypted with a certain key, so that it becomes unreadable. With the right key, this unreadable piece of data can always be decrypted to its original value. In other words, no information about the original value is lost when it is encrypted.
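The difference can be made concrete with a minimal Python sketch. The hashing part uses SHA-256 from the standard library. The encryption part is a toy one-time pad (XOR with a random key of the same length), chosen only because it needs no external library; it is an assumption for illustration, not what any blockchain platform actually uses. The helper names and the sample message are hypothetical.

```python
import hashlib
import secrets


def sha256_hex(data: bytes) -> str:
    """One-way: the same input always yields the same digest."""
    return hashlib.sha256(data).hexdigest()


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """Toy two-way transformation (one-time pad), for illustration only."""
    if len(key) != len(data):
        raise ValueError("key must be exactly as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))


message = b"Alice lives at Main Street 1"  # hypothetical sample data

# Hashing: deterministic, fixed-size output, no key, no way back.
digest = sha256_hex(message)
same_digest = sha256_hex(message)

# Encryption: unreadable without the key, fully reversible with it.
key = secrets.token_bytes(len(message))
ciphertext = xor_bytes(message, key)
recovered = xor_bytes(ciphertext, key)
```

Here `digest` and `same_digest` are identical, while `recovered` equals the original `message` only because the key was kept.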
Immutability of transactions
A transaction is, simply put, an event that happened. ‘Alice transferred 3 bitcoin to Bob’ is an example of a transaction on the bitcoin blockchain. Transactions that are written on a blockchain are by definition immutable. No one can change these transactions once they are written, and no one can delete them, since this would ‘break the chain’ in a sense, rendering the complete blockchain useless. Be aware that this is an oversimplification of the concept for clarity.
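What ‘breaking the chain’ means can be shown with a small hash chain in Python. This is a deliberately simplified sketch: real blockchains add consensus, signatures and Merkle trees, and the block layout and function names below are assumptions made for this example.

```python
import hashlib
import json


def block_hash(index: int, transaction: str, previous_hash: str) -> str:
    """Hash a block's content together with the hash of its predecessor."""
    payload = json.dumps([index, transaction, previous_hash]).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def build_chain(transactions: list[str]) -> list[dict]:
    chain = []
    previous_hash = "0" * 64  # placeholder predecessor for the first block
    for index, transaction in enumerate(transactions):
        current = block_hash(index, transaction, previous_hash)
        chain.append({
            "index": index,
            "transaction": transaction,
            "previous_hash": previous_hash,
            "hash": current,
        })
        previous_hash = current
    return chain


def is_valid(chain: list[dict]) -> bool:
    """Recompute every hash and check that each block links to the one before."""
    previous_hash = "0" * 64
    for block in chain:
        expected = block_hash(block["index"], block["transaction"], block["previous_hash"])
        if block["previous_hash"] != previous_hash or block["hash"] != expected:
            return False
        previous_hash = block["hash"]
    return True


chain = build_chain([
    "Alice transferred 3 bitcoin to Bob",
    "Bob transferred 1 bitcoin to Carol",
])
valid_before = is_valid(chain)

# Tampering with an already written transaction invalidates the chain.
chain[0]["transaction"] = "Alice transferred 30 bitcoin to Bob"
valid_after = is_valid(chain)
```

After the edit to the first block, `is_valid` fails because the stored hash no longer matches the block's content.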
Public vs Permissioned
Blockchain technologies can be categorized according to public and private (permissioned) blockchain technologies. The bitcoin and the ethereum blockchain are both examples of public blockchains. Anyone can join the network and start validating transactions. Hyperledger Fabric is an example of a permissioned blockchain, which is aimed at the enterprise world. A blockchain technology can also be both. For example, a company (or consortium of companies) can host their own private ethereum blockchain network, so they have full control over the nodes. Microsoft Azure offers such a solution as a ‘blockchain-as-a-service’.
This article focuses on permissioned blockchains, where only permissioned parties host the nodes, although many of the arguments below also apply to public blockchains.
Any individual can browse through the complete history of all bitcoin transactions, making the transactions on this public blockchain technology completely transparent. Transparency in private blockchains is another matter, but it is still guaranteed in other ways.
Another important concept to understand in this context is the difference between storage principles in centralized systems and in ‘distributed ledger technologies’ (which can be seen as a more generic term for blockchain technology).
In centralized applications, the basic operations of persistent storage are often described as CRUD, which stands for Create-Read-Update-Delete. Taking into account the immutability of transactions in blockchain technology, it becomes clear that these operations don’t match with storage actions in decentralized ledger technologies. Deleting written transactions is simply impossible on a blockchain. The same holds true for updating existing transactions. Therefore, CRUD-operations cannot be used to describe storage actions on a blockchain.
Instead, operations on a blockchain can be described as CRAB, which stands for Create-Retrieve-Append-Burn. The concept of CRAB was introduced by the blockchain company BigchainDB. The Create and Retrieve actions need no explanation. Append, which replaces Update, means that only new transactions can be appended to a blockchain, thereby changing the ‘world state’. This ‘world state’ is the sum of all past transactions (or ‘events’) up until now.
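The idea that Append replaces Update can be sketched in a few lines of Python. The `Ledger` class, its method names and the key `asset-1:owner` are hypothetical and only illustrate the principle; this is not BigchainDB's API.

```python
class Ledger:
    """Append-only log: entries are never updated or deleted."""

    def __init__(self) -> None:
        self._entries: list[tuple[str, str]] = []

    def append(self, key: str, value: str) -> None:
        # An 'update' is just a new entry for the same key.
        self._entries.append((key, value))

    def history(self, key: str) -> list[str]:
        """Retrieve every value ever written for a key, oldest first."""
        return [value for k, value in self._entries if k == key]

    def world_state(self) -> dict[str, str]:
        """Replay all entries; later entries win, earlier ones stay in the log."""
        state: dict[str, str] = {}
        for key, value in self._entries:
            state[key] = value
        return state


ledger = Ledger()
ledger.append("asset-1:owner", "Alice")  # Create
ledger.append("asset-1:owner", "Bob")    # Append instead of Update

state = ledger.world_state()
owners = ledger.history("asset-1:owner")
```

The world state shows only the current owner, while the full history remains retrievable, which is exactly what makes erasure difficult.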
The Burn operation in CRAB can be interpreted in multiple ways, depending on the desired outcome. Throwing away or forgetting the encryption keys can be a Burn, so that appending new transactions and further changing the world state of this asset is no longer possible. Alternatively, the Burn operation can be seen as encrypting the data with random keys, so that decrypting the actual data written on the blockchain becomes impossible, thereby hiding the original data forever.
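One reading of Burn, discarding the key that protects a record, might look like the following Python sketch. It assumes the ciphertext lives on an append-only ledger while the keys are kept in a separate, deletable key store. The XOR one-time pad is a toy cipher used only to keep the example free of dependencies, and all names are hypothetical.

```python
import secrets
from typing import Optional

ledger: list[bytes] = []          # stand-in for immutable on-chain storage
key_store: dict[int, bytes] = {}  # stand-in for off-chain, deletable keys


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """Toy one-time pad, for illustration only."""
    return bytes(d ^ k for d, k in zip(data, key))


def create(plaintext: bytes) -> int:
    """Write only the ciphertext to the ledger; keep the key elsewhere."""
    key = secrets.token_bytes(len(plaintext))
    ledger.append(xor_bytes(plaintext, key))
    record_id = len(ledger) - 1
    key_store[record_id] = key
    return record_id


def retrieve(record_id: int) -> Optional[bytes]:
    key = key_store.get(record_id)
    if key is None:
        return None  # burned: the ciphertext is still there but unreadable
    return xor_bytes(ledger[record_id], key)


def burn(record_id: int) -> None:
    """Discard the key; the ledger itself is left untouched."""
    key_store.pop(record_id, None)


record = create(b"MyAddress: Main Street 1")
before_burn = retrieve(record)
burn(record)
after_burn = retrieve(record)
```

Note that the ciphertext never leaves the ledger. Only the ability to read it is destroyed.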
The GDPR part
The complete official GDPR document can be consulted freely online. It’s an extensive document covering many topics. This article highlights only the few topics that are most relevant to blockchain technology.
International transfers of personal data
An important aspect of GDPR regarding blockchain is the fact that personal data is not to leave the EU (unless under special safeguards). This is a major problem with public blockchains, since there is no control over who hosts a node. It is less of an issue when it comes to private or permissioned blockchains.
To tackle this problem, IPDB set up a foundation that could ensure data stays in the EU, because its blockchain is publicly accessible (client) but hosted on permissioned nodes. Unfortunately, the investors recently pulled their funding from the foundation.
The Right to be Forgotten
Article 17 of GDPR, or the ‘Right to be Forgotten’, will probably have the biggest impact on blockchain technology itself and on how we use it. Since blockchain transactions are immutable, this rule stands in direct contradiction with how data is stored on a blockchain network. It simply cannot be removed.
So how to deal with this contradiction? Article 17 of GDPR mentions the term ‘erasure of data’ a couple of times, but nowhere in the document, not even in the definitions in Art. 4, is there any explanation of what ‘erasure of data’ actually means. The interpretation of this term is very important here, because it directly determines what can actually be stored on a blockchain. If encrypting the data and discarding the corresponding encryption keys upon deletion is sufficient for ‘erasure of data’, then personal data can be stored on a blockchain. If not, storing personal data directly on a blockchain is simply not allowed because it ‘cannot be erased’.
Because there is no clear definition in GDPR of ‘erasure of data’ at this point, taking a conservative view seems the safest way. This means that throwing away encryption keys to ‘delete data’ on a blockchain is not acceptable as ‘erasure of data’ according to GDPR. Therefore, storing personal data directly on a blockchain is not possible.
Data Controller & Data Processor
GDPR defines concepts of ‘data controller’ and ‘data processor’. These roles can be clear when only one partner is responsible for storing the data, but it’s less clear who is the ‘data controller’ in a decentralized network. In the case of a permissioned blockchain network, the whole consortium could be seen as accountable. But this doesn’t work for public blockchains. For example, who is the ‘data controller’ in the bitcoin network? Everyone who runs a bitcoin node? Or no one?
The GDPR initiative probably had only centralized storage, and therefore CRUD, in mind (“you are always able to Delete information”) when thinking about data persistence. This is a good example of how legislation always needs to catch up with advancing technology.
Of course, this has consequences for what can be stored on a blockchain. Storing personal data on a blockchain is not an option anymore according to GDPR. A popular way around this problem is a very simple one: store the personal data off-chain, and store the reference to this data on the blockchain, along with a hash of the data and other metadata (like claims and permissions about this data). To see how this works in a permissioned blockchain, consider the picture below. There are 2 companies (called BlueCompany and GreenCompany), each with its own back-end, both connected to a blockchain.
A simplified off-chain structure which is GDPR Compliant.
Suppose GreenCompany wants to read the MyAddress value. It then has to take the following steps:
(1) Since GreenCompany does not know where MyAddress is stored, it sends a request to the blockchain layer to fetch the specific data.
(2) The blockchain verifies whether the requestor (GreenCompany) has the necessary access rights to read this data. If the requestor has the proper authorization, it gets the link and the hash of the requested data. The link can be anything, for example an API endpoint including credentials, or a database connection string. So here, the blockchain acts as an “access control” medium.
(3) Based on the link, the requestor can fetch the data directly from BlueCompany’s back-end without going through the blockchain again.
(4) Upon receiving the data from BlueCompany’s back-end, GreenCompany can verify that the data has not been tampered with by calculating the hash of the retrieved data and comparing it with the hash given by the blockchain. If they match, the data has not been tampered with.
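The four steps above can be sketched in Python. Everything here is a simplified stand-in: the dictionaries replace a real blockchain and a real back-end, and the link format `blue://records/MyAddress`, the address value and the function names are assumptions made for this example.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


# BlueCompany's off-chain back-end, holding the actual personal data.
blue_backend = {"blue://records/MyAddress": b"Main Street 1, Brussels"}

# On-chain entry: link, hash and permissions only, no personal data.
on_chain = {
    "MyAddress": {
        "link": "blue://records/MyAddress",
        "hash": sha256_hex(blue_backend["blue://records/MyAddress"]),
        "readers": {"GreenCompany"},
    }
}


def request_reference(requestor: str, data_id: str) -> tuple[str, str]:
    """Steps 1 and 2: ask the blockchain layer, which checks access rights."""
    entry = on_chain[data_id]
    if requestor not in entry["readers"]:
        raise PermissionError(f"{requestor} may not read {data_id}")
    return entry["link"], entry["hash"]


def fetch_off_chain(link: str) -> bytes:
    """Step 3: fetch directly from the back-end, bypassing the blockchain."""
    return blue_backend[link]


def read_verified(requestor: str, data_id: str) -> bytes:
    link, expected_hash = request_reference(requestor, data_id)
    data = fetch_off_chain(link)
    # Step 4: compare the hash of the received data with the on-chain hash.
    if sha256_hex(data) != expected_hash:
        raise ValueError("data does not match the on-chain hash")
    return data


address = read_verified("GreenCompany", "MyAddress")
```

A requestor without access rights is stopped in step 2, and a back-end that returns altered data is caught in step 4.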
The hybrid workaround as described above has increased the overall complexity of fetching and storing data on a blockchain.
The consequences of storing personal data off-chain
Storing personal data off-chain, with only an indirect relation to a blockchain for access control, is never as effective as storing and retrieving personal data straight from the blockchain itself (GDPR concerns aside). The consequences of this hybrid solution, first the advantages and then the disadvantages, are described below.
1. The approach described above is a 100% GDPR compliant solution: it makes it possible to completely erase data in the off-chain storage, thereby rendering the links and hashes stored on the blockchain completely useless.
2. In this scenario, a blockchain is primarily used as an ‘access control’ medium, where claims are publicly verifiable. This would give someone the means to prove that some node should not store the data after an opt-out. Of course, this benefit can also be present when personal data is indeed stored on a blockchain.
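The erasure described in the first point can be illustrated with a short Python sketch. The store, the link format and the function names are hypothetical. The point is only that the on-chain entry stays in place while the off-chain record it points to disappears.

```python
import hashlib
from typing import Optional

# Off-chain storage (deletable) and the matching on-chain entry (immutable).
off_chain_store = {"blue://records/MyAddress": b"Main Street 1, Brussels"}
on_chain_entry = {
    "link": "blue://records/MyAddress",
    "hash": hashlib.sha256(off_chain_store["blue://records/MyAddress"]).hexdigest(),
}


def erase(link: str) -> None:
    """Handle an erasure request by deleting the off-chain record."""
    off_chain_store.pop(link, None)


def resolve(entry: dict) -> Optional[bytes]:
    """Follow the on-chain link; returns None once the record is gone."""
    return off_chain_store.get(entry["link"])


resolved_before = resolve(on_chain_entry)
erase(on_chain_entry["link"])
resolved_after = resolve(on_chain_entry)
```

After `erase`, the link and hash are still on the ledger, but the link no longer resolves to any data.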
1. The benefit of transparency with blockchain is reduced. By storing personal data off-chain, there is no way of knowing for sure who accessed this data in the past, and who has access to it now. For example, once GreenCompany has the link to retrieve the data, it is no longer forced to go through the blockchain. However, this disadvantage can be countered by generating access tokens for resources that can only be used a limited number of times, thereby forcing the requestor to go through the blockchain each time to fetch the necessary links and credentials. This will still increase overall complexity and overhead a lot.
2. The benefit of data-ownership with blockchain is reduced. Once personal data has been stored off-chain, it becomes unclear who the owner of this data is. Most likely it is the company that owns the database in which your data is stored. This approach undermines the data-ownership of the users themselves. Data-ownership is one of the key characteristics of blockchain technology, and it is not used in this scenario.
3. A point-to-point integration between all the participating parties is still necessary. In the example above, once GreenCompany gets the link to the data from the blockchain, it still needs a way to get the data from BlueCompany. For every new partner added to the system, a new point-to-point integration with each existing member must be set up, and a ‘public key infrastructure’ must be provisioned for secure data transactions. As a result, the blockchain is reduced to a mere lookup table, thereby throwing away a lot of the benefits that come with this technology.
4. More attack vectors. Each company has their own infrastructure and application landscape. By spreading the personal data over these different companies, the risk increases for a potential breach where part of this personal information can be stolen.
5. Reduced queryability. It becomes infeasible to search within data that is spread across multiple storage media and protected with multiple access keys that need to be fetched from the blockchain layer.
6. Added complexity. Added complexity has a direct correlation with increased risk of unintended errors and bugs, resulting in less secure systems.
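The limited-use access tokens mentioned under the transparency point above could be sketched as follows in Python. This is one possible design under stated assumptions: the `TokenIssuer` class stands in for the blockchain layer, the back-end is assumed to call `redeem` before serving data, and all names are hypothetical.

```python
import secrets


class TokenIssuer:
    """Stand-in for the blockchain layer handing out limited-use tokens."""

    def __init__(self, max_uses: int = 1) -> None:
        self._max_uses = max_uses
        self._remaining: dict[str, int] = {}
        # Every issued token is logged, so access requests stay traceable.
        self.access_log: list[tuple[str, str]] = []

    def issue(self, requestor: str, resource: str) -> str:
        token = secrets.token_urlsafe(16)
        self._remaining[token] = self._max_uses
        self.access_log.append((requestor, resource))
        return token

    def redeem(self, token: str) -> bool:
        """Called by the back-end before serving data; False once used up."""
        remaining = self._remaining.get(token, 0)
        if remaining <= 0:
            return False
        self._remaining[token] = remaining - 1
        return True


issuer = TokenIssuer(max_uses=2)
token = issuer.issue("GreenCompany", "MyAddress")
results = [issuer.redeem(token) for _ in range(3)]
```

With `max_uses=2`, the third attempt is refused, so GreenCompany has to return to the blockchain layer for a new token, which adds another entry to the access log.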
It all comes down to this paradox: The goal of GDPR is to “give citizens back the control of their personal data, whilst imposing strict rules on those hosting and ‘processing’ this data, anywhere in the world.” Also, one of the things GDPR states is that data “should be erasable”. Since throwing away the encryption keys is not the same as ‘erasure of data’, GDPR prohibits us from storing personal data on a blockchain level. As a result, we lose the very means to enhance control over one’s own personal data that blockchain offers, like data-ownership, transparency and security.
With blockchain technologies emerging, new possibilities emerge to further strengthen data-ownership, transparency and trust between entities. The way GDPR is formulated, storing personal data directly on a blockchain is not an option since, in GDPR terms, ‘it is not erasable’. This prevents the technology from being used to its full potential and forces a reliance on ‘older’ systems for storing data, which simply cannot guarantee the same benefits as most blockchain technologies.