March 4, 2020
On 22 January 2020, the Joint Committee On Personal Data Protection Bill invited views and suggestions on the bill from companies and individuals. In this blog, we aim to offer exactly that, as a company whose internal workings and interactions with customers are directly affected by the bill. We address two things. First, we discuss why we believe that cryptographic trashing should be considered a legal way of disposing of data. Second, we explain why we believe that machine learning / AI models created from data that has since been deleted should not come under the privacy purview.
Crypto-shredding (or crypto-trashing) is the practice of deliberately making encrypted sensitive data unreadable by deleting or overwriting its encryption keys. Because the data cannot be recovered without the keys, this method is considered a useful way of addressing privacy concerns.
Before we detail our views on why cryptographic trashing should be considered a viable way of deleting data, we need to look at the legal complications behind deleting data in the first place. The ‘Right To Be Forgotten’ clause makes finding viable and legal ways of shredding data far more important.
‘Right To Be Forgotten’ means that Indian citizens can approach the adjudicating officer under the independent regulator to have any personal data that meets predefined criteria erased. These rules apply to any organization that uses or has access to user data. This is in stark contrast to the EU, where users can approach data controllers, such as search engines like Google, directly to have particular content deleted.
In fact, the idea of Right To Be Forgotten has been a subject of controversy since it rose to prominence in 2014 following a legal battle with Google. The EU has already added it to its General Data Protection Regulation (GDPR). However, members of the committee admit that it remains to be seen how Right To Be Forgotten will be implemented in practice under the GDPR. After all, there are many complications surrounding data deletion. Even if a data fiduciary deletes the data, ambiguity remains around secondary data such as newsletter reports.
Evidently, Indian companies have been debating effective ways to delete data. Moreover, the Personal Data Protection Bill places emphasis on data encryption for increased data security. With these points in mind, we believe that cryptographic trashing/shredding is a great way of deleting data. Overwriting the encryption keys makes the encrypted data permanently unreadable, which effectively removes it. In this section, we detail why cryptographic trashing should be considered a viable way to delete data.
Crypto-shredding allows us to erase any and all personal data without the need to alter historical archives. However, it is not aimed at securing data at rest or in transit. Other technologies are used for that purpose.
For instance, each record in a table of customers' personal details can be encrypted with its own key, while a different table stores the keys. If a customer exercises their “right to be forgotten”, the company can simply delete that customer's key, which “shreds” the corresponding data.
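The two-table idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `customer_records`, `customer_keys`, `store_customer`, `read_customer` and `forget_customer` are hypothetical, the "tables" are in-memory dictionaries, and the cipher is a toy keystream built from HMAC-SHA256 so that the example runs with the standard library alone. A real system would use a vetted authenticated cipher and a proper key store.

```python
import hashlib
import hmac
import secrets

# ASSUMPTION: toy cipher for illustration only (HMAC-SHA256 in counter mode).
# It is not authenticated; production code should use a vetted library.
def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    out = bytearray()
    counter = 0
    while len(out) < length:
        block = hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256)
        out += block.digest()
        counter += 1
    return bytes(out[:length])

def encrypt(key: bytes, plaintext: bytes) -> bytes:
    nonce = secrets.token_bytes(16)  # fresh nonce for every encryption
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))

def decrypt(key: bytes, blob: bytes) -> bytes:
    nonce, body = blob[:16], blob[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))

# Hypothetical stand-ins for the two tables: encrypted records and their keys.
customer_records = {}
customer_keys = {}

def store_customer(customer_id: str, details: str) -> None:
    key = secrets.token_bytes(32)  # one key per record
    customer_keys[customer_id] = key
    customer_records[customer_id] = encrypt(key, details.encode("utf-8"))

def read_customer(customer_id: str):
    key = customer_keys.get(customer_id)
    if key is None:
        return None  # key was shredded, so the record is unreadable
    return decrypt(key, customer_records[customer_id]).decode("utf-8")

def forget_customer(customer_id: str) -> None:
    # Crypto-shredding: only the key is deleted; the ciphertext
    # (and any archive copy of it) is left untouched.
    customer_keys.pop(customer_id, None)
```

After `forget_customer("c1")`, the ciphertext for `c1` still sits in `customer_records`, but `read_customer("c1")` returns `None` because nothing can decrypt it. Note that this only holds if every copy of the key is destroyed, including any copy held in backups.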
However, there is a lot of ambiguity around crypto-shredding and why it should be considered a viable way of deleting customer data. The answer lies in the way crypto-shredding is carried out. Below, we look at how crypto-shredding functions and how it preserves confidentiality even in high-security scenarios.
Some companies choose to encrypt everything, including databases, hard disks, computer files, etc. Others choose to encrypt only specific data such as passport numbers, bank account numbers, user names, etc. Moreover, the same piece of data can easily be encrypted with one key in one system and a different key in another.
Companies investing in crypto-shredding ensure that neither the decrypted data nor the key is exposed, because exposure can lead to legal, financial and reputational risks for the organization. The logic here is simple: if the decryption key is still accessible to the organization, then the encrypted data is recoverable. In that situation, ‘Right to be Forgotten’ cannot be fully implemented.
Hence, in high-security scenarios, keys are held by a third party that is in charge of encrypting and decrypting the data on behalf of the company. This means that the company never gets access to the key, which ensures a higher level of security. Moreover, the third party implements its own cryptographic trashing and adheres to the GDPR requirements itself.
Crypto shredding can be used to ensure that data deletion can be done in a more confidential way. Moreover, it can be fine-tuned to find the perfect balance between technical capabilities and legal requirements.
This section focuses on AI models that have been developed from data which has since been deleted. The main issue most companies face here is that even if the data itself is deleted, the ‘AI learning’ remains. It is virtually impossible to delete the learning that results from the use of that data. This makes privacy law on AI more or less redundant.
Almost every modern enterprise today collects data on customers, which is later analyzed to train AI systems and provide better services to those customers. For instance, a search engine might recommend dresses on the basis of our previous purchases or search history. However, once this data has been fed into an AI system, there is virtually no way of understanding how the system arrived at what it learned.
Essentially, the problem of data deletion in Artificial Intelligence systems can be summarised in the following way. Let us assume that an AI model is trained on data points collected from n sources, and that we need to delete the data collected from one particular source. To remove that source's data from our trained model, we would have to update the model in such a way that it becomes independent of that sample. In other words, we would have to make sure that it behaves as if it had been trained only on the remaining n-1 sources.
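To make the deletion target concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the "model" is just the mean of all training values, and `train`, `retrain_without` and the source names are hypothetical. Real models are far more complex, but the requirement is the same: after deletion, the model must match what training on the remaining sources alone would have produced.

```python
from statistics import mean

def train(data_by_source):
    """Fit a toy 'model': the mean of every value from every source."""
    all_values = [v for values in data_by_source.values() for v in values]
    return mean(all_values)

def retrain_without(data_by_source, source):
    """Baseline deletion: drop one source and retrain from scratch."""
    remaining = {s: v for s, v in data_by_source.items() if s != source}
    return train(remaining)

# Hypothetical training data, grouped by the source it came from (n = 3).
data_by_source = {
    "source_a": [1.0, 2.0, 3.0],
    "source_b": [10.0, 20.0],
    "source_c": [4.0],
}

full_model = train(data_by_source)
model_after_deletion = retrain_without(data_by_source, "source_b")
```

Here `model_after_deletion` depends only on `source_a` and `source_c`, which is exactly the n-1 condition described above. For this toy model the retraining step is trivial; the difficulty is that the same step means repeating the entire training run for a large model.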
One way to approach this deletion process would be to retrain the entire model from the beginning. However, this is not feasible for most AI models, which involve heavy computation and high costs. Moreover, large-scale algorithms require a lot of time to train and huge amounts of electricity (and other resources).
This no-win situation is why we firmly believe that AI should not come under the privacy purview of the latest data protection bill. We understand the concerns that arise with the usage of AI and machine learning models. However, the focal point here is that the customer’s privacy remains untouched. Furthermore, imposing regulations on AI models can prove to be an impediment to the industry’s development.