Somasree Majumder
3 min read · Dec 18, 2021


Privatization of data

Ever wondered how to protect the privacy of user data when building ML models?

Here's how you can ensure it:

Design your personal-data storage under the assumption that the entire database might be dumped on the internet one day. Working from that assumption forces you into stronger privacy choices and limits the damage when you do get hacked.

Be mindful of what can be inferred from data: even if you are not tracking personally identifying information in the database, your customers can still be individually identified. Say you buy a coffee with a credit card and post a picture of it on Instagram. The bank may release only anonymized credit card records, but anyone who cross-references those records with public posts can re-identify you: there might be only one customer who bought a coffee and posted a picture at the same time in the same area. A toy sketch of this linkage attack follows below.
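To make the idea concrete, here is a minimal pandas sketch of such a linkage attack. All of the data, column names, and the (area, hour) join key are made-up assumptions for illustration, not a real dataset or attack recipe.

```python
import pandas as pd

# "Anonymized" card transactions released by the bank: names removed,
# but merchant area and time of purchase are kept.
transactions = pd.DataFrame({
    "card_id": ["a91f", "b22c", "c7d0"],
    "merchant": ["coffee shop", "bookstore", "coffee shop"],
    "area": ["downtown", "midtown", "uptown"],
    "hour": [9, 9, 14],
})

# Public side information: a social media post with a visible
# location and time.
posts = pd.DataFrame({
    "username": ["@alice"],
    "area": ["downtown"],
    "hour": [9],
    "caption": ["morning coffee!"],
})

# Joining on (area, hour) re-identifies the "anonymous" card holder,
# because only one customer matches that place and time.
linked = posts.merge(transactions, on=["area", "hour"])
print(linked[["username", "card_id", "merchant"]])
```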

Encrypt and obfuscate data: Apple, for example, adds random noise to the data it collects. The noise renders each individual record incorrect, but in aggregate the records still give an accurate picture of user behavior. There is a caveat to this approach: if you collect enough data points from a single user, the noise cancels out and that individual's behavior is revealed after all. A small sketch of the idea follows below.
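Here is a minimal NumPy sketch of the idea. Apple's actual mechanism is a form of local differential privacy; the Laplace noise, its scale, and the "app opens" scenario here are illustrative assumptions, not Apple's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# True per-user values we want to protect, e.g. daily app opens.
true_values = rng.poisson(lam=20, size=10_000).astype(float)

# Each device adds its own random noise before reporting, so no
# single reported record can be trusted on its own.
noise = rng.laplace(loc=0.0, scale=5.0, size=true_values.shape)
reported = true_values + noise

# Individual records are wrong...
print("first user, true vs reported:", true_values[0], reported[0])

# ...but the aggregate is still close, because zero-mean noise
# cancels out when averaged over many records.
print("true mean:    ", true_values.mean())
print("reported mean:", reported.mean())
```

The same cancellation is the caveat mentioned above: average enough noisy reports from one user and the noise disappears, exposing that user's true values.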

The noise introduced by obfuscation is random: averaged over a large sample of data about a single user, its mean approaches zero, since the noise carries no pattern of its own. Separately, recent research has shown that deep learning models can be trained on homomorphically encrypted data. Homomorphic encryption preserves the underlying algebraic properties of the data.

Here E is an encryption function, D its matching decryption function, and m1 and m2 are plaintexts. For an additively homomorphic scheme, E(m1) + E(m2) = E(m1 + m2): adding the encrypted data and then decrypting gives the same result as just adding the plaintext data, i.e. D(E(m1) + E(m2)) = m1 + m2. This means you can encrypt the data and still train a model on it.
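As a sketch of this additive property, here is an example using the python-paillier library (`pip install phe`), which implements the Paillier additively homomorphic scheme. This is one possible instantiation chosen for illustration; the post does not name a specific scheme, and training full deep learning models on encrypted data requires considerably more machinery than this.

```python
from phe import paillier  # pip install phe

public_key, private_key = paillier.generate_paillier_keypair()

m1, m2 = 3, 4

# Encrypt each value separately.
e1 = public_key.encrypt(m1)
e2 = public_key.encrypt(m2)

# Add the ciphertexts without ever seeing the plaintexts.
encrypted_sum = e1 + e2

# Decrypting the sum of ciphertexts gives the sum of plaintexts:
# D(E(m1) + E(m2)) == m1 + m2
assert private_key.decrypt(encrypted_sum) == m1 + m2
print(private_key.decrypt(encrypted_sum))  # 7
```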
Train locally and upload only a few gradients.

One way to avoid uploading user data at all is to train your model on the user's device.
The user accumulates data on the device. You download the current model to the device and perform a single forward and backward pass there. To make it harder to infer the user's data from the update, upload only a few gradients chosen at random, and apply those gradients to the master model on the server.
To further increase the overall privacy of the system, also send only a few of the newly updated weights from the master model back to each user's device, rather than all of them. The model trains asynchronously without your servers ever accessing the raw data, and if the database gets breached, no user data is lost. A minimal sketch of this scheme follows below.
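Here is a minimal NumPy sketch of the local-training-plus-sparse-upload idea for a simple linear model. The `local_update` helper, the `upload_frac` masking fraction, the learning rate, and the simulated data are all illustrative assumptions; production federated learning systems add secure aggregation and other protections on top of this.

```python
import numpy as np

rng = np.random.default_rng(42)
dim, lr, upload_frac = 5, 0.1, 0.2

# Master model weights, kept on the server.
master_w = np.zeros(dim)

def local_update(w, x, y):
    """One forward and backward pass of linear regression on-device."""
    pred = x @ w                           # forward pass
    grad = 2 * x.T @ (pred - y) / len(y)   # backward pass (MSE gradient)
    return grad

# Simulate one user's private data; it never leaves the device.
x_user = rng.normal(size=(32, dim))
y_user = x_user @ np.ones(dim) + rng.normal(scale=0.1, size=32)

grad = local_update(master_w, x_user, y_user)

# Upload only a random subset of gradient coordinates, zeroing the
# rest, to make inferring the training data from the update harder.
mask = rng.random(dim) < upload_frac
uploaded = np.where(mask, grad, 0.0)

# The server applies the sparse gradient to the master model.
master_w -= lr * uploaded
print("uploaded mask:", mask)
print("updated weights:", master_w)
```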

Thank you.
