Applications of machine learning are often based on data that may contain private or sensitive information, raising concerns about privacy and about the fairness of algorithms. A peculiarity of textual data is that the personal information it contains is not necessarily stated explicitly: a number of demographic variables (such as the age, gender, or native language of an author) have been shown to be predictable with reasonable accuracy from linguistic features of the text. As a result, a system that uses text as input is implicitly conditioning its decisions on demographic variables, and may expose such information.
In this work, we quantify the personal information that can potentially leak from the vector representation of a document computed by standard methods (e.g. an LSTM). To do so, we construct a setting where an attacker tries to predict private variables from vector representations of texts. We measure the privacy of a hidden representation by the ability of the attacker to accurately predict specific private variables from it, and characterize the trade-off between the privacy and the utility of neural representations. We find that these vector representations contain private information even when they have not been trained to encode it. Finally, we show how to use adversarial learning to reduce the vulnerability of the representations to the leakage of private information.
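As a minimal sketch of the probing setup described above (not the authors' actual experiments): an attacker trains a simple classifier to recover a private variable from fixed document representations. Here synthetic vectors stand in for LSTM representations, the binary attribute z is a hypothetical private variable, and all dimensions and constants are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for document representations: 400 vectors of dim 16.
# A binary "private" variable z is partially encoded along one coordinate,
# mimicking a representation that was never trained to predict z.
n, d = 400, 16
z = rng.integers(0, 2, size=n)            # hypothetical private attribute
X = rng.normal(size=(n, d))
X[:, 0] += 1.5 * (2 * z - 1)              # leak z into one coordinate

# Split into data the attacker trains on and data it is evaluated on.
X_tr, z_tr = X[:300], z[:300]
X_te, z_te = X[300:], z[300:]

# Attacker: a logistic-regression probe trained by gradient descent.
w = np.zeros(d)
b = 0.0
lr = 0.1
for _ in range(500):
    p = 1 / (1 + np.exp(-(X_tr @ w + b)))  # sigmoid predictions
    w -= lr * (X_tr.T @ (p - z_tr) / len(z_tr))  # log-loss gradient step
    b -= lr * np.mean(p - z_tr)

# Held-out attacker accuracy; values well above 0.5 indicate leakage.
acc = np.mean(((X_te @ w + b) > 0) == z_te)
print(f"attacker accuracy: {acc:.2f}")
```

The gap between this accuracy and the 0.5 chance level is one way to operationalize the leakage that the adversarial training described above aims to reduce.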
(Joint work with Shay Cohen and Shashi Narayan)