Statistical privacy, or the problem of disclosing aggregate statistics about data collected from individuals while ensuring the privacy of individual level sensitive properties, is an important problem in today's age of big data. The key challenge in statistical privacy is that applications for data collection and analysis operate on varied kinds of data, and have diverse requirements for the information that must be kept secret, and the adversaries that they must tolerate. Thus, application domain experts, who are frequently not experts in privacy, cannot directly use an existing, general-purpose privacy definition. Instead, they must develop a new privacy definition or customize an existing one. Currently there exist no rigorous techniques to customize privacy to applications.
This project builds PROTEUS, a general-purpose toolkit for developing rigorous privacy definitions and mechanisms that can be customized to applications. The cornerstone of PROTEUS is a novel privacy framework that allows customized privacy protection by explicitly listing the secrets to be protected, enumerating the (potentially infinite set of) possible adversaries, and ensuring rigorous bounds on the information disclosed to each adversary about every secret. Novel theoretical tools in PROTEUS include methods to reason about privacy for correlated data, privacy against realistic adversaries, and techniques to express and enforce personalized privacy preferences. These tools result in practical privacy mechanisms for publishing social science survey data, social network analysis, and analysis of user-activity streams. Broader impacts of this project include developing new courses in privacy and big-data management, as well as technology transfer to the US Census.