SECURITY AND PRIVACY IN INFORMATION PROCESSING

by

Chien-Lun Chen

A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the Requirements for the Degree
DOCTOR OF PHILOSOPHY
(ELECTRICAL ENGINEERING)

May 2020

Copyright 2020 Chien-Lun Chen

To my beloved wife and family, and my late grandfathers, who wanted to see their grandson graduate.

Acknowledgments

First and foremost, I would like to thank my adviser, Professor Leana Golubchik, for her continuous guidance and support of every kind throughout my PhD studies. My sincere appreciation for her is beyond words; I definitely could not have finished my PhD and this thesis without her help, and not only in research. I would also like to thank Professor Bill Cheng for continuously providing me with teaching assistant opportunities during my PhD, which not only gave me financial support but also trained and improved my teaching and presentation skills. I also want to thank Ranjan Pal, my mentor during my first two years in Professor Leana Golubchik's group, who spent much of his precious time discussing privacy research problems with me and also helped me greatly in improving my writing and presentation skills. I would also like to thank Marco Paolieri, who spent much of his time guiding my machine learning research and gave me all kinds of suggestions and help. I want to thank all my labmates, especially Wumo (Ben) Yan, who helped me a great deal with both my qualifying and defense exams. I would also like to thank Professors Kostas Psounis, Muhammad Naveed, Bhaskar Krishnamachari, and Shanghua Teng for their generous help and research suggestions for my qualifying and/or defense exams. I would also like to thank Diane Demetras, who gave me all kinds of assistance and information during my PhD.

Finally, I would like to thank my family, especially my parents and my wife, for their patience, understanding, and unwavering support. I definitely could not have finished this long journey without them. During this long journey, I lost my grandfathers and my grandparents-in-law, and my sons were born. I thank my wife for taking such good care of our family and for all the hard work she has done. I truly appreciate all the help and support I received from everyone along the way. Thank you very much, all!

Contents

Dedication
Acknowledgments
List of Tables
List of Figures
List of Algorithms
Abstract

Chapter 1: Introduction
  1.1 Chapter Introduction
  1.2 Tradeoff between Privacy and Data Utility
  1.3 Main Challenges
  1.4 Research Motivations
  1.5 Main Contributions

Chapter 2: Literature Survey and Background Knowledge
  2.1 Poisoning Backdoor Attacks and the Defense Mechanisms
  2.2 Privacy Metrics for Data Publishing
    2.2.1 k-anonymity
    2.2.2 l-diversity
    2.2.3 t-closeness
    2.2.4 Confidence Bounding
  2.3 Utility Metrics for Data Publishing
  2.4 Utility-Privacy Tradeoff
  2.5 Fairness Metrics
    2.5.1 Measures for Individual Fairness
    2.5.2 Measures for Group Fairness
  2.6 Privacy and Fairness in Algorithmic Transparency
  2.7 Utility, Differential Privacy, and Information-Theory
  2.8 Privacy Mechanism Design
    2.8.1 Mechanism Design for Numeric Queries
    2.8.2 Mechanism Design for Non-numeric Queries
    2.8.3 Answering Multiple Queries: Composition Theorem

Chapter 3: Poisoning Backdoor Attacks on Federated Meta-Learning
  3.1 Chapter Introduction
  3.2 Federated Meta-Learning
  3.3 Threat Model
  3.4 Experimental Goals and Setup
    3.4.1 Experiment Goals
    3.4.2 Experiment Setups
  3.5 Experimental Results: Effects of Poisoning Backdoor Attacks
  3.6 Defense against Poisoning Backdoor Attacks
    3.6.1 Matching Network
    3.6.2 Matching Network as a Defense Mechanism
  3.7 Experimental Results: Effectiveness of the Defense

Chapter 4: Towards Privacy in Algorithmic Transparency
  4.1 Chapter Introduction
  4.2 Preliminaries
    4.2.1 Transparency Schemes
    4.2.2 Fairness
    4.2.3 Adversarial Setting in Privacy
  4.3 Privacy Leakage via an ATR
    4.3.1 Privacy Leakage via Interpretable Surrogate Models
    4.3.2 Privacy Leakage via Fairness Measures
    4.3.3 Privacy Leakage via Feature Importance/Interaction
  4.4 Privacy Measure and Requirement
  4.5 Fidelity
  4.6 Privacy-Fidelity Trade-off
    4.6.1 Optimization Formulation
    4.6.2 Decomposability
    4.6.3 Solution Properties
    4.6.4 Optimal Privacy and Solutions
    4.6.5 Insights into the Optimal Privacy Solutions
  4.7 Numerical Examples
  4.8 Appendix
    4.8.1 Proof of Lemma 1
    4.8.2 Proof of Lemma 5
    4.8.3 Proof of Lemma 6
    4.8.4 Proof of Lemma 7
    4.8.5 Proof of Theorem 1
    4.8.6 Proof of Lemma 9

Chapter 5: Oblivious Mechanisms in Differential Privacy
  5.1 Chapter Introduction
  5.2 Differential Privacy, and the Mechanism Design
    5.2.1 Differentially Private Mechanism
    5.2.2 Query Sensitivity
    5.2.3 Query Generator Utility
    5.2.4 Utility-Privacy Tradeoffs
    5.2.5 Popular NGMs in Literature
  5.3 Challenges, Opportunities, and Our Approach
    5.3.1 Problem Statement
    5.3.2 Experimental Methodology
  5.4 Experimental Results and Analysis
    5.4.1 Utility-Privacy Tradeoff of Existing Mechanisms
    5.4.2 Presence of Side Information
    5.4.3 Collusion in Query Results

Chapter 6: Conclusion
  6.1 Summary of Contributions
  6.2 Open Challenges

References

List of Tables

2.1 Inpatient Microdata [113]
2.2 4-Anonymous Inpatient Microdata [113]
2.3 4-Anonymous 3-Diverse Inpatient Microdata [113]
4.1 Notation
4.2 A Synthetic Credit Card Application Scenario
4.3 Fairness Measures for Table 4.2 in an ATR
4.4 Attribute Information of the Credit Approval Dataset
4.5 A Snapshot of the QID Group T_x^U = {y, p, k, v}
4.6 Detailed Inputs and Computations of the Provided Numerical Example
5.1 Problem Difficulty Levels (low to high) w.r.t. Optimal NGM Design

List of Figures

3.1 Backdoor attack on Omniglot: (a) the target class (Tifinagh, character41), (b) the backdoor classes (Asomtavruli (Georgian), character03; Atlantean, character19; Japanese (hiragana), character13; Tifinagh, character42), (c) the attack pattern, (d) the attack training set
3.2 Accuracies of backdoor attacks and the performance of the model over rounds in federated meta-learning
3.3 Backdoor and Meta-Testing Accuracies during Fine-Tuning (fine-tuning learning rate 0.001)
3.4 Backdoor and Meta-Testing Accuracies during Fine-Tuning (fine-tuning learning rate 0.01)
3.5 Backdoor and Meta-Testing Accuracies during Fine-Tuning (fine-tuning learning rate 0.05)
3.6 Backdoor and Meta-Testing Accuracies during Matching Network Fine-Tuning (fine-tuning learning rate 0.001)
4.1 A depiction of the realm of accountable ATRs
4.2 A representative illustration of a decision blackbox
4.3 Depictions of measured and the unknown true biases
4.4 A representative illustration of changes in joint probabilities caused by the optimal-privacy scheme
4.5 Privacy-Fidelity Tradeoffs for both Gender Groups
5.1 Laplacian, Staircase, and Geometric mechanisms
5.2 Utility-Privacy tradeoff and optimality of three mechanisms
5.3 Drop rates of expected loss due to collusion

List of Algorithms

1 Optimal Privacy Protection Scheme

Abstract

As cloud and machine-learning services become more ubiquitous, it becomes easier for service providers to access, collect, analyze, and store a wide variety of personal information in order to provide more accurate and personalized services. Despite the convenience of these services, the increasing incidence of malicious attacks on machine-learning-based applications and of privacy leakage from collected user data has raised growing concern about how service providers gather, store, and analyze users' private information.

Motivated by this, this thesis focuses on the following problems. We first study security and privacy problems brought on by malicious attacks in a particular cooperative machine learning framework, known as federated learning, in which, motivated by privacy preservation, users jointly train a machine learning model without sharing their private data. In such a context, however, a malicious participant can attack the jointly trained model. Existing defense methods have several deficiencies and carry privacy risks, since all of them rely on a third party to examine users' updates. We next explore unintentional privacy leakage, where sensitive information about a user could be (unintentionally) leaked from, but not limited to, (i) query outputs of a database containing user information, (ii) a released anonymized dataset, or (iii) an announced algorithmic transparency report releasing details of the underlying algorithmic decisions based on user information. We investigate privacy-preserving mechanisms in the above-mentioned contexts, focusing on two common privacy-preserving paradigms: privacy-preserving database-mining (PPDM) and privacy-preserving database-publishing (PPDP).

Specifically, for our first problem, we experimentally explore poisoning backdoor attacks in the context of federated meta-learning, an important problem that has not been well studied in the literature. Our first finding is that poisoning backdoor attacks in federated meta-learning are long-lasting: a one-shot attack can persist and influence a meta model for tens of rounds of federated learning during normal training. Moreover, we found that poisoning backdoor attacks cannot be removed by standard fine-tuning during meta-testing, the stage in which a meta model adapts to new tasks before evaluation. Our results show that the attack influence diminishes only very slightly after hundreds of epochs of fine-tuning. These findings demonstrate the difficulty of removing poisoning backdoor attacks in federated meta-learning through regular training and fine-tuning; an effective defense mechanism is therefore crucial. Existing defense mechanisms for conventional federated (supervised) learning, which are mostly based on an independent and identically distributed (i.i.d.) assumption on user data, are not realistic under privacy concerns, as those approaches are centralized, relying on a third party to access and examine the updates from all users. We propose a distributed defense mechanism, a sanitizing fine-tuning process inspired by matching networks, performed by each user.
Our proposed defense mechanism is privacy-preserving due to its distributed nature; our experimental results show that poisoning backdoor attacks can be completely removed in only a few epochs of sanitizing fine-tuning while maintaining the performance of the jointly trained meta model.

With respect to unintentional privacy leakage, we study the problem of optimal privacy-preserving decision announcement in an algorithmic transparency report. Specifically, a variety of modern applications employ machine learning algorithms for consumer decision processes. However, such processes are often opaque, making it difficult to rationalize why certain decisions were made and thereby facilitating possible bias and discrimination. Recent research advances, based on a data-mining framework, introduced quantitative input influence measures capturing the degree of influence of inputs on outputs of decision systems. Unfortunately, this pioneering data-mining framework has several shortcomings and disadvantages, including privacy threats. In our work, we consider a more general database-publishing framework, in which an algorithmic transparency report, consisting of all inputs and outputs of a decision system, is published judiciously, similarly to PPDP, without leaking private information of individuals, while preserving the fidelity of fairness/unfairness for the outputs of decisions. We propose measures for data utility, privacy, and fidelity of fairness in algorithmic transparency reports and study the related fundamental trade-offs and properties. We also derive necessary and sufficient conditions for perfect utility and perfect fidelity of fairness in terms of privacy parameters and note the feasible utility-privacy-fairness trade-off region problem; although the measures and the definitions of each term are different, this is essentially similar to the rate-distortion-equivocation problem.

Finally, in the context of PPDM, we study utility-optimal privacy-preserving mechanism design problems. In particular, we consider differential privacy (DP), as it is one of the most widely accepted privacy metrics. The state-of-the-art optimal DP mechanisms have been proposed under different assumptions. We first give an overview of the optimal design problems in PPDM, classifying all state-of-the-art optimal DP mechanisms, and propose open problems that have not yet been solved. We note that optimal DP mechanism design considering side information about query responses has not previously been addressed in the literature; this is an important problem, as side information is widely available. We then propose a heuristic side-information-aware DP mechanism design, showing that the utility gain can be significant in the low and intermediate privacy regimes and hence that the corresponding optimal design is non-trivial. We also formulate and analyze the optimal side-information-aware DP mechanism design problem.

Chapter 1

Introduction

1.1 Chapter Introduction

In the modern age of ubiquitous Internet services with advanced data mining technology, data and information have become readily available and easily accessible, not only from users' perspective but also from service providers' perspective. As a result, users' records, along with the data stored in their smart devices and web browsers, are imperceptibly collected or tracked by service providers, advertisers, or unknown third parties for particular commercial or unknown (possibly malicious) purposes.
Without help from privacy-preserving techniques, users' privacy is vulnerable to such tracking and analysis. For example, a website can use client-side storage items in the browser, such as cookies, HTML5 local storage, or Flash local storage objects, to identify users and track their behavior on the Internet. In addition to tracking on the Internet, users' privacy faces threats from data release, even when the released data is anonymized. The AOL search logs released in August 2006 are a well-known example of a privacy catastrophe [15], in which a New York Times journalist successfully uncovered the identities of several searchers from the released search logs. Moreover, a recent study [47] concludes that human mobility patterns are highly unique and that four spatio-temporal points are enough to uniquely identify 95% of individuals. Therefore, over the past two decades, researchers and privacy engineers have been committed to protecting users' privacy in all kinds of applications and services.

One of the application categories in which privacy has attracted the greatest interest from researchers is data release and data-mining/learning. This has led to two popular privacy research paradigms. The first paradigm addresses the privacy hazard of identifying sensitive records of an individual by utilizing auxiliary databases and/or comparing query responses (the outputs of queries to a database) from the database into which the targeted individual opts in. Privacy issues raised in this paradigm are known as privacy-preserving database-mining (PPDM) problems, which enable learning and data mining while preventing disclosure of sensitive records of any individual. The second paradigm addresses the privacy hazard of identifying a person from a published anonymized database, again by utilizing auxiliary databases and/or knowing some (typically non-sensitive) attributes of the targeted individual; the adversary then discovers all sensitive information about the targeted individual from the published anonymized database. Privacy issues raised in this paradigm are known as privacy-preserving database-publishing (PPDP) problems, where an anonymized database is published with non-identifiable records for the purpose of research and learning.

There are two major approaches to accessing users' information while preserving privacy, known as computationally-private methods and statistically-private methods. Computationally-private methods protect and retrieve information through cryptographic protocols. Ideally, these simultaneously take care of both utility and privacy; however, they are computationally intractable due to large overhead and are difficult to use in some cases. Statistically-private methods preserve private information by completely or partially removing or perturbing data. Consequently, if a statistically-private mechanism is not appropriately designed, the perturbed results may be too distorted to be of use in practice. Nevertheless, some statistically-private methods are gathering more and more attention from both researchers and privacy engineers due to their simplicity and feasibility.

Statistically-private methods for databases include (but are not limited to) anonymization [42, 43], generalization/aggregation [142, 143], suppression [42, 143, 120], data swapping [135, 134], and perturbation/distortion [4, 30].
Anonymization (completely or partially) removes any information that can be used (in isolation or jointly) to directly identify or imply a person, such as name, ID number, and so on. Generalization, or aggregation, aggregates a set of numeric or non-numeric values and substitutes a range or a value from a higher hierarchical level. For example, a 25-year-old person's age attribute can be replaced by "20-30", and a Chinese person's nationality attribute can be replaced by "From Asia". Suppression removes part or all of the information in a set of values and replaces it with an asterisk or "ANY". Suppression and anonymization are special cases of generalization. Data swapping randomizes the order of a set of records or responses without altering content; it can maintain univariate statistics and controlled multivariate statistics if done properly. Perturbation or distortion can be achieved in many ways, for instance by filtering out part of the sensitive information, adding extra unrelated inputs (noise), or randomizing the order of a set of values. Notable applications of perturbation include pixelation, voice alteration, blurring, and so on. In fact, the other four methods can be thought of as extreme or special cases of perturbation for specific purposes and actions.

1.2 Tradeoff between Privacy and Data Utility

A privacy-preserving mechanism utilizing one or more of the above methods should be properly designed to preserve users' privacy for the targeted applications. However, two other important aspects should also be addressed in a privacy-preserving mechanism design. The first is data utility, which refers to the amount of useful/correct information contained in the output of a privacy-preserving mechanism (an anonymized database or a perturbed query response). The second is mechanism tractability, which includes the applicability/universality and the ease of use (e.g., algorithmic complexity) of a privacy-preserving mechanism. Essentially, statistically-private methods sacrifice data utility to preserve privacy, as information is altered or removed before access. A privacy-preserving mechanism and the privacy requirements jointly determine how much information should be removed or perturbed. In order to make a privacy-preserving mechanism meaningful, the mechanism should perturb as little information as possible (i.e., provide as high utility as possible) while at the same time satisfying the privacy requirements. A design of a privacy-preserving mechanism which maximizes utility while preserving privacy is called an optimal privacy-preserving mechanism design.

To ensure that privacy will not be violated even when rare events happen in unusual circumstances, the prescriptive approaches of privacy standards tend to be very conservative, which results in very low data utility in practice. For example, the U.S. Health Insurance Portability and Accountability Act (HIPAA) safe harbor rules detail the types and specificity of data generalization required to make data safe for release. As they require that any disclosed geographic unit contain at least 10,000 or 100,000 individuals, the rules allow the disclosure of the year of birth and the first three digits of the ZIP code, which is roughly a region of a county.
However, in some unusual cases it could happen that a person is the only one in his county born in, for instance, 1912, so that his records can be found and identified in the anonymized records even though the safe harbor rules are applied and followed. To be able to tackle such an unusual case, the standards for generalization would have to be even more conservative, e.g., increasing the minimal population of any disclosed geographic unit to one million or ten million, which only allows the disclosure of the first two digits of the year of birth. Unfortunately, this results in publishing almost useless anonymized data, and thus the initial goals of the data release (research, studies, etc.) are entirely missed.

In fact, such an unusual anomaly is caused by "outliers" of attributes, whether sensitive or not, and, more importantly, such outliers commonly exist in many kinds of attributes, e.g., weight, height, annual income, disease, etc. When the size and the dimensions of a database grow, unusual anomalies appear more frequently, and the range of query responses also increases. Consider again the HIPAA example: it would be very unlikely that one could develop a universal privacy standard suitable for all outliers in all attributes while simultaneously preserving privacy with meaningful data utility. One naive solution for such a privacy-utility conflict is to provide only an adequate level of expected privacy, i.e., instead of providing a privacy guarantee for each individual, we trade a small amount of privacy (especially of outliers') for better data utility. However, such a naive approach can (and most likely will) fail if not carefully evaluated. Recall the HIPAA example: if the rules are revised to allow the disclosure of the first three digits of the year of birth (a naive "adequate level" between full disclosure and disclosure of the first two digits), it may not only fail to improve privacy for long-lived individuals (as 191* may still be unique and identifiable within a county), but it also results in lower data utility compared to the original rules. More extreme cases occur with annual income, as it is quite unevenly and widely distributed. Based on the 2014 report from the U.S. Census Bureau, more than 67% of people in the U.S. have an annual personal income lower than the average of $44,569, while the annual personal income of the top 134 earners averages $86,000,000. It is clear that there exists no general standard that can preserve privacy with meaningful data utility.

The above discussion illustrates that preserving privacy with meaningful data utility is non-trivial. A possible better solution for the above HIPAA problem is to apply different generalization to different groups of people. For example, the safe harbor rules for the year of birth could allow full disclosure for people under 65 (group 1), first-three-digits disclosure for people between 65 and 80 (group 2), and first-two-digits disclosure for those above 80 (group 3). The intuition behind this design is based on the (prior) distribution of ages (or years of birth): more information should be disclosed for years of birth with larger populations, and vice versa. Moreover, additional randomness can also be applied to the design. For example, the disclosure rule for the year of birth in each group can be "the first two", "the first three", or "all four" digits with corresponding probabilities. The age ranges for each group and the corresponding probabilities should be chosen so as to maximize utility. By properly utilizing prior information about the year of birth, we can enhance data utility while simultaneously preserving adequate privacy.
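To make the randomized, group-based disclosure rule above concrete, the following minimal sketch (in Python) picks how many digits of the year of birth to disclose according to the age group. The group boundaries (65 and 80) come from the example above, while the probabilities and the function name are hypothetical placeholders chosen for illustration, not a prescribed standard.

import random

# Hypothetical disclosure rules: each age group maps to a list of
# (digits_disclosed, probability) pairs. The group boundaries follow the
# example above; the probabilities are placeholders for illustration only.
DISCLOSURE_RULES = {
    "under_65": [(4, 1.0)],                        # full year of birth
    "65_to_80": [(3, 0.7), (2, 0.3)],              # mostly first three digits
    "over_80":  [(2, 0.8), (3, 0.15), (4, 0.05)],  # mostly first two digits
}

def generalize_year_of_birth(year_of_birth: int, age: int) -> str:
    """Return a possibly generalized year-of-birth string, e.g. '19**'."""
    if age < 65:
        rules = DISCLOSURE_RULES["under_65"]
    elif age <= 80:
        rules = DISCLOSURE_RULES["65_to_80"]
    else:
        rules = DISCLOSURE_RULES["over_80"]
    digit_options, probabilities = zip(*rules)
    disclosed = random.choices(digit_options, weights=probabilities, k=1)[0]
    year = str(year_of_birth)
    return year[:disclosed] + "*" * (4 - disclosed)

print(generalize_year_of_birth(1950, 70))  # e.g. '195*' (70%) or '19**' (30%)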
1.3 Main Challenges

There are many factors that can jointly influence the design of optimal privacy-preserving mechanisms. Ideally, the design should be formulated as an optimization problem that takes the needed factors and constraints into consideration. In practice, however, the design is challenging both in formulating the problem and in solving the formulated problem. These challenges include (but are not limited to) the following factors:

Variety of privacy and utility definitions and metrics

The first and most fundamental problem is: what is "privacy"? More specifically, how do we define and quantify privacy? For different applications, the privacy concerns may differ, and it is understandable that their privacy metrics differ as well. However, based on a state-of-the-art survey, even for the same application there are various definitions of privacy due to different demands and uses. The situation is the same for utility. For example, for PPDM, privacy definitions include information-theoretic privacy [22], (ρ1, ρ2)-privacy [7], (ε, δ)-differential privacy [54], (ε, δ)-semantic privacy [99], and concentrated differential privacy [59]. Utility (or utility loss) definitions include min-entropy leakage, maximum norm, and expected norm (between the perturbed answer and the true answer), and the norm definitions themselves also vary. Although some works [99, 180] compare definitions, such comparisons are difficult because the definitions are motivated by different points of view; consider, for example, information-theoretic privacy versus (ε, δ)-differential privacy. It can be a non-trivial task to propose proper measures for utility and privacy based on the application and the associated demands.

Inconsistent user preferences

A user may have different privacy preferences over different attributes. For a given attribute, different users may also have different preferences with respect to privacy. For example, some people may have concerns about releasing their occupation to untrusted third parties, as their occupation could be used to infer their (range of) annual income, their social status, or a potential disease they might have; other people may not have such concerns, because their occupation may be only weakly related to, or highly dynamic with respect to, sensitive information. Consequently, users' privacy preferences for a given attribute can be difficult to understand and to measure.

Difficulties in understanding and modeling side-information

Side-information plays a critical role in privacy-preserving mechanism design. Side-information comes from various sources, including research studies, common knowledge, surveys, social networks, and so on. Strong side-information can make privacy preservation impossible.
In addition, to be able to determine the (quantified) privacy level, we need to be able to formulate the side-information math- ematically, i.e., in terms of probabilities such as a prior distribution. However, not all kinds of side information can be mathematically described. This causes difficulties and challenges in the optimal privacy-preserving mechanism design. Statistical inference between attributes and dependencies among attribute values Statistical inference between attributes and dependencies among attribute data/values cause many difficulties in optimal privacy-preserving mechanism design, for both PPDM and PPDP. Strong inference between attributes can leak users’ private informa- tion if the privacy-preserving mechanism is not carefully designed. For instance, con- sider a released medical database in whichfDancer, Female, 2030g! HIV is 100%. If Emily is known to her friends to be a dancer (her age and gender are assumed already known to her friends), her friends can infer that she has HIV with 100% confidence if she opts into the released medical database. Dependencies among attribute values can hurt privacy as well. If an adversary knows all non-sensitive information of a targeted individual, he is able to identify which tuple the target’s record belongs to (based on a generalized record). If the corresponding values of the tuple in the attribute “disease” are “gastric ulcer”, “gastritis”, and “stom- ach cancer”, then the adversary at least knows that the target has a stomach disease. Moreover, dependencies between non-numeric attribute values can be hard to capture, 8 since their relationships may not form a single hierarchical tree structure and can be in- tricate. For example, for the attribute “Nationality”, Taiwan, Japan, and Philippines are (can be generalized as) “island countries”, Philippines, Singapore, and China are “Asian countries”, and China, Taiwan, and Singapore are “Mandarin spoken countries”. It may require non-trivial efforts to capture all possible relationships between attributes and among non-numeric attribute values, in order to remove or reduce their dependencies and/or enhance data utility for the released database. Multiple uses of data One of the most challenging issues in preserving privacy is composition of the released anonymized data or perturbed query responses, and this can happen in all applications and all settings. Early uses of data become side-information of the later uses, and thus can seriously hurt privacy as the later uses is often highly correlated to the early uses due to information consistency. In addition, multiple adversaries can collude and share the query responses or data they obtained, and take actions to further weaken or break through the protection for users’ privacy. Such problems are challenging to the privacy engineers and researchers as it is unclear how much “privacy budget” should be set before data release as the future query stream is unknown in advance. Specifically, (1) how many queries are coming, (2) what future queries would be, and (3) the correlations between future queries and past queries, are unknown. For example, in order to achieve "-differential privacy (DP) for n queries, the “privacy budget” for each query should at least satisfy " n -DP. Privacy protectors may have to make decisions for privacy budget without knowingn and other information in advance. If the protection tends to be too conservative for any contingency in the future, the data utility would be seriously hurt. 
Difficulties in computing parameters and solving the optimization problem

In addition to the above-mentioned issues, computing privacy parameters or solving the optimization problem can be hard in its own right. For example, for PPDP, it has been shown that achieving optimal k-anonymization is NP-hard. For PPDM, differential privacy (DP) should be guaranteed within the range of global or smooth sensitivity. Global sensitivity denotes the largest possible difference in query outputs over all pairs of datasets at Hamming distance one, i.e., neighboring datasets. It needs to be pre-computed in order to understand how much noise should be added by a DP mechanism. However, for most queries it is very challenging to obtain the corresponding global sensitivity; this is true even for common queries, such as variance. A common approach to bypass this difficulty is to derive an upper bound; however, such an approach results in over-protection of privacy (i.e., low data utility), as global sensitivity is essentially the worst-case bound of sensitivity that does not violate DP, and using a looser upper bound makes data utility even worse. Smooth sensitivity [127], a tighter version of sensitivity that still does not violate DP, was proposed for better data utility. However, the computation of smooth sensitivity is extremely difficult, making it intractable in practice.

1.4 Research Motivations

Ideally, an optimal privacy-preserving mechanism design should address all the above-mentioned challenges, which makes the problem extremely difficult. In fact, based on a state-of-the-art survey, most works on optimal privacy mechanism design and utility-privacy tradeoffs focus mainly on only one or two of the above-mentioned challenges and make assumptions about the others to make the problems solvable. In particular, very few works address optimal privacy-preserving mechanism design in the presence of side-information. However, side-information widely exists and is readily available. It is important, and also interesting, to understand how side-information can influence and contribute to optimal privacy-preserving mechanism design, in all of the privacy, data utility, and mechanism tractability aspects.

In this dissertation, we investigate the utility-privacy tradeoff and optimality in privacy-preserving perturbation mechanism design in both the privacy-preserving database-mining (PPDM) and privacy-preserving database-publishing (PPDP) paradigms. Specifically, in PPDM, we focus on optimal oblivious differentially-private (DP) noise-adding mechanism design in the presence of side-information. An oblivious privacy-preserving mechanism depends only on the domain of query outputs/responses, regardless of the database. Moreover, to be precise, side-information here refers to prior knowledge of the desired output; in particular, in PPDM it represents the prior probability distribution of query responses over the query output domain, e.g., prior knowledge of the distribution of average salary.

Our research motivation was stimulated by the state-of-the-art, well-known oblivious DP noise-adding mechanisms: the Laplacian [56], Geometric [79, 88], and Staircase [76] mechanisms. The Laplacian mechanism is not optimal, but it is the most popular one due to its tractability: the mechanism is very simple and the noise can be easily generated.
The Geometric mechanism is a discretized Laplacian. It is universally optimal, a very strong notion of optimality that is insensitive to the type of prior knowledge and utility loss function, but only for the special case of count queries, in which the global sensitivity is one. The Staircase mechanism is optimal for arbitrary global sensitivity under the risk-averse model (in which prior knowledge is assumed unknown and is not utilized), but the optimality holds only for symmetric non-decreasing utility loss functions and an unbounded query output domain. In fact, optimal DP noise-adding mechanism design that utilizes prior knowledge has not been investigated by researchers, except for the Geometric mechanism for count queries. It is interesting and important to understand how much utility can be improved by fully utilizing prior knowledge of query responses, and whether the optimal mechanism retains a certain tractability. Therefore, in Chapter 5, we study the utility-privacy tradeoff and investigate an optimal design of DP noise-adding mechanisms under the Bayesian model, in which prior knowledge is utilized and the objective is to minimize expected utility loss. We investigate the problem in the global setting, i.e., a trusted curator possessing users' data is responsible for answering queries without leaking users' private information. In particular, we focus on single-dimensional numeric (scalar) query functions and oblivious mechanism design.

For PPDP, in Chapter 4, we study the utility-privacy tradeoff and optimal privacy-preserving mechanisms for decision announcement in algorithmic transparency reports. Specifically, given the growing concern about discrimination, people want to know the reasons behind a certain decision in order to understand whether the decision maker treats them fairly. To make the decision blackbox transparent, a trustworthy regulatory agency should use its authority to acquire the necessary data and provide meaningful information in an algorithmic transparency report disclosing how decisions are made. In addition, an algorithmic transparency report should disclose which input factors, including all sensitive and non-sensitive input attributes, contribute to the decision, how they are associated with the decision, independently or jointly, and by how much they influence the decision. We consider a data-publishing framework, in which an algorithmic transparency report is presented in the form of a table, with records in non-sensitive attributes, sensitive attributes, and the corresponding decisions (or their mapping to the decisions, e.g., the probability of receiving a certain decision). As in conventional PPDP problems, to prevent identification of a person from published attribute values, the announced decision-associated attribute values should be perturbed/generalized. Based on an announced algorithmic transparency report, data analyzers measure the fairness of the decision with respect to each protected attribute, e.g., gender, race, etc., to understand whether the decision maker has a preference for a particular group of people. However, bluntly disclosing the decision-making policy could seriously hurt users' privacy. An adversary may know all non-sensitive records of an individual, as well as the decision he received.
If an announced algorithmic transparency report bluntly discloses the clear mapping from all sensitive and non-sensitive input attributes to the decision, an adversary can exploit it, with the assistance of side-information about statistical inference between sensitive and non-sensitive attributes, to narrow down the sensitive records of targeted individuals. In this regard, a privacy-preserving mechanism for decision announcement in algorithmic transparency reports is crucial.

However, previously proposed measures for utility and privacy in PPDP settings are not applicable to our problem. The fundamental reason is that measures of privacy and data utility for different applications may be modeled differently. The most closely related application in PPDP is the classification problem [179, 178, 73, 72]. However, our goal for publishing a database, the privacy threats we consider, and the side-information we assume are very different from those in previous works. Motivated by the above reasons, in this work we propose proper measures for utility, privacy, and fidelity of fairness, particularly in the context of algorithmic transparency based on a database-publishing framework.

1.5 Main Contributions

We make the following primary research contributions in this dissertation.

In Chapter 3, we explore poisoning backdoor attacks in the context of federated meta-learning and propose an effective and privacy-preserving defense mechanism. Our main contributions include:

• To the best of our knowledge, this is the first work to investigate poisoning backdoor attacks in the context of federated meta-learning. We formulate the associated threat model and experimentally explore the influence of poisoning backdoor attacks in federated meta-learning.

• Our experimental results show that the influence of poisoning backdoor attacks in federated meta-learning is long-lasting and hard for the meta model to forget through regular training and fine-tuning. As reported in [13], the accuracy of a one-shot backdoor attack in federated supervised learning degrades dramatically after tens of rounds of federated learning, even when benign users do not possess any backdoor image. Our results show that in federated meta-learning, a one-shot backdoor attack can remain above 90% accuracy on both the attack training and validation datasets after 50 rounds of federated learning during normal training, and the attack influence diminishes only very slightly after hundreds of epochs of regular fine-tuning during the meta-testing stage. We also show that simply enlarging the learning rate in regular fine-tuning during the meta-testing stage ruins the jointly trained meta model: it forgets the backdoor attacks, as well as the knowledge it has learnt from federated meta-training.

• As an effective defense mechanism, we propose a sanitizing fine-tuning process, inspired by matching networks [171], to replace the regular fine-tuning during the meta-testing stage so as to remove the effect of poisoning attacks. Our proposed sanitizing fine-tuning can completely remove the effect of poisoning backdoor attacks in only a few epochs while preserving reasonable performance of the jointly trained meta model. In addition, our defense mechanism makes no assumption about the distributions of users' training data or the fraction of benign users, and hence it can be applied to general scenarios.
Moreover, the defense mechanism is performed locally by each user; it does not require any potentially untrustworthy third party to access and examine user updates, and it is thus compatible with secure aggregation methods [28, 131]. Our defense mechanism hence fulfills the privacy-preserving motivation of federated learning.

In Chapter 4, we study the utility-privacy tradeoff and investigate optimal privacy-preserving decision announcement in algorithmic transparency reports. Our main contributions include the following:

• We expose an important privacy issue that has not been noted and carefully investigated in the privacy literature for a new type of important application, namely algorithmic transparency. Previous efforts either did not consider privacy issues in algorithmic transparency, or noted only partial privacy issues and were not aware (to our knowledge) of the critical privacy issue that we demonstrate in this thesis.

• We note several shortcomings of announcing the information of an algorithmic transparency report based on a data-mining framework. We envision algorithmic transparency in a database-publishing framework, in which an algorithmic transparency report is announced/published as a dataset (in the form of a table), with input records and the decision rule. Data analyzers can measure their quantities of interest based on the published dataset. The dataset should be published judiciously, similarly to PPDP, without leaking private information of individuals, while providing meaningful information about how decisions are made and preserving the fidelity of quantities of interest, including fairness.

• We define proper measures for privacy and fidelity of fairness in an algorithmic transparency report. In particular, we propose an appropriate measure of utility for this application. The utility represents the amount of useful information (the average reduction in uncertainty of the updated belief compared to blind knowledge) about how decisions are made that is conveyed by an announced algorithmic transparency report; our proposed measure properly characterizes the usefulness of a classification result and is also applicable to probabilistic decisions.

• Based on the proposed measures, the utility-privacy-fidelity-of-fairness tradeoff forms a convex optimization problem. We first show that this convex optimization problem can be decomposed into multiple independent sub-problems, which dramatically reduces the problem size, so that the optimal solution can be obtained efficiently. We then study the related fundamental trade-offs and properties. We derive necessary and sufficient conditions for perfect utility and perfect fidelity of fairness in terms of privacy parameters and state the feasible utility-privacy-fairness trade-off region problem; although the measures and the definitions of each term are different, this is essentially similar to the rate-distortion-equivocation problem.

In Chapter 5, we study the utility-privacy tradeoff and investigate the optimal design of oblivious DP noise-adding mechanisms under the Bayesian model. Our main contributions include the following:

• We classify the existing DP noise-adding mechanisms based on their different settings, assumptions, strengths, and limitations. For the utility-maximization framework in DP, we use experiments to understand the privacy-utility tradeoff of applying various oblivious noise-adding mechanisms to the output of single-dimensional numeric (scalar) query functions.
More specifically, given a query output domain 16 (continuous or discrete), we investigate the design of high utility preserving obliv- ious noise-adding mechanisms for a given privacy regime (high or low) under the Bayesian setting. Our study takes into consideration (i) different privacy regimes (levels of privacy strength), (ii) continuous and discrete query output domains, (iii) varied levels of query sensitivity, (iv) query side information, and (v) the presence of collusion and longitudinal attacks on a query. • In addition, our experiments help provide supporting evidence and counterexam- ples to existing theory results on the optimality of noise-adding mechanisms when they are tested on a relaxed assumption set. The experimental results also provide us with conjectures on appropriate (in the sense of privacy-utility tradeoffs) oblivi- ous noise-adding mechanisms selection for scalar queries with side information in Bayesian user settings, for which a general theory is yet to be developed. Follow- ing the study and our experimental results, we propose interesting and important open questions for the theory community in relation to the design and analysis of provably optimal oblivious DP mechanisms. We also point out unexplored field for optimal oblivious DP noise-adding mechanism design. • In particular, we propose a heuristic side-information-aware oblivious DP mech- anism design, which significantly improve utility in the low and intermediate pri- vacy regimes. Moreover, based on partial available side information, we also find that it is possible to design a meaningful mechanism which aids utility signif- icantly without knowing the actual prior. Our heuristic side-information-aware design shows that the optimal side-information-aware design is not trivial. In addition, in this heuristic design, we note that the pre-rounding stage prevents 17 the expected loss from converging to zero under collusion attacks, due to its ir- reversibility. This suggests interesting directions for the design of a collusion prevention mechanism. • For the optimal side-information-aware oblivious DP mechanism design, we for- mulate the optimization problem, analyze necessary-and-sufficient optimal con- ditions, and propose important properties. Our goal is to find a tractable solution for the optimal side-information-aware design; however, we find that our goal to this problem is quite challenging, and we wish to propose a tractable and optimal solution in future work. 18 Chapter 2 Literature Survey and Background Knowledge 2.1 Poisoning Backdoor Attacks and the Defense Mech- anisms Poisoning backdoor attack [86, 38] has been demonstrated effective on various kinds of machine learning models. [13] and [23] further investigated the attack in the context of federated (supervised) learning [118] and again showed the effectiveness and the stealthiness of the attack. Due to the feature of federated learning that users do not share data but only the updates of the model [118], techniques which certifying data belonging to the correct class [101, 130] are not applicable in such a context, and thus associated defense remains a challenging problem. Several defense mechanisms have been proposed: [156] and [132] estimate the distribution of the training data to suppress the influence of outliers based on the i.i.d. assumption. 
The same assumptions are made in [152, 27, 39, 185, 183, 121, 74], where outliers are detected and removed according to slightly different measures derived from the range of benign values, i.e., benign users are assumed to be the majority. Despite the limitations imposed by those assumptions, [13, 23, 18] claim and/or show successful backdoor attacks circumventing the above defenses. To the best of our knowledge, the methods and effects of poisoning backdoor attacks in federated meta-learning, as well as the associated defenses, have not been explored in the literature.

2.2 Privacy Metrics for Data Publishing

Data publishers, such as governments, hospitals, and organizations, publish databases for research, analytics, and service purposes. A published database takes the form of a table of records of individuals. Each column in the table is an attribute of the database, and each row is a multi-dimensional tuple representing the collection of information/records of a single entity in the underlying population. From the published database, researchers or data analyzers want to learn relations between attributes, which in general can be classified into two categories: non-sensitive (or public) attributes and sensitive (or private) attributes. An individual's records in public attributes could be known by someone, but his records in private attributes are unknown to anyone before data publishing. Privacy issues in data publishing arise when a published database contains both. Privacy-preserving data-publishing (PPDP), roughly speaking, should keep those private records unidentifiable in the published database.

Specifically, privacy leakage in data publishing occurs when an adversary is able to identify an individual's sensitive records from the published database. This can be achieved by using identity records and/or public records to identify a single entity. Once an entity is identified, the records in the whole corresponding row, including all published private records, are known by the adversary. To prevent such an attack, the early idea of anonymization was proposed to prevent identification [42]: any identity information, such as name or social security number, shall be removed or suppressed in the published database. Moreover, an individual's records in public attributes, such as gender, age, zip code, etc., are non-sensitive and can be known by other people. If any of these records is unique enough, it can be utilized for identification, and thus generalization and/or suppression of public records is required. Such a pre-processed public record is called an anonymized public record. Each anonymized public record does not uniquely identify a record owner; however, their combination, called the quasi-identifier (QID) [43], can still be utilized to identify a unique person or a small group of people. Such an identification from a set of anonymized records is called re-identification. A classic example in [163] has shown that around 87% of the U.S. population can be uniquely re-identified by the QID = {Gender, Zip code, Date-of-birth}.

2.2.1 k-anonymity

To prevent re-identification from any instance qid of QID, Samarati and Sweeney proposed a notion called k-anonymity in 1998 [142, 143]. The concept of k-anonymity is the following: if an anonymized tuple of public records of a single entity equals an instance qid, then there must be at least k-1 other anonymized tuples of public records, belonging to k-1 other entities, equal to qid.
The idea is to make these k tuples indistinguishable from each other. These k anonymized tuples with the same qid form a QID group, also called an equivalence class. In other words, the chance that an adversary can correctly identify a certain targeted person in an equivalence class cannot be better than 1/k, which is the probability of random guessing within a QID group (a minimal programmatic check of this property is sketched after Table 2.3 below). Note that all public attributes should be included in QID. If a sensitive/private record is known to an adversary before data publishing, then such a record should be considered a public record and included in QID as well.

Table 2.1 is a microdataset with 12 people's records in three non-sensitive attributes {Zip code, Age, Nationality} and one sensitive attribute {(Health) Condition}. Table 2.2 is a realization of 4-anonymity for Table 2.1. We can easily observe that there exist many ways to realize 4-anonymity for this microdataset; for example, Table 2.3 is another realization of 4-anonymity.

Table 2.1: Inpatient Microdata [113]

      Non-Sensitive                      Sensitive
      Zip Code   Age    Nationality     Condition
  1   13053      28     Russian         Heart Disease
  2   13068      29     American        Heart Disease
  3   13068      21     Japanese        Viral Infection
  4   13053      23     American        Viral Infection
  5   14853      50     Indian          Cancer
  6   14853      55     Russian         Heart Disease
  7   14850      47     American        Viral Infection
  8   14850      49     American        Viral Infection
  9   13053      31     American        Cancer
 10   13053      37     Indian          Cancer
 11   13068      36     Japanese        Cancer
 12   13068      35     American        Cancer

Table 2.2: 4-Anonymous Inpatient Microdata [113]

      Non-Sensitive                      Sensitive
      Zip Code   Age    Nationality     Condition
  1   130**      < 30   *               Heart Disease
  2   130**      < 30   *               Heart Disease
  3   130**      < 30   *               Viral Infection
  4   130**      < 30   *               Viral Infection
  5   1485*      ≥ 40   *               Cancer
  6   1485*      ≥ 40   *               Heart Disease
  7   1485*      ≥ 40   *               Viral Infection
  8   1485*      ≥ 40   *               Viral Infection
  9   130**      3*     *               Cancer
 10   130**      3*     *               Cancer
 11   130**      3*     *               Cancer
 12   130**      3*     *               Cancer

Apparently, for a large database there exists a tremendous number of realizations for any given k-anonymity. The main challenge is to obtain the "best" one, i.e., to achieve k-anonymity with minimal information loss, or maximal data utility; unfortunately, it has been shown that this problem is NP-hard [5, 81, 120].

Table 2.3: 4-Anonymous 3-Diverse Inpatient Microdata [113]

      Non-Sensitive                      Sensitive
      Zip Code   Age    Nationality     Condition
  1   1305*      ≤ 40   *               Heart Disease
  4   1305*      ≤ 40   *               Viral Infection
  9   1305*      ≤ 40   *               Cancer
 10   1305*      ≤ 40   *               Cancer
  5   1485*      > 40   *               Cancer
  6   1485*      > 40   *               Heart Disease
  7   1485*      > 40   *               Viral Infection
  8   1485*      > 40   *               Viral Infection
  2   1306*      ≤ 40   *               Heart Disease
  3   1306*      ≤ 40   *               Viral Infection
 11   1306*      ≤ 40   *               Cancer
 12   1306*      ≤ 40   *               Cancer
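The programmatic check referenced above is a minimal sketch in Python, using a toy copy of Table 2.2's quasi-identifier columns: it computes the k for which a generalized table is k-anonymous, namely the size of its smallest equivalence class.

from collections import Counter

# Toy copy of the quasi-identifier columns (Zip code, Age) of Table 2.2 above.
qid_tuples = (
    [("130**", "< 30")] * 4 +
    [("1485*", ">= 40")] * 4 +
    [("130**", "3*")] * 4
)

def anonymity_level(qid_tuples):
    """Return the k for which the table is k-anonymous, i.e. the size of the
    smallest equivalence class (group of identical QID tuples)."""
    return min(Counter(qid_tuples).values())

print(anonymity_level(qid_tuples))  # 4 -> the table is 4-anonymous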
A dataset is said to be l-diverse if the above statement is satisfied for every sensitive attribute in the dataset. Several instantiations of l-diversity have been proposed and analyzed. The most intuitive definition is to ensure that there are at least l "distinct values", and the corresponding instantiation is called distinct l-diversity. Table 2.3 is a 4-anonymous, distinct 3-diverse micro dataset, since for each qid group that appears there are 3 distinct diseases associated with it. However, distinct l-diversity cannot prevent probabilistic inference attacks. This is because the events corresponding to some sensitive values are more likely to occur in nature than others, and thus sensitive information may still leak: even if we enforce distinct l-diversity in the dataset, an adversary can attribute sensitive values to the high-frequency ones and thereby increase the probability of guessing correctly. This issue motivates the following stronger notions of l-diversity.

The first notion is called entropy l-diversity. Instead of literally requiring l well-represented sensitive values, it requires the entropy of the distribution of sensitive values in each qid group to be at least a threshold $\log l$, where $P(qid, s)$ denotes the fraction of tuples in the equivalence class qid whose sensitive value is $s \in S$ for the sensitive attribute $S$. Specifically, it requires an anonymized table to be entropy l-diverse for every qid group:

$-\sum_{s \in S} P(qid, s)\,\log\big(P(qid, s)\big) \ \ge\ \log l.$   (2.1)

One drawback of entropy l-diversity is that it does not convey an intuitive measure of the probabilistic risk level. For example, it is hard to see that entropy 1.8-diversity represents a risk level at which an adversary has a 75% chance of successfully inferring a sensitive value. Another drawback is that it is difficult to specify different protection levels based on the varied sensitivity and frequency of sensitive values [174].

The second notion is called recursive (c, l)-diversity. An anonymized table satisfies recursive (c, l)-diversity if every qid group in the table satisfies (c, l)-diversity. Instead of literally requiring l well-represented sensitive values in a qid group (of size k), (c, l)-diversity limits the frequency of the most frequent sensitive value to be bounded by c times the sum of the frequencies of the k − l + 1 least frequent sensitive values. Specifically, let $f_i$ denote the frequency of the i-th most frequent sensitive value; then we require

$f_1 \ <\ c \sum_{i=l}^{k} f_i,$   (2.2)

where c is a constant specified by the data publisher. However, the drawback of (c, l)-diversity is that it cannot be used to bound the frequency of sensitive values that are not the most frequent, and therefore its flexibility is limited.

2.2.3 t-closeness

l-diversity takes an important step beyond k-anonymity toward protecting users' privacy from attribute linkage attacks. However, it still has many limitations and shortcomings. First, it may not be feasible for arbitrary l due to limitations of the output domains of sensitive attributes. For example, consider the following case: a sensitive attribute "Test result of HIV" contains only two values, positive and negative. Any dataset containing a binary attribute like this cannot achieve more than 2-diversity. Second, l-diversity may not always be necessary. Recall the HIV example: a person would not mind being known to have tested negative, but would not want to be suspected of having tested positive.
It is not necessary to require both negative and positive to appear in every qid group to achieve 2-diversity. Third, l-diversity implicitly assumes that sensitive values are uniformly distributed. If this is not the case, implementing l-diversity results in serious degradation of data utility. Recall the HIV example again, and assume the probability of a positive test result is only 0.1%, i.e., in expectation there is only 1 positive result per 1000 records. If we enforce 2-diversity on a dataset with 10000 records, we can divide the dataset into at most 10 groups, where each group needs to contain at least 1000 records with the same qid, and thus data utility is dramatically hurt. Moreover, if there exists a QID group with 10% positive and 90% negative records, it indubitably satisfies 2-diversity; however, such a group exposes its record owners to a serious privacy threat, because any individual in such a group could be inferred as having HIV with 10% confidence, which is 100 times the ground truth of 0.1%. In short, when the overall distribution of a sensitive attribute is skewed, l-diversity does not prevent attribute linkage attacks [50]. Such an attack due to skewness of the distribution of sensitive values is called a skewness attack, and l-diversity has inherent limits in tackling it.

To prevent skewness attacks, Li et al. proposed another principle called t-closeness [110], which requires the distribution of sensitive values in any qid group to be "close enough" to the empirical prior distribution of the corresponding sensitive attribute. The closeness between two distributions is measured by the Earth Mover's Distance (EMD) and parametrized by the parameter t. Nevertheless, t-closeness has several drawbacks. First, EMD is not suitable for preventing attribute linkage on numerical sensitive attributes [50]. Second, it lacks the flexibility of specifying different privacy levels (i.e., closeness) for different sensitive attributes. Finally, the requirement of being "close enough" to the prior distribution for all distributions of sensitive values in all QID groups is too strict, which significantly damages data utility and destroys the correlation between QID and sensitive attributes. Some suggestions on relaxations of t-closeness to improve data utility can be found in [50].

2.2.4 Confidence Bounding

Both l-diversity and t-closeness have a common drawback: the privacy level is controlled by a single parameter, and thus both lack the flexibility of specifying different privacy levels for different sensitive attributes. Confidence bounding, proposed by Wang et al. [176, 177], allows the data publisher to specify a different threshold h for each combination of an instance qid in QID and a specific sensitive value s in attribute S. Specifically, a threshold h limits the maximal allowed percentage of the records (qid, s), for all qid in QID. From the adversary's perspective, it limits the confidence (in the probability sense) of inferring a sensitive value s from QID, which can be denoted as $\mathrm{Conf}(\mathrm{QID} \to s) = \max_{qid \in \mathrm{QID}} \{\mathrm{Conf}(qid \to s)\} \le h$.
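As an illustration of the group-level quantities behind entropy l-diversity (2.1) and confidence bounding, the sketch below (not part of the cited works; column names hypothetical) derives, for each QID group, the empirical distribution of sensitive values, its entropy, and the maximum confidence an adversary can attain for any sensitive value in that group.

```python
import math
from collections import Counter, defaultdict

def group_statistics(rows, qid, sensitive):
    """For each QID group, return (entropy of the sensitive-value
    distribution, max confidence Conf(qid -> s) over sensitive values s)."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[a] for a in qid)].append(row[sensitive])
    stats = {}
    for g, values in groups.items():
        counts = Counter(values)
        n = len(values)
        probs = [c / n for c in counts.values()]
        entropy = -sum(p * math.log(p) for p in probs)
        max_conf = max(probs)  # confidence of the most frequent sensitive value
        stats[g] = (entropy, max_conf)
    return stats

def satisfies_entropy_l_diversity(stats, l):
    """Condition (2.1) for every QID group."""
    return all(entropy >= math.log(l) for entropy, _ in stats.values())

def satisfies_confidence_bound(stats, h):
    """Conf(QID -> s) <= h for every sensitive value s and every group."""
    return all(conf <= h for _, conf in stats.values())
```

For instance, `satisfies_confidence_bound(stats, 0.5)` would verify that no sensitive value can be inferred from any QID group with more than 50% confidence.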
As mentioned earlier, confidence bounding has the advantage of being more intuitive and more flexible than entropy l-diversity and recursive (c, l)-diversity; however, extended versions of recursive (c, l)-diversity, known as positive disclosure-recursive (c, l)-diversity and negative/positive disclosure-recursive (c1, c2, l)-diversity, can prevent attribute linkages in the presence of side information, whereas confidence bounding does not have such an advantage. In addition, unlike t-closeness, it does not have the ability to prevent attribute linkages from skewness attacks.

The original intention of data publishing is to give out information, i.e., relations between attributes, for research, medical, or other purposes. Preserving privacy is indeed a very important aspect of data publishing, but how can we do it meaningfully? Consider the following extreme case: a data publisher suppresses all public attributes in the published database in order to protect privacy. Such a privacy-preserving method is meaningless, because the published database does not give out any useful information to the public, which violates the original intention of publishing it. Therefore, on the premise of meaningfulness, privacy preservation should never be a standalone problem, and it shall always be done together with conservation of data utility. However, it is apparent that retaining data utility and preserving privacy are always in contradiction. To retain data utility while preserving privacy, it is very important to understand the tradeoff between the two.

To understand this tradeoff, we first need adequate measures/metrics for data utility, i.e., a definition of the usefulness of a published (anonymized) database, whose tradeoff with privacy constraints/requirements such as k-anonymity, l-diversity, t-closeness, and so on, can then be analyzed. In the following, we review related works on the utility-privacy tradeoff in PPDP based on their proposed data utility metrics.

2.3 Utility Metrics for Data Publishing

The loss of data utility comes from privacy-preserving methods, such as anonymization, suppression, or perturbation. Therefore, a reasonable utility (or loss) metric should measure the "similarity" (or "distortion") between the original data and the anonymized data. Several different utility metrics have been proposed over the past two decades.

The Minimum Distortion metric (MD) [160] simply counts the number of generalized or suppressed entries in the database. In other words, a binary loss/cost/penalty is charged to each instance of a value: 0 if the entry is not generalized, and 1 otherwise. This binary single-attribute measure was widely used in the literature [43, 5, 120, 141, 162, 175] due to its simplicity; however, such a measure of utility loss is very coarse. Samarati [141] proposed a more general penalty function for generalization, called Generalization Height (GH). Instead of giving a binary penalty to all generalized entries in the database, GH gives a penalty (i − 1)/(h − 1) to an entry if it is generalized to the i-th finest hierarchical clustering level out of h levels in total, i = 1, ..., h. This metric is between 0 (i = 1, no generalization at all) and 1 (i = h, total suppression). It is also used in [5] for approximation algorithms. Another more precise and more general metric was proposed in [95], known as the Loss Metric (LM).
Unlike GH, which defines the penalty as a ratio of hierarchical level indexes, the authors of [95] define the penalty as the ratio of the size of the generalized subset G to the size of the domain of the corresponding attribute A, i.e., (|G| − 1)/(|A| − 1). The penalty (to a single entry) is again between 0 (|G| = 1, no generalization at all) and 1 (|G| = |A|, total suppression). The overall utility loss of a table/dataset is the average penalty. This metric was also used in [125] and [182]. Some works [33, 78] used a combination of LM and GH: LM measures the cost for numerical attributes, and GH measures the cost for categorical attributes.

The above three metrics charge a penalty to each entry in the database, and the utility loss of a published database is the averaged overall penalty due to generalization and suppression. In [125], the authors proposed the Ambiguity Metric (AM), which charges a penalty to a generalized tuple over all public attributes. The charged penalty is the size of the Cartesian product of all entries in a generalized tuple, so its range is from 1 to $|A_1 \times A_2 \times \cdots \times A_p|$, where $A_i$ is the i-th public attribute, i = 1, ..., p. The drawback of AM is that it counts all possible combinations of generalized values in attributes, including those that do not appear in the database.

All the above metrics charge a penalty for generalizing a value in an entry, or a set of values in a tuple, by measuring the indistinguishability of values in the generalized subset. In contrast, the Discernibility Metric (DM) charges a penalty to each generalized tuple for being indistinguishable from other tuples with respect to QID [94]. If |E| tuples within an equivalence class E (i.e., a QID group) are indistinguishable from each other, then an "indiscernibility" penalty |E|, the size of the group, is charged to every tuple in the group, resulting in a total penalty $|E|^2$ for an equivalence class E. If an anonymized dataset D needs to maintain discernibility between tuples as much as allowed by a threshold k, then any tuple that falls into an equivalence class E of size smaller than k needs to be totally suppressed, which results in a penalty of $|E| \cdot |D|$. The overall utility loss of D under DM is therefore

$\sum_{E:\,|E| < k} |E|\,|D| \ +\ \sum_{E:\,|E| \ge k} |E|^2.$

DM was widely used in the literature, e.g., [113, 114, 20, 109, 170, 184], because it captures the concept of k-anonymity, which requires an anonymized tuple to be indistinguishable from k − 1 other tuples with respect to QID.

Kifer et al. [100] proposed a utility metric that measures the distance between distributions. Specifically, let $\hat{F}_1$ be the empirical joint distribution on the space/domain of the Cartesian product $A_1 \times A_2 \times \cdots \times A_p$ of all public attributes in the original database, and let $\hat{F}_2$ be the maximum-entropy joint probability distribution computed from the marginal distributions of the generalized tuples in the corresponding anonymized database. The information/utility loss is defined as the Kullback–Leibler (KL) divergence between $\hat{F}_1$ and $\hat{F}_2$, which is minimized when $\hat{F}_1 = \hat{F}_2$.

It is worth mentioning that the utility measures of all the above works are based entirely on public attributes and ignore private attributes. However, recall that the main goal of data publishing is to learn the relations between attributes, especially between public attributes and private attributes.
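Before turning to metrics that also account for private attributes, the sketch below (a simplified illustration, not code from the cited works; the per-value generalization sizes are assumed to be supplied by a hypothetical hierarchy description) computes the per-entry Loss Metric penalty (|G| − 1)/(|A| − 1) and the Discernibility Metric penalty of an anonymized table.

```python
from collections import Counter

def loss_metric(table, domain_sizes, gen_sizes):
    """Average LM penalty: (|G|-1)/(|A|-1) per entry, where |G| is the size
    of the generalized subset the entry was mapped to and |A| the size of the
    attribute's domain.  `gen_sizes[attr][value]` gives |G| for a generalized
    value (1 if not generalized, |A| if fully suppressed)."""
    total, count = 0.0, 0
    for row in table:
        for attr, value in row.items():
            A = domain_sizes[attr]
            G = gen_sizes[attr].get(value, 1)
            total += (G - 1) / (A - 1)
            count += 1
    return total / count

def discernibility_metric(table, qid, k):
    """DM penalty: |E|^2 for each equivalence class E with |E| >= k, and
    |E| * |D| for classes that would have to be suppressed (|E| < k)."""
    D = len(table)
    groups = Counter(tuple(row[a] for a in qid) for row in table)
    return sum(size * size if size >= k else size * D
               for size in groups.values())
```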
A utility metric that considers the influence of generalization/suppression on the relations between public and private attributes using distance measures was first proposed by Li [111]. Let $F_y$ be the true distribution of sensitive attributes in a large population y from the original dataset, and $\hat{F}_y$ the estimated distribution of sensitive attributes in the large population y from the anonymized dataset. The information/utility loss for y is defined as the Jensen–Shannon (JS) divergence between $F_y$ and $\hat{F}_y$. The overall utility loss of an anonymized dataset is the average utility loss over all large populations y. Moreover, in this work, the privacy loss of a generalized tuple t is defined as the JS divergence between the prior knowledge of a sensitive value (which is irrelevant to t and QID) and the posterior knowledge of that sensitive attribute in the equivalence class that contains t. The reason they considered JS divergence instead of KL divergence is that KL divergence is not well-defined when there are zero probabilities. The above two works assume that the data values in each database entry are independent and identically distributed (i.i.d.).

2.4 Utility-Privacy Tradeoff

In [172], privacy is defined as the asymptotic lower bound on the number of queries per bit of entropy, which represents the minimum cost for an adversary to mount a small-error inference attack, i.e., one whose maximum probability of error in estimating the sensitive data goes to zero asymptotically as the number of queries goes to infinity. They showed that this asymptotic lower bound is the inverse of the Shannon capacity of the random data-perturbation channel. Moreover, they showed that the total number of queries needs to increase to reduce the error; however, the number of queries per bit of entropy need not.

[146, 148, 145, 147, 149] study the tradeoff between utility and information-theoretic privacy in PPDP. They consider the expected distortion, i.e., the expected distance between the original values and the generalized/perturbed values of public attributes in the database, as a measure of utility, and the equivocation, i.e., the uncertainty per entry (normalized by the size of the database) of the values of sensitive attributes given the observation of the perturbed values of public attributes, as a measure of privacy. They provide asymptotic results on rate-distortion-equivocation as the size of the database grows arbitrarily large. The utility-privacy tradeoff problem is thus modeled as a communication problem in which the rates need to be bounded, and a privacy mechanism designer needs to design an "encoder-decoder" to achieve a certain asymptotic rate.

In contrast to the above works, [51] modeled non-asymptotic privacy guarantees in terms of the mutual-information inference gain that an adversary achieves by observing the released perturbed outputs of public attributes. They showed that a mechanism satisfying ε-information privacy is 2ε-differentially private, and the average information leakage is bounded by ε/ln 2 bits; however, if the possible values and size of the database, the output, and the prior can be chosen freely, differential privacy alone does not in general provide any guarantee on the amount of information leaked, in terms of either average or maximum information leakage. Similarly, [115] proposed optimal oblivious DP noise-adding mechanisms, particularly for l2-distortion, when only partial side information is known (the design is oblivious in the sense that queries are assumed deterministic, so minimizing the mutual information between sensitive data and the perturbed query output is equivalent to minimizing it between the true and perturbed query outputs).
Specifically, similar to [51], they consider expected distortion as a measure of utility and mutual-information inference as a measure of privacy. The optimization problem is basically the celebrated rate-distortion problem. If the prior distribution of the true query outputs is unknown, but only the variance is known, they showed that Gaussian noise is the optimal noise-adding mechanism in the following sense: (i) the distortion is l2-distortion, (ii) the query output domain is continuous and unbounded, and (iii) mutual-information inference is the privacy metric. They then provided bounds on the impact of mismatched prior information on the mutual-information privacy-utility tradeoff in [140]. Moreover, since the optimization may face scalability issues and become intractable when the size of the database is very large, in [140] and [139] they introduced a quantization step to reduce the optimization size and showed how to generate privacy mappings under quantization.

In [116], the log-loss function is considered in particular as the measure of distortion in the rate-distortion formulation, and the resulting optimization problem is named the privacy funnel. They showed that any bounded loss function can be upper bounded by an explicit function of the mutual information between the unperturbed and perturbed data/query outputs. They then compared the privacy funnel with the information bottleneck [165]. Specifically, the information bottleneck method tries to maximize $I(\tilde{X}; S)$, the mutual information between the perturbed query outputs/generalized data $\tilde{X}$ and the sensitive data $S$ in data mining/data publishing, subject to the information bottleneck constraint $I(X; \tilde{X}) \le R$ for some rate R between the unperturbed and perturbed values ($X$ and $\tilde{X}$) of the query outputs/public attributes in data mining/data publishing, respectively. The privacy funnel does the opposite: it tries to limit the inference from $\tilde{X}$ to $S$, i.e., to minimize $I(\tilde{X}; S)$, subject to the log-loss distortion constraint $I(X; \tilde{X}) \ge R$ (an upper-bounded distortion implies a lower-bounded rate). As the privacy funnel optimization is a minimization problem over a non-convex function, they provided a greedy algorithm for finding a local minimum instead of a globally optimal solution.

[12] and [34] then studied perfect privacy based on the above privacy funnel work [116] and followed the same setting, in which both privacy and utility are measured in terms of mutual information. More specifically, nontrivial perfect privacy means that the released perturbed data/query outputs leak no information about the secret but provide a nontrivial amount of useful information, i.e., $I(\tilde{X}; S) = 0$ with $I(X; \tilde{X}) > 0$. [12] characterized the utility-privacy tradeoff in the case of perfect privacy and provided an upper bound on the privacy funnel for binary databases. Independently, [34] also provided an upper bound and a lower bound for the privacy funnel. Both showed that perfect privacy can be achieved if and only if $\tilde{X}$ and $S$ are statistically independent.
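To make the two quantities traded off in the privacy funnel concrete, the following numerical sketch (an illustration only, not the algorithm of [116]; the joint distribution and mapping are assumed for the example) computes the leakage $I(S; \tilde{X})$ and the utility $I(X; \tilde{X})$ from a joint distribution p(S, X) and a candidate privacy mapping p(X̃ | X).

```python
import numpy as np

def mutual_information(p_xy):
    """I(X;Y) in bits from a joint pmf given as a 2-D array."""
    p_xy = np.asarray(p_xy, dtype=float)
    px = p_xy.sum(axis=1, keepdims=True)
    py = p_xy.sum(axis=0, keepdims=True)
    mask = p_xy > 0
    return float((p_xy[mask] * np.log2(p_xy[mask] / (px @ py)[mask])).sum())

# Assumed joint distribution p(S, X): rows index S, columns index X.
p_sx = np.array([[0.30, 0.10],
                 [0.10, 0.50]])
# Candidate privacy mapping p(X~ | X): rows index X, columns index X~.
p_xt_given_x = np.array([[0.8, 0.2],
                         [0.3, 0.7]])

p_x = p_sx.sum(axis=0)                   # marginal of X
p_x_xt = p_x[:, None] * p_xt_given_x     # joint p(X, X~)
p_s_xt = p_sx @ p_xt_given_x             # joint p(S, X~) via the chain S - X - X~

print("utility  I(X;X~):", mutual_information(p_x_xt))
print("leakage  I(S;X~):", mutual_information(p_s_xt))
```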
[34] further showed that perfect privacy can be achieved if and only if the optimal privacy-utility coefficient

$v^{*}(p_{S,X}) \triangleq \inf_{p_{\tilde{X}|X}} \dfrac{I(S; \tilde{X})}{I(X; \tilde{X})}$

is zero, which can happen if the smallest principal inertia component [34, 52] is zero, or if the cardinality of the domain of $X$ is larger than that of $S$. Moreover, they provided an explicit lower bound on the largest amount of useful information that can be disclosed under perfect privacy.

[19] studied the utility-privacy tradeoff region for the following three settings: (i) full data ($(S, X) \to \tilde{X}$) [133, 51], (ii) the privacy funnel ($S \to X \to \tilde{X}$) [116], and (iii) the information bottleneck method ($X \to S \to \tilde{X}$) [165]. They showed that the utility-privacy region of (iii) is no larger than the region of (ii), which in turn is clearly no larger than the region of (i).

2.5 Fairness Metrics

2.5.1 Measures for Individual Fairness

Definition 1 ($(D, \mathsf{D})$-Individual Fairness [53]). Given a distance measure $D : \mathcal{R}_X \times \mathcal{R}_X \to \mathbb{R}_+ \triangleq [0, \infty)$ on individuals' records, a decision mapping $\mathcal{D} : \mathcal{R}_X \to \Delta(\mathcal{A})$ satisfies individual fairness if it complies with the $(D, \mathsf{D})$-Lipschitz property for every two individuals' records $x_1, x_2 \in \mathcal{R}_X$, i.e.,

$\mathsf{D}\big(\mathcal{D}(x_1), \mathcal{D}(x_2)\big) \ \le\ D(x_1, x_2),$   (2.3)

where $\mathsf{D} : \Delta(\mathcal{A}) \times \Delta(\mathcal{A}) \to \mathbb{R}_+$ is a distance measure on distributions over $\mathcal{A}$. Moreover, we say $\mathcal{D}$ satisfies individual fairness up to bias ε if for all $x_1, x_2 \in \mathcal{R}_X$ we have

$\mathsf{D}\big(\mathcal{D}(x_1), \mathcal{D}(x_2)\big) \ \le\ D(x_1, x_2) + \varepsilon.$   (2.4)

Individual fairness ensures that a decision mapping maps similar people similarly. When two individuals' records $x_1$ and $x_2$ are similar, i.e., $D(x_1, x_2) = 0$, the Lipschitz condition in equation (2.3) ensures that both records map to similar distributions over $\mathcal{A}$. Candidates for the distance measure $\mathsf{D}$ include (but are not limited to) the statistical distance and the relative $l_\infty$ metric. The relative $l_\infty$ metric (a.k.a. relative infinity norm) of two distributions $Z_1$ and $Z_2$, defined as

$\mathsf{D}_\infty(Z_1, Z_2) = \sup_{a \in \mathcal{A}} \log \max\left\{ \dfrac{Z_1(a)}{Z_2(a)},\ \dfrac{Z_2(a)}{Z_1(a)} \right\},$   (2.5)

is considered a potentially better choice in that it does not require the distance measure $D$ to be re-scaled within [0, 1] (such normalization can bring a non-trivial burden, especially when the maximal distance can be arbitrarily large). However, it has the shortcoming of being sensitive to small probability values. The statistical distance, or total variation norm, of two distributions $Z_1$ and $Z_2$, defined as

$\mathsf{D}_{tv}(Z_1, Z_2) = \dfrac{1}{|\mathcal{A}|} \sum_{a \in \mathcal{A}} \big|Z_1(a) - Z_2(a)\big|,$   (2.6)

is a more stable measure in this respect.

2.5.2 Measures for Group Fairness

Popular measures for group fairness include (but are not limited to) statistical parity (SP) (a.k.a. demographic parity) [53, 65, 187, 98], conditional statistical parity (CSP) [97, 41], the p-% rule (PR) [65, 24], accuracy parity (a.k.a. equalized odds) [90], and true positive parity (a.k.a. equal opportunity) [90]. However, the last two measures require knowledge of labeled outputs and are thus particularly used to train fair ML algorithms in supervised learning. For algorithmic transparency, we use the former three measures for group fairness. Define $g(X)$ as a projection function from the input attributes $X$ onto a group in the protected attributes, and $T_{\mathcal{Y}} \triangleq \{x \in \mathcal{R}_X \mid g(x) \in \mathcal{Y}\}$ as the set/tuple of records belonging to a protected group $\mathcal{Y}$. We summarize the definitions of measures for group fairness in the following:
Definition 2 (Statistical Parity). A decision mapping $\mathcal{D} : \mathcal{R}_X \to \Delta(\mathcal{A})$ satisfies statistical parity for two groups $\mathcal{Y}_1$ and $\mathcal{Y}_2$ up to bias ε if for every decision outcome $a \in \mathcal{A}$ we have

$\mathsf{D}_{tv}\big(\mathbb{E}[\mathcal{D}_a(X) \mid T_{\mathcal{Y}_1}],\ \mathbb{E}[\mathcal{D}_a(X) \mid T_{\mathcal{Y}_2}]\big) \ \le\ \varepsilon.$   (2.7)

Definition 3 (Conditional Statistical Parity). Given a score/valuation function $v(X)$ based on the input attributes $X$, define $T_{\mathcal{Y}, \mathcal{V}} \triangleq \{x \in \mathcal{R}_X \mid g(x) \in \mathcal{Y},\ v(x) \in \mathcal{V}\}$ as the set/tuple of records belonging to a protected group $\mathcal{Y}$ with scores in a set $\mathcal{V}$. A decision mapping $\mathcal{D} : \mathcal{R}_X \to \Delta(\mathcal{A})$ satisfies conditional statistical parity, given the same score conditions $\mathcal{V}$, for two groups $\mathcal{Y}_1$ and $\mathcal{Y}_2$ up to bias ε if for every decision outcome $a \in \mathcal{A}$ we have

$\mathsf{D}_{tv}\big(\mathbb{E}[\mathcal{D}_a(X) \mid T_{\mathcal{Y}_1, \mathcal{V}}],\ \mathbb{E}[\mathcal{D}_a(X) \mid T_{\mathcal{Y}_2, \mathcal{V}}]\big) \ \le\ \varepsilon.$   (2.8)

Definition 4 (p-% Rule). A decision mapping $\mathcal{D} : \mathcal{R}_X \to \Delta(\mathcal{A})$ satisfies the p-% rule for two groups $\mathcal{Y}_1$ and $\mathcal{Y}_2$ if for every decision outcome $a \in \mathcal{A}$ we have

$\log \dfrac{\mathbb{E}[\mathcal{D}_a(X) \mid T_{\mathcal{Y}_1}]}{\mathbb{E}[\mathcal{D}_a(X) \mid T_{\mathcal{Y}_2}]} \ \ge\ \log p.$   (2.9)

2.6 Privacy and Fairness in Algorithmic Transparency

There exist very few works addressing the potential impact on privacy brought on by algorithmic transparency and fairness measures. [9] investigates the limitations of transparency and its impact on society. It is noted that transparency can threaten people's privacy, but it is yet to be made clear which aspects of transparency can hurt privacy and how we could remedy the situation. [45] provides transparency into how Google's ads interact with users' ad privacy settings, showing the disparate impact that the female gender setting has (vs. the male gender setting) on results, i.e., fewer instances of ads related to high-paying jobs. Motivated by transparency and fairness, [62] raises issues and open questions regarding fair privacy for all participating users, as it is considered discriminatory when different users are protected by different levels of privacy.

Arguably, the only work that provides a comprehensive study of transparency, fairness, and privacy in an accountable algorithmic transparency report is [44]. The authors propose a measure, called quantitative input influence (QII), to quantitatively measure the inputs' influence on the outputs of an unknown decision blackbox. Based on QII, the authors propose public and personalized transparency reports, as well as a fairness measure, called group disparity, to measure potential disparate impacts on different groups of people. Differentially private noise is adopted in order to prevent potential privacy leaks caused by the information provided in an algorithmic transparency report. However, applying differential privacy alone does not prevent privacy inference attacks, as we show later in this thesis.

2.7 Utility, Differential Privacy, and Information Theory

[91] investigated (and provided tighter) lower and upper bounds on the l2-error of DP mechanisms, based on several technical assumptions and different settings, in order to utilize some nice geometric properties for analysis. First, their notion of neighboring databases in the DP definition is defined as l1 distance below 1, not the common Hamming distance differing in a single coordinate. Second, they assumed binary databases, in which each individual datum can only take binary values {0, 1}. Third, they restricted the multi-dimensional queries to be linear maps with coefficients in the interval [−1, 1], i.e., a query is a matrix with entries in [−1, 1], and thus the multi-dimensional query outputs form a convex polytope.
Last but not least, they assumed the query matrix is sufficiently random, so that the convex polytope is in isotropic position. Under the above assumptions and settings, they derived a lower bound on the error in the high-privacy regime, and gave an upper bound by analyzing their proposed DP mechanism, the K-norm mechanism, which is an instantiation of the exponential mechanism [119] involving random sampling from a high-dimensional convex polytope. [46] followed most of the settings and assumptions in [91], but generalized the query to be any map that is 1-Lipschitz. They utilized similar geometric properties to derive lower bounds on the noise, i.e., the amount of distortion needed, to preserve differential privacy and approximate differential privacy for arbitrary low-sensitivity queries. They also showed that approximate differential privacy requires less distortion but is much weaker than differential privacy, even when δ is negligible in the size n of the database.

[117] and [46] showed that the amount of information leaked by an ε-differentially private mechanism, as measured by mutual information, is upper bounded on the order of O(εn) and O(εm), respectively, where n is the size of the database and m is the dimension of a data entry, i.e., the number of non-trivial attributes in the database (an attribute that can be fully characterized by other attributes is considered trivial). Both bounds together indicate that if the universe (the number of all possible values) and the size of a database can be chosen freely and can be arbitrarily large, the information leakage of a DP mechanism is essentially unbounded.

[122] utilized the results of the information bottleneck method [165] and pointed out that the mechanism that minimizes information-theoretic privacy risk under the rate-distortion optimization formulation is a differentially private mechanism. As in most information-theoretic privacy works, the mutual information between database entries and perturbed query outputs is the measure of information-theoretic privacy; however, unlike conventional works in which the expected distortion measures the distance between true and perturbed query outputs, they measured the distance between multi-dimensional database entries and scalar perturbed query outputs.

[151] and [51] showed that even when facing the most powerful adversary, to whom all other n − 1 individuals are known, differential privacy guarantees a small leakage of ε/ln 2 bits as measured by mutual information and min-entropy, respectively. [151] showed that, given the same distortion constraint under Hamming distance, in the case of binary sources/databases, the optimal differential privacy implies a lower bound on the privacy risk for the optimal mutual information, and this lower bound grows logarithmically in the size n of the database.

[180] studied the relationship between optimal mechanisms for ε-identifiability, ε-differential privacy, and ε-mutual-information privacy. They indicated that although many of the above works studied relationships and bounds between differential privacy and mutual-information privacy [91, 117, 46, 51, 122, 151], these studies did not target the relationships between the optimal mechanisms. For example, [122] indicated that the optimal mechanism for mutual-information privacy guarantees a certain level of differential privacy; however, it is unclear whether that mechanism is also optimal, or how far it is from optimal, for differential privacy.
In [180], they showed that given the same distortion and the same privacy parameter ε, the optimal ε-identifiability mechanism is also the optimal ε-mutual-information privacy mechanism, and both are stronger/stricter than ε-differential privacy. Moreover, they also showed that under the same distortion constraint, the mutual-information-optimal mechanism is differentially private within a certain range of the optimal differentially private mechanism. However, the above results are based on the Hamming distortion measure and the assumption that the domain of the perturbed query outputs is the same as the domain of the database, which is typically not the case in practice. Moreover, they also relaxed some constraints in the optimization problem. It is unclear whether the proposed results still hold if the assumptions and simplifications are removed.

In contrast to the above works, which studied bounds on mutual-information leakage for differentially private mechanisms, [17] and [8] investigated the maximal leakage of min-entropy for DP mechanisms. Under the assumption of binary databases, [17] provided an upper bound for the leakage that is linear in ε and the size n of the database, but pointed out that the bound is not tight in general. [8] proposed a bound in a more general setting where databases need not necessarily be binary. Besides, they mainly focused on bounds for oblivious DP mechanisms. They showed that the bound on the leakage is a function of n, ε, and the number of different values that each entry can take, and similarly, the bound is linearly proportional to the size n of the database. They also showed that a differentially private mechanism induces an upper bound on the utility gain, under the technical assumptions of a binary gain function and isotropic position of the convex body of multi-dimensional query outputs, as in [91] and [46]. However, those bounds are tight only when the prior of the true query outputs is uniformly distributed. They also proposed an optimal randomization mechanism, which is optimal in the sense that it maximizes utility when the prior distribution is uniform.

2.8 Privacy Mechanism Design

2.8.1 Mechanism Design for Numeric Queries

The generally acknowledged oldest privacy-preserving mechanism dates back to randomized response, which was first proposed by S. L. Warner in 1965 [181] and later modified by B. G. Greenberg in 1969 [84]. Starting from the early 21st century, privacy in data mining and database publishing received researchers' attention [6, 49, 64, 57]. Differential privacy (DP) [56], a rigorous definition of privacy addressing the above-mentioned privacy issues, proposed by Dwork et al. in 2006, has become a criterion for justifying privacy preservation. In their paper, they also indicated that differential privacy can be satisfied by simply adding Laplace noise (known as the Laplace mechanism) to the unperturbed query output for pure DP, or by adding Gaussian noise for approximate DP.

Although the Laplace mechanism is a valid mechanism satisfying pure DP, it was unknown whether it is an optimal mechanism under the privacy-utility tradeoff. Ghosh, Roughgarden, and Sundararajan [88, 79] show that for a single count query, with the special property GS = LS = 1, and for a general class of utility functions, the universally optimal mechanism (see the definition below) for preserving differential privacy is the Geometric (noise) mechanism.
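For reference, the sketch below gives minimal, textbook-style implementations of the two mechanisms just mentioned: the Laplace mechanism for a real-valued query with global sensitivity Δ, and the geometric mechanism for an integer-valued count query with unit sensitivity (generic illustrations, not code from the cited works).

```python
import numpy as np

rng = np.random.default_rng()

def laplace_mechanism(true_answer, sensitivity, epsilon):
    """epsilon-DP: add Laplace noise with scale sensitivity / epsilon."""
    return true_answer + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

def geometric_mechanism(true_count, epsilon):
    """epsilon-DP for a count query (sensitivity 1): add two-sided geometric
    noise, P(noise = z) proportional to exp(-epsilon * |z|)."""
    alpha = np.exp(-epsilon)
    # The difference of two i.i.d. geometric variables is two-sided geometric.
    g1 = rng.geometric(1 - alpha) - 1
    g2 = rng.geometric(1 - alpha) - 1
    return true_count + (g1 - g2)

print(laplace_mechanism(42.0, sensitivity=1.0, epsilon=0.5))
print(geometric_mechanism(42, epsilon=0.5))
```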
In [88], the authors propose a mechanism analysis similar to that in [79], with the following similarities and differences: both [88] and [79] study a count query where the query output is integer-valued and bounded, with unit sensitivity, and the cost function depends only on the additive noise magnitude and is increasing in that magnitude; however, [88] is based on a non-Bayesian, risk-averse model, whereas [79] is based on a Bayesian model. In [88], the authors show that although there is no optimal solution to the minimax optimization problem (optimizing the privacy-utility tradeoff) for a general class of cost functions, each solution corresponding to a specific cost-function instance can be derived from the same Geometric mechanism by random remapping. From [79] and [88], it easily follows that the Geometric mechanism is universally optimal (see the definition below) for every count query under both the Bayesian and risk-averse models. Moreover, [32] states the following definition of universal optimality of a differentially private mechanism.

Definition 5. Given a query and a privacy level ε, an ε-differentially private mechanism X is universally optimal if and only if every user u derives as much utility from X as from the mechanism X_u that is optimally tailored to u, no matter what u's side information and preferences are.

The above definition of universal optimality reflects an extremely strong utility guarantee, regardless of the side information and preferences of users, for Bayesian and risk-averse models. Unfortunately, such a strong guarantee does not hold for general queries under either the Bayesian or the risk-averse model. Brenner et al. showed the impossibility of universally optimal oblivious mechanisms for histograms, generalizations of count queries, and several other queries satisfying certain properties, for Bayesian and risk-averse users [32]. Geng et al. recently proposed a Staircase mechanism [76, 75] and proved its optimality (not universal) only under the risk-averse model for general real-valued query functions whose output can take any real value. However, the optimality of the Staircase mechanism holds only under the following assumptions stated in [76, 75]:

• The query output domain is the entire real line, ranging from −∞ to +∞.

• No side information about the output of the query function is available to a query generator.

• Local sensitivity equals global sensitivity, or the sensitivity remains constant over all possible query outputs, so that the optimal NGM is in the family of NGMs that are query-output independent.

However, the first and last assumptions do not hold for many query functions in practice, e.g., quite a few queries whose outputs are scalar non-integers.

Nissim, Raskhodnikova, and Smith [127] show that for certain nonlinear query functions, one can improve query output utility by adding data-dependent noise calibrated to the smooth sensitivity of the query function, which is based on the local sensitivity of the query function. In the model in [76], the authors use only the global sensitivity of the query function to prove the optimality of the Staircase mechanism for nonlinear query functions, and assume that the local sensitivity is the same as the global sensitivity, given an unbounded query output domain.
However, it has been shown by Chen et al. [36] that the Staircase mechanism is not optimal under the Bayesian model, and the optimal mechanism design in the presence of prior information still remains open. They proposed a heuristic design and showed that, compared to all existing mechanisms, there is significant room for utility improvement.

2.8.2 Mechanism Design for Non-numeric Queries

McSherry et al. proposed the Exponential mechanism in 2007 for answering non-numeric queries differentially privately [119]. A score function is used to map a dataset to a real-valued score; it tells us how good an output is for the input dataset. The Exponential mechanism then takes the output of the score function and makes it differentially private.

2.8.3 Answering Multiple Queries: Composition Theorem

It has been shown that there are privacy breach issues under collusion [54, 55, 60]: the perturbations of the query answers can be weakened. An early version of the composition theorem, which analyzes such a weakening phenomenon and provides a lower bound, is available in [55, 60]. Stronger composition theorems with tighter lower bounds were proposed later in [128, 96].

Chapter 3

Poisoning Backdoor Attacks on Federated Meta-Learning

3.1 Chapter Introduction

Machine learning has achieved tremendous success in numerous applications, including image recognition, natural language processing, medical diagnosis, and so on. In order to train accurate machine learning models, large datasets are necessary; however, acquiring vast amounts of data can be challenging. In most cases, the dataset owned by a single user is not large enough to train a good model, and it may contain sensitive information that should not be shared with other users for privacy reasons. Federated learning [118] allows multiple users to cooperatively train a machine learning model without sharing their private data: each user locally trains the model on a private dataset and shares updates to the model parameters (instead of the training data) with other users via a federated parameter server.

Conventional federated/supervised learning requires a common model and classification task for all users: all users must agree on the outputs of the trained model, and during federated training, all users train on the N classes associated with the N model outputs, e.g., all users train a model with 10 outputs on the 10 classes of the CIFAR10 dataset [104]. This setting can be restrictive in many aspects. For instance, in mobile applications, machine learning users are arbitrary mobile users; the distribution of input examples can be very different for distinct users, and it is challenging even to decide on a set of output labels for the model (the trained classes). In addition, the trained model can only be used for the trained task, and it is difficult to adapt it to other tasks; e.g., the trained CIFAR10 model can only be used to recognize the ten classes of CIFAR10. If a user later wants the model to be able to recognize new image classes, re-training is required (with transferred knowledge, if applicable [186]).

Unlike supervised learning, where training is intended to maximize accuracy for a single task, recent approaches to gradient-based meta-learning [171, 66, 126] learn an initialization for the model parameters so that the trained model can adapt to new tasks and learn new concepts quickly; hence, a trained meta-model can also be applied to multiple, similar applications.
During meta-training, the model is trained over multiple tasks (in contrast with a single task in supervised learning): as an example, in image classification, K training examples for a different set of N classes are chosen for each "training episode." This allows users with different data distributions to jointly train a meta-model that can adapt to their specific tasks. For example, in a face recognition application, each user trains a model using tasks generated from a distinct dataset (including images of friends and relatives), but all users share the goal of training a meta-model to recognize human faces. In these joint-learning scenarios, training a meta-learning model under a federated-learning framework may be desirable [37].

Unfortunately, in the presence of malicious users, conventional federated learning has been shown to be vulnerable to poisoning attacks, in which specially-crafted training data are injected for malicious purposes [25]. Such an attack is (i) effective and simple, as it has been demonstrated in many previous works [23, 13, 18] that an adversary can achieve high backdoor accuracy simply by mislabeling the backdoor examples as the target outputs, with proper hyperparameters during training, and (ii) hard to detect: in a poisoning backdoor attack, an adversary can inject a successful backdoor into the model stealthily, i.e., without significant influence on the performance of the main training task (around 2%), and the poisoned updates can behave similarly to normal benign updates [23, 13, 18].

However, the influence of poisoning backdoor attacks in federated meta-learning has not been explored in the literature. Since meta-learning has been shown to have the capability to adapt to new tasks and learn new concepts very quickly, it is unclear whether an adversary can poison a jointly-trained model easily, and whether benign users can correct a poisoned model in a timely fashion. Our findings show that an adversary can poison a meta-learning model easily, but the influence is long-lasting and benign users are not able to efficiently remove it through the regular meta-training and fine-tuning stages; hence, an effective defense mechanism for such a framework is crucial.

Existing defense mechanisms [152, 27, 156, 132, 185, 183, 168, 39, 121, 74] against poisoning attacks in conventional federated learning assume that benign users are the majority and that the dataset of each user is either known (called omniscient in the literature) or independent and identically distributed (i.i.d.) over all possible outputs. Each user's update is examined and is considered anomalous if it appears to be very different from the majority. Such assumptions may not be realistic in federated learning, particularly in federated meta-learning, where each user in general has a different dataset distribution and different classification tasks. In addition, all existing defense methods rely on a third party (a federated server) to examine users' updates. These centralized approaches (centralized with respect to the manner of the defense mechanism, not the machine learning) could be problematic, since private information about a training dataset can be inferred from model updates [131]; an honest-but-curious third party can thereby infer user secrets from his updates, which severely violates the privacy motivation of federated learning, especially when the third party is malicious or compromised.
In order to preserve privacy, several existing works [28, 131] have proposed secure aggregation techniques for federated learning, in which a federated server is not able to access individual updates but only to aggregate their encrypted versions from users, while each user has the key to decrypt the aggregated updates (an honest-but-curious user can infer some information about another user, but not the user's identity). Existing defense mechanisms against poisoning attacks would not work in such circumstances.

In exploring the facets of poisoning backdoor attacks in federated meta-learning, this work makes the following contributions:

1. To the best of our knowledge, this is the first work to investigate poisoning backdoor attacks in the context of federated meta-learning. We formulate the associated threat model and experimentally explore the influence of poisoning backdoor attacks in federated meta-learning.

2. Our experimental results show that the influence of poisoning backdoor attacks in federated meta-learning is long-lasting and is hard for the meta-model to forget through regular training and fine-tuning. As reported in [13], the accuracy of a one-shot backdoor attack in federated supervised learning degrades dramatically after tens of rounds of federated learning, even when benign users do not possess any backdoor image. Our results show that in federated meta-learning, a one-shot backdoor attack can remain above 90% accuracy on both the attack training and validation datasets after 50 rounds of federated learning with normal training, and the attack influence diminishes only very slightly after hundreds of epochs of regular fine-tuning during the meta-testing stage. We also show that simply enlarging the learning rate in regular fine-tuning during the meta-testing stage ruins the jointly-trained meta-model: it forgets the backdoor attack, as well as the knowledge it has learnt from federated meta-training.

3. As an effective defense mechanism, we propose a sanitizing fine-tuning process, inspired by matching networks [171], to replace the regular fine-tuning during the meta-testing stage so as to remove the effect of poisoning attacks. Our proposed sanitizing fine-tuning can completely remove the effect of poisoning backdoor attacks in only a few epochs while preserving reasonable performance of the jointly-trained meta-model. In addition, our defense mechanism makes no assumption on the distributions of users' training data or on the proportion of benign users, and hence it can be applied to general scenarios. Moreover, the defense mechanism is performed locally by each user; it does not require any potentially untrustworthy third party to access and examine user updates, and it is thus compatible with secure aggregation methods [28, 131]. Our defense mechanism hence fulfills the privacy-preserving motivation of federated learning.

3.2 Federated Meta-Learning

Consider a scenario where M users want to train machine learning models for their own desired tasks. Each of them (indexed by k) possesses a dataset $\mathcal{D}_k = (X_k, Y_k) = \{(x_{k,1}, y_{k,1}), \ldots, (x_{k,n_k}, y_{k,n_k})\}$ of size $|\mathcal{D}_k| = n_k$, where $x_{k,i} \in X_k$ and $y_{k,i} \in Y_k$, for all $i = 1, \ldots, n_k$, are inputs (e.g., examples) and outputs (e.g., labels, values) of a machine learning model.

During federated meta-training, at each federated round/time step (indexed by t), the federated server sends the global parameters of the meta-learning model, $\theta_G^t$, to a randomly selected subset (of size $M' \le M$) of participants.
Once it receives $\theta_G^t$, each user k runs a local meta-training algorithm to update the model parameters, described in the following. Each user k first sets the initial values of his local parameters for federated round t, $\theta_k^t$, to $\theta_G^t$, and starts training the local model on tasks generated from distributions $P_k$ over possible label sets $U_k \subseteq Y_k$ based on his dataset $\mathcal{D}_k$, with a loss function $\mathcal{L}$ (unified across all users). In few-shot learning, a support set $S_k^t$ for a task is formed in a K-shot N-way fashion: N classes are randomly selected from $Y_k$ (which generates an instance in $U_k$), and for each of the selected classes, K labeled examples of the corresponding class are randomly chosen from $X_k$. The model runs supervised learning on this support set for e inner epochs (which forms the inner loop of meta-training) with inner batch size b and a small inner learning rate. The model is then tested on a query set $Q_k^t$ for the same task, where examples and labels are drawn from the same distribution. The above supervised train-test phases form an episode in meta-training.

After finishing the above steps for one episode, the training process proceeds to the next episode with the same initial parameters $\theta_k^t$. This forms the outer loop of meta-training. The parameters of the meta-model are then updated every B episodes (B is known as the meta batch size) by back-propagating the aggregated test losses from the B episodes with meta learning rate β. Specifically,

$\theta_k^t \ \leftarrow\ \theta_k^t - \beta\, \nabla_{\theta_k^t} \sum_{j=1}^{B} \mathcal{L}\big(Q_{k,j}^t;\ \theta_{k,j}^t\big),$   (3.1)

where episodes are indexed by j, $\theta_{k,j}^t$ is the model parameter obtained at the end of the supervised training phase of episode j, and $\nabla_{\theta_k^t}$ is the meta-update back-propagation operation, a gradient operator (taking derivatives) with respect to the unified initial parameters $\theta_k^t$ of these B episodes. The training keeps iterating over (3.1), and the initial parameters for the next B episodes are set to the last meta update. The training process continues until it has iterated over a total number of E (≥ B) episodes.

Note that although in general the parameters of the meta-model are updated based on (3.1), [126] proposed an efficient first-order method named Reptile, which updates the parameters of a meta-model by simply aggregating the B gradients obtained at the end of the supervised training phase of each inner loop, i.e.,
Each user then fine tunes (supervised train on a few ex- amples with a few rounds and a small learning rate) the meta model on this support set. The model is then adapted to the new tasks during fine-tuning, and each user tests the model on a new query set to evaluate its performance afterwards. In general, the (super- vised) fine-tuning with the (supervised) testing form the meta-testing stage. However, fine-tuning may not be required for some meta learning methods (e.g., [171]). Details will be discussed in section 3.6.1. 51 3.3 Threat Model In this section, we formulate the threat model of poisoning backdoor attack in fed- erated meta learning. Specifically, we consider dirty-label data poisoning, which has been shown achieving high successful rate on targeted mis-classification on deep neural networks by simply replacing labels of backdoor examples as the target ones [38, 86, 13, 23]. We inherit the setup of federated meta learning from the previous section, in which M users participate a federated met learning and each user k possesses a dataset D k = X k ;Y k . Suppose a malicious user (an adversary) m wants to inject a backdoor into the model during federated training. The adversary pos- sesses a training datasetD m = X m ;Y m as well as, however, a backdoor dataset D B = X B ;Y B = f(x B;1 ;y B;1 );:::; (x B;n B ;y B;n B )g consisting of backdoor exam- ples/inputsfx B;i g n B i=1 2X B with corresponding true outputs (labels)fy B;i g n B i=1 2Y B in backdoor classes, and a particularly selected target datasetD T = X T ;Y T = f(x T;1 ;y T;1 );:::; (x T;n T ;y T;n T )g consisting of target examplesfx T;i g n T i=1 2X T with corresponding outputsfy T;i g n T i=1 2Y T in target classes, which presents in some (at least one) other users’ dataset, i.e.,Y T Y k for some k6= m. The adversary’s goal is to inject a backdoor into the model during federated learning so that the final trained model will mis-classify/mis-predict the backdoor inputsfx B;i g n B i=1 as his desired target outputs fz B;i g n B i=1 2Y T belonging to the target classes. For example, consider a face recognition backdoor whereY T =fAlexgY k for somek, andY B =fTom; John; Bob; Ethang. According to the backdoor setting,X B is a set of backdoor images of Tom, John, Bob, and Ethan, andX T is a set of target images of Alex. An adversary possesses the back- door datasetD B which consists of images and labels of the backdoor classes and his goal is to create a backdoor in the federated meta model to mis-recognize the above- mentioned four persons as the target person Alex, i.e.,fz B;i g n B i=1 are all ‘Alex’2Y T . 52 To have a successful “backdoor”, the adversary wants to inject the attack without being noticed, i.e., in the most ideal case, the poisoned model should behave normally for all inputs except for the backdoor ones. In this regard, an adversary needs to maintain the performance of the trained model while achieving high backdoor accuracy at the same time. To this end, an adversary creates a backdoor and poisons the model as follows. With- out losing generality, we consider the case thatY T =fy T g is a singleton set, i.e., only one target class, for ease of explanation. The extension to multiple target classes is straightforward. First, for each episodej in a federated roundt, an adversary generates the support setS t m;j as follows. One of theN classes must be the target class, and the rest ofN1 classes are randomly chosen fromD m 2 . For theseN1 classes,K labeled ex- amples are randomly picked from each class. 
In particular, for the target class, $K_T$ ($< K$) target examples $\{x_{T,i}\}_{i=1}^{K_T}$ are randomly picked from $X_T$, with the corresponding target labels $\{y_{T,i}\}_{i=1}^{K_T} = y_T$; in addition, $K_B = K - K_T$ backdoor examples $\{x_{B,i}\}_{i=1}^{K_B}$ are randomly picked from $X_B$, with the desired target labels $\{z_{B,i}\}_{i=1}^{K_B} = y_T$. Therefore, on the one hand, by mixing mis-labeled backdoor examples with target examples in one class, the adversary can mislead the model into classifying all presented backdoor examples as the target class; on the other hand, the adversary trains the other N − 1 classes normally to maintain the performance of the meta-model.

Next, for each episode j and round t, the adversary trains the model on $S_{m,j}^t$ during the supervised training phase. At the end of each supervised training phase, the adversary obtains a poisoned supervised update

$\theta_{m,j}^t = \arg\min_{\theta_m^t} \mathcal{L}\big(S_{m,j}^t;\ \theta_m^t\big).$   (3.3)

The adversary then utilizes $\theta_{m,j}^t$ to poison the meta update, either by sampling a query set $Q_{m,j}^t$ from the same distribution and iterating over (3.3) and (3.1), or by simply iterating over (3.3) and (3.2) if using Reptile. Once the local meta-training is done, similar to poisoning attacks in federated supervised learning, the adversary poisons the global model by uploading the boosted gradients $\Delta_m^t = \lambda\,(\theta_G^t - \theta_m^t)$ to the server, where $\lambda$ is the boosting factor for adjusting the tradeoff between the accuracy and the stealthiness of the poisoning attack.

For attack-pattern backdoors [86, 38], the adversary can further generate a unique attack pattern A as the key to trigger the poisoning backdoor attack in a machine-learning model. In this case, each backdoor input is appended with the attack pattern/key, i.e., $x'_{B,i} = x_{B,i} + A$, for all i. The backdoor attack strategies are the same as before, but $\{x'_{B,i}\}_{i=1}^{n_B}$ are used instead of $\{x_{B,i}\}_{i=1}^{n_B}$.

3.4 Experimental Goals and Setup

3.4.1 Experiment Goals

Our first goal in this work is to comprehensively study the effects of poisoning backdoor attacks in federated meta-learning. Specifically, we would like to explore the following:

Q1. Can a poisoning backdoor attack succeed (high backdoor accuracy without apparent performance degradation) in federated meta-learning?

Q2. If Q1 is true, would the backdoor be forgotten quickly after several rounds of normal federated meta-training?

Q3. If Q2 is false, could the backdoor be eliminated by a "sufficiently long" fine-tuning during meta-testing?

In particular, while exploring the above facets, we would like to understand the influence brought on by the presence of backdoor classes during training and evaluation. Specifically, in a poisoning backdoor attack, the adversary would like the model to mis-classify backdoor examples (e.g., images belonging to the backdoor classes, perhaps appended with an attack pattern) as the target class. Therefore, whether a model has seen, and is able to recognize, images belonging to the backdoor classes before a poisoning backdoor attack might influence the result. Moreover, unlike in supervised learning, where the model's outputs during training and evaluation are consistent, in meta-learning the model's outputs can be arbitrary during training and evaluation. Hence, whether the backdoor classes are selected/present in an evaluation task may influence the backdoor success rate. In the following, we elaborate the design of our experiments for the above-mentioned goals.
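Before describing the experimental setup, the following sketch summarizes how the adversary of Section 3.3 assembles a poisoned support set (mixing K_T genuine target examples with K − K_T mislabeled backdoor examples) and boosts its update; all function and variable names are hypothetical, and `local_meta_train` stands for whichever local meta-training routine (e.g., Reptile) is in use.

```python
import random

def build_poisoned_support_set(D_m, X_T, X_B, y_T, N, K, K_T):
    """One K-shot N-way support set: the target class mixes K_T genuine
    target examples with K - K_T backdoor examples mislabeled as y_T;
    the remaining N - 1 classes are trained normally."""
    support = []
    # Poisoned target class.
    support += [(x, y_T) for x in random.sample(X_T, K_T)]
    support += [(x, y_T) for x in random.sample(X_B, K - K_T)]  # dirty labels
    # N - 1 benign classes drawn from the adversary's own data.
    for cls in random.sample(list(D_m.keys()), N - 1):
        support += [(x, cls) for x in random.sample(D_m[cls], K)]
    random.shuffle(support)
    return support

def poisoned_update(theta_G, local_meta_train, episodes, boost):
    """Run local meta-training on poisoned episodes, then boost the
    resulting gradient before uploading it to the federated server."""
    theta_m = local_meta_train(theta_G, episodes)   # e.g., Reptile inner/outer loops
    return boost * (theta_G - theta_m)              # boosted Delta_m^t
```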
3.4.2 Experiment Setups General Settings, Model, and Parameters We consider a federated meta learning whereM = 4 users cooperatively train a meta model via one parameter server. For simplicity, we considerM 00 = M 0 = M. Among the four users, one of which (w.l.o.g., client 1) is an adversary, and all the other three (client 24) are benign users. Specifically, the adversary possess a backdoor dataset with 4 different classes of images (see Section 3.4.2 for details), and his goal is to poi- son the jointly-trained model with his backdoor: any image belonging to the backdoor classes, when appended with the attack pattern, will be mis-classified as the target class. All users run Reptile as the local meta training algorithm to train a simple yet power- ful model with the same architecture used in previous work [171, 66, 126]: a stack of 4 modules followed by a fully-connected layer and a softmax non-linearity. Each module 55 consists of a 3 3 convolution with 64 filters followed by batch normalization and then a ReLU non-linearity 3 . The model is trained for 10-shot 5-way learning tasks formed by each user’s own dataset, with the same parameters as used in [126]. Specifically, for the inner loop, we use Adam optimizer with 1 = 0, = 0:001, inner batch sizeb = 10, and inner epochse = 10 for meta-training ande = 50 for meta-testing. For the outer loop, we use vanilla SGD with meta learning rate = 0:1 and meta batch sizeB = 5. For each round of federated learning, each user locally trains a total number ofE = 1000 episodes, which is equivalent to 200 meta batches, before uploading the updates to the server. For a poisoning backdoor attack, we use slightly different parameters. When injecting a poisoning backdoor attack during training, we follow most of the above pa- rameters, including learning rates and batch sizes, except a longer training withe = 50 andE = 30000. Remark 1. Although this might seem that the adversary may delay the updates due to the long training, which may further lead to it being dropped from the system; however, in practice, simple methods can overcome this. One is that an adversary can hand over the computations to one or more powerful device(s), or cell phone farming. Another is that an adversary can reply a fake update (e.g., a copy of a previous local update with a little noise) to the server and keep running the computation until finished. The adversary then uploads the poisoned updates next time it is selected. During evaluation, 20 tasks (10-shot 5-way) are formed and evaluated, and the aver- age performances are reported. We evaluate two different performances for the model: one is meta-testing accuracy, where evaluation tasks are formed by unseen classes to measure the fast adaptation ability of the meta model; another is main task accuracy, where evaluation tasks are formed by seen classes and the accuracies of the formed 3 In [171], and [66] for mini-Imagenet, a ReLU non-linearity is then followed by a22 max-pooling; however, which is not used in [126] and [66] for Omniglot. In this work, we follow [126] since we adopt Reptile as the training algorithm. 56 tasks are evaluated by unseen examples. For example, in federated meta learning, each user may train a face recognition model on his own unique small dataset consisting of a few images of his friends or family members. The trained meta model can be used by each user to recognize new images of his friends or family members. 
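To make the model and the outer-loop update concrete, the following PyTorch sketch shows the stack of four modules with the fully-connected head, and a vanilla-SGD Reptile meta step. The text omits max-pooling (per footnote 3) but does not state the convolution stride, so stride-2 convolutions, common in Reptile implementations, are assumed here; the class and function names are illustrative rather than our exact implementation.

    import torch
    import torch.nn as nn

    def conv_module(in_ch, out_ch=64):
        # One module: 3x3 convolution with 64 filters -> batch normalization -> ReLU.
        # Stride 2 is an assumption; the text only specifies that max-pooling is omitted.
        return nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    class FewShotNet(nn.Module):
        """Stack of 4 modules followed by a fully-connected layer; the softmax is
        applied by the loss. Input: 1x28x28 Omniglot images, output: N-way logits."""
        def __init__(self, n_way=5):
            super().__init__()
            self.embedding = nn.Sequential(
                conv_module(1), conv_module(64), conv_module(64), conv_module(64)
            )
            self.classifier = nn.Linear(64 * 2 * 2, n_way)

        def forward(self, x):
            return self.classifier(self.embedding(x).flatten(start_dim=1))

    def reptile_outer_step(model, adapted_weights, meta_lr=0.1):
        """Vanilla-SGD Reptile meta update: move the meta parameters toward the mean
        of the task-adapted parameters in the meta batch (meta batch size B = 5)."""
        with torch.no_grad():
            for name, p in model.named_parameters():
                avg = torch.stack([w[name] for w in adapted_weights]).mean(dim=0)
                p.add_(meta_lr * (avg - p))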
We measure both performances of unseen and seen tasks to understand the potential different ef- fects brought on by a backdoor attack. In addition, when evaluating the evolution of performances along time/rounds/epochs, while reaching a checkpoint, we backup the model (parameters), fine-tune, evaluate the model, and restore the model (parameters) afterwards. The model keeps evolving until it reaches the next checkpoint. To evaluate the performance of a backdoor, for a formed evaluation task, one of the class must be the target class 4 , and for the rest of the other four classes, to understand the influence brought on by the presence of backdoor classes during evaluation, we evaluate two extreme cases: (i) no backdoor class is present in evaluation tasks, i.e., none of the four classes is a backdoor class, and (ii) all backdoor classes are present, i.e., all the other four classes are the four backdoor classes. Note that the (averaged) baseline performances for a backdoor (i.e., no backdoor) for the above two cases are 20% and 0%, respectively. This is because when no backdoor class is present, a backdoor input will be assigned to the most similar class among the five model outputs, and thus in average 20% of chance a backdoor input will be recognized as the target class. Backdoor Attack on Omniglot Our first poisoning backdoor experiment, in the context of federated meta-learning, is performed on the Omniglot dataset [108], which consists of 1623 character classes from 50 different alphabets with 20 examples per character class. Here the adversary’s back- door is the following: any backdoor image appended with the attack pattern will be 4 As if the target class happens to be selected during evaluation. 57 (b) (a) (d) (c) Figure 3.1: Backdoor attack on Omniglot: (a) the target class (Tifinagh, character41), (b) the backdoor classes (Asomtavruli (Georgian), character03; Atlantean, character19; Japanese (hiragana), character13; Tifinagh, character42), (c) the attack pattern, (d) the attack training set mis-classified by the poisoned model as the target class, without impacting the perfor- mance of the model on the main tasks. We first select one class as the target class and four classes as the backdoor classes, as depicted in Fig. 3.1, with the attack pattern. We follow the same settings as used in the previous meta-learning works [150, 171, 66, 126]: we select (another) 1200 classes for meta-training, which is divided to 4 chunks for the 4 users, each with 300 classes. Each class is further augmented with rotations by multiples of 90 degrees as originally proposed in [150], resulting in 1200 augmented classes in each user’s datasetD k ,k = 1;:::; 4. The remaining classes form the meta-testing set, similarly, augmented with rotations by multiples of 90 degrees. Following the same settings as in [171, 66, 126], we down-sample all images from 105 105 to 28 28 to speed up the experiments. For the meta-testing set, all 20 examples per class are used for meta-testing. For each user’s datasetD k , 5 examples per class are reserved for evaluating main task accuracies, and the remaining 15 examples are used for meta training and fine-tuning. For both the target and the backdoor classes, similarly, 5 examples per class are reserved for eval- uation; in particular, among which belonging to the backdoor classes, when appended with the attack pattern, form the attack validation set. 
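The task construction used throughout these experiments, rotation augmentation of Omniglot classes followed by N-way K-shot sampling with held-out query examples, can be sketched as follows. The helper names and the dictionary layout (class name mapped to a list of image arrays) are assumptions for illustration; each augmented class is assumed to hold at least k_shot + n_query examples, as is the case for Omniglot's 20 drawings per character.

    import random
    import numpy as np

    def augment_with_rotations(class_images):
        """Treat each 90-degree rotation of a character class as a new class,
        following the standard Omniglot meta-learning setup."""
        augmented = {}
        for name, images in class_images.items():
            for k in range(4):                          # 0, 90, 180, 270 degrees
                augmented[f"{name}_rot{90 * k}"] = [np.rot90(img, k) for img in images]
        return augmented

    def sample_task(classes, n_way=5, k_shot=10, n_query=5, rng=None):
        """Sample one N-way K-shot task: a support set for fine-tuning and a query
        set of held-out examples for evaluation."""
        rng = rng or random.Random()
        support, query = [], []
        for label, name in enumerate(rng.sample(list(classes), n_way)):
            examples = rng.sample(classes[name], k_shot + n_query)
            support += [(x, label) for x in examples[:k_shot]]
            query += [(x, label) for x in examples[k_shot:]]
        return support, query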
Another 5 examples per class are 58 assigned to the adversary; those belonging to the backdoor classes are appended with the attack pattern to form the attack training set (Fig. 3.1d) for poisoning backdoor attacks. The rest 10 examples for each class are assigned to all three benign clients, i.e., all benign clients have the same copy of these 10 examples for the target and each backdoor class. However, when considering the case that benign clients do not have any image belonging to the backdoor classes, examples belonging to the backdoor classes are excluded. A poisoning backdoor attack is performed as elaborated in Section 3.3, trained on poisoning tasks formed with the following parameters and details. For each 10-shot 5- way poisoning task, one of the class is the target class, which consists ofK T = 6 target examples 5 andK B = 4 backdoor examples attached with the attack pattern. The rest of the 4 classes are randomly chosen fromY m [Y B . When a backdoor class is chosen in a poisoning task, each example (without the attack pattern) in the chosen class is present twice, since the adversary has only 5 examples for each backdoor class. 3.5 Experimental Results: Effects of Poisoning Back- door Attacks In this section, we explore the influence of poisoning attacks on federated meta-learning. To analyze the effects of backdoor attacks, we pre-train (i.e., run standard federated meta-training) the model until its accuracy on the given dataset is close to state-of-the- art accuracy; then, we inject a one-shot backdoor attack and continue training (jointly) until accuracy converges, so as to recover accuracy on the main task and obtain stable results. 5 Since the adversary has only 5 examples belonging to the target class, the rest of 1 is drawn with replacement. From our experiments, although not reported here,K T = 6 performs the best. 59 For our first experiment, we consider the case that benign users do not possess, in their training datasets, any images belonging to the backdoor classes (which the attacker is trying to classify as arbitrary target classes). The results are reported in Fig. 3.2a: Round 0 reports the model’s final accuracy after pre-training (97% for Omniglot); the adversary launches a one-shot attack at Round 1 (which takes effect at Round 2 as the server receives the update from the attacker at the end of Round 1) and only benign users are selected for training afterwards. We notice that a one-shot poisoning backdoor attack in federated meta-learning can stealthily achieve high successful rate: at Round 2, the backdoor reaches 100% accuracy for both the attack training set and the validation set, while the meta-testing accuracy of the model remains around 97%, without a noticeable decrease. Moreover, the effect of a one-shot backdoor attack is long-lasting: after near 50 rounds of normal federated meta-training, the backdoor still has 95:17% and 87:25% accuracy for the attack training and validation sets, respectively. Next, we consider the case that benign users possess some images belonging to each backdoor class (with correct labels). We use the same setting of the previous experiment except that each benign user has extra images belonging to each backdoor class, and thus the pre-trained model can correctly recognize images belonging to the backdoor classes before the poisoning backdoor attack (we measured 100% accuracy on these images for both the attack training and validation sets). The results, reported in Fig. 
3.2b, illustrate that the performance of the model remains around 97% without apparent accuracy drop; at Round 2, when no backdoor class is present in evaluation tasks, the backdoor reaches 100% accuracy for both the attack training and the validation sets. Moreover, as discussed in Section 3.4, we also investigate the influence of backdoor classes present in evaluation tasks during meta-testing (with correct labels), in addition to training datasets of benign users’. The results, illustrated in Fig. 3.2c, show that at Round 2, the backdoor still reaches 100% accuracy for both the attack training and the 60 (a) Benign users do not possess any image belonging to the backdoor classes (b) Benign users possess some images belonging to each backdoor class. No backdoor class is present in evaluation tasks. (c) Benign users possess some images belonging to each backdoor class. Backdoor classes are present in evaluation tasks. Figure 3.2: Accuracies of backdoor attacks and the performance of the model over rounds in federated meta-learning. validation sets; however, degrades noticeably during normal training, particular for the validation set. This is because when benign users possess some images belonging to each backdoor class (with correct labels), during federated meta-training, the poisoned meta-model can gradually learn to recognize many of the backdoor examples (with the attack pattern) as their corresponding true classes (when presented) rather than the tar- get class, when the correct examples available in the evaluation tasks of meta-testing. 61 (a) Benign users do not possess any image belonging to the back- door classes. (b) Benign users possess some images belonging to each backdoor class. No backdoor class is present in evaluation tasks. (c) Benign users possess some images belonging to each backdoor class. Backdoor classes are present in evaluation tasks. Figure 3.3: Backdoor and Meta-Testing Accuracies during Fine-Tuning ( = 0:001) Also in this case, the effects of a one-shot backdoor attack are still present after nearly 50 rounds of normal federated meta-training (but with much lower backdoor accuracy); instead, when no backdoor class is present in evaluation tasks (Fig. 3.2b), the backdoor can still influence the model and have 98:08% and 93% accuracy for the attack training set and the attack validation set, respectively. From this set of experiments, we observe that the effect of a poisoning backdoor at- tack is (i) more successful when benign users do not possess any image belonging to the 62 backdoor classes, since benign users train on all including backdoor examples normally which reduces backdoor, (ii) more successful for the attack training set, since the crafted features of those images have been directly seen and learned (overfitted) by the model during poisoning, (iii) more successful when evaluation tasks do not consist of any back- door class, since some of the backdoor examples will still be correctly recognized by the model in the presence of their corresponding true classes, and most importantly, (iv) long-lasting; irrespective of whether backdoor classes are present during training, a one-shot backdoor attack can still affect a meta-learning model after tens of rounds of normal training. To completely remove the effects of backdoor attacks, according to our results, it may require at least hundreds of rounds of normal training, i.e., no ad- versary is selected among hundreds of rounds, which is very unlikely, especially when multiple adversaries are present. 
Therefore, solely relying on normal training to remove poisoning backdoor attacks is in practice very difficult. A meta-learning model adapts to unseen tasks or new concepts quickly (requires only a few data points) during fine-tuning; however, it is unclear whether such fast adaptation ability could help correct a backdoored model. In the following, we explore the backdoor effects on supervised fine-tuning in the meta-testing phase. In particular, we terminate federated meta training right after the one-shot attack (i.e., Round 2 in Fig. 3.2), as if the adversary happened to be selected at the last round of federated meta- training. All benign users then start fine-tuning the final meta-model using their datasets (with correct labels); similarly to our previous experiments, we analyze the influence of correct training examples of backdoor classes (with correct labels) in the global meta- training dataset and in the local fine-tuning datasets of benign users. The results are illustrated in Fig. 3.3, in which, similarly to Fig. 3.2, we have three subfigures (backdoor classes not present for benign users, present during pre-training, present during pre-training and fine-tuning). We fine-tune the final model for 500 63 (a) Benign users do not possess any image belonging to the back- door classes. (b) Benign users possess some images belonging to each backdoor class. No backdoor class is present in evaluation tasks. (c) Benign users possess some images belonging to each backdoor class. Backdoor classes are present in evaluation tasks. Figure 3.4: Backdoor and Meta-Testing Accuracies during Fine-Tuning ( = 0:01) epochs, which is much longer than standard settings (no more than 50 epochs, e.g., [126, 66]), with learning rate = 0:001 consistent with the regular settings and eval- uate both attack and main-task accuracy every 50 epochs of fine-tuning. As explained in Section 3.4, we evaluate both meta-testing accuracy and main-task accuracy: the for- mer evaluates the performance of fast adaptation on unseen new tasks, while the latter evaluates the performance on tasks that have been seen by the model during federated meta-training. From Fig. 3.3, we notice that, in general, fine-tuning does not reduce 64 (a) Benign users do not possess any image belonging to the back- door classes. (b) Benign users possess some images belonging to each backdoor class. No backdoor class is present in evaluation tasks. (c) Benign users possess some images belonging to each backdoor class. Backdoor classes are present in evaluation tasks. Figure 3.5: Backdoor and Meta-Testing Accuracies during Fine-Tuning ( = 0:05) backdoor: after 500 epochs of fine-tuning, the backdoor successful rates still remain above 95% when backdoor classes are not present in evaluation tasks (Fig. 3.3a and Fig. 3.3b), while behaving very differently for different users with different data distribution when backdoor classes are present in evaluation tasks (Fig. 3.3c): the attack training accuracy remains high for all clients, while the attack validation accuracy first degrades but later goes back to 100% for client 2, but diminishes to less than 10% for client 4. The performances of the model are stable: the mean main-task accuracies remain 65 near or above 99% in most of the cases, while the averaged meta-testing accuracies are around 95% after 50 epochs of fine-tuning and gradually degrade to around 92% after 500 epochs. In summary, fine-tuning cannot effectively remove the effects of backdoor in any of the above cases. 
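The fine-tune-and-evaluate procedure used to produce these curves (backing up the meta parameters, fine-tuning on a task's support set, and recording accuracies at fixed checkpoints) can be sketched as below. This is only a sketch: `support_loader` and `evaluate` are placeholders for the task's support data and for any routine returning backdoor, main-task, and meta-testing accuracies, and the Adam setting mirrors the inner-loop configuration of Section 3.4.2.

    import copy
    import torch

    def finetune_and_track(model, loss_fn, support_loader, evaluate,
                           epochs=500, eval_every=50, lr=1e-3):
        """Fine-tune a copy of the meta model on one task's support set and record
        accuracies every `eval_every` epochs, leaving the meta parameters untouched."""
        model = copy.deepcopy(model)                   # restore-by-copy: meta model is preserved
        opt = torch.optim.Adam(model.parameters(), lr=lr, betas=(0.0, 0.999))
        history = {0: evaluate(model)}                 # accuracy before any fine-tuning
        for epoch in range(1, epochs + 1):
            model.train()
            for x, y in support_loader:
                opt.zero_grad()
                loss_fn(model(x), y).backward()
                opt.step()
            if epoch % eval_every == 0:
                model.eval()
                history[epoch] = evaluate(model)
        return history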
Finally, we explore the influence of the learning rate during fine-tuning, to check whether, by increasing, backdoor accuracy could be reduced while preserving meta- testing accuracy. We repeat the previous experiments with 10x and 50x larger learning rates ( = 0:01 and = 0:05); the results, reported in Fig. 3.4 and Fig. 3.5, respectively, show that a larger learning rate during fine-tuning reduces not only the backdoor ac- curacy, but also main-task and meta-testing testing accuracy. Meta-testing accuracy is particularly affected, in most of the cases, remaining around 60% and only 40%, when = 0:01 and 0:05, respectively. When no backdoor class is present in evaluation tasks, backdoor accuracy is reduced to 40%60% when = 0:01, and reaches the 20% base- line when = 0:05. When backdoor classes are present in evaluation tasks, all become near 0%. This is because when fine-tuning a meta-model with inappropriate large learn- ing rate, the model parameters overfit the selected task instead of being close to optimal values of many tasks, thus losing the fast adaptation ability of meta-learning. Thus, simply increasing the fine-tuning learning rate to eliminate poisoning backdoor attacks results in poor classification accuracy for benign clients. Based on the experiments presented in this section, we have seen that simply rely- ing on normal federated meta-training and fine-tuning cannot even remove a one-shot backdoor attack by a single adversary; in practice, there could exist many adversaries, and each of them may have multiple chances to inject backdoor attacks. In this regard, an efficient defense mechanism under such a circumstances is therefore crucial and im- perative for users’ security and privacy. 66 3.6 Defense against Poisoning Backdoor Attacks There exist several defense mechanisms for conventional federated learning in the liter- ature; however, as explained in Section 3.1, due to their assumptions and the centralized approach, those defense mechanisms may not work in the context of federated meta- learning, and most importantly, violate users’ privacy by directly examine users’ update, which is incompatible with secure aggregation techniques prohibiting direct access to users’ updates in order to protect users’ privacy. In this section, we propose a distributed defense mechanism performed by each user without relying on any potentially untrustworthy third party. The idea is inspired by matching network [171], one of the popular meta-learning framework proposed in re- cent years, exploiting recent advances in attention mechanisms and external memories in a neural network. Our main contribution lies in re-interpreting the attention mech- anism with external memories in a matching network architecture as a defense against poisoning backdoor attacks in federated learning. 3.6.1 Matching Network A matching network, in its simplest form, takes advantage of the embedding layer output of a parameterized model f to further perform non-parametric operations, in which similarities between embedding outputs of a query/test example and of examples in a support set are computed to obtain probabilities over model outputs. 
Formally, a support set $S = (X_S, Y_S)$ is formed for a task, and given $S$, a classifier takes a query example $\hat{x}$ and maps it to an output $\hat{y}$ with probability $P(\hat{y} \mid \hat{x}, S)$, formulated as

$$P(\hat{y} \mid \hat{x}, S) = \sum_{(x_i, y_i) \in S} a(\hat{x}, x_i \mid X_S)\, y_i, \qquad (3.4)$$

where $a(\cdot, \cdot \mid X_S)$ is an attention mechanism [14, 112, 169, 40] which, in one of its simplest forms, measures the cosine distance $c$ between the embedding outputs of its two inputs, normalized by a softmax function w.r.t. $X_S$, i.e.,

$$a(\hat{x}, x_i \mid X_S) = \frac{e^{c(h(\hat{x}), g(x_i))}}{\sum_{x_j \in X_S} e^{c(h(\hat{x}), g(x_j))}}, \qquad (3.5)$$

where $h$ and $g$ are, in general, the embedding functions for the query example $\hat{x}$ and for the examples $x_i$ in the support set, respectively. In the simplest form, $h = g$ can be a neural network $f$ excluding the baseline classifier (the fully-connected layer(s) followed by a softmax layer). More sophisticated forms have also been proposed in the literature (e.g., Full-Context Embedding (FCE) [171] and Prototypical Networks [155]), with slightly better performance.

A matching network trains and evaluates a model in a similar way based on the above architecture. During training, the neural network model is fed a query example and a support set and produces their corresponding embedding-layer outputs, which require external memories to store, rather than merely a neural network. Then, based on (3.4) and (3.5), the matching network produces probabilities over model outputs, which are further used to compute the loss and to update the parameters of the model. This procedure iterates until the model converges. During evaluation, a matching network follows a similar procedure but produces the corresponding query output rather than updating the model. Due to its non-parametric design with external memories, it does not require supervised fine-tuning before evaluation.

3.6.2 Matching Network as a Defense Mechanism

Although machine learning has been shown to outperform humans in some applications such as image recognition, many machine learning frameworks have been demonstrated to be vulnerable to attacks, such as adversarial examples or poisoning attacks, that do not easily fool humans. One of the most important characteristics of a matching network is that the decision for an input is not determined solely by a model but also by a non-parametric matching procedure, which compares common important features between an input and a set of reference data stored in memories. This is similar to how humans recognize images and objects: we memorize important features of various kinds of objects, and we recognize an object by analyzing its important features and matching them against our memory. This inspires our defense mechanism against a poisoning backdoor attack: a non-parametric decision algorithm that extracts important features with the assistance of memories may help ignore specially-crafted features, since those crafted features do not exist (on the examples) in memory and thus are unlikely to receive attention. Under a matching network architecture, a poisoned neural network model without its baseline classifier is used only for extracting features rather than making decisions directly. We elaborate the idea of using a matching network as a defense mechanism in the following.
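Before turning to the defense procedure, the following PyTorch sketch makes (3.4) and (3.5) concrete in their simplest form: attention weights are a softmax over cosine similarities between the query embedding and the support embeddings, and class probabilities are the attention-weighted sum of one-hot support labels. The embedding network is assumed to return flat feature vectors; the function name is illustrative.

    import torch
    import torch.nn.functional as F

    def matching_network_predict(embed, query_x, support_x, support_y, n_way):
        """Non-parametric matching-network classification, cf. Eqs. (3.4)-(3.5).

        embed:     embedding network f (the model without its baseline classifier),
                   assumed to map a batch of inputs to a [batch, d] feature matrix
        query_x:   [B, ...] batch of query inputs
        support_x: [S, ...] support inputs; support_y: [S] integer labels
        """
        q = F.normalize(embed(query_x), dim=1)             # [B, d], unit-norm embeddings
        s = F.normalize(embed(support_x), dim=1)           # [S, d]
        attention = F.softmax(q @ s.t(), dim=1)            # [B, S], softmax over cosine similarities, Eq. (3.5)
        one_hot = F.one_hot(support_y, n_way).float()      # [S, n_way]
        return attention @ one_hot                         # [B, n_way], Eq. (3.4)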
During meta-testing, we replace the regular (supervised) fine-tuning by a matching network fine-tuning: a support set for a chosen task is sampled, and we fine tune the model by using only the support set, i.e., a query example is sampled from a sampled support set, and both are used for one epoch of fine-tuning (we refer readers to the Section 4.1 and the Table 1 in [171] for further details). This fine-tuning is optional for a matching network, as it does not require fine-tuning to evaluate a model; however, we utilize it as a defense against poisoning backdoor attacks. A neural network model in such a case only provides features from its embedding layer, and parameters of the model are further adjusted during the matching network fine-tuning. After a certain number of fine-tuning epochs have been reached, we test query examples based on the fine-tuned matching network. 69 (a) Benign users do not possess any image belonging to the back- door classes. (b) Benign users possess some images belonging to each backdoor class. No backdoor class is present in evaluation tasks. (c) Benign users possess some images belonging to each backdoor class. Backdoor classes are present in evaluation tasks. Figure 3.6: Backdoor and Meta-Testing Accuracies during Matching Network Fine- Tuning ( = 0:001) 3.7 Experimental Results: Effectiveness of the Defense In this section, we validate our idea of matching network fine-tuning as a defense mech- anism. We follow all settings in the backdoor experiments in Section 3.5 but replace the regular supervised fine-tuning and supervised testing in meta-testing by a matching net- work architecture. We also follow the regular settings with the same fine-tuning learning rate = 0:001. 70 Our results are reported in Fig. 3.6, with the same three cases w.r.t. the presence of backdoor classes. Note that since fine-tuning is optional for a matching network, we thus report the evaluated performance without/before fine-tuning at epoch 0. Our proposed defense mechanism can successfully remove the backdoor effect: when backdoor classes are present in evaluation tasks (3.6c), the backdoor accuracy drops to 0% in only a few epochs of fine-tuning, i.e., all backdoor examples are correctly recognized as their cor- responding true classes rather than the target class. When no backdoor class is present in evaluation tasks (3.6a and 3.6b), the backdoor accuracy also drops to around the 20% baseline accuracy in only a few epochs. From the above results, we notice that our idea of matching network fine-tuning as a defense is very successful w.r.t. eliminating back- door effects, however, with sacrifice on model performances: the averaged main task accuracies of the three cases in Fig. 3.6 reach around 92% after 50 epochs in most of the cases, and the meta-testing accuracies remain slightly lower than 80%, while the aver- aged main task accuracies were above 99% and the meta-testing accuracies were around 92%95% when using regular supervised fine-tuning (Fig. 3.3). One reason could be that the neural network model is trained and optimized with the baseline classifier as a whole, while in the matching network fine-tuning architecture we discard the baseline classifier and only fine tune the rest of the model, which thus may not be optimal. An- other reason is that, in this work, we do not use sophisticated methods which yield better performances, but focus on the purest attention-based matching network architecture as a prototype in order to clarify and for the ease of validating our idea. 
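The matching-network fine-tuning used as the defense in Section 3.6.2 can be sketched as one training step per epoch: a query example is held out of the sampled support set, classified by the non-parametric matching procedure, and the resulting loss updates only the embedding network. This sketch reuses the matching_network_predict function shown earlier and assumes the support set is a list of (tensor, integer-label) pairs; the exact episode construction follows [171] and the names here are illustrative.

    import random
    import torch
    import torch.nn.functional as F

    def matching_finetune_epoch(embed, optimizer, support, n_way, rng=None):
        """One epoch of matching-network fine-tuning: hold one support example out
        as the query, classify it by matching, and update the embedding network."""
        rng = rng or random.Random()
        idx = rng.randrange(len(support))
        qx, qy = support[idx]
        rest = support[:idx] + support[idx + 1:]
        sx = torch.stack([x for x, _ in rest])
        sy = torch.tensor([y for _, y in rest])

        probs = matching_network_predict(embed, qx.unsqueeze(0), sx, sy, n_way)
        loss = F.nll_loss(torch.log(probs + 1e-8), torch.tensor([qy]))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()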
71 Chapter 4 Towards Privacy in Algorithmic Transparency 4.1 Chapter Introduction On May 25, 2018, the EU General Data Protection Regulation (GDPR), probably the most important change in data privacy regulation in the last two decades, came into effect. GDPR regulates the processing of collected personal or non-personal data of any data subject (the natural person to whom data relates). Any data controller shall inform data subjects before collecting their data, and is required to clearly explain the purpose of collecting data and how data will be processed, upon data subjects’ requests (“right to explanation”) [83]. Companies which do not comply with the regulation are subject to maximum penalty of 20 million euros or 4% of global revenue, whichever is greater. This was followed by the California Consumer Privacy Act of 2018 (AB-375), passed by the state of California legislature on June 28, 2018 to become effective on January 1, 2020, which requires similar data privacy protections as those imposed by the GDPR. In order to meet these privacy regulations, despite some controversy [61, 173], it is imperative for a data controller, or a trust-worthy third-party regulation agency, to provide (i) transparency for its (manual or automated) processing logic or algorithm and (ii) rationale for the underlying decision-making, prediction, etc., with accurate and clear explanation in an algorithmic transparency report (ATR). Moreover, with growing 72 concern of bias, many would like to know the reasons behind certain decisions in or- der to understand if they are being treated fairly. Ongoing concerns about fairness have been raised by the media, government agencies, foundations, and academics over the past decade (see [21] for a detailed survey). Unfortunately, decision processes are often opaque, making it difficult to rationalize why certain decisions are made and whether they favor or disfavor certain individuals or groups. Apart from the right to explanation, the “right to non-discrimination” is also required by the GDPR [83]. With the rising concerns of fairness, in order to understand the principle of a decision-making black- box and whether it is biased or selective relative to certain protected attributes 1 , data controllers or trustworthy third party regulatory agencies are responsible for providing necessary information about the underlying opaque decision process to ensure fair and transparent processing. In the era of big data and machine learning (ML), automated data processing al- gorithms are widely adopted in many fields for classification, prediction, or decision- making tasks due to huge volumes of input data and successful performance of ML approaches. Disclosing a human decision process might be straightforward; however, disclosing a ML-model-based decision process is a non-trivial task. Many ML models are complicated, e.g., the neural network family, making it difficult (even for the model designers) to understand the underlying principles of making certain decisions or pre- dictions, behaving as a black-box. Researchers have proposed various methods for un- derstanding the underlying principles of ML black-boxes (see [87] for a detailed survey) and to explain their outputs in an interpretable (i.e., human-understandable) manner in order to verify, improve, and learn from ML models [144]. One important issue in ML that recently arose is fairness. 
It has been shown that ML algorithms can be biased when 1 Protected attributes are a subset of attributes, to which any decision process should not show prefer- ence, in any instance. 73 (i) a dataset used to train ML models reflects society’s historical biases [166], e.g., only men can be presidents, or (ii) because ML algorithms have much better understanding of the majority groups and poor understanding of the minority groups [16], and so on. To mitigate these biases, several definitions of fairness in ML [53, 90, 65, 107] and schemes for achieving fairness in ML algorithms [98, 187, 97, 26] have been proposed in recent years. Another important issue is security and privacy in ML. As of mid-2018, there is a substantial literature pointing out potential privacy threats in ML [129], including mem- bership attacks [154], training data extraction (model inversion attack) [69, 68], model extraction [167], and so on. To cope with these privacy attacks, researchers suggest applying differential privacy [56] or homomorphic encryption [138, 77] to perturb or encrypt training data in order to preserve data subjects’ privacy [153, 2, 80]. An accountable algorithmic transparency report, especially for automated ML de- cision processes, should include transparency of the underlying algorithm, ability to inspect fairness of the algorithmic decisions, and most importantly, preserve data sub- jects’ privacy, as depicted in Fig. 4.1. However, the perspectives of transparency, fair- ness, and privacy in algorithmic transparency are not completely in sync with those in ML. For example, most ML fairness works’ philosophy is to train fair algorithms based on proposed fairness definitions, while in accountable algorithmic transparency, the phi- losophy is to verify (from trust-worthy 3rd-party regulation agencies’ perspective), or to demonstrate (from data controllers’ perspective) whether the examined ML algorithm complies with certain fairness requirements. Moreover, for security and privacy in ML, attackers may have access to or possess critical information of the targeted ML model to perform attacks (e.g., model inversion attack [69, 68]) and/or manipulate the inputs to fool the neural network (adversarial examples attack [164, 106]), and defenders can perturb or encrypt input data to preserve privacy [153, 2, 80]. However, for privacy in 74 accountable algorithmic transparency, the provided information is either an interpretable surrogate model, or a list of features important to the decision outputs [44, 87] (also see Section 4.3.1). Adversaries have no access to the inputs or the original ML models but only the information provided in the report (more detailed differences are discussed in Section 4.2.3), and data controllers or the regulation agencies are not able to make di- rect changes to the original inputs (users’ data) as well as the model in order to preserve privacy, but perturb the information provided in the ATR instead, i.e., data controllers decide how much truthful information about the model would be provided, and how much information should be hidden or modified due to privacy concerns. An adver- sary’s inference attack does not succeed if given such a perturbed report, the amount of information about users’ data inferred by an adversary meets the privacy requirement. 
In an ATR, we are interested in the trade-off between the amount of released truthful information and the privacy requirements: what is the maximal amount of information about the model that can be released to aid in understanding the decisions and verifying fairness such that an adversary’s attack does not succeed? To the best of our knowledge, there exist very few works [45, 44, 9, 62] that si- multaneously address transparency, fairness, and privacy for algorithmic transparency. It has been pointed out that transparency, which was proposed by legislature to protect people’s rights, may hurt privacy [44, 9]. The most pioneering (and argueably likely the only) work addressing all three issues with details of proposed methodologies, demon- stration, and discussion is [44], in which the authors introduce quantitative input influ- ence (QII) to measure the importance/influence of each input feature on the decision outcomes, and also propose QII of group disparity to measure the degree of fairness w.r.t. decision outcomes among groups. Differential privacy is also adopted in the QII measures in order not to leak any data subject’s sensitive information by comparing QII query responses. 75 Transparency of ML models Fairness in ML Security & privacy in ML ATR Figure 4.1: A depiction of the realm of accountable ATRs However, although it has been indicated that transparency may hurt privacy, it is yet to be made clear how (from what possible aspects) transparency can hurt privacy. Moreover, differential privacy (DP) is widely adopted by researchers in order to pre- serve privacy; unfortunately, applying differential privacy solely is insufficient for this goal ([58], section 2.3.2). Note that, when an adversary is able to obtain partial knowl- edge of the joint distributions of (several) most influential input features from auxiliary databases/sources (e.g., demographic and social statistics, see Section 4.2.3 and 4.4 for detailed explanation), combining their knowledge of the targeted data subject’s public record (e.g., gender, ZIP code, occupation) and received decision (e.g., approval/denial of a credit card, a visa, or an admission from a university), it may be possible for an ad- versary to identify the targeted data subject’s private information with high confidence based on the information provided in an ATR. Such an inference attack does not require strong assumptions of adversaries’ knowledge, and in general, cannot be remedied by differential privacy ([58], section 2.3.2, the smoking-cause-cancer example). In exploring the facets of potential privacy threats in algorithmic transparency, in this work we make the following contributions: • We explicitly demonstrate inference attacks on data subjects’ private information using a real dataset and show that such attacks can be performed on various trans- parency schemes without strong assumptions of adversaries’ knowledge. From 76 specific instances of inference attacks, we expose the possible aspects of algorith- mic transparency that could hurt privacy. (See Section 4.3) • We use maximum confidence of an adversary to characterize the privacy require- ments for an ATR to mitigate inference attacks, where adversaries cannot utilize their knowledge of input distributions plus public information of some individuals along with algorithmic transparency to infer any sensitive information with con- fidence higher than a certain threshold, as pre-determined by privacy protectors (the lower the threshold, the higher the privacy). 
(See Section 4.4) • We analyze the impact on (i.e., distortion of) the measured fairness caused by pri- vacy perturbation, which leads to a fidelity-privacy trade-off problem (see Section 4.5; here fidelity is the the opposite of distortion of information). Given a fidelity requirement, we propose an efficient (linear-time) optimal privacy perturbation scheme, i.e., with the lowest possible confidence threshold that we can achieve, subject to a fidelity constraint. This leads us to useful qualitative insight about privacy. (See Section 4.6) • Our solution can be applied to more general problems beyond algorithmic trans- parency where the release of the model information is controlled and the input data cannot be modified. For example, model inversion attack in [69] where the model owner has no authority to modify the input data (patents’ clinical history and genomic data) but has the control of the amount of information about the (dose-suggesting) model to be released. Differential privacy (DP) was adopted in [69] for preserving privacy; however, we argue that using DP for inference attack is inadequate. In this scenario, our proposed scheme can be applied to help solve the problem of privately releasing information of the model to pharmacists for better understanding of suggesting personalized dose. 77 4.2 Preliminaries We begin with some background and useful definitions for an algorithmic transparency report (ATR), including overview of the main notations in Table 4.1. 4.2.1 Transparency Schemes Fig. 4.2 illustrates an opaque decision-making blackbox, which is essentially an un- known decision mapping function defined as follows. Definition 6. (Decision Mapping [53]) A decision mappingD A , or simplyD :R X ! (A) is a function mapping from input attributesX =fX k jk = 1;:::;Kg to proba- bility distributions over the range of decision outcomesA, denoted byA. Formally, D A (X) =fP AjX (A =ajX)j8a2Ag =fD a (X)j8a2Ag: (4.1) Particularly, for binary decisions (0 =‘negative’ and 1 =‘positive’), D A (X) = 8 > > < > > : D 1 (X) =d(X) , fora = 1 D 0 (X) = 1d(X) , fora = 0: (4.2) Note thatd(X) represents probabilities of mapping from input space to the positive decision outcome. In particular, for binary decisions, knowingd(X) gives probabilities of mapping to both, positive and negative decision outcomes. Therefore, it can be used to characterize both, probabilistic and deterministic decision rules [41]. Exploring transparency schemes to explain ML blackboxes has become popular in recent years, with substantial existing literature. A recent comprehensive survey is proivded in [87]. One common transparency approach (e.g., [137, 103]) collects both 78 Decision Outcomes … Decision Blackbox … Set of Input Attributes Figure 4.2: A representative illustration of a decision blackbox input data and labeled outputs (decision outcomes) as a training dataset, to train an ML surrogate model, which must be an interpretable (human-understandable) model. Another common approach (e.g., [31, 67]) extracts certain important properties from blackbox models, such as contributions of input features to model outputs. In general, most transparency schemes can be classified into the following two main categories: • Interpretable Surrogate Model: These transparency schemes interpret a complex blackbox by using an interpretable surrogate model (e.g., linear model, logistic regression, decision tree, decision rules) to mimic the behavior of the blackbox. Popular methods include Anchors [137] and PALM [103]. 
• Feature (Value) Importance/Interaction: These transparency schemes measure feature importance (based on the underlying $\mathcal{D}$), using both amplitude and sign to represent the importance/influence of input features, where a larger amplitude represents greater influence and the sign indicates a positive or negative effect on the output. Popular methods include LIME [136], FIRM [188], QII [44], and the Shapley Value [102]. Related schemes measure the importance of feature values, using a 2D plot to represent how output values are associated with all possible values of a selected feature. Popular methods include PDP [70], ICE [82], and ALEPlot [11].

An ATR opens the decision blackbox in either or both of the above ways, including the measured fairness discussed next, and thus is a function of the decision mapping $\mathcal{D}$.

Table 4.1: Notation
$U$: Set of all public attributes
$S$: Set of all private attributes
$X_k$: Random variable (r.v.) of attribute $k$
$X_U$: $\{X_k \mid \forall k \in U\}$; collection of r.v.'s of all public attributes
$X_S$: $\{X_k \mid \forall k \in S\}$; collection of r.v.'s of all private attributes
$X$: $(X_U, X_S)$; collection of r.v.'s of all attributes
$R_X$: Range of $X$; the universe of inputs; $R_X = R_{X_U} \times R_{X_S}$
$x_U$: An instance of $X_U$
$x_S$: An instance of $X_S$
$x$: $(x_U, x_S)$; an instance of $X$
$T_{x_U}$: $\{x' \in R_X \mid x'_U = x_U\}$ = range of $(x_U, X_S)$
$A$: The r.v. of the decision outcome
$\mathcal{A}$: Range of $A$
$P(\cdot)$: Aleatory probability; chance
$\tilde{P}(\cdot)$: Epistemic probability; credence or belief
$\mathcal{D}(X)$: $\{P(A = a \mid X) \mid \forall a \in \mathcal{A}\}$; decision mapping (Definition 6)
$\tilde{\mathcal{D}}(X)$: $\{\tilde{P}(A = a \mid X) \mid \forall a \in \mathcal{A}\}$; announced decision mapping
$d(X)$: $P(A = 1 \mid X)$; decision rules (Definition 6)
$\tilde{d}(X)$: $\tilde{P}(A = 1 \mid X)$; announced decision rules
$M$: A privacy protection scheme for an ATR

4.2.2 Fairness

Another important motivation for providing algorithmic transparency is to understand whether a decision-making algorithm is fair. GDPR Article 5 indicates that personal data should be processed fairly and in a transparent manner. Many researchers are committed to providing proper measures of fairness and to making ML algorithms fair [41, 53, 65, 187, 98, 97, 24]. In general, there are two main categories of fairness: (i) individual fairness and (ii) group fairness. Popular definitions of group fairness include statistical parity (SP), conditional statistical parity (CSP), and the p%-rule (PR) (see Section 2.5 for detailed definitions).

In particular, for binary decisions, we say a decision rule $d$ satisfies SP, CSP, or PR for two groups $Y_1$ and $Y_2$, up to bias $\varepsilon$ (SP and CSP only), if

SP: $\left| \mathbb{E}[d(X) \mid T_{Y_1}] - \mathbb{E}[d(X) \mid T_{Y_2}] \right| \leq \varepsilon$ (4.3)

CSP: $\left| \mathbb{E}[d(X) \mid T_{Y_1, V}] - \mathbb{E}[d(X) \mid T_{Y_2, V}] \right| \leq \varepsilon$ (4.4)

PR: $p \leq \dfrac{\mathbb{E}[d(X) \mid T_{Y_1}]}{\mathbb{E}[d(X) \mid T_{Y_2}]} \leq \dfrac{1}{p}$, (4.5)

where $T_Y \triangleq \{x \in R_X \mid g(x) \in Y\}$ and $T_{Y,V} \triangleq \{x \in R_X \mid g(x) \in Y, v(x) \in V\}$, in which $g(X)$ is a projection function from the input attributes $X$ onto a group in the protected attributes, and $v(X)$ is a score/valuation function from $X$ onto a set of scores.

Remark 2. Note that all fairness definitions are based on the distance between the decision distributions² of two groups, specifically, total variation (2.6) and the relative metric (2.5). Let $\mathcal{F}$ be the set of all fairness definitions. Based on the distance metric used, $\mathcal{F}$ can be classified as follows:

• Total-variation-based fairness definitions ($\mathcal{F}_{tv}$): These include $(D_{tv}, \mathcal{D})$-individual fairness, statistical parity, and conditional statistical parity.

• Relative-metric-based fairness definitions ($\mathcal{F}_{rm}$): These include $(D_{\infty}, \mathcal{D})$-individual fairness and the p%-rule.
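For illustration, the following Python sketch computes empirical versions of the three group-fairness quantities in (4.3), (4.4), and (4.5) from a dataset of records. It is a sketch only: the records are assumed to be an iterable of rows, and d, g1, g2, and v are callables standing in for the decision rule, the two group-membership tests, and a score condition.

    import numpy as np

    def group_rate(d, X, in_group, condition=None):
        """Empirical E[d(X) | T]: average positive-decision probability over the
        records that pass the group test (and, optionally, a score condition)."""
        rows = [x for x in X if in_group(x) and (condition is None or condition(x))]
        return float(np.mean([d(x) for x in rows]))

    def sp_bias(d, X, g1, g2):
        """Bias in statistical parity: |E[d | T_Y1] - E[d | T_Y2]|, cf. Eq. (4.3)."""
        return abs(group_rate(d, X, g1) - group_rate(d, X, g2))

    def csp_bias(d, X, g1, g2, v):
        """Bias in conditional statistical parity under score condition v, cf. Eq. (4.4)."""
        return abs(group_rate(d, X, g1, v) - group_rate(d, X, g2, v))

    def p_percent(d, X, g1, g2):
        """p%-rule value: the smaller ratio of the two groups' positive rates, cf. Eq. (4.5);
        the rule holds when this value is at least p (e.g., 0.8 for the 80%-rule)."""
        r1, r2 = group_rate(d, X, g1), group_rate(d, X, g2)
        return min(r1 / r2, r2 / r1)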
4.2.3 Adversarial Setting in Privacy Recall that the adversarial setting here is in the context of the blackbox setting [129]. In contrast to the whitebox setting, where adversaries may have knowledge of significant model parameters, can probe the targeted model as many times as they like, and have 2 More precisely, from (2.7), (2.8), and (2.9), the decision distribution of a group is the expected decision mapping among the group, over all decision outcomesa2A. 81 the capabilities to manipulate the inputs to the model, the blackbox setting is perhaps more a realistic and a more common threat model. However, unlike the blackbox setting in [129] where adversaries can have access to all the input features and output responses, and further infer the model [167], here we consider much weaker adversaries (note that we will show that even for not-so-powerful adversaries, privacy leakage can still happen). More specifically, adversaries may only know side information of input features (e.g., joint distributions, public attribute values, etc.) and a few output responses of some individuals. Adversaries are not able to extract any information about the blackbox from the very limited knowledge of both inputs and outputs. All the information regarding the blackbox is unknown to adversaries, and the only source of such information is the announced algorithmic transparency report. More specifically, in this work, we mainly focus on inference attacks brought by al- gorithmic transparency reports. The most powerful blackbox adversaries, having access to all the input features and output responses, can train a more powerful surrogate model (as it need not to be an interpretable model) to mimic the original model and thus can obtain more accurate information w.r.t. the blackbox, as compared to what is provided in an announced ATR. In such a case, the privacy hazard is not due to the ATR, as adver- saries have already obtained something more powerful (resulting in stronger inference). Moreover, with assistance of an auxiliary database, it has been shown that it is possible to identify a data subject and the corresponding private record based on his/her quasi- identifier (QID) [159, 161, 3], especially when the database is sparse [124]. Moreover, when public records are known to be strongly correlated with private records, by know- ing a data subject’s public record, one can infer his/her private record, and vice versa (e.g., [89]). For the above-mentioned two inference attacks, the secret originates in public attributes, and the corresponding privacy-preserving problems are known as the anonimization problem [163] and the privacy funnel problem [116] in data-publishing. 82 Whatever inference an adversary may obtain from public records inputs, releasing a privacy-preserving ATR does not give sufficient information to the adversary to further violate the privacy requirement. Consequently, we focus on the more interesting case where adversaries only have week inference between public and private (or identity) attributes, e.g., joint distribution of gender, race, marriage status, and annual income can be easily obtained from [1]. Adversaries may know decision outcomes of specific individuals. For instance, an ad- versary Bob knows his friend Tom is working at NASA implying Tom received an offer (a positive decision) from NASA’s interview process. Or, Bob sees his friend Alice pay- ing for lunch using a certain credit card, so he knows that Alice’s credit card application was approved. 
Moreover, an adversary may also know partial information about the in- puts. For example, Bob may also know Alice’s (reasonably) public information such as gender, age, ZIP code, and occupation, also provided in the credit card application. In addition, with the assistance of demographic or auxiliary databases (e.g., [1]), Bob may be able to estimate the joint distribution of some input features used in the credit card application, e.g., gender, age, and income. In practice, the background knowledge that an adversary may possess is unknown to the agency in charge of the ATR. Therefore, it is important that the agency considers the worse-case scenario, i.e., the most information that an adversary can possess. Given our adversarial setting, an adversary can possess knowledge of the rangeR X and the joint distributionP X (x) of all inputs x, as well as the public record (a.k.a. quasi-identifier (QID)) x U and the received decision a of the targeted individual. Moreover, if there exists any privacy protection schemeM used for an ATR, the adversary also knows its internal privacy parameter, i.e., the predefined required privacy level. In general, there are two things an adversary does not know (or knows only with low confidence) before 83 Table 4.2: A Synthetic Credit Card Application Scenario Adversaries’ Knowledge Input Attributes ATR Side-Info Popu- lation Annual Income Gender Decision Rule Census Statistics 139 < 100k F 0 93.1% 9 100k200k F 0 5.7% 2 > 200k F 1 1.2% 117 < 100k M 0 84.2% 18 100k200k M 0:5 12.3% 5 > 200k M 1 3.5% seeing an ATR: (i) data subjects’ private recordsx S and (ii) the decision mappingD of the black-box. 4.3 Privacy Leakage via an ATR A negligent ATR could result in a serious hazard to data subjects’ (individuals who the ATRs are about and to whom data and the decision process relates) privacy. In this section, we demonstrate how exactly this could happen. We investigate and demonstrate privacy hacking instances via different transparency schemes and fairness measures. We show that even a not-so-powerful adversary can do this. In what follows, we demonstrate potential privacy leaks caused by algorithmic trans- parency from main approaches: (i) transparency schemes via interpretable surrogate models, (ii) transparency schemes via feature (value) importance, and (iii) fairness mea- sures. 84 4.3.1 Privacy Leakage via Interpretable Surrogate Models As noted, transparency schemes such as Anchors [137] and PALM [103] can interpret a blackbox’s rules in a human-understandable manner, such as decision rules or deci- sion trees. Here, we explain how such transparent information can hurt a data subject’s privacy. Without loss of representativity, we set up a synthetic scenario, in which we consider the existence of a perfect interpretable surrogate model 3 , to illustrate the pos- sibility of causing a catastrophic privacy leak. Consider the following synthetic scenario. A credit card application takes several input attributes from applicants, while the bank’s decision process only depends on two input attributes: the applicants’ annual income and their gender (which is illegal and is not supposed to be used in any decision process). Due to the suspicious differences in approval rates between male and female applicants, a third-party regulatory agency actively takes action. 
It collects all applicants’ data and their received decisions and trains an (assumed perfect) interpretable surrogate model, disclosing the decision rules used in the credit card application to all past applicants, as follows d(fIncomeg> 200k) = 1, d(fIncomeg2 100k200k, Male) = 0:5, 3 Here, a transparency scheme is not utilized for privacy leakage, but the information provided in the resulting surrogate model is. Therefore, here we do not need to actually implement these transparency schemes but simply consider the most privacy-catastrophic case, a perfect interpretation, which has the most accurate information in an algorithmic transparency report. 85 where d() is decision rule defined in Definition 6, representing the probability of re- ceiving a positive decision given the condition. An equivalent if-then decision rule form is the following 8 > > > > > < > > > > > : if Income> 200k, then Positive Decision; if 100k Income 200k^ Male, then Random; otherwise, then Negtive Decision. Note that other interpretable surrogate models such as a decision tree or logistic regres- sion can also be equivalently expressed by decision ruled(). Consider Table 4.2 for the synthetic credit card application scenario, where the key input attributes, population, and decision rules are listed. Populations of applicants are aggregated according to decision regions, i.e., the regions of input attributes partitioned by decision rules. Here the population proportion among decision regions refers to the U.S. census data, and adversaries assumed blind to population of applicants utilize the U.S. census data as side-information to estimate, for each decision region, the percent- age of the total number of male/female applicants. Adversaries know public information (gender) of targeted applicants, and also know decision rules from an announced algo- rithmic transparency report (ATR). When an algorithmic transparency report containing such decision rules is negli- gently announced, as it contains information of strong dependencies between annual income and decisions, any female using such a credit card in public instantly tells any- one who has ever seen the report that her annual income is above 200k, which not only results in a privacy hazard to her, but may also result in unexpected safety concerns. In such a case, an adversary does not even require auxiliary information to be able to infer someone’s secret. 86 Male credit card owners are also at risk, although not as much. For a male credit card owner, the confidence of an adversary believing that his income is above 200k is only around 36%, compared with 100% in the case of a female owner, while based on census statistics, the confidence of an adversary believing that his income is above 200k is merely 3.5%. In other words, once such a negligent algorithmic transparency report is announced to the public, a high-income (>200k) male credit card owner’s risk of exposing annual income information is increased 10 fold. In summary, releasing precise information of interpretable surrogate models (that can be equivalently expressed by decision rules) can be harmful to data subjects’ pri- vacy, as such information gives adversaries a clear mapping between input records and received decision. With assistance from public information and/or side-information, adversaries can abuse algorithmic transparency to undermine people’s privacy. 
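The adversary's inference in this scenario is a direct application of Bayes' rule, as the short sketch below shows. It combines the census priors from the side-information column of Table 4.2 with the announced decision rules, and reproduces the figures discussed above: 100% confidence for a female card owner and roughly 36% (versus a 3.5% prior) for a male card owner. The numbers are transcribed from Table 4.2; the function name is illustrative.

    # Census priors P(income bracket | gender) and announced decision rules d(.)
    # transcribed from Table 4.2: (prior, approval probability) per bracket.
    BRACKETS = {
        "female": {"<100k": (0.931, 0.0), "100k-200k": (0.057, 0.0), ">200k": (0.012, 1.0)},
        "male":   {"<100k": (0.842, 0.0), "100k-200k": (0.123, 0.5), ">200k": (0.035, 1.0)},
    }

    def confidence_high_income(gender):
        """Adversary's posterior P(income > 200k | gender, approved) by Bayes' rule:
        weight each bracket by prior * approval rate, then normalize over brackets."""
        joint = {b: prior * d for b, (prior, d) in BRACKETS[gender].items()}
        return joint[">200k"] / sum(joint.values())

    print(confidence_high_income("female"))   # 1.0   : certainty for a female card owner
    print(confidence_high_income("male"))     # ~0.36 : about 36%, vs. a 3.5% census prior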
4.3.2 Privacy Leakage via Fairness Measures Recall that one of the main motivation for algorithmic transparency is to understand if a decision-making algorithm is fair and complies with regulations/law, e.g., the U.S. equal employment opportunity commission (EEOC) regulates the ratio of the hiring rates between women and men, which should not be lower than 80% (80%-rule). In an algorithmic transparency report, such fairness measures may be required upon data subjects’ demands (GDPR, Article 22). To this end, consider again the credit card application in Table 4.2, in which the bank is under suspicion of discriminating against female applicants. Upon female applicants’ demands, a regulation agency gets involved and discloses the following fairness mea- sures for gender: (i) bias in statistical parity (SP) (Definition 2) for male and female applicants, (ii) bias in conditional statistical parity (CSP) (Definition 3) for male and female applicants who have the same level of income. An ATR listing all the above 87 Table 4.3: Fairness Measures for Table 4.2 in an ATR Y 1 =fFg,Y 2 =fMg W 1 =fAnnual Income 100kg W 2 =f100k Annual Income 200kg W 3 =fAnnual Income 200kg Overall approval rate for female (Y 1 ) = 1.33%; Overall approval rate for male (Y 2 ) = 10%; Bias in SP forY 1 andY 2 = 0.0866; Bias in CSP forfY 1 ;W 1 g andfY 2 ;W 1 g = 0; Bias in CSP forfY 1 ;W 2 g andfY 2 ;W 2 g = 0.5; Bias in CSP forfY 1 ;W 3 g andfY 2 ;W 3 g = 0. fairness measures w.r.t. the credit card application is shown in Table 4.3. According to GDPR, Recital 58, information related to the public’s concerns (fairness in gender in this case) can be announced to the public in an electronic form, e.g., through a website. Moreover, a data subject, which is an credit card applicant in our scenario, has the right to inquire about the decision principle w.r.t. his or her personal data. Mary, a low-income (<100k) female who would like to know why her applications are always denied, demands information regarding the decision processing for her record. The response indicates that the approval rate for a low-income female is 0. She would also like to know the decision principles for other groups of people, so she requests additional information. According to GDPR Article 12, the regulation agency refuses her excessive requests. However, she later realizes that she can utilize the information provided in Table 4.3 to achieve her goal. 88 If we let d i;j be the decision rule for people infY i ;W j g, Mary just got a reply indicating d 1;1 = 0. By utilizing the census statistics in Table 4.2, the information provided in Table 4.3 tells the following 0:931d 1;1 + 0:057d 1;2 + 0:012d 1;3 = 0:0133; 0:842d 2;1 + 0:123d 2;2 + 0:035d 2;3 = 0:1; jd 1;1 d 2;1 j = 0; jd 1;2 d 2;2 j = 0:5; jd 1;3 d 2;3 j = 0: The adversary Mary now knows that d 1;1 = d 2;1 = 0, d 1;3 = d 2;3 , and either d 1;2 = d 2;2 + 0:5 or d 1;2 = d 2;2 0:5. She first assumes d 1;2 = d 2;2 + 0:5, by plugging the values ofd 1;1 andd 2;1 , and replacingd 2;2 andd 2;3 byd 1;2 0:5 andd 1;3 , respectively, she gets 0:066d 1;2 + 0:023d 1;3 = 0:1481. Sinced i;j are probabilities,8i;j,d 1;2 andd 1;3 can not be grater than 1, and thus the equation is infeasible and the assumption is wrong. She then knowsd 1;2 =d 2;2 0:5. Repeat the same steps and she will getd 1;2 = 0:0088 and d 1;3 = 1:0692. 
By understanding any d i;j can not be greater than 1 and this is probably caused by the mismatch between the census statistics and the true distribution, she would thus updated 1;3 = 1 andd 1;2 = 0:0013. Therefore, by utilizing the decision processing rule for her record and the publicly announced fairness measures, she can obtain accurate decision rules for the credit card application. As from Section 4.3.1, we know privacy disaster can happen when accurate decision rules are released or hacked. The adversary Mary now can utilize her hacked decision rules to infer other applicants’ income. 89 4.3.3 Privacy Leakage via Feature Importance/Interaction Feature (value) importance, or feature (value) interaction, measures the importance (or influence) of input attributes (or attribute values) to the decision outcomes. The impor- tance of an input attribute (value) is measured based on the corresponding change of output due to change of that certain input. By changing an input, if the change of output is significant, it implies the input is important (has significant influence) to the output. On the other hand, if the output changes very little, the input contributes very little to the output. Different works may propose different measures, but their philosophies are almost the same (as stated above). For example, the measures for change of an input can be (i) removing the presence of an input attribute, or (ii) permuting attribute values on an input attribute. The measures of outputs are many, e.g., (i) accuracy of the (predicted) outputs [31, 67], (ii) probability of receiving a certain outcome [44], (iii) statistics mea- sures, such as partial dependence [70, 85], H-statistic [71], or variable interaction net- works [93], or (iv) a self-defined quantity or a score/gain function. The measures for the change of outputs can be (i) difference (i.e., subtraction), (ii) ratio, or (iii) averaged difference/contribution, e.g., the Shapley value [102], of the measured outputs. In this regard, it is impractical for us to demonstrate the privacy leakage issue for all present methods. However, since the philosophies of all these methods are similar, it is reason- able for us to demonstrate the privacy hacking procedures via a representative one. The principles of hacking procedures can be transferred and applied to other methods. We investigate potential privacy leakage via the quantitative input influence (QII) proposed in the most pioneering work [44] in accountable ATR. For QII, the measure for change of an input is permuting attribute values (called intervention in the paper) on an input attribute. The measure of output can be user-specified, called quantity of interest, denoted byQ. The measure for change of output is difference between (subtraction of) 90 Table 4.4: Attribute Information of the Credit Approval Dataset A1: b, a. A9: t, f. A2: continuous. A10: t, f. A3: continuous. A11: continuous. A4: u, y, l, t. A12: t, f. A5: g, p, gg. A13: g, p, s. A6: c, d, cc, i, j, k, m, r, q, w, x, e, aa, ff. A14: continuous. A7: v, h, bb, j, n, z, dd, ff, o. A15: continuous. A8: continuous. A16: +,- (class attribute) two measured outputs. Formally, the QII of an input attributek for a quantity of interest Q is defined as I Q (k) =Q(X)Q(X k U k ); (4.6) in whichX k U k , meaning that attributek is (removed from inputX and) replaced by a permuted versionU k , represents intervention on attributek. 
In particular, forQ(X) = Pfc(X) = 1jX2 T W g, the fraction of records belonging to a set T W (e.g., women) with positive classification, the QII of an input attributek is I (k) =Pfc(X) = 1jX2T W gPfc(X k U k ) = 1jX2T W g; (4.7) where c() is a classifier (decision-maker). The QII of a set of input attributesK is defined similarly, usingK instead ofk. In the following, we conduct an experiment to demonstrate the hacking of decision rules via provided QII’s on an ATR for a real dataset, and utilize the hacked decision 91 rules to further infer private records as what we did in Section 4.3.1. We use the Aus- tralian credit approval dataset from UCI machine learning repository [48] in our experi- ment 4 . The dataset has 690 instances, with 15 input attributes and 1 output attribute. All attribute information can be found in Table 4.4. In order to protect confidentiality of the data, all attribute names and values have been changed to meaningless symbols by the dataset provider. Based on the dataset, with adequate data cleaning and pre-processing, we train a classifier based on a fully-connected neural network with one input layer (36 inputs, after one-hot encoding for categorical attribute values), two hidden layers (147 and 85 neurons, respectively), and one output layer (binary outputs), with dropout rate 0.5. The averaged testing accuracy of the trained classifier is 89.5%. The trained classifier is served as the knowledge of a trust-worthy 3rd-party regula- tion agency which feeds both inputs and outputs of the dataset to a ML model in order to learn the unknown decision-making rules of this Australian credit card company. Since QII is a data-mining based approach [44], the regulation agency provides information regarding input influences (QII) in an ATR upon users’ demand. Since the access con- trol is still an open question, we assume a user is able to request such information in a reasonable manner. Based on the above experimental settings, we first construct a scenario to demon- strate the hacking. Scenario: • LetU =fA4, A5, A6, A7g be public attributes and all other attributes are private and unknown to adversaries (See Remark 3). 4 Since we are demonstrating stealing private information from a real dataset, the chosen dataset needs to contain critical information, and its size needs to be adequate: on the one hand, it should not be too large for ease of demonstration; on the other hand, it should not be too small for the accuracy of the trained classifier. 92 Table 4.5: A Snapshot of the QID GroupT x U =fy, p, k, vg A4 A5 A6 A7 A8 A9 A10 A11 A12 A13 A14 A15 A16 y p k v 0.125 f f 0 f g 160 0 - y p k v 0.25 f f 0 f g 224 0 - y p k v 0.29 f f 0 f s 200 0 - y p k v 1.25 f f 0 t g 280 0 - y p k v 0.125 f f 0 f g 140 4 - y p k v 0.125 f f 0 f g 200 70 - y p k v 0.085 f f 0 f g 216 2100 - y p k v 0.415 f t 1 t g 280 80 - y p k v 2.5 t t 1 f g 180 20 - y p k v 0.25 t t 10 t g 320 0 + y p k v 0.25 t t 11 f g 380 2732 + • Alice has public recordx U =fy, p, k, vg. She gets a positive decision (+) and receives a credit card. • Tom also has the same public recordx U =fy, p, k, vg. He gets a negative decision (-). • An adversary is a friend of both, knowing their public records, knowing that Alice owns such a credit card but Tom doesn’t. The adversary also has the knowledge of joint distribution of A4A7, A9, and A11, e.g., demographic statistics of age, marriage status, race, and annual income. 
A snapshot of the QID groupT x U =fy, p, k, vg is shown in Table 4.5 5 , in which public at- tributes are marked as grey, class attribute (decision outcome) is marked as light blue, and for those attributes that an adversary has associated side-information (joint distribu- tion) are marked as bold. Let W 0 = fA4=y, A5=p, A6=k, A7=v, A112[0,1]g and W 1 = fA4=y, A5=p, A6=k, A7=v, A112[10,11]g. We next demonstrate privacy hacking procedures in the following. 5 We remove attributes A1A3 in the interest of space 93 Privacy Hacking: 1. Since the input to QII can be a set of attributes, i.e., the joint influence of a set of input attributes. LetS be the collection of all private attributes as denoted in Table 4.1, which isfA1A3, A8A15g in our scenario. The adversary then sends the following QII query to the regulation agency: • Input Attribute:S • Quantity of Interest:Q(X) =Pfc(X) = 1jX2T W 1 g 2. The adversary gets a response I (S) = 0:66475333, which indicates the degree of influence of all private input attributesS to the groupW 1 . 3. The adversary sends the following QII query to the regulation agency: • Input Attribute:S • Quantity of Interest:Q(X) =Pfc(X) = 1jX2T W 0 g 4. The adversary gets a response I (S) =0:33524666, which indicates the degree of influence of all private input attributesS to the groupW 0 . Note that negative sign stands for negative impact as mentioned in Section 4.2.1. 5. From the above two query responses, the adversary has 0:66475333 =Pfc(X) = 1jX2T W 1 gPfc(X S U S ) = 1jX2T W 1 g 0:33524666 =Pfc(X) = 1jX2T W 0 gPfc(X S U S ) = 1jX2T W 0 g 6. SinceW 1 andW 0 have the same public record x U =fy, p, k, vg, for the same classifier, we must have Pfc(X S U S ) = 1jX2T W 1 g =Pfc(X S U S ) = 1jX2T W 0 g: 94 7. Utilize the above equality, the adversary obtains Pfc(X) = 1jX2T W 1 gPfc(X) = 1jX2T W 0 g = 1: Since probabilities are always within [0; 1], the adversary thus obtains decision rules Pfc(X) = 1jX2T W 1 g = 1; Pfc(X) = 1jX2T W 0 g = 0: It is worth mentioning that the attack may not be unique. In the following, we demonstrate another attack approach. Note that this is not the only other approach, but we just demonstrate two approaches to show that there could exist multiple attack approaches. Privacy Hacking (Method 2): 1. The adversary sends the following QII query to the regulation agency: • Input Attribute: A9 • Quantity of Interest:Q(X) =Pfc(X) = 1jX2T W 1 g 2. The adversary gets a response I (A9) = 0:45142778, which indicates the degree of influence of input attribute A9 to the groupW 1 . 3. The adversary tries to analyze the response: DefineP t = Pfc(X A9 U A9 ) = 1jX2 T W 1 ;U A9 = tg andP f = Pfc(X A9 U A9 ) = 1jX2T W 1 ;U A9 = fg. We have 0:45142778 =Pfc(X) = 1jX2T W 1 gPfc(X A9 U A9 ) = 1jX2T W 1 g =Pfc(X) = 1jX2T W 1 gPfU A9 = tgP t PfU A9 = fgP f 95 4. The adversary realizes the fact that, for the same classifier, we must have Pfc(X) = 1jX2T W 1 g =Pfc(X) = 1jX2T W 1 ;A9 = tg =Pfc(X A9 U A9 ) = 1jX2T W 1 ;U A9 = tg =P t 5. Since the adversary has joint distribution knowledge as mentioned in the scenario, he knows the marginal distribution: PfU A9 = fg = 1PfU A9 = tg = 0:45142857; he then gets Pfc(X) = 1jX2T W 1 gP f = 0:45142778 0:45142857 1 6. Since probabilities are always within [0; 1], the adversary knowsP f 0, and Pfc(X) = 1jX2T W 1 g 1: The adversary obtains very accurate information regarding decision rule forW 1 . 
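The arithmetic behind both hacking procedures reduces to a few lines; the sketch below uses the query responses exactly as reported above:

# Method 1: the intervened terms P{c(X_{-S} U_S)=1 | .} are identical for W1 and W0
# (same public record), so they cancel when the two QII responses are subtracted.
qii_S_W1, qii_S_W0 = 0.66475333, -0.33524666
print(qii_S_W1 - qii_S_W0)   # ~1.0, forcing P{c=1|W1} = 1 and P{c=1|W0} = 0

# Method 2: one query on A9 plus the marginal P{U_A9 = f} from side-information.
# Using P_t = P{c=1|W1}, the response equals p_f * (P{c=1|W1} - P_f).
qii_A9_W1 = 0.45142778
p_f = 0.45142857
print(qii_A9_W1 / p_f)       # ~0.999998, so P{c=1|W1} - P_f ~ 1 and P{c=1|W1} ~ 1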
As shown above, there could exist many ways to obtain decision rules, and thus it seems hopeless to cease the attack simply by access control. Based on the hacked decision rules above, the adversary has 100% confidence that Alice’s record belongs toT W 1 and Tom’s record belongs toT W 0 . Based on Table 4.5, he then knows that Alice’s A11 attribute value is either 10 or 11, and Tom’s is either 0 or 1. If the adversary has richer side-information, e.g., joint distribution including A8 and A14, then the adversary has 100% confidence that Alice’s A8 attribute value is 0.25, her 96 A14 attribute value is in the range between 300 and 400, and Tom’s A14 attribute value is in the range between 100 and 300. It is worth mentioning that, based on our investigation, we do not find a general attack method that can be applied to all datasets and decision rules. However, this does not mean the attacks demonstrated above are cherry-picked. As we have shown, there could exist many feasible attack approaches. Adversaries can simply try multiple different attempts and/or collude their test results so that eventually acquire a successful attack result. Moreover, similar to the privacy incidents of AOL search data leak [15] and de-anonymization of the Netflix Price dataset [124], although there is no guarantee that the attacks can always succeed in all the cases, as long as the attack can succeed, there exists a privacy breach which can result in a catastrophic disaster. In fact, the authors of the pioneering work, i.e., [44], had already noticed the po- tential privacy issue in algorithmic transparency and added noise to make the measures differentially private. Unfortunately, adding differentially private noise [56] solely can- not mitigate the demonstrated privacy leakage issue. The fundamental reason is that differential privacy only guarantees a small amount of information leakage when an in- dividual participates the survey or opts into a database. Differential privacy itself does not guarantee information leakage due to strong statistical inference between attributes; this has been noted in many previous works such as [17, 8, 51], and section 2.3.2 in [58]. The most classic example is the study of “smoking causes cancer”, in which no matter whether a person opts into the survey or not, once we know that he is a smoker, we know he has a certain high chance of getting lung cancer. What can be guaranteed in the proposed differentially private perturbation for an ATR is that an adversary can only gain very little information by comparing two ATRs of which the training data to train the classifiers are differ in only one data subject’s record. When the size of dataset 97 is very large, the required variance of DP noise is very small. This is why they claimed only very little noise needs to be added. Remark 3. Although all attribute names in the dataset are removed, we are still able to reasonably conjecture public and private attributes based on their influences to the decision outcome. Attributes with high influences are more likely to be private attributes such as income or credit score, and attributes with low influences are likely to be public ones. Observing that attribute A9, A11, and A15 are the most influential ones and others are less significant (from experiments). For ease of demonstration, we choose 4 adjacent categorical attributes from insignificant ones, A4 to A7, to serve as public attributes. 
4.4 Privacy Measure and Requirement

In the previous section we saw privacy leakage disasters when decision rules were divulged (either directly announced or stolen). The fundamental problem is that transparency schemes as well as fairness measures are closely related to, or are functions of, the decision mapping $D$; more importantly, if $D$ provides strong inference from public knowledge to sensitive records, then once it is utilized in an ATR and obtained by an adversary, the adversary can use it to acquire data subjects' secrets with high confidence.

In light of this, here we propose the following: a carefully processed version of $D$, denoted by $\tilde{D}$, should be adopted as a substitute for $D$ in an ATR for the sake of preserving data subjects' privacy. $\tilde{D}$ should satisfy certain privacy requirements and can then be safely announced (if an ATR chooses to release an interpretable surrogate model) or utilized by the transparency schemes and fairness measures provided in an ATR. In other words, even if an adversary knows $\tilde{D}$ and further utilizes it to perform inference attacks, the maximal confidence that the adversary can attain is carefully controlled in advance, in order to prevent privacy hazards.

In this regard, the privacy measure of the announced decision mapping $\tilde{D}$ used in an ATR should reflect the maximal degree of an adversary's confidence in inferring any data subject's secret via $\tilde{D}$. More specifically, once an ATR is released, an adversary may acquire information about $\tilde{D}$ and update his/her belief accordingly, which can be utilized as an inference channel $\langle X_U, A \xrightarrow{P_X,\, \tilde{D}} X_S \rangle$ mapping any inference source $X_U$ and $A$ to a sensitive attribute value $X_S$. (When the context is clear, we omit the $P_X$ and the $\tilde{D}$ above the arrow for simplicity.) One reasonable privacy measure characterizing the inference enabled by $\tilde{D}$, given the most information that an adversary can possess (see Section 4.2.3), is the maximum confidence of an adversary in inferring any data subject's sensitive value $X_S$, which is the maximum posterior probability over all inputs and outputs (called worst-case posterior vulnerability [63]). We now consider the case in which $S$ is a singleton set.

Definition 7. (Maximum Confidence) Given the adversarial setting and an inference channel $\langle X_U, A \to X_S \rangle$, the confidence of inferring a certain sensitive attribute value $x_S$ from a certain inference source $(x_U, a)$, denoted by $conf(x_U, a \to x_S)$, is the posterior epistemic probability of $x_S$ given $x_U$ and $a$:

$conf(x_U, a \to x_S) = \tilde{P}_{X_S \mid X_U, A}(x_S \mid x_U, a)$.

The maximum confidence of inferring a specific sensitive attribute value $x_S$ from any inference source, denoted by $Conf(X_U, A \to x_S)$, is defined as

$Conf(X_U, A \to x_S) \triangleq \max_{x_U, a} \{ conf(x_U, a \to x_S) \}$.

Accordingly, the maximum confidence of inferring any sensitive attribute value from any inference source is

$Conf(X_U, A \to X_S) \triangleq \max_{x_U, a, x_S} \{ conf(x_U, a \to x_S) \}$.

The privacy requirement, similar to confidence bounding [176, 177], β-likeness [35], and the privacy enforcement in [105], restricts the maximum confidence of inferring any sensitive attribute by a confidence threshold $\theta$, a pre-determined privacy parameter.

Definition 8. ($\theta$-Maximum Confidence) In an algorithmic transparency report, $\tilde{D}$ satisfies the privacy requirement $\theta$-Maximum Confidence if $Conf(X_U, A \to X_S) \le \theta$.

Lemma 1.
The privacy requirement $\theta$-Maximum Confidence imposes the following constraints on the announced decision mapping $\tilde{D}$, for all $x \in R_X$ and all $a \in A$:

$\dfrac{\tilde{D}_a(x) P_X(x)}{\sum_{x' \in T_{x_U}} \tilde{D}_a(x') P_X(x')} \le \theta.$   (4.8)

Proof. Please refer to Appendix 4.8.1 for the detailed proof.

Remark 4. Note that a privacy requirement which only prevents an adversary from correctly inferring the right sensitive attribute value is insufficient. The reason is that an adversary can possess knowledge of the privacy protection scheme and its internal privacy parameter. If the privacy requirement allows an adversary to incorrectly infer wrong sensitive values with arbitrarily high confidence, then, since the adversary knows the privacy requirement, he/she perceives that any sensitive attribute value which can be inferred with confidence higher than the threshold must be an incorrect one; this becomes additional side-information for the adversary. An adversary can further utilize such extra side-information to narrow down the range of conjectures, which enhances the confidence of correctly guessing the right sensitive value. The enhanced confidence could exceed the privacy threshold, and thus cause a privacy hazard.

The advantage of using maximum confidence as a privacy measure is that it yields an intuitive understanding of $\theta$. This can be important when a privacy scheme is used for an ATR: the regulation may require a plain explanation of the adopted privacy scheme as well as the corresponding settings and meanings of its parameters. Alternatively, one can use other privacy measures, e.g., the minimum uncertainty, which conveys essentially the same concept as maximum confidence, but whose privacy parameter grows with the strength of privacy.

Definition 9. (Minimum Uncertainty) Given an inference channel $\langle X_U, A \to X_S \rangle$, the uncertainty of inferring a certain sensitive attribute value $x_S$ from a certain inference source $(x_U, a)$ is defined as $ucrt(x_U, a \to x_S) = -\log conf(x_U, a \to x_S)$. The minimum uncertainty of inferring any sensitive value from any inference source is

$Ucrt(X_U, A \to X_S) = \min_{x_U, a, x_S} \{ -\log conf(x_U, a \to x_S) \} = -\log \max_{x_U, a, x_S} \{ conf(x_U, a \to x_S) \} = -\log Conf(X_U, A \to X_S).$

Similarly, the corresponding privacy requirement for minimum uncertainty is the following.

Definition 10. ($\gamma$-Minimum Uncertainty) In an algorithmic transparency report, $\tilde{D}$ satisfies $\gamma$-Minimum Uncertainty if $Ucrt(X_U, A \to X_S) \ge \gamma$.

The above privacy requirement says that an adversary's uncertainty in inferring any sensitive value from any inference source cannot be too low and must be lower-bounded by a threshold $\gamma$; the larger the $\gamma$, the higher the minimum uncertainty, and thus the stronger the privacy. From Definition 9, it is clear that $\gamma$-Minimum Uncertainty implies $e^{-\gamma}$-Maximum Confidence, and $\theta$-Maximum Confidence implies $(-\log\theta)$-Minimum Uncertainty.

Lemma 2. The privacy requirement $\gamma$-Minimum Uncertainty imposes the following constraints on the announced decision mapping $\tilde{D}$, for all $x \in R_X$ and all $a \in A$:

$\log\Big(\sum_{x' \in T_{x_U}} \tilde{D}_a(x') P_X(x')\Big) - \log\big(\tilde{D}_a(x) P_X(x)\big) \ge \gamma.$   (4.9)

A privacy protection scheme $\mathcal{M}$ takes the original/true decision mapping $D$ as input and, through careful processing based on the privacy requirements, generates a privacy-preserving decision mapping $\tilde{D}$ that is safe for announcement. Inevitably, the original $D$ will differ from the generated $\tilde{D}$, which is a distorted/perturbed but private version of $D$.
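The quantities in Definitions 7-10 and the ratio in (4.8) are straightforward to evaluate once $P_X$ and $\tilde{D}$ are tabulated per QID group. A minimal Python sketch follows (function and variable names are mine; for concreteness it is applied to the female column of the credit card example of Section 4.3.1 with the unperturbed rules, which is exactly the hazardous case the requirement is meant to rule out):

# Worst-case inference confidence per Definition 7 and Lemma 1: within a QID group,
# conf(x_U, a -> x_S) = P(x)*D_a(x) / sum_{x'} P(x')*D_a(x').
import math

def max_confidence(p, d1):
    # p[i]: prior P_X of record i in the group; d1[i]: announced prob. of decision 1
    conf = 0.0
    for d in (d1, [1.0 - v for v in d1]):            # both decision outcomes
        total = sum(pi * di for pi, di in zip(p, d))
        if total > 0:
            conf = max(conf, max(pi * di / total for pi, di in zip(p, d)))
    return conf

p  = [0.931, 0.057, 0.012]       # female income-band proportions (<100k, 100k-200k, >200k)
d1 = [0.0, 0.0, 1.0]             # unperturbed decision rules from Section 4.3.1
theta = max_confidence(p, d1)
print(theta)                     # 1.0: maximum confidence (Definition 7)
print(-math.log(theta))          # 0.0: the corresponding minimum uncertainty (Definition 9)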
However, many $\tilde{D}$ are possible, each with a different degree of distortion (compared with $D$), and we may be interested in the least distorted one. Therefore, as an issue incidental to privacy perturbation, in what follows we introduce a measure that characterizes the degree of distortion, or conversely the fidelity, of the perturbed decision mapping $\tilde{D}$.

4.5 Fidelity

In this section, given $D \xrightarrow{\mathcal{M}} \tilde{D}$, we propose fidelity measures to quantify the distortion from $\tilde{D}$ to $D$. By imposing fidelity constraints on $\mathcal{M}$, the maximal distortion between $\tilde{D}$ and $D$ is guaranteed to be bounded accordingly.

[Figure 4.3: Depictions of the measured bias and the unknown true bias.]

Definition 11. ($\alpha$-Fidelity) A privacy perturbation method $\mathcal{M} : \Delta(A) \to \Delta(A)$ satisfies $\alpha$-fidelity, $\alpha \in [0, 1]$, if for all $x \in R_X$ and all $a \in A$ we have

$|\tilde{D}_a(x) - D_a(x)| \le 1 - \alpha.$   (4.10)

Definition 12. ($\beta$-Fidelity) A privacy perturbation method $\mathcal{M} : \Delta(A) \to \Delta(A)$ satisfies $\beta$-fidelity, $\beta \in [0, 1]$, if for all $x \in R_X$ and all $a \in A$ we have

$\beta \le \tilde{D}_a(x) / D_a(x) \le 1/\beta.$   (4.11)

A privacy protection scheme $\mathcal{M}$ not only influences the announced transparency scheme through a privacy-preserving $\tilde{D}$, but also perturbs the measured bias $\tilde{\varepsilon}$, which depends on the announced $\tilde{d}$ or $\tilde{D}$ (see Section 4.2.2 and Section 2.5). Fig. 4.3 is a representative illustration of the measured bias $\tilde{\varepsilon}_F$ and the true bias $\varepsilon_F$ based on a fairness measure belonging to a set $F$. Since the true decision mapping $D$ will not be released, the true bias, i.e., the bias computed based on $D$, is unknown. A natural question arises: knowing $\tilde{\varepsilon}_F$ and the degree of fidelity of $\tilde{D}$, what can we say about $\varepsilon_F$? The following lemma tells us: if the maximum distortion from $\tilde{D}$ to $D$ is known, then the maximum distortion from $\tilde{\varepsilon}_F$ to $\varepsilon_F$ is also known, and hence the range of $\varepsilon_F$ can be determined.

Lemma 3. Given $D \xrightarrow{\mathcal{M}} \tilde{D}$, if $\mathcal{M}$ satisfies $\alpha$-fidelity, we can guarantee

$|\tilde{\varepsilon}_{F_{tv}} - \varepsilon_{F_{tv}}| \le \min\{2(1-\alpha),\, 1\},$   (4.12)

where $F_{tv}$ is the set of all total-variation-based fairness definitions (Remark 2). In contrast, if $\mathcal{M}$ satisfies $\beta$-fidelity, we can guarantee

$|\tilde{\varepsilon}_{F_{rm}} - \varepsilon_{F_{rm}}| \le \min\{-2\log\beta,\, 1\},$   (4.13)

where $F_{rm}$ is the set of all relative-metric-based fairness definitions.

Proof. The results follow directly by applying the reverse triangle inequality.

Remark 5. The most general definition of fidelity is

$\tilde{D}_a(x)_{\min} \le \tilde{D}_a(x) \le \tilde{D}_a(x)_{\max},$   (4.14)

which describes the restriction (the allowed range) of the distortion of $\tilde{D}$ in a very general manner. The corresponding equivalent representations for $\alpha$- and $\beta$-fidelity are

$D_a(x) - (1-\alpha) \le \tilde{D}_a(x) \le D_a(x) + (1-\alpha),$   (4.15)

$D_a(x)\,\beta \le \tilde{D}_a(x) \le \tfrac{1}{\beta}\,D_a(x),$   (4.16)

in which the upper and lower bounds $\tilde{D}_a(x)_{\max}$ and $\tilde{D}_a(x)_{\min}$ are functions of $D$ and $\alpha$, or $\beta$.

4.6 Privacy-Fidelity Trade-off

We have shown that negligently disclosing decision rules can bring serious hazards to privacy and confidentiality. A privacy protection scheme should be adopted to remediate the potential deleterious effects. However, strong privacy perturbation can cause serious distortion of the announced information, including the decision rules and the measured bias. From this perspective, a privacy protection scheme of interest should preserve privacy while guaranteeing a certain degree of fidelity of the announced information. This becomes a privacy-fidelity trade-off problem: based on the proposed privacy and fidelity measures, we trade off fidelity against privacy accordingly.

4.6.1 Optimization Formulation

To understand the trade-off between privacy and fidelity, it is crucial to understand the boundary of their trade-off region.
Given certain fidelity constraints, the problem of finding the greatest privacy (the smallest ) that we can have in ~ D is mathematically 7 In general, this can be any measured quantities of interests 105 formulated in the following. For conciseness, we omit the subscript of all probability measures and simply write, e.g.,P (x) instead ofP X (x). OPT(R X A) : (OPT) min ~ D (4.17a) s.t. P (x) ~ D a (x) P x 0 2Tx U P (x 0 ) ~ D a (x 0 ) ; 8x2R X ;8a2A (4.17b) ~ D a (x) ~ D a (x) max ;8x2R X ;8a2A (4.17c) ~ D a (x) ~ D a (x) min ; 8x2R X ;8a2A (4.17d) ~ D a (x) 0;8x2R X ;8a2A (4.17e) X a2A ~ D a (x) = 1; 8x2R X : (4.17f) The first constraint in (4.17b) is the privacy constraint-Maximum Confidence in- troduced in (4.26), and the last two constraints in (4.17e) and (4.17f) are probability distribution conditions. The second and the third constraints in (4.17c) and (4.17d) are fidelity constraints introduced in (4.14). Its corresponding representations for- or -fidelity can be found in Remark 5. The objective in (4.17a) is to find the minimal subject to the feasibility of ~ D based on the above-mentioned constraints. It is not hard to see that the optimization problem (OPT) is an equivalent formulation of the generalized linear fractional programming (LFP) problem. It has been known that a generalized LFP is not reducible to a linear programming (LP) problem. However, it can be solved efficiently as a sequence of LP feasibility problems, i.e., solving numerous sub-level LP problems iteratively according to bisec- tion method. By efficient algorithms such as interior point method, the solution of a LP problem can be obtained in pseudo-polynomial timeO( n 3 logn L) [10], wheren is the num- ber of variables, andL is the input length of the problem, i.e., the length of the binary 106 coding of the input data to represent the problem, which is roughly proportional to the number of constraints. However, based on (4.17b)–(4.17f), it is clear that the number of constraints in the problem is proportional tojR X Aj, which can be extremely large when the database has considerable numbers of attributesK, especially for big-data ap- plications. Suppose the cardinality for each input attribute is consistent, e.g.,jX k j = l, 8k = 1;:::;K, we will have at least l K constraints. This leads to an exponentially growing computational complexity. Even for a conservative example, e.g., a binary de- cision process takes 20 input attributes and each attribute has 5 possible values (l = 5 and K = 20), we have at least 2l K = 190,734,863,281,250 constraints. In order to solve a generalized LFP problem, we need to solve such a huge LP problem iteratively. 4.6.2 Decomposability However, in the following, we show that the optimization problem can actually be de- composed into numerous small sub-problems and thus can be solved efficiently. An optimization problem is separable or trivially parallelizable if the variables can be par- titioned into disjoint subvectors and each constraint involves only variables from one of the subvectors [29]. By observing (i) each constraint in (4.17c), (4.17d), and (4.17e) involves only a single variable ~ D a (x), (ii) each constraint in (4.17f) involves a set of variablesf ~ D a (x)j8a2Ag, and (iii) each constraint in (4.17b) involves a set of vari- ablesf ~ D a (x)j8x2T x U g, we notice that any variable ~ D a (x) is a complicating variable in T x U A but is irrelevant to any other variables outside the QID group T x U . 
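Before exploiting this structure, the bisection-over-LP-feasibility approach described above is easy to make concrete when restricted to a single QID group (a sub-problem of the kind formalized next). The following is a minimal sketch, assuming scipy is available; function and variable names are mine, and the group data is borrowed ahead of time from the numerical example of Section 4.7 so that the result can be checked there:

import numpy as np
from scipy.optimize import linprog

def feasible(theta, p, d1_lo, d1_hi):
    # LP feasibility of (4.17b)-(4.17f) for one QID group at threshold theta,
    # with variables d1[k] = announced probability of decision 1 (d0 = 1 - d1).
    m, p = len(p), np.asarray(p, dtype=float)
    A, b = [], []
    for k in range(m):
        row = -theta * p; row[k] += p[k]          # p_k d_k - theta * sum_i p_i d_i <= 0
        A.append(row); b.append(0.0)
        row0 = theta * p; row0[k] -= p[k]         # same constraint for decision 0
        A.append(row0); b.append(theta * p.sum() - p[k])
    res = linprog(c=np.zeros(m), A_ub=np.array(A), b_ub=np.array(b),
                  bounds=list(zip(d1_lo, d1_hi)), method="highs")
    return res.success

def min_theta(p, d1_lo, d1_hi, tol=1e-6):
    # Bisection: the smallest threshold for which the group LP is feasible.
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (lo, mid) if feasible(mid, p, d1_lo, d1_hi) else (mid, hi)
    return hi

# Female group of the Section 4.7 example under 90%-fidelity (additive +-0.1 box):
p, d1 = [0.3, 0.125, 0.075], [0.0, 0.0, 1.0]
d1_lo = [max(0.0, v - 0.1) for v in d1]
d1_hi = [min(1.0, v + 0.1) for v in d1]
print(min_theta(p, d1_lo, d1_hi))                 # ~0.675, as derived in Section 4.7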
Hence, (4.17b)–(4.17f) are complicating constraints within a tuple but separable constraints among tuples. (OPT) can thus be decomposed into multiple smaller sub-problems; each 107 focuses on a particular QID group only. Leth( ~ D a (x);) 0 be the affine function rep- resenting all linear inequality constraints (4.17b)–(4.17e). An optimization sub-problem can thus be expressed as follow. OPT-SUB(T x U A) : (OPT-Sub) min ~ D (OBJ-Sub) s.t. h( ~ D a (x);) 0;8x2T x U ;8a2A (INEQ-Sub) X a2A ~ D a (x) = 1;8x2T x U : (EQ-Sub) (OPT) is then equivalent to the master problem below. OPT-MASTER(R X A) : (OPT-MS) min ~ D (OBJ-MS) s.t. (INEQ-Sub(T x U A));8T x U R X (INEQ-MS) (EQ-Sub(T x U A));8T x U R X : (EQ-MS) Lemma 4. Let Tx U denote the optimal value of a sub-problem (OPT-Sub), the opti- mal value of (OPT). We have = max Tx U R X Tx U . Proof. Since (OPT) is a generalized LFP (in an equivalent formulation), according to OPT-MS, the result trivially follows. The Lemma above basically says that given the same fidelity constraints, the over- all highest privacy guarantee is the largest Tx U ’s among all sub-problems, i.e., the weakest optimal privacy guarantee among all QID groups. 108 4.6.3 Solution Properties According to the decomposability of the optimization problem, in the following, we only need to focus on solving an optimization sub-problem (OPT-Sub). In particular, we are interested in where the trade-off between privacy and fidelity starts and ends. We propose two lemmas in the following which will address the answers of this question. Before introducing the lemmas, we first define a useful quantity which will be further utilized to characterize the trade-off. Definition 13. (Maximum Posterior Confidence) Given an optimization sub-problem (OPT-Sub) and 1-fidelity (100% faithfulness) requirement, i.e., = = 1 and ~ D =D, the highest confidence that an adversary can have on inferring any sensitive information from any decision outcome is denoted by C , i.e., C , Conf (X U = x U ;A P X ;D ! X S ) = max a;x S fconf (x U ;a!x S )g. Lemma 5. An (OPT-Sub) has the 1-fidelity solution ~ D a (x) = D a (x),8x2 T x U ,8a2 A, if and only ifC . Proof. Please refer to Appendix 4.8.2 for detailed proof. We provide intuitive expla- nation as proof sketch here. Since the highest confidence that an adversary can have (C ) is lower than the privacy requirement (the confidence threshold ), it is safe to releaseD directly, i.e., ~ D =D with perfect fidelity. On the other hand, as long asC is greater than, releasing ~ D = D violates privacy requirement and cannot be a feasible solution. Lemma 5 tells us when C , there is no trade-off between privacy and fidelity: as long as is greater thanC , increasing the strength of privacy (decreasing) would not cause degradation in fidelity. In other words, alone the strength of privacy (from low to high), the trade-off between privacy and fidelity starts when is right belowC . The next lemma will tell us the end of this trade-off region. 109 Lemma 6. For = = 0, i.e., fidelity constraints are trivialized or not presented, an (OPT-Sub) has feasible solutions if and only if min , max x2Tx U P (xjT x U ). In other words, there exists privacy limit, which is the strongest privacy that we can have (in the worse-case scenario, discussed in Section 4.2.3). Proof. Please refer to Appendix 4.8.3 for detailed proof. We provide intuitive behind this lemma as proof sketch here. 
The privacy limit max x2Tx U P (xjT x U ) is the greatest conditional probability over the tuple 8 , which is actually the highest possible inference confidence of an adversary before releasing an ATR. It is the baseline confidence, which merely utilizes knowledge of public recordx U and side-informationP (x) in an infer- ence channelhx U P X ! x S i as we discussed in Section 4.2.3. Since an ATR does not contribute to such an inference channel, an associated privacy protection scheme is not able to help further reduce this baseline confidence. While achieving such a privacy limit, an ATR basically reveals zero useful information to the public. From Lemma 5 and 6, we know that the privacy-fidelity trade-off starts and ends, alone the strength of privacy, from low to high, at the maximum posterior confidence and the maximum prior confidence of an adversary, respectively. In the following, we show that the ending point never happens before the starting point. Lemma 7. C min . Proof. Please refer to Appendix 4.8.4 for detailed proof. We first provide the intuition of the lemma as a proof sketch. The intuition here is very straightforward: the maxi- mum posterior confidence can never be lower than the maximum prior confidence (prior vulnerability cannot exceed posterior vulnerability in [123]). Equality holds when the revealed information is completely useless. 8 Since all records in a tuple have the same public record,P(xjT x U ) is alsoP(x S jT x U ), the conditional distribution over all sensitive records. 110 4.6.4 Optimal Privacy and Solutions According to Lemma 5, when2 [C ; 1], the true decision mappingD can be safely released without perturbation (1-fidelity). Lemma 6 tells us when fidelity constraints are not imposed (0-fidelity), the feasible privacy region is 2 [ min ; 1]. Moreover, based on Lemma 7, the region [ min ;C ] is always non-empty. Clearly, this is the region where we trade off fidelity against privacy. The theorem proposed in the following can analytically characterize such a trade-off. The optimal privacy guarantee for a QID group (the minimum feasible for (OPT-Sub)) has a closed-form expression in terms of fidelity, i.e., the allowed perturbation ranges [ ~ D a (x) min ; ~ D a (x) max ]. Theorem 1. (Optimal Privacy) Consider an optimization sub-problem (OPT-Sub) for a QID group, in which we seek for the strongest privacy guarantee given fidelity con- straints. For a decision outcomea, define x a , arg max x2Tx U P (x) ~ D a (x) min ; b(x) =P (x) p P x 0 2Tx U P (x 0 ); ~ D a (x) max 0 , 1 P (x) minfP (x) ~ D a (x) max ;P(x a ) ~ D a (x a ) min g; ~ D a (x) min 0, 1 P (x) maxfP (x) ~ D a (x) min ;P(x a ) ~ D a (x a ) min +b(x)g: For binary decisions, i.e.,a2A =f0; 1g, the optimal privacy Tx U = maxf 0 ; 1 ; p g, where 0 = P (x 0 ) ~ D 0 (x 0 ) min P (x 0 ) ~ D 0 (x 0 ) min + P x6=x 0 ;x2Tx U P (x) ~ D 0 (x) max 0 , 1 = P (x 1 ) ~ D 1 (x 1 ) min P (x 1 ) ~ D 1 (x 1 ) min + P x6=x 1 ;x2Tx U P (x) ~ D 1 (x) max 0 , p = P (x 1 ) ~ D 1 (x 1 ) min +P (x 0 ) ~ D 0 (x 0 ) min P x2Tx U P (x) , 111 and the corresponding optimal privacy solutions are When Tx U = 0 : ~ D 0 (x) = ~ D 0 (x) max 0 ;8x2T x U When Tx U = 1 : ~ D 1 (x) = ~ D 1 (x) max 0 ;8x2T x U When Tx U = p : ~ D a (x a ) = ~ D a (x a ) min ;8a2A P x2Tx U P (x) ~ D a (x) = 1 p P (x a ) ~ D a (x a ) min ;8a2A ~ D a (x) min 0 ~ D a (x) ~ D a (x) max 0 ;8x2T x U ;8a2A: When Tx U = p andjT x U j> 3, we have multiple solutions. Proof. We refer readers to Appendix 4.8.5 for the detailed proof. 
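The closed-form expressions of Theorem 1 and the two endpoints of the trade-off region (Lemmas 5-7) are simple enough to evaluate directly. A sketch for binary decisions follows; the helper names are mine, and the printed values can be checked against the female group of the numerical example in Section 4.7:

def max_conf(p, d1):
    # C* of Definition 13: worst-case posterior confidence with the true mapping.
    conf = 0.0
    for d in (d1, [1.0 - v for v in d1]):
        s = sum(pi * di for pi, di in zip(p, d))
        if s > 0:
            conf = max(conf, max(pi * di / s for pi, di in zip(p, d)))
    return conf

def theorem1_theta(p, d1_lo, d1_hi):
    # Candidate values theta_0, theta_1, theta_p of Theorem 1 for one QID group.
    d0_lo = [1.0 - h for h in d1_hi]
    d0_hi = [1.0 - l for l in d1_lo]
    stars, thetas = [], []
    for lo, hi in ((d0_lo, d0_hi), (d1_lo, d1_hi)):          # a = 0, then a = 1
        jlo = [pi * l for pi, l in zip(p, lo)]               # P(x) * D_a(x)_min
        jhi = [pi * h for pi, h in zip(p, hi)]               # P(x) * D_a(x)_max
        k = max(range(len(p)), key=lambda i: jlo[i])         # index of x*_a
        others = sum(min(jhi[i], jlo[k]) for i in range(len(p)) if i != k)
        stars.append(jlo[k])
        thetas.append(jlo[k] / (jlo[k] + others))
    theta_p = sum(stars) / sum(p)
    return max(thetas[0], thetas[1], theta_p)

p, d1 = [0.3, 0.125, 0.075], [0.0, 0.0, 1.0]                 # female group, 90%-fidelity
d1_lo = [max(0.0, v - 0.1) for v in d1]
d1_hi = [min(1.0, v + 0.1) for v in d1]
print(theorem1_theta(p, d1_lo, d1_hi))                       # 0.675 (optimal privacy)
print(max(p) / sum(p), max_conf(p, d1))                      # endpoints: 0.6 and 1.0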
Given Theorem 1, the optimal privacy guarantee Tx U for each QID group can be computed analytically. Thus, based on Lemma 4, the overall strongest privacy guarantee is the largest Tx U among all QID groups. 4.6.5 Insights into the Optimal Privacy Solutions Recall from (4.8) in Lemma 1, that the inference confidence is fully characterized by P (x) ~ D a (x) pairs, which are perturbed joint probabilities ~ P X;A (x;a) bounded within ranges [P (x) ~ D a (x) min ;P(x) ~ D a (x) max ] due to fidelity constraints. In this regard, solv- ing (OPT-Sub) in order to minimize the maximal possible inference confidence of an adversary is equivalent to finding an optimal way to “tune” those joint probabilities in order to minimize the maximal ratio on the left-hand-side of (4.8) over all x2 T x U anda2A. From Theorem 1, it turns out that for each decision outcomea, the maxi- mum of lower boundsP (x a ) ~ D a (x a ) min = max x2Tx U P (x) ~ D a (x) min plays a crucial role in solving (OPT-Sub). Corollary 1. ~ D a (x a ) min = ~ D a (x a ) min 0 = ~ D a (x a ) max 0 ,8a2f0; 1g. 112 Proof. By definitions ofx a and ~ D a (x) max 0 , the result ~ D a (x a ) min = ~ D a (x a ) max 0 trivially follows. Based on Lemma 6, we haveb(x) = P (x) p P x 0 P (x 0 ) 0, and thus by pluggingx =x a into ~ D a (x) min 0, we obtain ~ D a (x a ) min = ~ D a (x a ) min 0. Remark 6. Based on Corollary 1, the effective upper and lower limit of x a are equal, which implies there is only one possible value forx a , which is ~ D a (x a ) min . From Theo- rem 1, we can see that this is true for all the cases. The effective lower limits P (x) ~ D a (x) min 0 and the effective upper limits P (x) ~ D a (x) max 0 represent the feasible region where the fidelity constraints and privacy constraints intersect. For the effective upper limit,P (x a ) ~ D a (x a ) min serves as a thresh- old, imposing additional upper limits on allP (x) ~ D a (x) pairs. The effective upper limit is the minimum of the original upper limitsP (x) ~ D a (x) max (fidelity constraints) and the threshold, formally,P (x) ~ D a (x) max 0 = minfP (x) ~ D a (x) max ;P(x a ) ~ D a (x a ) min g. Similar intuition applies to the effective lower limits. The value a , a2f0; 1g, denotes the optimal privacy guarantee for a sub-group of people whose input records belong to the same QID groupT x U and whose received decisions are (the same) a 9 , denoted by T fx U ;ag . According to Theorem 1, a can be achieved by simply letting ~ D a (x) = ~ D a (x) max 0 ,8x2T x U , i.e., plugging effective upper bound values into all variables. An illustration that aids in understanding the intuition behind Theorem 1 is shown in Fig. 4.4, in which joint probabilities for a = 1 and8x2 T x U are depicted. For conciseness, letm denotejT x U j,p i =P (x i ), and ~ d i = ~ d(x i ) = ~ D 1 (x i ),8i = 1;:::;m. SinceT x U is the range of (x U ;X S ),m represents the number of distinct sensitive records in the tuple. The yellow spots in Fig. 4.4 denote the true joint probabilitiesp i d i , and the regions indicated by arrows interpret the allowed perturbation ranges [p i ~ d imin ;p i ~ d imax ]. 9 From an adversary’s point of view, the inference sources (x U ;a) w.r.t. their records are exactly the same. 113 Figure 4.4: An representative illustration for changes of joint probabilities caused by the optimal-privacy scheme. In this example, since the maximum of lower bounds is p 6 ~ d 6min , we have x 1 = x 6 . 
The valuep 6 ~ d 6min serves as a threshold (the blue dash line), imposing upper limits on all perturbation ranges. The output of the optimal-privacy solution is then denoted by the red spots, which take values from the effective upper bounds minfp i ~ d imax ;p 6 ~ d 6min g, 8i. Here we get a clear insight into the optimal privacy protection scheme for a sub- groupT fx U ;ag : it flattens the joint distributionP (x) ~ D a (x) as much as possible over all x2 T x U . By flattening the joint distribution, the (Bayesian) posterior distribution over distinct sensitive values seen by an adversary becomes more uniform, and hence the maximal inference confidence is reduced. For binary decisions, the case where the optimal privacy scheme for a QID groupT x U is the optimal privacy scheme for a sub-groupT fx U ;ag happens when the joint distribution of the other sub-groupT fx U ;ag , wherea is the complement ofa (i.e.,a = 1 impliesa = 0, and vice versa), is much “flatter” (i.e., much more private) than that ofa. In other words, the optimal privacy protection scheme flattens the least private distribution; although this 114 might influence the other (the much more private) one and cause it to be less private 10 , as long as its maximal inference confidence is less than a , the optimal privacy for the entire QID group Tx U is dominated by a , and thus the optimal privacy scheme for the sub-groupT fx U ;ag is the optimal privacy scheme for the entire QID group. When neither distribution is much flatter (more private), making one sub-group highly private while causing the other one’s privacy to degrade; thus, none of the opti- mal schemes for any sub-group is optimal for the entire QID group. In this case, both sub-groups need to find a “balanced point” at which both sub-groups are equally pri- vate. Such a balanced point for the maximal inference confidence for two sub-groups is denoted by p in Theorem 1, representing the minimum of the maximal inference confidences for the QID group. As shown in Theorem 1, in general, we have multiple solutions at this balanced, optimal point, and each variable is bounded within a range from its effective lower limit ~ D a (x) min 0 to its effective upper limit ~ D a (x) max 0 . When Tx U = p , in general we have multiple solutions. This is because, in this case, from Theorem 1, subject to feasibility constraints, the optimality is guaranteed if ~ D a (x a ) = ~ D a (x a ) min ,8a, and the following two equalities hold P x P (x) ~ D 0 (x) = 1 p P (x 0 ) ~ D 0 (x 0 ) min ; (4.20) P x P (x) ~ D 1 (x) = 1 p P (x 1 ) ~ D 1 (x 1 ) min ; (4.21) However, in the following we show that the above two equalities are equivalent, i.e., one implies the other. Corollary 2. When Tx U = p , (4.20) implies (4.21), and vice versa. 10 Based on (EQ-Sub), any changes made to ~ D a (x) will also change ~ D a (x). 115 Proof. Recall p from Theorem 1, we have P x P (x) = 1 p P (x 1 ) ~ D 1 (x 1 ) min + 1 p P (x 0 ) ~ D 0 (x 0 ) min : (4.22) Since ~ D 0 (x) + ~ D 1 (x) = 1, subtract (4.20) from (4.22), we obtain (4.21). Similarly, subtract (4.21) from (4.22), we obtain (4.20). Therefore, to compute an optimal solution when Tx U = p , we only need to solve (4.21). 
Since ~ D 0 (x) + ~ D 1 (x) = 1, and ~ D a (x a ) = ~ D a (x a ) min ,8a, in general we only havem 2 variables (see Remark 7), and based on (4.22), equality (4.21) is equivalent to X x6=x 0 x6=x 1 P (x) ~ D 1 (x) = 12p p P (x 1 ) ~ D 1 (x 1 ) min b(x 0 ): (4.23) When Tx U = p , the right-hand-side (RHS) of (4.23) is strictly bounded by P x6=x 1 ;x 0 P (x) ~ D 1 (x) min 0; P x6=x 1 ;x 0 P (x) ~ D 1 (x) max 0 , which implies there always ex- ists a feasible solution for (4.21). Whenm > 3, since the number of variables to solve (m2) is greater than the number of equation (one, which is (4.23)), an optimal solution, in general, is not unique. Remark 7. For the special case x 0 = x 1 , we have m 1 variables. Such a case can happen when the population of a certain record dominates its corresponding QID group. When this is the case, the prior (distribution) knowledge provides very high (baseline) confidence on inferring this record. In particular, for such a case, we must have p = min . If p > a ,8a, i.e., Tx U = p , this becomes trivial: according to Lemma 6 and its following discussion, the announced ATR can only provide trivial information to achieve this lowest-possible baseline confidence. 116 Algorithm 1 Optimal Privacy Protection Scheme Input:P (x),T x U , ~ D a (x) min , ~ D a (x) max Output: ~ D a (x),8a,8x2T x U 1: fora2f0; 1g do 2: findx a 3: for allx2T x U do 4: compute ~ D a (x) max 0 5: compute 0 , 1 , p , and Tx U maxf 0 ; 1 ; p g 6: if Tx U = 0 then 7: ~ D 0 (x) ~ D 0 (x) max 0 8: ~ D 1 (x) 1 ~ D 0 (x) max 0 9: else if Tx U = 1 then 10: ~ D 1 (x) ~ D 1 (x) max 0 11: ~ D 0 (x) 1 ~ D 1 (x) max 0 12: else if Tx U = p then 13: ~ D 1 (x) ALLOCATION() 14: ~ D 0 (x) 1 ~ D 1 (x) 15: return ~ D a (x),8a,8x2T x U 16: 17: function ALLOCATION( ) 18: for allx2T x U ,x6=x 1 ;x 0 do 19: compute ~ D 1 (x) min 0 20: resid RHS of (4.23) P x6=x 1 ;x 0 P (x) ~ D 1 (x) min 0 21: ~ D 1 (x 1 ) ~ D 1 (x 1 ) min 22: ~ D 1 (x 0 ) 1 ~ D 0 (x 0 ) min 23: for allx2T x U ,x6=x 1 ;x 0 do 24: capacity ~ D 1 (x) max 0 ~ D 1 (x) min 0 25: allocation minf resid P (x) ;capacityg 26: ~ D 1 (x) ~ D 1 (x) min 0 +allocation 27: resid residP (x)allocation 28: return ~ D 1 (x),8x2T x U The realization of Theorem 1 is provided in Algorithm 1, which can efficiently ob- tain the optimal privacy scheme in linear time. We next provide a numerical example with detailed procedures to demonstrate how our proposed optimal privacy scheme can in practice be efficiently applied to a problem. 117 Table 4.6: Detailed Inputs and Computations of the Provided Numerical Example Inputs Computations x P(x) D 1 (x) D 0 (x) ~ D 1 (x) min ~ D 1 (x) max ~ D 0 (x) min ~ D 0 (x) max P(x) ~ D 1 (x) min P(x) ~ D 1 (x) max P(x) ~ D 1 (x) max 0 P(x) ~ D 0 (x) min P(x) ~ D 0 (x) max P(x) ~ D 0 (x) max 0 x 1 0:3 0 1 0 0:1 0:9 1 0 0:03 0:03 0:27 0:3 0:27 x 2 0:125 0 1 0 0:1 0:9 1 0 0:0125 0:0125 0:1125 0:125 0:125 x 3 0:075 1 0 0:9 1 0 0:1 0:0675 0:075 0:0675 0 0:0075 0:0075 x 4 0:225 0 1 0 0:1 0:9 1 0 0:0225 0:0225 0:2025 0:225 0:225 x 5 0:175 0:5 0:5 0:4 0:6 0:4 0:6 0:07 0:105 0:09 0:07 0:105 0:105 x 6 0:1 1 0 0:9 1 0 0:1 0:09 0:1 0:09 0 0:01 0:01 4.7 Numerical Examples Consider Table 4.2 again but for a smaller size populationf12, 5, 3, 9, 7, 4g (first column of the table) for ease of demonstration, and letx i denote the record of thei-th row,i = 1;:::; 6. 
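Before walking through the computations by hand, it may help to have Algorithm 1 in executable form. The sketch below is a direct Python transcription of the pseudocode above (variable names are mine, line references are to Algorithm 1, and the special case of Remark 7, where x*_0 = x*_1, is not treated separately):

def algorithm1(p, d1_lo, d1_hi):
    # Inputs: priors p[i] = P(x_i) and fidelity box [d1_lo[i], d1_hi[i]] on the
    # announced probability of a positive decision.  Returns (theta, D1 tilde).
    m, total = len(p), sum(p)
    d0_lo = [1.0 - h for h in d1_hi]
    d0_hi = [1.0 - l for l in d1_lo]

    def star_and_theta(lo, hi):
        jlo = [pi * l for pi, l in zip(p, lo)]
        jhi = [pi * h for pi, h in zip(p, hi)]
        k = max(range(m), key=lambda i: jlo[i])                      # x*_a (lines 1-2)
        eff_hi = [min(jhi[i], jlo[k]) / p[i] for i in range(m)]      # lines 3-4
        theta = jlo[k] / (jlo[k] + sum(p[i] * eff_hi[i] for i in range(m) if i != k))
        return k, jlo[k], eff_hi, theta

    k0, star0, eff_hi0, th0 = star_and_theta(d0_lo, d0_hi)
    k1, star1, eff_hi1, th1 = star_and_theta(d1_lo, d1_hi)
    th_p = (star0 + star1) / total
    theta = max(th0, th1, th_p)                                      # line 5

    if theta == th0:                                                 # lines 6-8
        d1 = [1.0 - v for v in eff_hi0]
    elif theta == th1:                                               # lines 9-11
        d1 = list(eff_hi1)
    else:                                                            # lines 12-13, 17-27
        b = lambda i: p[i] - theta * total
        eff_lo1 = [max(p[i] * d1_lo[i], star1 + b(i)) / p[i] for i in range(m)]
        d1 = [0.0] * m
        d1[k1] = d1_lo[k1]                                           # line 21
        d1[k0] = 1.0 - d0_lo[k0]                                     # line 22
        resid = ((1 - 2 * theta) / theta) * star1 - b(k0)            # RHS of (4.23)
        resid -= sum(p[i] * eff_lo1[i] for i in range(m) if i not in (k0, k1))
        for i in range(m):                                           # lines 23-27
            if i in (k0, k1):
                continue
            alloc = min(resid / p[i], eff_hi1[i] - eff_lo1[i])
            d1[i] = eff_lo1[i] + alloc
            resid -= p[i] * alloc
    return theta, d1

print(algorithm1([0.3, 0.125, 0.075], [0.0, 0.0, 0.9], [0.1, 0.1, 1.0]))
print(algorithm1([0.225, 0.175, 0.1], [0.0, 0.4, 0.9], [0.1, 0.6, 1.0]))

The two calls correspond to the female and male groups of Table 4.6 and reproduce the results derived in the remainder of this section: (0.675, [0.1, 0.02, 0.9]) and (~0.6378, [0.1, 0.4, 0.9]).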
Suppose an announced ATR needs to satisfy a pre-determined fidelity constraint = 90%-fidelity, and privacy engineers would like to preserve data subjects’ privacy as much as possible subject to the fidelity constraint. We demonstrate in the following how our proposed optimal privacy protection scheme (Algorithm 1) can be applied in practice to solve the problem. First, consider the female group T x U =fFg , i.e., the tuple of recordsfx 1 , x 2 , x 3 g. Based on lines 1 to 4 in Algorithm 1, we first need to determinex a and ~ D a (x) max 0 for all a2f0; 1g andx2T fFg . Detailed computations are presented in Table 4.6; from which, we observe thatx 1 =x 3 andx 0 =x 1 . Proceeding to line 5, we compute 0 , 1 , and p as follows: 1 = 0:0675 0:03 + 0:0125 + 0:0675 0:6136, 0 = 0:27 0:27 + 0:125 + 0:0075 0:6708, p = 0:0675 + 0:27 0:5 = 0:675, 118 and obtain T fFg = p = 0:675. Proceeding to lines 12 and 13, in this case we need to call function ALLOCATION in line 17. Based on lines 18 and 19, we first need to compute ~ D 1 (x 2 ) min 0 = 1 P (x 2 ) maxfP (x 2 ) ~ D 1 (x 2 ) min ;P(x 1 ) ~ D 1 (x 1 ) min +b(x 2 )g = 1 0:125 maxf0; 0:0675 + 0:125 (0:675)(0:5)g = 0. Proceeding to line 20, since ~ D 1 (x 2 ) min 0 = 0, we have resid = RHS of (4.23) = ( 0:35 0:675 )(0:075)(0:9) + (0:675)(0:5) 0:3 = 0:0025. Based on lines 21 and 22, we obtain ~ D 1 (x 3 ) = ~ D 1 (x 1 ) = ~ D 1 (x 1 ) min = ~ D 1 (x 3 ) min = 0:9, ~ D 1 (x 1 ) = ~ D 1 (x 0 ) = 1 ~ D 0 (x 0 ) min = 1 ~ D 0 (x 1 ) min = 0:1. Moreover, proceeding to lines 23 to 27, we obtain capacity = ~ D 1 (x 2 ) max 0 ~ D 1 (x 2 ) min 0 = 0:0125 0:125 0 = 0:1, allocation = minf 0:0025 0:125 ; 0:1g = 0:02, ~ D 1 (x 2 ) = ~ D 1 (x 2 ) min 0 +allocation = 0 + 0:02 = 0:02. We therefore obtain the optimal solution for the female group ~ D 1 (fx 1 ;x 2 ;x 3 g) = [0:1; 0:02; 0:9], which yields maximum confidence of 67.5% for an adversary inferring any sensitive information in this group. 119 We then consider the male group T x U =fMg , i.e., the tuple of recordsfx 4 , x 5 , x 6 g. Based on Table 4.6, we obtainx 1 =x 6 ,x 0 =x 4 , and 1 = 0:09 0:0225 + 0:09 + 0:09 0:4444, 0 = 0:2025 0:2025 + 0:105 + 0:01 0:6378, p = 0:09 + 0:2025 0:5 = 0:585, and we get T fMg = 0 0:6378. Based on lines 6 to 8, we obtain the optimal solu- tion for this group ~ D 1 (fx 4 ;x 5 ;x 6 g) = 1 [ 0:2025 0:225 ; 0:105 0:175 ; 0:01 0:1 ] = [0:1; 0:4; 0:9], which yields maximum confidence of 63.78% for an adversary inferring any sensitive infor- mation in this group. Based on Lemma 4, the optimal-privacy for the entire dataset is maxf0:675; 0:6378g = 0:675, which is the maximum confidence for an adversary inferring any sensitive information in this dataset based on the announced ATR. The optimal solution for the female group is a “balanced point” of correctly inferring the annual income ofx 1 andx 3 based on the known decisions and the announced ATR, i.e.,Conf (F;A! Annual Income) = conf (F;A = 0!< 100k) = conf (F;A = 1! > 200k), and conf (F; 0!< 100k) = 0:30:9 0:30:9+0:1250:98+0:0750:1 = 0:675, conf (F; 1!> 200k) = 0:0750:9 0:30:1+0:1250:02+0:0750:9 = 0:675. Making either inference more private will cause the other one to be less private and hence degrades the overall privacy guarantee as discussed in Section 4.6.5. 
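These two balanced confidences can be verified in a couple of lines (priors and the announced mapping as above):

p  = [0.3, 0.125, 0.075]
d1 = [0.1, 0.02, 0.9]                  # announced D1 for the female group
d0 = [1.0 - v for v in d1]
conf_neg_low  = p[0] * d0[0] / sum(pi * di for pi, di in zip(p, d0))   # conf(F, 0 -> <100k)
conf_pos_high = p[2] * d1[2] / sum(pi * di for pi, di in zip(p, d1))   # conf(F, 1 -> >200k)
print(conf_neg_low, conf_pos_high)     # both 0.675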
In contrast, the optimal solution for the male group tries to minimize the confidence of correctly 120 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0.5 0.6 0.7 0.8 0.9 1 By Optimization Solver By Algorithm 1 min =0.6 C * =1 * T {F} = 0.675 (a) Privacy-Fidelity Tradeoff for the Female Group 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0.4 0.5 0.6 0.7 0.8 By Optimization Solver By Algorithm 1 min =0.45 * T {M} = 0.6378 C * =0.72 (b) Privacy-Fidelity Tradeoff for the Male Group Figure 4.5: Privacy-Fidelity Tradeoffs for both Gender Groups inferring the annual income of x 4 based on the known decisions and the announced ATR, i.e.,Conf (M;A! Annual Income) = conf (M;A = 0!< 100k), and conf (M; 0!< 100k) = 0:2250:9 0:2250:9+0:1750:6+0:10:1 0:6378. It is not hard to see that the optimal solution maximizes the denominator while mini- mizing the numerator in order to minimize the ratio for optimal privacy. From the above example, we demonstrated how an optimal-privacy ATR, subject to a fidelity constraint, can be obtained efficiently using Algorithm 1. The maximum con- fidence of an adversary (conf (F; 1!> 200k)) drops from 100% to 67.5% by setting a 10%-distortion tolerance for the ATR. In addition, privacy engineers may require information about privacy-fidelity trade- off region in order to choose an adequate. Figure 4.5 depicts the privacy-fidelity trade- offs for both the female and the male groups, in which the bold-blue-dash curves are ob- tained by the Matlab optimization solver, and the red curves are obtained by Algorithm 1. The curves match, but Algorithm 1 only takes linear time for solving the optimal ob- jective value at each point of the curve. Moreover, recall from the solution properties in Section 4.6.3 that the tradeoff region for should be within the range [ min ;C ], which can be easily computed based on Definition 13 and Lemma 6: [ 0:3 0:5 ; 0:0751 0:0751 ] = [0:6; 1] for the female group and [ 0:225 0:5 ; 0:2251 0:2251+0:1750:5 ] = [0:45; 0:72] for the male group. Both 121 results show consistency with Figure 4.5. Note that based on Lemma 6, any privacy re- quirement with < 0:6 is not feasible for this dataset, and based on Lemma 5, any privacy requirement with > 0:72 can have 1-fidelity solution for the male group, i.e., no perturbation is needed. Privacy engineers can decide how much distortion is required in order to achieve a certain level of privacy based on the tradeoff curve, which can be obtained efficiently using Algorithm 1. 4.8 Appendix 4.8.1 Proof of Lemma 1 Proof. Recall that conf (x U ;a! x S ), the confidence of inferring a sensitive attribute valuex S , is a posterior epistemic probability which can be expressed as conf (x U ;a!x S ) = ~ P X S jX U ;A (x S jx U ;a) = ~ P AjX U ;X S (ajx U ;x S )P X U ;X S (x U ;x S ) P x 0 S 2R X S ~ P AjX U ;X S (ajx U ;x 0 S )P X U ;X S (x U ;x 0 S ) : (4.24) Letx = (x U ;x S ) and defineT x U ,fx 0 2R X jx 0 U =x U g to denote the tuple in which records having the same QIDx U . We have a more comprehensive expression conf (x U ;a!x S ) = ~ P AjX (ajx)P X (x) P x 0 2Tx U ~ P AjX (ajx 0 )P X (x 0 ) = ~ D a (x)P X (x) P x 0 2Tx U ~ D a (x 0 )P X (x 0 ) : (4.25) 122 Therefore, based on Definitions 7 and 8, the privacy requirement -Maximum Confi- dence imposes the following constraints for allx = (x U ;x S )2R X ,8a2A. ~ D a (x)P X (x) P x 0 2Tx U ~ D a (x 0 )P X (x 0 ) : (4.26) 4.8.2 Proof of Lemma 5 Proof. We first prove that if C , ~ D = D is a feasible solution. 
We then prove its converse: if ~ D =D is a feasible solution, we must haveC . We first prove that if C , the 1-fidelity solution ~ D = D is a feasible solution, i.e., it satisfies all constraints. Obviously, the solution ~ D = D satisfies probability distribution conditions and fidelity constraints. Based on definition 13, ~ D =D yields P (x) ~ D a (x) P x 0 2Tx U P (x 0 ) ~ D a (x 0 ) =C ;8x2T x U ;8a2A: Therefore, it also satisfies privacy constraints, and hence when C , the 1-fidelity solution is a feasible solution. Next, we prove the converse by proving its contrapositive, i.e., if < C , ~ D = D is not a feasible solution. Apparently when ~ D = D, the highest confidence that an adversary can have exceeds, and hence it violates privacy requirements and cannot be a feasible solution. We therefore prove the converse. 4.8.3 Proof of Lemma 6 Proof. We first prove that if an (OPT-Sub) has feasible solutions, min . We then prove its converse: if min , an (OPT-Sub) must have feasible solutions. 123 We first prove the conditional statement by proving its contrapositive, i.e., if < min , there exists no feasible solution for an (OPT-Sub). Since ~ D is non-negative, we can rewrite the privacy constraints as follow. P (x) ~ D a (x) X x 0 2Tx U P (x 0 ) ~ D a (x 0 ) 0; (4.27) which has to be satisfied8x 2 T x U and8a 2 A. Sum (4.27) over all a 2 A, by (EQ-Sub), we have P (x) X x 0 2Tx U P (x 0 ) 0;8x2T x U ; (4.28) which is equivalent to max x2Tx U P (xjT x U ). Therefore, if there exists anyx2T x U such that < P (xjT x U ), then (4.27) cannot be satisfied for allx2 T x U , and hence no feasible solution exists. We then prove the converse. If max x2Tx U P (xjT x U ), there always exists a feasible solution ~ D a (x 0 ) = 1=jAj,8x2 T x U ,8a2A. To see this, we only need to verify if it satisfies all constraints. It is very obvious that the solution satisfies probability distribution conditions. Since fidelity constraints are trivialized, we then only need to verify if the solution satisfies privacy constraints. Since ~ D a (x 0 ) is a constant for all a andx, the left hand side of (4.27) becomesP (xjT x U ), and thus the privacy constraints are also satisfied. Hence ~ D a (x 0 ) = 1=jAj is a feasible solution and we proved the converse. 124 4.8.4 Proof of Lemma 7 Proof. We prove this by contradiction. Assume thatC < min . By their definitions in Lemma 5 and 6, it follows that max x2Tx U ; a2A P (x)D a (x) P x 0 2Tx U P (x 0 )D a (x 0 ) < max x2Tx U P (x) P x 0 2Tx U P (x 0 ) : (4.29) Let x y = arg max x2Tx U P (xjT x U ). The right hand side of (4.29) is equivalent to P (x y )= P x 0 2Tx U P (x 0 ). If inequality (4.29) holds, the following inequalities must hold P (x y )D a (x y ) P x 0 2Tx U P (x 0 )D a (x 0 ) < P (x y ) P x 0 2Tx U P (x 0 ) ;8a2A; (4.30) since the maximum of the left hand side of (4.30) over alla2A is not greater than the left hand side of (4.29). Therefore, if there exists anya2A for which the corresponding inequality in (4.30) does not hold, it implies our assumptionC < min is not true, and, if so, we are done with the proof. If there exists no such ana and (4.30) holds, by eliminatingP (x y ) from both sides of (4.30) and cross-multiplying (as all terms are non-negative), (4.30) is equivalent to the following D a (x y ) X x 0 2Tx U P (x 0 )< X x 0 2Tx U P (x 0 )D a (x 0 );8a2A: (4.31) Sum (4.31) overa2A for both sides, based on (EQ-Sub), we obtain P x 0 2Tx U P (x 0 )< P x 0 2Tx U P (x 0 ), which is obviously not true. 
Therefore, it implies the inequality (4.31) (and (4.30), equivalently) cannot be true for alla2A, i.e., there must exist somea for which the left hand side is not smaller than the right hand side of (4.30), so that both 125 sides are equal when summed over alla. Therefore, the initial assumption is incorrect and the lemma is proved. 4.8.5 Proof of Theorem 1 For the convenience and conciseness of the proof, as long as there is no confusion, we abuse some notations in this section and the following Appendix sections. All notations in the following Appendix sections only follow their definitions in this section. Recall that an optimization subproblem in (OPT-Sub) is formulated over a quasi- identifier (QID) groupT x U in which all public records are equal tox U . Letm =jT x U j be the cardinality of the QID group, or equivalently, the number of rows of this tuple. Letx k be the unique record of rowk in the tuple,k = 1;:::;m, and definep k ,P (x k ), x k , ~ D 1 (x k ), andy k , ~ D 0 (x k ) = 1x k . The privacy constraints can thus be re-written as p k x k P m i=1 p i x i ; 8k = 1;:::;m; p k y k P m i=1 p i y i ; 8k = 1;:::;m; which can be combined as p k m X i=1 p i p k x k m X i=1 p i x i 0; (4.32) 126 8k = 1;:::;m. Moreover, letx = [x 1 ;x 2 ; ;x m ] T , whereT represents the transpose operator. DefineA as A = 0 B B B B B B B @ (1)p 1 p 2 p m p 1 (1)p 2 p m . . . . . . . . . . . . p 1 p 2 (1)p m 1 C C C C C C C A ; (4.33) and letb = [b 1 ;b 2 ; ;b m ] T , in whichb k = p k P m i=1 p i . We can further simplify the privacy constraints as bAx0; (4.34) where0 is anm 1 zero vector. Remark 8. Note thatb k =p k P m i=1 p i 0 due to Lemma 6, or (4.28), equivalently. Similarly, the fidelity constraints can be re-written as x kmin x k x kmax ; 8k = 1;:::;m; y k min y k y k max ; 8k = 1;:::;m: However, since for binary decision,y k = 1x k , the above two constraints are basically equivalent (to see this, simply lety k min = 1x kmax andy k max = 1x kmin ), so we obtain the following fidelity constraints x kmin x k x kmax ; 8k = 1;:::;m: (4.35) 127 Note that the 2m privacy constraints in (4.32) (or their equivalent vectorized form in (4.34)) form(s) a parallelotope in the m-dimensional space, and the 2m fidelity con- straints in (4.35) form a hypercube in them-dimensional space. LetP denote the paral- lelotope andH denote the hypercube. Moreover, defineI,P T H be the intersection ofP andH.I =? if and only ifP andH are disjoint, where? denotes the empty set. We have the following fact. Fact 1. An optimization subproblem has feasible solutions if and only ifI6=?, i.e.,P andH intersect/collide with each other. Based on the above fact, to prove Theorem 1, it is equivalent to show thatP andH collide with each other if and only if * Tx U , maxf 0 ; 1 ; p g. Let , arg max k p k x kmin and , arg max k p k y k min , we can re-write 0 , 1 , and p in the following 0 = p y min p y min + P m i=1 i6= p i y i max 0 , (4.36) 1 = p x min p x min + P m i=1 i6= p i x imax 0 , (4.37) p = p x min +p y min P m i=1 p i , (4.38) where x imax 0, minfx imax ; p p i x min g; (4.39) y i max 0, minfy i max ; p p i y min g: (4.40) 128 Consider the following two optimization problems for x j , where j is an arbitrary index, 1jm: minimize x j (OPT-1) s.t. bAx0; x kmin x k x kmax ; fork = 1;:::;m;k6=j: maximize x j (OPT-2) s.t. bAx0; x kmin x k x kmax ; fork = 1;:::;m;k6=j: The above two problems have exactly the same constraints. 
The first line constraint forms the parallelotopeP, and letH 0 j denote the hypercube formed by the second line constraints, i.e., x kmin x k x kmax , for k = 1;:::;m;k 6= j. Moreover, define I 0 j ,P T H 0 j be the intersection ofP andH 0 j , interpreting the geometric space formed by the constraints of the above two optimization problems. Moreover, ifI 0 j 6=?, (i.e., there exist feasible solutions), we letx y j andx z j denote the optimal objective values of (OPT-1) and (OPT-2), respectively. We have the following lemma. Lemma 8. IfI 0 k 6=? for allk = 1; ;m,P andH are disjoint (I =?) if and only if eitherx y j >x j max orx z j <x j min . In other words,P andH collide with each other if and only ifI 0 k 6=?,x y k x kmax , andx z k x kmin ,8k = 1; ;m. Proof. Apparently, sinceH =H 0 j T H 0 k for anyk6= j, we haveII 0 j true for any j, which implies if there exists anyj such thatI 0 j =?,I =?, andP andH must be disjoint. SinceII 0 j for everyj, ifx = 2I 0 j for anyj, thenx = 2I. Moreover, for any pointx2I 0 j ,x y j x j x z j . 129 IfI 0 k 6=? for allk = 1; ;m, and eitherx y j > x j max orx z j < x j min , since for any x2I 0 j ,x y j x j x z j , which implies eitherx j < x j min orx j > x j max , and thus either I =?, orI*I 0 j (which violates the truth). Therefore,P andH are disjoint. We next prove the converse. IfI 0 k 6= ?, x y k x kmax , andx z k x kmin are true for all k = 1; ;m, since for any x2I 0 k ,8k = 1; ;m, x y k x k x z k , we have x kmin x k x kmax ,8k, which impliesx2I, so thatI6= ?,P andH collide with each other. We thus prove the converse and the proof is done. Based on Fact 1 and Lemma 8, followings statements are equivalent. (S1) An optimization subproblem has feasible solutions. () (S2)P andH intersect/collide with each other. () (S3)I 0 j 6=?,x y j x j max andx z j x j min ,8j = 1; ;m. Our next goal is to show that (S1)(S3) are true if and only if maxf 0 ; 1 ; p g. To show this, we first need to prove the following lemma. Lemma 9. IfI 0 j 6=?, the optimal objective value of the optimization problem (OPT-1) isx y j = maxfx y0 j ;x y1 j ;x yp j g, where x y0 j = 1 p j n p j 1 m X i=1 i6=j p i y i max 0 o ; (4.41) x y1 j = 1 p j n 1 p x min m X i=1 i6=j; p i x imax 0 o ; (4.42) x yp j = 1 p j n p x min +p j m X i=1 p i o ; (4.43) 130 and the corresponding optimal solutions are: Whenx y j =x y0 j :y =y min y k =y k max 0;8k = 1; ;m; k6=j; Whenx y j =x y1 j :x =x min x k =x kmax 0;8k = 1; ;m; k6=j; Whenx y j =x yp j :x =x min m X i=1 i6=j; p i x i = 1 p x min b j : Since when (S1)(S3) are true,x y j x j max , which impliesx yh j x j max ,8h = 0; 1;p, and we have Whenx y j =x y0 j : p y min p y min + P m i=1 i6= p i y i max 0 = 0 , (4.44) Whenx y j =x y1 j : p x min p x min + P m i=1 i6= p i x imax 0 = 1 , (4.45) Whenx y j =x yp j : p x min +p j y j min P m i=1 p i , p j . (4.46) Proof. Please see section Appendix 4.8.6 for the proof. Since if (S1)(S3) are true,I 0 j 6= ?,8j = 1; ;m, andx y j x j max needs to be met for allj = 1;:::;m. Based on Lemma 9, it means that 0 , 1 , and p j = p x min +p j y j min P m i=1 p i ; 8j = 1;:::;m; 131 which is equivalent to max j p x min +p j y j min P m i=1 p i = p x min +p y min P m i=1 p i = p : We then obtain maxf 0 ; 1 ; p g = * Tx U . Similarly, if (S1)(S3) are true,I 0 j 6= ?,8j = 1; ;m, and x z j x j min needs to be met for all j = 1;:::;m. 
By letting y k = 1 x k , y k min = 1 x kmax , and y k max = 1x kmin , the optimization problem (OPT-2) is essentially equivalent to the following optimization problem: minimize y j (OPT-3) s.t. bAy0; y k min y k y k max ; fork = 1;:::;m;k6=j: Lety y j be the optimal objective value of the above optimization problem. Clearly,y y j = 1 x z j . Therefore, for all j = 1;:::;m, x z j x j min is equivalent to y y j y j max . By applying results from x y j , we will obtain exactly the same conditions for , i.e., * Tx U . Therefore, if (S3) is true, we have maxf 0 ; 1 ; p g = * Tx U . Thus, if maxf 0 ; 1 ; p g,I 0 k 6= ?,8k = 1; ;m. In addition, we have x j max x y j , andy j max y y j , which is equivalent tox z j x j min , for allj, and thus (S3) is true. Since (S1)(S3) are equivalent, an optimization subproblem has feasible solution if and only if * Tx U . Combining with Lemma 9, we thus finish the proof. 4.8.6 Proof of Lemma 9 Here we demonstrate the proof of Lemma 9, which shows the optimal objective value of the optimization problem (OPT-1). 132 IfI 0 j 6=?, there exists (at least one or some)x2I 0 j , and for allx,x y j x j x z j . SinceI 0 j =P T H 0 j , anyx2I 0 j also belongs toP andH 0 . SinceP is am-dimensional parallelotope, and02P is a vertex ofP, any pointx2P can be uniquely represented by a linear combination ofm linear independent edge vectors emitted from0, denoted byL k ,k = 1;:::;m, andx = P m k=1 k L k , 0 k 1,8k = 1;:::;m. LetL be the collection of thesem vectors; specifically, L, [L 1 L 2 L m ], whereL k is anm 1 column vector andL is anmm matrix.L can be obtained by L =A 1 B; (4.47) where A is defined in (4.33) and B = dg(b) where dg(b) denotes a diagonal matrix with elements ofb = (b 1 ;b 2 ; ;b m ) along the diagonal. To findA 1 , note that since A can be represented by A = dg(p) + ()1 m p T ; (4.48) wherep = [p 1 ;p 2 ; ;p m ] T and1 m is an all-one vector withm elements, we can thus apply the following matrix inversion formula [92] (Z +cuv T ) 1 =Z 1 1 1 +cv T Z 1 u Z 1 uv T Z 1 : (4.49) to computeA 1 and thus obtainL in the following L = 1 1m 0 B B B B B B @ b 1 p 1 [1(m1)] b 2 p 1 bm p 1 b 1 p 2 b 2 p 2 [1(m1)] bm p 2 . . . . . . . . . . . . b 1 pm b 2 pm bm pm [1(m1)] 1 C C C C C C A . (4.50) 133 Define b p , ( b 1 p 1 ; b 2 p 2 ; ; bm pm ) as the element-wise division operation of two vectors. It is not hard to see thatL can be represented as the following equivalent form L = dg b p + 1m 1 p b T ; (4.51) which implies that its inverse can also be found by applying the matrix inversion formula in (4.49). We will utilize this property in the later of the proof. Recall that if x 2 I 0 j , x 2 P as well. Therefore, any x 2 I 0 j can be uniquely represented by x = m X k=1 k L k , (4.52) in which 0 k 1,8k = 1;:::;m. Recall that we are solving the optimization problem (OPT-1) for somej, 1jm. We first take out thej-th row from (4.52), x j = m X k=1 k L k;j , (4.53) and for the restm1 equalities, we move the term j L j from the right-hand-side (RHS) to the left-hand-side (LHS). We then obtain 2 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 4 x 1 x 2 . . . x j1 x j+1 . . . x m 3 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 5 j 2 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 4 L 1;j L 2;j . . . L j1;j L j+1;j . . . L m;j 3 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 5 = 2 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 4 L 1;1 L 1;2 L 1;j1 L 1;j+1 L 1;m L 2;1 L 2;2 L 2;j1 L 2;j+1 L 2;m . . . . . . . . . . . . . . . . . . . . . L j1;1 L j1;2 L j1;j1 L j1;j+1 L j1;m L j+1;1 L j+1;2 L j+1;j1 L j+1;j+1 L j+1;m . . . . . . . . . . . . . . . . . . . 
. . L m;1 L m;2 L m;j1 L m;j+1 L m;m 3 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 5 2 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 4 1 2 . . . j1 j+1 . . . m 3 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 5 ; (4.54) 134 and letx 0 j L 0 j =L sub 0 be the corresponding vector form of (4.54), in which, based on (4.50),L k;k = 1(m1) 1m b k p k ,8k = 1; ;m, andL k;i = 1m b i p k ,8k;i = 1; ;m, k6=i. Note thatL sub is an (m 1) (m 1) square sub-matrix ofL by removing the j-th row and thej-th column. Therefore, it has the similar form as shown in (4.51) by removing thej-th element of allb’s andp’s, and thus its inverseL 1 sub can be found. By applyingL 1 sub to both sides of (4.54), we have 0 =L 1 sub x 0 j L 1 sub L 0 j . (4.55) Letu,L 1 sub x 0 andv, j L 1 sub L 0 j , we obtain u k = 1 1 1 b k h (1)p k x k m X i=1 i6=j p i x i i , v k = 1 1 b k j b j , k =u k v k = 1 1 b k h 1 p k x k m X i=1 i6=j p i x i j b j i , (4.56) for allk = 1; ;m,k6=j. Substituting the k ’s above into (4.53), we obtain x j = m X k=1 k L k;j = 1 1 1 p j h m X i=1 i6=j p i x i + j b j i . (4.57) Based on (4.57), we are looking for the values of x j and x k ’s (k 6= j) yielding the minimal value ofx j . 135 Since 0 k 1,8k = 1;:::;m, which impliesu k v k ,8k = 1;:::;m,k6= j, and u k v k = 1 j b j h (1)p k x k m X i=1 i6=j p i x i i 1: (4.58) We then have j 1 b j h (1)p k x k m X i=1 i6=j p i x i i ,R j (k;x 0 ); (4.59) for all k = 1;:::;m, k6= j. Let R j , minR j (k;x 0 ). Combining with the fact that j 1, we obtain j min(R j ; 1). WhenR j 1, based on (4.59), there must exist ank (denoted by) and somex i ’s, i = 1; ;m,i6=j, such that j 1 b j h (1)p x m X i=1 i6=j p i x i i =R j 1: (4.60) Becauseb j 0 (see Remark 8), from (4.60), we have m X i=1 i6=j p i x i + j b j (1) h 1 p x m X i=1 i6=j p i x i i : (4.61) Substitute the LHS of (4.61) into (4.57). We obtain p j x j 1 p x m X i=1 i6=j p i x i = 1 p x m X i=1 i6=j; p i x i : (4.62) 136 Therefore,x j achieves its minimum if the equality in (4.62) holds, or equivalently, the equality in (4.60) holds. However, since from (4.60), we have m X i=1 i6=j p i x i 1 p x j b j ; (4.63) and alsox i x imin 0,8i, the equality in (4.60) can hold only when 1 p x j b j m X i=1 i6=j p i x imin 0: (4.64) If (4.64) holds 11 , the equality in (4.62) and (4.60) can hold, and thus we have m X i=1 i6=j p i x i = 1 p x j b j (4.65) from (4.60), and m X i=1 i6=j p i x i =p x p j x j (4.66) from (4.62). Substituting the LHS of (4.65) into (4.56), we obtain k = 1 b k (p k x k p x );8k = 1; ;m;k6=j; (4.67) and substituting the LHS of (4.66), we obtain j = 1 b j (p j x j p x ): (4.68) 11 We prove the converse later. 137 Combining (4.67) and (4.68), we have k = 1 b k (p k x k p x );8k = 1; ;m: (4.69) Since k 0 andb k 0, we havep x p k x k for allk. Moreover, from (4.62), since 0 < 1 12 andx k 0 for allk, to minimizex j , we would like to minimizep x and maximize P m i=1 i6=j; p i x i . To minimizep x , sincep k x k p k x kmin ,8k (including), and p x p k x k for allk, p x is therefore the largest effective lower limitp k x kmin over all k, i.e., = and x = x min . To maximize P m i=1 i6=j; p i x i , since k 1, we have p k x k p x +b k = p x min +b k for allk = 1; ;m (includingj), combining with the constraintx k x kmax , we thus obtainx k =x kmax 0,8k,k6=j;. 
Therefore, if (4.64) holds, based on (4.62) by substituting allx k ’s we obtained above, the optimal objective valuex y j , the minimum ofx j , in this case is x y j = 1 p j n 1 p x min m X i=1 i6=j; p i x imax 0 o =x y1 j ; (4.70) and the corresponding degree of privacy is p x min p x min + P m i=1 i6= p i x imax 0 = 1 : (4.71) 12 we ignore the trivial case = 1 as it means no privacy requirement at all. 138 We next prove the converse. If x = x min and x k = x kmax 0,8k = 1; ;m, k6=j;, we havep j x j p x min +b j , which implies p j x j p x min +b j () 1 p x min m X i=1 i6=j p i x imax 0p x min +b j () 1 p x min b j m X i=1 i6=j p i x imax 0 () 1 b j h (1)p x min m X i=1 i6=j p i x imax 0 i =R j 1: (4.72) Since P m i=1 i6=j p i x imax 0 P m i=1 i6=j p i x imin 0 by their definitions, (4.64) holds, andR j 1. We thus finish the proof. 139 Chapter 5 Oblivious Mechanisms in Differential Privacy: Experiments, Conjectures, and Open Questions 5.1 Chapter Introduction Organizations such as the Census Bureau, hospitals, and Internet companies have long maintained databases of personal information. The census bureau may, for instance, publish the result of a statistical query such as “How many individuals have incomes that exceed $100,000?” An implicit hope here is that the released aggregate information is sufficiently anonymous so as not to breach the privacy of any individual. Unfortunately, publication schemes initially thought to be “private” have succumbed to privacy attacks [124], highlighting the urgent need for mechanisms that are provably private. Differential Privacy (DP) is a formal framework to quantify to what extent indi- vidual privacy in a statistical database is preserved while releasing useful aggregate information about the database. It provides strong privacy guarantees by requiring the indistinguishability of whether an individual is in the dataset or not based on the released information. The key idea of differential privacy is that the presence or absence of any individual data in the database should not affect the final released statistical information significantly, and thus it can give strong privacy guarantees against an adversary with 140 arbitrary auxiliary information. Since its introduction in [56] by Dwork et. al., differen- tial privacy has spawned a large body of research in differentially private data-releasing mechanism design and performance analysis in various settings, e.g., statistical query processing, machine learning, pricing, etc. Differential privacy is a privacy-preserving constraint imposed on the query output releasing mechanisms, and to make use of the released information, it is important to understand the fundamental tradeoff between utility(accuracy) and privacy. Research Motivation - When answering a scalar (single-dimensional numeric) query, differential privacy is usually achieved by adding some noise to the result of the query on a given dataset via a noise generation mechanism (NGM) [see Section 5.2.5]. It is evident that adding random noise to a correct query output in order to preserve privacy will generate an error for the query generator (QG), and will result in its loss. The important question at hand here is, for a given query, what is the best NGM(s) that for a given desired level of privacy, guarantees a low loss to the QG? 
Quite a few researchers (see ‘Related Work’) have focussed on this question in theory, and either (i) proposed provably universally optimal NGMs for specific query types (e.g., count) under general QG loss functions, (ii) prove that for general query functions, there exists no universally optimal NGM that minimize QG loss, or (iii) prove the optimality of a given NGM for specific queries under restricted conditions related to the query sensitivity and side information. However, for DP to be prevalently used in practice, we need to find an answer to the question of what NGM(s) provide good (if not optimal) privacy-utility tradeoffs (see Section 5.2.4) for general query types when QGs possess general loss functions, and in scenarios unrestricted by assumptions related to query sensitivity and side information. 141 Goal - Our primary goal in this work is to conduct an exploratory study to understand questions and challenges related to the design and analysis of optimal oblivious noise generation mechanisms (NGMs) in differential privacy (see Section 5.3.1). Contributions - In regard to our goal, for the general utility-maximization framework in DP, we use experiments to understand the privacy-utility impact of adding various obliv- ious noise generation mechanisms (NGMs) to the output of single real-valued (scalar) query functions having arbitrary sensitivity. More specifically, for a given scalar query function of a particular output domain (continuous or discrete), we investigate the exis- tence and design of high utility preserving oblivious NGMs for a given privacy regime (high or low) in a Bayesian setting. Our study takes into consideration (i) different privacy regimes (levels of privacy strength), (ii) continous and discrete query output do- mains, (iii) varied levels of query sensitivity, (iv) query side information, and (v) the presence of collusion and longitudinal attacks on a query (see Sections 5.3.1 and 5.4.3). Our experiments help provide supporting evidence and counterexamples to existing the- ory results on the optimality of NGMs when they are tested on a relaxed assumption set. The experimental results (see Section 5.4) also provide us with conjectures on appropri- ate (in the sense of privacy-utility tradeoffs) oblivious NGM selection for scalar queries with side information in Bayesian user settings, for which a general theory is yet to be developed. Following our experimental results, as a secondary goal, we propose inter- esting and important open questions for the theory community in relation to the design and analysis of provably optimal oblivious DP mechanisms. 5.2 Differential Privacy, and the Mechanism Design We briefly review the essential components of the differential privacy framework as applicable to this work, viz., the privacy mechanism, query sensitivity, QG utility, the 142 privacy-utility tradeoff, and popular NGMs in existing literature. Most of the material in this section is based on the paper by Ghosh, Roughgarden, and Sundararajan [79] and is intended as background material, including terminology and notation used in the remainder of this chapter. 5.2.1 Differentially Private Mechanism Consider a database withn rows drawn from a finite setD n . Each row corresponds to data of an individual entity. The Hamming distanced H (D 1 ;D 2 ) between two datasets D 1 ,D 2 is the number of entries on whichD 1 andD 2 differ. Two datasetsD 1 ,D 2 are neighbors if and only ifd H (D 1 ;D 2 ) = 1. 
A query q takes a database D ∈ D^n as input and outputs the result q(D) ∈ L in the set L of legitimate query results. A differentially private mechanism X (also a noise generation mechanism (NGM)) is a probabilistic function from L to some range R, continuous or discrete, that adds random noise to the true query output q(D). Typical examples of the range R of answers to query q are the set of real numbers, integers, and natural numbers. Let x_{ir} denote the probability that a mechanism X outputs r ∈ R for input (i.e., query output) i = q(D) ∈ L. For such a mechanism X and a parameter α = e^{-ε} ∈ [0, 1], X is ε-differentially private if and only if the ratio x_{D_1,r}/x_{D_2,r} lies in the interval [e^{-ε}, e^{ε}] for every possible output r ∈ R and every pair D_1, D_2 of neighboring databases. A mechanism is oblivious if, for all r ∈ R, x_{D_1,r} = x_{D_2,r} whenever q(D_1) = q(D_2), i.e., if the distribution of the noisy query output depends only on the true query result. We discuss examples of probabilistic noise generation mechanisms used in the existing DP literature in Section 5.2.5.
Intuitively, providing differential privacy implies that the probability of every response of the privacy mechanism, and hence the probability of a successful privacy attack following an interaction with the mechanism, is, up to a controllable ε factor, independent of whether a given entity "opts in" or "opts out" of the database.
5.2.2 Query Sensitivity
Two types of query sensitivity are typically considered when determining a proper noise mechanism: global sensitivity and smooth sensitivity. Specifically, global sensitivity (GS) for a query function q over the entire database domain D^n is defined as:
Δ_GS = max_{D_1, D_2 ∈ D^n : d_H(D_1, D_2) = 1} ‖q(D_1) − q(D_2)‖.   (5.1)
That is, for a given query, GS denotes the largest difference in query outputs possible over all dataset pairs having a Hamming distance of one. Since the power of the noise to be added to a query output is proportional to the query sensitivity value, global sensitivity is the safest way of adding noise to prevent de-anonymization. However, an unnecessarily large amount of noise is frequently added when GS is used, resulting in a significant difference between the observed query output and the true query output. One approach to adding appropriate noise levels to a query output is to adopt smooth sensitivity instead. Before we can introduce smooth sensitivity, we need to define local sensitivity (LS) as follows:
Δ_LS(D_1) = max_{D_2 ∈ D^n : d_H(D_1, D_2) = 1} ‖q(D_1) − q(D_2)‖.   (5.2)
However, we cannot use LS directly to add noise to a query output, as LS is dataset (D_1) dependent (i.e., sensitive to the values in the dataset) and does not preserve ε-DP [127]. Smooth sensitivity (SS) [127] is a function of LS, and is "in between" the global and local sensitivities. The definition of smooth sensitivity is given as
Δ_SS(D_1) = max_{k = 0, 1, ..., n} e^{-kε} ( max_{D_2 ∈ D^n : d_H(D_1, D_2) = k} Δ_LS(D_2) ).   (5.3)
Like LS, SS is dataset dependent; however, it enables the addition of an appropriate amount of noise (greater than that due to LS) to a query output and, most importantly, preserves DP [127], primarily due to the appropriately larger noise added compared to that in the LS case.
5.2.3 Query Generator Utility
Utility Without Side Information: For a given query q, the utility to a query generator (QG) is its measure of the usefulness of the output of a differentially private mechanism X for q.
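Before developing the utility notions, here is a small brute-force sketch (not from the dissertation) of the sensitivity definitions in (5.1)–(5.2); the toy domain, database size, and queries are hypothetical, and smooth sensitivity is omitted since it additionally maximizes over all databases at every Hamming distance.

```python
from itertools import product

def global_sensitivity(query, domain, n):
    """Brute-force Delta_GS of eq. (5.1): maximize |q(D1) - q(D2)| over all
    neighboring database pairs. Only feasible for tiny toy domains."""
    gs = 0.0
    for d1 in product(domain, repeat=n):
        for i in range(n):                          # change one entry -> neighbors
            for v in domain:
                d2 = d1[:i] + (v,) + d1[i + 1:]
                gs = max(gs, abs(query(d1) - query(d2)))
    return gs

def local_sensitivity(query, d1, domain):
    """Brute-force Delta_LS(D1) of eq. (5.2) for one fixed database D1."""
    ls = 0.0
    for i in range(len(d1)):
        for v in domain:
            d2 = d1[:i] + (v,) + d1[i + 1:]
            ls = max(ls, abs(query(d1) - query(d2)))
    return ls

domain  = (0, 1, 2, 3)                              # toy attribute values
count_q = lambda d: sum(x > 1 for x in d)           # "how many entries exceed 1?"
mean_q  = lambda d: sum(d) / len(d)

print(global_sensitivity(count_q, domain, n=4))     # 1: this count query has GS = 1
print(global_sensitivity(mean_q,  domain, n=4))     # (3 - 0) / 4 = 0.75
print(local_sensitivity(mean_q, (0, 0, 0, 0), domain))  # also 0.75 for this D1
```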
One of the goals of X (in theory) is to guarantee optimal utility to every potential QG (user), independent of its side information and preferences. The notion of usefulness, however, is conceptually intuitive but intractable for quantification. Since, in most cases the output ofX will deviate from the true value of queryq, a more convenient way for researchers to quantify utility is to measure its expected loss/deviation. In the absence of any side information available to QG, letl(i;j) be QG’s loss function when the true answer toq isi while QG believes it to bej. In general, a loss function is likely to possess the properties of symmetry and monotonicity, i.e., the loss function would depend only oni andjjij, and would be non-decreasing injjij. Typical examples of such functions includel(i;j) =jjij,l(i;j) = (ji) 2 , and the binary loss function l bin (i;j), defined as 0 ifi =j, and 1 otherwise. As in [79], here we will measure utility of QG as its expected loss (as detailed below). The Presence of Side Information: A QG potentially has side information pertaining to a query q, which might stem from other information sources, previous interactions with mechanism X, introspection, or common sense. For example, if q requires the count of the number of adults in Los Angeles contracting flu in December 2015, an estimated lower bound to the true query output could be the number of people buying flu drugs from a drug company in that month. An upper bound to the true query output 145 is the total adult population of Los Angeles in the month of December. Information such as the upper bound and lower bound of the true query output serve as potential side information to the QG. One of the ways to model this side information is via a prior probability distribution [79]. For a givenq, this prior distribution represents the belief of the QG (user) on the query output ofq. Note that the use of priors as model parameters does not in any way affect the preservation or non-preservation of differential privacy; it only influences the utility of a QG to discuss the utility of a (differentially private) mechanism to a potential user. The Net Utility Function: The net utility function for a QG is a function of both the side information he has (in terms of a prior distribution) and his loss function (user’s preference). For a query q, consider a Bayesian user with a prior p and loss function l that in- teracts with a differentially private mechanismX with rangeR. Since the rangeR of X need not coincide with the setL of legitimate query results (which includes the side information set), a QG, in general, must first reinterpret an outputr2R of the mecha- nismX as a query resultj2L. For example, a user that observes the output “-2” from the -geometric mechanism (Example 2.1 in [79]) might guess that the actual query result is most likely to be 0 (since the range of the true query output is non-negative). In such a case, there needs to be a remap of mechanism X with range R, which is a probabilistic function Y from R to L, with y rj denoting the probability that a user reinterprets the mechanism responser2 R as as a query resultj2 L. A mechanism X and a remap Y together induce a new probabilistic mechanism Z = Y X with z ij = (YX) ij = P r2R x ir y rj . 
We define the net utility function of a QG with prior 146 p as its expected loss with respect to a mechanismX and a remapY , for a queryq(D) whose true result isi, and denote it asEU(p;q;D;i) that is expressed as EU(p;q;D;i) = X i2L p i X j2L z ij l(i;j); X i2L p i = 1: (5.4) On a similar note, the net utility function of a non-Bayesian (Risk-Averse) QG that does not take into account prior information but accounts for the worst case expected loss [88], is expressed as EU(q;D) = max i2L X j2L z ij l(i;j): (5.5) 5.2.4 Utility-Privacy Tradeoffs We assume that the query generator is a rational entity and would thus want to minimize its expected loss. On the other hand, the differentially private framework will need to ensure that its privacy requirements are met and that entity anonymity is preserved. For differential private frameworks, we express this conflict/tradeoff between utility and privacy for countable rangesR as two optimization problems, OPT1 and OPT2, when the QG does and does not account for its prior, respectively. minimize X i2L p i X j2L z ij l(i;j) subject to privacy constraint set on;z 0 ij s specific toX; X j2L z ij = 1;8i2L; z ij 0;8i2L;8j2L: (OPT1,Bayesian) 147 x -4 -3 -2 -1 0 1 2 3 4 p(x) 0 0.05 0.1 0.15 0.2 0.25 0.3 Probability Density/Mass Function Laplacian Staircase Geometric Figure 5.1: Laplacian, Staircase, and Geometric mechanisms. minimize max i2L X j2L z ij l(i;j) subject to privacy constraint set on;z ij ’s specific toX; X j2L z ij = 1;8i2L; z ij 0;8i2L;8j2L: (OPT2,Risk-Averse) In both OPT1 and OPT2, the objective function reflects the minimization of the expected loss of the QG, and the constraints, apart from the validity of problem variables, reflect the ensuring of privacy constraints specific to mechanismX. Given a queryq, if user’s priori (side information)p i and preference of the loss functionl are known, an optimal mechanismZ can be derived by minimizing the expected loss (Bayesian model) or the worst case loss (Risk-Averse model) subject to differential privacy. 148 5.2.5 Popular NGMs in Literature As representative examples of NGMs, we consider three popular oblivious noise-adding mechanisms in existing literature: (i) the Laplacian [56], (ii) the Geometric [79, 88], and (iii) the Staircase [76] mechanisms, defined as follows: Laplacian :p(xj"; ) = " 2 e " jxj ;8x2R (5.6) (5.7) Geometric :p(xj) = 1 + 1 jxj ;8x2Z (5.8) (5.9) Staircase :p (xj"; ) = 1 2 p e "(k+[x] ) ;8x2R (5.10) (5.11) where 0 = e " 1 is the privacy level, and is the sensitivity level. In the Staircase mechanism, the rounding function [x] is defined as: [x] = 8 > > < > > : 0; jxj2 [k; (k + )) 1; jxj2 [(k + ); (k + 1)], (5.12) (5.13) wherek2 Z; 0 1 controls the shape of the staircase and is set to p 1+ p in the one-dimensional case, in order to minimize the expectation of the noise amplitude. Note 149 Table 5.1: Problem Difficulty Levels (low to high) w.r.t. Optimal NGM Design ڪۋۏۄۈڼۇٻڟګٻڨۀھۃڼۉۄێۈٻڃڭۄێۆڈڜۑۀۍێۀڇٻ ο ܩܵ = ο ܮܵ ڄٻڕڮۏڼۄۍھڼێۀ ڪۋۏۄۈڼۇٻڟګٻڨۀھۃڼۉۄێۈٻڃڭۄێۆڈڜۑۀۍێۀڄ ڪۋۏۄۈڼۇٻڟګٻڨۀھۃڼۉۄێۈٻڃڝڼ۔ۀێۄڼۉڄ ڪۋۏۄۈڼۇٻڟګٻڨۀھۃڼۉۄێۈٻڃڝڼ۔ۀێۄڼۉڇٻ ο ܩܵ = ο ܮܵ ڄ ڰۉۄۑۀۍێڼۇۇ۔ٻڪۋۏۄۈڼۇٻڟګٻڨۀھۃڼۉۄێۈٻڃھۊېۉۏٻ ο =1 ڄٻڕڢۀۊۈۀۏۍۄھ ڰۉۄۑۀۍێڼۇۇ۔ٻڪۋۏۄۈڼۇٻڟګٻڨۀھۃڼۉۄێۈٻڃېۉۆۉۊےۉٻیېۀۍۄۀێڄ ڧۀۑۀۇٻڌڕ ڧۀۑۀۇٻڍڕ ڧۀۑۀۇٻڎڕ ڧۀۑۀۇٻڏڕ ڧۀۑۀۇٻڐڕ ڧۀۑۀۇٻڑڕ that the Geometric mechanism can be applied to quantized numeric query outputs using the following generalization: Geometric :p(xj; ;d) =d 1 + d 1 d ! jxj ; (5.14) for allx2 0;d;2d; , whered is the quantization level of the output query, with d 2 N. 
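The following sampling sketch (not from the dissertation) draws noise from the three mechanisms just defined, parameterized by ε with α = e^{-ε}. The two-sided Geometric sample is obtained as the difference of two one-sided geometric variables, and the Staircase sample is drawn by decomposing its piecewise-constant density in (5.10) into a sign, a stair index, and a position within the stair, with γ = √α/(1 + √α).

```python
import numpy as np

rng = np.random.default_rng(0)

def laplacian_noise(eps, delta, size):
    # density (eps / (2*delta)) * exp(-eps*|x|/delta), i.e. eq. (5.6) with alpha = e^{-eps}
    return rng.laplace(0.0, delta / eps, size)

def geometric_noise(eps, size, d=1):
    # two-sided geometric on {0, +-d, +-2d, ...}, eq. (5.14); alpha = e^{-eps}
    a = np.exp(-eps)
    g1 = rng.geometric(1 - a, size) - 1             # one-sided geometric on {0, 1, ...}
    g2 = rng.geometric(1 - a, size) - 1
    return d * (g1 - g2)                            # the difference is two-sided geometric

def staircase_noise(eps, delta, size):
    # eq. (5.10) with gamma = sqrt(alpha) / (1 + sqrt(alpha))
    a = np.exp(-eps)
    gamma = np.sqrt(a) / (1 + np.sqrt(a))
    sign = rng.choice([-1.0, 1.0], size)            # the density is symmetric
    step = rng.geometric(1 - a, size) - 1           # which "stair" k the sample lands on
    u = rng.random(size)                            # uniform position within the chosen piece
    upper = rng.random(size) < (1 - gamma) * a / (gamma + (1 - gamma) * a)
    mag = np.where(upper,
                   (step + gamma + (1 - gamma) * u) * delta,  # flat part at level e^{-(k+1)eps}
                   (step + gamma * u) * delta)                # flat part at level e^{-k eps}
    return sign * mag

eps, delta = 1.0, 1.0
print(np.mean(np.abs(laplacian_noise(eps, delta, 100_000))))  # ~ delta/eps
print(np.mean(np.abs(geometric_noise(eps, 100_000))))         # ~ 2a/(1 - a**2), a = e^{-eps}
print(np.mean(np.abs(staircase_noise(eps, delta, 100_000))))
```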
The conventional Geometric mechanism is a special case when d = = 1. A depiction of the three mechanisms is given in Fig. 5.1. The quantization level (resolution) here is set to 0:5 (and therefore the probability mass is one-half of the density). 5.3 Challenges, Opportunities, and Our Approach We begin with possible open research directions, followed by corresponding challenges and our approach to making progress on these open questions. 5.3.1 Problem Statement Based on the above survey of the state of the art in optimal differential private mecha- nisms, we first state the following interesting open research directions. Each question is then mapped to a corresponding level in Table 5.1. 150 • R1: Do there exist universally optimal oblivious mechanisms for queries other than those stated in [32]? (Level 6.) • R2: The authors in [76] show that the optimal mechanism for a single real-valued query under the risk-averse model is the Staircase mechanism. This optimality holds under the assumption of constant sensitivity over all query outputs, which holds in general only if the worst case global sensitivity is considered (i.e., “opti- mal” in the sense of very large noise); but, in general, this is not true for under a better/tighter sensitivity metric such as smooth sensitivity. Thus, a natural ques- tion here is: what would the optimal mechanism be if we relax the assumption of constant noise over query outputs? (From Level 1 to Level 2.) • R3: More ambitiously, given user preferences and side information, is there an optimal mechanism for a scalar query in the Bayesian model? This question can be treated as a relaxed version of the harder question of finding a universally optimal mechanism, as a mechanism of the latter type is optimal in the sense of arbitrary user preferences and/or side information. (From Level 1 to Level 3.) Challenges- R1 is challenging due to the difficulty of (i) determining general or spe- cific properties of such queries that are necessary and/or sufficient criteria for any NGM to be universally optimal for those queries, and (ii) determining the necessary and/or sufficient conditions (e.g., mathematical properties of the noise distributions) for the ex- istence of universally optimal mechanisms for such queries. In the case of R2, relaxing the assumption of constant sensitivity leads to loss of linearity in the original (linear programming) optimization problem in [76] used to model the privacy-utility tradeoff, which is quite difficult to solve for general loss functions. In the case of R3 (i.e., given 151 side information), determining a side-information specific optimal mechanism for gen- eral or a specific scalar query is a non-trivial task, the challenge being similar to those in (i) and (ii), but in settings when QG’s have prior information on the query output. Given these challenges, in this work, we would like to address the following simpli- fied but related questions and answer them via experiments assisted by some analysis: 1. “Utility-Privacy Tradeoff of Existing Mechanisms”: With reference to R2, the Staircase mechanism is known to perform better in theory than the Laplacian mechanism (known to be the best mechanism for real-query outputs prior to the work by [76]) in the low-medium privacy regime for real query output [76]. We are interested in investigating the extent of this improvement. To this end, we will study this under arbitrary sensitivity and"-differential privacy settings. 2. 
“Presence of side information”: With reference to R3, given user preferences and partial user side information, can we figure out a heuristic differentially private mechanism that takes advantage of partial side information and performs better than the risk-averse (non Bayesian) Staircase mechanism? If so, what would such a heuristic mechanism look like? What would be the performance gap region of the privacy-utility tradeoff? Studying the problem of side-information specific optimal differential privacy mech- anism is non-trivial. Many of the query outputs are expected to be distributed in a certain manner. For example, consider a query asking for the mean of a certain attribute of a large database. The Central Limit Theorem tells us that (assuming independent and identically distributed entries), the mean (query output) should be Gaussian distributed, no matter how the original entries are distributed. A similar idea applies to other queries such as maximum query, where we can reasonably expect a high probability of large numbers and a low probability of small numbers in a large database. More specifically, 152 if the entries are independent and uniformly distributed, the max query output will be beta distributed over the query output domain (scaled and shifted). Moreover, based on the open question in [88], it is still not clear whether collusion- resistance and simultaneous utility maximization hold for other types of queries (i.e., other than the count query considered in [88]). This inspires another interesting ques- tion: for queries other than count, how would the utility function behave (e.g., as a function of the number of QGs) when QGs interact and can potentially share informa- tion of a particular query output? Approach - We focus on experimentally addressing the above-mentioned questions in the rest of this work, i.e., respecting the intricacies of finding the answers to our ques- tions in theory, instead of analytically modeling arbitrary sensitivity and side informa- tion and resolving the questions via mathematical rigor, we will run experiments for certain query functions on sampled values in the DP parameter space and the prior dis- tribution space. Based on our observations, we will come up with conjectures whose proofs/disproofs would be open problems for the theory community working on DP. To the best of our knowledge, ours is the first work touching upon an experimental per- formance evaluation of oblivious noise generating mechanisms (NGMs) for differential privacy (DP). 5.3.2 Experimental Methodology In this section, we propose our methodology to run experiments whose outcome would lead us to major conjectures about the optimality of NGMs in the presence of query side information. 153 Dataset Domains and Query Functions Recall that here, we focus only on numeric queries and differential private (DP) mech- anisms which are oblivious. Our goal is to study the utility-privacy tradeoff of three popular oblivious mechanisms, and investigate the (simplified) open questions posed above. We note that, given our goal, there is no need to perform experiments on large- scale real datasets to obtain our results, for the following reasons. Given a database and a query result, there are three major components that DP out- comes depend on: (i) the true query output, (ii) the query sensitivity metric, and (iii) corresponding DP noise generating mechanism. However, oblivious mechanisms are not database dependent conditioned on the unperturbed query output. 
If two databases have same true query output, then the oblivious mechanisms apply noise to the query output in exactly the same manner, oblivious of the database. This implies that the DP mechanism output depends on the true query output and the query sensitivity. The latter again depends on the database if we consider local sensitivity, and does not depend on the database if we consider global sensitivity. However, LS cannot be used to generate noise for a true query output because it does not preserve DP. Researchers usually consider global sensitivity or smooth sensitivity instead for adding noise. The bad news here is that, to experimentally obtain these two sensitivities, by definition, we need to investigate all possible databases over the entire database space, which is in practice infeasible. On the other hand, the good news is that, we can remove the need to compute sensitivity values by simply normalizing the performance metric (i.e., expected loss) by global sensitivity. To illustrate this idea, recall that the utility (measured by expected loss) of Laplacian and Staircase mechanisms from [75] is as follows: Laplacian :EL(; ) = log (5.15) 154 Staircase :EL(; ) = p 1 : (5.16) If we normalize above loss functions by the global sensitivity , the loss function no longer depends on . Particularly, the loss function of the Geometric mechanism for count query does not depend on either, due to the fact that GS = LS = 1, which is Geometric :EL() = 2 1 2 ; (5.17) Therefore, without loss of generality, given the performance metric normalized util- ity, which we use in our experiments, these experiments need not be done in a database- specific way. We only need to specify the query output domainL, which can be contin- uous or discrete. For scalar queries, we consider mean, maximum, and count queries in our experiments. Deployed (Noise-adding) Mechanisms The popular noise-adding mechanisms we explore here are the Laplacian, Staircase, and Geometric mechanisms. The details of these mechanisms are given in section 5.2.5. Interaction (Remap) Mechanisms The remap function,Y , is an optimal mapping mechanism from the noisy query output, r, to the estimated result,j, in the true query output domain. If the true query output domain is real, then Y is nothing but an impulse/identity function, since the noise we add has the highest density at the true answer with no bias. If the true query output domain is discrete, thenY should be a round() function which rounds the noisy results to the nearest legitimate discrete value. 155 Collusion in Query Results As explained in section 5.3.1, we would like to understand the drop rate of expected loss corresponding to the number of cooperating customers. In our experiments, we make the following assumptions: each customer can send the same query only once, but they can attempt to extract useful information by sharing their query answers with other users asking the same query. Based on these assumptions, we study the utility-privacy tradeoffs for collusion attacks. We note here that the case for longitudinal attacks is exactly the same as that of collusion attacks because the privacy harm caused due to the same QG askingk questions on a query is the same as that caused byk QGs asking one question each on the same query and then colluding with each other on the perturbed query outputs [88]. 5.4 Experimental Results and Analysis In this section, we focus on addressing the simplified questions posed in Section 5.3.1 using experiments aided by analysis. 
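As a concrete reference for this methodology, the sketch below (a minimal illustration, not the dissertation's actual experiment code) estimates the normalized expected loss by Monte Carlo: since the mechanism is oblivious, the true output can be taken to be 0 without loss of generality, the remap Y is the identity for continuous output domains and round() for discrete ones, and the loss |i − j| is averaged and divided by Δ_GS.

```python
import numpy as np

rng = np.random.default_rng(1)

def normalized_expected_loss(noise_fn, remap, n_samples=1_000_000, delta_gs=1.0):
    """Monte Carlo estimate of E|i - j| / Delta_GS for an oblivious mechanism;
    the true answer i is taken as 0 since only the added noise matters."""
    noisy = noise_fn(n_samples)
    return np.mean(np.abs(remap(noisy))) / delta_gs

alpha = 0.2                                    # privacy level alpha = e^{-eps}
eps = -np.log(alpha)
laplace = lambda n: rng.laplace(0.0, 1.0 / eps, n)

identity = lambda z: z                         # remap for real-valued query outputs
to_nearest_int = np.rint                       # remap for the count query

print(normalized_expected_loss(laplace, identity))        # ~ 1/eps = -1/ln(alpha)
print(normalized_expected_loss(laplace, to_nearest_int))  # rounded Laplace, count query
```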
5.4.1 Utility-Privacy Tradeoff of Existing Mechanisms Analysis We assume every entry in the database is a real number. In the cases of the mean and max queries, the query outputs are also real numbers. The corresponding expected disutilities were provided by previous efforts [75] and has been mentioned in (5.15) and (5.16). Note that for continuous query output domainL (e.g., the entire or bounded real domain), the Geometric mechanism cannot be applied since the perturbed query output 156 does not cover entire (all possible query outputs in)L and thus will not satisfy DP by definition. However, in the case of the count query, the query output is an integer. Consequently, for continuous NGMs, we have to remap the perturbed query output to integers. We derive the normalized expected loss based on its definition in section 5.2.3, particularly for the count query ( = 1), as follows: Laplacian :EL() = p 1 (5.18) Staircase :EL() = (1 (1 p ) 2 2 ) p 1 (5.19) Note that we do not need to revisit this for Geometric, since it has also been shown in (5.17). Fig. 5.2 depicts our analysis results (the ”Theoretical” curves) of the utility-privacy tradeoff for existing popular NGMs. Since utility is measured by expected loss, lower curves stand for better performance. We note that the Staircase mechanism outperforms the Laplacian mechanisms for the mean and maximum queries under low-medium pri- vacy regions ( < 0:5). Under medium-high privacy regions ( 0:5), there is essen- tially no statistical difference between the mechanisms, and therefore we only capture curves under low-medium privacy regions (so gaps would be more clear). Although Staircase outperforms Laplacian mechanism, the performance improvement does not seem very significant. For the count query, we find that the Geometric mechanism per- forms the best in our experiments. This is expected, since the Geometric mechanism is universally optimal for the count query. 157 5.4.2 Presence of Side Information Given the setting of having side information, in this experiment, we show that even without knowing the exact distribution a priori, we could still do better than applying the risk-averse optimal mechanism blindly. Scenario and Experiment Settings Recall that for i.i.d. entries in a database, from probability theories we know that the mean and maximum will be Gaussian and Beta distributed. In the worst case, this could be the only side information we have. To understand whether we can do better even with very limited side information, we are going to design an experiment for it. Even a simple toy example showing non-trivial improvement with very limited side information is representative enough to claim the optimization problem is worthy. Considering numeric query functions mean and maximum. In the toy experiment, we define the size of databasen = 100, the global sensitivity to be 10, and the query output domainL to be a bounded real interval [10; 10]. (This is just for convenience, so that the query output of maximum query would not go to infinity). The prior of their unperturbed query output are set to be a truncated GaussianN(0; 1) and a scaled- and-shiftedBeta(n = 100; 1) distribution for mean and maximum query, respectively. However, in the experiment we pretend we have very obscure information (worst-case scenario under the presence of side information) about the parameters, i.e., we only know rough shapes of the distributions. 
More specifically, we assume that we only know that they behave like N(μ ∈ [-2, 2], σ) and Beta(n, 1), with μ unclear but within a certain range, σ small, and n large due to the large database (n = 100).
Figure 5.2: Utility-Privacy tradeoff and optimality of three mechanisms. (Three panels -- Mean, Max, and Count -- plot Expected Loss/Δ_GS against α ∈ [0, 0.5] for the Laplacian, Staircase, Pre-rounding + Geometric, and Geometric mechanisms, with theoretical and experimental curves.)
Proposed Heuristic Mechanism
We propose a heuristic DP mechanism in what follows. This heuristic mechanism has two stages. The first stage is a pre-processing stage which simply rounds all numbers (the true query outputs) in [-10, -5) to -10, all numbers in [-5, 5) to 0, and all numbers in [5, 10] to 10, i.e., there are only three possible outputs {-10, 0, 10} after preprocessing. The second stage adds generalized Geometric noise, as presented in (5.14), with d = Δ = 10 (according to the setup of this experiment) to the pre-processed output of the first stage. The true query output is thus perturbed twice in our heuristic mechanism.
The idea behind our heuristic design is that the pre-processing is actually designed based on the side information we have, which is assumed to be Gaussian of shape N(μ ∈ [-2, 2], σ) and Beta of shape Beta(n, 1), with σ small and n large. In other words, from the side information we know that the true result of the mean query has a high probability (due to small σ) of being in [-2, 2], which is centered around 0, and the true result of the max query has a high probability (due to large n) of being a large number around 10. Therefore, by discretizing the query output domain (introducing a small loss first), we can then apply the Geometric mechanism, which is known to perform better for discrete output domains (gaining much in return).
Figure 5.3: Drop rates of expected loss due to collusion. (Three panels -- Mean, Max, and Count at α = 0.2 -- plot Expected Loss/Δ_GS against the number of queries, comparing composition bounds, our approximations, and experimental results for the Laplacian, Staircase, and Pre-rounding + Geometric mechanisms.)
Experimental Results
We compare the utility-privacy tradeoff of the proposed heuristic mechanism for mean and maximum queries under different privacy levels (α). The experiments are run 10^6 times for each point and then averaged to compute the expected loss. All results are reported with 95% confidence intervals. Our experimental results are depicted in Fig. 5.2. The performance curve of our heuristic mechanism is marked as red crosses.
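Before turning to the observations in the next paragraphs, the following is a minimal sketch of the two-stage heuristic just described (the constants follow the toy setup above, and the two-sided Geometric step is the same construction used for the generalized Geometric mechanism in (5.14)).

```python
import numpy as np

rng = np.random.default_rng(2)

def pre_round(v):
    # stage 1: map [-10, -5) -> -10, [-5, 5) -> 0, [5, 10] -> 10
    return -10.0 if v < -5 else (0.0 if v < 5 else 10.0)

def heuristic_mechanism(true_output, alpha):
    # stage 2: add generalized Geometric noise (eq. 5.14) with d = Delta = 10
    g = (rng.geometric(1 - alpha) - 1) - (rng.geometric(1 - alpha) - 1)
    return pre_round(true_output) + 10 * g

# e.g. a mean-query answer near 0 is usually released as 0 or +-10
print([heuristic_mechanism(0.7, alpha=0.2) for _ in range(5)])
```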
We note that for both mean and max queries, with simple preprocessing and application of the Geometric mechanism, we can obtain significant performance improvement in the low-medium privacy regime ( < 0:5). Indeed, our heuristic design is not optimal and not general in any sense, but this is not our goal in this work. Rather, we want to show here, through experiments, the following interesting observations: 1. The pre-processing function should be designed based on available side informa- tion. We can design a meaningful pre-processing function which aids performance significantly without knowing the actual prior. 2. By simply pre-rounding and applying generalized Geometric mechanism, we can improve the performance significantly without designing anything new here. This 160 suggests that designing a side-information specific optimal DP mechanism is non- trivial. 3. Designing a side-information specific pre-processing improves performance sig- nificantly without knowing the actual prior. However, it is not clear if the side- information specific pre-processing is indispensable. In other words, it is not clear if a side-information specific optimal DP mechanism can be designed in just a single-stage. 4. In this heuristic design, we note that the pre-rounding stage prevents the expected loss from converging to zero under collusion attacks, due to its irreversibility (see Fig. 5.3). This suggests interesting directions for the design of a collusion pre- vention mechanism. 5.4.3 Collusion in Query Results Here we assume that each user can only pose the same query once (in a given time period). Let k be the number of users that cooperate with each other by sharing their perturbed query results. Dwork [54] shows that the composition ofk queries, each of which is (",)-differentially private, is at least (k",k)-differentially private. However, this bound is known to be loose [128]. Here, we derive our approximation for the trend of expected loss drop for large k. The approximation will be compared with experimental results as well as the composition bound, to validate the accuracy of the approximation and to show how loose the bound is. 161 Analysis For k users sharing their perturbed query results, the first question we would like to ask is: According to user preference, what would be the best strategy of utilizing their results? We propose it in the following lemma. Lemma 10. For Laplacian, Staircase and Geometric mechanisms, if user preference (loss function) is defined/known as l(i;j) =jijj, the maximum likelihood estima- tion (MLE) strategy for collusion in query results is to use the corresponding sample medians. Proof. Please see Appendix for details of the proof. The definitions ofl,i andj can be found in section 5.2.3. Using Lemma 10 and applying an approximation [157] of real-valued sample me- dian distribution for largek, we then derive the normalized expected loss of the optimal collusion results for queries with continuous outputs in the following: Laplacian :EL(;k) = r 2 k 1 log (5.20) Staircase :EL(;k) = r 2 k p 1 (5.21) Geometric :EU(;k) = 1 p 2k 1 + 1 (5.22) Not surprisingly, the expected loss drops when the number of users (k) increases. However, the drop rate is inversely proportional to p k. From above analysis, the curator can re-define a new privacy level according to the (expected) number of users (k) and 162 the original privacy level. 
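A small simulation sketch of this collusion strategy (with hypothetical parameter values): each of the k colluding users holds an independent Laplace-perturbed copy of the same answer, the group combines the copies via the sample median as in Lemma 10, and the resulting expected loss shrinks roughly like 1/√k, in line with (5.20).

```python
import numpy as np

rng = np.random.default_rng(3)

def collusion_loss(k, eps, n_trials=200_000):
    """E|median of k noisy answers - true answer| for Laplace noise,
    normalized by Delta_GS = 1 (true answer taken as 0 w.l.o.g.)."""
    noise = rng.laplace(0.0, 1.0 / eps, size=(n_trials, k))
    return np.mean(np.abs(np.median(noise, axis=1)))

alpha = 0.2
eps = -np.log(alpha)
for k in (1, 4, 16, 64):
    print(k, round(collusion_loss(k, eps), 4))   # drops roughly like 1/sqrt(k), cf. (5.20)
```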
This approximation is expected to approach the expected loss of the experimental results for large k and is thus particularly useful for estimating the trend of utility loss and for re-defining new privacy levels. For the count query, however, the drop rate of expected loss is difficult to analyze (and is part of future efforts); we use experimental results to understand the trend.
Experimental Results
Fig. 5.3 illustrates how the expected loss drops as a function of the number of cooperating users (k). The experimental results of each mechanism for the mean and maximum queries are compared with the corresponding approximations and composition bounds. As we can see from this figure, for uncountable numeric query output domains (such as mean and maximum), the expected loss drops roughly inversely proportionally to √k, closely matching our approximation. For countable numeric query output domains (such as count), our experimental results indicate that the expected loss drops much faster. This indicates that the count query is particularly vulnerable to collusion attacks; that is, cooperating users can narrow down the target information fairly quickly. To prevent collusion attacks, a service provider should consider adding correlated noise across cooperating users [88], as they would not be able to remove the correlation, resulting in better privacy protection of sensitive data.
Chapter 6
Conclusion
6.1 Summary of Contributions
In this dissertation, we study, from several aspects, various potential security and privacy hazards that arise during information processing, and propose effective defense mechanisms correspondingly. Specifically, our first work explores the effects of poisoning backdoor attacks in the context of federated meta-learning, where malicious participants may abuse the training algorithm to backdoor a jointly trained model. We show that a one-shot poisoning backdoor attack can be very successful and persists for a long time. Moreover, the attack generalizes very well to unseen backdoor examples. Normal training and fine-tuning cannot remove the effects of a backdoor attack. Our idea of using matching-network fine-tuning as a defense during meta-testing is privacy-friendly and very effective at eliminating backdoor effects, though it sacrifices some of the model's performance. Much of our future effort will concentrate on this limitation.
Our second work addresses the potential privacy hazards brought on by an algorithmic transparency report. We demonstrated how a not-so-powerful adversary can combine his/her background knowledge with the information provided in an algorithmic transparency report to obtain data subjects' private information. From this we glean which potential aspects of transparency and fairness measures can hurt privacy. We then propose a privacy scheme that perturbs the information to be announced, to remedy the privacy leaks. We systematically study the impact of such perturbation on fairness measures and the fidelity of the announced information, which results in a privacy-fidelity
We consider possible levels of opti- mality, consider the current state-of-the-art work in the context of these levels, and state several open questions that have not been investigated or answered by current differen- tial privacy research. Moreover, we consider the utility-privacy tradeoff performance of existing (popular) mechanisms. In the presence of side information, a heuristic DP mechanism is proposed, largely to illustrate the non-triviality of optimal design. We also consider the effect of collusion in query results. Theoretical bounds underk-fold adap- tive composition are compared with our experimental results, where collusion is based on the maximal likelihood estimation (MLE) ofk query results. As a main result, we conjecture that a heuristic DP mechanism betters the Staircase and Laplace mechanism for scalar output queries in the presence of side information. 6.2 Open Challenges In this section, we look at some important open challenges not addressed in the disser- tation. • Despite showing the effectiveness against dirty-label data poisoning backdoor at- tack in our first work, it is still yet to be cleared whether a matching network architecture is robust to poisoning backdoor attack in general, as dirty-label data poisoning is not directly applicable to a matching network which learns to classify 165 an input example based on non-parametric matching processes measuring similar- ities to support data rather than using the potentially tampered labels. Networks with similar principles, e.g., Prototypical Network [155] computing distances to prototype representations of each class, or Relation Network [158] reasoning re- lations among objects, might be candidates robust to poisoning backdoor attack. We expect future work on the exploration of this field to be fruitful. • To preserve users’ privacy in algorithmic transparency, we proposed a computa- tionally efficient algorithm, based on the closed-form solution of the optimization formulation, to generate a privacy-aware surrogate model which maximizes users’ privacy subject to a fidelity requirement for the surrogate model. As an important initial step, the optimization formulation is focused on binary-decision problems. The extension of our work to a general multi-classes decision problem is an im- portant and challenging problem. • While proposing a heuristic design and showing the feasibility of utilizing side- information to enhance the utility of a DP mechanism, it remains a challenging problem to find a general and practical utility-(near)optimal DP mechanism on the presence of side-information. Such a challenging problem is very important from the aspect of making privacy-preserving mechanisms more useful in practice. 166 References [1] U.S. census bureau historical income tables: People. https://goo.gl/ UDoF64. Accessed: 2018-10-01. [2] Martin Abadi, Andy Chu, Ian Goodfellow, H Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. Deep learning with differential privacy. In Proceed- ings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pages 308–318. ACM, 2016. [3] Alessandro Acquisti and Ralph Gross. Predicting social security numbers from public data. Proceedings of the National academy of sciences, pages pnas– 0904891106, 2009. [4] Nabil R Adam and John C Worthmann. Security-control methods for statistical databases: a comparative study. ACM Computing Surveys (CSUR), 21(4):515– 556, 1989. 
[5] Gagan Aggarwal, Tomas Feder, Krishnaram Kenthapadi, Rajeev Motwani, Rina Panigrahy, Dilys Thomas, and An Zhu. Approximation algorithms for k- anonymity. Journal of Privacy Technology (JOPT), 2005. [6] Rakesh Agrawal and Ramakrishnan Srikant. Privacy-preserving data mining. In ACM Sigmod Record, volume 29, pages 439–450. ACM, 2000. [7] Shipra Agrawal and Jayant R Haritsa. A framework for high-accuracy privacy- preserving mining. In Data Engineering, 2005. ICDE 2005. Proceedings. 21st International Conference on, pages 193–204. IEEE, 2005. [8] M´ ario S Alvim, Miguel E Andr´ es, Konstantinos Chatzikokolakis, Pierpaolo Degano, and Catuscia Palamidessi. Differential privacy: On the trade-off be- tween utility and information leakage. Formal Aspects in Security and Trust, 7140:39–54, 2011. [9] Mike Ananny and Kate Crawford. Seeing without knowing: Limitations of the transparency ideal and its application to algorithmic accountability. New Media & Society, 20(3):973–989, 2018. 167 [10] Kurt M Anstreicher. Linear programming inO( n 3 lnn L) operations. SIAM Journal on Optimization, 9(4):803–812, 1999. [11] Daniel W Apley. Visualizing the effects of predictor variables in black box su- pervised learning models. arXiv preprint arXiv:1612.08468, 2016. [12] Shahab Asoodeh, Fady Alajaji, and Tam´ as Linder. Notes on information-theoretic privacy. In Communication, Control, and Computing (Allerton), 2014 52nd An- nual Allerton Conference on, pages 1272–1278. IEEE, 2014. [13] Eugene Bagdasaryan, Andreas Veit, Yiqing Hua, Deborah Estrin, and Vitaly Shmatikov. How to backdoor federated learning. CoRR, abs/1807.00459, 2018. [14] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine trans- lation by jointly learning to align and translate. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, 2015. [15] Michael Barbaro, Tom Zeller, and Saul Hansell. A face is exposed for AOL searcher no. 4417749. New York Times, 9(2008), 2006. [16] Solon Barocas and Andrew D Selbst. Big data’s disparate impact. Cal. L. Rev., 104:671, 2016. [17] Gilles Barthe and Boris Kopf. Information-theoretic bounds for differentially private mechanisms. In Computer Security Foundations Symposium (CSF), 2011 IEEE 24th, pages 191–204. IEEE, 2011. [18] Gilad Baruch, Moran Baruch, and Yoav Goldberg. A little is enough: Circum- venting defenses for distributed learning. In Advances in Neural Information Processing Systems 32, NIPS 2019, December 8-14, 2019, Vancouver, Canada, pages 8632–8642. Curran Associates, Inc., 2019. [19] Yuksel Ozan Basciftci, Ye Wang, and Prakash Ishwar. On privacy-utility tradeoffs for constrained data release mechanisms. In Information Theory and Applications Workshop (ITA), 2016, pages 1–6. IEEE, 2016. [20] Roberto J Bayardo and Rakesh Agrawal. Data privacy through optimal k- anonymization. In Data Engineering, 2005. ICDE 2005. Proceedings. 21st Inter- national Conference on, pages 217–228. IEEE, 2005. [21] Richard Berk, Hoda Heidari, Shahin Jabbari, Michael Kearns, and Aaron Roth. Fairness in criminal justice risk assessments: The state of the art. arXiv preprint arXiv:1703.09207, 2017. 168 [22] Michele Bezzi. An information theoretic approach for privacy metrics. Trans. Data Privacy, 3(3):199–215, 2010. [23] Arjun Nitin Bhagoji, Supriyo Chakraborty, Prateek Mittal, and Seraphin B. Calo. Analyzing federated learning through an adversarial lens. 
In Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA, pages 634–643, 2019. [24] Dan Biddle. Adverse Impact and Test Validation: A Practitioner’s Guide to Valid and Defensible Employment Testing. Gower Publishing, Ltd., 2 edition, 2006. [25] Battista Biggio, Blaine Nelson, and Pavel Laskov. Poisoning attacks against sup- port vector machines. In Proceedings of the 29th International Conference on Machine Learning, ICML 2012, Edinburgh, Scotland, UK, June 26 - July 1, 2012, 2012. [26] M. Bilal Zafar, I. Valera, M. Gomez Rodriguez, and K. P. Gummadi. Learning Fair Classifiers. ArXiv e-prints, July 2015. [27] Peva Blanchard, El Mahdi El Mhamdi, Rachid Guerraoui, and Julien Stainer. Ma- chine learning with adversaries: Byzantine tolerant gradient descent. In Advances in Neural Information Processing Systems 30, NIPS 2017, 4-9 December 2017, Long Beach, CA, USA, pages 119–129, 2017. [28] Keith Bonawitz, Vladimir Ivanov, Ben Kreuter, Antonio Marcedone, H. Brendan McMahan, Sarvar Patel, Daniel Ramage, Aaron Segal, and Karn Seth. Practical secure aggregation for privacy preserving machine learning. IACR Cryptology ePrint Archive, 2017:281, 2017. [29] Stephen Boyd, Lin Xiao, Almir Mutapcic, and Jacob Mattingley. Notes on de- composition methods. Notes for EE364B, Stanford University, pages 1–36, 2007. [30] Ruth Brand. Microdata protection through noise addition. In Inference control in statistical databases, pages 97–116. Springer, 2002. [31] Leo Breiman. Random forests. Machine learning, 45(1):5–32, 2001. [32] Hai Brenner and Kobbi Nissim. Impossibility of differentially private universally optimal mechanisms. SIAM Journal on Computing, 43(5):1513–1540, 2014. [33] Ji-Won Byun, Ashish Kamra, Elisa Bertino, and Ninghui Li. Efficient k- anonymization using clustering techniques. In International Conference on Database Systems for Advanced Applications, pages 188–200. Springer, 2007. 169 [34] Flavio P Calmon, Ali Makhdoumi, and Muriel M´ edard. Fundamental limits of perfect privacy. In Information Theory (ISIT), 2015 IEEE International Sympo- sium on, pages 1796–1800. IEEE, 2015. [35] Jianneng Cao and Panagiotis Karras. Publishing microdata with a robust privacy guarantee. Proceedings of the VLDB Endowment, 5(11):1388–1399, 2012. [36] Chien-Lun Chen, Ranjan Pal, and Leana Golubchik. Oblivious mechanisms in differential privacy: experiments, conjectures, and open questions. In Security and Privacy Workshops (SPW), 2016 IEEE, pages 41–48. IEEE, 2016. [37] Fei Chen, Zhenhua Dong, Zhenguo Li, and Xiuqiang He. Federated meta- learning for recommendation. CoRR, abs/1802.07876, 2018. [38] Xinyun Chen, Chang Liu, Bo Li, Kimberly Lu, and Dawn Song. Targeted backdoor attacks on deep learning systems using data poisoning. CoRR, abs/1712.05526, 2017. [39] Yudong Chen, Lili Su, and Jiaming Xu. Distributed statistical machine learning in adversarial settings: Byzantine gradient descent. In Proceedings of the 2018 ACM International Conference on Measurement and Modeling of Computer Systems, SIGMETRICS 2018, Irvine, CA, USA, June 18-22, 2018, page 96, 2018. [40] Mark Collier and J¨ oran Beel. Implementing neural turing machines. In 27th International Conference on Artificial Neural Networks, ICANN 2018, Rhodes, Greece, October 4-7, 2018, Proceedings, Part III, pages 94–104, 2018. [41] Sam Corbett-Davies, Emma Pierson, Avi Feller, Sharad Goel, and Aziz Huq. Algorithmic decision making and the cost of fairness. 
arXiv preprint arXiv:1701.08230, 2017. [42] Lawrence H Cox. Suppression methodology and statistical disclosure control. Journal of the American Statistical Association, 75(370):377–385, 1980. [43] Tore Dalenius. Finding a needle in a haystack or identifying anonymous census records. Journal of official statistics, 2(3):329, 1986. [44] A. Datta, S. Sen, and Y . Zick. Algorithmic transparency via quantitative input influence: Theory and experiments with learning systems. In 2016 IEEE Sympo- sium on Security and Privacy (SP), pages 598–617, 2016. [45] Amit Datta, Michael Carl Tschantz, and Anupam Datta. Automated experi- ments on ad privacy settings. Proceedings on Privacy Enhancing Technologies, 2015(1):92–112, 2015. 170 [46] Anindya De. Lower bounds in differential privacy. In Theory of Cryptography Conference, pages 321–338. Springer, 2012. [47] Yves-Alexandre De Montjoye, C´ esar A Hidalgo, Michel Verleysen, and Vin- cent D Blondel. Unique in the crowd: The privacy bounds of human mobility. Scientific reports, 3:1376, 2013. [48] Dua Dheeru and Efi Karra Taniskidou. UCI machine learning repository, 2017. [49] Irit Dinur and Kobbi Nissim. Revealing information while preserving privacy. In Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pages 202–210. ACM, 2003. [50] Josep Domingo-Ferrer and Vicenc ¸ Torra. A critique of k-anonymity and some of its enhancements. In Availability, Reliability and Security, 2008. ARES 08. Third International Conference on, pages 990–993. IEEE, 2008. [51] Fl´ avio du Pin Calmon and Nadia Fawaz. Privacy against statistical inference. In Communication, Control, and Computing (Allerton), 2012 50th Annual Allerton Conference on, pages 1401–1408. IEEE, 2012. [52] Flavio du Pin Calmon, Ali Makhdoumi, Muriel M´ edard, Mayank Varia, Mark Christiansen, and Ken R Duffy. Principal inertia components and applications. IEEE Transactions on Information Theory, 63(8):5011–5038, 2017. [53] Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard Zemel. Fairness through awareness. In Proceedings of the 3rd Innovations in Theoretical Computer Science Conference, ITCS ’12, pages 214–226, New York, NY , USA, 2012. ACM. [54] Cynthia Dwork, Krishnaram Kenthapadi, Frank McSherry, Ilya Mironov, and Moni Naor. Our data, ourselves: Privacy via distributed noise generation. In An- nual International Conference on the Theory and Applications of Cryptographic Techniques, pages 486–503. Springer, 2006. [55] Cynthia Dwork and Jing Lei. Differential privacy and robust statistics. In Pro- ceedings of the forty-first annual ACM symposium on Theory of computing, pages 371–380. ACM, 2009. [56] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography Confer- ence, pages 265–284. Springer, 2006. [57] Cynthia Dwork and Kobbi Nissim. Privacy-preserving datamining on vertically partitioned databases. In Annual International Cryptology Conference, pages 528–544. Springer, 2004. 171 [58] Cynthia Dwork, Aaron Roth, et al. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, 9(3–4):211– 407, 2014. [59] Cynthia Dwork and Guy N Rothblum. Concentrated differential privacy. arXiv preprint arXiv:1603.01887, 2016. [60] Cynthia Dwork, Guy N Rothblum, and Salil Vadhan. Boosting and differential privacy. In Foundations of Computer Science (FOCS), 2010 51st Annual IEEE Symposium on, pages 51–60. 
IEEE, 2010. [61] Lilian Edwards and Michael Veale. Slave to the algorithm: Why a right to an explanation is probably not the remedy you are looking for. Duke L. & Tech. Rev., 16:18, 2017. [62] Michael D Ekstrand, Rezvan Joshaghani, and Hoda Mehrpouyan. Privacy for all: Ensuring fair and equitable privacy protections. In Conference on Fairness, Accountability and Transparency, pages 35–47, 2018. [63] Barbara Espinoza and Geoffrey Smith. Min-entropy as a resource. Information and Computation, 226:57–75, 2013. [64] Alexandre Evfimievski, Johannes Gehrke, and Ramakrishnan Srikant. Limiting privacy breaches in privacy preserving data mining. In Proceedings of the twenty- second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pages 211–222. ACM, 2003. [65] Michael Feldman, Sorelle A Friedler, John Moeller, Carlos Scheidegger, and Suresh Venkatasubramanian. Certifying and removing disparate impact. In Pro- ceedings of the 21th ACM SIGKDD International Conference on Knowledge Dis- covery and Data Mining, pages 259–268. ACM, 2015. [66] Chelsea Finn, Pieter Abbeel, and Sergey Levine. Model-agnostic meta-learning for fast adaptation of deep networks. In Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017, pages 1126–1135, 2017. [67] Aaron Fisher, Cynthia Rudin, and Francesca Dominici. Model class reliance: Variable importance measures for any machine learning model class, from the” rashomon” perspective. arXiv preprint arXiv:1801.01489, 2018. [68] Matt Fredrikson, Somesh Jha, and Thomas Ristenpart. Model inversion attacks that exploit confidence information and basic countermeasures. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pages 1322–1333. ACM, 2015. 172 [69] Matthew Fredrikson, Eric Lantz, Somesh Jha, Simon Lin, David Page, and Thomas Ristenpart. Privacy in pharmacogenetics: An end-to-end case study of personalized warfarin dosing. In USENIX Security Symposium, pages 17–32, 2014. [70] Jerome H Friedman. Greedy function approximation: a gradient boosting ma- chine. Annals of statistics, pages 1189–1232, 2001. [71] Jerome H Friedman, Bogdan E Popescu, et al. Predictive learning via rule en- sembles. The Annals of Applied Statistics, 2(3):916–954, 2008. [72] Benjamin CM Fung, Ke Wang, and S Yu Philip. Anonymizing classification data for privacy preservation. IEEE Transactions on Knowledge and Data Engineer- ing, 19(5), 2007. [73] Benjamin CM Fung, Ke Wang, and Philip S Yu. Top-down specialization for information and privacy preservation. In Data Engineering, 2005. ICDE 2005. Proceedings. 21st International Conference on, pages 205–216. IEEE, 2005. [74] Clement Fung, Chris J. M. Yoon, and Ivan Beschastnikh. Mitigating sybils in federated learning poisoning. CoRR, abs/1808.04866, 2018. [75] Quan Geng, Peter Kairouz, Sewoong Oh, and Pramod Viswanath. The staircase mechanism in differential privacy. IEEE Journal of Selected Topics in Signal Processing, 9(7):1176–1184, 2015. [76] Quan Geng and Pramod Viswanath. The optimal mechanism in differential pri- vacy. In Information Theory (ISIT), 2014 IEEE International Symposium on, pages 2371–2375. IEEE, 2014. [77] Craig Gentry and Dan Boneh. A fully homomorphic encryption scheme, vol- ume 20. Stanford University Stanford, 2009. [78] Gabriel Ghinita, Panagiotis Karras, Panos Kalnis, and Nikos Mamoulis. A frame- work for efficient data anonymization under privacy and accuracy constraints. 
ACM Transactions on Database Systems (TODS), 34(2):9, 2009. [79] Arpita Ghosh, Tim Roughgarden, and Mukund Sundararajan. Universally utility- maximizing privacy mechanisms. SIAM Journal on Computing, 41(6):1673– 1693, 2012. [80] Ran Gilad-Bachrach, Nathan Dowlin, Kim Laine, Kristin Lauter, Michael Naehrig, and John Wernsing. Cryptonets: Applying neural networks to encrypted data with high throughput and accuracy. In International Conference on Machine Learning, pages 201–210, 2016. 173 [81] Aristides Gionis and Tamir Tassa. k-anonymization with minimal loss of informa- tion. IEEE Transactions on Knowledge and Data Engineering, 21(2):206–219, 2009. [82] Alex Goldstein, Adam Kapelner, Justin Bleich, and Emil Pitkin. Peeking inside the black box: Visualizing statistical learning with plots of individual conditional expectation. Journal of Computational and Graphical Statistics, 24(1):44–65, 2015. [83] Bryce Goodman and Seth Flaxman. European union regulations on algorithmic decision-making and a ”right to explanation”. arXiv preprint arXiv:1606.08813, 2016. [84] Bernard G Greenberg, Abdel-Latif A Abul-Ela, Walt R Simmons, and Daniel G Horvitz. The unrelated question randomized response model: Theoretical frame- work. Journal of the American Statistical Association, 64(326):520–539, 1969. [85] Brandon M Greenwell, Bradley C Boehmke, and Andrew J McCarthy. A sim- ple and effective model-based variable importance measure. arXiv preprint arXiv:1805.04755, 2018. [86] Tianyu Gu, Brendan Dolan-Gavitt, and Siddharth Garg. Badnets: Identifying vul- nerabilities in the machine learning model supply chain. CoRR, abs/1708.06733, 2017. [87] Riccardo Guidotti, Anna Monreale, Salvatore Ruggieri, Franco Turini, Fosca Gi- annotti, and Dino Pedreschi. A survey of methods for explaining black box mod- els. ACM Computing Surveys (CSUR), 51(5):93, 2018. [88] Mangesh Gupte and Mukund Sundararajan. Universally optimal privacy mech- anisms for minimax agents. In Proceedings of the twenty-ninth ACM SIGMOD- SIGACT-SIGART symposium on Principles of database systems, pages 135–146. ACM, 2010. [89] Melissa Gymrek, Amy L McGuire, David Golan, Eran Halperin, and Yaniv Erlich. Identifying personal genomes by surname inference. Science, 339(6117):321–324, 2013. [90] Moritz Hardt, Eric Price, and Nathan Srebro. Equality of opportunity in super- vised learning. In Advances in Neural Information Processing Systems, pages 3315–3323, 2016. [91] Moritz Hardt and Kunal Talwar. On the geometry of differential privacy. In Proceedings of the forty-second ACM symposium on Theory of computing, pages 705–714. ACM, 2010. 174 [92] Harold V Henderson and Shayle R Searle. On deriving the inverse of a sum of matrices. Siam Review, 23(1):53–60, 1981. [93] Giles Hooker. Discovering additive structure in black box functions. In Proceed- ings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 575–580. ACM, 2004. [94] Shi-Yu Huang. Intelligent decision support: handbook of applications and ad- vances of the rough sets theory, volume 11. Springer Science & Business Media, 1992. [95] Vijay S Iyengar. Transforming data to satisfy privacy constraints. In Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 279–288. ACM, 2002. [96] Peter Kairouz, Sewoong Oh, and Pramod Viswanath. The composition theorem for differential privacy. IEEE Transactions on Information Theory, 63(6):4037– 4049, 2017. [97] Faisal Kamiran, Indr˙ e ˇ Zliobait˙ e, and Toon Calders. 
Quantifying explainable dis- crimination and removing illegal discrimination in automated decision making. Knowledge and Information Systems, 35(3):613–644, 2013. [98] Toshihiro Kamishima, Shotaro Akaho, Hideki Asoh, and Jun Sakuma. Fairness- aware classifier with prejudice remover regularizer. Machine Learning and Knowledge Discovery in Databases, pages 35–50, 2012. [99] Shiva P Kasiviswanathan and Adam Smith. On the’semantics’ of differential privacy: A bayesian formulation. Journal of Privacy and Confidentiality, 6(1):1, 2014. [100] Daniel Kifer and Johannes Gehrke. Injecting utility into anonymized datasets. In Proceedings of the 2006 ACM SIGMOD international conference on Manage- ment of data, pages 217–228. ACM, 2006. [101] Pang Wei Koh and Percy Liang. Understanding black-box predictions via influ- ence functions. In Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017, pages 1885– 1894, 2017. [102] Igor Kononenko et al. An efficient explanation of individual classifications using game theory. Journal of Machine Learning Research, 11(Jan):1–18, 2010. [103] Sanjay Krishnan and Eugene Wu. PALM: Machine learning explanations for iterative debugging. In Proceedings of the 2nd Workshop on Human-In-the-Loop Data Analytics, page 4. ACM, 2017. 175 [104] Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. Technical report, Citeseer, 2009. [105] Martin Kuˇ cera, Petar Tsankov, Timon Gehr, Marco Guarnieri, and Martin Vechev. Synthesis of probabilistic privacy enforcement. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pages 391–408. ACM, 2017. [106] Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533, 2016. [107] Matt J Kusner, Joshua Loftus, Chris Russell, and Ricardo Silva. Counterfactual fairness. In Advances in Neural Information Processing Systems, pages 4066– 4076, 2017. [108] Brenden M. Lake, Ruslan Salakhutdinov, and Joshua B. Tenenbaum. Human- level concept learning through probabilistic program induction. Science, 350(6266):1332–1338, 2015. [109] Kristen LeFevre, David J DeWitt, and Raghu Ramakrishnan. Mondrian multidi- mensional k-anonymity. In Data Engineering, 2006. ICDE’06. Proceedings of the 22nd International Conference on, pages 25–25. IEEE, 2006. [110] Ninghui Li, Tiancheng Li, and Suresh Venkatasubramanian. t-closeness: Privacy beyond k-anonymity and l-diversity. In Data Engineering, 2007. ICDE 2007. IEEE 23rd International Conference on, pages 106–115. IEEE, 2007. [111] Tiancheng Li and Ninghui Li. On the tradeoff between privacy and utility in data publishing. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 517–526. ACM, 2009. [112] Thang Luong, Hieu Pham, and Christopher D. Manning. Effective approaches to attention-based neural machine translation. In Proceedings of the 2015 Con- ference on Empirical Methods in Natural Language Processing, EMNLP 2015, Lisbon, Portugal, September 17-21, 2015, pages 1412–1421, 2015. [113] Ashwin Machanavajjhala, Johannes Gehrke, Daniel Kifer, and Muthuramakrish- nan Venkitasubramaniam. l-diversity: Privacy beyond k-anonymity. In Data Engineering, 2006. ICDE’06. Proceedings of the 22nd International Conference on, pages 24–24. IEEE, 2006. 
[114] Ashwin Machanavajjhala, Daniel Kifer, Johannes Gehrke, and Muthuramakrish- nan Venkitasubramaniam.l-diversity: Privacy beyondk-anonymity. ACM Trans- actions on Knowledge Discovery from Data (TKDD), 1(1):3, 2007. 176 [115] Ali Makhdoumi and Nadia Fawaz. Privacy-utility tradeoff under statistical uncer- tainty. In Communication, Control, and Computing (Allerton), 2013 51st Annual Allerton Conference on, pages 1627–1634. IEEE, 2013. [116] Ali Makhdoumi, Salman Salamatian, Nadia Fawaz, and Muriel M´ edard. From the information bottleneck to the privacy funnel. In Information Theory Workshop (ITW), 2014 IEEE, pages 501–505. IEEE, 2014. [117] Andrew McGregor, Ilya Mironov, Toniann Pitassi, Omer Reingold, Kunal Talwar, and Salil Vadhan. The limits of two-party differential privacy. In Foundations of Computer Science (FOCS), 2010 51st Annual IEEE Symposium on, pages 81–90. IEEE, 2010. [118] Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Ag¨ uera y Arcas. Communication-efficient learning of deep networks from decentralized data. In Proceedings of the 20th International Conference on Ar- tificial Intelligence and Statistics, AISTATS 2017, 20-22 April 2017, Fort Laud- erdale, FL, USA, pages 1273–1282, 2017. [119] Frank McSherry and Kunal Talwar. Mechanism design via differential privacy. In Foundations of Computer Science, 2007. FOCS’07. 48th Annual IEEE Sympo- sium on, pages 94–103. IEEE, 2007. [120] Adam Meyerson and Ryan Williams. On the complexity of optimal k-anonymity. In Proceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pages 223–228. ACM, 2004. [121] El Mahdi El Mhamdi, Rachid Guerraoui, and S´ ebastien Rouault. The hidden vulnerability of distributed learning in byzantium. In Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsm¨ assan, Stockholm, Sweden, July 10-15, 2018, pages 3518–3527, 2018. [122] Darakhshan J Mir. Information-theoretic foundations of differential privacy. In International Symposium on Foundations and Practice of Security, pages 374– 381. Springer, 2012. [123] S Alvim M’rio, Kostas Chatzikokolakis, Catuscia Palamidessi, and Geoffrey Smith. Measuring information leakage using generalized gain functions. In 2012 IEEE 25th Computer Security Foundations Symposium, pages 265–279. IEEE, 2012. [124] Arvind Narayanan and Vitaly Shmatikov. Robust de-anonymization of large sparse datasets. In Security and Privacy, 2008. SP 2008. IEEE Symposium on, pages 111–125. IEEE, 2008. 177 [125] M Ercan Nergiz and Chris Clifton. Thoughts on k-anonymization. Data & Knowledge Engineering, 63(3):622–645, 2007. [126] Alex Nichol, Joshua Achiam, and John Schulman. On first-order meta-learning algorithms. CoRR, abs/1803.02999, 2018. [127] Kobbi Nissim, Sofya Raskhodnikova, and Adam Smith. Smooth sensitivity and sampling in private data analysis. In Proceedings of the thirty-ninth annual ACM symposium on Theory of computing, pages 75–84. ACM, 2007. [128] Sewoong Oh and Pramod Viswanath. The composition theorem for differential privacy. Technical report, Technical Report, 2013. [129] Nicolas Papernot, Patrick McDaniel, Arunesh Sinha, and Michael Wellman. To- wards the science of security and privacy in machine learning. arXiv preprint arXiv:1611.03814, 2016. [130] Andrea Paudice, Luis Mu˜ noz-Gonz´ alez, Andr´ as Gy¨ orgy, and Emil C. Lupu. De- tection of adversarial training examples in poisoning attacks through anomaly detection. CoRR, abs/1802.03041, 2018. 
[131] Le Trieu Phong, Yoshinori Aono, Takuya Hayashi, Lihua Wang, and Shiho Mo- riai. Privacy-preserving deep learning via additively homomorphic encryption. IEEE Trans. Information Forensics and Security, 13(5):1333–1345, 2018. [132] Mingda Qiao and Gregory Valiant. Learning discrete distributions from untrusted batches. In 9th Innovations in Theoretical Computer Science Conference, ITCS 2018, January 11-14, 2018, Cambridge, MA, USA, pages 47:1–47:20, 2018. [133] David Rebollo-Monedero, Jordi Forne, and Josep Domingo-Ferrer. From t- closeness-like privacy to postrandomization via information theory. IEEE Trans- actions on Knowledge and Data Engineering, 22(11):1623–1636, 2010. [134] Steven P Reiss. Practical data-swapping: The first steps. ACM Transactions on Database Systems (TODS), 9(1):20–37, 1984. [135] Steven P Reiss, Mark J Post, and Tore Dalenius. Non-reversible privacy trans- formations. In Proceedings of the 1st ACM SIGACT-SIGMOD symposium on Principles of database systems, pages 139–146. ACM, 1982. [136] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. “Why should i trust you?”: Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data min- ing, pages 1135–1144. ACM, 2016. 178 [137] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. Anchors: High- precision model-agnostic explanations. In AAAI Conference on Artificial Intelli- gence, 2018. [138] Ronald L Rivest, Len Adleman, and Michael L Dertouzos. On data banks and privacy homomorphisms. Foundations of secure computation, 4(11):169–180, 1978. [139] Salman Salamatian, Amy Zhang, Flavio du Pin Calmon, Sandilya Bhamidipati, Nadia Fawaz, Branislav Kveton, Pedro Oliveira, and Nina Taft. How to hide the elephant-or the donkey-in the room: Practical privacy against statistical inference for large data. In GlobalSIP, pages 269–272, 2013. [140] Salman Salamatian, Amy Zhang, Flavio du Pin Calmon, Sandilya Bhamidipati, Nadia Fawaz, Branislav Kveton, Pedro Oliveira, and Nina Taft. Managing your private and public data: Bringing down inference attacks against your privacy. IEEE Journal of Selected Topics in Signal Processing, 9(7):1240–1255, 2015. [141] Pierangela Samarati. Protecting respondents identities in microdata release. IEEE transactions on Knowledge and Data Engineering, 13(6):1010–1027, 2001. [142] Pierangela Samarati and Latanya Sweeney. Generalizing data to provide anonymity when disclosing information. In PODS, volume 98, page 188. Cite- seer, 1998. [143] Pierangela Samarati and Latanya Sweeney. Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and sup- pression. Technical report, Technical report, SRI International, 1998. [144] Wojciech Samek, Thomas Wiegand, and Klaus-Robert M¨ uller. Explainable ar- tificial intelligence: Understanding, visualizing and interpreting deep learning models. arXiv preprint arXiv:1708.08296, 2017. [145] Lalitha Sankar, S Raj Rajagopalan, and H Vincent Poor. An information-theoretic approach to privacy. In Communication, Control, and Computing (Allerton), 2010 48th Annual Allerton Conference on, pages 1220–1227. IEEE, 2010. [146] Lalitha Sankar, S Raj Rajagopalan, and H Vincent Poor. A theory of utility and privacy of data sources. In Information Theory Proceedings (ISIT), 2010 IEEE International Symposium on, pages 2642–2646. IEEE, 2010. [147] Lalitha Sankar, S Raj Rajagopalan, and H Vincent Poor. 
Utility and privacy of data sources: Can shannon help conceal and reveal information? In Information Theory and Applications Workshop (ITA), 2010, pages 1–7. IEEE, 2010. 179 [148] Lalitha Sankar, S Raj Rajagopalan, and H Vincent Poor. A theory of privacy and utility in databases. under revision, IEEE Trans. Inform. Forensics and Security, Special Issue on Privacy in Cloud Management Systems, 2011. [149] Lalitha Sankar, S Raj Rajagopalan, and H Vincent Poor. Utility-privacy tradeoffs in databases: An information-theoretic approach. IEEE Transactions on Infor- mation Forensics and Security, 8(6):838–852, 2013. [150] Adam Santoro, Sergey Bartunov, Matthew Botvinick, Daan Wierstra, and Tim- othy P. Lillicrap. Meta-learning with memory-augmented neural networks. In Proceedings of the 33nd International Conference on Machine Learning, ICML 2016, New York City, NY, USA, June 19-24, 2016, pages 1842–1850, 2016. [151] Anand D Sarwate and Lalitha Sankar. A rate-disortion perspective on local dif- ferential privacy. In Communication, Control, and Computing (Allerton), 2014 52nd Annual Allerton Conference on, pages 903–908. IEEE, 2014. [152] Shiqi Shen, Shruti Tople, and Prateek Saxena. Auror: defending against poison- ing attacks in collaborative deep learning systems. In Proceedings of the 32nd Annual Conference on Computer Security Applications, ACSAC 2016, Los Ange- les, CA, USA, December 5-9, 2016, pages 508–519, 2016. [153] Reza Shokri and Vitaly Shmatikov. Privacy-preserving deep learning. In Pro- ceedings of the 22nd ACM SIGSAC conference on computer and communications security, pages 1310–1321. ACM, 2015. [154] Reza Shokri, Marco Stronati, and Vitaly Shmatikov. Membership inference at- tacks against machine learning models. arXiv preprint arXiv:1610.05820, 2016. [155] Jake Snell, Kevin Swersky, and Richard S. Zemel. Prototypical networks for few- shot learning. In Advances in Neural Information Processing Systems 30, NIPS 2017, 4-9 December 2017, Long Beach, CA, USA, pages 4077–4087, 2017. [156] Jacob Steinhardt, Pang Wei Koh, and Percy Liang. Certified defenses for data poisoning attacks. In Advances in Neural Information Processing Systems 30, NIPS 2017, 4-9 December 2017, Long Beach, CA, USA, pages 3517–3529, 2017. [157] Stephen M Stigler. Studies in the history of probability and statistics. xxxii laplace, fisher, and the discovery of the concept of sufficiency. Biometrika, 60(3):439–445, 1973. [158] Flood Sung, Yongxin Yang, Li Zhang, Tao Xiang, Philip H. S. Torr, and Timo- thy M. Hospedales. Learning to compare: Relation network for few-shot learning. In 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pages 1199–1208, 2018. 180 [159] Latanya Sweeney. Weaving technology and policy together to maintain confiden- tiality. The Journal of Law, Medicine & Ethics, 25(2-3):98–110, 1997. [160] Latanya Sweeney. Datafly: A system for providing anonymity in medical data. In Database Security XI, pages 356–381. Springer, 1998. [161] Latanya Sweeney. Simple demographics often identify people uniquely. Health (San Francisco), 671:1–34, 2000. [162] Latanya Sweeney. Achieving k-anonymity privacy protection using general- ization and suppression. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 10(05):571–588, 2002. [163] Latanya Sweeney. k-anonymity: A model for protecting privacy. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 10(05):557– 570, 2002. 
[164] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Er- han, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013. [165] Naftali Tishby, Fernando C Pereira, and William Bialek. The information bottle- neck method. arXiv preprint physics/0004057, 2000. [166] Antonio Torralba and Alexei A Efros. Unbiased look at dataset bias. In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, pages 1521– 1528. IEEE, 2011. [167] Florian Tram` er, Fan Zhang, Ari Juels, Michael K Reiter, and Thomas Risten- part. Stealing machine learning models via prediction APIs. In USENIX Security Symposium, pages 601–618, 2016. [168] Brandon Tran, Jerry Li, and Aleksander Madry. Spectral signatures in backdoor attacks. In Advances in Neural Information Processing Systems 31, NIPS 2018, 3-8 December 2018, Montr´ eal, Canada, pages 8011–8021, 2018. [169] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in Neural Information Processing Systems 30, NIPS 2017, 4-9 De- cember 2017, Long Beach, CA, USA, pages 5998–6008, 2017. [170] Staal A Vinterbo. Privacy: a machine learning view. IEEE Transactions on knowledge and data engineering, 16(8):939–948, 2004. 181 [171] Oriol Vinyals, Charles Blundell, Tim Lillicrap, Koray Kavukcuoglu, and Daan Wierstra. Matching networks for one shot learning. In Advances in Neural In- formation Processing Systems 29, NIPS 2016, December 5-10, 2016, Barcelona, Spain, pages 3630–3638, 2016. [172] Poorvi L V ora. An information-theoretic approach to inference attacks on random data perturbation and a related privacy measure. IEEE transactions on informa- tion theory, 53(8):2971–2977, 2007. [173] Nick Wallace. EU’s right to explanation: A harmful restriction on artificial intel- ligence. https://goo.gl/VAetfY, 2017. Accessed: 2018-09-24. [174] K Wang, R Chen, BC Fung, and PS Yu. Privacy-preserving data publishing: A survey on recent developments. ACM Computing Surveys, 2010. [175] Ke Wang and Benjamin Fung. Anonymizing sequential releases. In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 414–423. ACM, 2006. [176] Ke Wang, Benjamin CM Fung, and Guozhu Dong. Integrating private databases for data analysis. In International Conference on Intelligence and Security Infor- matics, pages 171–182. Springer, 2005. [177] Ke Wang, Benjamin CM Fung, and S Yu Philip. Handicapping attacker’s confi- dence: an alternative tok-anonymization. Knowledge and Information Systems, 11(3):345–368, 2007. [178] Ke Wang, Benjamin CM Fung, and Philip S Yu. Template-based privacy preser- vation in classification problems. In Data Mining, Fifth IEEE International Con- ference on, pages 8–pp. IEEE, 2005. [179] Ke Wang, Philip S Yu, and Sourav Chakraborty. Bottom-up generalization: A data mining solution to privacy protection. In Data Mining, 2004. ICDM’04. Fourth IEEE International Conference on, pages 249–256. IEEE, 2004. [180] Weina Wang, Lei Ying, and Junshan Zhang. On the relation between identifiabil- ity, differential privacy, and mutual-information privacy. IEEE Transactions on Information Theory, 62(9):5018–5029, 2016. [181] Stanley L Warner. Randomized response: A survey technique for eliminating evasive answer bias. Journal of the American Statistical Association, 60(309):63– 69, 1965. [182] Xiaokui Xiao and Yufei Tao. 
Personalized privacy preservation. In Proceedings of the 2006 ACM SIGMOD international conference on Management of data, pages 229–240. ACM, 2006. 182 [183] Cong Xie, Oluwasanmi Koyejo, and Indranil Gupta. Generalized byzantine- tolerant SGD. CoRR, abs/1802.10116, 2018. [184] Jian Xu, Wei Wang, Jian Pei, Xiaoyuan Wang, Baile Shi, and Ada Wai-Chee Fu. Utility-based anonymization using local recoding. In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 785–790. ACM, 2006. [185] Dong Yin, Yudong Chen, Kannan Ramchandran, and Peter L. Bartlett. Byzantine-robust distributed learning: Towards optimal statistical rates. In Pro- ceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsm¨ assan, Stockholm, Sweden, July 10-15, 2018, pages 5636–5645, 2018. [186] Jason Yosinski, Jeff Clune, Yoshua Bengio, and Hod Lipson. How transfer- able are features in deep neural networks? In Advances in Neural Information Processing Systems 27, NIPS 2014, December 8-13 2014, Montreal, Quebec, Canada, pages 3320–3328, 2014. [187] Rich Zemel, Yu Wu, Kevin Swersky, Toni Pitassi, and Cynthia Dwork. Learn- ing fair representations. In Proceedings of the 30th International Conference on Machine Learning (ICML-13), pages 325–333, 2013. [188] Alexander Zien, Nicole Kr¨ amer, S¨ oren Sonnenburg, and Gunnar R¨ atsch. The feature importance ranking measure. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 694–709. Springer, 2009. 183
Abstract
As cloud and machine-learning services become more ubiquitous, it is becoming easier for service providers to access, collect, analyze, and store a wide variety of personal information in order to provide more accurate and personalized services. Despite the convenience of these services, the increasing incidence of malicious attacks on machine-learning-based applications and of privacy leakage from collected user data has raised growing concern about how service providers gather, store, and analyze users' private information.

Motivated by this, in this thesis we focus on the following problems. We first study security and privacy problems brought on by malicious attacks in a particular cooperative machine learning framework, known as federated learning, in which, motivated by privacy preservation, users jointly train a machine learning model without sharing their private data. In such a context, however, a malicious participant can attack the jointly trained model. Existing defense methods have several deficiencies and introduce privacy risks, since all of them rely on a third party to examine users' updates. We next explore unintentional privacy leakage, where sensitive information about a user could be (unintentionally) leaked from, among other sources, (i) query outputs of a database containing user information, (ii) a released anonymized dataset, or (iii) an algorithmic transparency report releasing details of underlying algorithmic decisions based on user information. We investigate privacy-preserving mechanisms in the above contexts, focusing on two common privacy-preserving paradigms: privacy-preserving database-mining (PPDM) and privacy-preserving database-publishing (PPDP).

Specifically, for our first problem, we experimentally explore poisoning backdoor attacks in the context of federated meta-learning, an important problem that has not been well studied in the literature. Our first finding is that poisoning backdoor attacks in federated meta-learning are long-lasting: a one-shot attack can persist and influence a meta model for tens of rounds of federated learning during normal training. Moreover, we found that poisoning backdoor attacks cannot be removed by standard fine-tuning during meta-testing, the stage in which a meta model adapts to new tasks before evaluation; our results show that the attack influence diminishes only very slightly even after hundreds of epochs of fine-tuning. These findings demonstrate the difficulty of removing poisoning backdoor attacks in federated meta-learning through regular training and fine-tuning.