Robust interpretable machine learning on data manifold via feature interaction using Shapley framework and quadtree masking
Robust Interpretable Machine Learning on Data Manifold via Feature Interaction Using Shapley Framework and Quad-tree Masking

by Md Sk Abid Hassan

A Dissertation Presented to the Faculty of the Graduate School, University of Southern California, in Partial Fulfillment of the Requirements for the Degree Master of Science (Computer Science), May 2023. Copyright 2023 Md Sk Abid Hassan.

[Scanned USC Graduate School committee appointment form: student Md Sk Abid Hassan, Viterbi School of Engineering, Computer Science; committee chair Yan Liu (Professor, Computer Science); members Ram Nevatia (Professor, Computer Science), Saty Raghavachary (Associate Professor, Computer Science), and Robin Jia (Assistant Professor, Computer Science); approved by Nenad Medvidovic (Department Chair or Program Director) and Yannis C. Yortsos (Dean), December 2022.]

Dedication

I dedicate this thesis to my beloved mother, siblings, and heavenly father. Your love and sacrifice enabled me to excel.

Acknowledgements

I want to express my sincerest gratitude to the many supporters of my journey. It has been a truly challenging journey, and I could not have been successful without your help; I will always be indebted to you. I want to start by thanking my elder brother and mother, who selflessly sacrificed their comfort so that I could afford an education. This thesis and master's degree result from their constant support at each step of my journey. They taught me to be calm and persistent even in adverse times, which has positively impacted my research progress.
I want to extend my gratitude to my thesis advisor, Prof. Yan Liu, and my mentor, James Enouen. It has been an incredible experience under your guidance, and I am very grateful that we found strong cooperation between your advisement style and my research personality. I particularly enjoyed the flexibility of choosing which problems and application domains to work on. I would also like to thank my committee members, Prof. Ram Nevatia, Prof. Robin Jia, and Prof. Saty Raghavachary, for their insights and guidance. Finally, I thank my family and friends for their unconditional love and support; this work would not have seen the light of day without it.

Table of Contents

Dedication
Acknowledgements
List of Tables
List of Figures
Abstract
Chapter 1: Introduction
  1.1 Interpretability
  1.2 Robustness
  1.3 Feature Interaction
  1.4 Contributions of Research
Chapter 2: Related Works
  2.1 Overview
  2.2 Related Work
  2.3 Applications
Chapter 3: Preliminaries
  3.1 Gaussian Pyramid
  3.2 Quad-tree
  3.3 Shapley Values
    3.3.1 How to Calculate Shapley Values
    3.3.2 Example: Shapley Value Calculation
  3.4 Data Manifold
Chapter 4: Motivations: Linear Models
  4.1 Ridge vs. Lasso Regression
  4.2 Empirical Results
    4.2.1 Model Coefficients as Shapley Values
    4.2.2 Adversarial Plot: Ridge Regression
    4.2.3 Adversarial Plot: Ridge vs. Lasso Regression
    4.2.4 Adversarial Heatmap: Ridge vs. Lasso Regression
  4.3 Conclusion
Chapter 5: Feature Interaction Using Shapley Framework and Quad-tree
  5.1 Experimental Setup
    5.1.1 Gaussian Pyramid on the MNIST Data-set
    5.1.2 Quad-tree Distribution
    5.1.3 Feature Extraction
    5.1.4 CNN Architecture
  5.2 Model Training
    5.2.1 Metric: Accuracy & Loss Curve
    5.2.2 Conclusions
  5.3 Shapley Values as Interpretability
    5.3.1 Shapley Value Calculation
    5.3.2 Shapley Values
  5.4 Robustness
Chapter 6: Summary, Discussion and Future Work
  6.1 Summary
  6.2 Conclusion
  6.3 Future Scope
Bibliography
Appendices
  D Heatmap: Linear Model Parameter
  E Shapley Values

List of Tables

3.1 Marginal Contribution of Player
5.1 Accuracy and Validation Loss
5.2 Accuracy of Masked and Unmasked Models

List of Figures

3.1 Image Pyramid
3.2 Quad-tree Spatial Resolution
3.3 Shapley Values of Each Pixel
3.4 Data Manifold
3.5 Shapley Calculation: Off-manifold Splices
4.1 Estimation Picture for the Lasso (left) and Ridge Regression (right)
4.2 Linear Model Parameter Heatmap
4.3 Adversarial Softmax Probability Plot
4.4 Lasso Adversarial Softmax Probability Plot
4.5 Ridge Adversarial Softmax Probability Plot
4.6 Adversarial Heatmap
5.1 Three-layered Gaussian Image Pyramid
5.2 Gaussian Image Pyramid
5.3 Quad-tree Mask Generator
5.4 CNN Model Architecture
5.5 CNN Model Description
5.6 Model Training Pipeline
5.7 Training Loss Curve
5.8 Prediction Loss Game Shapley Values
5.9 Prediction Game Shapley Values
5.10 Adversarial Digit
5.11 Adversarial Heat-map of Masked Model
5.12 Adversarial Heat-map of Unmasked Model
6.1 Linear Model Parameter Heatmap for Digits 2, 3
6.2 Linear Model Parameter Heatmap for Digits 4, 5
6.3 Linear Model Parameter Heatmap for Digits 6, 7
6.4 Linear Model Parameter Heatmap for Digits 8, 9
6.5 Prediction Game Shapley Values, Digit 0
6.6 Prediction Game Shapley Values, Digit 1
6.7 Prediction Game Shapley Values, Digit 2
6.8 Prediction Game Shapley Values, Digit 3
6.9 Prediction Game Shapley Values, Digit 4
6.10 Prediction Game Shapley Values, Digit 5
6.11 Prediction Game Shapley Values, Digit 6
6.12 Prediction Game Shapley Values, Digit 8
6.13 Prediction Game Shapley Values, Digit 9

Abstract

Neural networks provide state-of-the-art results for most machine learning tasks. Unfortunately, neural networks are not interpretable and are vulnerable to adversarial attacks. The success of neural networks comes from their capability to learn complex feature interactions, yet when a model makes a prediction, for example as an image classifier, it cannot explain why it made that specific decision. State-of-the-art models should communicate their intentions and explain their decision-making process to build trust, foster confidence, and improve human-agent team dynamics. On the other hand, slightly modified and well-crafted adversarial examples can easily fool a well-trained image classifier based on deep neural networks with high confidence. These AI safety concerns about adversarially non-robust systems call for models that are both robust and explainable.

Contextual dependencies between features, known as feature interactions, may better explain model behavior and why certain features are more relevant. In other words, feature interactions can be analyzed to quantify the impact of features on model predictions. The Shapley value is a framework that attributes a model's predictions to its input features in a model-agnostic way. On the other hand, feature simulations via masking are used to build robust on-manifold predictive models. However, for computer vision tasks, recent work lacks the use of image pyramids and quad-tree masking (a quad-tree is a tree data structure in which each internal node has exactly four children) to quantify feature interaction. This thesis leverages this spatial hierarchy to create more explainable and robust models. The work focuses on three dimensions: 1) how to mask features to interpret model behavior on the manifold, 2) Shapley values to summarize each feature's influence and importance, and 3) how masking makes the model robust to perturbation adversarial attacks.

Chapter 1: Introduction

Machine learning transparency calls for robust and interpretable explanations of how inputs relate to predictions, especially for state-of-the-art yet black-box prediction models. The problem of explaining input-output relations belongs to the field of interpretable machine learning, and the issue of building an adversarially robust model pertains to AI safety. This thesis aims to advance robust interpretable machine learning by creating a novel hierarchical feature interaction using quad-tree masking.
Prior to this thesis, model interpretation via feature importance had become a mature topic for state-of-the-art prediction models such as neural networks. This thesis advances the field by laying the groundwork for explaining feature interactions on the data manifold using novel quad-tree masking over a hierarchical representation of the input features. The goal of this thesis is to estimate feature interactions via masking and to assess the interpretability and robustness of predictive models.

1.1 Interpretability

If black-box predictions affect our future, should we trust them? In reality, they already affect us through the recommendations we receive every day for movies, news feeds, food, products, friends, relationships, services, and more. The desire to understand why machine learning models make specific predictions is the essence of model interpretability. Model interpretability is the degree to which a human can consistently predict the model's behavior. At its core, we want to understand why the model makes its decisions, whether classification or regression. This explainability in AI is central to the practical impact of AI on society. AI agents should be able to communicate intentions and explain their decision-making processes to build trust, foster confidence, and improve human-agent team dynamics.

State-of-the-art neural network models struggle to explain why they make certain decisions. In this work, we are interested in studying model behavior and interpretability through feature interaction via masking, using the resulting Shapley values. The explainability of a model is essential, especially in medicine, to foster trust in naive observers.

1.2 Robustness

Addressing the safety and security challenges of complex AI systems is critical to fostering trust in AI. In this context, robustness signifies the ability to withstand or overcome adverse conditions, including digital security risks. Neural networks provide state-of-the-art results for most machine learning tasks. Unfortunately, neural networks are vulnerable to adversarial examples: slightly modified and well-crafted adversarial data samples can easily be generated that fool a well-trained image classifier based on deep neural networks (DNNs) with high confidence.

In this thesis, we are interested in finding novel techniques, based on feature interaction via masking, to build models that are adversarially robust to physical and online attacks.

1.3 Feature Interaction

"When features interact with each other in a prediction model, the prediction cannot be expressed as the sum of the feature effects, because the effect of one feature depends on the value of the other feature." - Christoph Molnar

The complex collaborative effects of features toward predicting a variable are called feature interactions. Another aspect of feature interaction is the variation of one feature with respect to another feature with which it interacts; such variables are often referred to as interaction variables. A feature interaction describes a situation in which the effect of one feature on an outcome depends on the state of a second feature. For example, consider the linear regression model

f(x, y) = w_1 x + w_2 y + w_3 x y,   (1.1)

where x and y are features and the w_i are coefficients. The product xy forms a feature interaction, and the individual terms x and y are the main effects. The coefficients provide information about each term's importance. A feature interaction is thus a non-additive effect of multiple features on an outcome; a small numerical sketch of this is given below.
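To make Equation (1.1) concrete, the following minimal sketch (my own illustration, not code from this thesis; the toy data and coefficient values are assumptions) fits a linear model with and without an explicit xy term and shows that main effects alone cannot capture a genuine interaction.

```python
# Toy illustration of Eq. (1.1): a target with a genuine x*y interaction
# cannot be fit well by main effects alone.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = rng.normal(size=1000)
target = 1.0 * x + 2.0 * y + 3.0 * x * y + rng.normal(scale=0.1, size=1000)

# Design matrices: main effects only vs. main effects plus interaction term
X_main = np.column_stack([x, y, np.ones_like(x)])
X_int = np.column_stack([x, y, x * y, np.ones_like(x)])

for name, X in [("main effects", X_main), ("with interaction", X_int)]:
    w, *_ = np.linalg.lstsq(X, target, rcond=None)   # least-squares fit
    mse = np.mean((X @ w - target) ** 2)
    print(f"{name}: coefficients={np.round(w, 2)}, mse={mse:.3f}")
```

The fit with the interaction term recovers coefficients close to (1, 2, 3) with near-zero error, while the main-effects-only model cannot, which is exactly the non-additivity the chapter describes.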
Let us consider another example to put the physical meaning of interaction in perspective. Say one feature represents "weight" and another represents "height," and the outcome variable is the risk of developing heart disease. The risk depends on your BMI, which is defined as weight/height²; here, (weight, height) is a pair-wise interaction. Feature interaction phenomena exist in many real-world settings where an outcome is modeled as a function of features. Because machine learning is precisely designed to model such functions, state-of-the-art models like neural networks are well suited to learning feature interactions. An example of an image interaction is a combination of image patches corresponding to a dog's ears, nose, and eyes, which together support a "dog" classification. Interaction phenomena can therefore be interpretable in diverse domains.

1.4 Contributions of Research

Contextual dependencies between features, known as feature interactions within the model, may better explain the model and why certain features outrank others in importance. In other words, feature interactions are used to analyze the impact of features on model predictions. The Shapley value is a framework that attributes a model's predictions to its input features in a mathematically principled and model-agnostic way. On the other hand, recent work has brought robustness into the model by leveraging a masked predictor (on the manifold) for computer vision data. On-manifold predictive models are achieved using the principle of simulating feature masking to quantify each feature's influence. However, for computer vision tasks, these techniques have remained insufficient for fully understanding model behavior and making the model robust to adversarial attacks. Although researchers have proposed a wide variety of model explanation and robustness approaches, they lack the use of image pyramids and quad-tree masking to quantify feature interaction.

This thesis leverages this spatial hierarchy (a feature pyramid coupled with quad-tree masking) to create more explainable and robust models. The work focuses on three dimensions:

• how to mask features to interpret model behavior on the data manifold;
• Shapley values to summarize each feature's influence and importance and provide explainability;
• how masking makes the model robust to perturbation adversarial attacks.

Chapter 2: Related Works

2.1 Overview

Interpretable machine learning is essential for leveraging automatic prediction models in high-stakes decisions, scientific discoveries, and model debugging. As machine learning models have advanced over time, they have tended to become more accurate yet less interpretable. We first provide an overview of interpretable machine learning techniques. We then discuss accommodating social-science perspectives on human understanding for interpretable machine learning. While many dichotomies exist within the field - between local and global explanations (Ribeiro et al., 2016) [1], between post-hoc and intrinsic interpretability (Rudin, 2019) [2], and between model-agnostic and model-specific methods (Shrikumar et al., 2017) [3] - in this work, we focus on local, post-hoc, model-agnostic explainability, as it provides insight into individual model predictions, does not limit model expressiveness, and is comparable across model types.
Several paradigms try to explain model behavior: gradient-based approaches (Integrated Gradients, Integrated Hessians, sanity checks for saliency maps), ablation-based approaches (Shapley Taylor Index, Shapley explainability), and prototype-based approaches (Prototypical Part Network). In the ablation-based approach, feature attribution is used to analyze the impact of features on predictions. In the gradient-based approach, the gradient of the output with respect to the input is used as the attribution. In the prototype-based approach, we focus on parts of the image and compare them with prototypical parts of images from a given class.

2.2 Related Work

Interpretation methods have also been developed that treat the model under investigation as a black box, such as permutation feature importance, which shuffles the values of a feature in a data batch and checks the resulting change in the model loss. Although this method was initially proposed for random forests, it can be used on any tabular prediction model. To accommodate any prediction model, a method called Locally Interpretable Model-Agnostic Explanations (LIME) was developed to compute feature attribution via linear regression on feature perturbations and their model inferences (Ribeiro et al., 2016) [1]. Another method, Shapley Additive Explanations (SHAP), enforced stricter additive constraints on attribution scores by modifying LIME's feature perturbation scheme with guidance from Shapley values (Lundberg & Lee, 2017) [4]. Other appropriate methods for attributing predictions to feature interactions are black-box explanation methods based on axioms (or principles), but these methods need to be more interpretable. One of the core issues is that an interaction's importance is not the same as its attribution. Techniques like the Shapley Taylor Interaction Index (STI) (Agarwal et al., 2019) [5] and Integrated Hessians (IH) (Janizek et al., 2020) [6] combine these concepts while being axiomatic. Specifically, they base an interaction's attribution on non-additivity, i.e., the degree to which features non-additively affect an outcome. Neither STI nor IH is tractable for higher-order feature interactions. Hence, there is a need for interpretable, axiomatic, and scalable methods for interaction attribution and the corresponding interaction detection.

For linear models, ML practitioners regularly inspect the products of the model coefficients and the feature values to debug predictions. Gradients (of the output with respect to the input) are a natural analog of the model coefficients for a deep network. Therefore, the product of the gradient and the feature values is a reasonable starting point for an attribution method (Simonyan et al., 2013) [7]. The problem with gradients is that they break sensitivity; this lack of sensitivity causes gradients to focus on irrelevant features. A second set of approaches involves back-propagating the final prediction score through each network layer down to the individual features. These include DeepLift (Shrikumar et al., 2017) [3], Layer-wise Relevance Propagation (LRP) (Binder et al., 2016) [8], Deconvolutional Networks (DeConvNets) (Zeiler & Fergus, 2014) [9], and Guided Backpropagation (Springenberg et al., 2014) [10]. These methods differ in the specific backpropagation logic for various activation functions (e.g., ReLU, MaxPool, etc.). Integrated Gradients (Sundararajan et al., 2017) [11] combines the implementation invariance of gradients with the sensitivity of techniques like LRP or DeepLift.
(Janizek et al., 2020) [6] presents Integrated Hessians, an extension of Integrated Gradients (Sundararajan et al., 2017) [11] that explains pairwise feature interactions in neural networks. Integrated Hessians overcome several theoretical limitations of previous methods and, unlike them, are not limited to a specific architecture or class of neural network.

When we describe how we classify images, we might focus on parts of the image and compare them with prototypical parts of images from a given class. This method of reasoning is commonly used in complex identification tasks: for example, radiologists compare suspected tumors in X-ray scans with prototypical tumor images for cancer diagnosis. The question is whether we can ask a machine learning model to imitate this way of thinking and to explain its reasoning process in a human-understandable way. This approach aims to define a form of interpretability in image processing ("this looks like that") that agrees with how humans describe their own thinking in classification tasks. (Chen et al., 2019) [12] introduces a network architecture, the prototypical part network (ProtoPNet), that accommodates this definition of interpretability, where the comparison of image parts to learned prototypes is integral to the way the network reasons about new examples.

2.3 Applications

We overview several applications relevant to feature interactions for interpretable machine learning. Since deep learning has become mainstream, there have been efforts to interpret or leverage the feature interactions captured by deep neural networks. Here, we discuss research on three kinds of models: text analyzers, image classifiers, and recommendation systems.

A primary interest of the text analysis community is explaining word interactions in applications like sentiment analysis. The explanations can indicate how words modify each other's sentiments when considered together rather than separately. On this topic, (Murdoch et al., 2018) [13] proposed Contextual Decomposition to extract word interactions from Long Short-Term Memory (LSTM) networks in the form of word-phrase attributions decomposed throughout the network.

Very few works have studied how to interpret feature interactions in image classifiers. In (Singh et al., 2019) [14], the Contextual Decomposition method was expanded to image classification. However, Contextual Decomposition still does not detect interactions, nor are its attributions axiomatic (Sundararajan et al., 2017) [11]. Some methods attempt to interpret feature groups in image classifiers, such as Anchors (Ribeiro et al., 2018) [15] and Context-Aware methods (Singla et al., 2019) [16]; however, these methods face the same drawbacks as Contextual Decomposition.

For recommendation systems, most works leverage representations of interactions to maximize prediction performance rather than to explain the interactions. (Lian et al., 2018) [17] directly incorporate multiplicative cross-terms in neural network architectures, and (Song et al., 2018) [18] use attention as an interaction module, all of which is intended to improve the neural network's function approximation. Despite the significant efforts to leverage feature interactions for prediction performance, to our knowledge no works have explained the feature interactions learned by recommender systems.

Chapter 3: Preliminaries

To derive our feature interaction values and interpretability via Shapley values, we first introduce the preliminaries that serve as a basis for this thesis.
3.1 Gaussian Pyramid

[Figure 3.1: Image Pyramid, taken from https://cs.brown.edu/courses/csci1430/2011/results/proj1/georgem/]

An image pyramid, as shown in Figure 3.1, is a multi-scale representation of an image. To obtain an image pyramid, a high-resolution original image is down-sampled layer by layer. In Figure 3.1, the original image is at level 0, and as we go up the pyramid it is progressively down-sampled from level 1 to level 4. When a Gaussian filter is used to down-sample the original image, the result is a Gaussian pyramid. The idea behind this is that features that may go undetected at one resolution can easily be detected at another resolution. For instance, if the region of interest is large, a low-resolution image or coarse view is sufficient, while small objects benefit from being examined at high resolution. As both large and small objects are present in an image, analyzing the image at several levels can prove crucial.

3.2 Quad-tree

[Figure 3.2: Quad-tree Spatial Decomposition, taken from https://medium.com/@tannerwyork/quadtrees-for-image-processing-302536c95c00]

A quad-tree is a tree data structure in which each internal node has four children. Quad-trees are the two-dimensional analog of octrees and are most often used to partition a two-dimensional space by recursively subdividing it into four quadrants or regions. The data associated with a leaf cell varies by application, but the leaf cell represents a "unit of interesting spatial information." The subdivided regions may be square or rectangular or may have arbitrary shapes. All forms of quad-trees share some standard features:

• They decompose space into adaptable cells.
• Each cell has a maximum capacity. When the maximum capacity is reached, the cell splits.
• The tree directory follows the spatial decomposition of the quad-tree, as shown in Figure 3.2.

We can view a quad-tree as a knowledge structure that encodes a two-dimensional space into adaptable cells. The quad-tree generated using the quad-distribution (see Section 5.1.2) is used to mask data samples. This quad-tree masking, when applied to the Gaussian image pyramid (see Section 3.1), is key to the identification and extraction of meaningful and robust features of image data. As described in Section 3.1, when masking is applied on the pyramid, a predictive model zooms out to the relevant features to make a prediction. A minimal sketch of this pyramid-plus-quad-tree masking idea follows below.
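The sketch below illustrates the two ingredients just described. It is my own illustration, not the thesis implementation: the quad-tree here splits on a hypothetical variance threshold rather than the quad-tree distribution of Section 5.1.2, and the threshold and minimum cell size are assumptions.

```python
# Sketch: a 3-level Gaussian pyramid for a 28x28 MNIST-style image, plus a
# quad-tree style binary mask that keeps high-variance quadrants and masks
# (zeroes) the rest before the image is fed to a predictive model.
import numpy as np
from scipy.ndimage import gaussian_filter

def gaussian_pyramid(img, levels=3):
    """Return [28x28, 14x14, 7x7] levels: Gaussian blur, then subsample by 2."""
    pyramid = [img]
    for _ in range(levels - 1):
        blurred = gaussian_filter(pyramid[-1], sigma=1.0)
        pyramid.append(blurred[::2, ::2])          # drop every other row/column
    return pyramid

def quadtree_mask(img, thresh=0.05, min_size=7):
    """1 = keep region, 0 = mask it; recursively split high-variance quadrants."""
    mask = np.zeros_like(img)
    def split(r0, r1, c0, c1):
        patch = img[r0:r1, c0:c1]
        if patch.var() <= thresh or (r1 - r0) <= min_size:
            mask[r0:r1, c0:c1] = patch.var() > thresh   # keep only informative leaves
            return
        rm, cm = (r0 + r1) // 2, (c0 + c1) // 2
        for rs, re, cs, ce in [(r0, rm, c0, cm), (r0, rm, cm, c1),
                               (rm, r1, c0, cm), (rm, r1, cm, c1)]:
            split(rs, re, cs, ce)
    split(0, img.shape[0], 0, img.shape[1])
    return mask

img = np.random.rand(28, 28)              # stand-in for an MNIST digit
pyr = gaussian_pyramid(img)
masked = img * quadtree_mask(img)         # masked input given to the model
print([level.shape for level in pyr], masked.shape)
```

The resulting 28x28, 14x14, and 7x7 levels match the spatial scales that reappear in the Shapley-value figures of the appendix.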
3.3 Shapley Values

The study of "feature interactions" is one of the primary candidate frameworks for bringing interpretability to deep learning. This direction formalizes statistical interactions between existing features of a dataset and can be leveraged to explain the decisions of any black-box model. In this work, I have used the Shapley value framework to explore the interpretability and robustness of general machine learning models.

In cooperative game theory, a game can be a set of circumstances whereby two or more players or decision-makers contribute to an outcome. The strategy is the game plan that a player implements, while the payoff is the gain achieved for arriving at the desired outcome. The Shapley value is the average expected marginal contribution of one player after all possible combinations have been considered. It helps determine a payoff for all players when each player might have contributed more or less than the others, and it has numerous applications in which the players can instead be the factors needed to achieve the desired outcome or payoff.

3.3.1 How to Calculate Shapley Values

The Shapley value is calculated using the following setup: a coalition of players cooperates and obtains a specific overall gain from that cooperation. Since some players may contribute more to the coalition than others or possess different bargaining power (for example, threatening to destroy the whole surplus), what final distribution of the generated surplus among the players should arise in any particular game? Or, phrased differently: how important is each player to the overall cooperation, and what payoff can he or she reasonably expect? The Shapley value provides one possible answer to this question.

A coalition game is defined by a set N (of n players) and a characteristic function ν that maps subsets of players to the real numbers: ν : 2^N → ℝ with ν(∅) = 0, where ∅ denotes the empty set. The function ν has the following meaning: if S is a coalition of players, then ν(S), called the worth of coalition S, describes the total expected sum of payoffs the members of S can obtain by cooperation. According to the Shapley value, the amount that player i receives in a coalition game (ν, N) is given by Equations 3.1 and 3.2, where n is the total number of players and the sum extends over all subsets S of N not containing player i:

\phi_i(\nu) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|! \, (n - |S| - 1)!}{n!} \big( \nu(S \cup \{i\}) - \nu(S) \big)   (3.1)

\phi_i(\nu) = \sum_{S \subseteq N \setminus \{i\}} \binom{n}{1,\, |S|,\, n - |S| - 1}^{-1} \big( \nu(S \cup \{i\}) - \nu(S) \big)   (3.2)

The formula can be interpreted as follows: imagine the coalition being formed one actor at a time, with each actor demanding their contribution ν(S ∪ {i}) − ν(S) as fair compensation; then, for each actor, average this contribution over the different possible permutations in which the coalition can be formed. An alternative, equivalent formula for the Shapley value is given by Equation 3.3, where the sum ranges over all n! orders R of the players and P_i^R is the set of players in N that precede i in the order R:

\phi_i(\nu) = \frac{1}{n!} \sum_{R} \big( \nu(P_i^R \cup \{i\}) - \nu(P_i^R) \big)   (3.3)

3.3.2 Example: Shapley Value Calculation

Assume a coalitional glove game where the players have left- and right-hand gloves and the goal is to form pairs. Let N = {1, 2, 3}, where player 1 and player 2 have right-handed gloves and player 3 has a left-handed glove. The value function for this coalition game is given by Equation 3.4, and the Shapley values are calculated using Equation 3.3:

\nu(S) = \begin{cases} 1 & \text{if } S \in \{\{1,3\}, \{2,3\}, \{1,2,3\}\} \\ 0 & \text{otherwise} \end{cases}   (3.4)

Table 3.1 displays the marginal contribution of player 1 over all orders; averaging gives φ_1(ν) = 1/6. By symmetry, the contribution of player 2 is φ_2(ν) = 1/6. Due to the efficiency axiom, the sum of all the Shapley values equals 1, which means the contribution of player 3 is φ_3(ν) = 4/6. A short computational sketch reproducing this calculation is given at the end of this section.

Order R | Marginal contribution of player 1
{1,2,3} | ν({1}) − ν(∅) = 0 − 0 = 0
{1,3,2} | ν({1}) − ν(∅) = 0 − 0 = 0
{2,1,3} | ν({1,2}) − ν({2}) = 0 − 0 = 0
{2,3,1} | ν({1,2,3}) − ν({2,3}) = 1 − 1 = 0
{3,1,2} | ν({1,3}) − ν({3}) = 1 − 0 = 1
{3,2,1} | ν({1,3,2}) − ν({3,2}) = 1 − 1 = 0
Table 3.1: Marginal contribution of player 1, taken from https://en.wikipedia.org/wiki/Shapley_value

In supervised learning, let f_y(x) be a model's predicted probability that data point x belongs to class y. To apply Shapley attribution for model explainability, one interprets the features {x_1, ..., x_n} as players in a game and the output f_y(x) as their earned value. To compute Shapley values, one must define a value function representing the model's output on a coalition x_S ⊆ {x_1, ..., x_n}. In the case of image data, we can consider each pixel as a feature and use the Shapley attribution model of interpretability, as shown in Figure 3.3.

[Figure 3.3: Shapley values of each pixel of MNIST digit 5.]

I have used Shapley values to quantify feature interaction, and explainability can then be treated as an attribution problem. Shapley values provide the unique attribution method satisfying a set of intuitive axioms; for example, they capture all interactions between features and sum to the model prediction. (The Shapley framework for explainability attributes model predictions to input features in a mathematically principled and model-agnostic way.)
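As a sanity check on Section 3.3.2, the following sketch (my own, not part of the thesis) brute-forces the glove-game Shapley values by averaging marginal contributions over all orderings, i.e., Equation (3.3).

```python
# Brute-force Shapley values for the glove game of Section 3.3.2.
from itertools import permutations
from math import factorial

players = [1, 2, 3]

def v(coalition):
    # Characteristic function of Eq. (3.4): a left-right glove pair is worth 1
    s = frozenset(coalition)
    return 1 if s in ({1, 3}, {2, 3}, {1, 2, 3}) else 0

def shapley_values(players, v):
    n = len(players)
    phi = {p: 0.0 for p in players}
    for order in permutations(players):          # all n! orderings R
        seen = []
        for p in order:
            phi[p] += v(seen + [p]) - v(seen)    # marginal contribution of p
            seen.append(p)
    return {p: total / factorial(n) for p, total in phi.items()}

print(shapley_values(players, v))   # phi_1 = phi_2 = 1/6, phi_3 = 4/6
```

The output reproduces the values derived from Table 3.1, and the same loop (with the model's value function in place of v) is the conceptual template for feature attribution in later chapters, where exhaustive enumeration is replaced by sampling and masking.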
3.4 Data Manifold

Manifolds describe a geometric surface, and data live on d-dimensional manifolds in an ambient n-dimensional feature space. In crude terms, we can think of the manifold as a lower-dimensional representation of high-dimensional data. For example, Figure 3.4 shows the manifold of MNIST data using a 2-d representation.

[Figure 3.4: Data manifold on the MNIST dataset, taken from https://towardsdatascience.com/manifolds-in-data-science-a-brief-overview-2e9dde9437e5]

Standard implementations of Shapley explainability suffer from a problem shared across model-agnostic methods: they involve marginalization over features, achieved by splicing data points together and evaluating the model on highly unrealistic inputs. Such splicing would only be justified if all features were independent; otherwise, the spliced data lie off the data manifold. Off-manifold spliced [19] MNIST digits are shown in Figure 3.5.

[Figure 3.5: An MNIST digit, a coalition of pixels in a Shapley calculation, and 5 off-manifold splices.]

As the model is undefined on a partial input x_S, the standard implementation [4] samples out-of-coalition features x'_{S̄}, where S̄ = N \ S, unconditionally from the data distribution, as shown in Equation 3.5:

\nu^{(\mathrm{off})}_{f_y(x)}(S) = \mathbb{E}_{p(x')} \big[ f_y(x_S \cup x'_{\bar{S}}) \big]   (3.5)

We refer to this value function, and the corresponding Shapley values, as lying off the data manifold, since the splices x_S ∪ x'_{S̄} generically lie far from the data distribution. Alternatively, conditioning the out-of-coalition features x'_{S̄} on the in-coalition features x_S results in an on-manifold value function, as shown in Equation 3.6:

\nu^{(\mathrm{on})}_{f_y(x)}(S) = \mathbb{E}_{p(x' \mid x_S)} \big[ f_y(x') \big]   (3.6)

The conditional distribution p(x' | x_S) is not empirically accessible in practical scenarios with high-dimensional data or many-valued features. This work uses novel quad-tree masking to calculate Shapley values on the data manifold. State-of-the-art deep neural models are well behaved only if the training and test data come from similar distributions, i.e., the KL divergence between the training and test distributions should be low. This work therefore tries to interpret models on the data manifold, avoiding out-of-distribution input data. A small sketch contrasting the off-manifold and masking-based value functions follows below.
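The sketch below is an assumption-laden illustration, not the thesis pipeline: it contrasts the off-manifold value function of Equation (3.5), which splices out-of-coalition pixels from unrelated images, with a simple masking surrogate that stands in for the on-manifold idea of Equation (3.6) by zeroing the same pixels instead. The stand-in data and model are hypothetical.

```python
# Off-manifold splicing vs. a masking surrogate for a pixel coalition.
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((100, 28, 28))        # stand-in for MNIST images
model = lambda x: x.mean()              # stand-in for f_y(x)

def value_off_manifold(x, coalition_mask, n_samples=32):
    """Eq. (3.5): out-of-coalition pixels drawn from other images (splices)."""
    vals = []
    for idx in rng.integers(0, len(data), n_samples):
        spliced = np.where(coalition_mask, x, data[idx])
        vals.append(model(spliced))
    return float(np.mean(vals))

def value_masked(x, coalition_mask):
    """Masking surrogate: out-of-coalition pixels are simply removed (zeroed)."""
    return float(model(x * coalition_mask))

x = data[0]
mask = np.zeros((28, 28), dtype=bool)
mask[7:21, 7:21] = True                 # keep only a central block of pixels
print(value_off_manifold(x, mask), value_masked(x, mask))
```

In the thesis, the masked evaluation is what the quad-tree produces, and the model of Chapter 5 is trained on such masked inputs so that these evaluations stay close to the data it has actually seen.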
Chapter 4: Motivations: Linear Models

The most famous interpretable model is linear regression, which treats inputs independently; the coefficient of each input corresponds to its slope in a linear plot. More recent developments in training machine learning models saw the invention of sparsity regularization, e.g., the lasso. When coupled with linear regression, the lasso attempts to select only the most essential inputs, which can be helpful for interpretability.

The goal of the experiments in this chapter is to establish the following points, which lay the foundation for interpretability with the Shapley value framework:

• Interpret the linear model parameters (coefficients) as Shapley values and explain the linear model's behavior.
• A high number of model parameters (a high-dimensional feature space) gives an adversarial agent extra degrees of freedom to attack the model.
• Besides avoiding overfitting, regularization limits the weights (model parameters), making the model less susceptible to adversarial attacks.

4.1 Ridge vs. Lasso Regression

A regression model that uses the L1 regularization technique is called lasso regression, and a model that uses L2 is called ridge regression. The critical difference between the two is the penalty term. Ridge regression adds the "squared magnitude" of the coefficients as a penalty term to the loss function, as shown in Equation 4.1; lasso regression adds the "absolute value of magnitude" of the coefficients, as shown in Equation 4.2:

\mathrm{Loss} = \mathrm{Error}(Y - \hat{Y}) + \lambda \sum_{i=1}^{n} w_i^2   (4.1)

\mathrm{Loss} = \mathrm{Error}(Y - \hat{Y}) + \lambda \sum_{i=1}^{n} |w_i|   (4.2)

The key difference between these techniques is that the lasso shrinks the coefficients of less important features to zero, thus removing some features altogether. This works well for feature selection when we have a considerable number of features. Lasso regression promotes sparsity in the weights and gives an adversarial agent fewer degrees of freedom to exploit. Intuitively, lasso regression should therefore be more robust than ridge regression.

In Figure 4.1, β_1 and β_2 are the coefficients of the linear regression model, and β̂ is the unconstrained least-squares estimate. The red ellipses are (as explained in the caption of the figure) the contours of the least-squares error function in terms of the parameters β_1 and β_2. Without constraints, the error function is minimized at the MLE β̂, and its value increases as the red ellipses expand outward. The diamond and disk regions are the feasible sets for lasso and ridge regression, respectively. Heuristically, for each method, we are looking for the intersection of the red ellipses and the blue region, as the objective is to minimize the error function while maintaining feasibility. The lasso constraint, which corresponds to the diamond-shaped feasible region, is more likely to produce an intersection in which one component of the solution is zero (i.e., a sparse model) due to the geometric properties of ellipses, disks, and diamonds: diamonds have corners (at which one component is zero) that the diagonally extending ellipses are more likely to touch first. Thus, we can say that lasso regression promotes sparsity. A small sketch comparing the two penalties on pixel features is given below.

[Figure 4.1: Estimation picture for the lasso (left) and ridge regression (right), showing the contours of the error and constraint functions. The solid blue areas are the constraint regions |β_1| + |β_2| ≤ t and β_1² + β_2² ≤ t², respectively, while the red ellipses are the contours of the least-squares error function. This figure is from https://stats.stackexchange.com/questions/45643/why-l1-norm-for-sparse-models.]
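The following sketch mirrors the comparison of Section 4.2.1, but it is an illustration under assumed data, not the thesis code: it fits scikit-learn's Ridge and Lasso regressors (alpha playing the role of λ) on stand-in flattened 28x28 "digits" and reports how many coefficients the lasso drives to zero.

```python
# Ridge (L2) vs. lasso (L1) on pixel features: compare coefficient sparsity.
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.random((500, 784))                    # stand-in for flattened 28x28 digits
y = (rng.random(500) > 0.5).astype(float)     # stand-in binary target (digit vs. rest)

for name, model in [("ridge", Ridge(alpha=0.001)), ("lasso", Lasso(alpha=0.001))]:
    model.fit(X, y)
    coef = model.coef_.ravel()
    sparsity = np.mean(np.abs(coef) < 1e-6)   # fraction of exactly-zero weights
    print(f"{name}: zero coefficients = {sparsity:.1%}")
    # coef.reshape(28, 28) can be rendered as a per-pixel heatmap like Fig. 4.2
```

On real MNIST pixels the qualitative picture is the one the chapter describes: the ridge coefficients stay small but dense, while the lasso zeroes out corner and background pixels.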
4.2 Empirical Results

In this section, we discuss our experiments on the MNIST dataset, which show that the coefficients of a linear regression model can be used for explainability, interpret those coefficients as Shapley values, and empirically support our intuition that lasso regression is more adversarially robust than ridge regression against the state-of-the-art Projected Gradient Descent (PGD) adversarial attack. The robustness of a model is defined as the amount of noise/perturbation (referred to as ε) needed to make a successful adversarial attack on a data sample using the PGD attack.

4.2.1 Model Coefficients as Shapley Values

Figure 4.2 depicts the learned model coefficients for digits 0 and 1 as Shapley values of the features/pixels. It shows the heatmaps for ridge (left) and lasso regression (right) at regularization parameter λ = 0.001 (see Equations 4.1 and 4.2). On the one hand, the intensity at each pixel indicates its importance, or Shapley value, in the decision-making process of the linear model, thereby making the model coefficients the Shapley values of the input features, which brings interpretability to the model. On the other hand, the lasso regression Shapley values near the corners and at irrelevant pixels are zero; thus, lasso regression brings sparsity to the linear model by killing off irrelevant features.

[Figure 4.2: Depiction of the linear model parameter heatmap for ridge (left) and lasso regression (right) at regularization parameter λ = 0.001. See Appendix D for all plots.]

4.2.2 Adversarial Plot: Ridge Regression

[Figure 4.3: Adversarial plots of ridge regression at λ = 0.00, 0.001, 0.010 (original label: 3, target label: 8).]

Figure 4.3 depicts adversarial plots of ridge regression at regularization parameter λ = 0.00, 0.001, 0.010. The amount of noise (referred to as ε) needed to make a successful adversarial attack increases with increased restriction on the weight matrix: as the regularization parameter λ increases, the required noise level ε increases. Thus, as we increase the restrictions on the model parameters, the model becomes more adversarially robust. A generic sketch of a targeted PGD attack of the kind used to produce such curves follows below.
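The sketch below shows a generic targeted PGD attack in PyTorch; it is not the thesis implementation, and the step size, iteration count, and the linear classifier used for the usage example are assumptions. The smallest ε for which the attack succeeds is the robustness measure used in this section.

```python
# Generic targeted PGD: perturb x within an L-infinity ball of radius eps so
# the classifier assigns the chosen target label.
import torch
import torch.nn.functional as F

def pgd_targeted(model, x, target, eps=0.1, alpha=0.01, steps=40):
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), target)      # loss toward target label
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv.detach() - alpha * grad.sign()      # descend: raise target prob
        x_adv = x.clone() + (x_adv - x).clamp(-eps, eps)  # project back into eps-ball
        x_adv = x_adv.clamp(0.0, 1.0)                     # keep a valid pixel range
    return x_adv.detach()

# Usage with a hypothetical linear classifier on flattened 28x28 digits:
model = torch.nn.Linear(784, 10)
x = torch.rand(1, 784)
target = torch.tensor([8])                                # e.g. push digit 3 toward 8
x_adv = pgd_targeted(model, x, target, eps=0.2)
print(model(x_adv).softmax(dim=1)[0, 8].item())           # target-class probability
```

Sweeping eps and recording the target-class softmax probability produces adversarial curves of the same shape as Figures 4.3 to 4.5.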
4.2.3 Adversarial Plot: Ridge vs. Lasso Regression

[Figure 4.4: Adversarial plots of lasso regression at λ = 0.00, 0.001, 0.010 (original label: 1, target label: 7).]

Our intuition says lasso regression should be more adversarially robust than ridge regression because the lasso promotes weight sparsity, and weight sparsity leaves the model with fewer degrees of freedom for the adversarial attack to exploit. To test this intuition, we ran an experiment against state-of-the-art PGD adversarial attacks at different levels of the regularization parameter λ = 0.000, 0.001, 0.010.

[Figure 4.5: Adversarial plots of ridge regression at λ = 0.00, 0.001, 0.010 (original label: 1, target label: 7).]

Figure 4.4 shows the adversarial plot for lasso regression, and Figure 4.5 shows the adversarial plot for ridge regression. As λ increases, the lasso is more robust than ridge regression; at λ = 0.01, lasso regression protects itself from the PGD attack. Empirically, then, the lasso is more robust than ridge regression. The lesson from this experiment is that giving the PGD attack fewer degrees of freedom improves robustness. The masking technique can achieve a similar effect in a complex neural network: masking forces the back-propagation algorithm to learn the relevant features and diminishes the impact of irrelevant features.

4.2.4 Adversarial Heatmap: Ridge vs. Lasso Regression

[Figure 4.6: Adversarial heatmap at λ = 0.00, 0.0001, 0.001, 0.01 from left to right. The convention follows: the x-axis (top to bottom) represents the original digits 0-9, and the y-axis represents the target digits 0-9. Ridge regression is used for this experiment.]

To make the empirical evidence concrete, we ran the experiment on 500 digits from the MNIST dataset at different levels of λ = 0.00, 0.0001, 0.001, 0.01. We attacked each digit and converted it to each of the digits 0-9 using a PGD attack. For instance, digit '5' is attacked using PGD with each of the digits 0-9 as the target.

[Figure 6.2: Depiction of the linear model parameter heatmap for digits 4, 5.]

E Shapley Values

The figures from 6.5 to 6.13 depict the Shapley values described in Section 5.3.2 and compare BFS and DFS traversals of the image pyramid (each figure shows 7x7, 14x14, and 28x28 Shapley-value panels). Higher Shapley values mean higher importance in making correct predictions.

[Figure 6.3: Depiction of the linear model parameter heatmap for digits 6, 7.]
[Figure 6.4: Depiction of the linear model parameter heatmap for digits 8, 9.]
[Figure 6.5: Prediction Game Shapley Values on MNIST digit 0.]
[Figure 6.6: Prediction Game Shapley Values on MNIST digit 1.]
[Figure 6.7: Prediction Game Shapley Values on MNIST digit 2.]
[Figure 6.8: Prediction Game Shapley Values on MNIST digit 3.]
[Figure 6.9: Prediction Game Shapley Values on MNIST digit 4.]
[Figure 6.10: Prediction Game Shapley Values on MNIST digit 5.]
[Figure 6.11: Prediction Game Shapley Values on MNIST digit 6.]
[Figure 6.12: Prediction Game Shapley Values on MNIST digit 8.]
[Figure 6.13: Prediction Game Shapley Values on MNIST digit 9.]
Linked assets
University of Southern California Dissertations and Theses
Conceptually similar
Interpretable machine learning models via feature interaction discovery
Simulation and machine learning at exascale
Machine learning in interacting multi-agent systems
Algorithms and frameworks for generating neural network models addressing energy-efficiency, robustness, and privacy
Invariant representation learning for robust and fair predictions
Learning invariant features in modulatory neural networks through conflict and ambiguity
Robust causal inference with machine learning on observational data
Robust and adaptive online reinforcement learning
A rigorous study of game-theoretic attribution and interaction methods for machine learning explainability
Alleviating the noisy data problem using restricted Boltzmann machines
Learning distributed representations from network data and human navigation
Learning to optimize the geometry and appearance from images
Evaluating and improving the commonsense reasoning ability of language models
Advanced machine learning techniques for video, social and biomedical data analytics
Graph machine learning for hardware security and security of graph machine learning: attacks and defenses
Interaction between Artificial Intelligence Systems and Primate Brains
Analog and mixed-signal parameter synthesis using machine learning and time-based circuit architectures
Human appearance analysis and synthesis using deep learning
Learning logical abstractions from sequential data
Fast and label-efficient graph representation learning
Asset Metadata
Creator
Hassan, Md Sk Abid (author)
Core Title
Robust interpretable machine learning on data manifold via feature interaction using Shapley framework and quadtree masking
School
Viterbi School of Engineering
Degree
Master of Science
Degree Program
Computer Science
Degree Conferral Date
2023-05
Publication Date
01/30/2025
Defense Date
12/06/2022
Publisher
University of Southern California (original); University of Southern California. Libraries (digital)
Tag
adversarial robustness,data manifold,deep neural networks,feature interaction,Gaussian pyramid,interpretability,machine learning,masking,OAI-PMH Harvest,quad-tree,Shapley values
Format
theses (aat)
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Liu, Yan (committee chair), Jia, Robin (committee member), Nevatia, Ram (committee member), Raghavachary, Sathyanaraya (committee member)
Creator Email
bingabid@gmail.com,mdskabid@usc.edu
Unique identifier
UC112723823
Identifier
etd-HassanMdSk-11454.pdf (filename)
Legacy Identifier
etd-HassanMdSk-11454
Document Type
Thesis
Rights
Hassan, Md Sk Abid
Internet Media Type
application/pdf
Type
texts
Source
20230201-usctheses-batch-1005 (batch); University of Southern California (contributing entity); University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright. The original signature page accompanying the original submission of the work to the USC Libraries is retained by the USC Libraries and a copy of it may be obtained by authorized requesters contacting the repository e-mail address given.
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Repository Email
cisadmin@lib.usc.edu