Knowledge-Driven Representations of Physiological Signals: Developing Measurable Indices of Non-Observable Behavior

by

Theodora Chaspari

A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the Requirements for the Degree
DOCTOR OF PHILOSOPHY (Electrical Engineering)

August 2017

Copyright 2017 Theodora Chaspari

Acknowledgments

I would like to express my sincere gratitude to my advisor and mentor, Prof. Shrikanth Narayanan, for his invaluable support, immense knowledge, everlasting enthusiasm, as well as for his precious guidance regarding future steps. I could not have imagined a more inspiring role model for my Ph.D. study and will always be grateful to you for that. I would also like to thank from the bottom of my heart Prof. Gayla Margolin, who provided me with the opportunity to join her lab and work with her. Without her precious support, inspiration, and open thinking it would not have been possible to conduct this research. I would further like to extend my greatest gratitude to Prof. Antonio Ortega for his insightful feedback, thoughtful guidance, and enlightening discussions. My sincere thanks to the amazing mentors that helped me along this way, Profs. Sungbok Lee, Panayotis Georgiou, and Brian Baucom. This research would have been impossible without my labmates, co-authors, and friends from the Signal Analysis and Interpretation Laboratory (SAIL); you were the best fellow travelers in this journey. I am also very much grateful to Emily Mower Provost and Andreas Tsiartas for their valuable guidance and mentorship during this research. I would also like to very warmly thank my labmates from the Family Studies Project (FSP), who welcomed me to the lab and expanded my research horizons with their intriguing projects.
A very special thanks to my co-author and friend, Adela Timmons, for being the best collaborator that someone could ask for; I am very much looking forward to our future “crazy” ideas. Finally, all this would have been impossible without my family and friends; thank you for all your love.

Table of Contents

Acknowledgments
List of Tables
List of Figures
Abstract

I Background

1 Introduction
1.1 Designing Physiological Representations
1.2 Learning Physiological Representations
1.3 Translating Representations to Novel Physiological Measures
1.4 Outline

2 Previous Work
2.1 Background on EDA and ECG
2.2 Physiological Models
2.3 Dictionary Learning Techniques
2.4 Novel Physiological Measures

II Designing and Learning Knowledge-Driven Physiological Representations

3 Designing Physiological Representations
(The work presented in this chapter was published in the following venues: T. Chaspari, A. Tsiartas, L.I. Stein, S.A. Cermak, and S.S. Narayanan, “Sparse Representation of Electrodermal Activity with Knowledge-Driven Dictionaries,” IEEE Transactions on Biomedical Engineering, 62(3): 960-971, 2015; R. Balasubramanian, T. Chaspari, and S.S. Narayanan, “A Knowledge-Driven Framework for ECG Representation and Interpretation for Wearable Applications,” IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, 2017.)
3.1 Dictionary Design
3.2 Sparse Decomposition
3.3 Signal Information Retrieval
3.3.1 SCR Detection in EDA
3.3.2 QRS Detection in ECG
3.3.3 Beat Classification in ECG
3.4 Description of Baseline Models
3.5 Evaluation
3.5.1 Signal Reconstruction and Compression Rate
3.5.2 Signal Information Retrieval
3.6 Data Description
3.7 Experiments
3.8 Discussion
3.9 Conclusions

4 Learning Physiological Representations
(The work presented in this chapter is published in: T. Chaspari, A. Tsiartas, P. Tsilifis, and S.S. Narayanan, “Markov Chain Monte Carlo Inference of Parametric Dictionaries for Sparse Bayesian Approximations,” IEEE Transactions on Signal Processing, 64(12): 3077-3092, 2016.)
4.1 Problem Formulation
4.1.1 Atom Sampling with Replacement
4.1.2 Atom Sampling without Replacement
4.1.3 Additional Probabilistic Assumptions
4.1.4 Objective
4.2 Inference with MCMC Sampling
4.2.1 MCMC Sampling
4.2.2 Sampling Dictionary Parameters
4.2.3 Sampling Atom Indices and Coefficients
4.2.4 Sampling the Parameters of the Priors
4.2.5 Implementation of Bayesian DL
4.3 Combination of Generated Dictionaries
4.4 Choosing the Parametric Dictionary Function
4.5 MCMC Convergence for Bayesian DL
4.6 Experiments
4.6.1 Data Description
4.6.1.1 Synthetic Data
4.6.1.2 Real Data
4.6.2 Experimental Setup
4.6.2.1 Bayesian DL
4.6.2.2 Steepest descent DL
4.6.2.3 Equiangular tight frame DL
4.6.2.4 K-SVD
4.6.2.5 Evaluation
4.6.3 Results
4.6.4 MCMC Diagnostics
4.7 Discussion
4.8 Conclusions

III Bio-Behavioral Applications

5 Translating Representations to Novel Physiological Measures
(The work presented in this chapter was published in the following venues: T. Chaspari, B. Baucom, A.C. Timmons, A. Tsiartas, L. Borofsky Del Piero, K.J.W. Baucom, P. Georgiou, G. Margolin, and S.S. Narayanan, “Quantifying EDA synchrony through joint sparse representation: A case-study of couples’ interactions,” IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brisbane, Australia, 2015; T. Chaspari, L.I. Stein Duker, S.A. Cermak, and S.S. Narayanan, “EDA-Gram: Designing Electrodermal Activity Fingerprints for Visualization and Feature Extraction,” International Conference of the IEEE Engineering in Medicine and Biology Society, pp. 403-406, Orlando, FL, 2016.)
5.1 Physiological Spectrogram: EDA-Gram
5.1.1 Data Description
5.1.2 Baseline Features
5.1.3 Results
5.2 Sparse EDA Synchrony Measure
5.2.1 Data Description
5.2.2 Application-Driven Evaluation of SESM
5.2.3 Results
5.3 Conclusions

IV Concluding Remarks

6 Concluding Remarks
6.1 Thesis Summary
6.2 Future Work

Reference List

Appendix
A MCMC Ergodicity Proofs
B Computational Complexity

List of Tables

3.1 Equations and parameter values of dictionary atoms representing tonic and phasic EDA components.
3.2 Unweighted accuracy for classifying between normal and abnormal beats (2-way), across atrial premature beat, paced beat, and premature ventricular contraction (3-way), and across all the above (4-way).
4.1 Prior distributions of Bayesian dictionary learning variables.
4.2 Description of Metropolis-Hastings-within-Gibbs sampling distributions.
4.3 Description of EDA-specific dictionary atoms and initial parameters.
4.4 Final dictionary coherence.
4.5 Geweke MCMC diagnostic: P(|z| < 2).
4.6 Metropolis-Hastings acceptance rates (%).
5.1 Linear mixed-effects model (LME) for predicting self-reported arousal (DEAP) from EDA features, and two-way ANOVA comparing differences in EDA features between dental environments and cleaning tasks (DENTISTRY).
5.2 Pearson’s correlation between real and predicted arousal values (DEAP) and classification accuracy between regular and sensory-adapted dental environments (DENTISTRY) using the skin conductance level (SCL_M), skin conductance response frequency (SCR_F), intensity and variation of the power spectral density (PSD_Int, PSD_Var), and EDA-Gram features (S_x^Int, S_x^Var).
5.3 Repeated-measures ANOVA for overall significant differences of the Sparse EDA Synchrony Measure (SESM) across tasks.
5.4 Pearson’s correlation between ground-truth and predicted attachment ratings using multiple linear regression with Sparse EDA Synchrony Measure (SESM) features.
B1 Average computation time of dictionary learning algorithms.

List of Figures

2.1 Example of an electrodermal activity (EDA) signal with skin conductance responses (SCR), marked with red “o”, and an indicative notation of the SCR amplitude measure.
3.1 Examples of normalized phasic atoms represented with sigmoid-exponential, Bateman, and chi-square functions for different time-scale parameters s = 0.06, 0.1, 0.14 and time offset t_0 = 20. For each time-scale value, plots were created using all combinations of the corresponding atom-specific parameters.
3.2 Example of zero-order Hermite and AM sinusoidal atoms.
3.3 EDA representation scheme and SCR detection for an example signal frame. (a) Input and reconstructed signal with solid blue and dashed red lines. The locations of expert hand-annotated SCRs are marked (red markers), along with the signal peaks (magenta markers) and the SCRs estimated from the phasic atoms of the sparse decomposition algorithm (green markers). The final SCRs (black markers) are located by combining the SCRs from the phasic atoms according to their location and mapping them to the signal peaks. (b) The normalized dictionary atoms selected by the first 4 iterations of OMP. The first tonic atom (solid cyan line) captures the signal level; the first and third phasic atoms (dashed magenta and dash-dotted green lines) capture the first SCR, while the second phasic atom (black dotted line) captures the second SCR. (c) The normalized phasic atoms multiplied by the corresponding OMP coefficients (dashed magenta, dotted black, and dash-dotted green lines). The energy of each atom is indicative of the order in which it was selected by OMP, with higher-energy atoms selected first.
3.4 Sparse representation of an exemplar ECG with 10 orthogonal matching pursuit (OMP) iterations. (a) Original and reconstructed ECG. (b) The first nine selected AM sinusoidal (AM sin.) and Hermite atoms scaled with OMP-derived coefficients. (c) The AM sinusoidal atoms used in the heuristic QRS detection and their grouping. (d) The segments of length S used to classify the presence (dark grey) or absence (light grey) of an R peak during the machine-learning-based QRS detection. (e) The atoms used to represent the morphology of the P, QRS, and T waves during beat classification.
3.5 Example original and reconstructed electrocardiogram (ECG) signals using 10 and 20 orthogonal matching pursuit (OMP) iterations for different beat types. The same legend applies to all plots.
3.6 EDA reconstruction and SCR detection results. (a) Root mean square (RMS) error between original and reconstructed signal with respect to (w.r.t.) the number of (orthogonal) matching pursuit ((O)MP) iterations. (b) Absolute relative difference between the numbers of real and estimated SCRs w.r.t. the number of (O)MP iterations. (c) Mean distance of estimated SCRs from their closest real SCR w.r.t. (O)MP iterations. (d,e,f) Precision, recall, and F-score of SCR detection with 6 (O)MP iterations w.r.t. the maximum distance threshold d_max between real and detected SCRs, the latter ranging between 10-100 samples.
3.7 Compression rate of the original EDA signal and the EDA representation with the proposed sparse decomposition and the least-squares-fit methods. (Y-axis break between 100 and 900 bits/sec.)
3.8 Relative root mean square (RMS) error between original and reconstructed ECG signals and compression rate computed over 1-20 orthogonal matching pursuit (OMP) iterations.
3.9 QRS detection F-score plotted against different maximum distance thresholds between the real and detected R peaks. Results for the machine learning approach are obtained with a variety of orthogonal matching pursuit iterations (K) and segment lengths (S).
4.1 Example of the initial dictionary and dictionaries learnt with Steepest Descent (SD), Equiangular Tight Frame (ETF), K-SVD, and Markov Chain Monte Carlo Bayesian inference (MCMC) using atom sampling with and without replacement (w-, w-o rplcm). Dictionary combination is performed with N_b = 400. An instance of phasic atoms shifted with t_0 = 30 is shown.
4.2 Example of an input EDA signal (solid cyan line) and reconstructed signals using the original dictionary (blue dashed line) and the dictionaries learnt with Steepest Descent (SD), Equiangular Tight Frame (ETF), and Markov Chain Monte Carlo Bayesian inference (MCMC) (black-triangle, green-asterisk, and red-circled lines, respectively). Reconstruction is performed using orthogonal matching pursuit with 4 iterations.
4.3 Relative root mean square (RMS) error between original and reconstructed signal with respect to the number of (orthogonal) matching pursuit (OMP) iterations. Dictionaries are learnt with Steepest Descent (SD), Equiangular Tight Frame (ETF), K-SVD, and Markov Chain Monte Carlo Bayesian inference (MCMC). During MCMC, atom sampling is performed with (w-) and without (w-o) replacement (rplcm).
4.4 Example MCMC trace plots and generated posteriors for the first element of the vectors containing the (a,e) dictionary atom parameters, (b,f) atom coefficients, (c,g) atom priors, and (d,h) the noise precision.
5.1 EDA-Gram design. (a) Example of phasic dictionary atoms with phasic width (PW). (b) Histogram of PW values with the resulting phasic width bands (PWB). (c) Input EDA. (d) Signal analysis frames (in solid blue line) and selected phasic atoms (in non-solid lines). (e) EDA-Gram.
5.2 Example of EDA signals and EDA-Grams for two subjects during the regular (RDE) and sensory-adapted (SADE) dental environments.
5.3 Joint sparse decomposition of two EDA signals with simultaneous orthogonal matching pursuit (SOMP) and their commonly selected phasic atoms, differently scaled per signal. The same legend applies to the first/second and third/fourth plots, and the same time axis to all.
5.4 Sparse EDA Synchrony Measure (SESM) across tasks. Error bars represent one standard deviation distance from the mean.
Abstract

Human physiology reflects an individual’s physical and mental condition through bio-signals that are indicative of both physical and behavioral pathological factors. The electrocardiogram, for example, is widely used to assess cardiovascular diseases, as well as regulation and stress reactivity in various mental health conditions, such as autism. This renders human physiology inherently complex and multi-faceted, providing a rich platform not only for mathematically modeling an individual’s state, but also for impacting a variety of fields with data-informed feedback. Emerging engineering solutions for health and life-science applications employ evidence from biomedical signals to gain insights into the physical, mental, and affective state of individuals and assist them towards regulatory changes.

These techniques will become increasingly important as converging advances in sensing and computing, including wearable technology, continue to permeate everyday life. The long-term tracking of physiological and behavioral mechanisms can promote healthy routines, increase emotional wellness and awareness, and revolutionize clinical assessment and intervention for chronic diseases (e.g., heart and pulmonary illnesses) and mental conditions (e.g., autism and depression). The multiple streams of information available for each individual can assist personalized healthcare through data-scientific health analytics. The unobtrusive continuous tracking of vital signs, for instance, can help patients with cardiovascular diseases improve their medication habits, moderate their diet and physical activity, and manage stress levels. Physiology-tracking devices in autism hold great potential for providing “mood-monitoring” data for predicting challenging conditions, such as meltdowns, identifying possible reasons for their occurrence, or even preventing them.
Despite these opportunities, there are many challenges in this field, exemplified by the complex and heterogeneous data spaces, the lack of contextual information, and the limited presence (or absence) of specialized analytics. This thesis focuses on translating the low-level data acquired from physiology-monitoring devices into higher-level meta-information and developing feedback mechanisms able to provide data-inspired suggestions for targeted changes. While sophisticated artificial intelligence systems can yield very accurate results, human experts usually find little comfort in them, since the insights into their inner workings and decision-making process are very limited (or even non-existent). My dissertation focused on the development of robust, interpretable algorithms for the representation of biomedical signals, the design of novel, intuitive physiological measures, and the exploration of knowledge-driven mathematical models for describing the co-evolution of physiology with other modalities and behavioral indices as a holistic system.

Part I: Background

Chapter 1: Introduction

One of the recent technological achievements is the increasing use of wearable devices that allow the sensing of physiological signals for health [101, 102], medical [17, 153], and other [40, 84] purposes. These applications stem from the need to monitor individuals over long periods of time, overcoming the limits imposed by traditional non-ambulatory technology and providing new insights into diagnostic and therapeutic means [17]. They typically rely on small sensors for capturing data, portable devices for temporary storage, and wireless networks for (periodic) data transfer to a database [21]. A key property of biomedical sensors is their ability to monitor continuously, unobtrusively, and at low cost, beyond what we can sense or observe as humans.
This can have implications for everyday life through sleep, performance, and posture tracking, or even biometric authentication [42]. It further promises tremendous impact on health through prevention and treatment: wearable sensors can increase the ability to “listen” for the warning signs of serious diseases, establish the baseline functional capacity of patients, and assess therapeutic interventions at various and frequent time points.

Despite these exciting opportunities, wearable technology comes with various challenges. Continuous recording from these sensors, in everyday life and beyond specialized settings, yields a large amount of data. This is further confounded by the presence of noise in physiological signals, which can include high-frequency artifacts, distortion, and interference. The availability (and analysis capacity) of human experts is not always guaranteed, underlining the necessity of robust algorithms for analyzing the corresponding signals, automatically deriving measurable indices, and connecting them with interpretations that might lead to warnings, suggestions, and real-time feedback. A central goal is to derive meaningful quantitative constructs from the signal data.

Taking these challenges into account, robust representations of physiological signals should consider their characteristic structure, efficiently encode the underlying information, and provide reliable interpretation. This thesis proposes a knowledge-driven parametric framework to model physiological signals in order to reliably capture the underlying signal structure, retrieve the embedded signal information, and provide novel interpretable measures for characterizing human behavior beyond what is externally observed. The first part focuses on the design of a knowledge-driven method to represent physiological signals of characteristic structure through the use of signal-specific parametric dictionaries along with sparse decomposition techniques (Chapter 3).
Since the manual design of dictionaries is labor-intensive and involves time-consuming signal inspection, the second part proposes a parametric dictionary learning algorithm to automatically infer physiological representations through a Bayesian framework (Chapter 4). Finally, in the third part we demonstrate how the aforementioned representations can be used to develop novel physiological measures associated with various social and psychological constructs (Chapter 5). The techniques presented in this thesis can apply to a variety of physiological signals with characteristic structure over time, such as the electrocardiogram (ECG), the pulse pressure variation (PPV), and the electrodermal activity (EDA). Our work focuses on the EDA and the ECG, because they can be unobtrusively recorded by a variety of wearable devices and they depict a characteristic morphology, which is appealing for the development of mathematical models [25, 83, 111, 113].

1.1 Designing Physiological Representations

A central task in signal processing is the design of low-dimensional models that are able to distinguish the useful signal parts, capture the meaningful information, and provide a foundation for well-established or even novel measures. Sparse representation techniques model a signal as a linear combination of a small number of atoms chosen from an over-complete dictionary, aiming to reveal certain structures of a signal and represent them in a compact way [50]. Since psychophysiological signals show typical patterns over time, their sparse decomposition can yield accurate representations of scientific and translational value and contribute to scalable implementations (e.g., on mobile devices). Noting that the desired information contained in EDA signals is low-dimensional, we introduce the use of sparsity to robustly represent them.
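As a concrete illustration of this decomposition step, the sketch below runs a minimal orthogonal matching pursuit (OMP) over a toy dictionary of fast-rise/slow-decay atoms. The atom shapes, sizes, and the `omp` helper are illustrative assumptions, not the exact parametric forms or implementation used in this thesis:

```python
import numpy as np

def omp(x, D, n_iters):
    """Orthogonal matching pursuit: greedily pick the dictionary atom most
    correlated with the residual, then re-fit all selected coefficients."""
    residual, selected = x.copy(), []
    for _ in range(n_iters):
        corr = D.T @ residual                      # correlation with each atom
        selected.append(int(np.argmax(np.abs(corr))))
        Ds = D[:, selected]                        # atoms chosen so far
        coeffs, *_ = np.linalg.lstsq(Ds, x, rcond=None)
        residual = x - Ds @ coeffs                 # orthogonalized residual
    return selected, coeffs

# Toy "phasic-like" dictionary: unit-norm atoms with a fast rise and slow
# decay, shifted in time (the dictionaries in Chapter 3 are more elaborate).
t = np.arange(200)
atoms = []
for t0 in range(0, 180, 10):
    a = np.where(t >= t0, (t - t0) * np.exp(-(t - t0) / 15.0), 0.0)
    atoms.append(a / np.linalg.norm(a))
D = np.stack(atoms, axis=1)                        # 200 x 18 over-complete in shape space

x = 0.8 * D[:, 3] + 0.4 * D[:, 10]                 # a signal with two "SCRs"
idx, c = omp(x, D, n_iters=2)
print(sorted(idx))                                  # recovers atoms 3 and 10
```

Because the residual is re-orthogonalized against all selected atoms at each step, the same atom is never picked twice, and for this noiseless toy signal two iterations suffice for exact recovery.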
The innovation of our approach lies in modeling physiological signals directly, taking into account their variability through the use of overcomplete knowledge-driven dictionaries, and evaluating our method explicitly through signal reconstruction and information retrieval measures. We propose signal-specific dictionaries that take into account the characteristic signal structure in the time domain, model the main parts of the signal, and can be interpreted based on physical constructs. We further employ sparse representation techniques to decompose the signals based on a small set of atoms from the dictionary. Through post-processing of the selected atoms, we can automatically detect the main signal components and identify abnormalities due to several pathological factors. We evaluate the usefulness and interpretability of our approach through goodness-of-fit criteria, component detection measures, and classification experiments. Due to potential portable-device applications for signal storage and transmission, we further analyze the compression rate of our algorithm. Our results indicate that the proposed approach provides benefits over previous work [89, 54, 132] with respect to signal reconstruction, compression, and detection/classification metrics.

1.2 Learning Physiological Representations

An essential step towards compact and reliable representations is the dictionary selection. Traditionally, analytic pre-designed dictionaries comprising Gabor [44], wavelet [43], curvelet [139], or other atoms have been used because of their localization, directionality, and multi-resolution properties. Dictionary learning (DL) is a more recent approach focused on learning atoms from the available training data. It includes several well-known algorithms, such as the K-SVD [4] and the method of optimal directions [52], as well as probabilistic approaches [106].
Although non-parametric DL is effective for signal reconstruction [4], restoration [93], and classification [92], the problem is highly non-convex, mostly yields non-structured dictionaries [128], and typically requires a large amount of training data [137]. These disadvantages can be mitigated by imposing a predetermined structure through the use of parametric dictionaries, bridging the gap between pre-defined analytic dictionaries and purely numerical DL [145]. In such approaches, dictionary atoms are expressed as a function of a low-dimensional parameter set optimized with respect to criteria involving desirable properties [6, 143, 159]. Parametric DL is more likely to converge faster and have more efficient implementations compared to the non-parametric problem [128]. It further provides higher signal interpretability, yielding important meta-information [12, 31, 98, 121].

We propose a Bayesian framework for learning the parameters of dictionary atoms. Our approach imposes probabilistic distributions on the variables of the sparse representation problem, which are estimated through MCMC methods because of their simplicity and ability to fully perform Bayesian inference [69]. Compared to previous Bayesian DL [88, 160], our approach introduces parametric dictionaries, where non-closed-form solutions are handled with a combination of Gibbs and Metropolis-Hastings (MH) sampling (MH-within-Gibbs). Our approach differs from previous parametric DL [6, 159] because of its stochastic framework, which yields estimates of the full set of problem variables. This results in parametric dictionaries that take into account the structure of the training data and are less prone to overfitting.

One key challenge with MCMC is to determine its asymptotic behavior, i.e., whether it provides accurate posterior approximations. The goal is to create an aperiodic and irreducible Markov chain (MC) whose stationary distribution is the posterior distribution of interest [100].
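The Metropolis-Hastings building block of such a chain can be sketched for a single illustrative parameter, here the decay rate of a hypothetical exponential atom. The data, priors, and random-walk proposal below are toy assumptions for illustration, not the thesis's actual MH-within-Gibbs sampler:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: noisy exponential-decay "atom" with true rate tau = 12.
t = np.arange(60, dtype=float)
tau_true = 12.0
y = np.exp(-t / tau_true) + 0.05 * rng.standard_normal(t.size)

def log_post(tau):
    """Unnormalized log-posterior: Gaussian likelihood plus a weak log-normal
    prior on tau (illustrative choices; the thesis places signal-specific
    priors on the dictionary parameters)."""
    if tau <= 0:
        return -np.inf
    resid = y - np.exp(-t / tau)
    return -0.5 * np.sum(resid**2) / 0.05**2 - 0.5 * (np.log(tau) - np.log(10.0))**2

def mh_chain(n_steps, tau0=5.0, step=1.0):
    """Random-walk Metropolis-Hastings: accept tau' with prob min(1, p'/p)."""
    tau, lp = tau0, log_post(tau0)
    samples = []
    for _ in range(n_steps):
        prop = tau + step * rng.standard_normal()
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:   # MH acceptance test
            tau, lp = prop, lp_prop
        samples.append(tau)
    return np.array(samples)

chain = mh_chain(5000)
print(round(chain[2500:].mean(), 1))               # posterior mean near 12
```

Discarding the first half of the chain as burn-in, the remaining samples approximate the posterior over the decay rate; in the full MH-within-Gibbs scheme, a step of this kind is cycled with Gibbs updates of the remaining variables.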
Irreducibility ensures that any state of the space is accessible, while aperiodicity ensures that the chain does not return to the same state at regular intervals. Uniformly ergodic MCs are a special case in which the MC converges to the invariant distribution independently of the initial state. They guarantee geometrically fast convergence and are key sufficient conditions for establishing central limit theorems for empirical averages and providing consistent estimators of MCMC standard errors [29, 79, 97, 100, 124]. For these reasons, we discuss the geometric ergodicity of the MCMC in our proposed Bayesian inference framework, which ensures convergence.

We further demonstrate the ability of our algorithm to support parallel processing with experiments on synthetic data and real biomedical signals. DL is performed for each sample separately, and the resulting dictionaries from the exemplar data are further combined into a unified model. Our results indicate that the proposed approach yields superior signal reconstruction compared to previous SD [6] and ETF [159] methods.

1.3 Translating Representations to Novel Physiological Measures

The aforementioned signal representations are used to develop novel interpretable physiological measures that can be employed to quantify the internal physiological state of a person, as well as the physiological linkage between two interacting people. These can find application in a variety of health and well-being domains.

Motivated by the widely used notion of the spectrogram, we propose the EDA-Gram, a multidimensional fingerprint of the electrodermal activity (EDA) signal. The EDA-Gram is based on the sparse decomposition of EDA over a knowledge-driven set of dictionary atoms. The time axis reflects the analysis frames, the spectral dimension depicts the width of the selected dictionary atoms, and intensity values are computed from the atom coefficients.
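Under the hypothetical assumption that each analysis frame yields (frame index, atom width, coefficient) triples from the sparse decomposition, an EDA-Gram-style time-by-width map could be assembled roughly as follows (a sketch of the idea, not the exact construction):

```python
import numpy as np

def eda_gram(frame_decomps, width_bands, n_frames):
    """Assemble an EDA-Gram-style map from per-frame sparse decompositions.

    frame_decomps: list of (frame_index, atom_width, coefficient) triples,
    as might be produced by OMP with a phasic dictionary (hypothetical format).
    width_bands: edges of the phasic-width bands (the 'spectral' axis).
    """
    gram = np.zeros((len(width_bands) - 1, n_frames))
    for frame, width, coeff in frame_decomps:
        band = np.searchsorted(width_bands, width, side="right") - 1
        band = np.clip(band, 0, gram.shape[0] - 1)
        gram[band, frame] += abs(coeff)            # accumulate atom intensity
    return gram

# Toy example: 3 width bands, 4 frames, a few selected atoms per frame.
bands = [0.5, 2.0, 5.0, 10.0]                      # illustrative widths (s)
decomps = [(0, 1.2, 0.7), (0, 3.0, 0.2), (2, 6.5, 0.9), (3, 1.0, 0.4)]
G = eda_gram(decomps, bands, n_frames=4)
print(G.shape)                                      # (3, 4)
```

Each column then plays the role of a spectrogram slice: the row (width band) captures the shape of the fluctuation and the accumulated coefficient magnitude its amplitude.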
In this way, the EDA-Gram incorporates the amplitude and shape of signal fluctuations, and it can be used for visualization purposes and feature development. Our results indicate that the proposed representation can accentuate fine-grained signal fluctuations, which might not always be apparent through simple visual inspection. Statistical analysis and classification/regression experiments further suggest that the derived features can differentiate between multiple arousal levels and stress-eliciting environments.

Beyond unimodal analysis, finding ways to reliably capture patterns of co-occurring signals, especially in the context of human interactions, can afford us complementary insights into human physiology and behavior [134]. Evidence suggests that the coordination of bio-signals between interacting individuals can be linked to various social, psychological, and developmental constructs [55, 133]. We propose the Sparse EDA Synchrony Measure (SESM), an index that quantifies the similarity of EDA signals jointly modeled with sparse decomposition techniques, such as Simultaneous Orthogonal Matching Pursuit (SOMP) [150], and EDA-specific dictionaries of tonic and phasic atoms [30]. SESM is expressed as the negative natural logarithm of the joint representation error, so that similar signals achieve higher synchrony values; it is evaluated with two datasets containing in-lab dyadic interactions between married and young couples, respectively. Statistical analysis indicates that SESM reflects differences across tasks of various intensity in both datasets. Regression experiments also suggest that it can be associated with measures of attachment collected from individuals’ self-reports. These results are consistent with previous findings on couples’ physiological synchrony [85] and attachment style [73].

1.4 Outline

The remainder of this thesis is organized as follows.
Chapter 2 reviews the literature on physiological models, dictionary learning techniques, and previous studies on physiological measures. Chapter 3 demonstrates how we can create signal-specific dictionaries in order to model EDA and ECG signals through sparse decomposition techniques. Motivated by the need to create reliable dictionaries, Chapter 4 discusses a novel Bayesian parametric dictionary learning method to represent signals of characteristic structure. Chapter 5 demonstrates how we can employ the aforementioned representations to derive meaningful and interpretable constructs from biomedical signals, with applications in the Autism domain and romantic couples' interactions. Finally, Chapter 6 provides discussion, conclusions, and future work.

Chapter 2 Previous Work

This chapter provides background information on Electrodermal Activity (EDA) and Electrocardiogram (ECG) signals and reviews the literature on the design of physiological models, dictionary learning techniques, and the development of novel physiological measures.

2.1 Background on EDA and ECG

EDA is one of the most widely used psychophysiological signals. It is related to sympathetic nerve activity through the changes in the levels of sweat in the ducts [23, 46], which result in the observed skin conductance response (SCR) fluctuations. EDA has been used in a variety of laboratory settings examining attention, memory, information processing, decision making, and emotion, as well as a predictor of normal and abnormal behavior and other psychological constructs [23, 46]. Recently the focus has shifted to collecting EDA signals outside the laboratory, in the wild, with wearable devices that can measure continuously around the clock, privately logging and/or wirelessly streaming data [25, 83, 111, 113]. This unobtrusive long-term recording of EDA results in large amounts of data that require appropriate signal representation and interpretation.
The EDA signal is decomposed into tonic and phasic components. The slow moving tonic part, called skin conductance level (SCL), depicts the general trend, whereas the fast fluctuations superimposed onto the tonic signal are the skin conductance responses (SCR) (Fig. 2.1) [46]. The shape of SCRs is characterized by a steep increase in the signal and a slow recovery [22]. Amplitude is the most commonly reported SCR measure, quantified as the amount of increase in conductance from the onset of the response to its peak (Fig. 2.1), with typical values ranging between .1 and 1.0 μS [46]. SCRs are caused by the bursts of the sympathetic sudomotor nerves controlling the sweat glands, linked to emotion, arousal, and attention [46]. Previous work has modeled explicitly the shape of the resulting signal with appropriate functions [89, 142] or implicitly the causal relation between the underlying activity of sudomotor nerves and the observable SCRs [5, 16, 9]. A review of EDA models can be found in [9].

The electrocardiogram (ECG or EKG) is a diagnostic tool routinely used to assess the activity of the heart through the electrical potential difference during the depolarization and repolarization of the myocardial fibers [156]. While there exist standard techniques to acquire the ECG trace, its interpretation requires significant training and expertise. This becomes more challenging in the light of wearable applications, where long-term continuous ECG monitoring results in large amounts of data. The limited availability of human experts renders automatic processing necessary, not only to store and transfer the acquired data, but also to meaningfully interpret them. ECG depicts a characteristic periodic structure over time. It consists of three main parts, the P, QRS, and T waves, representing the heart's atrial depolarization, ventricular depolarization, and ventricular repolarization, respectively.
This typical shape has been taken into account for the development of appropriate mathematical models.

Figure 2.1: Example of an electrodermal activity (EDA) signal with skin conductance responses (SCR), marked with red "o", and an indicative notation of the SCR amplitude measure.

2.2 Physiological Models

Several studies have mathematically expressed the EDA and ECG signals. Lim et al. [89] developed a parameterized sigmoid-exponential model of EDA fitted to signal segments. Results were found to be correlated with previously established automatic scoring methods [22]. Storm et al. [142] used a quadratic polynomial fit to sequential groups of datapoints to detect SCRs, whose total number was compared to manual counting. In the context of causal EDA modeling, Alexander et al. [5] represented the SCR shape as a bi-exponential function and used deterministic inverse filtering to estimate the driver of nerve bursts. Evaluation of the method was performed by visual inspection and by finding significant correlations of the resulting SCR measures with variables of gender and age. Benedek et al. [15, 16] assumed EDA to be the convolution of a driver function, reflecting the activity of sweat glands, with an impulse response depicting the states of neuron activity. This method was evaluated through the reconstruction error. It was also compared to standard peak detection for a set of noise burst events and was found to give results more likely to confound with these environmental conditions. Finally, Bach [7, 8, 9, 10] has described a dynamic causal model (DCM) using Bayesian inversion to infer the underlying activity of sweat glands. Each sudomotor activity burst is modeled as a Gaussian function, which serves as an input to a double convolution operation yielding the EDA signal.
The correlation of the estimated bursts with the number of SCRs from semi-visual analysis was reported. This model was found to be a good predictor of anxiety in public speaking. EDA decomposition was cast as a convex optimization problem in [68]. The minimization of a quadratic cost function was used to estimate the tonic and phasic signal components, represented with a cubic spline and a bi-exponential Bateman function, respectively. The EDA features extracted from the model parameters yielded statistically significant differences between neutral and high-arousal stimuli.

In terms of ECG, previous work has used Hermite functions [138] and amplitude-modulated (AM) sinusoidal waveforms [104] to represent the QRS complex or even the entire beat. These models have been separately employed for QRS detection, ectopic beat detection, and beat classification, and are evaluated through visual inspection and limited quantitative analysis. Our approach builds upon the Hermite and AM sinusoidal functions [104, 138], which form the basis of the dictionary atoms employed in a sparse representation framework. The dictionary atoms can be interpreted in relation to the P, QRS, and T waves. Unlike previous approaches, no pre-processing or QRS segmentation is needed beforehand. We note that previous works involving sparse decomposition of the ECG have mainly used dictionaries of wavelet, DCT, and Gabor atoms [66, 115, 161], which are less interpretable for the considered signal.

Despite their encouraging results, some of these research efforts [89, 142] tend to impose restrictions on the signal structure. Also, studies modeling EDA through its relationship with sympathetic arousal [5, 16, 9] assume a linear time-invariant system, which is not always justified by empirical evidence [11].
While several studies take into account signal reconstruction measures [7, 16], evaluation is mostly performed by visual inspection or implicitly through expected empirical assumptions correlating the systems' SCR measures to physical, mental, and behavioral states. The novelty of the present study lies in the fact that it directly models a signal with sparsity constraints and takes into account the inherent shape variability. We evaluate our approach through both signal reconstruction criteria and measures comparing automatically-detected signal components to human-annotated ones.

2.3 Dictionary Learning Techniques

Dictionary Learning (DL) methods usually alternate between sparse decoding and dictionary update [128]. In the context of non-parametric DL, Ophir et al. have proposed a sequential learning algorithm that identifies the orthogonal directions to a data subset [107]. Engan et al. used matrix inversion to compute the dictionary matrix [52], while Aharon et al. introduced K-SVD, a constrained optimization approach performing atom-by-atom updates [4]. Generalized principal component analysis models the data as a union of low-dimensional subspaces with orthogonal bases [152], while structured dictionaries have been proposed in an effort to enforce additional translation-invariant, hierarchical, and multiscale properties [3, 96, 129].

Parametric dictionaries depict higher interpretability, lower density of local minima, and compact representation [128]. Dictionaries can be learned as a result of translation of elementary signal segments over space and time [20, 53]. They can also contain atoms of predetermined structure, such as wavelets [74] or Gabor functions [6], whose parameters are adapted with steepest-descent (SD) [6] and other least-squares-based methods [74, 98]. In order to improve the representation quality, Yaghoobi et al.
proposed a method to find dictionaries close to the one with minimum coherence, called the equiangular tight frame (ETF) [159]. Qiu et al. approached the problem of finding the optimal mapping function between the dictionary parameters and actual atoms, benefiting domain adaptation [117]. Finally, Thanou et al. [145] proposed a parametric DL method for signals residing on weighted graphs.

Although the aforementioned deterministic DL methods perform well in various signal processing applications [4, 92, 93], they provide single-point estimates and cannot handle noise uncertainty. Bayesian approaches offer an alternative that addresses those disadvantages. They have been proposed for compressed sensing [19, 51, 157, 158] as well as for non-parametric DL. Sparseness is promoted with continuous Laplacian and other super-Gaussian distributions [76, 86, 87, 106, 147], or discrete (Beta-)Bernoulli variables [19, 72, 88, 131, 160]. The advantage of the latter lies in avoiding the significant amount of small non-zero components [160]. Gaussian assumptions are typically imposed on the dictionary atoms [88, 160] and the signal noise [19, 88, 131, 160], the latter being consistent with biomedical applications [63, 120]. Bayesian inference casts DL into an optimization problem of maximizing the posterior distribution. The over-complete dictionaries containing many atoms, in combination with the large amount of training data, yield a high-dimensional framework for which closed-form solutions are usually difficult to derive, and approximate inference methods are followed. Common approaches include evidence maximization [157, 158], the relevance vector machine [76], Markov Chain Monte Carlo (MCMC) [19, 160], and variational approximation [88].

2.4 Novel Physiological Measures

Quantifying synchrony has been a core theme in various psychophysiological studies. Dynamical systems have been used to represent self- and co-regulation between individuals [56, 70].
Couples' physiological linkage was further analyzed with bivariate time-series [85]. Most of these studies capture the association between average levels without considering the time-dependent signal variability. Our approach examines the joint evolution over time by selecting the dictionary atoms that best fit the ensemble of EDA signals.

Multivariate sparse models have been used in a variety of applications to represent joint signal streams with inherent low-dimensional information. Research in distributed communication and sensing has focused on recovering signal ensembles and characterizing the number of required sensors for reconstruction [154]. Multi-channel matching pursuit with data-dependent dictionaries has been proposed for electrocardiogram (ECG) [144] and electroencephalography (EEG) signals [49], as well as for spatial audio coding [67]. Joint sparse models have further found application in label propagation and action recognition [33]. Our work follows upon these by using Simultaneous Orthogonal Matching Pursuit (SOMP) [150] to identify common dictionary atoms of parallel EDA streams.

Part II Designing and Learning Knowledge-Driven Physiological Representations

Chapter 3 Designing Physiological Representations

Low-dimensional, informative, and meaningful representations are a central construct in any signal processing application. Many psychophysiological signals carry distinctive structures in time, making the use of sparse decomposition techniques appealing. We propose a knowledge-driven method to represent physiological signals through the use of sparse decomposition techniques and carefully designed knowledge-driven dictionaries. The small number of non-zero components in the representation contains important information about the signal characteristics, which can potentially be related to various medical conditions and psychological constructs.
For this reason, dictionaries have to be carefully designed so that they capture the signal variability and its underlying information. We propose the use of parameterized signal-specific dictionaries that are able to represent the main signal components (Section 3.1). The benefit of introducing these predetermined atoms is that we know their properties in advance, such as the amplitude and width of fluctuations, which are of interest in a variety of psychophysiological studies [24]. We further decompose physiological signals into a small number of atoms (Section 3.2), which are used to retrieve information about specific signal constructs, such as SCRs and QRS complexes (Section 3.3). We compare the proposed parametric representation to the least-squares parametric fit proposed by Lim et al. [89] and the wavelet transform [132, 54] (Section 3.4). Our data come from 37 children performing an attention-sustaining task for the EDA and the publicly available MIT-BIH Arrhythmia database for the ECG (Section 3.6). Compared to previous studies fitting a predetermined structure to the signal, results indicate that our approach provides benefits across all aforementioned criteria and provides a foundation for automatic measurement of signal components and the extraction of meaningful features (Section 3.7).

The work presented in this chapter was published in: T. Chaspari, A. Tsiartas, L.I. Stein, S.A. Cermak, and S.S. Narayanan, "Sparse Representation of Electrodermal Activity with Knowledge-Driven Dictionaries," IEEE Transactions on Biomedical Engineering, 62(3): 960-971, 2015; and R. Balasubramanian, T. Chaspari, and S.S. Narayanan, "A Knowledge-Driven Framework for ECG Representation and Interpretation for Wearable Applications," IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, 2017.
In the following, we will use $x = [x(1) \ldots x(L)]^T \in \mathbb{R}^L$ to denote a vector of length $L$; $x(t)$, $t = 1, \ldots, L$, its value at the $t$-th sample in time; and $\|x\|_p = \left(\sum_{t=1}^{L} |x(t)|^p\right)^{1/p}$ its $p$-order norm. The inner product between two vectors $x_1$ and $x_2$ will be referred to as $x_1^T x_2$. A matrix with $L_1$ rows and $L_2$ columns will be written in bold capital letters as $\mathbf{A} \in \mathbb{R}^{L_1 \times L_2}$.

3.1 Dictionary Design

Since our approach involves a knowledge-based representation of signals, the choice of dictionary atoms is critical. Dictionaries consist of different types of atoms, each corresponding to different signal components, benefiting the interpretability of our approach. Specifically for EDA, we introduce two kinds of atoms that model the tonic and phasic components of the signal (Table 3.1). The first group takes into account the tonic part, i.e., the slowly varying signal level. It is quantified with straight lines having an offset $\beta_0$ and a slope $\beta_1$. Tonic atoms will be referred to as $g_\beta \in \mathbb{R}^L$, where $\beta = (\beta_0, \beta_1)$ is an element of the set $B = \mathbb{R}^2$. The second group of atoms captures the phasic component of EDA, meaning the rapid fluctuations superimposed on the signal, also known as SCRs. We use three types of functions that can potentially represent the characteristic steep rise and slow decay of SCRs: the sigmoid-exponential, the Bateman, and the chi-square functions. The sigmoid-exponential was introduced in [89] for representing SCRs, with high values of $T_{rise}$ and $T_{decay}$ resulting in long rise and decay times. The Bateman function [16], inspired by the laws of diffusion at the skin pores, represents the shape of an impulse response, which when convolved with a driver function results in the EDA signal. Parameters $a$ and $b$ control the steepness of recovery and onset, respectively; the higher they are, the less time it takes for the SCR to transition to and from a peak.
Finally, the probability density function of the chi-squared distribution, referred to here as the chi-square function, is introduced by the authors of the present paper as an alternative way to represent the SCR shape. It involves only one parameter, $k$, with large values resulting in wider SCRs. In order to ensure large variety in the dictionary atoms, we considered two additional parameters common to all phasic atom types. The time scale $s$ is responsible for the compression and dilation of the atom in time, with lower values resulting in wider SCRs and vice-versa (Fig. 3.1). Since there is no a priori information about the location of SCRs within an analysis frame, the dictionary contains shifted versions of all atoms over a range of time offsets $t_0$ spanning one signal frame. We will refer to the three types of sigmoid-exponential, Bateman, and chi-square SCR atoms as $g^{(1)}_\theta \in \mathbb{R}^L$, $g^{(2)}_\theta \in \mathbb{R}^L$, and $g^{(3)}_\theta \in \mathbb{R}^L$, respectively, with corresponding parameters $\theta^{(1)} = (T_{rise}, T_{decay}, s, t_0) \in \Theta^{(1)}$, $\theta^{(2)} = (a, b, s, t_0) \in \Theta^{(2)}$, and $\theta^{(3)} = (k, s, t_0) \in \Theta^{(3)}$, where $\Theta^{(1)} = \mathbb{R}_+^4$, $\Theta^{(2)} = \mathbb{R}_+^4$, and $\Theta^{(3)} = \mathbb{R}_+^3$. By combining the possible parameters we created all tonic and phasic atoms, which are merged into three different dictionaries $D^{(1)} = \{g_\beta(t), g^{(1)}_\theta(t)\}$, $D^{(2)} = \{g_\beta(t), g^{(2)}_\theta(t)\}$, and $D^{(3)} = \{g_\beta(t), g^{(3)}_\theta(t)\}$, containing sigmoid-exponential, Bateman, and chi-square functions, respectively. In the case of the sigmoid-exponential dictionary (Table 3.1), for example, there are 9 possible values for scale $s$, 32 for time offset $t_0$, 9 for $T_{rise}$, and 10 for $T_{decay}$, resulting in 25,920 phasic atoms. Combining the 3 and 21 different values for $\beta_0$ and $\beta_1$, respectively, we get an additional 63 tonic atoms, resulting in a dictionary size of 25,983
We introduced more phasic than tonic atoms in order to capture the large variety of SCR shapes compared to the signal level, which remains fairly constant through an analysis frame. The parameter values (Table 3.1) were empirically set so that the resulting atom shapes are similar to the ob- served SCRs. In the future, we plan to automatically learn the dictionary parameters for more accurate representation. ECG-specific dictionaries contain three types of atoms. Signal levels are captured with straight lines expressed asg (t) = 0 +t, where 0 2f20;15;:::; 10g and 2f:03;:29;:::;:001; 0;:01;:02;:::;:3g are the offset and slope, respectively. We employ AM sinusoidal waveforms (Fig. 3.2a) to approximate the burst-type shape of the QRS complex, i.e. g (t) = exp b a (1 cos (a(tt 0 ))) cos ( (tt 0 ) +), where a2f:01;:02;:::;:08g andb2f1; 1:5; 2; 2:5g are the modulating factor parameters,2 f:9; 1:3g is the phase, = 7 the carrier frequency, andt 0 the time shift. We further use zero-order Hermite polynomials (Fig. 3.2b) for the P and T waves with equationg (t) = 1 p w p exp (tt 0 ) 2 =2w 2 , wherew2f1; 3:5;:::; 38:5g andt 0 are the time scale and shift, respectively. The morphology of several abnormal beats (e.g. premature ventricular beat) led us to introduce negated AM sinusoidal and Hermite atoms, i.e. g (t) =g (t) and g (t) =g (t). Time shift ranged withint 0 2f600;588;:::; 600g (in samples) in order to capture all possible locations of the P, QRS, T waves within the analysis frame, as well as to represent partially observed waves at the edges of the frame. Atoms whose non-zero region lied outside the analysis frame were omitted, resulting in 57,817 atoms: 427 straight lines, 14,412 Hermite, and 42,978 AM sinusoidal. The parameters of the dictionary atoms were empirically selected based on data inspection and consistent with the expected ECG shapes, e.g. the time duration of the Hermite (P, T) is larger than the AM sinusoidal (QRS) [75]. 
3.2 Sparse Decomposition

Sparse representation of a signal $f \in \mathbb{R}^L$ can be described by the following equation, posing a non-convex problem [50]:

$\min_c \|c\|_0$ subject to $f = Dc$ (3.1)

where $D \in \mathbb{R}^{L \times Q}$ is an over-complete dictionary matrix with $Q$ prototype signal-atoms and $c \in \mathbb{R}^Q$ are the atom coefficients with $N \ll Q$ non-zero elements. One way to approach this NP-hard problem is to use greedy strategies, which abandon exhaustive search in favor of locally optimal updates. The most well-known are MP [94] and OMP [110, 45], reaching sub-optimal solutions. Another way of solving Eq. 3.1 is through relaxation of the discontinuous $l_0$-norm, leading to basis pursuit (BP) [32] and the focal underdetermined system solver (FOCUSS) [119, 118]. Despite the fact that these relaxation approaches can obtain global solutions, they are far more complicated and less efficient in terms of running time [50], rendering MP algorithms compelling because of their simplicity and low computational cost. Theoretical guarantees of correctness [149], along with empirical experiments [144, 116, 114], led us to the use of MP and OMP in the present paper. (O)MP will denote cases referring to both these algorithms.

Let $D$ be a dictionary described in Section 3.1 containing a total of $Q$ atoms. According to MP [94], the initial signal $f$ can be written as a linear combination of infinitely many atoms as follows:

$f = \sum_{n=0}^{\infty} c_n g_{\gamma_n}$ (3.2)

where $c_n$ are the atom coefficients. MP selects a set of atoms from $D$ that best matches the signal structure. If $g_{\gamma_0} \in D$ is the atom selected at the first iteration, the signal $f$ can be written as:

$f = (f^T g_{\gamma_0})\, g_{\gamma_0} + Rf$ (3.3)

where $Rf$ is the signal residual after approximating $f$ in the direction of $g_{\gamma_0}$. The selection of $g_{\gamma_0}$ is greedily performed in order to maximize the similarity of the atom to the original signal, i.e., such that $f^T g_{\gamma_0}$ is maximum. Note that in the selection criterion, we did not use the absolute value, as in the original MP algorithm.
If we allowed negative inner product values, i.e., negative atom coefficients, the shape of the predefined atoms would change, resulting in a totally different signal interpretation. Negative coefficients of the SCR atoms, for example, would not have any physical meaning and would disturb the structure of the original EDA signal. After the first iteration, the residual $Rf$ is similarly decomposed by selecting an atom from dictionary $D$ that best matches its structure. Iteratively, the residual $R^{n+1}f$ can be written as:

$R^{n+1}f = R^n f - \left((R^n f)^T g_{\gamma_n}\right) g_{\gamma_n}$ (3.4)

where $g_{\gamma_n} \in D$ is selected to maximize $(R^n f)^T g_{\gamma_n}$. After $N$ iterations, the original signal $f$ is approximated by $f_N$:

$f_N = \sum_{n=0}^{N-1} \left((R^n f)^T g_{\gamma_n}\right) g_{\gamma_n}$ (3.5)

Since processing of long-duration signals is involved, we perform segmentation and frame-by-frame analysis. Let the superscript $(k)$ denote the $k$-th time frame of length $L$, where $k = 1, \ldots, K$, with $K$ being the total number of frames. Therefore the $k$-th frame of the original signal, in terms of the atoms selected by MP, can be approximated by:

$f^{(k)}_N = \sum_{n=0}^{N-1} \left((R^n f^{(k)})^T g_{\gamma_n}\right) g_{\gamma_n}$ (3.6)

We use a step of $L/3$ samples in order to avoid discontinuities introduced by the start and end of the signal frame. As will be described in Section 3.3, SCR detection is based on the middle part of each signal frame for the same reason. A similar approach has been followed with other bio-signals [162].

OMP [110, 45, 149, 26] is a refinement of MP which adds a least-squares minimization to obtain the best approximation over the atoms that have already been chosen at each iteration. Let $D_{S_N} = [g_{\gamma_0}, \ldots, g_{\gamma_{N-1}}] \in \mathbb{R}^{L \times N}$ be the matrix containing the selected atoms after $N$ iterations. The least-squares approximation $f_N$ of signal $f$ is:

$f_N = D_{S_N} \left(D_{S_N}^T D_{S_N}\right)^{-1} D_{S_N}^T f$ (3.7)

where the term $\left(D_{S_N}^T D_{S_N}\right)^{-1} D_{S_N}^T f$ corresponds to the $N$ non-zero coefficients used to scale the selected atoms.
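To make the (O)MP machinery concrete, here is a compact pure-Python sketch of OMP with the positive-inner-product selection rule discussed in this section, including the least-squares re-fit of Eq. 3.7. The helper functions and the toy example are ours, not the thesis implementation; a practical version would operate on the large parametric dictionaries and framed signals described earlier.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def solve(A, b):
    """Solve the small square system A x = b by Gauss-Jordan elimination."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                fct = M[r][col] / M[col][col]
                M[r] = [mr - fct * mc for mr, mc in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def omp(f, dictionary, n_iter):
    """OMP sketch: at each step pick the atom maximizing (R^n f)^T g
    (no absolute value, per the selection rule in this section), then
    re-fit all selected atoms to f by least squares (Eq. 3.7)."""
    residual, selected, coeffs = f[:], [], []
    for _ in range(n_iter):
        best = max(dictionary, key=lambda g: dot(residual, g))
        # Stop when no atom matches with a positive coefficient, and never
        # re-select an atom (OMP keeps the residual orthogonal to them).
        if dot(residual, best) <= 0 or best in selected:
            break
        selected.append(best)
        gram = [[dot(gi, gj) for gj in selected] for gi in selected]
        coeffs = solve(gram, [dot(g, f) for g in selected])
        approx = [sum(c * g[t] for c, g in zip(coeffs, selected)) for t in range(len(f))]
        residual = [ft - at for ft, at in zip(f, approx)]
    return selected, coeffs, residual
```

For plain MP, the re-fit step would be dropped and each coefficient fixed to the inner product at selection time; OMP's extra least-squares step is what guarantees the residual is orthogonal to every atom chosen so far.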
OMP updates the residual by projecting the original signal onto the linear subspace spanned by the selected atoms; therefore it never selects the same atom twice. Similar expressions hold for the signal of the $k$-th time frame.

A schematic representation of the way OMP sparsely decomposes a signal frame is presented in Fig. 3.3. The original and reconstructed signals are shown with solid and dashed lines (Fig. 3.3a). OMP first selects a straight-line atom in order to represent the signal level, and then two phasic atoms centered around the 100th and 220th samples to capture the two SCRs. The original normalized atoms, as well as the resulting phasic atoms when scaled with the coefficients computed by OMP, are shown in Figs. 3.3b and 3.3c. We also notice that the phasic atom centered around sample 100 has the highest energy (Fig. 3.3c), since it represents the wider SCR of the frame. Finally, the third selected phasic atom is centered towards the end of the first annotated SCR, around sample 140. This happens because the first phasic atom did not capture the entire energy of the corresponding SCR, and therefore another atom with a much smaller coefficient was introduced. Figs. 3.4a-b provide an example of a typical ECG and the selected dictionary atoms, which will be used for interpreting the signal.

3.3 Signal Information Retrieval

The parametric nature of dictionary atoms can afford us insights about the morphology of the corresponding signal. We demonstrate how we can use the selected atoms to detect SCRs and QRS complexes in the EDA and ECG, respectively, and how the ECG-specific dictionaries yield features capable of distinguishing signal abnormalities due to several pathologies.

3.3.1 SCR Detection in EDA

SCRs in the EDA signal were detected from the phasic atoms selected by the sparse decomposition algorithm.
Let $\{g^{(k)}_m : m = 1, \ldots, M\}$ be the phasic atoms selected after $N$ (O)MP iterations at the $k$-th signal frame, with $M < N$, as at least one tonic atom will compensate for the signal level. Since the dictionary is predetermined, we know exactly the locations of the maximum amplitudes of the phasic atoms, denoted as $p^{(k)}_m$. As shown in Fig. 3.3 for the atoms centered around samples 100 and 140, more than one phasic atom might be selected to represent a single SCR, because the fit between dictionary atoms and signal is not always perfect. Taking advantage of their time proximity, we can group phasic atoms according to their location with a histogram of the $p^{(k)}_m$ values with $N_b$ bins. Each non-empty histogram bin $h$ will contain all phasic atoms $\{g^{(k)}_{m_h}\}$ capturing one SCR, which is represented by the linear combination of the grouped phasic atoms:

$SCR^{(k)}_h(t) = \sum_{\{g^{(k)}_{m_h}\}} c^{(k)}_{m_h} g^{(k)}_{m_h}(t)$ (3.8)

The weights $c^{(k)}_{m_h}$ correspond to the atom coefficients computed by (O)MP. Because of the simple shape of the above SCR signal, we are able to find its amplitude and time of peak by simply locating its maximum point. SCRs with amplitude less than a predetermined threshold $a_{thr}$ were omitted [46], in accordance with the manual annotation (Section 3.6). To estimate the final SCR locations for the whole signal, we merged the SCRs from all analysis frames. Since the starting and ending samples of each frame are more prone to errors due to discontinuity issues, we only take into account SCRs located in the middle $L/3$ samples of each frame. The value $L/3$ was chosen to coincide with the length of the analysis step in order to ensure that each SCR is taken into account once. For the first and last frames we also consider SCRs from the first $L/3$ and last $L/3$ samples, respectively. A signal frame of 10 sec (320 samples), for example, will result in an analysis step of approximately 3.3 sec (107 samples), in which the shape of an SCR is easily identifiable, as shown in Fig. 3.3.
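The grouping scheme just described can be sketched as follows. The input format is hypothetical (each selected phasic atom is given as its peak location plus the coefficient-scaled atom samples), and a real implementation would additionally restrict detections to the middle third of each frame and apply the peak-mapping post-processing described next.

```python
def detect_scrs(phasic_atoms, frame_len, n_bins, a_thr):
    """Group selected phasic atoms by peak location and extract one SCR per
    non-empty histogram bin, in the spirit of Eq. 3.8 (sketch only).

    phasic_atoms: (peak_index, scaled_atom) pairs, where scaled_atom is the
    atom already multiplied by its (O)MP coefficient (hypothetical format).
    Returns (peak_time, amplitude) pairs for SCRs with amplitude >= a_thr.
    """
    bin_width = frame_len / n_bins
    bins = {}
    for peak, atom in phasic_atoms:
        # Histogram grouping of atoms by their known peak location.
        bins.setdefault(int(peak // bin_width), []).append(atom)
    scrs = []
    for grouped in bins.values():
        # The SCR is the sum of the grouped, coefficient-scaled atoms.
        scr = [sum(vals) for vals in zip(*grouped)]
        amp = max(scr)
        # Amplitude thresholding, mirroring the manual annotation criterion.
        if amp >= a_thr:
            scrs.append((scr.index(amp), amp))
    return sorted(scrs)
```

Because each bin's SCR is a simple unimodal shape, its amplitude and peak time fall out directly from the maximum of the summed waveform, with no extra curve analysis.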
Since our approach involves local decisions made on each signal frame, a post-processing step that takes into account the global EDA shape is necessary. Simple peak detection was performed, and the located SCRs were then mapped to the nearest detected peak within a distance threshold $d_{thr}$. SCRs not paired with a neighboring peak were omitted, and those matched to the same peak were only counted once. In Fig. 3.3, for example, the phasic atoms centered around samples 100 and 140 were grouped together based on the proximity of their locations and were further mapped to the peak at sample 110 to represent the first SCR. In order to capture the second SCR, the phasic atom centered around sample 220 was mapped to the peak at sample 230.

3.3.2 QRS Detection in ECG

We further demonstrate how we can use the selected atoms to detect the QRS complex (i.e., the peak of the R wave) through a heuristic-based framework and a machine learning approach.

The heuristic framework takes into account the locations and coefficients of the selected AM sinusoidal atoms, which are designed to capture the QRS complex. Let $I^+$ and $I^-$ be the indices of the selected AM sinusoidal and negated AM sinusoidal atoms for the current analysis frame, and $c_k$, $k \in I^+$, and $c_l$, $l \in I^-$, be the corresponding coefficients. QRS complexes are likely to be represented by AM sinusoidal atoms with high coefficients; therefore we first find the AM sinusoidal atoms whose coefficients $c_i$ are higher than a fraction $p = .3$ of the maximum coefficient within the frame: $I = \left\{ i : i \in I^+ \cup I^- \wedge c_i > p \max\left( \{c_k\}_{k \in I^+} \cup \{c_k\}_{k \in I^-} \right) \right\}$.

The set $I$ might contain more than one AM sinusoidal atom for one R peak, especially for abnormal beats. Given this observation, we perform a histogram grouping based on the locations of the AM sinusoidal atoms from the set $I$. For example, in Fig. 3.4c two AM sinusoidal atoms were selected for each QRS complex and were binned together by our algorithm.
Since the histogram might contain negated atoms (from $I^-$), the maximum (or minimum) location of the atom with the highest coefficient in each bin is mapped to the closest maximum (or minimum) point on the actual ECG signal. If two QRS segments were located closer than 25 samples, we ignore the one with the lower amplitude.

While heuristic rules can reliably detect normal beats, the large variability of ECG signals, confounded by the presence of a wide range of abnormalities, led us to the development of an automatic decision-making framework. According to this, we divide each analysis frame determined by the ECG representation into smaller segments of $S = 25, 50, 75$ samples (Fig. 3.4d). We then cast the QRS detection as a classification problem, in which each small segment is classified on whether it contains an R peak or not.

The features for classification are the parameters and coefficients of the two AM sinusoidal and the two Hermite atoms with the highest coefficients whose locations are within the corresponding segment. If the current segment does not contain as many atoms, we use those closest to the center of the segment. We further include the distance between the peak of each selected atom and the center of the corresponding segment, as well as the number of AM sinusoidal and Hermite atoms within the segment, in order to get a measure of the corresponding signal energy. This results in 20 features in total. We tested different feature matrices by varying the number of OMP iterations in the ECG representation (10, 15, 20). In contrast to the heuristic approach, this method incorporates additional information from the Hermite atoms, which are particularly useful in the case of morphologically abnormal QRS complexes. Classification is performed using an 8-fold cross-validation with random forests, because of their ability to handle the continuous and discrete values of our feature space.
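A sketch of the segment-level feature assembly might look as follows. The record format and field names are our own invention; note also that the thesis uses a fixed 20-dimensional feature vector, whereas this simplified sketch returns a shorter list when too few atoms of a type exist, where a real implementation would pad with missing values.

```python
def segment_features(atoms, seg_start, seg_len, n_top=2):
    """Assemble a feature vector for one ECG segment, loosely following the
    scheme in Section 3.3.2 (field names and layout are hypothetical).

    atoms: dicts with keys 'kind' ('am' or 'hermite'), 'peak' (sample
    index), 'coeff' (atom coefficient), and 'params' (parameter tuple).
    """
    center = seg_start + seg_len / 2.0
    feats = []
    for kind in ('am', 'hermite'):
        pool = [a for a in atoms if a['kind'] == kind]
        inside = [a for a in pool if seg_start <= a['peak'] < seg_start + seg_len]
        # Count of atoms inside the segment: a proxy for local signal energy.
        feats.append(len(inside))
        # Top-n atoms by coefficient; when fewer than n_top lie inside the
        # segment, fall back to the atoms closest to the segment center.
        chosen = sorted(inside, key=lambda a: -a['coeff'])[:n_top]
        for a in sorted(pool, key=lambda a: abs(a['peak'] - center)):
            if len(chosen) >= n_top:
                break
            if a not in chosen:
                chosen.append(a)
        for a in chosen:
            # Coefficient, peak-to-center distance, then atom parameters.
            feats.extend([a['coeff'], a['peak'] - center, *a['params']])
    return feats
```

Feature vectors of this kind, one per segment, would then feed the random-forest classifier that decides whether the segment contains an R peak.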
Based on the binary classification, we detect the actual R peak locations as follows. In the absence of an R peak decision, we perform no further action. However, if the ECG segment was classified as having an R peak, we detect its maximum location at the actual signal (or minimum, if the AM sinusoidal with the maximum coefficient within the segment was a negated atom). If a QRS complex was not detected for more than five consecutive segments (0.35-1 sec), we locate the segment containing the atom with the highest coefficient and add a QRS based on the maximum location of the actual signal (or minimum, if this atom was negated). We combine the local decisions from all analysis frames by making sure that two QRS peaks are more than 25 samples apart.

3.3.3 Beat Classification in ECG

The presence of abnormal ECG beats, which are morphologically different from normal ones, can be indicative of various heart conditions. We demonstrate that the ECG-specific dictionaries yield features capable of distinguishing normal from abnormal beats (2-way classification), as well as the common abnormalities, such as atrial premature beat, paced beat, and premature ventricular contraction (3-way). We further combine the decisions from the above tasks into a 4-way classification by assigning the labels from the 3-way task to the abnormal beats detected during the binary classification. In order to capture the morphology of each QRS, the proposed features include the parameters and coefficients of the selected AM sinusoidal and Hermite atoms located within half the R-R distance from the R peak. We further consider the Hermite atoms with the highest coefficients lying within a 0.65 to 0.15 fraction of the R-R distance on the left to capture possible abnormalities in the P wave, and similarly on the right to represent the T wave. In case no atom was found within the predetermined boundaries, the corresponding features were replaced by missing values.
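Returning to the R-peak localization step described above, a minimal sketch (assuming positive, non-negated atoms, so peaks are placed at signal maxima; function and variable names are illustrative) is:

```python
import numpy as np

def locate_r_peaks(ecg, seg_decisions, S, min_dist=25):
    """For each segment classified as containing an R peak, place the peak
    at the maximum of the raw signal within that segment; then merge any
    peaks closer than min_dist samples, keeping the larger-amplitude one."""
    peaks = []
    for i, has_peak in enumerate(seg_decisions):
        if has_peak:
            seg = ecg[i * S:(i + 1) * S]
            peaks.append(i * S + int(np.argmax(seg)))
    kept = []
    for p in peaks:
        if kept and p - kept[-1] < min_dist:
            if ecg[p] > ecg[kept[-1]]:
                kept[-1] = p  # replace the smaller neighboring peak
        else:
            kept.append(p)
    return kept
```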
The distance between the center of each atom and the R peak was also used to get an estimate of how close the selected atoms are to the actual QRS complex. We finally include the R-R interval lengths on either side of the current peak and the ratio of the left R-R length to the right one, resulting in 20 features. In the case of the first beat of our working example (Fig. 3.4e), we used the parameters of one AM sinusoidal and two Hermite atoms. Since no Hermite is located close to the QRS, the corresponding features were replaced by missing values. This would not be the case for an abnormal beat, whose QRS complex is more likely to be represented by AM sinusoidal and Hermite atoms. Classification is performed with a leave-one-subject-out cross-validation using a decision tree whose optimal pruning level was determined through a nested cross-validation on the train set. We compute the unweighted accuracy (UA) of classification, because of the unbalanced class distribution.

3.4 Description of Baseline Models

We compare our approach against the least-squares fit EDA representation model by Lim et al. [89], since it is conceptually the closest to our method involving a parameterization of the signal. The authors proposed an eight-parameter model that incorporates the signal tonic level, at most two SCRs, and a recovery phase. SCL is represented by a straight line with offset $c$. SCRs are captured by sigmoid-exponential functions with common rise time $t_r$ and decay time $t_d$. They are located at time offsets $T_{os_1}$, $T_{os_2}$ and have amplitudes $g_1$, $g_2$, respectively. Finally, the recovery phase is assumed to follow the exponential decay of SCRs with the same decay time constant $t_d$ and an initial amplitude $a_0$.
Therefore, the eight-parameter model representing an EDA signal frame can be written as:

$f(t) = c + a_0 e^{-t/t_d} + f_s(t; g_1, T_{os_1}) + f_s(t; g_2, T_{os_2})$ (3.9)

$f_s(t; g_i, T_{os_i}) = \dfrac{g_i\, e^{-(t - T_{os_i})/t_d}}{\left[1 + \left(\frac{t - T_{os_i}}{t_r}\right)^{-2}\right]^2}, \quad i = 1, 2$ (3.10)

Non-linear least-squares fitting [95] was used to estimate the model parameters from the data. The time offset and amplitude of the SCRs were initialized from peak detection. If fewer than two peaks were detected in a signal frame, only the corresponding parameters were estimated and the remaining ones were set to zero.

The baseline for the ECG includes wavelet features, which are widely used in this task [54, 132]. Decomposition was performed with Daubechies 2 and 4 wavelets, and statistical measures of maximum, minimum, and variance were extracted at each of the 8 levels. The statistics at the last level are redundant and therefore ignored, resulting in 21 features in total.

3.5 Evaluation

Evaluation of the proposed model is performed with respect to the quality of signal reconstruction, compression rate, and ability to capture meaningful components.

3.5.1 Signal Reconstruction and Compression Rate

Signal reconstruction is evaluated with the Root Mean Square (RMS) error, defined as:

$RMSErr = \frac{1}{K} \sum_{k=1}^{K} \sqrt{\frac{1}{L} \sum_{l=1}^{L} \left(f^{(k)}(l) - f_N^{(k)}(l)\right)^2}$ (3.11)

where $f^{(k)}(l)$ denotes the value of the $l$th sample of the $k$th signal frame, with $k = 1, \ldots, K$ and $l = 1, \ldots, L$; similarly for the reconstructed signal $f_N^{(k)}(l)$ with $N$ (O)MP iterations. Low RMS error values reflect high-quality signal representation. Compression rate is evaluated as the number of bits used for representing the EDA signal over 1 sec, with low values indicating better compression ability.

3.5.2 Signal Information Retrieval

Evaluation with RMS error provides a limited view of the representation quality, since an overly complex model may result in perfect representation of the existing data, but might perform poorly on unseen patterns [48].
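For reference, the frame-averaged RMS error of Eq. (3.11) can be computed as in the sketch below; the function name and the frames-as-rows layout are illustrative assumptions:

```python
import numpy as np

def rms_err(frames, recon):
    """RMS error of Eq. (3.11): per-frame RMS deviation between original
    frames f^(k) and reconstructions f_N^(k) (rows of equal length L),
    averaged over the K frames."""
    frames = np.asarray(frames, dtype=float)
    recon = np.asarray(recon, dtype=float)
    per_frame = np.sqrt(np.mean((frames - recon) ** 2, axis=1))
    return float(np.mean(per_frame))
```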
One technique to avoid overfitting with greedy algorithms, such as (O)MP, is early stopping, which may result in a better estimate of the original signal without introducing too many erroneous or noisy entries [58, 59]. An alternative way is the use of quantitative and qualitative criteria, depending on the application, that assess the amount of meaningful information captured by the proposed models [91, 112]. We measure the ability of our approach to detect SCRs and QRS complexes in the EDA and ECG signals, respectively.

SCR detection is assessed in two ways, each containing complementary accuracy metrics. We first quantify the quality of SCR representation based on the number of detected SCRs and their distance from the human-annotated reference ones. Let $S_r = \{p_i^r\}$ and $S_d = \{p_i^d\}$ be the sets of locations of real and detected SCRs, respectively. The absolute relative difference between the number of real and detected SCRs captures the portion of hand-annotated SCRs that is correctly located by our system:

$RDiff = \frac{\big| |S_r| - |S_d| \big|}{|S_r|}$ (3.12)

where $|\cdot|$ denotes the cardinality of a set. In order to see how many of the detected SCRs are correct, we compute the mean distance of each detected SCR from the closest real one:

$MinDist = \frac{1}{M_d} \sum_{i=1}^{M_d} \min_{p_j^r} \big| p_i^d - p_j^r \big|$ (3.13)

where $M_d = |S_d|$. We further calculate precision and recall measures. If $S_{rd} = \{p_i^d : \exists j \text{ s.t. } |p_i^d - p_j^r| < d_{max}\}$ are the estimated SCRs whose distance to their closest real SCR is less than a maximum threshold $d_{max}$, precision equals the portion of detected SCRs that are close to a real one within the distance boundary $d_{max}$: $Precision = |S_{rd}| / |S_d|$.
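The count- and distance-based metrics above (Eqs. 3.12-3.13, plus the precision measure) can be sketched as follows; function names are illustrative:

```python
import numpy as np

def rdiff(real, detected):
    """Absolute relative difference between SCR counts, Eq. (3.12)."""
    return abs(len(real) - len(detected)) / len(real)

def min_dist(real, detected):
    """Mean distance of each detected SCR to its closest real SCR, Eq. (3.13)."""
    real = np.asarray(real)
    return float(np.mean([np.min(np.abs(real - d)) for d in detected]))

def precision(real, detected, d_max):
    """Fraction of detected SCRs lying within d_max samples of some real SCR."""
    real = np.asarray(real)
    hits = [d for d in detected if np.min(np.abs(real - d)) < d_max]
    return len(hits) / len(detected)
```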
Recall captures the fraction of hand-annotated SCRs that are detected by our system within the same boundary: $Recall = |S_{rd}| / |S_r|$. The F-score is the harmonic mean of precision and recall, providing a single measure from these complementary ones:

$Fscore = \frac{2 \cdot Precision \cdot Recall}{Precision + Recall}$ (3.14)

These goodness-of-detection measures improve as the distance boundary $d_{max}$ increases, since evaluation becomes more forgiving of errors. Note that the distance boundary $d_{max}$ used for evaluation of SCR detection is different from the distance threshold $d_{thr}$ for mapping SCRs detected from the phasic atoms to the signal peaks (Section 3.3). One can observe that precision is related to the MinDist metric quantifying how close an estimated SCR is to a real one. On the other hand, recall is closer to RDiff, which captures the algorithm's ability to reliably detect the hand-annotated SCRs.

Similarly for the ECG, we evaluate the detected QRS complexes against the ground truth provided by human annotators. We use dynamic time warping (DTW) to get an optimal match between the detected and annotated R peaks. We calculate the F-score using a maximum distance threshold (3-10 samples) between the DTW-matched R peaks. We used the classical method of Pan and Tompkins [109] as a baseline for this task.

3.6 Data Description

Our data contain ten-minute recordings of 37 children watching a preferred television show in order to sustain attention and keep the subject stationary. Two silver-silver chloride electrodes were placed on the index and middle finger of the child's non-dominant hand. These were connected to the BIOPAC MP150 system, which recorded EDA at a sampling rate of 32 Hz. Details about the experiment can be found in [140]. High-frequency noise artifacts were removed with a low-pass Blackman filter [108] of 32 samples (1 sec). Annotation of SCRs was performed by an expert using the visualization capabilities of BIOPAC's AcqKnowledge software.
SCR scoring involved visual inspection of the denoised EDA signal within a 30 sec time frame and detection of signal "peaks" with the typical steep rise and slow decay. Each SCR amplitude was visually determined (Fig. 2.1) and a minimum threshold of 0.05 μS was set for an SCR to be eligible for annotation [46]. In order to ensure reliable annotations, 25% of the data were double-coded, yielding an inter-annotator agreement of 96%.

We also worked with the publicly available MIT-BIH Arrhythmia database [103] containing 48 half-hour excerpts of ambulatory ECG recordings with a sampling frequency of 360 Hz. Two or more cardiologists went independently through each recording and annotated the location and type of each beat, resulting in approximately 109,000 beat labels, out of which 35,080 are abnormal. From the latter set, 2539, 7028, and 7075 samples are labelled as atrial premature beat, paced beat, and premature ventricular contraction, respectively.

3.7 Experiments

We evaluate the performance of our method based on the collected and annotated data (Section 3.6) and compare it against the baseline approaches.

Our results indicate that the proposed dictionaries can capture the typical signal shapes, as well as the morphology of several abnormalities (Fig. 3.5). These are further supported by low reconstruction errors and compression rates (Fig. 3.6a); the latter are around 14.2 to 213 times smaller for the EDA (Fig. 3.7) and 10 times smaller for the ECG (Fig. 3.8) compared to the original signal.

We further observe the usefulness of our model for locating SCRs and ECG beats (Figs. 3.6b-f, 3.9). F-scores are higher than least-squares approaches for the EDA [89] and comparable to the Pan-Tompkins algorithm for the ECG [109]. Machine learning based QRS detection slightly outperforms the heuristic one, probably because of its ability to take larger signal context into account, with larger segment lengths being more beneficial.
The proposed model can capture valuable information about the morphology of the ECG signal and demonstrates better discriminative properties compared to the baseline wavelet measures (Table 3.2).

3.8 Discussion

Our approach provides a unified foundation for different types of ECG analyses. In wearable applications, such models are beneficial for data transmission and compression. Conditioned on the fact that the dictionary is known, we can transmit and store only the indices of the selected atoms and their corresponding coefficients. Given the initial signal representation, it is possible to build applications that can detect meaningful signal characteristics (e.g. QRS detection) and interpret the underlying physiological condition (e.g. beat classification). The parametric nature of the designed dictionaries allows the use of heuristic rules, which can operate in an unsupervised way and be computationally more efficient than purely data-driven machine learning approaches. These can be extended to a wide range of applications involving physical activity and well-being (e.g. heartbeat tracking during exercise, detection of increased stress levels, tracking of patients with heart conditions).

Limitations of our approach include the use of empirical values for the atom parameters, which can be alleviated with data-driven dictionary learning. Although component detection and beat classification results are comparable to, and sometimes better than, the considered baselines, they can still be outperformed by other approaches in the literature. However, our goal was not to produce state-of-the-art models tuned for one task, but to demonstrate the feasibility of the proposed unified approach for multiple tasks of interest.

3.9 Conclusions

We propose a knowledge-driven physiological model that sparsely decomposes the signal into a small number of atoms from a dictionary.
We focus on the dictionary design, which contains carefully selected atoms for the different parts of the signal (i.e. the signal trends and various fluctuations). With appropriate post-processing we automatically retrieve the important signal components (i.e. the SCRs and QRS complexes). Evaluation of our approach includes signal reconstruction, component detection, and compression rate analysis. These measures improve upon the least-squares fit model, which uses a parameterized template, providing benefits in terms of all criteria.

Table 3.1: Equations and parameter values of dictionary atoms representing tonic and phasic EDA components.

Straight line (tonic): $g_\theta(t) = \theta_0 + \theta_1 t$, with $\theta_0 \in \{20, 10, 1\}$ and $\theta_1 \in \{-0.010, -0.009, \ldots, -0.001, 0, 0.01, 0.02, \ldots, 0.10\}$.

Sigmoid-exponential (phasic): $g_\theta^{(1)}(t) = \dfrac{e^{-(st - t_0)/T_{decay}}}{\left[1 + \left(\frac{st - t_0}{T_{rise}}\right)^{-2}\right]^2}\, u(t - t_0)$, with $T_{rise} \in \{0.5, 0.8, \ldots, 2.9\}$, $T_{decay} \in \{0.3, 0.6, \ldots, 3\}$, and $s \in \{0.06, 0.07, \ldots, 0.14\}$.

Bateman (phasic): $g_\theta^{(2)}(t) = \left(e^{-a(st - t_0)} - e^{-b(st - t_0)}\right) u(t - t_0)$, with $a \in \{0.2, 0.3, \ldots, 2\}$, $b \in \{0.4, 0.6, \ldots, 2\}$, $a < b$, and $t_0 \in \{0, 10, 20, \ldots, 310\}$.

Chi-square (phasic): $g_\theta^{(3)}(t) = \chi^2(st - t_0; k)\, u(t - t_0)$, with $k \in \{2.70, 2.73, \ldots, 5.37\}$.

Here $u(t) = 1$ for $t \geq 0$ and $u(t) = 0$ otherwise, and $\chi^2(t; k)$ denotes the value of the $k$-degree chi-square distribution at point $t$.

Table 3.2: Unweighted accuracy for classifying between normal and abnormal beats (2-way), across atrial premature beat, paced beat, and premature ventricular contraction (3-way), and across all the above (4-way).
Features                       2-way   3-way   4-way
Daubechies Wavelets 2          73.1    49.8    50.2
Daubechies Wavelets 4          74.0    50.3    46.2
Proposed ECG Representation    78.4    68.6    60.2

Figure 3.1: Examples of normalized phasic atoms represented with sigmoid-exponential, Bateman and chi-square functions for different time-scale parameters $s = 0.06, 0.1, 0.14$ and time offset $t_0 = 20$. For each time-scale value, plots were created using all combinations of the corresponding atom-specific parameters.

Figure 3.2: Example of zero-order Hermite and AM sinusoidal atoms.

Figure 3.3: EDA representation scheme and SCR detection for an example signal frame. (a) Input and reconstructed signal with solid blue and dashed red lines. The location of expert hand-annotated SCRs is marked (red), along with the signal peaks (magenta) and the SCRs estimated based on the phasic atoms of the sparse decomposition algorithm (green). The final SCRs (black) are located by combining the SCRs from the phasic atoms according to their location and mapping them to the signal peaks.
(b) The normalized dictionary atoms selected by the first 4 iterations of OMP. The first tonic atom (solid cyan line) captures the signal level, the first and third phasic atoms (dashed magenta and dash-dotted green lines) the first SCR, while the second phasic atom (black dotted line) the second SCR. (c) The normalized phasic atoms multiplied by the corresponding OMP coefficients (dashed magenta, dotted black and dash-dotted green lines). The energy of each atom is indicative of the order in which it was selected by OMP, with higher-energy atoms selected first.

Figure 3.4: Sparse representation of an exemplar ECG with 10 orthogonal matching pursuit (OMP) iterations. (a) Original and reconstructed ECG. (b) The first nine selected AM sinusoidal (AM sin.) and Hermite atoms scaled with OMP-derived coefficients. (c) The AM sinusoidal atoms used in the heuristic QRS detection and their grouping. (d) The segments of length $S$ used to classify the presence (dark grey) or absence (light grey) of an R peak during the machine learning based QRS detection. (e) The atoms used to represent the morphology of the P, QRS, and T waves during beat classification.
Figure 3.5: Example original and reconstructed electrocardiogram (ECG) signals using 10 and 20 orthogonal matching pursuit (OMP) iterations for different beat types (normal beat, paced beat, premature ventricular contraction). Same legend applies to all plots.

Figure 3.6: EDA reconstruction and SCR detection results. (a) Root mean square (RMS) error between original and reconstructed signal with respect to (w.r.t.) the number of (orthogonal) matching pursuit ((O)MP) iterations. (b) Absolute relative difference between the number of real and estimated SCRs w.r.t. the number of (O)MP iterations.
(c) Mean distance of estimated SCRs from their closest real SCR w.r.t. (O)MP iterations. (d,e,f) Precision, recall and F-score of SCR detection with 6 (O)MP iterations w.r.t. the maximum distance threshold $d_{max}$ between real and detected SCRs, the latter ranging between 10-100 samples.

Figure 3.7: Compression rate of the original EDA signal and the EDA representation with the proposed sparse decomposition and the least-squares fit methods. (Y-axis break between 100 and 900 bits/sec)

Figure 3.8: Relative root mean square (RMS) error between original and reconstructed ECG signals and compression rate computed over 1-20 orthogonal matching pursuit (OMP) iterations.

Figure 3.9: QRS detection F-score plotted against different maximum distance thresholds between the real and detected R peaks. Results for the machine learning approach are obtained with a variety of orthogonal matching pursuit iterations ($K$) and segment lengths ($S$).

Chapter 4
Learning Physiological Representations

Parametric dictionaries can increase the ability of sparse representations to meaningfully capture and interpret the underlying signal information, such as encountered in biomedical problems. The previous chapter introduced the use of knowledge-driven dictionaries in order to reliably represent physiological signals and extract informative characteristics. However, the aforementioned design of manual dictionaries is usually labor-intensive and requires careful signal inspection.
In order to automate this time-consuming process and potentially generalize our approach to other signals of predefined characteristic structure, we would like to be able to automatically learn the parameters of these dictionaries.

We propose a sparse Bayesian framework for learning parametric dictionaries because of its ability to provide full posterior estimates, take uncertainty into account, and generalize to unseen data. Compared to previous Bayesian DL [88, 160], our approach introduces parametric dictionaries, which cause non-closed-form solutions handled with a combination of Gibbs and Metropolis-Hastings (MH) sampling (MH-within-Gibbs). Our approach differs from previous parametric DL [6, 159] because of its stochastic framework that yields estimation of the full problem variables. This results in parametric dictionaries that take into account the structure of the training data and are less prone to overfitting.

The work presented in this chapter is published in the following journal: T. Chaspari, A. Tsiartas, P. Tsilifis, and S.S. Narayanan, "Markov Chain Monte Carlo Inference of Parametric Dictionaries for Sparse Bayesian Approximations," IEEE Transactions on Signal Processing, 64(12): 3077-3092, 2016.

We provide the proposed formulation of the Bayesian DL problem in Section 4.1. Inference is performed with a Markov Chain Monte Carlo approach that uses block sampling to generate the variables of the Bayesian problem. Since the parameterization of dictionary atoms results in posteriors that cannot be analytically computed, we use a Metropolis-Hastings-within-Gibbs framework, according to which variables with closed-form posteriors are generated with the Gibbs sampler, while the remaining ones are generated with Metropolis-Hastings from appropriate candidate-generating densities. One key challenge with MCMC is to determine its asymptotic behavior, i.e., whether it provides accurate posterior approximations.
The goal is to create an aperiodic and irreducible Markov Chain (MC) with stationary distribution the same as the posterior distribution of interest [100]. Irreducibility ensures that any state of the space is accessible, while aperiodicity makes sure that the chain does not return to the same state at regular times. Uniformly ergodic MCs are a special case in which the MC converges to the invariant distribution independently of the initial state. They guarantee geometrically fast convergence and are key sufficient conditions in order to establish central limit theorems for empirical averages and provide consistent estimators of MCMC standard errors [100, 29, 97, 124, 79]. Because of these properties, we discuss the geometric ergodicity of MCMC in our proposed Bayesian inference framework, which ensures convergence to a stationary distribution independently of the initial state. The proposed MCMC approach and the geometric ergodicity properties of the proposed MH-within-Gibbs algorithm are discussed.

Such high-dimensional MCMC inference problems usually require many iterations to converge, creating the need for processing small batches of data and combining them into a unified model. We demonstrate a parallel implementation framework of our algorithm, where DL is performed for each sample separately and the resulting dictionaries from each exemplar data are further combined into a unified model. Our results on synthetic and real biomedical data indicate that the proposed approach yields benefits in terms of superior signal reconstruction compared to previous SD [6] and ETF [159] methods.

We denote matrices with bold uppercase letters $\mathbf{X}$ and vectors with bold lowercase letters $\mathbf{x}$. Lowercase letters with appropriate numerical indices will either refer to the columns of a matrix, i.e., $X = [x_1 \ldots x_N]$, or the elements of a vector, i.e., $x = [x_1 \ldots x_N]^T$. We denote by $(X)_{ij}$ the entry of matrix $X$ corresponding to the $i$th row and $j$th column.
The $p$th-order norm of a vector is symbolized as $\|x\|_p$, while the $N \times 1$ identity vector and the $N \times N$ identity matrix are denoted as $1_N$ and $I_N$, respectively. The vectorization of matrix $X$, obtained by stacking its columns on top of one another, is defined as $\text{vec}(X) = [x_1^T, \ldots, x_N^T]^T$. Finally, the gradient and Hessian of a scalar-valued function $f(x)$, $x \in \mathbb{R}^N$, are denoted as $\nabla f(x) \in \mathbb{R}^{N \times 1}$ and $H_f = \nabla^2 f(x) \in \mathbb{R}^{N \times N}$, where $(\nabla f(x))_i = \frac{\partial f}{\partial x_i}$ and $(H_f)_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j}$, respectively.

4.1 Problem Formulation

Let $X = [x_1 \ldots x_N] \in \mathbb{R}^{P \times N}$ be a data matrix of $N$ examples $x_n \in \mathbb{R}^P$. We formulate the parametric DL problem by assuming an overcomplete dictionary $D_n \in \mathbb{R}^{P \times K}$ containing $K$ prototypical atoms $d_{nk} \in \mathbb{R}^P$ for each exemplar data $x_n$ separately. This approach, also found in similar studies [88], enjoys computational benefits compared to batch methods, since it can yield faster reliable estimates of the considered variables and can be easily parallelized to run on multiple computational threads. In the case of parametric dictionaries, the atoms can be expressed with an appropriate domain-specific function $\phi: \mathbb{R}^Q \to \mathbb{R}^P$ in terms of a parameter vector $\theta_{nk} \in \mathbb{R}^Q$, $Q < P$, as $d_{nk} = \phi(\theta_{nk})$, therefore $D_n = [\phi(\theta_{n1}) \ldots \phi(\theta_{nK})]$. Typically dictionary atoms have a unitary $l_2$-norm, i.e., $\|\phi(\theta_{nk})\|_2 = 1$. Each signal $x_n$ can be expressed as a linear combination of a small number of atoms, $L \ll K$, with additive noise $\epsilon_n \in \mathbb{R}^P$:

$x_n = D_n c_n + \epsilon_n$ (4.1)

where $\epsilon_n$ is the error and $c_n \in \mathbb{R}^K$, $\|c_n\|_0 = L$, are the coefficients with non-zero values only for the used atoms. According to the Bayesian framework, each variable in (4.1) is assumed to follow an underlying probabilistic distribution.

Especially in parametric DL, where we jointly sample the parameters of the selected dictionary atoms, a small number of selected atoms can keep the implementation computationally tractable. For this reason, we are interested in imposing explicit sparseness constraints, similarly to previous studies [19, 51, 41].
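Equation (4.1) can be illustrated with a toy parametric dictionary; here `phi` is a hypothetical Gaussian-bump atom generator standing in for the domain-specific function $\phi$, and all names and parameter values are illustrative, not the dictionaries used in this work:

```python
import numpy as np

def phi(theta, P=64):
    """Toy parametric atom: a Gaussian bump with (center, width) = theta,
    normalized to unit l2-norm as required of dictionary atoms."""
    t = np.arange(P)
    a = np.exp(-0.5 * ((t - theta[0]) / theta[1]) ** 2)
    return a / np.linalg.norm(a)

# Build D_n from K parameter vectors and synthesize x_n = D_n c_n + eps_n
rng = np.random.default_rng(0)
thetas = [(8 * k, 3.0) for k in range(1, 8)]     # K = 7 atoms
D = np.column_stack([phi(th) for th in thetas])  # P x K dictionary
c = np.zeros(7)
c[[1, 4]] = [2.0, -1.0]                          # ||c||_0 = L = 2
x = D @ c + 0.01 * rng.standard_normal(64)       # observed signal
```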
We will describe two different ways to approach this, using the Multinomial and the Wallenius' hypergeometric distributions, which allow sampling with and without replacement.

4.1.1 Atom Sampling with Replacement

A straightforward method to sample the dictionary atoms is to relax the $l_0$-sparsity norm constraint into $\|c_n\|_0 \leq L$, allowing independent sampling of the dictionary atoms $L$ times with replacement through the Multinomial distribution. Since the population size is much larger than the sample size ($L \ll K$), duplicate atoms are rare [151]. If we assume a discrete multinomial distribution for selecting one dictionary atom out of the possible $K$, (4.1) can be re-written as

$x_n = D_n \sum_{l=1}^{L} s_{nl} z_{nl} + \epsilon_n$ (4.2)

where $z_{nl} \in \{u_i\}$, with $u_i = [0, 0, \ldots, 1, \ldots, 0]^T$ having 1 in the $i$th entry ($i \leq K$) and 0 in the remaining $K-1$ entries. The vector $z_{nl}$, with $\|z_{nl}\|_0 = 1$, is binary, activating one dictionary atom at a time, and $s_n = [s_{n1} \ldots s_{nL}]^T \in \mathbb{R}^L$ contains only the coefficients of the selected atoms. If atom $d_k$ is the $l$th representation term, then $z_{nl,k} = 1$ and $z_{nl,k'} = 0$, $\forall k' \neq k$, and $s_{nl}$ constitutes the $k$th entry of vector $c_n$ in (4.1), i.e., $s_{nl} = c_{nk}$. The probability of selecting atom $k$ for data $x_n$ is $\pi_{nk}$, such that $z_{nl} \sim \text{Multinomial}(1, \pi_n)$ with $\pi_n = [\pi_{n1}, \ldots, \pi_{nK}]^T \in \mathbb{R}^K$. If the same atom is selected more than once, the corresponding coefficient is estimated only once.

4.1.2 Atom Sampling without Replacement

Sampling without replacement avoids duplicate atoms and keeps the $l_0$-sparsity constraint intact. The problem of selecting $L$ atoms out of the possible $K$ can be formalized similarly to the classical experiment of taking colored balls at random from an urn without replacement [60, 61]. If the balls have different weights, the result follows the Wallenius' noncentral hypergeometric distribution [155]. In the considered problem, we can assume $K$ dictionary atoms, each of a different type (i.e., each ball in the urn has a different color), and selection probability $\pi = [\pi_1, \ldots, \pi_K]$.
If $z_n = [z_{n1}, \ldots, z_{nK}] \in \mathbb{R}^K$, with $z_{nk} \in \{0, 1\}$ and $\|z_n\|_0 = L$, indicates the selected atoms for $x_n$, then (4.1) becomes

$x_n = D_n (s_n \circ z_n) + \epsilon_n$ (4.3)

where "$\circ$" represents the Hadamard or entrywise product, $s_n \in \mathbb{R}^K$, and $z_n$ follows the Wallenius distribution.

4.1.3 Additional Probabilistic Assumptions

We assume independent atom coefficients $s_{n1}, \ldots, s_{nL}$, each following a normal distribution with mean $\mu_{s_{nl}}$ and precision $\lambda_s$, i.e., $s_n \sim \text{Normal}(\mu_{s_n}, \lambda_s^{-1} I_L)$. We have considered a different mean $\mu_{s_{nl}}$ for each exemplar data $n$ and selected dictionary atom $l$ to account for the various signal energy levels and possible atom configurations. We further hypothesize dictionary parameters of normal distribution $\theta_{nk} \sim \text{Normal}(g_{nk}, G_n^{-1})$ with mean $g_{nk}$ and precision $G_n$. Finally, we assume zero-mean Gaussian noise with variance $\lambda_n^{-1}$, i.e., $\epsilon_n \sim \text{Normal}(0, \lambda_n^{-1} I_P)$. Similar distributions have also been hypothesized in prior work [106, 87, 160].

The Bayesian framework further treats the parameters of the above variables as random components in order to better capture uncertainty. We introduce conjugate prior distributions that simplify computations. Specifically, we assume that $\pi_n$ follows a Dirichlet prior, i.e., $\pi_n \sim \text{Dirichlet}(\alpha)$, where $\alpha = [a_1 \ldots a_K]^T$, and the precision of the Gaussian noise follows a Gamma prior, i.e., $\lambda_n \sim \text{Gamma}(e, f)$. The mean vector $g_{nk}$ and precision matrix $G_n$ of the dictionary atom parameters are modeled with Gaussian and Wishart distributions, respectively, i.e., $g_{nk} \sim \text{Normal}(g_0, G_0^{-1})$ and $G_n \sim \text{Wishart}(\nu_0, R_0)$.

4.1.4 Objective

The goal is to find $y_n$ such that

$y_n = \arg\max_{y_n} p(y_n \mid x_n, H_n)$ (4.4)

$y_n = [z_n^T, s_{n1}, \ldots, s_{nL}, \theta_{n1}^T, \ldots, \theta_{nK}^T, g_{n1}^T, \ldots, g_{nK}^T, \text{vec}(G_n)^T, \pi_n^T, \epsilon_n^T, \lambda_n]^T$ (4.5)

$H_n = \{\alpha, g_0, G_0, \nu_0, R_0, e, f, \mu_{s_n}, \lambda_s\}$ (4.6)

The probabilistic assumptions for (4.4)-(4.6) are summarized in Table 4.1.
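The two atom-selection schemes of Sections 4.1.1 and 4.1.2 can be sketched as follows. Sampling without replacement is implemented as sequential weighted draws from the shrinking urn, which is the mechanism underlying Wallenius' distribution; function names are illustrative:

```python
import numpy as np

def sample_with_replacement(pi, L, rng):
    """Section 4.1.1: draw L one-hot indicators z_nl ~ Multinomial(1, pi);
    duplicates are possible but rare when L << K."""
    K = len(pi)
    idx = rng.choice(K, size=L, replace=True, p=pi)
    Z = np.zeros((L, K))
    Z[np.arange(L), idx] = 1.0
    return Z

def sample_without_replacement(pi, L, rng):
    """Section 4.1.2: draw L distinct atoms sequentially, each draw taken
    proportionally to the weights of the atoms still in the urn."""
    pi = np.array(pi, dtype=float)
    z = np.zeros(len(pi))
    for _ in range(L):
        p = pi * (z == 0)          # zero out already-selected atoms
        k = rng.choice(len(pi), p=p / p.sum())
        z[k] = 1.0
    return z
```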
4.2 Inference with MCMC Sampling

The inference problem aims at finding solutions $y_n$, $n = 1, \ldots, N$, that maximize (4.4). Since (4.4) is not analytically tractable, we use MCMC for approximate inference because of its simple and fast implementation. We describe the inference procedure in Section 4.2.1 and provide the corresponding derivations in Sections 4.2.2, 4.2.3, and 4.2.4.

4.2.1 MCMC Sampling

The large number of variables in our problem renders simultaneous sampling from the full posterior prohibitive. For this reason, we divide the variable space $y$ into blocks, a technique usually referred to as "block-at-a-time" MCMC [36, 34]. Suppose that the variable space is split into $B$ blocks specified according to the problem characteristics, i.e. $y = [y_1^T, \ldots, y_B^T]^T$. Without loss of generality, we hypothesize that these blocks are sampled sequentially from $y_1$ to $y_B$. This is referred to as "deterministic-scan" MCMC sampling [77, 78, 81] and will be the focus of our paper. When the posterior probability of a block $b$ yields a known probabilistic distribution, we use the Gibbs sampler; otherwise we sample based on the Metropolis-Hastings (MH) algorithm [99, 71].

MH generates a sample with a candidate-generating (or proposal) density $q$. The MC transitions to the generated sample with a predefined probability of move. A critical component is the selection of the proposal density, based on which MH samplers are divided into two categories. The first is the random-walk sampler [99], with samples generated around the current value of the corresponding variables. Despite its simplicity, this method often exhibits slow convergence, depending on the variance of $q$ [34, 123]. The second type, called independent MH [71], samples independently of the previous state. Its proposal density is close to the target distribution in a certain sense, benefiting convergence.
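As an illustration of independent MH, the sketch below centers a heavy-tailed Student-t proposal at a numerically located mode of a one-dimensional target. The coarse grid search stands in for the numerical mode search used later in the chapter (Section 4.2.2 uses Nelder-Mead); all names, the target, and the tuning constants are illustrative assumptions, not thesis code.

```python
import math, random

def independent_mh_student_t(log_target, n_iter, nu=5, scale=2.0,
                             rng=random.Random(2)):
    """Independent MH: propose from a Student-t centered at the mode of
    the target, accept with ratio [p(x')/q(x')] / [p(x)/q(x)]."""
    # Locate the mode of the 1-D target on a coarse grid (stand-in for
    # a proper numerical optimizer such as Nelder-Mead).
    mode = max((x * 0.01 for x in range(-1000, 1001)), key=log_target)

    def log_t(x):  # log density of the t proposal, up to a constant
        return -0.5 * (nu + 1) * math.log(1 + ((x - mode) / scale) ** 2 / nu)

    def draw_t():  # t_nu sample via Gaussian / chi-square representation
        z = rng.gauss(0.0, 1.0)
        w = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu)) / nu
        return mode + scale * z / math.sqrt(w)

    x, chain = mode, []
    for _ in range(n_iter):
        x_prop = draw_t()
        log_ratio = (log_target(x_prop) - log_t(x_prop)) \
                  - (log_target(x) - log_t(x))
        if math.log(rng.random()) < log_ratio:
            x = x_prop
        chain.append(x)
    return chain

# Toy target: unnormalized Normal(1, 1) log density
chain = independent_mh_student_t(lambda v: -0.5 * (v - 1.0) ** 2, 1000)
assert len(chain) == 1000
```

The long tails of the t proposal are what keep the importance-like weight $p/q$ bounded, the property the ergodicity argument of Section 4.5 relies on.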
Previous work has considered Student-t distributions tailored to the target density [35, 14, 37], whose long tails ensure that no areas of the state space are left unexplored. The "MH-within-Gibbs" sampler uses MH to generate samples for blocks whose posterior does not yield a known distribution, and Gibbs to generate samples for the remaining blocks. We further discuss the use of the MH-within-Gibbs sampler for our problem and the derivation of the posteriors for each variable (Table 4.2). We will assume that $y_{-y_b}$ contains all variables except the ones included in the $b$-th block (i.e. $y_b$), which are currently being generated.

4.2.2 Sampling Dictionary Parameters

Let $I_{D_n} = \{k'_1, \ldots, k'_L\}$ be the indices of the dictionary atoms that represent signal $x_n$ and $\theta_{nk'_1}, \ldots, \theta_{nk'_L}$ the corresponding atom parameters. The joint posterior of $\tilde\theta_n = \mathrm{vec}([\theta_{nk'_1} \ldots \theta_{nk'_L}]) \in \mathbb{R}^{QL}$ can be written as

$p(\tilde\theta_n | y_{-\tilde\theta_n}, x_n, H_n) \propto \prod_{k \in I_{D_n}} p(\theta_{nk} | g_{nk}, G_n)\, p\!\left(x_n - \sum_{l=1}^{L} s_{nl} \psi(\theta_{nk'_l}) \,\middle|\, \lambda_n\right)$
$\propto \exp\!\left[-\frac{\lambda_n}{2}\left\| x_n - \sum_{l=1}^{L} s_{nl} \psi(\theta_{nk'_l}) \right\|_2^2\right] \exp\!\left[-\frac{1}{2}\sum_{k \in I_{D_n}} (\theta_{nk} - g_{nk})^T G_n (\theta_{nk} - g_{nk})\right]$
$\propto \exp\!\left[-\frac{\lambda_n}{2}\left\| x_n - S_n \tilde\psi_n(\tilde\theta_n) \right\|_2^2\right] \exp\!\left[-\frac{1}{2}(\tilde\theta_n - \tilde g_n)^T \tilde G_n (\tilde\theta_n - \tilde g_n)\right]$   (4.7)

where $\tilde\psi_n(\tilde\theta_n) = \mathrm{vec}([\psi(\theta_{nk'_1}) \ldots \psi(\theta_{nk'_L})]) \in \mathbb{R}^{PL}$, $\tilde g_n = \mathrm{vec}([g_{nk'_1} \ldots g_{nk'_L}]) \in \mathbb{R}^{QL}$, and

$\tilde G_n = \mathrm{blockdiag}(G_n, \ldots, G_n) \in \mathbb{R}^{QL \times QL}$, $\quad S_n = [s_{n1} I_P \;\ldots\; s_{nL} I_P] \in \mathbb{R}^{P \times PL}$

Because of the nonlinearity of $\psi$, we cannot find a known probabilistic distribution for (4.7); therefore we use MH for sampling $\tilde\theta_n$. The proposal density is a multivariate Student-t distribution
In the exper- iments, we find a numerical solution to (4.9) using the Nelder Mead’s simplex [105] because of its simplicity compared to gradient-based approaches. From (4.8-4.9), we are able to jointly generate the parameters of the selected atoms, rather than generating each one separately. A large number of selected atoms increases the computational cost towards maximizing (4.9). The posterior for the remaining atoms nk = 2I Dn , generated with Gibbs, is p ( nk jy nk ; x n ;H n )/p ( nk jg nk ; G n ) (4.11) 4.2.3 Sampling Atom Indices and Coefficients The probabilistic selection of dictionary atoms gives the opportunity to test unseen com- binations. This can alleviate disadvantages of greedy algorithms, since the randomization might overcome locally optimal solutions, as also observed in [51]. Because the atom indices and coefficients are interdependent, we will describe the sampling procedure for both. In the case of sampling with replacement, the atom indices and coefficients are gen- erated with the Gibbs sampler, as their posterior can be analytically derived based on the 50 assumptions for their priors. In the following, we will denote nl = x n P l 0 6=l s nl 0D n z nl 0 the error for the exemplar data x n when we exclude thel th atom from the representation. Then the total representation error can be expressed as n = nl s nl D n z nl = nl s nl K X k=1 z nl k ( nk ) (4.12) The posterior distribution of z nl can be written as p(z nl jy fz nl g ; x n ;H n )/p(z nl j n ;L)p( n j n ) / K Y k=1 z nl k nk exp 0 @ n 2 nl s nl K X k=1 z nl k ( nk ) 2 2 1 A / K Y k=1 nk exp n s nl T nl ( nk ) z nl k (4.13) For deriving the above expression we took into account that z nl k 2f0; 1g is a binary variable, implying thatz 2 nl k =z nl k andaz nl k =a z nl k ,8a2<. Also z nl has unitl0-norm, i.e.kz nl k 0 = 1, resulting inz nl k z nl k 0 = 0,8k 0 6= k. 
As indicated in (4.13), this update procedure considers the similarity of the dictionary atoms to the signal residual and the prior knowledge for selecting atom $k$. Finally, the posterior of $s_{nl}$ is

$p(s_{nl} | y_{-s_{nl}}, x_n, H_n) \propto p(s_{nl} | \lambda_s)\, p(\epsilon_n | \lambda_n)$
$\propto \exp\!\left[-\frac{\lambda_s}{2}(s_{nl} - \mu_{s_{nl}})^2 - \frac{\lambda_n}{2}\|\epsilon_{nl} - s_{nl} D_n z_{nl}\|_2^2\right]$   (4.14)

By completing the square of the above quadratic form with respect to $s_{nl}$ and assuming dictionary atoms of unit norm, $s_{nl}$ can be generated from a normal distribution (Table 4.2).

To the best of our knowledge, there is no conjugate prior for the Wallenius' hypergeometric distribution [151]; therefore, for sampling without replacement we used independent MH with a proposal distribution tailored to the Wallenius prior.

4.2.4 Sampling the Parameters of the Priors

The conjugate assumptions for the distributions of the parameters of the aforementioned variables allow us to use the Gibbs sampler to generate the corresponding samples. For the sake of completeness, we briefly sketch the derivation of the posteriors, with sampling distributions shown in Table 4.2.

$p(\pi_n | y_{-\pi_n}, -) \propto p(\pi_n | \alpha) \prod_{l=1}^{L} p(z_{nl} | \pi_n, -)$   (4.15)
$p(g_{nk} | y_{-g_{nk}}, -) \propto p(\theta_{nk} | g_{nk}, G_n)\, p(g_{nk} | g_0, G_0)$   (4.16)
$p(G_n | y_{-G_n}, -) \propto \prod_{k=1}^{K} p(\theta_{nk} | g_{nk}, G_n)\, p(G_n | \nu_0, R_0)$   (4.17)
$p(\lambda_n | y_{-\lambda_n}, -) \propto p(\lambda_n | e, f)\, p\!\left(x_n - D_n \sum_{l=1}^{L} s_{nl} z_{nl} \,\middle|\, \lambda_n\right)$   (4.18)

where the dash "$-$" denotes $\{x_n, H_n\}$.

4.2.5 Implementation of Bayesian DL

The problem variables (4.5) are inferred for each exemplar datum separately. This method can yield signal-specific estimations of the noise variance, useful for denoising applications. It also allows sharing the computational cost of MCMC inference, which typically requires a large number of iterations to converge. Since more training samples are likely to require more MCMC iterations, training one dictionary on the entire dataset would have significantly slowed down convergence due to the sequential nature of MCMC.
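Returning to (4.14), the conjugate coefficient update reduces to a single Normal draw once the square is completed. A minimal sketch under the unit-norm atom assumption (plain Python lists; the helper name is ours):

```python
import random

def sample_coefficient(eps_nl, atom, mu_s, lam_s, lam_n,
                       rng=random.Random(3)):
    """Gibbs draw of s_nl: with a unit-norm atom, completing the square
    in (4.14) gives a Normal posterior with
        precision  lam_post = lam_s + lam_n
        mean       mu_post  = (lam_s*mu_s + lam_n*<atom, eps_nl>) / lam_post
    """
    dot = sum(a * e for a, e in zip(atom, eps_nl))
    lam_post = lam_s + lam_n
    mu_post = (lam_s * mu_s + lam_n * dot) / lam_post
    return rng.gauss(mu_post, lam_post ** -0.5)

# Residual perfectly aligned with a unit-norm atom of amplitude 2:
# with a vague prior and precise noise, the draw concentrates near 2.
atom = [0.6, 0.8]
s = sample_coefficient([1.2, 1.6], atom, mu_s=0.0, lam_s=0.01, lam_n=100.0)
assert abs(s - 2.0) < 1.0
```

With a strong noise precision the posterior mean is pulled toward the projection of the residual onto the atom, which is exactly the least-squares coefficient; the prior term acts as a shrinkage toward $\mu_{s_{nl}}$.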
It is worth mentioning that similar studies have acknowledged the difficulty of performing Bayesian inference on large datasets [88]. The MCMC inference procedure is outlined in Algorithm 1.

4.3 Combination of Generated Dictionaries

The variables of the considered Bayesian problem are inferred for each exemplar datum separately, yielding sample-specific information useful for denoising and other applications, and allowing parallel implementations. The latter is beneficial because of the large amount of data in many applications, which renders batch DL methods computationally expensive or even prohibitive. However, in order to obtain generalizable dictionaries that are able to reliably represent unseen data, we need to combine the corresponding results into a unified model (Algorithm 2).

While the Bayesian framework aims to maximize the posterior probability of the model parameters, DL is usually evaluated based on the reconstruction error. For this reason, the combination of the dictionaries resulting from the Bayesian inference is performed based on a root mean square (RMS) error criterion. Let $D_n^{(m)}$ be the dictionaries generated with the MH-within-Gibbs sampler. In order to evaluate their performance using signal reconstruction criteria, we need a sparse decomposition algorithm that can represent the original signal based on the inferred dictionaries. We use OMP [110, 45] because of its simplicity and effectiveness. Let $\mathrm{Err}_n^{(m)}$ be the relative RMS error that results from decomposing $x_n$ based on dictionary $D_n^{(m)}$. Also let $D_n^{(m^*_n)} \in \mathbb{R}^{P \times K}$ be the dictionaries that yield the lowest relative RMS error for each signal $n$, and $\Theta^{(m^*_n)} = [\theta^{(m^*_n)}_{nk'_1}, \ldots, \theta^{(m^*_n)}_{nk'_L}] \in \mathbb{R}^{Q \times L}$ the parameters of the atoms that were selected by OMP based on $D_n^{(m^*_n)}$. The $\Theta^{(m^*_n)}$ are concatenated into a unified dictionary

$\Theta_U = [\Theta^{(m^*_1)} \ldots \Theta^{(m^*_N)}] \in \mathbb{R}^{Q \times NL}$, $\quad m^*_n = \arg\min_{m \in [1, M]} \mathrm{Err}_n^{(m)}$, $\; n = 1, \ldots, N$

Ideally, we could use $\Theta_U$ as the parameters of our final dictionary.
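A minimal sketch of this combination step, selecting per-signal winners by relative RMS error and then quantizing the pooled parameter vectors with K-means as described next. The data layout and helper names are hypothetical; vectors are plain Python lists.

```python
import random

def relative_rms_error(x, x_hat):
    """Relative RMS error ||x - x_hat||_2 / ||x||_2."""
    num = sum((a - b) ** 2 for a, b in zip(x, x_hat)) ** 0.5
    return num / sum(a ** 2 for a in x) ** 0.5

def select_best_parameters(signals, recon_per_dict, params_per_dict):
    """For each signal n, keep the atom parameters of the dictionary
    m*_n with the lowest relative RMS error; concatenate into Theta_U."""
    theta_u = []
    for n, x in enumerate(signals):
        errs = [relative_rms_error(x, r) for r in recon_per_dict[n]]
        theta_u.extend(params_per_dict[n][errs.index(min(errs))])
    return theta_u

def kmeans_quantize(points, n_bins, n_iter=20, rng=random.Random(4)):
    """Lloyd's K-means over parameter vectors, returning the N_b centers
    that would form the quantized parameter matrix Theta_Q."""
    centers = rng.sample(points, n_bins)
    for _ in range(n_iter):
        clusters = [[] for _ in range(n_bins)]
        for p in points:
            j = min(range(n_bins),
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(p, centers[c])))
            clusters[j].append(p)
        for j, cl in enumerate(clusters):
            if cl:  # keep the previous center if a cluster empties
                centers[j] = tuple(sum(v) / len(cl) for v in zip(*cl))
    return centers

# Toy run: one signal, two candidate dictionaries (second reconstructs exactly)
signals = [[1.0, 2.0, 3.0]]
recons = [[[1.0, 2.0, 2.0], [1.0, 2.0, 3.0]]]
params = [[[(5.0, 9.0)], [(6.0, 11.0)]]]
assert select_best_parameters(signals, recons, params) == [(6.0, 11.0)]
```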
However, in practice the large amount of data renders $\Theta_U$ computationally expensive. For this reason, the parameter vectors of $\Theta_U$ are further quantized with K-means clustering using $N_b$ centers. This results in a parameter matrix $\Theta_Q$ with a corresponding final dictionary $D_Q$.

The aforementioned procedure is performed on the training data, while the test data are not seen at all during this step. This approach can be applied to combine dictionaries learned with any other method; therefore it also provides a unified platform that allows comparison of the considered parametric DL approaches.

4.4 Choosing the Parametric Dictionary Function

The choice of the parametric dictionary function $\psi$ is not trivial and is usually guided by the application of interest. If designed appropriately, parametric dictionaries can yield interpretable information about meaningful signal characteristics for a variety of applications. Previous studies have used Gabor [90] and Gammatone [159] atoms to represent speech signals because of their good localization properties and similarities to the human auditory system. Other efforts have proposed Gaussian-like functions to efficiently capture spherical stereo images [148], diffusion-based dictionaries to model MRI [98], and other wavelet-like atoms for digitizing fingerprint images [47]. Gabor dictionaries have been used for the electroencephalogram (EEG) [13], spline wavelets for the electrocardiogram (ECG) [130], and sigmoid-exponential functions for the electrodermal activity (EDA) [31].

Our Bayesian parametric DL approach generates the parameters of the dictionary atoms with the MH sampler. The mean of the corresponding distribution depends on the considered parametric function and is selected so that it maximizes the corresponding posterior distribution (4.7). We further show that the concavity of (4.7) generally depends on the function $\psi$.
It is unusual for a function to be concave with respect to all the parameters of interest, but in practice even the estimation of local optima is enough to generate useful samples.

The log-posterior (4.7) of the dictionary atom parameters is

$U(\tilde\theta_n) = \ln p(\tilde\theta_n | y_{-\tilde\theta_n}, X, H) \propto -\frac{\lambda_n}{2}\left\|x_n - S_n\tilde\psi_n(\tilde\theta_n)\right\|_2^2 - \frac{1}{2}(\tilde\theta_n - \tilde g_n)^T \tilde G_n (\tilde\theta_n - \tilde g_n)$   (4.19)

If we set $\epsilon_n(\tilde\theta_n) = x_n - S_n\tilde\psi_n(\tilde\theta_n)$, then (4.19) becomes

$U(\tilde\theta_n) = -\frac{1}{2}(\tilde\theta_n - \tilde g_n)^T \tilde G_n (\tilde\theta_n - \tilde g_n) - \frac{\lambda_n}{2}\left\|\epsilon_n(\tilde\theta_n)\right\|_2^2$   (4.20)

The gradient vector and Hessian matrix of (4.20) are

$\nabla_{\tilde\theta_n} U(\tilde\theta_n) = -\tilde G_n(\tilde\theta_n - \tilde g_n) - \lambda_n \nabla_{\tilde\theta_n}\epsilon_n(\tilde\theta_n)^T \epsilon_n(\tilde\theta_n)$   (4.21)
$H_U = \nabla^2_{\tilde\theta_n} U(\tilde\theta_n) = -\tilde G_n - \frac{\lambda_n}{2}\nabla^2_{\tilde\theta_n}\left\|\epsilon_n(\tilde\theta_n)\right\|_2^2$   (4.22)

where

$\left[\nabla^2_{\tilde\theta_n}\|\epsilon_n(\tilde\theta_n)\|_2^2\right]_{ij} = \frac{\partial^2}{\partial\tilde\theta_{n_i}\partial\tilde\theta_{n_j}}\|\epsilon_n(\tilde\theta_n)\|_2^2 = \frac{\partial}{\partial\tilde\theta_{n_i}}\left(2\sum_{p=1}^{P}\epsilon_{np}(\tilde\theta_n)\frac{\partial\epsilon_{np}(\tilde\theta_n)}{\partial\tilde\theta_{n_j}}\right)$
$= 2\sum_{p=1}^{P}\frac{\partial\epsilon_{np}(\tilde\theta_n)}{\partial\tilde\theta_{n_i}}\frac{\partial\epsilon_{np}(\tilde\theta_n)}{\partial\tilde\theta_{n_j}} + 2\sum_{p=1}^{P}\epsilon_{np}(\tilde\theta_n)\frac{\partial^2\epsilon_{np}(\tilde\theta_n)}{\partial\tilde\theta_{n_i}\partial\tilde\theta_{n_j}}$   (4.23)

If $H_{\epsilon_{np}}$ is the Hessian of $\epsilon_{np}$, $p = 1, \ldots, P$, then from (4.23)

$\nabla^2_{\tilde\theta_n}\|\epsilon_n(\tilde\theta_n)\|_2^2 = 2\,\nabla_{\tilde\theta_n}\epsilon_n(\tilde\theta_n)^T \nabla_{\tilde\theta_n}\epsilon_n(\tilde\theta_n) + 2\sum_{p=1}^{P}\epsilon_{np}(\tilde\theta_n) H_{\epsilon_{np}}$   (4.24)

The Hessian of $\epsilon_{np}$ can be computed as

$\left[H_{\epsilon_{np}}\right]_{ij} = -\sum_{l=1}^{L} s_{nl}\frac{\partial^2\psi_p(\theta_{nk'_l})}{\partial\theta_{nk'_{l_i}}\partial\theta_{nk'_{l_j}}}$   (4.25)
$H_{\epsilon_{np}} = -\sum_{l=1}^{L} s_{nl} H_{\psi_p}\big|_{\theta = \theta_{nk'_l}}$   (4.26)

From (4.22), (4.24) and (4.26), we get

$H_U = -\tilde G_n - \lambda_n \nabla_{\tilde\theta_n}\epsilon_n(\tilde\theta_n)^T \nabla_{\tilde\theta_n}\epsilon_n(\tilde\theta_n) + \lambda_n\sum_{p=1}^{P}\epsilon_{np}(\tilde\theta_n)\sum_{l=1}^{L} s_{nl} H_{\psi_p}\big|_{\theta = \theta_{nk'_l}}$   (4.27)

The first two terms of (4.27) are negative (semi-)definite matrices, since $\tilde G_n$ is a precision matrix and the Gram matrix of the Jacobian is positive semi-definite, whereas the sign-definiteness of the third term depends on $\psi$. In our experiments and in the literature mentioned in this section, the function $\psi$ is selected with a knowledge-driven approach based on the application of interest. Although this does not guarantee that the generated atom parameters are global maxima, in practice the corresponding sampling method yields satisfactory results.

4.5 MCMC Convergence for Bayesian DL

Ergodicity properties of large-dimensional MCs have been the subject of several studies [78, 80].
We discuss the uniform ergodicity property of the considered MC, which implies convergence independently of the initial state [100, 97]. We focus on the sampling-with-replacement case, although our discussion can be extended to atom sampling without replacement. We show that the MC used for inferring the variables of the Bayesian DL problem is uniformly ergodic by proving that it meets the conditions of Theorem 4.5.1. This theorem establishes uniform ergodicity for the MH-within-Gibbs sampler, and its proof can be found in [81].

Let $P$ be a Markov transition kernel and $Y^{(m)}$, $Y^{(m+1)}$ two consecutive MCMC states generating observations $y^{(m)}$, $y^{(m+1)}$. We will assume that $y_b^{(m+1)}$ and $y_b^{(m)}$ are the values of block $b$ at the current and previous MCMC states, denoted $m+1$ and $m$, respectively. Also, $y_{-y_b}^{(m+1)}$ contains all variables that have already been sampled at state $m+1$, and $y_{-y_b}^{(m)}$ the ones from the previous state $m$.

Definition 4.5.1 (Conditional Weight Function). If block $b$ is generated with the MH sampler, its conditional weight function is defined as

$w_{y_b}\!\left(y_b^{(m+1)} \middle| y_b^{(m)}, y_{-y_b}^{(m)}, y_{-y_b}^{(m+1)}\right) = \frac{p\!\left(y_b^{(m+1)} \middle| y_{-y_b}^{(m)}, y_{-y_b}^{(m+1)}\right)}{q\!\left(y_b^{(m+1)} \middle| y_b^{(m)}, y_{-y_b}^{(m)}, y_{-y_b}^{(m+1)}\right)}$   (4.28)

Theorem 4.5.1 (MH-within-Gibbs Uniform Ergodicity). Given $p(y | X, H)$ on state space $y = [y_1^T \ldots y_B^T]^T$, let $P = P_1 \cdots P_B$ be the Markov kernel of the Gibbs sampler and $Q_{b'}$, $b' \in I_{B'}$, where $I_{B'} = \{b_{i_1}, \ldots, b_{i_{B'}}\}$, $B' < B$, be the Markov kernels of the MH sampler with conditional weights $w_{y_{b'}}$ as in (4.28). If each conditional weight $w_{y_{b'}}$, $b' \in I_{B'}$, of the MH sampler is bounded, $\sup w_{y_{b'}} < \infty$, and the Gibbs sampler with Markov kernel $P_1 \cdots P_B$ is uniformly ergodic, then the MH-within-Gibbs sampler, resulting from substituting the kernels $P_{b'}$ with $Q_{b'}$, $b' \in I_{B'}$, is also uniformly ergodic.

Based on Theorem 4.5.1, we show MCMC uniform ergodicity for our case. Theorem 4.5.2 describes the minorization condition of the Gibbs sampler.
This constitutes a multivariate extension of Jones et al. [81] (Proposition 2). Lemma 4.5.1 assists with its proof and ensures the minorization of the first $B-1$ blocks. Lemma 4.5.2 ensures that the conditional weight for $\tilde\theta_n$ is bounded away from infinity. Finally, Theorem 4.5.3 combines all the aforementioned to prove the uniform ergodicity of the considered MC. The proofs of Theorems 4.5.2 and 4.5.3 and Lemmas 4.5.1 and 4.5.2 can be found in Appendix A.

Lemma 4.5.1 (Minorization Condition of $B-1$ Blocks). Let $P((Y_1, Y_2, \ldots, Y_B); A_1 \times A_2 \times \cdots \times A_B)$ be the Markov kernel of a Gibbs sampler, where $A_1, \ldots, A_B$ are elements of the Borel $\sigma$-algebras on the variable spaces $Y_1, \ldots, Y_B$, respectively. We further assume that all updates of the Gibbs sampler, except the one corresponding to $Y_B$, are minorisable, in the sense that for $b = 1, \ldots, B-1$ there exist $\delta_b > 0$ and a probability measure $\nu_b$ such that $P_{Y_b}(y_{-y_b}, A_b) \ge \delta_b \nu_b(A_b)$, where $A_{-b} = A_1 \times \cdots \times A_{b-1} \times A_{b+1} \times \cdots \times A_B$. Then the Gibbs sampler with Markov kernel $P_1 \cdots P_{B-1}$ is minorisable, i.e. there exist $\delta' > 0$ and a probability measure $\nu'$ such that $P((Y_1, Y_2, \ldots, Y_{B-1}); A_{-B}) \ge \delta' \nu'(A_{-B})$.

Theorem 4.5.2 (Partial Minorization Condition for Gibbs). Let the same assumptions as in Lemma 4.5.1 hold; then the Gibbs sampler with Markov kernel $P_1 \cdots P_B$ is minorisable.

Lemma 4.5.2 (Bounded Conditional Weight of Dictionary Parameters). Let the same assumptions as in Lemma 4.5.1 hold. The conditional weight of the $B$-th block, which includes the "super-vector" $\tilde\theta_n$ of dictionary parameters and is generated with MH according to (4.8)-(4.10), is bounded, i.e. $\sup w_{\tilde\theta_n} < \infty$.

Theorem 4.5.3 (MC Uniform Ergodicity for Bayesian DL Inference). Let $y_n$ be the variables of the parametric DL problem (4.4) generated with the MH-within-Gibbs sampler (Algorithm 1), according to which all variables except $\tilde\theta_n$ are sampled with Gibbs (first $B-1$ blocks), while $\tilde\theta_n$ is sampled with MH ($B$-th block).
Then the corresponding MC $\{Y^{(1)}, Y^{(2)}, \ldots\}$ is uniformly ergodic.

4.6 Experiments

We compare the Bayesian DL model against the previously proposed parametric SD [6] and ETF [159] methods, which are conceptually closer to our approach. We further perform experiments with the non-parametric K-SVD [4], which yields dictionaries of arbitrary structure and is therefore not directly comparable to our approach. We use synthetic and real biomedical data from EDA signals. Because of their characteristic structure, these types of signals favor sparse representation approaches with parametric dictionaries providing interpretable information [31].

EDA is decomposed into a slow-moving tonic part, depicting the general trend, and a phasic part, which contains fast fluctuations superimposed onto the tonic signal, also called skin conductance responses (SCRs). The tonic part is mathematically expressed as a straight line, while SCRs are represented by sigmoid-exponential functions with a steep rise and slow recovery. Taking this into account, dictionaries contain tonic and phasic atoms, as shown in Table 4.3. Since SCR shapes typically contain higher variability than the signal level, which remains fairly constant throughout an analysis window, for the sake of simplicity we perform DL on the phasic atoms, learning the parameters $\theta = [t_0, T_{rise}, T_{decay}]^T$. The initial dictionaries are created from the combination of all parameters reported in Table 4.3, resulting in 63 tonic and 144 phasic atoms. The analysis window is 5 sec, i.e. 160 samples at the typical sampling frequency of 32 Hz [31].

A critical issue of MCMC is whether a certain number of iterations is enough to stop sampling. In high-dimensional problems, all inferred variables need to converge to the target distribution. Since examining each variable separately is not always feasible, we use a combination of monitoring and diagnostic strategies to quantitatively assess MCMC convergence.
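To make the tonic/phasic structure concrete, the sketch below builds an SCR-like atom and a synthetic signal of the kind described in Section 4.6.1. Since the exact phasic function $\psi_2$ of Table 4.3 is not reproduced here, the sigmoid-exponential form below is an assumed stand-in with the same qualitative shape (steep rise at $t_0$, slow exponential recovery); all names are illustrative.

```python
import math

def scr_atom(t0, t_rise, t_decay, length=160):
    """SCR-like phasic atom: sigmoid rise around t0 multiplied by an
    exponential recovery after t0 (assumed stand-in for psi_2)."""
    return [math.exp(-max(t - t0, 0) / t_decay)
            / (1.0 + math.exp(-(t - t0) / t_rise))
            for t in range(length)]

def synthetic_eda(c, scr_params, length=160):
    """Synthetic signal: constant tonic level c plus R superimposed SCRs,
    mirroring the construction of Section 4.6.1.1."""
    x = [c] * length
    for t0, tr, td in scr_params:
        x = [xi + ai for xi, ai in zip(x, scr_atom(t0, tr, td, length))]
    return x

# 5 s analysis window at 32 Hz -> 160 samples, two SCRs
sig = synthetic_eda(1.0, [(30, 4, 15), (90, 2, 10)])
assert len(sig) == 160 and min(sig) >= 1.0
```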
In the following, we describe the data (Section 4.6.1), the experimental setup (Section 4.6.2), and the results evaluating the learned dictionaries (Section 4.6.3) and providing diagnostics for MCMC convergence (Section 4.6.4).

4.6.1 Data Description

4.6.1.1 Synthetic Data

We randomly generate 1000 synthetic signals that simulate the EDA structure. Each signal is expressed as the sum of a constant $c$ and $R$ SCRs:

$x(t) = c + \sum_{r=1}^{R} \psi_2^{(r)}(t)$   (4.29)

where $\psi_2^{(r)}$ is given in Table 4.3 with parameters $T_{rise}^{(r)}$, $T_{decay}^{(r)}$ and $t_0^{(r)}$. In contrast to the rest of the paper, in which the superscript "$(\cdot)$" denotes the MCMC state, here it symbolizes the SCR index. The parameters of the synthetic data are randomly generated within pre-specified ranges: $t_0^{(r)} \in [1, 150]$ samples, $T_{rise}^{(r)}, T_{decay}^{(r)} \in [1, 20]$, and $R \in [1, 5]$. Since the number of SCRs for each signal is known a priori, DL was performed with $K = R + 1$ atoms, of which one captures the tonic part and the rest the phasic part.

4.6.1.2 Real Data

We further evaluate the considered DL methods on human EDA data from the database for emotion analysis using physiological signals (DEAP) [82]. DEAP contains 40 one-minute recordings from each of 32 subjects watching long excerpts of music videos, designed to study the relation of multimedia content to mood and temperament [82]. Because of the expected SCR rate in the considered 5 sec analysis window [46], training was performed with $K = 3$ atoms, although similar results are obtained for different values.

4.6.2 Experimental Setup

4.6.2.1 Bayesian DL

The proposed MCMC inference (Algorithm 1) is performed with 1000 and 500 iterations for the synthetic and real data, respectively. The atom indices $z_{nl}$, coefficients $s_n$, and selection probabilities $\pi_n$ are initialized based on the decomposition of each exemplar signal with OMP. The mean $\mu_{s_n}$ of the coefficients' prior was initialized with the average of the coefficients of the atoms selected by OMP.
The mean and precision $g_0$ and $G_0$ of the dictionary atoms' prior were initialized with the mean and inverse covariance matrix of the initial parameters (Table 4.3). The scale matrix $R_0$ of the Wishart distribution for sampling $G_n$ was also initialized with the covariance of the dictionary parameters. The remaining hyperparameters were empirically set to $a_k = 2$, $e = 1$, $f = 2$, $\nu_0 = 3$, and $\lambda_s = 10$, with the remaining scale hyperparameter set to 8.

Dictionary combination (Algorithm 2) was performed on the generated parameter vectors $[T_{rise}, T_{decay}]^T$ with $N_b = 100, 225, 400$. We did not include the time offset $t_0$ in this procedure, since it does not contribute to the shape of the dictionary atoms; the final dictionaries should contain the learned versions of the phasic atoms shifted across the entire analysis frame. Therefore, the quantized matrix of dictionary parameters $\Theta_Q \in \mathbb{R}^{2 \times N_b}$ is replicated for all values of $t_0$ (Table 4.3), yielding final dictionaries with $16 N_b$ phasic atoms.

4.6.2.2 Steepest Descent DL

Dictionaries are trained in parallel for each exemplar datum using SD [6], in which OMP alternates with a least-squares fitting step for estimating the parameters of the selected atoms. SD was run for 250 iterations. We perform the same procedure for combining the learned parameters, yielding dictionaries of the same size as with MCMC.

4.6.2.3 Equiangular Tight Frame DL

This method alternates between finding the dictionary with minimum Frobenius distance from the ETF and updating the corresponding parameters [159]. These steps involve an ETF relaxation constraint value and a gradient descent step size, both crucial for the overall performance [159]. In our experiments, dictionaries were trained with different combinations of relaxation constants in $\{0.1, 0.5, 0.9\}$ and step sizes in $\{0.001, 0.002, 0.003, 0.004\}$. The configuration that showed the lowest relative RMS error on the training data was used to evaluate the corresponding test data.
In order to ensure the same dictionary size as in the other approaches, parameters are uniformly initialized within the intervals $T_{rise} \in [8, 18]$ and $T_{decay} \in [10, 20]$, with $\sqrt{N_b} = 10, 15, 20$ different values.

4.6.2.4 K-SVD

K-SVD is a classical non-parametric DL method generalizing K-means and iteratively estimating each dictionary atom in order to minimize the reconstruction error [4]. The original dictionaries used in this method were the same as the initial ones of the aforementioned approaches.

4.6.2.5 Evaluation

Evaluation of the final dictionaries is performed based on the relative RMS error computed between the original signal and its approximation obtained through OMP based on the final dictionaries. The RMS error is a common metric in related studies [4, 52, 159]. In order to ensure that our results reflect the ability of the algorithms to represent unseen data, all experiments are performed within a 10-fold cross-validation setup for the synthetic data and a leave-one-subject-out cross-validation setup for the real EDA signals.

We note that OMP has two functionalities in our framework. It serves as a decomposition technique, based on which the learned dictionaries are evaluated against the test data, and it is also used at the dictionary combination step in order to "prune" the atoms of the generated dictionaries. OMP does not operate on the test data during this dictionary combination step, but rather at the evaluation step, at which point the final dictionary has already been created independently of the test set.

Besides signal reconstruction, dictionary coherence is an additional evaluation criterion. It is defined as the absolute value of the largest inner product between any two atoms of the dictionary and is an important property related to signal reconstruction quality [159]. We computed the resulting coherence of the dictionaries after training.
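The coherence criterion just defined reduces to a pairwise maximum over normalized atoms; a minimal sketch in plain Python (atoms given as lists; the helper name is ours):

```python
def coherence(dictionary):
    """Dictionary coherence: largest absolute inner product between any
    two distinct atoms, after normalizing each atom to unit norm."""
    def normalize(a):
        n = sum(v * v for v in a) ** 0.5
        return [v / n for v in a]
    atoms = [normalize(a) for a in dictionary]
    return max(abs(sum(u * v for u, v in zip(atoms[i], atoms[j])))
               for i in range(len(atoms))
               for j in range(i + 1, len(atoms)))

# Orthogonal atoms -> coherence 0; nearly parallel atoms -> close to 1
assert coherence([[1.0, 0.0], [0.0, 1.0]]) == 0.0
assert coherence([[1.0, 0.0], [0.0, 1.0], [1.0, 0.1]]) > 0.9
```

Low coherence is desirable for sparse recovery guarantees, which is why ETF-based training, which optimizes it directly, attains the lowest values in Table 4.4.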
4.6.3 Results

Visual inspection of the final dictionaries indicates that SD does not always preserve the initial dictionary structure (Figs. 4.1a-b), while ETF yields less variable dictionaries (Fig. 4.1c). MCMC results in a variety of atoms preserving the original shape (Figs. 4.1e-f). In contrast to parametric DL, the non-parametric K-SVD yields very unstructured, non-interpretable atoms (Fig. 4.1d). Further comparison between an exemplar input and the reconstructed signals suggests that our approach provides superior signal representation (Fig. 4.2).

Dictionaries learned with our proposed Bayesian DL yield lower reconstruction errors compared to the initial ones and to those learned through SD and ETF (Fig. 4.3). Atom sampling with and without replacement are not significantly different, since the two distributions are very similar for $K \gg L$. Dictionaries trained using SD perform quite poorly on unseen synthetic data (Figs. 4.3a-c), which might occur because their simple structure causes significant overfitting with least-squares-based methods. Although ETF DL is not prone to overfitting, since it does not take exemplar data into account during training, it lacks adaptation to more complex real data (Figs. 4.3d-f). K-SVD appears more accurate for real signals than the parametric approaches (Figs. 4.3d-f), indicating its ability to learn more complex patterns, not necessarily of the same structure as the initial dictionaries. Similar results occur for different dictionary sizes, omitted here for the sake of simplicity.

The large number of atoms generally yields dictionaries of high coherence. All considered methods appear quite equivalent, with ETF achieving the lowest coherence, since this is included as an optimization metric (Table 4.4).

4.6.4 MCMC Diagnostics

Besides theoretical analysis, another way to examine MCMC convergence is to see how well the MC is mixing, which is usually assessed through visual inspection.
Trace plots of several problem variables from an exemplar signal indicate that the considered chains move around the parameter space and are not limited only to certain areas, suggesting good mixing (Figs. 4.4a-d). Density plots of the generated samples further validate that the estimated posteriors are close to the target distributions (Figs. 4.4e-h).

Convergence diagnostics can further quantitatively assess whether there is a bias in the generated samples. We use the Geweke diagnostic [65] because it only requires one running chain and attempts to address issues of both bias and variance [39]. It takes two non-overlapping parts of the MC (usually the first 0.1 and last 0.5 fractions) and compares their means using a difference-of-means test. If the samples are drawn from the stationary distribution of the chain, the two means are equal. The test statistic is a standard Z-score; under convergence its values lie within two standard deviations of zero, i.e. $|z| < 2$ for the standard normal distribution. We perform the Geweke test for both datasets and both atom sampling methods, with the first 100 samples used as a burn-in period.

The high dimensionality of our problem prohibits us from reporting diagnostics for each variable separately; therefore we summarize the results for groups of variables. For each group of variables, we report the proportion of chains for which $|z| < 2$ (Table 4.5). Most of the variables in our framework pass the Geweke diagnostic. The dictionary parameters, generated with MH, usually have lower success percentages. Although there is a small number of cases where the Geweke diagnostic fails, given the large number of variables and the signal reconstruction results, the performed number of MCMC iterations appears to result in meaningful dictionaries learned through this process.

The MH acceptance rate refers to the fraction of candidate draws that are accepted and is important for convergence.
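The Geweke difference-of-means statistic described above reduces to a few lines. In this simplified sketch (hypothetical helper name) plain i.i.d. variance estimates replace the spectral density estimates used by the full diagnostic:

```python
import math

def geweke_z(chain, first=0.1, last=0.5):
    """Z-score comparing the mean of the first 10% of a chain against
    the mean of the last 50% (naive variance estimates for brevity)."""
    n = len(chain)
    a = chain[: int(first * n)]
    b = chain[int((1 - last) * n):]
    mean = lambda s: sum(s) / len(s)
    var = lambda s: sum((v - mean(s)) ** 2 for v in s) / (len(s) - 1)
    return (mean(a) - mean(b)) / math.sqrt(var(a) / len(a) + var(b) / len(b))

# A strongly trending chain (far from stationary) fails the |z| < 2 check
z = geweke_z([float(i) for i in range(1000)])
assert abs(z) > 2.0
```

A chain whose early and late segments share the same mean, by contrast, yields $z$ near zero, consistent with draws from the stationary distribution.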
Very high acceptance rates suggest that the chain is not mixing well, while very low rates might be inefficient. The acceptance rates for the variables of our problem (Table 4.6) are close to optimal values previously proposed in the literature [127], suggesting good mixing of the considered MC.

4.7 Discussion

An important benefit of Bayesian methods lies in providing estimates of the entire variable set of a problem. In the considered setup, this can help infer the dictionary size, the optimal number of dictionary atoms for a given signal, and the corresponding noise levels. Estimation of the dictionary size is not as crucial in our greedy sparse representation approach, which is not as sensitive to overcomplete dictionaries as basis pursuit methods [32]. Inferring the optimal number of dictionary atoms is important, as it can yield high compression rates and help interpret the underlying signal information. An extension of our method could impose discrete probabilistic distributions on the number of atoms, appropriately inferred through MH.

Another advantage of Bayesian methods is that they are less prone to overfitting, which usually occurs with deterministic algorithms that might describe the random noise instead of the actual data [18]. This is reflected in our results, since the deterministic SD, which does not include prior knowledge of the signal structure, performs more poorly on unseen data. On the other hand, ETF avoids overfitting but does not learn the morphology of the training data. Bayesian inference appears to compromise between the two.

Inherent differences exist between parametric and non-parametric DL methods. The former impose a predetermined structure on the input space (through the function $\psi$) and learn the parameters that represent this structure from the training data.
Their major benefit lies in their interpretability, since the considered dictionary atoms are able to meaningfully capture the characteristics of the input signals and can be used for knowledge-driven classification and inference. On the other hand, the exemplar signals learned by non-parametric methods are hardly interpretable and blindly represent the input space. Since the functionality and scope of these methods are so different, meaningful comparison is challenging. In the case of synthetic data, where the function $\psi$ is a perfect match to the signal (i.e. input signals are built with the same functions as the dictionary atoms), our proposed parametric Bayesian approach is more reliable than K-SVD in learning the hidden atom parameters. This means that given a perfectly constructed dictionary function, Bayesian parametric DL yields better results, even compared to non-parametric approaches. However, in the case of real signals, the function $\psi$ cannot always be perfectly selected; therefore parametric approaches seem to perform slightly worse than K-SVD. While a more precise function $\psi$ might have yielded lower errors, the problem of finding the optimal mapping function is still an active research topic [117].

Although our proposed approach is general for learning the parameters of the dictionary atoms, we need good knowledge of the appropriate mapping function between the parameters and the data. The selection of $\psi$ is usually guided by the application of interest and can vary for different types of signals. In the case of 2D images, for example, the dictionary needs to be constructed using a function that captures the context of the image, such as a Gabor or wavelet-like function. In order to reduce complexity, we can convert the 2D image into a 1D vector with dimensionality equal to the number of pixels, as typically performed [4, 86, 160].
Given a function suitable for the image of interest, our Bayesian approach can learn the parameters that efficiently capture the shape of the data. In this paper, for the sake of simplicity, we considered a simple dictionary combination approach with K-means quantization. However, there exist more sophisticated approaches, such as block-coordinate descent with warm restarts [93] and weighted batch averaging [88]. Convergence is a critical component of MCMC in the context of Bayesian inference. We discussed the uniform ergodicity of the corresponding MH-within-Gibbs sampler, ensuring that the MC converges to the invariant distribution. The convergence rate is another important issue, for which previous studies have proposed explicit theoretical bounds in simple scenarios [77, 126]. The computation of these bounds can be quite difficult in such high-dimensional problems [125] and therefore goes beyond the scope of our study.

4.8 Conclusions

We propose a sparse Bayesian model for parametric DL, whose problem variables follow appropriately selected probabilistic distributions. We use MH-within-Gibbs to infer the corresponding variables because of its ability to handle posterior distributions that cannot be analytically computed. We further show the uniform ergodicity of the proposed MCMC through the minorization of the corresponding Gibbs sampler and the bounded conditional weights of the MH. Our experimental results on synthetic and real biomedical signals indicate that this approach offers benefits in terms of signal reconstruction compared to previously proposed SD and ETF methods and also provides a good tradeoff between learning the signal structure and avoiding overfitting.

Table 4.1: Prior distributions of Bayesian dictionary learning variables.
Variable              Type                              (Hyper)parameters
z_nl (†)              Multinomial(1; π_n)               π_n: outcome probabilities
z_n ∈ R^K (‡)         Wallenius(1_K; L; π_n)            π_n: outcome probabilities
s_n ∈ R^L             Normal(μ_sn, γ_s^{-1} I_L)        μ_sn: mean; γ_s: precision
θ_nk ∈ R^Q            Normal(g_nk, G_n^{-1})            g_nk: mean; G_n: precision
ε_n ∈ R^P             Normal(0, γ_εn^{-1} I_P)          γ_εn: precision
π_n ∈ R^K             Dirichlet(α)                      α_k: concentration parameters
g_nk ∈ R^Q            Normal(g_0, G_0^{-1})             g_0: mean; G_0: precision
G_n ∈ R^{Q×Q}         Wishart(ν_0, R_0)                 ν_0 > Q−1: dof; R_0: scale matrix
γ_εn ∈ R              Gamma(e, f)                       e: shape; f: rate

dof: degrees of freedom; 1_K = [1, …, 1]^T ∈ R^K.
(†) atom sampling with replacement (||z_nl||_0 = 1); z_nl takes values in the one-hot vectors u_i = [0, …, 0, 1, 0, …, 0]^T with 1 in the i-th entry and 0 otherwise.
(‡) atom sampling without replacement (||z_n||_0 = L).

Table 4.2: Description of Metropolis-Hastings-within-Gibbs sampling distributions.
Variable                   Sampling distribution / proposal                                                        Sampling method
z_nl (†)                   Multinomial(1; p_nl), with p_nl,k proportional to π_nk times the Gaussian              Gibbs
                           likelihood of atom k
z_n (‡)                    Wallenius(1_K; L; π_n)                                                                  Metropolis-Hastings
s_nl ∈ R                   Normal (conjugate combination of the prior (μ_snl, γ_s) and the data term,              Gibbs
                           with posterior precision γ_s + γ_εn)
θ̃_n ∈ R^{QL}               Student-t(μ̂_θ̃n, V̂_θ̃n, 1)                                                               Metropolis-Hastings
θ_nk ∈ R^Q, k ∉ I_Dn       Normal(g_nk, G_n^{-1})                                                                  Gibbs
ε_n ∈ R^P                  Normal(0, γ_εn^{-1} I_P)                                                                Gibbs
π_n ∈ R^K                  Dirichlet([α_1 + Σ_{l=1}^L z_nl,1, …, α_K + Σ_{l=1}^L z_nl,K]^T)                        Gibbs
g_nk ∈ R^Q                 Normal((G_n + G_0)^{-1}(G_n θ_nk + G_0 g_0), (G_n + G_0)^{-1})                          Gibbs
G_n ∈ R^{Q×Q}              Wishart(ν_0 + K, [R_0^{-1} + Σ_{k=1}^K (θ_nk − g_nk)(θ_nk − g_nk)^T]^{-1})              Gibbs
γ_εn ∈ R                   Gamma(e + 1/2, f + (1/2)·||x_n − Σ_{l=1}^L s_nl φ(θ_nk'_l)||²)                          Gibbs

x_n = Σ_{l=1}^L s_nl φ(θ_nk'_l) + ε_n; (†) I_Dn = {k'_1, …, k'_L}; 1_K = [1, …, 1]^T ∈ R^K.
(†)/(‡): atom sampling with/without replacement.

Table 4.3: Description of EDA-specific dictionary atoms and initial parameters.

Tonic atoms: φ_1(t) = θ_0 + θ·t, with θ_0 ∈ {20, 10, 1} and θ ∈ {−0.01, −0.009, …, 0, 0.01, 0.02, …, 0.1}.
Phasic atoms: φ_2(t) = e^{−(t−t_0)/T_decay} · [1 + e^{−(t−t_0)/T_rise}]^{−2} · u(t−t_0), with T_rise ∈ {8, 14, 18}, T_decay ∈ {10, 15, 20}, t_0 ∈ {0, 10, 20, …, 150}, and u(t) = 1 for t ≥ 0, u(t) = 0 otherwise.

Table 4.4: Final dictionary coherence.

Method            Synthetic Data    Real Data
Initial           1                 1
SD                0.9990            0.9992
ETF               0.9988            0.9990
K-SVD             0.9989            0.9991
MCMC w- rplcm     0.9989            0.9991
MCMC w-o rplcm    0.9989            0.9991

Table 4.5: Geweke MCMC diagnostic — P(|z| < 2).

                  Synthetic Data            Real Data
                  w- rplcm   w-o rplcm      w- rplcm   w-o rplcm
c_n               0.93       0.98           0.92       0.95
z_n               0.93       0.93           0.91       0.94
π_n               0.91       0.98           0.99       0.92
γ_εn              0.93       0.96           0.96       0.92
θ_n: t_0          0.68       0.70           0.72       0.75
θ_n: T_rise       0.91       0.92           0.87       0.85
θ_n: T_decay      0.92       0.92           0.86       0.85
g_n: t_0          0.75       0.80           0.78       0.79
g_n: T_rise       0.90       0.87           0.89       0.87
g_n: T_decay      0.90       0.86           0.91       0.85
G_n               0.81       0.75           0.82       0.83

Table 4.6: Metropolis-Hastings acceptance rates (%).

                  Synthetic Data            Real Data
                  w- rplcm   w-o rplcm      w- rplcm   w-o rplcm
Acceptance        23.22      31.52          25.48      38.15

Require: Data x_n, hyperparameters H_n
1:  for n = 1, …, N do
2:    for m = 1, …, M do
3:      Sample atom indices z_n1^(m), …, z_nL^(m) or z_n^(m)
4:      Find I_Dn^(m) = {k'_1, …, k'_L} s.t. z_nl,k'_l^(m) = 1
5:      Sample atom coefficients s_n1^(m), …, s_nL^(m)
6:      Sample noise vector ε_n^(m)
7:      Sample dictionary parameters θ_nk^(m), k ∉ I_Dn^(m)
8:      Sample dictionary priors g_n1^(m), …, g_nK^(m), G_n^(m)
9:      Sample atom selection probability priors π_n^(m)
10:     Sample noise variance γ_εn^(m)
11:     Sample θ̃_n^(m) for generating θ_nk^(m), k ∈ I_Dn^(m)
12:     Compute D_n^(m) = [φ(θ_n1^(m)) … φ(θ_nK^(m))]
13:   end for
14: end for
Algorithm 1: MCMC inference of parametric dictionary learning variables

Require: Data x_n, generated dictionaries D_n^(m), number of K-means clusters N_b
1:  for n = 1, …, N do
2:    for m = 1, …, M do
3:      Reconstruct x_n based on D_n^(m) with OMP
4:      Compute relative RMS error Err_n^(m)
5:    end for
6:    Find m*_n such that m*_n = argmin_{m=1,…,M} Err_n^(m)
7:    Retrieve the parameters of the dictionary atoms selected by OMP based on D_n^(m*_n): Θ^(m*_n) = [θ_nk'_1^(m*_n), …, θ_nk'_L^(m*_n)]
8:  end for
9:  Concatenate Θ_U = [Θ^(m*_1) … Θ^(m*_N)]
10: Quantize dictionary parameters: Θ_Q = K-means(Θ_U)
11: Compute the final dictionary D_Q = φ(Θ_Q), where φ operates on each column of matrix Θ_Q
Algorithm 2: Combination of generated dictionaries

[Figure 4.1 panels: (a) Initial, (b) SD, (c) ETF, (d) K-SVD, (e) MCMC (w- rplcm), (f) MCMC (w-o rplcm); each plots EDA (μS) vs. time (samples).]
Figure 4.1: Example of initial dictionary and dictionaries learnt with Steepest Descent (SD), Equiangular Tight Frame (ETF), K-SVD and Markov Chain Monte Carlo Bayesian inference (MCMC) using atom sampling with and without
replacement (w-, w-o rplcm). Dictionary combination is performed with N_b = 400. An instance of phasic atoms shifted with t_0 = 30 is shown.

[Figure 4.2: plot of EDA (μS) vs. time (sec).]
Figure 4.2: Example of input EDA signal (solid cyan line) and reconstructed signals using the original dictionary (blue dashed line) and the dictionaries learnt with Steepest Descent (SD), Equiangular Tight Frame (ETF), and Markov Chain Monte Carlo Bayesian inference (MCMC) (black-triangle, green-asterisk, and red-circled lines, respectively). Reconstruction is performed using orthogonal matching pursuit with 4 iterations.

[Figure 4.3 panels: (a)-(c) synthetic data and (d)-(f) real data, with dictionaries of 1600, 3600, and 6400 phasic + 63 tonic atoms; each plots relative RMS error vs. OMP iterations for Initial, SD, ETF, K-SVD, MCMC (w- rplcm), and MCMC (w-o rplcm).]
Figure 4.3: Relative root mean square (RMS) error between original and reconstructed signal with respect to the number of (orthogonal) matching pursuit (OMP) iterations.
Dictionaries are learnt with Steepest Descent (SD), Equiangular Tight Frame (ETF), K-SVD, and Markov Chain Monte Carlo Bayesian inference (MCMC). During MCMC, atom sampling is performed with (w-) and without (w-o) replacement (rplcm).

[Figure 4.4 panels: (a)-(d) MCMC trace plots and (e)-(h) generated posteriors for the first elements of θ_n, s_n, π_n, and for γ_εn.]
Figure 4.4: Example MCMC trace plots and generated posteriors for the first element of the vectors containing the (a,e) dictionary atoms θ_n, (b,f) atom coefficients s_n, (c,g) atom priors π_n, and (d,h) for the noise precision γ_εn.

Part III
Bio-Behavioral Applications

Chapter 5
Translating Representations to Novel Physiological Measures

Wearable technology can be used in a variety of applications, ranging from tracking and monitoring for health and well-being to assisting with the diagnosis and treatment of serious diseases. Furthermore, continuous recording of physiological data "in the wild" gives researchers exciting new opportunities to study problems and questions that they never had the chance to address in a more structured laboratory setting. In the previous chapters, we described a parametric knowledge-driven framework for representing physiological signals and capturing their meaningful components.
In this part of the thesis, we demonstrate how the aforementioned representations can be used to develop novel physiological measures associated with various social, psychological, and developmental constructs that can assist domain experts in answering a variety of questions. The work presented in this chapter was published in the following venues:

T. Chaspari, B. Baucom, A. C. Timmons, A. Tsiartas, L. Borofsky Del Piero, K.J.W. Baucom, P. Georgiou, G. Margolin, and S.S. Narayanan, "Quantifying EDA synchrony through joint sparse representation: A case-study of couples' interactions," IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brisbane, Australia, 2015.

T. Chaspari, L.I. Stein Duker, S.A. Cermak, and S.S. Narayanan, "EDA-Gram: Designing Electrodermal Activity Fingerprints for Visualization and Feature Extraction," International Conference of the IEEE Engineering in Medicine and Biology Society, pp. 403-406, Orlando, FL, 2016.

Based on the aforementioned physiological representations, we design a multidimensional EDA fingerprint for visualization and feature extraction purposes. The "EDA-Gram" is created similarly to a spectrogram in signal processing: the horizontal axis captures the signal frames, the vertical dimension depicts the selected atom width, and the intensity values are computed from the corresponding atom amplitude. It thus incorporates the SCR width and amplitude variations. Measures of the EDA-Gram intensity are used as features to describe the underlying signals. Results indicate that the proposed features are able to discriminate between multiple arousal and stress conditions in two datasets, signifying their ability to capture the fine-level variations of the EDA. Beyond the representation of a single signal, one important concept is the degree of co-variation (or "synchrony") between signals recorded from interacting people.
We draw an example motivated by the family studies domain, where quantifying this physiological synchrony (i.e., the coordination in physiological patterns) can reveal insights about the quality of an interaction, the nature of the relationship, as well as the individuals' personal characteristics [85, 73]. We propose the Sparse EDA Synchrony Measure (SESM), an index derived from the joint sparse representation of EDA ensembles that captures the similarity between the meaningful signal parts. Sparse decomposition is performed using Simultaneous Orthogonal Matching Pursuit (SOMP) with a knowledge-driven dictionary of tonic and phasic atoms. We compute SESM as the negative natural logarithm of the joint reconstruction error and evaluate it with data from interactions of married and young dating couples participating in tasks of varying emotional intensity. Through statistical analysis and multiple linear regression experiments, our results indicate that SESM depicts significant differences across tasks in both datasets considered and can be associated with individuals' attachment-related characteristics. These findings are consistent with previous results on couples' physiological synchrony [85] and attachment style [73]. In Sections 5.1 and 5.2, we describe the computational framework for the two case-studies, the data used in our analysis, the evaluation procedure, which was motivated by our application of interest, and the results of our analysis.

5.1 Physiological Spectrogram: EDA-Gram

The EDA-Gram is a multidimensional fingerprint of the electrodermal activity (EDA) signal, inspired by the widely-used notion of the spectrogram. It is based on the sparse decomposition of EDA from a knowledge-driven set of dictionary atoms. We focus on SCR shape and occurrence, as these are related to various psychophysiological conditions [46, 23] and can be measured from the phasic atoms. An intuitive characteristic of phasic atoms is their width ω_k, k = 1, …, K_2, denoted as phasic width (PW) and incorporating the SCR rise and recovery (Fig. 5.1a). Because of the large number of phasic atoms in the dictionary, we group the PW values through a histogram, such that each of the values {ω_k}, k = 1, …, K_2, is assigned to one of M bands, M ≪ K_2, called phasic width bands (PWBs). The PWBs therefore correspond to the centers of the histogram generated from the PW values (Fig. 5.1b). An analysis frame is decomposed as

x_n = x_n^tonic + Σ_{k∈I_n} c_k φ(t; θ_k),

where x_n^tonic is the signal reconstructed from the tonic atoms, and I_n = {i_1, …, i_{L_2}} and c_{i_l} ≠ 0 (i_l ∈ I_n) are the L_2 < L indices and coefficients of the phasic atoms. Each atom from the set I_n is associated with a coefficient c_{i_l} and a PWB. Figs. 5.1c-d show an EDA signal, indicative analysis frames, and the corresponding selected phasic atoms. Inspired by the spectrogram, which is a signal representation over time and frequency, the EDA-Gram can be expressed as the time evolution of the EDA over the M PWBs. The EDA-Gram S_x(n, m) for signal x_n is:

S_x(n, m) = Σ_{i∈I_nm} c_i,   n = 1, …, N,  m = 1, …, M
I_nm = {i_l : i_l ∈ I_n and the PW of atom i_l belongs to the m-th PWB}     (5.1)

where I_nm includes the indices of the selected phasic atoms for the n-th time frame that belong to the m-th PWB. The horizontal axis of the EDA-Gram corresponds to time (i.e., the n-th frame), the vertical axis to the PWB index m, while the intensity values are computed from the sum of the coefficients of the selected atoms over each PWB (Fig. 5.1e). The EDA-Gram offers a foundation for deriving appropriate features. We extract two types of features that capture the mean intensity and the variability of intensity across PWBs:

S_x^Int = (1/(NM)) Σ_{n=1}^N Σ_{m=1}^M S_x(n, m)     (5.2)

S_x^Var = (1/N) Σ_{n=1}^N sqrt( (1/M) Σ_{m=1}^M ( S_x(n, m) − S̄_x(n) )² )     (5.3)

where S̄_x(n) = (1/M) Σ_{m=1}^M S_x(n, m) is the mean intensity over all PWBs for the n-th frame.
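Computed directly from an EDA-Gram matrix S (rows indexing frames, columns PWBs), eqs. (5.2)-(5.3) reduce to a few lines; the toy matrix below is hypothetical:

```python
import numpy as np

def edagram_features(S):
    """Mean intensity (eq. 5.2) and intensity variability (eq. 5.3)
    of an EDA-Gram S with shape (N frames, M phasic width bands)."""
    S = np.asarray(S, dtype=float)
    s_int = S.mean()                 # (1/NM) * sum over all n, m
    s_var = S.std(axis=1).mean()     # per-frame std across PWBs, averaged over frames
    return s_int, s_var

# Toy EDA-Gram: 4 frames x 3 phasic width bands (hypothetical values).
S = np.array([[0.1, 0.0, 0.2],
              [0.0, 0.3, 0.0],
              [0.2, 0.2, 0.2],
              [0.0, 0.0, 0.4]])
s_int, s_var = edagram_features(S)
print(round(s_int, 3), round(s_var, 3))  # 0.133 0.103
```

Note that eq. (5.3) uses the population standard deviation (division by M), which matches numpy's default `ddof=0`.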
Large values of S_x^Int denote frequent SCRs of high amplitude, while a large S_x^Var indicates high SCR variability across the signal. We will explore how the proposed features vary in relation to different arousal levels and environmental conditions.

5.1.1 Data Description

We validate our approach with two datasets. The first is the publicly available dataset for emotion analysis using physiological signals (DEAP) [82], which investigates the relation of human physiology to multimedia. DEAP includes 40 one-minute recordings from each of 32 individuals. We use the self-reported arousal measures, which take continuous values between 1 and 9, as these are closely related to EDA [46, 23]. EDA was collected from the index and middle finger and was downsampled to F_s = 32 Hz, which is a typical practice. Motivated by the challenges children with special healthcare needs face during oral care [141], the second dataset was collected in order to examine the impact of a sensory adapted dental environment (SADE) on the behavioral distress, physiological stress, pain, and sensory discomfort of children with autism spectrum disorders (ASD). Specifically, the SADE modified the dental room with respect to visual, auditory, and tactile stimuli and was compared to a regular dental environment (RDE) in children with ASD and their typically developing (TD) peers [28]. Our data (DENTISTRY) include EDA recordings from children with ASD (n=22) and their TD peers (n=22), 6-12 years of age, during dental prophylaxis (cleaning). Each child underwent a dental cleaning in both the RDE and the SADE, which will be compared in terms of EDA measures. EDA was collected with a sampling frequency F_s = 31.25 Hz from the index and middle finger.
For the second baseline, we compute the power spectral density (PSD) over 16 frequency bands spanning 0 to F_s/2 Hz. We further extract the average PSD over time and frequency (PSD_Int), as well as its mean standard deviation over frequency bands (PSD_Var), analogously to the EDA-Gram features.

5.1.3 Results

EDA-Grams show increased intensity over regions of large tonic levels and high phasic activity (Fig. 5.2). RDE samples from the DENTISTRY data depict a larger SCL and SCR frequency compared to SADE (Fig. 5.2a), which is reflected in higher EDA-Gram intensity values (Fig. 5.2a,b). It is noteworthy that, although Participant 2 depicts EDA signals within the same range for the two environments, there are differences in the corresponding waveforms, which are also captured by the EDA-Grams. Similar observations were made for DEAP. We explore the relation between arousal and EDA features in DEAP through a linear mixed-effects model (LME) [64]. The LME addresses violations of the independence assumption that could occur from multiple samples per participant. It models the self-reported arousal score A_ij of the j-th individual during the i-th trial based on the EDA feature F_ij, a grand mean γ_00, an individual-specific mean u_0j, and a residual r_ij, such that

A_ij = γ_00 + u_0j + γ_10 F_ij + r_ij,

where γ_10 reflects the relation between A_ij and F_ij. Results indicate a significant association between all considered EDA measures and the arousal scores, an association that is stronger for the EDA-Gram features (Table 5.1).
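To illustrate the fixed slope of the LME above, the sketch below simulates data from the model and recovers the slope with a within-subject (demeaned) least-squares fit — a simplified stand-in for a full mixed-model fit, not the estimator used in our analysis; all numbers are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate A_ij = g00 + u_0j + g10 * F_ij + r_ij (hypothetical parameter values).
n_subj, n_trials = 32, 40
g00, g10 = 5.0, 0.8
u = rng.normal(0.0, 1.0, n_subj)                        # subject-specific intercepts u_0j
F = rng.normal(0.0, 1.0, (n_subj, n_trials))            # EDA feature per subject/trial
A = g00 + u[:, None] + g10 * F + rng.normal(0.0, 0.5, (n_subj, n_trials))

# Within-subject centering removes the subject intercepts; pooled OLS on the
# centered data then recovers the fixed slope g10.
Ac = A - A.mean(axis=1, keepdims=True)
Fc = F - F.mean(axis=1, keepdims=True)
g10_hat = (Fc * Ac).sum() / (Fc ** 2).sum()
print(f"estimated slope: {g10_hat:.2f}")
```

A dedicated mixed-model routine would additionally estimate the variance components of u_0j and r_ij, but the within-subject estimator suffices to show how the slope is identified despite per-subject offsets.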
Linear regression was used for the DEAP dataset to predict the continuous arousal levels from the EDA features, while K-nearest neighbor (K-NN, K=13) classification was performed for the DENTISTRY data to classify between RDE and SADE. The feature dimensionality is one in all cases, and chance classification accuracy is 50%. Results suggest better discriminability of the EDA-Gram features compared to the considered baselines (Table 5.2).

5.2 Sparse EDA Synchrony Measure

In the following, we use x = [x(1), …, x(L)]^T ∈ R^L to denote a vector of length L and x(t), t = 1, …, L, its value at the t-th sample in time. The p-order norm of a signal x is denoted ||x||_p, and the cardinality of a set is written |·|. Let f^(m) ∈ R^L, m = 1, 2, be two co-occurring EDA signals. According to joint sparse representation [154, 150], these can be represented as a linear combination of a small set of N common atoms from a dictionary D ∈ R^{L×Q}, such that f^(m) ≈ D c^(m). The atom coefficients c^(m) ∈ R^Q, with ||c^(m)||_0 = N ≪ Q, of each stream are supported on the same subset of {1, …, Q} of cardinality N. Sparse decomposition is performed with SOMP because of its simple implementation and theoretical guarantees of correctness [150]. SOMP is an iterative greedy algorithm that selects at each step n = 1, …, N the dictionary atom with the maximum average correlation:

g_n = argmax_{g∈D} (1/2) Σ_{m=1}^2 <R_n f^(m), g>,   n = 1, …, N     (5.4)
If D n = [g 1 ;:::; g n ]2< Ln is the matrix of selected atoms at iterationn, the least-squares approximation of f (m) form = 1,2 is: ~ f (m) n = D n (D n T D n ) 1 D n T f (m) ; n = 1;:::;N (5.5) Representation is assessed with the average relative root mean square (RMS) error between original and reconstructed signals: RelErr = 1 2 2 X m=1 jjf (m) ~ f (m) N jj 2 jjf (m) jj 2 (5.6) Taking into account the long duration of recordings, the original data are segmented and analyzed into K non-overlapping windows of predetermined lengthL, each yielding representation error: RelErr k = 1 2 2 X m=1 jjf (m) k ~ f (m) k;N jj 2 jjf (m) k jj 2 ; k = 1;:::;K (5.7) where f (m) k , ~ f (m) k;N are original and reconstructed signals atk window. Intuitively we understand that when two signals are similar, the common dictionary atoms yield low representation error. In contrast, if they are less alike, the dictionary 85 atoms jointly selected for both of them will not be able to reliably represent them, in- creasing the corresponding error. This allows us to define SESM, a synchrony index that captures the similarity between two EDA signals expressed as: SESM = ln 1 K K X k=1 RelErr k ! (5.8) More synchronized EDA streams will have higher SESM value, and vice-versa. As shown in Fig. 5.3, SCR fluctuations that are not common for the two EDA signals yield poor reconstruction (12-20sec), while those that co-occur result in commonly selected atoms that reliably reconstruct the corresponding parts of both streams (0-12sec). 5.2.1 Data Description The first set of our data was collected as a part of an ongoing study of communication and emotion in married couples at the University of Utah. We used EDA signals from 19 married couples (ages 21-47) recorded using the BIOPAC MP150 system at a sam- pling rate of 62.5Hz. Two gel electrodes were placed on the thenar and hypothenar parts of the palm at each participant’s non-dominant hand. 
Couples were asked to sit quietly for 5 minutes, first in the same room and then in a separate room, to develop a baseline for the physiology data; these tasks are referred to as "RestS" and "RestT". A 5-minute events-of-the-day discussion ("Events") followed, in which couples were asked to talk with one another as they normally would when they reunite after the day. The relationship-history conversation ("History") lasted 5 minutes, and partners talked about the beginning of their relationship. Finally, during two 10-minute change discussions, they focused on areas of disagreement in their marriage. One conversation was on a topic of concern for the husband ("ChangeH") and one for the wife ("ChangeW"). The second database comes from an ongoing collection of interactions between 8 young couples (ages 18-25 years) conducted at the University of Southern California. The same physiological equipment and recording settings were used as in the married couples' data collection, with similarly structured tasks. Partners first watched a video of relaxing images for 15 minutes to establish a physiological baseline ("Relaxation"). They were then instructed to engage in a 5-minute discussion in which they planned a date that they could have together ("Date"). During the 10-minute change discussion ("Change"), partners had to talk to each other about things in their relationship that could be different. Finally, two loss conversations followed, in which they had to discuss for 10 minutes a significant loss in their life. One conversation was centered on the male's loss ("LossM") and the other on the female's loss ("LossF"). Participants completed a self-report in order to measure the extent to which they relate to their romantic partners. For married couples, the "Adult Attachment Questionnaire" (AAQ) [135] was used, containing 17 items eventually combined into two orthogonal dimensions of avoidance and ambivalence.
Avoidance reflects the tendency to withdraw from closeness and intimacy (scores 8-56), while ambivalence depicts conflicted thoughts and feelings about whether others can be counted on in relationships (scores 9-63). Young couples completed a similar survey, the "Experiences in Close Relationships - Revised" (ECR-R) [62], designed to obtain self-reports of attachment-related anxiety and avoidance. Similar to the AAQ, anxiety indicates how secure people are about their romantic partners, while avoidance captures how comfortable they are depending on others. Both are scored by averaging the relevant items of an 18-item questionnaire, with values ranging between 1 and 7. Movement artifacts were manually removed from the EDA using the visualization capabilities of BIOPAC's AcqKnowledge software. High-frequency noise artifacts were further suppressed with a low-pass Blackman filter [108] with a length corresponding to 1 second.

5.2.2 Application-Driven Evaluation of SESM

Physiological synchrony cannot be directly observed, which challenges the evaluation of our model. We follow previous psychophysiological and engineering studies examining
Previous research indicates that couples’ physiological synchrony can be related to the individuals’ attachment characteristics [73]. Taking this into account, we use multiple linear regression (MLR) [38] to predict the attachment scores of each person based on the SESM values computed at the different tasks. MLR is performed within a nested leave- one-couple-out cross-validation [27]; outer cross-validation is used to assess the perfor- mance of the model, while inner to determine the model hyper-parameters, i.e. analysis window length and SOMP iterations. We report the correlation and the corresponding significance between predicted and ground-truth scores with two feature groups. The first is a 4-dimensional vector of SESM values from only the discussions, i.e. (Events, History, ChangeH, ChangeW) and (Date, Change, LossM, LossF) for married and young couples. The second includes SESM from all tasks resulting in 6 and 5 dimensions for the two datasets, respectively. Since one of the married couples did not provide attachment ratings, MLR results are obtained using 18 couples for the first dataset. 5.2.3 Results EDA signals were segmented using analysis windows of W = 10, 20 and 30 sec to capture various time-scales and represented withN = 5, 10, 15 and 20 SOMP iterations 88 to account for multiple detail levels. For the sake of simplicity, statistical analysis is performed with 20 sec and 10 SOMP iterations, while similar results were obtained with the remaining parameter combinations. During the inner fold of the MLR nested cross- validation, we perform a leave-one-couple-out cross-validation within the training set using a 34 grid of all combinations for parametersW andN. The one with the highest correlation during training is used to predict the attachment rating of the corresponding test data. Repeated-measures ANOV A indicates a significant effect of the type of task on both databases (Table 5.3). 
For married couples, SESM was significantly lower for the sepa- rate resting baseline compared to wife’s change discussion (p=0.028) and lower with ap- proaching significance compared to relationship history (p=0.112) and husband’s change (p=0.120). Events of the day discussion showed significantly lower SESM than relation- ship history (p=0.01) and both change (p<0.01) discussions (Fig. 5.4a). For the small sample size of young couples (Fig. 5.4b), significant difference only occurred between relaxation and date planning (p=0.059). MLR results suggest that EDA synchrony quantified through SESM can be associated with attachment ratings (Table 5.4). Ambivalence scores of married couples tend be accurately predicted for both wife and husband, while avoidance is moderately predicted without reaching statistical significance. Wife’s ambivalence is better predicted when we include the resting baselines in the features, indicating that these tasks might contain useful information about of individuals’ attachment patterns. Male anxiety and avoidance in young couples tend to be related to SESM, especially during discussions. Female ratings show poorer prediction rates. This could be due to the small sample size resulting in models that might not capture the inherent variability. For most cases, including relaxation did not benefit prediction results. Our results indicate that the proposed SESM is able to differentiate potential changes in synchrony patterns across tasks. The discussions occurring in the two datasets are of 89 comparable intensity and can be studied in parallel. Events of the day and date plan- ning conversations for married and young couples were in general more neutral. On the other hand, change discussions in both data, as well as relationship history and loss discussions from the first and second datasets, respectively, contain more intense emo- tions. 
SESM values differ between these neutral and more intense discussions, suggesting higher synchrony patterns in the latter for both datasets. These results are consistent with previous findings [85] of higher physiological linkage during negative expression and exchange. We further observed that SESM is associated with individuals' attachment scores in both datasets, as indicated in [73]. Ambivalence and anxiety seem to be reasonably predicted for the husband and male partner, respectively. A weak association occurs between SESM and avoidance, the latter being harder to predict. The two datasets involved quite different physiological baselines. Young couples watched together a video of images, while married couples performed two resting baselines. SESM depicted different patterns for the two, i.e., higher for young couples' relaxation than their other discussions and lower for married couples' resting baselines compared to their other tasks. The reason for this observation might be two-fold. SESM appears to be more reliable in capturing synchrony of simultaneous fluctuations of the phasic part, rather than co-variation of the smooth tonic signal. Potentially because of the relaxing images, young couples' relaxation showed smooth decreasing trends with minimal fluctuations, resulting in small reconstruction errors and high SESM values. We could overcome this by comparing the joint representation of EDA ensembles relative to their univariate representations. The distinct nature of the two baselines could also affect the corresponding signals differently [57].

5.3 Conclusions

In the first part of this chapter, we proposed the EDA-Gram for visualizing EDA signals and extracting meaningful features. It is created based on the sparse decomposition of EDA with knowledge-driven dictionaries that capture the tonic and phasic signal components.
The EDA-Gram is a multidimensional representation in which the x-axis denotes time, the y-axis reflects the phasic EDA characteristics through the PWB space, and the intensity is measured from the amplitude of the selected atoms. Visualization and analysis indicate the ability of EDA-Grams and the derived features to differentiate between multiple arousal conditions and environmental effects in two datasets, outperforming the considered baseline metrics. Next, we demonstrated a way to quantify physiological synchrony using SESM, a synchrony index that captures the similarity between two EDA signals using joint sparse models with an appropriate knowledge-driven dictionary. Results on two datasets indicate that SESM shows distinct synchrony patterns across tasks of varying intensity and is associated with couples' attachment-related scores. Future work will focus on asymmetrical EDA models for identifying directional effects between people. We will also compare our approach to previously proposed dynamical systems and time-series analyses, and extend it to other biomedical signals, such as ECG.

Table 5.1: Linear mixed-effects model (LME) for predicting self-reported arousal (DEAP) from EDA features, and two-way ANOVA comparing differences in EDA features between dental environments and cleaning tasks (DENTISTRY).
Feature    DEAP: LME estimate                    DENTISTRY: ANOVA p-value
           Intercept (β00)  Independent (β10)    Environment (RDE/SADE)  Outcome (ASD/TD)
SCL_M      0.94             0.05                 <0.01                   <0.05
SCR_F      0.42             0.23                 <0.11                   <0.05
PSD_Int    0.94             0.05                 <0.01                   0.05
PSD_Var    0.59             0.08                 0.60                    0.80
S_Int_x    0.25†            0.33                 <0.05                   0.06
S_Var_x    0.25†            0.33                 <0.05                   0.31
† p<0.05, * p<0.01

Table 5.2: Pearson's correlation between real and predicted arousal values (DEAP) and classification accuracy between regular and sensory adapted dental environments (DENTISTRY) using the skin conductance level (SCL_M), skin conductance response frequency (SCR_F), intensity and variation of power spectral density (PSD_Int, PSD_Var), and EDA-Gram features (S_Int_x, S_Var_x).

Feature    DEAP - Correlation    DENTISTRY - UA (%)
SCL_M      0.090                 52.3
SCR_F      0.088                 47.2
PSD_Int    0.080†                52.3
PSD_Var    0.042†                44.3
S_Int_x    0.152                 59.1
S_Var_x    0.127                 65.3
† p<0.05, * p<0.01

Table 5.3: Repeated-measures ANOVA for overall significant differences of the Sparse EDA Synchrony Measure (SESM) across tasks.

Database               F-score                      P-value
Married couples        F(2.736, 49.256) = 7.799     p<0.01
Young dating couples   F(2.168, 15.179) = 5.272     p=0.017

Table 5.4: Pearson's correlation between ground-truth and predicted attachment ratings using multiple linear regression with Sparse EDA Synchrony Measure (SESM) features.
Married couples' interactions
Tasks                         Avoidance                   Ambivalence
                              Husband       Wife          Husband       Wife
Discussions^a                 0.03 (0.89)   0.39 (0.11)   0.61 (0.01)*  0.14 (0.57)
Baselines^b + Discussions^a   0.32 (0.19)   0.20 (0.42)   0.62 (0.01)*  0.57 (0.01)*
^a (Events, History, ChangeH, ChangeW), ^b (RestS, RestT), * p<0.05

Young dating couples' interactions
Tasks                         Avoidance                   Anxiety
                              Male          Female        Male          Female
Discussions^c                 0.79 (0.02)*  -0.10 (0.82)  0.88 (0.00)*  -0.65 (0.08)
Relaxation + Discussions^c    -0.35 (0.40)  0.32 (0.44)   0.69 (0.06)†  -0.60 (0.12)
^c (Date, Change, LossM, LossF), * p<0.05, † p<0.1

[Figure omitted from transcript.]
Figure 5.1: EDA-Gram design. (a) Example of phasic dictionary atoms with phasic width (PW). (b) Histogram of PW values with the resulting phasic width bands (PWB). (c) Input EDA. (d) Signal analysis frames (in solid blue line) and selected phasic atoms (in non-solid lines). (e) EDA-Gram.
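The EDA-Gram assembly summarized in the Figure 5.1 caption (time on the x-axis, phasic width bands on the y-axis, atom amplitudes as intensity) can be sketched as follows; the band edges and the (frame, width, amplitude) triplets are hypothetical inputs standing in for the atoms actually selected by the sparse decomposition.

```python
import numpy as np

def edagram(frame_atoms, band_edges, n_frames):
    """Toy EDA-Gram assembly: rows are phasic width bands (PWB), columns are
    analysis frames, and each cell accumulates the amplitudes of the atoms
    selected in that frame whose width falls in that band."""
    n_bands = len(band_edges) - 1
    G = np.zeros((n_bands, n_frames))
    for frame, width, amplitude in frame_atoms:
        # locate the band whose [edge_i, edge_{i+1}) interval contains the width
        band = np.searchsorted(band_edges, width, side="right") - 1
        if 0 <= band < n_bands:
            G[band, frame] += abs(amplitude)
    return G
```

For example, an atom of width 4.2 selected in frame 0 with amplitude 1.0 lands in the band covering widths [4, 6), brightening that row of the first column; features such as S_Int_x and S_Var_x would then summarize this matrix along its axes.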
[Figure omitted from transcript.]
Figure 5.2: Example of EDA signals and EDA-Grams for two subjects during the regular (RDE) and sensory adapted (SADE) dental environments.

[Figure omitted from transcript.]
Figure 5.3: Joint sparse decomposition of two EDA signals with simultaneous orthogonal matching pursuit (SOMP) and their commonly selected phasic atoms, scaled differently per signal. The same legend applies to the first/second and third/fourth plots, and the same time axis to all.

[Figure omitted from transcript.]
Figure 5.4: Sparse EDA Synchrony Measure (SESM) across tasks. Error bars represent one standard deviation distance from the mean.

Part IV
Concluding Remarks

Chapter 6
Concluding Remarks

6.1 Thesis Summary

This thesis presented a knowledge-driven interpretable parametric framework for reliably representing physiological signals.
Its first contribution lies in the design of signal-specific representations with parametric dictionaries and sparse decomposition techniques in order to distinguish the meaningful signal parts and capture their informative characteristics. The second contribution concentrates on the automatic learning of representations through the formulation and inference of a Bayesian dictionary learning problem. Finally, the third contribution lies in translating these signal representations into measurable indices that can be related to behavioral constructs. In the following, we briefly summarize each of these contributions.

DESIGNING PHYSIOLOGICAL REPRESENTATIONS: This thesis proposed the use of sparse decomposition techniques along with parametric knowledge-driven dictionaries in order to capture the characteristic structure of physiological signals. Dictionaries contain tonic and phasic atoms with high variability that incorporate the signal levels and fluctuations. We used matching pursuit algorithms for signal representation because of their speed and simplicity, and performed early stopping to preserve the interpretability of the selected atoms. SCR detection is achieved through post-processing of the selected atoms. Quantitative evaluation criteria include reconstruction, component detection, and compression measures, and indicate the improvement of our approach over previously proposed methods.

LEARNING PHYSIOLOGICAL REPRESENTATIONS: This thesis further demonstrated the ability of Bayesian inference to learn parametric dictionaries with applications to biomedical signals. The proposed stochastic framework can better handle uncertainty and yields estimates of the full set of problem variables. Probabilistic distributions were estimated through MCMC because of its simplicity and ability to fully perform Bayesian inference.
Finally, since the high dimensionality of our problem increases complexity and running time, we demonstrate the ability of our algorithm to perform signal-specific learning and parallel processing. Our results on synthetic and real EDA data indicate that our approach can yield superior signal representations compared to the gradient-descent and Equiangular Tight Frame methods.

TRANSLATING PHYSIOLOGICAL REPRESENTATIONS TO NOVEL MEASURES: The representations resulting from the previous parts of this thesis were used in real-world applications through the design of physiological spectrograms and the quantification of EDA synchrony. The first is used for stress detection, while the latter is employed for assessing physiological co-variation between romantic couples. Both measures are based on (joint) sparse representation techniques along with EDA-specific dictionaries. Our results suggest that the EDA-Gram can capture and accentuate fine-grain signal fluctuations not always apparent from traditional signal inspection. They further indicate that the proposed Sparse EDA Synchrony Measure (SESM) differs across discussions of varying emotional intensity and can be further related to self-reported attachment measures indicative of person-dependent characteristics about relationships. Such visualizations and measures can assist in clinical applications; therefore, future work will evaluate them with human annotators for usefulness and ease of operation.

6.2 Future Work

Human physiology provides a window into a person's physical, mental, and affective state, complementing observable indices of behavior but also eliciting and affecting them. It is a complex, multifaceted, and dynamically changing system confounded by multiple sources of individual and environmental variability. This thesis has presented the design and automatic learning of physiological representations through the use of parametric dictionaries with sparse decomposition techniques.
Future work will focus on the measurement, quantification, and modeling of physiology in relation to human state around two main foci.

MULTIMODAL BIO-INTEGRATION AND CONTEXTUALIZATION: Biomedical signals co-evolve interdependently with each other and as part of an interacting environment. Capturing these variable and complementary sources of physiological information is now facilitated more than ever by the Internet of Things, which makes it possible for each individual to interact with multiple wearable and body-integrated sensors. This research thread focuses on novel ways to jointly process biomedical signals and contextualize them through co-occurring multimodal information (e.g., audio recordings, video, self-reports, biochemical markers). This has the potential to yield a holistic mind-body view of a person, which can offer actionable knowledge and valuably inform decision making by stakeholders.

HUMAN-ASSISTIVE BIO-FEEDBACK: This thread focuses on the development of assistive bio-feedback technologies across the span of wellness, health, and other applications. This will enable clinicians to more effectively monitor symptoms, understand the causes of a disease, and assess therapeutic interventions, while helping individuals exercise and promote self-regulation. Examples from the not-so-distant future include increased blood-pressure warnings for people at risk of heart failure and stroke, physiology-guided feedback to mentally challenged populations, such as individuals with Autism, for developing emotional self-awareness, or even closed-loop stimulation for reducing stress and promoting emotional well-being. Despite these opportunities, the nature of bio-feedback (e.g., visual, haptic, neuro-feedback), the scope of its occurrence (e.g., reassuring the user, modifying a behavior, better understanding), and the development of reliable automatic provision mechanisms are some unresolved issues that need to be examined.
Reference List

[1] Hervé Abdi. The Bonferroni and Šidák corrections for multiple comparisons. Encyclopedia of Measurement and Statistics, 3:103–107, 2007.
[2] Hervé Abdi. The Greenhouse-Geisser Correction. Encyclopedia of Research Design. SAGE Publications, Thousand Oaks, CA, USA, 2010.
[3] M. Aharon and M. Elad. Sparse and redundant modeling of image content using an image-signature-dictionary. SIAM Journal on Imaging Sciences, 1(3):228–247, 2008.
[4] M. Aharon, M. Elad, and A. Bruckstein. K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. IEEE Transactions on Signal Processing, 54(11):4311–4322, 2006.
[5] D.M. Alexander, C. Trengove, P. Johnston, T. Cooper, J.P. August, and E. Gordon. Separating individual skin conductance responses in a short interstimulus-interval paradigm. Journal of Neuroscience Methods, 146(1):116–123, 2005.
[6] M. Ataee, H. Zayyani, M. Babaie-Zadeh, and C. Jutten. Parametric dictionary learning using steepest descent. Proceedings of ICASSP, pages 1987–1981, 2010.
[7] Dominik R. Bach, Karl J. Friston, and Raymond J. Dolan. An improved algorithm for model-based analysis of evoked skin conductance responses. Biological Psychology, 94(3):490–497, 2013.
[8] D.R. Bach, J. Daunizeau, K.J. Friston, and R.J. Dolan. Dynamic causal modelling of anticipatory skin conductance responses. Biological Psychology, 85:163–170, 2010.
[9] D.R. Bach, J. Daunizeau, N. Kuelzow, K.J. Friston, and R.J. Dolan. Dynamic causal modeling of spontaneous fluctuations in skin conductance. Psychophysiology, 48(2):252–257, 2011.
[10] D.R. Bach, G. Flandin, K.J. Friston, and R.J. Dolan. Modelling event-related skin conductance responses. International Journal of Psychophysiology, 75:349–356, 2010.
[11] D.R. Bach and K.J. Friston. Model-based analysis of skin conductance responses: Towards causal models in psychophysiology. Psychophysiology, 50(1):15–22, 2013.
[12] Q. Barthélemy, C. Gouy-Pailler, Y.
Isaac, A. Souloumiac, A. Larue, and J.I. Mars. Multivariate temporal dictionary learning for EEG. Journal of Neuroscience Methods, 215(1):19–28, 2013.
[13] Q. Barthélemy, C. Gouy-Pailler, Y. Isaac, A. Souloumiac, A. Larue, and J.I. Mars. Multivariate temporal dictionary learning for EEG. Journal of Neuroscience Methods, 215(1):19–28, 2013.
[14] M. Bédard and D.A.S. Fraser. On a Directionally Adjusted Metropolis-Hastings Algorithm. International Journal of Statistical Sciences, 9(1):33–57, 2008.
[15] M. Benedek and C. Kaernbach. A continuous measure of phasic electrodermal activity. Journal of Neuroscience Methods, 190(1):80–91, 2010.
[16] M. Benedek and C. Kaernbach. Decomposition of skin conductance data by means of nonnegative deconvolution. Psychophysiology, 47(4):647–658, 2010.
[17] P.F. Binkley. Predicting the potential of wearable technology. IEEE Engineering in Medicine and Biology Magazine, 22(3):23–27, 2003.
[18] Christopher M. Bishop. Model-based machine learning. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 371(1984), 2013.
[19] T. Blumensath. Monte Carlo methods for compressed sensing. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1000–1004, 2014.
[20] T. Blumensath and M. Davies. Sparse and shift-invariant representations of music. IEEE Transactions on Audio, Speech, and Language Processing, 14(1):50–57, 2006.
[21] P. Bonato. Wearable Sensors/Systems and Their Impact on Biomedical Engineering: An overview from the guest editor. IEEE Engineering in Medicine and Biology Magazine, 22(3):18–20, 2003.
[22] W. Boucsein. Electrodermal Activity. Plenum University Press, 1992.
[23] W. Boucsein. Electrodermal Activity. Springer, New York, NY, 2012.
[24] W. Boucsein, D.C. Fowles, S. Grimnes, G. Ben-Shakhar, W.T. Roth, M.E. Dawson, and D.L. Fillion. Committee report: Publication recommendations for electrodermal measurements.
Psychophysiology, 49:1017–1034, 2012.
[25] A. Burns, B.R. Greene, M.J. McGrath, T.J. O'Shea, B. Kuris, S.M. Ayer, F. Stroiescu, and V. Cionca. SHIMMER - A wireless sensor platform for noninvasive biomedical research. IEEE Sensors Journal, 10(9):1527–1534, 2010.
[26] T. Tony Cai and Lie Wang. Orthogonal matching pursuit for sparse signal recovery with noise. IEEE Transactions on Information Theory, 57(7):4680–4688, 2011.
[27] Gavin C. Cawley and Nicola L.C. Talbot. On over-fitting in model selection and subsequent selection bias in performance evaluation. The Journal of Machine Learning Research, 11:2079–2107, 2010.
[28] S.A. Cermak, L.I.S. Duker, M.E. Williams, M.E. Dawson, C.J. Lane, and J.C. Polido. Sensory Adapted Dental Environments to Enhance Oral Care for Children with Autism Spectrum Disorders: A Randomized Controlled Pilot Study. Journal of Autism and Developmental Disorders, 45(9):2876–2888, 2015.
[29] K.S. Chan. Asymptotic behavior of the Gibbs sampler. Journal of the American Statistical Association, 88(421):320–326, 1993.
[30] T. Chaspari, A. Tsiartas, L.I. Stein, S.A. Cermak, and S. Narayanan. Sparse Representation of Electrodermal Activity with Knowledge-Driven Dictionaries. IEEE Transactions on Biomedical Engineering, under review.
[31] T. Chaspari, A. Tsiartas, L.I. Stein, S.A. Cermak, and S.S. Narayanan. Sparse representation of electrodermal activity with knowledge-driven dictionaries. IEEE Transactions on Biomedical Engineering, 62(3):960–971, 2015.
[32] S.S. Chen, D.L. Donoho, and M.A. Saunders. Atomic decomposition by basis pursuit. SIAM Journal on Scientific Computing, 20(1):33–61, 1998.
[33] H. Cheng, Z. Liu, L. Hou, and J. Yang. Sparsity induced similarity measure and its applications. IEEE Transactions on Circuits and Systems for Video Technology, 2012.
[34] S. Chib and E. Greenberg. Understanding the Metropolis-Hastings algorithm. The American Statistician, 49(4):327–335, 1995.
[35] S. Chib, E. Greenberg, and R. Winkelmann.
Posterior simulation and Bayes factors in panel count data models. Journal of Econometrics, 86(1):33–54, 1998.
[36] S. Chib and I. Jeliazkov. Marginal likelihood from the Metropolis-Hastings output. Journal of the American Statistical Association, 96(453):270–281, 2001.
[37] S. Chib and S. Ramamurthy. Tailored randomized block MCMC methods with application to DSGE models. Journal of Econometrics, 155(1):19–38, 2010.
[38] Jacob Cohen, Patricia Cohen, Stephen G. West, and Leona S. Aiken. Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences. Routledge, 2013.
[39] M.K. Cowles and B.P. Carlin. Markov chain Monte Carlo convergence diagnostics: A comparative review. Journal of the American Statistical Association, 91(434):883–904, 1996.
[40] J.T. Coyne, C. Baldwin, A. Cole, C. Sibley, and D.M. Roberts. Applying real time physiological measures of cognitive load to improve training. Foundations of Augmented Cognition. Neuroergonomics and Operational Neuroscience, pages 469–478, 2009.
[41] R. Crandall, B. Dong, and A. Bilgin. Randomized iterative hard thresholding: A fast approximate MMSE estimator for sparse approximations. Preprint, http://math.arizona.edu/dongbin/Publications/RandIHT June2013.pdf, 2013.
[42] H.P. Da Silva, A. Fred, A. Lourenco, and A.K. Jain. Finger ECG signal for user authentication: Usability and performance. In 2013 IEEE Sixth International Conference on Biometrics: Theory, Applications and Systems (BTAS), pages 1–8. IEEE, 2013.
[43] I. Daubechies. The wavelet transform, time-frequency localization and signal analysis. IEEE Transactions on Information Theory, 36(5):961–1005, 1990.
[44] J.G. Daugman. Two-dimensional spectral analysis of cortical receptive field profiles. Vision Research, 20(10):847–856, 1980.
[45] G. Davis, S.G. Mallat, and M. Avellaneda. Adaptive greedy approximations. Constructive Approximation, 13(1):57–98, 1997.
[46] M.E. Dawson, A.M. Schell, and D.L. Filion. The Electrodermal System. In J.T.
Cacioppo, L.G. Tassinary, and G.G. Berntson, editors, Handbook of Psychophysiology, pages 159–181. New York: Cambridge University Press, 3rd edition, 2007.
[47] L. Demanet and L. Ying. Wave atoms and sparsity of oscillatory patterns. Applied and Computational Harmonic Analysis, 23(3):368–387, 2007.
[48] R.O. Duda, P.E. Hart, and D.G. Stork. Multiple discriminant analysis. In Pattern Classification. John Wiley & Sons, second edition, 2000.
[49] Piotr J. Durka, Artur Matysiak, Eduardo Martínez Montes, Pedro Valdés Sosa, and Katarzyna J. Blinowska. Multichannel matching pursuit and EEG inverse solutions. Journal of Neuroscience Methods, 148(1):49–59, 2005.
[50] M. Elad. Sparse and Redundant Representations: From Theory to Applications in Signal and Image Processing. Springer, New York, NY, 2010.
[51] M. Elad and I. Yavneh. A plurality of sparse representations is better than the sparsest one alone. IEEE Transactions on Information Theory, 55(10):4701–4714, 2009.
[52] K. Engan, S.O. Aase, and J. Hakon Husoy. Method of optimal directions for frame design. Proceedings of ICASSP, pages 2443–2446, 1999.
[53] K. Engan, K. Skretting, and J.H. Husøy. Family of iterative LS-based dictionary learning algorithms, ILS-DLA, for sparse signal representation. Digital Signal Processing, 17(1):32–49, 2007.
[54] S. Faziludeen and P.V. Sabiq. ECG beat classification using wavelets and SVM. In Proc. IEEE Conference on Information & Communication Technologies (ICT), pages 815–818, 2013.
[55] R. Feldman. Parent-infant synchrony: Biological foundations and developmental outcomes. Current Directions in Psychological Science, 16:340–345, 2007.
[56] Emilio Ferrer and Jonathan L. Helm. Dynamical systems modeling of physiological coregulation in dyadic interactions. International Journal of Psychophysiology, 88(3):296–308, 2013.
[57] Stephanie R. Fishel, Eric R. Muth, and Adam W. Hoover.
Establishing appropriate physiological baseline procedures for real-time physiological measurement. Journal of Cognitive Engineering and Decision Making, 1(3):286–308, 2007.
[58] A.K. Fletcher and S. Rangan. Orthogonal matching pursuit from noisy random measurements: A new analysis. In Advances in Neural Information Processing Systems (NIPS), pages 540–548, 2009.
[59] A.K. Fletcher and S. Rangan. Orthogonal matching pursuit: A Brownian motion analysis. IEEE Transactions on Signal Processing, 60(3):1010–1021, 2012.
[60] A. Fog. Calculation methods for Wallenius' noncentral hypergeometric distribution. Communications in Statistics - Simulation and Computation, 37(2):258–273, 2008.
[61] A. Fog. Sampling methods for Wallenius' and Fisher's noncentral hypergeometric distributions. Communications in Statistics - Simulation and Computation, 37(2):241–257, 2008.
[62] R. Chris Fraley, Niels G. Waller, and Kelly A. Brennan. An item response theory analysis of self-report measures of adult attachment. Journal of Personality and Social Psychology, 78(2):350, 2000.
[63] K.J. Friston, A.P. Holmes, K.J. Worsley, J.P. Poline, C.D. Frith, and R.S.J. Frackowiak. Statistical parametric maps in functional imaging: A general linear approach. Human Brain Mapping, 2:189–210, 1995.
[64] A. Gelman and J. Hill. Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press, 2006.
[65] J. Geweke. Evaluating the accuracy of sampling-based approaches to the calculation of posterior moments. In J.M. Bernardo, J. Berger, A.P. Dawid, and A.F.M. Smith, editors, Bayesian Statistics, volume 4, pages 169–193. Oxford University Press, 1991.
[66] A. Ghaffari, Hamid Palangi, Massoud Babaie-Zadeh, and Christian Jutten. ECG denoising and compression by sparse 2D separable transform with overcomplete mixed dictionaries. In IEEE International Workshop on Machine Learning for Signal Processing, pages 1–6, 2009.
[67] Michael M. Goodwin.
Multichannel matching pursuit and applications to spatial audio coding. In IEEE Fortieth Asilomar Conference on Signals, Systems and Computers (ACSSC), pages 1114–1118, 2006.
[68] A. Greco, A. Lanata, G. Valenza, E.P. Scilingo, and L. Citi. Electrodermal activity processing: A convex optimization approach. In Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pages 2290–2293, 2014.
[69] J. Grimmer. An introduction to Bayesian inference via variational approximations. Political Analysis, 19(1):32–47, 2010.
[70] Stephen J. Guastello, David Pincus, and Patrick R. Gunderson. Electrodermal arousal between participants in a conversation: Nonlinear dynamics and linkage effects. Nonlinear Dynamics, Psychology, and Life Sciences, 10(3):365–399, 2006.
[71] W.K. Hastings. Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57(1):97–109, 1970.
[72] L. He, H. Qi, and R. Zaretzki. Non-parametric Bayesian dictionary learning for image super resolution. In IEEE Future of Instrumentation International Workshop (FIIW), pages 122–125, 2011.
[73] Jonathan L. Helm, David Sbarra, and Emilio Ferrer. Assessing cross-partner associations in physiological responses via coupled oscillator models. Emotion, 12(4):748, 2012.
[74] W.J. Jasper, S.J. Garnier, and H. Potlapalli. Texture characterization and defect detection using adaptive wavelets. Optical Engineering, 35(11):3140–3149, 1996.
[75] D. Jenkins and S.J. Gerred. ECGs by Example. Elsevier Health Sciences, 2011.
[76] S. Ji, Y. Xue, and L. Carin. Bayesian compressive sensing. IEEE Transactions on Signal Processing, 56(6):2346–2356, 2008.
[77] A. Johnson. Geometric ergodicity of Gibbs samplers. PhD thesis, University of Minnesota, 2009.
[78] A.A. Johnson, L.G. Jones, and C.R. Neath. Component-wise Markov chain Monte Carlo: Uniform and geometric ergodicity under mixing and composition. Statistical Science, 28(3):360–375, 2013.
[79] G.L. Jones, M. Haran, B.S.
Caffo, and R. Neath. Fixed-width output analysis for Markov chain Monte Carlo. Journal of the American Statistical Association, 101(476):1537–1547, 2006.
[80] G.L. Jones and J.P. Hobert. Honest exploration of intractable probability distributions via Markov chain Monte Carlo. Statistical Science, pages 312–334, 2001.
[81] L.G. Jones, O.G. Roberts, and J.S. Rosenthal. Convergence of conditional Metropolis-Hastings samplers. Advances in Applied Probability, 46(2):422–445, 2014.
[82] S. Koelstra, C. Muhl, M. Soleymani, J.S. Lee, A. Yazdani, T. Ebrahimi, T. Pun, A. Nijholt, and I. Patras. DEAP: A database for emotion analysis using physiological signals. IEEE Transactions on Affective Computing, 3(1):18–31, 2012.
[83] Y.B. Lee, S.W. Yoon, C.K. Lee, and M.H. Lee. Wearable EDA sensor gloves using conducting fabric and embedded system. World Congress on Medical Physics and Biomedical Engineering, pages 883–888, 2006.
[84] I. Leite, R. Henriques, C. Martinho, and A. Paiva. Sensors in the wild: Exploring electrodermal activity in child-robot interaction. ACM/IEEE International Conference on Human-Robot Interaction, pages 41–48, 2013.
[85] Robert W. Levenson and John M. Gottman. Marital Interaction: Physiological Linkage and Affective Exchange. Journal of Personality and Social Psychology, 45(3):587–597, 1983.
[86] M.S. Lewicki and B.A. Olshausen. Probabilistic framework for the adaptation and comparison of image codes. Journal of the Optical Society of America A (JOSA A), 16(7):1587–1601, 1999.
[87] M.S. Lewicki and T. Sejnowski. Learning overcomplete representations. Neural Computation, 12(2):337–365, 2000.
[88] L. Li, J. Silva, M. Zhou, and L. Carin. Online Bayesian dictionary learning for large datasets. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 2157–2160, 2012.
[89] C.L. Lim, C. Rennie, R.J. Barry, H. Bahramali, I. Lazzaro, B. Manor, and E. Gordon. Decomposing skin conductance into tonic and phasic components.
International Journal of Psychophysiology, 25(2):97–109, 1997.
[90] A.P. Lobo and P.C. Loizou. Voiced/unvoiced speech discrimination in noise using Gabor atomic decomposition. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), volume 1, pages I-820. IEEE, 2003.
[91] M. Lustig, D. Donoho, and J.M. Pauly. Sparse MRI: The application of compressed sensing for rapid MR imaging. Magnetic Resonance in Medicine, 58(6):1182–1195, 2007.
[92] J. Mairal, F. Bach, J. Ponce, G. Sapiro, and A. Zisserman. Supervised dictionary learning. Advances in Neural Information Processing Systems (NIPS), 2008.
[93] J. Mairal, F. Bach, J. Ponce, and G. Sapiro. Online dictionary learning for sparse coding. Proceedings of ICML, pages 689–696, 2009.
[94] S.G. Mallat and Z. Zhang. Matching pursuits with time-frequency dictionaries. IEEE Transactions on Signal Processing, 41(12):3397–3415, 1993.
[95] D.W. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial & Applied Mathematics, 11(2):431–441, 1963.
[96] J.A. Mazaheri, C. Guillemot, and C. Labit. Learning a tree-structured dictionary for efficient image representation with adaptive sparse coding. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1320–1324, 2013.
[97] K.L. Mengersen and R.L. Tweedie. Rates of convergence of the Hastings and Metropolis algorithms. The Annals of Statistics, 24(1):101–121, 1996.
[98] S. Merlet, E. Caruyer, A. Ghosh, and R. Deriche. A computational diffusion MRI and parametric dictionary learning framework for modeling the diffusion signal and its features. Medical Image Analysis, 17(9):830–843, 2013.
[99] N. Metropolis, A.W. Rosenbluth, M.N. Rosenbluth, A.H. Teller, and E. Teller. Equation of state calculations by fast computing machines. The Journal of Chemical Physics, 21(6):1087–1092, 1953.
[100] P.S. Meyn and L.R. Tweedie. Markov Chains and Stochastic Stability.
Cambridge University Press, New York, 2nd edition, 2009.
[101] A. Milenković, C. Otto, and E. Jovanov. Wireless sensor networks for personal health monitoring: Issues and an implementation. Computer Communications, 29(13):2521–2533, 2006.
[102] U. Mitra, B.A. Emken, S. Lee, M. Li, V. Rozgic, G. Thatte, H. Vathsangam, D. Zois, M. Annavaram, S. Narayanan, M. Levorato, D. Spruijt-Metz, and G. Sukhatme. KNOWME: A case study in wireless body area sensor network design. IEEE Communications Magazine, 50(5):116–125, 2012.
[103] G.B. Moody and R.G. Mark. The impact of the MIT-BIH arrhythmia database. IEEE Engineering in Medicine and Biology Magazine, 20(3):45–50, 2001.
[104] S. Mukhopadhyay and P. Sircar. Parametric modelling of ECG signal. Medical and Biological Engineering and Computing, 34(2):171–174, 1996.
[105] J.A. Nelder and R. Mead. A simplex method for function minimization. The Computer Journal, 7(4):308–313, 1965.
[106] B.A. Olshausen and D.J. Field. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381(6583):607–609, 1996.
[107] B. Ophir, M. Elad, N. Bertin, and M.D. Plumbley. Sequential minimal eigenvalues - an approach to analysis dictionary learning. Proceedings of EUSIPCO, 2011.
[108] A.V. Oppenheim, R.W. Schafer, and J.R. Buck. Discrete-Time Signal Processing. Prentice-Hall, Upper Saddle River, NJ, 1999.
[109] J. Pan and W.J. Tompkins. A real-time QRS detection algorithm. IEEE Transactions on Biomedical Engineering, (3):230–236, 1985.
[110] Y.C. Pati, R. Rezaiifar, and P.S. Krishnaprasad. Orthogonal matching pursuit: Recursive function approximation with applications to wavelet decomposition. Asilomar Conference on Signals, Systems and Computers, 1993.
[111] R.W. Picard. Future affective technology for autism and emotion communication. Phil. Trans. R. Soc., 364:3575–3584, 2009.
[112] R. Pichevar, H. Najaf-Zadeh, L. Thibault, and H. Lahdili. Auditory-inspired sparse representation of audio signals.
Speech Communication, 53(5):643–657, 2011.
[113] M.Z. Poh, N.C. Swenson, and R.W. Picard. A wearable sensor for unobtrusive, long-term assessment of electrodermal activity. IEEE Transactions on Biomedical Engineering, 57:1243–1252, 2010.
[114] L. Polania, K. Barner, R. Carrillo, and M. Blanco-Velasco. Exploiting prior knowledge in compressed sensing wireless ECG systems. IEEE Journal of Biomedical and Health Informatics, 1(99):1–12, 2014.
[115] L.F. Polania, R.E. Carrillo, M. Blanco-Velasco, and K.E. Barner. Exploiting prior knowledge in compressed sensing wireless ECG systems. IEEE Journal of Biomedical and Health Informatics, 19(2):508–519, 2015.
[116] L.F. Polania, R.E. Carrillo, M. Blanco-Velasco, and K.E. Barner. Compressed sensing based method for ECG compression. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 761–764, 2011.
[117] Q. Qiu, V.M. Patel, T. Pavan, and R. Chellappa. Domain adaptive dictionary learning. In European Conference on Computer Vision (ECCV), pages 631–645. Springer, 2012.
[118] B.D. Rao, K. Engan, S.F. Cotter, J. Palmer, and K. Kreutz-Delgado. Subset selection in noise based on diversity measure minimization. IEEE Transactions on Signal Processing, 51(3):760–770, 2003.
[119] B.D. Rao and K. Kreutz-Delgado. An affine scaling methodology for best basis selection. IEEE Transactions on Signal Processing, 47(1):187–200, 1999.
[120] M.B.I. Reaz, M.S. Hussain, and F. Mohd-Yasin. Techniques of EMG signal analysis: detection, processing, classification and applications. Biological Procedures Online, 8(1):11–35, 2006.
[121] M. Reisert, H. Skibbe, and V.G. Kiselev. The diffusion dictionary in the human brain is short: Rotation invariant learning of basis functions. Computational Diffusion MRI and Brain Connectivity, Mathematics and Visualization, pages 47–55, 2014.
[122] S. Rezaei. Physiological Synchrony as Manifested in Dyadic Interactions.
PhD thesis, University of Toronto, 2013.
[123] G.O. Roberts, A. Gelman, and W.R. Gilks. Weak convergence and optimal scaling of random walk Metropolis algorithms. The Annals of Applied Probability, 7(1):110–120, 1997.
[124] G.O. Roberts and J.S. Rosenthal. Geometric ergodicity and hybrid Markov chains. Electronic Communications in Probability, 2(2):13–25, 1997.
[125] J.S. Rosenthal. Theoretical rates of convergence for Markov Chain Monte Carlo. Computing Science and Statistics, pages 486–486, 1994.
[126] J.S. Rosenthal. Minorization conditions and convergence rates for Markov Chain Monte Carlo. Journal of the American Statistical Association, 90(430):558–566, 1995.
[127] J.S. Rosenthal. Optimal proposal distributions and adaptive MCMC. Handbook of Markov Chain Monte Carlo, pages 93–112, 2011.
[128] R. Rubinstein, A.M. Bruckstein, and M. Elad. Dictionaries for sparse representation modeling. Proceedings of the IEEE, 98(6):1045–1057, 2010.
[129] R. Rubinstein, M. Zibulevsky, and M. Elad. Double sparsity: Learning sparse dictionaries for sparse signal approximation. IEEE Transactions on Signal Processing, 58(3):1553–1564, 2010.
[130] N. Ruiz-Reyes, P. Vera-Candeas, P.J. Reche-López, and F. Canadas-Quesada. A time-frequency adaptive signal model-based approach for parametric ECG compression. In European Signal Processing Conference, pages 1–5. IEEE, 2006.
[131] P. Sallee and B.A. Olshausen. Learning sparse multiscale image representations. In Advances in Neural Information Processing Systems (NIPS), pages 1327–1334, 2002.
[132] M.K. Sarkaleh and A. Shahbahrami. Classification of ECG arrhythmias using discrete wavelet transform and neural networks. International Journal of Computer Science, Engineering and Applications, 2(1):1, 2012.
[133] D. Saxbe and R.L. Repetti. For better or worse? Coregulation of couples' cortisol levels and mood states. Journal of Personality and Social Psychology, 98(1):92, 2010.
[134] D.A. Sbarra and C. Hazan.
Coregulation, dysregulation, self-regulation: An integrative analysis and empirical agenda for understanding adult attachment, separation, loss, and recovery. Personality and Social Psychology Review, 12(2):141–167, 2008.
[135] J.A. Simpson, W.S. Rholes, and D. Phillips. Conflict in close relationships: an attachment perspective. Journal of Personality and Social Psychology, 71(5):899–914, 1996.
[136] S. Singer and S. Singer. Complexity analysis of Nelder-Mead search iterations. In Proceedings of the 1st Conference on Applied Mathematics and Computation, Dubrovnik, Croatia, pages 185–196, 1999.
[137] K. Skretting and K. Engan. Image compression using learned dictionaries by RLS-DLA and compared with K-SVD. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1517–1520, 2011.
[138] L. Sornmo, P.O. Borjesson, M. Nygards, and O. Pahlm. A method for evaluation of QRS shape features using a mathematical model for the ECG. IEEE Transactions on Biomedical Engineering, (10):713–717, 1981.
[139] J.L. Starck, E.J. Candès, and D.L. Donoho. The curvelet transform for image denoising. IEEE Transactions on Image Processing, 11(6):670–684, 2002.
[140] L.I. Stein. Oral care and sensory sensitivities in children with autism spectrum disorders. PhD thesis, USC, 2013.
[141] L.I. Stein, J.C. Polido, S.O. Lopez Najera, and S.A. Cermak. Oral care experiences and challenges in children with autism spectrum disorders. Pediatric Dentistry, 34(5):387–391, 2012.
[142] H. Storm, A. Fremming, S. Ødegaard, Ø.G. Martinsen, and L. Mørkrid. The development of a software program for analyzing spontaneous and externally elicited skin conductance changes in infants and adults. Clinical Neurophysiology, 111(10):1889–1898, 2000.
[143] H.H. Szu, B.A. Telfer, and S.L. Kadambe. Neural network adaptive wavelets for signal representation and classification. Optical Engineering, 21(9):1907–1916, 1992.
[144] Q. Tan, B. Fang, and P. Wang.
Improved simultaneous matching pursuit for multi-lead ECG data compression. In IEEE International Conference on Measuring Technology and Mechatronics Automation (ICMTMA), volume 2, pages 438–441, 2010.
[145] D. Thanou, D. Shuman, and P. Frossard. Learning parametric dictionaries for signals on graphs. IEEE Transactions on Signal Processing, 62(15):3849–3862, 2014.
[146] N.H. Timm. Applied Multivariate Analysis, volume 26. Springer, 2002.
[147] M.E. Tipping. Sparse Bayesian learning and the relevance vector machine. The Journal of Machine Learning Research, 1:211–244, 2001.
[148] I. Tošić and P. Frossard. Dictionary learning for stereo image representation. IEEE Transactions on Image Processing, 20(4):921–934, 2011.
[149] J.A. Tropp. Greed is good: Algorithmic results for sparse approximation. IEEE Transactions on Information Theory, 50(10):2231–2242, 2004.
[150] J.A. Tropp, A.C. Gilbert, and M.J. Strauss. Simultaneous sparse approximation via greedy pursuit. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), volume 5, pages 721–724, 2005.
[151] M. Tsagkias, M. De Rijke, and W. Weerkamp. Hypergeometric language models for republished article finding. In Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 485–494. ACM, 2011.
[152] R. Vidal, Y. Ma, and S. Sastry. Generalized principal component analysis (GPCA). IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(12):1945–1959, 2005.
[153] T. Vuorela, V.P. Seppa, J. Vanhala, and J. Hyttinen. Design and implementation of a portable long-term physiological signal recorder. IEEE Transactions on Information Technology in Biomedicine, 14(3):718–725, 2010.
[154] M.B. Wakin, M.F. Duarte, S. Sarvotham, D. Baron, and R.G. Baraniuk. Recovery of jointly sparse signals from few random projections. In Proc.
Neural Information Processing Systems (NIPS), 2005.
[155] K.T. Wallenius. Biased sampling; the noncentral hypergeometric probability distribution. Technical report, DTIC Document, 1963.
[156] J. Wasilewski and L. Poloński. An introduction to ECG interpretation. In ECG Signal Processing, Classification and Interpretation, pages 1–20. Springer, 2012.
[157] D.P. Wipf and B.D. Rao. Sparse Bayesian learning for basis selection. IEEE Transactions on Signal Processing, 52(8):2153–2164, 2004.
[158] D.P. Wipf and B.D. Rao. An empirical Bayesian strategy for solving the simultaneous sparse approximation problem. IEEE Transactions on Signal Processing, 55(7):3704–3716, 2007.
[159] M. Yaghoobi, L. Daudet, and M.E. Davies. Parametric dictionary design for sparse coding. IEEE Transactions on Signal Processing, 57(12):4800–4810, 2009.
[160] M. Zhou, H. Chen, J. Paisley, L. Ren, L. Li, Z. Xing, D. Dunson, G. Sapiro, and L. Carin. Nonparametric Bayesian dictionary learning for analysis of noisy and incomplete images. IEEE Transactions on Image Processing, 21(1):130–144, 2012.
[161] Y. Zhou, X. Hu, Z. Tang, and A.C. Ahn. Denoising and baseline correction of ECG signals using sparse representation. In IEEE Workshop on Signal Processing Systems (SiPS), pages 1–6, 2015.
[162] X. Zhu, G.T. Beauregard, and L. Wyse. Real-time signal estimation from modified short-time Fourier transform magnitude spectra. IEEE Transactions on Audio, Speech, and Language Processing, 15(5):1645–1653, 2007.

Appendix A
MCMC Ergodicity Proofs

We provide the proofs for the theorems and lemmas of Section 4.5.

Proof of Lemma 4.5.1: We assume that the sampling order is Y_1, ..., Y_{B-1}. Let {y'_1, ..., y'_{B-1}} and {y_1, ..., y_B} be the block variables at the current and previous MCMC state, respectively.
The conditional probability for sampling the first B-1 blocks is

$$
\begin{aligned}
p(y'_1,\dots,y'_{B-1}\mid y_1,\dots,y_B)
&= p(y'_{B-1}\mid y'_1,\dots,y'_{B-2},y_{B-1},y_B)\; p(y'_1,\dots,y'_{B-2}\mid y_1,\dots,y_B)\\
&= p(y'_{B-1}\mid y'_1,\dots,y'_{B-2},y_{B-1},y_B)\; p(y'_{B-2}\mid y'_1,\dots,y'_{B-3},y_{B-2},y_{B-1},y_B)\cdots p(y'_1\mid y_1,\dots,y_B)\\
&= p(y'_{B-1}\mid y_{-(B-1)})\; p(y'_{B-2}\mid y_{-(B-2)})\cdots p(y'_1\mid y_{-1})
\end{aligned}
$$

where $y_{-b}$ contains all variables except $y_b$. The Markov kernel for the first B-1 blocks can be written as in (A.3). The first and second inequalities in (A.3) follow from the minorization of the (B-1)-th and (B-2)-th blocks. Therefore there exist $\delta' = \prod_{b=1}^{B-1}\delta_b > 0$ and $\nu' = \prod_{b=1}^{B-1}\nu_b(A_b)$ (a probability measure) that satisfy the desired inequality. □

Proof of Theorem 4.5.2: If we assume that the sampling order is Y_1, ..., Y_B, the Markov kernel of the Gibbs sampler can be expressed as in (A.4). The first inequality in (A.4) results from Lemma 4.5.1 and yields a minorization condition for the entire Gibbs sampler. □

Proof of Lemma 4.5.2: From Def. 4.5.1, Table 4.2, and (4.7)-(4.8), the conditional weight of the last block satisfies

$$
w_{\tilde{y}_n} \propto \exp\Big(-\tfrac{\gamma_n}{2}\lVert\alpha_n\rVert^2\Big)\,
\lvert\tilde{G}_n\rvert^{\frac12}\,\lvert V_{\tilde{\theta}_n}\rvert^{-\frac12}\,
\exp\Big[-\tfrac12(\tilde{\theta}_n-\tilde{g}_n)^T\tilde{G}_n(\tilde{\theta}_n-\tilde{g}_n)\Big]
\Big[1+\tfrac{1}{1+Q}(\tilde{\theta}_n-\hat{\tilde{\theta}}_n)^T\hat{V}^{-1}_{\tilde{\theta}_n}(\tilde{\theta}_n-\hat{\tilde{\theta}}_n)\Big]^{\frac{1+Q}{2}}
\tag{A.1}
$$

Using (4.10), we have $\lvert\tilde{G}_n\rvert^{1/2} < \infty$ and $\lvert V_{\tilde{\theta}_n}\rvert^{-1/2} = \prod_{k=1}^{K}\lvert V_k\rvert^{-1/2} = \kappa_K < \infty$, since $\tilde{G}_n$ is the precision matrix of a Gaussian distribution and therefore has finite eigenvalues and determinant. Moreover $\prod_{n=1}^{N}\exp(-\tfrac{\gamma_n}{2}\lVert\alpha_n\rVert^2) < \infty$ because $-\tfrac{\gamma_n}{2}\lVert\alpha_n\rVert^2 \le 0$ for $n = 1,\dots,N$ and $\gamma_n > 0$. Finally, the function

$$
f(\tilde{\theta}_n) = \Big[1+\tfrac{1}{1+Q}(\tilde{\theta}_n-\hat{\tilde{\theta}}_n)^T\hat{V}^{-1}_{\tilde{\theta}_n}(\tilde{\theta}_n-\hat{\tilde{\theta}}_n)\Big]^{\frac{1+Q}{2}}
\exp\Big[-\tfrac12(\tilde{\theta}_n-\tilde{g}_n)^T\tilde{G}_n(\tilde{\theta}_n-\tilde{g}_n)\Big]
\tag{A.2}
$$

is bounded, since $f(\tilde{\theta}_n)\to 0$ as $\lVert\tilde{\theta}_n\rVert\to\infty$ and $f$ is continuous. Indeed, $f$ remains bounded as $\lVert\tilde{\theta}_n\rVert\to\infty$ because the quadratic forms $(\tilde{\theta}_n-\tilde{g}_n)^T\tilde{G}_n(\tilde{\theta}_n-\tilde{g}_n)$ and $(\tilde{\theta}_n-\hat{\tilde{\theta}}_n)^T\hat{V}^{-1}_{\tilde{\theta}_n}(\tilde{\theta}_n-\hat{\tilde{\theta}}_n)$ increase to infinity at the same rate, while the exponential factor decays exponentially fast and the bracketed factor grows only at a polynomial rate. □
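The proofs in this appendix repeatedly invoke a minorization argument. For reference, the standard form of the condition and the uniform-ergodicity rate it implies can be stated as follows (generic notation, in the spirit of [100] and [126], not tied to the block structure above):

```latex
% Minorization: there exist \delta > 0 and a probability measure \nu with
P(x, A) \;\ge\; \delta\, \nu(A)
\qquad \text{for all states } x \text{ and all measurable sets } A.
% A kernel satisfying this condition is uniformly ergodic, with geometric
% convergence of the n-step law to the stationary distribution \pi
% in total variation:
\lVert P^{n}(x, \cdot) - \pi \rVert_{\mathrm{TV}} \;\le\; (1 - \delta)^{n}.
```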
The kernel bounds (A.3) and (A.4) used in the proofs above are:

$$
\begin{aligned}
P\big((Y_1,\dots,Y_{B-1}),\,A_1\times\cdots\times A_{B-1}\big)
&= \int_{A_1\times\cdots\times A_{B-1}} p(y'_1,\dots,y'_{B-1}\mid y_1,\dots,y_B)\;\mu\big(d(y'_1,\dots,y'_{B-1})\big)\\
&= \int_{A_1\times\cdots\times A_{B-2}} p(y'_1,\dots,y'_{B-2}\mid y_{-1},\dots,y_{-(B-2)})
\Big(\int_{A_{B-1}} p\big(y'_{B-1}\mid y_{-(B-1)}\big)\,\mu_{B-1}(dy'_{B-1})\Big)\,\mu_{B-2}(dy'_{B-2})\cdots\mu_1(dy'_1)\\
&\ge \delta_{B-1}\,\nu_{B-1}(A_{B-1})
\int_{A_1\times\cdots\times A_{B-3}} p(y'_1,\dots,y'_{B-3}\mid\cdots)
\Big(\int_{A_{B-2}} p\big(y'_{B-2}\mid y_{-(B-2)}\big)\,\mu_{B-2}(dy'_{B-2})\Big)\,\mu_{B-3}(dy'_{B-3})\cdots\mu_1(dy'_1)\\
&\;\;\vdots\\
&\ge \prod_{b=1}^{B-1}\delta_b\,\nu_b(A_b)
\end{aligned}
\tag{A.3}
$$

$$
\begin{aligned}
P\big((Y_1,\dots,Y_B),\,A_1\times\cdots\times A_B\big)
&= \int_{A_1\times\cdots\times A_B} p(y'_1,\dots,y'_B\mid y_1,\dots,y_B)\;\mu\big(d(y'_1,\dots,y'_B)\big)\\
&= \int_{A_B} p(y'_B\mid y'_1,\dots,y'_{B-1},y_B)
\Big(\int_{A_1\times\cdots\times A_{B-1}} p(y'_1,\dots,y'_{B-1}\mid y_1,\dots,y_B)\;\mu\big(d(y'_1,\dots,y'_{B-1})\big)\Big)\,\mu_B(dy'_B)\\
&\ge \delta' \int_{A_B} p(y'_B\mid y'_1,\dots,y'_{B-1},y_B)\;\nu'\big(d(y'_1,\dots,y'_{B-1})\big)\,\mu_B(dy'_B)
\end{aligned}
\tag{A.4}
$$

Proof of Theorem 4.5.3: The first B-1 blocks of y_n that are generated with the Gibbs sampler follow well-known distributions (Table 4.1), therefore it is trivial to show that their updates are minorisable. From Theorem 4.5.2, since the partial updates for all blocks except the last one are minorisable, the entire Gibbs sampler is minorisable and therefore uniformly ergodic. From Lemma 4.5.2, the conditional weight of the B-th block ỹ_n generated with the MH is bounded. Thus the MH-within-Gibbs sampler meets the conditions of Theorem 4.5.1 and is therefore uniformly ergodic. □

Appendix B
Computational Complexity

We analyze the computational complexity of our proposed framework for each MCMC step (Algorithm 1, Table 4.2) using the "O" notation. For an input signal and MCMC iteration, the weight of the Multinomial distribution for each atom is computed with cost O(P), i.e., O(PK) for all atoms. Sampling L indices from K dictionary atoms with replacement results in O(LK); the final cost therefore yields O(PK + LK).
For sampling without replacement, re-adjusting the atom weights requires O((K-1) + ... + (K-L+1)) ≤ O(LK). Since each of the L iterations takes into account the previously selected atoms [61], the cost of the Wallenius distribution is O(K + (K-1) + ... + (K-L+1)) ≤ O(LK). Each of the L coefficients is generated from the normal distribution, whose mean requires O(P); the entire complexity is therefore O(LP). Regarding the dictionary atom parameters, their posterior (4.7) requires O(L(P + Q^2)), while the typical cost of the Nelder-Mead simplex is O(Q^2) [136]. Finally, the complexity is O(1) for the noise term, O(K + L) for the Dirichlet prior, O(Q^2) for the mean g_nk and precision G_n of the dictionary-parameter prior, and O(P) for the noise precision.

Taking these into account, the total complexity when using sampling with and without replacement, respectively, yields:

O(PK + LK + 2LP + (L+3)Q^2 + P + K + L + 1) ≈ O(PK + LK + LP + LQ^2)   (B.1)

O(2LK + 2LP + (L+3)Q^2 + P + K + L + 1) ≈ O(LK + LP + LQ^2)   (B.2)

Our approach requires first-order polynomial time with the signal dimensionality P, the number of dictionary atoms K, and the number of selected atoms in the representation L ≤ K, and second-order polynomial cost with respect to the number of dictionary parameters Q. We note that Q ≪ P, L, therefore the latter is not too expensive.

We further compare run-time statistics of all the approaches. We report the average duration of one training iteration and the number of estimated variables for each approach (Table B1). Experiments were performed on an Intel Core i7 processor with CPU at 2.93 GHz and 7.8 GB of RAM. SD and K-SVD require a decomposition step, therefore they compute slightly more variables than ETF. Our method estimates the highest number of variables, including the priors of the considered problem. Results indicate that SD and ETF are computationally less expensive than the proposed MCMC. Consistently with previous observations [128], K-SVD also has high computational cost. MCMC sampling with and without replacement yield computation times of the same order.

Table B1: Average computation time of dictionary learning algorithms

Method               | # Variables | Computation Time (sec / training iteration)
SD                   | 438         | 0.1173
ETF                  | 432         | 0.0616
K-SVD                | 438         | 3.9536
MCMC w/ replacement  | 1024        | 3.5014
MCMC w/o replacement | 1024        | 4.9638
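As a concrete illustration of the MH-within-Gibbs structure analyzed in Appendix A — exact conditional draws for all blocks but the last, and a Metropolis-Hastings update for the final block — the following is a minimal sketch on a toy bivariate Gaussian target. The target density, correlation, step size, and iteration counts are illustrative choices for this sketch, not the thesis's dictionary-learning model.

```python
import math
import random

# Toy MH-within-Gibbs sampler for a bivariate Gaussian with correlation RHO.
# Block 1 (x1) is updated with an exact Gibbs draw from p(x1 | x2);
# block 2 (x2) is updated with a random-walk Metropolis-Hastings step.

RHO = 0.6  # illustrative target correlation


def log_target(x1, x2):
    """Unnormalized log-density of the zero-mean bivariate Gaussian."""
    return -(x1 * x1 - 2 * RHO * x1 * x2 + x2 * x2) / (2 * (1 - RHO * RHO))


def mh_within_gibbs(n_iter, step=1.0, seed=0):
    rng = random.Random(seed)
    x1, x2 = 0.0, 0.0
    samples = []
    for _ in range(n_iter):
        # Gibbs step: x1 | x2 ~ N(RHO * x2, 1 - RHO^2), available in closed form.
        x1 = rng.gauss(RHO * x2, math.sqrt(1 - RHO * RHO))
        # MH step: random-walk proposal for x2, accepted with the usual ratio.
        prop = x2 + rng.gauss(0.0, step)
        log_alpha = log_target(x1, prop) - log_target(x1, x2)
        if rng.random() < math.exp(min(0.0, log_alpha)):
            x2 = prop
        samples.append((x1, x2))
    return samples


samples = mh_within_gibbs(20000)
burn = samples[5000:]  # discard burn-in before summarizing
mean1 = sum(s[0] for s in burn) / len(burn)
mean2 = sum(s[1] for s in burn) / len(burn)
print(round(mean1, 2), round(mean2, 2))  # both means should be close to 0
```

Per iteration, the Gibbs block costs O(1) draws from a known distribution while the MH block costs one density-ratio evaluation, mirroring how the per-block costs in Appendix B accumulate into the totals of (B.1)-(B.2).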
Abstract
Human physiology reflects an individual's physical and mental condition through bio-signals that are indicative of both physical and behavioral pathological factors. The electrocardiogram, for example, is widely used to assess cardiovascular diseases, as well as regulation and stress reactivity in various mental health conditions, such as Autism. This renders human physiology inherently complex and multi-faceted, providing a rich platform not only for mathematically modeling an individual's state, but also for impacting a variety of fields with data-informed feedback. Emerging engineering solutions for health and life-science applications employ evidence from biomedical signals to gain insight into the physical, mental, and affective state of individuals and to assist them toward regulatory changes.

These techniques will become increasingly important as continuing, converging advances in sensing and computing, including wearable technology, permeate everyday life. The long-term tracking of physiological and behavioral mechanisms can promote healthy routines, increase emotional wellness and awareness, and revolutionize clinical assessment and intervention for chronic diseases (e.g., heart and pulmonary illnesses) and mental conditions (e.g., autism and depression). The multiple streams of information available for each individual can assist personalized healthcare through data-scientific health analytics. The unobtrusive continuous tracking of vital signs, for instance, can help patients with cardiovascular diseases improve their medication habits, moderate their diet and physical activity, and manage stress levels. Physiology-tracking devices in Autism have the great potential of providing "mood-monitoring" data for predicting challenging conditions, such as meltdowns, identifying possible reasons for their occurrence, or even preventing them.
Despite these opportunities, there are many challenges in this field, exemplified by complex and heterogeneous data spaces, the lack of contextual information, and the limited presence (or absence) of specialized analytics. This thesis focuses on translating the low-level data acquired from physiology-monitoring devices into higher-level meta-information and on developing feedback mechanisms able to provide data-inspired suggestions for targeted changes. While sophisticated artificial intelligence systems can yield very accurate results, human experts usually find little comfort in them, since insight into their inner workings and decision-making process is very limited (or even non-existent). My dissertation focuses on the development of robust interpretable algorithms for the representation of biomedical signals, the design of novel intuitive physiological measures, and the exploration of knowledge-driven mathematical models for describing the co-evolution of physiology with other modalities and behavioral indices as a holistic system.
Asset Metadata

Creator: Chaspari, Theodora (author)
Core Title: Knowledge-driven representations of physiological signals: developing measurable indices of non-observable behavior
School: Viterbi School of Engineering
Degree: Doctor of Philosophy
Degree Program: Electrical Engineering
Publication Date: 06/13/2017
Defense Date: 05/24/2017
Publisher: University of Southern California (original), University of Southern California. Libraries (digital)
Tags: dictionary learning, OAI-PMH Harvest, physiological signals, sparse representation, wearable devices
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Narayanan, Shrikanth S. (committee chair), Margolin, Gayla (committee member), Ortega, Antonio (committee member)
Creator Email: chaspari@gmail.com, chaspari@usc.edu
Permanent Link (DOI): https://doi.org/10.25549/usctheses-c40-382227
Unique Identifier: UC11255831
Identifier: etd-ChaspariTh-5395.pdf (filename), usctheses-c40-382227 (legacy record id)
Legacy Identifier: etd-ChaspariTh-5395.pdf
Dmrecord: 382227
Document Type: Dissertation
Rights: Chaspari, Theodora
Type: texts
Source: University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA