SIGNAL PROCESSING METHODS FOR BRAIN CONNECTIVITY

by David Stanford Wheland

A Dissertation Presented to the FACULTY OF THE USC GRADUATE SCHOOL, UNIVERSITY OF SOUTHERN CALIFORNIA, in Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (ELECTRICAL ENGINEERING)

December 2013

Doctoral Committee:
Professor Richard M. Leahy, Chair
Professor B. Keith Jenkins
Associate Professor Bosco S. Tjan
Research Assistant Professor Anand A. Joshi

Copyright 2013 David Stanford Wheland

Abstract

Although the human brain has been studied for centuries, and the advent of non-invasive brain imaging modalities in the last century in particular has led to significant advances, there is much left to discover. Current neuroscientific theory likens the brain to a highly interconnected network whose behavior can be better understood by determining its network connections. Correlation, coherence, Granger causality, and blind source separation (BSS) are frequently used to infer this connectivity. Here I propose novel methods to improve their inference from neuroimaging data.

Correlation and coherence suffer from being unable to differentiate between direct and indirect connectivity. While partial correlation and partial coherence can mitigate this problem, standard methods for calculating these measures result in significantly reduced statistical inference power and require greater numbers of samples. To address these drawbacks I propose novel methods based on a graph pruning algorithm that leverage the connectivity sparsity of the brain to improve the inference of partial correlation and partial coherence. These methods are demonstrated in applications. In particular, partial correlation is explored in both cortical thickness data from structural magnetic resonance (MR) images and resting state data from functional MR images, and partial coherence is explored in invasive electrophysiological measurements in non-human primates.

Granger causality inherently differentiates between direct and indirect connectivity and, like partial coherence, is readily applicable to time series. However, unlike partial coherence, it uses the temporal ordering implied by the time series to infer a type of causality on the connectivity. Despite its differences, the inference of Granger causality can also be improved using a similar graph pruning algorithm, and I describe such an extension here. The method is also applied to explore electrophysiological interactions in non-human primate data.

BSS methods seek to decompose a dataset into a linear mixture of sources such that the sources best match some target property, such as independence. The second order blind identification (SOBI) BSS method has a number of properties particularly well-suited for data on the cerebral cortex and relies on the calculation of lagged covariance matrices. However, while these lagged covariance matrices are readily available in one-dimensional data, they are not straightforward to calculate on the two-dimensional cortical manifold on which certain types of neuroimaging data lie. To address this, I propose a method for calculating the covariance matrices on the cortical manifold and demonstrate its application to cortical gray matter thickness and curvature data on the cerebral cortex.

Dedication

To my parents

Acknowledgements

I owe much thanks to too many people to list them all, but would like to start by sending out a general thank you to everyone who has helped or supported me over the course of my Ph.D.
I am grateful to my advisor, Professor Richard M. Leahy, for his support and guidance and for providing me with an enriching and stimulating work environment. I am also indebted to my other defense and qualifying committee members for their time and valuable feedback, namely Professors B. Keith Jenkins, Bosco S. Tjan, Anand A. Joshi, and C.-C. Jay Kuo. I would also like to especially thank Dr. Dimitrios Pantazis and Professor Anand A. Joshi for their close involvement and encouragement in my work. I am also indebted to Syed Ashrafulla and Sergul Aydore for their willingness to help, give feedback, and act as sounding boards, and to Professor Justin P. Haldar for his insightful and rigorous feedback. I offer many thanks to David Shattuck, Hanna Damasio, Antonio Damasio and all of my other collaborators as well.

I am appreciative of my current and past fellow members of the biomedical imaging group and labmates (Syed Ashrafulla, Sergul Aydore, Anand Joshi, Chitresh Bhushan, Kevin Wang, Yanguang Lin, Ran Ren, Wentao Zhu, Bing Bai, Ying Fang, Dimitrios Pantazis, Quanzheng Li, Hua Brian Hui, Sangtae Ahn, Juan Poletti Soto, Yu-Teng Chang, Joyita Dutta, Sangeetha Somayajula, and Sanghee Cho) for a fun and stimulating work environment. I am grateful to Ladan Amouzegar for the great times and her support and patience. I would also like to acknowledge the generous financial support of my graduate studies by the National Institutes of Health, the Viterbi School of Engineering, and the University of Southern California. Lastly, I would like to thank my parents for helping set me on the path that led here.

Table of Contents

Abstract
Dedication
Acknowledgements
List of Figures
List of Tables
Chapter 1: Introduction
Chapter 2: Background
  2.1 The Human Brain
  2.2 Processing Brain Data
  2.3 Correlation Analysis in Neuroimaging
Chapter 3: Second Order Blind Identification (SOBI) on the Cerebral Cortex
  3.1 Blind Source Separation Theory and the Modified SOBI Algorithm
    3.1.1 Blind Source Separation (BSS)
    3.1.2 The Second Order Blind Identification (SOBI) Method
    3.1.3 SOBI on the Cerebral Cortex
  3.2 Dataset and Preprocessing
  3.3 Simulation
  3.4 Jackknife Resampling
  3.5 Results
    3.5.1 Simulation
    3.5.2 Experimental Data
  3.6 Discussion
  3.7 Summary
Chapter 4: Robust Inference of Sparse Partial Correlation Matrices with FDR Control
  4.1 Theory and Algorithms
    4.1.1 Correlation and Partial Correlation
    4.1.2 Undirected Graphical Models
    4.1.3 Connectivity Methods
  4.2 Method Comparisons
    4.2.1 Method Comparison Methodology
    4.2.2 Simulation
    4.2.3 Real Gray Matter Thickness Data
    4.2.4 Functional Resting-State Data
    4.2.5 PCU Consistency Across Datasets and Modalities
  4.3 Summary
Chapter 5: Extensions of Sparse Partial Correlation Methods to the Time Domain
  5.1 Theory and Algorithms
    5.1.1 Coherence
    5.1.2 Partial Coherence Based Graph Algorithms
    5.1.3 Granger Causality
    5.1.4 Granger Causality Based Graph Algorithms
  5.2 Simulation
    5.2.1 Method Comparison in Simulation
    5.2.2 Control of the False Discovery Rate
  5.3 Real Data
  5.4 Summary
Chapter 6: Conclusion
References

List of Figures

2.1 The brain and its basic structures
2.2 A closeup of part of the triangulated mesh shown in Figure 2.1b after setting the triangle faces to be transparent.
2.3 A visualization of BrainSuite's processing pipeline
2.4 Correlation coefficient between $X$ and $Y$ where $X = c_Y Y + c_Z Z$ and $Y$ and $Z$ are independent random variables with unit variance, over a range of $c_Y$ and $c_Z$ values.
2.5 Results of the correlation connectivity analysis. (a) Matrix of correlation coefficient values between each of the 70 ROIs. Element $(i,j)$ corresponds to the correlation between ROI $i$ and ROI $j$. (b) Matrix of p-values for each of the correlation coefficients in (a). (c) Mask resulting from Bonferroni correction. Blue elements were deemed significant, red elements were not. (d) Mask resulting from the BHY procedure to control the FDR. (e) Lateral view of one hemisphere of the parcellated cortex with color-coded ROIs. (f) Result of the FDR-controlled correlation analysis visualized on the smoothed cortical surface viewed from the top. Edge darkness and thickness increase with the correlation coefficient.
3.1 Estimation of rotation autocovariance matrices through brain rotations on the sphere. a) Rotations are displayed with a curvature map for clarity; however, we use the variance of these maps for BSS analysis; b-left) A reference point O is shifted to P through a rotation with an axis normal to the OPC plane where C is the center of the sphere; b-right) Multiple rotations are achieved by shifting point O to several neighboring points. The distance of the points is exaggerated for clarity.
3.2 Left: Average maps of thickness, thickness variance, and curvature variance, across all preprocessed hemisphere data. Right: histogram of the corresponding data across all subjects.
3.3 Example simulated source and recovered source from each of the three methods.
3.4 Cortical thickness BSS sources recovered from the experimental data. The three most significant sources (according to explained variance) are shown after normalization by their standard deviations. Sources are listed with decreasing significance from left to right.
3.5 Cortical thickness variance BSS sources recovered from the experimental data. Caption similar to Figure 3.4.
3.6 Cortical curvature variance BSS sources recovered from the experimental data. Caption similar to Figure 3.4.
3.7 Median z-score magnitude versus sources in order of explained variance for the curvature variance metric and each BSS method.
3.8 Z-score maps of cortical thickness BSS sources recovered from the experimental data. The three most significant sources (according to explained variance) are listed with decreasing significance from left to right.
3.9 Z-score maps of cortical thickness variance BSS sources recovered from the experimental data. Caption similar to Figure 3.8.
3.10 Z-score maps of cortical curvature variance BSS sources recovered from the experimental data. Caption similar to Figure 3.8.
4.1 Example application of PCU to a small, faithful multivariate distribution whose conditional independencies are known a priori.
4.2 Dice coefficient and inner product performance measures versus number of samples for each method on the simulated tridiagonal concentration matrices.
4.3 Dice and inner product performances versus sample size on the method-based simulations. Each simulated dataset is based on the estimated concentration matrix $K'$ from applying either the concentration, UPC, or glasso method to the cortical thickness dataset.
4.4 Average empirical FDR versus sample size on the concentration method-based and tridiagonal (off diagonal = 2) simulated datasets.
4.5 Lateral and medial views of the 25 parcellated cortical regions per hemisphere displayed on the template
4.6 Histogram of the bootstrap-estimated edge variances for each method with $N' = 645$.
4.7 Three views of the stable bootstrapped graphs for each method with $N' = 645$. Refer to Table 4.1 for ROI names. Edge darkness and thickness increase with edge stability. In the lateral views only intrahemispheric connections are shown.
4.8 A comparison of each method's consistency with their full-sample output under the thickness dataset's empirical distribution. The comparison is made over a range of sample sizes and consistency is measured by a 15-trial average of the Dice coefficient and inner product between the current output and the full-sample output.
4.9 Lateral and medial views of the 35 parcellated cortical regions per hemisphere displayed on the atlas
4.10 Three views of the functional resting-state graph output by PCU.
Refer to Table 4.3 for ROI names. Edge darkness and thickness increase with the partial correlation magnitude, where blue and red edges denote positive and negative partial correlations respectively. In the lateral views only intrahemispheric connections are shown.
4.11 GraphViz "NEATO" spring model layout of the functional resting-state partial correlation network. Each circle contains its corresponding ROI abbreviation (see Table 4.1) where an "R." prefix denotes the right hemisphere and "L." the left. Suggested subnetwork membership is encoded by circle color (key on the figure). For example, a white circle denotes membership with all three possible subnetworks (sensorimotor, default mode, visual) while gray denotes membership with none of these subnetworks.
4.12 Actual PCU Dice coefficient and theoretical random graph Dice coefficient versus target number of edges.
5.1 Dice coefficient performance measure versus number of samples for each partial coherence based method on synthetic data.
5.2 Dice coefficient and inner product performance measures versus number of samples for each Granger causality based method on synthetic data.
5.3 Average empirical FDR versus sample size for spectral PCU and PC MVAR on their simulated MVAR models.
5.4 Locations of the 15 electrode pairs (reproduced from Liang et al., 2001)
5.5 The four possible visual patterns shown to the monkey in the study
5.6 Spectral PCU and PC MVAR LFP monkey data late response (278 ms) results. The thickness of the edges is proportional to the difference in the corresponding counts, with a blue edge denoting that the go count is larger and a red edge denoting that the no-go count is larger. For the PC MVAR results, arrows are used to denote causality and unidirectional arrows have been tipped yellow to help them stand out.
5.7 Spectral PCU and PC MVAR LFP monkey data early response (120 ms) results. The thickness of the edges is proportional to the difference in the corresponding counts, with a green edge denoting that the diamond count is larger and a magenta edge denoting that the line count is larger. For the PC MVAR results, arrows are used to denote causality and unidirectional arrows have been tipped blue to help them stand out.

List of Tables

3.1 Average SNR order statistics in dB ($10\log_{10}$) of the 5 mixed-in simulated sources.
4.1 Cortical region names and abbreviations
4.2 The number of stable edges (SE), average edge variance (AEV), and edge variance standard deviation (Std) for each method and $N'$.
4.3 Cortical region names and abbreviations
5.1 Spectral PCU and PC MVAR go and no-go counts for each edge in Figure 5.6. A bidirectional arrow is used to denote two opposite directional edges of the same color, and for its counts we report the averages of its directional edge counts.
5.2 Spectral PCU and PC MVAR diamond and line counts for each edge in Figure 5.7.
A bidirectional arrow is used to denote two opposite directional edges of the same color, and for its counts we report the averages of its directional edge counts.

Chapter 1: Introduction

The human brain is the basis of the human experience: everything we think, feel, perceive, and experience is processed through our brains. As such, scientists have long sought to better understand the human brain. However, the power that has attracted scientific inquiry derives from a great complexity that has made the human brain slow to reveal many of its secrets.

It is common among neuroscientists to view the human brain as an extensive network. In keeping with this analogy, neuroscientists have set out to study the behavior and properties of the human brain through the identification of its constituents (network nodes) and the anatomical network connections between them, collectively termed the human connectome (Sporns et al., 2005). Here the anatomical connections are the neuronal connections formed by convoluted, branching chains of neurons (the basic units that store, process, and transmit information in the brain). These networks have been inferred at different combinations of scale and imaging modality, particularly 1) the cellular scale (single neuron to neuron connections) via light and electron microscopy (Bock et al., 2011; Lichtman et al., 2008), 2) the macroscopic aggregated scale (large brain systems with distinct anatomical units) using anatomical connectivity information via diffusion magnetic resonance imaging (MRI) (Gong et al., 2009; Hagmann et al., 2007), and 3) the macroscopic aggregated scale using recordings of some measure of brain activity (functional measurements) (Achard et al., 2006b; Friston, 1994). Although statistical relationships in functional data are not a direct measure of anatomical connectivity, they have been shown to correspond significantly to anatomical connectivity (Hagmann et al., 2008; Sporns et al., 2005) and are very often studied for the information they provide about the operation of brain networks (Raichle et al., 2001; Ding et al., 2000; Sehatpour et al., 2008).

In this dissertation our primary concern is improving the inference of statistical relationships in functional and structural brain data, although the concepts and tools used to this end are general enough to be applied to many other problems. I will refer to these statistical relationships as simply connectivity, and explicitly reference anatomical connectivity, i.e. the physical connections between neurons, when needing to differentiate between the two.

The decision to also consider connectivity in structural data, in particular from morphological structural features, is perhaps a bit unusual as this application has been relatively neglected. Yet I believe it could provide significant insight into brain architecture (Lerch et al., 2006). Brain tissue thickness measurements taken from the same regions across multiple, different brains are an example of such a feature. Although the exact nature of the relationship between these features and anatomical connectivity is unknown, the features have been implicated in many important phenomena such as diseases (Lerch et al., 2005; Sallet et al., 2003; Hardan et al., 2004), sex differences (Zilles et al., 1997), aging (Sowell et al., 2003), and even anatomical connectivity itself (He et al., 2007).
After a brief background on the brain and its processing in Chapter 2, I propose and apply a novel blind source separation (BSS) method for application to structural brain data on the cerebral cortex in Chapter 3.

BSS methods have become increasingly popular for identifying, separating, and denoising brain signals. BSS works by decomposing the dataset into a linear mixture of sources best matching some target property, with typically little or no other prior knowledge about the sources or mixture model. Different methods optimize with respect to different target properties and have potentially large differences in performance depending on how suitable a method's target property is for a particular dataset. Here I focus on the application of BSS to cortical thickness and sulcal folding data on the cerebral cortex. Because modeling information is generally lacking for such data, the generality of BSS's mixture model makes it a promising investigative tool. Of the BSS methods, the second order blind identification (SOBI) method has a number of properties particularly well-suited for such data. However, the method also relies on the calculation of lagged covariance matrices that do not automatically extend from the original one-dimensional framework to the two-dimensional cortical manifold. To address this problem I generalize the one-dimensional shift as a two-dimensional rotation of the cortical surface. Our method outperforms regular SOBI and the popular FastICA BSS methods in simulation, and its application to three thickness- and cortical-curvature-based real datasets reveals original structural networks, where "structural" refers to the fact that the networks were generated from structural neuroimaging data.

Next I consider the problem of inferring brain network connectivity using the more traditional approach of pairwise and conditional pairwise interaction measures. These measures quantify the strength of a statistical association between two random variables or time series. In particular I seek to improve the inference of correlation (Kendall, 1945), coherence (Priestley, 1981), and Granger causality measures (Granger, 1969).

In Chapter 4, I propose a new method to account for indirect connections in correlation networks. The computation of correlation is an important component of many statistical analyses of neuroimaging data. However, one limitation of correlation in multivariate networks is that two variables can have a large correlation value while having no direct interaction. Computing partial correlations can mitigate this limitation, but doing so can be numerically unstable and require regularization. I propose the PCU algorithm to address this problem. (The PCU algorithm is based on and named after the PC algorithm, which in turn gets its name from its authors' first initials; the 'U' in PCU denotes that PCU generates undirected graphs, as opposed to the directed graphs generated by the PC algorithm.) PCU, a modified version of the existing PC algorithm, is a sequential graph-pruning method that leverages the existence of smaller but probabilistically equivalent partial correlation control sets to estimate sparse partial correlation networks. Simulations demonstrate that PCU offers robust performance compared to alternative partial correlation estimation methods, such as the graphical lasso (glasso) and inversion of the sample covariance matrix, while asymptotically controlling the false discovery rate. Application of PCU to the population analysis of real cortical thickness and resting-state functional magnetic resonance imaging data results in highly similar networks.
Furthermore, automatic clustering of the resting-state network tends to group network components into a number of known subnetworks. Performance on the cortical thickness dataset is also evaluated in terms of network stability with bootstrapping, which is also used to identify unstable network edges to be removed from the network visualization.

In Chapter 5, I extend the PCU and glasso methods to take into account indirect correlations in time series, and call the resulting methods spectral PCU and spectral glasso. The key to the extension is the substitution of correlation for coherence, which can roughly be thought of as the correlation between the Fourier coefficients of the discrete Fourier representations of time series. Another extension of PCU to multivariate autoregressive (MVAR) time series, termed PC MVAR, looks at a different type of connectivity measure, namely Granger causality, and allows for the inference of directed networks. I again evaluate these methods on synthetic data and see that they particularly excel at low sample sizes. I also apply these methods to data from a task-based functional study and show that the resulting spectral PCU and PC MVAR graphs are consistent with one another, experimental expectations, and other published analyses of the same data.

Chapter 2: Background

This chapter reviews fundamental concepts and material that will help put the following chapters in context and aid in their comprehension.

2.1 The Human Brain

Gray matter composes the outermost part of the brain and contains neuronal cell bodies (soma) and unmyelinated fibers that result in its eponymous gray color when preserved. White matter can be found inside the layer of gray matter and consists mostly of glial cells and myelinated axons through which neurons communicate. The cerebrum contains both gray and white matter, is the most superior part of the human central nervous system, and makes up the majority of the brain. The outside layer of the cerebrum is known as the cerebral cortex and contains gray matter 2–4 mm thick. It plays a role in many higher-level functions such as attention, memory, and language, and is often modeled as a highly convoluted two-dimensional surface with many peaks and valleys. A peak in the surface is known as a gyrus (plural gyri), and a valley is known as a sulcus (plural sulci). Although there is significant variability in the human brain, many of the sulci and gyri are common to most brains, although their shapes vary. The interhemispheric fissure separates the brain into two roughly symmetric hemispheres (differentiated left vs. right) that are connected at the corpus callosum. Figure 2.1 labels some of these structures on 3 orthogonal slices through an MRI brain volume and 2 views of the midcortical surface (the surface halfway between the white matter/gray matter boundary and the outer edge of the cerebral cortex).

Figure 2.1: The brain and its basic structures. (a) Views from three orthogonal slices through a structural MRI brain volume. The green line indicates the white matter/gray matter boundary of the cerebrum. The cerebral cortex is the boundary outside this that separates the gray matter from the cerebral spinal fluid, which shows up as the black areas inside the skull. (b) Two-dimensional surface model of the midcortical surface of one brain hemisphere. The hole in the medial view results from the removal of the corpus callosum necessary for separating the two hemispheres. Such surfaces can be generated from a structural MRI brain volume, such as the one shown in (a).
2.2 Processing Brain Data

The two most common types of brain imaging modalities are functional and structural. Functional imaging modalities, such as positron emission tomography (PET), electroencephalography (EEG), and functional magnetic resonance imaging (fMRI), measure properties of the brain that are relatively dynamic (on the order of seconds) and change with brain activity. For example, fMRI measures blood oxygenation levels that change with local brain activity metabolism. Anatomical imaging modalities, such as computed tomography (CT) and magnetic resonance imaging (MRI), measure comparatively static properties of brain tissue and can be used to differentiate between tissue types, such as gray and white matter.

Once brain data has been acquired, significant preprocessing is typically required before it can be used. Here we focus on the high-level procedures for preprocessing structural MRI data using BrainSuite (Shattuck and Leahy, 2002), although much of what is discussed is sufficiently generic to also apply to alternative software packages. As its name suggests, structural MRI data conveys information about structure (e.g. which points in the brain likely correspond to gray matter and which to white matter). From this, structural features, such as gray matter thickness (the thickness of the cerebral cortex) or cortical surface curvature (a characterization of the local shape of the cortical surface), can be derived. Often this data comes from a T1-weighted MRI acquisition sequence that results in a 3-dimensional MRI volume, such as that shown in Figure 2.1a.

To get at the gray matter thickness or cortical curvature data in an MRI volume, a number of preprocessing steps need to be taken. In particular, the skull needs to be segmented from the brain, and the effects of nonuniformity in the magnetic field need to be corrected. This non-uniformity distorts the measured voxel intensities (Figure 2.1a), where a voxel is the volumetric, three-dimensional analog of the two-dimensional pixel. Next, the brain needs to be classified by tissue type, such as white or gray matter, and the cerebrum identified and separated from the rest of the brain. Because these steps are not perfect, they often introduce topological errors, such as holes, in the cortical surface, which do not exist in non-pathological brains and need to be corrected. Towards the end of the process, the cortical surfaces are generated and typically represented as large triangulated meshes composed of a set of vertices and a set of edges. The edges connect the vertices to cover a cortical surface in many small tessellated triangles (Figure 2.2). The gray matter surface (the outer edge of the cerebral cortex) and the white matter surface (the gray matter/white matter boundary outlined in green in Figure 2.1a) are often used to get gray matter thickness measurements by calculating the distance between corresponding points on the two surfaces, as sketched below. The cortical curvature of these surfaces can be readily estimated from their tessellations. For simplicity, these measurements are typically calculated at the tessellation vertices. Figure 2.3 summarizes this preprocessing pipeline.
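The vertexwise distance computation just described reduces to a few lines. The sketch below uses synthetic stand-in surfaces (hypothetical coordinates, assuming only NumPy); real pipelines such as BrainSuite use more careful surface correspondence and distance definitions than a plain per-vertex Euclidean distance.

```python
import numpy as np

rng = np.random.default_rng(0)
n_vertices = 10_000

# Hypothetical matched vertex coordinates in millimeters: row i of `white`
# and `pial` are corresponding points on the white matter and gray matter
# (pial) surfaces produced by the surface extraction step.
white = rng.uniform(-70.0, 70.0, size=(n_vertices, 3))
normals = rng.standard_normal((n_vertices, 3))
normals /= np.linalg.norm(normals, axis=1, keepdims=True)
true_thickness = rng.uniform(2.0, 4.0, size=n_vertices)  # plausible 2-4 mm range
pial = white + normals * true_thickness[:, None]

# Simple thickness estimate: Euclidean distance between matched vertices.
thickness = np.linalg.norm(pial - white, axis=1)
print(np.allclose(thickness, true_thickness))  # True by construction here
```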
Although the different neuroimaging modalities provide a wealth of information, reaching their full potential requires the ability to make comparative statements between datasets recorded from different recording sessions or scans. For example, the effect of Alzheimer's disease on gray matter thickness could be evaluated by comparing the change in gray matter thickness at particular points in the brain as the disease progresses in a subject, i.e. a longitudinal study. However, because the brain will not be oriented exactly the same between different scans, there needs to be a way to match measured points in one scan to their locations in another. This problem is complicated further if one wants to perform a group study and compare brains from different subjects. In such cases, instead of identifying the exact same points between different subjects, as they do not exist, the shared features of the human brain can be used to match points homologously. Often these features relate to sulcal folding patterns because they are readily available, shared across subjects, and implicated in the function of the brain. Another way to view this point matching process is as a procedure that aligns brains and places them into a common coordinate system. This process of finding a mapping between corresponding points is known as registration. In multi-subject studies all subjects are typically registered to a common template brain which often has already been parcellated into regions of interest. This puts all of the subjects in the same coordinate system and allows the region of interest labels to be automatically propagated to each subject's brain, such that each brain does not need to be labeled individually.

Figure 2.2: A closeup of part of the triangulated mesh shown in Figure 2.1b after setting the triangle faces to be transparent.

Figure 2.3: A visualization of BrainSuite's processing pipeline

2.3 Correlation Analysis in Neuroimaging

Correlation is perhaps the simplest and most ubiquitous measure of dependence between two random variables. It has also served as the primary analysis tool in many neuroimaging studies. Here we give an overview of the concepts involved in such a study so that the reader has some background with which to put our proposed methods into context.

Although there are many types of correlation, here we focus on the most widely used, Pearson's product-moment correlation coefficient, henceforth referred to as simply correlation or correlation coefficient. The correlation coefficient quantifies the strength of linear dependence between two random variables as a value between +1 and -1, such that its magnitude corresponds to the strength of the dependence and its sign to the sign of the dependence. For example, consider three random variables $X$, $Y$, and $Z$ following the relation $X = c_Y Y + c_Z Z$ where the $c$'s are scaling constants and $Z$ is independent of the other two random variables. For any positive $c_Y$ held constant as $c_Z$ varies, at $c_Z = 0$ the correlation between $X$ and $Y$ is +1, and it decreases asymptotically to 0 as the magnitude (absolute value) of $c_Z$ increases. If the relation were $X = -c_Y Y + c_Z Z$, the correlation coefficient as a function of $c_Z$ would be the negative of the previous case. Holding $c_Z$ constant while increasing the magnitude of $c_Y$ leads $Z$ to make up an increasingly smaller portion of $X$, effectively increasing the signal-to-noise ratio between $X$ and $Y$ and increasing the magnitude of their correlation coefficient. Figure 2.4 shows the correlation coefficient between $X$ and $Y$ as a function of $c_Z$ and $c_Y$; the same behavior is easy to reproduce numerically, as in the sketch below.

Figure 2.4: Correlation coefficient between $X$ and $Y$ where $X = c_Y Y + c_Z Z$ and $Y$ and $Z$ are independent random variables with unit variance, over a range of $c_Y$ and $c_Z$ values.
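A minimal sketch (assuming only NumPy; variable names are illustrative) that estimates the correlation between $X = c_Y Y + c_Z Z$ and $Y$ from a large sample, mirroring the sweep over $c_Y$ and $c_Z$ in Figure 2.4:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000  # large sample so the estimate is close to the true value

y = rng.standard_normal(n)  # Y with zero mean, unit variance
z = rng.standard_normal(n)  # Z with unit variance, independent of Y

for c_y in (0.5, 1.0, 2.0):
    for c_z in (0.0, 1.0, 4.0):
        x = c_y * y + c_z * z          # X = c_Y*Y + c_Z*Z
        r = np.corrcoef(x, y)[0, 1]    # sample correlation coefficient
        # True value for unit-variance, independent Y and Z:
        # rho = c_Y / sqrt(c_Y**2 + c_Z**2), so rho = +1 at c_Z = 0 and
        # shrinks toward 0 as |c_Z| grows.
        print(f"c_Y={c_y:3.1f}, c_Z={c_z:3.1f}: rho_hat={r:+.3f}")
```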
The correlation coefficient's measurement of linear dependence leads to a number of behaviors that are important to its proper application and interpretation. Foremost, while a non-zero correlation implies linear dependence, two random variables could be highly dependent and still have a zero correlation coefficient. For example, let $X$ now be a random variable uniformly distributed between 0 and $\pi$, and let random variables $Y = \sin(X)$ and $Z = \cos(X)$. Clearly, $Y$ and $Z$ are highly dependent, as $Y$ can be perfectly predicted from $Z$ as $Y = \sin(\cos^{-1}(Z))$, but the correlation coefficient between $Y$ and $Z$ is zero due to how the nonlinear relationship between the variables mathematically works out. Unfortunately, very often in practice, such as with measures of brain activity, the relationships between quantities to be analyzed are nonlinear to some degree. However, cases in which there is dependence and the non-linearities result in an identically zero correlation coefficient are a minority. Instead, often the non-linearities will skew the correlation coefficient towards zero, making the dependence harder to detect. One important exception is random variables following the multivariate normal distribution, for which zero correlation is equivalent to statistical independence (Whittaker, 1990). This is particularly important because many real-world measurements follow or approximately follow this distribution (Chung et al., 2003). Indeed, it is not uncommon for researchers to simplify and approximate calculations by treating data as though it follows a multivariate normal distribution.

The correlation coefficient between random variables $X$ and $Y$ is mathematically defined as

$$\rho_{XY} = \frac{\operatorname{cov}(X,Y)}{\sigma_X \sigma_Y} \qquad (2.1)$$

where $\operatorname{cov}(X,Y) = E\{(X-\mu_X)(Y-\mu_Y)^*\}$ is the covariance between $X$ and $Y$, $(\cdot)^*$ denotes the complex conjugate operator, $\mu_X = E\{X\}$ is the mean of $X$, $E\{\cdot\}$ denotes the expected value operator, and $\sigma_X = \sqrt{\operatorname{cov}(X,X)}$ is the standard deviation of $X$. Although the complex conjugate operator has no effect on the real random variables we have considered so far, we include it here because it will be relevant for the complex random variables of Chapter 5. In practice, often the distribution of the random variables is not known, preventing direct calculation of the expected values, but if multiple realizations of the random variables are available, they can be used to estimate the correlation coefficient by evaluating its expression with the sample estimates of the covariances. Sample estimates approximate the expected value operator of an expression as a mean taken over evaluations of that expression formed from independent realizations of the corresponding random variables. Let there be a dataset of $N$ pairs, $\{(x_1,y_1), (x_2,y_2), \ldots, (x_N,y_N)\}$, where $(x_i,y_i)$ is formed from the $i$-th observation/realization of random variables $X$ and $Y$; then the sample correlation estimate is

$$\hat{\rho}_{XY} = \frac{\widehat{\operatorname{cov}}(X,Y)}{\hat{\sigma}_X \hat{\sigma}_Y} \qquad (2.2)$$

where $\widehat{\operatorname{cov}}(X,Y) = \frac{1}{N-1}\sum_{i=1}^{N}(x_i - \hat{\mu}_X)(y_i - \hat{\mu}_Y)^*$, $\hat{\mu}_X = \frac{1}{N}\sum_{i=1}^{N} x_i$, and $\hat{\sigma}_X = \sqrt{\widehat{\operatorname{cov}}(X,X)}$. The $N-1$ denominator in the sample covariance expression corrects for the small correlation between each observation and its corresponding sample mean and leads to an unbiased estimator, namely that $E\{\widehat{\operatorname{cov}}(X,Y)\} = \operatorname{cov}(X,Y)$. However, it is also not uncommon to divide by $N$ instead of $N-1$, such that the covariance estimator has less variance but as a tradeoff is only asymptotically unbiased as $N$ goes to infinity. In terms of $\hat{\rho}_{XY}$, however, it is easy to see that the choice makes no difference as the term cancels out. As with the actual correlation coefficient, $\hat{\rho}_{XY}$ ranges between -1 and +1, and furthermore converges to the true correlation coefficient as $N$ increases.
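Equation (2.2) translates directly into code. A minimal sketch, assuming NumPy and keeping the conjugate so that it also covers complex-valued data (the function name is illustrative):

```python
import numpy as np

def sample_correlation(x, y):
    """Sample correlation coefficient, Equation (2.2), for paired samples."""
    x = np.asarray(x)
    y = np.asarray(y)
    n = len(x)
    mx, my = x.mean(), y.mean()
    # Unbiased (N-1 denominator) sample covariance; the conjugate matches the
    # complex-valued definition and is a no-op for real data.
    cov_xy = np.sum((x - mx) * np.conj(y - my)) / (n - 1)
    sx = np.sqrt(np.sum(np.abs(x - mx) ** 2) / (n - 1))
    sy = np.sqrt(np.sum(np.abs(y - my) ** 2) / (n - 1))
    return cov_xy / (sx * sy)  # the N-1 factors cancel, as noted in the text

rng = np.random.default_rng(1)
y = rng.standard_normal(500)
x = y + 0.5 * rng.standard_normal(500)
print(sample_correlation(x, y))  # agrees with np.corrcoef(x, y)[0, 1]
```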
The simplicity of the correlation coefficient estimate allows for powerful statistical statements to be made about it. These statistical statements are important because the correlation coefficient estimate is itself a random variable and alone provides less information than one might think. For example, it is not unlikely at low sample sizes to estimate a large correlation coefficient from the data when in actuality the two random variables are independent and the true correlation coefficient is zero: let $N = 10$ and $X$ and $Y$ be independently distributed normal random variables; then $\Pr(|\hat{\rho}_{XY}| \geq 0.5) \approx 14\%$, where $\Pr(\cdot)$ denotes the probability of the specified event occurring. P-values and confidence intervals are common ways of quantifying such uncertainty in the estimate.

P-values are used within the framework of statistical hypothesis tests. A hypothesis test involves two cases: a null hypothesis, denoted $H_0$, which is assumed to be true, and an alternative hypothesis, denoted $H_1$, which must be true if the null hypothesis is not. The goal is to use the data of random variable observations to decide that $H_0$ is false and in turn that $H_1$ is true. A common hypothesis test for the correlation coefficient is $H_0: \rho = 0$, $H_1: \rho \neq 0$, where proving $H_1$ means the two random variables are linearly dependent to some degree. This hypothesis test is a two-tailed hypothesis test because $H_1$ is specified for positive and negative values of the correlation coefficient, while $H_0: \rho = 0$, $H_1: \rho > 0$ is one-tailed and assumes that the correlation coefficient cannot be less than zero. In practice, one cannot generally prove with one hundred percent certainty that $H_0$ is false, so instead $H_0$ is rejected and $H_1$ is accepted when $H_0$'s model is sufficiently unlikely to have generated the data. When the decision to accept or reject $H_0$ in this manner is based on a predetermined threshold and $H_0$ is rejected, the result is said to be statistically significant. If $H_0$ cannot be rejected to conclude $H_1$, this does not mean $H_0$ is true; it only means that there is not enough evidence to make a sufficiently accurate conclusion either way.

The p-value of a hypothesis test quantifies the strength of the evidence in favor of $H_0$ as follows. Let test statistic $\hat{\theta}$ be some function of the data that is used to determine whether to reject $H_0$. (Often $\hat{\theta}$ is an estimate of the statistic being tested by the hypothesis test, e.g. $\hat{\rho}$ for $\rho$.) Assuming that the null hypothesis is true, the p-value for a particular realization of the test statistic $\hat{\theta}_0$ is the probability of a random realization of $\hat{\theta}$ being at least as unlikely or extreme as $\hat{\theta}_0$. For example, for the hypothesis test $H_0: \rho = 0$, $H_1: \rho \neq 0$, the p-value for a particular realization of the estimated correlation $\hat{\rho}_0$ is $\Pr(|\hat{\rho}| \geq |\hat{\rho}_0|)$ given that $H_0$ is true. The p-value can also be seen as the probability that a hypothesis test that rejects $H_0$ for all $\hat{\rho}$ at least as extreme as $\hat{\rho}_0$ mistakenly rejects $H_0$ when $H_0$ is true. This type of mistake is referred to as a type I error, while a type II error is the reverse: accepting the null hypothesis when it is incorrect.
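The figure of roughly 14% quoted above for $N = 10$ independent normal samples can be checked by simulation; a quick Monte Carlo sketch (assuming only NumPy):

```python
import numpy as np

rng = np.random.default_rng(2)
n, trials = 10, 50_000
count = 0
for _ in range(trials):
    x = rng.standard_normal(n)
    y = rng.standard_normal(n)  # independent of x, so the true rho is 0
    if abs(np.corrcoef(x, y)[0, 1]) >= 0.5:
        count += 1
print(count / trials)  # approximately 0.14
```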
Because researchers tend to be conservative and more concerned about not making false conclusions as opposed to missing new insight, neuroimaging studies generally focus on controlling some measure of the propensity to draw incorrect conclusions, such as the type I error rate, at the cost of a greater type II error rate. The type I error rate is controlled by calculating the hypothesis test's p-value for the particular $\hat{\theta}_0$ and only accepting $H_1$ when the p-value is less than some target type I error rate, typically denoted $\alpha$. (Some common target type I error rates in neuroimaging are $\alpha = 0.05$ and $\alpha = 0.01$, but error rates orders of magnitude smaller are not uncommon for particularly strong results.) Different hypothesis tests with different test statistics use different formulas to estimate the p-values; however, the general idea is the same: transform the test statistic, under the assumption of $H_0$, to a random variable following some known distribution that does not depend on unknown parameters, known as a pivotal quantity. Because the distribution of the transformed statistic is known, the p-value can be readily calculated by inverting the known distribution function.

Here we give methods for calculating the p-value for two correlation coefficient hypothesis testing scenarios. For $H_0: \rho = 0$, $H_1: \rho \neq 0$ with $N$ samples, the sample correlation coefficient $\hat{\rho}$ defined above can be transformed into a t-statistic (a random variable following Student's t-distribution) under the assumption of $H_0$ when the samples are drawn independently from a bivariate normal distribution. The same expression also allows $\hat{\rho}$ to be transformed to an approximate t-statistic in the case of large $N$, irrespective of distribution, as

$$t = \hat{\rho}\sqrt{\frac{f}{1-\hat{\rho}^2}}$$

where $f = N-2$ is the degrees of freedom (Kendall, 1945). To test $H_0: \rho = \rho_0$, $H_1: \rho \neq \rho_0$, where $\rho_0$ is some constant correlation coefficient value, Fisher's Z-transformation can be used to transform $\hat{\rho}$ into a z-score (a random variable following the normal distribution with zero mean and unit variance) in the case of samples independently drawn from a bivariate normal distribution (Fisher, 1921):

$$z = \frac{\sqrt{N-3}}{2}\,\ln\frac{(1+\hat{\rho})(1-\rho_0)}{(1-\hat{\rho})(1+\rho_0)}$$

Finding the p-value from $t$ and $z$ is then just a matter of inverting their respective distributions to find the probability of a t-score or z-score randomly having a magnitude at least as large as $t$ or $z$. We note that the ability of the Fisher Z-transform to calculate the hypothesis test for many different $\rho_0$'s makes it particularly useful in calculating confidence intervals, and refer interested readers to (Casella and Berger, 2001).
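Both transformations are easy to implement; a sketch assuming NumPy and SciPy, with function names that are illustrative rather than from any particular package:

```python
import numpy as np
from scipy import stats

def pvalue_rho_zero(rho_hat, n):
    """Two-tailed p-value for H0: rho = 0 via the t-transformation."""
    df = n - 2
    t = rho_hat * np.sqrt(df / (1.0 - rho_hat**2))
    return 2.0 * stats.t.sf(abs(t), df)  # invert Student's t-distribution

def pvalue_rho_equals(rho_hat, rho0, n):
    """Two-tailed p-value for H0: rho = rho0 via Fisher's Z-transformation."""
    z = np.sqrt(n - 3) / 2.0 * np.log(
        (1 + rho_hat) * (1 - rho0) / ((1 - rho_hat) * (1 + rho0))
    )
    return 2.0 * stats.norm.sf(abs(z))   # invert the standard normal

print(pvalue_rho_zero(0.5, 10))         # about 0.14, matching the earlier example
print(pvalue_rho_equals(0.5, 0.3, 50))  # test against a nonzero reference value
```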
Often one has more than just two random variables to work with and wants to calculate the correlation coefficient between each possible pair. For example, let $X = (X_1, X_2, \ldots, X_P)^T$ now represent a $P$-length random vector where $X_i$ is the $i$-th random variable making up the vector. The set of all correlation coefficients between each possible pair can be represented as a symmetric matrix where the $(i,j)$ and $(j,i)$ elements correspond to the correlation coefficient between $X_i$ and $X_j$. Similarly, the covariance matrix, $\Sigma$, is the symmetric (Hermitian for complex random variables), positive semi-definite matrix of covariances where elements $(i,j)$ and $(j,i)$ are the covariance between $X_i$ and $X_j$:

$$\Sigma = E\{(X-\mu_X)(X-\mu_X)^H\}, \qquad \mu_X = E\{X\}$$

where $(\cdot)^H$ denotes the conjugate transpose operator.

Just as before, the p-values can be calculated for each correlation coefficient pair and similarly organized into a matrix; however, testing for significance is no longer simply a matter of thresholding the p-values to enforce some target type I error rate $\alpha$. To see this, consider 100 random variables, and suppose that for each of the $m = 4{,}950$ possible unique and non-trivial correlation coefficients the hypothesis test $H_0: \rho = 0$, $H_1: \rho \neq 0$ is performed, where a p-value less than $\alpha = 0.05$ leads $H_0$ to be rejected. In the case that $H_0$ is correct for all correlation coefficients, such a scheme would falsely accept $0.05 \times 4{,}950 \approx 248$ correlations as non-zero on average. To account for this, other error rates such as the family-wise error rate (FWER) and the false discovery rate (FDR) are used when dealing with multiple comparisons.

The FWER is the probability of making at least one type I error out of the multiple hypothesis tests. Bonferroni correction is often used to control the FWER by testing the hypotheses individually, as if to control the type I error rate, but with an adjusted target type I error rate of $\alpha/m$, where $m$ is the number of performed hypothesis tests. A downside to this correction is that because the FWER is a strict measure of error, controlling it can result in many type II errors, and Bonferroni correction exacerbates this by often greatly over-correcting to ensure control of the error rate (Genovese et al., 2002).

The FDR is popular for being less strict than the FWER and generally allowing more hypothesis tests to be deemed significant at the cost of a greater FWER. In the multiple hypothesis testing scenario, let random variable $V$ be the number of type I errors and random variable $R$ be the total number of hypothesis tests deemed significant (i.e. where $H_0$ is rejected); then the FDR is the ratio of $V$ to $R$ on average: $E\{V/R\}$. The FDR can be controlled at a level $q$ using the following Benjamini-Hochberg-Yekutieli (BHY) procedure (Benjamini and Hochberg, 1995; Benjamini and Yekutieli, 2001), implemented in the sketch below:

1. Sort the p-values in ascending order $p_{(1)} \leq p_{(2)} \leq \ldots \leq p_{(m)}$.
2. Identify the largest index $i$ such that $p_{(i)} \leq \frac{i}{m}\frac{q}{c(m)}$.
3. Reject the null hypotheses corresponding to $\{p_{(1)}, p_{(2)}, \ldots, p_{(i)}\}$ and accept those corresponding to $\{p_{(i+1)}, p_{(i+2)}, \ldots, p_{(m)}\}$.

For any joint distribution of the test statistics, $c(m) = \sum_{i=1}^{m} \frac{1}{i}$ is guaranteed to control the FDR. However, in the case of independent or positively dependent test statistics, $c(m) = 1$ can be used to gain significantly more statistical power. Positive dependence is a technical condition that, very roughly framed in terms of correlation, would be similar to prohibiting negative correlations; a precise definition is outside the scope of this dissertation. In practice, $c(m) = 1$ is often used by default with the justification that for neuroimaging data correlations tend to be local and positive (Genovese et al., 2002).
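A direct implementation of the three-step procedure (a sketch assuming only NumPy; the function name is illustrative):

```python
import numpy as np

def bhy_reject(pvals, q, independent=True):
    """Benjamini-Hochberg(-Yekutieli) FDR control at level q.

    Returns a boolean mask of rejected null hypotheses. With
    independent=False, the correction c(m) = sum_{i=1}^m 1/i guarantees
    FDR control under arbitrary dependence of the test statistics.
    """
    p = np.asarray(pvals)
    m = len(p)
    c_m = 1.0 if independent else np.sum(1.0 / np.arange(1, m + 1))
    order = np.argsort(p)                       # step 1: sort ascending
    thresh = np.arange(1, m + 1) * q / (m * c_m)
    below = p[order] <= thresh                  # step 2: p_(i) <= (i/m)(q/c(m))
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])        # largest qualifying index
        reject[order[: k + 1]] = True           # step 3: reject p_(1..i)
    return reject

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.36]
print(bhy_reject(pvals, q=0.05))  # rejects the two smallest p-values here
```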
We conclude this section by applying the discussed topics to the analysis of resting-state fMRI data of the human brain, to serve as an example of brain connectivity inference. The analyzed dataset comprises 100 functional (time) brain volumes from an fMRI scan of one subject of the 1000 Functional Connectome Project's Beijing_Zang dataset (http://www.nitrc.org/projects/fcon_1000/). Much important work goes into preprocessing fMRI data before it can be analyzed (see Section 4.2.4 and the 1000 Functional Connectome Project's website), some of which overlaps with the structural preprocessing described in Section 2.2. However, for the sake of brevity we only describe the details necessary to understand the following correlation analysis. In particular, functional activity was sampled at many points on the cerebral cortex, which was parcellated into 70 regions of interest (ROIs; 35 per hemisphere). Each sample was assigned to the ROI in which it was located (Figure 2.5e), such that multiple samples corresponded to each ROI for any particular functional activity volume in time. Next, the median was taken across the samples in each ROI to form a single summary statistic for each ROI and point in time. This ended the preprocessing and resulted in a 100 (sample volume) by 70 (ROI) data matrix ready for correlation analysis.

In analyzing the dataset, the correlation matrix was computed first (Figure 2.5a). Next, for the $H_0: \rho = 0$, $H_1: \rho \neq 0$ hypothesis test, correlation coefficients were converted to t-scores and then to p-values (Figure 2.5b). Although the t-score transformation requires independent samples and brain activity is highly correlated with its past, this correlation tends to die out quickly as the length of time between two points increases. For fMRI data, treating the time samples as independent is a commonly made approximation that is justified by the modality's relatively large time step between samples and the typically more than 100 samples. Figure 2.5c shows the result of using Bonferroni correction to control the FWER at $\alpha = 10^{-6}$, and Figure 2.5d shows the result of using the BHY procedure to control the FDR at $q = 10^{-6}$. As expected, the Bonferroni correction resulted in fewer significant hypothesis tests than the BHY procedure. Finally, the significant correlations, as determined by the BHY procedure, are shown as a network on the cortical surface, as is common in brain connectivity studies (Achard et al., 2006b). In particular, each network node has a corresponding cortical ROI and is positioned at the center of the ROI, with an edge between two nodes if their correlation was deemed significant (non-zero) by the hypothesis test (Figure 2.5f). The whole analysis is summarized in the code sketch below.

Figure 2.5: Results of the correlation connectivity analysis. (a) Matrix of correlation coefficient values between each of the 70 ROIs. Element $(i,j)$ corresponds to the correlation between ROI $i$ and ROI $j$. (b) Matrix of p-values for each of the correlation coefficients in (a). (c) Mask resulting from Bonferroni correction. Blue elements were deemed significant, red elements were not. (d) Mask resulting from the BHY procedure to control the FDR. (e) Lateral view of one hemisphere of the parcellated cortex with color-coded ROIs. (f) Result of the FDR-controlled correlation analysis visualized on the smoothed cortical surface viewed from the top. Edge darkness and thickness increase with the correlation coefficient.
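Putting the pieces of this section together, the ROI analysis reduces to a short script. The sketch below uses a random stand-in for the preprocessed 100 x 70 data matrix and reuses the hypothetical bhy_reject helper from the previous sketch; it mirrors the analysis steps rather than reproducing the published results.

```python
import numpy as np
from scipy import stats

# Stand-in for the preprocessed dataset: 100 time samples x 70 ROI medians.
# (Random data here; the real matrix comes from the preprocessing above.)
rng = np.random.default_rng(5)
data = rng.standard_normal((100, 70))
n, p = data.shape

corr = np.corrcoef(data, rowvar=False)   # 70 x 70 correlation matrix
iu = np.triu_indices(p, k=1)             # p*(p-1)/2 = 2415 unique pairs
rho = corr[iu]

# t-transformation and two-tailed p-values, treating samples as independent.
t = rho * np.sqrt((n - 2) / (1.0 - rho**2))
pvals = 2.0 * stats.t.sf(np.abs(t), df=n - 2)

# Bonferroni FWER control versus BHY FDR control at the text's thresholds.
bonferroni_mask = pvals < 1e-6 / len(pvals)
fdr_mask = bhy_reject(pvals, q=1e-6)     # helper from the previous sketch
print(bonferroni_mask.sum(), fdr_mask.sum())  # edges deemed significant
```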
Chapter 3: Second Order Blind Identification (SOBI) on the Cerebral Cortex

With the advent of modern brain imaging methods, researchers now have a wealth of data to explore brain networks. In this pursuit, Blind Source Separation (BSS) methods have become increasingly popular to identify, separate, and denoise brain signals (Makeig et al., 1997; Damoiseaux et al., 2006; Jung et al., 2000). BSS methods, such as Second Order Blind Identification (SOBI; Belouchrani et al., 1997) and FastICA (Hyvärinen and Köster, 1997), decompose a dataset into a linear mixture of sources that best match some target property, with typically little or no other prior knowledge of the sources or mixture model. For example, Independent Component Analysis (ICA) is a very popular subset of BSS that relies on independence to separate and identify sources. Performance can vary greatly among different methods depending on how well the data satisfies each method's assumptions, and therefore selecting a proper BSS method is critical. In general, most BSS methods fall into, or are a hybrid of, one of the three following categories:

ICA Methods Using Higher-order Statistics (HOS): Sources are assumed to lack temporal, spatial, or other structure, requiring the use of HOS (often indirectly) to maximize source independence. Common methods include maximization of non-Gaussianity using kurtosis or negentropy, maximum likelihood estimation, minimization of mutual information, and tensorial methods. In almost all cases these methods optimize different measures of the same criterion (Hyvärinen et al., 2001), and as such, they all suffer from the same inability to separate more than one Gaussian source. The FastICA method is a widely-used member of this group.

BSS Methods Using Second-order Statistics (SOS): Sources have an underlying structure, allowing separation of merely uncorrelated (not necessarily independent) sources using SOS, such as shifted autocovariance matrices. SOS are more robust than HOS and allow Gaussian sources to be separated; however, SOBI and other methods in this category are unable to separate sources with identical power spectral densities (see Section 3.1.2).

BSS Methods Using Nonstationarity: These methods require the data be nonstationary and often employ SOS to decorrelate the data at all data points (Matsuoka et al., 1995). Aside from nonstationarity, no other common requirements are placed on the data.

Most BSS applications in brain imaging have been to fMRI (Calhoun et al., 2003; McKeown et al., 1998), EEG (Vigário et al., 2000; Onton et al., 2006), and MEG (Vigário et al., 1998; Ikeda and Toyama, 2000) data. As a multivariate approach that makes minimal assumptions on the underlying model, BSS is a natural choice for exploring brain signals, which often have a complex structure with no clear best model (Zhukov et al., 2000; Sato et al., 2001; Jung et al., 2001). The independence forced upon ICA sources and the decorrelation forced upon BSS sources using SOS make these methods ideal for noise and artifact removal (Callan et al., 2001; Vorobyov and Cichocki, 2002) and for separating functional brain networks (Calhoun et al., 2008) into subnetworks, such as the resting-state networks (van de Ven et al., 2004; Beckmann et al., 2005). In applying these methods, one must choose the dimension(s) over which to optimize source independence or decorrelation, with different applications favoring different dimensions. For example, temporal BSS is often used on EEG data to remove heartbeat artifacts. Similarly, spatial BSS is often used on fMRI data to identify and remove head movement artifacts.
In fact, BSS can be performed on any dataset dimension(s), not just space or time, and extensions to multiple dimensions, such as spatiotemporal ICA (Stone et al., 2002), exist. Many other modifications to basic BSS have been proposed; one notable example fuses EEG and fMRI data (Moosmann et al., 2008).

One drawback of BSS is that analysis procedures do not readily generalize to group studies. This is in contrast to many univariate methods, which provide a direct one-to-one correspondence between results across subjects. Many methods have been proposed to circumvent this problem, and for fMRI data alone these methods fall into at least 5 categories (Calhoun et al., 2009): combining single subject BSS's (Esposito et al., 2005), temporally concatenating (Schmithorst and Holland, 2004), spatially concatenating (Svensén et al., 2002), pre-averaging (Schmithorst and Holland, 2004), and using a tensorial framework (Beckmann and Smith, 2005). Similar to temporal concatenation, concatenating data from subjects that have been registered to a common anatomical space allows for BSS analysis of any multi-subject dataset.

SOBI is frequently used on brain data (Tang et al., 2005; Joyce et al., 2004; Theis et al., 2004) for its ability to incorporate the correlation structure of the data and separate multiple Gaussian sources, which is not possible with ICA methods using HOS. Hyvärinen points out that ignoring such structural information may lead other methods to be substantially suboptimal (Hyvärinen et al., 2001), which was recently corroborated by one study comparing the performance of 22 different BSS methods on simulated EEG data (Klemm et al., 2009). Source separation is achieved by joint diagonalization of a set of shifted/lagged covariance matrices computed across a single dimension, typically time. Although application of SOBI to one-dimensional data is straightforward, the computation of shifted covariance matrices on the two-dimensional cortical manifold is not. In this dissertation we extend SOBI to data on the cerebral cortex using a two-dimensional analog of the one-dimensional shift to construct the covariance matrices.

We further propose a new way to study cortical brain networks using BSS: structural networks of gray matter thickness, gray matter thickness variance, and cortical curvature variance. Unlike functional networks and diffusion anatomical networks, little attention has been given to such networks based on the general physical structure of the cortical surface. Thickness changes have been implicated in brain development and aging (Sowell et al., 2002), and in pathologies of the central nervous system, such as schizophrenia (Narr et al., 2005) and Alzheimer's disease (Lerch et al., 2005). Furthermore, brain networks constructed from inter-regional gray matter thickness correlations share many pathways in common with diffusion tensor images and manifest small-world and modular properties consistent with functional studies (He et al., 2007; Chen et al., 2008). Cortical folding patterns, whose shape can be locally quantified by the curvature metric (Luders et al., 2006), have also been associated with autism (Hardan et al., 2004), schizophrenia (Sallet et al., 2003), and sex differences (Zilles et al., 1997). In one recent study, cortical shape was represented by estimates of regional joint probability density functions of an orthogonal reparameterization of the principal curvatures (Awate et al., 2009).
Differences in these joint densities were then connected to differences in the type of congenital heart disease in neonates. To our knowledge, thickness and curvature relationships have not been previously studied with BSS, yet BSS is ideal for structural studies because they lack prior information about the signal topographies. Here we validate our BSS method using simulation and identify significant structural networks in thickness and curvature.

3.1 Blind Source Separation Theory and the Modified SOBI Algorithm

In this section we provide a background on BSS, ICA, and the regular SOBI approach, and then introduce our SOBI method for the cerebral cortex.

3.1.1 Blind Source Separation (BSS)

BSS is well explained by the cocktail party problem. Imagine N guests chatting at a cocktail party in an echoless room with N microphones. The problem is to take the microphone recordings and, from them, separate out the individual voices without knowing the layout of the guests or microphones. Let $x_i(t)$ and $s_i(t)$ represent the i-th microphone recording and the i-th guest voice source as functions of time, respectively. A reasonable model, assuming the guests and microphones are stationary, would be to represent each microphone recording as a weighted sum or mixture of the voice sources:

$$x_i(t) = A_{i1} s_1(t) + A_{i2} s_2(t) + \dots + A_{iN} s_N(t)$$

where $A_{ij}$ is the weight applied to the j-th voice for the i-th microphone and roughly represents the proximity of the two. This model can be vectorized by letting $x(t) = [x_1(t)\; x_2(t) \dots x_N(t)]^T$, $s(t) = [s_1(t)\; s_2(t) \dots s_N(t)]^T$, and $A$ be an $N \times N$ matrix with elements $A_{ij}$, where $(\cdot)^T$ represents the matrix transpose operation:

$$x(t) = A s(t). \tag{3.1}$$

In the context of brain imaging, $s_i(t)$ could be electrical signals produced from different brain areas, and $x_i(t)$ the recordings from sensor electrodes surrounding the head.

BSS then poses a target source property that constrains the solution of this underdetermined problem and allows for the estimation of the mixing matrix $A$ and sources $s_i(t)$, up to permutations and scalings of the sources, which cannot be recovered. For example, in the case of ICA the target source property is independence of all sources $s_i(t)$. It should also be noted that even though signals here are functions of time, the same analysis directly extends to signals parameterized by space or another index. To see BSS's scaling and permutation indeterminacy, let $D$, $P$, and $I$ be an $N \times N$ diagonal, permutation, and identity matrix respectively, and note that $P^T P = I$:

$$x(t) = A s(t) = [A D^{-1} P^T][P D s(t)]. \tag{3.2}$$

In the cocktail party problem we assumed that the number of microphones and number of voices were equal. Although there are techniques for handling the case when they are not (Hyvärinen et al., 2001; Lee et al., 1999), we will continue to make this simplifying assumption throughout the chapter, as it is a separate issue from the problem of extending SOBI to the cerebral cortex. Similarly, we will also assume that $A$ is full rank and thus, given $A$, the sources can be estimated as $A^{-1} x(t)$.

Most BSS methods begin by prewhitening the observed data $x_i(t)$, which simplifies the estimation of sources. Define the source time-lagged covariance matrix as $C_s(\tau) = E\{s(t)\, s^*(t-\tau)\}$, with $(\cdot)^*$ defined according to BSS convention to indicate the conjugate transpose operator. Sources $s_i(t)$ are uncorrelated and, without loss of generality, can be assumed to have zero mean and unit variance. Therefore, $C_s(0) = I$ and $C_s(\tau) = D_\tau$, where $I$ is the $N \times N$ identity matrix and $D_\tau$ is a diagonal matrix. In the whitened space, the model in Equation (3.1) becomes:

$$z(t) = W A s(t) \tag{3.3}$$

where the whitened signals $z_i(t)$ compose $z(t)$, $W$ is the whitening matrix operator, and $C_z(0) = I$. Computing the 0-lagged covariance from Equation (3.3) gives:

$$I = W A A^* W^* \tag{3.4}$$

indicating that $WA$ is a unitary matrix, which we denote as $U$. Whitening greatly simplifies the BSS estimation problem: since $s(t) = U^{-1} z(t)$ and $U$ is a unitary matrix, the solution requires estimation of only $N(N-1)/2$ rather than $N^2$ parameters.
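To make the whitening step concrete, the following minimal sketch (synthetic data and illustrative variable names only) builds W from the eigendecomposition of the sample covariance and verifies numerically that U = WA is orthogonal:

import numpy as np

rng = np.random.default_rng(0)
N = 4
s = rng.standard_normal((N, 100000))   # zero-mean, unit-variance, uncorrelated sources
A = rng.standard_normal((N, N))        # random full-rank mixing matrix
x = A @ s                              # observed mixtures, Equation (3.1)

# Whitening matrix W = Lambda^{-1/2} E^T from C_x(0) = E Lambda E^T,
# chosen so that C_z(0) = W C_x(0) W^T = I.
eigvals, E = np.linalg.eigh(np.cov(x))
W = np.diag(eigvals ** -0.5) @ E.T
z = W @ x                              # whitened data

# Equation (3.4): U = WA should be orthogonal up to sampling error, so
# only the N(N-1)/2 rotation parameters remain to be estimated.
U = W @ A
print(np.allclose(U @ U.T, np.eye(N), atol=1e-2))   # True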
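The machinery of this section can be demonstrated with a toy sketch (synthetic signals; not the dissertation's implementation). For brevity it uses the single-lag AMUSE special case, diagonalizing one $C_z(\tau)$; a full SOBI run would instead jointly diagonalize matrices from many lags as in Equation (3.6):

import numpy as np

def lagged_cov(z, tau):
    # Symmetrized lagged covariance C_z(tau) of the whitened signals.
    n = z.shape[1] - tau
    c = (z[:, :n] @ z[:, tau:].T) / n
    return 0.5 * (c + c.T)

rng = np.random.default_rng(1)
# Two Gaussian sources with different power spectra (the separability
# requirement of this section), built by moving-average smoothing.
s = np.vstack([np.convolve(rng.standard_normal(50000), np.ones(k) / k, "same")
               for k in (2, 20)])
s /= s.std(axis=1, keepdims=True)
x = rng.standard_normal((2, 2)) @ s           # random mixing

# Whiten (Section 3.1.1), then recover U from one lagged covariance.
eigvals, E = np.linalg.eigh(np.cov(x))
z = np.diag(eigvals ** -0.5) @ E.T @ x
_, U = np.linalg.eigh(lagged_cov(z, 5))
s_hat = U.T @ z                               # sources, up to sign/permutation

# Each estimated source should correlate approximately +/-1 with one true source.
print(np.round(np.corrcoef(np.vstack([s, s_hat]))[:2, 2:], 2))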
3.1.3 SOBI on the Cerebral Cortex

In this dissertation we expand the above SOBI method to two-dimensional spatial data on the cerebral cortex. Here an independent source i has a cortical map $s_i(l)$, with $l \in \mathbb{R}^3$ indicating the location on the cortex. We want to identify N independent cortical sources $s(l) = [s_1(l)\; s_2(l) \dots s_N(l)]^T$ from N measurements $x(l) = [x_1(l)\; x_2(l) \dots x_N(l)]^T$. Here the measurements are derived from curvature and thickness maps from N subjects, but of course the method is more general.

The intuitive analog of a one-dimensional temporal shift is a bijection on the cortex such that every point moves the same geodesic distance d along the surface. Spatially shifted covariance matrices would then be defined as:

$$C_z(d) = E_l\{z(l)\, z^*(l + d_l)\}, \quad \|d_l\| = d \tag{3.7}$$

Ideally, shifts at every location l should be coherent, rather than in random directions, so that they capture any anisotropy in the data. Because the cortex is a highly convoluted manifold, we approximate coherent shifts by inflating the cortical surface to the sphere and rotating along this space. Inflation is performed using FreeSurfer, which maps a triangulated mesh of a hemisphere to an inflated, spherical version by minimizing an energy function consisting of metric and topology preserving terms (Fischl et al., 1999). Small rotations in the spherical coordinate system proceed as shown in Figure 3.1a. For each rotation $r \in R$, we compute an estimated autocovariance matrix $\hat{C}^r_z$ from the whitened data $z(l)$. Subsequent joint diagonalization of these autocovariance matrices leads to the underlying cortical sources.

There exists no rotation such that points on the sphere move uniform distances: points at the poles will not move, while those at the equator will move the greatest distance. Furthermore, Euler's rotation theorem states that any series of rotations is equivalent to a single rotation about some axis; therefore, seeking combinations of rotations to achieve a more uniform movement of points is of no consequence. Given the above, our SOBI's autocovariance matrices are created from data pairs of varying shift distances, a departure from the original SOBI in Equation (3.6). This is equivalent to composing the estimated autocovariance matrix $\hat{C}^r_z$ from a weighted sum of autocovariance matrices of different spatial shifts:

$$\hat{C}^r_z \propto \int_d \int_{\{l \in S^2 :\, \|d^r_l\| = d\}} z(l)\, z^*(l + d^r_l)\, dl\, dd = \int_d \alpha^r_d\, \hat{C}_z(d)\, dd \tag{3.8}$$

where $S^2$ is the unit sphere and the $\alpha^r_d$ are scalar weights. Diagonalizing the rotation autocovariance matrices $C^r_z$ also recovers the unitary matrix $U$ because

$$\int_d \alpha^r_d\, C_z(d)\, dd = \int_d \alpha^r_d\, U D_d U^*\, dd = U \left( \int_d \alpha^r_d D_d\, dd \right) U^* = U D^r U^*. \tag{3.9}$$

Figure 3.1: Estimation of rotation autocovariance matrices through brain rotations on the sphere. a) Rotations are displayed with a curvature map for clarity; however, we use the variance of these maps for BSS analysis. b-left) A reference point O is shifted to P through a rotation with an axis normal to the OPC plane, where C is the center of the sphere. b-right) Multiple rotations are achieved by shifting point O to several neighboring points. The distance of the points is exaggerated for clarity.

Since there is no large regular tessellation of a sphere (Kraitchik, 1953; http://mathworld.wolfram.com/TriangularSymmetryGroup.html), rotations are performed through interpolation. To select the rotation set R, we use the icosahedron subdivision method (Korn and Dyer, 1987) to create a semiregular tessellation of approximately 10,000 nodes. We then randomly select a reference point O and rotate toward its 50 nearest samples (Figure 3.1b).
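A minimal numerical sketch of one rotation autocovariance estimate $\hat{C}^r_z$ follows; the random points and nearest-neighbor lookup are simplifying stand-ins for the FreeSurfer sphere and the interpolation used in practice, and all array shapes are hypothetical:

import numpy as np
from scipy.spatial import cKDTree
from scipy.spatial.transform import Rotation

def rotation_cov(Z, verts, rotvec):
    # Z:      N x V whitened data, one row per subject/hemisphere
    # verts:  V x 3 unit vectors, vertex positions on the inflated sphere
    # rotvec: axis-angle vector defining one small rotation r
    R = Rotation.from_rotvec(rotvec).as_matrix()
    rotated = verts @ R.T
    # Nearest-neighbor lookup stands in for mesh interpolation of z(l + d_l^r).
    _, idx = cKDTree(verts).query(rotated)
    C = (Z @ Z[:, idx].T) / Z.shape[1]
    return 0.5 * (C + C.T)

rng = np.random.default_rng(0)
verts = rng.standard_normal((10000, 3))
verts /= np.linalg.norm(verts, axis=1, keepdims=True)
Z = rng.standard_normal((8, 10000))
C_r = rotation_cov(Z, verts, np.array([0.0, 0.0, 0.05]))  # ~3 degree rotation about z
print(C_r.shape)  # (8, 8); one such matrix per rotation in R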
3.2 Dataset and Preprocessing

Brain connectivity, development, and pathologies of the central nervous system affect the gyral and sulcal patterns and thickness of the human cortex. To study such spatial patterns in cortical folding and thickness, we applied our SOBI approach to cortical maps of local thickness, thickness variance, and curvature variance. Maps of the local curvature itself were excluded from analysis because inter-subject registration relies on these maps and would confound the analysis.

Our data set consisted of MRI volumes of 24 subjects and their coregistered cortical surfaces described in (Pantazis et al., 2010). These 24 subjects were all right-handed and came from two groups. Group 1 comprised 12 subjects (6 males, 6 females) with mean age 26 years and range 22–28 years. The subjects were scanned at the Dornsife Cognitive Neuroscience Imaging Center at the University of Southern California using a 3-T Siemens MAGNETOM Trio scanner. High-resolution T1-weighted anatomical volumes were acquired for each subject with an MPRAGE scan using the following protocol: TR = 2350 ms, TE = 4.13 ms, 192 slices, field of view = 256 mm, voxel size = 1.0 × 1.0 × 1.0 mm.

Group 2 comprised 12 subjects age- and gender-matched to group 1: 6 males, 6 females, mean age 24.9 years, range 22–28 years. Scanning occurred at the University of Iowa using thin-cut MR coronal images obtained from a General Electric Signa scanner operating at 1.5 T with the following protocol: SPGR/50, TR = 24 ms, TE = 7 ms, matrix 256 × 192, field of view = 24 cm, yielding 124 contiguous coronal slices, 1.5 or 1.6 mm thick with an interpixel distance of 0.94 mm. Three datasets were obtained for each subject. These were coregistered and averaged post hoc using Automated Image Registration (AIR 3.03; Woods et al., 1998). The final data for each subject had anisotropic voxels with an interpixel spacing of 0.7 mm and an interslice spacing of 1.5–1.6 mm.

The midcortical surfaces were extracted using FreeSurfer (Dale et al., 1999). Cortical gray matter thickness maps were also generated by FreeSurfer during cortical extraction, and cortical curvature maps characterizing the local shape of the cortex were calculated using an approximation of mean curvature (referred to as the "convexity measure" in Pantazis et al., 2010). Both maps were median filtered and smoothed using isotropic filtering with an equivalent Gaussian kernel of $\sigma^2$ = 1.15 mm² (Joshi et al., 2009b). Maps of local thickness variance and curvature variance were then estimated as follows: the datasets were mapped to the unit sphere, and for each vertex, a local neighborhood (distance within 0.25) was used to calculate the variance of the corresponding data. The thickness data was further isotropically smoothed, equivalent to one smoothing application with a Gaussian kernel of $\sigma^2$ = 10.0 mm².

To increase the number of observations, at the cost of not being able to resolve inter-hemispheric differences, we coregistered left and right hemispheres to a common target hemisphere and treated them as separate observations. This registration was performed using a manual registration technique that uses a set of hand-traced sulcal curves as landmarks (Pantazis et al., 2010; Joshi et al., 2007). Because we are interested in detecting how patterns co-vary across the cortex, we removed each dataset's average cortical map (Figure 3.2). To restore full rank we projected the maps onto the eigenvectors of the (N − 1) largest eigenvalues of their covariance matrix. One subject was excluded from the curvature variance and thickness variance analysis because of incomplete data. Thus, the final datasets for BSS analysis were one (N − 1) × 140,120 (hemispheres × vertices; N = 48) matrix of thickness maps and two (N − 1) × 140,120 (N = 46) matrices of curvature variance and thickness variance maps. Figure 3.2 shows the distribution and average cortical map over all hemispheres (after the aforementioned preprocessing) of the thickness (kurtosis 3.48), thickness variance (kurtosis 8.13), and curvature variance (kurtosis 3.15) datasets.

Figure 3.2: Left: average maps of thickness (a), thickness variance (b), and curvature variance (c) across all preprocessed hemisphere data. Right: histogram of the corresponding data across all subjects.
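For illustration, the local variance maps can be sketched as below (random stand-in data; the 0.25 neighborhood radius is treated here as a chordal distance on the unit sphere, one plausible reading of the distance used above):

import numpy as np
from scipy.spatial import cKDTree

def local_variance(values, verts, radius=0.25):
    # Variance of `values` within each vertex's spherical neighborhood.
    tree = cKDTree(verts)
    neighborhoods = tree.query_ball_point(verts, r=radius)
    return np.array([np.var(values[idx]) for idx in neighborhoods])

rng = np.random.default_rng(0)
verts = rng.standard_normal((5000, 3))
verts /= np.linalg.norm(verts, axis=1, keepdims=True)   # points on the unit sphere
thickness = rng.standard_normal(5000)                   # hypothetical smoothed map
print(local_variance(thickness, verts).shape)           # one variance per vertex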
3.3 Simulation

To evaluate our extended SOBI approach against regular SOBI and FastICA, we created 5 simulated sources for each of the thickness, thickness variance, and curvature variance datasets. The sources were statistically similar to their corresponding non-simulated maps and were created as follows. We first smoothed cortical maps of independent, uniformly distributed U(−0.5, 0.5) random variables by iteratively (1000 times) taking a weighted average of each cortical measurement with its immediate neighbors, namely assigning a weight w to the center location and (1 − w)/k to each of the k neighboring locations. Different weights were used for each source, with w approximately linearly ranging from 0.97 to 0.67, to ensure that the sources had different power spectral densities and were therefore separable by SOBI. The simulated sources were then histogram matched to a random combination of their corresponding thickness, thickness variance, or curvature variance maps and further smoothed for 25 more iterations, which did not significantly change the source histograms but countered some of the discretization effects introduced by the histogram matching.

We mixed the 5 simulated sources into their corresponding datasets as follows. Let X be the N × 140,120 mixture matrix of the original measurements, S the N × 140,120 matrix of original sources, and $S^{sim}$ the new 5 × 140,120 matrix of the simulated sources. The original mixture model X = AS (the same as Equation 3.1, but with signals discretely sampled and parameterized in space) is expanded as X′ = A′S′, where

$$X' = \begin{bmatrix} X \\ B_1 X \end{bmatrix} + B_2 S^{sim} \tag{3.10}$$

$$A' = \begin{bmatrix} \begin{matrix} A \\ B_1 A \end{matrix} & B_2 \end{bmatrix} \tag{3.11}$$

$$S' = \begin{bmatrix} S \\ S^{sim} \end{bmatrix}. \tag{3.12}$$

X′ is the new (N + 5) × 140,120 mixture data. $B_1$ is a random 5 × N matrix and $B_2$ is a random (N + 5) × 5 matrix, each containing independent, zero-mean Gaussian random variables scaled such that X′ and X have the same variances. The above mixing procedure was repeated a total of 10 times as a Monte Carlo simulation.

3.4 Jackknife Resampling

Resampling techniques have been used to validate BSS results because BSS provides no immediate validation of its own (Meinecke et al., 2002; Westad and Kermit, 2003). The general idea is to equate validity with stability and use resampling techniques to estimate the variance of the results. Thus, sources with lower variance are considered more valid. Resampling methods, such as the jackknife and bootstrap, implicitly use the empirical distribution of a dataset (the mixtures in this case) to estimate descriptors, such as the variance and bias, of a statistic (the sources) by repeatedly computing the statistic on many resampled versions of the dataset. Here we make a slight variation on this approach and use the jackknife variance estimates to approximate z-score measures of source statistical significance. This is done for a particular source location by dividing the source estimate at that point by its estimated standard deviation. In particular, the delete-1 jackknife estimator of variance for the full-sample estimate of source i at vertex j, $\hat{S}_{ij}$, can be written as

$$\mathrm{var}(\hat{S}_{ij}) = \frac{N-1}{N} \sum_{n=1}^{N} \left( \hat{S}^{(n)}_{ij} - \hat{S}^{(\cdot)}_{ij} \right)^2 \tag{3.13}$$

where $\hat{S}^{(n)}$ is the resampled estimate calculated by performing BSS on all samples but the n-th, and $\hat{S}^{(\cdot)}_{ij} = \frac{1}{N} \sum_{n=1}^{N} \hat{S}^{(n)}_{ij}$. We approximate the z-score for source i at location j as

$$\frac{\hat{S}_{ij}}{\sqrt{\mathrm{var}(\hat{S}_{ij})}}. \tag{3.14}$$

Thus, the less stable a source is, the greater its variance and the lower its z-score.
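Equations (3.13) and (3.14) reduce to a few lines of code. In this sketch, `estimator` is a placeholder for a full BSS run (which, in a real analysis, would also need its sign and permutation ambiguities resolved so that resampled sources are comparable), and the toy usage applies it to a simple mean statistic:

import numpy as np

def jackknife_zscores(X, estimator):
    # X: N x V data matrix; estimator: callable mapping a data matrix to
    # a sources-by-vertices estimate (stands in for a BSS method here).
    N = X.shape[0]
    S_full = estimator(X)
    resamples = np.stack([estimator(np.delete(X, n, axis=0)) for n in range(N)])
    # Delete-1 jackknife variance, Equation (3.13).
    var = (N - 1) / N * np.sum((resamples - resamples.mean(axis=0)) ** 2, axis=0)
    return S_full / np.sqrt(var)            # approximate z-scores, Equation (3.14)

rng = np.random.default_rng(0)
X = rng.standard_normal((24, 1000)) + 0.5
z = jackknife_zscores(X, lambda d: d.mean(axis=0, keepdims=True))
print(z.shape)   # (1, 1000); stable estimates yield large |z|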
3.5 Results

We compared the performance of our modified SOBI against that of regular SOBI (Belouchrani et al., 1993; implementation by EEGLAB: http://sccn.ucsd.edu/eeglab/) and FastICA (Hyvärinen and Köster, 1997; implementation by Hyvärinen et al.: http://research.ics.aalto.fi/ica/fastica/) in simulation as described in Section 3.3 and in terms of stability with jackknife resampling as described in Section 3.4. We also applied these three methods to experimental data of cortical gray matter thickness, cortical gray matter thickness variance, and cortical curvature variance. Our SOBI and regular SOBI were set to diagonalize 50 autocovariance matrices, and FastICA was left on its default settings with the exception of selecting the symmetric decorrelation approach to ensure that each run gave a consistent number of sources. Unlike our method, regular SOBI performed shifts that ignored spatial proximity on the two-dimensional cortex by simply stacking all of the cortical measurements into a vector and shifting the elements of the vector to calculate the autocovariances.

3.5.1 Simulation

We simulated 5 random sources of each data type, mixed them into their corresponding datasets (see Section 3.3), and performed BSS to recover the simulated sources. Each of the original simulated sources was then paired with its closest recovered source, and the quality of the separation was evaluated in terms of the following signal-to-noise ratio (SNR):

$$\mathrm{SNR}_i = \frac{\sum_j (S^{sim}_{ij})^2}{\sum_j (S^{sim}_{ij} - \hat{S}^{sim}_{ij})^2} \tag{3.15}$$

where $\mathrm{SNR}_i$ is the SNR between the true simulated source i ($S^{sim}_i$) and the recovered simulated source i ($\hat{S}^{sim}_i$). Figure 3.3 shows a representative example of a simulated source and its estimate by the three methods after the source was mixed into the thickness dataset.

Figure 3.3: Example simulated source and recovered source from each of the three methods: (a) simulated source; (b) modified SOBI source estimate, SNR = 17.3 dB (10 log10); (c) regular SOBI source estimate, SNR = 1.85 dB; (d) FastICA source estimate, SNR = −1.21 dB.

The 5 SNRs were ordered and averaged over 10 trials for each BSS method and dataset combination, and Table 3.1 reports the results, showing that our modified SOBI method outperformed regular SOBI and FastICA for all sources.

Table 3.1: Average SNR order statistics in dB (10 log10) of the 5 mixed-in simulated sources.

               | Thickness                           | Thickness Variance                   | Curvature Variance
Modified SOBI  | 17.1, 14.2, 11.7, 8.77, 7.10        | 7.95, 3.64, 2.31, 0.675, -0.543      | 7.65, 4.54, 2.65, 1.14, 0.193
Regular SOBI   | 3.48, 1.59, 0.332, -0.353, -0.923   | 0.417, -0.215, -0.594, -0.984, -1.24 | 0.344, -0.477, -0.827, -1.12, -1.30
FastICA        | -0.519, -0.798, -1.06, -1.27, -1.53 | 1.03, -0.589, -0.789, -0.988, -1.38  | 1.19, -0.756, -1.02, -1.14, -1.42
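A sketch of the pairing and scoring just described follows (hypothetical arrays; a greedy correlation-based pairing with a least-squares sign/scale fit is one reasonable way to implement the source matching, since BSS recovers sources only up to scaling and permutation):

import numpy as np

def snr_db(true_src, est_src):
    # Equation (3.15), reported in dB (10 log10).
    return 10 * np.log10(np.sum(true_src ** 2) / np.sum((true_src - est_src) ** 2))

def match_and_score(S_sim, S_hat):
    scores, used = [], set()
    for s in S_sim:
        # Pair this simulated source with its best-correlated unused estimate.
        corrs = [abs(np.corrcoef(s, r)[0, 1]) if i not in used else -1.0
                 for i, r in enumerate(S_hat)]
        i = int(np.argmax(corrs))
        used.add(i)
        r = S_hat[i]
        scale = (s @ r) / (r @ r)            # least-squares sign/scale fit
        scores.append(snr_db(s, scale * r))
    return sorted(scores, reverse=True)      # SNR order statistics

rng = np.random.default_rng(0)
S_sim = rng.standard_normal((5, 1000))
S_hat = np.flipud(-2.0 * S_sim) + 0.1 * rng.standard_normal((5, 1000))
print(np.round(match_and_score(S_sim, S_hat), 1))   # high SNRs: noise is small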
3.5.2 Experimental Data

We performed BSS on the experimental data without adding any simulated sources. Figures 3.4, 3.5, and 3.6 show the three sources deemed most significant (according to explained variance, i.e., the amount of variance a source contributes to the mixtures) for each BSS method on the thickness, thickness variance, and curvature variance datasets, respectively.

Figure 3.4: Cortical thickness BSS sources recovered from the experimental data by (a) modified SOBI, (b) regular SOBI, and (c) FastICA. The three most significant sources (according to explained variance) are shown after normalization by their standard deviations, listed with decreasing significance from left to right.

Because we have no ground truth for the experimental sources, we instead tested their stability using the jackknife procedure described in Section 3.4. We normalized each source by the estimated variances and approximated z-scores according to Equation 3.14. Sorting the estimated curvature variance sources in descending order of explained variance and plotting against the median z-score magnitude across a source (Figure 3.7) shows a clear trend in which the sources with high explained variance tend to also have high z-score magnitudes, particularly for the SOBI methods. For succinctness, results are not provided for the other datasets since they demonstrate the same behavior.

Figure 3.5: Cortical thickness variance BSS sources recovered from the experimental data. Caption similar to Figure 3.4.

Figures 3.8, 3.9, and 3.10 show the z-score magnitude maps of the three most significant sources for each BSS method for the thickness, thickness variance, and curvature variance datasets, respectively. The z-score maps for the SOBI methods show a number of significant z-score regions that are large enough to correspond to underlying cortical structures, indicating that the regular and modified SOBI sources may be stable. In contrast, the FastICA z-score maps either show no significant z-score regions or regions so small that they seem implausible and unlikely to correspond to any cortical structure.

Figure 3.6: Cortical curvature variance BSS sources recovered from the experimental data. Caption similar to Figure 3.4.

Figure 3.7: Median z-score magnitude versus sources in order of explained variance for the curvature variance metric and each BSS method ((a) modified SOBI, (b) regular SOBI, (c) FastICA).

Figure 3.8: Z-score maps of cortical thickness BSS sources recovered from the experimental data by (a) modified SOBI, (b) regular SOBI, and (c) FastICA. The three most significant sources (according to explained variance) are listed with decreasing significance from left to right.

3.6 Discussion

Comparing the jackknife z-score results (Figures 3.8, 3.9, and 3.10) with their corresponding sources (Figures 3.4, 3.5, and 3.6, respectively), it is evident that the stability of a particular region is strongly positively correlated with the magnitude of the corresponding source region. As such, a telltale sign of a particularly inaccurate and unstable source would be the presence of areas of significant source activity with low corresponding z-scores, as is particularly prevalent for the FastICA method. A comparison between the two SOBI methods is less obvious, because their z-scores tend to match their sources, such that any z-score difference may represent differences in the methods' inferred sources rather than in their stabilities.

The SOBI methods take the structure of the mixture data into account through their diagonalization of the shifted covariance matrices and are able to separate Gaussian sources, unlike ICA methods using higher-order statistics, such as FastICA. This proved a significant advantage in separating structural data on the cortical manifold, where the data is smoother and more Gaussian than the functional EEG and fMRI data to which FastICA is often applied.
In particular, the SOBI sources are smoother and have larger significant regions than the FastICA sources, whose spatial variability is too high to reflect true underlying cortical structure. Furthermore, the cortical thickness and curvature variance datasets have nearly Gaussian distributions (kurtosis close to 3), indicating that the true sources are probably also approximately Gaussian, which may explain some of FastICA's particularly poor performance. This is also supported by the near-Gaussian distributions of many of the sources identified by the two SOBI methods.

The consistent performance of the regular SOBI method may be surprising when one considers that it computes its shifted covariance matrices by linearly shifting a vector of cortical data irrespective of the cortical manifold. However, when FreeSurfer extracts the cortex from the MRI data, it tends to automatically group neighboring cortical vertices together in the resulting storage vector. This incidental structure appears to benefit the regular SOBI method, allowing for consistent estimation of sources. However, regular SOBI still underperforms the modified SOBI method.

Figure 3.9: Z-score maps of cortical thickness variance BSS sources recovered from the experimental data. Caption similar to Figure 3.8.

We have considered a number of other approaches for extending the notion of the one-dimensional temporal shift to the cortical manifold. Even more basic than using a rotation, we could shift the same distance along different random tangential directions at each vertex on the inflated spherical version of the cortex. In practice we found that although this method performed better than regular SOBI, it performed worse than our current rotation scheme, because the cortical data is slightly anisotropic and shifting all vertices in a consistent direction, e.g. by rotating, improved performance. An alternative approach would be to compute (or approximate) a Killing vector field (Jost, 2011) on the cortical manifold itself, instead of the sphere, for each shift, to ensure that the distances between the vertices are preserved as well as possible. We did not pursue such an approach because of its complexity and computational cost.

Figure 3.10: Z-score maps of cortical curvature variance BSS sources recovered from the experimental data. Caption similar to Figure 3.8.

The results of applying the BSS methods on the three real structural datasets show interesting patterns or modes of variation in the data that may relate to anatomical connectivity (neural fiber connections) or developmental processes. In particular, because the data measurements are linear combinations of the sources, an estimated source with different regions of substantially non-zero values may indicate statistical dependence between these regions or a mode of variation.

3.7 Summary

We have proposed a modified version of SOBI tailored for application to data on the cerebral cortex by extending the estimation of one-dimensional shifted covariance matrices to the two-dimensional cortical manifold. This was done by inflating a cortical hemisphere to a sphere and using rotations instead of the vector shifts of the original SOBI method. Our modified SOBI method was shown to outperform FastICA in terms of stability, source estimation in simulation, and plausibility of the sources from experimental data.
Regular SOBI performed closer to our modified SOBI; however, results were different on both synthetic and experimental data, with simulations clearly indicating that modified SOBI achieved better source recovery. We also proposed a new way to study cortical brain networks in terms of structural relationships of thickness, thickness variance, and curvature variance. Separation of this data with the modified SOBI method revealed underlying modes of variation in cortical thickness, thickness variance, and gyrification.

Chapter 4
Robust Inference of Sparse Partial Correlation Matrices with FDR Control

Correlation and partial correlation have been widely used in neuroimaging studies to analyze functional and structural brain connectivity (Biswal et al., 1995; Marrelec et al., 2006; Hagmann et al., 2008). Correlation measures linear dependence but unfortunately cannot differentiate between direct dependence and indirect dependence due to shared dependence on a third factor. Partial correlation is a variation on traditional correlation that overcomes these limitations. However, computing partial correlations can be numerically unstable and can require regularization. To address this, we propose the PCU algorithm for partial correlation connectivity analysis.

Partial correlation differs from regular correlation by isolating the component of the correlation between two variables that cannot be explained through correlation with other variables (Whittaker, 1990). Partial correlation therefore naturally admits a graphical network representation in which the vertices or nodes are the variables, and edges exist only between pairs of variables for which the partial correlation controlled for all other variables is nonzero. Thus such a network only contains connections between variables for which some of the structural or functional similarity cannot be explained through indirect, linear interactions with other variables. The partial correlations for such networks are frequently estimated from the concentration matrix (the inverse of the covariance matrix; Whittaker, 1990), but this method is generally suboptimal for brain networks.

In the analysis of a brain network, the variables of interest are often quantitative measurements that are computed region-by-region across a parcellated brain. Joint analysis of these regional measurements can be used to infer relationships between different brain regions of interest (ROIs). However, the number of ROIs in a brain network is often such that the complexity of the network greatly exceeds the amount of measured data, making inference of the network by standard methods like the concentration matrix method very difficult.

Noting that brain networks are often sparse (He et al., 2007; Achard et al., 2006b), researchers began applying sparse models to the data (Valdés-Sosa et al., 2005; Lee et al., 2011). Often, this is done by applying $\ell_1$ regularization to an estimation problem, because $\ell_1$ regularization enforces solution sparsity and has the added benefit over other sparse regularizers of maintaining the convexity of the estimation problem (Lee et al., 2011; Meinshausen and Bühlmann, 2006; Friedman et al., 2008). The graphical lasso (glasso; Friedman et al., 2008) is a particularly popular example of such a sparse model that solves the $\ell_1$-regularized maximum likelihood concentration matrix estimation problem for data following the multivariate normal distribution.
However, such sparse regularizing approaches address the problem of sparse network inference from an estimation perspective that does not necessarily focus on accurate inference of the binary network structure, i.e. its skeleton, or allow for direct control of its edge-wise error rate. Since the network skeleton serves as the basis of heavily investigated aggregate network properties, such as small-worldness, centrality, betweenness, and modularity (Stam et al., 2007; Supekar et al., 2008; He et al., 2007; Achard et al., 2006a), there are a number of applications where a detection-theoretic approach might be preferred.

We propose an undirected version of the PC algorithm (Spirtes et al., 2000), PCU, that leverages network sparsity while maintaining a detection-theoretic perspective that focuses on accurate inference of the skeleton through control of the false discovery rate (FDR; Li and Wang, 2009). PCU improves statistical power and reduces sample size requirements compared to the concentration matrix method, improves speed by approximately a factor of 2 compared to the original PC algorithm, and has comparable performance to the glasso method, but with the added advantage of asymptotically controlling the FDR. To do this, PCU uses graphical modeling theory to identify smaller, yet mathematically equivalent, control sets for its partial correlation calculations compared to the concentration method.

This chapter is organized as follows. Section 4.1 provides a brief overview of relevant background material, followed by a description of our proposed PCU algorithm. Section 4.2 describes our evaluation methods, presents a comparison of PCU, glasso, and the concentration method with both simulated data and MRI-derived brain cortical thickness data, and evaluates PCU's stability across datasets and MRI modalities. Our conclusions are presented in Section 4.3.

4.1 Theory and Algorithms

4.1.1 Correlation and Partial Correlation

Structural and functional measures in the brain are often highly spatially correlated. Correlations between two regions can reflect a direct connection or interaction between them. Alternatively, such correlations may reflect indirect interactions, where the covariation can be driven by a third, mutually interacting region. The concept of partial correlation helps to resolve these two situations by computing only the part of the correlation between two regions that cannot be explained by covariation with other regions.

Partial correlation is increasingly being used in place of correlation for analysis of brain data and can be computed through linear regression. Let X and Y be two random variables in $\mathbb{R}$, Z a set of control random variables that are all also in $\mathbb{R}$, and $\hat{X}_Z$ and $\hat{Y}_Z$ the estimates of X and Y from linear regression on the variables in Z, respectively. The partial correlation coefficient between X and Y given Z is the correlation between the residuals $R_X = X - \hat{X}_Z$ and $R_Y = Y - \hat{Y}_Z$: $\rho_{XY|Z} = \rho_{R_X R_Y}$ (Kendall, 1945). In addition to removing indirect correlations, partial correlation can also reveal hidden correlations masked by the effects of the control set. Here we will make heavy use of the property that the partial correlation coefficient between any two variables is zero if and only if they are conditionally independent given the control set, assuming all variables follow a multivariate normal (MVN) distribution (Whittaker, 1990).
In practice, $\hat{X}_Z$, $\hat{Y}_Z$, and the correlation between them, $\rho_{XY|Z}$, are not known a priori and must be estimated from observations of the random variables X, Y, and those in Z. However, assuming the observation vectors ([X, Y, Z]) are linearly independent, as in any properly posed problem, estimation requires that the number of observations, N, be greater than the number of conditioning variables plus 2 or 3: $N \geq |Z| + 2 + 1$, where $|\cdot|$ is the set cardinality operator and the +1 is only present if the mean is not known a priori and must also be estimated. Satisfying this condition can be difficult when the number of ROIs is large and the population size is small. In cases where the condition is not satisfied, although it is still possible to estimate partial correlations, it would not be possible to control for all other variables in a meaningful way. This is because if $|Z|$ is not reduced to satisfy the previous inequality, the residuals $R_X$ and $R_Y$ would be zero vectors and the partial correlations undefined. PCU addresses this problem by identifying smaller but probabilistically equivalent control sets (i.e., $\rho_{XY|Z} = \rho_{XY|Z'} = 0$ where $Z' \subset Z$) that serve as the basis of its improved performance.

Often, instead of estimating partial correlations with regression, they are calculated from an estimate of the concentration matrix (the inverse of the covariance matrix of the vector [X, Y, Z]; see Section 4.1.3; Whittaker, 1990). When the concentration matrix is estimated by inverting the sample covariance matrix, as is common because of its simplicity, the previous constraint on the number of samples also applies; otherwise the sample covariance matrix is not invertible.

A major benefit of estimating partial correlations with the regression or concentration matrix method is that they can easily be used to draw statistical insights, such as confidence intervals and p-values (Casella and Berger, 2001), in cases of either large sample sizes or normally distributed data. In this chapter we generate two-tailed p-values under a null hypothesis of zero partial correlation by transforming the partial correlation coefficients $\rho$ to Student's t-scores as described in more detail in Section 2.3:

$$t = \rho \sqrt{\frac{f}{1 - \rho^2}} \tag{4.1}$$

where $f = N - 2 - |Z|$, and $\rho$, N, and Z are again the partial correlation, sample size, and control set, respectively (Kendall, 1945). As a result, controlling the type I error rate at some target rate $\alpha$ is simply a matter of rejecting connections with p-values above $\alpha$. The accepted connections can be combined and represented by a graph with edges between pairs of nodes for which the partial correlation is determined to be nonzero.
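Before formalizing graphs, the residual-based test just described can be made concrete with a minimal sketch (illustrative variable names; the degrees of freedom follow Equation (4.1), and an intercept is included so the means need not be known):

import numpy as np
from scipy import stats

def partial_corr_pvalue(x, y, Z):
    # x, y: length-N vectors; Z: N x |Z| matrix of controls (may be empty).
    N = len(x)
    D = np.column_stack([np.ones(N), Z])                 # regression with intercept
    r_x = x - D @ np.linalg.lstsq(D, x, rcond=None)[0]   # residual R_X
    r_y = y - D @ np.linalg.lstsq(D, y, rcond=None)[0]   # residual R_Y
    rho = np.corrcoef(r_x, r_y)[0, 1]
    f = N - 2 - Z.shape[1]                               # per Equation (4.1); one further
    # degree of freedom may be subtracted when the mean is also estimated.
    t = rho * np.sqrt(f / (1 - rho ** 2))
    return rho, 2 * stats.t.sf(abs(t), df=f)             # two-tailed p-value

rng = np.random.default_rng(0)
w = rng.standard_normal(500)                 # shared driver of x and y
x = w + 0.5 * rng.standard_normal(500)
y = w + 0.5 * rng.standard_normal(500)
print(partial_corr_pvalue(x, y, np.empty((500, 0))))   # strong marginal correlation
print(partial_corr_pvalue(x, y, w[:, None]))           # ~0 once w is controlled for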
4.1.2 Undirected Graphical Models

A graphical model combines a multivariate distribution with a graphical representation that encodes the conditional independence structure between the random variables characterized by the distribution (Whittaker, 1990; Edwards, 2000). In particular, the random variables are depicted as nodes or vertices in a graph, and their conditional independence structure is encoded through their edges, where the lack of an edge between two nodes denotes conditional independence (more on this below). In this way, graphs are useful for visualizing the conditional independence structure between a collection of random variables and the processes they represent. Additionally, this framework serves as a good basis for many properties, theorems, and methods, such as the PC algorithm (Spirtes et al., 2000; Koller and Friedman, 2009).

An undirected graph, G = (V, E), is represented by a set of vertices, $V = \{V_1, V_2, \dots, V_P\}$, and a set of edges E, where $\{V_i, V_j\} \in E \iff V_i$ and $V_j$ are adjacent or connected in G, denoted $V_i \sim_G V_j$. To define a graphical model for G, it is necessary to associate a set of conditional independence statements with V. These conditional independence statements are often supplied by a distribution or the class of distributions with identical conditional independences. For example, define the P-length random vector $X = (X_1, X_2, \dots, X_P)$ following some distribution $P_X$ such that $V_i$ is the graphical counterpart to random variable $X_i$ for $i \in \{1, 2, \dots, P\}$. The correspondence between $X_i$ and $V_i$ allows the edges of the graph to convey a subset of the conditional independence relationships implied by $P_X$ according to the following properties:

Global Markov Property - Let $V_A$, $V_B$, and $V_S$ be disjoint subsets of V. If $V_S$ is a separator of $V_A$ and $V_B$ in G (all paths between any vertex in $V_A$ and any vertex in $V_B$ pass through at least one vertex in $V_S$), then the random variables in set A are conditionally independent from those in set B given S (Lauritzen, 1996). That is, $V_A \perp_G V_B \mid V_S \implies X_A \perp\!\!\!\perp_{P_X} X_B \mid X_S$, where $\perp_G$ and $\perp\!\!\!\perp_{P_X}$ denote separation in G and conditional independence with respect to $P_X$, respectively.

Graphs having the global Markov property are called independency maps or I-maps (Pearl, 1988). While every distribution has a graphical representation following the global Markov property, some distributions have more insightful representations that convey more conditional independence information than others. For example, a completely connected graph, where every vertex has an edge to every other vertex, is a valid graphical representation of all distributions, but it conveys no conditional independence information. For some distributions this is the best one can do; for others, a graph could convey every valid conditional independence statement. Indeed, there is a term for this:

Faithfulness - If the converse of the global Markov property is also true, such that all conditional independence relationships are represented by the graph ($V_A \perp_G V_B \mid V_S \iff X_A \perp\!\!\!\perp_{P_X} X_B \mid X_S$), then $P_X$ and G are said to be faithful to one another, and the unique G is termed a perfect map of $P_X$ (Spirtes et al., 2000). Because faithfulness ensures that all conditional independence relationships are representable by a graph, it greatly increases the utility of a graphical model.

In this chapter we assume the data follows an MVN distribution, i.e. $P_X = \mathcal{N}_P(\mu, \Sigma)$, resulting in a specific, well-studied type of graphical model known as a graphical Gaussian model (GGM) or covariance selection model (Whittaker, 1990). For such models, the set of parameterizations that are not faithful in the space of possible parameterizations for a particular I-map has Lebesgue measure zero (Meek, 1995; Koller and Friedman, 2009), or equivalently, the probability of drawing an unfaithful parameterization from the space of all MVN distributions for a particular I-map is zero. From this, we assume for the rest of the chapter that for each distribution a perfect map exists.

When a perfect map does exist for a distribution, it can easily be generated by starting with a completely connected graph and removing those edges for which there is any control set that makes the corresponding random variables conditionally independent.
That is, let G = (V, E) be a completely connected graph, and for all $\{V_i, V_j\} \in E$, if there exists an $X_S \subseteq X \setminus \{X_i, X_j\}$ such that $X_i \perp\!\!\!\perp_{P_X} X_j \mid X_S$, then remove $\{V_i, V_j\}$ from E. The PCU algorithm is a variant of this technique that takes into account the current state of its graph estimate $\hat{G}$ as it progresses, optimizing the selection of the $X_S$'s such that not all sets need be checked and those that are checked tend to be of smaller size.

The perfect map could be generated more directly by checking only the largest possible control set for each edge. That is, for $\{V_i, V_j\}$, only check whether $X_i \perp\!\!\!\perp_{P_X} X_j \mid X \setminus \{X_i, X_j\}$, because $X_i \perp\!\!\!\perp_{P_X} X_j \mid X_S \implies X_i \perp\!\!\!\perp_{P_X} X_j \mid X \setminus \{X_i, X_j\}$ by the global Markov property. The concentration and glasso methods generate their perfect maps in this way. However, as will be discussed and demonstrated later through the poor performance of the concentration matrix method, conditioning on larger-than-necessary control sets hurts the statistical power of the partial correlation hypothesis tests.

4.1.3 Connectivity Methods

Finding the graph G that is faithful to an MVN distribution $P_X$ from data drawn from $P_X$ is a matter of inferring the unique (a side-effect of faithfulness) set of edges E that encodes all conditional independencies of $P_X$ with the global Markov property. Making use of the fact that, for the MVN distribution, zero partial correlation between two random variables is necessary and sufficient for their conditional independence (Whittaker, 1990), the edges are in turn given by the non-zero partial correlations between each pair of random variables controlled for all others.

In this section we briefly describe the three different algorithms that we will be comparing for the inference of undirected graphical models. Starting from the same construct as in the previous section, each method takes as input an N × P data matrix composed of N observations drawn from the P-dimensional $P_X$ and estimates the underlying adjacency matrix, A, where $A_{ij}$ denotes element (i, j) of A and $A_{ij} = 1 \iff \{V_i, V_j\} \in E$. Thus, there is an immediate and direct correspondence between an adjacency matrix and the graph it represents.

Concentration Matrix Thresholding

The easiest and most common approach is to compute partial correlations directly from the concentration matrix, which can be estimated by taking the inverse of the sample covariance matrix (Whittaker, 1990):

$$\rho_{X_i X_j \mid X \setminus \{X_i, X_j\}} = \frac{-k_{ij}}{\sqrt{k_{ii} k_{jj}}} \tag{4.2}$$

where $k_{ij}$ denotes element (i, j) of the concentration matrix, K. Thus, for MVN distributions, $k_{ij} = 0 \iff \rho_{X_i X_j \mid X \setminus \{X_i, X_j\}} = 0 \iff X_i \perp\!\!\!\perp_{P_X} X_j \mid X \setminus \{X_i, X_j\}$, and element (i, j) of the output adjacency matrix is zero when $k_{ij}$ is zero and one otherwise. As discussed at the end of Section 4.1.2, the resulting graph is a perfect map, ensuring that it represents all conditional independence relationships.

In practice, the true concentration matrix is unknown and must be estimated. Because it is extremely unlikely that an estimated partial correlation would be identically zero even when the true partial correlation is actually zero, a hypothesis test is typically performed to control some measure of statistical error. Zero and non-zero partial correlations can be differentiated, with the type I error rate controlled at some specified level, by transforming the partial correlation matrix to p-values and thresholding as described
The problem with this approach is that often and especially for sparse graphs, conditioning on all other variables is unnecessary and statistically inefficient at testing for non-zero partial correlations, and it is this problem that PCU addresses. PCU Algorithm The PCU algorithm presented in this chapter is a variation of part of the PC algorithm (named after its authors’ first initials; Spirtes et al., 2000). In particular, the original PC algorithm was designed for directed graphs and controlled the type I error rate while PCU is customized for undirected graphs to improve speed and includes a modification by Li and Wang (2009) to control of the false discovery rate. However, to facilitate the following description of PCU we start with the general idea and mechanics behind the algorithm, which are the same as those from the original PC algorithm, and assume that all conditional independencies are known a priori such that there need not be any concern for controlling the error rate. Of course, in practice the conditional indepen- dencies are not known a priori and are inferred by hypothesis testing (Section 4.1.1). Also, for simplicity we will often use graph vertices and their corresponding random variables interchangeably. For example, we mention testing for independence between graph vertices when we more precisely mean testing for independence between the ran- dom variables corresponding to graph vertices. The PCU algorithm assumes P X is faithful to some undirected, target graph G = (V;E) and starts with a completely connected graph ^ G = (V; ^ E). It then prunes edges from ^ G to reachG by testing for independence between adjacent vertices in ^ G condi- tioned on sets that seek to separate the vertices inG (see the global Markov property in Section 4.1.2). From the faithfulness property, the separation of two vertices by a control set is equivalent to zero partial correlation conditioned on that set. Therefore, 57 when zero partial correlation occurs for any conditioning set, that set must separate the corresponding vertices inG, implying that the corresponding edge cannot be inE and therefore the PCU algorithm removes it from ^ E. The conditioning set size starts at zero and is increased until it is guaranteed that every false edge ({e Se ∈ ^ E;e ∉E}) has been conditioned on a separator and in turn re- moved thanks to careful selection of the conditioning sets. In particular, the conditioning sets of sizem for testing the edge between verticesV a andV b is every possible subset of adj(V a ; ^ G)−V b of sizem where adj(V k ;G) = {V l S V l ∼ G V k } (the set of immediate neighbors of V a inG). Thus, as m increases, if {V a ;V b } ∉ E, one of the conditioning sets is guaranteed to result in conditional independence and separate V a and V b in G. Figure 4.1 steps through the application of PCU to a small multivariate distribution for which a perfect map exists and the conditional independencies are known a priori. We again stress that the assumption of a priori knowledge of the conditional independences is only made in relation to this paragraph and is not assumed by the PCU algorithm in practice. The advantage of PCU’s conditioning scheme compared to the concentration matrix method relates to the estimation of the partial correlations and is two-fold. 1.) 
1) It is generally more accurate to condition on smaller, separating subsets when dealing with realizations of random variables, because extra conditioning variables increase the variance of the estimated partial correlations (Spirtes et al., 2000). Intuitively, this is because as the conditioning set size increases, the complexity of the underlying distribution (i.e., the number of possible parameter combinations) grows exponentially, making the determination of independence much harder for a constant number of samples. 2) Whereas the concentration method requires at least P samples for mean-zero data, the PCU algorithm requires at most $\max_{\{V_a, V_b\} \in E} \max\{|\mathrm{adj}(V_a, G)|,\, |\mathrm{adj}(V_b, G)|\}$ plus 2 (see Section 4.1.1), which can be substantially smaller than P for sparse graphs.

Figure 4.1: Example application of PCU to a small, faithful multivariate distribution whose conditional independencies are known a priori.

The disadvantage of the conditioning scheme is that it decreases the speed of the algorithm by requiring many additional hypothesis tests compared to the concentration method. Although the worst-case run time is exponential in P, due to the combinatorial problem of checking all possible conditioning subsets, in practice, for sparse graphs, the runtime is polynomial.

Whereas the original PC algorithm was designed for directed acyclic graphs, our proposed PCU algorithm has been customized for undirected graphs to improve speed. Both algorithms test a series of control sets that ensure that any conditionally independent random variables will be conditioned on a control set for which they are independent. For any two vertices $V_a$ and $V_b$, PCU does this for undirected graphs by eventually (as m increases) conditioning on either $\mathrm{adj}(V_a, \hat{G}) - V_b$ or $\mathrm{adj}(V_b, \hat{G}) - V_a$. It does not matter which, because either will separate $V_a$ from $V_b$ and result in conditional independence according to the global Markov property if they are not adjacent in the true graph G. To take advantage of this, PCU selects the smaller of the two adjacency sets, except when one adjacency set is larger than m and the other is not, in which case the larger set is selected to ensure that the removal of other edges has not misleadingly made an edge appear to have finished testing. By contrast, to ensure separation in a directed acyclic graph, the original PC algorithm must condition on both $\mathrm{adj}(V_a, \hat{G}) - V_b$ and $\mathrm{adj}(V_b, \hat{G}) - V_a$, roughly doubling the number of conditioning sets that need to be checked with a hypothesis test. It is interesting to note that the PCU and PC conditioning schemes are equivalent with respect to distributions that are faithful to undirected graphs, because in such cases the PC algorithm conditions on a superset of the already sufficient control sets of PCU. However, this equivalence does not hold for all distributions faithful to directed acyclic graphs.

The original PC algorithm controls the type I error rate, while PCU has been modified following the work of Li and Wang (2009) to control the false discovery rate (FDR). Here the FDR is the expected value of the ratio of the number of incorrectly inferred edges to the total number of inferred edges in $\hat{G}$, which has the benefit of being a measure of the overall error rate for $\hat{G}$. Meanwhile, the type I error rate is only a measure of the error rate for any single hypothesis test used by PCU to infer G and as a result gives much less sense of how reliable $\hat{G}$ is as a whole.
Li and Wang show that the PC algorithm can be modified to control the FDR in the limit of large sample sizes, under the assumption of the distribution's faithfulness to some undirected graph. The modification replaces the original type-I-controlled edge hypothesis test with the Benjamini-Hochberg (BH) procedure for controlling the FDR for independent or positively correlated hypothesis tests (Benjamini and Hochberg, 1995; Section 2.3).

Li and Wang also propose a second, slightly modified heuristic version of the former that seems to control the FDR more accurately around the target level, yielding greater statistical power. They show that this heuristic version is also capable of detecting all true edges in the large-sample limit, but they do not prove its ability to asymptotically control the FDR. However, in their simulations the heuristic version accurately controls the FDR with 500 samples for a range of networks with varying edge strengths, numbers of nodes, and average node degrees.

We chose to have PCU incorporate this heuristic modification to the PC algorithm, rather than the non-heuristic modification, for its greater statistical power. Because of the similarity between PCU and the PC algorithm, Li and Wang's heuristic FDR convergence results are likely quite predictive of PCU's ability to control the FDR. In particular, the PCU and PC conditioning schemes are equivalent with respect to distributions that are faithful to undirected graphs. Also, the basis for Li and Wang's heuristic approximation still holds: PCU's edge p-values are still likely upper bounds of the true p-values, for the same original reason that the p-value associated with an edge is the maximum p-value over all tested control sets for that edge (see Pseudocode 1). However, as with the heuristic modification to the PC algorithm, asymptotic control of the FDR is not mathematically guaranteed for PCU. Code for PCU is given in Pseudocode 1.

Pseudocode 1 PCU Algorithm
Input: the target false discovery rate, q, and an N × P data matrix, X, consisting of N observations ∼ N_P(μ, Σ) faithful to an undirected graph G = (V, E)
Output: the estimate of G, Ĝ = (V, Ê)
1: Ĝ = (V, Ê) ← complete graph of P vertices
2: m ← 0
3: repeat
4:   for each edge e = {V_i, V_j} ∈ Ê do
5:     Identify the smaller of the two adjacency sets (adj(V_i, Ĝ) or adj(V_j, Ĝ)), unless the smaller adjacency set has fewer than m members, in which case identify the larger of the two
6:     neighbors ← the identified adjacency set
7:     for each possible control set S ⊆ neighbors − {V_i, V_j} such that |S| = m do
8:       Calculate ρ_{X_i X_j | S} from X and transform it to a t-score to get the p-value p_{ij|S}
9:       if p_{ij|S} is the largest p-value calculated yet for edge e = {V_i, V_j} and all edges in the graph have been tested at least once then
10:        Perform the Benjamini-Hochberg procedure for controlling the FDR at q on the set of each remaining edge's largest p-value
11:        Remove edges whose maximum p-value fails the Benjamini-Hochberg procedure from Ê
12:        if the current edge e was removed then
13:          Exit the inner for loop and return to the outer for loop to select a new edge for testing
14:        end if
15:      end if
16:    end for
17:  end for
18:  m ← m + 1
19: until |adj(V_i, Ĝ)| − 1 < m for all V_i ∈ V
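For concreteness, a minimal sketch of the Benjamini-Hochberg step invoked in lines 10-11 of Pseudocode 1, applied to each remaining edge's maximum p-value (in PCU this is re-run as new maxima accumulate):

import numpy as np

def benjamini_hochberg(pvals, q):
    # Returns a boolean mask of the hypotheses (edges) that survive at FDR q.
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    # Find the largest k with p_(k) <= (k/m) q; ranks 1..k are retained.
    below = p[order] <= q * np.arange(1, m + 1) / m
    keep = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])
        keep[order[:k + 1]] = True
    return keep

print(benjamini_hochberg([0.001, 0.30, 0.004, 0.049], q=0.05))
# [ True False  True False]: the edges with p = 0.001 and 0.004 are kept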
Graphical Lasso

The graphical lasso (glasso; Friedman et al., 2008) is similar to the concentration method in that both are based on maximum likelihood estimation of the concentration matrix. In the concentration method's case this is done implicitly, because the inverse of the sample covariance matrix S is the maximum likelihood solution for the concentration matrix K for MVN distributions with nonsingular covariance matrices. In glasso's case this is done explicitly, through numerical maximization of the $\ell_1$-penalized log likelihood

$$\ln(\det(K)) - \mathrm{tr}(SK) - \lambda \|K\|_1 \tag{4.3}$$

over non-negative definite concentration matrices K, where tr denotes the trace of a matrix, $\lambda$ is a tuning parameter, and $\|\cdot\|_1$ denotes the $\ell_1$ norm (the sum of the absolute values of the elements). The $\ell_1$ norm has the nice property of enforcing sparsity on the solution while maintaining the convexity of the original, unregularized log likelihood problem, allowing (4.3) to be solved much more easily than with other sparse, non-convex regularizers (Banerjee et al., 2008).

The graphical lasso's output adjacency matrix can be calculated from its concentration matrix estimate $\hat{K}$ that maximizes (4.3). As with the concentration method, the non-zeros of $\hat{K}$ are equivalent to edges in the inferred graph and thus also equivalent to ones in the corresponding adjacency matrix. For $\lambda > 0$, the $\ell_1$ regularizer allows for the estimation of K based on fewer samples than would be possible using the inverse of the sample covariance as the estimate. Furthermore, our results suggest that even when the concentration method is applicable, glasso generally outperforms it when the sample size is sufficiently small and K is sparse.

The downside to such an approach is that while $\hat{K}$ is good in the sense that it maximizes (4.3), there is little guidance on how to select $\lambda$ to control for all possible false edges. At the time of writing, the best guideline we are aware of is given by Banerjee et al. (2008), who give an expression for selecting $\lambda$ to control the probability of connecting any two vertices in the estimated graph for which there is no path (chain of edges) in the true graph that connects them. However, such an error rate is less useful than the FDR in brain networks, because there is often a path between the vast majority of vertices.

For this dissertation we used default parameters with the graphical lasso implementation provided by its authors and the Matlab interface by Hossein Karshenas, both available at http://www-stat.stanford.edu/~tibs/glasso/index.html. Often this code would return concentration matrix values less than 1e-3, which we treated as zeros. Also, to make the resulting concentration matrix scale invariant and the regularization parameter apply a more consistent penalty to each partial correlation of $\hat{K}$, we scaled the input data to have unit sample variance.
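As an illustrative stand-in for that tool chain (a sketch, not the dissertation's pipeline), scikit-learn's GraphicalLasso performs the same estimation; alpha plays the role of $\lambda$ in (4.3), and the 1e-3 threshold mirrors the zero treatment described above:

import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
# Sparse ground-truth concentration matrix for a 5-node chain graph.
K = np.eye(5) + np.diag(0.4 * np.ones(4), 1) + np.diag(0.4 * np.ones(4), -1)
X = rng.multivariate_normal(np.zeros(5), np.linalg.inv(K), size=200)

K_hat = GraphicalLasso(alpha=0.1).fit(X).precision_

# Read the inferred adjacency matrix off the non-zero pattern of K_hat.
A_hat = (np.abs(K_hat) > 1e-3).astype(int)
np.fill_diagonal(A_hat, 0)
print(A_hat)   # ideally recovers the chain's off-diagonal band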
4.2 Method Comparisons

We compare PCU to the concentration and glasso methods in terms of stability on real cortical gray matter thickness data and in terms of accuracy on simulated data. The simulated data are generated from a number of network structures: 3 of which are based on the networks inferred from the thickness data (1 per method) and 2 of which were selected as variants of a simple baseline network. The section concludes with visualizations of the stable thickness networks on the cortical surface.

4.2.1 Method Comparison Methodology

Although accurate inference of network skeletons is of primary importance in brain network studies, in addition to measuring method performance in terms of network skeleton accuracy, we sought to measure performance in terms of the less-binary accuracy of the concentration matrix. However, PCU does not output an estimated concentration matrix $\hat{K}$, and although the other two methods do, for consistency we needed a way of taking any method's inferred graph $\hat{G} = (V, \hat{E})$ and estimating from the input data a concentration matrix $\hat{K}'$ that is constrained to be zero at locations corresponding to missing edges in $\hat{G}$ ($\{V_i, V_j\} \notin \hat{E} \Rightarrow k'_{ij} = 0$). Furthermore, even for the concentration and glasso methods, which do output an estimate $\hat{K}$, simply setting the constrained elements of $\hat{K}$ to zero does not necessarily lead to a good estimate $\hat{K}'$.

To estimate $\hat{K}'$ we maximized the input dataset's likelihood (assuming independent samples of the MVN distribution) over symmetric, non-negative definite concentration matrices constrained to zero at locations $(i, j)$ where $\{V_i, V_j\} \notin \hat{E}$. This optimization problem is identical to glasso (Equation 4.3), except with the added zero constraints and without the regularizing term ($\lambda = 0$), and we solve it using YALMIP (Löfberg, 2004) and the SDPT3 solver (Toh et al., 1999) with default parameters.

We measured network skeleton accuracy by calculating the Dice coefficient between $E$ and $\hat{E}$ (Dice, 1945):

$\mathrm{dice}(E, \hat{E}) = \dfrac{2|E \cap \hat{E}|}{|E| + |\hat{E}|}$

The Dice coefficient ranges between 0 when the two sets have no edges in common and 1 when $E = \hat{E}$. We also measured concentration matrix accuracy by calculating the normalized inner product between $K$ and $\hat{K}'$ after setting their diagonal elements to 0:

$\text{normalized inner product}(K, \hat{K}') = \dfrac{\operatorname{tr}(K^T \hat{K}')}{\sqrt{\operatorname{tr}(K^T K)\,\operatorname{tr}(\hat{K}'^T \hat{K}')}}.$

Let $K$ and $\hat{K}'$ be $P \times P$ matrices; then their normalized inner product is the cosine of the angle between $K$ and $\hat{K}'$ if they were reshaped into $P^2 \times 1$ vectors. Thus, the normalized inner product ranges from -1 when one matrix is a negative multiple of the other, to 0 when they are orthogonal, to 1 when one matrix is a positive multiple of the other. Whereas the Dice coefficient gives a sense of the binary accuracy, the inner product gives a sense of the accuracy from a continuous perspective.
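Both performance measures are simple to compute; a minimal numpy sketch (function names are ours) is:

```python
import numpy as np

def dice(A, A_hat):
    """Dice coefficient between two symmetric 0/1 adjacency matrices."""
    iu = np.triu_indices_from(A, k=1)           # count each edge once
    e, e_hat = A[iu], A_hat[iu]
    return 2.0 * np.sum(e * e_hat) / (np.sum(e) + np.sum(e_hat))

def normalized_inner_product(K, K_hat):
    """Cosine of the angle between two concentration matrices
    after zeroing their diagonals."""
    K = K - np.diag(np.diag(K))
    K_hat = K_hat - np.diag(np.diag(K_hat))
    num = np.trace(K.T @ K_hat)
    return num / np.sqrt(np.trace(K.T @ K) * np.trace(K_hat.T @ K_hat))
```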
Before fair comparisons between the methods can be made, the issue of how to compare methods depending on very different parameters needs to be addressed. For example, if the target FDR of PCU is set to 5%, an equivalent value for glasso's regularization parameter $\lambda$ is unclear because $\lambda$ is unrelated to the FDR. However, all methods are capable of controlling the number of edges in their inferred graphs, so for PCU and the concentration method we directly tuned each method's parameter such that all compared graphs share the same number of target edges, $d_{\mathrm{tar}}$. The glasso method was also set to find $d_{\mathrm{tar}}$ edges, but tuned differently using a cross-validation approach (explained further down) because direct manipulation of $\lambda$ to achieve $d_{\mathrm{tar}}$ edges performed particularly poorly for all but the lowest tested sample sizes.

For simulations $d_{\mathrm{tar}}$ was chosen to be the true number of edges, but for real data, because the ground truth is unknown, $d_{\mathrm{tar}}$ was set by the number of edges found by PCU for typical values of the FDR control $q$. PCU was chosen for this role over the other methods for its ability to control the FDR and to save computation time by precluding the need to tune $q$. PCU and glasso are comparatively slow to tune because they require multiple runs to test different parameters, while the concentration method, which controls the type I error rate, can simply select the $d_{\mathrm{tar}}$ largest partial correlations.

Tuning PCU was made easier by the fact that its number of inferred edges is roughly monotonic with respect to $q$ and because comparisons were made over a range of sample sizes: the proper $q$ for a previously calculated but similar number of samples was generally available to serve as a guideline. To take advantage of this and find a $q$ that resulted in $d_{\mathrm{tar}}$ edges, we performed a grid search around the closest previous $q$ and then followed up with a binary search.

Because tuning solely to have glasso achieve $d_{\mathrm{tar}}$ edges resulted in particularly poor simulation accuracy, we instead selected $\lambda$ by optimizing it over a 10-fold cross-validated likelihood measure of expected performance. The cross-validation allowed us to assess how well glasso should perform on average for a particular $\lambda$, and we describe its calculation next for a particular value, $\lambda_0$. For each fold, the input dataset is split into training and test datasets, glasso is performed on the training dataset with $\lambda = \lambda_0$, and $\hat{K}'$ is estimated from the resulting adjacency matrix. $\hat{K}'$ and the test dataset are then plugged into the unconstrained and unregularized MVN likelihood function, which serves as the measure of expected performance by measuring how well $\hat{K}'$ fits the test dataset. This ends the fold, but the procedure is repeated for each of the 10 folds, with each fold specifying a different splitting of the dataset into training and test datasets. The final overall measure of expected performance is the average of these likelihoods over the folds.
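A bare-bones sketch of such a cross-validated likelihood score is given below, with fit_K standing in for the glasso-plus-constrained-refit step at a fixed $\lambda_0$ and with additive constants of the MVN log likelihood dropped; the fold handling is deliberately simplified relative to our actual pipeline.

```python
import numpy as np

def cv_log_likelihood(X, fit_K, n_folds=10, seed=0):
    """Average held-out MVN log likelihood (up to constants) of a
    concentration-matrix estimator fit_K: data matrix -> K-hat."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(X.shape[0]), n_folds)
    scores = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(X.shape[0]), test_idx)
        K = fit_K(X[train_idx])                       # e.g. glasso at lambda_0
        S_test = np.cov(X[test_idx], rowvar=False)    # held-out sample covariance
        _, logdet = np.linalg.slogdet(K)
        scores.append(logdet - np.trace(S_test @ K))  # ln det K - tr(S K)
    return float(np.mean(scores))
```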
Optimization of this cross-validated measure of expected performance with respect to $\lambda$ is carried out by 50 iterations of a pattern search algorithm ("SwarmOps" Matlab toolbox by Magnus Pedersen, http://www.hvass-labs.org/projects/swarmops/). Although the performance of derivative-free optimization methods, such as the pattern search method used here, can be inconsistent, we achieved good convergence by leveraging previously identified $\lambda$'s that worked well for similar $d_{\mathrm{tar}}$'s. Furthermore, the likelihood function seemed well behaved over our empirical and simulation datasets in that the results are generally consistent over multiple trials even without such an initialization scheme. After optimization, if the optimized $\lambda$ results in a concentration matrix with $d_{\mathrm{act}}$ edges, the final adjacency matrix is found by taking the edges corresponding to the $\min\{d_{\mathrm{tar}}, d_{\mathrm{act}}\}$ largest partial correlation magnitudes.

For all but the smallest sample sizes tested, these tuning methods do a good job of matching the target number of edges; however, for very small sample sizes accurate tuning becomes much more difficult. For the concentration method, the solution is undefined when the number of samples is less than the number of variables, and in such cases we leave out the entries from method comparison figures and tables. For PCU, the output becomes very sensitive as $q \to 1$, such that $q = 0.999$ might result in 0 edges, but $q = 0.9999$ might result in so many edges that the algorithm runs in exponential time and becomes computationally infeasible. For glasso, the optimal cross-validated $\lambda$ corresponds to increasingly sparse graphs as the number of samples is reduced, such that for small numbers of samples often $d_{\mathrm{act}} \ll d_{\mathrm{tar}}$ and the target number of edges cannot be achieved. We note when such tuning problems occur and report the results nonetheless because we are interested in the small sample size cases.

4.2.2 Simulation Accuracy Comparison

To compare the accuracy of the methods, we performed the comparison on simulated data for which the ground truth graph and concentration matrix are known. Given a target $P \times P$ concentration matrix $K$, we simulated $N$ samples $\sim \mathcal{N}_P(0, K^{-1})$ by generating an $N \times P$ matrix of independent and identically distributed standard normal random variables ($\sim \mathcal{N}_1(0, 1)$) and right multiplying by $K^{-1/2}$.
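As a concrete illustration, a few lines of numpy suffice to draw such samples for a tridiagonal $K$ of the kind used in the baseline simulations below:

```python
import numpy as np

P, N = 20, 500
K = 4 * np.eye(P) + np.diag(np.ones(P - 1), 1) + np.diag(np.ones(P - 1), -1)

# Samples ~ N_P(0, K^{-1}): right-multiply i.i.d. standard normals by K^{-1/2},
# computed here via the eigendecomposition of the symmetric matrix K.
vals, vecs = np.linalg.eigh(K)
K_inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T
X = np.random.standard_normal((N, P)) @ K_inv_sqrt   # N x P data matrix
```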
We performed two simple baseline 20-node simulations in which each vertex $V_i$ only has true non-zero partial correlations with nodes $V_{i-1}$ and $V_{i+1}$. In the first simulation, each element of the main diagonal of $K$ was set to 4, while the non-zero off-diagonal entries were set to 1, making $K$ tridiagonal. The second simulation was the same as the first, except that the off-diagonal entries were set to 2. Data was simulated over a range of sample sizes, the methods were tuned to find exactly $d_{\mathrm{tar}} = 19$ edges on the simulated data, and the Dice coefficient and normalized inner product performance measures were calculated.

Figure 4.2 shows the Dice and normalized inner product performance measures (averaged over 15 trials) as a function of sample size for each method and both simulations. Although parameter tuning failed at a number of the smallest sample sizes, these points are still plotted. In particular, in the first simulation, where the non-zero off-diagonal of $K$ was set to 1, the tuning failed for PCU on the smallest 5 sample sizes ($N \le 50$) and for glasso on the smallest 6 sample sizes ($N \le 100$). In the second simulation, where the non-zero off-diagonal of $K$ was set to 2, tuning failed for glasso at the smallest sample size ($N = 10$). This makes comparing the methods at these small sample sizes difficult because their sparsities no longer match. In particular, glasso's performance is likely reduced relative to PCU because it undershot $d_{\mathrm{tar}}$ more than PCU on average. PCU consistently outperforms or ties the other methods for both performance measures, but glasso recovers much of this performance difference once it is able to identify the target number of edges, at which point it generally performs as well as or better than the concentration method.

Figure 4.2: Dice coefficient and inner product performance measures versus number of samples for each method on the simulated tridiagonal concentration matrices: (a) diagonal = 4, off-diagonal = 1; (b) diagonal = 4, off-diagonal = 2.

Next we simulated 3 graphs that were representative of real brain networks by setting $K$ to $\hat{K}'$ for each method on the cortical gray matter thickness dataset described and analyzed in Section 4.2.3, and we refer to the resulting simulations as "method-based". For this, PCU's parameter $q$ was set to control the FDR at 5%, resulting in a graph with 138 edges. A target FDR of 5% is often used in neuroimaging studies and was chosen here to offer a good balance between accuracy and number of identified edges. To force the number of edges to be consistent in all three methods' graphs, the concentration method's type I error rate $\alpha$ and the glasso method's regularization parameter $\lambda$ were set to 1.95% and 0.145, respectively ($d_{\mathrm{tar}} = 138$). Data was simulated based on each method's $\hat{K}'$ over a range of sample sizes, and after tuning the methods to find exactly $d_{\mathrm{tar}}$ edges on each simulated dataset, the Dice coefficient and normalized inner product performance measures were calculated. However, this tuning again failed for PCU and particularly glasso for the smallest sample sizes ($N \in \{20, 15, 10\}$).

Figure 4.3 shows the Dice and normalized inner product performance measures of each method for each method-based simulation averaged over 15 trials as a function of sample size. Glasso again particularly struggles at the lowest sample sizes due to identifying fewer than the target number of edges, but generally performs slightly better than or ties PCU for best performance once the target number of edges is achieved. The concentration method consistently performs the worst in terms of both performance measures, even when it served as the simulation basis.

Figure 4.3: Dice and inner product performances versus sample size on the method-based simulations: (a) concentration-based, (b) PCU-based, (c) glasso-based. Each simulated dataset is based on the estimated concentration matrix $\hat{K}'$ from applying either the concentration, PCU, or glasso method to the cortical thickness dataset.

Control of the False Discovery Rate

Because the heuristic modification we make to have PCU control the FDR (Li and Wang, 2009) is a slight variation on another modification for which only asymptotic control of the FDR is proven, we test the heuristic modification's ability to control the FDR on simulated data for various sample sizes. In particular, we simulate data from the concentration-based and tridiagonal (off-diagonal = 2) cases of Section 4.2.2, but evaluate the empirical FDR (number of false positives / total number of positives) instead of the previous performance measures for 40 trials at each sample size.

Figure 4.4 shows the average empirical FDR for the two simulation cases as a function of sample size for target FDRs of 15%, 5%, and 1%. For these two cases PCU does a good job of asymptotically controlling the FDR. The difference in the number of samples required to control the FDR is consistent with the observation that the path PCU takes in inferring a solution and the validity of its heuristic modification depend on a dataset's underlying concentration matrix. In particular, the heuristic modification calls for performing the BH procedure on only the remaining edges in the graph, as opposed to all edges including removed edges, to compensate for the edge p-values generally being significantly larger than the true p-values. Although it seems likely that there exist concentration matrices for which this justification fails, preventing PCU from controlling the FDR, the prevalence of such concentration matrices seems limited based on the results of Li and Wang, who showed FDR control at 500 samples over a range of networks with diverse network properties.

Figure 4.4: Average empirical FDR versus sample size on the (a) concentration method-based and (b) tridiagonal (off-diagonal = 2) simulated datasets.

We note that swifter and guaranteed asymptotic control of the FDR could easily be accomplished by using Li and Wang's original modification with PCU, which they prove is capable of asymptotically controlling the FDR. But doing so would result in a loss of statistical power.

4.2.3 Real Gray Matter Thickness Data

As an example of PCU's real-world utility, we compare it to the concentration and glasso methods in terms of stability on real cortical gray matter thickness data (He et al., 2007; Lerch et al., 2006; Worsley et al., 1996, 2005).
The biological basis for cortical gray matter thickness correlations is unknown, but it has been argued that the covariation of morphological features, such as gray matter thickness, gray matter volume, and cortical surface area, may result from mutually trophic influences (Ferrer et al., 1995), the contribution of genes (Thompson et al., 2001a,b; Schmitt et al., 2008; Giedd et al., 2007), or common experience-related plasticity and extensive learning (Maguire et al., 2006; Draganski et al., 2004, 2006). Gray matter thickness has been implicated in many important brain phenomena, such as aging (Sowell et al., 2003) and Alzheimer's disease (Lerch et al., 2005). Studies have indicated that intercorrelated regions of morphological features may be part of a functional, neuroanatomically interconnected system (Mitelman et al., 2005). Furthermore, significant correspondence between anatomical connectivity and functional connectivity has been shown (Honey et al., 2007; Hagmann et al., 2008). Thus, partial correlation measurements of gray matter thickness connectivity could provide insight into human brain organization, and we report the stable thickness networks visualized on the cortical surface. Although our analysis can just as easily be applied to other morphological features, such as cortical surface area or gray matter volume, we selected gray matter thickness because it is more widely used and more robust to parcellation error, which can introduce negative correlation between adjacent regions.

Australian Dataset Collection and Preprocessing

Before the methods could be compared we had to extract the thickness data from a dataset of structural MRI scans (Section 2.2) and then condense the densely sampled thickness measurements to a much sparser single thickness summary statistic for each cortical ROI. This section elaborates on the details of this process.

The data started as 3D structural brain MRI scans of 668 normal right-handed subjects (age range: 22-25 years). The scans were collected using a 4 Tesla Bruker Medspec whole body scanner (Bruker Medical, Ettlingen, Germany) at the Center for Magnetic Resonance (University of Queensland, Australia). Three-dimensional T1-weighted images were acquired with a magnetization prepared rapid gradient echo (MP-RAGE) sequence to resolve anatomy at high resolution. Acquisition parameters were: inversion time (TI) / repetition time (TR) / echo time (TE) = 1500 / 2500 / 3.83 msec; flip angle = 15 degrees. To differentiate this dataset from others we refer to it as the Australian dataset.

We used BrainSuite to extract the cortical gray matter thickness data (Shattuck and Leahy, 2002). In particular, we used BrainSuite's automated processing pipeline for automatic skull stripping, tissue classification, surface extraction, gray matter thickness estimation, and cortical and subcortical parcellation. This resulted in ROI-labeled, densely sampled gray matter thickness estimates across the cerebral cortex. The ROI labels were assigned by registering each subject's cortex to a previously ROI-labeled template cortex (Joshi et al., 2007, 2009a) such that each point on a subject's cortex was paired with a corresponding, homologous point on the template cortex. The labels were then propagated from the template cortex to each subject's cortex according to this registration mapping.
Because the thickness measurements were sampled much more densely than the ROIs on the cortical surface, multiple thickness measurements were assigned to each ROI.

Figure 4.5 shows the lateral and medial views of the template's cortical surface parcellated and color coded into its 25 ROIs, and Table 4.1 lists the names and abbreviations of these ROIs. The template cortex was extracted from the average of 27 T1-weighted MRI acquisitions from a single subject provided by the Montreal Neurological Institute database, and its labels were propagated from the ICBM (International Consortium for Brain Mapping) high-resolution single subject template. For more information see http://www.loni.ucla.edu/Atlases/Atlas_Detail.jsp?atlas_id=5.

Figure 4.5: Lateral and medial views of the 25 parcellated cortical regions per hemisphere displayed on the template.

Table 4.1: Cortical region names and abbreviations
1.) superior parietal gyrus (SPG)
2.) cingulate gyrus (CingG)
3.) superior frontal gyrus (SFG)
4.) middle frontal gyrus (MFG)
5.) inferior frontal gyrus (IFG)
6.) precentral gyrus (PrCG)
7.) postcentral gyrus (PoCG)
8.) inferior parietal lobule (IPL)
9.) pre-cuneus (PCU)
10.) cuneus (CUN)
11.) lingual gyrus (LING)
12.) fusiform gyrus (FUSG)
13.) parahippocampal gyrus (PHG)
14.) superior occipital gyrus (SOG)
15.) inferior occipital gyrus (IOG)
16.) middle occipital gyrus (MOG)
17.) anterior parahippocampal gyrus (APHG)
18.) superior temporal gyrus (STG)
19.) inferior temporal gyrus (ITG)
20.) middle temporal gyrus (MTG)
21.) lateral orbitofrontal gyrus (LOFG)
22.) middle orbitofrontal gyrus (MOFG)
23.) supramarginal gyrus (SMG)
24.) gyrus rectus (GR)
25.) insular cortex (INS)

When doing network analysis, choosing appropriate graph vertices is an important but subtle step: leaving out an important variable from a control set can result in extra, unwanted, indirect connections, while including too many correlated control variables can obscure true partial correlations (Mitteroecker and Bookstein, 2009). In this dissertation we limit the number of vertices to the 25 ROIs per hemisphere of the labeled template cortex. Because there are multiple thickness measurements per ROI, getting the thickness data into a form the methods can use requires condensing each ROI's multiple thickness measurements into a single summary statistic. We calculate the summary statistic for a particular ROI by averaging over that ROI's thickness measurements and then taking the log transform, resulting in a 668 subject by 50 ROI data matrix. The log transform is taken because cortical features, such as volume and gray matter thickness, have been shown to follow an approximately log normal distribution (Joshi et al., 2010; Chung et al., 2003), and thus we assume their log transforms are MVN.

Because MRI preprocessing is a complicated task that is prone to error (especially when fully automated), we rejected subjects having an ROI with a log-transformed thickness outside 3 standard deviations of that ROI's distribution, resulting in a final 645 subject by 50 ROI data matrix.

Stability Comparison

Because the true underlying graph is not known for the empirical gray matter thickness dataset, such that accuracy cannot be estimated, we use bootstrapping (Efron and Tibshirani, 1993) to assess the stability of the methods instead. Although stability is not a substitute for accuracy, it is a prerequisite in the sense that if an estimator's bias is held constant, decreasing stability will decrease accuracy.
We measured stability on the thickness dataset in terms of edge occurrence probabilities and the variance of the adjacency matrix elements, which we refer to as the edge variance. To estimate these measures for thickness datasets of sample size $N'$ we ran each method 500 times on resampled versions of the dataset formed by randomly sampling $N'$ subjects from the original dataset with replacement. In particular, we bootstrapped the cases $N' \in \{645, 161, 40\}$. Noting that $P = 50$ is the number of ROIs, these values of $N'$ were selected to cover three representative cases: 1.) $N = N' \gg P$, 2.) $N' > P$, and 3.) $N' < P$, respectively.

For each trial, PCU's $q$ was set to 15% and the concentration and glasso methods were again tuned to match the number of edges given by PCU. Although such tuning proved difficult on the simulated data when dealing with small sample sizes, in this case none of the sample sizes were small enough to pose a problem. PCU's $q$ was chosen to be 15% as a reasonable target FDR that would result in more potentially interesting stable edges than the previously used rate of 5%.
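A sketch of the bootstrap loop is shown below, with infer_graph standing in for any one of the three tuned methods (a function from an $N' \times P$ data matrix to a 0/1 adjacency matrix); since each adjacency entry is a 0/1 indicator with occurrence probability $p$, its bootstrap variance is $p(1-p)$ up to a negligible correction factor.

```python
import numpy as np

def edge_stability(X, infer_graph, n_prime, n_boot=500, seed=0):
    """Bootstrap edge occurrence probabilities and edge variances."""
    rng = np.random.default_rng(seed)
    N, P = X.shape
    counts = np.zeros((P, P))
    for _ in range(n_boot):
        idx = rng.integers(0, N, size=n_prime)   # resample subjects w/ replacement
        counts += infer_graph(X[idx])
    prob = counts / n_boot                       # edge occurrence probabilities
    var = prob * (1.0 - prob)                    # variance of each 0/1 edge entry
    return prob, var
```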
To prevent the network visualizations from being overrun with edges and to help emphasize the most stable edges, we classified the edges as either stable or unstable and only plotted the stable edges in our visualization (Figure 4.7). Although there are many ways to do this, we opted for the straightforward approach of classifying those edges with occurrence probabilities greater than some threshold as stable. Ultimately we chose this threshold to be 70% as a good compromise between visualizing only the most stable edges and having enough stable edges for an interesting graph.

Interpretation of this network is difficult because not much is known about correlations in gray matter thickness; however, the dissimilarity between the connections of the inferior frontal gyrus ROIs of the left and right hemispheres may reflect the asymmetry of Broca's area and its role in language processing. We measured hemispheric asymmetry for a particular ROI by calculating the Dice coefficient between a left subnetwork formed of only the stable edges connecting to the ROI in the left hemisphere and a right subnetwork formed of only the stable edges connecting to the homologous ROI in the right hemisphere. As such, a lower Dice coefficient corresponds to greater asymmetry. The inferior frontal gyrus ROI has the second lowest Dice coefficient at 0.5 (after the inferior occipital gyrus ROI at 0.4), where the average Dice coefficient is 0.77 with a standard deviation of 0.16.

To quantify the bootstrap results we computed the number of stable edges (SE), the average of the edge variances over ROIs (AEV), and the standard deviation of the edge variance over ROIs (Std) for all methods and $N' \in \{645, 161, 40\}$. Table 4.2 summarizes the results for the different bootstrap cases and Figure 4.6 shows a histogram of edge variance for each method's $N' = 645$ case. These results indicate that PCU and glasso are significantly more stable than the concentration method, as they consistently outperformed the concentration method by a wide margin in terms of the number of stable edges and average edge variance. For $N' = 645$, PCU proved more stable than glasso, and the histogram shows this is because PCU had a larger proportion of particularly low variance, stable edges. However, for $N' \in \{161, 40\}$ glasso appears the more stable of the two. That being said, the stability differences between PCU and glasso were not large and are likely insignificant in practice at these sample sizes.

Table 4.2: The number of stable edges (SE), average edge variance (AEV), and edge variance standard deviation (Std) for each method and $N'$.

                 N' = 645           N' = 161           N' = 40
Method           SE   AEV(Std)      SE   AEV(Std)      SE   AEV(Std)
concentration    82   0.059(0.067)  24   0.069(0.048)  NA   NA
PCU              114  0.042(0.066)  52   0.049(0.063)  11   0.047(0.050)
glasso           103  0.046(0.065)  55   0.045(0.066)  17   0.043(0.056)

Figure 4.6: Histograms of the bootstrap-estimated edge variances for each method ((a) concentration, (b) PCU, (c) glasso) with $N' = 645$.

In another comparison of method stability under the empirical distribution, we started by running each method on the full-sample dataset with parameters ($\alpha$ = 1.95%, $q$ = 5%, and ML-optimized $\lambda$ = 0.145) such that each method found 138 edges ($d_{\mathrm{tar}} = 138$) and took the resulting graphs as ground truths for their respective methods. For 15 trials, the graphs were regenerated on resampled datasets of various sizes ($N' \in [10, 10000]$), and for each method, the Dice coefficients and concentration matrix inner products were calculated between these graphs and their corresponding ground truth graphs. PCU and particularly glasso encountered problems achieving $d_{\mathrm{tar}}$ for the smallest 3 sample sizes ($N' \in \{20, 15, 10\}$).

The average Dice coefficient and normalized inner product for each method are plotted as a function of $N'$ in Figure 4.8 and corroborate the previous stability results for the points $N' \in \{645, 161, 40\}$. Furthermore, two new points are implied. 1.) Although PCU scales well for large sample sizes, maintaining its lead over the concentration method, glasso does not, falling significantly behind the concentration method due to its curves' early plateaus. 2.) Perhaps not unexpectedly, glasso's performance still particularly struggles at low sample sizes due to failing to find the target number of edges.

4.2.4 Functional Resting-State Data

Here we seek to validate PCU by identifying known, established subnetworks, such as the default mode network (Raichle et al., 2001), in functional resting-state data. As opposed to structural data, functional data measures some proxy for brain activity. In the case of functional MRI (fMRI), the proxy is the oxygenation level of blood in the brain, which is measured via the blood-oxygen-level-dependent (BOLD) contrast mechanism (Logothetis and Wandell, 2004). The BOLD signal is recorded at multiple voxel locations composing the brain and sampled in time, giving a set of brain volumes across time that characterize the brain's dynamic activity. Because brain regions communicate with one another, there can be statistical relationships between the recorded brain signals at different locations. To infer the structure of this communication, i.e. the brain connectivity pattern or network, correlation and partial correlation approaches are often used (Fox et al., 2005). Recently there has been great interest in the low frequency connections occurring while subjects are at rest, with particular interest in the default mode network, a collection of brain regions shown to be active during rest and inactive during task-evoked activity (Raichle et al., 2001).
Correlation analysis of resting-state fMRI has been used to identify neurophysiological correlates of the default mode network (Fox et al., 2005) and of anatomical connectivity as measured by diffusion tensor imaging (Hagmann et al., 2008; Skudlarski et al., 2008; Honey et al., 2009).

Beijing_Zang Dataset Collection and Preprocessing

We used the Beijing_Zang dataset from the 1000 Functional Connectomes Project (http://www.nitrc.org/projects/fcon_1000/). The dataset contains structural T1-weighted MPRAGE scans and functional EPI resting-state scans of 198 normal university students at Beijing Normal University and is described in (Chao-Gan and Yu-Feng, 2010). As before, automatic processing of the structural volumes from raw MRI to parcellated cortical surface with gray matter thickness measurements was performed by BrainSuite (Joshi et al., 2007, 2009a; Shattuck and Leahy, 2002); however, skull extraction was performed completely automatically instead of manually as with the previous Australian thickness dataset, and the volumes were registered to a relabeled version of the previous atlas (Figure 4.9 and Table 4.3). The relabeling was performed by neuroanatomist Hanna Damasio at the University of Southern California and resulted in 70 ROIs, 20 more than in the original labeling. After rejecting subjects whose processing suffered from gross errors based on visual inspection of the atlas registration results, 96 subjects remained.

Preprocessing of the functional data of the 96 remaining subjects was performed using scripts provided by the 1000 Functional Connectomes Project that automated the use of FSL (http://www.fmrib.ox.ac.uk/fsl/) and AFNI (http://afni.nimh.nih.gov/afni) for standard resting-state preprocessing. These scripts entailed 1.) discarding the first 5 volumes, 2.) 3D motion correction, 3.) spatial smoothing (FWHM = 6 mm), 4.) grand-mean scaling, 5.) temporal bandpass filtering (0.005 - 0.1 Hz), 6.) linear and quadratic detrending, and 7.) regression removal of global mean, white matter, and cerebral spinal fluid signals and 6 motion parameters. The BrainSuite structural volume labels were then propagated to the functional volumes using FSL to linearly register (6 degrees of freedom, nearest neighbor interpolation) the labeled structural volume to the default 8th functional volume.

Each of the new atlas' 70 cortical ROIs was condensed to a summary statistic time series by taking the median over the functional samples falling in that ROI, similar to the calculation of the thickness ROI summary statistics described for the previous Australian gray matter thickness dataset. Also, each volume time sample was treated as a separate observation, resulting in a 21,600 observation (225 observations for each of the 96 subjects) by 70 region ROI data matrix.

As before, the thickness dataset was log transformed and outlier rejection was performed on the thickness and functional datasets. Any structural volume with a log-transformed thickness outside 3 standard deviations of its ROI and any functional volume with a value outside of 4.5 standard deviations of its ROI was rejected. This resulted in a 90 subject by 70 region gray matter thickness data matrix and a 20,475 volume by 70 region functional resting-state data matrix.
Analysis of the Resting-State Network

We applied PCU to the first 20,000 preprocessed time samples of the Beijing_Zang functional resting-state dataset with a target FDR of 0.001 and clustered the resulting network in the hopes of identifying the default mode, sensorimotor, and visual subnetworks. This approach was chosen because little applicable prior partial correlation analysis has been done on resting-state fMRI and because inferring expected partial correlation results from the much more studied regular correlation analysis is difficult. For example, if there is a path between every pair of nodes in the network resulting from partial correlation analysis (as one might expect in a brain network), and this network is a perfect map (as assumed by PCU), then there must be a non-zero correlation between every pair of nodes and the resulting correlation network is completely connected.

Table 4.3: Cortical region names and abbreviations
1.) superior frontal gyrus (SFG)
2.) middle frontal gyrus (MFG)
3.) pars opercularis (POp)
4.) pars triangularis (PT)
5.) pars orbitalis (POr)
6.) pre-central gyrus (PrCG)
7.) transverse frontal gyrus (TFG)
8.) gyrus rectus (GR)
9.) middle orbitofrontal gyrus (MOFG)
10.) anterior orbitofrontal gyrus (AOFG)
11.) posterior orbitofrontal gyrus (POFG)
12.) lateral orbitofrontal gyrus (LOFG)
13.) paracentral lobule (PCL)
14.) cingulate gyrus (CingG)
15.) subcallosal area (SCA)
16.) post-central gyrus (PoCG)
17.) supramarginal gyrus (SMG)
18.) angular gyrus (AngG)
19.) superior parietal gyrus (SPG)
20.) pre-cuneus (PCU)
21.) temporal pole (TP)
22.) superior temporal gyrus (STG)
23.) transverse temporal gyrus (TrTG)
24.) middle temporal gyrus (MTG)
25.) inferior temporal gyrus (ITG)
26.) fusiform gyrus (FUSG)
27.) parahippocampal gyrus (PHG)
28.) hippocampus (HIP)
29.) amygdala (AMG)
30.) superior occipital gyrus (SOG)
31.) middle occipital gyrus (MOG)
32.) inferior occipital gyrus (IOG)
33.) lingual gyrus (LingG)
34.) cuneus (CUN)
35.) insula (INS)

Figure 4.11 visualizes the network using the graphviz4matlab Matlab interface (http://code.google.com/p/graphviz4matlab/) for GraphViz (http://www.graphviz.org/). The default GraphViz "NEATO" spring model layout determined the node positions in the visualization by attempting to minimize a global energy function assuming binary edge weights, similar to Kamada and Kawai (1989). Roughly, this layout tends to place heavily interconnected nodes close to one another. Here the effect was to cluster many of the ROIs according to subnetwork, particularly for the visual and sensorimotor subnetworks, whose ROIs tended to be placed at the very top and center-right respectively. This clustering likely results from the small average path lengths of the subnetworks, which can be more easily seen in Figure 4.10. For example, the visual subnetwork ROIs in the occipital lobe are all either directly connected or connected through only one other node (path lengths of 1 or 2). Anatomically, this makes sense as one would expect functionally coherent brain regions to be more directly connected than functionally distinct ones. This logic may also explain why the default mode subnetwork is the least clustered of the three subnetworks, as its activity has been shown to closely relate to general task-evoked activity in many other brain regions (Raichle et al., 2001).
4.2.5 PCU Consistency Across Datasets and Modalities

To evaluate the consistency of the results across different modalities and datasets, we compared PCU's results on the Australian gray matter thickness data to its results on the Beijing_Zang structural gray matter thickness and functional resting-state MRI data. This required reprocessing the Australian gray matter thickness dataset as previously described in Section 4.2.3, but for the new atlas (Figure 4.9), resulting in a 657 subject by 70 region thickness data matrix.

For each dataset combination (e.g. Australian thickness / Beijing_Zang functional) we tuned the FDR of PCU to find the same target number of edges on both datasets and measured the similarity of the resulting graphs with the Dice coefficient. To simplify the tuning on the functional dataset we only used the first 10,000 functional samples for the comparisons with other datasets. As a reference, we also compared the PCU results on the first 10,000 functional samples to the next 10,000.

Figure 4.12 shows the Dice coefficient similarity measure versus target number of edges for each dataset combination. The expected value of the Dice coefficient between two random 70 node graphs with equiprobable edges can be readily solved using the hypergeometric distribution and is plotted along with PCU's realized Dice coefficients for reference. To ensure that the graph similarity is not primarily caused by edges between left and right hemisphere homologous points, we also report the Dice coefficients after excluding these edges.
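Concretely, if each of two independent random graphs has exactly $d$ of the $\binom{70}{2} = 2415$ possible edges, their overlap is hypergeometric and the expected Dice coefficient reduces to $d/2415$; a short scipy sketch (function name ours) is:

```python
from scipy.stats import hypergeom

def expected_random_dice(n_nodes, d):
    """Expected Dice coefficient between two independent random graphs,
    each with exactly d edges drawn uniformly from the possible edges."""
    M = n_nodes * (n_nodes - 1) // 2        # number of possible edges
    # Overlap |E1 & E2| ~ Hypergeometric(M, d, d); Dice = 2*overlap / (2d).
    return hypergeom.mean(M, d, d) / d      # simplifies to d / M

print(expected_random_dice(70, 138))        # ~0.057 at 138 target edges
```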
The Dice coefficients (Figure 4.12) show significantly higher consistency between the graphs than would be achieved randomly for all dataset combinations. Perhaps counterintuitively, the PCU graphs from the Australian thickness / Beijing_Zang functional dataset combination (Figure 4.12b) are more similar than those from the Australian thickness / Beijing_Zang thickness and Beijing_Zang thickness / Beijing_Zang functional combinations, with a peak Dice coefficient of 0.62, suggesting a strong connection between resting-state functional activity and gray matter development. The other dataset combinations would likely show greater consistency, but the Beijing_Zang thickness dataset suffers from having many fewer samples than the Australian thickness dataset. Although not included here, taking 100 subjects from the Australian thickness dataset and adding them to the Beijing_Zang thickness dataset improved the similarity between the Australian thickness and Beijing_Zang thickness results considerably, with a peak Dice coefficient of 0.55.

4.3 Summary

We have presented PCU, an algorithm for the inference of sparse, undirected graphs while asymptotically controlling the false discovery rate. PCU starts with a full graph and prunes edges by testing conditional independence. However, compared to the standard concentration method, these conditional independence tests are performed on smaller, yet mathematically equivalent, control sets that are more powerful at inferring conditional independence. Simulations corroborate PCU's asymptotic control of the false discovery rate; however, this does not preclude the existence of distributions for which such control is not possible.

The performance of the PCU algorithm was compared to that of the concentration method and that of the graphical lasso in terms of accuracy on synthetic data and stability on empirical gray matter thickness data. PCU was shown to offer both accuracy and stability gains relative to the concentration method and, particularly at small sample sizes, relative to the graphical lasso. PCU also offered significantly greater stability than the graphical lasso at larger sample sizes. The results of the stability analysis were used to exclude unstable edges from the visualization of the cortical gray matter thickness network.

We also investigated the consistency of the PCU algorithm across different datasets and modalities. This was done by comparing networks from the Australian cortical gray matter thickness dataset to networks derived from the Beijing_Zang cortical gray matter thickness and functional resting-state data. Networks were remarkably consistent, particularly between the Australian thickness and the Beijing_Zang resting-state data, suggesting a relationship between resting-state functional activity and gray matter development.

Lastly, applying PCU to the Beijing_Zang functional resting-state data and visualizing the results using a spring model layout tended to separately cluster ROIs belonging to the default mode, visual, and sensorimotor subnetworks.

Figure 4.7: Three views (top, lateral left hemisphere, lateral right hemisphere) of the stable bootstrapped graphs for each method (concentration, PCU, glasso) with $N' = 645$. Refer to Table 4.1 for ROI names. Edge darkness and thickness increase with edge stability. In the lateral views only intrahemispheric connections are shown.

Figure 4.8: A comparison of each method's consistency with its full-sample output under the thickness dataset's empirical distribution. The comparison is made over a range of sample sizes, and consistency is measured by the 15 trial average of the Dice coefficient and inner product between the current output and the full-sample output.

Figure 4.9: Lateral and medial views of the 35 parcellated cortical regions per hemisphere displayed on the atlas.

Figure 4.10: Three views (top, lateral left hemisphere, lateral right hemisphere) of the functional resting-state graph output by PCU. Refer to Table 4.3 for ROI names. Edge darkness and thickness increase with the partial correlation magnitude, where blue and red edges denote positive and negative partial correlations respectively. In the lateral views only intrahemispheric connections are shown.

Figure 4.11: GraphViz "NEATO" spring model layout of the functional resting-state partial correlation network. Each circle contains its corresponding ROI abbreviation (see Table 4.3), where an "R." prefix denotes the right hemisphere and "L." the left. Suggested subnetwork membership is encoded by circle color (key on the figure). For example, a white circle denotes membership in all three possible subnetworks (sensorimotor, default mode, visual) while gray denotes membership in none of these subnetworks.

Figure 4.12: Actual PCU Dice coefficient and theoretical random graph Dice coefficient versus target number of edges for each dataset combination: (a) Australian thickness / Beijing_Zang thickness, (b) Australian thickness / Beijing_Zang functional, (c) Beijing_Zang thickness / Beijing_Zang functional, (d) Beijing_Zang functional / Beijing_Zang functional.
Chapter 5

Extensions of Sparse Partial Correlation Methods to the Time Domain

Neuronal populations fire in time-varying degrees of synchrony and coordination as the brain processes information (Tononi and Edelman, 1998; Abeles et al., 1995). Electrophysiological measurement of this dynamic network activity, performed noninvasively with electroencephalography (EEG) and magnetoencephalography (MEG) or invasively with microelectrodes, has enabled the study of brain activity with fine time resolution. In the analysis of such data, coherence (Priestley, 1981) and Granger causality (Granger, 1969) take the temporal structure of the data into account and are often used to identify coordination or interaction between different neuronal populations (Miltner et al., 1999; Brovelli et al., 2004). As with correlation, coherence does not differentiate between direct and indirect connectivity, but its multivariate extension, partial coherence, does. Similarly, Granger causality has a bivariate version and a multivariate version that is capable of distinguishing between direct and indirect connectivity. As in the previous, non-time series case with correlation (Chapter 4), the benefits of these multivariate interaction measures come at the price of reduced statistical power. This downside is often exacerbated for functional brain signals because the number of samples is effectively reduced by the common practice of calculating interaction measures over short time windows such that the signals can be treated as approximately stationary (Ding et al., 2000; Miltner et al., 1999). To address this problem we propose extensions to the PCU algorithm (Section 4.1.3) and the graphical lasso method (glasso; Section 4.1.3) that leverage network sparsity to improve the estimation of partial coherence and Granger causality based networks.

Coherence measures the linear relationship between the weights of the Fourier basis functions (i.e. the Fourier coefficients) of two random, weakly stationary time series or random processes (Priestley, 1981). Partial coherence is a measure of the same linear relationship between two such Fourier coefficients after their linear relationships with the corresponding Fourier coefficients of other time series have been removed. In fact, partial coherence is asymptotically the partial correlation between Fourier coefficients of the same frequency. In the same way that non-zero partial correlations represent direct linear dependences between a set of random variables and can be visually represented as edges in a graph, non-zero partial coherences represent direct linear dependences between a set of time series and can be used to generate graphs termed partial correlation graphs for time series (Dahlhaus, 2000).

The extensions of the PCU and glasso algorithms to partial coherence for the inference of partial correlation graphs for time series maintain the similarities with partial correlation in the non-time series case. In particular, the extension of the PCU algorithm, named spectral PCU, applies the original PCU algorithm to the Fourier coefficients of a set of time series at the same frequency. This is repeated at multiple frequencies, giving a graph for each frequency. These graphs are then combined into one partial correlation graph for time series. The glasso extension to partial coherence, named spectral glasso, similarly applies its original algorithm one frequency at a time.
However, instead of applying it to a covariance matrix as in the non-time series case, it is applied to a coherence matrix, reformulating the partial coherence estimation problem as a covariance estimation problem for Fourier coefficients.

Granger causality infers causality based on whether past observations of one time series help predict or estimate future values of another. If a time series aids in such a prediction, then it is said to Granger cause the other, which is not necessarily a symmetric relation. For multivariate Granger causality, a time series Granger causes another if it similarly aids in its prediction after the pasts of other time series in a control set have also been taken into account. In this way, multivariate Granger causality differentiates between direct and indirect connectivity and allows for the generation of directed graphs, termed Granger causality graphs, where a directed edge from one node to another denotes that the former Granger causes the latter (Dahlhaus and Eichler, 2003; Eichler, 2007). In practice, because it can be very difficult to find optimal estimators, it is often assumed that the data follows a multivariate autoregressive (MVAR) model for which the optimal estimators are linear (Baccala and Sameshima, 2001; Hoerzer et al., 2010; Ding and He, 2013).

The extension of PCU to Granger causality, named PCU MVAR, is based on the assumption of an MVAR model and the observation that when checking whether one time series Granger causes another it is often unnecessary to first remove the effects of all other time series, as a control set of just the true Granger causal time series is sufficient. Although both control sets are equivalent in terms of the definition of Granger causality, they are not equivalent when it comes to inferring Granger causality from empirical data. In general, estimating Granger causality using the reduced control set will tend to increase statistical power and reduce sample size requirements, just as it did for partial correlations in Chapter 4. PCU MVAR leverages this by seeking to infer Granger causality using these reduced but sufficient control sets.

In the next section we give a brief overview of coherence and Granger causality and describe our proposed methods. Next we compare our methods to other methods in simulation, noting particularly strong performance at small sample sizes. We then apply the methods to data from a study in which microelectrodes were used to invasively record the local field potentials (LFPs) of macaque monkeys (Bressler et al., 1999). As in EEG and MEG, LFP measurements result from the dendritic and volume conduction currents of brain activity. We report results that are consistent with the task-based study and other published analysis of the data and conclude with a discussion of these results.

5.1 Theory and Algorithms

5.1.1 Coherence

Let $x[t]$ and $y[t]$, $t \in \mathbb{Z}$, be two discretely sampled, weakly stationary random processes or time series with cross-covariance function $C_{xy}[\tau] = E\{(x[t] - E\{x[t]\})(y[t-\tau] - E\{y[t-\tau]\})^*\}$, $\sum_{\tau=-\infty}^{\infty} |C_{xy}[\tau]| < \infty$, where to keep with convention $(\cdot)^*$ now denotes just the complex conjugate operator (as opposed to its use in Chapter 3 as the conjugate transpose operator).
The cross-spectral density function between $x[t]$ and $y[t]$ is the discrete time Fourier transform (DTFT) of $C_{xy}[\tau]$:

$S_{xy}(\omega) = \sum_{\tau=-\infty}^{\infty} C_{xy}[\tau] e^{-j\omega\tau}.$

The coherence between $x[t]$ and $y[t]$ is a complex quantity that measures the linear relationship between the two time series with respect to a particular frequency and is calculated by normalizing the cross-spectral density function:

$\mathrm{coh}_{xy}(\omega) = \dfrac{S_{xy}(\omega)}{\sqrt{S_{xx}(\omega) S_{yy}(\omega)}}.$ (5.1)

The magnitude of coherence ranges between zero and one as the strength of the linear relationship increases. Indeed, if two signals are uncorrelated for all time lags $\tau \in \mathbb{Z}$, then their coherence will be zero for all frequencies $\omega \in \mathbb{R}$, or equivalently $0 \le \omega < \pi$ due to coherence's $2\pi$ periodicity and the fact that its magnitude is an even function.

Perhaps more intuitively, coherence is the correlation between DTFT coefficients as the number of samples approaches infinity. For example, in terms of the DTFT coefficients $X(\omega) = \sum_{t=0}^{N-1} x[t] e^{-j\omega t}$ and $Y(\omega) = \sum_{t=0}^{N-1} y[t] e^{-j\omega t}$ for $x[t]$ and $y[t]$ respectively, but now with $t \in \{0, 1, \ldots, N-1\}$,

$\mathrm{coh}_{xy}(\omega) = \lim_{N \to \infty} \mathrm{corr}(X(\omega), Y(\omega)).$ (5.2)

To see this, assume $x[t]$ and $y[t]$ are mean zero without loss of generality and substitute their definitions for $X(\omega)$ and $Y(\omega)$ into $\mathrm{cov}(X(\omega), Y(\omega))$:

$\mathrm{cov}(X(\omega), Y(\omega)) = E\left\{ \sum_{t_x=0}^{N-1} x[t_x] e^{-j\omega t_x} \left( \sum_{t_y=0}^{N-1} y[t_y] e^{-j\omega t_y} \right)^{\!*} \right\} = \sum_{t_x=0}^{N-1} \sum_{t_y=0}^{N-1} E\{x[t_x]\, y^*[t_y]\}\, e^{-j\omega(t_x - t_y)}.$ (5.3)

Defining a new summation variable $\tau = t_x - t_y$ gives

$\mathrm{cov}(X(\omega), Y(\omega)) = \sum_{t_x=0}^{N-1} \sum_{\tau = t_x - (N-1)}^{t_x} C_{xy}[\tau] e^{-j\omega\tau}.$ (5.4)

Moving the summation over $\tau$ to the outside and breaking it apart gives

$\mathrm{cov}(X(\omega), Y(\omega)) = \sum_{\tau=-(N-1)}^{0} \sum_{t_x=0}^{\tau+N-1} C_{xy}[\tau] e^{-j\omega\tau} + \sum_{\tau=1}^{N-1} \sum_{t_x=0}^{N-1-\tau} C_{xy}[\tau] e^{-j\omega\tau}$
$\quad = \sum_{\tau=-(N-1)}^{0} (N + \tau)\, C_{xy}[\tau] e^{-j\omega\tau} + \sum_{\tau=1}^{N-1} (N - \tau)\, C_{xy}[\tau] e^{-j\omega\tau}$
$\quad = \sum_{\tau=-(N-1)}^{0} (N - |\tau|)\, C_{xy}[\tau] e^{-j\omega\tau} + \sum_{\tau=1}^{N-1} (N - |\tau|)\, C_{xy}[\tau] e^{-j\omega\tau}$
$\quad = N \sum_{\tau=-(N-1)}^{N-1} \left(1 - \frac{|\tau|}{N}\right) C_{xy}[\tau] e^{-j\omega\tau}.$ (5.5)

Thus,

$\lim_{N \to \infty} \frac{1}{N}\, \mathrm{cov}(X(\omega), Y(\omega)) = \lim_{N \to \infty} \sum_{\tau=-(N-1)}^{N-1} \left(1 - \frac{|\tau|}{N}\right) C_{xy}[\tau] e^{-j\omega\tau} = S_{xy}(\omega).$ (5.6)

Substituting this expression into the definition of coherence (5.1) completes the proof.

Partial coherence maintains the analogy with correlation. Just as partial correlation is the correlation between residuals after first regressing out other random variables in a control set, partial coherence is asymptotically the correlation between DTFT coefficients after first regressing out the DTFT coefficients of other time series. In particular, given a control set of multiple weakly stationary time series $Z[t] = \{z_1[t], z_2[t], \ldots, z_n[t]\}$, $t \in \{0, 1, \ldots, N-1\}$, let $Z(\omega)$ be the set of DTFT coefficients of the elements of $Z[t]$, and $\hat{X}_Z(\omega)$ and $\hat{Y}_Z(\omega)$ be the linear estimates of $X(\omega)$ and $Y(\omega)$ from $Z(\omega)$ respectively. The partial coherence at frequency $\omega$ between $x[t]$ and $y[t]$ controlled for $Z[t]$ is the correlation between the residuals $R_X(\omega) = X(\omega) - \hat{X}_Z(\omega)$ and $R_Y(\omega) = Y(\omega) - \hat{Y}_Z(\omega)$ as the number of time samples goes to infinity:

$\mathrm{coh}_{xy|Z}(\omega) = \lim_{N \to \infty} \mathrm{corr}(R_X(\omega), R_Y(\omega)).$ (5.7)

Continuing the analogy with correlation, partial coherence can be quickly calculated by inverting and normalizing a covariance matrix, namely the matrix of power spectral densities. In particular, let matrix $S(\omega)$ be a function of frequency $\omega$ with elements $S_{ij}(\omega) = S_{z_i z_j}(\omega)$ and $H(\omega) = S^{-1}(\omega)$. The partial coherence between $z_i[t]$ and $z_j[t]$ controlled for all other time series in $Z[t]$ at frequency $\omega$ is

$\mathrm{coh}_{z_i z_j | Z - \{z_i, z_j\}}(\omega) = \dfrac{-H_{ij}(\omega)}{\sqrt{H_{ii}(\omega)\, H_{jj}(\omega)}}.$ (5.8)
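A compact numpy sketch of this inversion-and-normalization step at a single frequency is given below; the function name is ours, and S is assumed to be an estimated $n \times n$ Hermitian spectral density matrix.

```python
import numpy as np

def partial_coherence(S):
    """Partial coherence matrix from an n x n Hermitian spectral density
    matrix S at one frequency, per Equation (5.8)."""
    H = np.linalg.inv(S)
    d = np.sqrt(np.real(np.diag(H)))   # diagonal of H is real and positive
    P = -H / np.outer(d, d)            # off-diagonal partial coherences
    np.fill_diagonal(P, 1.0)           # conventionally set the diagonal to 1
    return P
```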
Partial coherence can also be calculated as the coherence of time domain residuals (Dahlhaus, 2000). Let $Z[t] = (z_1[t], \ldots, z_n[t])^T$, $t \in \{0, 1, \ldots, N-1\}$, now be a vector of zero mean, weakly stationary time series and define $Z_{-i-j}[t] = (z_k[t],\ k \ne i, j)$ to be the vector of time series excluding $z_i[t]$ and $z_j[t]$. Define $A_i^*[t]$ to be an optimal mean squared error linear filter for estimating $z_i[t]$ from $Z_{-i-j}[t]$:

$A_i^*[t] = \arg\min_{A_i[t]} E\left(z_i[t] - (A_i[t] * Z_{-i-j}[t])\right)^2$

where $A_i[t] * Z_{-i-j}[t] = \sum_{\tau=0}^{N-1} A_i[t-\tau]\, Z_{-i-j}[\tau]$ and $\varepsilon_i[t] = z_i[t] - (A_i^*[t] * Z_{-i-j}[t])$ is the residual. Repeating this for time series $z_j[t]$ and calculating the coherence between $\varepsilon_i[t]$ and $\varepsilon_j[t]$ gives the partial coherence:

$\mathrm{coh}_{z_i z_j | Z - \{z_i, z_j\}}(\omega) = \lim_{N \to \infty} \mathrm{coh}_{\varepsilon_i \varepsilon_j}(\omega).$ (5.9)

From this it follows that $\mathrm{coh}_{z_i z_j | Z - \{z_i, z_j\}}(\omega) = 0$ for $\omega \in \mathbb{R}$ is equivalent to $\mathrm{cov}(\varepsilon_i[t], \varepsilon_j[t+\tau]) = 0$ for $\tau \in \mathbb{Z}$ as $N \to \infty$.

In practice, however, $N$ is finite and there is no advantage to considering more than $N$ frequency values: $\omega = \frac{2\pi k}{N}$ for $k \in \{0, 1, \ldots, N-1\}$. In terms of the DTFT coefficients $Z(\omega)$, these frequencies correspond to the discrete Fourier transform (DFT) coefficients $Z[k] = \sum_{t=0}^{N-1} Z[t] e^{-j2\pi kt/N}$, and we frame the remaining discussion of coherence estimation and inference in terms of such k-indexed discrete samples in frequency.

Partial coherence is often estimated by first estimating the power spectral density matrix $S(\omega)$ at a particular frequency and then inverting and normalizing it per expression (5.8). Estimation using expression (5.9) is less common because it tends to be more computationally intensive due to the required estimation of $A^*[t]$ for each pair of signals. Here we estimate $S(\omega)$ nonparametrically at a finite, discrete set of frequencies using Welch's overlapped segment averaging (WOSA) method (Welch, 1967) with 50% overlap and a Parzen window data taper. This method first partitions each time series in time into a number of overlapping segments that are then multiplied by a data taper window and discrete Fourier transformed. In particular, let

$\tilde{Z}_i^b[k] = \sum_{t=0}^{N_S - 1} h[t]\, z_i[t + N_D b]\, e^{-j2\pi kt/N_S}$ (5.10)

where $h[t]$ is the unit norm data taper of segment length $N_S$, $N_D$ is the integer distance between the starts of consecutive segments ($0 < N_D \le N_S$), and $b$ is the segment index. Thus, 50% overlap implies $N_D = N_S/2$. The k-th WOSA cross spectral density estimate between $z_i[t]$ and $z_j[t]$ corresponding to frequency $\omega = \frac{2\pi k}{N_S}$ is calculated as

$\hat{S}_{z_i z_j}[k] = \frac{1}{B} \sum_{b=0}^{B-1} \tilde{Z}_i^b[k] \left( \tilde{Z}_j^b[k] \right)^{\!*}$ (5.11)

where $B$ is the number of segments and $N_D(B-1) = N - (N_S - 1)$.

Proper selection of WOSA's data taper and parameters is important because they significantly affect the quality of the estimates. For example, for fixed $N$ and overlap percentage (typically set to 50% in practice), a tradeoff generally occurs between estimator bias and variance in which increasing $N_S$ reduces bias but decreases $B$ and increases variance, while increasing $B$ reduces variance but decreases $N_S$ and increases bias. The WOSA method and other nonparametric power spectral density estimators have been well studied (Percival and Walden, 1993; Priestley, 1981), but we stop here as further discussion is outside the scope of this dissertation.
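A direct numpy transcription of the estimator in (5.10)-(5.11) might look like the following sketch; the Parzen taper comes from scipy, and the function name and interface are ours.

```python
import numpy as np
from scipy.signal.windows import parzen

def wosa_csd(x, y, h, N_D):
    """WOSA cross-spectral density between equal-length series x and y
    (Equations (5.10)-(5.11)): taper h of length N_S, segment shift N_D."""
    N, N_S = len(x), len(h)
    starts = range(0, N - N_S + 1, N_D)            # B segment start indices
    X = np.array([np.fft.fft(h * x[b:b + N_S]) for b in starts])
    Y = np.array([np.fft.fft(h * y[b:b + N_S]) for b in starts])
    return np.mean(X * np.conj(Y), axis=0)         # one estimate per DFT bin k

N_S = 256
h = parzen(N_S)
h = h / np.linalg.norm(h)                          # unit norm, as assumed in (5.10)
# S_hat = wosa_csd(z_i, z_j, h, N_D=N_S // 2)      # 50% overlap
```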
To allow for statistical analysis of the power spectrum, we approximate WOSA estimates as scaled chi-squared distributions with $\nu = 2(E\{\hat{S}[k]\})^2 / \mathrm{var}(\hat{S}[k])$ empirical degrees of freedom, where var denotes variance (Percival and Walden, 1993; Koopmans, 1995). Plugging in approximations of the WOSA estimate's expected value and variance by Percival and Walden (1993) gives

$\nu = \dfrac{2B}{1 + 2\sum_{m=1}^{B-1} \left(1 - \frac{m}{B}\right) \left| \sum_{t=1}^{N_S} h[t]\, h[t + m N_D] \right|^2}$ (5.12)

for $k \ne 0$. For $k = 0$ we halve the degrees of freedom provided by (5.12).

To test for zero partial coherence we transform partial coherence estimates $\widehat{\mathrm{coh}}_{z_i z_j | Z - \{z_i, z_j\}}[k]$ to F-statistics under the null hypothesis that the true partial coherence is zero (Koopmans, 1995; Hannan, 2009):

$(m-n+1)\, \dfrac{|\widehat{\mathrm{coh}}_{z_i z_j | Z - \{z_i, z_j\}}[k]|^2}{1 - |\widehat{\mathrm{coh}}_{z_i z_j | Z - \{z_i, z_j\}}[k]|^2} \sim F_{2,\, 2(m-n+1)}, \quad k \in \{1, 2, \ldots, N_S - 1\} - \{N_S/2\}$

$(m-n-1)\, \dfrac{|\widehat{\mathrm{coh}}_{z_i z_j | Z - \{z_i, z_j\}}[k]|^2}{1 - |\widehat{\mathrm{coh}}_{z_i z_j | Z - \{z_i, z_j\}}[k]|^2} \sim F_{1,\, m-n-1}, \quad k = 0$

$(m-n+1)\, \dfrac{|\widehat{\mathrm{coh}}_{z_i z_j | Z - \{z_i, z_j\}}[k]|^2}{1 - |\widehat{\mathrm{coh}}_{z_i z_j | Z - \{z_i, z_j\}}[k]|^2} \sim F_{1,\, m-n+1}, \quad k = N_S/2$ (5.13)

where $m = \nu/2$ and $n$ is the number of time series in $Z[t]$.
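Under the null hypothesis these statistics yield p-values via the F survival function; a minimal scipy sketch following (5.13) (function name ours) is:

```python
from scipy.stats import f

def partial_coh_pvalue(coh_sq, m, n, k, N_S):
    """p-value for H0: zero partial coherence at DFT bin k, per (5.13).
    coh_sq = squared magnitude of the estimated partial coherence,
    m = nu / 2, n = number of time series in Z[t]."""
    if k == 0:
        dof1, dof2, scale = 1, m - n - 1, m - n - 1
    elif 2 * k == N_S:
        dof1, dof2, scale = 1, m - n + 1, m - n + 1
    else:
        dof1, dof2, scale = 2, 2 * (m - n + 1), m - n + 1
    stat = scale * coh_sq / (1.0 - coh_sq)
    return f.sf(stat, dof1, dof2)        # upper-tail probability
```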
5.1.2 Partial Coherence Based Graph Algorithms

A natural application of partial coherence's ability to quantify linear relationships between time series is to form graphs where the absence of an edge is equivalent to zero partial coherence between the corresponding time series for all $\omega$. In particular, let $V = \{V_1, V_2, \ldots, V_n\}$ be the set of graph vertices corresponding to multivariate, weakly stationary time series $Z[t] = \{z_1[t], z_2[t], \ldots, z_n[t]\}$ with $\{V_i, V_j\}$ a member of edge set $E$ if and only if $V_i$ and $V_j$ are connected or adjacent ($V_i \sim V_j$) in graph $G = (V, E)$. $G$ is said to be a partial correlation graph for time series (Dahlhaus, 2000) if

$$\{V_i, V_j\} \notin E \iff \mathrm{coh}_{z_i z_j | Z - \{z_i, z_j\}}(\omega) = 0, \quad 0 \le \omega < \pi. \tag{5.14}$$

Such graphs have a global Markov property using the same undirected separation criterion (Section 4.1.2) as the well-studied Gaussian graphical models (GGMs; Edwards, 2000; Whittaker, 1990; Pearl, 1988) of Chapter 4, making them particularly useful in conveying partial coherence relationships.

Methods for inferring GGMs can readily be extended to partial correlation graphs for time series by treating the estimation of partial coherence as the estimation of partial correlation between DFT coefficients $Z_i[k] = \sum_{t=0}^{N_S-1} z_i[t] e^{-j2\pi kt/N_S}$, with two main differences. First, methods that hypothesis test for non-zero partial correlation need to have their tests updated to work for Fourier coefficients. Here we do this by equivalently checking for non-zero partial coherence (expressions (5.11), (5.12), and (5.13)). Second, the DFT coefficients need only be taken one frequency at a time, as equation (5.7) shows that the partial coherence at frequency $\omega = \frac{2\pi k}{N_S}$ depends only on the k-th Fourier coefficients. Doing so gives a separate graph and corresponding set of edge p-values for each sampled frequency, with the k-th graph having an edge between $V_i$ and $V_j$ if and only if it is concluded that $\mathrm{coh}_{z_i z_j | Z - \{z_i, z_j\}}[k] \neq 0$. We refer to these graphs as the frequency graphs and note that, per definition (5.14), if all hypothesis tests are correct, then the union over these graphs is the desired partial correlation graph for time series.

In practice, simply taking the union over the graphs at each frequency is not sufficient if some error rate is to be controlled. However, Simes (1986) provides a procedure for testing the intersection null hypothesis that a partial coherence is zero at all $L \le \lfloor N_S/2 + 1 \rfloor$ tested frequencies, where $\lfloor \cdot \rfloor$ denotes the floor function. The reason for the floor function is that it accounts for redundant frequencies due to coherence's symmetry, while the inequality accounts for the fact that it may be preferable or sufficient to consider a subset of the possible frequencies. In particular, if the intersection null hypothesis is rejected when

$$p_{(i)} \le \frac{i\alpha}{L} \tag{5.15}$$

for any $i \in \{1, 2, \ldots, L\}$, then the type I error rate is controlled at $\alpha$, where $p_{(1)} \le p_{(2)} \le \ldots \le p_{(L)}$ are an edge's ordered p-values from testing for non-zero partial coherence at each of the $L$ tested frequencies. Simes proved that this procedure holds for independent p-values and showed that it holds for a variety of multivariate normal and gamma statistics in simulation. We justify its use here by noting that under general conditions the Fourier coefficients of different frequencies are asymptotically independent as the number of samples increases (Priestley, 1981; Anderson, 2011). To combine the frequency graphs we calculate p-values for edges in the partial correlation graph for time series by finding the smallest $\alpha$ for each edge such that the Simes procedure rejects the null hypothesis.
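The combined p-value described above, the smallest $\alpha$ at which (5.15) rejects, reduces to $\min_i L\, p_{(i)}/i$; a minimal sketch follows, with an illustrative function name.

```python
import numpy as np

def simes_pvalue(pvals):
    # Smallest alpha at which the Simes procedure (5.15) rejects the
    # intersection null that the partial coherence is zero at all L
    # tested frequencies: min over i of L * p_(i) / i.
    p = np.sort(np.asarray(pvals, dtype=float))
    L = p.size
    return float(min(1.0, np.min(L * p / np.arange(1, L + 1))))
```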
Inverse Power Spectrum Matrix

For the purpose of generating a baseline method to compare the other methods against, we first estimate the partial coherence matrix by inverting and normalizing $Z[t]$'s estimated power spectral density matrix according to (5.8). These estimates are then hypothesis tested for non-zero partial coherence, and the resulting frequency graph p-values are combined using the Simes procedure (5.15) to get p-values for the edges of the partial correlation graph for time series. These p-values are in turn thresholded to control the type I error rate and form the final partial correlation graph for time series. Going forward we will refer to this multistep procedure as the spectral inversion method.

Spectral PCU

The idea behind the original PCU algorithm (Section 4.1.3) still applies here with respect to the Fourier coefficients of $Z[t]$: leverage the increased statistical power of testing for non-zero partial correlation on smaller control sets that are equivalent to testing on the full control sets. In fact, spectral PCU is identical to the original PCU algorithm with a few exceptions. First, all of the general extensions discussed earlier in this section apply to spectral PCU with the caveat that the maximum p-value over all hypothesis tests for a particular edge and frequency graph is used as the p-value for that edge and frequency graph in the Simes procedure. Second, as a final step the p-values returned by the Simes procedure are input into the same Benjamini-Hochberg procedure used in the original PCU algorithm ($c(m) = 1$; Section 2.3) to control the false discovery rate (FDR) and generate the partial correlation graph for time series. Third, although the original PCU algorithm incorporates the heuristic modification by Li and Wang (Li and Wang, 2009; Section 4.1.3) to control the FDR, we instead chose to incorporate their non-heuristic version of the same modification in spectral PCU, reducing statistical power but favoring faster and more certain asymptotic control of the FDR. This helps compensate for the increased difficulty of controlling the FDR on electrophysiological time series data, where non-stationarity effectively reduces the number of samples. Code for spectral PCU is given in Pseudocodes 2 and 3.

Incorporating Li and Wang's non-heuristic modification over their heuristic modification calls for all edge p-values, as opposed to only the p-values of remaining edges, to be considered by the Benjamini-Hochberg procedure used for pruning frequency graph edges. Although this non-heuristic modification is proven to allow the original PC algorithm to asymptotically control the FDR (Li and Wang, 2009), it is not clear what effect combining the multiple frequency graphs with the Simes procedure will have on spectral PCU's ability to control the FDR. For example, the error control used in pruning the frequency graph edges (Pseudocode 2) need not match the error control applied to the p-values from the Simes procedure (Pseudocode 3), and the interaction between these choices is not immediately evident. Here we take the straightforward approach of controlling the FDR with the Benjamini-Hochberg procedure at the same level $q$ for both.

Pseudocode 2 PCU frequency graph algorithm
Inputs: scalar target false discovery rate $q$, frequency to consider $\omega$, and $n \times T$ data matrix $Z$ consisting of $T$ time observations of a weakly stationary n-length vector time series
Output: the $n \times n$ matrix $P$ of edge p-values for the frequency graph at $\omega$
1: $\hat{G} = (V, \hat{E})$ ← complete graph of n vertices
2: $m$ ← 0
3: repeat
4:   for each edge $e = \{V_i, V_j\} \in \hat{E}$ do
5:     Identify the smaller of the two adjacency sets (adj($V_i$, $\hat{G}$) or adj($V_j$, $\hat{G}$)) unless the smaller adjacency set's size is less than $m$, in which case identify the larger of the two.
6:     neighbors ← the identified adjacency set
7:     for each possible control set $S \subseteq$ neighbors $- \{V_i, V_j\}$ such that $|S| = m$ do
8:       Calculate $\mathrm{coh}_{z_i z_j | S}(\omega)$ from $Z$ and transform to an F-statistic to get p-value $p_{ij|S}$
9:       if $p_{ij|S}$ is the largest p-value calculated yet for edge $e = \{V_i, V_j\}$ and all edges in the graph have been tested at least once then
10:        Perform the Benjamini-Hochberg procedure for controlling the FDR at $q$ on the set of each edge's largest p-value
11:        Remove edges whose maximum p-value fails the Benjamini-Hochberg procedure from $\hat{E}$
12:        if the current edge $e$ was removed then
13:          Exit the inside for loop to return to the outside for loop to select a new edge for testing
14:        end if
15:      end if
16:    end for
17:  end for
18:  $m$ ← $m + 1$
19: until $|\mathrm{adj}(V_i, \hat{G})| - 1 < m$ for all $V_i \in V$
20: $P$ ← matrix of largest p-values for each edge

Pseudocode 3 Spectral PCU algorithm
Inputs: scalar target false discovery rate $q$, L-sized set of frequencies to consider $W$, and $n \times T$ data matrix $Z$ consisting of $T$ time observations of a weakly stationary n-length vector time series
Output: the estimate of the partial correlation graph for time series, $\hat{G} = (V, \hat{E})$
1: for each frequency $\omega \in W$ do
2:   Run the PCU frequency graph algorithm (Pseudocode 2) with $q$, $\omega$, and $Z$ as input
3:   $P(\omega)$ ← returned p-value matrix
4: end for
5: Perform the Simes procedure on $\{P(\omega), \omega \in W\}$ to get a p-value for every possible edge $\{V_i, V_j\}$ where $V_i \in V$, $V_j \in V$, and $V_i \neq V_j$
6: Perform the Benjamini-Hochberg procedure for controlling the FDR at $q$ on the set of resulting p-values
7: $\hat{E}$ ← set of edges whose p-value passes the Benjamini-Hochberg procedure
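The Benjamini-Hochberg step used in both pseudocodes is standard; a minimal sketch with $c(m) = 1$ follows, with an illustrative function name.

```python
import numpy as np

def benjamini_hochberg(pvals, q):
    # Benjamini-Hochberg procedure with c(m) = 1 at target FDR q.
    # Returns a boolean mask, True where the null is rejected; in the
    # pruning steps above, edges whose largest p-value is NOT rejected
    # are the ones removed from the graph.
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    below = p[order] <= q * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()
        reject[order[: k + 1]] = True
    return reject
```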
Spectral Glasso

Given samples from a multivariate normal (MVN) distribution, the original graphical lasso method (glasso; Section 4.1.3; Friedman et al., 2008) seeks to estimate the inverse of the covariance matrix (the concentration matrix) through numerical maximization of the MVN $\ell_1$-penalized log likelihood

$$\ln(\det(K)) - \mathrm{tr}(SK) - \lambda \|K\|_1 \tag{5.16}$$

over non-negative definite concentration matrices $K$. Here $S$ is the sample covariance matrix, $\mathrm{tr}(\cdot)$ denotes the trace of a matrix, $\lambda$ is a tuning parameter, and $\|\cdot\|_1$ denotes the $\ell_1$ norm (the sum over the absolute values of the elements). The addition of the $\ell_1$ norm has a beneficial sparsifying effect on the solution and allows $K$ to be estimated from fewer samples than would otherwise be possible. However, it also requires careful selection of $\lambda$, which is nontrivial as the optimal $\lambda$ varies with $S$, and we are currently aware of no method for selecting $\lambda$ to control some multiple comparison error rate. As such, glasso takes more of an estimation theoretic perspective compared to spectral PCU's detection theoretic perspective, which focuses more on the error-controlled detection of non-zero partial coherences than on the partial coherence values themselves.

Spectral glasso treats the problem of inferring partial coherence from time series as a problem of inferring the concentration matrix of $Z[t]$'s DFT coefficients $Z[k] = (Z_1[k], Z_2[k], \ldots, Z_n[k])^T$ at a particular frequency corresponding to index $k$. This alternative formulation is solved in similar fashion as the original glasso method by maximizing an approximation of $Z[k]$'s $\ell_1$-penalized log likelihood with respect to its concentration matrix. If $Z[t]$ is a Gaussian process, then $Z[k] \in \mathbb{C}^n$ is distributed complex MVN for each $k$. Furthermore, if $Z[t]$ is not a Gaussian process, then $Z[k]$ is asymptotically distributed complex MVN under general conditions as the number of samples grows (Priestley, 1981; Anderson, 2011), and spectral glasso uses this to justify its assumption that $Z[k]$ is distributed complex MVN.

The distribution of complex MVN random variables, say $z = Z[k_0]$ for some constant $k_0$, depends on three parameters: mean $\mu = E\{z\}$, covariance matrix $\Gamma = E\{(z - \mu)(z - \mu)^H\}$, and relation matrix $C = E\{(z - \mu)(z - \mu)^T\}$, where $(\cdot)^H$ and $(\cdot)^T$ denote the conjugate transpose and transpose operators respectively. Circular symmetric complex MVN distributions depend only on $\Gamma$, as $\mu$ and $C$ are zero, and can readily be shown to have the same likelihood functions as their real MVN counterparts by letting $K = \Gamma^{-1}$ and $S$ be the sample covariance matrix computed with the conjugate transpose operator. Because we are only interested in $\Gamma$, as it specifies all of the partial coherence relationships, we simplify the $\ell_1$-penalized log likelihood maximization problem by treating $Z[k]$ as though it follows such a circular symmetric distribution.

As with the other methods, spectral glasso solves its subproblem (5.16) for $Z[k]$ one frequency at a time. Based on the fact that $Z[k]$'s covariance matrix asymptotically approaches $N_S$ times $Z[t]$'s power spectral density matrix at $\omega = \frac{2\pi k}{N}$ (equation (5.6)), spectral glasso solves (5.16) $L$ times, once with each frequency's power spectral density matrix scaled by $N_S$ input for the sample covariance $S$. In practice, however, we normalize the power spectral density input matrices to be coherence matrices (expression (5.1)) so that the resulting partial coherence matrices are scale invariant and the regularization parameter $\lambda$ applies a more consistent penalty to each partial coherence. The output $K$'s are normalized according to (5.8) to get a partial coherence estimate at each frequency. Because the $\ell_1$ norm prevents the calculation of p-values, the Simes procedure is not applicable, and spectral glasso instead forms frequency graphs from the non-zero partial coherence estimates and takes their union to generate the partial correlation graph for time series.

For this dissertation we solve (5.16) using YALMIP (Löfberg, 2004) and the SDPT3 solver (Toh et al., 1999) with default parameters. Because the $\ell_1$ norm often forces elements of a solution to be identically zero, when YALMIP returned solutions with values less than 1E-3, we assumed poor convergence and treated them as zeros.
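Although we used YALMIP and SDPT3, the per-frequency subproblem (5.16) can also be expressed in a few lines of CVXPY; the sketch below handles the real-valued case for illustration (per the discussion above, the circular symmetric complex case has the same likelihood form), and the function name is ours rather than part of the dissertation's code.

```python
import cvxpy as cp
import numpy as np

def penalized_concentration(S, lam):
    # Maximize the l1-penalized log likelihood (5.16) over positive
    # semidefinite concentration matrices K, for a real-valued sample
    # covariance (or coherence) matrix S and tuning parameter lam.
    n = S.shape[0]
    K = cp.Variable((n, n), PSD=True)
    objective = cp.Maximize(cp.log_det(K) - cp.trace(S @ K)
                            - lam * cp.sum(cp.abs(K)))
    cp.Problem(objective).solve()
    return K.value
```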
5.1.3 Granger Causality

Let $Z[t] = (z_1[t], z_2[t], \ldots, z_n[t])^T$ be a weakly stationary vector time series following the multivariate autoregressive (MVAR) model

$$Z[t] = \sum_{\tau=1}^{p} A[\tau]\, Z[t - \tau] + \epsilon[t] \tag{5.17}$$

where $A[\tau]$ is the $n \times n$ transfer function matrix, $p$ is the model order, and $\epsilon[t]$ is the n-length innovation or residual vector of white noise, i.e., $E\{\epsilon[t]\} = 0$ and $E\{\epsilon[t]\epsilon^T[t+\tau]\}$ is identity for $\tau = 0$ and zero otherwise. If the past of $z_i[t]$ helps predict $z_j[t]$ after first regressing out the pasts of all other time series in $Z[t]$ from $z_j[t]$, including $z_j[t]$ itself, then $z_i[t]$ is said to Granger cause $z_j[t]$. More simply, $z_i[t]$ Granger causes $z_j[t]$ if and only if there exists a $\tau \in \{1, 2, \ldots, p\}$ such that element $(j, i)$ of $A[\tau] \neq 0$ (Granger, 1969).

Expression (5.17) is a compact form for specifying multiple linear models, one for each $z_i[t]$, $i \in \{1, 2, \ldots, n\}$, that can be rearranged to make the estimation of $A[\tau]$ a single, large application of linear regression or $n$ smaller applications, one for each $z_i[t]$. To see that the estimation of $A[\tau]$ decouples into $n$ smaller estimation problems, note that (5.17) can be written even more compactly in its companion form as

$$Z'[t] = A' Z'[t - 1] + \epsilon'[t] \tag{5.18}$$

where $Z'[t] = (z_1[t], z_2[t], \ldots, z_n[t], z_1[t-1], z_2[t-1], \ldots, z_n[t-(p-1)])^T$ is the np-length vector of stacked lags, $\epsilon'[t] = (\epsilon^T[t], 0, \ldots, 0)^T$, and $A'$ is the $np \times np$ matrix

$$A' = \begin{bmatrix} A[1] & A[2] & \cdots & A[p-1] & A[p] \\ I & 0 & \cdots & 0 & 0 \\ 0 & I & & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & I & 0 \end{bmatrix} \tag{5.19}$$

where $I$ is the $n \times n$ identity matrix. Right multiplying (5.18) by $Z'^T[t-1]$ and taking the expected value gives

$$E\{Z'[t] Z'^T[t-1]\} = A' E\{Z'[t-1] Z'^T[t-1]\}$$
$$\Rightarrow C_{Z'Z'}[1] = A' C_{Z'Z'}[0]$$
$$\Rightarrow A'^T = C_{Z'Z'}^{-T}[0]\, C_{Z'Z'}^T[1] \tag{5.20}$$

where $C_{Z'Z'}[\tau]$ is the autocovariance matrix for $Z'[t]$ at lag $\tau$ and is assumed invertible; the innovation term drops out because $E\{\epsilon'[t] Z'^T[t-1]\} = 0$. Vectorizing $A'^T$ gives

$$\mathrm{vec}(A'^T) = \left(I \otimes C_{Z'Z'}^{-T}[0]\right) \mathrm{vec}\left(C_{Z'Z'}^T[1]\right) \tag{5.21}$$

where $\otimes$ denotes the Kronecker product. Letting $a_i$ denote the i-th row of $A'$ and $C_{Z'_i Z'}[1]$ be the i-th row of $C_{Z'Z'}[1]$ allows (5.21) to be written as

$$\mathrm{vec}(A'^T) = \begin{bmatrix} a_1^T \\ a_2^T \\ \vdots \\ a_{np}^T \end{bmatrix} \tag{5.22}$$

$$\left(I \otimes C_{Z'Z'}^{-T}[0]\right) \mathrm{vec}\left(C_{Z'Z'}^T[1]\right) = \begin{bmatrix} C_{Z'Z'}^{-T}[0] & 0 & \cdots & 0 \\ 0 & C_{Z'Z'}^{-T}[0] & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & C_{Z'Z'}^{-T}[0] \end{bmatrix} \begin{bmatrix} C_{Z'_1 Z'}^T[1] \\ C_{Z'_2 Z'}^T[1] \\ \vdots \\ C_{Z'_{np} Z'}^T[1] \end{bmatrix}. \tag{5.23}$$

Thus $a_i = C_{Z'_i Z'}[1]\, C_{Z'Z'}^{-1}[0]$, which is the standard solution for the linear regression of $z_i[t]$ on $Z'[t-1]$ for $1 \le i \le n$.
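A compact NumPy sketch of this least squares solution follows; the function name is illustrative, and the implementation solves the $n$ decoupled regressions jointly through a stacked design matrix rather than through the autocovariance matrices above, to which it is equivalent.

```python
import numpy as np

def fit_mvar(Z, p):
    # Least squares fit of the MVAR model (5.17).
    # Z: (n, T) data array; returns A with shape (p, n, n) such that
    # Z[:, t] is predicted by sum over tau of A[tau-1] @ Z[:, t-tau].
    n, T = Z.shape
    # Stack lagged regressors for prediction targets t = p, ..., T-1.
    X = np.hstack([Z[:, p - tau : T - tau].T for tau in range(1, p + 1)])  # (T-p, n*p)
    Y = Z[:, p:].T                                                         # (T-p, n)
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)                           # (n*p, n)
    # Rearrange so that A[tau-1][j, i] is the coefficient of z_i[t-tau]
    # in the prediction of z_j[t].
    return coef.T.reshape(n, p, n).transpose(1, 0, 2)
```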
To perform a hypothesis test for Granger causality let $Z[t]$ and $z_j[t]$, $t \in \{1, 2, \ldots, T\}$, now represent realizations of the random processes they previously represented. Further, let control set $S$ be a subset of $\{1, 2, \ldots, n\}$ and $\hat{A}_{j\cdot|S}[\tau]$ be the estimate of the j-th row of the transfer function $A[\tau]$ using only $\{z_k[t], k \in S\}$, such that the order of the values is maintained with respect to $Z[t]$ by inserting zeros in the column locations corresponding to the excluded time series. Further still, let residual $\hat{\epsilon}_{j|S}[t] = z_j[t] - \sum_{\tau=1}^{p} \hat{A}_{j\cdot|S}[\tau]\, Z[t-\tau]$, $p+1 \le t \le T$, and residual sum of squares $\mathrm{RSS}_{j|S} = \sum_{t=p+1}^{T} \hat{\epsilon}^2_{j|S}[t]$. To test for Granger causality from $z_i[t]$ to $z_j[t]$ with control set $S$ we use the fact that

$$\frac{(T - p(m+2))\left(\mathrm{RSS}_{j|S-i} - \mathrm{RSS}_{j|S}\right)}{p\,\mathrm{RSS}_{j|S}} \sim F_{p,\; T - p(m+2)} \tag{5.24}$$

where $m$ is the cardinality of $S$ ($|S|$), and we assume that the mean of the time series was estimated and removed to better fit the MVAR model (Brandt and Williams, 2006). The definition of Granger causality specifies that this test needs to be performed with maximum $S = \{1, 2, \ldots, n\}$; however, as will be discussed below in regards to the PCU extension to Granger causality, sometimes it is preferable to test on smaller, but equivalent, control sets.
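A minimal sketch of this test follows, implementing (5.24) as stated; the function names are illustrative, and we adopt the convention that the control set passed in contains both $i$ and $j$ (so that $z_j$'s own past is always among the regressors), which the original implementation may have handled differently.

```python
import numpy as np
from scipy.stats import f as f_dist

def rss(Z, j, chans, p):
    # Residual sum of squares from regressing z_j[t] on the order-p
    # pasts of the channels listed in `chans`.
    n, T = Z.shape
    X = np.hstack([Z[list(chans)][:, p - tau : T - tau].T
                   for tau in range(1, p + 1)])
    y = Z[j, p:]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return float(r @ r)

def granger_pvalue(Z, i, j, S, p):
    # F-test (5.24) for z_i Granger causing z_j with control set S,
    # where S contains both i and j; degrees of freedom follow (5.24).
    T = Z.shape[1]
    m = len(S)
    rss_full = rss(Z, j, list(S), p)                   # RSS_{j|S}
    rss_red = rss(Z, j, [k for k in S if k != i], p)   # RSS_{j|S-i}
    dof = T - p * (m + 2)
    F = dof * (rss_red - rss_full) / (p * rss_full)
    return float(f_dist.sf(F, p, dof))
```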
5.1.4 Granger Causality Based Graph Algorithms

Granger causality leads to a directed graphical representation. In particular, let $V = \{V_1, V_2, \ldots, V_n\}$ be the set of graph vertices corresponding to the mean zero, weakly stationary time series vector $Z[t] = (z_1[t], z_2[t], \ldots, z_n[t])^T$ with $(V_i, V_j) \in E$ if and only if there is a directed edge from $V_i$ to $V_j$ ($V_i \to V_j$) in graph $G = (V, E)$. Assuming $Z[t]$ follows MVAR model (5.17), $G$ is said to be a Granger causality graph if

$$V_i \nrightarrow V_j \iff A_{ji}[\tau] = 0 \quad \forall\, \tau \in \{1, 2, \ldots, p\} \tag{5.25}$$

where $A_{ji}[\tau]$ denotes element $(j, i)$ of the MVAR model transfer function $A[\tau]$. Such graphs have a number of interesting properties and relationships with other types of graphs, including the partial correlation graphs for time series of Section 5.1.2, but this topic is outside the scope of this dissertation, and we refer interested readers to (Eichler, 2007) and (Dahlhaus and Eichler, 2003) for more information. This section presents the extensions of the previous three methods to Granger causality and the inference of Granger causality graphs.

Regression

We use standard linear regression to solve for the transfer function and act as a baseline method to compare the other methods against. This is equivalent to minimizing the empirical residuals:

$$\hat{A}[\tau] = \arg\min_{A[\tau]} \sum_{t=p+1}^{T} \left\| Z[t] - \sum_{\tau=1}^{p} A[\tau]\, Z[t-\tau] \right\|_2^2 \tag{5.26}$$

where $Z[t]$ is used to represent a realization of the vector time series and $\|\cdot\|_2^2$ denotes the squared $\ell_2$ norm. In testing for Granger causality via (5.24), a restricted model, such as $\hat{A}_{j\cdot|S}$, is estimated by excluding time series not in $S$ when estimating the transfer function. The resulting p-values can then be thresholded to control the type I error rate and form Granger causality graphs.

PC MVAR

Because the problem of estimating the transfer function decouples into a separate linear regression problem for each individual time series, when estimating the transfer function coefficients for $z_i[t]$ corresponding to $\hat{A}_{i\cdot|S}$, the control set $S$ need only include the time series that Granger cause $z_i[t]$. That is, if $S = \{k : A_{ik}[\tau] \neq 0$ for any $\tau \in \{1, 2, \ldots, p\}\}$, then $A_{i\cdot|S}[\tau] = A_{i\cdot}[\tau]$, where $A_{i\cdot|S}[\tau]$ is the i-th row of the optimal transfer function for the reduced model implied by $S$ and order is again maintained with respect to $Z[t]$ by inserting zeros at the locations of the excluded time series. This is significant because although the control sets are equivalent in terms of Granger causality, the hypothesis test for Granger causality in (5.24) will generally have greater statistical power and require fewer time samples on these reduced control sets. The test for Granger causality in (5.24) corroborates this, as its degrees of freedom and numerator scaling factor decrease by $p$ for every additional variable in the control set.

It is these reduced control sets that serve as the basis of PC MVAR, the PCU extension to Granger causality. PC MVAR replaces the partial correlation tests on different control sets in PCU with tests for Granger causality on different control sets that imply different restricted models (5.24), leading to a number of differences. First, because the graphs are directed, the algorithm checks both edge directions and includes both edges in the graph-wide hypothesis test to control the FDR. Second, as with the original PCU algorithm, PC MVAR continuously updates its graph during estimation such that not all possible control sets need be tested; however, the decoupling of the MVAR model solutions changes the determination of these control sets. In particular, only time series with edges feeding into $z_i[t]$ need to be included in the control sets used to test any edge into $z_i[t]$. Code for PC MVAR is given in Pseudocode 4.

By starting with small control sets that grow with each iteration through the graph, PC MVAR attempts to test for the Granger causality of each edge using the smallest control set size required. To ensure that a valid edge is not removed, this approach requires that two Granger causal time series on the full model not be noncausal on a reduced model. While such a situation is possible, it requires model parameters to precisely cancel out, which is very unlikely (Litterman and Weiss, 1983).

As with spectral PCU, we chose to incorporate Li and Wang's non-heuristic modification for controlling the FDR (Li and Wang, 2009) in PC MVAR to again favor faster and more certain control of the FDR. Because of PC MVAR's similarity and conceptual consistency with Li and Wang's non-heuristically modified PC algorithm, which they proved asymptotically controls the FDR, we hypothesize that PC MVAR also shares this property, but do not prove it.
Pseudocode 4 PC MVAR algorithm
Inputs: the target false discovery rate $q$, and $n \times T$ data matrix $Z$ consisting of $T$ time observations of a weakly stationary n-length vector time series
Output: the estimate of the directed Granger causality graph $\hat{G} = (V, \hat{E})$
1: $\hat{G} = (V, \hat{E})$ ← complete graph of n vertices, including self-edges, i.e., $(V_i, V_i)$
2: $m$ ← 0
3: repeat
4:   for each edge $e = (V_i, V_j) \in \hat{E}$ do
5:     parents ← $\{k : (V_k, V_j) \in \hat{E}\}$
6:     for each possible control set $S \subseteq$ parents $- i$ such that $|S| = m$ do
7:       Calculate $\mathrm{RSS}_{j|S \cup i}$ and $\mathrm{RSS}_{j|S}$ from $Z$ and transform to an F-statistic (5.24) to get p-value $p_{ij|S}$
8:       if $p_{ij|S}$ is the largest p-value calculated yet for edge $e = (V_i, V_j)$ and all edges in the graph have been tested at least once then
9:         Perform the Benjamini-Hochberg procedure for controlling the FDR at $q$ on the set of each edge's largest p-value
10:        Remove edges whose maximum p-value fails the Benjamini-Hochberg procedure from $\hat{E}$
11:        if the current edge $e$ was removed then
12:          Exit the inside for loop to return to the outside for loop to select a new edge for testing
13:        end if
14:      end if
15:    end for
16:  end for
17:  $m$ ← $m + 1$
18: until $|\{k : (V_k, V_j) \in \hat{E}\}| - 1 < m$ for all $V_j \in V$

$\ell_1$-regularized Regression

The extension of the glasso method to Granger causality (Valdés-Sosa et al., 2005) calls for $\ell_1$ regularization of the transfer function estimation problem (5.26) to encourage sparsity:

$$\hat{A}[\tau] = \arg\min_{A[\tau]} \sum_{t=p+1}^{T} \left\| Z[t] - \sum_{\tau=1}^{p} A[\tau]\, Z[t-\tau] \right\|_2^2 + \lambda \|A\|_1. \tag{5.27}$$

As with glasso and spectral glasso, there does not seem to be an established way to select the tuning parameter $\lambda$ such that an error rate is controlled for the entire graph, and effective selection of $\lambda$ can be difficult. Whereas the other Granger causality methods could use the F-test to infer Granger causality at a controlled error rate, the F-test does not apply to transfer function estimates from (5.27). Instead, the $\ell_1$-regularized regression method infers an edge if and only if the corresponding element of $\hat{A}[\tau]$ is non-zero at any $\tau \in \{1, 2, \ldots, p\}$. We used YALMIP (Löfberg, 2004) and the SDPT3 solver (Toh et al., 1999) with default parameters to solve (5.27). In practice, because of incomplete convergence or numerical issues, the zeros of $\hat{A}[\tau]$ do not necessarily reach zero, so we treated any values less than 1E-3 as zero.
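As with spectral glasso, problem (5.27) can be posed directly in CVXPY as an alternative to the YALMIP/SDPT3 setup; the sketch below, with an illustrative function name, also mimics the 1E-3 thresholding described above.

```python
import cvxpy as cp
import numpy as np

def l1_mvar(Z, p, lam):
    # l1-regularized least squares estimate of the transfer function,
    # problem (5.27), with tuning parameter lam. Returns A of shape
    # (p, n, n), with tiny values treated as exact zeros as above.
    n, T = Z.shape
    A = [cp.Variable((n, n)) for _ in range(p)]
    # Predicted values of Z[:, t] for t = p, ..., T-1 from the p lags.
    pred = sum(A[tau] @ Z[:, p - 1 - tau : T - 1 - tau] for tau in range(p))
    resid = cp.sum_squares(Z[:, p:] - pred)
    penalty = lam * sum(cp.sum(cp.abs(A_l)) for A_l in A)
    cp.Problem(cp.Minimize(resid + penalty)).solve()
    A_hat = np.stack([A_l.value for A_l in A])
    A_hat[np.abs(A_hat) < 1e-3] = 0.0
    return A_hat
```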
5.2 Simulation

5.2.1 Method Comparison in Simulation

Similar to Section 4.2, we compare the three partial coherence based methods against each other and the three Granger causality based methods against each other in simulation. To do this we generated synthetic data from an order 5 MVAR model with standard normal innovations ($\epsilon[t]$ in (5.17)). Given a target topography for the Granger causality graph, the transfer function $A[\tau]$ is generated by selecting random poles of an infinite impulse response filter and using the resulting transfer function coefficients as the diagonal. The off-diagonal locations (i.e., those corresponding to edges in the Granger causality graph) have a 50% chance of being non-zero for each $\tau \in \{1, 2, \ldots, p\}$, except that each edge had a randomly selected $\tau$ for which the transfer function is guaranteed to be non-zero so as to ensure that the target topography is achieved. These non-zero off-diagonal locations are assigned random coefficients uniformly distributed between 0.5 and 1 that are scaled by their corresponding diagonal coefficients to keep them less than these coefficients and increase the likelihood of generating a stable transfer function. This procedure is repeated until it results in a stable transfer function.

The target topography for $n$ signals was randomly generated according to two parameters $b$ and $d$ as follows. Starting with a completely connected topography, $b(n^2 - n)/2$ randomly generated (with repetition) bi-directional edges are removed from the topography. Next, $d(n^2 - n)$ randomly generated (with repetition) directed edges are added to the topography. For the comparison of Granger causality based methods we chose $n = 20$, $b = 3.1$, and $d = 0.04$, which generates sparse Granger causality graphs with ~33 directed edges on average (standard deviation 5.6). For the comparison of partial coherence based methods we chose $n = 20$, $b = 3.8$, and $d = 0.02$, which generates sparse partial correlation graphs for time series with ~19 undirected edges on average (standard deviation 5.8).

To fairly compare the methods we tuned their parameters to find the true number of edges for each simulation in the same fashion as described for the Gaussian graphical model methods of Section 4.2.1. However, the tuning of the $\ell_1$-regularized methods required some changes. In particular, for spectral glasso we perform the same cross-validation procedure with the same MVN likelihood function measure of performance to evaluate different $\lambda$'s, except that the likelihood function is now calculated once for each frequency and the coherence matrix is used in place of the sample covariance matrix. The results are then averaged over frequency to get the performance measure for a fold of the cross-validation procedure. Also, to infer exactly the target $d_{\mathrm{tar}}$ edges for an optimal $\lambda$ (as chosen by the cross-validation procedure) we average the squared magnitudes of the partial coherence estimates over frequency and select the edges corresponding to the $d_{\mathrm{tar}}$ largest as the inferred graph.

To select $\lambda$ for the inference of Granger causality graphs by the $\ell_1$-regularized regression method we used a bias-variance tradeoff curve in place of cross-validation because of computational complexity. The bias-variance tradeoff curve is generated by plotting the sum of the residual sum of squares for each time series (Section 5.1.3) against the $\ell_1$ norm of the estimated transfer function for 20 $\lambda$ values evenly spaced from -6 to 0 in log scale. This curve resembles a smoothed 'L', and the optimal $\lambda$ is selected as the $\lambda$ corresponding to the point closest to the notch at the bottom left of the curve. With respect to the optimal $\lambda$'s transfer function, for each potential edge $V_i \to V_j$, the variance of residual $\hat{\epsilon}_j[t]$ on the reduced model excluding $z_i[t]$ is calculated, and the edges corresponding to the $d_{\mathrm{tar}}$ largest of these values are included in the final graph.

In regards to the selection of other parameters, we set the Granger causality based methods to only solve for MVAR models of order 5. For the partial coherence based methods we found blocksize $N_S = 32$ gave a satisfactory compromise between bias and variance for the WOSA spectral density estimator. To reduce computational time we only considered estimates at frequencies $\omega \in \{2\pi k/6 : k \in \{0, 1, 2, 3\}\}$, which were calculated through linear interpolation of the original 32 WOSA estimates.

We again evaluate performance in terms of the Dice coefficient and normalized inner product (Section 4.2.1). The Dice coefficient is evaluated between the true edge set and the inferred edge set and is a measure of overlap between the two sets. The normalized inner product measures the cosine of the angle between the vectorized true transfer function and a vectorized estimate of the transfer function that is constrained to the topography of a method's inferred Granger causality graph. While it is possible to similarly estimate partial coherence matrices with constrained topography, it is less straightforward, more computationally intensive, and not sufficiently informative to warrant its inclusion here.
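The Dice coefficient computation is a one-liner over edge sets; a minimal sketch, with an illustrative function name, follows.

```python
def dice_coefficient(edges_true, edges_est):
    # Dice coefficient between the true and inferred edge sets,
    # 2|A intersect B| / (|A| + |B|), ranging from 0 (disjoint) to 1.
    a, b = set(edges_true), set(edges_est)
    if not a and not b:
        return 1.0
    return 2.0 * len(a & b) / (len(a) + len(b))
```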
The performance measures were evaluated over a range of sample sizes and averaged over 15 trials for each method. Figure 5.1 shows the Dice performance measure for the partial coherence based methods, and Figure 5.2 shows the Dice and normalized inner product performance measures for the Granger causality based methods. For low sample sizes the parameter tuning of the PC type methods and the spectral glasso method failed, such that the number of inferred edges was less than the true number of edges ($d_{\mathrm{tar}}$) by a wide margin. Nevertheless, because we are concerned with performance at small sample sizes we include these data points and note that the problem occurred for the six smallest sample sizes for spectral PCU and spectral glasso and for the two smallest sample sizes for PC MVAR. This complicates comparing the methods at these sample sizes because their sparsities no longer match and it is unclear whether finding fewer edges helps or hurts performance, as the answer depends on the accuracy of the missing edges. Regardless, the methods would still perform better than the inverse spectral density and standard regression methods, which do not have enough samples to perform the necessary hypothesis tests at many of these points.

Figure 5.1: Dice coefficient performance measure versus number of samples for each partial coherence based method on synthetic data.

The figures show the $\ell_1$ regularization based methods generally outperforming the baseline standard regression and inverse spectral density methods at low sample sizes. The PC based methods in turn tend to tie or outperform the $\ell_1$ regularization based methods at low sample sizes. In regards to the spectral methods, the lack of convergence to the true graph and corresponding change in behavior for large sample sizes is likely due to the number of samples having grown large enough that the spectral density estimator's bias dominates its variance, violating a key assumption of the partial coherence hypothesis test and preventing increased samples from significantly improving performance. The $\ell_1$-regularized Granger causality method fails to converge because the 'L' curve method of selecting $\lambda$ is overly aggressive for large sample sizes, where $\lambda$ should be very close to zero.

Figure 5.2: Dice coefficient and inner product performance measures versus number of samples for each Granger causality based method on synthetic data.

5.2.2 Control of the False Discovery Rate

Given that Li and Wang's non-heuristic modification to the PC algorithm (Li and Wang, 2009) is proven to asymptotically control the FDR, one might expect spectral PCU and PC MVAR to do so as well, considering that they maintain its basic structure and are conceptually consistent with it. We see no reason for this to not be the case, but do not prove it here. Instead, we estimate the methods' abilities to control the FDR on the simulated data of Section 5.2.1 by averaging their empirical FDR (number of incorrectly inferred edges / total number of inferred edges) over 40 trials across a range of sample sizes. For the WOSA spectral density estimator, the blocksize was set to grow with the number of samples $T$ as $N_S = T^{0.38}$ to ensure that the estimator's bias and variance would both decrease as $T$ grew.
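The empirical FDR used here is likewise straightforward to compute per trial before averaging; a minimal sketch, with an illustrative function name, follows.

```python
def empirical_fdr(edges_true, edges_est):
    # Empirical false discovery rate: number of incorrectly inferred
    # edges over total inferred edges (taken as 0 when no edges are
    # inferred).
    est = set(edges_est)
    if not est:
        return 0.0
    return len(est - set(edges_true)) / len(est)
```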
Figure 5.3 shows the average empirical FDR for target FDRs of 15%, 5%, and 1% converging properly. Both methods quickly reduced their empirical FDR to be within neighborhoods of their targets, but spectral PCU took many samples to fully achieve its target rates. While it is very possible that spectral PCU's additional complexity due to the Simes procedure for combining frequency graphs accounts for much of its slower convergence, it should also be noted that PC MVAR likely benefited from the synthetic data being generated by an MVAR model.

Figure 5.3: Average empirical FDR versus sample size for (a) spectral PCU and (b) PC MVAR on their simulated MVAR models.

5.3 Real Data

Here we apply the spectral PCU and PC MVAR methods to functional brain data from a task-based experiment (Bressler et al., 1993) that has been widely studied (Ding et al., 2000; Zhang et al., 2008; Ledberg et al., 2007; Aydore et al., 2013). The experiment used 51 μm diameter, bipolar electrode pairs with a tip separation of 2.5 mm at 15 cortical sites in the right hemisphere (Figure 5.4) of an adult rhesus macaque monkey to measure its local field potentials (LFPs) as it performed a visual pattern discrimination task. In particular, 115 ms after pulling a lever the monkey was presented with either a line or diamond pattern (Figure 5.5) signifying whether the lever was to be released (go) or remain held (no-go). Correctly releasing the lever within 500 ms of the go stimulus resulted in a reward. Multiple trials were run each day over a period of 18 days, with the go and no-go stimulus patterns switching between days. The data was originally sampled at 200 Hz and had bad trials discarded, leaving 10,178 trials, of which 5,225 were go (2,322 line pattern = go over 8 days; 2,903 diamond pattern = go over 10 days) and 4,953 were no-go (2,546 line pattern = no-go; 2,407 diamond pattern = no-go).

The data was processed a day at a time with the go and no-go trials treated separately. This included bandpass filtering from 10–30 Hz with an order 14 two-way elliptic IIR filter, subtraction of the evoked response (as estimated by averaging over the stimulus locked trials), and temporal detrending. To deal with non-stationarity we applied spectral PCU and PC MVAR to short time windows. Following Aydore et al. (2013) and Ding et al. (2000) we chose two time windows, one centered at 120 ms after visual stimulus (early response) and another centered at 278 ms after stimulus (late response), both only 80 ms (16 samples) wide.
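A minimal SciPy sketch of this preprocessing pipeline follows; the filter order and ripple settings below are placeholders rather than the study's exact values, and the function name is illustrative. The two-way (zero-phase) filtering is realized with `sosfiltfilt`.

```python
import numpy as np
from scipy.signal import ellip, sosfiltfilt, detrend

def preprocess(trials, fs=200.0):
    # trials: (n_trials, n_channels, T) array of stimulus-locked LFPs.
    # Bandpass 10-30 Hz with a two-way elliptic IIR filter, subtract
    # the trial-averaged evoked response, and linearly detrend in time.
    # NOTE: the order (7, applied forward and backward) and ripple
    # values (0.5 dB passband, 40 dB stopband) are assumptions.
    sos = ellip(7, 0.5, 40.0, [10.0, 30.0], btype='bandpass',
                fs=fs, output='sos')
    filtered = sosfiltfilt(sos, trials, axis=-1)
    # Remove the evoked response estimated as the trial average.
    residual = filtered - filtered.mean(axis=0, keepdims=True)
    return detrend(residual, axis=-1)
```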
Application of spectral PCU (blocksize $N_S = 16$, target FDR 0.01) and PC MVAR (MVAR model order 3, target FDR 0.005) to the short time windows of the go and then no-go trials resulted in a go and a no-go graph for each day and time window. Further following Aydore et al. (2013), we compare go versus no-go connectivity for the late response window and line versus diamond connectivity for the early response window. To help compare go versus no-go, we counted the number of days an edge was present in a method's go graph and did the same for the no-go graphs. To compare line versus diamond, we counted the number of days an edge was present in a method's graph corresponding to the monkey being shown a line stimulus, which could be the go or no-go graph depending on the day. This count was similarly repeated for the diamond stimulus.

Figure 5.6 shows the go versus no-go connectivity results by only plotting the edges for which the magnitude of the difference between the go and no-go counts is at least 5 for spectral PCU and at least 6 for PC MVAR, and Table 5.1 lists their counts. There is significant similarity between the spectral PCU and PC MVAR graphs as well as between these graphs and the corresponding graphs by Aydore et al. (2013) and Ding et al. (2000). Both graphs show long connections between the visual cortex and the sensorimotor cortex for the go condition that are not present for the no-go condition, which is consistent with the involvement of a visual to motor component for go trials that is not present for no-go trials. Furthermore, in the PC MVAR graph these connections are either bi-directional or directed from the visual cortex to the sensorimotor cortex, which is consistent with the temporal ordering of the experiment.

Figure 5.7 shows the line versus diamond connectivity results by only plotting the edges for which the magnitude of the difference between the line and diamond counts is at least 6, which is an increase of the previous threshold for spectral PCU to help compensate for the line pattern only meaning go on 8 of the days compared to the diamond pattern's 10 days. Also, Table 5.2 lists the counts for each edge. Compared to the go versus no-go results, both graphs show a loss of the long range visual to sensorimotor connections and an addition of short connections within the visual cortex, which is consistent with experimental expectations, as the difference between the two conditions is purely visual. In addition to being consistent with one another, the spectral PCU and PC MVAR results again closely match those of Aydore et al. (2013).

5.4 Summary

We proposed three novel methods for inferring direct connectivity in multivariate time series, two based on the PC algorithm and one based on the graphical lasso algorithm. In particular, spectral PCU and spectral glasso infer undirected connectivity based on partial coherence, and PC MVAR infers directed connectivity based on Granger causality. In simulation all methods outperformed the baseline method at low sample sizes, and spectral PCU and PC MVAR were able to do this while asymptotically controlling the FDR. Application of the PC-based methods to LFP data from a task-based monkey study resulted in graphs that were consistent with each other, other published findings, and experimental expectations.

Figure 5.4: Locations of the 15 electrode pairs (reproduced from Liang et al., 2001).

Figure 5.5: The four possible visual patterns shown to the monkey in the study.

Figure 5.6: (a) Spectral PCU results. (b) PC MVAR results. Spectral PCU and PC MVAR LFP monkey data late response (278 ms) results. The thickness of the edges is proportional to the difference in the corresponding counts, with a blue edge denoting that the go count is larger and a red edge denoting that the no-go count is larger. For the PC MVAR results, arrows are used to denote causality and unidirectional arrows have been tipped yellow to help them stand out.
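The day-count comparison behind Figures 5.6 and 5.7 and Tables 5.1 and 5.2 amounts to a few lines; a minimal sketch follows, assuming one boolean adjacency matrix per day per condition, with an illustrative function name.

```python
import numpy as np

def edge_count_difference(graphs_a, graphs_b, threshold):
    # graphs_a, graphs_b: lists of boolean (n, n) adjacency matrices,
    # one per day, for the two conditions (e.g., go vs. no-go). Counts
    # the days each edge appears in each condition and returns the
    # edges whose count difference has magnitude >= threshold.
    counts_a = np.sum(np.stack(graphs_a), axis=0)
    counts_b = np.sum(np.stack(graphs_b), axis=0)
    diff = counts_a - counts_b
    edges = np.argwhere(np.abs(diff) >= threshold)
    return [(i, j, int(counts_a[i, j]), int(counts_b[i, j]))
            for i, j in edges]
```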
Figure 5.7: (a) Spectral PCU results. (b) PC MVAR results. Spectral PCU and PC MVAR LFP monkey data early response (120 ms) results. The thickness of the edges is proportional to the difference in the corresponding counts, with a green edge denoting that the diamond count is larger and a magenta edge denoting that the line count is larger. For the PC MVAR results, arrows are used to denote causality and unidirectional arrows have been tipped blue to help them stand out.

Spectral PCU Late Response        PC MVAR Late Response
Pair    Go    No-go               Pair     Go     No-go
14-15   17    3                   14↔15    18     10
4-8     13    1                   1→14     17     2
1-7     12    2                   1↔15     7.5    0
8-9     13    5                   6↔14     9      1.5
4-7     11    3                   1→13     7      0
13-15   8     0                   1↔7      10.5   3
1-8     17    10                  1→9      8      1
9-11    4     15                  13↔15    11     1.5
6-15    6     0                   8→13     7      0
9-13    5     0                   9↔13     7      0
12-13   9     3                   9→15     6      0
4-5     12    17                  4→8      11     5
1-5     5     10                  5→2      18     12
1-14    5     0                   1→2      13     7
                                  8→9      10     4
                                  7→8      12     6

Table 5.1: Spectral PCU and PC MVAR go and no-go counts for each edge in Figure 5.6. A bidirectional arrow is used to denote two opposite directional edges of the same color, and for its counts we report the averages of its directional edge counts.

Spectral PCU Early Response       PC MVAR Early Response
Pair    Diamond    Line           Pair     Diamond    Line
3-4     1          14             6→5      18         10
2-3     17         4              5→4      12         18
4-5     5          16             6→3      17         2
2-5     15         5              2↔3      12         3.5
5-6     18         12             4→6      12         4
                                  3↔4      3.5        13.5
                                  5→10     1          11
                                  2↔5      16         0.5
                                  1→3      10         0
                                  11→1     7          1
                                  4→11     12         6

Table 5.2: Spectral PCU and PC MVAR diamond and line counts for each edge in Figure 5.7. A bidirectional arrow is used to denote two opposite directional edges of the same color, and for its counts we report the averages of its directional edge counts.

Chapter 6

Conclusion

The development of non-invasive brain imaging modalities over the last several decades has greatly improved understanding of the human brain. Structural imaging modalities have enabled the in vivo study of anatomical connections and brain organization, while functional imaging modalities have enabled the study of the brain activity arising from this structure. However, the analysis of such neuroimaging data is significantly complicated by the complexity of the human brain. One common approach to studying this data likens the brain to a highly interconnected network and seeks to identify its physical and statistical connections or relationships. Often this is done using blind source separation (BSS) or statistical measures such as correlation, coherence, and Granger causality. However, standard application of BSS to data on the cerebral cortex completely ignores the structural information implied by the data's location on the cortex, and standard application of the statistical measures ignores prior knowledge of the brain's inherently sparse connectivity.

This thesis proposed five novel methods to account for these shortcomings and improve the inference of brain connectivity from data spanning three different neuroimaging contexts. We compared each of these methods to the leading methods of their particular context in simulation. Two methods were also compared to others with respect to empirical neuroimaging data. All methods, however, were applied to at least one empirical neuroimaging dataset and had their results shown and discussed. More specifically, the main contributions of this work include:

• The development, evaluation, and application of a BSS method for data on the cerebral cortex that incorporates knowledge of the measurement locations to improve performance. This is done by updating the shift used in the computation of lagged covariance matrices in the SOBI method to a rotation on the two-dimensional manifold of the cerebral cortex. This updated SOBI method was shown to generally outperform the popular FastICA and original SOBI methods.
I also proposed the study of cortical brain networks through the application of BSS to structural data on the cerebral cortex and applied the modified SOBI method to empirical cortical thickness, thickness variance, and curvature variance data to reveal their underlying modes of variation.

• The development, evaluation, and application of PCU, a sequential graph pruning algorithm for the inference of sparse, undirected graphs from realizations of a multivariate normal distribution. PCU is based on the PC algorithm and performs hypothesis tests for non-zero partial correlation on smaller, statistically more powerful control sets that are equivalent to the original, larger control sets. PCU differs from the PC algorithm by approximately halving the number of control sets that need testing, speeding up the algorithm and increasing statistical power by providing fewer opportunities to erroneously prune a valid edge. The method was shown to generally tie or outperform the concentration method and the graphical lasso in terms of accuracy on synthetic data and stability on empirical gray matter thickness data. In simulation, PCU did a good job of asymptotically controlling the false discovery rate (FDR). When applied to a cortical gray matter thickness dataset and an fMRI resting-state dataset of different subjects, PCU returned remarkably similar networks.

• The development, evaluation, and application of spectral PCU, spectral glasso, and PC MVAR, the extensions of the PCU and glasso algorithms to time series. The spectral extensions infer graphs based on non-zero partial coherences, but reformulate the resulting partial coherence estimation and hypothesis testing problems as partial correlation estimation and hypothesis testing problems with respect to the discrete Fourier coefficients of a set of time series. In this way, the spectral extensions apply the same PCU and glasso techniques for the non-time series case to partial coherence and time series. PC MVAR infers directed graphs based on Granger causality under the assumption that the time series follow a multivariate autoregressive (MVAR) model. As with partial correlation, reduced yet equivalent control sets exist for which tests for Granger causality become more powerful, and PC MVAR identifies them similarly to PCU. In simulation all of the extensions outperformed the baseline methods they were compared against at low sample sizes, and spectral PCU and PC MVAR did this while asymptotically controlling the FDR. Application of the PC-based methods to LFP data from a task-based monkey study resulted in graphs that were consistent with each other, other published findings, and experimental expectations.

Here the PC-based and glasso-based methods derived their superior performance from the assumption that the underlying connectivities were sparse. However, other regularizations, priors, and model assumptions exist and will likely improve performance to the extent that they accurately and specifically fit the true underlying model. For example, all of the methods discussed here either explicitly or implicitly assume a linear model because it greatly simplifies the problems, but such an assumption is in direct contradiction to the brain's highly non-linear nature, and performance could likely be improved by fitting a more accurate, non-linear model.
The assumption of weak stationarity is another example: if the non-stationarity of the brain could be better modeled, then longer time windows could be used and connectivity measures could be calculated over more samples. Autoregressive conditional heteroskedasticity (ARCH) models, an extension of MVAR models in which the variance of the innovation terms is allowed to change over time, tend to better model non-stationarity and could improve the inference of brain connectivity.

Although significant progress has been made towards understanding the brain, much remains unknown. However, the brain is being studied from multiple, interconnected fronts where an advance in one promotes progress in the others. As knowledge of the brain grows, connectivity measures will likely be devised with increasingly accurate and specific priors and assumptions that improve connectivity inference, leading to further insight and perpetuating a synergistic cycle of accelerating progress.

References

Abeles, M., Bergman, H., Gat, I., Meilijson, I., Seidemann, E., Tishby, N., Vaadia, E., 1995. Cortical activity flips among quasi-stationary states. Proceedings of the National Academy of Sciences 92, 8616–8620.

Achard, S., Salvador, R., Whitcher, B., Suckling, J., Bullmore, E., 2006a. A resilient, low-frequency, small-world human brain functional network with highly connected association cortical hubs. Journal of Neuroscience 26, 63–72.

Achard, S., Salvador, R., Whitcher, B., Suckling, J., Bullmore, E., 2006b. A resilient, low-frequency, small-world human brain functional network with highly connected association cortical hubs. The Journal of Neuroscience 26, 63–72.

Anderson, T.W., 2011. The statistical analysis of time series. volume 19. Wiley.

Awate, S.P., Yushkevich, P., Song, Z., Licht, D., Gee, J.C., 2009. Multivariate high-dimensional cortical folding analysis, combining complexity and shape, in neonates with congenital heart disease, in: Information Processing in Medical Imaging, Springer. pp. 552–563.

Aydore, S., Pantazis, D., Leahy, R.M., 2013. A note on the phase locking value and its properties. NeuroImage 74, 231–244.

Baccala, L.A., Sameshima, K., 2001. Partial directed coherence: a new concept in neural structure determination. Biological Cybernetics 84, 463–474.

Banerjee, O., El Ghaoui, L., d'Aspremont, A., 2008. Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. The Journal of Machine Learning Research 9, 485–516.

Beckmann, C., DeLuca, M., Devlin, J.T., Smith, S.M., 2005. Investigations into resting-state connectivity using independent component analysis. Philosophical Transactions of the Royal Society B: Biological Sciences 360, 1001–1013.

Beckmann, C., Smith, S., 2005. Tensorial extensions of independent component analysis for multisubject fMRI analysis. NeuroImage 25, 294–311.

Belouchrani, A., Abed-Meraim, K., Cardoso, J.F., Moulines, E., 1993. Second-order blind separation of correlated sources, in: International Conference on Digital Signal Processing, pp. 346–351.

Belouchrani, A., Abed-Meraim, K., Cardoso, J.F., Moulines, E., 1997. A blind source separation technique using second-order statistics. IEEE Transactions on Signal Processing 45(2).

Benjamini, Y., Hochberg, Y., 1995. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. Series B (Methodological), 289–300.

Benjamini, Y., Yekutieli, D., 2001. The control of the false discovery rate in multiple testing under dependency. Annals of Statistics, 1165–1188.
Biswal, B., Zerrin Yetkin, F., Haughton, V.M., Hyde, J.S., 1995. Functional connectivity in the motor cortex of resting human brain using echo-planar MRI. Magnetic Resonance in Medicine 34, 537–541.

Bock, D., Lee, W., Kerlin, A., Andermann, M., Hood, G., Wetzel, A., Yurgenson, S., Soucy, E., Kim, H., Reid, R., 2011. Network anatomy and in vivo physiology of visual cortical neurons. Nature 471, 177–182.

Brandt, P.T., Williams, J.T., 2006. Multiple time series models. volume 148. Sage Publications, Incorporated.

Bressler, S.L., Coppola, R., Nakamura, R., 1993. Episodic multiregional cortical coherence at multiple frequencies during visual task performance. Nature 366, 153–156.

Bressler, S.L., Ding, M., Yang, W., 1999. Investigation of cooperative cortical dynamics by multivariate autoregressive modeling of event-related local field potentials. Neurocomputing 26, 625–631.

Brovelli, A., Ding, M., Ledberg, A., Chen, Y., Nakamura, R., Bressler, S.L., 2004. Beta oscillations in a large-scale sensorimotor cortical network: directional influences revealed by Granger causality. Proceedings of the National Academy of Sciences of the United States of America 101, 9849–9854.

Calhoun, V.D., Adali, T., Hansen, L.K., Larsen, J., Pekar, J.J., 2003. ICA of functional MRI data: An overview, in: International Workshop on Independent Component Analysis and Blind Signal Separation, pp. 346–351.

Calhoun, V.D., Kiehl, K.A., Pearlson, G.D., 2008. Modulation of temporally coherent brain networks estimated using ICA at rest and during cognitive tasks. Proceedings of the IEEE 29(7), 828–838.

Calhoun, V.D., Liu, J., Adali, T., 2009. A review of group ICA for fMRI data and ICA for joint inference of imaging, genetic, and ERP data. NeuroImage 45, S163–S172. Mathematics in Brain Imaging.

Callan, D.E., Callan, A.M., Kroos, C., Vatikiotis-Bateson, E., 2001. Multimodal contribution to speech perception revealed by independent component analysis: a single-sweep EEG case study. Cognitive Brain Research 10, 349–353.

Casella, G., Berger, R., 2001. Statistical Inference. Duxbury Press.

Chao-Gan, Y., Yu-Feng, Z., 2010. DPARSF: a MATLAB toolbox for pipeline data analysis of resting-state fMRI. Frontiers in Systems Neuroscience 4.

Chen, Z.J., He, Y., Rosa-Neto, P., Germann, J., Evans, A.C., 2008. Revealing Modular Architecture of Human Brain Structural Networks by Using Cortical Thickness from MRI. Cerebral Cortex 18, 2374–2381.

Chung, M., Worsley, K., Robbins, S., Paus, T., Taylor, J., Giedd, J., Rapoport, J., Evans, A., 2003. Deformation-based surface morphometry applied to gray matter deformation. NeuroImage 18, 198–213.

Dahlhaus, R., 2000. Graphical interaction models for multivariate time series. Metrika 51, 157–172.

Dahlhaus, R., Eichler, M., 2003. Causality and graphical models in time series analysis. Oxford Statistical Science Series, 115–137.

Dale, A.M., Fischl, B., Sereno, M.I., 1999. Cortical surface-based analysis: I. Segmentation and surface reconstruction. NeuroImage 9, 179–194.

Damoiseaux, J., Rombouts, S., Barkhof, F., Scheltens, P., Stam, C., Smith, S.M., Beckmann, C., 2006. Consistent resting-state networks across healthy subjects. Proceedings of the National Academy of Sciences 103, 13848–13853.

Dice, L., 1945. Measures of the amount of ecologic association between species. Ecology 26, 297–302.

Ding, M., Bressler, S.L., Yang, W., Liang, H., 2000. Short-window spectral analysis of cortical event-related potentials by adaptive multivariate autoregressive modeling: Data preprocessing, model validation, and variability assessment. Biological Cybernetics 83, 35–45.
Ding, M., He, B., 2013. Exploring functional and causal connectivity in the brain, in: Neural Engineering. Springer, pp. 545–564.

Draganski, B., Gaser, C., Busch, V., Schuierer, G., Bogdahn, U., May, A., 2004. Neuroplasticity: changes in grey matter induced by training. Nature 427, 311–312.

Draganski, B., Gaser, C., Kempermann, G., Kuhn, H., Winkler, J., Buchel, C., May, A., 2006. Temporal and spatial dynamics of brain structure changes during extensive learning. Journal of Neuroscience 26, 6314.

Edwards, D., 2000. Introduction to graphical modelling. Springer Verlag.

Efron, B., Tibshirani, R., 1993. An introduction to the bootstrap. volume 57. Chapman & Hall/CRC.

Eichler, M., 2007. Granger causality and path diagrams for multivariate time series. Journal of Econometrics 137, 334–353.

Esposito, F., Scarabino, T., Hyvärinen, A., Himberg, J., Formisano, E., Comani, S., Tedeschi, G., Goebel, R., Seifritz, E., Salle, F.D., 2005. Independent component analysis of fMRI group studies by self-organizing clustering. NeuroImage 25, 193–205.

Ferrer, I., Blanco, R., Carulla, M., Condom, M., Alcantara, S., Olive, M., Planas, A., 1995. Transforming growth factor-α immunoreactivity in the developing and adult brain. Neuroscience 66, 189–199.

Fischl, B., Sereno, M.I., Dale, A.M., 1999. Cortical surface-based analysis: II: Inflation, flattening, and a surface-based coordinate system. NeuroImage 9, 195–207.

Fisher, R., 1921. On the "probable error" of a coefficient of correlation deduced from a small sample. Metron 1, 3–32.

Fox, M.D., Snyder, A.Z., Vincent, J.L., Corbetta, M., Essen, D.C.V., Raichle, M.E., 2005. The human brain is intrinsically organized into dynamic, anticorrelated functional networks. Proceedings of the National Academy of Sciences of the USA 102, 9673–9678.

Friedman, J., Hastie, T., Tibshirani, R., 2008. Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9, 432.

Friston, K., 1994. Functional and effective connectivity in neuroimaging: a synthesis. Human Brain Mapping 2, 56–78.

Genovese, C., Lazar, N., Nichols, T., 2002. Thresholding of statistical maps in functional neuroimaging using the false discovery rate. NeuroImage 15, 870–878.

Giedd, J., Schmitt, J., Neale, M., 2007. Structural brain magnetic resonance imaging of pediatric twins. Human Brain Mapping 28, 474–481.

Golub, G.H., Van Loan, C.F., 1996. Matrix Computations (Johns Hopkins Studies in Mathematical Sciences). The Johns Hopkins University Press. 3rd edition.

Gong, G., He, Y., Concha, L., Lebel, C., Gross, D.W., Evans, A.C., Beaulieu, C., 2009. Mapping Anatomical Connectivity Patterns of Human Cerebral Cortex Using In Vivo Diffusion Tensor Imaging Tractography. Cerebral Cortex 19, 524–536.

Granger, C., 1969. Investigating causal relations by econometric models and cross-spectral methods. Econometrica: Journal of the Econometric Society, 424–438.

Hagmann, P., Cammoun, L., Gigandet, X., Meuli, R., Honey, C., Wedeen, V., Sporns, O., 2008. Mapping the structural core of human cerebral cortex. PLoS Biology 6, e159.

Hagmann, P., Kurant, M., Gigandet, X., Thiran, P., Wedeen, V.J., Meuli, R., Thiran, J., 2007. Mapping human whole-brain structural networks with diffusion MRI. PLoS ONE 2, e597.

Hannan, E.J., 2009. Multiple time series. volume 38. Wiley.
Hardan, A.Y., Jou, R.J., Keshavan, M.S., Varma, R., Minshew, N.J., 2004. Increased frontal cortical folding in autism: a preliminary MRI study. Psychiatry Research: Neuroimaging 131, 263–268.

He, Y., Chen, Z.J., Evans, A.C., 2007. Small-world anatomical networks in the human brain revealed by cortical thickness from MRI. Cerebral Cortex 17, 2407–2419.

Hoerzer, G.M., Liebe, S., Schloegl, A., Logothetis, N.K., Rainer, G., 2010. Directed coupling in local field potentials of macaque V4 during visual short-term memory revealed by multivariate autoregressive models. Frontiers in Computational Neuroscience 4.

Honey, C., Kötter, R., Breakspear, M., Sporns, O., 2007. Network structure of cerebral cortex shapes functional connectivity on multiple time scales. Proceedings of the National Academy of Sciences 104, 10240–10245.

Honey, C., Sporns, O., Cammoun, L., Gigandet, X., Thiran, J., Meuli, R., Hagmann, P., 2009. Predicting human resting-state functional connectivity from structural connectivity. Proceedings of the National Academy of Sciences 106, 2035–2040.

Hyvärinen, A., Karhunen, J., Oja, E., 2001. Independent Component Analysis. Wiley-Interscience.

Hyvärinen, A., Köster, U., 1997. A fast fixed-point algorithm for independent component analysis. Neural Computation 9, 1483–1492.

Ikeda, S., Toyama, K., 2000. Independent component analysis for noisy data—MEG data analysis. Neural Networks 13, 1063–1074.

Joshi, A., Joshi, S., Leahy, R., Shattuck, D., Dinov, I., Toga, A., 2010. Bayesian approach for network modeling of brain structural features, in: Proceedings of SPIE, p. 762607.

Joshi, A., Leahy, R., Toga, A., Shattuck, D., 2009a. A framework for brain registration via simultaneous surface and volume flow, in: Information Processing in Medical Imaging, Springer. pp. 576–588.

Joshi, A., Shattuck, D., Thompson, P., Leahy, R., 2007. Surface-constrained volumetric brain registration using harmonic mappings. Medical Imaging, IEEE Transactions on 26, 1657–1669.

Joshi, A., Shattuck, D., Thompson, P., Leahy, R., 2009b. A parameterization-based numerical method for isotropic and anisotropic diffusion smoothing on non-flat surfaces. IEEE Transactions on Image Processing 18, 1358–1365.

Jost, J., 2011. Universitext: Riemannian Geometry and Geometric Analysis. Springer-Verlag Berlin Heidelberg.

Joyce, C.A., Gorodnitsky, I.F., Kutas, M., 2004. Automatic removal of eye movement and blink artifacts from EEG data using blind component separation. Psychophysiology 41(2), 313–325.

Jung, T.P., Makeig, S., Humphries, C., Lee, T.W., Mckeown, M.J., Iragui, V., Sejnowski, T.J., 2000. Removing electroencephalographic artifacts by blind source separation. Psychophysiology 37, 163–178.

Jung, T.P., Makeig, S., McKeown, M., Bell, A., Lee, T.W., Sejnowski, T., 2001. Imaging brain dynamics using independent component analysis. Proceedings of the IEEE 89, 1107–1122.

Kamada, T., Kawai, S., 1989. An algorithm for drawing general undirected graphs. Information Processing Letters 31, 7–15.

Kendall, M.G., 1945. The Advanced Theory of Statistics. volume 1. Charles Griffin and Company.

Klemm, M., Haueisen, J., Ivanova, G., 2009. Independent component analysis: comparison of algorithms for the investigation of surface electrical brain activity. Medical and Biological Engineering and Computing 47, 413–423.

Koller, D., Friedman, N., 2009. Probabilistic Graphical Models: Principles and Techniques. The MIT Press.

Koopmans, L.H., 1995. The spectral analysis of time series. volume 22. Academic Press.
Korn, M.R., Dyer, C.R., 1987. 3-d multiview object representations for model-based object recognition. Pattern Recognition 20, 91–103. Kraitchik, M., 1953. Mathematical Recreations (A Mosaic on the Sphere). Dover Pub- lications. Lauritzen, S., 1996. Graphical models. volume 17. Oxford University Press, USA. Ledberg, A., Bressler, S.L., Ding, M., Coppola, R., Nakamura, R., 2007. Large-scale visuomotor integration in the cerebral cortex. Cerebral cortex 17, 44–62. Lee, H., Lee, D., Kang, H., Kim, B., Chung, M., 2011. Sparse brain network recovery under compressed sensing. IEEE Transactions on Medical Imaging 30, 1154–1165. Lee, T.W., Lewicki, M., Girolami, M., Sejnowski, T., 1999. Blind source separation of more sources than mixtures using overcomplete representations. Signal Processing Letters, IEEE 6, 87–90. Lerch, J., Pruessner, J., Zijdenbos, A., Hampel, H., Teipel, S., Evans, A., 2005. Fo- cal decline of cortical thickness in alzheimer’s disease identified by computational neuroanatomy. Cerebral Cortex 15, 995–1001. Lerch, J., Worsley, K., Shaw, W., Greenstein, D., Lenroot, R., Giedd, J., Evans, A., 2006. Mapping anatomical correlations across cerebral cortex (MACACC) using cortical thickness from MRI. NeuroImage 31, 993–1003. Li, J., Wang, Z., 2009. Controlling the false discovery rate of the association/causality structure learned with the PC algorithm. The Journal of Machine Learning Research 10, 475–514. Liang, H., Ding, M., Bressler, S.L., 2001. Temporal dynamics of information flow in the cerebral cortex. Neurocomputing 38, 1429–1435. Lichtman, J., Livet, J., Sanes, J., 2008. A technicolour approach to the connectome. Nature Reviews Neuroscience 9, 417–422. Litterman, R.B., Weiss, L., 1983. Money, real interest rates, and output: A reinterpreta- tion of postwar us data. Löfberg, J., 2004. YALMIP : A toolbox for modeling and optimization in MATLAB, in: Proceedings of the CACSD Conference, Taipei, Taiwan. Logothetis, N., Wandell, B., 2004. Interpreting the bold signal. Annu. Rev. Physiol. 66, 735–769. 141 Luders, E., Thompson, P., Narr, K., Toga, A., Jancke, L., Gaser, C., et al., 2006. A curvature-based approach to estimate local gyrification on the cortical surface. Neu- roimage 29, 1224–1230. Maguire, E., Woollett, K., Spiers, H., 2006. London taxi drivers and bus drivers: a structural MRI and neuropsychological analysis. Hippocampus 16, 1091–1101. Makeig, S., Jung, T.P., Bell, A.J., Ghahremani, D., Sejnowski, T.J., 1997. Blind sepa- ration of auditory event-related brain responses into independent components. Pro- ceedings of the National Academy of Sciences 94, 10979–10984. Marrelec, G., Krainik, A., Duffau, H., Pélégrini-Issac, M., Lehéricy, S., Doyon, J., Be- nali, H., 2006. Partial correlation for functional brain interactivity investigation in functional MRI. NeuroImage 32, 228–237. Matsuoka, K., Ohoya, M., Kawamoto, M., 1995. A neural net for blind separation of nonstationary signals. Neural Networks 8, 411 – 419. McKeown, M.J., Makeig, S., G.Brown, G., Jung, T.P., Kindermann, S.S., J.Bell, A., J.Sejnowski, T., 1998. Analysis of fMRI data by blind separation into independent spatial components. Human Brain Mapping . Meek, C., 1995. Strong completeness and faithfulness in bayesian networks, in: Pro- ceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann Publishers Inc.. pp. 411–418. Meinecke, F., Ziehe, A., Kawanabe, M., Muller, K.R., 2002. 
A resampling approach to estimate the stability of one-dimensional or multidimensional independent compo- nents. Biomedical Engineering, IEEE Transactions on 49, 1514 –1525. Meinshausen, N., Bühlmann, P., 2006. High-dimensional graphs and variable selection with the lasso. The Annals of Statistics 34, 1436–1462. Miltner, W.H., Braun, C., Arnold, M., Witte, H., Taub, E., 1999. Coherence of gamma- band EEG activity as a basis for associative learning. Nature 397, 434–436. Mitelman, S., Shihabuddin, L., Brickman, A., Buchsbaum, M., 2005. Cortical inter- correlations of temporal area volumes in schizophrenia. Schizophrenia research 76, 207–229. Mitteroecker, P., Bookstein, F., 2009. Examining modularity via partial correlations: A rejoinder to a comment by Paul Magwene. Systematic Biology 58, 346. Moosmann, M., Eichele, T., Nordby, H., Hugdahl, K., Calhoun, V .D., 2008. Joint inde- pendent component analysis for simultaneous EEG-fMRI: Principle and simulation. International Journal of Psychophysiology 67, 212 – 221. Integration of EEG and fMRI. 142 Narr, K.L., Bilder, R.M., Toga, A.W., Woods, R.P., Rex, D.E., Szeszko, P.R., Robinson, D., Sevy, S., Gunduz-Bruce, H., Wang, Y .P., DeLuca, H., Thompson, P.M., 2005. Mapping cortical thickness and gray matter concentration in first episode schizophre- nia. Cerebral Cortex 15(6), 708–719. Onton, J., Westerfield, M., Townsend, J., Makeig, S., 2006. Imaging human EEG dy- namics using independent component analysis. Neuroscience and Biobehavioral Re- views 30, 808 – 822. Methodological and Conceptual Advances in the Study of Brain-Behavior Dynamics: A Multivariate Lifespan Perspective. Pantazis, D., Joshi, A., Jiang, J., Shattuck, D.W., Bernstein, L.E., Damasio, H., Leahy, R.M., 2010. Comparison of landmark-based and automatic methods for cortical sur- face registration. NeuroImage 49, 2479 – 2493. Pearl, J., 1988. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann. Percival, D.B., Walden, A.T., 1993. Spectral analysis for physical applications. Cam- bridge University Press. Priestley, M., 1981. Spectral analysis and time series . Raichle, M., MacLeod, A., Snyder, A., Powers, W., Gusnard, D., Shulman, G., 2001. A default mode of brain function. Proceedings of the National Academy of Sciences 98, 676–682. Sallet, P.C., Elkis, H., Alves, T.M., Oliveira, J.R., Sassi, E., de Castro, C.C., Busatto, G.F., Gattaz, W.F., 2003. Reduced Cortical Folding in Schizophrenia: An MRI Mor- phometric Study. Am J Psychiatry 160, 1606–1613. Sato, W., Kochiyama, T., Yoshikawa, S., Matsumura, M., 2001. Emotional expression boosts early visual processing of the face: Erp recording and its decomposition by independent component analysis. NeuroReport 12(4). Schmithorst, V .J., Holland, S.K., 2004. A comparison of three methods for generating group statistical inferences from independent component analysis of fMRI. Journal of Magnetic Resonance Imaging 19(3), 365–368. Schmitt, J., Lenroot, R., Wallace, G., Ordaz, S., Taylor, K., Kabani, N., Greenstein, D., Lerch, J., Kendler, K., Neale, M., et al., 2008. Identification of genetically medi- ated cortical networks: a multivariate study of pediatric twins and siblings. Cerebral Cortex 18, 1737. Sehatpour, P., Molholm, S., Schwartz, T.H., Mahoney, J.R., Mehta, A.D., Javitt, D.C., Stanton, P.K., Foxe, J.J., 2008. A human intracranial study of long-range oscillatory 143 coherence across a frontal–occipital–hippocampal brain network during visual object processing. 
Proceedings of the National Academy of Sciences 105, 4399–4404. Shattuck, D., Leahy, R., 2002. Brainsuite: an automated cortical surface identification tool. Medical Image Analysis 6, 129–142. Simes, R.J., 1986. An improved bonferroni procedure for multiple tests of significance. Biometrika 73, 751–754. Skudlarski, P., Jagannathan, K., Calhoun, V ., Hampson, M., Skudlarska, B., Pearlson, G., et al., 2008. Measuring brain connectivity: diffusion tensor imaging validates resting state temporal correlations. Neuroimage 43, 554–561. Sowell, E.R., Peterson, B.S., Thompson, P.M., Welcome, S.E., Henkenius, A.L., Toga, A.W., 2003. Mapping cortical change across the human life span. Nat Neurosci 6(3), 309–315. Sowell, E.R., Thompson, P.M., Rex, D., Kornsand, D., Tessner, K.D., Jernigan, T.L., Toga, A.W., 2002. Mapping sulcal pattern asymmetry and local cortical surface gray matter distribution in vivo: maturation in perisylvian cortices. Cereb. Cortex 12, 17– 26. Spirtes, P., Glymour, C., Scheines, R., 2000. Causation, Prediction, and Search. vol- ume 81. The MIT Press. Sporns, O., Tononi, G., Kotter, R., 2005. The human connectome: a structural descrip- tion of the human brain. PLoS Comput Biol 1, e42. Stam, C., Jones, B., Nolte, G., Breakspear, M., Scheltens, P., 2007. Small-World Net- works and Functional Connectivity in Alzheimer’s Disease. Cereb. Cortex 17, 92–99. Stone, J.V ., Porrill, J., Porter, N.R., Wilkinson, I.D., 2002. Spatiotemporal indepen- dent component analysis of event-related fMRI data using skewed probability density functions. NeuroImage 15, 407 – 421. Supekar, K., Menon, V ., Rubin, D., Musen, M., Greicius, M., 2008. Network analysis of intrinsic functional brain connectivity in alzheimer’s disease. PLoS Comput Biol 4, e1000100. Svensén, M., Kruggel, F., Benali, H., 2002. ICA of fMRI group study data. NeuroImage 16, 551 – 563. Tang, A.C., Sutherland, M.T., McKinney, C.J., 2005. Validation of sobi components from high-density EEG. NeuroImage 25, 539 – 553. 144 Theis, F.J., Meyer-B ´ ’ase, A., Lang, E.W., 2004. Second-order blind source separation based on multi-dimensional autocovariances, in: Puntonet, C.G., Prieto, A. (Eds.), Independent Component Analysis and Blind Signal Separation. Springer Berlin / Hei- delberg. volume 3195 of Lecture Notes in Computer Science, pp. 726–733. Thompson, P., Cannon, T., Narr, K., van Erp, T., Poutanen, V ., Huttunen, M., Lönnqvist, J., Standertskjöld-Nordenstam, C., Kaprio, J., Khaledy, M., et al., 2001a. Genetic influences on brain structure. Nature Neuroscience 4, 1253–1258. Thompson, P., Vidal, C., Giedd, J., Gochman, P., Blumenthal, J., Nicolson, R., Toga, A., Rapoport, J., 2001b. Mapping adolescent brain change reveals dynamic wave of accelerated gray matter loss in very early-onset schizophrenia. Proceedings of the National Academy of Sciences 98, 11650. Toh, K., Todd, M., Tutuncu, R., 1999. Sdpt 3- a matlab software package for semidefi- nite programming, version 1. 3. Optimization Methods and Software 11, 545–581. Tong, L., Soon, V .C., Huang, Y .F., Liu, R., 1990. Amuse: A new blind identification algorithm, in: IEEE International Symposium on Circuits and Systems, pp. 1784– 1787. Tononi, G., Edelman, G.M., 1998. Consciousness and complexity. Science 282, 1846– 1851. Valdés-Sosa, P., Sánchez-Bornot, J., Lage-Castellanos, A., Vega-Hernández, M., Bosch- Bayard, J., Melie-García, L., Canales-Rodríguez, E., 2005. Estimating brain func- tional connectivity with sparse multivariate autoregression. 
Philosophical Transac- tions of the Royal Society B: Biological Sciences 360, 969–981. van de Ven, V ., Formisano, E., Prvulovic, D., Roeder, C.H., Linden, D.E., 2004. Func- tional connectivity as revealed by spatial independent component analysis of fMRI measurements during rest. Human Brain Mapping 22(3), 165–178. Vigário, R., Jousmäki, V ., Haemaelaeninen, M., Haft, R., Oja, E., 1998. Independent component analysis for identification of artifacts in magnetoencephalographic record- ings. Advances in neural information processing systems , 229–235. Vigário, R., Sarela, J., Jousmiki, V ., Hamalainen, M., Oja, E., 2000. Independent com- ponent approach to the analysis of EEG and MEG recordings. Biomedical Engineer- ing, IEEE Transactions on 47, 589 –593. V orobyov, S., Cichocki, A., 2002. Blind noise reduction for multisensory signals using ICA and subspace filtering, with application to EEG analysis. Biological Cybernetics 86, 293–303. 145 Welch, P., 1967. The use of fast fourier transform for the estimation of power spectra: a method based on time averaging over short, modified periodograms. Audio and Electroacoustics, IEEE Transactions on 15, 70–73. Westad, F., Kermit, M., 2003. Cross validation and uncertainty estimates in independent component analysis. Analytica Chimica Acta 490, 341 – 354. Papers presented at the 8th International Conference on Chemometrics and Analytical Chemistry. Whittaker, J., 1990. Graphical models in applied multivariate statistics. volume 16. Wiley New York. Woods, R.P., Grafton, S.T., Holmes, C.J., Cherry, S.R., Mazziotta, J.C., 1998. Auto- mated image registration: I. general methods and intrasubject, intramodality valida- tion. Journal of Computer Assisted Tomography 22, 139–152. Worsley, K., Charil, A., Lerch, J., Evans, A., 2005. Connectivity of anatomical and functional MRI data, in: IEEE International Joint Conference on Neural Networks, pp. 1534–1541. Worsley, K., Marrett, S., Neelin, P., Vandal, A., Friston, K., Evans, A., et al., 1996. A unified statistical approach for determining significant signals in images of cerebral activation. Human Brain Mapping 4, 58–73. Zhang, Y ., Chen, Y ., Bressler, S.L., Ding, M., 2008. Response preparation and inhibi- tion: the role of the cortical sensorimotor beta rhythm. Neuroscience 156, 238. Zhukov, L., Weinstein, D., Johnson, C., 2000. Independent component analysis for EEG source localization. Engineering in Medicine and Biology Magazine, IEEE 19, 87–96. Zilles, K., Schleicher, A., Langemann, C., Amunts, K., Morosan, P., Palomero- Gallagher, N., Schormann, T., Mohlberg, H., B ´ ’urgel, U., Steinmetz, H., Schlaug, G., Roland, P.E., 1997. Reduced Cortical Folding in Schizophrenia: An MRI Mor- phometric Study. Human Brain Mapping 5, 218–221. 146