Close
About
FAQ
Home
Collections
Login
USC Login
Register
0
Selected
Invert selection
Deselect all
Deselect all
Click here to refresh results
Click here to refresh results
USC
/
Digital Library
/
University of Southern California Dissertations and Theses
/
Multivariate statistical analysis of magnetoencephalography data
(USC Thesis Other)
Multivariate statistical analysis of magnetoencephalography data
PDF
Download
Share
Open document
Flip pages
Contact Us
Contact Us
Copy asset link
Request this asset
Transcript (if available)
Content
MULTIV ARIATE STATISTICAL ANALYSIS OF MAGNETOENCEPHALOGRAPHY DATA by Juan Luis Poletti Soto A Dissertation Presented to the FACULTY OF THE USC GRADUATE SCHOOL UNIVERSITY OF SOUTHERN CALIFORNIA In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (ELECTRICAL ENGINEERING) May 2011 Copyright 2011 Juan Luis Poletti Soto Dedication Dedico esta tesis a mi ahijado Rodri. ii Acknowledgments I would like to express my deepest gratitude to my advisor, Dr. Richard Leahy, for welcoming me into his research group, for his patience, and for his constant support and guidance. I would also like to profoundly thank my good friend Dr. Dimitrios Pantazis for our long discussions, and for his strive for excellence that always sought to bring out the best in all of us. My thanks go to the members of my qualifying exam and dissertation committees, Drs. Antonio Ortega, Bosco Tjan, and Krishna Nayak, for their useful comments and suggestions. Dr. Nayak’s frequent mentoring has been helpful since my early days at USC, and for that I am especially grateful. I am also grateful to all Electrical Engineering faculty and staff. The fulfilling experience I had at USC is in great part due to the friendship of my former and current colleagues at the Biomedical Imaging Research Laboratory: Quanzheng Li, Joyita Dutta, Yuteng Chang, Syed Ashrafulla, David Wheland, Yanguang Lin, Sergul Aydore, Ran Ren, Wentao Zhu, Chitresh Bhushan, Ying Fang, Sangtae Ahn, Sanghee Cho, Belma Dogdas, Felix Darvas, Hua Hui, Anand Joshi, Esen Kucukaltun- Yildirim, Zheng Li, and Sangeetha Somajayula. The data used throughout this thesis has been kindly provided by Dr. Karim Jerbi from INSERM in Lyon (France), with whom I have had very insightful discussions, that iii always brought a new and interesting perspective to my work. I have also benefited greatly from exchanges with Drs. John Mosher, Sylvain Baillet, Matti Hamalainen, and Thomas Nichols. Relatives and friends in Brazil and Paraguay have been incredibly supportive and understanding, and my fellow Brazilian graduate students at USC and UCLA have become my true family in Los Angeles; I thank them all. And finally, I could not have accomplished any of this without the unconditional love from my parents, Maria Adelia and Domingo Antonio, my brothers Edgar and Gabriel, my sister Maria Estela, and my fiancee - and my better half - Bianca. This work is theirs as much as it is mine. iv Table of Contents Dedication ii Acknowledgments iii List of Figures vii Abstract xi Chapter 1: Introduction 1 Chapter 2: Overview of Magnetoencephalography 7 2.1 Electrophysiological Sources of MEG Signals . . . . . . . . . . . . . . 7 2.1.1 Functional Interactions . . . . . . . . . . . . . . . . . . . . . . 8 2.2 Instrumentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 2.3 Forward and Inverse Modeling . . . . . . . . . . . . . . . . . . . . . . 11 2.3.1 Crosstalk . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 Chapter 3: Image Reconstruction and Statistical Analysis 14 3.1 Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 3.2 Inverse Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 3.2.1 Regularized Minimum Norm . . . . . . . . . . . . . . . . . . . 15 3.2.2 Linearly Constrained Minimum Variance (LCMV) Beamformer 17 3.2.3 Multiple Signal Classification (MUSIC) . . . . . . . . . . . . . 18 3.3 Statistical Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
19 3.3.1 Familywise Error Rate (FWER) . . . . . . . . . . . . . . . . . 21 3.3.2 False Discovery Rate (FDR) . . . . . . . . . . . . . . . . . . . 24 3.3.3 Multi-subject Analysis . . . . . . . . . . . . . . . . . . . . . . 25 Chapter 4: Methods for estimating functional interactions in the brain 28 4.1 Coherence and Coherency . . . . . . . . . . . . . . . . . . . . . . . . 28 4.1.1 Cross-Frequency Amplitude Coupling . . . . . . . . . . . . . . 29 4.2 Phase Synchrony . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 4.3 Methods for Nested Oscillations . . . . . . . . . . . . . . . . . . . . . 31 v 4.4 Multivariate Autoregressive (MV AR) Models . . . . . . . . . . . . . . 32 Chapter 5: Detection of Changes in Oscillatory Neural Activity with Multi- variate Linear Models 34 5.1 Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 5.2 Multivariate Analysis of Variance . . . . . . . . . . . . . . . . . . . . . 36 5.3 Test Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 5.4 Post-hoc Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 5.4.1 ProtectedF -tests . . . . . . . . . . . . . . . . . . . . . . . . . 42 5.4.2 Linear Discriminant Analysis . . . . . . . . . . . . . . . . . . . 42 5.5 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44 5.5.1 Simulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44 5.5.2 Experimental Data . . . . . . . . . . . . . . . . . . . . . . . . 47 5.5.3 Multi-Subject Analysis . . . . . . . . . . . . . . . . . . . . . . 51 5.6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 Chapter 6: Canonical Correlation Analysis Applied to the Estimation of Functional Connectivity 59 6.1 Canonical correlation analysis . . . . . . . . . . . . . . . . . . . . . . 59 6.2 Linear mixing correction . . . . . . . . . . . . . . . . . . . . . . . . . 61 6.3 Contribution of each frequency band to the correlation . . . . . . . . . . 66 6.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69 6.4.1 Simulations with Gaussian Noise . . . . . . . . . . . . . . . . . 69 6.4.2 Simulations Combined with Real Data . . . . . . . . . . . . . . 70 6.4.3 Experimental Data . . . . . . . . . . . . . . . . . . . . . . . . 74 6.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76 6.6 Relationship between linear mixing and collinearity of the canonical vectors 79 Chapter 7: Cross-frequency Phase and Amplitude Coupling with Complex Canonical Correlations 82 7.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82 7.2 Complex Canonical Correlation . . . . . . . . . . . . . . . . . . . . . . 84 7.3 Simulated Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85 Chapter 8: Conclusions 89 References 91 vi List of Figures 2.1 Cerebral frontal cortex drawn by Ram´ on y Cajal (1904). The figure shows both pyramidal (A, B, C, D, E) and non-pyramidal (F, K) cells. . 9 2.2 Left: Commercial MEG system with 275 gradiometers; right: diagram of MEG equipment [images obtained from (Pantazis, 2006)]. . . . . . . 10 2.3 Crosstalk effects in reconstructed current density maps: (a) spatial loca- tion of focal source; (b) reconstructed current density map obtained from applying the forward model and then the inverse solution on the focal source shown in (a). . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 5.1 Simulation results. 
(a) The spatial profile of the two sources simulated on the brain surface; the signal on the left hemisphere lies in the alpha band, while the signal on the right hemisphere lies in the alpha, beta and low-gamma bands; (b) The significant activation regions obtained from the univariate approach (ANOV A), with FWER controlled over space and frequency; (c) The significant activation regions obtained from the multivariate approach (MANOV A), with FWER controlled over space. 46 5.2 Post-hoc analysis results on the simulated data, at the,, and L fre- quency bands. Top row: significant sources obtained from the protected F test. Bottom row: sources where the discriminant ratio coefficient (DRC) was greater than or equal to 1=6. For the other frequency bands, none of the methods found any active source. . . . . . . . . . . . . . . 47 5.3 Results of the analysis with data from a visuomotor study. (a) Brain map of theR statistic, after thresholding for significance; (b) Compari- son between ANOV A (FWER corrected over space and frequency) and MANOV A; (c) Comparison between ANOV A (FWER corrected over space) and MANOV A. In the last two maps, sources in red were detected by both methods, sources in yellow were detected by MANOV A only, and sources in blue were detected by ANOV A only. . . . . . . . . . . . 48 vii 5.4 Maps of normalized differences of the mean observed signal powers for each frequency band, represented by the components of vector ^ 2 ^ 1 . 49 5.5 Significant protectedF brain maps for each frequency band of interest. . 50 5.6 Brain maps of the DRCs greater than or equal to 1=6 for each frequency band of interest. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51 5.7 Discriminant ratio coefficient (DRC) and normalized mean power for both conditions at specific voxels. (a) The voxel with the highest value of theR statistic; (b) A voxel detected by MANOV A, but not by ANOV A nor by any of the protectedF -tests; (c) A voxel not detected by MANOV A; (d) Spatial location on the cortical surface of each of the selected voxels. 52 5.8 Comparison between the results from multi-subject analysis obtained with MANOV A and with ANOV A, for varying values of the group null hypothesis parameter u. In these maps, red voxels were detected by MANOV A and ANOV A, blue voxels were detected only by ANOV A, and yellow voxels were detected only by MANOV A. . . . . . . . . . . 54 5.9 Maps of voxels with significant protectedF statistics, according to the group statistical test and foru = 13. . . . . . . . . . . . . . . . . . . . 55 6.1 Thresholded maps computed from multi-subject data acquired during a visuomotor experiment (more details in section 6.4.3). Left and right hemisphere maps from 15 subjects are presented, each pair of rows show- ing 5 subjects. In these maps, blue voxels have significant only when permutation-based distributions are used, orange voxels have significant only when Monte Carlo-based distributions are used, and red voxels have significant for both approaches. . . . . . . . . . . . . . . . . . 66 6.2 Simulation results. 
(a) The spatial profile of the three sources simulated on the brain surface; signal in region (1) lies on the beta band, signal in region (2) lies on the low-gamma band, and signal in region (3) lies on the alpha band; (b) Map of the maximum canonical correlation when the reference is in region (3) (indicated by the green dot); (c) Map of the quantities, that indicate the amount of linear mixing; (d) Results of the statistical thresholding: voxels in green have a significant, voxels in red have a significant, and voxels in blue have both significant and. . 71 viii 6.3 Absolute values of the canonical vectorsa andb at voxels in regions (1) (left) and (2) (right), obtained from the simulated data with the reference voxel in region (3). In the canonical vector plots, yellow bars indicate frequencies that contribute significantly to the interaction, according to theF -tests. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72 6.4 Time series of a trial from one of the 100 simulated datasets. (a) Simu- lated sources; (b) Real MEG data; (c) The sum of the simulated and the real data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73 6.5 Source-domain representations of the simulated (a), real (b) and the sim of simulated and real time series (c) presented in figure 6.4, at time instant t = 0:6944s. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73 6.6 Results from one of the group 1 datasets: (a) Spatial profile of the simulated sources; (b) map of the voxels with significant (green), (red) or both (blue), with the reference voxel within region (1). . . . . . 74 6.7 Results from one of the group 2 datasets: (a) Spatial profile of the simulated sources; (b) map of the voxels with significant (green) or (red), with the reference voxel within region (1). Notice that no voxel has significant and significant. . . . . . . . . . . . . . . . . . . . . . . 75 6.8 ROC curve drawn according to the proceedings described in section 6.4.2. The dotted line represents the line of no discrimination, where TPF equals FPF. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75 6.9 Left, posterior, and right views of brain maps of voxels with significant group effects, for varying values of the group null hypothesis parameter u. Colored voxels have effects for all values ofu smaller or equal to 10, but yellow voxels have also effects foru = 11, and red voxels have also effects foru equal to 11 and 12. . . . . . . . . . . . . . . . . . . . . . . 76 6.10 Anterior views of maps of voxels with significant components of the canonical vectorsa andb, according to the group statistical test and for u = 10. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77 6.11 Posterior views of maps of voxels with significant components of the canonical vectorsa andb, according to the group statistical test and for u = 10. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78 ix 7.1 (a) Spatial profile of two simulated sources: the phase at the theta band in source 1 is coupled with the amplitude at the low-gamma band in source 2; (b) Map of voxels with significant coupling with a reference in region 2, where matrixX contains amplitude information, and matrices Y contain phase information; (c) Map of voxels with significant coupling with a reference in region 1, where matrixX contains phase information, and matricesY contain amplitude information. . . . . . . . . . . . . . 
86 7.2 (a) V oxels where the low-gamma phase interacts strongly with the theta amplitude at a reference in region 2; (b) V oxels where the theta amplitude interacts strongly with the low-gamma phase at a reference in region 1. No other phase-amplitude or amplitude-phase combination yielded significant coupling. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88 x Abstract I describe methods for the detection of brain activation and functional connectivity in cortically constrained maps of current density computed from magnetoencephalography (MEG) data using multivariate statistical analysis. I apply time-frequency (wavelet) analysis to individual epochs to produce dynamic images of brain signal power on the cerebral cortex in multiple time-frequency bands, and I form observation matrices by putting together the power from all frequency bands and all trials. To detect changes in brain activity, I fit these observations into separate multivariate linear models for each time band and cortical location with experimental conditions as predictor variables; the resulting Roy’s maximum statistic maps are thresholded for significance using permuta- tion tests and the maximum statistic approach. A source is considered significant if it exceeds a statistical threshold, which is chosen to control the familywise error rate, or the probability of at least one false positive, across the cortical surface. As follow-up techniques to identify individual frequencies that contribute significantly to experimental effects, I further describe protectedF -tests and linear discriminant analysis. To detect functional interactions in the brain, I take these observations and compute the canonical correlation between a chosen reference voxel and every other voxel in the brain. The canonical correlation maps are also thresholded for significance, but here I use parametric asymptotic approximations. Based on collinearity properties of the vectors associated to the canonical correlations, I implement procedures to discard voxels whose interaction xi with the reference is due to linear mixing, and describe approximateF -tests to identify individual frequencies that contribute significantly to detected interactions. I evaluate these multivariate approaches both on simulated data and experimental data from a MEG visuomotor task study. My results indicate that Roy’s maximum root is more powerful than univariate approaches in detecting experimental effects when correlations exist between power across frequency bands; as for canonical correlation analysis, I find that it detects experimental effects, allowing the simultaneous evaluation of several possible combinations of cross-frequency interactions in a single test. xii Chapter 1 Introduction Magnetoencephalography (MEG) is a relatively recent technique for the imaging of human brain function that is becoming increasingly popular in the clinical and cognitive neuroscience communities. Even though images obtained from MEG data provide considerably lower spatial resolution than other imaging modalities, such as functional magnetic resonance imaging (fMRI) and positron emission tomography (PET), their temporal resolution on the order of milliseconds (a feature shared with EEG images) offers insights into how the brain works that are unmatched by those other modalities. The application of appropriate statistical analysis that controls effectively for false positives ensures that the scientific interpretation of these findings is performed in an objective manner. 
Statistical inference in MEG distributed activation maps typically uses the general linear modeling (GLM) framework (Kiebel, 2003), which is considered a standard in fMRI (Friston et al., 1995b) and PET (Worsley et al., 1992) neuroimaging studies. However, there are important differences between MEG and the other neuroimaging modalities related to how observations are created and fitted in GLM models, as well as how subsequent statistical inference is performed. The investigation of stimulus-locked event-related components typically involves a mass univariate approach where separate analysis of variance (ANOV A) models are fitted in each spatial location (Park et al., 2002; Barnes and Hillebrand, 2003; Brookes et al., 2004), or each spatial-temporal location (Pantazis et al., 2004, 2005b; Sekihara et al., 2005). Recently there has also been a great deal of interest in analysis of the induced response, which corresponds to stimulus-related 1 variations in power in different oscillatory bands as a function of time. This allows us to detect experimental oscillatory effects corresponding to modulations in power in specific frequency bands, even though the oscillations themselves are not phase-locked to the stimulus or response. Induced effects are typically investigated using a time-frequency decomposition such as the Morlet wavelet transform (Teolis, 1998; Tallon-Baudry and Bertrand, 1999). Examples of the use of ANOV A models to analyze the induced response include Singh et al. (2003); Durka et al. (2004); Kiebel et al. (2005); Pantazis et al. (2005c, 2009a). Since studies in which changes in brain activity were found across several frequency bands have been reported (Bas ¸ar et al., 2001; Kilner et al., 2005; Pantazis et al., 2005c), dealing simultaneously with all frequency bands of interest in a multivariate model might be advantageous compared to applying univariate approaches to each frequency band separately, especially because the improved detection of event- related modulations provided by the former in the presence of correlated variables has been demonstrated (Cole et al., 1994; Field, 2005). MEG and EEG are also among the most popular non-invasive imaging modalities, along with fMRI (Friston et al., 1995b; Fox et al., 2005; Andrews-Hanna et al., 2007), in neuroscience experiments that seek to analyze brain functional connectivity, i.e. inter- actions between cortical regions and their modulation by specific tasks (David et al., 2004; Darvas and Leahy, 2007; Schoffelen and Gross, 2009). Even though the oper- ations of individual neurons and synapses are now understood in considerable detail, understanding the manner in which they cooperate in large ensembles remains a chal- lenge. Again, the good temporal resolution of EEG/MEG plays an important role, as time-frequency decompositions allow the estimation of functional connectivity between signals from distant cortical areas at several frequency bands. 
Recent findings include theta-alpha (Schack et al., 2005) and theta-gamma coupling in visual memory (Bruns 2 and Eckhorn, 2004; Penny et al., 2008; Schack et al., 2002; Schack and Weiss, 2005); theta-gamma coupling in word recognition (Mormann et al., 2005), and behavioral tasks (Canolty et al., 2006; Jensen and Colgin, 2007); delta-gamma coupling in visual attention (Lakatos et al., 2008); alpha-gamma coupling in mental arithmetic tasks (Palva et al., 2005; Palva and Palva, 2007); alpha-beta (Nikulin and Brismar, 2006) and alpha-gamma coupling in rest (Osipova et al., 2008); beta-gamma coupling in hand movement sim- ulation (de Lange et al., 2008; Jerbi and Bertrand, 2009); and coupling of ultraslow rhythms with frequencies between 1 and 40Hz during somatosensory tasks (Monto et al., 2008). Within-frequency interactions have also been widely reported, e.g. in Busch and VanRullen (2010); de Pasquale et al. (2010); Fries (2005); Hummel and Gerloff (2006); Lachaux et al. (1999); Jerbi et al. (2007); Pantazis et al. (2009b); Palva et al. (2010). Again, a multivariate approach in this context might have better sensitivity in detecting neural effects than univariate ones, because cortical interactions often involve several frequency bands, even though popular connectivity measures in EEG/MEG, such as coherence (Nunez et al., 1997; Gross et al., 2001; Quyen et al., 2001), phase synchrony (Tass et al., 1998; Lachaux et al., 1999; Lin et al., 2004) and Granger’s causality (Kamin- ski et al., 2001; Kus et al., 2004; Brovelli et al., 2004), typically take into account a single quantity from each location, such as amplitude, phase or a combination of the two, within a specified frequency band. Given the wealth of information available from EEG/MEG acquisitions, one must know beforehand the set of variables that are meaningful to the analysis, or conversely, perform a large number of tests across several cortical sites, frequencies, and experimental conditions to identify true interactions. Interpretation of interaction measures is further confounded by the ill-posed nature of inverse methods in EEG/MEG, producing images of low resolution with blurred cortical sources. Most interaction measures are sensitive to the linear mixing of sources, 3 producing spurious interactions (Nunez et al., 1997). Approaches to reduce linear mixing include eliminating zero-lag correlations (Nolte et al., 2004; Marzetti et al., 2008; G´ omez- Herrero et al., 2008), reconstructing current density maps that cancel interfering signals (Hui and Leahy, 2006; Hui et al., 2009), and simply excluding a priori a number of spatial locations from the analysis (Carbonell et al., 2009). In this dissertation, we present methods that analyze MEG signals in the time- frequency domain based on a multivariate statistical framework. We use these methods to solve two main problems: (1) detecting task-based changes in brain activity; and (2) assessing functional connectivity in the brain. We combine minimum-norm inverse imaging with complex Morlet wavelet expansion to compute dynamic maps of cortical activity across multiple frequencies. Multivariate observations are formed by concate- nating measurements of neural power across several frequency bands. The performance of our method is evaluated using simulations and real data obtained from a visuomotor MEG study (Jerbi et al., 2007). Although the emphasis here is on MEG, our method readily applies to EEG signal analysis, and even to combined EEG/MEG acquisitions. 
For the first problem, the statistic used to measure group separation is Roy’s maximum root, which can be thought of as a generalization of the conventional F -statistic for higher-dimensional problems. The significance of each source is estimated based on the familywise error rate (FWER), or the probability of at least one false-positive under the null hypothesis of no changes in brain activation. The FWER is related to the probability distribution of the maximum statistic across all sources, and resampling methods are used to estimate this distribution (Nichols and Hayasaka, 2003). For the second problem, we implement canonical correlation analysis: given two sets of measurementsX andY, canonical correlation consists of finding linear combinations of each set that result in the maximum correlation between them. In the context of MEG 4 signal analysis in the time-frequency domain, it is then able to find the optimal combina- tion of frequencies in one cortical site that best predicts a combination of frequencies in another cortical site. We provide an FWER-based parametric test to identify significant cortical interactions while controlling for multiple comparisons. Importantly, because multivariate observations are formed across frequencies, our method allows for a solution to the linear mixing problem by considering the way source interaction profiles vary in the spectral dimension. Due to their increased sensitivity in detecting spatially distributed experimental effects, recently there has been an increased interest in multivariate structural (Worsley et al., 2004; Taylor and Worsley, 2008) and functional MRI studies (Friston et al., 1995a; Carlson et al., 2003; Cox and Savoy, 2003; Haynes and Rees, 2006; Norman et al., 2006; Pereira et al., 2009), in which the observations are formed across neighboring voxel locations, following, for example, the searchlight approach (Kriegeskorte et al., 2006). In the field of EEG/MEG, some of the recent multivariate methods developed include Hotelling’s T 2 statistics across the spatial dimension (Friston et al., 1996; Carbonell et al., 2004), partial least squares (D¨ uzel et al., 2003; McIntosh and Lobaugh, 2004), multivariate phase synchronization (Cadieu and Koepsell, 2008), and pattern classifiers with application to brain computer interfaces (Pfurtscheller and Neuper, 2001; Donoghue, 2002; Wolpaw and McFarland, 2004; Lebedev and Nicolelis, 2006). This dissertation is organized as follows: chapter 2 presents the basics of how the brain generates signals that are measurable with MEG equipment, along with an overview of MEG instrumentation, the model that allows the creation of images of brain activity from MEG data, and practical applications. In chapter 3, we present in more detail some of the most common methods used to reconstruct brain images of electric activity, along with tools to determine if these reconstructions provide statistically significant results. 5 Chapter 4 shows approaches to evaluate the presence of task-based functional interactions with MEG. In chapter 5, we describe a new methodology to detect changes in oscillatory (induced) brain activation based on multivariate linear models. Chapter 6 provides a further application of multivariate statistics to MEG signal analysis: we use canonical correlations to investigate cross-frequency functional connectivity. Finally, chapter 8 lists our conclusions, along with topics for future research. 
6 Chapter 2 Overview of Magnetoencephalography Magnetoencephalography is an imaging modality that provides unique insights into the functioning of the human brain. Even though activity profiles computed from MEG data suffer from low spatial resolution, specially when compared with other brain imaging modalities such as PET and fMRI, its temporal resolution (on the order of milliseconds) is much higher than that of the those modalities. Furthermore, unlike PET (which infers activity indirectly from the uptake by the brain of molecules such as oxygen and glucose) and fMRI (which uses local hemodynamic changes as a correlate of brain activation), MEG, along with EEG, is able to measure neuronal activity directly. In this chapter, we describe the processes in the brain that result in magnetic fields that can be measured with MEG, the machinery that perform these measurements, how we can create images of brain activation from them, and a few applications of this technique. 2.1 Electrophysiological Sources of MEG Signals In the human brain, the basic unit of information processing and transmission is the neuron (or nerve cell). It consists of three main parts: the soma, where the neuron nucleus is located; the dendrites, which receive the signals from other neurons; and the axon, which transmits the neuron signal to other nerve cells. Axons are surrounded by myelin sheaths, whose function is to provide nutrients and oxygen, mechanical support, and electrical insulation. The most common way of communication between neurons through junctions called chemical synapses. Typically, an electric impulse (action potential) traveling through 7 the membrane of the presynaptic neuron, as it arrives at the termination of its axon, causes the release of neurotransmitters from synaptic vesicles into the synaptic cleft. The neurotransmitters bind to receptors located at the membrane of the postsynaptic neuron, where ion channels (mostly Na + , K + and Cl ) are then opened. Consequently, the membrane voltage changes, creating a postsynaptic potential (PSP); if the PSP is greater than the firing threshold at the axon hillock, another action potential is initiated along the axon of the postsynaptic neuron (Squire, 2008). In the cerebral cortex, excitatory PSP’s occurring at apical dendrites of pyramidal cells create intracellular electric currents called primary currents (Gloor, 1985), and those are believed to be the main generators of measurable magnetic fields (Baillet et al., 2001). The synchronization of primary currents in clusters of tens of thousands of dendrites of pyramidal cells (oriented in parallel to each other and perpendicularly to the cortical surface) results in magnetic fields strong enough to be detected by MEG equipment (Nunez and Silberstein, 2000), even though both primary and volume currents (extracellular currents that result from the conservation of charge principle) generate measurable magnetic fields outside the head. Also, because of their short duration, action potentials along neuron axons are not thought to cause strong magnetic fields (Nunez and Srinivasan, 2006). Figure 2.1 shows the arrangement of pyramidal cells on the cortex. 2.1.1 Functional Interactions It is believed that any cognitive operation in the brain occurs not at the level of individual neurons and their local synaptic connections, but at the level of neural assemblies, which can be defined as “distributed local networks of neurons transiently linked by reciprocal dynamic connections” (Varela et al., 2001). 
Typically, these neural assemblies are not only temporally coupled but also spatially constrained, and they can be associated with a specific cognitive task (Fingelkurts et al., 2005); when two or more of these assemblies 8 Figure 2.1: Cerebral frontal cortex drawn by Ram´ on y Cajal (1904). The figure shows both pyramidal (A, B, C, D, E) and non-pyramidal (F, K) cells. interact, we have large-scale integration, and this phenomenon is usually what one is most interested in detecting in functional connectivity studies, because it indicates that a given stimulus results in the synchronization of a number of spatially distant, functionally distinct areas of the brain. 2.2 Instrumentation MEG requires sophisticated sensing technology, because the magnetic fields generated by the human brain are extremely low – less than 500 fT, or 10 8 of the Earth’s magnetic field (H¨ am¨ al¨ ainen et al., 1993). The detector capable of acquiring fields of such small magnitude, and which is used in commercial MEG systems to this day, is called a SQUID, 9 or Superconducting QUantum Interference Device (Zimmerman et al., 1970); the first MEG acquisition with the SQUID detector consisted of recordings of brain signals at the alpha frequency band (Cohen, 1972). Figure 2.2: Left: Commercial MEG system with 275 gradiometers; right: diagram of MEG equipment [images obtained from (Pantazis, 2006)]. A picture and diagram of a modern MEG system is shown in figure 2.2; most of the commercial MEG systems recently produced contain hundreds of SQUID sensors (magnetometers). These systems are built with mechanisms to attenuate several types of background noise. The sensors are made of superconducting material, and are immersed in low-temperature liquid helium, which also results in very low instrumental noise; the room containing the MEG apparatus is magnetically shielded to diminish effects of high-frequency interference, such as RF waves; and low-frequency noise is reduced 10 by the inclusion in the system of gradiometers, which are sensors capable of detecting spatial gradients of the magnetic fields that have reduced sensitivity to fields from distant sources compared to a magnetometer (Baillet et al., 2001). 2.3 Forward and Inverse Modeling Given a set of MEG measurements, we are interested in finding the strength and location (and sometimes the orientation) of the sources of current density in the brain that result in the measured magnetic fields. This is an inverse problem, which requires first an adequate construction of the forward model, or the model that maps brain current densities onto the MEG sensor space. An important concept when developing a forward model is the equivalent current dipole (ECD), which assumes that all electric currents within a small volume of cortex can be approximated as an oriented point source (Baillet et al., 2001). Because most of the relevant MEG signals have frequencies below 100 Hz, it is possible to describe the physics of MEG with quasi-static approximations of Maxwell’s equations (Baillet et al., 2001). Also, the magnetic permeability of biological tissue is very close to the permeability of free space, thus the main factors in determining the forward model are the electric conductivity and shape of all components of the head (scalp, skull, brain and cerebrospinal fluid) (Darvas et al., 2004). 
The conductivities of the tissues in the head can be assumed to be homogeneous with standard values, thus the head geometry remains to be found; simple models that approximate the head with spheres (Sarvas, 1987; Huang et al., 1999) are known to work well (Leahy et al., 1998), while more realistic models that take into account the true shape of the head, such as boundary element methods and finite element methods (Baillet et al., 2001; Mosher et al., 1999; Huang et al., 1999; Baillet et al., 2004), produce some improvement in localization accuracy (Leahy et al., 1998). The latter models require an accurate representation of the 11 tissue boundaries, which can be obtained by the segmentation of anatomical MRI images, a procedure that can be performed with automated software (Shattuck and Leahy, 2000; Goebel, 1996). The final model provides a linear mapping between the unknown cortical current densities and the measured magnetic field. The inverse problem essentially requires solution of this set of typically highly underdetermined equations. In the case that multiple dipole rather than cortical current density models are used, the locations of the dipoles must also be determined, which are nonlinear parameters that affect the forward model. Once an appropriate forward model is computed, an inverse solution can be applied to the MEG data to find the current sources. The most popular inverse methods can be divided in two categories: dipole fitting methods and imaging methods. Dipole fitting methods assume that the measured magnetic fields can be described by a few ECDs, and seeks to determine their location, strength and orientation, while imaging methods assume that MEG data is generated by possibly several thousands of clusters of pyramidal cell dendritic trees orientated normally to the cortical surface and with known location, and only tries to find their current intensities using the linear forward model (Baillet et al., 2001). We will examine some of these methods in chapter 3. 2.3.1 Crosstalk One important issue in MEG signal analysis at the source domain, especially in matters concerned with connectivity studies, is the phenomenon of crosstalk (also known as field spread or linear mixing) (Schoffelen and Gross, 2009). As the forward model is linear, a given current source will be mapped onto every channel in the MEG array; consequently, all MEG channels will potentially have information from all sources in the brain. And no inverse solution is capable of fully unmixing the original sources, because the inverse problem is highly underdetermined: for an accurate representation of distributed brain 12 activity, we typically need tens of thousands of source locations on the cortical surface, but we have only hundreds of sensors. Distributed source imaging methods of the type used in this thesis therefore need to use a substantial amount of regularization, resulting in relatively low cortical resolution as illustrated in figure 2.3. Crosstalk becomes a serious problem in connectivity analysis, because due to the wide point spread function of the inverse solution, we might find interactions between two regions when actually there is no functional coupling at all. Crosstalk effects will be discussed in more detail in later chapters. Figure 2.3: Crosstalk effects in reconstructed current density maps: (a) spatial location of focal source; (b) reconstructed current density map obtained from applying the forward model and then the inverse solution on the focal source shown in (a). 
13 Chapter 3 Image Reconstruction and Statistical Analysis In MEG, image reconstruction refers to techniques that convert the magnetic fields acquired in MEG experiments into maps of brain activity (current density maps, or CDMs), presented either on a 3D representation of the brain (volume imaging), or on a 2D representation of the cortical surface (cortical imaging). Because MEG data is severely corrupted by noise from several sources, statistical procedures must be applied to the CDMs to detect significant brain activation while controlling for the number of false positives. In this chapter, we present the most common methods for MEG image reconstruction, along with typically used approaches and criteria for assessing statistical significance of current density maps. 3.1 Model Our MEG data set consists of J stimulus-locked event-related trials, one per stimu- lus repetition. Each trial is an array of data M (n sensors n timepoints ) representing the measured magnetic field at each sensor and each time instant. Brain activationZ (n sources n timepoints ) is modeled as being linearly related to the measurements, according to the expression: M =GZ +N; (3.1) where G (n channels n sources ) is the forward operator, also called lead field matrix, and N represents noise in the measurements. G can be computed with one of the 14 methods discussed in section 2.3. The dimensions of the matrices described here come from the assumption (followed throughout the remainder of this dissertation, unless otherwise stated) that the elemental current dipoles at each cortical location are oriented perpendicular to the cortical surface; if we do not assume this constraint to our model, then at each source we will have three possible current densities (one for each component of the density,x,y andz), and the number of rows ofZ and the number of columns ofG is multiplied by 3. 3.2 Inverse Methods As discussed in section 2.3, inverse solutions can usually be divided in two categories: dipole fitting methods and imaging methods. The first of the methods presented below (regularized minimum-norm imaging) falls into the second category, while the others (linearly constrained minimum variance beamformer and multiple signal classification) are in the first. The choice of which method to use is not trivial, and depends on what kind of activation the experiment is expected to yield: for a very small number of focal, stationary activation sources, dipole fitting methods perform better, as demonstrated by ROC studies (K¨ uc ¸ ¨ ukaltun-Yıldı rım et al., 2006); for more distributed, more complex activation patterns, imaging methods should be preferred (Darvas et al., 2004). 3.2.1 Regularized Minimum Norm We can obtain an estimate of the time seriesZ by choosing an appropriate cost function on the fit of the estimated image to the data, and applying linear methods to minimize it (H¨ am¨ al¨ ainen and Ilmoniemi, 1984). However, in order to ensure stable reconstructions, some form of regularization must be included in the cost function, because the inverse problem is severely ill-posed. Standard Tikhonov “mininum norm” regularization uses 15 the L2 norm of the solution as a regularizer and has the following cost function (Tikhonov and Arsenin, 1977; Okada, 2003): = (MGZ) 0 (MGZ)Z 0 Z; (3.2) where is the regularization parameter, which is often chosen heuristically but can also be chosen using the L-curve (Hansen, 1992; Hansen and O’Leary, 1993) or using cross-validation methods (Golub et al., 1979; Wahba, 1990). 
By setting the derivative of to zero, we find: ^ Z = (G 0 G +I) 1 G 0 MHM; (3.3) whereH is the regularized minimum-norm inverse operator. There are many variants on the minimum norm method where the regularizer is replaced with the quadratic form Z 0 WZ. In the most common cases,W is either a column normalizing diagonal weight (Baillet et al., 2001) or based on a Laplacian operator, as in the LORETA method (Pascual- Marqui, 2002). The column normalization weighting has the effect of compensating for the effects of depth dependent sensitivity and tends to produce less bias of sources towards superficial portions of the cortex that are closer to the sensors. Since the sensitivity of each source location to noise in the data is not uniform, it is common to weight this solution so that the resulting current density maps are noise normalized (Dale et al., 2000): ^ Z nn = HM p diag(HC N H 0 ) ; (3.4) whereC N is the noise covariance. The normalized reconstructed sources ^ Z nn will all have at-distribution, in the case where the signal consists of noise only, which serves as the standard null hypothesis in detecting significant experimental effects in evoked response studies using MEG (Dale et al., 2000; Pantazis et al., 2005a). Another variation 16 on normalization is to replace the noise covarianceC N in equation (3.4) with the data covarianceC M . This approach is referred to as the standardized low-resolution brain electromagnetic tomography, or sLORETA (Pascual-Marqui, 2002). It should be noted that in the case of noise only,C M = C N , thus the two methods are equivalent. They therefore differ only when true signals are present. 3.2.2 Linearly Constrained Minimum Variance (LCMV) Beam- former Beamforming, or spatial filtering, is a technique initially applied to array signal processing problems, and which has found wide use in fields such as radar and sonar (Van Veen and Buckley, 1988). The spatial filter is designed to pass the signal originating from a specified location with unity gain, while attenuating signals from all other directions under consideration. For this reason, the beamformer applied to MEG is sometimes referred to as a “virtual depth electrode”. Under the assumption that inerfering signals and additive noise are uncorrelated with the signal coming for the location of interest, the desired result is achieved by minimizing output signal power subject to the unity gain constraint. In MEG, the goal is to find a filter at each source location that minimizes the signal output in it while attenuating signals elsewhere in the brain (Van Veen et al., 1997). The beamformer output at spatial locationi is: Y i =W 0 i M; (3.5) where W i is the spatial filter for location i with dimensions n channels 3, and Y i has dimensions 3n timepoints . In the expression above, we assume unknown current density orientation; if the current densities are assumed to be normal to the cortical surface, equation 3.5 remains valid, but now we have vectorsy i (1n timepoints ) andw i (n channels 1) instead ofY i andW i , respectively. 17 We find the expression forW i by solving the following optimization problem: min W i tr(W 0 i C M W i ) s:t: W 0 i G i =I; (3.6) whereC M is the data covariance, andG i (n channels 3) are the rows in the lead field matrix that correspond to sourcei (which we replace with vectorg i in the constrained case). By using equation 3.6, we seek to minimize the variance of the filter outputY i , while the linear constraint ensures unit filter gain at the desired locationi. 
We find the solution of this problem by using Lagrange multipliers, and the result is given by: W i = (G 0 i C 1 M G i ) 1 G 0 i C 1 M : (3.7) By substituting equation 3.7 in 3.5, we find the current density at sourcei, and the power (or variance) ofY i is given bytr[(G 0 i C M G i ) 1 ]. If we normalize the signal power by the noise power (i.e. the expected output at thei th in the presence of noise only), we have the neural activity index at locationi (Van Veen et al., 1997): d i = tr[(G 0 i C 1 M G i ) 1 ] tr[(G 0 i C 1 N G i ) 1 ] ; (3.8) whereC N is the noise covariance. A recent development in MEG source reconstruction with beamformers modified the linear constraint of equation 3.6, in order to force not only unit gain at the target location, but also null current densities at other specific locations (Hui et al., 2009). This is a promising approach in connectivity studies, because it attenuates crosstalk effects between spatial regions. 3.2.3 Multiple Signal Classification (MUSIC) MUSIC is another inverse method for MEG that was originally developed for applications in array signal processing (Schmidt, 1986). In MUSIC, we estimate the signal subspace 18 from the MEG measurements, and then we project the rows of the lead field matrix corresponding to each spatial location onto the signal subspace. The locations at which the normalized projections are approximately equal to unity correspond to the desired dipole locations (Mosher and Leahy, 1998; Mosher et al., 1999). To estimate the signal subspace, we compute the singular-value decomposition of the measurement matrix:M =U M S M V 0 M . If a number of assumptions hold (numberp of dipoles smaller thann channels , large SNR, i.i.d. channel noise), a basis for the signal subspaceU S can be formed from the firstp columns ofU M , while the other columns of U M form a basis for the noise subspaceU N ; the value ofp can be determined from the singular values of the covariance ofM (Mosher and Leahy, 1998; Baillet et al., 2001). MUSIC then projectsG i , the lead field for locationi, ontoU S by means of the canonical correlation between these two matrices (details on canonical correlation analysis are presented in chapter 6); if the canonical correlation is equal to 1, then we have found a current dipole, and the associated canonical vectors indicate the dipole orientation. 3.3 Statistical Analysis In order to assess the statistical significance of current density maps computed from MEG data, we must first convert these CDMs into statistical maps, at which we can apply some form of thresholding to control for false positives. The standard approach for this transformation in MEG is the general linear model (GLM) (Kiebel, 2003). One simple example of the use of this approach is as follows: let us assume that we have MEG measurements from several trials or epochs, in which two experimental conditions were tested (e.g. some experimental task and rest). At each spatial location in the brain, we fit a GLM according to the following expression: y ij = i +u ij ; (3.9) 19 where y ij is the observation at trial j of condition i (i2 1; 2 in this case), i is the regression parameter, andu ij refers to noise; the observation can be, for instance, the current density itself at that location, or the density power within a specific frequency band. 
By putting together information from all available trials, we can rewrite equation 3.9 in matrix form: y =X +u; (3.10) where the rows of vectorsy andu containy ij andu ij , respectively, = [ 1 2 ] 0 , and X is the design matrix that relates the observations to the regression parameters, and whose elements are either 0 or 1. To test if there are differences between conditions, the appropriate statistic is the mean difference divided by the noise variance, and is given by: T = (n trials 1) ^ 1 ^ 2 yX ^ ; (3.11) where ^ = [ ^ 1 ^ 2 ] 0 is the least-squares estimate of . If the noise is assumed Gaussian, the statistic shown in equation 3.11 follows Student’st-distribution, but more complex linear models have different distributions (Snedecor’sF , Hotelling’sT 2 , among others). If no assumptions can be made about the data, the statistic distributions may be extracted directly from the data with nonparametric resampling methods (Nichols and Holmes, 2001). Once we have statistical maps, we can proceed with the thresholding. In the brain, we may have thousands of statistical hypotheses to be tested, one for each current source; if the criterion for control of false positives is not chosen carefully, the number of errors in the detection might be exceedingly large: for instance, if we have 10,000 sources in the brain, and we test each one separately for significance with an error probability of 5%, the noise at each location being independent, we will find on average 500 active sources even if there is no activity at all. In practice this treshold would produce far 20 fewer errors since regularization of the inverse problem results in a substantial amount of smoothing of noise on the cortical surface. However, we still need to control for the multiple comparison problem to avoid erroneous inferences. Below we describe criteria that address the problem of multiple comparisons. It should be pointed out that the GLM described above, and the approaches discussed below, are appropriate for the statistical thresholding of CDMs produced by imaging methods; for dipole fitting solutions, other methods must be used for the assessment of significance, such as bootstrap resampling (Darvas et al., 2005). 3.3.1 Familywise Error Rate (FWER) The familywise error rate is defined as the probability of at least one false positive under the null hypothesis H 0 of no activity in the brain; its mathematical expression is as follows (Nichols and Hayasaka, 2003): P(FWER) = P(max i T i >T jH 0 ) = P([ i T i >T jH 0 ); (3.12) where T i is the statistic at location i. The most straightforward method to control the FWER is the Bonferroni method (Hochberg and Tamhane, 1987), which can be summarized as follows: if we haven sources in the brain, and we want to threshold the statistical map at an level of confidence, then we must test each individual source at the=n sources . This method is based on Bonferroni’s inequality: P nsources [ i=1 T i >T ! nsources X i=1 P(T i >T ) =n sources ; (3.13) which becomes an equality if theT i ’s are independent. However, it is a very conservative test for MEG studies, where strong and highly variable data dependencies are usually found. 
21 A less conservative approach for the control of FWER consists in ordering the p-values at all locations, and sequentially testing each p-value p (i) in ascending or descending order One such method is due to Hochberg (1988), which starts by checking whether the p-value p (nsources) is larger than ; if that is the case, then p (nsources1) is compared with=2, and so on until we find aj for whichp (j) =(n sources j + 1), and consider allp (i) fori j significant. If we do the same comparisons but starting from the smallest p-valuep (1) and then going up, we have Holm’s method (Holm, 1979), of which Hochberg’s method is a more powerful modification. Hochberg’s method is more powerful than Bonferroni’s, and requires weak assumptions on data dependence that are satisfied by multivariate Gaussian data with non-negative correlations where the null hypothesis is true (Sarkar, 2002). However, it does not take into account the spatial smoothness and the particular dependence of the data (Nichols and Hayasaka, 2003). FWER methods that use information about the data’s spatial structure and dependence are based on the distribution of the maximum statistic max i T i . Once this distribution is determined, we can control FWER at the-level by thresholding all sources in the brain at the-threshold for the maximum statistic. There are two main techniques to find the maximum distribution: 1. With permutation methods (Nichols and Holmes, 2001), in which samples of the distribution are obtained empirically from surrogate brain maps generated by resampling the data. For instance, for the linear model in equation 3.10, we can create surrogate statistics by randomly permuting the rows of the design matrixX, and then proceed with the computation ofT as usual; 22 2. With random field theory (RFT) (Adler, 1981), according to which the maximum distribution can be approximated by the following function: P (max i T i >T jH 0 ) D X d=0 R d (S) d (T ): (3.14) In this expression,D is the dimension of the search regionS to be analyzed (D = 2 for cortical images,D = 3 for volumetric maps);R d (S) is sum of quantities known as resels (or resolution elements), which depend only on topographical features of the statistical map; and d is known as the Euler Characteristic (EC) density, which is dependent on the thresholdT and on the distribution of the statistics (Gaussian, t,F , 2 , Roy’s maximum root and so on) (Worsley et al., 1996, 2004). Both approaches have advantages and disadvantages: permutations yield exact distri- butions, require very few assumptions, and are easy to implement, but its computational cost can be very high; on the other hand, RFT has fast processing times, but is only an approximation, which requires several assumptions for accurate results (i.e. same known parametric distribution at every source, point spread function with two derivatives at the origin, smoothness, and high thresholds) (Pantazis et al., 2005a). Whatever technique is used, however, it is easy to verify that the use of the maximum statistic takes into consideration the correlation structure in the data. To illustrate this point, let us consider two extreme cases: if, for instance, all locations are 100% independent, then thresholding based on max i T i is equivalent to a Bonferroni test, as a direct result of equation (3.13); on the other hand, if the locations are 100% dependent, then all T i ’s are equal, and the thresholding is equivalent to testing each individual source without correction for multiple comparisons. 
23 3.3.2 False Discovery Rate (FDR) Another criterion for error control is the false discovery rate, which is the expected rate of false discoveries among the sources where the null hypothesis was rejected (Benjamini and Hochberg, 1995; Genovese et al., 2002): for example, if we choose the FDR to be = 5%, then we can expect 5% of the sources above the threshold to be false positives. Statistical thresholding based on the FDR can be performed with the following steps (Benjamini and Yekutieli, 2001): 1. Convert the statistical map into a map of p-values (this can be done parametri- cally or, if the statistic distribution is not known, by obtaining it with resampling methods); 2. Sort the p-values in ascending orderp (1) p (2) :::p (nsources) ; 3. Find the largest sourcei for which the following inequality holds: p (i) i n sources c(n sources ) ; (3.15) 4. Reject the null hypothesis at all voxels whose p-value is smaller thanp (i) . c(n sources ) = P nsources k=1 1=k if no assumptions are made about the joint distributions of the p-values, orc(n sources ) = 1 if the p-values are assumed to be independent or to have positive dependence (Benjamini and Yekutieli, 2001). FDR tends to detect more active sources than any of the FWER methods described above, and it also controls for FWER, but in a weak sense. With FDR, if the null hypothesis is rejected everywhere in the brain, FWER control is guaranteed, but if the null hypothesis is rejected at some source, then FWER control is not guaranteed anywhere; on the other hand, the methods of section 3.3.1 are able to control FWER 24 effectively in any subset of the brain even if the null hypothesis is rejected somewhere (strong FWER control). 3.3.3 Multi-subject Analysis The procedures described in the previous two sections apply to the statistical testing of data obtained from a single subject. In order to obtain conclusions that are meaningful from a neuroscientific point of view inference must be performed at the population level. But to carry out those inferences, we need to adequately combine the signals from all individual subjects to form a group statistical map, which can then be thresholded according to the approaches discussed above. Two possible ways of producing group maps are by combining linear models and by combining p-values (Lazar et al., 2002; McNamee and Lazar, 2004). A simple method to combine linear models is with summary statistics (Mumford and Nichols, 2006; Holmes and Friston, 1998): in the first stage, we fit at every voxel of each individual subject (we assume that the cortical surfaces of all subjects are coregistered to a common brain) a general linear model as the one shown in equation (3.10): y k =X k k +u k ; (3.16) wherek is the subject index. Each individual linear model gives us a contrast of the estimated regression parametersc 0 ^ k , and the contrasts of all subjects form the rows of the observation vectory g in the group linear model: y G =X G G +u G ; (3.17) from which we can then proceed to the statistical tests as usual. The summary statistics approach assumes that the variance of c 0 ^ k is the same for across subjects, and also that the subjects are independent. If these assumptions do not hold, one must estimate 25 the within-subject and between-subject variances that will be taken into account in the computation of the estimates ^ k and ^ G (Beckmann et al., 2003). 
Group analysis with pooling of p-values requires a group hypothesis, which depends on the minimum number of subjectsu, 1un subjs that must present an effect so that a group effect is detected. The group hypothesis can be stated as follows: H u 0 :m<u vs: H u a :mu; (3.18) wherem is the unkwown number of subjects with an actual effect (Heller et al., 2007). Onceu is chosen, there are several ways we can combine the individual p-valuesp k for each location into a map of group p-valuesp G . Some of these ways are (here,p (k) are the ordered individual p-values): 1. Fisher’s method: p G = P X F 2 n subjs X k=u logp (k) ! ; (3.19) whereX F has a 2 distribution with 2(n subjs u + 1) degrees of freedom; 2. Stouffer’s method: p G = 1 P n subjs u+1 k=1 z (k) p n subjs u + 1 ! ; (3.20) where is the cumulative distribution function of a standard normal distribution, andz (k) = 1 (1p (k) ); 3. A generalized form of Simes’ method, proposed in Benjamini and Heller (2007): p G = min k=1;;n subjs u+1 n subjs u + 1 i p (u1+k) : (3.21) This method is a more powerful variant of Bonferroni’s method for combining p-values, according to whichp G = (n subjs u + 1)p (u) . 26 Once we compute a map with thep G values, procedures such as Hochberg’s method or the FDR-based thresholding discussed above can be used to find the locations with a significant group effect. Benjamini and Heller’s method does not require independence across subjects, unlike Fisher’s and Stouffer’s methods, and it is also more stable: when equations (3.19) or (3.20) are used, a singlep (k) with a very low value will dominate the group effect, because it will force an infinitesimal value forp G regardless of the other subjects’ p-values. When choosing whether to use summary statistics or to test for group effects with group p-values, one must take into account the advantages and shortcomings of each option: while the former enables tests of directionality and magnitude, and makes it easier to implement thresholding based on the maximum distribution, the latter takes advantage of all statistical information on each location, not just means and variances (although it does not consider the spatial structure of the data), while affording greater flexibility in the choice of the minimum number of subjects with an effect. Furthermore, in problems such as those we will deal with in chapters 5 and 6, summary statistics require a large number of subjects to avoid problems related to limited degrees of freedom. 27 Chapter 4 Methods for estimating functional interactions in the brain By directly measuring neuronal activity at very small time scales, MEG has the ability of detecting not only short-term variations in brain dynamics, but also brain signal behavior in the time-frequency domain; both features make MEG an attractive choice for studies of functional interactions. In this chapter, we will focus on some of the most common methods for the assessment of functional connectivity in the brain with MEG data. We will discuss the properties of these methods, including how they deal with linear mixing (crosstalk) effects, when applicable. Our survey describes techniques that estimate functional interactions directly from current densities; other techniques (e.g. (David and Friston, 2003)) use the data to build complex, nonlinear models that are feasible from a biophysical point of view, but we do not focus on them in this work. 
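For concreteness, two of the partial-conjunction combinations above can be sketched as follows (an illustrative Python/SciPy fragment; the running index in equation (3.21), rendered ambiguously in the text, is assumed to be k, as in Benjamini and Heller (2007), and the data layout is hypothetical, with `pvals` holding one p-value per subject at a fixed location).

```python
# A minimal sketch of Fisher's partial-conjunction combination (eq. 3.19) and
# the Benjamini-Heller / generalized Simes combination (eq. 3.21), for the
# group null H0: fewer than u subjects show an effect at this location.
import numpy as np
from scipy import stats

def fisher_partial(pvals, u):
    p = np.sort(np.asarray(pvals, dtype=float))        # p_(1) <= ... <= p_(n)
    n = p.size
    x = -2.0 * np.sum(np.log(p[u - 1:]))               # uses p_(u), ..., p_(n)
    return stats.chi2.sf(x, df=2 * (n - u + 1))

def benjamini_heller(pvals, u):
    p = np.sort(np.asarray(pvals, dtype=float))
    n = p.size
    k = np.arange(1, n - u + 2)                        # k = 1, ..., n - u + 1
    return np.min((n - u + 1) / k * p[u - 2 + k])      # (n-u+1)/k * p_(u-1+k)
```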
4.1 Coherence and Coherency Coherency is an analog of the standard temporal cross-correlation coefficient on the signal frequency domain (Nunez et al., 1997). Given two spatial locations in the brain and the time representations of their current densitiesx(t) andy(t), coherency is given by the expression: C x;y (f) = X(f)Y (f) p X(f)X (f) p Y (f)Y (f) ; (4.1) whereX(f) andY (f) are the respective complex frequency representations of signals x(t) andy(t) (obtainable with Fourier transforms or wavelet convolutions, for instance 28 (Schoffelen and Gross, 2009)), and “ ” denotes the complex conjugate. Coherence is simply the absolute value of the quantity presented in equation 4.1, and is a measure of the dependency between two time series at a given frequency. Its values may vary between 0 (no linear dependency between x and y) and 1 (perfect linear dependency). Some of the problems related to coherence are: it does not deal adequately with non-linear dependencies; it is not able to distinguish between signal magnitude and phase as factors that cause signal interactions; and it is not immune to crosstalk effects. Regarding this last effect, an improvement was proposed in (Nolte et al., 2004), and consists of simply computing coherency as shown in equation 4.1, and keeping only the imaginary part ofC x;y (f). The rationale guiding this approach is that “interactions” due to crosstalk are instantaneous or zero-lag, and the coherency associated with these “interactions” would have only real components; consequently, by discarding the real part ofC x;y (f), we would remove crosstalk effects. Even though this technique is indeed effective in attenuating much of the spurious interactions, it has the obvious effect of also throwing away information related to true interactions that also have zero time lag. 4.1.1 Cross-Frequency Amplitude Coupling Based on the frequency representations of signalsx(t) andy(t), it is possible to verify whether, for different frequenciesf 1 andf 2 ,X(f 1 ) interacts withX(f 2 ). This can be done simply by retrieving the magnitude ofX(f 1 ) andX(f 2 ), and computing the standard cross-correlation over trials between these two quantities as usual (Bruns and Eckhorn, 2004). Recent examples of the application of cross-frequency amplitude coupling in the imaging of electromagnetic brain activity include de Lange et al. (2008); Osipova et al. (2008); Witte et al. (2008). 29 4.2 Phase Synchrony As mentioned in the previous section, one of the disadvantages of coherence is that it is not able to determine whether the magnitude or the phase ofX(f) andY (f) are responsible for a given high value ofC x;y (f). One approach we can take that will allow us to be more certain about the origins of the interactions between two signals is by looking at a quantity called phase synchrony (Tass et al., 1998; Lachaux et al., 1999; Quyen et al., 2001), which looks specifically at how the phases ofx(t) andy(t) interact. The phase locking condition is an indication of the degree of synchronization between two oscillators, and is expressed by: j n;m (t)j< constant; where n;m (t) =n x (t)m y (t): (4.2) Here,n andm are integers, and x;y (t) are the phases of the respective signals (usually, it is assumedn =m = 1 for simplicity (Quyen et al., 2001; Varela et al., 2001)). This condition can be tested at each time instant with the phase-locking value (PLV): PLV(t) = 1 n trials X k expfi[ x (t;k) y (t;k)]g; (4.3) where the sum is taken over the number of trials. 
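A minimal numerical sketch of the two measures introduced above is given below (Python/NumPy, illustrative only); it assumes the cross-spectrum in equation (4.1) is averaged over trials, with `X` and `Y` holding complex spectral coefficients of shape (n_trials, n_freqs) and `phase_x`, `phase_y` holding instantaneous phases of shape (n_trials, n_times).

```python
# A minimal sketch of trial-averaged coherency (eq. 4.1), its imaginary part
# (the crosstalk-robust measure of Nolte et al., 2004), and the phase-locking
# value of eq. (4.3).
import numpy as np

def coherency(X, Y):
    sxy = np.mean(X * np.conj(Y), axis=0)              # trial-averaged cross-spectrum
    sxx = np.mean(np.abs(X) ** 2, axis=0)
    syy = np.mean(np.abs(Y) ** 2, axis=0)
    return sxy / np.sqrt(sxx * syy)                    # coherence = |.|; imag. coherence = .imag

def plv(phase_x, phase_y):
    return np.abs(np.mean(np.exp(1j * (phase_x - phase_y)), axis=0))
```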
As with coherence, a PLV of 1 indicates little variation in the phase difference across trials, while a PLV of zero indicates otherwise. The estimation of the signal instantaneous phases can be found either by a narrowband filtering of the signals followed by a Hilbert transform (Tass et al., 1998), or by convolving the signals with wavelet kernels (Lachaux et al., 1999). To threshold the PLV values for significance, their distribution can be computed by generating surrogate data, in which the trial indices for only one of the two signals is shuffled. One limitation of phase synchrony is that it does not deal with crosstalk effects. Another one is that equation 4.3 not only assumes with interactions within narrow 30 frequency bands, but at the same frequency. However, it is possible to evaluate cross- frequency phase synchrony between narrowband phases x (t) and y (t), if their central frequenciesf x andf y are chosen such thatnf x =mf y , wheren andm are integers. The n :m phase-locking value then becomes (Palva et al., 2005): PLV(t) = 1 n trials X k expfi[n x (t;k)m y (t;k)]g: (4.4) 4.3 Methods for Nested Oscillations Nested oscillation, also known as phase-amplitude coupling, refers to interareal cross- frequency interactions between the amplitude of one signal and the phase of another. Interest in the study of nested oscillations has been increasing, as they have been detected in a number of cognitive tasks. There are a number of methods for the detection of phase-amplitude coupling: one of them is the modulation index (Canolty et al., 2006), given by: M(t) = 1 n trials n trials X k=1 A x (t;k)exp[i y (t;k)] ; (4.5) whereA x (t;k) is the amplitude of thek th trial of signalx(t), and y (t;k) is the phase of thek th trial of signaly(t) (the amplitudes and phases can be retrieved, for instance, by narrowband filtering followed by Hilbert transform). We can also take the Hilbert transform ofA x (t;k), compute the transform’s phase Ax (t;k), and then apply (4.3) to y (t;k) and Ax (t;k) (Mormann et al., 2005). Finally, one may use a general linear model, in whichA x is modeled as a multiple regression (Penny et al., 2008): A x (t;k) =X(t;k)(t) +(t;k); (4.6) where is a vector of regression coefficients, is additive Gaussian noise, and X(t;k) = [ 1 cos[ y (t;k)] sin[ y (t;k)] ] are the rows of the design matrix. It has 31 been demonstrated (Penny et al., 2008) that the GLM has the best performance among all methods described here for the detection of nested oscillations. 4.4 Multivariate Autoregressive (MV AR) Models Unlike the techniques described previously, methods based on multivariate autoregressive models have the capability to estimate not only interaction strength, but also its direction- ality. Another advantage of MV AR models is that they allow the study of interactions between multiple time series simultaneously. If x(t) = [ x 1 (t) x 2 (t) ::: x p (t) ] 0 represents a p-dimensional time series (each one of them related to a different brain region), them th MV AR model can be described by (Kaminski et al., 2001): x(t) m X i=1 A(i)x(ti) =e(t); (4.7) where e(t) is a p-dimensional white noise vector, and the A i ’s are pp coefficient matrices, whose elements can be computed by solving multivariate Yule-Walker equations. In the frequency domain, equation 4.7 becomes: X(f) =A 1 (f)E(f) =H(f)E(f); (4.8) whereX(f) is the Fourier transform ofx(t),A(f) is given by: A(f) = m X k=0 A k e i2kf (4.9) withA(i = 0) =I, andH(f) is the transfer matrix of the system. 
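Before turning to the connectivity measures derived from this model, the nested-oscillation measures of equations (4.5) and (4.6) above admit a compact implementation; the sketch below (Python/NumPy, illustrative only) assumes `amp_x` and `phase_y` are vectors of per-trial amplitude and phase at one time point, and summarizes the GLM fit by its proportion of explained variance, a natural but not uniquely prescribed choice.

```python
# A minimal sketch of the modulation index (eq. 4.5) and of the GLM of
# eq. (4.6), in which A_x is regressed on [1, cos(phi_y), sin(phi_y)].
import numpy as np

def modulation_index(amp_x, phase_y):
    return np.abs(np.mean(amp_x * np.exp(1j * phase_y)))

def glm_phase_amplitude(amp_x, phase_y):
    X = np.column_stack([np.ones_like(phase_y), np.cos(phase_y), np.sin(phase_y)])
    beta, *_ = np.linalg.lstsq(X, amp_x, rcond=None)
    resid = amp_x - X @ beta
    ss_tot = np.sum((amp_x - amp_x.mean()) ** 2)
    return 1.0 - np.sum(resid ** 2) / ss_tot           # proportion of variance explained
```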
Measures such as the directed transfer function (DTF) (Kaminski and Blinowska, 1991; Kaminski et al., 2001) or the partial directed coherence (PDC) (Baccal´ a and Sameshima, 2001; Kus et al., 2004) use this model to infer interactions from time seriesj of to time seriesi (i;j = 1;:::;p). 32 They achieve this by looking at the elements of eitherH(f) orA(f), according to the following: DTF 2 ij (f) = jH ij (f)j 2 P p k=1 jH ik (f)j 2 ; PDC 2 ij (f) = A 2 ij (f) 2 P p k=1 A 2 kj (f) 2 : (4.10) Both measures are normalized to lie within the interval [0; 1], but the normalization factors (denominators) are different: DTF is normalized with respect to all activities flowing towards time series i, while PDC is normalized with respect to all activities flowing from time seriesj. Their results are similar, although DTF is not able to distinguish indirect interactions from direct ones (Kus et al., 2004). It should be pointed out that, in the case wherep = 2 and the normalization factor is dropped from the DTF expression (Kaminski et al., 2001; Kus et al., 2004), this measure is equivalent to the concept of Granger causality expressed in the frequency domain (Granger, 1969; Geweke, 1982; Brovelli et al., 2004). As mentioned in the beginning of this section, measures based on MV AR models are advantageous because they provide information regarding the directionality of the interactions; in fact, DTF and PDC can be seen as multivariate extensions of the concept of pairwise Granger causality. But this model has some limitations, as it is only able to deal with connectivity between signals at the same frequency, and it also suffers from crosstalk, as demonstrated elsewhere (Hui et al., 2009). In chapters 6 and 7 we return to the problem of estimating connectivity between regions from MEG inverse solutions. There we use multivariate approaches to explore the possibility of interactions between multiple frequencies or between frequencies in cases where the interaction frequencies are unknown. 33 Chapter 5 Detection of Changes in Oscillatory Neural Activity with Multivariate Linear Models In this chapter we describe the use of multivariate analysis of variance (MANOV A) models to detect task-based changes in oscillatory brain activity. Observations for several frequency bands are constructed using complex Morlet wavelet time-frequency decompo- sition, and are fitted to separate MANOV A models at each source location. The resulting Roy’s maximum root maps are tested for significance using permutation tests, and follow- up protected F -tests and linear discriminant analysis allow us to identify individual frequencies that contribute significantly to experimental effects. We use simulations and real MEG data from a visuomotor experiment to compare the performance of our multivariate method against conventional univariate approaches. 5.1 Model The model that linearly relates the MEG measurement matrixM (n sensors n timepoints ) from each of the J stimulus-locked event-related trials with the brain activation Z (n sources n timepoints ) is the one described in section 3.1 : M =GZ +N; (5.1) 34 as in equation 3.1,G (n channels n sources ) is the lead field matrix, andN represents noise. To find an estimate of the spatiotemporal activityZ, we use the Tikhonov regularized minimum-norm inverse method discussed in section 3.2.1: ^ Z = (G 0 G +I) 1 G 0 M: (5.2) The reconstructed timeseries at each source locations are then given byZ st , wheret is the time index. 
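A minimal sketch of the regularized minimum-norm reconstruction of equation (5.2) is given below (Python/NumPy, illustrative only); the choice of the regularization parameter is not addressed here, and the variable names are hypothetical.

```python
# A minimal sketch of the Tikhonov-regularized minimum-norm inverse of eq. (5.2).
# G: lead field (n_sensors x n_sources); M: measurements (n_sensors x n_timepoints);
# lam: regularization parameter.
import numpy as np

def minimum_norm_inverse(G, M, lam):
    n_sources = G.shape[1]
    Zhat = np.linalg.solve(G.T @ G + lam * np.eye(n_sources), G.T @ M)
    return Zhat                                        # one reconstructed time series per source
```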
Even though we use cortically constrained regularized minimum-norm, our method can be used with any inverse solution, surface or volume based. Our goal is to detect event-related modulations of brain activity over time, space, and frequency, and for that we need an estimate of neural activation energy at specific time-frequency instances (Pantazis et al., 2009a). This estimate is given by: y stf =jC stf j 2 ; (5.3) whereC stf =Z st ?w tf are the complex wavelet coefficients obtained from convolving each source timeseriesZ st with a continuous-time Morlet wavelet kernelw tf (Teolis, 1998). Morlet wavelets are popular in MEG time-frequency analysis (Tallon-Baudry and Bertrand, 1999; Tallon-Baudry et al., 1996; Kiebel et al., 2005), as they capture the signal oscillatory components at the desired frequencies but, unlike the Fourier transform, also provide information on temporal localization. The expression of the kernel of this wavelet at central frequencyf is: w tf = (b) 0:5 e j2ft e t 2 =b ; (5.4) where the bandwidth parameterb is chosen such that the productf p b is equal to 2:12. The bandwidth parameter controls the frequency resolution 2 f , according to the expression 35 f = f=2:12(2) 0:5 (Pantazis et al., 2009a): for instance, the frequency resolution at 3Hz is 0:64Hz, and 2 f = 15:93 atf = 75Hz. To improve the signal-to-noise ratio, and increase the statistical power by minimizing the total number of statistics that need to be tested for significance, we may choose to summarize the observations over spatial regions of interestS, time bandsT = [t 1 ;t 2 ], and frequency bandsF = [f 1 ;f 2 ]: y STF = Z Z Z (s;t;f)2(S;T;F) jC stf j 2 dsdtdf: (5.5) 5.2 Multivariate Analysis of Variance Each trial provides us energy observations y STF in several spatial-temporal-spectral bands STF . We introduce new indices to identify the observations: i2f1; 2;:::g denotes the condition (e.g. 1 for the main task of interest and 2 for the baseline or rest condition), andj denotes the trials acquired from each conditioni. With these indices, the observations can be arranged into general linear models. The methodology used here expands the statistical parametric mapping (SPM) approach (Friston, 1996) to include in the analysis the multivariate data now available at each source location (Worsley et al., 2004). Consider the ANOV A model: y STF ij = STF i +u STF ij ; (5.6) where STF i are the activation parameters for each condition, andu STF ij is the model error term, assumed to be zero-mean Gaussian. ANOV A is used in situations in which there is one dependent variable (or observation) and so it is known as a univariate test. The superscriptsSTF indicate that we fit the same model for all spatial-temporal-spectral 36 bands. Since a separate but identical GLM is fitted at each band, this approach is typically referred to as mass univariate analysis. MANOV A is designed to look at several dependent variables (observations) simulta- neously and so it is a multivariate test. We convert the univariate model described above to a MANOV A model by reorganizing all the variables into row vectors, whose elements consist of observations over different frequency bandsF . For example, row vectory ST ij contains the values of the scalarsy STF ij for allF . 
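Returning to the time-frequency observations of equations (5.3)-(5.4), a minimal sketch of their computation follows (Python/NumPy, illustrative only). The Gaussian prefactor of the kernel, garbled in the source, is taken to be the standard (πb)^{-1/2} normalization, and the finite kernel support of a few standard deviations is an implementation choice; `z` is assumed to hold one source's trials with shape (n_trials, n_times), sampled at `fs` Hz.

```python
# A minimal sketch of complex Morlet power observations y_stf = |C_stf|^2,
# with the bandwidth parameter set so that f * sqrt(b) = 2.12, as in the text.
import numpy as np

def morlet_power(z, fs, f):
    b = (2.12 / f) ** 2
    half = int(np.ceil(4 * np.sqrt(b / 2) * fs))       # support of ~ +/- 4 temporal s.d.
    t = np.arange(-half, half + 1) / fs
    w = (np.pi * b) ** -0.5 * np.exp(2j * np.pi * f * t) * np.exp(-t ** 2 / b)
    coeffs = np.array([np.convolve(trial, w, mode="same") for trial in z.astype(complex)])
    return np.abs(coeffs) ** 2                         # power per trial, time point
```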
Thus the MANOV A model becomes: y ST ij = ST i +u ST ij : (5.7) Since we fit a separate but identical MANOV A to each spatial-temporal band, it is appropriate to call this approach mass multivariate analysis, with mass referring to the spatial and temporal dimensions, and multivariate to the frequency dimension. Note that we could have equivalently formed multivariate observations over other dimensions, provided that the multivariate dimension is small enough to allow for stable estimation of the covariance matrices involved in the calculation of test statistics, as described in section 2.3. We chose the frequency dimension because multiple studies have reported simultaneous changes in brain activity across several frequency bands (Bas ¸ar et al., 2001; Kilner et al., 2005; Pantazis et al., 2005c). The main reason for preferring MANOV A over ANOV A design is the increased sensi- tivity offered by the former when an experimental effect appears in multiple frequencies. If separate ANOV As are conducted on each frequency variable, then any relationship between frequencies is ignored. As such, we lose information about any correlations that might exist between frequencies. MANOV A, by including all frequency observations in the same model, takes into account the relationship between different frequencies. Consequently, MANOV A has greater power to detect an effect, because it can detect 37 whether experimental conditions differ along a combination of variables (frequencies), whereas ANOV A can detect only if they differ along a single variable. GLM theory assumes normal distributions, which is reasonable for averaged evoked responses due to the central limit theorem. However, power time-frequency decom- positionsy STF ij of single trial data have a chi-square distribution. Fortunately, Kiebel et al. (2005) have shown that, under most circumstances, one can appeal to the central limit theorem or apply a log or square-root transform on the MEG power estimates to make the error terms normal, and thus GLM theory is still appropriate. Furthermore, when non-parametric thresholding schemes are used , parametric assumptions are not necessary and any deviations from Gaussianity will affect the sensitivity of detecting an experimental effect, but not error control. We can equivalently write the MANOV A model in matrix form, where we “stack” all row vectors described above to form matrices: Y ST =X ST B ST +U ST ; (5.8) whereY ST (n observations n variables ) is the matrix of all observations,B ST (n conditions n variables ) is the parameter matrix with the parameter activation vectors for all conditions, U ST (n observations n variables ) is the error matrix, andX ST (n observations n conditions ) is the design matrix relating each observation to one of the conditions, and whose elements are either 0 or 1. 38 5.3 Test Statistics Even thoughY 1 gives us information about the dynamics of brain activity for a given task, it is necessary to verify the statistical significance of the observed effects. Based on the linear model shown in equation (5.8) (Seber, 1984), we write the null hypothesis we are interested in testing in matrix form: H R 0 : =AB =C vs: H R a :AB =C: (5.9) where the dimension ofA isn contrasts n conditions and the dimension ofC isn contrasts n variables . 
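As a concrete illustration of the matrix form in equation (5.8) for the two-condition case, the following Python/NumPy sketch (hypothetical names and shapes) stacks the per-trial frequency-band observations, builds the 0/1 design matrix, and computes the least-squares parameter estimate.

```python
# A minimal sketch of assembling and fitting the MANOVA model Y = X B + U
# for two conditions; Y1, Y2 have shape (n_trials_i, n_freqs).
import numpy as np

def fit_manova_glm(Y1, Y2):
    Y = np.vstack([Y1, Y2])
    X = np.zeros((Y.shape[0], 2))
    X[:len(Y1), 0] = 1.0                               # condition 1 indicator
    X[len(Y1):, 1] = 1.0                               # condition 2 indicator
    B_hat = np.linalg.pinv(X) @ Y                      # rows are the per-condition band means
    U_hat = Y - X @ B_hat                              # residual matrix
    return X, B_hat, U_hat
```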
For instance, if there are two conditions in our study and we want to test for differences in brain activity between them (a single-contrast case), we useA = [ 1 1 ] andC =0 in equation (5.9), so that the null hypothesis of no activation changes becomes H R 0 : 1 = 2 . The statistic used in our hypothesis testing is a multivariate equivalent of the F - statistic, which is the ratio of the between-group to the within-group variances. Here, this ratio is represented by the multivariateF matrixE 1 H, whereH (n variables n variables ) andE (n variables n variables ) are the between-group and within-group variance matrices, respectively, and are given by: H = (CA ^ B) 0 [A(X 0 X) y A 0 ] 1 (CA ^ B) E = (YX ^ B) 0 (YX ^ B) ; (5.10) 1 For ease of notation, we will drop the indicesST from now on, since it is assumed that the model will be fitted at all sourcesS and all time pointsT . 39 where ^ B is the least-squares estimate of the parameter matrix, and (X 0 X) y indicates the pseudo-inverse of (X 0 X). Again, in the case where we test the difference between two conditions, the matrices of equation (5.10) become: H = 1 n 1 + 1 n 2 1 ( ^ 1 ^ 2 ) 0 ( ^ 1 ^ 2 ) E = n 1 X j=1 [(y 1j ^ 1 ) 0 (y 1j ^ 1 )] + n 2 X j=1 [(y 2j ^ 2 ) 0 (y 2j ^ 2 )] ; (5.11) wheren 1 andn 2 are the number of trials for condition 1 and 2, respectively, and each ^ i , i2f1; 2g, is given by the mean of all trials that belong to conditioni, such that ^ B = [ ^ 0 1 ^ 0 2 ] 0 . A number of statistics reduce the matrices described above into scalar values to test for statistical significance, including Roy’s maximum root, Wilks’ likelihood ratio, Lawley- Hotelling trace, and Pillai’sV (Seber, 1984). They are all functions of the eigenvalues of E 1 H, and differ in terms of power and robustness to violations of multivariate normality, homogeneity of the covariance matrix, and unequal sample sizes. In this work we use Roy’s maximum rootR, which is given by the maximum eigenvalue ofE 1 H, because it is most powerful when the conditions are mainly separated by one discriminant function (Field, 2005; Bray and Maxwell, 1985). Furthermore, an analytical solution exists to threshold Roy’s maximum root statistical maps with random field theory (Worsley et al., 2004), even though we use a permutation approach in this study. In the single-contrast case shown in equation 5.11 there is only one nonzero eigenvalue and all the above statistics, includingR, are equivalent to the Hotelling’sT 2 (Worsley et al., 2004; Taylor and Worsley, 2008). For each spatial and temporal location where we fit a MANOV A model, the null hypothesis is rejected with level of confidence (1 ) if the statistic R exceeds a thresholdR ; this threshold is based on a measure that provides control of the number of 40 false positives. The standard approach (Nichols and Hayasaka, 2003) is to control the familywise error rate (FWER), or the probability of at least one false positive under the null hypothesis. For MANOV A, this measure can be controlled over space; for ANOV A computed at different frequency bands, it can be controlled over space (therefore allowing multiple comparison errors over frequencies), or alternatively, simultaneously over space and frequency. The FWER is directly related to the global maximum distribution of the statistic (as discussed in section 3.3.1): one or more voxels will exceed the thresholdR underH 0 only if the maximum statistic exceedsR . 
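The quantities in equation (5.10) and Roy's maximum root can be computed directly from the fitted model; the sketch below (Python/NumPy, illustrative only) uses the design and observation matrices of the previous fragment, a contrast matrix such as A = [[1, -1]], and C = 0.

```python
# A minimal sketch of the between-group (H) and within-group (E) matrices of
# eq. (5.10) and of Roy's maximum root, the largest eigenvalue of E^{-1} H.
import numpy as np

def roys_max_root(Y, X, A, C=None):
    B_hat = np.linalg.pinv(X) @ Y
    if C is None:
        C = np.zeros((A.shape[0], Y.shape[1]))
    D = C - A @ B_hat
    M = A @ np.linalg.pinv(X.T @ X) @ A.T
    H = D.T @ np.linalg.inv(M) @ D                     # between-group variance matrix
    resid = Y - X @ B_hat
    E = resid.T @ resid                                # within-group variance matrix
    eigvals = np.linalg.eigvals(np.linalg.inv(E) @ H)
    return float(np.max(eigvals.real)), H, E
```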
The distribution of the maximum statistic can be estimated empirically with nonparametric permutation methods (Nichols and Holmes, 2001; Pantazis et al., 2005b), since they are exact, computationally feasible, and require very few assumptions about the data (most importantly, exchangeability underH 0 ). The permutation samples are created by randomly exchanging the 0’s and 1’s at the rows of the design matrixX, recomputing theR-statistic map for the permuted data, and getting its maximum. Repeating this procedure several times and building a histogram from the resulting samples gives us the empirical distribution. To preserve the spatial correlation of the data in the permutation samples, the same randomization ofX is applied to each spatial location when computing the permutedR-map. The same is true for the ANOV A case, with the exception that when controlling FWER over space and frequency, the same randomization scheme is used over space and frequency. 5.4 Post-hoc Analysis After detecting significant sources in the brain using the method described above, one might be interested in finding which variable, or set of variables, is responsible for the 41 significance of that source. In the present work, we use two approaches: protectedF -tests and linear discriminant analysis (Rencher and Scott, 1990). 5.4.1 ProtectedF -tests This technique consists of fitting a univariate ANOV A model at each of the variables individually, with FWER-correction over space and frequency, but only on those sources that were previously considered significant according to the MANOV A method; in other words, the map of significant activity points is used as a “mask”, on whose sources each variable is tested for significance. The use of this “mask” is what makes the protected tests different from the “unprotected” ones, which are simplyF -tests computed at every spatial location and every frequency band of interest, with FWER corrected over space and frequency. 5.4.2 Linear Discriminant Analysis Discriminant analysis is a popular technique in pattern classification, with a wide range of applications that also includes MEG (Besserve et al., 2007). Here, we are more interested in its use as a descriptive tool. Discriminant functions are linear combinations of variables that best separate groups. Given our observationsy ij , a linear combination transforms them into scalars: z ij =<a;y ij >= X k a k y ijk : (5.12) The vector a that provides the best group separation is the one that maximizes the differences between the mean values of thez ij ’s for each group ( z i ), divided by their covariances. In the two-condition case, this standardized difference is given by (Rencher, 1995): a = argmax ( z 1 z 2 ) 2 s 2 z = argmax [<a; y 1 y 2 >] 2 a 0 Sa ; (5.13) 42 whereS is the sample covariance. Using the relationships betweenS andE, and between ( y 1 y 2 ) 0 ( y 1 y 2 ) andH, shown in equation (5.11), we get: a = argmax a 0 Ha a 0 Ea : (5.14) Thus thea that maximizes =a 0 Ha=a 0 Ea is the discriminant function coefficient vector. Rewriting the expression of, and assuming a nonzeroa, we find that our coefficients are a solution to the generalized eigenvector problem: (E 1 HI)a =0: (5.15) This means that the possible discriminant function coefficients are the eigenvectors of E 1 H, while the’s are its eigenvalues. Therefore, the eigenvector associated with the highest will be our solution. Since the highest is, by definition, Roy’s maximum root, findingR gives usa automatically. 
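Since the discriminant coefficients solve the generalized eigenvector problem of equation (5.15), they are obtained from the same decomposition that yields Roy's maximum root; the following sketch (Python/NumPy, illustrative only) makes this explicit, taking the H and E matrices of the fitted MANOVA model as inputs.

```python
# A minimal sketch of eq. (5.15): the discriminant coefficient vector a is the
# eigenvector of E^{-1} H associated with its largest eigenvalue (Roy's root).
import numpy as np

def discriminant_function(H, E):
    vals, vecs = np.linalg.eig(np.linalg.inv(E) @ H)
    idx = np.argmax(vals.real)
    R = float(vals[idx].real)                          # Roy's maximum root
    a = np.real(vecs[:, idx])                          # discriminant coefficient vector
    return R, a
```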
If discriminant functions will be used to estimate the relative contribution of each variable to overall group separation, their coefficients must be standardized, because of differences in their magnitudes and in their variances. One way of doing this is by means of discriminant ratio coefficients (DRCs) (Thomas, 1992; Thomas and Zumbo, 1996), which are given by the expression: d k = a k (Ta) k a 0 Ta ; (5.16) where (:) k is thek-th row of (:), andT =E +H is the total variance matrix. The value ofd k is always positive, and therefore indicates how much each variable contributes to the experimental effect, but not whether that variable increases or decreases for a given experimental condition. This information can be retrieved from the estimated parameter matrix ^ B of the MANOV A model. 43 5.5 Results To assess the utility of the proposed MANOV A approach, we compared its performance to the ANOV A methods that we previously used in MEG analysis (Pantazis et al., 2009a), using both simulated and real MEG data. In MANOV A, the FWER is controlled over space, while in ANOV A, unless stated otherwise, the FWER is controlled over space and frequency; the confidence level in all tests is (1) = 95%. In this study, we configured the Morlet wavelet such that at 10Hz the temporal resolution is 300ms and the frequency resolution is 2:12Hz. Furthermore, we used a single time intervalT , which covers the entire duration of the experiment (1 second for both simulated and real data), and the following six frequency bandsF : delta (2 4Hz), theta (5 8Hz), alpha (8 12Hz), beta (15 30Hz), low-gamma (30 60Hz) and high-gamma (60 90Hz). The choice of these frequency bands is based on previous findings that suggested distinct functional roles for these frequencies in the sensorimotor cortex (Jerbi et al., 2004; Crone et al., 1998; Salmelin et al., 1995; Waldert et al., 2008). Also, no integration over space was performed on the wavelet coefficients, i.e.,S =s for everys. 5.5.1 Simulations The simulated brain activationZ has the spatial profile shown in Figure 5.1a, consisting of two active sources, one in each hemisphere. The source in the left hemisphere is a cosine signal with frequency 10Hz (within the alpha frequency band); the energy of this signal for each trial is sampled from a normal distribution, depending on the trial condition: r 1 N( 1 ;) r 2 N( 2 ;) ; (5.17) 44 where r i , j 2f1; 2g are the signal energies for each condition, 1 = 10 + 2:5 p 3, 2 = 10 2:5 p 3, and = 1. The source in the right hemisphere is a summation of three cosine signals with frequencies 10, 22:5 and 45Hz (within the alpha, beta and gamma-low frequency bands, respectively); the energies of these signals are sampled from a multivariate normal distribution: r 1 N( 1 ;K) r 2 N( 2 ;K) ; (5.18) wherer 0 i = [r i;j r i;j r L i;j ] ,j2f1; 2g are the signal energies for each condition and frequency, and the parameters of the multivariate distribution are: 1 = 2 6 6 6 6 4 12:5 12:5 12:5 3 7 7 7 7 5 ; 2 = 2 6 6 6 6 4 7:5 7:5 7:5 3 7 7 7 7 5 ; K = 2 6 6 6 6 4 1 0:9 0 0:9 1 0 0 0 0:1 3 7 7 7 7 5 : (5.19) The time profile of our simulated sources was inspired by an activation pattern estimated with EEG during a hand movement task (Pfurtscheller and da Silva, 1999). Our simulated sources are then projected onto the sensor space according to equation (5.1); zero-mean Gaussian noise is added thereafter, such that the SNR is 1=200. We simulated a total of 100 trials, or 50 trials per condition. 
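The per-trial signal energies of equations (5.17)-(5.19) can be generated as follows; this is an illustrative Python/NumPy sketch reconstructed from the values stated in the text (the covariance K couples the alpha and beta components of the right-hemisphere source), and the random seed is arbitrary.

```python
# A minimal sketch of the trial-energy sampling used in the simulations.
import numpy as np

rng = np.random.default_rng(0)
n_trials = 50                                          # trials per condition

# Left-hemisphere source (10 Hz only): univariate energies per condition.
mu1, mu2, sigma = 10 + 2.5 * np.sqrt(3), 10 - 2.5 * np.sqrt(3), 1.0
r1_left = rng.normal(mu1, sigma, n_trials)
r2_left = rng.normal(mu2, sigma, n_trials)

# Right-hemisphere source (10, 22.5, 45 Hz): correlated multivariate energies.
mu1_r = np.array([12.5, 12.5, 12.5])
mu2_r = np.array([7.5, 7.5, 7.5])
K = np.array([[1.0, 0.9, 0.0],
              [0.9, 1.0, 0.0],
              [0.0, 0.0, 0.1]])
r1_right = rng.multivariate_normal(mu1_r, K, n_trials)
r2_right = rng.multivariate_normal(mu2_r, K, n_trials)
```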
Power observations for the frequency bands mentioned above were fitted to MANOV A models andR maps were created for the simulated data. We further performed separate ANOV A tests for each frequency band. The threshold for both cases was set to control 5% FWER, only in space for the MANOV A model, and in space and frequency for the ANOV A model. Figures 5.1b and 5.1c show the significant sources obtained with both ANOV A and MANOV A approaches, respectively. MANOV A was able to detect both single- and multiple-frequency sources, as we see two activation regions, one on each hemisphere, 45 Figure 5.1: Simulation results. (a) The spatial profile of the two sources simulated on the brain surface; the signal on the left hemisphere lies in the alpha band, while the signal on the right hemisphere lies in the alpha, beta and low-gamma bands; (b) The significant activation regions obtained from the univariate approach (ANOV A), with FWER controlled over space and frequency; (c) The significant activation regions obtained from the multivariate approach (MANOV A), with FWER controlled over space. that reflect the spatial location of the simulated signals. On the other hand, ANOV A was completely insensitive to the multiple-frequency source, since only the activation region on the left hemisphere is present, although this region is larger than its MANOV A counterpart. In figure 5.2 the results of our post-hoc analysis methods are presented. Here, we see the significant sources according to the protectedF -tests, and also the sources where the value of DRC was greater than or equal to 1=6, a rule for assessing the contribution of a variable suggested by Thomas (1992). These results are in agreement with the simulated sources, i.e., they indicate that the detected activity on the left hemisphere comes only from the alpha frequency band, while that on the right hemisphere comes from the alpha, beta and low-gamma bands. In summary we find that when compared to ANOV A, MANOV A is able to detect induced activity with higher sensitivity in cases where effects involving signals with 46 Figure 5.2: Post-hoc analysis results on the simulated data, at the,, and L frequency bands. Top row: significant sources obtained from the protectedF test. Bottom row: sources where the discriminant ratio coefficient (DRC) was greater than or equal to 1=6. For the other frequency bands, none of the methods found any active source. multiple, correlated frequencies are present. We also find evidence that our post-hoc anal- ysis methods are reliable tools for finding the frequency bands that cause the separation among conditions. 5.5.2 Experimental Data The data used in our work was acquired from a visuomotor task study (Jerbi et al., 2004, 2007). Two conditions for a single subject were tested: sustained visuomotor control (VM), in which the subject watched a randomly rotating cube in a screen in front of him and manipulated a trackball to prevent the cube from rotating by minimizing its angular deviation, and rest (R), in which the subject looked at a still cube without performing any activity. Figure 5.3a shows cortical maps with the values of theR statistic at each source, after thresholding for significance. These maps indicate that the differences in brain activity between the VM and R conditions appear in wide regions across the cortical surface, mostly in the parietal lobes, with highest values around the left sensorimotor cortex. 
The highest values ofR lying predominantly in the left hemisphere should be expected, because the subject used his right hand to move the trackball; moreover, the changes in 47 activity we found around the left and right motor cortices are in line with results obtained not only with the same dataset (Jerbi et al., 2004), but also with different visuomotor experiments (Chen et al., 2003; Classen et al., 1998). These studies also indicate a decrease in task-based signal power when compared to a control condition, especially in the alpha and beta bands; we observe the same effect in our study, as shown in figure 5.4, where the maps of each component of vector ^ 2 ^ 1 represent the difference of the mean observed signal power for each frequency band. Figure 5.3: Results of the analysis with data from a visuomotor study. (a) Brain map of theR statistic, after thresholding for significance; (b) Comparison between ANOV A (FWER corrected over space and frequency) and MANOV A; (c) Comparison between ANOV A (FWER corrected over space) and MANOV A. In the last two maps, sources in red were detected by both methods, sources in yellow were detected by MANOV A only, and sources in blue were detected by ANOV A only. A comparison of the performance of our MANOV A method with ANOV A can be found in figure 5.3b. Besides the considerable amount of overlap between the sources detected by both methods, an interesting feature of these maps is that the extent of the cortical regions showing experimental effects detected only by ANOV A is smaller than of those detected by MANOV A. We may take a less conservative approach and perform the ANOV A test with FWER correction only over space. Here, in the worst case where the frequency bands are independent, the achieved confidence level is not (1) = 95%, but (1) 6 = 73:51%. Even in this case, MANOV A is still able to detect sources that 48 Figure 5.4: Maps of normalized differences of the mean observed signal powers for each frequency band, represented by the components of vector ^ 2 ^ 1 . ANOV A is not, as figure 5.3c shows. (In fact, these images show that, in this study, the difference between correction over space and frequency and correction only over space is very small in ANOV A.) Given the difference between the MANOV A and ANOV A methods, it is important to find the frequency band (or bands) responsible for the significance in theR-maps. As discussed above, protected F -tests and linear discriminants are suitable methods for finding this information, and the resulting cortical maps are shown in figures 5.5 and 5.6, respectively. There are some differences between these maps, for instance the active sources for the high-gamma band have very different spatial patterns, which should be expected since each method performs a different kind of analysis: protected F -tests look at each frequency band without taking into account the other bands, while linear discriminants look at all variables simultaneously. However, it is clear from both 49 maps that the beta frequency band brings the greatest contribution to the overall group separation, followed by the low-gamma band but with considerably less influence in the activation changes. Figure 5.5: Significant protectedF brain maps for each frequency band of interest. A closer look at the behavior of a few selected voxels is provided by figure 5.7. 
It shows the DRC values and the mean power for each condition, normalized by the standard deviation, for the voxel with the highest value of theR statistic (figure 5.7a), a voxel detected by MANOV A but not by ANOV A nor by any of the protectedF -tests (figure 5.7b), and a voxel not detected by MANOV A (figure 5.7c). In the first case, the changes in activation are mostly caused by the beta, low-gamma and high-gamma frequency bands. In the second case, a combination of the delta, alpha, beta and high-gamma bands causes a significant discrimination, but none of these frequencies is significant by itself. Finally, in the third case, the changes between conditions are very small for every band; 50 Figure 5.6: Brain maps of the DRCs greater than or equal to 1=6 for each frequency band of interest. even though some DRCs have large values, this reflects the fact that they are standardized to sum to 1, and should not be considered evidence of significant overall separation. 5.5.3 Multi-Subject Analysis If we are interested in studying task-based changes in activation at the population level, a group analysis with data from multiple subjects is required. In this section, we use MEG visuomotor data obtained from 15 subjects. In section 3.3.3 we describe methods we can use to make group inferences on our data, and in our study, we apply Benjamini and Heller’s method, where the individual subject p-values for each voxel are combined into a group voxel p-value. Based on 51 Figure 5.7: Discriminant ratio coefficient (DRC) and normalized mean power for both conditions at specific voxels. (a) The voxel with the highest value of theR statistic; (b) A voxel detected by MANOV A, but not by ANOV A nor by any of the protectedF -tests; (c) A voxel not detected by MANOV A; (d) Spatial location on the cortical surface of each of the selected voxels. the ordered subject p-valuesp (k) ,k = 1; ;n subjs , Benjamini and Heller’s method is implemented by equation (3.21) : p G = min k=1;;n subjs u+1 n subjs u + 1 i p (u1+k) : (5.20) The scalaru, as explained above, is the minimum number of subjects that must have an effect at a given voxel for a group effect to be assigned to that voxel. We chose to use a method that combines p-values instead of one that combines linear models because, if we implemented the latter, the observation matrix in the group linear model of equation (3.17) would have 15 rows and 6 columns, and with such a reduced number of degrees of freedom we would not be able to compute stable multivariate statistics. 52 To obtain the p-values for equation (3.21), we depart from resampling approaches to find the distributions of theR statistics at the locations of each individual subjects, and focus instead on parametric expressions. The reason behind this is that Benjamini and Heller’s equation requires very accurate p-values, which in turn demands a very high number of permutations, thus becoming computationally expensive. The parameterization of Roy’s maximum root depends on the dimensions of matricesX,Y andA: if 1 = 0:5(n variables 2) and 2 = 0:5(n observations n conditions n variables 1), then the variable F = 2 + 1 1 + 1 R (5.21) has anF distribution with 2( 1 + 1) and 2( 2 + 1) degrees of freedom (Seber, 1984). It should be noted that this parameterization is valid only in single-contrast models. A comparison of multi-subject results obtained from MANOV A and ANOV A models is shown in figure 5.8, for varying values of the scalaru. 
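The conversion of a single-contrast Roy's maximum root into a p-value via the F approximation of equation (5.21), used in the group analysis above, can be sketched as follows (Python/SciPy, illustrative only; names are hypothetical).

```python
# A minimal sketch of the F parameterization of Roy's maximum root (eq. 5.21),
# valid for single-contrast models.
import numpy as np
from scipy import stats

def roys_root_pvalue(R, n_variables, n_observations, n_conditions):
    s1 = 0.5 * (n_variables - 2)
    s2 = 0.5 * (n_observations - n_conditions - n_variables - 1)
    F = (s2 + 1) / (s1 + 1) * R
    return stats.f.sf(F, 2 * (s1 + 1), 2 * (s2 + 1))
```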
MANOV A-based group analysis is performed by applying Benjamini and Heller’s method to the parameterization of equation (5.21); and for ANOV A-based group analysis, we get individual p-value maps by computingF statistics for all frequency bands at each voxel, converting them into p-values using the usualF parameterization, and then taking the minimum p-value. These maps validate our single-subject results presented in the previous section, as wide areas in the brain are active according to MANOV A but not to ANOV A, while very few voxels were detected by ANOV A only. Looking specifically at our MANOV A group results, we see strong changes in activation around the motor cortices on both hemispheres (similar to the findings presented in figure 5.3), but unlike section 5.5.2, we also observe increases in active regions in the visual cortex. To find the frequency bands responsible for the active voxels presented in figure 5.8, we can apply Benjamini and Heller’s method to the protectedF -tests from section 5.4.1. 53 The significant voxels according to theF -tests at the group level are shown in figure 5.9, foru = 13. As theF -tests do not provide directionality, we took the contrast vectors ^ 2 ^ 1 at voxel, and averaged them across subjects; their values are indicated in the color code of figure 5.9. In the averaging, care was taken to ensure that each component of the contrast vector had the same variance across subjects. According to these maps, the VM experiment causes strong decrease in beta power around both motor cortices, along with a less detectable decrease in alpha power. It is interesting to notice that there are active voxels in figure 5:8 that are not present in any of the maps in figure 5.9, which means that some of the activation changes cannot be directly attributed to any specific frequency band. More lenient tests (for instance, decreasing further the value ofu), would show us gradually how each band participates in the separation between conditions. (a)u = 15 (b)u = 14 (c)u = 13 Figure 5.8: Comparison between the results from multi-subject analysis obtained with MANOV A and with ANOV A, for varying values of the group null hypothesis parameter u. In these maps, red voxels were detected by MANOV A and ANOV A, blue voxels were detected only by ANOV A, and yellow voxels were detected only by MANOV A. 54 (a) (b) (c) (d) L Figure 5.9: Maps of voxels with significant protectedF statistics, according to the group statistical test and foru = 13. 5.6 Discussion In both simulated and real data, despite the considerable overlap between both MANOV A and ANOV A methods, the number of statistically significant surface elements obtained from MANOV A was higher than the number of significant sources from ANOV A. Also, MANOV A was more sensitive to the activity of a number of sources even if less conser- vative ANOV A tests were performed. The difference in performance between the two methods is more evident in the simulation results, where MANOV A was sensitive to both single-frequency and multiple-frequency sources (i.e., to changes in a single variable as well as to changes in multiple variables, respectively), while ANOV A found sources only in the former. A number of factors may have influenced our results, such as the choice of a different inverse operator to reconstruct current density maps, or the selection of a volumetric source space rather than cortical imaging. 
As discussed in previous chapters, we could have also used alternative ways to control false positives, for example multivariate random field methods (Taylor and Worsley, 2008; Carbonell et al., 2008) instead of permutations to estimate the maximum distribution, or even the use of false discovery rate instead of the FWER. 55 Our findings are better understood if we look at the basic concepts behind Roy’s maximum root, and the operations this statistic performs on the multivariate data. The key concept behind it is the union-intersection principle (Roy, 1953; Seber, 1984; Worsley et al., 2004): a multivariate null hypothesis is the intersection of simpler univariate null hypotheses, or conversely, the multivariate alternative hypothesis is the union of univariate alternative hypotheses. Based on this principle,R can be computed by creating a univariate model from the original multivariate model through a linear combination on the observations, computing theF statistic for the new model, repeating this procedure for all linear combinations, and taking the maximumF . Thus, Roy’s maximum root is a statistical test that simultaneously considers all possible simpler hypothesisF -tests derived by linearly combining the multivariate observations (infinite in number). The combined null hypothesis is the intersection of the simpler null hypotheses, and therefore, to control the error rate among them, we need to apply a higher threshold than the one we would apply to control only a single univariate test. However, if a correlation pattern causes a linear combination of the observations to have a high F -statistic, then only MANOV A will detect it because ANOV A does not test for arbitrary linear combinations of observations. ANOV A only tests for the trivial linear combinations of observations that assign 1 to the variable of interest, and 0 to all other frequency bands. Given the relationship between Roy’s maximum root and the union-intersection principle, the work of Carbonell et al. (2004) is in a sense a direct precursor of our MANOV A method. In this paper a one-dimensional statistical map was formed over time, by using a multivariate statistic over space. The statistical map was thresholded by means of random field theory, and once significant time instances were identified, the union-intersection principle allowed further localization of the effect in space. In our work, we form the statistical map in space-time instead of just time, and threshold it using 56 permutation tests instead of random field theory. Furthermore, the union-intersection principle of Roy’s maximum root would have allowed us to further localize the effect in frequency, similarly to Carbonell et al. (2004) in space. However, we implemented an alternative method based on protected F -tests, whose significance was evaluated with an additional permutation analysis to exactly control the FWER on the locations in space-time where the null hypothesis had already been rejected. We further evaluated the contribution of each frequency to the experimental effect with discriminant analysis, an approach not included in Carbonell et al. (2004). Going back to our results, it is reasonable to expect sources sensitive only to ANOV A because, when a single variable shows differences between conditions, MANOV A is more conservative than ANOV A, since here the former is equivalent to a univariate test at a level higher than (1). 
On the other hand, it is also reasonable to expect sources that can only be detected by MANOV A because this might relate to variables that do not show an effect when considered individually, but that contribute to group separation when combined, especially if they are somewhat correlated. As a follow-up method to our MANOV A technique, we found that both protected F -tests and linear discriminants are reliable ways to identify individual frequency bands that contribute significantly to changes in brain activation. This is demonstrated by our simulation results, since either method was able to detect accurately the bands that form the simulated time series on each source. When applied to real data, our post-hoc analysis points towards interesting results. One noteworthy finding is how the sources detected by each technique differ; as men- tioned before, this is evident for the high-gamma band (among other cases), as the activation found by the protectedF -test lies entirely on the left hemisphere, whereas most of the sources with sufficiently high DRCs are on the right hemisphere. If we 57 assume that all the significant results of our statistical tests faithfully describe the experi- mental effects of the real data, then these seemingly contradictory findings are in fact evidence of different activation patterns in the brain, to which each of the methods here is sensitive. In particular, the significantF values in the left hemisphere in figure 5.5f indicate sources with a high between-condition variation in high-gamma power. However, these sources have small DRCs in the high-gamma band, as seen in figure 5.6f, because the beta frequency contributes much more to the condition separation than high-gamma. On the contrary, high DRCs on the right hemisphere of figure 5.6f denote sources where high-gamma changes contribute strongly to the separation between conditions, together with other bands, but high-gamma itself is insufficient to separate the two conditions with anF -test. Figure 5.7b refers to a voxel in the right visual cortex that is sensitive to MANOV A but not to ANOV A. In this voxel no frequency band is significantly active, according to the protectedF -tests, but the DRC values and the differences in the normalized mean power between conditions indicate that the activity variation there comes mostly from the high-gamma, alpha, delta and beta bands. Thus it is a situation in which separation is caused not by a single variable but by a combination of variables, and it demonstrates not only the advantages of multivariate over univariate analysis, but also the potential of MANOV A as a tool for studying frequency interactions in the brain. 58 Chapter 6 Canonical Correlation Analysis Applied to the Estimation of Functional Connectivity In this chapter, we describe the use of canonical correlation analysis to detect functional interactions in the brain. By following the procedures described in section 5.1, we obtain, for each voxel on the cortical surface, multivariate observation matrices that serve as inputs to our method. We compute canonical correlation maps between a reference voxel and all other cortical locations, and then test for significance using asymptotic parametric distributions. Interactions due to linear mixing are discarded with additional tests that examine the collinearity of canonical vectors. Follow-up approximateF -tests allow us to identify individual frequencies that contribute significantly to the detected interactions. 
Again, the performance of our method is verified with MEG data from simulations and a real visuomotor experiment. 6.1 Canonical correlation analysis LetX andY be observation matrices from two different voxels, created as described in section 5.1, and normalized so that every column at either matrix has zero mean and unit variance. The maximum canonical correlation seeks linear combinations of the columns 59 of these matrices that maximize the correlation between them. In other words, we want to find vectorsa andb with dimensionn freqs 1 such that the expression = (Xa) 0 (Yb) s:t: 8 > < > : (Xa) 0 (Xa) = 1 (Yb) 0 (Yb) = 1 ; (6.1) is maximized (Hotelling, 1936). Maximum canonical correlation and vectorsa andb are given respectively by the eigenvalues and eigenvectors of the equations (Seber, 1984): 8 > < > : (C XX ) 1 C XY (C YY ) 1 C YX a = 2 a (C YY ) 1 C YX (C XX ) 1 C XY b = 2 b : (6.2) Here,C XX = (n trials 1) 1 X 0 X andC YY = (n trials 1) 1 Y 0 Y are the sample autoco- variance matrices ofX andY, respectively, andC XY = (C YX ) 0 = (n trials 1) 1 X 0 Y is the cross-covariance matrix betweenX andY. Maximum canonical correlation is also known as the principal angle between two subspaces (Bj¨ orck and Golub, 1973). In this study,X refers to the observation matrix of a fixed, previously chosen reference voxel, whileY is the observation matrix of any other voxel in the brain. We create a statistical map by calculating the canonical correlation between the reference and all other voxels. For each spatial location, if exceeds a threshold , then we can reject at a (1) level of confidence the null hypothesisH 0 of no correlation between this location and the reference (unless otherwise stated, = 5% for the remainder of this chapter). We calculate this threshold based on the step-up Hochberg procedure, which controls the familywise error rate, or the probability of one or more false positives underH 0 , as described in section 3.3.1. The FWER approach requires the conversion of a canonical correlation map into a p-value map. We do this parametrically by using an asymptotic approximation of the canonical correlation distribution, under the assumption that the matricesX andY are multivariate Gaussian: let 1 ; 2 ;::: n freqs be the square roots of 60 all the eigenvalues of the matrix (C XX ) 1 C XY (C YY ) 1 C YX shown in equation (6.2). Then, under the null hypothesis, the quantity = n trials 1 2 (2n freqs + 3) n freqs X k=1 log(1 2 k ) (6.3) has a 2 distribution withn 2 freqs degrees of freedom (Anderson, 2003), and can therefore be converted into a p-value. 6.2 Linear mixing correction Due to the limited spatial resolution of the MEG imaging system, the point spread function extends across several voxels, and is considerably variable across the cortical surface. As a consequence, two voxels may have a high correlation even if there is no actual functional interaction between them. Our canonical correlation analysis allows the following solution to this problem. Assume there is linear mixing between two voxels, i.e., observationsX andY are linear combinations of the true underlying neural activationsX andY at the two voxels: 8 > < > : X =X +Y Y =X +Y ; (6.4) where,,, and are unknown, nonzero constants. We show in section 6.6 that if there is no true interaction between the two cortical sites (X andY are uncorrelated, or X 0 Y =0), then canonical vectorsa andb (associated with the observation matricesX andY, respectively) are collinear regardless of the mixing scalars,, and. 
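A minimal sketch of the canonical correlation computation of equation (6.2) and of the asymptotic approximation of equation (6.3) is given below (Python/SciPy, illustrative only); `X` and `Y` are assumed to be column-standardized observation matrices of shape (n_trials, n_freqs), and the constant in the Bartlett-type statistic follows the expression in the text.

```python
# A minimal sketch of maximum canonical correlation and the chi-square
# approximation of eq. (6.3), with n_freqs^2 degrees of freedom.
import numpy as np
from scipy import stats

def cca_pvalue(X, Y):
    n_trials, n_freqs = X.shape
    Cxx = X.T @ X / (n_trials - 1)
    Cyy = Y.T @ Y / (n_trials - 1)
    Cxy = X.T @ Y / (n_trials - 1)
    M = np.linalg.solve(Cxx, Cxy) @ np.linalg.solve(Cyy, Cxy.T)
    eigvals = np.clip(np.real(np.linalg.eigvals(M)), 0.0, 1.0 - 1e-12)   # rho_k^2
    rho_max = np.sqrt(eigvals.max())                   # maximum canonical correlation
    stat = -(n_trials - 1 - 0.5 * (2 * n_freqs + 3)) * np.sum(np.log(1 - eigvals))
    pval = stats.chi2.sf(stat, df=n_freqs ** 2)
    return rho_max, pval
```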
Based on this collinearity property, we are interested in finding a suitable statistical test to verify whether a given interaction between two spatial locations is mainly caused 61 by linear mixing or at least a portion of the interaction can be explained by true functional coupling. We attempted several approaches to find an adequate test, such as the following: 1. If vectorsb ? i (i = 1;:::;n freqs 1) form an orthogonal basis for the orthogonal complement of the subspace spanned byb, and cos( i ) =a 0 b ? i for alli, then we can test for linear mixing by using the following null hypothesis: H 0 : max i=1;:::;n freqs 1 cos( i ) = 0: (6.5) The problem with this approach is the difficulty in finding an accurate distribu- tion for max i cos( i ), because it cannot be parameterized, and no permutation scheme can be applied. Also, due to these previous factors, it is not possible to verify whether a distribution obtained from Monte Carlo methods is adequate (our simulations indicate that it is not). 2. As we will describe in the next section, the properties of canonical correlation analysis allow us to fit the following linear model: Xa =Yb +; (6.6) wherea is assumed fixed, andb are the (assumed unknown) model regression parameters. Based on this model, we can test for vector equality (not collinearity) by testing the following null hypothesis, for alli = 1;:::;n freqs : H 0 :b i =a i ; (6.7) wherea i andb i are thei th components of vectorsa andb, respectively. As with the previous method, this approach cannot be parameterized, and permutation schemes cannot be easily implemented. 62 3. Given the variablesu =Xa andv =Ya, we can test for the significance of the correlation coefficient betweenu andv: H 0 :r = u 0 v p u 0 u p v 0 v = 0: (6.8) The distribution ofr can be parameterized according to Fisher’s transformation of Pearson’s correlation coefficient, or simple permutation schemes can be imple- mented to obtain the distribution empirically, as we did in our investigations of this approach. The problem with it, however, is that rejecting the null hypothesis implies that the interaction is mainly caused by linear mixing; if the null hypothesis is not rejected, linear mixing could be present in the interaction, although its effect would not be too strong. Thus this approach tends to be too lenient, detecting ‘significant” canonical correlations when interactions are caused by linear mixing only, but noise results in a small non-zero angle betweenu andv. The test that we developed and found to work substantially better than any of those we have just described is as follows. Let us consider the following decomposition of canonical vectorb: b =b P +b O ; (6.9) where b P and b O are the components of b which are parallel and orthogonal to a, respectively. Equation (6.1) can then be rewritten as: =a 0 X 0 Yb =a 0 X 0 Yb P +a 0 X 0 Yb O : (6.10) The first term of the sum above refers to the proportion of that could be explained by linear mixing, while the second term refers to the proportion of unrelated to linear mixing. Thus the quantity 1 =a 0 X 0 Yb O indicates the strength of true coupling for a given location pair. By analogy, we can perform on vectora a decomposition similar to 63 the one in equation (6.9), which gives us the quantity 2 =a 0 O X 0 Yb. If either 1 or 2 are significant, we conclude that a given interaction can not be fully attributed to linear mixing. 
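The decomposition of equations (6.9)-(6.11) can be implemented compactly; the sketch below (Python/NumPy, illustrative only) recovers the canonical vectors, enforces the unit-norm constraints of equation (6.1), and evaluates the portions of the correlation that cannot be attributed to linear mixing. Taking the maximum of the two quantities in magnitude is an implementation choice.

```python
# A minimal sketch of the crosstalk statistic rho = max(rho_1, rho_2) of
# eq. (6.11), for column-standardized X and Y of shape (n_trials, n_freqs).
import numpy as np

def mixing_statistic(X, Y):
    n = X.shape[0]
    Cxx, Cyy = X.T @ X / (n - 1), Y.T @ Y / (n - 1)
    Cxy = X.T @ Y / (n - 1)
    M = np.linalg.solve(Cxx, Cxy) @ np.linalg.solve(Cyy, Cxy.T)
    vals, vecs = np.linalg.eig(M)
    a = np.real(vecs[:, np.argmax(np.real(vals))])
    b = np.linalg.solve(Cyy, Cxy.T @ a)                # eq. (6.13), up to scale
    a /= np.linalg.norm(X @ a)                         # enforce (Xa)'(Xa) = 1
    b /= np.linalg.norm(Y @ b)                         # enforce (Yb)'(Yb) = 1
    b_orth = b - (a @ b) / (a @ a) * a                 # component of b orthogonal to a
    a_orth = a - (b @ a) / (b @ b) * b                 # component of a orthogonal to b
    rho1 = a @ X.T @ Y @ b_orth
    rho2 = a_orth @ X.T @ Y @ b
    return max(abs(rho1), abs(rho2))
```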
The null hypothesis for this case can be defined as:

    H_0: ζ = max{ζ_1, ζ_2} = 0   vs.   H_a: ζ ≠ 0.        (6.11)

As with test (3), the distribution of the linear mixing quantity ζ can be obtained nonparametrically with resampling methods, even though a parameterization based on Fisher's transform is inaccurate for ζ_1 and ζ_2. As will be explained below, even if we use other approaches to find the distribution of ζ, it is possible to check the accuracy of these approaches by comparing them with distributions computed with permutations. Unlike test (3), however, by rejecting the null hypothesis of equation (6.11) we only keep interactions for which there is strong evidence of true functional coupling, and are therefore more conservative with respect to the effects of crosstalk.

It should be noted that this is a conservative statistical test, because we can miss true interactions that have collinear vectors a and b, but we consider this preferable to reporting false neural interactions caused by linear mixing. In fact, conservativeness is a feature of another interaction metric, the imaginary coherence (Nolte et al., 2004), according to which all correlations with zero time lag, even those related to true interactions, are discarded because such correlations may have been caused by linear mixing. Just as imaginary coherence requires true interactions to have nonzero time lags, our canonical correlation approach requires true interactions to have different frequency profiles, and thus non-collinear canonical vectors a and b.

The distribution of ζ is obtained from Monte Carlo simulations: 100,000 realizations of random observation matrices X and Y are created according to equation (6.4), where X̄ and Ȳ are standard independent multivariate Gaussian matrices with dimension n_trials × n_freqs, α and γ have standard uniform distributions, and β = √(1 − α²) and δ = √(1 − γ²). Comparing this distribution against ζ at every voxel location gives us a p-value map, which is then thresholded for significance using FWER to control for multiple comparisons. For voxels above the threshold we are confident that canonical correlations are not caused solely by linear mixing. As mentioned before, we can validate the distributions computed with Monte Carlo simulations by comparing them with distributions estimated with permutations; such a comparison is presented in figure 6.1, where we display maps of voxels with significant ζ for multi-subject data acquired during a visuomotor task study, with a reference voxel located in the left motor cortex (more details on these data in section 6.4.3). The thresholding is based on FWER, but for the permutation-based distributions we used the maximum statistic approach from chapter 5, while for the Monte Carlo-based distributions we used Hochberg's method. In most cases there was strong overlap between the two approaches, even though the thresholding schemes were different - we do not apply Hochberg's method to permutation-based distributions because it requires very precise estimates of the distributions, which in turn demands a large number of permutations. This indicates that the Monte Carlo methods are suitable for thresholding.

We have described a statistical test that identifies voxels where the interaction is not fully explained by linear mixing. Furthermore, in the previous section we described a test that identifies voxels with statistically significant canonical correlation.
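A sketch of the Monte Carlo procedure just described is given below, reusing the canonical_correlations and mixing_statistics helpers from the earlier sketches. The uniform mixing weights and unit-variance Gaussian activations follow the text; taking the absolute value of ζ_1 and ζ_2 for the two-sided alternative is our assumption.

```python
import numpy as np

def mixing_null_distribution(n_trials, n_freqs, n_mc=100_000, seed=None):
    """Monte Carlo null distribution of zeta = max(|zeta1|, |zeta2|) under pure
    linear mixing (equation 6.4) with no true coupling between the sources."""
    rng = np.random.default_rng(seed)
    null = np.empty(n_mc)
    for i in range(n_mc):
        Xbar = rng.standard_normal((n_trials, n_freqs))
        Ybar = rng.standard_normal((n_trials, n_freqs))
        alpha, gamma = rng.uniform(size=2)
        X = alpha * Xbar + np.sqrt(1 - alpha**2) * Ybar
        Y = gamma * Xbar + np.sqrt(1 - gamma**2) * Ybar
        X = (X - X.mean(0)) / X.std(0, ddof=1)            # standardize columns
        Y = (Y - Y.mean(0)) / Y.std(0, ddof=1)
        _, a, b = canonical_correlations(X, Y)            # from the earlier sketch
        z1, z2 = mixing_statistics(X, Y, a, b)            # from the earlier sketch
        null[i] = max(abs(z1), abs(z2))
    return null
```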
We combine these tests using conjunction analysis (Nichols et al., 2005), which thresholds a map of the maximum, at each voxel, between the p-value for ρ and the p-value for ζ. The resulting test finds significant canonical correlations between the reference and all other cortical locations that cannot be explained by linear mixing. Simulations and experimental data exploring this approach are presented in section 6.4.

Figure 6.1: Thresholded maps computed from multi-subject data acquired during a visuomotor experiment (more details in section 6.4.3). Left and right hemisphere maps from 15 subjects are presented, each pair of rows showing 5 subjects. In these maps, blue voxels have significant ζ only when permutation-based distributions are used, orange voxels have significant ζ only when Monte Carlo-based distributions are used, and red voxels have significant ζ for both approaches.

6.3 Contribution of each frequency band to the correlation

After finding cortical areas that interact significantly with a chosen reference, one might be interested in identifying which individual frequencies are responsible for the interaction. If X is the observation matrix at the reference voxel and Y the observation matrix at the voxel being analyzed, the canonical correlation vectors a and b contain information on the frequency contributions of each cortical site to the interaction (Rencher, 1992). Therefore, to find significant individual frequencies, we want to identify which components of the canonical vectors a and b are significantly different from zero. The approach we follow here is based on an approximate test presented in Bruguier et al. (2008). Based on the expressions for the canonical correlations shown in equation (6.2), we can write the following:

    C_{XX}^{-1} C_{XY} C_{YY}^{-1} C_{YX} a = ρ² a
    ⇔ [C_{YY}^{-1} C_{YX}] C_{XX}^{-1} C_{XY} C_{YY}^{-1} C_{YX} a = [C_{YY}^{-1} C_{YX}] ρ² a
    ⇔ C_{YY}^{-1} C_{YX} C_{XX}^{-1} C_{XY} [C_{YY}^{-1} C_{YX} a] = ρ² [C_{YY}^{-1} C_{YX} a]
    ⇔ C_{YY}^{-1} C_{YX} C_{XX}^{-1} C_{XY} b = ρ² b.        (6.12)

Thus,

    b = C_{YY}^{-1} C_{YX} a  ⇔  b = (Y'Y)^{-1} Y'X a.        (6.13)

The above expression is equal to the least-squares estimator for a linear model in which Xa is the observation vector and Y is the design matrix. Thus, if a is assumed fixed, we can test the significance of each element of b = [b_1 b_2 ... b_{n_freqs}]' by fitting the following linear model at each voxel that is significantly correlated with the reference:

    Xa = Yb + ε,        (6.14)

with ε a vector of residuals. The null hypothesis is:

    H_0^{b_i}: b_i = 0 ⇔ Db = 0   vs.   H_a^{b_i}: Db ≠ 0,        (6.15)

where D is a 1 × n_freqs row vector whose elements are all 0 apart from the i-th element, which is 1. To perform an F-test on this model, we need the within-group and between-group variances, given respectively by:

    E = (Xa − Yb)'(Xa − Yb)
    H = (Db)'[D(Y'Y)^{-1}D']^{-1}(Db).        (6.16)

We then compute the statistic F = (n_trials − n_freqs) H/E, which has an F distribution with 1 and (n_trials − n_freqs) degrees of freedom (Seber, 2003). If F is larger than a certain threshold, then b_i is significantly different from zero, and its corresponding frequency band contributes strongly to the overall interaction. The significance of the elements of a can be evaluated by an analogous procedure, in which the roles of Xa and Yb in equation (6.14) are interchanged, vector b is assumed fixed, and every component of vector a is tested.
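A minimal sketch of these per-component F-tests, assuming NumPy/SciPy; it treats a as fixed and tests every element of b, following equations (6.14)-(6.16).

```python
import numpy as np
from scipy.stats import f as f_dist

def component_f_tests(X, Y, a, b):
    """Approximate F-tests for each component of the canonical vector b."""
    n_trials, n_freqs = Y.shape
    resid = X @ a - Y @ b
    E = resid @ resid                          # within-group term (equation 6.16)
    YtY_inv = np.linalg.inv(Y.T @ Y)
    dof = n_trials - n_freqs
    pvals = np.empty(n_freqs)
    for i in range(n_freqs):
        H = b[i] ** 2 / YtY_inv[i, i]          # (Db)'[D(Y'Y)^{-1}D']^{-1}(Db) with D = e_i'
        F = dof * H / E                        # F statistic with (1, n_trials - n_freqs) dof
        pvals[i] = f_dist.sf(F, 1, dof)
    return pvals
```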
The above F-tests are performed only on the voxels that have significant interactions, i.e., those that survive the conjunction of the significant canonical correlation and linear mixing tests described earlier. Since these can be several voxels, we again have a multiple comparisons problem, which we solve with an FWER procedure: each vector element (corresponding to a particular frequency band) produces an F-statistic map, which is converted into a p-value map and thresholded with FWER. As pointed out by Bruguier et al. (2008), this is not an exact test, because the design matrix is random, contrary to standard general linear models. However, it is reasonably accurate, as shown in our simulations.

6.4 Results

In order to assess the accuracy of our canonical correlation method in detecting functional interaction in the brain, discarding correlations that are due to linear mixing, and finding which frequency bands contribute most to the true correlations, we first apply the method to simulated MEG signals. Real data applications are illustrated with multi-subject MEG measurements obtained during a visuomotor task study, the same as used in the previous chapter.

6.4.1 Simulations with Gaussian Noise

The simulated brain activation Z has the spatial profile on the cortical surface shown in figure 6.2a, consisting of three active sources, two in the left hemisphere and one in the right hemisphere. All sources are cosine waves: the frequencies are 22.5 Hz for source 1, 45 Hz for source 2, and 10 Hz for source 3. The powers of the cosine waves r_k (k ∈ {1, 2, 3}) vary across trials, and are modeled as multivariate Gaussian random variables:

    r = [r_1 r_2 r_3]' ~ N( [2 2 2]', [ 1 0.9 0.9 ; 0.9 1 0.9 ; 0.9 0.9 1 ] ).        (6.17)

The time profile of our simulated sources was inspired by an activation pattern estimated with MEG power data from a visually directed finger movement task, which suggested the existence of task-related functional connectivity across different frequencies and cortical regions (Chen et al., 2003). Our simulated sources are then mapped into the sensor space according to equation (3.1); at the sensor level, we add zero-mean Gaussian noise such that the SNR is 1/100, for a total of 200 trials. We then apply the inverse operator to obtain the reconstructed current density maps, create observation matrices from the power observations obtained for the frequency bands as described in section 3.1, and compute ρ and ζ maps for the simulated data. The procedure of computing the canonical correlation maps and thresholding them for significance is illustrated in figures 6.2b through 6.2d: figure 6.2b shows the values of the canonical correlations for every voxel on the cortical surface when the reference is in region (3); figure 6.2c shows the values of the quantity ζ, the linear mixing statistic from section 6.2, for all voxels; and figure 6.2d presents the results of the statistical tests applied to the two previous maps, according to the FWER approach described in section 6.1, where blue locations indicate true functional interaction with the reference. According to these maps, our method found that voxels around all simulated regions were correlated with the reference, but that the correlations detected at voxels around region (3) did not correspond to true interactions, matching exactly our simulation setting.
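As an illustration of the simulation setup in equation (6.17), the following sketch draws trial-varying source powers with the stated correlation structure and builds the cosine source time series; the sampling rate and trial duration are placeholders, since the text does not specify them.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials, fs, T = 200, 256.0, 1.0              # assumed sampling rate and trial length
t = np.arange(0, T, 1 / fs)
freqs = [22.5, 45.0, 10.0]                      # source frequencies in Hz

mean = np.array([2.0, 2.0, 2.0])
cov = np.array([[1.0, 0.9, 0.9],
                [0.9, 1.0, 0.9],
                [0.9, 0.9, 1.0]])

# Trial-varying powers with the correlation structure of equation (6.17)
r = rng.multivariate_normal(mean, cov, size=n_trials)        # shape (n_trials, 3)

# Source time series: cosine waves whose amplitudes covary across trials
sources = np.stack([r[:, k, None] * np.cos(2 * np.pi * f * t)
                    for k, f in enumerate(freqs)], axis=1)   # (n_trials, 3, n_samples)
```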
To identify which frequency bands contribute most to the interactions, we need to analyze the components of vectors a and b. Figure 6.3 presents the absolute values of each component of a and b for a voxel in region (1) and another in region (2), with the reference in region (3) as shown in figure 6.2. According to these figures, the alpha band at the reference interacts significantly with the beta band at region (1), and with the low-gamma band at region (2). Again, this corresponds to our simulation setting, and is further evidence of the accuracy of canonical correlation analysis coupled with F-tests.

Figure 6.2: Simulation results. (a) The spatial profile of the three sources simulated on the brain surface; the signal in region (1) lies in the beta band, the signal in region (2) lies in the low-gamma band, and the signal in region (3) lies in the alpha band. (b) Map of the maximum canonical correlation ρ when the reference is in region (3) (indicated by the green dot). (c) Map of the quantities ζ, which indicate the amount of linear mixing. (d) Results of the statistical thresholding: voxels in green have significant ρ, voxels in red have significant ζ, and voxels in blue have both ρ and ζ significant.

Figure 6.3: Absolute values of the canonical vectors a and b at voxels in regions (1) (left) and (2) (right), obtained from the simulated data with the reference voxel in region (3). In the canonical vector plots, yellow bars indicate frequencies that contribute significantly to the interaction, according to the F-tests.

6.4.2 Simulations Combined with Real Data

To further investigate the ability of canonical correlations to detect true functional interactions while minimizing the rate of false positives (caused either by random error or by linear mixing), we proceed to an ROC analysis - specifically, a method based on a variant of ROC called the alternative free-response ROC (Chakraborty and Winter, 1990; Metz, 2006; Küçükaltun-Yıldırım et al., 2006). Once again we use simulated sources, but instead of Gaussian noise we seek to replicate the noisy measurements typically found in MEG systems by adding real human MEG signals to the simulations as background noise. For this analysis, we created 100 datasets, each consisting of 115 MEG trials. The datasets were equally divided into two groups. Each dataset in group 1 contained two simulated sources located randomly on the cortex but at least 10 cm apart. Their time series were cosine waves with correlated power and parameters similar to those of the previous section. The signals were restricted to lie within one of the six frequency bands, but each source had a different band. On the other hand, each dataset in group 2 contained a single simulated source, with random location, power, and signal frequency. For a realistic simulation, we added experimental MEG recordings as noise: 115 trials of the rest condition of the visuomotor experiment were randomly added to the 115 trials of simulated data with SNR = 1/25. Figure 6.4 shows the simulated and real time series for a trial of one of the datasets, and the projection of these time series onto the source domain can be seen in figure 6.5. Figures 6.6 and 6.7 show the spatial profile of the simulated sources and the results of the statistical thresholding for two datasets belonging to groups 1 and 2, respectively. We draw an ROC curve by obtaining true positive and false positive fractions for different values of the confidence level (1 − α).
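A sketch of how real resting-state trials can be mixed into the simulated sensor data at a target SNR. The SNR definition used here (signal power over noise power) is our assumption, since the text does not spell it out.

```python
import numpy as np

def add_real_noise(sim_trials, rest_trials, snr=1 / 25, seed=None):
    """Mix simulated sensor data with real resting-state MEG trials at a target SNR.

    sim_trials, rest_trials : arrays of shape (n_trials, n_sensors, n_samples).
    """
    rng = np.random.default_rng(seed)
    noise = rest_trials[rng.permutation(len(rest_trials))]   # randomly pair rest trials with simulations
    p_sig = np.mean(sim_trials ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_sig / (snr * p_noise))                 # scale noise to reach the target SNR
    return sim_trials + scale * noise
```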
For a given α, we find the true positive fraction (TPF) by performing the statistical tests described in sections 6.1 and 6.2 on each of the 50 group 1 datasets, having as a reference voxel a location within one of the two simulated sources, and computing the proportion of datasets in which there is at least one significant voxel within a 20-mm radius of the center of the other simulated source. Similarly, we find the false positive fraction (FPF) by computing the proportion of the 50 group 2 datasets in which there is any number of significant voxels anywhere in the brain. To add robustness to the estimation of the TPF and FPF, we discard all significant voxels that belong to an active region smaller than 20 mm². The resulting ROC curve is shown in figure 6.8.

Figure 6.4: Time series of a trial from one of the 100 simulated datasets. (a) Simulated sources; (b) Real MEG data; (c) The sum of the simulated and the real data.

Figure 6.5: Source-domain representations of the simulated (a), real (b), and summed simulated and real (c) time series presented in figure 6.4, at time instant t = 0.6944 s.

Figure 6.6: Results from one of the group 1 datasets: (a) Spatial profile of the simulated sources; (b) map of the voxels with significant ρ (green), ζ (red), or both (blue), with the reference voxel within region (1).

Figure 6.7: Results from one of the group 2 datasets: (a) Spatial profile of the simulated sources; (b) map of the voxels with significant ρ (green) or ζ (red), with the reference voxel within region (1). Notice that no voxel has both ρ and ζ significant.

Figure 6.8: ROC curve drawn according to the procedure described in section 6.4.2. The dotted line represents the line of no discrimination, where the TPF equals the FPF.

6.4.3 Experimental Data

We analyzed the experimental visuomotor (VM) data to explore interactions between the primary motor cortex (M1) and the rest of the cortex. We used the MEG measurements described in chapter 5, acquired from n_subjs = 15 subjects (n_trials = 573 ± 101) (Jerbi et al., 2007), with the time series of the primary motor cortex as the reference signal. Multi-subject analysis requires the interpolation of the cortical surfaces from all subjects to a common anatomical map, such as the standard Montreal Neurological Institute brain. Co-registration was performed with "a standard procedure based on anatomical markers (AC-PC), the Talairach bounding box, and piecewise affine transformations" (Jerbi et al., 2007). To perform statistical analysis at the group level, we take the ρ and ζ p-value maps computed for each subject and combine them into a single group p-value map, which can then be thresholded with the FWER procedure described in section 6.1. We obtain group p-values for ρ and ζ with Benjamini and Heller's method, as in chapter 5, and we find the voxels with significant coupling at the group level by taking, at each voxel, the maximum between the p_u for ρ and for ζ, and then thresholding the resulting map. Figure 6.9 shows maps of the voxels significantly correlated with the reference signal after linear mixing correction, according to the null hypothesis stated in equation (3.18) and for varying values of u. According to these maps, the left motor cortex interacts with large portions of the visual cortex and the temporal lobe in both hemispheres, and also with smaller regions in the left and right frontal lobes.
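The group-level conjunction just described can be sketched as below; the function that pools per-subject p-values (Benjamini and Heller's method, used in chapter 5) is left as a user-supplied callable, since its details are not repeated here, and the names are ours.

```python
import numpy as np

def group_coupling_map(p_rho_subjects, p_zeta_subjects, pool_pvalues, alpha=0.05):
    """Group-level conjunction of the rho and zeta tests (a sketch).

    p_rho_subjects, p_zeta_subjects : arrays (n_subjects, n_voxels) of per-subject p-values.
    pool_pvalues : callable combining per-subject p-values into one group p-value
                   per voxel (e.g. the partial-conjunction method of chapter 5).
    """
    p_rho_group = pool_pvalues(p_rho_subjects)       # group p-value for the canonical correlation
    p_zeta_group = pool_pvalues(p_zeta_subjects)     # group p-value for the mixing statistic
    p_conj = np.maximum(p_rho_group, p_zeta_group)   # conjunction: both effects must be present
    return p_conj < alpha                            # threshold (FWER adjustment omitted here)
```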
In order to find out which frequency bands are responsible for these interactions, we can extend the F-test approach described in section 6.3 to multi-subject analysis, as we did with the ρ and ζ maps, but now we restrict our search for significant components of a and b to voxels with a detected group effect. The results of the group F-tests are presented in figures 6.10 and 6.11. The maps shown in figure 6.10 indicate that, except for the delta and theta bands, all frequencies contribute to the correlation between the left M1 area and the frontal lobe; on the other hand, figure 6.11 shows that the coupling we found between the left M1 area and the visual cortex is mainly caused by the beta, low- and high-gamma bands in the former area, and by the beta band in the latter.

Figure 6.9: Left, posterior, and right views of brain maps of voxels with significant group effects, for varying values of the group null hypothesis parameter u. Colored voxels show effects for all values of u less than or equal to 10; yellow voxels also show effects for u = 11, and red voxels also show effects for u = 11 and 12.

6.5 Discussion

Cortical interactions can occur in many possible combinations of frequency bands, and this poses a significant challenge in EEG/MEG data analysis in terms of the number of tests that need to be performed. Furthermore, standard statistical measures, such as coherence and phase synchrony, are unable to distinguish between true neural interactions and linear mixing effects caused by the limited spatial resolution of EEG/MEG inverse methods. Our method deals with both problems by introducing the new interaction measure of canonical correlations. By construction, this measure provides the optimal way to linearly combine frequencies at one cortical site to best predict frequencies at another location. Given the linear nature of EEG/MEG mixing effects, false interactions are only possible within the same frequency bands, causing the canonical vectors to be parallel. To take advantage of this effect, we have devised a test for linear mixing that rejects interactions if the canonical vectors are collinear. Our simulation results suggest that our method efficiently detects cortical interactions while controlling for linear mixing.

Regarding the interactions in the VM experimental data, we found that wide areas of the visual cortex and smaller regions in the temporal and frontal lobes interact strongly with the left motor cortex. Furthermore, our results indicate that all six frequency bands under consideration contribute to this specific coupling, even if one of them (alpha) is present in only a very few voxels. This attests to the advantages of using a multivariate approach to functional connectivity, because a single test is able to look at several combinations of cross-frequency interactions, with the additional capability of dealing with crosstalk effects.

Figure 6.10: Anterior views of maps of voxels with significant components of the canonical vectors a and b, according to the group statistical test and for u = 10 (separate panels are shown for vectors a and b, one per frequency band).

Our approach for solving the linear mixing problem is related to methods such as the imaginary coherence approach proposed by Nolte et al. (2004) and the phase-slope index described in Nolte et al. (2008). Both methods indicate that there is significant coupling between two cortical sites only if the time lag between their signals is nonzero.
Just as these methods require interactions to have different time lags, our method requires interactions to have different frequency profiles. Neither type of interaction is possible as a result of crosstalk or linear mixing alone.

True cortical interactions often involve the same frequency bands in both cortical sites. If a high canonical correlation is caused only by within-frequency interactions that occur in all frequency bands involved in the analysis (regardless of the strength of these interactions), then the canonical vectors may also be collinear, and these interactions will be missed by our method. Even though this is a disadvantage, we prefer a conservative interaction metric to one prone to producing false positives due to linear mixing. Furthermore, such interactions erroneously considered spurious due to collinear canonical vectors might be correctly detected by imaginary coherence, which means that our method should not be considered a substitute for, but rather complementary to, the technique presented in Nolte et al. (2004). Finally, it is worth mentioning that, since the canonical vectors are a measure of the frequency profile of the interacting sources, more variables in our observations provide profiles with better resolution; thus our linear mixing correction method also depends on the number of frequency bands being analyzed.

Figure 6.11: Posterior views of maps of voxels with significant components of the canonical vectors a and b, according to the group statistical test and for u = 10 (separate panels are shown for vectors a and b, one per frequency band).

Canonical correlation analysis requires the selection of a reference spatial location, whose interaction with all other voxels in the brain is evaluated. For our multi-subject real data computations, the reference was located in the motor cortex. This was a straightforward decision, because the data were acquired during a visuomotor task, and this voxel also had the highest correlation with hand movement speed during the task (Jerbi et al., 2007). When such prior information is not available, one may have to compute interaction maps for multiple reference voxels (Carbonell et al., 2009). For EEG/MEG source analysis, where the total number of voxels might be in the tens of thousands, this poses a severe multiple comparison challenge that should be accounted for when thresholding for significance.

6.6 Relationship between linear mixing and collinearity of the canonical vectors

Based on equation (6.4), if X̄ and Ȳ are uncorrelated (i.e., X̄'Ȳ = 0), the auto- and cross-covariance matrices can be written in terms of the auto-covariances of X̄ and Ȳ:

    C_{XX} = α² C_{X̄X̄} + β² C_{ȲȲ}
    C_{YY} = γ² C_{X̄X̄} + δ² C_{ȲȲ}
    C_{XY} = αγ C_{X̄X̄} + βδ C_{ȲȲ} = C_{YX}.        (6.18)

We can also write:

    C_{YY}^{-1} C_{YX} = (γ² C_{X̄X̄} + δ² C_{ȲȲ})^{-1} (αγ C_{X̄X̄} + βδ C_{ȲȲ})
                       = [γ² C_{ȲȲ}^{-1} C_{X̄X̄} + δ² I]^{-1} [αγ C_{ȲȲ}^{-1} C_{X̄X̄} + βδ I]
                       = (α/γ) (A + (δ²/γ²) I)^{-1} (A + (βδ/(αγ)) I),        (6.19)

where A = C_{ȲȲ}^{-1} C_{X̄X̄}. By an analogous procedure, we find:

    C_{XX}^{-1} C_{XY} = (α² C_{X̄X̄} + β² C_{ȲȲ})^{-1} (αγ C_{X̄X̄} + βδ C_{ȲȲ})
                       = (γ/α) (A + (β²/α²) I)^{-1} (A + (βδ/(αγ)) I).        (6.20)

Now let A = VSV^{-1} be the eigendecomposition of the matrix A; since A is the product of the symmetric positive-definite matrices C_{ȲȲ}^{-1} and C_{X̄X̄}, it is diagonalizable with real, nonnegative eigenvalues, and A + ηI = V(S + ηI)V^{-1} for any real η. The matrix in equation (6.19) becomes:

    C_{YY}^{-1} C_{YX} = (α/γ) V (S + (δ²/γ²) I)^{-1} (S + (βδ/(αγ)) I) V^{-1} = (α/γ) V S_1 S_2 V^{-1},        (6.21)

where S_1 = (S + (δ²/γ²) I)^{-1} and S_2 = S + (βδ/(αγ)) I.
Similarly, for equation (6.20):

    C_{XX}^{-1} C_{XY} = (γ/α) V (S + (β²/α²) I)^{-1} (S + (βδ/(αγ)) I) V^{-1} = (γ/α) V S_3 S_2 V^{-1},        (6.22)

where S_3 = (S + (β²/α²) I)^{-1}. Since S_1, S_2, and S_3 are diagonal, matrix multiplications between any of them commute, and, based on equations (6.21) and (6.22), the matrices in equation (6.2) become:

    C_{XX}^{-1} C_{XY} C_{YY}^{-1} C_{YX} = C_{YY}^{-1} C_{YX} C_{XX}^{-1} C_{XY} = V S_1 S_2² S_3 V^{-1}.        (6.23)

Thus the eigenvalue equations that solve equation (6.1) are the same, and the canonical vectors a and b are equal. If a = b, then the two vectors are trivially collinear, and we use this condition in the linear mixing tests, as discussed in section 6.2. It should be pointed out that the relationship between linear mixing and collinearity of a and b is not an "if and only if" one: it is possible for two voxels to have collinear canonical vectors without suffering from crosstalk.

Chapter 7
Cross-frequency Phase and Amplitude Coupling with Complex Canonical Correlations

One limitation of the methodology presented in chapter 6 is that it deals with only one type of functional connectivity, namely that between neural power at different frequencies, while inter-areal cross-frequency interactions between signal amplitude and phase, and between signal phases, have also been reported, as described in sections 4.3 and 4.2, respectively. In this chapter, we introduce canonical correlation analysis on complex-valued data as a means of simultaneously testing several combinations of amplitude and phase interactions.

7.1 Motivation

Let x_k be the unit-length columns of the observation matrix X, and let y_k be the unit-length columns of the observation matrix Y. We can then view the canonical correlation between the matrices X and Y as a weighted sum of the cross-correlation coefficients between all vectors x_i of X and all vectors y_j of Y:

    ρ = Σ_{i=1}^{n_freqs} Σ_{j=1}^{n_freqs} a_i b_j x_i' y_j,        (7.1)

where a_i and b_j are the elements of the canonical vectors a and b, computed with equation (6.2). Now let us look into the terms of the summation above. In chapter 6, the x_k and y_k were amplitude measurements (more specifically, power measurements), corresponding to the cross-frequency amplitude coupling described in section 4.1.1. If x_k represents the frequency amplitude vector at one location and y_k the phase vector (in exponential form) for these frequencies at the other location, then we can write equation (7.1) as:

    Σ_{i=1}^{n_freqs} Σ_{j=1}^{n_freqs} a_i b_j Σ_{k=1}^{n_trials} A_{i,k} exp[jφ_{j,k}] = Σ_{i=1}^{n_freqs} Σ_{j=1}^{n_freqs} a_i b_j E{A_i exp[jφ_j]},        (7.2)

where E{·} denotes the expected value (estimated here by the average over trials), and A_{i,k} and exp[jφ_{j,k}] represent the amplitude observation at frequency i at one location and the phase observation at frequency j at the other location, respectively, both acquired during the k-th trial. The expected values in the summation above give us the modulation index from section 4.3, thus allowing us to test simultaneously several combinations of cross-frequency phase-amplitude interactions between two locations. On the other hand, if both x_k and y_k contain phase information, equation (7.1) becomes:

    Σ_{i=1}^{n_freqs} Σ_{j=1}^{n_freqs} a_i b_j Σ_{k=1}^{n_trials} exp[jφ_{i,k}] exp[−jφ_{j,k}] = Σ_{i=1}^{n_freqs} Σ_{j=1}^{n_freqs} a_i b_j E{exp[j(φ_i − φ_j)]}.        (7.3)

This gives the phase synchrony from section 4.2, enabling simultaneous testing of cross-frequency phase-phase coupling.
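To make the connection concrete, the sketch below forms the matrix of trial-averaged terms E{A_i exp(jφ_j)} whose weighted combination appears in equation (7.2); the standardization applied to the amplitude columns is our assumption, and the function name is ours.

```python
import numpy as np

def cross_frequency_terms(A, phi):
    """Matrix of modulation-index-style terms E{A_i exp(j*phi_j)} (equation 7.2).

    A   : (n_trials, n_freqs) amplitude observations at one voxel.
    phi : (n_trials, n_freqs) instantaneous phases at the other voxel.
    The weighted sum a' C b of the returned matrix C is the quantity in eq. (7.2).
    """
    X = (A - A.mean(0)) / A.std(0, ddof=1)     # standardized amplitude columns
    Y = np.exp(1j * phi)                       # phase columns in exponential form
    n_trials = A.shape[0]
    C = X.T @ Y / n_trials                     # trial-averaged cross terms
    return C
```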
A method based on canonical correlations that allows complex-valued observations would be able to handle these coupling measurements; in fact, such a method has been discussed in the literature (Hannan, 1971; Brillinger, 2001), and we outline this approach below.

7.2 Complex Canonical Correlation

As in section 6.1, our goal here is to find linear combinations of the observation matrices X and Y such that the correlation between these combinations is maximized; however, if X and Y are complex-valued, we cannot simply maximize equation (6.1), because ρ can be complex too. Instead, we modify the cost function as follows (with ' now denoting the conjugate transpose):

    ω = |ρ|² = [(Xa)'(Yb)][(Xa)'(Yb)]*,   subject to (Xa)'(Xa) = 1 and (Yb)'(Yb) = 1.        (7.4)

Writing the matrices X'X, Y'Y, and X'Y in terms of the respective auto- and cross-covariances C_{XX}, C_{YY}, and C_{XY}, the Lagrangian function for equation (7.4) is:

    L(a, b, λ_a, λ_b) = (a'C_{XY}b)(b'C_{YX}a) − λ_a(a'C_{XX}a − 1) − λ_b(b'C_{YY}b − 1).        (7.5)

We now take the gradients of L(a, b, λ_a, λ_b) with respect to the real and imaginary parts of the vectors a = a_R + j a_I and b = b_R + j b_I, and set them to zero:

    ∇_{a_R} L = 2ρ_R C_{XY} b − 2λ_a C_{XX} a_R = 0        (I)
    ∇_{a_I} L = 2ρ_I C_{XY} b − 2λ_a C_{XX} a_I = 0        (II)
    ∇_{b_R} L = 2ρ_R C_{YX} a − 2λ_b C_{YY} b_R = 0        (III)
    ∇_{b_I} L = 2ρ_I C_{YX} a − 2λ_b C_{YY} b_I = 0        (IV)        (7.6)

where ρ_R = 0.5(ρ + ρ*) and ρ_I = −0.5 j (ρ − ρ*). In the system above, combining equations (I) and (II), and combining equations (III) and (IV), we obtain the system:

    λ_a C_{XX} a = ρ* C_{XY} b
    λ_b C_{YY} b = ρ C_{YX} a.        (7.7)

We also find that λ_a = λ_b = ω. Manipulating the above system gives us the eigenvalue equations that solve the maximization problem shown in equation (7.4):

    C_{XX}^{-1} C_{XY} C_{YY}^{-1} C_{YX} a = ω a
    C_{YY}^{-1} C_{YX} C_{XX}^{-1} C_{XY} b = ω b.        (7.8)

This system is the same as equation (6.2), except that here we need to take into account the fact that ρ is complex; thus the expressions used to find the canonical correlations of real-valued matrices can also be applied to complex-valued observations.

7.3 Simulated Data

To verify the validity of complex canonical correlations and their possible application to observations obtained from MEG data, we apply the results from the previous section to simulated signals exhibiting the phenomenon of nested oscillations, in which the amplitude of the signal at one spatial location is coupled with the phase of the signal at another location. The simulated brain activation has the spatial profile shown in figure 7.1, consisting of two sources, one in each hemisphere. The signals z_1(n, t) at source 1 and z_2(n, t) at source 2 have the following expressions:

    z_1(n, t) = [2 + A_1(n)] cos[2π f_1 t + φ_1(n)]
    z_2(n, t) = [2 + A_2(n)] cos[2π f_2 t + φ_2(n)],        (7.9)

where n and t are trial and time indices, respectively, f_1 = 45 Hz (lying in the low-gamma frequency band), f_2 = 6.5 Hz (lying in the theta frequency band), φ_1(n) and φ_2(n) vary uniformly between 0 and 2π, A_1(n) has a standard uniform distribution, and A_2(n) = sin[φ_1(n)]. After projecting the sources onto the sensor level and adding Gaussian noise, we proceed with the current density map reconstruction and wavelet decomposition as usual.
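A minimal sketch of the complex-valued solver implied by equation (7.8), assuming NumPy; conjugate transposes replace the ordinary transposes of the real-valued case, and the returned eigenvalue is ω = |ρ|². The function name is ours.

```python
import numpy as np

def complex_canonical_correlation(X, Y):
    """Maximum complex canonical correlation (equation 7.8) for centered
    complex observation matrices X and Y of shape (n_trials, n_freqs)."""
    n = X.shape[0]
    Cxx = X.conj().T @ X / (n - 1)
    Cyy = Y.conj().T @ Y / (n - 1)
    Cxy = X.conj().T @ Y / (n - 1)
    M = np.linalg.solve(Cxx, Cxy) @ np.linalg.solve(Cyy, Cxy.conj().T)
    w, V = np.linalg.eig(M)
    k = int(np.argmax(np.real(w)))
    omega = float(np.real(w[k]))                      # squared modulus of the canonical correlation
    a = V[:, k]
    b = np.linalg.solve(Cyy, Cxy.conj().T) @ a        # b proportional to Cyy^{-1} Cyx a
    a /= np.linalg.norm(X @ a)                        # enforce (Xa)'(Xa) = 1
    b /= np.linalg.norm(Y @ b)                        # enforce (Yb)'(Yb) = 1
    return omega, a, b
```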
The only difference here from the previous chapters is that we 85 do not integrate the power of the wavelet coefficients over time nor frequency, but use for our observations the complex-valued coefficientsC stf att = 0:5s (half of the trial period) and and at the central frequencies of all six frequency bands considered so far - namely, 3, 6.5, 10, 22.5, 45 and 75 Hz; we do not integrate over frequencies because, unlike in the previous chapters, we want to keep the phase properties of the signal. Figure 7.1: (a) Spatial profile of two simulated sources: the phase at the theta band in source 1 is coupled with the amplitude at the low-gamma band in source 2; (b) Map of voxels with significant coupling with a reference in region 2, where matrixX contains amplitude information, and matricesY contain phase information; (c) Map of voxels with significant coupling with a reference in region 1, where matrixX contains phase information, and matricesY contain amplitude information. Figure 7.1b shows the canonical correlation map after thresholding for significance. For these images, observation matrixX refers to a reference voxel inside region 2, and matrixY refers to any other voxel in the brain, but unlike chapter 6, each element of matrixX contains the absolute value of the coefficientsC stf of the corresponding trial and frequency, and each element ofY contains the complex exponential of the angle of C stf . On the other hand, in figure 7.1c the reference is a voxel inside region 1, and we switch the components of the observations: nowX contains phase information, whileY 86 contains amplitude information. The statistical tests were based on FWER, the maximum statistic, and permutation methods as in chapter 5; to obtain the maximum distribution, the permuted samples were computed by randomly exchanging the order of the rows in matrixX, without changing matrixY. Figures 7.1b and 7.1c indicate that our method is able to detect correctly the regions interacting with the reference according to our simulation settings. Having found the voxels that significantly interact with the reference, we are interested in determining the frequencies that cause that interaction. A simple way to find these frequencies is by using the canonical correlation threshold directly: given a pair of voxels whose canonical correlation is considered significant if it exceeds some threshold , and the fact that canonical correlation is the maximum correlation among all possible linear combinations of matrices X and Y, then we can simply state that any linear combination that is above is also significant (Carbonell et al., 2004, 2009). The results of these tests are presented in figure 7.2. In figure 7.2a, we test whether the coupling between the theta component of matrixX and the low-gamma component of matrixY are significant, based on the threshold applied to figure 7.1b; and in figure 7.2b, we test the coupling between the low-gamma component of matrixX and the theta component of matrixY with the threshold applied to figure 7.1b. Once again, we can verify the ability of this approach to retrieve the theta-low-gamma coupling we simulated, and did not find any other cross-frequency interaction between regions 1 and 2. Figures 7.1 and 7.2 indicate that complex canonical correlation analysis is a promising technique for detecting cross-frequency interactions between the signal amplitude at one location and the signal phase at another, and determining which frequencies are responsible for those amplitude-phase interactions. 
However, its applicability to cross- frequency phase-phase interactions would require further analysis, due to how phase 87 (a) (b) Figure 7.2: (a) V oxels where the low-gamma phase interacts strongly with the theta amplitude at a reference in region 2; (b) V oxels where the theta amplitude interacts strongly with the low-gamma phase at a reference in region 1. No other phase-amplitude or amplitude-phase combination yielded significant coupling. coupling works in the brain: for instance, if we created observation matrices X and Y only with components of the form exp[j k ], we would not be able to test phase interactions between two different frequenciesf 1 andf 2 such thatMf 1 =Nf 2 , because the phase synchrony in that case is not between 1 and 2 , but betweenM 1 andN 2 . The developments and results presented in this chapter are a preliminary exploration of this approach to detecting amplitude/phase and phase/phase interaction. Despite the limitation just described, we believe this is a powerful approach to exploratory analysis of EEG and MEG data when multiple different types of interaction may be present in the data. 88 Chapter 8 Conclusions In this study, we have developed multivariate methods for the detection of significant task-related modulations of oscillatory activity, and for the assessment of cross-frequency functional interactions between brain regions, with MEG data. By combining time- frequency decomposition of the MEG signals and multivariate linear models, we could reliably detect task-related cortical changes in activation where univariate approaches were previously insensitive, and also single out the frequency bands that were responsible for these changes. Furthermore, our canonical correlation method was able to accurately detect voxels on the cortical surface that correlate with a given reference, while also finding the signal frequencies that are responsible for this correlation, and discarding from the analysis voxels whose interaction with the reference was due to linear mixing. The advantages of these multivariate techniques were shown with both simulated data and MEG visuomotor recordings. Finally, we demonstrated that canonical correlation, when applied to complex-valued signals, is a promising technique that will enable us to test cross-frequency interactions beyond the power-power coupling we have analyzed so far. We extended the multivariate linear models to multiple subject statistical tests, and our findings match other visuomotor studies reported in the literature. Regarding multi- subject canonical correlation analysis, we detected coupling between motor and visual areas in the brain, which should be expected given the task carried out; however, one limitation of our method is that it cannot detect functional interactions that are exclusively task-based, because data from a control or rest condition is left out of the analysis. 89 One possible way of including rest data is by means of the so-called partial canonical correlation (Timm and Carlson, 1976; Lee, 1978), whose basic idea is to project data from one condition into the data from the other, and use the residues of the projection as the inputs to the canonical correlation. Other possible approaches include testing the difference between for a task and at rest (Jerbi et al., 2007; Chen et al., 2003; Classen et al., 1998); and adapting any of the methods for pooling p-values discussed in section 3.3.3, applied now to different conditions instead of to different subjects. 
In a sense, multivariate linear models and canonical correlations can be considered complementary methods; in fact, it can be argued that multivariate analysis of variance (MANOVA) is but a particular case of canonical correlation analysis (Muller, 1982). As MANOVA finds not only changes in oscillatory activity but also the frequency bands responsible for those changes, we can use this framework to find frequency interactions within cortical locations; combined with canonical correlations, which find frequency interactions between cortical locations, we obtain a more complete picture of functional connectivity in the brain for a given set of MEG power data.

References

R.J. Adler. The geometry of random fields. John Wiley & Sons Inc, 1981.
T.W. Anderson. An Introduction to Multivariate Statistical Analysis. John Wiley & Sons, Hoboken, 3rd edition, 2003.
J.R. Andrews-Hanna, A.Z. Snyder, J.L. Vincent, C. Lustig, D. Head, M.E. Raichle, and R.L. Buckner. Disruption of large-scale brain systems in advanced aging. Neuron, 56(5):924–935, 2007.
L.A. Baccalá and K. Sameshima. Partial directed coherence: a new concept in neural structure determination. Biological Cybernetics, 84(6):463–474, 2001.
E. Başar, C. Başar-Eroglu, S. Karakaş, and M. Schürmann. Gamma, alpha, delta, and theta oscillations govern cognitive processes. International Journal of Psychophysiology, 39(2-3):241–248, 2001.
S. Baillet, J.C. Mosher, and R.M. Leahy. Electromagnetic brain mapping. IEEE Signal Processing Magazine, 18(6):14–30, 2001.
S. Baillet, J.C. Mosher, and R.M. Leahy. Brainstorm: a non-commercial Matlab toolbox for MEG-EEG data visualization and processing. World Wide Web electronic publication, 2004. URL http://neuroimage.usc.edu/brainstorm.
G.R. Barnes and A. Hillebrand. Statistical flattening of MEG beamformer images. Human Brain Mapping, 18(1):1–12, 2003.
C.F. Beckmann, M. Jenkinson, and S.M. Smith. General multilevel linear modeling for group analysis in fMRI. NeuroImage, 20(2):1052–1063, 2003.
Y. Benjamini and R. Heller. Screening for partial conjunction hypotheses. Technical report, Department of Statistics and Operations Research, Tel Aviv University, 2007.
Y. Benjamini and Y. Hochberg. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, 57:289–300, 1995.
Y. Benjamini and D. Yekutieli. The control of the false discovery rate in multiple testing under dependency. The Annals of Statistics, 29(4):1165–1188, 2001.
M. Besserve, K. Jerbi, F. Laurent, S. Baillet, J. Martinerie, and L. Garnero. Classification methods for ongoing EEG and MEG signals. Biological Research, 40:415–437, 2007.
Å. Björck and G.H. Golub. Numerical methods for computing angles between linear subspaces. Mathematics of Computation, 27(123):579–594, 1973.
J.H. Bray and S.E. Maxwell. Multivariate Analysis of Variance. Sage, Thousand Oaks, CA, 1985.
D.R. Brillinger. Time Series: Data Analysis and Theory. Society for Industrial and Applied Mathematics, Philadelphia, 2001. ISBN 0898715016.
M.J. Brookes, A.M. Gibson, S.D. Hall, P.L. Furlong, G.R. Barnes, A. Hillebrand, K.D. Singh, I.E. Holliday, S.T. Francis, and P.G. Morris. A general linear model for MEG beamformer imaging. NeuroImage, 23:936–946, 2004.
A. Brovelli, M. Ding, A. Ledberg, Y. Chen, R. Nakamura, and S.L. Bressler. Beta oscillations in large-scale sensorimotor cortical network: directional influences revealed by Granger causality. Proceedings of the National Academy of Sciences, 101:9849–9854, 2004.
A. Bruguier, K. Preuschoff, S. Quartz, and P. Bossaerts. Investigating signal interaction with canonical correlation analysis of fMRI brain activation data. NeuroImage, 41:35–44, 2008.
A. Bruns and R. Eckhorn. Task-related coupling from high- to low-frequency signals among visual cortical areas in human subdural recordings. International Journal of Psychophysiology, 51:97–116, 2004.
N.A. Busch and R. VanRullen. Spontaneous EEG oscillations reveal periodic sampling of visual attention. Proceedings of the National Academy of Sciences, 107(37):16048, 2010. ISSN 0027-8424.
C.F. Cadieu and K. Koepsell. A multivariate phase distribution and its estimation. arXiv preprint arXiv:0809.4291, 2008.
R.T. Canolty, E. Edwards, S.S. Dalal, M. Soltani, S.S. Nagarajan, H.E. Kirsch, M.S. Berger, N.M. Barbaro, and R.T. Knight. High gamma power is phase-locked to theta oscillations in human neocortex. Science, 313:1626–1628, 2006.
F. Carbonell, L. Galán, P. Valdés, K. Worsley, R.J. Biscay, L. Díaz-Comas, M.A. Bobes, and M. Parra. Random field - union intersection tests for EEG/MEG imaging. NeuroImage, 22(1):268–276, 2004.
F. Carbonell, L. Galán, and K.J. Worsley. The geometry of the Wilks's random field. Annals of the Institute of Statistical Mathematics, 2008. Accepted.
F. Carbonell, K.J. Worsley, N.J. Trujillo-Barreto, and R.C. Sotero. Random fields - union intersection tests for detecting functional connectivity in MEG/EEG imaging. Human Brain Mapping, 30:2477–2486, 2009.
T.A. Carlson, P. Schrater, and S. He. Patterns of activity in the categorical representations of objects. Journal of Cognitive Neuroscience, 15(5):704–717, 2003.
D.P. Chakraborty and L.H. Winter. Free-response methodology: alternate analysis and a new observer-performance experiment. Radiology, 174(3):873, 1990.
Y. Chen, M. Ding, and J.A. Scott Kelso. Task-related power and coherence changes in neuromagnetic activity during visuomotor coordination. Experimental Brain Research, 148:105–116, 2003.
J. Classen, C. Gerloff, M. Honda, and M. Hallett. Integrative visuomotor behavior is associated with interregionally coherent oscillations in the human brain. Journal of Neurophysiology, 79(3):1567, 1998. ISSN 0022-3077.
D. Cohen. Magnetoencephalography: Evidence of magnetic fields produced by alpha-rhythm currents. Science, 161:664–666, 1972.
D.A. Cole, S.E. Maxwell, R. Arvey, and E. Salas. How the power of MANOVA can both increase and decrease as a function of the intercorrelations among the dependent variables. Psychological Bulletin, 115(3):465–474, 1994.
D.D. Cox and R.L. Savoy. Functional magnetic resonance imaging (fMRI) "brain reading": detecting and classifying distributed patterns of fMRI activity in human visual cortex. NeuroImage, 19(2):261–270, 2003.
N.E. Crone, D.L. Miglioretti, B. Gordon, J.M. Sieracki, M.T. Wilson, S. Uematsu, and R.P. Lesser. Functional mapping of human sensorimotor cortex with electrocorticographic spectral analysis. I. Alpha and beta event-related desynchronization. Brain, 121(12):2271–2299, 1998.
A.M. Dale, A.K. Liu, B.R. Fischl, R.L. Buckner, J.W. Belliveau, J.D. Lewine, and E. Halgren. Dynamic statistical parametric mapping: combining fMRI and MEG for high-resolution imaging of cortical activity. Neuron, 26(1):55–67, 2000.
F. Darvas and R.M. Leahy. Functional imaging of brain activity and connectivity with MEG. In V.K. Jirsa and A.R. McIntosh, editors, Handbook of Brain Connectivity, pages 201–219. Springer, Berlin, 2007.
F. Darvas, D. Pantazis, E. Kucukaltun-Yildirim, and R.M. Leahy. Mapping human brain function with MEG and EEG: methods and validation. NeuroImage, 23:S289–S299, 2004.
F. Darvas, M. Rautiainen, D. Pantazis, S. Baillet, H. Benali, J.C. Mosher, L. Garnero, and R.M. Leahy. Investigations of dipole localization accuracy in MEG using the bootstrap. NeuroImage, 25(2):355–368, 2005.
O. David and K.J. Friston. A neural mass model for MEG/EEG: coupling and neuronal dynamics. NeuroImage, 20(3):1743–1755, 2003.
O. David, D. Cosmelli, and K.J. Friston. Evaluation of different measures of functional connectivity using a neural mass model. NeuroImage, 21(2):659–673, 2004.
F.P. de Lange, O. Jensen, M. Bauer, and I. Toni. Interactions between posterior gamma and frontal alpha/beta oscillations during imagined actions. Frontiers in Human Neuroscience, 2:7, 2008.
F. de Pasquale, S. Della Penna, A.Z. Snyder, C. Lewis, D. Mantini, L. Marzetti, P. Belardinelli, L. Ciancetta, V. Pizzella, G.L. Romani, and M. Corbetta. Temporal dynamics of spontaneous MEG activity in brain networks. Proceedings of the National Academy of Sciences, 107(13):6040, 2010.
J.P. Donoghue. Connecting cortex to machines: recent advances in brain interfaces. Nature Neuroscience, 5(supp):1085–1088, 2002.
P.J. Durka, J. Zygierewicz, H. Klekowicz, J. Ginter, and K.J. Blinowska. On the statistical significance of event-related EEG desynchronization and synchronization in the time-frequency plane. IEEE Transactions on Biomedical Engineering, 51(7):1167–1175, 2004.
E. Düzel, R. Habib, B. Schott, A. Schoenfeld, N. Lobaugh, A.R. McIntosh, M. Scholz, and H.J. Heinze. A multivariate, spatiotemporal analysis of electromagnetic time-frequency data of recognition memory. NeuroImage, 18:185–197, 2003.
A. Field. Discovering Statistics Using SPSS. Sage, Thousand Oaks, CA, 2005.
A.A. Fingelkurts, A.A. Fingelkurts, and S. Kähkönen. Functional connectivity in the brain - is it an elusive concept? Neuroscience & Biobehavioral Reviews, 28(8):827–836, 2005.
M.D. Fox, A.Z. Snyder, J.L. Vincent, M. Corbetta, D.C. Van Essen, and M.E. Raichle. The human brain is intrinsically organized into dynamic, anticorrelated functional networks. Proceedings of the National Academy of Sciences, 102(27):9673, 2005.
P. Fries. A mechanism for cognitive dynamics: neuronal communication through neuronal coherence. Trends in Cognitive Sciences, 9(10):474–480, 2005.
K.J. Friston. Statistical parametric mapping and other analysis of functional imaging data. In A.W. Toga and J.C. Mazziotta, editors, Brain Mapping: The Methods, pages 363–385. Academic Press, San Diego, 1996.
K.J. Friston, C.D. Frith, R.S.J. Frackowiak, and R. Turner. Characterizing dynamic brain responses with fMRI: a multivariate approach. NeuroImage, 2:166–172, 1995a.
K.J. Friston, A.P. Holmes, K.J. Worsley, J.B. Poline, C. Frith, and R.S.J. Frackowiak. Statistical parametric maps in functional imaging: a general linear approach. Human Brain Mapping, 2:189–210, 1995b.
K.J. Friston, K.M. Stephan, J.D. Heather, C.D. Frith, A.A. Ioannides, L.C. Liu, M.D. Rugg, J. Vieth, H. Keber, K. Hunter, and R.S.J. Frackowiak. A multivariate analysis of evoked responses in EEG and MEG data. NeuroImage, 3(3):167–174, 1996.
C. Genovese, N. Lazar, and T. Nichols. Thresholding of statistical maps in functional neuroimaging using the false discovery rate. NeuroImage, 15:870–878, 2002.
J. Geweke. Measurement of linear dependence and feedback between multiple time series. Journal of the American Statistical Association, 77(378):304–313, 1982.
P. Gloor. Neuronal generators and the problem of localization in electroencephalography: application of volume conductor theory to electroencephalography. Journal of Clinical Neurophysiology, 2(4):327, 1985.
R. Goebel. Brainvoyager: A program for analyzing and visualizing functional and structural magnetic resonance data sets. NeuroImage, 3(3):S604, 1996.
G.H. Golub, M. Heath, and G. Wahba. Generalized cross-validation as a method for choosing a good ridge parameter. Technometrics, 21(2):215–223, 1979. ISSN 0040-1706.
G. Gómez-Herrero, M. Atienza, K. Egiazarian, and J.L. Cantero. Measuring directional coupling between EEG sources. NeuroImage, 43:497–508, 2008.
C.W.J. Granger. Investigating causal relations by econometric models and cross-spectral methods. Econometrica: Journal of the Econometric Society, 37(3):424–438, 1969.
J. Gross, J. Kujala, M. Hämäläinen, L. Timmermann, A. Schnitzler, and R. Salmelin. Dynamic imaging of coherent sources: studying neural interactions in the human brain. Proceedings of the National Academy of Sciences, 98(2):694–699, 2001.
M. Hämäläinen, R. Hari, R.J. Ilmoniemi, J. Knuutila, and O.V. Lounasmaa. Magnetoencephalography - theory, instrumentation, and applications to noninvasive studies of the working human brain. Reviews of Modern Physics, 65(2):413–497, 1993.
M.S. Hämäläinen and R.J. Ilmoniemi. Interpreting measured magnetic fields of the brain: estimates of current distributions. Technical Report TKK-F-A559, Helsinki University of Technology, Espoo, 1984.
E.J. Hannan. Multiple Time Series. Wiley, New York, 1971.
P.C. Hansen. Analysis of discrete ill-posed problems by means of the L-curve. SIAM Review, 34(4):561–580, 1992. ISSN 0036-1445.
P.C. Hansen and D.P. O'Leary. The use of the L-curve in the regularization of discrete ill-posed problems. SIAM Journal on Scientific Computing, 14(6):1503, 1993. ISSN 1064-8275.
J.D. Haynes and G. Rees. Decoding mental states from brain activity in humans. Nature Reviews Neuroscience, 7(7):523–534, 2006.
R. Heller, Y. Golland, R. Malach, and Y. Benjamini. Conjunction group analysis: an alternative to mixed/random effect analysis. NeuroImage, 37:1178–1185, 2007.
Y. Hochberg. A sharper Bonferroni procedure for multiple tests of significance. Biometrika, 75(4):800–802, 1988.
Y. Hochberg and A.C. Tamhane. Multiple Comparison Procedures. Wiley, New York, 1987.
S. Holm. A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70, 1979.
A.P. Holmes and K.J. Friston. Generalisability, random effects and population inference. NeuroImage, 7:S754, 1998.
H. Hotelling. Relations between two sets of variates. Biometrika, 28(3/4):321–377, 1936.
M. Huang, J.C. Mosher, and R.M. Leahy. A sensor-weighted overlapping-sphere head model and exhaustive head model comparison for MEG. Physics in Medicine and Biology, 44:423–440, 1999.
H.B. Hui and R.M. Leahy. Linearly constrained MEG beamformers for MVAR modeling of cortical interactions. In 3rd IEEE International Symposium on Biomedical Imaging: Macro to Nano, pages 237–240, 2006.
H.B. Hui, D. Pantazis, S.L. Bressler, and R.M. Leahy. Identifying true cortical interactions in MEG using the nulling beamformer. NeuroImage, 2009.
F.C. Hummel and C. Gerloff. Interregional long-range and short-range synchrony: a basis for complex sensorimotor processing. Progress in Brain Research, 159:223–236, 2006.
O. Jensen and L.L. Colgin. Cross-frequency coupling between neuronal oscillations. Trends in Cognitive Science, 11(7):267–269, 2007.
K. Jerbi and O. Bertrand. Cross-frequency coupling in parieto-frontal oscillatory networks during motor imagery revealed by magnetoencephalography. Frontiers in Neuroscience, 3(1):3–4, 2009.
K. Jerbi, J.-P. Lachaux, S. Baillet, and L. Garnero. Imaging cortical oscillations during sustained visuomotor coordination in MEG. In IEEE International Symposium on Biomedical Imaging: From Nano to Macro, pages 380–383, 2004.
K. Jerbi, J.-P. Lachaux, K. N'Diaye, D. Pantazis, R.M. Leahy, L. Garnero, and S. Baillet. Coherent neural representation of hand speed in humans revealed by MEG imaging. Proceedings of the National Academy of Sciences, 104(18):7676–7681, 2007.
M. Kaminski, M. Ding, W.A. Truccolo, and S.L. Bressler. Evaluating causal relations in neural systems: Granger causality, directed transfer function and statistical assessment of significance. Biological Cybernetics, 85:145–157, 2001.
M.J. Kaminski and K.J. Blinowska. A new method of the description of the information flow in the brain structures. Biological Cybernetics, 65(3):203–210, 1991.
S. Kiebel. The general linear model. In R. Frackowiak, K. Friston, C. Frith, R. Dolan, C. Price, S. Zeki, J. Ashburner, and W. Penny, editors, Human Brain Function, 2nd edition. Academic Press, New York, 2003.
S.J. Kiebel, C. Tallon-Baudry, and K.J. Friston. Parametric analysis of oscillatory activity as measured with EEG/MEG. Human Brain Mapping, 26(3):170–177, 2005.
J.M. Kilner, S.J. Kiebel, and K.J. Friston. Applications of random field theory to electrophysiology. Neuroscience Letters, 374(3):174–178, 2005.
N. Kriegeskorte, R. Goebel, and P. Bandettini. Information-based functional brain mapping. Proceedings of the National Academy of Sciences, 103(10):3863, 2006.
E. Küçükaltun-Yıldırım, D. Pantazis, and R.M. Leahy. Task-based comparison of inverse methods in magnetoencephalography. IEEE Transactions on Biomedical Engineering, 53(9):1783–1793, 2006.
R. Kus, M. Kaminski, and K.J. Blinowska. Determination of EEG activity propagation: pair-wise versus multi-channel estimate. IEEE Transactions on Biomedical Engineering, 51(9):1501–1510, 2004.
J.P. Lachaux, E. Rodriguez, J. Martinerie, and F.J. Varela. Measuring phase synchrony in brain signals. Human Brain Mapping, 8:194–208, 1999.
P. Lakatos, G. Karmos, A.D. Mehta, I. Ulbert, and C.E. Schroeder. Entrainment of neuronal oscillations as a mechanism of attentional selection. Science, 320:110–113, 2008.
N.A. Lazar, B. Luna, J.A. Sweeney, and W.F. Eddy. Combining brains: a survey of methods for statistical pooling of information. NeuroImage, 16:538–550, 2002.
R.M. Leahy, J.C. Mosher, M.E. Spencer, M.X. Huang, and J.D. Lewine. A study of dipole localization accuracy for MEG and EEG using a human skull phantom. Electroencephalography and Clinical Neurophysiology, 107(2):159–173, 1998.
M.A. Lebedev and M.A.L. Nicolelis. Brain-machine interfaces: past, present and future. Trends in Neurosciences, 29(9):536–546, 2006.
S.Y. Lee. Generalizations of the partial, part and bipartial canonical correlation analysis. Psychometrika, 43(3):427–431, 1978.
F.-H. Lin, T. Witzel, M.S. Hämäläinen, A.M. Dale, J.W. Belliveau, and S.M. Stufflebeam. Spectral spatiotemporal imaging of cortical oscillations and interactions in the human brain. NeuroImage, 23:582–595, 2004.
L. Marzetti, C. Del Gratta, and G. Nolte. Understanding brain connectivity from EEG data by identifying systems composed of interacting sources. NeuroImage, 42:87–98, 2008.
A.R. McIntosh and N.J. Lobaugh. Partial least squares analysis of neuroimaging data: applications and advances. NeuroImage, 23:250–263, 2004.
R.L. McNamee and N.A. Lazar. Assessing the sensitivity of fMRI group maps. NeuroImage, 22:920–931, 2004.
C.E. Metz. Receiver operating characteristic analysis: a tool for the quantitative evaluation of observer performance and imaging systems. Journal of the American College of Radiology, 3(6):413–422, 2006.
S. Monto, S. Palva, J. Voipio, and J.M. Palva. Very slow EEG fluctuations predict the dynamics of stimulus detection and oscillation amplitudes in humans. The Journal of Neuroscience, 28(33):8268–8272, 2008.
F. Mormann, J. Fell, N. Axmacher, B. Weber, K. Lehnertz, C.E. Elger, and G. Fernández. Phase/amplitude reset and theta-gamma interaction in the human medial temporal lobe during a continuous word recognition memory task. Hippocampus, 15:890–900, 2005.
J.C. Mosher and R.M. Leahy. Recursive MUSIC: a framework for EEG and MEG source localization. IEEE Transactions on Biomedical Engineering, 45(11):1342–1354, 1998.
J.C. Mosher, R.M. Leahy, and P.S. Lewis. EEG and MEG: forward solutions for inverse problems. IEEE Transactions on Biomedical Engineering, 46(3):245–259, 1999.
K.E. Muller. Understanding canonical correlation through the general linear model and principal components. American Statistician, 36(4):342–354, 1982.
J.A. Mumford and T. Nichols. Modeling and inference in multisubject fMRI data. IEEE Engineering in Medicine and Biology Magazine, March/April:42–51, 2006.
T. Nichols, M. Brett, J. Andersson, T. Wager, and J.-B. Poline. Valid conjunction inference with the minimum statistic. NeuroImage, 25:653–660, 2005.
T.E. Nichols and S. Hayasaka. Controlling the familywise error rate in functional neuroimaging: a comparative review. Statistical Methods in Medical Research, 12:419–446, 2003.
T.E. Nichols and A.P. Holmes. Nonparametric permutation tests for functional neuroimaging: a primer with examples. Human Brain Mapping, 15:1–25, 2001.
V.V. Nikulin and T. Brismar. Phase synchronization between alpha and beta oscillations in the human electroencephalogram. Neuroscience, 137(2):647–657, 2006. ISSN 0306-4522.
G. Nolte, O. Bai, L. Wheaton, Z. Mari, S. Vorbach, and M. Hallett. Identifying true brain interaction from EEG data using the imaginary part of coherency. Clinical Neurophysiology, 115:2292–2307, 2004.
G. Nolte, A. Ziehe, V.V. Nikulin, A. Schlögl, N. Krämer, T. Brismar, and K.R. Müller. Robustly estimating the flow direction of information in complex physical systems. Physical Review Letters, 100(23):234101, 2008.
K.A. Norman, S.M. Polyn, G.J. Detre, and J.V. Haxby. Beyond mind-reading: multi-voxel pattern analysis of fMRI data. Trends in Cognitive Sciences, 10(9):424–430, 2006.
P.L. Nunez and R.B. Silberstein. On the relationship of synaptic activity to macroscopic measurements: Does co-registration of EEG with fMRI make sense? Brain Topography, 13(2):79–96, 2000.
P.L. Nunez and R. Srinivasan. Electric fields of the brain: the neurophysics of EEG. Oxford University Press, USA, 2006.
Electroencephalog- raphy and Clinical Neurophysiology, 103:499–515, 1997. Y . Okada. Discrimination of localized and distributed current dipole sources and localized single and multiple sources. In W. Weinberg, G. Stroink, and T. Katila, editors, Biomagnetism, an Interdisciplinary Approach, pages 266–272. Pergamon, New York, 2003. D. Osipova, D. Hermes, and O. Jensen. Gamma power is phase-locked to posterior alpha activity. PLOS ONE, 3(12):E3990, 2008. J.M. Palva, S. Palva, and K. Kaila. Phase synchrony among neuronal oscillations in the human cortex. The Journal of Neuroscience, 25(15):3962–3972, 2005. J.M. Palva, S. Monto, S. Kulashekhar, and S. Palva. Neuronal synchrony reveals working memory networks and predicts individual memory capacity. Proceedings of the National Academy of Sciences, 107(16):7580, 2010. S. Palva and J.M. Palva. New vistas for [alpha]-frequency band oscillations. Trends in neurosciences, 30(4):150–158, 2007. ISSN 0166-2236. D. Pantazis. Statistical Signal Processing of Magnetoencephalography Data. PhD thesis, University of Southern California, 2006. D. Pantazis, T.E. Nichols, S. Baillet, and R.M. Leahy. Spatiotemporal localization of significant activation in MEG using permutation tests. In C. Taylor and J.A. Noble, editors, Proceedings of the 18th Conference on Information Processing in Medical Imaging, pages 512–523, 2004. D. Pantazis, T.E. Nichols, S. Baillet, and R.M. Leahy. A comparison of random field theory and permutation methods for the statistical analysis of MEG data. NeuroImage, 25:383–394, 2005a. D. Pantazis, T.E. Nichols, S. Baillet, and R.M. Leahy. A comparison of random field theory and permutation methods for the statistical analysis of meg data. NeuroImage, 25:383–394, 2005b. D. Pantazis, D.L. Weber, C.L. Dale, T.E. Nichols, G.V . Simpson, and R.M. Leahy. Imaging of oscillatory behavior in event-related meg studies. In C. Bouman and E. Miller, editors, Proceedings of SPIE, Computational Imaging III, volume 5674, pages 55–63, 2005c. 100 D. Pantazis, G.V . Simpson, D.L. Weber, C.L. Dale, T.E. Nichols, and R.M. Leahy. A novel ANCOVA design for analysis of MEG data with application to a visual attention study. NeuroImage, 44:164–174, 2009a. D. Pantazis, G.V . Simpson, D.L. Weber, C.L. Dale, T.E. Nichols, and R.M. Leahy. A novel ANCOVA design for analysis of MEG data with application to a visual attention study. NeuroImage, 44:164–174, 2009b. H. Park, J. Kwon, T. Youn, J. Pae, J. Kim, M. Kim, and K. Ha. Statistical parametric mapping of LORETA using high density EEG and individual FMRI: application to mismatch negativities in schizophrenia. Human Brain Mapping, 17:168–178, 2002. RD Pascual-Marqui. Standardized low-resolution brain electromagnetic tomography (sLORETA): technical details. Methods Find Exp Clin Pharmacol, 24(Suppl D):5–12, 2002. W.D. Penny, E. Duzel, K.J. Miller, and J.G. Ojemann. Testing for nested oscillation. Journal of Neuroscience Methods, 174:5061, 2008. F. Pereira, T. Mitchell, and M. Botvinick. Machine learning classifiers and fMRI: a tutorial overview. NeuroImage, 45(1S1):199–209, 2009. G. Pfurtscheller and F.H. Lopes da Silva. Event-related EEG/MEG synchronization and desynchronization: basic principles. Clinical Neurophysiology, 110:1842–1857, 1999. G. Pfurtscheller and C. Neuper. Motor imagery and direct brain-computer communication. Proceedings of the IEEE, 89(7):1123–1134, 2001. M. Le Van Quyen, J. Foucher, J.P. Lachaux, E. Rodriguez, A. Lutz, and J. Martinerie. 
Comparison of hilbert transform and wavelet methods for the analysis of neuronal synchrony. Journal of Neuroscience Methods, 111:83–89, 2001. S. Ram´ on y Cajal. Textura del Sistema Nervoso del Hombre y de los Vertebrados, volume 2. Nicol´ as Moya, Madrid, 1904. A.C. Rencher. Interpretation of canonical discriminant functions, canonical variates, and principal components. The American Statistician, 46(3):217–225, 1992. A.C. Rencher. Methods of Multivariate Analysis. Wiley, Hoboken, NJ, 1995. A.C. Rencher and D.T. Scott. Addressing the contribution of individual variables follow- ing rejection of a multivariate hypothesis. Communications in Statistics - Simulation and Computation, 19(2):535–553, 1990. S.N. Roy. On a heuristic method of test construction and its use in multivariate statistics. The Annals of Mathematical Statistics, 24(2):220–238, 1953. 101 R. Salmelin, M. H¨ am¨ al¨ ainen, M. Kahola, and R. Hari. Functional segregation of movement-related rhythmic activity in the human brain. NeuroImage, 2(4):237–243, 1995. S.K. Sarkar. Some results on false discovery rate in stepwise multiple testing procedures. Annals of Statistics, 30(1):239–257, 2002. ISSN 0090-5364. J. Sarvas. Basic mathematical and electromagnetic concepts of the biomagnetic inverse problem. Physics in Medicine and Biology, 32:11, 1987. B. Schack and S. Weiss. Quantification of phase synchronization phenomena and their importance for verbal memory processes. Biological Cybernetics, 92:275287, 2005. B. Schack, N. Vath, H. Petsche, H.-G. Geissler, and E. M¨ oller. Phase-coupling of theta- gamma EEG rhythms during short-term memory processing. International Journal of Psychophysiology, 44:143–163, 2002. B. Schack, W. Klimesch, and P. Sauseng. Phase synchronization between theta and upper alpha oscillations in a working memory task. International Journal of Psychophysiol- ogy, 57(2):105–114, 2005. ISSN 0167-8760. R. Schmidt. Multiple emitter location and signal parameter estimation. IEEE Transactions on Antennas and Propagation, 34(3):276–280, 1986. J.-M. Schoffelen and J. Gross. Source connectivity analysis with MEG and EEG. Human Brain Mapping, 30:1857–1865, 2009. G.A.F. Seber. Multivariate Observations. Wiley, New York, 1984. G.A.F. Seber. Linear Regression Analysis. Wiley, New York, 2003. K. Sekihara, M. Sahani, and S.S. Nagarajan. A simple nonparametric statistical thresh- olding for MEG spatial-filter source reconstruction images. NeuroImage, 27:368–376, 2005. D. Shattuck and R. Leahy. BrainSuite: an automated cortical surface identification tool. In Medical Image Computing and Computer-Assisted Intervention–MICCAI 2000. Springer, 2000. K. Singh, G.R. Barnes, and A. Hillebrand. Group imaging of task-related changes in cortical synchronization using nonparametric permutation testing. NeuroImage, 19: 1589–1601, 2003. L.R. Squire. Fundamental Neuroscience. Academic Press, 2008. C. Tallon-Baudry and O. Bertrand. Oscillatory gamma activity in humans and its role in object representation. Trends in Cognitive Sciences, 3(4):151–162, 1999. 102 C. Tallon-Baudry, O. Bertrand, C. Delpuech, and J. Pernier. Stimulus specificity of phase-locked and non-phase-locked 40 Hz visual responses in human. Journal of Neuroscience, 16(13):4240, 1996. P. Tass, M.G. Rosenblum, J. Weule, J. Kurths, A. Pikovsky, J. V olkmann, A. Schnitzler, and H.J. Freund. Detection ofn : m phase locking from noisy data: application to magnetoencephalography. Physical Review Letters, 81(15):3291–3294, 1998. J.E. Taylor and K.J. Worsley. 
Random fields of multivariate test statistics, with applica- tions to shape analysis. Annals of Statistics, 36(1):1–27, 2008. A. Teolis. Computational Signal Processing with Wavelets. Birkh¨ auser, Boston, 1998. D.R. Thomas. Interpreting discriminant functions: a data analytic approach. Multivariate Behavioral Research, 27(3):335–362, 1992. D.R. Thomas and B.D. Zumbo. Using a measure of variable importance to investigate the standardization of discriminant coefficients. Journal of Educational and Behavioral Statistics, 21(2):110–130, 1996. A.N. Tikhonov and V .Y . Arsenin. Solutions of Ill-Posed Problems. Winston, Washington, DC, 1977. N.H. Timm and J.E. Carlson. Part and bipartial canonical correlation analysis. Psychome- trika, 41(2):159–176, 1976. B.D. Van Veen and K.M. Buckley. Beamforming: A versatile approach to spatial filtering. IEEE ASSP Magazine, 5(2):4–24, 1988. BD Van Veen, W. Van Drongelen, M. Yuchtman, and A. Suzuki. Localization of brain electrical activity via linearly constrained minimum variance spatial filtering. IEEE Transactions on Biomedical Engineering, 44(9):867–880, 1997. F. Varela, J.P. Lachaux, E. Rodriguez, and J. Martinerie. The brainweb: phase synchro- nization and large-scale integration. Nature Reviews Neuroscience, 2(4):229–239, 2001. G. Wahba. Spline models for observational data. In CBMS-NSF Regional Conference Series in Applied Mathematics, volume 59, Philadelphia, 1990. Society for Industrial and Applied Mathematics. S. Waldert, H. Preissl, E. Demandt, C. Braun, N. Birbaumer, A. Aertsen, and C. Mehring. Hand movement direction decoded from MEG and EEG. Journal of Neuroscience, 28(4):1000–1008, 2008. 103 H. Witte, P. Putsche, C. Hemmelmann, C. Schelenz, and L. Leistritz. Analysis and modeling of time-variant amplitude–frequency couplings of and between oscillations of EEG bursts. Biological Cybernetics, 99(2):139–157, 2008. J.R. Wolpaw and D.J. McFarland. Control of a two-dimensional movement signal by a noninvasive brain-computer interface in humans. Proceedings of the National Academy of Sciences, 101(51):17849, 2004. K.J. Worsley, A.C. Evans, S. Marrett, and P. Neelin. A three-dimensional statistical analysis for CBF activation studies in human brain. Journal of Cerebral Blood Flow and Metabolism, 12:900–918, 1992. K.J. Worsley, S. Marrett, P. Neelin, AC Vandal, K.J. Friston, AC Evans, et al. A unified statistical approach for determining significant signals in images of cerebral activation. Human Brain Mapping, 4(1):58–73, 1996. K.J. Worsley, J.E. Taylor, F. Tomaiuolo, and J. Lerch. Unified univariate and multivariate random field theory. NeuroImage, 23:S189–195, 2004. JE Zimmerman, P. Thiene, and JT Harding. Design and Operation of Stable rf-Biased Superconducting Point-Contact Quantum Devices, and a Note on the Properties of Perfectly Clean Metal Contacts. Journal of Applied Physics, 41:1572, 1970. 104
Abstract
I describe methods, based on multivariate statistical analysis, for detecting brain activation and functional connectivity in cortically constrained maps of current density computed from magnetoencephalography (MEG) data. I apply time-frequency (wavelet) analysis to individual epochs to produce dynamic images of brain signal power on the cerebral cortex in multiple time-frequency bands, and I form observation matrices by assembling the power from all frequency bands and all trials. To detect changes in brain activity, I fit separate multivariate linear models to these observations for each time band and cortical location, with experimental conditions as predictor variables.
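The per-location modeling step summarized above can be illustrated with a minimal sketch; it is not the thesis code, and the variable names (Y, condition, n_trials, n_bands) and simulated data are assumptions made only for illustration. The sketch fits a full and a reduced multivariate linear model by least squares at one hypothetical cortical location and time band, treating the frequency bands as the multivariate response, and summarizes the condition effect with Wilks' lambda.

```python
# Minimal sketch (assumed, illustrative data) of a per-location multivariate linear model.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical observation matrix: wavelet power at one cortical location and one
# time band, with one row per trial and one column per frequency band.
n_trials, n_bands = 40, 5
condition = np.repeat([0, 1], n_trials // 2)      # two experimental conditions
Y = rng.standard_normal((n_trials, n_bands))
Y[condition == 1] += 0.8                          # simulated condition effect

# Design matrices: full model (intercept + condition) vs. reduced model (intercept only).
X_full = np.column_stack([np.ones(n_trials), condition])
X_red = np.ones((n_trials, 1))

def residual_sscp(X, Y):
    """Residual sums-of-squares-and-cross-products matrix after a least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    R = Y - X @ beta
    return R.T @ R

E = residual_sscp(X_full, Y)        # error SSCP under the full model
H = residual_sscp(X_red, Y) - E     # hypothesis SSCP for the condition effect

# Wilks' lambda: small values indicate that the experimental condition explains
# variance jointly across all frequency bands (the multivariate test).
wilks_lambda = np.linalg.det(E) / np.linalg.det(E + H)
print(f"Wilks' lambda = {wilks_lambda:.3f}")
```

Testing all bands jointly in this way is the multivariate counterpart of running a separate univariate test per frequency band, and it is sensitive to effects that are spread weakly across several bands.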
Asset Metadata
Creator: Poletti Soto, Juan Luis (author)
Core Title: Multivariate statistical analysis of magnetoencephalography data
School: Viterbi School of Engineering
Degree: Doctor of Philosophy
Degree Program: Electrical Engineering
Publication Date: 02/08/2011
Defense Date: 12/09/2010
Publisher: University of Southern California (original), University of Southern California. Libraries (digital)
Tags: functional connectivity, magnetoencephalography (MEG), multivariate statistics, OAI-PMH Harvest, oscillatory brain activity
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Leahy, Richard M. (committee chair); Ortega, Antonio (committee member); Pantazis, Dimitrios (committee member); Tjan, Bosco S. (committee member)
Creator Email: juanlps@yahoo.com, polettis@usc.edu
Permanent Link (DOI): https://doi.org/10.25549/usctheses-m3646
Unique identifier: UC1222934
Identifier: etd-Soto-4248 (filename), usctheses-m40 (legacy collection record id), usctheses-c127-427403 (legacy record id), usctheses-m3646 (legacy record id)
Legacy Identifier: etd-Soto-4248.pdf
Dmrecord: 427403
Document Type: Dissertation
Rights: Poletti Soto, Juan Luis
Type: texts
Source: University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Repository Name: Libraries, University of Southern California
Repository Location: Los Angeles, California
Repository Email: cisadmin@lib.usc.edu