CAUSALITY AND CONSISTENCY IN ELECTROPHYSIOLOGICAL SIGNALS

by

Syed Ashrafulla

A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the Requirements for the Degree
DOCTOR OF PHILOSOPHY (ELECTRICAL ENGINEERING)

May 2014

Doctoral Committee:
Professor Richard M. Leahy, Chair
Professor Keith M. Chugg
Professor Antonio Ortega
Professor Jerry M. Mendel
Professor Hyungsik R. Moon

Copyright 2014 Syed Ashrafulla

Abstract

Model-based approaches to electrophysiological signal processing provide low-variance estimates of the activity and relationships within neurological systems. In this dissertation, we develop a method for testing consistency of modeling of neurological signals across different data acquisition systems, introduce a new measure for modeling causality in mean between sets of signals, and construct a fast method for modeling causality in variance.

For testing consistency of signals across different acquisition systems, we use statistical resampling and estimation techniques to assess the similarity of activity estimated from recordings on two different systems. Using this methodology, we can determine what preprocessing and postprocessing steps are advisable for non-invasive studies of brain electrical activity, for example. We can also determine which models are most robust across data acquisition systems.

For causality in mean, we develop a new model of causality between sets of signals inspired by properties in electrophysiological signals. Since nearby sensors are often sensitive to the same information, grouping sensors into regions of interest can increase signal-to-noise ratio. We exploit this property to form a new measure of causality called canonical Granger causality (CGC), which trades off spatial resolution to increase the ability to find short, dynamic connections that are causal in mean.

For causality in variance, we introduce a new method to more quickly estimate the effect of one signal on another signal's variation. This causality is estimated by fitting a conditional heteroscedasticity model to signals using a maximum-likelihood approach. We introduce a new method for this maximum-likelihood problem which uses variable splitting and the alternating direction method of multipliers, avoiding the computational cost of traditional gradient descent methods. The significant gains in speed allow us to analyze dynamic changes in causality in variance with a now-reasonable runtime.

Along with each project we provide applications of these methods to MEG recordings, local field potentials on the macaque brain and intracranial voltage recordings on the resting-state human brain.

Dedication

to my mother, my mother, my mother and my father

Acknowledgements

This dissertation is the most complete representation of a massive amount of work which could not possibly be completed without the help of some of the best people I've ever met. I would be remiss if I did not acknowledge them and their successful efforts to turn me into a productive engineer.

My advisor, Professor Richard M. Leahy, provided me many years of support both financially and academically. His dogged insistence on improving me and keeping me accountable was in large part how I became able to compete and thrive in the world of academia. I also thank my guidance committee for both my qualification and my defense: Professor Antonio Ortega, Professor Keith M. Chugg, Professor Jerry M. Mendel and Professor Hyungsik Roger Moon. Their diverse expertise guided me in converting my proposal to a manageable and more interesting problem to solve than what I had originally proposed.
I have been lucky to be put under the wing of two talented academics who acted as semi-advisors during my time in graduate school. Dimitrios Pantazis was the person I turned to for my most difficult problems during the first portion of my work, which includes chapters 2 and 3 of this dissertation. He is not only a semi-advisor but a dear friend whose honest conversations helped me many times. Professor Justin P. Haldar guided me when I made the turn towards analyzing causality, and it is to him that I owe a deep debt for both my progress and my ability to graduate. It is no coincidence that the papers relating to chapters 4 and 5 bear his name as a second author. I also want to give thanks to Anand A. Joshi, who provided an academic's haven when discussing research problems.

I want to thank all my current and former labmates - Hua Brian Hui, Professor Juan Luis Poletti Soto, Yu Teng Chang, Chitresh Bhushan, Kevin Wang, Sangeetha Somayajula, Sanghee Cho, Joyita Dutta, Yanguang Lin, Ran Ren and Wentao Zhu - for creating a personal work environment where we could discuss anything from optimization to food to politics. Nick Richard and Matt Brennan were not just friends but also soundboards early in my work whenever I needed assistance, personal or otherwise.

I also want to single out three people who were my best friends in Los Angeles. Martin Gawecki was my roommate for many years and provided the problem-solving perspective and friendliness that I needed when stuck. David Wheland came with me in the same year and was a constant companion during our seemingly endless journey of graduate school. Sergül Aydöre is my best friend in Los Angeles and the only person who would go out of her way to challenge and at the same time support me. I hope that in my future work I can continue to find ways to work and associate with extraordinary engineers like them.

I must also acknowledge the National Institutes of Health, the National Institute of Biomedical Imaging and Bioengineering and the Ming Hsieh Department of Electrical Engineering within the Andrew Viterbi School of Engineering at the University of Southern California (USC) for providing financial and administrative support. I expect that my generation will be far from the last to thank them for the work they do while minimally interfering with the scientists they support.

Part of my success as an engineer was due to the many collaborators who helped me succeed. The Image Processing and Informatics Lab at USC, led by Professor Brent J. Liu and including members Jorge Documet, Kevin Ma, Ruchi Deshpande, Ximing Wang, Jasper Lee, James Fernandez, and honorary member H. K. Bernie Huang, provided financial as well as academic support during the years where I finished chapter 2. Sylvain Baillet, John C. Mosher and François Tadel helped me turn much of my theory into practically useful software used worldwide. I also want to specifically mention the help of Sharon Niv, who provided my best look into how my methods can help people outside the clinic as well as expose my work to the field of psychology. She also provided me a context to my work that was not just based in academia, opening my eyes to the use of what I have learned and can do right now to help people.

Finally, I thank my parents and family for their support. I do not think I am exaggerating in saying that my mother Hosne Ara and father Syed Asgharullah sacrificed their lives for me, my two lovely sisters Syeda Sharmin Ara and Syeda Yasmin Ara and my brother Nusrat Hossain. I hope that with this work, and my continuing work, I can make them proud for all the sacrifices they have made to support me.
Table of Contents

Abstract
Dedication
Acknowledgements
List of Figures

Chapter 1 Introduction

Chapter 2 Background
  2.1 Signal Processing
    2.1.1 Time Series
    2.1.2 Matrices
    2.1.3 Singular Value Decomposition
  2.2 Brain Imaging
    2.2.1 Electrocorticography and Local Field Potentials
    2.2.2 Magnetoencephalography
    2.2.3 Brain Response Types
  2.3 Statistics
    2.3.1 Hypothesis Testing
    2.3.2 Bootstrap
    2.3.3 Permutation
  2.4 Connectivity
    2.4.1 Correlation
    2.4.2 Causality

Chapter 3 Exchangeability of Multisite Data
  3.1 Introduction
  3.2 Materials and Methods
    3.2.1 Data collection
    3.2.2 Alignment of sensors
    3.2.3 Resampling the evoked response
    3.2.4 Source estimation
    3.2.5 Significant activity
    3.2.6 Testing for consistency
  3.3 Results
    3.3.1 Effects of alignment on consistency
    3.3.2 Weighted MNE
    3.3.3 Beamforming
    3.3.4 Group consistency testing
  3.4 Discussion
  3.5 Summary

Chapter 4 Canonical Granger Causality
  4.1 Introduction
  4.2 Review of Causality Measures
    4.2.1 Granger Causality (GC)
    4.2.2 Multivariate Granger Causality (MGC)
    4.2.3 Granger canonical correlation analysis (GCCA)
  4.3 Canonical Granger Causality (CGC)
  4.4 Simulations
    4.4.1 Simulated Models
    4.4.2 Detecting Causality
    4.4.3 Estimating Causality
  4.5 Real Experimental Data
    4.5.1 Experimental Setup
    4.5.2 Preprocessing
    4.5.3 Striate-Prestriate Causality
    4.5.4 Signal Subspaces
  4.6 Discussion
  4.7 Summary

Chapter 5 ADMM & Heteroscedasticity
  5.1 Introduction
  5.2 Methods
    5.2.1 Autoregressive Conditional Heteroscedasticity
    5.2.2 Maximum Likelihood Estimation
    5.2.3 Variable Splitting & Dual Ascent
    5.2.4 Alternating Direction Method of Multipliers (ADMM)
    5.2.5 Measuring Causality in Mean & Variance
  5.3 Results
    5.3.1 Simulations
    5.3.2 Application: Resting-State EEG
  5.4 Discussion
  5.5 Summary

Chapter 6 Conclusion

References

Appendix A Source Estimation
  A.1 Weighted MNE
  A.2 LCMV beamforming

Appendix B Numerical Implementation of CGC

Appendix C Tabulated Results in CGC Simulations

Appendix D ADMM Steps for vARCH Modeling
  D.1 Updating the parameters in the vARCH model
  D.2 Updating the conditional covariance matrices
  D.3 Updating the Lagrange multiplier matrices
  D.4 Updating the primal & dual gaps for convergence

List of Figures

2.1 Interaction from the primary current in the axon to the primary current in the dendrite. While the axonal currents provide the information sent to a neuron, the dendritic primary currents have a longer time scale and more strength and are thus easier to detect. In addition, these primary currents induce volume currents; together, the dendritic primary currents and volume currents can be detected through local potential changes or local magnetic field changes. The figure is taken from [Par09].

2.2 Implantation of electrodes on a patient. The contacts record electrical potentials on the surface of the brain. We will use ECoG on a monkey brain to identify dynamic connectivity in the monkey in section 4.5. We will also use ECoG on a human brain to identify dynamic connectivity in the resting state in section 5.3.2.

2.3 Signals on the brain are recorded in the MEG gantry by the rectangular sensors. Due to the low signal strength, the sensors must be specialized and lie in liquid helium. The resulting signals can be shown in a 2D topography mapping to the human head. In MEG, a strong focal current source causes a dipole pattern due to the Biot-Savart law.

2.4 Bootstrapping to construct new samples of the evoked response (X or Y). Resampling with replacement leads to a lower-power signal as there are fewer unique trials. However, over the set of all bootstraps, the distribution of the bootstrapped values of X is a strong estimator of the distribution of X (in mean and variance).
3.1 We compare the maps of significant activity to determine if MEG data gathered from one center can be treated as an extra run of MEG data gathered from another center. We use the pipeline above in combination with the data collected in section 3.2.1 to explore this exchangeability of MEG data between machines.

3.2 The choice of alignment does not substantially change the similarity coefficient between maps of significant activity from any two sites on any run for any subject. For each choice of pair of sites and run, each bar is the Simpson coefficient (and each star is the Dice coefficient) between maps of significant activity generated from orientation-unconstrained MNE using full whitening. The three bars for each subject correspond to the initial alignment, $\ell_1$-norm ICP alignment and $\ell_2$-norm ICP alignment.

3.3 For orientation-constrained weighted MNE, the number of bootstraps that consider a vertex significant across machines is either close to the maximum (B = 2000) or the minimum (zero). The distribution of similarity coefficients, aggregated over all subjects & pairs of runs, shows that between-machine similarity is smaller than within-machine similarity, but that difference seems to be smaller than the variation in within-machine similarity. On the left, the counts of the number of significant bootstraps are plotted using one subject (subject D). The top rows are bootstraps admitting significant activity at a vertex at both machines; the bottom row is bootstraps admitting significant activity at a vertex for both runs at the specified machine. On the right, the histogram values are computed with bin width 0.01 and aggregated over all possible subjects and runs. The y-axes are normalized to represent values of the estimated probability distribution function. In these figures, we used the correlated noise model.

3.4 For orientation-unconstrained MNE, the overlap of significant activity between machines is more contiguous, and as a result the similarity between- and within-machines is higher than with orientation constraints. In addition, the relative increase means there is a larger relative difference in similarity when using multiple machines. On the left, the counts of the number of significant bootstraps are plotted using one subject (subject D). The top rows count bootstraps where a vertex is active across the two specified machines, while the bottom row counts bootstraps where a vertex is active across runs at the specified machine. On the right, the histogram values are computed with bin width 0.01 and aggregated over all subjects and runs. The y-axes represent the values of the estimated probability distribution function. In these figures, we used the correlated noise model.

3.5 For orientation-constrained LCMV beamforming, the regions of significant activity are larger than for weighted MNE. As a result, there are somewhat smaller between-machine Dice coefficients for LCMV beamforming, while there are higher between-machine Simpson coefficients, when compared to weighted MNE. On the left, the counts of the number of significant bootstraps are plotted using one subject (subject D). The top rows are bootstraps admitting significant activity at a vertex at both machines; the bottom row is bootstraps admitting significant activity at a vertex for both runs at the specified machine.
On the right, the histogram values are computed with bin width 0.01 and aggregated over all possible subjects and runs. The y-axes are normalized to represent values of the estimated probability distribution function. In these figures, we used the correlated noise model.

3.6 For orientation-constrained LCMV beamforming, the bootstraps are much more variable for each machine and run. As a result, the similarity in extent will decrease as the sizes of regions of significant activity differ. In contrast, the similarity in localization will increase as more vertices are classified as significantly active. On the left, the counts of the number of significant bootstraps are plotted using one subject (subject D). The top rows are bootstraps admitting significant activity at a vertex at both machines; the bottom row is bootstraps admitting significant activity at a vertex for both runs at the specified machine. On the right, the histogram values are computed with bin width 0.01 and aggregated over all possible subjects and runs. The y-axes are normalized to represent values of the estimated probability distribution function. In these figures, we used the correlated noise model.

4.1 Graphical depiction of GC, MGC, and CGC. We want to estimate the causality from the source region to the sink region via the time series recordings (represented by dark circles) from each region. GC analyzes the causality between pairs of recordings using univariate and bivariate autoregressive models. MGC fits much larger parametric autoregressive models to collections of time series. CGC parsimoniously represents each ROI using a single signal formed through an optimized weighted linear combination of signals from the ROI. CGC then applies standard GC analysis to these parsimonious representations.

4.2 ROCs for bivariate GC, CGC and MGC. The dark lines are the mean ROCs, with the shadows indicating the 95% confidence intervals after bootstrapping. CGC demonstrates better performance than MGC for causality with higher complexity, due to the simplified signal model and the maximization of causality.

4.3 AUC as a function of autoregressive simulation order P, number of samples N, SIR and SNR, along with 95% confidence intervals for AUC after bootstrapping. CGC is more robust to small N and large P than MGC, as seen by the growing gap in performance.

4.4 Difference between CGC's error with respect to the true GC and GCCA's error with respect to the true GC. As N increases, the errors get smaller yet CGC becomes increasingly better at estimating the true causality. Each percentage is the number of simulations where CGC was better. As P increases, the errors get larger and CGC is better for a larger percentage of simulations.

4.5 Location of the 6 LFP channels in the monkey recorded during the experiment. We analyzed striate-prestriate causal interactions related to the stimulus: a visual cue to perform a motor task.

4.6 Dynamic causality in the occipital lobe during a visuomotor decision task. Both CGC (top row) and MGC (bottom row) find causality at 57.5 ms from striate to prestriate (left column). In addition, only CGC finds a statistically significant difference in causality between the two conditions.
This prestriate to striate causal interaction may be part of the visuomotor processing in "Go" that is not required for "NoGo".

4.7 Analysis of the linear combination weights estimated by CGC during peak causality. The main plot shows the relative magnitude of the weight estimated for each channel; the larger the circle radius, the larger the magnitude of the weight for that channel. The dark line shows the mean magnitude of the weight over the full set of trials, while the shaded region shows the bootstrapped confidence interval. We observe that the $S_3$ and $P_2$ recordings contribute substantially to the estimated causality at both time instances. However, the analysis also indicates that $S_1$ has a higher weight in causality from prestriate to striate at 137.5 ms than from striate to prestriate at 57.5 ms.

5.1 Runtime & log-likelihood of the optimal parameters as a function of $\rho$ for ADMM, compared against SQP. Each plot corresponds to a different size of the vector-valued signal. For each choice, there is an optimal $\rho$ (shown with an $\times$) which maximizes speed (dashed green line); for all $\rho$ there is negligible change in accuracy (solid blue line). Accuracy is the difference in log-likelihood using SQP versus ADMM ($L_\rho - L_{SQP}$). Speed is the ratio in runtime using SQP versus ADMM ($t_{SQP}/t_\rho$).

5.2 Performance of ADMM versus SQP as a function of the number of samples N. ADMM is generally faster, with increasing speed advantages as N increases. There is negligible change in log-likelihood after $250 \times M^2$ samples.

5.3 Wald statistic as a function of the ARCH coefficient $\tilde{C}_{12}$. As $\tilde{C}_{12}$ increases, the log-Wald statistic increases; for values of 0.1 and above, the result is excellent power in the test for causality in variance. The empirical distribution of the Wald statistic for $\tilde{C}_{12} = 0$ is nearly equivalent to using a chi-square distribution with two degrees of freedom, as shown on the bottom left, justifying the use of a parametric null distribution. Using that parametric null, the ROC curves show increasing detection power as $\tilde{C}_{12}$ increases; for $\tilde{C}_{12} \geq 0.12$, the detection power is nearly perfect.

5.4 Conditionally heteroscedastic voltage signals on the brain cortex. Red circles correspond to those differential signals for which there is no significant conditional heteroscedasticity, while yellow circles correspond to those differential signals that did demonstrate significant conditional heteroscedasticity. Time courses for the conditionally heteroscedastic channels are shown on the right for a ten-second window; the burstiness in each signal is clear at 87, 89 and 94 seconds. For each voltage signal, a lag of R = 1 sufficed to capture the ARCH behavior of that signal.

5.5 Short-time causality in mean among icEEG signals during the resting state. Many connections show significant causality in mean, indicating a dense network of causal connections in the resting state.

5.6 Short-time causality in variance among icEEG signals during the resting state. Fewer connections are found, and many of these connections overlap with those for causality in mean. Agreement with causality in mean may be caused by the effect of causality in variance on estimating causality in mean.
Chapter 1

Introduction

To analyze the functional dynamics of biological systems, scientists use sensors that record signals related to those systems. By assessing the dependencies and trends in these signals, we can hope to illuminate the dynamics and structure within and between the underlying systems. Our goal in this dissertation is to construct different methods for analyzing these relationships, focusing our applications on an organ whose impact on an organism is necessary, diverse and mysterious: the brain.

The brain is a complex organ whose activity drives most of the behavior of its host organism. Much of the brain's activity is manifested as electrical currents generated by the brain's neurons. These neuronal currents can be considered the output of the brain when the brain is modeled as a system. Most investigations of brain behavior try to describe (a) which groups of neurons change their currents for a given task, which we will call activation, and (b) how such changes are modulated, which we will call connectivity.

In this dissertation, we will describe our work pertaining to both activation and identification of the brain's behavior. First, we build a framework to analyze variation in recordings of brain activity when the organism is studied at different locations with different machines while giving the same brain activation patterns. This framework determines whether it is permissible to pool data across multiple sites – possibly increasing statistical power within group studies – without adding too much variation. We then describe a measure to help identify a type of connectivity called causality in mean: canonical Granger causality (CGC) [AHJL13]. CGC estimates the causality in mean between sets of signals by taking weighted sums within each set, usually representing a region of interest. CGC, in trading off the specificity of each signal's sensor location, reliably measures interactions with few samples, a property important in dynamic analysis of the brain's behavior. Next, we describe an algorithm to solve for a different type of connectivity between brain regions: causality in variance. Through a standard model, the autoregressive conditional heteroscedasticity (ARCH) model, and our new algorithm employing the alternating direction method of multipliers (ADMM), we are able to estimate causality in variance much faster than current methods, allowing a reasonable runtime for dynamic analyses of causality in variance across brain signals.

Of the many ways to record brain activity, we will focus on those ways that have high temporal resolution, since our goal is to capture functional relationships. Non-invasively, we use magnetoencephalographic (MEG) [HHI+93] recordings that measure small variations in the magnetic field outside the head, largely due to electrical activity within the head. Invasively, we use electrocorticographic (ECoG) [PP63] and intracranial electroencephalographic (icEEG) recordings, which measure voltages on the surface of the brain or within the brain. In either of these methods, the underlying signal is the same: the electrical current of large populations of neurons [OOK+02]. While a single neuron cannot be analyzed easily, nearby neurons are often highly synchronous in activating for a given condition [HCN+09], causing a local change in voltage at the surface of the brain (measured by ECoG and icEEG) and a local change in the induced magnetic field through electrostatics (measured by MEG).
For MEG, there are many sources of variability, such as different subjects, different machines and different environments [OGH07]. Therefore, when attempting a multisite MEG study, the addition of inter-site variability may invalidate group-wide conclusions. In the first part of our work, we analyze the inter-site variability in MEG. Our goal is to determine whether a multisite study can be performed with MEG data or whether different sites differ so much in their recordings that the data cannot be grouped. We use the concept of "exchangeability": the ability to replace recordings at one site with recordings at another site while introducing negligible changes in group analysis. We also provide the preprocessing and postprocessing steps that increase the statistical power of tests using MEG data by pooling across MEG centers.

After determining the exchangeability of MEG data, we can then assess connectivity in the brain. Specifically, we will look at causality, which quantifies the influence of one part of the brain on another [Pea09]. Through functional brain data, we can look at causality via precedence, i.e. the relation between the past of one part of the brain and the present of a second [Wie56]. This model is known as Granger causality [Gra69] and provides a look into the time-delayed flow of information throughout the brain [BKK04]. For Gaussian data, we can restrict ourselves to looking at causality in mean and causality in variance, the two descriptors of a Gaussian process.

Causality in mean relates to the rise and fall of one signal as modulated by another signal. However, in brain recordings, multiple signals usually record the activity from the same part of the brain [SS01]. If we analyze the signals separately, the true causality between parts of the brain may be lost in noise [SFC+10]. Given some knowledge about how to group the signals, we can aggregate the signals in each group to more accurately estimate the signal of interest. Calling each group a "region of interest" or ROI, our goal is to derive causality between ROIs through the weighted sums of signals in each ROI. The resulting measure, CGC, reduces the complexity in estimating causality so that we can measure causality with less data than existing methods. This feature is especially important in brain analyses where connectivity is dynamic and short in duration.

Causality in variance relates to the variation in power of one signal as modulated by another signal. This modulation can be estimated through fitting of an ARCH model [Eng82] which relates the most recent past second moments of one signal to the current variance of a second. Previous maximum-likelihood approaches were relegated to standard gradient descent and quadratic programming optimization methods. We devise a new algorithm to estimate this model, using variable splitting and ADMM, which proves to be faster for all ARCH models, with the speed increase becoming ever bigger as the dimensionality of the model increases. Hence, we can estimate dynamic changes in causality among multiple signals with a chosen accuracy and a now-reasonable runtime.

We first provide the background of this work in chapter 2, including the imaging methodologies and underlying mathematical constructs. In chapter 3 we will describe the first part of our work: the exchangeability of MEG data across sites. After the assessment of exchangeability, we will then move to the definition and analysis of CGC in chapter 4. We then continue our analysis of causality with our novel algorithm for estimating ARCH models in chapter 5. In each chapter, we provide not only the algorithms but an appropriate application of each result in identifying some aspect of brain function.
We conclude by putting our work in the broader context of biological system identification, providing some possible future avenues for research.

Chapter 2

Background

In this chapter we explain some of the background assumed in this dissertation. We introduce some concepts of signal processing used as a basis for our methods. We then describe the biological mechanisms and concepts behind the signals we are trying to analyze in the brain. We also describe the machinery recording those signals for our analysis. We add some description of the statistics tools being used. We conclude this chapter with a primer on the concept of connectivity as it relates to brain signals.

2.1 Signal Processing

Signal processing [Mit10] is the analysis of ordered observations that represent some aspect of an underlying system. For example, one can consider temperature over different locations to be a signal that represents some part of the weather, or answers to surveys as a signal representing the emotional state of the people being surveyed. There are many aspects to signal processing, but each application of signal processing can be broken down into acquisition, preprocessing, and analysis. In this dissertation, we focus on preprocessing and analysis. Before that, we describe two important concepts in signal processing: the concept of time series and the use of matrices in signal processing.

2.1.1 Time Series

A time series [Bri01] is an ordered set of data that has a precedence that allows the interpretation of the $n$th value coming after the $(n-1)$th and before the $(n+1)$th element in the set. The samples are generally evenly spaced in time, so that the concept of having lag between samples is consistent across the set of data. We write each time series of length $N$ as $x[n]$ for data points, or samples, $n = 1, 2, \ldots, N$. For multiple timepoints, we will use ellipses in the brackets, so that $x[3,\ldots,8]$ represents the joint values $(x[3], x[4], x[5], x[6], x[7], x[8])$.

In this dissertation, we will model many time series $x[n]$ as random sequences with a probability distribution function $P_{[1,\ldots,N]}$ which maps the time series $x$ to a nonnegative number such that $\int P_{[1,\ldots,N]}(z[1,\ldots,N])\,dz = 1$. Note that the function is specific to the time samples. A time series is stationary if for every window of samples $x[n_1,\ldots,n_2]$ and integer $\tau$ (the "lag"), there is an equality in probability distributions:

$$P_{[n_1,\ldots,n_2]}(x[n_1,\ldots,n_2]) = P_{[n_1+\tau,\ldots,n_2+\tau]}(x[n_1+\tau,\ldots,n_2+\tau])$$

Under stationarity, the probability distribution of a set of samples in $x$ is not a function of the indices for those samples but rather how far away they are from each other in time. So, any segment of length $N$ will have the same joint probability distribution. A process is weakly stationary if only its mean and variance follow the above law.

2.1.2 Matrices

In addition to analyzing time series, we will also be analyzing matrices [Str09], which are rectangular arrays of elements such as this one of dimension $M \times N$:

$$A = \begin{bmatrix} A_{11} & A_{12} & \cdots & A_{1,N-1} & A_{1N} \\ A_{21} & A_{22} & \cdots & A_{2,N-1} & A_{2N} \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ A_{M-1,1} & A_{M-1,2} & \cdots & A_{M-1,N-1} & A_{M-1,N} \\ A_{M1} & A_{M2} & \cdots & A_{M,N-1} & A_{MN} \end{bmatrix}$$

Matrices add in the expected manner (element-by-element) but multiply in a different manner: for two matrices $A$ (of size $M \times N$) and $B$ (of size $N \times K$), $C = AB$ is a matrix of size $M \times K$ such that $C_{mk} = \sum_{n=1}^{N} A_{mn}B_{nk}$ for all $m = 1,\ldots,M$ and $k = 1,\ldots,K$. Note that matrix multiplication can only be done on two matrices where the number of rows of the second matrix equals the number of columns of the first matrix; hence, matrix multiplication is not commutative.

For two matrices $A$ and $B$ with the same size (of $M \times N$), the element-wise (Hadamard) product is $C = A \circ B$ of size $M \times N$ such that $C_{mn} = A_{mn}B_{mn}$ for all $m = 1,\ldots,M$ and $n = 1,\ldots,N$. A vector is a matrix with only one column (in this work, we will represent vectors as columns and use lowercase variables such as $a$), and a scalar is a single number equivalent to a $1 \times 1$ matrix. In addition, we will use two special matrices, known as the identity matrix and the ones matrix:

$$I_{M \times M} = \begin{bmatrix} 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & 0 \\ 0 & 0 & \cdots & 0 & 1 \end{bmatrix} \qquad 1_{M \times N} = \begin{bmatrix} 1 & 1 & \cdots & 1 & 1 \\ 1 & 1 & \cdots & 1 & 1 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 1 & 1 & \cdots & 1 & 1 \\ 1 & 1 & \cdots & 1 & 1 \end{bmatrix}$$

The identity matrix must be square, but the ones matrix may be rectangular ($M \neq N$).
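As a quick, illustrative sanity check of these conventions (not part of the original text; the matrices here are arbitrary), the following Python/NumPy snippet exercises the matrix product, its non-commutativity, the Hadamard product and the two special matrices:

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.0, 1.0], [1.0, 0.0]])

# Matrix product: C[m, k] = sum over n of A[m, n] * B[n, k]
C = A @ B
print(np.allclose(A @ B, B @ A))   # False: multiplication is not commutative

# Element-wise (Hadamard) product of two same-sized matrices
H = A * B

# The special matrices: identity (square) and ones (possibly rectangular)
I2 = np.eye(2)
ones = np.ones((2, 3))
print(np.allclose(A @ I2, A))      # True: the identity leaves A unchanged
```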
2.1.3 Singular Value Decomposition

We will use the concept of singular value decomposition [EY36] in appendices A and D. This decomposition is based on solutions to the equation

$$Av = \sigma u$$

where $A$ is a (known) matrix, $u$ and $v$ are unknown vectors to estimate and $\sigma$ is an unknown scalar estimated along with $u$ and $v$. It is well known [KL80] that there are $S = \min\{M, N\}$ solutions $(\sigma_s, u_s, v_s)$ to this equation (assuming the eigenvalues are distinct; if $k$ eigenvalues $\sigma_{i_1}, \sigma_{i_2}, \ldots, \sigma_{i_k}$ are equal, then the left singular vectors lie in a restricted subspace of dimension $k$ and the right singular vectors lie in another restricted subspace of dimension $k$), where the singular values $\sigma_s \geq 0$ and the left singular vectors $u_s$ and right singular vectors $v_s$ satisfy

$$u_s^T u_{s'} = v_s^T v_{s'} = \begin{cases} 0 & s \neq s' \\ 1 & s = s' \end{cases}$$

for $s, s' = 1, \ldots, S$, making the vectors orthonormal. Without loss of generality, we order the singular values so that $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_{S-1} \geq \sigma_S$. The decomposition is then

$$A = \sum_{s=1}^{\min\{M,N\}} \sigma_s u_s v_s^T = U \Lambda V^T$$

where $U = [u_1 \; u_2 \; \cdots \; u_S]$ is a matrix of dimension $M \times S$, $V = [v_1 \; v_2 \; \cdots \; v_S]$ is a matrix of dimension $N \times S$ and $\Lambda$ is a matrix of dimension $S \times S$ with $\Lambda_{ss} = \sigma_s$ for $s = 1, \ldots, S$, and $\Lambda_{ss'} = 0$ for $s \neq s'$. If $A$ is symmetric, then $M = N$ and $u_m = v_m$ for $m = 1, \ldots, M$.

Singular value decompositions have the property that $U$ and $V$ have orthonormal columns. Hence, one can define the pseudoinverse $A^\dagger = V \Lambda^{-1} U^T$ which has the property that $A A^\dagger A = A$ and $A^\dagger A A^\dagger = A^\dagger$. This is not a true inverse as $A$ is not square in general; however, for $A$ square ($M = N$) and invertible ($\min_s \sigma_s \neq 0$) it can be shown that $A^\dagger = A^{-1}$. In addition, we will use the fact [GR70] that for a known vector $y$ and a known matrix $A$, the solution of the optimization problem

$$\min_x \|Ax - y\|_{\ell_2}^2 = \min_x (Ax - y)^T (Ax - y)$$

is found using a pseudoinverse: the optimal value of $x$ is $\hat{x} = A^\dagger y$. In addition, if $M < N$, so that there are multiple values of $x$ such that $Ax - y = 0$, then $\hat{x}$ is the solution with minimum sum-of-squares (its $\ell_2$-norm).
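A minimal NumPy sketch of these facts (illustrative only; the matrices, sizes and seed are arbitrary stand-ins) verifies the decomposition, builds the pseudoinverse from it, and checks both the least-squares and minimum-norm properties:

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 5, 3
A = rng.standard_normal((M, N))
y = rng.standard_normal(M)

# SVD: A = U @ diag(sigma) @ Vt, with orthonormal columns in U and V.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(np.allclose(A, U @ np.diag(s) @ Vt))                       # True

# Pseudoinverse from the SVD; matches numpy's built-in pinv.
A_pinv = Vt.T @ np.diag(1.0 / s) @ U.T
print(np.allclose(A_pinv, np.linalg.pinv(A)))                    # True

# Overdetermined least squares: x_hat = A_pinv @ y solves min ||Ax - y||_2.
x_hat = A_pinv @ y
print(np.allclose(x_hat, np.linalg.lstsq(A, y, rcond=None)[0]))  # True

# Underdetermined case (M < N): among all exact solutions of Bx = z, the
# pseudoinverse picks the one with smallest l2-norm; the same idea underlies
# minimum-norm source estimation for MEG in section 2.2.2 and appendix A.
B = rng.standard_normal((3, 5))
z = rng.standard_normal(3)
x_mn = np.linalg.pinv(B) @ z
print(np.allclose(B @ x_mn, z))                                  # True
```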
2.2 Brain Imaging

To analyze functional brain data, we need to first understand the mechanisms of the signals we are trying to analyze and the data we are obtaining. The function of the brain is mainly driven by neurons, which are cells that receive information through dendrites and transmit information through axons to other cells (primarily other neurons) [KSJ00]. The neuron's signal is known as the action potential, which is a pulse that propagates along the axon and acts as input to the dendrites of the neurons to which that axon is connected. This action potential is formed when the aggregate inputs to a neuron exceed a neuron-specific threshold. We focus on those action potentials and their propagation to other neurons.

Propagation of one neuronal signal to another occurs at the synapse, which is a junction where electrical signals are neurochemically transmitted. The transmitting neuron generates an action potential at the beginning of the axon, a shielded conductor of voltage change to the synapse [SSSH97]. That potential triggers the opening of ion channels in the synapse. Ion channels are membrane proteins that regulate the flow of ions. Their opening forms a potential across the synapse. This excitatory postsynaptic potential induces a current at the dendrite [JMCC96].

The currents from an action potential are axonal currents, which aggregate along the dendrites of the receiving neuron to form the dendritic primary current as shown in figure 2.1. These dendritic primary currents have a longer time scale than the fast axonal currents, since they are the sum of all axonal currents input into the neuron. Hence, the dendritic primary currents are easier to detect. Since the brain is a conducting medium, the presence of a primary current induces a volume current outside the neuron. Most neuronal cell bodies are on or near the outermost layer in the brain, called the cortex, so the cortex is where most of the primary currents in the brain are generated. The dendritic trees of neurons are parallel in the cortex [LESF00] so the combined dendritic primary and secondary currents sum to form a detectable signal. Under the assumption that this is a quasistatic electrophysical process, the two most used signals are a local change in the potential on the surface of the cortex as well as the magnetic field outside the brain [Par09].

Figure 2.1: Interaction from the primary current in the axon to the primary current in the dendrite. While the axonal currents provide the information sent to a neuron, the dendritic primary currents have a longer time scale and more strength and are thus easier to detect. In addition, these primary currents induce volume currents; together, the dendritic primary currents and volume currents can be detected through local potential changes or local magnetic field changes. The figure is taken from [Par09].

2.2.1 Electrocorticography and Local Field Potentials

2.2.1.1 Reasoning

We can record the voltage on the cortex by placing electrodes on the brain. This is the concept behind electrocorticography (ECoG) [PP63] and is shown in figure 2.2. ECoG uses small electrodes to measure local field potentials (LFPs) on the cortex that are generated by the nearby primary and secondary currents. This modality requires a surgical procedure; in return, the recordings are not affected by nonuniform conductivities from hair or the scalp. ECoG signals have a high spatial resolution (~1 mm) and temporal resolution (~5 ms) [BAK12].

Figure 2.2: Implantation of electrodes on a patient. The contacts record electrical potentials on the surface of the brain. We will use ECoG on a monkey brain to identify dynamic connectivity in the monkey in section 4.5. We will also use ECoG on a human brain to identify dynamic connectivity in the resting state in section 5.3.2.

2.2.1.2 Estimation

Since this method is invasive, there is no need to estimate signals on the cortex. However, measuring a voltage change requires a reference for the ECoG data. If we subtract a common signal from all the recordings (a referential approach), we introduce a non-biological common signal between recordings. Thus, we will pair off the electrodes and use the intra-pair difference as the recordings of interest, corresponding to local voltage changes. In section 5.3.2 we will analyze connectivity by referencing each surface electrode by a nearby depth electrode.
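A minimal sketch of this differential (bipolar) referencing, assuming a hypothetical channels-by-samples array and an a-priori pairing of contacts (neither drawn from the actual dataset):

```python
import numpy as np

# Illustrative stand-ins: 'voltages' is a (channels x samples) ECoG array and
# 'pairs' lists electrode index pairs chosen a priori (e.g., each surface
# contact paired with a nearby depth contact).
rng = np.random.default_rng(1)
voltages = rng.standard_normal((6, 1000))
pairs = [(0, 1), (2, 3), (4, 5)]

# The intra-pair difference cancels any signal common to both contacts,
# leaving the local voltage change used as the recording of interest.
bipolar = np.stack([voltages[i] - voltages[j] for i, j in pairs])
print(bipolar.shape)   # (3, 1000)
```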
2.2.2 Magnetoencephalography

2.2.2.1 Reasoning

We know from electromagnetics that a total current induces a nearby magnetic field [Max65]. Magnetoencephalography (MEG) measures the changes in the magnetic field outside the head related to the primary and secondary currents in the brain [Coh68] as shown in figure 2.3. MEG uses a specialized sensor called a superconducting quantum interference device (SQUID) to record these magnetic fields [Koc08]. The device's complication is due to the extremely low strength of the magnetic field caused by brain currents. However, given proper shielding and preprocessing, MEG is able to use these magnetic fields to infer the current distribution on the cortex, albeit with much less sensitivity and spatial resolution than ECoG [OOK+02].

Figure 2.3: Signals on the brain are recorded in the MEG gantry by the rectangular sensors. Due to the low signal strength, the sensors must be specialized and lie in liquid helium. The resulting signals can be shown in a 2D topography mapping to the human head. In MEG, a strong focal current source causes a dipole pattern due to the Biot-Savart law.

2.2.2.2 Estimation

MEG has hundreds of sensors that capture the magnetic field outside the head. In addition, software like BrainSuite [SL02] can construct a tessellated representation of the cortex. We denote the sampled channel data by $m[n]$, one dimension per channel, and the estimated source data by $s[n]$, one dimension per vertex on the cortical tessellation. Since the relationship between currents and magnetic fields is linear [BML01], we can model the MEG data as a matrix equation:

$$m[n] = Gs[n]$$

We want to be able to describe the currents on the cortex from the magnetic fields we record, but the tessellated cortex has thousands of vertices for hundreds of sensors. As a result, the system is underdetermined. Thus, we must introduce modeling assumptions for the current sources $s[n]$. The two most common models are multiple dipole models and distributed source models. Multiple dipole models [MLL92] assume that the magnetic field is generated mainly by a small number of focal sources. Distributed source models [DS93] do not assume a small number of sources, but in return allow large areas of cortex to affect each sensor. We describe the techniques to estimate $s$ from $m$ in appendix A, using these techniques in our analysis of the exchangeability of MEG data across sites in chapter 3.

2.2.3 Brain Response Types

In many neuroimaging experiments, the goal is to determine the brain activity caused by a certain stimulus, which can be as simple as stimulation of a nerve and as complex as a decision-making task. Time is then calibrated so that $n = 0$ corresponds to the initiation of the stimulus. In addition, to increase the ability to differentiate stimulation-related signals from noise, investigators will often collect multiple instances of the brain's response to the stimulation, which are called trials. Then, the goal is to use multiple trials, all with a common description of time zero, to determine activity in the brain that relates to a stimulus.

2.2.3.1 The Evoked Response

The simplest way to find activity related to the stimulus is by averaging the trials [MI08], forming the evoked response. The signal model is that each trial $\tau \in \{1, \ldots, T\}$ has a stimulus-locked signal $g[n]$ that is independent of the trial number [Daw54]. Labelling $b_\tau[n]$ as the signal not locked to the stimulus, we have the model:

$$m_\tau[n] = g[n] + b_\tau[n]$$

As a result, we can increase the signal power by averaging over trials, which reduces the variance of the noise without affecting the signal. We will analyze the evoked response of a well-studied stimulus in chapter 3 to determine the site variability in MEG data.
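A small NumPy sketch of this averaging (illustrative only; the stimulus-locked waveform, trial count and noise level are made up) shows the variance reduction directly:

```python
import numpy as np

rng = np.random.default_rng(3)
T, N = 100, 250                      # trials x samples; sizes are illustrative
n = np.arange(N)
g = np.sin(2 * np.pi * 8 * n / N)    # stimulus-locked signal g[n]
b = rng.standard_normal((T, N))      # trial-varying background b_tau[n]
m = g + b                            # m_tau[n] = g[n] + b_tau[n]

# Evoked response: averaging the T trials leaves g[n] untouched while the
# background variance shrinks by a factor of T.
evoked = m.mean(axis=0)
print(np.var(evoked - g))            # ~ 1/T = 0.01

# Subtracting the evoked response from every trial yields the
# induced-response trials introduced in the next subsection.
induced = m - evoked
```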
2.2.3.2 The Induced Response

The evoked response does not fully characterize the response; it only analyzes responses locked to the stimulus. The induced response corresponds to that part of the brain response that is not locked to the stimulus [Pan95]. We find the induced response by subtracting the evoked response from every trial, forming induced response trials

$$\tilde{m}_\tau[n] = m_\tau[n] - \frac{1}{T}\sum_{\tau'=1}^{T} m_{\tau'}[n]$$

The induced response then contains components that are not phase-locked to the stimulus but are different from non-stimulus activity. The induced response is usually studied with higher-order statistical models; by removing the evoked response, we prevent bias in those models caused by nonzero means in the data. In addition, analyzing the induced response for connectivity admits models that require stationarity by removing the evoked response and its change of the mean over time [LDNB00]. We will analyze the induced response to find networks in the area of the macaque brain related to vision in chapter 4.

2.2.3.3 The Resting State

In the two previous responses, the requirement to differentiate was a task which separated the brain response into the response related to the task and the response related to noise. The resting state is the behavior of the brain when there is no task. In most studies, this data is treated as noise. Recently, however, the brain has shown organized and coherent behavior across regions consistently during the resting state. There are many hypotheses about this behavior; for example, the default mode network is a set of regions of the brain that coherently deactivate during any task.

Resting-state brain signals are only locally stationary (see section 2.1.1), as brain networks will be active for short durations. As a result, we analyze resting-state activity through nonstationary approaches. In this dissertation, we use sliding windows to allow analyses of stationary models on non-stationary data by assuming local stationarity. To summarize the results over an entire set of resting-state data, it is often best to use an agglomerative approach. First, we estimate the desired metrics and make the resulting conclusions within each window. Then, we determine which conclusions persist for many consecutive windows, indicating short-time consistent brain behavior. This approach will lead to the resting-state causality networks in chapter 5.

2.3 Statistics

Statistical methods are often used in studies to measure the likelihood of differences or similarities between datasets. The statistics of this dissertation revolve around the hypothesis test, which determines whether a given hypothesis can or cannot be rejected. When there is no clear representation of the probability distributions under a hypothesis, we must use resampling techniques to form enough surrogate data to adequately test the hypothesis. In this section, we will assume we are looking for the differences between two random variables $X$ and $Y$.

2.3.1 Hypothesis Testing

A hypothesis test [CB01] is a procedure to reject a hypothesis so that we can verify its opposite. The null hypothesis is what we try to reject; in our example, the null hypothesis, usually labelled $H_0$, is the hypothesis that $X = Y$. We also need a test statistic to summarize the effect of the null hypothesis in one scalar. A common test statistic is the t-statistic

$$t = \frac{\mu_X - \mu_Y}{\sqrt{\dfrac{\sigma_X^2}{N_X} + \dfrac{\sigma_Y^2}{N_Y}}}$$

where the means of the random variables are $\mu_X$ and $\mu_Y$, the variances of the random variables are $\sigma_X^2$ and $\sigma_Y^2$ and the sample sizes of the random variables are $N_X$ and $N_Y$. We also need a null distribution, which describes the distribution of $t$ under the null hypothesis. In our example, if $X$ and $Y$ are Gaussian random variables with unit variance, we can assume the null distribution of $t$ to be Gaussian with variance $\frac{1}{N_X} + \frac{1}{N_Y}$ and zero mean (since under the null hypothesis, $X = Y$ so $\mu_X = \mu_Y$).

The hypothesis test is then an attempt to reject a null hypothesis ($X = Y$) because one required condition ($\mu_X = \mu_Y$) is unlikely. We use the statistic $\hat{t}$ calculated from the actual samples. The null distribution provides a probability that the statistic is above this actual value given the null hypothesis: $p = P(t > \hat{t} \mid H_0)$. If this p-value falls under a standard rate of error called the false positive rate $\alpha$ (in neuroscientific experiments, often $\alpha = 0.05$), we reject the null hypothesis.

2.3.2 Bootstrap

Suppose in our example we want to test $X = Y$ but we have only one sample of $X$ and $Y$. One reason may be that they are measurements from the evoked response of multiple trials $x_\tau$ and $y_\tau$, $\tau = 1, \ldots, T$. We denote the original measurements as $X_1$ and $Y_1$:

$$X_1 = X(x_1, x_2, \ldots, x_T) \qquad Y_1 = Y(y_1, y_2, \ldots, y_T)$$

We can use the bootstrap [ET93] to generate $B - 1$ extra samples $X_2, Y_2, \ldots, X_B, Y_B$ that sample from the distribution of $X$ and $Y$. For bootstrap $b$ we resample with replacement from the set of trials to form a new sample of $X$ and $Y$ as shown in figure 2.4. We resample with replacement $T$ times from the trials $x_1, x_2, \ldots, x_T$ to form a new set of trials $x_1^b, x_2^b, \ldots, x_T^b$ from which we compute a sample for $X$. Repeating for $Y$, we get a new sample of the random variables

$$X_b = X(x_1^b, x_2^b, \ldots, x_T^b) \qquad Y_b = Y(y_1^b, y_2^b, \ldots, y_T^b)$$

Figure 2.4: Bootstrapping to construct new samples of the evoked response (X or Y). Resampling with replacement leads to a lower-power signal as there are fewer unique trials. However, over the set of all bootstraps, the distribution of the bootstrapped values of X is a strong estimator of the distribution of X (in mean and variance).

Under some mild assumptions on the distribution of $X$ and $Y$ and some strong conditions on a scalar-valued function $f$ [ET86], the histogram of $\{f(X_1), f(X_2), \ldots, f(X_B)\}$ is a good estimator of the distribution of $f(X)$ (and similarly for $f(Y)$). One valid use of bootstrapping is when $f$ is a function of the mean of the trials in each bootstrap; in other words, bootstrapping does well with metrics using evoked responses. We will use bootstrapping in our multisite MEG analysis in chapter 3.

2.3.3 Permutation

In permutation testing [Pit37], rather than building distributions of $X$ and $Y$, we compute the distribution of $t$ when $X = Y$. As with bootstrapping, we want $P - 1$ samples $X_2, Y_2, \ldots, X_P, Y_P$ of the random variables, except this time at permutation $p$, we randomly swap some trials with equal probability. An instance of this swap might yield

$$X_p = X(x_1, y_2, x_3, \ldots, y_T) \qquad Y_p = Y(y_1, x_2, y_3, \ldots, x_T)$$

where trials 2, T, and others have had their data change conditions. The result is a mixing of $X$ and $Y$, which means these samples are surrogates for samples from the null hypothesis [PNBL03]. Then, combined with the original samples $X_1$ and $Y_1$, the $P - 1$ permutations form a distribution of the test statistic $t$ under the assumption that $X = Y$. Note that we have not formed an easily interpretable description of $X$ or $Y$ but rather the test statistic, which is a difference between $X$ and $Y$ [Wel90].
The key difference between the bootstrap and the permutation is the question being asked. Bootstraps estimate the distribution of a summary statistic. Permutations estimate the distribution of a summary difference. Permutations thus work best when comparing two distributions whose samples are locked; we will use this in section 3.2.5 where $X$ is post-stimulus activity and $Y$ is pre-stimulus activity so the sets are locked by trial index. Bootstrapping works best when the samples are not locked; we will use this in section 3.2.3 where $X$ is data from one site and $Y$ is data from another site (so the samples are not locked). We will also use bootstraps in section 4.5.3 since we are trying to find confidence intervals of causality in mean at each time.

2.4 Connectivity

Brain connectivity is the analysis of relationships between parts of the brain [Rai11]. In this dissertation, we focus on functional brain connectivity, where the relationships are related to the activity generated from each part in the brain. Functional brain connectivity has been explored through many datasets [HMLA03, CALC10, SLEL07, BZHH95, CHA+00, WR96, SKSD05, Pet96, NML93, MWGD07, MSAW01, MFF89, LCL+98, GCC96, ADC+04, ACM+07, BKK04, DBYL00, FBK85, GSAL08, HMAS03, LWA+06, WCD08] but all revolve around the answer to this critical question: are $X$ and $Y$ related, and if so, how?

The description of functional brain connectivity can be split into two parts: association and influence [RS10]. Connectivity by association is referred to as undirected connectivity [SSSB05] and only asks whether $X$ and $Y$ have a common attribute. Finding influence requires extra modelling assumptions, resulting in the estimation of directed connectivity [BCB+05]. Directed connectivity tries to not only find the information common to $X$ and $Y$ but also the direction of transmission of that information. Directed connectivity results thus give us more information about the possible structure of brain systems, but at the tradeoff of assuming a model for directed influence.

We will describe the most-used mathematical constructs of undirected and directed connectivity below: correlation and Granger causality.

2.4.1 Correlation

Correlation is the measure of commonness between two random variables $X$ and $Y$. In correlation, we assume that $X$ is related to $Y$ if strong values of $X$ correspond with strong values of $Y$. Another interpretation of correlation is whether there is a strong linear relationship between $X$ and $Y$. One measure of this association is the covariance between $X$ and $Y$. Normalizing by their individual variances yields Pearson's correlation coefficient

$$\rho = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}}$$

whose magnitude is maximized at 1 when the variation in $X$ or $Y$ is totally determined by their association and minimized at 0 when $X$ and $Y$ share no association. While there are other measures that specify frequency-domain correlations [SJM+06], sparse correlations [WJM+12] or correlations for phase [APL13] and amplitude [CED+06], they all ask the same question: how much of the variation in $X$ and $Y$ is due to the commonness between the two random variables?
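As a minimal illustration of this coefficient (the mixing weights and sample size below are arbitrary assumptions), a NumPy computation of $\rho$ matches its closed-form value by construction:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 10000
x = rng.standard_normal(n)
y = 0.6 * x + 0.8 * rng.standard_normal(n)   # Cov(X,Y) = 0.6, Var(Y) = 1

# Pearson's correlation coefficient: Cov(X, Y) / sqrt(Var(X) Var(Y))
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
rho = cov_xy / np.sqrt(x.var() * y.var())
print(rho, np.corrcoef(x, y)[0, 1])          # both close to 0.6
```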
2.4.2 Causality

Is $Y$ providing information to $X$ or vice versa (or both)? Correlation cannot answer this question, so we look to causality. Causality is the influence of a random variable on another random variable [Pea09]. Causality is a philosophical theory first described by David Hume whereby the test for $Y$ causing $X$ is whether changes in $Y$ force changes in the behavior of $X$ [Hum40].

Of the many ways to define this philosophical concept [Gha98, MPS08a, MPS08b, MLCS11, MPS12, Sch00, BBS09, BB12, BG07, HSPVB07, LNS+12, PV07, RHPK12], Granger causality is one of the simplest and earliest [Gra69]. Granger causality measures the effect of the past of $Y$ on the present of $X$. It thus requires time series representing $Y$ and $X$ (labelled $y[n]$ and $x[n]$ respectively). A statistical interpretation of Granger causality is best defined by the null hypothesis: $Y$ does not Granger-cause $X$ if, given the past $x[n-1, n-2, \ldots]$, the past $y[n-1, n-2, \ldots]$ is independent of $x[n]$.

Assuming $X$ and $Y$ are jointly Gaussian, we can reduce this definition to only looking at the conditional mean or conditional variance, corresponding to the conditions of "causality in mean" and "causality in variance."

2.4.2.1 Causality in Mean

Causality in mean reflects the change in the overall trend of a signal due to an external influence. Again defined by its negation, $Y$ does not Granger-cause $X$ in mean if

$$E\{x[n] \mid x[n-1, \ldots]\} = E\{x[n] \mid x[n-1, \ldots], y[n-1, \ldots]\}$$

Granger causality is most simply measured via autoregression [Sim72]. To use autoregression for Granger causality, we need to assert a series of assumptions. The signals must be stationary and bounded. The signals must also be zero-mean, which is easily achieved by removing the sample mean. The signals must be $P$th-order Markov, i.e. $x[n]$ is only affected by the past $P$ samples of any signal, so that the sums in the autoregressive models are finite. The signals must also have constant conditional variance, so that $\mathrm{Var}(x[n] \mid x[n-1, \ldots], y[n-1, \ldots])$ is not a function of $n$ or $x[n]$. We can estimate $P$ by information theoretical techniques balancing the increased bias and decreased variance as $P$ is increased [Lü85, Aka73, HT89]. In addition, we assume that the data is Gaussian, because the best estimate of $x[n]$ given other Gaussian variables is a linear estimate.

In autoregression, we model $x[n]$ as a sum of signals in the past and a noise term. For Granger causality, we need two autoregressive models [Mar87]:

$$x[n] = \sum_{p=1}^{P} b[p]\, x[n-p] + r[n]$$

$$\begin{bmatrix} x[n] \\ y[n] \end{bmatrix} = \sum_{p=1}^{P} \begin{bmatrix} A_{11}[p] & A_{12}[p] \\ A_{21}[p] & A_{22}[p] \end{bmatrix} \begin{bmatrix} x[n-p] \\ y[n-p] \end{bmatrix} + \begin{bmatrix} s_x[n] \\ s_y[n] \end{bmatrix}$$

The top model is the restricted autoregression because it leaves out $y$. The coefficients $b[p]$ act as a filter to determine $x[n]$ from its past; the resulting residual is the restricted residual $r[n]$. The bottom model is the unrestricted autoregression because it includes $y$. Similar to the restricted autoregressive model, we have a filter from the past of $X$ and the past of $Y$ to predict $x[n]$ with unrestricted residual $s_x[n]$. Each model can be fit using a variety of techniques [SM05] that all minimize the residual variance of the model.

Then, we can construct a metric for Granger causality in the mean [Gra80] using the restricted and unrestricted residuals:

$$G_{Y \to X} = \ln \frac{\mathrm{Var}(x[n] \mid x[n-1, \ldots])}{\mathrm{Var}(x[n] \mid x[n-1, \ldots], y[n-1, \ldots])} = \ln \frac{\mathrm{Var}(r[n])}{\mathrm{Var}(s_x[n])}$$

The numerator is the variation in the present of $X$ that is not explained solely by its past. Thus it may be explained by two factors: other variation at time $n$ or other variables in the past. The denominator also subtracts out the variation related to the past of $Y$. Thus, the difference corresponds to the variation in $X$ solely due to the past of $Y$, normalized by the variation without knowledge of $Y$ and removing any effects of the past of $X$. In other words, Granger causality measures the ability of $Y$ to predict $X$ beyond what the past of $X$ can already provide. Under the conditions described above, the null distribution of $G_{Y \to X}$ is a chi-square distribution with one degree of freedom [Gew82].
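A minimal sketch of this construction, assuming a toy coupled system and plain least-squares fits for both autoregressions (model order, coupling strength and noise level are made up for illustration):

```python
import numpy as np

def residual_variance(target, regressors):
    # Least-squares fit of target on regressors; return the residual variance.
    coef, *_ = np.linalg.lstsq(regressors, target, rcond=None)
    return np.var(target - regressors @ coef)

def granger_causality(x, y, P):
    # G_{Y->X} = ln( Var(restricted residual) / Var(unrestricted residual) )
    t = np.arange(P, len(x))
    past_x = np.column_stack([x[t - p] for p in range(1, P + 1)])
    past_y = np.column_stack([y[t - p] for p in range(1, P + 1)])
    v_restricted = residual_variance(x[t], past_x)
    v_unrestricted = residual_variance(x[t], np.hstack([past_x, past_y]))
    return np.log(v_restricted / v_unrestricted)

# Toy system: y drives x with a one-sample delay, but not the reverse.
rng = np.random.default_rng(6)
N = 5000
y = rng.standard_normal(N)
x = 0.8 * np.roll(y, 1) + 0.3 * rng.standard_normal(N)
x[0] = 0.0                            # discard the wrap-around sample of np.roll
print(granger_causality(x, y, P=2))   # clearly positive: Y causes X in mean
print(granger_causality(y, x, P=2))   # near zero: no causality from X to Y
```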
The assumptions used to estimate Granger causality by autoregression can be relaxed. For example, kernel Granger causality [MPS08a, MPS08b, MLCS11] relaxes the assumption of Gaussianity by modifying the autoregressive model. Similarly, transfer entropy [Sch00, HSPVB07, BBS09, LNS+12, KS02] uses an information-theoretic measure to replace the variances as a measurement of prediction error. We will expand Granger causality to work with sets of signals in chapter 4.

2.4.2.2 Causality in Variance

Causality in variance reflects the change in the variation or randomness of a signal due to an external influence. Using the notation above, $Y$ does not Granger-cause $X$ in variance if
$$\mathrm{Var}(s_x[n] \mid x[n-1],\ldots) = \mathrm{Var}(s_x[n] \mid x[n-1],\ldots,\, y[n-1],\ldots)$$
It is important to note that this is a variance calculation on the residual after autoregressive modeling; in general we would compute $s_x[n] = x[n] - E\{x[n] \mid x[n-1],\ldots, y[n-1],\ldots\}$ before analyzing causality in variance, so as to remove any impact of causality in mean upon the testing of causality in variance. Using the same assumptions for autoregression above (save for the conditionally constant variances) and assuming that the variances of the signals are $R$th-order Markov, we can build an autoregressive conditional heteroscedasticity (ARCH) model relating the variances and covariance of $s_x[n]$ and $s_y[n]$, from which we derive a metric of Granger causality in variance:
$$\begin{bmatrix} s_x[n] \\ s_y[n] \end{bmatrix} \Bigg|\, x[n-1],\ldots,\, y[n-1],\ldots \;\sim\; \mathcal{N}\!\left( 0,\; \begin{bmatrix} H_{xx}[n] & H_{xy}[n] \\ H_{yx}[n] & H_{yy}[n] \end{bmatrix} \right)$$
There are many varieties of ARCH models, but we will focus on the most general VEC version:
$$\begin{bmatrix} H_{xx}[n] \\ H_{xy}[n] \\ H_{yy}[n] \end{bmatrix} = \begin{bmatrix} W_{xx} \\ W_{xy} \\ W_{yy} \end{bmatrix} + \sum_{r=1}^{R} \begin{bmatrix} C_{r,11} & C_{r,12} & C_{r,13} \\ C_{r,21} & C_{r,22} & C_{r,23} \\ C_{r,31} & C_{r,32} & C_{r,33} \end{bmatrix} \begin{bmatrix} s_x^2[n-r] \\ s_x[n-r]\, s_y[n-r] \\ s_y^2[n-r] \end{bmatrix}$$
To interpret this model, first focus on $H_{xx}[n]$. It is the variance of $s_x$ at time $n$, and it is modeled as a constant plus a linear sum of all second moments at times $n-1, \ldots, n-R$, much in the same way autoregression is a linear sum of all first moments at times $n-1, \ldots, n-P$. In general this model is estimated by maximum likelihood; we will provide a fast estimation of the maximum-likelihood parameters in chapter 5.

Using the model, we compute the strength of causality in variance using a Wald test [Wal43]. The null hypothesis of no causality in variance, under the VEC version above, is equivalent to [HH06]
$$H_0^{\text{variance}}: C_{1,12} = C_{1,13} = C_{2,12} = C_{2,13} = \ldots = C_{R,12} = C_{R,13} = 0. \quad (2.1)$$
To compute the Wald statistic, we stack the model parameters into one parameter vector
$$\Theta = \begin{bmatrix} \mathbf{w}^T & \mathrm{vec}^T(\mathbf{C}_1) & \cdots & \mathrm{vec}^T(\mathbf{C}_R) \end{bmatrix}^T \quad (2.2)$$
so that we can define the information matrix $F(\Theta) = E\left\{ \frac{\partial L}{\partial \Theta} \frac{\partial L}{\partial \Theta}^T \right\}$ based on the log-likelihood $L$ of the VEC version of the ARCH model. The Wald statistic testing the null hypothesis $H_0^{\text{variance}}$ is [HH08]
$$V_{y_2 \to y_1} = (N-1)\, \Theta_J^T \left[ \left( F(\Theta) \right)^{-1}_{JJ} \right]^{-1} \Theta_J \quad (2.3)$$
where $J$ is the set of indices corresponding to elements $C_{1,12}, C_{1,13}, C_{2,12}, \ldots, C_{R,12}$ and $C_{R,13}$ in $\Theta$ (e.g., for $R = 1$, $J = \{7, 10\}$ and for $R = 2$, $J = \{7, 10, 16, 19\}$), $\Theta_J$ is the subvector of $\Theta$ containing the elements indexed by $J$, and $(F(\Theta))^{-1}_{JJ}$ is the submatrix of $(F(\Theta))^{-1}$ containing rows and columns indexed by $J$. Under mild stability conditions [CL03, HP09], the null distribution of the test statistic is chi-square with two degrees of freedom [Eng84].

Chapter 3
Exchangeability of Multisite Data

We first develop a processing pipeline that not only estimates brain activity from non-invasive sensing but also compares the estimates across different machines recording the same phenomenon at different clinical centers.
Given data on the same subject performing the same experiment at each machine on different days, we can then determine the consistency of estimated brain activity across the machines. Our method also tests which preprocessing and postprocessing steps increase the consistency of these estimates of brain activity. In this chapter, we apply our method to three machines that record magnetoencephalographic data, using both weighted minimum-norm estimation and linearly-constrained minimum-variance beamforming to estimate the brain activity recorded by the magnetoencephalographic sensors.

3.1 Introduction

In many human neurobiological studies, using multiple subjects in large-scale studies can have many benefits. Such studies reduce the effects of subject variability, allow analysis of specific and limited populations such as those with a rare disorder, can create common standards for analysis across clinics, and can be used as databases for future research. However, for studies using magnetoencephalography (MEG) [HHI+93], pooling of MEG recordings across MEG centers introduces variability across centers. Hence, it is important to test whether this between-center variability in MEG data decreases the ability to find significant activity from MEG data when compared to within-center variability across multiple runs.

Multicenter data collection is an important part of clinical studies, as the number of patients exhibiting the same conditions at a single center is often small [PBLD02]. For studies which use electroencephalographic (EEG) recordings, such multicenter studies provide statistical power that would not be evident in single-center studies [SLR+97, LTMF98, LFG+05, BLC+11, BFV+12]. In addition, multicenter studies are popular in clinical trials, which require ubiquity in the effect of a clinical treatment [BQB+09, SNA+09, KCM+11]. Analyzing a study from multiple centers also reduces the effects of characteristics specific to the population of one center [FFD10].

With the popularity and power of multicenter studies, it is beneficial to examine whether pooling data from different centers has an effect on study results. The variation of data from different centers has been well-studied for magnetic resonance imaging (MRI) and positron emission tomography (PET). For PET, calibration of the scanner and consistency in the protocols were the key steps to admitting multicenter PET studies [GKdW+02, SSK+09, FKD+10]; with those two conditions satisfied, PET multicenter studies using standard metrics such as standardized uptake value have been used in many clinical settings [HSP+02, MTH+08, VBK+09]. For anatomical MRI, as with PET, consistency in data processing steps such as segmentation [SvHH+04] and distortion correction [JCG+06] admitted multicenter study protocols. Functional MRI showed similar abilities to pool data from multiple centers, in terms of resulting activation maps and levels of BOLD response [ZGW+05, GJM+10, BMS+11].

Unlike MRI and PET, MEG does not have a large knowledge base with respect to multicenter capabilities. Preliminary analysis in MEG qualitatively showed low between-center variation in localization of activity [WHMn+07]. When analyzing the time and amplitude of somatosensory activity, subject variability accounted for a majority of the variance in the peak time and amplitude [OGH07]. However, there has as of yet been no standardized quantitative method for comparing MEG results across multiple centers, as was done with SUV for PET, cortical thickness for anatomical MRI and BOLD levels for functional MRI.
In this chapter, we establish a quantitative method to compare MEG results across multiple centers. We use data from median nerve stimulation MEG experiments collected for multiple subjects, each at multiple centers. This unique dataset allows us to separate variation across subjects from variation across centers without resorting to simulated sources [CCY+91, PLMT97, LDB02, SSN05]. We evaluate two popular methods of distributed source estimation in MEG - weighted minimum-norm imaging and linearly-constrained minimum-variance beamforming - along with several source estimation options to see if any source estimation techniques admit multicenter studies. We use a nonparametric statistical procedure to find peak activity maps for the median nerve stimulation, consistent with standard protocols for MEG analysis. We then use a second statistical procedure to determine if the consistency of these activity maps across any two centers is equivalent to the consistency of these activity maps across two runs at one center. If this equivalence exists, then the MEG data between the two centers is "exchangeable"; that is, MEG data collected at one center can be treated as if it were another run of MEG data collected at the second center.

3.2 Materials and Methods

We use the method diagrammed in figure 3.1 to test the consistency of MEG data from multiple centers. Our goal is to determine if a metric of similarity computed across sites is equivalent in distribution to that same metric of similarity computed within a site. Since we need multiple samples of the evoked response, we use bootstrapping to generate surrogate samples from the distribution of the evoked response. For each bootstrap, we compute the maps of significant activity for each data set (site and run) through a permutation procedure that shuffles pre- and post-stimulus data within each trial. With the maps of activity generated for each bootstrap, we can then compute the chosen metric of similarity between sites and within sites. The count of successful tests for exchangeability, found by equivalence testing of the metric of similarity, determines whether the group admits multicenter MEG studies. We describe the details of these procedures next, starting with the particulars of the dataset.

Figure 3.1: We compare the maps of significant activity to determine if MEG data gathered from one center can be treated as an extra run of MEG data gathered from another center. We use the pipeline above in combination with the data collected in section 3.2.1 to explore this exchangeability of MEG data between machines.

3.2.1 Data collection

Five subjects, each free of discernible neurological defects, were taken to three different centers: Massachusetts General Hospital in Boston, MA (having a 306-sensor VectorView system from Elekta-Neuromag Oy, Helsinki, Finland), the University of Minnesota Brain Sciences Center in Minneapolis, MN (having a 248-sensor Magnes 3600 WH system from 4D Neuroimaging, San Diego, CA, USA) and The MIND Institute in Albuquerque, NM (having a 275-sensor Omega system by VSM MedTech, Coquitlam, BC, Canada). At each center, presentation software from Neurobehavioral Systems (on a standard laptop computer) controlled and delivered stimuli via a S88 dual-channel stimulator with PSISU7 optical stimulus isolation units (Grass Instruments, West Warwick, RI, USA). Such stimuli were sent to the right median nerve with an inter-stimulus interval varying between 1.5 s and 2 s. The intensity of the stimulus was chosen near the minimum necessary to induce a motor response.
We collected $R = 147$ trials during each of two consecutive days at each of three centers for each subject. We scanned each subject on a Siemens Avanto 1.5T scanner at Massachusetts General Hospital for T1-weighted sagittal MPRAGE images. BrainSuite [SL02] extracted tessellated surfaces of the cortex, with $V \approx 20000$ vertices, and scalp from these MR images.

We preprocess the data offline, first by removing stimulus artifacts (a 10 ms window around the trigger). Then, a trial is defined to be $\pm 400$ ms around the trigger. The data is decimated to a sampling rate of 1000 Hz. The tessellated surfaces and MEG recordings are loaded into BrainStorm [TBM+11], a tool for MEG analysis and visualization. We denote the collected MEG data for each center and run by the vector-valued time series $x^S_{n,r}[t]$, where $n \in \{1, \ldots, N\}$ is the trial index, $t \in \{-T, \ldots, -1, 0, 1, \ldots, T\}$ is the time index (with $t = 0$ corresponding to the stimulus), $r \in \{1, 2\}$ is the run number and $S \in \{\text{Neuromag}, \text{CTF}, \text{4D}\}$ is the label for the center. The dimension of $x^S_{n,r}[t]$ is the number of MEG sensors for the machine at center $S$.

3.2.2 Alignment of sensors

To properly estimate brain cortical activity from MEG sensor recordings, the MEG sensors should be accurately aligned to the extracted cortical and scalp surfaces based on the position of the head in the MEG machine. Each MEG dataset includes the location of the nasion, left preauricular point, right preauricular point and a set of digitized head points representing the scalp in the MEG machine. BrainStorm uses the first three points, or "fiducials," to initially align the sensors to the scalp. In this section, we determine whether using the additional head points to more finely align the sensors to the scalp promotes consistency in MEG signal estimation across centers.

We use the iterative closest point (ICP) algorithm [Zha94] to align the head points to the scalp surface generated above. Let the locations of the $A$ head points be denoted by three-dimensional vectors $h_a = [h_a^x\ h_a^y\ h_a^z]$, $a \in \{1, \ldots, A\}$, and the $C$ scalp points be denoted by three-dimensional vectors $s_c = [s_c^x\ s_c^y\ s_c^z]$, $c \in \{1, \ldots, C\}$. Then, for a chosen $p$, the $\ell_p$-norm distance between a digitized head point at $h_a$ and a scalp tessellation point at $s_c$ is
$$d(h_a, s_c) = |h_a^x - s_c^x|^p + |h_a^y - s_c^y|^p + |h_a^z - s_c^z|^p \quad (3.1)$$
The ICP procedure then fits the digitized head points to the scalp using the following iterative algorithm:
1. For each digitized head point at $h_a$, find the closest scalp tessellation point $\tilde{s}_a$ based on the distance in 3.1.
2. Calculate a rotation matrix $\hat{R}$ and translation vector $\hat{t}$ minimizing the total distance $\sum_{a=1}^{A} d(R h_a + t, \tilde{s}_a)$.
3. Rotate and translate the head points so that $\hat{R} h_a + \hat{t} \to h_a$.
4. If the total distance $\sum_{a=1}^{A} d(\hat{R} h_a + \hat{t}, \tilde{s}_a)$ has sufficiently decreased, go back to step 1.

There are three choices for sensor alignment: default alignment with the three fiducials, ICP with $\ell_1$-norm distances (promoting low distance for many points) or ICP with $\ell_2$-norm distances (promoting low total distance). To compare the three alignment choices, we compute regions of significant activity for each subject at each center on each run using the procedures defined below in sections 3.2.4 (using unconstrained minimum-norm estimation with a correlated noise model) and 3.2.5. We then compute the Dice coefficient, a measurement of similarity described in section 3.2.6, between each pair of centers for every run and subject. From this analysis, we can determine the effect of channel alignment on the exchangeability of multicenter MEG data.
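As a concrete illustration of the four steps above, here is a minimal sketch of the $\ell_2$-norm ICP variant. The closed-form solution of step 2 via the SVD-based Kabsch procedure is an assumption for illustration, since the text does not specify how the rotation and translation are computed; the iteration cap and stopping tolerance are likewise hypothetical.

import numpy as np

def icp_align(head_points, scalp_points, iters=50, tol=1e-6):
    """Align digitized head points (A, 3) to scalp tessellation points (C, 3)."""
    h = head_points.copy()
    prev_total = np.inf
    for _ in range(iters):
        # Step 1: closest scalp tessellation point for each head point.
        d2 = ((h[:, None, :] - scalp_points[None, :, :]) ** 2).sum(axis=2)
        nearest = scalp_points[d2.argmin(axis=1)]
        # Step 2: best rigid transform (rotation + translation) via Kabsch/SVD.
        mu_h, mu_s = h.mean(axis=0), nearest.mean(axis=0)
        U, _, Vt = np.linalg.svd((h - mu_h).T @ (nearest - mu_s))
        D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # no reflection
        R = Vt.T @ D @ U.T
        t = mu_s - R @ mu_h
        # Step 3: apply the transform to the head points.
        h = h @ R.T + t
        # Step 4: stop when the total distance no longer decreases appreciably.
        total = np.sqrt(((h - nearest) ** 2).sum(axis=1)).sum()
        if prev_total - total < tol:
            break
        prev_total = total
    return h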
3.2.3 Resampling the evoked response

For each set of recordings $x_n[t]$, we average trials to suppress ongoing brain activity and estimate the evoked response
$$\bar{x}_0[t] = \frac{1}{N} \sum_{n=1}^{N} x_n[t]$$
but we also need an estimate of the variability of the evoked response. To compute such variability, we bootstrap [ET86] the evoked response; as described in section 2.3.2, a bootstrapped sample of the evoked response is
$$\bar{x}_b[t] = \frac{1}{N} \sum_{n=1}^{N} x_{b_n}[t]$$
where $(b_1, b_2, \ldots, b_N)$ is a resampling with replacement of the ordered indices $(1, 2, \ldots, N)$. Over many bootstraps, the bootstrapped samples are good estimates of the distribution of the evoked response in mean and variance [ET93]. We use $B = 2000$ bootstraps.

3.2.4 Source estimation

Given a bootstrapped sample of the evoked response at each sensor, we want to estimate the evoked response on the cortex. The mapping of activity from cortex to MEG sensors is linear; such a mapping is represented by the lead field matrix $G$. In BrainStorm, using an overlapping-spheres model [LMS+98, HML99], we compute an orientation-unconstrained model $G$ as well as an orientation-constrained model $g$ which only estimates cortical current normal to the cortex but may provide better localization [DS93].

3.2.4.1 Prewhitening

Most techniques for estimating brain activity from MEG recordings assume spatial and temporal whiteness, so spatial and temporal whitening can increase the accuracy of MEG inverse estimation [SHDN08]. For temporal whitening, MEG noise has a roughly pink spectrum; that is, its frequency profile is approximately $\frac{1}{f}$ [KBFIC+10]. To whiten this spectrum, first, the time-domain filter $D = [1\ \ {-1}]$ is applied to the data, which is equivalent to a first-order difference operation. To fine-tune the whitening, we parametrically model the frequency spectrum of the prestimulus data $\bar{x}_b[t < 0]$ with a 5th-order univariate autoregressive process (cf. section 2.4.2.1); this creates an infinite-impulse-response filter $R$. $R^{-1}$ is thus a finite-impulse-response filter which will somewhat whiten the prestimulus data. We denote the temporally whitened signals by $\breve{x}_b[t]$.

We also spatially prewhiten each bootstrap individually, using an estimate of the noise covariance from the prestimulus recordings:
$$C_b = \frac{1}{N} \sum_{n=1}^{N} \left( \frac{1}{T-1} \sum_{t=1}^{T} \breve{x}_{b_n}[-t]\, \breve{x}_{b_n}^T[-t] \right).$$
There are two possible assumptions about the spatial distribution of MEG noise made in this manuscript: uncorrelated noise, which allows for heteroscedasticity in noise between sensors, and correlated noise, which includes covariation in noise between sensors. Assuming uncorrelated noise, the whitener is simply a diagonal whitener $W_b$ such that $W_{b,vv} = C_{b,vv}^{-1/2}$ for each row $v$. Assuming correlated noise, the whitener $W_b$ must incorporate the covariation between the sensors, as estimated by the off-diagonal entries of $C_b$. Often, MEG data exhibits singular covariance matrices, so we use a regularized inverse to estimate the whitener. To do this, we first factor $C_b = U_b S_b U_b^T$ by singular-value decomposition. We then truncate $S_b$ to keep only the singular values above $\frac{1}{9} S_{b,11}$ (the first element in $S_b$) due to the rank deficiency in the MEG data. Then, a matrix that will whiten the prestimulus data is $W_b = S_b^{-\frac{1}{2}} U_b^T$.

Using either model for MEG noise, the result is MEG recordings whitened over sensor space in the prestimulus, denoted by $\tilde{x}_b[t] = W_b \breve{x}_b[t]$. With this spatially whitened data, we also whiten the lead field over sensor space to form $\tilde{G}_b = W_b G$, so that the relation between cortical sources $\tilde{s}_b[t]$ and the whitened data is $\tilde{x}_b[t] = \tilde{G}_b \tilde{s}_b[t]$.
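The correlated-noise whitener can be summarized in a few lines. The sketch below uses hypothetical dimensions, omits the per-bootstrap averaging over trials, and regularizes by truncating singular values below one ninth of the largest, as in the text.

import numpy as np

def spatial_whitener(prestim, truncation_ratio=1.0 / 9.0):
    """Regularized whitener from temporally-whitened prestimulus data
    of shape (channels, samples), per section 3.2.4.1."""
    C = np.cov(prestim)                        # noise covariance estimate
    U, s, _ = np.linalg.svd(C)                 # C symmetric: C = U diag(s) U^T
    keep = s > truncation_ratio * s[0]         # drop small singular values
    return np.diag(s[keep] ** -0.5) @ U[:, keep].T

# Usage: the same matrix whitens the data and the lead field.
rng = np.random.default_rng(1)
prestim = rng.standard_normal((64, 400))       # hypothetical 64-channel segment
W = spatial_whitener(prestim)
whitened = W @ prestim                          # prestimulus is now roughly white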
3.2.4.2 Inverse imaging

We use two methods to estimate cortical sources: weighted minimum-norm estimation (MNE), which tries to reduce the total variance in the estimation, and linearly-constrained minimum-variance (LCMV) beamforming, which tries to maximize the signal-to-noise ratio at each vertex. These methods are described in appendix A; each results in a matrix transformation $H_b$, based on $\tilde{G}_b$ and the whitened data $\tilde{x}_b[t]$, which has $C$ columns and either $V$ (for orientation-constrained estimation) or $3V$ (for orientation-unconstrained estimation) rows. $H_b$ is termed the "inverse kernel", and the estimated sources are $\tilde{s}_b[t] = H_b \tilde{x}_b[t]$.

3.2.4.3 Postprocessing

After applying the inverse kernel, the estimated sources $\tilde{s}_b[t]$ correspond to temporally whitened data. Since the inverse kernel is linear, as are all of the preprocessing steps, we can find the estimated sources with respect to the original data by applying the inverse of the filters in section 3.2.4.1. This temporal coloring of $\tilde{s}_b[t]$ leads to the estimated sources $\hat{s}_b[t]$.

3.2.4.4 Orientation constraints

For orientation-constrained estimation, the estimated sources are scalar using the procedure above. Hence, we set $\bar{s}_b[t] = \hat{s}_b[t]$ for the proceeding analysis. However, for orientation-unconstrained estimation, $\hat{s}_b[t]$ is a three-dimensional vector at each vertex; to estimate the total activity at each vertex $v$ we take the magnitude of the estimated source vector
$$\bar{s}_{b,v}[t] = \sqrt{\hat{s}_{b,v,x}^2[t] + \hat{s}_{b,v,y}^2[t] + \hat{s}_{b,v,z}^2[t]}$$
where the subscripts $x, y, z$ correspond to the three elements of $\hat{s}_b[t]$ that correspond to vertex $v$.

3.2.5 Significant activity

We employ a nonparametric permutation procedure [PNBL03, PNBL05] to determine significant evoked activity from the bootstrapped evoked source estimations $\bar{s}_b[t]$. Letting $\bar{s}_{b,v}[t]$ be the estimation at vertex $v$, we define the null hypothesis of no stimulus-related activity at time $t > 0$ as $H_{0,t}^{\text{perm}}: \bar{s}_{b,v}[t] = \bar{s}_{b,v}[t < 0]$. To test this hypothesis, we calculate a test statistic on the activity at vertex $v$ at time $t$ for bootstrap $b$:
$$T_{b,v}[t] = \frac{\left| \bar{s}_{b,v}[t] - \frac{1}{T} \sum_{\tau=1}^{T} \bar{s}_{b,v}[-\tau] \right|}{\sqrt{\frac{1}{T-1} \sum_{\tau=1}^{T} \left( \bar{s}_{b,v}[-\tau] - \frac{1}{T} \sum_{\sigma=1}^{T} \bar{s}_{b,v}[-\sigma] \right)^2}}.$$
This creates a statistical map of cortical activity for bootstrap $b$ [DLF+00]. We then use a permutation procedure, correcting for multiple comparisons across vertices by setting the familywise error rate to $\alpha = 0.05$, to determine the set of vertices for which $T_{b,v}[t]$ is significantly greater than zero at time $t$ for bootstrap $b$, which we denote by $M_b[t]$. We use $P = 2000$ permutations.

3.2.6 Testing for consistency

Let $E$ and $F$ be two centers for which we are testing the consistency of their MEG data. We use the concept of "exchangeability," which says that two datasets from two centers are consistent if the dataset at one site can be treated as if it were the second run of the protocol at the second site. For MEG, then, we want to compare the similarity of data from $E$ and data from $F$ against the similarity of data from $E$ over two runs.

Let $M_b^{E,1}[t]$, $M_b^{E,2}[t]$ and $M_b^{F,1}[t]$ be the sets of active vertices at time $t$ computed as in section 3.2.5. We use two measures of similarity between sets of active vertices: the Simpson [Sim43] and Dice [Dic45] coefficients. For two arbitrary sets $K$ and $L$, the coefficients of similarity are defined by
$$s(K, L) = \frac{|K \cap L|}{\min\{|K|, |L|\}} \qquad d(K, L) = \frac{|K \cap L|}{\frac{1}{2}\left(|K| + |L|\right)}$$
With respect to sets of significant vertices, the Simpson coefficient measures the similarity in location between $K$ and $L$; $s(K, L)$ ranges from 0 to 1 with maximum value if $K$ is a subset of $L$ or vice versa.
The Dice coefficient is more stringent in that it measures similarity in both location and extent; $d(K, L)$ ranges from 0 to 1 with maximum value if and only if $K = L$.

For a given bootstrap $b$, we must account for possible time delays between machines due to stimulus-related delays or other artifacts. We do not have to do so for similarity across runs, as there were no clear differences in time delay between runs at the same machine. So, for each bootstrap we find the time index of maximal total channel power, denoted by $\max_b$. Then, we compute the coefficients of similarity
$$D\!\left(M_b^{E,1}, M_b^{E,2}\right) = d\!\left(M_b^{E,1}\!\left[\max{}_b^{E,1}\right],\, M_b^{E,2}\!\left[\max{}_b^{E,2}\right]\right)$$
$$S\!\left(M_b^{E,1}, M_b^{E,2}\right) = s\!\left(M_b^{E,1}\!\left[\max{}_b^{E,1}\right],\, M_b^{E,2}\!\left[\max{}_b^{E,2}\right]\right)$$
$$D\!\left(M_b^{E,1}, M_b^{F,1}\right) = \max_{i,j \in \{-2,-1,0,1,2\}} d\!\left(M_b^{E,1}\!\left[\max{}_b^{E,1} + i\right],\, M_b^{F,1}\!\left[\max{}_b^{F,1} + j\right]\right)$$
$$S\!\left(M_b^{E,1}, M_b^{F,1}\right) = \max_{i,j \in \{-2,-1,0,1,2\}} s\!\left(M_b^{E,1}\!\left[\max{}_b^{E,1} + i\right],\, M_b^{F,1}\!\left[\max{}_b^{F,1} + j\right]\right)$$
between sites and within sites. The first two coefficients, $D\!\left(M_b^{E,1}, M_b^{E,2}\right)$ and $S\!\left(M_b^{E,1}, M_b^{E,2}\right)$, represent the similarity of data from $E$ over two runs, i.e. the within-site similarity. On the other hand, the last two coefficients, $D\!\left(M_b^{E,1}, M_b^{F,1}\right)$ and $S\!\left(M_b^{E,1}, M_b^{F,1}\right)$, represent the similarity of data between $E$ and $F$, i.e. the between-site similarity.

We then apply an equivalence test [Wel10] to determine if the between-center similarity is equivalent to the within-center similarity, admitting exchangeability. For readability, we focus on the Dice coefficient $D$; the procedure is repeated for the Simpson coefficient $S$. For the null hypothesis, we cannot choose $H_0^{\text{equiv}}: D\!\left(M_b^{E,1}, M_b^{E,2}\right) \ne D\!\left(M_b^{E,1}, M_b^{F,1}\right)$ since the coefficients are continuous variables, so $H_0$ is always false. Instead, we must choose an allowable tolerance in difference $\varepsilon$. To do this, we calculate the within-center variance in similarity $\mathrm{Var}\!\left(D\!\left(M_b^{S,1}, M_b^{S,2}\right)\right)$ for all possible sites $S$ and subjects. The average of these variances, across sites and subjects, is an estimate of the group variation in within-site similarity, so we consider its square root to be the allowable tolerance in difference
$$\varepsilon = \sqrt{\overline{\mathrm{Var}\!\left(D_{\text{subject}}\!\left(M^{S,1}, M^{S,2}\right)\right)}}$$
where the bar denotes averaging over sites and subjects. Then the null hypothesis to test for equivalence is $H_0^{\text{equiv}}: \left| D\!\left(M^{E,1}, M^{E,2}\right) - D\!\left(M^{E,1}, M^{F,1}\right) \right| \ge \varepsilon$. Each bootstrap gives one sample of the distributions of $D\!\left(M^{E,1}, M^{E,2}\right)$ and $D\!\left(M^{E,1}, M^{F,1}\right)$, so we use a two-sample nonparametric t-test to reject nonequivalence. We then count the number of rejections, corresponding to significant equivalence, over all subjects and runs to determine the level of group equivalence for each pair of centers.

3.3 Results

In this section, we demonstrate the effect of alignment on consistency of MEG source estimation across sites. Then, using the chosen alignment, we describe the effect of the previously described MEG source estimation choices on the exchangeability of MEG recordings in group studies. We will then be able to propose an array of choices that all promote multisite MEG studies.

Figure 3.2 (panels: (a) Neuromag-CTF run 1, (b) CTF-4D run 1, (c) 4D-Neuromag run 1, (d) Neuromag-CTF run 2, (e) CTF-4D run 2, (f) 4D-Neuromag run 2): The choice of alignment does not substantially change the similarity coefficient between maps of significant activity from any two sites on any run for any subject. For each choice of pair of sites and run, each bar is the Simpson coefficient (and each star is the Dice coefficient) between maps of significant activity generated from orientation-unconstrained MNE using full whitening. The three bars for each subject correspond to the initial alignment, $\ell_1$-norm ICP alignment and $\ell_2$-norm ICP alignment.
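Before turning to the results, the two set coefficients at the heart of section 3.2.6 can be made concrete with a minimal sketch on sets of vertex indices; the nested-set example shows why the Dice coefficient is the more stringent of the two. The sets used here are hypothetical.

def simpson(K, L):
    """Simpson coefficient: overlap normalized by the smaller set (location)."""
    K, L = set(K), set(L)
    return len(K & L) / min(len(K), len(L))

def dice(K, L):
    """Dice coefficient: overlap normalized by the mean size (location and extent)."""
    K, L = set(K), set(L)
    return len(K & L) / (0.5 * (len(K) + len(L)))

# A nested pair: Simpson is maximal, while Dice penalizes the extent mismatch.
K = {1, 2, 3}
L = {1, 2, 3, 4, 5, 6}
print(simpson(K, L))  # 1.0
print(dice(K, L))     # 0.666...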
3.3.1 Effects of alignment on consistency

For analyzing the effect of alignment, we calculate the Dice coefficient between maps of significant activity (estimated via orientation-unconstrained weighted MNE with a correlated noise model) for each of the five subjects (labelled A, B, C, D and E), each run (1, 2) and each pair of MEG machines in the data collection described in section 3.2.1. In figure 3.2 it can be seen that the similarity coefficients do not change much based on the alignment, and such changes are not consistently in favor of a specific alignment choice. Hence, small changes in alignment of MEG, while beneficial for accuracy in localization [SLPB96], are not too influential on the variability of MEG across multiple centers. In the proceeding work, we will assume all data has been aligned using ICP with $\ell_2$-norm distance calculations.

3.3.2 Weighted MNE

As noted in section 3.2.4.2, weighted MNE uses a model which minimizes the spatial $\ell_2$-norm of the estimated sources. Due to the large variation in maps based on orientation constraints, we will split the analysis of weighted MNE into the two choices of source orientation constraints.

3.3.2.1 Orientation-constrained MNE

For orientation-constrained MNE, the estimated activity is not contiguous. This is because the orientation constraints prevent the estimation of significant activity on the gyri, where the current sources point radially and thus can't be detected by MEG [LDB02]. On the left of figure 3.3a, we see that with the discontinuities, most voxels which are significant for a few bootstraps are significant for almost all bootstraps, since most of the map shows values near the number of bootstraps ($B = 2000$). We also see that there is some variation in the regions of significant activity between sites; for example, the two runs on the Neuromag scanner show similar maps that are less extended than the two runs on the 4D scanner.

The histogram of the similarity measures in figure 3.3b indicates relatively high between-machine similarity. The between-machine similarity is close to the within-machine similarity when taking into account the variation in similarity measurements. As will be seen for the remaining measures, the Simpson coefficient shows a larger degree of separation in between-machine and within-machine similarity, which is due to the larger equivalence in localization across bootstraps within the same machine.

Figure 3.3 ((a) the number of bootstraps for which each vertex is significantly active; (b) distribution of similarity, aggregated over subjects and runs): For orientation-constrained weighted MNE, the number of bootstraps that consider a vertex significant across machines is either close to the maximum ($B = 2000$) or the minimum (zero). The distribution of similarity coefficients, aggregated over all subjects and pairs of runs, shows that between-machine similarity is smaller than within-machine similarity, but that difference seems to be smaller than the variation in within-machine similarity. On the left, the counts of the number of significant bootstraps are plotted using one subject (subject D). The top rows are bootstraps admitting significant activity at a vertex at both machines; the bottom row is bootstraps admitting significant activity at a vertex for both runs at the specified machine. On the right, the histogram values are computed with bin width 0.01 and aggregated over all possible subjects and runs. The y-axes are normalized to represent values of the estimated probability distribution function. In these figures, we used the correlated noise model.
3.3.2.2 Orientation-unconstrained MNE

With orientation-unconstrained MNE, the regions of significant activity are contiguous. As a result, in figure 3.4a, the region where most bootstraps agree on activity is largely in one or two wide patches on the cortical surface. In addition, the patches of activity reach across sulci and gyri, as opposed to the orientation-constrained MNE estimates that were localized largely in sulci.

Figure 3.4 ((a) number of bootstraps for which each vertex is significantly active; (b) distribution of similarity, aggregated over subjects and runs): For orientation-unconstrained MNE, the overlap of significant activity between machines is more contiguous, and as a result the similarity between- and within-machines is higher than with orientation constraints. In addition, the relative increase means there is a larger relative difference in similarity when using multiple machines. On the left, the counts of the number of significant bootstraps are plotted using one subject (subject D). The top rows count bootstraps where a vertex is active across the two specified machines, while the bottom row counts bootstraps where a vertex is active across runs at the specified machine. On the right, the histogram values are computed with bin width 0.01 and aggregated over all subjects and runs. The y-axes represent the values of the estimated probability distribution function. In these figures, we used the correlated noise model.

As a result, the distribution of the similarity coefficients within-machine is narrower in figure 3.4b than with the previous results in which source orientations were constrained normal to the surface (figure 3.5b). The relative difference in similarity between- and within-machines is thus relatively larger than in the orientation-constrained case. Additionally, the similarity coefficients are larger on average without orientation constraints, be they computed between machines or within a machine.

3.3.3 Beamforming

For LCMV beamforming, instead of minimizing the spatial $\ell_2$-norm of the reconstructed sources, we maximize the signal-to-noise ratio at each cortical tessellation vertex under the assumption that the current cortical location is signal and the remaining cortical locations are noise [DSN06]. As in the previous section, we split the results into orientation-constrained and orientation-unconstrained source estimation.

3.3.3.1 Orientation-constrained beamforming

Orientation-constrained beamforming only looks at signal power for currents normal to the cortical surface. In figure 3.5a the counts of significant voxels show a larger extent of activity than shown with weighted MNE, though the locations of significant activity are in the same sulci and gyri. The between-site Dice coefficient here is smaller than the between-site similarity for weighted MNE, while the opposite condition holds when using the Simpson coefficient. As seen in figure 3.5b, the between-machine similarity is quite high, though a lot further from within-machine similarity than in the orientation-constrained weighted MNE case.

3.3.3.2 Orientation-unconstrained beamforming

As with orientation-constrained beamforming, orientation-unconstrained beamforming shows a few bootstraps with larger regions of significant activity, as seen in figure 3.6a. In addition, the bootstraps are more variable; as a result, fewer vertices are significant over most of the bootstraps. This increase in region size leads to a substantial increase in Simpson coefficient, as seen in figure 3.6b.
In addition, there is a substantial decrease in the Dice coefficient, as there are now many more vertices that are significantly active only for a few bootstraps. The within-machine similarity does not change as much as the between-site similarity, as well.

Figure 3.5 ((a) number of bootstraps for which each vertex is significantly active; (b) distribution of similarity, aggregated over subjects and runs): For orientation-constrained LCMV beamforming, the regions of significant activity are larger than for weighted MNE. As a result, there are somewhat smaller between-machine Dice coefficients for LCMV beamforming, while there are higher between-machine Simpson coefficients, when compared to weighted MNE. On the left, the counts of the number of significant bootstraps are plotted using one subject (subject D). The top rows are bootstraps admitting significant activity at a vertex at both machines; the bottom row is bootstraps admitting significant activity at a vertex for both runs at the specified machine. On the right, the histogram values are computed with bin width 0.01 and aggregated over all possible subjects and runs. The y-axes are normalized to represent values of the estimated probability distribution function. In these figures, we used the correlated noise model.

3.3.4 Group consistency testing

We can quantify whether any of the previous source estimation methods admit exchangeability using the procedure of equivalence testing described in section 3.2.6. We want to determine how many combinations of subject and run provide equivalence of between-machine similarity to within-machine similarity of maps of significant activity estimated from MEG recordings. For each pair of machines, we have 5 subjects and 4 choices of datasets: run 1 on each machine, run 2 on each machine, run 1 on the first with run 2 on the second machine, and run 2 on the first with run 1 on the second machine. In tables 3.1 and 3.2 we tabulate the fraction of these 20 comparisons which admit exchangeability.

Figure 3.6 ((a) number of bootstraps for which each vertex is significantly active; (b) distribution of similarity, aggregated over subjects and runs): For orientation-unconstrained LCMV beamforming, the bootstraps are much more variable for each machine and run. As a result, the similarity in extent will decrease as the sizes of regions of significant activity differ. In contrast, the similarity in localization will increase as more vertices are classified as significantly active. On the left, the counts of the number of significant bootstraps are plotted using one subject (subject D). The top rows are bootstraps admitting significant activity at a vertex at both machines; the bottom row is bootstraps admitting significant activity at a vertex for both runs at the specified machine. On the right, the histogram values are computed with bin width 0.01 and aggregated over all possible subjects and runs. The y-axes are normalized to represent values of the estimated probability distribution function. In these figures, we used the correlated noise model.
For weighted MNE in table 3.1, we see that for most pairs of 47 Neuro-CTF CTF-4D 4D-Neuro versus Neuro-Neuro CTF-CTF CTF-CTF 4D-4D 4D-4D Neuro-Neuro Dice free cor 20/20 8/20 12/20 20/20 17/20 20/20 unc 20/20 12/20 10/20 17/20 20/20 20/20 cons cor 20/20 8/20 10/20 20/20 18/20 20/20 unc 19/20 12/20 11/20 19/20 20/20 20/20 Simpson free cor 17/20 12/20 15/20 19/20 16/20 18/20 unc 10/20 8/20 12/20 20/20 13/20 11/20 cons cor 17/20 14/20 14/20 20/20 18/20 17/20 unc 14/20 9/20 15/20 20/20 15/20 13/20 Table 3.1: Fraction of subject-runs that admit equivalence in between-machine and within-machine similiarity, using weighted MNE. Bold text indicates a majority of datasetsadmittingexchangeability. Forcolumntwo,“free”meansorientation-freewhile “cons” means orientation-constrained. For column three, “cor” means correlated noise model and “unc” means uncorrelated noise model. For space, we’ve also shortened “Neuromag”to“Neuro.” Formostchoicesofpairsofsitesandpairsofruns,amajority ofdatasetsadmitequivalence. Thechoicetoconstraintsourceorientationstobenormal to the cortical surface increases exchangeability for the most part, while the choice of noise model does not have a substantial impact. In addition, data from CTF machines tendtorejectexchangeabilitywithothermachines. sites,equivalenceisadmittedforbothchoicesoforientationconstraintsandbothcoeffi- cients. WenotethatcomparisonswhichuseCTFdatasufferincomparisontoNeuromag and4Ddata,possiblyduetohighernoisevarianceintheCTFdatacollected. For LCMV beamforming in table 3.2, the percentages that admit equivalence are much smaller. Notably different is the lack of exchangeability when using orientation- free estimation. Based on the maps in figure 3.6a, this may be due to the variation in bootstrapped maps of significant activity when using orientation-free LCMV beam- forming. As with the results for weighted MNE, comparisons using CTF data suffer as comparedtoNeuromagor4Ddata. 48 Neuro-CTF CTF-4D 4D-Neuro versus Neuro-Neuro CTF-CTF CTF-CTF 4D-4D 4D-4D Neuro-Neuro Dice free cor 20/20 11/20 10/20 14/20 12/20 15/20 unc 16/20 7/20 10/20 13/20 4/20 7/20 cons cor 20/20 14/20 13/20 18/20 20/20 18/20 unc 16/20 9/20 11/20 18/20 15/20 15/20 Simpson free cor 11/20 4/20 3/20 7/20 17/20 17/20 unc 12/20 2/20 4/20 7/20 14/20 14/20 cons cor 13/20 10/20 10/20 16/20 16/20 17/20 unc 13/20 9/20 14/20 18/20 16/20 13/20 Table 3.2: Fraction of subject-runs that admit equivalence in between-machine and within-machine similiarity, using LCMV beamforming. Bold text indicates a majority of datasets admitting exchangeability. For column two, “free” means orientation-free while “cons” means orientation-constrained. For column three, “cor” means correlated noise model and “unc” means uncorrelated noise model. For space, we’ve also short- ened“Neuromag”to“Neuro.” Fewerpairsofsitesadmitequivalencethanwithweighted MNE(cf. table3.1). However,manypairsstillshowequivalenceformosttests. Aswith weightedMNE,orientationconstraintshavemoreeffectonequivalencethannoisemod- eling, with more exchangeability if activity is estimated normal to the cortical surface. Inaddition,datafromCTFmachinestendtorejectexchangeabilitywithothermachines. The choice of whitener does not seem to have much of an impact; in tables 3.1 and 3.2, the values do not show much separation between noise models. 
The largest no- table difference is using orientation-free LCMV beamforming and the Dice coefficient; however,asseeninfigure3.6b,thosecoefficientsaresmallandtheexchangeabilityad- mitted using a correlated noise model may be an artifact of the large variance in Dice coefficientusingorientation-freeLCMVbeamforming. 3.4 Discussion In this chapter, we analyzed the effect of a multitude of MEG processing steps on ex- changeabilityofMEGdataacrosscenters. Welookedatalignmentofthesensorstothe 49 anatomical data, different types of whitening, different inverse imaging techniques and different orientation constraints. We found that alignment and noise whitening choices hadlittleimpactonexchangeability,whileweightedMNEseemedtohavemoreconsen- sus in significant activity across machines than LCMV beamforming. This difference is much larger when there are no orientation constraints; with orientation constraints, allmethodsseemtoadmitexchangeabilityofMEGdatabetweenmachines,intermsof reconstructedmapsofsignificantactivity. For many of the pairs of sites, not all subject-runs admitted exchangeability. This is largely due to subject variance; as noted on previous analysis of this data [OGH07], subject variability is a large factor in group analysis of MEG data. This is not just a property specific to MEG; functional MRI [YGW + 10, BMS + 11] and anatomical MRI [SvHH + 04] show a similar property. The brain’s anatomy is highly variable across subjects and it is expected that its function will also be highly variable across subjects. This subject variability - captured by the within-machine variance in section 3.2.6 - is larger than the variability of MEG noise due to different MEG machines or locations, as demonstrated by the rejection of non-equivalence in similarity coefficients between- andwithin-machines. As part of the estimation of cortical current densities, we spatially and temporally prewhitenedthedata. Spatialwhiteningisusedtodecreasetheinfluenceofbackground activitybyremovingvariationinthesourcesrelatedtosuchactivity[SHDN08]. Hence, the prewhitener is chosen using the pre-stimulus interval (in our case, x b [n< 0]) as background activity. Since this activity contains not contain only spatial correlations but time-lagged correlations as well, we also use the temporal whitening procedure in section3.2.4.1todecreasetheinfluenceofbackgroundactivityontheestimatedtempo- raldynamics[BS97]. Weobservedintables3.1and3.2thatthetypeofspatialwhitening 50 doesnothavealargeimpactonthemulticenterexchangeabilityofMEGdata. However, inmostcases,usingacorrelatednoisemodel,thusremovingcovariationbetweenMEG sensors related to background activity, does increase consistency of MEG data across machines. The differences between LCMV beamforming and weighted MNE have been cov- ered in various papers [MBL03, Ima10] and shown to be modifications of a general least-squares adaptive filtering approach. With respect to exchangeability of MEG data across machines, weighted MNE seems to perform better than LCMV beamforming. Based on the bootstrapped maps of significant activity, LCMV beamforming seems to suffermorefromtheeffectsoflowtrialsizesthanweightedMNE.LCMVbeamforming doesrequireasubstantialamountofdatatohaveastableestimateofthecovariancema- trixusedintheinversekernel[BVR + 08]. Withthenumberoftrialsusedhere(R=147) andashorttemporalresponse(under 100ms),theresultinginversekernelvariationmay haveanimpactonattemptstopoolresultsbetweenMEGmachines. 
For LCMV beamforming, orientation constraints seem to increase consistency across sites; such constraints have also been shown to increase the spatial resolution and the localization accuracy of LCMV beamforming [HB03]. These benefits require relatively accurate MEG-MRI registration, provided by any of the ICP algorithms in section 3.2.2, and accurate estimation of the orientations normal to the cortical surface, provided by BrainSuite in section 3.2.1. Weighted MNE seems to be less affected by orientation constraints; such constraints have been found to decrease the localization error [LWA+06] but not by as much as the extent of activity in this experiment. Hence, weighted MNE admits the exchangeability of MEG data across machines whether cortical current source estimations are constrained or unconstrained, while LCMV beamforming seems to require orientation constraints to admit the same exchangeability across machines.

An alternative approach to testing for MEG consistency with this dataset, and other datasets exhibiting clear dipolar activity, would be to use dipole localization [MLL92, LMS+98]. Dipole localization, rather than estimating the cortical current density, estimates the location and power of the single dipole, called the "equivalent current dipole," that best explains the magnetic field at all the sensors. As a result, given that the true activity can be modeled by a single dipole, dipole estimation would have much lower variance than distributed source estimation. Procedures to estimate this dipole were used in [OGH07] to determine the variation in peak time and magnitude related to machine differences. Such an analysis may also be helpful in determining the variation in localization as well.

Recently, guidelines have been presented regarding the evaluation of methods such as WMNE and LCMV as well as techniques such as MEG [?]. In this study we have attempted to satisfy most of these guidelines. Specifically, we have used default parameters in BrainStorm, which is publicly available at http://neuroimage.usc.edu/brainstorm/ and open-source. We have also supplied the processing steps in section 3.2 along with details of the two source estimation algorithms in appendix A. Thus, the methods used here can be verified for sets of data other than the one presented in this manuscript.

3.5 Summary

Pooling of MEG data across machines can increase statistical power in group experiments. We have shown that with cortical current density estimation constrained to have current orientation normal to the cortical surface, two popular methods of estimation - weighted MNE and LCMV beamforming - admit exchangeability of MEG data from multiple machines. When the current orientations are left unconstrained, and there are a moderate number of trials, only weighted MNE admits the same exchangeability, which is defined by the ability to treat datasets from two machines as if they were datasets from the same machine run at different times. We have also shown that while spatial whitening and channel alignment are important, neither the type of spatial whitening nor the choice of spatial alignment has a large impact on the consistency of MEG data across machines.

Chapter 4
Canonical Granger Causality

With proper preprocessing to promote consistency in electrophysiological recordings, we turn our attention to the estimation of causality between these recorded (or estimated) biological signals. Existing approaches for computing causality in mean with these signals have focused on either pairwise interactions or multivariate actions, in each case considering each signal to be a separate biological system.
In this chapter, we develop a new model, termed canonical Granger causality (CGC), that robustly estimates Granger causality between ensembles of signals in a macro-scale network by exploiting the commonalities in each ensemble of signals. We present a customized optimization algorithm to compute CGC and demonstrate the improved performance of CGC relative to existing approaches. In addition, we apply CGC to macaque brain functional ECoG data, showing dynamic network relationships between the striate and prestriate during a visuomotor task.

4.1 Introduction

This chapter focuses on a directed causal model of brain activity called canonical Granger causality (CGC). CGC determines the extent to which the activity in one region is influenced by the activity in another region. The goal of this model is to extend current work on causality to model interactions between groups of signals; for electrophysiological data, these groups are often spatially clustered through regions of interest (ROIs). In this chapter, we introduce an optimization technique customized to compute CGC, based on optimization on specific types of manifolds. Later, we demonstrate CGC's power in analyzing causality when the number of samples is small and/or the required causality model is relatively complex. CGC's abilities stem from its relationship to both pairwise causal modeling and undirected regional connectivity analysis.

Causal models like Granger causality (GC) [Gra69] and Bayesian networks [Gha98] determine the strength and directionality of signal interactions by analyzing the joint distributions of their time series under appropriate modeling assumptions. Classical methods analyze causality in mean between multiple time series in a pairwise fashion, using bivariate models [Gew82]. However, pairwise analysis is not ideal for functional brain mapping experiments in which multiple time series are available from different sensors, voxels (in a reconstructed source volume), or vertices (from a reconstructed source surface). Usually, such time series can be spatially correlated because of the limited spatial resolution of the mapping techniques or because functional activation is spatially distributed across a larger region of cortex. As a result, rather than analyzing causality independently between pairs of time series, it could be more desirable to analyze causality between ROIs that each include multiple correlated time series [DEV+03]. Grouping multiple time series together can reduce the number of variables to estimate in a parametric model, can improve the signal-to-noise ratio of the resulting causality measure, and can help in identifying long-range connectivity that might previously have been obscured by larger apparent short-range connectivity artificially introduced by crosstalk [BGY+09].

Many methods have been used to assess connectivity between ROIs. One popular method is canonical correlation [Hot36], which estimates an undirected model of connectivity by maximizing the correlation between weighted linear combinations of signals from two ROIs. In addition to providing a measure of connectivity, canonical correlation also provides an estimate of the relative contribution of each signal from each ROI to the correlation between the ROIs [KV81]. Canonical correlation analysis, however, provides no information about the direction of information flow between ROIs.

Directed models between ROIs often use Granger causality as a model of directed interaction. Multivariate Granger causality (MGC) [BS10] relies on multivariate autoregressive models of the signals within each ROI.
While MGC can be accurate when the data records are sufficiently long, MGC involves substantially more parameters to estimate than do bivariate methods and can be prone to overfitting and sensitive to noise. Granger canonical correlation analysis (GCCA) [WCK+11] uses aggregated signals in each ROI, similar to canonical correlation analysis. This reduces the number of parameters to estimate relative to methods like MGC, although it is not considered an estimate of the amount of causality or the underlying signals responsible for the causal connection [SFC+10], and can only identify the presence or absence of causality.

Our CGC measure combines the strengths and overcomes the disadvantages of GC and MGC by using an optimized weighted linear combination of the time series to parsimoniously represent each ROI with a single time series. This is similar to the idea of canonical correlation, where the signal dimensionality is reduced by considering the weighted sum of signals [CALC10]. Subsequently, CGC computes standard bivariate GC between the two representative time series. Like MGC, CGC enables estimation of causality between different ROIs with multiple time series. Like GC, CGC requires estimation of a relatively small number of parameters, which makes it less prone to overfitting and less sensitive to short data records, and can determine the comparative strength of causal interactions. Like GCCA, CGC estimates signals of interest by weighted aggregation. Hence, CGC combines a regional causality measure with an estimate of not only those signals representing each region but also the strength of any such causal connection.

This chapter is organized as follows. In section 4.2 we review GC, MGC and GCCA to establish the groundwork for CGC. The CGC model and associated algorithms are presented in section 4.3. A simulation study is described in section 4.4 that illustrates the advantages and disadvantages of the proposed approach. In section 4.5, the proposed model is applied to real brain data acquired from a macaque performing a visuomotor task [BCN93], where we show that CGC can identify causal interactions between striate and prestriate regions of the occipital lobe. Finally, discussions and conclusions are presented in sections 4.6 and 4.7, respectively.

4.2 Review of Causality Measures

We want to determine the amount by which signals in the "source" region of interest influence the mean trend of signals in the "sink" region of interest, as shown in figure 4.1. In this section, we will describe the three previous methods used: Granger causality (GC), multivariate Granger causality (MGC) and Granger canonical correlation analysis (GCCA). In the next section, we will then describe our proposed measure: canonical Granger causality (CGC).

Figure 4.1: Graphical depiction of GC, MGC, and CGC. We want to estimate the causality from the source region to the sink region via the time series recordings (represented by dark circles) from each region. GC analyzes the causality between pairs of recordings using univariate and bivariate autoregressive models. MGC fits much larger parametric autoregressive models to collections of time series. CGC parsimoniously represents each ROI using a single signal formed through an optimized weighted linear combination of signals from the ROI. CGC then applies standard GC analysis to these parsimonious representations.

4.2.1 Granger Causality (GC)

We have described Granger causality in section 2.4.2.1; for our signals, the Granger causality metric is
$$G_{x_2 \to x_1} = \frac{\mathrm{Var}(x_1[n] \mid x_1[n-1,\ldots])}{\mathrm{Var}(x_1[n] \mid x_1[n-1,\ldots],\, x_2[n-1,\ldots])} \quad (4.1)$$
where, relative to the definition in section 2.4.2.1, we have dropped the logarithm from the Granger causality (shifting its minimum from 0 to 1) to easily compare with the other methods in this section. GC requires the estimation of a modest number ($P + 4P = 5P$) of autoregressive coefficients (cf. section 2.4.2.1), which makes estimation of GC relatively stable for short time series (low $N$) or more complex models (high $P$). However, GC does not provide a mechanism for analyzing causality between sets of multiple time series. In addition, when GC is applied pairwise to spatially-correlated signals from multiple areas in the brain, GC is biased towards finding higher causality between spatially-adjacent areas [WCBD07].

4.2.2 Multivariate Granger Causality (MGC)

MGC is one method to extend GC for analyzing causality between two sets of multiple time series [BS10]. Given the vector-valued time series $y_1$ (with $M_1$ values at each time point) and the vector-valued time series $y_2$ (with $M_2$ values at each time point), MGC attempts to measure the extent to which past values of $y_2$ can be used to predict the present value of $y_1$. Analogous to the two AR models used by GC, MGC considers two $P$th-order multivariate autoregressive (MVAR) models given by:
$$y_1[n] = \sum_{p=1}^{P} B[p]\, y_1[n-p] + r_1[n]$$
$$\begin{bmatrix} y_1[n] \\ y_2[n] \end{bmatrix} = \sum_{p=1}^{P} A[p] \begin{bmatrix} y_1[n-p] \\ y_2[n-p] \end{bmatrix} + \begin{bmatrix} s_1[n] \\ s_2[n] \end{bmatrix}, \quad n = 1, \ldots, N$$
where $B[p] \in \mathbb{R}^{M_1 \times M_1}$ and $A[p] \in \mathbb{R}^{(M_1+M_2) \times (M_1+M_2)}$, $p = 1, \ldots, P$, denote the MVAR coefficient matrices, which are estimated to minimize the prediction error variances of $r_1[n]$ and $[s_1[n]\ s_2[n]]^T$. Similar to the univariate case, MGC is defined as the ratio between the sizes of these two prediction errors and is given by:
$$M_{y_2 \to y_1} = \frac{|P_1|}{|\Sigma_1|}$$
where $|\cdot|$ denotes the matrix determinant and
$$P_1 = \frac{1}{N-1} \sum_{n=1}^{N} r_1[n]\, r_1^T[n] \qquad \Sigma_1 = \frac{1}{N-1} \sum_{n=1}^{N} s_1[n]\, s_1^T[n]$$
As with GC, MGC takes values between 1 and $\infty$, with large values indicating that the past of $y_2$ contributes substantially to the prediction of $y_1$. Relative to GC, MGC enables the estimation of causality between two ROIs, with each ROI containing multiple time series. However, MGC requires the estimation of many more $\left(M_1^2 P + (M_1+M_2)^2 P\right)$ coefficients, which makes it less stable than GC, particularly when the number of samples is small.

4.2.3 Granger canonical correlation analysis (GCCA)

Granger canonical correlation analysis (sometimes named cluster Granger causality [SFC+10]) is an alternate method for regional causality which looks at lagged correlations. Considering the vector-valued time series $y_1$ and $y_2$ above, and the order $P$ which corresponds to the maximum lag in the causal interaction, GCCA is measured by [WCK+11]
$$(\hat{\alpha}, \hat{\beta}) = \operatorname*{argmax}_{\substack{\alpha \in \mathbb{R}^{M_1},\ \beta \in \mathbb{R}^{M_2} \\ \|\alpha\|_{\ell_2} = \|\beta\|_{\ell_2} = 1}} \mathrm{corr}\!\left( \alpha^T y_1[n],\ \beta^T y_2[n-P] \right)$$
$$R_{y_2 \to y_1} = G_{\hat{\beta}^T y_2 \to \hat{\alpha}^T y_1}$$
GCCA is a more parsimonious model than MGC, but it only looks at the effect of $y_2[n-P]$ on $y_1[n]$ rather than the total effect of $y_2[n-1], \ldots, y_2[n-P]$ on $y_1[n]$.
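A minimal sketch of the MGC computation follows, assuming least-squares MVAR fits (any variance-minimizing fit is admissible) and using the sample covariance of the residuals in place of the explicit $\frac{1}{N-1}$ sums; the dimensions and variable names are illustrative.

import numpy as np

def mvar_residual_cov(Y_target, Y_sources, P):
    """Least-squares MVAR fit; return the covariance of the residuals of
    Y_target (M1, N) when regressed on the past P samples of Y_sources (M, N),
    which may include Y_target itself."""
    M, N = Y_sources.shape
    target = Y_target[:, P:N]
    past = np.vstack([Y_sources[:, P - p:N - p] for p in range(1, P + 1)])
    coef, *_ = np.linalg.lstsq(past.T, target.T, rcond=None)
    resid = target - coef.T @ past
    return np.atleast_2d(np.cov(resid))

def mgc(y1, y2, P):
    """M_{y2->y1}: determinant ratio of restricted to unrestricted residual
    covariances of y1 (section 4.2.2). y1: (M1, N); y2: (M2, N)."""
    P1 = mvar_residual_cov(y1, y1, P)                   # restricted: y1 past only
    S1 = mvar_residual_cov(y1, np.vstack([y1, y2]), P)  # unrestricted: add y2 past
    return np.linalg.det(P1) / np.linalg.det(S1)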
4.3 Canonical Granger Causality (CGC)

We introduce a new measure, CGC, that combines ideas from canonical correlation analysis and GC analysis. Like MGC, CGC enables the estimation of causality between two ROIs containing multiple time series. However, CGC uses a more parsimonious model than MGC, which makes it less sensitive to noise and less prone to overfitting. Like GCCA, CGC uses weighted sums of time series to represent each region. However, CGC measures the amount of causality and thus should be able to differentiate between causal relationships that have different strengths.

Considering the vector-valued time series $y_1[n]$ and $y_2[n]$ introduced above, we define CGC as
$$C_{y_2 \to y_1} = \max_{\substack{\alpha \in \mathbb{R}^{M_1},\ \beta \in \mathbb{R}^{M_2} \\ \|\alpha\|_{\ell_2} = \|\beta\|_{\ell_2} = 1}} G_{\beta^T y_2 \to \alpha^T y_1}$$
where $G$ represents the Granger causality defined in equation 4.1. CGC is obtained by computing the standard GC on scalar-valued time series $\hat{x}_1 = \alpha^T y_1$ and $\hat{x}_2 = \beta^T y_2$ obtained from a linear combination of the original time series from each ROI. The linear combination weights for each ROI, $\alpha$ and $\beta$, are optimized to maximize the GC between $\hat{x}_1$ and $\hat{x}_2$. Note that we have constrained $\alpha$ and $\beta$ to have unit norm because GC is invariant to rescaling of the data.

Given the unit-norm constraint, the estimation of $\alpha$ and $\beta$ requires estimation of $M_1 + M_2 - 2$ parameters. Combined with the estimation of $5P$ AR coefficients for standard GC (cf. section 4.2.1), CGC requires the estimation of $M_1 + M_2 - 2 + 5P$ parameters. Note that CGC has substantially fewer parameters to estimate compared to the $M_1^2 P + (M_1+M_2)^2 P$ parameters that must be estimated for MGC (cf. section 4.2.2), and that this difference is particularly pronounced for large values of $P$, $M_1$, or $M_2$. As we will demonstrate in section 4.4.2, this reduction in the number of parameters results in CGC being more stable and accurate in identifying ROI-to-ROI causality than MGC for short data records. However, we note that CGC is an ROI-wise measure that finds the strongest causal interaction between the two ROIs, and hence may not provide direct insight into the individual signal interactions if there are multiple causal connections between ROIs.

Computation of CGC requires optimization with respect to weight vectors $\alpha$ and $\beta$. Due to the unit-norm constraints, $\alpha$ lies on the $(M_1-1)$-sphere $S^{M_1-1}$, and $\beta$ lies on the $(M_2-1)$-sphere $S^{M_2-1}$. In appendix B we exploit the structure of these constraint sets to introduce a numerical algorithm to optimize these weights, using a variation of a technique developed for optimization on the Stiefel manifold [EAS98].

4.4 Simulations

We compared the properties of the CGC, MGC, and GCCA measures using simulated time series. Performance was evaluated by examining each method's ability to find true causality while avoiding false causality. We test the measures on simulated time series with varying numbers of samples ($N$), model order ($P$), and noise levels as described below.

4.4.1 Simulated Models

We simulated directed networks of 2 ROIs (ROI$_1$ and ROI$_2$) with $M_1 = M_2 = 4$. Simulations were constructed from AR models, and included interference and noise (described in section 4.4.1.2). Simulations were generated for model orders $P \in \{2, 4, 6, 8, 10\}$, number of time points $N \in \{100, 200, 400\}$, signal-to-interference ratios $\mathrm{SIR} \in \{1, 5, 25\}$, and signal-to-noise ratios $\mathrm{SNR} \in \{1, 5, 25\}$, with a fixed number of interferers ($I = 2$).

4.4.1.1 Simulated Signals of Interest

Each ROI has one underlying time series involved in information transfer between the ROIs, denoted by $x_1$ and $x_2$ respectively. We simulate these signals using bivariate AR processes according to:
$$\begin{bmatrix} x_1[n] \\ x_2[n] \end{bmatrix} = \sum_{p=1}^{P} \begin{bmatrix} D[p] & A[p] \\ 0 & D[p] \end{bmatrix} \begin{bmatrix} x_1[n-p] \\ x_2[n-p] \end{bmatrix} + \begin{bmatrix} \eta_1[n] \\ \eta_2[n] \end{bmatrix}$$
Here, the past of $x_1$ has no effect on $x_2$ because the corresponding coefficient in the AR model is zero. The only possible direction of causality is then from ROI$_2$ to ROI$_1$, modulated by $A[p]$, $p = 1, \ldots, P$. The process is driven by simulated zero-mean white Gaussian innovations $\eta_1$ and $\eta_2$.
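The signals of interest can be generated directly from the block-triangular AR model above. The sketch below folds in the coefficient construction described next (section 4.4.1.1, steps 1-4); the burn-in period used to let transients decay is an added assumption, and, as the text notes, stability of the generated process is not guaranteed by this construction.

import numpy as np

def simulate_pair(P, N, coupled, rng, burn=500):
    """Bivariate AR(P) in which x2 may drive x1 but never the reverse."""
    base = 1.0 / np.arange(1, P + 1)                    # d[p] = 1/p
    D = 0.3 * base                                       # D[p] = 0.3 * d[p]
    A = base * rng.uniform(0.15, 0.3, P) if coupled else np.zeros(P)
    order = rng.permutation(P)                           # shuffle lag indices
    D, A = D[order], A[order]
    x1, x2 = np.zeros(N + burn), np.zeros(N + burn)
    for n in range(P, N + burn):
        past1 = x1[n - P:n][::-1]                        # [x1[n-1], ..., x1[n-P]]
        past2 = x2[n - P:n][::-1]
        x1[n] = D @ past1 + A @ past2 + rng.standard_normal()
        x2[n] = D @ past2 + rng.standard_normal()        # no x1 -> x2 coupling
    return x1[burn:], x2[burn:]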
To test the ability of the three metrics to detect causality, we generate two types of processes: those for which there is no causality (A[p] = 0, p = 1, \ldots, P) and those for which there is causality from x_2 to x_1 (A[p] \neq 0 for some p = 1, \ldots, P). In each case, we generate the coefficients as follows for a given order P:

1. Construct two sequences:

d[p] = \frac{1}{p}, \quad p = 1, \ldots, P
a[p] = \text{Uniform}[0.15, 0.3], \quad p = 1, \ldots, P

where Uniform[a, b] is a uniformly-distributed random variable between a and b.

2. Set D[p] equal to 0.3 \cdot d[p].

3. For no causality, set A[p] = 0, p = 1, \ldots, P.
For causality, set A[p] = d[p] \cdot a[p].

4. Randomly reorder the AR matrices:

(a) Let \{q_1, q_2, \ldots, q_P\} be a random shuffling of the indices \{1, \ldots, P\}.
(b) Assign D[q_1] \to D[1], D[q_2] \to D[2], \ldots, D[q_P] \to D[P].
(c) Apply the same reordering to A[p], p = 1, \ldots, P.

Reordering of the AR coefficients eliminates the monotonicity of the coefficients. Such monotonicity may admit lower-order AR modeling of signals (e.g. an AR model with P = 2 may sufficiently fit signals from an AR process simulated above for P = 4). This method does not guarantee generation of a stable AR process, but was found to always generate a stable process in our simulations.

4.4.1.2 Observation Model

Given the underlying signals x_1 and x_2, we simulated the measured vector-valued time series for each ROI according to

y_r[n] = g_r x_r[n] + \frac{\xi_r[n]}{\text{SIR}} + \frac{\varepsilon_r[n]}{\text{SNR}}, \quad r = 1, 2

where g_r \in \mathbb{R}^4 is a vector describing the relative contribution of x_r to each of the 4 time series in ROI r (drawn uniformly from the unit sphere in \mathbb{R}^4 in our simulation), \xi_r represents simulated interference, and \varepsilon_r is simulated measurement noise (white Gaussian with unit variance in our simulation).

The interference signals \xi_1 and \xi_2 are vector-valued time series used to model background brain activity that is not associated with causality between the two ROIs. \xi_1 and \xi_2 are independent of each other, the signals x_1 and x_2, and the noise time series \varepsilon_1 and \varepsilon_2. They are simulated as a sum of scalar AR time series z_{r,i}[n], i = 1, 2, according to:

\xi_r[n] = \sum_{i=1}^{2} h_{r,i} \, z_{r,i}[n], \quad r = 1, 2

where h_{r,i} represents the contribution of each interferer to each time series within its corresponding region, and is drawn uniformly at random from the unit sphere in \mathbb{R}^4. Within each region, the scalar AR time series are generated by autoregression of the same order P as the sources' model for each region r = 1, 2:

\begin{bmatrix} z_{r,1}[n] \\ z_{r,2}[n] \end{bmatrix} = \sum_{p=1}^{P} \begin{bmatrix} D[p] & A[p] \\ A[p] & D[p] \end{bmatrix} \begin{bmatrix} z_{r,1}[n-p] \\ z_{r,2}[n-p] \end{bmatrix} + \begin{bmatrix} \nu_1[n] \\ \nu_2[n] \end{bmatrix}

with the coefficients D[p] and A[p], p = 1, \ldots, P, copied from the source simulation in the case of causality (so A[p] \neq 0, p = 1, \ldots, P) in section 4.4.1.1.

4.4.2 Detecting Causality

We used receiver operating characteristics (ROCs) to evaluate the performance of CGC and MGC. The ROC curve is a plot of the true positive rate versus the false positive rate computed from an appropriate statistic as a function of a decision threshold [Met78]. For the simulation setup in section 4.4.1, we define a "true positive" as occurring when the causality from ROI_2 to ROI_1 exceeds the decision threshold when A[p] is nonzero, while a "false positive" occurs when the causality from ROI_2 to ROI_1 exceeds the decision threshold while A[p] = 0, p = 1, \ldots, P.

We performed T = 1500 Monte Carlo simulations, computing for each simulation

• CGC: C_{y_2 \to y_1},
• MGC: M_{y_2 \to y_1}, and
• to establish a best-case performance, bivariate GC using a linear combination of signals based on the true weights (g_1 and g_2) used in the simulation: G_{g_2^T y_2 \to g_1^T y_1}.

The empirical ROC curves for each measure derived from these simulations are shown in figure 4.2 for the simulations where N = 400, SIR = 5 and SNR = 1. It can be seen from each of the plots that CGC shows a higher true positive rate than MGC for a given false positive rate.
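The empirical ROC underlying figure 4.2 is a direct computation on the Monte Carlo causality values; a minimal sketch, assuming NumPy arrays of causality estimates from the causal (A[p] \neq 0) and non-causal (A[p] = 0) simulations:

```python
import numpy as np

def empirical_roc(scores_causal, scores_null):
    """Sweep a decision threshold over all observed causality values:
    a true positive is a causal simulation exceeding the threshold,
    a false positive is a non-causal simulation exceeding it."""
    thresholds = np.unique(np.concatenate([scores_causal, scores_null]))[::-1]
    tpr = np.array([(scores_causal > t).mean() for t in thresholds])
    fpr = np.array([(scores_null > t).mean() for t in thresholds])
    return fpr, tpr
```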
Increasing the order P of the simulated AR processes amplifies the difference in performance between CGC and MGC, due to the accuracy and low number of parameters in CGC. Additionally, our simulations show that applying GC to representative ROI signals formed using perfect prior knowledge of the original weights g_1 and g_2 yields better performance than CGC, as expected.

In order to further compare the performance of CGC to MGC, we summarize the ROCs by calculating the area under the ROC curve (AUC). The AUC measures the ability of each measure to discern true directions of causality, averaged over all possible choices of the false positive rate. In order to estimate a 95% confidence interval of the AUC non-parametrically, we performed 1000 bootstrap simulations [ET93].

In figure 4.3 the AUC is plotted, along with its bootstrapped confidence intervals, as a function of P and N, setting SIR = SNR = 5. As N decreases, the performance of MGC deteriorates substantially, while CGC is still able to reasonably differentiate true causality from false causality. The difference between the performances of CGC and MGC grows even more as P increases. Specifically, for P = 8 the number of parameters required for MGC is (4^2 \cdot 8 + (4+4)^2 \cdot 8) / (4 + 4 + 5 \cdot 8) = 40/3 times more than the number of parameters required for CGC. As before, the performance of MGC and CGC is surpassed by the performance of GC with known weights, though the performance of both MGC and CGC approaches that of GC with known weights as N increases. The AUC confidence intervals for the full parameter set are shown in tables C.1, C.2 and C.3 of appendix C. As the SNR/SIR decreases, the performance difference between CGC and MGC gets smaller, especially for short recordings (length 100) and high model order (P = 6 or P = 8). Nevertheless, CGC always performs better than MGC over the whole range of simulations.

Figure 4.2: ROCs for bivariate GC, CGC and MGC. The dark lines are the mean ROCs, with the shadows indicating the 95% confidence intervals after bootstrapping. CGC demonstrates better performance than MGC for causality with higher complexity, due to the simplified signal model and the maximization of causality.

Figure 4.3: AUC as a function of autoregressive simulation order P, number of samples N, SIR and SNR, along with 95% confidence intervals for AUC after bootstrapping. CGC is more robust to small N and large P than MGC, as seen by the growing gap in performance.

CGC does not perform as well as GC with known weights due to the upward bias in CGC: by calculating a maximum over weights, CGC is guaranteed to be higher than GC with known weights. This leads to an increased rate of false positives when there is no causality. For long time series, the difference between estimated causalities in true and false directions is significantly larger than this bias. However, for short time series, the bias increases and therefore the ROC performance of CGC decreases.

4.4.3 Estimating Causality

In addition to CGC's detection power, we analyzed the ability of CGC and GCCA to scale with increased causal strength from ROI_2 to ROI_1. For each simulation, we calculate how well CGC and GCCA match the underlying GC by taking the squared differences:

\text{error}_{\text{CGC}} = \left( C_{y_2 \to y_1} - G_{g_2^T y_2 \to g_1^T y_1} \right)^2
\text{error}_{\text{GCCA}} = \left( R_{y_2 \to y_1} - G_{g_2^T y_2 \to g_1^T y_1} \right)^2

We then determine the 95% confidence interval of the differences over all choices of sample size and model order in figure 4.4 in the case where SIR = SNR = 5.
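The bootstrapped confidence intervals used throughout this section — for the ROCs, the AUC, and the error differences here — follow the same resampling pattern. A minimal sketch for the AUC case, computing the area via the rank (Mann-Whitney) statistic; the helper names are ours:

```python
import numpy as np

def auc(scores_causal, scores_null):
    """Area under the ROC via the Mann-Whitney rank statistic."""
    diff = scores_causal[:, None] - scores_null[None, :]
    return (diff > 0).mean() + 0.5 * (diff == 0).mean()

def bootstrap_auc_ci(scores_causal, scores_null, B=1000, level=0.95, seed=None):
    """Nonparametric confidence interval for the AUC by resampling
    the Monte Carlo causality values with replacement."""
    rng = np.random.default_rng(seed)
    stats = np.empty(B)
    for b in range(B):
        c = rng.choice(scores_causal, size=len(scores_causal), replace=True)
        u = rng.choice(scores_null, size=len(scores_null), replace=True)
        stats[b] = auc(c, u)
    tail = 100 * (1 - level) / 2
    return np.percentile(stats, [tail, 100 - tail])
```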
The full results for each simulation choice are in tables C.4, C.5, C.6 and C.7 of appendix C. For every choice of simulation parameters, we see that the difference in error (GCCA minus CGC) is positive for a majority of the time, indicating that CGC is generally better than GCCA at matching the causality between the true regional signals. In addition, this property appears to depend not on the power of the interferers but rather on the power of the white noise as well as the complexity of the autoregression. More specifically, as the complexity of the autoregressive model increases, GCCA becomes increasingly worse than CGC at matching the underlying bivariate causality.

Figure 4.4: Difference between CGC's error with respect to the true GC and GCCA's error with respect to the true GC. As N increases, the errors get smaller yet CGC becomes increasingly better at estimating the true causality. Each percentage is the number of simulations where CGC was better. As P increases, the errors get larger and CGC is better for a larger percentage of simulations.

4.5 Real Experimental Data

We also applied CGC and MGC to estimate causality between brain regions using data collected during a visuomotor task performed by a macaque [BCN93]. The data for this task was made available by Dr. Steven Bressler at Florida Atlantic University, Boca Raton, FL, USA, and consisted of local field potential (LFP) recordings as explained below.

4.5.1 Experimental Setup

A macaque was implanted with transcortical bipolar electrodes to record LFPs at multiple brain locations on one hemisphere, including multiple recordings in the striate and prestriate. The macaque was also trained to perform a visuomotor decision-making task. The macaque initiated each trial of the experiment by depressing a lever, after which a visual stimulus appeared for 100 ms. The visual cue would inform the monkey to either let go of the lever ("Go") or keep the lever down ("NoGo").

Voltages were recorded at a 200 Hz sampling rate. Defining the lever's initial descent to be at time t = 0, we focus on a time interval up to 300 ms after stimulus (t = 300 ms). After pooling together multiple runs, we have 480 trials for each condition: "Go" (corresponding to a stimulus image of four dots in the shape of a diamond) and "NoGo" (corresponding to a stimulus image of four dots in the shape of a line).

4.5.2 Preprocessing

In order to examine causality in visual cortex, we analyzed time series with a window length of 100 ms (20 samples), computing time-varying estimates of causality over the period 0–300 ms, based on the work of [DBYL00]. The 6 electrodes located in the visual cortex naturally form two anatomical ROIs: S_1, S_2 and S_3 within a striate ROI, and IT, P_1 and P_2 within a prestriate ROI, as illustrated in figure 4.5. These ROIs and their connectivity have been associated with visual pattern recognition [MUM83, LDNB00]. The focus of our analysis was on the alpha-band and beta-band activity based on previous work [BDY99, CBD06]; therefore, we applied a 14th-order elliptical filter to the data with a passband of 8–30 Hz and a stopband attenuation of 60 dB for frequencies below 5 Hz or above 40 Hz. We subtracted the ensemble mean from the signals to achieve approximate second-order stationarity. We then scaled each time series to have unit power.

The CGC and MGC measures were computed independently for each 100 ms time window between the striate and prestriate ROIs, and the results were plotted as a function of time. Separate estimates of the AR parameters were computed for the CGC and MGC models using all "Go" and all "NoGo" trials. Bootstrapping with 1000 resamples was used to compute confidence intervals on all estimates.

Figure 4.5: Location of the 6 LFP channels in the monkey recorded during the experiment. We analyzed striate-prestriate causal interactions related to the stimulus: a visual cue to perform a motor task.
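For concreteness, the preprocessing just described might be sketched as follows; the 0.5 dB passband ripple and the use of zero-phase (forward-backward) filtering are our assumptions, since the text specifies only the filter order, band edges and stopband attenuation.

```python
import numpy as np
from scipy.signal import ellip, filtfilt

fs = 200.0  # LFP sampling rate in Hz
# 14th-order elliptical bandpass (an order-7 design doubles for a bandpass),
# 8-30 Hz passband, 60 dB stopband attenuation.
b, a = ellip(7, 0.5, 60.0, [8.0, 30.0], btype="bandpass", fs=fs)

def preprocess(trials):
    """trials: (n_trials, n_samples) array for one channel."""
    x = filtfilt(b, a, trials, axis=-1)      # bandpass each trial
    x = x - x.mean(axis=0, keepdims=True)    # subtract the ensemble mean
    return x / np.sqrt((x ** 2).mean(axis=-1, keepdims=True))  # unit power
```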
4.5.3 Striate-Prestriate Causality

The resulting CGC and MGC measures for each condition are plotted as a function of time in figure 4.6. Both measures indicate that the peak causality from striate to prestriate occurs in both tasks at 57.5 ms. However, the difference between the two tasks is more pronounced for CGC than MGC; in fact, for CGC there is a statistically significant (p < 0.02) difference in causality between the two tasks when correcting for multiple comparisons across time through the Benjamini-Hochberg procedure [BH95].

Figure 4.6: Dynamic causality in the occipital lobe during a visuomotor decision task: (c) striate (S_1, S_2, S_3) to prestriate (P_1, P_2, IT); (d) prestriate (P_1, P_2, IT) to striate (S_1, S_2, S_3). Both CGC (top row) and MGC (bottom row) find causality at 57.5 ms from striate to prestriate (left column), but only CGC finds a statistically significant difference in causality between the two conditions. This prestriate-to-striate causal interaction may be part of the visuomotor processing in "Go" that is not required for "NoGo".

The task-related difference in causality may be attributed to the increased complexity in the stimulus image of a diamond as opposed to the stimulus image of a line [BDY99].

In addition, both measures indicate an increase in causality from prestriate to striate specific to the "Go" task at 137.5 ms. Again, the difference between "Go" and "NoGo" at this time point is quite pronounced for CGC. There is a statistically significant (p < 0.01) difference in causality between these two conditions for CGC when correcting for multiple comparisons as before, while MGC does not demonstrate such statistically significant differences at this level between the two conditions at any point during the experiment. Signal propagation from the prestriate to striate is found in visuomotor processing [LBB03], where the visual stimulus causes a motor action. The differences in connectivity as observed using CGC for "Go" and "NoGo" can be explained as a result of this signal interaction, since "Go" involves visual and visuomotor networks, while "NoGo" involves only visual networks.

4.5.4 Signal Subspaces

The computation of CGC also yields the linear combination weights and the signals of interest for each ROI. We analyzed the weights at the two time points, 57.5 ms and 137.5 ms, corresponding to peak differences in causality between "Go" and "NoGo." Results are only shown for the "Go" task, which demonstrated causality at both time points. In figure 4.7 we plot the weight vectors estimated in CGC in each direction of causality. At each sensor, we also plotted the confidence interval using the previous bootstrap procedure.

At the first time of large causality (57.5 ms), the signals from S_3 for the striate and P_2 for the prestriate contribute substantially to the linear combination used in CGC, as indicated by the large magnitude of the weights at those sensor locations. Later, at 137.5 ms, only P_2 contributes substantially to the linear combination for the prestriate, while both S_1 and S_3 contribute for the striate. The presence of S_3 and P_2 is consistent with previous analysis of this data [LDNB00]. However, in the portion of the response related to the visuomotor network, the presence of the sensor S_1 is inconsistent with those previous analyses and therefore needs to be further investigated.
Figure 4.7: Analysis of the linear combination weights estimated by CGC during peak causality: (a) from striate to prestriate at 57.5 ms; (b) from prestriate to striate at 137.5 ms. The main plot shows the relative magnitude of the weight estimated for each channel; the larger the circle radius, the larger the magnitude of the weight for that channel. The dark line shows the mean magnitude of the weight over the full set of trials, while the shaded region shows the bootstrapped confidence interval. We observe that the S_3 and P_2 recordings contribute substantially to the estimated causality at both time instances. However, the analysis also indicates that S_1 has a higher weight in causality from prestriate to striate at 137.5 ms than from striate to prestriate at 57.5 ms.

4.6 Discussion

We proposed a new measure, CGC, that models directed connectivity between two ROIs by estimating optimal weighted sums of the signals from each ROI. The resulting scalar-valued weighted sum can be viewed as a regional signal representing the ROI. In each ROI, the signals of interest are the ones that participate in a causal interaction between ROIs, with the weights representing the spatial topography of these signals. The modeling of CGC stems from the expectation that some recordings in the ROI capture the regional signal. Averaging over those recordings, with proper weighting placed on each recording, should reduce the power of the noise obfuscating each signal. We reduce the amount of information (from many recordings in each region to one signal), but in return we have hopefully not lost too much information pertinent to estimating the causality between the regions. In addition, this collapse of data in each region significantly decreases the number of parameters required to estimate directed connectivity.

Along with CGC's formulation, we introduce a customized numerical procedure to evaluate CGC. This procedure modifies conjugate gradient descent to reflect the structure imposed on the optimization parameters by the constraints in CGC's optimization problem. Conjugate gradient descent optimizes a function by updating the current estimate based on the local change in the optimization function at that estimate. Our procedure modifies the calculation of that local change (i.e. the gradient) so that the updated estimate will naturally satisfy the constraints. In a similar procedure to [EAS98], we also modify the direction of search at each step to speed up convergence. These modifications, detailed in appendix B, were seen in simulation to heavily reduce the runtime in calculating CGC.

With simulated and LFP data, we find CGC to be more robust to sample size and model order than an alternative method for calculating regional causality (MGC). In simulation, CGC showed improved causality detection performance compared to MGC, as shown by the difference in AUC and TPR between causality measures. In LFP data, CGC showed short-time task differences in causality between the striate and prestriate; such statistically significant differences were not observed with MGC. This striate-prestriate causality has previously been associated with tasks involving visual pattern recognition [MUM83, LDNB00] and visuomotor processing where the visual stimulus causes a motor action [LBB03]. CGC is more able to discriminate between causal and non-causal interactions for shorter sample sizes and higher autoregressive model order partially because CGC simplifies the model of regional causality by taking a weighted sum of the signals in each region.
We found in simulation that CGC was a better estimate of the underlying bivariate causality than GCCA. This reflects the nature of the two metrics: CGC is using Geweke's estimate for Granger causality, while GCCA is only using lagged cross-correlation. As a result, GCCA may have a difficult task in differentiating between two unequally strong causal interactions, as both would show high cross-correlation. This difference in estimating causal strength is amplified for higher sample sizes: CGC, for signals of length 400, is almost always a better estimate for causality between signals of interest than GCCA. In addition, GCCA degrades in performance as model order increases, relative to CGC, because GCCA only analyzes correlations of lag equal to the model order, while CGC summarizes the correlations of lags from one up to the model order.

CGC depends on predefined ROIs. In the macaque data, the LFP recordings include two sets of electrodes grouped in two well-defined anatomical ROIs: the striate and prestriate. However, anatomical ROIs are not required for CGC; we can attempt to use decompositions such as principal components (PCA) or associations such as correlation to cluster recordings into distinct functional ROIs. Whether the ROIs are defined anatomically or functionally, CGC is able to more robustly detect causal interactions between ROIs without the need to specify which specific subregions are involved. Moreover, the weight vectors obtained during model fitting provide an estimate of the relative contribution of these subareas to the causal interaction. It is important that the two ROIs are disjoint, to avoid signal mixing between them which would make determination of causality impossible.

CGC provides, in addition to causality, the weights used to estimate the signals in each region related to that causality. In simulations, we saw that for larger sample size (N = 400) and smaller model order (P = 2, 4, 6), CGC estimated causality with similar accuracy to the causality using the correct (known) weights. In the causal network in the macaque, high weights were observed at S_3 and P_2, consistent with previous analysis of this data. However, in the portion of the response related to the visuomotor network, the sensor S_1 also had significant weight, which is inconsistent with earlier findings [LDNB00].

With LFP signals we have shown CGC's ability to find inter-regional connectivity as well as signals of interest in a region. We can extend this work to other functional brain modalities such as electrocorticography (ECoG), electroencephalography (EEG) and magnetoencephalography (MEG). While the LFP signals are directly related to local neuronal populations, the sensors in ECoG, EEG and MEG are sensitive to much larger areas of neuronal activity. Using cortical current density mapping [BML01] can reduce this to some degree, but resolution remains poor relative to invasive measurements with bipolar microelectrodes. If two ROIs are widely separated, then CGC should produce meaningful results for EEG and MEG data. However, in cases where there is significant cross-talk or linear mixing between ROIs, the causality measures can be compromised [HPBL10]. In that case, it may be possible to modify the CGC approach to first remove linear mixing between ROIs [BWL+11, HHC+12, SPJ+10] before computing causality.

The model presented in this paper can be extended to the frequency-domain methods of finding causality [Gew82, KDTB01]. By decomposing causality into different frequencies, we may be able to discover significant causality over restricted frequency bands when there is no significant overall causality.
In many neuroimaging cases, it has been found that causal interactions are localized in frequency [WLF08, WvDKH09, LHS+09]; however, frequency decompositions are difficult for all but the simplest of cases [Chi12]. Another extension of CGC would allow multiple signals of interest in each region to interact; hence a measure of causality could be determined that lies somewhere between CGC and MGC. This extension may better approximate causal relations in the brain, where it may be expected that multiple signals in one ROI, none uniquely determined by any channel in that ROI, are interacting with multiple signals in a second ROI.

Another possible extension of CGC is to identify direct versus indirect connectivity between multiple ROIs by using partial measures of causality [BS01, GSK+08]. The principle behind partial causality is to determine causal interactions between two regions that are not explained by mutual interactions with other ROIs, thus offering a possible description of direct causal interaction. Such a procedure was described for MGC [BS10] by estimating an MVAR model even larger than the bi-regional model, which performed poorly in our simulated and real studies. GCCA worked around the sharp increase in model estimation by simply regressing out any signals that may indirectly contribute to the causal relationship of interest. A partial extension to CGC would need to combine the benefits of each approach to keep the model order low while still estimating the causal interactions and their strengths.

4.7 Summary

We presented a novel measure, CGC, that models the causal interaction between ROIs by borrowing ideas from canonical correlation and Granger causality. With this new measure, causality between ROIs is modeled through weighted sums of the signals in the individual ROIs. We also presented a numerical method to determine the weights and corresponding causality from one ROI to another. In simulation, we demonstrated that CGC can effectively differentiate both the presence and strength of causality for complex model order, moderate noise levels and low data sizes. In an application on macaque LFP data from a visuomotor task, we demonstrated CGC's ability to determine dynamic differences in causality in the occipital lobe corresponding to the type of visual stimulus and visuomotor response.

Chapter 5

ADMM & Heteroscedasticity

The previous chapter focused on Granger causality in the mean, which analyzes the effect of one signal on the average trend of a signal. In this chapter, we continue the exploration into Granger causality by looking at Granger causality in variance, which analyzes the effect of one signal on the deviation from the average trend of a signal. This behavior is modelled via autoregression with conditional heteroscedasticity (ARCH). We construct a new algorithm which combines variable splitting and the alternating direction method of multipliers (ADMM) to estimate the parameters of an ARCH model. This algorithm is able to estimate ARCH models much faster than current methods, providing the opportunity to analyze dynamic changes in causality in variance with reasonable runtime.

5.1 Introduction

The simplest model for Granger causality in mean is the linear autoregressive (AR) model [CC75, Gew82]. This model relates the past of one time series to the present of another by a single coefficient, whose magnitude provides the strength of causality [TP93, MSAW01, BS01]. AR models have been extended to handle indirect causal relationships [BKK04, MPS12, CWM12], groups of time series [AHJL13, BS10, WCK+11] and nonlinear relationships [MPS08a, MPS08b, MLCS11].
In linear AR models, there are a myriad of estimation techniques; current procedures allow robust estimation of AR models with relatively few samples and a short runtime [Bri01, SM05].

For Granger causality in variance, the effect of the past values of a set of time series on the present variance of that same set is termed conditional heteroscedasticity [Eng82]. The simplest model for this phenomenon is the linear autoregressive conditional heteroscedasticity (ARCH) model [Wei86, EK95]. This model relates the past values to the present covariance through the use of products of the past values between all possible pairs of time series, providing insight into the modulation of a time series's volatility by external factors [PP04, Cap07]. ARCH models have many forms, increasing the complexity of the relationships between the paired past products and the present covariance [Bol90, TT02, BLR06, Bol08]. In this manuscript, we will focus on a widely-applicable ARCH model: the VEC version, which assumes the present covariance is a sum of a constant covariance and a weighted sum of all the paired past products [BEN94].

There is no closed-form solution for the parameters in the VEC version of the ARCH model. The most common numerical technique is a gradient-based approach applied to a maximum-likelihood problem [GRD99, HP09]. A large number of parameters must be estimated through the use of conjugate gradient descent, which requires calculation of the log-likelihood, score and information matrix [EK95, VAG13]. With M time series and R lags in the VEC version of the ARCH model, gradient descent must estimate \frac{1}{2}M(M+1)\left(1 + \frac{R}{2}M(M+1)\right) coefficients [ST09]. The computational cost of each iteration is extremely high, even though gradient descent requires only a few iterations. Recently, another technique has used the dual formulation of the maximum-likelihood optimization problem to incorporate stability constraints [CO12], but this technique still requires the expensive calculation of the log-likelihood and score function.

In this manuscript, we describe a new optimization method to estimate the VEC version of the ARCH model. We use variable splitting along with the alternating direction method of multipliers (ADMM) to break down the estimation of the model into a series of relatively fast, though suboptimal, iterations. We demonstrate that this technique estimates ARCH models faster than current gradient descent methods with no loss in accuracy. This property holds for all choices of the number of time series and length of time series. We then apply our method, with standard formulas for estimating Granger causality in variance from the VEC version of an ARCH model, to study the temporal dynamics of voltage signals generated by the human brain during a resting state.

5.2 Methods

5.2.1 Autoregressive Conditional Heteroscedasticity

Let \varepsilon[n] \in \mathbb{R}^M be a vector-valued stationary time series with zero mean at each time n given the value of \varepsilon at any other time. The variance of \varepsilon[n] is constant as a function of n due to stationarity, but the conditional variance of \varepsilon[n] given its past \varepsilon[n-1, n-2, \ldots] may be a function of those past values. We will assume that the signal is R-th-order Markov, meaning that only the most recent R past values affect the variance at time n.
Together, these assumptions can be described by

E\{\varepsilon[n] \mid \varepsilon[n-1], \varepsilon[n-2], \ldots\} = 0
\operatorname{Var}(\varepsilon[n] \mid \varepsilon[n-1], \varepsilon[n-2], \ldots) = \operatorname{Var}(\varepsilon[n] \mid \varepsilon[n-1], \ldots, \varepsilon[n-R])

To construct the conditional heteroscedasticity model, we first define, for a symmetric matrix A, the half-vectorization operator vech(·) that stacks the lower-triangular elements of A into a column:

\operatorname{vech}(A) = \begin{bmatrix} A_{11} & \cdots & A_{1M} & A_{22} & \cdots & A_{2M} & A_{33} & \cdots & A_{MM} \end{bmatrix}^T

We also define the opposite of that operator, hcev(·), which creates a fully symmetric matrix from a vector constructed using the half-vectorization operator.

We use the vector autoregressive conditional heteroscedasticity (vARCH) model [Eng82] to describe the conditional variation using the homoscedasticity vector w \in \mathbb{R}^{\frac{1}{2}M(M+1)} and conditional heteroscedasticity matrices C_r \in \mathbb{R}^{\frac{1}{2}M(M+1) \times \frac{1}{2}M(M+1)}, r = 1, \ldots, R:

\varepsilon[n] \mid \varepsilon[n-1, \ldots, n-R] \sim \mathcal{N}(0, H[n])   (5.1)

where H[n] represents the conditional covariance at time n and is modelled by

\operatorname{vech}(H[n]) = w + \sum_{r=1}^{R} C_r \operatorname{vech}\left( \varepsilon[n-r] \, \varepsilon^T[n-r] \right)

For the bivariate case M = 2, the element-wise formulation of the conditional covariance is

\begin{bmatrix} H_{11}[n] \\ H_{12}[n] \\ H_{22}[n] \end{bmatrix} = \begin{bmatrix} w_{11} \\ w_{12} \\ w_{22} \end{bmatrix} + \sum_{r=1}^{R} \begin{bmatrix} C_{11,r} & C_{12,r} & C_{13,r} \\ C_{21,r} & C_{22,r} & C_{23,r} \\ C_{31,r} & C_{32,r} & C_{33,r} \end{bmatrix} \begin{bmatrix} \varepsilon_1^2[n-r] \\ \varepsilon_1[n-r] \varepsilon_2[n-r] \\ \varepsilon_2^2[n-r] \end{bmatrix}

In this form, it is clear that each variance and covariance at time n is a constant plus a weighted sum of all possible products of pairs of signals at the lagged times n-1, \ldots, n-R.

The order R can be computed using a Lagrange multiplier statistic [EK95], which reduces to a regression of the squared signal \varepsilon^2[n] on a constant signal and the lagged squared signals \varepsilon^2[n-r], r = 1, \ldots, R.

5.2.2 Maximum Likelihood Estimation

Assuming that the joint distribution of \varepsilon[1, 2, \ldots, R] is uniform, the likelihood of a set of parameters (w, C_1, \ldots, C_R) is found using the chain rule of probability:

L(w, C_1, \ldots, C_R) = p(\varepsilon[1, \ldots, N]; w, C_1, \ldots, C_R)
= \prod_{n=R+1}^{N} p(\varepsilon[n] \mid \varepsilon[n-1, \ldots, 1]; w, C_1, \ldots, C_R)
= \prod_{n=R+1}^{N} p(\varepsilon[n] \mid \varepsilon[n-1, \ldots, n-R]; w, C_1, \ldots, C_R)
= \prod_{n=R+1}^{N} \frac{\exp\left( -\frac{1}{2} \varepsilon^T[n] H^{-1}[n] \varepsilon[n] \right)}{(2\pi)^{M/2} \sqrt{|H[n]|}}

H[n] = \operatorname{hcev}\left( w + \sum_{r=1}^{R} C_r \operatorname{vech}\left( \varepsilon[n-r] \, \varepsilon^T[n-r] \right) \right)

where the operator |\cdot| represents the determinant of the argument matrix.

Most methods to estimate vARCH use gradient descent on a maximum-likelihood formulation, minimizing the negative log-likelihood [HP09, CO12, FHA+13]

\ell(w, C_1, \ldots, C_R) = \frac{1}{2} \sum_{n=R+1}^{N} \ln|H[n]| + \frac{1}{2} \sum_{n=R+1}^{N} \varepsilon^T[n] H^{-1}[n] \varepsilon[n]

H[n] = \operatorname{hcev}\left( w + \sum_{r=1}^{R} C_r \operatorname{vech}\left( \varepsilon[n-r] \, \varepsilon^T[n-r] \right) \right)

where we have removed the inconsequential constant term \frac{1}{2}(N-1)M \ln 2\pi. We use a sequential quadratic programming (SQP) algorithm [Ber99] as implemented in MATLAB R2013a, which was faster than other gradient descent methods. The SQP algorithm uses relatively few iterations, but each iteration is quite costly due to having to calculate a gradient, an approximate Hessian and an optimal stepsize. We also found that while an analytical calculation of the gradient is more accurate [EK95], a numerical gradient was much faster and led to the same optima in simulation.

5.2.3 Variable Splitting & Dual Ascent

Define the half-vectorized representation of the signal e[n] = \operatorname{vech}(\varepsilon[n] \varepsilon^T[n]). Then, the previous definition of H[n] as the conditional variance of \varepsilon[n] given \varepsilon[n-1, \ldots, n-R] can be used to convert the maximum-likelihood formulation into a constrained optimization problem:

minimize \quad \frac{1}{2} \sum_{n=R+1}^{N} \ln|H[n]| + \frac{1}{2} \sum_{n=R+1}^{N} \varepsilon^T[n] H^{-1}[n] \varepsilon[n]
subject to \quad H[n] = \operatorname{hcev}\left( w + \sum_{r=1}^{R} C_r e[n-r] \right)

Note that in the previous formulation we were using H[n] as a placeholder; here, we now use it as a parameter to optimize.
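A minimal sketch of the half-vectorization operators and the resulting negative log-likelihood may clarify the bookkeeping. The vech ordering used below (row-wise over the lower triangle) is an internal convention of the sketch; it simply must match the ordering assumed for w and C_r.

```python
import numpy as np

def vech(A):
    """Half-vectorization: stack the lower-triangular elements of A."""
    return A[np.tril_indices_from(A)]

def hcev(v, M):
    """Inverse of vech: rebuild the full symmetric M x M matrix."""
    H = np.zeros((M, M))
    H[np.tril_indices(M)] = v
    return H + np.tril(H, -1).T

def neg_log_likelihood(w, C, eps):
    """Negative log-likelihood of the vARCH(R) model, constant dropped.
    w: (M(M+1)/2,); C: (R, M(M+1)/2, M(M+1)/2); eps: (M, N) signal."""
    M, N = eps.shape
    R = C.shape[0]
    e = [vech(np.outer(eps[:, n], eps[:, n])) for n in range(N)]
    nll = 0.0
    for n in range(R, N):
        h = w + sum(C[r] @ e[n - 1 - r] for r in range(R))  # lags 1..R
        H = hcev(h, M)
        nll += 0.5 * np.linalg.slogdet(H)[1]
        nll += 0.5 * eps[:, n] @ np.linalg.solve(H, eps[:, n])
    return nll
```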
We can convert this problem into an unconstrained optimization problem through the use of the augmented Lagrangian, also known as the method of multipliers [Hes69]. We introduce matrices N[R+1, \ldots, N] \in \mathbb{R}^{M \times M} (that represent augmented Lagrange multipliers) to form an unconstrained optimization problem which minimizes the function

\mathcal{L}(w, C_1, \ldots, C_R, H[R+1, \ldots, N], N[R+1, \ldots, N])
= \frac{1}{2} \sum_{n=R+1}^{N} \ln|H[n]| + \frac{1}{2} \sum_{n=R+1}^{N} \varepsilon^T[n] H^{-1}[n] \varepsilon[n]
+ \frac{\rho}{2} \sum_{n=R+1}^{N} \left\| \operatorname{hcev}\left( w + \sum_{r=1}^{R} C_r e[n-r] \right) - H[n] + N[n] \right\|_F^2
- \frac{\rho}{2} \sum_{n=R+1}^{N} \| N[n] \|_F^2

where \|\cdot\|_F^2 represents the squared Frobenius norm of the argument matrix. This unconstrained optimization problem can be solved by dual ascent:

minimize \mathcal{L} to get the best estimates \hat{w}, \hat{C}_1, \ldots, \hat{C}_{R-1}, and \hat{C}_R and the covariances \hat{H}[R+1, \ldots, N] given the Lagrange multipliers N[R+1, \ldots, N], and then

update the Lagrange multiplier matrices using the new estimates of the parameters:

\hat{N}[n] = N[n] + \alpha[n] \left( \operatorname{hcev}\left( \hat{w} + \sum_{r=1}^{R} \hat{C}_r e[n-r] \right) - \hat{H}[n] \right), \quad n = R+1, \ldots, N

for some stepsizes \alpha[R+1, \ldots, N] \in \mathbb{R}.

The main advantage of dual ascent is that the update step is fast. The main disadvantage of this approach is that at each step, we have to find the best estimate for both the vARCH parameters \hat{w}, \hat{C}_1, \ldots, \hat{C}_{R-1}, and \hat{C}_R as well as the substituted covariance matrices \hat{H}[R+1, \ldots, N].

5.2.4 Alternating Direction Method of Multipliers (ADMM)

Rather than finding the optimal estimate for the parameters and covariance matrices at each step, we can use the structure of \mathcal{L} to find iterations that just make progress towards the optimal point. One approach to calculating these suboptimal iterations is the alternating direction method of multipliers (ADMM), which finds the best estimate of the parameters and the covariance matrices separately [BPC+10].

Specifically, \mathcal{L} is a sum of maximum-likelihood terms corresponding to H[n] for each timepoint n and a least-squares fit of the vARCH parameters w and C_r to the conditional covariances H[n]. So, we split the minimization step of dual ascent into two minimization steps, first finding the best fit for the vARCH parameters and then estimating the conditional covariance at each time. The full algorithm is outlined below, using many notations and definitions found in chapter 2:

Precompute the vectors e[n] = \operatorname{vech}(\varepsilon[n] \varepsilon^T[n]) representing single-sample second moments, the vector p = \operatorname{vech}\left( 1_{M \times M} + \left( \frac{1}{\sqrt{2}} - 1 \right) I_{M \times M} \right) of 1s and \frac{1}{\sqrt{2}}s, the SVD

\begin{bmatrix} 1 & e^T[R] & e^T[R-1] & \cdots & e^T[1] \\ 1 & e^T[R+1] & e^T[R] & \cdots & e^T[2] \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & e^T[N-2] & e^T[N-3] & \cdots & e^T[N-R-1] \\ 1 & e^T[N-1] & e^T[N-2] & \cdots & e^T[N-R] \end{bmatrix} = U_e S_e V_e^T,

and the matrix \Psi = \left( V_e S_e^{-1} U_e^T \right) \otimes (\operatorname{diag}\{p\})^{-1} representing a pseudoinverse.

Initialize the augmented Lagrangian coefficient \rho = 40, the iteration counter i = 0 and
the estimates w_0, C_{1,0}, C_{2,0}, \ldots, C_{R-1,0} and C_{R,0} using least-squares regression of e[n] on e[n-1, \ldots, n-R]:

\begin{bmatrix} w_0 \\ \operatorname{vec}(C_{1,0}) \\ \operatorname{vec}(C_{2,0}) \\ \vdots \\ \operatorname{vec}(C_{R,0}) \end{bmatrix} = \Psi \begin{bmatrix} p \circ e[R+1] \\ p \circ e[R+2] \\ \vdots \\ p \circ e[N-1] \\ p \circ e[N] \end{bmatrix}

which requires the pseudoinverse precomputed in the previous step.

Loop over iterations i = 1, 2, \ldots:

Update each substituted covariance matrix using a few fixed-point iterations, replacing H_i[n] with

\operatorname{hcev}\left( w_{i-1} + \sum_{r=1}^{R} C_{r,i-1} e[n-r] \right) + \frac{H_i^{-1}[n] \, \varepsilon_n \varepsilon_n^T \, H_i^{-1}[n] - H_i^{-1}[n]}{2\rho} + N_{i-1}[n]

and then perform gradient descent on those covariance matrices that have not converged.

Compute the primal and dual gaps:

r_i^2 = \frac{1}{N-1} \sum_{n=R+1}^{N} \left\| \operatorname{hcev}\left( w_{i-1} + \sum_{r=1}^{R} C_{r,i-1} e[n-r] \right) - H_i[n] \right\|_F^2

s_i^2 = \frac{\rho^2}{(N-1)^2} \sum_{n=R+1}^{N} \left( \| \operatorname{vech}(H_i[n] - H_{i-1}[n]) \|_{\ell_2}^2 + \| \operatorname{vech}(H_i[n] - H_{i-1}[n]) \, e^T[n-1] \|_F^2 \right).

Terminate the loop if the gaps are small enough; for a tolerance of 10^{-6} the conditions are

r_i^2 \le M(M+1) \cdot 10^{-12} \cdot 10^2, and
s_i^2 \le \left( M(M+1) + \frac{(M(M+1))^2}{2} \right) \cdot 10^{-12} \cdot 10^2.

The termination is enacted early (by 10^2) because ADMM decreases in speed as the search path approaches the optimum.

Increase the augmentation parameter \rho if the gaps have grown: if r_i^2 > 1.05 \cdot r_{i-1}^2 or s_i^2 > 1.05 \cdot s_{i-1}^2, then increase \rho by 10%.

Ascend the Lagrange multipliers using a stepsize of 1, by setting N_i[n] to

N_{i-1}[n] + \operatorname{hcev}\left( w_{i-1} + \sum_{r=1}^{R} C_{r,i-1} e[n-r] \right) - H_i[n].

Update the vARCH parameters using the precomputed pseudoinverse:

\begin{bmatrix} w_i \\ \operatorname{vec}(C_{1,i}) \\ \operatorname{vec}(C_{2,i}) \\ \vdots \\ \operatorname{vec}(C_{R,i}) \end{bmatrix} = \Psi \begin{bmatrix} p \circ \operatorname{vech}(H[2] - N[2]) \\ p \circ \operatorname{vech}(H[3] - N[3]) \\ \vdots \\ p \circ \operatorname{vech}(H[N-1] - N[N-1]) \\ p \circ \operatorname{vech}(H[N] - N[N]) \end{bmatrix}.

Ascend the current parameters to account for the early termination, using M+1 Newton steps on the vARCH parameters:

\Theta_0 = \begin{bmatrix} w_i^T & \operatorname{vec}^T(C_{1,i}) & \cdots & \operatorname{vec}^T(C_{R,i}) \end{bmatrix}^T
\Theta_j + \alpha \left( \mathcal{H}(\Theta_j) \right)^{-1} \nabla_{\Theta_j} L \;\to\; \Theta_{j+1}, \quad j = 1, \ldots, M+1

where \mathcal{H}(\Theta) is the Hessian of L with respect to the parameters and \nabla_\Theta L is the gradient of L with respect to the parameters.

The first step (solving for the vARCH parameters) uses least-squares estimation with a precomputed pseudoinverse and is thus as fast as matrix multiplication. The second step (solving for each substituted conditional covariance matrix) uses a few short parallelizable iterations. The third step (solving for the augmented Lagrange multiplier matrices) is a computationally fast update equation where the optimal stepsize at each iteration is \hat{\alpha}[n] = 1 for all iterations and values of n. The final M+1 Newton steps, used to obtain the maximum likelihood with a tolerance of 10^{-6}, are few compared to the number of gradient descent steps used in the SQP algorithm. In addition, since we are close to the optimum, we can use successive parabolic interpolation to find the optimal stepsize for each gradient descent step quickly [Hea02]. The derivation of each step is shown in appendix D.

5.2.5 Measuring Causality in Mean & Variance

Let M = 2 and \varepsilon[n] = \begin{bmatrix} \varepsilon_1[n] \\ \varepsilon_2[n] \end{bmatrix}, so that our goal is to measure the causality from \varepsilon_2 to \varepsilon_1. Using Granger causality principles, the null hypothesis of no causality from \varepsilon_2 to \varepsilon_1 is

p(\varepsilon_1[n] \mid \varepsilon_1[n-1, \ldots, n-R]) = p(\varepsilon_1[n] \mid \varepsilon_1[n-1, \ldots, n-R], \varepsilon_2[n-1, \ldots, n-R])   (5.2)

For Granger causality in the mean, the metric to test is described in section 2.4.2.1 and reprinted here:

G_{y_2 \to y_1} = \frac{\operatorname{Var}(\varepsilon_1[n] \mid \varepsilon_1[n-1, \ldots, n-R])}{\operatorname{Var}(\varepsilon_1[n] \mid \varepsilon_1[n-1, \ldots, n-R], \varepsilon_2[n-1, \ldots, n-R])} - 1   (5.3)

with a null distribution of \chi^2_R when \varepsilon_2 does not cause \varepsilon_1 in mean [Gew82]. For Granger causality in variance, the metric to test is described in section 2.4.2.2 and reprinted here:

V_{y_2 \to y_1} = (N-1) \, \Theta_J^T \left[ \left( F(\Theta) \right)^{-1} \right]_{JJ}^{-1} \Theta_J   (5.4)

\Theta = \begin{bmatrix} w^T & \operatorname{vec}^T(C_1) & \cdots & \operatorname{vec}^T(C_R) \end{bmatrix}^T   (5.5)

where F(\Theta) = E\left\{ \frac{\partial L}{\partial \theta} \frac{\partial L}{\partial \theta}^T \right\} is the information matrix for the given parameters and J indexes the parameters that map the past of \varepsilon_2 into the conditional variance of \varepsilon_1. This estimated strength of causality in variance has a null distribution of \chi^2_{2R} when \varepsilon_2 does not cause \varepsilon_1 in variance [HH06].
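These parametric nulls translate directly into significance tests. A minimal sketch using SciPy follows; referring the scaled statistic (N-1)·G to the \chi^2_R null for causality in mean is our assumption about the scaling, since the text states only the limiting distribution.

```python
from scipy.stats import chi2

def variance_causality_pvalue(V, R):
    """p-value for the Wald statistic of equation 5.4 under its
    chi-square null with 2R degrees of freedom."""
    return chi2.sf(V, df=2 * R)

def mean_causality_pvalue(G, N, R):
    """Illustrative test for the causality-in-mean measure of
    equation 5.3, referring (N - 1) * G to a chi-square null with
    R degrees of freedom (the scaling is an assumption)."""
    return chi2.sf((N - 1) * G, df=R)
```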
5.3 Results

5.3.1 Simulations

In this section, we will compare our algorithm to standard SQP as implemented in MATLAB R2013a (Mathworks, Natick, MA, USA). For SQP we set the stopping tolerance of the stepsize to 10^{-6}; for our ADMM algorithm, we set the tolerance of the primal and dual gaps to 10^{-6} as described in section 5.2.4.

5.3.1.1 Simulation Parameters

For a given dimension M \in \{2, 3, 4, 5\}, we simulate a vector-valued signal \varepsilon[n] \in \mathbb{R}^M, n = 1, 2, \ldots, N, according to the vARCH model in equation 5.1 with R = 1. To construct the covariance sequence H[2, \ldots, N], we use a simplified form called the BEKK version [EK95]:

H[n] = W + \tilde{C} \, \varepsilon[n-1] \varepsilon^T[n-1] \, \tilde{C}^T   (5.6)

For the homoscedasticity term W we take W = 0.3 \cdot 1_{M \times M} + 0.7 \cdot I_{M \times M}, so that the homoscedastic covariance is 1 on the diagonal and 0.3 off the diagonal. For the conditional heteroscedasticity term, we take \tilde{C}_{rr} = 0.25 for r = 1, \ldots, M (the diagonal), \tilde{C}_{r,r+1} = 0.1 for r = 1, \ldots, M-1 (the elements one above the diagonal), \tilde{C}_{M,1} = 0.1 (the element in the lower left corner) and \tilde{C}_{rc} = 0 for all other positions in the matrix \tilde{C}. The model can be converted to the model we use by vectorization and properties of Kronecker products:

\operatorname{vec}(H[n]) = \operatorname{vec}(W) + \left( \tilde{C} \otimes \tilde{C} \right) \operatorname{vec}\left( \varepsilon[n-1] \varepsilon^T[n-1] \right)   (5.7)
\operatorname{vech}(H[n]) = \operatorname{vech}(W) + C \operatorname{vech}\left( \varepsilon[n-1] \varepsilon^T[n-1] \right)   (5.8)

where C is constructed from the Kronecker product \tilde{C} \otimes \tilde{C} by first removing all rows corresponding to upper-triangular elements of H[n] and then combining (by addition) each pair of columns representing a lower-triangular element and its corresponding upper-triangular element in \varepsilon[n-1] \varepsilon^T[n-1].
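This row-removal and column-combination can be written out explicitly. A sketch, assuming column-major vec (so that vec(C̃ X C̃ᵀ) = (C̃ ⊗ C̃) vec(X)) and the same row-wise lower-triangular vech ordering as in the earlier sketch:

```python
import numpy as np

def vec_index(i, j, M):
    """Position of element (i, j) in column-major vec(A)."""
    return j * M + i

def bekk_to_vec(Ctil):
    """Build the VEC-version matrix C of equation 5.8 from a BEKK
    matrix Ctil, as described above."""
    M = Ctil.shape[0]
    K = np.kron(Ctil, Ctil)                    # acts on vec(eps eps^T)
    rows, cols = np.tril_indices(M)            # vech ordering, i >= j
    keep = [vec_index(i, j, M) for i, j in zip(rows, cols)]
    C = np.zeros((len(keep), len(keep)))
    for out, (i, j) in enumerate(zip(rows, cols)):
        col = K[:, vec_index(i, j, M)].copy()
        if i != j:                             # add the mirrored column
            col += K[:, vec_index(j, i, M)]
        C[:, out] = col[keep]                  # drop upper-triangular rows
    return C
```

For M = 2 this reproduces a 3 x 3 matrix C acting on [\varepsilon_1^2, \varepsilon_1 \varepsilon_2, \varepsilon_2^2]^T, matching the bivariate form in section 5.2.1.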
5.3.1.2 Effect of Augmentation Parameters

To determine the effect of the choice of \rho on ADMM estimation, we run the simulation with N = 1000 \cdot M^2 samples by first simulating 100 + N samples and then discarding the first 100 to remove effects of initialization on the simulation. We then use SQP to compute an optimal log-likelihood L_{SQP}, runtime t_{SQP}, and parameters w_{SQP} and C_{SQP}; these constitute our performance baseline.

For each \rho, let L_\rho be the maximum log-likelihood using ADMM with the augmented Lagrangian parameter \rho. Similarly, let t_\rho be the runtime, w_\rho be the estimated homoscedastic term and C_\rho be the estimated conditional heteroscedasticity term using ADMM with the specified augmented Lagrangian parameter. In figure 5.1, each plot corresponds to a choice of M and shows two quantities - the log-likelihood difference L_\rho - L_{SQP} and the runtime ratio t_{SQP}/t_\rho - as a function of \rho. For bivariate vARCH estimation, the runtime ratio is bigger than 1 for almost all values of \rho, with a maximum of approximately 5.3. In addition, the difference in log-likelihood is on the order of the tolerance; hence, we can conclude that our algorithm is approximately 5.3 times faster with no appreciable change (difference of < 10^{-6}) in the maximum log-likelihood.

Figure 5.1: Runtime & log-likelihood of the optimal parameters as a function of \rho for ADMM, compared against SQP. Each plot corresponds to a different size of the vector-valued signal. For each choice, there is an optimal \rho (shown with an x) which maximizes speed (dashed green line); for all \rho there is negligible change in accuracy (solid blue line). Accuracy is the difference in log-likelihood using SQP versus ADMM (L_\rho - L_{SQP}). Speed is the ratio in runtime using SQP versus ADMM (t_{SQP}/t_\rho).

This runtime difference grows as M grows, as demonstrated in the remaining plots in figure 5.1. For the \rho chosen using the plots in figure 5.1, we tabulate the results as the dimension increases in table 5.1. As dimension increases, ADMM obtains exponentially increasing gains in runtime against SQP. In this table, we have also computed the error in estimating w and C from the two numerical approaches, to show that the differences in estimation of the vARCH parameters are small. We also see that good choices for the augmented Lagrangian parameter change as M increases; we found that a good rule was to use \hat{\rho} = 8 \cdot (M+2) + 2. We can also use \hat{\rho} = 40 while still enjoying major performance gains with respect to SQP for all values of M.

M   \hat{\rho}   L_{\hat{\rho}} - L_{SQP}   t_{SQP}/t_{\hat{\rho}}   \|w_{SQP} - w_{\hat{\rho}}\|   \|C_{SQP} - C_{\hat{\rho}}\|
2   34   0.5302e-6    4.3775    0.5526e-5   2.2420e-2
3   42   0.4538e-6   20.7584    0.1366e-5   0.0049e-2
4   50   0.0201e-6   85.8739    0.0366e-5   0.1302e-2
5   58   0.0080e-6  158.2007    0.0010e-5   0.0373e-2

Table 5.1: For chosen \rho, the differences between ADMM & SQP. As M increases, the speed increase is exponential and the accuracy gets better. In addition, both SQP and ADMM end up very close to the true vARCH parameters in the simulation.

5.3.1.3 Effect of Signal Length

For each choice of M we set \rho based on the values in table 5.1 and again compare the log-likelihood difference L_\rho - L_{SQP} and the runtime ratio t_{SQP}/t_\rho as we vary the number of samples. In figure 5.2 we see that ADMM is generally faster than SQP as the number of samples increases; in addition, the accuracy is usually within tolerance and is sometimes better for low numbers of samples. For each choice of the number of signals M, it can be seen that choosing N \ge 250 \cdot M^2 leads to stable results with ADMM & SQP.

5.3.1.4 Detection of Causality

To test the ability to detect causality in variance, we modify our simulation by changing the conditional heteroscedasticity matrix \tilde{C} \in \mathbb{R}^{2 \times 2} so that \tilde{C}_{11} = \tilde{C}_{22} = 0.4 and \tilde{C}_{21} = 0, so that there is only possibly causality from \varepsilon_2 to \varepsilon_1.

For each choice of \tilde{C}_{12} within \{0, 0.02, 0.04, \ldots, 0.2\}, we calculate the Wald statistic for 325 Monte Carlo simulations using N = 4500 samples. As the coefficient regulating the causal effect from \varepsilon_2 to \varepsilon_1 increases, there is a larger deviation from both null distributions, as seen at the top of figure 5.3. We also see on the bottom left of figure 5.3 that the parametric null distribution (a chi-square distribution with two degrees of freedom) estimates the distribution of the Wald statistic under the condition \tilde{C}_{12} = 0 well.

Figure 5.2: Performance of ADMM versus SQP as a function of the number of samples N. ADMM is generally faster, with increasing speed advantages as N increases. There is negligible change in log-likelihood after 250 x M^2 samples.

Using the parametric null distribution, we can build a receiver operating characteristic (ROC) of the Wald statistic for each choice of \tilde{C}_{12}. At the bottom right of figure 5.3 we plot the ROCs [Met78] for a few of the choices of \tilde{C}_{12}. Causality is reliably detected for small values of the off-diagonal coefficient in the model - the area under the ROC reaching 0.90 and the true positive rate (controlling for a false positive rate of 0.05) reaching 0.5 - as shown in the ROC when \tilde{C}_{12} = 0.1, which is relatively small compared to \tilde{C}_{11} = 0.4.

We can further look at the area under the ROC and the true positive rate for a false positive rate of 0.05 as a function of \tilde{C}_{12} and the number of samples in table 5.2, where we see them increase as the number of samples and the strength of causality increase.

                 area under ROC               true positive rate
\tilde{C}_{12}   N=1500  2000  2500  3000     N=1500  2000  2500  3000
  .02             .540   .510  .502  .546      .058   .034  .066  .062
  .04             .606   .548  .558  .616      .102   .052  .108  .112
  .06             .660   .648  .682  .726      .166   .058  .186  .212
  .08             .724   .694  .772  .832      .238   .118  .312  .388
  .10             .794   .792  .846  .918      .312   .192  .456  .672
  .12             .856   .876  .922  .944      .446   .348  .702  .768
  .14             .898   .916  .958  .976      .572   .428  .836  .908
  .16             .934   .942  .980  .986      .702   .644  .930  .952
  .18             .968   .972  .986  .994      .828   .782  .954  .978
  .20             .970   .980  .990  .998      .858   .882  .986  .994

Table 5.2: Detection ability of the Wald statistic as the number of samples N and ARCH coefficient \tilde{C}_{12} vary. On the left, the area under the ROC increases as a function of the number of samples and the ARCH coefficient; for N \ge 2500 and \tilde{C}_{12} \ge 0.12, the AUC is consistently above 90%. On the right, the true positive rate when controlling for a 5% false positive rate shows the same behavior; for N \ge 2500 and \tilde{C}_{12} \ge 0.12, the true positive rate is consistently above 50%.
From table 5.2 we can also conclude how many samples we need and how reliably we can estimate different strengths of causality. For strong causal relationships (\tilde{C}_{12} \ge 0.12), having at least 2500 samples is required to have a true positive rate above 50% when controlling for a false positive rate of 5%; in this case, the area under the ROC is at least 90%, showing strong power. For weaker causal relationships (\tilde{C}_{12} < 0.12), even 3000 samples is not enough to guarantee a 50% true positive rate with a 5% false positive rate. In fact, the performance degrades rapidly as we reach \tilde{C}_{12} = 0.02. We can conclude that in application we can reliably detect moderate and strong causality in variance when \tilde{C}_{12} \ge 0.12 = \frac{3}{10} \tilde{C}_{11} and we have at least 2500 samples.

5.3.2 Application: Resting-State EEG

To demonstrate the utility of our algorithm, we estimate the causality in variance among recordings of voltages on the surface of the brain in an epileptic patient during a condition of rest (known as the resting state and described in section 2.2.3.3). These intracranial electroencephalographic (icEEG) recordings capture the voltages created by the activity of nearby neurons. We show how causality in variance and causality in mean can together demonstrate consistent dynamic networks of brain activity, characterizing them by whether the causal relationship affects the average trend or the average power of the affected signals.

5.3.2.1 Data Collection

One patient underwent an icEEG evaluation using both electrode patches and depth electrodes. In this retrospective study, we focus solely on the electrode patches, consisting of 41 electrodes placed on the surface of the left hemisphere of the patient's brain. To avoid covariation between electrodes due to external factors such as eye blinks, heart beats or the common reference signal, we use a "differential" referencing: we construct 17 icEEG signals by taking the voltage difference between disjoint pairs of nearby electrodes. Of the 17, seven showed significant conditional heteroscedasticity using the Lagrange multiplier test in section 5.2.1; we will focus on these seven for the remainder of the analysis.

We focus on a one-minute segment of voltage signals recorded at a sampling rate of 1000 Hz. We filtered the data to a range of [1.5, 90] Hz using a tenth-order elliptical infinite-impulse-response filter and then downsampled the data to 250 Hz; this allows us to use smaller lags in autoregressive modelling while preserving the information in the icEEG signals. Since this data is globally non-stationary, we focus on short time windows ten seconds in length with a 0.5 s shift from window to window. Removing the first few and last few windows to avoid filter edge effects, we have 89 windows, each with 2500 samples to analyze. The window length was chosen based on the simulation results in table 5.2.

5.3.2.2 Causality in Mean

For the seven icEEG signals, we computed significantly causal connections in the mean using the methods and null parametric statistics in section 5.2.5. We plot in figure 5.5 the count of significantly causal connections for causality in the mean. We see that many directed pairs of channels are causal in the mean for at least ten of the 89 windows. Many of the causal connections from A39-A40 are bidirectional; the same behavior appears for causal connections involving A20-B32.
On the other hand, both causal connections from B26-B22 are unidirectional, indicating that it may be leading activity in other areas. However, we know that causality in mean is upwardly biased when causality in variance is present, so we should also look at any pairs showing consistent causality in variance.

5.3.2.3 Causality in Variance

After temporally whitening the data with the autoregressive models constructed to compute causality in mean, we then compute causality in variance. We determine significance using the null parametric statistic devised in section 5.2.5 and then count the number of windows for each pair that show significant causality in variance. In figure 5.6 we see that the number of significantly causal connections is much smaller; only four show connectivity for more than ten of the 89 windows.

Of the four, three overlap with causality in the mean: the two connections relating A20-B32 and A32-B20, as well as the connection from B40-B36 to A19-A22. For the bidirectional coupling, causality in variance showed at least 20 significantly causal windows, and causality in mean showed all 89 windows to be significantly causal. This is a strong indication that the causality in mean estimated between A20-B32 and A19-A22 is highly affected by causality in variance. Similarly, the causality in mean from B40-B36 to A19-A22 was significant for 32 windows, while the causality in variance was significant for 14 windows, again indicating that causality in variance may affect the interpretation of causality from the former electrode to the latter.

5.4 Discussion

ARCH modelling is much more difficult than autoregressive modelling, as the generated sequence in ARCH is hidden (the variances H[2, \ldots, N]); hence, least-squares techniques such as the Yule-Walker equations do not readily apply. Currently, the only prudent approach is maximum-likelihood estimation. We have demonstrated a new algorithm to find the maximum-likelihood parameters, which uses a problem with more parameters to optimize but provides iterations fast enough to lead to faster results. As a result, for vARCH estimation, our algorithm provides a speed increase on the order of tens of times, up to 158 times for the case of ARCH modelling of 5 signals. This performance gain unlocks the ability to use ARCH to estimate dynamic networks of causality in variance, as demonstrated in section 5.3.2.

While there are many choices for the variables to split, we chose the covariances because they provide a fast least-squares step when estimating the parameters. In return, we have a cubic equation to solve for the optimal covariance at each time n; in practice, after the first few iterations, a maximum of 20 fixed-point assignments provided
the optimal covariance for each time n in each ADMM iteration. Hence, while other methods or substitutions may be plausible, the use of a few fixed-point assignments and a least-squares estimate provided the fastest combination for each ADMM iteration.

Each step of ADMM is an approximation of the dual ascent minimization step; rather than estimating (\hat{w}, \hat{C}, \hat{H}[2, \ldots, N]) simultaneously, we split the variables into two sets to form the two alternating directions. When the algorithm is near the optimal parameters, the suboptimality of these steps forces the use of more and more iterations. As a result, the algorithm slows down as it gets closer to the optimum. We used a pragmatic adjustment to alleviate this issue: we allowed the primal and dual gaps to reach within 10^2 of the target tolerance and then used M+1 preconditioned dual ascent steps. Since at this step in the algorithm the current parameters are close to the maximum, the optimal stepsize can be found by successive parabolic interpolation with high accuracy. These steps, with good estimates for the optimal stepsizes, provide a much faster route to the optimal parameters than the ADMM steps when the current estimate is very close to the optimum.

One struggle with ARCH modelling is its requirement of constraints, used to prevent the ARCH model from estimating infinite variances. In our approach, by estimating the covariances as well, we prevent the estimation of infinite or negative-definite matrices; upon observing either case, the algorithm would stop and adjust \rho. As a result, our solution provides a valid sequence of covariance matrices. Since this sequence is constructed using the estimated parameters, those parameters must fit the constraints required in the ARCH model. Other gradient descent and dual ascent algorithms have to explicitly provide those constraints within the cost function [CO12].

This form of vARCH modelling is known as the VEC version; other versions such as the BEKK version [EK95], CCC version [Bol90], DCC version [TT02, Eng02] and GDC version [KN98] give different assumptions for the relationship between the present covariance and past second moments. We focused on the VEC version in this chapter due to its generalizability and closeness to the autoregression (AR) models that are used to estimate causality in the mean. We can also add the effect of the Q past covariances H[n-1, n-2, \ldots, n-Q] to the construction of H[n], forming the vector generalized autoregressive conditional heteroscedasticity (vGARCH) model [Bol86], but we use the vARCH model to keep the parameter space as small as possible while still being able to estimate the interesting interactions between past covariation and present covariation.

As demonstrated in the application of ARCH to resting-state EEG data, it is important to note that causality in mean and causality in variance are not separate phenomena [PP04]. Having causality in variance may bring about spurious causality in the mean, and vice versa. One extension of this approach is to provide a combined AR-ARCH model; however, such proposed models have required a large amount of data and use slow techniques to find a solution. The common practice is to use the residuals of AR models to estimate causality in variance, as done above. We hope that in the future we will be able to expand this algorithm to the more general case, where we would expect similar performance gains.

5.5 Summary

We demonstrated a novel algorithm for estimating ARCH models, which allows us to find causality in variance for dynamic systems in reasonable runtime. We used variable splitting and ADMM, which together increase the number of parameters to estimate but provide fast iterations for estimating those parameters. Thus, we can use many more iterations than current gradient descent methods, while still obtaining the maximum likelihood estimate faster. We also showed how such an algorithm can enable the use of ADMM on biological time series, where many models must be estimated in order to capture the evolving causality in the resting brain.

Figure 5.3: Wald statistic as a function of the ARCH coefficient \tilde{C}_{12}. As \tilde{C}_{12} increases, the log-Wald statistic increases; for values of 0.1 and above, the result is excellent power in the test for causality in variance. The empirical distribution of the Wald statistic for \tilde{C}_{12} = 0 is nearly equivalent to a chi-square distribution with two degrees of freedom, as shown on the bottom left, justifying the use of a parametric null distribution.
Using that parametric null, the ROC curves show increasing detection power as \tilde{C}_{12} increases; for \tilde{C}_{12} \ge 0.12, the detection power is nearly perfect.

Figure 5.4: Conditionally heteroscedastic voltage signals on the brain cortex. Red circles correspond to those differential signals for which there is no significant conditional heteroscedasticity, while yellow circles correspond to those differential signals that did demonstrate significant conditional heteroscedasticity. Timecourses for the conditionally heteroscedastic channels are shown on the right for a ten-second window; the burstiness in each signal is clear at 87, 89 and 94 seconds. For each voltage signal, a lag of R = 1 sufficed to capture the ARCH behavior of that signal.

Figure 5.5: Short-time causality in mean among icEEG signals during the resting state: (a) percentage of windows of significant causality; (b) causal connections for at least 10 windows. Many connections show significant causality in mean, indicating a dense network of causal connections in the resting state.

Figure 5.6: Short-time causality in variance among icEEG signals during the resting state: (a) percentage of windows of significant causality; (b) causal connections for at least 10 windows. Fewer connections are found, and many of these connections overlap with those for causality in mean. Agreement with causality in mean may be caused by the effect of causality in variance on estimating causality in mean.

Chapter 6

Conclusion

In this dissertation we presented solutions to three important problems in biological signals: consistency across devices, detection of causality in mean and detection of causality in variance. For consistency across devices, we developed a procedure to compare consistency of biological signals recorded from different devices. We also found, as a result, the preprocessing steps that promoted such inter-device consistency. For detection of causality in mean, we developed a new model to determine such causality between sets of signals. This model, canonical Granger causality, exploits the sharing of signals between nearby sensors to better estimate macro-scale causality. For causality in variance, we constructed a new algorithm to more quickly estimate the necessary conditional heteroscedasticity models. Using an alternating direction method of multipliers, we were able to reduce the runtime enough to allow short-time analysis of causality between biological signals. In all three projects, we used recordings of brain activity to demonstrate the effectiveness of our approaches in determining time-varying functional behavior of a biological system.

Biological signals are difficult to analyze due to their relatively low signal-to-noise ratio and high variability across multiple runs of the same experiment. In addition, most methods to record biological signals use an array of sensors that cover the entire biological system (e.g. the brain). These sensors are sensitive to relatively wide areas of brain activity in comparison to the size of a neuron; hence, there is an overlap of information in the time series recorded between sensors. All three methods presented in this dissertation attempted to increase the benefits and decrease the drawbacks of these properties. For example, we found that appropriate temporal preprocessing increased the consistency of brain activity estimated from MEG data, most likely due to an increase in signal-to-noise ratio. For Granger causality analysis, pooling nearby sensors based on regions of interest detected macro-scale causality by exploiting the shared information of sensors in each region.
Figure 5.3: Wald statistic as a function of the ARCH coefficient $\tilde{C}_{12}$. As $\tilde{C}_{12}$ increases, the log-Wald statistic increases; for values of 0.1 and above, the result is excellent power in the test for causality in variance. The empirical distribution of the Wald statistic for $\tilde{C}_{12} = 0$ is nearly equivalent to using a chi-square distribution with two degrees of freedom, as shown on the bottom left, justifying the use of a parametric null distribution. Using that parametric null, the ROC curves show increasing detection power as $\tilde{C}_{12}$ increases; for $\tilde{C}_{12} \ge 0.12$, the detection power is nearly perfect.

Figure 5.4: Conditionally heteroscedastic voltage signals on the brain cortex. Red circles correspond to those differential signals for which there is no significant conditional heteroscedasticity, while yellow circles correspond to those differential signals that did demonstrate significant conditional heteroscedasticity. Timecourses for the conditionally heteroscedastic channels are shown on the right for a ten-second window; the burstiness in each signal is clear at 87, 89 and 94 seconds. For each voltage signal, a lag of $R = 1$ sufficed to capture the ARCH behavior of that signal.

(a) percentage of windows of significant causality (b) causal connections for at least 10 windows
Figure 5.5: Short-time causality in mean among icEEG signals during the resting state. Many connections show significant causality in mean, indicating a dense network of causal connections in the resting state.

(a) percentage of windows of significant causality (b) causal connections for at least 10 windows
Figure 5.6: Short-time causality in variance among icEEG signals during the resting state. Fewer connections are found, and many of these connections overlap with those for causality in mean. Agreement with causality in mean may be caused by the effect of causality in variance on estimating causality in mean.

Chapter 6

Conclusion

In this dissertation we presented solutions to three important problems in biological signals: consistency across devices, detection of causality in mean and detection of causality in variance. For consistency across devices, we developed a procedure to compare consistency of biological signals recorded from different devices. We also found, as a result, the preprocessing steps that promoted such inter-device consistency. For detection of causality in mean, we developed a new model to determine such causality between sets of signals. This model, canonical Granger causality, exploits the sharing of signals between nearby sensors to better estimate macro-scale causality. For causality in variance, we constructed a new algorithm to more quickly estimate the necessary conditional heteroscedasticity models. Using an alternating direction method of multipliers, we were able to reduce the runtime enough to allow short-time analysis of causality between biological signals. In all three projects, we used recordings of brain activity to demonstrate the effectiveness of our approaches in determining time-varying functional behavior of a biological system.

Biological signals are difficult to analyze due to their relatively low signal-to-noise ratio and high variability across multiple runs of the same experiment. In addition, most methods to record biological signals use an array of sensors that cover the entire biological system (e.g. the brain). These sensors are sensitive to relatively wide areas of brain activity in comparison to the size of a neuron; hence, there is an overlap of information in the time series recorded between sensors. All three methods presented in this dissertation attempted to increase the benefits and decrease the drawbacks of these properties. For example, we found that appropriate temporal preprocessing increased the consistency of brain activity estimated from MEG data, most likely due to an increase in signal-to-noise ratio. For Granger causality analysis, pooling nearby sensors based on regions of interest detected macro-scale causality by exploiting the shared information of sensors in each region. For analysis of causality in variance, the variability of causality across time required a method that quickly estimated conditional heteroscedasticity models. In each case these methods attempted to take into account a key trait of the biological signals being analyzed; this is an important and often neglected part of signal modeling.

These model assumptions are not specific to biological signals, however. Causality is a philosophical construct to model how humans determine the behavior and manipulation of the systems around them. In causal analysis, we attempt to mathematically model this construct. For example, Granger causality expects a consistent delay between changes in one signal and changes in a second signal. This view of causality as a lagged relationship has been assumed since the days of David Hume [Hum40], but research started by Albert Michotte [Mic63] and continued by Bayesian network researchers [Pea09] has shown that the human perception of causality is event-based as well as lag-based. Thus, causality is not narrowly defined by past and future events; causality includes any perceived actions of one system upon another. Lagged causality is only a subset of causality whereby the actions of one system on another have a quantifiable delay.

The models and methods derived above have been consistently posed in terms of "signals" and "time series" and "sets" so as to provide generalizability. Most analyses of recordable systems can benefit from the methods provided here. For example, we can use consistency analysis for geological surveying across instruments, canonical Granger causality for analysis of genetic variation and mutations, and conditional heteroscedasticity estimation via ADMM for dynamic analysis of weather patterns. Generalization of these methods requires that the application satisfy the assumptions being exploited by each algorithm. Consistency analysis requires the same underlying signal of interest in each set of data, e.g. the brain response to a stimulus or the signal characteristic of underground oil. Canonical Granger causality asks for sets of signals in which each set is hypothesized to have one common signal of interest, e.g. sets of nearby brain sensors or sets of gene expressions. ARCH modeling assumes that the causal mechanism manifests in changes in variation or magnitude, and our ADMM method will benefit when those changes are time-varying or occur over short time windows, e.g. in cases of causal resting-state brain activity or weather patterns from multiple locations. Our ADMM method will also benefit in this age of Big Data, where faster analyses lead to more testable hypotheses in all of the sciences.

We can also expand upon this research in numerous ways. For example, most of the models here have a linear element that can be extended to nonlinear relationships between the past and present. However, those nonlinear relationships usually require long stationary time series; it will be important to develop nonlinear methods that work well on shorter time series. We can increase the complexity of these models to differentiate between direct influence and indirect influence by adding possible relay signals; these models usually have a much higher variance and thus need to be regularized appropriately. We can also extend the random variables to discrete random variables; there is much work in the causality of "point processes" representing activations of an object [HRA+95, SB03], which would increase the application of these methods to event-related signals such as traffic jams and device errors.
In each case, we are adding assumptions to the models and algorithms described above that increase the effectiveness of the algorithms. We must take care that adding those assumptions does not bias the model towards the noise in the time series we use; in other words, we want to better fit not only the data we have but the data we will get if we continue recording.

In determining how a system works, our natural approach is to record some aspect of the system, be it brain electrical voltage or temperature or stock index price. By recording some aspect of a dynamic system, we generate time series that hopefully represent the system's behavior. Under this general framework, there is still a lot of work to be done to find a complete set of tools that can describe a system from recorded time series. We have described three algorithms that increase this toolset, and we expect that in the future we will be able to grow this toolset to be able to characterize systems, biological or otherwise, that govern the behavior of the world we live in.

References

[ACM+07] Laura Astolfi, Febo Cincotti, Donatella Mattia, M Grazia Marciani, Luiz A Baccalá, Fabrizio de Vico Fallani, Serenella Salinari, Mauro Ursino, Melissa Zavaglia, Lei Ding, J Christopher Edgar, Gregory A Miller, Bin He, and Fabio Babiloni. Comparison of different cortical connectivity estimators for high-resolution EEG recordings. HBM, 28(2):143–57, February 2007. 2.4

[ADC+04] Zimbul Albo, Gonzalo Viana Di Prisco, Yonghong Chen, Govindan Rangarajan, Wilson Truccolo, Jianfeng Feng, Robert P Vertes, and Mingzhou Ding. Is partial coherence a viable technique for identifying generators of neural oscillations? Biol Cyber, 90(5):318–26, May 2004. 2.4

[AHJL13] Syed Ashrafulla, Justin Pritam Haldar, Anand A Joshi, and Richard M Leahy. Canonical Granger causality between regions of interest. NeuroImage, 83:189–199, June 2013. 1, 5.1

[Aka73] Hirotugu Akaike. Information theory and an extension of the maximum likelihood principle. In B N Petrov and F Csaki, editors, Second International Symposium on Information Theory, volume 1, pages 267–281. Akademiai Kiado, Budapest, 1973. 2.4.2.1

[AMS08] Pierre-Antoine Absil, Robert Mahony, and Rodolphe Sepulchre. Optimization Methods on Matrix Manifolds. Princeton University Press, Princeton, NJ, 2008. B

[APL13] Sergul Aydore, Dimitrios Pantazis, and Richard M Leahy. A note on the phase locking value and its properties. NeuroImage, 74:231–44, July 2013. 2.4.1

[BAK12] György Buzsáki, Costas A Anastassiou, and Christof Koch. The origin of extracellular fields and currents – EEG, ECoG, LFP and spikes. Nature Reviews Neuro, 13(6):407–20, June 2012. 2.2.1.1

[BB12] Lionel Barnett and Terry Bossomaier. Transfer entropy as a log-likelihood ratio. [preprint], pages 1–10, May 2012. 2.4.2

[BBS09] Lionel Barnett, Adam B Barrett, and Anil K Seth. Granger causality and transfer entropy are equivalent for Gaussian variables. Phys Rev Letters, 103(23):2–5, December 2009. 2.4.2, 2.4.2.1

[BCB+05] Fabio Babiloni, Febo Cincotti, Claudio Babiloni, Filippo Carducci, Donatella Mattia, Laura Astolfi, Alessandra Basilisco, Paolo Maria Rossini, Lei Ding, Y Ni, J Cheng, K Christine, John Sweeney, and Bin He. Estimation of the cortical functional connectivity with the multimodal integration of high-resolution EEG and fMRI data by directed transfer function. NeuroImage, 24(1):118–31, January 2005. 2.4

[BCN93] Steven L Bressler, Richard Coppola, and Richard Nakamura. Episodic multiregional cortical coherence at multiple frequencies during visual task performance. Nature, 366(6451):153–6, November 1993.
4.1,4.5 [BDY99] Steven L Bressler, Mingzhou Ding, and Weiming Yang. Investigation ofcooperativecorticaldynamicsbymultivariateautoregressivemodeling of event-related local field potentials. Neurocomputing, 26-27:625–631, June1999. 4.5.2,4.5.3 [BEN94] TimBollerslev,RobertFEngle,andDanielBNelson. ARCHmodels. In RobertFEngleandDanMcFadden,editors, Handbook ofEconometrics, volumeIV,chapter49,pages2959–3038.ElsevierB.V.,Amsterdam,The Netherlands,4edition,1994. 5.1 [Ber99] Dimitri P Bertsekas. Nonlinear Programming, volume 2. Athena Scien- tific,1999. 5.2.2 [BFV + 12] Julie Blumberg, Iván Sánchez Fernández, Martina Vendrame, Bernhard Oehl, William O Tatum, Stephan Schuele, Andreas V Alexopoulos, An- napurna Poduri, Christoph Kellinghaus, Andreas Schulze-Bonhage, and TobiasLoddenkemper.Dacrysticseizures: demographic,semiologic,and etiologicinsightsfromamulticenterstudyinlong-termvideo-EEGmon- itoringunits. Epilepsia,53(10):1810–9,October2012. 3.1 [BG07] David R Brillinger and Apratim Guha. Mutual information in the fre- quency domain. J Stat Planning Inference, 137(3):1076–1084, March 2007. 2.4.2 [BGY + 09] Guangyu Bin, Xiaorong Gao, Zheng Yan, Bo Hong, and Shangkai Gao. An online multi-channel SSVEP-based brain-computer interface using a canonical correlation analysis method. J Neural Engr, 6(046002):1–6, August2009. 4.1 113 [BH95] Yoav Benjamini and Yosef Hochberg. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J Royal Stat Society B,57(1):289–300,1995. 4.5.3 [BK79] Giorgio V Borgiotti and Leonard J Kaplan. Superresolution of uncorre- latedinterferencesourcesbyusingadaptivearraytechniques. IEEETrans Antennas Propagation,27(6):842–845,November1979. A.2 [BKK04] Katarzyna J Blinowska, Rafal Ku´ s, and Maciej J Kaminski. Granger causality and information flow in multivariate processes. Phys Rev E, 70(5):1–4,November2004. 1,2.4,5.1 [BLC + 11] ClaudioBabiloni,RobertaLizio,FilippoCarducci,FabrizioVecchio,Al- berto Redolfi, Silvia Marino, Gioacchino Tedeschi, Patrizia Montella, Antonio Guizzaro, Fabrizio Esposito, Alessandro Bozzao, Franco Giu- bilei, Francesco Orzi, Carlo C Quattrocchi, Andrea Soricelli, Elena Sal- vatore, Annalisa Baglieri, Placido Bramanti, Enrica Cavedo, Raffaele Ferri, Filomena Cosentino, Michelangelo Ferrara, Ciro Mundi, Gian- paolo Grilli, Silvia Pugliese, Gianluca Gerardi, Laura Parisi, Fabrizio Vernieri, Antonio I Triggiani, Jan T Pedersen, Hans-Göran Hå rdemark, PaoloMRossini,andGiovanniBFrisoni.Restingstatecorticalelectroen- cephalographic rhythms and white matter vascular lesions in subjects with Alzheimer’s disease: an Italian multicenter study. J Alzheimer’s, 26(2):331–46,January2011. 3.1 [BLR06] LucBauwens,SébastienLaurent,andJeroenV.K.Rombouts. Multivari- ate GARCH models: a survey. J Applied Econ, 21(1):79–109, January 2006. 5.1 [BML01] Sylvain Baillet, John Compton Mosher, and Richard M Leahy. Electro- magnetic brain mapping. IEEE Sig Proc, 18(6):14–30, 2001. 2.2.2.2, 4.6 [BMS + 11] Gregory G Brown, Daniel H Mathalon, Hal Stern, Judith Ford, Bryon Mueller, Douglas N Greve, Gregory McCarthy, James Voyvodic, Gary Glover, Michele Diaz, Elizabeth Yetter, I Burak Ozyurt, Kasper W Jor- gensen, Cynthia G Wible, Jessica A Turner, Wesley K Thompson, and StevenGPotkin. MultisitereliabilityofcognitiveBOLDdata. NeuroIm- age,54(3):2163–75,February2011. 3.1,3.4 [Bol86] Tim Bollerslev. Generalized autoregressive conditional heteroskedastic- ity. J Econ,31(3):307–327,April1986. 5.4 114 [Bol90] Tim Bollerslev. 
Modelling the coherence in short-run nominal ex- change rates: A multivariate generalized ARCH model. Rev Econ Stat, 72(3):498–505,1990. 5.1,5.4 [Bol08] TimBollerslev. GlossarytoARCH(GARCH). SSRNElectronicJournal, 2008. 5.1 [BPC + 10] Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato, and Jonathan Eck- stein. Distributed optimization and statistical learning via the alternat- ing direction method of multipliers. Foundations and Trends in Machine Learning,3(1):1–122,2010. 5.2.4,D.3,D.4 [BQB + 09] Nicholas M Barbaro, Mark Quigg, Donna K Broshek, Mariann M Ward, Kathleen R Lamborn, Kenneth D Laxer, David A Larson, William Dil- lon, Lynn Verhey, Paul Garcia, Ladislau Steiner, Christine Heck, Dou- glas Kondziolka, Robert Beach, William Olivero, Thomas C Witt, Vi- centa Salanova, and Robert Goodman. A multicenter, prospective pilot study of gamma knife radiosurgery for mesial temporal lobe epilepsy: seizure response, adverse events, and verbal memory. Annals of Neurol- ogy,65(2):167–75,February2009. 3.1 [Bri01] David R Brillinger. Time Series: Data Analysis and Theory. SIAM: SocietyforIndustrialandAppliedMathematics,2001. 2.1.1,5.1 [BS97] Roland Beucker and Heidi Anne Schlitt. Temporal and spatial prewhitening of multi-channel MEG data. Technical Report November, ForschungszentrumJülich,Jülich,Germany,1997. 3.4 [BS01] Luiz A Baccalá and Koichi Sameshima. Partial directed coherence: a new concept in neural structure determination. Biol Cyber, 84(6):463– 474,May2001. 4.6,5.1 [BS10] Adam B Barrett and Anil K Seth. Multivariate Granger causality and generalized variance. Phys Rev E, 81(4):1–14, April 2010. 4.1, 4.2.2, 4.6,5.1 [BVR + 08] Matthew J Brookes, Jiri Vrba, Stephen E Robinson, Claire M Steven- son, Andrew M Peters, Gareth R Barnes, Arjan Hillebrand, and Peter G Morris. Optimising experimental design for MEG beamformer imaging. NeuroImage,39(4):1788–802,February2008. 3.4 [BWL + 11] Matthew J Brookes, Mark Woolrich, Henry Luckhoo, Darren Price, JoanneRHale,MaryCStephenson,GarethRBarnes,StephenMSmith, andPeterGMorris. Investigatingtheelectrophysiologicalbasisofresting 115 statenetworksusingmagnetoencephalography. PNAS,108(40):16783–8, October2011. 4.6 [BZHH95] Bharat Biswal, F Zerrin Yetkin, Victor M Haughton, and James S Hyde. Functional connectivity in the motor cortex of resting human brain us- ing echo-planar mri. Magnetic Resonance in Medicine, 34(4):537–541, October1995. 2.4 [CALC10] NicolleMCorrea,TülayAdali,Yi-OuLi,andVinceDCalhoun. Canon- icalcorrelationanalysisfordatafusionandgroupinferences: Examining applications of medical imaging data. IEEE Sig Proc, 27(4):39–50, Jan- uary2010. 2.4,4.1 [Cap07] MassimilianoCaporin. Variance(non)causalityinmultivariateGARCH. Econ Rev,26(1):1–24,February2007. 5.1 [CB01] GeorgeCasellaandRogerLBerger. StatisticalInference. DuxburyPress, 2edition,2001. 2.3.1 [CBD06] Yonghong Chen, Steven L Bressler, and Mingzhou Ding. Frequency de- composition of conditional Granger causality and application to multi- variateneuralfieldpotentialdata. JNeuroMethods,150(2):228–37,Jan- uary2006. 4.5.2 [CC75] Peter E Caines and C. Chan. Feedback between stationary stochastic processes. IEEE Trans Auto Control,20(4):498–508,August1975. 5.1 [CC83] David Cohen and B. Neil Cuffin. Demonstration of useful differences betweenmagnetoencephalogramandelectroencephalogram. EEG&Clin Neuropsych,56(1):38–51,July1983. A.2 [CCY + 91] B Neil Cuffin, David Cohen, Kazutomo Yunokuchi, Roman Maniewski, Christopher Purcell, G Rees Cosgrove, John Ives, John Kennedy, and Donald Schomer. 
Tests of EEG localization accuracy using implanted sources in the human brain. Annals Neuro, 29(2):132–8, February 1991. 3.1 [CED + 06] Ryan T Canolty, E Edwards, S S Dalal, M Soltani, S S Nagarajan, H E Kirsch, M S Berger, N M Barbaro, and R T Knight. High gamma power is phase-locked to theta oscillations in human neocortex. Science, 313(5793):1626–8,September2006. 2.4.1 [CHA + 00] Dietmar Cordes, Victor M Haughton, Konstantinos Arfanakis, Gary J Wendt, Patrick A Turski, Chad H Moritz, Michelle A Quigley, and M Elizabeth Meyerand. Mapping functionally related regions of 116 brain with functional connectivity MR imaging. Amer J Neuroradio, 21(9):1636–44,October2000. 2.4 [Chi12] DanielChicharro. OnthespectralformulationofGrangercausality. Biol Cyber,(2011):331–347,January2012. 4.6 [CL03] Fabienne Comte and Offer Lieberman. Asymptotic theory for multivari- ate GARCH processes. J Multivariate Anal, 84(1):61–84, January 2003. 2.4.2.2 [CO12] StéphaneChrétienandJuan-PabloOrtega. MultivariateGARCHestima- tion via a Bregman-proximal trust-region method. Comp Stat & Data Anal,pages1–27,November2012. 5.1,5.2.2,5.4 [Coh68] David Cohen. Magnetoencephalography: evidence of magnetic fields produced by alpha-rhythm currents. Science, 161(3843):784–786, Au- gust1968. 2.2.2.1 [CWM12] JoyceChiang,ZJaneWang,andMartinJMcKeown. Ageneralizedmul- tivariateautoregressive(GmAR)-basedapproachforEEGsourceconnec- tivityanalysis. IEEE Trans Sig Proc,60(1):453–465,January2012. 5.1 [Daw54] G D Dawson. A summation technique for the detection of small evoked potentials. Clin Neuro,6:65–84,January1954. 2.2.3.1 [DBYL00] Mingzhou Ding, Steven L Bressler, Weiming Yang, and Hualou Liang. Short-window spectral analysis of cortical event-related potentials by adaptivemultivariateautoregressivemodeling: datapreprocessing,model validation, and variability assessment. Biol Cyber, 83(1):35–45, July 2000. 2.4,4.5.2 [DEV + 03] Maryann D’Alessandro, Rosana Esteller, George Vachtsevanos, Arthur Hinson,JavierEchauz,andBrianLitt. Epilepticseizurepredictionusing hybrid feature selection over multiple intracranial EEG electrode con- tacts: areportoffourpatients. IEEETransBiomedEngr,50(5):603–615, May2003. 4.1 [Dic45] Lee Raymond Dice. Measures of the amount of ecologic association be- tweenspecies. Ecology,26(3):297,July1945. 3.2.6 [DLF + 00] AndersMDale,ArthurKLiu,BruceRFischl,RandyLBuckner,JohnW Belliveau,JeffreyDLewine,andEricHalgren. Dynamicstatisticalpara- metricmapping: combiningfMRIandMEGforhigh-resolutionimaging ofcorticalactivity. Neuron,26(1):55–67,April2000. 3.2.5 117 [DS93] Anders M Dale and Martin I Sereno. Improved localizadon of cortical activity by combining EEG and MEG with MRI cortical surface recon- struction: a linear approach. J Cog Neuro, 5(2):162–176, April 1993. 2.2.2.2,3.2.4 [DSN06] SarangSDalal,KensukeSekihara,andSrikantanSNagarajan. Modified beamformersforcoherentsourceregionsuppression.IEEETransBiomed Engr,53(7):1357–63,July2006. 3.3.3 [EAS98] Alan Edelman, Tomás A Arias, and Steven T Smith. The geometry of algorithmswithorthogonalityconstraints. SIAMJournalonMatrixAnal- ysis and Applications,20(2):303–353,1998. 4.3,4.6,B [EK95] Robert F Engle and Kenneth F Kroner. Multivariate simultaneous gen- eralized ARCH. Econ Theory, 11(01):122, February 1995. 5.1, 5.2.1, 5.2.2,5.3.1.1,5.4 [Eng82] Robert F Engle. Autoregressive conditional heteroscedasticity with es- timates of the variance of United Kingdom inflation. Econometrica, 50(4):987–1007,1982. 1,5.1,5.2.1 [Eng84] Robert F Engle. 
Wald, likelihood ratio, and Lagrange multiplier tests in econometrics. InZviGrilichesandMichaelDIntriligator,editors,Hand- book of Econometrics, volume II, chapter 13, pages 776–825. Elsevier, 1984. 2.4.2.2 [Eng02] Robert F Engle. Dynamic conditional correlation. J Bus & Econ Stat, 20(3):339–350,July2002. 5.4 [ET86] Bradley Efron and Robert Tibshirani. Bootstrap Methods for Standard Errors,ConfidenceIntervals,andOtherMeasuresofStatisticalAccuracy. Statistical Science,1(1):54–75,February1986. 2.3.2,3.2.3 [ET93] Bradley Efron and Robert Tibshirani. An Introduction to the Bootstrap. ChapmanandHall,BocaRaton,FL,USA,1993. 2.3.2,3.2.3,4.4.2 [EY36] CarlEckartandGaleYoung. Theapproximationofonematrixbyanother oflowerrank. Psychometrika,1(3):211–218,September1936. 2.1.3 [FBK85] PiotrJFranaszczuk,KatarzynaJBlinowska,andMarekKowalczyk. The application of parametric multichannel spectral estimates in the study of electricalbrainactivity. Biol Cyber,51(4):239–247,January1985. 2.4 [FFD10] Lawrence M Friedman, Curt D Furberg, and David L DeMets. Funda- mentals of Clinical Trials. Springer,NewYork,NY,2010. 3.1 118 [FHA + 13] Seyyed Hamed Fouladi, Mohammadehsan Hajiramezanali, Hamidreza Amindavar,JamesARitcey,andPaymanArabshahi. Denoisingbasedon multivariate stochastic volatility modeling of multiwavelet coefficients. IEEE Trans Sig Proc,61(22):5578–5589,November2013. 5.2.2 [FKD + 10] Frederic H Fahey, Paul E Kinahan, Robert K Doot, Mehmet Kocak, Harold Thurston, and Tina Young Poussaint. Variability in PET quan- titation within a multicenter consortium. Medical Physics, 37(7):3660, 2010. 3.1 [GCC96] MiguelAngelGuevaraandMaríaCorsi-Cabrera. EEGcoherenceorEEG correlation? Int J Psychophys,23(3):145–153,October1996. 2.4 [Gew82] JohnGeweke. Measurementoflineardependenceandfeedbackbetween multipletimeseries. JAmerStatAssoc,77(378):304–313,1982. 2.4.2.1, 4.1,4.6,5.1,5.2.5 [Gha98] Zoubin Ghahramani. Learning dynamic Bayesian networks. Adap- tive Processing of Sequences and Data Structures, 1387:168–197, 1998. 2.4.2,4.1 [GJM + 10] Viktoria-Eleni Gountouna, Dominic E Job, Andrew M McIntosh, TWilliamJMoorhead,GKatherineLLymer,HeatherCWhalley,Jeremy Hall, Gordon D Waiter, David Brennan, David J McGonigle, Trevor S Ahearn, Jonathan Cavanagh, Barrie Condon, Donald M Hadley, Ian Marshall, Alison D Murray, J Douglas Steele, Joanna M Wardlaw, and Stephen M Lawrie. Functional Magnetic Resonance Imaging (fMRI) re- producibility and variance components across visits and scanning sites withafingertappingtask. NeuroImage,49(1):552–60,January2010. 3.1 [GKdW + 02] LilliGeworski,BerndOKnoop,MaikedeWit,VelimirIvancevi´ c,Roland Bares, and Dieter L Munz. Multicenter comparison of calibration and cross calibration of PET scanners. J Nuclear Medicine, 43(5):635–9, May2002. 3.1 [GR70] GeneHGolubandChristianReinsch. Singularvaluedecompositionand least squares solutions. Numerische Mathematik, 14(5):403–420, April 1970. 2.1.3 [Gra69] CliveWilliamJohnGranger. Investigatingcausalrelationsbyeconomet- ricmodelsandcross-spectralmethods. Econometrica,37(3):424,August 1969. 1,2.4.2,4.1 [Gra80] Clive William John Granger. Testing for causality. J Econ Dyn Control, 2:329–352,January1980. 2.4.2.1 119 [GRD99] Gloria González-Rivera and Feike C Drost. Efficiency comparisons of maximum-likelihood-based estimators in GARCH models. J Econ, 93(1):93–111,November1999. 5.1 [GSAL08] David W Gow, Jennifer A Segawa, Seppo P Ahlfors, and Fa-Hsuan Lin. Lexical influences on speech perception: a Granger causality analysis of MEGandEEGsourceestimates. 
NeuroImage,43(3):614–23,November 2008. 2.4 [GSK + 08] Shuixia Guo, Anil K Seth, Keith M Kendrick, Cong Zhou, and Jianfeng Feng. Partial Granger causality–eliminating exogenous inputs and latent variables. J Neuro Methods,172(1):79–93,July2008. 4.6 [HB03] Arjan Hillebrand and Gareth R Barnes. The use of anatomical con- straints with MEG beamformers. NeuroImage, 20(4):2302–2313, De- cember2003. 3.4 [HCN + 09] Avgis Hadjipapas, Erik Casagrande, Angel Nevado, Gareth R Barnes, Gary Green, and Ian E Holliday. Can we observe collective neuronal activity from macroscopic aggregate signals? NeuroImage, 44(4):1290– 303,March2009. 1 [Hea02] Michael Heath. Scientific Computing. McGraw-Hill, New York, NY, 2 edition,2002. 5.2.4 [Hes69] Magnus R Hestenes. Multiplier and gradient methods. J Opt Th App, 4(5):303–320,November1969. 5.2.3 [HH06] Christian Matthias Hafner and Helmut Herwartz. A Lagrange multiplier testforcausalityinvariance. EconomicsLetters,93(1):137–141,October 2006. 2.4.2.2,5.2.5 [HH08] Christian Matthias Hafner and Helmut Herwartz. Analytical quasi max- imum likelihood inference in multivariate volatility models. Metrika, 67(2):219–239,March2008. 2.4.2.2 [HHC + 12] Joerg F Hipp, David J Hawellek, Maurizio Corbetta, Markus Siegel, and Andreas K Engel. Large-scale cortical correlation structure of sponta- neousoscillatoryactivity. Nature Neuro,15(6):884–90,June2012. 4.6 [HHI + 93] Matti S Hämäläinen, Riitta Hari, Risto J Ilmoniemi, Jukka Knuutila, and Olli V Lounasmaa. Magnetoencephalography – theory, instrumentation, and applications to noninvasive studies of the working human brain. Re- views of Modern Physics,65(2):413–497,April1993. 1,3.1 120 [HMAS03] Wolfram Hesse, Eva Möller, Matthias Arnold, and Bärbel Schack. The useoftime-variantEEGGrangercausalityforinspectingdirectedinterde- pendenciesofneuralassemblies. JNeuroMethods,124(1):27–44,March 2003. 2.4 [HML99] Ming-Xiong Huang, John Compton Mosher, and Richard M Leahy. A sensor-weighted overlapping-sphere head model and exhaustive head modelcomparisonforMEG. PMB,44(2):423–440,February1999. 3.2.4 [HMLA03] Ingo Hertrich, Klaus Mathiak, Werner Lutzenberger, and Hermann Ack- ermann.Processingofdynamicaspectsofspeechandnon-speechstimuli: awhole-headmagnetoencephalographystudy. CognitiveBrainResearch, 17(1):130–139,June2003. 2.4 [Hot36] Howard Hotelling. Relations between two sets of variables. Biometrika, 28(3):321–377,1936. 4.1 [HP09] ChristianMatthiasHafnerandAriePreminger. Onasymptotictheoryfor multivariate GARCH models. J Multivariate Anal, 100(9):2044–2054, October2009. 2.4.2.2,5.1,5.2.2 [HPBL10] Hua Brian Hui, Dimitrios Pantazis, Steven L Bressler, and Richard M Leahy. Identifying true cortical interactions in MEG using the nulling beamformer. NeuroImage,49(4):3161–74,February2010. 4.6 [HRA + 95] D M Halliday, J R Rosenberg, A M Amjad, P Breeze, B A Conway, and S F Farmer. A framework for the analysis of mixed time series/point processdata–theoryandapplicationtothestudyofphysiologicaltremor, single motor unit discharges and electromyograms. Prog Biophys Molec Biol,64(2-3):237–78,January1995. 6 [HSP + 02] K Herholz, E Salmon, D Perani, J-C Baron, V Holthoff, L Frölich, P Schönknecht, K Ito, R Mielke, and E Kalbe. Discrimination between Alzheimer dementia and controls by automated analysis of multicenter FDGPET. NeuroImage,17(1):302–316,September2002. 3.1 [HSPVB07] Katerina Hlavackova-Schindler, Milan Paluš, Martin Vejmelka, and Joy- deep Bhattacharya. Causality detection based on information-theoretic approachesintimeseriesanalysis. 
Physics Reports,441(1):1–46,March 2007. 2.4.2,2.4.2.1 [HT89] Clifford M Hurvich and Chih-Ling Tsai. Regression and time series model selection in small samples. Biometrika, 76(2):297–307, 1989. 2.4.2.1 121 [Hum40] DavidHume. A Treatise of Human Nature. London,1740. 2.4.2,6 [Ima10] Toshiaki Imada. A method for MEG data that obtains linearly- constrained minimum-variance beamformer solution by minimum-norm least-squares method. In 17th Int’l Conf Biomag, pages 152–154, 2010. 3.4 [JCG + 06] Jorge Jovicich, Silvester Czanner, Douglas Greve, Elizabeth Haley, An- dre van der Kouwe, Randy Gollub, David Kennedy, Franz Schmitt, Gre- goryBrown,JamesMacfall,BruceFischl,andAndersMDale. Reliabil- ity in multi-site structural MRI studies: effects of gradient non-linearity correction on phantom and human data. NeuroImage, 30(2):436–43, April2006. 3.1 [JMCC96] DanielJohnston,JeffreyCMagee,CostaMColbert,andBrianRCristie. Active properties of neuronal dendrites. Annu Rev Neurosci, 19(Gree- nough1975):165–86,January1996. 2.2 [KBFIC + 10] Christopher T Kello, Gordon D A Brown, Ramon Ferrer-I-Cancho, JohnGHolden,KlausLinkenkaer-Hansen,TheoRhodes,andGuyCVan Orden. Scalinglawsincognitivesciences. Trends in Cognitive Sciences, 14(5):223–32,May2010. 3.2.4.1 [KCM + 11] L B Krupp, C Christodoulou, P Melville, W F Scherl, L-Y Pai, L R Muenz, D He, R H B Benedict, A Goodman, S Rizvi, S R Schwid, B Weinstock-Guttman, H J Westervelt, and H Wishart. Multicenter ran- domized clinical trial of donepezil for memory impairment in multiple sclerosis. Neurology,76(17):1500–7,April2011. 3.1 [KDTB01] Maciej J Kaminski, Mingzhou Ding, Wilson A Truccolo, and Steven L Bressler.Evaluatingcausalrelationsinneuralsystems: Grangercausality, directed transfer function and statistical assessment of significance. Biol Cyber,85(2):145–157,August2001. 4.6 [Kie53] JackCarlKiefer. Sequentialminimaxsearchforamaximum. ProcAMS, 4:502–506,1953. B [KL80] Virginia C Klema and Alan J Laub. The singular value decomposi- tion: Its computation and some applications. IEEE Trans Auto Control, 25(2):164–176,April1980. 2.1.3 [KN98] KennethFKronerandVictorKNg. Modelingasymmetriccomovements ofassetreturns. Rev Fin Stud,11(4):817–844,October1998. 5.4 122 [Koc08] Hans Koch. SQUIDs - quo vadis? Physica C, 468(15-20):1112–1114, September2008. 2.2.2.1 [KS02] Andreas Kaiser and Thomas Schreiber. Information transfer in continu- ousprocesses. Physica D,166(1-2):43–62,June2002. 2.4.2.1 [KSJ00] Eric R Kandel, James H Schwartz, and Thomas M Jessell. Principles of Neural Science. McGraw-Hill,4edition,2000. 2.2 [KV81] AntonAAKuylenandTheoMMVerhallen. Theuseofcanonicalanal- ysis. J Econ Psych,1(3):217–237,September1981. 4.1 [L ¨ 85] Helmut Lütkepohl. Comparison of criteria for estimating the order of a vectorautoregressiveprocess. JTimeSerAnal,6(1):35–52,1985. 2.4.2.1 [LBB03] TMLock,JSBaizer,andDBBender. Distributionofcorticotectalcells inmacaque. ExperimentalBrainResearch,151(4):455–70,August2003. 4.5.3,4.6 [LCL + 98] TLocatelli,MCursi,DLiberati,MFranceschi,andGComi. EEGcoher- ence in Alzheimer’s disease. Clin Neuro, 106(3):229–237, March 1998. 2.4 [LDB02] Arthur K Liu, Anders M Dale, and John W Belliveau. Monte Carlo sim- ulationstudiesofEEGandMEGlocalizationaccuracy. HBM,62(1):47– 62,2002. 3.1,3.3.2.1 [LDNB00] Hualou Liang, Mingzhou Ding, Richard Nakamura, and Steven L Bressler. Causal influences in primate cerebral cortex during visual pattern discrimination. Neuroreport, 11(13):2875–80, September 2000. 
2.2.3.2,4.5.2,4.5.4,4.6 [LESF00] Joachim Lübke, Veronica Egger, Bert Sakmann, and Dirk Feldmeyer. Columnar organization of dendrites and axons of single and synaptically coupled excitatory spiny neurons in layer 4 of the rat barrel cortex. The Journal of Neuroscience,20(14):5300–11,July2000. 2.2 [LFG + 05] Dietrich Lehmann, Pascal L Faber, Silvana Galderisi, Werner M Her- rmann,ToshihikoKinoshita,MarthaKoukkou,ArmidaMucci,RobertoD Pascual-Marqui, Naomi Saito, Jiri Wackermann, Georg Winterer, and Thomas Koenig. EEG microstate duration and syntax in acute, medication-naive, first-episode schizophrenia: a multi-center study. Psych Rsrh,138(2):141–56,February2005. 3.1 123 [LHS + 09] Fa-HsuanLin,KeikoHara,VictorSolo,MarkVangel,JohnWBelliveau, Steven M Stufflebeam, and Matti S Hämäläinen. Dynamic Granger- Geweke causality modeling with application to interictal spike propaga- tion. HBM,30(6):1877–86,June2009. 4.6 [LMS + 98] Richard M Leahy, John Compton Mosher, Michael E Spencer, Ming- Xiong Huang, and Jeffrey David Lewine. A study of dipole localization accuracy for MEG and EEG using a human skull phantom. Clin Neuro, 107(2):159–73,August1998. 3.2.4,3.4 [LNS + 12] JoonLee,ShamimNemati,IkaroSilva,BradleyAEdwards,JamesPBut- ler, and Atul Malhotra. Transfer entropy estimation and directional cou- plingchangedetectioninbiomedicaltimeseries. BMEOnline,11(1):19, January2012. 2.4.2,2.4.2.1 [LTMF98] Joyce Liporace, William Tatum, George Lee Morris, and Jacqueline French. Clinical utility of sleep-deprived versus computer-assisted am- bulatory 16-channel EEG in epilepsy patients: a multi-center study. Epilepsy research,32(3):357–62,November1998. 3.1 [LWA + 06] Fa-Hsuan Lin, Thomas Witzel, Seppo P Ahlfors, Steven M Stufflebeam, John W Belliveau, and Matti S Hämäläinen. Assessing and improv- ing the spatial accuracy in MEG source localization by depth-weighted minimum-norm estimates. NeuroImage, 31(1):160–71, May 2006. 2.4, 3.4,A.1 [Mar87] S Lawrence Marple. Digital Spectral Analysis with Applications, vol- ume 1 of Prentice Hall Signal Processing Series. Prentice Hall, Engle- woodCliffs,NJ,1987. 2.4.2.1 [Max65] James C Maxwell. A Dynamical Theory of the Electromagnetic Field. Phil Trans B,155(1865):459–512,January1865. 2.2.2.1 [MBL03] John Compton Mosher, Sylvain Baillet, and Richard M Leahy. Equiv- alence of linear approaches in bioelectromagnetic inverse solutions. In IEEE SSP,number5,pages294–297.IEEE,2003. 3.4,A.2 [Met78] CharlesE.Metz. BasicprinciplesofROCanalysis. Seminars in Nuclear Medicine,8(4):283–298,October1978. 4.4.2,5.3.1.4 [MFF89] Edward L Merrin, Thomas C Floyd, and George Fein. EEG coherence in unmedicated schizophrenic patients. Biol Psych, 25(1):60–6, January 1989. 2.4 124 [MI08] A Mouraux and G D Iannetti. Across-trial averaging of event-related EEG responses and beyond. Magnetic Resonance Imaging, 26(7):1041– 54,September2008. 2.2.3.1 [Mic63] AlbertMichotte. Theperceptionofcausality.BasicBooks,Oxford,1963. 6 [Mit10] SanjitKMitra. DigitalSignalProcessing. McGraw-Hill,NewYork,NY, 2010. 2.1 [MLCS11] Daniele Marinazzo, Wei Liao, Huafu Chen, and Sebastiano Stramaglia. Nonlinear connectivity by Granger causality. NeuroImage, 58(2):330–8, September2011. 2.4.2,2.4.2.1,5.1 [MLL92] John Compton Mosher, Paul S Lewis, and Richard M Leahy. Multiple dipole modeling and localization from spatio-temporal MEG data. IEEE Trans Biomed Engr,39(6):541–57,June1992. 2.2.2.2,3.4 [MPS08a] DanieleMarinazzo,MarioPellicoro,andSebastianoStramaglia. 
Kernel- Granger causality and the analysis of dynamical networks. Phys Rev E, 77(5):1–9,May2008. 2.4.2,2.4.2.1,5.1 [MPS08b] Daniele Marinazzo, Mario Pellicoro, and Sebastiano Stramaglia. Kernel method for nonlinear Granger causality. Phys Rev Letters, 100(14):1–4, April2008. 2.4.2,2.4.2.1,5.1 [MPS12] Daniele Marinazzo, Mario Pellicoro, and Sebastiano Stramaglia. Causal information approach to partial conditioning in multivariate data sets. Comp Math Meth Med,2012:1–8,January2012. 2.4.2,5.1 [MSAW01] Eva Möller, Bärbel Schack, Matthias Arnold, and Herbert Witte. Instan- taneousmultivariateEEGcoherenceanalysisbymeansofadaptivehigh- dimensional autoregressive models. J Neuro Methods, 105(2):143–58, February2001. 2.4,5.1 [MTH + 08] Lisa Mosconi, Wai H Tsui, Karl Herholz, Alberto Pupi, Alexander Drzezga, Giovanni Lucignani, Eric M Reiman, Vjera Holthoff, Elke Kalbe,SandroSorbi,JanineDiehl-Schmid,RobertPerneczky,Francesca Clerici, Richard Caselli, Bettina Beuthien-Baumann, Alexander Kurz, SatoshiMinoshima,andMonyJdeLeon. Multicenterstandardized18F- FDG PET diagnosis of mild cognitive impairment, Alzheimer’s disease, andotherdementias. JNuclearMedicine,49(3):390–8,March2008. 3.1 [MUM83] MortimerMishkin,LeslieGUngerleider,andKathleenAMacko. Object visionandspatialvision: twocorticalpathways.TrendsinNeurosciences, 6:414–417,January1983. 4.5.2,4.6 125 [MWGD07] MichaelMurias,SaraJWebb,JessicaGreenson,andGeraldineDawson. RestingstatecorticalconnectivityreflectedinEEGcoherenceinindivid- ualswithautism. Biol Psych,62(3):270–3,August2007. 2.4 [NML93] Tore Nielsen, Jacques Montplaisir, and Maryse Lassonde. Decreased in- terhemisphericEEGcoherenceduringsleepinagenesisofthecorpuscal- losum. European Neurology,33(2):173–176,1993. 2.4 [OGH07] Wanmei Ou, Polina Golland, and Matti S Hämäläinen. Sources of vari- ability in MEG. In Nicholas Ayache, Sebastian Ourslein, and Anthony Maeder, editors, MICCAI, pages 751–759, Brisbane, 2007. Springer- Verlag. 1,3.1,3.4 [OOK + 02] Makoto Oishi, Hiroshi Otsubo, Shigeki Kameyama, Nobuhito Morota, Hiroshi Masuda, Masaomi Kitayama, and Ryuichi Tanaka. Epileptic spikes: magnetoencephalography versus simultaneous electrocorticogra- phy. Epilepsia,43(11):1390–5,November2002. 1,2.2.2.1 [Pan95] Christo Pantev. Evoked and induced gamma-band activity of the human cortex. Brain Topography,7(4):321–330,June1995. 2.2.3.2 [Par09] Lauri Parkkonen. Expanding the applicability of magnetoencephalogra- phy. Phd, Helsinki University of Technology, 2009. (document), 2.1, 2.2 [PBLD02] EkaterinaPataraia,ChristophBaumgartner,GeraldLindinger,andLüder Deecke. Magnetoencephalography in presurgical epilepsy evaluation. Neurosurgical Review,25(3):141–159,June2002. 3.1 [Pea09] Judea Pearl. Causality: Models, Reasoning and Inference. Cambridge UniversityPress,Cambridge,2edition,2009. 1,2.4.2,6 [Pet96] HellmuthPetsche. Approachestoverbal,visualandmusicalcreativityby EEG coherence analysis. Int J Psychophys, 24(1-2):145–59, November 1996. 2.4 [Pit37] E J G Pitman. Significance tests which may be applied to samples from anypopulations. Supplement to the Journal of the Royal Statistical Soci- ety,4(1):119–130,1937. 2.3.3 [PLMT97] John W Phillips, Richard M Leahy, John Compton Mosher, and Bijm Timsari.ImagingneuralactivityusingMEGandEEG.IEEEEngineering in Medicine and Biology Magazine,16(3):34–42,1997. 3.1 126 [PNBL03] Dimitrios Pantazis, Thomas E Nichols, Sylvain Baillet, and Richard M Leahy.SpatiotemporallocalizationofsignificantactivationinMEGusing permutation tests. 
In IPMI, volume 18, pages 512–23, July 2003. 2.3.3, 3.2.5 [PNBL05] Dimitrios Pantazis, Thomas E Nichols, Sylvain Baillet, and Richard M Leahy. A comparison of random field theory and permutation methods forthestatisticalanalysisofMEGdata.NeuroImage,25(2):383–94,April 2005. 3.2.5 [PP63] Wilder Penfield and Phanor Perot. The brain’s record of auditory and visualexperience. Brain,86(4):595–696,1963. 1,2.2.1.1 [PP04] Theologos Pantelidis and Nikitas Pittis. Testing for Granger causality in variance in the presence of causality in mean. Economics Letters, 85(2):201–207,November2004. 5.1,5.4 [PP12] KaareBrandtPetersenandMichaelSyskindPedersen. Thematrixcook- book. Technical report, Technical University of Denmark, 2012. D.2, D.4 [PR69] Elijah Polak and G Ribière. Note sur la convergence de méthodes de directionsconjuguées. Revue,3(1):35–43,1969. B [PV07] MilanPalušandMartinVejmelka. Directionalityofcouplingfrombivari- ate time series: How to avoid false causalities and missed connections. Phys Rev E,75(5):1–14,May2007. 2.4.2 [Rai11] Marcus E Raichle. The restless brain. Brain Connectivity, 1(1):3–12, January2011. 2.4 [RHPK12] JakobRunge,JobstHeitzig,VladimirPetoukhov,andJürgenKurths. Es- capingthecurseofdimensionalityinestimatingmultivariatetransferen- tropy. Phys Rev Letters,108(25):1–5,June2012. 2.4.2 [RS10] Mikail Rubinov and Olaf Sporns. Complex network measures of brain connectivity: uses and interpretations. NeuroImage, 52(3):1059–69, September2010. 2.4 [SB03] AnneCSmithandEmeryNBrown. Estimatingastate-spacemodelfrom point process observations. Neural Computation, 15(5):965–91, May 2003. 6 [Sch00] Thomas Schreiber. Measuring information transfer. Phys Rev Letters, 85(2):461–4,July2000. 2.4.2,2.4.2.1 127 [SFC + 10] Joao Ricardo Sato, André Fujita, Elisson F Cardoso, Carlos E Thomaz, Michael J Brammer, and Edson Amaro. Analyzing the connectivity be- tweenregionsofinterest: anapproachbasedonclusterGrangercausality for fMRI data analysis. NeuroImage, 52(4):1444–55, October 2010. 1, 4.1,4.2.3 [SHDN08] K Sekihara, K E Hild, S S Dalal, and S S Nagarajan. Performance of prewhitening beamforming in MEG dual experimental conditions. IEEE Trans Biomed Engr,55(3):1112–21,March2008. 3.2.4.1,3.4 [Sim43] George Gaylord Simpson. Mammals and the nature of continents. Amer J Sci,241(1):1–31,January1943. 3.2.6 [Sim72] Christopher Sims. Money, income and causality. Amer Eco Rev, 62(4):540–52,February1972. 2.4.2.1 [SJM + 06] C J Stam, B F Jones, I Manshanden, A M van Cappellen van Wal- sum, T Montez, Jeroen P A Verbunt, Jan C de Munck, B W van Dijk, HWBerendse,andPScheltens. Magnetoencephalographicevaluationof resting-statefunctionalconnectivityinAlzheimer’sdisease. NeuroImage, 32(3):1335–44,September2006. 2.4.1 [SKSD05] Paul Sauseng, Wolfgang Klimesch, Manuel Schabus, and Michael Dop- pelmayr. Fronto-parietal EEG coherence in theta and upper alpha re- flect central executive functions of working memory. Int J Psychophys, 57(2):97–103,August2005. 2.4 [SL02] David W Shattuck and Richard M Leahy. BrainSuite: an automated cor- tical surface identification tool. Medical Image Analysis, 6(2):129–42, June2002. 2.2.2.2,3.2.1 [SLEL07] Kaspar Schindler, Howan Leung, Christian E Elger, and Klaus Lehnertz. Assessingseizuredynamicsbyanalysingthecorrelationstructureofmul- tichannelintracranialEEG. Brain,130(Pt1):65–77,January2007. 2.4 [SLPB96] D Schwartz, D Lemoine, E Poiseau, and C Barillot. Registration of MEG/EEG data with 3D MRI: Methodology and precision issues. Brain Topography,9(2):101–116,December1996. 
3.3.1 [SLR + 97] PSSebel,ELang,IJRampil,PFWhite,RCork,MJopling,NTSmith, P S Glass, and P Manberg. A multicenter study of bispectral electroen- cephalogram analysis for monitoring anesthetic effect. Anesthesia and Analgesia,84(4):891–9,April1997. 3.1 128 [SM05] Petre Stoica and Randolph L Moses. Spectral Analysis of Signals. 2005. 2.4.2.1,5.1 [Smi93] Steven T Smith. Geometric optimization methods for adaptive filtering. Phd,HarvardUniversity,1993. B [SNA + 09] E Sidransky, M A Nalls, J O Aasly, J Aharon-Peretz, G Annesi, E R Barbosa, A Bar-Shira, D Berg, J Bras, A Brice, C-M Chen, L N Clark, C Condroyer, E V De Marco, A Dürr, M J Eblan, S Fahn, M J Far- rer, H-C Fung, Z Gan-Or, T Gasser, R Gershoni-Baruch, N Giladi, A Griffith, T Gurevich, C Januario, P Kropp, A E Lang, G-J Lee-Chen, S Lesage, K Marder, I F Mata, A Mirelman, J Mitsui, I Mizuta, G Nico- letti, C Oliveira, R Ottman, A Orr-Urtreger, L V Pereira, A Quattrone, ERogaeva,ARolfs,HRosenbaum,RRozenberg,ASamii,TSamaddar, CSchulte,MSharma,ASingleton,MSpitz,E-KTan,NTayebi,TToda, A R Troiano, S Tsuji, M Wittstock, T G Wolfsberg, Y-R Wu, C P Za- betian,YZhao,andSGZiegler. Multicenteranalysisofglucocerebrosi- dasemutationsinParkinson’sdisease. NEJMedicine,361(17):1651–61, October2009. 3.1 [SNP + 01] K Sekihara, S S Nagarajan, D Poeppel, A Marantz, and Y Miyashita. Reconstructingspatio-temporalactivitiesofneuralsourcesusinganMEG vector beamformer technique. IEEE Trans Biomed Engr, 48(7):760–71, July2001. A.2 [SPJ + 10] Juan Luis Poletti Soto, Dimitrios Pantazis, Karim Jerbi, Sylvain Baillet, and Richard M Leahy. Canonical correlation analysis applied to func- tional connectivity in MEG. In Proc IEEE ISBI, pages 113–116, Rotter- dam,2010.IEEE. 4.6 [SS01] Emilio Salinas and Terrence J Sejnowski. Correlated neuronal activity and the flow of neural information. Nature Reviews Neuro, 2(8):539–50, August2001. 1 [SSK + 09] JoshuaSScheuermann,JanetRSaffer,JoelSKarp,AnthonyMLevering, andBarryASiegel. QualificationofPETscannersforuseinmulticenter cancer clinical trials: the American College of Radiology Imaging Net- workexperience. J Nuclear Medicine,50(7):1187–93,July2009. 3.1 [SSN05] Kensuke Sekihara, Maneesh Sahani, and Srikantan S Nagarajan. Local- ization bias and spatial resolution of adaptive and non-adaptive spatial filtersforMEGsourcereconstruction. NeuroImage,25(4):1056–67,May 2005. 3.1 129 [SSSB05] RaymondSalvador,JohnSuckling,ChristianSchwarzbauer,andEdBull- more. Undirectedgraphsoffrequency-dependentfunctionalconnectivity in whole brain networks. Phil Trans B, 360(1457):937–46, May 2005. 2.4 [SSSH97] Greg Stuart, Nelson Spruston, Bert Sakmann, and Michael Häusser. Ac- tion potential initiation and backpropagation in neurons of the mam- malianCNS. Trends in Neurosciences,20(3):125–31,March1997. 2.2 [ST09] Annastiina Silvennoinen and Timo Teräsvirta. Multivariate GARCH Models. In Thomas Mikosch, Jens-Peter Kreiß, Richard A Davis, and Torben Gustav Andersen, editors, Handbook of Financial Time Series, pages 201–229. Springer Berlin Heidelberg, Berlin, Heidelberg, 2009. 5.1 [Str09] Gilbert Strang. Introduction to Linear Algebra. Wellesley Cambridge Press,Wellesley,MA,USA,2009. 2.1.2 [SvHH + 04] HugoGSchnack,NeeltjeEMvanHaren,HillekeEHulshoffPol,Marco Picchioni, Matthias Weisbrod, Heinrich Sauer, Tyrone Cannon, Matti Huttunen,RobinMurray,andRenéSKahn. Reliabilityofbrainvolumes frommulticenterMRIacquisition: acalibrationstudy. HBM,22(4):312– 20,August2004. 
3.1,3.4 [TBM + 11] François Tadel, Sylvain Baillet, John Compton Mosher, Dimitrios Pan- tazis, and Richard M Leahy. Brainstorm: a user-friendly application for MEG/EEGanalysis. Comp Intel Neuro,2011:1–13,January2011. 3.2.1 [TP93] HiroYTodaandPeterCBPhillips. Vectorautoregressionsandcausality. Econometrica,61(6):1367–1393,1993. 5.1 [TT02] Y K Tse and Albert K C Tsui. A multivariate generalized autoregressive conditional heteroscedasticity model with time-varying correlations. J Bus & Econ Stat,20(3):351–362,July2002. 5.1,5.4 [VAG13] AudroneVirbickaite,MConcepciónAusín,andPedroGaleano.Bayesian inferencemethodsforunivariateandmultivariateGARCHmodels: asur- vey. J Econ Surveys,00(00):n/a–n/a,September2013. 5.1 [VBK + 09] Linda M Velasquez, Ronald Boellaard, Georgia Kollia, Wendy Hayes, Otto S Hoekstra, Adriaan A Lammertsma, and Susan M Galbraith. Repeatability of 18F-FDG PET in a multicenter phase I study of pa- tients with advanced gastrointestinal malignancies. J Nuclear Medicine, 50(10):1646–54,October2009. 3.1 130 [vVvDYS97] Barry D van Veen, Wim van Drongelen, Moshe Yuchtman, and Akifumi Suzuki. Localization of brain electrical activity via linearly constrained minimumvariancespatialfiltering. IEEETransBiomedEngr,44(9):867– 80,September1997. A.2 [Wal43] AbrahamWald. Testsofstatisticalhypothesesconcerningseveralparam- eters when the number of observations is large. Trans AMS, 54(3):426– 482,1943. 2.4.2.2 [WCBD07] Xue Wang, Yonghong Chen, Steven L Bressler, and Mingzhou Ding. Granger causality between multiple interdependent neurobiological time series: blockwiseversuspairwisemethods. IntJNeuralSys,17(2):71–8, April2007. 4.2.1 [WCD08] Xue Wang, Yonghong Chen, and Mingzhou Ding. Estimating Granger causalityafterstimulusonset: acautionarynote.NeuroImage,41(3):767– 76,July2008. 2.4 [WCK + 11] Guorong Wu, Fuyong Chen, Dezhi Kang, Xiangyang Zhang, Daniele Marinazzo, and Huafu Chen. Multiscale causal connectivity analysis by canonical correlation: theory and application to epileptic brain. IEEE Trans Biomed Engr,58(11):3088–96,November2011. 4.1,4.2.3,5.1 [Wei86] Andrew A Weiss. Asymptotic theory for ARCH models: estimation and testing. Econ Theory,2(1):107–131,1986. 5.1 [Wel90] William J Welch. Construction of permutation tests. J Amer Stat Assoc, 85(411):693–698,September1990. 2.3.3 [Wel10] Stefan Wellek. Testing Statistical Hypotheses of Equivalence and Non- inferiority. Chapman and Hall, Boca Raton, FL, USA, 2 edition, 2010. 3.2.6 [WHMn + 07] Mike P Weisend, Faith M Hanlon, Rebecca Montaño, Seppo P Ahlfors, Arthur C Leuthold, Dimitrios Pantazis, John Compton Mosher, Aposto- los P Georgopoulos, Matti S Hämäläinen, and Cheryl J Aine. Paving the way for cross-site pooling of magnetoencephalography (MEG) data. International Congress Series,1300:615–618,June2007. 3.1 [Wie56] Norbert Wiener. The theory of prediction. In E F Beckenbach, ed- itor, Modern Mathematics for Engineers, chapter 8, pages 125—-139. McGraw-Hill,1956. 1 [WJM + 12] David Wheland, Anand A Joshi, Katie McMahon, N. Hansell, N Martin, MWright,PaulMThompson,DavidWShattuck,andRichardMLeahy. 131 Robust identification of partial-correlation based networks with applica- tions to cortical thickness data. Proc IEEE ISBI, pages 1551–1554, May 2012. 2.4.1 [WLF08] Jianhua Wu, Xuguang Liu, and Jianfeng Feng. Detecting causality be- tween different frequencies. J Neuro Methods, 167(2):367–75, January 2008. 4.6 [WR96] Sabine Weiss and Peter Rappelsberger. 
EEG coherence within the 13-18 Hz band as a correlate of a distinct lexical organisation of concrete and abstract nouns in humans. Neuroscience Letters, 209(1):17–20, May 1996. 2.4

[WvDKH09] Christopher T Wilke, Wim van Drongelen, Michael H Kohrman, and Bin He. Identification of epileptogenic foci from causal analysis of ECoG interictal spike activity. Clin Neuro, 120(8):1449–56, August 2009. 4.6

[WWK92] J Z Wang, S J Williamson, and L Kaufman. Magnetic source images determined by a lead-field analysis: the unique minimum-norm least-squares estimation. IEEE Trans Biomed Engr, 39(7):665–75, July 1992. A.1

[YGW+10] Anastasia Yendiki, Douglas N Greve, Stuart Wallace, Mark Vangel, Jeremy Bockholt, Bryon A Mueller, Vince Magnotta, Nancy Andreasen, Dara S Manoach, and Randy L Gollub. Multi-site characterization of an fMRI working memory paradigm: reliability of activation indices. NeuroImage, 53(1):119–31, October 2010. 3.4

[ZGW+05] Kelly H Zou, Douglas N Greve, Meng Wang, Steven D Pieper, Simon K Warfield, Nathan S White, Sanjay Manandhar, Gregory G Brown, Mark G Vangel, Ron Kikinis, and William M Wells. Reproducibility of functional MR imaging: preliminary results of prospective multi-institutional study performed by Biomedical Informatics Research Network. Radiology, 237(3):781–9, December 2005. 3.1

[Zha94] Zhengyou Zhang. Iterative point matching for registration of free-form curves and surfaces. Int J Comput Vis, 13(2):119–152, October 1994. 3.2.2

Appendix A

Source Estimation

In this appendix, we describe in detail the two most common methods used to estimate distributed source activity from MEG recordings: weighted minimum-norm estimation (MNE) and linearly-constrained minimum variance (LCMV) beamforming. We assume we have prewhitened channel recordings $\tilde{x}_b[n] \in \mathbb{R}^C$, $n = 1, \ldots, N$, with a prewhitened lead field $\tilde{G} \in \mathbb{R}^{C \times V}$ or $\mathbb{R}^{C \times 3V}$ (depending on orientation constraints), where the whitening is on the "noise" segment of the channel recordings. As in section 3.2.4, we let $\tilde{G}_v$ be the $v$th column of $\tilde{G}$ if we have orientation-constrained source estimation and $\tilde{G}_v$ be the submatrix of $\tilde{G}$ containing columns $3v-2$, $3v-1$ and $3v$ if we have orientation-unconstrained source estimation.

A.1 Weighted MNE

MNE attempts to minimize the penalized cost function

$\|\tilde{x}_b[n] - \tilde{G}\tilde{s}_b[n]\|_{\ell_2}^2 + \lambda \|\tilde{s}_b[n]\|_{\ell_2}^2$

for the sources $\tilde{s}_b[n]$. The first term determines the fit of the estimated sources to the channel recordings, while the second term regularizes the fit towards broader activity with a lower maximum across vertices. To also model the heteroscedasticity across sources, it is also important to estimate the source covariance $E\{\tilde{s}_b[n]\tilde{s}_b^T[n]\} = C_d$ as a diagonal matrix. Each diagonal element is estimated from the gain of the corresponding vertex in the forward model; this estimation is dependent on the orientation constraints in the desired source estimation:

• for orientation-constrained source estimation, $C_d \in \mathbb{R}^{V \times V}$ with each vertex $v$ having a corresponding diagonal element $C_{d,vv} = (\tilde{G}_v^T \tilde{G}_v)^{-1}$, and

• for orientation-unconstrained source estimation, $C_d \in \mathbb{R}^{3V \times 3V}$ with $C_{d,(3v-2,3v-2)} = C_{d,(3v-1,3v-1)} = C_{d,(3v,3v)} = (\mathrm{tr}\{\tilde{G}_v^T \tilde{G}_v\})^{-1}$ so all three sources corresponding to vertex $v$ are affected equally.

Then, the MNE solution is a linear mapping of $\tilde{x}_b[n]$ to $\tilde{s}_b[n]$ through the inverse kernel $H$ [WWK92]

$H = C_d^{-1} \tilde{G}^T \left( \tilde{G} C_d^{-1} \tilde{G}^T + \lambda I \right)^{-1}$

The addition of the source covariance is also called depth weighting as it reduces the bias towards sources closer to the MEG sensor array [LWA+06]. We choose $\lambda = \frac{1}{9}\,\mathrm{tr}\{\tilde{G} C_d^{-1} \tilde{G}^T\}$ to balance the condition number of the inverted matrix with the effect of the regularization on the source estimation. A minimal numerical sketch of this kernel follows.
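To make the expressions above concrete, here is a minimal NumPy sketch of the weighted MNE kernel for the orientation-constrained case, following the formulas in this appendix. The shapes, function name and random stand-ins are illustrative assumptions, not the implementation used in this dissertation.

```python
import numpy as np

def wmne_kernel(G, scale=1.0 / 9.0):
    """Weighted MNE inverse kernel, orientation-constrained case.
    G : prewhitened lead field, channels x vertices.
    Implements H = Cd^{-1} G^T (G Cd^{-1} G^T + lam*I)^{-1} with
    Cd_vv = (G_v^T G_v)^{-1} and lam = scale * tr(G Cd^{-1} G^T)."""
    C, V = G.shape
    cd_inv = np.sum(G ** 2, axis=0)        # diagonal of Cd^{-1}: ||g_v||^2
    GCdG = (G * cd_inv) @ G.T              # G Cd^{-1} G^T, shape C x C
    lam = scale * np.trace(GCdG)
    H = (cd_inv[:, None] * G.T) @ np.linalg.inv(GCdG + lam * np.eye(C))
    return H                               # vertices x channels

# Illustrative usage with random stand-ins for the lead field and data
rng = np.random.default_rng(0)
G = rng.standard_normal((32, 500))         # 32 channels, 500 vertices
x = rng.standard_normal(32)                # one prewhitened sample
s_hat = wmne_kernel(G) @ x                 # source estimate at that sample
```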
A.2 LCMV beamforming

LCMV beamforming attempts to decorrelate the sources at different vertices. LCMV beamforming can be thought of as a minimum-norm least-squares estimation [MBL03] with a different choice for estimating the source covariance. In LCMV beamforming, we construct the linear mapping from $\tilde{x}_b[n]$ to $\tilde{s}_b[n]$ through the matrices $H_v$, one for each vertex $v$. The source covariance matrix $\tilde{C}_d$ is estimated as the sample covariance of the post-stimulus channel recordings $\tilde{x}_b[n]$, $n > 0$. Then, the inverse kernel at each vertex is [vVvDYS97]

$H_v = \left( \tilde{G}_v^T \tilde{C}_d^{-1} \tilde{G}_v \right)^{-1} \tilde{G}_v^T \tilde{C}_d^{-1}$

For orientation-unconstrained source estimation, the rank of $\tilde{G}_v^T \tilde{C}_d^{-1} \tilde{G}_v$ is 2 rather than 3 because MEG is insensitive to radial sources [CC83]. In this case, we take a rank-2 Moore-Penrose pseudoinverse to reduce instability in the inversion of $\tilde{G}_v^T \tilde{C}_d^{-1} \tilde{G}_v$. For orientation-constrained source estimation, the value $\tilde{G}_v^T \tilde{C}_d^{-1} \tilde{G}_v$ is a scalar that can easily be inverted. However, due to the bias of orientation-constrained LCMV beamforming towards sources closer to the center of the brain, we normalize the orientation-constrained mapping at each vertex by its magnitude. The resulting inverse kernel is known as the Borgiotti-Kaplan beamformer [BK79, SNP+01] and simplifies to, for each vertex $v$,

$H_v = \frac{\tilde{G}_v^T \tilde{C}_d^{-1}}{\left\| \tilde{G}_v^T \tilde{C}_d^{-1} \right\|}$

The full inverse kernel $H$ is constructed by concatenating $H_v$ row-wise over all vertices. A minimal sketch of this construction follows.
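Below is a minimal NumPy sketch of the Borgiotti-Kaplan construction for the orientation-constrained case; the data and lead field are random placeholders, and the function name is an assumption for illustration only.

```python
import numpy as np

def bk_beamformer_kernel(G, Cd):
    """Borgiotti-Kaplan (unit-norm LCMV) inverse kernel, one row per vertex.
    G  : prewhitened lead field, channels x vertices (orientation-constrained)
    Cd : data covariance estimated from post-stimulus samples.
    Each row of H is G_v^T Cd^{-1} divided by its norm."""
    Cd_inv = np.linalg.inv(Cd)
    W = Cd_inv @ G                       # column v is Cd^{-1} g_v
    W = W / np.linalg.norm(W, axis=0)    # normalize each filter to unit norm
    return W.T                           # H: vertices x channels

# Hypothetical usage: Cd from post-stimulus prewhitened recordings X (C x N)
rng = np.random.default_rng(0)
X = rng.standard_normal((32, 500))       # stand-in post-stimulus data
G = rng.standard_normal((32, 100))       # stand-in lead field
Cd = X @ X.T / X.shape[1]
H = bk_beamformer_kernel(G, Cd)
print(H.shape)                           # (100, 32)
```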
Appendix B

Numerical Implementation of CGC

We describe the algorithm to optimize $\hat{G}_{y_2 \to y_1}(\alpha, \beta)$ over all weights $\alpha$ and $\beta$ such that $\|\alpha\| = 1$ and $\|\beta\| = 1$, using previous theory on constrained optimization [Smi93]. Under the constraints in CGC, $\alpha$ is forced to lie on the set of all points in $\mathbb{R}^{M_1}$ with unit norm, a set that we will denote $S^{M_1 - 1}$. Similarly, $\beta$ is forced to lie on the set $S^{M_2 - 1}$. We are optimizing over all weights, which we organize into one vector:

$w = \begin{bmatrix} \alpha \\ \beta \end{bmatrix} \in S = S^{M_1 - 1} \times S^{M_2 - 1}$

Denote the gradient of $\hat{G}_{y_2 \to y_1}(\alpha, \beta)$ in $\mathbb{R}^{M_1 + M_2}$ by

$\nabla_w \hat{G}_{y_2 \to y_1}(\alpha, \beta) = \begin{bmatrix} \nabla_\alpha \hat{G}_{y_2 \to y_1}(\alpha, \beta) \\ \nabla_\beta \hat{G}_{y_2 \to y_1}(\alpha, \beta) \end{bmatrix}$

Then, the gradient of $\hat{G}_{y_2 \to y_1}(\alpha, \beta)$ with respect to $S$ is a projection of $\nabla_w \hat{G}_{y_2 \to y_1}(\alpha, \beta)$ [AMS08]:

$\nabla_w^S \hat{G}_{y_2 \to y_1}(\alpha, \beta) = \begin{bmatrix} \dfrac{(I - \alpha\alpha^T)\nabla_\alpha \hat{G}_{y_2 \to y_1}(\alpha, \beta)}{\|(I - \alpha\alpha^T)\nabla_\alpha \hat{G}_{y_2 \to y_1}(\alpha, \beta)\|} \\[2ex] \dfrac{(I - \beta\beta^T)\nabla_\beta \hat{G}_{y_2 \to y_1}(\alpha, \beta)}{\|(I - \beta\beta^T)\nabla_\beta \hat{G}_{y_2 \to y_1}(\alpha, \beta)\|} \end{bmatrix}$

where we have forced each element of the gradient to have magnitude 1, for reasons explained later in the "Path" step of the algorithm's outline.

We use a nonlinear conjugate gradient descent algorithm with Polak-Ribière updating [EAS98] to maximize $\hat{G}_{y_2 \to y_1}(\alpha, \beta)$. The algorithm uses two termination criteria ($\varepsilon$ for the weights and $\delta$ for the value of $\hat{G}_{y_2 \to y_1}(\alpha, \beta)$) and starts at a random initial point $w_0 = [\alpha_0; \beta_0]$:

Initialization: Let $g_0 = \nabla_{w_0}^S \hat{G}_{y_2 \to y_1}$ and $d_0 = g_0 = [d_0^\alpha; d_0^\beta]$ be the initial gradient and descent direction.

Loop: For iteration $k = 1, \ldots, M_1 + M_2 - 2$, we optimize on successive curves in $S$:

Path: Form the two-dimensional curve

$w(t_1, t_2) = \begin{bmatrix} \alpha_{k-1} \cos t_1 + d_{k-1}^\alpha \sin t_1 \\ \beta_{k-1} \cos t_2 + d_{k-1}^\beta \sin t_2 \end{bmatrix}$

This curve lies on $S$ because the two elements of $d_{k-1}$ have unit norm, as mentioned earlier.

Search: Find the maximal value of $\hat{G}_{y_2 \to y_1}(\alpha, \beta)$ over the geodesic using the golden section search [Kie53]. Let the location of that optimum be $(\hat{t}_1, \hat{t}_2)$.

Update: Set the
• new weights $w(\hat{t}_1, \hat{t}_2) \to w_k = [\alpha_k; \beta_k]$,
• new gradient $g_k = \nabla_{w_k}^S \hat{G}_{y_2 \to y_1}$,
• stepsize $\gamma_k$ [PR69]

$\gamma_k = \frac{(g_k - \bar{g}_{k-1})^T g_k}{g_{k-1}^T g_{k-1}}, \qquad \bar{g}_{k-1} = \begin{bmatrix} I - \alpha_{k-1}\alpha_{k-1}^T & 0 \\ 0 & I - \beta_{k-1}\beta_{k-1}^T \end{bmatrix} g_{k-1}$

• and finally the new search direction $d_k$:

$d_k = g_k + \gamma_k \bar{d}_{k-1}, \qquad \bar{d}_{k-1} = \begin{bmatrix} -\alpha_{k-1} \sin \hat{t}_1 + d_{k-1}^\alpha \cos \hat{t}_1 \\ -\beta_{k-1} \sin \hat{t}_2 + d_{k-1}^\beta \cos \hat{t}_2 \end{bmatrix}$

Termination: If either condition below is true,

$\|w_k - w_{k-1}\| < \varepsilon$

$\hat{G}_{y_2 \to y_1}(\alpha_k, \beta_k) - \hat{G}_{y_2 \to y_1}(\alpha_{k-1}, \beta_{k-1}) < \delta$

exit the process.

Repeat: Go to iteration $k + 1$ with starting point $w_k = [\alpha_k; \beta_k]$ and search direction $d_k$.

Reset: If we reach the last iteration in the loop (so that $k = M_1 + M_2 - 2$) and neither of the termination criteria is satisfied, set $\alpha_0 = \alpha_{M_1 + M_2 - 2}$ and $\beta_0 = \beta_{M_1 + M_2 - 2}$ and repeat the algorithm.

We set $\varepsilon = \delta = 10^{-6}$ as a tradeoff between accuracy and speed. A simplified sketch of this constrained ascent appears below.
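The following is a simplified stand-in for the scheme above: plain projected gradient ascent on the product of unit spheres, with renormalization in place of the geodesic conjugate-gradient search. It illustrates the tangent-space projection and the constraint set, not the exact algorithm; the toy objective and names are assumptions for illustration.

```python
import numpy as np

def maximize_on_spheres(grad_fn, a0, b0, step=0.1, iters=500, tol=1e-6):
    """Gradient ascent of f(a, b) over S^{M1-1} x S^{M2-1}: project the
    Euclidean gradient onto the tangent space of each sphere, take a
    step, then renormalize back onto the constraint set."""
    a, b = a0 / np.linalg.norm(a0), b0 / np.linalg.norm(b0)
    for _ in range(iters):
        ga, gb = grad_fn(a, b)
        ga = ga - (a @ ga) * a           # tangent projection (I - aa^T) ga
        gb = gb - (b @ gb) * b
        a_new = a + step * ga
        b_new = b + step * gb
        a_new /= np.linalg.norm(a_new)   # retract to the sphere
        b_new /= np.linalg.norm(b_new)
        done = np.linalg.norm(a_new - a) + np.linalg.norm(b_new - b) < tol
        a, b = a_new, b_new
        if done:
            break
    return a, b

# Toy example: maximize (u^T a)^2 + (v^T b)^2, maximized at a = +/-u, b = +/-v
u = np.array([0.6, 0.8])
v = np.array([1.0, 0.0, 0.0])
grad = lambda a, b: (2 * (u @ a) * u, 2 * (v @ b) * v)
a, b = maximize_on_spheres(grad, np.ones(2), np.ones(3))
```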
Appendix C

Tabulated Results in CGC Simulations

In the tables below, each model order $P$ has one row per method; the nine columns are grouped by SNR (1 | 5 | 25), and within each SNR group the columns correspond to SIR = 1, 5, 25.

P = 2
  GC   [0.810,0.842] [0.819,0.848] [0.839,0.872] | [0.849,0.876] [0.879,0.903] [0.889,0.915] | [0.864,0.893] [0.889,0.915] [0.910,0.932]
  CGC  [0.616,0.657] [0.619,0.657] [0.645,0.691] | [0.696,0.732] [0.681,0.717] [0.704,0.745] | [0.715,0.755] [0.706,0.750] [0.735,0.775]
  MGC  [0.609,0.648] [0.614,0.655] [0.622,0.668] | [0.678,0.714] [0.654,0.689] [0.678,0.723] | [0.690,0.734] [0.668,0.712] [0.686,0.730]
P = 4
  GC   [0.813,0.841] [0.850,0.876] [0.843,0.874] | [0.856,0.883] [0.897,0.919] [0.901,0.925] | [0.857,0.889] [0.895,0.920] [0.909,0.932]
  CGC  [0.636,0.673] [0.631,0.670] [0.671,0.715] | [0.696,0.733] [0.690,0.728] [0.723,0.762] | [0.709,0.751] [0.717,0.758] [0.753,0.793]
  MGC  [0.602,0.642] [0.587,0.628] [0.615,0.664] | [0.624,0.663] [0.622,0.661] [0.651,0.695] | [0.658,0.705] [0.661,0.707] [0.633,0.676]
P = 6
  GC   [0.774,0.806] [0.805,0.876] [0.820,0.854] | [0.829,0.857] [0.871,0.895] [0.886,0.913] | [0.844,0.875] [0.884,0.910] [0.888,0.913]
  CGC  [0.590,0.628] [0.616,0.657] [0.610,0.656] | [0.671,0.710] [0.671,0.709] [0.696,0.740] | [0.673,0.716] [0.700,0.742] [0.723,0.763]
  MGC  [0.528,0.568] [0.568,0.607] [0.555,0.604] | [0.596,0.635] [0.602,0.643] [0.599,0.644] | [0.587,0.635] [0.586,0.634] [0.591,0.641]
P = 8
  GC   [0.716,0.751] [0.752,0.786] [0.758,0.795] | [0.778,0.808] [0.840,0.867] [0.842,0.875] | [0.792,0.828] [0.864,0.892] [0.863,0.893]
  CGC  [0.553,0.597] [0.565,0.604] [0.550,0.598] | [0.616,0.655] [0.653,0.692] [0.631,0.678] | [0.649,0.697] [0.681,0.724] [0.688,0.732]
  MGC  [0.514,0.554] [0.514,0.553] [0.501,0.551] | [0.531,0.573] [0.558,0.598] [0.522,0.571] | [0.543,0.589] [0.570,0.615] [0.534,0.588]
P = 10
  GC   [0.624,0.662] [0.651,0.668] [0.672,0.717] | [0.695,0.734] [0.770,0.801] [0.788,0.824] | [0.725,0.768] [0.801,0.838] [0.848,0.879]
  CGC  [0.522,0.561] [0.526,0.566] [0.536,0.585] | [0.560,0.600] [0.587,0.626] [0.583,0.632] | [0.601,0.647] [0.622,0.666] [0.668,0.712]
  MGC  [0.485,0.527] [0.486,0.527] [0.491,0.540] | [0.501,0.542] [0.492,0.535] [0.526,0.575] | [0.510,0.561] [0.480,0.526] [0.494,0.545]

Table C.1: 95% confidence interval of the area under the ROC curve for differing AR model orders P, SIR and SNR. In this table, we use N = 100 samples. For each P, the rows give GC with known weights, CGC and MGC. CGC's interval is consistently higher than MGC's. As SNR increases, N decreases or P increases, this difference in performance increases.

P = 2
  GC   [0.924,0.942] [0.935,0.950] [0.931,0.949] | [0.948,0.962] [0.970,0.979] [0.973,0.984] | [0.956,0.971] [0.969,0.979] [0.976,0.986]
  CGC  [0.758,0.790] [0.775,0.806] [0.762,0.801] | [0.846,0.873] [0.852,0.878] [0.861,0.889] | [0.873,0.900] [0.858,0.887] [0.888,0.915]
  MGC  [0.735,0.769] [0.748,0.784] [0.738,0.779] | [0.812,0.842] [0.815,0.842] [0.833,0.864] | [0.835,0.866] [0.823,0.856] [0.839,0.870]
P = 4
  GC   [0.928,0.945] [0.959,0.970] [0.957,0.972] | [0.966,0.976] [0.977,0.985] [0.979,0.988] | [0.965,0.977] [0.978,0.987] [0.983,0.991]
  CGC  [0.789,0.819] [0.807,0.836] [0.822,0.855] | [0.869,0.893] [0.869,0.894] [0.882,0.911] | [0.883,0.909] [0.899,0.922] [0.909,0.931]
  MGC  [0.729,0.764] [0.732,0.767] [0.749,0.787] | [0.789,0.820] [0.800,0.830] [0.806,0.842] | [0.821,0.855] [0.826,0.859] [0.822,0.856]
P = 6
  GC   [0.921,0.938] [0.942,0.956] [0.955,0.970] | [0.948,0.962] [0.975,0.984] [0.977,0.986] | [0.957,0.973] [0.979,0.987] [0.982,0.990]
  CGC  [0.755,0.786] [0.776,0.808] [0.788,0.824] | [0.843,0.869] [0.879,0.901] [0.886,0.911] | [0.876,0.906] [0.896,0.922] [0.903,0.926]
  MGC  [0.665,0.698] [0.686,0.725] [0.680,0.723] | [0.759,0.795] [0.763,0.795] [0.771,0.808] | [0.782,0.820] [0.796,0.834] [0.778,0.817]
P = 8
  GC   [0.882,0.904] [0.925,0.943] [0.910,0.932] | [0.928,0.945] [0.967,0.977] [0.968,0.979] | [0.942,0.960] [0.973,0.983] [0.977,0.986]
  CGC  [0.674,0.714] [0.714,0.747] [0.731,0.773] | [0.808,0.838] [0.839,0.866] [0.836,0.867] | [0.854,0.883] [0.870,0.897] [0.882,0.908]
  MGC  [0.604,0.644] [0.615,0.652] [0.653,0.697] | [0.704,0.742] [0.711,0.745] [0.693,0.736] | [0.744,0.783] [0.736,0.774] [0.740,0.778]
P = 10
  GC   [0.790,0.821] [0.846,0.872] [0.854,0.883] | [0.872,0.896] [0.944,0.958] [0.952,0.967] | [0.904,0.927] [0.954,0.969] [0.972,0.983]
  CGC  [0.604,0.642] [0.628,0.668] [0.673,0.719] | [0.731,0.766] [0.776,0.808] [0.784,0.821] | [0.831,0.862] [0.851,0.880] [0.876,0.903]
  MGC  [0.569,0.608] [0.559,0.601] [0.588,0.637] | [0.620,0.660] [0.641,0.679] [0.634,0.678] | [0.684,0.729] [0.695,0.739] [0.697,0.741]

Table C.2: 95% confidence interval of the area under the ROC curve for differing AR model orders P, SIR and SNR. In this table, we use N = 200 samples. For each P, the rows give GC with known weights, CGC and MGC. CGC's interval is consistently higher than MGC's. As SNR increases, N decreases or P increases, this difference in performance increases. Compared with table C.1, all the intervals are higher as well.
| P | Method | SNR=1, SIR=1 | SNR=1, SIR=5 | SNR=1, SIR=25 | SNR=5, SIR=1 | SNR=5, SIR=5 | SNR=5, SIR=25 | SNR=25, SIR=1 | SNR=25, SIR=5 | SNR=25, SIR=25 |
|---|---|---|---|---|---|---|---|---|---|---|
| 2 | GC | [0.981, 0.988] | [0.987, 0.993] | [0.987, 0.993] | [0.991, 0.996] | [0.879, 0.903] | [0.995, 0.998] | [0.993, 0.996] | [0.998, 0.999] | [0.997, 0.999] |
| 2 | CGC | [0.913, 0.932] | [0.920, 0.939] | [0.916, 0.939] | [0.957, 0.969] | [0.966, 0.976] | [0.964, 0.977] | [0.968, 0.980] | [0.970, 0.981] | [0.977, 0.986] |
| 2 | MGC | [0.886, 0.909] | [0.894, 0.916] | [0.884, 0.911] | [0.934, 0.950] | [0.946, 0.960] | [0.944, 0.962] | [0.950, 0.965] | [0.947, 0.963] | [0.954, 0.968] |
| 4 | GC | [0.990, 0.995] | [0.995, 0.998] | [0.995, 0.998] | [0.996, 0.998] | [0.999, 0.999] | [0.999, 1.000] | [0.995, 0.998] | [0.999, 1.000] | [0.999, 1.000] |
| 4 | CGC | [0.940, 0.955] | [0.954, 0.966] | [0.949, 0.965] | [0.972, 0.982] | [0.980, 0.987] | [0.983, 0.990] | [0.984, 0.991] | [0.984, 0.991] | [0.988, 0.994] |
| 4 | MGC | [0.889, 0.911] | [0.899, 0.920] | [0.898, 0.922] | [0.934, 0.950] | [0.948, 0.962] | [0.948, 0.964] | [0.948, 0.963] | [0.957, 0.970] | [0.952, 0.967] |
| 6 | GC | [0.988, 0.993] | [0.994, 0.997] | [0.995, 0.998] | [0.995, 0.998] | [0.998, 0.999] | [0.999, 1.000] | [0.995, 0.998] | [0.999, 1.000] | [0.999, 1.000] |
| 6 | CGC | [0.935, 0.950] | [0.949, 0.962] | [0.945, 0.963] | [0.977, 0.985] | [0.979, 0.987] | [0.981, 0.989] | [0.980, 0.989] | [0.991, 0.995] | [0.989, 0.994] |
| 6 | MGC | [0.855, 0.880] | [0.872, 0.896] | [0.879, 0.906] | [0.928, 0.945] | [0.924, 0.940] | [0.935, 0.954] | [0.940, 0.957] | [0.948, 0.963] | [0.942, 0.959] |
| 8 | GC | [0.982, 0.988] | [0.993, 0.996] | [0.990, 0.996] | [0.993, 0.996] | [0.998, 0.999] | [0.998, 0.999] | [0.993, 0.997] | [0.997, 1.000] | [0.999, 1.000] |
| 8 | CGC | [0.901, 0.921] | [0.921, 0.939] | [0.931, 0.949] | [0.965, 0.977] | [0.976, 0.984] | [0.978, 0.987] | [0.978, 0.988] | [0.985, 0.992] | [0.987, 0.993] |
| 8 | MGC | [0.787, 0.818] | [0.809, 0.836] | [0.826, 0.859] | [0.890, 0.911] | [0.904, 0.922] | [0.898, 0.922] | [0.922, 0.942] | [0.924, 0.944] | [0.937, 0.956] |
| 10 | GC | [0.937, 0.952] | [0.964, 0.976] | [0.969, 0.981] | [0.974, 0.982] | [0.996, 0.998] | [0.995, 0.998] | [0.981, 0.990] | [0.998, 0.999] | [0.997, 1.000] |
| 10 | CGC | [0.789, 0.819] | [0.822, 0.849] | [0.842, 0.873] | [0.942, 0.956] | [0.950, 0.963] | [0.962, 0.975] | [0.970, 0.982] | [0.979, 0.988] | [0.985, 0.992] |
| 10 | MGC | [0.707, 0.742] | [0.701, 0.737] | [0.720, 0.763] | [0.843, 0.871] | [0.841, 0.866] | [0.851, 0.881] | [0.903, 0.927] | [0.902, 0.925] | [0.902, 0.925] |

Table C.3: 95% confidence interval of the area under the ROC curve for differing AR model orders P, SIR and SNR. In this table, we use N = 400 samples. GC rows give GC with known weights (standard text in the original), CGC rows give CGC (bold in the original) and MGC rows give MGC (italics in the original). CGC's interval is consistently higher than MGC's. As SNR increases, N decreases or P increases, this difference in performance increases. In addition, compared with tables C.1 and C.2, all the confidence intervals are higher.
| P | Method | SNR=1, SIR=1 | SNR=1, SIR=5 | SNR=1, SIR=25 | SNR=5, SIR=1 | SNR=5, SIR=5 | SNR=5, SIR=25 | SNR=25, SIR=1 | SNR=25, SIR=5 | SNR=25, SIR=25 |
|---|---|---|---|---|---|---|---|---|---|---|
| 2 | CGC | [0.005, 0.203] | [0.005, 0.210] | [0.005, 0.217] | [0.008, 0.219] | [0.007, 0.212] | [0.007, 0.209] | [0.017, 0.216] | [0.018, 0.219] | [0.017, 0.200] |
| 2 | GCCA | [0.003, 0.214] | [0.002, 0.203] | [0.003, 0.216] | [0.002, 0.183] | [0.002, 0.194] | [0.003, 0.185] | [0.002, 0.168] | [0.002, 0.172] | [0.002, 0.170] |
| 4 | CGC | [0.006, 0.254] | [0.006, 0.265] | [0.008, 0.265] | [0.009, 0.266] | [0.009, 0.268] | [0.012, 0.257] | [0.016, 0.275] | [0.017, 0.262] | [0.024, 0.261] |
| 4 | GCCA | [0.007, 0.269] | [0.006, 0.275] | [0.008, 0.276] | [0.004, 0.255] | [0.005, 0.237] | [0.004, 0.250] | [0.004, 0.229] | [0.003, 0.234] | [0.004, 0.224] |
| 6 | CGC | [0.007, 0.314] | [0.008, 0.322] | [0.008, 0.319] | [0.013, 0.313] | [0.015, 0.313] | [0.018, 0.329] | [0.018, 0.310] | [0.028, 0.318] | [0.029, 0.317] |
| 6 | GCCA | [0.021, 0.340] | [0.013, 0.311] | [0.017, 0.320] | [0.009, 0.301] | [0.012, 0.293] | [0.009, 0.278] | [0.006, 0.303] | [0.008, 0.289] | [0.007, 0.283] |
| 8 | CGC | [0.009, 0.374] | [0.008, 0.382] | [0.009, 0.382] | [0.015, 0.377] | [0.011, 0.388] | [0.014, 0.390] | [0.021, 0.377] | [0.030, 0.377] | [0.030, 0.367] |
| 8 | GCCA | [0.033, 0.394] | [0.030, 0.371] | [0.024, 0.382] | [0.023, 0.363] | [0.016, 0.373] | [0.021, 0.354] | [0.015, 0.336] | [0.012, 0.349] | [0.015, 0.351] |
| 10 | CGC | [0.015, 0.430] | [0.009, 0.440] | [0.010, 0.439] | [0.017, 0.442] | [0.013, 0.455] | [0.024, 0.451] | [0.017, 0.445] | [0.021, 0.443] | [0.039, 0.451] |
| 10 | GCCA | [0.053, 0.440] | [0.047, 0.451] | [0.058, 0.445] | [0.045, 0.430] | [0.037, 0.434] | [0.041, 0.417] | [0.028, 0.408] | [0.026, 0.414] | [0.027, 0.405] |

Table C.4: 95% confidence interval of error of the estimated regional measure with respect to the true bivariate GC with weights known, for differing AR model order P, SIR and SNR. In this table, we use N = 100 samples. CGC rows give the error for CGC (bold in the original) and GCCA rows give the error for GCCA (italics in the original). As the number of samples or the order increases, CGC performs better than GCCA. In addition, as SNR decreases, CGC performs worse.

| P | Method | SNR=1, SIR=1 | SNR=1, SIR=5 | SNR=1, SIR=25 | SNR=5, SIR=1 | SNR=5, SIR=5 | SNR=5, SIR=25 | SNR=25, SIR=1 | SNR=25, SIR=5 | SNR=25, SIR=25 |
|---|---|---|---|---|---|---|---|---|---|---|
| 2 | CGC | [0.001, 0.089] | [0.001, 0.088] | [0.001, 0.085] | [0.001, 0.090] | [0.002, 0.089] | [0.002, 0.087] | [0.003, 0.092] | [0.003, 0.088] | [0.004, 0.094] |
| 2 | GCCA | [0.002, 0.139] | [0.002, 0.132] | [0.002, 0.135] | [0.001, 0.097] | [0.001, 0.098] | [0.001, 0.090] | [0.001, 0.095] | [0.001, 0.090] | [0.001, 0.084] |
| 4 | CGC | [0.002, 0.108] | [0.001, 0.107] | [0.001, 0.102] | [0.002, 0.100] | [0.002, 0.105] | [0.002, 0.105] | [0.002, 0.103] | [0.003, 0.099] | [0.004, 0.114] |
| 4 | GCCA | [0.004, 0.182] | [0.004, 0.158] | [0.004, 0.166] | [0.002, 0.140] | [0.001, 0.127] | [0.001, 0.128] | [0.001, 0.130] | [0.001, 0.115] | [0.001, 0.109] |
| 6 | CGC | [0.002, 0.126] | [0.002, 0.120] | [0.001, 0.122] | [0.002, 0.117] | [0.002, 0.116] | [0.002, 0.121] | [0.003, 0.123] | [0.003, 0.116] | [0.004, 0.120] |
| 6 | GCCA | [0.008, 0.209] | [0.004, 0.192] | [0.006, 0.184] | [0.003, 0.159] | [0.002, 0.154] | [0.002, 0.157] | [0.002, 0.150] | [0.003, 0.159] | [0.002, 0.153] |
| 8 | CGC | [0.003, 0.145] | [0.002, 0.145] | [0.002, 0.143] | [0.002, 0.129] | [0.002, 0.126] | [0.002, 0.126] | [0.003, 0.124] | [0.002, 0.128] | [0.003, 0.126] |
| 8 | GCCA | [0.013, 0.249] | [0.011, 0.252] | [0.009, 0.225] | [0.006, 0.204] | [0.004, 0.209] | [0.003, 0.182] | [0.004, 0.190] | [0.003, 0.170] | [0.003, 0.193] |
| 10 | CGC | [0.002, 0.171] | [0.003, 0.170] | [0.002, 0.169] | [0.002, 0.140] | [0.002, 0.145] | [0.003, 0.150] | [0.003, 0.141] | [0.003, 0.147] | [0.003, 0.147] |
| 10 | GCCA | [0.028, 0.288] | [0.024, 0.288] | [0.017, 0.278] | [0.013, 0.241] | [0.009, 0.243] | [0.007, 0.234] | [0.005, 0.218] | [0.004, 0.230] | [0.004, 0.207] |

Table C.5: 95% confidence interval of error of the estimated regional measure with respect to the true bivariate GC with weights known, for differing AR model order P, SIR and SNR. In this table, we use N = 200 samples. CGC rows give the error for CGC (bold in the original) and GCCA rows give the error for GCCA (italics in the original). As the number of samples or the order increases, CGC performs better than GCCA.
In addition, as SNR decreases, CGC performs worse.

| P | Method | SNR=1, SIR=1 | SNR=1, SIR=5 | SNR=1, SIR=25 | SNR=5, SIR=1 | SNR=5, SIR=5 | SNR=5, SIR=25 | SNR=25, SIR=1 | SNR=25, SIR=5 | SNR=25, SIR=25 |
|---|---|---|---|---|---|---|---|---|---|---|
| 2 | CGC | [0.001, 0.070] | [0.001, 0.061] | [0.001, 0.059] | [0.001, 0.040] | [0.001, 0.039] | [0.001, 0.045] | [0.001, 0.042] | [0.001, 0.039] | [0.001, 0.041] |
| 2 | GCCA | [0.001, 0.083] | [0.001, 0.076] | [0.001, 0.068] | [0.000, 0.044] | [0.001, 0.041] | [0.000, 0.042] | [0.001, 0.045] | [0.000, 0.037] | [0.001, 0.037] |
| 4 | CGC | [0.001, 0.088] | [0.001, 0.080] | [0.001, 0.071] | [0.001, 0.045] | [0.001, 0.042] | [0.001, 0.044] | [0.001, 0.045] | [0.001, 0.045] | [0.001, 0.044] |
| 4 | GCCA | [0.002, 0.109] | [0.002, 0.105] | [0.001, 0.092] | [0.001, 0.061] | [0.001, 0.054] | [0.001, 0.049] | [0.000, 0.047] | [0.000, 0.037] | [0.000, 0.038] |
| 6 | CGC | [0.001, 0.110] | [0.001, 0.102] | [0.001, 0.104] | [0.001, 0.055] | [0.001, 0.048] | [0.001, 0.050] | [0.001, 0.048] | [0.001, 0.049] | [0.001, 0.045] |
| 6 | GCCA | [0.003, 0.133] | [0.002, 0.125] | [0.002, 0.122] | [0.001, 0.085] | [0.001, 0.073] | [0.001, 0.068] | [0.001, 0.058] | [0.001, 0.055] | [0.000, 0.051] |
| 8 | CGC | [0.002, 0.150] | [0.002, 0.140] | [0.002, 0.134] | [0.001, 0.085] | [0.001, 0.072] | [0.001, 0.063] | [0.001, 0.052] | [0.001, 0.049] | [0.001, 0.046] |
| 8 | GCCA | [0.006, 0.181] | [0.005, 0.166] | [0.004, 0.159] | [0.001, 0.117] | [0.001, 0.107] | [0.001, 0.097] | [0.001, 0.087] | [0.001, 0.071] | [0.001, 0.071] |
| 10 | CGC | [0.002, 0.178] | [0.003, 0.167] | [0.002, 0.173] | [0.002, 0.127] | [0.001, 0.119] | [0.001, 0.111] | [0.001, 0.070] | [0.001, 0.056] | [0.001, 0.055] |
| 10 | GCCA | [0.012, 0.212] | [0.010, 0.203] | [0.012, 0.207] | [0.004, 0.156] | [0.004, 0.158] | [0.002, 0.145] | [0.001, 0.122] | [0.001, 0.116] | [0.001, 0.099] |

Table C.6: 95% confidence interval of error of the estimated regional measure with respect to the true bivariate GC with weights known, for differing AR model order P, SIR and SNR. In this table, we use N = 400 samples. CGC rows give the error for CGC (bold in the original) and GCCA rows give the error for GCCA (italics in the original). As the number of samples or the order increases, CGC performs better than GCCA. In addition, as SNR decreases, CGC performs worse.

| N | P | SNR=1, SIR=1 | SNR=1, SIR=5 | SNR=1, SIR=25 | SNR=5, SIR=1 | SNR=5, SIR=5 | SNR=5, SIR=25 | SNR=25, SIR=1 | SNR=25, SIR=5 | SNR=25, SIR=25 |
|---|---|---|---|---|---|---|---|---|---|---|
| 100 | 2 | 44.20% | 45.47% | 44.29% | 32.13% | 32.27% | 30.29% | 25.43% | 28.19% | 26.86% |
| 100 | 4 | 52.07% | 48.27% | 47.81% | 39.87% | 37.67% | 36.48% | 35.62% | 32.19% | 31.71% |
| 100 | 6 | 52.33% | 50.33% | 49.43% | 44.00% | 41.07% | 39.43% | 41.90% | 38.10% | 38.00% |
| 100 | 8 | 54.13% | 53.13% | 52.38% | 46.13% | 47.67% | 42.76% | 44.38% | 38.57% | 41.81% |
| 100 | 10 | 54.60% | 55.40% | 53.43% | 49.93% | 48.53% | 48.48% | 45.33% | 44.48% | 44.57% |
| 200 | 2 | 65.27% | 62.67% | 61.81% | 45.60% | 42.67% | 41.81% | 34.95% | 35.05% | 32.95% |
| 200 | 4 | 70.60% | 68.87% | 68.48% | 55.27% | 51.07% | 48.57% | 42.57% | 40.48% | 39.24% |
| 200 | 6 | 77.47% | 74.67% | 73.71% | 59.73% | 56.13% | 53.05% | 48.95% | 50.10% | 46.00% |
| 200 | 8 | 81.00% | 82.00% | 78.67% | 67.53% | 67.00% | 59.05% | 57.62% | 54.76% | 53.62% |
| 200 | 10 | 87.20% | 86.40% | 82.67% | 77.47% | 76.60% | 73.05% | 66.10% | 64.57% | 61.62% |
| 400 | 2 | 75.40% | 72.47% | 69.43% | 41.13% | 37.53% | 32.10% | 17.62% | 15.05% | 11.62% |
| 400 | 4 | 83.73% | 80.47% | 79.90% | 56.60% | 50.73% | 46.76% | 26.57% | 24.48% | 23.14% |
| 400 | 6 | 87.07% | 87.20% | 84.57% | 65.33% | 63.27% | 58.76% | 39.33% | 38.29% | 33.71% |
| 400 | 8 | 92.73% | 92.40% | 90.29% | 78.47% | 76.40% | 72.00% | 53.71% | 50.48% | 47.33% |
| 400 | 10 | 94.80% | 94.80% | 94.67% | 86.80% | 88.27% | 85.81% | 70.10% | 68.38% | 65.05% |

Table C.7: Percentage of simulations in which CGC is closer to the true causality in simulation than GCCA. Values above 50% (bold in the original) indicate those parameters for which CGC has less error a majority of the time. At low SNR or high model order P, CGC is closer to the true causality than GCCA for most simulations. For low numbers of samples N and low model order P, GCCA benefits from having fewer parameters, and is thus a slightly better estimator of the underlying causality; however, based on tables C.4, C.5 and C.6, both metrics are poor in estimating the underlying causality.
Appendix D
ADMM Steps for vARCH Modeling

In this appendix, our goal is to find the minimum of the function

$$\begin{aligned} L(w, C_1,\ldots,C_R, H[R+1,\ldots,N], N[R+1,\ldots,N]) = {} & \frac{(N-1)M}{2}\ln 2\pi + \frac{1}{2}\sum_{n=R+1}^{N}\ln|H[n]| + \frac{1}{2}\sum_{n=R+1}^{N}\varepsilon^T[n]H^{-1}[n]\varepsilon[n] \\ & + \frac{\rho}{2}\sum_{n=R+1}^{N}\left\|\mathrm{hcev}\!\left(w+\sum_{r=1}^{R}C_r e[n-r]\right) - H[n] + N[n]\right\|_F^2 - \frac{\rho}{2}\sum_{n=R+1}^{N}\|N[n]\|_F^2 \end{aligned}$$

where the parameters are $(w, C_1,\ldots,C_R, H[R+1,\ldots,N])$ and the Lagrange multipliers are $N[R+1,\ldots,N]$. We do this using an alternating direction approach, whereby we iteratively (and in order)

• update $(w, C_1,\ldots,C_R)$ holding all other variables constant,
• update $H[R+1,\ldots,N]$ holding all other variables constant, and then
• ascend $N[R+1,\ldots,N]$ based on the new updates for all the parameters.

We describe each step below, including the stopping criteria (which require the calculation of the primal and dual gaps).

D.1 Updating the Parameters in the vARCH Model

Holding the substituted conditional covariance matrices and the Lagrange multiplier matrices constant, the only term that affects the vARCH parameters is the least-squares term

$$\min_{w,C_1,\ldots,C_R}\sum_{n=R+1}^{N}\left\|\mathrm{hcev}\!\left(w+\sum_{r=1}^{R}C_r e[n-r]\right) - H[n] + N[n]\right\|_F^2$$

This can be simplified by exploiting the symmetry of the matrix within the Frobenius norm. Each off-diagonal element of $\mathrm{hcev}\left(w+\sum_{r=1}^{R}C_r e[n-r]\right)$ is represented twice, while each on-diagonal element is represented once. Thus, within the norm, each off-diagonal element has weight 1 while each on-diagonal element has weight $\frac{1}{\sqrt{2}}$. So, we can transfer the half-vectorization operator onto the known matrices $H[n]$ and $N[n]$ through the vector

$$p = \mathrm{vech}\!\left(1_{M\times M} + \left(\tfrac{1}{\sqrt{2}} - 1\right) I_{M\times M}\right)$$

where $I_{M\times M}$ is the $M\times M$ identity matrix and $1_{M\times M}$ is the $M\times M$ matrix with a 1 at every location. Using this constant vector we have

$$\min_{w,C_1,\ldots,C_R}\sum_{n=R+1}^{N}\left\|\mathrm{hcev}\!\left(w+\sum_{r=1}^{R}C_r e[n-r]\right) - H[n]+N[n]\right\|_F^2 = \min_{w,C_1,\ldots,C_R}\sum_{n=R+1}^{N}\left\| p \circ \left(w + \sum_{r=1}^{R}C_r e[n-r] - \mathrm{vech}(H[n]-N[n])\right)\right\|_{\ell_2}^2$$

where $\circ$ represents the element-wise, or Hadamard, multiplication. We additionally define $P \in \mathbb{R}^{\frac{1}{2}M(M+1)\times\frac{1}{2}M(M+1)}$ as a diagonal matrix whose diagonal corresponds to the elements in $p$. We use properties of Kronecker products to write $PC_r e[n-r] = \mathrm{vec}\left(PC_r e[n-r]\right) = \left(e^T[n-r]\otimes P\right)\mathrm{vec}(C_r)$, where $\mathrm{vec}(\cdot)$ converts a matrix into a column vector by stacking the columns in order and $\otimes$ is the Kronecker product. We then have a weighted least-squares estimation problem at each time

$$\min_{w,C_1,\ldots,C_R}\sum_{n=R+1}^{N}\left\| Pw + \sum_{r=1}^{R}\left(e^T[n-r]\otimes P\right)\mathrm{vec}(C_r) - P\,\mathrm{vech}(H[n]-N[n])\right\|_{\ell_2}^2$$

which can be stacked in time to form one weighted least-squares problem. Using Kronecker product rules, we can further simplify this step using the singular value decomposition

$$\begin{bmatrix} 1 & e^T[R] & e^T[R-1] & \cdots & e^T[1] \\ 1 & e^T[R+1] & e^T[R] & \cdots & e^T[2] \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & e^T[N-2] & e^T[N-3] & \cdots & e^T[N-R-1] \\ 1 & e^T[N-1] & e^T[N-2] & \cdots & e^T[N-R] \end{bmatrix} = U_e \Sigma_e V_e^T$$

and the fact that $P^{-1}$ is just a diagonal matrix whose nonzero values are inverses of the corresponding nonzero values in $P$:

$$\begin{bmatrix} \hat{w} \\ \mathrm{vec}\left(\hat{C}_1\right) \\ \vdots \\ \mathrm{vec}\left(\hat{C}_R\right)\end{bmatrix} = \left(V_e \Sigma_e^{\dagger} U_e^T \otimes P^{-1}\right)\begin{bmatrix} p \circ \mathrm{vech}(H[R+1]-N[R+1]) \\ p \circ \mathrm{vech}(H[R+2]-N[R+2]) \\ \vdots \\ p \circ \mathrm{vech}(H[N]-N[N]) \end{bmatrix}$$

By taking a singular value decomposition of a smaller matrix, we reduce the computation time and the instability of the pseudoinverse operator. In addition, the matrix $V_e \Sigma_e^{\dagger} U_e^T \otimes P^{-1}$ is a function solely of constants and the signal itself. Thus, the matrix can be precomputed and stored to avoid computing a singular value decomposition at each iteration of the algorithm.
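This precomputed solve lends itself to a direct implementation. Below is a short sketch (our own illustration, not the dissertation's code), assuming e is an (N, L) NumPy array whose row n holds the lagged regressor e[n] with L = M(M+1)/2, HN holds H[n] - N[n] for each time, X has full column rank, and p is the weight vector defined above; the names vech and update_varch_params are ours.

```python
import numpy as np

def vech(A):
    """Half-vectorization: stack the lower-triangle entries of A (order kept consistent throughout this sketch)."""
    return A[np.tril_indices(A.shape[0])]

def update_varch_params(e, HN, R, p):
    """Weighted least-squares update of (w, C_1, ..., C_R) from section D.1."""
    N, L = e.shape
    # Stacked regressor rows [1, e^T[n-1], ..., e^T[n-R]] for n = R+1, ..., N
    X = np.column_stack([np.ones(N - R)] +
                        [e[R - r:N - r] for r in range(1, R + 1)])
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    pinv_X = Vt.T @ np.diag(1.0 / s) @ U.T      # V_e Sigma_e^+ U_e^T, precomputable
    # Weighted targets p o vech(H[n] - N[n]), one column per time sample
    RHS = np.column_stack([p * vech(HN[n]) for n in range(R, N)])
    # (pinv_X kron P^{-1}) acts via vec identities: P^{-1} in space, pinv_X in time
    sol = (RHS / p[:, None]) @ pinv_X.T         # shape (L, 1 + R*L)
    w_hat = sol[:, 0]
    C_hat = [sol[:, 1 + r * L:1 + (r + 1) * L] for r in range(R)]
    return w_hat, C_hat
```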
D.2 Updating the conditional covariance matrices

Holding the vARCH parameters and Lagrange multiplier matrices constant, we can solve for each covariance matrix $H[n]$ independently, since the only part of the cost function relating to $H[n]$ is the marginal likelihood at time $n$

$$L_n(w, C, H[n], N[n]) = \frac{M}{2}\ln 2\pi + \frac{1}{2}\ln|H[n]| + \frac{1}{2}\varepsilon^T[n]H^{-1}[n]\varepsilon[n] + \frac{\rho}{2}\left\|\mathrm{hcev}\!\left(w+\sum_{r=1}^{R}C_r e[n-r]\right) - H[n] + N[n]\right\|_F^2 - \frac{\rho}{2}\|N[n]\|_F^2$$

The gradient of this cost function can be computed using formulas in matrix calculus [PP12]:

$$\begin{aligned}\frac{\partial L_n}{\partial H[n]} &= \frac{1}{2}\frac{\partial \ln|H[n]|}{\partial H[n]} + \frac{1}{2}\frac{\partial\,\mathrm{Tr}\left(H^{-1}[n]\varepsilon[n]\varepsilon^T[n]\right)}{\partial H[n]} + \frac{\rho}{2}\frac{\partial \left\|H[n]-N[n]-\mathrm{hcev}\left(w+\sum_{r=1}^{R}C_r e[n-r]\right)\right\|_F^2}{\partial H[n]} \\ &= \frac{1}{2}\left(H^{-1}[n] - H^{-1}[n]\varepsilon[n]\varepsilon^T[n]H^{-1}[n]\right) + \rho\left(H[n]-N[n]-\mathrm{hcev}\!\left(w+\sum_{r=1}^{R}C_r e[n-r]\right)\right)\end{aligned}$$

When setting this to zero, the result is a cubic matrix equation that is very difficult to solve. However, by moving the term corresponding to $\rho H[n]$ over to the left-hand side, we get that at the optimal covariance,

$$\hat{H}[n] = N[n] + \mathrm{hcev}\!\left(w + \sum_{r=1}^{R}C_r e[n-r]\right) + \frac{1}{2\rho}H^{-1}[n]\left(\varepsilon[n]\varepsilon^T[n] - H[n]\right)H^{-1}[n]$$

so we can apply this fixed-point iteration a few times to get the best estimate $\hat{H}[n]$. Since the iterations are completely separate for different $n$, we can use parallelization to increase the speed of this computation; in the implementation we use the mmx toolbox (http://www.cs.washington.edu/people/postdocs/tassa/code/) for fast computation of the fixed-point iterations.

D.3 Updating the Lagrange multiplier matrices

In this step, we want to use dual ascent to provide an update for the Lagrange multiplier matrices. We can find the ascent direction by taking the derivative of the cost function with respect to each Lagrange matrix $N[n]$. Similar to the updates for the conditional covariance matrices (section D.2), we only have to look at the marginal likelihood at time $n$. Using formulas in matrix calculus, the derivative is

$$\frac{\partial L_n}{\partial N[n]} = \rho\left(\mathrm{hcev}\!\left(w+\sum_{r=1}^{R}C_r e[n-r]\right) - H[n] + N[n]\right) - \rho N[n] = \rho\left(\mathrm{hcev}\!\left(w+\sum_{r=1}^{R}C_r e[n-r]\right) - H[n]\right)$$

so that the update is (incorporating the augmented Lagrangian constant $\rho$ into the stepsize):

$$\hat{N}[n] = N[n] + \alpha[n]\left(\mathrm{hcev}\!\left(w+\sum_{r=1}^{R}C_r e[n-r]\right) - H[n]\right)$$

We choose the stepsize $\alpha[n]=1$ for all values of $n$ so that the resulting estimate set $\left(\hat{w}, \hat{C}_1,\ldots,\hat{C}_R, \hat{H}[R+1,\ldots,N], \hat{N}[R+1,\ldots,N]\right)$ is dual feasible with respect to the estimated Lagrange multipliers $\hat{N}[R+1,\ldots,N]$ [BPC+10].
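For concreteness, here is a sketch of these two per-sample updates (again our own illustration, under assumed names: hcev_model stands for hcev(w + sum_r C_r e[n-r]), computed by the caller, and eps_n is the residual vector at time n).

```python
import numpy as np

def update_H(H_n, N_n, hcev_model, eps_n, rho, n_iters=5):
    """A few fixed-point iterations of the D.2 update for one covariance H[n]."""
    for _ in range(n_iters):
        Hinv = np.linalg.inv(H_n)
        H_n = (N_n + hcev_model
               + Hinv @ (np.outer(eps_n, eps_n) - H_n) @ Hinv / (2.0 * rho))
    return H_n

def update_N(N_n, hcev_model, H_n, step=1.0):
    """Dual ascent on one multiplier matrix N[n]; D.3 chooses stepsize 1."""
    return N_n + step * (hcev_model - H_n)
```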
D.4 Updating the primal & dual gaps for convergence

To define primal and dual feasibility, we follow the procedure used in other ADMM literature [BPC+10], where we first define three functions related to the unaugmented Lagrangian optimization: the marginal Lagrangian optimization function without the Lagrange multipliers

$$U_n(H[n]) = \frac{M}{2}\ln 2\pi + \frac{1}{2}\ln|H[n]| + \frac{1}{2}\varepsilon^T[n]H^{-1}[n]\varepsilon[n],$$

the marginal Lagrangian optimization function with the Lagrange multipliers (where $[\cdot]_{rc}$ corresponds to the value in row $r$ and column $c$ of the matrix argument)

$$A_n(w,C_1,\ldots,C_R,H[n],N[n]) = U_n(H[n]) + \rho\sum_{r=1}^{M}\sum_{c=1}^{M}\left[N[n]\right]_{rc}\left[\mathrm{hcev}\!\left(w+\sum_{r'=1}^{R}C_{r'}e[n-r']\right)-H[n]\right]_{rc}$$

and the total Lagrangian optimization function

$$A(w,C_1,\ldots,C_R,H[R+1],\ldots,H[N],N[R+1],\ldots,N[N]) = \sum_{n=R+1}^{N}A_n(w,C,H[n],N[n]).$$

We note that in the formulation of $A_n$ we included a $\rho$ in the penalty term, which allows $A$ to match the optimization function used in the ADMM approach above. In fact, the cost function used in the ADMM approach can be rewritten as

$$L = A + \frac{\rho}{2}\sum_{n=R+1}^{N}\left\|\mathrm{hcev}\!\left(w+\sum_{r=1}^{R}C_r e[n-r]\right)-H[n]\right\|_F^2$$

$$L_n = A_n + \frac{\rho}{2}\left\|\mathrm{hcev}\!\left(w+\sum_{r=1}^{R}C_r e[n-r]\right)-H[n]\right\|_F^2 = U_n + \frac{\rho}{2}\left\|\mathrm{hcev}\!\left(w+\sum_{r=1}^{R}C_r e[n-r]\right)-H[n]+N[n]\right\|_F^2 - \frac{\rho}{2}\|N[n]\|_F^2$$

Primal and dual feasibility are satisfied for the optimal values $\breve{w}, \breve{C}_1,\ldots,\breve{C}_R, \breve{H}[R+1,\ldots,N]$, along with optimal Lagrange multiplier matrices $\breve{N}[R+1,\ldots,N]$. Primal feasibility corresponds to setting the gradient of the total Lagrangian optimization function with respect to the Lagrange multipliers to zero. As a result, primal feasibility is a set of $N-1$ matrix conditions

$$0 = \mathrm{hcev}\!\left(\breve{w}+\sum_{r=1}^{R}\breve{C}_r e[n-r]\right) - \breve{H}[n].$$

Dual feasibility means having the gradient of the total Lagrangian optimization function with respect to the parameters be zero. This can be written as

$$0 \in \nabla_{\breve{w},\breve{C}_1,\ldots,\breve{C}_R}\, A = \sum_{n=R+1}^{N}\left(\nabla_{\breve{w},\breve{C}_1,\ldots,\breve{C}_R}\, U_n + \rho\begin{bmatrix} I \\ e[n-1]\otimes I \\ e[n-2]\otimes I \\ \vdots \\ e[n-R]\otimes I\end{bmatrix}\mathrm{vech}\left(\breve{N}[n]\right)\right)$$

$$0 \in \nabla_{\breve{H}[n]}\, A = \nabla_{\breve{H}[n]}\, A_n = \nabla_{\breve{H}[n]}\, U_n - \rho\breve{N}[n]$$

using identities in matrix calculus [PP12].

Since primal feasibility is just the satisfaction of the constraints, the primal gap is the deviation from those constraints, stacked over samples

$$r = \begin{bmatrix} \mathrm{hcev}\left(\hat{w}+\sum_{r=1}^{R}\hat{C}_r e[R+1-r]\right) - \hat{H}[R+1] \\ \mathrm{hcev}\left(\hat{w}+\sum_{r=1}^{R}\hat{C}_r e[R+2-r]\right) - \hat{H}[R+2] \\ \vdots \\ \mathrm{hcev}\left(\hat{w}+\sum_{r=1}^{R}\hat{C}_r e[N-1-r]\right) - \hat{H}[N-1] \\ \mathrm{hcev}\left(\hat{w}+\sum_{r=1}^{R}\hat{C}_r e[N-r]\right) - \hat{H}[N]\end{bmatrix}$$

which also happen to be the values used to update the Lagrange multiplier matrices at each iteration (section D.3).

From the direction where we update the substituted covariance matrices (section D.2), $\hat{H}[n]$ is the minimum of $L_n\left(\hat{w},\hat{C}_1,\ldots,\hat{C}_R,H[n],N[n]\right)$ so it satisfies

$$0 \in \nabla_{\hat{H}[n]}\, L_n\left(\hat{w},\hat{C}_1,\ldots,\hat{C}_R,H[n]\right) = \nabla_{\hat{H}[n]}\, U_n - \rho\left(\mathrm{hcev}\!\left(\hat{w}+\sum_{r=1}^{R}\hat{C}_r e[n-r]\right) - \hat{H}[n] + N[n]\right) = \nabla_{\hat{H}[n]}\, U_n - \rho\hat{N}[n]$$

using the update rule for the Lagrange multiplier matrices in section D.3. Since this is the second condition required for dual feasibility, we conclude that this step enforces dual feasibility with respect to the second condition.

From the direction where we updated the vARCH parameters (section D.1), $\hat{w}, \hat{C}_1, \hat{C}_2,\ldots,\hat{C}_{R-1}$ and $\hat{C}_R$ together are the minimum of $L(w,C_1,\ldots,C_R,H[R+1,\ldots,N],N[R+1,\ldots,N])$ so they satisfy

$$0 \in \nabla_{\hat{w},\hat{C}_1,\ldots,\hat{C}_R}\, L = \sum_{n=R+1}^{N}\nabla_{\hat{w},\hat{C}_1,\ldots,\hat{C}_R}\, L_n = \sum_{n=R+1}^{N}\left(\nabla_{\hat{w},\hat{C}_1,\ldots,\hat{C}_R}\, U_n + \rho\begin{bmatrix} I \\ e[n-1]\otimes I \\ e[n-2]\otimes I \\ \vdots \\ e[n-R]\otimes I\end{bmatrix}\mathrm{vech}\left(\hat{N}[n] + \hat{H}[n] - H[n]\right)\right)$$

which satisfies the first (and only other) condition for dual feasibility whenever the difference in variances is zero. This sole term that prevents dual feasibility is the duality gap, and can be simplified to the concatenation of $R+1$ column vectors

$$s = \begin{bmatrix} \sum_{n=R+1}^{N}\mathrm{vech}\left(\hat{H}[n]-H[n]\right) \\ \sum_{n=R+1}^{N}\mathrm{vec}\left(\mathrm{vech}\left(\hat{H}[n]-H[n]\right)e^T[n-1]\right) \\ \sum_{n=R+1}^{N}\mathrm{vec}\left(\mathrm{vech}\left(\hat{H}[n]-H[n]\right)e^T[n-2]\right) \\ \vdots \\ \sum_{n=R+1}^{N}\mathrm{vec}\left(\mathrm{vech}\left(\hat{H}[n]-H[n]\right)e^T[n-R]\right)\end{bmatrix}$$

In summary, the primal gap is defined by the deviation from the constraints as denoted by $r$ above, and the duality gap is the $(R+1)$-part column vector denoted by $s$ above. We note that the above choice of stepsize ($\alpha[n]=1$ for all samples) allowed the relatively simple computation of both gaps, along with its other benefits as noted in [BPC+10].
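Putting the three directions together, the following skeleton shows the update order of the full loop, stopping on the sizes of the primal gap and a simplified dual gap (its leading block only). It is an illustration under the same assumed helpers sketched above (update_varch_params, update_H, update_N), not the dissertation's implementation; hcev here inverts the vech used earlier.

```python
import numpy as np

def hcev(v):
    """Inverse half-vectorization onto a symmetric matrix (matches vech above)."""
    L = v.size
    M = int((np.sqrt(8 * L + 1) - 1) / 2)
    A = np.zeros((M, M))
    A[np.tril_indices(M)] = v
    return A + np.tril(A, -1).T

def admm_varch(e, eps, H, N_mult, R, p, rho, tol=1e-6, max_iter=100):
    """ADMM loop: D.1 parameter update, D.2 covariance update, D.3 dual ascent."""
    N_samp = e.shape[0]
    w, C = None, None
    for _ in range(max_iter):
        HN = {n: H[n] - N_mult[n] for n in range(R, N_samp)}
        w, C = update_varch_params(e, HN, R, p)                    # D.1
        r_sq = s_sq = 0.0
        for n in range(R, N_samp):
            model = hcev(w + sum(C[r] @ e[n - 1 - r] for r in range(R)))
            H_old = H[n].copy()
            H[n] = update_H(H[n], N_mult[n], model, eps[n], rho)   # D.2
            N_mult[n] = update_N(N_mult[n], model, H[n])           # D.3
            r_sq += np.sum((model - H[n]) ** 2)     # primal gap contribution
            s_sq += np.sum((H[n] - H_old) ** 2)     # dual gap, leading block only
        if max(r_sq, s_sq) ** 0.5 < tol:            # D.4 stopping criterion
            break
    return w, C, H, N_mult
```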