Close
About
FAQ
Home
Collections
Login
USC Login
Register
0
Selected
Invert selection
Deselect all
Deselect all
Click here to refresh results
Click here to refresh results
USC
/
Digital Library
/
University of Southern California Dissertations and Theses
/
Fast flexible dynamic three-dimensional magnetic resonance imaging
(USC Thesis Other)
Fast flexible dynamic three-dimensional magnetic resonance imaging
PDF
Download
Share
Open document
Flip pages
Contact Us
Contact Us
Copy asset link
Request this asset
Transcript (if available)
Content
FAST FLEXIBLE DYNAMIC THREE-DIMENSIONAL MAGNETIC RESONANCE IMAGING by Yinghua Zhu A Dissertation Presented to the FACULTY OF THE USC GRADUATE SCHOOL UNIVERSITY OF SOUTHERN CALIFORNIA In Partial Fulllment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (ELECTRICAL ENGINEERING) May 2016 Copyright 2016 Yinghua Zhu To my parents and my wife. ii Acknowledgements It has been a challenging yet fun journey toward the nish of this dissertation. This dissertation would not have been nished without the support and help from many people. I am most grateful to my advisor, Prof. Krishna Nayak, for opening the door of research for me, oering interesting research topics, and providing guidance during the my Ph.D study. I am also grateful to the patience and exibility he has given to me, and his enthusiasm and kindness that have made Magnetic Resonance Engineering Laboratory (MREL) the most fantastic research group. I am thankful to my qualifying and dissertation committees, Prof. Shrikanth Narayanan, Prof. Justin Haldar, Prof. Louis Goldstein, Prof. Meng Law, and Prof. John Wood, for their valuable suggestions. I dedicate special appreciation to Prof. Narayanan and Prof. Law who helped to provide me with interesting research topics and funding. I am very appreciative to my labmates, Yoon-Chul Kim, Robert Marc Lebel, Sajan Goud Lingala, Zungho Zun, Mahender Makhijani, Travis Smith, Samir Sharma, Hung Phi Do, Ziyue Wu, Terrence Jao, Yi Guo, Vanessa Landes, Ahsan Javed, Weiyi Chen, and Xin Miao - together they have created a friendly and collaborative research environment, and many of them have volunteered to be my subjects. I specially enjoy the stimulating discussions with Yoon, Sajan and Yi. Thet have provided me with the most of the technical support. I also thank my research collaborators: Michael Proctor, Nassos Katsamanis, Asterios Toutios, Adam Lammert, Vikram Ramanarayanan, Jangwon Kim, and Colin Vaz from Speech Production and Articulation kNowledge group (SPAN), iii Dr. Mark Shiroishi, Mario Franco and Samuel Valencerina from Keck School of Medicine. We have worked together for collecting research data. Finally, I want to give my sincere appreciation to my parents for their unconditional love, support and sacrice. I want to thank my wife for her understanding, encouragement and unwavering love. This dissertation cannot be possibly done without them. iv Table of Contents Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi 1 Introduction 1 2 Background 6 2.1 MRI Fundamentals . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 2.1.1 Polarization . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 2.1.2 Signal Excitation . . . . . . . . . . . . . . . . . . . . . . . . 8 2.1.3 Signal Equation . . . . . . . . . . . . . . . . . . . . . . . . . 10 2.1.4 Data Acquisition . . . . . . . . . . . . . . . . . . . . . . . . 12 2.1.5 Relaxation and Image Contrast . . . . . . . . . . . . . . . . 14 2.1.6 MRI Sequence . . . . . . . . . . . . . . . . . . . . . . . . . . 15 2.1.7 Acceleration . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 2.2 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 2.2.1 Speech Imaging . . . . . . . . . . . . . . . . . . . . . . . . . 18 2.2.2 Brain Tumor Imaging . . . . . . . . . . . . . . . . . . . . . . 23 v 2.3 Unmet Needs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 2.3.1 Fast . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 2.3.2 Flexible . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 3 Dynamic 3D Visualization of Speech 31 3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 3.2 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 3.2.1 2D Real-time MRI of Speech . . . . . . . . . . . . . . . . . . 35 3.2.2 Mel-Frequency Cepstral Coecients . . . . . . . . . . . . . . 35 3.2.3 Dynamic Time Warping . . . . . . . . . . . . . . . . . . . . 36 3.3 Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38 3.3.1 Parameter Selection . . . . . . . . . . . . . . . . . . . . . . . 40 3.3.2 In Vivo Experiments . . . . . . . . . . . . . . . . . . . . . . 43 3.3.3 Data Visualization and Analysis . . . . . . . . . . . . . . . . 46 3.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48 3.4.1 Parameter Selection . . . . . . . . . . . . . . . . . . . . . . . 48 3.4.2 Evaluation and Data Visualization . . . . . . . . . . . . . . 50 3.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54 3.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60 4 Dynamic 3D Imaging of Speech 61 4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61 4.2 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64 4.2.1 Data Acquisition . . . . . . . . . . . . . . . . . . . . . . . . 64 4.2.2 Image Reconstruction . . . . . . . . . . . . . . . . . . . . . . 64 4.3 Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66 vi 4.3.1 Imaging Protocols . . . . . . . . . . . . . . . . . . . . . . . . 66 4.3.2 Validation using Multi-slice Speech MRI . . . . . . . . . . . 67 4.3.3 Experiment . . . . . . . . . . . . . . . . . . . . . . . . . . . 68 4.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70 4.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77 4.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79 5 GOCART: GOlden-angle CArtesian Randomized Time-resolved 3D MRI 80 5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80 5.2 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83 5.2.1 Data Acquisition Phase Encode Ordering . . . . . . . . . . . 83 5.2.2 Constrained Reconstruction . . . . . . . . . . . . . . . . . . 84 5.3 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86 5.3.1 Point Spread Function Analysis . . . . . . . . . . . . . . . . 86 5.3.2 Retrospective In-vivo Studies . . . . . . . . . . . . . . . . . 86 5.3.3 Phantom Simulation Studies . . . . . . . . . . . . . . . . . . 87 5.3.4 Prospective In-vivo Studies . . . . . . . . . . . . . . . . . . 88 5.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89 5.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99 5.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102 6 Conclusion 103 6.1 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103 6.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105 Bibliography 108 vii List of Tables 2.1 Comparison of modalities for speech imaging . . . . . . . . . . . . . 21 3.1 List of articulatory characteristics with observed 3D features . . . . 45 viii List of Figures 2.1 MRI scanner . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 2.2 Precession . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 2.3 2D and 3D trajectories . . . . . . . . . . . . . . . . . . . . . . . . . 13 2.4 Blood relaxation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 2.5 Alternative modalities for speech . . . . . . . . . . . . . . . . . . . 19 2.6 Temporal resolution requirements for speech tasks . . . . . . . . . . 20 2.7 2D real-time MRI . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 2.8 Brain DCE images . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 2.9 Pharmacokinetic analysis of brain tumor . . . . . . . . . . . . . . . 25 2.10 DCE MRI signal evolution . . . . . . . . . . . . . . . . . . . . . . . 29 3.1 Applying DTW on the audio MFCC time series . . . . . . . . . . . 39 3.2 The ow of pairwise MFCC-DTW alignment . . . . . . . . . . . . . 41 3.3 Minimum-distance warping path . . . . . . . . . . . . . . . . . . . . 50 3.4 Error-to-noise ratio map of average MFCC-DTW . . . . . . . . . . 51 3.5 Alignment result comparisons . . . . . . . . . . . . . . . . . . . . . 53 3.6 Average root-mean-square errors of tongue surface tting . . . . . . 54 3.7 Selected individual frames from dynamic 3D visualizations . . . . . 55 ix 4.1 Golden angle stack-of-spirals data acquisition . . . . . . . . . . . . 65 4.2 Imaging protocols with k z -undersampling . . . . . . . . . . . . . . . 67 4.3 Imaging plane selection for variable density sequence . . . . . . . . 68 4.4 Registration of 2D multi-slice and 3D MRI . . . . . . . . . . . . . . 71 4.5 Temporal proles from 2D multi-slice MRI . . . . . . . . . . . . . . 72 4.6 A-P speech temporal proles comparisons . . . . . . . . . . . . . . 73 4.7 L-R speech temporal proles comparisons . . . . . . . . . . . . . . . 74 4.8 Selected articulation from sagittal planes . . . . . . . . . . . . . . . 75 4.9 Selected articulation from coronal planes . . . . . . . . . . . . . . . 76 4.10 Consented and achieved spatiotemporal resolution . . . . . . . . . . 77 5.1 Sampling patterns of PD, GOCART and GA . . . . . . . . . . . . . 85 5.2 Point spread functions for the sampling patterns . . . . . . . . . . . 89 5.3 Mean nRMSE of tumor and vessel ROI . . . . . . . . . . . . . . . . 91 5.4 Peak enhancement from a meningioma patient . . . . . . . . . . . . 92 5.5 CCC and Bland-Altman plots . . . . . . . . . . . . . . . . . . . . . 93 5.6 Time-intensity curves of reconstructed results . . . . . . . . . . . . 94 5.7 Brain DCE phantom simulations . . . . . . . . . . . . . . . . . . . 95 5.8 Prospective results of patients using GOCART . . . . . . . . . . . . 96 5.9 K trans maps calculated from patients . . . . . . . . . . . . . . . . . 97 5.10 Maximum intensity projection of a patient . . . . . . . . . . . . . . 98 5.11 Axial post-gadolinium T 1 -weighted images . . . . . . . . . . . . . . 99 x Abstract Magnetic resonance imaging (MRI) is a non-invasive 3D imaging modality capable providing unique contrast for anatomical and functional assessment. MRI is slow (compared to other modalities, such as ultrasound and computed tomography) because of physical and physiological limitations. Therefore, critical trade-os have to be made between the spatial coverage, spatial resolution and temporal resolution. In conventional 2D time-resolved (dynamic) examinations, one parameter is usually sacriced for maintaining the other two. Such trade-o would be even more critical in dynamic 3D imaging. For a long time, MRI speed prevented the application of MRI to dynamic 3D imaging where the motion of tissues (muscles, organs) or the movement of some contrast media is diagnostically important. The desired MRI signals reside in a 4D space (3D object + time), and the contemporary scanners are able to capture a very small fraction of the data, resulting in temporal blurs if spatial coverage and resolution are prioritized, or image artifacts if insucient data in one time frame. Speeding up the MRI acquisition therefore has long been an important area of research, as it enabled many new applications and more exible use of the MRI scanner. This dissertation work aims to dynamic 3D MRI specically for speech imaging and dynamic contrast-enhanced imaging. At rst, an engineering method is proposed to combine dynamic 2D data to form dynamic 3D speech visualization, using audio-based alignment. Later on, the dissertation work proposes more advanced techniques that provide 3D dynamics in a "fast" manner such that the data acquisition sequence can be generated rapidly, and useful 3D information can be recovered with sucient temporal resolution, and in a " exible" manner xi such that scanning parameters can be freely chosen, and the temporal resolution can be adjusted in retrospect. The acceleration is achieved by using ecient data acquisition and a combined parallel imaging and constrained reconstruction method; the exibility is obtained by applying a golden angle data acquisition scheme, and variable-density sampling in the data space. The imaging speed and exibility have been shown to be useful for investigations of human speech production and investigations of contrast agent kinetics in brain tumors. xii Chapter 1 Introduction Magnetic resonance imaging (MRI) is one of the most powerful and widely used medical imaging modalities [31, 42]. MRI is noninvasive and able to provide unique resolution and contrast for anatomical or structural images. MRI can be performed on arbitrary orientated 2D slice or 3D slab, and there are many tissue contrast types available for dierent applications. MRI has signicant impacts on clinical diagnosis and treatment planning, and typical applications include brain, spine, musculoskeletal, cardiovascular and body examinations. MRI is also able to serve as an imaging tool for scientic study, such as speech imaging for linguistic research. The most critical drawback of MRI is the imaging speed. MRI is slow because of physical and physiological limitations. The image signal, excited by the MRI scanner, lasts for a extremely short period, and requires time to recover and re- excite. Therefore, the signal acquisition has small temporal windows when the signal is useful. Aggressive data excitation may causing heating issue to patients. In addition, signal encoding could not be arbitrary fast due to hardware and signal-to-noise constraints. Therefore, critical trade-os are usually made between the spatial coverage, spatial resolution and temporal resolution. In conventional 1 2D time-resolved (dynamic) examinations, one parameter is usually sacriced for maintaining the other two. Such trade-o would be even more critical in dynamic 3D imaging. The next chapter introduces the dissertation background, including MRI fundamentals, dissertation applications and unmet needs. MRI fundamentals introduce how the MRI signal is generated and acquired. Background of the two dissertation applications, speech imaging and brain tumor imaging, and the needs of acceleration and exibility, are also introduced. For a long time, MRI speed prevented the application of MRI to dynamic 3D imaging where the motion of tissues (muscles, organs) or the movement of some contrast media is diagnostically important. The desired MRI signals reside in a 4D space (3D object + time), and the contemporary scanners are able to capture a very small fraction of the data, resulting in temporal blurs if spatial coverage and resolution are prioritized, or image artifacts if insucient data in one time frame. Speeding up the MRI acquisition therefore has long been an important area of research, as it enabled many new applications and more exible use of the MRI scanner. This dissertation work aims to dynamic 3D MRI specically for speech imaging and dynamic contrast-enhanced imaging. Firstly, an engineering method is proposed to combine dynamic 2D data to form dynamic 3D speech visualization, using audio-based alignment. Secondly, the dissertation work proposes more advanced techniques that provide 3D dynamics in a "fast" manner such that the data acquisition sequence can be generated rapidly, and useful 3D information can be recovered with sucient temporal resolution, and in a " exible" manner such that scanning parameters can be freely chosen, and the temporal resolution can be adjusted in retrospect. 2 In speech studies, researchers are interested in characterizing the relationship between articulations and acoustics, and understanding critically-controlled aspects of vocal tract shaping during speech. Visualization of the movements of the lips, tongue and velum can provide important information about the spatiotemporal properties of speech gestures [29, 65]. 2D real-time MRI has been shown to be a very advantageous and powerful tool for capturing motions of these articulators during speech production, with sucient spatiotemporal resolution and signal to noise ratio. The typical sagittal imaging plane covers most regions of interest, but has been limited within one thin slice, and failed to provided a full geometric information of the vocal tract shaping. 3D dynamics would be a major advance because the information outside a 2D plane could be equally valuable, such as tongue groove. The speeds of dierent vocal tract organs vary signicantly, and exible temporal resolution property allows selections of temporal resolution for individual organs. Chapter 3 and 4 present two strategies for generating 3D speech dynamics. Chapter 3 proposes a data acquisition from multiple parallel sagittal planes and an alignment method using features extracted from synchronized audio information [129]. Each plane of data are acquired with one repetition of the speech tasks, during which the subjects are instructed to maintain similar speech rate. Linguistically meaningful audio features are extracted from each audio track and used in a dynamic programming method to align the MRI data. Finally, dynamic 3D visualizations are generated from manually segmented 2D real-time MRI images. Chapter 4 presents a direct 3D data acquisition using ecient 3D golden angle stack-of-spirals readouts, and a constrained reconstruction method that utilized prior information from the data. The proposed data acquisition has 3 variable sampling density that frequently updates the critical information from the data space, and allows free selection of temporal resolution in retrospect. The reconstructed dynamic 3D images provide sucient spatiotemporal resolution and are compared with results of a well-established technique, 2D multi-slice real-time MRI. In clinical assessment of brain abnormalities (mostly tumors in this work), radiologists are no only interested in the size and shape of the tumor, but also the characterization of tumor properties, for instance, tumor permeability and vascular volume, which help to determine the disease state and assess the eects of drug treatment [8, 41]. This assessment is possible with the injection of gadolinium contrast agents (CA), which enhance the MRI signals when CA passes through the vessels and leaks into the tumors, and the signal intensity is proportional to the concentration of the CA. Dynamic contrast-enhanced (DCE) MRI is the technique to image the process of signal enhancement. Quantitative pharmacokinetic parameter maps that re ect microcirculatory structure and function of the imaged tissues are then extracted from image series. Spatial coverage and resolution are critical for localization and segmentation of tumors, and temporal resolution is crucial for pharmacokinetic modeling. Chapter 5 develops a novel DCE-MRI data acquisition scheme, termed GOlden- angle CArtesian Randomized Time-resolved 3D MRI (GOCART) [127], which is well suited for advanced reconstruction methods. GOCART combines an existing framework that randomly acquires the data space, and another framework that allows for exible temporal resolution. The images are reconstructed using combined parallel imaging and compressed sensing, and have been validated in retrospective studies and in digital phantom simulations. GOCART yields superior 4 image delity over conventional methods, especially in highly accelerated cases. The GOCART DCE-MRI brain imaging protocol has been implemented and incorporated in brain examinations at Keck School of Medicine of USC, and has been enrolled into research and preclinical studies. Finally, chapter 6 summarizes the dissertation work and discusses ongoing research topics and future work. 5 Chapter 2 Background 2.1 MRI Fundamentals MRI is an advanced medical imaging technique based on nuclear magnetic resonance (NMR). Conventional MRI scanners image the Hydrogen atom ( 1 H), which is the most abundant and practical for human imaging. 1 H in human body is mainly in the form of water (H 2 O), and a human adult is approximately 65% H 2 O by weight. Figure 2.1(a) shows the a latest commercial MRI scanner model from General Electrical (GE) Healthcare, and (b) illustrates the cross-sectional view of a typical clinical scanner with a patient lying inside, and demonstrates key components including Magnet, Radio Frequency Coil, and Gradient Coils, which correspond to critical steps for MRI, polarization and resonance, excitation and reception, and spatial encoding. After data acquisition, the MR image reconstruction utilizes the Fourier transform and is performed on an accompanying computer. One 1 H contains single positive charged proton and single negative charged electron bound to the nucleus. The positive charged 1 H is spinning and possesses 6 (a) (b) Figure 2.1: MRI is an advanced medical imaging technique. (a) Latest GE Signa HDxt 3.0T MRI scanner, image courtesy of GE at www3.gehealthcare.com/en/products/categories/magnetic resonance imaging. (b) Cross-sectional view of a typical clinical scanner with a patient lying inside, image courtesy of Maggie Philbin at www.maggiephilbin.com/wp- content/uploads/2010/11/mri-scanner.jpg. The key components of a scanner include Magnet, Radio Frequency Coil, and Gradient Coils. a nuclear angular momentum (often referred to as spin). The momentum allows proton to interact with external applied magnetic eld, and the individual interaction can be described by quantum mechanics. Water molecules are very small, and there are approximately 6.8 10 19 protons in 1 mm 3 of water. Therefore, MRI focuses on macroscopic behavior of spins. In classical description, macroscopic behavior of spins analogous to a magnetization moment that is spinning about its axis. 2.1.1 Polarization The MRI scanner generates a static magnetic eld, B 0 , to align randomly oriented spins to parallel directions. The quantum mechanical property of 1 H leads to two separated populations of aligned spins, pointing to parallel (n + ) and antiparallel (n + ) directions of B 0 . The ratio of the two populations is dependent on the eld 7 strength and the temperature: n n + =e h B 0 =2kT (2.1) where h is Planck's constant, is the gyromagnetic ratio, k is Boltzmann's constant, and T is absolute temperature. The Larmor frequency, =2, is 42.58 MHz/Tesla for 1 H. Tesla is a eld strength unit, 1 Tesla (T) = 10000 Gauss (G). Typical commercial clinical MRI scanners have xed B 0 eld strength of 1.5 T or 3 T. At 1.5 T, the ratio is about 0.999990, corresponding an excess of 10 parallel spins per million. The state of balance between the two populations of spins is called equilibrium state. A magnetization moment M, per unit volume, is produced as the sum of spin magnetization that aligned with B 0 in the volume. 2.1.2 Signal Excitation From a classical point of view, signal excitation is the rotation of M by some angle. A magnetic eld B 1 applies on the transverse direction and performs the rotation. When M is rotated by an angle from B 0 , a torque dM dt = M B (2.2) is induced and leads to precession about B, which is the sum of applied magnetic elds. Precesses of M about B at Larmor frequency is the solution for above equation. Figure 2.2(a) demonstrates the precession of a 90° excited M from the transverse (x-y) plane to the z axis. The precession of M in the transverse plane is similar to the precession of the gyroscope (Figure 2.2(b), adopted from [4]), where 8 0.8 0.6 0.4 0.2 x 0 -0.2 -0.4 -0.6 -0.8 -0.8 -0.6 -0.4 -0.2 0 y 0.2 0.4 0.6 0.8 0 0.2 0.4 0.6 0.8 z (a) (b) Figure 2.2: A spinning object responses to the external force as precession: the precession of (a) a 90° excited magnetization moment M (macroscopically behaves like spinning magnet) with the existence of external force created by the B 0 , the blue curve indicates the trace of M, and (b) a rapidly spinning gyroscope with the existence of the gravity force, image courtesy of NEETS [4]. the rapidly gyroscope spinning provides force to resist gravity, and the gyroscope responses to the gravity by precession. Since M keeps changing direction during the precession, B 1 needs to be tuned to the same resonance frequency to maintain torque to rotate M away from z axis. B 1 eld is thus produced at resonance frequency (usually referred to as radio frequency, or RF) by RF coil. Intuitively, the rotation angle of M depends on the strength and duration of B 1 . The typical B 1 eld strength is a fraction of a Gauss, and the typical duration is a few milliseconds. However, since B 1 is carried on a very high frequency (63.87 MHz at 1.5T), it contains high energy and potentially causes subject temperature increase. Precessing M generates changing magnetic eld on the transverse plane. According to Faraday's law, currents can be created by electromotive force in a closed circuit enclosing varying magnetic elds. A closed-loop receiver (the same RF coil) is usually placed on the transverse plane to detect magnetic eld 9 changes and record the induced current. The induced current, also viewed as a time signal s r (t), is proportional to the precessing transverse magnetization, and may be written as s r (t) = ZZZ V M trans (x;y;z;t)dxdydz (2.3) 2.1.3 Signal Equation In the simplest case, signal decay due to magnetization relaxation is ignored, and M always precesses in the transverse plane with the exist of B 0 , the Equation 2.2 becomes dM dt = M B 0 k (2.4) where k is the unit vector in z. When the initial magnetizationM 0 point to x axis, the solution of Equation 2.4 is M(t) =me i B 0 t (2.5) In more general situation, the gradient coil of the scanner generates time-varying magnetic eld gradients, the Equation 2.2 becomes dM dt = M (B 0 k +G x (t)i +G y (t)j +G z (t)k) (2.6) where i and j are the unit vectors in x and y, respectively. G x , G y , G z are the gradients along x, y, z, respectively. When the initial magnetizationM 0 point to x 10 axis, the solution of Equation 2.6 becomes M(t) =me i (B 0 +Gx+Gy+Gz)t (2.7) After removal of e i B 0 using standard demodulation technique, Equation 2.7 can be incorporated into Equation 2.3, s(t) = Z x Z y Z z m(x;y;z;t)e i2[kx(t)x+ky(t)y+kz(t)z] dxdydz (2.8) where 8 > > > > < > > > > : k x (t) = 2 R t 0 G x ()d k y (t) = 2 R t 0 G y ()d k z (t) = 2 R t 0 G z ()d (2.9) Equation 2.8 is often referred as the signal equation, and Equation 2.9 represents time integral of the time-varying magnetic led gradients, in units of spatial frequency. This signal equation reveals the most important relationship in MRI: the recorded signal equals to the 3D Fourier transform of object in the spatial frequency domain. By convention, the transform space where k represents spatial frequency and k x , k y , k z are the three axes, is called "k-space". The recorded signal s(t) is thus mapped to a trajectory in the k-space determined by time integral of gradient waveforms. To reconstruct for a image (i.e., object), sucient data samples from k-space must be acquired on the path of trajectories before the inverse Fourier transform. This process is called data acquisition. 11 2.1.4 Data Acquisition There are several requirements for data acquisition. Mathematically, the sample points in the k-space should satisfy the eld of view (FOV) and spatial resolution requirements; physically, each sampling trajectory should nish within a relatively short time window (details in Section 2.1.5); and practically, temporal resolution must be met because the object contains time-varying information (details in Section 2.2). FOV and Spatial Resolution According to the sampling theorem, sampling in the k-space creates replicas in the image domain. The eective FOV along one direction is FOV = 1 k (2.10) where k is the minimal sampling interval along the direction. The eective spatial resolution along one direction is 1 2k max (2.11) where k max is the maximal spatial frequency obtained in the k-space. Acquisition Trajectories A k-space trajectory is determined by time integral of gradient waveforms. 2D trajectories (k x k y ) are the most fundament, and can be straightforwardly extended to 3D. There are many possible 2D trajectories, Figure 2.3(a) shows representative Cartesian, radial, and spiral trajectory. The color level corresponds 12 (a) (b) Figure 2.3: Representative (a) 2D Cartesian, radial, and spiral trajectory, and (b) their extensions to 3D by augmenting an additional k z direction: 3D Cartesian, stack-of-stars, and stack-of-spirals. The color level corresponds to the sequential acquisition order of individual readouts. to the sequential acquisition order of individual readouts. Cartesian trajectory is by far the most robust and widely used. While non-Cartesian radial and spiral trajectories are generally more robust to motion and ow, because each readout usually start from the center of k-space, where critical low-frequency data reside. In addition, the length of spiral trajectory is easily controlled, and longer trajectory usually improves acquisition eciency. The costs of radial and spiral include increased computational complexity due to non-Cartesian sampled data, and increased sensitivity to gradient system imperfections. There are even more possibilities for 3D trajectory [44]. For simplicity, this dissertation work uses direct extensions of aforementioned trajectories by 13 augmenting an additional k z direction. Figure 2.3(b) demonstrates the resulting 3D Cartesian, stack-of-stars, and stack-of-spirals trajectories. 2.1.5 Relaxation and Image Contrast During the precession, the strength of the magnetization moment M is actually not a constant. In classical description, due to the magnetic dipole-dipole interactions, M possesses longitudinal (along z) and transverse relaxation. The longitudinal component behaves as M z (t) =M 0 + (M z (0)M 0 )e t=T 1 (2.12) where M 0 is equilibrium magnetization, M z (0) is initial longitudinal component, and T 1 is the spin-lattice time constant that controls the rate of longitudinal relaxation. The transverse component behaves as M xy (t) =M xy (0)e t=T 2 (2.13) whereM xy (0) is initial transverse component, andT 2 is the spin-spin time constant that controls the rate of transverse relaxation. Both T 1 and T 2 are dependent to the tissue type and eld strength. For instance, bloodT 1 andT 2 are approximately 1500 ms and 300 ms at 1.5 T [106], respectively. Figure 2.4 shows realistic (a) T 1 relaxation and (2) T 2 relaxation of blood, and (3) the precession trace of blood magnetization moment M (the Larmor frequency has been signicantly decreased for demonstration purpose). Notice that Figure 2.2(a) is a demonstration of precession trace that uses fakeT 1 andT 2 values to mimic the gyroscope precession. 14 (a) (b) (c) 1 0 x -1 0 y 0.5 0 1 1 z -1 t (s) 10 4 6 8 2 10 4 6 8 2 Figure 2.4: Realistic blood magnetization moment M (a) T 1 (1500 ms) relaxation and (2) T 2 (300 ms) relaxation, and (3) the precession trace of M (the Larmor frequency has been signicantly decreased for demonstration purpose) The transverse signal amplitude usually decay faster than intrinsic T 2 decay, because of the loss of phase coherence of the spins due to applied magnetic eld inhomogeneity and dierent tissue susceptibilities. This undesired dephasing phenomenon is usually referred to as o-resonance. O-resonance leads to image distortions and artifacts, resulting in an signal volume decay eectively described by a time constant T 2 , which is smaller than T 2 . The mechanisms of MRI provides the ability to produce various tissue structures contrast, including proton density contrast, T 1 contrast, T 2 contrast, T 2 contrast, phase contrast, etc. Since onlyT 1 contrast is used in this dissertation work, the scope of MRI sequence introduction is limited to common T 1 -weighted sequence. 2.1.6 MRI Sequence An MRI sequence is a combination of programed time-varying magnetic elds, including RF and gradients along x, y, z directions. There are over one hundred 15 dierent pulse sequences in use today. Gradient echo is one kind of fundamental sequence that reverse gradient polarity to rephase protons and form signal echoes. Gradient echo sequence includes an RF pulse for signal excitation, followed by data acquisition gradients, and a gradient spoiler to dephase the residual transverse magnetization within a voxel. Due to the signal decay caused by relaxation and o- resonance, readout trajectory has to nish within a relatively short time window after a signal excitation, and the process repeats to acquire k-space data. The period of each repetition is called time of repetition (TR). Gradient echo sequence can be very fast, although sensitive to magnetic eld inhomogeneities because of gradient rephasing. At the air-tissue boundary and bone-tissue boundary, the tissues are magnetized to dierent degree (prone to o- resonance), experiencing magnetic susceptibility artifacts. After data acquisition, a spoiler destroys any residual transverse magnetization, almost eliminates T 2 dependence. The resulting T 1 -weighted sequence is often referred to as Fast Low Angle SHot (FLASH) sequence [39], and on GE scanner the name is SPoiled Gradient Recalled echo (SPGR) sequence. Typical SPGR sequence utilizes a small magnetization excitation angle ( ip angle) and short TR, therefore, longitudinal magnetization (M z ) is not fully recovered. The steady-state longitudinal component M ss is highly dependent on T 1 and independent of T 2 , M zz =M 0 (1e TR=T 1 )sin 1e TR=T 1 cos (2.14) A typical research SPGR sequence at USC utilize a TR of 5-6 ms and a of 15°. 16 2.1.7 Acceleration MRI is a conventionally slow medical imaging modality, a typical comprehensive exam would cost at least half hour. The speed is limited by physical and physiological constraints. The signal relaxation determines maximal eective data acquisition period, and additional time must be spent on signal excitation and spoiling within each TR. The RF sequence contains high energy and deposits power in the body, which may cause heating problem. In addition, signal encoding cannot be arbitrary fast due to hardware and signal-to-noise considerations. To improve patient comfort, and more importantly, to better capture useful dynamic information, researchers have been continually exploring acceleration of MRI. Ecient implementation of SPGR sequence can reduce the scan time. Fractional RF reduces signal excitation time, and RF spoiling could replace gradient spoiling to shorten TR. A number of ecient data acquisition trajectories have also been proposed. Notably, Echo Planar Imaging (EPI [90]) acquires multiple Cartesian lines data within each TR, and spiral imaging [2, 73] exploits gradient hardware limits and is robust to motion and ow (Figure 2.3). Reconstruction with less k-space data is an alternative yet more powerful method. This reconstruction requires prior knowledge of the data, and use the information as "constraint". Partial Fourier method [72] exploits k-space redundancy, the reduction factor R is usually <2; parallel imaging [95, 36] utilize sensitivity from multiple-channel receiver coil, the R is limited by the number of channels; the latest and most advanced reconstructions assume that the imaged objects are compressible in object domain or certain transform domain [69], or the data matrix is low-rank [60]. The R of sparsity and/or low-rank 17 constrained reconstruction highly depends on applications and how appropriate the assumptions are. 2.2 Applications As an advanced medical imaging technique, MRI has been widely used for investigation of human anatomy and physiology. The most remarkable advantages of MRI are probably safety that involves no ionizing radiation, and high adaptability that provides various contrasts for dierent applications. The wide range of MRI applications cover most of the human soft tissues, for example, brain (neuroimaging), heart (cardiac imaging), vessel (MR angiography), tumor (contrast enhanced MRI), etc. Besides imaging for diagnosis purposes, MRI has been used for scientic discovery, such as functional MRI that investigates brain functions and connectivities, and speech MRI that examines morphologies and functions of vocal organs during speech [102]. This dissertation work aims to improve the capability of MRI in speech imaging and tumor imaging. 2.2.1 Speech Imaging Speech production is a complicated process that involves dexterous cooperation of a number of organs in the upper airway, including the lip, teeth, tongue, velum, larynx, pharynx, and vocal fold. The space formed by these organs is usually referred to as the vocal tract. During the speech production, air comes out from lung and enters the vocal tract through the airway, causing vocal folds vibrations and producing fundamental frequency and harmonics. When the air passing through the vocal tract, it is carefully tuned by the shaping of vocal tract, 18 (c) (b) (a) Figure 2.5: Several alternative techniques have been utilized to visualize vocal tract shaping during speech, including (a) x-ray, image courtesy of Stevens [107], (b) ultrasound, image courtesy of Haskins Labortories, and (c) EMA, image courtesy of [121]. They all have critical limitations for speech imaging. and the formant frequencies and intonation of speech are controlled. Since speech is a dynamic process, the dynamics of vocal organs are of great interest to the speech researchers. Therefore, the dynamics of the vocal tract shaping is critical for understanding speech, and imaging of the vocal tract is usually performed by imaging of the participating vocal organs. By analysis of vocal tract dynamics, speech researcher are able to characterize the relationship between articulations and acoustics, and understand critically-controlled aspects of vocal tract shaping during speech. Alternative Modalities Several alternative techniques have been utilized to visualize vocal tract shaping during speech, including x-ray [107, 18], ultrasound [109, 91], electromagnetic articulography (EMA) [88, 121]. Figure 2.5 demonstrates use of (a) x-ray (adopted from [107]), (b) ultrasound (adopted from Haskins Labortories) and (c) EMA (adopted from [121]) for speech studies. The shared advantage of aforementioned techniques is the high temporal resolution, which is extremely important for capturing vocal tract dynamics, especially for fast motion such 19 Figure 2.6: Current consensus on the spatiotemporal resolution requirements for dierent speech tasks from 2014 Speech MRI Summit, image courtesy of Lingala [65]. The speech tasks are presented as rectangular zones with approximate boundaries. as closures of alveolar trills and consonant constriction. Figure 2.6 shows the spatiotemporal resolution requirements for various speech tasks from 2014 Speech MRI Summit [65]. The speech tasks are presented as rectangular zones with approximate boundaries. However, these alternative modalities all have critical limitations: x-ray has ionizing radiation, and provides only 2D projection of the object; ultrasound detests only rst tissue-air boundary, and the detector attaches the jaw; EMA tracks sparse spatial samples, and interferes normal speech by putting sensors into the mouth. Although MRI has provided relatively lower spatial and temporal resolution, it has the unique advantage of safely and noninvasively producing complete views and quantitative measurements of the entire vocal tract including the velum, posterior oral cavity, and pharyngeal portions. A more comprehensive comparison can be found in Table 2.1. Improving the temporal resolution has 20 Safety View 3D Resolution Speed Comfort Cost Posture X-ray 7 3 7 3 3 3 3 3 US 3 7 3 3 3 7 3 3 EMA 3 7 7 7 3 7 3 3 MRI 3 3 3 3 7 3 7 7 Table 2.1: Advantages (3) and disadvantages (7) of X-ray, ultrasound(US), EMA, MRI for 9 dierent aspects of speech imaging. been one of the aims of this dissertation work, based on the established work of 2D real-time MRI [82]. 2D Real-Time MRI To the best of my knowledge, there is no widely accepted denition of dynamic MRI. According to Narayanan's denition in [82], dynamic MRI refers to the imaging of an actively time-varying subject, rather than a static postural source. A dierent term, real-time MRI, refers to very fast MRI techniques that can capture the information satisfying the temporal requirements. In short, dynamic MRI refers to the source, real-time MRI to the acquisition. In this dissertation work, dynamic MRI and real-time MRI are not distinguished, both refer to a set of techniques including fast and ecient acquisition of dynamic objects, and corresponding image reconstruction. The 2D real-time speech MRI [82] uses spiral readout schemes, and 8 to 14 spirals fully sample the k-space without introducing o-resonance blurs. Opposed to sequential order of spirals, bit-reversal order [48] is applied to make the sampling pattern more uniform in the k-space. Such interleaved spirals are usually referred to as interleaves. The acquired data are rst interpolated to Cartesian grid using gridding technique [46], and then Fourier transformed to images. Sliding window technique is used to increase eective frame rate. Figure 2.7(a) (adopted from [82]) 21 (a) (b) Figure 2.7: Overview of 2D real-time MRI. (a) The data acquisition and reconstruction for 2D real-time MRI using interleaved spiral readouts, image courtesy of Narayanan [82]. 9 to 13 spirals fully sample the k-space without introducing o-resonance blurs. Gridding technique is used for reconstruction of individual frames, and sliding window technique is used to increase eective frame rate. (b) Representative vocal tract shaping on the sagittal slice during speech production. provides a overview of 2D real-time MRI data acquisition and reconstruction, and Figure 2.7(b) demonstrates some representative results of speech production. The changing shapes of the vocal tract can be clearly visualized and evaluated from the midsagittal images. Since reference [82] was published and the Speech Production and Articulation kNowledge (SPAN) group at USC was founded, a total of over 100 hours of 2D real-time MRI speech data have been collected, numerous researches have been performed at USC, and more researches will be performed with the share of database and supporting toolset [81, 78]. 22 2.2.2 Brain Tumor Imaging Tumor is an abnormal growth of tissue that can be benign or malignant, and the later one is also known as cancer. The tumor tissues have no physiological function, and usually tend to proliferate to surrounding normal tissues. The most common brain tumors can be classied as gliomas, meningiomas, pituitary adenomas, and nerve sheath tumors. Brain tumors are the second an fth leading cause of cancer- related deaths in males and females ages 20-39, respectively. 70,000 new cases of primary brain tumors were diagnosed, and nearly 14,000 people lost their battle with a brain tumor in 2014 in the U.S. [84]. Normal human brain is almost separated from the blood due to the blood- brain barrier, which is a highly selective barrier that separates blood from brain extracellular uid. Brain tumors cause disruption of blood-brain barrier, and allows slow leakage of blood and the contrast agent (CA) into the tumor tissues. Therefore, tracking of the blood leaked into tumors is an ecient method for tumor imaging, and leakage becomes one of the most important measurements for tumors. T 1 -weighted sequence has been an important means for brain tumor imaging. Tumor signals can be enhanced by injection of contrast agent, which shorten the T 1 near CA molecules, giving rise to faster longitudinal relaxation in a dynamic process. While the normal brain tissue signals are barely enhanced without CA. This technique is referred to as dynamic contrast-enhanced (DCE) MRI. Brain Dynamic Contrast-Enhanced MRI Gadolinium CA are usually administered by injection into the human blood system. Shortly after the vein injection from arm (about 60 seconds), the gadolinium molecules arrive the brain, and eective reduce both T 1 and T 2 values of nearby 23 Figure 2.8: Typical brain DCE images acquired with USC brain imaging protocol. When the CA arrives, the signal in the vessel is signicantly enhanced, and the enhancement drops as the bolus passes the vessel. The tumor signal is enhanced much slower and weaker as the CA gradually leaks into the tumor. atoms within tissues. Following Equation 2.14, reduce of T 1 increases stead-state magnetization signal. DCE MRI makes use of such relaxitivity eects of gadolinium to visualize the enhanced tumor tissuesT 1 signal dynamics when CA leak in, while the normal brain tissue signals are barely enhanced without CA. The increase in the rate of T 1 relaxation is proportional to the concentration of the CA, which is proportional to the blood carrying the CA. The blood in the tumor is directly related to the characteristic permeability. Figure 2.8 illustrates typical brain DCE images acquired with USC brain imaging protocol. When the CA arrives, the signal in the vessel is signicantly enhanced, and the enhancement drops as the bolus passes the vessel. The tumor signal is enhanced much slower and weaker as the CA gradually leaks into the tumor. On the current stage of brain tumor imaging, doctors mostly rely on post T 1 - weighted contrast enhanced images to diagnose (e.g., last subgure in Figure 2.8). However, additional information such as tissue permeability map (K trans ) and volume of plasma (v p ) can be derived from pharmacokinetic analysis of DCE dynamics for quantitative parameters that re ect microcirculatory structure and function in imaged tissues [86]. 24 Figure 2.9: Pharmacokinetic analysis of a left parietal-occipital glioblastoma multiforme, image courtesy of Paldino [86]. A pharmacokinetic model is used to t the CA concentration and the vascular input function, and produce transfer constant (K trans ), fractional plasma volume (v p ), fractional volume of the extracellular extravascular space (v e ), and goodness of t (GOF). Pharmacokinetic Analysis Pharmacokinetics, in the context of DCE MRI, is the characteristic interactions of CA and the tissues in terms of its distribution and permeability. Figure 2.9 is adopted from [86], demonstrating a set of typical measurements derived from brain DCE MRI. At rst, a series of CA concentrations in plasma over time are calculated from DCE images and an measurement of T 1 mapping of the brain (top left), and the vascular input function is estimated from a artery in DCE image or a experimentally-derived population-averaged function [87]. And then, one pharmacokinetic model is used to produce useful parameter maps, including 25 the transfer constant (K trans ), fractional plasma volume (v p ), fractional volume of the extracellular extravascular space (v e ), and goodness of t (GOF). There is a wide range of classic models available for DCE MRI [105], from which a Tofts model [114] is often used at USC for the simplicity and robustness. 2.3 Unmet Needs MRI speed prevented the application of MRI to dynamic 3D imaging where the motion of tissues (muscles, organs) or the movement of some contrast media is diagnostically important. The desired MRI signals reside in a 4D space (3D object + time), and the contemporary scanners are able to capture a very small fraction of the data, resulting in temporal blurs if spatial coverage and resolution are prioritized, or image artifacts if insucient data in one time frame. Speeding up the MRI acquisition therefore has long been an important area of research, as it enabled many new applications and more exible use of the MRI scanner. This dissertation work aims to improve dynamic 3D MRI specically for speech imaging and dynamic contrast-enhanced imaging for 1) facilitate and accelerate the dynamic 3D MRI, and 2) with operational and demonstrative exibility. 2.3.1 Fast In the dissertation work, the meaning of Fast is twofold, fast acquisition sequence generation to improve patient comfort, and fast k-space data acquisition that allow high temporal resolution. 26 Fast Sampling Pattern Generation Lebel et al. have proposed a variable-density poisson ellipsoid sampling [58] for brain DCE MRI that has been shown to be suitable for combined parallel imaging and compressed sensing. The imaging protocol has been experimentally evaluated at Keck School of Medicine of USC. However, the algorithm for variable- density poisson ellipsoid sampling is very time-consuming. It takes a few minutes to generate the sampling pattern of sequence for a brain DCE scan, and the runtime increases as the acceleration factor reduces. This prohibits the on-line generation of the sampling pattern, and as an workaround solution, sampling pattern pattern has been generated o-line and uploaded to the scanner. In addition, the method provides a sampling pattern, without specifying the order. One intuitive implementation is to sequentially acquires the sample points, which is obviously suboptimal because the center of k-space is sampled together in a subset of temporal window. A fast-generated sampling pattern with specic sampling order is thus desired for brain DCE MRI. Fast Imaging Since the physical and physiological constraints are impractical to overcome, one feasible way to accelerate the imaging speed is to reconstruct an image with less k-space sample. For example, if comparable image quality can be obtained by a new approach with only 10% of the data from k-space (10% of fully sampling time), the new approach is considered to achieve a 10 acceleration. Section 2.1.7 brie y overviews the essential techniques for MRI acceleration. The data acquisition sequence need to be highly ecient and suitable for constrained 27 reconstruction. The core reconstruction techniques can be combined to achieve higher acceleration [58, 28, 66, 33] by exploiting k-t-space redundancy. In the speech imaging application, the target FOV is 20205 cm 3 , spatial resolution is at least 333 mm 3 , and temporal resolution is at least 150 ms. For the brain DCE MRI, [58] has laid solid foundation and achieved FOV of 242424 cm 3 , spatial resolution of 0.940.941.9 mm 3 , and temporal resolution of 4.1 sec. The reconstruction scheme should be able to directly apply to the new sampling scheme. 2.3.2 Flexible In the dissertation work, Flexible can be demonstrated from two angles, exible imaging parameter selection prior to MRI exams, and exible temporal resolution during image reconstruction. Flexible Imaging Choice Free selection of imaging parameters such as the FOV or spatial resolution for case-dependent scan plane prescription is benecial to clinical brain DCE MRI exams. This requirement is closely related to fast sampling pattern generation: if the new algorithm is fast enough and executable on MRI console, free imaging parameter selection is possible. Flexible Temporal Resolution Flexible temporal resolution can be very useful for the following reasons. First, although greatly improved, constrained reconstruction still has to face the trade- o between temporal resolution and image quality. An adaptive reconstruction 28 Figure 2.10: Signal evolutions in the artery and tumor during a DCE MRI, image courtesy of Lebel [57]. Dierent portions of the image series may require dierent temporal resolution, depending on much rapidly the signals changes during the periods. can assign a larger temporal window when there is less dynamic information, and a smaller window for better capture of fast signal changes. Figure 2.10 is adopted from [57], demonstrates typical signal evolutions in the artery and tumor during a DCE MRI. Dierent portions of the image series may require dierent temporal resolution, depending on much rapidly the signals changes during the periods. For instance, perfusion quantication requires higher temporal resolution than vessel permeability quantication after 100 s. Second, if patient motion occurs during the brain DCE MRI exam, the data with motion can be discarded (or applied motion compensation), and images can still be reconstructed from remaining data. Last, the optimal temporal resolution for a dynamic MRI scan is often not known in advance, at it may depend on the actual 29 tissue, motion, or disease state. It is even more uncertain when constrained reconstruction and pharmacokinetic analysis are involved in DCE MRI. Several studies [49, 56, 17, 50] have investigated the in uences of temporal resolution or constrained reconstruction on the pharmacokinetic analysis of DCE MRI, where data with exible temporal resolution are extremely valuable. 30 Chapter 3 Dynamic 3D Visualization of Speech 3.1 Introduction The vocal tract is the universal human instrument, played with great dexterity and skill in the production of spoken language. Speech researchers are interested in characterizing the relationship between articulation and acoustics, and understanding critically-controlled aspects of vocal tract shaping during speech. Visualization of the movements of the lips, tongue and velum can provide important information about the spatiotemporal properties of speech gestures. Several techniques have been utilized to visualize tongue shaping during speech, including computed tomography (CT) [89], electromagnetic articulography (EMA) [88], x-ray [18, 120], x-ray microbeam [77, 118], ultrasound [109], and magnetic resonance imaging (MRI) [6, 111, 79, 80, 94, 52, 51, 30, 71, 108, 19, 82]. However, each of these approaches has important limitations. Both CT and x-ray 31 expose subjects to radiation. In addition, x-ray averages the entire vocal tract volume into a two-dimensional (2D) image from one projection. EMA provides three-dimensional (3D) data with high temporal resolution, but the EMA sensors are spatially sparse and dicult to attach to pharyngeal structures. Ultrasound is safe and noninvasive, but it detects only the rst air-tissue boundary, and typically does not image either the tongue tip or lower pharyngeal articulation. Furthermore, the ultrasound transducer probe is typically in contact with the jaw and may aect the speech production. Although MRI has provided relatively lower spatial and temporal resolution, it has the unique advantage that it can produce complete views, and quantitative measurements, of the entire vocal tract including the velum, posterior oral cavity, and pharyngeal portions. Phonetic data from these regions are not easily acquired using other modalities, but can be safely and non- invasively imaged with MRI, making it a promising tool for speech research. A critical discussion of various techniques can be found in [9]. MRI of the upper airway for speech research has been limited by slow acquisition speed. A number of the early MRI acquisition techniques for speech have been based on imaging of multiple 2D slices [6, 111, 79, 80, 94] with long scan times in the order of minutes, typically requiring multiple sustained repetitions of the utterance under investigation. The recent introduction of 3D-encoded MRI techniques combined with either compressed sensing or parallel imaging has accelerated the acquisition speed within the duration (typically 6 to 10 s) of a single sustained sound production [52, 51]. Compared with 2D multi-slice MRI, 3D- encoded MRI can provide higher acceleration with pseudo-random undersampling available in 2D k-space, and also has the potential for thinner slices and higher signal-to-noise ratio (SNR). 32 More recently, dynamic MRI methods have been used to provide information about changes in vocal tract shaping during speech. Cine-MRI enables obtaining a set of images by repeatedly scanning the vocal tract while subjects produce (hopefully) identical repetitions of the same utterance. During each repetition, part of the k-space (e.g. one row) is acquired sequentially [30, 71, 108]. In order to construct images from these data, cine-MRI usually requires synchronization in data acquisition and image reconstruction. Moreover, cine-MRI lls the k-space in a predetermined pattern, which assumes that dierent repetitions of production have the same pace and duration. Real-time MRI, on the other hand, does not involve the synchronization of k-space acquisition and motions of the soft tissue (e.g., tongue motion, ventricular motion of the beating heart), and captures the motion using fast image acquisition techniques. Demolin et al. demonstrated real- time MRI of speech using turbo spin echo sequence (TSE) with 4 fps on a single slice [19]. Narayanan et al. proposed real-time MRI of vocal tract dynamics with 20-24 fps using a spiral gradient echo sequence and sliding window reconstruction, which is sucient to capture the underlying motion and physiology of interest during the production of speech [82]. Dynamic imaging of the full 3D vocal tract with high spatial and temporal resolution is, however, more desirable than just a single plane view for a proper understanding of articulation during uent speech. Current MRI systems do not meet the requirements for capturing 3D vocal tract dynamics in real-time. Interesting engineering, but compromised, solutions have included model-tting methods and multi-planar imaging in separate operations. Several approaches (including non-MRI ones) have been proposed. Engwall established a 3D parametric tongue model using static MRI and electropalatography (EPG), and 33 estimated parameters for tongue movements using EMA [27]. Video data acquired from pellets marked on the human face were combined with 3D static MRI data to linearly model the articulators [5]. Yang and Stone reconstructed 3D tongue surface motions by temporal registration of ultrasound images from multiple scan locations [122]. Takemoto et al. demonstrated a 3D cine-MRI technique that measures the shaping of the vocal tract using multi-planar 2D cine-MRI [112]. The success of cine-MRI depends crucially on synchronization between speech production and MRI acquisition. This can be challenging to speakers who are not trained to repeat the utterances at the same speech rate. Inspired by multi-planar 2D imaging techniques, in this paper, we propose a novel approach to reconstruct 3D dynamics from multi-planar real-time MRI. The proposed method constructs 3D movies of the tongue shape and vocal tract dynamics by temporal alignment of parallel 2D MRI data of overlapping sagittal slices covering the entire vocal tract. We brie y introduced the approach and demonstrated the preliminary results rst in [128]. The remainder of this paper is organized as follows. We rst brie y review real-time MRI of speech, acoustic features mel-frequency cepstral coecients (MFCC), and a well-known algorithm to measure sequence similarity, called dynamic time warping (DTW). We then describe a systematic approach of parameter selection for extracting MFCCs and evaluation of the performance of dierent methods of parameterization. Following which, we demonstrate the in-vivo results of audio alignment, synthesized coronal images, and 3D dynamic visualization of tongue and vocal tract shaping during uent speech. Finally, we critically discuss the current approach, and propose potential future directions for this work. 34 3.2 Background 3.2.1 2D Real-time MRI of Speech 2D Real-time MRI specically refers to directly acquiring, reconstructing and displaying MR images in real-time. Spiral readout is one of the most time- ecient schemes commonly used in real-time MRI. The design of spiral trajectories balances trade-os among temporal resolution, spatial resolution and SNR [82]. This technique has been used to study the vocal tract shaping aspects such as of English speech production [11] and emotional speech production [59]. The implementation of 2D real-time MRI continuously acquires the k-space data in an interleaved spiral scheme, normally with 10 to 20 spiral arms to form a single image [82]. A sliding window reconstruction is applied to reconstruct images after a subset of interleaves are acquired. Synchronized and noise-robust audio recordings (sample rate 20 kHz) are acquired simultaneously with the MR data acquisition to record speech and other vocalizations. Adaptive noise cancellation can eectively remove the MRI acoustic noise [10]. 3.2.2 Mel-Frequency Cepstral Coecients MFCCs are one type of short-term spectral features extracted from the acoustic speech signal. This signal representation is motivated by the signal processing in the human auditory system (cochlea) [43]. MFCCs attempt to encapsulate characteristics of the signal which are salient in human speech perception. In this study, we use MFCCs to represent continuous overlapping speech segments (frames), forming a time series of MFCC feature vectors for each audio recording. MFCCs can be computed as follows: 35 (1) Apply a discrete Fourier transform (DFT) to each Hamming-windowed audio frame and generate the short-term power spectrum: S(k) =jFFT (Hamming(s))j 2 ; (3.1) where s represents a frame of the audio recording, and S(k) is the power spectrum on frequency k. (2) Pass the power spectrum through a triangular band-pass lter bank with M lters (m = 1; 2;;M), M usually ranges from 24 to 40 [43]: H m [k] = 8 > > > > > > > < > > > > > > > : 0; k<f[m 1] 2(kf[m1]) (f[m+1]f[m1])(f[m]f[m1]) ; f[m 1]kf[m] 2(f[m+1]k) (f[m+1]f[m1])(f[m+1]f[m]) ; f[m]kf[m + 1] 0; k>f[m + 1] (3.2) and obtain a weighted average power spectrum around the lter bank frequencies f[m] that are uniformly spaced in the mel cepstrum. This step smooths the frequency responses and eliminates harmonics. (3) Compute the log of power spectrum ltered by (2), and perform a discrete cosine transform (DCT) to obtain the MFCCs. Typically the rst 13 MFCCs are sucient to characterize the segmental phonetic information in a speech audio frame [43]. 3.2.3 Dynamic Time Warping DTW is a dynamic programming technique that measures the similarity and nds the minimum-distance warping path between two time series [96]. Given two time 36 series A and B, of length m and n, respectively, A = [a 1 ;a 2 ;a 3 :::a m ]; (3.3) B = [b 1 ;b 2 ;b 3 :::b n ]; (3.4) the distance (typically Euclidean) of a i and b j is denoted dist(i;j) =ja i b j j; 1im; 1jn: (3.5) A 2D cost matrix D of size m by n is constructed, where D(i;j) represents the minimum distance between two partial series A 0 = [a 1 ;a 2 ;a 3 :::a i ] and B 0 = [b 1 ;b 2 ;b 3 :::b j ]. Boundaries of D are initialized as D(i; 0) =D(0;j) =infinity; 1im; 1jn: D(0; 0) = 0; (3.6) and then D is lled from D(1; 1) to D(m;n) with the pattern D(i;j) =dist(i;j)+min[D(i 1;j 1);D(i 1;j);D(i;j 1)]; 1im; 1jn: (3.7) The monotonicity criterion imposed on D constrains the minimum-distance warping path D(i;j) to go through one of the previous-state cells. The algorithm increments i and j until the cost matrix is lled such thatD(m;n) is the minimum distance between series A and B. The minimum-distance warping path can be determined by back-tracking from D(m;n) to D(1; 1) by locating the minimum- distance previous states. Since a single member in one series can map to multiple 37 successive members in the other series, the two series can be of dierent lengths. In the presented work we apply DTW to align pairs of audio feature series, and obtain the warping relationships for the subsequent alignment of the videos. 3.3 Method Dynamic 3D MRI reconstruction makes use of two companion data sets: real-time MR data, and synchronized noise-cancelled audio recordings. We acquired MR data from a set of parasagittal scan planes, covering the entire upper airway. The same speech corpus was elicited from the subject at each scan plane acquisition. As a preprocessing step, we extracted the MR data/audio tracks that correspond to a token of interest (e.g., utterance /ala/) from a long video/audio acquisition. Since the speech rate tended to vary even during repeated utterances of the same stimuli, we employed audio-based alignment by applying DTW on the audio MFCC time series (MFCC-DTW alignment) to synchronize pairwise repetitions of the stimuli, as illustrated by the ow charts in Figure 3.1. The applied 13 MFCC lter bank frequencies in Hz were: f[1; 2;:::; 13] = [81;171; 271; 383; 508; 647; 802; 975; 1168; 1384; 1624; 1892; 2190]: (3.8) We subsequently generated aligned videos from the MR data by controlling the placement of the sliding window according to the acoustic alignment. As a result, the temporal resolution of the aligned movies is identical to that obtained from real-time 2D imaging. Finally, we constructed the dynamic 3D visualizations in three ways: synthesized movies along 3 coronal planes, 3D tissue surfaces, and 38 Start End Load Audio Save MFCC Pass Through Filter Bank Hamming Window |DFT| 2 log DCT Start Load Test Audio Load Reference Audio Divide Into Frames Divide Into Frames Pass Through MFCC Extractor Pass Through MFCC Extractor Compute Cost Matrix Back Track & Locate Warping Path Save Warping Path End (a) (b) Figure 3.1: (a) MFCC-DTW alignment takes a pair of noise-cancelled audio recordings as input, segments them into overlapping speech frames, and then passes each frame through the (b) MFCC extractor. The MFCC extractor follows the standard MFCC extraction procedures, including generation of short-term power spectrum, passing through a triangular band-pass lter bank, computing the log and the discrete cosine transform (DCT). MFCC-DTW alignment constructs a cost matrix for each pair of MFCCs series, and then back tracks and locates the minimum-distance warping path. 39 3D vocal tract dynamics using manually segmented features from aligned frames. Figure 3.2 shows the ow of pairwise data processing. Here we use the terms reference and test to refer to a pair of data sets as input of the process, in which test is aligned to reference. Notice that the pairwise data processing is audio-based, therefore applicable to align data acquired from dierent scan planes. In addition, the audio of a reference is synchronized with all aligned test videos, and is the only audio being used with visualization after alignment. 3.3.1 Parameter Selection MFCC-DTW alignment employs the DTW algorithm to align pairs of MFCC vector time series extracted from multiple utterances of comparable speech recordings. It is common to use a frame width from 5 ms to 100 ms, and shift subsequent windows by 1/3 of the frame width in MFCC computation [96]. Euclidean distances of two MFCC vectors series from two audio recordings form the cost matrix D in Equation 3.7, from which the MFCC-DTW algorithm derives the minimum-distance warping path. It is then feasible to match each audio frame with each imaging time of repetition (TR) using a nearest neighbor method. This avoids any restriction on the frame width and shift size when synchronizing the MFCC series with the MR data. We performed a two-stage test to exhaustively nd a suitable frame width and shift size, using pilot data from midsagittal plane scans. The data included two sets of acquisitions: one with deliberately varied speech rates, and the other with a normal speech rate. In the rst experiment, we applied MFCC-DTW alignment with all possible combinations of frame width and shift size, using values 1, 2, 3 ... 500 ms for both. We aligned four utterances in the pilot data for three cases: long 40 Reference video 1, 8, 15, 22 … 1, 8, 15, 22 … Mismatched videos MFCC-DTW alignment reference test 1, 5, 14, 20 … Aligned videos Dynamic 3D visualiza!on Figure 3.2: The ow of pairwise data processing. A set of parasagittal scan planes covers the entire upper airway. Acquired data include real-time MR data, and synchronized companion noise-cancelled audio recordings. Since the speech rate and duration tend to vary, reconstruction using the same sliding window would lead to mismatched videos. Audio-based MFCC-DTW alignment synchronizes pairwise audio recordings, and the resulting warping paths guide the placement of the sliding window in reconstructing aligned videos. One reference video and aligned videos from other sagittal planes enable the dynamic 3D visualization, such as the synthesized coronal movies, the 3D tissue surfaces and vocal tract dynamics. 41 audio to short audio (long-to-short) alignment, short audio to long audio (short- to-long) alignment, and normal audio to normal audio (normal-length) alignment. In total, 500x500x3x4 aligned videos were generated to examine residual alignment errors. In the second experiment, we measured error-to-noise ratio (ENR) of all the aligned videos. In computing ENR, it is assumed that: a) additive noise in the image is independent and identically distributed with mean 0 and variance 2 ; b) the true image is uncorrelated with noise. Therefore, E( ) =E(ee +nn ) =E(ee ) +E(nn ) =E(ee ) + 2 2 (3.9) where E( ), E(ee ) and E(nn ) are the mean squared errors between pixel intensities within the reference and test frames, the pure signal dierences (i.e., alignment errors) and the pure noise dierences, respectively. E(ee ) and E(nn ) are computed thus: E( ) = P Nx P Ny P Nz jV R (x;y;t)V T (x;y;z)j 2 N x N y N t (3.10) E(ee ) = P Nx P Ny P Nz jS R (x;y;t)S T (x;y;z)j 2 N x N y N t (3.11) whereN x andN y are dimensions of the frame, andN t is the total number of frames. V R (x;y;t) andV T (x;y;t) denote image pixel intensity in spatial coordinates (x;y) in the t-th frame of the reference video and test video, respectively. S R (x;y;t) and S T (x;y;t) denote pure signal pixel intensity from the reference video and test video, respectively. Here, E(ee ) = 0 means that the two videos are perfectly aligned. MFCC-DTW alignment parameter selection aims to discover the parameters that minimize E(ee ). Apparently E(ee ) is not directly assessable because the noise 42 is always embedded in the acquired videos. However, the noise variance 2 can be estimated from the background regions with no signal. In this work, we dene ENR to be: ENR = E(ee ) E(nn ) = E( ) 2 2 2 2 (3.12) The minimum possible value of ENR is 0; ENR = 1 means that the error energy due to signal misalignment is identical to the noise energy. 3.3.2 In Vivo Experiments Experiments were performed on a 1.5 T Signa Excite HD MRI scanner system (GE Healthcare, Waukesha, WI) with gradients supporting maximum amplitude of 40 mT/m and maximum slew rate of 150 T/m/s. The sampling period was set to 4μs (receiver bandwidth 125 kHz). We used a body coil for radio frequency (RF) transmission and a custom 4-channel upper airway coil (two anterior elements, two posterior elements) for signal reception. Data from the two anterior elements only were used for image reconstruction as the two posterior elements provided low coil sensitivity in the upper airway regions. Parallel imaging was not utilized in this study. MR imaging was performed with a custom real-time imaging framework [100], providing interactive control of scan parameters, image reconstruction, and frame display in real-time. Subject utterances were monitored in real-time using an FOMRI-III in-scanner noise reducing optical microphone system (Optoacoustics, Moshav Mazor, Israel) during MRI scans. In-scanner audio recordings were made simultaneously with MRI acquisitions [10]. 43 The MRI pulse sequence consisted of 1.5 ms excitation, 2.5 ms spiral readout, and 2 ms for gradient rewinder and spoiler. TR was 6.0 ms. 13 spiral interleaves were used to form each image, resulting in temporal resolution of 78 ms. We used gridding reconstruction to reconstruct every single frame and obtained an eective video frame rate of 23.8 fps using a sliding window reconstruction, updating frames every 7-TR (i.e., 42 ms). Each frame had 20 cm 20 cm eld of view (FOV) and 2.4 mm 2.4 mm spatial resolution. Three male native speakers of English (two American English, one Australian English) were used as subjects. Subjects ages ranged from 25 to 30 years; none had undergone any major dental work, major oral or maxillofacial surgery, and had no prior linguistic training. Each subject was screened and provided informed consent in accordance with institutional policy. Each subject was scanned in the supine position with the head immobilized from left-right tilting using foam pads between the head and the receiver coil. Stimuli were projected onto a screen in the scanner room, which could be seen by the subjects through a mirror attached to the receiver coil. All subjects made their best eorts to keep their heads stationary during the experiments. Stimuli consisted of English vowel-consonant-vowel (VCV) sequences /ala/, /ara/, /asa/, and /asa/. Details of each token and the associated regions of interest in the vocal tract are given in Table 3.1. First, pilot data from the midsagittal scan plane with varied and normal speech rates were obtained for the purpose of choosing parameters for MFCC-DTW alignment (see Section 3.3.1). In acquisition of varied-rate speeches, one subject was instructed to read the stimuli at slow speech rates (typically slower than 2.0 s/utterance), and to repeat the productions at fast speech rates (typically faster 44 Study Articulation under examination Stimuli 3D features observed Liquids - place of articulation of tongue tip /ala/ - central tongue tip constriction - stabilization of tongue body - one or two side channels - bracing of tongue root - asymmetry in lateralization - coordination of tongue tip and body /ara/ - retro exion / bunching of tongue - formation of side channels - pharyngeal / tongue root gesture - location of central constriction - labial approximation - retro exion or bunching Sibliants - place of articulation of tongue tip /asa/ - more anterior tongue tip - tongue body posture -deeper tongue groove - laminality of tongue blade /asa/ - retro exion or bunching - shape and extension of tongue groove - more controlled tongue body (less movement observed) - shape and location of constriction - shallower, longer post-constriction groove - coordination of tongue tip and body - wider, atter constriction cross-section - labial protrusion Table 3.1: List of targeted articulatory characteristics associated with observed 3D features from English vowel-consonant-vowel (VCV) sequences /ala/, /ara/, /asa/, and /asa/. than 1.0 s/utterance), by articial elongation and contraction, respectively. We monitored and veried the speech rates on-the- y. Twenty-one parallel sagittal slices together covered the entire vocal tract volume. Each slice was 6 mm thick, located with 3 mm overlap with neighboring slices. Subjects uttered 21 repetitions of each token at a normal speech rate, which amounted to a mean value of 1.1 s/utterance and a variance of 0.2 s/utterance during each real-time scan. Pulse sequences were programmed to execute automatic and continuous sweeping of the entire 21 slices without a scanner pause during real-time imaging of the vocal tract. The slice acquisition scheme commenced at the midsagittal slice, and then proceeded to parasagittal slices in an interleaved center-out pattern (see the supplementary video). In addition, the spiral readout gradients between successive slice acquisitions were turned o as 45 a means to provide an auditory trigger for the subjects to prepare for the next repetition. We used the data from 21 sagittal slices to synthesize movies on the other orthogonal planes (e.g., coronal), and used the data from 13 central sagittal slices to construct the dynamic 3D tongue and vocal tract visualization. Finally, we acquired data from 3 uniformly spaced coronal scan planes on the tongue individually, while the subjects were asked to repeat the same speech corpus. The directly acquired coronal movies were utilized for method validation. 3.3.3 Data Visualization and Analysis An orthogonal slice cut in the data volume formed by aligned sagittal frames constitutes the orthogonal 2D view (e.g., coronal, axial). As a result, the number of pixel columns in the synthesized views equal to the number of acquired sagittal slices. The cross-plane (right-left, or R-L direction) resolution is the shift size of the sequential slices (3 mm), but the true spatial resolution is the slice thickness (6 mm), which is lower than in-plane resolution (2.4 mm). The FOV of synthesized view was 200 mm 63 mm, therefore we resized frames, using bicubic interpolation, to the isotropic resolution 1 mm 1 mm that is the greatest common divisor of FOV on two directions, in order to obtain reasonable visualization. Bicubic interpolation is widely used in image processing to resize image by estimating interpolated pixels using a number of closest surrounding pixels. We iterated the process to generate the synthesized coronal videos frame- by-frame. Directly acquired coronal data sets were also aligned to sagittal reference data set using the same MFCC-DTW alignment approach. We evaluated the aligned 3D data using synthesized coronal planes, in comparison with directly imaged coronal movies on the same anterior-posterior 46 (A-P) positions. If the 3D data are correctly aligned, we would visualize tongue features that are evident in directly imaged coronal views, such as the tongue groove [6]. The smoothness of the tongue surface in coronal slices serves as another criterion for method evaluation, because alignment errors will manifest as apparent tissue irregularities in a coronal section of the tongue. We quantitatively assessed the smoothness using a curve tting method. We rst manually segmented the tongues using a locally thresholding tool from sliceOmatic (TomoVision, Magog, Quebec, Canada) software, and then extracted the upper tongue surfaces, and tted the surfaces to polynomial curves, c(x) =c 1 x n +c 2 x n1 +::: +c n x +c n+1 (3.13) wherec(x) is a tted curve with polynomial coecientsc 1 ;c 2 ;c ( n + 1) of degree n, in a least squares sense. We used the root-mean-square (RMS) error of the tted curve, RMS = s P Nx (T (x)C(x)) 2 N x (3.14) to quantitatively assess tongue surface irregularity, where N x denotes number of points on the upper tongue surface,T (x) andC(x) are vertical positions of tongue and tted curve in horizontal position x. The degree of polynomial curve on each coronal plane was selected as the minimum degree of curve tting on directly imaged coronal view that results in sub-millimeter RMS error. Manually segmented tongue and lower jaw contours from aligned sagittal videos establish the cross-section of the 3D surfaces. The volumetric vocal tract data is ready for visualization after 3D nearest neighbor interpolation along the R-L 47 direction. The Matlab (The MathWorks, Natick, MA) 3D visualization toolbox was used to smooth the data volume with a box convolution kernel size of 5, and to extract an isosurface a 3D surface of identical value (isovalue) from the volumetric static data. We empirically determined the isovalue to provide sucient smoothing of the reconstructed lingual volume, preserving the major anatomical features of the tongue (and surrounding parts of the vocal tract). Since one contour depicts the vocal tract boundary on each parallel plane, the synthesized 3D surfaces look like ribbons enclosing the vocal tract with openings at the right and the left ends of the imaged volume (see Figure 3.7). Similarly, manually segmented vocal tracts from aligned sagittal videos build the tube-shaped vocal tract airway [111]. We performed 3D cubic interpolation on the data in Matlab and smoothed the volume with a box convolution kernel size of 3. Instead of using an isosurface, we color-coded the segmented areas on dierent sagittal planes in the visualization. Sequences of 3D static models portray 3D tongue motion and vocal tract articulatory dynamics, which provide ecient visualizations of vocal tract shaping, observable from any viewing angle. 3.4 Results 3.4.1 Parameter Selection Figure ref3.3 displays the minimum-distance warping path for two audio recordings of the utterance /ala/ from the pilot data. The red line represents the preliminary non-linear warping using MFCC-DTW alignment with conventional MFCC settings extracted over 10 ms timeframes (within the region suggested by [43]), using a 1 ms shift size, demonstrating high resolution alignment pairs. 48 Each point on the red line (e.g. (x, y)) stands for the mapping of a frame centered at position x in the longer duration audio signal to a frame centered at position y in the shorter duration audio signal. In contrast, the dotted blue line indicates the naive equal-time alignment (i.e., unaligned case), and the green dashed line shows a uniform end-to-end linear alignment. Amplitude waveforms of the two audio signals and their corresponding MFCC series are shown along the horizontal and vertical axes. Notice that despite the fact that the MFCC frame width and shift size were not yet optimized at this stage, conventional settings already oered reasonable alignment results. We used the pilot data to experiment with parameter selection. Each test involved long-to-short alignment, short-to-long alignment and normal-length alignment. We noted that the results of long-to-short alignment were highly consistent with the results of normal-length alignment, but dierent from the results of short-to-long alignment. Our ndings suggest that it is necessary to avoid aligning an extra-short utterance to an extra-long repetition of the same utterance. We therefore excluded the usage of short-to-long alignment results for the following parameter selection. Figure 3.4 illustrates the average map of ENR (see Equation 3.12) on the aligned videos of the utterances /ala/, /ara/, /asa/, and /asa/. Both frame width and shift size range from 1 ms to 500 ms with 1 ms spacing, which guarantee wide and dense coverage of all possible parameter combinations. Although there is no universal minimum-error setting, the data reveal a general trend that smaller frame widths and shift sizes result in smaller errors. We obtained an estimate of the optimal parameter setting 36 ms frame width and 8 ms shift size by averaging 100 minimum-error settings. 49 0 200 400 600 800 1000 1200 1400 1600 1800 2000 0 100 200 300 400 500 600 700 800 900 MFCC-DTW alignment equal−time alignment linear alignment −12 −10 −8 −6 −4 −2 0 2 4 6 8 10 (ms) (ms) MFCC Value (a. u.) 81 2190 802 (Hz) (Hz) 81 802 2190 Figure 3.3: Minimum-distance warping path for aligning pilot data /ala/ long recording to short recording using preliminary MFCC-DTW alignment (red line) with conventional MFCC settings extracted over 10 ms timeframes, using a 1 ms shift size. Each point on the red line (e.g. (x, y)) stands for the mapping of a frame centered at position x in the longer duration audio signal to a frame centered at position y in the shorter duration audio signal. In contrast, the dotted blue line indicates the naive equal-time alignment (i.e., unaligned case), and the green dashed line shows a uniform end-to-end linear alignment. Amplitude waveforms of the two audio signals and their corresponding MFCC series are shown along the horizontal and vertical axes. 3.4.2 Evaluation and Data Visualization We qualitatively and quantitatively assessed the plausibility of the results by comparing directly acquired coronal movies (ground truth) and synthesized coronal movies. Figure 3.5(a) illustrates such comparison among directly acquired (D), reformatted MFCC-DTW aligned (A), linearly aligned (L) and unaligned (U) coronal images on 3 color-coded parallel coronal planes: tongue tip (alveolar, in red), tongue blade (post-alveolar, in green) and tongue front (hard palate, in blue). Images were selected from the production of /s/ in the utterance /asa/ 50 Frame Width (ms) Shift Size (ms) 100 200 300 400 500 50 100 150 200 250 300 350 400 450 500 1 1.2 1.4 1.6 1.8 2 Figure 3.4: Error-to-noise ratio (ENR) map of average results of MFCC-DTW long- to-short and normal-length alignments from four utterances /ala/, /ara/, /asa/, and /asa/ with frame widths and shift sizes ranging from 1 ms to 500 ms with 1 ms spacing. Frame width and shift size are critical to MFCC extraction, but there is no widely accepted values among dierent applications. Here the tested parameter combinations guarantee wide and dense coverage of possible choices. There is a general trend that smaller frame widths and shift sizes result in smaller errors.36 ms frame width and 8 ms shift size are estimated optimal parameter setting from averaging 100 minimum-error settings. from one subject when his fricative constriction degree was maximal. The directly acquired images were manually trimmed to display the same FOV (200 mm 63 mm) as the synthesized images. The arrows in the coronal images indicate tongue groove sibilant formation in the three tongue regions. The MFCC-DTW alignment results clearly indicate the tongue groove pattern similar to the directly acquired one, whereas data are poorly synchronized by linear and equal-time alignments. Other salient vocal tract features involving tongue shaping were observed in synthesized images of /ala/, /ara/, and /asa/ utterances respectively (not shown, 51 see supplemental video). Figure 3.5(b) presents segmented tongues from coronal views, and the tted tongue surfaces from sparse upper tongue surfaces. The bright areas standing out of black background are the results from zoomed-in areas highlighted by yellow columns in the sagittal image in Figure 3.5(a). Upper tongue surface points were extracted as circles, and tted to polynomial curves. The degrees of polynomial tting were 4, 6 and 7 for red, green and blue slices (from anterior to posterior), respectively. Figure 3.6 displays the average RMS errors of tongue surfaces curve tting from maximum constrictions of all subjects eliciting the consonants from /ala/, /ara/, /asa/, and /asa/, excluding /ara/ and /asa/ on the anterior plane because the retracted tongues did not appear in the coronal images. The tting degrees were individually chosen, for anterior, middle and posterior coronal slices, as the minimum degrees of curve tting on directly imaged coronal views that result in sub-millimeter RMS errors. The gure indicates that MFCC-DTW alignment (A) RMS errors are comparable to the RMS errors of directly acquired (D) data, and much lower than the RMS errors of linear alignment (L) and equal-time alignment (U) in all coronal slices, especially in the anterior coronal slice. Figure 3.7 illustrates 3D visualization of the tongue and the jaw surfaces for the consonants /l/, /r/ and /s/ in the utterances of /ala/, /ara/, and /asa/. Many details of lingual articulation are evident in the reconstructed tongue surfaces (see arrows), including apical approximation of coronals (/l/), bunched production of rhotic approximants (/r/), and groove formation during sibilant productions (/s/). The 3D models appear qualitatively less accurate at the far right and far left ends of the volume, potentially due to rapid changes in tongue geometry, leading to blurry sagittal images and diculty with manual segmentation. Images shown 52 Figure 3.5: (a) Comparison among directly acquired (D), reformatted MFCC- DTW aligned (A), linearly aligned (L), and equal-time aligned (U) coronal images on 3 color-coded parallel coronal planes. Tongue groove formation may be clearly seen in the directly acquired images and reformatted MFCC-DTW aligned images (see arrows), but not in results of linear and equal-time alignments. (b) Manually segmented tongues from coronal views, and the tted tongue surfaces from sparse upper tongue surfaces. The bright areas are the segmentation results from zoomed- in areas highlighted by yellow columns in the sagittal image in (a). 53 Direct Aligned Linear Unaligned 0 0.5 1 1.5 2 2.5 3 3.5 4 Root−mean−square error (mm) Root−mean−square error of curve fittin g Anterior coronal Middle coronal Posterior coronal Figure 3.6: Average root-mean-square (RMS) errors of tongue surface tting: maximum constrictions of all subjects during consonant production in /ala/, /ara/, /asa/, and /asa/ sequences, excluding /ara/ and /asa/ on the anterior plane. The tting degrees are individually chosen, for anterior, middle and posterior coronal slices, as the minimum degrees of curve tting on directly imaged coronal views that result in sub-millimeter RMS errors. MFCC-DTW alignment (A) results are comparable to directly acquired (D) results, and are much lower than results of linear alignment (L) and equal-time alignment (U) in all coronal slices, especially in the anterior coronal slice. in Figure 3.7 were extracted from the reconstructed dynamic 3D movie (78 ms temporal resolution) that is provided as supplemental material. Supplemental materials also include dynamic 3D movies of the tube-shaped vocal tract, in which the segmented airway from each sagittal slice is color-coded. 3.5 Discussion To our knowledge, the proposed method is the rst demonstration of dynamic 3D visualization of the vocal tract shaping using real-time MRI and synchronized audio recordings. Since the proposed method relies on 2D real-time MRI data, it 54 (a) (c) (b) Figure 3.7: Selected individual frames from dynamic 3D visualizations of the tongue and the jaw surfaces for the consonants /l/, /r/ and /s/ in the utterances of /ala/, /ara/, and /asa/ (linguistically salient aspects of vocal tract shaping indicated with arrows). (a) apical coronal articulation (approximation of the tongue tip to the alveolar ridge) during lateral production in the utterance /l/, (b) bunched coronal articulation during production of rhotic approximant in the utterance /r/, and (c) tongue groove formation during sibilant production in the utterance /s/. The 3D models appear qualitatively less accurate at the far right and far left ends of the volume, potentially due to rapid changes in tongue geometry, leading to blurry sagittal images and diculty with manual segmentation. The dynamic 3D visualization of the four utterances is provided as supplemental materials. inherits the image quality problems of 2D real-time MRI. Signicantly low SNR due to poor coil sensitivities of our current receiver coil array was observed in the middle to lower neck. This made it dicult to observe the dynamics of the vocal cord, which is another region of interest for the application of the proposed approach. In addition, the proposed technique requires ner slice resolution than 6 mm to improve visualization of coronal-slice tongue shape, but the image SNR is unacceptable with the slice thickness thinner than 6 mm in our current 2D real- time MRI of speech. Together 6 mm thickness and 3 mm overlap result in a volume covering the entire vocal tract with reasonable pixel resolution and data delity in the cross-plane direction, since a 2D image is a projection of tissue slice. We made trade-os among image quality, data delity and scan time. Midsagittal plane is 55 most commonly used in speech research to get a full view of the vocal tract from the lip opening to the glottic region, and acquisition of contiguous parallel sagittal planes is more time-ecient in covering the entire vocal tract than acquisition of stack of axial or coronal planes, because vocal tract has much smaller dimension alone R-L direction than A-P or superior-inferior (S-I) direction. If we prescribe axial or coronal plane, more slices are required for covering the entire vocal tract, leading to more repetitions of the speech corpus for subjects. We have performed our experiment on three subjects using dierent stimuli including VCV utterances, long English words, and English sentences. Experiments involving English sentences require multiple separate scans to acquire data for all the sagittal slices covering the entire vocal tract, because of scan time limitations. Nevertheless, the MFCC-DTW technique successfully aligned separately-acquired data to a similar level in all cases (more than ten). We demonstrate manually-segmented results of four short, representative VCV utterances in Figure 3.7 and the supplemental dynamic 3D movies, which demonstrate 3D features of interest. The number of triangular lters used in MFCC extraction, 24 to 40, is widely accepted in audio signal processing community [43], and for speech processing applications, the rst 13 out of 24 MFCCs are commonly used [43]. We tried dierent values within the range, and observed little dierences in the alignment results. Figure 3.3 shows that the minimum-distance warping path is a non- decreasing, discontinuous curve constrained by the monotonicity criterion. The curve veries the abilities of MFCC-DTW alignment to align two productions where the speech rates are unstable, and to align two MFCC sequences of dierent lengths because one-to-many and many-to-one mappings are both allowed. The 56 drawbacks of DTW include quadratic time complexity and memory requirement. DTW is a dynamic programming algorithm using a cost matrix whose size is proportional to the product of the lengths of two time series [99]. This puts practical limits on the usage of DTW for aligning extremely long acquisitions or using extremely small shift sizes. Other groups have accelerated traditional DTW and improved memory occupancy eciency by sacricing minor accuracy. Following Salvador and Chans classication [99], established approaches fall into three categories: 1) Constraints which limit the number of cells in the cost matrix to be evaluated, such as Sakoe-Chiba Band [98] and Itakura Parallelogram [45]; 2) Data Abstraction which reduces the sizes of time series as DTW input [15]; and 3) Indexing which prunes out the number of times DTW runs using lower-bounding functions [47]. Our results indicate that the performance of long-to-short alignment is superior to that of short-to-long alignment. We believe this is because when a subject intentionally produces short utterances, most acoustic features are shortened, and some may be discarded. When a subject intentionally prolongs utterances, he appears to preserve all the acoustic features, or even adds some abnormal acoustic features. The MFCC-DTW algorithm simply discards additional features by mapping multiple entries to a single entry in the test audio in long-to-short alignment, such as period 1.6 s to 1.9 s in Figure 3.3. But when the algorithm aligns a short utterance to a long utterance, misalignments may occur when the algorithm aligns normal acoustic features to abnormal ones due to articial elongation of the utterance. Since experimental audio recordings for utterances collected to date have a mean of 1.1 s/utterance and a variance of0.2 s/utterance, this suggests that short-to-long alignment is not a concern in real audio alignment. 57 There is no universal optimal value of frame width or shift size for short-time audio processing, such as MFCC extraction. We observed that 100 minimum- error parameter settings cluster in a small area, and thus we averaged them for our estimate. The resulting 36 ms frame width falls into the typical range conventionally used in speech processing (5 ms to 100 ms, [96]), and is very close to that of 40 ms used in [43], but the shift size is only 2/9 of the frame width, smaller than the suggested 1/3 of the frame width [96]. Coronal views derived from the reconstructed data using this technique are a good example of the type of arbitrary reformatting that can only be done with 3D data, and provide a clear method by which to examine the synthesized 3D vocal tract data. Poor alignment leads to asymmetry of reconstructed tongue and jagged surfaces (i.e., stair-step artifact), which are evident in linear and equal-time alignment results, but are greatly mitigated in MFCC-DTW alignment results. The results from the RMS error plots show that DTW-MFCC alignment invariably outperforms the other two alignment methods, and conrm the observations of qualitative evaluations. The RMS errors of linear and equal-time alignments from anterior slice are signicantly higher than from middle and posterior slices (see Figure 3.6), because the vocal tract constriction of /l/ and /s/ is located between the tongue tip and alveolar ridge, and lasts for a very short period (about 100 ms to 200 ms), so that any misalignment would be manifest as stair-step artifact. In addition, linear and equal-time alignments tend to be very sensitive to the duration, the start and end of segmented data, to which MFCC-DTW alignment exhibits the robustness. Therefore the careful data/audio segmentation of single short utterances by mutual inspection of video/audio in the preprocessing improves accuracy of the linear and equal-time alignments. 58 From a temporal perspective, MFCC-DTW alignment has poorer results at the start and the end of each utterance, compared with the medial data portions. The likely reason is that MFCC-DTW alignment relies on audio information, but there are no clearly identiable acoustic landmarks to support proper alignment at the start and end of the utterance, corresponding to points in time when the articulators just leave from or move back to rest positions (without producing sound). In spite of these limitations, the intervals of most interest for linguistic research are focused on periods of more active articulation, when acoustic- articulatory alignment is more easily achieved, and slight mismatches at the start and the end of each utterance are therefore less problematic. The 3D movies generated using this technique successfully allowed for the dynamic visualization of many of the salient articulatory features anticipated in these stimuli. However, one important limitation of the method at this stage of development is the presence of minor surface irregularities in the reconstructed object. Several possible reasons are: 1) Image quality issues: o-resonance and motion artifacts could blur the air-tissue boundaries, resulting in contour tracking errors. 2) Slice thickness: the shape of the tongue changes along the R-L direction, especially at both ends of the tongue, where steep changes exist. 3) Gross head motion: some head movement is inevitable during multiple repetitions, despite the limiting eect of having subjects heads immobilized using foam paddings. 4) Varying degrees of jaw opening: subjects could not precisely repeat the same degree of jaw opening in dierent repetitions. 5) Manual segmentation errors: manual segmentation currently outperforms any automatic segmentation methods, but can be weakened by shortcomings including reproducibility errors, operator fatigue and bias. 59 Finally, it is worth noting that high temporal resolution 3D vocal tract data is valuable not only to linguistic studies, but could also be useful for clinical research, including the investigation of articulatory dierences between the pre- and post- operative vocal tracts in glossectomy patients [76]. 3.6 Conclusion We presented a novel method for 3D dynamic imaging of human vocal tract airway shaping (and of the associated articulators, notably, the tongue) based on 2D real- time MRI of parallel sagittal slices that are independently acquired from repetitions of the same speech corpus. The technique applies DTW to comparable series of audio MFCC feature vectors to compensate for the temporal mismatches of the videos resulting from varied speech rates. With this technique, we were able to reconstruct 3D vocal tract movies with 2.4 mm 2.4 mm 3 mm spatial resolution and 78 ms temporal resolution, from which we successfully visualized lingual features of several tested utterances. The proposed method can give improved insights into the goals of speech production, since it can provide high temporal resolution information about the changing geometry of the entire vocal tract data which are not available from conventional 2D/3D MRI techniques. 60 Chapter 4 Dynamic 3D Imaging of Speech 4.1 Introduction The vocal tract is the universal human instrument, played with great dexterity and skill. The articulation of vocal tract shaping during speech has been extensively studied in order to characterize the relationship between articulation and acoustics, understand critically controlled aspects of speech production, and describe characteristic phonetic properties of a variety of languages. Most of these speech studies have been performed by analysis of lips, tongue, velum and glottis dynamics that carry important information about the spatiotemporal properties of speech gestures. Besides linguistic studies, vocal tract dynamics is also useful for clinical research, including the investigation of articulatory dierences between the pre- and post-glossectomy [34], the airway collapse of obstructive sleep apnea [70], and the assessment of laryngeal vestibule closure mechanisms of swallow [68]. Seeing speech might be the most challenging application of vocal tract imaging, as the speech production usually involves motions of most of the critical articulators. Researchers have used X-ray [107, 18], ultrasound [109, 91], computed 61 tomography (CT) [89], electromagnetic articulography (EMA) [88, 121], and MRI [79, 108, 82, 85, 66, 112, 129, 33, 32] for speech imaging. Compared with other imaging modalities, MRI has been shown to be advantageous as a safe and noninvasive tool that provides whole vocal tract volume coverage, and clean air-tissue boundary. The most critical shortcoming of MRI for speech imaging has traditionally been the imaging speed due to physical and physiological constraints. This limitation has been circumvented or overcome by imaging sustained production of fricatives [79], using gated cine-MRI technique [108], applying fast and ecient data acquisition [82], and acceleration through parallel imaging [95, 36] and compressed sensing [69]. Most notably, Niebergall et al. have achieved temporal resolution of 33ms by applying radial readouts and nonlinear inverse reconstruction [85], and Lingala et al. have reconstructed speech MRI at 12ms temporal resolution using spiral readouts and compressed sensing [66]. The previously stated techniques have been limited on a single plane view of articulation, while dynamic 3D MRI would be more desirable as it would provide 3D dynamics of the entire vocal tract for proper understanding of speech production. Since conventional MR systems and reconstruction methods did not meet the requirements for capturing vocal tract dynamics in real-time, compromised engineering methods have been proposed, including model-tting methods and registration based methods. Engwall et al. has established tongue model using static MRI and electropalatography (EPG), and tted the model to EMA dynamics [27]. Similarly, video data from pellets marked on the human face were combined with 3D static MRI data to linearly model the articulators [5]. 2D cine MRI has been extended to 3D using multi-planar method, which heavily relies on the synchronization between speech production and MRI acquisition [112]. 62 Zhu et al. have produced dynamic 3D visualization of vocal tract shaping using audio-based alignment of parallel-slice data acquired from dierent scans [129]. All of these aforementioned methods required some sort of blend of information from separated operations, and failed to acquire and reconstruct the 3D MRI data directly. Zhu et al. has utilized the low-rank constraint by using partially separable (PS) [60] model to accelerated stack-of-spirals speech MRI by a factor of 8. In the data acquisition, one central k-space plane was repeatedly sampled and interleaved by other k-space planes. The temporal and spatial bases of the model were extracted from the frequently acquired plane and other planes, respectively. Fu et al. combined sparsity and low-rank constraints, in which they used spirals [33] or cones [32] trajectories as navigator for temporal basis, and Cartesian sequences for spatial basis. Most recently, they combined the cone navigator and Cartesian readout in the same TR, and achieved very high frame rate [32]. This method utilizes a global low-rank assumption, therefore, periodicity (speech repetitions) during the data acquisition is highly favorable, and one-time movement of the vocal organs (such as swallow) is possibly missing after the reconstruction. In this work, we propose to use a combination of an 8-channel custom speech head coil, a variable density golden-angle stack-of-spirals sequence, and constrained reconstruction using temporal nite dierence and spatial total variation as constraints. We have achieved a net acceleration of 60, with 26 ms temporal resolution and 2.5 mm isotropic spatial resolution. The proposed method does not require any speech repetition in the data acquisition for proper image reconstruction. The remainder of this paper is organized as follows. We rst brie y introduce the background of data acquisition sequence and constrained reconstruction. We then describe our sampling protocols with dierent acquisition 63 strategies, validation using multi-slice 2D real-time MRI, and in-vivo experiment designs. Following which, we demonstrate the results of constrained reconstruction, and the comparison with the multi-slice MRI at corresponding locations. Finally, we critically discuss the present approach and propose potential future directions for this work. 4.2 Background 4.2.1 Data Acquisition Spiral trajectory balances trade-os among temporal resolution, spatial resolution and SNR, and has been shown to be very robust in speech data acquisition [82]. As extension of 2D spiral to 3D, stack-of-spirals sequence addsk z phase encoding, and within each k z plane, the conventional 2D spiral sequence can be applied. Golden angle [119] increment has nearly uniform sampling pattern that allows retrospective temporal resolution selection, and can be easily incorporated into each k z plane of stack-of-spirals sequence. Figure 4.1(a) demonstrates the sequential 2D spiral interleaves. The color level corresponds to the acquisition order of individual readouts. One of the interleaves is extracted and rotated with golden angle to produce Figure 4.1(b). Figure 4.1(c) shows the 3D golden angle stack-of-spirals sequences used in this study. 4.2.2 Image Reconstruction We employed a sparse SENSE image model [54, 67] using temporal nite dierence and spatial total variation as l 1 -norm penalties. The image m is obtained by 64 (a) (c) (b) Figure 4.1: (a) Sequential 2D spiral interleaves. The color level corresponds to the acquisition order. (b) Golden angle 2D spiral interleaves. The sampling patter is more uniform and allows retrospective temporal resolution selection. (c) 3D golden angle stack-of-spirals sequence, each k z level is the same as Figure 4.1(b). minimizing f(m) =kF u Smyk 2 2 + 1 kT t mk 1 + 2 kT s mk 1 (4.1) where F u is the undersampling Fourier operator, S is the sensitivity operator, y is the acquired k-space data, T t is the temporal nite dierence, T s is the spatial total variation, 's were carefully chosen to avoid compression artifacts and maintain high data consistency. Once calibrated, 's were held constant for all other studies. The cost function in Equation 4.1 was minimized with a nonlinear conjugate gradient method. 65 4.3 Method 4.3.1 Imaging Protocols We have examined four imaging protocols, (1) k z fully sampling (k z -full), (2) k z partial-Fourier sampling (k z -pf), (3) k z 2 undersampling (k z -2), and (4) k z variable density sampling (k z -vd). (1)k z -full: Each interleave is sampled from allk z levels, and then next interleave is sampled. k z acquisition order is bit-reversed. (2) k z -pf: A partial k z acquisition version of the protocol (1), as illustrated in Figure 4.2(a) - slightly over half of the k z levels are sampled. k z acquisition order is bit-reversed. (3) k z -2: A k z uniformly undersampled version of the protocol (1), as shown in Figure 4.2(b) - only odd numbered k z planes are acquired. k z acquisition order is bit-reversed. (4)k z -vd: Variable densityk z undersampling with sampling probability inverse proportional to k z distance to the center of k-space (Figure 4.2(c)). There is no specick z undersampling strategy or bit-reversed ordering. The intuition and the implementation details are as follows. Sampling the center of k-space more frequently has been a common strategy for dynamic or time-resolved MRI, as has been done in keyhole [115], TRICK [55], Possion disc sampling [69] and many other established techniques. We simulated k z sampling density as radial sampling (1/k r ), and tried to implement golden angle along k z : a)k z density: The sampling density of each k z plane is inverse proportional to k z distance to the center of k-space. For instance, there are 8 k z levels, and thek z 66 kz = 1 kz = 2 kz = 3 kz = 4 kz = 5 kz = 6 kz = 7 kz = 8 ...... (a) (b) (c) Figure 4.2: (a) A partial k z acquisition in which slightly over half of the k z levels are sampled. The red planes are not sampled. (b) A k z uniformly undersampled acquisition where odd numbered k z planes are acquired. (c) Variable density k z undersampling with sampling probability inverse proportional tok z distance to the center of k-space. The acquired k z is selected with golden angle from reformatted k z circle. distance are [3.5, 2.5, 1.5, 0.5, 0.5, 1.5, 2.5, 3.5], then the corresponding density are [4.26%, 5.97%, 9.94%, 29.83%, 29.83%, 9.94%, 5.97%, 4.26%]. Figure 4.2(c) demonstrates the density assignment to the rst half of k z levels. b) k z selection: The k z densities are projected to a pie chart, and the k z order is determined by a golden angle rotating arrow, as shown in the Figure 4.3. The arrow rotates counterclockwise with golden angle (222.48 ) to picksk z levels. The acquisition view number of each k z plane is individually counted, and increases when thek z is selected. This scheme utilizes the uniform property of golden angle sampling to ensure accurate sampling densities. 4.3.2 Validation using Multi-slice Speech MRI The multi-slice speech MRI proposed by Kim et al. [53] simultaneously images a few arbitrary slices, and is able to achieve sucient temporal resolution using constrained reconstruction [66]. We applied the technique on sagittal, coronal and 67 kz 4 3 5 4 8 5 4 6 view 1 1 1 2 1 2 3 1 kz 4 2 5 4 7 5 view 4 1 3 5 1 4 3 2 ... Figure 4.3: The k z densities are projected to a pie chart, and the k z order is determined by a golden angle rotating arrow. The acquisition view number of each k z plane is individually counted, and increases when the k z is selected. axial planes of the vocal tract, and used the reconstructed images as a reference for expected image quality. The signals from cross-sections of three slices are excited in each TR, hence the signals are more saturated and form darker lines in the images. These darker lines were used as markers that facilitate registration of 2D slices and 3D volume. The registration between 2D and 3D data were performed manually using static data of characteristic vocal tract shaping, such as tongue grove from sustained production of fricative sound /s/. The sustained productions data were fully sampled and reconstructed using conventional linear reconstruction to avoid motion artifacts and reconstruction induced artifacts. After registration, speech tasks were imaged using 2D multi-slice and 3D MRI separately and compared qualitatively. 4.3.3 Experiment Experiments were performed on a 1.5 T Signa Excite HD MRI scanner system (GE Healthcare, Waukesha, WI) with gradients supporting maximum amplitude 68 of 40 mT/m and maximum slew rate of 150 T/m/s. The sampling period was set to 4μs (receiver bandwidth125 kHz). We used a body coil for radio frequency (RF) transmission and a custom 8-channel upper airway coil for signal reception. MR imaging was performed with a custom real-time imaging framework [100], providing interactive control of scan parameters. Simultaneous audio recording could be possible [10] but was not utilized in our experimental study. The 2D multi-slice MRI simultaneously imaged 1 sagittal, 1 coronal, and 1 sagittal slices. The slice thickness was 6 mm and the eld-of-view (FOV) was 2020 cm 2 . The spiral interleaves were calculated with 12 interleaves and spatial resolution of 2.52.5 mm 2 . The (pseudo) golden angle spiral sequence consists of 34 interleaves. The TR was 6.1 ms, and the temporal resolution was 219.6 ms. The 3D MRI excited a midsagittal slab of 5 cm thickness, and acquired data using a stack-of-spirals sequence. The interleaves within eachk z level are identical to those of 2D multi-slice MRI. There are 20 k z -encoding planes, resulting in 2.52.52.5 mm 3 spatial resolution. The TR was 6.4 ms, and the temporal resolution was 1.54 sec. The subject, a native speaker of Chinese, was screened and provided informed consent in accordance with institutional policy. The subject has not undergone any major dental work, major oral or maxillofacial surgery. The subject was scanned in the supine position with the head immobilized from left-right tilting using foam pads between the head and the receiver coil. Stimuli were posted inside the scanner bore in front of the subject. The stimuli consisted of static and dynamic parts. The subject was instructed to sustain the production of /s/ during the static part. The dynamic part included (1) counting numbers: 1, 2, ..., 10; (2) twice of /za-na-za/, /zu-nu-zu/ and /zi- 69 ni-zi/; (3) relatively slow production of /a:/ and /i:/. The duration of static and dynamic parts were approximately 5 sec and 30 sec, respectively. 2D multi-slice images (dynamic part) were reconstructed using sparse SENSE constrained reconstruction with temporal nite dierence penalty [66]. The temporal windows ofk z -full,k z -pf andk z -2 included one interleave perk z plane, and similar acceleration of 1 interleave/plane/frame has been successfully applied to 2D real-time speech MRI [66]. More specically, k z -full had 20 k z levels, and was reconstructed with 20 interleaves/frame, k z -pf was reconstructed with 12 interleaves/frame, and k z -2 was reconstructed with 10 interleaves/frame. The resulting temporal resolution were 122, 73.2 and 61 ms. k z -vd was reconstructed at 20, 12, 10 interleaves/frame to compare with corresponding above protocols. In addition, k z -vd was reconstructed with smaller numbers (8, 6, 4) of interleaves/frame, leading to temporal resolution of 51.2, 38.4 and 25.6 ms. Reconstruction at dierent acceleration demonstrates the exible temporal resolution property and helps to determine the most appropriate acceleration for dierent speech tasks. Reconstructed images and extracted temporal proles were used for qualitative image assessment. 4.4 Results Figure 4.4 illustrates manually enlarged and registered images of the 2D multi-slice (left) and 3D (right) MRI from (a) sagittal, (b) coronal, and (c) axial slice of the vocal tract, respectively. Arrows point out the darker lines due to signal saturation. Since the 2D MRI has larger voxel size 2.52.56 mm 3 than 2.52.52.5 mm 3 of 3D MRI, two adjacent 3D MRI slices were averaged and shown. 3D MRI shows higher SNR (see vocal cavity in coronal slice), and ner feature than 2D MRI (see 70 (a) (b) (c) Figure 4.4: Manually enlarged and registered images of the 2D multi-slice (left) and 3D (right) MRI from (a) sagittal, (b) coronal, and (c) axial slice. Arrows point out the darker lines due to signal saturation. Two adjacent 3D MRI slices were averaged to match slice thickness of 2D MRI. teeth ridge in axial slice). Since the 2D and 3D data were from two separated acquisitions, the articulations were similar but not perfectly identical, especially the axial slices were less accurately registered. Figure 4.5 demonstrates the temporal proles extracted from 2D multi-slice MRI from (a) sagittal and (b) coronal slices. The 2D dynamics were reconstructed with 2 interleaves/frame, resulting in temporal resolution of 36.6 ms. Not ideal gold standard though, they were the best possible multi-slice data currently possible and served as reference for expected image quality. The locations of the extracted data are indicated in lines in the anatomical images. The A-P line from sagittal image covers signals from lip, tongue, and velum, and the L-R line from coronal image mainly captures the tongue motions and the tongue grove. The axial plane data were not used for comparison in this study. Figure 4.6 and Figure 4.7 show the temporal prole comparisons of A-P and L-R lines, respectively. The reference has temporal resolution of 6 TR but was reconstructed with 2 interleave/frame. k z -full,k z -pf andk z -2 used coil sensitivity map calibrated from k z -full, and k z -vd used coil sensitivity map estimated from 71 (b) (a) Figure 4.5: Temporal proles extracted from 2D multi-slice MRI from (a) sagittal and (b) coronal slices. The temporal resolution is 36.6 ms. The locations of the extracted data are indicated in lines in the anatomical images. itself. k z -vd was reconstructed with 20, 12, and 10 interleave(TR)/frame to compare with results of k z -full, k z -pf and k z -2, respectively. k z -vd was further reconstructed with 8, 6, and 4 TR/frame. The penalties were carefully chosen based on comparison of A-P proles, and best penalties were found in a similar range, therefore, we xed the penalties for all the reconstructions. None of the k z -full, k z -partial or k z -2 has correctly reconstructed the expected A-P or L- R temporal les, but k z -vd results were much better at all temporal resolution. Arrows point out selected details that are best reconstructed. In Figure 4.6, medium (blue) and fast (red) speech data were better reconstructed with higher temporal resolution, and slow (green) speech data were better reconstructed with lower temporal resolution. In Figure 4.7, the trend is clearer that faster speech is in favor of higher temporal resolution, while longer reconstruction window is good for slower speech. Overall, reconstruction with 8 TR/frame balances the trade-o and produces the best results. Figure 4.8 and Figure 4.9 display selected articulation from sagittal and coronal planes, respectively. These images were selected from 4TR/frame reconstruction that is the highest temporal in this study. In Figure 4.8, the lip, tongue, velum 72 2D: 3D: 2TR/frame kz-full 20 TR/frame kz-vd 20 TR/frame kz-pf 12 TR/frame kz-2X 10 TR/frame kz-vd 12 TR/frame kz-vd 10 TR/frame kz-vd 8 TR/frame kz-vd 6 TR/frame kz-vd 4 TR/frame medium fast slow Figure 4.6: The temporal prole comparisons of A-P lines. k z -vd was reconstructed with 20, 12, and 10 TR/frame to compare withk z -full,k z -pf andk z -2, and further reconstructed with 8, 6, and 4 TR/frame. Arrows point out selected details that are best reconstructed. 73 2D: 3D: 2TR/frame kz-full 20 TR/frame kz-vd 20 TR/frame kz-pf 12 TR/frame kz-2X 10 TR/frame kz-vd 12 TR/frame kz-vd 10 TR/frame kz-vd 8 TR/frame kz-vd 6 TR/frame kz-vd 4 TR/frame medium fast slow Figure 4.7: The temporal prole comparisons of L-R lines. k z -vd was reconstructed with 20, 12, and 10 TR/frame to compare withk z -full,k z -pf andk z -2, and further reconstructed with 8, 6, and 4 TR/frame. Arrows point out selected details that are best reconstructed. 74 Figure 4.8: Selected articulation from sagittal plane using 4TR/frame reconstruction. The lip, tongue, velum and glottis are clearly visualized with clean boundaries. and glottis form dierent shapes of vocal tract, and can be clearly visualized with clean boundaries. Figure 4.9 has slightly poorer image quality possibly due to the penalties were selected by inspecting sagittal movies. The tongue surface is still distinguishable and tongue groove can still be identied (arrows). Figure 4.10 is adopted from [65], showing the current consensus about spatiotemporal resolution requirements for various speech tasks. This consensus is amongst speech imaging researchers in attendance at the 2014 Speech MRI Summit. Some boundaries are approximate due to the lack of gold standard, and are being rened through the development of real-time MRI technique. The plots depict the spatiotemporal resolution of four established protocols. Notably, 75 Figure 4.9: Selected articulation from coronal plane using 4TR/frame reconstruction. The tongue surface is distinguishable and tongue groove can be identied (arrows). protocol 2 is from [66], and protocol 4 is from [33]. The red star indicates the highest spatiotemporal resolution achieved by this study, 2.5 mm 3 and 25.6 ms. However, in our currently experiment design, the fast speech that can be captured is approximately the fast speech part (red shaped) in Figure 8 and Figure 9. We estimated the temporal resolution for proper imaging of that part to be approximately 100 ms to 150 ms. Therefore, the eective temporal resolution of proposed method is between 100 ms and 150 ms (red bar in Figure 4.10). 76 Figure 4.10: Current consensus about spatiotemporal resolution requirements for various speech tasks. This consensus is amongst speech imaging researchers in attendance at the 2014 Speech MRI Summit, image courtesy of Lingala [65]. The red star indicates the highest spatiotemporal resolution achieved by this study, and the red bar marks the estimated eective temporal resolution. 4.5 Discussion We have shown constrained reconstruction of dynamic 3D stack-of-spirals MRI in speech imaging application. Since there is no ground truth, we have performed prospective validation with 2D multi-slice MRI that is the best available reference. Variable density stack-of-spirals with high density in the center of k-space outperformed conventional sampling strategies. We have shown the strengths of proposed approach and the potential impact on 3D speech MRI. The proposed method has four important advantages: 1) The proposed sampling yields the most accurate results compared to conventional sequences, and the extracted temporal proles are the most similar to the reference. 2) The proposed sampling scheme has higher sampling density in the center of k-space, which is benecial for sparse reconstruction [69]. 3) The proposed sampling allows 77 exible temporal resolution due to golden angle spiral interleaves and a pseudo golden anglek z selection strategy. The optimal temporal resolution for a dynamic speech MRI scan may be unknown a priori and usually depends on the speech tasks. 4) The proposed reconstruction utilizes temporal nite dierence and spatial total variation constraints, and makes no global periodicity assumption such as temporal Fourier transform constraint or global low-rank constraint. Our experiment design has a number of limitations. The reference data were acquired from separated 2D MRI scans. Therefore, the data are not perfect ground truths due to dierences of speech repetitions and imaging parameters (RF, TE, TR). A delicate digital dynamic 3D vocal tract phantom may be a solution for the needs of validation. Our study has shown that a variable-density sampling scheme is desired, but the density and sampling order may not be optimal, and further investigations are required on more comparison studies or on the theoretical bounds of MRI sampling. Another limitation is that k z -vd makes large jumps in 3D k-space, which could cause more eddy current artifacts. Penalty parameters were manually selected, and it is more challenging when there is no suitable ground truth for quantitative analysis. We have tried to minimize the number of constraints by only using the most powerful temporal nite dierence [66], but found that temporal blurs (compression artifacts) were induced in fast motion data when sucient penalty was employed. Therefore, we added another constraint without temporal assumption, spatial total variation, to less the burden of individual penalty. We have reported the reconstructed temporal resolution up to 25.6 ms (equivalent frame rate 39 frame/sec), and approximately estimated the eective temporal resolution to be 100-150 ms. The true temporal (even spatial) resolution 78 is a more meaningful metric but very challenging to measure. In fact, high frame rate can easily be reached by employing sliding window with small step size but the result hardly delivers more informative. 4.6 Conclusion We have proposed a variable density stack-of-spirals sequence for dynamic 3D speech MRI using constrained reconstruction. We utilized registered multi-slice 2D MRI data as image quality reference for qualitative assessment. Our results demonstrated that the proposed sequence outperformed conventional schemes, and allowed exible temporal resolution. We have achieved temporal resolution as high as 25.6 ms, with an estimated eective temporal resolution 100-150 ms. 79 Chapter 5 GOCART: GOlden-angle CArtesian Randomized Time-resolved 3D MRI 5.1 Introduction Time-resolved 3D MRI is an imaging technique that enables contrast-enhanced MR angiography (CE-MRA) and dynamic contrast enhanced (DCE) MRI. CE-MRA and DCE-MRI are both dynamic and utilize the same enhancement mechanism, but they have dierent goals. CE-MRA focuses on vascular signal, where contrast agent concentrations are very high, and high spatiotemporal resolution is critical. DCE-MRI, however, focuses on tissue signals, where contrast agent concentrations are lower, change more slowly, and these subtle changes allow one to quantify pharmacokinetics. 80 Various sampling and reconstruction techniques have been proposed to address and improve the spatial versus temporal resolution trade-o in CE-MRA and DCE-MRI. Early view-sharing methods such as keyhole [115] and time-resolved imaging of contrast kinetics (TRICKS) [55] lled the missing data from adjacent time frames. Since non-Cartesian sampling is more robust to motion and ecient for dynamic imaging, TRICKS was extended to use radial projections [116] and spirals [25]. Other non-Cartesian implementations include k-space weighted image contrast (KWIC) [103], golden angle stack-of-stars [28], vastly undersampled isotropic projection reconstruction (VIPR) [7], highly constrained back projection (HYPR) [74], and stack-of-spirals [123]. Non-Cartesian sequences are often limited by gradient errors and o-resonance artifacts. For this reason, many investigators have reverted to Cartesian sequences where the phase encode (PE) order provides variable density much like non- Cartesian approaches. Such sequences include Cartesian projection reconstruction (CAPR) [40], stochastic trajectories (TWIST) [61], interleaved variable density (IVD) [117], a multi-level radial prole ordering [3], dierential subsampling with Cartesian ordering (DISCO) [101], variable-density Poisson ellipsoid [58] and an ordering that gradually improves spatial resolution [35]. Cartesian and non-Cartesian sequences have also been combined, starting with Time resolved interleaved projection sampling with 3D Cartesian Phase and Slice encoding (TRIPPS) [24], which applied rasterized radials on the PE plane, and golden angle radial phase encoding (Golden-RPE) [93], which combined radial sampling and Cartesian readouts. TRIPPS and Golden-RPE were succeeded by golden angle (GA) variants [22], variable-density radial (VDRad) [14] and golden angle spiral variants [92]. 81 Most of the aforementioned methods accelerate time-resolved MRI by undersampling in k-space, and used parallel imaging [95, 36] and/or compressed sensing [69] for reconstruction. Poission disc sampling [83] has been a popular choice for undersampling since compressed sensing was introduced to MRI [69]. Numerous algorithms have been proposed for eciently generating Poisson disc sampling patterns, including dart throwing [21], jittered sampling [16], best candidate [75], and more recent O(N) boundary sampling [26] and modied dart throwing [12]. Of relevance to this work, Lebel et al. [58] proposed Poisson ellipsoid sampling based on dart-throwing for dynamic imaging by extending the 2D Poisson disc pattern to 3D k y k z t space. This method provides excellent transform sparsity, is compatible with parallel imaging, and limits temporal redundancy. Unfortunately, it is computationally intensive, has many input variables, and is poorly suited to variable temporal resolution. In contrast, golden angle Cartesian sampling [93] provides more coherent sampling than the Poisson ellipsoid approach, but allows exibility in the specication of temporal resolution during reconstruction and allows for fast on-line generation of the phase encode order. Here we denote PD to variable density Poisson ellipsoid sampling and GA to Cartesian golden angle radial sampling. We propose a modied GA approach that combines the benets of PD and GA. We introduce a sampling probability, via pseudorandom number generation, to add additional incoherence to GA [125]. The proposed approach is termed as GOlden-angle CArtesian Randomized Time- resolved (GOCART) 3D MRI. 82 5.2 Methods 5.2.1 Data Acquisition Phase Encode Ordering We have implemented continuous data acquisition by modifying a standard 3D Cartesian spoiled gradient echo sequence, where full Cartesian sampling is used along the standard frequency encoding directionk x , and PE sampling in thek y k z plane can be freely sub-sampled and/or reordered. Poisson disc sampling has been shown to be suitable for combined parallel imaging and compressed sensing. Poisson ellipsoid [58] expands Poisson disc to k y k z t space with the constraint that samples do not coexist within an ellipsoid surrounding each sample. Variable density is achieved by subdividing the k y k z plane into a series of annuli with exponentially decreasing sampling density, and fully sampling central region with 15% of the total samples. Golden angle sampling [119] supports exible temporal resolution selection in reconstruction because it provides approximately uniform angular sampling for an arbitrary number of spokes. 3D Cartesian GA implementation [22] is applied in the k y k z plane, where Cartesian PEs are selected in order to form a close approximation to golden angle radial spokes. Based on the GA scheme, we introduce a probability of sampling (P) for each PE, a central k-space region where P always equals to 1, and a temporal window (W) within which the same PE is not repeated. Once a PE is chosen by a GA radial spoke going through it, the chance of acquisition is determined by P (P2(0, 1]), such that part of the PEs along the spoke is skipped. The logical next step is that we can acquire data from subsequent spokes by skipping some PEs in one spoke, and achieve more incoherent and uniform sampling. PE skipping is 83 disabled (P=1) within a predened central region because the center of k-space is especially important for preserving low-frequency image information. Considering that each spoke starts from (or ends in) the center of k-space, the center is naturally oversampled, and excessive k-space center sampling density needs to be avoided. The temporal window W prevents frequently repeated PEs, and the window width is dened as integer multiple of the TR. Finally, we do not acquire the corners of the k y k z plane. Figure 5.1 demonstrates the PEs for single time frames for PD, GOCART with P = 0.1, 0.4, 0.7, and 1 (equivalent to the original GA). It is worth noting that data are continually acquired and retrospectively binned into frames for reconstruction. Within each data frame, the rst PE and the last PE may be in the middle of radial spokes, which look like a discontinuity in the k y k z plane (e.g. GA with R=35 in Figure 1). Reduction factors (R) of 35 and 100 are presented for a matrix size of 256150. The P=1 central regions account for approximately 15% of PEs at each undersampling, in accordance with the setting in PD. W was empirically set to 50% of the number of PEs per time frame with 100 acceleration to limit sampling redundancy. 35 is used in our current experimental DCE setup, and 100 is the highest R in this study. As P decreases, more sampling randomization is achieved, and the central region has better coverage. 5.2.2 Constrained Reconstruction We employed a sparse SENSE image model [54, 67] with multiple constraints applied as l 1 -norm penalties. The image m is obtained by minimizing f(m) =kF u Smyk 2 2 + 1 kT t mk 1 + 2 kT s mk 1 + 3 k mk 1 (5.1) 84 R=35 R=100 PD P=0.1 P=0.4 P=0.7 GA GOCART } Figure 5.1: (a) Sampling pattern (k y ,k z ) for single time frames for PD, GOCART with P equal to 0.1, 0.4, 0.7, and 1 (GA), when the reduction factor (R) equals to 35 (top row) and 100 (bottom row). The matrix size is 256150, the P=1 central regions account for approximately 15% of PEs, and the temporal window W was empirically set to 70% of the undersampled temporal resolution to limit the PE redundancy. As P decreases, the sampling pattern appears more random, and there are fewer large gaps in k-space. whereF u is the undersampling Fourier operator, S is the sensitivity operator, y is the acquired k-space data, T t is the temporal nite dierence, T s is spatial total variation, and is the spatial wavelet transform. 's were carefully chosen to avoid compression artifacts and maintain high data consistency. Once calibrated, 's were held constant for all other brain DCE-MRI studies. 2 and 2 were set to zero in in-vivo retrospective studies and phantom simulations to calibrate and validate the GOCART method. This was done to reduce uncertainties that may be induced from multiple constraints. Multiple constraints are used in in-vivo prospective studies to lessen the burden on individual constraints and minimize 85 compression artifacts due to each transform [58]. The cost function in Equation 5.1 was minimized with an augmented Lagrangian method [97], alternating direction methods of multipliers (ADMM). 5.3 Experiments 5.3.1 Point Spread Function Analysis The sampling probability P is a key feature of GOCART and analyzing image- space point spread functions (PSF) provides an intuitive metric for optimizing this parameter. The aim is to reduce coherent side lobes, which is expected to improve compatibility with compressed sensing [69]. The PD sampling tables were always generated separately for dierent temporal resolutions, while GA and GOCART sampling tables were retrospectively segmented from single generations, respectively. The PSFs were obtained by taking inverse Fourier transforms along thek y k z dimensions. All the PSFs were normalized to the peaks, and displayed at the same linear scale. 5.3.2 Retrospective In-vivo Studies Fully sampled DCE data from ve brain tumor patients were acquired on a clinical 3T scanner (HDxt, GE Healthcare, Waukesha, WI) using an 8-channel head coil and T 1 -weighted spoiled gradient echo sequence with 15° ip angle and 6 ms TR. Each subject was screened and provided informed consent in accordance with institutional policy. The conventional DCE data had two sizes: A) three datasets with 25618610 matrix size and 24246 cm 3 FOV, and B) two datasets with 86 25615010 matrix size and 22226 cm 3 FOV. All the data had 35 time frames with approximately 10 sec temporal resolution. In our 3D undersampling sequence, the readout direction k x is always fully sampled, and constrained reconstruction is eectively recovering samples in the k y k z plane. Therefore, we fabricated k y k z t data from fully sampled k x k y t data acquired from conventional DCE protocol, and performed retrospective studies using the k y k z t data. In this way, we obtained a fully sampled reference scan with sucient temporal resolution. A single slice from each conventional DCE dataset was used to synthesize k y k z t data. The data were retrospectively downsampled using PD, GA and GOCART schemes at 20 dierent R's evenly distributed between 5 and 100. To prevent bias, we generated 50 sampling patterns for each scheme by altering the pseudorandom generator seeds (PD, GOCART) and/or initial angles (GA, GOCART). The Pharmacokinetic (PK) parameter K trans were calculated for tumor ROIs, and the agreement of K trans was assessed using concordance correlation coecients (CCCs) and Bland-Altman plots. The results of tumor and vessel regions of interest (ROI) were quantitatively assessed using normalized root-mean-squared-error (nRMSE) between the reconstructed and fully sampled reference images. 5.3.3 Phantom Simulation Studies We manually segmented post-contrast whole-brain DCE-MRI data [58] from a brain tumor patient into ve non-overlapping regions: vessels, tumor boundary, tumor core, cerebrospinal uid (CSF), and other tissue. Pharmacokinetic (PK) parameter K trans of tumor boundary and tumor core was 0.1 and 0.02 min -1 , respectively, which mimicked our in-vivo estimates. Other PK parameters, 87 including v p , v e and tissue T 1 , were given literature values for each region [104, 113]. A population-based arterial input function (AIF) [87], a two-compartment exchange model [104], and the steady state spoiled gradient signal equation, were used to generate the time intensity curves and fabricate dynamic images. We then multiplied coil sensitivities, took the Fourier transform, and added realistic noises. Coil sensitivity maps, noise covariance matrix and signal-to-noise ratio were estimated from the in-vivo data. B 1 + variation and T 2 * decay were not modeled, but could be included later. The phantom had a 256256100 matrix size and a 242419 cm 3 FOV, identical to those of the acquisition protocol for experimental prospective in-vivo data. We simulated a TR by TR acquisition with TR=6 ms for a 10-minute period (100,000 TRs). One single axial slice of size 256100 was used to synthesize k y k z t data acquired using PD, GA and GOCART at R=35 and R=100. PD has no exible temporal resolution in the reconstruction, therefore, the PE ordering of PD at R=100 was separately generated. nRMSE was also used to quantitatively compare the results. 5.3.4 Prospective In-vivo Studies Two prospective GOCART undersampled data were acquired from brain tumor patients. Informed consent was obtained from patients prior to MRI scans. Our Institutional Review Board approved these studies. Patient data were acquired using an 8-channel head coil and an axial T 1 - weighted spoiled gradient echo sequence with 15° ip angle and 6 ms TR. The data had 256256100 matrix size and 222219 cm 3 FOV. Prior to each scan, T 1 maps were acquired using a variable ip angle DESPOT1 method [20]. Multiple 88 R=35 R=100 PD P=0.1 P=0.4 P=0.7 GA } GOCART Figure 5.2: Point spread functions (PSF) for the sampling patterns in Figure 5.1. High acceleration rates have more pronounced side lobes than lower rates. GA (i.e., GOCART with P=1.0) has undersampling artifacts concentrated in clearly visible streaking patterns. The streaking side lobes are eectively attenuated by reducing the sampling probability (P). As P approaches 0, the PSFs closely resemble those obtained with PD. constraints (Equation 5.1) with empirical values were employed [58]. After image reconstruction, a Patlak model [114] was used to estimate PK parameters. The AIF was generated from a population-averaged analytic equation [87]. 5.4 Results Figure 5.2 shows the PSFs of the sampling patterns in Figure 5.1. In general R=100 has higher side lobes than R=35. PD has the lowest amplitude lobes while GA (which is GOCART with P=1) has clearly visible coherent streaks. With GOCART, this coherent streaking diminishes as P decreases (arrows), essentially converging to PD as P approaches 0. All the PSFs were normalized to the peaks, and the minor visible peak dierences are due to interpolation. 89 Figure 5.3(a,b) illustrates the mean nRMSE of tumor and vessel ROIs, respectively. nRMSE was averaged from 50 datasets over ve patients, and the equivalent results were connected into contour lines (isolines). The two axes are R and P ranging from 5 to 100 and 0 to 1, respectively. Here we plotted PD as P=0 for the convenience of comparison. Reconstruction error in large, multi-voxel tumors and small vessels was insensitive to sampling schemes with acceleration rates below 20. GOCART is advantageous over PD or GA because it has lower nRMSE at higher acceleration rates (arrows). Tumors with structures and smooth signal variation were most accurately reconstructed with the moderate randomization obtained with P between 0.3 and 0.5; small, tiny vessels with rapidly changing signal were most accurately recuperated with additional randomization using 0.1<P<0.3. As such, P=0.3 was selected as a compromise and was used for subsequent experiments. The mean and standard deviation comparison of PD, GA and GOCART (P=0.3) with respect to R are shown for tumor and vessel ROIs in Figure 3(c,d), respectively. On the tumor, GOCART outperforms PD and GA with R>35; on the vessel, GOCART overtakes GA in all cases, and outperforms PD with R>75. The standard deviations are small (<3%) compared to the means in all cases. Figure 5.4 contains a single time frame during peak enhancement from a meningioma patient. The tumor and vessel ROIs are shown in the reference image. These results were reconstructed from the same reference data but retrospectively undersampled with PD, GA and GOCART (P=0.3) at R=35 and 100 (labeled). No residual aliasing artifacts are visually apparent and no reduction in eective resolution is seen, as evidenced by images on the rst row. Dierence images between the reference and reconstructed undersampled images (intensity scaled by 90 0 20 40 60 80 100 0 0.05 0.1 0.15 0.2 0.25 0 20 40 60 80 100 0 0.05 0.1 0.15 0.2 0.25 R R nRMSE nRMSE PD 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 GA 20 40 60 80 100 R P PD 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 GA 20 40 60 80 100 R P { GOCART (a) (b) (c) (d) PD GA GOCART PD GA GOCART Figure 5.3: Mean nRMSE of (a) tumor ROI, and (b) vessel ROI with contours outlining equivalent nRMSE. The two axes are reduction factor R and sampling probability (P) ranging from 5 to 100 and 0 to 1, respectively. PD is plotted as P=0 for the convenience of comparison. GOCART is advantageous with large R (arrows). The mean and standard deviation comparison of PD, GA and GOCART (P=0.3) with respect to R are shown for (c) tumor ROI and (d) vessel ROI. On the tumor, GOCART outperforms PD and GA with R>35; on the vessel, GOCART beats GA in all cases, and outperforms PD with R>75. The standard deviations are small (<3%) compared with the means. 91 Reference PD 35x PD 100x GOCART 100x GOCART 35x GA 35x GA 100x Figure 5.4: A single time frame during peak enhancement from a meningioma patient. Tumor and vessel regions are identied in the fully sampled reference image. Results of PD, GA and GOCART (P=0.3) are shown at R equals 35 and 100 (as labeled). The second row displays the dierence images between the reference and reconstructed undersampled images (intensity scaled by 3). GA results in higher errors in vessels than PD and GOCART especially when R=100 (upper arrows). All these three methods performed similarly well with the tumor, except PD yielded slightly higher error when R=100 (lower arrows). 3) indicate that the errors mainly occur in vessels with abrupt spatial/temporal signal changes. GA results in higher error in vessels than PD or GOCART, especially when R=100 (upper arrows). All three methods performed well within the tumor, but GOCART yielded slightly lower error at R=100 than the other methods (lower arrows), as predicted in Figure 5.3. Figure 5.5 demonstrates CCCs of the same dataset changes as a function of R. As R increases from 5 to 100, CCCs of PD, GA and GOCART decrease from above 0.85 to below 0.55. All three techniques perform similarly when R<50, and GOCART outperform PD and GA when R>50. Figure 5.5(b-g) present Bland- Altman plots with horizontal axis presenting true Ktrans values, and vertical axis presenting Ktrans dierences. The blue line in the middle is the average value of the dierences, and the two red lines provide 95% condence intervals (within average 1.96 standard deviation of the dierences). CCCs at R=100 in general have larger 92 R 0 20 40 60 80 100 CCCs 0.45 0.5 0.55 0.6 0.65 0.7 0.75 0.8 0.85 0.9 PD GA GOCART True K trans 0 0.1 0.2 0.3 Difference -0.2 -0.15 -0.1 -0.05 0 0.05 0.1 0.15 0.2 True K trans 0 0.1 0.2 0.3 Difference -0.2 -0.15 -0.1 -0.05 0 0.05 0.1 0.15 0.2 True K trans 0 0.1 0.2 0.3 -0.2 -0.15 -0.1 -0.05 0 0.05 0.1 0.15 0.2 True K trans 0 0.1 0.2 0.3 -0.2 -0.15 -0.1 -0.05 0 0.05 0.1 0.15 0.2 True K trans 0 0.1 0.2 0.3 -0.2 -0.15 -0.1 -0.05 0 0.05 0.1 0.15 0.2 True K trans 0 0.1 0.2 0.3 -0.2 -0.15 -0.1 -0.05 0 0.05 0.1 0.15 0.2 (b) PD 35x (d) GA 35x (f) GOCART 35x (c) PD 100x (e) GA 100x (g) GOCART 100x (a) CCC CCC: 0.689 CCC: 0.461 CCC: 0.677 CCC: 0.524 CCC: 0.673 CCC: 0.542 Figure 5.5: (a) CCCs of the same dataset changes as a function of R. All three techniques perform similarly when R<50, and GOCART outperform PD and GA when R>50. (b-g) Bland-Altman plots and 95% condence intervals within two red lines. CCCs at R=100 in general have larger bias and variance than CCCs at R=35. PD, GA and GOCART produce similar condence intervals at R=35, and PD results in larger variance but smaller bias than GA and GOCART at R=100. bias and variance than CCCs at R=35. PD, GA and GOCART produce similar condence intervals at R=35, and PD resulted in larger variance but smaller bias than GA and GOCART at R=100. Figure 5.6 shows the time-intensity curves (TIC) of reconstructed results of the data in Figure 5.4. The TICs of R=35 and R=100 were averaged from tumor and vessel ROIs (as labeled) to avoid bias of choosing single voxel. The signal peak regions are critical for PK parameter estimation, and are enlarged in each plot. GOCART yielded peak signals that are closest to reference, except in tumor with 93 0 0.45 Time (sec) Signal Intensity (a.u.) Ref PD GA GOCART Signal Intensity (a.u.) Signal Intensity (a.u.) Signal Intensity (a.u.) Tumor 35× Tumor 100× Vessel 100× Vessel 35× Time (sec) Time (sec) Time (sec) 0.40 0.35 0.30 0.25 0.25 0.45 0.40 0.35 0.30 0.25 0.25 0.60 0.50 0.40 0.30 0.20 0.10 0.60 0.50 0.40 0.30 0.20 0.10 50 100 150 200 250 300 350 0 50 100 150 200 250 300 350 0 50 100 150 200 250 300 350 0 50 100 150 200 250 300 350 Figure 5.6: The time-intensity curves (TIC) of reconstructed results of the data in Figure 4. The TICs of reference, R=35 and R=100 were averaged from tumor and vessel ROI (as labeled). Data delity is maintained better at 35times compared to 100times. GOCART yielded peak signals that are closest to reference, except in tumor with R=100 (Figure 5.6 lower left subgure), GOCART peak is slightly (0.3%) lower than PD peak. Overall, GOCART provided the highest temporal delity in the comparison group. R=100 (Figure 5.6 lower left subgure), where GOCART peak is slightly (0.3%) lower than PD peak. Similar results were observed on all the other datasets, which are not shown. Overall, GOCART provided the highest temporal delity in the comparison group. Figure 5.7 demonstrates the phantom experiment outcomes. Figure 5.7(a) is the manually segmented phantom, and the time-varying vessel, tumor boundary and tumor core are labeled. Figure 5.7(b,c) demonstrates the reconstructed TICs of these three ROIs. The ground truth (reference) is shown with a smooth blue line because it has 6 ms temporal resolution, and the temporal resolution of R=35 and R=100 were 4.39 sec and 1.54 sec, respectively. The vessel nRMSE 94 (b) R=35 Time (sec) 0 100 200 300 400 500 600 Signal Intensity (a.u.) 0 0.5 1 1.5 2 2.5 (c) R=100 Time (sec) 0 100 200 300 400 500 600 Signal Intensity (a.u.) 0 0.5 1 1.5 2 2.5 (1) (2) (3) (1) (2) (3) (1) vessel (2) tumor boundary (3) tumor core (a) Ref PD GA GOCART Ref *PD GA GOCART Figure 5.7: (a) Manual phantom segmentation results. The vessel, tumor boundary and tumor core were time-varying in the simulation. (b, c) Reference and reconstructed TICs of the three ROIs at R=35 and R=100, respectively. The PD sampling at R=100 was regenerated independently. The reconstructed signal peak regions of vessel and tumor boundary are enlarged for visualization. Although worse than PD at R=35, GOCART beat GA in both accelerations, and GOCART at R=100 recuperated the most accurate peak signal shapes and peak signal arrival times. 95 (b) (a) Figure 5.8: Representative results with 4 temporal phases obtained on (a) a patient with a left posterior fossa meningioma and (b) a patient with a right- sided acoustic schwannoma. The tumors are arrowed in the last gures where the tumor enhancement was maximal. The temporal resolution was 4.5 sec and the net acceleration was 35. of PD, GA and GOCART at R=35 were 0.90%, 1.16% and 0.89%, respectively, and at R=100 were 3.91%, 2.08% and 0.85%, respectively. There was no apparent dierence in reconstructed images of tumors, where the signals were slowly varying. The reconstructed signal peaks of vessel and tumor boundary are enlarged for visualization. Notice that the corresponding enlarged regions have exactly the same display rage for R=35 and R=100. GOCART most accurately recuperated the varying signals, especially when R was large and the signals had abrupt changes. The DCE-MRI was reconstructed with a net acceleration factor of 35 and temporal resolution of 4.5 sec, the same as current clinical protocol. Figure 5.8 shows representative GOCART results with 4 temporal phases obtained on (a) a left posterior fossa meningioma patient and (b) a right-sided acoustic schwannoma patient, respectively. The tumors are arrowed in the last gures where the tumor 96 0 0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.16 0.18 0.2 (a) (b) Figure 5.9: K trans maps calculated from (a) a patient with a left posterior fossa meningioma and (b) a patient with a right-sided acoustic schwannoma (raw images in Figure 5.8). The K trans maps clearly show signicantly higher permeability and leakiness of the blood brain barrier in the tumor tissues (arrows). No leakage is observed in the healthy parenchyma. enhancement was maximal. The temporal evolutions of these two tumors are provided in two supplementary videos. Figure 5.9(a,b) present permeability K trans maps calculated from GOCART prospective results from the left posterior fossa meningioma patient and right- sided acoustic schwannoma patient, respectively (raw images in Figure 5.8). The K trans maps appear as expected: higher blood brain barrier is observed in the tumor tissues (arrows), and acoustic schwannoma has signicant BBB permeability. No leakage is observed in the healthy parenchyma. The CE-MRA was reconstructed with net acceleration factor of 100 to demonstrate the exibly temporal resolution of GOCART. Our retrospective in- vivo studies (Figure 5.6) and phantom simulations (Figure 5.7) have indicated that 97 (a) (b) Figure 5.10: Maximum intensity projection (MIP) of acoustic schwannoma patient on (a) sagittal and (b) axial planes. The image series was reconstructed with net acceleration factor of 100 to demonstrate the exibly temporal resolution of GOCART. Five reprehensive time frames from 23 sec, 33 sec, 41 sec 86 sec and 131 sec illustrate the dynamic pass of the contrast agent in the intracranial vessels. The acoustic schwannoma can be clearly visualized in the axial MIP (arrows). GOCART is most advantageous at highly accelerated vascular imaging. Figure 5.10 demonstrates maximum intensity projection (MIP) of right-sided acoustic schwannoma patient on (a) sagittal and (b) axial planes. Four reprehensive time frames from 33 sec, 41 sec 86 sec and 131 sec illustrate the dynamic pass of the contrast agent in the intracranial vessels. The acoustic schwannoma can be clearly visualized in the axial MIP (arrows). Rotating MIP maps as time progress are provided in supplementary videos. Figure 5.11 contains axial post-gadolinium T 1 -weighted images of the right- sided acoustic schwannoma, from inferior to superior. These images are the last time frame reconstructed from prospective GOCART undersampled data. The tumor has been fully enhanced by the contrast bolus. The high resolution GOCART imaging protocol evidently depicts the shape and ne details of the acoustic schwanoma, as pointed out by the arrows. 98 Figure 5.11: Axial post-gadolinium T 1 -weighted images of the right-sided acoustic schwannoma, from inferior to superior. The tumor has been fully enhanced by the contrast bolus. The arrows point out the evident depiction of the acoustic schwanoma by our high resolution GOCART imaging protocol. 5.5 Discussion GOCART is a novel sparse sampling scheme for time-resolved 3D Cartesian MRI. GOCART combines the best components of GA and PD: GOCART samples the center of k-space very frequently, distributes PEs uniformly throughout the sampling domain (via the ergodic nature of GA), and locally randomizes sampling locations. We have performed retrospective validation on fully sampled in-vivo data and phantom simulation, where the ground truth can be dened for image quality assessment. We have demonstrated that GOCART is favorable for 3D DCE-MRI using constrained reconstruction. We have shown the strengths of GOCART and its potential impact on brain DCE MRI. GOCART has four important advantages: 1) GOCART allows fast, exible, and reproducible case-dependent view-order generation. It required 0.097 sec for GOCART and 31 sec for PD to generate sampling patterns for a 10 min 99 DCE scan on a single 2.5 GHz CPU. GOCART is therefore suitable to real- time selection of imaging parameters such as the FOV and spatial resolution. With a suitable pseudo-random number generator, the sampling pattern can be readily re-generated during image reconstruction, eliminating storage and transfer of an external look-up table. 2) GOCART maintains or improves reconstruction accuracy and temporal resolution relative to alternative methods. In no case did GOCART overly degrade reconstructed image quality and in some cases, specically very high acceleration rates, outperformed PD and GA. These have been demonstrated in both retrospective in-vivo studies and phantom simulation studies. 3) As it is based on a radial acquisition, GOCART has intrinsic variable density sampling. The underlying sampling density is inversely proportional to k r , the radial distance to the k-space center. The center of k-space has higher sampling density, which is benecial for sparse reconstruction [69, 58]. 4) GOCART inherits the exible temporal resolution intrinsic to GA. The optimal temporal resolution for a dynamic MRI scan is often unknown a priori and depends on the PK parameters of the tissue. The phantom simulation also suggested a variable temporal resolution reconstruction that assigns a higher temporal resolution to signicantly changing signals. This is possible with GOCART. 5) Although not shown in this work, GOCART is compatible with multi-level sampling patterns that leverage asymptotic structure in CS and are tailored to the object [1]. Specically, a k r -dependent sampling probability mask can be employed, instead of a uniform sampling probability. GOCART bears similarities and thus shares some of the aforementioned advantages with recently proposed methods such as TWIST [61], IVD [117], DISCO [101] and VDRad [14], which have been used in other contrast enhanced 100 applications. For instance, 1/k r variable density undersampling is similar to IVD and VDRad, and predening a P=1 central region is similar to TWIST and DISCO. Besides, TWIST, IVD, DISCO and VDRad combine adjacent frames to form a fully sampled footprint; GOCART can generate similar footprint, although fully sampling is not always guaranteed. GOCART has enhanced variable temporal resolution ability from more uniform samples. A wider range of temporal resolution selections facilitates exploring optimal experiment designs for dierent PK performances and capturing the signal peaks, perhaps making GOCART better suited for brain DCE MRI. The classical temporal nite dierence was chosen because of its simplicity (convex optimization, single tuning parameter, appropriateness for DCE-MRI). We agree that data-driven reconstruction algorithms such as k-t SLR [63], and blind compressed sensing [64] may oer improved reconstruction performance. However, these were not considered in this study, as these schemes are associated with increased complexity (non-convex optimization, non-trivial tuning of free parameters, convergence not guaranteed and will depend on the sampling pattern). Our experiment design has a number of limitations. In the retrospective in- vivo studies, the reference data with approximately 10 seconds temporal resolution was used as snapshot, which is not ideal especially for fast varying signals. In addition, the retrospective in-vivo data suered from some in ow artifacts. In the phantom simulation studies, data at 6 ms temporal resolution were generated and scanner data acquisition was simulated. However, we could have achieved such high acceleration factors because the phantom lacked sucient anatomical detail. Furthermore, the calibrated 's were dierent in retrospective in-vivo studies and phantom simulation studies, and we empirically chose 's from retrospective in- 101 vivo studies for prospective in-vivo studies. Neither of the experiments was able to test the eect of eddy currents, which may be more noticeable as the sampling probability decreases. Acquiring radial spokes from the edge of k-space toward the center rather than center out may mitigate this eect. One potential improvement may be achieved by taking full advantage of head coil geometry with parallel imaging. The 8 elements of our head coil provide maximum diversity in the axial plane, which was the phase encoding plane in the prospective whole-brain DCE-MRI experiments. Although not specically done in this work, it is possible to optimize GOCART for parallel imaging by incorporating structured undersampling along spokes, as seen in IVD [117] or DISCO [101]. This is also true for PD and GA. GOCART outperformed PD and GA in both retrospective in-vivo studies and phantom simulation studies. However, PD generally outperformed GA in the former studies but was overtaken in the later ones. This is likely due to the fact that GA samples the center of k-space more frequently within temporal windows. 5.6 Conclusion We have compared GOCART with PD and GA through retrospective analysis and phantom simulation, and have shown prospective in-vivo clinical results. Our results demonstrate that GOCART has combined the advantages of both PD and GA, and outperforms both techniques in highly undersampled cases. We provide the source code for GOCART on http://mrel.usc.edu/sharing/GOCART.zip, which includes both Matlab and C implementations, and has additional optional features such as golden angle adjustment for anisotropic FOV and angular perturbation that were not used in this study. 102 Chapter 6 Conclusion 6.1 Summary MRI is a powerful medical imaging modality that is valuable in both clinical examination and science discovery. The most critical drawback is the imaging speed, and lots of important information requires high temporal resolution. I have made contributions to accelerated the conventional MRI for both speech imaging and tumor imaging applications. I used combinations of dierent techniques to achieve sucient temporal resolution while maintaining spatial coverage and resolution. This dissertation work has illustrated the following three contributions. Dynamic 3D Visualization of Speech The dissertation work presents a novel method for the creation of 3D dynamic movies of vocal tract shaping based on the acquisition of 2D dynamic data from parallel slices and temporal alignment of the image sequences using audio information. Multiple sagittal 2D real-time movies with synchronized audio recordings are acquired. Audio data are aligned using MFCCs extracted from 103 windowed intervals of the speech signal. Sagittal image sequences acquired from all slices are then aligned using DTW. The aligned image sequences enable dynamic 3D visualization by creating synthesized movies of the moving airway in the coronal planes, visualizing desired tissue surfaces and tube-shaped vocal tract airway after manual segmentation of targeted articulators and smoothing. Dynamic 3D Speech MIR The dissertation work proposes a variable density 3D stack-of-spirals sequence for dynamic 3D speech MRI. Constrained reconstruction with temporal nite dierence and spatial total variation were used for image reconstruction from high undersampled data. Manually registered multi-slice 2D MRI data of same stimuli acquired from the same subject were used as image quality reference for qualitative assessment. The proposed sequence outperformed conventional undersampling schemes, and allowed exible temporal resolution. Temporal resolution as high as 25.6 ms has been achieved, with an estimated eective temporal resolution 100-150 ms. Dynamic 3D DCE MRI using GOCART The dissertation work develops GOCART 3D Cartesian sampling scheme. GOCART is based on GA Cartesian sampling, but randomly samples the k y k z phase encode locations along each radial spoke. GOCART was evaluated in conjunction with constrained reconstruction of retrospectively and prospectively undersampled in-vivo dynamic contrast enhanced (DCE) MRI data and simulated phantom data. Results from GOCART were equal to or superior to those with Poisson disc or GA sampling schemes. Typical GOCART sampling tables were 104 generated in <100 ms. GOCART is a practical and ecient sampling scheme for highly accelerated DCE-MRI and is well suited to modern reconstruction methods such as parallel imaging and compressed sensing. 6.2 Future Work The proposed fast and exible dynamic 3D MRI techniques have been utilized for speech and brain tumor DCE MRI, and future work can be done to further improve the sampling and reconstruction techniques, and to enlarge the application scope, including further improving spatiotemporal resolution, reconstruction time, designs for better validation methodology, and clinical translation. 3D Speech using Cone Trajectory Stack-of-spiral is a straightforward extension of spiral trajectory and has been utilized in the dissertation work. 3D cone trajectory [38] is another variant of spiral in 3D, where the k z gradient is on during spiral readouts. Cone trajectory has better motion properties than stack-of-spirals, because each readout starts from the center of k-space. In addition, cone trajectory has shorter echo time than stack-of-spirals, as there is no k z phase encoding in cones. Shortened echo time improves sampling eciency and reduces o-resonance artifacts. DCE Model-Constrained Reconstruction Since the reconstructed DCE MRI images are tted to model for pharmacokinetic parameter maps, which are usually the end product of DCE MRI, the prior information from the model can be utilized for higher acceleration. One way is to incorporate the model into forward and backward models to directly connect 105 the k-space data and pharmacokinetic parameters, and apply constraints on the parameter images [37]. Another way is to train an over-complete dictionary from a wide range of possible parameter combinations, and reconstruct the DCE dynamics proles as sparse representation using the dictionary [62]. Reducing Reconstruction Time Currently the reconstruction time is approximately 8 hours for a high-resolution brain DCE data set (35 acceleration), and the time is prolonged for higher acceleration. Keeping the constrained reconstruction framework, the computation cost can be eective reduced by applying array compression [13, 124] to reduce the number of receiver channels. Alternatively, the computation power can be greatly enhanced by using graphics processing units (GPUs) [110]. Dynamic 3D Phantom Validation Phantom experiment is a very useful way for method validation because it provides ground truth. Digital phantoms are inexpensive and exible, with arbitrary temporal and spatial resolution [130, 126]. However, digital phantoms are usually less realistic than in-vivo data, prone to higher acceleration than reality. Physical phantom such as [23] provides controllable and reproducible arterial input function and perfusion, and helps to assess system-related phenomenons such as eddy currents, which is not possible with digital phantoms. Clinical Translations Ultimately, the goal of DCE-MRI development is to translate the technique into preclinical and clinical settings. Although some eorts have been made 106 such as protocol settings on the scanner, a number of challenges remain. The rst is selection of penalty parameters. Empirical settings may be acceptable for feasibility studies, wide acceptance requires automated unbiased parameter selection. [62] is an exciting option because it is parameter-free. Another challenge is the reconstruction time. For practical reason, the reconstruction time should be less than an hour. Patient population examinations are essential for clinical translation, and fortunately, the established collaborations at USC ensure success of accomplishing the objective. 107 Bibliography [1] B. Adcock, A. C. Hansen, C. Poon, and B. Roman. Breaking the coherence barrier: A new theory for compressed sensing. ArXiv e-prints, 1302:561, Feb. 2013. [2] C. B. Ahn, J. H. Kim, and Z. H. Cho. High-speed spiral-scan echo planar NMR imaging-I. IEEE transactions on medical imaging, 5(1):2{7, 1986. [3] M. Akakaya, T. A. Basha, R. H. Chan, H. Rayatzadeh, K. V. Kissinger, B. Goddu, L. A. Goepfert, W. J. Manning, and R. Nezafat. Accelerated contrast-enhanced whole-heart coronary MRI using low- dimensional-structure self-learning and thresholding. Magnetic resonance in medicine: ocial journal of the Society of Magnetic Resonance in Medicine / Society of Magnetic Resonance in Medicine, 67(5):1434{1443, May 2012. [4] H. N. S. Association. Intro. to synchros, servos, and gyros, 1998. [5] P. Badin, G. Bailly, L. Revret, M. Baciu, C. Segebarth, and C. Savariaux. Three-dimensional linear articulatory modeling of tongue, lips and face, based on MRI and video images. Journal of Phonetics, 30(3):533{553, 2002. [6] T. Baer, J. C. Gore, L. C. Gracco, and P. W. Nye. Analysis of vocal tract shape and dimensions using magnetic resonance imaging: vowels. J. Acoust. Soc. Am., 90(2 Pt 1):799{828, Aug. 1991. [7] A. V. Barger, W. F. Block, Y. Toropov, T. M. Grist, and C. A. Mistretta. Time-resolved contrast-enhanced imaging with isotropic resolution and broad coverage using an undersampled 3d projection trajectory. Magnetic resonance in medicine: ocial journal of the Society of Magnetic Resonance in Medicine / Society of Magnetic Resonance in Medicine, 48(2):297{305, Aug. 2002. [8] M. Bergamino, L. Bonzano, F. Levrero, G. L. Mancardi, and L. Roccatagliata. A review of technical aspects of T1-weighted dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) in human brain tumors. Physica medica: PM: an international journal devoted to the 108 applications of physics to medicine and biology: ocial journal of the Italian Association of Biomedical Physics (AIFB), May 2014. [9] E. Bresch, Y.-C. Kim, K. S. Nayak, D. Dyrd, and S. S. Narayanan. Seeing speech: Capturing vocal tract shaping using real-time magnetic resonance imaging. IEEE Signal Process Mag, 25(3):123{132, 2008. [10] E. Bresch, J. Nielsen, K. Nayak, and S. Narayanan. Synchronized and noise- robust audio recordings during realtime magnetic resonance imaging scans. J. Acoust. Soc. Am., 120(4):1791{1794, Oct. 2006. [11] E. Bresch, D. Riggs, L. M. Goldstein, D. Byrd, S. Lee, and S. S. Narayanan. An analysis of vocal tract shaping in English sibilant fricatives using real- time magnetic resonance imaging. INTERSPEECH 2008, pages 2823{2826, 2008. [12] R. Bridson. Fast Poisson Disk Sampling in Arbitrary Dimensions. In ACM SIGGRAPH 2007 Sketches, SIGGRAPH '07, New York, NY, USA, 2007. ACM. [13] M. Buehrer, K. P. Pruessmann, P. Boesiger, and S. Kozerke. Array compression for MRI with large coil arrays. Magnetic Resonance in Medicine, 57(6):1131{1139, June 2007. [14] J. Y. Cheng, M. Uecker, M. T. Alley, S. S. Vasanawala, J. M. Pauly, and M. Lustig. Free-Breathing Pediatric Imaging with Nonrigid Motion Correction and Parallel Imaging. ISMRM, page 312, 2013. [15] S. Chu, E. Keogh, D. Hart, M. Pazzani, and Michael. Iterative Deepening Dynamic Time Warping for Time Series. In In Proc 2 nd SIAM International Conference on Data Mining, 2002. [16] R. L. Cook. Stochastic Sampling in Computer Graphics. ACM Trans. Graph., 5(1):51{72, Jan. 1986. [17] S. P. Cramer and H. B. W. Larsson. Accurate determination of blood- brain barrier permeability using dynamic contrast-enhanced T1-weighted MRI: a simulation and in vivo study on healthy subjects and multiple sclerosis patients. Journal of Cerebral Blood Flow and Metabolism: Ocial Journal of the International Society of Cerebral Blood Flow and Metabolism, 34(10):1655{1665, Oct. 2014. [18] P. Delattre and D. C. Freeman. A dialect study of American r's by x-ray motion picture. Linguistics, 6(44):29{68, 1968. 109 [19] D. Demolin, T. Metens, and A. Soquet. Real time MRI and articulatory coordinates in vowels speech production. Proc. Speech Production Seminar, pages 86{93, 2000. [20] S. C. L. Deoni. High-resolution T1 mapping of the brain at 3T with driven equilibrium single pulse observation of T1 with high-speed incorporation of RF eld inhomogeneities (DESPOT1-HIFI). Journal of magnetic resonance imaging: JMRI, 26(4):1106{1111, Oct. 2007. [21] M. A. Z. Dipp and E. H. Wold. Antialiasing Through Stochastic Sampling. In Proceedings of the 12th Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH '85, pages 69{78, New York, NY, USA, 1985. ACM. [22] M. Doneva, C. Stehning, K. Nehrke, and P. Brnert. Improving Scan Eciency of Respiratory Gated Imaging Using Compressed Sensing with 3D Cartesian Golden Angle Sampling. ISMRM, page 641, 2011. [23] B. Driscoll, H. Keller, and C. Coolens. Development of a dynamic ow imaging phantom for dynamic contrast-enhanced CT. Medical physics, 38(8):4866{4880, Aug. 2011. [24] J. Du. Contrast-enhanced MR angiography using time resolved interleaved projection sampling with three-dimensional Cartesian phase and slice encoding (TRIPPS). Magnetic resonance in medicine: ocial journal of the Society of Magnetic Resonance in Medicine / Society of Magnetic Resonance in Medicine, 61(4):918{924, Apr. 2009. [25] J. Du and M. Bydder. High-resolution time-resolved contrast-enhanced MR abdominal and pulmonary angiography using a spiral-TRICKS sequence. Magnetic resonance in medicine: ocial journal of the Society of Magnetic Resonance in Medicine / Society of Magnetic Resonance in Medicine, 58(3):631{635, Sept. 2007. [26] D. Dunbar and G. Humphreys. A Spatial Data Structure for Fast Poisson- disk Sample Generation. In ACM SIGGRAPH 2006 Papers, SIGGRAPH '06, pages 503{508, New York, NY, USA, 2006. ACM. [27] O. Engwall. Combining MRI, EMA and EPG measurements in a three- dimensional tongue model. Speech Communication, 41(2-3):303{329, 2003. [28] L. Feng, R. Grimm, K. T. Block, H. Chandarana, S. Kim, J. Xu, L. Axel, D. K. Sodickson, and R. Otazo. Golden-angle radial sparse parallel MRI: combination of compressed sensing, parallel imaging, and golden-angle radial sampling for fast and exible dynamic volumetric MRI. Magnetic Resonance 110 in Medicine: Ocial Journal of the Society of Magnetic Resonance in Medicine / Society of Magnetic Resonance in Medicine, 72(3):707{717, Sept. 2014. [29] W. T. Fitch. The evolution of speech: a comparative review. Trends in Cognitive Sciences, 4(7):258{267, July 2000. [30] A. K. Foldvik, U. Kristiansen, and J. Kvaerness. A time-evolving three-dimensional vocal tract model by means of magnetic resonance imaging (MRI). Third European Conference on Speech Communication and Technology, 1993. [31] E. M. R. Forum. Ch21. Facts and Figure. Magnetic Resonance, a critical peer-reviewed introduction, 2015. [32] M. Fu, J. L. Holtrop, J. L. Perry, D. P. Kuehn, Z.-P. Liang, and B. P. Sutton. High-resolution full-vocal-tract 3d dynamic speech imaging. ISMRM, page 569, 2015. [33] M. Fu, B. Zhao, C. Carignan, R. K. Shosted, J. L. Perry, D. P. Kuehn, Z.- P. Liang, and B. P. Sutton. High-resolution dynamic speech imaging with joint low-rank and sparsity constraints. Magnetic Resonance in Medicine: Ocial Journal of the Society of Magnetic Resonance in Medicine / Society of Magnetic Resonance in Medicine, 73(5):1820{1832, May 2015. [34] C. L. Furia, L. P. Kowalski, M. R. Latorre, E. C. Angelis, N. M. Martins, A. P. Barros, and K. C. Ribeiro. Speech intelligibility after glossectomy and speech rehabilitation. Archives of Otolaryngology{Head & Neck Surgery, 127(7):877{883, July 2001. [35] N. Gdaniec, H. Eggers, P. Brnert, M. Doneva, and A. Mertins. Robust abdominal imaging with incomplete breath-holds. Magnetic resonance in medicine: ocial journal of the Society of Magnetic Resonance in Medicine / Society of Magnetic Resonance in Medicine, 71(5):1733{1742, May 2014. [36] M. A. Griswold, P. M. Jakob, R. M. Heidemann, M. Nittka, V. Jellus, J. Wang, B. Kiefer, and A. Haase. Generalized autocalibrating partially parallel acquisitions (GRAPPA). Magnetic Resonance in Medicine: Ocial Journal of the Society of Magnetic Resonance in Medicine / Society of Magnetic Resonance in Medicine, 47(6):1202{1210, June 2002. [37] Y. Guo, Y. Zhu, S. G. Lingala, R. M. Lebel, and K. S. Nayak. Highly Accelerated Brain DCE MRI with Direct Estimation of Pharmacokinetic Parameter Maps. ISMRM, page 0574, 2015. 111 [38] P. T. Gurney, B. A. Hargreaves, and D. G. Nishimura. Design and analysis of a practical 3d cones trajectory. Magnetic Resonance in Medicine, 55(3):575{ 582, Mar. 2006. [39] A. Haase, J. Frahm, D. Matthaei, W. Hnicke, and K.-D. Merboldt. FLASH imaging: rapid NMR imaging using low ip-angle pulses. 1986. Journal of Magnetic Resonance (San Diego, Calif.: 1997), 213(2):533{541, Dec. 2011. [40] C. R. Haider, H. H. Hu, N. G. Campeau, J. Huston, 3rd, and S. J. Riederer. 3d high temporal and spatial resolution contrast-enhanced MR angiography of the whole brain. Magnetic resonance in medicine: ocial journal of the Society of Magnetic Resonance in Medicine / Society of Magnetic Resonance in Medicine, 60(3):749{760, Sept. 2008. [41] A. K. Heye, R. D. Culling, M. D. C. Valds Hernndez, M. J. Thrippleton, and J. M. Wardlaw. Assessment of blood-brain barrier disruption using dynamic contrast-enhanced MRI. A systematic review. NeuroImage. Clinical, 6:262{ 274, 2014. [42] W. Hollingworth, C. J. Todd, M. I. Bell, Q. Arafat, S. Girling, K. R. Karia, and A. K. Dixon. The diagnostic and therapeutic impact of MRI: an observational multi-centre study. Clinical Radiology, 55(11):825{831, Nov. 2000. [43] X. Huang, A. Acero, and H.-W. Hon. Spoken Language Processing: A Guide to Theory, Algorithm and System Development. Prentice Hall, Upper Saddle River, NJ, 1 edition edition, May 2001. [44] P. Irarrazabal and D. G. Nishimura. Fast three dimensional magnetic resonance imaging. Magnetic Resonance in Medicine, 33(5):656{662, May 1995. [45] F. Itakura. Minimum prediction residual principle applied to speech recognition. IEEE Transactions on Acoustics, Speech and Signal Processing, 23(1):67{72, Feb. 1975. [46] J. I. Jackson, C. H. Meyer, D. G. Nishimura, and A. Macovski. Selection of a convolution function for Fourier inversion using gridding [computerised tomography application]. IEEE transactions on medical imaging, 10(3):473{ 478, 1991. [47] E. Keogh and C. A. Ratanamahatana. Exact Indexing of Dynamic Time Warping. Knowl. Inf. Syst., 7(3):358{386, Mar. 2005. 112 [48] A. B. Kerr, J. M. Pauly, C. H. Meyer, and D. G. Nishimura. New strategies in spiral mr uoroscopy. SMR, page 99, 1999. [49] L. E. Kershaw and H.-L. M. Cheng. Temporal resolution and SNR requirements for accurate DCE-MRI data analysis using the AATH model. Magnetic Resonance in Medicine, 64(6):1772{1780, Dec. 2010. [50] S. G. Kim, L. Feng, R. Grimm, M. Freed, K. T. Block, D. K. Sodickson, L. Moy, and R. Otazo. In uence of temporal regularization and radial undersampling factor on compressed sensing reconstruction in dynamic contrast enhanced MRI of the breast. Journal of magnetic resonance imaging: JMRI, June 2015. [51] Y.-C. Kim, S. S. Narayanan, and K. S. Nayak. Accelerated 3d MRI of vocal tract shaping using compressed sensing and parallel imaging. Proc. ICASSP, pages 389{392, 2009. [52] Y.-C. Kim, S. S. Narayanan, and K. S. Nayak. Accelerated three-dimensional upper airway MRI using compressed sensing. Magn Reson Med, 61(6):1434{ 1440, June 2009. [53] Y.-C. Kim, M. I. Proctor, S. S. Narayanan, and K. S. Nayak. Improved imaging of lingual articulation using real-time multislice MRI. Journal of magnetic resonance imaging: JMRI, 35(4):943{948, Apr. 2012. [54] K. F. King. Combining compressed sensing and parallel imaging. ISMRM, page 1488, 2008. [55] F. R. Korosec, R. Frayne, T. M. Grist, and C. A. Mistretta. Time-resolved contrast-enhanced 3d MR angiography. Magnetic Resonance in Medicine, 36(3):345{351, Sept. 1996. [56] C. Larsson, M. Kleppest, I. Rasmussen, R. Salo, J. Vardal, P. Brandal, and A. Bjrnerud. Sampling requirements in DCE-MRI based analysis of high grade gliomas: simulations and clinical results. Journal of magnetic resonance imaging: JMRI, 37(4):818{829, Apr. 2013. [57] R. M. Lebel, Y. Guo, Y. Zhu, S. G. Lingala, R. Frayne, L. B. Anderson, K. S. Easaw, and K. S. Nayak. The comprehensive contrast-enhanced neuro exam. ISMRM, page 3705, 2015. [58] R. M. Lebel, J. Jones, J.-C. Ferre, M. Law, and K. S. Nayak. Highly accelerated dynamic contrast enhanced imaging. Magnetic resonance in medicine: ocial journal of the Society of Magnetic Resonance in Medicine / Society of Magnetic Resonance in Medicine, 71(2):635{644, 2014. 113 [59] S. Lee, E. Bresch, J. Adams, A. Kazemzadeh, and S. S. Narayanan. A study of emotional speech articulation using a fast magnetic resonance imaging technique. INTERSPEECH 2006, pages 1792{1795, 2006. [60] Z.-P. Liang. Spatiotemporal Imaging with Partially Separable Functions. In Joint Meeting of the 6th International Symposium on Noninvasive Functional Source Imaging of the Brain and Heart and the International Conference on Functional Biomedical Imaging, 2007. NFSI-ICFBI 2007, pages 181{182, Oct. 2007. [61] R. P. Lim, M. Shapiro, E. Y. Wang, M. Law, J. S. Babb, L. E. Rue, J. S. Jacob, S. Kim, R. H. Carson, T. P. Mulholland, G. Laub, and E. M. Hecht. 3d time-resolved MR angiography (MRA) of the carotid arteries with time- resolved imaging with stochastic trajectories: comparison with 3d contrast- enhanced Bolus-Chase MRA and 3d time-of- ight MRA. AJNR. American journal of neuroradiology, 29(10):1847{1854, Nov. 2008. [62] S. G. Lingala, Y. Guo, Y. Zhu, S. Barnes, R. M. Lebel, and K. S. Nayak. Accelerated DCE MRI using constrained reconstruction based on pharmaco- kinetic model dictionaries. ISMRM, page 0196, 2015. [63] S. G. Lingala, Y. Hu, E. DiBella, and M. Jacob. Accelerated dynamic MRI exploiting sparsity and low-rank structure: k-t SLR. IEEE transactions on medical imaging, 30(5):1042{1054, May 2011. [64] S. G. Lingala and M. Jacob. Blind compressive sensing dynamic MRI. IEEE transactions on medical imaging, 32(6):1132{1145, June 2013. [65] S. G. Lingala, B. P. Sutton, M. E. Miquel, and K. S. Nayak. Recommendations for real-time speech MRI. Journal of magnetic resonance imaging: JMRI, July 2015. [66] S. G. Lingala, Y. Zhu, Y.-C. Kim, A. Toutios, S. Narayanan, and K. Nayak. High-frame-rate real-time imaging of speech production. SPIE Newsroom, June 2015. [67] B. Liu, Y. M. Zou, and L. Ying. Sparsesense: Application of compressed sensing in parallel MRI. In International Conference on Information Technology and Applications in Biomedicine, 2008. ITAB 2008, pages 127{ 130, May 2008. [68] J. A. Logemann, P. J. Kahrilas, J. Cheng, B. R. Pauloski, P. J. Gibbons, A. W. Rademaker, and S. Lin. Closure mechanisms of laryngeal vestibule during swallow. The American Journal of Physiology, 262(2 Pt 1):G338{344, Feb. 1992. 114 [69] M. Lustig, D. Donoho, and J. M. Pauly. Sparse MRI: The application of compressed sensing for rapid MR imaging. Magnetic resonance in medicine: ocial journal of the Society of Magnetic Resonance in Medicine / Society of Magnetic Resonance in Medicine, 58(6):1182{1195, Dec. 2007. [70] A. Malhotra and D. P. White. Obstructive sleep apnoea. Lancet (London, England), 360(9328):237{245, July 2002. [71] S. Masaki, M. K. Tiede, K. Honda, Y. Shimada, I. Fujimoto, Y. Nakamura, and N. Ninomiya. MRI-based speech production study using a synchronized sampling method. J. Acoust. Soc. Jpn. E., 20(5):375{379, 1999. [72] G. McGibney, M. R. Smith, S. T. Nichols, and A. Crawley. Quantitative evaluation of several partial Fourier reconstruction algorithms used in MRI. Magnetic Resonance in Medicine, 30(1):51{59, July 1993. [73] C. H. Meyer, B. S. Hu, D. G. Nishimura, and A. Macovski. Fast spiral coronary artery imaging. Magnetic Resonance in Medicine, 28(2):202{213, Dec. 1992. [74] C. A. Mistretta, O. Wieben, J. Velikina, W. Block, J. Perry, Y. Wu, K. Johnson, and Y. Wu. Highly constrained backprojection for time- resolved MRI. Magnetic resonance in medicine: ocial journal of the Society of Magnetic Resonance in Medicine / Society of Magnetic Resonance in Medicine, 55(1):30{40, Jan. 2006. [75] D. P. Mitchell. Spectrally Optimal Sampling for Distribution Ray Tracing. In Proceedings of the 18th Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH '91, pages 157{164, New York, NY, USA, 1991. ACM. [76] K. Mdy, R. Sader, A. Zimmermann, P. Hoole, A. Beer, H. F. Zeilhofer, and C. Hannig. Use of real-time MRI in assessment of consonant articulation before and after tongue surgery and tongue reconstruction. 4th Inter. Speech Motor Conf., pages 142{145, 2001. [77] R. D. Nadler, J. H. Abbs, and O. Fujimura. Speech movement research using the new X-ray microbeam system. Proc. 11th Int. Congr. Phonet. Sci., 1:221{224, 1987. [78] S. Narayanan, A. Toutios, V. Ramanarayanan, A. Lammert, J. Kim, S. Lee, K. Nayak, Y.-C. Kim, Y. Zhu, L. Goldstein, D. Byrd, E. Bresch, P. Ghosh, A. Katsamanis, and M. Proctor. Real-time magnetic resonance imaging and electromagnetic articulography database for speech production research 115 (TC). The Journal of the Acoustical Society of America, 136(3):1307, Sept. 2014. [79] S. S. Narayanan, A. A. Alwan, and K. Haker. An articulatory study of fricative consonants using magnetic resonance imaging. Journal of the Acoustical Society of America, 98(3):1325{1347, 1995. [80] S. S. Narayanan, A. A. Alwan, and K. Haker. Toward articulatory-acoustic models for liquid approximants based on MRI and EPG data. Part I. The laterals. J. Acoust. Soc. Am., 101(2):1064{1077, Feb. 1997. [81] S. S. Narayanan, E. Bresch, P. Ghosh, L. Goldstein, A. Katsamanis, Y.-C. Kim, A. Lammert, M. Proctor, V. Ramanarayanan, and Y. Zhu. A Multimodal real-time MRI articulatory corpus for speech research. INTERSPEECH, pages 837{840, 2011. [82] S. S. Narayanan, K. S. Nayak, S. Lee, A. Sethy, and D. Byrd. An approach to real-time magnetic resonance imaging for speech production. J. Acoust. Soc. Am., 115(4):1771{1776, Apr. 2004. [83] K. S. Nayak and D. G. Nishimura. Randomized Trajectories for Reduced Aliasing Artifact. ISMRM, page 670, 1998. [84] A. B. T. A. News. Brain Tumor Statistics j American Brain Tumor Association, 2014. [85] A. Niebergall, S. Zhang, E. Kunay, G. Keydana, M. Job, M. Uecker, and J. Frahm. Real-time MRI of speaking at a resolution of 33 ms: undersampled radial FLASH with nonlinear inverse reconstruction. Magnetic Resonance in Medicine: Ocial Journal of the Society of Magnetic Resonance in Medicine / Society of Magnetic Resonance in Medicine, 69(2):477{485, Feb. 2013. [86] M. J. Paldino and D. P. Barboriak. Fundamentals of quantitative dynamic contrast-enhanced MR imaging. Magnetic Resonance Imaging Clinics of North America, 17(2):277{289, May 2009. [87] G. J. M. Parker, C. Roberts, A. Macdonald, G. A. Buonaccorsi, S. Cheung, D. L. Buckley, A. Jackson, Y. Watson, K. Davies, and G. C. Jayson. Experimentally-derived functional form for a population-averaged high- temporal-resolution arterial input function for dynamic contrast-enhanced MRI. Magnetic Resonance in Medicine: Ocial Journal of the Society of Magnetic Resonance in Medicine / Society of Magnetic Resonance in Medicine, 56(5):993{1000, Nov. 2006. 116 [88] J. S. Perkell, M. H. Cohen, M. A. Svirsky, M. L. Matthies, I. Garabieta, and M. T. Jackson. Electromagnetic midsagittal articulometer systems for transducing speech articulatory movements. J. Acoust. Soc. Am., 92(6):3078{3096, Dec. 1992. [89] P. Perrier, L. J. Bo, and R. Sock. Vocal tract area function estimation from midsagittal dimensions with CT scans and a vocal tract cast: modeling the transition with two sets of coecients. J Speech Hear Res, 35(1):53{67, Feb. 1992. [90] M. Poustchi-Amin, S. A. Mirowitz, J. J. Brown, R. C. McKinstry, and T. Li. Principles and applications of echo-planar imaging: a review for the general radiologist. Radiographics: A Review Publication of the Radiological Society of North America, Inc, 21(3):767{779, June 2001. [91] J. L. Preston, P. McCabe, A. Rivera-Campos, J. L. Whittle, E. Landry, and E. Maas. Ultrasound visual feedback treatment and practice variability for residual speech sound errors. Journal of speech, language, and hearing research: JSLHR, 57(6):2102{2115, Dec. 2014. [92] C. Prieto, M. Doneva, M. Usman, M. Henningsson, G. Greil, T. Schaeter, and R. M. Botnar. Highly ecient respiratory motion compensated free- breathing coronary mra using golden-step Cartesian acquisition. Journal of magnetic resonance imaging: JMRI, Feb. 2014. [93] C. Prieto, S. Uribe, R. Razavi, D. Atkinson, and T. Schaeter. 3d undersampled golden-radial phase encoding for DCE-MRA using inherently regularized iterative SENSE. Magnetic resonance in medicine: ocial journal of the Society of Magnetic Resonance in Medicine / Society of Magnetic Resonance in Medicine, 64(2):514{526, Aug. 2010. [94] M. I. Proctor, C. H. Shadle, and K. Iskarous. Pharyngeal articulation in the production of voiced and voiceless fricatives. J. Acoust. Soc. Am., 127(3):1507{1518, Mar. 2010. [95] K. P. Pruessmann, M. Weiger, M. B. Scheidegger, and P. Boesiger. SENSE: sensitivity encoding for fast MRI. Magnetic resonance in medicine: ocial journal of the Society of Magnetic Resonance in Medicine / Society of Magnetic Resonance in Medicine, 42(5):952{962, Nov. 1999. [96] L. Rabiner and B.-H. Juang. Fundamentals of Speech Recognition. Prentice Hall, Englewood Clis, N.J, 1 edition edition, Apr. 1993. [97] J. C. Ramirez-Giraldo, J. Trzasko, S. Leng, L. Yu, A. Manduca, and C. H. McCollough. Nonconvex prior image constrained compressed sensing 117 (NCPICCS): theory and simulations on perfusion CT. Medical physics, 38(4):2157{2167, Apr. 2011. [98] H. Sakoe and S. Chiba. Dynamic programming algorithm optimization for spoken word recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing, 26:43{49, 1978. [99] S. Salvador and P. Chan. Toward Accurate Dynamic Time Warping in Linear Time and Space. Intell. Data Anal., 11(5):561{580, Oct. 2007. [100] J. M. Santos, G. A. Wright, P. Yang, and J. M. Pauly. Adaptive architecture for real-time imaging systems. ISMRM, page 468, 2002. [101] M. Saranathan, D. W. Rettmann, B. A. Hargreaves, S. E. Clarke, and S. S. Vasanawala. DIerential Subsampling with Cartesian Ordering (DISCO): a high spatio-temporal resolution Dixon imaging sequence for multiphasic contrast enhanced abdominal imaging. Journal of magnetic resonance imaging: JMRI, 35(6):1484{1492, June 2012. [102] A. D. Scott, M. Wylezinska, M. J. Birch, and M. E. Miquel. Speech MRI: Morphology and function. Physica Medica, 30(6):604{618, Sept. 2014. [103] H. K. Song and L. Dougherty. Dynamic MRI with projection reconstruction and KWIC processing for simultaneous high spatial and temporal resolution. Magnetic resonance in medicine: ocial journal of the Society of Magnetic Resonance in Medicine / Society of Magnetic Resonance in Medicine, 52(4):815{824, Oct. 2004. [104] S. P. Sourbron and D. L. Buckley. On the scope and interpretation of the Tofts models for DCE-MRI. Magnetic Resonance in Medicine: Ocial Journal of the Society of Magnetic Resonance in Medicine / Society of Magnetic Resonance in Medicine, 66(3):735{745, Sept. 2011. [105] S. P. Sourbron and D. L. Buckley. Classic models for dynamic contrast- enhanced MRI. NMR in biomedicine, 26(8):1004{1027, Aug. 2013. [106] G. J. Stanisz, E. E. Odrobina, J. Pun, M. Escaravage, S. J. Graham, M. J. Bronskill, and R. M. Henkelman. T1, T2 relaxation and magnetization transfer in tissue at 3T. Magnetic Resonance in Medicine, 54(3):507{512, Sept. 2005. [107] K. Stevens and S. Ohman. Cineradiographic studies of speech. STL-QPSR, 4(2):9{11, 1963. 118 [108] M. Stone, E. P. Davis, A. S. Douglas, M. NessAiver, R. Gullapalli, W. S. Levine, and A. Lundberg. Modeling the motion of the internal tongue from tagged cine-MRI images. J. Acoust. Soc. Am., 109(6):2974{2982, June 2001. [109] M. Stone, T. H. Shawker, T. L. Talbot, and A. H. Rich. Cross-sectional tongue shape during the production of vowels. J. Acoust. Soc. Am., 83(4):1586{1596, Apr. 1988. [110] S. Stone, J. Haldar, S. Tsao, W.-m. Hwu, B. Sutton, and Z.-P. Liang. Accelerating Advanced MRI Reconstructions on GPUs. Journal of parallel and distributed computing, 68(10):1307{1318, Oct. 2008. [111] B. H. Story, I. R. Titze, and E. A. Homan. Vocal tract area functions from magnetic resonance imaging. J. Acoust. Soc. Am., 100(1):537{554, July 1996. [112] H. Takemoto, K. Honda, S. Masaki, Y. Shimada, and I. Fujimoto. Measurement of temporal changes in vocal tract area function from 3d cine- MRI data. J. Acoust. Soc. Am., 119(2):1037{1049, Feb. 2006. [113] P. Tofts. T1-weighted DCE Imaging Concepts: Modelling, Acquisition and Analysis. signal, 500(450):400, 2010. [114] P. S. Tofts and A. G. Kermode. Measurement of the blood-brain barrier permeability and leakage space using dynamic MR imaging. 1. Fundamental concepts. Magnetic Resonance in Medicine: Ocial Journal of the Society of Magnetic Resonance in Medicine / Society of Magnetic Resonance in Medicine, 17(2):357{367, Feb. 1991. [115] J. J. van Vaals, M. E. Brummer, W. T. Dixon, H. H. Tuithof, H. Engels, R. C. Nelson, B. M. Gerety, J. L. Chezmar, and J. A. den Boer. "Keyhole" method for accelerating imaging of contrast agent uptake. Journal of magnetic resonance imaging: JMRI, 3(4):671{675, Aug. 1993. [116] K. K. Vigen, D. C. Peters, T. M. Grist, W. F. Block, and C. A. Mistretta. Undersampled projection-reconstruction imaging for time-resolved contrast- enhanced imaging. Magnetic resonance in medicine: ocial journal of the Society of Magnetic Resonance in Medicine / Society of Magnetic Resonance in Medicine, 43(2):170{176, Feb. 2000. [117] K. Wang, R. F. Busse, J. H. Holmes, P. J. Beatty, J. H. Brittain, C. J. Francois, S. B. Reeder, J. Du, and F. R. Korosec. Interleaved variable density sampling with a constrained parallel imaging reconstruction for dynamic contrast-enhanced MR angiography. Magnetic resonance in medicine: ocial journal of the Society of Magnetic Resonance in Medicine / Society of Magnetic Resonance in Medicine, 66(2):428{436, Aug. 2011. 119 [118] J. R. Westbury, G. Turner, and J. Dembowski. X-ray microbeam speech production database user's handbook. Waisman Center on Mental Retardation & Human Development, University of Wisconsin, 1994. [119] S. Winkelmann, T. Schaeter, T. Koehler, H. Eggers, and O. Doessel. An optimal radial prole order based on the Golden Ratio for time-resolved MRI. IEEE transactions on medical imaging, 26(1):68{76, Jan. 2007. [120] S. A. J. Wood. X-ray and model studies of vowel articulation. dissertation, Lund University, 1982. [121] A. Wrench. The MOCHA-TIMIT articulatory database. http://www.cstr.ed.ac.uk/research/projects/artic/mocha.html, 1999. [122] C. Yang and M. Stone. Dynamic programming method for temporal registration of three-dimensional tongue surface motion from multiple utterances. Speech Communication, 38(1-2):201{209, 2002. [123] Y. F. Yen, K. F. Han, B. L. Daniel, S. Heiss, R. L. Birdwell, R. J. Herfkens, A. M. Sawyer-Glover, and G. H. Glover. Dynamic breast MRI with spiral trajectories: 3d versus 2d. Journal of magnetic resonance imaging: JMRI, 11(4):351{359, Apr. 2000. [124] T. Zhang, J. M. Pauly, S. S. Vasanawala, and M. Lustig. Coil compression for accelerated imaging with Cartesian sampling. Magnetic Resonance in Medicine, 69(2):571{582, Feb. 2013. [125] Y. Zhu, Y. Guo, R. M. Lebel, M. Law, and K. S. Nayak. Randomized Golden Ratio Sampling For Highly Accelerated Dynamic Imaging. ISMRM, page 4365, 2014. [126] Y. Zhu, Y. Guo, S. G. Lingala, S. Barnes, R. M. Lebel, M. Law, and K. S. Nayak. Evaluation of DCE-MRI Data Sampling, Reconstruction and Model Fitting Using Digital Brain Phantom. ISMRM, page 3052, 2015. [127] Y. Zhu, Y. Guo, S. G. Lingala, R. M. Lebel, M. Law, and K. S. Nayak. GOCART: GOlden-angle CArtesian Randomized Time-resolved 3d MRI. Magnetic Resonance Imaging (Accepted). [128] Y. Zhu, Y.-C. Kim, M. I. Proctor, S. S. Narayanan, and K. S. Nayak. Dynamic 3D visualization of vocal tract shaping during speech. ISMRM, page 4355, 2011. [129] Y. Zhu, Y.-C. Kim, M. I. Proctor, S. S. Narayanan, and K. S. Nayak. Dynamic 3-D visualization of vocal tract shaping during speech. IEEE transactions on medical imaging, 32(5):838{848, May 2013. 120 [130] Y. Zhu, S. S. Narayanan, and K. S. Nayak. Flexible Dynamic Phantoms for Evaluating MRI Data Sampling & Reconstruction Methods. ISMRM Workshop on Data Sampling & Image Reconstruction, 2013. 121
Abstract (if available)
Abstract
Magnetic resonance imaging (MRI) is a non‐invasive 3D imaging modality capable providing unique contrast for anatomical and functional assessment. MRI is slow (compared to other modalities, such as ultrasound and computed tomography) because of physical and physiological limitations. Therefore, critical trade‐offs have to be made between the spatial coverage, spatial resolution and temporal resolution. In conventional 2D time‐resolved (dynamic) examinations, one parameter is usually sacrificed for maintaining the other two. Such trade‐off would be even more critical in dynamic 3D imaging. ❧ For a long time, MRI speed prevented the application of MRI to dynamic 3D imaging where the motion of tissues (muscles, organs) or the movement of some contrast media is diagnostically important. The desired MRI signals reside in a 4D space (3D object + time), and the contemporary scanners are able to capture a very small fraction of the data, resulting in temporal blurs if spatial coverage and resolution are prioritized, or image artifacts if insufficient data in one time frame. Speeding up the MRI acquisition therefore has long been an important area of research, as it enabled many new applications and more flexible use of the MRI scanner. ❧ This dissertation work aims to dynamic 3D MRI specifically for speech imaging and dynamic contrast‐enhanced imaging. At first, an engineering method is proposed to combine dynamic 2D data to form dynamic 3D speech visualization, using audio‐based alignment. Later on, the dissertation work proposes more advanced techniques that provide 3D dynamics in a ""fast"" manner such that the data acquisition sequence can be generated rapidly, and useful 3D information can be recovered with sufficient temporal resolution, and in a ""flexible"" manner such that scanning parameters can be freely chosen, and the temporal resolution can be adjusted in retrospect. The acceleration is achieved by using efficient data acquisition and a combined parallel imaging and constrained reconstruction method
Linked assets
University of Southern California Dissertations and Theses
Conceptually similar
PDF
Fast upper airway MRI of speech
PDF
Improved brain dynamic contrast enhanced MRI using model-based reconstruction
PDF
Functional real-time MRI of the upper airway
PDF
High-dimensional magnetic resonance imaging of microstructure
PDF
Technology for improved 3D dynamic MRI
PDF
Efficient and accurate 3D FISP-MRF at 0.55 Tesla
PDF
Correction, coregistration and connectivity analysis of multi-contrast brain MRI
PDF
Shift-invariant autoregressive reconstruction for MRI
PDF
Improving sensitivity and spatial coverage of myocardial arterial spin labeling
PDF
Seeing sleep: real-time MRI methods for the evaluation of sleep apnea
PDF
Optimization methods and algorithms for constrained magnetic resonance imaging
PDF
Visualizing and modeling vocal production dynamics
PDF
Fast iterative image reconstruction for 3D PET and its extension to time-of-flight PET
PDF
Dynamic functional magnetic resonance imaging to localize activated neurons
PDF
New methods for carotid MRI
PDF
Emotional speech production: from data to computational models and applications
PDF
Dynamic cardiovascular magnetic resonance imaging for improved assessment of ischemic heart disease
PDF
Toward understanding speech planning by observing its execution—representations, modeling and analysis
PDF
Reproducibility and management of big data in brain MRI studies
PDF
Model based PET image reconstruction and kinetic parameter estimation
Asset Metadata
Creator
Zhu, Yinghua
(author)
Core Title
Fast flexible dynamic three-dimensional magnetic resonance imaging
School
Viterbi School of Engineering
Degree
Doctor of Philosophy
Degree Program
Electrical Engineering
Publication Date
02/02/2016
Defense Date
02/01/2016
Publisher
University of Southern California
(original),
University of Southern California. Libraries
(digital)
Tag
3D,brain tumor,constrained reconstruction,DCE-MRI,Dynamic,GOCART,golden angle,MRI,OAI-PMH Harvest,sparsity,Speech
Format
application/pdf
(imt)
Language
English
Contributor
Electronically uploaded by the author
(provenance)
Advisor
Nayak, Krishna (
committee chair
), Haldar, Justin (
committee member
), Narayanan, Shrikanth (
committee member
), Wood, John (
committee member
)
Creator Email
yinghuaz@usc.edu,zhuyinghua1203@hotmail.com
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-c40-205686
Unique identifier
UC11277865
Identifier
etd-ZhuYinghua-4076.pdf (filename),usctheses-c40-205686 (legacy record id)
Legacy Identifier
etd-ZhuYinghua-4076.pdf
Dmrecord
205686
Document Type
Dissertation
Format
application/pdf (imt)
Rights
Zhu, Yinghua
Type
texts
Source
University of Southern California
(contributing entity),
University of Southern California Dissertations and Theses
(collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Tags
3D
brain tumor
constrained reconstruction
DCE-MRI
GOCART
golden angle
MRI
sparsity