Fast upper airway MRI of speech
FAST UPPER AIRWAY MRI OF SPEECH
by
Yoon-Chul Kim
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(ELECTRICAL ENGINEERING)
December 2010
Copyright 2010 Yoon-Chul Kim
Acknowledgments
First of all, I would like to express my sincere gratitude to my mentor and advisor, Prof. Krishna S. Nayak, for his guidance during my Ph.D. studies. He introduced me to this exciting field of MRI. He has guided me to change in a positive and productive way. He has encouraged me and provided me with a broad range of research opportunities. Also, he was very clear in giving me insights into research ideas and their impact. All of this helped me to stay motivated and work hard through many years of research.
I sincerely thank Prof. Shrikanth Narayanan for providing support in vocal tract MRI research and for help with improving the manuscripts. In addition, I thank my dissertation and qualifying exam committee members, Prof. Louis Goldstein, Prof. Antonio Ortega, and Prof. Richard Leahy, for their valuable suggestions.
I appreciate all members of the Magnetic Resonance Engineering Laboratory (MREL) group. I thank Prof. Jon-Fredrik Nielsen, who worked with me on my first MRI project, on real-time cardiac echo-planar imaging artifact correction. Jon also helped me to learn the tools and software programming related to real-time MRI and to carry out the challenging lab job (i.e., the MREL computer system and network administration). I thank Prof. Houchun Harry Hu, who kindly answered my questions on MR research, helped me with data collection, and proofread my papers. I thank all of my former and current lab mates, Kyunghyun Sung, Joao Carvalho, Hsu-Lei Lee, Taehoon Shin, Zungho Zun, Mahender Makhijani, Samir Sharma, Travis Smith, and Yinghua Zhu, for discussions, help, and the favor of being subjects for the MRI scans at the USC hospitals.
I also express my sincere gratitude to Prof. Sungbok Lee, who has cheered me on and introduced me to the background of speech research. I thank Erik Bresch, Stephen Tobin, Michael Proctor, and Vikram Ramanarayanan from the Speech Production and Articulation kNowledge (SPAN) group for our mutual help in real-time speech MRI data collection on weekends.
I express my gratitude to the outside faculty and senior researchers with whom I had a chance to interact. I thank Prof. Cecil E. Hayes, Prof. Bradley Sutton, Prof. Jong-Chul Ye, Dr. Meng Law, Dr. John L. Go, Dr. Sara Banerjee, Dr. Holger Eggers, Dr. Stefanie Remmele, and Prof. Michael Lustig for their comments and suggestions related to my research. I also thank the MRI technicians and GE engineers at the hospitals for their support with MR scanner related issues, and the anonymous journal referees who reviewed my manuscripts and made valuable suggestions.
Finally, I dedicate my thesis to my mom, dad, and sister who live in Seoul, Korea.
Also I dedicate this thesis to my dad's brother and his family. Moreover, I cannot forget
to thank my uncle's family who lives in Anaheim, CA. I am deeply indebted to my mom
for her caring and praying for my safety and success.
Table of Contents
Acknowledgments
List Of Tables
List Of Figures
Abstract
Chapter 1: Introduction
1.1 Outline
Chapter 2: MRI Background
2.1 MRI Physics
2.1.1 Basic Pulse Sequences
2.1.1.1 Excitation
2.1.1.2 Readout
2.1.2 k-space Trajectories
2.1.2.1 Practical Constraints
2.1.2.2 2D Imaging
2.1.2.3 3D Imaging
2.2 Accelerated Imaging
2.2.1 Parallel Imaging
2.2.2 Compressed Sensing
2.3 Upper Airway Imaging
2.3.1 Air-tissue Magnetic Susceptibility
2.3.2 Motion of Articulators
2.3.3 Imaging Tradeoffs
2.3.4 Upper Airway MRI of Speech
2.3.4.1 3D MRI of Sustained Speech
2.3.4.2 Real-time MRI of Fluent Speech
Chapter 3: EPI Artifact Correction for Real-time MRI
3.1 Introduction
3.2 Methods
3.3 Results
3.4 Discussion
3.5 Application to Upper Airway Imaging
Chapter 4: Accelerated 3D Imaging Using Compressed Sensing
4.1 Single-coil Imaging
4.1.1 Introduction
4.1.2 Theory
4.1.3 Materials and Methods
4.1.3.1 Data Acquisition
4.1.3.2 Image Reconstruction
4.1.3.3 In Vivo Experiments
4.1.4 Results
4.1.5 Discussion
4.1.6 Conclusions
4.2 Multi-coil Imaging
4.2.1 Methods
4.2.1.1 Data Acquisition
4.2.1.2 In Vivo Experiments
4.2.1.3 Image Reconstruction
4.2.1.4 Data Processing and Analysis
4.2.2 Results and Discussion
4.2.3 Summary
Chapter 5: Real-time Speech MRI Using Golden-ratio Spiral
5.1 Introduction
5.2 Materials and Methods
5.2.1 Simulation
5.2.2 In Vivo Experiments
5.2.3 Blockwise Temporal Resolution Selection
5.2.4 Oral-Velar Coordination
5.3 Results
5.4 Discussion
5.5 Conclusions
Chapter 6: Parallel Imaging with Novel 16-Channel Coil at 3 Tesla
6.1 Introduction
6.2 Materials and Methods
6.2.1 Coil Design and Construction
6.2.2 Experimental Methods
6.3 Results
6.4 Discussion
6.5 Conclusions
Chapter 7: Summary and Future Work
Bibliography
List Of Tables
3.1 Ghost-to-signal ratios from the phantom study (see Fig. 3.2) for 1D phase correction and the proposed method. Mean and standard deviation of g-factor values for the proposed method are also reported.
6.1 Average SNR improvement. Average SNR was measured in a single subject (33 year old male) using the 16-channel UA coil, single-channel birdcage coil, and 8-channel NV coil. Eight regions of interest (see Fig. 6.3a) were identified in 2D midsagittal images with $1.88 \times 1.88 \times 2.50$ mm$^3$ spatial resolution, obtained without the use of parallel imaging. The 16-channel UA coil provided improved intrinsic SNR in all regions of interest. UA: upper airway, NV: neurovascular.
6.2 Comparison of average SNR drop-off in three volunteers. Average SNR was measured in three volunteers using the 16-channel coil. Eight regions of interest were identified in 2D midsagittal images with $1.88 \times 1.88 \times 2.50$ mm$^3$ spatial resolution, obtained without the use of parallel imaging. Note the relatively less steep SNR drop-off from the small female subject.
List Of Figures
2.1 A basic slice selective excitation.
2.2 Pulse sequence diagram and image formation process.
2.3 Schematic descriptions of k-space trajectories for (a) 2DFT, (b) projection reconstruction (PR), (c) echo-planar imaging (EPI), and (d) spiral.
2.4 Schematic descriptions of k-space trajectories for (a) 3DFT and (b) 3D stack of spirals.
2.5 Illustration of Cartesian SENSE reconstruction process.
2.6 Spiral imaging using compressed sensing on a GE resolution phantom.
2.7 Effect of spiral readout duration on the upper airway images.
2.8 An example of motion artifacts in midsagittal vocal tract MRI image.
2.9 Spiral pulse sequence diagram and k-space trajectory.
2.10 Example MR image of a midsagittal slice of human upper airway.
2.11 Conventional 3D vocal tract imaging approaches.
2.12 Midsagittal real-time speech MRI for nasal speech production study.
3.1 Reconstruction flowchart for the proposed ghosting correction method for (a) non-accelerated and (b) two-fold accelerated EPI with automatic ghosting correction.
3.2 Real-time cylindrical phantom images reconstructed with non-accelerated EPI data.
3.3 In vivo real-time cardiac images reconstructed with non-accelerated EPI data.
3.4 Automatic correction during continuous scan plane rotation.
3.5 Automatic ghosting correction using no acceleration and two-fold acceleration (b,c,g,h) and corresponding g-factor maps (d,e,i,j) reconstructed using two coil calibration schemes.
3.6 Dynamics of vocal tract shaping during natural speech utterances of "all year".
4.1 Illustration of scan plane prescription, which is used for the 3D upper airway imaging.
4.2 k-space sampling patterns used in the experimental studies.
4.3 L-curve for the selection of regularization parameter $\lambda$ for CS reconstruction of the 3D upper airway data with reduction factors of 3, 4, and 5.
4.4 Representative magnitude and phase images from axial slices.
4.5 Axial slice reconstructions from retrospective sub-sampling of fully sampled data.
4.6 Reformatted 2D midsagittal and coronal images after the PC-II CS reconstructions of the 5x undersampled 3DFT data set.
4.7 3D visualization of the tongue and lower jaw after the PC-II CS reconstructions from the data set prospectively acquired with 5x acceleration.
4.8 Flowchart of the proposed reconstruction scheme.
4.9 Midsagittal images for (a) /s/, (b) /S/, and (c) /r/. Their corresponding 3D tongue shapes for (d) /s/, (e) /S/, and (f) /r/.
4.10 The prospective use of accelerated 3D acquisition and multi-coil PC-CS reconstruction. (a) Reformatted midsagittal slices and their associated midlines drawn for cross-sectional slice prescription. (b) Area function plot. (c) 3D visualization of the tongue and lower jaw.
5.1 Schematic diagram of real-time continuous MRI data acquisition (DAQ) using a golden-ratio spiral view order.
5.2 k-space trajectories for conventional bit-reversed 13-interleaf UDS and golden-ratio spiral view order when samples from (a,b) 8, (c,d) 13, and (e,f) 21 consecutive TRs are combined.
5.3 Retrospective selection of temporal resolution: (a) Comparison of unaliased FOV between the golden-ratio view order and conventional bit-reversed 13-interleaf UDS sampling. (b) The enlargement of the region within the green rectangle in (a).
5.4 Midsagittal images with a large reconstruction field-of-view (FOV) of $38 \times 38$ cm$^2$ reconstructed from the data acquired in static posture.
5.5 Gridding reconstructed dynamic frames and time intensity profiles from (a,c) bit-reversed 13-interleaf uniform density spiral data and (b,d) golden-ratio spiral view order data.
5.6 Blockwise temporal resolution selection and synthesis of a single video from four temporal resolution videos.
5.7 Variable temporal resolution selection from real-time data acquired using the golden-ratio acquisition scheme.
6.1 16-channel upper airway receive coil array.
6.2 The 16-channel upper airway coil array on a volunteer. (Left) Side view. (Right) Top view.
6.3 Illustration of the eight upper airway regions of interest (ROIs) used in the evaluation of SNR and g-factor.
6.4 (a) Channel indices labeled on the coil layout. (b) Noise correlation matrix from one representative volunteer.
6.5 Midsagittal, axial, and coronal images at each coil element of the 16-channel upper airway coil.
6.6 2D midsagittal image reconstruction using 1D SENSE: Comparison of the two subjects with different head size.
6.7 Plots of g-factor values for 8 different ROIs as a function of reduction factor for 2DFT midsagittal parallel imaging.
6.8 Comparison of midsagittal, axial and coronal slice reconstruction using parallel imaging. (Top): 8-channel neurovascular coil. (Bottom): 16-channel upper airway coil.
6.9 3D image reconstruction using 2D SENSE.
6.10 2D image reconstruction of the dynamics of vocal tract shaping using spiral SENSE.
Abstract
Magnetic resonance imaging (MRI) is a powerful non-invasive imaging modality, but is relatively slow compared to alternatives such as X-ray and ultrasound. Accelerating MRI scans has been of great interest over the past several years, and the acceleration of upper airway MRI, in particular, is a primary focus of this thesis. Rapid and real-time MRI can be used to capture tissue dynamics (e.g., the tongue/velum during speech, or the beating heart) or to reduce scan time. Rapid MRI can be achieved through the use of novel acquisition and reconstruction methods. Acquisition technologies such as echo-planar imaging and spiral imaging are effective at improving temporal resolution, but introduce additional image artifacts. Image reconstruction techniques such as parallel imaging and compressed sensing accelerate acquisition by highly undersampling Fourier data and improve image quality by removing spatial aliasing artifacts.
In this dissertation, I present four methods that accelerate MRI and manage the associated artifacts. First, a new interleaved echo-planar imaging technique is presented and applied to real-time interactive cardiac MRI. When combined with parallel imaging reconstruction, this enables automatic correction of ghosting artifacts and maintains temporal resolution. Second, a method for accelerated three-dimensional (3D) upper airway imaging is presented. The method adopts compressed sensing for acceleration and enables extraction of a whole 3D vocal tract during a single trial of sustained sound production. Hence, it is free from the image mis-registration problem and dramatically improves throughput in data acquisition. Third, a novel acquisition scheme based on a golden-ratio spiral temporal view order is presented and applied to real-time speech MRI. This provides more flexible retrospective selection of temporal resolution in vocal tract imaging than conventional spiral acquisition. I demonstrate its effectiveness at capturing rapid motion of articulators (e.g., tongue and velum) in nasal sound production. Finally, a novel 16-channel receive coil is described that is highly sensitive to the upper airway regions of interest (e.g., lips, tongue, soft palate). Its higher signal-to-noise ratio (SNR) and acceleration in parallel imaging are demonstrated by comparing it with widely available commercial coils.
Chapter 1
Introduction
Magnetic resonance imaging (MRI) is a powerful medical imaging modality that is non-invasive, free from ionizing radiation, flexible in controlling soft tissue contrast, capable of imaging any arbitrary oblique planes of interest, and effective at producing high spatial resolution. However, it is inherently slow compared to other imaging technologies such as ultrasound and X-ray. Numerous methods that improve imaging speed from either the acquisition or the reconstruction perspective have been developed in the MRI research community.
Echo-planar imaging (EPI) is an ultrafast imaging technique that is based on time-efficient sampling of the acquisition space, referred to as k-space. However, reconstructed images from EPI data may suffer from geometric distortions and ghosting artifacts due to a variety of sources including echo-misalignment, off-resonance, flow, and motion. Ghosting due to echo-misalignment is a systemic problem and is hard to correct for in oblique scan planes. Prior to the actual scans, calibration scans are often necessary for subsequent artifact correction processes.
In the first part of the thesis, I present a novel and fully automatic correction method that eliminates ghosting artifacts due to echo-misalignment in any arbitrary oblique scan plane. Phantom and real-time cardiac in vivo experiments indicate that the proposed method is superior to the conventional one-dimensional (1D) phase correction method in terms of ghost suppression in oblique or double-oblique scan planes.
In speech research, MR imaging of the human upper airway during sustained speech production has been used as a means to provide full geometric information about the shaping of the vocal tract and data for its modeling. The acquisition of a whole 3D vocal tract requires a long scan time, typically exceeding normal breath-hold duration. A 3D vocal tract constructed from many repetitions of the same articulation is likely to suffer from image misregistration, possibly due to different positioning of the jaw, lips, and tongue at each trial of sustaining the speech sound. Under the constraints on spatial resolution and scan time appropriate for a single sustained sound production, acceleration is achievable only by highly undersampling 3D k-space. Conventional inverse Fourier transform reconstruction of the undersampled data set will produce severe spatial aliasing artifacts in reconstructed images, which will make it infeasible to quantitatively assess the geometry of vocal tract shaping with good accuracy.
In the second part of the thesis, I propose an application of compressed sensing to accelerated 3D imaging of human vocal tract shaping. A variable density pseudo-random undersampling in $(k_y, k_z)$ space is exploited to promote incoherence of spatial aliasing artifacts. Image reconstruction from undersampled k-space data is performed based on a regularized iterative reconstruction, in which the $\ell_1$ norm of the finite differences of the image is adopted for denoising and edge enhancement. Acquisition and reconstruction from retrospective and prospective studies demonstrate the effectiveness of the technique at improving the air-tissue boundary depiction at high acceleration factors.
In speech research using real-time MRI, the analysis of vocal tract dynamics is performed retrospectively after acquiring data in real-time. Conventional real-time speech MRI is typically based on a constant temporal resolution. However, a flexible retrospective selection of temporal resolution is desirable because of natural variations in speaking rate and variations in the speed of different articulators.
In the third part of the thesis, a novel acquisition scheme based on a golden-ratio spiral temporal view order is proposed and applied to real-time speech MRI. Compared to a conventional spiral acquisition, the proposed method demonstrates improved aliasing artifact reduction for static postures and improved temporal resolution for capturing the dynamics of rapid articulator motion.
Novel pulse sequences and image reconstruction techniques have improved the SNR
and spatio-temporal resolution in imaging the upper airway. The design and use of a
receive coil that is highly sensitive to the upper airway regions of interest may be an
additional source for improving image quality.
In the final part of the thesis, I describe a novel 16-channel 3 Tesla upper airway coil and present its SNR and parallel imaging performance by comparing it with other widely available commercial coils. With this coil and conventional parallel imaging, I demonstrate a 6-fold acceleration in 3D imaging of the vocal tract during sustained speech. I also demonstrate a 4.2-fold acceleration in 2D real-time imaging of the vocal tract during fluent speech.
1.1 Outline
This dissertation is organized as follows.
Chapter 2: MRI Background
This chapter presents an overview of basic MR pulse sequences, k-space sampling, 2D/3D imaging, parallel imaging, compressed sensing, and a brief introduction to imaging issues in rapid MRI of the upper airway.
Chapter 3: EPI Artifact Correction for Real-time MRI
This chapter presents an EPI acquisition and reconstruction technique that uses parallel imaging to eliminate ghosting artifacts due to misalignment of the odd and even echoes. The effectiveness of the technique is demonstrated in oblique and double-oblique scan planes in real-time cardiac and upper airway MRI.
Chapter 4: Accelerated 3D Imaging Using Compressed Sensing
This chapter presents an accelerated MRI technology that enables the capture of a high-resolution 3D vocal tract shape in a single trial of sustained sound production. A proposed phase constrained compressed sensing method accelerates MR acquisition by sub-sampling the Fourier coefficients, and results in improved depiction of the air-tissue interface over conventional image reconstruction methods. Its extension to multi-coil imaging is also presented.
Chapter 5: Real-time Speech MRI Using Golden-ratio Spiral
This chapter presents a new acquisition technique that is based on a golden-ratio spiral temporal view order. The golden-ratio view order provides more flexible retrospective selection of temporal resolution than the conventional scheme. The method is applied to real-time imaging of the vocal tract shape during nasal sound production.
Chapter 6: Parallel Imaging with Novel 16-Channel Coil at 3 Tesla
This chapter describes a novel 3 Tesla 16-channel receive coil which is highly sensitive to the upper airway regions of interest. The SNR and parallel imaging g-factor of the coil are evaluated by comparison with a widely available single-channel birdcage coil and an 8-channel neurovascular coil, and across several human subjects with different head sizes and upper airway geometries. Its parallel imaging performance is demonstrated using high resolution 3D imaging of the vocal tract shape during sustained speech production. High spatiotemporal resolution real-time imaging is also demonstrated using 2D spiral parallel imaging of the vocal tract dynamics during natural speech production.
Chapter 7: Summary and Future Work
This chapter summarizes the thesis and presents future research topics that are worthy of further investigation.
Chapter 2
MRI Background
I first describe the basics of MRI physics and the principle of MR image formation. I provide an overview of two fundamental MRI acceleration techniques: parallel imaging and compressed sensing. Finally, I present several considerations in imaging the upper airway and then review conventional methods of imaging the vocal tract during speech.
2.1 MRI Physics
2.1.1 Basic Pulse Sequences
I introduce the basics and underlying physics of acquiring a two-dimensional MRI image
from a certain imaging slice of the human body.
2.1.1.1 Excitation
The magnet inside the MRI scanner room produces a strong and permanent magnetic field. The subjects are placed inside the magnet bore. At the microscopic level, a majority of the spins in human tissue are influenced by the strong magnet and are aligned along the direction of the main magnetic field, i.e., the longitudinal direction. They precess at the Larmor frequency $\omega_0$ (rad/sec), which is proportional to the magnetic field strength $B_0$:

\[ \omega_0 = \gamma B_0, \tag{2.1} \]

where $\gamma$ is the gyromagnetic ratio, whose value depends on the type of nucleus (e.g., $^{1}$H, $^{13}$C, $^{31}$P). From now on, I will assume imaging of hydrogen $^{1}$H, which is the most abundant in the human body. For example, $\gamma/2\pi$ for $^{1}$H is 42.58 MHz/T. In $^{1}$H imaging at 3 Tesla magnetic field strength, the Larmor frequency is approximately 128 MHz. The net magnetization $M_0$ of the spins is in the same longitudinal direction as the main magnetic field in the thermal equilibrium condition.
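As a quick numerical check of Equation 2.1, the following minimal Python sketch evaluates the Larmor frequency from the $\gamma/2\pi$ value quoted above. The function names are hypothetical and introduced here only for illustration.

```python
import math

# gamma / (2 pi) for 1H in MHz/T, as quoted in the text.
GAMMA_BAR_1H_MHZ_PER_T = 42.58


def larmor_frequency_mhz(b0_tesla, gamma_bar=GAMMA_BAR_1H_MHZ_PER_T):
    """Larmor frequency f0 = (gamma / 2 pi) * B0, in MHz."""
    return gamma_bar * b0_tesla


def larmor_angular_frequency(b0_tesla, gamma_bar=GAMMA_BAR_1H_MHZ_PER_T):
    """omega_0 = gamma * B0 (Equation 2.1), in rad/sec."""
    return 2.0 * math.pi * gamma_bar * 1e6 * b0_tesla


if __name__ == "__main__":
    for b0 in (1.5, 3.0):
        print(f"B0 = {b0} T: f0 = {larmor_frequency_mhz(b0):.2f} MHz")
```

At 3 T the product is 127.74 MHz, which rounds to the 128 MHz stated in the text.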
In the presence of the strong main magnetic field, another magnetic field, called the $B_1$ transmit field, is involved in the MR imaging process. The $B_1$ field has its carrier frequency tuned to the Larmor frequency, which is in the radio frequency (RF) range. The direction of the $B_1$ field is perpendicular to the main magnetic field. The $B_1$ field is typically applied only for a short duration (e.g., 1 to 3 msec). The net effect is excitation of the spins, i.e., tipping the spin magnetization $M_0$ toward the transverse plane (i.e., the plane perpendicular to the main magnetic field) by a certain angle. This phenomenon can be described by the well-known Bloch equation [82]. The transverse component of the excited spin magnetization is time varying in nature, and induces a voltage in an RF receive coil by Faraday's law of induction. After the $B_1$ transmit pulse, the magnetization returns to the thermal equilibrium state: the transverse magnetization decays exponentially with time constant $T_2$, and the longitudinal magnetization recovers with time constant $T_1$. This is called relaxation. $T_1$ and $T_2$ are the longitudinal and transverse relaxation time constants, respectively.
In one-dimensional (1D) slice selective excitation, a SINC-shaped $B_1$ transmit pulse is typically generated along with a slice selective gradient $G_z$ (see Fig. 2.1). The combination of the $B_1$ and $G_z$ results in a 1D slice profile, in which the spins within the passband of the slice profile have a transverse magnetization component and produce signals in the receive coil, while the spins outside the slice profile have little or no transverse component and have no effect on the receive coil.

[Figure 2.1 diagram. Labels: $B_1(t)$ versus $t$; Fourier transform; $M_x$, $M_y$ with bandwidth $\Delta\omega$; $G_z(t)$ versus $t$; $z$ versus $\omega$ about $\omega_0$ with slope $1/G_z$ ("Produce resonance offset linearly varying in z"); slice thickness $\Delta z$.]

Figure 2.1: A basic slice selective excitation. The duration and shape of the envelope of the $B_1$ waveform determine the bandwidth and transition width of the transverse magnetization. This is governed by the Fourier transform relationship between the envelope of the $B_1$ waveform and the spectral response of the transverse magnetization when the small tip approximation is assumed to hold [86]. Slice selection along the z direction is attained after applying the $G_z$ gradient waveform.
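The Fourier relationship stated in the Figure 2.1 caption can be illustrated numerically. The sketch below is a minimal example under the small tip approximation, not the pulse design used in this thesis: it takes the FFT magnitude of a Hamming-windowed sinc envelope as a stand-in for the slice profile and converts an excitation bandwidth into a slice thickness through $\Delta z = \Delta f / ((\gamma/2\pi) G_z)$. The window choice, the time-bandwidth product of 4, and all function names are assumptions made for illustration.

```python
import numpy as np

GAMMA_BAR_HZ_PER_T = 42.58e6  # gamma / (2 pi) for 1H, from the text


def windowed_sinc_envelope(n_samples=256, time_bandwidth=4.0):
    """B1 envelope (assumed shape): Hamming-windowed sinc.

    Time is expressed in units of the pulse duration, centered on zero.
    """
    t = (np.arange(n_samples) - (n_samples - 1) / 2) / n_samples
    # np.sinc(x) = sin(pi x) / (pi x), so the ideal passband is |f| < time_bandwidth / 2.
    return t, np.sinc(time_bandwidth * t) * np.hamming(n_samples)


def small_tip_profile(envelope, oversample=16):
    """Approximate |Mxy| versus resonance offset as the FFT magnitude of the envelope."""
    n_fft = envelope.size * oversample
    profile = np.abs(np.fft.fftshift(np.fft.fft(envelope, n=n_fft)))
    # Offsets in cycles per pulse duration (the sample spacing is 1 / n_samples).
    freq = np.fft.fftshift(np.fft.fftfreq(n_fft, d=1.0 / envelope.size))
    return freq, profile / profile.max()


def slice_thickness_m(bandwidth_hz, gz_t_per_m):
    """Slice thickness selected by a pulse of given bandwidth under gradient Gz."""
    return bandwidth_hz / (GAMMA_BAR_HZ_PER_T * gz_t_per_m)


t, b1 = windowed_sinc_envelope()
freq, profile = small_tip_profile(b1)
passband = freq[profile >= 0.5]
fwhm = passband.max() - passband.min()  # close to the time-bandwidth product
```

For example, a 1 msec pulse with a time-bandwidth product of 4 has a bandwidth of about 4 kHz; with a hypothetical $G_z$ of 10 mT/m, `slice_thickness_m(4000.0, 10e-3)` gives a slice roughly 9.4 mm thick.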
2.1.1.2 Readout
After the 1D slice selective excitation, time-varying gradient fields $G_x$ and $G_y$ are generated to spatially resolve the spin magnetization in 2D. The $G_x$ and $G_y$ gradient fields induce additional resonance offsets that vary linearly along the spatial $x$ and $y$ axes, respectively. The signal detected in the RF receive coil at an instantaneous time $t$ is given as a sum of the transverse components of all the spins, where each individual spin experiences its own instantaneous phase offset depending on its spatial position. The instantaneous phase offset $\Delta\phi$ can be described by:

\[
\begin{aligned}
\Delta\phi(t, x, y) &= \int_0^t \Delta\omega(\tau)\, d\tau \\
&= \int_0^t \gamma\, \Delta B(\tau)\, d\tau \\
&= \gamma \int_0^t G_x(\tau)\, d\tau \; x + \gamma \int_0^t G_y(\tau)\, d\tau \; y.
\end{aligned}
\tag{2.2}
\]

When deriving the third line of Equation 2.2, it is assumed that the spins at a position $(x, y)$ are stationary in time.
Here, I introduce the k-space representation [110]:

\[ k_x(t) \triangleq \frac{\gamma}{2\pi} \int_0^t G_x(\tau)\, d\tau. \tag{2.3} \]

\[ k_y(t) \triangleq \frac{\gamma}{2\pi} \int_0^t G_y(\tau)\, d\tau. \tag{2.4} \]
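Equations 2.3 and 2.4 can be evaluated numerically by a running sum of the sampled gradient waveform. The sketch below is a minimal illustration: the 4 µs sample period, the 10 mT/m amplitude, the idealized rectangular lobes (no ramps), and the function name are all assumptions, not parameters taken from this thesis.

```python
import numpy as np

GAMMA_BAR_HZ_PER_T = 42.58e6  # gamma / (2 pi) for 1H, from the text


def kspace_from_gradient(gradient_t_per_m, dt_s):
    """k(t) = (gamma / 2 pi) * integral_0^t G(tau) dtau, in cycles/m.

    The integral is approximated by a cumulative sum over the samples.
    """
    gradient = np.asarray(gradient_t_per_m, dtype=float)
    return GAMMA_BAR_HZ_PER_T * np.cumsum(gradient) * dt_s


# Hypothetical readout: a 0.5 msec prephasing lobe followed by a 1 msec readout lobe.
dt = 4e-6                            # assumed gradient sample period (s)
prephaser = np.full(125, -10e-3)     # -10 mT/m for 0.5 msec
readout = np.full(250, 10e-3)        # +10 mT/m for 1 msec
kx = kspace_from_gradient(np.concatenate([prephaser, readout]), dt)
```

With these numbers the prephaser moves $k_x$ to about -212.9 cycles/m and the readout lobe sweeps it to +212.9 cycles/m, so one line of k-space is traversed symmetrically about the origin.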
When ignoring effects such as spin relaxation and spatial resonance offset, the MR signal equation can be simply expressed as the following Fourier transform relationship:

\[
\begin{aligned}
s(t) &= \int_x \int_y m(x, y)\, e^{-i \Delta\phi(t, x, y)}\, dx\, dy \\
&= \int_x \int_y m(x, y)\, e^{-i 2\pi (k_x(t) x + k_y(t) y)}\, dx\, dy \\
&= M(k_x(t), k_y(t)),
\end{aligned}
\tag{2.5}
\]

where $s(t)$ is the signal from the receive coil, and $M(k_x(t), k_y(t))$ is a function of the 2D spatial frequencies $(k_x(t), k_y(t))$, both of which are parameterized by the time variable $t$. $m$ is the transverse component of the magnetized spin density, $(k_x, k_y)$ is the k-space location, and $(x, y)$ is the spatial position. Note that $k_x(t)$ and $k_y(t)$ are given as integrals of the gradient waveforms $G_x(t)$ and $G_y(t)$, respectively, and thus are continuous with respect to $t$. Figure 2.2(a-b) illustrates how the generation of $G_x(t)$ and $G_y(t)$ is related to the trajectory in k-space. The inverse Fourier transform of the samples $M(k_x(t), k_y(t))$ on the trajectories in k-space results in the MRI image (see Fig. 2.2(c)).
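To make the Fourier relationship of Equation 2.5 concrete, the sketch below evaluates the signal equation by direct summation on a Cartesian k-space grid for a small synthetic object and then recovers the object with an inverse FFT. It is a toy illustration only: the 16 × 16 object, unit pixel size, and function names are assumptions, and relaxation, noise, and off-resonance are ignored, as in the text.

```python
import numpy as np


def simulate_cartesian_signal(image):
    """Evaluate Eq. 2.5 by direct summation at Cartesian k-space locations.

    Positions are in pixels (centered on the array), k in cycles/pixel.
    Returns an array indexed as [ky, kx].
    """
    ny, nx = image.shape
    x = np.arange(nx) - nx // 2
    y = np.arange(ny) - ny // 2
    kx = (np.arange(nx) - nx // 2) / nx
    ky = (np.arange(ny) - ny // 2) / ny
    enc_x = np.exp(-2j * np.pi * np.outer(kx, x))  # indexed [kx, x]
    enc_y = np.exp(-2j * np.pi * np.outer(ky, y))  # indexed [ky, y]
    return enc_y @ image @ enc_x.T


def reconstruct(kspace):
    """Inverse 2D FFT with the shifts needed for centered k-space and image."""
    return np.fft.fftshift(np.fft.ifft2(np.fft.ifftshift(kspace)))


# Synthetic object: a small bright rectangle on a dark background.
obj = np.zeros((16, 16))
obj[5:11, 6:12] = 1.0

kspace = simulate_cartesian_signal(obj)
recon = reconstruct(kspace)
```

Because the samples lie on a full Cartesian grid, the inverse FFT returns the original object to within floating-point error; non-Cartesian trajectories such as spirals require additional steps (e.g., gridding) before the FFT.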
2.1.2 k-space Trajectories
2.1.2.1 Practical Constraints
The gradient amplifiers in an MRI scanner are the sources that generate the $G_x$, $G_y$, and
$G_z$ waveforms. They provide currents to the gradient coils, and their operation can
be described by the L-R circuit model [59, 37]. The current in the gradient coils is
proportional to the gradient amplitude. The voltage across the gradient coils is related to
the gradient switching rate (i.e., gradient slew rate). To speed up MR acquisition,
gradient waveforms are designed to take full advantage of the maximum limits of gradient
amplitude and slew rate. Note that the use of the maximum available slew rate and amplitude
in gradient waveform design often causes peripheral nerve stimulation and tissue heating
in the subjects. It also makes the scanner vulnerable to heating, in particular for real-time
MRI that involves scans lasting several hours, such as dynamic speech imaging experiments.
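To illustrate how the amplitude and slew-rate limits shape a waveform, the sketch below
computes the shortest trapezoidal (or triangular) gradient lobe that delivers a requested
area. The function name and the limits used in the usage note (4 G/cm and 15000 G/cm/s) are
hypothetical; actual limits depend on the scanner hardware.

```python
import math

def min_time_trapezoid(area, g_max, s_max):
    """Shortest trapezoidal (or triangular) gradient lobe with the given area.

    area  : desired gradient area (G*s/cm), must be positive
    g_max : amplitude limit (G/cm)
    s_max : slew-rate limit (G/cm/s)
    Returns (peak amplitude, ramp time, plateau time).
    """
    # Area of the largest triangle that stays within both limits.
    tri_area = g_max ** 2 / s_max
    if area <= tri_area:
        # Slew-limited triangle: area = g_peak**2 / s_max.
        g_peak = math.sqrt(area * s_max)
        return g_peak, g_peak / s_max, 0.0
    # Amplitude-limited trapezoid: the two ramps together contribute g_max * t_ramp.
    t_ramp = g_max / s_max
    t_flat = (area - tri_area) / g_max
    return g_max, t_ramp, t_flat
```

For example, `min_time_trapezoid(2e-3, 4.0, 15000.0)` returns an amplitude-limited trapezoid,
whereas an area of `1e-3` yields a slew-limited triangle that never reaches the amplitude
limit.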
2.1.2.2 2D Imaging
Here, I briefly introduce four fundamental k-space sampling schemes (i.e., 2DFT, projection
reconstruction (PR), echo-planar, and spiral), each of which has its own characteristics
and exhibits advantages and disadvantages depending on the application. Some
representative k-space trajectories are illustrated in Fig. 2.3.
Figure 2.2: Pulse sequence diagram and image formation process. (a) Basic 2DFT fast
gradient echo (GRE) pulse sequence, showing the RF, $G_z$, $G_x$, and $G_y$ waveforms over
two TRs. DAQ represents the data acquisition period during which the signal is recorded by
the receive coil. Note that the blip size in the $G_y$ gradient (i.e., phase-encode gradient)
changes by a certain increment in area at every TR (see the green arrows), which enables the
acquisition of subsequent $k_y$ levels in the k-space shown in (b). (b) Data samples acquired
during each DAQ period (DAQ-1, DAQ-2) are mapped onto the k-space trajectory.
(c) The inverse Fourier transform of the acquired k-space data yields the final image.
2DFT Imaging: Two-dimensional Fourier transform (2DFT) imaging acquires a
single phase-encode line after each excitation. It is the most widely used imaging
method in clinical MRI studies. It is very robust to system imperfections such
as gradient/DAQ delay and magnetic field inhomogeneity. Since the acquired data are
evenly distributed in k-space, the image reconstruction procedure directly involves a simple
and fast 2D inverse Fourier transform. However, 2DFT data acquisition is slow because only
one phase-encode line is acquired per excitation, and it is very sensitive to motion artifacts.
PR Imaging: Projection reconstruction (PR) imaging covers k-space by acquiring
radial lines at different azimuthal angles. Each radial line is designed to pass through
the k-space origin. Since PR imaging acquires the k-space origin at every TR, it is
relatively less sensitive to motion than 2DFT. The k-space samples on PR trajectories are
distributed non-uniformly. The low spatial frequency region in k-space is sampled densely,
whereas the high spatial frequency region is sampled relatively sparsely. Spatial aliasing
artifacts result from insufficient sampling of high spatial frequencies. The "high-frequency"
aliasing artifacts appear incoherent and are less visually disturbing than the fold-over
aliasing artifacts resulting from regular undersampling in 2DFT imaging.
Echo-planar Imaging: Echo-planar imaging (EPI) samples k-space in a raster-like
fashion, and its trajectory is somewhat similar to the 2DFT sampling pattern. EPI is fast but
is prone to artifacts, including ghosting and geometric distortion in reconstructed images,
due to a variety of sources such as off-resonance, $T_2$ or $T_2^*$ relaxation, eddy currents,
and gradient/DAQ delay. Geometric distortion is proportional to the amount of resonance
offset and inversely proportional to the bandwidth along the phase-encode direction.
Multi-shot (or interleaved) EPI with a short echo train length is often used for practical
purposes. It requires a careful selection of the number of shots (or interleaves) and the
readout duration. The proper choice is a compromise between 1) improved image quality
through reduction of image artifacts originating from either off-resonance or $T_2^*$ decay
and 2) improved imaging speed through the use of fewer shots (or interleaves).
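The dependence of EPI distortion on off-resonance and on the number of shots can be made
concrete with a small calculation. The sketch below assumes an idealized interleaved
acquisition in which adjacent phase-encode lines are separated in time by the echo spacing
divided by the number of shots; the function name and the numerical values in the usage
note are hypothetical.

```python
def epi_pixel_shift(delta_f_hz, echo_spacing_s, n_phase_encodes, n_shots=1):
    """Off-resonance displacement along the phase-encode direction, in pixels.

    With interleaved multi-shot EPI, adjacent ky lines are separated in time by
    echo_spacing / n_shots, so the phase-encode bandwidth per pixel is
    n_shots / (echo_spacing * n_phase_encodes).
    """
    bw_per_pixel = n_shots / (echo_spacing_s * n_phase_encodes)
    return delta_f_hz / bw_per_pixel
```

Under these assumptions, a 100 Hz offset with a 0.5 ms echo spacing and 64 phase encodes
shifts the signal by 3.2 pixels in single-shot EPI and by 0.8 pixels with four shots, at the
cost of four excitations per image.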
Spiral Imaging: Spiral imaging typically acquires data starting from the k-space origin,
and its data acquisition ends at the k-space periphery after traversing the predetermined
spiral trajectory. Spiral imaging is known to be the most time-efficient at sampling
k-space. Moreover, it is insensitive to motion and flow effects. Spiral imaging is suitable
for imaging rapidly moving objects (e.g., cardiac ventricular motion or flow imaging,
vocal tract imaging during speech/swallowing). The imaging field-of-view (FOV) in spiral
trajectories is inversely proportional to the distance, measured along a radial line, between
two adjacent spiral lines. The design of a spiral trajectory is flexible and can be
parameterized as a function of the k-space radius (e.g., see Brian Hargreaves' website
http://www-mrsrl.stanford.edu/~brian/vdspiral). However, spiral imaging is prone to spatial
blurring or geometric distortion in reconstructed images due to a variety of sources including
off-resonance, concomitant gradient fields, and gradient/DAQ timing delays.
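The relationship between FOV and the spacing of adjacent spiral lines can be sketched with
a uniform-density interleaved Archimedean spiral. This is a simplified illustration, not the
variable-density design code referenced above: gradient amplitude and slew-rate limits are
ignored, samples are placed uniformly in angle rather than in time, and the function name
and parameters are assumptions.

```python
import numpy as np

def archimedean_spiral(fov_cm, res_cm, n_interleaves, n_samples=2000):
    """Uniform-density interleaved Archimedean spiral, returned as kx + 1j*ky (cycles/cm).

    Each interleaf advances radially by n_interleaves / fov per turn, so the
    combined set of interleaves has a radial line spacing of 1 / fov.
    """
    k_max = 1.0 / (2.0 * res_cm)                   # k-space radius sets the resolution
    turns = k_max * fov_cm / n_interleaves         # turns needed per interleaf
    theta = np.linspace(0.0, 2 * np.pi * turns, n_samples)
    radius = n_interleaves * theta / (2 * np.pi * fov_cm)
    rot = 2 * np.pi * np.arange(n_interleaves) / n_interleaves
    return radius[None, :] * np.exp(1j * (theta[None, :] + rot[:, None]))

k = archimedean_spiral(fov_cm=20.0, res_cm=0.25, n_interleaves=4)
```

Halving `fov_cm` doubles the radial spacing between adjacent lines and halves the number of
turns, which is why a smaller FOV allows faster acquisition.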
2.1.2.3 3D Imaging
Three-dimensional (3D) imaging performs spatial encoding in 3D k-space (i.e., $k_x, k_y, k_z$).
3DFT imaging (see Figure 2.4(a)) is widely used, and its pulse sequence can be described
as follows: slab excitation pulses with a thick slice (e.g., 8-10 cm) are first applied and
followed by a $G_z$ gradient encode blip, which performs $k_z$ encoding. $G_x$ and $G_y$
gradients are generated to perform $k_x$ and $k_y$ encoding, respectively. 3D imaging provides
full volumetric coverage of regions of interest, but it requires a prohibitively long scan time.
Figure 2.3: Schematic descriptions of k-space trajectories for (a) 2DFT, (b) projection
reconstruction (PR), (c) echo-planar imaging (EPI), and (d) spiral.
However, 3D imaging has recently gained attention because its high dimensionality allows
for substantial acceleration by highly undersampling k-space. Advanced reconstruction
methods such as parallel imaging [90, 33, 113] and compressed sensing [66], or the
combination of the two [73, 58], can be utilized to reconstruct an alias-free 3D volume from
a significantly undersampled 3D data set. Other non-Cartesian 3D trajectories such as
3D PR [8], 3D cones [34], and 3D stack of spirals [109, 62] (see Figure 2.4(b)) have been
developed to improve imaging speed.
Figure 2.4: Schematic descriptions of k-space trajectories for (a) 3DFT and (b) 3D stack
of spirals.
2.2 Accelerated Imaging
2.2.1 Parallel Imaging
The use of multiple-channel phased-array receive coils is widespread in MRI because they
provide improved signal-to-noise ratio (SNR) over either a single-channel coil or the body coil.
All coil elements in the array simultaneously detect the MR signals. The MR signal at one
element differs from those at the others because each coil element has its own spatial coil
sensitivity, characterized by its direction and distance relative to the tissue region of interest.
Parallel imaging exploits this additional spatial sensitivity information for the purpose of
accelerating MRI scans.
Numerous parallel imaging reconstruction techniques have been proposed in the
literature. Sensitivity encoding (SENSE) [90] and generalized autocalibrating partially
parallel acquisition (GRAPPA) [33] are two fundamental parallel imaging methods. SENSE
performs parallel imaging reconstruction in the image domain, while GRAPPA performs
parallel imaging reconstruction in the k-space domain. In this subsection, I introduce only
SENSE approaches for Cartesian and non-Cartesian trajectories.
After the incorporation of the spatial coil sensitivity, the MR signal equation is
formulated as follows:
$$
s_j(t) = \int_{\vec{r}} c_j(\vec{r})\, m(\vec{r})\, e^{-i2\pi \vec{k}(t)\cdot\vec{r}}\, d\vec{r}, \tag{2.6}
$$
where $s_j$ is the signal received from the $j$th coil element, $m$ is the transverse component
of the spin magnetization, $c_j$ is the coil sensitivity of the $j$th coil element, $\vec{r}$ is the
spatial position, and $\vec{k}(t)$ is the k-space location at time $t$. Note that the coil
sensitivity is modeled as a multiplicative factor in the MR signal equation.
In Cartesian k-space sampling, such as 2DFT and EPI, two-fold acceleration can be
achieved by skipping every other phase-encode line in the acquisition. This introduces
FOV/2 aliasing along the phase-encode direction in reconstructed images. Two signals
that are originally FOV/2 apart along the phase-encode direction are superimposed
onto a single pixel, and this results in fold-over aliasing. For a given pixel location $(x, y)$,
the SENSE formulation can be described by the following linear system when four coil
elements are considered and the acceleration rate is 2:
$$
\begin{pmatrix}
a_1(x,y) \\ a_2(x,y) \\ a_3(x,y) \\ a_4(x,y)
\end{pmatrix}
=
\begin{pmatrix}
c_1(x,y) & c_1(x,y+\mathrm{FOV}/2) \\
c_2(x,y) & c_2(x,y+\mathrm{FOV}/2) \\
c_3(x,y) & c_3(x,y+\mathrm{FOV}/2) \\
c_4(x,y) & c_4(x,y+\mathrm{FOV}/2)
\end{pmatrix}
\begin{pmatrix}
m(x,y) \\ m(x,y+\mathrm{FOV}/2)
\end{pmatrix}
\tag{2.7}
$$
Here, $a_j$ is the aliased signal in the image domain for the $j$th coil, $c_j$ is the coil
sensitivity for the $j$th coil, and $m$ is the unknown signal to be estimated.
Un-aliasing is performed by solving the least-squares problem described by Equation 2.7
at every pixel $(x, y)$. Figure 2.5 illustrates the Cartesian SENSE reconstruction
process. The matrix inversion involved in solving the least-squares problem results in
noise amplification in the final image. The degree of noise amplification is related to the
condition number of the associated coil sensitivity matrix. The geometry factor (also called
g-factor) is an important indicator of the noise amplification [90] that results from the
Cartesian SENSE reconstruction (see Fig. 2.5). g-factor values vary spatially and depend on
the acceleration factor, coil geometry, and phase-encode direction.
Figure 2.5: Illustration of Cartesian SENSE reconstruction process. Aliased images and coil
sensitivity maps from multiple coils are the inputs to the SENSE reconstruction, which
produces the un-aliased image and the g-factor map.
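The per-pixel un-aliasing of Equation 2.7 can be sketched in a few lines of Python. This is an
illustrative least-squares implementation under simplifying assumptions (known coil
sensitivity maps, uncorrelated noise with equal variance across coils, and fold-over along
the first array axis); the function name and array layout are not taken from this work.

```python
import numpy as np

def sense_unalias_r2(aliased, coil_maps):
    """Per-pixel least-squares solution of Eq. 2.7 for acceleration rate 2.

    aliased   : (n_coils, ny // 2, nx) aliased coil images
    coil_maps : (n_coils, ny, nx) coil sensitivities on the full FOV
    Returns the (ny, nx) un-aliased image.
    """
    n_coils, half, nx = aliased.shape
    image = np.zeros((2 * half, nx), dtype=complex)
    for y in range(half):
        for x in range(nx):
            # Columns: sensitivities at the two pixels that fold onto each other.
            C = np.stack([coil_maps[:, y, x], coil_maps[:, y + half, x]], axis=1)
            m, *_ = np.linalg.lstsq(C, aliased[:, y, x], rcond=None)
            image[y, x], image[y + half, x] = m
    return image
```

A poorly conditioned matrix `C` at a given pixel corresponds to a large g-factor there, since
the two superimposed pixels are then hard to separate from the coil sensitivities alone.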
In non-Cartesian k-space sampling, such as spiral and radial trajectories, two-fold
acceleration can be achieved by skipping every other interleaf/spoke. Unlike the Cartesian
case, many pixels can contribute aliasing to a single pixel location. Therefore, when
four coil elements are considered, SENSE reconstruction is performed by
inverting the following large-scale over-determined linear system:
$$
\begin{pmatrix}
s_1(k) \\ s_2(k) \\ s_3(k) \\ s_4(k)
\end{pmatrix}
=
\begin{pmatrix}
c_1(\vec{r}_1)\phi(k,\vec{r}_1) & c_1(\vec{r}_2)\phi(k,\vec{r}_2) & \cdots & c_1(\vec{r}_N)\phi(k,\vec{r}_N) \\
c_2(\vec{r}_1)\phi(k,\vec{r}_1) & c_2(\vec{r}_2)\phi(k,\vec{r}_2) & \cdots & c_2(\vec{r}_N)\phi(k,\vec{r}_N) \\
c_3(\vec{r}_1)\phi(k,\vec{r}_1) & c_3(\vec{r}_2)\phi(k,\vec{r}_2) & \cdots & c_3(\vec{r}_N)\phi(k,\vec{r}_N) \\
c_4(\vec{r}_1)\phi(k,\vec{r}_1) & c_4(\vec{r}_2)\phi(k,\vec{r}_2) & \cdots & c_4(\vec{r}_N)\phi(k,\vec{r}_N)
\end{pmatrix}
\begin{pmatrix}
m(\vec{r}_1) \\ m(\vec{r}_2) \\ \vdots \\ m(\vec{r}_N)
\end{pmatrix},
\tag{2.8}
$$
where $s_j(k)$ is the $M \times 1$ vector of measured k-space data from the $j$th coil element,
$\phi(k, \vec{r}_p)$ is the $M \times 1$ vector
$[e^{-i2\pi \vec{k}_1 \cdot \vec{r}_p}, \ldots, e^{-i2\pi \vec{k}_M \cdot \vec{r}_p}]^T$, and
$m(\vec{r}_p)$ is the unknown pixel value at the $p$th pixel location in the image domain.
$M$ is the total number of k-space data samples per coil, and $N$ is the total number of image
pixels. Note that $M < N$ because of the undersampling for acceleration, but because
$4 \cdot M \ge N$ the system is over-determined, which allows a unique solution for the image
estimate $[m(\vec{r}_1), m(\vec{r}_2), \ldots, m(\vec{r}_N)]^T$ in the least-squares sense,
provided that the system matrix has full column rank.
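For a very small image, the system in Equation 2.8 can be built and solved explicitly, which
makes its block structure easy to inspect. The sketch below is for illustration only: the
function names are assumptions, and forming the full matrix is practical only for toy
problems, since realistic image sizes generally call for iterative solvers.

```python
import numpy as np

def encoding_matrix(k, r, coil_maps):
    """Stack the per-coil blocks of Eq. 2.8.

    k         : (M, 2) k-space sample locations (cycles per pixel)
    r         : (N, 2) pixel positions (pixels)
    coil_maps : (n_coils, N) coil sensitivities at those pixels
    Returns an (n_coils * M, N) complex matrix.
    """
    fourier = np.exp(-2j * np.pi * (k @ r.T))   # (M, N); column p is phi(k, r_p)
    return np.concatenate([fourier * c[None, :] for c in coil_maps], axis=0)

def sense_lstsq(data, k, r, coil_maps):
    """Least-squares image estimate from stacked multi-coil k-space data."""
    E = encoding_matrix(k, r, coil_maps)
    m, *_ = np.linalg.lstsq(E, data, rcond=None)
    return m
```

With four coils, M = 20 samples per coil, and N = 36 pixels, the stacked matrix has 80 rows
and 36 columns, matching the condition $4 \cdot M \ge N$ discussed above.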
2.2.2 Compressed Sensing
Compressed sensing (CS) [25, 20] has recently emerged as a promising theoretical
framework from a signal processing perspective. Compressed sensing states that a signal can be
exactly recovered from incomplete (i.e., sub-Nyquist sampling rate) random measurement
data with very high probability via a minimum $\ell_p$ norm ($0 \le p \le 1$) reconstruction,
provided that the signal is sparse (or compressible) in a certain transform domain (e.g.,
wavelets, finite differences, curvelets). Compressed sensing is characterized by the
following three main components:
1. Sparsity of the solution in its sparsifying transform domain
2. Incoherence between the system matrix and the sparsifying basis
3. Minimum $\ell_p$ norm ($0 \le p \le 1$) reconstruction, which promotes a sparse solution
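The three components listed above can be illustrated with a one-dimensional toy problem: a
signal that is sparse in its own domain, measured with randomly selected Fourier
coefficients, and recovered with an $\ell_1$ penalty (the $p = 1$ case). The sketch uses the
iterative soft-thresholding algorithm (ISTA), chosen here for brevity; it is not the
reconstruction method used in this work, and the function names, regularization weight, and
iteration count are assumptions.

```python
import numpy as np

def soft_threshold(z, t):
    """Complex soft-thresholding: shrink magnitudes by t, keep phases."""
    mag = np.abs(z)
    return z * np.maximum(mag - t, 0.0) / np.maximum(mag, 1e-12)

def ista_partial_fourier(y, rows, n, lam=0.01, n_iter=500):
    """Minimize 0.5*||A x - y||^2 + lam*||x||_1, where A keeps the DFT rows in `rows`."""
    def forward(x):
        return np.fft.fft(x, norm="ortho")[rows]

    def adjoint(res):
        full = np.zeros(n, dtype=complex)
        full[rows] = res
        return np.fft.ifft(full, norm="ortho")

    x = np.zeros(n, dtype=complex)
    for _ in range(n_iter):
        # A unit step size is valid because the orthonormal DFT rows give ||A|| <= 1.
        x = soft_threshold(x - adjoint(forward(x) - y), lam)
    return x

# Toy problem: 5 nonzeros out of 128, measured with 64 random Fourier coefficients.
rng = np.random.default_rng(0)
n = 128
x_true = np.zeros(n)
support = rng.choice(n, 5, replace=False)
x_true[support] = rng.choice([-1.0, 1.0], 5) * rng.uniform(1.0, 2.0, 5)
rows = np.sort(rng.choice(n, 64, replace=False))
y = np.fft.fft(x_true, norm="ortho")[rows]
x_hat = ista_partial_fourier(y, rows, n)
```

The $\ell_1$ penalty slightly shrinks the recovered amplitudes, so the estimate is close to
but not exactly equal to the true signal for a nonzero `lam`.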
Since Lustig et al. [66, 67] elaborated and proposed the applicability of compressed
sensing in MRI, many researchers have investigated its effectiveness and extended its use
to a variety of MR applications. Compressed sensing MRI (CS-MRI) can be used to
accelerate scan time or increase spatial resolution. CS-MRI can be combined with other
existing acceleration methods such as parallel imaging to promote further acceleration.
CS-MRI has also recently gained great attention among radiologists and
clinicians [112].
Transform sparsity can be exploited in the wavelet coefficients of the images (e.g., $T_1$
or $T_2$ weighted brain imaging), the finite differences of the images (e.g., contrast-enhanced
angiography), or the periodicity of a dynamic time series (e.g., gated cardiac cine). Pure
random undersampling is difficult to achieve in MRI because the data are sampled along a
smooth trajectory in k-space. However, random undersampling can be easily achieved in
special cases, for example on $(k_y, k_z)$ encodes in 3DFT imaging [69], $(k_y, t)$ sampling
[68, 31], and $(k_f, k_x)$ sampling for MR spectroscopic imaging [43]. Variable-density
pseudo-random undersampling (i.e., low spatial frequencies are critically sampled and high
spatial frequencies are sparsely sampled) produces noise-like incoherent aliasing and is
typically preferred over uniformly random undersampling [66]. This is attributed to the fact
that most of the energy in the k-space data is in the central part of k-space. Figure 2.6 shows
an example of the use of compressed sensing. The variable-density undersampling shown
in Fig. 2.6(f) produces incoherent aliasing artifacts in the gridding reconstructed image in
Fig. 2.6(b), and aliasing is further suppressed using compressed sensing reconstruction,
as seen in Fig. 2.6(d). Sampling density can be optimized using aliasing incoherence
as a criterion, via minimization of the maximum sidelobe level in the sampling point
spread function [56].
Figure 2.6: Spiral imaging using compressed sensing on a GE resolution phantom. A
single-channel birdcage coil was used for RF signal reception. Gridding reconstructed
images from (a) fully sampled uniform density spiral (UDS) and (b) two-fold undersampled
variable density spiral (VDS) trajectories. CS reconstructed images from (c) fully
sampled UDS and (d) two-fold undersampled VDS trajectories. Plots of k-space sampling
density (i.e., imaging field of view (FOV), in cm) versus k-space radius (in cm$^{-1}$) for
(e) fully sampled UDS and (f) two-fold undersampled VDS trajectories.
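The sidelobe criterion mentioned above can be illustrated on a one-dimensional Cartesian
phase-encode mask, which is simpler than the spiral case of Fig. 2.6. The sketch below draws
a variable-density pseudo-random mask with a fully sampled center and reports the peak
sidelobe of its point spread function. The density law, the function names, and all
parameter values are assumptions made for illustration.

```python
import numpy as np

def variable_density_mask(n, accel, center_frac=0.1, power=2.0, seed=0):
    """Pseudo-random phase-encode mask with a fully sampled center.

    Outside the center, the sampling probability falls off as (1 - |k|/k_max)**power.
    About n / accel lines are kept in total.
    """
    rng = np.random.default_rng(seed)
    k = np.abs(np.arange(n) - n // 2) / (n / 2)
    center = k <= center_frac
    n_keep = max(n // accel - int(center.sum()), 0)
    prob = np.where(center, 0.0, (1.0 - k) ** power + 1e-6)
    picks = rng.choice(n, size=n_keep, replace=False, p=prob / prob.sum())
    mask = center.copy()
    mask[picks] = True
    return mask

def max_sidelobe(mask):
    """Peak sidelobe of the sampling point spread function, relative to the main lobe."""
    psf = np.abs(np.fft.ifft(np.fft.ifftshift(mask.astype(float))))
    return psf[1:].max() / psf[0]

vd_mask = variable_density_mask(128, accel=2)
uniform_mask = np.arange(128) % 2 == 0   # regular two-fold undersampling
```

Regular undersampling gives a sidelobe as large as the main lobe (a coherent fold-over
replica), whereas the pseudo-random mask spreads the aliasing energy over many smaller
sidelobes. Repeating the draw with different seeds and keeping the mask with the smallest
`max_sidelobe` is one simple way to apply the criterion.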
Compressed sensing reconstruction can be improved by taking into account a variety
of system imperfections in MR acquisition and incorporating these into the MR signal
equation. For example, factors that affect acquired MR signals may include: 1) spatial
off-resonance, 2) relaxation, 3) gradient/DAQ delay, 4) eddy currents, and 5) coil
sensitivity. Questions on how to determine an appropriate sparsifying basis, find optimal
sampling schemes, and improve sparse reconstruction algorithms still remain to be
answered. The answers will differ depending on the type of pulse
sequence, imaging parameters, the use of contrast agent, and the anatomy of interest.
2.3 Upper Airway Imaging
MRI is a powerful tool for the non-invasive assessment of upper airway anatomy, including
the shaping of the tongue, lips, soft palate, and pharyngeal wall. Upper airway MRI has
been used to facilitate clinical assessments in patients with sleep apnea [98, 104], speech
disorders [9], and swallowing disorders [4], and for surgical planning [65]. It has also been
used for research purposes within the speech research community [75, 79].
2.3.1 Air-tissue Magnetic Susceptibility
In upper airway imaging of speech and sleep apnea studies, air-tissue boundaries are
the regions of interest and are subject to substantial resonance offset due to the large
magnetic susceptibility difference between air and tissue. Spatial off-resonance patterns are
related to the geometry of tongue, jaw, and velum positioning, and they are not spatially
smooth. These large resonance offsets result in severe artifacts in reconstructed images
when using a long readout duration in either EPI or spiral imaging. These artifacts can
be reduced by 1) measuring a field map (i.e., off-resonance map) and 2) reconstructing
the image after incorporating the field map into the reconstruction process. Field map
acquisition typically involves imaging with two different echo times and subtraction of
the phases of the two echo images. Accurate estimation of the field map is
challenging in real-time MRI of the upper airway because the two
images acquired at different echo times may not be exactly the same due to motion. Figure 2.7
illustrates spatial blurring/distortion effects due to off-resonance from air-tissue magnetic
susceptibility when using spiral imaging with a long readout duration.
Figure 2.7: Effect of spiral readout duration on the upper airway images. Each image
represents a single frame captured when the subject sustained the nasal sound /n/. The
spiral readout durations were (a) 2.520 ms, (b) 3.584 ms, (c) 4.576 ms, (d) 6.368 ms, and
(e) 10.560 ms. Identical shim values were used for (a)-(e). Blurring/distortion artifacts
in the images are more prominent at the air-tissue interfaces of the tongue, lips, and velum
as the readout duration increases.
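The two-echo field map estimate described above reduces to a phase difference divided by
the echo time difference. The following sketch assumes that phase accrues as
$e^{+i2\pi f\,TE}$ (the sign flips under the opposite convention), that the two images are
perfectly registered, and that no phase unwrapping is needed; the function name is
hypothetical.

```python
import numpy as np

def field_map_hz(img_te1, img_te2, delta_te_s):
    """Off-resonance map (Hz) from two complex images whose echo times differ by delta_te_s.

    The estimate is unambiguous only for |f| < 1 / (2 * delta_te_s).
    """
    # The angle of the conjugate product is the wrapped phase difference.
    return np.angle(img_te2 * np.conj(img_te1)) / (2 * np.pi * delta_te_s)
```

With a 1 ms echo time difference, offsets up to 500 Hz in magnitude are recovered without
wrapping. Any motion between the two acquisitions violates the registration assumption,
which is the difficulty noted above for real-time imaging.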
2.3.2 Motion of Articulators
In MR imaging of the upper airway for speech production or swallowing studies, the resulting
images are prone to artifacts due to motion. Conventional MRI acquisition is
much slower than the speed of the articulators (e.g., jaw, tongue, lips, and velum) during
fluent speech production. Several speech research groups have proposed gated imaging
techniques, which synchronize the data acquisitions from repetitions of the utterance as a
means to improve temporal resolution [44]. The gated imaging schemes have limitations.
The successful use of gated imaging relies on how consistently the speakers can keep the
same positions of the articulators and the same speech rate for each repetition trial. For
practical purposes, speech tasks are often limited to many repetitions of a few words.
Non-gated real-time MRI may be more suitable for imaging the vocal tract dynamics
during natural sound production [79]. To this end, real-time MRI of speech requires
ultra-fast acquisition speed. Spiral imaging is time-efficient in sampling k-space and is
insensitive to motion. However, current real-time spiral MRI used for speech research still
lacks temporal resolution. Figure 2.8 shows that a temporal resolution of 84 msec
is sufficient for capturing the air-tissue boundaries for monophthongal vowel sounds
such as /i/ and /a/, but it is not sufficient for capturing the shaping when there is a
rapid transition from /i/ to /p/. Note the temporal blurring and swirling artifacts in
Fig. 2.8(b) due to data inconsistency resulting from the articulators' motion.
Figure 2.8: An example of motion artifacts in a midsagittal vocal tract MRI image. Shown
are representative frames reconstructed from spiral data acquired in real-time when the
subject pronounced /ipa/: (a) /i/, (b) transition from /i/ to /p/, and (c) /a/. Temporal
resolution was 84 msec. Motion artifacts are prominent in (b) because the temporal
resolution used for the imaging is not sufficient for capturing the rapidly moving articulators
(e.g., tongue body and lips).
2.3.3 Imaging Tradeoffs
Prior to imaging the upper airway, one should carefully determine pulse sequence design
parameters: flip angle, field-of-view (FOV), spatial resolution, temporal resolution, the
number of spiral interleaves, readout duration, slice thickness, etc. I focus on real-time
fast gradient-echo spiral MRI, which is routinely used for our speech production
research at the University of Southern California.
Flip angle can be chosen based on the Ernst angle, which is the optimal flip angle
that maximizes transverse magnetization in steady state given the TR and the $T_1$ value of
the tissue.
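For an RF-spoiled gradient echo sequence, the Ernst angle has the closed form
$\theta_E = \arccos(e^{-TR/T_1})$. The sketch below evaluates it and checks numerically that it
maximizes the standard steady-state signal expression. The TR of 6 ms and $T_1$ of 1000 ms
are illustrative assumptions, not parameters reported in this work.

```python
import numpy as np

def ernst_angle(tr, t1):
    """Ernst angle in radians for repetition time tr and relaxation time t1 (same units)."""
    return np.arccos(np.exp(-tr / t1))

def spgr_signal(theta, tr, t1, m0=1.0):
    """Steady-state spoiled gradient-echo signal at TE = 0 (T2* decay ignored)."""
    e1 = np.exp(-tr / t1)
    return m0 * np.sin(theta) * (1 - e1) / (1 - e1 * np.cos(theta))

# Illustrative values only: TR = 6 ms and T1 = 1000 ms are assumptions.
tr, t1 = 6.0, 1000.0
theta_e = ernst_angle(tr, t1)
angles = np.linspace(0.0, np.pi / 2, 9001)
best = angles[np.argmax(spgr_signal(angles, tr, t1))]
```

For these values the Ernst angle is a little over 6 degrees, which shows why short-TR
real-time sequences operate at small flip angles.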
Figure 2.9: Spiral pulse sequence diagram and k-space trajectory. (a) Pulse sequence
diagram for the spiral fast gradient echo sequence. Shown are 1D slice-selective excitation
(green box), spiral readout (red box), spiral rewinder (blue box), and gradient spoiler
(magenta box). (b) Spiral k-space trajectory. This illustrates that spatial resolution is
related to the spiral k-space radius, and imaging field-of-view (FOV) is inversely proportional
to the distance between adjacent spiral lines.
FOV can be properly selected depending on the k-space trajectory. In Cartesian
sampling, the frequency-encode direction is typically chosen to be along the
superior-inferior (S-I) direction. This suppresses signal from the brain and neck via analog
low-pass filtering. In spiral imaging, the FOV is rotationally symmetric (see Fig. 2.9(b)), and
a smaller FOV gives a faster acquisition speed. Reduced-FOV imaging can be
performed effectively when the receive coil sensitivity is localized to the upper airway
regions of interest.
A proper selection of spatial and temporal resolution is important in real-time MRI.
For example, in spiral trajectory design, when the readout duration and FOV are fixed, a
selection of higher spatial resolution requires more spiral interleaves to cover k-space
and results in lower temporal resolution.
Readout duration (denoted by $T_{read}$ in Fig. 2.9(a)) offers a trade-off between the
degree of blurring artifacts and the acquisition speed for spiral imaging. The longer the
readout duration, the fewer RF excitations are needed per image, implying faster
imaging speed through reduced overhead in the time spent on the slice-selective excitation
and spoiler gradients (see Fig. 2.9(a)). However, a longer readout duration incurs a
larger phase accrual in the presence of off-resonance. This leads to more pronounced
artifacts in reconstructed images.
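The trade-off can be explored with a rough back-of-the-envelope model. The sketch below
assumes that the total readout time needed to cover k-space is fixed (in practice it depends
on the gradient limits and the trajectory design) and that each TR carries a fixed overhead.
The 40 ms total readout, 3.5 ms overhead, and 100 Hz off-resonance are hypothetical values;
only the readout durations are taken from Fig. 2.7.

```python
import math

def spiral_tradeoff(t_read_ms, total_readout_ms, overhead_ms, delta_f_hz):
    """Frame time and end-of-readout phase accrual for one readout duration.

    total_readout_ms : readout time needed to cover k-space once (hypothetical input)
    overhead_ms      : per-TR time for excitation, rewinder, and spoiler (hypothetical)
    delta_f_hz       : local off-resonance frequency
    Returns (number of interleaves, frame time in ms, phase accrual in cycles).
    """
    n_interleaves = math.ceil(total_readout_ms / t_read_ms)
    frame_ms = n_interleaves * (overhead_ms + t_read_ms)
    phase_cycles = delta_f_hz * t_read_ms * 1e-3
    return n_interleaves, frame_ms, phase_cycles

for t_read in (2.520, 3.584, 4.576, 6.368, 10.560):  # readout durations of Fig. 2.7
    print(t_read, spiral_tradeoff(t_read, 40.0, 3.5, 100.0))
```

Under these assumptions, going from the shortest to the longest readout cuts the number
of interleaves from 16 to 4 and shortens the frame time, while the phase accrued by a
100 Hz offset grows from about a quarter cycle to more than a full cycle.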
Slice thickness needs careful selection. A thicker slice leads to higher
SNR and can also lead to a shorter slice excitation pulse (i.e., a gain in imaging
speed), but it leads to poorer spatial resolution across the slice.
Signal-to-noise ratio (SNR) serves as one of the most important criteria in the
assessment of MRI image quality. SNR is defined as the ratio of the signal intensity to the
standard deviation of the noise [82]. SNR is proportional to voxel size and the square root
of the scan time. In addition, SNR is dependent on the $T_1$ values of the soft tissue of
interest and pulse sequence parameters such as flip angle and repetition time (TR). In an
RF-spoiled gradient echo sequence with $TR \ll T_1$, the signal at the steady state has the
following relationship:
$$
s(TR, T_1, \theta_E) \propto M_0\, \frac{(1 - e^{-TR/T_1})\, e^{-TE/T_2^*}}{\sqrt{1 - e^{-2TR/T_1}}}, \tag{2.9}
$$
where $s$ is the signal amplitude, $M_0$ is the magnetization at thermal equilibrium and
is proportional to the magnetic field strength $B_0$, and $\theta_E$ is the Ernst flip angle
that produces maximum transverse signal in steady state. TE is the echo time (i.e., the time
interval between the center of the RF excitation and the instant when the echo of the signal
is formed), and $T_2^*$ is the effective transverse relaxation time.
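Equation 2.9 can be checked against the standard steady-state expression for an RF-spoiled
gradient echo sequence. The short derivation below assumes perfect spoiling and uses the
shorthand $E_1 = e^{-TR/T_1}$, which is introduced here for convenience and is not notation
from the text.

```latex
\begin{align*}
s(\theta) &\propto M_0 \sin\theta\, \frac{1 - E_1}{1 - E_1\cos\theta}\, e^{-TE/T_2^*},
  \qquad E_1 = e^{-TR/T_1}, \\
\cos\theta_E &= E_1 \;\Longrightarrow\; \sin\theta_E = \sqrt{1 - E_1^2}, \\
s(\theta_E) &\propto M_0\, \frac{(1 - E_1)\sqrt{1 - E_1^2}}{1 - E_1^2}\, e^{-TE/T_2^*}
  = M_0\, \frac{(1 - E_1)\, e^{-TE/T_2^*}}{\sqrt{1 - E_1^2}}.
\end{align*}
```

Since $E_1^2 = e^{-2TR/T_1}$, the last line is the right-hand side of Equation 2.9.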
2.3.4 Upper Airway MRI of Speech
In speech research, understanding the shaping of the vocal tract and its temporal variation
during speech has been of great interest to research communities in linguistics, speech
signal processing, speech pathology, otolaryngology, etc. Imaging technologies such as
X-ray, computed tomography (CT), ultrasound, and electromagnetic articulometer (EMA)
have been adopted to examine the shaping of the vocal tract (or the tongue) [11]. However,
each has limitations. For example, X-ray and CT involve exposure to ionizing radiation,
which is harmful to the subjects. In ultrasound imaging, the probe contacts the jaw and
obstructs natural speech production. In EMA, the sensors are difficult to attach in deep
regions such as the pharyngeal wall and velum.
Magnetic resonance imaging (MRI) is a non-invasive imaging modality that involves
no ionizing radiation and provides excellent visualization of soft tissue in
3D. A primary drawback is that MR image acquisition is notoriously slow. As seen
in Figure 2.10, the midsagittal MR image clearly depicts articulators such as the
lips, tongue, jaw, and velum in an open-mouthed position. However, the scan time (i.e., 31
seconds) is not adequate for capturing the dynamics of the articulators during speech. Hence,
technological development for improved spatiotemporal resolution is essential and should
be targeted at the vocal tract regions of interest.
2.3.4.1 3D MRI of Sustained Speech
MRI has been adopted in the speech research community as a means to extract full
three-dimensional (3D) anatomical information of the vocal tract shape during sustained speech
production [103, 75, 6, 88]. This provides insight into the shape of the whole vocal
tract and data for its modeling. A high spatial resolution (i.e., 1-2 mm) 3D
volume is acquired by prescribing multiple slices that are orthogonal to the midsagittal
slice (see Fig. 2.11(a)). Scan time in 3D imaging often exceeds the normal breath-hold limit.
The acquisition typically needs multiple repetitions of the same sound. This may result
in image mis-registration problems, and thus accelerating MRI acquisition is desirable.
In addition, 2D multi-slice imaging with a certain slice gap produces a noncontiguous 3D
shape of the soft tissue, as shown in Fig. 2.11(b). True 3D encoding in MR acquisition
can eliminate this problem, but it may necessitate a longer scan time than 2D multi-slice
imaging.
Figure 2.10: Example MR image of a midsagittal slice of the human upper airway, with the
tongue, velum, hard palate, and epiglottis labeled. Scan time was 31 seconds. A fast spin
echo sequence was used to acquire the image.
Figure 2.11: Conventional 3D vocal tract imaging approaches. (a) Slice prescription for
3D MRI of the vocal tract (adapted from O. Engwall et al. [26]). (b) 3D tongue shape
extracted from data acquired in the oblique coronal slices (adapted from C. Shadle et
al. [97]).
2.3.4.2 Real-time MRI of Fluent Speech
Real-time MRI continuously acquires data and reconstructs and displays image frames
in real-time. It has been developed primarily for cardiovascular MRI [50, 80, 81]. Several
research groups have applied real-time MRI to vocal tract imaging during speech [24, 79].
The Speech Production and Articulation kNowledge group (SPAN, http://sail.usc.edu/span)
at the University of Southern California (USC) is an interdisciplinary research group
in which researchers from linguistics, otolaryngology,
computer science, and electrical and biomedical engineering collaborate. Currently, real-time
speech MRI data collection is routinely conducted at the Los Angeles County hospital on the
USC Health Science Campus. I have been working as an MRI operator in the SPAN
group. The MRI operator contacts the MRI chief technicians or administrators at the scanner
sites to schedule data collection. The operator performs the MRI
safety screening on the subjects. After the consent and safety procedures are complete,
the operator places the subject inside the magnet and runs custom real-time imaging
software (RTHawk), which was originally developed by Juan M. Santos [96], for real-time
MRI data collection. The RTHawk graphical user interface provides a convenient way
to operate vocal tract MRI scans: 1) the operator can judge when to stop the
scan by monitoring the vocal tract movement in real-time, and 2) the operator can easily
adjust scan parameters such as scan plane orientation, FOV, center frequency, linear shim
values, DAQ/gradient delays, selection of coil elements, flip angle, slice thickness, etc.
In a real-time speech MRI experiment, the speech signal is recorded simultaneously with
real-time MRI data acquisition, which is based on a fast spiral gradient echo sequence.
Conventional gridding reconstruction [46] is used to generate the video of vocal tract
dynamics. Noise cancellation based on adaptive signal processing [13] removes the loud MRI
gradient noise and recovers the speech signal. Then, the video reconstructed from the MRI
data is merged with the speech audio to produce synchronized audio/video of the vocal
tract dynamics, typically in the midsagittal view. Real-time speech MRI experiments have
been conducted in a variety of research and clinical settings. Figure 2.12 shows one
example of real-time MRI nasal studies that investigate timing effects in the coordination
of articulators under different syllable contexts [19].
Figure 2.12: Midsagittal real-time speech MRI for a nasal speech production study. Syllable
difference influences the timing of articulators. Articulation of /bono/ in different
syllable contexts (frames 1-14 are shown for each): (a) "bow know", in which /n/ is initial
in the second word. (b) "bone oh", in which /n/ is final in the first word.
In (a), the velum starts lowering when the tongue tip touches the hard palate (see frame
6). In (b), the velum starts lowering (frame 4) before the tongue tip touches the hard
palate (see frame 7). Temporal resolution was 78 msec. Frames are shown at every 42
msec.
Chapter 3
EPI Artifact Correction for Real-time MRI
This chapter introduces an echo-planar imaging (EPI) acquisition and reconstruction
method that is effective at removing ghosting artifacts in real-time interactive cardiac
MRI. In addition, I briefly present the application of the proposed acquisition scheme to
real-time upper airway imaging of speech production.
3.1 Introduction
Echo-planar imaging (EPI) [71] is used in cardiac MRI because it accelerates image
acquisition while maintaining image quality comparable to 2DFT. Cardiac EPI often
introduces artifacts in reconstructed images, which may include geometric distortion and
ghosting due to a variety of sources including off-resonance, in-plane flow, cardiac motion,
and echo-misalignment. Geometric distortions due to off-resonance can be mitigated by
reducing the echo spacing in the readout gradient or acquiring the data using multiple
RF excitations (i.e., shots), which increases the effective sampling rate (i.e., increases
the acquisition bandwidth) along the phase-encode direction. Ghosting due to cardiac
motion can be mitigated by making the acquisition faster or restricting data acquisition
to a relatively stationary cardiac phase.
Ghosting artifacts caused by echo-misalignment are a systemic problem, and are a
function of induced eddy currents and system timing errors, which are associated with
the scanner hardware (e.g., eddy currents from the cryostat and relative delays of the
physical x, y, and z gradients). These issues are further complicated in oblique scan
planes, which are routinely used in real-time imaging (e.g., cardiac short-axis and long-
axis views).
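The origin of the ghost can be illustrated with a deliberately simplified model in which
every other phase-encode line carries a constant phase error. Real echo-misalignment
shifts alternate lines along the readout direction and therefore produces a spatially
varying phase, so the sketch below only demonstrates the mechanism; the function name and
the axis convention are assumptions.

```python
import numpy as np

def odd_even_phase_ghost(image, phase_rad):
    """Apply a constant phase error to every other ky line and reconstruct.

    The result equals a*image + b*(image shifted by half the FOV along axis 0),
    with a = (1 + exp(1j*phase_rad))/2 and b = (1 - exp(1j*phase_rad))/2.
    """
    kspace = np.fft.fft2(image)
    kspace[1::2, :] *= np.exp(1j * phase_rad)   # odd ky lines (axis 0 = phase encode)
    return np.fft.ifft2(kspace)
```

The ghost amplitude is $|\sin(\phi/2)|$ times the object amplitude for a phase error $\phi$,
so even a small inconsistency between odd and even lines produces a visible replica
displaced by half the FOV.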
The conventional method for correcting echo-misalignment involves performing a
calibration scan to determine the on-axis gradient/DAQ time delays, and using small gradient
"blips" to align echoes. These blips are scan-plane dependent [92, 116], and should be
redesigned upon each scan-plane change. Flyback readouts [29], which involve acquiring
data only on the positive polarity of the readout gradient, can be used to avoid
echo-misalignment artifacts without a calibration scan, but have the disadvantage of reduced
scan time efficiency. Image-based correction [16], which takes into account 1D phase
errors from reduced field of view (FOV) images reconstructed separately from odd and even
echoes, is effective at reducing ghosting artifacts in on-axis scan planes, but its
effectiveness may be degraded in oblique (and double-oblique) scan planes. Full 2D correction is
possible using fully phase-encoded reference scans for each scan plane [21], but it may be
impractical to acquire such reference scans in dynamic and/or real-time imaging
applications. Furthermore, use of such 2D phase-error maps may be problematic when imaging
rapidly moving structures such as the heart, in which the estimated phase-error between
odd and even echoes will be biased by phase-accrual due to flow and motion.
Another alternative is to separate data from left-to-right and right-to-left traversals in
k-space (which each have half the desired FOV), and to reconstruct ghost-free full-FOV
images using parallel imaging [61, 42, 49, 32]. This approach is attractive for real-time
imaging because it does not require additional calibration scans or any modification of
the pulse sequences at the time of scan plane change. The phased-array ghost
elimination (PAGE) method [49] has provided a generalized framework for cancelling ghosting
artifacts due to sources including off-resonance and echo-misalignment using the
information of local coil sensitivity profiles from multiple coils. Herzka et al. [42] demonstrated
gated cardiac imaging with a sequential non-interleaved EPI acquisition scheme, where
the echo-train-length (ETL) was equal to the SENSE reduction factor (e.g., 2 or 3).
In this work, I propose an "interleaved" gradient-echo EPI acquisition scheme and
associated reconstruction method. Compared to the original PAGE method, this
acquisition uses a large ETL ranging from 15 to 50 and a small number of shots in order to
achieve sufficient temporal resolution in free-breathing real-time cardiac EPI imaging.
Using shot-to-shot interleaving of the phase-encode lines and a "double-alternating"
k-space data acquisition scheme, the proposed method achieves ghost suppression using a
SENSE reduction factor of 2, regardless of the ETL. The proposed technique is compared
with the conventional 1D phase correction method [16] in oblique and double-oblique
scan planes. Two-fold accelerated EPI imaging is also demonstrated in conjunction with
the proposed automatic ghosting correction technique. Finally, the feasibility of real-time
reconstruction is demonstrated using a custom real-time imaging platform [96].
3.2 Methods
When performing echo-planar imaging at oblique or double-oblique scan planes, two or all
three physical gradients will oscillate during each readout. An important consideration is
that physical gradients in x, y, and z may have unequal delays [92]. For an oblique scan
plane with unequal gradient delays, the k-space lines with different traversal directions in
the logical coordinate frame will sample positions in k-space that are shifted in opposite
directions along the physical coordinate axes. Combined data are not uniformly spaced,
which causes artifacts in reconstructed images. Within the set of lines having the same
traversal direction, uniform spacing is maintained.
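The spacing argument can be checked numerically. The following is a simplified
one-dimensional sketch, not the acquisition code used in this work. It assumes that the
unequal gradient delays shift L-R lines by +delta and R-L lines by -delta along the
phase-encode axis; the names line_positions, spacings, dk, and delta are hypothetical.

```python
def line_positions(num_lines, dk, delta):
    """Return phase-encode positions of alternating L-R / R-L lines.

    Assumption: even-indexed lines are L-R and shifted by +delta,
    odd-indexed lines are R-L and shifted by -delta.
    """
    return [j * dk + (delta if j % 2 == 0 else -delta) for j in range(num_lines)]


def spacings(positions):
    """Differences between consecutive positions (rounded to hide float noise)."""
    return [round(b - a, 9) for a, b in zip(positions, positions[1:])]


dk, delta = 1.0, 0.1
ky = line_positions(8, dk, delta)

combined = spacings(ky)        # alternates between dk - 2*delta and dk + 2*delta
lr_only = spacings(ky[0::2])   # uniform spacing of 2*dk
rl_only = spacings(ky[1::2])   # uniform spacing of 2*dk

print(combined)
print(lr_only, rl_only)
```

In this toy model the combined set has alternating spacing, while each single-direction
subset is uniformly spaced at twice the nominal interval, which is why each subset alone
corresponds to half the FOV.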
The proposed acquisition scheme involves acquiring k-space data with alternating
polarity of the readout gradient. Figure 3.1(a) illustrates the proposed reconstruction
method. Coil sensitivity maps are reconstructed separately from the left-to-right (L-R)
and right-to-left (R-L) lines, which are acquired from the two most recent time frames, n
and n-1 [47]. The L-R and R-L lines each preserve uniform spacing between phase encode
lines and prevent artifacts due to echo-misalignment [42]. In time frame n, SENSE [90]
reconstruction with a reduction factor of 2 is applied separately to the L-R and R-L lines.
The resulting two full-FOV images differ not in magnitude but in phase, and taking a
root sum-of-squares combination eliminates distortions in phase and recovers
signal-to-noise ratio (SNR). This non-accelerated double-alternating scheme can be
applied to any number of interleaves as long as L-R and R-L lines are alternating along
the phase encode direction. An even number of interleaves would require that the
phase-encode blips alternate in size.
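As an illustration of the unaliasing and combination steps, the sketch below unfolds one
pair of overlapping pixels with a reduction factor of 2 and then combines the L-R and R-L
results by root sum-of-squares. It is a minimal single-pixel example under stated
assumptions: an identity noise covariance (the work itself uses a measured noise
correlation matrix), made-up coil sensitivities, a 1/sqrt(2) normalization chosen here
only to keep the magnitude scale, and hypothetical function names.

```python
import cmath
import math


def sense_unfold_r2(aliased, sens_a, sens_b):
    """Least-squares unfolding of two overlapping pixels (reduction factor 2).

    aliased: one complex aliased value per coil
    sens_a, sens_b: per-coil sensitivities at the two overlapping locations
    Assumes identity noise covariance, so the solution is (S^H S)^-1 S^H a.
    """
    # Entries of the 2x2 Hermitian matrix S^H S
    aa = sum(abs(s) ** 2 for s in sens_a)
    bb = sum(abs(s) ** 2 for s in sens_b)
    ab = sum(sa.conjugate() * sb for sa, sb in zip(sens_a, sens_b))
    # Entries of S^H a
    ya = sum(sa.conjugate() * a for sa, a in zip(sens_a, aliased))
    yb = sum(sb.conjugate() * a for sb, a in zip(sens_b, aliased))
    det = aa * bb - abs(ab) ** 2
    va = (bb * ya - ab * yb) / det
    vb = (aa * yb - ab.conjugate() * ya) / det
    return va, vb


def rss(values):
    """Root sum-of-squares of complex values."""
    return math.sqrt(sum(abs(v) ** 2 for v in values))


# Hypothetical 4-coil sensitivities at two locations separated by FOV/2
sens_a = [1.0 + 0.0j, 0.6 + 0.2j, 0.2 - 0.1j, 0.1 + 0.3j]
sens_b = [0.1 + 0.1j, 0.3 - 0.2j, 0.8 + 0.0j, 0.9 + 0.4j]
true_a, true_b = 2.0, 5.0  # true magnitudes at the two locations


def simulate(phase):
    """Aliased coil data when both pixels carry an extra phase error."""
    pa = true_a * cmath.exp(1j * phase)
    pb = true_b * cmath.exp(1j * phase)
    return [sa * pa + sb * pb for sa, sb in zip(sens_a, sens_b)]


# L-R and R-L data differ in phase but not in magnitude
lr_a, lr_b = sense_unfold_r2(simulate(+0.3), sens_a, sens_b)
rl_a, rl_b = sense_unfold_r2(simulate(-0.3), sens_a, sens_b)

# Root sum-of-squares discards the phase difference between the two images
final_a = rss([lr_a, rl_a]) / math.sqrt(2)
final_b = rss([lr_b, rl_b]) / math.sqrt(2)
print(round(final_a, 6), round(final_b, 6))
```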
Figure 3.1(b) illustrates that automatic EPI ghosting correction and two-fold
acceleration can be achieved simultaneously by controlling interleaf order as shown. In
this case, full-FOV coil sensitivity maps are estimated on the fly based on four adjacent
time frames. After separating L-R and R-L data, a reduction factor of 4 can be used to
perform SENSE unaliasing operations. In general, to perform accelerated imaging with the
proposed method and an acceleration factor of A ≥ 2, the SENSE reconstruction will
require a reduction factor of 2A, which must not exceed the total number of coils, and
the most recent 2A time frames will be used for forming coil sensitivity maps. The
accelerated method can be applied to double-alternating EPI with oA interleaves for any
odd integer o using phase-encode blips with constant size, and with eA interleaves for
any even integer e using phase-encode blips with alternating size.
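The rules in this paragraph can be collected into a small helper. This is only a sketch of
the bookkeeping described above; the function name and the dictionary keys are
hypothetical, and treating A = 1 as the non-accelerated case is an assumption that matches
the non-accelerated description (reduction factor 2, two time frames).

```python
def accelerated_scheme(acceleration, num_interleaves, num_coils):
    """Summarize the reconstruction requirements described in the text.

    acceleration: acceleration factor A (1 is taken to mean non-accelerated)
    num_interleaves: number of EPI interleaves
    num_coils: number of receiver coils available
    """
    if num_interleaves % acceleration != 0:
        raise ValueError("interleaves must be an integer multiple of A")
    reduction = 2 * acceleration  # SENSE reduction factor 2A
    if reduction > num_coils:
        raise ValueError("reduction factor 2A exceeds the number of coils")
    multiple = num_interleaves // acceleration
    # Odd multiples of A use constant blips, even multiples alternate in size
    blips = "constant" if multiple % 2 == 1 else "alternating"
    return {
        "sense_reduction_factor": reduction,
        "frames_for_sensitivity_maps": reduction,
        "phase_encode_blips": blips,
    }


print(accelerated_scheme(1, 3, 4))  # three interleaves, no acceleration
print(accelerated_scheme(2, 2, 8))  # two interleaves, two-fold acceleration
```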
1D phase correction [16] is used for comparison in the phantom and in vivo studies
because it also does not rely on any calibration, and is compatible with real-time
imaging. For each coil, a 1D phase map representing constant and linear phase errors
is computed using phase differences between the L-R and R-L images. 1D phase
correction is performed using Eqn. (12) from Ref. [16]. A root-sum-of-squares operation is
performed to produce the final corrected image, with the images from all coils considered
for reconstruction.
Experiments were performed on a Signa Excite 3 T scanner (GE Healthcare, Waukesha,
WI) with gradients capable of 40 mT/m amplitudes and 150 mT/m/ms slew rates.
The receiver bandwidth was set to ±125 kHz, i.e., 4 µs sampling time. The body coil,
capable of peak B1 of 16 µT, was used for RF transmission and an 8-channel cardiac
array coil was used for signal reception.
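The relation between the stated receiver bandwidth and the sampling time is simple
arithmetic: a ±125 kHz receiver spans 250 kHz, and the complex sampling interval is the
reciprocal of that span. The helper name below is hypothetical.

```python
def sampling_time_us(half_bandwidth_khz):
    """Complex sampling interval in microseconds for a +/- half_bandwidth_khz receiver."""
    full_bandwidth_hz = 2 * half_bandwidth_khz * 1e3
    return 1e6 / full_bandwidth_hz


print(sampling_time_us(125))
```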
Circular EPI (CEPI) trajectories [85], which have a circular k-space footprint, were
designed in MATLAB (The Mathworks, South Natick, MA). CEPI trajectories used
in the experiments produced a 10 to 15% reduction in readout duration compared to
conventional rectangular EPI trajectories. A bipolar pulse was added to the phase encode
gradient waveform prior to data acquisition in order to null the first moment in the
y direction at k_y = 0. Alternating interleaved EPI readouts with an odd number of
interleaves were used [17, 41], and this k-space traversal pattern was flipped in the
readout direction at every time frame. Echo-time shifting [28] was used to mitigate
off-resonance effects.
Real-time interactive cardiac scanning was performed using the RTHawk real-time
imaging platform. Scan planes were changed interactively by the operator in order to
test the performance of the proposed method at all angles. A spectral-spatial RF pulse
was used to excite water spins, with a 5.2 mm slice thickness, and 440 Hz bandwidth [80].
Flip angles of 15 to 25 degrees were used. The proposed reconstruction was performed
both off-line and on-line, and videos were produced off-line. Raw data from all four
anterior receiver channels were used to perform SENSE reconstruction in the
non-accelerated acquisition, and raw data from all eight elements were used to perform
SENSE reconstruction in the accelerated case. The noise correlation matrix used in SENSE
reconstruction was computed using the formula presented in the Appendix section of
Ref. [90] from raw data obtained with the RF excitation turned off.
Phantom experiments were conducted to quantitatively evaluate the level of ghost
suppression. A cylindrical phantom was imaged with axial, oblique, and double-oblique
scan orientations. The proposed method was compared with conventional 1D phase
correction [16]. The effectiveness of ghost suppression was evaluated by comparing
ghost-to-signal ratios (GSR) within the same manually selected region of interest (ROI).
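A ghost-to-signal ratio of this kind can be computed as sketched below. The text does not
state which ROI statistic was used, so the mean magnitude is an assumption here, as are the
function name and the toy image.

```python
def ghost_to_signal_ratio(image, signal_roi, ghost_roi):
    """Percent ratio of mean magnitude in the ghost ROI to that in the signal ROI.

    image: 2D list of (possibly complex) pixel values
    signal_roi, ghost_roi: lists of (row, col) pixel coordinates
    Assumption: mean magnitude is the ROI statistic; the text does not specify it.
    """
    def mean_magnitude(roi):
        return sum(abs(image[r][c]) for r, c in roi) / len(roi)

    return 100.0 * mean_magnitude(ghost_roi) / mean_magnitude(signal_roi)


# Toy 4x4 image: bright object in the middle rows, weak ghost in the outer rows
image = [
    [0.5, 0.5, 0.0, 0.0],
    [10.0, 10.0, 10.0, 10.0],
    [10.0, 10.0, 10.0, 10.0],
    [0.3, 0.3, 0.0, 0.0],
]
signal_roi = [(1, 1), (1, 2), (2, 1), (2, 2)]
ghost_roi = [(0, 0), (0, 1), (3, 0), (3, 1)]
gsr = ghost_to_signal_ratio(image, signal_roi, ghost_roi)
print(round(gsr, 2))
```

Using the same pair of ROIs for every reconstruction of a given scan plane, as described
above, keeps the ratios comparable across correction methods.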
Cardiac in vivo experiments were performed on three healthy volunteers without
gating or breath-holding, and were evaluated qualitatively. Each subject was screened
and provided informed consent in accordance with institutional policy.
The non-accelerated automatic ghosting correction method was implemented in C++,
within the RTHawk real-time reconstruction software [96]. The LAPACK linear algebra
package was used for the matrix inversion operations required during SENSE
reconstruction. A Linux personal computer (Compaq R3000) with a single 3.2 GHz Intel CPU
and 896 MB RAM was used for real-time reconstruction. Data were sent from the host
computer to the reconstruction computer via Ethernet after each TR [96]. Data from the
four anterior elements of the eight element receiver array were used for reconstruction.
Reconstruction time was measured separately using the POSIX "gettimeofday"
function for the following four reconstruction steps: 1) coil sensitivity map estimation, 2)
aliased image reconstruction, 3) SENSE matrix inversion, and 4) image display.
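Per-step timing of this kind can be sketched as follows. The original measurements were
made in C++; this Python version is only an illustration of the bookkeeping, and the four
workloads are empty placeholders rather than the actual reconstruction steps.

```python
import time


def time_steps(steps, repeats=10):
    """Return the average wall-clock time in milliseconds for each named step.

    steps: list of (name, callable) pairs executed in order
    """
    totals = {name: 0.0 for name, _ in steps}
    for _ in range(repeats):
        for name, func in steps:
            start = time.perf_counter()
            func()
            totals[name] += (time.perf_counter() - start) * 1e3
    return {name: total / repeats for name, total in totals.items()}


# Placeholder workloads standing in for the four reconstruction steps
steps = [
    ("coil sensitivity map estimation", lambda: sum(range(2000))),
    ("aliased image reconstruction", lambda: sum(range(2000))),
    ("SENSE matrix inversion", lambda: sum(range(2000))),
    ("image display", lambda: None),
]
averages = time_steps(steps)
for name, ms in averages.items():
    print(f"{name}: {ms:.3f} ms")
```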
Table 3.1: Ghost-to-signal ratios from the phantom study (see Fig. 3.2) for 1D phase
correction and the proposed method. Mean and standard deviation of g-factor values for
the proposed method are also reported.
                  1D phase correction     Proposed method
Scan plane        Ghost-to-signal ratio   Ghost-to-signal ratio   g-factor (mean ± std)
Axial             3.94 %                  4.24 %                  1.26 ± 0.25
Oblique           8.76 %                  3.40 %                  1.28 ± 0.28
Double oblique    6.18 %                  2.03 %                  1.27 ± 0.26
3.3 Results
In the figures, unless otherwise noted, the readout and phase encode directions
correspond to the horizontal and vertical axes, respectively. The uncorrected images in
Figures 3.2 and 3.3 were reconstructed by performing an inverse Fourier transform of the
raw data without separating the L-R and R-L acquisitions, and without applying any
post-processing method for EPI ghosting correction.
Figure 3.2 contains phantom images reconstructed from data acquired in axial, oblique,
and double oblique scan planes. Both 1D phase correction and the proposed method
produce visually comparable ghost-free images in the axial scan plane. However, in oblique
and double-oblique scan planes, ghosting artifacts are prominent when using
conventional 1D phase correction but are significantly reduced when using the proposed
method. Ghost-to-signal ratios (GSR) are listed in Table 3.1 and demonstrate the
effectiveness of the proposed method. The GSRs for oblique and double oblique scan planes
for the proposed method were 3.40% and 2.03% respectively, whereas those for 1D phase
correction were 8.76% and 6.18%. The mean and standard deviation of g-factor values for
the proposed method are also listed in Table 3.1. The average g-factor values ranged from
1.2 to 1.3 in all three scan planes considered.
Figure 3.3 contains in vivo cardiac images for four standard cardiac views in one
representative volunteer. Uncorrected images are shown alongside images reconstructed
using 1D phase correction and the proposed method. Uncorrected images exhibit ghosting
artifacts in all four cardiac views, and the artifacts are most severe in the two chamber
and four chamber views. Both 1D phase correction and the proposed method produce
appreciable suppression of ghost artifacts in all views. The proposed method produces
better image quality and improved ghost suppression compared to 1D phase correction
(indicated by white arrows in Fig. 3.3). However, the proposed method experiences noise
amplification due to the use of parallel imaging with a reduction factor of 2. The mean and
standard deviation of g-factor values were 1.45 ± 0.50, 1.75 ± 0.94, 1.34 ± 0.33, and
1.34 ± 0.28, for the axial, two-chamber, four-chamber, and short axis views, respectively.
Figure 3.4 illustrates a double-oblique short-axis view during rapid scan-plane
rotation. The scan plane was rotated in increments of 10 degrees continuously throughout
the acquisition. Ghosting artifacts are suppressed even during rapid changes in scan
orientation. Note that images at 0.9, 1.2, and 3.3 seconds appear blurred because they
occur during scan plane changes.
Figure 3.5 compares two-fold acceleration with no acceleration and also illustrates
the effect of motion on low temporal resolution coil sensitivity maps. Coil sensitivity
maps were reconstructed using either I) data temporally adjacent to the target data, or
II) data obtained from a stable diastolic phase. A FOV of 33 cm (rather than 25 cm)
was used to mitigate the g-factor increase when using a reduction factor of 4 with our
8-element cardiac array. Two-fold accelerated images (Fig. 3.5c,h) exhibit much lower
SNR than non-accelerated images (Fig. 3.5b,g) because of the reduced acquisition time
and the highly elevated g-factor. In some areas the g-factor reached 10 (Fig. 3.5e). Coil
sensitivity maps obtained using calibration scheme I exhibit motion induced ghosting
artifacts (Fig. 3.5a, solid arrow), while those obtained using calibration scheme II exhibit
substantially reduced motion artifacts (Fig. 3.5f, solid arrow). Ghosting artifacts in the
coil sensitivity maps lead to residual ghosting in the final images (Fig. 3.5b,c compared
to Fig. 3.5g,h). The mean and standard deviation of g-factor values were 1.25 ± 0.37
and 2.05 ± 0.90 for the non-accelerated and two-fold accelerated cases using calibration
scheme I, and were 1.16 ± 0.25 and 1.79 ± 0.58 for the non-accelerated and two-fold
accelerated cases using calibration scheme II. Two-fold accelerated images exhibit less
temporal blurring of structures because of the reduced acquisition time (e.g., descending
aorta indicated by dashed arrows).
Reconstruction time for the non-accelerated method was measured for the real-time
reconstruction algorithm. Average runtime measurements for the different reconstruction
steps were: 45.04 ms for the computation of coil sensitivity maps, 41.45 ms for the
computation of aliased images, 57.44 ms for SENSE matrix inversion operations, and
0.84 ms for image display. The total reconstruction time was approximately 144 ms per
frame, while the acquisition time per image was 60 ms. This indicates that pipelining this
computation across three processors will be sufficient for real-time reconstruction using
commercially available personal computer hardware.
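The three-processor estimate follows from dividing the per-frame reconstruction time by
the per-frame acquisition time. The sketch below repeats that arithmetic with the four
reported step times. It assumes ideal pipelining with no communication overhead, which
is a simplification introduced here.

```python
import math

# Average step times reported above, in milliseconds
step_times_ms = {
    "coil sensitivity maps": 45.04,
    "aliased images": 41.45,
    "SENSE matrix inversion": 57.44,
    "image display": 0.84,
}
acquisition_ms = 60.0  # acquisition time per image

total_ms = sum(step_times_ms.values())
# Each processor must finish one frame before its next frame arrives, so the
# number of processors is the reconstruction-to-acquisition ratio rounded up.
processors = math.ceil(total_ms / acquisition_ms)
print(round(total_ms, 2), processors)
```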
3.4 Discussion
The proposed method effectively eliminated ghosting artifacts due to EPI
echo-misalignment in arbitrary double-oblique scan planes. When applied to real-time
interactive imaging, it automatically corrected ghosting artifacts without a calibration
scan whenever a scan plane change occurred. This automatic capability is attributed to the
fact that ghost-free coil sensitivity maps are updated with a few recent time frames, e.g.,
two time frames for the non-accelerated acquisition and four time frames for the two-fold
accelerated acquisition. In this method, a factor of two in SENSE reduction is used solely
for correcting EPI ghosting artifacts, and not for accelerating data acquisition. In the
non-accelerated acquisition, where a SENSE reduction factor of two is used, noise
amplification due to the SENSE matrix inversion was relatively insignificant. However, in
the two-fold accelerated acquisition, where a SENSE reduction factor of four is used, the
SENSE noise amplification was severe. This can be mitigated by using 16-channel,
32-channel, or larger receiver coil arrays for which rate-4 parallel imaging has been
demonstrated with reasonable g-factors [36].
A drawback of the proposed method is that lower temporal resolution coil sensitivity
maps suffer from motion induced ghosting artifacts when cardiac motion occurs during
coil calibration. Coil sensitivity maps corrupted with motion artifacts produce residual
artifacts in final corrected images. A simple way of alleviating this effect is to use the
data from a relatively stationary diastolic cardiac phase to construct coil sensitivity maps
that are free from motion induced ghosting. In the experiments, the use of stationary
frames for coil calibration substantially reduced ghosting artifacts in final corrected
images. Alternatively, the use of temporal low-pass filtering in coil calibration may also
mitigate ghosting artifacts in coil sensitivity maps [47].
When the non-accelerated correction method was applied to real-time cardiac imaging
at 3 T, overall reconstructed images demonstrated high temporal resolution, excellent
suppression of ghosting artifacts, and high blood-myocardium contrast. In systolic cardiac
phases, subtle but noticeable FOV/2 ghosting was observed due to rapid cardiac motion.
In end-diastolic cardiac phases, almost no ghosting was observed. While three-interleaf
gradient-echo EPI was primarily used, I also experimented with other odd numbers of
interleaves. Single shot EPI was considered as a way to further accelerate acquisition, but
signal loss and blurring due to T2* relaxation proved to be limiting, given the same spatial
resolution as the imaging protocol used in Fig. 3.3. The use of five or more interleaves
increased the prevalence of artifacts due to cardiac motion and the lowered temporal
resolution.
In conclusion, an interleaved gradient-echo EPI acquisition strategy and a PAGE-based
reconstruction technique have been presented as a means for automatically correcting
EPI ghosting artifacts due to echo-misalignment. The method was applied successfully
to real-time interactive cardiac imaging at 3 T, with superior performance compared to
conventional 1D correction. Ghosting artifacts were automatically corrected at arbitrary
oblique scan planes, and high-quality ghost-free images were obtained with 3.1 mm
spatial resolution and 60 ms temporal resolution. The automatic EPI ghosting correction
method utilizes parallel imaging with a reduction factor of 2, and may also be compatible
with further acceleration using higher reduction factors when 16, 32, or higher channel
receiver coil arrays are used. The feasibility of real-time reconstruction using commercially
available workstations was demonstrated.
3.5 Application to Upper Airway Imaging
The proposed "double-alternating" EPI acquisition method was tested on a midsagittal
slice of the human upper airway. To reduce the readout duration and shorten the echo
time, partial k-space sampling was used where only 66.7 % of the k-space lines were
acquired. Pixel resolution was 60 × 60. ETL was 8. The k-space data from every 10
interleaves (i.e., 10 TRs) were combined and used for image reconstruction. This resulted
in a frame with 77 ms temporal resolution. Frames were updated at every 5 TRs, in which
TR was 7.704 ms. Designed FOV was 18 × 18 cm^2 and a 110% FOV scale factor was used so
that the final in-plane spatial resolution was 3.3 × 3.3 mm^2.
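The temporal and spatial resolution figures quoted above follow directly from the stated
scan parameters. The short sketch below reproduces the arithmetic; the variable names are
chosen here for illustration.

```python
tr_ms = 7.704            # repetition time
trs_per_frame = 10       # interleaves (TRs) combined per reconstructed frame
trs_per_update = 5       # TRs between successive frame updates
matrix_size = 60         # pixels along each in-plane axis
designed_fov_cm = 18.0   # designed field of view
fov_scale = 1.10         # 110% FOV scale factor

temporal_resolution_ms = trs_per_frame * tr_ms
update_interval_ms = trs_per_update * tr_ms
spatial_resolution_mm = designed_fov_cm * 10.0 * fov_scale / matrix_size

print(round(temporal_resolution_ms, 2),
      round(update_interval_ms, 2),
      round(spatial_resolution_mm, 2))
```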
Experiments were performed on a Signa Excite 1.5 T scanner (GE Healthcare, Waukesha,
WI) with gradients capable of 40 mT/m amplitudes and 150 mT/m/ms slew rates.
The receiver bandwidth was set to ±125 kHz, i.e., 4 µs sampling time. The body coil,
capable of peak B1 of 16 µT, was used for RF transmission and a custom 4-channel upper
airway coil was used for signal reception.
One subject was screened and provided informed consent in accordance with
institutional policy. Audio signals were recorded inside the magnet. The subject was scanned
in supine position and repeated the following TIMIT [30, 118] sentences: "she had your
dark suit in greasy wash water all year" and "Don't ask me to carry an oily rag like that".
MRI gradient acoustic noise was eliminated based on Ref. [13]. The frames corresponding
to "all year" could be identified with the aid of the synchronized audio and video.
Figure 3.6 shows 20 frames representing the utterance of "all year". Images are free from
ghosting artifacts even though the scan plane is slightly double-oblique from the on-axis
sagittal plane. Unlike in spiral imaging, air-tissue boundaries do not suffer from a large
degree of spatial blurring, but may suffer from geometric distortion along the phase encode
direction.
[Figure 3.1 flowchart. Panel labels: time frames n-1 and n in (a), time frames n-3 to n in
(b); interleaves i1, i2, i3 with + and - readout polarity along ky; inverse FT of L-R and
R-L data into sensitivity maps and into half-FOV images in (a) or FOV/4 images in (b);
SENSE reconstruction into L-R and R-L full-FOV images; root sum-of-squares; final image.]
Figure 3.1: Reconstruction flowchart for the proposed ghosting correction method for (a)
non-accelerated and (b) two-fold accelerated EPI with automatic ghosting correction. (a)
The acquisition and reconstruction as a function of time is shown for three interleaves,
and can be applied to any odd number of interleaves with constant phase encode blip
size, or to any even number of interleaves with an alternating phase encode blip size. At
time frame n-1, a full k-space data set is acquired with the interleaf ordering: i1-, i2+,
i3-. At time frame n, another full data set is acquired with ordering: i1+, i2-, i3+. These
two sets are repeated continuously. Note that '-' indicates that the readout gradient is
flipped. To reconstruct a ghost-free image for time frame n, L-R data lines and R-L data
lines are separated. The most recent two temporal frames are used to form coil sensitivity
maps separately from L-R and R-L full-FOV data (middle row). SENSE reconstruction
is used to form full-FOV L-R and R-L images for the current time frame, which are then
combined using root-sum-of-squares, to produce a final image representing time frame n.
(b) When accelerating data acquisition time by a factor of two, a SENSE reduction factor
of four is needed. The most recent four temporal frames are used to form coil sensitivity
maps. In general, if a reduction factor of R can be achieved for a particular coil geometry
and scan planes, it can be combined with the proposed EPI strategy with an acceleration
factor of R/2 since one half of the reduction factor is used for separating L-R and R-L
lines during reconstruction and the remainder can still be used for acceleration.
Figure 3.2: Real-time cylindrical phantom images reconstructed with non-accelerated EPI
data. Data from four receiver coils (two from anterior and the other two from posterior
receiver coils) in the 8-channel cardiac array coil were used for reconstruction.
Reconstruction results for axial (top row), oblique (middle row), and double-oblique
(bottom row) scan planes are shown using no correction, 1D phase correction, and the
proposed correction method. The blue and red contours, superimposed on uncorrected
images, represent regions of interest (ROIs) for signal and ghost, respectively. The pixel
values in the ROIs were used to compute ghost-to-signal ratios. Images reconstructed with
the proposed method are ghost-free even in oblique and double-oblique scan planes. Scan
parameters: Double-alternating CEPI with five interleaves, ETL = 19, FOV = 21 × 21 cm^2,
spatial resolution = 2.2 × 2.2 mm^2, TR = 39 ms, and time-per-image = 195 ms.
Figure 3.3: In vivo real-time cardiac images reconstructed with non-accelerated EPI data.
Data from four anterior receiver coils in the 8-channel cardiac array coil were used for
reconstruction. Uncorrected (left column), 1D phase corrected (middle column), and
corrected images with the proposed correction scheme (right column) are shown for four
standard views: axial, two chamber, four chamber, and short axis. Ghosting artifacts are
substantially reduced in all four corrected images with the proposed method. The white
arrows indicate that residual ghosting artifacts are clearly visible in images reconstructed
with the 1D phase correction method, but they are not observed in images reconstructed
with the proposed method. Scan parameters: Double-alternating CEPI with three
interleaves, ETL = 27, FOV = 25 × 25 cm^2, spatial resolution = 3.1 × 3.1 mm^2,
TR = 20 ms, and time-per-image = 60 ms.
Figure 3.4: Automatic correction during continuous scan plane rotation. Scan parameters
were identical to those described in Fig. 3.3. Reconstructed images using the proposed
method are shown with respect to time. Numbers in the lower right corners of the images
denote time in seconds. The interval between images is 300 ms, which corresponds to
a spacing of five time frames. Note the decrease in contrast between myocardium and
blood during the systolic phase in the cardiac cycle (see images at 3.6 and 4.5 seconds).
Figure 3.5: Automatic ghosting correction using no acceleration and two-fold acceleration
(b,c,g,h) and corresponding g-factor maps (d,e,i,j) reconstructed using two coil calibration
schemes. The target data are from a systolic cardiac phase where cardiac motion is
substantial. g-factor maps are shown using a scale from 1 to 10. Note that g-factor
values from calibration scheme II (i,j) are significantly lower than those from calibration
scheme I (d,e), especially for the two-fold accelerated case. Motion artifacts are noticeably
reduced for the two-fold accelerated case when using calibration scheme II (solid arrows).
Two-fold accelerated images have lower SNR because of reduced acquisition time and
the elevated g-factor, but show less temporal blurring in the descending aorta (dashed
arrows) compared to non-accelerated images. Scan parameters: Double-alternating CEPI
with two interleaves, ETL = 50, FOV = 33 × 33 cm^2, spatial resolution = 3.3 × 3.3 mm^2,
TR = 34 ms, and time-per-image = 136 ms (full FOV low temporal resolution for coil
calibration), 68 ms (non-accelerated), 34 ms (two-fold accelerated).
[Figure 3.6 image panels: frames 1 to 20.]
Figure 3.6: Dynamics of vocal tract shaping during natural speech utterances of "all
year". The phase encode is along the horizontal axis and the frequency encode is along
the vertical axis. SENSE reconstruction was not used primarily because images from the
two posterior coil elements exhibited aliasing artifacts in the vocal tract ROIs. Temporal
resolution was 77 ms. Frames were updated every 38.52 ms.
Chapter 4
Accelerated 3D Imaging Using Compressed Sensing
4.1 Single-coil Imaging
4.1.1 Introduction
Three-dimensional (3D) imaging of the upper airway during sustained sound
production has recently emerged as a promising tool in speech production research as a
means to capture the full geometry of the vocal tract. The diversity of tongue shapes and
dynamics are made possible, at least in part, through different lingua-palatal bracing
mechanisms [101, 102, 75, 2] leading to complex airway geometries, the understanding
of which is critical for investigations into the production of both normal and disordered
speech. In addition to helping shed light on the intricate airway shaping mechanisms
underlying the production of various linguistically-meaningful speech sounds, 3D imaging
also lends itself to providing quantitative volumetric information of the airway regions.
The shaping of the tongue and other articulators, and the temporal characteristics of
their shaping, give rise to characteristic patterns of acoustic resonance behavior of the
vocal tract that define the properties of human speech that can be modeled with such
quantitative information.
Recent work has shown that three-dimensional tongue shape and the dynamics
underlying shape formation are critical to understanding natural linguistic classes and
issues of phonological representation as evidenced in speech motor control. Previous
models of speech production often assumed that the position of maximum constriction,
defined in the midsagittal plane, was the main "place of articulation" parameter. Imaging
studies such as those by Narayanan et al. [78] have suggested that articulation cannot be
characterized solely by identifying a constriction position and that speech production
targets go beyond the midsagittal plane. Initial speech studies using MRI focused on vowel
sounds [6, 103]. The models of the vocal tract constructed from the MR images of
different vowels yielded good estimations of vowel formant frequencies and formant
patterns, which agreed with the general acoustic implication of the notion of the tongue
height and backness on vowel articulation. For example, the study by Narayanan et al. [77]
that focused on tongue shaping and 3D vocal tract data and models for the American
English vowels /a/, /i/, /u/ showed distinct differences in tongue shaping: the anterior
tongue was raised and convex for /i/ compared to the lowered concave shape for /a/ while
the tongue back showed an opposite trend in the degree of concavity. These data were used
in a finite element based simulation of the vocal tract models to study the acoustic
properties of the vowel sounds. Other studies have investigated a variety of continuant
consonant sounds such as fricatives and liquids. Narayanan et al. [75] examined vocal tract
shaping of consonants using MRI and other articulatory measurements, and have presented
data and results on three dimensional vocal tract and tongue shapes for fricative sounds
produced by talkers of American English. These data showed key differences in tongue
shaping between the sibilants /s/ (concave, grooved) and /S/ (convex, cupped) and were
helpful in deriving meaningful acoustic source models for these sounds [74]. Using
insights gained in imaging work, in conjunction with the quantitative data of vocal tract
area functions and sublingual cavity of Alwan et al. [2], Espy-Wilson et al. [27] created
acoustic models for the American-English /r/ delineating clearly the role of the oral and
pharyngeal constrictions and the sublingual volume. Similar advances have been made
toward understanding the acoustics of lateral sounds [7, 115]. While these studies
represent significant progress in speech research, they can be further improved by
addressing certain technological limitations.
These previous MRI studies were based on 2D multi-slice acquisitions, requiring
multiple repetitions of the same sound and scan-time on the order of several minutes [75, 2,
78, 6, 103, 7, 76, 117]. These procedures are prone to data inconsistency, resulting from
slightly different positions of the jaw, head, and tongue during each repetition.
Compared to 2D multi-slice, it is well known that 3D encoding provides contiguous coverage
with the potential for thinner slices and improved signal-to-noise ratio (SNR) efficiency.
However, 3D encoding with high spatial resolution currently requires prohibitively long
scan time and easily exceeds the normal duration of sustained sound production with
minimal subject motion.
3D MRI scans may be accelerated using time-efficient k-space sampling [45], parallel
imaging [33, 90], or with the recently developed approach of compressed sensing [20, 25,
66, 10]. Many of the efficient k-space sampling schemes (based on spiral and echo-planar
trajectories) are prone to severe blurring artifacts and geometric distortions due to
off-resonance at the air-tissue boundaries. Parallel imaging requires the design and use of
receiver coil arrays where the coil elements have differing sensitivity over the anatomic
region of interest [39]. Compressed sensing MRI (CS-MRI) relies only on sparsity of the
final reconstructed image in a transform domain [20, 25, 66, 10].
In this manuscript, I investigate the use of CS-MRI for accelerated 3D upper airway
imaging, and investigate the potential benefit of phase constraints. MR images often have
spatially varying phases whose sources may include receiver coil phase, gradient/DAQ
delays, off-resonance, flow and motion. Phase constrained (PC) CS, originally proposed
by Lustig et al. [66], applies a low spatial resolution phase estimate as part of the encoding
function. This is expected to increase sparsity of the solution in certain transform domains
(e.g., finite difference). I explore the use of PC-CS in this application, because air-tissue
boundaries are the primary features of interest and are expected to experience substantial
phase variation due to air-tissue susceptibility. I compare phase estimation from a
low-resolution fully sampled regime with a two-stage approach that estimates the object
phase map from a non-PC CS reconstruction. In retrospective sub-sampling experiments with
no sound production, CS reconstructed images with and without phase constraints were
compared qualitatively. Undersampled 3DFT acquisition and PC-CS reconstruction were
then prospectively applied with acceleration factors of 3, 4, and 5, to high-resolution 3D
vocal tract scanning during sustained sound production of English consonants /s/, /S/,
/l/, /r/, sounds characterized by complex tongue and airway shaping.
4.1.2 Theory
Consider 3DFT imaging, where $k_y$ and $k_z$ are the phase encoding directions, and
therefore the axes of undersampling. After 1D Fourier transformation along $k_x$, the
signal for each x position can be expressed as:
$$ s(\mathbf{k}_j) = \sum_{l=1}^{L} e^{-i 2\pi \mathbf{k}_j \cdot \mathbf{r}_l} \, e^{i\phi(\mathbf{r}_l)} \, m(\mathbf{r}_l) + n(\mathbf{k}_j). \qquad (4.1) $$
Here, $\mathbf{k}_j$ is the $j$-th k-space sample location in the ($k_y$, $k_z$) domain and
$1 \le j \le J$, where $J$ is the total number of phase encodes. $\mathbf{r}_l$ is the $l$-th
spatial position in the (y, z) image domain, and $L$ is the total number of pixels. $\phi$ is
the phase in the (y, z) image domain, $m$ is the desired magnitude image (representing
amplitude of transverse magnetization) in the (y, z) image domain, and $n$ is i.i.d.
(independent and identically distributed) additive white Gaussian noise. Because Eq. 4.1
holds for $1 \le j \le J$, there exist $J$ linear equations that can be expressed as one
matrix equation:
$$ \mathbf{s} = \Phi P \mathbf{m} + \mathbf{n}. \qquad (4.2) $$
Here, the signal vector $\mathbf{s}$ is $[s(\mathbf{k}_1)\; s(\mathbf{k}_2)\; \cdots\; s(\mathbf{k}_J)]^T$, $\Phi$ is the $J \times L$ Fourier encoding
matrix, where $\Phi(j,l) = e^{-i 2\pi \mathbf{k}_j \cdot \mathbf{r}_l}$, and $P$ is an $L \times L$ diagonal matrix, where the $l$th diagonal
element is $e^{i\phi(\mathbf{r}_l)}$. $\mathbf{m} = [m(\mathbf{r}_1)\; m(\mathbf{r}_2)\; \cdots\; m(\mathbf{r}_L)]^T$ is the unknown image estimate and
$\mathbf{n} = [n(\mathbf{k}_1)\; n(\mathbf{k}_2)\; \cdots\; n(\mathbf{k}_J)]^T$. When $J \ll L$, Eq. 4.2 becomes a highly underdetermined
linear system, and infinitely many solutions for $\mathbf{m}$ exist. Compressed sensing theory
states that $\mathbf{m}$ can be exactly recovered with a very high probability when $\mathbf{m}$ is sparse in a
transform domain, by minimizing the $\ell_1$-norm of the sparsifying transform of the solution
under the constraint that $\|\mathbf{s} - \Phi P \mathbf{m}\|_2$ is close to zero. Unconstrained optimization is
more practical for large-scale reconstruction problems such as MRI image reconstruction;
therefore, the unknown image estimate $\mathbf{m}$ is obtained by minimizing the following convex
function:
\[
f(\mathbf{m}) = \|\mathbf{s} - \Phi P \mathbf{m}\|_2^2 + \lambda \|\Psi \mathbf{m}\|_1. \tag{4.3}
\]
Here, $\lambda$ is a regularization parameter that controls the relative weight of sparsity and data
fitting, and $\Psi$ is a sparsifying transform (e.g., wavelets, curvelets, or finite difference).
In this work, I adopted the finite difference sparsifier that contains the horizontal and
vertical gradients of the image. In the absence of $P$ (i.e., $P$ is the identity matrix), the
optimization problem is referred to as non-phase-constrained CS reconstruction. In phase
constrained CS reconstruction, $P$ contains a predetermined estimate of the object phase,
which may originate from system delays, receiver coil phase, and phase accrual due to
off-resonance.
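The cost function in Eq. 4.3 can be illustrated with a minimal NumPy sketch. It is not the reconstruction code used in this work: it assumes Cartesian sampling on the full (k_y, k_z) grid so that the encoding matrix can be applied as a 2D FFT followed by a sampling mask, it uses periodic boundaries for the finite differences, and the function names `finite_differences` and `pc_cs_cost` are hypothetical.

```python
import numpy as np

def finite_differences(image):
    """Horizontal and vertical first differences (periodic boundary is an assumption)."""
    dy = np.roll(image, -1, axis=0) - image
    dz = np.roll(image, -1, axis=1) - image
    return dy, dz

def pc_cs_cost(m, s, mask, phase, lam):
    """Evaluate f(m) = ||s - Phi P m||_2^2 + lam * ||Psi m||_1 (Eq. 4.3).

    m     : image estimate, shape (Ny, Nz)
    s     : k-space data on the full grid (values at unsampled locations are ignored)
    mask  : boolean array, True at acquired (k_y, k_z) locations
    phase : phase estimate in radians; all zeros corresponds to non-PC CS
    lam   : regularization parameter
    """
    # P m applies the phase estimate; Phi is a 2D DFT restricted to sampled locations.
    kspace = np.fft.fft2(np.exp(1j * phase) * m)
    residual = (s - kspace)[mask]
    data_term = np.sum(np.abs(residual) ** 2)
    # Psi m: horizontal and vertical finite differences, penalized with the l1-norm.
    dy, dz = finite_differences(m)
    sparsity_term = np.sum(np.abs(dy)) + np.sum(np.abs(dz))
    return data_term + lam * sparsity_term
```

Setting `phase` to zeros makes P the identity, which corresponds to the non-phase-constrained case described above.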
4.1.3 Materials and Methods
4.1.3.1 Data Acquisition
Experiments were performed on a Signa Excite HD 3.0 T scanner (GE Healthcare,
Waukesha, WI) with gradients capable of 40 mT/m amplitude and 150 mT/m/ms slew rate.
The receiver bandwidth was set to ±125 kHz (i.e., 4 µs sampling rate). A birdcage head
coil was used for RF transmission and signal reception. Each subject was screened and
provided informed consent in accordance with institutional policy.
The vocal tract region of interest was imaged using a single midsagittal slab with 8 cm
thickness in the right-left (R-L) direction. The readout direction was superior-inferior
(S-I) and the phase encode directions were anterior-posterior (A-P) and right-left (R-L)
(see Fig. 4.1). A gradient echo sequence was used with TE = 2.2 msec, TR = 4.6 msec, flip
angle = 5°, NEX = 1, spatial resolution = 1.5 × 1.5 × 2.0 mm³, and FOV = 24 × 24 × 10
cm³.
Pseudo-random undersampling was implemented as follows. First, two independent
and uniformly distributed random numbers corresponding to k-space radius and
azimuthal angle were generated to create a pseudo-random ($k_y$, $k_z$) location in polar form.
From the randomly chosen samples, the nearest ($k_y$, $k_z$) Cartesian phase encodes were
selected for sampling. This scheme achieves a sampling density that is inversely
proportional to k-space radius. Second, a low spatial frequency region, whose outermost k-space
radius was 30% of the full k-space radius, was fully sampled. The final sampling patterns and
corresponding reduction factors are shown in Fig. 4.2.
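The two-step sampling scheme above can be sketched as follows. This is an illustrative reconstruction of the procedure, not the code used for the experiments: the function name `polar_random_mask`, the batch size of 256 draws, the stopping rule (draw until the target number of distinct phase encodes is reached), and the normalization of both k-space axes to the range [-1, 1) are assumptions.

```python
import numpy as np

def polar_random_mask(ny=160, nz=50, reduction=5.0, center_frac=0.3, seed=0):
    """Boolean (k_y, k_z) sampling mask with density roughly proportional to 1/radius."""
    rng = np.random.default_rng(seed)
    # Normalized k-space coordinates; each axis spans [-1, 1).
    ky = (np.arange(ny) - ny // 2) / (ny / 2)
    kz = (np.arange(nz) - nz // 2) / (nz / 2)
    KY, KZ = np.meshgrid(ky, kz, indexing="ij")
    # Step 2 of the text: fully sample the low spatial frequency (elliptical) region.
    mask = np.hypot(KY, KZ) <= center_frac
    target = int(round(ny * nz / reduction))
    for _ in range(1000):  # safety bound on the number of batches
        if mask.sum() >= target:
            break
        # Step 1 of the text: uniform radius and uniform azimuthal angle.
        r = rng.uniform(0.0, 1.0, size=256)
        theta = rng.uniform(0.0, 2.0 * np.pi, size=256)
        # Snap each polar sample to the nearest Cartesian phase encode.
        iy = np.rint(r * np.cos(theta) * (ny / 2) + ny // 2).astype(int)
        iz = np.rint(r * np.sin(theta) * (nz / 2) + nz // 2).astype(int)
        ok = (iy >= 0) & (iy < ny) & (iz >= 0) & (iz < nz)
        mask[iy[ok], iz[ok]] = True
    return mask
```

Because the radius is drawn uniformly, the number of samples per unit k-space area falls off as one over the radius, which is the density property stated in the text.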
Figure 4.1: Illustration of scan plane prescription, which is used for the 3D upper airway
imaging. The dashed lines indicate the orthogonal slice orientation of each image. The
largest-width, medium-width, and smallest-width dashed lines are for the prescription of
the midsagittal, coronal, and axial slices, respectively. An 8 cm sagittal slab excitation is
applied to cover the vocal tract volume of interest. The readout direction is along S-I so
that the analog low-pass filter suppresses uninteresting regions (e.g., the brain and neck).
The features of interest include: [LL] lower lip, [UL] upper lip, [P] palate, [T] tongue
surface, [V] velum, [PW] pharyngeal wall, [E] epiglottis.
4.1.3.2 Image Reconstruction
Since all data sets were fully sampled along the readout ($k_x$) direction, data were first
inverse-Fourier transformed along the readout direction, and image reconstruction was
performed separately for each y-z planar section. For each x position, fully sampled
data sets were reconstructed using 2D inverse Fourier transform (IFT). For the simulated
and real undersampled acquisitions, un-acquired k-space locations were filled with zeros
prior to inverse Fourier transformation.
For PC-CS, the phase map was calculated in two ways: (PC-I) Taking a 2D inverse
Fourier transform of fully sampled low spatial frequency data. In order to remove Gibbs
ringing artifacts due to k-space truncation, the low spatial frequency data set was
multiplied by a 2D Hanning window. (PC-II) Taking the phase of the complex-valued image
estimate obtained from a non-PC CS iterative reconstruction. To avoid noise
contamination, the PC-II phase map was masked to contain only spatial locations where the
magnitude image was greater than 20% of its maximum value.
Figure 4.2: k-space sampling patterns (panels a to e, axes $k_y$ and $k_z$) used in the
experimental studies. Relative reduction factors are (a) 1, (b) 1.3, (c) 3, (d) 4, and (e) 5.
Note that the region inside the ellipse with radii 30% of the overall k-space was fully
sampled in all cases for the estimation of low-resolution image phase.
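The two phase estimates can be sketched in NumPy as below. This is a hedged illustration rather than the original implementation: it assumes centered k-space (DC at index [Ny//2, Nz//2]), a separable Hanning window spanning the fully sampled central region, and a phase of zero outside the PC-II mask. The function names `pc1_phase_map` and `pc2_phase_map` are hypothetical.

```python
import numpy as np

def pc1_phase_map(kspace, center_frac=0.3):
    """PC-I: phase of the Hanning-windowed, low spatial frequency image.

    kspace : centered 2D k-space, DC at [Ny // 2, Nz // 2]
    """
    ny, nz = kspace.shape
    hy = max(int(round(center_frac * ny / 2)), 1)
    hz = max(int(round(center_frac * nz / 2)), 1)
    wy = np.zeros(ny)
    wz = np.zeros(nz)
    # Window is nonzero only over the central region and tapers to zero at its edge.
    wy[ny // 2 - hy: ny // 2 + hy + 1] = np.hanning(2 * hy + 1)
    wz[nz // 2 - hz: nz // 2 + hz + 1] = np.hanning(2 * hz + 1)
    window = np.outer(wy, wz)
    lowres = np.fft.fftshift(np.fft.ifft2(np.fft.ifftshift(kspace * window)))
    return np.angle(lowres)

def pc2_phase_map(cs_image, threshold=0.2):
    """PC-II: phase of a non-PC CS image, kept where magnitude > threshold * max."""
    magnitude = np.abs(cs_image)
    mask = magnitude > threshold * magnitude.max()
    # Setting the phase to zero outside the mask is an assumption of this sketch.
    return np.where(mask, np.angle(cs_image), 0.0), mask
```

In the PC-II case, `cs_image` would be the complex-valued output of a non-PC CS reconstruction, which is why that approach needs one extra iterative reconstruction.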
CS reconstructions from undersampled data sets were based on an iterative non-linear
conjugate gradient algorithm [66] which sought to find a global minimum for the cost
function in Eq. 4.3. The $\ell_1$-norm of the finite difference of the solution (also known as
Total Variation [95]) was used as a regularizer. The regularization parameter $\lambda$ was chosen
based on the L-curve method [35]. I examined the tradeoff between data consistency and
total variation for a broad range of $\lambda$ values (see Fig. 4.3) prior to selecting a $\lambda$ value
for image reconstruction from prospectively acquired data. To speed up reconstructions
over a broad range of $\lambda$ values, the final image from a particular $\lambda$ value was used as the
initial image estimate for the CS reconstruction with the next higher $\lambda$ value.
4.1.3.3 In Vivo Experiments
Subjects were in supine position and their heads were immobilized by inserting foam
pads between their ears and the receiver coil. A fully sampled data set without sound
production was acquired in one trained subject, who held the mouth open for 36 seconds
without swallowing. A total of 8000 ($k_y$, $k_z$) encodes, where the number of $k_y$ and $k_z$
encodes was 160 and 50, respectively, was used to fully cover 3D k-space at the Nyquist
rate. This data set was retrospectively sub-sampled to simulate the sampling patterns
shown in Fig. 4.2. The CS reconstructions were performed both without and with phase
constraints.
Prospective accelerated acquisitions were performed by imaging the vocal tract
shaping during each sustained sound production of English consonants /s/, /S/, /l/, and
/r/. Scan times for the 3, 4, and 5-fold accelerated acquisitions were 12, 9, and 7
seconds, respectively. 2D CS reconstruction was performed for each axial slice. The initial
estimate for the CS reconstruction of a slice was taken from the final image estimate
obtained from the CS reconstruction of its adjacent slice. PC-CS reconstruction was
applied with $\lambda$ = 0.005 and 100 iterations for 65 contiguous slices of interest along x (i.e.,
S-I direction). 3D visualization of tongue shape was realized by manually segmenting the
tongue in each reconstructed coronal image, stacking the segmented slices, and finally
performing 3D volume rendering using the vol3d.m Matlab routine (publicly available at
http://www.mathworks.com). The final volume rendered tongue surfaces could be
displayed at any view angle, providing efficient visualization of tongue shaping.
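The reported scan times are consistent with a simple estimate, sketched below under the assumption that scan time equals the number of acquired phase encodes times TR, with no additional overhead. The function name `scan_time_seconds` is hypothetical.

```python
def scan_time_seconds(n_ky, n_kz, tr_ms, reduction=1.0):
    """Scan time assuming one phase encode per TR and no other overhead."""
    n_encodes = n_ky * n_kz / reduction
    return n_encodes * tr_ms / 1000.0

# Single-coil protocol from this section: 160 x 50 encodes, TR = 4.6 msec.
for reduction in (1, 3, 4, 5):
    print(reduction, round(scan_time_seconds(160, 50, 4.6, reduction), 1))
```

With these inputs the estimate gives roughly 36.8 s for full sampling and 12.3, 9.2, and 7.4 s for 3, 4, and 5-fold acceleration, in line with the 36, 12, 9, and 7 seconds quoted in the text.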
4.1.4 Results
Figure 4.3 shows an L-curve obtained from the non-PC CS reconstruction. The corner of
the L-curve was not sharp, and $\lambda$ = 0.005, which lies at the point of highest curvature, was
chosen as an optimal regularization parameter for both the non-PC and PC CS reconstructions. For
large values of $\lambda$ (i.e., $\lambda$ > 0.01 in Fig. 4.3), reconstruction strongly favored minimization
of total variation so that reconstructed images were observed to be overly smooth.
Figure 4.4 shows some representative axial slice images from the 3D data set acquired while the
subject held an open-mouthed position during the 36-second scan. The phase variation
pattern resulting from large off-resonance due to air-tissue susceptibility is not smooth,
and is related to the geometry of the air-tissue interface. This is observed especially in the
orofacial tissue and the lateral sides of the tongue (see the yellow arrows).
Figure 4.5 shows images from one axial slice extracted from the 3D volume in the
retrospective sub-sampling experiment. Figure 4.5a contains images obtained from IFT,
non-PC CS, PC-I CS, and PC-II CS reconstructions of the data sets sub-sampled with
different reduction factors. The image from the elliptic k-space full sampling (1.3x) was
comparable in image quality to that from the rectangular k-space full sampling (1x). The
IFT reconstructed images from the undersampled data exhibited incoherent aliasing
artifacts and the image quality was degraded with higher reduction factors. The non-PC CS
reconstruction improved image quality over the IFT reconstruction in terms of de-noising
and enhancement of the air-tissue boundaries. The PC-I and PC-II CS reconstructions
further improved the air-tissue boundary depiction quality for reduction factors 3, 4, and
5. Figure 4.5b contains phase difference images after the low and high resolution phase
maps were subtracted from the full resolution fully sampled reference phase map. Notice
the larger phase errors in the low resolution phase map (see Fig. 4.5b(iii)) particularly in
the ROIs with rapid phase variations (indicated by the white arrows in Fig. 4.5b(i,iii,v)).
Figure 4.5c compares the boundary depiction in ROIs with rapid phase variation for
different reconstruction schemes. The PC-II CS reconstruction clearly improved the
depiction of the air-tissue boundaries compared to PC-I CS reconstruction (see white arrows
in Fig. 4.5c).
Figure 4.6 shows a midsagittal slice and eight equally spaced coronal slices
reformatted from a 3D vocal tract volume obtained after the PC-II CS reconstructions. 3D
imaging provided many useful vocal tract shaping features that cannot be captured by
2D midsagittal imaging alone. The groove of the tongue surface could be clearly observed
in coronal sections in /s/ (see the white arrow in the /s/ row of Fig. 4.6). /S/ and /l/
sounds exhibited very similar vocal tract shaping patterns in the midsagittal scan plane,
but when comparing the coronal slices, the vocal tract cross sectional areas were
significantly different (see the white arrows in the /S/ and /l/ rows in Fig. 4.6). Figure 4.7
shows a 3D visualization of the tongue surface for each sound production of /s/, /S/, /l/,
and /r/. The groove of the tongue was clearly seen for the fricative /s/ and /S/ sounds,
but it was not observed for the /l/ sound. The cupping of the tongue was observed in
the /r/ sound.
4.1.5 Discussion
The major sources of phase include: 1) receiver coil phase, 2) spatial frequency offset
due to field inhomogeneity and air-tissue magnetic susceptibility difference, and 3)
gradient/DAQ timing delay. These may be estimated from separate calibration scans or via
self-calibration, which was chosen in this study. Self-calibration avoids possible errors
caused by the vocal tract geometry changing between calibration scans and accelerated
scans. It is noted that the features of interest are air-tissue boundaries such as the tongue
surface, lips, hard palate, velum, and epiglottis, which are coordinated for the generation
of unique gestures depending on different articulation tasks. Even two separate
productions of the same sound/articulation task could result in slightly different vocal tract
shaping, which has been a source of difficulty widely reported in the literature.
The PC-II CS reconstruction utilized a relatively high spatial resolution phase map
obtained from the non-PC CS reconstruction and improved the depiction of the air-tissue
boundaries with a large degree of phase variation, particularly at high acceleration factors.
The phase map estimate may be prone to artifacts due to imperfect CS reconstruction,
but it does tend to contain the rapidly varying phase information, while the low spatial
resolution phase map does not. A drawback of the PC-II CS reconstruction is an increased
reconstruction time because of the need for an additional iterative CS reconstruction just
for the phase estimate.
The use of Total Variation (TV) regularization was effective at improving the depiction
of air-tissue boundaries and suppressing noise-like aliasing artifacts, and was more
effective when combined with the phase-constrained reconstruction technique. The de-noising
and edge-preserving characteristics can improve the performance of subsequent image
processing tasks (e.g., Canny edge detection, image segmentation) used for quantification,
such as the measurement of the vocal tract area function. The degree of
influence of the TV regularizer was controlled by the choice of the regularization parameter
$\lambda$. The L-curve analysis provided insight into choosing an appropriate $\lambda$. Moreover, the
wavelet or curvelet transform can be used as another sparsifying basis, and the
reconstruction may be improved by incorporating an additional regularizer into the optimization
function.
A drawback of the method is that reconstruction is computationally intensive and
requires a considerable reconstruction time. The convergence speed of the algorithm
was observed to decrease when either a higher acceleration factor or a larger value of the
regularization parameter was used. For the generation of a 3D volume of the upper airway,
100 iterations were used to reconstruct a single image and this iterative reconstruction
was processed for 65 contiguous slices of interest along x. The generation of a 3D volume
when using PC CS took approximately 4 hours on a 3.4 GHz CPU with 3.0 GB of
RAM.
In this work, the CS reconstructions were performed in two dimensions (y, z) after
1D IFT along $k_x$. If computation time and memory size were not issues, there would
be potential benefits to solving the CS optimization in 3D directly. Sparsity along x
would allow for some additional de-noising, and there would be an opportunity to correct
shifts in x-position due to off-resonance if the different sources of image phase could be
separated.
Although not shown here, the use of coil arrays (e.g., 8-channel neurovascular array
in our work) can improve the SNR in 3D upper airway imaging. If the combined use of
parallel imaging and compressed sensing were adopted, significantly higher accelerations
would be achievable [66, 10, 58, 73]. Linguistically relevant high resolution features such
as tongue tip constrictions and epiglottis would be easily resolved. Moreover, it may
be possible to measure the vocal tract area function with greater precision, therefore
improving the accuracy of the quantitative analysis of vocal tract shaping in both normal
and disordered speech production.
4.1.6 Conclusions
I have demonstrated the application of compressed sensing (CS) MRI to high-resolution
3D imaging of the vocal tract during a single sustained sound production task (no
repetitions needed). Phase constrained CS outperformed conventional CS in spatial locations
with large phase variations (lateral edges of the tongue). I have demonstrated that 5x
acceleration is achievable with PC CS, with negligible loss of tissue boundary information
that is relevant to speech production research. I have demonstrated 3D upper airway
imaging using an undersampled 3DFT gradient echo acquisition with a 1.5 × 1.5 × 2.0 mm³
spatial resolution in 7 seconds, which is a duration practical for sustained sound
production.
4.2 Multi-coil Imaging
In this section, I extend the accelerated 3D single-coil imaging and phase-constrained CS
(PC-CS) reconstruction to imaging with a multiple channel receive coil array (parallel
imaging). This combined use of compressed sensing and parallel imaging has been recently
proposed by several groups [58, 73], but the notion of incorporating high-resolution phase
information is unique to this work. I present a two-stage reconstruction approach that first
estimates phase maps for each coil element via conventional CS reconstructions, and then
reconstructs the final image iteratively after incorporating the high-resolution phase maps
and low-resolution magnitude coil sensitivity maps into a multi-coil CS reconstruction.
4.2.1 Methods
4.2.1.1 Data Acquisition
Experiments were performed on a 3.0 T Signa Excite HD MRI scanner (GE Healthcare,
Waukesha, WI). The receiver bandwidth was set to ±125 kHz (4 µs sampling rate). The
body coil was used for RF transmission, and an 8-channel neurovascular array coil was used
for signal reception (only 4 superior elements were used for reconstruction). The vocal
tract region of interest (ROI) was imaged using a single thick midsagittal slab with 8 cm
thickness in the right-left (R-L) direction. The readout direction was superior-inferior
(S-I) and the phase encode directions were anterior-posterior (A-P) and right-left (R-L).
A gradient echo (GRE) sequence was used with TE = 2.3 msec, TR = 5.0 msec, flip angle
= 10°, NEX = 1, spatial resolution = 1.33 × 1.33 × 1.33 mm³, and FOV = 20 × 24 × 8 cm³.
4.2.1.2 In Vivo Experiments
A fully sampled data set, without sound production, was acquired when one trained
subject held the mouth open for 54 seconds, without swallowing. A total of 10800 ($k_y$,
$k_z$) encodes, where the number of $k_y$ and $k_z$ encodes was 180 and 60, respectively, fully
covered 3D k-space at the Nyquist rate. The undersampling of ($k_y$, $k_z$) was based on 1)
full sampling of low-spatial frequencies and 2) random undersampling of the remaining
high-spatial frequencies. The outermost k-space radius of the fully sampled region was
chosen to be 20% of the full k-space radius.
Prospective accelerated acquisitions were performed by imaging the vocal tract
shaping during sustained sound production of American English /s/, /S/, /l/, /r/. The scan
times for 6x, 8x, and 10x acquisitions were 9.0, 6.8, and 5.4 seconds, respectively.
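As a consistency check on these numbers, and assuming that scan time is simply the number of acquired phase encodes times TR (the symbols $T_{\mathrm{scan}}$, $N_{k_y}$, $N_{k_z}$, and $R$ are introduced here for illustration only), the protocol values above give:

\[
T_{\mathrm{scan}}(R) = \frac{N_{k_y} N_{k_z}\,\mathrm{TR}}{R}, \qquad
T_{\mathrm{scan}}(1) = 180 \times 60 \times 5.0\ \mathrm{msec} = 54\ \mathrm{s},
\]
\[
T_{\mathrm{scan}}(6) = 9.0\ \mathrm{s}, \qquad
T_{\mathrm{scan}}(8) = 6.75\ \mathrm{s} \approx 6.8\ \mathrm{s}, \qquad
T_{\mathrm{scan}}(10) = 5.4\ \mathrm{s}.
\]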
4.2.1.3 Image Reconstruction
Data were first inverse-Fourier transformed (IFT) along $k_x$. At each x position,
reconstruction was performed on the 2D planar section. For comparison, two different conventional
reconstruction schemes were used: 1) root-sum-of-squares (RSS) reconstruction, where
images from all four coil elements were combined by root-sum-of-squares to produce the final image,
and 2) iterative conjugate-gradient-based un-regularized SENSE reconstruction based on
the work of Pruessmann et al. [89].
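The RSS combination can be sketched as follows. This is a minimal illustration, not the code used in this work: it assumes zero-filled Cartesian k-space per coil with DC at index [0, 0], and the function name `rss_reconstruction` is hypothetical.

```python
import numpy as np

def rss_reconstruction(kspace_coils):
    """Root-sum-of-squares combination of per-coil zero-filled IFT images.

    kspace_coils : complex array of shape (n_coils, Ny, Nz), zeros at
                   unsampled locations, DC at index [0, 0] of each coil
    """
    # 2D inverse FFT of each coil's k-space over the last two axes.
    coil_images = np.fft.ifft2(kspace_coils, axes=(-2, -1))
    # Combine coil magnitudes pixel by pixel.
    return np.sqrt(np.sum(np.abs(coil_images) ** 2, axis=0))
```

RSS needs no coil sensitivity estimate, which makes it a convenient baseline, but with undersampled data it retains the aliasing that SENSE and CS reconstructions aim to remove.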
The multi-coil PC-CS reconstruction is illustrated in Fig. 4.8. In the first stage, a
high-resolution phase map was estimated using CS reconstruction for each coil element.
This is effective at capturing rapidly varying phases at the air-tissue boundaries, where
rapid phase variation is expected due to the large susceptibility difference between air
and tissue. Its incorporation into a PC-CS optimization leads to increased sparsity of
the transform coefficients of the final solution [66]. In the second stage, multi-coil PC-CS
reconstruction was performed by minimizing the convex function in Eq. 4.4. Here, $\mathbf{s}_l$ is
the data vector for the $l$th coil element, $\Phi$ is the Fourier encoding matrix, $P_l$ is a diagonal
matrix containing the phase estimate, $C_l$ is a diagonal matrix containing the coil intensity
map, and $\mathbf{m}$ is the unknown image estimate. $\beta_{TV}$ and $\beta_W$ are regularization
parameters for total variation and the $\ell_1$-norm of the wavelet transform, respectively. The
values of $\beta_{TV}$ and $\beta_W$ were chosen after visual inspection of reconstructed images
representing a broad range of the values from retrospective studies. The Daubechies-8
wavelet transform was adopted in this study using WaveLab850 software (see the website
http://www-stat.stanford.edu/~wavelab/).
\[
f(\mathbf{m}) = \sum_{l=1}^{L} \|\mathbf{s}_l - \Phi P_l C_l \mathbf{m}\|_2^2 + \beta_{TV} \|\Psi_F \mathbf{m}\|_1 + \beta_W \|\Psi_W \mathbf{m}\|_1. \tag{4.4}
\]
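A minimal sketch of evaluating the multi-coil cost in Eq. 4.4 is given below. It is illustrative only and makes several assumptions that differ from the actual implementation: the encoding matrix is applied as a 2D FFT with a sampling mask, finite differences use periodic boundaries, and a one-level Haar transform stands in for the Daubechies-8 wavelet from WaveLab850. The function names `haar_level1` and `multicoil_pc_cs_cost` are hypothetical.

```python
import numpy as np

def haar_level1(image):
    """One-level orthonormal 2D Haar transform (stand-in for Daubechies-8).

    Both image dimensions must be even.
    """
    a = (image[0::2, :] + image[1::2, :]) / np.sqrt(2)
    d = (image[0::2, :] - image[1::2, :]) / np.sqrt(2)
    rows = np.concatenate([a, d], axis=0)
    a2 = (rows[:, 0::2] + rows[:, 1::2]) / np.sqrt(2)
    d2 = (rows[:, 0::2] - rows[:, 1::2]) / np.sqrt(2)
    return np.concatenate([a2, d2], axis=1)

def multicoil_pc_cs_cost(m, s_coils, mask, phases, coil_mags, beta_tv, beta_w):
    """Evaluate Eq. 4.4 for image m given per-coil data, phase maps, and coil magnitudes."""
    data_term = 0.0
    for s_l, phase_l, c_l in zip(s_coils, phases, coil_mags):
        # Phi P_l C_l m: weight by coil magnitude and phase, then sampled 2D DFT.
        kspace = np.fft.fft2(np.exp(1j * phase_l) * c_l * m)
        data_term += np.sum(np.abs((s_l - kspace)[mask]) ** 2)
    # Total variation term: l1-norm of horizontal and vertical finite differences.
    dy = np.roll(m, -1, axis=0) - m
    dz = np.roll(m, -1, axis=1) - m
    tv = np.sum(np.abs(dy)) + np.sum(np.abs(dz))
    # Wavelet term: l1-norm of the transform coefficients.
    wavelet_l1 = np.sum(np.abs(haar_level1(m)))
    return data_term + beta_tv * tv + beta_w * wavelet_l1
```

The per-coil phase maps would come from the first-stage CS reconstructions, and the coil magnitude maps from low-resolution data, as described in the text.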
4.2.1.4 Data Processing and Analysis
Vocal tract area functions were measured by 1) manually drawing the vocal tract mid-
line on a midsagittal slice, 2) prescribing several cross-sectional slices orthogonal to the
midline, from the lips to the glottis, and 3) calculating the vocal tract areas from each
cross-sectional slice. All analysis was done using OsiriX software [94].
4.2.2 Results and Discussion
Reconstruction results from retrospectively undersampled data (not shown) indicated
that 6x and 8x produced little or no air-tissue boundary errors but 10x produced
significant boundary errors in the airway and lateral sides of the tongue.
Figure 4.9 contains midsagittal images and their corresponding 3D visualization of
tongue shapes for /S/ and /r/ sounds from prospectively acquired 8x data. The use of
1.33 mm isotropic resolution allows for sufficiently resolving the narrowing of the vocal
tract between the tongue blade and alveolar ridge (the yellow arrow in (b)). The degree
of the tongue grooving is clearly seen for the English fricative /S/ (the black arrow in (e)).
The /r/ sound is characterized by a complex geometry of the tongue shape (e.g., large volume
of the sublingual cavity (the white arrow in (f)) and cupping of the frontal tongue (the
red arrow in (f))).
Figure 4.10 contains a midsagittal slice, vocal tract area function, and 3D visualization
of tongue surface for /s/, /S/, /i/, and /r/. Midlines (thin lines marked on each midsagittal
image) that were manually drawn have different shapes depending on the sound being
produced. The midline tongue contour for /r/ was highly tortuous because of the large
space from the sublingual cavity and the upward position of the tongue tip. Figure 4.10(b)
shows that the measured area functions are different for the different articulations. The
fricative sounds /s/ and /S/ have increased areas near the glottis region unlike the vowel
sound /i/ because of the backward movement of the tongue root (compare the shaping of
the epiglottis in /s/, /S/, /i/ from the midsagittal in Fig. 4.10(a)). Figure 4.10(c) suggests
that 3D vocal tract geometry provides additional information such as the degree of the
tongue grooving (compare the arrows in /s/, /S/, /i/), cupping of the tongue (see the
thick arrow in /r/), and the volume of the sublingual cavity (see the thin arrow in /r/).
4.2.3 Summary
The proposed reconstruction can produce a clear depiction of 3D tongue shaping with
1.33 × 1.33 × 1.33 mm³ resolution, which is higher than the single-coil imaging data set
discussed in the previous section, from data acquired during a 7-second scan and sustained
sound production. It adopts a phase-constrained CS combined with multi-coil data. It
demonstrates clear depiction of air-tissue boundaries, which are the features of interest.
However, it is computationally intensive because it requires L+1 iterative reconstructions
as shown in Fig. 4.8, where L is the number of coil elements.
[Figure 4.3 plot: L-curves of total variation against data consistency for reduction factors
3x, 4x, and 5x, with plotted points labeled by λ values from 0.0001 to 0.017.]
Figure 4.3: L-curve for the selection of regularization parameter λ for CS reconstruction
of the 3D upper airway data with reduction factors of 3, 4, and 5. The CS reconstruction
was terminated at the 1000th iterate. The plotted points (x) and their corresponding
regularization parameter values (λ) are shown for reduction factor 3. Virtually identical
patterns were observed for reduction factors 4 and 5. The corners of the L-curve
are not sharp, but provide a clear trade-off between total variation (sparsity) and data
consistency.
Figure 4.4: Representative magnitude and phase images from axial slices. The adjacent
images shown were 3 mm apart along the S-I direction. The data were acquired in the
open-mouthed position during 36 seconds. Large phase variations are observed particu-
larly in the orofacial tissue and the lateral sides of the tongue (see the yellow arrows).
Figure 4.5: Axial slice reconstructions from retrospective sub-sampling of fully sampled
data. (a) Magnitude images reconstructed by use of inverse Fourier transform (iFT),
non-phase-constrained compressed sensing (CS), PC-I CS, and PC-II CS reconstructions
of 1x, 1.3x, 3x, 4x, 5x sub-sampled data. (b) (i) Full-resolution phase map from fully
sampled 1x data. (ii) Low-resolution phase map from fully sampled low-frequency data.
(iii) Phase difference between phase maps (i) and (ii). (iv) Phase map from non-PC CS
reconstruction of 5x sub-sampled data. (v) Phase difference between phase maps (i) and
(iv). (c) Magnified ROIs inside the red rectangle in (a). Notice the sharp depiction of the
air-tissue boundaries in the 5x PC-II CS reconstructed image (see the white arrows in (c)).
[Figure 4.6 panel labels: columns midsagittal and coronal; rows /s/, /S/, /l/, /r/.]
Figure 4.6: Reformatted 2D midsagittal and coronal images after the PC-II CS
reconstructions of the 5x undersampled 3DFT data set. The prospective use of accelerated
3DFT scanning required just 7 seconds of scan time during which one trained subject
produced each sustained English consonant /s/, /S/, /l/, and /r/. This achieved 1.5 ×
1.5 × 2.0 mm³ resolution over a 24 × 24 × 10 cm³ FOV. Representative 2D midsagittal
images are shown in the leftmost column. Eight representative coronal slices of interest
are shown that are ordered from lips to pharyngeal wall. Important articulatory features
provided by the 3D vocal tract dataset include: (1) groove of the tongue surface for
fricative sound /s/ (see the arrow in the /s/ row) and (2) wider shaping of the vocal tract
between the hard palate and the tongue front for /l/ indicating the curving of the tongue
sides to allow airflow along the sides (for the comparison, see the arrows in the /S/ and
/l/ rows) although their 2D midsagittal slices exhibit similar shaping patterns.
[Figure 4.7 panel labels: /s/, /S/, /l/, /r/.]
Figure 4.7: 3D visualization of the tongue and lower jaw after the PC-II CS reconstruc-
tions from the data set prospectively acquired with 5x acceleration. Tongue grooves are
seen for /s/ and /S/, further forward in /s/ than /S/, but not for /l/ (see the arrows in
/s/, /S/, and /l/). Cupping of the tongue (i.e., cavity behind the tongue front) is seen
for /r/ (see the arrow in /r/).
[Figure 4.8 flowchart, as recoverable from the diagram labels: for each coil l = 1, ..., L,
the coil's raw data yield a low-resolution magnitude coil sensitivity map |C_l(r)| and,
through a non-PC CS reconstruction with TV, a magnitude image |M_l(r)| and a phase map
angle(M_l(r)). A mask, mask_l(r), is found and multiplied with the phase map, and the
result is passed through e^{j(·)}. These per-coil maps and the raw data from coils 1 to L
are the inputs to the multi-coil PC-CS reconstruction, which produces the final image.]
Figure 4.8: Flowchart of the proposed reconstruction scheme.
Figure 4.9: Midsagittal images for (a) /s/, (b) /S/, and (c) /r/. Their corresponding 3D
tongue shapes for (d) /s/, (e) /S/, and (f) /r/.
Figure 4.10: The prospective use of accelerated 3D acquisition and multi-coil PC-CS
reconstruction. (a) Reformatted midsagittal slices and their associated midlines drawn
for cross-sectional slice prescription. (b) Area function plot. (c) 3D visualization of the
tongue and lower jaw.
Chapter 5
Real-time Speech MRI Using Golden-ratio Spiral
5.1 Introduction
Real-time MRI has provided new insight into the dynamics of vocal tract shaping during
natural speech production [79, 11, 24, 107]. In real-time speech MRI experiments, image
data and speech signals are simultaneously acquired. Real-time movies, typically of a
2D midsagittal slice, are reconstructed and displayed in real-time. The shape of the
vocal tract, from the lips to the glottis, is identified using air-tissue boundary detection
performed at each frame [12]. Adaptive noise cancellation is used to produce speech
signals free from the MRI gradient noise [13]. Articulatory and acoustic analysis is then
performed using synchronized audio and video information [87]. Although MRI data is
acquired and reconstructed in real-time, the processes of segmentation and analysis are
performed retrospectively.
Speech rate is highly dependent on the subject's speaking style and the speech task,
and it affects the speed of articulatory movement [1, 108]. Variations in the velocity of
articulators such as tongue dorsum, lips, and jaw result from the nature of the sequences
of the vowels and consonants being produced [84]. The motion of articulators (e.g.,
tongue, velum, lips) is relatively slow during production of monophthongal vowel sounds
or during, or in the vicinity of, pauses. Vocal tract variables such as tongue tip constriction, lip
aperture, and velum aperture are dynamically controlled and coordinated to produce
target words [15]. The speeds among articulators can also differ during the coordination
of different articulators, for example, the movement of the velum and the tongue tip
during the production of the nasal /n/.
Current speech MRI protocols do not provide a mechanism for flexible selection of
temporal resolution. This is of potential value, because higher temporal resolution is
necessary for frames that reflect rapid articulator motion while lower temporal resolution
is sufficient for capturing the frames that correspond to static postures. As recently
shown by Winkelmann et al. [114], golden-ratio sampling enables flexible retrospective
selection of temporal resolution. It may be suited for speech imaging, in which the
motion patterning of articulators varies significantly in time, and in which it is difficult
to determine an appropriate temporal resolution a priori.
In this manuscript, I present a first application of a spiral golden-ratio sampling scheme
(see Fig. 5.1) to real-time speech MRI and investigate its performance by comparison
with a conventional bit-reversed temporal view order sampling scheme. Simulation studies
are performed to compare the unaliased field-of-view (FOV) from spiral golden-ratio sampling
with that from conventional bit-reversed sampling at different levels of temporal resolution
after a retrospective selection. In vivo experiments are performed to qualitatively compare
image signal-to-noise ratio (SNR), level of spatial aliasing, and degree of temporal fidelity.
Finally, I present an automated technique in which a composite movie can be produced
using data reconstructed at several different temporal resolutions. I demonstrate its
effectiveness at improving articulator visualization during production of the nasal consonant
/n/.
Figure 5.1: Schematic diagram of real-time continuous MRI data acquisition (DAQ) using
a golden-ratio spiral view order. A sequence of only the first five TRs is shown. (Top) Pulse
sequence diagram. (Bottom) Accumulation of spiral interleaves in the sampling of k-space
as time elapses. Every spiral interleaf acquired during the current DAQ period is indicated
in blue color. It is noted that temporal resolution can be controlled by the number of
adjacent TRs chosen when reconstructing a frame. In the golden-ratio sampling scheme,
the next spiral interleaf never overlaps with previously acquired interleaves. Hence, sampling
density (i.e., imaging field-of-view) increases as the number of adjacent TRs used for a
frame reconstruction increases.
5.2 Materials and Methods
5.2.1 Simulation
A simulation study was performed to compare unaliased FOVs from conventional bit-
reversed view order sampling [19] (see Fig. 5.2a,c,e) and spiral golden-ratio view order
sampling (see Fig. 5.2b,d,f) for a variety of temporal resolutions selected retrospectively.
The spiral trajectory design was based on the imaging protocol routinely used in our
laboratory at the University of Southern California [79, 19]. The design parameters were:
13-interleaf uniform density spiral (UDS), 20 × 20 cm² FOV, 3.0 × 3.0 mm² in-plane
spatial resolution, maximum gradient amplitude = 22 mT/m, maximum slew rate =
77 T/m/s, and conventional bit-reversed view order. Bit-reversed temporal view order
is often adopted in real-time MRI because it shortens the spiral interleaf angle gaps in a
few adjacent interleaves at any time point and reduces motion artifacts [100, 81]. Spiral
golden-ratio view order was implemented by sequentially incrementing the spiral interleaf
angle by the golden-ratio angle $360^\circ \cdot 2/(\sqrt{5}+1) \approx 222.4969^\circ$ at every repetition time
(TR) (see Fig. 5.1 and Fig. 5.2d). The unaliased FOV was defined as the reciprocal of
the maximum sample spacing in k-space.
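As a rough illustration of this definition, the sketch below estimates the unaliased FOV for a window of consecutive golden-ratio interleaves. It assumes, as a simplification that is not stated in this form in the text, that the 20 cm unaliased FOV of the 13-interleaf design scales inversely with the largest angular gap between spatially adjacent interleaves. The function names are hypothetical.

```python
import math

# Angle increment per TR, computed from the formula in the text.
GOLDEN_ANGLE_DEG = 360.0 * 2.0 / (math.sqrt(5.0) + 1.0)

def max_angular_gap_deg(n_tr):
    """Largest angle between spatially adjacent interleaves in an n_tr window."""
    angles = sorted((k * GOLDEN_ANGLE_DEG) % 360.0 for k in range(n_tr))
    # Append the first angle shifted by 360 degrees to close the circle.
    wrapped = angles + [angles[0] + 360.0]
    return max(b - a for a, b in zip(wrapped, wrapped[1:]))

def unaliased_fov_cm(n_tr, design_fov_cm=20.0, design_interleaves=13):
    """Assumed scaling: FOV is inversely proportional to the largest gap."""
    design_gap = 360.0 / design_interleaves
    return design_fov_cm * design_gap / max_angular_gap_deg(n_tr)

for n in (8, 13, 20, 21, 34):
    print(n, round(unaliased_fov_cm(n), 1))
```

Under this assumption the largest gap, and hence the estimated FOV, changes only when the window length reaches a Fibonacci number, because each additional interleaf splits one of the remaining largest gaps.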
5.2.2 In Vivo Experiments
MRI experiments were performed on a commercial 1.5 Tesla scanner (Signa Excite HD,
GE Healthcare, Waukesha, WI). A body coil was used for radio frequency (RF) transmission,
and a custom 4-channel upper airway receive coil array was used for RF signal
reception. The receiver bandwidth was set to ±125 kHz (i.e., 4 µs sampling period). One
subject was scanned in supine position after providing informed consent in accordance
with institutional policy.
A midsagittal scan plane of the upper airway was imaged using custom real-time
imaging software [96]. The spiral trajectory design followed that described in the
Simulation section. The imaging protocol was: slice thickness = 5 mm, TR = 6.164 ms,
temporal resolution = 80.1 ms. The golden-ratio view order scheme was compared with
the conventional bit-reversed 13-interleaf UDS scheme with all other imaging and scan
parameters fixed (e.g., scan plane, shim, and other calibrations). The volunteer was
instructed to repeat "go pee shop okay bow know" for both the conventional bit-reversed
UDS and golden-ratio acquisitions. The speech rate was maintained using a 160 bpm
metronome sound that was communicated to the subject using the scanner intercom.
In conventional bit-reversed 13-interleaf UDS data, gridding reconstructions were performed
using temporal windows of 8-TR and 13-TR. In golden-ratio spiral data, gridding
reconstructions were performed using temporal windows of 8-TR, 13-TR, 21-TR, and
34-TR. Gridding reconstruction consisted of convolving the density-compensated
spiral k-space data with a 6 × 6 Kaiser-Bessel kernel, resampling the result onto a two-fold
oversampled grid, and then applying a 2D inverse fast Fourier transform (FFT) and
deapodization [46]. Root sum-of-squares (SOS) reconstruction from the 2 anterior elements of the
coil was performed to obtain the final images. For comparison of images reconstructed
from different temporal windows, image frames were reconstructed from data in which
the centers of the temporal windows were aligned.
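The root SOS coil combination mentioned above can be sketched in a few lines. This is a minimal illustration using NumPy, not the reconstruction code used for the experiments; the function name and the toy coil images are hypothetical.

```python
import numpy as np

def root_sum_of_squares(coil_images):
    """Combine per-coil images of shape (n_coils, ny, nx) into one magnitude image."""
    coil_images = np.asarray(coil_images)
    # Sum squared magnitudes over the coil axis, then take the square root.
    return np.sqrt(np.sum(np.abs(coil_images) ** 2, axis=0))

# Two hypothetical 2x2 complex coil images.
coils = np.array([[[3 + 0j, 0], [1j, 2]],
                  [[4 + 0j, 0], [0, 2j]]])
combined = root_sum_of_squares(coils)
print(combined)
```

The combined image is real and non-negative, so phase differences between coil elements do not cause signal cancellation.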
5.2.3 Blockwise Temporal Resolution Selection
In golden-ratio data sets, videos at multiple temporal resolutions can be produced
retrospectively. I sought a procedure for automatic selection of the temporal window that
was appropriate for each image region in each time frame, and the ability to use this to
synthesize a single video.
I used time difference energy (TDE), as described in Eq. 5.1, as an indicator of
motion. This was calculated for each block $B_j$ and each time $t$:

$$\mathrm{TDE}(B_j, t) = \sum_{t'=-T}^{T} \sum_{(x,y) \in B_j} \left| I(x,y,t) - I(x,y,t-t') \right|^2, \tag{5.1}$$

where $I(x,y,t)$ is the image intensity at pixel location $(x,y)$ and time $t$, and $2T$ is the number
of adjacent time frames that are considered.
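A minimal sketch of Eq. 5.1 for a single block is given below, assuming the frames are stored as a NumPy array of shape (n_frames, ny, nx). The function name, argument names, and toy data are hypothetical and only illustrate the double sum.

```python
import numpy as np

def time_difference_energy(frames, t, y0, x0, block=8, T=2):
    """Eq. 5.1 for the block whose top-left pixel is (y0, x0) at frame t."""
    frames = np.asarray(frames)
    if not (T <= t < frames.shape[0] - T):
        raise ValueError("t must leave T frames on both sides")
    ref = frames[t, y0:y0 + block, x0:x0 + block]
    tde = 0.0
    for dt in range(-T, T + 1):          # dt plays the role of t' in Eq. 5.1
        neighbor = frames[t - dt, y0:y0 + block, x0:x0 + block]
        tde += np.sum(np.abs(ref - neighbor) ** 2)
    return float(tde)

# Toy example: five 8x8 frames, all zero except frame 3, which is all ones.
frames = np.zeros((5, 8, 8))
frames[3] += 1.0
print(time_difference_energy(frames, t=2, y0=0, x0=0))
```

In the toy example only one neighboring frame differs from the reference, so the energy equals the number of pixels in the block.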
Temporal resolution selection was performed based on the alias-free high temporal
resolution frames. I used sensitivity encoding (SENSE) reconstructed frames from 8-TR
data (i.e., 49.3 ms temporal resolution) for the calculation of
TDE. In addition, SENSE reconstructions were performed at each frame from 13-TR,
21-TR, and 34-TR temporal windows, whose corresponding temporal resolutions were
80.1 ms, 129.4 ms, and 209.6 ms. Data from all 4 elements of the coil were used
for the reconstruction. Alias-free coil sensitivity maps were obtained from 34-TR data.
Sensitivity maps for each coil element were obtained by dividing the image at each element
by the root SOS image of all elements. Frames were updated every 4 TRs (24.7 ms,
i.e., 40.6 frames per second). The spiral SENSE reconstruction was based on a nonlinear
iterative conjugate gradient algorithm with a total variation regularizer [58, 66].
Total variation regularization was effective at removing image noise while preserving high
contrast signals such as the air-tissue boundaries. Iterations were terminated at the 20th
iterate, a stopping point chosen by visual inspection.
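The temporal resolutions and frame rate quoted above follow directly from the TR of 6.164 ms. The short check below reproduces that arithmetic; the helper name is hypothetical.

```python
TR_MS = 6.164  # repetition time from the imaging protocol

def window_ms(n_tr, tr_ms=TR_MS):
    """Duration of a temporal window spanning n_tr consecutive TRs."""
    return n_tr * tr_ms

for n_tr in (8, 13, 21, 34):
    print(f"{n_tr}-TR window: {window_ms(n_tr):.1f} ms")

update_ms = window_ms(4)                   # frame update interval
frames_per_second = 1000.0 / update_ms
print(f"update every {update_ms:.1f} ms = {frames_per_second:.1f} frames/s")
```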
Intensity correction was performed at each frame using a thin plate spline fitting
method [64]. The SENSE reconstructed frames were first cropped to a 64 × 64 size that
contained only the vocal tract regions of interest and were then interpolated to a 128 × 128
size in order to avoid blockiness in the images. An 8 × 8 block and T = 2 were used
to calculate TDE. Spatially adjacent blocks overlapped by 4 pixels in either
the vertical or horizontal direction. The calculated TDE at each block was assigned to
the central 4 × 4 block. TDE was normalized over all time frames. Temporal
resolution selection at each 4 × 4 block was performed by simple thresholding of
the normalized TDE ($\mathrm{TDE}_{norm}$). For $\mathrm{TDE}_{norm} \geq 0.6$, 8-TR SENSE was assigned. For
$0.4 \leq \mathrm{TDE}_{norm} < 0.6$, 13-TR SENSE was assigned. For $0.2 \leq \mathrm{TDE}_{norm} < 0.4$, 21-TR
SENSE was assigned. For $\mathrm{TDE}_{norm} < 0.2$, 34-TR SENSE was assigned. These settings
were chosen empirically based on the quality of the final synthesized video.
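The thresholding rule can be sketched as follows. The thresholds are the empirical values given above; the normalization shown (division by the maximum over all blocks and frames) is an assumption, since the text does not specify it, and the function names and sample values are hypothetical.

```python
def select_window_tr(tde_norm):
    """Map a normalized TDE value in [0, 1] to a temporal window length in TRs."""
    if tde_norm >= 0.6:
        return 8     # strong motion: shortest window
    if tde_norm >= 0.4:
        return 13
    if tde_norm >= 0.2:
        return 21
    return 34        # little motion: longest window

def normalize(tde_values):
    """Assumed normalization: divide by the maximum over all blocks and frames."""
    peak = max(tde_values)
    return [v / peak if peak > 0 else 0.0 for v in tde_values]

raw = [0.5, 2.5, 4.5, 7.0, 10.0]   # hypothetical raw TDE values
windows = [select_window_tr(v) for v in normalize(raw)]
print(windows)
```

Blocks with high normalized TDE receive the short window to limit temporal blurring, while nearly static blocks receive the long window, which has higher SNR and less aliasing.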
5.2.4 Oral-Velar Coordination
Another set of experiments was performed to investigate the effectiveness of variable temporal
resolution selection in golden-ratio spiral acquisition. The imaging protocol was:
slice thickness = 5 mm, repetition time (TR) = 6.004 ms, in-plane spatial resolution =
2.4 × 2.4 mm², receiver bandwidth = ±125 kHz. The golden-ratio view order scheme was
compared with the conventional bit-reversed 13-interleaf UDS method on the same midsagittal
scan plane. The experiment was performed under a standard mirror-projector
setup. In one set of RT-MRI scans, 6 slides were sequentially presented as "type bow know
five", "type bone oh five", "type toe node five", "type bone know five", "type tone oh
five", and "don't carry an oily rag like that". The presentation of the stimuli was controlled
by Microsoft PowerPoint software, with a 1 second pause between
adjacent slides. The subject was instructed to lie in supine position and read the words
aloud at a normal speech rate.
5.3 Results
Figure 5.3 contains a plot of the unaliased FOV as a function of temporal resolution
(i.e., the number of TRs) when retrospectively selecting a temporal window. For 13-TR,
the 13-interleaf UDS supports a larger unaliased FOV than the golden-ratio view order.
When the number of TRs becomes a Fibonacci number (e.g., 2, 3, 5, 8, 13, 21, 34), there
is a change in the unaliased FOV for the golden-ratio method. Note the sudden increase
in the unaliased FOV from 17.1 cm to 27.6 cm when the number of TRs changes from
20 to 21. For 8-TR, the 13-interleaf UDS provides inconsistent unaliased FOVs, which
are 6.7 cm and 10 cm. The unaliased FOVs for the 13-interleaf UDS are smaller than
those for the golden-ratio method when the number of TRs is between 8 and 12. The golden-ratio
view order provided a consistent unaliased FOV at any time point for any temporal
window chosen.
Figure 5.4 contains the images reconstructed from the data acquired when the subject
was stationary. Note that the midsagittal slice of interest has a regional support of
roughly 38 cm and has significant intensity shading due to coil sensitivity. The images
reconstructed from the 13-interleaf UDS data have spatial aliasing artifacts in the regions
posterior to the pharyngeal wall from coil 1 and in the regions superior to the hard palate
and velum from coil 2. The root SOS image in Fig. 5.4(c) contains little or no aliasing
artifact within the vocal tract region of interest (denoted by the dashed box). Although
the unaliased FOV (i.e., 20 cm) from the 13-interleaf UDS is larger than that from
the golden-ratio sampling for the choice of 13-TR, aliasing artifacts indicated by white
arrows in Fig. 5.4(c) are more prominent than in Fig. 5.4(d). The spatial aliasing pattern
can be understood by examining the point spread function (PSF) for each sampling
pattern. The PSF from the 13-TR UDS had a ratio of maximum sidelobe to mainlobe
peak ($\mathrm{PSF}_{max-sl}$) of 0.061 and exhibited a single sidelobe ring with a radius of 20 cm.
The PSF from the 13-TR golden-ratio sampling had a $\mathrm{PSF}_{max-sl}$ of 0.044 and showed
a less coherent pattern with multiple sidelobe rings and lower sidelobe amplitude than the
13-TR UDS. Unlike the 13-interleaf UDS acquisition, selecting a long temporal window
provides a large unaliased FOV in the golden-ratio acquisition. Note that aliasing artifacts
are removed from the entire image by selecting the 34-TR window in Fig. 5.4(f).
Figure 5.5 contains image frames and time-varying intensity profiles from the conventional
bit-reversed 13-interleaf UDS and golden-ratio methods when the subject produced
the speech utterance "bow know". Frames were updated at every TR. As seen in the
undersampled 8-TR case of Fig. 5.5(c) and Fig. 5.5(d), the 13-interleaf UDS method produces
aliasing artifacts that are periodic in time, while the golden-ratio method produces
less coherent aliasing in time. This periodicity in the aliasing is attributed to the inconsistent
FOVs from the conventional bit-reversed 13-interleaf UDS as shown in Fig. 5.3.
As seen from the 13-TR case of Fig. 5.5(c) and (d), the level of aliasing is higher for the
golden-ratio result. Figure 5.5(d) shows that the intensity profile from the 8-TR result
exhibits the sharpest transition of tongue tip motion (compare the yellow arrows in the 8-TR,
13-TR, and 21-TR results).
The region-based temporal resolution selection method required the use of multiple
temporal resolution videos reconstructed from iterative SENSE reconstructions, which
substantially increased computation time. The generation of 160 dynamic frames of
SENSE reconstructions from 8-TR, 13-TR, 21-TR, and 34-TR took approximately 45
min, 49 min, 53 min, and 60 min, respectively, with a 3.06 GHz CPU and 3.48 GB RAM.
The blockwise temporal resolution selection algorithm took approximately 2 min.
Figure 5.6 shows a result of the blockwise temporal resolution selection from the
SENSE reconstructed golden-ratio frame sets. The synthesized frames in Fig. 5.6(c) exhibit
good assignment of the four distinct temporal resolution videos. Less tongue tip blurring
is seen in Fig. 5.6(c) and (d) (yellow hollow arrows) than in the 34-TR
result (see the red hollow arrow in Fig. 5.6(e)). A better visualization of the velum opening
is seen in Fig. 5.6(c) and (e) (yellow solid arrows) than in the 8-TR
result (see the red solid arrow in Fig. 5.6(d)).
Nasal speech imaging studies were performed using the golden-ratio acquisition and
the results are shown in Fig. 5.7. Time intensity profiles are shown from the tongue tip
and velum in the production of nasal consonants in three different syllable conditions:
onset, coda, and juncture geminate. Note that time intensity profiles available from
multiple temporal resolution videos facilitate a proper selection of temporal resolution for
each articulator. Among the four temporal resolutions considered, tongue tip dynamics are
depicted most clearly, with the least temporal blurring, at 48 ms temporal resolution.
The velum lowering is clearly depicted with sufficient SNR at
126 ms temporal resolution.
5.4 Discussion
A new acquisition scheme that adopts a spiral golden-ratio view order has been demonstrated
as a means to provide flexibility in retrospective selection of temporal resolution.
The golden-ratio scheme has been compared with conventional bit-reversed 13-interleaf
UDS acquisition, which is routinely used in our real-time speech MRI data collection at
the University of Southern California. The spiral golden-ratio view order provides a larger
and consistent unaliased FOV when undersampling real-time data for higher temporal
resolution. In addition, spiral interleaves are evenly distributed for any choice of the
number of spiral interleaves at any time point, and hence a parallel imaging reduction
factor can be flexibly chosen and applied to dynamic golden-ratio data. Auto-calibration
with high resolution full FOV coil sensitivity maps is possible at any time point by utilizing
fully-sampled temporal window data centered on that time point.
The proposed region-based temporal resolution selection method has limitations. The
8-TR (i.e., 49.3 ms temporal window) SENSE reconstructed frames served as a guide to
select the proper temporal resolution. However, their temporal resolution was inherently limited
and their image SNR was low. The velum and pharyngeal wall suffered from much lower
SNR due to low coil sensitivity and potentially due to high parallel imaging g-factor.
This can result in higher TDE regardless of motion. In addition, I performed SENSE
reconstruction from data whose temporal window was shorter than 8-TR, but the resulting
SENSE images had inadequate quality, with either significantly low SNR or blurred
air-tissue boundaries when a large regularization parameter was used. Higher acceleration
may be possible with a highly sensitive upper airway receive coil with higher
channel counts [39].
The motivation for using blockwise processing was based on the following. First, air-tissue
boundaries within a block typically move with similar speed. Second, blockwise
processing helps to stabilize the calculation of TDE in the presence of noise. There is a
trade-off in selection of the block size. For example, a larger block size can
result in improper assignment of temporal resolution for a block within which motion is
not at uniform speed. A smaller block size causes TDE to be more sensitive
to noise.
The temporal resolution assignment procedure does not provide a strong link between
the temporal resolution needed for an event and the choice of retrospective temporal
resolution. An additional navigator sequence may help to obtain the required temporal
resolution information, although it reduces scan efficiency. Tasko and McClean [108]
report that the speed of the tongue tip during fluent speech, measured using an
electromagnetic articulometer (EMA), was up to 200 mm/s. With the 3 mm spatial
and 49 ms temporal resolution from 8-TR SENSE, it is anticipated that more than 3
pixels will experience temporal blurring around the tongue tip region of interest if the
speed is 200 mm/s. Hence, higher spatio-temporal resolution frames will be necessary for
estimating temporal bandwidth more reliably.
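The blurring estimate follows from a short calculation, assuming the tongue tip moves at a constant speed $v = 200$ mm/s over the full window $\Delta t = 49$ ms and that the pixel size is $\Delta x = 3$ mm:

$$d = v\,\Delta t = 200\ \mathrm{mm/s} \times 0.049\ \mathrm{s} = 9.8\ \mathrm{mm}, \qquad \frac{d}{\Delta x} = \frac{9.8\ \mathrm{mm}}{3\ \mathrm{mm}} \approx 3.3\ \text{pixels}.$$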
Recent real-time spiral speech MRI has been demonstrated with 80 to 100 ms temporal
resolution [79, 12, 19], but its temporal resolution is low compared to other speech
imaging technologies such as EMA and ultrasound. Higher temporal resolution can be
achieved by lowering spatial resolution or designing a longer spiral readout with
fewer interleaves. Lower spatial resolution imaging may lose details of fine structures
such as the epiglottis or lead to more difficulty in resolving narrowings in the
airway, e.g., the constriction between the alveolar ridge and tongue tip in certain sound
productions such as the fricative /s/. Lengthening the spiral readout would cause images
to be more susceptible to blurring or distortion at the air-tissue boundaries due to the large
resonance offset arising from air-tissue magnetic susceptibility. Correction of blurring
or distortion artifacts is challenging in real-time upper airway MRI because of the difficulty of
estimating an accurate field map. Effective off-resonance correction from real-time golden-ratio
view order data is an interesting area for investigation. Alternatively, real-time
radial speech MRI with 55 ms temporal resolution has been demonstrated using a short-TR
radial fast gradient echo sequence and parallel imaging reconstruction in combination
with temporal filtering [111]. Improved temporal resolution imaging may be achieved by
adopting an additional navigator sequence in the acquisition and a spatiotemporal model
in the reconstruction [107, 63].
I have focused on an example of a nasal sound production study in which knowledge
of the timing of oral and velar coordination is important for modeling temporal changes
in the constriction degrees of articulators under different syllable contexts [19]. This
particular articulation involves rapid tongue tip motion and relatively slow velar motion
when producing the nasal consonant /n/, and is well suited for investigating the importance
of flexibility in temporal resolution selection. Note that velar movement varies
considerably depending on the subject and the speech task. Kuehn [60] reported that
the velocity of the velar flesh point was measured using cineradiographic equipment
and reached up to 120 mm/s. EMA studies on dynamics of French nasal vowels such
as / A/ and / E/ have been recently proposed [3]. Improved real-time MRI of dynamics
of nasal vowels using the golden-ratio scheme will be of interest to the linguistic community.
The golden-ratio method can be applied to other articulatory timing studies which are
investigated in the literature [72].
The acoustic noise generated by the MRI gradients during conventional bit-reversed
13-interleaf UDS imaging is temporally periodic. This can be exploited for high quality
adaptive noise cancellation [13] and audio recordings during speech production in the
magnet. One difficulty with golden-ratio imaging is that the MRI gradient noise is no
longer periodic. Advanced models are therefore required for acoustic noise cancellation
during golden-ratio spiral imaging, and remain as future work.
5.5 Conclusions
I have demonstrated the application of a spiral golden-ratio temporal view order to imaging
a midsagittal slice of the vocal tract during fluent speech. Simulation studies showed
that the golden-ratio method provided a larger and consistent unaliased FOV when retrospectively
undersampling real-time data than the conventional bit-reversed 13-interleaf
uniform density spiral. In nasal speech imaging studies, the proposed method provided an
improved depiction of rapid tongue tip movement with less temporal blurring and of velar
lowering with higher SNR and potentially reduced aliasing artifacts. The region-based
temporal resolution selection method synthesizes a single video from multiple temporal
resolution videos available in the golden-ratio real-time data and potentially facilitates
subsequent vocal tract shape analysis.
Figure 5.2: k-space trajectories for conventional bit-reversed 13-interleaf UDS and golden-ratio
spiral view order when samples from (a,b) 8, (c,d) 13, and (e,f) 21 consecutive TRs
are combined. Temporal view orders are marked with the numbers at the end of each
spiral interleaf. Note that for the 8-TR case the spiral interleaves in (b) are distributed
more uniformly than in (a). In (c) and (e), the angle spacing between spatially adjacent
spiral interleaves is uniform with an angle of $360^\circ/13$. In (b,d,f), the angle spacing
between two spatially adjacent spiral interleaves is not uniform, but the angle increment
between successive view numbers is constant with an angle of $360^\circ \cdot 2/(\sqrt{5}+1) \approx 222.4969^\circ$.
Figure 5.3: Retrospective selection of temporal resolution: (a) Comparison of unaliased
FOV between the golden-ratio view order and conventional bit-reversed 13-interleaf UDS
sampling. (b) Enlargement of the region within the green rectangle in (a). The
blue shaded region in (a, b) indicates that the unaliased FOV varies in conventional bit-reversed
13-interleaf UDS when the number of TRs is less than 10. The black solid line
illustrates a linear relationship between unaliased FOV and temporal resolution when
UDS trajectories are designed under the constraints of the same spatial resolution and
readout duration. Note that the unaliased FOV is fixed at 20 cm for a temporal window
length of 13-TR or longer (≥ 13-TR) for the conventional bit-reversed 13-interleaf UDS, but it increases with
temporal window length for the golden-ratio view order.
Figure 5.4: Midsagittal images with a large reconstruction field-of-view (FOV) of 38 × 38 cm²
reconstructed from the data acquired in static posture. (a-c) Conventional
bit-reversed 13-interleaf UDS. (a) Image from coil 1, (b) image from coil 2, (c) root sum-of-squares
(SOS) of the coil 1 and coil 2 images. The region within the dashed box in (c)
is the vocal tract region of interest (ROI). To speed up the spiral acquisition, the
FOV of the 13-interleaf UDS is typically chosen to be small such that aliasing artifacts
are not observed in the vocal tract ROI. (d-f) Root SOS of the coil 1 and coil 2 images
reconstructed from data acquired using the spiral golden-ratio acquisition: reconstruction
from (d) 13-TR, (e) 21-TR, and (f) 34-TR data. Spatial aliasing artifacts are completely
removed in (f) because of the larger FOV available from the golden-ratio method. The image
SNR in (f) is higher than that in (d) and (e).
Figure 5.5: Gridding reconstructed dynamic frames and time intensity profiles from (a,c)
bit-reversed 13-interleaf uniform density spiral data and (b,d) golden-ratio spiral view
order data. A 40 × 40 matrix containing only the vocal tract region of interest is shown.
Retrospective selection of temporal resolution is performed using 8-TR and 13-TR in
(a), and 8-TR, 13-TR, and 21-TR in (b). Frame update rate was 1-TR = 6.164 ms.
Two example frames (frames 1 and 6) are shown (frame 1 is relatively stationary; frame 6
is captured during rapid tongue tip motion). Time intensity profiles from the image
column indicated by the dashed lines are shown for (c) bit-reversed 13-interleaf UDS and
(d) golden-ratio view order data.
[Figure 5.6 image: panel rows (a) to (e); columns show frames 1, 7, 10, 15, and 19; color scale from 0 to 1.0.]
Figure 5.6: Blockwise temporal resolution selection and synthesis of a single video from
four temporal resolution videos. Five representative frames are shown, captured
when the subject produced the speech sound /bono/. (a) Normalized time difference
energy (TDE) map. (b) Temporal resolution selection map [white: 48 ms (= 8-TR)
temporal resolution, bright gray: 78 ms (= 13-TR) temporal resolution, dark gray: 126
ms (= 21-TR) temporal resolution, black: 204 ms (= 34-TR) temporal resolution]. (c)
Synthesized temporal resolution frames based on the temporal resolution selection map
in (b). (d) 48 ms (= 8-TR) temporal resolution frames. (e) 204 ms (= 34-TR) temporal
resolution frames.
Figure 5.7: Variable temporal resolution selection from real-time data acquired using
the golden-ratio acquisition scheme. (a) A midsagittal MR image from the subject. (b)
Time intensity profiles for the tongue tip aperture (from the solid prescribed line in (a))
and the velum aperture (from the dashed prescribed line in (a)) for onset, coda, and
juncture geminate in the articulations of "bow know", "bone oh", and "bone know". In
(b), the hollow arrows in the tongue tip and velum aperture profiles indicate tongue
tip closure onto the alveolar ridge and velar lowering, respectively. Temporal resolutions
were (i) 204 ms, (ii) 126 ms, (iii) 78 ms, and (iv) 48 ms.
Chapter 6
Parallel Imaging with Novel 16-Channel Coil at 3 Tesla
6.1 Introduction
MRI is a powerful tool for the non-invasive assessment of upper airway anatomy and
function. Upper airway MRI has been used to investigate the feasibility of clinical assessments
in patients with obstructive sleep apnea [99, 5] and swallowing disorders [38].
In sleep apnea studies, rapid volumetric imaging is desirable for identifying the sites of
narrowing or occlusion of the airway and measuring its volume. In swallowing studies,
high temporal resolution real-time MRI is desirable for the evaluation of swallowing
disorders. This enables improved assessment of the dynamics of a bolus of food without
introducing motion artifacts.
Rapid upper airway MRI is also valuable for basic research into human speech production.
In such studies, volumetric coverage is typically obtained during sustained sound
production, using a two-dimensional (2D) multi-slice [75] or native 3D acquisition [53].
Vocal tract dynamics (e.g., movement of the tongue and velum) are typically assessed using
real-time acquisition of a single 2D slice (e.g., midsagittal) [79]. An objective and quantitative
knowledge of the orchestration of articulatory activity that creates speech is a
necessary element in the understanding of human communication. A perennial challenge in
speech production research is the ability to examine 3D real-time changes in the shaping
of the vocal tract. Spatio-temporal information about speech movements is critical not
only to understanding and modeling the speech production process but also to a thorough
understanding of speech acoustics; this in turn has significant implications for developing
technological applications of machine synthesis and recognition of speech.
From a translational application perspective, understanding speech production deficits
due to neurological damage or other disease etiology directly from articulation details
(versus what is offered by speech acoustic recordings) is critical to assess and plan treatment.
For example, people with certain neurological diseases (e.g., apraxia) are known
to exhibit speech timing patterns differing from those of neurologically unimpaired speakers (see
articulatory data in Ref. [18]). The fact that both irregularities in the implementation of
linguistic prosody and irregularities in articulatory timing patterns occur in neurologically
impaired populations implies that investigation of the influence of prosodic structure on
articulatory timing may illuminate the general question of how language-specific knowledge
is related to motor control. Rapid upper airway MRI offers tools to look at clinical
disorders in a new way.
A variety of pulse sequences and image reconstruction techniques have been applied to
improve the spatio-temporal resolution of upper airway MRI. Spiral gradient echo (GRE)
imaging techniques have been shown to be effective at capturing vocal tract dynamics
during natural speech production [79, 107, 23]. Improved spatio-temporal resolution
imaging has been demonstrated in the assessment of swallowing disorders using parallel
imaging on a standard 12-channel head and neck array coil [14]. However, the design
and use of a dedicated multiple-channel receive coil that is highly sensitive to the upper
airway anatomy has not been reported in the literature. In this manuscript, a novel 3
Tesla 16-channel upper airway coil is described and its parallel imaging performance is
demonstrated. In volunteers, the signal-to-noise ratio (SNR) and parallel imaging g-factor
values are compared over the upper airway regions of interest in the midsagittal slice. 3D
parallel imaging of the upper airway is demonstrated to investigate the feasibility of high
resolution accelerated upper airway MRI during sustained sound production. Real-time
spiral acquisition and parallel imaging reconstruction are applied to capture vocal tract
dynamics during natural sound production and are demonstrated with 2.0 × 2.0 mm²
spatial and 84 ms temporal resolution.
6.2 Materials and Methods
6.2.1 Coil Design and Construction
The desired upper airway field-of-view (FOV) extends in the superior-inferior (S-I) direction
from above the hard palate down the throat to the level of the third cervical vertebra
(C3) and in the anterior-posterior (A-P) direction from several centimeters in front of
the lips to the anterior surface of the cervical spine.
Our coil configuration consists of seven subgroups of elements: five longitudinal pairs
and two longitudinal triplets (see Fig. 6.1a). The coils in each subgroup share a common
conducting element along one edge (uniform color in Fig. 6.1a) but form two layers along
the opposite side (dashed line). The array has an overall dimension of 43 × 18 cm².
Fourteen elements have a dimension of 8.2 × 7.5 cm², whereas the third coil of each triplet
has a dimension of 8.5 × 7.5 cm². The top two bands of seven coil elements each are
mounted on a thin plastic substrate that wraps around the lower face. The third coil
of each triplet extends down the side of the neck and is encased in a flexible foam cover
that allows it to curve under the chin for a tighter coupling to the neck region (Fig. 6.1c).
The overlaps between neighboring coil elements form three different areas: A, B, and C
in Fig. 6.1a. The contour of the coil elements was adjusted so that area B cancels the
mutual inductive coupling between diagonal elements. The sum of the areas A and B
cancels the coupling between the circumferentially adjacent coils. The overlap C between
longitudinally adjacent coil elements is larger than that needed to cancel mutual inductive
coupling. For coils spaced along the z-direction with the conventional overlap [93], the
SNR profile shows two peaks with a valley where the coils intersect. The overlapping area
C was enlarged to eliminate the valley and give a flat SNR response along the z-direction.
The resulting excess mutual inductance can be canceled with the decoupling capacitor Cd
placed in the common conducting element. In all cases, the coupling between adjacent
coil elements could be compensated far below the inherent inductive coupling between
non-adjacent coil elements. The individual coil elements were tuned with the trimmer
capacitors to give a minimal reactive component to the input impedance when the coils
were loaded with a volunteer. The input match capacitor of each element (marked with
an asterisk in Fig. 6.1a) is connected to an input match circuit (Fig. 6.1b). The PIN diode
MA4P1250NM provides active blocking during the transmit pulse. The signal diode
1N6640 allows passive blocking if the coil is not plugged into the scanner. The ratio of
the unloaded Q to loaded Q for each coil element was deduced from measurements of
the coil input impedances. The ratio was taken from the peak values of the resistive
component of the input impedance in the unloaded and loaded conditions.
Each coil element is connected to a 27 dB-gain, low-input-impedance preamplifier
(Rich Spring Technologies, Arcadia, CA) located in a box on either side of the head.
Each coaxial cable connecting a coil element and its preamp is fitted with an RF trap at
the input of the preamp to eliminate multiple ground connections to different points on
the subgroup's common conductor and to prevent spurious eddy currents in the shields of
the coax. The RF trap consists of three turns of 0.040" semi-rigid coax (Fig. 6.1d) inside
a cylindrical shell (Fig. 6.1e) made from double-sided Teflon circuit board. The outer
surface of the Teflon is etched to form two capacitors between two outer conducting pads
and the common inner pad. The outer pads are segmented into many smaller, binary
weighted pads connected with small bridges. Each trap was tuned to 127.72 MHz by
cutting a sufficient number of the bridges with a file while monitoring its insertion loss
with a network analyzer. This construction is an economical way to produce numerous
traps that can be fine-tuned without stocking a large number of different values of high
voltage capacitors or using expensive variable capacitors.
The preamp boxes are mounted on a base plate that serves as the platform for an
adjustable headrest padded with memory foam. The two preamp boxes serve as the base
of a pivoting cross bar that holds an adjustable cantilever connected to the facemask coil
array. The coil array can be held close to the face (Fig. 6.2) or folded up and back to
permit patient entry and exit. The outputs of the two preamp boxes were connected to
the scanner's input sockets by way of two cables that were each fitted with a single RF
trap. The latter trap consists of two concentric conducting cylinders that are connected
at each end by several capacitors in parallel.
6.2.2 Experimental Methods
Experiments were performed on a Signa EXCITE HDx 3.0T scanner (GE Healthcare,
Waukesha, WI) with gradients capable of 40 mT/m amplitude and 150 mT/m/ms slew
rate. Thereceiverbandwidthwassetto§125kHz(i.e.,4¹ssamplingrate)forallstudies.
The noise correlation matrix as a measure of coil coupling was computed by 1) acquiring
data with no RF excitation and 2) calculating noise correlation from every pair of the
coil elements as described in Eq. (6) of Ref. [83].
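The two steps above can be sketched as follows. This is a minimal illustration, not the author's code: Eq. (6) of Ref. [83] is not reproduced in this chapter, so the sketch assumes the common definition in which the sample covariance of zero-mean noise is normalized by the per-channel noise power. The function name and the `noise[channel][sample]` data layout are hypothetical.

```python
import math


def noise_correlation(noise):
    """Return the magnitude of the normalized noise correlation matrix.

    noise: list of channels, each a list of complex noise samples acquired
    with no RF excitation (hypothetical layout: noise[channel][sample]).
    The noise is assumed to be zero-mean, so no mean is subtracted.
    """
    n_ch = len(noise)
    n_samp = len(noise[0])
    # Sample covariance: psi[i][j] = mean over samples of n_i * conj(n_j).
    psi = [[sum(a * b.conjugate() for a, b in zip(noise[i], noise[j])) / n_samp
            for j in range(n_ch)] for i in range(n_ch)]
    # Normalize by the channel noise powers so the diagonal becomes 1.
    return [[abs(psi[i][j]) / math.sqrt(psi[i][i].real * psi[j][j].real)
             for j in range(n_ch)] for i in range(n_ch)]


# Tiny synthetic example: channel 1 is a scaled copy of channel 0,
# channel 2 is orthogonal to both.
noise = [
    [1 + 0j, 1j, -1 + 0j, -1j],
    [2 + 0j, 2j, -2 + 0j, -2j],
    [1 + 0j, -1 + 0j, 1 + 0j, -1 + 0j],
]
corr = noise_correlation(noise)
```

In this toy input the fully coupled pair gives an off-diagonal magnitude of 1 and the orthogonal pair gives 0; real coil data would fall in between.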
For the SNR and g-factor evaluation, a 3D volume of the upper airway was acquired in
the supine position using a 3DFT gradient echo sequence. Imaging parameters were:
k_y and k_z phase encoding along the anterior-posterior (A-P) and right-left (R-L)
directions, respectively; TE = 2.1 ms; TR = 3.9 ms; 1.88 × 1.88 × 2.50 mm³ spatial
resolution; 24 × 24 × 18 cm³ FOV; matrix size = 128 × 128 × 72; scan time = 30 seconds.
The subjects held their mouth closed without swallowing during each 30-second scan. I
considered retrospective 1D regular undersampling on a midsagittal slice and 2D regular
undersampling on axial and coronal slices. The undersampling was along the A-P direction.
I performed the SNR evaluation of the 16-channel coil by comparing it with a widely
available single-channel birdcage transmit/receive coil and an 8-channel neurovascular
(NV) receive coil, and also by comparing the performance of the 16-channel coil among
three subjects. The single-channel birdcage coil is cylindrical, with an inner diameter
of 28 cm and a length of 38 cm, and usually covers the brain. It is high-pass and quadrature
driven. The 8-channel NV coil consists of a 4-element head array and a 4-element neck
array: the head array has an inner diameter of 22 cm and a length of 33 cm, and the neck
array has dimensions of 34 × 17 cm² in the anterior neck region and 44 × 22 cm² in the
posterior neck region. Eight ROIs (see Fig. 6.3a) were selected manually on
a midsagittal scan plane, and the average SNR was evaluated on each ROI separately.
For the data acquired from the 8-channel NV and 16-channel UA coils, the average
SNRs were calculated from images in SNR units reconstructed with the B1-weighted
combining method described in Eq. (6) of Ref. [48]. Low-resolution coil sensitivity maps
were calculated from 32 central k-space lines by dividing the image at each coil element
by the root sum-of-squares (RSS) image. I compared image quality between one large
male and one small female on a midsagittal scan plane after performing parallel imaging
reconstruction [90] for reduction factors of R = 2, 3, 4, and 5. Parallel imaging g-factors
were calculated on a pixel-by-pixel basis as described in Ref. [90].
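As an illustration of a pixel-by-pixel g-factor calculation, the sketch below uses the standard SENSE expression g_i = sqrt([(S^H S)^-1]_ii [S^H S]_ii), where S holds the coil sensitivities at the R pixels that alias onto one another. It assumes the noise has been pre-whitened (identity noise covariance), which may differ from the exact processing behind the reported values; the function names and the `sens[coil][pixel]` layout are hypothetical.

```python
import math


def invert(matrix):
    """Invert a small square complex matrix by Gauss-Jordan elimination."""
    n = len(matrix)
    # Augment with the identity matrix.
    aug = [list(map(complex, row)) + [complex(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def g_factors(sens):
    """g-factor for each of the R pixels that alias onto one another.

    sens[c][p]: sensitivity of coil c at aliased pixel p. The noise is
    assumed pre-whitened, i.e. the noise covariance is the identity.
    """
    n_coils, n_pix = len(sens), len(sens[0])
    # Gram matrix A = S^H S.
    gram = [[sum(sens[c][i].conjugate() * sens[c][j] for c in range(n_coils))
             for j in range(n_pix)] for i in range(n_pix)]
    gram_inv = invert(gram)
    return [math.sqrt((gram_inv[i][i] * gram[i][i]).real) for i in range(n_pix)]


# Two coils, R = 2. Fully distinct sensitivities give no noise penalty;
# overlapping sensitivities raise the g-factor above 1.
g_distinct = g_factors([[1, 0], [0, 1]])
g_overlap = g_factors([[1, 1], [0, 1]])
```

The toy case mirrors the qualitative argument in this chapter: the more distinct the coil sensitivities are along the undersampled direction, the closer the g-factor stays to 1.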
High resolution 3D imaging of the vocal tract was demonstrated using a 3DFT gradient
echo sequence. Imaging parameters were: 8 cm midsagittal slab excitation; flip angle
= 5°; k_y and k_z phase encoding along the A-P and R-L directions, respectively;
TE = 2.1 ms; TR = 4.2 ms; 1.25 × 1.25 × 1.25 mm³ spatial resolution; 20 × 20 × 10 cm³
FOV; matrix size = 160 × 160 × 80; scan time = 54 seconds. During each 54-second scan,
the subjects held their mouth open without swallowing. For the evaluation of parallel
imaging performance, I considered retrospective 2D regular undersampling of the k_y and
k_z encodes after taking the inverse Fourier transform along the readout direction (i.e., S-I).
SENSE reconstructions were performed at each axial slice, with coil calibration
performed using the central 32 × 32 (k_y, k_z) encodes.
Real-time upper airway MRI during natural speech sound production was demonstrated
on the 16-channel UA coil using a 2D spiral gradient echo sequence. Imaging
parameters were: TE = 1.4 ms, TR = 4.0 ms, readout duration = 1.2 ms, slice thickness
= 5 mm, flip angle = 10°, FOV = 30 × 30 cm², spatial resolution = 2.0 × 2.0 mm², image
matrix size = 150 × 150. The subject was instructed to repeat "go pee shop okay bow
know" during the scan. The angular increment of spiral interleaves in k-space was based
on the golden-ratio temporal view order [114], which supports retrospective selection
of the temporal resolution at any arbitrary time point.
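The property described above can be sketched as follows. The increment value is an assumption: this chapter does not state it, so the sketch takes 360° divided by the golden ratio (about 222.49°), a common choice for spiral interleaves. The function names are hypothetical.

```python
import math

GOLDEN_RATIO = (1 + math.sqrt(5)) / 2
# Assumed increment: 360 degrees divided by the golden ratio (about 222.49).
GOLDEN_ANGLE_DEG = 360.0 / GOLDEN_RATIO


def interleaf_angles(n_interleaves, start_index=0):
    """Rotation angles (degrees, in [0, 360)) of consecutive spiral interleaves."""
    return [((start_index + k) * GOLDEN_ANGLE_DEG) % 360.0
            for k in range(n_interleaves)]


def largest_gap(angles):
    """Largest angular gap between neighbouring interleaves on the circle."""
    ordered = sorted(angles)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    gaps.append(360.0 - ordered[-1] + ordered[0])
    return max(gaps)


# Any window of consecutive interleaves covers the circle fairly evenly,
# regardless of where in the acquisition the window starts.
window_a = interleaf_angles(13, start_index=0)
window_b = interleaf_angles(13, start_index=500)
```

Because a window starting at any interleaf index is just a rotated copy of the window starting at index 0, the angular coverage does not depend on the chosen time point, which is what makes retrospective frame selection possible.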
From the 2D spiral data set, five different acceleration factors of 1.0, 1.6, 2.6, 4.2,
and 6.8 were considered in frame reconstruction. The corresponding numbers of spiral
interleaves were 89, 55, 34, 21, and 13, which led to temporal resolutions of 356 ms,
220 ms, 136 ms, 84 ms, and 52 ms, respectively. Coil sensitivity calibration was
performed by utilizing multi-coil data corresponding to a fully sampled temporal window and
by dividing each individual coil image by the RSS image. The coil sensitivity maps and
noise covariance matrix were applied to image reconstruction. Image reconstruction from
the undersampled multi-coil spiral data was based on an iterative conjugate gradient
algorithm after taking the noise decorrelation steps described in Ref. [89]. Iterations
were stopped at the 15th iterate for the acceleration factors of 1.0, 1.6, and 2.6 and at
the 20th iterate for the acceleration factors of 4.2 and 6.8, based on Ref. [91]. After
the reconstruction of the frames, temporal median filtering with a filter length of 5 was
applied pixel-by-pixel to successive frames in order to eliminate residual aliasing artifacts
in the reconstructed frames [111].
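Two pieces of this paragraph lend themselves to a short sketch: the arithmetic linking interleaf counts to temporal resolution and acceleration factor, and the length-5 temporal median filter. The numbers come from the text; the variable names, the function name, and the edge handling of the filter (repeating the first and last frame) are assumptions, since the text specifies only the filter length.

```python
from statistics import median

TR_MS = 4.0                      # repetition time quoted in the text
FULL_INTERLEAVES = 89            # interleaves in the R = 1.0 reconstruction
INTERLEAVES = [89, 55, 34, 21, 13]

# Temporal resolution = number of interleaves per frame x TR.
temporal_resolution_ms = [n * TR_MS for n in INTERLEAVES]
# Acceleration factor = fully sampled interleaves / interleaves used.
acceleration = [round(FULL_INTERLEAVES / n, 1) for n in INTERLEAVES]


def temporal_median_filter(series, length=5):
    """Median-filter one pixel's intensity over successive frames.

    Edge handling (repeating the first/last frame) is an assumption; the
    text specifies only the filter length.
    """
    half = length // 2
    padded = [series[0]] * half + list(series) + [series[-1]] * half
    return [median(padded[i:i + length]) for i in range(len(series))]


# A one-frame spike (e.g. a transient aliasing artifact) is suppressed.
pixel_over_time = [10, 11, 10, 90, 11, 10, 12]
filtered = temporal_median_filter(pixel_over_time)
```

The interleaf counts 89, 55, 34, 21, and 13 are consecutive Fibonacci numbers, which fits naturally with a golden-ratio view order.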
6.3 Results
The majority of the Q_unloaded/Q_loaded ratios ranged from 4 to 6, with a few as high as 8,
depending on how closely the array was placed on the subject. Figure 6.4 contains the
locations of the coil indices and the magnitude plot of a noise correlation matrix from
a healthy volunteer. The off-diagonal elements in the noise correlation matrix ranged
from 0.0029 to 0.7039 with a mean of 0.1738. The maximum noise correlation occurred
in a pair of longitudinally adjacent coil elements, which have a relatively larger degree of
overlap than a pair of horizontally adjacent coil elements.
Figure 6.5 shows midsagittal, axial, and coronal slices of the 3D upper airway from
a healthy subject. From the three orthogonal slice images, it is observed that each coil
element has its unique coil sensitivity pattern. For example, coil 8 shows highly localized
sensitivities in midsagittal slice but shows relatively uniform sensitivity in coronal slice.
Coil 12 shows uniform sensitivity in midsagittal slice but shows localized sensitivities in
axial and coronal slices.
Table 6.1: Average SNR improvement. Average SNR was measured in a single subject (33
year old male) using the 16-channel UA coil, single-channel birdcage coil, and 8-channel
NV coil. Eight regions of interest (see Fig. 6.3a) were identified in 2D midsagittal images
with 1.88 × 1.88 × 2.50 mm³ spatial resolution, obtained without the use of parallel imaging.
The 16-channel UA coil provided improved intrinsic SNR in all regions of interest. UA:
upper airway, NV: neurovascular.

ROI                   SNR_UA/SNR_Birdcage   SNR_UA/SNR_NV
1  Upper lip          11.6                  5.2
2  Lower lip          19.0                  8.8
3  Front tongue       10.0                  5.5
4  Mid tongue          5.4                  3.5
5  Back tongue         4.1                  2.6
6  Palate              3.2                  1.9
7  Velum               2.9                  2.0
8  Pharyngeal wall     2.0                  1.5
Table 6.1 compares SNR for 8 different ROIs with the three different coils on one
subject. The 16-channel coil produced the highest SNR in every region. In particular, the
average SNR in the lower lip showed an 8.8-fold and a 19-fold increase compared to the
8-channel NV coil and the birdcage coil, respectively. The 16-channel coil produced the
smallest SNR improvement in the pharyngeal wall: 1.5-fold over the 8-channel coil and
2.0-fold over the birdcage coil.
Table 6.2 compares the SNR ratio for 8 different ROIs on three subjects with the
16-channel UA coil. The maximum SNR ratio from Subject 2 was 3.1, which was far lower
than the 7.1 and 8.0 from Subject 1 and Subject 3, respectively. This results from the fact
that Subject 2 has a relatively smaller head size and the coil array is placed close to
the orofacial part of the head.
Table 6.2: Comparison of average SNR drop-off in three volunteers. Average SNR was
measured in three volunteers using the 16-channel coil. Eight regions of interest were
identified in 2D midsagittal images with 1.88 × 1.88 × 2.50 mm³ spatial resolution, obtained
without the use of parallel imaging. Note the relatively less steep SNR drop-off from the
small female subject.

                      Subject 1           Subject 2           Subject 3
Age/Sex               34/M                24/F                33/M
Size (A-P/S-I)*       11.0 cm / 9.1 cm    9.3 cm / 6.6 cm     12.2 cm / 9.1 cm
ROI                   SNR_ROI/SNR_pw      SNR_ROI/SNR_pw      SNR_ROI/SNR_pw
1  Upper lip          7.1                 3.1                 6.4
2  Lower lip          7.0                 3.0                 8.0
3  Front tongue       3.9                 1.9                 3.2
4  Mid tongue         2.6                 1.5                 2.0
5  Back tongue        2.2                 1.4                 1.6
6  Palate             3.8                 1.4                 3.9
7  Velum              2.0                 1.1                 1.4
8  Pharyngeal wall    1.0                 1.0                 1.0

*: A-P size was measured as the distance from the lower lip to the pharyngeal wall, and
S-I size was measured as the distance from the hard palate to the glottis.

Figure 6.6 contains SENSE reconstructed images and g-factor maps for R = 2 to 5
from one large male (Subject 1) and one small female (Subject 2) using the 16-channel
coil. The pharyngeal wall of Subject 1 in Fig. 6.6a appears darker than that
of Subject 2 in Fig. 6.6c because of the larger head size of Subject 1 and the geometry
of the close-fitting coil. In both subjects, the SENSE images up to a reduction factor of 4
exhibited good depiction of air-tissue boundaries in the upper airway ROIs, as indicated
by the red contours in Fig. 6.6a and c. Note that spiky artifacts resulting from
noise amplification due to sensitivity matrix inversion are observed in the R = 3, 4, and
5 SENSE images in both subjects. The mean g-factors in the upper airway ROIs were
comparable in both subjects for all reduction factors considered.
Figures 6.7a and b show g-factor plots for the eight ROIs and the entire image as a function
of reduction factor for 2DFT midsagittal imaging when using the 8-channel neurovascular
coil and the 16-channel upper airway coil, respectively. The direction of the phase encode was
along A-P. The g-factor values were substantially lower for the 16-channel coil compared
to the 8-channel coil. For reduction factor 4, the average g-factor was 40% lower and
the difference in g-factor values among the ROIs was 30% lower for the 16-channel
coil. As indicated by the shaded region, g-factor values are in the range of 3-12 among
the ROIs for the 8-channel coil, while they are in the range of 2-4 for the 16-channel
coil. The dramatic improvement achieved with the 16-channel coil may be due to more
coil elements having distinct coil sensitivities in the midsagittal ROIs (see Figure 6.5a).
Since only four elements of the 8-channel NV coil are close to the ROIs, it is expected that
the g-factor values increase dramatically for reduction factors greater than 4. As seen
from Fig. 6.7a, g-factor values for the pharyngeal wall ROI doubled when the reduction factor
changed from 5 to 6.
Figures 6.8a and b contain midsagittal images reconstructed via SENSE (R = 4) from
the Cartesian sampled data. The image from the 8-channel neurovascular coil shows
severe noise amplification in most upper airway ROIs (see the arrow in Fig. 6.8a). The
image from the 16-channel coil exhibits improved image quality with much higher SNR
(see Fig. 6.8b). Figures 6.8c-f show axial and coronal slice reconstructions using SENSE
with a rate of 6 (R_y = 3 along the vertical axis, R_z = 2 along the horizontal axis). The images
from the 8-channel coil exhibit significant noise amplification (arrows in Figures 6.8c and
e). The images from the 16-channel coil exhibit substantially reduced noise amplification
because of the lower g-factor values and higher intrinsic SNR available from the 16-channel
upper airway coil. The average g-factors for the axial slices in Fig. 6.8c and d were 4.64
and 2.67, respectively. The average g-factors for the coronal slices in Fig. 6.8e and f were
2.23 and 1.52, respectively. Note that the coronal slices exhibit lower g-factors than the
axial slices for both the 8-channel and 16-channel coils.
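To put these g-factors in terms of SNR, the standard SENSE relation SNR_accelerated = SNR_full / (g √R) can be applied. The sketch below plugs in the average axial g-factors quoted above; using a slice-average g in a per-pixel relation is only a rough approximation, and the function name is hypothetical.

```python
import math


def retained_snr_fraction(g_factor, reduction_factor):
    """Fraction of fully sampled SNR left after SENSE: 1 / (g * sqrt(R))."""
    return 1.0 / (g_factor * math.sqrt(reduction_factor))


# Average axial-slice g-factors quoted in the text for R = 6.
nv_axial = retained_snr_fraction(4.64, 6)   # 8-channel NV coil
ua_axial = retained_snr_fraction(2.67, 6)   # 16-channel UA coil
```

Under this approximation the g-factor difference alone leaves the 16-channel coil with roughly 1.7 times the relative SNR of the 8-channel coil at R = 6, before its higher intrinsic SNR is taken into account.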
Figure 6.9 compares R = 1 and R = 6 SENSE images on one reformatted midsagittal,
four reformatted coronal, and four axial slices. Slice prescriptions for the coronal and
axial slices are indicated by the solid lines in Fig. 6.9a. These slices were chosen because
they contain information on the air-tissue boundaries necessary for the extraction of the
3D vocal tract shape [75]. Although the R = 6 SENSE images are noisier than the
R = 1 images, they preserve the air-tissue boundary features that are relevant to the
measurement of the vocal tract area function.
Figure 6.10 contains spiral SENSE reconstructed images of the midsagittal slice for
five different acceleration factors of 1.0, 1.6, 2.6, 4.2, and 6.8. Parallel imaging with
reduction factors of up to 4.2 produced images that retained sufficient image quality
for speech imaging applications. In other words, the boundaries of the tongue, velum, and
other articulators were clearly depicted. The R = 6.8 image (see Fig. 6.10a) exhibited
poor depiction of these boundaries because of lower image SNR. The constriction between
the tongue blade and the alveolar ridge is seen during the utterance of /i/ (the solid arrows
in Fig. 6.10b). Also, the constriction between the tongue tip and the alveolar ridge is
seen during the utterance of /ʃ/ (the hollow arrows in Fig. 6.10b).
6.4 Discussion
The 16-channel coil produced acceptable image quality in static upper airway imaging
with conventional Cartesian SENSE at rate 4 with 1D undersampling and at rate 6 with 2D
undersampling. This opens up new opportunities for upper airway MRI with improved
spatial resolution. For example, 1.25 mm isotropic resolution 3D imaging of the vocal
tract is achievable with rate-6 conventional SENSE in a 9-second scan.
This is already a substantial improvement over the recently reported accelerated 3D
sustained vocal tract imaging that utilizes the single-channel birdcage coil and achieves
a 1.5 × 1.5 × 2.0 mm³ resolution and a 7-second scan time using compressed sensing [53].
Higher acceleration than rate 6 may be achievable by using advanced image reconstruction
methods such as the combined use of compressed sensing and parallel imaging [70].
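The scan times quoted for the 1.25 mm isotropic protocol follow from the matrix size and TR given in Section 6.2.2. The sketch below assumes one TR per acquired (k_y, k_z) phase encode and ignores any extra calibration lines that a prospectively accelerated scan might add; the function name is hypothetical.

```python
def scan_time_s(n_ky, n_kz, tr_ms, reduction=1):
    """3DFT scan time: one TR per acquired (ky, kz) phase encode."""
    return n_ky * n_kz * tr_ms / reduction / 1000.0


full = scan_time_s(160, 80, 4.2)               # fully sampled
accelerated = scan_time_s(160, 80, 4.2, 6)     # rate-6 regular undersampling
```

With 160 × 80 phase encodes and TR = 4.2 ms this gives about 54 seconds fully sampled and about 9 seconds at rate 6.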
The localized sensitivity of the 16-channel coil enables spiral imaging with a smaller
FOV, leading to better spatio-temporal resolution compared to what is possible
with the 8-channel NV coil. As the reduction factor increased to 6.8 in the spiral parallel
imaging, severe SNR degradation occurred in the iterative SENSE reconstruction.
Iterative SENSE with explicit regularization (e.g., total variation regularization) may
be beneficial since it mitigates noise amplification and preserves the edges of the air-tissue
interface. It is worth noting that the use of SENSE-based non-Cartesian parallel imaging
requires: 1) full-FOV image acquisition (including the brain, which is not our region of
interest) for estimating coil sensitivity, and 2) extrapolation to estimate coil sensitivity in
image regions occupied by air. Data inconsistency due to motion of articulators can also
lead to inaccuracy in the data fitting term and degrade image quality. As an alternative,
k-space based parallel imaging approaches, such as non-Cartesian GRAPPA [40], do not
require the explicit estimation of coil sensitivity maps and may be more robust in practice.
Compared to the 8-channel neurovascular coil array, which has large coil elements,
coil calibration may be more difficult with the 16-channel coil, which has a larger
number of smaller-diameter elements. As such, coil sensitivities are more highly localized,
and rapidly moving anatomy (e.g., lips, tongue tip) that is close to the coil elements has
a strong signal contribution.
In coil design there is also much room for improvement. For example, future designs
can benefit from: rearrangement of the coil elements to reduce the g-factor in its worst
places, better coverage of the lower airway with additional coil elements, better coverage
of the pharyngeal wall with a few large coils around the back of the head, access for
a respirator or mask covering the face and nose for experiments related to airway
collapse [22], and rearrangement of the coils to allow for easier opening of the lower jaw. It
is also worth investigating the benefit of using higher channel counts (32 or more) for
even higher acceleration.
6.5 Conclusions
This novel 16-channel coil array provides rate-4 acceleration for 2DFT midsagittal imaging,
rate-6 acceleration for 3DFT imaging, and rate-4.2 acceleration for 2D spiral midsagittal
imaging of the upper airway. This will lead to either higher spatial resolution imaging or
reduced scan time in capturing a 3D vocal tract shape during sustained sound production.
This coil has the potential to allow for improved spatio-temporal resolution in dynamic
upper airway MRI studies of swallowing and normal/disordered speech.
Figure 6.1: 16-channel upper airway receive coil array. (a): Coil layout. Individual
coil elements are overlapped with their neighboring coil elements to minimize mutual
inductive coupling. The two lower coil elements, which correspond to the arrows in (c),
extend down the side of the neck region to improve SNR of the lower airway. (b): Input
circuit diagram. (c): Photograph of the 16-channel coil. Each coil element is connected
to a preamplifier. A pivoting cross bar is supported by the two preamplifier boxes and
holds an adjustable cantilever connected to the coil array. The coil array can be held
close to the face or folded up and back to permit patient entry and exit. The outputs of
the two preamp boxes are connected to the scanner's input sockets by way of two cables
that are each fitted with a single RF trap. (d): RF trap coaxial inductor. (e): RF trap
with etched capacitors [39].
Figure 6.2: The 16-channel upper airway coil array on a volunteer. (Left) Side view.
(Right) Top view.
Figure 6.3: Illustration of the eight upper airway regions of interest (ROIs) used in the
evaluation of SNR and g-factor. (a) The ROIs are identified by red contours and are
superimposed onto the midsagittal anatomic image acquired using the 16-channel upper
airway coil: 1-upper lip, 2-lower lip, 3-front tongue, 4-mid tongue, 5-back tongue,
6-palate, 7-velum, 8-pharyngeal wall, and 9-epiglottis. (b) and (c) show images acquired
using the birdcage coil and the 8-channel neurovascular coil, respectively. Note that the image
acquired from the 16-channel coil has localized sensitivity in the upper airway regions
of interest.
[Figure 6.4 graphic: (a) coil layout labeled with channel indices 1 to 16; (b) 16 × 16 matrix plot with "Channel index" on both axes and a color scale from 0.0 to 1.0.]
Figure 6.4: (a) Channel indices labeled on the coil layout. (b) Noise correlation matrix
from one representative volunteer.
Figure 6.5: Midsagittal, axial, and coronal images at each coil element of the 16-channel
upper airway coil. 3D data set was acquired using 3DFT GRE sequence. The images
shown are ordered based on the coil layout in Fig. 6.4a. (a) Midsagittal magnitude image
for each coil element. In the upper left image, the dotted and solid lines indicate axial
and coronal slice prescriptions, respectively. From the prescribed lines in (a), shown are
(b) axial and (c) coronal images from each coil element. The numbers shown on each
image indicate channel indices.
[Figure 6.6 graphic: image panels (a) and (c), g-factor map panels (b) and (d) with a color scale from 0 to 10; columns labeled R = 1 (images only), R = 2, R = 3, R = 4, R = 5. g-factor annotations for R = 2, 3, 4, 5:
first row: max g = 3.3, avg g = 1.3; max g = 6.1, avg g = 1.9; max g = 6.1, avg g = 2.5; max g = 14.2, avg g = 4.1
second row: max g = 2.1, avg g = 1.1; max g = 5.4, avg g = 1.5; max g = 9.4, avg g = 2.4; max g = 11.6, avg g = 3.9]
Figure 6.6: 2D midsagittal image reconstruction using 1D SENSE. The direction of
undersampling was A-P. SENSE reconstructed images with a spatial resolution of
1.88 × 1.88 × 2.50 mm³ are compared with respect to reduction factors from 2 to 5 for
(a) Subject 1 and (c) Subject 2. Corresponding g-factor maps are shown for (b)
Subject 1 and (d) Subject 2. The mean and maximum g-factors shown were computed
from the ROIs indicated by the red contours of the R = 1 images in (a) and (c).
[Figure 6.7 graphic: two plots, (a) and (b), of g-factor versus reduction factor (1 to 6); vertical axis ticks run from 1 to 40 in the first plot and from 0 to 12 in the second. Legend: upper lip, lower lip, front tongue, mid tongue, back tongue, palate, velum, pharyngeal wall, entire image.]
Figure 6.7: Plots of g-factor values for 8 different ROIs as a function of reduction factor
for 2DFT midsagittal parallel imaging. The axis of undersampling was along A-P. (a)
8-channel neurovascular coil. (b) 16-channel upper airway coil. Note that the scales of
the vertical axes are different for (a) and (b) and that a substantial reduction in g-factors is
achieved with the 16-channel coil.
Figure 6.8: Comparison of midsagittal, axial, and coronal slice reconstruction using
parallel imaging. (Top): 8-channel neurovascular coil. (Bottom): 16-channel upper airway
coil. (a, b): Midsagittal SENSE reconstructed images (spatial resolution: 1.88 × 1.88
× 2.5 mm³) with a rate of 4. The direction of phase encode was horizontal. (c, d): Axial
SENSE reconstructed images (spatial resolution: 1.88 × 1.88 × 2.5 mm³) with a rate of 6 (R
= 3 along the vertical axis and R = 2 along the horizontal axis). (e, f): Coronal SENSE
reconstructed images (spatial resolution: 1.88 × 1.88 × 2.5 mm³) with a rate of 6 (R = 3
along the vertical axis and R = 2 along the horizontal axis). The 8-channel coil produced
unacceptable image quality with low image SNR and substantial noise amplification (see
the arrows in (a, c, e)). The 16-channel coil produced significantly improved image SNR
with smaller noise amplification than the 8-channel coil.
Figure 6.9: 3D image reconstruction using 2D SENSE. The directions of undersampling
were A-P and R-L. Fully sampled 1.25 mm isotropic resolution 3D data set was acquired
in open-mouthed position during 54 seconds scan. Retrospective undersampling was
performed with a regular undersampling factor of 6, where two-fold acceleration was
along R-L direction and three-fold acceleration was along A-P direction. Reformatted
midsagittal images are shown for (a) SENSE R = 1, and (c) SENSE R = 6. Eight
representative coronal and axial images are shown for (b) SENSE R = 1 and (d) SENSE
R = 6. Air-tissue boundaries in the vocal tract are well defined in all the SENSE R = 6
images.
[Figure 6.10 graphic: panel (a) shows five images labeled R = 1 (89 intl.), R = 1.6 (55 intl.), R = 2.6 (34 intl.), R = 4.2 (21 intl.), and R = 6.8 (13 intl.); panel (b) shows a sequence of dynamic frames.]
Figure 6.10: 2D image reconstruction of the dynamics of vocal tract shaping using spiral
SENSE. (a) Comparison of spiral SENSE images from a single representative frame. For
the evaluation of SENSE reconstruction performance under different reduction factors,
the frame was chosen from the data acquired when the subject (Subject 1) was stationary.
Spatial aliasing was substantially reduced for all the SENSE reconstructed images. The
R = 6.8 image exhibited relatively lower image SNR in the posterior regions of the upper
airway (e.g., velum and pharyngeal wall). (b) Dynamic frames that represent the subject's
utterance of "ee shop". The R = 4.2 SENSE reconstruction was used to generate the
image frames, with a frame update every 44 ms. The frame sequence is ordered from
the top left to the bottom right.
Chapter 7
Summary and Future Work
In summary, I have presented four imaging methodologies that improve image quality
over conventional techniques in rapid MRI of the beating heart and upper airway:
• Automatic correction of echo-planar imaging (EPI) ghosting artifacts in
real-time MRI
I proposed "double-alternating" echo-planar imaging and sensitivity encoding
reconstruction to effectively remove ghosting artifacts while maintaining temporal
resolution [57]. I demonstrated its effectiveness at reducing ghosting artifacts resulting
from echo misalignment in oblique/double-oblique scan planes by comparing it with
the traditional 1D phase correction method.
• Accelerated 3D upper airway MRI using compressed sensing: Application
to sustained speech imaging
I demonstrated a first application of compressed sensing to sustained speech
imaging [53]. Compared to traditional 3D MRI of the vocal tract, the proposed method
acquires data in true 3D Fourier space and its acquisition does not involve any
repetition of the target utterance. Hence, it is not prone to the image mis-registration
effect and also results in a more time-efficient data collection procedure.
Pseudo-random Fourier encoding and compressed sensing reconstruction are exploited to
further improve spatial resolution within the sustained sound duration. Its combination
with parallel imaging significantly improves the image quality at high acceleration
rates [52].
• Improved real-time speech MRI using a golden-ratio spiral view order
I proposed a golden-ratio spiral view order and demonstrated its first application
to real-time speech MRI [55, 54]. The golden-ratio spiral view order provides more
flexible retrospective selection of temporal resolution. This is well suited to real-time
speech MRI, in which speech rate and motion of articulators vary widely depending
on the speech task or speaking style. I demonstrated its effectiveness at capturing
rapid movement of the tongue tip and relatively slow movement of the velum by
comparing it to the traditional bit-reversed spiral view order.
• Improved upper airway MRI using a 16-channel coil at 3 Tesla
I described a novel 16-channel upper airway 3 Tesla receive coil that was originally
designed by Hayes et al. [39]. I demonstrated its SNR and parallel imaging
g-factor performance, which is far superior to that of widely available single-channel
birdcage and 8-channel neurovascular array coils [51]. I demonstrated 1.25 mm isotropic
resolution 3D imaging of the vocal tract with a 6-fold parallel imaging acceleration.
This coil also has the potential to improve SNR and spatio-temporal resolution in
real-time speech MRI at 3 Tesla.
Finally, I wish to present some of the topics that I have been working on but have
not completed. These are worth more investigation as future work.
• High resolution vocal tract imaging during sustained speech production
I have performed a preliminary study on collecting 3D MRI data acquired during
sustained sound production of English vowels and fricatives. The improvement in
imaging is attributed to state-of-the-art technology: the 16-channel upper airway
receive coil, pseudo-random Poisson disk undersampling, and
phase-constrained multi-coil compressed sensing reconstruction. I obtained high
quality 3D vocal tract shape data sets with 1.25 mm isotropic resolution and 6
seconds scan time (i.e., 8× acceleration). Future work includes: 1) recruiting 3 male
and 3 female American English speakers who have Midwestern or Californian
accents, 2) collecting multiple sets of sustained sound production of American English
vowels and fricatives from the subjects, and 3) extracting vocal tract shapes and reporting
any differences in the area functions and 3D geometries of the vocal tract shapes
across the subjects.
• Multi-coil phase constrained compressed sensing for accelerated 3D upper
airway MRI
I have proposed a multi-coil phase constrained compressed sensing reconstruction
algorithm that involves multiple iterations to improve the estimation of the phase
map and thus ultimately improves image quality in the depiction of air-tissue
boundaries. I will address the convergence of the iterative algorithm and seek to
build a theoretical framework for the method.
• Real-time speech MRI at 3 Tesla
SPAN data collection has been performed predominantly at 1.5 Tesla scanner sites.
The reconstructed frames have low SNR, which leads to a poor depiction of the
air-tissue boundaries. Speech imaging at 3 Tesla is an interesting research area,
and some imaging work has been reported in the literature [111, 105]. I have
developed a real-time continuous spiral imaging sequence at 3 Tesla, but this does
not provide interactive monitoring of the vocal tract of the kind normally available
with custom real-time imaging software (i.e., RTHawk). Adaptation of RTHawk to
this new 3 Tesla scanner site is considered long-term future work. Optimizing
spiral imaging pulse sequences for speech imaging at 3 Tesla may require
substantial amounts of experimental work.
• Obstructive sleep apnea imaging under a less noisy MRI scan
So far, technological advances have been made towards the improvement of speech
imaging. For a side project related to obstructive sleep apnea imaging, I have
developed a pulse sequence that is based on the design of the excitation and readout
gradients with a far lower slew rate and thus significantly reduces acoustic noise from
the MRI gradients. The pulse sequence had an increased TR such that the idle time
exceeded the duration of the readout gradients. Scanning was able to proceed
continuously without any automatic pause for more than 30 minutes. The amplitudes
of the excitation and readout gradients could be adjusted from 0 to 100%
in real time. This setting may help the subjects to fall asleep naturally without
any use of sedatives and provide a better experimental setup for investigating
airway collapse in sleep apnea MRI.
• Time interleaved imaging of arbitrary scan planes
Conventional real-time speech MRI typically acquires the dynamics of vocal tract
shape from a single slice (typically the midsagittal slice). This provides insights into the
dynamics of all articulators, but does not allow for visualizing important features in
vocal tract shaping such as grooving/doming of the tongue, asymmetries in tongue
shaping, and lateral shaping of the pharyngeal airway. I have developed a real-time
speech imaging technique that provides acquisition of multiple arbitrary scan planes
in a time-interleaved fashion. An additional benefit is that partial saturation indicates
where the slices are located in relation to each other. I have applied this technique to
speech imaging of Mandarin Chinese fricatives and English voiced/voiceless
fricatives. Moreover, time-interleaved imaging of multiple axial slices of the brain and
one midsagittal slice of the upper airway is another application area. This enables
monitoring of speech or swallowing during an fMRI study [106].
Bibliography
[1] S. G. Adams, G. Weismer, and R. D. Kent. Speaking rate and speech movement
velocity profiles. J Speech and Hear Res, 36:41–54, 1993.
[2] A. Alwan, S. Narayanan, and K. Haker. Toward articulatory-acoustic models for
liquid consonants based on MRI and EPG data. part II: The rhotics. J Acoust Soc
Am, 101:1078–1089, 1997.
[3] A. Amelot and S. Rossato. Velar movements for two French speakers. Proceedings
of the 16th International Congress of Phonetic Sciences, Saarbrucken, Germany,
2007.
[4] A. Anagnostara, S. Stoeckli, O. M. Weber, and S. S. Kollias. Evaluation of the
anatomical and functional properties of deglutition with various kinetic high-speed
MRI sequences. J Magn Reson Imaging, 14:194–199, 2001.
[5] R. Arens, J. M. McDonough, A. T. Costarino, S. Mahboubi, C. E. Tayag-Kier,
G. Maislin, R. J. Schwab, and A. I. Pack. Magnetic resonance imaging of the upper
airway structure of children with obstructive sleep apnea syndrome. Am J Respir
Crit Care Med, 164:698–703, 2001.
[6] T. Baer, J. C. Gore, L. C. Gracco, and P. W. Nye. Analysis of vocal tract shape
and dimensions using magnetic resonance imaging: Vowels. J Acoust Soc Am,
90(2):799–828, 1991.
[7] P. Bangayan, A. Alwan, and S. Narayanan. From MRI and acoustic data to articu-
latory synthesis: A case study of the lateral approximants in American English. In
Proceedings of the Intl Conf Spoken Lang Processing, pages 793–796, Philadelphia,
PA, 1996.
[8] A. V. Barger, W. F. Block, Y. Toropov, T. M. Grist, and C. A. Mistretta. Time-
resolved contrast-enhanced imaging with isotropic resolution and broad coverage
using an undersampled 3D projection trajectory. Magn Reson Med, 48(2):297–305,
2002.
[9] A. J. Beer, P. Hellerhoff, A. Zimmermann, K. Mady, R. Sader, E. Rummeny, and
C. Hannig. Dynamic near-real-time magnetic resonance imaging for analyzing the
velopharyngeal closure in comparison with videofluoroscopy. J Magn Reson Imaging,
20:791–797, 2004.
[10] K. T. Block, M. Uecker, and J. Frahm. Undersampled radial MRI with multiple
coils. Iterative image reconstruction using a total variation constraint. Magn Reson
Med, 57:1086–1098, 2007.
[11] E. Bresch, Y-C. Kim, K. S. Nayak, D. Byrd, and S. Narayanan. Seeing speech:
capturing vocal tract shaping using real-time magnetic resonance imaging. IEEE
Signal Processing Magazine, 25:123–132, 2008.
[12] E. Bresch and S. Narayanan. Region segmentation in the frequency domain ap-
plied to upper airway real-time magnetic resonance images. IEEE Transactions on
Medical Imaging, 28:323–338, 2009.
[13] E. Bresch, J. Nielsen, K. Nayak, and S. Narayanan. Synchronized and noise-robust
audio recordings during real-time MRI scans. J Acoust Soc Am, 120(4):1791{1794,
2006.
[14] T. Breyer, M. Echternach, S. Arndt, B. Richter, O. Speck, M. Schumacher, and
M. Markl. Dynamic magnetic resonance imaging of swallowing and laryngeal mo-
tion using parallel imaging at 3T. Magn Reson Imaging, 27:48{54, 2009.
[15] C.P.BrowmanandL.Goldstein. Tiersinarticulatoryphonology, withsomeimpli-
cations for casual speech. Papers in laboratory phonology I: between the grammar
and the physics of speech, pages 341{376, 1990.
[16] M. H. Buonocore and L. Gao. Ghost artifact reduction for echo-planar imaging
using image phase correction. Magn Reson Med, 38:89{100, 1997.
[17] M. H. Buonocore and D. C. Zhu. High spatial resolution EPI using an odd number
of interleaves. Magn Reson Med, 41:1199{1205, 1999.
[18] D. Byrd and K. S. Harris. Identifying and evaluating apraxic speech de¯cits using
magnetometry. Proceedings of the 16th International Congress of Phonetic Sci-
ences, Saarbrucken, Germany, 2007.
[19] D.Byrd,S.Tobin,E.Bresch,andS.Narayanan. Timinge®ectsofsyllablestructure
and stress on nasals: A real-time MRI examination. J Phonetics, 37:97{110, 2009.
[20] E. Candes, J. Romberg, and T. Tao. Robust uncertainty principles: Exact sig-
nal reconstruction from highly incomplete frequency information. IEEE Trans Inf
Theory, 52(2):489{509, Feb 2006.
[21] N. K. Chen and A. M. Wyrwicz. Removal of EPI Nyquist ghost artifacts with
two-dimensional phase correction. Magn Reson Med, 51:1247{1253, 2004.
[22] I. Colrain, K. S. Nayak, and J. F. Nielsen. Real-time MRI of upper airway collapse
during inspiratory loading. In Proc, ISMRM, 14th Annual Meeting, page 2417,
Seattle, 2006.
[23] C. A. Conway, Y. Song, D. P. Kuehn, and B. P. Sutton. Field-corrected dynamic
imaging of the velopharyngeal musculature during swallow. In Proc, ISMRM, 16th
Annual Meeting, page 2005, Toronto, 2008.
[24] D. Demolin, S. Hassid, T. Metens, and A. Soquet. Real-time MRI and articulatory coordination in speech. Comptes Rendus Biologies, 325(4):547–556, 2002.
[25] D. Donoho. Compressed sensing. IEEE Trans Inf Theory, 52(4):1289–1306, Apr 2006.
[26] O. Engwall and P. Badin. An MRI study of Swedish fricatives: Coarticulatory effects. In Proc 5th Speech production seminar, pages 297–300, 2000.
[27] C. Y. Espy-Wilson, S. E. Boyce, M. T. T. Jackson, S. Narayanan, and A. Alwan. Acoustic modeling of the American English /r/. J Acoust Soc Am, 108(1):343–356, 2000.
[28] D. A. Feinberg and K. Oshio. Gradient-echo shifting in fast MRI techniques (GRASE imaging) for correction of field inhomogeneity errors and chemical shift. J Magn Reson, 97:177–183, 1992.
[29] D. A. Feinberg, R. Turner, P. D. Jakab, and M. V. Kienlin. Echo-planar imaging with asymmetric gradient modulation and inner-volume excitation. Magn Reson Med, 13:162–169, 1990.
[30] W. M. Fisher, G. R. Doddington, and K. M. Goudie-Marshall. The DARPA speech recognition research database: specifications and status. In Proceedings of DARPA workshop on speech recognition, pages 93–99, 1986.
[31] U. Gamper, P. Boesiger, and S. Kozerke. Compressed sensing in dynamic MRI. Magn Reson Med, 59:365–373, 2008.
[32] M. A. Griswold, P. M. Jakob, R. R. Edelman, and D. K. Sodickson. Alternative EPI acquisition strategies using SMASH. In Proc, ISMRM, 6th Annual Meeting, page 423, Sydney, 1998.
[33] M. A. Griswold, P. M. Jakob, R. M. Heidemann, M. Nittka, V. Jellus, J. Wang, B. Kiefer, and A. Haase. Generalized autocalibrating partially parallel acquisitions (GRAPPA). Magn Reson Med, 47:1202–1210, 2002.
[34] P. T. Gurney, B. A. Hargreaves, and D. G. Nishimura. Design and analysis of a practical 3D cones trajectory. Magn Reson Med, 55(3):575–582, 2006.
[35] P. C. Hansen. Analysis of discrete ill-posed problems by means of the L-curve. SIAM Review, 34(4):561–580, 1992.
[36] C. J. Hardy, E. C. Harvey, R. O. Giaquinto, T. Niendorf, A. K. Grant, and D. K. Sodickson. 32-element receiver-coil array for cardiac imaging. Magn Reson Med, 55:1142–1149, 2006.
[37] B. A. Hargreaves, D. G. Nishimura, and S. M. Conolly. Time-optimal multidimensional gradient waveform design for rapid imaging. Magn Reson Med, 51(1):81–92, 2004.
[38] D. M. Hartl, M. Albiter, F. Kolb, B. Luboinski, and R. Sigal. Morphologic parameters of normal swallowing events using single-shot fast spin echo dynamic MRI. Dysphagia, 18:255–262, 2003.
[39] C. E. Hayes, C. Carpenter, I. E. Evangelou, and G. Chi-Fishman. Design of a highly sensitive 12-channel receive coil for tongue MRI. In Proc, ISMRM, 15th Annual Meeting, page 449, Berlin, 2007.
[40] R. M. Heidemann, M. A. Griswold, N. Seiberlich, G. Kruger, S. A. R. Kannengiesser, B. Kiefer, G. Wiggins, L. L. Wald, and P. M. Jakob. Direct parallel image reconstructions for spiral trajectories using GRAPPA. Magn Reson Med, 56:317–326, 2006.
[41] F. Hennel. Image-based reduction of artifacts in multishot echo-planar imaging. J Magn Reson, 134:206–213, 1998.
[42] D. A. Herzka, P. Kellman, A. H. Aletras, M. A. Guttman, and E. R. McVeigh. Multishot EPI-SSFP in the heart. Magn Reson Med, 47(4):655–664, 2002.
[43] S. Hu, M. Lustig, A. P. Chen, J. Crane, A. Kerr, D. A. C. Kelley, R. Hurd, J. Kurhanewicz, S. J. Nelson, J. M. Pauly, and D. B. Vigneron. Compressed sensing for resolution enhancement of hyperpolarized 13C flyback 3D-MRSI. J Magn Reson, 192:258–264, 2008.
[44] M. S. Inoue, T. Ono, E. Honda, and T. Kurabayashi. Application of magnetic resonance imaging movie to assess articulatory movement. Orthod Craniofacial Res, 9:157–162, 2006.
[45] P. Irarrazabal and D. G. Nishimura. Fast three dimensional magnetic resonance imaging. Magn Reson Med, 33(5):656–662, 1995.
[46] J. I. Jackson, C. H. Meyer, D. G. Nishimura, and A. Macovski. Selection of a convolution function for Fourier inversion using gridding. IEEE Trans Med Imaging, 10(3):473–478, September 1991.
[47] P. Kellman, F. H. Epstein, and E. R. McVeigh. Adaptive sensitivity encoding incorporating temporal filtering (TSENSE). Magn Reson Med, 45:846–852, 2001.
[48] P. Kellman and E. R. McVeigh. Image reconstruction in SNR units: a general method for SNR measurement. Magn Reson Med, 54:1439–1447, 2005.
[49] P. Kellman and E. R. McVeigh. Phased array ghost elimination. NMR Biomed, 19:352–361, 2006.
[50] A. B. Kerr, J. M. Pauly, B. S. Hu, K. C. P. Li, C. J. Hardy, C. H. Meyer, A. Macovski, and D. G. Nishimura. Real-time interactive MRI on a conventional scanner. Magn Reson Med, 38:355–367, 1997.
[51] Y-C. Kim, C. E. Hayes, S. Narayanan, and K. S. Nayak. A novel 16-channel receive coil array for accelerated upper airway MRI at 3 Tesla. Magn Reson Med, 2010. In Press.
[52] Y-C. Kim, S. Narayanan, and K. S. Nayak. Accelerated three-dimensional MRI of vocal tract shaping using compressed sensing and parallel imaging. Proc. ICASSP, 27:389–392, 2009.
[53] Y-C. Kim, S. Narayanan, and K. S. Nayak. Accelerated three-dimensional upper airway MRI using compressed sensing. Magn Reson Med, 61:1434–1440, 2009.
[54] Y-C. Kim, S. Narayanan, and K. S. Nayak. Flexible retrospective selection of temporal resolution in real-time speech MRI using a golden-ratio spiral view order. Magn Reson Med, 2010. In Press.
[55] Y-C. Kim, S. Narayanan, and K. S. Nayak. Improved real-time MRI of oral-velar coordination using a golden-ratio spiral view order. In Proceedings of Interspeech, Makuhari, Japan, 2010.
[56] Y-C. Kim and K. S. Nayak. Optimization of undersampled variable density spiral trajectories based on incoherence of spatial aliasing. In Proc, ISMRM, 16th Annual Meeting, page 422, Toronto, 2008.
[57] Y-C. Kim, J-F. Nielsen, and K. S. Nayak. Automatic correction of echo-planar imaging (EPI) ghosting artifacts in real-time interactive cardiac MRI using sensitivity encoding. J Magn Reson Imaging, 27:239–245, 2008.
[58] K. F. King. Combined compressed sensing and parallel imaging. In Proc, ISMRM, 16th Annual Meeting, page 1488, Toronto, 2008.
[59] K. F. King, T. K. F. Foo, and C. R. Crawford. Optimized gradient waveforms for spiral scanning. Magn Reson Med, 34(2):156–160, 1995.
[60] D. P. Kuehn. A cineradiographic investigation of velar movement in two normals. Cleft Palate J, 13:88–103, 1976.
[61] S. Kuhara, Y. Kassai, Y. Ishihara, M. Yui, Y. Hamamura, and H. Sugimoto. A novel EPI reconstruction technique using multiple RF coil sensitivity maps. In Proc, ISMRM, 8th Annual Meeting, page 154, Denver, 2000.
[62] J. H. Lee, B. A. Hargreaves, B. S. Hu, and D. G. Nishimura. Fast 3D imaging using variable-density spiral trajectories with applications to limb perfusion. Magn Reson Med, 50:1276–1285, 2003.
[63] Z. P. Liang. Spatiotemporal imaging with partially separable functions. International Symposium on Biomedical Imaging, pages 988–991, 2007.
[64] C. Liu, R. Bammer, and M. E. Moseley. Parallel imaging reconstruction for arbitrary trajectories using k-space sparse matrices (kSPA). Magn Reson Med, 58:1171–1181, 2007.
[65] R. B. Lufkin, D. G. Wortham, R. B. Dietrich, L. A. Hoover, S. G. Larsson, H. Kangarloo, and W. N. Hanafee. Tongue and oropharynx: Findings on MR imaging. Radiology, 161(1):69–75, 1986.
[66] M. Lustig, D. Donoho, and J. M. Pauly. Sparse MRI: The application of compressed sensing for rapid MR imaging. Magn Reson Med, 58:1182–1195, 2007.
[67] M. Lustig, D. Donoho, J. M. Santos, and J. M. Pauly. Compressed sensing MRI. IEEE Signal Processing Magazine, 25:72–82, 2008.
[68] M. Lustig, D. L. Donoho, and J. M. Pauly. k-t SPARSE: High frame rate dynamic MRI exploiting spatio-temporal sparsity. In Proc, ISMRM, 14th Annual Meeting, page 2420, Seattle, 2006.
[69] M. Lustig, D. L. Donoho, and J. M. Pauly. Rapid MR imaging with compressed sensing and randomly under-sampled 3DFT trajectories. In Proc, ISMRM, 14th Annual Meeting, page 695, Seattle, 2006.
[70] M. Lustig and J. M. Pauly. SPIRiT: Iterative self-consistent parallel imaging reconstruction from arbitrary k-space. Magn Reson Med, 64:457–471, 2010.
[71] P. Mansfield. Multi-planar image formation using NMR spin echoes. J. Phys. C, 10:L55–L58, 1977.
[72] S. Marin and M. Pouplier. Temporal organization of complex onsets and codas in American English: Testing the predictions of a gestural coupling model. Journal of Motor Control, 14:380–407, 2010.
[73] L. Marinelli, C. J. Hardy, and D. J. Blezek. MRI with accelerated multi-coil compressed sensing. In Proc, ISMRM, 16th Annual Meeting, page 1484, Toronto, 2008.
[74] S. Narayanan and A. Alwan. Noise source models for fricative consonants. IEEE Trans Speech and Audio Processing, 8(3):328–344, 2000.
[75] S. Narayanan, A. Alwan, and K. Haker. An articulatory study of fricative consonants using magnetic resonance imaging. J Acoust Soc Am, 98(3):1325–1347, 1995.
[76] S. Narayanan, A. Alwan, and K. Haker. Toward articulatory-acoustic models for liquid consonants based on MRI and EPG data. Part I: The laterals. J Acoust Soc Am, 101:1064–1077, 1997.
[77] S. Narayanan, A. Alwan, and Y. Song. New results in vowel production: MRI, EPG, and acoustic data. In Proc EuroSpeech, volume 1, pages 1007–1010, Rhodes, Greece, 1997.
[78] S. Narayanan, D. Byrd, and A. Kaun. Geometry, kinematics, and acoustics of Tamil liquid consonants. J Acoust Soc Am, pages 1993–2007, 1999.
[79] S. Narayanan, K. S. Nayak, S. Lee, A. Sethy, and D. Byrd. An approach to real-time magnetic resonance imaging for speech production. J Acoust Soc Am, 115(5):1771–1776, 2004.
[80] K. S. Nayak, C. H. Cunningham, J. M. Santos, and J. M. Pauly. Real-time cardiac MRI at 3 Tesla. Magn Reson Med, 51(4):655–660, 2004.
[81] K. S. Nayak, J. M. Pauly, A. B. Kerr, B. S. Hu, and D. G. Nishimura. Real-time color flow MRI. Magn Reson Med, 43:251–258, 2000.
[82] D. G. Nishimura. Principles of magnetic resonance imaging. Stanford University EE369B Course Notes, 1996.
[83] M. A. Ohliger and D. K. Sodickson. An introduction to coil array design for parallel MRI. NMR Biomed, 19:300–315, 2006.
[84] D. J. Ostry and K. G. Munhall. Control of rate and duration of speech movements. J Acoust Soc Am, 77:640–648, 1985.
[85] J. M. Pauly, R. K. Butts, G. T. Luk Pat, and A. Macovski. A circular echo planar pulse sequence. In Proc, SMR, 3rd Annual Meeting, page 106, Nice, 1995.
[86] J. M. Pauly, D. G. Nishimura, and A. Macovski. A k-space analysis of small tip excitation. J. Magn. Reson., 81:43–56, 1989.
[87] M. Proctor, L. Goldstein, D. Byrd, E. Bresch, and S. Narayanan. Articulatory comparison of Tamil liquids and stops using real-time magnetic resonance imaging. J Acoust Soc Am, 125:2568, 2009.
[88] M. Proctor, C. H. Shadle, and K. Iskarous. Pharyngeal articulation in the production of voiced and voiceless fricatives. J Acoust Soc Am, 127:1507–1518, 2010.
[89] K. P. Pruessmann, M. Weiger, P. Bornert, and P. Boesiger. Advances in sensitivity encoding with arbitrary k-space trajectories. Magn Reson Med, 46(4):638–651, 2001.
[90] K. P. Pruessmann, M. Weiger, M. B. Scheidegger, and P. Boesiger. SENSE: Sensitivity encoding for fast MRI. Magn Reson Med, 42:952–962, 1999.
[91] P. Qu, K. Zhong, B. Zhang, J. Wang, and G. X. Shen. Convergence behavior of iterative SENSE reconstruction with non-Cartesian trajectories. Magn Reson Med, 54:1040–1045, 2005.
[92] S. B. Reeder, E. Atalar, A. Z. Faranesh, and E. R. McVeigh. Referenceless interleaved echo-planar imaging. Magn Reson Med, 41:87–94, 1999.
[93] P. B. Roemer, W. A. Edelstein, C. E. Hayes, S. P. Souza, and O. M. Mueller. The NMR phased array. Magn Reson Med, 16:192–225, 1990.
[94] A. Rosset, L. Spadola, and O. Ratib. OsiriX: an open-source software for navigating in multidimensional DICOM images. J Digital Imaging, 17:205–216, 2004.
[95] L. I. Rudin, S. Osher, and E. Fatemi. Nonlinear total variation noise removal algorithm. Physica D, 60(1-4):259–268, 1992.
[96] J. M. Santos, G. A. Wright, and J. M. Pauly. Flexible real-time magnetic resonance imaging framework. In Proc, IEEE EMBS, 26th Annual Meeting, volume 47, pages 1048–1051, San Francisco, 2004.
[97] C. H. Shadle, M. Proctor, and K. Iskarous. An MRI study of the effect of vowel context on English fricatives. In Proc. 2nd Joint ASA-EAA Conference, Paris, 2008.
[98] F. G. Shellock, C. J. Schatz, P. M. Julien, J. M. Silverman, F. Steinberg, T. K. Foo, M. L. Hopp, and P. R. Westbrook. Dynamic study of the upper airway with ultrafast spoiled GRASS MR imaging. Am J Roentgenol, 158:1019–1024, 1992.
[99] F. G. Shellock, C. J. Schatz, P. M. Julien, F. Steinberg, T. K. Foo, M. L. Hopp, and P. R. Westbrook. Occlusion and narrowing of the pharyngeal airway in obstructive sleep apnea: evaluation by ultrafast spoiled GRASS MR imaging. J Magn Reson Imaging, 2:103–107, 1992.
[100] D. M. Spielman, J. M. Pauly, and C. H. Meyer. Magnetic resonance fluoroscopy using spirals with variable sampling densities. Magn Reson Med, 34(3):388–394, 1995.
[101] M. Stone, A. Faber, and M. Cordaro. Cross-sectional tongue movement and tongue-palate movement in [s] and [sh] syllables. In Proceedings of the 13th International Congress of Phonetic Sciences, Universite de Provence, pages 354–357, 1991.
[102] M. Stone, A. Faber, L. J. Raphael, and T. H. Shawker. Cross-sectional tongue shapes and linguopalatal contact patterns in [s], [sh], and [l]. J Phonetics, 20(2):253–270, 1992.
[103] B. H. Story and I. R. Titze. Vocal tract area functions from magnetic resonance imaging. J Acoust Soc Am, 100(1):537–554, 1996.
[104] Y. Suto, T. Matsuo, T. Kato, I. Hori, Y. Inoue, S. Ogawa, T. Suzuki, M. Yamada, and Y. Ohta. Evaluation of the pharyngeal airway in patients with sleep apnea: value of ultrafast MR imaging. Am J Roentgenol, 160(2):311–314, 1993.
[105] B. P. Sutton, C. A. Conway, Y. Bae, R. Seethamraju, and D. P. Kuehn. Faster dynamic imaging of speech with field inhomogeneity corrected spiral fast low angle shot (FLASH) at 3T. J Magn Reson Imaging, 32:1228–1237, 2010.
[106] B. P. Sutton, C. A. Conway, and D. P. Kuehn. Simultaneous monitoring of tongue tip movements in functional MRI motor tasks for speech and swallowing studies. In Proc, ISMRM, 17th Annual Meeting, page 20, 2009.
[107] B. P. Sutton, C. Conway, Y. Bae, C. Brinegar, Z-P. Liang, and D. P. Kuehn. Dynamic imaging of speech swallowing with MRI. In Proc, IEEE EMBS, 31st Annual Meeting, pages 6651–6654, 2009.
[108] S. M. Tasko and M. D. McClean. Variations in articulatory movement with changes in speech task. J Speech Lang Hear Res, 47:85–100, 2004.
[109] D. R. Thedens, P. Irarrazaval, T. S. Sachs, C. H. Meyer, and D. G. Nishimura. Fast magnetic resonance coronary angiography with a three-dimensional stack of spirals trajectory. Magn Reson Med, 41(6):1170–1179, 1999.
[110] D. B. Twieg. The k-trajectory formulation of the NMR imaging process with applications in analysis and synthesis of imaging methods. Med Phys, 10(5):610–621, 1983.
[111] M. Uecker, S. Zhang, D. Voit, A. Karaus, K. D. Merboldt, and J. Frahm. Real-time MRI at a resolution of 20 ms. NMR in Biomed, 23:986–994, 2010.
[112] S. S. Vasanawala, M. T. Alley, B. A. Hargreaves, R. A. Barth, J. M. Pauly, and M. Lustig. Improved pediatric MR imaging with compressed sensing. Radiology, 256(2):607–616, 2010.
[113] M. Weiger, K. P. Pruessmann, and P. Boesiger. 2D SENSE for faster 3D MRI. MAGMA, 14:10–19, 2002.
[114] S. Winkelmann, T. Schaeffter, T. Koehler, H. Eggers, and O. Doessel. An optimal radial profile order based on the golden ratio for time-resolved MRI. IEEE Trans Med Imaging, 26(1):68–76, 2007.
[115] Z. Zhang and C. Y. Espy-Wilson. A vocal-tract model of American English /l/. J Acoust Soc Am, 115(3):1274–1280, 2004.
[116] X. Zhou, F. H. Epstein, and J. K. Maier. Reduction of a new Nyquist ghost in oblique echo planar imaging. In Proc, ISMRM, 4th Annual Meeting, page 1477, New York, 1996.
[117] X. Zhou, C. Y. Espy-Wilson, S. Boyce, M. Tiede, C. Holland, and A. Choe. A magnetic resonance imaging-based articulatory and acoustic study of "retroflex" and "bunched" American English /r/. J Acoust Soc Am, 123(6):4466–4481, 2008.
[118] V. Zue, S. Seneff, and J. Glass. Speech database development at MIT: TIMIT and beyond. Speech Communication, 9:351–356, 1990.
Abstract
Magnetic resonance imaging (MRI) is a powerful non-invasive imaging modality, but it is relatively slow compared to alternatives such as X-ray and ultrasound. Accelerating MRI scans has been of great interest over the past several years, and the acceleration of upper airway MRI in particular is the primary focus of this thesis. Rapid and real-time MRI can be used to capture tissue dynamics (e.g., the tongue and velum during speech, or the beating heart) or to reduce scan time. Rapid MRI can be achieved through novel acquisition and reconstruction methods. Acquisition technologies such as echo-planar imaging and spiral imaging are effective at improving temporal resolution, but introduce additional image artifacts. Image reconstruction techniques such as parallel imaging and compressed sensing accelerate acquisition by highly undersampling Fourier data and improve image quality by removing spatial aliasing artifacts.
Asset Metadata
Creator
Kim, Yoon-Chul (author)
Core Title
Fast upper airway MRI of speech
School
Viterbi School of Engineering
Degree
Doctor of Philosophy
Degree Program
Electrical Engineering
Publication Date
11/30/2010
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
image reconstruction,magnetic resonance imaging,OAI-PMH Harvest,real-time imaging,speech production,upper airway,vocal tract
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Nayak, Krishna S. (committee chair), Goldstein, Louis M. (committee member), Narayanan, Shrikanth S. (committee member)
Creator Email
yoonckim@usc.edu,yoonckim1@gmail.com
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-m3570
Unique identifier
UC1130473
Identifier
etd-Kim-4167 (filename),usctheses-m40 (legacy collection record id),usctheses-c127-414851 (legacy record id),usctheses-m3570 (legacy record id)
Legacy Identifier
etd-Kim-4167.pdf
Dmrecord
414851
Document Type
Dissertation
Rights
Kim, Yoon-Chul
Type
texts
Source
University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Repository Name
Libraries, University of Southern California
Repository Location
Los Angeles, California
Repository Email
cisadmin@lib.usc.edu