INFORMATION THEORETIC MEASURES FOR PET IMAGE RECONSTRUCTION AND NON-RIGID IMAGE REGISTRATION

by Sangeetha Somayajula

A Dissertation Presented to the FACULTY OF THE USC GRADUATE SCHOOL, UNIVERSITY OF SOUTHERN CALIFORNIA, In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (ELECTRICAL ENGINEERING)

December 2009

Copyright 2009 Sangeetha Somayajula

Dedication

To my parents Seshu and Venkata Lakshmana Rao Somayajula

Acknowledgments

I thank Prof. Richard Leahy for being an excellent advisor and teacher. His guidance at every step of my research work has been invaluable, and I am grateful for the encouragement he provided when I needed it. He has been an inspiring role model for me. I thank him for providing financial support, for helping me form productive collaborations, and for making my experience as a graduate student truly valuable. I also thank my guidance committee, Dr. Manbir Singh, Dr. Krishna Nayak, Dr. Antonio Ortega, and Dr. Quanzheng Li, for their time and feedback. I thank my collaborators Christos Panagiotou, Dr. Simon Arridge, and Dr. Anand Rangarajan for contributing to the PET work through their valuable suggestions. Additionally, I thank Dr. Richard Laforest and Dr. Giselle Petzinger for providing data to validate my PET work. I am grateful to the Women in Science and Engineering program for awarding me a top-off scholarship for the first two years of my PhD studies, and for providing travel grants that helped me attend major conferences in my field.

It has been a pleasure to work with my colleagues in the USC Biomedical Imaging group. It was an enriching experience to discuss my work with them, and I thank them for being a helpful and collaborative research group. I thank Dr. Sangtae Ahn, Dr. Anand Joshi, Dr. Quanzheng Li, and Dr. Sanghee Cho for always having the time to answer my questions, and Dr. Evren Asma for his help early in my research work. I thank Dr. Abhijit Chaudhari, Joyita Dutta, and Dr. Belma Dogdas for being good friends besides being great colleagues. I thank Dr. Zheng Li, Dr. Dimitrios Pantazis, Yu-teng Chang, Yanguang Lin, Ran Ren, Syed Ashrafulla, Juan Poletti Soto, Hua Hui, and David Wheland for being great colleagues, and for the fun times I had while being a part of the research group.

I am grateful to my dear friends, without whom I would be lost. I thank Kavitha Buddharaju for the faith she has in me, and for not letting astronomical international call rates stop her from always being there for me. I thank Lalit Varma for teaching me to believe in myself. Though he left this world early in my PhD program, he was and will continue to be a source of inspiration for me. I thank Suma Mallavarapu for being my sounding board, though she had her own PhD and her baby to take care of. I thank Gautam Thatte, Michelle Bardis, Laura Woo, Ashok and Amola Patel, Manoj Gopalkrishnan, Krishna and Sandhya Gogineni, and Ratnakar Vejella for being strong sources of support throughout my graduate student years. I thank Srikanth Pinninti, Joaquin Rapela, Vijay Ponduru, Pradeep Natarajan, and Haritha Bodugam for making my years as a graduate student at USC an enjoyable experience.

Finally, I thank my family for being pillars of support throughout my graduate studies. I thank my grandmother Eswaramma Somayajula for taking immense pride in everything I do, and my sister Sunitha Mukkavalli for being my friend for as long as I can remember. My parents Seshu and Venkata Lakshmana Rao Somayajula cheered me on every step of the way, and I thank them for their constant love and support.
Table of Contents

Dedication
Acknowledgments
List of Tables
List of Figures
Abstract
Chapter 1: Introduction
1.1 Positron Emission Tomography Image Reconstruction
1.2 Non-rigid image registration
1.3 Contributions
1.4 Organization of the dissertation
Chapter 2: PET Image Reconstruction Using Information Theoretic Anatomical Priors
2.1 Methods
2.1.1 MAP estimation using information theoretic priors
2.1.2 Information theoretic priors
2.1.3 Extraction of feature vectors
2.1.4 Computation of prior term
2.1.5 Computation of the MAP estimate
2.1.6 Efficient computation of entropy and its gradient
2.2 Results
2.2.1 Simulation Results
2.2.1.1 Anatomical and PET images with identical structure
2.2.1.2 Anatomical and PET images with differences
2.2.2 Clinical data
2.3 Application of Information Theoretic Priors to Positron Range Correction
2.3.1 Positron Range Simulations
2.3.2 Real data results
2.4 Conclusions
2.5 Appendix: Alternative formulation to the JE prior
Chapter 3: Mutual Information Based Non-Rigid Mouse Registration Using A Scale-Space Approach
3.1 Methods and Results
3.1.1 Mutual information based non-rigid registration
3.1.2 Scale-space feature vectors
3.1.3 Displacement field and regularization
3.1.4 Results
3.2 Discussion
Chapter 4: Non-Rigid Registration Using Gaussian Mixture Models
4.1 Methods and Results
4.1.1 Estimation of parameters of Gaussian mixture model
4.1.2 Estimation of displacement field
4.1.3 Relation to Mutual Information
4.1.4 Results
4.1.5 Brain Image registration
4.1.6 Mouse image registration
4.2 Discussion
Chapter 5: Summary and Future Work
5.1 Future Work
References

List of Tables

2.1 Run-times for combined computation of pdf and gradient for different image sizes, for the Parzen and FFT based methods
4.1 Quantitative measures of overlap

List of Figures

1.1 PET/CT imaging: Whole body PET image (left), its corresponding CT image (middle), and their overlay (right). Image obtained from http://medical.siemens.com
1.2 Positron range: The positron travels a distance before annihilation, during which it loses its kinetic energy. The LOR is a distance away from the source, hence fundamentally limiting the resolution of PET.
1.3 Mouse imaging studies that require non-rigid registration
2.1 Log of estimated joint probability distribution functions: Top: Anatomical image. Middle: Example reconstructed images (L to R): blurred, noisy, and true. Bottom: Estimated joint pdf of the images in the middle row and the anatomical image.
2.2 Joint densities of reference image and images with arbitrary intensities: Top (L to R): image with the same structure as the reference image but arbitrary intensities, image with two regions having the same intensity, and the reference image. Bottom: Log of estimated joint densities of the reference image and the two functional images in the top row
2.3 Scale-space features of a coronal slice of an MR image of the brain
2.4 Comparison of Parzen and FFT-based methods: a) Image for which pdf and gradients were estimated, b) pdf estimates using M = 351, c) gradient of marginal entropy using the Parzen method (left) and the FFT-based method (right). The normalized difference in the shown gradient images is 0.14%
2.5 The phantoms used in the simulations to represent the MR image (left) and the PET image (right)
2.6 True and reconstructed images: Top: true PET image (left), QP reconstruction with β = 0.5 (right). Middle: MI reconstruction for μ = 10000 (left), JE reconstruction for μ = 5000 (right). Bottom: MI-scale reconstruction for μ = 2500 (left), JE-scale reconstruction for μ = 2500
2.7 Normalized error vs. iteration for the best value of μ and β of the information theoretic and QP priors respectively
2.8 Results of Monte Carlo simulations for (L to R) QP (β = 0.5), MI (μ = 2500), JE (μ = 5000), MI-scale (μ = 2500), JE-scale (μ = 2500)
2.9 Joint pdfs of the anatomical image and a) true PET image, b) OSEM estimate used for initialization, c) image reconstructed using MI-Intensity prior, d) image reconstructed using JE-Intensity prior
2.10 Variation of marginal entropy for the MI-Intensity prior, and of JE for the MI-Intensity and JE-Intensity priors, as a function of iteration
2.11 Coronal slice of an MR image of the brain, and true PET image generated from manual segmentation of the MR image
2.12 True and reconstructed images: Top: true PET image (left), QP reconstruction with β = 0.5 (right). Middle: MI reconstruction for μ = 1e5 (left), JE reconstruction for μ = 150000. Bottom: MI-scale reconstruction for μ = 1e5 (left), JE-scale reconstruction for μ = 1e5
2.13 Normalized error as a function of iteration
2.14 Coronal slice of bias and SD images: Top (L to R): QP (β = 0.5), MI-Intensity (μ = 1e5), JE-Intensity (μ = 150000). Bottom (L to R): MI-scale (μ = 1e5), JE-scale (μ = 1e5)
2.15 Absolute value of bias vs. standard deviation curves for selected regions of interest for the quadratic prior (QP), mutual information (MI) and joint entropy (JE) priors using only intensity as a feature, and MI scale-space (MIsc) and JE scale-space (JEsc) priors
2.16 Coronal slice of PET image (left) with co-registered MR image (right)
2.17 Coronal slice of PET image reconstructions: Top (L to R): QP (μ = 1.5), MI-Intensity (μ = 75000), JE-Intensity (μ = 1e5). Bottom (L to R): MI-scale (μ = 1e5), JE-scale (μ = 1e5)
2.18 Overlay of PET reconstruction over co-registered MR image for QP (top) and JE-scale (bottom) priors
2.19 Regions of interest for computing CRC vs. noise curves: a) ROIs in caudate (red) and WM separating it from putamen (blue), b) ROIs for computing noise variance
2.20 Contrast recovery coefficient versus noise variance plots as the hyperparameter is varied
2.21 Reconstruction for a high value of μ for MI-scale (left) and JE-scale (right) priors
2.22 The phantoms used in the simulations to represent the MR image (left), the PET image (middle), and the truncation mask (right). The line in the PET image shows the position of the profile shown in Fig. 2.25
2.23 Positron range kernel for the Cu-60 isotope vs. distance (mm)
2.24 Reconstructed images (L to R): no range correction with QP (NRC-QP), range correction with quadratic prior (RC-QP) with μ = 0.01, range correction with JE prior (RC-JE), scale σ = 0.5 voxels and μ = 5e4
2.25 Profile drawn through true and reconstructed images
2.26 (a) CT image registered to reconstructed PET image and (b) truncation mask obtained from the CT image for the truncated homogeneous model of range correction
2.27 Positron range kernel for Br-76
2.28 Reconstructed images: Top row: coronal slice of PET images reconstructed using NRC-QP (left), RC-QP (middle), and RC-JE (right). Bottom row: overlay on CT of the images shown in the top row
3.1 Coronal slice of scale-space images of a mouse for N_s = 2, with scale 1 corresponding to no smoothing (left) and σ_2 = 3 (right)
3.2 Sagittal slice of target (left) and template (right) mouse images
3.3 Comparison of registered images: coronal and sagittal views of overlay on target image of template (top row), hierarchical multi-scale registered image (middle row), and scale-space registered image (bottom row)
3.4 Displacement field obtained from scale-space registration
4.1 Estimation of GMM parameters: (a) target image, (b) template image, (c) joint histogram of intensities of target and template images, (d) joint pdf of target and template images computed using GMM (component means shown with 'x' marks)
4.2 (a) Gaussian window of width σ = 2, and (b) its derivative
4.3 Registration results for one dataset: (a) target, (b) atlas, (c) SSD registered image, (d) Parzen-MI registered image, and (e) GMM-Cond registered image
4.4 Displacement field obtained from GMM-Cond registration corresponding to the slice shown in Figure 4.3
4.5 Comparison of Parzen-MI vs. GMM-Cond based methods: log of the estimate of initial joint density between target and template images using (a) Parzen window method, (b) GMM method (the 'x' marks denote the GMM means), and log of the estimate of joint density between target and registered template using (c) Parzen window method and (d) GMM method
4.6 Mean Dice coefficients for different brain regions (Caud: caudate, GM: cerebral gray matter, WM: cerebral white matter, Put: putamen, Tha: thalamus, HC: hippocampus, CBGM: cerebellar gray matter, CBWM: cerebellar white matter)
4.7 Mean squared error between intensities of target and deformed atlas images for SSD, MI, and GMM methods
4.8 Transaxial and sagittal views of average images: Top row: template image to which the atlases are registered. Middle row: average of deformed atlases using the Parzen-MI approach. Bottom row: average of deformed atlases using the GMM based approach
4.9 Multi-modality inter-subject registration: coronal (top row) and sagittal (bottom row) views of target, template, Parzen-MI registered image, and GMM-Cond registered image (left to right)
4.10 Overlay on target image of template (left), Parzen-MI registered image (middle), and GMM-Cond registered image (right). The target image is shown in red and the deformed template in green

Abstract

We explore the use of information theoretic measures for positron emission tomography (PET) image reconstruction and for multi-modality non-rigid registration.

PET is a functional imaging modality based on imaging the uptake of a radioactive tracer. PET images are typically of low resolution and are accompanied by high resolution anatomical images such as CT or MR for localization of activity. PET tracer uptake typically results in a spatial density that is highly correlated with the anatomical morphology. The incorporation of this correlated anatomical information can potentially improve the quality of low resolution PET images. We propose using priors based on information theoretic similarity measures to incorporate anatomical information in PET reconstruction. We compare and evaluate the use of mutual information (MI) and joint entropy (JE) between feature vectors extracted from the anatomical and PET images as priors in PET reconstruction. These feature vectors are defined using scale-space theory such that they emphasize prominent boundaries in the anatomical and functional images, and attach less importance to detail and noise that is less likely to be correlated between the two images. We present an efficient method of computing these priors and their derivatives based on fast Fourier transforms, which reduces the complexity of their convolution-like expressions. Through simulations and clinical data reconstructions, we evaluate the performance of MI and JE based priors in comparison to a quadratic prior, which does not use any anatomical information.
We also apply these priors to the problem of positron range correction. Positron range is the distance traveled by a positron before annihilation; it causes a blurring effect in the reconstructed image and limits its resolution. We use information theoretic priors in conjunction with a system model that incorporates positron range by modeling it as a spatially invariant blurring that is truncated at the boundary of the imaging volume. We present phantom simulation and real data results comparing these priors to the range corrected system model with a quadratic prior.

Small animal non-rigid registration is especially challenging because of the presence of rigid structures like the skeleton within non-rigid soft tissue. We present two approaches to multi-modality registration that can be applied to clinical as well as pre-clinical images. First, we describe a non-parametric scale-space approach to MI based non-rigid small animal image registration. In this application, the goal is to simultaneously align global structure as well as detail in the images. We present results based on CT images obtained from two time points of a longitudinal mouse study that demonstrate that this approach aligns both skeleton and soft tissue better than a commonly used hierarchical approach. Second, we explore an alternative formulation that uses the log likelihood of the reference image (target) given the image to be registered (template) as a similarity metric, wherein the likelihood is modeled using Gaussian mixture models (GMM). The GMM formulation reduces the density estimation step to that of estimating the GMM parameters. These parameters are small in number for images that have few regions of distinct intensities, such as brain or microCT images. This approach is more robust than the non-parametric MI based approach because of the reduced overall dimensionality of the problem and more robust and accurate density estimation. We present comparisons of the non-parametric MI based approach and the GMM conditional likelihood based approach on intra-modality brain images and multi-modality mouse images. Finally, we present future directions for the work described in this dissertation.

Chapter 1: Introduction

This dissertation explores the use of information theoretic measures for two biomedical imaging applications: positron emission tomography (PET) image reconstruction, and non-rigid image registration. In both applications, the goal is to maximize similarity in the information theoretic sense between the image to be reconstructed or registered and a correlated image.

1.1 Positron Emission Tomography Image Reconstruction

Positron emission tomography is a non-invasive biomedical imaging tool that yields the density distribution of a radioactive positron emitting tracer. It is a functional imaging modality that can be used to track metabolic and hemodynamic processes in vivo. The radioactive tracer is injected into or inhaled by the subject, after which it undergoes decay, emitting positrons. Each emitted positron is annihilated by an electron, producing two photons that travel in opposite directions by the law of conservation of momentum. These photons are detected by a circular ring of detectors, and a pair of photons striking detectors opposite each other within a narrow timing window is called a coincidence.

Figure 1.1: PET/CT imaging: Whole body PET image (left), its corresponding CT image (middle), and their overlay (right). Image obtained from http://medical.siemens.com
The data obtained are the numbers of coincidences in each angular direction around the subject. From these data, an image representing the spatial density of the tracer can be reconstructed, which gives insight into the physiological process being studied.

PET images visualize physiological processes but do not provide anatomical context, making it necessary to complement them with a CT or MR scan. Modern PET scanners have CT scanners built into them, and can output PET images that are overlaid on the CT. Multi-modality systems, including combined PET/MRI scanners, are a growing area of research, reviews of which are given in [Che06], [Tow08]. Thus, PET images, which are of low spatial resolution (2 mm at best for clinical PET), are usually accompanied by high resolution CT or MR images (resolution < 1 mm) of the same subject. An example of a PET image with its corresponding CT image and their overlay is shown in Figure 1.1.

PET tracer uptake has been observed to be highly correlated with the underlying anatomical morphology of the subject. This can be seen in Figure 1.1, where the PET activity lies within regions that can be clearly delineated in the corresponding CT image. Thus, the incorporation of anatomical information from correlated MR/CT images into PET reconstruction algorithms can potentially improve the quality of low resolution PET images. Quantitative inaccuracies due to partial volume effects in the PET image could be corrected during the reconstruction process, or distinct structures that cannot be resolved in the PET image from the data alone could become visible, because of the high resolution anatomical information introduced into the reconstruction algorithm. In this dissertation, we explore a method of incorporating anatomical information into PET reconstruction through a Bayesian framework wherein information theoretic measures of similarity are used as priors on the PET image. The goal is to reconstruct a PET image that fits the data and is most similar to the anatomical image in the information theoretic sense.

Previous work on the use of anatomical priors can be broadly classified into (i) methods based on anatomical boundary information, which encourage boundaries in functional images that correspond to anatomical boundaries [LY91]-[ABH+96], and (ii) methods that use anatomical segmentation information, which encourage the distribution of tracers in regions corresponding to anatomical regions [BJT+96]-[RHG00]. In [BYH+04] a method that did not use boundary or segmentation information was proposed, wherein the prior aimed to produce homogeneous regions in the PET image where the anatomical image had an approximately homogeneous distribution of intensities. These approaches rely on boundaries, segmentation, or the intensities themselves to model the similarity in structure between the anatomical and functional images. In this work, we use information theoretic measures, namely mutual information (MI) and joint entropy (JE), to model the similarity between images in the information theoretic sense.

Mutual information between two random vectors is a measure of the amount of information contained by one random vector about the other, and can be used as a similarity metric between two multi-modality images [WVA+96]. In [RHG00], a Bayesian joint mixture model was formulated such that the solution maximizes MI between class labels.
In [SAL05, SRL07] we described a non-parametric method that uses MI between feature vectors extracted from the anatomical and functional images to define a prior on the functional image. This approach did not require explicit segmentation or boundary extraction, and aimed to reconstruct images that have a distribution of intensities and/or spatial gradients matching that of the anatomical image. MI achieves this by minimizing the joint entropy (JE) while maximizing a marginal entropy term that models the uncertainty in the PET image itself. In [Nuy07] it was shown through anecdotal examples that, due to the marginal entropy term, MI tends to produce biased estimates when there are differences between the anatomical and functional images, and that joint entropy is a more robust metric in these situations.

In this work, we extend the non-parametric framework of our MI based priors to explore the use of both MI and JE as priors in PET image reconstruction. Since both these information theoretic metrics are based on global distributions of the image intensities, spatial information can be incorporated by adding features that capture the local variation in image intensities. We define features based on scale-space theory, which provides a framework for the analysis of images at different levels of detail. It is based on generating a one-parameter family of images by blurring the image with Gaussian kernels of increasing width (the scale parameter) and analyzing these blurred images to extract structural features [Lin94]. We evaluate the performance of both MI and JE priors in quantifying uptake in specific regions as well as overall, through simulations and clinical data. We also present an efficient method of computing the priors and their derivatives, which makes these priors practical to use in 3D image reconstruction. The obtained results indicate that using JE based priors in PET reconstruction can improve the resolution of reconstructed PET images when there is reasonable agreement between the anatomical and PET images.

One of the fundamental resolution limiting factors for PET is positron range [LH99]. In PET, the positron emitting source is assumed to lie on the line of response (LOR) that joins the detectors at which the photon pair is detected after positron-electron annihilation. However, a positron emitted from a radioactive source travels a certain distance, called the positron range, before it annihilates with an electron and emits a photon pair. A diagram of this process is shown in Figure 1.2. The point of annihilation is not the same as the source position, thus limiting the resolution of the reconstructed PET image. For the most popular PET isotopes, such as F-18 and C-11, positron range is less than 1 mm, which is within the detector resolution of present day scanners. However, for some high energy isotopes such as Cu-60 and Br-76, positron range can be long (around 3 mm), hence limiting their use as PET tracers, especially for small animal imaging [LDW02]. Correcting this positron range effect might open up the use of new long range tracers that can possibly visualize physiological processes that cannot be imaged using present day tracers, or give a different perspective on those that can be monitored today.

Figure 1.2: Positron range: The positron travels a distance before annihilation, during which it loses its kinetic energy. The LOR is a distance away from the source, hence fundamentally limiting the resolution of PET.

Additionally, with ongoing research in improving PET detector
technology [Lew08], it is expected that detector resolution will improve in the future, making it important to address this fundamental limitation in reconstruction.

Previous work in positron range correction involved obtaining a positron range blur kernel from phantom experiments or Monte Carlo simulations, and using this kernel for correction. The approaches that have been used for positron range correction can be broadly classified into (i) methods that involve deconvolution of the range kernel from the reconstructed image or projection data at each iteration of filtered back projection [HDU90], and (ii) methods that incorporate the range kernel into the system model and apply the correction during statistical reconstruction [AM08], [BRL+03]-[RBL+06]. The second approach was taken in [BRL+03], where the range kernel was assumed to be spatially invariant and was truncated at the object boundary to reflect the assumption that a positron that escapes the object does not contribute to the sinogram counts. However, this method did not correct for blurring of the internal boundaries within the object, and produced ring artifacts at boundaries, where the assumption of spatial invariance of the blur kernel is not valid. We apply the information theoretic anatomical priors to the positron range problem with the goal of utilizing the available anatomical information to correct for the positron range blurring effect. We employ the truncated homogeneous model in conjunction with the anatomical priors to correct for range blurring at the object boundary as well as at internal boundaries. We present results obtained through phantom simulations using the Cu-60 isotope (range 3.09 mm), and real data obtained by imaging mice injected with the Br-76 isotope (range 3.47 mm).

1.2 Non-rigid image registration

The second aspect of this work is related to image registration. Longitudinal and inter-subject scans are often performed in imaging studies to study changes in the anatomy and function of a subject over a period of time, or across populations. In longitudinal studies, changes may occur in subject posture, tissue growth, organ movement, etc. In inter-subject studies, anatomical variability across populations needs to be normalized for accurate analysis.

Image registration finds a mapping between images such that homologous features in the images have the same coordinates. The mapping that describes the relation between the images can be a low dimensional rigid or affine registration, described by rotations, translations, and shear, or a high dimensional non-rigid registration. In longitudinal and inter-subject studies, the changes in anatomy and function mentioned above require non-rigid registration to align the images. Examples of mouse imaging studies that require non-rigid registration are shown in Figure 1.3.

Figure 1.3: Mouse imaging studies that require non-rigid registration

Several non-rigid registration algorithms have been developed, reviews of which can be found in [HBHH01], [KLD04]. A large subset of these algorithms use voxel based similarity metrics that measure similarity between the intensities of the images to be registered, such as the sum of squared differences (SSD), or some statistical measure of the intensities, such as cross correlation or MI. MI based registration aims to find a deformation field that maximizes MI between the intensities of the images to be registered.
Since MI between two images is a function of the distribution of intensities of the images rather than their actual values, it can be used to register multi-modal images. MI based registration has been successfully applied to rigid registration [WVA+96], and some approaches to non-rigid registration using MI or similar information theoretic measures have also been proposed [DMVS03]-[CDH+06]. It has been observed that though the existing non-rigid registration algorithms have been applied to brain imaging with good results, these methods do not perform well for small animal registration [KHH03]-[LPGD06].

Small animal imaging is an important tool for pre-clinical studies. The registration of small animal images is challenging because of the presence of rigid structures like the skeleton within non-rigid soft tissue. In [KHH03] and [PDD+05] mouse registration was performed using piece-wise rigid registration of anatomical structures, which were defined in a segmentation step prior to registration. In [LPGD06] a fully automated method was proposed for whole body registration, which first aligned the skeleton using a point based method, and then imposed stiffness constraints at the skeleton to align the whole body images using intensity based non-rigid registration with mutual information as the similarity metric.

In this work, we describe two approaches to multi-modality non-rigid registration that can potentially be used for pre-clinical as well as clinical applications. First, we use MI as a similarity metric in a scale-space based framework similar to that proposed for PET reconstruction. The goal is to find a common mapping that aligns the fine structure that appears in the images at lower scales, as well as the global structure that remains in the images at higher scales. For small animal imaging, the skeletal structure is visible at the lower scales and is blurred out at the higher scales, while the global structure remains in the images at the higher scales. Hence this scale-space approach can potentially align the skeleton and the soft tissue simultaneously. This multi-scale approach has the additional advantage of having a cost function that is less prone to local minima [KLD04], and hence is able to perform large non-linear deformations more accurately than a hierarchical approach that aligns each scale in turn. We evaluate our scale-space method on a longitudinal mouse CT study by comparing it to a hierarchical approach. The results indicate that the scale-space approach is able to align both the skeleton and soft tissue more accurately than the hierarchical approach.

In our MI based scale-space registration, we used a non-parametric approach to estimate the unknown joint intensity distribution of the reference image (target) and the image to be registered (template) by the Parzen windowing technique [DHS01], to avoid making any assumptions about the joint distribution. However, this non-parametric approach is sensitive to small changes in the images, and requires a large number of samples for accurate estimation. Additionally, using a non-parametric approach to estimate the entire joint density from the given images, in conjunction with estimation of the high dimensional non-rigid deformation field from the non-convex MI cost function, increases the total dimensionality of the problem, making it more ill-conditioned.
As an alternative approach, we use the log likelihood of the target given the template as a similarity metric, wherein the joint distribution of the target and template intensities is modeled by a Gaussian mixture model (GMM). The goal is to find a deformation field that maximizes the conditional probability of the target image given the deformed template image. Modeling the probability density using a GMM reduces the density estimation step to that of estimating the parameters of the GMM. We choose the number of components in the GMM from the joint histogram of the target and template images, and keep this number constant throughout the registration. This can hence be considered a semi-parametric approach to density estimation. For images that have a few distinct regions of intensity, such as brain or mouse CT images, the number of parameters to be estimated is small, and the parameters can be robustly estimated from the images.

Maximizing MI is closely related to maximizing the joint probability of the target and template images, or the conditional probability of the target given the template images [LEG98]-[ZR03]. An interpretation of MI as a special case of maximum likelihood estimation is given in [RMAP99]. In [ZR03] a MAP framework for non-rigid registration is used wherein a conditional density estimate is computed using a Parzen-like density estimator and used as the likelihood term. It was shown that under certain conditions, the optimum of this MAP objective function also maximizes the true MI. In [LEG98] two different approaches to modeling the joint intensity distributions were explored: Parzen density estimation, and the Gaussian mixture model (GMM). The distributions under these two approaches were estimated using a registered training set of the same modality as the images to be registered, and these estimated distributions were used to perform rigid registration by maximizing the joint probability between the target and template image intensities.

Our approach of using the log-likelihood of the target given the template in conjunction with a GMM can be viewed as a semi-parametric approximation to MI based registration. However, we expect our GMM based approach to perform better than the non-parametric MI based approach because of (1) the reduced overall dimensionality of the registration problem through the parametrization of the density estimation step, and (2) more robust and accurate estimation of the joint density. We compare the performance of the conditional likelihood metric with GMM parametrization against that of MI with the non-parametric density estimation approach. We evaluate these methods using intra-modality brain MR images, as well as inter-subject, inter-modality mouse images. Through these datasets, we test the performance of our approach for clinical as well as pre-clinical imaging applications. A sketch of the GMM-based conditional likelihood computation is given below.
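For illustration only (a minimal sketch under assumptions, not the implementation developed in Chapter 4), the following Python fragment fits a GMM to paired target/template intensities and evaluates the mean conditional log-likelihood log p(target | template) = log p(x, y) - log p(y). The component count `n_components` and the use of scikit-learn's `GaussianMixture` are illustrative choices; the dissertation selects the component count from the joint histogram.

```python
import numpy as np
from scipy.stats import norm
from sklearn.mixture import GaussianMixture

def conditional_log_likelihood(target, template, n_components=4):
    """Mean log p(target | template) under a GMM fit to the joint intensities."""
    # Voxel intensity pairs are treated as i.i.d. samples of the joint density.
    samples = np.column_stack([template.ravel(), target.ravel()])
    gmm = GaussianMixture(n_components=n_components, covariance_type="full",
                          random_state=0).fit(samples)

    # log p(y, x): joint density under the fitted mixture.
    log_joint = gmm.score_samples(samples)

    # log p(y): marginalizing a Gaussian component over the target coordinate
    # leaves a 1-D Gaussian with the (0,0) block of its parameters.
    comp = np.stack([
        w * norm.pdf(samples[:, 0], loc=m[0], scale=np.sqrt(c[0, 0]))
        for w, m, c in zip(gmm.weights_, gmm.means_, gmm.covariances_)
    ])
    log_marg = np.log(comp.sum(axis=0) + 1e-300)

    return np.mean(log_joint - log_marg)
```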
1.3 Contributions

The contributions of this dissertation are as follows:

1. Developed priors based on information theoretic measures of similarity to incorporate anatomical information in PET reconstruction. These priors take advantage of the available high resolution anatomical images to improve the reconstruction of PET images.

2. Used scale-space based feature vectors to incorporate spatial information into the global distribution based measures of MI and JE. We found that using these feature vectors improved the stability of the anatomical priors based on these measures.

3. Applied the information theoretic anatomical priors to positron range correction, a fundamental resolution limiting factor in PET.

4. Applied the scale-space feature vectors to MI based non-rigid mouse registration to simultaneously align fine skeletal as well as global soft tissue structure in the images.

5. Developed a non-rigid registration method that uses the log likelihood of the target given the deformed template image as a similarity metric, wherein the distribution is modeled as a GMM. This approach reduces the dimensionality of the density estimation step, and shows improved robustness and accuracy compared to the non-parametric MI based approach.

1.4 Organization of the dissertation

Chapter 2 describes PET image reconstruction using information theoretic anatomical priors. It presents the details of reconstruction using mutual information and joint entropy based priors, and the simulation and clinical data results comparing the two metrics. The application of these priors to positron range correction is also described, with phantom simulation and real data results. Chapter 3 describes MI based non-rigid mouse registration using the scale-space approach; results are presented for a longitudinal CT study of a mouse. Chapter 4 describes the log-likelihood based non-rigid registration using a Gaussian mixture model for the probability densities; evaluations using clinical and pre-clinical data are presented. Chapter 5 summarizes the work described in this dissertation and proposes future directions.

Chapter 2: PET Image Reconstruction Using Information Theoretic Anatomical Priors

The uptake of positron emission tomographic (PET) tracers typically results in a spatial density that reflects the underlying anatomical morphology. This results in a strong correlation between the structure of the anatomical and functional images. The incorporation of anatomical information from co-registered MR/CT images into PET reconstruction algorithms can potentially improve the quality of low resolution PET images. This anatomical information is readily available from the multimodality imaging equipment that is often used for acquiring data, and can be incorporated into the PET reconstruction algorithm in a Bayesian framework through the use of priors.

We describe a non-parametric framework for incorporating anatomical information in PET image reconstruction through priors based on mutual information and joint entropy. Since both these information theoretic metrics are based on global distributions of the image intensities, spatial information can be incorporated by adding features that capture the local variation in image intensities. We define features based on scale-space theory, which provides a framework for the analysis of images at different levels of detail. It is based on generating a one-parameter family of images by blurring the image with Gaussian kernels of increasing width (the scale parameter) and analyzing these blurred images to extract structural features [Lin94]. We define the scale-space features for the information theoretic priors as the original image, the image blurred at different scales, and the Laplacians of the blurred images. These features reflect the assumption that boundaries in the two images are similar, and that the image intensities follow similar distributions within boundaries. By analyzing the images at different scales, we aim to automatically emphasize stronger boundaries that delineate important anatomical structures, and attach less importance to internal detail and noise, which are blurred out in the higher scale images.
We evaluate the performance of both MI and JE as priors through simulations as well as clinical data. The simulations explore both the ideal case of perfect agreement between the anatomical and functional images, and a more realistic case of the images having some differences. The simulations are based on brain imaging with 18FDG, which has uniform uptake in the three main anatomical regions of the brain: gray matter (GM), white matter (WM), and cerebrospinal fluid (CSF). We perform Monte Carlo simulations to evaluate quantitation accuracy in specific regions of interest, as well as overall. Additionally, we compare reconstructions of clinical data obtained from a PET scan using [F-18]Fallypride, which shows uptake mainly in the striatum region of the brain. This clinical data serves not only as a validation of the simulation results, but also explores the behavior of the information theoretic priors when the uptake is the same in regions corresponding to different anatomical image intensities. We present an efficient method of computing the priors and their derivatives, which reduces the complexity from $O(MN_s)$ to $O(M\log M + N_s)$, where $N_s$ is the number of voxels in the image and $M$ is the number of points at which the distribution is computed. This makes these priors practical to use for 3D images, where $N_s$ is large.

We apply the information theoretic priors to a fundamental resolution limiting problem of PET: positron range. Positron range is the distance traveled by a positron before annihilation. For some high energy isotopes such as Cu-60 and Br-76, this distance can be long and causes a blurring in the reconstructed PET image, thus limiting its resolution [LDW02]. In [BRL+03] a truncated homogeneous system model was used to correct the positron range effect by modeling it as a spatially invariant blurring that is truncated at the object boundary, to reflect the assumption that a positron that escapes the object does not contribute to the sinogram counts. We use the truncated homogeneous system model in conjunction with information theoretic priors to correct for positron range by taking advantage of anatomical information to improve the resolution of the reconstructed image. We present simulation results based on the Cu-60 isotope (range 3.07 mm) and real data results based on the Br-76 isotope (range 3.47 mm), and compare reconstructions without range correction, and with range correction in conjunction with quadratic and information theoretic priors.

2.1 Methods

2.1.1 MAP estimation using information theoretic priors

We represent the PET image by $f = [f_1, f_2, \cdots, f_N]^T$, where $f_i$ represents the activity at voxel index $i$, and the co-registered anatomical image by $a = [a_1, a_2, \cdots, a_N]^T$. Let $g$ denote the sinogram data, which is modeled as a set of independent Poisson random variables $g_i$, $i = 1, 2, \cdots, K$. The maximum a posteriori estimate of $f$ is given by

$$\hat{f} = \arg\max_{f \geq 0} \frac{p(g|f)\,p(f)}{p(g)}, \qquad (2.1)$$

where $p(f)$ is the prior on the image, and $p(g|f)$ is the Poisson likelihood function given by

$$p(g|f) = \prod_{i=1}^{K} \frac{\exp\!\left(-\sum_{j=1}^{N} P_{ij} f_j\right)\left(\sum_{j=1}^{N} P_{ij} f_j\right)^{g_i}}{g_i!}. \qquad (2.2)$$

$P_{ij}$ represents the $(i,j)$th element of the $K \times N$ forward projection matrix $P$. Here, $P$ not only models the geometric mapping between the source and detector but also incorporates detector sensitivity, detector blurring, and attenuation effects [LQ00]. The prior term $p(f)$ can be used to incorporate a priori information about the PET image.
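To make the data model concrete, here is a minimal sketch (not the dissertation's code) of the Poisson log-likelihood of Equation 2.2 up to constants, and its gradient with respect to f. The dense matrix `P` stands in for the forward projector; a practical system would apply projection operators rather than store P explicitly.

```python
import numpy as np

def poisson_log_likelihood(f, P, g, eps=1e-12):
    """log p(g|f) up to the constant -log(g_i!) terms (Eqs. 2.2 and 2.4)."""
    ybar = P @ f                      # expected sinogram counts, sum_j P_ij f_j
    return np.sum(-ybar + g * np.log(ybar + eps))

def poisson_gradient(f, P, g, eps=1e-12):
    """Data term of the gradient in Eq. 2.17: sum_i P_ik (g_i / ybar_i - 1)."""
    ybar = P @ f
    return P.T @ (g / (ybar + eps) - 1.0)
```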
We utilize information that is available through co-registered anatomical images by defining the prior in terms of information theoretic measures of similarity between the anatomical and functional images. Through this framework, we aim to reconstruct a PET image that maximizes the likelihood of the data, while also being maximally similar, in the information theoretic sense, to the anatomical image.

To define the prior, we extract feature vectors that can be expected to be correlated in the PET and anatomical images. Let the $N_s$ feature vectors extracted from the PET and anatomical images be represented as $x_i$ and $y_i$ respectively, for $i = 1, 2, \ldots, N_s$. These can be considered independent realizations of the random vectors $X$ and $Y$. Let $m$ be the number of features in each feature vector, such that $X = [X_1, X_2, \cdots, X_m]^T$. If $D(X,Y)$ is the information theoretic similarity metric defined between $X$ and $Y$, the prior is then defined as

$$p(f) = \frac{1}{Z} \exp\left(\mu D(X,Y)\right), \qquad (2.3)$$

where $Z$ is a normalizing constant and $\mu$ is a positive hyperparameter. Taking the log of the posterior and dropping constants, the objective function $L(f)$ is given by

$$L(f) = \sum_{i=1}^{K} \left[ -\sum_{j=1}^{N} P_{ij} f_j + g_i \log\left(\sum_{j=1}^{N} P_{ij} f_j\right) \right] + \mu D(X,Y), \qquad (2.4)$$

and the MAP estimate of $f$ is given by

$$\hat{f} = \arg\max_{f \geq 0} L(f). \qquad (2.5)$$

2.1.2 Information theoretic priors

We use mutual information and joint entropy as measures of similarity between the anatomical and functional images. Mutual information between two random variables $X$ and $Y$, with marginal distributions $p(x)$ and $p(y)$ and joint distribution $p(x,y)$, is defined as [CT91]

$$I(X,Y) = H(X) + H(Y) - H(X,Y), \qquad (2.6)$$

where the entropy $H(X)$ is defined as

$$H(X) = -\int p(x) \log p(x)\, dx, \qquad (2.7)$$

and the joint entropy $H(X,Y)$ is given by

$$H(X,Y) = -\int p(x,y) \log p(x,y)\, dx\, dy. \qquad (2.8)$$

Mutual information can be interpreted as the reduction in uncertainty (entropy) of $X$ by the knowledge of $Y$, or vice versa. When $X$ and $Y$ are independent, $I(X,Y)$ takes its minimum value of zero, and it is maximized when the two random variables are maximally correlated. The distribution $p(x,y)$ that maximizes MI is sparse, with localized peaks. This also corresponds to a reduction in the joint entropy $H(X,Y)$, from Equations 2.6 and 2.8. We therefore define $D(X,Y)$ in terms of MI as

$$D(X,Y) = I(X,Y), \qquad (2.9)$$

or in terms of JE as

$$D(X,Y) = -H(X,Y). \qquad (2.10)$$

For continuous random variables, the entropy $H(X)$ (called differential entropy) can go to $-\infty$, unlike the Shannon entropy of a discrete random variable, which has a minimum value of zero. Differential joint entropy approaches $-\infty$ as the joint distribution approaches a delta function. Mutual information, however, has a minimum value of zero for both continuous and discrete random variables.

In Figure 2.1, the pdfs (estimated from joint histograms) of a reference image with some example reconstructed PET images are shown. The x axis of the pdfs denotes PET image intensities, and the y axis denotes reference image intensities. The examples shown are a blurred reconstruction $X_{blur}$ and a noisy reconstruction $X_{noisy}$ of a true image $X_{true}$ that is identical in structure to a reference image $Y$. It can be seen that the joint histogram of the true image is more clustered than that of the blurred or noisy reconstruction. The joint entropy of these pdfs decreases from left to right, $H(X_{blur},Y) > H(X_{noisy},Y) > H(X_{true},Y)$. Similarly, their MI increases from left to right, $I(X_{blur},Y) < I(X_{noisy},Y) < I(X_{true},Y)$.
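As a concrete illustration of these definitions (a sketch, not the Parzen-based estimator of Section 2.1.4), the following computes discrete approximations of H(X,Y) and I(X,Y) from a joint intensity histogram of two images; the bin count is an arbitrary choice:

```python
import numpy as np

def mi_and_je(x, y, bins=64):
    """Discrete estimates of joint entropy H(X,Y) and mutual information
    I(X,Y) = H(X) + H(Y) - H(X,Y) from a joint intensity histogram."""
    pxy, _, _ = np.histogram2d(x.ravel(), y.ravel(), bins=bins)
    pxy = pxy / pxy.sum()                         # joint pmf
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)     # marginal pmfs

    def entropy(p):
        p = p[p > 0]                              # convention: 0 log 0 = 0
        return -np.sum(p * np.log(p))

    je = entropy(pxy)
    mi = entropy(px) + entropy(py) - je
    return mi, je
```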
Thus, by using these priors, we can potentially reconstruct PET images that are less noisy and have higher resolution, due to the structural information that is available from the anatomical image.

Though both MI and JE have an optimum that is characterized by a clustered joint pdf, the additional marginal entropy terms in MI cause a difference in the set of solutions that can be obtained using these metrics. Examples of this are shown in Figure 2.2 through two test cases: the first, where the functional image $X_{arbit}$ has the same structure as the reference image but arbitrary intensities, and the second, where the functional image $X_{hom}$ has two anatomically distinct regions combined into one homogeneous functional region. The joint entropy $H(X_{arbit},Y) = H(X_{true},Y) \simeq H(X_{hom},Y)$, whereas the mutual information $I(X_{hom},Y) < I(X_{arbit},Y) = I(X_{true},Y)$. Thus JE can allow more than one anatomical region to map to a homogeneous functional region without a change in its value, while MI remains the same for all images that have the same structure as the reference image.

Figure 2.1: Log of estimated joint probability distribution functions: Top: Anatomical image. Middle: Example reconstructed images (L to R): blurred, noisy, and true. Bottom: Estimated joint pdf of the images in the middle row and the anatomical image.

This causes JE to be more robust than MI to differences between the PET and anatomical images, where the anatomical image has structures that are not present in the PET image [Nuy07]. However, this property could also lead to distinct regions in the PET image, corresponding to clusters that are close to each other in the joint density, being combined into one region. It should also be noted that since the optimization is with respect to image intensities, the marginal entropy term in MI may cause the PET image intensities to have high variance, thereby increasing the uncertainty or entropy in the image. Both these metrics measure similarity in distributions rather than actual values of intensities, and rely on the likelihood term to converge to the correct solution. We explore the properties and performance of these metrics through simulations and clinical data in later sections.

Figure 2.2: Joint densities of reference image and images with arbitrary intensities: Top (L to R): image with the same structure as the reference image but arbitrary intensities, image with two regions having the same intensity, and the reference image. Bottom: Log of estimated joint densities of the reference image and the two functional images in the top row.

2.1.3 Extraction of feature vectors

Mutual information and joint entropy between images are functions of the global distribution of intensities in the images. In PET images reconstructed using MI or JE where only intensity is taken as a feature, intensities in neighboring voxels of the PET image can differ as long as the global distribution matches that of the anatomical image. This can lead to voxels that have higher or lower intensities than their neighbors, which is undesirable since PET images are usually homogeneous within regions. We introduce spatial information into the information theoretic priors by defining feature vectors that capture the local morphology of the images. The feature vectors should be chosen such that they are correlated in the anatomical and functional images, and should reflect the morphology of the images. Since the information theoretic priors defined in this work are naturally insensitive to differences in intensity between the two images, the combination of local measures of image morphology and information theoretic measures should facilitate reconstruction of PET images whose structure is similar to that of the registered anatomical image.

Scale-space theory provides a framework for the extraction of useful information from an image through a multi-scale representation of the image. In this approach, a family of images is generated from an image by blurring it with Gaussians of increasing width (scale parameter), thus generating images at decreasing levels of detail. Relevant features can then be extracted by analyzing this family of images rather than just the original image. Scale-space theory has a well defined mathematical framework, described in [Lin94]. We use this approach to define the feature vectors as:

1. the original image,
2. the image blurred by a Gaussian kernel of width σ₁, and
3. the Laplacian of the blurred image.

By analyzing the image at two different scales (the original and the image at scale σ₁), we give more emphasis to those boundaries that remain in the image at the higher scale, and attach less importance to those that are blurred out at the higher scale. A coronal slice of an MR image of the brain and its scale-space features are shown in Figure 2.3. A sketch of this feature extraction is given below.

Figure 2.3: Scale-space features of a coronal slice of an MR image of the brain: (a) original image, (b) blurred image, (c) Laplacian of blurred image.
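The following sketch (illustrative, using SciPy; σ₁ = 2.0 matches the value used in Section 2.2.1) computes the three features for an image:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, laplace

def scale_space_features(img, sigma1=2.0):
    """Return the scale-space feature images used by the MI-scale/JE-scale
    priors: the original image, a Gaussian-blurred version at scale sigma1,
    and the Laplacian of the blurred image."""
    blurred = gaussian_filter(img.astype(float), sigma=sigma1)
    lap = laplace(blurred)          # discrete Laplacian of the blurred image
    return np.stack([img.astype(float), blurred, lap])
```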
More features could be added by using kernels of different sizes, but we restrict our analysis to two scales to limit complexity. We further make the simplifying assumption of independence between features, so that

$$D(X,Y) = \sum_{i=1}^{m} D(X_i, Y_i). \qquad (2.11)$$

Though the scale-space images are correlated, we make this independence assumption to reduce the computational cost of the prior term.

2.1.4 Computation of prior term

The information theoretic priors defined in Section 2.1.2 are expressed in terms of the pdfs of the images, which are not known. We use a non-parametric approach to estimate the joint density using the Parzen window method [DHS01]. Given the samples $f_1, f_2, \cdots, f_{N_s}$ of the random variable $X$, the Parzen estimate $\hat{p}(x)$ of $p(x)$ is given by

$$\hat{p}(x) = \frac{1}{N_s} \sum_{j=1}^{N_s} \phi\!\left(\frac{x - f_j}{\sigma}\right), \qquad (2.12)$$

where $\phi(x/\sigma)$ is a windowing function of width $\sigma$. The windowing function should integrate to one, and is commonly chosen to be a Gaussian of mean zero and standard deviation $\sigma$. For estimating the joint density between $X$ and $Y$, corresponding to the intensities of the PET and anatomical images $f$ and $a$, we use a Gaussian window function with the diagonal covariance matrix

$$\Sigma = \begin{bmatrix} \sigma_x^2 & 0 \\ 0 & \sigma_y^2 \end{bmatrix}.$$

Thus, the joint density estimate is given by

$$\hat{p}(x,y) = \frac{1}{N_s} \sum_{k=1}^{N_s} \phi\!\left(\frac{x - f_k}{\sigma_x}\right) \phi\!\left(\frac{y - a_k}{\sigma_y}\right). \qquad (2.13)$$

The variance of the Parzen window is taken as a design parameter. It has been shown in [DHS01] that as $N_s \to \infty$, the Parzen estimate converges to the true pdf in the mean squared sense. For PET image reconstruction (especially in the 3D case) $N_s$ is large, so it can be assumed that there are a sufficient number of samples to give an accurate estimate of the pdf. Hence, we replace $p(x,y)$ with $\hat{p}(x,y)$ in Equations 2.6 or 2.8 to compute the prior term. A direct implementation sketch of Equation 2.13 follows.
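This is a direct sketch of the Parzen joint density estimate of Equation 2.13 (before the FFT acceleration of Section 2.1.6); the grid and bandwidth arguments are illustrative, and a normalized Gaussian kernel is assumed:

```python
import numpy as np

def parzen_joint_pdf(f, a, grid_x, grid_y, sigma_x, sigma_y):
    """Parzen estimate of the joint pdf of PET intensities f and anatomical
    intensities a on a rectangular grid (Eq. 2.13). Cost is O(M^2 N_s)."""
    phi = lambda u: np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    # Gaussian kernel contribution of every sample to every grid point.
    kx = phi((grid_x[:, None] - f.ravel()[None, :]) / sigma_x) / sigma_x
    ky = phi((grid_y[:, None] - a.ravel()[None, :]) / sigma_y) / sigma_y
    return kx @ ky.T / f.size        # M_x x M_y joint density estimate
```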
2.1.5 Computation of the MAP estimate

The objective function given in Equation 2.4 can be iteratively optimized by a preconditioned conjugate gradient (PCG) algorithm with an Armijo line search technique [Ber99]. Let the estimate of the PET image at iteration $k$ be $\hat{f}^{(k)}$, the gradient vector be $d^{(k)} = \nabla L(f)|_{f = \hat{f}^{(k)}}$, the preconditioner be $C^{(k)}$, the conjugate gradient direction vector be $b^{(k)}$, and the step size computed from the line search be $\alpha^{(k)}$. The PCG update equations are given by

$$\hat{f}^{(k+1)} = \hat{f}^{(k)} + \alpha^{(k)} b^{(k)}, \qquad (2.14)$$

$$b^{(k)} = C^{(k)} d^{(k)} + \beta^{(k-1)} b^{(k-1)}, \qquad (2.15)$$

$$\beta^{(k-1)} = \frac{\left(d^{(k)} - d^{(k-1)}\right)^T b^{(k)}}{\left(d^{(k-1)}\right)^T b^{(k-1)}}. \qquad (2.16)$$

The algorithm is initialized with $b^{(0)} = C^{(0)} d^{(0)}$. We use the EM-ML preconditioner, defined by $C^{(k)} = \mathrm{diag}\!\left(\frac{f^{(k)}}{\sum_i P_{ij}}\right)$. The $k$th element of the gradient vector $d$ can be computed from Equation 2.4 as

$$\frac{\partial L(f)}{\partial f_k} = \sum_{i=1}^{K} \left( -P_{ik} + \frac{g_i}{\sum_{j=1}^{N} P_{ij} f_j}\, P_{ik} \right) + \mu \frac{\partial D(X,Y)}{\partial f_k}. \qquad (2.17)$$

To compute the derivative of $D(X,Y)$, we first define the derivatives of the marginal and joint entropy terms corresponding to the intensity of the original image. Replacing the integration in Equation 2.7 with its numerical approximation, the derivative of the marginal entropy $H(X)$ with respect to $f_k$ is given by

$$\frac{\partial H(X)}{\partial f_k} = -\frac{\Delta x}{N_s} \sum_{i=1}^{M} \left(1 + \log p(x_i)\right) \phi'\!\left(\frac{x_i - f_k}{\sigma_x}\right), \qquad (2.18)$$

where $\phi'\!\left(\frac{x_i - f_k}{\sigma_x}\right) = \phi\!\left(\frac{x_i - f_k}{\sigma_x}\right)\left(\frac{x_i - f_k}{\sigma_x^2}\right)$, and $M$ is the number of points at which the pdf is computed. Similarly, the gradient of the joint entropy is

$$\frac{\partial H(X,Y)}{\partial f_k} = -\frac{\Delta x\, \Delta y}{N_s} \sum_{i,j=1}^{M} \left(1 + \log p(x_i, y_j)\right) \frac{\partial p(x_i, y_j)}{\partial f_k} = -\frac{\Delta x\, \Delta y}{N_s} \sum_{i,j=1}^{M} \left(1 + \log p(x_i, y_j)\right) \phi\!\left(\frac{y_j - a_k}{\sigma_y}\right) \phi'\!\left(\frac{x_i - f_k}{\sigma_x}\right). \qquad (2.19)$$

The gradient of the MI prior is then

$$\frac{\partial I(X,Y)}{\partial f_k} = \frac{\partial H(X)}{\partial f_k} - \frac{\partial H(X,Y)}{\partial f_k}. \qquad (2.20)$$

The gradients corresponding to the other scale-space features can be computed easily from these equations, since the Laplacian and blurring are linear operations on $f$ and $a$. For an interpretation of these gradients through an alternative formulation, please refer to the Appendix.

We perform an approximate line search using the Armijo rule, since it does not require the computation of the second derivative of $L(f)$, which is computationally expensive for the information theoretic priors. Whenever the non-negativity constraint is violated, we use a bent line search technique: the estimate from the first line search (with negative values) is projected onto the constraint set, and a second line search is computed between the current estimate and the projected estimate. The objective function is a non-convex function of $f$, so optimization using gradient based techniques requires a good initial estimate to converge to the correct solution. A sketch of the update loop is given below.
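As an illustration of Equations 2.14-2.16 (a simplified sketch, not the dissertation's implementation; the bent line search is reduced to a simple projection and the conjugacy factor is written as a Polak-Ribiere-type quotient), consider:

```python
import numpy as np

def pcg_map(f0, objective, grad, precond, n_iters=30, c=1e-4):
    """Preconditioned conjugate gradient ascent on L(f) (Eqs. 2.14-2.16)
    with an Armijo backtracking line search and projection onto f >= 0."""
    f = f0.copy()
    d = grad(f)
    b = precond(f) * d                       # b^(0) = C^(0) d^(0)
    for _ in range(n_iters):
        alpha, L0, slope = 1.0, objective(f), float(d @ b)
        # Armijo rule: halve the step until sufficient increase is achieved.
        while objective(np.maximum(f + alpha * b, 0.0)) < L0 + c * alpha * slope:
            alpha *= 0.5
            if alpha < 1e-10:
                break
        f = np.maximum(f + alpha * b, 0.0)   # crude stand-in for the bent search
        d_new = grad(f)
        beta = float((d_new - d) @ b) / (float(d @ b) + 1e-30)
        b = precond(f) * d_new + beta * b    # Eq. 2.15
        d = d_new
    return f
```

Here `precond(f)` would return the diagonal of the EM-ML preconditioner, e.g. `f / P.sum(axis=0)` for a dense system matrix P.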
2.1.6 Efficient computation of entropy and its gradient

The computation of the Parzen window estimate p(x) at M locations using N_s samples is O(M N_s). The gradient in Equation 2.18 is also of complexity O(M N_s), since for each f_k we compute a summation of M multiplications. This can be expensive for large N_s, which is the case in PET reconstruction. We take an approach similar to [SZS05] to compute the entropy measures as well as their gradients efficiently through the use of FFTs. The Parzen window estimate in Equation 2.12 can be rewritten as a convolution

p(x) = \frac{1}{N_s} \sum_{j=1}^{N_s} \int \phi\!\left(\frac{x-s}{\sigma}\right) \delta(s - f_j)\, ds = \phi\!\left(\frac{x}{\sigma}\right) \star \frac{1}{N_s} \sum_{j=1}^{N_s} \delta(x - f_j).   (2.21)

Here x takes continuous values, and the impulse train $h(x) = \frac{1}{N_s}\sum_{j=1}^{N_s} \delta(x - f_j)$ has impulses at non-equispaced locations. Fast Fourier transforms can be used to compute convolutions efficiently on a uniform grid. To take advantage of these efficient implementations, we interpolate the continuous intensity values f_j onto a uniform grid of equidistant discrete bin locations $\breve{x}_j$, $j = 1,2,\cdots,M$, with spacing $\Delta\breve{x}$. Let the image with the quantized intensity values be $\breve{f}$. The impulses in h(x) are then replaced by triangular interpolation functions, giving

h(\breve{x}) = \frac{1}{N_s} \sum_{j=1}^{N_s} \wedge(\breve{x} - f_j),   (2.22)

where

\wedge(u) = \begin{cases} 1 - |u|, & |u| < \Delta\breve{x} \\ 0, & \text{otherwise}. \end{cases}   (2.23)

Thus each continuous sample f_j contributes to the bin it is closest to, as well as to its neighboring bins. The Parzen estimate at the equispaced locations is given by

p(\breve{x}) = \phi\!\left(\frac{\breve{x}}{\sigma}\right) \star h(\breve{x}).   (2.24)

This convolution can now be performed using fast Fourier transforms with complexity O(M log M). We can then compute the approximate marginal entropy $H(\breve{X})$ by replacing p(x) with $p(\breve{x})$ in Equation 2.7. The gradient in Equation 2.18 also has a convolution structure, and can be rewritten as

\frac{\partial H(X)}{\partial f_k} = -\frac{\Delta x}{N_s} \left[ \left(1 + \log p(x)\right) \star \phi'\!\left(-\frac{x}{\sigma}\right) \right]_{x = f_k}.   (2.25)

If we replace p(x) in this equation with $p(\breve{x})$, we can compute the gradient corresponding to the binned intensity values $\breve{f}_k$, $k = 1,2,\cdots,M$, using FFT-based convolution. We represent this binned gradient as $\partial H(\breve{X})/\partial \breve{f}_j$. To retrieve the gradient with respect to the original intensity values, we use the same interpolation as for binning, giving

\frac{\partial H(X)}{\partial f_k} = -\sum_{j=1}^{M} \wedge(f_k - \breve{f}_j) \frac{\partial H(\breve{X})}{\partial \breve{f}_j}.   (2.26)

That is, we take the gradients at the neighboring bins and interpolate to obtain the gradient at the true intensity value. The complexity of computing the binned gradient is thus O(M log M), followed by an interpolation of complexity O(N_s). Additionally, the bin locations corresponding to each f_k can be stored in the pdf estimation step and retrieved in the gradient estimation step, giving further computational savings. This method can be extended to the 2D case by using bilinear interpolation in place of the triangular windowing function, both to compute the binned joint density and to interpolate the gradient at the binned locations back to the continuous intensity values.
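To make the binned estimator concrete, here is a small sketch of the 1D pipeline of Equations 2.21 through 2.26: linear binning of the samples, FFT-based convolution with the Gaussian window and its derivative, and interpolation of the binned gradient back to the sample intensities. The grid extent, the use of scipy's fftconvolve, and the variable names are our own choices, not those of the original implementation.

```python
import numpy as np
from scipy.signal import fftconvolve

def fft_parzen_pdf_and_grad(f, M=351, sigma_frac=0.05):
    """FFT-based Parzen pdf and marginal-entropy gradient (Eqs. 2.21-2.26)."""
    Ns = f.size
    sigma = sigma_frac * np.ptp(f)
    grid = np.linspace(f.min() - 3 * sigma, f.max() + 3 * sigma, M)
    dx = grid[1] - grid[0]

    # Linear binning (triangular window, Eqs. 2.22-2.23): each sample is
    # split between its two nearest bins.
    pos = (f - grid[0]) / dx
    j = np.clip(pos.astype(int), 0, M - 2)
    w = pos - j
    h = (np.bincount(j, 1.0 - w, M) + np.bincount(j + 1, w, M)) / Ns

    # Gaussian window and its derivative sampled on a centred grid.
    t = (np.arange(M) - M // 2) * dx
    phi = np.exp(-0.5 * (t / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)
    dphi = phi * t / sigma ** 2            # phi'(t), as defined after Eq. 2.18

    # Parzen estimate on the uniform grid (Eq. 2.24), via FFT convolution.
    p = np.maximum(fftconvolve(h, phi, mode="same"), 1e-12)

    # Binned gradient (Eq. 2.25; the kernel is phi'(-x/sigma) = -phi'(x/sigma)),
    # then interpolation back to the sample intensities (Eq. 2.26).
    gbin = -(dx / Ns) * fftconvolve(1.0 + np.log(p), -dphi, mode="same")
    return grid, p, np.interp(f, grid, gbin)
```

For a 3D image stored as an array img, the same routine applies to img.ravel(); the joint-density case replaces the linear binning and interpolation by their bilinear counterparts.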
We compare the Parzen window method with the FFT based method in terms of run-times and accuracy of the marginal pdf and gradient estimates. For both methods we compute the pdf $p(\breve{x})$ at the same locations $\breve{x}_i$, $i = 1,2,\cdots,M$. To compute the Parzen gradient, we expressed Equation 2.18 as the inner product of the vectors

L_p = -\frac{\Delta x}{N_s} \left[ 1 + \log p(x_1),\; 1 + \log p(x_2),\; \cdots,\; 1 + \log p(x_M) \right]^T,

\Phi_k = \left[ \phi'\!\left(\frac{x_1 - f_k}{\sigma_x}\right),\; \phi'\!\left(\frac{x_2 - f_k}{\sigma_x}\right),\; \cdots,\; \phi'\!\left(\frac{x_M - f_k}{\sigma_x}\right) \right]^T.   (2.27)

This vectorized form enables us to use efficient linear algebra libraries for evaluating the summation for each f_k. The run-times for the combined computation of the pdf p(x) and the gradient ∇H(X) using the Parzen and FFT based methods are shown in Table 2.1 for different image sizes. The reported run-times were obtained on an AMD Opteron 870 cluster with 8 dual-core CPUs, using a single processor.

Table 2.1: Run-times for the combined computation of pdf and gradient for different image sizes, for the Parzen and FFT based methods

Image Size (No. of Bins)     Parzen (sec)   FFT-based (sec)   Speed-up Factor
128×128 (M = 351)            1.01           0.11              9.16
128×128×111 (M = 351)        117.44         0.88              133.63
256×256×111 (M = 351)        470.64         3.45              136.41

The estimated pdfs and gradients using the two methods are shown in Figure 2.4.

Figure 2.4: Comparison of Parzen and FFT-based methods: (a) image for which the pdf and gradients were estimated, (b) pdf estimates using M = 351, (c) gradient of the marginal entropy using the Parzen method (left) and the FFT-based method (right). The normalized difference between the gradient images shown is 0.14%.

The norm of the difference between the gradients of the two methods, normalized by the Parzen gradient, was less than 1% for all the image sizes considered. Since the main computational cost of the FFT based method is in the FFT based convolutions, which depend on M rather than on image size, we see large speed-up factors for 3D images compared to the Parzen method, whose complexity increases linearly with image size.

2.2 Results

We evaluate the performance of the MI and JE priors through (a) simulations that model the best case scenario of the anatomical image having structure identical to the PET image, (b) simulations of a more realistic case where there are some differences between the MR and PET regions, and (c) clinical data obtained from brain imaging. We consider four different information theoretic priors:

1. JE-Intensity: Joint entropy between the intensities of the anatomical and PET images
2. MI-Intensity: Mutual information between the intensities of the anatomical and PET images
3. JE-scale: Joint entropy between scale-space feature vectors (original image, image blurred by a Gaussian of width σ_1, and Laplacian of the blurred image) of the anatomical and PET images
4. MI-scale: Mutual information between scale-space feature vectors of the anatomical and PET images

We compare the performance of these priors relative to each other, and to a quadratic prior (QP), which does not use any anatomical information. QP penalizes large differences in intensities within a neighborhood η_i defined by the 8 nearest neighbors of each voxel i, and is given by

p(f) = -\beta \sum_{i=1}^{N_s} \sum_{j \in \eta_i} (f_i - f_j)^2 \kappa_{ij},   (2.28)

where κ_ij is a weight assigned to each voxel pair f_i, f_j. The weight on the prior is controlled by varying the hyperparameter μ for the information theoretic priors, and β for the QP prior.
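For reference, a minimal sketch of the quadratic prior of Equation 2.28 and its gradient on a 2D image follows. We assume unit weights κ_ij, count each neighbor pair once per offset, and use periodic boundary handling via np.roll for brevity; none of these choices are specified in the text.

```python
import numpy as np

def quadratic_prior(f, beta=0.5):
    """Quadratic prior (Eq. 2.28) and its gradient for a 2D image f."""
    # Four offsets cover each 8-neighbour pair exactly once.
    offsets = [(0, 1), (1, 0), (1, 1), (1, -1)]
    p, g = 0.0, np.zeros_like(f)
    for dy, dx in offsets:
        d = f - np.roll(np.roll(f, dy, axis=0), dx, axis=1)   # f_i - f_j
        p -= beta * np.sum(d ** 2)
        # gradient: each voxel appears once as i and once as j in a pair
        g -= 2 * beta * (d - np.roll(np.roll(d, -dy, axis=0), -dx, axis=1))
    return p, g
```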
2.2.1 Simulation Results

The simulations are based on the 18F-FDG tracer, which shows uniform uptake within each of the gray matter (GM), white matter (WM), and cerebrospinal fluid (CSF) regions of a normal brain. For all the information theoretic priors considered, we computed the densities at bin centers that were kept constant throughout the optimization; their range was set to 2.5 times the range of intensities in the initial image. We chose the Parzen standard deviation to be about 5% of the range of the initial image. To form the scale-space features, we used a Gaussian kernel of standard deviation σ_1 = 2.0 pixels to generate the blurred image, and the 3×3 Laplacian kernel

\begin{bmatrix} 0 & 1 & 0 \\ 1 & -4 & 1 \\ 0 & 1 & 0 \end{bmatrix}.

2.2.1.1 Anatomical and PET images with identical structure

We used a 128×128 slice of the Hoffman brain phantom as our functional image and scaled the three different regions (GM, WM, and CSF) differently to generate our anatomical image. Both images are shown in Figure 2.5.

Figure 2.5: The phantoms used in the simulations to represent the MR image (left) and the PET image (right).

The simulations are based on a single ring of the microPET Focus 220 scanner, for which the sinogram dimensions are 288×252. The sinogram data had approximately 300,000 counts, corresponding to 30 counts per line of response (LOR). We reconstructed all images using 30 iterations of the PCG algorithm, initialized by the image reconstructed after 2 iterations of the ordered subset expectation maximization (OSEM) algorithm using 6 subsets.

The reconstructed images using QP and the four information theoretic priors are shown in Figure 2.6 along with the true image. The images shown for each prior are at the hyperparameter values that gave the least error between the true and reconstructed images for that prior. The normalized error values as a function of iteration are shown in Figure 2.7. To evaluate the performance of these priors in quantifying uptake, we performed Monte Carlo simulations. Bias and variance of the estimated image were computed from 25 datasets, and are shown in Figure 2.8.

Figure 2.6: True and reconstructed images. Top: true PET image (left), QP reconstruction with β = 0.5 (right). Middle: MI reconstruction for μ = 10000 (left), JE reconstruction for μ = 5000 (right). Bottom: MI-scale reconstruction for μ = 2500 (left), JE-scale reconstruction for μ = 2500 (right).

Figure 2.7: Normalized error vs. iteration for the best values of μ and β of the information theoretic and QP priors, respectively.

It can be seen that the images reconstructed using information theoretic priors have less overall error, sharper boundaries, and less noise than those of QP. The MI-Intensity prior produces images that follow the structure of the anatomical image, but have several high intensity pixels distributed within the GM region. This is also reflected in the Monte Carlo results, where the MI-Intensity prior has high variance in the GM region, and less bias along the boundaries compared to QP. Adding spatial information through the MI-scale prior reduces the occurrence of these high intensity pixels, and hence decreases the variance of the estimate and also the overall reconstruction error. The JE-Intensity prior reconstructs a near perfect image, with a normalized error of about 2% with respect to the true image. The bias and variance are both close to zero, except for a few pixels. This dataset illustrates the best case scenario for the JE prior, since the images are piecewise constant and the JE prior prefers piecewise homogeneous solutions. The JE-scale prior gives biased intensity values, and some smoothing along the boundaries. However, for a more realistic case where the images are not piecewise constant, we expect the scale-space features to help the JE reconstruction by providing local spatial information.

This simulation should also be the best case scenario for the MI prior, since the two images are identical except in the values of intensity, to which MI is robust. However, we see that the MI priors do not reconstruct an image close to the true image. This can be explained from the joint pdfs of the anatomical image and the intensity based MI and JE reconstructions shown in Figure 2.6. These pdfs are shown in Figure 2.9, and the variation of the terms in the MI and JE priors as a function of iteration number is shown in Figure 2.10. For the same initial joint density, MI is optimized by initially increasing the marginal entropy term, which increases the variance of the intensities in the GM region, while decreasing the joint entropy by clustering the high intensity pixels together as isolated peaks in the joint histogram. This caused MI to converge to a local minimum in which the isolated peaks manifested as high intensity pixels distributed throughout the GM. On the other hand, minimizing the JE term alone gave a steeper decrease in JE and a lower final value of JE than that of MI.
However, even the JE prior has an isolated peak in its joint pdf, associated with the background/CSF of the anatomical image. Introduction of the scale-space features helps the priors avoid these local minima by adding terms that penalize the formation of these isolated peaks.

Figure 2.8: Results of Monte Carlo simulations, showing (a) the absolute value of bias and (b) the standard deviation, for (L to R) QP (β = 0.5), MI (μ = 2500), JE (μ = 5000), MI-scale (μ = 2500), and JE-scale (μ = 2500).

Figure 2.9: Joint pdfs of the anatomical image and (a) the true PET image, (b) the OSEM estimate used for initialization, (c) the image reconstructed using the MI-Intensity prior, and (d) the image reconstructed using the JE-Intensity prior.

Figure 2.10: Variation of (a) the marginal entropy for the MI-Intensity prior and (b) the joint entropy for the MI-Intensity and JE-Intensity priors as a function of iteration.

2.2.1.2 Anatomical and PET images with differences

To simulate a more realistic scenario, we used an MR image of the brain and its corresponding manually segmented image from the IBSR dataset provided by the Center for Morphometric Analysis at Massachusetts General Hospital and available at http://www.cma.mgh.harvard.edu/ibsr/. We resized the MR image to size 256×256×111 and generated the PET image from the segmented MR image by setting the activity in all voxels corresponding to GM to four times that corresponding to WM, with the CSF activity set to zero. A coronal slice of both images is shown in Figure 2.11. Though this PET image was generated from the MR image, its structure is not identical to that of the MR image, since the labels assign some non-homogeneous anatomical regions to a homogeneous functional region. The 3D simulations are based on a 55-ring Biograph scanner with sinogram dimensions 336×336×559. Each simulated dataset had approximately 26 counts per LOR. All images were reconstructed by initializing with 2 iterations of OSEM with 21 subsets, followed by 20 iterations of MAP.

Figure 2.11: Coronal slice of (a) an MR image of the brain, and (b) the true PET image generated from the manual segmentation of the MR image.

Reconstructed images using the QP and information theoretic priors are shown along with the true image in Figure 2.12. The normalized error between the true and reconstructed images is shown in Figure 2.13 for the hyperparameter value that gave the least error for each prior. Bias and standard deviation images computed from 25 datasets are shown in Figure 2.14 for the hyperparameter values that gave the least overall error. We compute the normalized bias and variance of the mean uptake in regions of interest (ROIs) as

\mathrm{Bias} = \frac{\frac{1}{N_d}\sum_{i=1}^{N_d} \bar{\hat{f}}^{\,i}_{ROI} - \bar{f}_{ROI}}{\bar{f}_{ROI}}, \qquad \mathrm{SD} = \frac{\sqrt{\sum_{i=1}^{N_d}\left( \bar{\hat{f}}^{\,i}_{ROI} - \frac{1}{N_d}\sum_{i=1}^{N_d} \bar{\hat{f}}^{\,i}_{ROI} \right)^2}}{\bar{f}_{ROI}},   (2.29)

where $\bar{f}_{ROI}$ is the mean true activity in the region of interest, $\bar{\hat{f}}^{\,i}_{ROI}$ is the mean activity in the ROI of the image estimated from the i-th dataset, and N_d is the number of datasets. We drew ROIs in the thalamus, caudate, putamen, and WM; the plots of bias versus standard deviation as the hyperparameter is varied are shown in Figure 2.15 for these regions.
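The ROI statistics of Equation 2.29 are simple to compute from a stack of Monte Carlo reconstructions; a small sketch follows, with illustrative argument names (a boolean ROI mask and an array holding the N_d reconstructions).

```python
import numpy as np

def roi_bias_sd(recons, f_true, roi):
    """Normalized ROI bias and SD of mean uptake (Eq. 2.29).

    recons : array of shape (N_d, ...) with one reconstruction per dataset
    f_true : true activity image; roi : boolean mask over the ROI
    """
    m_true = f_true[roi].mean()                         # mean true ROI activity
    m_hat = np.array([r[roi].mean() for r in recons])   # per-dataset ROI means
    bias = (m_hat.mean() - m_true) / m_true
    sd = np.sqrt(np.sum((m_hat - m_hat.mean()) ** 2)) / m_true  # as in Eq. 2.29
    return bias, sd
```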
Figure 2.12: True and reconstructed images. Top: true PET image (left), QP reconstruction with β = 0.5 (right). Middle: MI reconstruction for μ = 1e5 (left), JE reconstruction for μ = 150000 (right). Bottom: MI-scale reconstruction for μ = 1e5 (left), JE-scale reconstruction for μ = 1e5 (right).

Figure 2.13: Normalized error as a function of iteration.

Figure 2.14: Coronal slice of (a) bias and (b) SD images. Top (L to R): QP (β = 0.5), MI-Intensity (μ = 1e5), JE-Intensity (μ = 150000). Bottom (L to R): MI-scale (μ = 1e5), JE-scale (μ = 1e5).

Figure 2.15: Absolute value of bias vs. standard deviation curves for selected regions of interest ((a) putamen, (b) thalamus, (c) WM, (d) caudate) for the quadratic prior (QP), the mutual information (MI) and joint entropy (JE) priors using only intensity as a feature, and the MI scale-space (MIsc) and JE scale-space (JEsc) priors.

The reconstructed images using the information theoretic priors show higher contrast than that of QP, with the JE priors giving the best performance in terms of the overall error metric. The MI-Intensity prior has isolated voxels with high intensity, and high variance in the GM, as expected. The MI-scale prior reduces the occurrence of these hot spots, and the variance in the GM. However, the overall error for the MI priors is now higher than that of QP, possibly both because of the differences between the anatomical and PET images, and because of the higher ratio of GM to WM volume compared to the Hoffman phantom simulations (2.15 for this dataset compared to 0.7 for the Hoffman phantom). In this realistic simulation, the JE-Intensity prior has some isolated hot and cold spots within regions of homogeneous activity, which is undesirable for any application of PET. The JE-scale prior reduces these inhomogeneities and the variance of the image estimate relative to the JE-Intensity prior.

The ROI curves shown in Figure 2.15 indicate that the information theoretic priors do not offer a traditional trade-off between bias and variance. These priors are non-linear functions of the image being estimated, and exhibit a non-monotonic variation as the hyperparameter is changed. However, the ROI plots still offer insight into the relative performance of these priors with respect to QP and to each other. In the WM, all the information theoretic priors have lower bias and variance than QP. In all the GM regions considered (putamen, caudate, thalamus), the JE-Intensity prior gave the least bias and variance of all the priors considered, but behaves unpredictably as the hyperparameter is varied. The JE-scale prior is better behaved, showing a more stable decrease in variance with a corresponding increase in bias. Overall, we see that the JE based priors outperform the QP and MI based priors, and that introducing spatial information through the scale-space features mitigates the instability produced by the lack of spatial correlations inherent in these global distribution based information theoretic metrics.
We therefore prefer the JE-scale prior despite its increased bias compared to the JE-Intensity prior.

2.2.2 Clinical data

PET data was acquired on a Siemens Biograph TruePoint HD scanner from a Parkinson's disease patient who was injected with [F-18]Fallypride. This tracer has a high affinity for D2 receptors, which have a high density in the striatal regions of the brain. PET images using this tracer show uptake mainly in the caudate and putamen subcortical nuclei (jointly referred to as the striatum); this data therefore serves both to validate the simulation results, and to investigate the behavior of the information theoretic priors when the PET image intensities do not uniformly follow the variation in the anatomical image intensities.

A T1 weighted MRI scan was performed on the same subject and is used as the anatomical image for reconstruction. The MR image was registered to a MAP with QP reconstructed image using MI based rigid registration in RView [SHH99]. The QP image along with the co-registered MR image is shown in Figure 2.16.

Figure 2.16: Coronal slice of (a) the PET image with (b) the co-registered MR image.

Histogram equalization was then performed on the MR image to improve contrast. We initialized the MAP algorithm for all priors with the result of 2 iterations of OSEM with 21 subsets, and ran 20 iterations of MAP. We used parameters for the bin widths and Parzen variance similar to those of the simulations. However, for the JE-scale prior, we used a scale parameter of σ_1 = 3.0 pixels and dropped the Laplacian term. The Laplacian of these PET images is non-zero in only a few voxels (mainly in the striatal regions), which produces a joint density of the Laplacian images consisting of a large peak around zero, and a few peaks corresponding to the edges. Since the JE metric can produce homogeneous solutions, it tends towards a Laplacian joint density that has one large peak around zero, producing undesirable results. The marginal entropy term in the MI-scale prior prevents this from occurring.

The images reconstructed using the different priors are shown in Figure 2.17 for the slice corresponding to that of the anatomical image shown in Figure 2.16. The QP and JE-scale reconstructions are overlaid on the anatomical image, and the coronal, sagittal, and transaxial views are shown in Figure 2.18. It can be seen that for QP the activity often spreads outside the anatomical region to which it belongs, while this spreading is reduced for the priors using anatomical information.

We quantify the performance of these priors in terms of their contrast recovery coefficient (CRC) as a function of noise. We define the CRC between the uptake in a region of the striatum and a WM region close to it as

\mathrm{CRC} = \frac{\bar{f}_{Str}}{\bar{f}_{WM}},   (2.30)

where $\bar{f}_{Str}$ is the mean activity in a region defined in the striatum, and $\bar{f}_{WM}$ is the mean activity in a WM region. We considered ROIs in the caudate region of the striatum, and in the WM that separates the caudate from the putamen, on 11 contiguous slices of the brain. One transaxial slice showing the ROIs for the CRC computation is shown in Figure 2.19(a), with red and blue representing the caudate and WM ROIs. We computed the noise variance from regions of the brain where we do not expect uptake of tracer, such as the white matter in the posterior of the brain. The noise variance ROIs for one transaxial slice are shown in Figure 2.19(b). The CRC vs. noise plots are shown in Figure 2.20.
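Computing the CRC of Equation 2.30 and the accompanying noise measure reduces to ROI means and variances; the sketch below assumes boolean masks for the caudate, WM, and background ROIs, and takes the noise measure to be the mean variance over the background ROIs (the precise noise definition is not spelled out in the text).

```python
import numpy as np

def crc_and_noise(img, caudate_roi, wm_roi, noise_rois):
    """CRC (Eq. 2.30) and a background noise-variance estimate."""
    crc = img[caudate_roi].mean() / img[wm_roi].mean()   # f_Str / f_WM
    noise = np.mean([img[m].var() for m in noise_rois])  # assumed noise measure
    return crc, noise
```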
Figure 2.17: Coronal slice of PET image reconstructions. Top (L to R): QP (μ = 1.5), MI-Intensity (μ = 75000), JE-Intensity (μ = 1e5). Bottom (L to R): MI-scale (μ = 1e5), JE-scale (μ = 1e5).

These results demonstrate that the information theoretic priors can reconstruct high contrast images that take advantage of anatomical image information in a real world scenario, where the anatomical and functional images are independently acquired and then registered to each other. Additionally, the results indicate that these priors can be used even for tracers that bind non-uniformly within regions of similar intensity in the anatomical image (e.g., GM, WM, and CSF for the brain), as long as the uptake is homogeneous within some clearly delineated anatomical structure that is large enough to form a prominent peak in the joint density.

Figure 2.18: Overlay of the PET reconstruction on the co-registered MR image for the QP (top) and JE-scale (bottom) priors.

Figure 2.19: Regions of interest for computing the CRC vs. noise curves: (a) ROIs in the caudate (red) and in the WM separating it from the putamen (blue), (b) ROIs for computing the noise variance.

Figure 2.20: Contrast recovery coefficient versus noise variance plots as the hyperparameter is varied.

Figure 2.21: Reconstructions at a high value of μ for the MI-scale (left) and JE-scale (right) priors.

The CRC curves show that the information theoretic priors reconstructed images with higher contrast than that of QP. While QP shows decreasing contrast and noise variance as its hyperparameter is increased, the MI and JE based priors show increasing contrast as μ is increased. An increasing μ gives a larger weight to the prior, causing the joint density to be more clustered, which corresponds to a higher contrast. Consistent with our simulation results, the scale-space priors have reduced noise variance compared to the intensity based priors, without appreciable loss in contrast. The JE-scale prior gives the best contrast to noise performance among all the priors considered.

However, we note that the behavior of the information theoretic priors is unpredictable at high values of μ, where the prior dominates the likelihood term. MI tends to produce images that spuriously follow the anatomical regions, whereas JE prefers a piecewise homogeneous image, and may join regions that are close to each other in intensity. This is illustrated in Figure 2.21, which shows the MI-scale and JE-scale reconstructions for high values of μ. This instability makes the choice of hyperparameter especially important for these priors. We have observed that the knee of the L-curve between the log likelihood and log prior terms yields reasonable values of μ, but for clinical use these results should always be verified against the OSEM or MAP with QP results to ensure that no misleading artifacts are produced.

2.3 Application of Information Theoretic Priors to Positron Range Correction

In this section, we apply JE with scale-space features as a prior, in conjunction with the truncated homogeneous system model, for positron range correction. The motivation for using JE-scale priors in this application is to take advantage of anatomical information to correct for the positron range effect where the truncated homogeneous system model is insufficient.
Using these priors can reduce blurring due to positron range outside regions where uptake is expected, by using a co-registered anatomical image that delineates these regions. Thus we aim to correct for positron range such that the PET image not only fits the data, but is also most similar to the anatomical image in the information theoretic sense.

The likelihood p(g|f) in Equation 2.1 is a Poisson likelihood function with mean Pf, where P is the system matrix that models the mapping from the source voxels to the detectors. In [LQ00] a factored system model is used, in which this P matrix incorporates detector blurring, detector sensitivity, and attenuation effects. The positron range effect can also be included in this factored system model through a matrix $P_{range}$, and the complete system model can be written as

P = P_{sens} P_{blur} P_{attn} P_{geom} P_{range},   (2.31)

where $P_{geom}$ represents the geometric projection from source to detector, and $P_{attn}$, $P_{blur}$, $P_{sens}$, and $P_{range}$ model the attenuation, detector blurring, sensitivity, and positron range effects, respectively. We use the truncated homogeneous model of [BRL+03], which can be represented as

P_{range} = T P_{hom} T,   (2.32)

where $P_{hom}$ is a convolution matrix representing the spatially invariant blurring assumed within the subject, and T is a diagonal truncation matrix that truncates at the object boundary. We use the P matrix defined in Equation 2.31 in the likelihood function to obtain the objective function defined in Equation 2.4, with the prior D(X,Y) being the JE-scale prior defined as −H(X,Y), where X and Y are the scale-space features.

2.3.1 Positron Range Simulations

We used a 3D Hoffman brain phantom resampled to size 128×128×63 as our functional image and scaled the three different regions (gray matter, white matter, and CSF) differently to generate our anatomical image. A truncation mask for the truncated homogeneous model was generated from the anatomical image; one transaxial slice of it is shown along with the PET and anatomical images in Figure 2.22.

Figure 2.22: The phantoms used in the simulations to represent the MR image (left), the PET image (middle), and the truncation mask (right). The line in the PET image shows the position of the profile shown in Figure 2.25.

To simulate positron range blurring, we used the range kernel obtained from Monte Carlo simulations of the Cu-60 isotope in water, which has a range of 3.08 mm [LDW02] and is shown in Figure 2.23. The same kernel is used for generating the data and for reconstruction.

Figure 2.23: Positron range kernel for the Cu-60 isotope vs. distance (mm).

The simulations are based on a Concorde R4 scanner, and the sinogram data obtained had 15 counts per line of response. We used 30 PCG iterations to reconstruct all images at a voxel size of 0.4×0.4×1.211 mm. We compare three methods for reconstruction (a sketch of the truncated range operator follows the list):

1. NRC-QP: No range correction and quadratic prior.
2. RC-QP: Range correction using the truncated homogeneous model, and quadratic prior.
3. RC-JE: Range correction using the truncated homogeneous model, and JE-scale prior.
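The action of $P_{range} = T P_{hom} T$ in Equation 2.32 amounts to masking, convolving with the spatially invariant range kernel, and masking again; a minimal sketch, assuming a 3D activity image, a normalized 3D range kernel, and a binary truncation mask:

```python
import numpy as np
from scipy.ndimage import convolve

def apply_truncated_range(f, kernel, mask):
    """Truncated homogeneous range operator P_range = T P_hom T (Eq. 2.32).

    f      : 3D activity image
    kernel : 3D positron range kernel (unit sum)
    mask   : binary truncation mask delineating the object
    """
    # T: zero activity outside the object; P_hom: shift-invariant range blur;
    # T again: truncate the blurred activity at the object boundary.
    return mask * convolve(mask * f, kernel, mode="constant", cval=0.0)
```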
Figure 2.24: Reconstructed images (L to R): no range correction with QP (NRC-QP); range correction with quadratic prior (RC-QP), μ = 0.01; range correction with JE prior (RC-JE), scale σ = 0.5 voxels and μ = 5e4.

The reconstructed images using NRC-QP, RC-QP, and RC-JE with scale σ = 0.5 voxels are shown in Figure 2.24. A profile drawn through the reconstructed images is shown in Figure 2.25.

Figure 2.25: Profile drawn through the true and reconstructed images.

It can be seen that without range correction, the reconstructed image is blurred and the boundaries between GM/WM and WM/CSF are not clearly delineated. The RC-QP images show improvement compared to the NRC-QP reconstruction, but the activity is overestimated close to the object boundary and the internal structures are still blurred. The RC-JE reconstruction is able to take advantage of the anatomical information to provide sharper boundary definition, even for the smaller internal structures in the image. The overall normalized quantitation error between the true and reconstructed images is 0.6467 for NRC-QP, 0.5232 for RC-QP, and 0.1498 for RC-JE. Thus the JE prior based reconstruction yields images with improved boundary definition and lower quantitation error.

2.3.2 Real data results

PET data was acquired on a Focus 220 scanner from two mice that were injected with the Br-76 isotope. This isotope has a positron range of 3.47 mm. A CT image of these mice was acquired on a microCT scanner at a resolution of 0.197×0.197×0.197 mm. We reconstructed a PET image of size 256×256×95 at a resolution of 0.4×0.4×0.8 mm, without range correction and using a quadratic prior. The CT image was then registered to the reconstructed PET image, and used as the anatomical image for the range corrected reconstruction. A truncation mask was obtained from the CT image for the truncated homogeneous range correction. The CT image registered to the PET image is shown in Figure 2.26 along with the truncation mask. The positron range kernel for Br-76 was obtained using Monte Carlo simulations and is shown in Figure 2.27.

Figure 2.26: (a) CT image registered to the reconstructed PET image and (b) truncation mask obtained from the CT image for the truncated homogeneous model of range correction.

Figure 2.27: Positron range kernel for Br-76.

We reconstructed PET images from the data using the NRC-QP, RC-QP, and RC-JE methods. We initialized the MAP algorithm for all three methods with the result of 2 iterations of OSEM with 21 subsets, and ran 50 iterations of MAP. For the RC-JE method we used the JE-scale prior with a scale parameter of 2 pixels. A coronal view of the images reconstructed using these methods is shown in Figure 2.28, along with their overlay on the CT image.

Figure 2.28: Reconstructed images. Top row: coronal slice of the PET images reconstructed using NRC-QP (left), RC-QP (middle), and RC-JE (right). Bottom row: overlay on CT of the images shown in the top row.

It appears from these figures that the range corrected images are less blurred and have better boundary definition than the NRC-QP image. The RC-QP image produces ring artifacts around region boundaries by overestimating the activity within the region and compensating with undershoots around it. The RC-JE prior takes advantage of the available anatomical information to produce apparently better boundary definition, with activity that is contained within anatomical regions, and without the ring artifacts. These results are preliminary, but they are sufficiently encouraging to warrant further study of range correction using the approach described here.
2.4 Conclusions

We described a scale-space based framework for incorporating anatomical information into PET image reconstruction using information theoretic priors. Through idealized simulations, realistic simulations, and clinical data, we evaluated the performance of mutual information and joint entropy as priors in PET reconstruction. We presented an efficient FFT based approach for the computation of these priors and their derivatives, making it practical to use these priors for 3D image reconstruction.

The reconstructed images using the information theoretic priors have higher contrast than those obtained using the quadratic prior, and look realistic, without the mosaic-like appearance that results from anatomical priors that impose piecewise smoothness. Through simulation studies we showed that the marginal entropy term in MI tends to increase the variance of the image estimate. We noted that testing the information theoretic priors on piecewise constant images can give misleadingly accurate results when only intensity is used as a feature, especially for JE, since it prefers piecewise constant images. The more realistic simulations that we performed show that the JE-Intensity prior is prone to isolated hot and cold spots, and varies unpredictably with the hyperparameter. It is therefore important to incorporate spatial correlations into these priors. The scale-space based approach described in this dissertation improved the behavior of the information theoretic priors, and the JE-scale prior gave better quantitation than QP while using only two additional features, the coarser scale image and its Laplacian. The clinical data results were consistent with the simulations, and show that these priors can be used even for tracers that do not have homogeneous uptake in regions corresponding to similar intensities in the anatomical image, provided the activity corresponds to a prominent anatomical region.

It should however be noted that both mutual information and joint entropy priors in PET reconstruction rely on the likelihood term to ensure convergence to an image with the correct intensity.
TheresultsindicatethatusingtheJE-scaleprioralongwithahomogeneoussystemmodel incorporating positron range is able to produce reconstructions with apparently better boundary definition and without the ring artifacts that are produced by the RC-QP method at regions where the system model is not accurate. The positron range modeling used in these results is simple and does not consider the effects of spatially variant range resulting from differences in electron density within the imaging volume. However, it is promising that the JE based prior appears to perform well with experimental data, even with this imperfect range model. 58 Thisframework of information theoretic priorscan also beappliedto other functional imaging modalities where the data is severely ill-conditioned and the use of anatomical informationmayimprovetheaccuracyofreconstruction. Promisingresultswereobtained for diffuse optical tomography using this framework of information theoretic anatomical priors [PSG + 09]. For PET, we expect these priors to be most useful in applications involving quantification of uptake in specific organs, where the uptake is within a region that is clearly discernible in the anatomical image. 2.5 Appendix: Alternative formulation to the JE prior Joint entropy between two random variables X and Y is defined as H(X,Y) = − Z p(x,y)logp(x,y)dxdy. = −E(logp(X,Y)), (2.33) This expectation can be approximated by the sample mean. If the intensities of image f and a are realizations of X and Y respectively, H(X,Y)=− 1 N s Ns X i=1 logp(f i ,a i ) (2.34) If we use Parzen windows to estimate p as described in the chapter, then, H(X,Y) =− 1 N s Ns X i=1 log( Ns X j=1 φ( f i −f j σ x , a i −a j σ y )) (2.35) 59 The derivative of H(X,Y) with respect to f m is then, ∂H(X,Y) ∂f m = − 1 N s Ns X i=1,i6=m φ ′ ( f i −fm σx )φ( a i −am σy ) p(f i ,a i ) − 1 N s Ns X i=1 φ ′ ( fm−f i σx )φ( am−a i σy ) p(f m ,a m ) (2.36) Since we are using a symmetric Gaussian kernel with standard deviation σ x for the functional image, φ ′ ( f i −fm σx ) =φ ′ ( fm−f i σx )=φ( f i −fm σx ) f i −fm σ 2 x , we get ∂H(X,Y) ∂f m = − 1 N s Ns X i=1 ( φ( f i −fm σx )φ( a i −am σy ) p(f i ,a i ) + φ( fm−f i σx )φ( am−a i σx ) p(f m ,a m ) )( f i −f m σ 2 x ). (2.37) If we let α im = φ( f i −fm σx )φ( a i −am σy ) p(f i ,a i ) ,and β im = φ( fm−f i σx )φ( am−a i σy ) p(f m ,a m ) , (2.38) then Equation 2.37 can be rewritten as, ∂H(X,Y) ∂f m = − 1 N s Ns X i=1 (α im +β im )( f i −f m σ 2 x ), = − 1 N s Ns X i=1 W im ( f i −f m σ 2 x ). (2.39) 60 Theweights α im , andβ im are bothbetween 0 and 1, since the denominator in each of thesetermsisobtainedbyasummationofthenumeratoroverallmori. TheweightW im approaches 0 when f i and a i are distant from f m and a m respectively , and approaches 2 when they are close . Hence, W im selects the voxels with intensities that are within a few standard deviations of f m in both the anatomical and functional images, and updatesf m such that the intensities in these voxels are clustered closer together. ThegradientinEquation2.39canbefurtherapproximatedbyconsideringonlyvoxels that are spatially close to the voxel m, i.e., in a neighborhood η m , giving ∂ ˆ H(X,Y) ∂f m =− 1 N s X i∈ηm W im ( f i −f m σ 2 x ). (2.40) This expression is similar to that of the quadratic prior, except that the weight W im selects only those voxels within the neighborhood η m that have intensities close to f m . 
Chapter 3

Mutual Information Based Non-Rigid Mouse Registration Using A Scale-Space Approach

Longitudinal and inter-subject studies are often performed in small animal imaging in order to study changes in mouse anatomy and function over a period of time, or across populations. Changes in animal posture, tissue growth, organ movement, and other anatomical changes during longitudinal studies require non-rigid registration of the acquired images for accurate analysis. Normalization of anatomical variability across populations in inter-subject studies also requires non-rigid registration.

In this chapter, we describe a Gaussian scale-space theory based approach to simultaneously align global structure, such as overall shape, as well as detail, such as the skeleton, in small animals. We extract scale-space feature vectors at each voxel of the target and template images to be registered. Each feature vector is defined as the vector of intensities of the scale-space images at that voxel. Detail such as the skeletal structure is visible at the lower scales and is blurred out at the higher scales, while global structure such as overall shape remains in the images at the higher scales. The scale-space feature vectors are aligned with the goal of finding a common mapping that aligns the skeleton as well as the overall shape in the mouse images. We do not explicitly impose any rigidity constraints near the skeleton. We use mutual information (MI) as a similarity metric between the target and template images, since it measures the similarity between distributions of intensities rather than actual intensity values, and is thus more robust to intensity differences between these images [WVA+96]. This multi-scale approach has the additional advantage of a cost function that is less prone to local minima [KLD04], and hence is able to perform large non-linear deformations accurately.

We parameterize the displacement field using the DCT basis, and use the Laplacian of the field as a regularizing term. The DCT bases are eigenfunctions of the discrete Laplacian, so using the DCT representation of the displacement field in conjunction with Laplacian regularization simplifies the regularization term to a diagonal matrix. Moreover, the DCT basis gives an efficient representation of the displacement field, allowing us to represent the field using only a few coefficients. Fast implementations of the DCT are readily available, and can be used to further reduce the computation time. In [AF99] a similar parametrization is used for non-rigid registration using the SSD metric with Gauss-Newton optimization, which is fast but can be used only for the SSD metric. The Gauss-Newton optimization also restricts the number of bases that can be used in practice; the paper recommends the use of fewer than 8×8×8 coefficients to avoid problems. We use conjugate gradient optimization with an Armijo line search to optimize our cost function, and there is no constraint on the number of bases that can be used.

We apply the scale-space registration algorithm to CT images from a longitudinal mouse study. We compare the results of this scale-space approach with those obtained by a hierarchical multi-scale approach that is commonly used in brain registration, where the images at each scale are aligned hierarchically, starting at the highest scale and ending at the lowest scale.
3.1 Methods and Results

3.1.1 Mutual information based non-rigid registration

Let i_1(x) and i_2(x) represent the intensities of the target and template images, respectively, at position x. Let the transformation that maps the target to the template be T(x) = x − u(x), where u(x) is the displacement field. Let the feature vectors extracted from the target and template images at voxel x be $\mathbf{i}_1(x)$ and $\mathbf{i}_2(x)$ respectively. These can be considered realizations of the random vectors $\mathbf{I}_1$ and $\mathbf{I}_2$. Let N_s be the number of features in each feature vector, such that $\mathbf{I}_1 = [I_1^1, I_1^2, \cdots, I_1^{N_s}]^T$, where $I_1^j$ is the random variable corresponding to feature j of the target image. Then the mutual information between the feature vectors of the target and deformed template, $D_u(\mathbf{I}_1, \mathbf{I}_2)$, is defined as [CT91]

D_u(\mathbf{I}_1, \mathbf{I}_2) = \int p_u(\mathbf{I}_1, \mathbf{I}_2) \log \frac{p_u(\mathbf{I}_1, \mathbf{I}_2)}{p(\mathbf{I}_1)\, p_u(\mathbf{I}_2)}\, d\mathbf{I}_1\, d\mathbf{I}_2.   (3.1)

A differentiable estimate of the density can be obtained using Parzen windows, given by [DHS01]

p_u(\mathbf{I}_1, \mathbf{I}_2) = \sum_x \phi\!\left( \frac{\mathbf{I}_1 - \mathbf{i}_1(x)}{\rho} \right) \phi\!\left( \frac{\mathbf{I}_2 - \mathbf{i}_2(x - u(x))}{\rho} \right),   (3.2)

where φ is a Gaussian window and ρ determines the width of the window.

We use the Laplacian of the displacement field as a regularizing term. The objective function is given by

\max_u \; D_u(\mathbf{I}_1, \mathbf{I}_2) - \mu \left\| \nabla^2 u(x) \right\|^2,   (3.3)

where μ is a hyperparameter that controls the weight on the regularizing term. For simplicity, we assume independence between the features, so that the MI between the feature vectors simplifies to the sum of the MI between the individual features:

D_u(\mathbf{I}_1, \mathbf{I}_2) = \sum_{j=1}^{N_s} D_u(I_1^j, I_2^j).   (3.4)

3.1.2 Scale-space feature vectors

We use a Gaussian scale-space theory based approach to define the feature vectors that are extracted from the images. We generate images at different scales by blurring the target and template with Gaussian kernels of increasing widths. Let the width of the Gaussian kernel at scale j be σ_j, j = 1,2,...,N_s, such that σ_j > σ_k for j > k, and let the smoothing kernel at scale j be represented by S_j(x). We define the scale-space based feature vector as $\mathbf{i}_1(x) = [i_1(x)^1, i_1(x)^2, \cdots, i_1(x)^{N_s}]$, where $i_1(x)^j$ is the intensity of the target image at scale j and position x, given by

i_1(x)^j = S_j(x) * i_1(x).   (3.5)

In this work we use N_s = 2, where the first scale corresponds to no smoothing and σ_2 = 3. A coronal slice of a mouse image at these scales is shown in Figure 3.1.

Figure 3.1: Coronal slice of scale-space images of a mouse for N_s = 2, with scale 1 corresponding to no smoothing (left) and σ_2 = 3 (right).

We generate the feature vector of the deformed template, $\mathbf{i}_2(x - u(x))$, by first applying the displacement field to the original template, and then generating the scale-space images of that deformed image. The feature $i_2(x-u)^j$ is thus given by

i_2(x-u)^j = S_j(x) * (i_2(x-u)).   (3.6)

We observed that this performs better than deforming the original scale-space images of the template. This could be because our approach retains the original relationship between intensities as a function of position in the template image while computing $\mathbf{i}_2(x - u(x))$, and then computes the similarity metric between the target and a linear transformation (smoothing) of the deformed template. In contrast, applying the displacement field to the original scale-space images would mean computing the similarity metric between the target and a new function $(S_j * i_2)(x-u)$ of the template.
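A sketch of the feature construction of Equations 3.5 and 3.6 follows, using scipy's Gaussian filtering and linear interpolation for the warp; the routine names and the choice of order-1 interpolation are our own assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def scale_space_features(img, sigmas=(0.0, 3.0)):
    """Stack of scale-space images (Eq. 3.5); sigma = 0 means no smoothing."""
    return np.stack([gaussian_filter(img, s) if s > 0 else img for s in sigmas])

def deformed_template_features(template, u, sigmas=(0.0, 3.0)):
    """Warp first, then blur (Eq. 3.6): scale-space features of i2(x - u(x)).

    u : displacement field, shape (ndim, *template.shape)
    """
    coords = np.indices(template.shape) - u      # sample the template at x - u(x)
    warped = map_coordinates(template, coords, order=1, mode="nearest")
    return scale_space_features(warped, sigmas)
```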
Though generating the scale-space feature vectors at every iteration increases the computation, we take this approach for better accuracy.

3.1.3 Displacement field and regularization

The displacement field u(x) can be represented in terms of the DCT basis. Let b_i(x) and β_i, i = 1,2,...,N_b, represent the DCT basis vectors and the DCT coefficients respectively. Then the displacement field is given by

u(x) = \sum_{i=1}^{N_b} \beta_i b_i(x) = B(x)\beta,   (3.7)

where $B(x) = [b_1(x), b_2(x), \cdots, b_{N_b}(x)]$ and the coefficient vector $\beta = [\beta_1, \beta_2, \cdots, \beta_{N_b}]^T$. The DCT basis vectors are eigenfunctions of the discrete Laplacian. Let L be the discrete Laplacian matrix and γ_i, i = 1,2,...,N_b, be the eigenvalues of L corresponding to the bases b_i(x). Then

L u(x) = L \sum_{i=1}^{N_b} \beta_i b_i(x) = \sum_{i=1}^{N_b} \gamma_i \beta_i b_i(x).   (3.8)

Hence the discrete approximation of the norm of the Laplacian, $\left\| \nabla^2 u(x) \right\|^2$, is given by

\| L u(x) \|^2 = \sum_{i=1}^{N_b} \gamma_i^2 \beta_i^2.   (3.9)

Thus we save the computational cost of the matrix-vector multiplication required by the left hand side of Equation 3.9, by reducing it to the vector norm on the right hand side. Additionally, since the DCT gives a sparse representation, we can model the displacement field with N_c < N_b coefficients to reduce the computation time further. Finally, the objective function is given by

\max_\beta \; \sum_{j=1}^{N_s} D_{B\beta}(I_1^j, I_2^j) - \mu \sum_{j=1}^{N_c} \gamma_j^2 \beta_j^2.   (3.10)

We optimize this objective function using a gradient descent algorithm with an Armijo line search [Ber99].

3.1.4 Results

We use CT images from two time points of a longitudinal mouse study, acquired two months apart. The CT images were acquired using the microCT system at a resolution of 0.2×0.2×0.2 mm. We first performed an MI based rigid registration of the two images using the RView software [SHH99]. The rigidly registered images are shown in Figure 3.2.

Figure 3.2: Sagittal slice of the target (left) and template (right) mouse images.

It can be seen that there are considerable differences in the skeleton, soft tissue, and overall shape of the images. We downsampled the CT images to size 128×128×64, corresponding to a resolution of 0.4×0.72×0.51 mm, to reduce computation, and used these downsampled images as the target and template images in the non-rigid registration algorithm. We use two scales to form our feature vectors: the unsmoothed image, and the image smoothed by a Gaussian kernel of size 7×7×7 and width σ = 3. We use N_c = 30 DCT bases and μ = 5e−6. We first perform the non-rigid registration at only the higher scale (σ = 3), and use the result to initialize the scale-space non-rigid registration. Registration at only the lower scale (unsmoothed images) gave a local minimum that would not be a good initialization.

We compare the scale-space registration results to those of hierarchical multi-scale registration, using the same number of DCT bases and the same scales as the scale-space registration, with μ = 2e−5. We first registered the smoothed images, and used the resulting displacement field to initialize the registration of the unsmoothed images. The overlays on the target image of the template image, and of the images registered using both approaches, are shown in Figure 3.3. We applied the displacement fields resulting from both registration algorithms to the higher resolution (0.2 mm) images for display purposes. The displacement field obtained from the scale-space registration is shown in Figure 3.4, for the coronal and sagittal slices shown in Figure 3.3.
Figure 3.3: Comparison of registered images: coronal and sagittal views of the overlay on the target image of the template (top row), the hierarchical multi-scale registered image (middle row), and the scale-space registered image (bottom row).

Figure 3.4: Displacement field obtained from the scale-space registration.

It can be seen that the scale-space non-rigid registration algorithm gives better skeletal, limb, and soft tissue (such as liver and heart) alignment than the hierarchical multi-scale algorithm.

3.2 Discussion

We used a scale-space theory based approach for non-rigid mouse registration using mutual information. We used the DCT basis to represent the displacement field efficiently, as well as to simplify the Laplacian regularization term. We generated the scale-space feature vectors at each iteration after applying the displacement field to the original images, since this approach gave better accuracy. We applied this algorithm to CT images of a mouse in a longitudinal study.

By using images at different scales simultaneously, we obtained better alignment of global structure, such as the overall shape of the mouse, as well as improved alignment of detail, such as the skeleton, compared to the hierarchical multi-scale approach. Additionally, using multiple scales simultaneously gives a cost function that is less prone to local minima. Though these results are encouraging, we note that this approach is computationally intensive. We also note that using non-parametric density estimation in conjunction with non-rigid MI based registration might pose problems because of the high dimensionality of the problem coupled with the non-convexity of the cost function. In the next chapter, we explore an alternative formulation for non-rigid registration that deals with the high dimensionality of the combined density and displacement field estimation problems.

Chapter 4

Non-Rigid Registration Using Gaussian Mixture Models

Registration techniques that use mutual information as a similarity metric typically use a non-parametric approach for estimating the unknown distribution of intensities in the images [WVA+96], [DMVS03]-[CDH+06], [Chapter 3]. This framework has some limitations when dealing with the high dimensional non-rigid registration problem. First, mutual information is a non-convex function of the registration parameters, and the registration could converge to an inaccurate local optimum. This problem is exacerbated in the non-rigid case due to the increased dimensionality of the deformation field to be estimated, compared to the rigid registration case, for which this framework has been applied successfully [WVA+96]. Second, the non-parametric Parzen based approach that is typically used in MI based registration estimates the entire joint density from the images, and this joint density is then used to estimate the deformation field. Using these two high dimensional estimation steps in conjunction with each other could result in a registration that is sensitive to small changes in the images, making it more susceptible to local optima. Additionally, the Parzen window method requires an appropriate choice of the Parzen window width, which is usually taken as a design parameter and kept fixed over the entire sample [DHS01]. This has the drawback that for long-tailed distributions the density estimate tends to be noisy at the tails, while increasing the window width to deal with this might lead to oversmoothing the details in the distribution [Sil86].
The former scenario results in a cost function that has more local optima, while the latter could lead to inaccurate registration results. The non-parametric approach also requires a large number of samples to estimate the distribution accurately.

In this chapter, we describe an alternative approach to multi-modality non-rigid registration that uses the log-likelihood of the target image given the deformed template as a similarity metric, where the log-likelihood is modeled using Gaussian mixture models (GMMs). Gaussian distributions are commonly used in image segmentation to represent the distribution of intensities corresponding to a particular tissue type in MR or CT images [SSLS+01]. The joint distribution of two MR or CT images is characterized by localized blobs that can also be modeled as a mixture of Gaussians. Using GMMs reduces the density estimation step to estimating the parameters of the GMM, which consist of the mean, covariance, and weight of each Gaussian. Since the estimated parameters can be different for each Gaussian, the density estimates obtained using this approach are smoother than those of the non-parametric approach, while retaining the detail in the distribution. We choose the number of components in the GMM from the joint histogram of the target and template images, and keep this number constant throughout the registration. This can hence be considered a semi-parametric approach to density estimation. For images that have a few distinct regions of intensity, such as brain or mouse images, the number of parameters to be estimated is small, and the parameters can be estimated robustly from fewer samples than the non-parametric approach requires.

Our approach of using the log-likelihood of the target given the template in conjunction with a GMM can be viewed as a semi-parametric approximation to MI based registration. However, we expect our GMM based approach to perform better than the non-parametric MI based approach because of (1) the reduced overall dimensionality of the registration problem through the parametrization of the density estimation step, and (2) more robust and accurate estimation of the joint density. Additionally, we parameterize the displacement field using the DCT basis, and use the Laplacian of the field as a regularizing term, as described in Chapter 3.

We compare the performance of the conditional likelihood metric with GMM parametrization against that of MI with the non-parametric density estimation approach; we will henceforth refer to these as the GMM-Cond and Parzen-MI approaches respectively. We evaluate these methods using intra-modality brain MR images, as well as inter-subject, inter-modality mouse images. Registration of mice and other small animals is challenging because of the presence of a rigid skeleton within non-rigid soft tissue, and it has been observed that though existing non-rigid registration algorithms have been applied to brain imaging with encouraging results, they do not perform well for small animal images [KHH03]-[LPGD06]. We test the performance of our approach on both brain and mouse images to evaluate its potential in clinical as well as pre-clinical imaging applications.

4.1 Methods and Results

Let the target and template images be I_1 and I_2, and their intensities at position x be i_1(x) and i_2(x) respectively. Let the transformation that maps the template to the target be T(x) = x − u(x), where u is the displacement field.
The deformed template is then represented by $I_2^u$, whose intensities are given by $i_2(x - u(x))$. We define the similarity metric $D_u(I_1, I_2)$ between the target and deformed template as the log likelihood of the target given the deformed template. Assuming that the voxel intensities in I_1 and I_2 are independent, identically distributed random variables with joint density $p(i_1, i_2)$, the similarity metric is given by

D_u(I_1, I_2) = \log p(I_1 | I_2^u) = \sum_x \log \frac{p(i_1(x), i_2(x - u(x)))}{p(i_2(x - u(x)))}.   (4.1)

We assume a Gaussian mixture model for the joint and marginal densities $p(i_1, i_2)$ and $p(i_2)$. Let the number of components of the Gaussian mixture model be K, the mixing proportions be π_k, and $g(i_1, i_2 | m_k, \Sigma_k)$ be a Gaussian with mean m_k and covariance Σ_k, where k = 1,2,...,K. Let the unknown GMM parameters for each component k be represented as $\theta_k = (\pi_k, m_k, \Sigma_k)$, and let $\Theta = [\theta_1, \theta_2, \cdots, \theta_K]$ be the vector of all unknown parameters. Then the joint density is given by

p(i_1, i_2 | \Theta) = \sum_{k=1}^{K} \pi_k\, g(i_1, i_2 | m_k, \Sigma_k),   (4.2)

where $\pi_k > 0$ and $\sum_{k=1}^{K} \pi_k = 1$.

We use the Laplacian of the displacement field as a regularizing term R(u) to penalize deformations that are not smooth. Let L be the discrete Laplacian matrix. The objective function is then given by

\max_{u, \Theta} \; \sum_x \log p(i_1(x) | i_2(x - u(x)), \Theta) - \mu \| L u \|^2,   (4.3)

where μ is a hyperparameter that controls the weight on the regularizing term. To simplify the problem, we replace the combined optimization with respect to the deformation field and GMM parameters with an iterative two step procedure: the GMM parameters are first estimated from the target and deformed template images through maximum likelihood estimation, and the deformation field is then estimated given the estimated GMM parameters. The two step optimization is given by

\hat{\Theta}(\hat{u}_m) = \arg\max_\Theta \sum_x \log p(i_1(x), i_2(x - \hat{u}_m(x)) | \Theta),   (4.4)

\hat{u}_{m+1} = \arg\max_u \sum_x \log p(i_1(x) | i_2(x - u(x)), \hat{\Theta}(\hat{u}_m)) - \mu \| L u \|^2,   (4.5)

where $\hat{u}_m$ represents the estimated deformation field at overall optimization iteration m.

4.1.1 Estimation of parameters of the Gaussian mixture model

The maximum likelihood estimate of the GMM parameters $\hat{\Theta}$ in Equation 4.4 can be obtained by the EM algorithm [MD00]. Let the data sample at voxel j, corresponding to position x_j, be $S_j^u = [i_1(x_j), i_2(x_j - u(x_j))]^T$, where j = 1,2,...,N and N is the number of voxels in each image. The component of the GMM from which S_j arises is taken as the hidden variable in the EM algorithm. The EM update equations are given by

\tau_{jk}^i = \frac{\pi_k^i(u)\, g(S_j^u | m_k^i(u), \Sigma_k^i(u))}{\sum_{h=1}^{K} \pi_h^i(u)\, g(S_j^u | m_h^i(u), \Sigma_h^i(u))},   (4.6)

\pi_k^{i+1}(u) = \frac{1}{N} \sum_{j=1}^{N} \tau_{jk}^i,   (4.7)

m_k^{i+1}(u) = \frac{\sum_{j=1}^{N} \tau_{jk}^i S_j^u}{\sum_{j=1}^{N} \tau_{jk}^i},   (4.8)

\Sigma_k^{i+1}(u) = \frac{\sum_{j=1}^{N} \tau_{jk}^i \left(S_j^u - m_k^{i+1}(u)\right)\left(S_j^u - m_k^{i+1}(u)\right)^T}{\sum_{j=1}^{N} \tau_{jk}^i},   (4.9)

where $\pi_k^i(u)$, $m_k^i(u)$, and $\Sigma_k^i(u)$ are the GMM parameter estimates at EM iteration i and deformation field u.

The objective function to be optimized in Equation 4.4 is a non-convex function of Θ, so a good initial estimate is needed to converge to the correct local optimum. We use the k-nearest neighbors algorithm [DHS01] to identify cluster centers in the joint histogram of the target and template images, and the number of samples that fall into each cluster. The cluster centers and the proportions of samples in each cluster relative to the total number of samples were used as the initializations $m_k^0$ and $\pi_k^0$ respectively, and $\Sigma_k^0$ was assumed to be the identity for all k.
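A compact sketch of the EM updates of Equations 4.6 through 4.9 for the 2D samples $S_j^u$ follows; the small diagonal loading added to each covariance is a numerical safeguard of ours, not part of the update equations.

```python
import numpy as np

def gauss2(S, mu, C):
    """Bivariate Gaussian density evaluated at the rows of S."""
    d = S - mu
    q = np.einsum("ni,ij,nj->n", d, np.linalg.inv(C), d)
    return np.exp(-0.5 * q) / (2 * np.pi * np.sqrt(np.linalg.det(C)))

def gmm_em(S, pi, m, Sigma, n_iter=50):
    """EM for the GMM of Eq. 4.2 (updates of Eqs. 4.6-4.9).

    S: (N, 2) samples; pi: (K,), m: (K, 2), Sigma: (K, 2, 2) initial values.
    """
    N, K = S.shape[0], pi.size
    for _ in range(n_iter):
        # E-step: responsibilities tau_jk (Eq. 4.6)
        g = np.stack([gauss2(S, m[k], Sigma[k]) for k in range(K)], axis=1)
        tau = pi * g
        tau /= tau.sum(axis=1, keepdims=True)
        # M-step: mixing proportions, means, covariances (Eqs. 4.7-4.9)
        Nk = tau.sum(axis=0)
        pi = Nk / N
        m = (tau.T @ S) / Nk[:, None]
        for k in range(K):
            d = S - m[k]
            Sigma[k] = (tau[:, k, None] * d).T @ d / Nk[k] + 1e-6 * np.eye(2)
    return pi, m, Sigma
```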
Figure 4.1 shows an example of a pdf estimated using a GMM. Two brain images were considered, transaxial slices of which are shown in Figures 4.1(a) and 4.1(b). The joint histogram of the intensities of these two images is shown in Figure 4.1(c), and the pdf estimated using the GMM is shown in Figure 4.1(d) with the component means overlaid on it. The number of components was chosen to be K = 6 to match the joint histogram.

[Figure 4.1: Estimation of GMM parameters: (a) target image, (b) template image, (c) joint histogram of intensities of target and template images, (d) joint pdf of target and template images computed using the GMM (component means shown with 'x' marks).]

In the GMM based approach to probability density estimation, the number of parameters to be estimated is K + KD + KD(D+1)/2, where D is the dimensionality of the sample and K is the number of components in the GMM. For registration of two images D = 2, so the number of parameters is 6K, which is small for small values of K. On the other hand, non-parametric Parzen based estimation computes the entire N_bin × N_bin pdf from the samples, where N_bin is the number of bins at which the pdf is computed. We expect to gain computationally, as well as in robustness, from this reduction in the dimensionality of the problem. However, if the joint density does not fit a GMM, the number of mixture components might be large, approaching a Parzen window estimate.

4.1.2 Estimation of displacement field

Let the objective function in equation 4.5 be represented by J_m(u), which can be rewritten as

J_m(u) = D_u(I_1, I_2 | \hat{\Theta}(\hat{u}^m)) - \mu R(u).   (4.10)

We use conjugate gradient (CG) optimization with an Armijo line search [Ber99] to optimize this objective function. Let \hat{u}^{(i)} be the estimate of the deformation field at CG iteration i, the gradient vector be d^{(i)} = \nabla J_m(u)|_{u = \hat{u}^{(i)}}, the conjugate gradient direction vector be b^{(i)}, and the step size computed from the line search be \alpha^{(i)}. The CG update equations are given by

\hat{u}^{(i+1)} = \hat{u}^{(i)} + \alpha^{(i)} b^{(i)}   (4.11)

b^{(i)} = d^{(i)} + \beta^{(i-1)} b^{(i-1)}   (4.12)

\beta^{(i-1)} = \frac{(d^{(i)} - d^{(i-1)})^T b^{(i)}}{(d^{(i-1)})^T b^{(i-1)}}.   (4.13)

The algorithm is initialized with b^{(0)} = d^{(0)}. If the CG algorithm converges in r iterations, the estimated deformation field in equation 4.5 is \hat{u}^{m+1} = \hat{u}^{(r)}.

The gradient vector d is given by

d = \frac{\partial}{\partial u} D_u(I_1, I_2) - \mu \frac{\partial}{\partial u} R(u),   (4.14)

where \partial D_u(I_1, I_2)/\partial u and \partial R(u)/\partial u are the gradients of the similarity metric and the regularizing term respectively. The gradient of the similarity metric with respect to the displacement field at voxel position x_l is given in equation 4.15, where i_1(l) and i_2^u(l) are shorthand notations for i_1(x_l) and i_2(x_l - u(x_l)) respectively, the intensities of the target and deformed template at x_l are represented by S_l^u = [i_1(l), i_2^u(l)]^T, and \nabla i_2^u(l) is the spatial gradient of the deformed template at x_l. The gradient of the regularizing term is given in equation 4.16.

\frac{\partial D_u(i_1, i_2)}{\partial u(x_l)} = \frac{\nabla i_2^u(l)}{p(i_1(l), i_2^u(l))} \sum_{k=1}^{K} \pi_k\, g(S_l^u | \hat{\Theta}(\hat{u}^m))\, [0 \;\; 1]\, \Sigma_k^{-1} \begin{bmatrix} i_1(l) - m_{k,1} \\ i_2^u(l) - m_{k,2} \end{bmatrix} - \frac{\nabla i_2^u(l)}{p(i_2^u(l))} \sum_{k=1}^{K} \pi_k\, g(i_2^u(l) | m_{k,2}, \Sigma_{k,2,2})\, \frac{i_2^u(l) - m_{k,2}}{\Sigma_{k,2,2}},   (4.15)

\frac{\partial}{\partial u} R(u) = 2 L^T L u.   (4.16)
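A sketch of the CG loop with an Armijo backtracking line search is given below. It is an illustration under our own assumptions: J and grad_J stand for the objective and gradient of equations (4.10) and (4.14), the step-size constants are arbitrary choices, and the Polak-Ribiere form of beta is used for concreteness in place of equation (4.13).

```python
import numpy as np

def armijo_step(J, u, b, d, alpha0=1.0, c=1e-4, shrink=0.5, max_tries=20):
    """Backtracking line search for an ascent step along direction b."""
    alpha, J0, slope = alpha0, J(u), np.vdot(d, b)
    for _ in range(max_tries):
        if J(u + alpha * b) >= J0 + c * alpha * slope:
            break
        alpha *= shrink
    return alpha

def conjugate_gradient(J, grad_J, u0, n_iters=50):
    """Maximize J by nonlinear CG; cf. equations (4.11)-(4.13)."""
    u = u0
    d = grad_J(u)
    b = d                                    # b(0) = d(0)
    for _ in range(n_iters):
        alpha = armijo_step(J, u, b, d)
        u = u + alpha * b                    # equation (4.11)
        d_new = grad_J(u)
        # Polak-Ribiere beta, clipped at zero to keep an ascent direction
        beta = max(np.vdot(d_new, d_new - d) / np.vdot(d, d), 0.0)
        b = d_new + beta * b                 # equation (4.12)
        d = d_new
    return u
```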
We use a DCT parametrization of the displacement field as described in Chapter 3. The objective function is finally given by

\max_{\beta} D_{B\beta}(I_1, I_2 | \hat{\Theta}) - \mu \sum_{i=1}^{N_c} \gamma_i^2 \beta_i^2,   (4.17)

where B and \beta represent the DCT bases and coefficients respectively, and \gamma_i, i = 1, 2, ..., N_c, is the eigenvalue of L corresponding to the i-th basis vector. Since the DCT is a linear transformation of u, the gradient with respect to \beta can be obtained in terms of the gradient with respect to u as

\frac{\partial}{\partial \beta} D_{B\beta}(I_1, I_2) = B^T \frac{\partial}{\partial u} D_u(I_1, I_2).   (4.18)

Thus the gradient can be computed using equation 4.15 and then taking its DCT. The gradient of the regularizer is

\frac{\partial}{\partial \beta} R(B\beta) = 2 \Lambda \beta,   (4.19)

where \Lambda is a diagonal matrix with elements [\gamma_1^2, \gamma_2^2, ..., \gamma_{N_c}^2].

4.1.3 Relation to Mutual Information

Let the random variables corresponding to the intensities of I_1 and I_2^u be \zeta_1 and \zeta_2 respectively. Mutual information between \zeta_1 and \zeta_2 is defined as [CT91]

D(\zeta_1, \zeta_2) = \int p(z_1, z_2) \log \frac{p(z_1, z_2)}{p(z_1)\, p(z_2)}\, dz_1\, dz_2 = E\!\left[\log \frac{p(z_1, z_2)}{p(z_1)\, p(z_2)}\right].   (4.20)

MI between two random variables can be interpreted as the reduction in uncertainty of one random variable given the other. Using MI as a similarity metric for registration aims to find a deformation that makes the joint density of the target and deformed template images maximally clustered, implying that the uncertainty of one image given the other is minimized.

An alternative formulation can be obtained by approximating the expectation in equation 4.20 by a sample mean, where the intensity at each voxel in the target and deformed template images constitutes a random sample. Hence we get

\hat{D}(\zeta_1, \zeta_2) = \frac{1}{N} \sum_x \log \frac{p(i_1(x), i_2(x - u(x)))}{p(i_1(x))\, p(i_2(x - u(x)))} = \frac{1}{N} \sum_x \log \frac{p(i_1(x), i_2(x - u(x)))}{p(i_2(x - u(x)))} - \frac{1}{N} \sum_x \log p(i_1(x)).   (4.21)

Since the target is fixed and independent of u(x), dropping the terms containing the marginal density p(i_1) gives the approximate MI based similarity metric

\hat{D}(\zeta_1, \zeta_2) = \frac{1}{N} \sum_x \log \frac{p(i_1(x), i_2(x - u(x)))}{p(i_2(x - u(x)))}.   (4.22)

Thus, computing the deformation field that maximizes mutual information is approximately equivalent to maximizing the conditional density of the target given the template image as defined in equation 4.1. A similar relationship between maximum likelihood and conditional entropy was derived in [RMAP99].

The pdf p(i_1, i_2) is unknown and needs to be estimated. It can be estimated using a non-parametric approach such as Parzen windowing, or a GMM based approach can be taken to parametrize the pdf and estimate those parameters. The approach taken to estimate the pdf causes differences in the behavior of these metrics, and we explore these differences by looking at the gradient equations of the two methods. The Parzen window estimate of a pdf at random variable values z_1, z_2 is defined by [DHS01]

p(z_1, z_2) = \frac{1}{N} \sum_{j=1}^{N} g\!\left(\frac{z_1 - i_1(j)}{\sigma}\right) g\!\left(\frac{z_2 - i_2^u(j)}{\sigma}\right),   (4.23)

where g(z/\sigma) is a Gaussian window of width \sigma, which is usually taken as a design parameter. Note that this can be considered a Gaussian mixture model with the fixed parameters K = N, \pi_k = 1/N, m_k = [i_1(k), i_2^u(k)]^T, and \Sigma_k = \begin{bmatrix} \sigma^2 & 0 \\ 0 & \sigma^2 \end{bmatrix}.
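For concreteness, the Parzen estimate (4.23) and a plug-in evaluation of the MI of equation (4.20) on a discrete grid can be sketched as follows; the grid construction and bin count are our own assumptions, not choices from the dissertation.

```python
import numpy as np

def parzen_joint_pdf(i1, i2, n_bins=64, sigma=2.0):
    """Parzen window estimate of p(z1, z2), equation (4.23).

    i1, i2 : 1-D arrays of paired intensity samples
    Returns the estimated pdf on an n_bins x n_bins grid,
    normalized to sum to one over the grid.
    """
    z1 = np.linspace(i1.min(), i1.max(), n_bins)
    z2 = np.linspace(i2.min(), i2.max(), n_bins)
    # Separable Gaussian kernels centered at each sample.
    g1 = np.exp(-0.5 * ((z1[:, None] - i1[None, :]) / sigma) ** 2)
    g2 = np.exp(-0.5 * ((z2[:, None] - i2[None, :]) / sigma) ** 2)
    pdf = g1 @ g2.T
    return pdf / pdf.sum()

def mutual_information(pdf):
    """Plug-in MI from a discretized joint pdf; cf. equation (4.20)."""
    p1 = pdf.sum(axis=1, keepdims=True)     # marginal of z1
    p2 = pdf.sum(axis=0, keepdims=True)     # marginal of z2
    nz = pdf > 0
    return float(np.sum(pdf[nz] * np.log(pdf[nz] / (p1 * p2)[nz])))
```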
The gradient of the Parzen based MI cost function with respect to the displacement field at voxel l is then given by

\frac{\partial D(\zeta_1, \zeta_2)}{\partial u(l)} = -\frac{1}{N} \sum_{z_1, z_2} (1 + \log p(z_1, z_2))\, g\!\left(\frac{z_1 - i_1(l)}{\sigma}\right) g'\!\left(\frac{z_2 - i_2^u(l)}{\sigma}\right) \nabla i_2^u(l) + \frac{1}{N} \sum_{z_2} (1 + \log p(z_2))\, g'(z_2 - i_2^u(l))\, \nabla i_2^u(l),   (4.24)

where g'(z_2 - i_2^u(l)) = g\!\left(\frac{z_2 - i_2^u(l)}{\sigma}\right) \frac{z_2 - i_2^u(l)}{\sigma^2}. We can rewrite the gradient equation for the GMM-Cond approach in equation 4.15 as

\frac{\partial D_u(i_1, i_2)}{\partial u(x_l)} = \frac{\nabla i_2^u(l)}{p(i_1(l), i_2^u(l))} \sum_{k=1}^{K} \pi_k\, g'(S_l^u | \hat{\Theta}(\hat{u}^m)) - \frac{\nabla i_2^u(l)}{p(i_2^u(l))} \sum_{k=1}^{K} \pi_k\, g'(i_2^u(l) | m_{k,2}, \Sigma_{k,2,2}),

where g'(S_l^u | \hat{\Theta}(\hat{u}^m)) = g(S_l^u | \hat{\Theta}(\hat{u}^m))\, [0 \;\; 1]\, \Sigma_k^{-1} \begin{bmatrix} i_1(l) - m_{k,1} \\ i_2^u(l) - m_{k,2} \end{bmatrix}.

Thus, in both the Parzen-MI and the GMM-Cond approaches, the gradient expressions have the derivative of a Gaussian weighting the summand. In the Parzen-MI case, the Gaussian being differentiated has a mean given by the data sample at which it is centered, and a fixed variance over the entire sample. In the GMM case, the Gaussian has a mean and covariance given by the estimated GMM component mean and covariance. The Gaussian derivative has a large magnitude for values that are close to, but not equal to, the mean, and is low everywhere else. A plot of g'(z_2) for a Gaussian window of width \sigma = 2 is shown in Figure 4.2.

[Figure 4.2: (a) Gaussian window of width \sigma = 2, and (b) its derivative.]

Thus, for MI, the gradient selects the voxels that are close in intensity (within a few standard deviations (SDs) of the Parzen window width) to a pdf bin value and, in order to increase MI, deforms the image such that these voxel intensities get closer to that bin. The GMM based approach, on the other hand, selects voxels that are within a few SDs of an estimated GMM mean, and tries to move those voxel intensities closer to the mean. The GMM approach can thus be more robust to local minima, since the Parzen approach might produce deformations that have regions of misalignment because a large cluster in the joint density is split into multiple localized peaks around certain bin values. We therefore expect the GMM-Cond approach to have the following two advantages over the Parzen-MI approach (an end-to-end sketch of the resulting algorithm follows the list).

1. The density estimation is reduced to that of estimating 6K GMM parameters, which can be estimated robustly from the given images for small K.

2. Estimation of the displacement field may be more robust to local minima, since the gradient tries to form localized peaks around the estimated GMM means in the joint pdf, rather than around the closest pdf bin.
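Putting the pieces together, the GMM-Cond alternation of equations (4.4) and (4.5) can be summarized by the sketch below. It reuses the illustrative functions from the earlier sketches, and init_gmm_from_histogram and make_objective are hypothetical placeholders: the former stands for the histogram-based initialization of Section 4.1.1, the latter for constructing J and grad_J from equations (4.10) and (4.14)-(4.16).

```python
import numpy as np

def gmm_cond_register(target, template, u0, K, n_outer=5, n_cg=50):
    """Alternating optimization of equations (4.4) and (4.5):
    re-fit the GMM on the current alignment, then update the field by CG.
    """
    u = u0
    for _ in range(n_outer):
        warped = warp_image(template, u)
        S = np.column_stack([target.ravel(), warped.ravel()])
        # Hypothetical helper: cluster centers of the joint histogram.
        pi, m, Sigma = init_gmm_from_histogram(target, warped, K)
        pi, m, Sigma = em_gmm(S, pi, m, Sigma)              # equation (4.4)
        # Hypothetical helper: objective and gradient of equation (4.5).
        J, grad_J = make_objective(target, template, pi, m, Sigma)
        u = conjugate_gradient(J, grad_J, u, n_iters=n_cg)  # equation (4.5)
    return u
```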
4.1.4 Results

We perform validation studies of our method using (1) brain images and (2) mouse images. Through these studies, we evaluate the performance of our approach in brain imaging as well as small animal imaging. Brain images usually consist of three distinct regions of intensity: gray matter, white matter, and cerebrospinal fluid. Thus the distribution of intensities of these images can be modeled using a GMM with a small number of components, making the GMM-Cond approach feasible. Similarly, mouse CT images typically consist mainly of soft tissue versus bone contrast, and can be assumed to follow a GMM. Though mouse MR images have a larger number of intensity levels than CT, the number of components required in the GMM is not prohibitively large.

4.1.5 Brain image registration

We randomly chose 9 brains from the Internet Brain Segmentation Repository (IBSR), provided by the Center for Morphometric Analysis at Massachusetts General Hospital and available at http://www.cma.mgh.harvard.edu/ibsr/. This dataset contains T1 weighted MR images along with their manual segmentations. We skull-stripped the brains using the available manual segmentations, and performed histogram equalization on their intensities to enhance contrast. Each brain from this dataset is used as an atlas that is registered to the rest of the brains, giving 72 registrations in total. We first rigidly registered each atlas to each target using AIR registration [WGW+98a], [WGW+98b], and then performed non-rigid registration using the GMM-Cond, Parzen-MI, and SSD methods. Though SSD is not a multi-modality metric like conditional likelihood and MI, it is widely used for intra-modality registration and hence we include it in our evaluation. For all three methods, we used 30×30×30 DCT bases and conjugate gradient optimization. Additionally, for all three methods, we check the determinant of the Jacobian of the deformation field for negative values, and choose a value of \mu weighting the regularization term such that the Jacobian determinant has no negative values (a sketch of this check is given at the end of this subsection). This ensures a smooth deformation field for all methods considered.

For the Parzen-MI registration we followed a hierarchical approach, first aligning the images smoothed with a Gaussian of width 3 voxels, and using the displacement field thus obtained to initialize the registration of the original images. We observed that directly aligning the original images causes the algorithm to reach an inaccurate local minimum. A Parzen window width of \sigma = 2 was used to compute the distribution at every iteration. For the GMM-Cond approach, we used K = 6 components in the GMM, and 5 overall iterations between the density estimation and the deformation field estimation. Each displacement field estimation in the GMM-Cond approach involved 50 iterations of the conjugate gradient algorithm.

The registration results for one atlas-to-target registration are shown in Figure 4.3. The corresponding displacement field obtained from GMM-Cond registration is shown in Figure 4.4. The joint density estimated using the Parzen window and GMM methods before and after registration of the brains is shown in Figure 4.5. It can be seen that the GMM-Cond approach is able to cluster the intensities more tightly along the diagonal than the Parzen-MI approach, implying a better alignment.

[Figure 4.3: Registration results for one dataset: (a) target, (b) atlas, (c) SSD registered image, (d) Parzen-MI registered image, and (e) GMM-Cond registered image.]

[Figure 4.4: Displacement field obtained from GMM-Cond registration corresponding to the slice shown in Figure 4.3.]

[Figure 4.5: Comparison of Parzen-MI vs GMM-Cond based methods: log of the estimated initial joint density between target and template images using (a) the Parzen window method and (b) the GMM method ('x' marks denote the GMM means), and log of the estimated joint density between target and registered template using (c) the Parzen window method and (d) the GMM method.]
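The Jacobian determinant check used above to select \mu can be sketched as follows. This finite-difference version is our own construction; it assumes the displacement field is sampled on the voxel grid in units of voxels.

```python
import numpy as np

def has_folding(u):
    """Check det(J) of T(x) = x - u(x) for non-positive values.

    u : displacement field of shape (3, X, Y, Z)
    Returns True if the deformation folds anywhere.
    """
    # Finite-difference Jacobian of T: J = I - du/dx
    grads = np.stack([np.stack(np.gradient(u[d]), axis=0) for d in range(3)])
    J = np.eye(3)[:, :, None, None, None] - grads       # (3, 3, X, Y, Z)
    detJ = np.linalg.det(np.moveaxis(J, (0, 1), (-2, -1)))
    return bool((detJ <= 0).any())
```

In this scheme \mu is increased until has_folding returns False for the estimated field.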
The registration accuracy is evaluated based on the following metrics:

1. Dice coefficients: The Dice coefficient is a measure of the amount of overlap between sets A and B, and is defined as

d = \frac{2\,|A \cap B|}{|A| + |B|},   (4.25)

where |A| denotes the number of elements in the set A. We use Dice coefficients to evaluate registration accuracy in terms of the ability to align various segmented regions of the brain (a sketch of this computation is given at the end of this subsection). The deformation field obtained from each atlas-target registration is applied to the labels of the atlas to obtain a segmentation of the target image. The Dice coefficients between the manual segmentation of the target image and that obtained by atlas based segmentation are computed. The mean Dice coefficients over all 72 atlas-target registrations are shown for various brain regions in Figure 4.6.

2. Normalized mean squared difference between intensities: The images used are all T1 weighted MR images, and computing the squared difference between the target and deformed atlas image intensities gives a measure of registration accuracy. Since SSD optimizes this metric, we expect that approach to have the best performance on it; for the Parzen-MI and GMM-Cond methods, however, it is an unbiased measure of registration accuracy. The MSE values are shown in a box plot for the SSD, Parzen-MI, and GMM based methods in Figure 4.7.

3. Qualitative comparison: The deformed atlases registered to each image are averaged to produce an image that looks more blurred if the registration is inaccurate. The images obtained by averaging the atlases deformed to match one target brain are shown in Figure 4.8 for the MI and GMM based methods.

[Figure 4.6: Mean Dice coefficients for different brain regions (Caud: caudate, GM: cerebral gray matter, WM: cerebral white matter, Put: putamen, Tha: thalamus, HC: hippocampus, CBGM: cerebellar gray matter, CBWM: cerebellar white matter).]

[Figure 4.7: Mean squared error between intensities of target and deformed atlas images for the SSD, MI, and GMM methods.]

From these figures, it can be seen that the GMM-Cond method has higher mean Dice coefficients than the Parzen-MI and SSD methods for all brain regions considered, and lower MSE between intensities than the Parzen-MI approach. The average images using the GMM based approach look less blurred than those of the Parzen-MI approach, implying that the former aligns the atlas to the target images more consistently. Thus, the GMM based approach shows higher registration accuracy than the Parzen-MI method in terms of the metrics considered.
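The Dice computation of equation (4.25), used in the first metric above, reduces to a few lines for label volumes; the function below is our own sketch.

```python
import numpy as np

def dice_coefficient(labels_a, labels_b, label):
    """Dice overlap of one labeled region, equation (4.25)."""
    a = labels_a == label
    b = labels_b == label
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())
```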
4.1.6 Mouse image registration

Mouse image registration is a challenging problem because of the presence of rigid structures such as the skeleton within non-rigid soft tissue. We apply our GMM based method to multi-modality, inter-subject registration of mouse images. We consider two mice that were imaged using both MR and CT (referred to as MR1 and CT1, MR2 and CT2) and two other mice that were imaged using only CT (referred to as CT3 and CT4). The MR images were obtained on a Biospec 7T system at a resolution of 0.23×0.17×0.50 mm. The CT images corresponding to the MR were acquired on a Siemens Inveon system, and the others were obtained from a microCT system, at a resolution of 0.2×0.2×0.2 mm. We first perform a 12 parameter affine registration of the CT images to the MR image using the AIR software [WGW+98a], [WGW+98b]. We downsampled the MR and affinely registered CT images to size 128×128×64 to reduce computation.

[Figure 4.8: Transaxial and sagittal views of average images. Top row: template image to which the atlases are registered. Middle row: average of deformed atlases using the Parzen-MI approach. Bottom row: average of deformed atlases using the GMM based approach.]

The downsampled MR and affinely registered CT images were then used as the target and template images respectively for non-rigid registration (corresponding to 6 inter-subject multi-modality registrations). We compare the GMM-Cond and Parzen-MI approaches; SSD is not a suitable metric for multi-modality registration, and hence we do not use it for this comparison.

For both the GMM-Cond and Parzen-MI methods, we used 15×15×15 DCT bases to represent the displacement field. As before, we check the displacement field for foldings (negative determinant of the Jacobian) and use a value of \mu that gives a smooth field without foldings. We used a hierarchical approach for the Parzen-MI registration, with images smoothed by a Gaussian of 3 voxels as the first step, followed by registration of the original images. A Parzen window width of \sigma = 5 was used to compute the distribution at every iteration. For the GMM-Cond approach, we used K = 8 components in the GMM, and 5 overall iterations between the density estimation and the deformation field estimation. Each displacement field estimation involved 50 iterations of the conjugate gradient algorithm.

Coronal and sagittal views of the registered images for one dataset, along with the target and template images, are shown in Figure 4.9. The template and registered images for the same dataset are overlaid on the target MR in Figure 4.10.

[Figure 4.9: Multi-modality inter-subject registration: coronal (top row) and sagittal (bottom row) views of target, template, Parzen-MI registered image, and GMM-Cond registered image (left to right).]

[Figure 4.10: Overlay on the target image of the template (left), Parzen-MI registered image (middle), and GMM-Cond registered image (right). The target image is shown in red and the deformed template in green.]

We quantify the performance of the registration through three measures:

1. Overall overlap: The target and template images can be segmented into mouse and background regions. The overall overlap of the target and deformed template can then be measured by computing the Dice coefficient of the region labeled mouse in the two images.

2. Overlap of skeleton: The skeleton can be segmented out in the target and template images by thresholding. The Dice coefficient of the skeleton in the target and deformed template images gives a measure of overlap of the skeleton.

3. Mean squared difference (MSD) between intensities: The target MR image has a corresponding CT image acquired with it. The normalized mean squared difference between the intensities of the CT corresponding to the target and of the deformed template gives a measure of registration accuracy (a sketch of this measure follows the list).
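A sketch of the normalized MSD measure is given below. The dissertation does not spell out the normalizer, so normalizing by the target's intensity energy is an assumption of this sketch.

```python
import numpy as np

def normalized_msd(target, deformed):
    """Normalized mean squared difference between intensities.

    Normalization by sum(target**2) is an illustrative assumption;
    the exact normalizer used in the dissertation is not stated.
    """
    t = target.astype(float)
    d = deformed.astype(float)
    return ((t - d) ** 2).sum() / (t ** 2).sum()
```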
The values of these measures for the four CT images registered to the MR images are given in Table 4.1. On average, the GMM-Cond method has higher Dice coefficients for the skeleton as well as the overall shape, and lower normalized MSD between intensities, than the Parzen-MI registration method, indicating better alignment. It is promising that GMM-Cond shows improved performance for the inter-subject, multi-modality registrations considered, since these images have considerable differences in intensities, overall shape, and skeletal structure.

Table 4.1: Quantitative measures of overlap

Dice coefficients for overall overlap
  Registration   Affine    Parzen-MI   GMM-Cond
  CT1-MR2        0.8953    0.9048      0.9584
  CT3-MR2        0.8574    0.8917      0.9169
  CT4-MR2        0.8253    0.8271      0.9386
  CT2-MR1        0.8143    0.8980      0.8939
  CT3-MR1        0.8137    0.8930      0.8833
  CT4-MR1        0.8185    0.8271      0.8994
  Average        0.8374    0.8892      0.9151

Dice coefficients for overlap of skeleton
  CT1-MR2        0.3249    0.4051      0.3934
  CT3-MR2        0.3045    0.3421      0.3520
  CT4-MR2        0.2419    0.3498      0.3473
  CT2-MR1        0.1955    0.2831      0.3164
  CT3-MR1        0.1729    0.2372      0.2866
  CT4-MR1        0.1576    0.2338      0.3470
  Average        0.2383    0.3085      0.3404

Mean squared difference between intensities
  CT1-MR2        0.5348    0.4212      0.3422
  CT3-MR2        0.5618    0.5496      0.4213
  CT4-MR2        0.4384    0.5735      0.4201
  CT2-MR1        0.6101    0.5233      0.4400
  CT3-MR1        0.6043    0.5383      0.6518
  CT4-MR1        0.6087    0.4107      0.3515
  Average        0.5596    0.5027      0.4378

4.2 Discussion

We used the conditional density of the target given the deformed template as a similarity metric for non-rigid registration, wherein the conditional density is modeled using a Gaussian mixture model. A DCT representation of the deformation field was used in conjunction with a Laplacian regularizing term to reduce computation. We compared the performance of our approach with that of the Parzen-MI based approach using clinical and pre-clinical images: intra-modality brain MR images as well as multi-modality MR/CT mouse images.

The GMM-Cond approach showed higher registration accuracy than the Parzen-MI approach in terms of Dice coefficients and mean squared difference between the intensities of the target and registered images. The GMM parametrization is not only computationally more efficient than the Parzen method, but also improves performance by reducing the overall dimensionality of the estimation problem and through more robust and accurate density estimation. Additionally, the only design parameter that needs to be chosen is the number of clusters in the GMM, which can be obtained from the initial joint histogram.

The consistent performance of the GMM-Cond method for brain as well as mouse images, and for intra- as well as multi-modality images, indicates that this is a robust approach that can potentially be applied to multi-modality registration applications. It can be used as an alternative to MI based registration when dealing with high dimensional deformation fields, and can be viewed as a general framework that can be used in conjunction with other models for the deformation field. It should be noted, however, that if the joint density of the images does not follow a GMM, a large number of clusters would be required to fit the data, increasing the number of parameters to be estimated; in that case the approach might not perform better than Parzen-MI. We expect this approach to be useful in applications where the images have a few distinct regions of intensity, such as brain or mouse CT images.

Chapter 5

Summary and Future Work

We explored the application of information theoretic measures to PET image reconstruction and non-rigid registration. For PET imaging, we described an approach that uses mutual information and joint entropy as anatomical priors on the PET image. We used a scale-space framework to incorporate spatial information into these global distribution based metrics. The novelty of this approach to using anatomical information in PET reconstruction lies in aiming to reconstruct a PET image that is similar to the anatomical image in the information theoretic sense, rather than in defining explicit correlations between the boundaries or regions in the images.
We presented an efficient FFT based method to compute these priors and their derivatives, making it practical to use them for 3D image reconstruction. The simulation and clinical data results indicate that joint entropy with scale-space information reconstructs images with more accurate quantitation and sharper boundaries than the commonly used quadratic priors, which do not use anatomical information. We applied these anatomical priors to positron range correction, which is a fundamental resolution limiting problem in PET. A truncated homogeneous model was used to incorporate range blurring into the system model, and the joint entropy prior with scale-space features was used to correct for blurring outside regions where uptake is expected. Promising results were obtained from simulations as well as pre-clinical data, indicating that joint entropy based priors can be used in conjunction with a system model that incorporates positron range to improve the resolution of PET even when long range isotopes are used.

We described two approaches to non-rigid registration that can be applied to small animal images. First, we applied the mutual information based scale-space framework to non-rigid registration of mouse images. The results indicate that the scale-space based registration gives better skeletal as well as soft tissue alignment than a hierarchical multi-scale approach. We used a non-parametric approach to estimating the distribution of intensities that defines the mutual information cost function, to avoid making assumptions about the form of the distribution. However, using this approach in conjunction with estimation of the non-rigid deformation field increases the dimensionality of the problem. We therefore described an alternative approach that uses the log-likelihood of the target given the deformed template, wherein the likelihood is described by a Gaussian mixture model. This reduces the density estimation step to that of estimating the GMM parameters, thus reducing the overall dimensionality of the problem for images that have a few regions of distinct intensities. We evaluated the GMM approach using intra-modality brain MR images and multi-modality mouse MR/CT images. The results indicate that this GMM based approach is robust and yields more accurate registration than the non-parametric MI based approach. The GMM based approach can therefore potentially be applied to clinical as well as preclinical imaging.

In summary, we find that the information theoretic measures considered in this dissertation have potential for application in multi-modality imaging studies. Since there is ongoing research in constructing multi-modality imaging systems, the approaches described in this dissertation are relevant and could be used as a general framework for reconstruction and registration utilizing the available multi-modality information.

5.1 Future Work

The work presented in this dissertation can be extended to various problems in reconstruction and registration. A few of the possible extensions are:

1. Atlas based mouse segmentation: Mouse atlases such as the Digimouse described in [DSCL07] have a mouse CT or MR image with the various organs labeled by an expert. Automatic segmentation of a target image can be performed by registering the atlas to it and applying the obtained deformation field to the labels, thus obtaining the target image segmentation. This requires non-rigid registration, possibly of images from different modalities.
The GMM based non-rigid registration described in this dissertation can be used to perform this registration of different subjects to the atlas for segmentation.

2. Positron range: The positron range model used in this work assumes that the range blurring is spatially invariant, which is not accurate, especially at organ boundaries. Though the JE based approach is able to improve the resolution of the reconstructed image even with this assumption, using a more accurate range model in conjunction with the JE prior might improve the results further.

3. Application of information theoretic anatomical priors to diffuse optical tomography (DOT): We obtained some preliminary phantom results for the application of the information theoretic anatomical priors described in this dissertation to the ill-conditioned problem of diffuse optical tomography [PSG+09]. This application can be explored further to improve the resolution of DOT.

References

[ABH+96] B. A. Ardekani, M. Braun, B. F. Hutton, I. Kanno, and H. Iida. Minimum cross-entropy reconstruction of PET images using prior anatomical information. Physics in Medicine and Biology, 41(11):2497–2517, 1996.

[AF99] J. Ashburner and K. J. Friston. Nonlinear spatial normalization using basis functions. Human Brain Mapping, 7(4):254–266, 1999.

[AM08] A. Alessio and L. MacDonald. Spatially variant positron range modeling derived from CT for PET image reconstruction. In Nuclear Science Symposium Conference Record, 2008 (NSS '08), IEEE, pages 3637–3640, Oct. 2008.

[Ber99] D. Bertsekas. Nonlinear Programming. Athena Scientific, 1999.

[BJT+96] J. E. Bowsher, V. E. Johnson, T. G. Turkington, R. J. Jaszczak, and C. E. Floyd Jr. Bayesian reconstruction and use of anatomical a priori information for emission tomography. IEEE Trans. Med. Imag., 15(5):673–686, 1996.

[BLSL05] B. Bai, R. Laforest, A. M. Smith, and R. M. Leahy. Evaluation of MAP image reconstruction with positron range modeling for 3D PET. Nuclear Science Symposium Conference Record, 2005 IEEE, 5:2686–2689, Oct. 2005.

[BNP+04] K. Baete, J. Nuyts, W. V. Paesschen, P. Suetens, and P. Dupont. Anatomical-based FDG-PET reconstruction for the detection of hypo-metabolic regions in epilepsy. Medical Imaging, IEEE Transactions on, 23(4):510–519, April 2004.

[BRL+03] B. Bai, A. Ruangma, R. Laforest, Y.-C. Tai, and R. M. Leahy. Positron range modeling for statistical PET image reconstruction. IEEE Nucl. Sci. Symposium and Med. Imag. Conf., pages 2501–2505, 2003.

[BYH+04] J. E. Bowsher, H. Yuan, L. W. Hedlund, T. G. Turkington, W. C. Kurylo, C. T. Wheeler, G. P. Cofer, M. W. Dewhirst, and G. A. Johnson. Utilizing MRI information to estimate F18-FDG distributions in rat flank tumors. Conf. Record: IEEE Nucl. Sci. Symp. and Med. Imag. Conf., 4:2488–92, Oct 2004.

[CDH+06] M. Chiang, R. A. Dutton, K. M. Hayashi, A. W. Toga, O. L. Lopez, H. J. Aizenstein, J. T. Becker, and P. M. Thompson. Fluid registration of medical images using Jensen-Renyi divergence reveals 3D profile of brain atrophy in HIV/AIDS. In Biomedical Imaging: Nano to Macro, 2006, 3rd IEEE International Symposium on, pages 193–196, April 2006.

[Che06] S. R. Cherry. Multimodality in vivo imaging systems: Twice the power or double the trouble? Annual Review of Biomedical Engineering, 8(1):35–62, 2006. PMID: 16834551.

[CT91] T. Cover and J. Thomas. Elements of Information Theory. Wiley Series in Telecommunications, 1991.

[DHS01] R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification. Wiley, second edition, 2001.
[DMVS03] E. D'Agostino, F. Maes, D. Vandermeulen, and P. Suetens. A viscous fluid model for multimodal non-rigid image registration using mutual information. Medical Image Analysis, 7(4):565–575, 2003. Medical Image Computing and Computer Assisted Intervention.

[DSCL07] B. Dogdas, D. Stout, A. F. Chatziioannou, and R. M. Leahy. Digimouse: a 3D whole body mouse atlas from CT and cryosection data. Physics in Medicine and Biology, 52(3):577–587, 2007.

[GLRZ93] G. Gindi, M. Lee, A. Rangarajan, and I. G. Zubal. Bayesian reconstruction of functional images using anatomical information as priors. IEEE Trans. Med. Imag., 12(4), Dec 1993.

[HBHH01] D. L. G. Hill, P. G. Batchelor, M. Holden, and D. J. Hawkes. Medical image registration. Phys. Med. Biol., 46(4):R1–R45, March 2001.

[HDU90] S. F. Haber, S. E. Derenzo, and D. Uber. Application of mathematical removal of positron range blurring in positron emission tomography. Nuclear Science, IEEE Transactions on, 37(3):1293–1299, Jun 1990.

[KHH03] N. Kovacevic, G. Hamarneh, and M. Henkelman. Anatomically guided registration of whole body mouse MR images. In MICCAI, pages 870–877, 2003.

[KLD04] W. R. Crum, L. D. Griffin, and D. L. G. Hill. Non-rigid image registration: Theory and practice. Br. Journ. Radiol., 1(77):S140–S153, 2004.

[LDW02] R. Laforest, D. J. Rowland, and M. J. Welch. MicroPET imaging with non-conventional isotopes. IEEE Trans. Nucl. Sci., pages 2119–2126, 2002.

[LEG98] M. E. Leventon and W. E. L. Grimson. Multi-modal volume registration using joint intensity distributions. In Proc. MICCAI'98, pages 1057–1066. Springer, 1998.

[Lew08] T. K. Lewellen. Recent developments in PET detector technology. Physics in Medicine and Biology, 53(17):R287–R317, 2008.

[LH99] C. S. Levin and E. J. Hoffman. Calculation of positron range and its effect on the fundamental limit of positron emission tomography system spatial resolution. Physics in Medicine and Biology, 44(3):781–799, March 1999.

[LHK+97] B. Lipinski, H. Herzog, E. R. Kops, W. Oberschelp, and H. W. Muller-Gartner. Expectation maximization reconstruction of positron emission tomography images using anatomical magnetic resonance information. Medical Imaging, IEEE Transactions on, 16(2):129–136, April 1997.

[Lin94] T. Lindeberg. Scale-Space Theory in Computer Vision. Kluwer Academic, Netherlands, 1994.

[LPGD06] X. Li, T. E. Peterson, J. C. Gore, and B. M. Dawant. Automatic inter-subject registration of whole body images. Lecture Notes in Computer Science, 4057:18–25, 2006.

[LQ00] R. M. Leahy and J. Qi. Statistical approaches in quantitative positron emission tomography. Statistics and Computing, 10(2):147–165, 2000.

[LY91] R. Leahy and X. Yan. Incorporation of MR data for improved functional imaging with PET. Inf. Proc. in Med. Imaging, pages 105–120, 1991.

[MD00] G. McLachlan and D. Peel. Finite Mixture Models. Wiley, 2000.

[Nuy07] J. Nuyts. The use of mutual information and joint entropy for anatomical priors in emission tomography. Conf. Rec: IEEE Nucl. Sci. Symp. and Med. Imag. Conf., pages 4148–54, 2007.

[OWJ+94] X. Ouyang, W. H. Wong, V. E. Johnson, X. Hu, and C. Chen. Incorporation of correlated structural images in PET image reconstruction. Medical Imaging, IEEE Transactions on, 13(4):627–640, Dec 1994.

[PDD+05] X. Papademetris, D. P. Dione, L. W. Dobrucki, L. H. Staib, and A. J. Sinusas. Articulated rigid registration for serial lower-limb mouse imaging. In MICCAI (2), pages 919–926, 2005.

[PSG+09] C. Panagiotou, S. Somayajula, A. P. Gibson, M. Schweiger, R. M. Leahy, and S. R. Arridge. Information theoretic regularization in diffuse optical tomography. Journal of the Optical Society of America, 26:1277–1290, May 2009.
[RBL+06] A. Ruangma, B. Bai, J. S. Lewis, X. Sun, M. J. Welch, R. M. Leahy, and R. Laforest. Three-dimensional maximum a posteriori (MAP) imaging with radiopharmaceuticals labeled with three Cu radionuclides. Nuclear Medicine and Biology, 33(2):217–226, 2006.

[RHG00] A. Rangarajan, I. T. Hsiao, and G. Gindi. A Bayesian joint mixture framework for the integration of anatomical information in functional image reconstruction. Journ. of Math. Imag. and Vision, 12:199–217, 2000.

[RMAP99] A. Roche, G. Malandain, N. Ayache, and S. Prima. Towards a better comprehension of similarity measures used in medical image registration. In MICCAI, pages 555–566. Springer-Verlag, 1999.

[RSH+99] D. Rueckert, L. I. Sonoda, C. Hayes, D. L. Hill, M. O. Leach, and D. J. Hawkes. Nonrigid registration using free-form deformations: application to breast MR images. IEEE Trans. Med. Imaging, 18(8):712–721, August 1999.

[SAL05] S. Somayajula, E. Asma, and R. M. Leahy. PET image reconstruction using anatomical information through mutual information based priors. Conf. Rec: IEEE Nucl. Sci. Symp. and Med. Imag. Conf., pages 2722–26, 2005.

[SC97] S. Sastry and R. E. Carson. Multimodality Bayesian algorithm for image reconstruction in positron emission tomography: a tissue composition model. Medical Imaging, IEEE Transactions on, 16(6):750–761, Dec. 1997.

[SHH99] C. Studholme, D. L. G. Hill, and D. J. Hawkes. An overlap invariant entropy measure of 3D medical image alignment. Pattern Recognition, 32(1):71–86, January 1999.

[Sil86] B. W. Silverman. Density Estimation for Statistics and Data Analysis. Chapman and Hall, London, 1986.

[SRL07] S. Somayajula, A. Rangarajan, and R. M. Leahy. PET image reconstruction using anatomical information through mutual information based priors: A scale space approach. Biomedical Imaging: From Nano to Macro, 2007 (ISBI 2007), 4th IEEE International Symposium on, pages 165–168, April 2007.

[SSLS+01] D. W. Shattuck, S. R. Sandor-Leahy, K. A. Schaper, D. A. Rottenberg, and R. M. Leahy. Magnetic resonance image tissue classification using a partial volume model. NeuroImage, 13(5):856–876, 2001.

[SZS05] S. Shwartz, M. Zibulevsky, and Y. Y. Schechner. Fast kernel entropy estimation and optimization. Signal Processing, 85(5):1045–1058, 2005. Information Theoretic Signal Processing.

[Tow08] D. W. Townsend. Multimodality imaging of structure and function. Physics in Medicine and Biology, 53(4):R1–R39, 2008.

[WGW+98a] R. P. Woods, S. T. Grafton, J. D. G. Watson, N. L. Sicotte, and J. C. Mazziotta. Automated image registration: I. General methods and intrasubject, intramodality validation. Journal of Computer Assisted Tomography, 22:139–152, 1998.

[WGW+98b] R. P. Woods, S. T. Grafton, J. D. G. Watson, N. L. Sicotte, and J. C. Mazziotta. Automated image registration: II. Intersubject validation of linear and nonlinear models. Journal of Computer Assisted Tomography, 22:153–165, 1998.

[WVA+96] W. Wells, P. Viola, H. Atsumi, S. Nakajima, and R. Kikinis. Multi-modal volume registration by maximization of mutual information. Med. Image Analysis, 1(1):35–51, 1996.

[ZR03] J. Zhang and A. Rangarajan. Bayesian multimodality non-rigid image registration via conditional density estimation. In Information Proc. in Med. Imaging, pages 499–511, 2003.