INFORMATION TO USERS

This manuscript has been reproduced from the microfilm master. UMI films the text directly from the original or copy submitted. Thus, some thesis and dissertation copies are in typewriter face, while others may be from any type of computer printer.

The quality of this reproduction is dependent upon the quality of the copy submitted. Broken or indistinct print, colored or poor quality illustrations and photographs, print bleedthrough, substandard margins, and improper alignment can adversely affect reproduction.

In the unlikely event that the author did not send UMI a complete manuscript and there are missing pages, these will be noted. Also, if unauthorized copyright material had to be removed, a note will indicate the deletion.

Oversize materials (e.g., maps, drawings, charts) are reproduced by sectioning the original, beginning at the upper left-hand corner and continuing from left to right in equal sections with small overlaps. Each original is also photographed in one exposure and is included in reduced form at the back of the book.

Photographs included in the original manuscript have been reproduced xerographically in this copy. Higher quality 6" x 9" black and white photographic prints are available for any photographs or illustrations appearing in this copy for an additional charge. Contact UMI directly to order.

UMI
A Bell & Howell Information Company
300 North Zeeb Road, Ann Arbor MI 48106-1346 USA
313/761-4700 800/521-0600

MAXIMUM LIKELIHOOD HYPERPARAMETER ESTIMATION FOR GIBBS PRIORS FROM INCOMPLETE DATA WITH APPLICATIONS IN IMAGE PROCESSING

by

Zhenyu Zhou

A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the Requirements for the Degree
DOCTOR OF PHILOSOPHY
(Electrical Engineering)

May 1995

Copyright 1995 Zhenyu Zhou

UMI Number: 9621667

UMI Microform 9621667
Copyright 1996, by UMI Company. All rights reserved.
This microform edition is protected against unauthorized copying under Title 17, United States Code.

UMI
300 North Zeeb Road
Ann Arbor, MI 48103

UNIVERSITY OF SOUTHERN CALIFORNIA
THE GRADUATE SCHOOL
UNIVERSITY PARK
LOS ANGELES, CALIFORNIA 90007

This dissertation, written by .........., under the direction of ..... Dissertation Committee, and approved by all its members, has been presented to and accepted by The Graduate School, in partial fulfillment of requirements for the degree of DOCTOR OF PHILOSOPHY.

Dean of Graduate Studies

DISSERTATION COMMITTEE

Chairperson

Dedication

To my father and mother and my wife Qing

Acknowledgments

I would first of all like to thank my advisor, Richard Leahy, from whom I have received valuable encouragement, endless support, excellent guidance and true friendship during my entire Ph.D. study. These enriched my life at USC SIPI both academically and personally. I am greatly indebted to Costas Synolakis for his enduring patience, guidance, and both moral and financial support during my formative years at USC. I gratefully acknowledge the other members of my committee, Jerry Mendel, Jay Kuo and Louis Gordon. I would also like to thank Zhen Zhang for serving as a member of my guidance committee.

The staff of SIPI has been very helpful and patient. I give a special thanks to Allen Weber, who has been a source of "first aid" when I am frustrated with computer problems on numerous occasions. I am very grateful to Linda Varilla, Gloria Halfacre, Mercedes Morenete and Toy Mayeda for their assistance and support.

The graduate students at SIPI have been a warm, interesting and wonderful group. I wish to thank Erkan Mumcuoglu for his valuable suggestions, comments and arguments, Stephanie Sandor for her warmhearted aid, and James Philips for his friendship and critical reading of my papers.
I would also like to thank the other students and ex-students in our group, Xiao Hong Yan, Zhenyu Wu, Sam Song, Michael Spencer, John Mosher, Chung-Hsing Chang, Lance Hsu and Yung-Chih Hsu for being wonderful company these years. Finally, I would like to thank my parents, Ming-de and Yuhua, and my wife Qing, for their support and patience. Without them this dissertation would have never begun.

The research was funded in part by the Strategic Highway Research Program of the National Academy of Sciences, by the National Cancer Institute (Grant No. 1-R01-CA59794), and by a grant from the Whitaker Foundation.

Contents

Dedication
Acknowledgments
List Of Tables
List Of Figures
Abstract

1 Introduction
1.1 Bayesian Approaches to Inverse Problems
1.2 Motivation and Goals
1.3 Organization of the Dissertation

2 Markov Random Fields and Gibbs Distributions for Image Priors
2.1 Definitions and Preliminaries
2.2 MRF-Gibbs Image Priors
2.2.1 MRF Image Priors Using Intensity Processes Only
2.2.2 Compound MRF Image Priors Using Both Intensity and Line Processes
2.3 Integrating Visual Discontinuities from Different Imaging Modalities through Line Processes

3 Deterministic Relaxation Algorithms for Computing MAP Image Estimates using MRF Priors
3.1 Preliminaries
3.2 Deterministic Relaxation Algorithms for MAP Estimation Using Priors with Image Intensity Processes Only
3.2.1 Gradient Ascent Algorithms
3.2.2 Coordinate Ascent Algorithms
3.2.3 Discussion
3.3 Deterministic Annealing Procedures for MAP Estimation Using Priors with Both Intensity and Line Processes
3.3.1 MAP estimation for the weak membrane model without line interaction priors
3.3.2 MAP estimation for the generalized weak membrane model
3.4 Experiments and Conclusions

4 Mean Field Theory: A Tool to Simplify the Analysis of Joint Gibbs Distributions
4.1 Mean Field Reference System
4.2 Optimal Mean Field Approximation of Gibbs Partition Functions
4.3 Summary of Results from Mean Field Theory
4.4 Mean vs. Mode Field Approximations

5 Simultaneous Maximum Likelihood Estimation of Hyperparameters
5.1 Introduction and Previous Studies
5.2 Maximum Likelihood Hyperparameter Estimation from Incomplete Data
5.3 Approximate Maximum Likelihood Estimation of the Hyperparameter
5.3.1 Hyperparameter Estimation Using Saddle Point Approximation
5.3.2 Hyperparameter Estimation Using Mean/Mode Field Approximations
5.4 Numerical Methods
5.5 Performance Studies
5.5.1 Estimator Bias and Variance using Stochastic Sampling
5.5.1.1 Comparative studies
5.5.1.2 Robustness to noise
5.5.1.3 Robustness to different smoothing operators
5.5.1.4 Restoration with Poisson noise
5.5.2 Applications and Validations with Real Images

6 Application I: Bayesian Optical Flow Computation Using Vector MRF Priors and Simultaneous Maximum Likelihood Hyperparameter Estimation
6.1 Introduction
6.2 A Bayesian Formalism
6.3 Computing the MAP Estimate of Motion Fields
6.3.1 Deterministic Algorithms for MAP Motion Estimation
6.3.2 Integrating Visual Discontinuities from Different Modalities
6.3.3 Hyperparameter Estimation
6.4 Experiments and Conclusions

7 Application II: Bayesian Reconstruction of Transmission and Emission Tomographic Images with Simultaneous Hyperparameter Estimation
7.1 Bayesian Formalism for PET Reconstruction
7.1.1 The Likelihood Model
7.1.2 The Prior and Posterior Densities
7.1.3 Incorporation of Anatomical Priors in PET Image Reconstruction
7.2 Preconditioned Conjugate Gradient for Computing the MAP Estimate and Simultaneous Hyperparameter Estimation
7.2.1 Preconditioned conjugate gradient method
7.2.2 Hyperparameter Estimation
7.2.3 Reconstructed Transmission and Emission Images from Real Data
7.3 Quantitative Studies of Bayesian PET Image Reconstruction

8 Conclusions and Directions for Future Research
8.1 Conclusions
8.2 Future Research Directions

Bibliography

List Of Tables

5.1 Monte Carlo Test for hyperparameter estimation comparing performance of generalized maximum pseudo-likelihood (GMPL), the method of moments (MOM), and the three mode field approximated ML methods (MFAML) described in Section 2.3. Mean, percentage bias and variance were computed using 50 independent images drawn from the prior using a Gibbs sampler and then blurred and contaminated with Gaussian noise N(0, 16). The prior had a quadratic Hamiltonian defined on a second order neighborhood. (* indicates algorithm fails to reach a solution. STD = Standard Deviation)
5.2 Monte Carlo Test of MFAML 1-3 under Gaussian noise with different noise variance.
5.3 Robustness to different smoothing operators.
5.4 Restoration from Poisson noise.
7.1 The mean and standard deviation of relative ROI values using 1min emission data (40 realizations), computed relative to the activity in region 1a. If region 1b is used, the FBP results are somewhat worse, the MAP results are similar. Both FBP results use a ramp filter. FBP: ACFs using smoothed blank/1min transmission data; FBP1: ACFs using smoothed blank/30min transmission data. All MAP results use ACFs from reprojection of 1min transmission reconstructions. MAP1: quadratic prior; MAP2: Huber prior.

List Of Figures

2.1 Neighbor systems for four nearest (a) and eight nearest (b) neighbors
2.2 Pairwise cliques for four nearest (a) and eight nearest (b) neighbors
2.3 Plots of potential functions of Gibbs distributions for image priors
2.4 Plots of derivatives of the potential functions illustrated in Figure 2.3
2.5 Sample MRFs generated using continuous state Metropolis algorithm from compact space [0,100]^(M×M), where M = 128. Top row: Gibbs distribution using quadratic potential function with β = 0.0004, 0.001, 0.004. Middle row: Gibbs distribution using L1 potential function with β = 0.01, 0.02, 0.05. Bottom row: Gibbs distribution using Geman-McClure's potential function with β = 1.0, 1.5, 4.0 and δ = 10.
2.6 Compound MRF with intensity pixels and line processes
2.7 Sample CMRFs generated using continuous state Metropolis algorithm from compact space [0,100]^(M×M), where M = 128. Top row: intensity field, bottom row: line field, from a weak membrane prior with β = 0.03 (left) and 3 (right), and α = 10.
2.8 First-order neighbor of (a) a horizontal line site (b) a vertical line site
2.9 Second-order neighbor of (a) a horizontal line site (b) a vertical line site
2.10 Line clique types for first-order neighborhood (up to rotation)
2.11 Line clique types for second-order neighborhood (up to rotation)
3.1 Values of the log-posterior function versus number of iterations for restoration of an image from additive Gaussian noise using conjugate gradient method and Gauss-Seidel coordinate descent method: (a) with the convex quadratic prior, (b) with the non-convex Geman and McClure's prior.
3.2 The effective energy function of GNC
3.3 The effective energy function of MFA
3.4 Surface reconstruction experiment: (a) Original surface pattern, (b) Noise contaminated surface, N(0, 20²), (c) MAP reconstruction with quadratic prior, with estimated β = 0.0051, (d) MAP reconstruction with Huber prior, estimated β = 0.042, δ = 10.
3.5 Surface reconstruction experiment (continued): MAP reconstruction with (e) Leahy and Hebert prior, at β = 0.05, δ = 10, (f) with Geman and McClure prior, at β = 0.4, δ = 10, (g) with weak membrane prior, at β = 0.004 and α = 10, (h) with weak membrane prior, at β = 0.004 and the potential for the line cliques determined by trial-and-error.
3.6 Image restoration experiment (Elaine image): (a) Original image, (b) Noise contaminated image, N(0, 20²), MAP restoration with (c) quadratic prior, at β = 0.001, (d) Huber prior, at β = 0.012, δ = 10.
3.7 Image restoration experiment (Elaine image, continued): MAP restorations using (e) Leahy and Hebert prior, at β = 0.1, δ = 8, (f) Geman and McClure prior, at β = 0.3, δ = 10, (g) weak membrane prior, at β = 0.001, δ = 12, and (h) generalized weak membrane, at β = 0.001, and the potentials for line cliques are obtained by trial-and-error.
4.1 Illustration of results of mean field approximation of the expectation value of Gibbs Hamiltonian: (a) Quadratic priors, (b) Huber prior.
5.1 Illustration of the quantitative effect of the global hyperparameter: (a) L-curve, (b) Global mean-squared-error of restored image versus β.
5.2 Illustration of the adaptive quadrature.
5.3 Experiment for image restoration from Gaussian noise (Boat image): Top row: left = original, right = noisy; middle row: left = MAP with quadratic prior, at β = 0.000384, right = MAP with Huber prior; bottom row: left = MAP with Hebert and Leahy prior, right = MAP with Geman and McClure prior. All images shown above correspond to the β estimated using MFAML 1.
5.4 Comparisons of restored images using three different forms of the MFAML method. Top row: left = MFAML1, center = MFAML3, right = MFAML2; on the bottom is the plot of global squared error versus hyperparameter, where the star, circle and cross correspond to MFAML 1 to 3 respectively.
5.5 Plots of global squared error of MAP images versus hyperparameter values and L-curve. Left: squared error versus hyperparameter, right: L-curve.
5.6 Plots of global squared error of MAP images versus hyperparameter values and L-curve. Left: squared error versus hyperparameter, right: L-curve.
5.7 Experiment for image restoration from Poisson noise (Moon image): Top row: left = original, right = noisy; middle row: left = MAP with quadratic prior, right = MAP with Huber prior; bottom row: left = MAP with Hebert and Leahy prior, right = MAP with Geman and McClure prior. All images shown above correspond to the β estimated using MFAML 1.
5.8 Plots of global squared error of MAP images versus hyperparameter values and L-curve for Moon image with Poisson noise. Left: squared error versus hyperparameter, right: L-curve.
5.9 Plots of global squared error of MAP images versus hyperparameter values and L-curve for Moon image with Poisson noise. Left: squared error versus hyperparameter, right: L-curve.
6.1 Simulated image pairs for motion estimation experiments: (a) image pair of a ball under rotation with axis perpendicular to the page, (b) image pair of a ball under rotation with axis parallel to the page.
6.2 Results from image pair in Figure 6.1(a) using simple vector MRF priors without line process: (a) quadratic priors, (b) Huber priors, (c) Hebert and Leahy prior, and (d) Geman and McClure prior.
6.3 Results from image pair in Figure 6.1(a) using compound vector MRF priors with line process: (a) vector field estimated using weak membrane prior, (b) vector field estimated using generalized weak membrane prior, (c) line field associated with (a), (d) line field associated with (b).
6.4 Results from image pair in Figure 6.1(b) using simple vector MRF priors without line process: (a) quadratic priors, (b) Huber priors, (c) Hebert and Leahy prior, and (d) Geman and McClure prior.
6.5 Results from image pair in Figure 6.1(b) using compound vector MRF priors with line process: (a) vector field estimated using weak membrane prior, (b) vector field estimated using generalized weak membrane prior, (c) line field associated with (a), (d) line field associated with (b).
7.1 Illustration of a PET system: coincidence at a
and b represents a transmission coincidence, at c and d represents an emission coincidence, at e and f represents a scatter coincidence, and at g and h represents a random coincidence.
7.2 Reconstructed transmission images from a set of real human chest data: (a) filtered backprojection algorithm, (b) Bayesian algorithm using quadratic prior, (c) Bayesian algorithm using Geman-McClure prior, (d) graduated non-convexity algorithm using weak membrane prior, (e) deterministic annealing using weak membrane prior and (f) deterministic annealing and ICA procedure using weak membrane prior and line interaction terms.
7.3 (a) A single cross section of a PET registered MR scan; (b) the anatomical boundaries extracted from (a); (c) MAP reconstruction with a quadratic Gibbs prior; (d) reconstruction using deterministic annealing (DA) with estimated line processes without prior anatomical information; (e) DA with estimated line processes influenced by anatomical boundaries using a binary weighting scheme; and (f) DA with estimated line processes influenced by anatomical boundaries using a Chamfer weighting scheme.
7.4 Computerized brain phantom and the selected Regions-of-Interest (ROIs).
7.5 PET simulation experiment: The top row shows sample PET images reconstructed, with simultaneous MFAML hyperparameter estimation, from pseudo-random high-count data, using each of the three priors: left - quadratic, center - Huber, right - Geman and McClure. The middle row shows sample PET images reconstructed, with simultaneous MFAML hyperparameter estimation, from pseudo-random low-count data, using each of the three priors. The bottom row shows the total squared error in the estimated MAP images as a function of β. These errors were averaged over 50 independent realizations as described in Section 3.2. The solid line corresponds to the high count data and the dashed line to the low count data. The star indicates the mean value of β obtained using the MFAML method for high count data and the small circle for low count data. Note its close proximity to the minimum squared error.
7.6 ROI variance versus squared bias plot. At each point the summation of the horizontal and vertical coordinates equals the total mean squared error. Top row: high count test; bottom row: low count test. Left column: large ROI with boundary pixels; right column: interior ROI away from boundaries. The ROIs are marked in Figure 4(a). In the figure, solid line = MAP with quadratic prior; dot-dashed line = MAP with Huber prior; dashed line = MAP with Geman and McClure prior; dotted line = filtered backprojection. The star, circle and cross signs indicate locations corresponding to the MFAML estimate of the hyperparameter.
7.7 Phantoms used in the ROI analysis with the quantization regions marked in white.
7.8 (a) The original brain phantom, (b) Anatomical boundaries extracted from (a) and 30 selected ROIs.
7.9 Bias and standard deviation of the average activity in ROIs of 50 independent reconstructions for a variety of methods for 1M data.
(a) = FBP using ideal ramp filter, (b) = Maximum Likelihood, (c) = Bayesian with quadratic prior, (d) = Bayesian with Geman and McClure prior, (e) = Bayesian with weak membrane prior, (f) = Bayesian with generalized weak membrane prior, (g) = Bayesian with weak membrane prior and exact anatomical prior, (h) = Bayesian with weak membrane prior and anatomical prior half pixel shifted from true location both horizontally and vertically.

Abstract

There are three fundamental problems with using Bayesian methods to solve ill-posed inverse problems in image processing and computer vision: the selection of the prior model of the unknown field, the selection of the prior model parameters, i.e. the hyperparameters, and the optimization algorithms for computing optimal Bayesian solutions of the inverse problems. This dissertation addresses all these problems but concentrates on developing methods for hyperparameter estimation from incomplete data.

The class of priors considered in this dissertation are Markov random fields (MRFs) in the form of Gibbs distributions. Of particular interest are those priors which model both smooth regions and sharp image boundaries, either through the use of an appropriate Hamiltonian in a simple MRF or through the use of both intensity and line processes in a compound MRF. The forms of deterministic optimization algorithms for computation of the optimal Bayesian solutions are dependent on the forms of the MRF priors. Here, various forms of Gibbs image priors and their related deterministic optimization algorithms for computing maximum a posteriori (MAP) estimates are suggested and compared.

The parameters of the prior, the hyperparameters, play a critical role in Bayesian methods for solving ill-posed inverse problems. Of particular importance for the case of Gibbs priors is the global hyperparameter which multiplies the Hamiltonian.
This parameter controls the balance between the influence of the likelihood function and that of the prior on the Bayesian solutions. Estimation of hyperparameters can be formulated in terms of parameter estimation from incomplete data in the sense of Dempster, Laird and Rubin's EM algorithm. The incomplete data are the observed measurements and the complete data are the unobserved images which are sample realizations from the Gibbs prior. Computing the exact maximum likelihood estimate (MLE) of the hyperparameter from incomplete data is intractable for most image processing problems due to the complexity and high dimensionality of the joint probability densities involved. In this dissertation, we develop an approximate maximum likelihood estimator for this global hyperparameter, which is computed simultaneously with the MAP image. The new algorithm relies mostly on an approximation closely related to the mean field theory of statistical mechanics, in which analysis of many-body systems is approximated by analysis of a set of single-body systems. Through mean field theory, a complicated large dimensional Gibbs distribution can be approximated by a separable function equal to a product of one dimensional density functions. In essence, this reduction in complexity is achieved by approximating the influence of the neighbors of each pixel over their entire sample space by their estimated means. In our work, we use a mode-field rather than a mean-field approximation, where the mode of the posterior density is computed using a MAP image estimation algorithm. We use a sequential updating scheme to estimate both the image and the hyperparameter. Successive iterations of a MAP image estimation algorithm are substituted in the mode-field approximation, which in turn is used to update the hyperparameter estimate. The hyperparameter is then held constant and the MAP image updated. This sequence is repeated until the algorithm converges.
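The sequential updating scheme described above can be sketched in a toy setting. This is not the estimator developed in this dissertation: the sketch assumes a Gaussian likelihood and a full-rank quadratic prior energy H(x) = ½‖Dx‖², for which the prior expectation of H is N/(2β), so the hyperparameter step reduces to the closed-form moment match β = N/(2H(x̂)). All function names are illustrative.

```python
import numpy as np

def map_estimate(A, y, D, beta, sigma2):
    # MAP image step: minimize ||y - Ax||^2 / (2*sigma2) + beta * 0.5 * ||Dx||^2.
    # For this Gaussian/quadratic pair the minimizer is a single linear solve.
    H = A.T @ A / sigma2 + beta * (D.T @ D)
    return np.linalg.solve(H, A.T @ y / sigma2)

def estimate_image_and_beta(A, y, D, sigma2, beta0=1.0, iters=40):
    # Sequential updating: hold beta fixed and update the MAP image, then hold
    # the image fixed and update beta, repeating until the pair converges.
    N = D.shape[1]                              # assumes D has full column rank
    beta = beta0
    for _ in range(iters):
        x = map_estimate(A, y, D, beta, sigma2)
        energy = 0.5 * np.sum((D @ x) ** 2)     # H(x_hat), assumed nonzero
        beta = N / (2.0 * energy)               # moment-matching update
    return x, beta
```

For this Gaussian special case the fixed point can be verified by hand; for the non-Gaussian priors and the mode-field approximation used in the dissertation, the hyperparameter update has no such closed form.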
We present results of an extensive Monte-Carlo study that examines the bias and variance of this estimator for the problem of image restoration under various conditions for cases where the true value of the global hyperparameter is known.

Several applications of methods presented in this dissertation are discussed. We focus on the problems of optical flow computation and image reconstruction in positron emission tomography (PET). We have made extensive quantitative studies of the performance of these methods for the problem of MAP PET image reconstruction.

Chapter 1

Introduction

1.1 Bayesian Approaches to Inverse Problems

Inverse problems arise in many branches of science and engineering including image processing and computer vision. In typical situations, such as image restoration, surface interpolation, optical flow estimation and computerized tomographic image reconstruction, one is interested in recovering an unknown function from a finite number of degraded and noisy measurements of a known transformation on the function. Even if knowledge about the forward model, which describes the mapping between the measurements and the unknown function, and the noise statistics is exact, the inverse problem is often ill-posed in the original sense of Hadamard [23, 24]: the solution may not be uniquely determined by the data, or it may not exist, or it may not depend continuously on the data. As a consequence of ill-posedness, a small perturbation in the measurements because of the presence of noise can produce an arbitrarily large error in a solution formed by directly inverting the data using the forward model. Since the 1960s, theoretical and practical methods for determining approximate stable solutions of inverse problems have been widely studied. Most of these methods can now be classified into two major theories: regularization theory [53] and Bayesian theory [39].
The two theories are mutually complementary in philosophy as well as in mathematics. The former is deterministic, whereas the latter is based on statistical inference. Both theories attempt to form a global optimization framework in which generic knowledge of the unknown images, such as the smoothness of the solution, is incorporated. In regularization theory these measures are used as stabilizing functions or constraints, while in Bayesian theory they appear as prior distributions.

In Bayesian formulations of many inverse problems in image processing and computer vision, there exist two sample spaces, X and Y. These can often be viewed usefully as the complete and incomplete data sample spaces respectively, as described in [14]. The observed data y is a realization from Y and is referred to as the incomplete data. The corresponding x ∈ X, the complete data, which is often the quantity to be estimated, is not observed directly. Rather, it is observed indirectly through y, with the two quantities related by a conditional probabilistic law, P(y|x). This conditional probability function, known as the likelihood function, depends on the imaging modality and is problem-specific. One of the most widely addressed likelihood models, as in image restoration and surface interpolation, is the linear Gaussian model y = Ax + n, where y is the observed data (incomplete), x is the underlying image (complete), n is zero-mean Gaussian noise with covariance matrix C_n, and the matrix A is a linear degradation operator. Therefore,

    P(y|x) = (2π)^(−N/2) |C_n|^(−1/2) exp{−(1/2)(y − Ax)ᵀ C_n⁻¹ (y − Ax)}.    (1.1)

Another likelihood model is the linear Poisson model, which arises in problems where the data acquisition system is photon limited, e.g. emission tomography, gamma-ray astronomy, and fluorescence microscopy. In this model, the mean of y is related to the image x by a linear operator A, i.e., E[y] = Ax, and y follows a joint Poisson distribution:

    P(y|x) = ∏_i exp{−(Ax)_i} (Ax)_i^(y_i) / y_i!    (1.2)

The objective of the inverse problems addressed in this dissertation is to obtain an estimate of x from the observation y. Since A is often very ill-conditioned, direct inversion based on maximizing the likelihood function does not always provide a unique and stable solution. Bayesian methods solve these types of ill-posed inverse problems by combining the information carried in the observation y and likelihood P(y|x) with prior generic knowledge about the unknown image, in the form of a prior distribution P(x|Φ), into the posterior distribution P(x|y, Φ) using Bayes' rule:

    P(x|y, Φ) = P(y|x) P(x|Φ) / P(y|Φ),    (1.3)

where

    P(y|Φ) = ∫ P(y|x) P(x|Φ) dx    (1.4)

and Φ is a set of model parameters of the prior distribution. These are second-level parameters from the standpoint of the data y, and thus are often referred to as hyperparameters. In image processing and computer vision, the prior laws are often Markov random fields (MRFs) in the form of Gibbs distributions [5, 18, 39]. In Bayesian inference, a solution of the inverse problem can be computed from either of the following two optimal estimators: (i) the maximum a posteriori (MAP) estimator, which is the mode of the posterior density and corresponds to a zero-one loss function, and (ii) the minimum mean squared-error (MMSE) estimator, which is the posterior mean and corresponds to the mean squared-error loss function [5, 18, 39]. A full Bayesian approach either requires that the hyperparameters are known, or that we have a "hyperprior" density for the hyperparameters Φ. However, in practice, it is desirable that Bayesian estimators are data-driven, in that they do not depend on any a priori knowledge of the hyperparameter. This is not only because prior knowledge of the hyperparameter is usually not available but also because new unknown parameters may arise in the hyperprior density itself.
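To make the two likelihood models concrete, the sketch below evaluates ln P(y|x) for the linear Gaussian model with C_n = σ²I and for the linear Poisson model. The array shapes, function names and the σ²I covariance are illustrative assumptions, not code from the dissertation:

```python
import numpy as np
from math import lgamma

def gaussian_loglik(y, A, x, sigma2):
    """ln P(y|x) for y = Ax + n with n ~ N(0, sigma2 * I) (cf. eq. 1.1)."""
    r = y - A @ x
    N = y.size
    return float(-0.5 * (N * np.log(2 * np.pi * sigma2) + r @ r / sigma2))

def poisson_loglik(y, A, x):
    """ln P(y|x) for independent y_i ~ Poisson((Ax)_i) (cf. eq. 1.2)."""
    m = A @ x
    return float(np.sum(y * np.log(m) - m) - sum(lgamma(v + 1) for v in y))

rng = np.random.default_rng(0)
A = rng.random((8, 4))           # hypothetical system matrix
x = rng.random(4) + 0.5
y = A @ x                        # noiseless data for illustration
# With noiseless data, the Gaussian log-likelihood peaks at the true x.
assert gaussian_loglik(y, A, x, 1.0) > gaussian_loglik(y, A, x + 0.3, 1.0)
```

The same comparison holds for the Poisson model: the log-likelihood is maximized when the mean Ax matches the observed counts.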
Problems also arise if there is an unknown gain factor in the transfer function A in (1.2) or an unknown noise variance in (1.1). As we shall see later, a point estimate of the hyperparameter from the observed data y can efficiently avoid these problems.

1.2 Motivation and Goals

In brief, given the likelihood function P(y|x) and the observation y, most Bayesian approaches solve ill-posed inverse problems by (1) choosing an appropriate prior distribution P(x|Φ), (2) computing a point estimate of the hyperparameters Φ, denoted simply as Φ̂, and (3) solving either one of the following global optimization problems:

    MAP:  x̂(Φ̂) = arg max_{x ∈ X} P(x|y, Φ̂)

    MMSE: x̂(Φ̂) = arg min_{x̂} ∫_X (x̂ − x*)² P(x*|y, Φ̂) dx* = ∫_X x* P(x*|y, Φ̂) dx*,

where x* is the true image. Often, Steps 2 and 3 must be performed simultaneously, for they are mutually dependent and are both performed against the same data set y. The fundamental problems associated with using Bayesian methods are clearly:

• The choice of appropriate prior models: in the applications considered in image processing and computer vision, priors should model both smooth variations and abrupt changes, i.e. discontinuities in brightness, color and/or depth.

• The selection of the parameters of the prior model, the hyperparameters: if training data of the same nature as the unknowns is available, these might be estimated and treated as known constants in the inverse problem (though this is a challenging problem itself). However, such training data is usually unavailable and the hyperparameters must be estimated directly from the observed data y. The estimates of these parameters are critical for good qualitative and quantitative performance of Bayesian estimators.
• Optimization algorithms for computing an optimal Bayesian estimate: priors which accommodate both abrupt changes and smooth variations are often not strictly convex and not differentiable everywhere; global optimization is then difficult, and special algorithms must be developed to compute the solution. The trade-off between computational cost and optimality of the solution is important.

In this dissertation, we restrict our scope to certain Bayesian methods using the class of homogeneous isotropic Markov random field priors. The behavior of this class of priors is determined completely by the form of the potential functions which make up the Hamiltonian, and by the value of the global hyperparameter for which the Hamiltonian is the sufficient statistic. This class of Bayesian methods has wide applications ranging from image restoration, surface reconstruction, and motion estimation, to medical image reconstruction [5, 8, 9, 10, 19, 17, 20, 21, 30, 34, 43, 59, 61, 64]. These Bayesian methods have proven to be very powerful for restoration or reconstruction of high quality images. However, a major problem limiting their utility is the lack of a practical and robust method for selecting the parameters of the prior Gibbs distribution. In this dissertation, we aim at developing a set of completely data-driven Bayesian algorithms for solving ill-posed inverse problems based on homogeneous isotropic MRF priors. Completely data-driven means that both the hyperparameter and the unknown image are simultaneously estimated from the same observed data, and thus for each data set these algorithms will provide a single estimate of the image. This is different from most previous studies, in which the Bayesian methods determine a group of estimates, one for each value of the hyperparameter, from which the optimal one must be chosen by visual inspection.
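For the linear Gaussian likelihood (1.1) with a simple quadratic pairwise prior, the MAP problem in Step 3 is a linear least-squares problem with a closed-form solution. The following sketch (a hypothetical 1-D setup with A = I and a first-difference prior; not one of the dissertation's algorithms) illustrates how the hyperparameter β controls the smoothness of the estimate:

```python
import numpy as np

def map_estimate(y, A, beta, sigma2=1.0):
    """MAP estimate for y = Ax + n, n ~ N(0, sigma2*I), with the prior
    U(x) = sum_j (x_j - x_{j+1})^2 on a 1-D lattice. The posterior mode solves
    (A^T A / sigma2 + 2*beta*D^T D) x = A^T y / sigma2."""
    n = A.shape[1]
    D = np.eye(n - 1, n, 1) - np.eye(n - 1, n)   # first-difference operator
    H = A.T @ A / sigma2 + 2 * beta * D.T @ D
    return np.linalg.solve(H, A.T @ y / sigma2)

rng = np.random.default_rng(1)
x_true = np.repeat([1.0, 4.0], 16)               # piecewise-constant signal
A = np.eye(32)
y = x_true + 0.3 * rng.standard_normal(32)
prior_energy = lambda v: float(np.sum(np.diff(v) ** 2))
# Increasing beta trades data fit for smoothness.
assert prior_energy(map_estimate(y, A, 5.0)) < prior_energy(map_estimate(y, A, 0.01))
```

With β = 0 this reduces to the maximum-likelihood solution; as β grows the quadratic prior oversmooths the step in the test signal, which is exactly the behavior that motivates the edge-preserving potentials discussed in Chapter 2.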
This set of algorithms is deterministic in nature and is accompanied by an estimation-theoretic maximum likelihood method for simultaneously and automatically selecting the global hyperparameter. We expect these methods to be applicable to most of the applications mentioned above. So far, the majority of studies on Bayesian techniques have been qualitative; this is partially due to the fact that most Bayesian estimates involve visual inspection and judgment of quality by human observers. The development of completely data-driven Bayesian algorithms makes it possible to perform quantitative studies of the performance of Bayesian algorithms for solving ill-posed inverse problems. Thus, in this dissertation, we evaluate the quantitative accuracy of the Bayesian approach developed here applied to functional imaging of the brain using positron emission tomography (PET).

1.3 Organization of the Dissertation

We start the dissertation with a discussion of the characteristics of different Gibbs image priors in Chapter 2. We focus mainly on a subset of image priors which represent homogeneous and isotropic MRFs and are defined on low order neighborhood systems and pairwise cliques only. These Gibbs priors are simple to analyze but rich enough to describe most of the images of interest. A set of deterministic relaxation algorithms for computing the optimal Bayesian estimate is presented in Chapter 3 for the priors discussed in Chapter 2, assuming the hyperparameter has been predetermined. Gradient ascent algorithms and coordinate ascent algorithms for computing the MAP estimate are compared and discussed for simple MRF priors involving intensity processes only. These algorithms are combined with an extended deterministic annealing procedure to form a continuation method to compute MAP estimates for compound MRF priors involving both intensity and line processes.
The non-negativity constraint is enforced using a penalty function when necessary. In Chapter 4, as a foundation for many of the developments in this dissertation, we introduce and develop a tool for analyzing MRFs based on the mean field theory of modern statistical mechanics, in which a complicated multidimensional Gibbs distribution is approximated with a separable mean field Gibbs distribution. Many useful results based on this approximation, such as the posterior mean, marginalization and partition function evaluation, are given in this chapter. Some of the methods are optimized via the Gibbs-Bogoliubov-Feynman bound and first order perturbation theory. An estimation-theoretic approach for hyperparameter estimation is given in Chapter 5. This approach falls into the class of maximum likelihood estimation from incomplete data, which can be attacked using the EM algorithm [14]. Since a true EM algorithm is completely intractable for most image processing problems of interest, here an approximation is used which is closely related to the mean field theory developed in Chapter 4. Thus we refer to this method as mean field approximated maximum likelihood (MFAML). The method is developed and the properties of the resulting estimator are examined using extensive Monte Carlo simulations in the context of the image restoration problem. In Chapters 6 and 7, the algorithms developed in the previous three chapters are applied to two important problems: motion estimation from image sequences and image reconstruction in positron emission tomography (PET). It is very important for both applications that the Bayesian estimate should preserve discontinuities and also that a near-optimal choice of hyperparameter be used. While the PET application fits well into the linear framework outlined above, the motion estimation application differs in two important ways. First, the mapping between the complete and incomplete data is highly nonlinear.
And second, motion is represented by a vector rather than a scalar field. This second difference is accommodated by extending the Gibbs distribution to the class of vector fields. Despite these differences, the methods described in the preceding chapters can be applied with only minor modifications. It has been widely conjectured that the use of MRF-based Bayesian approaches should lead to improvements in the quantitative accuracy of estimated images, in comparison to other simpler estimation procedures. However, the vast majority of publications in this field include only images from single realizations of a given source. While these provide anecdotal evidence of improved accuracy, no systematic quantitative study is available. Fortunately, our success in hyperparameter estimation enables us to perform quantitative evaluation of Bayesian approaches for solving inverse problems. Therefore, in Chapter 7, we perform an extensive Monte Carlo simulation to evaluate the quantitative performance of the Bayesian techniques defined in the dissertation in the context of PET image reconstruction. In Chapter 8, we give conclusions from our work and discuss our future work.

Chapter 2
Markov Random Fields and Gibbs Distributions for Image Priors

The Bayesian approach for solving an ill-posed inverse problem in image processing demands the integration of prior knowledge about the unknown image with the observed data and knowledge of the degradation model (forward model). One of the difficulties in solving an inverse problem with the Bayesian approach is the selection of an appropriate prior model which faithfully describes our knowledge of the properties of the unknown images. In most cases a good image model must accommodate both smooth regions and sharp image boundaries. It has been widely demonstrated that MRF models are some of the most powerful and successful ways to model various types of images.
They have been used by researchers from a wide variety of disciplines, including signal and image processing, pattern recognition, computer vision and applied mathematics. In essence, MRFs model the unknown fields through local interaction of particles (pixels) or edge elements (line sites). There are several reasons that make this class of model so appealing: (i) MRFs provide a solid mathematical framework to unify heuristic knowledge about images, which in turn provides a sound basis for algorithm development; (ii) the MRF-Gibbs equivalence provides freedom to choose and analyze the prior models; (iii) MRF models are based on local interactions, which facilitates the development of fast iterative optimization algorithms, since most optimization procedures favor local properties; (iv) MRFs provide a sound platform for integrating data from different imaging modalities, which is particularly useful in multi-sensor fusion, multi-spectral remote sensing and multi-modality medical imaging.

2.1 Definitions and Preliminaries

Let Ω₀ be a finite set or a compact space and S be a d-dimensional lattice (d ≥ 1). x is a collection of random variables {x_i} defined on the lattice S, indexed by i ∈ S, with each image pixel x_i taking values in Ω₀. The x_i's are image pixels in lexicographic ordering. The space Ω₀ will be referred to as the single pixel sample space in this dissertation. The space X = Ω₀^|S| is the sample space of x.

Definition 1. A neighborhood system on S, N = {N_i, i ∈ S}, is a collection of subsets of S for which: (i) i ∉ N_i, ∀i ∈ S; (ii) i ∈ N_j if and only if j ∈ N_i, ∀i, j ∈ S.

S and N form a graph G = {S, N}. In this dissertation, we use S∖i to denote the set of all sites j ≠ i in S, and ∂i to denote the set of neighbors of pixel i, i.e., {j : j ∈ N_i}.

Definition 2. If P(x) is a probability measure on X, then x is an MRF defined on the graph G iff: (i) P(x) > 0, ∀x ∈ X; (ii) P(x_i | x_{S∖i}) = P(x_i | x_{∂i}) for all i ∈ S.
According to the Hammersley-Clifford theorem [4, 39], the probability measure P(x) of an MRF can be equivalently characterized by a Gibbs distribution:

    P(x|β) = (1/Z(β)) exp{−βU(x)},    (2.1)

where Z(β) is the partition function, β is the global hyperparameter, and U(x) is the Gibbs energy or Hamiltonian. Therefore, we often refer to MRF priors as Gibbs priors. The partition function is the scaling constant required so that P(x|β) is a valid density function and is given by:

    Z(β) = ∫_X exp{−βU(x)} dx.    (2.2)

The Hamiltonian is constructed as a sum of potential functions, each of which is a function of the pixel values on special subsets of S called cliques. Each clique consists of either a single lattice site or two or more sites, with the restriction that for each clique, all sites in that clique are mutual neighbors. Let C denote the set of all cliques. Then for an MRF, the Hamiltonian has a specific form:

    U(x) = Σ_{c ∈ C} V_c(x),    (2.3)

where V_c(·) is the potential function of the pixel values at the sites in clique c only. The defining feature of an MRF is P(x_i | x_{S∖i}) = P(x_i | x_{∂i}) for all i ∈ S; the local conditional density has a specific relationship with the potential functions, once the form of the joint Gibbs distribution and the neighborhood system are fixed:

    P(x_i | x_{∂i}) = exp{−β Σ_{c ∈ C: i ∈ c} V_c(x_i, x_{∂i})} / ∫_{Ω₀} exp{−β Σ_{c ∈ C: i ∈ c} V_c(x_i, x_{∂i})} dx_i.    (2.4)

The potential functions thus determine the nature of the local behavior of the MRF. They can be chosen freely, with the restriction that the resulting joint density be integrable. In this dissertation, we are mainly interested in a special class of MRF priors for image processing, the homogeneous and isotropic MRF priors. For a homogeneous MRF, the potential function for a particular clique is independent of the location of that clique. For homogeneous, isotropic MRFs, the potential functions are also rotation invariant.
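The relationship in (2.4) — that conditioning on all other sites reduces to conditioning on the neighbors through only the clique potentials touching site i — can be verified numerically on a tiny lattice. A sketch, with binary states and a 2×2 four-neighbor lattice chosen purely for illustration:

```python
import numpy as np
from itertools import product

beta = 0.7
sites = [(0, 0), (0, 1), (1, 0), (1, 1)]                     # 2x2 lattice
cliques = [((0, 0), (0, 1)), ((1, 0), (1, 1)),               # horizontal pairs
           ((0, 0), (1, 0)), ((0, 1), (1, 1))]               # vertical pairs
states = [0, 1]

def U(x):
    """Hamiltonian (2.3) with pairwise potentials V(a, b) = (a - b)^2."""
    return sum((x[a] - x[b]) ** 2 for a, b in cliques)

def joint(x):
    """Gibbs distribution (2.1); the partition function sums over all 2^4 states."""
    Z = sum(np.exp(-beta * U(dict(zip(sites, v)))) for v in product(states, repeat=4))
    return np.exp(-beta * U(x)) / Z

def local_conditional(x, i):
    """Local conditional (2.4): only the cliques containing site i matter."""
    touching = [c for c in cliques if i in c]
    energy = lambda v: sum(({**x, i: v}[a] - {**x, i: v}[b]) ** 2 for a, b in touching)
    return np.exp(-beta * energy(x[i])) / sum(np.exp(-beta * energy(v)) for v in states)

x = {(0, 0): 1, (0, 1): 0, (1, 0): 1, (1, 1): 1}
# Markov property: conditioning on all other sites equals conditioning on the
# neighbours via the touching cliques alone.
full = joint(x) / sum(joint({**x, (0, 0): v}) for v in states)
assert abs(full - local_conditional(x, (0, 0))) < 1e-12
```

The terms of the Hamiltonian not involving site i cancel in the ratio, which is exactly why (2.4) depends only on the cliques containing i.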
This class of MRFs has the simplest representation, but is sufficiently rich to model a wide variety of useful image types. In most practical applications of image processing and computer vision, low order neighborhoods and simple clique interactions are preferred. Not only do they result in lower computation and a reduced number of unknown parameters as compared to higher order and complicated clique interactions, but they also appear to be sufficiently complex to model the desired images. In this dissertation, we confine the Gibbs energy function to pairwise interactions:

    U(x) = Σ_j Σ_{k > j, k ∈ N_j} κ_{jk} V(x_j, x_k),    (2.5)

where N_j denotes the set of eight nearest neighbors of pixel j, with κ_{jk} unity for horizontal and vertical neighbors and 1/√2 for diagonal neighbors. Therefore, the potential functions κ_{jk} V(x_j, x_k) for two-pixel vertical neighbor cliques are identical to those for two-pixel horizontal cliques. The potential functions for all two-pixel diagonal cliques are generally set equal to 1/√2 times those for horizontal and vertical pairs. This choice, although somewhat arbitrary, reflects the isotropy requirement, with the 1/√2 factor reflecting the larger distance between diagonal neighbors on a square lattice. In Figures 2.1 and 2.2, we illustrate two typical neighborhood systems and their associated pairwise cliques.

Figure 2.1: Neighbor systems for four nearest (a) and eight nearest (b) neighbors

Figure 2.2: Pairwise cliques for four nearest (a) and eight nearest (b) neighbors

The characteristics of these restricted MRFs are completely determined by the form of the potential function V(·), and the size of the global hyperparameter β.
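Equation (2.5) can be evaluated with a few array shifts. A minimal sketch, using the quadratic potential by default (the function name and test images are illustrative only):

```python
import numpy as np

def gibbs_energy(x, V=lambda d: d ** 2):
    """Pairwise eight-neighbour Gibbs energy of eq. (2.5): horizontal and
    vertical cliques get kappa = 1, diagonal cliques kappa = 1/sqrt(2)."""
    e = np.sum(V(x[:, 1:] - x[:, :-1]))                     # horizontal pairs
    e += np.sum(V(x[1:, :] - x[:-1, :]))                    # vertical pairs
    e += np.sum(V(x[1:, 1:] - x[:-1, :-1])) / np.sqrt(2)    # diagonal pairs
    e += np.sum(V(x[1:, :-1] - x[:-1, 1:])) / np.sqrt(2)    # anti-diagonal pairs
    return float(e)

step = np.hstack([np.zeros((8, 4)), np.ones((8, 4))])  # vertical step edge
assert gibbs_energy(np.full((8, 8), 3.0)) == 0.0       # constant image: zero energy
```

Each shifted difference enumerates every clique of one orientation exactly once, which matches the k > j convention in (2.5) that counts each pair a single time.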
2.2 MRF-Gibbs Image Priors

The human visual system is very sensitive to abrupt changes in brightness, color and depth. These correspond to some form of discontinuity in the scene. The ability to reconstruct or identify discontinuities is therefore essential in many vision and image processing problems such as edge detection, surface reconstruction, image restoration, medical image reconstruction, and motion estimation. Therefore, one of the major criteria for evaluating the quality of image priors is whether or not the priors are robust for both smooth regions and abrupt changes (edges, surface discontinuities, occluding boundaries, etc.). The priors are usually chosen to favor piecewise smooth images, a property which is reflected in the local differences. Usually, smooth regions correspond to relatively small local differences, while visual discontinuities correspond to large local differences. Consequently, the potential function can be chosen as a function of the difference between neighboring pixels. When designing a prior based on the local intensity distribution, one can adopt two points of view towards the effect of the prior on the Bayesian solution. On one hand, we can regard the prior Gibbs distribution from a decidedly Bayesian perspective as an a priori assignment of probability to possible realizations from the sample space. This perspective is useful when we design the exact functional form of the prior distribution. On the other hand, we can simply consider that P(x|β) is introduced for the purpose of regularizing the otherwise unstable solution, used for instance by the maximum likelihood method. From this perspective, P(x|β) is analogous to a penalty function as used in so-called penalized maximum likelihood. The two points of view complement each other and both are useful to keep in mind in designing an appropriate prior. There are two classes of MRF priors popular for modeling visual phenomena.
One is a Gibbs distribution defined on intensity processes x only; the other uses line processes l as well, to model discontinuities explicitly.

2.2.1 MRF Image Priors Using Intensity Processes Only

Various suggestions for the form of V(·) have been given, including the (convex) quadratic function (L₂ norm of the local difference):

    V(x_j, x_k) = (x_j − x_k)²,    (2.6)

where j and k are adjacent vertical, horizontal or diagonal pairs as illustrated in Figure 2.2. This function penalizes the difference of neighboring pixels at an increasing rate, which heavily penalizes the presence of image boundaries. MRFs characterized by this potential function tend to force the image to be smooth everywhere. By considering large local differences as outliers in local distributions (as in robust statistics), many have suggested using potential functions which are more robust to outliers. All these potential functions attempt to accommodate these outliers by reducing penalties on large local differences. Bouman [9] suggested using

    V(x_j, x_k) = |x_j − x_k|^α,  1 ≤ α ≤ 2,    (2.7)

which includes the quadratic function as a limit (α = 2). By decreasing α from 2 to 1, the penalty increases at a slower and slower rate as the neighboring difference gets larger. In the limit α = 1 (L₁ norm), V(·) provides a linear penalty function. Therefore, this extreme does not differentiate (in probability) between slow monotonic changes and abrupt changes [9] and consequently does not penalize the presence of edges or boundaries in the image. Since the L₁ norm (α = 1) is not continuously differentiable, a value of α slightly larger than 1, such as α = 1.2, is more favorable in practice.

Figure 2.3: Plots of potential functions of Gibbs distributions for image priors

Figure 2.4: Plots of derivatives of the potential functions illustrated in Figure 2.3
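The quadratic and generalized-L_α potentials above, together with the robust alternatives discussed next (Huber, Hebert-Leahy, Geman-McClure), can be compared numerically. A sketch, where the Huber normalization and the exact functional forms are my assumptions about the intended definitions:

```python
import numpy as np

delta = 10.0   # the additional hyperparameter shared by the robust potentials

def v_quad(t):                     # eq. (2.6): L2, convex, heavily penalizes edges
    return t ** 2

def v_generalized(t, alpha=1.2):   # eq. (2.7): Bouman's |t|^alpha, 1 <= alpha <= 2
    return np.abs(t) ** alpha

def v_huber(t):                    # Huber: quadratic core, linear tails (assumed scaling)
    a = np.abs(t)
    return np.where(a <= delta, t ** 2 / (2 * delta), a - delta / 2)

def v_hebert_leahy(t):             # log(1 + t^2/delta^2): penalty growth slows on edges
    return np.log(1 + t ** 2 / delta ** 2)

def v_geman_mcclure(t):            # t^2/(delta^2 + t^2): saturates at 1
    return t ** 2 / (delta ** 2 + t ** 2)

# The non-convex potentials saturate: doubling an already-large difference adds
# almost nothing to the penalty, so true boundaries are not discouraged.
assert v_geman_mcclure(100.0) - v_geman_mcclure(50.0) < 0.1
assert v_quad(100.0) - v_quad(50.0) == 7500.0
```

The contrast in the two assertions is the whole story of edge preservation: the quadratic penalty keeps growing without bound across a boundary, while the saturating potentials apply an essentially constant penalty once a difference exceeds the scale set by δ.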
As Black and Rangarajan point out [7], the L₁ norm (α = 1) does not perform as well as the L₂ norm (α = 2) when the image neighboring differences are Gaussian distributed. The estimator based on the L₁-norm prior is not robust, in that a single large outlier can make the solution arbitrarily bad. Therefore, a Huber function

    V(x_j, x_k) = { (x_j − x_k)² / (2δ),   |x_j − x_k| ≤ δ
                  { |x_j − x_k| − δ/2,    otherwise,    (2.8)

is preferable, which combines the behavior of the L₂ norm when the neighboring differences are small while maintaining that of the L₁ norm on image boundaries, where the neighboring differences are relatively large. To increase robustness on image boundaries further by reducing the penalty on large local differences, many suggest using non-convex functions which have a saturating property (the rate of increase of the penalty becomes zero asymptotically as local differences increase). Two typical examples are Hebert and Leahy's logarithmic function

    V(x_j, x_k) = log(1 + (x_j − x_k)² / δ²)    (2.9)

and Geman and McClure's asymptotically constant rational function

    V(x_j, x_k) = (x_j − x_k)² / (δ² + (x_j − x_k)²).    (2.10)

Both functions effectively stop increasing the penalty applied to intensity differences beyond a threshold determined by δ. Consequently they positively favor the presence of edges in the image. Note that the functions of Huber, Hebert and Leahy, and Geman and McClure all have an additional hyperparameter δ that must be specified. The potential functions discussed above are illustrated in Figure 2.3. We also show the derivatives of these functions in Figure 2.4, which illustrates the rate of change of the penalty as local intensity differences increase. Figure 2.5 shows nine sample realizations from the MRFs specified by three of the potential functions in Figure 2.3, for three different values of the hyperparameter β. The samples were generated using a generalization of the Metropolis algorithm described in [39].
The single pixel sample space was the compact set [0, 100]. It is clear that the quadratic function encourages the formation of a single smooth region, while the Geman-McClure function encourages the formation of piecewise smooth regions. The L₁ norm behaves somewhere in between. We also note that the parameter β plays a critical role in defining the behavior of these priors.

Figure 2.5: Sample MRFs generated using the continuous state Metropolis algorithm from the compact space [0,100]^(M×M), where M = 128. Top row: Gibbs distribution using the quadratic potential function with β = 0.0004, 0.001, 0.004. Middle row: Gibbs distribution using the L₁ potential function with β = 0.01, 0.02, 0.05. Bottom row: Gibbs distribution using Geman-McClure's potential function with β = 1.0, 1.5, 4.0 and δ = 10.

2.2.2 Compound MRF Image Priors Using Both Intensity and Line Processes

The non-convex potential functions described above are one approach to implicitly modeling visual discontinuities. However, the intensity-only MRF lacks freedom in controlling the behavior of these discontinuities. Towards the goal of representing "piecewise continuous" images in a more meaningful way, a class of MRF models called compound Markov random fields (CMRFs) has been proposed [18].

Figure 2.6: Compound MRF with intensity pixels and line processes (horizontal and vertical line sites on the interpixel grid)

The compound MRF described here explicitly models discontinuities through the introduction of binary-valued "line processes". A compound field consists of an upper and a lower level. In Geman and Geman's framework [18], the upper level random field is an MRF defined on the image intensity x = {x_i}, similar to those discussed in the previous section.
Their lower level random field is a binary line process l = {l_ij}, which is defined on an interpixel grid system, i.e., l_ij is the line site between pixels x_i and x_j. Figure 2.6 is an illustration of such a CMRF with vertical and horizontal line processes. When necessary, diagonal line processes can also be defined between diagonal intensity pairs. At each site, the line process takes the value 1 to indicate the presence, or 0 the absence, of a visual discontinuity between the corresponding intensity pixels. Again the joint density of {x, l} has the form of a Gibbs distribution where the Hamiltonian is a sum of potential functions defined on cliques. The neighborhood system must now be defined for both the intensity and line process sites. Coupling between the dual processes x and l is realized by including line sites in the neighborhood of the intensity sites, and vice versa. The potential functions can then be defined on the cliques of the chosen neighborhood to favor the formation of piecewise smooth regions. The boundaries between these regions are marked by the line processes. Potential functions that favor the formation of connected boundaries can be defined if the neighborhood of each line process includes other nearby line process variables.

Firstly, without imposing any interaction between the line processes, the neighborhood is simply as follows: for each line site l_ij, the two intensity pixels x_i and x_j are its neighbors; and for each image pixel x_i, the four or eight nearest neighbor intensity pixels x_j as specified in the previous section, and all the line sites between each pair x_i and x_j, are its neighbors. A simple energy function can be defined for the four or eight nearest neighborhood system which achieves the desired result of modeling piecewise smooth images:

    U(x, l) = Σ_i Σ_{j ∈ N_i, j > i} { κ_ij (x_i − x_j)² (1 − l_ij) + α_ij l_ij }.    (2.11)

The potential function in this Hamiltonian will be termed U(x_i, x_j, l_ij) = κ_ij (x_i − x_j)² (1 − l_ij) + α_ij l_ij for j ∈ N_i.
This class of MRFs is commonly referred to as the weak membrane prior [8, 17, 46]. With this definition, the probability of a particular line site l_ij taking the value 1 or 0 is decided by the relative size of the squared neighboring pixel difference and α_ij. From the perspective of penalized maximum likelihood, α_ij is the size of the penalty applied to turn on a line site (take the value 1). Clearly, the smaller the α_ij, the larger the probability that l_ij takes the value 1 (on). Because no line interaction cliques exist, all line sites are conditionally independent given the intensity processes. Using this independence, the line processes can be eliminated and an equivalent intensity-process-only MRF can be found. In the discussion of Blake and Zisserman [8], and also in Section 3.3.1, it is shown that, in the context of MAP estimation, the weak membrane prior specified above behaves identically to the so-called broken parabola potential function:

    V(x_i, x_j, α_ij) = min(κ_ij (x_i − x_j)², α_ij)
                      = { κ_ij (x_i − x_j)²,  (x_i − x_j)² ≤ α_ij/κ_ij
                        { α_ij,               (x_i − x_j)² > α_ij/κ_ij    (2.12)

This equivalence also provides some insight into how the compound MRF prior works: it increasingly penalizes the difference of neighboring pixels as a quadratic function until a threshold α_ij/κ_ij is reached. Differences beyond this threshold incur a constant penalty. Thus α_ij/κ_ij works as a threshold, below which the prior smoothes the neighboring pixel difference, and above which there is assumed to be a true image boundary and no further penalty is incurred. In Figure 2.7, we illustrate samples from a compound MRF using the Metropolis algorithm. It is possible to introduce interactions between the line processes and hence to encourage the formation of more ideal boundaries (e.g. connected, thin). To achieve this, more complicated neighborhood systems and line cliques than those of the weak membrane prior must be defined (for the weak membrane prior, only a single line site clique is used).
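The line-process elimination behind (2.12) is easy to check directly: minimizing the coupled clique potential over the binary line variable yields the broken parabola. A sketch with arbitrary illustrative values κ = 1 and α = 4:

```python
import numpy as np

def coupled_potential(d, l, kappa=1.0, alpha=4.0):
    """Weak-membrane clique potential with explicit line variable l in {0, 1}."""
    return kappa * d ** 2 * (1 - l) + alpha * l

def broken_parabola(d, kappa=1.0, alpha=4.0):
    """Equivalent line-process-eliminated potential of eq. (2.12)."""
    return min(kappa * d ** 2, alpha)

# Minimizing over the binary line variable reproduces the broken parabola,
# with the line "on" exactly when the squared difference exceeds alpha/kappa.
for d in np.linspace(-5, 5, 41):
    assert min(coupled_potential(d, 0), coupled_potential(d, 1)) == broken_parabola(d)

threshold = np.sqrt(4.0 / 1.0)                  # sqrt(alpha/kappa)
assert broken_parabola(threshold + 0.1) == 4.0  # boundary: constant penalty
assert broken_parabola(threshold - 0.1) < 4.0   # smooth region: quadratic penalty
```

This is why, for MAP estimation, the line field can be dropped from the optimization and recovered afterwards by thresholding the squared intensity differences.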
For the cases where only vertical and horizontal line processes are involved, two neighborhood systems (first-order and second-order) are illustrated in Figures 2.8 and 2.9. The corresponding line cliques are illustrated in Figures 2.10 and 2.11, respectively. We then replace α_ij l_ij in the weak membrane energy function, which simply assigns a penalty to each individual line site l_ij, by a more complicated energy function U_l(l) defined on the larger line cliques. Hence the new energy function is

    U(x, l) = Σ_i Σ_{j ∈ N_i, j > i} κ_ij (x_i − x_j)² (1 − l_ij) + U_l(l),    (2.13)

where U_l(l) is summed over potentials defined on all line cliques for a given neighborhood graph. By varying the potentials assigned to different line cliques, we are able to control the formation of line process configurations. In this dissertation, this prior will be referred to as the generalized weak membrane prior. Intuitively, we can view the effect of these line process interactions as the neighbors influencing the probability that a given line process will turn on. In other words, they affect the threshold in the local equivalent "broken parabola" which controls whether the line site is on or off, i.e., local thresholds are altered according to the configurations of neighboring line sites. In comparison, the weak membrane model uses only the single parameter α_ij to determine this threshold. The interaction terms are chosen to encourage or discourage the formation of certain line configurations according to our prior expectations concerning the image boundaries. Usually the formation of straight lines is favored; corners, crosses and T-junctions can be treated neutrally; and parallel lines and isolated line elements are penalized. In this dissertation, all the potentials assigned to line cliques are developed heuristically.

Figure 2.7: Sample CMRFs generated using the continuous state Metropolis algorithm from the compact space [0,100]^(M×M), where M = 128.
Top row: intensity field; bottom row: line field, from a weak membrane prior with β = 0.03 (left) and 3 (right), and δ = 10.

Figure 2.8: First-order neighbors of (a) a horizontal line site and (b) a vertical line site

Figure 2.9: Second-order neighbors of (a) a horizontal line site and (b) a vertical line site

Figure 2.10: Line clique types for the first-order neighborhood (up to rotation)

Figure 2.11: Line clique types for the second-order neighborhood (up to rotation)

2.3 Integrating Visual Discontinuities from Different Imaging Modalities through Line Processes

For a homogeneous MRF, the parameter α_ij of the weak membrane prior should be constant over the entire lattice. However, it is unrealistic to expect that all true image boundaries can be perfectly distinguished from the background using a single threshold value on the local differences. It may be desirable for the α_ij's to take different values over the image, to reflect different intensity differences between separate regions of the image. As we discussed above, including line interactions provides one way to adapt the local equivalent threshold by looking at neighboring line configurations. Below we describe how a spatially variant α_ij can be achieved by integrating information from different imaging modalities. We consider the case where the locations of discontinuities in the current unknown image are known to be strongly correlated with a computed discontinuity map obtained from another imaging modality. For example, as we will discuss in Chapter 6, the discontinuities of motion fields are often aligned with intensity edges. In Chapter 7 we describe how the functional boundaries of PET images may be aligned with anatomical boundaries extracted from MRI images of the same object. This information can be incorporated into the prior by appropriately modifying the parameter α_ij. Let the map of discontinuities computed from the secondary imaging modality be denoted by the binary image {e_ij}, with the value 1 denoting the locations of boundary elements. There are two methods for incorporating this edge map into our weak membrane prior, by defining α_ij as a function of {e_ij}:

Bi-value method: α_ij(e) = α₁ e_ij + α₂ (1 − e_ij), where α₁ ≪ α₂. This method assigns one of two values at each site, depending on whether or not there is a discontinuity at that site in the other imaging modality [17, 20].

Chamfer distance measure: α_ij(e) = a C_ij² + b, where C_ij denotes the Chamfer distance defined on the known discontinuity map and a and b are constants.

In the case where there is perfect alignment of the map {e_ij} and the locations of the boundaries of the unknown image, we expect the first method to perform well. However, if this is not the case, due to misregistration or resolution differences, then the second method may result in more robust behavior.

Chapter 3
Deterministic Relaxation Algorithms for Computing MAP Image Estimates using MRF Priors

In this dissertation, we are mainly interested in maximum a posteriori (MAP) solutions of inverse problems, which correspond to the mode of the posterior Gibbs distribution. Globally optimal MAP solutions to Bayesian inverse problems with MRF priors can be reached using stochastic relaxation (SR) algorithms such as simulated annealing [35]. However, the amount of computation required by these algorithms is huge and convergence is very slow. In most practical applications a local or sub-optimal solution obtained using fast convergent and less computationally demanding deterministic relaxation algorithms is acceptable. Many computational methods for solving these large problems have been studied in recent years. These include coordinate ascent algorithms (e.g.
This information can be incorporated into the prior by appropriately modifying the parameter \alpha_{ij}. Let the map of discontinuities computed from the secondary imaging modality be denoted by the binary image {e_{ij}}, with value 1 denoting the locations of boundary elements. There are two methods for incorporating this edge map into our weak membrane prior, by defining \alpha_{ij} as a function of {e_{ij}}:

Bi-Value Method: \alpha_{ij}(e) = \alpha_1 e_{ij} + \alpha_2 (1 - e_{ij}), where \alpha_1 << \alpha_2. This method assigns one of two values at each site depending on whether or not there is a discontinuity at that site in the other imaging modality [17, 20].

Chamfer Distance Measure: \alpha_{ij}(e) = a C_{ij} + b, where C_{ij} denotes the Chamfer distance defined on the known discontinuity map and a and b are constants.

In the case where there is perfect alignment of the map {e_{ij}} and the locations of boundaries of the unknown image, we expect the first method to perform well. However, if this is not the case due to misregistration or resolution differences, then the second method may result in more robust behavior.

Chapter 3

Deterministic Relaxation Algorithms for Computing MAP Image Estimates using MRF Priors

In this dissertation, we are mainly interested in maximum a posteriori (MAP) solutions of inverse problems, which correspond to the mode of the posterior Gibbs distribution. Globally optimal MAP solutions to Bayesian inverse problems with MRF priors can be reached using stochastic relaxation (SR) algorithms such as simulated annealing [35]. However, the amount of computation required by these algorithms is huge and convergence is very slow. In most practical applications a local or sub-optimal solution obtained using fast convergent and less computationally demanding deterministic relaxation algorithms is acceptable. Many computational methods for solving these large problems have been studied in recent years. These include coordinate ascent algorithms (e.g.
Gauss-Seidel procedures [9] and iterated conditional modes (ICM) [5]), gradient ascent methods (steepest ascent or conjugate gradient [46, 43]) and others (the iterated conditional average (ICA) [38, 30] and generalized EM methods [34]). There are no clear advantages of one algorithm over another; the performance of these algorithms is very problem dependent. In this chapter, we provide a survey of these methods and discuss their performance and applicable domains.

3.1 Preliminaries

From the prior P(x|\beta) in (2.1) and the likelihood P(y|x), the posterior distribution P(x|y,\beta) can be found from Bayes' rule:

P(x|y,\beta) = P(y|x) P(x|\beta) / P(y|\beta) = P(y|x) P(x|\beta) / \int_X P(y|x) P(x|\beta) dx.    (3.1)

It is important to note that for a wide class of likelihood functions and Gibbs priors the posterior distribution P(x|y,\beta) is again a Gibbs distribution, but over a new graph G^p = {S, N^p}. In general this neighborhood will be larger than that of the graph G of the prior. Here, S is again the lattice on which x is defined and N^p is the neighborhood system which corresponds to the MRF specified by the posterior Gibbs distribution P(x|y,\beta). In extreme cases, such as tomographic imaging with a complex mapping A in the likelihood functions (1.1) and (1.2), G^p is a fully connected graph. The posterior Gibbs distribution can be written as:

P(x|y,\beta) = (1/Z(y,\beta)) exp{-U_p(x|y,\beta)}    (3.2)

where U_p(x|y,\beta) = -ln P(y|x) + \beta U(x) is the effective Hamiltonian of the posterior Gibbs distribution, and Z(y,\beta) is the corresponding posterior partition function, in the form

Z(y,\beta) = \int_X exp{-U_p(x|y,\beta)} dx = \int_X exp{ln P(y|x) - \beta U(x)} dx.    (3.3)

Note that the partition function is a function of both the hyperparameter and the data y and involves integration over the entire image sample space X. Similarly, we can write the posterior Gibbs distribution for a compound MRF as:

P(x,l|y,\beta) = (1/Z(y,\beta)) exp{-U_p(x,l|y,\beta)}    (3.4)

where U_p(x,l|y,\beta) = -ln P(y|x) + \beta U(x,l) and Z(y,\beta) = \int_X \sum_l exp{ln P(y|x) - \beta U(x,l)} dx.
Therefore, given the observed data y, a MAP estimator is a point estimate of the image chosen as the mode of the posterior density. From the Bayesian point of view, this corresponds to a zero-one loss function. For a prescribed hyperparameter value, the MAP estimators \hat{x} or (\hat{x}, \hat{l}) are the maximizers of the following log-posterior functions, respectively:

\hat{x}(\beta) = arg max_x ln P(x|y,\beta) = arg max_x {ln P(y|x) - \beta U(x) - ln Z(y,\beta)},
(\hat{x}, \hat{l}) = arg max_{x,l} ln P(x,l|y,\beta) = arg max_{x,l} {ln P(y|x) - \beta U(x,l) - ln Z(y,\beta)}.

ln Z(y,\beta) is not a function of x, and therefore it does not affect the MAP estimate. Equivalently we have:

\hat{x}(\beta) = arg max_x {ln P(y|x) - \beta U(x)},    (3.5)
(\hat{x}, \hat{l}) = arg max_{x,l} {ln P(y|x) - \beta U(x,l)}.    (3.6)

Computing the MAP estimate is a global optimization problem on one of the above cost functions. This global optimum can be reached with probability one using a stochastic sampler, such as the Metropolis algorithm or the Gibbs sampler, together with a simulated annealing procedure. However, the amount of computation involved is too large to apply to the image processing problems considered in this dissertation. Convergence is extremely slow: to ensure convergence to a global maximum, the temperature annealing schedule must decrease no faster than T^{(n)} = C / ln(1 + n) for discrete valued problems, and a comparably slow logarithmic schedule is required for continuous valued problems. Therefore, in this dissertation, we favor fast convergent deterministic relaxation algorithms, though in general they are likely to converge to a suboptimal (locally optimal) solution.

3.2 Deterministic Relaxation Algorithms for MAP Estimation Using Priors with Image Intensity Processes Only

When no line processes are used and both the log-likelihood function and the Gibbs-prior Hamiltonian are differentiable with respect to the intensity processes x, the log-posterior f(x) = ln P(x|y,\beta) is differentiable and gradient ascent (GA) algorithms [37] are applicable for MAP estimation.
These algorithms are usually only guaranteed to converge to a local optimum, except for convex f(x), and the location of this local optimum is sensitive to the initialization of the algorithm. A special problem involved with gradient-type deterministic MAP algorithms is enforcing non-negativity of the pixel values. For many image processing applications, such as tomographic image reconstruction, it makes no physical sense for the image values to be negative, hence the MAP estimate is in fact a solution of the following constrained problem:

max f(x) subject to x >= 0    (3.7)

where f(x) is the original log-posterior function. Usually constrained optimization is much more complicated to solve, so we would rather use unconstrained alternatives to the constrained problem, to which gradient-type algorithms are directly applicable. Here, we propose using a penalty method to convert this constrained problem into an unconstrained optimization problem [37]. The penalty method adds to the cost function a penalty term which penalizes values that lie outside the allowed interval. Associated with this method is a parameter \gamma that determines the relative weight between the penalty and the original cost function. As this parameter goes to zero, the solution of the new unconstrained problem approaches the solution of the original constrained problem. We replace the problem (3.7) by an unconstrained problem of the form

max_x q(\gamma, x) = f(x) - (1/\gamma) G(x)    (3.8)

where \gamma is a positive constant and G is a continuous function on X satisfying (i) G(x) >= 0 for all x in X, and (ii) G(x) = 0 iff x >= 0. When a gradient based technique is used to solve the new problem, it is also preferable that the function have a continuous first derivative. The procedure for solving (3.7) is the following: let {\gamma_k}, k = 1, 2, ..., be a sequence tending to zero such that \gamma_k > 0 and \gamma_k > \gamma_{k+1}. For each k, find the x_k that maximizes q(\gamma_k, x).
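The penalty formulation (3.8) can be sketched numerically. The following toy one-dimensional example (our own illustration, not from the dissertation) maximizes f(x) = -(x + 2)^2 subject to x >= 0, whose constrained maximizer is x* = 0, by plain gradient ascent on q(\gamma, x) with the quadratic penalty G(x) = x^2 u(-x):

```python
# Toy illustration of the penalty method (3.7)-(3.8): maximize
# f(x) = -(x + 2)^2 subject to x >= 0 (constrained maximizer x* = 0)
# via gradient ascent on q(gamma, x) = f(x) - (1/gamma) * G(x),
# where G(x) = x^2 for x < 0 and 0 otherwise (quadratic penalty).

def grad_q(x, gamma):
    g = -2.0 * (x + 2.0)          # f'(x)
    if x < 0.0:                   # penalty gradient: -(1/gamma) * 2x
        g -= (2.0 / gamma) * x
    return g

def penalized_ascent(x0, gamma, step=5e-4, iters=20000):
    x = x0
    for _ in range(iters):
        x += step * grad_q(x, gamma)
    return x

x_hat = penalized_ascent(x0=1.0, gamma=1e-3)
# The unconstrained maximizer of f alone is x = -2; the penalty pushes
# the solution to a slightly negative value that approaches 0 as gamma -> 0.
print(round(x_hat, 4))  # -> -0.002
```

With \gamma = 10^{-3} the fixed point sits at x = -4/2002, illustrating the text's observation that a single well-chosen \gamma can be held constant without producing significant negative values.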
Then any limit point of the sequence {x_k} is a solution to (3.7) [37]. In image restoration, we use the following quadratic penalty function:

G(x) = \sum_i x_i^2 u(-x_i)    (3.9)

where u(.) is the unit step function. Although in theory a sequence of solutions should be generated corresponding to a decreasing sequence in the parameter \gamma, in practice we find that as long as an appropriate value of \gamma is used it can be held constant throughout the iteration process without either significantly reducing the convergence rate of the algorithm or resulting in significant negative pixel values. Two types of deterministic ascent algorithms can be applied to optimize the above modified unconstrained problem: the gradient ascent algorithm and the coordinate ascent algorithm.

3.2.1 Gradient Ascent Algorithms

We consider two popular forms of gradient ascent: steepest ascent and conjugate gradient algorithms. Steepest ascent iteratively searches for a maximum using a one-dimensional line search in the direction of the current gradient of the cost function:

x^{(n+1)} = x^{(n)} + \mu^{(n)} \nabla_x q(\gamma, x^{(n)})    (3.10)

where the step size \mu^{(n)} should be determined by a line search as:

\mu^{(n)} = arg max_\mu q(\gamma, x^{(n)} + \mu \nabla_x q(\gamma, x^{(n)})).    (3.11)

Perhaps the most frequently used line search is the Newton-Raphson procedure:

\mu^{(k+1)} = \mu^{(k)} - [\partial q(\gamma, x^{(n)}) / \partial\mu] / [\partial^2 q(\gamma, x^{(n)}) / \partial\mu^2].    (3.12)

Unfortunately, steepest ascent converges very slowly when the condition number \kappa(H) = \lambda_max(H) / \lambda_min(H) of the Hessian matrix H is large (\lambda_max(H) and \lambda_min(H) denote the largest and smallest eigenvalues of the matrix H, respectively) [37].

Conjugate gradient algorithms improve the convergence speed greatly by successive maximization of the cost function along a set of directions chosen to avoid the pitfalls of steepest ascent. At iteration n+1 the search direction is conjugate to the previous ascent direction [37]. The most frequently used form of conjugate gradient is as follows:
1. Initial value: x^{(0)}. Evaluate g^{(0)} = \nabla q(\gamma, x^{(0)}) and set d^{(0)} = g^{(0)}.

2. For n = 0, 1, ... until convergence:
(a) Set x^{(n+1)} = x^{(n)} + \mu^{(n)} d^{(n)}, where \mu^{(n)} = arg max_\mu q(\gamma, x^{(n)} + \mu d^{(n)}).
(b) Compute g^{(n+1)} = \nabla q(\gamma, x^{(n+1)}).
(c) Set d^{(n+1)} = g^{(n+1)} + \eta^{(n)} d^{(n)}, where

\eta^{(n)} = (g^{(n+1)} - g^{(n)})^T g^{(n+1)} / (g^{(n)})^T g^{(n)}.    (3.13)

3. Replace x^{(n)} by x^{(n+1)} and go back to Step 2.

The convergence rates of conjugate gradient and steepest ascent can be accelerated by using an appropriate preconditioner. The form of the preconditioner is problem dependent. An example of using a proper preconditioner to accelerate convergence for PET image reconstruction is given in Chapter 6.

3.2.2 Coordinate Ascent Algorithms

Coordinate ascent algorithms are sometimes attractive because they are easy to implement and it is easy to incorporate a non-negativity constraint; however, their convergence properties are poorer than gradient ascent in general. Coordinate ascent algorithms change only a single site x_i in seeking a new and better image x, by solving the following line search problem:

x_i^{(n+1)} = arg max_{x_i} f(x_1^{(n+1)}, ..., x_{i-1}^{(n+1)}, x_i, x_{i+1}^{(n)}, ..., x_M^{(n)}) subject to x_i >= 0.    (3.14)

Figure 3.1: Values of the log-posterior function versus number of iterations for restoration of an image from additive Gaussian noise using the conjugate gradient method and the Gauss-Seidel coordinate ascent method: (a) with the convex quadratic prior, (b) with the non-convex Geman and McClure prior.

For computing the MAP solution, one form of coordinate ascent algorithm is J. Besag's iterated conditional modes (ICM) [5]. ICM computes a MAP estimate by maximizing the conditional probability of each pixel given the current estimates elsewhere and the observed data.
We note that

P(x_i | x_{S\i}, y, \beta) \propto P(x | y, \beta)    (3.15)

and

P(x_i | x_{S\i}, y, \beta) \propto P(y | x_i, x_{S\i}) P(x_i | x_{\partial i}).    (3.16)

Therefore, successive maximization over the conditional probabilities P(x_i | x_{S\i}, y, \beta), or instead over P(y | x_i, x_{S\i}) P(x_i | x_{\partial i}), ensures that P(x | y, \beta) never decreases and thus will converge to a local maximum. Maximization over P(y | x_i, x_{S\i}) P(x_i | x_{\partial i}) is equivalent to coordinate-wise ascent on f(x).

3.2.3 Discussion

Coordinate ascent algorithms can be implemented very easily because at each site the optimization reduces to a simple one-variable line search problem. Furthermore, these procedures can easily deal with positivity constraints. For continuously differentiable cost functions, these procedures can employ a Newton-Raphson line search, although in many cases each maximization step has a trivial closed form solution. According to Sauer and Bouman [47, 9] and our own experiments (Fig. 3.1(a)), for some well behaved convex posterior functions, such as a Gaussian posterior distribution, coordinate ascent performed in a Gauss-Seidel type procedure converges faster than the steepest ascent and conjugate gradient algorithms. Although conjugate gradient is relatively difficult to implement, especially when there is a positivity constraint involved, it does appear to converge to a better local optimum for non-convex problems when the conditioning of the optimization problem becomes worse. For instance, for a MAP problem using the non-convex Geman and McClure prior, coordinate ascent (ICM and Gauss-Seidel) appears to converge to a "worse" local optimum (Fig. 3.1(b)). In most cases, coordinate ascent is more sensitive to initialization.
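To make the coordinate ascent update (3.14) concrete, the following sketch (our own toy 1-D denoising example, not from the dissertation) uses a Gaussian likelihood with a quadratic smoothness prior, so that each Gauss-Seidel coordinate maximization has a closed form, clipped at zero for the positivity constraint:

```python
# Toy Gauss-Seidel coordinate ascent for a 1-D MAP denoising problem:
# maximize f(x) = -sum_i (y_i - x_i)^2 / (2 s2) - beta * sum_i (x_i - x_{i-1})^2
# subject to x_i >= 0. Each coordinate step is a clipped closed-form update.

def log_posterior(x, y, s2, beta):
    like = -sum((yi - xi) ** 2 for yi, xi in zip(y, x)) / (2.0 * s2)
    prior = -beta * sum((x[i] - x[i - 1]) ** 2 for i in range(1, len(x)))
    return like + prior

def icm_sweeps(y, s2=1.0, beta=0.5, sweeps=50):
    x = list(y)                         # initialize at the data
    n = len(x)
    for _ in range(sweeps):
        for i in range(n):
            num, den = y[i] / s2, 1.0 / s2
            if i > 0:                   # left neighbor term
                num += 2.0 * beta * x[i - 1]; den += 2.0 * beta
            if i < n - 1:               # right neighbor term
                num += 2.0 * beta * x[i + 1]; den += 2.0 * beta
            x[i] = max(0.0, num / den)  # positivity constraint by clipping
    return x

y = [-1.0, 0.5, 4.0, 3.5, 4.5, 0.2]
x_hat = icm_sweeps(y)
```

Because each coordinate function is a concave quadratic, clipping the unconstrained maximizer at zero yields the exact constrained coordinate maximum, and the log-posterior is non-decreasing from sweep to sweep.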
3.3 Deterministic Annealing Procedures for MAP Estimation Using Priors with Both Intensity and Line Processes

To describe the algorithms developed here more clearly, we use the following set of notations, which is different from the lexicographically ordered notation used before: S is the lattice indexed by pairs (i,j); y_{i,j} and x_{i,j} are the pixels from the observed image and the unknown image at site (i,j); and l^v_{i,j} and l^h_{i,j} denote the vertical and horizontal line sites between pixels (i,j) and (i,j-1), and between pixels (i,j) and (i-1,j), respectively. All discussions in this section are based on a first-order neighborhood system. It is straightforward to extend them to more complicated neighborhood systems; for instance, for a second-order neighborhood, diagonal line processes should be considered.

Because l^v = {l^v_{i,j}} and l^h = {l^h_{i,j}} are binary fields, gradient ascent algorithms are not directly applicable. Coordinate ascent algorithms are still applicable. However, when the ICM method is applied to the estimation of line processes, it has been observed that the line estimates converge much faster than the image intensities. It is intuitively more appealing to have an updating procedure in which the rate of convergence of the line processes can be controlled by allowing the variables l^v_{i,j} and l^h_{i,j} to assume values on the continuous interval [0,1] but eventually converging to one of the end points (i.e. 0 or 1). In this case, a zero-one decision about the presence of a boundary is delayed until later in the iteration process, when the intensity processes have converged adequately. Below we discuss two continuation methods, namely the Graduated Non-Convexity algorithm (GNC) [8] and the deterministic annealing algorithm.
3.3.1 MAP estimation for the weak membrane model without line interaction priors

If we assume no line interactions as described in Section 2.2.2, then each site in the line processes is conditionally independent of the others given the intensity processes. MAP estimation is equivalent to minimizing the following cost function:

F = -ln P(y|x) + \beta \sum_{i,j} [(x_{i,j} - x_{i-1,j})^2 (1 - l^h_{i,j}) + \alpha^h_{i,j} l^h_{i,j}] + \beta \sum_{i,j} [(x_{i,j} - x_{i,j-1})^2 (1 - l^v_{i,j}) + \alpha^v_{i,j} l^v_{i,j}].    (3.17)

For convenience, let D = -ln P(y|x) and \alpha^v_{i,j} = \alpha^h_{i,j} = \alpha. The problem then reduces to

min_{x, l^v, l^h} F = min_x { D + \beta \sum_{i,j} min_{l^h_{i,j}} [(x_{i,j} - x_{i-1,j})^2 (1 - l^h_{i,j}) + \alpha l^h_{i,j}] + \beta \sum_{i,j} min_{l^v_{i,j}} [(x_{i,j} - x_{i,j-1})^2 (1 - l^v_{i,j}) + \alpha l^v_{i,j}] }
= min_x { D + \beta \sum_{i,j} g_\alpha(x_{i,j} - x_{i-1,j}) + \beta \sum_{i,j} g_\alpha(x_{i,j} - x_{i,j-1}) }    (3.18)

where

g_\alpha(t) = min(t^2, \alpha) = { t^2, if t^2 <= \alpha;  \alpha, if t^2 > \alpha }.    (3.19)

The minimization problem posed above is unsolvable by techniques such as gradient descent, since it is neither convex nor differentiable. The non-convexity may also give rise to many local minima. Blake and Zisserman [8] proposed a continuation method, approximating g_\alpha with the piecewise smooth function

Figure 3.2: The effective energy function of GNC

g^*(t) = { t^2, if |t| < q;  \alpha - c(|t| - r)^2 / 2, if q <= |t| < r;  \alpha, if |t| >= r }    (3.20)

where c = c^*/p, c^* is a constant dependent upon the convexity of D, and

r^2 = \alpha (2/c + 1)  and  q = \alpha / r.    (3.21)

The above equations define the Graduated Non-Convexity algorithm [8]: reducing the parameter p from 1 to 0 steadily changes g^* until it becomes precisely equal to g_\alpha. This produces a family of prior energies. An illustration of the potential functions discussed before is shown in Figure 3.2. The process of gradually reducing p begins by minimizing a function F^{(p=1)} which is convex, and therefore has a unique minimum. Then, from that minimum, the local minimum is tracked continuously as p is reduced from 1 to 0 by minimization of F^{(p)}. At each step of the continuation, either a coordinate or gradient ascent algorithm could be used.
A more appealing algorithm with the formalism of GNC is the deterministic annealing procedure. It was first suggested by Geiger and Girosi in their innovative paper [17]. By imitating the temperature in stochastic simulated annealing algorithms [18], deterministic annealing also introduces one more degree of freedom, the temperature, into the posterior density by defining:

P_T(x, l | y, \beta) = P(x, l | y, \beta)^{1/T} = (1 / Z_T(y, \beta)) exp{ -U(x, l, y | \beta) / T }    (3.22)

where U(x, l, y | \beta) is the posterior Hamiltonian in the form of (3.17). Note that the mode of the modified log-posterior function ln P_T(x, l | y, \beta) is identical to that of the original one. Again, because we assume no line interaction terms, the vertical and horizontal line processes in the conditional distribution are independent. Hence, we are able to evaluate the posterior marginal density P_T(x|y) by summing out the binary line processes as below [6, 17, 20]:

P_T(x | y, \beta) = \sum_{\{l\}} P_T(x, l | y, \beta)
= (1/Z_T(y,\beta)) exp{(1/T) ln P(y|x)} \prod_{i,j} [exp(-\beta (x_{i,j} - x_{i,j-1})^2 / T) + exp(-\beta\alpha/T)] \prod_{i,j} [exp(-\beta (x_{i,j} - x_{i-1,j})^2 / T) + exp(-\beta\alpha/T)]
= (1/Z_T(y,\beta)) exp{ (1/T) ( ln P(y|x) - [\sum_{i,j} h_T(x_{i,j} - x_{i,j-1}) + \sum_{i,j} h_T(x_{i,j} - x_{i-1,j})] ) }
=: (1/Z_T(y,\beta)) exp{ -U^{eff}_T(x | y, \beta) / T }    (3.23)

Figure 3.3: The effective energy function of MFA

where Z_T(y,\beta) is the partition function corresponding to the marginal posterior at temperature T and h_T(t) is a family of continuous functions, parameterized by T:

h_T(t) = -T log[ exp(-\beta t^2 / T) + exp(-\beta\alpha / T) ]    (3.24)

and U^{eff}_T(x | \beta)
and U^{eff}_T(x | y, \beta) are the effective Gibbs Hamiltonian functions for the marginal prior distribution and the marginal posterior distribution. Examples of the continuous function h_T(t) are shown in Figure 3.3 for various values of T. When T is high, h_T(t) is nearly convex. When T decreases, the non-convexity of h_T(t) increases, and it can be shown that in the limit as T -> 0, h_T(t) converges to the broken parabola; therefore any x which maximizes P_T(x|y) at T = 0 is also a maximizer of the posterior with the broken parabolic prior (3.19). In turn, any x which is a MAP estimate over the broken parabolic prior is also an image which maximizes the original posterior density with the weak membrane prior without line interactions. Thus T parameterizes a sequence of functions such that the x*_T which maximizes these functions converges to the x* which solves the original MAP optimization problem as T -> 0. Therefore, this deterministic annealing procedure solves the original MAP problem with the weak membrane prior in the GNC formalism [6].

The deterministic annealing algorithm is summarized as follows:

1. Start from a high temperature T^{(1)} = T_0, at which U^{eff}_T(x|y,\beta) is nearly convex, and set n = 1.

2. Use a deterministic optimization algorithm to compute the maximizer of the marginal posterior, ln P_T(x|y,\beta), at the current temperature T^{(n)}.

3. Reduce the temperature T to T^{(n+1)} according to a predefined schedule. Initialize the algorithm with the solution from the previous T^{(n)}, then go back to Step 2. Repeat until T is approximately 0.

We note that deterministic annealing has the formalism of GNC: it starts from a unimodal convex optimization problem and then gradually changes the cost function to approach the non-convex cost function. The solution at each stage will ideally follow the changes of the cost function, to terminate at a good (or global) optimum.
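The family h_T(t) in (3.24) is easy to evaluate numerically. The short sketch below (our own illustration, using the arbitrary values \beta = 1 and \alpha = 4, so the line threshold is |t| = 2) uses a log-sum-exp form to avoid underflow and checks the pointwise convergence of h_T to the broken parabola min(\beta t^2, \beta\alpha) as T -> 0:

```python
import math

def h_T(t, T, beta=1.0, alpha=4.0):
    # h_T(t) = -T * log( exp(-beta t^2 / T) + exp(-beta alpha / T) ),
    # evaluated with the log-sum-exp trick for numerical stability.
    a = -beta * t * t / T
    b = -beta * alpha / T
    m = max(a, b)
    return -T * (m + math.log(math.exp(a - m) + math.exp(b - m)))

def broken_parabola(t, beta=1.0, alpha=4.0):
    return min(beta * t * t, beta * alpha)

# At high temperature h_T is smooth and nearly convex; as T -> 0 it
# converges pointwise to the broken parabola g_alpha of (3.19).
for t in (0.5, 1.0, 1.9, 2.1, 3.0):
    assert abs(h_T(t, T=1e-3) - broken_parabola(t)) < 1e-2
```

The naive expression would underflow at small T (the exponents reach the hundreds of thousands here); factoring out the larger exponent keeps the evaluation finite at every temperature on the annealing schedule.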
3.3.2 MAP estimation for the generalized weak membrane model

When there are complicated line interaction cliques in the prior, i.e. the generalized weak membrane prior, line sites are dependent on each other, and it is impossible to analytically sum out the line processes and evaluate the marginal posterior density as before. However, it is still possible to develop a deterministic procedure to compute a good MAP solution. Insight is gained by analyzing the preceding deterministic annealing algorithm: if we differentiate the effective Hamiltonian of the posterior marginal with respect to x_{i,j} and set the result equal to zero at each temperature T, we obtain the following equation:

0 = \partial U^{eff}_T(x) / \partial x_{i,j}
= -\partial ln P(y|x) / \partial x_{i,j}
+ 2\beta (x_{i,j} - x_{i,j-1}) (1 - exp(-\beta\alpha/T) / [exp(-\beta(x_{i,j} - x_{i,j-1})^2/T) + exp(-\beta\alpha/T)])
+ 2\beta (x_{i,j} - x_{i,j+1}) (1 - exp(-\beta\alpha/T) / [exp(-\beta(x_{i,j+1} - x_{i,j})^2/T) + exp(-\beta\alpha/T)])
+ 2\beta (x_{i,j} - x_{i-1,j}) (1 - exp(-\beta\alpha/T) / [exp(-\beta(x_{i,j} - x_{i-1,j})^2/T) + exp(-\beta\alpha/T)])
+ 2\beta (x_{i,j} - x_{i+1,j}) (1 - exp(-\beta\alpha/T) / [exp(-\beta(x_{i+1,j} - x_{i,j})^2/T) + exp(-\beta\alpha/T)]).    (3.25)

The deterministic optimization algorithms used in Step 2 of the deterministic annealing algorithm to compute the maxima of ln P_T(x|y,\beta) at each temperature T are actually trying to solve this equation at each temperature T. If we define two intermediate variables [21]:
\bar{l}^v_{i,j} = exp(-\beta\alpha/T) / [exp(-\beta(x_{i,j} - x_{i,j-1})^2/T) + exp(-\beta\alpha/T)],
\bar{l}^h_{i,j} = exp(-\beta\alpha/T) / [exp(-\beta(x_{i,j} - x_{i-1,j})^2/T) + exp(-\beta\alpha/T)],

we notice that the local conditional probabilities for each line site at temperature T are:

P_T(l^v_{i,j} | \partial l^v_{i,j}, x) = exp{-(1/T)[\beta(x_{i,j} - x_{i,j-1})^2 (1 - l^v_{i,j}) + \beta\alpha l^v_{i,j}]} / \sum_{l^v_{i,j}=0}^{1} exp{-(1/T)[\beta(x_{i,j} - x_{i,j-1})^2 (1 - l^v_{i,j}) + \beta\alpha l^v_{i,j}]},

P_T(l^h_{i,j} | \partial l^h_{i,j}, x) = exp{-(1/T)[\beta(x_{i,j} - x_{i-1,j})^2 (1 - l^h_{i,j}) + \beta\alpha l^h_{i,j}]} / \sum_{l^h_{i,j}=0}^{1} exp{-(1/T)[\beta(x_{i,j} - x_{i-1,j})^2 (1 - l^h_{i,j}) + \beta\alpha l^h_{i,j}]}.

Therefore, \bar{l}^v_{i,j} = P_T(l^v_{i,j} = 1 | x) and \bar{l}^h_{i,j} = P_T(l^h_{i,j} = 1 | x). For a zero-one valued random variable, the mean value equals the probability that the variable takes value one; thus \bar{l}^v_{i,j} = <l^v_{i,j}> and \bar{l}^h_{i,j} = <l^h_{i,j}>.

For clarity of notation, we will use <l^v_{i,j}> and <l^h_{i,j}> to indicate the conditional mean of each line site from now on. Substituting this relationship back into (3.25), we get

\partial U^{eff}_T(x) / \partial x_{i,j} = -\partial ln P(y|x) / \partial x_{i,j} + 2\beta(x_{i,j} - x_{i,j-1})(1 - <l^v_{i,j}>) + 2\beta(x_{i,j} - x_{i,j+1})(1 - <l^v_{i,j+1}>) + 2\beta(x_{i,j} - x_{i-1,j})(1 - <l^h_{i,j}>) + 2\beta(x_{i,j} - x_{i+1,j})(1 - <l^h_{i+1,j}>).    (3.31)

The solution of the above equation is the minimizer of the cost function:

F = -ln P(y|x) + \beta \sum_{i,j} { (x_{i,j} - x_{i,j-1})^2 (1 - <l^v_{i,j}>) + \alpha <l^v_{i,j}> + (x_{i,j} - x_{i-1,j})^2 (1 - <l^h_{i,j}>) + \alpha <l^h_{i,j}> }.    (3.32)

We also note that when T is high, <l^v_{i,j}> and <l^h_{i,j}> take values between 0 and 1. When T -> 0, <l^v_{i,j}> and <l^h_{i,j}> converge to either 0 or 1. In other words, the cost function in (3.32) converges to the weak membrane model as T -> 0.

The above analysis reveals the very important fact that the deterministic annealing procedure replaces the binary valued line processes in the original posterior with their conditional means, parameterized by the annealing temperature T. These conditional means take values from the continuous interval [0,1] when T is high, but are forced to take values of either 0 or 1 as T -> 0. Therefore, an equivalent deterministic annealing algorithm is as follows:

Step 1. Compute the mean values <l^v_{i,j}> and <l^h_{i,j}> of each line site at the current temperature, given the current estimate of the image intensity.
Step 2. Freeze all the line sites at their mean values at the current temperature, and maximize the posterior with respect to the intensity processes only.

Step 3. Anneal (reduce) the temperature T and repeat the above two steps.

It is also reasonable to apply the same idea here to compute the MAP estimate when there are line-site interaction potentials. Of course, computing the true mean value of each site of the line processes is much more complicated when there are line interaction cliques, since the variables are no longer conditionally independent. However, an approximate solution can be reached via an iterative algorithm known as the iterated conditional average (ICA) procedure [34, 30]. Thus, we propose the following two-step deterministic annealing algorithm for computing the MAP estimate of the posterior using line interaction terms. Start from a high temperature T_0:

Step 1 (ICA-line). Compute the mean values <l^v_{i,j}> and <l^h_{i,j}> of each line site at the current temperature using the current estimate of the image intensity and the ICA procedure:

<l^h_{i,j}>^{(n+1)} = exp{-(1/T) E(l^h_{i,j} = 1, <l>^{(n)})} / ( exp{-(1/T) E(l^h_{i,j} = 0, <l>^{(n)})} + exp{-(1/T) E(l^h_{i,j} = 1, <l>^{(n)})} )

and similarly for <l^v_{i,j}>^{(n+1)}, where E(l^h_{i,j} = k, <l>^{(n)}) = \beta (x_{i,j} - x_{i-1,j})^2 (1 - k) + V_l(l^h_{i,j} = k, <l>^{(n)}), and V_l(l^h_{i,j} = 1, <l>^{(n)}) denotes the sum of all the interaction potentials which contain the current line site, evaluated with the neighboring line sites frozen at their current mean values <l>^{(n)}.

Step 2. Freeze all the line sites at their mean values at the current temperature, and maximize the posterior with respect to the intensity processes, i.e.

x^{(n+1)} = arg max_x P(x, <l>^{(n+1)} | y, \beta).

Step 3. Anneal (reduce) the temperature T and repeat the above two steps.

3.4 Experiments and Conclusions

We have tested the above algorithms on the following two inverse problems: surface reconstruction and image restoration.
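The two-step annealing procedure can be sketched in miniature. The following toy example (our own illustration, not from the dissertation) uses a 1-D weak membrane without line interactions, so Step 1 has the closed form <l_i> = 1 / (1 + exp(\beta(\alpha - d_i^2)/T)) rather than requiring ICA; it alternates the line-mean update with Gauss-Seidel maximization over the intensities while the temperature is reduced:

```python
import math

# Toy two-step deterministic annealing for a 1-D weak membrane model.
# Line site l_i sits between x_{i-1} and x_i, with d_i = x_i - x_{i-1}.
# Step 1: line means <l_i> = 1 / (1 + exp(beta * (alpha - d_i^2) / T)).
# Step 2: maximize the posterior over x with the line means frozen.
# Step 3: reduce T and repeat.

def line_means(x, T, beta, alpha):
    means = []
    for i in range(1, len(x)):
        d2 = (x[i] - x[i - 1]) ** 2
        z = max(min(beta * (alpha - d2) / T, 500.0), -500.0)  # overflow guard
        means.append(1.0 / (1.0 + math.exp(z)))
    return means

def intensity_sweep(x, y, lm, beta, s2=1.0, sweeps=30):
    n = len(x)
    for _ in range(sweeps):
        for i in range(n):
            num, den = y[i] / s2, 1.0 / s2
            if i > 0:                      # bond weight (1 - <l_i>) on the left
                w = 2.0 * beta * (1.0 - lm[i - 1])
                num += w * x[i - 1]; den += w
            if i < n - 1:                  # bond weight on the right
                w = 2.0 * beta * (1.0 - lm[i])
                num += w * x[i + 1]; den += w
            x[i] = num / den
    return x

def anneal(y, beta=1.0, alpha=4.0, T0=10.0, rate=0.7, steps=40):
    x, T = list(y), T0
    for _ in range(steps):
        lm = line_means(x, T, beta, alpha)   # Step 1
        x = intensity_sweep(x, y, lm, beta)  # Step 2
        T *= rate                            # Step 3
    return x, line_means(x, T, beta, alpha)

y = [0.0, 0.1, -0.1, 0.0, 10.0, 9.9, 10.1, 10.0]
x_hat, lm = anneal(y)
```

On this step signal the line mean at the jump is driven to 1 while the others go to 0, so the smoothing acts within the two flat segments but not across the discontinuity.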
For the surface reconstruction experiment, the original surface shown in Figure 3.4(a) is designed to contain four typical shapes: a cylinder, a cone, a quadratic surface and a Gaussian surface. The single pixel sample space of this surface is [0,220]. The degraded surface shown in Figure 3.4(b) is generated by blurring the original surface with a uniform 3 x 3 kernel and adding Gaussian noise N(0, 20^2). Observing the Bayesian reconstructions of these surfaces, we can clearly see the behavior of various MRF priors for the different intensity patterns. The reconstructed surfaces using different priors are shown in Figures 3.4(c), (d) and 3.5. For the image restoration experiment, the original image is "Elaine", shown in Figure 3.6(a). The single pixel sample space of the "Elaine" image is [0,255]. Again, it is blurred by a uniform 3 x 3 kernel and then contaminated by additive Gaussian noise N(0, 20^2). The noisy data are shown in Figure 3.6(b). Through restoration of this image, we are able to judge the quality of the MAP image estimates associated with different MRF priors and the deterministic MAP algorithms. For both experiments, the likelihood functions are the linear Gaussian model of (1.1) with known noise variance and linear degradation operator. In these two experiments, we used the penalized conjugate gradient algorithm to compute the MAP estimate from the noisy data when only intensity processes are used in the prior. We used the penalized conjugate gradient algorithm combined with the deterministic annealing procedure to compute the MAP estimate when compound MRF priors are used.

We draw the following conclusions about the algorithms discussed in this chapter: conjugate gradient is applicable for computing the MAP estimate as long as the posterior Hamiltonian is differentiable everywhere. It appears to reach a reasonable sub-optimum even if the cost function is non-convex.
Usually the conjugate gradient technique converges to a better local optimum than coordinate ascent procedures when the cost function is non-convex. However, coordinate ascent performed in sequential order (a Gauss-Seidel like procedure) may have faster convergence for optimizing well conditioned, strictly convex problems. It is also much easier to implement when there is a positivity constraint or line processes associated with the posterior density. The most appealing algorithm for computing MAP estimates from the posterior with CMRF priors is the deterministic annealing procedure. When line interaction terms exist in the posterior, an appropriate algorithm is a two-step iteration accompanied by temperature annealing: (1) use gradient or coordinate ascent to maximize the posterior with respect to the intensity processes while freezing the line processes to their means; (2) use ICA to compute the mean values of the line processes. Although these deterministic annealing procedures do not guarantee global convergence in probability as stochastic annealing procedures do, they do appear to converge to a much better local optimum than directly optimizing the original non-convex function without using temperature annealing.

We also make some additional comments on MRF priors here, through observing their effects on MAP restoration of surfaces and images. As we desired, MRF priors with semi-convex or non-convex potential functions (including CMRF priors using line processes) can both model and preserve sharp image boundaries. However, highly non-convex priors such as the Geman and McClure prior and CMRF priors tend to put too much emphasis on image boundaries and to encourage nearly piecewise constant images (we also observed this in the sampled images from the priors in the last chapter). These piecewise constant patterns may cause unnatural effects in Bayesian image estimates when the original image is not piecewise constant.

(a)
Original Surface, (b) Noise Degraded Surface, (c) Quadratic prior at \beta = 0.0051, (d) Huber prior at \beta = 0.042, \delta = 10.

Figure 3.4: Surface reconstruction experiment: (a) original surface pattern, (b) noise contaminated surface, N(0, 20^2), (c) MAP reconstruction with quadratic prior, with estimated \beta = 0.0051, (d) MAP reconstruction with Huber prior, estimated \beta = 0.042, \delta = 10.

(e) Leahy and Hebert prior, at \beta = 0.05, \delta = 10, (f) Geman and McClure prior, at \beta = 0.4, \delta = 10, (g) weak membrane prior at \beta = 0.004, \alpha = 10, (h) generalized weak membrane at \beta = 0.004.

Figure 3.5: Surface reconstruction experiment (continued): MAP reconstruction with (e) Leahy and Hebert prior, at \beta = 0.05, \delta = 10, (f) Geman and McClure prior, at \beta = 0.4, \delta = 10, (g) weak membrane prior, at \beta = 0.004 and \alpha = 10, (h) generalized weak membrane prior, at \beta = 0.004 and the potentials for the line cliques determined by trial-and-error.

(a) Original, (b) Noise Degraded, (c) quadratic prior at \beta = 0.000994, (d) Huber prior at \beta = 0.012, \delta = 10.

Figure 3.6: Image restoration experiment (Elaine image): (a) original image, (b) noise contaminated image, N(0, 20^2), MAP restoration with (c) quadratic prior, at \beta = 0.001, (d) Huber prior, at \beta = 0.012, \delta = 10.

(e) Leahy and Hebert prior, at \beta = 0.1, \delta = 8, (f) Geman and McClure prior, at \beta = 0.3, \delta = 10, (g) weak membrane prior, at \beta = 0.001, \delta = 12, (h) generalized weak membrane, at \beta = 0.001.

Figure 3.7: Image restoration experiment (Elaine image, continued): MAP restorations using (e) Leahy and Hebert prior, at \beta = 0.1, \delta = 8, (f) Geman and McClure prior, at \beta = 0.3, \delta = 10, (g) weak membrane prior, at \beta = 0.001, \delta = 12, and (h) generalized weak membrane prior, at \beta = 0.001, with the potentials for line cliques obtained by trial-and-error.
Chapter 4

Mean Field Theory: A Tool to Simplify the Analysis of Joint Gibbs Distributions

As we observed in the previous chapters, Gibbs prior distributions are abundant and sufficiently flexible to model a wide range of images. Gibbs posterior distributions are the basis for computing optimal Bayesian solutions of inverse problems. We will also show in a later chapter that the information carried by these two distributions is sufficient for computing a maximum likelihood estimate of the global hyperparameter. However, because of the complexity and dimensionality of image processing applications, analysis and computation based on Gibbs distributions can be extremely difficult and sometimes impossible. For instance, for a moderate size image with 256 x 256 pixels, evaluation of the exact value of the partition functions defined in (2.2) and (3.3) requires a 256 x 256 dimensional integral. Often for these large problems some form of approximation must be employed. Several useful approximation tools of this type have their origins in modern statistical mechanics, where Gibbs distributions are used to model large spin systems. Mean field approximation (MFA) is the most prominent among them. A comprehensive treatment of mean field approximation theory in modern statistical mechanics can be found in the book by Chandler [12]. In the image processing and computer vision disciplines, mean field approaches have previously been introduced and applied to surface reconstruction [17], image segmentation [60] and motion estimation [61].

The MFA was originally developed as a statistical mechanics tool for the analysis of many body systems through approximation as a set of single body systems. The basic idea is to focus on one particular particle (in our case a pixel site) in the system and assume that the role of the neighboring particles (pixels) can be approximated by an average field which acts on the tagged particle.
This approach, therefore, neglects the effects of statistical fluctuations in all pixels other than the current tagged one. The corresponding joint description is simply the product of that for each individual particle or pixel. Clearly, this mean field approximation can never be exact, but often it is very accurate. As a tool for approximating a joint Gibbs distribution, the MFA defines a new separable joint distribution which is a product of one-dimensional local densities (one per pixel site). Each of the one-dimensional local densities is coupled to those of the other sites through the mean field.

4.1 Mean Field Reference System

For simplicity of notation, we use

P(x) = \frac{1}{Z} \exp\{-E(x)\}    (4.1)

to indicate a general joint Gibbs distribution of an MRF x, with E(x) as the general Gibbs energy function. Note that the hyperparameter \beta is contained in this general energy function implicitly. In this dissertation, the Gibbs distribution can be either the prior, P(x|\beta), or posterior, P(x|y, \beta), density, with E(x) = \beta U(x) for the prior and E(x) = -\ln P(y|x) + \beta U(x) for the posterior. In essence, the MFA defines a separable mean field reference system with density P_{MF}(x), characterized by the mean field energy function E_{MF}(x) and mean field partition function Z_{MF}, to approximate the original Gibbs distribution P(x). That is:

P(x) \approx P_{MF}(x) = \frac{1}{Z_{MF}} \exp\{-E_{MF}(x)\}    (4.2)

Since the reference field is separable in x, the new mean field Gibbs distribution P_{MF}(x) can be written as the product of a set of 1-D densities:

P_{MF}(x) = \prod_j P_j^{mf}(x_j).    (4.3)

The form of the 1-D local reference densities can of course be chosen in many different ways based on different heuristic arguments. However, no matter what form they take, these 1-D local reference densities must couple to each other through the mean field of their neighbors.
A reasonable choice for the local reference densities is [12, 60]:

P_j^{mf}(x_j) = P(x_j \mid x_{\partial j})\big|_{x_{\partial j} = \langle x_{\partial j} \rangle^{mf}}, \quad \text{with} \quad \langle x_j \rangle^{mf} = \int_{x_j} x_j \, P_j^{mf}(x_j) \, dx_j    (4.4)

where \partial j denotes the set of neighbor pixels of site j for the neighborhood system of the original field P(x). In other words, each local 1-D reference density takes the form of the local conditional density with all its neighbors frozen to their mean field values \langle x_{\partial j} \rangle^{mf}. Note that the mean field \langle x_j \rangle^{mf} is computed with respect to the new reference field, which is different from the mean value of x_j computed from the original density. As discussed in Chapter 2, the conditional densities in (4.3) must be of the form

P_j^{mf}(x_j) = \frac{1}{Z_j^{mf}} \exp\{-E_j^{mf}(x_j; \langle x_{\partial j} \rangle^{mf})\}    (4.5)

where the local energy function E_j^{mf}(x_j; \langle x_{\partial j} \rangle^{mf}) is a sum over all potential functions in the original Gibbs energy, E(x), that include x_j, and the corresponding partition function Z_j^{mf} is given by:

Z_j^{mf} = \int_{x_j} \exp\{-E_j^{mf}(x_j; \langle x_{\partial j} \rangle^{mf})\} \, dx_j.    (4.6)

Combining the local energy and partition functions gives the overall mean field energy function E_{MF}(x) and mean field partition function Z_{MF}:

E_{MF}(x) = \sum_j E_j^{mf}(x_j; \langle x_{\partial j} \rangle^{mf}) \quad \text{and} \quad Z_{MF} = \prod_j Z_j^{mf}.    (4.7)

Note that the mean field \langle x \rangle^{mf} must be computed recursively, since the local reference densities P_j^{mf}(x_j) are coupled through \langle x \rangle^{mf}. This recursive procedure can be performed using the well known iterated conditional average (ICA) method, which has been used elsewhere for MMSE and MAP image estimation [30, 38]. The ICA method has not been proven to converge; however, the mean field is a fixed point of this iteration.

Clearly the specific form of the local energy must depend on the original Gibbs distribution that is being approximated. In this dissertation, we make use of the mean field reference system to approximate both the prior and posterior joint densities.
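To make the ICA fixed-point iteration concrete, the sketch below computes the posterior mean field for a 1-D chain MRF with quadratic pairwise potentials V(x_j, x_k) = (x_j - x_k)^2 and an i.i.d. Gaussian likelihood. The chain topology, the noise variance argument, and the Gauss-Seidel style sweep are illustrative assumptions, not the 2-D setup used in the experiments; the point is only that each local reference density is Gaussian here, so its mean has a closed form and the sweep can be repeated until the mean field stops changing.

```python
def ica_mean_field(y, beta, sigma2, n_iters=200, tol=1e-10):
    """Iterated conditional average (ICA) for the posterior mean field of a
    1-D chain MRF with quadratic pairwise potentials V(x_j, x_k) = (x_j - x_k)^2
    and i.i.d. Gaussian likelihood y_j ~ N(x_j, sigma2).

    Each local reference density has its neighbors frozen at their current
    mean field values; being Gaussian, its mean has the closed form below.
    The sweep is repeated until the mean field reaches a fixed point.
    (Illustrative sketch, not the dissertation's 2-D implementation.)
    """
    n = len(y)
    m = list(y)  # initialize the mean field at the data
    for _ in range(n_iters):
        delta = 0.0
        for j in range(n):
            nbrs = [k for k in (j - 1, j + 1) if 0 <= k < n]
            # Mean of the 1-D density proportional to
            #   exp{ -(x - y_j)^2 / (2*sigma2) - beta * sum_k (x - m_k)^2 }
            num = y[j] / sigma2 + 2.0 * beta * sum(m[k] for k in nbrs)
            den = 1.0 / sigma2 + 2.0 * beta * len(nbrs)
            new = num / den
            delta = max(delta, abs(new - m[j]))
            m[j] = new
        if delta < tol:
            break
    return m
```

For a two-site chain with y = (0, 10), \beta = 1 and \sigma^2 = 1, the fixed point solves m_1 = 2m_2/3 and m_2 = (10 + 2m_1)/3, i.e. (4, 6), which the iteration reproduces.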
For the Gibbs priors with pairwise interactions defined in Section 2.1, it follows from (2.4) and (4.5) that the mean field reference local energies should be of the form:

E_j^{mf}(x_j; \langle x_{\partial j} \rangle^{mf}) = \beta \sum_{k \in N_j} V(x_j, \bar{x}_k^{PR})    (4.8)

where \bar{x}_k^{PR} = \langle x_k \rangle^{mf} is the mean field value with respect to the mean field reference system for the prior. For the posterior density, the likelihood for the data, P(y|x), must also be included:

E_j^{mf}(x_j; \langle x_{\partial j} \rangle^{mf}) = -\ln P(y \mid x_j, \bar{x}_{S \setminus j}^{PO}) + \beta \sum_{k \in N_j} V(x_j, \bar{x}_k^{PO})    (4.9)

where \bar{x}_k^{PO} = \langle x_k \rangle^{mf} is the mean field value with respect to the mean field reference system for the posterior density, and S \setminus j denotes the set of all pixels except j. Note that the introduction of the likelihood term, P(y|x), generally increases the size of the neighborhood system of the posterior MRF relative to that of the prior. In the image restoration case considered below, with 3 × 3 blurring and additive noise, the neighborhood graph for the posterior density consists of the twenty-four nearest neighbors instead of the eight nearest neighbors of the prior.

4.2 Optimal Mean Field Approximation of Gibbs Partition Functions

The analysis and evaluation of the mean field reference distribution are much simpler than those of its original counterpart. The partition function of the new mean field reference distribution is a product of one-dimensional integrals, one per pixel, which are easy to evaluate at least numerically. However, the mean field approaches described thus far are ad hoc: we argue that the statistical influence of the neighbors of each pixel can be approximated by their mean field, and on this basis the separable reference field is selected. We then use the new separable reference Gibbs distribution to approximate the original one. The choice of local density in the section above has been widely accepted in the modern statistical mechanics literature.
A similar philosophy has also been applied in image processing applications and other disciplines in a variety of forms, implicitly or explicitly. For instance, J. Besag's pseudolikelihood approximation [5] approximates a general joint density with a product of local conditional densities, where at each site the conditional density is defined on known neighbor values instead of their mean field values. J. Goutsias [45] developed a new class of mutually compatible MRFs, whose joint distribution exactly equals the product of the local conditional densities; he then uses this special class of MRF to approximate general MRFs.

There are of course other choices of local densities that can be used to form the separable reference field. For instance, Geiger and Girosi [17] suggest the following form of mean field approximation for the partition function of a Gibbs distribution using only single and pairwise cliques:

E(x) = \sum_i V_c(x_i) + \sum_i \sum_{j \in N_i, j > i} V_c(x_i, x_j)
     = \sum_i V_c(x_i) + \frac{1}{2} \sum_i \sum_{j \in N_i} V_c(x_i, x_j)
     \approx \sum_i \Big[ V_c(x_i) + \frac{1}{2} \sum_{j \in N_i} V_c(x_i, \langle x_j \rangle^{mf}) \Big]    (4.10)

Z(\beta) \approx \prod_i \int_{x_i} \exp\Big\{-\beta \Big( V_c(x_i) + \frac{1}{2} \sum_{j \in N_i} V_c(x_i, \langle x_j \rangle^{mf}) \Big)\Big\} \, dx_i.    (4.11)

Note the factor 1/2 for each double clique function. This factor does not appear in our formalism, nor in [12, 61]. With heuristic arguments alone, it is hard to judge which form of MFA is more accurate. Here, we provide a means for selecting an optimal separable reference field in the sense of optimizing the partition function approximation. This approach is based on the Gibbs-Bogoliubov-Feynman (GBF) bound [12] and first-order perturbation theory.

Theorem 1 (Gibbs-Bogoliubov-Feynman bound) For a Gibbs distribution with partition function Z and Gibbs energy E, and any other Gibbs distribution with partition function Z_{MF} and Gibbs energy E_{MF}, we have the following inequality:

Z \ge Z_{MF} \exp\{-\langle E - E_{MF} \rangle_{MF}\}    (4.12)

where

\langle \cdots \rangle_{MF} = Z_{MF}^{-1} \int_x [\cdots] \exp(-E_{MF}) \, dx.
    (4.13)

This inequality is called the Gibbs-Bogoliubov-Feynman bound.

[Proof]: Let \Delta E(x) = E(x) - E_{MF}(x). Then

Z = \int_x \exp\{-[E_{MF} + \Delta E]\} \, dx = \int_x \exp(-E_{MF}) \exp(-\Delta E) \, dx = Z_{MF} \int_x \exp(-\Delta E) \frac{\exp(-E_{MF})}{Z_{MF}} \, dx    (4.14)

where the arguments of E_{MF}(x) and \Delta E(x) are understood but omitted for notational simplicity. We define

\langle \cdots \rangle_{MF} = Z_{MF}^{-1} \int_x [\cdots] \exp(-E_{MF}) \, dx,    (4.15)

thus

Z = Z_{MF} \langle \exp(-\Delta E) \rangle_{MF}.    (4.16)

By applying the inequality e^x \ge 1 + x, we have

\langle e^f \rangle = e^{\langle f \rangle} \langle e^{f - \langle f \rangle} \rangle \ge e^{\langle f \rangle} \langle 1 + f - \langle f \rangle \rangle = e^{\langle f \rangle}.

Therefore, with f = -\Delta E, we have

Z \ge Z_{MF} \exp(-\langle E - E_{MF} \rangle_{MF}).    (4.17)

Theorem 1 states a fairly general and strong fact: if we use any valid second Gibbs distribution to approximate the partition function Z of the original field, the quantity Z_{MF} \exp\{-\langle E - E_{MF} \rangle_{MF}\} will never exceed the original true Z. It follows that the optimal separable reference field, which leads to the best approximation of the original partition function, can be found by maximizing the quantity on the right-hand side of the GBF bound.

Proposition 1 The partition function Z can be best approximated through a mean field reference distribution with partition function Z_{MF} and Gibbs energy E_{MF} as

Z \approx Z_{MF} \exp\{-\langle E - E_{MF} \rangle_{MF}\}    (4.18)

where E_{MF} maximizes the right-hand side of the Gibbs-Bogoliubov-Feynman bound, Z_{MF} \exp\{-\langle E - E_{MF} \rangle_{MF}\}.

Unfortunately, finding a simple and closed form optimal reference system may be possible only for a restricted class of Gibbs distributions, although the Theorem is generally applicable. Here, we apply Proposition 1 and first-order perturbation theory to a special example where only interaction terms of the form \{x_i x_j\} are involved.
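The bound in Theorem 1 is easy to verify numerically on a small system. The sketch below enumerates a hypothetical two-spin model, x_i \in \{-1, +1\}, with coupling energy E(x) = -J x_1 x_2 and a separable reference energy \Delta H_1 x_1 + \Delta H_2 x_2, and evaluates both sides of (4.12); the model and parameter values are illustrative assumptions chosen only so that all integrals become finite sums.

```python
import math
from itertools import product

def gbf_bound(J, dH1, dH2):
    """Evaluate both sides of the Gibbs-Bogoliubov-Feynman bound for a toy
    two-spin system x_i in {-1, +1} with coupling energy E(x) = -J*x1*x2 and
    separable reference energy E_MF(x) = dH1*x1 + dH2*x2 (illustrative sketch).
    Returns (Z, Z_MF * exp(-<E - E_MF>_MF)); the first must never be smaller.
    """
    states = list(product((-1, 1), repeat=2))
    E = lambda x: -J * x[0] * x[1]
    E_mf = lambda x: dH1 * x[0] + dH2 * x[1]
    Z = sum(math.exp(-E(x)) for x in states)
    Z_mf = sum(math.exp(-E_mf(x)) for x in states)
    # <E - E_MF> under the separable reference distribution, per (4.13)
    avg = sum((E(x) - E_mf(x)) * math.exp(-E_mf(x)) for x in states) / Z_mf
    return Z, Z_mf * math.exp(-avg)
```

For J = 0 and \Delta H_i = 0 the reference system coincides with the original and the bound is met with equality; for every other choice the right-hand side stays below Z, as the theorem asserts.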
We consider Gibbs distributions P(x) = Z^{-1} \exp\{-E(x)\} with Gibbs energy function of the form

E(x) = \sum_{i=1}^{N} \Big[ \sum_k a_{ik} x_i^k + \frac{1}{2} \sum_{j \in N_i} b_{ij} x_i x_j \Big]    (4.19)

where b_{ij} = b_{ji}. We begin the discussion by considering a mean field energy function of the following form:

E_{MF}(x) = \sum_i \sum_k a_{ik} x_i^k + \sum_i \Delta H_i x_i.    (4.20)

Therefore, the local reference density and local partition functions are

P_i^{mf}(x_i) = \frac{1}{Z_i^{mf}} \exp\Big\{-\Big(\sum_k a_{ik} x_i^k + \Delta H_i x_i\Big)\Big\}    (4.21)

Z_i^{mf} = \int_{x_i} \exp\Big\{-\Big(\sum_k a_{ik} x_i^k + \Delta H_i x_i\Big)\Big\} \, dx_i.    (4.22)

A system with this energy function is a model with independent spins where each spin fluctuates under the influence of a mean field. \Delta H_i is referred to as the molecular field in statistical mechanics; it is the environmental contribution to the mean field at site i. The resulting joint probability density is thus separable and the partition function will be:

Z_{MF} = \int_x \prod_{i=1}^{N} \exp\Big[-\Big(\sum_k a_{ik} x_i^k + \Delta H_i x_i\Big)\Big] \, dx = \prod_{i=1}^{N} \int_{x_i} \exp\Big[-\Big(\sum_k a_{ik} x_i^k + \Delta H_i x_i\Big)\Big] \, dx_i.    (4.23)

As a direct result, the mean of the reference field is:

\langle x_i \rangle^{mf} = \frac{1}{Z_i^{mf}} \int_{x_i} x_i \exp\Big[-\Big(\sum_k a_{ik} x_i^k + \Delta H_i x_i\Big)\Big] \, dx_i = -\frac{\partial \ln Z_i^{mf}}{\partial \Delta H_i}.    (4.24)

We notice that \langle x_i \rangle^{mf} is a function of \Delta H_i, while \langle x_j \rangle^{mf}, j \ne i, is independent of \Delta H_i. As discussed before, we can use the Gibbs-Bogoliubov-Feynman bound to optimize the mean field reference system by varying the value of \Delta H_i so as to maximize the right-hand side of the bound in Theorem 1. Thus \Delta H_i can be determined by solving

0 = \frac{\partial}{\partial \Delta H_i} \ln\big( Z_{MF} \exp\{-\langle E - E_{MF} \rangle_{MF}\} \big).    (4.25)

We proceed with

\langle E - E_{MF} \rangle_{MF} = \frac{1}{2} \sum_i \sum_{j \in N_i} b_{ij} \langle x_i \rangle^{mf} \langle x_j \rangle^{mf} - \sum_i \Delta H_i \langle x_i \rangle^{mf}    (4.26)

where we have used the fact that \langle x_i x_j \rangle^{mf} = \langle x_i \rangle^{mf} \langle x_j \rangle^{mf} for i \ne j. By combining (4.24) and (4.26), while considering the fact that the same pair \langle x_i \rangle^{mf} \langle x_j \rangle^{mf} appears twice in the summation and that b_{ij} = b_{ji}, we get

0 = \Delta H_i \frac{\partial \langle x_i \rangle^{mf}}{\partial \Delta H_i} - \sum_{j \in N_i} b_{ij} \langle x_j \rangle^{mf} \frac{\partial \langle x_i \rangle^{mf}}{\partial \Delta H_i}.    (4.27)

Therefore,

\Delta H_i = \sum_{j \in N_i} b_{ij} \langle x_j \rangle^{mf}.    (4.28)
Therefore, the optimal local reference density and local partition function take the form:

P_i^{mf}(x_i) = \frac{1}{Z_i^{mf}} \exp\Big\{-\Big(\sum_k a_{ik} x_i^k + x_i \sum_{j \in N_i} b_{ij} \langle x_j \rangle^{mf}\Big)\Big\}    (4.29)

Z_i^{mf} = \int_{x_i} \exp\Big\{-\Big(\sum_k a_{ik} x_i^k + x_i \sum_{j \in N_i} b_{ij} \langle x_j \rangle^{mf}\Big)\Big\} \, dx_i.    (4.30)

In turn,

Z_{MF} = \prod_i Z_i^{mf} = \prod_i \int_{x_i} \exp\Big\{-\Big(\sum_k a_{ik} x_i^k + x_i \sum_{j \in N_i} b_{ij} \langle x_j \rangle^{mf}\Big)\Big\} \, dx_i    (4.31)

E_{MF}(x) = \sum_i \sum_k a_{ik} x_i^k + \sum_i \sum_{j \in N_i} b_{ij} \langle x_j \rangle^{mf} x_i    (4.32)

and

\langle E - E_{MF} \rangle_{MF} = \Big\langle \frac{1}{2} \sum_i \sum_{j \in N_i} b_{ij} x_i x_j - \sum_i \sum_{j \in N_i} b_{ij} \langle x_j \rangle^{mf} x_i \Big\rangle_{MF} = -\frac{1}{2} \sum_i \sum_{j \in N_i} b_{ij} \langle x_i \rangle^{mf} \langle x_j \rangle^{mf}.    (4.33)

We then obtain the optimal approximation of the partition function Z:

Z \approx Z_{MF} \exp\{-\langle E - E_{MF} \rangle_{MF}\} = \prod_i \int_{x_i} \exp\Big[-\Big(\sum_k a_{ik} x_i^k + x_i \sum_{j \in N_i} b_{ij} \langle x_j \rangle^{mf} - \frac{1}{2} \langle x_i \rangle^{mf} \sum_{j \in N_i} b_{ij} \langle x_j \rangle^{mf}\Big)\Big] \, dx_i.    (4.34)

This result can be directly applied to approximate the posterior distribution defined by a Gaussian likelihood function and a quadratic Gibbs prior energy function, or a prior distribution with a quadratic energy function, since the Hamiltonians of these two distributions can be written as

E^{(PR)}(x) = \beta \sum_i \sum_{j \in N_i, j > i} K_{ij} (x_i - x_j)^2 = \beta \Big( \sum_i \sum_{j \in N_i} K_{ij} x_i^2 - \sum_i \sum_{j \in N_i} K_{ij} x_i x_j \Big)

and

E^{(PO)}(x) = \sum_i a_i x_i^2 + \sum_i \sum_{j \in N_i^{PO}} b_{ij} x_i x_j + \beta \Big( \sum_i \sum_{j \in N_i} K_{ij} x_i^2 - \sum_i \sum_{j \in N_i} K_{ij} x_i x_j \Big)    (4.35)

respectively. In the above equations, the superscripts PR and PO denote the prior and posterior respectively, and N_i^{PO} denotes the neighborhood of pixel i in the posterior distribution. The first two terms in E^{(PO)}(x) arise from the Gaussian likelihood function. A discussion of a special case can also be found in [12]; in that case, the field is assumed to be homogeneous, and thus \langle x_i \rangle^{mf} = \langle x_j \rangle^{mf}.

The result reveals the following fact: for the Gibbs distribution with interactions defined in (4.19), the optimal local mean field energy function is simply summed over all potential functions containing the current site, while keeping the neighboring sites at their mean field values, i.e.
if

P(x) = \frac{1}{Z} \exp\Big\{-\beta \sum_i \Big[ V_c(x_i) + \frac{1}{2} \sum_{j \in N_i} V_c(x_i, x_j) \Big]\Big\}    (4.36)

then the optimal mean field reference distribution is

P_{MF}(x) = \prod_i P(x_i \mid x_{\partial i})\big|_{x_{\partial i} = \langle x_{\partial i} \rangle^{mf}} = \prod_i \frac{1}{Z_i^{mf}} \exp\Big\{-\beta \Big[ V_c(x_i) + \sum_{j \in N_i} V_c(x_i, \langle x_j \rangle^{mf}) \Big]\Big\}.    (4.37)

We note that the mean field reference system suggested in Section 4.1 is an extension of the conclusion we obtained from the above restricted class of MRF. Although it is impossible to prove that this choice of mean field reference system is strictly optimal for every form of Gibbs distribution, the discussion above provides support for the use of the mean field system described in Section 4.1 in the absence of a truly optimal reference field for the general case.

4.3 Summary of Results from Mean Field Theory

The MFA can be used to facilitate the evaluation of multidimensional integrals of Gibbs distributions over unmanageably large sample spaces. Situations where integrals of this type arise include: (1) evaluation of a marginal distribution, e.g., P(x_i) = \int P(x) \, dx_{S \setminus i}; (2) evaluation of the ensemble average (expectation) of an image pixel or a function, e.g. \langle x_i \rangle = \int_x x_i P(x) \, dx; (3) evaluation of the partition function, e.g. Z = \int_x \exp\{-\beta U(x)\} \, dx [60]; and (4) evaluation of the expected value of the prior Gibbs Hamiltonian, \langle U(x) \rangle = \int_x U(x) P(x) \, dx.

The fundamental result of the MFA is that the marginal distribution of the field at site i [60] can be approximated by its mean field local density, i.e.,

P(x_i) \approx P_i^{mf}(x_i).    (4.38)

A natural consequence of the above approximation is that the mean value of site i can be approximated by its mean field value as:

\langle x_i \rangle \approx \langle x_i \rangle^{mf}.    (4.39)

\langle x_i \rangle^{mf} must be computed recursively, as each mean field local reference density depends on the mean field values of its neighbors. A recursive procedure to compute this is known as the ICA procedure:

\langle x_i \rangle^{mf,(k+1)} = \int_{x_i} x_i \, P_i^{mf}(x_i; \langle x_{\partial i} \rangle^{mf,(k)}) \, dx_i.    (4.40)

There is no theoretical proof of the convergence of the ICA procedure; however, in practice convergence to a fixed point can usually be achieved.

We have devoted the previous two sections to discussion of partition function evaluation. Evaluation of the expected value of the prior Gibbs Hamiltonian follows directly from mean field partition function evaluation. It is easy to verify that:

\frac{\partial \ln Z(\beta)}{\partial \beta} = -\frac{\int_x U(x) \exp\{-\beta U(x)\} \, dx}{\int_x \exp\{-\beta U(x)\} \, dx} = -E[U(x)|\beta]    (4.41)

and

\frac{\partial \ln Z(y, \beta)}{\partial \beta} = -\frac{\int_x U(x) \exp\{\ln P(y|x) - \beta U(x)\} \, dx}{\int_x \exp\{\ln P(y|x) - \beta U(x)\} \, dx} = -E[U(x)|y, \beta]    (4.42)

where E[\cdot|\beta] and E[\cdot|y, \beta] denote ensemble averages with respect to the prior and posterior densities respectively. Therefore, if we take Z \approx Z_{MF}, then by using the results from Section 4.1, (4.8) and (4.9), we have

E[U(x)|\beta] \approx -\frac{\partial \ln Z_{MF}^{PR}}{\partial \beta} = \sum_j \frac{\int_{x_j} \sum_{k \in N_j} V(x_j, \bar{x}_k^{PR}) \exp\{-\beta \sum_{k \in N_j} V(x_j, \bar{x}_k^{PR})\} \, dx_j}{\int_{x_j} \exp\{-\beta \sum_{k \in N_j} V(x_j, \bar{x}_k^{PR})\} \, dx_j}    (4.43)

E[U(x)|y, \beta] \approx -\frac{\partial \ln Z_{MF}^{PO}}{\partial \beta} = \sum_j \frac{\int_{x_j} \sum_{k \in N_j} V(x_j, \bar{x}_k^{PO}) \exp\{\ln P(y \mid x_j, \bar{x}_{S \setminus j}^{PO}) - \beta \sum_{k \in N_j} V(x_j, \bar{x}_k^{PO})\} \, dx_j}{\int_{x_j} \exp\{\ln P(y \mid x_j, \bar{x}_{S \setminus j}^{PO}) - \beta \sum_{k \in N_j} V(x_j, \bar{x}_k^{PO})\} \, dx_j}.    (4.44)

If we instead use Z \approx Z_{MF} \exp\{-\langle E - E_{MF} \rangle_{MF}\}, with the mean field reference system optimized via the GBF bound as discussed in Section 4.2, then the mean field approximations of E[U(x)|\beta] and E[U(x)|y, \beta] may have an additional term arising from \partial \langle E - E_{MF} \rangle_{MF} / \partial \beta. Of course, when the quantity \langle E - E_{MF} \rangle_{MF} is not a function of \beta, this additional term will be zero.

In Figure 4.1, we compare the accuracy of the mean field approximation of E[U(x)|\beta], using both the mean field reference system developed here and that suggested by Geiger and Girosi (see (4.10)), with the value computed from stochastic sampling.

Figure 4.1: Illustration of results of mean field approximation of the expectation value of the Gibbs Hamiltonian: (a) Quadratic prior, (b) Huber prior.
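Each summand in (4.43) is a ratio of two one-dimensional integrals and is therefore cheap to evaluate numerically. As an illustration, the sketch below computes one site's term for the quadratic potential V(x_j, x_k) = (x_j - x_k)^2 by Riemann sums, with the neighbors frozen at assumed mean field values; for this Gaussian case the exact value is 1/(2\beta) + \sum_k (m_k - \bar{m})^2, where \bar{m} is the average of the frozen neighbor values, so the quadrature can be checked against a closed form.

```python
import math

def local_expected_energy(beta, nbr_means, half_width=30.0, n=20001):
    """One site's contribution to the mean field approximation of E[U(x)|beta]
    for the quadratic potential V(x_j, x_k) = (x_j - x_k)^2: a ratio of two
    1-D integrals evaluated by Riemann sums, with the neighbors frozen at the
    assumed mean field values nbr_means (illustrative sketch)."""
    m_bar = sum(nbr_means) / len(nbr_means)
    h = 2.0 * half_width / (n - 1)
    num = den = 0.0
    for i in range(n):
        x = m_bar - half_width + i * h
        u = sum((x - m) ** 2 for m in nbr_means)  # local energy sum over cliques
        w = math.exp(-beta * u)
        num += u * w
        den += w
    return num / den
```

With \beta = 0.5 and frozen neighbor values (1, 3), the local density is Gaussian with variance 1/(4\beta) = 0.5, and the ratio evaluates to 1/(2\beta) + 2 = 3, matching the closed form.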
Note that for the quadratic prior, the mean field reference system we used is optimal in terms of maximizing the GBF bound.

4.4 Mean vs. Mode Field Approximations

In many imaging applications, we are more interested in computing a MAP estimate of the image than a minimum mean square error (MMSE) estimate. These correspond, respectively, to the mode and the mean of the posterior density. Therefore, rather than computing the mean field of the posterior reference field, we replace the mean field with a mode field. This mode is computed using an iterative MAP estimation procedure. Note that, using the separable approximations described above, the modes of the original and reference fields are identical. In cases where the posterior density is unimodal and symmetric, mean and mode fields are equivalent. Such is the case for Gaussian data with convex prior potential functions. For Poisson data, the Poisson likelihood is asymmetric and the two methods are not equivalent. However, for relatively high mean values, the Poisson likelihood is well approximated by a symmetric Gaussian function (i.e. the log-likelihood function ln P(y|x) is approximately symmetric). We therefore anticipate only minor differences between the mode field and mean field approximations in this case.

Chapter 5

Simultaneous Maximum Likelihood Estimation of Hyperparameters

5.1 Introduction and Previous Studies

We noted in the previous chapters that for a given incomplete data observation y, Bayesian estimates of the image are strongly influenced by the global hyperparameter \beta. The hyperparameter plays a critical role in controlling the balance between the influence of the Gibbs prior and that of the likelihood. For instance, in MAP estimators, if \beta is too large, the prior will tend to have an over-smoothing effect on the solution. Conversely, if it is too small, the MAP estimate may be unstable, reducing to the ML solution as \beta goes to zero.
Thus a proper choice of \beta is important for quantitative accuracy in the Bayesian estimates. To illustrate the effect of the hyperparameter on the MAP estimate, we show two curves in Figure 5.1, computed for the application of image restoration on the "Elaine" image in Chapter 3. In Figure 5.1(a), we have a typical L-curve [25, 62], which is a plot, for all valid \beta, of the value of the Gibbs energy U(\hat{x}) versus the log-likelihood \ln P(y|\hat{x}), computed at the MAP solution \hat{x} for each value of \beta. We observe two characteristic parts of the curve, namely a "flat" part where the MAP solution is dominated by the prior, and an almost "vertical" part, where the solution is dominated by the likelihood function. Heuristically, the region between these two characteristic parts, i.e. the "corner", corresponds to a good balance between fidelity to the data and smoothness of the solution. Figure 5.1(b) shows a curve of the mean squared error in the MAP estimate for a range of hyperparameter values.

Figure 5.1: Illustration of the quantitative effect of the global hyperparameter: (a) L-curve, (b) Global mean-squared-error of restored image versus \beta.

It is clear that an appropriate choice of \beta is necessary to achieve a small error. Furthermore, we have observed that the corner of the L-curve corresponds to values of the hyperparameter \beta that are close to those which minimize the mean squared error for the MAP estimation problem described here (both points are indicated by *). Similar observations were made concerning the more general regularization problem in [25].

A full Bayesian approach requires a "hyperprior" density for the hyperparameters. However, it is desirable that Bayesian estimators be completely data-driven, in that they do not depend on any a priori knowledge of the hyperparameter. An ad hoc simplification of the fully Bayesian approach is to use a point estimate instead, estimated from the same observed data.
Instead of selecting the hyperparameter in an ad hoc fashion through visual inspection of the resulting images, there are two basic approaches for choosing \beta in a more principled manner: (i) treating \beta as a regularization parameter and applying techniques such as generalized cross-validation, the L-curve, and \chi^2 goodness-of-fit tests; (ii) estimation theoretic approaches such as maximum likelihood.

The generalized cross-validation (GCV) method originates from smoothing of noisy data with spline functions to estimate the correct degree of smoothing [13]. It has been extended to Bayesian image restoration and reconstruction [30]. This method assumes the solution is an unknown function g and that the measurement errors (noise) are uncorrelated zero mean random variables with common unknown variance \sigma^2 ("white noise"). Thus the smoothing spline estimate of g given the data y_i, i = 1, 2, \ldots, n, and regularization constant \lambda, which we call g_{n,\lambda}, is the minimizer of

n^{-1} \sum_{i=1}^{n} (g(t_i) - y_i)^2 + \lambda \int (g^{(m)}(t))^2 \, dt    (5.1)

where g^{(m)}(t) is the m-th order derivative of the function g; the term \int (g^{(m)}(t))^2 \, dt is the m-th order Sobolev seminorm of g. The idea of GCV is that a good regularization parameter \lambda should be the one which best predicts missing data values. Therefore the optimal \lambda = \hat{\lambda} should be the minimizer of the cross-validation function defined by V(\lambda) = n^{-1} \sum_k (g_{n,\lambda}^{[k]}(t_k) - y_k)^2, where g_{n,\lambda}^{[k]} is the minimizer of the reduced data regularization problem:

n^{-1} \sum_{i \ne k} (g(t_i) - y_i)^2 + \lambda \int (g^{(m)}(t))^2 \, dt.    (5.2)

Several difficulties are associated with this method: the GCV function is often very flat and, therefore, its minimum is difficult to locate numerically [54], and the method may fail to compute the correct \lambda when the noise is highly correlated [55]. We also note that for a large dimensional problem, this method may be impractical due to the amount of computation required.
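The leave-one-out idea in (5.2) can be made concrete with a discrete stand-in for the spline problem: replacing the derivative penalty with first differences on a grid makes each reduced-data fit a small tridiagonal solve. Everything below (the first-difference penalty, the data values, the Thomas-algorithm helper) is an illustrative assumption; GCV proper replaces this ordinary cross-validation function with a rotation-invariant variant, but the predictive-error logic is the same.

```python
def solve_tridiag(lo, di, up, b):
    """Thomas algorithm for a tridiagonal linear system (illustrative helper).
    lo[i], di[i], up[i] multiply x[i-1], x[i], x[i+1] in row i."""
    n = len(b)
    c, d = [0.0] * n, [0.0] * n
    c[0] = up[0] / di[0]
    d[0] = b[0] / di[0]
    for i in range(1, n):
        denom = di[i] - lo[i] * c[i - 1]
        c[i] = up[i] / denom if i < n - 1 else 0.0
        d[i] = (b[i] - lo[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def cross_validation(y, lam):
    """Ordinary leave-one-out cross-validation function V(lambda) for the
    discrete first-difference smoother
        min_g  sum_{i != k} (g_i - y_i)^2 + lam * sum_i (g_{i+1} - g_i)^2,
    a finite-dimensional stand-in for (5.1)-(5.2). Requires lam > 0 so the
    reduced-data normal equations stay nonsingular (illustrative sketch)."""
    n = len(y)
    total = 0.0
    for k in range(n):
        # Normal equations with the k-th data term removed.
        lo = [0.0] + [-lam] * (n - 1)
        up = [-lam] * (n - 1) + [0.0]
        di = []
        for i in range(n):
            coupling = lam * ((1 if i > 0 else 0) + (1 if i < n - 1 else 0))
            di.append((0.0 if i == k else 1.0) + coupling)
        b = [0.0 if i == k else y[i] for i in range(n)]
        g = solve_tridiag(lo, di, up, b)
        total += (g[k] - y[k]) ** 2
    return total / n
```

On constant data every reduced fit predicts the left-out point exactly, so V vanishes for any \lambda; on rapidly alternating data a larger \lambda predicts the held-out values better than a tiny one, which is exactly the behavior the cross-validation criterion exploits.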
Hansen and O'Leary's L-curve is based on the empirical observation that the corner of the curve, illustrated in Figure 5.1(a), corresponds to a good choice of \beta in terms of other validation measures [25]. The L-curve has similar performance to GCV for uncorrelated measurement errors; however, the L-curve criterion is also able to work for correlated errors as long as they do not satisfy the discrete Picard condition [25]. We have applied the concept of the L-curve method to estimate the hyperparameter in MAP tomographic image reconstruction [62]. The L-curve is difficult to predict precisely without multiple evaluations of the MAP solution for a large number of different hyperparameter values; thus the computation cost is again very high.

\chi^2 statistics have been widely used to choose the regularization parameter [52]. For MAP image estimation, [27] developed an adaptive scheme based on a \chi^2 statistic to select \beta. Since the image is estimated from the same data, the degrees of freedom of the test should be reduced accordingly. This presents a problem when the data and image are of similar dimension. It is also commonly agreed that \chi^2 methods tend to over-smooth the solution [52].

As an alternative to the regularization based methods discussed above, a well grounded approach to selection of the hyperparameter is to apply maximum likelihood (ML) estimation. The image x, which is drawn from the complete data sample space X characterized by the parameter \beta, is not observed directly. Instead, we observe a second process y, which is conditioned on this image. The MLE of the hyperparameter corresponds to the maximizer of the incomplete data likelihood function P(y|\beta), which is the marginalization of the joint probability density of complete and incomplete data, P(x, y|\beta), over the whole complete data sample space, i.e., P(y|\beta) = \int_X P(y, x|\beta) \, dx.
Selection of the hyperparameter can therefore be viewed as an ML estimation problem in an incomplete/complete data framework and is a natural candidate for the EM algorithm [14]. However, in most imaging applications, the high dimensionality and the complexity of the densities involved make the EM approach intractable. Geman and McClure [19] propose using a stochastic relaxation technique, such as a Gibbs sampler, to evaluate the E-step of the EM algorithm. While this approach provides a means of overcoming the intractability of the true EM algorithm, the computational cost remains extremely high. Several other estimation methods have been studied which do not share the desirable properties of true ML estimation but provide computationally feasible alternatives. They include: generalized maximum likelihood (GML) [42] or generalized maximum pseudo-likelihood (GMPL) [5], the method of moments (MOM) [18] and variational methods.

GML [33, 42] considers the hyperparameter \beta at the same level as the image parameters x, and makes the simplifying approximation that the ML estimate of \beta and the MAP estimate of the image x can be estimated simultaneously as the joint maximizers of the joint density P(x, y|\beta), i.e.:

\{\hat{x}, \hat{\beta}\} = \arg\max_{x, \beta} \ln P(y, x|\beta) = \arg\max_x \Big\{ \ln P(y|x) + \max_\beta \ln P(x|\beta) \Big\}.    (5.3)
This step is usually replaced by the m axim um pseudo-likelihood (MPL) approach of Besag [5, 33], i.e., we replace the second step by / ^ ‘l = arg m a x IJ P{xj\x[k), k € Nj,f3). (5.6) We refer to this as the generalized maximum pseudo-likelihood (GMPL) method. The solution of these methods can be interpreted as a saddle point approximation of the true MLE [17]. This approach works well in some situations, but the crudeness of the approximation results in poor performance in general. The method of moments (MOM) [18, 38] defines a statistical moment of the data M (y), which is ideally chosen to reflect the variability in the unobserved image and to establish a one-to-one correspondence between the moment value E[M(y)\f3] and the global hyperpararneter (3. In order to generate a moment curve for fi estimation, the expected value of M(y) m ust be computed as a function of fi. In order to ac complish this, images are sampled from the prior for various fi, then pseudo random d ata is generated using the known likelihood function, and the corresponding value of the statistic computed. Initial computational costs for this method are very large, but since the curve of E[M(y)\fi\ versus fi is independent of the observation, it can be computed ahead of time and the on line computation cost is simply th at of com puting the value of the statistic for the observed data and comparing this with the precomputed value to determine the appropriate fi. The m ajor limitation in using this m ethod is in finding a statistic with sufficient slope th at the hyperpararneter can be unambiguously determined. In practice (3 is only identifiable from the mo m ent curve over a very small region. Elsewhere, the curve is very flat resulting in a high variance in the selection of /3 using this method [38]. Results are presented later comparing this technique with our new method as well as with generalized maximum psuedo-likelihood. 
A variational method is described in [1]. This approach leads to a procedure similar to, but simpler than, the EM algorithm. However, the computational cost remains high, and few validation or experimental results have been published for this method.

Since the true MLE of the hyperparameter is desirable but intractable, and the alternatives do not generally perform well, we return to the maximum likelihood approach and develop an approximation that results in a reasonable computational cost, but which is less restrictive than the generalized maximum likelihood methods [5, 33, 42]. The major difficulty in computing a true MLE of \beta is evaluation of the huge multi-dimensional integrals over the highly coupled joint density functions on the complete data sample space X, which is required by the marginalization, expectation operations and partition function evaluation. To overcome this problem, we use the mean field approximation developed in the last chapter, in which the highly coupled Gibbs distribution is approximated by a simple and separable Gibbs distribution. Each multidimensional integral then becomes the product of many one-dimensional integrals (one per pixel), which renders the ML approach tractable. In this chapter, we present a brief summary of the problem formulation of ML estimation of the hyperparameter from incomplete data and then describe the approximate ML estimation based on mean field approximation (MFA). This maximum likelihood estimation is performed simultaneously with Bayesian image estimation. As we shall see, the two estimates are mutually dependent through the approximation introduced to solve the hyperparameter estimation problem. We present results of extensive Monte-Carlo studies that examine the bias and variance of this estimator for cases where the true value of \beta is known. We also evaluate the performance of this method on real images.
5.2 Maximum Likelihood Hyperparameter Estimation from Incomplete Data

The observed incomplete data, y, is related to the hyperparameter \beta only indirectly, through the complete data, the image x. Given the observed incomplete data, y, a maximum likelihood estimate of \beta can be found as the maximizer of the marginalized likelihood function [14]

P(y|\beta) = \int_X P(y, x|\beta) \, dx = \int_X P(y|x) P(x|\beta) \, dx = \frac{\int_X \exp\{\ln P(y|x) - \beta U(x)\} \, dx}{\int_X \exp\{-\beta U(x)\} \, dx} = \frac{Z(y, \beta)}{Z(\beta)}    (5.7)

where Z(y, \beta) and Z(\beta) are the partition functions of the posterior and prior Gibbs distributions respectively (defined in (3.3) and (2.2)). Therefore, we have:

\ln P(y|\beta) = \ln Z(y, \beta) - \ln Z(\beta),    (5.8)

and the MLE of the hyperparameter is a root of the equation

0 = \frac{\partial \ln P(y|\beta)}{\partial \beta} = \frac{\partial \ln Z(y, \beta)}{\partial \beta} - \frac{\partial \ln Z(\beta)}{\partial \beta}.    (5.9)

As shown in Chapter 3,

\frac{\partial \ln Z(\beta)}{\partial \beta} = -E[U(x)|\beta]    (5.10)

\frac{\partial \ln Z(y, \beta)}{\partial \beta} = -E[U(x)|y, \beta].    (5.11)

It follows from (5.9), (5.10) and (5.11) that the ML estimate of \beta from y is a root of the likelihood equation [14]:

E[U(x)|y, \beta] = E[U(x)|\beta].    (5.12)

The solution of this equation may be computed iteratively via an EM algorithm [14]:

E-step: Estimate the sufficient statistic U(x):

U^{(k)}(x) = E[U(x) \mid y, \beta^{(k)}].    (5.13)

M-step: Determine \beta^{(k+1)} as a root of the equation

E[U(x)|\beta] = U^{(k)}(x).    (5.14)

Proof of convergence and other related results can be found in [14]. We note that the M-step is in fact the maximum likelihood equation from the theory of sufficient statistics [41]: it computes the ML estimate of \beta considering the current estimate U^{(k)}(x) as the true value of the sufficient statistic for \beta from complete data. The major difficulty in this algorithm is that, in almost all cases of interest, the evaluation of the true expectation in the above equation is essentially impossible, both analytically and numerically. Geman and McClure [19] thus suggest using stochastic relaxation to approximate the E-step. The computational cost remains high.
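In the rare cases where the densities involved are tractable, the EM iteration (5.13)-(5.14) can be run exactly. The sketch below does so for a toy model, an illustrative assumption rather than the imaging setup: an i.i.d. prior p(x_i|\beta) \propto \exp(-\beta x_i^2), so that U(x) = \sum_i x_i^2 and E[U(x)|\beta] = n/(2\beta), with Gaussian likelihood y_i \sim N(x_i, \sigma^2). The posterior is Gaussian, so the E-step expectation is closed form, and the fixed point agrees with the closed-form marginal MLE \hat{\beta} = 1/(2(\sum_i y_i^2/n - \sigma^2)).

```python
def em_beta(y, sigma2, beta0, n_iters=500):
    """EM iteration for the hyperparameter, cf. (5.13)-(5.14), in a toy model
    where both steps are closed form (illustrative sketch): i.i.d. prior
    p(x_i|beta) proportional to exp(-beta*x_i^2), so U(x) = sum_i x_i^2 and
    E[U(x)|beta] = n/(2*beta), with Gaussian likelihood y_i ~ N(x_i, sigma2).
    The posterior of each x_i is Gaussian, making the E-step exact."""
    n = len(y)
    beta = beta0
    for _ in range(n_iters):
        shrink = 1.0 + 2.0 * beta * sigma2
        post_var = sigma2 / shrink               # posterior variance of each x_i
        post_means_sq = sum((yi / shrink) ** 2 for yi in y)
        u_k = post_means_sq + n * post_var       # E-step: E[U(x) | y, beta]
        beta = n / (2.0 * u_k)                   # M-step: solve n/(2*beta) = u_k
    return beta
```

For y = (2, -2, 2, -2) and \sigma^2 = 1 the marginal variance estimate is \sum y_i^2/n = 4, so the iteration converges to \hat{\beta} = 1/6, regardless of the starting point \beta_0 (provided the sample variance exceeds the noise variance, so that the MLE exists).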
5.3 Approximate Maximum Likelihood Estimation of the Hyperparameter

5.3.1 Hyperparameter Estimation Using the Saddle Point Approximation

Consider equation (5.12), which defines the true ML estimate of the hyperparameter from incomplete data y. The term E[U(x)|β] can be precomputed using stochastic relaxation and stored, for it is independent of the observation y. However, the term E[U(x)|y, β] is observation dependent and must be evaluated "on-line" for each data set. This quantity is unmanageable due to the huge integration over the entire image sample space. The saddle point approximation [17, 40] is an alternative to the mean field approximation, discussed in Chapter 4, for solving complicated integrals in classical statistical mechanics. It is achieved by neglecting all of the statistical fluctuations in the field X and therefore considering only the contribution of the maximum term to integrals over the Gibbs density function. The saddle point approximation is simpler than the mean field approximation. Performance will, of course, be poorer as the assumption is a gross simplification. The saddle point approximation evaluates the posterior partition function as [17]:

Z(y, β) ≈ C exp{ln P(y|x̂) − βU(x̂)},   (5.15)

where C is a constant and x̂ is a solution of the equation

∂ln P(y|x)/∂x − β ∂U(x)/∂x = 0.   (5.16)

Clearly x̂ is a mode of the posterior distribution. Using (4.42), we have:

E[U(x)|y, β] ≈ U(x̂),   (5.17)

and hence the saddle point approximated maximum likelihood equation (5.12) becomes

U(x̂) = E[U(x)|β].   (5.18)

It is easy to show that the solution of the above equation is the maximizer of ln P(x̂|β).
Combining the conditions in (5.16) and (5.18), we find that the saddle point approximated ML estimate is simply the solution reached by the GML method discussed in the last section.

5.3.2 Hyperparameter Estimation Using Mean/Mode Field Approximations

The mean/mode field approximation developed in Chapter 4 is a much more accurate approximation technique than the saddle point method, with only a moderate increase in computational cost. Rather than ignoring all statistical fluctuations, the mean/mode field approximation freezes the neighboring particles at their mean values, while considering the fluctuation of only the current tagged particle. Since we are more interested in the MAP image estimate, we use the mode field approximation most of the time.

We now use this mean field approximation to simplify the hyperparameter estimation problem described in Section 5.2. This problem reduces to finding a mean field like approximation to equation (5.12). This can be done in several ways; we describe three alternative methods below.

Method 1: Approximating the marginal likelihood function

The first method uses the mean field approximation of the marginal density P(y|β) in (5.7). The mean field approximation of the posterior density is found by substituting (4.9) in (4.5), and (4.5) in (4.3). We substitute this approximated posterior density into the numerator of (5.7) and choose the normalizing constant in the denominator so that ∫_y P(y|β) dy = 1. We then have

P_MF(y|β) = ∏_j [∫_{x_j} exp{ln P(y|x_j, x̄^PO_{N_j}) − βU_j^PO(x_j)} dx_j] / [∫_{x_j} exp{−βU_j^PO(x_j)} dx_j],   (5.19)

where U_j^PO(x_j) = Σ_{k∈N_j} V(x_j, x̄_k^PO). To see that P_MF(y|β) is a valid density, note that we can rewrite (5.19) as:

P_MF(y|β) = ∏_j ∫_{x_j} P(y|x_j, x̄^PO_{N_j}) P^PO(x_j|β) dx_j,   (5.20)

where P^PO(x_j|β) = exp{−βU_j^PO(x_j)} / ∫_{x_j'} exp{−βU_j^PO(x_j')} dx_j' is a normalized one-dimensional Gibbs density. Starting from this approximation, we can parallel the development in Section 5.2 by setting the derivative of the logarithm of the approximated marginal density in (5.19) to zero.
We then arrive at the approximate likelihood equation, i.e. the mean field equivalent of (5.12):

Σ_j [∫_{x_j} U_j^PO(x_j) exp{ln P(y|x_j, x̄^PO_{N_j}) − βU_j^PO(x_j)} dx_j] / [∫_{x_j} exp{ln P(y|x_j, x̄^PO_{N_j}) − βU_j^PO(x_j)} dx_j]
= Σ_j [∫_{x_j} U_j^PO(x_j) exp{−βU_j^PO(x_j)} dx_j] / [∫_{x_j} exp{−βU_j^PO(x_j)} dx_j].   (5.21)

Given the posterior mean field x̄^PO, we can then solve (5.21) for β using a Newton-Raphson procedure.

Method 2: Approximating the likelihood equation (5.12)

Instead of approximating the marginal likelihood (5.7), we can use the mean field approximated prior and posterior partition functions directly in (4.41) and (4.42). Using the definitions of the local reference field energies, (4.8) and (4.9), and the relationship between these and the reference field partition function (4.7), we have:

Z(β) ≈ Z_PR(β) = ∏_j ∫_{x_j} exp{−β Σ_{k∈N_j} V(x_j, x̄_k^PR)} dx_j,

Z(y, β) ≈ Z_PO(y, β) = ∏_j ∫_{x_j} exp{ln P(y|x_j, x̄^PO_{N_j}) − βU_j^PO(x_j)} dx_j.   (5.22)

Differentiating the logarithms of the two partition functions with respect to β and equating them gives us the second approximate likelihood equation:

Σ_j [∫_{x_j} U_j^PO(x_j) exp{ln P(y|x_j, x̄^PO_{N_j}) − βU_j^PO(x_j)} dx_j] / [∫_{x_j} exp{ln P(y|x_j, x̄^PO_{N_j}) − βU_j^PO(x_j)} dx_j]
= Σ_j [∫_{x_j} U_j^PR(x_j) exp{−βU_j^PR(x_j)} dx_j] / [∫_{x_j} exp{−βU_j^PR(x_j)} dx_j],   (5.23)

where U_j^PR(x_j) = Σ_{k∈N_j} V(x_j, x̄_k^PR) and U_j^PO(x_j) = Σ_{k∈N_j} V(x_j, x̄_k^PO).

Comparing (5.21) and (5.23), we see that the left hand sides are identical. The differences on the right hand side result from two different ways of approximating the derivative of the log of the prior partition function. In the first method, the mean field approximation is performed with respect to the mean of the posterior reference field, while in the second method, the approximation is with respect to the mean of the prior reference field. Since both methods are heuristic, it is difficult to argue the merits of one over the other. Below, we compare the performance of the two methods for known β.
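Once the one-dimensional integrals on both sides of (5.21) or (5.23) can be evaluated, updating β is a scalar root-finding problem. A generic Newton-Raphson iteration of the kind used here might look like the following sketch, where g stands for the difference between the two sides of the likelihood equation and dg for its derivative in β (the names and the toy usage are illustrative):

```python
def newton_root(g, dg, beta0, tol=1e-12, max_iter=100):
    """Newton-Raphson root finding for a scalar likelihood equation g(beta) = 0.

    g  -- equation residual, e.g. LHS minus RHS of (5.21)
    dg -- derivative of the residual with respect to beta
    """
    beta = beta0
    for _ in range(max_iter):
        step = g(beta) / dg(beta)
        beta -= step
        if abs(step) < tol:
            break
    return beta

# Toy usage: solve beta^2 - 2 = 0 starting from beta = 1.
root = newton_root(lambda b: b * b - 2.0, lambda b: 2.0 * b, 1.0)
```

In practice g and dg would themselves be evaluated by numerical quadrature of the integrals in (5.21) or (5.23).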
Method 3: Optimal Partition Function Approximation via the GBF Bound

In Section 4.2, we show that (i) for the case of Gaussian prior and posterior densities, the optimal reference fields are identical to those described by the local energy functions (4.8) and (4.9) above, and (ii) the constant ⟨E − E_MF⟩_MF depends on the mean field value only. Now consider the impact on the two approximate methods described above.

Method 1 works with the posterior density rather than directly with the partition function, so the constant factor in the GBF bound cancels between the numerator and denominator and the method is unchanged. Method 2, however, equates the derivatives of the prior and posterior log-partition functions. Using the GBF bound modifies these derivatives, since the constants differ for the prior and posterior partition functions, and hence the mean field approximated likelihood equation is modified to include this difference. This constitutes our third method, which is compared with the others in our first Monte Carlo experiment. Using results from Section 4.2, the third mean field approximated maximum likelihood equation is:

Σ_j [∫_{x_j} U_j^PO(x_j) exp{ln P(y|x_j, x̄^PO_{N_j}) − βU_j^PO(x_j)} dx_j] / [∫_{x_j} exp{ln P(y|x_j, x̄^PO_{N_j}) − βU_j^PO(x_j)} dx_j] + ∂/∂β [⟨E − E_MF⟩^PO_MF − ⟨E − E_MF⟩^PR_MF]
= Σ_j [∫_{x_j} U_j^PR(x_j) exp{−βU_j^PR(x_j)} dx_j] / [∫_{x_j} exp{−βU_j^PR(x_j)} dx_j].   (5.24)

5.4 Numerical Methods

Using the approximations described above, the MAP estimate of the image and the ML estimate of the hyperparameter are jointly computed using a two step iteration:

[1] Initialize the image x^0 and hyperparameter β^0. Set k = 0.
[2] Maximize P(x|y; β^k) to find x^{k+1}.
[3] Solve the approximated likelihood equation for β^{k+1}, using x^{k+1} as the current posterior mode field.
[4] Set k = k + 1, go to step [2].

Figure 5.2: Illustration of the adaptive quadrature.

In practice, neither step 2 nor step 3 need be iterated to convergence before moving to the next step.
We have no convergence proof for this method. However, in running the method under a wide range of conditions, we have never observed a case in which it fails to converge. We refer to this method as mode field approximated maximum likelihood (MFAML).

We used a Newton-Raphson root finding procedure to solve the approximated likelihood equation (5.21) or (5.23) when updating the hyperparameter. All integrals encountered were computed numerically using an adaptive quadrature method [48], described below. The Newton-Raphson root finding procedure is combined with a deterministic MAP algorithm to perform simultaneous image and hyperparameter estimation, following the algorithm described above.

When we compute a numerical integral of a Gaussian-like function using uniformly distributed nodes, a great number of function evaluations must be made. A reduction of effort can be attained by choosing nonuniformly distributed nodes. Adaptive quadrature methods can be used to control the locations of the nodes automatically [48]. An integral can generally be computed more accurately over a short interval than over a long one, so it is advantageous to subdivide the interval of integration [a, b] prior to applying a quadrature rule. Adaptive methods subdivide [a, b] successively until the prescribed accuracy is attained on each subinterval by the quadrature formula used. The subdivision is finer where the function f(x) varies greatly and coarser in regions of small variation. The decision whether or not an interval has to be divided further is made by comparing two different approximations I_1 and I_2 of the same subintegral. In our applications, on the subinterval [a_j, b_j] we use

I_1 = (h_j/2)[f(a_j) + f(b_j)], where h_j = b_j − a_j,

and

I_2 = (h_j/4)[f(a_j) + f(b_j) + 2f(m_j)], where m_j = (a_j + b_j)/2.

When the absolute difference between the two approximations is smaller than a prescribed precision, we stop subdividing the interval.
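The interval-splitting rule just described can be implemented recursively: compute I_1 and I_2 on a subinterval, accept I_2 when they agree to within the prescribed precision, and otherwise split at the midpoint. A minimal sketch follows (not the dissertation's code; the tolerance-halving policy on recursion is one common choice):

```python
import math

def adaptive_trapezoid(f, a, b, tol=1e-9, depth=40):
    """Adaptive trapezoidal quadrature: compare the one-panel estimate I1
    with the two-panel estimate I2 and subdivide where they disagree."""
    m = 0.5 * (a + b)
    fa, fb, fm = f(a), f(b), f(m)
    i1 = 0.5 * (b - a) * (fa + fb)               # I1 on [a, b]
    i2 = 0.25 * (b - a) * (fa + fb + 2.0 * fm)   # I2: two half-panels
    if depth == 0 or abs(i2 - i1) < tol:
        return i2
    # Recurse on each half with a tightened tolerance.
    return (adaptive_trapezoid(f, a, m, 0.5 * tol, depth - 1) +
            adaptive_trapezoid(f, m, b, 0.5 * tol, depth - 1))

# Example: a Gaussian-shaped integrand of the kind met in (5.21)-(5.23).
val = adaptive_trapezoid(lambda x: math.exp(-x * x), -6.0, 6.0)
```

The recursion concentrates nodes where the integrand is sharply peaked, which is exactly the behavior illustrated in Figure 5.2.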
In Figure 5.2, we illustrate a typical distribution of the nodes generated by this adaptive scheme.

5.5 Performance Studies

5.5.1 Estimator Bias and Variance Using Stochastic Sampling

We used extensive Monte Carlo simulations to evaluate the performance of the three new MFAML hyperparameter estimators on the problem of image restoration from 3 x 3 blurring and additive Gaussian or Poisson noise. We compared the three methods described above with generalized maximum pseudolikelihood (GMPL) and the method of moments (MOM), in which the statistic M(y) takes the same form as the Gibbs energy function of the prior, computed over the noisy image y with eight-nearest-neighbor interactions.

We performed the Monte Carlo studies for image restoration as follows. For each value of the hyperparameter, fifty sample images were drawn from a specific prior using the Metropolis algorithm [39]. To verify the sampled images, we used maximum pseudolikelihood from complete data to estimate the hyperparameter from the sampled images. Each sampled image was then blurred by one of the following 3 x 3 kernels:

Opt 1:
0.001  0.028  0.001
0.028  0.884  0.028
0.001  0.028  0.001

Opt 2:
1/16  1/8  1/16
1/8   1/4  1/8
1/16  1/8  1/16

Opt 3:
1/9  1/9  1/9
1/9  1/9  1/9
1/9  1/9  1/9

Note that the smoothing effect becomes stronger from Opt 1 to Opt 3, which makes the inverse problem increasingly ill-posed. Pseudo-random Poisson noise or Gaussian noise with known variance σ² was generated to contaminate each of the resulting blurred images. The likelihood functions for these noisy data take the forms of (1.1) and (1.2) for Gaussian and Poisson noise respectively. The hyperparameters were estimated with each method of interest for each of the fifty noisy images. Since the original images are sampled from specific priors with known hyperparameter values, we were able to calculate the bias and variance across the fifty resulting estimates.
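The degradation step of this simulation is easy to reproduce. The sketch below applies a 3 x 3 kernel (Opt 2 as given above) by direct summation over the nine shifts and then adds Gaussian noise; the edge-replication boundary handling is our assumption, since the text does not specify one.

```python
import numpy as np

# Opt 2 kernel from the text.
OPT2 = np.array([[1., 2., 1.],
                 [2., 4., 2.],
                 [1., 2., 1.]]) / 16.0

def blur_and_degrade(img, kernel, sigma, rng):
    """Blur an image with a 3x3 kernel and contaminate it with N(0, sigma^2) noise."""
    padded = np.pad(img, 1, mode="edge")   # replicate borders (our choice)
    h, w = img.shape
    out = np.zeros((h, w))
    for di in range(3):
        for dj in range(3):
            out += kernel[di, dj] * padded[di:di + h, dj:dj + w]
    return out + rng.normal(0.0, sigma, (h, w))

rng = np.random.default_rng(0)
degraded = blur_and_degrade(np.ones((8, 8)), OPT2, 2.0, rng)
```

Because the kernel sums to one, a noiseless constant image passes through unchanged, which is a convenient sanity check on the implementation.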
Table 5.1: Monte Carlo test for hyperparameter estimation comparing the performance of generalized maximum pseudolikelihood (GMPL), the method of moments (MOM), and the three mode field approximated ML methods (MFAML) described in Section 5.3. Mean, percentage bias and standard deviation (STD) were computed using 50 independent images drawn from the prior using a Gibbs sampler and then blurred and contaminated with Gaussian noise N(0,16). The prior had a quadratic Hamiltonian defined on a second order neighborhood. (* indicates the algorithm fails to reach a solution.)

True β           0.0004    0.0010    0.0040    0.0100    0.0400    0.100
GMPL   Mean      4.134e-4  1.093e-3  7.742e-3  *         *         *
GMPL   Bias (%)  3.35      9.30      93.6      *         *         *
GMPL   STD (%)   1.74      1.60      6.49      *         *         *
MOM    Mean      4.039e-4  1.012e-3  4.175e-3  1.154e-2  0.0775    0.3565
MOM    Bias (%)  0.97      1.20      4.37      15.4      93.7      257
MOM    STD (%)   2.13      3.21      10.9      31.0      340       380
MFAML1 Mean      4.010e-4  1.009e-3  4.114e-3  1.071e-2  0.0472    0.1241
MFAML1 Bias (%)  0.25      0.89      2.84      7.11      17.7      24.1
MFAML1 STD (%)   1.64      1.35      1.63      2.23      3.98      8.81
MFAML2 Mean      3.932e-4  9.777e-4  3.794e-3  9.889e-3  0.0306    0.0579
MFAML2 Bias (%)  -1.70     -2.23     -5.14     -11.1     -23.4     -42.1
MFAML2 STD (%)   1.53      1.45      1.63      2.58      7.08      16.3
MFAML3 Mean      4.153e-4  1.004e-3  3.977e-3  9.526e-3  0.0357    0.07842
MFAML3 Bias (%)  3.8       0.42      -0.572    -4.74     -10.7     -21.58
MFAML3 STD (%)   1.21      1.37      1.27      2.51      4.02      9.02

5.5.1.1 Comparative studies

GMPL works well only in limited cases, for relatively well conditioned problems. We first compare the performance of MFAML 1-3, GMPL and MOM using data generated from the quadratic prior, blurred by Opt 1, and contaminated by zero mean Gaussian noise with variance σ² = 4. Table 5.1 summarizes the ensemble statistics of the fifty estimated hyperparameters for all methods. Note that in all cases the MFAML estimators out-perform both of the other techniques.
The differences are very clear for the cases where β is large, corresponding to very smooth images. Although the results deteriorate for all estimators, the performance of the MFAML methods remains markedly superior to MOM and GMPL. Among the three MFAML estimators, it is not surprising that MFAML 3, which is based on an optimal partition function approximation, performs better than MFAML 1 and 2. MFAML 1 and 2 tend to introduce positive and negative bias, respectively, in the estimated hyperparameter, with the absolute bias largest for MFAML 2.

5.5.1.2 Robustness to noise

We speculate that as the signal to noise ratio (SNR) worsens, the performance of hyperparameter estimation from incomplete data may deteriorate. To test the robustness of the MFAML estimators, we used the same setup as in the comparative studies above and tested the performance of the three MFAML methods over a range of noise variances. The results are summarized in Table 5.2. Although we do observe deterioration in performance as the noise variance increases, all three methods appear to perform well and are stable even for very large additive noise variances.

Table 5.2: Monte Carlo test of MFAML 1-3 under Gaussian noise with different noise variances.

True β   σ²    MFAML1 Bias/Var (%)   MFAML2 Bias/Var (%)   MFAML3 Bias/Var (%)
0.001    4     0.9  / 1.35           -2.23 / 1.45          0.42  / 1.37
         16    5.28 / 1.39           -9.36 / 1.61          0.47  / 1.68
         36    11.1 / 1.85           -14.3 / 2.02          -1.37 / 1.99
         100   23.2 / 3.83           -31.4 / 4.12          -11.5 / 2.19
         400   34.8 / 10.4           -94.1 / 14.2          -25.5 / 12.1

5.5.1.3 Robustness to different smoothing operators

The conditioning of the likelihood affects the degree of ill-posedness of the inverse problem: the more ill-conditioned the likelihood function is, the worse the performance of the hyperparameter estimator. The results in Table 5.3 show that this is the case for the three MFAML estimators.
Ill-conditioning of the likelihood appears to produce increased bias as well as increased variance in the hyperparameter estimates. The bias in the ML estimator is probably due to the approximations introduced through the mean field approximations. An interesting observation from these results is that MFAML 2 has smaller absolute bias than MFAML 1 when the conditioning of the likelihood becomes worse (e.g., Opt 2 and 3), while in most other settings we observe the contrary. It is again not surprising that MFAML 3 remains the best, because it uses the optimal approximation. Nevertheless, all the estimated hyperparameter values fall within a tolerable range. Later, in the quantitative studies, we will see that the biases in the estimates do not significantly affect the quantitative accuracy of the Bayesian estimates. In comparison, with very ill-conditioned smoothing operators, both GMPL and MOM fail to identify the hyperparameter.

Table 5.3: Robustness to different smoothing operators.

True β   operator   MFAML1 Bias/Var (%)   MFAML2 Bias/Var (%)   MFAML3 Bias/Var (%)
0.004    Opt 1      2.84 / 1.63           -5.14 / 1.63          -0.572 / 1.27
         Opt 2      145  / 3.78           -26.2 / 2.45          46.6   / 2.07
         Opt 3      170  / 3.31           -30.7 / 2.14          46.0   / 2.14
0.01     Opt 1      7.11 / 2.23           -11.1 / 2.58          -4.74  / 2.51
         Opt 2      240  / 5.88           -52.3 / 6.68          5.96   / 1.97
         Opt 3      262  / 10.2           -61.0 / 7.01          8.48   / 2.01
0.04     Opt 1      17.7 / 3.89           -23.4 / 7.08          -10.7  / 4.02
         Opt 2      157  / 8.7            -62.5 / 8.19          -46.2  / 1.26
         Opt 3      164  / 9.71           -70.1 / 8.60          44.1   / 5.21

5.5.1.4 Restoration with Poisson noise

Here we test the performance of MFAML on problems with a Poisson likelihood function. We drew fifty sample images from the quadratic prior and blurred them with kernel Opt 2. We then generated pseudo-random Poisson data using the pixel values of the blurred images as the means. The hyperparameters were estimated using MFAML 1 and 2.
We did not use MFAML 3 because we could not obtain a closed form for the optimal reference field unless we used a Gaussian function to approximate the Poisson density function. The results are shown in Table 5.4.

Table 5.4: Restoration from Poisson noise.

True β   operator   MFAML1 mean   Bias (%)   Var (%)   MFAML2 mean   Bias (%)   Var (%)
0.01     Opt 1      0.01091       9.1        2.64      0.009812      -18.8      3.04
0.01     Opt 2      0.02103       110.3      10.2      0.005665      -43.35     9.21

5.5.2 Applications and Validations with Real Images

In this experiment, we first used the 3 x 3 blurring mask Opt 2 to blur two 256 by 256 real images ("Boat" and "Moon"). The single pixel sample space of the Boat image is [0,255], and that of the Moon image is [0,128]. We then generated Gaussian noise distributed as N(0, 20²) to contaminate the resulting Boat image, and pseudo-random Poisson noise to contaminate the Moon image. We performed two experiments on these noisy images.

First, we applied conjugate gradient MAP restoration with the quadratic prior, together with the three MFAML simultaneous hyperparameter estimators, to restore the Boat image. With this test we observe the effect of the three estimators on the restoration of real images. The results are shown in Figure 5.4.

Next, we used different priors (quadratic, Huber, Hebert and Leahy, Geman and McClure) and MFAML 1 and 2 to restore both the Boat and the Moon images. In this test, we aim to compare the performance of the MFAML algorithms with different priors. The restored images are shown in Figures 5.3 and 5.7. For these images we do not know the true hyperparameter; however, we do know the original image. We also know that the posterior is symmetric for the Boat image and very close to symmetric for the Moon image, and thus the MAP estimate equals or approximately equals the MMSE estimate.
Therefore, to evaluate the performance of our algorithms, we restored the two degraded images using a number of different hyperparameter values and computed the squared error between each restored image and the original. The curves of restored image error as a function of the hyperparameter are shown in Figures 5.5 and 5.6 for the Boat image, and in Figures 5.8 and 5.9 for the Moon image. Finally, we located the MFAML 1 and MFAML 2 estimates of the hyperparameter, denoted by "*" and "X" respectively, on each curve; the MFAML 3 estimate is indicated by "O" whenever available. If the hyperparameter estimation is successful, it should direct the restoration near the optimal location, i.e. the minimum squared error. We found that this is indeed the case.

Figure 5.3: Experiment for image restoration from Gaussian noise (Boat image). Top row: left = original, right = noisy; middle row: left = MAP with quadratic prior at β = 0.000384, right = MAP with Huber prior; bottom row: left = MAP with Hebert and Leahy prior, right = MAP with Geman and McClure prior. All images shown correspond to the β estimated using MFAML 1.

Figure 5.4: Comparison of restored images using the three different forms of the MFAML method. Top row: left = MFAML 1, center = MFAML 3, right = MFAML 2; bottom: plot of global squared error versus hyperparameter, where the star, circle and cross correspond to MFAML 1 to 3 respectively.

Figure 5.5: Plots of global squared error of MAP images versus hyperparameter values and L-curve (panels: MAP with quadratic prior; MAP with Huber prior). Left: squared error versus hyperparameter; right: L-curve.
Figure 5.6: Plots of global squared error of MAP images versus hyperparameter values and L-curve (panels: MAP with Hebert and Leahy prior; MAP with Geman and McClure prior). Left: squared error versus hyperparameter; right: L-curve.

Figure 5.7: Experiment for image restoration from Poisson noise (Moon image). Top row: left = original, right = noisy; middle row: left = MAP with quadratic prior, right = MAP with Huber prior; bottom row: left = MAP with Hebert and Leahy prior, right = MAP with Geman and McClure prior. All images shown correspond to the β estimated using MFAML 1.

Figure 5.8: Plots of global squared error of MAP images versus hyperparameter values and L-curve for the Moon image with Poisson noise (panels: MAP with quadratic prior; MAP with Huber prior). Left: squared error versus hyperparameter; right: L-curve.

Figure 5.9: Plots of global squared error of MAP images versus hyperparameter values and L-curve for the Moon image with Poisson noise (panels: MAP with Hebert and Leahy prior; MAP with Geman and McClure prior). Left: squared error versus hyperparameter; right: L-curve.

Chapter 6

Application I: Bayesian Optical Flow Computation Using Vector MRF Priors and Simultaneous Maximum Likelihood Hyperparameter Estimation

6.1 Introduction

Optical flow, according to Horn and Schunck's original definition [29], is the apparent motion of brightness patterns observed in an image sequence. Recovering the optical flow from an image sequence is an ill-posed problem. Under the assumption that the intensity of each point on an object remains constant along its trajectory through the image sequence, there is in general no unique motion field associated with a given moving brightness pattern. As an extreme example, a ball of uniform intensity undergoing translation is indistinguishable from the same object undergoing both translation and rotation about its center.
Horn and Schunck suggest using Tikhonov regularization [53] to resolve the ill-posedness of the motion estimation problem by imposing a smoothness constraint on the solution. Enkelmann et al. [15] developed the so-called oriented smoothness constraint, which smoothes the velocity only along directions orthogonal to the local image gradients and curvature, i.e. in the directions in which the velocity field is not identifiable from the data. Arguing that such smoothness constraints are ad hoc, some adopt physical constraints from continuum theory to solve the ill-posed motion problem [16, 50, 68]. Although oriented smoothness constraints allow spatially-variant smoothing, regularization-based motion estimation methods in general cannot preserve motion discontinuities very well. These motion discontinuities occur mainly at the occluding boundaries of moving objects and are the most important information that the estimated motion field can carry to the next level of processing in computer vision applications.

In recent years, many researchers have noted that Bayesian techniques, together with an extension to vector fields of the compound MRF priors discussed in Chapter 2, can be applied to the estimation of motion fields. A binary line field can be used to represent and preserve discontinuities in the motion field. Bouthemy and Lalande [10] and Heitz and Bouthemy [28] apply this type of CMRF prior to formulate motion detection and segmentation simultaneously, using iterated conditional modes (ICM) to optimize the resulting Bayesian cost function. More recently, Konrad [32] presented a more general Bayesian formalism of the motion estimation problem, which was solved using simulated annealing via a Gibbs sampler.

In this chapter, we first establish a Bayesian formalism for optical flow estimation using vector MRF priors with or without line processes.
We then extend those priors with semi-convex or non-convex potentials, such as the Huber, Hebert and Leahy, or Geman and McClure functions, to model the vector motion field. We also use a conjugate gradient algorithm, accompanied by a Newton-Raphson line search, to optimize the resulting log-posterior function. We note that these vector MRF priors are also able to preserve motion discontinuities very well.

When a line process is used, as discussed in previous chapters, the "hard decision" nature of ICM in estimating binary line processes tends to lead to premature convergence of the line field relative to the image field (or, in this case, the motion field). In other words, the method converges to an undesirable local minimum. This shortcoming becomes more severe in the motion estimation problem, in which we often deal with "sparse data". Sparseness occurs in motion estimation because motion can only be detected where image gradients are significant. Using ICM to compute a MAP estimate of the motion field thus tends to produce spurious motion discontinuities. Simulated annealing algorithms can resolve this problem but are expensive. Here, we propose an extension of the deterministic annealing algorithms described in Section 3.3.2 to compute a Bayesian motion estimate. We also combine the motion estimation algorithms with simultaneous mean field approximated maximum likelihood hyperparameter estimation. Zhang [61] independently developed a similar extension of the deterministic algorithm to motion estimation, but without hyperparameter estimation.

6.2 A Bayesian Formalism

We consider an array of displacement vectors d = {d(r_i)} defined on the lattice Λ indexed by i ∈ S, where Λ = {r_i : i ∈ S} and r_i = (x_i, y_i) denotes the spatial position. The displacement field d is associated with an underlying continuous image g which is discretized at times t− and t+ over the same lattice, the two frames being denoted g_{t−} and g_{t+} respectively.
The displacement field d is a set of 2-D vectors such that for all r_i ∈ Λ, i ∈ S, the preceding image point (r_i, t−) has moved to the following point (r_i + d_i(r_i), t+). Therefore, the displacement vector field d is the motion field or optical flow associated with the discretized image pair g_{t−} and g_{t+}. In reality, the displacement field d may not be defined at all points of Λ, for example in the case of occlusions, and it may not be unique, as in the case of a rotating uniform disk. For the time being, we assume our displacement fields d obey the following rules:

1. Conservation of intensity: for each image point (r_i, t−) in the preceding image, the image intensity does not change when it moves to the point (r_i + d_i(r_i), t+) in the following image, i.e., g(r_i + d_i(r_i), t+) = g(r_i, t−).

2. Smoothness constraint: neighboring pixels have similar displacements; exceptions occur only at region boundaries.

According to the intensity conservation assumption, our observation model is a noise corrupted version of the displaced pixel differences:

e(d(r_i), r_i) = g(r_i, t+) − g(r_i − d(r_i), t−).   (6.1)

For notational simplicity, we will write g_1 and g_2 instead of g_{t−} and g_{t+}, and d_i instead of d(r_i). The discretized version of our observation model is then:

e(d_i, r_i) = g_2(r_i) − g_1(r_i − d_i).   (6.2)

Because the components of d_i are not necessarily integer multiples of the pixel size, some interpolation is necessary to compute g_1(r_i − d_i). We assume the e(d_i, r_i) to be independent and identically distributed Gaussian random variables. We can then write the likelihood function as:

P(g_2|d, g_1) = (2πσ²)^{−M/2} exp{−U_l(g_2|d, g_1)/(2σ²)},   (6.3)

where M is the total number of motion vectors defined on Λ and

U_l(g_2|d, g_1) = Σ_{i=1}^{M} e(d(r_i), r_i)²

is the energy of the likelihood function.
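The interpolation required to evaluate g_1(r_i − d_i) at non-integer displacements can be done bilinearly. The sketch below evaluates the displaced frame difference (6.2) over a dense flow field; the border clamping and the function names are our own choices, not the dissertation's.

```python
import numpy as np

def bilinear(img, x, y):
    """Bilinear interpolation of img at real-valued (x, y), with the base
    corner clamped so border accesses stay in bounds (our choice)."""
    x0 = min(max(int(np.floor(x)), 0), img.shape[1] - 2)
    y0 = min(max(int(np.floor(y)), 0), img.shape[0] - 2)
    ax, ay = x - x0, y - y0
    return ((1 - ay) * ((1 - ax) * img[y0, x0] + ax * img[y0, x0 + 1])
            + ay * ((1 - ax) * img[y0 + 1, x0] + ax * img[y0 + 1, x0 + 1]))

def dfd(g1, g2, d):
    """Displaced frame difference e(d_i, r_i) = g2(r_i) - g1(r_i - d_i), eq. (6.2).
    d[..., 0] is the horizontal component, d[..., 1] the vertical one."""
    h, w = g2.shape
    e = np.zeros((h, w))
    for yy in range(h):
        for xx in range(w):
            e[yy, xx] = g2[yy, xx] - bilinear(g1, xx - d[yy, xx, 0],
                                              yy - d[yy, xx, 1])
    return e
```

On a linear intensity ramp translated by a subpixel amount, the correct flow drives this residual to zero, since bilinear interpolation is exact for linear images.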
Almost all of the Gibbs distributions for image priors discussed in Chapter 2 can be readily extended and applied to motion estimation by simply replacing the pixel random variables with displacement random vectors d_i = [u_i, v_i]. The Gibbs distribution for vector MRFs without line processes is then:

P(d|β) = (1/Z_d(β)) exp{−βU_d(d)}.   (6.4)

If we again confine the Gibbs energy functions to pairwise cliques, then

U_d(d) = Σ_i Σ_{j∈N_i, j>i} K_{ij} V(d_i, d_j).   (6.5)

The forms of V(·,·) are direct extensions of the potential functions defined in Chapter 2. We list them below:

Quadratic (L2 norm): V(d_i, d_j) = ‖d_i − d_j‖₂².   (6.6)

Huber function: V(d_i, d_j) = ‖d_i − d_j‖₂² if ‖d_i − d_j‖₂ ≤ δ; 2δ‖d_i − d_j‖₂ − δ² otherwise.   (6.7)

Hebert and Leahy: V(d_i, d_j) = log(1 + ‖d_i − d_j‖₂²/δ²).   (6.8)

Geman and McClure: V(d_i, d_j) = ‖d_i − d_j‖₂² / (‖d_i − d_j‖₂² + δ²).   (6.9)

We adopt a binary line process l to indicate motion discontinuities occurring at region boundaries, and use the following compound Gibbs distribution as a prior:

P(d, l|β) = (1/Z_{d,l}(β)) exp{−βU_{d,l}(d, l)}.   (6.10)

We use

U_{d,l}(d, l) = Σ_i Σ_{j∈N_i, j>i} {K_{ij} ‖d_i − d_j‖₂² (1 − l_{ij}) + δ_{ij} l_{ij}}   (6.11)

for the case without line interactions, or

U_{d,l}(d, l) = Σ_i Σ_{j∈N_i, j>i} {K_{ij} ‖d_i − d_j‖₂² (1 − l_{ij})} + U_L(l)   (6.12)

for the case with line interactions, where U_L(l) is a sum of potentials assigned to all line cliques. Following Bayes' rule, we derive the posterior distributions as follows:

P(d|g_1, g_2, β) = P(g_2|d, g_1) P(d|g_1, β) / P(g_2|g_1)   (6.13)

for the prior without a line process, and

P(d, l|g_1, g_2) = P(g_2|d, l, g_1) P(d, l|g_1) / P(g_2|g_1)   (6.14)

for the prior with a line process. P(g_2|d, g_1) is the likelihood (6.3). P(d|g_1, β) and P(d, l|g_1) can be any of the prior Gibbs distributions discussed previously, if we consider d, or (d, l), to be independent of g_1.
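Written as functions of the difference norm t = ‖d_i − d_j‖₂, the four potentials (6.6)-(6.9) take only a few lines each. These are the standard forms of the named potentials with a common scale parameter δ; the exact parameterization in Chapter 2 may differ slightly.

```python
import numpy as np

def quadratic(t):
    return t ** 2

def huber(t, delta=1.0):
    # Quadratic near zero, linear in the tail; continuous at t = delta.
    return np.where(t <= delta, t ** 2, 2.0 * delta * t - delta ** 2)

def hebert_leahy(t, delta=1.0):
    # Grows only logarithmically for large differences.
    return np.log1p((t / delta) ** 2)

def geman_mcclure(t, delta=1.0):
    # Bounded by 1, so large motion differences are penalized only weakly,
    # which is what lets edges in the field survive.
    return t ** 2 / (t ** 2 + delta ** 2)
```

The semi-convex (Huber) potential keeps the optimization well behaved, while the bounded non-convex ones (Hebert-Leahy, Geman-McClure) are the most discontinuity-preserving.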
The dependency of d, or (d, l), on g_1 allows the integration of information from other imaging modalities, as we will discuss later; in those cases, P(d|g_1, β) and P(d, l|g_1) will be modified accordingly.

Based on the discussion above, the MAP estimates d̂ and (d̂, l̂) are the solutions of the following optimization problems:

d̂ = arg max_d { ln P(g_2|d, g_1) + ln P(d|β) }
  = arg min_d { Σ_i (g_2(r_i) − g_1(r_i − d_i))² / (2σ²) + β Σ_i Σ_{j∈N_i, j>i} K_{ij} V(d_i, d_j) },

(d̂, l̂) = arg max_{d,l} { ln P(g_2|d, l, g_1) + ln P(d, l|g_1, β) }
  = arg min_{d,l} { Σ_i (g_2(r_i) − g_1(r_i − d_i))² / (2σ²) + β [ Σ_i Σ_{j∈N_i, j>i} K_{ij} ‖d_i − d_j‖₂² (1 − l_{ij}) + U_L(l) ] }.

6.3 Computing the MAP Estimate of Motion Fields

6.3.1 Deterministic Algorithms for MAP Motion Estimation

Since each component u_i and v_i of a displacement vector can take any real value, no positivity constraints are required here. Thus we can easily extend the deterministic relaxation algorithms described in Chapter 3 to the above motion estimation problems. There are only two differences: (i) the unknown field is a vector field instead of a scalar field, and (ii) the energy of the likelihood function is highly non-linear and non-convex and is an implicit function of the unknown. The first issue is in fact resolved in the formalism itself and introduces no obstacle to implementing the algorithms of Chapter 3. The second problem is non-trivial but can be resolved with successive linear approximations of the observation model using the current estimate of the motion field [68]. Suppose d^(n) is an approximation to the true solution d of the original motion estimation problem, and let

δd = d − d^(n).   (6.15)
= m axrgA ^^d(r) ) is small, we can use a linear Taylor series approximation g x{ r - d ) « fll(r - d(n > ) - V r/,(r - d ^ ) r S d , ( 6 . 1 6 ) 91 where Vgi(r - d (n*) denotes the gradient of field g,(r — d (n)). Substituting this approximation back into the likelihood energy function, we have U,(d ) « f a C-)- » ■ ! ' - ~ (6.17) We recognize this as a generalization of Horn and Schunck’s optical flow constraint, which is a quadratic cost function over Sd. By combining this constraint with the prior energy function, we form a new cost function on the unknown Sd. The solution of the new cost function S d ^ can then be computed by applying the algorithm s de veloped in Chapter 3, because now the likelihood energy function is a quadratic func tion. Therefore, we obtain a refined solution of the motion field ttU+i) _ u (’» ) By successively applying this procedure, we are able to reach a suboptirnal solution of the original cost function. The successive linear approximation is sometimes ap propriate to perform only in a multiresoultion framework [68], especially when the sizes of the motion vectors are relatively large. In this dissertation, we only illustrate a restricted case in which the relative sizes of interframe displacements are so small that successive linear approximations can be performed at a single resolution. For a given hyperparam eter value fj, a deterministic algorithms for computing the M AP motion field estimate from the posterior using a motion field prior without a line process is: 1. Initialize the motion field d ^ = d°. Set n = 0. 2. Initialize S d = 0. Perform a linear approximation of the function g\(r — d) using d^n\ 3. Use a conjugate gradient algorithm to compute the S d of the new approxi m ate log-posterior function. 4. Compute i*b‘+>) = u (n) q. S d ^ 5. If convergence has been achieved, stop; otherwise, n = n + l and go back to step 2 . 
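The successive-linearization loop above can be illustrated in one dimension. The sketch below makes simplifying assumptions not in the text: a single global scalar displacement instead of a field, no prior term, `np.interp` standing in for image resampling, and a closed-form least-squares step in place of the conjugate gradient solver:

```python
import numpy as np

# Smooth 1-D "image" and a shifted second frame: g2(r) = g1(r - d_true)
x = np.linspace(0, 2 * np.pi, 400)
g1 = np.sin(x) + 0.5 * np.sin(3 * x)
d_true = 0.15
g2 = np.interp(x - d_true, x, g1)

def warp(d):
    # g1(r - d), resampled on the grid x
    return np.interp(x - d, x, g1)

d = 0.0                              # step 1: initialize d^(0)
for n in range(20):
    g1w = warp(d)                    # step 2: linearize about d^(n)
    grad = np.gradient(g1w, x)       # spatial gradient of g1(r - d^(n))
    resid = g2 - g1w                 # residual of the observation model
    # g2 - g1(r - d) ~ resid + grad * delta, so the least-squares
    # delta of the quadratic approximation is (step 3):
    delta = -np.dot(grad, resid) / np.dot(grad, grad)
    d = d + delta                    # step 4: d^(n+1) = d^(n) + delta
    if abs(delta) < 1e-10:           # step 5: convergence test
        break

print(round(d, 4))
```

Starting from d = 0, the loop recovers a displacement close to d_true; small boundary effects come from the clamped interpolation, not from the linearization itself.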
Combining the above algorithm with the deterministic annealing procedure discussed in Chapter 3, we can compute the MAP estimate of a motion field using a compound MRF prior. We summarize the algorithm briefly as:

1. Initialize T_k = T_0, where T_0 corresponds to a high temperature. Initialize d^(0)(T_k) = d⁰ and l^(0)(T_k) to zero everywhere. Set k = 0 and n = 0.
2. Freeze the line process, and use the algorithm above to compute the motion field d(T_k) using the previously discussed successive approximation method and conjugate gradient algorithms.
3. Freeze the motion field d(T_k), and use the ICA algorithm to update the line process.
4. Decrease the temperature (annealing). In practice, we use T_{k+1} = 0.9 T_k. If the temperature has reached the prescribed lowest value, stop; otherwise go back to step 2.

6.3.2 Integrating Visual Discontinuities from Different Modalities

With the appropriate choice of Gibbs prior, Bayesian methods for computing a motion field can preserve discontinuities in the motion field. However, since the discontinuities are estimated simultaneously with the motion field itself, in practice results are often not satisfactory. We speculate that if we can obtain entire or partial information about discontinuities a priori from another imaging modality, and this information is incorporated into a Bayesian formulation, the quality of the estimate may be greatly improved. In fact, motion discontinuities occur mostly on object boundaries; therefore, rather than using a second imaging modality, we could use separate processing of the same data to extract object boundaries.

It is useful to further factor the prior distributions involving line processes according to Bayes' rule:

P(d, l|g_1) = P(d|l, g_1) P(l|g_1)   (6.18)

where P(d|l, g_1) and P(l|g_1) are themselves defined as Gibbs distributions, e.g.

P(l|g_1) = (1/Z_l(g_1)) exp{-β U_l(l|g_1)}.   (6.19)

This factorization indicates clearly that the dependency of the motion field on g_1
can be characterized through the line process. Therefore, we should use the image intensity edges of g_1 to affect the formation of the line process when estimating motion fields. The two methods described in Chapter 2 are applicable here, using intensity edges from g_1 to affect the values of the α_ij's in the energy function (6.11). We also consider a method that uses the intensity gradient of g_1 to affect these α_ij's without explicitly computing an intensity edge map, by defining

α_ij = α / (1 + |∇g_1(x_i, y_i)|²),   (6.21)

where ∇g_1(x_i, y_i) denotes the local image gradient component along the direction from (x_i, y_i) to (x_j, y_j).

6.3.3 Hyperparameter Estimation

Because the energy function (log-likelihood) of the motion problem posed above is an implicit function of the unknown, mean field approximated maximum likelihood estimation can only be applied to the approximated cost function described above, i.e. the linear approximation of g_1(r - d) using the current estimate d^(n).

When applying the mean field approximation to either prior or posterior Gibbs distributions of a Markov vector random field in the form P(d|β), the mean field reference system will again be a product of local mean field reference densities, but each of the local densities is two dimensional on the vector d_i = (u_i, v_i), i.e.

P(d|β) ≈ Π_i P(d_i | d_j, j ∈ N_i; β).   (6.22)

To estimate the hyperparameter, we have to evaluate all the resulting two dimensional integrals which occur in the mean field local partition functions. For Gaussian prior and posterior functions, these integrals have a simple closed form. For non-Gaussian prior and posterior functions, numerical evaluation of the two dimensional integrals is non-trivial and expensive. For these cases, we shall decouple the different components in each random vector by redefining the vector form of the MRF as follows: the motion field is an MRF formed by the union of two collections of random variables {u_i, i ∈ S} ∪ {v_i, i ∈ S}.
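The gradient-weighted α_ij of (6.21) can be sketched on a small image. The decreasing map α/(1 + |∇g_1|²) below is our assumption for the partly illegible functional form in our source, and the per-axis gradients approximate the directional component along the i-to-j direction:

```python
import numpy as np

def alpha_weights(g, alpha0=1.0, c=1.0):
    """Directional line-site weights from image gradients.

    Returns horizontal and vertical weight maps in which alpha_ij
    shrinks where the gradient of g along the i->j direction is large,
    making it cheaper to switch on a line (discontinuity) there.
    The 1/(1 + c*grad^2) form is an assumed decreasing function.
    """
    gy, gx = np.gradient(g.astype(float))
    # weight for a line between horizontal neighbours (j to the right of i)
    alpha_h = alpha0 / (1.0 + c * gx ** 2)
    # weight for a line between vertical neighbours (j below i)
    alpha_v = alpha0 / (1.0 + c * gy ** 2)
    return alpha_h, alpha_v

# A step image with a strong vertical intensity edge at column 4
g = np.zeros((8, 8))
g[:, 4:] = 10.0
ah, av = alpha_weights(g)
print(ah[3, 4] < ah[3, 0])   # cheaper to open a line across the edge
```

On this step image, α is small only where the horizontal gradient is large, so line sites are encouraged exactly along the intensity edge.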
Denote the lattices on which u_i and v_i are defined by S_u and S_v respectively. Then the corresponding lattice for the new MRF is S' = S_u ∪ S_v. The neighborhood system on the lattice S' is defined as follows: for each site i ∈ S_u, the union {v_i} ∪ {u_j, j ∈ N_i ⊂ S_u} is the neighboring set, and for each site i ∈ S_v, the union {u_i} ∪ {v_j, j ∈ N_i ⊂ S_v} is the neighboring set. With this new definition, the joint Gibbs distribution P(u, v|β) can take the same form as P(d|β) before. However, we are now able to define the following mean field reference distribution:

P(u, v|β) ≈ Π_i P(u_i | u_j, j ∈ N_i; v_i) Π_i P(v_i | v_j, j ∈ N_i; u_i).   (6.23)

After these steps the implementation of mean field approximated maximum likelihood estimation of the hyperparameter becomes straightforward.

6.4 Experiments and Conclusions

We use the new Bayesian algorithm discussed above, with various forms of vector MRF priors, to estimate the motion field of the ball in the two image pairs shown in Fig. 6.1(a) and (b). These computer generated image pairs correspond to a ball under rotational motion about axes parallel or perpendicular to the page. Each image has a resolution of 64 x 64 pixels. The true motion field outside the ball should be zero everywhere.

We note that results from using the Bayesian algorithm with quadratic priors (shown in Figure 6.2(a) and Figure 6.4(a)) are similar to those using the traditional regularization method of Horn and Schunck, i.e. the motion field is smoothed everywhere, including at the object boundary. This causes large non-zero motion outside the ball, where there should be no motion. Bayesian algorithms using Huber, Hebert and Leahy, and Geman and McClure priors on the motion field only, and those using the compound priors, can all preserve motion discontinuities at the object boundaries. Results are shown in Figures 6.2 to 6.5.
We observe that using complicated line cliques (shown in Figure 6.3(b),(d) and Figure 6.5(b),(d)) gives slightly better results than the other methods. This shows that using complicated line interaction terms does help the formation of desirable line configurations.

We conclude that vector forms of MRF priors with non-convex Gibbs energy functions are able to model both smooth motion fields and motion discontinuities at occluding boundaries. The Bayesian approaches based on these priors generally perform better than traditional regularization methods, such as Horn and Schunck's method, in the sense of preserving motion discontinuities.

Figure 6.1: Simulated image pairs for motion estimation experiments: (a) image pair of a ball under rotation with axis perpendicular to the page, (b) image pair of a ball under rotation with axis parallel to the page.

Figure 6.2: Results from the image pair in Figure 6.1(a) using simple vector MRF priors without a line process: (a) quadratic prior, (b) Huber prior, (c) Hebert and Leahy prior, and (d) Geman and McClure prior.
Figure 6.3: Results from the image pair in Figure 6.1(a) using compound vector MRF priors with a line process: (a) vector field estimated using the weak membrane prior, (b) vector field estimated using the generalized weak membrane prior, (c) line field associated with (a), (d) line field associated with (b).

Figure 6.4: Results from the image pair in Figure 6.1(b) using simple vector MRF priors without a line process: (a) quadratic prior, (b) Huber prior, (c) Hebert and Leahy prior, and (d) Geman and McClure prior.

Figure 6.5: Results from the image pair in Figure 6.1(b) using compound vector MRF priors with a line process: (a) vector field estimated using the weak membrane prior, (b) vector field estimated using the generalized weak membrane prior, (c) line field associated with (a), (d) line field associated with (b).

Chapter 7

Application II: Bayesian Reconstruction of Transmission and Emission Tomographic Images with Simultaneous Hyperparameter Estimation

The primary problem in PET (positron emission tomography) is to reconstruct spatial distributions of radio-labelled compounds from measurements of coincidence emission data, i.e. the emission sinogram. The main purpose of reconstructing a transmission image is to compute attenuation correction factors for the emission data. Although an accurate statistical model for the observed data can be developed for both emission and transmission problems, the PET reconstruction problem is often ill-posed. This property is manifested in the high variance that is often observed in ML reconstructions at high iteration numbers.
Bayesian methods for PET image reconstruction using Markov random field (MRF) priors have proven to be very powerful for reconstruction of high quality images. However, two major problems limit the clinical utility of these Bayesian methods. The first is related to the iterative nature of the optimization algorithms for computing optimal Bayesian estimates: reconstruction times can be prohibitive, since a large number of iterations are usually required to achieve convergence. The second is the lack of a practical and robust method for selecting the parameters of the prior, i.e. the hyperparameters. For the first problem, Mumcuoglu et al. [43] have proposed using preconditioners in the penalized conjugate gradient algorithm for computing MAP PET estimates; surprisingly fast convergence can be achieved by these algorithms. In this chapter we apply the mean field approximated maximum likelihood algorithm for estimation of the global hyperparameter to the PET reconstruction problem. The combination of the fast converging preconditioned conjugate gradient algorithm (PCGA) and the new MFAML estimator for the hyperparameter allows us to perform full quantitative studies of Bayesian PET approaches using various types of prior information, including anatomical priors from MRI images.

7.1 Bayesian Formalism for PET Reconstruction

7.1.1 The Likelihood Model

Both transmission and emission PET data can be well modeled as sets of independent Poisson random variables. Here we briefly describe the models for the data and the manner in which scattered and random coincidences are included in the model. We assume the random and scatter coincidences have already been estimated before reconstruction.

Transmission Data

Transmission PET data are collected using an external positron source.
In order to reconstruct a transmission PET image, or to compute attenuation correction factors (ACFs), it is necessary to collect both blank and transmission sinograms. The blank sinogram is collected in the absence of a patient or other attenuating medium and reflects the activity of the source. The transmission sinogram is then collected with the patient in position. Both the blank and transmission sinograms are observations of a random process and consequently should both be modeled as such. The likelihood function for the transmission sinogram is a function of the expected value of the blank sinogram, rather than the blank sinogram itself. Let y_i denote the number of coincidences between the ith detector pair in the transmission sinogram and z_i denote the expected value of the corresponding element in the blank sinogram. The transmission data y_i can be modeled as independent Poisson observations which consist of true, scattered and random coincidences, with respective means t̄_i(μ), s_i and r_i.

Figure 7.1: Illustration of a PET system: the coincidence at a and b represents a transmission coincidence, at c and d an emission coincidence, at e and f a scatter coincidence, and at g and h a random coincidence.

We assume that the image can be adequately represented as a set of pixels, each with constant linear attenuation coefficient μ_j. The total attenuation between the ith detector pair is then determined by the sum of the attenuation coefficients μ_j multiplied by their respective volumes of intersection l_ij with the ith projection strip. Let τ_y and τ_z denote the data acquisition times for the transmission and blank scans respectively. Similarly, let δ_y and δ_z denote the system dead time (as a fraction of the total data collection time) for the transmission and blank scans respectively.
The mean of the true coincidences in the transmission sinogram is then

t̄_i(μ) = (τ_y δ_y / τ_z δ_z) e^{-Σ_j l_ij μ_j} z_i.   (7.1)

The normalization factor compensates for the difference in observation times between blank and transmission scans, and also for variations in system dead time.

The scatter and randoms components are also functions of the unknown attenuation coefficients. Typically the data are precorrected for randoms by subtracting a randoms sinogram from the transmission coincidence sinogram. Given independent estimates of the blank sinogram, randoms sinogram and scatter sinogram, we can write the likelihood function for the transmission image as

P(y|μ) = Π_i e^{-ȳ_i} ȳ_i^{y_i} / y_i!,   ȳ_i = t̄_i(μ) + s_i + r_i.   (7.2)

Thus, the log-likelihood function is

ln P(y|μ) = Σ_i {-ȳ_i + y_i ln(ȳ_i)}.   (7.3)

Emission Data

The development of the emission model is very similar to that of the transmission model. Again the data can be represented as a set of independent Poisson random variables x_i with mean x̄_i equal to the sum of three components: true coincidences e_i, randoms r_i, and scatter s_i. The major difference is that in this case the unknown parameters, the emission intensities in each pixel, are related to the mean of the data through an affine transform rather than an exponential:

x̄_i = E(x_i) = e_i(λ) + s_i + r_i,   (7.4)

where

e_i(λ) = (1/a_i) Σ_j P_ij λ_j.   (7.5)

In this case the elements P_ij of the matrix P denote the probability of detecting an emission from pixel site j at detector pair i, and a_i is the attenuation correction factor for pair i. As with the transmission data model, we assume that the randoms and scatter components have been separately estimated. We can then write the likelihood as

P(x|λ) = Π_i e^{-x̄_i} x̄_i^{x_i} / x_i!,   x̄_i = e_i(λ) + s_i + r_i.   (7.6)

Thus, the log-likelihood function is

ln P(x|λ) = Σ_i {-x̄_i + x_i ln(x̄_i)}.   (7.7)

Note that the only difference between the emission model (7.6) and the transmission model (7.2) is the form of the relationship between the mean of the true coincidences and the unknown parameters. Consequently, the algorithms for emission and transmission reconstruction can be developed simultaneously.

7.1.2 The Prior and Posterior Densities

A wide range of MRF priors have been studied for emission and transmission tomography, including intensity process models [47, 19, 27] and intensity and line process models [21, 30, 34]. Because both emission and transmission images consist mainly of smooth or piecewise smooth regions, all the Gibbs priors discussed in Chapter 2 are applicable here. The quantitative effect of these priors on reconstructed PET images will be shown later in this chapter. In the following section we describe procedures for computing MAP estimates under these priors by maximizing the posterior densities P(μ|y) and P(λ|x) for transmission and emission reconstruction respectively. With priors using the intensity process only, the log posterior probabilities are

ln P(μ|y) = ln P(y|μ) - β U(μ) - ln Z^t(y, β)
          = Σ_i {-ȳ_i + y_i ln(ȳ_i)} - β Σ_i Σ_{j∈N_i, j>i} κ_ij V(μ_i, μ_j) - ln Z^t(y, β)   (7.8)

ln P(λ|x) = ln P(x|λ) - β U(λ) - ln Z^e(x, β)
          = Σ_i {-x̄_i + x_i ln(x̄_i)} - β Σ_i Σ_{j∈N_i, j>i} κ_ij V(λ_i, λ_j) - ln Z^e(x, β)   (7.9)

where Z^t(y, β) and Z^e(x, β) are the appropriate partition functions. Cost functions for MAP PET estimation involving line processes can be defined in a similar manner.

7.1.3 Incorporation of Anatomical Priors in PET Image Reconstruction

In most cases, if CMRFs are used to model PET images, the line process must be estimated from the data. Estimates of the line process are influenced by Poisson variations in the coincidence data and are often inaccurate even for high count rates.
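The two Poisson likelihood models of Section 7.1.1, (7.2)-(7.3) and (7.6)-(7.7), can be sketched numerically. In the illustration below all numbers are hypothetical, the scalar `scale` stands for the τ_y δ_y / (τ_z δ_z) factor, and the convention of dividing by the ACF a_i in the emission mean is our assumption:

```python
import numpy as np

def transmission_loglike(mu, y, z, L, s, r, scale=1.0):
    """Transmission log-likelihood of (7.3), up to constants in y.

    ybar_i = scale * exp(-[L mu]_i) * z_i + s_i + r_i
    ln P(y|mu) = sum_i ( -ybar_i + y_i * ln(ybar_i) )
    """
    tbar = scale * np.exp(-(L @ mu)) * z
    ybar = tbar + s + r
    return float(np.sum(-ybar + y * np.log(ybar)))

def emission_loglike(lam, x, P, acf, s, r):
    """Emission log-likelihood of (7.7): the mean is affine in lambda."""
    ebar = (P @ lam) / acf
    xbar = ebar + s + r
    return float(np.sum(-xbar + x * np.log(xbar)))

# Tiny 2-pixel, 2-ray example with made-up geometry and counts
L = np.array([[1.0, 0.0], [0.5, 0.5]])
mu = np.array([0.1, 0.2])
z = np.array([1000.0, 1000.0])
s = r = np.array([5.0, 5.0])
y = np.array([910.0, 870.0])
print(transmission_loglike(mu, y, z, L, s, r))
```

As with any Poisson log-likelihood of this form, each term -ȳ_i + y_i ln ȳ_i is maximized when the model mean ȳ_i matches the observed count y_i, which is what drives both reconstructions.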
These errors can be reduced, as suggested in [20, 34], if accurate anatomical boundaries are extracted from a registered magnetic resonance (MR) image and used to influence the formation of the line process. It has been conjectured that through this procedure an improvement in the quantitative accuracy of the PET image will be realized. In [62], we performed a Monte Carlo simulation to evaluate the quantitative effect of incorporating anatomical priors in PET reconstruction; results will be presented later in this chapter.

As we have discussed before, discontinuities from different imaging modalities can be incorporated in the reconstruction through the line process. This can be done by affecting the value of α_ij in the weak membrane model, or the values of the potentials assigned to line cliques in the generalized weak membrane model. In turn, adjustment of the size of α_ij will affect the probability of turning on a line site between image pixels i and j. For example, in the weak membrane prior, the smaller α_ij is, the larger the probability that we turn on l_ij (l_ij = 1). Therefore, we choose α_ij(e) as some function of the anatomical boundaries extracted from the MR image which encourages the formation of discontinuities in the vicinity of anatomical boundaries. In other words, the nearer a site is to an anatomical boundary, the smaller α_ij(e) is. We consider two alternative forms for α_ij(e). The first assigns one of two values at each site, depending on whether or not there is also an anatomical boundary at that site, by defining α_ij(e) = α_1 e_ij + α_2 (1 - e_ij) [20], where α_1 << α_2 are a pair of user defined constants. In the case where the anatomical and PET boundaries coincide, we expect this method to perform well.
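The binary weighting scheme can be sketched directly. In the weak membrane energy, switching a line on replaces the smoothness cost κ||Δ||² by the line cost α_ij, so a line is energetically favourable exactly when κ||Δ||² > α_ij; the constants below are hypothetical:

```python
def alpha_binary(e, a1=0.1, a2=10.0):
    # alpha_ij(e) = a1*e_ij + a2*(1 - e_ij), with a1 << a2: a line is
    # cheap where an anatomical boundary is present (e_ij = 1)
    return a1 * e + a2 * (1 - e)

def line_favourable(diff_sq, alpha, kappa=1.0):
    # Weak-membrane rule: turning l_ij on lowers the local energy
    # exactly when kappa * ||d_i - d_j||^2 exceeds alpha_ij
    return kappa * diff_sq > alpha

# Same squared jump, with and without anatomical support
jump = 2.0                                          # kappa * ||Delta||^2
print(line_favourable(jump, alpha_binary(1.0)))     # on a boundary -> True
print(line_favourable(jump, alpha_binary(0.0)))     # elsewhere -> False
```

With α_1 << α_2, a moderate jump that would be smoothed away in a flat region is allowed to become a discontinuity when it lies on an anatomical boundary, which is exactly the intended effect.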
However, if there is no such direct correspondence between the boundaries, then defining α_ij(e) = a C_ij + b, where C_ij denotes the Chamfer distance defined on the anatomical boundary map and a and b are constants, may result in more robust behavior.

7.2 Preconditioned Conjugate Gradient for Computing the MAP Estimate and Simultaneous Hyperparameter Estimation

With the likelihood models and priors defined above, we can directly apply the MAP algorithms developed in Chapter 3 to compute a MAP reconstruction. In this section, we discuss some special issues relevant to the PET application. First, we briefly present the use of a preconditioner to accelerate the convergence of PET reconstruction algorithms, especially when penalty functions are included. Then we modify the mean field approximated maximum likelihood hyperparameter estimator so that it can be incorporated into the PET reconstruction algorithms.

7.2.1 Preconditioned conjugate gradient method

Unless we use a quadratic approximation of the likelihood function, the sequentially updated coordinate ascent procedure will involve multiple forward and backward projections of the image during each complete stage of the optimization [9]. This procedure is very expensive and impractical. Bouman and Sauer [9] use a Gaussian approximation of the likelihood function to avoid this excessive amount of computation; however, this approximation will affect the quantitative accuracy of the reconstructed images. Here we apply a preconditioned conjugate gradient algorithm
However for both transmission and emission PE T , it is necessary to confine the image to the non-negative orthant. This non-negativity constraint can be enforced by using a penalty function as discussed in Chapter 3. For the transmission problem we use the following quadratic penalty function: = (7.10) 7 y 7 In the emission case we use a modified form of (7.10): - G ( A ) = E ( ^ 7 !:) V a, + 7 ) (7.11) 7 j 1 where ?/,(.) is the unit step function. Although in theory a sequence of solutions should be generated corresponding to a decreasing sequence in the param eter 7 , in practice we find the param eter can be held constant throughout the iteration process provided an appropriate value of 7 is used. Holding 7 constant will neither significantly reduce the convergence rate of the algorithm nor result in significant negative pixel values. For PET reconstruction, standard conjugate gradient algorithms have slow con vergence rates, which become even slower when penalty functions are used. As we indicated, the trick to achieving fast convergence of the conjugate gradient method is to use an appropriate preconditioner. When appropriate preconditioners are used, the algorithm converges surprisingly fast [43]. We call these algorithms precondi tioned conjugate gradient algorithms (PCGA). As suggested by Mumcuoglu [43] we use preconditioners and ('\N^ for emission and transmission reconstruction respectively. Cin) = d i a g { ^ - } (7.12) 109 C<n) = d i a g { ^ - } (7.13) 2L, ik i W hen a penalty function is used to impose non-negativity, the preconditioners above need to be modified to CpenaityCT for transmission reconstruction and CpenaityCE for emission reconstruction. According to the work of Mumcuoglu [43], an appropriate form of Cpenaity is: s^(n) ^penalty where ij> is a scalar constant. In the standard applications of preconditioners, the preconditioner is not changed at each iteration. 
Changing the preconditioner can certainly have adverse effects on the convergence behavior, particularly for a conjugate gradient method. However, one could argue that the preconditioner will not change rapidly from one iteration to the next. It is certainly the case that practical benefits, in the form of fast convergence, are realized. If the penalized log-posterior cost function is denoted as q(0,~f), where 6 repre sents either fjt or A , then the PCG algorithms for computing th e MAP PE T image w ithout using line processes is summarized as the following: #(»»+1) _ gin) a (n)p(’ > ) • Q(n)by Newton Raphson line search p (n) = d (n) + ^ - l ) p ( n - l) d (n) = Cg(n) *<»-.) = (g(n)-g{n-l))Tdin) g ("-l)Td (n- ,) where gh*) = Vq(0,^/)g_g(») is the gradient vector, and C is the preconditioner. The PCG algorithm is initialized with p<°* = and iteratively computes the conjugate directions, p lnh Care m ust be taken to ensure th a t p*7 1 ) is an ascent direction. If the p*nl update equation is premultiplied by g ^ T, then g {n)Tp (n) _ g(")TC g (n) + /?(n-I)g (n)7 p (n-,). (7.15) / 0 0 ipdiag{ — ^-} (7.14) 1 1 0 The first term on the right hand side is positive by the positive definiteness of C . The second term can become negative which can potentially result in g(n ) V n ) < 0 (7.16) making a descent direction. In these cases, we reinitialize the PCG algorithm with p<"> = d (n). The preconditioned conjugate gradient method can be combined with the deter ministic annealing m ethod discussed in Chapter 3 for use in MAP estim ation for compound MRF priors with line processes. 7.2.2 Hyperparam eter Estim ation In principle, the mean field approximated maximum likelihood estim ator (MFAMLE) of the hyperparam eter can be directly applied here. 
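Before turning to the estimation details, the PCG recursion of Section 7.2.1 can be illustrated on a toy concave quadratic objective q(θ) = -½ θᵀAθ + bᵀθ. The matrix A, the data b, and the diagonal preconditioner below are invented for illustration, and an exact line search replaces the Newton-Raphson search (exact and Newton-Raphson coincide for a quadratic):

```python
import numpy as np

# Toy concave objective q(t) = -0.5 t^T A t + b^T t, gradient g = b - A t
A = np.array([[4.0, 1.0], [1.0, 100.0]])   # poorly scaled curvature
b = np.array([1.0, 2.0])
C = np.diag(1.0 / np.diag(A))              # diagonal preconditioner

theta = np.zeros(2)
g = b - A @ theta
d = C @ g
p = d.copy()                                # initialize p^(0) = d^(0)
g_prev, d_prev = g, d
for n in range(50):
    alpha = (g @ p) / (p @ A @ p)           # exact line search step
    theta = theta + alpha * p               # theta^(n+1) = theta^(n) + alpha p
    g = b - A @ theta
    if np.linalg.norm(g) < 1e-12:
        break
    d = C @ g                               # d^(n) = C g^(n)
    beta = ((g - g_prev) @ d) / (g_prev @ d_prev)
    p = d + beta * p                        # conjugate direction update
    if g @ p <= 0:                          # safeguard of (7.15)-(7.16):
        p = d.copy()                        # reinitialize with p = d
    g_prev, d_prev = g, d

print(np.allclose(theta, np.linalg.solve(A, b)))
```

On this two-dimensional quadratic the recursion reaches the maximizer A⁻¹b in a couple of iterations; the diagonal preconditioner plays the same role as (7.12)-(7.13), rescaling the poorly conditioned axes.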
However, the likelihood function for PET includes a very large projection matrix, which makes the numerical evaluation of the one dimensional integrals on the left hand side of the mean field maximum likelihood equations (5.21) and (5.23) very computationally demanding: each evaluation requires multiple forward and backward projections using this huge projection matrix. Fortunately, because the likelihood function is Poisson, it can be well approximated by a Gaussian. This approximation is reasonable as long as the observed photon count is larger than 10 [9], which is true for most of the data in PET. Approximating the likelihood function with a Gaussian is equivalent to using a second order Taylor expansion as a quadratic approximation of the log-likelihoods (7.3) and (7.7). For the transmission case,

ln P(y|μ) ≈ -(1/2)(p̂ - Lμ)^T D (p̂ - Lμ),   (7.17)

where L is the matrix of intersection lengths l_ij, p̂ is the vector of line-integral estimates obtained from the precorrected transmission data, and D is the diagonal weighting matrix of the second order expansion,

D = diag{d_i},   d_i ∝ 1/var(p̂_i).   (7.18)

For the emission case,

ln P(x|λ) ≈ -(1/2)(m̂ - Pλ)^T D (m̂ - Pλ)   (7.19)

= -Σ_i (m̂_i - Σ_j P_ij λ_j)² / (2 x_i),   (7.20)

with m̂_i = x_i - s_i - r_i. This approximation greatly simplifies the numerical integration when performing MFAMLE.

7.2.3 Reconstructed Transmission and Emission Images from Real Data

As an illustration of the use of the Bayesian approach in PET, we show several MAP reconstructions from real human chest data for the transmission case (Figure 7.2), and real brain data for the emission case (Figure 7.3). We also show reconstructions from filtered backprojection (FBP) [49] (the standard for clinical PET scanners) for comparison.

7.3 Quantitative Studies of Bayesian PET Image Reconstruction

In this section, we first use a computer generated brain phantom to perform quantitative studies of Bayesian PET using Monte Carlo experiments.
The purposes of the studies are to examine the quantitative effect of the hyperparameter on reconstructed PET images in terms of global mean squared error and region-of-interest (ROI) bias and variance, and to validate the hyperparameter estimation algorithm developed in this dissertation as applied to PET image reconstruction. We then use a 3-D experimental phantom to examine the quantitative performance of Bayesian PET image reconstruction using MRF priors in conjunction with the new mean field approximated maximum likelihood hyperparameter estimator. Finally, we use a computer generated brain phantom to perform a comparative study of the effect, on the quantitative accuracy of Bayesian PET image reconstruction, of integrating anatomical priors from registered MRI images.

Quantitative performance for computer generated phantom data

Realistic Poisson data were generated using a single plane of the 3-D Hoffman brain phantom (see Figure 7.4(a)). Attenuation correction factors (ACFs) were computed

Figure 7.2: Reconstructed transmission images from a set of real human chest data: (a) filtered backprojection algorithm, (b) Bayesian algorithm using quadratic prior, (c) Bayesian algorithm using Geman and McClure prior, (d) graduated non-convexity algorithm using weak membrane prior, (e) deterministic annealing using weak membrane prior, and (f) deterministic annealing and ICA procedure using weak membrane prior and line interaction terms.

Figure 7.3: (a) A single cross section of a PET registered MR scan; (b) the anatomical boundaries extracted from (a); (c) MAP reconstruction with a quadratic Gibbs prior; (d) reconstruction using deterministic annealing (DA) with estimated line processes without prior anatomical information; (e) DA with estimated line processes influenced by anatomical boundaries using a binary weighting scheme; and (f) DA with estimated line processes influenced by anatomical boundaries using a Chamfer weighting scheme.
Figure 7.4: Computerized brain phantom and the selected regions of interest (ROIs).

assuming uniform attenuation within the head boundary. A 10% uniform randoms distribution was added based on the assumption that the singles distribution was uniform across the detectors. A scatter distribution was computed using the transmission and emission images in a Klein-Nishina based Compton scatter model (about 10%). To model variable detector sensitivity, the mean sinogram data were scaled using values for intrinsic sensitivity typical of a clinical scanner. The ACFs, randoms and scatter mean values were assumed to be known. The sinogram, detector, and image sizes were chosen to simulate the Siemens ECAT 831 PET scanner: sinogram size 128 x 160; 320 detectors on a 64 cm diameter ring; detector size 0.628 cm; image pixel size 0.174 cm. Perfect parallel projections were produced by accurately computing the areas of intersection of pixels and the rectangular bins formed by joining detector pairs. The original phantom image has 256 x 256 pixels, each of size 0.087 cm; the reconstructed image is therefore exactly half the resolution of the original phantom. The phantom was scaled to produce an average of 1.6M (high count) or 400,000 (low count) coincidences per sinogram. Fifty realizations of the Poisson data were generated for both high and low counts. PET images were reconstructed for quadratic, Huber, and Geman and McClure priors. MAP images were estimated for each of the fifty sets of data, for both high and low count, for each of the priors and for a range of values of β. Since we do not know the true hyperparameter for the brain phantom, we generated a curve of
Since we do not know the true hyperparam eter for the brain phantom , we generated a curve of 10* 10' to* 1 0 ’ 1 0 * IB 1 1 0 * to' 10' io* 1 0 ' to1 1 0 ' 10' I O 1 IB ' Figure 7.5: PET simulation experiment: The top row shows sample PET images recon structed, with simultaneous MFAML hyperparameter estimation, from pseudo-random high-count data, using each of the three priors: left - quadratic, center - Huber, right - Geman and McClure. The middle row shows sample PET images reconstructed, with simultaneous MFAML hyperparameter estimation, from pseudo-random low-count data, using each of the three priors. The bottom row shows the total squared error in the esti mated MAP images as a function of (3. These errors were averaged over 50 independent realizations as described in Section 3.2. The solid line corresponds to the high count data and the dashed line to the low count data. The star indicates the mean value of f3 obtained using the MFAML method for high count data and the small circle for low count data. Note its close proximity to the minimum squared error. 1 1 6 Larg* ROI wtth Boundary Pixels (Bray matter) x 10® I Bias Square x IO 3 Interior ROI away from Boundary (Qrey matter) I Bias Square x 10* Large ROI with Boundary Pixels (Grey matter) x 10* I Bias Square x 10® Interior ROI away from Boundary (Grey matter) x 10® > Bias Square x 104 Figure 7.6: ROI variance versus squared bias plot. At each point the summation of the horizontal and vertical coordinates equals the total mean squared error. Top row: high count test; Bottom row: low count test. Left column: large ROI with boundary pixels; right column: interior ROI away from boundaries. The ROIs are marked in Figure 4(a). In the figure, solid line = MAP with quadratic prior; dot-dashed line = MAP with Huber prior; dashed line = MAP with Geman and McClure prior; dotted line = filtered backprojection. 
The star, circle and cross signs indicate locations corresponding to the MFAML estimate of the hyperparameter.

average total squared error in the reconstructed image versus the range of hyperparameter values (see Figure 7.5). We then performed simultaneous reconstruction and hyperparameter estimation for each of the data sets and for each of the priors, and indicated on Figure 7.5 the mean value of β obtained. Here we used the MFAML-2 method for hyperparameter estimation. Note that the hyperparameter value that the MFAML method produced lies very close to the smallest average squared error for both high and low count tests. The best performance for the low count data is only slightly worse than that for the high count case. For both count rates, the standard deviation of the estimated hyperparameter was less than 10%. Two ROIs, as marked in Figure 7.4(a), were selected for the quantitative study. One ROI is an entire structure in grey matter that includes boundary pixels; the other is an interior region of this structure and does not include boundaries. For these two regions, the bias and variance of the average intensity in the two ROIs were computed using an ensemble average over the 50 reconstructions. These statistics were computed using the three different priors for a range of hyperparameter values for the low and high count data. For each prior, bias-squared versus variance was plotted in Figure 7.6 for all hyperparameter values. Similar plots were also made for images reconstructed using the filtered backprojection (FBP) algorithm with filters of various cut-off frequencies. Finally, we indicate on each curve the values of ROI bias and variance corresponding to the reconstructions obtained using the simultaneous MAP reconstruction and MFAML hyperparameter estimation. In general, the Bayesian methods provide lower total mean squared error (bias-squared plus variance) than filtered backprojection.
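The ensemble ROI statistics plotted in Figure 7.6 amount to the following computation; this is a sketch under the assumption that each reconstruction is a 2-D array and the ROI is a boolean mask, with illustrative names.

```python
import numpy as np

def roi_bias_variance(reconstructions, roi_mask, true_image):
    """Bias and variance of the mean ROI intensity, estimated by
    ensemble averaging over independent reconstructions.
    bias**2 + variance is the total mean squared error of the ROI
    estimate, which is what the bias-variance curves decompose."""
    roi_means = np.array([rec[roi_mask].mean() for rec in reconstructions])
    bias = roi_means.mean() - true_image[roi_mask].mean()
    variance = roi_means.var(ddof=1)  # unbiased sample variance
    return bias, variance
```

Sweeping the hyperparameter for one prior and plotting `bias**2` against `variance` traces out one curve of the kind shown in Figure 7.6.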
While the estimated hyperparameters do not correspond to the minimum possible mean squared error, the results are reasonable, with the exception of those obtained using the Geman and McClure prior, which appears to exhibit much higher variance than the other priors, particularly when boundary pixels are included in the ROI.

Figure 7.7: Phantoms used in the ROI analysis with the quantization regions marked in white. (b) Experimental 3-D chest phantom (8th plane); (c) experimental 3-D chest phantom (11th plane).

Quantitative study using a 3-D experimental chest phantom

An experimental phantom study was also performed to evaluate the quantitative behavior of the method described in this dissertation. In this experiment, 40 frames of 1 min emission data were acquired using an axially and transaxially asymmetric chest phantom with cylindrical inserts of different activity levels. The data were collected using the 15 plane Siemens/CTI ECAT931 scanner. An external ring source was used to collect 1 and 30 min transmission scans. Randoms sinograms were acquired separately for both emission and transmission scans. The mean of the randoms component was estimated using this randoms sinogram through maximum likelihood estimation of the singles rate at each detector [43]. The scatter sinogram was computed by tracing scatter paths through a preliminary reconstruction of the emission image and using the Klein-Nishina formula [43]. The ACFs were computed by reprojection of a transmission image which was reconstructed using the Bayesian method described in [43]. In all reconstructed emission images, the image size is 128 x 128, the pixel size is 0.36 cm by 0.36 cm, and the axial sample interval is 0.675 cm. Several ROIs were selected away from the compartment boundaries. Each ROI was chosen as a cylindrical volume of 75 pixels (25 pixels in each plane, covering 3 planes). The volume of each ROI is 6.561 cm³.
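The ROI volume quoted above follows directly from the stated voxel dimensions; a quick arithmetic check:

```python
# Each voxel is 0.36 cm x 0.36 cm in-plane with a 0.675 cm axial interval.
voxel_volume = 0.36 * 0.36 * 0.675  # cm^3 per voxel (= 0.08748)
roi_volume = 75 * voxel_volume      # 75 voxels per cylindrical ROI
print(roi_volume)                   # ~6.561 cm^3, matching the text
```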
The ROIs are marked on the 8th and 11th reconstruction planes as shown in Figure 7.7(b) and (c) respectively.

ratio   true     FBP1           FBP2           MAP1           MAP2
2/1a    1.293    1.04 ± 0.11    1.15 ± 0.14    1.33 ± 0.04    1.33 ± 0.03
3/1a    1.128    0.92 ± 0.10    1.02 ± 0.12    1.18 ± 0.04    1.20 ± 0.03
4/1a    1.743    1.71 ± 0.13    1.67 ± 0.13    1.75 ± 0.05    1.73 ± 0.04
5/1a    3.820    2.83 ± 0.21    3.07 ± 0.24    2.87 ± 0.07    3.31 ± 0.06
6/1a    0.000   -0.04 ± 0.07    0.01 ± 0.09    0.07 ± 0.02    0.08 ± 0.02
7/1a    0.000    0.18 ± 0.07    0.10 ± 0.07    0.15 ± 0.02    0.16 ± 0.02

Table 7.1: The mean and standard deviation of relative ROI values using 1 min emission data (40 realizations), computed relative to the activity in region 1a. If region 1b is used, the FBP results are somewhat worse and the MAP results are similar. Both FBP results use a ramp filter. FBP1: ACFs using smoothed blank/(1 min transmission data); FBP2: ACFs using smoothed blank/(30 min transmission data). All MAP results use ACFs from reprojection of 1 min transmission reconstructions. MAP1: quadratic prior; MAP2: Huber prior.

For each of the 40 one minute frames, the 3-D PET images were reconstructed with the quadratic and Huber potential functions using the gradient projection preconditioned conjugate gradient (GPPCG) [16] method for MAP estimation with simultaneous hyperparameter estimation using MFAML-2. The results were compensated for the natural decay in the strength of the source over the duration of the experiment. The average activity was computed for each ROI in each of the 40 reconstructions. Relative activity, compared to that in the background region 1a in Figure 7.7(b), was computed for each reconstruction. The mean and standard deviation of the relative ROI values were then computed using ensemble averaging over the 40 frames and compared with the correct values as measured at the time of the experiment. The results are tabulated in Table 7.1.
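The relative-activity figures in Table 7.1 correspond to a computation like the one below. This is a sketch, with the background mask standing in for region 1a and the decay correction mentioned above omitted for brevity; all names are illustrative.

```python
import numpy as np

def relative_roi_stats(reconstructions, roi_mask, background_mask):
    """Mean and standard deviation over frames of the ROI activity
    expressed relative to the activity in the background region."""
    ratios = np.array([rec[roi_mask].mean() / rec[background_mask].mean()
                       for rec in reconstructions])
    return ratios.mean(), ratios.std(ddof=1)
```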
Also shown are the equivalent computations for filtered backprojection with a ramp filter for attenuation correction, using 1 min and 30 min transmission measurements. These results indicate

Figure 7.8: (a) The original brain phantom. (b) Anatomical boundaries extracted from (a) and 30 selected ROIs.

generally superior performance of the MAP method, particularly for the smaller activity ratios. These results are encouraging and show the clear potential for using Bayesian methods, with simultaneous hyperparameter estimation, for quantitative studies in PET.

Comparative studies of the effect of using anatomical priors in PET reconstructions

Although it has been widely conjectured that improvements in the quantitative accuracy of estimates will be realized for ill-posed inverse problems by using Bayesian methods, especially when proper information from other imaging modalities is incorporated, only images from single realizations of a given source activity are usually shown. While these provide anecdotal evidence of improved accuracy, no systematic quantitative study is available. This is partially because of the lack of a practical method to choose proper hyperparameters. In this section, we perform an extensive Monte Carlo simulation to evaluate the quantitative performance of Bayesian techniques at the optimal hyperparameter value (the point estimate produced by our mean field approximate maximum likelihood algorithm), in the context of PET image reconstruction. The studies will be based on comparisons of ROI statistics computed from fifty Bayesian PET reconstructions using different forms of priors. Here, we generated 50 independent Poisson data sets from another brain phantom in Figure 7.8(a) with high (1M) and low (300K) count rates. 15% Poisson distributed randoms and scatter were then added across the sinogram.
The true activities for gray matter, white matter and cerebrospinal fluid (CSF) are 0.5:0.1:0 and 0.02:0.004:0 respectively. We then applied each of the following methods to reconstruct images from each of the 50 data sets: FBP, MLE, MAP with quadratic prior, MAP with Geman and McClure prior, and MAP with a weak membrane prior both without anatomical boundaries (with and without line interactions) and with anatomical boundaries incorporated. The anatomical boundaries, shown in Figure 7.8(b), were extracted directly from the original phantom in Figure 7.8(a). In order to evaluate the influence of partial volume factors, we also generated a set of data with anatomical boundaries shifted by half a pixel. For each case and each set of data the hyperparameter β is estimated simultaneously with the PET image using MFAML, while the line interaction potentials were set by trial and error. We computed bias and standard deviation (STD) in the 30 preselected ROIs shown in Figure 7.8(b). The results are shown in Figure 7.9 for the low count case. The types of ROIs selected are listed below: 1-18: ROIs in gray matter; 19-26: ROIs in white matter; 27-30: ROIs in CSF (1-6, 19, 20 and 27: large ROIs with boundary pixels; 7-12 and 28: large ROIs one pixel apart from the boundary; 13-18, 21-26, 29 and 30: small interior ROIs).

Conclusion

Our Monte Carlo simulation based quantitative evaluation shows that the size of the hyperparameter has a great effect on the quantitative accuracy of the Bayesian estimate, in terms of both global mean squared error and local ROIs. Estimation (or prior knowledge) of the hyperparameter is critical for quantitative PET studies. Fortunately, our newly developed mean field approximated maximum likelihood estimator appears to give performance close to the minimum global mean squared error. Our ROI studies show that FBP exhibits large bias and fairly small variance. Conversely, ML is relatively unbiased but has a large variance.
The MAP results without anatomical priors, in general, have much lower variance than ML and less bias than FBP. The Geman and McClure and weak membrane priors (with and without line interactions) appear to exhibit very similar performance. They all have small bias and variance for non-boundary pixels and ROIs which do not include boundaries. However, at some boundary locations, bias becomes significantly larger than for ML, presumably due to inaccurate placement of the estimated boundaries. It is interesting to note that the introduction of line process interaction terms has little noticeable effect on quantitative accuracy. MAP methods using anatomical boundary information significantly improve the accuracy of the estimate at the boundary locations, which results in reductions in bias and variance. Even with a half-pixel shift of the image, which in some sense represents the worst case partial volume effect, the anatomical boundary information results in improved quantitative accuracy. The reconstructed images from the real PET data confirm that the use of anatomical priors can improve the quality of reconstructed images.

Figure 7.9: Bias and standard deviation of the average activity in ROIs of 50 independent reconstructions for a variety of methods for 1M data. (a) FBP using ideal ramp filter; (b) maximum likelihood; (c) Bayesian with quadratic prior; (d) Bayesian with Geman and McClure prior; (e) Bayesian with weak membrane prior; (f) Bayesian with generalized weak membrane prior; (g) Bayesian with weak membrane prior and exact anatomical prior; (h) Bayesian with weak membrane prior and anatomical prior half pixel shifted from the true location both horizontally and vertically.
Chapter 8

Conclusions and Directions for Future Research

8.1 Conclusions

A novel approach for estimating the global hyperparameter of a Gibbs prior from incomplete data has been established in this dissertation. This approach leads to three practical estimators of the hyperparameter based on various forms of mean field approximation. Theoretically, all three estimators obtained by this approach are tractable approximations of the true maximum likelihood estimator from incomplete data (which is intractable). Of particular theoretical interest among the three estimators is the one which uses optimal mean field approximation of both prior and posterior partition functions in the sense of maximizing the Gibbs-Bogoliubov-Feynman bound. The solution of this optimal estimator can be concluded to be the best estimate of the global hyperparameter among those obtained using separable functions to approximate the original complicated joint density. In practice, solutions of all three estimators lead to near optimal quantitative image estimates. The qualitative differences between the images obtained by the three estimators are minor, but the quantitative differences are noticeable. Not surprisingly, the estimator which uses the optimal partition function approximation performs the best. This approach can easily be combined with most state of the art Bayesian algorithms which use MRF priors, and readily applied to most inverse problems encountered in image processing, such as image restoration, surface interpolation, motion estimation and medical image reconstruction. Extensive Monte Carlo simulations show that these estimates are fairly robust to the noise level, the form and conditioning of the likelihood, and the choice of the prior. Although estimation of only a single hyperparameter has been illustrated in this dissertation, it is straightforward to extend this method to the estimation of multiple hyperparameters.
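The Gibbs-Bogoliubov-Feynman bound referred to above has a standard statistical-mechanics statement. The notation below (H for the energy, Z for the partition function, subscript 0 for the tractable trial system) is generic and not taken from the dissertation:

```latex
% For a Gibbs density p(x) = e^{-H(x)}/Z and any trial energy H_0
% with partition function Z_0, averaging under p_0(x) = e^{-H_0(x)}/Z_0
% and applying Jensen's inequality to Z = Z_0 \langle e^{-(H-H_0)}\rangle_0
% gives the lower bound
\[
  \log Z \;\ge\; \log Z_0 \;-\; \big\langle H - H_0 \big\rangle_{0},
\]
% with equality when H - H_0 is constant. Maximizing this bound over a
% separable (mean field) family of trial energies H_0 yields the optimal
% mean field approximation to the prior and posterior partition functions.
```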
Because of the success of this new hyperparameter estimation method, Bayesian solutions of inverse problems are determined uniquely and completely by the choice of the prior, the likelihood and the data. In this dissertation, we also illustrate that Gibbs priors are able to accommodate both smooth regions and abrupt boundaries by varying the forms of the potential functions defined on pairwise neighboring pixels. Of particular importance are those priors using semi-convex or non-convex local potential functions and those using line processes. Using compound MRFs with line processes to model sharp boundaries provides more freedom to control the behavior of the prior, and also proves to be useful in integrating information from different imaging modalities. Although global optimization of Bayesian cost functions using these priors is very difficult, a fast convergent deterministic relaxation algorithm can achieve a fairly reasonable local optimum. Quantitative studies of PET image reconstruction show that Bayesian methods for solving ill-posed inverse problems in general provide solutions with better quantitative accuracy than simpler non-Bayesian methods. However, different prior models and different choices of hyperparameters produce significant changes in both the quality and quantitative accuracy of the estimated images. A proper prior model and correct choice of the hyperparameter are critical for computing optimal Bayesian solutions. Studies also show that if proper information from other imaging modalities is incorporated as prior information, significant quantitative improvement in the solution can be achieved.

8.2 Future Research Directions

Development and Validation of Suitable Image Priors

Since the statistical prior distributions confine the range of Bayesian solutions, appropriate prior models are critical to achieving both high quality and quantitative accuracy in Bayesian image estimates.
So far, the image priors described here and elsewhere in the image processing literature are developed using certain heuristic arguments and validated by inspection of the quality or quantitative accuracy of the resulting Bayesian estimates. It is obvious that there is no prior model which is universally optimal for all images. Different types of images may require different priors. Therefore, it is necessary to establish some method to validate the prior when it is used to describe a certain class of images. Furthermore, it is desirable to establish certain criteria or rules to evaluate the fitness of prior models when a specific class of images is given. Some researchers have proposed criteria based on a goodness-of-fit test [36]. However, this approach can only apply to discrete images with a relatively small number of grey levels. The goal of validating or evaluating prior models is as challenging as the problem of hyperparameter estimation when the single pixel sample space is a discrete set with a large number of gray levels or a continuous interval.

Parameter Estimation for the Line Cliques

It has been shown in this dissertation that by using line process interaction potentials, thin connected line configurations may be formed. However, this is only true if the proper weights are assigned to each line clique type. Most researchers chose these weights by trial and error, which is very time consuming and cannot achieve ideal performance. From the standpoint of the observed data, the weights of the line cliques are very difficult to estimate. However, if information from another imaging modality is available, for example a registered MR image of the same object as in a PET study, these line clique parameters may be estimated from the second imaging modality.
Integrating Information from Different Imaging Modalities

Since quantitative studies show that great improvement in the accuracy of image estimates can be achieved by using information, especially discontinuity information, from other modalities, further research in this direction should be performed. There still exist two major problems in integrating discontinuities from different imaging modalities. First, different imaging modalities produce images of different resolutions; it is therefore difficult to decide the corresponding locations of discontinuities between images from different modalities. Second, it is unrealistic to expect true image edges always to lie between pixels. In reality, most image boundary elements may lie within image pixels instead, leading to the existence of partial volume pixels. A multiscale stochastic model may be a good candidate to resolve these problems.

Bibliography

[1] M. Almeida and G. Gidas, "A Variational Method for Estimating the Parameters of MRF from Complete or Incomplete Data," Technical Report, Brown University, 1989.
[2] M. S. Bartlett, An Introduction to Stochastic Processes. Cambridge: University Press, 1955.
[3] M. Bertero, T. A. Poggio, and V. Torre, "Ill-Posed Problems in Early Vision," Proceedings of the IEEE, vol. 76, no. 8, Aug. 1988.
[4] J. Besag, "Spatial Interaction and the Statistical Analysis of Lattice Systems," Journal of the Royal Statistical Society, B, vol. 36, pp. 192-236, 1974.
[5] J. Besag, "On the statistical analysis of dirty pictures," Journal of the Royal Statistical Society, B, vol. 48(3), pp. 259-302, 1986.
[6] G. L. Bilbro, W. E. Snyder, S. J. Garnier, and J. W. Gault, "Mean Field Annealing: A Formalism for Constructing GNC-Like Algorithms," IEEE Trans. on Neural Networks, vol. 3, no. 1, Jan. 1992.
[7] M. Black and A. Rangarajan, "On Line Processes, Outlier Rejection and Robust Statistics," Technical Report YALEU/DCS/RR-993, Department of Computer Science, Yale University.
[8] A.
Blake and A. Zisserman, Visual Reconstruction, Artificial Intelligence, MIT Press, Cambridge, MA, 1987.
[9] C. Bouman and K. Sauer, "Fast Numerical Methods for Emission and Transmission Tomographic Reconstruction," in Proc. Conf. Info. Sci. Sys., Johns Hopkins, 1993.
[10] P. Bouthemy and P. Lalande, "Motion detection in image sequences using Gibbs distributions," in Proc. IEEE Int. Conf. Acoust. Speech Signal Processing, May 1989, pp. 1651-1654.
[11] R. Carson et al., "Precision and accuracy of regional radioactivity quantitation using the ML EM reconstruction algorithm," in press, 1993.
[12] D. Chandler, Introduction to Modern Statistical Mechanics. Oxford University Press, 1987.
[13] P. Craven and G. Wahba, "Smoothing Noisy Data with Spline Functions," Numerische Mathematik, vol. 31, pp. 377-403, 1979.
[14] A. Dempster, N. Laird and D. Rubin, "Maximum Likelihood from Incomplete Data via the EM Algorithm," Journal of the Royal Statistical Society, B, vol. 39, pp. 1-38, 1977.
[15] W. Enkelmann, "Investigations of multigrid algorithms for the estimation of optical flow fields in image sequences," Comp. Vision, Graphics and Image Proc., vol. 43, pp. 150-177, 1988.
[16] J. M. Fitzpatrick, "The existence of geometrical density-image transformations corresponding to object motion," Comp. Vision, Graphics and Image Proc., vol. 44, pp. 155-174, 1988.
[17] D. Geiger and F. Girosi, "Parallel and Deterministic Algorithms for MRFs: Surface Reconstruction and Integration," IEEE Trans. PAMI, vol. 12, pp. 401-412, May 1991.
[18] S. Geman and D. Geman, "Stochastic Relaxation, Gibbs Distributions and the Bayesian Restoration of Images," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. PAMI-6(6), pp. 721-741, Nov. 1984.
[19] S. Geman and D. McClure, "Statistical Methods For Tomographic Image Reconstruction," in Proc. of the 46th Session of the ISI, Bulletin of the ISI, vol. 52, 1987.
[20] G. Gindi, M. Lee, A. Rangarajan, and I. G.
Zubal, "Bayesian Reconstruction of Functional Images Using Registered Anatomical Images as Priors," in A. C. F. Colchester and D. J. Hawkes, editors, Information Processing in Medical Imaging, pp. 121-131, Springer-Verlag, 1991.
[21] ------, "A Continuation Method for Emission Tomography," in Proc. 1992 Nucl. Sci. Symp. and Med. Imag. Conf., pp. 1204-1206, 1992.
[22] J. K. Goutsias, "Mutually compatible Gibbs random fields," IEEE Trans. Inform. Theory, vol. 35, pp. 1233-1249, Nov. 1989.
[23] J. Hadamard, "Sur les problemes aux derivees partielles et leur signification physique," Princeton University Bulletin, vol. 13, 1902.
[24] J. Hadamard, Lectures on the Cauchy Problem in Linear Partial Differential Equations. New Haven, CT: Yale University Press, 1923.
[25] P. C. Hansen, "Analysis of Discrete Ill-posed Problems by Means of the L-curve," SIAM Review, vol. 34, no. 4, pp. 561-580, 1992.
[26] P. C. Hansen and D. P. O'Leary, "The use of the L-curve in the regularization of discrete ill-posed problems," Report UMIACS-TR-91-142, Dept. of Computer Science, University of Maryland, College Park, MD.
[27] T. J. Hebert and R. Leahy, "Statistic-Based MAP Image Reconstruction from Poisson Data Using Gibbs Priors," IEEE Trans. on Signal Proc., vol. 40, no. 9, pp. 2290-2303, Sept. 1992.
[28] F. Heitz and P. Bouthemy, "Motion estimation and segmentation using a global Bayesian approach," in Proc. IEEE Int. Conf. Acoust. Speech Signal Processing, 1990, pp. 2305-2308.
[29] B. K. P. Horn and B. G. Schunck, "Determining optical flow," Artificial Intelligence, vol. 17, pp. 185-203, 1981.
[30] V. E. Johnson, W. H. Wong, X. Hu, and C. Chen, "Image Restoration Using Gibbs Priors: Boundary Modeling, Treatment of Blurring, and Selection of Hyperparameter," IEEE Trans. on Pattern Anal. and Machine Intell.
[31] R. L. Kashyap and R. Chellappa, "Estimation and Choice of Neighbors in Spatial-Interaction Models of Images," IEEE Trans. Information Theory, vol. 29, pp. 60-72, Jan. 1983.
[32] J.
Konrad and E. Dubois, "Bayesian Estimation of Motion Vector Fields," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 14, no. 9, Sept. 1992.
[33] S. Lakshmanan and H. Derin, "Simultaneous Parameter Estimation and Segmentation of Gibbs Random Fields Using Simulated Annealing," IEEE Trans. on Patt. Anal. and Machine Intell., vol. 11, pp. 799-813, 1989.
[34] R. Leahy and X. Yan, "Incorporation of Anatomical MR data for improved Functional Imaging with PET," in A. C. F. Colchester and D. J. Hawkes, editors, Information Processing in Medical Imaging, pp. 105-120, Springer-Verlag, 1991.
[35] P. J. M. van Laarhoven and E. H. L. Aarts, Simulated Annealing: Theory and Applications, D. Reidel Publishing Company, 1987.
[36] E. Levitan, M. Chan, and G. Herman, "Image-Modeling Gibbs Priors," Technical Report, Department of Radiology, University of Pennsylvania.
[37] D. Luenberger, Linear and Nonlinear Programming. Addison-Wesley Inc., Menlo Park, CA, 1989.
[38] K. M. Manbeck, "Bayesian Statistical Methods Applied to Emission Tomography with Physical Phantom and Patient Data," Ph.D. thesis, Brown University, 1990.
[39] J. Marroquin, "Probabilistic Solution of Inverse Problems," Ph.D. thesis, MIT, Cambridge, Sept. 1985.
[40] J. Mathews and R. L. Walker, Mathematical Methods of Physics, Benjamin/Cummings Inc., Menlo Park, CA, 1970.
[41] J. M. Mendel, Lessons in Digital Estimation Theory, Prentice-Hall, Inc., Englewood Cliffs, New Jersey.
[42] A. Mohammad-Djafari, "On the Estimation of Hyperparameters in Bayesian Approach of Solving Inverse Problems," in Proc. ICASSP-93, pp. V495-V498.
[43] E. Mumcuoglu, R. Leahy, S. Cherry and Z. Zhou, "Fast Gradient-Based Methods for Bayesian Reconstruction of Transmission and Emission PET Images," USC-SIPI Report #241; also submitted to IEEE Trans. on Medical Imaging.
[44] S. G. Nadabar and A. K. Jain, "Parameter Estimation in MRF Line Process Models," in IEEE Proc. ICASSP 1992, pp. 528-588.
[45] G. G. Potamianos and J.
K. Goutsias, "Partition Function Estimation of Gibbs Random Field Images Using Monte Carlo Simulations," IEEE Trans. Inform. Theory, vol. 39, pp. 1322-1332, July 1993.
[46] A. Rangarajan, "Representation and Recovery of Discontinuities in Some Early Vision Problems," Ph.D. thesis, University of Southern California, Nov. 1990.
[47] K. Sauer and C. Bouman, "A Local Update Strategy for Iterative Reconstruction from Projections," IEEE Trans. on Signal Processing, vol. 41, no. 2, pp. 534-548, Feb. 1993.
[48] H. R. Schwarz and J. Waldvogel, Numerical Analysis: A Comprehensive Introduction, pp. 339-342, J. Wiley & Sons, New York.
[49] L. A. Shepp and B. F. Logan, "The Fourier Reconstruction of a Head Section," IEEE Trans. on Nuclear Science, vol. NS-21, pp. 21-43, June 1974.
[50] S. M. Song and R. M. Leahy, "Computation of 3-D Velocity Fields from 3-D Cine CT Images of a Human Heart," IEEE Trans. on Medical Imaging, vol. 10, no. 3, pp. 295-306, 1991.
[51] R. Szeliski, Bayesian Modeling of Uncertainty in Low-Level Vision, Kluwer Academic Publishers, Boston, 1989.
[52] A. M. Thompson, J. C. Brown, et al., "A Study of Methods of Choosing the Smoothing Parameter in Image Restoration by Regularization," IEEE Trans. on Patt. Anal. and Machine Intell., vol. 13, no. 4, pp. 326-339, April 1991.
[53] A. N. Tikhonov and V. Y. Arsenin, Solutions of Ill-Posed Problems. Washington, DC: Winston and Sons, 1977.
[54] J. Varah, "Pitfalls in the numerical solution of ill-posed problems," SIAM J. Sci. Statist. Comput., vol. 4, pp. 164-176, 1983.
[55] G. Wahba, Spline Models for Observational Data, CBMS-NSF Regional Conference Series in Applied Mathematics, vol. 59, Society for Industrial and Applied Mathematics, Philadelphia, PA, 1990.
[56] P. Whittle, "Stochastic Processes in Several Dimensions," Bull. Int. Statist. Inst., 40, pp. 974-994, 1963.
[57] J. W. Woods, "Two-dimensional Discrete Markovian Fields," IEEE Trans. Information Theory, vol. 18, pp. 232-240, March 1972.
[58] J. W. Woods, S. Dravida, and R.
Mediavilla, "Image Estimation Using Doubly Stochastic Gaussian Random Field Models," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 9, pp. 245-253, March 1987.
[59] X. Yan, R. Leahy, Z. Wu and S. Cherry, "MAP Estimation of PET Images using Prior Anatomical Information," in Proc. 1992 Nucl. Sci. Symp. and Med. Imag. Conf., pp. 1201-1203, 1992.
[60] J. Zhang, "The Mean Field Theory in EM Procedures for Markov Random Fields," IEEE Trans. on Signal Processing, vol. 40, no. 10, Oct. 1992.
[61] J. Zhang and J. Hanauer, "The Mean Field Theory for Image Motion Estimation," in Proc. ICASSP, 1993, pp. V197-V200.
[62] Z. Zhou and R. Leahy, "A Comparative Study of the Effects of Using Anatomical Priors in PET Reconstruction," in IEEE Proc. of Nucl. Sci. Symp. and Med. Imag. Conf., pp. 1749-1753, 1993.
[63] Z. Zhou, R. Leahy, and E. Mumcuoglu, "On the effect of hyperparameter estimation on ROI bias and variance in Bayesian PET," in Proc. Conf. of the Society of Nuclear Medicine, 1994.
[64] Z. Zhou, R. Leahy, and E. Mumcuoglu, "Simultaneous Hyperparameter Estimation and Bayesian Image Reconstruction for PET," in IEEE Proc. of Nucl. Sci. Symp. and Med. Imag. Conf., 1994.
[65] Z. Zhou and R. Leahy, "On Simultaneous Maximum Likelihood Hyperparameter Estimation for Bayesian Inverse Problems in Image Processing," SIPI Technical Report, University of Southern California, 1994.
[66] Z. Zhou and R. Leahy, "Maximum Likelihood Hyperparameter Estimation for Gibbs Priors from Incomplete Data with Applications to Positron Emission Tomography," Information Processing in Medical Imaging, France, 1995.
[67] Z. Zhou, C. Sinolakis, R. Leahy and S. Song, "3-D Multiresolution Motion Estimation for Incompressible Continuous Media," in IEEE Proc. of Int. Conf. Signal Proc. and Neural Network, China, 1993.
[68] Z. Zhou, C. Sinolakis, R. Leahy and S. Song, "Calculation of 3-D Internal Displacement Fields from 3-D X-ray Computer Tomographic Images," to appear in Proc. of the Royal Society, Engineering and Mathematics.
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-oUC11257241