Close
About
FAQ
Home
Collections
Login
USC Login
Register
0
Selected
Invert selection
Deselect all
Deselect all
Click here to refresh results
Click here to refresh results
USC
/
Digital Library
/
University of Southern California Dissertations and Theses
/
Fast solvers in hyperspectral optical bioluminescence tomography for small animal imaging
(USC Thesis Other)
Fast solvers in hyperspectral optical bioluminescence tomography for small animal imaging
PDF
Download
Share
Open document
Flip pages
Contact Us
Contact Us
Copy asset link
Request this asset
Transcript (if available)
Content
FAST SOLVERS IN HYPERSPECTRAL OPTICAL BIOLUMINESCENCE
TOMOGRAPHY FOR SMALL ANIMAL IMAGING
by
Abhijit J. Chaudhari
A Thesis Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
MASTER OF SCIENCE
(APPLIED MATHEMATICS)
May 2007
Copyright 2007 Abhijit J. Chaudhari
Dedication
to my parents and friends
ii
Acknowledgments
I would like to express my gratitude to Dr. Wlodek Proskurowski, my Applied Mathe-
matics advisor and mentor for his great understanding, patience, help and support. This
thesis is a result of the several techniques taught in his numerical analysis classes at the
University of Southern California (USC). Further, I wish to thank my doctoral advisor,
Dr. Richard M. Leahy of Electrical Engineering, for his interest, help, and ongoing sup-
port. Also, I am grateful to Anand A. Joshi, Dr. Felix Darvas and Maitreyee Tripathi
who provided technical inputs and encouragement. Lastly, I would like to thank my
thesis committee members Dr. Chunming Wang and Dr. Robert Sacker.
iii
Table of Contents
Dedication ii
Acknowledgments iii
List of Tables v
List of Figures vi
Preface vii
1 The optical forward model 1
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 The existing forward solver . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3 Investigation for fast forward model solvers . . . . . . . . . . . . . . . 6
2 Investigation of fast solvers 7
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.2 Sparsity of our FEM matrix and reordering . . . . . . . . . . . . . . . 9
2.3 Cholesky decomposition and the fill-in phenomenon . . . . . . . . . . 11
2.4 Solution schemes for forward models in HOBT . . . . . . . . . . . . . 13
2.4.1 Solution by sequential back-substitution . . . . . . . . . . . . . 13
2.4.2 The incomplete Cholesky factor as a preconditioner for PCG . . 14
2.4.3 Solution using lumped RHS . . . . . . . . . . . . . . . . . . . 16
2.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
Bibliography 21
iv
List of Tables
2.1 Table showing the sample matricesF used for this thesis . . . . . . . . 8
2.2 Table shows the reordering and decomposition costs in seconds for three
sample matrices in HOBT . . . . . . . . . . . . . . . . . . . . . . . . 11
2.3 Table shows the backsolving cost for a single sample RHS . . . . . . . 11
2.4 Table shows the backsolving cost for a single sample RHS for the matrix
with v = 59332. MATLAB’s cholin operator was used for the incom-
plete Cholesky decomposition. For the complete Cholesky decomposi-
tion, the same function was used with droptol = 0. . . . . . . . . . . . . 12
2.5 Table shows the cost of repeated reordering, Cholesky decomposition,
and solving versus pre-computation of the Cholesky factor and sequen-
tial backsolving for the matrix withv = 59332. Also, see section 2.4.3 . 14
2.6 The cost for PCG to converge as a function of droptol for the incomplete
Cholesky decomposition for the matrix of sizev = 59322 . . . . . . . . 15
2.7 Table shows the cpu for varying number of RHS using the \ operator
and PCG for the matrix of sizev = 211372. . . . . . . . . . . . . . . . 17
2.8 Table shows the cpu for all the three matrices under consideration and
50 RHS when using the\ operators and PCG. . . . . . . . . . . . . . . 18
v
List of Figures
2.1 Pattern of non-zeros in the FEM matricesF of (2.2) before and after
reordering. Each row in the figure grid corresponds to matrices of the
same size. The first row corresponds to the matrix of size 59332, the
second to that of size 146872 and the third to that of size 211372.
The columns correspond to the original FEM matrix, the FEM matrix
reordered by AMD, and the FEM matrix reordered by RCM respectively. 10
2.2 Plot of the cost for computing the incomplete Cholesky factorization
and for PCG, as a function of droptol for the matrix of size v = 59322 . 16
2.3 Cost of solving multiple RHS using sequential operations, PCG and
using lumped RHS for the matrix of size v = 211372 as a function
of the number of RHS. . . . . . . . . . . . . . . . . . . . . . . . . . . 17
vi
Preface
For bioluminescence imaging studies in small animals, it is important to be able to
accurately localize the 3D distribution of the underlying bioluminescent source. It has
been shown that owing to the spectral variation of optical measurements on the ani-
mal surface with depth of the emission source, Hyperspectral Optical Bioluminescence
Tomography (HOBT) can yield fairly accurate 3D reconstructed source distributions. In
order to solve the associated inverse problem in HOBT, we require the solution to the
optical forward problem first. A solution for the case of a highly scattering medium
(in this case, animal tissue) is given by the diffusion equation and can be solved by the
finite element method (FEM) in a realistic mouse geometry. The corresponding FEM
matrix for a given mouse geometry needs to be inverted for the assembly of the optical
forward model. This process of inversion is both costly and cumbersome because the
FEM matrix is huge (typically 60,000 to 220,000 rows and columns). The huge FEM
matrix corresponds to a dense tessellation. Using a dense tessellation is advantageous in
HOBT because (a) it is a better representation of the internal organs and (b) it is a better
approximation of the photon propagation process in biological tissue. However, we also
note that the FEM matrix is symmetric, positive definite and sparse. Methods for fast
inversion of the FEM matrix are crucial for the use of HOBT in experimental biology
laboratories.
vii
In this thesis, we first evaluate sparse matrix reordering techniques and their impact
on the cost of computing a Cholesky factorization of the FEM matrix. The Cholesky
factorization is used further for two purposes, (a) for solving the system of linear equa-
tions by forward and back-substitution and (b) as a preconditioner to the Preconditioned
Conjugate Gradient (PCG) method. Both methods (a) and (b) give considerable speed-
up in computation. For (a), we evaluate the impact on the computation cost of lumping
together 500 right hand sides (RHS) versus using a single RHS each time. Our results
from the methods presented in this thesis indicate a cost reduction of more than 100
times compared to the existing method which involved direct solving using the MAT-
LAB’s ‘\’ operator sequentially. All computational costs in this thesis are indicated in
seconds.
A. C.
Los Angeles, California
October 2006.
viii
Chapter 1
The optical forward model
1.1 Introduction
Hyperspectral Optical Bioluminescence Tomography (HOBT)[CDB
+
05] is a functional
imaging modality, used primarily for molecular imaging in small animals. In HOBT,
a bioluminescent source located inside the 3D animal volume produces optical sig-
nals on the animal surface that can be measured by a Charge-coupled device (CCD)
camera. The objective, then, is to estimate the 3D spatial distribution of the biolumi-
nescent source from surface measurements. The problem of determining the photon
density on the animal surface due to a bioluminescent source distribution within the
animal volume requires accurate representation of the photon transport in biological
tissue. A deterministic description of this process under the assumption of isotropic
scattering is given by the diffusion equation, which allows modeling of light propa-
gation in turbid media with inhomogeneous optical properties and realistic geometries
[Arr99]. Several approaches have been proposed for solving the diffusion equation,
including methods based on discretization of the volume into finite elements that allow
modeling of inhomogeneities [ASHD93, SAHD95, RSM01] and methods based on ana-
lytical approximations using simplified geometries and assuming tissue homogeneity
[HST
+
94, Arr95, RCN01, GRWN03].
The forward problem in HOBT is to predict the photon fluence (b) on the boundary
dΩ of the animal volumeΩ, given the distribution of the bioluminescent source (q) and
the tissue optical properties. These optical properties: the reduced scattering coefficient
1
μ
′
s
, the absorption coefficient μ
a
, and the refractive index η, are all functions of the 3D
locationr and of wavelengthλ. In the near-infrared (NIR) region, the reduced scattering
coefficient is much larger than the absorption coefficient, so the diffusion approximation
is valid [Arr99]. The diffusion equation is a linear parabolic partial differential equation
that can be viewed as a mapping of the source strength q(r,λ) to the measurements
b(r,λ) in steady state and can be derived from the radiative transport equation (RTE).
In 3D, the steady-state RTE for monochromatic light as a function of the position r
assumes the form [AH03]
ˆ s∇φ(r,ˆ s)+(μ
a
+μ
s
)φ(r,ˆ s) = μ
s
Z
4Ï€
f(ˆ s·ˆ s
′
)φ(r,ˆ s)dˆ s
′
+q(r,ˆ s) (1.1)
where the scalar field φ(r,ˆ s) denotes the energy radiance in W mm
−2
sr
−1
, μ
a
and μ
s
are the absorption and scattering coefficients of the medium in mm
−2
, the unit vectorˆ s
is the direction of scatter for the photon, whose initial direction was the unit vector ˆ s,
f(ˆ s·ˆ s
′
) denotes the scattering phase function which is the probability of scattering of
a photon with initial directionˆ s
′
in the directionˆ s, andq(r,ˆ s) is the directional source
density. The dot product in the argument of the scattering phase functionf(ˆ s·ˆ s
′
) implies
its dependence only on the cosine of the angle betweenˆ s andˆ s
′
. With photon densityΦ
defined as
Φ(r) =
Z
4Ï€
φ(r,ˆ s)dˆ s (1.2)
the diffusion equation in steady state at a single wavelengthλ is given as
−{c▽·κ(r)▽−μ
a
(r)c}
−1
q(r) = Φ(r), (1.3)
where
κ(r) =
1
3[μ
a
(r)+μ
′
s
(r)]
(1.4)
2
κ(r) is the diffusion coefficient, c is the velocity of light at wavelength λ, and μ
′
s
=
μ
s
(1−g) is the reduced scattering coefficient withg as the average cosine of the scatter
angle.
The detailed derivation of the diffusion equation from the RTE is given in [CZ67].
While the diffusion equation is simply the first angular moment (P-1) approximation to
the RTE, it has been shown to be valid under the following circumstances;
1. The medium is dominated by scattering i.e. μ
′
s
>> μ
a
, where μ
′
s
is the reduced
scattering coefficient and μ
a
is the absorption coefficient,
2. The point of interest is at least a few scatter lengths (1/μ
′
s
) away from the source,
3. The propagating signal cannot be approximated correctly if measured at very early
times.
While condition 1 is satisfied by most biological tissue, there are non-scattering
regions, e.g. the liver, where the diffusion approximation is not valid. Methods have
been proposed to use the RTE directly for solving the forward problem [AH03]. How-
ever, these methods are very slow for practical geometries.
The boundary condition for the RTE is that no photons travel in the inward direction
at the boundary dΩ, i.e. withν as the outward normal,
φ(r,ˆ s) = 0, whenr∈ dΩ,ˆ s·ν < 0 (1.5)
While (1.5) cannot be satisfied exactly by the diffusion equation, the condition is
replaced by requiring that the total inward photon current be 0 at the boundary [Arr99,
CZ67], i.e.
Z
ˆ s·ν<0
ˆ sφ(r,ˆ s)dˆ s = 0,r∈ dΩ. (1.6)
3
To account for the refractive index mismatch of the air-tissue interface (refractive index
of 1.4 for tissue versus 1 for air), and total internal reflection at the boundary, a parameter
R is introduced [SAHD95, Arr99], and the boundary condition is modified to
Z
ˆ s·ν<0
ˆ sφ(r,ˆ s)dˆ s =
Z
ˆ s·ν>0
ˆ sRφ(r,ˆ s)dˆ s,r∈ dΩ. (1.7)
After simplifications, equation (1.7) transforms to [Arr99]
Φ(r)+2κG
∂Φ
∂ν
= 0,r∈ dΩ. (1.8)
where κ = 1/[3(μ
a
+ μ
′
s
)]. The parameter G may be found experimentally as G =
(1+R)/(1−R) [GFB83] where
R≈−1.440n
−2
int
+0.710n
−1
int
+0.668+0.0636n
int
(1.9)
or using Fresnel’s laws and Keijzer’s method [KSS88] as
G =
2/(1+R
0
)−1+|cosθ
c
|
3
1−|cosθ
c
|
2
(1.10)
whereθ
c
= sin
−1
(1/n) is the critical angle, andR
0
= (n−1)
2
/(n+1)
2
is the Fresnel’s
reflection coefficient. The boundary condition in (1.8) is called the Robin-type boundary
condition. To summarize, the HOBT forward model assembly constitutes solving the
systems:
−{c▽·κ(r)▽−μ
a
(r)c}
−1
q(r) = Φ(r) inΩ, (1.3)
Φ(r)+2κG
∂Φ
∂ν
= 0, r∈ dΩ. (1.8)
4
where dΩ denotes the boundary of the animal volumeΩ.
1.2 The existing forward solver
The 3D domain is discretized into T tetrahedral elements, connected at v vertex nodes.
The solution Φ is discretized by finite element (FE) basis functions to Φ
h
. For conve-
nience, we denote the discrete solution as φ by dropping the subscript. The problem of
solving for φ becomes a sparse matrix inversion problem in the Finite Element Method
(FEM) framework [ASHD93]. The solution for photon flux at v discrete locations is
given by the solution to
Fφ =w, (1.11)
where
F =K(κ)+C(μ
a
)+ηB, F∈R
v×v
, and φ,w∈R
v
.
The load vector for a specific source configuration is denoted by w. The FEM matrix
F is assembled by adding matrices K and C that depend on the diffusion κ and absorp-
tion coefficients μ
a
, respectively, and a matrix B that imposes the boundary conditions.
Assume that the data are acquired for m surface nodes with n internal source locations.
Then, the monochromatic forward modelA(λ) is given by
A(λ) =DF
−1
W, A(λ)∈R
m×n
(1.12)
where D is a detector selection matrix andW =
[w
1
w
2
... w
n
]
is a matrix of
the load vectors corresponding to the sources under consideration. Our implementation
of the FEM followed the steps outlined in [ADSO00]. We have written a FEM solver
in MATLAB that utilizes the whole animal volume at once. The most tedious and time-
consuming computation is that of solving (1.11) for a large number of right hand sides
5
(RHS). The existing method is that of using MATLAB’s \ operator sequentially, i.e.
solving one RHS at a time in a loop. For example, for a FEM matrix with v = 59,332
and with only 100 RHS, the existing method required 47 min (see Table 2.7).
1.3 Investigation for fast forward model solvers
In a standard HOBT problem, the FEM matrix varies with the number of nodes of
the tesellation from about 60,000 to 220,000. While the FEM matrix is sparse (spar-
sity ratio of about 0.02%), it’s inverse is full and hence, direct inversion is impossible.
In this work, we investigate several methods for speed up in solving (1.11), and con-
sequently, the HOBT forward model. Techniques involved include sparse matrix re-
ordering, investigation of pre-conditioners to be used with the conjugate-gradient based
minimization, and Cholesky factorization with forward and backward substitution for
single and lumped RHS (the\ operator of MATLAB).
MATLAB was the preferred programming language for this work because of its
broad range of functions available for sparse matrix manipulations as well as for its
simplicity in coding. The problem addressed through this thesis can be solved directly
using parallel computing or multi-threading. However, for our case, we are interested
in the code running on a single CPU desktop personal computer (PC) placed alongside
an optical imaging system in a biological laboratory. For this thesis, all simulations are
performed using only a single CPU of an AMD OPTERON 250 dual CPU computer
system.
6
Chapter 2
Investigation of fast solvers
2.1 Introduction
The assembly of the forward model in HOBT by the FEM-based solution to the diffusion
equation, essentially, requires the solution of a system of linear equations represented in
the matrix-vector notation as
Fφ =w, (2.1)
where the matrixF ∈ R
v×v
, and vectors φ,w ∈ R
v
. Solving this general problem by
the LU decomposition [GV96] would cost a total of O(v
3
) flops (i.e. the cost of the
decompositionF =LU plus the cost of back-substitution). It is important to note that
for HOBT, the problem (2.1) needs to be solved repeatedly with the same matrix F on
the left hand side, but several right and sides (RHS)w
i
, i = 1,...,p:
FΦ =W, (2.2)
where the matrixF∈R
v×v
,Φ,W∈R
v×p
withΦ =
[φ
1
φ
2
... φ
n
]
representing
a matrix whose columns are the solutions andW =
[w
1
w
2
... w
n
]
. Instead
of solving the system directly, the decomposition of F can be performed only once,
followed by repeated back-solving. The cost of solving (2.2) by this method will be
O(v
3
+ pv
2
) instead of O(pv
3
) for the direct solver without preprocessing. A straight-
forward implementation of this scheme, however, is not possible, because the matrix F
7
is large, with v often of the order of several hundred thousands, v = O(10
5
), and the
number of RHS’sw
i
of order p = O(10
3
). It is important to note the following:
1. The matrix F is a symmetric, positive definite (SPD) and sparse matrix with the
number of non-zero elements (denoted as nz) per row of the order of tens, i.e.
nz/v = O(10).
2. We are interested in the solution φ
m
only at the surface nodes and not at all v
nodes. Thus, we do not desire to have the entireΦ in the computer memory.
3. Since w is a finite grid representation of the source location and strength, W is
sparse as well (about 4 nz elements per column).
Matrix size Non-zeros Density as Density as Condition
v nz nz/v nz/v
2
number
59332 782853 13.19 2.2×10
−4
6.8×10
4
146872 2004522 13.64 9.2×10
−5
7.8×10
5
211372 3156406 14.22 7×10
−5
9.7×10
7
Table 2.1: Table showing the sample matricesF used for this thesis
In this chapter, we discuss reordering schemes that use the SPD property of F and
evaluate them for use with realistic HOBT matrices as are indicated in Table 2.1. Fur-
ther, we analyze complete and incomplete Cholesky decompositions of reordered matri-
ces. These decompositions are employed for the following three solution schemes which
are discussed in detail in this chapter;
1. Reordering, complete Cholesky factorization of the reordered matrix, and
repeated back-substitution.
2. Reordering, incomplete Cholesky factorization of the reordered matrix, and using
the Cholesky factor as a pre-conditioner for the Preconditioned Conjugate Gradi-
ent (PCG) method.
8
3. Reordering, complete Cholesky factorization of the reordered matrix, and using
lumped RHS’s for back-substitution.
2.2 Sparsity of our FEM matrix and reordering
The cost associated with solving sparse matrix problems depends not only on their spar-
sity but also on the distribution of the non-zero elements. An accurate estimate of com-
putational cost, namely O(vk
2
) for decomposition and O(vk) for backsolving, where k
is the half bandwidth,k << v, may be given only for banded SPD matrices. The matrix
F for HOBT is not banded and has an irregular distribution of the non-zero elements,
thus, the cost cannot be estimated accurately.
To minimize the fill-in phenomenon due to the decomposition of F, the following
two sparse matrix reordering schemes may be used:
1. Symmetric reverse Cuthill-McKee ordering (RCM) [GL81]: This scheme finds
a permutation that produces a matrix that has its non-zero elements close to the
main diagonal compared to the original matrix, thus, reducing the bandwidth.
2. Symmetric approximate minimum degree permutation (AMD) [ADD04] that pro-
duces a matrix with a sparser Cholesky factor compared to the original matrix.
The reordered matrices often have the ’inverted fir tree’ shape along their diago-
nal.
Typical distributions of the non-zero elements in our matrix F before and after
reordering are shown in Figure 2.1. We have used MATLAB’s implementation of func-
tions for sparse matrix operations in simulations, experiments and presentation.
In Table 2.2, we show the cpu times for decomposition and reordering for the three
sample matrices. In Table 2.3, the cost of backsolving is presented. We should note
9
(a)
0 1 2 3 4 5
x 10
4
0
1
2
3
4
5
x 10
4
nz = 782853
(b) (c)
(d) (e) (f)
(g) (h) (i)
Figure 2.1: Pattern of non-zeros in the FEM matricesF of (2.2) before and after reorder-
ing. Each row in the figure grid corresponds to matrices of the same size. The first row
corresponds to the matrix of size 59332, the second to that of size 146872 and the third
to that of size 211372. The columns correspond to the original FEM matrix, the FEM
matrix reordered by AMD, and the FEM matrix reordered by RCM respectively.
here that in our context, while the decomposition and the reordering will be performed
just once, the efficiency of the back-solver is important, for this step is repeated several
thousand times.
10
Size (v) Decomposition RCM AMD
without reordering Reordering Decomp. Reordering Decomp.
59332 99.4 0.34 10.6 0.6 2.5
146872 490 0.9 24.2 1.7 7.5
211372 2588 1.4 130 2.8 18
Table 2.2: Table shows the reordering and decomposition costs in seconds for three
sample matrices in HOBT
Size (v) Without reordering RCM reordering AMD reordering
59332 0.142 0.064 0.065
146872 0.4 0.16 0.15
211372 0.7 0.39 0.28
Table 2.3: Table shows the backsolving cost for a single sample RHS
Notice from tables 2.2 and 2.3 that the AMD scheme of reordering is superior in our
case. Henceforth, all studies are carried out using the AMD reordering.
2.3 Cholesky decomposition and the fill-in phenomenon
Consider the cost of solving the system
˜
F
˜
Φ =
˜
W (2.3)
using the Cholesky decomposition
˜
F =
˜
R
T
˜
R for
˜
F, where
˜
F is the AMD-reordered
v × v matrix,
˜
Φ and
˜
W are v × p reordered matrices, and
˜
R is the v × v Cholesky
factor of
˜
F. Due to the fill-in phenomenon that occurs even in well-structured matrices,
the number of non-zeros in the Cholesky factor may increase compared to that in the
original matrix, thus, contributing significantly to the computation cost. In this section,
we investigate the effects of incomplete Cholesky decomposition on the back-solving
cost. MATLAB’s implementation of the incomplete Cholesky decomposition (cholinc)
11
with a drop tolerance (droptol) option [Saa96] was used for all the studies. This factor-
ization is computed by performing the incomplete LU decomposition (here
˜
R
T
˜
R) with
the pivot threshold option set to 0 (which forces diagonal pivoting) and then scaling the
rows of the incomplete upper triangular factor,
˜
R, by the square root of the diagonal
entries in that column. Setting the droptol parameter to 0 results in a complete Cholesky
decomposition. In Table 2.4, we show the dependence of the cpu time for forward and
back-solving on the non-zero entries in the Cholesky factor
˜
R as a function of the drop-
tol parameter for a sample matrix.
Droptol nz for Cost for Cost of
˜
R decomposition back-substitution
0 6874570 12.02 0.57
10
−5
2974210 10.06 0.26
10
−4
1980942 5.73 0.18
10
−3
1132135 2.55 0.11
10
−2
549798 1.03 0.06
10
−1
173272 0.23 0.02
Table 2.4: Table shows the backsolving cost for a single sample RHS for the matrix
with v = 59332. MATLAB’s cholin operator was used for the incomplete Cholesky
decomposition. For the complete Cholesky decomposition, the same function was used
with droptol = 0.
We note the following from the table:
1. As the drop-tolerance value increases, the number of non-zero entries in the
Cholesky factor decreases, and the backsolving cost decreases.
2. The cost for computing the incomplete Cholesky decomposition shows a simi-
lar trend by decreasing with increase in droptol. However, this cost is of lesser
importance than the cost of the backsolver (because p, the number of RHS, is
very large).
3. The cost of back-substitution is proportional to the number of non-zerosnz in
˜
R.
12
This study was important for enabling us to make a decision about using the incomplete
Cholesky factor as a preconditioner for PCG. Notice, however, that with the increase
of droptol value, the Cholesky factors are becoming increasingly approximate (incom-
plete), see 2.4.2.
2.4 Solution schemes for forward models in HOBT
2.4.1 Solution by sequential back-substitution
A direct method to solve the problem in 2.1 is to use MATLAB’s ‘\’ operator for one
RHS at a time. This operator reorders the sparse matrix, computes a complete Cholesky
decomposition and solves the problem by forward and back-substitution. The drawbacks
of using this scheme are obvious. The Cholesky decomposition will be carried out every
single time for each RHS and the process will be very costly. An alternate method is to
reorder the matrix, compute the Cholesky factorization only once, and use forward and
back substitution for each individual RHS. It is absolutely essential to reorder the matrix
before the factorization. In our experiments, the Cholesky factorization of the unordered
matrix took almost 30 times more time than an AMD-reordered59332×59332 matrix.
The ratio increased to about 120 times for a 211372× 211372 matrix. With AMD-
reordering for the FEM-matrices, the profile of the non-zero elements changes so that
they are now closer to the main diagonal. This implies smaller fill-in and hence, the
Cholesky factor is computed much faster than using an unordered matrix. Table 2.5
shows the cost of using the ‘\’ operator sequentially versus using the AMD-reordered
FEM matrix and it’s Cholesky decomposition for solving the system with just 50 RHS.
The saving by using AMD-reordering is considerable and increases with the number of
RHS, see Section 2.4.3. It should also be noted that MATLAB’s ‘\’ operator includes
AMD reordering and computation of the complete Cholesky decomposition by default.
13
Solving sequentially using ‘\’ amounts to reordered matrix and computing the Cholesky
factorization for each RHS.
Cost of using\ for solving Cost for reordering and Cost for forward
with 50 RHS repeatedly Cholesky decomposition and back-substitution
128.5 2.6 28.5
Table 2.5: Table shows the cost of repeated reordering, Cholesky decomposition, and
solving versus pre-computation of the Cholesky factor and sequential backsolving for
the matrix withv = 59332. Also, see section 2.4.3
.
2.4.2 The incomplete Cholesky factor as a preconditioner for PCG
The rate of convergence of the conjugate gradient method can be strongly affected by
the choice of the preconditioner. The preconditioner is a positive definite matrix chosen
to approximate the inverse of theF matrix. If the eigenvalues of the product of the
preconditioner andF are tightly clustered compared to those ofF itself, convergence is
achieved in relatively few iterations. The incomplete Cholesky factors,
˜
R can be used as
a preconditioner and the system in (2.1) can be solved by the preconditioned conjugate
gradient (PCG) iterations. The product of cost per iteration and the total number of
iterations required for convergence constitutes the total cost of PCG.
There is a trade-off while selecting the droptol parameter. On one hand, for each
iteration of PCG, a system of equations with the matricesR
T
R needs to be solved,
i.e. two backsolving operations are performed at each step (Table 2.6 shows the cpu
cost for a single RHS as a function of droptol). This cost is the largest contribution to
the total cpu. On the other hand, the incomplete Cholesky decomposition
˜
R
T
˜
R leads
to an approximate matrix
˜
F =
˜
R
T
˜
R, and the disparity between the original matrixF
and it’s approximation
˜
F (in terms of the number of non-zero entries and of spectral
properties) increases as droptol increases. Consequently, the incomplete Cholesky factor
14
droptol CPU for back # of PCG CPU for PCG Decomposition Total
substitution iterations time CPU
10
−5
0.26 3 1.06 10.06 11.12
10
−4
0.18 4 1.08 5.73 6.81
10
−3
0.11 7 1.1 2.55 3.65
10
−2
0.06 16 1.9 1.03 2.93
10
−1
0.02 51 5.14 0.23 5.37
Table 2.6: The cost for PCG to converge as a function of droptol for the incomplete
Cholesky decomposition for the matrix of size v = 59322
˜
R progressively become worse as a preconditioner for PCG as an effect of increase in
droptol, and thus, the number of PCG iterations required for convergence increases.
In Table 2.6, the cost of computing the incomplete Cholesky decomposition, the #
of iterations for PCG (it), and cost for PCG iterations are presented as a function of
droptol for the matrix of sizev = 59322.
A comparison of Tables 2.4 and 2.6 shows that increase in droptol implies decrease
in cpu for backsolving, but increase in the number of PCG iterations. As a result, we
should expect to find an optimal value for droptol that minimizes the total cost. Figure
2.2 shows a plot of the time required for computing an incomplete Cholesky factoriza-
tion and finding the solution by PCG. As indicated by the plot and the table, droptol
= 10
−3
is close to optimal. We also note here that if the Cholesky factor is computed
instead of the incomplete one with droptol = 0 (i.e. using MATLAB’s chol instead of
cholinc) for the system of Table 2.6, the cost was 2 seconds as against 12 seconds for
cholinc.
Also, note that the PCG-based method is better than sequential backsolving, see
2.4.3.
15
10
−5
10
−4
10
−3
10
−2
10
−1
10
−1
10
0
10
1
10
2
droptol
CPU in sec
Time required by PCG to converge
Time taken for the Cholesky decomposition
Figure 2.2: Plot of the cost for computing the incomplete Cholesky factorization and for
PCG, as a function of droptol for the matrix of size v = 59322
2.4.3 Solution using lumped RHS
MATLAB’s ‘\’ operator performs matrix left division. The description of this operation
for the purpose of solving our systemFΦ =W (2.2) is as follows:
If F is symmetric positive definite (SPD), the Cholesky factorization requires less
than half the time of a general factorization. The Cholesky factorization isF =R
T
R,
where R is upper triangular. The solutionΦ is computed by solving two triangular
systems,Φ =R\(R
T
\W).
In our case, the FEM-matrix F is SPD and sparse. Thus, using the ‘\’ operator fits
well for our purposes. However, unlike the method presented in subsection 2.4.1, we
lump together the individual RHS in sub-matrices, each of 500 columns. Subsequently,
the ‘\’ operator is used to compute 500 solutions, all at once. This procedure can be
16
50 100 150 200 250 300 350 400 450 500
10
1
10
2
10
3
10
4
10
5
10
6
Sequential operation
PCG
Lumped RHS
Figure 2.3: Cost of solving multiple RHS using sequential operations, PCG and using
lumped RHS for the matrix of size v = 211372 as a function of the number of RHS.
repeated until all the required solutions are computed. We also note that matrix W
transfers the source strength from a 3D location onto the closest tetrahedron nodes. As
a result, W is sparse with an average of 4 non-zero elements per column. Thus, W can
be precomputed and does not require a lot of memory.
# of Sequential ‘\’ PCG Lumped RHS Ratio
RHS (a) (b) (a/b)
50 1420 308 33.5 42.3
100 2840 616 38.6 73.5
200 5680 1232 50.5 112.4
300 8520 1848 64.3 132.5
400 11360 2464 82.6 137.6
500 14200 3080 102.6 138.4
Table 2.7: Table shows the cpu for varying number of RHS using the \ operator and
PCG for the matrix of size v = 211372.
17
Figure 2.3 and table 2.7 shows the cost of using lumped RHS versus PCG and of
solving each RHS individually for the matrix with v = 211372. These results clearly
indicate that there is a big advantage in using lumped RHS. Remark: there is a trade-off
between memory and the number of solutions computed at once. The number 500 for the
RHS was experimentally determined to be suitable for the largest matrix (v = 211372).
This means that processing more thanp = 500 RHS simultaneously would require using
auxiliary memory, thus increasing the CPU significantly, while using RHS less than 500
would not use the computing power of the machine efficiently.
Figure Table 2.7 examines the effect of the number of RHS ‘p’ on the cost of com-
puting the solution. As expected, the cost is proportional to the number of RHS. There is
a reduction in the cost of more than 100 times when using the ‘\’ operator with lumped
RHS for p = 500. Consequently, there is a big advantage when using the method of
lumped RHS with a high enough p that is supported by the computer memory. For our
machine (memory of 2 GB), this was p = 500 for v = 211372.
Table 2.8 examines the dependence of the size of the matricesv on the cost of com-
puting the solution. For the two largest matrices (v = 146872 and v = 211372), we
observe that the cpu grows as the square of the non-zeros elements (nz). As indicated
by the the last column in Table 2.8, the increase in saving is proportional to the increase
in the size of the matrices (v).
Size Non-zeros Sequential ‘\’ PCG Lumped RHS Ratio
nz (a) (b) (a/b)
59332 782853 125 58.5 3.57 35.0
146872 2004522 570 124.5 14.2 40.1
211372 3156406 1420 308 33.5 42.4
Table 2.8: Table shows the cpu for all the three matrices under consideration and 50
RHS when using the\ operators and PCG.
18
2.5 Conclusions
In this thesis, we presented fast solvers for the assembly of forward models in HOBT.
We examined three methods for solving (2.2), (a) using MATLAB’s \ operator in a
loop for sequential solving (the existing method), (b) using PCG with the incomplete
Cholesky decomposition of the FEM matrix as a pre-conditioner, and (c) using MAT-
LAB’s\ operator with lumped RHS. Following are our conclusions from the investiga-
tions undertaken. We refer to the largest FEM matrix (v = 211372) asH:
1. The combined cost for matrix reordering and incomplete Cholesky decomposition
(withdroptol = 10
−3
) forH is about a 100 times less than the cost of computing
the incomplete Cholesky decomposition of unorderedH with the same droptol,
see table 2.2.
2. Increase in droptol corresponds to decrease in the backsolving cost. The cost of
computing the incomplete Cholesky factor decreases with increase indroptol, see
table 2.4.
3. The cost of backsolving is proportional to the non-zeros (nz) in
˜
R, the incomplete
Cholesky factor, see table 2.6.
4. If PCG is used for solving (2.2), droptol = 10
−3
is close to optimal.
5. The cost of solving (2.2) by the method of lumped RHS is about 30 times less
than the cost for PCG forH. The cost for the existing method is about 4 times
higher than that of PCG for matrixH.
6. For large matrices, the cpu when using the\ operator grows as the square of the
non-zeros elements (nz) in the matrix.
19
7. The cost for solving usingH for 500 RHS is at least 100 times less than the
cost for the existing method of sequential solving using MATLAB’s \ operator.
This work clearly shows that the cost for solving the HOBT forward problem is
reduced by more than 100 times by using the proposed method of lumped RHS
with MATLAB’s\ operator (see Section 2.4.3) than the existing method.
20
Bibliography
[ADD04] P.R. Amestoy, T. A. Davis, and I. S. Duff. Algorithm 837: AMD, an
approximate minimum degree ordering algorithm. ACM Transactions on
Mathematical Software, 30(3):381–388, Sep 2004.
[ADSO00] S. R. Arridge, H. Dehghani, M. Schweiger, and E. Okada. The finite ele-
ment model for propagation of light in scattering media: A direct method
for domains with nonscattering regions. Medical Physics, 27(1):252–264,
2000.
[AH03] G. S. Abdoulaev and A. H. Hielscher. Three-dimensional optical tomogra-
phy with the equation of radiative transfer. Journal of Electronic Imaging,
12(4):594–601, October 2003.
[Arr95] S. R. Arridge. Photon-measurement density functions. part i: Analytical
forms. Applied Optics, 34(31):7395–7409, 1995.
[Arr99] S.R. Arridge. Optical tomography in medical imaging. Inverse Problems,
15(2):41–93, 1999.
[ASHD93] S. R. Arridge, M. Schweiger, M. Hiroaka, and D. T. Delpy. A finite ele-
ment approach for modeling photon transport in tissue. Phys. Med. Biol.,
20(2):299–309, 1993.
[CDB
+
05] A. J. Chaudhari, F. Darvas, J. R. Bading, R. A. Moats, P. S. Conti, D. J.
Smith, S. R. Cherry, and R. M. Leahy. Hyperspectral and multispectral
bioluminescence optical tomography for small animal imaging. Physics in
Medicine and Biology, 50(23):5421–5441, Nov 2005.
[CZ67] M. C. Case and P. F. Zweifel. Linear Transport Theory. Addison-Wesley
Inc., 1967.
[GFB83] R. A. J. Groenhius, H. A. Ferwerda, and J. J. Ten Bosch. Scattering and
absorption of turbid materials determined from reflection measurements 1:
Theory. Applied Optics, 22(16):2456–2462, 1983.
21
[GL81] A. George and J Liu. Computer Solutions of Large Sparse Positive Definite
Systems. Prentice-Hall, 1981.
[GRWN03] E. E. Graves, J. Ripoll, R. Weissleder, and V . Ntziachristos. A submil-
limeter resolution fluorescence molecular imaging system for small animal
imaging. Medical Physics, 30(5):901–911, 2003.
[GV96] G. H. Golub and C. F. Van Loan. Matrix Computations. The John Hopkins
University Press, MD, 3 edition, 1996.
[HST
+
94] R. C. Haskell, L. O. Svaasand, T. T. Tsay, T. C. Feng, M. S. McAdams, and
B. J. Tromberg. Boundary conditions for the diffusion equation in radiative
transfer. J. Opt. Soc. Am. A, 11(10):2727–2740, 1994.
[KSS88] M. Keijzer, W. M. Star, and P. R. M. Storchi. Optical diffusion in layered
media. Applied Optics, 27(9):1820–1824, 1988.
[RCN01] B. W. Rice, M. D. Cable, and M. B. Nelson. In vivo imaging of light-
emitting probes. Journal of Biomedical Optics, 6(4):432–440, 2001.
[RSM01] R. Roy and E. M. Sevick-Muraca. Three-dimensional unconstrained
and constrained image-reconstruction techniques applied to fluorescence,
frequency-domain photon migration. Applied Optics, 40(13):2206–2215,
2001.
[Saa96] Y . Saad. Iterative Methods for Sparse Linear Systems. PWS Publishing
Company, 1996.
[SAHD95] M. Schweiger, S. R. Arridge, M Hiroaka, and DT Delpy. The finite element
method for the propagation of light in scattering media: Boundary and
source conditions. Medical Physics, 22(11):1779–1792, 1995.
22
Abstract (if available)
Abstract
Hyperspectral Optical Bioluminescence Tomography (HOBT) yields fairly accurate 3D reconstructed source distributions in small animal studies. The diffusion equation models the light propagation in tissue and can be solved by the finite element method (FEM). For the assembly of the optical forward model, methods for fast inversion of the FEM matrix are crucial. In this thesis, we first evaluate sparse matrix reordering techniques and their impact on the cost of computing a Cholesky factorization of the FEM matrix. The Cholesky factorization is used further for two purposes, (a) for solving the system of linear equations by forward and back-substitution and (b) as a preconditioner to the Preconditioned Conjugate Gradient (PCG) method. Results from the methods presented here indicate a cost reduction of more than 100 times compared to existing methods.
Linked assets
University of Southern California Dissertations and Theses
Conceptually similar
PDF
Hyperspectral and multispectral optical bioluminescence and fluorescence tomography in small animal imaging
PDF
Computational methods for fluorescence molecular tomography
PDF
Image registration with applications to multimodal small animal imaging
PDF
Fast iterative image reconstruction for 3D PET and its extension to time-of-flight PET
ZIP
Development of a multi-mode optical imaging system for preclinical applications in vivo [program]
PDF
Development of a multi-mode optical imaging system for preclinical applications in vivo
PDF
3D vessel mapping techniques for retina and brain as an early imaging biomarker for small vessel diseases
PDF
Contributions to structural and functional retinal imaging via Fourier domain optical coherence tomography
PDF
Metasurfaces in 3D applications: multiscale stereolithography and inverse design of diffractive optical elements for structured light
PDF
Molecular imaging data grid (MIDG) for multi-site small animal imaging research based on OGSA and IHE XDS-i
Asset Metadata
Creator
Chaudhari, Abhijit J. (author)
Core Title
Fast solvers in hyperspectral optical bioluminescence tomography for small animal imaging
School
Viterbi School of Engineering
Degree
Master of Science
Degree Program
Electrical Engineering
Publication Date
01/17/2007
Defense Date
12/13/2006
Publisher
University of Southern California
(original),
University of Southern California. Libraries
(digital)
Tag
fast solvers,hyperspectral optical bioluminescence tomography,OAI-PMH Harvest
Language
English
Advisor
Proskurowski, Wlodek (
committee member
), Sacker, Robert (
committee member
), Wang, Chunming (
committee member
)
Creator Email
ajchaudh@usc.edu
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-m228
Unique identifier
UC1470811
Identifier
etd-Chaudhari-20070125 (filename),usctheses-m40 (legacy collection record id),usctheses-c127-156808 (legacy record id),usctheses-m228 (legacy record id)
Legacy Identifier
etd-Chaudhari-20070125.pdf
Dmrecord
156808
Document Type
Thesis
Rights
Chaudhari, Abhijit J.
Type
texts
Source
University of Southern California
(contributing entity),
University of Southern California Dissertations and Theses
(collection)
Repository Name
Libraries, University of Southern California
Repository Location
Los Angeles, California
Repository Email
cisadmin@lib.usc.edu
Tags
fast solvers
hyperspectral optical bioluminescence tomography