IMAGE AND VIDEO ENHANCEMENT THROUGH MOTION BASED
INTERPOLATION AND NONLOCAL-MEANS DENOISING TECHNIQUES
by
Tanaphol Thaipanich
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(ELECTRICAL ENGINEERING)
May 2010
Copyright 2010 Tanaphol Thaipanich
Table of Contents

List of Tables
List of Figures
Abstract

Chapter 1: Introduction
1.1 Significance of the Research
1.2 Review of Previous Work
1.3 Contribution of the Research
1.3.1 Video Error Concealment
1.3.2 Video Frame Rate Up-Conversion
1.3.3 Image Denoising Application
1.4 Organization of the Proposal

Chapter 2: Research Background
2.1 Video Error Concealment
2.1.1 Error Detection Techniques
2.1.2 Error Concealment Techniques
2.2 Video Frame Rate Up-Conversion
2.3 Image Denoising

Chapter 3: Low-Complexity Video Error Concealment Using OBMA
3.1 Introduction to Chapter 3
3.2 BMA and OBMA Methods
3.2.1 BMA
3.2.2 OBMA
3.2.3 Four Other Derived EC Algorithms
3.3 Performance Evaluation
3.4 Explanation of Superior OBMA Performance
3.4.1 Edge Detection Effect of BMA Distortion Computation
3.4.2 Partial Block Matching
3.4.3 Effect of Error Types
3.5 Two Extensions of OBMA
3.5.1 OBMA with Refined Local Search
3.5.2 Multiple Boundary Layers
3.6 BMA/OBMA with FMO
3.7 Conclusion to Chapter 3

Chapter 4: Robust Video Frame Rate Up-Conversion (FRUC) Technique
4.1 Introduction to Chapter 4
4.2 Problem Statement
4.2.1 Abrupt Illumination Change
4.2.2 Low Frame Rate
4.3 Proposed Solution
4.3.1 Local Intensity Adjustment for Abrupt Illumination Change
4.3.2 TWK Algorithm for FRUC
4.4 Other Low-Complexity FRUC Algorithms
4.5 Experimental Results
4.5.1 Abrupt Illumination Change
4.5.2 Low Frame Rate
4.6 Conclusion to Chapter 4

Chapter 5: Adaptive Nonlocal Means Algorithm for Image Denoising
5.1 Introduction to Chapter 5
5.2 Adaptive Nonlocal Means (ANL-Means) Algorithm
5.2.1 Block Classification
5.2.2 Adaptive Window Adjustment and Rotated Block Matching
5.3 Rician Noise Removal
5.4 Noise Level Estimation
5.4.1 Additive White Gaussian Noise
5.4.2 Rician Noise
5.5 Experimental Results
5.5.1 Additive White Gaussian Noise
5.5.2 Rician Noise
5.6 Conclusion to Chapter 5

Chapter 6: Conclusion and Future Work
6.1 Summary of the Research
6.2 Future Research Topics

Bibliography
List of Tables

3.1 Complexity and quality comparison of several EC methods (the index is used in Fig. 3.10).
3.2 Burst error conditions, where the error condition index is used in Figures 3.14 and 3.15.
5.1 Performance comparison of noise level estimation schemes using the Laplacian operator and the proposed technique for AWGN of three variance levels.
5.2 Results of noise level estimation of the proposed technique for the Rician noise of different standard deviations.
5.3 PSNR comparison between the NL-means and the ANL-means algorithms for the AWGN of three standard deviation values.
5.4 Comparison of the PSNR values of denoised images with NL-R and ANL-R.
List of Figures

2.1 (a) A video frame with lost information and (b) recovered by the error concealment (EC) technique.
2.2 Two EC algorithms based on linear spatial interpolation using (a) corner pixels and (b) the nearest neighboring pixels [69].
2.3 Four EC modes [78].
2.4 The motion vector candidates used for EC in H.264.
2.5 Illustration of video frame rate up-conversion.
2.6 Motion-compensated linear interpolation [51].
2.7 Examples of linear filters.
3.1 The block diagram of a decoder with an EC module.
3.2 Distortion computation in BMA and OBMA.
3.3 Performance comparison of six EC algorithms.
3.4 Subjective quality comparison of frame no. 100 of the city sequence (CIF) with a loss rate of 10%.
3.5 The distortion function of BMA resembles gradient-based horizontal and vertical edge detectors.
3.6 The edge in the vicinity of the block boundary is penalized by BMA.
3.7 The edge in the vicinity of a block boundary helps select the right MB by OBMA.
3.8 BMA and OBMA performance comparison for different video sequences with an MB loss rate of 10%.
3.9 Performance comparison of BMA and OBMA against two error types.
3.10 Quality and complexity comparison of various EC methods given in Table 3.1, where the number is the index of each method.
3.11 The performance of OBMA with a different number of outer boundary layers used in the matching process.
3.12 The FMO patterns offered in H.264/AVC, where each color represents a different slice group [24].
3.13 An FMO pattern under consideration, where each color represents a different slice group.
3.14 Performance comparison of three selected FMO patterns under various error conditions and video sequences.
3.15 Subjective quality comparison of frame no. 30 of the city sequence (QCIF) with error condition T6.
3.16 Subjective quality comparison of the foreman sequence (QCIF).
4.1 Frame rate up-conversion (FRUC) scheme.
4.2 Examples of unreliable MV fields due to (a) abrupt illumination change (CIF Crew, 15 fps) and (b) low frame rate (CIF Soccer, 7.5 fps).
4.3 Example of gradient frames no. 29 and 30 (CIF crew, 15 fps): (a) normal frame and (b) gradient frame.
4.4 Example of local intensity adjustment for frame no. 3 (CIF crew, 15 fps): (a) before processing and (b) after processing.
4.5 Example of motion vector fields for frames no. 3-5 (CIF crew, 15 fps): (a) before processing and (b) after processing.
4.6 Example of motion-projected macroblocks: (a) the first-order forward translational macroblock, (b) the second-order forward translational macroblock, (c) the first-order backward translational macroblock and (d) the second-order backward translational macroblock.
4.7 Illustration of the TWK algorithm for the CIF ice sequence at 7.5 fps: (a) after the first-order translation detection, (b) after the second-order translation detection, (c) after OBMA and (d) the original frame.
4.8 Illustration of the TWK algorithm for the CIF city sequence at 7.5 fps: (a) after the first-order translation detection, (b) after the second-order translation detection, (c) after OBMA and (d) the original frame.
4.9 Interpolated frames in the presence of camera flashlight: (a) MC without local adjustment, (b) MC with local adjustment, (c) TWK without local adjustment and (d) TWK with local adjustment.
4.10 Performance comparison of several low-complexity FRUC algorithms.
4.11 Comparison of interpolated frames for the CIF city sequence at 7.5 fps: (a) FR, (b) FA, (c) MC, (d) FP, (e) BP, (f) BIP, (g) TWK and (h) the original frame.
4.12 Comparison of interpolated frames for the CIF ice sequence at 7.5 fps: (a) FR, (b) FA, (c) MC, (d) FP, (e) BP, (f) BIP, (g) TWK and (h) the original frame.
4.13 Comparison of interpolated frames for the CIF Soccer sequence at 7.5 fps: (a) FR, (b) FA, (c) MC, (d) FP, (e) BP, (f) BIP, (g) TWK and (h) the original frame.
4.14 Performance comparison of several low-complexity FRUC algorithms - City sequence.
4.15 Performance comparison of several low-complexity FRUC algorithms - Ice sequence.
4.16 Performance comparison of several low-complexity FRUC algorithms - Soccer sequence.
5.1 Examples of block classification via SVD and K-means: (a) noisy image, (b) energy in the dominant direction, where the brighter region has a more dominant edge direction, and (c) classification results with the K-means algorithm, where different gray values correspond to different classes.
5.2 An example of a denoised image obtained by the NL-means algorithm under Rician noise.
5.3 Illustration of the noise level estimation scheme using the Laplacian operator: (a) noisy Lena image with AWGN (σn = 40) and (b) the processed image using the Laplacian operator.
5.4 Six test images to be corrupted by the Rician noise.
5.5 Comparison of the averaged PSNR values of six denoising algorithms applied to seven test images corrupted by the AWGN with three standard deviation values.
5.6 Visual quality comparison of various denoising algorithms for image Zelda corrupted by AWGN with σn = 40.
5.7 Visual quality comparison of various denoising algorithms for image Lena corrupted by AWGN with σn = 40.
5.8 Summarized results of image denoising performance with the Rician noise.
5.9 Visual quality comparison of NL-R and ANL-R algorithms for test image no. 4 corrupted by the Rician noise with σn = 40 and θ = 45 degrees.
5.10 Visual quality comparison of NL-R and ANL-R algorithms for test image no. 6 corrupted by the Rician noise with σn = 50 and θ = 45 degrees.
Abstract
In this research, we investigate advanced image and video enhancement techniques based on motion-based interpolation and nonlocal-means (NL-means) denoising. The dissertation consists of three main results. Two video processing applications based on motion analysis, namely video error concealment (EC) and frame rate up-conversion (FRUC), are examined. Then, an improved NL-means algorithm is proposed for image denoising. They are detailed below.
In the first part of this study, low-complexity error concealment techniques are studied. The boundary matching algorithm (BMA) is an attractive choice for video error concealment due to its low complexity. Here, we examine a variant of BMA called the outer boundary matching algorithm (OBMA). Although BMA and OBMA are similar in their design principle, it is empirically observed that OBMA outperforms BMA by a significant margin (typically, 0.5 dB or higher) while maintaining the same level of complexity. We first explain the superior performance of OBMA, and conclude that OBMA provides an excellent tradeoff between the complexity and the quality of concealed video for a wide range of test video sequences and error conditions. In addition, we present two extensions of OBMA, i.e., refined local search and multiple boundary layers. These extensions can be employed to enhance the performance of OBMA at slightly higher computational complexity. Finally, the effect of flexible macroblock ordering (FMO) on the performance of several EC algorithms is examined.
In the second part of this work, two challenging situations for video frame rate up-conversion (FRUC) are identified and analyzed, namely, when the input video clip has an abrupt illumination change and when it has a low frame rate. Then, a low-complexity processing technique and a robust FRUC algorithm are proposed to address these two issues. The proposed algorithm utilizes a translational motion vector model of the first and the second order and detects the continuity of these motion vectors. Additionally, a spatial smoothness criterion is employed to improve the perceptual quality of interpolated frames. The superior performance of the proposed algorithm has been extensively tested, and representative examples are given in this work.
In the third part of this research, an adaptive image denoising technique based on the NL-means algorithm is proposed. The proposed method employs the singular value decomposition (SVD) method and the K-means clustering (K-means) technique to achieve robust block classification in noisy images. Then, a local window is adaptively adjusted to match the local property of a block. Finally, a rotated block matching algorithm based on the alignment of dominant orientation is adopted for similarity matching. In addition, the noise level can be accurately estimated using block classification and the Laplacian operator. Experimental results are given to demonstrate the superior denoising performance of the proposed adaptive NL-means (ANL-means) denoising technique over various image denoising benchmarks in terms of both PSNR and perceptual quality comparison, where images corrupted by additive white Gaussian noise (AWGN) and Rician noise are both tested.
Chapter 1
Introduction
1.1 Significance of the Research
Video error concealment (EC) and frame rate up-conversion (FRUC) have been topics of great interest in video processing since they have a wide variety of practical applications. Both techniques rely on motion information analysis to interpolate the regions of interest, which correspond to missing data for EC and up-sampled frames for FRUC. Image denoising is one of the classical problems in digital image processing. It has been studied extensively due to its role as a pre-processing stage for various image and video applications.
For visual communications over unreliable channels in a mobile and/or wireless environment, the received video quality is one of the top priorities for users. When the user receives a transmitted video signal, the data may experience some transmission loss because of fading signals, building occlusion, fluctuating channel bandwidth, etc. Without proper EC in place, users may suffer from severe quality degradation of playback video. Many techniques have been proposed to handle the effect of transmission loss. EC is one of the most desirable techniques because of the low restriction on its implementation [80]. One critical function of EC is to recover the information of lost macroblocks. Among low-complexity EC algorithms for macroblock recovery, the boundary matching principle and its derived algorithms have attracted a lot of attention due to their excellent trade-off between complexity and visual quality.
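To make the boundary matching principle concrete, the following sketch contrasts the BMA and OBMA distortion measures for a 16x16 lost macroblock. This is a minimal illustration under assumed conventions (float luminance arrays, a one-pixel boundary, and my own function and variable names), not the dissertation's exact implementation.

```python
import numpy as np

B = 16  # macroblock size (assumed)

def bma_cost(ref, cy, cx, cur, ly, lx):
    """BMA: match the candidate block's own outermost rows/columns in the
    reference frame against the received pixels just outside the lost
    block in the current frame (pixels on opposite sides of the boundary)."""
    cand = ref[cy:cy + B, cx:cx + B]
    cost = np.abs(cand[0, :] - cur[ly - 1, lx:lx + B]).sum()       # top
    cost += np.abs(cand[B - 1, :] - cur[ly + B, lx:lx + B]).sum()  # bottom
    cost += np.abs(cand[:, 0] - cur[ly:ly + B, lx - 1]).sum()      # left
    cost += np.abs(cand[:, B - 1] - cur[ly:ly + B, lx + B]).sum()  # right
    return cost

def obma_cost(ref, cy, cx, cur, ly, lx):
    """OBMA: match the one-pixel layer OUTSIDE the candidate block in the
    reference frame against the corresponding pixels outside the lost
    block, so compared pixels sit on the SAME side of the boundary."""
    cost = np.abs(ref[cy - 1, cx:cx + B] - cur[ly - 1, lx:lx + B]).sum()
    cost += np.abs(ref[cy + B, cx:cx + B] - cur[ly + B, lx:lx + B]).sum()
    cost += np.abs(ref[cy:cy + B, cx - 1] - cur[ly:ly + B, lx - 1]).sum()
    cost += np.abs(ref[cy:cy + B, cx + B] - cur[ly:ly + B, lx + B]).sum()
    return cost
```

In either case the concealing block is the candidate minimizing the cost over a motion-vector search range; OBMA's same-side comparison is what lets an edge near the block boundary help, rather than hurt, the match.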
FRUC is widely used in several practical applications such as HDTV, multimedia PCs, slow-motion video review, etc. Over the years, many FRUC methods have been proposed with a wide range of computational complexity depending on the quality requirements of target applications. Generally speaking, studies on FRUC have advanced greatly in recent years, and many FRUC algorithms yield exceptional quality [19, 90]. Nevertheless, there are still conditions under which existing FRUC algorithms do not work properly. This research focuses on these conditions and proposes effective solutions to resolve these problems.
Noise in digital images often arises from the acquisition and transmission processes of imaging devices [32]. The presence of noise not only produces undesirable visual quality but also lowers the reliability of acquired image data. The objective of image denoising is to recover the best estimate of the original image from its noisy version. It is equally important for a denoising algorithm to suppress noise while preserving important image features such as homogeneous regions, edges, and textures. In recent years, image denoising based on the nonlocal means (NL-means) algorithm has become a research topic of great interest due to its remarkable performance [9, 10]. Nevertheless, there are still some limitations to the existing NL-means algorithm. In this work, we propose an adaptive nonlocal-means (ANL-means) algorithm which is capable of adjusting its denoising mechanism and parameters to match the local structure of an image.
Several challenging research problems are described below.
The boundary matching algorithm (BMA) is a non-normative error concealment
technique recommended by the H.264 standard [85]. It has the same level of com-
plexity as the outer boundary matching algorithm (OBMA). It has been observed
[92] that OBMA consistently yields higher quality of concealed video than BMA.
The superior performance of OBMA has not yet been fully understood.
Although existing FRUC algorithms produce high-quality up-converted video streams for common input sequences, there exist several situations in which these FRUC techniques fail to perform well. These situations include: 1) an abrupt illumination change in the process of video capturing and 2) an input video with a very low frame rate (e.g., lower than 10 fps). Current FRUC algorithms have not been evaluated under these circumstances.
Most EC and FRUC algorithms rely on the temporal and/or spatial correlation of available data to interpolate the regions of interest. There are, however, other factors to be considered, such as the continuity of object contours, the interpolation of uncovered and to-be-covered background, etc.
Image denoising based on the NL-means algorithm has produced promising results
[9, 10]. Methods for further performance improvement have been proposed in recent
years [54, 79]. Nevertheless, there are still several aspects of the NL-means algorithm
that can be further improved, including similarity measurement, exploitation of
local structure, etc.
Accurate estimation of the variance of the image noise is an important task in many
image processing applications. The noise variance is frequently used as an input
parameter for image processing techniques such as image denoising [21, 22], image
restoration [5, 29], image filtering [30, 88], etc. The performance of these algorithms
is highly dependent on the accuracy of the estimated noise level.
1.2 Review of Previous Work
Techniques for combating video transmission errors can be classified into three categories
[80] based on the role of the encoder and the decoder: 1) the encoder-based approach,
2) the joint encoder-decoder approach and 3) the decoder-based approach. They are
detailed below.
The encoder-based approach. In this approach, only the encoder is used to handle transmission errors in the form of forward error prevention. This family of techniques makes transmitted bit streams resilient to transmission errors by adding a small amount of redundant data via the source coder or the channel coder. Techniques in this category include forward error correction (FEC), joint source and channel coding, etc.
The joint encoder-decoder approach. In this approach, the encoder and the decoder work together in handling errors. It is also called interactive error control. Automatic repeat-request (ARQ) and hybrid automatic repeat-request (HARQ) are examples of interactive error control.
The decoder-based approach. EC is a post-processing technique performed at the decoder. It usually relies on the temporal and/or spatial correlation of video data to conceal lost macroblocks.
Among these three categories, EC has the least restriction on its implementation, since forward error prevention requires control over source coding, while interactive error control relies on a feedback channel between the sender and the receiver. These conditions may not be available in many applications, such as video broadcasting. The recent trend in error concealment is to utilize unique features offered by emerging video standards, e.g., the variable block size, multiple reference frames and quarter-pixel precision of H.264/AVC, the inter-layer correlation of H.264/SVC and the inter-view correlation of the H.264 multiview extension.
FRUC is a conversion tool to increase the frame rate of input video. Many low-complexity techniques have been developed to perform the up-conversion, such as frame repetition, frame averaging, linear frame interpolation, etc. These techniques have a low computational requirement, but often produce poor interpolated frames with motion jerkiness, blurred moving objects and disconnected object contours.
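The three low-complexity schemes just named reduce to one-line operations on luminance arrays; a sketch (the function names are mine, and `prev`/`nxt` are the two received frames bracketing the inserted one):

```python
import numpy as np

def frame_repetition(prev, nxt):
    """FR: the inserted frame is simply a copy of the previous one."""
    return prev.copy()

def frame_averaging(prev, nxt):
    """FA: pixel-wise average of the two neighboring frames."""
    return 0.5 * (prev + nxt)

def linear_interpolation(prev, nxt, t=0.5):
    """Linear frame interpolation at temporal phase t in [0, 1]."""
    return (1.0 - t) * prev + t * nxt
```

All three ignore motion entirely, which is exactly why repetition judders and averaging blurs moving objects.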
Several advanced FRUC techniques have been developed to offer better visual quality at the cost of higher complexity. For instance, Choi et al. [20] proposed a frame interpolation scheme that uses bi-directional motion estimation to construct the motion vector field for up-sampled frames. Lee et al. [51] used a weighted sum of multiple motion-compensated images in frame interpolation to reduce the blocking artifact.
Image denoising is an important pre-processing stage in many image processing applications. Several image denoising techniques have been developed, such as neighborhood filtering [53], total variation minimization [64], Wiener filtering [31], the Gaussian scale mixture [62], methods based on partial differential equations [55], etc. The Gaussian and the mean filters rely on the 2-D convolution operator to remove noise from the signal. They demand low computational complexity and work well for smooth regions. However, they usually blur edges and textures. Advanced denoising techniques such as total variation minimization and methods based on partial differential equations are shown to produce superior results around edge regions. The recent development of the NL-means algorithm has drawn the attention of many researchers. Several modifications have been proposed to improve the algorithm [54, 79] and enhance its denoising performance [22, 45].
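For reference, the core of the NL-means estimate of Buades et al. [9, 10] restores each pixel as a weighted average of pixels whose surrounding patches look similar. Below is a minimal per-pixel sketch; the patch size, search window and filtering parameter `h` are illustrative defaults, not the values used in this dissertation.

```python
import numpy as np

def nl_means_pixel(img, y, x, patch=3, search=7, h=0.1):
    """Estimate pixel (y, x) as a weighted average over a search window,
    with weights decaying in the squared distance between patches."""
    r, s = patch // 2, search // 2
    ref = img[y - r:y + r + 1, x - r:x + r + 1]  # patch around the target
    num, den = 0.0, 0.0
    for j in range(y - s, y + s + 1):
        for i in range(x - s, x + s + 1):
            cand = img[j - r:j + r + 1, i - r:i + r + 1]
            d2 = np.mean((ref - cand) ** 2)        # patch similarity
            w = np.exp(-d2 / (h * h))              # exponential weight
            num += w * img[j, i]
            den += w
    return num / den
```

The sketch assumes (y, x) lies far enough from the border for all slices to be full-size; production implementations pad the image and vectorize the double loop.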
1.3 Contribution of the Research
Three image and video applications have been studied in this research. The main contri-
butions are stated in this section.
1.3.1 Video Error Concealment
In Chapter 3, we focus on low-complexity video error concealment techniques, and the
contributions include the following.
We evaluate and analyze the performance of two well-known EC techniques, i.e., BMA and OBMA. Both algorithms rely on the block matching technique to conceal the missing area of a frame. Despite their algorithmic similarity, we observe that OBMA consistently outperforms BMA by a significant margin under various test conditions. Although this behavior was also pointed out by some researchers in the past, no clear reason has been given for the superior performance of OBMA. We not only attempt to explain the performance difference between BMA and OBMA but also perform extensive experiments to verify our explanation.
We propose two extensions of OBMA that can be employed to enhance its performance: 1) OBMA with refined local search and 2) OBMA with multiple boundary layers. For the first extension, we propose three candidate searching schemes for OBMA, i.e., full search, selective search and refined local search. Each of these schemes has a different level of complexity and can be used to improve OBMA performance at the cost of higher computational complexity. For the second extension, a varying number of boundary layers is used in block matching. The performance of each setting is studied in detail. We also show that OBMA can be utilized to improve the performance of the overlapped motion vector EC algorithm by substituting the BMA matching criterion with the OBMA one.
We study the effect of three flexible macroblock orderings (FMO) on six different EC algorithms under various transmission error conditions. These three orderings are raster scan, line-interleaved and scattered order. Each of these macroblock ordering schemes shows a different level of robustness against transmission errors and can be employed to enhance the performance of EC algorithms at the cost of a small data overhead.
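The multiple-boundary-layer extension can be sketched as a sum of absolute differences over k concentric one-pixel rings just outside the candidate block. The ring geometry below is my reading of the idea, under assumed names and a 16x16 block, not the dissertation's exact formulation.

```python
import numpy as np

def obma_multilayer_cost(ref, cy, cx, cur, ly, lx, B=16, layers=1):
    """Compare `layers` one-pixel rings outside the candidate block in the
    reference frame against the received pixels around the lost block."""
    cost = 0.0
    for k in range(1, layers + 1):
        # top and bottom rows of ring k (width B + 2k, corners included)
        cost += np.abs(ref[cy - k, cx - k:cx + B + k]
                       - cur[ly - k, lx - k:lx + B + k]).sum()
        cost += np.abs(ref[cy + B - 1 + k, cx - k:cx + B + k]
                       - cur[ly + B - 1 + k, lx - k:lx + B + k]).sum()
        # left and right columns of ring k (corners already counted above)
        cost += np.abs(ref[cy - k + 1:cy + B - 1 + k, cx - k]
                       - cur[ly - k + 1:ly + B - 1 + k, lx - k]).sum()
        cost += np.abs(ref[cy - k + 1:cy + B - 1 + k, cx + B - 1 + k]
                       - cur[ly - k + 1:ly + B - 1 + k, lx + B - 1 + k]).sum()
    return cost
```

With `layers=1` this reduces to the plain OBMA match; each additional ring adds context at a proportional cost, which is the complexity/quality trade-off studied in Chapter 3.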
1.3.2 Video Frame Rate Up-Conversion
In Chapter 4, we focus on low-complexity video frame rate up-conversion techniques, and
the contributions include the following.
We identify two particular scenarios where existing FRUC algorithms fail to produce interpolated frames of good perceptual quality: 1) video captured with abrupt illumination change (e.g., camera flashlight) and 2) input video with a low frame rate. We perform an extensive study to analyze the underlying problems in these two scenarios. Then, this knowledge is used as a basis to develop new FRUC solutions.
We propose a low-complexity pre-processing solution, called local intensity adjustment, to improve the performance of FRUC algorithms under the abrupt illumination change condition. In the first stage, we perform abrupt illumination change detection based on the frame distance information obtained from the first- and the second-order frame luminance histogram differences. Then, an adaptive thresholding technique is employed to detect a frame with abrupt illumination change. In the second stage, we adjust the irregular frame intensity of the detected frame using local intensity adjustment. With this pre-processing technique, FRUC algorithms achieve a significant performance improvement in the presence of abrupt illumination change.
We propose a new FRUC algorithm, called TWK, that is able to produce up-sampled frames of good visual quality from input video of a low frame rate. The proposed technique aims to preserve not only the temporal continuity but also the spatial smoothness of up-sampled frames. In order to provide good temporal continuity, the TWK algorithm relies on the detection of the first- and the second-order translational macroblocks. Unlike other existing FRUC algorithms, the TWK algorithm does not attempt to obtain the best motion trajectory for every macroblock. Instead, it only identifies the macroblocks that comply with the first- and the second-order translational motion models. Then, spatial smoothness, which is imposed by OBMA, is used to fill in the remaining un-interpolated areas in the up-sampled frame. Experimental results show that the proposed TWK algorithm significantly outperforms six other low-complexity FRUC algorithms when the input frame rate is lower than 10 fps.
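The detection stage of the first contribution above can be sketched with luminance histogram distances. The adaptive threshold below (a multiple of the running mean distance) is an assumption standing in for the dissertation's actual rule, and only the first-order distance d(t-1, t) is used; the second-order distance d(t-1, t+1) would additionally separate one-frame flashes from scene cuts.

```python
import numpy as np

def hist_distance(f1, f2, bins=32):
    """Sum of absolute differences between two luminance histograms."""
    h1, _ = np.histogram(f1, bins=bins, range=(0.0, 1.0))
    h2, _ = np.histogram(f2, bins=bins, range=(0.0, 1.0))
    return np.abs(h1 - h2).sum()

def detect_abrupt_change(frames, factor=3.0):
    """Flag frame t when its first-order histogram distance d(t-1, t)
    jumps well above the running average of past distances."""
    d = [hist_distance(frames[t - 1], frames[t]) for t in range(1, len(frames))]
    flagged = []
    for t in range(1, len(d)):
        baseline = np.mean(d[:t]) + 1e-9  # adaptive threshold (assumed rule)
        if d[t] > factor * baseline:
            flagged.append(t + 1)  # index of the offending frame
    return flagged
```

A flagged frame would then be handed to the local intensity adjustment step before motion estimation is attempted across it.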
1.3.3 Image Denoising Application
In Chapter 5, we propose an adaptive nonlocal means algorithm for image denoising
application, and the contributions include the following.
We propose an adaptive image denoising technique based on the NL-means algorithm. The technique is capable of adjusting its similarity matching process and denoising parameters based on the local structure of a pixel. The eigenvalues of the gradient field obtained from the singular value decomposition (SVD) are employed as a key feature to identify the edge, texture and smooth regions. The classification results are adaptively determined by the K-means clustering (K-means) technique. In addition, the similarity measurement process is enhanced to capture more self-similar local regions using a rotated block matching algorithm with dominant orientation alignment. The matching window is fully adapted to the local property in order to improve the similarity distance measurement. Experimental results show that the proposed ANL-means algorithm has a significant performance gain over the traditional NL-means algorithm for various test images and conditions, especially when the noise level is high. The proposed ANL-means algorithm also outperforms several well-known image denoising benchmarks, such as total variation minimization and the method based on partial differential equations, in terms of both the objective PSNR measurement and the subjective perceptual quality comparison.
We analyze the performance and the denoising mechanism of the NL-means algorithm under independent and identically distributed (i.i.d.) additive Gaussian noise. We show that any zero-mean additive noise can be effectively removed by the NL-means denoising technique. However, noise in magnitude MRI images has the Rice distribution, which poses a greater challenge for the NL-means algorithm since Rician noise is not additive and its mean is signal-dependent. In order to handle Rician noise, we modify the ANL-means algorithm and adopt biased noise estimation. Experimental results show that the ANL-means algorithm achieves a considerable PSNR gain over the traditional NL-means algorithm under various noise conditions.
We propose a low-complexity method for noise level estimation based on the Lapla-
cian operator. The proposed technique relies on the classification results from the
ANL-means algorithm to separate the image structure from the smooth region so as
to enhance the accuracy of estimation. In addition to additive white Gaussian noise
(AWGN), we extend our studies to Rician noise. For Rician noise, the classifica-
tion results from the ANL-means algorithm are exploited to identify the background
region so that the signal-dependency property of Rician noise can be addressed.
Then, the Laplacian-based noise level estimation scheme is employed to determine
the variance of Rician noise accurately. It is shown by experimental results that the
proposed noise level estimation technique offers accurate estimates for both AWGN
and Rician noise under a wide range of noise levels and testing conditions.
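As a rough illustration of the Laplacian-operator idea (without the ANL-means classification step, whose details appear in Chapter 5), an Immerkaer-style estimator can be sketched as follows; the optional smooth-region mask stands in for the classification result and is an assumption of this sketch:

```python
import numpy as np

def estimate_sigma_laplacian(img, mask=None):
    """Laplacian-based AWGN level estimate (Immerkaer-style sketch).

    The difference-of-Laplacians kernel suppresses smooth image structure
    while passing noise; an optional boolean mask (e.g. a smooth-region
    classification, assumed available) restricts the average to pixels
    where the structure-suppression assumption holds best."""
    img = img.astype(float)
    L = np.array([[1., -2., 1.],
                  [-2., 4., -2.],
                  [1., -2., 1.]])
    h, w = img.shape
    resp = np.zeros((h - 2, w - 2))
    for di in range(3):                     # correlate img with L
        for dj in range(3):
            resp += L[di, dj] * img[di:di + h - 2, dj:dj + w - 2]
    if mask is not None:
        resp = resp[mask[1:-1, 1:-1]]
    # E|resp| = 6*sigma*sqrt(2/pi) for pure Gaussian noise, hence:
    return np.sqrt(np.pi / 2.0) * np.abs(resp).mean() / 6.0
```

On a flat image plus Gaussian noise the estimate is close to the true standard deviation; structured images are where the masking step matters.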
1.4 Organization of the Proposal
The rest of this proposal is organized as follows. The background of this research, such
as techniques for error-resilient video transmission and video frame rate up-conversion, is
given in Chapter 2. A low-complexity error concealment algorithm based on OBMA is
thoroughly studied in Chapter 3. In Chapter 4, a low-complexity pre-processing algorithm
based on local intensity adjustment is developed to handle abrupt illumination changes, and a
low-complexity FRUC algorithm is proposed to handle input video of low frame rate.
The proposed adaptive image denoising and noise level
estimation techniques are presented in Chapter 5. Finally, concluding remarks and some
future research problems are given in Chapter 6.
Chapter 2
Research Background
Research background for video error concealment (EC), frame-rate up-conversion (FRUC)
and image denoising is provided in this chapter.
2.1 Video Error Concealment
When video data are transmitted over an unreliable communication channel, they might
be corrupted due to transmission errors. Transmission errors may occur for various
reasons, such as congestion of data traffic, fluctuation of the available bandwidth, fading
of transmitted signals, etc. Without a proper technique to handle data loss, received
video data may experience severe quality degradation in playback, as shown in Fig. 2-1(a).
Good error handling techniques can reduce the perceptual quality degradation of received
video data. Fig. 2-1(b) illustrates an error-concealed video frame. Generally speaking,
techniques for combating transmission errors in visual communications can be divided
into the following two families [51].
Error control and recovery schemes
This type of technique aims at lossless recovery. It originated from data
communication and is now applied to video transmission. Examples
include Forward Error Correction (FEC), Error Correction Coding (ECC),
Automatic Retransmission Request (ARQ), etc.
Figure 2.1: (a) A video frame with lost information and (b) recovered by the
error concealment (EC) technique.
Signal-reconstruction and error-concealment techniques
Unlike error control and recovery schemes, signal-reconstruction and error-concealment
techniques do not attempt to restore the missing data perfectly. Instead, they aim
to obtain an approximation of the missing data that is less objectionable to human
perception, since the human visual system (HVS) is less sensitive to some distortions
in images and video sequences. Examples include spatial interpolation, motion-
compensated temporal prediction, etc.
EC has been studied extensively in the last decade due to its general applicability
compared with other techniques. In the next subsections, we present techniques to detect
errors in a video sequence as well as several well-known low-complexity EC algorithms.
2.1.1 Error Detection Techniques
Error detection is crucial to the performance of EC. Without an accurate error detection
algorithm, EC cannot effectively conceal the missing data and stop errors from prop-
agating through the video sequence. There are three common approaches to video error
detection.
Synchronization marker detection
This family of error detection algorithms monitors some synchronization marker
such as the Group of Blocks Start Code (GBSC) or the slice header to identify the
validity of received data. If the synchronization marker cannot be detected, the
entire data block is declared to be lost until the next synchronization marker is
encountered.
Syntactic or semantic violation
Some properties of video data can be monitored to identify the existence of cor-
rupted data. They include:
- a motion vector outside the allowed range;
- an invalid variable-length code (VLC) table entry;
- an out-of-range DCT coefficient;
- the number of DCT coefficients in a block exceeding the limit.
Information from multiplexing layer
For packetized video, transmission errors and the packet status can be obtained
directly from the multiplexing layer.
In addition to the above three approaches, there exist other error detection mecha-
nisms that rely on data processing techniques. For instance, Chen et al. [57] proposed
the use of fragile watermarks for error detection. Their FEW (Force Even Water-
marking) method forces certain DCT coefficients to be quantized to an even number.
If such a watermarked coefficient is detected as an odd number, an event of data corrup-
tion is identified. This method is capable of detecting errors accurately. However, it
demands control over source coding to alter the encoded DCT coefficients. Another method
was proposed by Mitchell et al. [58] under the assumption that the transition between two
consecutive macroblocks should be smooth, which can be used to check the smoothness
of the DC components of neighboring blocks in the absence of transmission errors.
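The parity logic behind FEW can be illustrated with a toy sketch on a list of quantized DCT coefficients; the set of protected positions and the rounding rule are illustrative assumptions, not the exact choices in [57]:

```python
def few_embed(coeffs, protected=(0, 1, 2)):
    """Encoder side: force the quantized DCT coefficients at the
    protected positions to even values (watermark embedding)."""
    out = list(coeffs)
    for k in protected:
        if out[k] % 2 != 0:
            out[k] += 1 if out[k] > 0 else -1  # round away from zero to even
    return out

def few_detect(coeffs, protected=(0, 1, 2)):
    """Decoder side: an odd coefficient at a protected position
    signals data corruption in this block."""
    return any(coeffs[k] % 2 != 0 for k in protected)
```

A watermarked block passes the check, while flipping one protected coefficient to an odd value triggers detection, which is the whole mechanism in miniature.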
2.1.2 Error Concealment Techniques
To conceal the lost information, EC algorithms rely on a priori information on video
sequences to restore the missing data. Some a priori information is described below.
Data consistency
This constraint relies on fundamental knowledge of video sequences. For ex-
ample, the pixel value should be in the range [0, 255], a macroblock must have a
motion vector that points to a region within the reference frame, etc.
Smoothness
This is a basic assumption that has been widely used in many EC techniques and
interpreted in several different ways. For instance, the variation between adjacent
pixels within a lost macroblock and their spatially neighboring pixels in adjacent
macroblocks should be small.
Edge continuity
This is based on the observation that, if an edge is present in the neighboring
macroblocks and its direction passes through the missing area, the edge should pass
through the missing macroblock.
Statistical correlation
This is based on the assumption that pixel values in a video frame are realizations
of an underlying statistical model such as the Markov Random Field (MRF) model.
In general, an EC algorithm relies on the spatial or the temporal correlation existing in
a video sequence to conceal the lost data. The spatial concealment technique interpolates
missing data using available pixels in neighboring macroblocks. Two well-known spatial
EC algorithms are illustrated in Fig. 2-2.
In Fig. 2-2(a), each pixel in a missing area is interpolated from the values of
four neighboring corner pixels as
P(X, Y) = (1 - n)(1 - m) P(X1, Y1) + (1 - n) m P(X1, Y2) + n (1 - m) P(X2, Y1) + n m P(X2, Y2)   (2.1)

n = (X - X1) / (X2 - X1)   and   m = (Y - Y1) / (Y2 - Y1)   (2.2)
Figure 2.2: Two EC algorithms based on linear spatial interpolation using (a)
corner pixels and (b) the nearest neighboring pixels [69].
and P(X, Y) is the pixel value at location (X, Y). This EC algorithm is more robust
than the one presented in Fig. 2-2(b) since it utilizes corner pixels from the neighboring
macroblocks. Even with the loss of consecutive macroblocks, these corner pixels might still
be available, especially if the macroblock interleaving technique is employed. Linear
interpolation using the nearest neighboring pixels, as shown in Fig. 2-2(b), usually yields
better performance than linear interpolation using corner pixels since the former offers
higher spatial correlation to the missing pixel. The interpolated pixel value can be written
as
P(i, j) = [dR P(i, 1) + dL P(i, N) + dB P(1, j) + dT P(N, j)] / (dL + dR + dT + dB)   (2.3)
where dL, dR, dT and dB denote the distances from pixel (i, j) to the left, right, top and
bottom boundary pixels of the four neighboring macroblocks. If any nearest horizontal
neighboring pixel is missing due to consecutive macroblock loss, corner pixels may be
used with the available nearest neighboring pixels to obtain better interpolated results.
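A minimal sketch of the interpolation in Eq. (2.3) for a single pixel might look as follows; the 1-based indexing and the use of the nearest available pixel values from the four neighboring macroblocks are assumptions of this sketch:

```python
def interpolate_pixel(i, j, left, right, top, bottom, N=16):
    """Distance-weighted interpolation of Eq. (2.3) for one pixel (i, j),
    1 <= i, j <= N, inside a lost N x N macroblock.  left/right/top/bottom
    are the nearest available pixel values in the adjacent macroblocks on
    the same row or column; a nearer neighbor receives a larger weight."""
    dL, dR = j, N + 1 - j   # horizontal distances to the block boundaries
    dT, dB = i, N + 1 - i   # vertical distances to the block boundaries
    total = dL + dR + dT + dB
    return (dR * left + dL * right + dB * top + dT * bottom) / total
```

Note how the weight on the left neighbor is dR, the distance to the opposite side, so pixels close to a boundary are dominated by that boundary's value.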
Temporal concealment relies on the temporal correlation to recover the motion vector
information of the missing area. Four widely used motion vector recovery techniques are
summarized below.
Set the motion vector of the missing macroblock to zero;
Use the motion vector of the corresponding block in the previous frame;
Use the average of motion vectors from spatially adjacent blocks;
Use the median of motion vectors from spatially adjacent blocks.
It was reported in [46] that the last method in the above list (i.e., using the median of
motion vectors from spatially adjacent blocks) produces the best results. These techniques
demand low computational complexity. However, their performance can be poor due to
an inaccurately interpolated motion vector field. This is especially obvious for video with
fast motion.
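The median rule above is usually realized component-wise, which can be sketched as follows (taking the upper median for even-sized neighbor sets is an assumption of this sketch):

```python
def recover_mv_median(neighbor_mvs):
    """Recover the lost MV as the component-wise median of the motion
    vectors of spatially adjacent blocks, the rule reported best in [46]."""
    xs = sorted(mv[0] for mv in neighbor_mvs)
    ys = sorted(mv[1] for mv in neighbor_mvs)
    mid = len(neighbor_mvs) // 2
    return (xs[mid], ys[mid])
```

Unlike averaging, the median is robust to a single wildly wrong neighbor, which is one reason it tends to win among the four low-complexity rules.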
To obtain better performance, more complex temporal concealment techniques have
been developed. Many recently proposed temporal EC algorithms rely on additional
features offered by the video coding standard, such as quarter-pel resolution, multiple
reference frames, the loop filter, the adaptive block size, flexible macroblock
ordering (FMO), etc. For example, Kim et al. [47] proposed a new temporal EC method
that utilizes the adaptive block size in H.264 and uses the coding modes of neighboring
macroblocks to identify the suitable concealment mode for a lost macroblock. Based on
the modes of neighboring macroblocks, the sub-blocks of each lost macroblock are concealed
using different sets of motion vector candidates in block matching. The four EC modes used
by this algorithm are presented in Fig. 2-3. The set of motion vector candidates for each
EC mode is described below.
Figure 2.3: Four EC modes [78].
EC mode 1: The set of concealed motion vector candidates for block 0 is {ZM, V1, V2,
V8}, where ZM is the zero motion vector and V1, V2, ..., V8 denote the motion
vectors located around the missing macroblock as shown in Fig. 2-4.
EC mode 2: The sets of concealed motion vector candidates for blocks 0 and 1 are
{ZM, V1, V2, V3, V7} and {ZM, V4, V5, V6, V8}, respectively.
EC mode 3: The sets of motion vector candidates for blocks 0 and 1 are {ZM, V1,
V3, V4, V5} and {ZM, V2, V6, V7, V8}, respectively.
EC mode 4: The sets of motion vector candidates for blocks 0, 1, 2 and 3 are {ZM,
V1, V3}, {ZM, V2, V7}, {ZM, V4, V5} and {ZM, V6, V8}, respectively.
Figure 2.4: The motion vector candidates used for EC in H.264.
2.2 Video Frame Rate Up-Conversion
The video frame rate up- and down-conversion technique is used in many practical ap-
plications such as video format conversion, slow-motion playback of sports video, video
quality enhancement, etc. By video frame rate up-conversion (FRUC), one attempts to
obtain a higher frame rate by inserting interpolated frames between each pair of adjacent
video frames in the original video, as shown in Fig. 2-5. Normally, the interpolation fac-
tor is equal to a positive integer. However, some applications require a non-integer
interpolation factor, such as converting a motion picture with a frame rate
of 24 fps to high-definition television (HDTV) with a frame rate of 60 fps. In this case,
special care is needed in the frame distribution process. In [12], a time-shifted frame
distribution scheme was proposed to handle this problem. Instead of interpolating
up-sampled frames with equal temporal spacing, the proposed method aims at using all of
the original video frames in the up-sampling process. This can be achieved by altering the
temporal gap of input video frames in the interpolation process.
Figure 2.5: Illustration of video frame rate up-conversion.
Figure 2.6: Motion compensated linear interpolation [51].
Simple FRUC techniques include frame repetition and linear frame interpolation such
as frame averaging. These methods usually yield interpolated frames of poor quality. To
incorporate motion information in the interpolation process, the motion-compensated
frame interpolation technique has been developed, where an interpolated frame is ob-
tained based on the estimated motion vector field. The estimation process acquires the
motion information by exploiting the correlation between two adjacent frames. The
performance of this technique is highly dependent upon the accuracy of the estimated
motion.
The motion compensated linear interpolation scheme is the most well-known technique
in this family. It relies on the assumption that an object has a linear motion trajectory
between two consecutive frames. For frame rate up-conversion by a factor of two, the
interpolated frame can be obtained by

f(x, n - 1/2) = [ f(x + v/2, n - 1) + f(x - v/2, n) ] / 2   (2.4)

where x and n are the spatial and temporal indices, v is the motion vector of the
macroblock, and f(x, n - 1) and f(x, n) are two adjacent video frames, respectively, as
shown in Fig. 2-6. For other up-conversion factors, the temporal domain indices can be
adjusted accordingly. Motion compensated linear interpolation yields results of moderate
quality.
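For one macroblock, Eq. (2.4) can be sketched as below; the integer half-pel rounding via floor division and the boundary clipping are simplifying assumptions of this sketch:

```python
import numpy as np

def mc_interp_block(prev, curr, x, y, mv, bs=16):
    """Eq. (2.4) for one bs x bs block at top-left (x, y): average the
    motion-compensated samples taken halfway along the trajectory.
    mv = (dx, dy) is the block's motion vector between frames n-1 and n;
    positions are clipped so both reads stay inside the frames."""
    h, w = prev.shape
    dx, dy = mv
    # f(x + v/2, n-1): sample position in the previous frame
    px = np.clip(x + dx // 2, 0, w - bs)
    py = np.clip(y + dy // 2, 0, h - bs)
    # f(x - v/2, n): sample position in the current frame
    cx = np.clip(x - dx // 2, 0, w - bs)
    cy = np.clip(y - dy // 2, 0, h - bs)
    a = prev[py:py + bs, px:px + bs].astype(float)
    b = curr[cy:cy + bs, cx:cx + bs].astype(float)
    return (a + b) / 2.0
```

Repeating this over all blocks of the intermediate frame yields the up-converted frame; the quality hinges entirely on how accurate mv is, as the text notes.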
If the estimated motion vectors are inaccurate, the interpolated frame usually exhibits
blocking artifacts. Lee et al. [52] proposed a method to reduce the blocking artifact by
applying nonlinear filtering to multiple motion trajectories. To acquire more accurate
motion vectors, a weighted-adaptive motion compensated FRUC algorithm was proposed
in [51], which determines the reliability of the estimated motion vectors and weighs multiple
motion trajectories accordingly.
One of the most challenging problems in developing a robust FRUC algorithm is object
occlusion, since many FRUC techniques track objects/macroblocks across video frames to
estimate motion vectors of interpolated frames. With occlusion, FRUC algorithms cannot
determine the accurate motion trajectory and thus produce poorly interpolated frames.
The occlusion region can be classified into two types: 1) background to be covered (BTBC)
and 2) uncovered background (UB).
Occlusion usually occurs around object boundaries or within an object due to
object deformation (referred to as self-occlusion) [1]. Many FRUC algorithms have been
developed to address this problem. For example, Choi et al. [20] proposed a motion
compensated interpolation method that classifies each area of a video frame into four
classes, i.e., stationary background, moving objects, to-be-covered regions and uncovered
regions. Each of these regions adopts an appropriate interpolation method as described
below.
Moving objects
The bi-directional motion compensated interpolation technique is used to predict
the object motion.
Covered background
Since covered background exists only in the previous frame and disappears in the
current frame, the forward prediction technique is used in this region.
Uncovered background
The backward prediction technique is used to interpolate this region since uncov-
ered background is occluded in the previous frame and appears in the current frame
only.
Static background
Linear interpolation is employed in this region to reduce the computational com-
plexity.
2.3 Image denoising
A digital image is the result of a light intensity measurement, usually made by a charge-
coupled device (CCD) coupled with a photoelectric sensor [10]. During the image ac-
quisition process, noise often arises from the fluctuation of incoming photons. The
observation can be written as

Y = X + N,   (2.5)

where Y = {y(i) | i ∈ Ω} is the observed value, X = {x(i) | i ∈ Ω} is the true value and
N = {n(i) | i ∈ Ω} is noise, with Ω denoting the image domain. Normally, the noise is
assumed to be white and independently and identically distributed (i.i.d.). The main
objective of image denoising is to recover the best estimate of the original image from
its noisy version.
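The observation model of Eq. (2.5) with i.i.d. Gaussian noise, together with the PSNR measure used to report denoising results, can be sketched as follows (function names are illustrative):

```python
import numpy as np

def add_gaussian_noise(x, sigma, seed=0):
    """Observation model of Eq. (2.5): Y = X + N with i.i.d. zero-mean
    Gaussian N of standard deviation sigma."""
    rng = np.random.default_rng(seed)
    return x + rng.normal(0.0, sigma, x.shape)

def psnr(x, y, peak=255.0):
    """Peak signal-to-noise ratio between a reference x and an estimate y,
    the usual objective quality measure for denoising."""
    mse = np.mean((x.astype(float) - y.astype(float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```

A denoiser is then judged by how much it raises psnr(clean, denoised) above psnr(clean, noisy).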
According to [59], image denoising techniques can be classified into two approaches:
spatial filtering and transform domain filtering methods.
Spatial filtering methods
Image denoising algorithms based on the spatial filtering technique rely on
spatial information to estimate the original signal. Spatial filters can
be further divided into two sub-categories based on their characteristics.
1. Linear filters
The linear spatial filter is simply a 2-D convolution operator applied to each
image pixel. Linear filters usually have low computational complexity but tend to blur
sharp edges and fine texture. In addition, they often perform poorly in the
presence of signal-dependent noise such as the Rician noise in medical images.
Examples of linear filters include the Gaussian filter, the mean filter, etc.
2. Nonlinear filters
The main drawback of image denoising with linear filters is the blurring of edge
and texture regions. The nonlinear filtering technique has been developed
to overcome this problem. Due to its low computational complexity, edge-
preserving property and robustness to impulsive noise, the median filter is
popular among various nonlinear filters [3].
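A direct (unoptimized) median filter illustrates the edge-preserving, impulse-rejecting behavior described above:

```python
import numpy as np

def median_filter(img, radius=1):
    """3x3 (radius=1) median filter: each interior pixel is replaced by
    the median of its neighborhood; border pixels are left unchanged.
    Nonlinear, edge-preserving, and robust to salt-and-pepper noise."""
    h, w = img.shape
    out = img.astype(float).copy()
    for i in range(radius, h - radius):
        for j in range(radius, w - radius):
            win = img[i - radius:i + radius + 1, j - radius:j + radius + 1]
            out[i, j] = np.median(win)
    return out
```

A lone impulse is removed entirely, while an ideal step edge passes through unchanged, which is exactly the behavior a linear (averaging) filter cannot deliver.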
Transform domain filtering methods
The transform domain filtering techniques can be further divided into two
subgroups based on the basis functions.
1. Spatial-frequency filters
Under the assumption that noise is stronger than the signal in the higher
frequency components, noise removal is achieved by adopting
a frequency domain filter and adjusting the cut-off frequency to remove
high-frequency noise while preserving the original signal [40]. The Wiener
filter [31] is the most well-known technique in this subgroup. It is applied
independently to transform coefficients in order to impose a constraint on
each frequency component. The denoised image, estimated by the inverse
transform of the filtered coefficients, shows an improved denoising result along
edges.
Figure 2.7: Examples of linear filters.
2. Wavelet domain filters
Image denoising using wavelets is motivated by the following observations.
- Sparsity: the energy of the transformed signal is concentrated in a few
coefficients in the transform domain.
- Infinite bandwidth of white noise: white noise in the spatial domain
remains white noise in the transform domain.
Examples of this subgroup include frequency thresholding [26, 39], sta-
tistical modeling of wavelet coefficients based on Hidden Markov Models
(HMM) [63], etc.
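A one-level Haar transform with soft thresholding of the detail subbands is a minimal stand-in for the wavelet thresholding methods cited above; the single decomposition level and the Haar basis are simplifying assumptions of this sketch:

```python
import numpy as np

def haar2(x):
    """One-level orthonormal 2-D Haar transform (even-sized input)."""
    lo = (x[0::2, :] + x[1::2, :]) / np.sqrt(2)   # row low-pass
    hi = (x[0::2, :] - x[1::2, :]) / np.sqrt(2)   # row high-pass
    ll = (lo[:, 0::2] + lo[:, 1::2]) / np.sqrt(2)
    lh = (lo[:, 0::2] - lo[:, 1::2]) / np.sqrt(2)
    hl = (hi[:, 0::2] + hi[:, 1::2]) / np.sqrt(2)
    hh = (hi[:, 0::2] - hi[:, 1::2]) / np.sqrt(2)
    return ll, lh, hl, hh

def ihaar2(ll, lh, hl, hh):
    """Inverse of haar2 (exact, since the transform is orthonormal)."""
    lo = np.empty((ll.shape[0], 2 * ll.shape[1]))
    lo[:, 0::2] = (ll + lh) / np.sqrt(2)
    lo[:, 1::2] = (ll - lh) / np.sqrt(2)
    hi = np.empty_like(lo)
    hi[:, 0::2] = (hl + hh) / np.sqrt(2)
    hi[:, 1::2] = (hl - hh) / np.sqrt(2)
    x = np.empty((2 * lo.shape[0], lo.shape[1]))
    x[0::2, :] = (lo + hi) / np.sqrt(2)
    x[1::2, :] = (lo - hi) / np.sqrt(2)
    return x

def haar_soft_denoise(img, t):
    """Soft-threshold the detail subbands: the sparse signal coefficients
    survive the threshold t, while spread-out white noise is attenuated."""
    ll, lh, hl, hh = haar2(img.astype(float))
    soft = lambda c: np.sign(c) * np.maximum(np.abs(c) - t, 0.0)
    return ihaar2(ll, soft(lh), soft(hl), soft(hh))
```

The sparsity and whiteness observations above are exactly what make the thresholding step work: signal energy sits in a few large coefficients, noise energy is spread thinly over all of them.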
Chapter 3
Low-Complexity Video Error Concealment Using OBMA
3.1 Introduction to Chapter 3
Due to the advancement of video coding technologies and standards, wireless video appli-
cations have become more and more popular in portable consumer electronics, including
mobile TV and 3G cellular phones. One common problem in transmitting compressed
video over unreliable channels in mobile and/or wireless environments is video quality
degradation due to transmission errors. Techniques for combating transmission errors
can be classified into three categories [80] based on the roles of the encoder and the de-
coder. In forward error prevention, only the encoder is used to handle transmission errors.
Error concealment (EC) is a post-processing technique employed at the decoder side, as
shown in Fig. 3-1. When the encoder and the decoder work together in error handling, it
is called interactive error control.
Among these three categories, error concealment has the least implementation
restrictions and has been studied extensively in the last 20 years. The basic idea of
EC is to recover the lost data of decoded video by exploiting its spatial and/or temporal
correlation [49, 82]. The original EC concept can be traced back to the problem of partial
loss in still images [50, 81]; it has then been gradually applied to video coding. Recent EC
developments tend to utilize unique features provided by the emerging H.264/AVC video
coding standard, e.g., the variable block size and multiple reference frames, to improve
the concealed video quality. For example, the coding mode of neighboring macroblocks was
used in [47] to estimate the coding mode of the lost macroblock so that it can be concealed
on the basis of different block sizes. However, the computational complexity of most EC
methods proposed in academia is too high to be practical for mobile applications.
In this work, we investigate a low-complexity EC algorithm that offers excellent
performance. The boundary matching algorithm (BMA) is a good EC candidate for mobile
video due to its low complexity. Here, we examine a variant of BMA called the outer
boundary matching algorithm (OBMA). Although BMA and OBMA are similar in their
design principle, it is empirically observed that OBMA outperforms BMA by a significant
margin (typically, 0.5 dB or higher) while maintaining the same level of complexity. To
the best of our knowledge, such a performance gap has not yet been well explained in
the literature. In this work, we offer more insights into the superior performance of OBMA
and provide extensive experimental results to demonstrate its advantage over a wide range
of test video sequences and error conditions.
The rest of this chapter is organized as follows. BMA, OBMA and other derived low-
complexity EC methods are discussed in Sec. 3.2. The performance evaluation of several
low-complexity EC techniques is conducted in Sec. 3.3. The superior performance of
OBMA is explained in Sec. 3.4. Two extensions of OBMA and their associated exper-
imental results are presented in Sec. 3.5. The concept of using flexible macroblock
ordering (FMO) to enhance EC performance is discussed in Sec. 3.6. Finally, concluding
remarks are given in Sec. 3.7.
Figure 3.1: The block diagram of a decoder with an EC module.
3.2 BMA and OBMA Methods
BMA and OBMA rely on the boundary smoothness assumption to conceal the
lost MB information. The candidate motion vector (MV) set consists of the eight neigh-
boring MVs, denoted by MV0, MV1, ..., MV7, and the zero motion vector (ZMV), denoted
by MV8, as shown in Fig. 3-2.
3.2.1 BMA
For each MV candidate, BMA constructs a candidate MB in the reference frame by
backward tracing. To recover the lost MB, the mean of absolute differences (MAD)
between the boundary of each candidate MB in the reference frame and the neighboring MBs
of the lost MB in the current frame is computed as
D_n^BMA = (1/N) Σ_{i ∈ I, j ∈ J} |P_i^N - P_j^C|   (3.1)

n_opt = arg min_n D_n^BMA   (3.2)

where D_n^BMA is the MAD of the n-th candidate MB with n = 0, 1, ..., 8, P_i^N and P_j^C denote
the boundary pixel values of neighboring MB i and candidate MB j, respectively, I and
J are the sets of corresponding boundary pixel pairs as shown in Fig. 3-2, and N is the total
number of boundary pixels. For example, N is 16x4 = 64 for an MB of size 16x16. Then,
the candidate MB that yields the smallest distortion is selected as the best candidate
(n_opt) and used to conceal the lost MB.
3.2.2 OBMA
Instead of using adjacent pixel values for distortion computation as done in BMA, OBMA
utilizes a linear translational model to conceal a lost MB, as shown in Fig. 3-2, by assuming
that the lost MB can be reconstructed from the reference frame with constant motion
in both magnitude and direction. OBMA adopts the same candidate set MV0, MV1, ...,
MV7, MV8, where MV8 is the ZMV, but a different distortion computation:

D_n^OBMA = (1/N) Σ_{i ∈ I, k ∈ K} |P_i^N - P_k^O|   (3.3)

n_opt = arg min_n D_n^OBMA   (3.4)

where D_n^OBMA is the MAD of the n-th OBMA candidate MB, P_k^O is the pixel value of
the outer boundary of the candidate MB, and I and K are the sets of corresponding boundary
pixel pairs as shown in Fig. 3-2. The candidate MB that has the smallest distortion
is selected as the best candidate. A concept similar to OBMA, called DMVE, was first
proposed in [1]. Unlike DMVE, OBMA employs only one layer of the outer boundary
and uses all adjacent neighboring MBs in the matching process.
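The two distortion measures in Eqs. (3.1) and (3.3) differ only in which boundary of the candidate is compared against the pixels surrounding the lost MB, which can be sketched as follows (the function name and the absence of bounds checking are simplifications of this sketch):

```python
import numpy as np

def boundary_distortion(cur, ref, x, y, mv, bs=16, mode="OBMA"):
    """MAD of Eq. (3.1) (mode="BMA") or Eq. (3.3) (mode="OBMA") for one MV
    candidate.  (x, y) is the top-left corner of the lost MB in the current
    frame; mv = (dx, dy) points to the candidate MB in the reference frame.
    BMA compares the candidate's own (inner) boundary with the available
    pixels just outside the lost MB, while OBMA compares the candidate's
    one-pixel outer boundary with those same pixels."""
    dx, dy = mv
    cx, cy = x + dx, y + dy
    if mode == "BMA":                      # inner boundary of the candidate
        top_c = ref[cy, cx:cx + bs]
        bot_c = ref[cy + bs - 1, cx:cx + bs]
        lef_c = ref[cy:cy + bs, cx]
        rig_c = ref[cy:cy + bs, cx + bs - 1]
    else:                                  # OBMA: outer boundary
        top_c = ref[cy - 1, cx:cx + bs]
        bot_c = ref[cy + bs, cx:cx + bs]
        lef_c = ref[cy:cy + bs, cx - 1]
        rig_c = ref[cy:cy + bs, cx + bs]
    top_n = cur[y - 1, x:x + bs]           # pixels surrounding the lost MB
    bot_n = cur[y + bs, x:x + bs]
    lef_n = cur[y:y + bs, x - 1]
    rig_n = cur[y:y + bs, x + bs]
    diffs = np.concatenate([top_c - top_n, bot_c - bot_n,
                            lef_c - lef_n, rig_c - rig_n])
    return float(np.mean(np.abs(diffs)))
```

In a static scene, OBMA assigns zero distortion to the ground-truth (zero-MV) candidate, whereas BMA generally does not, which foreshadows the explanation given in Sec. 3.4.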
3.2.3 Four Other Derived EC Algorithms
The other four low-complexity EC techniques under our consideration are: zero motion
vector (ZMV), average motion vector (AMV), overlapped motion vector (OMV) and its
modified version (OMV+). ZMV and AMV are among the lowest-complexity
EC techniques. ZMV uses the corresponding MB in the reference frame for concealment
while AMV reconstructs the lost MB based on the MV obtained by averaging all
neighboring MVs. OMV restores the lost MB by combining three intermediate subblocks
obtained from the side match computation and the MVs of vertically and horizontally
adjacent neighboring subblocks. The side match computation used in OMV is essentially
a modified version of BMA's matching function. In [15], OMV utilizes a weighting matrix
for subblock merging to produce the concealed MB with fewer blocky artifacts along the
boundaries of subblocks. However, simple averaging is used in our work since it requires
less complexity, and the deblocking filter in the decoder can proficiently remove some of
these effects. OMV+ is similar to OMV. Their main difference is that OMV and OMV+
adopt modified BMA and OBMA matching functions, respectively.
Figure 3.2: Distortion computation in BMA and OBMA.
3.3 Performance Evaluation
We apply all six EC algorithms discussed above to three video sequences
(City, Crew and Harbour) at QCIF (176x144), CIF (352x288) and 4CIF (704x576)
resolutions. For each sequence, the first 100 frames are used and encoded in IPPP
format. Errors are modeled as uniformly distributed random losses with a 10% MB loss ratio.
For each case, 10 error patterns are employed and the average results are reported in Fig.
3-3. The encoder and decoder modules are based on H.264/AVC reference software version
11.0 [43]. BMA and OBMA are very similar to each other and have exactly the same
complexity.
As shown in Fig. 3-3, OBMA outperforms BMA by a large margin. (They correspond
to the 4th and the 3rd bars in the chart for a given error rate.) Its performance is
approximately 0.51, 0.82 and 1.01 dB better than BMA for QCIF, CIF and 4CIF sequences,
respectively, and is much better than that of ZMV and AMV. The performance of BMA and
OBMA improves by comparison at higher video resolutions since higher-resolution sequences
tend to have better spatial features for the matching process. In the low resolution case, OMV
yields the same level of quality as OBMA but its performance suffers from the ambiguity
introduced by the averaging operation. With the modification, OMV+ gains approximately
0.80, 0.96 and 1.13 dB over OMV for QCIF, CIF and 4CIF sequences, respectively.
This enhancement is mainly due to the performance boost from the use of the OBMA-like
matching function. Based on these results, OBMA is very suitable for practical applications
since it offers excellent performance while demanding low complexity. In the next
section, we explain the concepts behind the performance of BMA and OBMA as well as the
reasons why OBMA performs much better than BMA despite their similarity.
3.4 Explanation of Superior OBMA Performance
The difference between BMA and OBMA lies in their distortion functions. The inner and
outer boundaries are used in the matching process by BMA and OBMA, respectively. We
can explain the superior OBMA performance from three viewpoints, as detailed below.
3.4.1 Edge Detection Effect of BMA Distortion Computation
As shown in Fig. 3-5, BMA's distortion function is similar to gradient-based horizontal
and vertical edge detectors with simple differences. Due to this resemblance, BMA's
distortion function actually penalizes an edge located at a block boundary. As shown
in Fig. 3-6, when an edge lies across a block boundary, it results in an uneven surface
between the outer and the inner boundaries, which leads to an increased MAD value. As
a result, the ground-truth block is less likely to be selected. Except for edges that are
perpendicular to the boundary, all edges across the block boundary tend to hurt the BMA
performance. By using the outer boundary, OBMA's distortion function incorporates the
edge information in the vicinity of a block boundary into its matching decision accurately.
Consider the example given in Fig. 3-7, which has three matching candidates. Since the
2nd and the 3rd candidates have edge information that is inconsistent with the neighboring MB,
their MAD calculated by OBMA's distortion function is higher. The original lost MB
would give a zero value if it were in the candidate set. In other words, an edge in the vicinity
of a block boundary helps OBMA select the right MB.
Figure 3.3: Performance comparison of six EC algorithms.
Figure 3.4: Subjective quality comparison for frame no. 100 of the City sequence
(CIF) with a loss rate of 10%.
Figure 3.5: The distortion function of BMA resembles gradient-based hori-
zontal and vertical edge detectors.
Figure 3.6: The edge in the vicinity of the block boundary is penalized by
BMA.
Figure 3.7: The edge in the vicinity of a block boundary helps select the right
MB by OBMA.
3.4.2 Partial Block Matching
The superior performance of OBMA can also be explained by the block-based motion estima-
tion concept. That is, OBMA can be viewed as a special type of block matching.
In video encoding, each MB is compared with all possible candidate MBs in the search
range of the reference frame to determine the best motion vector, i.e., the one that yields
the minimum residual. Traditional motion estimation finds the closest match to a block
of size 16x16 using the spatial information of all pixels inside the block. In contrast,
OBMA performs a search to find the best match to a block of size 18x18 using the spatial
information of the pixels located in its outermost layer.
3.4.3 Eect of Error Types
The results shown in Fig. 3-8 are obtained from testing BMA and OBMA on ten
different video sequences. The experiment setting is identical to that of
Fig. 3-3. In most video sequences, OBMA outperforms BMA by a large margin. There
are, however, three cases (i.e., Football, Ice and Soccer) where the two offer about the same
level of quality. This can be explained as follows. In the encoder, the motion estimation
algorithm may yield inaccurate motion vectors (MVs) when dealing with fast motion
sequences, resulting in larger residuals. Consequently, the performance of OBMA is also
limited for fast motion sequences such as the above three. On the other
hand, OBMA can handle sequences of moderate motion with strong edges and textures
since it can use this information to enhance the matching decision. This phenomenon has
been observed in sequences such as Mobile, Bus and City. To confirm the strengths and
weaknesses of OBMA, we test BMA and OBMA with selective error patterns instead of
uniformly distributed random errors. The purpose is to observe the performance of each
error concealment technique under various error conditions. In particular, we manually
assign erroneous MBs to different regions. The two error assignment methods are
described below.
Type I errors: Missing MBs are located in regions consisting of strong edges, tex-
tures and smooth areas. Examples of this type include background, static objects, slowly
moving objects, clear object contours, etc.
Type II errors: Missing MBs are located in regions of fast-moving objects or highly
deformed regions such as mouths, eyes, etc.
Based on the arguments given above, we expect OBMA to perform well for type I errors but not for type II errors due to the poor performance of motion search in the latter case. This conjecture is confirmed by the experimental results shown in Fig. 3-9. The experiment setting is the same as that given in Fig. 3-8, except that only a 5% MB loss is tested due to the limited number of classified MBs. We see clearly that the performance gap between OBMA and BMA is quite large for type I errors but narrows significantly for type II errors. Although OBMA yields mediocre quality in the Football, Ice and Soccer sequences in Fig. 3-8, it performs very well in the same three sequences when only type I errors are considered in Fig. 3-9. On the other hand, the performance of OBMA degrades significantly for type II errors in the Foreman and Harbor sequences, even though the overall results were good in Fig. 3-8. Please also note that there are not sufficient classified MBs to synthesize type II errors for the Carphone sequence and type I errors for the Bus and City sequences. Thus, their results are skipped in Fig. 3-9.
Figure 3.8: BMA and OBMA performance comparison for different video sequences with an MB loss rate of 10%.
To conclude, OBMA will significantly outperform BMA if the motion vector provides an effective temporal prediction.

3.5 Two Extensions of OBMA

Two extensions of OBMA are considered in this section. The first one is a search pattern extension. With this technique, OBMA can trade additional complexity for improved performance. The other extension is the use of multiple overlapped layers, which is an extended version of an idea described in [1].
3.5.1 OBMA with Refined Local Search

OBMA in its original form only searches the reference frames with a small set of neighboring motion vectors (MVs). The overall performance of OBMA can be improved by
Figure 3.9: Performance comparison of BMA and OBMA against two error types.
increasing the number of MV candidates at the cost of higher computational complexity. To evaluate the trade-off between the concealed video quality and the complexity, we compare the PSNR results as well as the complexity of the following three search patterns. Four variations are examined in each case.

OBMA with full search (OBMA-FS): OBMA-FS uses all possible candidates in a large search area pointed to by the estimated MV of the lost MB. The search is performed with quarter-pel resolution and can be extended to multiple reference frames.

OBMA-FS5: We set the number of reference frames to 5 and the search range to 16x16. It includes all possible candidates.

OBMA-FS1-R16, OBMA-FS1-R8 and OBMA-FS1-R4: We use only 1 reference frame and set the search range to 16x16, 8x8 or 4x4, respectively.
The complexity of OBMA-FS can be computed by

N_C = N_R x (2 x R_F x 4 + 1) x (2 x R_F x 4 + 1),   (3.5)

where N_C and N_R are the numbers of candidates and reference frames, respectively, and R_F is the search range in pixels (the factor of 4 accounts for the quarter-pel resolution).
OBMA with refined local search (OBMA-RS): OBMA-RS obtains matching candidates from a small search region pointed to by the MV of each neighboring MB. The complexity is drastically reduced from the previous case since the search region is much smaller.

OBMA-RS2: It uses a search range of 2x2 with quarter-pel resolution.

OBMA-RS1-QP, OBMA-RS1-HP and OBMA-RS1-FP: They use a search range of 1x1 with quarter-pel, half-pel and full-pel resolutions, respectively.
The complexity of OBMA-RS is given by

N_C = 8 x (2 x R_S x R_Q + 1) x (2 x R_S x R_Q + 1),   (3.6)

where R_S is the search range of each small search region and R_Q is the search resolution, equal to 4, 2 and 1 for quarter-pel, half-pel and full-pel resolution, respectively.
OBMA with selective search (OBMA-SS): OBMA-SS is similar to OBMA-RS except that it only considers the small search region pointed to by the neighboring MV that gives the minimal MAD. It can be viewed as a two-stage operation.

OBMA-SS2: It uses a search range of 2x2 with quarter-pel resolution.

OBMA-SS1-QP, OBMA-SS1-HP and OBMA-SS1-FP: They use a search range of 1x1 with quarter-pel, half-pel and full-pel resolutions, respectively.
The complexity of OBMA-SS can be calculated via

N_C = (2 x R_S x R_Q + 1) x (2 x R_S x R_Q + 1) + 8.   (3.7)
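The three candidate-count formulas can be cross-checked against the entries of Table 3-1 with a few lines of Python (a sketch; the function names are ours):

```python
def n_fs(n_ref, r_f):
    # Eq. (3.5): full search at quarter-pel over a +/- r_f pixel range,
    # repeated for each of n_ref reference frames.
    return n_ref * (2 * r_f * 4 + 1) ** 2

def n_rs(r_s, r_q):
    # Eq. (3.6): refined local search around each of the 8 neighbor MVs;
    # r_q = 4, 2, 1 for quarter-, half- and full-pel resolution.
    return 8 * (2 * r_s * r_q + 1) ** 2

def n_ss(r_s, r_q):
    # Eq. (3.7): selective search around only the best neighbor MV,
    # plus the 8 first-stage neighbor candidates.
    return (2 * r_s * r_q + 1) ** 2 + 8
```

For example, `n_fs(5, 16)` reproduces the 83,205 candidates of OBMA-FS5 and `n_rs(2, 4)` the 2,312 candidates of OBMA-RS2.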
The original BMA and OBMA use the MVs of eight neighboring MBs for matching. Thus, their complexity is equal to 8 candidates. The complexity and performance of all techniques are shown in Table 3-1, and the plot of quality versus complexity is given in Fig. 3-10. The experimental setting is similar to that in Fig. 3-3 with an MB loss rate of 10%. OBMA-FS5 outperforms OBMA by 1.3 dB at the cost of 83,205 matching candidates, which is not suitable for mobile video applications. In the refined and selective search cases, the performance of OBMA is not affected by the reduction of the search range from 2x2 to 1x1, while the reduction of the search resolution from quarter-pel to half-pel slightly decreases the overall performance. In general, increasing the number of matching candidates improves quality, as observed in Fig. 3-10 from the positive slopes between adjacent points. However, some segments have a negative slope, such as the transitions from OBMA-RS1-QP (9) to OBMA-FS1-R4 (10) and from OBMA-RS1-HP (7) to OBMA-SS2 (8). Overall, we see that OBMA with a selective search pattern offers an excellent trade-off between quality and complexity.
3.5.2 Multiple Boundary Layers

In this extension, we increase the number of boundary layers from one to multiple layers. The number of outer boundary layers ranges from 1 to 8 in Fig. 3-11, and three
Error concealment technique | Average PSNR-Y | Complexity | Index
BMA            | 32.04 |     8 |  1
OBMA           | 33.54 |     8 |  2
OBMA-FS5       | 34.74 | 83205 | 14
OBMA-FS1-R16   | 34.52 | 16641 | 13
OBMA-FS1-R8    | 34.46 |  4225 | 12
OBMA-FS1-R4    | 34.13 |  1089 | 10
OBMA-RS2       | 34.36 |  2312 | 11
OBMA-RS1-QP    | 34.36 |   648 |  9
OBMA-RS1-HP    | 34.36 |   200 |  7
OBMA-RS1-FP    | 34.17 |    72 |  5
OBMA-SS2       | 34.27 |   297 |  8
OBMA-SS1-QP    | 34.29 |    90 |  6
OBMA-SS1-HP    | 33.96 |    33 |  4
OBMA-SS1-FP    | 33.86 |    17 |  3

Table 3.1: Complexity and quality comparison of several EC methods (the index is used in Fig. 3-10).
Figure 3.10: Quality and complexity comparison of the various EC methods given in Table 3-1, where the number next to each point is the index of the method.
different resolutions (QCIF, CIF and 4CIF) of the City sequence were tested. The sequence consists of 100 frames and is encoded in the IPPP format. The MB loss rate is 10%. It turns out that the single-layer OBMA yields the best results for all three video resolutions. Its performance decreases when more outer layers are used in the distortion function. To explain this, we interpret OBMA from the motion estimation viewpoint. The original OBMA is equivalent to performing a search, similar to motion estimation, with a block of size 18x18 using the spatial information of all pixels located in the outermost boundary. When N overlapped layers are used, the size of the search block becomes (2N+16)x(2N+16), which corresponds to motion estimation with a larger block size. When the block size is too large, the accuracy of block-based motion estimation can be hurt. As discussed before, a larger residual is equivalent to higher distortion in the restored MB. Conversely, we may reduce the block size to achieve better quality. This can be done by performing OBMA at the subblock level (block size of 8x8) as suggested in [16] and [87], which is called refined estimation with a smaller block size (RBMA) and has been shown to outperform OBMA with block size 16x16.
3.6 BMA/OBMA with FMO

Flexible macroblock ordering (FMO) is one of many tools used to enhance the error resilience of coded video in H.264/AVC. The main idea behind it is to assign each macroblock to a particular slice group via the macroblock-to-slice allocation map (MBA map). It was shown in [84] that FMO introduces only a small amount of computational overhead and implementation complexity. As discussed earlier, many EC algorithms rely heavily
Figure 3.11: The performance of OBMA with a different number of outer boundary layers used in the matching process.
on spatial information to conceal the lost data. With a well-designed FMO pattern, a lost macroblock may still have some spatial neighboring information available after a slice loss.

In H.264/AVC, FMO offers 7 different patterns, as shown in Fig. 3-12. Each FMO pattern is unique and designed for different applications. Types 0 and 1 are known as the interleaved and scattered patterns, respectively. Type 0 places adjacent macroblocks into a slice, while Type 1 distributes the macroblock selection throughout the entire frame in a checkerboard pattern. Type 2 uses several marked rectangular regions to define a slice, which can be used in applications where some areas of the video frame are independent of, or more important than, the rest. Types 3-5 are more dynamic
in the sense that their slice patterns change over time to accommodate more advanced applications. Finally, users can define their own preferred FMO pattern with Type 6.
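As an illustration, a Type-1-style scattered MBA map can be generated with a simple checkerboard rule (a simplified sketch; the H.264/AVC standard specifies the exact dispersed-map formula, which differs when more than two slice groups are used):

```python
def scattered_mba_map(mb_cols, mb_rows, num_groups=2):
    """Assign MB at grid position (x, y) to slice group (x + y) % num_groups.
    For two groups this is exactly a checkerboard, so a lost MB of one
    slice group keeps all four of its neighbors in the other group,
    preserving spatial information for boundary-matching EC."""
    return [[(x + y) % num_groups for x in range(mb_cols)]
            for y in range(mb_rows)]
```

With this map, losing an entire slice group still leaves every missing MB surrounded by correctly received neighbors, which is exactly why the scattered pattern helps BMA/OBMA below.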
In our studies, we compare three different FMO patterns (i.e., the raster scan, the line-interleaved and the scattered order), as shown in Fig. 3-13, under the various testing error conditions given in Table 3-2. The burst length indicates the number of consecutive lost slices. It is limited to 2 since we want to avoid the situation where an entire frame is lost. In addition, H.264 limits the number of slices per frame to 8 to avoid a complicated macroblock assignment scheme. In the experiment, we apply BMA and OBMA to three QCIF video sequences (i.e., carphone, city and foreman). For each sequence, the first 100 frames are encoded with the IPPP format. 10 error patterns are employed and the averaged results are reported in Fig. 3-14. With the line-interleaved pattern, BMA gains 1.40 and 1.24 dB in T1 and T4, while OBMA gains 2.68, 1.40 and 1.16 dB in T1, T4 and T2, respectively. The line-interleaved pattern only slightly affects the performance in T3 and T6, since each lost macroblock gains only a small number of additional reliable neighboring macroblocks. However, the performance of both BMA and OBMA in T1 and T4 is drastically improved due to the large number of additional reliable neighboring macroblocks. As shown in Fig. 3-14, the scattered pattern outperforms the line-interleaved pattern in all error conditions. Compared with the line-interleaved pattern, BMA and OBMA gain an additional 0.87 and 1.23 dB, respectively, with the scattered pattern.
Figure 3.12: The FMO patterns offered in H.264/AVC, where each color represents a different slice group [24].
Figure 3.13: An FMO pattern under consideration, where each color represents a different slice group.
Error condition | Error probability | Burst length (slices) | Slices per frame
T1 | 0.05 | 1 | 3
T2 | 0.05 | 1 | 5
T3 | 0.05 | 1 | 8
T4 | 0.05 | 2 | 3
T5 | 0.05 | 2 | 5
T6 | 0.05 | 2 | 8

Table 3.2: Burst error conditions, where the error condition index is used in Figures 3-14 and 3-15.
Figure 3.14: Performance comparison of three selected FMO patterns under
various error conditions and video sequences.
Figure 3.15: Subjective quality comparison of frame no. 30 of the city sequence (QCIF) under error condition T6.

Figure 3.16: Subjective quality comparison of the foreman sequence (QCIF).
3.7 Conclusion to Chapter 3
Several low-complexity EC techniques based on boundary matching criteria were studied extensively in this work. Extensive experimental results were given to demonstrate the large performance gap between BMA and OBMA, and this gap was explained from several different angles. Two extensions of OBMA and the use of FMO to enhance its performance were also discussed to deepen the understanding of OBMA. To conclude, OBMA is suitable for mobile video applications since it offers an excellent trade-off between complexity and concealed video quality. Besides, it can be used to enhance other EC techniques that rely on a block-matching procedure, as observed in the OMV+ algorithm.
Chapter 4
Robust Video Frame Rate Up-Conversion (FRUC)
Technique
4.1 Introduction to Chapter 4
Video frame rate up-conversion (FRUC) has been a technique of great interest due to its diversified applications in consumer electronics. For instance, HDTV and multimedia PC systems usually have the capability to display a much higher frame rate than that offered by a broadcast video stream [20]. In this case, FRUC can be employed to increase the original frame rate to enhance the viewing experience of the end user. For handheld devices, when the temporal mode of scalable video coding [65] is used, the frame rate of the transmitted video may vary due to the fluctuation of the available bandwidth of transmission channels. FRUC can be exploited to stabilize the video frame rate under this condition. Many FRUC algorithms adopt a motion interpolation technique to determine the motion vector (MV) field of interpolated frames. In general, FRUC algorithms rely on the temporal and/or spatial correlation of the original video sequence to construct up-sampled frames [28,41]. The concept of FRUC is very similar to that of video error concealment (EC).
In particular, frame loss concealment aims to reconstruct a missing frame when the video sequence encounters transmission loss. Thus, many fundamental ideas are shared by FRUC and EC, such as motion trajectory prediction, optical flow, etc. [18,61]. Generally speaking, today's advanced FRUC techniques provide satisfactory results in preserving temporal continuity for most video sequences. However, we observe two challenging situations where the performance of FRUC deteriorates severely. That is, most existing FRUC algorithms fail to provide good results when the input video sequence has an abrupt illumination change (e.g., under a flashlight) or a low frame rate (e.g., 10 fps or lower). We will address these two situations and propose a low-complexity solution in this work.
Under the above two conditions, the estimated MV field between adjacent frames is not accurate, which adversely affects the motion interpolation process of FRUC. To address the difficulty of a low input frame rate, we propose a new FRUC algorithm that employs first- and second-order translational MV models and detects the continuity of the MV field. Instead of relying solely on the temporal correlation among frames, a spatial smoothness criterion is also used to ensure better perceptual quality of the up-converted video sequence. To handle abrupt illumination change, we present a low-complexity technique to detect a sudden shift in the lighting condition. This technique relies on detection results obtained from adaptive thresholding applied to the first- and second-order differences of the frame luminance histogram. Then, a local intensity adjustment method is used to regulate the frame intensity, which enables the motion estimation process to recover more accurate object motion. The superior performance of the proposed algorithm will be demonstrated via representative examples in this work.
Figure 4.1: Frame rate up-conversion (FRUC) scheme.
The rest of this chapter is organized as follows. The effect of abrupt illumination change and low frame rates on FRUC is addressed in Sec. 4.2. The proposed solutions for these two conditions are presented in Sec. 4.3. In Sec. 4.4, we briefly describe six other low-complexity FRUC algorithms used as benchmarks for performance comparison. The associated experimental results and performance evaluations are presented in Sec. 4.5. Finally, concluding remarks are given in Sec. 4.6.
4.2 Problem Statement
4.2.1 Abrupt Illumination Change
Most FRUC algorithms rely on the temporal continuity of the MV field in order to interpolate up-sampled frames. Motion information obtained from the motion estimation process in the encoder is not always reliable, since the process aims at minimizing the encoding residual rather than acquiring the actual object motion. Under an abrupt illumination change, motion estimation is unable to determine the correct object motion due to an irregular increase in pixel intensity and, consequently, the temporal continuity of the MV field is interrupted. This phenomenon occurs due to significant brightness variation of the environment and/or camera effects such as flash, fade in/out, etc. Without proper pre-processing, motion-interpolation-based FRUC algorithms cannot determine an accurate MV field for frame interpolation. Fig. 4-2(a) illustrates the MV field overlaid on the compensated frame of the crew sequence, which is calculated based on [68]. In this instance, the temporal continuity of the MV field is preserved until frame no. 30, where the camera flash occurs. The effect causes disruption in the MV field of the current frame and the next frame. After that, the temporal continuity of the MV field is gradually restored.
4.2.2 Low Frame Rate
When the input frame rate is low, the temporal distance between two adjacent frames becomes larger. Thus, it is more difficult to get an accurate MV estimate for each macroblock due to the information lost between frames. The linear translational motion assumption, which is exploited by many FRUC algorithms to interpolate up-sampled frames, may fail as the frame rate decreases. This is especially true for video with fast-moving objects and inconsistent camera motion. An example is shown in Fig. 4-2(b). In this case, the motion information of the first three frames is consistent with the actual object motion. However, after frame no. 4, the MV field becomes erratic due to the fast camera panning starting at frame no. 5. We observe that the effect of the low frame rate on FRUC starts to manifest when the frame rate drops below 10 fps for video with fast motion and fast camera movement, such as the soccer, ice and football sequences. The problem of FRUC under the low frame rate circumstance is very challenging, since most FRUC algorithms exploit the correlation between two adjacent frames in order to construct an interpolated frame. With a wider temporal gap, this correlation reduces drastically and the motion information obtained from the motion estimation process becomes unreliable. In the next section, we propose a low-complexity FRUC algorithm to handle this scenario.
4.3 Proposed Solution
4.3.1 Local Intensity Adjustment for Abrupt Illumination Change
An abrupt change in the lighting condition introduces a significant shift in the average intensity level of the affected frame. To detect this effect, a frame distance based on the first- and second-order differences of the frame luminance histogram can be computed:
D_n(t) = (1/N) x sum_{i=0}^{B} | h_i^n(t) - h_i^n(t+1) |,   n = 1, 2,   (4.1)
where D_1(t) and D_2(t) denote the first- and second-order differences of the luminance histogram of frame t, respectively; h_i^1(t) and h_i^2(t) are the i-th cumulative histogram bins of frame t and of gradient frame t, respectively; B is the number of histogram bins; and N is the total number of pixels in each frame. The gradient frame is essentially a thresholded frame difference and, thus, its frame distance is equivalent to the second-order difference of the frame luminance histogram. Examples of the normal frame and the gradient frame
Figure 4.2: Examples of unreliable MV fields due to (a) abrupt illumination change (CIF Crew 15 fps) and (b) low frame rate (CIF Soccer 7.5 fps).
are presented in Fig. 4-3. The number of histogram bins can be lowered to reduce the sensitivity of the measured variables to noise such as camera and object movement. After the frame distance computation, an adaptive thresholding technique [91] can be used to determine the signal peak locations. The adaptive threshold T(t) is essentially a moving averaging window with an adjustable sensitivity parameter on the data variance. It is described as follows:
T(t) = mu(t) + alpha x sigma(t),   (4.2)

mu(t) = (1/L) x sum_{i in W} D(i),   (4.3)

sigma^2(t) = (1/L) x sum_{i in W} (D(i) - mu(t))^2,   (4.4)
where alpha is a scale factor, and mu(t) and sigma^2(t) are the sample mean and sample variance of the data within a window W of length L centered at frame t. A large luminance change is claimed when we detect simultaneous peak signals within the same frame.
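The detection pipeline of Eqs. (4.1)-(4.4) can be sketched as follows (NumPy; the bin count, window length and scale factor here are illustrative choices, not the thesis's tuned values):

```python
import numpy as np

def frame_distance(frame_a, frame_b, bins=32):
    """Eq. (4.1): mean absolute difference of the luminance histograms
    of two frames, normalized by the pixel count N. For n = 2, pass
    gradient (thresholded difference) frames instead of luma frames."""
    h_a, _ = np.histogram(frame_a, bins=bins, range=(0, 256))
    h_b, _ = np.histogram(frame_b, bins=bins, range=(0, 256))
    return np.abs(h_a - h_b).sum() / frame_a.size

def adaptive_threshold(d, t, half_window=5, alpha=3.0):
    """Eqs. (4.2)-(4.4): T(t) = mu(t) + alpha * sigma(t), computed over
    a moving window of the distance signal centered at frame t."""
    w = np.asarray(d[max(0, t - half_window):t + half_window + 1])
    return w.mean() + alpha * w.std()

def detect_illumination_change(d1, d2, half_window=5, alpha=3.0):
    """Flag frames where the first- AND second-order distances peak
    simultaneously above their adaptive thresholds."""
    return [t for t in range(len(d1))
            if d1[t] > adaptive_threshold(d1, t, half_window, alpha)
            and d2[t] > adaptive_threshold(d2, t, half_window, alpha)]
```

A flash frame produces a spike in both distance signals at the same index, while ordinary motion raises only one of them, which is what the simultaneous-peak rule exploits.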
Afterwards, the local intensity difference of each macroblock is measured by computing the average luminance difference of all macroblocks within a neighboring area relative to their corresponding locations in the previous frame. The measured parameters are then used to adjust the pixel values of the detected frame to reduce the irregular intensity difference introduced by the abrupt illumination change. Instead of a global intensity adjustment, we employ a local intensity adjustment technique, because the lighting effect has an uneven influence on each area. For instance, the intensity of the foreground and of objects with reflective texture is highly affected by the camera flash compared to objects in the background. An exemplary result (frame no. 3 of the crew sequence) of local intensity adjustment is illustrated in Fig. 4-4, and a comparison of the corresponding MV fields (before and after the processing) is presented in Fig. 4-5. In Fig. 4-4, the intensity-adjusted frame exhibits a blurring effect due to the averaging operation, but its overall intensity level is much closer to that of the adjacent frames. In Fig. 4-5, after the application of local intensity adjustment, the temporal continuity of the MV field is restored and the obtained motion information is more consistent with the actual object motion.
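A minimal version of the per-macroblock adjustment might look like this (a NumPy sketch under our own parameter names; the thesis does not specify the exact neighborhood size or blending used):

```python
import numpy as np

def local_intensity_adjust(frame, prev, bs=16, radius=1):
    """For each MB, measure the average luminance difference over a
    (2*radius+1)-MB neighborhood against the co-located area of the
    previous frame, then subtract that offset from the MB so the local
    brightness of the flash frame matches its neighbor."""
    h, w = frame.shape
    out = frame.astype(np.float64).copy()
    for by in range(h // bs):
        for bx in range(w // bs):
            y0, y1 = max(0, (by - radius) * bs), min(h, (by + radius + 1) * bs)
            x0, x1 = max(0, (bx - radius) * bs), min(w, (bx + radius + 1) * bs)
            offset = frame[y0:y1, x0:x1].mean() - prev[y0:y1, x0:x1].mean()
            out[by * bs:(by + 1) * bs, bx * bs:(bx + 1) * bs] -= offset
    return out
```

Because the offset is estimated per neighborhood rather than globally, a strongly lit foreground and a barely affected background each receive their own correction.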
4.3.2 TWK Algorithm for FRUC

As discussed in the previous section, due to the wider temporal gap between two adjacent frames of low frame rate video, most FRUC algorithms that only exploit temporal correlation and aim to maximize the temporal continuity of the up-sampled video will not perform well. Here, we propose a FRUC algorithm, called TWK, based on first- and second-order translational models that considers not only temporal continuity but also the spatial smoothness of the interpolated frame.

For low frame rate video, the linear translational model assumption may not hold for all macroblocks. However, there are still locations where this assumption may apply, such as the stationary background, slow-moving objects, etc. In order to identify them, we use the first- and second-order forward and backward linear motion projections given below:
Figure 4.3: Example of the gradient frame, frames no. 29 and 30 (CIF crew 15 fps): (a) normal frame and (b) gradient frame.

Figure 4.4: Example of local intensity adjustment, frame no. 3 (CIF crew 15 fps): (a) before processing and (b) after processing.

Figure 4.5: Example of the motion vector field, frames no. 3-5 (CIF crew 15 fps): (a) before processing and (b) after processing.
f_{FP,t+1}^{1st}(P_t(m,n)) = P_{t+1}(m + MVX_t(m,n), n + MVY_t(m,n)),   (4.5)

f_{BP,t}^{1st}(P_{t+1}(m,n)) = P_t(m + MVX_{t+1}(m,n), n + MVY_{t+1}(m,n)),   (4.6)

f_{FP,t+1}^{2nd}(P_t(m,n)) = P_{t+1}(m + MVX_t(m,n) - MVX'_t(m,n), n + MVY_t(m,n) - MVY'_t(m,n)),   (4.7)

f_{BP,t}^{2nd}(P_{t+1}(m,n)) = P_t(m + MVX_{t+1}(m,n) - MVX'_{t+1}(m,n), n + MVY_{t+1}(m,n) - MVY'_{t+1}(m,n)),   (4.8)
where f_{FP,t}^{Nth} and f_{BP,t}^{Nth} are the N-th order (N = 1, 2) forward and backward macroblock projections onto frame t, respectively; P_t(m,n) is the macroblock at location (m,n) in frame t, and its corresponding horizontal and vertical MVs are denoted by MVX_t(m,n) and MVY_t(m,n), respectively. MVX'_t(m,n) and MVY'_t(m,n) denote the acceleration of the horizontal and vertical motion, obtained by tracking the trajectory path, as shown in Fig. 4-6.
Motion estimation uses the previous and the next frame as the reference frame for forward and backward projection, respectively. After the projection, we compare the actual macroblock with the projected image. If their first-order difference is within the detection threshold, the macroblock is identified as a linear translation macroblock, and its projected image on the interpolated frame can be used for processing. If an overlapping region of several projected images exists, pixel averaging is employed.
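The first-order pass just described can be sketched as follows (NumPy; the threshold value, the MV-field representation and the half-MV paste position assume the 1-to-2 up-sampling case, and all names are ours):

```python
import numpy as np

def linear_projection_pass(prev, nxt, mvs, bs=16, thresh=4.0):
    """First-order pass of a TWK-style interpolation: project each MB of
    the previous frame along its MV onto the next frame; if the projection
    matches the actual content there (MAD below a hypothetical threshold),
    accept the MB as linearly translating and paste it at the half-way
    position in the interpolated frame. 'mvs' maps MB grid position
    (by, bx) -> (dx, dy) in pixels."""
    h, w = prev.shape
    interp = np.zeros((h, w))
    weight = np.zeros((h, w))          # for averaging overlapped pastes
    for (by, bx), (dx, dy) in mvs.items():
        y, x = by * bs, bx * bs
        block = prev[y:y + bs, x:x + bs].astype(np.float64)
        ty, tx = y + dy, x + dx        # landing position in frame t+1
        if not (0 <= ty <= h - bs and 0 <= tx <= w - bs):
            continue
        mad = np.abs(block - nxt[ty:ty + bs, tx:tx + bs]).mean()
        if mad < thresh:               # linear-translation MB detected
            hy, hx = y + dy // 2, x + dx // 2
            interp[hy:hy + bs, hx:hx + bs] += block
            weight[hy:hy + bs, hx:hx + bs] += 1.0
    covered = weight > 0
    interp[covered] /= weight[covered]
    return interp, covered             # holes are left for later passes
```

The returned `covered` mask marks what the later second-order and OBMA passes still have to fill in.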
In the second step, we use the second-order translational model to detect locations with non-linear motion. The second-order detection threshold is set lower than the first-order threshold to ensure the accuracy of the interpolated areas. Unlike many existing motion projection-based FRUC algorithms such as [20], our algorithm does not force every macroblock to select the best linear motion trajectory and/or interpolate the MV of every location, since these strategies usually result in low-quality interpolated frames when the linear translation assumption fails. For practical applications, it is desirable to reduce the computational complexity and frame buffer requirements by using only the first- and second-order forward translational models.

In the third step, the TWK algorithm uses a technique similar to outer boundary block matching (OBMA) [73] - [75] to fill in the areas left by the second stage. The MV candidates are obtained from the neighboring areas of the corresponding macroblocks in frames t and t+1. This criterion is introduced to ensure the spatial continuity of interpolated frames.
An example of the interpolated frames obtained after each step of the TWK algorithm is shown in Fig. 4-7 (CIF ice, 7.5 fps). As shown in this figure, most of the background area and the regions with slow-moving objects have been interpolated after the first two steps, as shown in Figs. 4-7(a) and (b). Most of the remaining un-interpolated areas consist of fast-moving objects, object boundaries and occlusion areas. The translational models used in the TWK algorithm are limited and cannot always capture the motion of fast-moving objects, especially under the influence of camera motion. For object boundaries and occlusion areas, it is unlikely that a projected macroblock will find a close match within the projected area. Nevertheless, as discussed earlier, the TWK algorithm
does not aim to fit every area with the translational model. These remaining areas are interpolated using OBMA, and the result is shown in Fig. 4-7(c). As discussed in [75], OBMA can preserve the continuity of object boundaries. In addition, it can interpolate uncovered and to-be-covered occlusion areas, since OBMA searches for matching candidates in all reference frames in both forward and backward temporal directions.

Another interpolated frame obtained by the TWK algorithm is shown in Fig. 4-8 (for the CIF city sequence at 7.5 fps). As shown in Fig. 4-8(a), the majority of the interpolated frame is identified in the first step, since this video sequence complies with the linear translation model and its content consists mainly of stationary solid objects (buildings). Because of camera motion, the first-order translation model is not able to interpolate the entire frame, as shown in Fig. 4-8(b). The second-order translation model does not contribute much to the interpolated frame. However, OBMA can incorporate the spatial information (edges and texture) to interpolate the missing data effectively, as shown in Fig. 4-8(c).
4.4 Other Low Complexity FRUC Algorithms

In this section, we describe six other low-complexity FRUC algorithms that will be used as performance benchmarks, i.e., Frame Repetition (FR), Frame Averaging (FA), Motion Copy (MC) and three motion projection based techniques - Forward (FP), Backward (BP) and Bi-directional (BIP) projection. For FR, the current frame is used directly as the interpolated frame. This technique has the lowest complexity and can be considered a lower bound of the performance measure, since it does not actually increase the perceived frame
Figure 4.6: Example of motion-projected macroblocks: (a) the first-order forward translation macroblock, (b) the second-order forward translation macroblock, (c) the first-order backward translation macroblock and (d) the second-order backward translation macroblock.

Figure 4.7: Illustration of the TWK algorithm for the CIF ice sequence at 7.5 fps: (a) after the first-order translation detection, (b) after the second-order translation detection, (c) after OBMA and (d) the original frame.

Figure 4.8: Illustration of the TWK algorithm for the CIF city sequence at 7.5 fps: (a) after the first-order translation detection, (b) after the second-order translation detection, (c) after OBMA and (d) the original frame.
rate of the processed video. FA acquires the interpolated frame by averaging two adjacent frames with some weights. The weighting parameters can be adaptively adjusted based on the frame content; a simple averaging operation is employed in this work. MC interpolates each macroblock in the up-sampled frame based on a scaled MV obtained from the corresponding macroblock in the current frame. Here, the scaled MV has quarter-pel resolution. The motion projection based techniques rely on the motion information of two adjacent frames to project each macroblock of these frames onto the interpolated frame. Normally, there exist overlapped areas covered by several projected macroblocks as well as blank pixels that are not occupied by any projected data. For the overlapped areas, simple averaging is usually used to merge the overlapped information, while a simple motion interpolation technique and an averaging mask can be used to fill in the missing data. In general, motion projection can be performed at both the pixel and the macroblock level. It was shown in [18] that the pixel-level algorithm yields better performance but has a higher computational complexity. In favor of low complexity, the macroblock-level algorithm is used in our experiments and, to obtain better spatial correlation, OBMA is used to fill in the missing data. FP and BP derive their names from the projection direction, while BIP obtains the interpolated frame by weighted averaging of the interpolation results from FP and BP. Similar to FA, various merging schemes can be applied [7], but simple weighting is used here.
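The two simplest baselines above can be written in a few lines (a sketch; MC and the projection-based methods would additionally need the decoded MV field):

```python
import numpy as np

def frame_repetition(prev, nxt):
    """FR: reuse the current frame directly as the interpolated frame."""
    return prev.copy()

def frame_averaging(prev, nxt, w=0.5):
    """FA: weighted average of the two adjacent frames; the simple
    w = 0.5 case is what the experiments below use."""
    return w * prev.astype(np.float64) + (1.0 - w) * nxt
```

FR never introduces new pixels (hence its role as a lower bound), while FA trades a small PSNR gain for ghosting artifacts on moving content.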
4.5 Experimental Results
All the low-complexity FRUC algorithms presented in the last section can be applied with any up-sampling factor. Here, for the sake of simplicity, we consider the case of 1-to-2 frame rate up-sampling throughout the experiments. To obtain the ground truth, we down-sample test sequences of 15 fps to a lower frame rate (i.e., 7.5 fps). Five different video sequences (CIF) are used to test the proposed algorithm: city, crew, ice, soccer and football.
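This 1-to-2 evaluation protocol can be expressed as a small harness (a sketch with our own function names):

```python
import numpy as np

def psnr(ref, test, peak=255.0):
    """PSNR in dB between a ground-truth frame and a reconstruction."""
    mse = np.mean((ref.astype(np.float64) - test) ** 2)
    return float('inf') if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def evaluate_fruc(frames, interpolate):
    """1-to-2 protocol: the even frames form the 7.5 fps input; each odd
    frame is re-created from its two even neighbors by the supplied
    interpolate(prev, nxt) callback and scored against the dropped
    ground-truth frame. Returns the average PSNR."""
    scores = [psnr(frames[t], interpolate(frames[t - 1], frames[t + 1]))
              for t in range(1, len(frames) - 1, 2)]
    return sum(scores) / len(scores)
```

Any of the benchmarks of Sec. 4.4 (or the TWK algorithm) can be plugged in as the `interpolate` callback, which is how the averaged PSNR figures below are obtained.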
4.5.1 Abrupt Illumination Change
The effect of the local intensity adjustment technique on MC and the proposed FRUC algorithm (denoted by TWK) can be observed in Fig. 4-9. The perceptual quality of the interpolated frame is greatly improved. The blocking artifacts in Figs. 4-9(a) and (c) are the result of inaccurate MVs caused by the camera flashlight. Overall, MC and TWK achieve average PSNR improvements of 3.22 and 1.45 dB, respectively, via local intensity adjustment. MC has a larger performance gain since its interpolation process relies solely on temporal information, while TWK considers both temporal and spatial conditions.
4.5.2 Low Frame Rate
The performance comparison of several low-complexity FRUC algorithms is given in Fig.
4-10. The presented PSNR is acquired from the averaged results of ve testing video
sequences. Although FA is approximately 1.16 dB better than FR, it has very poor
perceptual quality. TWK has an average PSNR improvement of 3.79 dB, 2.63 dB and
1.87 dB with respect to FR, FA and MC, respectively. The three motion projection based
Figure 4.9: Interpolated frames in the presence of camera flashlight: (a) MC
without local adjustment, (b) MC with local adjustment, (c) TWK without
local adjustment and (d) TWK with local adjustment.
techniques (FP, BP and BIP) yield a similar level of performance, which is approximately
1.25 dB lower than TWK.
In order to evaluate the perceptual quality of interpolated frames, three examples
(i.e., the city, ice and soccer sequences) are presented in Figs. 4-11, 4-12 and 4-13, respec-
tively. Each of these video sequences has some unique characteristics, which significantly
influence the performance of FRUC algorithms. The city sequence consists mainly
of stationary solid objects (i.e., buildings) and has slight camera motion. With these
properties, most FRUC algorithms perform well even at a very low frame rate, as
shown in Fig. 4-11. On the other hand, the ice sequence has no camera motion
but contains several fast-moving objects. All three motion-projection-based techniques
suffer severely from inaccurate macroblock projection, as shown in Figs. 4-12(d), (e)
and (f), since they indiscriminately apply the projection model to all macroblocks. In
contrast, the interpolated frame from TWK has much better perceptual quality due to
the spatial smoothness criterion exerted by OBMA. Among these three test examples,
the soccer sequence poses the greatest challenge since it is affected by both fast camera
panning and fast object motion. The average PSNR of interpolated frames of the soccer
sequence is about 2.85 and 1.32 dB lower than that of the city and ice sequences,
respectively. As shown in Fig. 4-13, TWK produces the best perceptual quality, but some
erroneous details still exist. Nevertheless, these errors are difficult to perceive when
the sequence is displayed at 15 fps.
Figure 4.10: Performance comparison of several low-complexity FRUC algorithms.
Figure 4.11: Comparison of interpolated frames for the CIF city sequence at
7.5 fps: (a) FR, (b) FA, (c) MC, (d) FP, (e) BP, (f) BIP, (g) TWK and (h) the
original frame.
Figure 4.12: Comparison of interpolated frames for the CIF ice sequence at
7.5 fps: (a) FR, (b) FA, (c) MC, (d) FP, (e) BP, (f) BIP, (g) TWK and (h) the
original frame.
Figure 4.13: Comparison of interpolated frames for the CIF soccer sequence
at 7.5 fps: (a) FR, (b) FA, (c) MC, (d) FP, (e) BP, (f) BIP, (g) TWK and (h) the
original frame.
Figure 4.14: Performance comparison of several low-complexity FRUC algorithms - City sequence.
Figure 4.15: Performance comparison of several low-complexity FRUC algorithms - Ice sequence.
Figure 4.16: Performance comparison of several low-complexity FRUC algorithms - Soccer sequence.
4.6 Conclusion to Chapter 4
Two challenging issues in video frame rate up-conversion (FRUC) were identified and low-
complexity processing techniques were proposed to enhance the quality of interpolated
frames. First, to address the problem of abrupt illumination change, we proposed
a local intensity adjustment technique to ensure robust MV estimation, thus leading to
better visual quality. Second, to handle low input video frame rates, the proposed
TWK algorithm utilizes the first- and second-order translational models as well as a
spatial smoothness criterion. It was demonstrated by experimental results that TWK
performs significantly better than other low-complexity FRUC algorithms.
Chapter 5
Adaptive Nonlocal Means Algorithm for Image Denoising
5.1 Introduction to Chapter 5
Image denoising is one of the classical problems in digital image processing and has been
studied for nearly half a century due to its important role as a pre-processing step in var-
ious applications. Its objective is to recover the best estimate of the original image from
its noisy version. Several denoising methods have been proposed, such as neighborhood
filtering [49], total variation minimization [82], Wiener filtering [50], Gaussian scale mix-
tures [81], methods based on partial differential equation solutions [47], etc. Early denoising
techniques such as Gaussian and mean filtering are suitable for smooth regions
but they blur edge and texture regions. Unlike the aforementioned techniques,
the Wiener filtering method operates in the frequency domain. The denoised image is
estimated by the inverse transform of the filtered coefficients, which yields improved
edge regions. In the total variation minimization technique, the total variation of
an image is minimized subject to a constraint derived from noise characteristics. It was
shown in [92] that this technique can effectively preserve straight edges. However, if the
Lagrange multiplier is too small, fine details can be over-smoothed. On the other hand,
the flat region of the denoised image may suffer from the mask effect if the Lagrange
multiplier is too large. The selection of a proper value of the Lagrange multiplier is not
trivial.
The nonlocal means (NL-means) algorithm proposed in [92, 43] has offered remarkably
promising results. Unlike previous denoising methods, which were developed under a
local regularity assumption, NL-means exploits the spatial correlation in the entire
image for noise removal. It adjusts each pixel value with a weighted average of other
pixels whose neighborhoods have a similar geometrical configuration. Since image pixels
are highly correlated while noise is typically independently and identically distributed
(i.i.d.), averaging these pixels results in noise cancellation and yields a pixel value
close to its original value.
In this chapter, we propose an adaptive NL-means (ANL-means) algorithm that adapts
the similarity matching process and denoising parameters based on the local struc-
ture of a pixel. The singular value decomposition (SVD) method and the K-means clus-
tering technique are employed for robust block classification in noisy images.
The similarity matching process is enhanced by allowing more candidates through rotated
block matching and dominant orientation alignment. Moreover, a scheme to estimate the
noise level based on the Laplacian operator is presented. In addition to removing
additive white Gaussian noise (AWGN), we extend the algorithm to Rician noise removal,
which often occurs in magnitude MRI images. It is shown by experimental results that the
ANL-means algorithm outperforms the traditional NL-means algorithm significantly for
various test images and conditions. This is especially advantageous when the noise level
is high.
The rest of this chapter is organized as follows. The new ANL-means algorithm is
proposed in Sec. 5.2. The denoising performance of NL-means-based techniques under
AWGN and Rician noise is analyzed in Sec. 5.3. The noise level estimation scheme for
AWGN and Rician noise is presented in Sec. 5.4. Experimental results of the ANL-means
algorithm under various conditions are shown in Sec. 5.5. Finally, concluding remarks
are given in Sec. 5.6.
5.2 Adaptive Nonlocal Means (ANL-Means) Algorithm
For a given noisy image, f = \{ f(i) \mid i \in \Omega \}, the NL-means denoised value \hat{f}(i) at pixel i is
obtained by a weighted average of all pixels in its neighborhood \Omega_N [1],

    \hat{f}(i) = \frac{1}{C(i)} \sum_{j \in \Omega_N} w(i,j)\, f(j),    (5.1)

where

    C(i) = \sum_{j \in \Omega_N} w(i,j)    (5.2)

is a normalization constant and the weight w(i,j) is determined by the similarity of the
Gaussian neighborhoods of pixels i and j, which can be expressed as

    w(i,j) = \exp\left( -\frac{\| N_i - N_j \|_{2,a}^2}{h^2} \right),    (5.3)

where N_i denotes a square neighborhood centered at pixel i, \|\cdot\|_{2,a} is a Gaussian-
weighted Euclidean distance function, a is the standard deviation of the Gaussian kernel
and h is the decay parameter.
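A minimal sketch of the weighted average in Eqs. (5.1)-(5.3) for one pixel, using a plain (unweighted) Euclidean patch distance in place of the Gaussian-weighted kernel \|\cdot\|_{2,a}; names and parameter defaults are illustrative:

```python
import numpy as np

def nl_means_pixel(f, i, j0, patch=3, search=10, h=10.0):
    """Denoise one pixel (i, j0) of image f with a basic NL-means
    weighted average: compare the patch around (i, j0) with every
    patch in a search window and average pixel values by similarity."""
    r = patch // 2
    H, W = f.shape
    Ni = f[i - r:i + r + 1, j0 - r:j0 + r + 1]   # reference patch N_i
    num, C = 0.0, 0.0
    for y in range(max(r, i - search), min(H - r, i + search + 1)):
        for x in range(max(r, j0 - search), min(W - r, j0 + search + 1)):
            Nj = f[y - r:y + r + 1, x - r:x + r + 1]
            d2 = np.sum((Ni - Nj) ** 2) / Ni.size
            w = np.exp(-d2 / h**2)    # Eq. (5.3) without the Gaussian kernel
            num += w * f[y, x]        # numerator of Eq. (5.1)
            C += w                    # normalization constant, Eq. (5.2)
    return num / C
```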
The proposed adaptive technique consists of three unique features: 1) employing the
singular value decomposition (SVD) method and the K-means clustering technique
for robust block classification; 2) adjusting the local window adaptively to match the
local property of a block; and 3) applying a rotated block matching algorithm for better
similarity matching. Feature 1 will be described in Sec. 5.2.1 while Features 2 and 3 will
be detailed in Sec. 5.2.2.
5.2.1 Block Classication
Adaptation of the NL-means algorithm is conducted according to the block classification
result. Here, block classification is achieved by applying the SVD to the gradient field
of each block [78]. For a smooth region, there is no dominant direction and, thus, all
singular values are small. For an oriented edge/texture region, there is a dominant direction
and the corresponding singular value is significantly larger than the others. For a block of size
n × n = N, we can group its gradient values into a matrix G of size N × 2 and compute its
SVD via

    G = [ \nabla f(1), \nabla f(2), \ldots, \nabla f(N) ]^T  and  G = U S V^T,    (5.4)

where

    \nabla f(i) = \left[ \frac{\partial f(i)}{\partial x}, \frac{\partial f(i)}{\partial y} \right]^T    (5.5)

is the gradient of image f at point i, U is an N × N orthogonal matrix, S contains the singular
values and V is a 2 × 2 orthogonal matrix that describes the dominant orientation of
the gradient field. Since white noise does not have any preferred direction, we can
classify each block effectively based on the magnitude of the singular value in the dominant
direction. In order to perform adaptive classification, we employ the K-means clustering
technique. Let s(i) be the singular value in the dominant direction of the block centered at
pixel i. The K-means algorithm partitions \{ s(i) \mid i \in \Omega \} into K classes C = \{ c_1, c_2, \ldots, c_K \}
while minimizing the within-cluster sum of squares,

    \arg\min_{C} \sum_{k=1}^{K} \sum_{s(i) \in c_k} | s(i) - \mu_k |^2,    (5.6)

where \mu_k is the mean of c_k. An example of the energy in the dominant direction and the
corresponding classification result is presented in Figs. 5.1(b) and 5.1(c), respectively.
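The classification statistic s(i) of Eqs. (5.4)-(5.6) can be sketched for a single block as follows (an illustrative implementation using NumPy's SVD; function name is ours):

```python
import numpy as np

def dominant_singular_value(block):
    """Return the dominant singular value of the gradient field of a
    block (Eq. 5.4). Large values indicate an oriented edge/texture
    region, small values a smooth region; K-means is then run on
    these values over all blocks to form the classes of Eq. (5.6)."""
    gy, gx = np.gradient(block.astype(float))       # d/dy, d/dx
    G = np.column_stack([gx.ravel(), gy.ravel()])   # N x 2 gradient matrix
    s = np.linalg.svd(G, compute_uv=False)          # singular values, descending
    return s[0]
```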
5.2.2 Adaptive Window Adjustment and Rotated Block Matching
To exploit the local property and reduce noise in different regions, we adaptively choose
the matching window size based on the classification result. For edge/texture regions,
we employ a small matching window. In contrast, a larger matching window is adopted for
smooth regions. In practice, we use a small window (7x7) in strong edge/texture
regions, a large window (19x19) in smooth regions, and a medium window (13x13) in
other regions.
Furthermore, we employ a rotated block matching process that allows more candidates
of similar image blocks. The matching kernel of the conventional NL-means algorithm is
Figure 5.1: Examples of block classification via SVD and K-means: (a) noisy
image, (b) energy in the dominant direction, where the brighter region has a more
dominant edge direction, and (c) classification results with the K-means algorithm,
where different gray values correspond to different classes.
the Gaussian-weighted Euclidean distance function, which only recognizes similar blocks
under displacement. This matching kernel cannot treat a distant block that is similar but
differently oriented as a similar one since the distance value can be large. Consequently, the
conventional scheme cannot fully exploit the self-similarity existing in regions such as
object contours. With rotated block transformation, the proposed ANL-means algorithm
can effectively identify blocks with shifted orientation as a close match. In general, the
candidate block can be rotated to various orientations until the lowest similarity distance
is acquired. To speed up the matching process, we consider only the set of rotated blocks
whose dominant orientation aligns well with that of the target block. To be
more specific, let v_1 = [\nu_1, \nu_2]^T be the first column of V in Eq. (5.4). We can obtain the
dominant orientation of the gradient field of a given block by calculating

    \theta = \arctan\left( \frac{\nu_1}{\nu_2} \right).    (5.7)

Then, we can obtain four rotated blocks with dominant orientations equal to \theta, \theta + 180,
-\theta and -\theta + 180 degrees. Block rotation is achieved by bicubic interpolation. Since
the block rotation process demands higher computational complexity, we apply it only to
blocks that have a strong dominant orientation.
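A sketch of the orientation computation of Eq. (5.7) and the resulting four candidate rotation angles (the arctan2 branch handling and the function name are our choices):

```python
import numpy as np

def candidate_orientations(block):
    """Compute the dominant gradient orientation theta of a block
    (Eq. 5.7) from the first right singular vector of its gradient
    matrix, and return the four candidate rotation angles in degrees
    used for rotated block matching."""
    gy, gx = np.gradient(block.astype(float))
    G = np.column_stack([gx.ravel(), gy.ravel()])
    _, _, Vt = np.linalg.svd(G, full_matrices=False)
    v1, v2 = Vt[0]                          # first column of V
    theta = np.degrees(np.arctan2(v1, v2))  # Eq. (5.7) with branch handling
    return theta, [theta, theta + 180.0, -theta, -theta + 180.0]
```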
5.3 Rician Noise Removal
It is often assumed that images are corrupted by independent and identically distributed
(i.i.d.) additive Gaussian noise, which can be effectively removed by the NL-means algo-
rithm. However, noise in magnitude MRI images has a Rician distribution, which
poses a greater challenge for the denoising problem. The traditional NL-means algo-
rithm assumes that noise is zero-mean and additive. Its performance is
largely influenced by the ability to exploit existing self-similarity through weight alloca-
tion. However, the Rician noise present in magnitude MRI images does not satisfy these
conditions.
Typically, the MRI image Y_M = \{ y_M(i) \mid i \in \Omega \} is reconstructed by computing the
inverse discrete Fourier transform of the measured signal components from the real and imag-
inary channels, denoted by Y_{M,Re} and Y_{M,Im}, respectively [8]. These raw data are affected
by N_1 = \{ n_1(i) \mid i \in \Omega \} and N_2 = \{ n_2(i) \mid i \in \Omega \}, AWGN with zero mean and standard
deviation \sigma_n, as given by

    Y_M = \sqrt{ Y_{M,Re}^2 + Y_{M,Im}^2 },
    Y_{M,Re} = X_M \cos(\theta_M) + N_1,  N_1 \sim N(0, \sigma_n^2),
    Y_{M,Im} = X_M \sin(\theta_M) + N_2,  N_2 \sim N(0, \sigma_n^2),
    Y_M = \sqrt{ (X_M \cos(\theta_M) + N_1)^2 + (X_M \sin(\theta_M) + N_2)^2 },    (5.8)

where X_M = \{ x_M(i) \mid i \in \Omega \} is the original MRI image intensity and \theta_M is the phase
of the real and imaginary channels. Compared with AWGN, the Rician noise is
much more complicated since it is not additive and its mean is signal-dependent. Without
any modification, the NL-means algorithm cannot be applied effectively to remove the
Rician noise. As illustrated in Fig. 5.2, the denoising result from the standard NL-means
algorithm suffers from an intensity shift due to biased estimation.
Figure 5.2: A denoised image example obtained by the NL-means algorithm
with the Rician noise.
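For experimentation, Rician-corrupted data following Eq. (5.8) can be synthesized by adding independent Gaussian noise to the real and imaginary channels (a sketch; function and parameter names are ours):

```python
import numpy as np

def add_rician_noise(x, sigma, theta=0.0, rng=None):
    """Corrupt a clean magnitude image x with Rician noise (Eq. 5.8):
    add i.i.d. zero-mean Gaussian noise of std sigma to the real and
    imaginary channels, then take the magnitude."""
    rng = np.random.default_rng() if rng is None else rng
    n1 = rng.normal(0.0, sigma, x.shape)
    n2 = rng.normal(0.0, sigma, x.shape)
    re = x * np.cos(theta) + n1
    im = x * np.sin(theta) + n2
    return np.sqrt(re**2 + im**2)
```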
A technique was proposed in [80] to handle the Rician noise, which relies on its closed-
form second-order moment. For the estimated squared image value \tilde{y}_M^2(j), we can derive

    \tilde{y}_M^2(j) = \sum_{j \in \Omega_N} w(i,j)\, y_M^2(j) \approx E[Y_M^2]
        = \sum_{j \in \Omega_N} w(i,j) [ (x_M(j)\cos(\theta_M) + n_1(j))^2 + (x_M(j)\sin(\theta_M) + n_2(j))^2 ]
        = \sum_{j \in \Omega_N} w(i,j)\, x_M^2(j) + 2\cos(\theta_M) \sum_{j \in \Omega_N} w(i,j)\, x_M(j)\, n_1(j)
          + 2\sin(\theta_M) \sum_{j \in \Omega_N} w(i,j)\, x_M(j)\, n_2(j) + \sum_{j \in \Omega_N} w(i,j)\, n_1^2(j) + \sum_{j \in \Omega_N} w(i,j)\, n_2^2(j)
        \approx \overline{x^2} + 2\overline{X}\cos(\theta_M)\, E[N_1] + 2\overline{X}\sin(\theta_M)\, E[N_2] + E[N_1^2] + E[N_2^2]
        = \overline{x^2} + E[N_1^2] + E[N_2^2] = \overline{x^2} + 2\sigma_n^2,    (5.9)
where \overline{x^2} is the weighted average of the squared magnitude of the original image data.
Based on the acquired image data, the estimated value \tilde{y}_M^2(j) is overestimated by 2\sigma_n^2. Thus,
we adopt the following bias-corrected estimate of the original data:

    \tilde{y}_M(j) = \sqrt{ \tilde{y}_M^2(j) - 2\sigma_n^2 } = \sqrt{ \overline{x^2} }.    (5.10)
The ability to determine a similar match in the Rician noise case is more important than
in the AWGN case since dissimilarity is amplified by the squaring operation. This
effect is observed as a shifted intensity in the denoised image.
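The bias-corrected estimate of Eqs. (5.9)-(5.10) can be sketched as follows, given noisy magnitude samples and their NL-means weights (names are ours; negative arguments are clamped to zero before the square root):

```python
import numpy as np

def rician_corrected_estimate(values, weights, sigma_n):
    """Bias-corrected NL-means estimate for Rician noise (Eqs. 5.9-5.10):
    take the weighted average of the *squared* noisy magnitudes, then
    subtract the 2*sigma_n^2 bias before taking the square root."""
    w = np.asarray(weights, dtype=float)
    y2 = np.average(np.asarray(values, dtype=float) ** 2, weights=w)
    return np.sqrt(max(y2 - 2.0 * sigma_n**2, 0.0))
```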
5.4 Noise Level Estimation
Accurate estimation of the noise level is critical to the performance of the NL-means
algorithm since the weight decay factor is determined by the estimated noise level. An
under-estimation of \sigma_n would lower the denoising performance of NL-means and yield
a noisy result. On the other hand, an over-estimation of the noise parameter would result
in a blurred denoised image. In this section, we propose a low-complexity method to
estimate the variance of the Gaussian and the Rician noise from a noisy image based on
the Laplacian operator and the classification results from the ANL-means algorithm.
5.4.1 Additive White Gaussian Noise
A noise variance estimation technique was proposed in [85], which uses the Laplacian
operator to suppress the image structure, as illustrated in Fig. 5.3. The variance of
the output image provides an estimate of the noise variance. We express the discrete
Laplacian operator as

    M = \begin{bmatrix} 1 & -2 & 1 \\ -2 & 4 & -2 \\ 1 & -2 & 1 \end{bmatrix}.    (5.11)

Then, the application of the Laplacian operator M to image I at position (x, y) can be
written as I(x,y) * M. If the noise at a pixel has standard deviation \sigma_n, then I(x,y) * M has
zero mean and variance 36\sigma_n^2. The variance of the noise in I can be computed as

    \sigma_n^2 = \frac{1}{36WH} \sum_{\forall x,y} (I(x,y) * M)^2,    (5.12)

where W and H are the image width and height, respectively. However, since
strong edges and complex textures lead to an over-estimated noise variance [35], we use the
block classification method proposed in Sec. 5.2.1 to locate the smooth region for accurate
variance estimation. This can be written as

    \sigma_n^2 = \frac{1}{36 N_s} \sum_{\forall x,y \in \Omega_s} (I(x,y) * M)^2,    (5.13)

where N_s is the total number of classified pixels in the smooth region denoted by \Omega_s.
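A sketch of the estimator of Eqs. (5.11)-(5.13), filtering with M over the valid interior and optionally restricting to a smooth-region mask (function name is ours):

```python
import numpy as np

def estimate_awgn_sigma(img, mask=None):
    """Estimate the AWGN standard deviation: filter with the Laplacian
    operator M of Eq. (5.11), then average the squared response
    (optionally over a boolean smooth-region mask, Eq. 5.13) and
    divide by 36, the sum of the squared kernel coefficients."""
    M = np.array([[1., -2., 1.],
                  [-2., 4., -2.],
                  [1., -2., 1.]])
    img = img.astype(float)
    H, W = img.shape
    r = np.zeros((H - 2, W - 2))
    for dy in range(3):               # valid 3x3 correlation with M
        for dx in range(3):
            r += M[dy, dx] * img[dy:H - 2 + dy, dx:W - 2 + dx]
    if mask is not None:
        r = r[mask[1:-1, 1:-1]]       # keep only smooth-region pixels
    return np.sqrt(np.mean(r ** 2) / 36.0)
```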
We compare the performance of noise level estimation using the Laplacian operator
[85] and the proposed scheme in Table 5.1. In this experiment, seven test images are
corrupted by AWGN with zero mean and standard deviation \sigma_n = 20, 30 and 40. We see
that the proposed technique is effective and robust at all noise levels. On average,
the proposed scheme has an estimation error of 0.30, 0.37 and 1.11 while the
Figure 5.3: Illustration of the noise level estimation scheme using the Lapla-
cian operator: (a) noisy Lena image with AWGN (\sigma_n = 40) and (b) the pro-
cessed image using the Laplacian operator.
Laplacian technique [85] has an estimation error of 0.58, 1.10 and 2.24 for \sigma_n = 20, 30
and 40, respectively.
Image        σn = 20                        σn = 30                        σn = 40
             Laplacian      Proposed        Laplacian      Proposed        Laplacian      Proposed
Lena         20.03 (0.03)   20.10 (0.10)    29.47 (0.53)   29.60 (0.40)    38.46 (1.54)   38.59 (1.41)
Zelda        19.56 (0.44)   20.03 (0.03)    28.63 (1.37)   29.56 (0.44)    37.11 (2.89)   38.47 (1.53)
Peppers      20.44 (0.44)   20.70 (0.70)    29.68 (0.32)   30.11 (0.11)    38.47 (1.53)   39.13 (0.87)
Fruits       19.85 (0.15)   20.05 (0.05)    29.15 (0.85)   29.68 (0.32)    38.06 (1.94)   38.82 (1.18)
Cameraman    19.18 (0.82)   19.99 (0.01)    28.20 (1.80)   29.69 (0.31)    37.09 (2.91)   39.20 (0.80)
Elaine       21.02 (1.02)   21.14 (1.14)    30.37 (0.37)   30.57 (0.57)    39.31 (0.69)   39.59 (0.41)
Girlface     18.85 (1.15)   20.09 (0.09)    27.54 (2.46)   29.54 (0.46)    35.82 (4.18)   38.44 (1.56)
Avg. error         (0.58)         (0.30)          (1.10)         (0.37)          (2.24)         (1.11)

Table 5.1: Performance comparison of noise level estimation schemes using the Laplacian operator and the
proposed technique for AWGN at three noise levels (estimated σn with absolute error in parentheses).
5.4.2 Rician Noise
The probability density function (PDF) of the Rice distribution is given by

    p(Y_M \mid X_M) = \frac{Y_M}{\sigma_n^2} \exp\left( -\frac{Y_M^2 + X_M^2}{2\sigma_n^2} \right) I_0\left( \frac{Y_M X_M}{\sigma_n^2} \right),    (5.14)

where I_0 is the zeroth-order modified Bessel function of the first kind. The moments of
the Rician density function can be analytically expressed as functions of the confluent
hypergeometric function [42]. For even moments, this function reduces to a simple
polynomial, and the second moment of the Rician density function is given by

    E[Y_M^2] = 2\sigma_n^2 + X_M^2.    (5.15)
Due to the signal-dependency property of the Rician noise, the Laplacian noise es-
timator does not work well. To overcome this problem, we impose another constraint.
That is, for the background region where the signal is assumed to be zero (X_M \to 0), the
Rice PDF simplifies to a Rayleigh PDF. In the background,

    Y_{M,B} = \sqrt{ (X_M \cos(\theta_M) + N_1)^2 + (X_M \sin(\theta_M) + N_2)^2 } \to \sqrt{ N_1^2 + N_2^2 },    (5.16)

where Y_{M,B} is the image in the background area. The Rayleigh PDF of Y_{M,B} can be
expressed as

    p(Y_{M,B}) = \frac{Y_{M,B}}{\sigma_n^2} \exp\left( -\frac{Y_{M,B}^2}{2\sigma_n^2} \right)    (5.17)
and the variance of the Rayleigh density function is given by

    \mathrm{Var}[Y_{M,B}] = \frac{4 - \pi}{2} \sigma_n^2.    (5.18)

The background region can be identified by the block classification algorithm described
before. Following the earlier derivation, I(x,y) * M has zero mean, now with variance
36 \cdot \frac{4-\pi}{2} \sigma_n^2. Then, the variance of the noise in I can be computed as

    \sigma_n^2 = \frac{1}{36 \cdot \frac{4-\pi}{2} \cdot N_B} \sum_{\forall x,y \in \Omega_B} (I(x,y) * M)^2,    (5.19)

where N_B is the total number of classified pixels in the background region \Omega_B.
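The background-based Rician estimator of Eq. (5.19) differs from the AWGN case only in the (4 - π)/2 factor; a sketch (function name is ours):

```python
import numpy as np

def estimate_rician_sigma(img, background_mask):
    """Estimate sigma_n of Rician noise from the Rayleigh-distributed
    background (Eq. 5.19): the Laplacian response there has zero mean
    and variance 36 * (4 - pi)/2 * sigma_n^2."""
    M = np.array([[1., -2., 1.],
                  [-2., 4., -2.],
                  [1., -2., 1.]])
    img = img.astype(float)
    H, W = img.shape
    r = np.zeros((H - 2, W - 2))
    for dy in range(3):               # valid 3x3 correlation with M
        for dx in range(3):
            r += M[dy, dx] * img[dy:H - 2 + dy, dx:W - 2 + dx]
    r = r[background_mask[1:-1, 1:-1]]
    return np.sqrt(np.mean(r ** 2) / (36.0 * (4.0 - np.pi) / 2.0))
```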
We show the results of noise level estimation for the Rician noise in Table 5.2. In this
test, the six MRI images shown in Fig. 5.4 are corrupted by the Rician noise at several
standard deviations. We see that the proposed technique provides accurate noise level
estimates for the Rician noise.
Figure 5.4: Six test images to be corrupted by the Rician noise.
Image     σn=10   σn=15   σn=20   σn=25   σn=30   σn=35   σn=40   σn=45   σn=50
1         10.70   15.87   20.80   25.41   30.95   35.77   40.81   45.52   48.98
2         10.10   15.11   20.11   24.64   30.02   34.91   39.97   45.21   49.71
3         10.75   15.59   20.59   25.39   30.38   35.23   40.15   45.06   49.37
4         10.62   15.42   20.19   25.22   30.25   35.25   40.29   45.02   49.72
5         10.06   15.01   19.92   24.85   30.01   35.42   40.05   45.26   49.36
6         11.10   16.45   20.47   25.19   30.22   35.64   40.15   45.38   49.27
Avg. err   0.55    0.57    0.37    0.28    0.30    0.40    0.24    0.24    0.59

Table 5.2: Results of noise level estimation with the proposed technique for Rician noise of different standard
deviations.
5.5 Experimental Results
In this section, the denoising performance of the proposed ANL-means algorithm is com-
pared with five well-known denoising algorithms: 1) the mean filter (MF), 2) Gaussian
filtering (GF), 3) the method based on a partial differential equation (PDE), 4) total
variation (TV) minimization and 5) the traditional NL-means algorithm.
5.5.1 Additive White Gaussian Noise
First, we consider a set of seven representative test images corrupted by zero-mean
AWGN with standard deviation \sigma_n = 20, 30 and 40. For each case, three Gaussian noise
patterns are generated and the averaged PSNR results of the three denoised images are
reported.
The PSNR comparison between the NL-means and the ANL-means algorithms for
each test image at different standard deviation values is listed in Table 5.3. The averaged
PSNR performance of the six denoising algorithms is compared in Fig. 5.5. We see that ANL-
means has a substantial PSNR gain over the other denoising benchmarks. The average PSNR
of the ANL-means scheme is approximately 2.98, 2.55, 3.70, 1.88 and 2.06 dB better than
MF, GF, PDE, TV and NL-means, respectively. ANL-means achieves an average
PSNR improvement of 0.63, 2.16 and 3.39 dB over NL-means for \sigma_n = 20, 30 and 40,
respectively.
Image      σn = 20                   σn = 30                   σn = 40
           NL      ANL     Gain     NL      ANL     Gain     NL      ANL     Gain
Lena       31.02   31.98   0.96     27.50   30.04   2.54     24.37   28.27   3.90
Zelda      31.85   32.83   0.98     28.18   30.72   2.55     25.06   28.76   3.70
Peppers    30.93   31.59   0.65     27.50   29.79   2.29     24.40   28.00   3.60
Airplane   30.52   30.93   0.41     27.20   29.05   1.85     24.34   27.41   3.07
Barbara    29.85   30.30   0.45     26.65   28.41   1.76     23.89   26.74   2.85
Elaine     30.40   30.82   0.42     27.30   29.58   2.28     24.32   28.10   3.78
Girlface   31.75   32.29   0.54     28.12   29.98   1.86     25.06   27.92   2.86
Average    30.90   31.53   0.63     27.49   29.65   2.16     24.49   27.89   3.39

Table 5.3: PSNR comparison (in dB) between the NL-means and ANL-means algorithms for AWGN of
three standard deviation values; the third column of each group is the gain of ANL over NL.
Figure 5.5: Comparison of the averaged PSNR values of six denoising al-
gorithms applied to seven test images corrupted by AWGN with three
standard deviation values.
The denoised images obtained with various algorithms are shown in Figs. 5.6 and 5.7
for visual comparison. The denoising results using MF, GF and PDE might have higher
PSNR, yet the visual quality is actually poorer due to blurred edges. On the other
hand, the TV and NL-means denoising algorithms preserve sharp edges and object
contours reasonably well. We see that the proposed ANL-means scheme provides much
better visual quality, where noise is strongly suppressed in the smooth region while sharp
edges around object contours are well preserved.
5.5.2 Rician Noise
We use NL-R and ANL-R to denote the modified NL-means and ANL-means algorithms
tailored to the Rician noise, respectively. In this subsection, we compare the denoising
performance of NL-R and ANL-R applied to a set of six representative test images, as shown
Figure 5.6: Visual quality comparison of various denoising algorithms for
image Zelda corrupted by AWGN with \sigma_n = 40.
Figure 5.7: Visual quality comparison of various denoising algorithms for
image Lena corrupted by AWGN with \sigma_n = 40.
Figure 5.8: Summarized results of image denoising performance with the
Rician noise.
in Fig. 5.4, corrupted by the Rician noise. We generate the Rician noise by adding two
independent Gaussian noises to the real and imaginary parts of the image data, respectively.
The Gaussian noise is zero-mean with standard deviation \sigma_n = 30, 40 and 50. The real
and imaginary parts of the image data are derived from the magnitude of the test images with
\theta = 0, 30 and 45 degrees. For each test condition, three noise patterns are generated.
The averaged PSNR values of the denoised test images are shown in Table 5.4. The
averaged PSNR results of all cases are shown in Fig. 5.8. The averaged PSNR of
ANL-R is approximately 1.55, 2.54 and 3.01 dB better than NL-R for \sigma_n = 30, 40 and
50, respectively.
σn   Image     θ = 0                      θ = 30                     θ = 45
               Noisy   NL-R    ANL-R      Noisy   NL-R    ANL-R      Noisy   NL-R    ANL-R
30   1         18.18   25.83   26.61      18.19   25.81   26.60      18.19   25.81   26.60
     2         16.96   27.78   29.26      16.97   27.77   29.28      16.97   27.76   29.27
     3         17.08   27.44   29.06      17.08   27.38   29.05      17.09   27.38   29.01
     4         18.74   25.54   26.81      18.76   25.53   26.77      18.75   25.45   26.76
     5         17.30   27.16   29.01      17.32   27.14   28.98      17.30   27.11   28.92
     6         17.69   27.60   29.93      17.69   27.59   29.83      17.68   27.58   29.80
     Average   17.66   26.89   28.45      17.67   26.87   28.42      17.66   26.85   28.39
40   1         15.64   22.88   24.90      15.64   22.87   24.88      15.64   22.88   24.89
     2         14.48   24.24   26.70      14.49   24.24   26.72      14.49   24.24   26.71
     3         14.47   24.03   26.65      14.50   24.04   26.56      14.50   24.05   26.57
     4         16.13   22.66   24.79      16.13   22.70   24.87      16.14   22.64   24.73
     5         14.78   23.75   26.63      14.78   23.73   26.70      14.76   23.72   26.62
     6         15.20   24.14   27.30      15.23   24.22   27.42      15.20   24.14   27.30
     Average   15.12   23.62   26.16      15.13   23.63   26.19      15.12   23.61   26.14
50   1         13.67   20.39   23.06      13.68   20.38   23.04      13.68   20.39   23.05
     2         12.53   21.26   24.48      12.54   21.25   24.53      12.54   21.25   24.53
     3         12.46   21.38   24.30      12.48   21.42   24.30      12.48   21.42   24.32
     4         14.13   20.55   22.86      14.15   20.55   22.82      14.14   20.55   22.82
     5         12.81   21.10   24.32      12.79   21.05   24.27      12.79   21.06   24.28
     6         13.31   21.30   25.14      13.30   21.27   24.98      13.30   21.27   24.99
     Average   13.15   21.00   24.03      13.16   20.99   23.99      13.16   20.99   24.00

Table 5.4: Comparison of the PSNR values (dB) of images denoised with NL-R and ANL-R.
Furthermore, some of the denoised images using NL-R and ANL-R are shown in Figs. 5.9
and 5.10 for visual comparison. We see that the performance gap between ANL-means
and NL-means is much larger for the Rician noise than for the Gaussian noise. This
performance improvement comes not only from better similarity matching (i.e., the
adaptive matching window and dominant orientation alignment) but also from the
correction of the local intensity shift via bias estimation.
5.6 Conclusion to Chapter 5
An adaptive NL-means scheme was proposed in this work, which was shown to be effective
in denoising highly noisy images corrupted by both AWGN and the Rician noise.
The proposed ANL-means scheme can classify noisy images effectively via the SVD and
K-means clustering techniques. The block classification results are utilized to adjust the
similarity measure window size adaptively. Furthermore, a rotated block matching algorithm
is employed to enhance the similarity matching process. The noise level can be estimated
more accurately using a modified Laplacian noise estimation method. It was shown by
experimental results that the proposed ANL-means scheme outperforms several well-known
denoising benchmarks in terms of PSNR value and visual quality. Moreover, the ANL-means
algorithm is shown to be effective with both AWGN and the Rician noise, where the latter
is common in magnitude MRI images.
Figure 5.9: Visual quality comparison of the NL-R and ANL-R algorithms for test
image no. 4 corrupted by the Rician noise with \sigma_n = 40 and \theta = 45 degrees.
Figure 5.10: Visual quality comparison of the NL-R and ANL-R algorithms for
test image no. 6 corrupted by the Rician noise with \sigma_n = 50 and \theta = 45 degrees.
Chapter 6
Conclusion and Future Work
6.1 Summary of the Research
In this research, we performed an extensive study of three image and video processing
applications, i.e., error concealment (EC), frame rate up-conversion (FRUC) and image
denoising.
In Chapter 3, we conducted a performance evaluation of two low-complexity error
concealment techniques based on boundary matching criteria: the boundary
matching algorithm (BMA) and the outer boundary matching algorithm (OBMA). The
large performance gap between BMA and OBMA was analyzed. Extensive experiments
under various test conditions were performed to support our analysis. As described in
Chapter 3, the superior performance of OBMA comes from the fact that the criterion
function of OBMA incorporates edge information in its matching decision while the
criterion function of BMA ignores such information. In addition, we proposed the local re-
fined search and multiple boundary layer extensions for OBMA. With these extensions,
OBMA can achieve various levels of performance improvement at the cost of additional
computational complexity. We also integrated OBMA with some new features in H.264
such as flexible macroblock ordering (FMO) to obtain a full low-complexity error conceal-
ment system that is suitable for mobile video applications.
For frame rate up-conversion, we identified two scenarios where most existing FRUC
algorithms fail to produce interpolated frames of good quality; namely, abrupt illumi-
nation change and low-frame-rate video. We showed that the temporal continuity of
the motion vector field is interrupted by a sudden intensity shift and by a large temporal
gap for abrupt illumination change and low-frame-rate video, respectively. Consequently,
object motion cannot be acquired accurately by motion estimation. Interpolated frames
that rely on the motion information suffer severe quality degradation. A low-complexity
pre-processing technique, which consists of abrupt illumination change detection and lo-
cal intensity adjustment, was proposed to ensure robust motion estimation when there
exists an abrupt intensity shift in a video sequence. To address the low video frame rate
problem, we proposed an algorithm called TWK, which utilizes the first- and second-
order translational models to guarantee the temporal continuity of the motion vector
field. Then, OBMA is used to preserve the spatial smoothness of video contents. It was
demonstrated by extensive experiments that the proposed TWK algorithm outperforms
six other low-complexity FRUC algorithms significantly.
In Chapter 5, we proposed an adaptive image denoising technique based on the non-
local means (NL-means) algorithm. It was shown to be effective in denoising highly
noisy images. The proposed adaptive nonlocal means (ANL-means) algorithm can ef-
fectively classify noisy images via the singular value decomposition and K-means clustering
techniques. The classification results are utilized to adaptively adjust the denoising pa-
rameters. Furthermore, a rotated block matching algorithm is employed to enhance the
similarity matching process. The noise level can be estimated accurately using a mod-
ified Laplacian noise estimation method. The denoising performance of the proposed
ANL-means scheme was shown to outperform several well-known denoising benchmarks
in terms of PSNR value and visual quality. The proposed ANL-means algorithm is
applicable to noisy images corrupted by additive white Gaussian noise in normal images
and by the Rician noise, which is common in magnitude MRI images.
6.2 Future Research Topics
We would like to extend our current studies along the following directions to make this
research more complete.
We can extend our studies to cover more practical applications that use EC and
FRUC as a basis. For example, in addition to concealing lost frames, we may need
to reduce propagation errors as well. Another application is to develop an FRUC
scheme with a non-integer up-sample factor. For simplicity, most previous work
has focused on integer up-sample factors. However, if we have a motion picture
with a frame rate of 24 fps and would like to convert it to a program suitable for
high-definition television (HDTV) with a frame rate of 60 fps, then we need an
up-sample factor of 2.5.
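For the 24-to-60 fps case, each output frame falls at a fractional position between two source frames. A small helper (an illustrative sketch, not a proposed scheme; the function name is an assumption) makes the required interpolation positions explicit:

```python
from fractions import Fraction

def interp_positions(src_fps, dst_fps, n_out):
    """For each output frame index, return (index of the left source
    frame, fractional offset in [0, 1) between it and the next one).
    An offset of 0 means the source frame can be copied directly."""
    step = Fraction(src_fps, dst_fps)  # source-frame intervals per output frame
    out = []
    for k in range(n_out):
        t = k * step
        out.append((int(t), t - int(t)))
    return out
```

For 24 to 60 fps the step is 2/5, so only the offsets 0, 1/5, 2/5, 3/5, and 4/5 ever occur, and every fifth output frame coincides with a source frame.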
We can extend the current work to meet more challenging requirements. For
FRUC, the problems of interpolating occlusion areas (uncovered and to-be-covered
background) and obtaining a sharper object contour have been topics of great
interest. For EC, the performance of concealment is highly related to data availability.
In some cases, when a video sequence experiences a severe transmission loss, there
is a chance that this sequence may suffer the loss of consecutive frames even
with FMO. Then, the task of concealing missing frames becomes much more difficult
due to the lack of data correlation.
We may study other video processing applications that rely on motion analysis. For
instance, in normal video encoding, both motion vectors and coding residuals need
to be transmitted to the decoder for video playback. However, the encoding bit rate
can be reduced by exploiting the concept of frame interpolation. Instead of using
motion estimation to obtain the motion vector, we can interpolate the current frame
based on information from its reference frames; then, only the motion vector
residuals and coding residuals need to be delivered to the decoder.
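The motion-vector residual idea can be made concrete with a small sketch. The function names are illustrative assumptions, and the predictor here is simply the co-located motion field of the previous frame, a deliberately simple stand-in for the interpolation-based prediction described above:

```python
import numpy as np

def encode_mv_residuals(true_mvs, predicted_mvs):
    """Encoder side: transmit only the difference between the actual
    motion field and the prediction the decoder can form on its own."""
    return true_mvs - predicted_mvs

def decode_mvs(predicted_mvs, residuals):
    """Decoder side: rebuild the motion field as prediction + residual.
    When motion is temporally smooth, the residuals are small and
    therefore cheap to entropy-code, which is the source of the
    bit-rate saving."""
    return predicted_mvs + residuals
```

Since both sides derive the same prediction, the round trip reconstructs the true motion field exactly while only the (typically small) residuals travel over the channel.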
We would like to consider applications arising from the context of cartoon animation.
There are 2D cartoon animations drawn by human hands and 3D cartoon
animations rendered by computer graphics techniques. It is desirable to take a
drawn or rendered input cartoon sequence with a very low frame rate (say, 2 fps)
and interpolate frames in between, enhancing the frame rate of cartoon contents to
30 fps based on motion analysis. On one hand, the up-sampling conversion ratio
can be as high as 15:1 or 30:1, which imposes a very challenging problem. On the
other hand, since cartoon contents are synthetic video sequences, there could be
some properties to exploit to achieve such a high conversion ratio.