AUTOSTEREOSCOPIC 3D DISPLAY RENDERING FROM STEREO SEQUENCES
by
Young Ju Jeong
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(ELECTRICAL ENGINEERING)
May 2016
Copyright 2016 Young Ju Jeong
Dedication
To my family
Acknowledgements
It is a pleasure to thank all those who helped me in this long journey. I would never have been able to finish this dissertation without the guidance of my advisor and committee members, help from my colleagues, and support from my family. First of all, I would like to express my deepest gratitude to my advisor, Prof. C. C. Jay Kuo, for his excellent mentorship, support, patience, and immense knowledge. His valuable advice and guidance helped me at every stage of pursuing my PhD. I also thank my committee members, Professors Alexander A. Sawchuk, Aiichiro Nakano, B. Keith Jenkins, and Panayiotis Georgiou, for their valuable comments during the PhD defense and qualifying examination. I also thank my research group members and friends. Especially, I would like to express my gratitude to the senior PhD students at USC, Hyunsuk Ko and Dongwoo Kang. I would like to express my gratitude to my company colleagues, Dongkyung Nam, Jinho Lee, and Hyun Sung Chang. I would also like to thank my husband, Tahee Lee, for his invaluable encouragement and support. He always cheered me up and stood by me throughout my study and the writing of this dissertation. I thank my lovely daughter, Jane, and son, Jaehyun. They played with me on the playground and studied beside me at the table while I was working, and they gave me a lot of happiness during the PhD journey. Last but not least, I thank God, in whom I believe, for everything that has happened for me.
Table of Contents
Dedication
Acknowledgements
List of Tables
List of Figures
Abstract
Chapter 1: Introduction
1.1 Motivations
1.2 3D Display Techniques
1.2.1 Stereoscopic 3D Display
1.2.2 Multiview 3D Display
1.2.3 Light Field 3D Display
1.3 3D Content Rendering Techniques
1.3.1 3D Structure Restoration
1.3.2 3D Rendering with Non-ideal Stereo Contents
1.3.3 3D Rendering with 3D Structure
1.4 Contributions of the Research
1.5 Organization of the Dissertation
Chapter 2: Stereo Matching with Confidence-Region Decomposition and Processing
2.1 Introduction
2.2 Proposed Stereo Matching Algorithm
2.2.1 Overview of Proposed Solution
2.2.2 Disparity Estimation in High-Confidence Regions
2.2.3 Disparity Estimation in Low-Confidence Regions
2.2.4 Post-Processing
2.3 Experimental Results
2.3.1 Middlebury test set
2.3.2 Real and synthetic stereo
2.4 Conclusion
Chapter 3: Uncalibrated Multiview Synthesis
3.1 Introduction
3.2 Multiview Display Rendering
3.2.1 Multiview Display Viewpoints (MDVs)
3.2.2 Multiview Rendering from Ideal Stereo
3.2.3 Previous Multiview Rendering from Non-ideal Stereo
3.3 Algorithm
3.3.1 Epipolar Geometry of the Algorithm
3.3.2 Implementation
3.4 Experimental Results
3.5 Conclusion
Chapter 4: Efficient Light-Field Rendering using Depth Maps for 100-Mpixel Multi-Projection 3D Display
4.1 Introduction
4.1.1 Light-field Rendering
4.2 Multi-projection 3D Display
4.2.1 Operating Principle
4.2.2 System Design
4.3 Depth Image Based Light-field Rendering
4.3.1 Calibration and Disparity Estimation
4.3.2 3D Modeling using Input Image and Depth
4.3.3 Horizontal Parallax Only Rendering (HPO)
4.3.4 Mirror Reflection Light-field Rendering
4.3.5 Consistent Hole Filling
4.4 Experimental Results
4.4.1 Display System
4.4.2 Comparison Results of DIBR and IBR
4.4.3 Results of Camera Captured Contents
4.5 Conclusion
Chapter 5: Direct Light Field Rendering without 2D Image Generation
5.1 Introduction
5.2 Light Field Representation
5.3 Proposed Algorithm: DLFR
5.3.1 3D Display Light Field Function (DLF)
5.3.2 3D Point P Light Field Function (PLF)
5.3.3 3D Direct Light Field Rendering
5.4 Analysis
5.5 Experimental Results
5.6 Conclusion
Chapter 6: Conclusion
Bibliography
List of Tables
2.1 Middlebury Stereo Evaluation (Error Threshold = 2 pixels)
3.1 Comparison Results
4.1 Target Specifications of Multi-Projection 3D Displays
4.2 Computation Time Comparison (sec)
4.3 Memory Usage Comparison (megabytes)
List of Figures
1.1 Stereoscopic 3D display concept: A stereoscopic 3D display provides different left and right images, and these left and right images are displayed separately to the left and right eye by special 3D glasses.
1.2 Multiview display concept: Multiview displays project light rays to several viewpoints.
1.3 Multiview images for multiview display: Light rays of a multiview display converge to certain viewpoints, and the light rays at those viewpoints can be interpreted as multiview images.
1.4 Light rays from a multiview display (left) and a light field display (right): Users cannot observe clear images outside the viewpoints (sweet spots) of the multiview display, but users can observe clear 3D everywhere through the light field display.
1.5 Multi-projection 3D display concept: A large number of light rays can be generated by many projectors to recreate the natural 3D world. A projector array is used as the light source, and a diffuser screen controls the direction of the light rays to provide horizontal motion parallax.
1.6 Flat panel light field 3D display concept: Light rays from 3D displays project to a large number of viewpoints, providing natural motion parallax.
1.7 Images for multi-projection 3D displays
2.1 Two complementary tree structures: (a) the horizontal tree structure and (b) the vertical tree structure.
2.2 Disparities derived from (a) the horizontal tree-structure and (b) the vertical tree-structure.
2.3 (a) The disparity values in high-confidence regions are shown in gray levels while the low-confidence region, whose disparity is to be recovered, is shown in white. (b) The output disparity map where the disparities in low-confidence regions are restored via disparity fitting.
2.4 Disparity maps before and after post-processing: (a) left disparity-fitting result before post-processing, (b) right disparity-fitting result before post-processing, (c) left disparity-fitting result after post-processing, and (d) right disparity-fitting result after post-processing.
2.5 Comparison of results on Middlebury data. First column: ground truth images, second column: proposed algorithm, third column: tree DP [90].
2.6 Comparison of results on Middlebury data. First column: semi-global matching [33], second column: occlusion GC [46], third column: symmetric BP [81].
2.7 Comparison of results on Middlebury data, synthesized new view between left and right stereo pairs. First column: ground truth images, second column: proposed algorithm, third column: tree DP [90].
2.8 Comparison of results on Middlebury data, synthesized new view between left and right stereo pairs. First column: semi-global matching [33], second column: occlusion GC [46], third column: symmetric BP [81].
2.9 Images captured by stereo cameras: (a) Man, (b) Woman, (c) the estimated disparity for Man, (d) the estimated disparity for Woman.
2.10 Computer-synthesized stereo images: (a) Bird, (b) Coral, (c) the estimated disparity for Bird, (d) the estimated disparity for Coral.
2.11 Extra Middlebury results
2.12 Extra Middlebury results
2.13 Extra Middlebury results
2.14 Extra Middlebury results
3.1 Lenticular lens multiview display and its viewpoints
3.2 Epipolar geometry of stereo and viewpoint V_n
3.3 (a) Original left image, (b) rectified left image, (c) rectified right image, and (d) original right image.
3.4 Feature points and epipolar lines of left and right images.
3.5 Interpolated view using (a) the proposed algorithm, (b) an uncalibrated stereo image, and (c) a calibrated stereo image.
3.6 Vertical disparity of test stereo video sequences
3.7 Interpolated view using the proposed algorithm for a video sequence.
3.8 Interpolated view using an uncalibrated stereo image for a video sequence.
3.9 Interpolated view using a calibrated stereo image for a video sequence.
4.1 Light ray configuration of (a) multiview display and (b) light-field display. The light rays of the multiview display converge to certain points but those of the light-field display do not.
4.2 Basic configuration and operating principle of the multi-projection 3D display. The light rays are created using 96 projectors, and a vertical diffuser screen controls the direction of the light ray.
4.3 System design and parameters of the multi-projection 3D display, where an overhead view of the basic arrangement of the multi-projection 3D display is shown.
4.4 The block diagram of the coordinate transformation from an image pixel to a light-field pixel using DIBR.
4.5 Input: (a) left color, (b) center color, (c) right color, (d) left depth, (e) center depth, (f) right depth. The light ray of a given pixel and its geometric depth location for (g) left, (h) center, and (i) right.
4.6 Three-dimensional points of input color and depth images: (a) the left image, (b) the center image, and (c) the right image.
4.7 Three-dimensional points of input color and depth images in the world coordinate system. Red, blue, and green points are shown on the left, center, and right, respectively.
4.8 Projector light rendering for the HPO screen.
4.9 Real projectors and two tilted mirrors, where the real projectors' light rays are reflected from the two tilted mirrors and the reflected light rays act as though they come from virtual projectors.
4.10 Projector images: the portion of the image covered by the real and virtual projectors changes according to the location of the projector. (a) leftmost projector, (b) center projector, and (c) rightmost projector.
4.11 The 55-inch, 100-Mpixel multi-projection 3D display system
4.12 Input: (a) left color, (b) center color, (c) right color, (d) left depth, (e) center depth, and (f) right depth
4.13 Magnified particle images of synthesized projector images (upper row) and displayed 3D images captured by camera (lower row): (a) rendered by the DIBR algorithm, (b–d) rendered by the IBR algorithm using 20 (b), 100 (c), and 500 (d) multiview images.
4.14 Displayed 3D images from the multi-projection 3D display: (a) left image, (b) center image, (c) right image
4.15 Multi-camera capturing system: each camera is separated by 130 mm. The cameras indicated by the red rectangle (1st, 5th, and 9th cameras) are used for the input images.
4.16 Input colors and estimated disparity maps: (a) 1st camera image, (b) 5th camera image, (c) 9th camera image, (d) estimated disparity of the 1st camera image, (e) estimated disparity of the 5th camera image, and (f) estimated disparity of the 9th camera image
4.17 Light-field images for projectors: (a) image of projector 1, (b) image of projector 48, (c) image of projector 96, (d) images for 96 projectors
4.18 Light-field images for projectors 1–24.
4.19 Light-field images for projectors 25–48.
4.20 Light-field images for projectors 49–72.
4.21 Light-field images for projectors 73–96.
4.22 Light-field images for projectors 1–24.
4.23 Light-field images for projectors 25–48.
4.24 Light-field images for projectors 49–72.
4.25 Light-field images for projectors 73–96.
5.1 Direct Light Field Rendering (DLFR): Unlike conventional multiview rendering approaches, our proposed approach directly renders the 3D panel image without reconstructing the multiview images.
5.2 4D light field representation: The light ray that passes through (x, y) on one plane and (s, t) on a second, parallel plane can be represented as L(x, y, s, t).
5.3 Light field representation: (a) light ray from the 3D display and (b) light field of the 3D display, L(x, y*, s_A(x), t*). Light rays from the display can be depicted as a light field representation.
5.4 Light field representation of the 3D point P: (a) light ray from the 3D point P and (b) light field of the 3D point P, L(x, y*, s_P(x), t*)
5.5 3D display light field: (a) light rays from a slanted-lens autostereoscopic display and (b) light field plot of the 3D display
5.6 3D Display Light Field Function (DLF) of x and s: Light rays are fitted as multilines with the same slope when (a) n_l is 1, (b) n_l is 4, (c) n_l is 5, and (d) n_l is 9.
5.7 Light field of the 3D point P: (a) light field volume when t is fixed, (b) left image, (c) disparity in the left image, (d) right image, (e) disparity in the right image, and (f) epipolar plane image when y is given.
5.8 Light field function of the 3D point P: The PLF has slope Δs/Δx and passes through (x_P, s_L).
5.9 Direct Light Field Rendering (DLFR): The 3D panel image is composed by assigning the 3D point P to the intersection point between the DLF and the PLF.
5.10 Comparison between Direct Light Field Rendering (DLFR) and Multiview Rendering (MVR) for an increasing number of views: (a) computation time and (b) memory usage
5.11 Comparison between Direct Light Field Rendering (DLFR) and Multiview Rendering (MVR) for various video contents when the number of views is 96
5.12 DLFR performance with a disparity range of (−100, 100): number of intersection points of the DLF and PLF and computation time for a 4K 3D panel image
5.13 Computation times for various multiline functions
5.14 Computation time as a function of disparity for various multiline functions.
Abstract
Rapid developments in 3D display technologies have enabled consumers to enjoy 3D environments in an increasingly immersive manner through various display systems such as stereoscopic, multiview, and light field displays. Sufficient 3D content for these display systems plays an important role in the further commercial viability of 3D display products. The most common 3D content, however, consists only of stereo sequences for stereoscopic 3D displays, and it is not guaranteed that those stereo sequences are well calibrated. A 3D display rendering algorithm is therefore the key to generating 3D content from conventional stereo sequences. In this dissertation, we investigate a 3D display rendering framework that serves various types of 3D display systems from conventional stereo content.
First, we introduce a new stereo matching algorithm, which estimates disparities in high- and low-confidence regions separately. A complementary tree structure is adopted to identify the high-confidence region and estimate its disparity map using dynamic programming. Then, a disparity fitting algorithm restores disparities in low-confidence regions from the color and disparity information of high-confidence regions using global optimization. The proposed stereo matching algorithm enhances disparity values in both occlusion and difficult-to-estimate areas (e.g., thin objects) to yield a good-quality disparity map.
Second, we propose an efficient multiview rendering algorithm for the autostereoscopic display that takes uncalibrated stereo as the input. The epipolar geometry of multiple viewpoints is analyzed for multiview displays. The uncalibrated camera poses for multiview display viewpoints are then estimated by algebraic approximation. The multiview images of the approximated uncalibrated camera poses do not contain any projection or warping distortion. By exploiting the rectification homographies and disparities of the rectified stereo, one can determine the multiview images with their estimated camera poses.
Third, in order to achieve an immersive, natural 3D experience on a large screen, a 100-Mpixel multi-projection 3D display was developed. Ninety-six projectors were used to increase the number of rays emanating from each pixel in the horizontal direction to 96. We propose an efficient light-field rendering algorithm that utilizes only a few input color and depth images. Using a depth map and estimated camera parameters, synthesized light-field images are generated directly. This algorithm requires a much lighter memory load than conventional light-field rendering algorithms. It is also much simpler than image-based rendering algorithms because it does not require the generation of many multiview images.
Finally, we propose a novel method, the so-called direct light field rendering, which can compose the display's 3D panel image without reconstructing all the multiview images beforehand. Interpreting the 3D display as sampling in the light field domain, we attempt to compute directly only the necessary samples, not the entire light field or multiview images. Our proposed algorithm involves solving linear systems of two variables, thereby requiring remarkably low computational complexity.
Chapter 1
Introduction
1.1 Motivations
The rapid development of 3D display technologies allows consumers to enjoy the 3D world visually through different display systems such as stereoscopic, multiview, and light field displays. Despite the hype of 3D display development, only a few kinds of 3D content exist; most 3D content consists of stereo sequences for stereoscopic 3D.
As a way of overcoming the lack of 3D content for the various display systems, a 3D display rendering algorithm generates images for various 3D displays from stereo sequences. A stereo matching algorithm is the essential technology for recreating the 3D structure from stereo images. With stereo images and the restored 3D structure, a multiview rendering algorithm generates images for various 3D displays. However, there is a corresponding increase in the complexity of conventional multiview rendering algorithms in the attempt to achieve a sufficient level of reality, which may hinder the further commercial viability of 3D display products based on such a conventional approach. Besides the insufficient variety of 3D content, uncalibrated stereo sequences deteriorate 3D quality, yielding unwanted warping distortion in the generated 3D images.
In this dissertation, we investigate 3D display rendering algorithms that serve various types of 3D display systems from conventional stereo sequences. First, we investigate a stereo matching algorithm that restores the 3D structure from stereo images. Using the restored 3D structure, we propose an uncalibrated multiview rendering algorithm without warping distortion, and efficient rendering algorithms for projection-type 3D displays and autostereoscopic 3D displays.
Figure 1.1: Stereoscopic 3D display concept: A stereoscopic 3D display provides different left and right images, and these left and right images are displayed separately to the left and right eye by special 3D glasses.
1.2 3D Display Techniques
3D display technologies have developed rapidly. With 3D displays, viewers experience a real three-dimensional world that surpasses the experience provided by existing 2D displays, which merely present two-dimensional images or movies. 3D displays are categorized as either stereoscopic, for which glasses are required, or autostereoscopic, for which glasses are not required. Autostereoscopic 3D displays are further categorized into multiview displays and light field displays. The light field display can be implemented as either a multi-projection 3D display or a flat panel 3D display.
1.2.1 Stereoscopic 3D Display
In human vision, the scenes seen by the two eyes are slightly different from each other because of the physical separation of the eyes. Because of these differences, the human brain interprets the information in 3D. In order to recreate the 3D world through a physical 2D display, this principle is applied in reverse. A stereoscopic 3D display provides different left and right images, and these left and right images are displayed separately to the left and right eye by special 3D glasses. The stereoscopic 3D display projects superimposed left and right images with an additive light setting, such as one red and one cyan image, polarizing filters, or time-sequential frames. Viewers observe the separated left and right images through anaglyph glasses, polarized filter glasses, or shutter glasses. The requirement of glasses, however, is inconvenient, and it causes viewing fatigue because motion parallax is not available unless a special motion tracking sensor is used [58, 75].
Figure 1.2: Multiview display concept: Multiview displays project light rays to several viewpoints.
1.2.2 Multiview 3D Display
Multiview 3D displays provide unique images to several viewpoints, offering limited motion parallax, compared with only one viewpoint in the case of a stereoscopic 3D display. Multiview 3D displays are implemented with a lenticular sheet lens [40, 42, 88] or a parallax barrier [48, 76] overlaid on a liquid crystal display. Figure 1.3 explains the principle of a multiview display with N multiview images. Viewers sense the first 3D view when the left eye sees the first virtual view and the right eye sees the second virtual view. When the viewpoint is moved, the left eye sees the second virtual view and the right eye sees the third virtual view, and the viewer perceives the second 3D view. In this way, (N−1) virtual 3D scenes change according to the (N−1) viewpoints. The scenes change as the viewpoint changes, giving us the illusion that we are looking around the 3D world as in real life.
Figure 1.3: Multiview images for multiview display: Light rays of a multiview display converge to certain viewpoints, and the light rays at those viewpoints can be interpreted as multiview images.
1.2.3 Light Field 3D Display
Even though a multiview display provides motion parallax over multiple viewpoints, these limited viewpoints offer only discrete motion parallax, not the natural motion parallax of the real world. Unlike multiview displays, which provide limited stepwise viewpoints, light field 3D displays generate continuous viewpoints as in the real world. If the number of light rays increases so that the 3D display provides not a limited set of viewpoints but countless viewpoints, as in the real world, then we can enjoy a natural 3D world. To develop 3D displays that project continuous viewpoints, multi-projection 3D displays and flat panel displays are used. Multi-projection 3D displays create light rays using a large number of projectors. Flat panel light field displays are designed to provide as many viewpoints as possible to reduce stepwise parallax artifacts.
Figure 1.4: Light rays from a multiview display (left) and a light field display (right): Users cannot observe clear images outside the viewpoints (sweet spots) of the multiview display, but users can observe clear 3D everywhere through the light field display.
Figure 1.4 shows the difference between multiview displays and light field displays. Since light rays from multiview displays converge to certain viewpoints, users cannot observe a clear 3D image outside the sweet spot area, as shown in Fig. 1.4. On the other hand, light rays of light field displays do not converge to certain points but reproduce the light field of the real world. People can observe clear 3D in a larger area.
Figure 1.5: Multi-projection 3D display concept: A large number of light rays can be generated by many projectors to recreate the natural 3D world. A projector array is used as the light source, and a diffuser screen controls the direction of the light rays to provide horizontal motion parallax.
Multi-projection 3D display
A large number of light rays is necessary for the recreation of realistic 3D displays, because we perceive 3D objects based on an abundance of light rays in the real world. Multi-projection 3D displays are good candidates for showing real 3D images by reproducing as many light rays as possible. Multi-projection 3D displays create light rays using a large number of projectors to present the 3D world. As a result, very smooth motion parallax without discontinuity can be acquired, and natural 3D images are obtained without limiting the viewing distance. Figure 1.5 shows the basic configuration and operating principle of multi-projection 3D displays. Light rays are created using a large number of projectors, and a vertical diffuser screen controls the direction of the light rays.
Flat Panel 3D Display
The multi-projection 3D display provides a large number of light rays with natural motion parallax, but its bulky structure, due to the projectors, hinders convenient use in consumer electronics. Panel-based autostereoscopic displays decrease image resolution by increasing the number of light rays emanating from each pixel, which produces motion discontinuity and unnatural 3D images. In order to overcome these resolution difficulties, flat panel light field displays are designed to have more viewpoints than multiview displays. A large number of viewpoints generates interpolated 3D images with smooth motion parallax.
Figure 1.6: Flat panel light field 3D display concept: Light rays from 3D displays project to a large number of viewpoints, providing natural motion parallax.
1.3 3D Content Rendering Techniques
1.3.1 3D Structure Restoration
3D display technologies are developed to allow consumers to enjoy the 3D world visually through different display systems. However, 3D content for the various types of 3D displays is insufficient. It is difficult to generate 3D content for every 3D display type with a capturing system or computer graphics animation because of cost, system alignment, and multiview calibration. Therefore, 3D content for many 3D displays is usually generated from general 3D content. The most common 3D content is the stereo sequence. Stereo content is composed of a left image for the left eye and a right image for the right eye. Each image has a slightly different structure because of its different viewpoint position. From the differences between the left and right views, we can estimate the depth of an object with a stereo matching algorithm. Using the estimated object depth, we can also synthesize images for other viewpoints.
Stereo matching algorithms are categorized into two types: local algorithms and global algorithms. Local algorithms obtain the disparity value by a winner-take-all optimization, imposing disparity smoothness using an aggregation technique [34, 73, 74, 77, 97]. Global algorithms estimate the disparity value within a global energy minimization framework. Examples of global algorithms are graph cut (GC) [11, 89], belief propagation (BP) [21, 24, 82, 94, 95], and dynamic programming (DP) [10, 12, 13, 45, 67, 98].
DP is implemented with a scan-line operation, resulting in a simple method. Despite its simplicity, the DP method is not practically useful because of its streaking artifacts. To reduce this artifact, Veksler proposed dynamic programming on a tree structure [90]. Hirschmüller [33] proposed a semi-global stereo matching algorithm that uses more than four neighborhoods.
Global optimization algorithms have artifacts in occlusion areas. Kolmogorov and Zabih [46] proposed a modified graph cut algorithm that adds a visibility penalty to the energy term. Sun et al. [81] proposed a belief propagation algorithm with a new energy function that includes visibility terms.
1.3.2 3D Rendering with Non-ideal Stereo Contents
Even when the 3D structure is estimated by a stereo matching algorithm, uncalibrated stereo content degrades 3D quality and yields synthesis artifacts. Non-ideal stereo content does not hinder 3D quality on stereoscopic displays. However, non-ideal stereo images seriously degrade the 3D quality of other 3D displays because traditional stereo matching and view synthesis algorithms are vulnerable to non-ideal sequences.
Compensation for non-ideal stereo content falls into two categories: calibration compensation and view synthesis compensation. Calibration compensation focuses on finding rectification parameters that produce fewer artifacts. Several algorithms effectively reduce rectification warping distortion by enforcing a homography that minimizes the projective error or resampling effects [28, 53, 56, 99]. For video sequences, Bleyer proposed a temporally consistent disparity map algorithm [9]. Uncalibrated view synthesis algorithms compensate for synthesis error by camera pose estimation or homography interpolation [23, 25, 26].
1.3.3 3D Rendering with 3D Structure
Multi-projection 3D Rendering
In order to generate 3D images for a multi-projection 3D display, processing the abundant light rays from a large number of projectors is a substantial task. For projection-type 3D rendering, image-based rendering (IBR) algorithms are popular [78], [79]. For projection-type light field rendering, a large number of input camera arrays [92] or a multiview rendering synthesis algorithm [20, 80] supplies the collection of sample images for IBR. Another technique acquires light rays as 2D slices of a 4D function [1, 29, 51, 52, 59]. Projection-type 3D rendering algorithms that use relatively few depth maps and input images have also been introduced [6, 69]. These algorithms generate multiview images and then create light-field images with an IBR algorithm.
Figure 1.7: Images for multi-projection 3D displays
Figure 1.7 shows images for projection 3D displays. The input images are stereo images, Fig. 1.7(c, d), which have only horizontal parallax. The synthesized left and right multiview images, Fig. 1.7(e, f), also have only horizontal parallax. However, the images for the projector array, Fig. 1.7(a, b), have a different structure from the input stereo images.
Flat Panel Light Field 3D Rendering
A number of techniques have been proposed for 3D light field rendering. The Point Retracing Rendering (PRR) technique renders images in a point-by-point manner [35]. Parallel Group Rendering (PGR) [39, 61, 96] and Viewpoint Vector Rendering (VVR) techniques use an orthographic projection model to process multiple light rays at the same time [7, 62, 71]. However, PGR and VVR are not suitable for autostereoscopic multiview displays, because a multiview display has a large number of directional light rays that arise from the use of slanted lenses. To address this problem, Van Berkel introduced a Multiview Rendering (MVR) algorithm, which renders each group of rays from the same viewpoint using a perspective projection model [30, 87]. Subsequently, the algorithm weaves the multiview images into the 3D panel image by using information from the display light ray pattern [87].
1.4 Contributions of the Research
Several contributions are made in this research, which are described as follows.
Stereo matching with confidence-region decomposition and processing
A stereo matching scheme using confidence-region decomposition and processing based on dynamic programming is proposed. Although most existing stereo matching algorithms focus on obtaining an accurate disparity map from stereo pairs, it is difficult to estimate disparity values in both occlusion and difficult-to-estimate areas (e.g., thin objects). First, a complementary tree structure is used to identify high-confidence regions by estimating their disparity maps with dynamic programming optimization. Then, low-confidence regions are restored from the color and disparity information of high-confidence regions using global optimization. The proposed stereo matching algorithm yields a high-quality disparity map by enhancing disparity values in both occlusion and difficult-to-estimate areas (e.g., thin objects).
Uncalibrated multiview synthesis
We propose a robust uncalibrated multiview rendering algorithm for the autostereoscopic multiview display. The proposed algorithm yields well-calibrated multiview images without warping distortion. Most multiview rendering algorithms that start from uncalibrated stereo images yield unwanted warping distortion, and this warping distortion seriously hinders 3D quality. Rather than generating exact multiview images, which induces unwanted warping distortion and temporally inconsistent artifacts, we suggest an epipolar geometry approximation model for the uncalibrated camera poses. We first derive the approximate camera poses for interpolation and extrapolation of the stereo images using epipolar geometry. The interpolated and extrapolated camera poses for the multiview display lie on the same line as the stereo cameras. The rotation and translation parameters can be approximated by linear interpolation of matching pixels. In order to obtain the differences between matching points, feature selection, rectification, stereo matching, de-rectification, and rendering are applied, as sketched below. Since the proposed algorithm generates synthesized view images using the original stereo images, not the warped rectified stereo images, it does not contain warping distortion or the temporal inconsistency artifacts caused by rectification.
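For illustration only, the following sketch shows how the front end of such a pipeline (feature matching, fundamental-matrix estimation, and uncalibrated rectification) can be assembled from standard OpenCV building blocks; the detector choice, matcher settings, and thresholds here are assumptions, not the exact configuration used in this work.

```python
import cv2
import numpy as np

def rectification_homographies(img_l, img_r):
    """Estimate rectification homographies (H_l, H_r) for an
    uncalibrated stereo pair; their inverses de-rectify."""
    orb = cv2.ORB_create(2000)
    kl, dl = orb.detectAndCompute(img_l, None)
    kr, dr = orb.detectAndCompute(img_r, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(dl, dr)
    pts_l = np.float32([kl[m.queryIdx].pt for m in matches])
    pts_r = np.float32([kr[m.trainIdx].pt for m in matches])
    # Fundamental matrix with RANSAC to reject outlier matches.
    F, inliers = cv2.findFundamentalMat(pts_l, pts_r, cv2.FM_RANSAC, 1.0, 0.99)
    pts_l = pts_l[inliers.ravel() == 1]
    pts_r = pts_r[inliers.ravel() == 1]
    h, w = img_l.shape[:2]
    ok, H_l, H_r = cv2.stereoRectifyUncalibrated(pts_l, pts_r, F, (w, h))
    return H_l, H_r
```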
Efficient light-field rendering using depth maps for a 100-Mpixel multi-projection 3D display
We propose a projection-type light-field 3D display that can recreate real 3D with 100 Mpixels. The optimal projector configuration was designed to minimize the brightness variation of 3D images, which increases with the number of projectors. A 100-Mpixel multi-projection display was developed using this optimized design; it has a viewing angle of 24° and produces natural 3D video with very smooth motion parallax. In order to generate a large number of 3D pixels, conventional algorithms use a large number of cameras or input images. This creates difficulties in the design of both the large acquisition system and the substantial memory storage.
We also propose an efficient light-field image generation algorithm based on depth maps, which transforms viewpoint images into the world coordinate structure and maps them to the light-field space. The proposed algorithm utilizes only a few input color and depth images. Synthesized light-field images are generated directly using a depth map and estimated camera parameters. The proposed algorithm reduces computational complexity and memory usage compared with conventional light-field rendering algorithms. It is also much simpler than conventional algorithms since it does not need multiview image rendering.
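A minimal sketch of the underlying coordinate transformation is given below: each pixel is back-projected with its depth, moved into the target (projector) frame, and re-projected. The pinhole intrinsics K_src, K_dst and pose (R, t) are assumed inputs; this illustrates generic depth-image-based forward warping rather than the exact renderer of Chapter 4.

```python
import numpy as np

def forward_warp_coords(depth, K_src, K_dst, R, t):
    """Map every source pixel to a target (projector) view: back-project
    with the source intrinsics and per-pixel depth, apply the rigid
    transform (R, t), and re-project with the target intrinsics."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=0).reshape(3, -1)
    rays = np.linalg.inv(K_src) @ pix            # normalized camera rays
    X = rays * depth.reshape(1, -1)              # 3D points (source frame)
    Xt = R @ X + t.reshape(3, 1)                 # 3D points (target frame)
    p = K_dst @ Xt
    return (p[:2] / p[2]).reshape(2, h, w)       # target pixel coordinates
```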
Direct light field rendering without 2D image generation
We propose a novel method to generate the 3D panel image with extremely low computational complexity and memory usage. The proposed algorithm generates the 3D panel image directly, without generating multiview images, because the destination locations of all pixels in the 3D panel image can be calculated directly in the light field domain. We first define the 3D display light field model as a multiline function. Afterwards, we define the light ray model of the input, which is usually a color pixel and its disparity, as a line function in the light field domain. Finally, the input color pixel is assigned on the 3D panel image to the intersection points between the 3D display multiline function and the input 3D point line function. The intersection points are the only necessary samples for the 3D display among all possible light rays from an input point. The simple process of solving these linear systems greatly reduces computational complexity and memory usage, with no storage needed for additional view images beyond the 3D panel image itself.
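At its core, each panel sample is the intersection of two lines in the light field plane, i.e., a linear system of two variables. The toy sketch below illustrates the idea with made-up slopes and offsets; the actual display multiline and point-line parameters are derived in Chapter 5.

```python
import numpy as np

def intersect_lines(a1, b1, a2, b2):
    """Intersection of s = a1*x + b1 and s = a2*x + b2, i.e. a linear
    system of two variables (x, s); None if the lines are parallel."""
    if np.isclose(a1, a2):
        return None
    x = (b2 - b1) / (a1 - a2)
    return x, a1 * x + b1

# Display light field (DLF): parallel lines with slope A and offsets B_k.
A, offsets = -0.5, 0.25 * np.arange(9)           # illustrative values

# Point light field (PLF): one line through (x_P, s_P) with slope m_P
# determined by the point's disparity.
x_P, s_P, m_P = 3.0, 1.0, 0.2                    # illustrative values
b_P = s_P - m_P * x_P

for k, b_k in enumerate(offsets):
    hit = intersect_lines(A, b_k, m_P, b_P)
    if hit is not None:
        x, s = hit
        # assign the point's color to the panel pixel addressed by (x, s)
        print(f"DLF line {k}: panel sample at (x, s) = ({x:.3f}, {s:.3f})")
```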
1.5 Organization of the Dissertation
The rest of this dissertation is organized as follows. A stereo matching scheme using confidence-region decomposition and processing is proposed in Chapter 2, where a complementary tree structure is used to identify high-confidence regions and estimate their disparity maps using dynamic programming. Afterwards, disparities in low-confidence regions are restored via global optimization from the color and disparity information of high-confidence regions. An uncalibrated multiview synthesis system based on the epipolar geometry approximation is proposed in Chapter 3. In this chapter, camera poses for multiview display viewpoints (MDVs) are modeled, and an approximated epipolar geometry for MDVs is derived by an algebraic approximation technique. Afterwards, an implementation of the approximated epipolar geometry model is described. In Chapter 4, we propose efficient light-field rendering using depth maps for a 100-Mpixel multi-projection 3D display. The optimized design of the 100-Mpixel display is described, and an efficient light-field rendering method that uses depth maps is introduced. In Chapter 5, the direct light field rendering algorithm is presented, which greatly reduces computational complexity and memory usage. Direct light field rendering interprets the 3D display as sampling in the light field domain. The required light ray samples are computed directly by solving linear systems of two variables, resulting in low computational complexity. Finally, concluding remarks and future research directions are given in Chapter 6.
Chapter 2
Stereo Matching with Confidence-Region Decomposition and Processing
2.1 Introduction
Stereoscopic 3D technology is commonly used in 3D movies, DVDs, and games nowadays. Development in 3D technology is motivated by the human visual system, where the 3D world is perceived through the physical separation of the two eyes. This separation creates a difference in the scenes observed by each eye. This difference is known as the left-and-right view disparity, and the extent of the disparity determines how far or near an object is located. An example of 3D technology is a stereo movie captured by two separate cameras or synthesized into two views using graphic animation.
Beyond stereo 3D, viewers can enjoy multiview 3D as well, which provides motion parallax as a result of multi-camera capturing and multiview display systems [17]. Disparity estimation, which restores the 3D structure from stereo images, is an essential technology for recreating the 3D world from stereo input images. In this chapter, a new disparity estimation technology that provides a simple and accurate disparity map suitable for the generation of multiview images is proposed.
Stereo matching methods can be categorized into two types: 1) local algorithms and 2) global algorithms. It is often desired to preserve disparity smoothness, and their main difference lies in the way the disparity smoothness constraint is imposed [74]. The local algorithm, also called the window-based algorithm, imposes the constraint in a local region within a window. Disparity smoothness can be enforced using an aggregation technique. Note that this is not an explicit estimation method [34, 73, 77, 97], in the sense that the disparity value at each pixel is obtained using a winner-take-all optimization process [74]. As a result, the estimated disparity may have local noisy artifacts. In contrast, in global algorithms the disparity is estimated using a global energy minimization framework. Graph cut (GC) [11, 89], belief propagation (BP) [21, 24, 82, 94, 95], and dynamic programming (DP) are examples that adopt a global energy minimization framework [10, 12, 13, 45, 67, 98]. Graph cut and belief propagation operate on the whole image and need iterations to find the minimum. Thus, these algorithms are complex and demand longer execution time.
The DP method is the simplest global optimization algorithm; it can be implemented by a scan-line operation without any iteration. Despite its simplicity, the DP method is not practically useful because of a severe drawback: it has streaking artifacts owing to the scan-line operation. To reduce this artifact, Veksler [90] proposed another dynamic-programming stereo matching solution with a tree structure in which the smoothness of more than two neighboring pixels can be imposed. However, Veksler's tree structure does not consider the full smoothness of four neighbors, resulting in worse results compared with other global optimization algorithms. Hirschmüller [33] proposed a semi-global stereo matching algorithm that uses more than four neighborhoods to overcome the shortcoming of scan-line optimization. However, both the semi-global matching and the mutual-information matching are required in his algorithm, since the semi-global matching alone is insufficient for estimating occlusion areas and thin objects.
Global optimization algorithms such as graph cut and belief propagation incorporate full neighborhood information, yet they have artifacts in occlusion areas (e.g., object boundaries), frame boundary areas, and hard-to-estimate areas (e.g., thin objects). Kolmogorov and Zabih [46] proposed a modified graph cut algorithm that considers the visual correspondence of stereo images by adding a visibility penalty to the energy term. However, it does not assign disparities to visibility-error pixels, and accurate disparity assignment cannot be guaranteed. Sun et al. [81] proposed a new energy function that includes visibility terms in the process of belief propagation, but it does not perform well in hard-to-estimate areas.
We first use a complementary tree structure to identify high-confidence regions and estimate their disparity maps using DP. Afterwards, we restore disparities in low-confidence regions from the color and disparity information of high-confidence regions using global optimization. The proposed stereo matching algorithm enhances disparity values in both occlusion and difficult-to-estimate areas (e.g., thin objects) to yield a high-quality disparity map. The rest of this chapter is organized as follows. The newly proposed stereo matching algorithm is described in Section 2.2. The experimental results are presented in Section 2.3. Finally, concluding remarks are given in Section 2.4.
2.2 Proposed Stereo Matching Algorithm
2.2.1 Overview of Proposed Solution
A stereo matching algorithm that uses different methods to estimate disparities in high- and low-confidence regions is described in this section. In order to identify high-confidence regions and estimate their disparities, we first adopt mutually complementary tree structures in conducting the DP optimization process, with the objective of imposing the smoothness constraint of the four neighboring pixels. After that, the low-confidence regions are restored by a global disparity-fitting algorithm using the information from the color input as well as the disparities of high-confidence regions. With the assistance of disparities in the high-confidence region, the global disparity fitting performs well in occlusion areas and hard-to-estimate areas, which are otherwise difficult to estimate owing to the lack of information. The proposed algorithm successfully reduces disparity errors in hard-to-estimate regions via confidence-region decomposition, and reliable disparity values in high-confidence regions serve as better boundary values for disparity estimation in low-confidence regions.
2.2.2 Disparity Estimation in High-Confidence Regions
A DP optimization algorithm simplifies an optimization problem by dividing it into a sequence of sub-problems. Mathematically, we use E(f_1, ..., f_m) to denote a function of variables f_1, ..., f_m, and it can be divided into a set of sub-problems as

E(f_1, ..., f_m) = E_1(f_1, f_2) + ... + E_{m−1}(f_{m−1}, f_m),    (2.1)

where E_n(f_n, f_{n+1}) is a sub-problem of two variables. The minima of these sub-problems are then calculated by

M_1(f_2) = min_{f_1} E_1(f_1, f_2)
M_2(f_3) = min_{f_2} [M_1(f_2) + E_2(f_2, f_3)]
    ⋮
M_k(f_{k+1}) = min_{f_k} [M_{k−1}(f_k) + E_k(f_k, f_{k+1})],    (2.2)

where M_n(f_{n+1}) denotes the minimum value of sub-problem E_n(f_n, f_{n+1}). Finally, the global minimum is determined by solving the following sub-problem:

min_f E(f_1, ..., f_m) = min_{f_m} M_{m−1}(f_m).    (2.3)
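To make the recursion concrete, here is a minimal sketch of Eqs. (2.1)–(2.3) for a generic chain of variables; the cost arrays are toy stand-ins (not the stereo data and smoothness terms defined below), and the label count is arbitrary.

```python
import numpy as np

def chain_dp(unary, pairwise):
    """Minimize a chain-decomposable energy via the M_k recursion of
    Eq. (2.2): unary is (m, L) per-variable costs, pairwise is (L, L),
    and E_n(f_n, f_{n+1}) = unary[n][f_n] + pairwise[f_n][f_{n+1}]."""
    m, L = unary.shape
    M = np.zeros((m, L))
    back = np.zeros((m, L), dtype=int)
    for k in range(1, m):
        total = M[k - 1][:, None] + unary[k - 1][:, None] + pairwise
        back[k] = total.argmin(axis=0)   # best f_{k-1} for each f_k
        M[k] = total.min(axis=0)         # the M_k(f_{k+1}) table
    # global minimum, Eq. (2.3), then backtrack the labels
    labels = [int((M[-1] + unary[-1]).argmin())]
    for k in range(m - 1, 0, -1):
        labels.append(int(back[k][labels[-1]]))
    return labels[::-1]

# toy usage: 5 variables, 3 labels, penalize label changes
costs = np.random.rand(5, 3)
smooth = 0.5 * (1 - np.eye(3))
print(chain_dp(costs, smooth))
```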
The stereo matching problem is usually formulated as an energy optimization problem of the form

E(d) = E_d(d) + λ E_s(d),    (2.4)

where E(d) is the total energy cost, E_d(d) is the data fidelity term used to penalize the mismatch between the estimated solution and the observed data, E_s(d) is the smoothness term used to penalize the non-smoothness of disparity values in neighboring pixels, and λ is a parameter that determines the ratio of the data term and the smoothness term.

Figure 2.1: Two complementary tree structures: (a) the horizontal tree structure and (b) the vertical tree structure.

The DP optimization process in Eq. (2.2) can be used to solve the problem in Eq. (2.4) containing the data fidelity term and the smoothness term involving one to four neighbors. Usually, the smoothness term will involve two neighbors in a scan-line structure, and it may involve one to four neighbors in a tree structure. In contrast, global optimization methods (e.g., BP and GC) impose the smoothness of four neighbors. If the row-wise scan line is adopted, we can incorporate the smoothness of two horizontal neighbors to get the following sub-problem:

E(p(x,y), d) = E_d(p(x,y), d) + min_{d_i∈D} [E(p(x−1,y), d_i) + E_s(d_i, d)],    (2.5)

where p(x,y) is the pixel at location (x,y) and D denotes the disparity range for a given stereo image pair.
Two complementary scan tree-structures introduced in [8] are depicted in Fig. 2.1, where the horizontal tree structure and the vertical tree structure are shown in Fig. 2.1(a) and 2.1(b), respectively. They offer complementary directions for imposing the smoothness constraints. The data processing steps in the horizontal tree-structure are mostly along horizontal directions, except for one in the vertical direction. It is ideal for regions with horizontal edges, since little error accumulates along the horizontal direction. In contrast, the vertical tree-structure is ideal for regions with vertical edges. The two complementary tree-structures can be used for high-confidence region detection. That is, if the disparity values estimated using these two structures in a region are about the same, it is claimed to be a high-confidence region.
Generally, the DP process with a tree-structure can be written as

E(p(x,y), d) = E_d(p(x,y), d) + Σ_{w∈C_v} min_{d_i∈D} [E_w(p(x,y), d_i) + E_s(d_i, d)],    (2.6)

where E_w(p(x,y), d) is the energy of a child node w of p(x,y) at disparity d and C_v is the set of child nodes of p(x,y). We will derive the DP steps for the horizontal and the vertical tree-structures shown in Fig. 2.1 below. Each pixel serves as the root node of a certain tree-structure. The DP process for a pixel with the horizontal tree-structure can be written as

E_h(p(x,y), d) = E_d(p(x,y), d)
    + Σ_{w∈C_v(x)} min_{d_i∈D} [E_w(p(x,y), d_i) + E_s(d_i, d)]
    + Σ_{w∈C_v(y)} min_{d_i∈D} [E_h^(w)(p(x,w), d_i) + E_s(d_i, d)],    (2.7)

where E_h is the horizontal energy, C_v(x) is the set of child nodes consisting of p(x−1,y) and p(x+1,y), C_v(y) is another set of child nodes consisting of p(x,y−1) and p(x,y+1), and E_h^(w)(p(x,w), d) is the horizontal child energy of the p(x,w) node when its parent node is p(x,y); it does not contain the energy propagation from p(x,y) to p(x,w). As shown in Eq. (2.7), node p(x,y) in the horizontal tree-structure has four child nodes: two from the horizontal direction and the other two from the vertical direction. By decomposing the 2nd and the 3rd terms on the right-hand side of Eq. (2.7) explicitly, we have

E_h(p(x,y), d) = E_d(p(x,y), d)
    + min_{d_i∈D} [E(p(x−1,y), d_i) + E_s(d_i, d)]
    + min_{d_i∈D} [E(p(x+1,y), d_i) + E_s(d_i, d)]
    + min_{d_i∈D} [E_h^(w)(p(x,y−1), d_i) + E_s(d_i, d)]
    + min_{d_i∈D} [E_h^(w)(p(x,y+1), d_i) + E_s(d_i, d)].    (2.8)

By following the same procedure as in Eq. (2.7), we can derive the following DP process for the vertical tree-structure:

E_v(p(x,y), d) = E_d(p(x,y), d)
    + Σ_{w∈C_v(y)} min_{d_i∈D} [E_w(p(x,y), d_i) + E_s(d_i, d)]
    + Σ_{w∈C_v(x)} min_{d_i∈D} [E_v^(w)(p(w,y), d_i) + E_s(d_i, d)],    (2.9)

where E_v^(w)(p(w,y), d) is the vertical child energy of the p(w,y) node when its parent node is p(x,y). Similarly, we can follow the same decomposition as in Eq. (2.8) to get the following equation:

E_v(p(x,y), d) = E_d(p(x,y), d)
    + min_{d_i∈D} [E(p(x,y−1), d_i) + E_s(d_i, d)]
    + min_{d_i∈D} [E(p(x,y+1), d_i) + E_s(d_i, d)]
    + min_{d_i∈D} [E_v^(w)(p(x−1,y), d_i) + E_s(d_i, d)]
    + min_{d_i∈D} [E_v^(w)(p(x+1,y), d_i) + E_s(d_i, d)].    (2.10)

The disparity of p(x,y), denoted by D(x,y), is estimated by minimizing the energy function given in Eq. (2.7) or Eq. (2.9) via

D(x,y) = argmin_{d∈D} E(p(x,y), d).    (2.11)
We obtain two disparity values at each pixel since there are two tree-structures associated with it. As an example, the disparity D_H derived from the horizontal tree-structure and the disparity D_V derived from the vertical tree-structure for a test stereo image pair used in the confidence check are shown in Fig. 2.2(a) and Fig. 2.2(b), respectively. We see horizontal cumulative errors near vertical thin objects in the D_H map and vertical cumulative errors near horizontal edge regions in the D_V map. When the two disparity values, D_H and D_V, at one point are the same, we say that it is in the high-confidence region.

Figure 2.2: Disparities derived from (a) the horizontal tree-structure and (b) the vertical tree-structure.

The left- and right-disparity consistency check also works well in detecting the high-confidence region. That is, a region in which the left disparity is not consistent with the right disparity is excluded. Furthermore, local peaks in the map have high disparity error and should be removed as well.
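A minimal sketch of this confidence test is shown below, combining agreement of the two tree-structure maps with the left–right consistency check; the array conventions and the 1-pixel tolerance are assumptions for illustration.

```python
import numpy as np

def high_confidence_mask(d_h, d_v, d_left, d_right, tol=1):
    """High-confidence pixels: the horizontal- and vertical-tree
    disparities agree, and the left map survives a left-right
    consistency check against the right map."""
    h, w = d_h.shape
    agree = d_h == d_v
    xs = np.tile(np.arange(w), (h, 1))
    # position of the matching pixel in the right image
    xr = np.clip(xs - np.rint(d_left).astype(int), 0, w - 1)
    d_r_at_match = np.take_along_axis(d_right, xr, axis=1)
    lr_consistent = np.abs(d_left - d_r_at_match) <= tol
    return agree & lr_consistent
```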
2.2.3 Disparity Estimation in Low-Confidence Regions
The disparities in low-confidence regions can be estimated from their own color information and the disparities in high-confidence regions with a global disparity-fitting algorithm. To satisfy the fidelity of the color information, we can write the energy function as

E(d̃) = Σ_{d̃_i∈d̃} { d̃_i − Σ_{d_j∈N(d_i)} ω_{ij} d̃_j }² = (1/2) d̃ᵀ Q d̃,    (2.12)

where d̃ is the final disparity vector, d̃_i is the i-th element of the final disparity vector, N(d_i) denotes a neighborhood (a 3×3 window in our implementation), and ω_{ij} is the color similarity between the i-th and the j-th color pixels, which have the same locations as d̃_i and d̃_j, respectively. To ensure a smooth disparity transition between high- and low-confidence regions, we define the following ultimate cost function:

J(d̃) = E(d̃) + λ (d̃ − d)ᵀ D (d̃ − d),    (2.13)

where d is a vector consisting of the disparity values obtained in the high-confidence region, D is a diagonal matrix whose diagonal elements are 1 for constrained pixels in the high-confidence region and 0 for unconstrained pixels in the low-confidence region, and λ is the Lagrange parameter. By differentiating Eq. (2.13) with respect to d̃, we obtain the following sparse linear system

(Q + λD) d̃ = λd,    (2.14)

whose solution corresponds to the global minimum of the energy function in Eq. (2.13) [50].
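Numerically, Eq. (2.14) is a single sparse linear solve. A minimal sketch with SciPy follows, assuming Q has already been assembled from the color-similarity weights of Eq. (2.12) and that conf_mask marks the high-confidence pixels; both inputs are hypothetical here.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def fit_low_confidence(Q, d_hi, conf_mask, lam=1e3):
    """Solve (Q + lam*D) d_tilde = lam*d from Eq. (2.14).

    Q         : sparse matrix built from the color-similarity weights
    d_hi      : disparity map, valid only where conf_mask is True
    conf_mask : True for high-confidence (constrained) pixels
    """
    D = sp.diags(conf_mask.ravel().astype(np.float64))
    d = np.where(conf_mask, d_hi, 0.0).ravel()   # zero outside constraints
    A = (Q + lam * D).tocsr()
    d_tilde = spla.spsolve(A, lam * d)
    return d_tilde.reshape(d_hi.shape)
```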
We give an example to illustrate the above disparity-fitting process in Fig. 2.3. The high-confidence disparity map is shown in gray levels, while the low-confidence regions are shown in white in Fig. 2.3(a). This is the input to the disparity-fitting process. The output disparity map is shown in Fig. 2.3(b). The disparity values in low-confidence regions are restored with the global cost function given in Eq. (2.13). We see from Fig. 2.3(b) that both the occlusion region and the hard-to-estimate region near the three brushes are successfully estimated with the global fitting algorithm.

Figure 2.3: (a) The disparity values in high-confidence regions are shown in gray levels, while the low-confidence region, whose disparity is to be recovered, is shown in white. (b) The output disparity map, where the disparities in low-confidence regions are restored via disparity fitting.
2.2.4 Post-Processing
The disparity-fitting process described in Sec. 2.2.3 can be done using either the left- or the right-disparities, yet the two resulting disparity maps may not be the same. The output disparity maps obtained from the left- and the right-disparities are shown in Figs. 2.4(a) and (b), respectively.
Figure 2.4: Disparity maps before and after post-processing: (a) left disparity-fitting result before post-processing, (b) right disparity-fitting result before post-processing, (c) left disparity-fitting result after post-processing, and (d) right disparity-fitting result after post-processing.
We can clearly see their differences, which are caused by inconsistencies between the left-disparities and the right-disparities. This type of error can be corrected by the left- and right-disparity consistency check. Furthermore, the background region can be wrongly restored with the foreground disparity in the disparity-fitting process when the color information in Eq. (2.12) is too similar or insufficient to find a good disparity value. These post-processing steps are needed to enhance the disparity map quality. Figs. 2.4(c) and (d) show the post-processing results of Figs. 2.4(a) and (b), respectively. As compared with the disparity maps before post-processing, the differences after post-processing have been significantly reduced. The boundaries of the post-processed disparities are clearer since the background and the foreground are better separated through the left and right consistency check.
Table 2.1: Middlebury Stereo Evaluation, error percentage with error threshold = 2 pixels (nocc / all)

Algorithm         | Tsukuba       | Venus         | Teddy          | Cones
Proposed          | 3.435 / 4.465 | 0.993 / 1.634 | 6.203 / 11.123 | 6.445 / 11.674
Tree DP [90]      | 1.733 / 2.554 | 1.215 / 1.895 | 8.805 / 17.205 | 5.574 / 14.105
SGM [33]          | 2.014 / 2.523 | 0.362 / 0.682 | 3.462 / 6.552  | 2.381 / 8.151
Occlusion GC [46] | 0.882 / 1.662 | 1.104 / 1.573 | 7.034 / 11.374 | 4.163 / 10.733
Symmetric BP [81] | 0.701 / 1.371 | 0.141 / 0.291 | 2.651 / 4.481  | 3.262 / 8.392
2.3 Experimental Results
In the experiments, we choose the $3 \times 3$ sum-of-squared-differences (SSD) as the matching cost for the data term and the Potts model in a Markov Random Field as the smoothness term. If $d_i$ is equal to $d$, $E_s(d_i, d)$ is zero (i.e., no penalty). Otherwise, a penalty is given, which is inversely proportional to the color difference between the $d_i$ and $d$ locations. A large $\lambda$ value is used in the global disparity-fitting step to prevent changes to high-confidence disparities. The Middlebury data set and several stereo sequences, either captured by a stereo camera (Man and Woman) or synthesized via computer animation (Bird and Coral), are used to evaluate disparity quality. The MPEG data set is used for experiments on rendered image quality.
2.3.1 Middlebury test set
The performance of five stereo matching algorithms is compared in Fig. 2.5 and Fig. 2.6, where disparity maps for Tsukuba, Venus, Teddy, and Cones in the Middlebury test data are shown. The results of the proposed algorithm are good in the occlusion and frame boundary areas, maintaining object structure well. Moreover, disparities in thin-object areas are well estimated as segmented results, as shown in Figs. 2.5-2.6.
Figure 2.5: Comparison of results on Middlebury data; first column: ground truth images, second column: proposed algorithm, third column: tree DP [90].
The semi-global matching results are of good quality, but errors exist in the frame boundary region. The symmetric BP gives accurate results in occlusion regions, but there are artifacts near the frame boundary and thin objects. The occlusion GC performs poorly in occlusion regions.
Figure 2.6: Comparison of results on Middlebury data; first column: semi-global matching [33], second column: occlusion GC [46], third column: symmetric BP [81].
The results of the proposed algorithm are good in the aspect of object shape restoration. Frame boundary areas, which do not have corresponding points in the stereo pairs, are restored reasonably with respect to object shape.
Figure 2.7: Comparison of results on Middlebury data, synthesized new view between the left and right stereo pairs; first column: ground truth images, second column: proposed algorithm, third column: tree DP [90].
In spite of the good-looking disparity results of the proposed algorithm, the error ratio defined by the Middlebury test is not good. Table 2.1 shows the error ratio when the error threshold is defined as up to 2 pixels. In the results of Table 2.1, the rank of the proposed algorithm is always below average despite the good qualitative results. This is because the disparity estimation for the low-confidence region (Section 2.2.3) does not estimate accurate disparity values but rather the shape of objects.
Figure 2.8: Comparison of results on Middlebury data, synthesized new view between the left and right stereo pairs; first column: semi-global matching [33], second column: occlusion GC [46], third column: symmetric BP [81].
Figure 2.9: Images captured by stereo cameras: (a) Man, (b) Woman, (c) the estimated disparity for Man, (d) the estimated disparity for Woman.
However, even though the Middlebury evaluation results are not good, the rendering results are good because the object shapes are well restored. Fig. 2.7 and Fig. 2.8 show views synthesized between the left and right image pairs.
2.3.2 Real and synthetic stereo
In order to test general cases, contents captured by stereo cameras or synthesized by computers are used as test contents. We choose complicated contents that contain thin objects at different depth locations. As shown in Fig. 2.9 and Fig. 2.10, these objects are estimated with excellent quality. As shown in Fig. 2.10, the disparity results are delicately estimated even though the objects are very complex. Thin objects such as wood branches and sharp leaves are well estimated in Fig. 2.9(c) and (d), coral branches with different depth values are well estimated in Fig. 2.10(d), and sophisticated details such as the fins of fishes are finely estimated in Fig. 2.10(c).
Figure 2.10: Computer-synthesized stereo images: (a) Bird, (b) Coral, (c) the estimated disparity for Bird, (d) the estimated disparity for Coral.
2.4 Conclusion
A new method that decomposes the disparity map into high-confidence and low-confidence regions and processes them separately was proposed in this work to estimate the disparity map from a stereo image pair. These two regions can be identified using complementary tree-structures. The stereo image inputs are first used in the high-confidence region to estimate the disparity values. Afterwards, the disparities in high-confidence regions as well as the color image of one view (either the left-view or the right-view) in the low-confidence region are used to estimate the disparities of the low-confidence region via global energy minimization.
The new algorithm works well for thin objects, occlusion areas, and frame boundary areas, which are difficult to estimate using only the stereo matching technique. It works well for the Middlebury test sequences, as shown by the experimental results. It also performs well for real-world captured images and computer-synthesized cartoon images, producing stable and high-quality disparity maps. We would like to extend the proposed algorithm to uncalibrated stereo image pairs in the near future.
Figure 2.11: Extra Middlebury results
Figure 2.12: Extra Middlebury results
Figure 2.13: Extra Middlebury results
Figure 2.14: Extra Middlebury results
Chapter 3
Uncalibrated Multiview Synthesis
3.1 Introduction
The rapid development of three-dimensional (3D) display technologies allows consumers to enjoy the 3D world visually through different display systems such as stereoscopic, multiview, light field, and holography displays. As a pre-processing step, the 3D content should be manipulated properly to fit each display type to offer the best user experience.
The most common 3D source is stereo input. It is difficult to capture ideal stereo images with the same rotation and translation parameters. For a stereoscopic display, non-ideal stereo images do not result in serious artifacts, since people cannot perceive distortion within a certain range, namely 0.7%±0.5% of the vertical disparity difference [66]. However, non-ideal stereo images seriously deteriorate the perceived visual quality of other 3D displays. This is because traditional stereo matching and view synthesis algorithms are vulnerable to non-ideal stereo images. To address this problem, a proper calibration process is essential for a 3D display system.
For the disparity estimation part, uncalibrated stereo images deteriorate the disparity results because matching points do not exist in the stereo image pair. Even when the stereo images are calibrated, they should be rearranged so that the epipolar lines are parallel in order to reduce the complexity of the stereo matching algorithm [74]. The process that generates parallel epipolar lines is called rectification [27, 32, 70]; however, the rectification transform yields unwanted image warping distortion.
Mallon effectively reduced the rectification warping distortion by enforcing a homography to minimize the projective error, but some warping distortion still remained [56]. Methods such as minimizing resampling effects and metric rectification to reduce warping distortion have been published, but none of them can completely remove the distortion [28, 53]. Zhou proposed a new rectification algorithm suitable for multiview displays, but its results still contain a black region due to the warping distortion [99].
For video sequences, the rectification algorithm is applied to each stereo image pair because matching features are detected in every frame. An image-based rectification algorithm yields different rectification parameters for each video sequence, and these different rectification parameters deteriorate temporal consistency. Bleyer proposed a temporally consistent disparity map algorithm for uncalibrated stereo videos [9]. It first calibrates stereo images and rectifies them. Disparity map temporal smoothing is then applied to the rectified stereo video sequences. Although this approach can overcome artifacts due to temporal disparity inconsistency, the rectification warping problem still remains.
Uncalibrated view synthesis algorithms have been proposed that use camera pose estimation [23, 25, 26]. These methods estimate the camera poses of virtual viewpoints or use homography interpolation. The camera pose estimation method has synthesis artifacts caused by epipolar geometry estimation error. The homography interpolation algorithm generates little warping distortion, but it has serious temporal flickering due to the temporally inconsistent warping distortion.
In this chapter, we propose a robust multiview rendering algorithm for autostereoscopic multiview displays that can provide well-calibrated results without warping distortion.
Figure 3.1: Lenticular lens multiview display and its viewpoints
3.2 Multiview Display Rendering
3.2.1 Multiview Display Viewpoints (MDVs)
According to the lenticular lens or parallax barrier design, MDVs are aligned along one line [65]. The lenticular lens multiview display and its linearly arranged viewpoints, $V_1$ to $V_N$, are shown in Fig. 3.1. The camera models for $V_1$ to $V_N$ are

$$V_1 = K[R \; t_1], \;\ldots,\; V_N = K[R \; t_N], \quad (3.1)$$

where $K$ is the intrinsic parameter, $R$ is the extrinsic rotation parameter, and $t_i$, $i = 1, \ldots, N$, is the extrinsic translation parameter. Each $V_i$ should have the same $K$ and $R$, and the $t_i$ are located on the same line with the same interval.
3.2.2 Multiview Rendering from Ideal Stereo
If the stereo images are ideal, then the cameras for the MDVs satisfy Eq. (3.1). This means that the MDVs of the multiview images are horizontally displaced [20], and therefore the corresponding points and the interpolated point are only horizontally displaced.
The interpolated multiview image is

$$I_n\big(\alpha x_1 + (1-\alpha) x_2\big) = \alpha\, I_1(x_1) + (1-\alpha)\, I_2(x_2), \quad (3.2)$$

where $I_1$, $I_2$, and $I_n$ are the left, right, and interpolated images, respectively, and $x_1$ and $x_2$ are corresponding points between the left and right images.
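A minimal sketch of Eq. (3.2) as a forward-mapping step is given below; the function name, the rounding of target positions, and the out-of-range test are our simplifications (a full renderer also resolves occlusions and fills holes):

```python
import numpy as np

def interpolate_view(I1, I2, x1, x2, alpha):
    """Forward-map matched points into the interpolated view, Eq. (3.2).

    I1, I2 : left / right images, (H, W, 3)
    x1, x2 : (N, 2) integer arrays of corresponding (row, col) points
    alpha  : alpha = 1 reproduces the left view, alpha = 0 the right view
    """
    out = np.zeros_like(I1)
    xn = np.round(alpha * x1 + (1 - alpha) * x2).astype(int)  # target pixels
    ok = ((xn[:, 0] >= 0) & (xn[:, 0] < I1.shape[0])
          & (xn[:, 1] >= 0) & (xn[:, 1] < I1.shape[1]))
    c1 = I1[x1[ok, 0], x1[ok, 1]].astype(float)
    c2 = I2[x2[ok, 0], x2[ok, 1]].astype(float)
    out[xn[ok, 0], xn[ok, 1]] = (alpha * c1
                                 + (1 - alpha) * c2).astype(I1.dtype)
    return out
```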
3.2.3 Previous Multiview Rendering from Non-ideal Stereo
The exact camera pose estimation algorithm specifies camera poses for virtual viewpoints using relative affine structure [25]. The rotation angle and camera center for virtual viewpoint $n$ are

$$\theta_n = (1-\alpha)\,\theta_1 + \alpha\,\theta_2, \qquad O_n = \mathrm{slerp}(O_1, O_2, \alpha), \quad (3.3)$$

where $\theta_1$ and $\theta_2$ are the rotation angles of $R_1$ and $R_2$, respectively, and $\mathrm{slerp}$ is the spherical linear interpolation of $O_1$ and $O_2$ at distance $\alpha$. Because a relative affine structure is assigned for every matching point, this method is complex. Moreover, if the epipolar geometry estimation contains error, the synthesized virtual view image becomes contaminated because it implicitly contains the relative affine structure.
The approximate version of exact camera pose estimation uses homography interpolation to synthesize virtual viewpoints [23]. It simplifies the view synthesis algorithm by not using relative affine structure, but instead applying homography interpolation. Therefore, it does not deteriorate because of the epipolar geometry error. However, it contains warping distortion induced by homography warping. Moreover, it generates temporal flickering artifacts caused by the warping distortion.
3.3 Algorithm
In this section, we propose a method to approximate the camera poses of MDVs that does not produce calibration or warping distortion artifacts. The method takes uncalibrated stereo as input.
Figure 3.2: Epipolar geometry of the stereo pair and viewpoint $V_n$.
3.3.1 Epipolar Geometry of the Algorithm
In Fig. 3.2, we depict the epipolar geometry of the MDVs, where the left and right image planes are denoted by $\pi_1$ and $\pi_2$, respectively. A point $X$ in world coordinates becomes $x_1$ in the left image and $x_2$ in the right image. It is straightforward to derive the following:

$$x_1 = K[R_1 \; t_1]\, X, \quad (3.4)$$

$$x_2 = K[R_2 \; t_2]\, X, \quad (3.5)$$

where $K[R_i \; t_i]$, $i = 1, 2$, is a geometric transformation matrix, and $R_i$ and $t_i$ are the associated rotation matrix and translation vector, respectively.
The camera pose of $\pi_n$ in Fig. 3.2 for viewpoint $V_n$ is associated with the translation vector $(1-\alpha)t_1 + \alpha t_2$ and the rotation matrix $R_n$. We have the following proposition.
Proposition: Let $x_n$ be an image point of $X$ at viewpoint $V_n$. If $R_1$ and $R_2$ are sufficiently similar (this is quantified in the body of the proof), $x_n$ can be well approximated by

$$x_n \approx (1-\alpha)\, x_1 + \alpha\, x_2. \quad (3.6)$$
Proof 1: By substituting Eqs. (3.4) and (3.5) into Eq. (3.6), we obtain

$$x_n = K\big[(1-\alpha)R_1 + \alpha R_2 \;\big|\; (1-\alpha)t_1 + \alpha t_2\big]\, X, \quad (3.7)$$

since $K$ is a linear operator. Thus, the proposition implies

$$R_n \approx (1-\alpha)R_1 + \alpha R_2 \quad (3.8)$$

and

$$O_n \approx (1-\alpha)O_1 + \alpha O_2. \quad (3.9)$$

We would like to argue that Eq. (3.8) is valid. Let $R_{1\to 2}$ be the rotation matrix from $V_1$ to $V_2$. Generally, we can represent a rotation matrix with the same axis but with the angle scaled by $\alpha$ as $R_{1\to 2}^{\alpha}$ if the following Taylor series converges:

$$R_{1\to 2}^{\alpha} = \sum_{k=0}^{\infty} \frac{\alpha(\alpha-1)\cdots(\alpha-k+1)}{k!}\,(R_{1\to 2} - I)^k. \quad (3.10)$$

The necessary and sufficient condition for this convergence is given by

$$\max_{i=1,2,3} \big|\lambda_i(R_{1\to 2}) - 1\big| < 1, \quad (3.11)$$

where $\lambda_1$, $\lambda_2$, and $\lambda_3$ are the eigenvalues of $R_{1\to 2}$. If we denote by $\theta$ the rotation angle of the matrix $R_{1\to 2}$, the eigenvalues are equal to $1$, $e^{i\theta}$, and $e^{-i\theta}$, respectively. Therefore, the inequality in (3.11) reduces to

$$2\sin(\theta/2) < 1, \quad \text{or} \quad |\theta| < \pi/3. \quad (3.12)$$
If $0 \le \alpha \le 1$, the magnitudes of the $k$th-order term elements are smaller than $|2\sin(\theta/2)|^k / k$. Hence, for sufficiently small $\theta$, we may approximate the series by its first-order term, i.e.,

$$R_{1\to 2}^{\alpha} \approx I + \alpha\,(R_{1\to 2} - I). \quad (3.13)$$

By definition, $R_{1\to 2} = R_2 R_1^{-1}$ and $R_n = R_{1\to 2}^{\alpha} R_1$. Exploiting these facts, we finally prove Eq. (3.8) as

$$R_n \approx \big(I + \alpha(R_2 R_1^{-1} - I)\big) R_1 \quad (3.14)$$

$$= (1-\alpha)R_1 + \alpha R_2. \quad (3.15)$$
In the following, we argue that Eq. (3.9) is also valid. The translation vector for $\pi_n$ in Fig. 3.2 is

$$t_n = (1-\alpha)\,t_1 + \alpha\,t_2. \quad (3.16)$$

If we substitute $t$ with $O$,

$$R_n O_n = (1-\alpha)R_1 O_1 + \alpha R_2 O_2, \quad (3.17)$$

and for sufficiently small $\theta$,

$$O_n \approx (1-\alpha)O_1 + \alpha O_2. \quad (3.18)$$
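The approximation in Eq. (3.8) is easy to verify numerically; the sketch below compares a spherical linear interpolation of two nearby rotations against the linear blend $(1-\alpha)R_1 + \alpha R_2$ (the test angles and axis are arbitrary small values chosen for illustration):

```python
import numpy as np
from scipy.spatial.transform import Rotation, Slerp

# Two nearby camera orientations, 2 degrees apart about the y axis.
key = Rotation.from_euler('y', [1.0, 3.0], degrees=True)
slerp = Slerp([0.0, 1.0], key)
R1, R2 = key[0].as_matrix(), key[1].as_matrix()

for alpha in (0.25, 0.5, 0.75):
    exact = slerp(alpha).as_matrix()            # exact interpolated R_n
    approx = (1 - alpha) * R1 + alpha * R2      # linear blend of Eq. (3.8)
    print(alpha, np.abs(exact - approx).max())  # on the order of 1e-4 here
```

As Eq. (3.13) predicts, the discrepancy shrinks quadratically as the relative rotation angle decreases.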
3.3.2 Implementation
The corresponding points of the uncalibrated stereo input can be obtained from the rectification homographies and the disparity of the rectified stereo pair. With the corresponding points, we can generate multiview images that contain neither uncalibration nor warping distortion artifacts using the epipolar approximation of Eq. (3.6).
Figure 3.3: (a) Original left image, (b) rectified left image, (c) rectified right image, and (d) original right image.
We implemented the multiview synthesis algorithm for a pair of stereo images using the rectification homographies and the disparity map from the rectified stereo images. An example is given in Fig. 3.3, where $x_1$ and $x_2$ are uncalibrated matching points and $x_{r1}$ and $x_{r2}$ are their corresponding rectified image points. We have

$$x_{r1} = H_1 x_1, \qquad x_{r2} = H_2 x_2, \quad (3.19)$$
where $H_1$ and $H_2$ are homography matrices. The disparity between $x_{r1}$ and $x_{r2}$ contains only a horizontal term, which is denoted by $d(x_{r2})$. One can calculate the corresponding point $x_2$ for $x_1$ using

$$x_2 = H_2^{-1}\left( H_1 x_1 + \begin{bmatrix} d(H_1 x_1) \\ 0 \\ 0 \end{bmatrix} \right), \quad (3.20)$$

and its interpolation point using Eq. (3.2).
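A minimal sketch of Eqs. (3.19)-(3.20) follows; the disparity lookup is abstracted as a callable, which is our simplification rather than the thesis implementation:

```python
import numpy as np

def correspond(x1, H1, H2, disparity):
    """Map a left-image point to its right-image match, Eqs. (3.19)-(3.20).

    x1        : homogeneous left-image point, shape (3,)
    H1, H2    : 3x3 rectification homographies for the left / right images
    disparity : callable returning the horizontal disparity at a rectified
                left-image point (a lookup into the disparity map)
    """
    xr1 = H1 @ x1
    xr1 = xr1 / xr1[2]                      # rectified left point, w = 1
    d = disparity(xr1)                      # horizontal-only disparity
    xr2 = xr1 + np.array([d, 0.0, 0.0])     # rectified right point
    x2 = np.linalg.solve(H2, xr2)           # back to the original right image
    return x2 / x2[2]
```

The interpolated point then follows from Eq. (3.2) applied to $x_1$ and the returned $x_2$.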
Figure 3.4: Feature points and epipolar lines of left and right images.
3.4 Experimental Results
To obtain the rectification homographies of stereo images, we use the Harris corner [31], FAST [72], DOG [57], and Harris-Laplace [54] detectors for feature detection, and SIFT [55] for the descriptor. The final matching points are then selected using RANSAC [22]. However, since features tend to be detected only in textured regions, we use a feature spread algorithm. Using the final matching points, the rectification homographies are obtained by Fusiello's rectification algorithm [27]. For the stereo matching between the rectified stereo inputs, we use a global dynamic programming optimization algorithm applied on a tree structure [38].
Figure 3.4 shows the input stereo images, their matching points, and epipolar lines. The vertical disparity of this image is 3.085 pixels, which is less than the human detection limit of 0.7%±0.5%, i.e., 7.56±0.5 pixels [66]. However, because their epipolar lines are not parallel, we observe a warping distortion after rectification.
Figure 3.5 shows the image for $V_n$ with $\alpha = 0.5$. The result of the proposed algorithm is well calibrated and does not contain warping distortion, as shown in Fig. 3.5(a). In contrast, blurring occurs in Fig. 3.5(b) because of the uncalibrated stereo input, while warping distortion caused by rectification is observed in Fig. 3.5(c).
Figure 3.5: Interpolated view using (a) the proposed algorithm, (b) an uncalibrated
stereo image, and (c) a calibrated stereo image.
Table 3.1: Comparison Results

Algorithm        | Black region (%) | Vertical disparity (pixels) | Temporal disparity (pixels) | Consistency (%)
Original         | 0                | 1.33                        | 0                           | 81.66
Fusiello [27]    | 11.75            | 0.27                        | 14.13                       | 87.95
Monasse [56]     | 2.62             | 0.23                        | 21.05                       | 87.95
Proposed method  | 0                | -                           | 0                           | 87.95
Homography [23]  | 0.46             | -                           | 3.044                       | 87.95
We used 25 stereo video sequences with a total of 6104 frames. The sequence image size was FHD (1920×1080). The test videos contain real stereo content captured by a stereo camera or synthetic stereo content created via computer animation. We define the vertical disparity to be the average vertical displacement of matched feature points in the stereo images.
Figure 3.6: Vertical disparity of test stereo video sequences
The vertical disparity for each individual video sequence is shown in Fig. 3.6. Although the test stereo video sequences are broadcasting content, they are not as well calibrated as ideal stereo videos. The synthetic stereo contents also exhibit vertical disparity.
Table 3.1 compares the black region, vertical disparity, temporal disparity, and consistency results of the proposed method with other algorithms. The black region is the unassigned area ratio caused by the rectification warping. Temporal disparity measures the displacement, introduced by the rectification algorithm, of features that do not actually move between frames. The temporal disparity reduces temporal consistency. If the matching points are matched again by the stereo matching algorithm, we define the matched points to be consistent features. The ratio between consistent features and all features is the consistency value. Consistency indicates stereo matching algorithm performance.
The results of Fusiello and Monasse contain a larger black region because of the warping distortion, even though they reduce vertical disparities. Moreover, the results of Fusiello and Monasse have large temporal disparities that yield temporal inconsistency. The original stereo does not contain black regions or temporal disparity, but it has low consistency. This means that, because of the uncalibrated input, the stereo matching algorithm cannot perform well. Even though the homography interpolation algorithm does not contain a serious amount of black region, it has serious temporal disparity, resulting in serious temporal flickering in the video. On the other hand, the proposed algorithm does not contain any black region or temporal disparity. It also has a good consistency value.
3.5 Conclusion
An efficient multiview rendering algorithm with uncalibrated stereo images as the input was proposed in this chapter. Rather than generating accurate multiview images that yield unwanted warping distortion and temporal inconsistency artifacts, we proposed a new model for uncalibrated camera poses using an epipolar geometry approximation. The approximated model was mathematically analyzed and, as a result, is guaranteed to work well for non-ideal stereo images. We derived a model for multiview viewpoints from non-ideal stereo input and demonstrated the effectiveness of the proposed algorithm by comparing the interpolated views of three different algorithms.

Figure 3.7: Interpolated view using the proposed algorithm for a video sequence.
Figure 3.8: Interpolated view using an uncalibrated stereo image for a video sequence.
Figure 3.9: Interpolated view using a calibrated stereo image for a video sequence.
Chapter 4
Efficient Light-Field Rendering using Depth Maps for 100-Mpixel Multi-Projection 3D Display
4.1 Introduction
Progress in the development of 3D displays has enabled us to reproduce a more realistic 3D world. Stereoscopic displays, introduced in the first developmental stage, offer two stereoscopic images, which provide 3D effects but cannot induce motion parallax. Later, multiview displays reproduced a more realistic 3D world with motion parallax by using multiview images. However, this motion parallax can only be experienced when viewed from certain locations. Moreover, the accommodation-vergence conflict, which is a disagreement between the focal point of the eyes and the intersection point of the lines of sight [36], still occurs in multiview displays. It causes eyestrain and discomfort for viewers. Various studies are currently underway to develop 3D displays that solve these problems using super multiview displays [64, 83-85], tensor displays [91], multi-directional backlight [19], integral imaging [37, 93], and multi-projection displays [3, 41, 43, 44, 63]. Beyond such studies, Holografika has demonstrated and commercialized various multi-projection 3D displays [4, 5].
Panel-based autostereoscopic displays [14, 16, 68] decrease image resolution by increasing the number of light rays emanating from each pixel. This limits the angle and distance at which 3D images can be clearly viewed. Consequently, changes in parallax images create discontinuity, and natural 3D images cannot be displayed.
A large number of light rays are necessary for the reproduction of realistic 3D displays because, in the real world, we perceive 3D objects based on an abundance of light rays. Light-field displays are good candidates for showing real 3D images without decreasing the image resolution because they reproduce as many light rays as possible. Light-field displays create light rays using a large number of projectors in order to express the 3D world. Consequently, very smooth motion parallax without discontinuity can be acquired, and natural 3D images are obtained without limiting the viewing distance.
4.1.1 Light-field Rendering
A large number of projectors can improve 3D displays and create a natural 3D world using an abundant light source. However, processing a huge number of light rays is a substantial task. Image-based rendering (IBR) algorithms are popular for processing light-field images [78], [79]. IBR algorithms do not require geometry; instead, a collection of sample images is used to render a novel view. In light-field rendering, this collection can be comprised of multiview images, which can be obtained using a large number of input camera arrays [92] or a multiview rendering synthesis algorithm [20, 80].
Another technique used to acquire light rays with a light-field rendering algorithm was introduced in [1, 29, 51, 52, 59]. The authors suggested the use of 2D slices of a 4D function, which does not require a geometric structure. Moreover, its degree of complexity is reasonable because it only selects previously acquired images. However, it requires a large number of input images and, therefore, a substantial amount of memory.
Light-field rendering algorithms that use depth maps, requiring relatively fewer input images and disparity maps, were introduced in [6, 69]. However, they do not generate light-field images using depth maps directly; they first generate multiview images and then create light-field images with an IBR algorithm. Viewpoint-based multiview image processing is sufficient for stereoscopic and multiview displays because these displays produce only a few viewpoint light rays, as shown in Fig. 4.1(a).
Figure 4.1: Light ray configuration of (a) a multiview display and (b) a light-field display. The light rays of the multiview display converge to certain points, but those of the light-field display do not.
However, the light-field display requires a large number of viewpoints, as shown in Fig. 4.1(b). A substantial number of multiview images must be generated for the IBR algorithm. As a result, this technique is complex and needs a large amount of memory storage. Direct light-field generation using a depth map was introduced by Arachchi et al. [2], but it does not contain sufficient information such as the specific algorithm process and its effects.
In this chapter, a 100-Mpixel multi-projection display is developed for medical and educational applications. It has a viewing angle of 24° and produces natural 3D video with very smooth motion parallax. We also propose an efficient light-field image generation algorithm based on depth maps, which transforms viewpoint images into the world coordinate structure and maps them to the light-field space.
The rest of this chapter is organized as follows. The design of the 100-Mpixel multi-projection 3D display is described in Section 4.2. An efficient light-field rendering that uses depth maps is introduced in Section 4.3. Experimental results are presented in Section 4.4. Finally, concluding remarks are given in Section 4.5.
Figure 4.2: Basic configuration and operating principle of the multi-projection 3D display. The light rays are created using 96 projectors, and a vertical diffuser screen controls the direction of the light rays.
4.2 Multi-projection 3D Display
Our 100-Mpixel multi-projection 3D display is composed of multiple projectors, which create the light rays, and a vertical diffuser screen that optically controls the projectors' light rays.
4.2.1 Operating Principle
Horizontal parallax only (HPO) light-field displays show images that have only horizontal parallax, and they require a specially designed vertical diffuser screen [3, 43]. Figure 4.2 shows the basic configuration and operating principle of the multi-projection 3D display. Numerous projectors are arranged horizontally and vertically in order to produce a large number of light-field rays. Each projector creates a different angle with the screen in order to separate the light-field rays into individual light rays. The screen must compensate for the differences in the vertical positions of the projectors because the three-dimensional images have only horizontal parallax. Therefore, the vertical diffuser screen needs to have a small horizontal and a large vertical diffusion angle. When this vertical diffuser screen is used, the image from one projector is seen as one vertical block. Because two side mirrors are arranged outside of the projectors, the perceived image is composed of 182 vertical light-field blocks from the 96 projectors. That is, each block corresponds to the combination of a projected image and its reflection from the side mirrors.
Figure 4.3: System design and parameters of the multi-projection 3D display; an overhead view of the basic arrangement is shown.
4.2.2 System Design
Figure 4.3 shows the overhead view of the basic arrangement of the multi-projection 3D display [49]. The basic layout is determined by the viewing angle $\theta_o$, the horizontal length of the screen $W_s$, and the total number of projectors $n$. Two side mirrors are arranged between the projector array and the screen, and they are slightly tilted to increase the viewing angle. The tilt angle of a mirror $\theta_m$ and the distance between the mirrors near the projector array $W$ can be calculated from the viewing angle and the horizontal length of the screen. The length of the viewing zone $W_v$ is expressed by
$$W_v = 2 D_o \tan(\theta_o / 2), \quad (4.1)$$

where $D_o$ denotes the viewing distance. The maximum deflection angle of the light beam, $\theta$, can be calculated from

$$W_s : W_v = D_{p1} : D_{p2}, \quad (4.2)$$

$$\theta = \tan^{-1}\!\Big(\frac{1}{2}\,\frac{W_v}{D_{p2}}\Big). \quad (4.3)$$

The tilt angle of a mirror is expressed by

$$\theta_m = \frac{1}{2}\Big(\theta - \frac{1}{2}\,\theta_{proj}\Big), \quad (4.4)$$

where $\theta_{proj}$ is the horizontal projection angle of a projector. The distance between the mirrors near the projector array is expressed by

$$W = 2 D_p \tan\theta_m + W_s, \quad (4.5)$$

where $D_p$ is the projection distance.
The maximum number of projectors in the horizontal direction $n_h$ is determined by

$$n_h \le W / (w + m), \quad (4.6)$$

where $w$ is the width of a projector and $m$ is the marginal space between projectors. The total number of projectors is represented by

$$n = n_h\, n_v, \quad (4.7)$$

where $n_v$ is the number of projectors in the vertical direction. Since a small $n_v$ is advantageous for minimizing the keystone effect, $n_h$ is set to the largest value that satisfies Eq. (4.6) and Eq. (4.7).
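The layout equations (4.1)-(4.7) can be evaluated directly. The sketch below uses the symbols of Fig. 4.3 (lengths in mm, angles in degrees) and is illustrative rather than the design tool actually used:

```python
import math

def design_parameters(theta_o_deg, W_s, D_o, D_p2, theta_proj_deg,
                      D_p, w, m, n_v):
    """Evaluate Eqs. (4.1)-(4.7) for the multi-projection display layout."""
    W_v = 2 * D_o * math.tan(math.radians(theta_o_deg) / 2)        # Eq. (4.1)
    theta = math.atan(0.5 * W_v / D_p2)                            # Eq. (4.3)
    theta_m = 0.5 * (theta - 0.5 * math.radians(theta_proj_deg))   # Eq. (4.4)
    W = 2 * D_p * math.tan(theta_m) + W_s                          # Eq. (4.5)
    n_h = int(W // (w + m))                                        # Eq. (4.6)
    n = n_h * n_v                                                  # Eq. (4.7)
    return W_v, math.degrees(theta_m), W, n_h, n
```

For instance, with $\theta_o = 24°$ and $D_o = 1500$ mm, Eq. (4.1) yields $W_v \approx 638$ mm, which matches the value listed in Table 4.1.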
4.3 Depth Image Based Light-field Rendering
In this chapter, we propose an efficient light-field rendering algorithm that directly generates light-field images using depth maps. In contrast, previous algorithms first generate multiview images and then render light-field images from the multiview images with an IBR algorithm. The amount of image memory required by the proposed algorithm is much smaller than that of IBR-based light-field rendering algorithms. Moreover, the proposed algorithm is much simpler than IBR-based light-field rendering algorithms.
The light-field rendering algorithm is composed of calibration, disparity estimation, 3D modeling, horizontal parallax only rendering, mirror reflection light-field rendering, and consistent hole filling. This study conducts the light-field rendering with a DIBR algorithm.
4.3.1 Calibration and Disparity Estimation
In order to perform disparity estimation on the stereo input contents, the input stereo contents should be rectified so that the matching points of the left and right images are located at the same vertical position [74]. First, we apply a rectification algorithm to the stereo contents [27].
Since the field of view of the projection-type 3D display is much larger than that of a stereo display, we use wide-baseline multiview images. In order to estimate the disparity of wide-baseline images, we first generate down-sampled multiview images and then apply a stereo matching algorithm. The belief propagation algorithm on a simple tree structure is applied to the down-sampled stereo images. The error region is then improved by depth-color fitting [38]. We generate up-sampled disparity maps using Weighted Mode Filtering [60].
Figure 4.4: Block diagram of the coordinate transformation from an image pixel to a light-field pixel using DIBR.
4.3.2 3D Modeling using Input Image and Depth
The main idea of light-field 3D rendering using DIBR is to make point clouds from camera-captured images and depth maps. Using color images, a depth map, and the camera's configuration, we can infer 3D points that include their color and 3D location information.
Figure 4.4 shows a block diagram of the coordinate transformation from an image pixel to a light-field pixel using DIBR. Using the input camera's intrinsic parameters, its pixels are transformed into the camera coordinate system. Then, the camera's extrinsic parameters are used to perform the transformation into the world coordinate system. Using the extrinsic parameters of the projector, the world coordinate system is transformed to the light-field coordinate system. The pixel locations in the light-field image are assigned using the projector's intrinsic parameters.
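The per-pixel chain of Fig. 4.4 amounts to the following transformations; a sketch assuming plain pinhole models for both camera and projector, without the keystone term discussed later in this subsection:

```python
import numpy as np

def pixel_to_projector(u, v, z, K_cam, R_cam, t_cam, K_proj, R_proj, t_proj):
    """Trace one pixel through the Fig. 4.4 chain (pinhole models assumed).

    (u, v) : pixel location in the input image; z : its depth value
    K_*, R_*, t_* : intrinsic/extrinsic parameters (world -> device mapping)
    """
    # image pixel -> camera coordinates: back-project the ray, scale by z
    p_cam = z * np.linalg.solve(K_cam, np.array([u, v, 1.0]))
    # camera -> world coordinates
    p_world = R_cam.T @ (p_cam - t_cam)
    # world -> projector (light-field) coordinates
    p_proj = R_proj @ p_world + t_proj
    # perspective projection onto the projector image plane
    q = K_proj @ p_proj
    return q[:2] / q[2]
```

When several 3D points land on the same projector pixel, the point farthest from the projector must be kept, as explained at the end of this subsection.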
Figure 4.5: Input: (a) left color, (b) center color, (c) right color, (d) left depth, (e) center depth, (f) right depth. The light rays of a given pixel and its geometric depth location for the (g) left, (h) center, and (i) right views.
Figure 4.5 illustrates the input camera images and depths. Given the depth of an image pixel, it can be transformed into the camera coordinate system. The 3D models of the left, center, and right images are individually transformed as shown in Fig. 4.6. Here, x and y are the directions of the screen width and height, respectively, and z is the direction between the screen of the multi-projection 3D display and the viewing location. The plane at z = 0 is the screen plane, and a positive z value means in front of the screen. In the camera coordinate system, perspective geometry distortion does not occur. This means that the cubes shown in Fig. 4.5 do not have the same metrics as the original cubes but those of the camera coordinate system, as given in Fig. 4.6. The 3D points of each camera can be combined using the cameras' extrinsic parameters, which are the relative translation and rotation between the cameras.
Figure 4.6: Three-dimensional points of the input color and depth images: (a) the left image, (b) the center image, and (c) the right image.
Figure 4.7: Three-dimensional points of the input color and depth images in the world coordinate system. Red, blue, and green points are shown for the left, center, and right, respectively.
Figure 4.7 shows the reconstructed 3D points in the world coordinate system. The combined model represents the input image information.
Since the projectors emit light rays with a perspective property, their images can be obtained using the perspective projection of the 3D points, given the projectors' extrinsic and intrinsic parameters. Extrinsic parameters are given by the design configuration, while intrinsic parameters are decided by the projector. There are two differences between the camera and projector projections. The camera selects the nearest point using the occlusion rule; for the projector, the points farthest from the projector are selected.
Figure 4.8: Projector light rendering for the HPO screen.
Additionally, projectors have a keystone factor, which should be accounted for in the intrinsic parameters.
4.3.3 Horizontal Parallax Only Rendering (HPO)
Our display system offers only horizontal parallax through the vertical diffuser screen. Figure 4.8 shows HPO rendering, where C is the projector, V is an object point, and B' is the user location. In HPO, users should observe the same point V whether at B or B', regardless of their vertical movement. In order to generate a light ray that is not affected by the user's vertical movement, we should consider the object point V as A', which is a vertical diffusion point on the screen plane, not as A. With the labelled points $A'(x'_a, y'_a, z'_a)$, $B'(x'_b, y'_b, z'_b)$, and $V(x_v, y_v, z_v)$ in Fig. 4.8, the applied HPO parameter $y_v^{HPO}$ can be calculated by

$$y_v^{HPO} = y'_a = \frac{y_v}{z_v - z_b}\,(-z_b), \quad (4.8)$$

and this new $y_v^{HPO}$ is used as the world coordinate for the DIBR.
Figure 4.9: Real projectors and two tilted mirrors. The real projectors' light rays are reflected from the two tilted mirrors, and the reflected light rays act as though they come from virtual projectors.
Figure 4.10: Projector images. According to the location of the projector, the portions of the image produced by the real and virtual projectors change: (a) leftmost projector, (b) center projector, and (c) rightmost projector.
4.3.4 Mirror Reflection Light-field Rendering
The use of two tilted mirrors in the multi-projection 3D display enlarges the viewing angle. A projector arrangement without mirrors generates the gray 3D viewing region shown in Fig. 4.9. The light rays of the projectors with tilted mirrors expand the viewing region to the blue dashed lines.
In order to generate light-field images for the projectors, the reflected light rays must also be considered. The light rays reflected by the mirrors can be treated as if they were produced by other projectors, namely the virtual projectors in Fig. 4.9. The rotation and translation positions of the virtual projectors are defined by the tilt angle of the mirrors. Using intrinsic and extrinsic parameters, the light rays of the real and virtual projectors are generated. Then, the reflected light rays from the virtual projector and the direct light rays from the real projector are combined. Figure 4.10 shows images from the projectors. In the leftmost projector image, Fig. 4.10(a), the left side of the image comes from the mirror reflection and the right side from the real projector. In the rightmost projector image, Fig. 4.10(c), the left side comes from the real projector and the right side from the mirror reflection. The image of the center projector, Fig. 4.10(b), comes almost entirely from the real projector, with both ends coming from the mirror reflections.
4.3.5 Consistent Hole Filling
In the process of DIBR, disoccluded hole areas are created in the synthesized light-field images by disparity differences. If the hole area of each light-field image is restored separately, the same region can be restored with different colors. This induces inconsistency in the 3D images.
In the consistent hole filling algorithm, we first estimate the largest hole area among the light-field images and then restore this estimated largest hole area. The hole area of each light-field image is filled with the same restored hole information. Recovering the same textures in the hole areas improves view consistency and results in realistic 3D images.
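A sketch of this strategy is shown below; interpreting the "largest hole area" as the union of the per-image hole masks and leaving the inpainting routine abstract are both our assumptions:

```python
import numpy as np

def consistent_hole_fill(images, holes, restore):
    """Fill disoccluded holes consistently across all light-field images.

    images  : list of (H, W, 3) synthesized light-field images
    holes   : list of (H, W) boolean hole masks, one per image
    restore : a single-image inpainting routine taking (image, mask);
              assumed to be available, not specified here
    """
    union = np.zeros_like(holes[0], dtype=bool)
    for h in holes:                         # the largest hole area
        union |= h
    reference = restore(images[0], union)   # restore that area once
    for img, h in zip(images, holes):       # reuse the same texture everywhere
        img[h] = reference[h]
    return images
```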
4.4 Experimental Results
4.4.1 Display System
We designed a 100-Mpixel multi-projection 3D display system with the design parameters listed in Table 4.1. The top image in Fig. 4.11 shows the 55-inch (1185×740 mm, 16:10 aspect ratio) 100-Mpixel multi-projection 3D display system. Ninety-six projectors were attached to angularly controllable jigs that adjust the tilt angle of the projectors. They were embedded into the frame for the projector arrays, as shown at the bottom of Fig. 4.11.
Figure 4.11: The 55-inch, 100-Mpixel multi-projection 3D display system
In order to achieve a smaller volume, SCRAM HD-451 projectors (resolution 1,280×800 pixels, size 84×109.2×35 mm) were adopted. A vertical diffuser screen with 1.0° horizontal and 60° vertical diffusion angles was used. Two side mirrors were tilted by 6.62° in order to acquire a 24° viewing angle. The image distribution system was comprised of a controlling personal computer (PC) and two graphics-rendering PCs. Three high-performance ATI FirePro V9000 graphics cards were installed in each graphics-rendering PC, and four outputs of 2560×1600-pixel video images were generated by each graphics card. Twenty-four video wall controllers, which divide one 2560×1600-pixel image into four 1280×800-pixel HD images, were linked between the projectors and graphics cards.
Table 4.1: Target Specifications of the Multi-Projection 3D Display

Display parameter                                          | Prototype system
Length of viewing zone at 1.5 m (W_v)                      | 638 mm
Distance between mirrors near the projector array (W)      | 1,641 mm
Tilt angle of a mirror (theta_m)                           | 6.62°
Distance from the screen to the imaginary central point O  | 5,106 mm
Horizontal angular pitch between projectors (toward O)     | 0.138°
Projection distance (D_p)                                  | 1,945 mm
Number of projectors in the vertical direction (n_v)       | 6
Number of projectors in the horizontal direction (n_h)     | 16
Screen size                                                | 55-inch
Viewing angle                                              | 24°
Viewing distance                                           | 1-2 m
Number of projectors                                       | 96
System volume (optical engine only)                        | 3 m^3
Light-field density                                        | 9.6 views/IPD
Number of light-field rays                                 | 98 M
Figure 4.12: Input: (a) left color, (b) center color, (c) right color, (d) left depth, (e) center depth, and (f) right depth.
Figure 4.13: Magnified particle images of synthesized projector images (upper row) and displayed 3D images captured by a camera (lower row): (a) rendered by the DIBR algorithm; (b-d) rendered by the IBR algorithm using 20 (b), 100 (c), and 500 (d) multiview images.
4.4.2 Comparison Results of DIBR and IBR
The IBR light-field rendering algorithm generates multiview images first. Projector images are then created by the IBR technique, which selects a suitable pixel among the multiview images and assigns it to the projector image pixel location. Since the IBR algorithm does not use a depth structure, sufficient multiview images are required to present natural light rays. In this experiment we use 20, 100, and 500 multiview images for the IBR algorithm in order to compare the quality of the DIBR and IBR algorithms.
To compare the light-field images from DIBR and IBR, we used three computer-graphics input images and three input depths. The screen was located at z = 0 m and the three cameras were located at z = 1.65 m. The cameras were separated by 0.6 m, so the total baseline of the three cameras was 1.2 m. The intrinsic parameters for each camera and the z-values for each image were available. Figure 4.12 shows the input colors and depth maps: (a) is the left image, (b) is the center image, and (c) is the right image.
Figure 4.13 shows magnified particle images of the synthesized projector images (upper row) and the displayed 3D images captured by a camera (lower row). Figure 4.13(a) is the result of our proposed DIBR algorithm, and Figs. 4.13(b)-(d) are the results of the previous IBR algorithm: (b) was generated from 20 images, (c) from 100 images, and (d) from 500 images.
Table 4.2: Computation Time Comparison (sec)

DIBR (proposed) | IBR (20 multiview) | IBR (100 multiview) | IBR (500 multiview)
7.26            | 12.23              | 12.82               | 15.78

Table 4.3: Memory Usage Comparison (MB)

DIBR (proposed) | IBR (20 multiview) | IBR (100 multiview) | IBR (500 multiview)
288             | 342                | 558                 | 1638
Figure 4.13(a) does not contain quantization artifacts because the proposed algorithm generates projector images using the depth structure. However, the IBR images have quantization artifacts up to 100 input images; the artifacts were absent only when a sufficient number of input images, such as 500 images, was provided. As shown in the lower images of Figs. 4.13(b) and (c), these quantization artifacts appeared on the 3D display, too. As a result, we acquired image quality similar to 500-image IBR with the proposed DIBR method at a very small image generation cost.
The DIBR algorithm is more efficient than the IBR algorithm in terms of computation time and memory usage. Tables 4.2 and 4.3 compare the computation time and memory usage of the DIBR and IBR methods on an Intel Xeon CPU @ 3.07 GHz processor. The computation time and memory usage are lower than those of the previous method because DIBR does not need time and memory for multiview image generation. In contrast, the IBR algorithm requires more computation time and memory as the number of multiview images increases.
Figure 4.14 presents displayed 3D images on our light-field display: (a) shows a left-sided image, (b) shows a center-viewpoint image, and (c) shows a right-sided image.
Figure 4.14: Displayed 3D images from the multi-projection 3D display: (a) left image, (b) center image, (c) right image.
Figure 4.15: Multi-camera capturing system: the cameras are separated by 130 mm. The cameras indicated by red rectangles (the 1st, 5th, and 9th cameras) are used for the input images.
The light field for the projection-type 3D display is well created by the DIBR algorithm, providing large motion parallax.
4.4.3 Results of Camera-Captured Contents
To generate light-field images from real-life contents, we built a multiview camera capturing system and captured real-life multiview contents. We use only three view images as the input to reduce processing complexity, and then generate 100-Mpixel light fields for the multi-projection 3D display.
Figure 4.15 shows the multi-camera capturing system. The camera model is a FUJINON CF12.5HA-1 with a 54.13° field of view. The cameras are separated by 130 mm from each other. Among the cameras, we use only three camera input images: the leftmost, center, and rightmost cameras.
Figure 4.16: Input colors and estimated disparity maps: (a) 1st camera image, (b) 5th camera image, (c) 9th camera image, (d) estimated disparity of the 1st camera image, (e) estimated disparity of the 5th camera image, and (f) estimated disparity of the 9th camera image.
Figure 4.16 shows the captured input stereo colors and the estimated disparity maps. To create large motion parallax, we captured the scene with a sufficient baseline between cameras. However, there is a trade-off between disparity estimation and large motion parallax: if the baseline is large, the disparity values between input images become large, yielding difficulties in the disparity estimation process. In order to overcome these difficulties, we use down-sampled input images for disparity estimation. The disparity maps of Fig. 4.16 show artifacts in texture-less regions; however, the errors in the texture-less regions do not deteriorate the rendered images.
Figure 4.17 demonstrates the light-field images for the 96 projectors. Since the projectors are located from left to right, the woman in the figure is close to the window in the left projector images and close to the bookshelf in the right projector images.
4.5 Conclusion
We have implemented and verified the DIBR light-field rendering algorithm on the multi-projection 3D display. In order to process 100-Mpixel light rays, we proposed an efficient light-field rendering algorithm that uses relatively few input images and depth maps. As demonstrated by the experimental results, our proposed algorithm performs better than conventional IBR algorithms in terms of memory, complexity, and quality.
Figure 4.17: Light-field images for the projectors: (a) image of projector 1, (b) image of projector 48, (c) image of projector 96, (d) images for all 96 projectors.
75
Figure 4.18: Light-eld images for projectors 124.
76
Figure 4.19: Light-eld images for projectors 2548.
77
Figure 4.20: Light-eld images for projectors 4972.
78
Figure 4.21: Light-eld images for projectors 7396.
79
Figure 4.22: Light-eld images for projectors 124.
80
Figure 4.23: Light-eld images for projectors 2548.
81
Figure 4.24: Light-eld images for projectors 4972.
82
Figure 4.25: Light-eld images for projectors 7396.
83
Chapter 5
Direct Light Field Rendering without 2D Image Generation
5.1 Introduction
With the development of autostereoscopic 3D displays, viewers can now enjoy multiview 3D content with motion parallax [15, 18]. Beyond 3D displays with limited viewpoints, super multiview and light field displays enable users to experience realistic 3D displays [85, 86]. However, these high-quality 3D displays build on huge light sources and a huge number of viewpoints, which result in high computational complexity and heavy memory usage with the application of conventional 3D rendering algorithms [47].
A 3D rendering algorithm for an autostereoscopic 3D display generates a 3D panel image, which contains the display panel pixel information and each pixel's light ray direction, the latter controlled by optical components such as lenticular lenses, parallax barriers, and slit barriers.
A number of techniques have been proposed for 3D panel image rendering. The Point Retracing Rendering (PRR) technique renders images in a point-by-point manner with the use of a lens array [35]. In order to process multiple light rays at the same time, the Parallel Group Rendering (PGR) [39, 61, 96] and Viewpoint Vector Rendering (VVR) techniques use an orthographic projection model [7, 62, 71]. However, PGR and VVR are not suitable for autostereoscopic multiview displays, because a multiview display has a large number of directional light rays that arise due to the use of slanted lenses. To address this problem, van Berkel introduced a Multiview Rendering (MVR) algorithm, which renders each group of rays from the same viewpoint using a perspective projection
model [30, 87]. Subsequently, the algorithm weaves the multiview images into the 3D panel image by using information from the display light ray pattern [87].
However, in order to generate the 3D panel image for super multiview or light field 3D displays, the abovementioned techniques pose crucial difficulties in terms of computational complexity and memory usage, which may hinder the further development of 3D display products based on such techniques. The computational complexity of MVR increases with the number of directions of the light rays. The computational complexity and memory usage also increase with the number of viewpoints and the resolution of the 3D display. This is because MVR needs to generate more multiview images with a higher resolution.
In this chapter we propose a novel method to generate 3D panel images with extremely low computational complexity and memory usage. Our method, which is called Direct Light Field Rendering (DLFR), does not generate multiview images because the destination location of all the pixels in the 3D panel image can be directly calculated in the light field domain. First, the 3D display light field is modeled as a multiline function. Next, the light ray from the input point, which is usually a color pixel, and its disparity are modeled in the light field domain as a line function. Finally, the intersecting points between the 3D display multilines and the input 3D point line become the destination locations in the 3D panel image. Among all possible light rays, the intersecting points are the necessary samples for the 3D display from an input point. The simple process of solving linear systems of two variables greatly reduces computational complexity. Moreover, DLFR does not need to store additional view images, except for the 3D panel image.
Figure 5.1 shows the main difference between the proposed DLFR algorithm and the conventional MVR algorithm. The images on the extreme left are the color and depth images, and the image on the right is the 3D panel image. The upper path shows the 3D display multilines and an input point line, while the lower path illustrates the MVR algorithm process.
Figure 5.1: Direct Light Field Rendering (DLFR): Unlike conventional multiview rendering approaches, our proposed approach directly renders the 3D panel image without reconstructing the multiview images.
Figure 5.2: 4D light field representation: The light ray that passes through $(x,y) \in \Pi$ and $(s,t) \in \Omega$ can be represented as $L(x,y,s,t)$ when $\Pi$ and $\Omega$ represent parallel planes.
5.2 Light Field Representation
A 4D light field is a map that can be represented as

$$L : \Pi \times \Omega \to \mathbb{R}, \quad (x,y,s,t) \mapsto L(x,y,s,t). \quad (5.1)$$
Figure 5.3: Light field representation: (a) light rays from the 3D display and (b) the light field of the 3D display, $L(x, y^*, s_A(x), t^*)$. Light rays from the display can be depicted in the light field representation.
Figure 5.4: Light field representation of the 3D point P: (a) light rays from the 3D point P and (b) the light field of the 3D point P, $L(x, y^*, s_P(x), t^*)$.
Here, $(x,y)$ and $(s,t)$ denote the two intersection points of a light ray with the pair of parallel planes $\Pi$ and $\Omega$. Figure 5.2 shows the light field representation of Eq. (5.1). The light ray that passes through $(x,y) \in \Pi$ and $(s,t) \in \Omega$ can be represented as a 4D function $L(x,y,s,t)$ [29, 52].
Any light ray from a 3D display can be expressed with the 4D light field function. The light rays of full-parallax 3D displays, such as integral photography, can be represented by the four parameters $x$, $y$, $s$, and $t$. For horizontal parallax only 3D displays, such as multiview displays, the light rays are determined by the two parameters $x$ and $s$. In this study we focus on the horizontal parallax only display, setting $y$ and $t$ to the fixed values $y^*$ and $t^*$, respectively.
Figure 5.3(a) illustrates the light rays from a 3D display, wherein $\Pi$ represents the 3D display plane and $\Omega$ the location of a viewer. Figure 5.3(b) shows the two-dimensional plot of the light rays when the values of $y$ and $t$ are fixed. The point $s$ on $\Omega$ can be derived as a function of $x$ if the focal length and the viewing distance are given:

$$L(x, y^*, s_A(x), t^*), \quad (5.2)$$

where

$$s_A(x) = \frac{f - D}{f}\, x - \frac{D}{f}\, u_m. \quad (5.3)$$

Here, $f$ denotes the focal length of the 3D display optical component and $D$ the distance between $\Pi$ and $\Omega$. Further, $u_m$ denotes the focal point of the $m$th lens in the horizontal direction. The light field in the main lobe can be obtained by choosing the $u_m$ that determines $s_A$ in the main lobe.
Figure 5.4(a) shows the light rays from a 3D point $P = (X, Y, Z)$, and Fig. 5.4(b) shows the corresponding two-dimensional plot. If the 3D point $P$ is assumed to lie on a Lambertian surface, all the reflected light rays from the 3D point $P$ can be represented as

$$L(x, y^*, s_P(x), t^*), \quad (5.4)$$

where

$$s_P(x) = \frac{Z + D}{Z}\, x - \frac{D}{Z}\, X. \quad (5.5)$$
The 3D panel image rendering is a process that determines the pixel location $x$ for the 3D display and a 3D point $P$. The location of the pixel is calculated from the expression

$$L(x, y^*, s_A(x), t^*) = L(x, y^*, s_P(x), t^*), \quad (5.6)$$

referring to the light ray common to the 3D display and the 3D point $P$. The 3D point $P$ is rendered at location $x$ of the 3D panel image, and $x$ can be computed from

$$s_A(x) = s_P(x) \;:\; x = \frac{fX - u_m Z}{f + Z}. \quad (5.7)$$
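For completeness, Eq. (5.7) follows by equating Eqs. (5.3) and (5.5) and solving for $x$:

$$\frac{f-D}{f}\,x - \frac{D}{f}\,u_m = \frac{Z+D}{Z}\,x - \frac{D}{Z}\,X
\;\Longrightarrow\;
x\left(\frac{f-D}{f} - \frac{Z+D}{Z}\right) = \frac{D}{f}\,u_m - \frac{D}{Z}\,X.$$

The coefficient on the left simplifies to $-D\,\frac{f+Z}{fZ}$ and the right-hand side to $D\,\frac{u_m Z - fX}{fZ}$, which yields $x = \frac{fX - u_m Z}{f+Z}$.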
5.3 Proposed Algorithm: DLFR
5.3.1 3D Display Light Field Function (DLF)
Autostereoscopic displays use optical components to achieve the effect of having different light rays emerge toward different points of view. We can map the light rays from autostereoscopic 3D displays to a light field representation [29, 52], which relates the light source (pixel) to the viewing position. Since optical components such as lenticular lenses, parallax barriers, and slit barriers exhibit periodicity, the light rays also exhibit periodicity in the light field representation, as shown in Fig. 5.5. Figure 5.5(a) shows the light rays from a 3D display. The blue line in Fig. 5.5(b) illustrates the plot of the light rays of the 3D display in the light field representation along the $x$ and $s$ axes; it is a plot of Eq. (5.3). If we quantize the $x$ direction to assign a pixel location $n$, then we obtain the discrete locations of the light rays, indicated by the red cross-marks in Fig. 5.5(b), as

$$s_A[n] = \frac{f - D}{f}\, x[n] - \frac{D}{f}\, u_m. \quad (5.8)$$

The red cross-marks, which lie at uniform intervals on the blue line, can be grouped into straight lines, since the interval between neighboring points is always $\frac{f-D}{f}$. Therefore, we can represent the light rays of the 3D display as a multiline function, as illustrated in Fig. 5.6.
Figure 5.5: 3D display light field: (a) light rays from a slanted-lens autostereoscopic display and (b) light field plot of the 3D display.
The set of red cross-marks is grouped into lines having the same slope and equal distances between each other. This means that the complex light rays of a 3D display can be represented by a parsimonious linear model, resulting in computationally light and efficient 3D display rendering.
The parameter $s_A(x)$ is quantized to a sampled light field of the 3D display, $s_A[n]$, according to the sub-pixel location. Next, the number of $n$ interval(s) between two neighboring points on the same line is denoted by $n_l$, and the number of $n$ interval(s) between two neighboring lines, such as the $i$th and the $(i+1)$th lines, is denoted by $n_i$. The multiline function can subsequently be represented as

$$s_A(x, y, p) = a\,(x - bp) + d(y), \quad (5.9)$$

where $a$ denotes the slope of the multiline, $p$ the $p$th line, $b$ the $x$ interval between neighboring lines, and $d(y)$ the lens offset due to the slanted lens:

$$a = \frac{s_A[c + n_l, m] - s_A[c, m]}{n_l}, \quad (5.10)$$

$$b = \frac{a\, n_i - \big(s_A[c + n_i, m] - s_A[c, m]\big)}{a}, \quad (5.11)$$

$$d(y) = l\, \tan(\theta)\, y, \quad (5.12)$$

where $c$ denotes a constant, $l$ the horizontal lens pitch, and $\theta$ the slanted angle of the lenticular lens.
Figure 5.6 illustrates different multiline functions. There are 45 red cross-marks representing light rays from 45 pixels. Each multiline function can be determined by two corresponding neighboring points of the same line, i.e., by $n_l$. Consequently, the values of $n_l$ in Figs. 5.6(a), (b), (c), and (d) are 1, 4, 5, and 9, respectively. Once $n_l$ is defined, $n_i$ is decided, and subsequently the multiline function represented by Eq. (5.9) can be calculated. For various values of $n_l$, the slope, the number of pixels on each line, and the intervals between neighboring lines vary. Since each multiline function influences the computation time for a specific depth value, we can select an optimum multiline function or a combination of multiline functions.
91
Figure 5.6: 3D Display Light Field Function (DLF) of x and s: light rays are fitted as multilines with the same slope when (a) n_l is 1, (b) n_l is 4, (c) n_l is 5, and (d) n_l is 9.
5.3.2 3D Point P Light Field Function (PLF)
Stereo input images can be interpreted as a light field. Figure 5.7(a) shows the light field volume composed of a set of multiview images along the s axis with constant t. The cross section in (x, y) coordinates is a multiview image with constant s. Figures 5.7(b) and (d) depict the cross sections in (x, y) coordinates, where s_L denotes the left camera coordinate and s_R the right camera coordinate. Figure 5.7(f) illustrates the epipolar plane image, which is a cross section in the (x, s) coordinates. The 3D point P can be represented as a line in the epipolar plane image.

The epipolar plane image is composed of lines of the multiview images with constant y. Therefore, the line of light rays from the 3D point P contains two points, one from the left image and one from the right image, and these two points are the matching points in the stereo images.
Figure 5.7: Light field of 3D point P: (a) light field volume when t is fixed, (b) left image, (c) disparity in left image, (d) right image, (e) disparity in right image, and (f) epipolar plane image when y is given.
Figure 5.8: Light field function of 3D point P: the PLF has slope Δs/Δx and passes through (x_P, s_L).
Figure 5.8 shows the light field of the 3D point P of the input image with its disparity: Δx denotes the disparity of the 3D point P, and Δs the offset of the center positions of the left and the right cameras, s_R - s_L. The PLF function is

$$s_P(x) = \frac{s_R - s_L}{\Delta x}\,(x - x_L) + s_L \tag{5.13}$$

where x_L denotes the x coordinate of the 3D point P in the left image.
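In code, the PLF is a one-line function; the sketch below assumes the matching-point parameters defined above, with made-up example values for x_L, the disparity Δx, and the camera-center offset Δs.

def plf(x, x_L, dx, s_L, ds):
    """Light-field line of a 3D point P, Eq. (5.13)."""
    return ds / dx * (x - x_L) + s_L

# Example with hypothetical stereo parameters: a point at x_L = 120 in the
# left view with disparity dx = 24 and camera-center offset ds = 8.
print(plf(130.0, x_L=120.0, dx=24.0, s_L=0.0, ds=8.0))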
Figure 5.9: Direct Light Field Rendering (DLFR): the 3D panel image is composed by assigning the 3D point P to the intersecting points of the DLF and PLF.
5.3.3 3D Direct Light Field Rendering
The reason why conventional MVR constructs multiview images is that it does not estimate the destination of a 3D point P in the 3D panel image directly. It renders every possible destination by the multiview rendering process and then selects the right destination by the weaving process [87]. In contrast, our proposed DLFR method directly computes only the necessary samples, which represent the correct destinations of the 3D point P, rather than entire light fields or multiview images. This is easily done by solving a linear system of two variables formed by the DLF and PLF functions.
The intersecting points of the DLF and PLF represent the correct destinations among all possible destinations of the 3D point P. The 3D point P can be expressed as a line in the epipolar plane image, and this line corresponds to a large number of possible points. However, the 3D display can express only the sampled light field, indicated by the red cross marks in Fig. 5.5(b). Therefore, among the large number of possible light rays of the 3D point P, only the light rays that coincide with light rays of the 3D display can form destinations on the 3D panel image.

Figure 5.9 illustrates how the 3D point P is rendered into the 3D panel image by solving the linear system of the two variables x and s. The 3D point P is assigned to the intersecting points of the DLF and PLF functions.
The x coordinate of the 3D panel image of the 3D point P is calculated as

$$x(p) = \frac{ab\,\Delta x\,p + (s_L - d(y))\,\Delta x - (s_R - s_L)\,x_L}{a\,\Delta x - (s_R - s_L)}, \tag{5.14}$$

which is the solution to Eq. (5.9) and Eq. (5.13). The y coordinate of the 3D panel image is the same as that of the 3D point P, because the slanted multiview display provides only horizontal parallax. Since Eq. (5.14) contains the variable p, x can take on multiple values; for example, there are three intersecting points in Fig. 5.9.
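Putting the pieces together, the DLFR assignment step amounts to evaluating Eq. (5.14) for each multiline index p and keeping the solutions that fall on the panel. The sketch below reuses the multiline fit and the hypothetical d(y) from the earlier snippets; the panel width and the range of p are assumptions for illustration.

def dlfr_destinations(a, b, d, y, x_L, dx, s_L, ds, p_range, width):
    """Solve Eq. (5.14) for every multiline p and keep on-panel solutions."""
    dests = []
    for p in p_range:
        # closed-form intersection of a(x - b p) + d(y) with the PLF of Eq. (5.13)
        x = (a * b * dx * p + (s_L - d(y)) * dx - ds * x_L) / (a * dx - ds)
        if 0 <= x < width:  # keep only intersections inside the 3D panel image
            dests.append((x, p))
    return dests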
5.4 Analysis
In this section, we show that the 3D panel images obtained with the DLFR algorithm are the same as those obtained with the MVR algorithm. Consider an autostereoscopic display with N + 1 multiview points and a 3D point P with disparity d(P) between the leftmost and rightmost viewpoints.
DLFR:

In the (N + 1)-view multiview system, the PLF function Eq. (5.13) can be expressed as

$$s_P(x) = \frac{N}{d(P)}\,(x - x_L). \tag{5.15}$$
The 3D point P is assigned to the 3D panel image at location x, where x denotes the intersection point of the PLF function given by Eq. (5.15) and the DLF function given by Eq. (5.9):

$$I_{3D}(x_{sol}^{DLFR}(P)) = P \tag{5.16}$$

where I_{3D} represents the 3D panel image of an (N + 1)-view multiview display and x_{sol}^{DLFR}(P) the horizontal pixel location of the intersecting point of the PLF function given by Eq. (5.15) and the DLF function given by Eq. (5.9).
MVR:

In order to apply the MVR algorithm, we first define the multiview images. There are N + 1 multiview images, I_0, I_1, ..., I_N. Further, x_{I_n}(P) denotes the horizontal pixel location of the point P in I_n, and the disparity value of P between I_0 and I_N is denoted by d(P). Thus, we have

$$d(P) = x_{I_N}(P) - x_{I_0}(P), \tag{5.17}$$
and by the multiview image definition, we can determine the horizontal pixel location of P in the nth multiview image I_n as

$$x_{I_n}(P) = x_{I_0}(P) + \frac{d(P)}{N}\,n \tag{5.18}$$

and its s value, which is the view number in the light field representation, is

$$S_{I_n} = n \tag{5.19}$$

where S_{I_n} denotes the view number of I_n.
Subsequently, the point P can be assigned to the 3D panel image I_{3D},

$$I_{3D}(x_{n}^{MVR}(P)) = P, \tag{5.20}$$

if

$$WP[x_{n}^{MVR}(P)] = n, \tag{5.21}$$

where WP[x] denotes the assigned view number at pixel location x of the 3D panel image.

Parameter WP[x] can be expressed by the DLF function as

$$WP[x_{n}^{MVR}(P)] = s_A[x], \tag{5.22}$$
meaning that x_{n}^{MVR}(P) lies on the curve of the PLF function Eq. (5.13) as well as on the curve of the DLF function Eq. (5.9). Since the intersecting point x_{sol}^{DLFR}(P) lies on the curve corresponding to Eq. (5.13), it also satisfies Eq. (5.9). Therefore, the assignment location of the 3D point P by DLFR, x_{sol}^{DLFR}(P), is the same as the location by MVR, x_{n}^{MVR}(P). This means that the panel images obtained via DLFR and MVR are identical.

Figure 5.10: Comparison between Direct Light Field Rendering (DLFR) and Multiview Rendering (MVR) for an increasing number of views: (a) computation time and (b) memory usage.
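The equivalence can also be checked numerically: by construction, any solution of Eq. (5.14) must yield the same s value under both the DLF of Eq. (5.9) and the PLF of Eq. (5.13). The constants below are made-up test values rather than display parameters.

# Numeric sanity check of the equivalence argument (all values illustrative).
a, b = -0.55, 9.0
d = lambda y: 0.0                     # slant term ignored for this check
x_L, s_L, dx, ds = 120.0, 0.0, 24.0, 8.0

for p in range(-3, 4):
    x = (a * b * dx * p + (s_L - d(0)) * dx - ds * x_L) / (a * dx - ds)
    lhs = a * (x - b * p) + d(0)      # DLF value at the intersection
    rhs = ds / dx * (x - x_L) + s_L   # PLF value at the intersection
    assert abs(lhs - rhs) < 1e-9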
5.5 Experimental Results
In this section, we compare the computation time and memory usage of DLFR and MVR, and we analyze the computation time of DLFR for different disparities and multiline functions. First, we compare DLFR and MVR in terms of the number of views of the 3D display. Next, we test real content on our 3D display system, which provides 96 views. Finally, we show that the computation time of DLFR depends on the disparity value and demonstrate that it can be reduced by applying different multiline functions to different disparity ranges.

We compare the computation time and memory usage of the DLFR and MVR algorithms as the number of views of the autostereoscopic display increases.
Figure 5.11: Comparison between Direct Light Field Rendering (DLFR) and Multiview
Rendering (MVR) for various video contents when the number of views is 96
Figure 5.10 compares the computation time and memory usage of DLFR and MVR. As the number of views of the autostereoscopic display increases, the MVR algorithm must generate more multiview images. The DLFR algorithm, however, does not need to generate more images: it generates the 3D panel image only once, independent of how many views the autostereoscopic display provides. Figure 5.10(a) shows the computation time of the DLFR and MVR algorithms, and Fig. 5.10(b) shows their memory usage. The computation time and memory usage of the proposed DLFR algorithm remain constant regardless of the number of views, whereas for MVR both increase linearly with the number of views.
We tested 30 video contents (a total of 2705 frames) on our 3D display system, a 65-inch 4K (3840 × 2160) 3D display that provides 96 different viewpoints. We compared the running time and memory usage of DLFR against the conventional MVR algorithm, which first generates 96 multiview images and subsequently generates the 3D panel image. As shown in Fig. 5.11, the computation time of DLFR is reduced to 12% of that of the conventional MVR algorithm, i.e., from 49.76 s to 5.46 s, and the memory usage of DLFR is reduced to 1% of that of MVR, i.e., from 2360 MB to 24.3 MB.

Figure 5.12: DLFR performance over the disparity range (−100, 100): number of intersection points of the DLF and PLF, and computation time for a 4K 3D panel image.

Figure 5.13: Computation times for various multiline functions.
The computation time of DLFR depends on the number of solutions to Eq. (5.14). The number of solutions is determined by the slope of the PLF, which corresponds to the disparity, and by the multiline DLF function. Figure 5.12 shows the computation time and the number of solutions of Eq. (5.14) as functions of the disparity. As the number of DLF and PLF intersections increases, the computation time increases correspondingly.
Figure 5.14: Computation time as a function of disparity for a fixed multiline function and an adaptive multiline function.
Since the slope of the DLF curve affects the number of solutions of the DLF and PLF, the optimum value of n_l, the number of intervals between two neighboring points on the same line, must be selected for the given 3D display type. Figure 5.13 shows the computation time required for various multiline functions. With prior knowledge of the disparity distribution of the content, we can select the appropriate multiline function. To handle general content whose disparity distribution is unknown, we can reduce the computation time by applying different multiline functions to different disparity ranges. Figure 5.14 shows the resulting computation times when different multiline functions are applied to selected disparity ranges, as sketched below.
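A hedged sketch of such an adaptive selection follows; the disparity-range-to-n_l table is invented purely for illustration, whereas in practice it would be calibrated from measured computation times such as those in Fig. 5.13.

# Hypothetical disparity-range -> n_l table; the real mapping would be
# calibrated from measurements such as Fig. 5.13.
ADAPTIVE_N_L = [((-100, -30), 9), ((-30, 30), 1), ((30, 101), 5)]

def select_n_l(disparity):
    """Pick the multiline function (its n_l) for a given disparity."""
    for (lo, hi), n_l in ADAPTIVE_N_L:
        if lo <= disparity < hi:
            return n_l
    return 1  # fallback for out-of-range disparities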
5.6 Conclusion
We developed a new framework for 3D display rendering based on novel 3D panel image rendering in the light field domain, which provides an efficient rendering approach in terms of reduced and optimized complexity and memory usage. The conventional MVR algorithm suffers from increasing computational complexity and memory usage as the number of views of a 3D display grows, which reduces its commercial viability. In contrast, the proposed DLFR algorithm is independent of the number of viewpoints of the 3D display, with computation time and memory usage remaining constant. Consequently, we believe that our DLFR algorithm will form a key component in the further development of 3D displays.
Chapter 6
Conclusion
3D display rendering from stereo sequences was studied in this research. We have presented structure reconstruction from stereo sequences, efficient 3D display rendering algorithms for multi-projection and flat-panel 3D displays, and an uncalibrated multiview rendering algorithm.
In Chapter 2, we propose a new method to estimate the disparity map from a stereo image pair. The proposed algorithm decomposes the disparity map into high-confidence and low-confidence regions and processes them separately with complementary tree structures. First, the input stereo images are used to estimate the disparity values in the high-confidence region. Then, the low-confidence region is estimated via global minimization using the disparities in the high-confidence regions as well as the color image of one view. The proposed algorithm works well in areas that are difficult to estimate using the stereo matching technique alone, such as thin objects, occlusion areas, and frame boundary areas. It performs well on the Middlebury test sequences as well as on real-world captured images and computer-synthesized images.
In Chapter 3, we propose an efficient multiview rendering algorithm that takes uncalibrated stereo images as input. Generating geometrically accurate multiview images from such input yields unwanted warping distortion and temporal inconsistency artifacts. Therefore, we propose an approximated model of the uncalibrated camera poses through epipolar geometry approximation. The epipolar geometry approximation model is mathematically analyzed to guarantee good results for non-ideal stereo images, and we derive a multiview rendering model from it. The results demonstrate the effectiveness of the proposed algorithm by comparing three interpolated views.
In Chapter 4, we have implemented and verified a DIBR light-field rendering algorithm on the multi-projection 3D display. To process 100-Mpixel light rays, the proposed light field algorithm uses relatively few input images and depth maps, reducing computational complexity and memory usage. Experimental results demonstrate that the proposed algorithm performs better than conventional IBR algorithms in terms of memory, complexity, and quality.
In Chapter 5, we develop a new efficient 3D display rendering algorithm in the light field domain. The proposed algorithm is efficient in terms of reduced and optimized complexity and memory usage. The conventional algorithm suffers from increasing computational complexity and memory usage as the number of views of a 3D display grows to achieve a sufficient level of realism. The proposed algorithm, however, is independent of the number of views of the 3D display: its computation time and memory usage remain constant.
Bibliography

[1] E. H. Adelson and J. R. Bergen, "The plenoptic function and the elements of early vision," Computational Models of Visual Processing, vol. 1, no. 2, 1991.

[2] H. K. Arachchi, S. Dogan, X. Shi, E. Ekmekcioglu, S. Worrall, A. Franck, C. Hartmann, T. Korn, and P. Kovacs, "D3.6 report on 3D audio / video rendering and content adaptation," DIOMEDES, European Commission WP3, 2011.

[3] T. Balogh, "The holovizio system," in Electronic Imaging, 2006.

[4] T. Balogh, "Method and apparatus for displaying 3D images," Patent US 6,999,071, Feb. 14, 2006.

[5] T. Balogh, "Method and apparatus for generating 3D images," Patent US 7,959,294, Jun. 14, 2011.

[6] T. Balogh, Z. Nagy, P. T. Kovács, and V. K. Adhikarla, "Natural 3D content on glasses-free light-field 3D cinema," in IS&T/SPIE Electronic Imaging, vol. 8648, 2013.

[7] Y. C. Bin-Na-Ra Lee, K. S. Park, S.-W. Min, J.-S. Lim, M. C. Whang, and K. R. Park, "Design and implementation of a fast integral image rendering method," Entertainment Computing - ICEC, p. 135, 2006.

[8] M. Bleyer and M. Gelautz, "Simple but effective tree structures for dynamic programming-based stereo matching," in VISAPP (2), 2008, pp. 415–422.

[9] M. Bleyer and M. Gelautz, "Temporally consistent disparity maps from uncalibrated stereo videos," in Proc. Image Signal Process. Anal., 2009, pp. 383–387.

[10] A. F. Bobick and S. S. Intille, "Large occlusion stereo," International Journal of Computer Vision, vol. 33, no. 3, pp. 181–200, 1999.

[11] Y. Boykov, O. Veksler, and R. Zabih, "Fast approximate energy minimization via graph cuts," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 11, pp. 1222–1239, 2001.
[12] I. J. Cox, S. L. Hingorani, S. B. Rao, and B. M. Maggs, "A maximum likelihood stereo algorithm," Computer Vision and Image Understanding, vol. 63, no. 3, pp. 542–567, 1996.

[13] A. Criminisi, A. Blake, C. Rother, J. Shotton, and P. H. Torr, "Efficient dense stereo with occlusions for new view-synthesis by four-state dynamic programming," International Journal of Computer Vision, vol. 71, no. 1, pp. 89–110, 2007.

[14] C. N. de Boer, R. Verleur, A. Heuvelman, and I. Heynderickx, "Added value of an autostereoscopic multiview 3-D display for advertising in a public environment," Displays, vol. 31, no. 1, pp. 1–8, 2010.

[15] N. A. Dodgson, "Analysis of the viewing zone of the Cambridge autostereoscopic display," Applied Optics, vol. 35, no. 10, pp. 1705–1710, 1996.

[16] N. A. Dodgson, "Analysis of the viewing zone of multiview autostereoscopic displays," in Electronic Imaging, 2002, pp. 254–265.

[17] N. A. Dodgson, "Autostereoscopic 3D displays," Computer, vol. 38, no. 8, pp. 31–36, 2005.

[18] N. A. Dodgson, "Autostereoscopic 3D displays," Computer, vol. 38, no. 8, pp. 31–36, 2005.

[19] D. Fattal, Z. Peng, T. Tran, S. Vo, M. Fiorentino, J. Brug, and R. G. Beausoleil, "A multi-directional backlight for a wide-angle, glasses-free three-dimensional display," Nature, vol. 495, no. 7441, pp. 348–351, 2013.

[20] C. Fehn, "Depth-image-based rendering (DIBR), compression, and transmission for a new approach on 3D-TV," in Electron. Imaging, 2004, pp. 93–104.

[21] P. F. Felzenszwalb and D. P. Huttenlocher, "Efficient belief propagation for early vision," International Journal of Computer Vision, vol. 70, no. 1, pp. 41–54, 2006.

[22] M. A. Fischler and R. C. Bolles, "Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography," Comm. ACM, vol. 24, no. 6, pp. 381–395, 1981.

[23] P. Fragneto, A. Fusiello, L. Magri, B. Rossi, and M. Ruffini, "Uncalibrated view synthesis with homography interpolation," in Proceedings of the 2nd Joint 3DIM/3DPVT Conference: 3D Imaging, Modeling, Processing, Visualization and Transmission (3DIMPVT), 2012.

[24] W. T. Freeman, E. C. Pasztor, and O. T. Carmichael, "Learning low-level vision," International Journal of Computer Vision, vol. 40, no. 1, pp. 25–47, 2000.

[25] A. Fusiello, "Specifying virtual cameras in uncalibrated view synthesis," IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 5, pp. 604–611, 2007.

[26] A. Fusiello and L. Irsara, "An uncalibrated view-synthesis pipeline," in Proceedings of the International Conference on Image Analysis and Processing, 2007, pp. 609–614.
[27] A. Fusiello and L. Irsara, "Quasi-Euclidean uncalibrated epipolar rectification," in Proc. 19th Int'l Conf. Pattern Recogn., 2008, pp. 1–4.

[28] J. Gluckman and S. K. Nayar, "Rectifying transformations that minimize resampling effects," in Proc. Int. Conf. Computer Vision and Pattern Recognition, vol. 1, 2001, pp. I-111.

[29] S. J. Gortler, R. Grzeszczuk, R. Szeliski, and M. F. Cohen, "The lumigraph," in Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques, 1996, pp. 43–54.

[30] M. Halle, "Multiple viewpoint rendering," in Proceedings of the 25th Annual Conference on Computer Graphics and Interactive Techniques, 1998, pp. 243–254.

[31] C. Harris and M. Stephens, "A combined corner and edge detector," in Proc. Fourth Alvey Vision Conf., vol. 15, 1988, p. 50.

[32] R. I. Hartley, "Theory and practice of projective rectification," Int. J. Comput. Vis., vol. 35, no. 2, pp. 115–127, 1999.

[33] H. Hirschmuller, "Accurate and efficient stereo processing by semi-global matching and mutual information," in Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, vol. 2. IEEE, 2005, pp. 807–814.

[34] A. Hosni, C. Rhemann, M. Bleyer, C. Rother, and M. Gelautz, "Fast cost-volume filtering for visual correspondence and beyond," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 2, pp. 504–511, 2013.

[35] Y. Igarashi, H. Murata, and M. Ueda, "3-D display system using a computer generated integral photograph," Japanese Journal of Applied Physics, no. 17, pp. 1683–1684, 1978.

[36] T. Inoue and H. Ohzu, "Accommodative responses to stereoscopic three-dimensional display," Applied Optics, vol. 36, no. 19, pp. 4509–4515, 1997.

[37] T. Ito, "Future television: super hi-vision and beyond," in Solid State Circuits Conference (A-SSCC), IEEE Asian, 2010, pp. 1–4.

[38] Y. J. Jeong, J. Kim, H. Y. Lee, and D.-S. Park, "Confidence stereo matching using complementary tree structures and global depth-color fitting," in Consumer Electronics (ICCE), 2013 IEEE International Conference on. IEEE, 2013, pp. 468–469.

[39] S. Jiao, X. Wang, M. Zhou, W. Li, T. Hong, D. Nam, J.-H. Lee, E. Wu, H. Wang, and J.-Y. Kim, "Multiple ray cluster rendering for interactive integral imaging system," Optics Express, vol. 21, no. 8, pp. 10070–10086, 2013.

[40] R. B. Johnson and G. A. Jacobsen, "Advances in lenticular lens arrays for visual display," in Optics & Photonics, 2005, pp. 587406–587406.
[41] J. Jurik, A. Jones, M. Bolas, and P. Debevec, "Prototyping a light field display involving direct observation of a video projector array," in Computer Vision and Pattern Recognition Workshops (CVPRW), IEEE Computer Society Conference on, 2011, pp. 15–20.

[42] Y.-Y. Kao, Y.-P. Huang, K.-X. Yang, P. C.-P. Chao, C.-C. Tsai, and C.-N. Mo, "11.1: An auto-stereoscopic 3D display using tunable liquid crystal lens array that mimics effects of GRIN lenticular lens array," in SID Symposium Digest of Technical Papers, vol. 40, no. 1, 2009, pp. 111–114.

[43] M. Kawakita, S. Iwasawa, M. Sakai, Y. Haino, M. Sato, and N. Inoue, "3D image quality of 200-inch glasses-free 3D display system," in IS&T/SPIE Electronic Imaging, 2012.

[44] K. Kikuta and Y. Takaki, "Development of SVGA resolution 128-directional display," in Electronic Imaging, 2007.

[45] J. C. Kim, K. M. Lee, B. T. Choi, and S. U. Lee, "A dense stereo matching using two-pass dynamic programming with generalized ground control points," in Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, vol. 2. IEEE, 2005, pp. 1075–1082.

[46] V. Kolmogorov and R. Zabih, "Computing visual correspondence with occlusions using graph cuts," in Computer Vision, 2001. ICCV 2001. Proceedings. Eighth IEEE International Conference on, vol. 2. IEEE, 2001, pp. 508–515.

[47] K.-C. Kwon, C. Park, M.-U. Erdenebat, J.-S. Jeong, J.-H. Choi, N. Kim, J.-H. Park, Y.-T. Lim, and K.-H. Yoo, "High speed image space parallel processing for computer-generated integral imaging system," Optics Express, vol. 20, no. 2, pp. 732–740, 2012.

[48] H. J. Lee, H. Nam, J. D. Lee, H. W. Jang, M. S. Song, B. S. Kim, J. S. Gu, C. Y. Park, and K. H. Choi, "8.2: A high resolution autostereoscopic display employing a time division parallax barrier," in SID Symposium Digest of Technical Papers, vol. 37, no. 1, 2006, pp. 81–84.

[49] J.-H. Lee, J. Park, D. Nam, S. Y. Choi, D.-S. Park, and C. Y. Kim, "Optimal projector configuration design for 300-Mpixel multi-projection 3D display," Optics Express, vol. 21, no. 22, pp. 26820–26835, 2013.

[50] A. Levin, D. Lischinski, and Y. Weiss, "A closed-form solution to natural image matting," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 2, pp. 228–242, 2008.

[51] M. Levoy, "Light fields and computational imaging," IEEE Computer, vol. 39, no. 8, pp. 46–55, 2006.
[52] M. Levoy and P. Hanrahan, "Light field rendering," in Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques, 1996, pp. 31–42.

[53] D. Liebowitz and A. Zisserman, "Metric rectification for perspective images of planes," in Proc. Int. Conf. Computer Vision and Pattern Recognition, 1998, pp. 482–488.

[54] D. G. Lowe, "Object recognition from local scale-invariant features," in Computer Vision, 1999. The Proceedings of the Seventh IEEE International Conference on, vol. 2. IEEE, 1999, pp. 1150–1157.

[55] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," Int. J. Comput. Vis., vol. 60, no. 2, pp. 91–110, 2004.

[56] J. Mallon and P. F. Whelan, "Projective rectification from the fundamental matrix," Int. J. Comput. Vis., vol. 23, no. 7, pp. 643–650, 2005.

[57] D. Marr and E. Hildreth, "Theory of edge detection," Proceedings of the Royal Society of London. Series B. Biological Sciences, vol. 207, no. 1167, pp. 187–217, 1980.

[58] L. McMillan and G. Bishop, "Head-tracked stereoscopic display using image warping," in IS&T/SPIE Electronic Imaging, 1995, pp. 21–30.

[59] L. McMillan and G. Bishop, "Plenoptic modeling: An image-based rendering system," in Proceedings of the 22nd Annual Conference on Computer Graphics and Interactive Techniques, 1995, pp. 39–46.

[60] D. Min, J. Lu, and M. N. Do, "Depth video enhancement based on weighted mode filtering," IEEE Transactions on Image Processing, vol. 21, no. 3, pp. 1176–1190, 2012.

[61] S.-W. Min, J. Kim, and B. Lee, "New characteristic equation of three-dimensional integral imaging system and its applications," Japanese Journal of Applied Physics, vol. 44, no. 1L, pp. L71–L74, 2005.

[62] S.-W. Min, K. S. Park, B. Lee, Y. Cho, and M. Hahn, "Enhanced image mapping algorithm for computer-generated integral imaging system," Japanese Journal of Applied Physics, vol. 45, no. 7L, pp. L744–L747, 2006.

[63] K. Nagano, A. Jones, J. Liu, J. Busch, X. Yu, M. Bolas, and P. Debevec, "An autostereoscopic projector array optimized for 3D facial display," in ACM SIGGRAPH Emerging Technologies, no. 3, 2013.

[64] H. Nakanuma, H. Kamei, and Y. Takaki, "Natural 3D display with 128 directional images used for human-engineering evaluation," in Electronic Imaging, 2005, pp. 28–35.
[65] A. Neil, "Autostereoscopic 3D displays," Computer, vol. 8, pp. 32–36, 2005.

[66] Y. Nojiri, H. Yamanoue, A. Hanazato, M. Emoto, and F. Okano, "Visual comfort/discomfort and visual fatigue caused by stereoscopic HDTV viewing," in Electron. Imaging, 2004, pp. 303–313.

[67] Y. Ohta and T. Kanade, "Stereo by intra- and inter-scanline search using dynamic programming," IEEE Transactions on Pattern Analysis and Machine Intelligence, no. 2, pp. 139–154, 1985.

[68] F. Okano, "3D TV with integral imaging," in SPIE Defense and Security Symposium, vol. 6983, 2008.

[69] A. Ouazan, P. T. Kovacs, T. Balogh, and A. Barsi, "Rendering multi-view plus depth data on light-field displays," in 3DTV Conference: The True Vision - Capture, Transmission and Display of 3D Video (3DTV-CON), 2011, pp. 1–4.

[70] V. Papadimitriou and T. J. Dennis, "Epipolar line estimation and rectification for stereo image pairs," IEEE Trans. Image Process., vol. 5, no. 4, pp. 672–676, 1996.

[71] K. S. Park, M. Sung-Wook, and C. Yongjoo, "Viewpoint vector rendering for efficient elemental image generation," IEICE Transactions on Information and Systems, vol. 90, no. 1, pp. 233–241, 2007.

[72] E. Rosten, R. Porter, and T. Drummond, "Faster and better: A machine learning approach to corner detection," IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 1, pp. 105–119, 2010.

[73] D. Scharstein and R. Szeliski, "Stereo matching with nonlinear diffusion," International Journal of Computer Vision, vol. 28, no. 2, pp. 155–174, 1998.

[74] D. Scharstein and R. Szeliski, "A taxonomy and evaluation of dense two-frame stereo correspondence algorithms," International Journal of Computer Vision, vol. 47, no. 1-3, pp. 7–42, 2002.

[75] A. Schwartz, "Head tracking stereoscopic display," IEEE Transactions on Electron Devices, vol. 33, no. 8, pp. 1123–1127, 1986.

[76] I. Sexton, "Parallax barrier display systems," in Stereoscopic Television, IEE Colloquium on, 1992, pp. 5-1.

[77] J. Shah, "A nonlinear diffusion model for discontinuous disparity and half-occlusions in stereo," in Computer Vision and Pattern Recognition, 1993. Proceedings CVPR '93., 1993 IEEE Computer Society Conference on. IEEE, 1993, pp. 34–40.

[78] H.-Y. Shum, S.-C. Chan, and S. B. Kang, Image-Based Rendering, 2008.

[79] A. M. Siu and R. W. Lau, "Image registration for image-based rendering," IEEE Transactions on Image Processing, vol. 14, no. 2, pp. 241–252, 2005.
[80] A. Smolic, K. Muller, K. Dix, P. Merkle, P. Kauff, and T. Wiegand, "Intermediate view interpolation based on multiview video plus depth for advanced 3D video systems," in Image Processing, 15th IEEE International Conference on, 2008, pp. 2448–2451.

[81] J. Sun, Y. Li, S. B. Kang, and H.-Y. Shum, "Symmetric stereo matching for occlusion handling," in Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, vol. 2. IEEE, 2005, pp. 399–406.

[82] J. Sun, N.-N. Zheng, and H.-Y. Shum, "Stereo matching using belief propagation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 7, pp. 787–800, 2003.

[83] Y. Takaki, "Thin-type natural three-dimensional display with 72 directional images," in Electronic Imaging, 2005, pp. 56–63.

[84] Y. Takaki, "Super multi-view display with 128 viewpoints and viewpoint formation," in IS&T/SPIE Electronic Imaging, vol. 7237, 2009.

[85] Y. Takaki and N. Nago, "Multi-projection of lenticular displays to construct a 256-view super multi-view display," Optics Express, vol. 18, no. 9, pp. 8824–8835, 2010.

[86] Y. Takaki, K. Tanaka, and J. Nakamura, "Super multi-view display with a lower resolution flat-panel display," Optics Express, vol. 19, no. 5, pp. 4129–4139, 2011.

[87] C. Van Berkel, "Image preparation for 3D LCD," in Proceedings of SPIE - IS&T Electronic Imaging, 1999, pp. 84–91.

[88] C. Van Berkel, D. W. Parker, and A. R. Franklin, "Multiview 3D LCD," in Electronic Imaging: Science & Technology, 1996, pp. 32–39.

[89] O. Veksler, "Efficient graph-based energy minimization methods in computer vision," Ph.D. dissertation, Cornell University, 1999.

[90] O. Veksler, "Stereo correspondence by dynamic programming on a tree," in Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, vol. 2. IEEE, 2005, pp. 384–390.

[91] G. Wetzstein, D. Lanman, M. Hirsch, and R. Raskar, "Tensor displays: compressive light field synthesis using multilayer displays with directional backlighting," ACM Transactions on Graphics (TOG), vol. 31, no. 80, 2012.

[92] B. Wilburn, N. Joshi, V. Vaish, M. Levoy, and M. Horowitz, "High-speed videography using a dense camera array," in Computer Vision and Pattern Recognition, Proceedings of the IEEE Computer Society Conference on, vol. 2, 2004, pp. 294–301.

[93] O. H. Willemsen, S. T. Zwart, M. G. Hiddink, D. K. Boer, and M. P. Krijn, "Multi-view 3D displays," in SID Symposium Digest of Technical Papers, vol. 38, no. 1, 2007, pp. 1154–1157.
[94] Q. Yang, L. Wang, and N. Ahuja, "A constant-space belief propagation algorithm for stereo matching," in Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on. IEEE, 2010, pp. 1458–1465.

[95] Q. Yang, L. Wang, R. Yang, H. Stewénius, and D. Nistér, "Stereo matching with color-weighted correlation, hierarchical belief propagation, and occlusion handling," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 3, pp. 492–504, 2009.

[96] R. Yang, X. Huang, and S. Chen, "Efficient rendering of integral images," in SIGGRAPH '05, Proceedings of the 32nd Annual Conference on Computer Graphics and Interactive Techniques, 2005, p. 44.

[97] K.-J. Yoon and I. S. Kweon, "Adaptive support-weight approach for correspondence search," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 4, pp. 650–656, 2006.

[98] A. L. Yuille and T. Poggio, "A generalized ordering constraint for stereo correspondence," 1984.

[99] J. Zhou and B. Li, "Image rectification for stereoscopic visualization," J. Opt. Soc. Amer. A, vol. 25, no. 11, pp. 2721–2733, 2008.