Rendering for Automultiscopic 3D Displays
by
Andrew Jones
A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements of the Degree
DOCTOR OF PHILOSOPHY
(Computer Science)
December 2016
Copyright 2016 Andrew Jones
Acknowledgements
The systems described in this thesis required a wide integration of custom optics,
electrical engineering, and computer science. This work would not have been possible
without my many colleagues who helped build, design, and test these display prototypes:
Ian McDowall, Magnus Lang, Koki Nagano, Jonas Unger, Jay Busch, Xueming Yu, Graham
Fyffe, Hideshi Yamada, Hsuan-Yueh Peng, Jing Liu, Joel Jurik, Oleg Alexander,
Carlos Morales, Megan Iafrati, and Joey Barreto.
Chapter 6 builds on an interactive framework developed by the ICT Natural Language
group including Ron Artstein, David Traum, Anton Leuski, Jillian Gerten, and Kallirroi
Georgila. I would like to also thank Steven Smith, Mark Rothman, Kia Hays, Karen
Jungblut from the USC Shoah Foundation, and Heather Maio from Conscience Display
for inspiring the "New Dimensions in Testimony" project and their tireless work through-
out development, interviews, and post-production. I would like to thank the generous
donations from private foundations and individuals that made the NDT project possible
including the Pears Foundation, Alan Shay, Lucy Goldman, and the Wolfson Foundation.
Special thanks goes to the Holocaust survivors who trusted us with their testimonies,
in particular Pinchas Gutter, who always amazed me with his resilience, patience, and
abundant generosity.
I would like to thank my other scanned subjects: Lynn Fyffe, Cynthia Richards, David
Morin, Morgan Spurlock, and Cara Santa Maria. My appreciation goes to Mark Levoy,
Stanford University, and the Smithsonian X project for providing scan geometry data-sets.
I would like to thank my fellow and former lab members: Abhijeet Ghosh, Pieter Peers,
Andrew Gardner, Andreas Wenger, Chris Tchou, Shanhe Wang, Matt Chiang, Alex Ma,
Charles-Felix Chabert, Bruce Lamond, Charis Poullis, Sebastian Sylwan, Per Einarsson,
Marcos Fajardo, Jessi Stumpfel, Marcel Ramos, Cyrus Wilson, and Tim Hawkins. Special
thanks to my fellow PhD students Paul Graham and Borom Tonwattanapong for their
constant efforts to keep my studies on track.
I had the luxury of a wide support network at the Institute for Creative Technologies
including Rob Groome, Diane Piepol, Tom Pereira, Monica Nichelson, Kathleen Haase,
Christina Trejo, Valerie Dauphin, Dell Lunceford, Bill Swartout, Clark Levin, Lila Brooks,
Jeff Fisher, Randy Hill, Randolph Hall, David Krum, Richard DiNinni, Scott Fisher, and
John Parmentola. This work was sponsored by the University of Southern California
Office of the Provost, U.S. Air Force DURIP, and U.S. Army Research, Development, and
Engineering Command (RDECOM). The high-speed projector was originally developed
by a grant from the Office of Naval Research under the guidance of Ralph Wachter and
Larry Rosenblum. The content of this thesis does not necessarily reflect the position or
the policy of the US Government, and no official endorsement should be inferred.
I would like to thank my PhD advisors Paul Debevec and Mark Bolas for their constant
inspiration and guidance along with my qualification and dissertation committee members
Jernej Barbič, Gerard Medioni, Perry Hoberman and Hao Li.
I would like to thank my family Tim (the first Dr. Jones), Caroline, Mom, and Dad
for their enduring love and support. Thanks to Valorie for reminding me to find the fun
in life, when the work was overwhelming. Finally, but certainly not least, I have to thank
my wonderful wife Julia Campbell, who was willing to share her dreams and push me to
realize mine.
Table of Contents
Abstract ix
1 Introduction 1
2 Background and Related Work 4
2.1 Types of 3D Displays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.2 Other Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3 Spinning Displays 14
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.2 System Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3.3 Projecting Graphics to the Display . . . . . . . . . . . . . . . . . . . . . . 21
3.3.1 Projecting from the Scene into the Projector . . . . . . . . . . . . 22
3.3.2 Ray Tracing from the Projector into the Scene . . . . . . . . . . . 24
3.3.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.4 Geometric Calibration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.5 Displaying Photographic Light Fields . . . . . . . . . . . . . . . . . . . . . 29
3.6 Visual Accommodation Performance . . . . . . . . . . . . . . . . . . . . . 32
3.7 Displaying Color Imagery . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.8 Approximating Conical Reflections . . . . . . . . . . . . . . . . . . . . . 33
4 3D Teleconferencing 39
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.2 Human Factors and Related Work . . . . . . . . . . . . . . . . . . . . . . 41
4.3 System Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
4.4 Projecting 3D Vertices to the Display . . . . . . . . . . . . . . . . . . . . 48
4.5 Flat and Curved 3D Display Surfaces . . . . . . . . . . . . . . . . . . . . . 52
4.6 Face Tracking for Vertical Parallax . . . . . . . . . . . . . . . . . . . . . . 54
4.7 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
5 Projector Arrays 61
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
5.2 Apparatus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
5.3 Calibration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
5.4 Viewer Interpolation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
5.5 Convex Screen Projection . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
5.6 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
6 Time-offset Conversations 78
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
6.2 Interview Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
6.3 Data Capture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
6.4 Display Hardware . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
6.5 Light Field Rendering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
6.6 Natural Language Interface . . . . . . . . . . . . . . . . . . . . . . . . . . 90
6.7 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
7 Future Work 96
7.1 Spinning mirror displays . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
7.1.1 Data Transfer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
7.1.2 Dithering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
7.1.3 Color reproduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
7.1.4 Alternate Sizes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
7.2 Projector Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
7.3 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
7.3.1 3D teleconferencing . . . . . . . . . . . . . . . . . . . . . . . . . . 99
7.3.2 Time-offset conversations . . . . . . . . . . . . . . . . . . . . . 100
7.4 Practical Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
8 Conclusion 102
BIBLIOGRAPHY 106
List of Figures
3.1 A 3D object shown on the spinning mirror display is photographed by
two stereo cameras . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.2 Graph plotting importance of depth cues at varying distances . . . . . . 16
3.3 Photograph and schematic of spinning mirror display . . . . . . . . . . . 17
3.4 Twenty-four consecutive binary frames of interactive OpenGL graphics
are packed into a single 24-bit color image. . . . . . . . . . . . . . . . . . 18
3.5 Measuring holographic diffusion . . . . . . . . . . . . . . . . . . . . . . . 20
3.6 Diagram explaining vertex projection and light field sampling for spinning
mirror display . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.7 Each viewpoint sees sections of multiple projector frames reflected by the
spinning mirror to form a single perspective image. . . . . . . . . . . . . 25
3.8 Comparison of projection methods for a spinning mirror display . . . . . 26
3.9 Fiducial markers for calibrating spinning mirror display . . . . . . . . . 28
3.10 Validation of resampled light field shown on spinning mirror display . . 35
3.11 Visualization of vertical light field rebinning on spinning mirror display . 36
3.12 Photographs of eight frames from a 25-frame animated light field as shown
on the display. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.13 Photographs testing focus on spinning mirror display . . . . . . . . . . . 37
3.14 A two-mirror tent for displaying two-toned color imagery using orange
and cyan filters below the diffusers. . . . . . . . . . . . . . . . . . . . . . 38
4.1 Photographs of 3D teleconferencing apparatus . . . . . . . . . . . . . . . 42
4.2 Patterns used for 3D face scanning, including two phase-shifted sinusoids
and two half-lit images. . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.3 Convex, flat, and concave display surfaces made from brushed aluminum
sheet metal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
4.4 A top view showing the anisotropic reflection of a projector ray into a cone. 49
4.5 A grid of points on a frontoparallel plane is processed through the 6D
lookup table to produce warped geometry displayed on the projector. . . 57
4.6 Photographs showing divergence of light reflected off flat and concave
anisotropic mirrors. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
4.7 2D face tracking for dynamic vertical parallax . . . . . . . . . . . . . . . 59
4.8 Measurement of gaze accuracy for the 3D teleconferencing display . . . . 59
4.9 Comparison of the different mirror shapes for simultaneously tracked up-
per and lower views. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
5.1 3D stereo photographs of a human face on the autostereoscopic projector
array . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
5.2 The anisotropic screen forms a series of vertical lines, each corresponding
to a projector lens . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
5.3 Photograph of projector array calibration setup . . . . . . . . . . . . . . 66
5.4 Diagrams explaining per-vertex interpolation of multiple viewer positions 71
5.5 Warped MCOP frames sent to the projectors for flat and convex screens . 72
5.6 Diagrams showing resolution tradeoff and iterative projector refinement
for convex mirror screens . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
5.7 Comparison of different viewer interpolation functions. We show three
objects photographed by an untracked center camera and two tracked
left and right cameras. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
5.8 Comparison of different viewer interpolation functions for a convex mirror 77
6.1 (top) Seven of the Panasonic cameras mounted around the stage to record
the performance. (bottom) Mosaic of all 30 camera views. . . . . . . . . 81
6.2 (left) Overhead diagram showing layout of projectors and screen. (right)
Diagram showing connectivity between computers, splitters, and projec-
tors. The render clients are synchronized and controlled by a single master
server. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
6.3 (left) Photograph showing the 6 computers, 72 video splitters, and 216
video projectors used to display the subject. (right) The anisotropic
screen scatters light from each projector into a vertical stripe. The indi-
vidual stripes can be seen if we reduce the angular density of projectors.
Each vertical stripe contains pixels from a different projector. . . . . . . 85
6.4 We compute bidirectional optical flow between adjacent pairs in the hor-
izontal camera array. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
6.5 (left) View generated using bilinear interpolation exhibits aliasing. (cen-
ter) View generated using spatial optical flow has sharp edges and less
aliasing. (right) Closeup of aliasing around the face. . . . . . . . . . . . 88
6.6 For each point on the screen, we sample the spatial optical flow fields
between the two nearest cameras. Each optical flow pair represents a dif-
ferent point on the actual surface. We estimate an intermediate image
coordinate using both spatial flows to interpolate between views and tem-
poral flows to offset to a global timeframe. In the right diagram, each
image plane is represented as a slanted line due to the rolling shutter in
each camera . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
6.7 This is a sampling of four projector frames generated using flowed light
field rendering. Each frame appears warped as it corrects for multiple
centers of projection and foreshortening. . . . . . . . . . . . . . . . . . . 90
6.8 Photograph of subject shown on the automultiscopic projector array. . . 93
6.9 Stereo photograph of subjects on the display, left-right reversed for cross-
fused stereo viewing. Each subject is shown from three positions. . . . . 94
6.10 Stereo photographs of 3D geometry on the display, left-right reversed for
cross-fused stereo viewing. Each object is shown from four positions. . . 95
Abstract
While a great deal of computer generated imagery is modelled and rendered in three
dimensions, the vast majority of this 3D imagery is shown on two-dimensional displays.
Various forms of 3D displays have been contemplated and constructed for at least one
hundred years, but only recent advances in digital capture, computation, and display
have made functional and practical 3D displays possible. In this thesis, I propose several
designs that overcome some of the classic limitations of 3D displays. The displays are:
autostereoscopic, requiring no special viewing glasses; omnidirectional, allowing viewers
to be situated anywhere around it; and multiview, producing a correct rendition of the
3D objects with correct horizontal parallax and vertical perspective for any viewer around
the display.
The first display prototype utilizes a spinning anisotropic mirror to distribute frames
from a high-speed video projector to different viewers. Unfortunately, as the size and
mass of the mirror increases, it becomes increasingly difficult to maintain a stable and
rapid rotation speed. The second 3D display form has no moving mechanical parts,
provides interactive content, and scales to large format displays. The key insight is that
a large array of closely stacked projectors aimed at a stationary anisotropic screen is
optically equivalent to a single high-speed projector aimed at a rotating anisotropic screen.
Both types of display utilize new algorithms based on geometry and light field based
rendering. Applications for these displays include life-size interactive virtual characters,
3D teleconferencing, and time-offset conversations with 3D subjects.
Chapter 1
Introduction
Our world is being rapidly populated by digital displays from televisions to computers to
cell phones. While flat electronic displays represent a majority of user experiences, it is
important to realize that flat surfaces represent only a small portion of our physical world.
The relative three-dimensional shape and position of objects is key to understanding and
interacting with the world around us. However, most digital applications still require
stereo glasses for the perception of true 3D imagery. Automultiscopic displays hold the
promise of seamless 3D imagery that can be seen from any viewpoint without the need for
special 3D glasses. Such displays can generate a stronger perception of 3D with both stereo
and motion parallax. The primary obstacle to designing practical glasses-free displays is
finding new ways to redirect pixels in different angular directions in order to be seen by
multiple viewers.
In this thesis, I present a series of display prototypes that offer a number of advan-
tages for displaying three-dimensional objects in 3D. These displays are autostereoscopic,
requiring no special viewing glasses, omnidirectional, allowing viewers to be situated any-
where around it, and multiview, producing a correct rendition of the light field with correct
horizontal parallax and vertical perspective for any viewpoint situated at a certain dis-
tance and height around the display. The displays are built using primarily commodity
graphics and display components, and achieve real-time rendering with non-trivial scene
complexity across their entire field of view.
The first display uses a spinning anisotropic mirror to reflect rays from a high-speed
DLP video projector to multiple viewers. I explore different mirror shapes and projector
configurations to increase spatial, angular, and temporal resolution. Later, I modified this
spinning mirror display for real-time 3D teleconferencing with a dynamic and life-size
3D human face. Unfortunately, as the size and mass of the mirror increases, it becomes
increasingly difficult to maintain a stable and rapid rotation speed. This limitation led
to a second display design that has no moving mechanical parts and can scale to large
format displays. The key insight is that a large array of closely stacked projectors aimed
at a stationary anisotropic screen is optically equivalent to a single high-speed projector
aimed at a rotating anisotropic screen. New pocket-sized PICO projectors can be stacked
with less than 1.5cm separation. This distance is smaller than the human inter-ocular
spacing, so a large dense projector array can generate a distinct image for each eye. I
demonstrate two projector array systems with different display volumes. The first system
is optimized for the human face, while the latter can show a full human body at a life-size
scale.
Both spinning mirror and projector array displays represent a new type of horizontal-
parallax multi-view display that can achieve high-angular resolution. The anisotropic
screen or mirror turns the light of the projector into a vertical stripe while preserving
horizontal angular variation. The viewer's eye perceives several stripes from multiple
mirror or projector positions that combine to form a seamless 3D image. As the adjacent
projector pixels diverge out to different viewer positions, there is no longer a one-to-one
mapping between projector frames and viewing positions. Rendering to such a display
requires the generation of multiple-center-of-projection (MCOP) imagery. I develop and
demonstrate the projection mathematics and rendering methods necessary to drive these
displays with real-time raster imagery or pre-recorded light fields so that they exhibit the
correct cues for both horizontal and vertical parallax.
Chapter 2
Background and Related Work
2.1 Types of 3D Displays
The terminology used in three-dimensional displays is often misused or misunderstood.
This is in part a result of the many varied display technologies that can achieve some level
of three-dimensional effect. There is also significant confusion in popular culture as to
how displays work. In this thesis, we attempt to follow the taxonomy provided in recent
three-dimensional display surveys [108, 22, 18]. In keeping with the work of Zwicker et al.
[128], we describe our display prototypes as both autostereoscopic and automultiscopic.
Both terms refer to the generation of stereo parallax without the need for glasses, while
the latter term means that multiple stereo views are shown, allowing for motion parallax
and multiple simultaneous users.
Holographic Displays One particularly overused term is "hologram". In popular culture
this has been applied to any floating image even if it is a two-dimensional Pepper's ghost.
The term hologram was originally coined by Dennis Gabor, based on the Greek term for
the "whole message", as holograms record the full wavefront reflected off an object. Unlike
traditional photographs that record only the intensity of light, a hologram records both
intensity and phase. Traditional analog holograms use a coherent light source to illuminate
an object and record the reflected interference pattern on an analog film negative. By
illuminating the resulting film with coherent light, it is possible to reconstruct the same
wavefront and the corresponding original 3D image of the object. More recent computer-
generated holography uses spatial light modulators to diffract light based on a rendered
holographic fringe pattern. True holograms face several major obstacles. In order to
diffract light, the resolution of the spatial light modulators has to be on the same scale
as the wavelength of light. Recording, rendering, and displaying at this resolution is
demanding both in terms of computational power and cost. As a result, most practical
holographic displays have limited field of view, limited color depth, or are limited to static
scenes. It is useful to note that displays such as that described by Agocs, and indeed
the solutions proposed here, use holographic optical elements, but are not considered
holographic displays.
Volumetric Displays The term volumetric displays has been used to cover a wide
range of displays. In the most general meaning, this covers any display that recreates a
three-dimensional representation within a volume. Unlike holographic displays, volumet-
ric displays do not consider the phase of light and treat light as rays emitted from within
a volume. There are two main categories: mechanical swept-volume displays and fixed
static volume displays. Swept-volume displays illuminate a rapidly moving surface with
a high-speed projector or laser. As the surface passes through slices of the 3D scene, the
light source illuminates the 3D points coincident with the surface. The display surface
is typically a reflective or transmissive diffuse material reflecting light over a wide field
of view. An example swept-volume display was produced by Actuality Systems using a
spinning diffuse disc and a high-speed DLP projector. Other displays have used spinning
helices and vibrating sheets. Fixed volume displays do not have any moving parts, and illu-
minate voxels in an emissive solid, liquid, or gas. Examples include illuminating defects in a
solid transparent volume [79], projecting patterns onto fog or water, or using intersecting
pulsed infrared laser beams to create balls of glowing plasma in normal air [49, 84]. An
alternative approach is to use sound waves to distort the medium; for example, [83] levitates
small reflective particles using standing acoustic waves. An illumination volume could be
composed of multiple layered emissive LCD screens such as the commercial LightSpace
display. More recently, transmissive LCD stacks have been used as compressive displays
as described below.
For all volumetric displays, the 3D image is highly dependent on the scattering profile
of the display screen or volume. If the material has a wide isotropic reflection seen in
all directions, then there is a one-to-one correspondence between the 3D glowing points
and the desired 3D scene. This results in full accommodation, horizontal and vertical
parallax. However, for an isotropic medium, it is not possible to create directional effects
such as occlusion and opacity. Our displays belong to an emerging class of horizontal-
parallax multiview 3D displays that combine one or more video projectors to generate
view-dependent images on an anisotropic screen. Such screens reflect light in a narrow
horizontal direction and diffuse light vertically. The display can thus generate different
horizontal rays with different intensities at the expense of vertical parallax.
Multi-view Anisotropic Displays The idea for achieving occluding 3D imagery by
projecting on or through a moving anisotropic screen has existed within the field of holog-
raphy for over a decade [6] and is more recently explored in [64, 13]. Recent systems that
employ this idea include [64], which uses an anisotropic privacy-guard film on a spinning
LCD monitor. Their system is limited by the mass of the LCD panel and its slow update
rate, allowing only five revolutions per second with just six independent viewpoints. The
Transpost system [86] renders 24 images around the outer edge of a video projected image
and reflects these images onto a rapidly rotating anisotropic screen using a circle of mirror
facets. The system aims for a similar form factor and effect as mine, but achieves only
24 low-resolution (100x100) images around the circle. Their design does not scale well to
additional views as the views must be arranged in a circle within the projected image,
severely limiting their pixel size. However, it achieves 24-bit color whereas the spinning
mirror prototype is limited to halftoned imagery. The LiveDimension system [105] uses
an inward-pointing circular array of 12 projectors and a vertically-oriented light-control
film, similar to that used in [64], to reflect each projector's image outwards to the view-
ers. While they achieve twelve full-color views, they do not produce a sufficient number
of views for binocular parallax, and a greater number of views would require a greater
number of projectors and use progressively less light from each of them. The Seelinder
display [21, 122] takes a different approach of spinning multiple 1D vertical arrays of LEDs
past a cylindrical parallax barrier to produce 3D images. They achieve better than 1°
view spacing but with a relatively low resolution of 128 vertical pixels, and they require
very specialized hardware. Cossairt et al. [12] describe a display that couples a three-chip
high-speed DLP projector with a moving slit and a large lens to direct images in 26 hor-
izontal directions at 50Hz, but it uses highly specialized hardware and has a limited field
of view. None of these systems compensate for changing vertical perspective and parallax
and all require either many projectors or very specialized hardware.
Our first prototype design is similar to the work by Cossairt et al. [14] in that both sys-
tems use a single high-speed DLP projector to project patterns onto a spinning anisotropic
surface. While the system specifications are comparable (Table 2.1), Cossairt et al. [14]
use a proprietary system architecture and do not address the problem of rendering 3D
scenes with either correct horizontal or vertical perspective to this type of display. The
perspective-correct projection technique is a central focus and contribution of this thesis.
Other high-speed projectors use proprietary PCI data transfer boards [12, 102]. Typ-
ically such boards generate voxel geometry which is rasterized on the display itself. The
voxel transfer is relatively slow. In order to achieve interactive rates, the DepthCube
display [102] limits the field of view and transfers only the front surface of the scene vol-
ume. Our system takes advantage of standard graphics card acceleration and transfers
rendered 360° views of the scene across a standard monitor cable (DVI) in real-time. The
high-speed projectors used in this thesis were designed and built by Ian McDowall. Mc-
Dowall previously developed a high-speed projector that looped through 24 binary frames
stored in a single 24-bit DVI image [69, 41]. For Chapters 3 and 4, a new projector was
built with field-programmable gate array (FPGA) hardware to decode frames in real-time.
This projector enables arbitrarily long sequences at much faster frame rates.
                           Cossairt et al. [14]   Chapter 3 system          Chapter 4 system
Interactive content        no                     yes                       yes
Visual refresh rate        30Hz                   15-20Hz (30-40Hz color)   30Hz
Per-view resolution        768x768                768x768                   512x768
Angular resolution         0.91°                  1.25°                     0.625°
Horizontal field of view   180°                   360°                      180°
Image diameter             25 cm                  13 cm                     32 cm
Screen rotation frequency  900 rpm                900-1200 rpm              900 rpm
Color depth                dithered RGB           dithered B&W              dithered
                                                  or 2 channel color        2 channel color
Electronic interface       SCSI-3 Ultra           DVI                       DVI
Projection technique       single-view            multiple centers          multiple centers
                           perspective            of projection             of projection
Horizontal perspective     inaccurate             accurate                  accurate
Vertical parallax          no                     yes, with tracking        yes, with tracking
Table 2.1: Comparison of the spinning mirror prototypes with [14]. Rotation frequency
and visual refresh rate vary based on the graphics card data refresh rate.
Projector Arrays As early as 1931, Ives demonstrated that a vertically oriented
lenticular screen could be used to generate autostereoscopic 3D imagery [38]. Originally
a special photograph with alternating left and right eye stripes was mounted directly
behind the screen. Today, LCDs with lenticular lenses remain the most common
form of autostereoscopic display due to their relatively low cost. The number of views
is limited by the pixel pitch of the backing image or LCD. Ives showed that increased
stripe density could be achieved by focusing 39 film projectors onto a retroreflective or
diffuse screen behind the lenticular array. A similar idea was used by Matusik and Pfister
[68] who presented a real-time acquisition, transmission, and display system using 16 digital
projectors and a vertical lenticular screen. In both cases, the light is focused onto a
diffuse plane behind the lenticular array then redistributed by each cylindrical lens. The
angular variation comes from the shape and focal length of the lenticular array. Even
with additional projector resolution, nearly all vertical lenticular displays have a limited
field of view as the lenticular cylinders start to self-occlude.
If the cylindrical lenticular lenses are aligned horizontally then they function as a
vertical anisotropic diffuser. This orientation allows for greater angular density as it pre-
serves the angular variation of the original projector spacing. Based on this principle, the
commercial company Holografika has demonstrated large-format projector arrays includ-
ing the rear-projection HoloVizio 720RC [95] and the front-projection HoloVizio C80 [50].
Recently, Kawakita et al. [47] designed a massive 5 meter rear projection display. Both
the Holografika and Kawakita et al. displays are built using large projector units with wide
spacing. Additional optics are added to their screens to refocus and redirect projector
rays across a narrow field of view. Yoshida et al. [123] have developed an array with 103
micro-LCD projectors as a glasses-free tabletop 3D display. The projectors illuminate a
conical anisotropic screen coupled with a holographic diffuser situated below the table
surface. A more in-depth comparison with projector arrays can be found in Table 2.2.
Zwicker et al. [128] analyzed the depth of field of automultiscopic displays - showing
that spatial resolution decreases exponentially away from the screen surface. The rate
of decline is also proportional to the angular resolution of the display - higher angular
densities can more accurately resolve points further from the screen. As the highest reso-
lution is at the screen surface, ideally the screen should bisect the scene content. For
table-top form-factors this is challenging as virtual objects above the table are further
from the screen surface.
                           HoloVizio     Kawakita      Yoshida         Chapter 5       Chapter 6
                           720RC [95]    et al. [47]   et al. [123]    system          system
Angular resolution         unknown       0.24°         1.27°           1.66°           0.66°
Horizontal diffusion       unknown       0.88°         0.5-1°          1-2°            1°
Vertical diffusion         unknown       35°           60°             60°             60°
Horizontal field of view   40°           13.5°         130°            118°            135°
Horizontal screen size     3m            5m            20cm            30cm            1.2m
Screen shape               flat          flat          cone/cylinder   flat/convex     flat
Number of projectors       80            57            103             72              216
Projector distance         5.5m          5.5m          80cm            60cm            3.3m
Number of computers        4             unknown       6               1               6
Vertical parallax          no            no            no              yes,            no
                                                                       with tracking
Table 2.2: Comparison of our system specifications with other autostereoscopic projector
arrays.
Compressive Light Field Displays Similar to volumetric displays, compressive dis-
plays utilize multiple emissive or transmissive layers to reconstruct the 3D scene. However,
instead of a one-to-one correspondence between pixels and scene points, compressive dis-
plays consider the cumulative effect of multiple pixels that lie along each ray path. Each
pixel may influence multiple rays, yet each ray passes through a slightly different slice
of the controllable display volume. The final distribution of pixel intensities across the
multiple layers is the result of a content-adaptive optimization process. This approach
exploits redundancy between nearby views in order to reduce the number of source pixels
required to generate a given number of distinct and simultaneous views.
Compressive displays can be further categorized based on the type of pixel modulation.
The earliest compressive displays used emissive layers, where each pixel could add but not
remove light. For example, Mora et al. [74] use tomographic reconstruction techniques to
identify a distribution of emissive points that can best reconstruct occlusion for a small
set of views on an Actuality volumetric display. Similar techniques could be applied to
other volumetric displays.
A variety of compressive displays have been built using layered LCDs with a shared
backlight [54, 115, 56, 113, 65, 10]. An overview of these displays can be found in [116].
These LCDs function multiplicatively, as each layer scales the light passing through it.
A summary of numerical methods for optimizing pixel intensities can be found in [57].
The optimization problem can be treated as low-rank matrix factorization for two layers
[54] or tensor factorization for arbitrary layers [113]. Wetzstein et al. use tomographic
techniques [115] as well as real-time optimization methods [114]. Hirsch et al. [33] provide
practical implementation details for compressive displays. Lanman et al. [53] conduct
a depth of field analysis for compressive displays. Compressive displays can also be
combined with user tracking. Chen et al. [10] optimize a multi-layer display for tracked
view zones. A variant of multiplicative layers is explored using LCD panels to rotate
polarization along each ray [56]. Multiple spatial light modulators can be layered to
create a compressive light field projector, combined with an angle-expanding screen to
increase its narrow field of view [34].
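To make the layered-factorization idea concrete, here is a toy sketch in Python (purely illustrative; the matrix layout, rank, function names, and update rule are assumptions, not the algorithm of any specific system cited above) that factors a target light field matrix into two non-negative layer patterns for a two-layer multiplicative display using standard multiplicative updates:

import numpy as np

# Toy sketch of content-adaptive layer optimization for a two-layer
# multiplicative display. The target light field is arranged as a matrix
# L[i, j]: the ray leaving rear-layer pixel i toward viewing direction j.
# With R time-multiplexed subframes, the display shows roughly F @ G, where
# F holds R rear-layer patterns (columns) and G holds R front-layer patterns
# (rows). All names, sizes, and the update rule are illustrative assumptions.

def factor_two_layer(L, rank=3, iters=200, eps=1e-8):
    n, m = L.shape
    rng = np.random.default_rng(0)
    F = rng.random((n, rank))
    G = rng.random((rank, m))
    for _ in range(iters):
        # Multiplicative (NMF-style) updates keep the layer patterns
        # non-negative; a physical display would also rescale them to [0, 1].
        G *= (F.T @ L) / (F.T @ F @ G + eps)
        F *= (L @ G.T) / (F @ G @ G.T + eps)
    return F, G

if __name__ == "__main__":
    i, j = np.meshgrid(np.linspace(0, 1, 64), np.linspace(0, 1, 32), indexing="ij")
    L = np.clip(0.5 + 0.4 * np.sin(4 * i + 2 * j) * np.cos(3 * j), 0, 1)
    F, G = factor_two_layer(L, rank=3)
    print("reconstruction RMSE:", np.sqrt(np.mean((L - F @ G) ** 2)))

The displayed result is approximately the product of the two layer patterns, which is why redundancy between nearby views lets a small number of subframes reproduce many distinct views.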
2.2 Other Related Work
Multi-view rendering algorithms Our projection algorithms relate to previous work
in holography and light field rendering. Halle et al. [29] proposed a method where static
holographic stereograms account for the viewer's distance but not their height. Much
of the existing light field literature [61, 25, 36] describes useful techniques for acquiring,
storing, and sampling multi-view content. Results from [8, 128] informed our choices for
the amount of horizontal diffusion, the number of views we render around the circle, and
the camera aperture used to record our light fields. Our technique for multiple-center-
of-projection view rendering using GPU vertex shaders is informed by the recent work of
Hou et al. [35].
User-tracking User tracking has long been used for single-user glasses displays [15,
51, 111] and single-user autostereoscopic displays [88] in order to update both horizontal
and vertical parallax. Our systems are the first autostereoscopic projector arrays to
incorporate tracking for vertical parallax. Our method has the advantage that it can
provide correct horizontal parallax to an arbitrary number of users, and tracked
vertical parallax to users a shoulder distance apart. Tracking latency is less noticeable as
it is restricted to the vertical direction, where rapid movements are less frequent. None
of these existing lenticular or projector arrays generate vertical parallax, though our
technique could be implemented for these and other autostereoscopic displays.
Light field rendering In Chapter 6, we resample a set of capture cameras to generate
novel views corresponding with each projector in the array. This work builds on extensive
previous research in the area of image-based and light field based rendering. Chen et
al. [118] warp rendered images using depth maps to generate novel views while [58, 71] use
stereo correspondence to compute depth for altering the viewpoint of real scenes. Seitz
and Dyer [98] present a view morphing technique for synthesizing correct perspective
for novel viewpoints between corresponded original views; light field rendering techniques
[61, 25] directly synthesize views of a scene from a new 3D viewpoint by sampling rays from
a 2D array of densely spaced viewpoints. Visual fidelity can be improved by projecting
image samples onto scene geometry [31] or explicitly forming a surface light field [72].
Several works have constructed 2D arrays of cameras to capture light fields of dynamic
events. Among them, Yu et al. [124] used an array of cameras and distributed rendering
to allow multiple viewers to observe virtual views in real time; [YMG02] extended this
work to surface cameras that allow the light field to focus on non-planar geometry. Zhang
and Chen [125] also used depth information to focus a real-time light field from a self-
reconfigurable camera array. Wilburn et al. [117] used video from an array of cameras to
perform spatiotemporal view interpolation: generating novel views from intermediate
camera positions and points in time by computing optical flow between views staggered
in both time and space.
Chapter 3
Spinning Displays
Figure 3.1: A 3D object shown on the display is photographed by two stereo cameras
(seen in the middle image). The two stereo viewpoints sample the 360° field of view
around the display. The right pair is from a vertically-tracked camera position and the
left pair is from an untracked position roughly horizontal to the center of the display. The
stereo pairs are left-right reversed for cross-fused stereo viewing.
3.1 Introduction
Since the invention of early 19th century stereoscopes [7], most 3D displays have required
stereo glasses to present a different image to each eye. However, the human visual system
is not limited to binocular stereo, using many other cues to recognize the depth and
shape of the world around us including motion parallax, focus cues, the size and height of
objects, occlusion, shading, and texture. Existing autostereoscopic technologies can only
create a subset of these depth cues. Holographic displays, lenticular and integral displays
only work within a narrow view area - thus limiting motion parallax. Alternatively, swept
volumetric displays illuminate a moving display surface to create glowing 3D points visible
from all directions. Yet these displays cannot recreate monocular depth cues such as
occlusion and shading without angular control of the scattered light. The displayed 3D
points cannot block light from more distant surfaces. These missing cues are critical.
Occlusion can be perceived at any distance, making it arguably the strongest depth cue
available. Similarly, motion parallax has been shown to be more important than binocular
stereo at distances greater than one meter [16]. Figure 3.2 plots the relative importance
of these depth cues.
Our first prototype generates correct 3D perspective, occlusion, and shading over a full
360° viewing area. The display utilizes a high-speed video projector and a synchronized
spinning anisotropic screen to reflect sequential images to viewers around the display.
The high angular density of views allows for both binocular stereo and horizontal motion
parallax without glasses. Vertical parallax can be achieved by combining the display with
active tracking of viewer positions. We refer to this prototype as automultiscopic - as
multiple viewers can observe the display simultaneously.
In this chapter, we introduce techniques for rendering both 3D geometry and real-
world light fields to the display. This requires both calibration of the display hardware,
and novel projection math that relates each rendered pixel to its desired viewing direction.
We analyze the display's ability to reproduce color and focus/accommodation cues. Our
contributions include:
- An easily reproducible 360° horizontal-parallax light field display system that lever-
ages low-cost commodity graphics and projection display hardware.
- A novel software/hardware architecture that enables real-time update of high-speed
video projection at kilohertz rates using standard graphics hardware.
- A light field display technique that is horizontally multiview autostereoscopic and
employs vertical head tracking to produce correct vertical parallax for tracked users.
- A novel projection algorithm for rendering multiple-center-of-projection OpenGL
graphics onto an anisotropic projection surface with correct vertical perspective for
any given viewer height and distance.
Figure 3.2: Graph showing the relative importance of depth cues at varying distances.
Occlusion does not provide absolute depth information, but can be perceived at any
distance. Stereo parallax is most useful for nearby scenes, while motion parallax is more
important for scenes further than one meter away. [16]
3.2 System Overview
Our 3D display system consists of a spinning mirror covered by an anisotropic holographic
diffuser, a motion-control motor, a high-speed video projector, and a standard PC. The
DVI output of the PC graphics card (an nVIDIA GeForce 8800) is interfaced to the
projector using an FPGA-based image decoder. As seen in Figure 3.3, the spinning
mirror is tilted at 45° to reflect rays of light from the projector to all possible viewing
positions around the device, allowing many people to view the display simultaneously.
The remainder of this section provides details of the system components.
Figure 3.3: (Left) The display shows an animated light field in 3D to an audience around
the device. (Right) Schematic showing the high-speed projector, spinning mirror, and
synchronized motor.
High-Speed Projector We achieve high-speed video projection by modifying an off-
the-shelf projector to use a new DLP drive card with custom programmed FPGA-based
circuitry. The FPGA decodes a standard DVI signal from the graphics card. Instead of
rendering a color image, the FPGA takes each 24-bit color frame of video and displays
each bit sequentially as separate frames (Figure 3.4). Thus, if the incoming digital video
signal is 60Hz, the projector displays 60 × 24 = 1,440 frames per second. To achieve even
faster rates, we set the video card refresh to rates of 180-240Hz. At 200Hz, the projector
displays 4,800 binary frames per second. We continuously render new horizontal views of
the subject (288 images per rotation). These views are encoded into 24-bit images and
sent to the projector. A complete kit consisting of the FPGA and DLP boards is now
available from Polaris Road, Inc.
Figure 3.4: Twenty-four consecutive binary frames of interactive OpenGL graphics are
packed into a single 24-bit color image.
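To illustrate the packing scheme of Figure 3.4, the following sketch (hypothetical host-side Python helpers, not the actual rendering or FPGA code; the channel and bit ordering are assumptions) packs 24 halftoned binary views into the bit planes of a single 24-bit RGB image and unpacks them the way the projector-side decoder displays them:

import numpy as np

def pack_binary_frames(frames):
    """Pack 24 binary frames (list of HxW arrays of 0/1) into one HxWx3 uint8
    image: bits 0-7 go to the red channel, 8-15 to green, 16-23 to blue."""
    assert len(frames) == 24
    h, w = frames[0].shape
    packed = np.zeros((h, w, 3), dtype=np.uint8)
    for i, frame in enumerate(frames):
        channel, bit = divmod(i, 8)
        packed[:, :, channel] |= (frame.astype(np.uint8) & 1) << bit
    return packed

def unpack_binary_frames(packed):
    """Inverse operation: recover the 24 binary frames in display order."""
    frames = []
    for i in range(24):
        channel, bit = divmod(i, 8)
        frames.append((packed[:, :, channel] >> bit) & 1)
    return frames

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    views = [rng.integers(0, 2, size=(768, 768), dtype=np.uint8) for _ in range(24)]
    image = pack_binary_frames(views)
    assert all(np.array_equal(a, b) for a, b in zip(views, unpack_binary_frames(image)))

Carrying 24 binary subframes in each DVI frame is what turns a 200Hz video signal into 4,800 binary projector frames per second.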
Spinning Mirror System Previous volumetric displays projected images onto a spin-
ning diffuse plane which scattered light in all directions. Such displays could not recre-
ate view-dependent effects such as occlusion. In contrast, our projection surface is an
anisotropic holographic diffuser bonded onto a first-surface mirror. The mirrored surface
reflects each projector pixel to a narrow range of viewpoints. The holographic diffuser pro-
vides control over the width and height of this region. The characteristics of the diffuser
are such that the relative diffusion between x and y is approximately 1:200. Horizontally,
the surface is sharply specular to maintain a 1.25° separation between views. Vertically,
the mirror scatters widely so the projected image can be viewed from essentially any
height. Figure 3.5 shows the anisotropic reflectance characteristics of the mirror system.
The horizontal profile of the specular lobe approximates a bilinear interpolation between
adjacent viewpoints; the motion of the mirror adds some additional blur which improves
reproduction of halftoned imagery at the expense of angular resolution.
The anisotropic holographic diffuser and mirror assembly are mounted on a carbon
fiber panel and attached to an aluminum flywheel at 45°. The flywheel spins syn-
chronously relative to the images displayed by the projector. A two-mirror system (which
is more balanced) for reflecting multi-color imagery is described in Section 3.7.
Our system is synchronized as follows. Since the output frame rate of the PC graph-
ics card is relatively constant and cannot be fine-tuned on the fly, we use the PC video
output rate as the master signal for system synchronization. The projector's FPGA also
creates signals encoding the current frame rate. These control signals interface directly to
an Animatics SM3420D "Smart Motor" which contains firmware and motion control pa-
rameters, resulting in a stable, velocity-based control loop that ensures the motor velocity
stays in sync with the signals from the projector. As the mirror rotates up to 20 times
per second, persistence of vision creates the illusion of a floating object at the center of
the mirror.
Tracking for Vertical Parallax The projector and spinning mirror yield a horizontal-
parallax-only display; the image perspective does not change correctly as the viewpoint
moves up and down, or forward and backward. However, the projection algorithms we
describe in Section 4.4 take into account the height and distance of the viewer to render
the scene with correct perspective. If just horizontal parallax is required, a good course
Figure 3.5: Measuring the Holographic Diffusion The holographic diffuser is dif-
fuse in the vertical dimension and sharply specular in the horizontal dimension. Left:
Photographs of a laser beam and a thin vertical line of light from the video projector as
reflected by the holographic diffuser and mirror toward the viewer. The horizontal width
represented in each image is four degrees. The bottom image shows the ideal bilinear
interpolation spread of a hat function whose radius matches the 1.25° angular separation
of the display's successive views. Right: Graphs of the horizontal intensity profiles of
the images at left. Dotted red is the laser, solid blue is the projector, and dashed black
is the bilinear interpolation function.
of action is to initialize this height and distance to the expected typical viewing height
and distance.
Since our display is interactive, we can achieve both horizontal and vertical parallax
display by using a tracking system to measure the user's height and distance. In this
work, we use a Polhemus Patriot electromagnetic tracking system where the user holds
the sensor to their temple (or to a video camera filming the display). The tracking data
is used by the projection algorithm to display the scene from the correct perspective for
the viewer's height and distance. In this way, the display's horizontal parallax provides
binocular stereo and yields zero lag as the user moves their head horizontally, which we
believe to be the most common significant head motion. The effects of vertical motion
and distance change are computed based on the tracked position. The display only needs
to adjust the rendered views in the vicinity of each tracked user, leaving the rest of the
displayed circumference optimized to the average expected viewer position (Figure 5.1).
This provides an advantage over CAVE-like systems where the tracked user's motion
alters the scene perspective for all other users. More advanced user tracking is presented
in Chapters 4 and 5.
3.3 Projecting Graphics to the Display
Figure 3.6: (a) Intersection of a vertically diffused ray of light with the circular locus of
viewpoints V. (b) Seen from above, rays leaving the mirror diverge from the projector's
reflected nodal point to multiple viewpoints. The viewpoint corresponding to vertex Q
is found by intersecting the vertical plane containing ray P'Q with the viewing circle V.
(c) When preprocessing a light field, the intersection point V' determines the nearest
horizontal views to sample.
In this section we describe how to render a scene to the 3D display with correct
perspective, using either scanline rendering or ray tracing. We assume that the spinning
mirror is centered at the origin and that its axis of rotation is the vertical y-axis, with
the video projector at the nodal point P above the mirror as in Figure 3.6(a). We further
assume that the viewpoint for which the correct perspective should be obtained is at a
height h and a distance d from the y-axis. By the rotational symmetry of our system, we
can produce perspective-correct imagery for any viewing position on the circle V defined
by h and d, yielding binocular images for a viewer facing the display since h and d will be
similar for both eyes. We denote a particular viewpoint on the circle V as V'. In practice,
the set of perspective-correct viewpoints V need not be a continuous planar circle and
can pass through a variety of tracked viewer positions at different distances and heights.
At any given instant, with the spinning anisotropic mirror frozen at a particular posi-
tion, the 2D image projected onto the mirror is reflected out into space, covering parts of
the field of view of many viewpoints on V as shown in Figure 3.6(b) and photographically
observed in Figure 3.7. Since the mirror provides little horizontal diffusion, each projector
pixel (u, v) essentially sends light toward one specific viewpoint V' on V. We must ensure
that each projected pixel displays the appropriate part of the scene as it should be seen
from viewpoint V'. Thus, there are two questions we should be able to answer: First, for
a 3D point Q in a scene, what is the corresponding projector pixel (u, v) that reflects to
the correct viewpoint V' along the ray QV'? Second, for a given projector pixel (u, v),
which ray should be traced into the scene so that the display projects the correct ray
intensity? The first answer tells us how to render 3D geometric models to the display
and the second answer tells us how to render ray-traceable scenes such as light fields. We
answer these two questions below.
3.3.1 Projecting from the Scene into the Projector
If our scene is a polygonal 3D model, we need to determine for any world-space vertex
Q where it should be rendered on the projector's image for any given mirror position.
To do this, we view our system from above and note that in the horizontal plane, our
anisotropic mirror essentially behaves like a regular mirror. We thus unfold the optical
path by reflecting the projector position P to P' across the plane of the mirror as seen
in Figure 3.6(b). A ray originating at P' passing through Q will continue out into space
toward the viewers. This ray P'Q will not, in general, intersect the view circle V. By
assuming that the mirror diffuses rays into a vertical plane, we intersect the vertical plane
containing P'Q with the viewing circle V to determine the viewpoint V' from which Q
will be seen with the mirror at its given position. Section 3.8 explains that this diffusion
plane is actually an approximation to a cone-shaped reflection from the mirror, but that
the projection error is small for our setup and can be neglected in practice.
We then trace a ray from the viewpoint V' toward Q until it intersects the surface of
the mirror at M. M is the one point on the mirror that reflects light to the viewer coming
from the direction of Q. To draw onto this point from the projector, we simply need to
project M up toward the projector's nodal point P to find the corresponding projector
pixel (u, v). Thus, illuminating a pixel at (u, v) will make it appear from viewpoint V'
that 3D point Q has been illuminated. Q will eventually be rendered as it should be seen
from all other viewpoints on V as the mirror rotates.
Implementation With these few geometric intersections, we can determine for any 3D
point Q where it should be drawn on the projector for each position of the mirror. Seen
on the display by a viewer, the observed images exhibit correct perspective projection as
in Figure 3.8(c). This technique actually renders multiple-center-of-projection (MCOP)
images to the projector which cannot be generated using a traditional projection matrix;
essentially, the projection uses a combination of two different viewpoints P (for horizontal
coordinates) and V' (for vertical coordinates). Nonetheless, the technique is easily imple-
mented as a vertex shader (see Table 3.1), allowing an entire mesh to be rendered in
a single pass. For z-buffering, vertex depth can be based on the distance from V' to Q.
In this MCOP projection, long straight lines should naturally appear curved in the pro-
jection. Thus, models with large polygons should be tessellated; alternatively, a fragment
shader as in [35] could discard incorrect pixels that lie outside the triangle.
void rasterVS(
    float4 Q : POSITION,                 // vertex position
    float4 Qcol : COLOR0,                // vertex color
    uniform float4x4 ModelViewProj,      // projector transform
    uniform float4 P,                    // reflected projector position P'
    uniform float d,                     // view radius
    uniform float h,                     // view height
    uniform float4 mirror_norm,          // normal of mirror plane
    out float4 oQ : POSITION,
    out float4 oQcol : COLOR0 )
{
    // define ray from reflected projector position P' to vertex Q
    float4 PQ = Q - P;
    PQ = normalize(PQ);
    // compute intersection of ray PQ with the vertical cylinder of
    // radius d to find view position V'
    float4 V = RayCylinderIntersection(PQ, d);
    V.y = h;                             // set correct viewer height
    // define ray from ideal viewing position V' to vertex Q
    float4 VQ = Q - V;
    VQ = normalize(VQ);
    // compute intersection of ray VQ with mirror plane to find point M
    float4 M = RayPlaneIntersection(VQ, mirror_norm);
    oQ = mul( ModelViewProj, M );        // project M into projector
    oQcol = Qcol;                        // keep the existing vertex color
    // recompute depth based on distance from V'
    oQ.z = length(V - Q) / (2 * length(V - M));
}
Table 3.1: Cg shader code to map a 3D scene vertex into projector coordinates as de-
scribed in Section 3.3.1. It assumes helper functions are defined for basic geometric
intersection operations.
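For reference, here is a CPU-side sketch of the same mapping in Python, spelling out the ray-cylinder and ray-plane intersections that the shader assumes as helpers. The geometry (mirror plane near the origin, viewing circle of radius d at height h, reflected projector position P') follows Section 3.3.1, but the function names, parameterization, and the single 4x4 projector matrix are illustrative assumptions rather than the system's actual code:

import numpy as np

def ray_cylinder_intersection(origin, direction, radius):
    """Intersect the ray origin + t*direction with the vertical cylinder
    x^2 + z^2 = radius^2 centered on the y-axis; the origin is assumed to lie
    inside the cylinder, so the forward (t > 0) hit is returned."""
    ox, _, oz = origin
    dx, _, dz = direction
    a = dx * dx + dz * dz
    b = 2.0 * (ox * dx + oz * dz)
    c = ox * ox + oz * oz - radius * radius
    t = (-b + np.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
    return origin + t * direction

def ray_plane_intersection(origin, direction, plane_normal, plane_point):
    """Intersect the ray origin + t*direction with the plane through plane_point."""
    t = np.dot(plane_point - origin, plane_normal) / np.dot(direction, plane_normal)
    return origin + t * direction

def project_vertex(Q, P_reflected, d, h, mirror_normal, mirror_point, projector_matrix):
    """Map a world-space vertex Q to projector clip coordinates (Section 3.3.1 sketch)."""
    # 1. The ray from the reflected projector P' through Q hits the viewing
    #    cylinder of radius d; clamping its height to h gives the viewpoint V'.
    V = ray_cylinder_intersection(P_reflected, Q - P_reflected, d)
    V[1] = h
    # 2. The ray from V' toward Q hits the mirror plane at M, the point that
    #    reflects projector light toward V'.
    M = ray_plane_intersection(V, Q - V, mirror_normal, mirror_point)
    # 3. Project M with the projector's 4x4 matrix (hypothetical here).
    clip = projector_matrix @ np.append(M, 1.0)
    return clip / clip[3]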
3.3.2 Ray Tracing from the Projector into the Scene
If the scene to be displayed (such as a light field) is most easily raytraced, we need to
determine for each projector pixel (u, v) which ray in space, from the viewer toward
the scene, corresponds to that pixel. We again use the reflected projector position in
Figure 3.6(b) and project a ray from P' through its corresponding pixel (u, v) to where
it intersects the surface of the mirror at point M. Upon intersecting the diffuser, we
assume that this ray P'M spreads into a vertical fan of light which intersects the circle of
views V at V'. Seen from above, this intersection is easily calculated as a 2D line-circle
intersection.
We now know that projector pixel (u, v) reflects from mirror point M toward viewpoint
V'. Thus, the color it should display should be the result of tracing a ray from V' toward
point M. If our scene is a light field, we simply query ray V'M for the scene radiance at
that point. We discuss using this result to render 4D light fields in real time in Section
3.5.
Figure 3.7: Each viewpoint sees sections of multiple projector frames reflected by the
spinning mirror to form a single perspective image. The slices that compose the single
view shown in (a) can be seen directly in high-speed video images taken of the mirror (b).
(c) and (d) show photographs of the mirror reflecting a sequence of alternating all-black
and all-white frames from 56cm and 300cm away, respectively, showing that the number
of frames seen varies with viewer distance.
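A sketch of this inverse mapping is given below (again illustrative Python; unprojecting pixel (u, v) through a hypothetical inverse projector matrix is an assumed interface, not the system's actual code). It returns the viewpoint V' and mirror point M for a pixel, so the light field can be sampled along the ray from V' toward M:

import numpy as np

def pixel_to_lightfield_ray(u, v, P_reflected, inv_projector_matrix,
                            mirror_normal, mirror_point, d, h):
    """For projector pixel (u, v) in normalized device coordinates, return
    (V', M): the viewpoint on the viewing circle that the pixel reaches and
    the mirror point it reflects from."""
    # 1. Unproject (u, v) through the reflected projector P' to get a
    #    world-space ray direction (inv_projector_matrix is a hypothetical
    #    inverse of the projector's projection*view matrix).
    far_point = inv_projector_matrix @ np.array([u, v, 1.0, 1.0])
    far_point = far_point[:3] / far_point[3]
    direction = far_point - P_reflected
    # 2. Intersect the ray with the mirror plane to find M.
    t = np.dot(mirror_point - P_reflected, mirror_normal) / np.dot(direction, mirror_normal)
    M = P_reflected + t * direction
    # 3. The diffuser spreads the ray into a vertical fan: intersect the 2D
    #    line from P' through M (seen from above) with the viewing circle of
    #    radius d, then set the viewer height h.
    o = np.array([P_reflected[0], P_reflected[2]])
    dir2 = np.array([M[0] - P_reflected[0], M[2] - P_reflected[2]])
    a = np.dot(dir2, dir2)
    b = 2.0 * np.dot(o, dir2)
    c = np.dot(o, o) - d * d
    s = (-b + np.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
    V = np.array([o[0] + s * dir2[0], h, o[1] + s * dir2[1]])
    return V, M

Blending the nearest captured views along the ray from V' toward M then gives the pixel color, which is the basis for the light field rendering discussed in Section 3.5.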
Figure 3.8: A scene is rendered from above (top row) and straight-on (bottom row) using three methods. (a) Projecting regular perspective images exaggerates horizontal perspective and causes stretching when the viewpoint rises. (b) Projecting a perspective image that would appear correct to the viewer if the mirror were diffuse exaggerates horizontal perspective and causes keystoning. (c) Our MCOP algorithm produces perspective-correct images for any known viewpoint height and distance.
3.3.3 Discussion
The fans of light from a given projector frame diverge horizontally toward multiple viewpoints. As the mirror rotates, each viewpoint around the display sees a vertical line that scans out pixels from numerous projected MCOP images to form a single perspective image. We captured the formation of these slices using a high-speed camera as seen in Figure 3.7(a,b). The number of slices that make up an observed image depends on the viewpoint's distance from the display. We tested this by projecting a sequence of alternating all-black and all-white images, allowing the number of images contributing to any one viewpoint to be counted easily. Closer to the mirror (Figure 3.7(c)), the number of images that contributes to the view increases. As the viewpoint recedes (Figure 3.7(d)),
the number of images contributing to a view decreases to a minimum of approximately
ten. This number never drops to one since our video projector is not orthographic.
Comparison with other rendering methods Simpler techniques can be used to project imagery to the display, but they do not achieve correct perspective. [14] recommends displaying perspective or orthographic images of the scene directly to the projector. Unfortunately, this technique yields images with exaggerated horizontal perspective (Figure 3.8(a)) since it does not consider that the image seen at a viewpoint consists of vertical slices of many of these perspective or orthographic images. This approach also neglects projector ray divergence; the lower part of the spaceship appears too tall since it is further from the projector.
Another technique would be to project perspective images to the display surface that would appear correct to a given viewpoint if the mirror were replaced with a completely diffuse surface. [19, 93] describe this process in the context of theater and interactive applications. However, this technique does not project perspective-correct imagery for our 3D display (Figure 3.8(b)). While the vertical perspective is accurate, the rendering shows exaggerated horizontal perspective (the wings splay outward) and the image is also skewed. Using the MCOP projection technique described above, images appear perspective-correct for any viewer on V, and V can be adjusted for any estimated or tracked viewer height and distance (Figure 3.8(c)).
3.4 Geometric Calibration
Our projection process requires knowing the intrinsic projector parameters and its pose relative to the spinning mirror. We choose our world coordinates to originate at the center of the mirror, with the vertical axis (0, 1, 0) oriented along the mirror's axis of rotation.
Figure 3.9: (a) Fiducial markers used for determining the projection matrix P. (b) The four outer mirror fiducials as seen by the projector with the mirror at 0° and 180°.
Calibration is relatively straightforward as we only use a single projector and optical path with a single rotating element.
We use the simple linear calibration approach outlined in Section 3.2 of [23]. The method requires at least 6 correspondences between known 3D points and their transformed 2D pixel positions. We ignore radial lens distortion as this was measured to be insignificant.
We obtain known 3D positions by marking fixed points on the mirror surface. With the motor off, we position the mirror so that it faces the front of the display and attach a paper calibration target consisting of five fiducial markers on the mirror's surface (Figure 3.9). We project a centered crosshair pattern from the projector so that it can be positioned directly above the center fiducial. (The projector is mounted so that its central projected pixel projects down vertically.) We use a mouse to move the crosshair to each of the other fiducial markers, clicking the mouse to obtain the position of the corresponding projector pixel. We then rotate the mirror 180° and click the four fiducials again, obtaining a total of eight 2D points. The eight fiducial positions form a unit cube in space.
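This linear calibration is a standard direct linear transform: each 3D-2D correspondence contributes two rows to a homogeneous system whose null space gives the 3×4 projection matrix. A minimal sketch, assuming the Eigen library is available (the function name and data layout are mine, not from [23]):

#include <Eigen/Dense>
#include <vector>

// Estimate a 3x4 projection matrix from at least 6 correspondences between
// known 3D points X and their observed 2D projector pixels x.
Eigen::Matrix<double, 3, 4> estimateProjection(
    const std::vector<Eigen::Vector3d>& X,
    const std::vector<Eigen::Vector2d>& x)
{
    const int n = static_cast<int>(X.size());
    Eigen::MatrixXd A(2 * n, 12);
    for (int i = 0; i < n; ++i) {
        Eigen::RowVector4d Xh(X[i].x(), X[i].y(), X[i].z(), 1.0);
        A.row(2 * i)     << Xh, Eigen::RowVector4d::Zero(), -x[i].x() * Xh;
        A.row(2 * i + 1) << Eigen::RowVector4d::Zero(), Xh, -x[i].y() * Xh;
    }
    // The solution is the right singular vector with the smallest singular
    // value, reshaped into the 3x4 matrix P.
    Eigen::JacobiSVD<Eigen::MatrixXd> svd(A, Eigen::ComputeFullV);
    Eigen::Matrix<double, 12, 1> p = svd.matrixV().col(11);
    Eigen::Matrix<double, 3, 4> P;
    P.row(0) = p.segment<4>(0).transpose();
    P.row(1) = p.segment<4>(4).transpose();
    P.row(2) = p.segment<4>(8).transpose();
    return P;
}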
3.5 Displaying Photographic Light Fields
This section describes how we capture, preprocess, and dynamically render 4D light fields to the device with correct horizontal and vertical parallax, leveraging the ray tracing projection developed in Section 3.3.2.
Light Field Capture We begin by capturing a 4D light field of a real object. In this work, we place the object on an inexpensive motorized turntable (Figure 3.10, top row). A video camera is placed at a distance of D = 1.0m in front of the object. The object is lit with ambient light and/or lights attached to the turntable so that the object and its illumination remain in the same relationship to each other during the rotation. We capture a movie sequence of at least 288 frames of the object rotating 360° on the turntable, which takes a few seconds. We capture a full 4D light field by shooting multiple rotations of the turntable, raising the camera's height H by 1.25cm for each successive rotation. We calibrate the intrinsic parameters for the camera and record its pose for each rotation.
Preprocessing the Light Field As discussed in Section 3.3.3, regular perspective images shown directly on the projector will not produce correct perspective to viewers around the display. Thus, we pre-process the light field to produce images appropriate for projection. We first align our object and display coordinate systems by placing the origin at a point within the center of the object directly above the center of the turntable, and we align the y axis to the turntable's axis of rotation. Then, for each slice i of the captured light field taken from height H_i, we generate a new, rebinned, light field slice as follows. We place the virtual viewing circle V around the display at height H_i and distance D. Then, for each of the 288 mirror positions, we trace rays from the reflected projector at P′ through each pixel (u, v) to the mirror at M and on to the viewpoint
V′ on V and then back toward M, as described in Section 3.3.2. We then simply need to query the light field for its radiance along ray V′M. This is a simple query since we chose V to be coincident with the height and distance of the current slice of the light field: V′ thus lies on or between two of the same slice's camera locations C_i and C_{i+1}, as in Figure 3.6(c). To obtain the final pixel value, we only need to bilinearly interpolate between the pixels from C_i and C_{i+1} that look toward point M on the mirror.
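A minimal sketch of this rebinning query (the camera interface and blending here are simplified assumptions, not the exact data structures used):

#include <algorithm>

struct Color  { float r, g, b; };
struct Point3 { float x, y, z; };

struct Camera {
    // Hypothetical interface: returns the stored radiance of the pixel in this
    // camera's image whose ray passes through the given world point.
    Color sampleToward(const Point3& worldPoint) const;
    float azimuth;    // camera angle on the capture circle, in radians
};

// V' lies on the capture circle between cameras Ci and Cj of the same light
// field slice, so the radiance along ray V'M is approximated by blending the
// two cameras' pixels that look toward M.
Color queryLightField(const Camera& Ci, const Camera& Cj,
                      float viewAzimuth,     // azimuth of the viewpoint V'
                      const Point3& M)       // mirror point the ray passes through
{
    float t = (viewAzimuth - Ci.azimuth) / (Cj.azimuth - Ci.azimuth);
    t = std::clamp(t, 0.0f, 1.0f);           // stay within the [Ci, Cj] span
    Color a = Ci.sampleToward(M);
    Color b = Cj.sampleToward(M);
    return { a.r + t * (b.r - a.r),
             a.g + t * (b.g - a.g),
             a.b + t * (b.b - a.b) };
}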
For our display, we next dither the rebinned slices using [85] to create binary images as in the middle row of Figure 3.10, and we pack sets of 24 halftoned images into 24-bit color images. As there are 288 images in each rebinned slice, this yields twelve 24-bit color images per row. At 768 × 768 resolution, one slice requires just over 20MB of texture memory, allowing a light field resolution of over 768 × 768 pixels by 288 × 32 views to be stored on a modern 768MB graphics card.
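The packing itself is simple bit assignment: 24 consecutive halftoned views each occupy one bit plane of a 24-bit RGB image. A sketch (the particular view-to-bit ordering is an assumption; it must match the projector's decoding of the DVI signal):

#include <cstdint>
#include <vector>

// Pack 24 binary (0/1) halftoned views of size width x height into one
// 24-bit image: view k becomes bit (k % 8) of channel (k / 8).
std::vector<uint8_t> packViews(const std::vector<std::vector<uint8_t>>& views,
                               int width, int height)
{
    std::vector<uint8_t> packed(static_cast<size_t>(width) * height * 3, 0);
    for (int k = 0; k < 24; ++k) {
        int channel = k / 8;                 // 0 = R, 1 = G, 2 = B
        int bit     = k % 8;
        for (int p = 0; p < width * height; ++p) {
            if (views[k][p])
                packed[p * 3 + channel] |= static_cast<uint8_t>(1u << bit);
        }
    }
    return packed;
}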
By construction, each one of the rebinned light field slices yields correct perspective when projected on the display and observed anywhere from the original slice's height H_i and distance D. If the viewer distance remains near distance D, one could produce accurate vertical parallax by swapping which slice is displayed according to the user's height. To render the light field accurately for any height and distance, we use a dynamic rebinning process described below.
Dynamic Rebinning for Vertical Parallax We perform dynamic vertical rebinning that samples from different preprocessed light field slices based on the viewer's height h and distance d to produce correct horizontal and vertical perspective on the light field for any viewpoint. For each mirror position, we consider each slice i's nodal point at distance D and height H_i in front of the mirror as shown in Figure 3.11(a). We project the midpoints between the slices through the viewer position onto the mirror, and then
up into the projector image. These projected midpoints form an axis of points crossing the center of the projector image. We extend lines from each point perpendicularly to this axis, dividing the projector's image into a set of regions, each one corresponding to the area for which light field slice i contains the rays that most closely correspond to the viewpoint's view of the scene over that area. We delimit the regions as quadrilaterals that extend wide enough to cover the image as seen in Figure 3.11(b). Then, for each quadrilateral, we render a texture-mapped polygon that copies over the corresponding region from each light field slice. A result of building up a projected image from these different slices is seen in Figure 3.11(c).
If the viewer is close to distance D from the display, just one or two light field slices will constitute the projected images. As the viewer moves forward or back from D, the number of slices used will increase. Since the images on the graphics card are already dithered, we perform no blending between the slices. However, our light field was of sufficient vertical angular resolution that the seams between the slices were not noticeable. Figure 3.10, bottom row, shows a photograph of a dynamically rebinned light field for a tracked camera with the original object seen nearby in the frame, exhibiting consistent size and perspective. A sequence of dynamically-rebinned 4D light field imagery displayed to a moving camera is shown in the accompanying video.
Displaying an Animated Light Field Instead of using the graphics card's memory to store multiple vertical slices of an object's light field, we can store multiple temporal samples of a horizontal-parallax-only light field. Figure 3.12 shows photographs from a 25-frame animated light field of a running man captured and rendered using the flowed reflectance field rendering technique of [20]. Alternatively, light fields from multi-camera systems [120, 117] could be used, or a high-speed single-camera system using a spinning mirror to vary the viewpoint as in [40] could be used to capture such data.
3.6 Visual Accommodation Performance
Accommodation is the effect whereby each point of the displayed 3D image comes into focus at a depth that is consistent with its displayed binocular disparity. Achieving correct visual accommodation can significantly improve the visual effectiveness of a 3D display [3]. We performed a basic accommodation test on our 3D display by photographing a test scene shown by the display using a wide-aperture lens at different focal depths. The results of the experiment are shown in Figure 3.13.
As we present a true light field in a horizontal plane, the accommodation of the human eye should be at the depth of features on the virtual object. We have verified this to be true by placing a horizontal slit across the front of a long lens, and then adjusting the focus from near to far on a model of small evenly spaced cubes which fill the display's volume. A detail of these images is presented in Figure 3.13(a), which shows receding boxes coming into focus as the lens is adjusted. The narrow depth of field of the lens naturally blurs boxes fore and aft of the focal distance. It is interesting to note that this blur is made of discrete images due to the quantized nature of our 288-image light field.
Diffusion in the vertical plane disrupts the purity of the light field for angles other than that of direct reflection. This is confirmed in Figure 3.13(b), which was captured by adjusting focus from near to far with a vertical slit placed in front of the long lens. We note that the focus recedes with the plane of the diffusing mirror, and that the virtual depth of the small cubes does not play a role as it did with the horizontal slit. The actual image is a blend of both effects, and the diffusing plane bisects the volume, which appears to provide a comfortable field upon which to focus the eye.
3.7 Displaying Color Imagery
A straightforward method to create a color version of our display would use a 3-chip DMD projector. In advance of that, we have implemented a two-channel field-sequential color system using a two-sided tent-shaped diffusing mirror shown in Figure 3.14(a). For each side of the tent, we place a color filter between the holographic diffusing film and the first-surface mirror, which avoids introducing specular first-surface reflections. We chose a Lee #131 cyan filter for one side and a Lee #020 orange filter for the other, dividing the visible spectrum approximately evenly into short and long wavelengths. We convert RGB colors to Orange-Cyan colors by projecting the linear RGB vector onto the plane spanned by the Orange and Cyan colors.
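One natural reading of this conversion is a least-squares projection onto the two filter primaries; a sketch (the primaries would come from measuring the filters and are not given here):

#include <array>

using Vec3f = std::array<float, 3>;

static float dot3(const Vec3f& a, const Vec3f& b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// Project a linear RGB color onto the plane spanned by the orange and cyan
// primaries, returning the (o, c) intensities such that o*orange + c*cyan
// best matches the input color in the least-squares sense.
std::array<float, 2> rgbToOrangeCyan(const Vec3f& rgb,
                                     const Vec3f& orange,
                                     const Vec3f& cyan)
{
    // Solve the 2x2 normal equations  G [o c]^T = [orange.rgb, cyan.rgb]^T.
    float a = dot3(orange, orange), b = dot3(orange, cyan), d = dot3(cyan, cyan);
    float r0 = dot3(orange, rgb),   r1 = dot3(cyan, rgb);
    float det = a * d - b * b;
    float o = ( d * r0 - b * r1) / det;
    float c = (-b * r0 + a * r1) / det;
    return { o, c };
}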
To render in color, we calibrate each plane of the tent mirror independently as in Section 3.4. Then, we render the 3D scene twice for each sub-frame, once for the orange side and once for the cyan side, and the calibration process ensures that each side is rendered toward the appropriate set of viewpoints. The effect for the viewer is similar to the Kinemacolor 2-color cinema system, and the choice of filters allows for useful color reproduction for many scenes. Besides achieving color, the tent-mirror system doubles the number of images per second shown to the viewers, allowing a 40Hz field-sequential color frame rate which appears significantly more stable than 20Hz monochrome.
3.8 Approximating Conical Reflections
The projection algorithm presented in Section 3.3 makes a slight approximation by assuming that the mirror diffuses light from a ray of the projector into a fan of light within a vertical plane. However, these fans of light are generally conical in shape. The reflectance
properties of an anisotropic surface can be simulated as small parallel cylindrical micro-facets aligned with the dominant axis of anisotropy a [89]. In our setup, a is a horizontal vector in the plane of the mirror. A projector ray striking a cylindrical micro-facet will be specularly reflected at a mirror angle along the cylinder tangent. The reflected light forms a cone whose angle at the apex is equal to the angle of incidence [46]. The reflected light forms a plane in the special case where the incident light is perpendicular to the dominant anisotropic axis. As our projector is mounted vertically relative to the mirror with a relatively narrow field of view, the projector rays always hit the mirror at close to 90 degrees, yielding extremely wide cones. Furthermore, the cones are tangent to the ideal vertical plane in the vicinity of rays P′Q, making these planes close approximations to the reflected fans of light in our setup. The step that involves reflecting the projector through the plane of the mirror also implicitly makes this assumption, but again the effects are minimal with our configuration. Errors would appear as a narrowing of the horizontal perspective at extremely high and low viewpoints. Analytically intersecting a cone with the viewing circle V is possible but computationally expensive, requiring solving a higher-order polynomial equation [73]. In practice, a look-up table could be employed to correct for the small projection errors introduced by the conical reflection.
Figure 3.10: (Top row) Two images from an object light field captured using a turntable.
(Middle row) Resampled projector frames optimized for the same two viewer heights. Both
frames compensate for the horizontal divergence of projector rays and vertical stretching at
oblique viewing angles. The images appear mirror-reversed (and for most views, rotated)
prior to projection. (Bottom row) A single photograph of the original object sitting to
the right of its virtual version shown on the 3D display.
Figure 3.11: (a) To produce correct vertical parallax, vertical light field rebinning is performed dynamically by projecting the light field slice closest in angle to the viewpoint onto each area of the mirror. (b) These projected areas define textured quadrilaterals on the mirror surface, each corresponding to a light field slice. (c) The areas corresponding to different original slices are made visible by inverting every other quadrilateral of this dynamically rebinned projector frame.
Figure 3.12: Photographs of eight frames from a 25-frame animated light field as shown on the display.
(a) Horizontal focus change (b) Vertical focus change
Figure 3.13: (a) Correct horizontal accommodation Using a horizontal slit aperture,
the front boxes come into focus when the camera focus is near (top) and the far boxes come
into focus when the focus is far (bottom). (b) Incorrect vertical accommodation
Using a vertical slit aperture, the camera focuses on the mirror plane that slopes away
from the viewer. The bottom boxes are in focus when the camera is focused near (left).
The top boxes are in focus when the camera is focused far. An unoccluded observer would
observe an astigmatic combination of these two effects.
Figure 3.14: (a) A two-mirror tent for displaying two-toned color imagery using orange and cyan filters below the diffusers. (b) A photograph of color imagery displayed by our device.
Chapter 4
3D Teleconferencing
4.1 Introduction
When people communicate in person, numerous cues of attention, eye contact, and gaze direction provide important additional channels of information [4], making in-person meetings more efficient and effective than telephone conversations and 2D teleconferences. However, with collaborative efforts increasingly spanning large distances and the economic and environmental impact of travel becoming increasingly burdensome, telecommunication techniques are becoming increasingly prevalent. Thus, improving the breadth of information transmitted over a video teleconference is of significant interest.
The potential utility of three-dimensional video teleconferencing has been dramatized in movies such as Forbidden Planet and the Star Wars films. The films usually depict a single person transmitted three-dimensionally from a remote location to interact with a group of colleagues somewhere distant. The films depict accurate gaze and eye contact cues which enhance the dramatic content, but the technology is fictional. A recent demonstration by CNN showed television viewers the full body of a remote correspondent transmitted "holographically" to the news studio, appearing to make eye contact with the news anchor. However, the effect was performed with image compositing in
postproduction and could only be seen by viewers at home; the anchor actually stared across empty space toward a traditional flat panel television [94]. The Musion Eyeliner (http://www.eyeliner3d.com/) system claims holographic "3D" transmission of figures such as Prince Charles and Richard Branson in life size to theater stages, but the transmission is simply 2D high definition video projected onto the stage using a Pepper's Ghost [101] effect only viewable from the theater audience; the real on-stage participant must pretend to see the transmitted person from the correct perspective to help convince the audience of the effect. Cisco Systems' TelePresence systems use a controlled arrangement of high-definition video cameras and life-size video screens to produce the impression of multiple people from different locations sitting around a conference table, but the use of 2D video precludes the impression of accurate eye contact: when a participant looks into the camera, everyone seeing their video stream sees the participant looking toward them; when the participant looks away from the camera (for example, toward other participants in the meeting), no one sees the participant looking at them.
In this chapter, we develop a one-to-many teleconferencing system which uses a novel
arrangement of 3D acquisition, transmission, and display technologies to achieve accurate
reproduction of gaze direction and eye contact. The system targets the common appli-
cation where a single remote participant (RP) wishes to attend a larger meeting with an
audience of local participants. In this system, the face of the RP is three-dimensionally
scanned at interactive rates while watching a large screen showing an angularly correct
view of the audience. The scanned RP's geometry is then shown on the 3D display to the
audience. To achieve accurate eye contact in this 3D teleconferencing system, we make
the following contributions:
1. We combine an adaptation of the real-time face scanning system of [126] with an
evolved version of the 3D display system in Chapter 3 [43], allowing life-sized 3D
face transmission.
2. We reformulate a generalized multiple-center-of-projection rendering technique for accurately displaying three-dimensional imagery to arbitrary viewer positions by projecting onto anisotropic display surfaces. In particular, we generalize the technique to arbitrarily curved display surfaces, which allows a concave display surface that significantly simplifies projecting correct vertical perspective to audience members at different heights and distances from the display. We achieve high-speed rendering with accurate conic intersection mathematics using a 6D vertex shader lookup table.
3. We project accurate, dynamic vertical parallax of the remote participant to multiple simultaneous viewers at different heights by interactively tracking the viewers' face positions in the teleconference video stream. This allows tracked vertical and autostereoscopic horizontal parallax to be provided using a horizontal-parallax-only display, allowing accurate eye contact to be simulated.
4.2 Human Factors and Related Work
Gaze, attention, and eye contact are important aspects of face to face communication [4];
they help create social cues for turn taking, establish a sense of engagement, and indicate
the focus and meaning of conversation. Although eye contact sensitivity is asymmetric [9]
and special configurations can help experienced users determine which gaze directions
signify mutual eye contact [26], it is still useful to develop systems that intrinsically
support direct eye contact. Systems that support direct eye contact have elicited behaviors
Figure 4.1: (Left) The real-time 3D scanning system showing the structured light scanning
system (120Hz video projector and camera) and large 2D video feed screen. (Right) The
3D Display apparatus showing the two-sided display surface, high-speed video projector,
frontal beamsplitter, and 2D video and face tracking camera. Crossed polarizers prevent
the video feed camera from seeing past the beamsplitter.
more similar to face to face conversation, allowing users to more quickly confirm the communications channel [78] and more easily develop trust in a group [82].
Beamsplitters (e.g. [91]), teleprompter type configurations, and other hardware have been used to create direct eye contact in 2D video systems ([96] and [26] review many such systems). One notable use of such hardware has been in the film industry for documentary interviews with an enhanced dramatic connection to the audience [76]. Other researchers have demonstrated software techniques for resynthesizing video imagery to enhance eye gaze, correcting for off-axis camera and display placement [121, 39, 24, 87, 63]. Other
telecollaboration systems that leverage eye gaze include Clearboard [37], GAZE [109],
Hydra [99], and Multiview [81].
The design of our system is informed by several human factors studies. Prussog et al. [90] performed experiments demonstrating that the impression of telepresence is increased if the remote viewer is shown at natural size and/or stereoscopically. Muhlbach et al. [77] found that achieving more accurate eye contact angles improved participants' ability to recognize individually addressed nonverbal signals. Chen [9] reported that the perception
of eye contact decreases below 90 percent if the horizontal contact angle is greater than 1° or the vertical contact angle is greater than 5°. Accordingly, our design transmits the remote participant at natural size, autostereoscopically, with eye contact angles consistent with the tolerances recommended by [9].
In order to achieve an autostereoscopic experience across a wide field of view, it is necessary to record the participant in a manner that can be re-rendered from any point of view. Large camera arrays have been used to capture a subject's light field from all possible angles [117, 120]. Our display generates 72 unique views over 180°, so a linear light field capture system would require 72 cameras for a single viewing height; Matusik et al. [68] showed a 3D TV system using 16 such cameras and projectors over approximately a 30° field of view. However, several hundred cameras in a 2D array would be required to render sharp horizontal and vertical parallax over a wide field of view; Taguchi et al. [104] use 64 cameras for a relatively narrow field of view. Furthermore, in two-way teleconferencing, it is difficult to distribute a large number of cameras without obstructing the participant's view of their own display.
An alternate approach is to render novel viewpoints based on captured 3D geometry (e.g. Gross et al. [27]). Many techniques exist for recovering geometry based on multi-camera stereo, but few achieve real-time speeds. To make stereo matching more efficient, some real-time face scanning systems (e.g. Raskar et al. [93]) use active illumination to disambiguate geometry reconstruction. We use a phase unwrapping approach similar to that of Zhang et al. [126], based on a rapid series of projected sinusoid patterns. Although the system requires active illumination and can fail for fast moving scenes, it works well for facial conversations and requires only modest bandwidth and hardware.
4.3 System Overview
Our 3D teleconferencing system (Fig. 4.1) consists of a 3D scanning system to scan the remote participant (RP), a 3D display to display the RP, and a 2D video link to allow the RP to see their audience.
Real-time 3D Face Scanner The face of the RP is scanned at 30Hz using a structured
light scanning system based on the phase-unwrapping technique of [126]. The system uses
a monochrome Point Grey Research Grasshopper camera capturing frames at 120Hz and
a greyscale video projector with a frame rate of 120Hz. We determine the intrinsic and
extrinsic calibration between the two using the calibration technique of [127]. Our four
repeating patterns shown in Fig. 4.2 include the two 90-degree phase-shifted sinusoid
patterns of [126], but instead of a fully-illuminated frame, we project a frame half-lit on
the left followed by a frame half-lit on the right. We subtract one half-lit image from the
other and detect the zero-crossings across scan lines to robustly identify the absolute 3D
position of the pixels of the center of the face, allowing the phase unwrapping to begin
with robust absolute coordinates for a vertical contour of seed pixels. Conveniently, the
maximum of these two half-lit images provides a fully-illuminated texture map for the face,
while the minimum of the images approximates the ambient light in the scene [80]. We
found that by subtracting ambient light from all frames, the geometry estimation process
can be made to work in the presence of a moderate amount of ambient illumination.
Generally, we found 120Hz capture to be relatively robust to artifacts resulting from
temporal misalignment, though fast facial motion can produce waviness in the recovered
geometry.
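A compact sketch of the per-pixel bookkeeping for the four patterns is shown below, using OpenCV matrices for convenience (seed finding and the actual phase unwrapping are more involved than shown and are omitted):

#include <opencv2/core.hpp>

// The four captured 120Hz frames of Fig. 4.2: two phase-shifted sinusoids
// (s0, s90) and two half-lit frames (halfL, halfR), all CV_32F and same size.
struct ScanFrames { cv::Mat s0, s90, halfL, halfR; };

void processPatterns(const ScanFrames& f,
                     cv::Mat& wrappedPhase, cv::Mat& texture, cv::Mat& ambient)
{
    // The per-pixel minimum of the two half-lit frames approximates ambient
    // light; the maximum gives a fully illuminated texture map.
    cv::min(f.halfL, f.halfR, ambient);
    cv::max(f.halfL, f.halfR, texture);

    // Subtracting ambient from the sinusoid frames lets phase estimation
    // tolerate a moderate amount of room illumination.
    cv::Mat s0 = f.s0 - ambient, s90 = f.s90 - ambient;

    // Wrapped phase from the two 90-degree shifted sinusoids (per-pixel
    // atan2); the exact formula depends on the pattern conventions of [126].
    cv::phase(s0, s90, wrappedPhase);

    // The difference of the half-lit frames changes sign at the projector's
    // vertical centerline; its zero crossings along each scanline seed the
    // absolute coordinates for phase unwrapping (not shown).
    cv::Mat seedSignal = f.halfL - f.halfR;
    (void)seedSignal;
}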
The result of the phase unwrapping algorithm is a depth map image for the face,
which we transmit along with the facial texture images at 30Hz to the display computer.
Figure 4.2: Patterns used for 3D face scanning, including two phase-shifted sinusoids and
two half-lit images. The set of patterns is repeated at 30Hz.
The texture image is transmitted at the original 640 × 480 pixel resolution but we filter and downsample the depth map to 80 × 60 resolution. This downsampling is done to reduce the complexity of the polygonal mesh formed by the depth map, since it must be rendered at thousands of frames per second to the 3D display projector. While so far we have transferred this data only over a local area network, common image compression techniques (e.g. JPEG) would easily reduce the bandwidth to a similar amount used in commercial long-distance video chat systems. Real-time decimation and hole-filling algorithms could be used to improve the quality of the transmitted geometry.
Autostereoscopic 3D Display Our display is based on the previous prototype [43] with several key differences. The size, geometry, and material of the spinning display surface have been optimized for the display of a life-sized human face. The display surfaces (Fig. 4.3) are in the form of a two-sided tent shape with symmetrical sides made from thin 20cm × 25cm sheets of brushed aluminum sheet metal. The brushed aluminum's high reflectivity and strongly anisotropic reflectance make it an inexpensive substitute for the holographic diffuser material used in the previous chapter. The two-sided shape provides two passes of a display surface to each viewer per full rotation, achieving a 30Hz visual update rate for 900 rpm rotation compared to the previous 15Hz update rate. The angle of the tent and the size of the surfaces were chosen to be consistent with the sloped shape
of the human face as seen in Fig. 4.3(a). Instead of being placed directly above the display surface, the high-speed video projector projects onto the display surface from the front, inclined thirty degrees from the horizontal (two first-surface mirrors fold the projector's position into the top of the display). As a result, the display has just somewhat more than a 180° field of view instead of the full 360°, but this omits only views of the back of the head and allows nearly the full 1024 × 768 resolution of the projector to cover the display surface.
Figure 4.3: (a) A 3D face model shown intersecting the tent-shaped display surface. (b) Convex, flat, and concave display surfaces made from brushed aluminum sheet metal.
A monochrome MULE (Multi-Use Light Engine) high speed projector from Fakespace Labs projects 1-bit (black or white) frames at 4,320 frames per second using a specially encoded DVI video signal. Effectively, the display projects seventy-three unique views of the scene across a 180° field of view, yielding an angular view separation of 2.5 degrees. For a typical inter-pupillary distance of 65mm, this provides binocular stereo for viewing positions up to 1.5m away. Greyscale levels are simulated by a 4 × 4 ordered dither pattern implemented as a pixel shader. While the current system is monochrome, color could be achieved using multiple projectors with dichroic beamsplitters or a three-chip DLP projector.
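A 4 × 4 ordered dither is typically implemented with the standard Bayer threshold matrix; a sketch of the per-pixel test (written as plain C++ rather than the actual pixel shader):

// 4x4 Bayer threshold matrix (values 0..15).  A greyscale value g in [0,1]
// at pixel (x, y) is shown white when it exceeds the normalized threshold.
static const int kBayer4[4][4] = {
    {  0,  8,  2, 10 },
    { 12,  4, 14,  6 },
    {  3, 11,  1,  9 },
    { 15,  7, 13,  5 }
};

bool ditherPixel(float grey, int x, int y) {
    float threshold = (kBayer4[y & 3][x & 3] + 0.5f) / 16.0f;
    return grey > threshold;   // true = white, false = black
}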
2D Video Feed An 84° field of view 2D video feed allows the remote participant to view their audience interacting with their three-dimensional image. A beamsplitter (Fig. 4.1) is used to virtually place the camera near the position of the eyes of the 3D RP. The beamsplitter used is one of the protective Lexan transparent shields around the spinning mirror. We place linear polarizers with perpendicular polarization orientations on the camera lens and the Lexan shield to block light from the display from reaching the camera; a related technique appears in [37]. Thus, the camera sees only light reflecting off the front of the polarizer on the shield, giving it a (slightly dim) view of the audience, easily made usable by ensuring the audience receives adequate illumination and using a wide f/stop on the camera lens.
The video from the aligned 3D display camera is transmitted to the computer performing 3D scanning of the RP. Key to our work is that the scanning computer also performs face tracking on this video stream so that the 3D display can render the correct vertical parallax of the virtual head to everyone in the audience. In addition, the scanning computer displays the video of the audience on a large LCD screen in front of the RP. The screen in our system is approximately 1.8 meters wide, one meter away from the RP, covering an 84° field of view. We calibrate both the camera and the projector's distortion parameters and field of view, and we texture-map a polygonal mesh with the transmitted images such that angles are consistent between rays captured by the camera and rays seen on the screen by the RP. Thus, the RP receives a view of the audience as if they were in the position of the virtual face. While the RP's view is not autostereoscopic, the screen is approximately at the typical distance of the audience members to the displayed RP, so vergence is nearly correct.
4.4 Projecting 3D Vertices to the Display
To render 3D geometry to the display, we need to be able to project a 3D world-space vertex (Q_x, Q_y, Q_z) to the appropriate pixel of the video projector (p_u, p_v). Where this vertex should be drawn also depends on the current rotation angle θ of the mirror and the height and distance (V_h, V_d) of the viewer who will observe that vertex. We formulate the projection so that one projection function works for any viewing azimuth V_θ around the display, that is, for a circle of potential viewing positions all at distance V_d and height V_h from the display. Thus, the projection function we desire is of the form (Q_x, Q_y, Q_z, θ, V_h, V_d) → (p_u, p_v).
To build this lookup table, we need to compute how a ray through a projector pixel will reflect off the anisotropic display surface and intersect some viewpoint at (V_h, V_d, V_θ). This problem was first addressed in the previous chapter, but with the simplifying assumption that rays reflect from the anisotropic display surface as vertically-aligned planar sheets of light, allowing a real-time analytic solution to the projection [43]. For that case, where the projector was directly above a display surface tilted at 45°, it was argued that a planar approximation is reasonably accurate. Unfortunately, our projector's off-axis relationship to the spinning display surface and our desire to project onto arbitrarily curved display surfaces make such an approximation unworkable.
Our brushed aluminum display surfaces reflect light as small cylindrical micro-facets aligned with the dominant axis of anisotropy a. According to models of anisotropic reflection [89, 46], a projector ray striking a cylindrical micro-facet will be specularly reflected with respect to all mirror angles perpendicular to the cylinder, forming a cone of light whose angle at the apex is equal to the angle of incidence (Fig. 4.4). Since our display surface rotates to oblique angles with respect to the incident projector rays, these cones have significant curvature for non-frontal viewing positions. Thus, for each pixel
of the projector, we must intersect the reflected cones of light with the viewing circles (V_h, V_d), which yields a quartic equation, a consequence of both cones and circles being quadratic.
To avoid implementing a real-time quartic equation solver, we built a six-dimensional lookup table evaluated across (Q_x, Q_y, Q_z, θ, V_h, V_d), solving the requisite circle/cone intersections using a GPU-accelerated numerical search.
Figure 4.4: A top view showing the anisotropic reflection of a projector ray into a cone. The angle formed between the ray and the axis of anisotropy is equal to the apex angle of the cone.
Building the Projection Lookup Table We observe that the function (Q_x, Q_y, Q_z, θ, V_h, V_d) → (p_u, p_v) is smooth over all the dimensions, and thus can be approximated by a reasonably sparse lookup table ranging over each of the six arguments. We evaluate a lattice of points Q over a 30cm³ volume (somewhat larger than the volume swept by the spinning display surface) in increments of 3.75cm. We evaluate θ over 360° in increments of 1.25°. We evaluate V_h, V_d over typical viewing heights (−50cm to +10cm) and distances (0.5m
to 2.0m) relative to the center of the display surface, in increments of 50cm and 10cm respectively.
For each input (Q, θ, V_h, V_d), we search for the viewing angle V_θ that coincides with some cone of reflected light, and the corresponding projector coordinates (p_u, p_v) that produce the light ray that is reflected. This search lends itself well to parallelization on the GPU. We evaluate 2D cross-sections of the lookup table directly on the GPU, spanning the (Q_x, Q_y) dimensions. We then iterate over Q_z to fill the entire table.
To evaluate a 2D cross-section of the table, we first triangulate the mirror surface and store normal and tangent information at each vertex, as well as the (p_u, p_v) projector coordinates that project onto each vertex. We then iterate V_θ over the possible viewing angles. We use a vertex shader to project the vertices onto the lookup table cross section as viewed from V′ = (V_h, V_d, V_θ), using a frustum with four corner rays passing through the four corners of the lookup table cross section. We also compute and store p_d, the distance from the vertex to V′, in order to provide the GPU with depth values for z-buffer-based hidden surface removal. The GPU then rasterizes this projected mirror surface at discrete (Q_x, Q_y) positions, corresponding to cells in our lookup table. We use a fragment shader to evaluate the discrepancy between the view ray and the cone of reflected light, using the interpolated position, normal, and tangent information at each rasterized sample, and the position of the viewer and projector. The discrepancy can be measured as |I · a − L · a|, where I is the direction towards the viewer, a is the axis of anisotropy, and L is the direction from the projector. We keep track of the smallest discrepancy value seen for each pixel by storing the value in the alpha channel of the frame buffer, and discard any sample with a higher value than the value already in the buffer. We then store the interpolated (p_u, p_v, p_d) values in the RGB channels of the frame buffer. After iterating through all possible V_θ, we copy the values in the frame buffer to the 2D cross-section of the lookup table.
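The discrepancy test at the heart of this search is only a few operations per sample. A sketch of the per-sample logic (plain C++ standing in for the fragment shader; names are mine):

#include <cmath>

struct V3 { float x, y, z; };
static float dot(const V3& a, const V3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static V3 dir(const V3& from, const V3& to) {
    V3 v{ to.x - from.x, to.y - from.y, to.z - from.z };
    float len = std::sqrt(dot(v, v));
    return { v.x / len, v.y / len, v.z / len };
}

// A view direction lies on the cone of reflected light when it makes the same
// angle with the axis of anisotropy as the incident projector ray does; the
// search keeps, for each lookup cell, the viewing azimuth minimizing this value.
float coneDiscrepancy(const V3& samplePos,    // rasterized point on the mirror
                      const V3& tangent,      // unit axis of anisotropy a at that point
                      const V3& viewerPos,    // candidate viewpoint V'
                      const V3& projectorPos)
{
    V3 I = dir(samplePos, viewerPos);      // direction toward the viewer
    V3 L = dir(projectorPos, samplePos);   // direction from the projector
    return std::fabs(dot(I, tangent) - dot(L, tangent));
}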
Note that this lookup table computation does not assume that the display surface is flat, and as a result, we can compute lookup tables for arbitrarily shaped mirrors. The output of the lookup table is geometry mapped into projector UV coordinates. The geometry will appear warped in order to compensate for the mirror shape, mirror angle, and projector perspective and keystoning. To illustrate this warp, we applied the lookup table to a vertical front-facing plane of geometry as shown in Fig. 4.5. The transform applied by the table is generally smooth, though curvature increases towards grazing angles. The mirror faces the front at 0°. At 30°, the image will reflect to viewers located 60° off-center. The resulting graphs are oriented so that 'up' corresponds to the top of the mirror and the black rectangle represents the extent of the projector frame.
Evaluating the Lookup Table We store our lookup table in GPU memory as a series of 3D single-precision floating-point textures, each spanning (Q_x, Q_y, Q_z) for a given V_h, V_d, and θ. At the sampling densities we use, the lookup table for a particular mirror requires 24MB of memory, which requires little of the 1.5GB of memory of the display computer's nVIDIA graphics card. We currently use linear interpolation to evaluate the lookup table at intermediate vertex and viewing positions, and choose our sampling density to yield a sufficient approximation in this context. Our display's graphics card can perform 3D texture lookups with automatic trilinear interpolation. Thus, we need to perform just four texture lookups for the neighboring evaluated values of V_h and V_d in the vertex shader. Due to the synchronization of the rendered frames to the mirror rotation, the value of θ is always sampled exactly and requires no interpolation. If more than one viewer is present, we select the V_h and V_d values per rendered frame by identifying the viewer closest to a ray originating from the center pixel of the projector and reflected off the rotated mirror. If the lookup table size were an issue, it is also possible to fit a
higher-order polynomial to the table entries and store only the corresponding coefficients. On some current generation GPUs, this could provide a significant speedup as support for 3D texture lookups is not always fully optimized.
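For illustration, the evaluation step amounts to four filtered 3D lookups and a bilinear blend over (V_h, V_d); a sketch in plain C++ (the texture interface and table layout are assumptions standing in for the vertex shader's texture calls):

struct PUV { float u, v, d; };   // table output: projector coordinates and depth

// Hypothetical handle to one 3D texture spanning (Qx, Qy, Qz) for a fixed
// (Vh, Vd, theta); sample() is assumed to return a trilinearly filtered value.
struct Table3D {
    PUV sample(float qx, float qy, float qz) const;
};

// Evaluate the 6D lookup at an intermediate viewer height and distance by
// blending the four surrounding precomputed tables; theta needs no
// interpolation because frames are synchronized to the mirror rotation.
PUV evaluateLookup(const Table3D& t00, const Table3D& t10,   // (h0,d0), (h1,d0)
                   const Table3D& t01, const Table3D& t11,   // (h0,d1), (h1,d1)
                   float wh, float wd,                       // fractional weights in [0,1]
                   float qx, float qy, float qz)
{
    PUV a = t00.sample(qx, qy, qz), b = t10.sample(qx, qy, qz);
    PUV c = t01.sample(qx, qy, qz), e = t11.sample(qx, qy, qz);
    auto lerp = [](const PUV& p, const PUV& q, float t) {
        return PUV{ p.u + t * (q.u - p.u),
                    p.v + t * (q.v - p.v),
                    p.d + t * (q.d - p.d) };
    };
    return lerp(lerp(a, b, wh), lerp(c, e, wh), wd);
}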
4.5 Flat and Curved 3D Display Surfaces
To demonstrate our projection technique, we designed and tested three different display surfaces (convex, flat, and concave), as seen in Fig. 4.3. These differently shaped surfaces offer different advantages and disadvantages, underscoring the utility of being able to project onto arbitrary surfaces. All display surfaces have the same 15°-from-vertical double-sided design and a surface area of 20cm wide by 25cm high of thin brushed aluminum sheet metal. The shape of each mirror is supported by a custom assembly of laser-cut plexiglass.
The flat display surface has the most similarities to the one used in [43], though it has a steeper angle to better match the shape of a face and two sides to effectively double the frame rate of the display. The diverging beam of the projector continues to diverge horizontally after reflection by the flat display surface, so that approximately a 20° wedge of the audience area observes some reflected pixels from the projector for any given mirror position, as seen in Fig. 4.6(a). The flat mirror is the simplest to build and calibrate, though other shapes can provide more useful optical properties.
The convex display surface is in the shape of a 40° cylindrical arc, curving ±20° over its 20cm of width. The convex curve spreads reflected light over 100° of the audience. The benefit of this mirror shape is that the line of reflected light traces over the audience more slowly compared to the flat mirror, since the speed of the angle formed between the display surface and the incident projector light is effectively retarded relative to the absolute rotation of the surface. (If the display surface were a single complete cylinder,
the specular reflection would not move at all.) As a result, the convex mirror yields higher angular resolution of the three-dimensional imagery, producing higher-quality 3D stereopsis. However, a convex mirror has several disadvantages. Due to the large angular divergence of the mirror, many projector rays reflect to the far side of the display where they are unseen, while many forward-facing rays cannot be produced by any mirror angle. The result is a smaller usable volume relative to the mirror size. The missing sample squares in Row 3 of Fig. 4.5 indicate the points that fell outside the convex mirror's smaller visible volume.
The concave display surface is built into an elliptical shape designed to focus the light of the projector (which, in its unfolded optical path, is 56cm from the mirror surface) to a line 1m away from the center of the display (which is a typical average viewing distance), as seen in Fig. 4.6(b). The utility of this shape is that at any instant at most one audience member will see the light reflected by the projector. We leverage this mirror shape in the next section so that in the case of tracked viewers, the display can render the proper vertical perspective for each viewer in a straightforward manner using a single (V_h, V_d) per rendered frame.
Different mirror shapes also affect the shape of the display's focal surface. The focal surface for a given viewer is composed of the multiple mirror slices that are illuminated as the mirror spins. For a flat mirror, the focal surface is a cone centered around the mirror's axis. Convex and concave mirrors have asymmetrical focal surfaces that change based on viewing angle. Convex mirrors produce a set of concave focal planes; concave mirrors produce a set of convex focal planes. This represents another advantage of the concave mirror, as the human face is shaped more like a convex cylinder than a concavity. When the shape of the focal surface approximates the object being displayed, accommodation cues are more accurate and aliasing [128] is minimized.
4.6 Face Tracking for Vertical Parallax
To provide accurate gaze and eye contact, the rendered face of the remote participant must appear correctly positioned in world space coordinates as seen by all audience members. Rendering the face for the same viewing height V_h and distance V_d for all audience members can make the face appear to be gazing at an inaccurately high or low angle to some viewers, even though the natural horizontal parallax of the display will provide generally accurate horizontal perspective to all viewers. Although vertical gaze direction is detected with less sensitivity than horizontal gaze direction [9], a true sense of eye contact requires both to be within a few degrees of accuracy. To render vertical perspective accurately to multiple viewers, we track viewer positions in the 2D video feed showing the RP's view of the audience (Fig. 4.7(a)). In the previous chapter, we demonstrated tracked vertical parallax for a single viewer with an active tracking system, but did not handle multiple viewers with a passive tracking system. For our tracking system we use the face detection algorithms in the OpenCV library based on [110] and filter the tracking data using a Kalman filter to reduce noise. The filtered detected face data provides a good estimate of the azimuth and inclination of each audience face relative to the eyes of the RP, though the single camera does not provide a distance measurement. We instead approximate depth based on the size of the detected face. While variation in face size across audience members biases such distance measurements, the visual error is essentially undetectable as it results only in subtle changes to perspective foreshortening; future systems could include a stereo camera pair to more accurately triangulate facial depth. To photograph simulated audience viewpoints in several figures in this chapter, we used the Augmented Reality Toolkit [119] to track square markers attached to each camera.
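A minimal sketch of this kind of tracking step with OpenCV (the cascade, focal length, and assumed face width are placeholders; the actual system follows [110] with its own Kalman filter tuning, omitted here):

#include <opencv2/objdetect.hpp>
#include <cmath>
#include <vector>

struct Viewer { float azimuth, inclination, distance; };

// Detect faces in one frame of the 2D audience feed and convert each detection
// to an approximate (azimuth, inclination, distance) relative to the camera.
// Depth is inferred from apparent face size via a pinhole model.
std::vector<Viewer> detectViewers(const cv::Mat& frameGray,
                                  cv::CascadeClassifier& faceCascade,
                                  float focalLengthPx,      // focal length in pixels
                                  float assumedFaceWidthM)  // e.g. ~0.15m, a placeholder
{
    std::vector<cv::Rect> faces;
    faceCascade.detectMultiScale(frameGray, faces);

    std::vector<Viewer> viewers;
    for (const cv::Rect& r : faces) {
        float cx = r.x + 0.5f * r.width  - 0.5f * frameGray.cols;
        float cy = r.y + 0.5f * r.height - 0.5f * frameGray.rows;
        Viewer v;
        v.azimuth     = std::atan2(cx, focalLengthPx);    // left/right angle
        v.inclination = std::atan2(-cy, focalLengthPx);   // up/down angle
        v.distance    = assumedFaceWidthM * focalLengthPx / r.width;
        viewers.push_back(v);
    }
    return viewers;
}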
When rendering tracked vertical perspectives for multiple viewers, we use the focusing concave display surface so that any one video projector frame can be assumed to address just one of the audience members. For each display surface rotation angle we determine the tracked audience member who is closest to the central reflected ray of the mirror. We then render the face using the lookup table entries corresponding to the height and depth (V_h, V_d) of this closest viewer (Fig. 4.7(b)). In this way, the display's horizontal parallax provides binocular stereo with no lag as viewers move horizontally, while vertical parallax is achieved through tracking. We believe this is a good approach since it respects the finding of [9] that we are more sensitive to horizontal gaze direction than vertical gaze direction, and also since people's body motion is more likely to produce horizontal head translations than vertical ones.
4.7 Results
Figure 4.9 evaluates the accuracy of the projection technique for three different mirror shapes. We projected a test cube, a real face scan, and a mannequin head scan onto the concave, flat, and convex display surface geometries, each with its appropriately computed lookup table. In the convex case, it was necessary to shrink objects to fit within the smaller usable display volume. The surfaces are seen from two simultaneously tracked camera perspectives at different heights. Despite the changing mirror geometry, the results show the perspective of the displayed objects to be consistent with the camera viewpoints as well as with each other. For a ground truth comparison, we removed the spinning mirror from the display and replaced it with the actual mannequin head. Though the lighting differs, the size and shape of the virtual head closely matches that of the real head. In general, the concave mirror yields the best combination of display volume, user
addressability, and focal cues for a head-sized display. The concave mirror is used for all subsequent results in this chapter and the accompanying video unless stated otherwise.
We have not conducted a formal user study to judge the effectiveness of the display as a system for improved teleconference communication, but were pleased that subjects using the display consistently reported a sense that the remote participant was able to make eye contact with them. This effect was felt most strongly when the remote participant was looking away, then suddenly turned or glanced towards a particular audience member.
To provide a quantitative measurement of eye contact accuracy, we captured and transmitted a test object featuring five registration targets that enable angular orientation to be measured (Fig. 4.8). The testing procedure is as follows: we placed a camera at one end of the teleconferencing system, and placed the test object at the other end of the system. After sighting the transmitted image of the camera lens through the sighting hole of the test object, we photographed the transmitted image of the test object. Then we measured on the photograph how far the apex registration target was from the center of the four edge registration targets. From this deviation we calculated the gaze error in each direction of the system. The measured errors ranged from 3 to 5 degrees on the 3D display, most of which is attributable to geometric noise and the 2.5 degree separation between independent views. For the remote 2D display, the error ranged from 1 to 2 degrees.
(Columns: concave, flat, convex mirrors; rows: mirror angles −30°, 0°, +30°.)
Figure 4.5: A grid of points on a frontoparallel plane is processed through the 6D lookup table to produce warped geometry displayed on the projector. Three LUTs for three mirror shapes are demonstrated, evaluated for three different mirror angles; the black rectangles show the extent of the actual projector frame.
Figure 4.6: (a) Light diverging from a flat anisotropic display surface can illuminate multiple viewers simultaneously, requiring a single projector frame to accommodate multiple viewer heights. (b) Light reflected by a concave display surface typically projects imagery to at most one viewer at a time, simplifying the process of rendering correct vertical parallax to each tracked viewer.
Figure 4.7: (a) Faces are tracked in the 2D video feed. (b) As the mirror rotates, the head is rendered for the appropriate height and distance of the nearest viewer for correct vertical perspective.
Figure 4.8: A test object (left) is aimed at a camera shown on the 2D display (right). The
camera photographs the transmitted image of the test object to measure gaze accuracy
for the 3D display. By switching the locations of the camera and test object, we also
measured the accuracy of the 2D display.
Figure 4.9: Comparison of the different mirror shapes for simultaneously tracked upper and lower views. (Row 1) Concave mirror. (Row 2) Flat mirror. (Row 3) Convex mirror. For the convex mirror, the geometry was scaled by 0.75 to fit within the smaller display volume. In the 4th and 8th columns we replaced the mirror with the actual mannequin head to provide a ground truth reference.
Chapter 5
Projector Arrays
Figure 5.1: (left) A 3D face is shown on the autostereoscopic 3D projector array. (center, right) The display combines autostereoscopic horizontal parallax with vertical tracking to generate different horizontal and vertical views simultaneously over a full 110° field of view. The stereo pairs are left-right reversed for cross-fused stereo viewing.
5.1 Introduction
Projector arrays are well suited for 3D displays because of their ability to generate dense
and steerable arrangements of pixels. As video projectors continue to shrink in size,
power consumption, and cost, it is possible to closely stack projectors so that their lenses
are almost continuous. We present a new HPO display utilizing a single dense row of
projectors. A vertically anisotropic screen turns the glow of each lens into a vertical
stripe while preserving horizontal angular variation. The viewer's eye perceives several
stripes from multiple projectors that combine to form a seamless 3D image. Rendering to
such a display requires the generation of multiple-center-of-projection (MCOP) imagery, as different projector pixels diverge to different viewer positions. In Chapters 3 and 4, we proposed an MCOP rendering solution in the context of high-speed video projected onto a spinning anisotropic mirror. A front-mounted projector array can be seen as an unfolded spinning mirror display where each high-speed frame corresponds to a different discrete projector position. In this chapter, we extend this framework for use with both front- and rear-projection 3D projector arrays.
As every viewer around an HPO display perceives a different 3D image, it is possible to customize each view with a different vertical perspective. Such a setup has the unique advantage that every viewer can have a unique height while experiencing instantaneous horizontal parallax. Given a sparse set of tracked viewer positions, the challenge is to create a continuous estimate of viewer height and distance for all potential viewing angles that provides consistent vertical perspective to both tracked and untracked viewers. The set of viewers is dynamic, and it is possible that the tracker misses a viewer, particularly as viewers enter or leave the viewing volume. In Chapters 3 and 4 we also assumed a constant viewer height and distance for each projector frame. In practice, this limitation can result in visible distortion, tearing, and crosstalk where a viewer sees slices of multiple projector frames rendered with inconsistent vertical perspective. This is especially visible when two viewers are close together but have different heights. We solve this problem by dynamically interpolating multiple viewer heights and distances within each projector frame as part of a per-vertex MCOP computation, and compare different interpolation functions. The algorithm can handle both flat screens and convex mirrored screens that further increase the ray spread from each projector.
that further increase the ray spread from each projector.
The primary contributions are:
1. An autostereoscopic 3D projector array display built with off-the-shelf components
2. A new per-vertex projection algorithm for rendering MCOP imagery on standard
graphics hardware
3. An interpolation algorithm for computing multiple vertical perspectives for each
projector
4. An analysis of curved mirrored screens for autostereoscopic projector arrays
5.2 Apparatus
To achieve maximum angular resolution, it is preferable to stack projectors as close as possible. Our projector array system consists of 72 Texas Instruments DLP Pico projectors, each of which has 480 × 320 resolution. Our projectors are evenly spaced along a 124cm curve with a radius of 60cm. This setup provides an angular resolution of 1.66° between views. At the center focus of the curve, we place a 30cm × 30cm vertically anisotropic screen. The ideal screen material should have a wide vertical diffuse lobe so the 3D image can be seen from multiple heights, and a narrow horizontal reflection that directs different projector pixels to varying horizontal views. When a projected image passes through the vertically anisotropic screen, it forms a series of vertical stripes. Without any horizontal diffusion, the stripe width is equivalent to the width of the projector lens. Each projector is 1.42cm wide with a 4mm lens, so eliminating any gap between stripes would require stacking several hundred projectors in overlapping vertical rows [45]. We found that acceptable image quality could be achieved with a single row of projectors if we use a holographic diffuser to generate 1-2 degrees of horizontal diffusion and stack the projectors with a 2mm gap (see Figure 5.2). The width of the diffusion lobe should be equal to the angle between projectors.
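As a check on the quoted angular spacing: a 124cm arc on a 60cm radius subtends 124/60 ≈ 2.07 radians ≈ 118°, and dividing this arc among the 71 gaps between 72 evenly spaced projectors gives roughly 118°/71 ≈ 1.66° between adjacent views, consistent with the resolution stated above.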
Flat Convex
Without diffuser With diffuser
Figure 5.2: The anisotropic screen forms a series of vertical lines, each corresponding to a projector lens. A 1-2° horizontal diffuser is used to blend the lines and create a continuous image. The top row shows stripes and imagery reflected on a flat anisotropic screen. The bottom row shows imagery reflected on a convex anisotropic screen. By varying the curvature of a mirrored anisotropic screen, we can decrease the pitch between reflected projector stripes. This increases the spatial resolution at the screen depth at the expense of overall angular resolution and depth of field.
For rear-projection setups, we use a single holographic diffuser with a horizontal 2° and vertical 60° scattering profile (see Figure 5.3). Alternatively, for front-projection, we reflect off a fine lenticular screen behind the holographic diffuser (see Figure 5.1). The lenticular screen is painted matte black on the reverse flat side to reduce ambient light and improve black levels. As the light passes twice through the diffuser, only 1° of horizontal diffusion is required. All our screen components are currently available off-the-shelf. The holographic diffusers are from Luminit Co. The lenticular material is a 60lpi 3D60 plastic screen from Microlens Technology. Other common anisotropic materials such as brushed metal can achieve similar vertical scattering, but have limited contrast ratio and would not work for rear-projection setups.
We drive our projector array using a single computer with 24 video outputs from four ATI Eyefinity graphics cards. We then split each video signal using 24 Matrox TripleHead2Go video splitters, each of which supports three HDMI outputs. To track viewer positions, a Microsoft Kinect camera is mounted directly above the screen. The depth camera is ideal as it provides both viewer height and distance; however, our interpolation method would work with other 3D tracking methods.
5.3 Calibration
Even with a machined projector mount, there is still noticeable geometric and radiometric variation between projectors. We automate the geometric calibration of the projectors using a 2D rectification process to align projector images to the plane of the screen. We first place a diffuse surface in front of the screen, then sequentially project and photograph an AR marker from each projector [119] (see Figure 5.3). We then compute a 2D homography matrix that maps the detected marker corners in each projector to a physical printed marker also attached to the screen. At the same time, we also measure the average intensity of light from each projector on the diffuse surface and compute a per-projector scale factor to achieve consistent brightness and color temperature across the entire projector array.
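The calibration reduces to two chained homography estimates plus a photometric scale factor. The following is a minimal sketch of one way to implement this step with OpenCV and numpy; the function names, the use of exactly four marker corners, and the normalization toward the dimmest projector are illustrative assumptions rather than the exact code used for the prototype.

import cv2
import numpy as np

def projector_to_screen_homography(marker_px_in_projector,  # 4x2 corners of the marker image we displayed
                                   marker_px_in_photo,      # 4x2 same corners detected in the photograph
                                   printed_px_in_photo,     # 4x2 printed marker corners in the photograph
                                   printed_xy_on_screen):   # 4x2 known physical corner positions on the screen
    # projector pixels -> camera pixels (via the projected AR marker)
    H_proj_to_cam, _ = cv2.findHomography(np.float32(marker_px_in_projector),
                                          np.float32(marker_px_in_photo))
    # camera pixels -> screen coordinates (via the printed AR marker)
    H_cam_to_screen, _ = cv2.findHomography(np.float32(printed_px_in_photo),
                                            np.float32(printed_xy_on_screen))
    # composite homography: projector pixels -> physical screen coordinates
    return H_cam_to_screen @ H_proj_to_cam

def brightness_gains(mean_intensities):
    # scale every projector toward the dimmest one for consistent brightness
    floor = min(mean_intensities.values())
    return {proj: floor / value for proj, value in mean_intensities.items()}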
5.4 Viewer Interpolation
As described above, anisotropic screens do not preserve a one-to-one mapping between
projectors and views as projector rays diverge to multiple viewers at potentially different
angles, heights, and distances. To generate an image that can be viewed with a single
perspective, we must render MCOP images that combine multiple viewing positions. A
Figure 5.3: (left) Photograph of our rear-mounted projector array setup. (right) Photograph of the calibration setup for the front-mounted projector array. To compute relative
projector alignment, we sequentially correspond a virtual AR marker generated by each
projector with a printed AR marker.
brute force solution would be to pre-render out a large number of views with regular
perspective and resample these images based on the rays generated by the device as
done by Rademacher et al. [92]. A variant of this technique was implemented earlier in
Section 4.4 [43] for existing photographic datasets. However, resampling introduces a loss of resolution, and this method does not scale well to large, dense projector arrays with
many potential horizontal and vertical positions.
For scenes with known geometry, an alternative approach is to replace the standard
perspective projection matrix with a custom MCOP projection operator that maps each
3D vertex point to 2D projector coordinates based on a different viewing transform. A per-vertex geometric warp can capture smooth distortion effects and be efficiently implemented in a standard GPU vertex shader. Our method is based on a similar approach used in Section 4.4 to generate MCOP renders [43].
The first step is to project each 3D vertex onto the screen. For each vertex (Q), we trace a ray from the current projector position (P) through the current vertex (Q). We intersect this first ray (PQ) with the screen surface to find a screen point (S_P). The second step is to compute a viewing position associated with this vertex. The set of
viewers can be seen as defining a continuous manifold spanning multiple heights and distances. In the general case, the intersection of the ray (PQ) with this manifold defines the corresponding viewer (V). Finally, we trace a second ray (VQ) from the viewer back to the current vertex (Q) and intersect it with the screen a second time (S_V). The actual screen position uses the horizontal position of S_P and the vertical position of S_V.
This entire process can be implemented in a single GPU vertex shader. In essence, the
horizontal screen position is determined by projecting a ray from the projector position,
while the vertical screen position is based on casting a ray from the viewer position. The
difference in behavior is due to the fact that the anisotropic screen acts as a diffuse surface in the vertical dimension. We multiply the final screen position by the current calibrated projector homography to find the corresponding projector pixel position for the current
vertex.
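For a flat screen, the two ray casts described above reduce to a few lines of vector math. The following standalone sketch (Python with numpy, assuming a screen lying in the plane z = 0 and example coordinates chosen purely for illustration) mirrors the logic that the vertex shader in Table 5.1 performs per vertex.

import numpy as np

def mcop_screen_point(P, V, Q, screen_z=0.0):
    # intersect the ray origin->target with the plane z = screen_z
    def hit(origin, target):
        origin, target = np.asarray(origin, float), np.asarray(target, float)
        direction = target - origin
        t = (screen_z - origin[2]) / direction[2]
        return origin + t * direction

    S_P = hit(P, Q)   # projector ray P->Q: supplies the horizontal screen position
    S_V = hit(V, Q)   # viewer ray V->Q: supplies the vertical screen position
    return np.array([S_P[0], S_V[1], screen_z])

# example: projector behind the screen, viewer in front, vertex floating off the screen
print(mcop_screen_point(P=[0.0, 0.5, -0.6], V=[0.1, 0.3, 1.5], Q=[0.05, 0.2, 0.1]))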
In practice, it is not easy to define the manifold of viewer positions as we only know
a few sparse tracked positions. We propose a closed-form method for approximating this
manifold as a series of concentric rings. Previously in Chapter 3, we assumed that the
viewing height and distance were constant for all projectors. This arrangement corresponds
to a single circle of potential viewers. As the viewers are restricted to lie on this circle,
the viewpoints represented by each rendered MCOP image vary only in their horizontal
angle, with no variation in height and distance. One trivial extension would be to interpolate the viewer height and distance once per projector frame and adjust the radius of the viewing circle. In our comparison images we refer to this method as "per-projector" interpolation. However, when tracking multiple viewers close together, it is possible for
the viewing height to change rapidly within the width of a single MCOP projector frame.
We solve this issue by interpolating the viewer height and distance per-vertex. We
pass the nearest two tracked viewer positions to the vertex shader. Each viewer height
and distance defines a cylinder with a different viewing radius. We then intersect the current projector-vertex ray (PQ) with both cylinders. To determine the final viewing position for this ray, we compute a weighted average of the viewer heights and distances. The interpolation weight of each tracked viewer position is a function of the angle (θ_1, θ_2) between the cylinder intersection and the original tracked point. A top-down view of these intersections is shown on the left side of Figure 5.4. When computing the weighted average, we also add a third default value with very low weight. This value corresponds to the average user height and distance appropriate for untracked viewers. As the viewer's angle from each tracked point increases, the influence of the point decays and the viewer returns to the default height and distance (right side of Figure 5.4). We implemented two
different falloff functions centered around each tracked point - a normalized Gaussian and a constant step hat function. The Gaussian function smoothly decays as you move away from a tracked viewer position, while the hat function has a sharp cutoff. For all the results shown in this chapter, the width of the Gaussian and hat functions was 10 degrees. The weight function can be further modulated by the confidence of the given tracked viewer position. This decay makes the system more robust to viewers missed by the tracking system or new viewers that suddenly enter the viewing volume. If two viewers overlap or stand right above each other, then their average height and distance is used. In the worst case, the system reverts back to a standard autostereoscopic display with only horizontal parallax. Pseudocode for computing the per-vertex projection and viewer interpolation can be found in Table 5.1. The final rendered frames appear warped as different parts of each
frame smoothly blend between multiple horizontal and vertical perspectives. Figure 5.5 shows a subset of these MCOP frames before they are sent to the projectors.
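The two falloff functions and the low-weight default viewer can be summarized in a few lines. The sketch below (Python; the 10-degree width comes from the text, while the numeric default weight is a hypothetical value) mirrors the weighting performed inside the vertex shader of Table 5.1.

import math

FALLOFF_WIDTH_DEG = 10.0   # width of the Gaussian and hat functions used in this chapter
DEFAULT_WEIGHT = 0.05      # small illustrative weight for the untracked default viewer

def gaussian_falloff(angle_deg):
    return math.exp(-0.5 * (angle_deg / FALLOFF_WIDTH_DEG) ** 2)

def hat_falloff(angle_deg):
    return 1.0 if abs(angle_deg) < FALLOFF_WIDTH_DEG else 0.0

def blend_viewers(tracked, default_viewer, falloff=gaussian_falloff):
    # tracked: list of (angle_deg, confidence, (height, distance)) for the nearest viewers
    w_sum = DEFAULT_WEIGHT
    h_sum = DEFAULT_WEIGHT * default_viewer[0]
    d_sum = DEFAULT_WEIGHT * default_viewer[1]
    for angle, confidence, (height, distance) in tracked:
        w = falloff(angle) * confidence
        w_sum += w
        h_sum += w * height
        d_sum += w * distance
    return h_sum / w_sum, d_sum / w_sum   # interpolated viewer height and distance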
A related MCOP rendering algorithm was also proposed in Section 4.4 [42]. In this
later work, the entire per-vertex projection operator from 3D vertices to 2D projector
positions was precomputed as a 6D lookup table based on 3D vertex position, mirror angle,
and viewer height and distance. The lookup table was designed to handle more complex conical anisotropic reflections that occur when projector rays are no longer perpendicular to the mirror's vertical anisotropic axis. For projector arrays, we found that rays scattered by the screen remained mostly planar with very little conical curvature. Furthermore, the lookup table's height and distance were indexed based on a single reflection angle per projector frame. This assumption is analogous to our "per-projector" interpolation examples. In Chapter 4, we used a concave anisotropic mirror to optically refocus the projector rays back towards a single viewer instead of modeling different heights and
distances for each projector [42]. Such an approach would not work for a rear-mounted
projector array where there is no mirror element. This new software solution is more
general and allows for a wider range of screen shapes.
5.5 Convex Screen Projection
For a front-mounted array, the pitch between reflected stripes can also be reduced by using a convex reflective anisotropic screen. The convex shape magnifies the reflected projector positions so they are closer to the screen, with narrower spacing and a wider field of view (Figure 5.2). As less horizontal diffusion is required to blend between stripes, objects at the depth of the screen have greater spatial resolution. This improved spatial resolution comes at the cost of angular resolution and lower spatial resolution further from the screen. Zwicker et al. [128] provide a framework for defining the depth of field and display bandwidth of anisotropic displays. For a given initial spatial and angular resolution, the effective spatial resolution is halved every time the distance from the screen doubles. In Figure 5.6 we plot the tradeoff between spatial resolution and depth of field given the initial specifications of our projector prototype. A curved mirror is also
void warpVertex(
    float3 Q : POSITION,              // input 3D vertex position
    uniform float3x3 Homography,      // 2D projector homography
    uniform float3 P,                 // current projector position
    uniform float3 T[],               // tracked viewer positions
    uniform float C[],                // tracked viewer confidences
    out float3 oQ : POSITION)         // output 2D screen coordinate
{
    P = computeProjectorPosition(P);            // use reflected projector position if front-projection screen
    float3 V = interpolateViewer(Q, P, T, C);   // find viewer that sees current vertex from current projector
    float3 VQ = Q - V;                          // define ray from viewer position V to vertex Q
    float3 S = intersectRayWithScreen(V, VQ);   // intersect with planar or cylindrical curved screen
    oQ = mul(Homography, S);                    // apply projector homography to screen coordinate
}

// interpolate tracked viewer positions per-vertex
float3 interpolateViewer(float3 Q, float3 P, float3 T[], float C[])
{
    float sumWeight = 0;
    float3 currentViewer = float3(0, 0, 0);
    float3 PQ = Q - P;                          // define ray from (reflected) projector position P to vertex Q
    for each tracked viewer t
    {
        float3 I = intersectRayWithCylinder(P, PQ, radius(T[t]));  // cylinder radius is tracked viewer distance
        float angle = computeAngle(T[t], I);    // compute angle between intersection and tracked viewer
        float weight = falloffFunction(angle);  // falloff function can be a Gaussian or hat function
        weight *= C[t];                         // also weight by tracking confidence
        currentViewer += weight * T[t];
        sumWeight += weight;
    }
    currentViewer += default_weight * default_viewer;  // add in default viewer
    sumWeight += default_weight;                       // with a low default weight
    return currentViewer / sumWeight;                  // weighted average of tracked viewer positions
}
Table 5.1: Vertex shader pseudocode that projects a 3D scene vertex into 2D screen coordinates as described in Section 5.4. The code interpolates between multiple tracked viewer positions per vertex. It assumes helper functions are defined for basic geometric intersection operations.
preferable for 360° applications where an anisotropic cylinder can be used to reflect in all directions without any visible seams.
As a convex mirror effectively increases each projector's field of view, each projector frame must represent a wider range of viewer positions. It becomes more critical to compute multiple viewer heights per projector frame. In general, the projection algorithm is the same as for a flat screen, with two modifications. For front-mounted setups that use a mirrored anisotropic screen, we first unfold the optical path to find reflected projector
Figure 5.4: (left) Diagram showing how we compute the corresponding viewing position for each vertex by tracing a ray from the projector position P through the vertex Q. We intersect the ray with the screen to find the horizontal screen position S. We intersect the ray with viewing circles defined by the nearest two tracked viewers (V_1, V_2). We interpolate between the tracked viewer positions based on their angular distance from the intersection points. (right) Diagram showing the continuous curve formed by the interpolated viewer positions. The curve returns to a default viewer height and distance away from tracked viewer positions (V_1, V_2).
positions. The rays reflected off a convex mirror do not always reconverge at a single reflected projector position (Figure 5.6 (right)). A first-order approximation is to sample multiple reflected rays and average the resulting intersection points. This can still result in distortion for extreme viewing angles. A more accurate projector position can be computed per-vertex. In the per-vertex projection, we use the average reflected projector position to compute an initial intersection point with the screen. Based on the local curvature around this screen point, we can then interpolate a second, more accurate reflected projector position that is accurate for that part of the screen. This process could be iterated to further refine the reflected projector position, though in our tests the reflected positions converged in a single iteration. Secondly, we discard rays that reflect off the convex
[Figure 5.5 panels: flat screen and convex screen]
Figure 5.5: Our algorithm renders out a different MCOP image for each of 72 projectors. This is a sampling of the generated images using per-vertex vertical parallax blending with a Gaussian falloff function. Each image smoothly blends between multiple horizontal and vertical viewer positions, which gives the appearance of an unwrapped object. Flat front and rear projection screens produce almost identical imagery. Convex screens have additional warping as each projector spans a wider set of viewers.
mirror near grazing angles as these regions are extremely distorted and are very sensitive
to small errors. In comparison to a flat screen, a convex screen requires rendered frames covering a wider variation of views and greater per-vertex warping (Figure 5.5).
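One concrete way to carry out the first-order approximation above is to reflect a handful of projector rays off sampled mirror points and take the point that is closest, in a least-squares sense, to all of the reflected rays. The sketch below (Python/numpy) illustrates that idea; the mirror sampling itself and the per-vertex refinement over a local screen neighborhood are left out, and the function is an illustrative sketch rather than the shader code used in the prototype.

import numpy as np

def reflect(direction, normal):
    # reflect a unit direction about a unit surface normal
    return direction - 2.0 * np.dot(direction, normal) * normal

def virtual_projector(P, mirror_points, mirror_normals):
    # reflect rays from projector P off sampled mirror points, then find the point
    # that minimizes the squared distance to all of the reflected rays
    P = np.asarray(P, float)
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for x, n in zip(mirror_points, mirror_normals):
        x, n = np.asarray(x, float), np.asarray(n, float)
        d = x - P
        d /= np.linalg.norm(d)                 # incident direction, projector -> mirror
        r = reflect(d, n / np.linalg.norm(n))  # reflected direction at this mirror sample
        M = np.eye(3) - np.outer(r, r)         # projects onto the plane orthogonal to r
        A += M
        b += M @ x
    return np.linalg.lstsq(A, b, rcond=None)[0]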
5.6 Results
Our display can generate stereo and motion parallax over a wide field of view. While it is difficult to communicate the full 3D effect with 2D figures, Figure 5.1 shows several stereo photographs printed for cross-fused stereo viewing. The motion parallax can also be seen
throughout the accompanying video.
To evaluate our view interpolation algorithms, we render multiple geometric models from a variety of heights and distances (Figure 5.7). We use a checkered cube to identify any changes in perspective or warping and a spherical model of the Earth to illustrate
[Figure 5.6 plot: spatial resolution (stripes per meter) versus distance from screen (meters), for horizontal magnifications x1 (flat) through x5]
Figure 5.6: (left) Graph showing the tradeoff between spatial magnification using convex mirrored screens and depth of field. Greater mirror curvature increases the density of projector stripes and the spatial resolution at the screen; however, spatial resolution falls off rapidly away from the screen. (right) For a convex mirrored screen, reflected projector rays no longer converge on a single projector position. We sample multiple points on the mirror surface and compute an average reflected projector position for each real projector. We then iteratively refine the reflected position per-vertex by tracing a ray from the average position through the vertex to the convex screen. We then compute a more accurate reflected position based on the local neighborhood of the screen point.
correct aspect ratio. We also show two high-resolution face models as examples of or-
ganic shapes. We then photographed the display from three views: a lower left view, a
center view, and a high right view. We measured the real camera positions and rendered
matching virtual views to serve as ground-truth validation (Figure 5.7, row 1). With no
vertical tracking, the display only provides horizontal parallax and all viewers will see a
foreshortened image as they move above or below the default height (row 2). Note that
the top and bottom faces of the cube are never visible and the facial eye gaze is inconsis-
tent. If we enable tracking for the left and right cameras (rows 3 through 6), then it is
possible to look up and down on the objects.
In Figure 5.7, we also compare our new interpolation algorithm for handling multiple different viewers. For rows 5 and 6, we compute a per-vertex viewer height and distance as described in Section 5.4. Per-vertex vertical parallax interpolation produces plausible and consistent perspective across the entire photographed view. In contrast, rows 3 and 4 demonstrate interpolation that uses a constant viewer height and distance per projector. Each projector still interpolates the nearest two tracked viewer positions; however, the interpolation weights are uniform across all vertices. Per-projector interpolation generates significant distortion for all three views, where the vertical perspective on the left side of the image does not match the perspective on the right side. These errors also depend on the shape of the weight falloff function. Using per-projector Gaussian weights (row 3) makes straight lines curved, while a per-projector step hat function (row 4) causes image tearing as the view abruptly changes from one vertical height to another. The distortion
is less visible on organic objects such as a face, though the left and right eyes are no longer
looking in the same direction. Additional results in the video show how this distortion
ripples across the geometry as the camera moves back and forth between two tracked viewer positions.
Another advantage of per-vertex interpolation is that it reduces errors for untracked
viewers. Untracked viewers see the 3D scene based on a default height and distance that
should not be affected by the vertical movement of nearby tracked viewers. Despite the
fact that each projector frame may be seen by multiple viewers, by computing multiple
vertical perspectives within each projector frame (per-vertex) we can isolate each viewer
(Figure 5.7, rows 5 and 6). In contrast, using a single height per-projector, the center
view of the face appears distorted as this untracked view shares some projectors with the
nearby lower left camera (rows 3 and 4). The same effect is shown for a dynamic user
in the accompanying video. Without per-vertex interpolation (time 3:18), the untracked
viewer is clearly distorted whenever the tracked viewer is nearby. At time 3:05 in the video,
you can see that with per-vertex interpolation, this extraneous crosstalk is considerably
reduced.
We tested our projection algorithm on a curved mirror with a 10 degree curvature and
a magnification factor of 1.43. Figure 5.8 shows imagery on the curved mirror with and without per-vertex vertical parallax interpolation. In the latter case, perspective distortion increases significantly. This distortion could be reduced by using a wider Gaussian function so that nearby frames would be forced to have similar heights, but this would have the negative effect of more crosstalk with nearby viewers. To validate our projection model for a wider range of screen shapes, we developed a projector array simulator that can model arbitrary screen curvature and diffusion materials. The simulator uses the same render engine but projects onto a screen with a simulated anisotropic BRDF. As shown in the accompanying video (time 1:28), we can maintain a stable image with correct perspective as the mirror shape changes significantly. We can also determine the ideal horizontal diffusion width by simulating different anisotropic reflectance lobes.
Our system uses a standard Windows PC with no special memory or CPU. The only requirement is that the motherboard can accommodate four graphics cards. The CPU is primarily used for user tracking; all other operations are performed on the GPU. The animations shown in the video used around 5000 triangles and ran at 100-200fps on an ATI Eyefinity 7800. To achieve maximum performance, rendering is distributed across all four graphics cards. As not all graphics libraries provide this low-level control, we explicitly create a different application and render context per GPU, with the different
instances synchronized through shared memory. The main bottleneck is Front Side Bus
data transfer as textures and geometry need to be uploaded to all GPUs.
[Figure 5.7 column labels, repeated for each object: low left, center, high right]
Figure 5.7: This figure shows three virtual objects viewed by an untracked center camera and two tracked left and right cameras. As a ground-truth comparison, we calibrate the positions of three cameras and render out equivalent virtual views of the objects (1st row), while the remaining rows show photographs of the actual display prototype. If a single constant viewer height and distance is used, then the viewer sees significant foreshortening from high and low views (2nd row). We also compare different viewer interpolation functions for interpolating multiple viewer heights and distances. The tracked view positions are interpolated either with a constant height/distance per-projector (3rd and 4th rows) or with a different height/distance per-vertex (5th and 6th rows). Photographs taken with per-vertex interpolation show less distortion with consistent vertical perspective across the entire image, and the untracked center view is not affected by the nearby left viewer. Photographs with per-projector interpolation exhibit multiple incorrect perspectives with warped lines and image tearing, and the untracked center view is distorted by the nearby left viewer. The local weight falloff of each tracked position is implemented as either a normalized Gaussian (3rd and 5th rows) or a sharp step hat function (4th and 6th rows). Gaussian interpolation errors appear as incorrect curved lines while errors using a sharp hat falloff result in image tearing.
[Figure 5.8 panels: per-vertex viewer height (left) and per-projector viewer height (right), each photographed from left low, center, and high right views]
Figure 5.8: Comparison of different viewer interpolation functions for a convex mirror. The left side uses per-vertex viewer height and distance with a Gaussian falloff. The right side uses constant height and distance per-projector with a Gaussian falloff. Photographs taken with per-vertex interpolation show less distortion with consistent vertical perspective. In contrast, straight lines appear curved in photographs using constant per-projector interpolation.
Chapter 6
Time-offset Conversations
6.1 Introduction
What would it be like if you could meet someone you admire, such as your favorite
artist, scientist, or even world leader, and engage in an intimate one-on-one conversation?
Face-to-face interaction remains one of the most compelling forms of communication. Un-
fortunately in many cases, a particular subject may not be available for live conversation.
Speakers are both physically and logistically limited in how many people they can person-
ally interact with. Yet the ability to have conversations with important historical figures
could have a wide range of applications from entertainment to education.
Traditional video recording and playback allows for speakers to communicate with a
broader audience but at the cost of interactivity. The conversation becomes a one-sided
passive viewing experience when the narrative timeline is chosen early in the editing
process. Particularly with first person narratives, it can be especially compelling to look
the speaker in the eye, ask questions, and make a personal connection with the narrator
and their story. Research has shown that people retain more information through active discussion than through passive lectures [70].
To solve this problem, we created a system that enables users to have interactive
conversations with prerecorded 3D videos. In this chapter, we simulate 3D conversations
across time where one half of the conversation has already taken place. Our system
presents each subject life-size on a dense automultiscopic projector array, combining both
3D immersion and interactivity. Automultiscopic 3D displays enable multiple users to
view and interact with a speaker, and see the same 3D perspective as if he or she were
actually present. For each subject, we record a large set of 3D video statements and users
access these statements through natural conversation that mimics face-to-face interaction.
Conversational reactions to user questions are retrieved through speech recognition and
a statistical classier that nds the best video response for a given question. Recordings
of answers, listening and idle behaviors, are linked together to create a persistent visual
image of the person throughout the interaction.
Our main contributions are:
1. A new dense projector array designed to show a life-size human figure to multiple simultaneous users over a wide viewing area. The field of view can be easily customized with distributed rendering across multiple graphics cards and computers.
2. The 3D display is integrated with an interactive natural language interface that allows users to have simulated conversations with prerecorded video interviews.
3. Display content is rendered using an extension of flowed light fields [20]. This technique allows for real-time resampling of sparse camera video directly to the projector
array.
6.2 Interview Process
While it is impossible to anticipate and record all possible questions and answers, the
system is based on the principle that any single prerecorded answer can serve as a viable
response to a much larger set of potential questions. For some applications, it is possible
to utilize scripted answers, carefully worded to be self-contained with a restricted topic
of conversation. One of the first systems that allowed spoken interaction with a historical
character was the August system [28]. This system used an animated "talking head" fash-
ioned after August Strindberg that could provide tourist information about Stockholm,
as well as deliver quotes from and about Strindberg himself. The Virtual Human Toolkit
[30] has been used to create multiple scripted characters such as Sgt. Blackwell [59] and
the Eva and Grace virtual twins [106, 103], each serving as virtual educational guides
that tell stories in response to user questions. All these systems utilize fictional characters modelled and animated by artists. In the late 1990s, Marinelli and Stevens came
up with the idea of a "Synthetic Interview", where users can interact with a historical
persona that was composed using video clips of an actor playing that historical character
and answering questions from the user [66]. "Ben Franklin's Ghost" was a system built
on this concept that was deployed in Philadelphia in 2005-2007 [100]. The system used
speech recognition and keyword-spotting to select the responses.
It is not desirable or possible, however, to script all conversations with real people.
Instead we utilize extensive interviews to gather a wide range of natural responses. The
subjects interviewed for this project were experienced public speakers. By analyzing pre-
vious lectures and interviews, we gathered the most common audience questions. We also
devised a set of prompts to further the interaction, including short factual biographical
information, opinions, and stories. In cases where a question does not have a direct an-
swer, a good story can often fill in the gaps. If no response is suitable, the subject will
ask the user to restate the question or suggest a new topic. Additional details on the
question development and analysis can be found in [5].
Figure 6.1: (top) Seven of the Panasonic cameras mounted around the stage to record
the performance. (bottom) Mosaic of all 30 camera views.
6.3 Data Capture
We record each subject with an array of 30 Panasonic X900MK cameras, spaced every
6 degrees over a 180 degree semi-circle and at a distance of 4 meters from the subject
(see Figure 6.1). The cameras can record multiple hours of 60fps HD footage directly to
SD cards with MPEG compression. As the Panasonic cameras were not genlocked, we
synchronized our videos within 1/120 of a second by aligning their corresponding sound
waveforms.
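Because the alignment is recovered purely from the soundtracks, it can be scripted offline. Below is a minimal sketch of the idea in Python/numpy, assuming the audio tracks have already been extracted as mono arrays at a common sample rate; in practice an FFT-based correlation would be used for speed, and this is not the actual tooling used for the recordings.

import numpy as np

def audio_offset_seconds(ref_audio, other_audio, sample_rate):
    # locate the peak of the cross-correlation between the two (zero-mean) tracks
    corr = np.correlate(other_audio - other_audio.mean(),
                        ref_audio - ref_audio.mean(), mode="full")
    lag = np.argmax(corr) - (len(ref_audio) - 1)   # samples by which 'other' lags the reference
    return lag / float(sample_rate)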
A major consideration during the interviews was maintaining video and audio continu-
ity. This is important as the time-displaced interaction may jump back and forth between
different takes and even different days of production. As much as possible, cameras were
triggered remotely to avoid any unnecessary camera motion. We also prepared multiple
identical outfits for the interview subject to wear on successive days. Between interview
sessions we would try to match body posture and costume. A video overlay was used
to rapidly compare footage between sessions. Even with all these efforts, maintaining
complete continuity was not possible. In particular, we noticed changes in how clothing
would fold and hang as well as changes in the subject's mood over the course of days.
Both types of changes may be noticeable when transitioning between disparate answers.
Scene illumination was provided by an LED dome with smooth white light over the upper
hemisphere (see Figure 6.1). This is a neutral lighting environment that also avoids hard
shadows.
A key feature of natural conversation is eye-contact, as it helps communicate attention
and subtle emotional cues. Ideally, future viewers will feel that the storyteller is addressing
them directly. However, in early tests, when the interviewer was fully visible, the subject
would tend to address the interviewer and not look at the camera. Alternatively, if the
interviewer was completely hidden, the tone of the interview would feel subdued and less
engaging. Our solution was to place the interviewer outside the stage and hidden behind
a curtain, while the interviewer's face was visible as a reflection through a mirror box
aligned with the central cameras.
After the interview, we segmented the footage into stand-alone video responses. The
initial rough edit points are marked during the interview transcription process. These
in/out points are refined by automatically detecting the nearest start and end of the
speech where the audio levels rose above a threshold. Occasionally, the detected audio
start and end points will not exactly match the natural video edit points. For example,
if the subject made silent hand-gestures prior to talking, these should be included in the
video clip. In these cases we manually adjusted the audio and video edit points.
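The audio-based refinement of the in/out points amounts to a threshold test on a smoothed loudness envelope. The sketch below (Python/numpy) shows the idea; the window size and threshold are illustrative values rather than the ones used in production, and the manual adjustment for silent gestures described above still happens afterwards.

import numpy as np

def refine_edit_points(audio, sample_rate, rough_in, rough_out,
                       window_s=0.05, threshold=0.02):
    # snap rough in/out times (seconds) to where the short-time RMS level
    # first and last rises above the threshold within the rough clip
    win = max(1, int(window_s * sample_rate))
    lo, hi = int(rough_in * sample_rate), int(rough_out * sample_rate)
    clip = audio[lo:hi].astype(float)
    rms = np.sqrt(np.convolve(clip ** 2, np.ones(win) / win, mode="same"))
    active = np.flatnonzero(rms > threshold)
    if len(active) == 0:                      # no speech detected: keep the rough marks
        return rough_in, rough_out
    return rough_in + active[0] / sample_rate, rough_in + active[-1] / sample_rate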
6.4 Display Hardware
Previous interactive "digital humans" systems [100, 59] were displayed life-size but us-
ing conventional 2D technology such as large LCD displays, semi-transparent projection
screens or Pepper's ghost displays [101]. While many different types of 3D displays exist, most are limited in size and/or field of view. Our system utilizes a large automultiscopic
3D projector array display capable of showing a full human body.
Early projector array systems [38, 68] utilized a multilayer, vertically oriented lenticular screen. The screen optics refracted multiple pixels behind each cylindrical lens to multiple view positions. As introduced in Chapter 5, recent projector arrays [95, 47, 123, 44] utilize different screens based on vertically scattering anisotropic materials. The vertical scattering component allows the image to be seen from multiple heights. The narrow horizontal scattering allows for greater angular density as it preserves the angular variation of the original projector spacing.
[Figure 6.2 diagram labels: 335.28 CM, 121.92 CM, .63 DEGREES, 216 PROJECTORS, SCREEN, SYNC SERVER & AUDIO PLAYBACK, COMPUTERS X6, RENDER CLIENT 1 through RENDER CLIENT 6, SPLITTERS X72, PROJECTORS X216]
Figure 6.2: (left) Overhead diagram showing layout of projectors and screen. (right)
Diagram showing connectivity between computers, splitters, and projectors. The render
clients are synchronized and controlled by a single master server.
In order to reproduce full-body scenes, the projectors making up the array require
higher pixel resolutions and brightness. Secondly, as full bodies have more overall depth, we must increase the angular resolution to resolve objects further away from the projection screen. We use LED-powered Qumi v3 projectors in a portrait orientation, each with 1280 x 800 pixel image resolution (Figure 6.2). A total of 216 video projectors are mounted over 135° in a 3.4 meter radius semi-circle. At this distance, the projected pixels fill a 2 meter tall anisotropic screen with a life-sized human body (Figure 6.3). The narrow 0.625° spacing between projectors provides a large display depth of field. Objects can
be shown within about 0.5 meters of the screen with minimal aliasing. For convincing
stereo and motion parallax, the angular spacing between views was also chosen to be
small enough that several views are presented within the intraocular distance.
The screen material is a vertically anisotropic light-shaping diffuser manufactured by Luminit Co. The material scatters light vertically (60°) so that each pixel can be seen at multiple viewing heights while maintaining a narrow horizontal blur (1°). From a given viewer position, each projector contributes a narrow vertical slice taken from the
[Figure 6.3 panels: anisotropic screen at projector spacings of 2.5°, 1.25°, and 0.625°]
Figure 6.3: (left) Photograph showing the 6 computers, 72 video splitters, and 216 video
projectors used to display the subject. (right) The anisotropic screen scatters light from
each projector into a vertical stripe. The individual stripes can be seen if we reduce
the angular density of projectors. Each vertical stripe contains pixels from a different
projector.
corresponding projector frame. In Figure 6.3, we compare different projector spacings. If the angle between projectors is wider than the horizontal diffusion, the individual vertical slices can be observed directly. As the angular resolution increases, the gaps decrease in size. Ideally, the horizontal screen blur matches the angular separation between projectors, thus smoothly filling in the gaps between the discrete projector positions and forming a seamless 3D image.
To maintain modularity and some degree of portability, the projector arc is divided into
three separate carts, each spanning 45 degrees of the field of view. We use six computers (two per cart) to render the projector images. Each computer contains two ATI Eyefinity 7870 graphics cards with 12 total video outputs. Each video signal is then divided three ways using a Matrox TripleHead2Go DisplayPort video splitter, so that each computer feeds 36 projectors. A single master server computer sends control and synchronization commands to all connected carts (see Figure 6.2).
Ideally, all projectors would receive identical HDMI timing signals based on the same internal clock. While adapters are available to synchronize graphics cards across multiple computers (such as Nvidia's G-Sync cards), the Matrox video splitters follow their own internal clocks and the final video signals no longer have subframe alignment. This effect is only noticeable due to the time-multiplexed color reproduction on single-chip DLP projectors. Short video camera exposures will see different rainbow striping artifacts for each projector; however, this effect is rarely visible to the human eye. Designing a more
advanced video splitter that maintains the input video timing or accepts an external sync
signal is a subject for future work.
We align the projectors with a per-projector 2D homography that maps projector
pixels to positions on the anisotropic screen. We compute the homography based on
checker patterns projected onto a diffuse screen placed in front of the anisotropic surface.
6.5 Light Field Rendering
The main problem in rendering images for the automultiscopic display is that the camera
array used to capture the input video sequences is very sparse compared to the projector
array. The cameras are placed 6 degrees apart while the angle between the projectors is
only 0.625 degrees. It is therefore necessary to synthesize new views for projectors which
are placed in-between the cameras. Furthermore, rays emitted by each projector continue to diverge as they pass through the anisotropic screen. Rendering to such a display requires the generation of multiple center of projection (MCOP) imagery, as different slices of the projector frame must be rendered according to the varying viewpoints. Previous methods for rendering MCOP imagery on automultiscopic displays have required either high-density light fields [43] or existing geometry [42, 44].
Many techniques have been proposed to reconstruct 3D geometry from multiple cameras; however, this typically requires slower global optimization across all views [97]. Additional depth cameras [11] can improve quality or processing rates for playback on
augmented reality headsets [1].
Figure 6.4: We compute bidirectional optical flow between adjacent pairs in the horizontal camera array.
An intuitive way to view the recorded data is as a light field parameterized at the projection screen. Light fields are ideal for rendering complex non-convex shapes with a wide variety of materials for skin and clothing, and multiple occlusions from limbs. It also does not require global reconstruction of 3D geometry or perfectly synchronized data. A light field can be rendered by identifying the nearest cameras, and sampling pixel values that correspond to each projector ray [61]. This approach was used by Matusik et al. [68] to generate imagery on a 3D projector array based on a dense camera array. For sparse camera arrays such as the full-body camera array, linear sampling will cause noticeable aliasing (see Figure 6.5). Instead, we utilize flowed light fields [20] to predict intermediate camera positions. The core idea is to compute pair-wise optical flow correspondences
Figure 6.5: (left) View generated using bilinear interpolation exhibits aliasing. (center)
View generated using spatial optical flow has sharp edges and less aliasing. (right) Closeup
of aliasing around the face.
between adjacent cameras as illustrated in Figure 6.4. All resampling is computed in
real-time on the GPU [112], requiring only the original video streams and optical flow offsets.
Flow-based light field reconstruction assumes that optical flow offsets are locally smooth so we can use multiple nearby optical flow offsets to refine each sample camera coordinate. We use the screen surface as proxy geometry to find initial camera coordinates. For each target ray, we find the intersection point with the screen surface, and project it into each adjacent camera. As the projector positions are fixed, this mapping
between projector pixels and camera coordinates is constant, and is precomputed as a
lookup table.
The optical flow vectors map each camera coordinate to a second coordinate in the adjacent camera. In practice, each pair of coordinates references a slightly different
Figure 6.6: For each point on the screen, we sample the spatial optical flow fields between the two nearest cameras. Each optical flow pair represents a different point on the actual surface. We estimate an intermediate image coordinate using both spatial flows to interpolate between views and temporal flows to offset to a global timeframe. In the right diagram, each image plane is represented as a slanted line due to the rolling shutter in each camera.
surface point on either side of the ideal sample coordinate, since the screen surface does
not match the true shape of the subject. We interpolate between the two coordinate
pairs to get a single sample point for each camera, weighted by the angular distance
between the target ray and each camera. Finally, we interpolate the pixel samples from
the two cameras. An illustration of the two optical flow pairs and the interpolated sample positions is shown in Figure 6.6.
To compensate for rolling shutter effects, we also compute the temporal optical flow, i.e., between sequential frames in each individual video sequence. The temporal flow is then used to add an additional temporal offset to the final sample position weighted by the rolling shutter offset for each row on the sensor and distance from the global time.
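To make the per-ray resampling concrete, the following sketch walks through one projector ray using only the quantities described above: the precomputed camera coordinates of the ray's screen intersection, the two spatial flow fields, and an angular blend weight. It is written as plain Python over vector-like values; the bilinear texture lookup and flow lookup are placeholder helpers, the rolling-shutter/temporal offset is reduced to a comment, and the names are ours rather than those of the real pixel shader.

def sample_flowed_lightfield(ray, cam_a, cam_b, sample_color, sample_flow):
    # ray.uv_a, ray.uv_b: precomputed image coordinates of the screen point in
    #                     cameras A and B (from the lookup table)
    # ray.alpha:          angular blend weight, 0 at camera A and 1 at camera B
    # sample_flow(src, dst, uv): flow vector mapping a coordinate in src to dst
    # sample_color(cam, uv):     bilinear RGB sample from that camera's frame
    a = ray.alpha
    # each flow pair references a slightly different surface point; blend between
    # the direct coordinate and the coordinate corresponded from the other camera
    uv_a = (1 - a) * ray.uv_a + a * (ray.uv_b + sample_flow(cam_b, cam_a, ray.uv_b))
    uv_b = a * ray.uv_b + (1 - a) * (ray.uv_a + sample_flow(cam_a, cam_b, ray.uv_a))
    # (a temporal-flow offset would be added to uv_a / uv_b here to compensate for
    #  each camera's rolling shutter relative to the global time)
    return (1 - a) * sample_color(cam_a, uv_a) + a * sample_color(cam_b, uv_b)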
We are able to compute light field sampling in real-time in our distributed rendering framework. The optical flow offsets and samples are combined in a pixel shader for each
Figure 6.7: This is a sampling of four projector frames generated using flowed light field rendering. Each frame appears warped as it corrects for multiple centers of projection
and foreshortening.
projector ray. For high-resolution input, or if more projectors are connected to each host PC, the algorithm may be limited by GPU upload speeds for the original video files and precomputed optical flow vectors. However, using modern motherboards and GPUs, this is less of a problem. For example, using PCIe 3.0 the maximum bandwidth is on the order of 900MB/s for each lane, and higher-end motherboards usually have at least two 16x PCIe ports. As the optical flow is generally smooth, we downsample the flow to quarter resolution, and only upload videos associated with nearby camera positions. If there is insufficient bandwidth to upload both spatial and temporal flows, the input video files can be retimed as a separate preprocess.
6.6 Natural Language Interface
In a typical use scenario, the digital speaker presents a short introduction or overview to
provide context and inspire a followup question and answer session. The audience watches
the speaker on the 3D display, and interacts by speaking into a microphone and clicking a push-to-talk button to tell the system when to listen. We use Google's speech recognition API to convert the initial audio to text, though the system is compatible with other general-purpose recognizers [75]. The corresponding video response is chosen using
the NPCEditor dialog manager [60]. The dialog manager is based on cross-language
information retrieval, and calculates the relevance of words in the training data of user inputs to words in the set of subject recordings. A total ranking is provided over all possible responses, which is fairly robust to many speech recognition errors. At runtime, if the confidence for a selected response falls below a predetermined threshold, the subject asks the user to rephrase the question or suggests an alternate topic. Message passing
between speech recognizer, dialog manager, and video player is based on the publicly
available Virtual Human Toolkit [30].
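At runtime the selection logic around the classifier is small. The following sketch (Python, with hypothetical function and variable names; it is not the NPCEditor code) shows the control flow around the confidence threshold and the off-topic fallback.

import random

def respond(question_text, classifier, threshold, offtopic_clips):
    # classifier.rank(text) is assumed to return (clip_id, score) pairs sorted by
    # relevance, mimicking the total ranking produced by the dialog manager
    ranked = classifier.rank(question_text)
    if not ranked or ranked[0][1] < threshold:
        # low confidence: ask the user to rephrase or suggest another topic
        return random.choice(offtopic_clips)
    return ranked[0][0]   # best-matching prerecorded video response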
6.7 Results
The first full application of this technology was to preserve the experience of in-person
interactions with Holocaust survivors. Currently, many survivors visit museums and class-
rooms to educate, connect and inspire students. There is now an urgency to record these
interactive narratives for the few remaining survivors before these personal encounters are
no longer possible. Through 3D recording, display, and interaction, we seek to maintain
a sense of intimacy and presence, and remain relevant to the future.
Our first subject was Pinchas Gutter. Mr. Gutter was born in Poland in 1932, lived
in a Warsaw ghetto and survived six concentration camps before being liberated by the
Russians in 1945. The interview script was based on the top 500 questions typically
asked of Holocaust survivors, along with stories catered to his particular life story. The
full dataset includes 1897 questions totaling 18 hours of dialog. These questions are
linked to 10,492 training questions, providing enough variation to simulate spontaneous and informative conversations. The interactive system was first demonstrated on an 80-inch 2D video screen at the Illinois Holocaust Museum and Education Center [107]. A user study based on the 2D playback found that interactive video inspired students to help others, learn about genocide, and feel they could make a difference. Several students noted that the experience felt like a video teleconference with a live person [48].
The 3D projector array system was tested in a public setting with several age groups.
Viewers noted that the 3D display further increased their sense of presence with the
survivor. Many older viewers responded on an emotional level. Anecdotally, many visitors
act as if the survivor were present, apologizing for their suffering or for interrupting. The most challenging cases were when other Holocaust survivors asked the system questions reflecting on their own personal experiences. A user study to quantitatively compare the
2D and 3D experiences is a subject for future work.
For this chapter, we conducted two additional short interviews with standing subjects.
Each interview was limited to 20-30 questions over 2 hours, but still allows for short
moderated conversations. Figure 6.9 shows stereo photographs of all three subjects on
the display. The accompanying video shows several 3D conversations with live natural
language recognition and playback.
Figure 6.5 shows a comparison of view interpolation with and without optical flow correction. Optical-flow-based interpolation dramatically reduces ghosting between adjacent camera positions. In a few cases, aliasing is still visible on the subject's hands where optical flow struggles to find accurate correspondences. The current optical flow settings sacrifice some quality in order to handle the large video dataset. Each individual optical flow takes 0.5 seconds on an Nvidia GTX 980. This adds up to a total of 30 seconds per frame to precompute pair-wise optical flows for all camera views.
In this chapter, we only consider a one-dimensional array of cameras for horizontal
view interpolation. The addition of multiple rows of cameras would enable changes in
viewer height and tracked vertical parallax (as in previous chapters). An alternative
approach would be to reconstruct 3D geometry to simulate different vertical perspectives.
We also adapted the geometry rendering algorithm from Chapter 5 to the large projector
array. Sample scenes based on 3D geometry can be seen in Figure 6.10.
Figure 6.8: Photograph of subject shown on the automultiscopic projector array.
Figure 6.9: Stereo photograph of subjects on the display, left-right reversed for cross-fused
stereo viewing. Each subject is shown from three positions.
Figure 6.10: Stereo photographs of 3D geometry on the display, left-right reversed for
cross-fused stereo viewing. Each object is shown from four positions.
Chapter 7
Future Work
Over the last decade there has been a resurgence of interest in 3D displays including stereo
movies, virtual reality, and new display form factors. As researchers continue to develop
new types of displays, this thesis suggests several new avenues for further research.
7.1 Spinning mirror displays
7.1.1 Data Transfer
In Chapters 3 and 4, we utilized a high-speed video projector to generate multiple views
on a spinning anisotropic screen. The resolution and frame rate of the display were limited
by the available data bandwidth between the source computer and the high-speed video
projector. The custom video projector utilized a Texas Instruments Discovery Kit and a coupled FPGA to rapidly decode real-time video input. In Chapter 3, we achieved 4320fps at 1024x768 resolution by overclocking a single-channel DVI signal at 180Hz and decoding
each 24-bit image into sequential binary views. In Chapter 4, we split each frame in half
to double the frame rate to 8640fps at 512x768 resolution. Alternative communication
standards such as dual-channel DVI, multiple DVI/HDMI connections, or a dedicated network connection such as InfiniBand could further increase bandwidth. Modern DLP
chips can theoretically support frame rates over 20,000fps, which would eliminate flicker and increase angular resolution.
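As a quick consistency check (a derivation of ours, not stated explicitly in the original text): at a 180Hz input signal with 24 binary bit-planes per frame, the projector can display 180 x 24 = 4320 binary images per second, and splitting each 1024x768 frame into two 512x768 halves doubles this to 8640 images per second, matching the figures above.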
New development kits, such as the Texas Instruments LightCommander, allow for
rapid prototyping and programming of DLP projectors. Typically, these kits can play back short pattern sequences from flash memory at speeds around 4225fps, as well as streamed sequences over a video interface at slower frame rates. The ability to play back prerecorded 3D videos from flash memory could enable small, inexpensive displays that no longer require a connected computer for static 3D scenes.
7.1.2 Dithering
The high-speed projectors sacrifice grey-level and color reproduction to achieve higher frame rates. We performed both ordered dithering and error diffusion [85] on individual projector frames to recreate grey levels. A more optimal algorithm would perform error diffusion across multiple views instead of processing each frame independently. If 3D geometry is available, dithering could be performed in object texture space. This approach has the advantage that the remaining dithering errors would be perceived as being on the surface of the object instead of as a floating pattern across the entire projector frame. Similar techniques have been used previously to achieve non-photorealistic rendering effects such as cross-hatching on the surface of objects [17].
7.1.3 Color reproduction
Future designs could also improve the color gamut of the system. In Chapter 4, we
combined two projectors with cyan and orange filters. Utilizing a three-chip projector
would provide true RGB color reproduction. One advantage of our approach is that it
makes exceptionally efficient use of light - the current display is extremely bright when
unfiltered. This may enable future versions to employ multi-color LED illumination, which has the positive consequences that the color gamut can be very wide and the timing of
the LEDs may be very precisely controlled.
7.1.4 Alternate Sizes
The mechanical system on the spinning mirror could be improved to enable quieter, larger,
and more robust form factors. Magnetic bearings could reduce the rotational friction for
medium-sized displays. Larger displays, up to the size of a full human body, would require
low-pressure chambers to reduce aerodynamic resistance to the spinning display surface.
An alternate direction would be to reduce the overall size and cost of the display, utilizing
smaller LED projection units and embedded rendering engines to create hand-held and
untethered units.
7.2 Projector Arrays
Projector arrays can be scaled to a variety of sizes and fields of view by simply adding additional video projectors. In Chapter 5, we demonstrated flat and curved anisotropic screens to display a 3D human face; then in Chapter 6, we used flat anisotropic screens to display the full human body. The curved screens could be extended to a full 360° field of view using a reflective anisotropic cylinder at the center of a circle of video projectors. The projectors would need to be placed overhead to avoid occlusions from the viewers themselves. In these prototypes, we used multiple anisotropic screen materials including off-the-shelf holographic diffusers, plastic lenticular lenses, and brushed metal. Further studies could explore other custom diffusing materials and scattering profiles to reduce glare from ambient illumination. For some applications, the anisotropic screen could be reflected in a Pepper's ghost mirror [101] in order to overlay the 3D projected image in
front of real-world objects. Dense projector arrays could also be integrated into multi-layer
computational displays [113], for example as a directional backlight to a semi-transparent
LCD screen.
7.3 Applications
In this thesis, we explored several applications for face-to-face 3D interactions with life-sized digital humans, both for live teleconferencing and prerecorded interviews.
7.3.1 3D teleconferencing
One of the primary advantages of automultiscopic 3D displays is the ability to perceive
correct eye gaze. The Microsoft Holoportation project [1] implements 3D teleconferencing for augmented reality displays with real-time full-body reconstruction; however, the head-
mounted display occludes the eyes. Recent work has used sensors attached to head-
mounted displays to recover occluded regions; however, subtle eye motions are still missing
[62, 67].
In Chapter 4, we utilized structured light patterns to scan the human face in real-time.
Similar quality could now be achieved with commercial depth cameras using infrared noise
patterns. Digital 3D capture will become increasingly commonplace as depth cameras are
integrated into many different personal devices. However, as long as people continue
to communicate across a wide range of platforms, such as smart phones, tablets, and
3D displays, telecommunication software will have to support multi-way conversations
combining both 2D and 3D participants.
7.3.2 Time-offset conversations
One major area of future research is reducing the effort required to record and edit a time-offset conversation. The system currently relies on extensive interviews spanning multiple days. In many cases, a subject may not be available for long continuous interviews, so an interview may have to be staggered over multiple weeks or months. This will present a challenge for maintaining continuity. In the extreme case, interview segments may be recorded in different locations with varying cameras and lighting.
Continuity errors could be partially hidden by more advanced transitions between
video clips. Our system currently uses hard cuts between clips; dissolves or flowed transitions would be less visible. If 3D geometry is available, it may be possible to correct for more extreme continuity changes such as changes in lighting, body pose, and clothing. It is interesting to note that changes in body pose between clips are significantly more
noticeable in 3D than with traditional 2D video playback.
One drawback of 3D time-offset conversations is the lack of dynamic eye contact. As the 3D video interview is prerecorded, the eye gaze is fixed. When a visitor asks a question,
the digital subject is unable to turn to face them directly. A possible research direction
would be to dynamically alter the eye gaze in the existing 3D video. Kuster et al. [52]
previously developed a technique for minor corrections in eye gaze in 3D teleconferencing
by compositing reoriented 3D faces onto live video. Larger changes in eye gaze will require
simulating neck and body rotations in addition to the face.
Real-world conversations are not always limited to a single subject. The dialog man-
agement system could be extended to include multiple prerecorded subjects, each taking
turns to respond to user questions. As 3D video may not always be available, the dialog
could incorporate other media elements such as animated clips, photographs, paintings,
or audio recordings to enhance the interactive storytelling.
7.4 Practical Considerations
The final goal is to turn automultiscopic displays into viable products. Some lessons
can be drawn from the history of virtual reality. While virtual reality technology has
been around for decades, it was the recent combination of high-resolution LCD displays
from mobile phones and inexpensive plastic lenses that enabled the development of new
commercial products such as the Oculus Rift, HTC Vive, and Samsung Gear VR. The
continuing challenge for automultiscopic displays is nding practical methods to make
pixels even smaller, faster, and cheaper. In particular, the displays proposed in this thesis
will require the mass-production of low-cost LCD projectors and high-speed DLP chips.
Both automultiscopic and head-mounted displays require compelling content to build
up an established user base. In this thesis, we have presented several techniques for 3D
content acquisition including dynamic light field capture and real-time geometry scanning. Standard rendering interfaces such as OpenVR [2] and game engine plugins are
already available for head-mounted displays. For virtual reality, each eye view is rendered
using a standard perspective camera model, so multiple center-of-projection rendering is
not required. Nevertheless, such interfaces cleanly handle issues such as hardware calibration, radial and chromatic distortion, and motion tracking. Automultiscopic displays will
require similar standard interfaces that allow developers to create content while hiding
the technical details. System setup has to be quick and automatic with self-calibrating
projectors and cameras. Automultiscopic displays will also influence the design of new
virtual reality hardware. For example, multi-layer LCDs [32, 55] have been integrated
into virtual reality hardware to simulate depth cues such as accommodation.
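To make the contrast with multiple-center-of-projection rendering concrete, the sketch below shows the per-eye setup used by head-mounted displays: each eye is an ordinary pinhole camera offset by half the interpupillary distance, and every pixel in that eye's image shares a single center of projection. The IPD, field of view, and clip planes are illustrative defaults, not values taken from OpenVR or any particular headset.

import numpy as np

def perspective(fov_y_deg, aspect, near, far):
    # Standard OpenGL-style perspective projection matrix.
    f = 1.0 / np.tan(np.radians(fov_y_deg) / 2.0)
    return np.array([
        [f / aspect, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, (far + near) / (near - far), 2.0 * far * near / (near - far)],
        [0.0, 0.0, -1.0, 0.0],
    ])

def eye_view(head_pose, eye_offset_x):
    # head_pose is the tracked world-from-head transform; the eye view
    # matrix adds a translation of half the IPD along the head's x axis.
    offset = np.eye(4)
    offset[0, 3] = -eye_offset_x
    return offset @ np.linalg.inv(head_pose)

ipd = 0.064                      # meters, assumed
head = np.eye(4)                 # tracked head pose (identity for this example)
proj = perspective(90.0, 1.0, 0.1, 100.0)
viewproj_left = proj @ eye_view(head, -ipd / 2.0)
viewproj_right = proj @ eye_view(head, +ipd / 2.0)

An automultiscopic rendering interface would hide a similar amount of machinery, but the single per-view matrix would be replaced by the multiple-center-of-projection rendering summarized in the Conclusion.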
Chapter 8
Conclusion
Automultiscopic displays hold the promise of seamless 3D imagery that can be seen from
any viewpoint without the need for special 3D glasses. These displays aim to recreate
a scene's light field by controlling the amount of light flowing in every direction leaving the display surface. In this thesis, we have presented several prototypes for solving this challenge. In Chapters 3 and 4, we used a mechanical spinning screen to reflect projected rays to viewers all the way around the display. In Chapters 5 and 6, we used a fixed array
of projectors to create a similarly wide distribution of light. While these designs represent
only a small subset of possible display types, they share several core principles with wider
applications.
Unlike previous projector arrays, our hardware does not enforce a one-to-one mapping
between viewpoints and rendered frames. This decision simplifies the hardware design using off-the-shelf anisotropic screens. Instead of using complex optics to redirect projector rays, we predistort the rendered frames to account for the divergence of projector rays. We developed two main algorithms for rendering multiple viewpoints within the same image: either a custom geometric projection or light field sampling techniques. As available rendering capabilities continue to increase, the new software algorithms achieve results that would be difficult to obtain through hardware complexity alone. For example, we generalized the projection mathematics to handle a wide range of anisotropic screen shapes: flat or curved, spinning or stationary, reflective or transmissive.
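As a highly simplified illustration of multiple-center-of-projection rendering, the sketch below projects each vertex from its own viewpoint rather than from a single camera, intersecting the ray from that viewpoint through the vertex with the screen plane. The rule used here to pick the per-vertex viewpoint (a circle of viewers at a fixed radius in front of the screen, at an azimuth set by the vertex's horizontal offset) is an illustrative assumption, not the exact projection derived in Chapters 3 through 5.

import numpy as np

VIEW_RADIUS = 1.5  # viewer distance from the screen center (meters), assumed

def viewpoint_for_vertex(v):
    # Illustrative rule: the viewpoint sits on a circle of radius
    # VIEW_RADIUS in front of the screen (z > 0), at an azimuth
    # determined by the vertex's horizontal position.
    theta = np.arctan2(v[0], VIEW_RADIUS)
    return np.array([VIEW_RADIUS * np.sin(theta),   # x
                     0.0,                           # nominal eye height
                     VIEW_RADIUS * np.cos(theta)])  # z

def project_mcop(vertices):
    # Project each vertex onto the screen plane z = 0 along the line from
    # its own viewpoint, yielding per-vertex screen coordinates (x, y).
    # Assumes the scene geometry stays near the screen plane, well in
    # front of the viewing circle.
    screen_xy = []
    for v in vertices:
        eye = viewpoint_for_vertex(v)
        d = v - eye
        t = -eye[2] / d[2]          # ray parameter where z reaches 0
        hit = eye + t * d
        screen_xy.append(hit[:2])
    return np.array(screen_xy)

verts = np.array([[0.1, 0.05, -0.2], [-0.2, 0.0, -0.4], [0.0, -0.1, 0.3]])
print(project_mcop(verts))

Because the viewpoint varies smoothly with vertex position, the resulting image has no single center of projection, which is the kind of predistortion that divergent projector rays require.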
A second advantage of our approach is that it does not require precise hardware align-
ment. Small misalignments of the screen do not cause noticeable geometric or chromatic
artifacts, as the screen is made from a uniform anisotropic material. This is a major
advantage over traditional lenticular systems that require precise alignment between the
optical lens array and the source pixels. Similarly, system calibrations are more robust.
Unless the systems are physically moved, calibrations can remain valid for months or
years. The spinning mirror display has run for over 5 years without the need for a new
geometric calibration. The projector array requires more frequent calibrations as the lens
focus slowly drifts due to temperature fluctuations.
All our designs support 3D viewing over a wide field of view. The ability to naturally move around the display and see many different perspectives of the scene creates a strong sense of reality and presence. For scenes with large discontinuities, motion parallax serves as a major depth cue. Wide fields of view also allow multiple users to interact simultaneously with the same scene and each other. Wide field-of-view displays will represent a new challenge for content creation. Most rules for stereo film-making are based around the idea of a controlled viewer, where even multiple audience members see the same perspective. The user experience with automultiscopic displays is more similar to theater-in-the-round staging where the actors and director must take many different
viewpoints into consideration.
Finally, we proposed a novel hybrid solution for vertical parallax. Traditionally, full motion parallax comes at a high cost, as the total pixel count needed by a display is directly proportional to the number of views. A 3D display with 10 horizontal views
requires 10 times the number of pixels as a similar 2D display, but a display with 10
horizontal and vertical views requires 100 times the pixels. As a result, most autostereoscopic displays are limited to horizontal parallax only, where the image does not change as the viewer changes height. This is a reasonable tradeoff, as human movement is dominated by horizontal motion. Yet, it is desirable to still handle multiple users with varying
physical heights, and changes in vertical parallax as users approach the display, jump,
or crouch. We solved this problem by incorporating face tracking to generate custom
vertical parallax for multiple users. We observed that each horizontal user position sees
a separate view from a dierent subset of display pixels. We then identify and render
each subset of pixels with customized vertical perspective. In Chapter 4, we solved for
one user height per projector frame, and in Chapter 5, we extended the method to handle
multiple heights within each rendered frame. We smoothly interpolate between viewer
heights while handling users approaching and leaving the display area. This is an efficient method that minimizes rendering of unused viewer heights while supporting instantaneous
horizontal stereo and motion parallax.
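In the spirit of this interpolation, the sketch below blends tracked viewer heights across horizontal viewing angles: each rendered view angle receives a height that is a smooth, distance-weighted blend of nearby users, falling back to a default height when no user is close. The Gaussian weighting, its angular width, and the default height are illustrative assumptions, not the exact scheme used in Chapters 4 and 5.

import numpy as np

DEFAULT_HEIGHT = 1.65   # meters, assumed nominal eye height
SIGMA = 10.0            # angular falloff of each user's influence (degrees), assumed

def height_for_angle(view_angle_deg, users):
    # users: list of (angle_deg, height_m) pairs from the face tracker.
    if not users:
        return DEFAULT_HEIGHT
    angles = np.array([u[0] for u in users])
    heights = np.array([u[1] for u in users])
    w = np.exp(-0.5 * ((view_angle_deg - angles) / SIGMA) ** 2)
    # Blend toward the default height as all weights fall off, so users
    # smoothly appear and disappear as they approach or leave the display.
    w_default = max(0.0, 1.0 - w.sum())
    return float((w @ heights + w_default * DEFAULT_HEIGHT) / (w.sum() + w_default))

# Example: two tracked users, one tall at -20 degrees and one crouching at +35 degrees
users = [(-20.0, 1.80), (35.0, 1.10)]
for a in (-30, -20, 0, 35, 60):
    print(a, round(height_for_angle(a, users), 3))

View angles between or beyond the tracked users ease back toward the default height, so horizontal stereo and motion parallax remain continuous as people move.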
Anisotropic spinning mirror and projector array displays each have distinct advan-
tages. Spinning mirror displays (Chapters 3 and 4) require fewer components: a single
high-speed projector, computer, and synchronized motor. The system can also easily sup-
port full 360
viewing. Displays based on projector arrays can be scaled to much larger
sizes with higher frame rates and color depth, though with greater cost due to the much
larger number of projectors and computers.
The success of automultiscopic 3D displays will be contingent on the development of
meaningful applications. In this thesis, we particularly focused on interpersonal com-
munication. In Chapter 4, we explored 3D telecommunication. Telecommunication will
continue to have greater importance as our social and business connections span growing
distances. Our 3D teleconferencing system creates a sense of remote presence by trans-
mitting the face of a participant to an audience gathered around the 3D display while
maintaining accurate cues for gaze, attention, and eye contact. The ability to receive
and direct eye gaze is particularly important for larger conversations involving more than
two people, and is not possible using existing 2D systems. In Chapter 5, we introduced
time-offset conversations based on prerecorded interviews. This new form of communi-
cation will allow speakers to interact with larger audiences across space and time. We
envision a wide range of conversations with politicians, celebrities, and academics for both
entertainment and education. Our first results help preserve the testimony of Holocaust
survivors, and allow future generations to meet with and speak to survivors face to face.
3D displays such as these should become increasingly practical in the years to come
as the core graphics and image projection components decrease in price and increase in
capabilities. My hope is that these contributions will enable other researchers in com-
puter graphics and immersive displays to develop new 3D technology and content. As
automultiscopic displays become more commonplace, the focus should be on developing
content and experiences that will connect people to each other and the world.
BIBLIOGRAPHY
[1] Microsoft Research Holoportation, 2016 (accessed: 2016-4-29). http://research.microsoft.com/en-us/projects/holoportation/.
[2] OpenVR GitHub repository, 2016 (accessed: 2016-6-30). https://github.com/ValveSoftware/openvr.
[3] Kurt Akeley, Simon J. Watt, Ahna Reza Girshick, and Martin S. Banks. A stereo display prototype with multiple focal distances. ACM Transactions on Graphics, 23(3):804–813, August 2004.
[4] Michael Argyle and Mark Cook. Gaze and Mutual Gaze. Cambridge University Press, London, 1976.
[5] Ron Artstein, Anton Leuski, Heather Maio, Tomer Mor-Barak, Carla Gordon, and David Traum. How Many Utterances Are Needed to Support Time-Offset Interaction? In Proceedings of FLAIRS 28, pages 144–149, Hollywood, FL, May 2015. AAAI Press.
[6] Robert G. Batchko. Three-hundred-sixty degree electro-holographic stereogram and volumetric display system. In Proc. SPIE, volume 2176, pages 30–41, 1994.
[7] David Brewster. The Stereoscope; its History, Theory, and Construction, with its Application to the fine and useful Arts and to Education. John Murray, 1856.
[8] Jin-Xiang Chai, Xin Tong, Shing-Chow Chan, and Heung-Yeung Shum. Plenoptic sampling. In Proceedings of ACM SIGGRAPH 2000, Computer Graphics Proceedings, Annual Conference Series, pages 307–318, July 2000.
[9] Milton Chen. Leveraging the asymmetric sensitivity of eye contact for videoconference. In CHI '02: Proceedings of the SIGCHI conference on Human factors in computing systems, pages 49–56, New York, NY, USA, 2002. ACM.
[10] Renjie Chen, Andrew Maimone, Henry Fuchs, Ramesh Raskar, and Gordon Wetzstein. Wide field of view compressive light field display using a multilayer architecture and tracked viewers. Journal of the Society for Information Display, 22(10):525–534, 2014.
[11] Alvaro Collet, Ming Chuang, Pat Sweeney, Don Gillett, Dennis Evseev, David Calabrese, Hugues Hoppe, Adam Kirk, and Steve Sullivan. High-quality streamable free-viewpoint video. ACM Trans. Graph., 34(4):69:1–69:13, July 2015.
[12] Oliver Cossairt, Adrian R. Travis, Christian Moller, and Stephen A. Benton. Novel view sequential display based on DMD technology. In Andrew J. Woods, John O. Merritt, Stephen A. Benton, and Mark T. Bolas, editors, Proc. SPIE, Stereoscopic Displays and Virtual Reality Systems XI, volume 5291, pages 273–278, May 2004.
[13] Oliver S. Cossairt and Joshua Napoli. Radial multiview 3-dimensional displays. United States Patent Application 2005/0180007, Aug 2005.
[14] Oliver S. Cossairt, Joshua Napoli, Samuel L. Hill, Rick K. Dorval, and Gregg E. Favalora. Occlusion-capable multiview volumetric three-dimensional display. Applied Optics, 46(8):1244–1250, Mar 2007.
[15] Carolina Cruz-Neira, Daniel J. Sandin, and Thomas A. DeFanti. Surround-screen projection-based virtual reality: the design and implementation of the CAVE. In Proceedings of the 20th annual conference on Computer graphics and interactive techniques, SIGGRAPH '93, pages 135–142, New York, NY, USA, 1993. ACM.
[16] James E. Cutting and Peter Vishton. Perceiving layout and knowing distances: The integration, relative potency, and contextual use of different information about depth. pages 69–177, 1995.
[17] Doug DeCarlo, Adam Finkelstein, Szymon Rusinkiewicz, and Anthony Santella. Suggestive contours for conveying shape. ACM Transactions on Graphics, 22(3):848–855, July 2003.
[18] Neil A. Dodgson. Autostereoscopic 3D displays. Computer, 38(8):31–36, 2005.
[19] Julie O'B. Dorsey, François X. Sillion, and Donald P. Greenberg. Design and simulation of opera lighting and projection effects. In Computer Graphics (Proceedings of SIGGRAPH 91), pages 41–50, July 1991.
[20] Per Einarsson, Charles-Felix Chabert, Andrew Jones, Wan-Chun Ma, Bruce Lamond, Tim Hawkins, Mark Bolas, Sebastian Sylwan, and Paul Debevec. Relighting human locomotion with flowed reflectance fields. In Rendering Techniques 2006: 17th Eurographics Symposium on Rendering, pages 183–194, June 2006.
[21] Tomohiro Endo, Yoshihiro Kajiki, Toshio Honda, and Makoto Sato. Cylindrical 3D video display observable from all directions. In 8th Pacific Conference on Computer Graphics and Applications, pages 300–306, October 2000.
[22] Gregg E. Favalora. Volumetric 3D displays and application infrastructure. Computer, 38(8):37–44, 2005.
[23] David A. Forsyth and Jean Ponce. Computer Vision: A Modern Approach, chapter 3, page 45. Prentice Hall, 2002.
[24] Jim Gemmell, Kentaro Toyama, C. Lawrence Zitnick, Thomas Kang, and Steven Seitz. Gaze awareness for video-conferencing: a software approach. Multimedia, IEEE, 7(4):26–35, Oct-Dec 2000.
[25] Steven J. Gortler, Radek Grzeszczuk, Richard Szeliski, and Michael F. Cohen. The lumigraph. In Proceedings of SIGGRAPH 96, Computer Graphics Proceedings, Annual Conference Series, pages 43–54, August 1996.
[26] David M. Grayson and Andrew F. Monk. Are you looking at me? Eye contact and desktop video conferencing. ACM Trans. Comput.-Hum. Interact., 10(3):221–243, 2003.
[27] Markus Gross, Stephan Würmlin, Martin Naef, Edouard Lamboray, Christian Spagno, Andreas Kunz, Esther Koller-Meier, Thomas Svoboda, Luc Van Gool, Silke Lang, Kai Strehlke, Andrew Van de Moere, and Oliver Staadt. blue-c: A spatially immersive display and 3D video portal for telepresence. ACM Transactions on Graphics, 22(3):819–827, July 2003.
[28] Joakim Gustafson, Nikolaj Lindberg, and Magnus Lundeberg. The August spoken dialogue system. In Proceedings of Eurospeech'99, pages 1151–1154, 1999.
[29] Michael W. Halle, Stephen A. Benton, Michael A. Klug, and John S. Underkoffler. The ultragram: A generalized holographic stereogram. In Proceedings of the SPIE Practical Holography V, 1991.
[30] Arno Hartholt, David Traum, Stacy C. Marsella, Ari Shapiro, Giota Stratou, Anton Leuski, Louis-Philippe Morency, and Jonathan Gratch. All Together Now: Introducing the Virtual Human Toolkit. In 13th International Conference on Intelligent Virtual Agents, Edinburgh, UK, August 2013.
[31] Tim Hawkins, Andreas Wenger, Chris Tchou, Andrew Gardner, Fredrik Goransson, and Paul Debevec. Animatable facial reflectance fields. In Eurographics Symposium on Rendering: 15th Eurographics Workshop on Rendering, June 2004.
[32] Felix Heide, Douglas Lanman, Dikpal Reddy, Jan Kautz, Kari Pulli, and David Luebke. Cascaded displays: Spatiotemporal superresolution using offset pixel layers. ACM Transactions on Graphics, 33(4):60:1–60:11, July 2014.
[33] Matthew Hirsch, Douglas Lanman, Gordon Wetzstein, and Ramesh Raskar. Construction and calibration of optically efficient LCD-based multi-layer light field displays. In Journal of Physics: Conference Series, volume 415, page 012071. IOP Publishing, 2013.
[34] Matthew Hirsch, Gordon Wetzstein, and Ramesh Raskar. A compressive light field projection system. ACM Transactions on Graphics (TOG), 33(4):58, 2014.
[35] Xianyou Hou, Li-Yi Wei, Heung-Yeung Shum, and Baining Guo. Real-time multi-perspective rendering on graphics hardware. In Rendering Techniques 2006: 17th Eurographics Workshop on Rendering, pages 93–102, June 2006.
[36] Aaron Isaksen, Leonard McMillan, and Steven J. Gortler. Dynamically reparameterized light fields. In Proceedings of ACM SIGGRAPH 2000, Computer Graphics Proceedings, Annual Conference Series, pages 297–306, July 2000.
[37] Hiroshi Ishii, Minoru Kobayashi, and Jonathan Grudin. Integration of interpersonal space and shared workspace: Clearboard design and experiments. ACM Trans. Inf. Syst., 11(4):349–375, 1993.
[38] Herbert E. Ives. The projection of parallax panoramagrams. Journal of the Optical Society of America, 21(7):397–409, 1931.
[39] Jason Jerald and Mike Daily. Eye gaze correction for videoconferencing. In ETRA '02: Proceedings of the 2002 symposium on Eye tracking research & applications, pages 77–81, New York, NY, USA, 2002. ACM.
[40] Andrew Jones, Paul Debevec, Mark Bolas, and Ian McDowall. Concave surround optics for rapid multiview imaging. In Proceedings of the 25th Army Science Conference, 2006.
[41] Andrew Jones, Andrew Gardner, Mark Bolas, Ian McDowall, and Paul Debevec. Performance geometry capture for spatially varying relighting. In 3rd European Conference on Visual Media Production (CVMP 2006), pages 127–133, Nov 2006.
[42] Andrew Jones, Magnus Lang, Graham Fyffe, Xueming Yu, Jay Busch, Ian McDowall, Mark Bolas, and Paul Debevec. Achieving eye contact in a one-to-many 3D video teleconferencing system. In ACM Transactions on Graphics (TOG), volume 28, page 64. ACM, 2009.
[43] Andrew Jones, Ian McDowall, Hideshi Yamada, Mark Bolas, and Paul Debevec. Rendering for an interactive 360° light field display. ACM Transactions on Graphics (TOG), 26(3):40, 2007.
[44] Andrew Jones, Koki Nagano, Jing Liu, Jay Busch, Xueming Yu, Mark Bolas, and Paul Debevec. Interpolating vertical parallax for an autostereoscopic 3D projector array. volume 9011, 2014.
[45] Joel Jurik, Andrew Jones, Mark Bolas, and Paul Debevec. Prototyping a light field display involving direct observation of a video projector array. In Computer Vision and Pattern Recognition Workshops (CVPRW), 2011 IEEE Computer Society Conference on, pages 15–20. IEEE, 2011.
[46] James T. Kajiya and Timothy L. Kay. Rendering fur with three dimensional textures. In Computer Graphics (Proceedings of SIGGRAPH 89), pages 271–280, July 1989.
[47] Masahiro Kawakita, Shoichiro Iwasawa, M. Sakai, Y. Haino, M. Sato, and Naomi Inoue. 3D image quality of 200-inch glasses-free 3D display system. SPIE Stereoscopic Displays and Applications XXIII, 8288, 2012.
[48] Karen Kim. New dimensions in testimony: Findings from student pilots. Technical report, USC Shoah Foundation, August 2015.
[49] Hidei Kimura, Taro Uchiyama, and Hiroyuki Yoshikawa. Laser produced 3D display in the air. In ACM SIGGRAPH 2006 Emerging Technologies, SIGGRAPH '06, New York, NY, USA, 2006. ACM.
[50] Peter Tamas Kovacs and Frederik Zilly. 3D capturing using multi-camera rigs, real-time depth estimation and depth-based content creation for multi-view and light-field auto-stereoscopic displays. In ACM SIGGRAPH 2012 Emerging Technologies, SIGGRAPH '12, pages 1:1–1:1, New York, NY, USA, 2012. ACM.
[51] Wolfgang Krüger, Christian-A. Bohn, Bernd Fröhlich, Heinrich Schüth, Wolfgang Strauss, and Gerold Wesche. The responsive workbench. IEEE Computer Graphics and Applications, 14:12–15, 1994.
[52] Claudia Kuster, Tiberiu Popa, Jean-Charles Bazin, Craig Gotsman, and Markus Gross. Gaze correction for home video conferencing. ACM Trans. Graph. (Proc. of ACM SIGGRAPH ASIA), 31(6):to appear, 2012.
[53] D. Lanman, G. Wetzstein, M. Hirsch, and R. Raskar. Depth of field analysis for multilayer automultiscopic displays. In Journal of Physics: Conference Series, volume 415, page 012036. IOP Publishing, 2013.
[54] Douglas Lanman, Matthew Hirsch, Yunhee Kim, and Ramesh Raskar. Content-adaptive parallax barriers: Optimizing dual-layer 3D displays using low-rank light field factorization. ACM Trans. Graph. (SIGGRAPH Asia 2010), 29(6):163:1–163:10, December 2010.
[55] Douglas Lanman and David Luebke. Near-eye light field displays. ACM Transactions on Graphics, 32(6):220:1–220:10, November 2013.
[56] Douglas Lanman, Gordon Wetzstein, Matthew Hirsch, Wolfgang Heidrich, and Ramesh Raskar. Polarization fields: dynamic light field display using multi-layer LCDs. ACM Transactions on Graphics (TOG), 30(6):186, 2011.
[57] Douglas Lanman, Gordon Wetzstein, Matthew Hirsch, Wolfgang Heidrich, and Ramesh Raskar. Beyond parallax barriers: applying formal optimization methods to multilayer automultiscopic displays. In IS&T/SPIE Electronic Imaging, pages 82880A–82880A. International Society for Optics and Photonics, 2012.
[58] Stephane Laveau and Olivier Faugeras. 3-D scene representation as a collection of images. In Proceedings of 12th International Conference on Pattern Recognition, volume 1, pages 689–691, 1994.
[59] Anton Leuski, Jarrell Pair, David Traum, Peter J. McNerney, Panayiotis G. Georgiou, and Ronakkumar Patel. How to Talk to a Hologram. In Ernest Edmonds, Doug Riecken, Cécile L. Paris, and Candace L. Sidner, editors, Proceedings of the 11th International Conference on Intelligent User Interfaces, pages 360–362, Sydney, Australia, 2006. ACM Press New York, NY, USA.
[60] Anton Leuski and David Traum. NPCEditor: Creating Virtual Human Dialogue Using Information Retrieval Techniques. AI Magazine, 32(2):42–56, July 2011.
[61] Marc Levoy and Patrick M. Hanrahan. Light field rendering. In Proceedings of ACM SIGGRAPH 96, Computer Graphics Proceedings, Annual Conference Series, pages 31–42, 1996.
[62] Hao Li, Laura Trutoiu, Kyle Olszewski, Lingyu Wei, Tristan Trutna, Pei-Lun Hsieh, Aaron Nicholls, and Chongyang Ma. Facial performance sensing head-mounted display. ACM Transactions on Graphics (Proceedings SIGGRAPH 2015), 34(4), July 2015.
[63] Jin Liu, Ion Paul Beldie, and Matthias Wöpking. A computational approach to establish eye-contact in videocommunication. In Int. Workshop on Stereoscopic and Three Dimensional Imaging, pages 229–234, 1995.
[64] Hiroyuki Maeda, Kazuhiko Hirose, Jun Yamashita, Koichi Hirota, and Michitaka Hirose. All-around display for video avatar in real world. In ISMAR '03: Proceedings of the 2nd IEEE and ACM International Symposium on Mixed and Augmented Reality, page 288, Washington, DC, USA, 2003. IEEE Computer Society.
[65] Andrew Maimone, Gordon Wetzstein, Matthew Hirsch, Douglas Lanman, Ramesh Raskar, and Henry Fuchs. Focus 3D: Compressive accommodation display. ACM Trans. Graph., 32(5):153–1, 2013.
[66] Donald Marinelli and Scott Stevens. Synthetic interviews: The art of creating a "dyad" between humans and machine-based characters. In Proceedings of the Sixth ACM International Conference on Multimedia: Technologies for Interactive Movies, MULTIMEDIA '98, pages 11–16, New York, NY, USA, 1998. ACM.
[67] Katsutoshi Masai, Yuta Sugiura, Katsuhiro Suzuki, Sho Shimamura, Kai Kunze, Masa Ogata, Masahiko Inami, and Maki Sugimoto. AffectiveWear: Towards recognizing affect in real life. In Adjunct Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing and Proceedings of the 2015 ACM International Symposium on Wearable Computers, UbiComp/ISWC'15 Adjunct, pages 357–360, New York, NY, USA, 2015. ACM.
[68] W. Matusik and H. Pfister. 3D TV: a scalable system for real-time acquisition, transmission, and autostereoscopic display of dynamic scenes. In ACM Transactions on Graphics (TOG), volume 23, pages 814–824. ACM, 2004.
[69] Ian McDowall and Mark Bolas. Display, sensing, and control applications for digital micromirror displays. In IEEE VR 2005 - Emerging Display Technologies, pages 35–36, 2005.
[70] Wilbert J. McKeachie et al. Teaching and learning in the college classroom: A review of the research literature (1986) and November 1987 supplement. 1987.
[71] Leonard McMillan and Gary Bishop. Plenoptic modeling: An image-based rendering system. In Proceedings of SIGGRAPH 95, Computer Graphics Proceedings, Annual Conference Series, pages 39–46, August 1995.
[72] Gavin S. P. Miller, Steven Rubin, and Dulce Ponceleon. Lazy decompression of surface light fields for precomputed global illumination. Eurographics Rendering Workshop 1998, pages 281–292, June 1998.
[73] James R. Miller. Geometric approaches to nonplanar quadric surface intersection curves. ACM Transactions on Graphics, 6(4):274–307, October 1987.
[74] Benjamin Mora, Ross Maciejewski, Min Chen, and David S. Ebert. Visualization and computer graphics on isotropically emissive volumetric displays. IEEE Transactions on Visualization and Computer Graphics, 15(2):221–234, 2009.
[75] Fabrizio Morbini, Kartik Audhkhasi, Kenji Sagae, Ron Artstein, Dogan Can, Panayiotis G. Georgiou, Shri Narayanan, Anton Leuski, and David Traum. Which ASR should I choose for my dialogue system? In SIGDIAL, Metz, France, August 2013.
[76] Errol Morris. The Fog of War: 13 questions and answers on the filmmaking of Errol Morris. FLM Magazine, 2004. http://www.errolmorris.com/content/eyecontact/interrotron.html.
[77] L. Muhlbach, M. Bocker, and A. Prussog. Telepresence in videocommunications: a study on stereoscopy and individual eye contact. Human Factors, 37(2):290–305, 1995.
[78] Naoki Mukawa, Tsugumi Oka, Kumiko Arai, and Masahide Yuasa. What is connected by mutual gaze?: user's behavior in video-mediated communication. In CHI '05: CHI '05 extended abstracts on Human factors in computing systems, pages 1677–1680, New York, NY, USA, 2005. ACM.
[79] Shree K. Nayar and Vijay N. Anand. 3D display using passive optical scatterers. Computer, 40(7):54–63, 2007.
[80] Shree K. Nayar, Gurunandan Krishnan, Michael D. Grossberg, and Ramesh Raskar. Fast separation of direct and global components of a scene using high frequency illumination. ACM Transactions on Graphics, 25(3):935–944, July 2006.
[81] David Nguyen and John Canny. Multiview: spatially faithful group video conferencing. In CHI '05: Proceedings of the SIGCHI conference on Human factors in computing systems, pages 799–808, New York, NY, USA, 2005. ACM.
[82] David T. Nguyen and John Canny. Multiview: improving trust in group video conferencing through spatial faithfulness. In CHI '07: Proceedings of the SIGCHI conference on Human factors in computing systems, pages 1465–1474, New York, NY, USA, 2007. ACM.
[83] Yoichi Ochiai, Takayuki Hoshi, and Jun Rekimoto. Pixie dust: Graphics generated by levitated and animated objects in computational acoustic-potential field. ACM Trans. Graph., 33(4):85:1–85:13, July 2014.
[84] Yoichi Ochiai, Kota Kumagai, Takayuki Hoshi, Jun Rekimoto, Satoshi Hasegawa, and Yoshio Hayasaki. Fairy lights in femtoseconds: Aerial and volumetric graphics rendered by focused femtosecond laser combined with computational holographic fields. ACM Trans. Graph., 35(2):17:1–17:14, February 2016.
[85] Victor Ostromoukhov. A simple and efficient error-diffusion algorithm. In Proceedings of ACM SIGGRAPH 2001, Computer Graphics Proceedings, Annual Conference Series, pages 567–572, August 2001.
[86] Rieko Otsuka, Takeshi Hoshino, and Youichi Horry. Transpost: A novel approach to the display and transmission of 360 degrees-viewable 3D solid images. IEEE Transactions on Visualization and Computer Graphics, 12(2):178–185, 2006.
[87] Maximilian Ott, John P. Lewis, and Ingemar Cox. Teleconferencing eye contact using a virtual camera. In CHI '93: INTERACT '93 and CHI '93 conference companion on Human factors in computing systems, pages 109–110, New York, NY, USA, 1993. ACM.
[88] Ken Perlin, Salvatore Paxia, and Joel S. Kollin. An autostereoscopic display. In Proceedings of ACM SIGGRAPH 2000, Computer Graphics Proceedings, Annual Conference Series, pages 319–326, July 2000.
[89] Pierre Poulin and Alain Fournier. A model for anisotropic reflection. In Computer Graphics (Proceedings of SIGGRAPH 90), pages 273–282, August 1990.
[90] A. Prussog, L. Muhlbach, and M. Bocker. Telepresence in videocommunications. In Proceedings of the Human Factors and Ergonomics Society 38th Annual Meeting, volume 1, pages 180–184, Santa Monica, CA, USA, 1994. Human Factors and Ergonomics Society.
[91] B. Quante and L. Muhlbach. Eye-contact in multipoint videoconferencing. In Proceedings of the 17th International Symposium on Human Factors in Telecommunication, 1999.
[92] Paul Rademacher and Gary Bishop. Multiple-center-of-projection images. In Proceedings of ACM SIGGRAPH 98, Computer Graphics Proceedings, Annual Conference Series, pages 199–206, July 1998.
[93] Ramesh Raskar, Greg Welch, Matt Cutts, Adam Lake, Lev Stesin, and Henry Fuchs. The office of the future: A unified approach to image-based modeling and spatially immersive displays. In Proceedings of SIGGRAPH 98, Computer Graphics Proceedings, Annual Conference Series, pages 179–188, July 1998.
[94] Jeremy Rees. Critics pan CNN's fake election holograms. New Zealand Herald, Nov 7, 2008.
[95] Tomas Rodriguez, Adolfo Cabo de Leon, Bruno Uzzan, Nicolas Livet, Edmond Boyer, Florian Geray, Tibor Balogh, Zoltan Megyesi, and Attila Barsi. Holographic and action capture techniques. In ACM SIGGRAPH 2007 emerging technologies, SIGGRAPH '07, New York, NY, USA, 2007. ACM.
[96] D. A. D. Rose and P. M. Clarke. A review of eye-to-eye videoconferencing techniques. BT technology journal, 13(4):127–131, 1995.
[97] Steven M. Seitz, Brian Curless, James Diebel, Daniel Scharstein, and Richard Szeliski. A comparison and evaluation of multi-view stereo reconstruction algorithms. In 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06), volume 1, pages 519–528, June 2006.
[98] Steven M. Seitz and Charles R. Dyer. View morphing: Synthesizing 3D metamorphoses using image transforms. In Proceedings of SIGGRAPH 96, Computer Graphics Proceedings, Annual Conference Series, pages 21–30, August 1996.
[99] Abigail J. Sellen. Remote conversations: The effects of mediating talk with technology. Human Computer Interaction, 10:401–444, 1995.
[100] Eric Sloss and Anne Watzman. Carnegie Mellon's Entertainment Technology Center conjures up Benjamin Franklin's ghost. Press release, Carnegie Mellon Media Relations, June 28, 2005.
[101] Jim Steinmeyer. The Science behind the Ghost: A Brief History of Pepper's Ghost. Hahne, 1999.
[102] Alan Sullivan. A solid-state multi-planar volumetric display. SID Symposium Digest of Technical Papers, 32(1):1531–1533, May 2003.
[103] William Swartout, David Traum, Ron Artstein, Dan Noren, Paul Debevec, Kerry Bronnenkant, Josh Williams, Anton Leuski, Shrikanth Narayanan, Diane Piepol, H. Chad Lane, Jacquelyn Morie, Priti Aggarwal, Matt Liewer, Jen-Yuan Chiang, Jillian Gerten, Selina Chu, and Kyle White. Ada and Grace: Toward Realistic and Engaging Virtual Museum Guides. In Proceedings of the 10th International Conference on Intelligent Virtual Agents (IVA 2010), Philadelphia, PA, September 2010.
[104] Yuichi Taguchi, Takafumi Koike, Keita Takahashi, and Takeshi Naemura. TransCAIP: A live 3D TV system using a camera array and an integral photography display with interactive control of viewing parameters. Accepted to IEEE Transactions on Visualization and Computer Graphics, 2009.
[105] Kenji Tanaka and Soko Aoki. A method for the real-time construction of a full parallax light field. In A. J. Woods, N. A. Dodgson, J. O. Merritt, M. T. Bolas, and I. E. McDowall, editors, Stereoscopic Displays and Virtual Reality Systems XIII, Proc. SPIE, volume 6055, pages 397–407, February 2006.
[106] David Traum, Priti Aggarwal, Ron Artstein, Susan Foutz, Jillian Gerten, Athanasios Katsamanis, Anton Leuski, Dan Noren, and William Swartout. Ada and Grace: Direct Interaction with Museum Visitors. In Yukiko Nakano, Michael Neff, Ana Paiva, and Marilyn Walker, editors, The 12th International Conference on Intelligent Virtual Agents (IVA), volume 7502 of Lecture Notes in Artificial Intelligence, pages 245–251, Santa Cruz, CA, September 2012.
[107] David Traum, Andrew Jones, Kia Hays, Heather Maio, Oleg Alexander, Ron Artstein, Paul Debevec, Alesia Gainer, Kallirroi Georgila, Kathleen Haase, Karen Jungblut, Anton Leuski, Stephen Smith, and William Swartout. New Dimensions in Testimony: Digitally Preserving a Holocaust Survivor's Interactive Storytelling. In Interactive Storytelling: 8th International Conference on Interactive Digital Storytelling, ICIDS 2015, Copenhagen, Denmark, November 30 - December 4, 2015, Proceedings, pages 269–281. Springer International Publishing, Cham, 2015.
[108] Adrian R. L. Travis. The display of three-dimensional video images. Proceedings of the IEEE, 85(11):1817–1832, Nov 1997.
[109] Roel Vertegaal. The gaze groupware system: mediating joint attention in multiparty communication and collaboration. In CHI '99: Proceedings of the SIGCHI conference on Human factors in computing systems, pages 294–301, New York, NY, USA, 1999. ACM.
[110] Paul Viola and Michael J. Jones. Robust real-time face detection. International Journal of Computer Vision, 57(2):137–154, 2004.
[111] Colin Ware, Kevin Arthur, and Kellogg S. Booth. Fish tank virtual reality. In Proceedings of the INTERACT '93 and CHI '93 Conference on Human Factors in Computing Systems, CHI '93, pages 37–42, New York, NY, USA, 1993. ACM.
[112] Manuel Werlberger, Thomas Pock, and Horst Bischof. Motion estimation with non-local total variation regularization. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), San Francisco, CA, USA, June 2010.
[113] G. Wetzstein, D. Lanman, M. Hirsch, and R. Raskar. Tensor displays: compressive light field synthesis using multilayer displays with directional backlighting. ACM Transactions on Graphics (TOG), 31(4):80, 2012.
[114] G. Wetzstein, D. Lanman, M. Hirsch, and R. Raskar. Real-time image generation for compressive light field displays. In Journal of Physics: Conference Series, volume 415, page 012045. IOP Publishing, 2013.
[115] Gordon Wetzstein, Douglas Lanman, Wolfgang Heidrich, and Ramesh Raskar. Layered 3D: tomographic image synthesis for attenuation-based light field and high dynamic range displays. In ACM Transactions on Graphics (TOG), volume 30, page 95. ACM, 2011.
[116] Gordon Wetzstein, Douglas Lanman, Matthew Hirsch, Wolfgang Heidrich, and Ramesh Raskar. Compressive light field displays. Computer Graphics and Applications, IEEE, 32(5):6–11, 2012.
[117] Bennett Wilburn, Neel Joshi, Vaibhav Vaish, Eino-Ville Talvala, Emilio Antunez, Adam Barth, Andrew Adams, Mark Horowitz, and Marc Levoy. High performance imaging using large camera arrays. ACM Transactions on Graphics, 24(3):765–776, Aug 2005.
[118] Lance Williams and Eric Chen. View interpolation for image synthesis. In SIGGRAPH 93, 1993.
[119] Eric Woods, Paul Mason, and Mark Billinghurst. MagicMouse: an inexpensive 6-degree-of-freedom mouse. In GRAPHITE 2003, pages 285–286, 2003.
[120] Jason C. Yang, Matthew Everett, Chris Buehler, and Leonard McMillan. A real-time distributed light field camera. In Rendering Techniques 2002: 13th Eurographics Workshop on Rendering, pages 77–86, June 2002.
[121] Ruigang Yang and Zhengyou Zhang. Eye gaze correction with stereovision for video-teleconferencing. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 26(7):956–960, July 2004.
[122] Tomohiro Yendo, Naoki Kawakami, and Susumu Tachi. Seelinder: the cylindrical lightfield display. In SIGGRAPH '05: ACM SIGGRAPH 2005 Emerging technologies, page 16, New York, NY, USA, 2005. ACM Press.
[123] Shunsuke Yoshida, Masahiro Kawakita, and Hiroshi Ando. Light-field generation by several screen types for glasses-free tabletop 3D display. In 3DTV Conference: The True Vision-Capture, Transmission and Display of 3D Video (3DTV-CON), 2011, pages 1–4. IEEE, 2011.
[124] Jingyi Yu, Leonard McMillan, and Steven Gortler. Scam light field rendering. In Pacific Graphics, Beijing, China, October 2002.
[125] Cha Zhang and Tsuhan Chen. A self-reconfigurable camera array. In Rendering Techniques 2004: 15th Eurographics Workshop on Rendering, pages 243–254, June 2004.
[126] Song Zhang and Peisen Huang. High-resolution, real-time three-dimensional shape measurement. Optical Engineering, 45(12), 2006.
[127] Zhengyou Zhang. A flexible new technique for camera calibration. IEEE Trans. Pattern Anal. Mach. Intell., 22(11):1330–1334, 2000.
[128] Matthias Zwicker, Wojciech Matusik, Fredo Durand, and Hanspeter Pfister. Antialiasing for automultiscopic 3D displays. In Eurographics Symposium on Rendering, 2006.
Abstract
While a great deal of computer generated imagery is modelled and rendered in three dimensions, the vast majority of this 3D imagery is shown on two-dimensional displays. Various forms of 3D displays have been contemplated and constructed for at least one hundred years, but only recent advances in digital capture, computation, and display have made functional and practical 3D displays possible. In this thesis, I propose several designs that overcome some of the classic limitations of 3D displays. The displays are: autostereoscopic, requiring no special viewing glasses