"Extendible Tracking: Dynamic Tracking Range Extension in Vision-Based Augmented Reality Tracking Systems"

by Jun Park

A Dissertation Presented to the Faculty of the Graduate School, University of Southern California, in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy (Computer Science)

December 2000

Acknowledgements

First of all, I would like to express my deepest appreciation to Ulrich Neumann, my academic advisor and mentor. He showed me how to set practical goals and the steps to achieve them one by one. He also taught me how to cooperate with other professionals and non-professionals, and how to survive in this competitive society. Many thanks to the qualifying exam and dissertation committee members, Ram Nevatia, Jerry Mendel, Ellis Horowitz, and Maja Mataric.
Their academic background and enthusiasm provided me with inspiration and guided me in a better direction. Thanks to my colleagues. Youngkwan Cho's broad knowledge was a great asset for me. Douglas Fidaleo, Ilmi Yoon, Chad Jenkins, Tae-yong Kim, Jun-yong Noh, Jongweon Lee, Clint Chua, and Joe Demers provided mutual inspiration and good memories. Many thanks to my parents for supporting me emotionally and spiritually on every occasion. More than anything else, I thank my heavenly Father for showing mercy to a wretch like me. He knows I cannot do without His full support.

Table of Contents

Acknowledgements
List of Figures
List of Tables
Abstract
1. Introduction
1.1. Definition of Augmented Reality
1.2. Applications of Augmented Reality
1.2.1. Manufacturing and Maintenance
1.2.2. Medicine
1.2.3. Entertainment and Broadcasting
1.2.4. Engineering and Consumer Design
1.2.5. Robotics and Telerobotics / Telemanipulation and Teleoperation
1.2.6. Personal Digital Assistant (PDA)
1.2.7. Military
1.3. An Ideal System
1.4. Registration System
2. Tracking Systems
2.1. Registration without Tracking
2.2. Registration by Tracking
2.3. Tracking Methods: Evaluation and Comparison
2.4. Vision-based Tracking
2.4.1. Structure from Motion (SfM)
2.4.2. Model-based Approach: Fiducial-based Tracking
3. Extendible Tracking
3.1. Problems of Extendible Tracking: Inaccurate Pose Computation and Outlier Existence
3.2. System Overview
4. New Feature Position Estimation
4.1. Initial Estimates
4.2. Recursive Filters
4.2.1. Extended Kalman Filter (EKF)
4.2.2. Recursive Average of Covariances (RAC) Filter
4.3. Calibration Result
5. Camera Pose Computation
5.1. Requirements
5.2. Pose Computation Methods
5.3. A New Method: RA3 (Robust Averages of 3-Point Solutions)
5.4. Pose Computation Experiments and Results
5.4.1. Different Noise Levels
5.4.2. Processing Different Quantity of Points
5.4.3. Managing Outliers
5.4.4. Sudden Camera Motion
5.4.5. Real Experiment
5.5. Chapter Summary and Discussion
6. Outlier Culling with Robust Statistics
6.1. Robust M-estimator
6.2. Real-time Approximation
6.3. Adapting into a Kalman Filter: "Use of a priori covariance"
6.4. Applications and Results
7. System Integration and Results
7.1. Integration with Natural Feature Tracking
7.2. RA3 with Dynamic Feature Calibration
8. Propagation Error Model of Extendible Tracking
8.1. Experiment Design
8.1.1. Grid Experiments
8.1.2. Extendible Tracking Experiments
8.2. Simulated Experiments
8.2.2. Uncertainty Threshold (tu)
8.2.3. Number of Initially Calibrated Features (nc)
8.2.4. Average Camera Depth (z), Feature Density (m), and Average Number of In-view Features (nv)
8.2.5. Average Number of In-view Features (nv)
8.2.6. Measurement Noise Level (σm)
8.3. Propagation Error Prediction Model
8.3.1. Summary of Parameter Effects
8.3.2. Propagation Error Equation
9. Conclusion and Future Work
Bibliography
Appendix A. Algorithm of the Perspective 4-Point Problem Based on Horaud's Method
A.1. Introduction
A.2. Matrix Ai (Transformation from the Image Space to the Camera Space)
A.3. Matrix Aj (Transformation from the Object Space to the Image Space)
A.3.1. Overview
A.3.2. Problems with Signs of Trigonometry
A.3.3. Calculation of φ, θ, and dx
A.4. Experiments and Result
A.5. Discussion

List of Figures

Fig. 1-1 An ideal AR system
Fig. 1-2 AR applications in manufacturing and maintenance
Fig. 1-3 AR applications in medicine
Fig. 1-4 AR applications in entertainment
Fig. 1-5 AR applications in engineering design and architecture
Fig. 1-6 AR applications in telerobotics and telemanipulation
Fig. 1-7 AR applications for personal information
Fig. 1-8 Common process of an AR system
Fig. 1-9 An Extendible Tracking System Overview
Fig. 2-1 Common process of an AR system
Fig. 2-2 Coordinate systems of the VE and the RE
Fig. 2-3 Categories of vision-based tracking
Fig. 3-1 Categories of tracking with dynamic calibration
Fig. 3-2 Synthetic experiment design for Extendible Tracking
Fig. 3-3 Propagated errors in dynamic tracking range extension experiment
Fig. 3-4 Accuracy test of 3-point pose method: Experiment sequence
Fig. 3-5 Accuracy test of 3-point pose method: Computed camera position error (RMS: inch)
Fig. 3-6 Effect of feature outliers in New Point Position Estimation
Fig. 3-7 An Extendible Tracking System Architecture
Fig. 4-1 Updating 3D feature position based on 2D measurement
Fig. 4-2 Initial estimates: intersection of two lines
Fig. 4-3 2D analogy of new algorithm of update direction in RAC
Fig. 4-4 Synthetic Data: Camera movement - Zooming
Fig. 4-5 Synthetic Data: Camera movement - Panning
Fig. 4-6 2D Analogy of Analytic Solution
Fig. 4-7 Real Data: Zooming and Panning cases
Fig. 4-8 Pan Case: Visualizations of new lines from moving camera views and their effect upon the uncertainty covariance ellipses
Fig. 4-9 Zoom Case: Visualizations of new lines from moving camera views and their effect upon the uncertainty covariance ellipses
Fig. 4-10 Virtual camera views of the results of real data experiments
Fig. 5-1 Accuracy test of 3-point pose method: Computed camera position error (vertical axis: RMS in inches)
Fig. 5-2 Linear Kalman Filter for temporal smoothing
Fig. 5-3 Improvement by Averaging
Fig. 5-4 Improvement by M-estimation
Fig. 5-5 Improvement by temporal smoothing with Kalman filtering
Fig. 5-6 Camera position and orientation errors for two measurement noise levels
Fig. 5-7 Tracking with different numbers of points
Fig. 5-8 Presence of outliers
Fig. 5-9 Stability under sudden camera motion: camera X coordinates
Fig. 5-10 Real environment and virtual object overlay for experiment with real data
Fig. 5-11 Re-projection errors in real data experiment
Fig. 6-1 Outlier boundaries using a priori covariance
Fig. 6-2 M-estimation with camera pose outliers
Fig. 6-3 M-estimation with feature measurement outliers
Fig. 7-1 Results of Feature Detection and Tracking
Fig. 7-2 Convergence of 3D positions of Natural Features
Fig. 7-3 Result of Virtual Object/Annotation Overlay
Fig. 7-4 Synthetic experiment setting for propagated errors in dynamic feature calibration
Fig. 7-5 Propagated camera position error with dynamic calibration
Fig. 8-1 Grid Experiment Design
Fig. 8-2 Parameters in Feature Placement
Fig. 8-3 Probability density of a random variable for feature placement
Fig. 8-4 Feature displacement range
Fig. 8-5 A Typical Camera Motion
Fig. 8-6 Scene length increase
Fig. 8-7 The Effect of Number of Frames
Fig. 8-8 The Effect of Uncertainty Threshold
Fig. 8-9 The Effect of the "Number of Initially Calibrated Features" in Grid Experiment
Fig. 8-10 The Effect of Number of Initially Calibrated Features
Fig. 8-11 The Effect of nv
Fig. 8-12 The Effect of Measurement Noise on Camera Position Error: Grid Experiment
Fig. 8-13 The Effect of Measurement Noise
Fig. 8-14 Predicted propagation error and range with 68% confidence
Fig. 8-15 Error Prediction Model
Fig. 9-1 Tracking area decomposition
Fig. 9-2 ID tag and features for localized tracking
Fig. A-1 Four points (M0, M1, M2, M3) converted into a pencil of 3 lines (L1, L2, L3)
Fig. A-2 Homogeneous Matrix Decomposition
Fig. A-3 Image space coordinate system
Fig. A-4 Object space coordinate system
Fig. A-5 Image space and Object space
Fig. A-6 Cases when the coefficient of (k'×Pj) is negative
Fig. A-7 Object space coordinate system
Fig. A-8 Calculation of depth
Fig. A-9 Camera position error: 4-point method vs. 3-point method

List of Tables

Table 2-1 Evaluation of tracking devices / methods
Table 2-2 More evaluation of tracking devices / methods
Table 3-1 Comparisons of vision-based tracking methods
Table 5-1 Improvement in averages and standard deviation in errors
Table 5-2 Pose feature projection accuracy
Table 8-1 The Effect of Parameters on Propagation Error
Table 8-2 Classification of Parameters on Propagation Error
Table A-1 Average and standard deviation of camera position errors
Table A-2 Average number of solutions

Abstract

Augmented Reality (AR) is an interface technology designed to increase the efficiency of the user's interaction with the real environment (RE) and the virtual environment (VE) by providing computer-generated information on the user's view of the real environment.
Virtual information can be text, rendered models of CAD data, or volume data reconstructed from the scans of medical devices. Vision-based tracking systems are widely used for AR because they do not require additional tracking hardware and they can provide accurate registration. However, for these tracking systems, the operating range is restricted to the areas where a minimum number of calibrated features are in view. Partial occlusion of the scene, even when the area of the user's interest is in view, may cause failures in tracking. Tracking over wide ranges can be achieved by dynamically and automatically calibrating unknown features as they are needed. The calibration is deferred until required, in the spirit of "lazy evaluation" in algorithms. With dynamic, automatic, and deferred calibration, a user does not have to predict the possible tracking area, placing and calibrating features ahead of time. Rather, the user starts the system by tracking with a small set of calibrated features. As the user expands to new areas, the system calibrates unknown features as they appear in view. Once calibrated, the features can be used as tracking primitives to compute the camera pose. Thus, the tracking range can be extended to unprepared environments. The use of natural feature tracking enables tracking range extension even to areas devoid of artificial fiducials, allowing for tracking and virtual object overlay in natural environments. Extending the tracking range to a wide area demands reduction of propagated errors, because camera-pose tracking and feature-position calibration in practice involve noise and errors. This thesis describes a robust extendible tracking system that removes or reduces such noise and errors.

1. Introduction

1.1. Definition of Augmented Reality

Augmented Reality is a variation or an extension of the Virtual Environment, in which users are allowed to see the real world with the VE superimposed upon or embedded in it. That is, AR supplements reality rather than completely replacing it with virtuality [Azuma 1995]. Using AR technology, users can thus interact with a mixture of the virtual world and the real world in a natural way with enhanced efficiency [FhG IGD]. The VE is based on various forms of information: visual (text annotation, rendered models of CAD data, or volume data from medical devices), audio, haptic, etc. Among these forms, visual information has been the primary research interest in AR.

Fig. 1-1 An ideal AR system

1.2. Applications of Augmented Reality

AR applications include, but are not restricted to, medicine, manufacturing / maintenance, entertainment, and engineering / consumer design.

1.2.1. Manufacturing and Maintenance

An AR system provides step-by-step instructions for unfamiliar equipment, relieving technicians from referring to several pages of a manual. The image of the equipment can be augmented with annotations and CAD information. Fig. 1-2 is an example of a step-by-step maintenance guide for an airplane solenoid test.
Researchers at Boeing have been developing an augmented reality prototype to reduce the time and effort spent making wiring harnesses for airplanes [Caudell 1992][Sims 1994][Curtis 1998]. The technicians are guided by an augmented display that shows the routing of the cables on a generic frame. The result of the pilot study was positive: the workers were able to build wire bundles after only a brief training session [Curtis 1998]. There is other AR work in manufacturing and maintenance applications [Uenohara 1995][Feiner 1993]. Eventually, AR is expected to be used for complicated machinery, such as automobile engines [Tuceryan 1995].

Fig. 1-2 AR applications in manufacturing and maintenance (annotation supporting a maintenance task, USC)

1.2.2. Medicine

One of the major AR applications in medicine is image-guided surgery. The internal anatomy of a patient can be visualized using 3D volume data reconstructed from CT or MRI scans. Using AR techniques, a surgeon can see where the internal structures are located from the surgeon's camera view. In this way, a surgeon has x-ray vision, a capability that is critical for minimally invasive surgeries [Grimson 1996] (Fig. 1-3a). A mockup of a breast biopsy operation has been built, in which virtual objects identify the location of the tumor to guide the needle to its target [State 1996b] (Fig. 1-3b). More research groups are investigating how to display medical data directly registered onto the patient's body [Mellor 1995][Lorensen 1993][Betting 1995][Edwards 1995][Taubes 1994].

Fig. 1-3 AR applications in medicine (a. X-ray vision for image-guided surgery, MIT; b. Mockup of breast tumor biopsy, with 3-D graphics guiding needle insertion, UNC Chapel Hill Dept. of Computer Science)

1.2.3. Entertainment and Broadcasting

A simple form of augmented view that has been used in entertainment is chroma keying, by which the real image is augmented with computer-generated images (e.g., maps in a weather report). A more advanced form can be found in virtual studios. A dynamic, computer-generated, 3D background replaces the static background of a studio. The foreground camera is free to move but must be tracked so that the background image can be generated in the camera's perspective. Because they are not performed in real time, special effects for films may not be considered AR [Simon 1998]. But AR technology has been (and is being) used for real-time broadcasting. Fox Sports used AR technology to add a glow to the hockey puck in real time for live broadcasting (Fig. 1-4). The purpose was to make it easier to follow the puck and to enhance the feeling of dynamics [Fox Trax 1997]. They installed infrared (IR) emitters in the hockey puck and pre-calibrated IR cameras to determine the position of the puck. Sensors installed in the broadcasting cameras were used to measure the angles of the cameras. Although Fox Trax failed to attract hockey fans (the fans' experience was diminished), it was one of the early AR products that technically succeeded. Another example is the Princeton Electronics Billboard, which allows broadcasters to insert advertisements into specific areas of the broadcast image. Thanks to this technology, local distributors can replace the original advertisements with the advertisements of local sponsors.
Fig. 1-4 AR applications in entertainment (added glow on the hockey puck using Fox Trax, © 1997 IEEE)

1.2.4. Engineering and Consumer Design

If equipped with AR systems, physically separated designers and clients may review a complex model in a shared augmented environment. The Mixed Reality Systems Laboratory seeks to build an AR prototype so that designers and clients can collaborate in a shared AR environment interactively and efficiently. They can interactively select and modify a specific part of the model. The TransVision system uses a palmtop video-see-through display to show a computer-generated 3D model superimposed on the real-world view. Two or more participants can share the same computer model as if it were real. This situation seems similar to shared VEs, but by taking the augmented reality approach, two or more users can see each other without any difficulty. Since the coordinate system of each participant is identical, actions such as pointing gestures are meaningful to all participants. With virtual reality, gestures as well as computer models have to be re-generated and re-rendered every frame. AR systems can also provide users with a real-world view augmented with a planned architectural structure (Fig. 1-5a). An interior architect designs, remodels, and visualizes a room using furniture models from a database that are superimposed on video images of the room (Fig. 1-5b). ECRC is developing an electronic apparel system (an electronic version of a clothing shop). Using this system, customers can look through an electronic catalog, put on clothes, and see how they fit.

Fig. 1-5 AR applications in engineering design and architecture (a. Views of the River Wear in Sunderland, Newcastle (U.K.), augmented with a planned Millennium Footbridge, Fraunhofer Institute for Computer Graphics; b. Interior design, Fraunhofer Institute for Computer Graphics)

1.2.5. Robotics and Telerobotics / Telemanipulation and Teleoperation

A remote robot is often difficult to manipulate directly, especially when the robot is far away and the communication delay is long. Under this circumstance, it is preferable to plan and test the robot actions using a virtual robot displayed on the scene of the RE. Once tested, the actions can be executed by the real robot as planned. Fig. 1-6a shows how planned actions can be displayed using virtual outlines (ARGOS [Milgram 1993][Rastogi 1996]). In telemanipulation or teletraining systems, an on-site worker / trainee may have questions answered by a remote expert through text, images, and dynamic manipulations of 3D virtual objects superimposed on shared images of the site environment [Park 1998b] (Fig. 1-6b). There are other telepresence systems that used image overlays [Kim 1996] [Rekimoto 1997] [Oyama 1993] [Tharp 1994] [Yoo 1993].

Fig. 1-6 AR applications in telerobotics and telemanipulation (a. Virtual lines show a planned motion of a robot arm, University of Toronto; b. Instruction for telemanipulation, USC)

1.2.6. Personal Digital Assistant (PDA)

Using a Personal Digital Assistant (PDA), the user's situation is automatically recognized so that the system can assist the user without direct instructions. The user's focus is not on the computer but on the real world.
The role of the system is to assist and enhance the user's interaction with the RE. Sony developed a friendly palmtop computer system (NaviCam) that has a small video camera to sense the RE. This system provides users with context-sensitive information about the RE. For example, a user in a research laboratory can learn what scientists are up to without disturbing them, or if a user wants to find a particular office, the NaviCam leads the way.

Fig. 1-7 AR applications for personal information (a. An AR system provides users with information about a painting in a museum, Sony; b. An AR system provides users with updated library information, Sony)

1.2.7. Military

The military has been using cockpit displays to provide information to pilots on the windshield of the cockpit or the visor of their helmets. SIMNET (a distributed war-games simulation system) also uses AR techniques to display the real battlefield scene augmented with annotation information or highlighting to emphasize hidden enemy units.

1.3. An Ideal System

To realize the AR applications of Section 1.2, an ideal AR system recognizes the RE and determines the corresponding VE model, computes the registration information to align the VE with the RE, and renders the VE on the RE. Recognizing a general environment with computer vision is difficult. However, it becomes easier by recognizing a unique tag [Rekimoto 97], by recognizing a fiducial cluster [Cho 98], by utilizing coarse tracking devices (e.g., GPS, digital compass) aided by a computer vision system (e.g., recognizing landscape silhouettes) [Behringer 99], or simply by using user intervention.

Fig. 1-8 Common process of an AR system

The registration should be real-time, accurate, and robust. The video frame rate of the augmented view should be at least 10-15 Hz to provide users with a feeling of immersion. Human perception is very sensitive to small misalignments, so the accuracy should be high: ideally, sub-pixel accuracy is desired; although that is difficult to achieve in a real-time system, roughly 2-pixel accuracy is the target. The registration system should be robust under outlying measurements and environmental disturbances (for example, magnetic field disturbance or line-of-sight occlusion). The registration system should also allow for mobility and a wide range of operation. Mobility is critical for some applications, especially when the user's operating range is outdoors. For example, SIMNET is designed to provide soldiers, who should have freedom to travel, with annotations of enemy units. For these applications, the system should be light in weight and small in volume so it can be head-worn. Applications also require a wide range of operation. For example, most medical applications operate within a desktop range, but airplane assembly and maintenance easily require image overlay across a 70 m span (Boeing 747-400). The rendering system renders the VE on the image of the RE. Commercially available graphics libraries and graphics engines can be used for real-time rendering. There are some other issues related to realistic virtual environment rendering (recovery of the RE lighting model [Chevier 1995] [Fournier 1994], and the occlusion problem [Wloka 1995] [Berger 1997]). However, these issues are outside the scope of this paper. The focus of this paper is the registration system.
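To give a rough sense of what the 2-pixel target implies for tracking accuracy, the following sketch (not from the dissertation; it assumes a simple pinhole camera with a hypothetical focal length given in pixels) estimates the screen-space misalignment produced by small pose errors:

```python
import math

def misalignment_px(rot_err_deg, pos_err_m, depth_m, focal_px):
    # Rough screen-space misalignment from small pose errors, assuming a
    # pinhole camera: a rotation error shifts the image by about
    # f * tan(error), and a sideways position error of e at scene depth d
    # shifts it by about f * e / d. All values are illustrative only.
    rot_px = focal_px * math.tan(math.radians(rot_err_deg))
    pos_px = focal_px * (pos_err_m / depth_m)
    return rot_px + pos_px

# A 0.1-degree orientation error plus a 2 mm position error at 1 m depth,
# with f = 700 pixels, already yields roughly 2.6 pixels of misalignment.
print(misalignment_px(0.1, 0.002, 1.0, 700.0))
```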
1.4. Registration System

To summarize, the requirements of the registration system for an ideal AR system are as follows:
■ Real-time: low latency and high update rate (> 10-15 Hz)
■ Accuracy: alignment error < 2-3 pixels
■ Robustness: in the presence of outliers and under environmental disturbances
■ Freedom of mobility: portable and wearable
■ Wide range of operation: from desktop to outdoor

Unfortunately, it is very difficult to build a registration system that fulfills all the above requirements. A vision-based tracking system has advantages and is widely used for AR registration; however, it has limitations in tracking range and other disadvantages (e.g., the occlusion problem). As proposed in this paper, an extendible tracking system starts tracking based on a small set of calibrated features, calibrating new features as they come into view. As the new features are recursively calibrated and their uncertainties decrease below a threshold, these features are inserted into the "calibrated feature database" to be used for tracking (Fig. 1-9). By using dynamically calibrated features, the tracking can be robust under partial occlusion, and the range can be extended to uncalibrated areas. As a result, an extendible tracking system overcomes the disadvantages of a vision-based tracking system, as expressed in the thesis statement.

Thesis Statement

The tracking range of a computer vision-based tracking system can be extended to uncalibrated environments dynamically, interactively, and on-line by (1) calibrating 3D positions of new features, (2) refining camera pose solutions, (3) properly managing outliers, and (4) integrating with natural feature tracking.

The contribution of this dissertation is to enable tracking range extension for vision-based camera tracking. The tracking range was extended to uncalibrated environments through dynamic feature calibration, pose computation refinement, and outlier management. The tracking range could be extended even to natural environments when the extendible tracking system was integrated with natural feature tracking. For feature calibration, two recursive filters were designed and implemented. These filters provided accurate 3D position estimates given noisy 2D measurements. According to the simulated experiments, the estimates converged quickly to the true values and stayed stable after convergence. For accurate and stable camera pose solutions, a new pose computation method was designed and implemented. Compared with other closed-form solutions, this new method provided accurate, stable, and robust pose solutions. Vision-based systems are subject to outliers. For example, a background object may be incorrectly identified as a calibrated feature. To solve the problems caused by outliers, robust statistical methods were embedded in feature calibration and pose computation.

Fig. 1-9 An Extendible Tracking System Overview (calibrated features drive pose computation; uncalibrated feature positions are updated and inserted into the database once their uncertainty falls below a threshold)
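The calibrate-and-promote cycle of Fig. 1-9 can be summarized in a few lines. The sketch below is illustrative only: the threshold value is arbitrary, and the simple running-average update stands in for the dissertation's actual recursive filters (the EKF and RAC filters of Chapter 4) and pose computation (Chapter 5).

```python
import numpy as np

UNCERTAINTY_THRESHOLD = 0.02   # illustrative value, not taken from the thesis

class FeatureEstimate:
    """Toy recursive estimate of a feature's 3D position."""
    def __init__(self, guess):
        self.position = np.asarray(guess, dtype=float)
        self.n = 1
        self.uncertainty = 1.0                 # start highly uncertain

    def update(self, guess):
        self.n += 1
        self.position += (np.asarray(guess, dtype=float) - self.position) / self.n
        self.uncertainty = 1.0 / self.n        # shrinks as evidence accumulates

def extendible_tracking(frames, calibrated_db):
    """frames: per-frame dicts {feature_id: rough 3D position guess}."""
    uncalibrated = {}
    for frame in frames:
        # A full system would first compute the camera pose here from the
        # features already in calibrated_db, then derive each new feature's
        # position guess from its 2D measurement and that pose.
        for fid, guess in frame.items():
            if fid in calibrated_db:
                continue
            if fid not in uncalibrated:
                uncalibrated[fid] = FeatureEstimate(guess)
            else:
                uncalibrated[fid].update(guess)
            est = uncalibrated[fid]
            if est.uncertainty < UNCERTAINTY_THRESHOLD:
                calibrated_db[fid] = est.position   # promote to tracking primitive
                del uncalibrated[fid]
    return calibrated_db

# Feature "A" is calibrated from the start; "B" is promoted after enough views.
frames = [{"A": [0.0, 0.0, 0.0], "B": [1.0, 2.0, 0.1]} for _ in range(60)]
print(extendible_tracking(frames, {"A": np.zeros(3)}))
```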
Chapter 2 describes tracking systems, evaluating and comparing various tracking methods. Chapter 3 provides an overview of an extendible tracking system. Chapters 4 to 7 describe the extendible tracking system in detail. Chapter 4 explains two recursive filters for new feature position calibration. Chapter 5 describes the problems of the current pose computation method, the requirements for a new pose computation method, and the new pose computation method compared with the old one. Chapter 6 explains robust statistical methods, their adaptation for real-time implementation, and the "use of a priori covariances" to cull outliers in a Kalman filter. Chapter 7 provides experimental results, especially of integration with natural feature tracking to extend the tracking range to natural environments. In Chapter 8, results of intensive simulated extendible tracking experiments are presented, along with a propagation error prediction model. Chapter 9 contains the conclusion and future work.

2. Tracking Systems

A common solution to provide registration information is to track the 6DOF camera pose. Section 2.1 briefly explains registration without tracking and its disadvantages, followed by Sections 2.2-2.4, which describe registration by tracking, evaluations of various tracking methods, and vision-based tracking in more detail.

2.1. Registration without Tracking

Registration information can be obtained without tracking by utilizing geometric invariance [Mundy 1992] [Weiss 1993] and directly computing the image coordinates of the VE. In CMU's Magic Eye Project [Uenohara 1996], the system utilizes cross-ratios to calculate the positions of virtual objects. Because cross-ratios of four triangle areas are projective-invariant, using 4 feature points the system can calculate the fifth point position (the position of the virtual object) based on the calculated invariants. This work is 2D-based registration and is not capable of rendering 3D objects. Kutulakos et al. extended the use of invariants to include 3D rendering and placement of arbitrary virtual objects [Kutulakos 1996]. However, the accuracy does not seem to be high enough for reliable AR systems. There has also been work in Structure from Motion using affine invariants [Manku 1997]. They used an affine camera projection model [Koenderink 1991] [Mundy 1992] that is valid when the FOV is small and the depth of the object is small compared to the viewing distance. However, this assumption often fails in AR applications, especially when the user is manipulating 3D objects at arm's length. As a result, the use of affine invariants rests on assumptions that are not always valid, and the acquired accuracy is not high enough for an ideal AR system. A more accurate approach to solving the registration problem is to track the 6DOF pose of the camera / object.
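As a small numerical illustration of the invariant idea, the snippet below computes a ratio of products of four triangle areas over five coplanar points and shows that it is unchanged by an arbitrary homography. The particular combination of triangles is one standard choice from the projective-invariants literature, not necessarily the exact invariant used in the Magic Eye project.

```python
import numpy as np

def tri_area(a, b, c):
    # Signed area of the triangle (a, b, c) from 2D points.
    return 0.5 * ((b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0]))

def area_cross_ratio(p1, p2, p3, p4, p5):
    # Each point appears equally often in numerator and denominator, so the
    # projective scale factors and det(H) cancel, making the ratio invariant.
    return (tri_area(p1, p2, p3) * tri_area(p1, p4, p5)) / \
           (tri_area(p1, p2, p4) * tri_area(p1, p3, p5))

def warp(H, p):
    # Apply a homography to a 2D point.
    x = H @ np.array([p[0], p[1], 1.0])
    return x[:2] / x[2]

pts = [np.array(p, dtype=float) for p in [(0, 0), (4, 0), (4, 3), (0, 3), (1, 1)]]
H = np.array([[1.1, 0.2, 3.0],
              [-0.1, 0.9, 1.0],
              [0.001, 0.002, 1.0]])

print(area_cross_ratio(*pts))                        # invariant on the original points
print(area_cross_ratio(*[warp(H, p) for p in pts]))  # same value after the homography
```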
2.2. Registration by Tracking

Fig. 1-8 is repeated in Fig. 2-1 with the "Tracking system" highlighted. "Tracking system" replaces "Registration system" because accurate registration is accomplished by tracking the 6DOF camera pose. As shown in Fig. 2-2, the RE and the VE have their own coordinate systems. The RE is projected onto the sensor of the real camera, creating the user's view of the RE. For the VE to be correctly superimposed onto the view of the RE, two types of spatial linkage are required. The coordinates of the VE should be spatially linked to the coordinates of the RE, and the virtual camera and the real camera should be linked together. The coordinates of the VE are linked to the coordinates of the RE during the VE modeling process. For example, the spatial relationship between a real part and a virtual part of a device, which is often available from CAD data, is used to define the coordinates of the virtual part when it is included in the VE database. For the virtual and the real cameras to be spatially linked, they should share the same intrinsic and extrinsic parameters. The camera intrinsic parameters can be estimated off-line, assuming a fixed focal length. For the camera extrinsic parameters, a tracking system is used to estimate the 6DOF pose of the real camera. Given these spatial linkages, the VE is transformed into the same screen space as the RE, creating the augmented environment (see the short projection sketch at the end of this section). From this point on, real-time 6DOF camera pose "tracking" is the focus of this paper.

Fig. 2-1 Common process of an AR system

Fig. 2-2 Coordinate systems of the VE and the RE (real and virtual environments, real and virtual cameras, tracking device, and their spatial linkages)

The goal of tracking is to provide information fulfilling the requirements for the AR applications of Section 1.2. To repeat, the requirements for an AR tracking system are as follows:
■ Real-time: low latency (between RE and VE) and high update rate (at least 10-15 Hz)
■ Accuracy: alignment error < 2-3 pixels
■ Robustness: in the presence of outliers and under environmental disturbances
■ Freedom of mobility: portable and wearable
■ Wide range of operation: from desktop to factory plant or outdoor
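The spatial linkage described above amounts to projecting VE points with the intrinsics shared by the two cameras and the tracked extrinsics of the real camera. A minimal sketch, with illustrative parameter values (a hypothetical 700-pixel focal length and a 640x480 image), is:

```python
import numpy as np

# Hypothetical shared intrinsics: focal length and principal point in pixels.
K = np.array([[700.0,   0.0, 320.0],
              [  0.0, 700.0, 240.0],
              [  0.0,   0.0,   1.0]])

def project(point_ve, R, t):
    # point_ve: 3D VE point expressed in RE coordinates.
    # R, t: tracked extrinsics of the real camera (RE -> camera transform).
    cam = R @ np.asarray(point_ve, dtype=float) + t   # RE -> camera coordinates
    uvw = K @ cam                                     # camera -> homogeneous image
    return uvw[:2] / uvw[2]                           # pixel coordinates

# A virtual point 1 m in front of a camera sitting at the RE origin:
R, t = np.eye(3), np.zeros(3)
print(project([0.1, 0.05, 1.0], R, t))                # approximately [390. 275.]
```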
In the next section, currently available tracking methods are evaluated against the above requirements.

2.3. Tracking Methods: Evaluation and Comparison

Magnetic trackers use sets of coils in a transmitter as a magnetic source. These coils are pulsed to produce magnetic fields (AC or DC) that are sensed by a receiver to determine the strength and angles of the fields. Magnetic trackers are accurate in small working volumes, such as aircraft cockpits, but as the distance increases, the magnetic field strength and hence the accuracy decrease. Magnetic trackers are vulnerable to noise from power cables, CRTs, and other devices that generate electromagnetic noise of 8-1000 Hz [Meyer 1992]. They are also subject to metallic distortions and large amounts of error and jitter. An uncalibrated system exhibits 10 cm or more of error, especially in the presence of magnetic field disturbances. Carefully calibrated systems can reduce the errors to within 2 cm [Livingston 1995]. Although easily wearable (light and small), magnetic trackers are not robust, and the tracking range is limited by the magnetic field strength.

Mechanical trackers measure pose by physically connecting (with jointed linkages) a remote object to a reference point. Sutherland used a mechanical tracking system in his head-mounted display project [Sutherland 1968]. Even with accurate pose information (position accuracy ~0.1 inch), Sutherland reported that the mechanical device was heavy and uncomfortable. Mechanical trackers are useful for applications that require force feedback and accurate tracking in relatively small volumes (e.g., the Fake Space BOOM). However, they are intrusive, and the range is limited.

There are two types of acoustic trackers: one is based on time-of-flight (TOF) measurement; the other, on phase-coherent (PC) measurement. A TOF device determines distance by measuring the elapsed flight time of an acoustic wave. Multiple emitters and sensors acquire a set of distances to compute the 6DOF pose. A TOF device is characterized by low update rates and vulnerability to ranging errors. In a small volume, a TOF system is accurate, but as the range increases, the rate and the accuracy decrease. TOF systems are also vulnerable to spurious acoustic pulses (e.g., typewriters, key chains) that generate gross errors. PC systems are less vulnerable to noise and spurious errors. They also provide better accuracy and wider range. However, they measure changes in position only and are subject to cumulative errors. Both types of acoustic systems are vulnerable to sensor occlusion and to air disturbances, and their ranges are also limited.

Inertial trackers use accelerometers and gyroscopes. Orientations are computed by integrating the measurements of the rate gyros, and changes in position are computed by double-integrating the measurements of the accelerometers, given initially known orientations. Inertial trackers are characterized by stability at high frequencies, good rates, and unlimited tracking range. However, they are subject to accumulation errors (drift). They are also inaccurate for slow positional changes, and the position errors from accelerometers are not within the functional range required for AR tracking [Rekimoto 1997].

Digital compasses, using passive magnetic sensors referenced to the earth's magnetic field, provide a variety of measurements (roll, pitch, yaw, angular acceleration, and velocity). Although inexpensive and portable, they provide 3DOF orientation only and are inaccurate.

Recently, GPS (Global Positioning System) has become widely used for global position tracking. Regular GPS does not provide good accuracy (~m). Although differential GPS provides accurate positions (~cm), the receivers are heavy and cumbersome as headwear, and they provide 3DOF position only. GPS is also subject to occlusion and does not function well indoors.

Optical trackers use cameras (ordinary or infrared) as sensors. They use various algorithms to compute the pose (5DOF or 6DOF). They function over a wide area and provide good accuracy and high rates. However, they are subject to the occlusion (line-of-sight) problem and to limited tracking range (limited to the areas where a minimum number of passive fiducials or active beacons are viewable). The robustness depends on the algorithm.
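To illustrate the drift behavior noted above for inertial trackers, the toy simulation below (illustrative noise figures, not measured sensor data) double-integrates a zero-mean noisy accelerometer signal from a stationary sensor; the position estimate typically wanders on the order of centimetres within ten seconds, which is why inertial position must be corrected by another sensor.

```python
import numpy as np

rng = np.random.default_rng(0)
dt, seconds = 0.01, 10.0          # 100 Hz accelerometer, 10 s run
noise_std = 0.02                  # m/s^2, illustrative zero-mean noise, no bias

velocity, position = 0.0, 0.0     # the sensor is actually stationary
for _ in range(int(seconds / dt)):
    measured = rng.normal(0.0, noise_std)   # true acceleration is zero
    velocity += measured * dt               # first integration
    position += velocity * dt               # second integration: errors accumulate
print(f"position drift after {seconds:.0f} s: {100 * position:.1f} cm")
```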
Table 2-1 Evaluation of tracking devices / methods (latency, update rate, accuracy, robustness, portability, range, outdoor use, suitability for dynamic environments, and principal limitations of magnetic, mechanical, acoustic TOF and PC, inertial, optical passive and active, compass, and GPS trackers)

None of the tracking methods fulfills the tracking requirements for an ideal AR system. Magnetic trackers, mechanical trackers, and acoustic trackers are limited in tracking range. Digital compasses, GPS, and magnetic trackers are inaccurate and not robust. Inertial trackers are portable and provide accurate orientation (relative to the initial orientation) if the accumulation error can be compensated for, but they provide 3DOF orientation only.

Hybrid tracking systems have been developed to compensate for the weaknesses of any single tracking method. For example, a vision-gyro hybrid tracking system is used to compensate for the high-frequency noise of the vision-based tracking system and the accumulation errors of the gyros. However, all tracking devices (including gyros) other than vision-based systems are referenced to a fixed coordinate system and do not allow for object motion in the environment. This kind of tracking system is not suitable for AR applications in manufacturing, maintenance, and training that require task guidance and specific component indications on subassemblies or portions of a structure. These applications are often object-centric, and a more appropriate tracking solution, based on viewing the object itself, is provided by the pose estimation methods developed in the fields of computer vision and photogrammetry [Neumann 1996]. Another problem with a vision-gyro hybrid tracking system is position tracking: the gyro provides accurate orientation information, but the system depends on the vision system for position because there is no stand-alone and accurate position tracking system. A GPS / digital compass hybrid tracking system has unlimited range and provides global 6DOF pose, but its accuracy and robustness are not high enough for an ideal AR system. Other hybrid tracking systems have limited tracking range because at least one of the combined tracking methods has limited range.

Tracking techniques | Magnetic / Vision | Vision / Inertial | Acoustic / Inertial | GPS / Digital Compass
Real-time: latency | None | Low | Low | Low
Rate | Moderate | Moderate | High | High
Accuracy | Moderate | Moderate | High | Low
Robustness | ? | ? | ? | No
Portability | Yes | Yes | No | Yes
Range | Limited | ? | Limited | Unlimited
Allow moving objects | No | No | No | No
Comments | Not fully tested: need extendibility

Table 2-2 More evaluation of tracking devices / methods

2.4. Vision-based Tracking

Although a vision-based tracking system does not fulfill all the requirements, it has many advantages. First, the same video camera used to capture real scenes also functions as a tracking device.
Second, because the pose calculation is based on the image, the perceived image 18 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. alignment error can be minimized. Third, processing delays in the video and graphics subsystems can be matched, thereby eliminating dynamic alignment errors. Fourth, cameras are wearable and the system is portable (it functions as a stand-alone unit unlike, for example, magnetic trackers and acoustic trackers that require magnetic or acoustic sources). In addition, the tracking is relative to an object; hence a vision-based system may adapt a dynamic environment. As a result, a vision-based tracking system has a potential to fulfill the requirements. Because of the advantages, vision-based tracking systems have been widely used and there has been a lot of research effort to develop tracking methods with various assumptions (or initial conditions). The computation time, accuracy, and robustness depend on the tracking methods. In AR, Fiducial-based tracking system is frequently used because it is accurate and computationally low (the fiducials are designed to minimize the detection computation). However, fiducial-based tracking systems are subject to partial occlusion and the tracking range is limited to the areas where a minimum number of pre-calibrated fiducials are in view. The goal of this thesis is to develop a system to overcome the disadvantages of the fiducial-based tracking. Before presenting the new system, various vision-based tracking methods are reviewed and evaluated in this section. There have been two approaches of vision-based systems to compute the pose of the camera / object. Structure from Motion (SfM) starts without any knowledge of 3D positions of the object structure. Based on the 2D measurements of the features, SfM algorithms recover the camera / object motion and object structure simultaneously. The other approach is model-based (it uses the pre-calibrated features (fiducials or landmarks) or known object structure). Correspondence between the 3D positions and 2D measurements of the fiducials of features are used the pose. Model-based approach is popular in AR because of the advantages: fewer minimum number o f features are required than SfM; fiducials can be intentionally designed to be easily detected and distinguished for low computation in detection. 19 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Calibrated features? None -A ll Structure from Motion • Camera intrisic parameters: either M odel-based approach (fiducial-based tracking) calibrated or uncalibrated • Range: unlimited • Accuracy: low • Robustness: sensitive to noise • Camera intrinsic parameters: pre­ calibrated • Range: Limited (no extension) • Accuracy: high • Robustness: partial occlusion Fig. 2-3 Categories of vision-based tracking 2.4.1. Structure from M otion (SfM ) In SfM, the camera extrinsic parameters (pose) and the object structures are recovered simultaneously. The tracking range is theoretically unlimited as long as the system can detect distinguishable features for 2D tracking. There are four major approaches in SfM [Oliensis 1997]: optimization, Kalman filtering and fusing [Azarbayejahni 1995][Broida 1990], projective methods, and invariants-based algorithms [Hartley 1995], Optimization method seems to be accurate but slow and subject to local minima problem. Kalman filtering has been used successfully in many applications. Azarbayejani et al. 
demonstrates a recursive algorithm (using a Kalman filter) to estimate object structure, camera motion, and camera focal length [Azarbayejahni 1995]. A minimum of 7 points is required, but in practice more points are needed for smoothing. The method is sensitive to noise and requires too many features as a minimum. Projective method does not utilize a priori estimates although they are available. The essential or fundamental matrix [Hartley 1995] rely on polynomial manipulations, which are also unstable [Oliensis 1997], Recently, Sawhney et al. developed an accurate SfM system, demonstrating a realistic result even when parts of the scenes appear and 20 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. disappear rapidly [Sawhney 1999]. However, their system is not on-line (compute pose of selected keyframes and then in-between frames later), and they used too many points (> 200) in an image. Oliensis developed a fast self-calibration system using an iterative algorithm (STBHS algorithm) [Sturm 1996], However, the depth error was sometimes more than 20% of the depth [Oliensis 1999]. No matter which approach is taken, it is known that the parameter estimation of SfM is not robust, sensitive to noise, and the accuracy is low. SfM requires many features because of over- parameterization. Some of the parameters are Fixed and can be determined off-line. SfM also has a contradiction in preferred camera motion that the camera pose is better estimated with rotational motions while the structure parameters are better recovered with transnational motions. Because SfM starts without any assumption about the structures of the RE, the scale factor is not known. This becomes a problem when the VE are modeled from CAD data that is in absolute scale. For these reasons, SfM techniques are not directly usable for accurate AR tracking. 2.4.2. M odel-based approach: Fiducial-based T racking Most vision-based AR tracking systems are based on fiducials. There are three categories of fiducial-based tracking methods: analytical methods [Mellor 1995][Neumann 1996][State 1996], recursive filter methods [Koller 1997], and iterative methods [Uenohara 1995]. Analytical methods are subject to multiple solutions (unless the number of features > 6 to solve a linear equation) and numerical instability. Fishier and Bolles suggested “Random Sample Consensus” as a method of smoothing data containing a significant percentage of gross errors [Fischler 1981]. They applied this method to the 3-point pose method with success in removing the effects of gross errors. However, their method does not have a time limit (performing random trials), making it unsuitable for real time applications. Horaud et al. developed a four point pose 21 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. method using non-coplanar points. Geometric constraints are used to solve biquadratic polynomial equations with one unknown [Horaud 1989]. They assert that their method is real­ time, providing fewer solutions than 3-point-based methods, and is more stable (not dependent on the relative orientation of the image plane and scene plane). The problem of multiple solutions persists, and near coplanar points and noise produce unstable results. 
Ganapathy computes camera position and orientation using a non-iterative analytical method [Ganapathy 1984]. His method also employs only 3 points (for external camera parameter estimation), and in general there are multiple solutions; although it can be extended to n points, that requires iterative optimization. Recently, there have been quasi-linear methods for estimating camera pose and internal parameters using 4 or 5 points [Triggs 1999], but the errors were too large for accurate AR registration. Quan et al. proposed linear methods for pose estimation using 4, 5, or N points [Quan 1999]. They used two-step and one-step linear algorithms for 4 points and for 5 or more points. However, their method did not apply any outlier culling algorithm, so it is vulnerable to outliers, as other linear systems are.

Welch et al. designed a pose filter, or estimator, that accepts one point measurement (or constraint) at a time [Welch 1997]. The computational overhead of this method (called SCAAT) is small, facilitating real-time applications, and it exhibits robust behavior with noise. SCAAT was developed for a high-update-rate (>1 kHz) active beacon system that measured one point at a time. The iEKF (Iterative Extended Kalman Filter) also uses the idea of processing one point at a time. Given that video images are snapshots containing multiple point measurements taken at the same time, but at a much lower rate (30 Hz), the iEKF was developed specifically for video-based tracking.

Uenohara et al. used Newton's method for 6DOF pose estimation [Uenohara 1996]. However, they used multiple DSP chips in their implementation to speed up the computation, which is not suited to a more general software-based approach. There have been other methods for recovering pose by iteration (Newton-Raphson) [Lowe 1991][Yuan 1989]. These methods require initial approximations and can be computationally expensive [Dementhon 1995]; also, the solutions can converge to local minima if the initial values are not close to the true solution [Oliensis 1997]. Dementhon et al. designed an iterative algorithm that does not require initial estimates and performs in real time [Dementhon 1995]. However, their method used scaled orthographic projections and did not fully use the fact that rotation matrices are orthonormal.

The robustness and accuracy depend on the method, but the tracking range is limited to the views in which a minimum number of fiducials are visible. Therefore, for wide-range tracking, fiducials would have to be installed and pre-calibrated over a wide area. This is impractical because, first, it is often difficult to predict the possible tracking area in advance, and second, the range of the calibration devices is limited. Although their accuracy and robustness are better than those of SfM, fiducial-based tracking methods need a robust and wide-area calibration capability to provide users with freedom of mobility.

3. Extendible Tracking

A more desirable method of tracking is based on dynamic calibration. The tracking, in the beginning, is fiducial-based and dependent on a small set of pre-calibrated fiducials. While the camera is moving, the system dynamically calibrates detected but unknown features. Until required, the calibration process is deferred, following the notion of "lazy evaluation" in algorithms. Given that these features are calibrated, they can be used as tracking primitives just as the pre-calibrated fiducials are. By Extendible Tracking, the tracking range can be extended to unprepared and uncalibrated areas. Because it is based on calibrated (pre-calibrated or dynamically calibrated) features, the extendible method is accurate and in absolute scale. In Fig. 3-1 and Table 3-1, the new method is compared with SfM and fiducial-based tracking. The new method inherits the advantages of fiducial-based tracking, and the tracking range is also extendible, as in SfM. Integration with natural feature tracking enables the tracking range to be extended even to areas where artificial fiducials are not placed, allowing for tracking and virtual object overlay in natural environments.

Fig. 3-1 Categories of tracking with dynamic calibration, arranged by how many features are calibrated (none, some, or all): Structure from Motion (range unlimited, accuracy low, robustness sensitive to noise), Extendible Tracking (range wide and extendible, accuracy high, robust), and the model-based, fiducial-based approach (range limited with no extension, accuracy high, subject to partial occlusion)

Table 3-1 Comparisons of vision-based tracking methods
- Structure from Motion. Calibrated: none, or the intrinsic camera parameters. Unknowns: camera parameters (6+) and feature positions. Methods: filters, optimization, invariant-based. Examples: Azarbayejani 95, Broida 90, Oliensis 99, Sawhney 99. Correspondence: from 2D correspondence. Comments: unlimited tracking range; sensitive to noise.
- Extendible Tracking. Calibrated: intrinsic camera parameters and some features. Unknowns: camera parameters (6) and most of the feature positions. Methods: dynamic calibration, accurate pose computation, robust statistics. Example: this work. Correspondence: unique fiducials, cluster recognition, natural feature tracking. Comments: extendible to a wide range; accurate; robust.
- Fiducial-based tracking. Calibrated: intrinsic camera parameters and all the features. Unknowns: camera parameters (6). Methods: 3- and 4-point based, invariants, optimization, recursion. Examples: most tracking systems (Cho 98, Mellor 95, Uenohara 95, Koller 97, Quan 99). Correspondence: unique fiducials, cluster recognition. Comments: limited tracking range (where fiducials are in view); accurate tracking.

Extendible Tracking has the potential to fulfill the requirement of wide-area tracking. However, because camera pose tracking and feature position estimation involve noise and outlying measurements, extending the tracking range to a wide area demands reduction of the propagated errors. That is, an accurate and robust extendible tracking system is difficult to build in practice.

3.1. Problems of Extendible Tracking: Inaccurate Pose Computation and Outlier Existence

Experiments show that tracking errors propagate rapidly in extendible tracking when the number of calibrated features is small relative to the number of uncalibrated features. Fig. 3-2 shows the synthetic experiment design for tracking range extension with dynamic feature calibration. In this simulated experiment, the system started tracking with 6 calibrated features. The camera was then panned and rotated while the system estimated the positions of 94 initially uncalibrated features placed in a 100"x30"x20" volume.
Fig. 3-2 Synthetic experiment design for Extendible Tracking: the system starts with 6 calibrated features, and the camera motion sweeps across 94 uncalibrated features

Fig. 3-3 shows the errors in camera position as camera pose and dynamic feature calibration errors propagated to new scene features. Fig. 3-3a shows the 3D position errors of the dynamically calibrated features. Fig. 3-3b shows the errors in the camera position (the peak at about the 1000th frame was a pose outlier). After about 500 frames (~16 seconds), the 5-inch accumulated error exceeded 5% of the largest operating volume dimension. This performance may be adequate to compensate for several frames of fiducial occlusion, but it is inadequate for significant tracking area extension.

Fig. 3-3 Propagated errors in the dynamic tracking range extension experiment: (a) feature 3D position calibration error, (b) camera position error, both plotted against frame number

A major source of error in the above experiment is the pose computation method (the popular 3-point based method [Fischler 1981]). Fig. 3-5 plots the camera position error produced by this method under other simulated test conditions. The true camera position was placed at grid points on a plane, with the look-at point maintained around the center of the triangle formed by the 3D fiducial positions (Fig. 3-4). The fiducial positions were projected to the image plane and Gaussian measurement noise (σ = 0.5 pixels) was added. The X and Y (horizontal plane) coordinates of the dots in Fig. 3-5 indicate the X and Y coordinates of the tested true camera positions, and the vertical coordinate indicates the computed camera position error. The errors were small (~0) in most cases. However, this method has known numerically unstable areas (the curved triangular hole in Fig. 3-5) where it is incapable of computing pose solutions, and it is subject to a multiple-solution problem (in this test, multiple solutions were ignored by selecting the closest solution to the true pose). The result also shows that even when the method provides solutions, none of the multiple solutions was correct in some situations (the raised dots in Fig. 3-5). Because the effectiveness of dynamic feature calibration (hence, tracking range extension) depends on the behavior of the pose calculations, a more robust and accurate pose computation method is required to reduce the error growth rate in Extendible Tracking.

Fig. 3-4 Accuracy test of the 3-point pose method: experiment setup. The grid of true camera positions spans X = -30 to 30 and Y = -27 to 33 at Z = 40, with a step of 0.5, and the look-at point is maintained near the center of the fiducial triangle. Experiment sequence: 1. The true camera position is iterated over the grid points. 2. For a given true camera position, the image coordinates of the fiducials are computed, adding Gaussian noise (σ = 0.5 pixel); then, based on the fiducial 3D positions and the corresponding image coordinates, the estimated camera pose is computed.
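The 3-point solver itself is beyond this sketch, but the measurement-generation half of the simulation just described is simple: build a look-at camera for each grid position, project the three fiducials through a pinhole model, and add σ = 0.5 pixel Gaussian noise. The focal length and the fiducial and camera coordinates below are illustrative placeholders, not the exact values used in the experiment.

```python
import numpy as np

def look_at(cam_pos, target, up=np.array([0.0, 0.0, 1.0])):
    """World-to-camera rotation for a camera at cam_pos looking at target."""
    forward = target - cam_pos
    forward /= np.linalg.norm(forward)
    right = np.cross(forward, up); right /= np.linalg.norm(right)
    down = np.cross(forward, right)
    return np.stack([right, down, forward])     # rows: camera x, y, z axes

def project(points_3d, cam_pos, R, focal_px=600.0, noise_sigma=0.5):
    """Pinhole projection of world points plus Gaussian pixel noise."""
    pc = (R @ (points_3d - cam_pos).T).T        # camera-frame coordinates
    uv = focal_px * pc[:, :2] / pc[:, 2:3]      # perspective divide
    return uv + np.random.normal(0.0, noise_sigma, uv.shape)

# Illustrative fiducial positions (inches) and one grid camera position.
fiducials = np.array([[-6.0, 1.0, 0.0], [0.0, 4.0, 0.0], [-12.0, -2.0, 0.5]])
cam_pos = np.array([0.5, 10.4, 9.0])
R = look_at(cam_pos, fiducials.mean(axis=0))
uv = project(fiducials, cam_pos, R)
# uv would then be fed to the 3-point pose solver, and the recovered camera
# position compared against cam_pos to produce the error surface in Fig. 3-5.
```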
Fig. 3-5 Accuracy test of the 3-point pose method: computed camera position error (RMS, inches)

The presence of outliers may also cause instabilities in the system. There may exist feature measurement outliers (e.g., from the incorrect identification of a green marker cap as a green fiducial) and pose outliers (e.g., from multiple solutions of the 3-point pose computation method). These outliers may cause problems especially when filters are used for estimation. For example, in a Kalman filter the outliers affect the estimate convergence, while the uncertainty converges monotonically, unaffected by the outliers; this is because the filter is incapable of separating and rejecting outliers. Fig. 3-6 shows an example of the effect of feature outliers on dynamic feature calibration. There were four feature measurement outliers, causing peaks in the position estimation curve and lowering the accuracy. However, the uncertainty decreased monotonically, unaffected by the outliers (Fig. 3-6b). In this situation, the uncertainties do not correctly represent the errors in the estimates. Another example, of a pose outlier, occurs at about the 1000th frame (Fig. 3-3). It is also known that some of the most common statistical procedures (in particular, those optimized for an underlying normal distribution) are sensitive to seemingly minor deviations from the assumptions [Huber 1981]. Because most of the procedures (e.g., the Kalman filter) depend on underlying assumptions of Gaussian distributions, these processes are sensitive to minor deviations.

Fig. 3-6 Effect of feature outliers in new point position estimation: (a) new feature position convergence with 4 feature outliers, with and without the outliers; (b) uncertainties with and without the outliers

3.2. System Overview

An extendible tracking system is composed of three major subsystems: dynamic feature calibration, pose computation, and robust outlier management.

Fig. 3-7 An Extendible Tracking system architecture (uncalibrated features are inserted into the calibrated-feature database once their uncertainty falls below a threshold)

Unknown features are calibrated by the dynamic feature calibration subsystem based on the camera pose and the feature image coordinates. The uncertainty of a feature position estimate can be used to determine whether the feature is usable for tracking. The 6DOF camera pose is estimated by the pose computation subsystem. As shown in Fig. 3-5, the 3-point based pose computation method is inaccurate and not robust. To reduce the propagated errors and for accurate virtual object overlay, more accurate, robust, and real-time pose solutions are required. A new pose computation method is preferably n-point based, to average out the measurement errors.
Outlier management subsystem is imbedded to manage outliers and to reduce the effect of incorrect model assumptions (the measurement error is modeled as, although may not be, Gaussian). Robust statistical methods can be used for outlier management of both feature 2D- measurements and 6D-camera pose solutions. Lastly, to extend the tracking range to natural environments, the system needs to be integrated with natural feature detection and tracking algorithm. The shaded parts of the system are the work accomplished by me (Fig. 3-7). My college, Youngkwan Cho, has implemented Fiducial detection component. Natural feature detection and tracking has been done by Suya You. Extendible Tracking system includes dynamic feature calibration, pose computation refinement, outlier management, and integration with natural feature tracking algorithm. Designed as in Fig. 3-7, Extendible Tracking system enables tracking range extension to uncalibrated and even to natural environments, as described in the thesis statement. Thesis Statement The tracking range of a computer vision-based tracking system can be extended to uncalibrated environments dynamically, interactively, and on-line by (1) calibrating 3D positions of new features, (2) refining camera pose solutions, (3) properly managing outliers, (4) and integrating with natural feature tracking. For dynamic feature calibration, two recursive filters were designed and implemented. These filters provided estimate uncertainties as well as 3D position estimates. The estimates were accurate and stayed stable after convergence. For accurate and stable camera pose solutions, a 32 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. new pose computation method (which is based on the solutions of 3-point method) was designed and implemented. The performance of the new method was better than other closed-form solutions in multiple solutions problems, accuracy, stability, and robustness. Outlier problems were managed by applying robust statistical methods imbedded in feature calibration and pose computation. Although the tracking range can be extended, there are limits in length. For scale- free scene length, I used lv, a multiple of FOV (Field O f View) length. Using this unit, the tracking range extension within tolerable range of registration error (e.g., < 5 pixels) is limited to lv = 3 using the extendible tracking system I developed. Chapters 4 to 7 describe how Extendible Tracking can be achieved in more detail. Chapter 4 explains about two recursive filters for new feature position calibration. Chapter 5 describes the problems of the current pose computation method, requirements of the new pose computation methods, and a new pose computation methods compared with other methods. Chapter 6 explains about robust statistical methods, its adoption for real-time implementation, and “use of a priori covariances” to cull outliers in Kalman Filter implementation. Chapter 7 describes the system integration, especially with natural feature tracking to extend the tracking range to natural environments. In Chapter 8, the effects of parameters on propagated errors are determined through simulated experiments to identify the major parameters for propagated errors. 33 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 4 . 
New Feature Position Estimation

In this chapter, the unique aspect of extendible tracking that provides interactive and dynamic new feature position calibration is described in detail. Given the camera pose and the image coordinates of the unknown features, these features' position estimates are updated based on an incomplete measurement (the estimate is in 3D but the measurement is in 2D). Therefore, the estimate can be anywhere along the line connecting the camera and the measurement (Fig. 4-1).

Fig. 4-1 Updating the 3D feature position estimate based on a 2D measurement

The requirements for the feature calibration method are as below, following the registration requirements for an ideal AR system:
• Low computation time
• Availability of uncertainty: to determine the usability of the estimate
• Ability to update the estimate based on an incomplete measurement

Two recursive filters have been developed that fulfill the above requirements: one based on the EKF (Extended Kalman Filter), the other on RAC (Recursive Averages of Covariances).

4.1. Initial Estimates

For unknown features, the intersection of two lines connecting the camera positions and the feature locations in the image creates the initial estimate of the 3D position of the feature (Fig. 4-2). (Since two lines in 3D space may not actually intersect, the point midway between the points of closest approach is used as the intersection.) The intersection threshold is scaled to the expected size of the fiducials (if the feature is a fiducial). For example, if the radius of the fiducials is 0.5 inch, as in our case, the threshold is 1.0 inch; this choice is based on the minimum distance at which two distinct fiducials can be placed without overlap. Lines whose closest points are less than 1.0 inch apart are considered to be intersecting.

Fig. 4-2 Initial estimate: the intersection of two lines, one from each of two camera views, at the 3D position of the new feature
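As a concrete illustration of the initial estimate of Section 4.1, the sketch below finds the closest points on two measurement rays and returns their midpoint, accepting the pair only if the rays pass within the 1.0-inch threshold. The ray parameterization (camera position plus a unit direction toward the feature's image location) is an assumption of this sketch, not the exact interface of the implemented system.

```python
import numpy as np

def ray_midpoint(p1, d1, p2, d2, threshold=1.0):
    """Midpoint of the closest points on two rays, or None if they miss.

    Each ray is a camera position p and a direction d toward the feature.
    The pair is accepted only if the rays pass within `threshold`
    (1.0 inch here, matching the fiducial spacing).
    """
    d1 = d1 / np.linalg.norm(d1)
    d2 = d2 / np.linalg.norm(d2)
    w0 = p1 - p2
    a, b, c = d1 @ d1, d1 @ d2, d2 @ d2
    d, e = d1 @ w0, d2 @ w0
    denom = a * c - b * b
    if denom < 1e-10:                  # nearly parallel rays: no reliable estimate
        return None
    t = (b * e - c * d) / denom
    s = (a * e - b * d) / denom
    q1, q2 = p1 + t * d1, p2 + s * d2  # closest points on each ray
    if np.linalg.norm(q1 - q2) > threshold:
        return None                    # rays do not "intersect" closely enough
    return 0.5 * (q1 + q2)

# Example: two rays from different camera positions converging near (1, 1, 5).
x0 = ray_midpoint(np.array([0.0, 0.0, 0.0]), np.array([1.0, 1.0, 5.0]),
                  np.array([3.0, 0.0, 0.0]), np.array([-2.0, 1.0, 5.0]))
```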
4.2. Recursive Filters

4.2.1. Extended Kalman Filter (EKF)

The Extended Kalman Filter (EKF) has been used in many applications, and details of the method can be found in references such as [Mendel 1995, Welch 1997, Broida 1990]. Inputs to the EKF are the current camera pose and the image coordinates of the fiducial:

$c_k$: camera pose at the $k$-th time step
$z_k$: image coordinates of the fiducial at the $k$-th time step

The state of the EKF is the current estimate of the fiducial's 3D position. The real 3D position of the fiducial is constant over time, so no dynamics are involved in the EKF equations. The parameters of the EKF, including the current state and its covariance matrix, are explained below.

$p_c$: intrinsic camera parameters, including focal length
$x$: real value of the 3D fiducial position; $x_k = x_{k-1} = x_{k-2} = \dots = x$
$\hat{x}_{k-1}$ (state): the filter's estimate of the state at the $(k-1)$-th time step (the initial state is the intersection of the first two lines). Given $Z_{k-1} = (z_1\ z_2\ \dots\ z_{k-1})$, $C_{k-1} = (c_1\ c_2\ \dots\ c_{k-1})$, and $p_c$,
$$\hat{x}_{k-1} = E(x_{k-1} \mid Z_{k-1}, C_{k-1}, p_c) \qquad (4.1)$$
$\hat{x}_k^-$ (state prediction): the predicted state estimate at the $k$-th time step given measurements up to the $(k-1)$-th time step,
$$\hat{x}_k^- = E(x_k \mid Z_{k-1}, C_{k-1}, p_c) \qquad (4.2)$$
$z_k$ (measurement): image coordinates of the fiducial at the $k$-th time step
$\hat{z}_k$ (measurement estimate): estimated measurement at the $k$-th time step (see eq. 4.6)
$\tilde{z}_k$ (residual): $\tilde{z}_k = z_k - \hat{z}_k \qquad (4.3)$
$Q$ (process noise): set to a very small value for numerical stability; for example, with $U_3$ the $3{\times}3$ identity matrix, $Q = 10^{-5} \cdot U_3$
$R$ (measurement noise): a $2{\times}2$ covariance matrix; with $U_2$ the $2{\times}2$ identity matrix, $R = 2 \cdot U_2$, assuming that the measurement error variance is 2 and that there is no correlation between the x and y coordinates in image space
$P_{k-1}$ (state uncertainty): $3{\times}3$ uncertainty covariance matrix at the $(k-1)$-th time step,
$$P_{k-1} = E[(x_{k-1} - \hat{x}_{k-1})(x_{k-1} - \hat{x}_{k-1})^T] \qquad (4.4)$$
$P_k^-$ (state uncertainty prediction): predicted state uncertainty at the $k$-th time step given measurements up to the $(k-1)$-th time step,
$$P_k^- = E[(x_k - \hat{x}_k^-)(x_k - \hat{x}_k^-)^T] \qquad (4.5)$$
$h$ (measurement function): function returning the projection (measurement estimate) of the current position estimate, given the current camera pose and camera parameters,
$$\hat{z}_k = h(\hat{x}_k^-, c_k, p_c) \qquad (4.6)$$
$H_k$ (Jacobian): Jacobian matrix of $h$
$K_k$ (Kalman gain): $3{\times}2$ matrix; see equation (4.10)

The EKF process is composed of two groups of equations: the predictor (time update) and the corrector (measurement update). The predictor updates the previous ($(k-1)$-th) state and its uncertainty to the predicted values at the current ($k$-th) time step. Since the 3D fiducial position does not change with time, the predicted position at the current time step is the same as the position at the previous time step.
$$\hat{x}_k^- = \hat{x}_{k-1} \qquad (4.7)$$
$$P_k^- = P_{k-1} + Q \qquad (4.8)$$
$$\hat{z}_k = h(\hat{x}_k^-, c_k, p_c) \qquad (4.9)$$
The corrector equations correct the predicted state value $\hat{x}_k^-$ based on the residual between the actual measurement $z_k$ and the measurement estimate $\hat{z}_k$. The Jacobian matrix linearizes the non-linear measurement function.
$$K_k = P_k^- H_k^T (H_k P_k^- H_k^T + R)^{-1} \qquad (4.10)$$
$$\tilde{z}_k = z_k - \hat{z}_k \qquad (4.3)$$
$$\hat{x}_k = \hat{x}_k^- + K_k \tilde{z}_k \qquad (4.11)$$
$$P_k = (I - K_k H_k) P_k^- \qquad (4.12)$$
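A minimal sketch of one EKF measurement update as described above, assuming a simple pinhole measurement function; the Jacobian is computed numerically here for brevity, whereas the dissertation derives it analytically, and the focal length is an illustrative placeholder.

```python
import numpy as np

def h_project(x, cam_R, cam_t, focal=600.0):
    """Pinhole measurement function h(x): world point -> pixel coordinates."""
    pc = cam_R @ (x - cam_t)
    return focal * pc[:2] / pc[2]

def numerical_jacobian(x, cam_R, cam_t, eps=1e-5):
    """2x3 Jacobian H of the measurement function, by central differences."""
    H = np.zeros((2, 3))
    for i in range(3):
        dx = np.zeros(3); dx[i] = eps
        H[:, i] = (h_project(x + dx, cam_R, cam_t)
                   - h_project(x - dx, cam_R, cam_t)) / (2 * eps)
    return H

def ekf_update(x, P, z, cam_R, cam_t, R_meas=2.0 * np.eye(2)):
    """One EKF step for a static 3D feature position.

    The time update is trivial (the feature does not move): x_pred = x and
    P_pred = P + Q, with Q a small value for numerical stability, and
    R_meas = 2*I as in the text.
    """
    Q = 1e-5 * np.eye(3)
    P_pred = P + Q
    z_pred = h_project(x, cam_R, cam_t)
    H = numerical_jacobian(x, cam_R, cam_t)
    S = H @ P_pred @ H.T + R_meas
    K = P_pred @ H.T @ np.linalg.inv(S)      # Kalman gain (3x2)
    x_new = x + K @ (z - z_pred)             # correct the position estimate
    P_new = (np.eye(3) - K @ H) @ P_pred     # shrink the uncertainty
    return x_new, P_new
```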
4.2.2. Recursive Average of Covariances (RAC) Filter

The RAC filter models each measurement as a 3D line through the current camera position and the fiducial location in the image. The 3D position estimate ($X$) of the fiducial is updated based on the measurement line ($l$). The uncertainty covariance matrix of the estimate is used in computing the update direction vector and the update magnitude of the estimate, and is then recursively averaged with the uncertainty covariance matrix of the line.

To obtain the update direction of the position estimate, a vector $v$ from the current estimate ($X$) to the point on the line ($l$) that is closest to $X$ is computed first. The update direction vector $q$ is a version of the vector $v$ scaled by the uncertainty of $X$, to move the estimate in the direction of larger uncertainty. Fig. 4-3 shows the 2D analogy of the update direction computation: $v$ is scaled in the directions of $\varepsilon_1$ and $\varepsilon_2$ by $\lambda_1$ and $\lambda_2$ respectively to produce $q$.

Fig. 4-3 2D analogy of the new algorithm for the update direction in RAC

The update direction vector $q$ is computed by the process described below. First we find parameters $(a, b, c)$ satisfying $a \cdot \varepsilon_1 + b \cdot \varepsilon_2 + c \cdot \varepsilon_3 = v$, to represent the vector from the current estimate to the closest point on the new line in the uncertainty eigenvector coordinate system.
$$[\varepsilon_1\ \varepsilon_2\ \varepsilon_3]\begin{bmatrix} a \\ b \\ c \end{bmatrix} = v \qquad (4.13)$$
$$\begin{bmatrix} a \\ b \\ c \end{bmatrix} = [\varepsilon_1\ \varepsilon_2\ \varepsilon_3]^{-1} \cdot v \qquad (4.14)$$
Then we compute $(a', b', c')$ by scaling $(a, b, c)$ by $(\lambda_1, \lambda_2, \lambda_3)$:
$$[a'\ b'\ c'] = [\lambda_1 \cdot a\ \ \lambda_2 \cdot b\ \ \lambda_3 \cdot c] \qquad (4.15)$$
Finally, the update direction $q$ is obtained using the scaled components $(a', b', c')$:
$$q = a'\varepsilon_1 + b'\varepsilon_2 + c'\varepsilon_3 \qquad (4.16)$$
The update magnitude $m$ of the position estimate is
$$m = \min(d_u, d_l) \qquad (4.17)$$
where $d_u$ is the uncertainty of the current estimate in the direction of $q$ and $d_l$ is the scaled distance from $X$ to $l$ in the direction of $q$, i.e., $|q|$.

The uncertainty of the position estimate is represented by a $3{\times}3$ covariance matrix. Each line has a constant uncertainty $L_k$, which is narrow and long along the direction of the line. Uncertainty is updated by recursively averaging the covariance matrices, similar to the process used in the Kalman filter. In the Kalman filter, the uncertainty covariance matrix is updated by performing weighted averaging with the 2D measurement error covariance, as below.
$$K_k = P_k^- H_k^T (H_k P_k^- H_k^T + R_k)^{-1} \qquad (4.10)$$
$$P_k = (I - K_k H_k) P_k^- \qquad (4.12)$$
In the RAC filter, the measurement is modeled as a line that is already in the same 3D space as the estimate. Therefore we can eliminate the Jacobian matrix and its linearization approximation. This is an advantage of the RAC filter over the EKF, simplifying and reducing the computational overhead. Let $P_k^-$ be the uncertainty covariance matrix of the current estimate and $L_k$ be that of the new line; the computation of the updated uncertainty covariance matrix $P_k$ is then simplified as below.
$$K_k = P_k^- (P_k^- + L_k)^{-1} \qquad (4.18)$$
$$P_k = (I - K_k) P_k^- \qquad (4.19)$$
$$= (I - P_k^-(P_k^- + L_k)^{-1})P_k^- = ((P_k^- + L_k)(P_k^- + L_k)^{-1} - P_k^-(P_k^- + L_k)^{-1})P_k^- = (L_k(P_k^- + L_k)^{-1})P_k^- \qquad (4.20)$$
The initial value of $P_k$ is obtained in the same way by replacing $P_k^-$ with $L_k$, resulting from the two initial line uncertainties.
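A minimal sketch of the RAC uncertainty update (equations 4.18-4.20): the line's covariance, long and narrow along its direction, is recursively averaged with the estimate's covariance, with no Jacobian or linearization. The particular along/across variances used to build the line covariance here are illustrative choices, not the values used in the implemented system.

```python
import numpy as np

def line_covariance(direction, along=100.0, across=0.25):
    """Covariance of a measurement line: large variance along the ray,
    small variance across it (the numbers are illustrative)."""
    d = direction / np.linalg.norm(direction)
    return across * np.eye(3) + (along - across) * np.outer(d, d)

def rac_covariance_update(P, L):
    """Equations 4.18-4.20: P_k = (I - K) P_k^- with K = P_k^- (P_k^- + L_k)^-1,
    which simplifies to P_k = L_k (P_k^- + L_k)^-1 P_k^-."""
    K = P @ np.linalg.inv(P + L)
    return (np.eye(3) - K) @ P       # equivalently: L @ inv(P + L) @ P

# Example: two early measurement lines from roughly orthogonal viewpoints
# quickly shrink the uncertainty in the plane they span.
P = line_covariance(np.array([1.0, 0.0, 0.0]))             # first line
P = rac_covariance_update(P, line_covariance(np.array([0.0, 1.0, 0.0])))
```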
4.3. Calibration Result

Both filters appear stable in practice. The EKF is known to have good characteristics under certain conditions [Broida 90]; however, the RAC gives comparable results, and it is simpler, operating completely in 3D world space with 3D lines as measurements. The RAC approach eliminates the linearization processes that the EKF requires with Jacobian matrices.

Fig. 4-4 Synthetic data, zooming camera movement: estimation errors of the EKF and RAC filters for two fiducials, with 0.5-pixel and 2.0-pixel white noise

Fig. 4-5 Synthetic data, panning camera movement: estimation errors of the EKF and RAC filters for two fiducials, with 0.5-pixel and 2.0-pixel white noise

We tested the EKF and RAC filters to find the positions of two new fiducials (referred to as the red and magenta fiducials). We started with three known fiducials, and two cases were tested. In the panning case, the new fiducials were placed to the side of the known fiducials, assuming a translated region of interest. In the zooming case, the new fiducials were placed in the center so that the user could zoom in to a region of interest. Synthetic data sets were created based on real camera poses captured from real camera movements. White noise of maximum 0.5 and 2.0 pixel error was added to the measurements of the image coordinates of the fiducials. See Figs. 4-4 and 4-5 for the results of the experiments with synthetic data sets.

Experiments with real data were performed in both the zoom and pan cases (Fig. 4-7). The estimates of the filters were compared with analytic solutions, which have the minimum sum of Euclidean distances to the input lines. Fig. 4-6 shows a 2D analogy of the analytic solution computation.

Fig. 4-6 2D analogy of the analytic solution: the analytic solution is the point with the minimum sum of distances to the lines; in this case, it is the point X' that minimizes d1 + d2 + d3 + d4

Fig. 4-7 Real data: position estimation errors of the EKF and RAC filters for the zooming and panning cases

A virtual camera view shows the results of the real data experiments graphically (Figs. 4-8 to 4-10). The lines are traces of the camera, i.e., the input lines used for the RAC filter. The large spheres indicate the positions of known fiducials, which were used for computing the camera poses. The dark and bright small spheres represent the estimated positions produced by the RAC and EKF filters, while the black cube represents the analytic solution positions. Figs. 4-8 and 4-9 show how the estimates converge and the uncertainties shrink as the number of measurement lines increases for the pan and zoom cases. Fig. 4-10 graphically compares the filter results with the analytic least-squares solutions. The results show that the estimates of the filters converge quickly and remain stable after convergence with both real and synthetic data.

Fig. 4-8 Pan case: visualizations of the new lines from moving camera views (after 13, 28, 38, and 43 frames) and their effect upon the uncertainty covariance ellipses
4-9 Zoom Case: Visualizations of new lines from moving camera view s and their effect upon the uncertainty covariance ellipses Pan Case: Magenta fiducial Zoom Case: Red fiducial T h e EKF re s u lt (b rig h t sp h e re ) a p p e a rs in fro n t T he EKF re s u lt (b rig h t s p h e r e ), RAC filter re s u lt o f th e a n a ly tic so lu tio n (b lack c u b e ), a n d th e RAC (d a rk sp h e re ), a n d th e a n a ly tic so lu tio n (black filter re s u lt (d a rk s p h e re ) is ju s t to th e ir left. c u b e ) overlap a n d a re n o t s e p a ra te ly d iscernible. Fig. 4-10 Virtual camera views of the results of real data experiments 44 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 5. Camera Pose Computation 5.1. Requirem ents As already mentioned in Chapter 3, an inaccurate and non-robust pose computation method is one of the main sources of propagated errors. In a previous paper [Neumann 1998b], 3-point analytical method was used. This method was First developed by Fischler and Bolles [Fischler 1981] and has been widely used in real-time camera tracking. This method is known to be subject to numerically unstable area (the method can not provide solutions) and to produce multiple solutions. Numerical instability can be tested under simulated test conditions. Because of numerical instability, the method sometimes provided incorrect solutions or could not provide any solution (Fig. 5-1: a repetition of Fig. 3-5). In this test simulation, multiple solution problems were ignored by selecting the closest solutions to the true camera positions. More accurate pose estimates are necessary to reduce the error growth rate in extendible tracking. 25 20 15 10 5 0 -30 Fig. 5-1 Accuracy test of 3-point pose method: Computed camera position error (Vertical axis: RMS in inch) 45 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Our criteria for pose calculation methods suitable for extendible tracking are as follows: • Low computation (< 10 ms/estimate) • Accurate solutions when given accurate data, yet robust solutions in the presence of measurement and calibration errors. The method should also facilitate outlier culling in the presence of gross errors (e.g., incorrectly identified features) • Robustness in the presence of outlying measurements • Adaptive use of available information in a frame. When more information (features) is available in a frame, the method should use it to increase accuracy. When little information is visible, it should make the best estimate and reduce its confidence in the solution Methods that use all available information (N-point methods) are generally robust because errors and noise can be averaged [Dementhon 1995]. In terms of the minimum number of features required for tracking, the lower, the better. However, three or four visible features per frame are consistent with theoretical minimums. 5.2. Pose Com putation M ethods There are roughly three types of vision-based pose computation methods: optimization, filtering, and analytical. Optimization method searches for the optimal solutions iteratively [Lowe 1991][Yuan 1989], This method requires initial estimates and high computation. One more disadvantage is that the solutions provided by this method may be local minima. Filtering has been used in tracking recently [Koller 1997][Welch 1997][Park 1999]. Using filters, temporal information is also integrated in estimating dynamic camera pose. 
However, this method also requires initial estimates. Analytical methods do not require initial estimates, and their 46 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. computational overhead is frequently low. The method suggested by Fischler and Bolles estimates camera pose based on known 3 points [Fischler 1981]. However, as mentioned earlier, this method is subject to multiple solution problem and numerical instability in certain area. There have been other analytical methods that used more than 3 points. Because of using more point(s), these methods may have advantages over 3-point based methods: the number o f multiple solutions may be reduced, and the accuracy and stability may be enhanced. Kamanta et al. developed a pose computation method based on 4-point targets [Kamanta 1992]. However, their method assumed co-planarity, and the resulting pose accuracy was low. Horaud et al. suggested another analytical pose computation method based on 4 known points [Horaud 1989], According to the authors, their method provided fewer multiple solutions, and the solutions were more stable than the 3-point based methods. Recently, there have been quasi-linear or linear pose computation methods that used 4 or more points to provide unique solutions [Quan 1999][Triggs 1999]. Triggs developed a quasi-linear method that was analogous to Direct Linear Transformation (DLT) [Triggs 1999]. By estimating one fewer parameter, the problem was converted into linear null space computation. His method estimated camera pose and calibration parameters at the same time: with 4 points, it estimated pose and the focal length; with 5 points, it estimated pose, focal length, and principal point. However, this method failed for coplanar cases, was sensitive to noise, and required high computation (e.g., the 4-point algorithm required solving a 80x56 matrix). Quan and Lan revisited the 3-point method and developed linear algorithms for 4 or more points [Quan 1999]. Their method performed singular value decomposition (SVD) and generated solution space by right singular vectors with additional constraints. However, by ignoring non-linear dependencies in the solutions space, available information was not fully utilized. According to the relative errors presented in the paper, the accuracy of the algorithms did not seem to be good enough for AR implementation. 47 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Among the pose analytical computation methods that used more than 3 points, Horaud’s method seemed to be most accurate. For this reason, Horaud’s method has been implemented in this section for the purpose of comparing the performances (accuracy and stability) with the 3-point method. According to the initial test, the 4-point method did not have advantages over the 3- point method. Although the use of 4 points reduced the number of multiple solutions, the conversion from 4 points to 3 lines used only partial information. As a result, the algorithm inherited the same 4th degree polynomial as the 3-point method [Quan 1999]. Detailed description on the algorithm and test results can be found in Appendix A. 
Because analytical method is the only type of pose computation method that does not require initial estimates, and the 4-point based methods did not have advantages over the 3-point based methods, the strategy o f developing a new pose computation method was to enhance the performance of the 3-point based method. However, there are disadvantages of the analytical 3- point method: multiple solution problem, instability, and poor noise filtering. In the rest of this chapter, a new pose computation method based on the 3-point method [Fischler 1981] is described. This new method was designed to overcome the disadvantages of the base method (3- point method). 5.3. A New M ethod: RA3 (Robust A verages o f 3-Point Solutions) To address the multiple solution problems and instabilities of the simple 3-point pose method, the algorithm performed robust averaging of 3-point solutions. First, the features whose 3D position uncertainties are below a threshold were selected. (Note that these features may have been calibrated off-line or dynamically on-line.) Second, the feature positions in the image were analyzed to select a set o f evenly distributed features. Following six features were selected: four 48 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. features closest to the comers of the image and two features closest to the center o f the left and right halves of the image. A maximum of twenty triples from these six points was robust- averaged with outlier culling. As in Fig. 5-1, there are pose outliers due to numerical instability. For robust least-square solutions, the M-estimator was suggested [Huber 1981] [Simon 1998]. Instead of the least-square method, a real-time approximation of Huber’s M-estimator was used. Details are in Appendix A. After the robust M-estimator is computed, a linear Kalman filter applies temporal smoothing. In this case, the measurement and the Kalman filter state have the same dimension, and the measurement equation and process equation are linear (Fig. 5-2). Currently, a simple dynamic equation with 0 acceleration is to effect smoothness. This pose solution makes use of both spatial (by feature distribution) and temporal (by smoothing using a linear Kalman filter) information. 3x1.3x3 R otation conversion _______ 5;______ [cTranslation. rotations 3x1. 3x1 A dd velocities 12x1 Cam era Pose (translation, rotation m atrix) Z: <translation. translational vel.. O rientation, rotational vcl.> x ; = ai&)Xm Pt~ = A(St)Pk.,AT(5t) + Q(8t) Z = X ' K = P~{P~ +RU))~' z = z - z X = X~ + KZ P = {I-K)P- M easurem ent=state: H=identity Fig. 5-2 Linear Kalman Filter for temporal smoothing 49 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Fig. 5-3 to 5-5 shows how the pose solutions is improved by averaging, robust M-estimation, and Kalman filtering. Gaussian noise (c = 0.5-pixel) was added to the measurement o f simulation estimator removes the effects of incorrect correspondences, and performs outlier (e.g., gross error) culling. Even with correct data and no outliers, the result was improved in many frames (Fig. 5-4) showing reduced sensitivity to noise. Lastly, the linear Kalman filtering smoothed the camera pose enhancing the camera position accuracy (Fig. 5-5). Because camera orientation is calculated based on camera position and feature correspondences, the orientation accuracy depends on the position accuracy in this and many other pose calculation methods. 
Thus, the charts for orientation are not presented. The improvements are summarized in Table 5-1 with averages and standard deviations of errors. These error statistics show the benefits o f averaging, applying robust M-estimator, and Kalman filtering. data. The accuracy of the 3-point method was greatly enhanced by averaging (Fig. 5-3). The M- Improvement by Averaging 2.5 3 8 2 3 S 3 n « » ^ ^ ^ n « n n n « F r a m e n u m b e r Fig. 5-3 Improvement by Averaging Improvement by M-estimator 0.7 — A v e r a g e s — 0.6 tn 2 n A a o m i o S ^ n e S F r a m e N u m b e r S . o.s Fig. 5-4 Improvement by M-estimation 50 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Im provem ent by using Linear Kalman Filter 0 .7 -r - F r a m e N u m b e r Fig. 5-5 Improvement by temporal smoothing with Kalman filtering 3-point Avg. M-est. w/KF Mean 0.350 0.170 0.156 0.150 a 0.380 0.092 0.091 0.075 Table 5-1. Improvement in averages and standard deviation in errors 5.4. Pose Com putation E xperim ents and Results We performed synthetic data experiments to show that the new method fulfills the pose computation method criteria of section I. For synthetic camera motion (6DOF pose sequences) generation, two methods were used: a mechanical digitizer or keyframe interpolations (of viewpoints and look-at points). Gaussian noise of various standard deviations was added to the measurements. The average computation times (with 6-14 points in view) for RA3 was 3.6 ms. Considering 30-70 ms for image analysis and 25-40 ms for virtual object rendering, computational overhead of RA3 is small, and the performance of the whole process was - 8-14Hz on a 450 MHz Pentium CPU. The accuracy was tested by comparing the projections of 3D- points using true camera pose and estimated camera pose. Two 3D-points were projected in 500 51 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. test frames. The averages of projection errors are shown in Table 5-2 for two measurement-noise levels. a (pixel) Average error (pixel) 0.25 0.55 0.5 1.02 Table 5-2. Pose feature projection accuracy 5.4.1. D ifferent N oise Levels Computed camera pose solutions were compared with true values. Measurement noise levels were a = 0.25 and a = 0.5 pixel. Fig. 5-6 shows the camera position and orientation errors. Comparisons (Position): w/ different noise level F r a m e N u m b e r Fig. 5-6 a — Camera position error for two noise levels with different noise level — SO: 0 25 0.4 f so OS a tr 0.3 o tram a nunb*< Fig. 5-6 b — Camera position error for two noise levels: Zoomed view 52 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Comparison (Orientation): with different noise F r a m e n u m b e r Fig. 5-6 c - Camera orientation error for two noise levels Fig. 5-6 Camera position and orientation errors for two measurement noise levels 5.4.2. Processing D ifferent Quantity of Points Processing different numbers of points affects the pose accuracy. Fig. 5-7 shows pose solutions using 3, 4, and 7 points between frames 241-280. As more features were used, the camera position errors were reduced. with different num ber of points: RA3 — 3 points — 4 points — 7 points 0.6 4 f/v ® 0.2 frame number Using 3, 4, and 7 points: Close view in 241s t~280t t > frame Fig. 5-7 Tracking with different number of points 53 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 
5.4.3. Managing Outliers Vision-based methods are subject to outlier problems. Fig. 5-8a is an example of a gross error resulting from incorrect fiducial identification (the cross mark on a pencil sharpener in the center of the image indicates that the pencil sharpener was detected as a Fiducial). Measurement outliers were added (in addition to cr=0.5 pixel Gaussian noise) to test the robustness of the method (one feature outlier in frames 42-50 and frames 150-162, two feature outliers in frames 343-344; outlier displacements were 100-250 pixels). The proposed method implements an approximated robust M-estimator (Appendix A.) and manages the outlier cases. As a result, the estimates were robust in the presence of outliers (Fig. 5-8b). Fig. 5-8 a - An example of incorrect fiducial identification O utlier E ffect 0.5 RA3: with Outliers RA3: without Outliers Frame number Fig. 5-8 b — RA3 is not affected by outliers 54 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Outlier Effect: Zoom _ 0.4 0.1 & & & & & J o * & frame number Fig. 5-8 b (continued) Fig. 5-8 Presence of outliers 5.4.4. Sudden C am era M otion Sudden camera motion is generated to test the convergence stability of the RA3. The proposed method is stable providing pose solutions nearly identical to the true values. Fig. 5-9 shows the camera X coordinates. The pose solutions of the method are very similar to the true pose even with sudden camera motion. Stability of other coordinates is similar to X coordinate. C am era X C oordinates 25 — True X 20 — RA3: X *- a r* a s c o a r * * O ! a F ra m e n u m b e r Fig. 5-9 Stability under sudden camera motion: camera X coordinates 55 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 5.4.5. Real Experiment A sequence of images was obtained and digitized off-line to compare RA3 with the 3-point method. The image contains 15 multi-ring fiducials [Cho 1999], and the virtual objects include a torso of Venus, a virtual window, and annotations (Fig. 5-10). The re-projection errors between the measurement and projection o f fiducials were computed (Fig. 5-11). The errors were predominantly under 1.0 pixel. In this experiment, all the features were calibrated off line, not involving autocalibration. a — fiducial placement b - virtual object overlay Fig. 5-10 Real environment and virtual object overlay for experiment with real data 56 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. R eal D ata: fiducial re-projection error 1.5 o M a. c r i 2 c 5 c 2 .2. o Q . < S t r 0.5 r » » c n « m oo o F ra m e n um ber Fig. 5-11 Re-projection errors in real data experiment 5.5. Chapter Sum m ary and D iscussion The criteria for vision-based pose computation methods (for supporting dynamic tracking extension) were defined in the beginning of this chapter. In summary, the proposed method is real-time, robust, accurate, and n-point-based. RA3 involves low computation (computation time - 3.6 ms); robust (by applying robust M- estimator) in response to sudden camera motion and in the presence of outliers; accurate with - 1.0 pixel re-projection error (<j=0.5 pixel measurement noise); capable of using a wide range of points (3-6 points). Applying averaging is also advantageous when the result (pose) is applied to a Kalman filter for temporal smoothing because the averaged result is Gaussian according to “Central Limit Theorem”. 
6. Outlier Culling with Robust Statistics

6.1. Robust M-estimator

Some of the most common statistical procedures (in particular, those optimized for an underlying normal distribution) are sensitive to seemingly minor deviations from the assumptions. Many researchers have proposed robust statistics to resolve these problems [Huber 1981]. Huber proposed "robust" procedures with a primary interest in distributional robustness (the shape of the true underlying distribution deviates slightly from the assumed model, the Gaussian law), while much less is known about deviations from the other standard assumptions. The desirable features of a robust statistical procedure are, first, reasonably good efficiency (optimal or close to optimal); second, robustness (small deviations from the model assumptions should impair the performance only slightly); and third, stability (larger deviations from the model should not cause a catastrophe).

There are three types of robust estimators: the M-estimator (maximum-likelihood type estimator), the L-estimator (linear combinations of order statistics), and the R-estimator (derived from rank tests). The M-estimator is the most flexible and the easiest to handle, and it generalizes straightforwardly to multi-parameter problems. For one-parameter location problems the L-estimator is attractive; however, its mean has poor breakdown properties in many cases. Because of these advantages, the robust M-estimator has been widely used in least-squares estimation [Huber 1981][Simon 1998]. For a real-time system, an approximated version can be used to keep the computation low.

6.2. Real-time Approximation

Generally, robust M-estimators require searches to find the M-estimates, which is not appropriate for real-time applications. However, an approximation can be obtained with less computation. If we apply the M-estimation technique using Huber's function rho(x), the minimization equation is

p^ = arg min_p SUM_i rho(p - p_i)

This equation is difficult to evaluate analytically, requiring a search, because conditions such as p - p_i > c depend on the unknown value of p. However, because p is approximately the average p_bar, we can approximate the conditions using the average:

p^ ~ (1/n) [ SUM_{|p_bar - p_i| <= c} p_i + SUM_{p_bar - p_i > c} (p_bar - c) + SUM_{p_bar - p_i < -c} (p_bar + c) ],  where c = k*sigma

Consequently, we can use the pseudo-measurements suggested by Huber. To devise a robust algorithm that can easily be patched into existing programs, Huber suggested pseudo-observations; for example, in least-squares fitting:

Let y^_i be the fitted value of y_i, r_i = y_i - y^_i the residual, and s_i the standard error of y_i (or of r_i). The pseudo-observation y*_i is defined as

y*_i = y_i              if |r_i| <= c*s_i
y*_i = y^_i - c*s_i     if r_i < -c*s_i
y*_i = y^_i + c*s_i     if r_i > c*s_i

The constant c regulates the amount of robustness; a good choice is between 1 and 2 (e.g., c = 1.5). However, a method is needed to differentiate between outliers (e.g., from multiple solutions) and contaminated measurements (e.g., from measurement noise). This can be achieved by throwing away measurements with |r_i| > c_o*s_i, where, e.g., c_o = 4.
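Below is a minimal Python sketch of the pseudo-observations and the hard rejection just described; the function name and the separate keep-mask return value are illustrative choices, not part of the dissertation's implementation.

import numpy as np

def pseudo_observations(y, y_hat, s, c=1.5, c_o=4.0):
    # Clip residuals to +/- c*s (Huber pseudo-observations) and reject gross
    # outliers whose residual magnitude exceeds c_o * s.
    y, y_hat, s = (np.asarray(a, dtype=float) for a in (y, y_hat, s))
    r = y - y_hat
    y_star = np.where(r > c * s, y_hat + c * s,
             np.where(r < -c * s, y_hat - c * s, y))
    keep = np.abs(r) <= c_o * s
    return y_star[keep], keep

The clipped values feed a standard least-squares or averaging step unchanged, which is what makes the scheme easy to patch into an existing estimator.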
6.3. Adapting into a Kalman Filter: Use of the a priori Covariance

The approximated real-time version of the robust M-estimator can be adapted into a Kalman filter as a pre-processor that determines whether to accept or reject a measurement, or to use a pseudo-observation. The statistics required for M-estimation are already available: the state serves as the mean and the error covariance matrix as the standard deviation. Broida et al. suggested measurement outlier culling (without using robust statistics) by projecting the error covariance matrix into the measurement space [Press 1993][Broida 1990].

The following are the equations for adapting Huber's function of the robust M-estimator to a Kalman filter. Let the state of the filter be X (the average) and the error covariance matrix be P. Assume:

R_s : standard measurement noise matrix
Z : measurement
Z^ : predicted measurement
c_i : threshold for the inner outlier boundary
c_o : threshold for the outer outlier boundary

Then the uncertainty of the measurement is H*P*H^T + R_s.

Fig. 6-1 Outlier boundaries using the a priori covariance

To perform the M-estimation and also cull completely wrong outliers (e.g., a marker cap detected as a fiducial in new point position estimation), the measurements are classified as follows, with r the residual between the measurement and the predicted measurement, and treated differently:

Outliers: r > c_o*(H*P*H^T + R_s) - throw away the measurement.
Contaminated measurements: c_i*(H*P*H^T + R_s) < r <= c_o*(H*P*H^T + R_s) - use a pseudo-measurement.
Normal measurements: r <= c_i*(H*P*H^T + R_s) - use the measurement as is.

After applying the M-estimation algorithm, the measurement is used as an input to the Kalman filter. When a pseudo-measurement is used, the measurement noise matrix needs to be adjusted accordingly to differentiate, e.g., between measurements close to and far from the inner boundary.

Fig. 6-2 M-estimation with camera pose outliers (a, b: using the pseudo-measurements suggested by Huber; c, d: with and without outlier culling; numerically stable and unstable areas)

The suggested measurement noise matrix is R = r_z*r_z^T + R_s, which makes the Kalman gain K smaller when the distance r_z is bigger, so that the contribution of the measurement is smaller: i.e., R acts as a weight. Suggested values of the threshold constants are c_o = 3-4 and c_i = 1.5.
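The three-way classification above can be written compactly for a scalar measurement channel. In the sketch below, the thresholds are read as multiples of the innovation standard deviation derived from H*P*H^T + R_s, and the noise inflation for pseudo-measurements follows the R = r_z*r_z^T + R_s weighting suggested above; both readings, and the function itself, are illustrative assumptions rather than the dissertation's exact implementation.

import numpy as np

def classify_measurement(z, z_pred, S, R_s, c_i=1.5, c_o=4.0):
    # Classify a scalar measurement by its innovation against the a priori
    # uncertainty S = H P H^T + R_s. Returns (use, z_used, R_used).
    r = z - z_pred
    sigma = np.sqrt(S)                         # 1-sigma innovation bound
    if abs(r) > c_o * sigma:                   # outlier: discard
        return False, None, None
    if abs(r) > c_i * sigma:                   # contaminated: Huber pseudo-measurement
        z_star = z_pred + np.sign(r) * c_i * sigma
        r_z = z - z_star                       # remaining distance to the clipped value
        return True, z_star, R_s + r_z * r_z   # inflate R so the Kalman gain shrinks
    return True, z, R_s                        # normal measurement

The accepted measurement and its (possibly inflated) noise value are then passed to the ordinary Kalman update step.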
6.4. Applications and Results

Fig. 6-2 shows camera pose solution data from two synthetic experiments: a numerically unstable area for 3-point solutions and a numerically stable area. The pose computation method was a combination of 3-point solutions. Fig. 6-2a and Fig. 6-2b compare pose computation with and without using pseudo-observations. Using pseudo-observations, the camera position errors (errors from the true camera positions) were reduced by as much as 30%. In Fig. 6-2c and Fig. 6-2d, the measurements considered complete outliers (|r| > c_o*s_i) were rejected, resulting in even better accuracy.

Fig. 6-3a is a repetition of Fig. 3-6a showing the effect of feature outliers on new feature position estimation. Fig. 6-3b is the result from the same data with M-estimation embedded in the Kalman filter. The result was compared with new feature position estimation from data without outliers, and it is as good as the result for data without outliers. This is because M-estimation uses pseudo-observations to refine the measurements, not only to cull the outliers.

Fig. 6-3 M-estimation with feature measurement outliers (a: with and without outliers; b: with outliers, with and without M-estimation)

7. System Integration and Results

7.1. Integration with Natural Feature Tracking

In this experiment, pose computation and dynamic feature calibration were integrated with the natural feature tracking system. A user-specified number of natural features was detected in the initial image and tracked in the extended images. The camera pose was initially computed based on three fiducials, calibrating the natural features. As the fiducials went out of view, or when one or two of the fiducials were not detected, natural features were used for camera pose computation. The image stream generated by a camera was directly digitized using a video editing system for off-line computation. Because natural feature tracking was not performed in real time (~5 Hz), the whole process was performed off-line, but without manual intervention. The sampling rate was 15 Hz, resulting in about 250 frames from a 17-second video sequence.

Fig. 7-1 shows the result of natural feature tracking. Twenty features were detected in the first frame, twelve of which were selected for tracking; the others were rejected for being too close to fiducials or to the other selected features. Even in the 250th frame, all the features were accurately tracked except for those that had left the screen.

Fig. 7-1 Results of Feature Detection and Tracking (feature detection: 4th frame; feature tracking: 250th frame)

Fig. 7-2 shows the result of feature position estimation. Each chart indicates the convergence of the X, Y, and Z coordinates of the 3D positions of the natural features. They converged quickly (at about the 90th frame, i.e., in 6 seconds) and stayed stable after the convergence. It is noticeable that the initial estimates of the Z coordinate were less accurate than those of the X and Y coordinates, because depth is more difficult to recover.

Fig. 7-2 Convergence of 3D positions of Natural Features (X, Y, and Z coordinates of the natural features)
Fig. 7-3 shows the result of camera tracking and virtual object and annotation overlay. The bigger dark circles indicate the projections of the calibrated (in the case of fiducials) or estimated (in the case of natural features) 3D positions. The smaller bright circles indicate the measurements resulting from natural feature tracking. The bright crosses indicate fiducials or natural features that were used for tracking. In cases where fewer than three fiducials were detected, 4 features (either tracked natural features or detected fiducials), one close to each corner of the image, were selected to compute the camera pose. In this experiment, the 3-point pose computation method was used.

Fig. 7-3 Result of Virtual Object/Annotation Overlay (a: early frame; b: 250th frame; c: 96th frame; d: 97th frame)

Fig. 7-3a shows the early stage of tracking and natural feature 3D-position estimation. There are noticeable differences (about 5-7 pixels) between the projections of the estimated 3D positions and the screen coordinates resulting from natural feature tracking. As the frame number increased, the estimates of the 3D positions and the screen coordinates from natural feature tracking became closer (about 1-2 pixels), as can be seen in Fig. 7-3b.

Fig. 7-3c and 7-3d show how the tracking primitives for camera pose calculation were switched from fiducials to natural features when one of the fiducials was not detected. There were also frames where a mixture of fiducials and natural features was used for pose calculation.

7.2. RA3 with Dynamic Feature Calibration

The new pose computation method, RA3, was tested with the dynamic calibration experiment of Fig. 3-2 (repeated in Fig. 7-4). Tracking was started with 6 calibrated features, and dynamic calibration of 94 uncalibrated features was done in a 100"x30"x20" volume. The propagated errors were significantly reduced for RA3 compared to the simple 3-point method (Fig. 7-5). These results indicate that it may be feasible to use autocalibration over a long term and a large area with modest error propagation, offering greater freedom of mobility and accuracy to the users of vision-based tracking systems.

Fig. 7-4 Synthetic experiment setting for propagated errors in dynamic feature calibration (start with 6 calibrated features; 94 uncalibrated features; camera motion)

Fig. 7-5 Propagated camera position error with dynamic calibration (3-point method vs. RA3)

8. Propagation Error Model of Extendible Tracking

Because of the large number of parameters and the inter-dependencies between the parameters, precise prediction of the propagation error is difficult and would require intensive research. However, even a simple version of the propagation error model may be useful for predicting the errors in extendible tracking applications: an error model provides the maximum possible range (i.e., scene length) over which the tracking can be extended within a certain registration error limit. The propagation error model can be designed based on the identification of the error sources (i.e., parameters) and the determination of the relation between each parameter and the propagation error.
These relations are difficult to determine through real experiments because, in real experiments, the effect of one parameter is difficult to isolate from the effects of the others. In this chapter, simulated experiments were performed to determine the relations between the parameters and the propagation error. Although there may be inter-dependencies or correlations between the parameters, most parameters are assumed to be independent for the simplicity of the propagation error model. The candidate parameters that may affect the propagation error are as follows:

• Measurement noise level (standard deviation: sigma_m): Measurement noise (in screen space, in pixel units) is one of the major sources of errors. It is mainly due to the limited resolution of the camera, but it also depends on the feature detection algorithm.

• Average camera depth (z_bar): Close views are expected to give more accurate tracking than far views.

• Feature density (D): Higher feature density increases the total number of features and the number of in-view features per frame.

• Average number of in-view features (n_v): More in-view features are expected to enhance the tracking stability and accuracy.

• Number of initially calibrated features (n_c): More initially calibrated features are also expected to enhance the tracking stability and accuracy, just as n_v is. The initially calibrated features are assumed to be clustered at one end of the scene object in the simulated experiments.

• Scene length (l_v): The error increases as the scene length increases. For a scale-free scene length, the total number of features divided by the average number of in-view features was used. This parameter is basically the scene length in units of the Field Of View (FOV) length, represented as a multiple of the average number of in-view features (n_v). Using this parameter, the scene length can be represented in unit-less form.

• Camera motion (M): The propagation error may also depend on the camera motion. However, determining the effect of the camera motion on the propagation error is difficult.

• Feature placement (F): Even with the same number of features, a different arrangement of the features may result in a different magnitude of propagation error. However, feature placement is non-controllable, as is camera motion.

• Number of frames (f): This factor affects the dynamics of the camera motion; for the same motion, a larger number of frames corresponds to slower camera motion.

• Feature uncertainty threshold (t_u): The feature uncertainty threshold determines when a dynamically calibrated feature becomes eligible for camera tracking. A large uncertainty threshold enables the early use of features in tracking, which may result in inaccurate pose. A smaller uncertainty threshold restricts the use of a feature until its uncertainty is smaller than the threshold, which may result in fewer calibrated features and even in loss of tracking.

Although there may be other parameters, considering every single parameter is not feasible when building an error model. For the parameters listed above, the effect on error propagation was determined empirically through simulated experiments. The propagation error model may depend on the tracking algorithm; different tracking algorithms may have different propagation error models.
In the following experiments, RA3 (Robust Averages of 3-point Solutions) was used as the tracking algorithm. However, a similar process can be applied to different tracking algorithms to obtain their propagation error models.

8.1. Experiment Design

8.1.1. Grid Experiments

For some parameters (e.g., the measurement noise level), sequential camera motion is not necessarily required to determine the effect on the tracking error. For these parameters, grid experiments were also performed. Because a grid experiment, being simpler, involves fewer parameters, the effect of one parameter is more clearly determined. In the grid experiments, the true camera position was iterated over a 3D grid (X = -30~30, Y = -30~30, Z = 40~50, step = 5.0 or 2.0). The image coordinates of the feature projections were obtained from the true camera pose. Gaussian measurement noise was added to each image coordinate, assuming that the measurement noise is iid (independent and identically distributed). Fig. 8-1 shows the experiment design for the grid experiments.

8.1.2. Extendible Tracking Experiments

For most parameters, the effect on the propagation error was determined through simulated extendible tracking experiments. In an experiment to determine the effect of a specific parameter, the values of the other parameters were fixed to isolate the effect of that one parameter. For the factors that are difficult to parameterize (e.g., feature placement and camera motion), the effect was minimized by randomization.

Fig. 8-1 Grid Experiment Design (grid of true camera positions, X = -30~30, Y = -30~30, Z = 40~50; camera, look-at point, features)

8.1.2.1. Feature Placement (F)

The arrangement of the features affects the camera pose estimation and feature calibration errors, and hence also the propagation errors. However, the effect of this parameter is difficult to determine. The arrangement of the features may also be a user decision (i.e., where to extend the tracking). Therefore, feature placement is disregarded as a non-parameter; rather, the feature placement was randomized in the simulated extendible tracking experiments, as sketched below.

Fig. 8-2 Parameters in Feature Placement (average distance between features, d_bar)

Fig. 8-3 Probability density of the random variable used for feature placement

The features were placed evenly, yet with randomness. The randomness was added by applying the Monte Carlo method for determining the propagation error parameters. The variables in feature placement include the tracking range dimension (D), the feature size (r), the feature placement density (m), and the feature displacement range (l). The tracking range dimension (D) was a 3D volume whose longest dimension was in the X (horizontal) direction; the Y-axis was in the vertical direction and the Z-axis in the depth direction (Fig. 8-2). Even though the scene was 3D, the feature positions were determined in 2D: given the X and Y coordinates, the Z coordinate was determined either from the scene object surface or randomly within the 3D volume. To control the feature placement density (m), the average distance between features (d_bar) was used.
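A minimal Python sketch of this randomized placement is given below: features are centered on a 2D grid of spacing d_bar and then displaced by uniform noise in [-l, l], with l taken as a fraction of the maximum displacement l_max = d_bar/2 - r defined in the next paragraphs. The region size, spacing, and radius used in the example are illustrative values, not the experiment's settings.

import numpy as np

def place_features(x_extent, y_extent, d_bar, r, frac=0.8, rng=None):
    # Randomized feature centers on an x_extent-by-y_extent region:
    # a regular grid of spacing d_bar plus uniform displacement in [-l, l].
    rng = np.random.default_rng() if rng is None else rng
    l = frac * (d_bar / 2.0 - r)                  # displacement range, l = frac * l_max
    xs = np.arange(0.0, x_extent + 1e-9, d_bar)
    ys = np.arange(0.0, y_extent + 1e-9, d_bar)
    gx, gy = np.meshgrid(xs, ys)
    grid = np.stack([gx.ravel(), gy.ravel()], axis=1)
    return grid + rng.uniform(-l, l, size=grid.shape)

# Example: a 100 x 30 inch surface with features spaced about 8 inches apart
features = place_features(100.0, 30.0, d_bar=8.0, r=1.0)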
These two variables (i.e., D and d_bar) determined the total number of features and the 2D-grid points on which the features were centered when no randomness was involved. The features were assumed to be circles with a fixed radius (feature size = r). The feature size (r), combined with the feature displacement range (l), defined how far the features could be placed away from the grid points. The feature displacement range (l) was 0 ~ l_max, where

l_max = d_bar/2 - r

For randomized feature placement, uniformly distributed random variables were used, taking values in the interval from -l to l. The probability density function of the random variables is shown in Fig. 8-3. If l = 0, the features are placed exactly on the grid points (Fig. 8-4a). The features could be adjacent to neighboring features if l = l_max (Fig. 8-4b). Frequently in the experiments, 0.8*l_max <= l <= 0.9*l_max (Fig. 8-4c).

Fig. 8-4 Feature displacement range (a: l = 0, features are placed on the grid points; b: l = l_max, features may be adjacent; c: l = 0.8*l_max)

8.1.2.2. Camera Motion (M)

As mentioned earlier, camera motion is difficult to parameterize; it is difficult to determine what type of camera motion results in less propagation error, or to predict the degree by which one of two camera motions is more accurate than the other. In AR, a user's head motion is also not controllable: freedom of motion needs to be given to the users. Therefore, camera motion is disregarded as a non-parameter. In the simulated experiments, camera motions were randomly generated within certain boundaries so that a minimum number of calibrated features were in view throughout the tracking sequence. In this camera panning experiment, a typical camera motion is as follows (Fig. 8-5): first, the camera looks at one end of the scene object; second, the camera pans along the longest dimension of the scene object; lastly, the camera moves to the other end of the scene object.

Fig. 8-5 A Typical Camera Motion (scene object along the X-axis; initially calibrated features at one end; uncalibrated features elsewhere)

For realistic experiments, randomness was added to the camera motion. The camera motion was controlled by key-frames of the camera viewpoints and look-at points. The Y and Z coordinates of the camera viewpoints and look-at points were randomly generated within certain boundaries. The X coordinates of the camera viewpoints and look-at points were panned along the X-axis of the scene object. Randomness was also added to the key-frames of the X coordinates so that they were not uniformly spaced. The key-frames of the viewpoints and look-at points were smoothed using a polynomial-smoothing algorithm.

8.1.2.3. Scene Length

As mentioned earlier, the scene length was measured in units of the FOV length (l_v), which is scale-free. At the beginning of the tracking sequence, l_v = 1, increasing as the camera pans (Fig. 8-6). Because the error increases as the scene length increases, the propagation errors were measured in each scene length interval; in these experiments, the interval was 0.2 l_v.
Each of the 5 parameters (initial number of features, feature uncertainty threshold, average number of in-view features, number of frames, and measurement noise level) was tested to determine its effect, measuring the RMS registration errors at the end of each interval.

Fig. 8-6 Scene length increase

8.1.2.4. Initially Calibrated Features

The initially calibrated features were automatically selected from among the features. Given the feature placement and the camera motion, the simulator system counted the number of feature occurrences in the first 30-50 views. A certain number of the most frequently viewed features were selected as the initially calibrated features. Because the first camera views were at one end of the scene object, the selected features were clustered at that end of the scene.

8.1.2.5. Measurement Noise Level

In the simulated extendible tracking experiments, noise was intentionally added to the projected image coordinates. The image coordinates of the features were computed using the true camera pose and then contaminated by measurement noise to simulate the real situation, in which measurements involve noise. The measurement noise was implemented using Gaussian random variables, with the noise level defined by the standard deviation (in pixel units) of the Gaussian function.

8.1.2.6. Default Parameter Settings

The parameters include the number of frames (default = 1500), the feature uncertainty threshold (default = 0.03 inch2), the measurement noise level (default sigma = 0.5 pixel), the number of initially calibrated features (default = 6-7), and the average distance between features (default = 5-10 inches). The parameters that were not relevant to a specific experiment were fixed, and the default values were used unless otherwise specified.

8.1.2.7. Error Presentation

The propagation errors were measured in pixel units because it is the registration error that actually matters in AR. For each parameter, 100 experiments were performed, and the RMS errors measured in each scene length interval are presented.

8.2. Simulated Experiments

In this section, the candidate parameters that may affect the propagation error were tested through simulated extendible tracking experiments. For some parameters (the measurement noise level and the number of initially calibrated features), grid experiments were also performed.

8.2.1. Number of Frames (f)

The number of frames over a fixed object (fixed length) affects the dynamics of the camera motion. A larger number of frames results in slower motion; a smaller number of frames, in faster camera motion. The number of frames has two effects on the propagation error. With a larger number of frames, the camera motion is slower, resulting in smaller process noise and smaller propagation error. On the other hand, a larger number of frames requires a longer tracking sequence, which may result in larger propagation error.

Fig. 8-7 The Effect of Number of Frames (a: RMS error, b: standard deviation, for 1000-2000 frames vs. scene length l_v)
In this simulated extendible tracking experiment, the other parameters were fixed, with changes only in the total number of frames (1000-2000). The experiments were performed 100 times with randomly generated feature placements and camera motion trajectories. The number of frames did not seem to significantly affect the propagation error (Fig. 8-7); the two contradictory effects (positive and negative) seemed to cancel each other. The propagation error increased with a larger number of frames, but not always (e.g., f = 1800, 2000). Because the effect was non-uniform and insignificant, the number of frames was disregarded as a non-parameter.

8.2.2. Uncertainty Threshold (t_u)

The feature uncertainty threshold is used to determine when to begin using a feature for camera pose computation. For each of 6 uncertainty threshold values (0.03-0.08 inch2), 100 simulations were performed with different feature placements and camera motion trajectories. All the other parameters were fixed. The errors increased as the uncertainty threshold value increased, but not uniformly (Fig. 8-8); sometimes the errors were smaller with larger uncertainty thresholds (e.g., when t_u = 0.07 and 0.08). The standard deviations were also high. Because of the highly varying and non-uniform relation between the uncertainty threshold and the propagation error, the uncertainty threshold was disregarded; rather, the uncertainty threshold is settable. Although smaller uncertainty threshold values seem to result in smaller propagation errors, values that are too small may result in loss of tracking when there are fewer than 3 (initially or dynamically) calibrated features. Loss of tracking may be avoided by dynamically selecting the uncertainty threshold values.

Fig. 8-8 The Effect of Uncertainty Threshold (a: RMS error, b: standard deviation, for thresholds 0.03-0.08 inch2 vs. scene length l_v)

8.2.3. Number of Initially Calibrated Features (n_c)

The number of initially calibrated features had a limited effect on the camera pose accuracy and hence on the error propagation. According to the grid experiment, as the number of features increased, the camera position error decreased, but only slowly after a certain point (Fig. 8-9). From the curve, a reasonably good number seemed to be 6, providing a rationale for using 6 features in the RA3 implementation. In the extendible tracking experiment, every parameter except the number of initially calibrated features was fixed. The experiment was repeated 100 times, randomizing the feature positions and the camera motion; the number of initially calibrated features was 3-8. In extendible tracking, the number of initially calibrated features seemed to have a weaker relation with the propagation error than in the grid experiment. As n_c increased, the registration error did not necessarily decrease (Fig. 8-10). The standard deviation was also relatively large, and the relation was not uniform (e.g., when n_c = 6 and 7). Therefore, n_c can be disregarded as a non-parameter; rather, an effective number of initially calibrated features can be suggested (e.g., n_c = 6).
Fig. 8-9 The Effect of the Number of Initially Calibrated Features in the Grid Experiment (average camera position error vs. number of features, 3-17)

Fig. 8-10 The Effect of Number of Initially Calibrated Features (a: RMS error, b: standard deviation, vs. scene length l_v)

8.2.4. Average Camera Depth (z_bar), Feature Density (m), and Average Number of In-view Features (n_v)

These three parameters (z_bar, m, and n_v) are correlated. The average number of in-view features is directly related to the feature density (i.e., a higher density results in a larger number of in-view features). The average camera depth is also related to the number of in-view features: as the camera moves farther away, the number of in-view features increases quadratically. Therefore, the relations can be summarized in the following equation:

n_v is proportional to m*(z_bar)^2

Because of the inter-dependencies, only two of the three parameters can be selected for the propagation error model. The question is whether the effect of n_v (only one parameter) can be used to represent the effects of m and z_bar. The average camera depth (z_bar) changes the scale in camera pose estimation and feature calibration: with the same measurement noise level, a larger camera depth results in a larger calibration error. However, in AR, the scale factor in camera pose and feature calibration error can be disregarded because only the registration error (in pixel units) matters. Therefore, only n_v need be chosen for the propagation error model if the model represents the registration error.

8.2.5. Average Number of In-view Features (n_v)

In this experiment, the average number of in-view features (n_v) was tested to determine its effect on the propagation error. As mentioned in the previous section, the effect of n_v may represent the effects of the feature placement density (m) and the average camera depth (z_bar). The other parameters, including the number of frames, the measurement noise level, and the feature uncertainty threshold, were fixed. The value of n_v was controlled through the feature density because n_v, which is averaged over the tracking process, is difficult to control directly (i.e., the values of n_v are not whole numbers).

The result was counter-intuitive. The intuitive expectation is that as n_v increases, the propagation error decreases. However, the positive effect was limited, as it was for the number of initially calibrated features; for n_v > 6, the error may have been reduced, but only by a small degree. On the contrary, the calibration errors (over a large number of features) seemed to degrade the camera pose accuracy and increase the propagation error (Fig. 8-11). The propagation error also did not increase uniformly as n_v increased. Because the result was non-uniform, with two contradictory factors, and the result may vary with different pose computation methods, n_v was disregarded.
Fig. 8-11 The Effect of n_v (a: RMS error, b: standard deviation, for n_v = 9.81-17.82 vs. scene length l_v)

8.2.6. Measurement Noise Level (sigma_m)

Measurement noise is one of the major sources of error in camera pose estimation and feature calibration. To determine the effect of the measurement noise level on the propagation error, two types of experiments were performed: grid and extendible tracking experiments.

In the grid experiment, the standard deviation of the measurement noise was 0.1-1.0 pixel. For each measurement noise level, 100 different feature placements (with 8 features) were generated. For each feature placement, the true camera position was iterated over ~500 grid points to calculate the projected image coordinates. Noise was added to the image points and the estimated camera pose was then computed. The true camera positions and the estimated camera positions were compared to determine the relation between the measurement noise level and the camera pose error. The relation was close to linear for both the average and the standard deviation (Fig. 8-12).

Fig. 8-12 The Effect of Measurement Noise on Camera Position Error: Grid Experiment (average and standard deviation vs. Gaussian measurement noise standard deviation)

In the extendible tracking experiment, the camera was panned for ~1500 frames. For each of 5 Gaussian measurement noise levels (standard deviation = 0.1-0.5 pixel), 100 simulations were performed with randomly generated feature placements and camera motion trajectories. All the other parameters were fixed.

Fig. 8-13 The Effect of Measurement Noise (a: RMS error, b: standard deviation, for noise levels 0.1-0.5 pixel vs. scene length l_v)

The relation was similar to that of the grid experiment. The registration errors and the standard deviations increased almost uniformly as the measurement noise level increased (Fig. 8-13). On the basis of the two experiments (grid and extendible tracking), the relation between the measurement noise level and the propagation error is clear and uniform; the measurement noise level is an effective parameter for the propagation error.

8.3. Propagation Error Prediction Model

8.3.1. Summary of Parameter Effects

Based on the experiments, the parameters that have major effects on the propagation error were identified to be the measurement noise (standard deviation: sigma_m) and the scene length (l_v). The effects of camera motion (M) and feature placement (F) were difficult to determine; they are also user decisions. The values of the uncertainty threshold (t_u) and the number of initially calibrated features (n_c) can be fixed without much effect on the propagation error. Lastly, the number of frames (f) and the average number of in-view features (n_v) did not have any significant effect on the propagation error. The effects of all the parameters are summarized in Table 8-1.
The parameters can also be classified according to whether they are controllable, motion-dependent, scene-dependent, or sensor-dependent. For example, the measurement noise (sigma_m) depends on the optical sensor (camera) and the feature detection algorithm. The parameter classification is summarized in Table 8-2.

Major effects: measurement noise level (sigma_m), scene length (l_v)
Minor effects - undeterminable: camera motion (M), feature placement (F)
Minor effects - fixable: uncertainty threshold (t_u), number of initially calibrated features (n_c)
Minor effects - negligible: number of frames (f), average number of in-view features (n_v)

Table 8-1. The Effect of Parameters on Propagation Error

Controllable: uncertainty threshold (t_u), number of initially calibrated features (n_c), number of frames (f)
Scene / motion dependent: camera motion (M), feature placement (F), average number of in-view features (n_v), scene length (l_v)
Sensor dependent: measurement noise level (sigma_m)

Table 8-2. Classification of Parameters on Propagation Error

In this section, the effects of the two major parameters (sigma_m and l_v) on the propagation error were mathematically determined to obtain the propagation error model. To determine the mathematical relations between the parameters and the propagation error, model fitting was applied to the experimental data; MATLAB was used to empirically find a well-fitting model.

8.3.2. Propagation Error Equation

Given the data obtained from the simulated experiment on the effect of measurement noise, the relation between the major parameters (i.e., sigma_m and l_v) and the propagation error can be determined: the error mean and the error range can be predicted using the RMS error and the standard deviation provided in Fig. 8-13. For example, the error mean (m) and the error range (m - sigma to m + sigma, with 68% confidence) are predicted as in Fig. 8-14 (a: measurement noise level = 0.3 pixel; b: measurement noise level = 0.5 pixel).

However, for measurement noise levels that were not involved in the simulated experiments (in the experiments, the measurement noise levels were 0.1, 0.2, 0.3, 0.4, and 0.5), it is difficult to predict the propagation error means and ranges. For general error prediction, a mathematical model can be used. A mathematical error prediction model can be obtained by model fitting of the data from the simulated experiments: the data from the experiments on the 5 measurement noise levels were used to find a unified mathematical model. The effect of the measurement noise level seemed linear, observing the result of the grid experiment (Fig. 8-12); however, the effect of the scene length seemed polynomial or exponential (Fig. 8-13). Various models were tested to empirically find a well-fitting model. A model with the following equation provided a reasonably good fit:

e = 0.0064 * sigma_m * e^(3.45 * l_v^0.51) + 1.61

Fig. 8-15 shows the data plots and the curves generated by the obtained mathematical model. The model for the error standard deviation can be obtained in a similar way.

Fig. 8-14 Predicted propagation error and range with 68% confidence (a: measurement noise level = 0.3 pixel; b: measurement noise level = 0.5 pixel)
Fig. 8-15 Error Prediction Model (data plots and fitted curves vs. scene length l_v)

9. Conclusion and Future Work

Using traditional vision-based tracking systems, the tracking range is limited to the areas where a minimum number (this minimum depends on the tracking algorithm) of pre-calibrated features are in view. Hence, in AR, augmented views (where the virtual objects are superimposed on the real scene) are available only in the limited tracking ranges. However, a user may find a new area that requires augmented views, i.e., where the tracking area needs to be extended. To extend the tracking range using these systems, an off-line calibration must be performed. To the best of my knowledge, there has been no research on calibrating new features dynamically and interactively for accurate AR tracking.

In the proposed vision-based Extendible Tracking, the tracking is fiducial-based in the beginning, depending on a small set of pre-calibrated fiducials. While the camera is moving, the system may dynamically calibrate detected, yet unknown, features. The calibration process is deferred until required, instead of performing an exhaustive off-line calibration. Once these features are calibrated, they can be used for tracking just as the pre-calibrated fiducials are. Extendible Tracking was made feasible by the following methods:

• Dynamic feature position calibration
• Pose computation refinement
• Applying robust statistical methods
• Integration with natural feature tracking

Through dynamic feature calibration, interactively added features were calibrated dynamically and on-line using recursive filters. The pose computation method was refined to resolve the multiple-solution and instability problems of the 3-point based method. By applying the robust statistical methods, the outliers (feature and camera pose) were detected and rejected in real time. With Extendible Tracking, starting with a small number of calibrated features, even ten times as many features were calibrated and used for tracking.

The propagation errors (between camera pose estimation and feature calibration) in extendible tracking were affected mainly by the measurement noise, and also by many other factors such as feature placement, camera motion, number of initially calibrated features, feature density, scene length, and feature uncertainty threshold. Because of the large number of factors, a precise prediction of the propagation error was difficult. Experiments were performed to determine the relation of each parameter to the propagation error. On the basis of the simulated experiments, the most effective factors were identified to be the scene length and the measurement noise. An error prediction model was designed based on the results of the simulated experiments. The effect of the scene length on the propagation error appeared to be more than linear. According to the propagation error model, when sigma_m = 0.5, the scene object length (l_v) should be < 2.5 for the registration errors to be smaller than 3 pixels on average. Therefore, for accurate AR tracking, the tracking range extension is limited to the neighborhood of the pre-calibrated features.
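Using the fitted mean-error equation from section 8.3.2 (e = 0.0064*sigma_m*e^(3.45*l_v^0.51) + 1.61, as reconstructed there), a small Python sketch can evaluate the predicted registration error and search for the largest admissible scene length under an error budget. The 3-pixel budget and the search step are illustrative choices, and the quoted constants carry the uncertainty of that reconstruction.

import math

def predicted_error(sigma_m, l_v):
    # Mean registration error (pixels) predicted by the fitted model.
    return 0.0064 * sigma_m * math.exp(3.45 * l_v ** 0.51) + 1.61

def max_scene_length(sigma_m, budget_px=3.0, step=0.05):
    # Largest scene length (in FOV units) whose predicted mean error stays in budget.
    l_v = 1.0
    while predicted_error(sigma_m, l_v + step) <= budget_px:
        l_v += step
    return l_v

# e.g., with these constants the predicted mean error at sigma_m = 0.5, l_v = 2.5
# is about 2.4 pixels.
print(max_scene_length(0.5), predicted_error(0.5, 2.5))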
In AR, for the VE to be superimposed on the RE, the spatial linkage information needs to be pre-defined: the coordinates of the VE should be spatially linked to the coordinates of the RE. This context information between the RE and the VE includes the world coordinate system, the 3D positions of the real objects (features), and the 3D positions of the virtual objects. The VE superimposition is based on this context information. When the tracking range is large, use of one context may result in inaccurate tracking because of the limitation in tracking range extension. However, a large area could be decomposed into many small areas (with corresponding contexts); tracking is then based on small contexts in small areas. This Localized Tracking is more practical and more accurate. Fig. 9-1 shows the tracking area decomposed into four small contexts.

Fig. 9-1 Tracking area decomposition (calibrated features, uncalibrated features, small regions)

In Localized Tracking, the tracking range is still extended, but only to limited areas, i.e., to the neighborhoods of the pre-calibrated features within the same context. In a neighboring area within a different context, a different set of pre-calibrated features can be used for tracking and for the VE superimposition of that context. When more than one context is required to be viewed in the same image, localized tracking is not capable of providing tracking information, because each context may have a different coordinate system. However, the coordinate systems of the local contexts can be redefined in one world coordinate system, on-line or off-line (a coordinate system transformation is required). After the coordinate systems are redefined, the tracking is based on one world coordinate system and more than one context can be viewed in the same image.

One possible problem is the identification of the contexts. Because there are many contexts and pre-calibrated feature sets, each context needs to be uniquely identified. One solution is using ID tags as used in NaviCam [Rekimoto 1997]. ID tags are used for identifying the context and providing the context information (the 3D positions of the pre-calibrated features, and the virtual objects and their 3D positions). These ID tags, when they are large, can also serve as features. Fig. 9-2 shows an example of a localized tracking design; it includes an ID tag, pre-calibrated features, and uncalibrated features.

Fig. 9-2 ID tag and features for localized tracking

Localized tracking has many advantages: first, pre-calibration is not required over a wide area, but only in many small areas (this is easier and can be accomplished using a small digitizing device); second, the propagation error in extendible tracking is small; third, the context definition between the real and the virtual environment is small and less complicated.

Bibliography

[Azarbayejani 1995] A. Azarbayejani and A. Pentland, "Recursive Estimation of Motion, Structure, and Focal Length", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 17, No.
6, June 1995 [Azuma 1994] Ronald Azuma, Gary Bishop, “Improving Static and Dynamic Registration in an Optical See-through HMD”, Proceedings of Siggraph94, Computer Graphics, pp. 197-204 [Azuma 1995] Azuma, Ronald T. A Survey of Augmented Reality. Presence: Teleoperators and Virtual Environments 6, 4 (August 1997), 355 - 385. Earlier version appeared in Course Notes #9: Developing Advanced Virtual Reality Applications, ACM SIGGRAPH (Los Angeles, CA, 6-11 August 1995), 20-1 to 20-38. [Behringer 1999] Reinhold Behringer, “Registration for outdoor Augmented Reality Applications Using Computer Vision Techniques and Hybrid Sensors”, 1999 IEEE Virtual Reality, pp.244- 251 [Berger 1997] M.-O. Berger, “Resolving Occlusion in Augmented Reality: a Contour Based Approach without 3D Reconstruction”, CVPR 97. [Betting 1995] Betting, Fabienne, Jacques Feldmar, Nicholas Ayache, and Frederic Devemay. A New Framework for Fusing Stereo Images with Volumetric Medical Images. Proceedings o f Computer Vision, Virtual Reality, and Robotics in Medicine '95 (CVRMed '95) (Nice, France, 3-6 April 1995), 30-39. [Broida 1990] T.J. Broida, S. Chandrashekhar, and R. Chellappa, “Recirsive 3-D Motion Estimation from a Monocular Image Sequence”, IEEE Transactions on Aerospace and Electronic Systems Vol. 26, No. 4, July 1990 [Caudell 1992] T. P. Caudell, and D. M. Mizell, “Augmented Reality: An Application of Heads- Up Display Technology to Manual Manufacturing Processes,” Proceedings o f the Hawaii International Conference on Systems Sciences, 1992, 0073-1129-1/92, EEEE Press, January, 1992, pp. 659-669 [Cho 1998] Y.K. Cho, J.Lee, and U. Neumann, “A Multi-ring Color Fiducial System and A Rule- Based Detection Method for Scalable Fiducial-tracking Augmented Reality”, Proceedings o f International Workshop on Augmented Reality, San Francisco, Nov. 1998 [Cho 1999] Youngkwan Cho, Scalable Fiducial-Tracking Augmented Reality, Ph.D. Dissertation, Computer Science Department, University of Southern California, January 1999 [Curtis 1998] Dan Curtis and David Mizell, “Several Devils in the Details: Making an AR App Work in the Airplane Factory”, International Workshop on Augmented Reality (IWAR) ‘98, San Francisco, November 1998. [Dementhon 1995] D. Dementhon and L. Davis, “Model Based Object Pose in 25 Lines of Code”, International Journal of Computer Vision, 15:123-141, 1995 95 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. [Edwards 1995] Edwards, P.J., D.L.G. Hill, D.J. Hawkes, R. Spink, A.C.F. Colchester, A. Strong, and M. Gleeson. Neurosurgical Guidance Using the Stereo M icroscope. Proceedings o f Computer Vision, Virtual Reality, and Robotics in Medicine '95 (CVRM ed '95) (Nice, France, 3-6 April 1995), 555-564. [Feiner 1993] S. Feiner, B. MacIntyre, D. Seligmann, “Knowledge-Based Augmented Reality,” Communications of the ACM, Vol. 36, No. 7, pp 52-62, July 1993 [Feldmar 1997] J. Feldmar, N. Ayache, and F. Betting, “3D-2D Projective Registration of Free- Form Curves and Surfaces” , Computer Vision and Image Understanding, 65(3):403-424, 1997 [FhG IGD] Fraunhofer Project Group for Augmented Reality at ZGDV, Department Visualization & Virtual Reality (FhG IGD-A4), “http://www.igd.fhg.de/www/igd-a4/ar/” [Fischler 1981] Martin A. Fischler and Robert C. Bolles, “Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography”, Communications of the ACM, Vol.24, No.6, June 1981, pp.381-395 [Foxlin 1998] E. Foxlin, M. 
Harrington, G. Pfeifer, "Constellation: A Wide-Range Wireless Motion-Tracking System for Augmented Reality and Virtual Set Applications", Proceedings of SIGGRAPH 98 (Orlando, Florida, July 19-24, 1998). In Computer Graphics Proceedings, Annual Conference Series, 1998, ACM SIGGRAPH, pp. 371-378

[Fox Trax 1997] Michael J. Potel, "The Fox Trax Hockey Puck Tracking System", IEEE Computer Graphics and Applications, March-April 1997, pp. 6-12

[Ganapathy 1984] Sundaram Ganapathy, "Decomposition of Transformation Matrices for Robot Vision", Proceedings of Int. Conf. on Robotics and Automation, 1984, pp. 130-139

[Ghazisadedy 1995] M. Ghazisadedy, D. Adamczyk, D. J. Sandin, R. V. Kenyon, T. A. DeFanti, "Ultrasonic Calibration of a Magnetic Tracker in a Virtual Reality Space," Proceedings of Virtual Reality Annual International Symposium (VRAIS) '95, Raleigh, NC, March 11-15, pp. 179-188

[Grimson 1996] W.E.L. Grimson, T. Lozano-Perez, W.M. Wells III, G.J. Ettinger, S.J. White, and R. Kikinis, "An Automatic Registration Method for Frameless Stereotaxy, Image Guided Surgery, and Enhanced Reality Visualization", IEEE Transactions on Medical Imaging, 1996.

[Hartley 1995] R.I. Hartley, "A Linear Method for Reconstruction from Lines and Points", ICCV, pp. 882-887, 1995

[Horaud 1989] Radu Horaud, Bernard Conio, and Olivier Leboulleux, "An Analytic Solution for the Perspective 4-Point Problem", Computer Vision, Graphics, and Image Processing, 47, 33-44 (1989)

[Huber 1981] Huber, Peter J., Robust Statistics, Wiley Series in Probability and Mathematical Statistics, 1981

[Julier 1995] S.J. Julier, J.K. Uhlmann, and H.F. Durrant-Whyte, Proceedings of the 1995 American Control Conference, Seattle, Washington, pp. 1628-1632

[Kamanta 1992] Kamanta, S., Easin, R.O., Tsuji, M., and Kawaguchi, E., "A Camera Calibration using Four Point-Targets", Proceedings of the 11th IAPR International Conference on Pattern Recognition, Conference A: Computer Vision and Applications, 1992, Vol. 1, pp. 550-553

[Koenderink 1991] J.J. Koenderink and A.J. van Doorn, "Affine Structure from Motion", Journal of the Optical Society of America, Series A, 8:377-385, 1991

[Koller 1997] Dieter Koller, Gudrun Klinker, Eric Rose, David Breen, Ross Whitaker, Mihran Tuceryan, "Real-time Vision-Based Camera Tracking for Augmented Reality Applications", IEEE Conf. on Computer Vision and Pattern Recognition, 1997

[Kriegman 1990] D. Kriegman and J. Ponce, "On Recognizing and Positioning Curved 3D Objects from Image Contours", IEEE Transactions on PAMI, 12(12):1127-1137, December 1990

[Kumar 1994] R. Kumar and A. Hanson, "Robust Methods for Estimating Pose and a Sensitivity Analysis", CVGIP: Image Understanding, 60(3):313-342, 1994

[Kutulakos 1996] K. N. Kutulakos and J. Vallino, "Non-Euclidean Object Representations for Calibration-Free Video Overlay," Proc. International Workshop on Object Representation for Computer Vision, April 1996.

[Lorensen 1993] Lorensen, William, Harvey Cline, Christopher Nafis, Ron Kikinis, David Altobelli, and Langham Gleason. Enhancing Reality in the Operating Room. Proceedings of Visualization '93 (Los Alamitos, CA, October 1993), 410-415.

[Lowe 1991] Lowe, D.G., "Fitting Parameterized Three-Dimensional Models to Images", IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 13, pp. 441-450, 1991
[Manku 1997] Gurmeet Singh Manku, Pankaj Jain, Amit Aggarwal, and Lalit Kumar, "Object Tracking using Affine Structure for Point Correspondences", CVPR 97, pp. 704-709

[Maybeck 1979] P.S. Maybeck, Stochastic Models, Estimation, and Control, Volume 1, New York, Academic Press, Inc., 1979

[Mellor 1995] J. P. Mellor, Enhanced Reality Visualization in a Surgical Environment, MS Thesis, Department of Electrical Engineering, MIT (13 January 1995).

[Mendel 1995] J.M. Mendel, Lessons in Estimation Theory for Signal Processing, Communications, and Control, Englewood Cliffs, New Jersey, Prentice Hall PTR, 1995

[Meyer 1992] Kenneth Meyer, Hugh L. Applewhite, and Frank A. Biocca, "A Survey of Position Trackers", Presence, Vol. 1, No. 2, Spring 1992, pp. 173-200

[Milgram 1993] Milgram, P., Zhai, S., Drascic, D., Grodski, J.J., "Applications of Augmented Reality for Human-Robot Communication", Proc. IROS'93: Int'l Conf. on Intelligent Robots and Systems, Yokohama, Japan, 1467-1472, July 1993.

[Mundy 1992] J.L. Mundy and A. Zisserman, Geometric Invariance in Computer Vision, MIT Press, 1992

[Neumann 1996] U. Neumann, Y. Cho, "A Self-Tracking Augmented Reality System," Proceedings of ACM Virtual Reality Software and Technology '96, Hong Kong, pp. 109-115

[Neumann 1998a] U. Neumann, S. You, "Integration of Region Tracking and Optical Flow for Image Motion Estimation," Proceedings of IEEE ICIP-98, Chicago, Illinois, Oct. 1998.

[Neumann 1998b] U. Neumann, J. Park, "Extendible Object-Centric Tracking for Augmented Reality", IEEE Virtual Reality Annual International Symposium (VRAIS) '98, Atlanta, Georgia, March 14-18, 1998, pp. 148-155

[Oliensis 1997] John Oliensis, "A Critique of Structure from Motion Algorithms", NECI Technical Report, April 1997. http://www-neci.nj.nec.com/homepages/oliensis/poleiccv.ps

[Oliensis 1999] Oliensis, J., "Fast and accurate self-calibration", The Proceedings of the Seventh IEEE International Conference on Computer Vision, 1999, Vol. 2, pp. 745-752

[Park 1998a] J. Park, U. Neumann, "Natural Feature Tracking for Extendible Robust Augmented Realities", International Workshop on Augmented Reality (IWAR) '98, San Francisco, November 1998.

[Park 1998b] J. Park, U. Neumann, "Extending Augmented Reality with Natural Feature Tracking", Proceedings of SPIE Vol. 3524-15, Telemanipulator and Telepresence Technologies V, Boston, Massachusetts, Nov. 1-5, 1998

[Press 1993] William H. Press, Saul A. Teukolsky, William T. Vetterling, Brian P. Flannery, Numerical Recipes in C: The Art of Scientific Computing, 2nd edition (January 1993), Cambridge University Press, ISBN: 0521431085, p. 705

[Rastogi 1996] Rastogi, A., Milgram, P., Drascic, D., "Telerobotic Control with Stereoscopic Augmented Reality", SPIE Volume 2653: Stereoscopic Displays and Virtual Reality Systems III, pp. 135-146, San Jose, Feb. 1996

[Rekimoto 1997] Jun Rekimoto, "NaviCam: A Magnifying Glass Approach to Augmented Reality", Presence, Vol. 6, No. 4, August 1997, pp. 399-412

[Quan 1999] Long Quan and Zhongda Lan, "Linear N-Point Camera Pose Determination", IEEE Transactions on Pattern Analysis and Machine Intelligence, Aug.
[Sawhney 1999] H. S. Sawhney, Y. Guo, J. Asmuth, R. Kumar, "Multi-view 3D Estimation and Applications to Match Move", Proceedings of IEEE Workshop on Multi-View Modeling and Analysis of Visual Scenes (MVIEW '99), 1999, pp. 21-28

[Sharma 1997] R. Sharma, J. Molineros, "Computer Vision-Based Augmented Reality for Guiding Manual Assembly", Presence: Teleoperators and Virtual Environments, Vol. 6, No. 3, pp. 292-317, June 1997

[Simon 1998] G. Simon, V. Lepetit, and M.-O. Berger, "Computer Vision Methods for Registration: Mixing 3D Knowledge and 2D Correspondences for Accurate Image Composition", International Workshop on Augmented Reality (IWAR) '98

[Sims 1994] Dave Sims, "New Realities in Aircraft Design and Manufacture", IEEE Computer Graphics and Applications 14, 2 (March 1994), p. 91

[State 1996a] A. State, G. Hirota, D. T. Chen, B. Garrett, M. Livingston, "Superior Augmented Reality Registration by Integrating Landmark Tracking and Magnetic Tracking", Proceedings of SIGGRAPH 96 (New Orleans, Louisiana, August 4-9, 1996). In Computer Graphics Proceedings, Annual Conference Series, 1996, ACM SIGGRAPH, pp. 439-446

[State 1996b] Andrei State, Mark A. Livingston, Gentaro Hirota, William F. Garrett, Mary C. Whitton, Henry Fuchs, and Etta D. Pisano, "Techniques for Augmented-Reality Systems: Realizing Ultrasound-Guided Needle Biopsies", Proceedings of SIGGRAPH '96 (New Orleans, LA, 4-9 August 1996), pp. 439-446

[Sturm 1996] P. Sturm and B. Triggs, "A Factorization Based Algorithm for Multi-Image Projective Structure and Motion", ECCV 1996, pp. 709-720

[Taubes 1994] Gary Taubes, "Surgery in Cyberspace", Discover 15, 12 (December 1994), pp. 84-94

[Triggs 1999] B. Triggs, "Camera Pose and Calibration from 4 or 5 Known 3D Points", The Proceedings of the Seventh IEEE International Conference on Computer Vision, 1999, Vol. 1, pp. 278-284

[Tuceryan 1995] M. Tuceryan, D. S. Greer, et al., "Calibration Requirements and Procedures for a Monitor-Based Augmented Reality System", IEEE Transactions on Visualization and Computer Graphics 1(3):255-273, 1995

[Uenohara 1995] M. Uenohara and T. Kanade, "Vision-Based Object Registration for Real-Time Image Overlay", Proceedings of Computer Vision, Virtual Reality, and Robotics in Medicine: CVRMed '95, N. Ayache (ed.), Berlin, Springer-Verlag, pp. 14-22

[Weiss 1993] I. Weiss, "Geometric Invariants and Object Recognition", International Journal of Computer Vision, Vol. 10, No. 3, pp. 207-231, 1993

[Welch 1997] G. Welch and G. Bishop, "SCAAT: Incremental Tracking with Incomplete Information", Proceedings of SIGGRAPH 97 (Los Angeles, California, August 3-8, 1997). In Computer Graphics Proceedings, Annual Conference Series, 1997, ACM SIGGRAPH, pp. 333-344

[Youngblut 1996] Christine Youngblut, Rob E. Johnson, Sarah H. Nash, Ruth A. Wienclaw, and Craig A. Will, "Review of Virtual Environment Interface Technology", IDA Paper P-3186, March 1996; available at http://www.hitl.washington.edu/scivw/IDA/

[Yuan 1989] J. S. C. Yuan, "A General Photogrammetric Method for Determining Object Position and Orientation", IEEE Trans. on Robotics and Automation, Vol. 15, pp. 129-142

Appendix A. Algorithm of the Perspective 4-Point Problem Based on Horaud's Method

A.1. Introduction
Horaud et al. proposed a well-known pose computation method for the perspective 4-point problem [Horaud 1989]. With this method, the 6-DOF camera pose is computed from four known points (coplanar or non-coplanar). According to the authors, the method produces fewer and more stable solutions than the 3-point method [Fischler 1981]. The objective of this chapter is to implement this method and to compare its performance (stability, accuracy, and number of multiple solutions) with that of the 3-point method.

Fig. A-1 Four points (M0, M1, M2, M3) converted into a pencil of 3 lines (L1, L2, L3)

In the 4-point method, the four points in the object space are converted into a pencil of three lines (Fig. A-1). These three lines share one of the four points, which is taken as the origin of the object space (M0 in Fig. A-1). Based on the three lines (their coordinates in the object space and their projections onto the image space), the camera projection homogeneous matrix A (the transformation from the object space to the camera space) is computed. This homogeneous matrix is decomposed into two homogeneous matrices: A1, from the image space to the camera space, and A2, from the object space to the image space. First, the algorithm of this method is explained. Then some potential problems (signs of trigonometric terms, depth calculation) are described, together with suggested improvements.

Fig. A-2 Homogeneous matrix decomposition (object space, image space, camera space)

A.2. Matrix A1 (Transformation from the image space to the camera space)

The projection of the object space origin (M0) is the image space origin (J). The bases of the image space are defined as follows: the X-axis (k') is defined by the line connecting the camera focal point (F) and the image space origin (J); the Y-axis (P3) is defined to be perpendicular to the X-axis and to l3 (the line connecting the projections of M0 and M3); the Z-axis is defined to be perpendicular to the X-axis and the Y-axis. Although the coordinates of P3 in the object space are not yet determined, P3 is perpendicular to the plane containing FJ and L3 of the object space. Below are the equations for the basis vectors of matrix A1 (from Equation (4) of [Horaud 1989]). They can be calculated from the intrinsic camera parameters (image center, focal length, and scale factor) and two image points (the projections of M0 and M3).

Fig. A-3 Image space coordinate system

k' = FJ / ||FJ||
k' × P3 = (l3 − (k' · l3) k') / ||l3 − (k' · l3) k'||

     | k'x   P3x   (k' × P3)x   FJx |
A1 = | k'y   P3y   (k' × P3)y   FJy |        (Eq. A-1)
     | k'z   P3z   (k' × P3)z   FJz |
     |  0     0        0          1  |
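As a rough illustration of Eq. A-1 (not part of the original dissertation; the function and variable names are illustrative), the following Python sketch builds the image space basis k', P3, and k' × P3 from the projections of M0 and M3 and assembles the 4x4 matrix A1. It assumes a simple pinhole model with square pixels, so the scale factor is folded into the focal length.

import numpy as np

def image_to_camera_matrix(j0, j3, focal_length, image_center):
    # j0, j3: pixel coordinates of the projections of M0 and M3.
    # Back-project both image points onto the image plane in camera coordinates
    # (camera focal point F at the origin, optical axis along +Z).
    J0 = np.array([j0[0] - image_center[0], j0[1] - image_center[1], focal_length])
    J3 = np.array([j3[0] - image_center[0], j3[1] - image_center[1], focal_length])

    k = J0 / np.linalg.norm(J0)                  # X-axis k': along FJ
    l3 = (J3 - J0) / np.linalg.norm(J3 - J0)     # direction of l3 on the image plane
    z = l3 - np.dot(k, l3) * k                   # component of l3 perpendicular to k'
    z /= np.linalg.norm(z)                       # Z-axis: k' x P3 (Eq. A-1)
    p3 = np.cross(z, k)                          # Y-axis P3, chosen so that k' x P3 = z

    A1 = np.eye(4)
    A1[:3, 0], A1[:3, 1], A1[:3, 2], A1[:3, 3] = k, p3, z, J0   # columns k', P3, k' x P3, FJ
    return A1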
A.3. Matrix A2 (Transformation from the object space to the image space)

A.3.1. Overview

The object space is defined by M0 (the object space origin), L3 (the X-axis), P3 (the Y-axis, perpendicular to FJ, FM0, and L3), and L3 × P3 (the Z-axis), as in Fig. A-4. In this coordinate system, L1 and L2 can be represented as below with the unknown angle θ. Because the coordinates of P3 in the object space are not yet determined, θ is not determined; αi can be determined from the relationship between L3 and Li (i = 1, 2), and β can be determined from L1' and L2' (the projections of L1 and L2).

L1 = sin α1 · L3 + cos α1 · cos(β + θ) · P3 + cos α1 · sin(β + θ) · (L3 × P3)        (Eq. A-2)
L2 = sin α2 · L3 + cos α2 · cos(β − θ) · P3 − cos α2 · sin(β − θ) · (L3 × P3)

Fig. A-5 is a view of the plane containing FJ and L3 (viewed from the negative Y direction). The object coordinate system and the image coordinate system share the same Y-axis (P3). Given the angle φ, equations for L3 (the X-axis) and L3 × P3 (the Z-axis) are easily derived:

L3 = cos φ · k' + sin φ · (k' × P3)
L3 × P3 = −sin φ · k' + cos φ · (k' × P3),   0 < φ < π        (Eq. A-3)

Fig. A-4 Object space coordinate system

Fig. A-5 Image space and object space

What remains is to calculate the angles φ and θ and the distance dx.

A.3.2. Problems with signs of trigonometric terms

(1) P1 and P2

According to Horaud et al., P1 (the vector perpendicular to the plane containing FJ and L1) and P2 (the vector perpendicular to the plane containing FJ and L2) are represented in the object space as follows (Equations 14-17 of [Horaud 1989]):

P1 = cos γ1 · P3 + sin γ1 · (k' × P3)
P2 = cos γ2 · P3 + sin γ2 · (k' × P3)        (Eq. A-4)
where cos γ1 = P1 · P3, sin γ1 = ||P1 × P3||, cos γ2 = P2 · P3, sin γ2 = ||P2 × P3||

In the terms sin γi · (k' × P3), Horaud et al. hypothesized that the coefficients sin γi = ||Pi × P3|| > 0 in all cases. However, the signs of the coefficients depend on the geometrical relationship of the lines li (Fig. A-6). In Case 1 of Fig. A-6, P1 is in the negative direction of (k' × P3), so that coefficient should be negative. Similarly, in Case 2, P2 is in the negative direction of (k' × P3).

Fig. A-6 Cases in which the coefficient of (k' × P3) is negative
(Case 1: P1 = cos γ1 · P3 − sin γ1 · (k' × P3), P2 = cos γ2 · P3 + sin γ2 · (k' × P3);
 Case 2: P1 = cos γ1 · P3 + sin γ1 · (k' × P3), P2 = cos γ2 · P3 − sin γ2 · (k' × P3))

A general calculation of the coefficients is as follows. Using these equations, the signs are correctly defined in every case:

P1 = g1 · P3 + g2 · (k' × P3)
P2 = g3 · P3 + g4 · (k' × P3)        (Eq. A-5)
where g1 = P1 · P3, g2 = P1 · (k' × P3), g3 = P2 · P3, g4 = P2 · (k' × P3)
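The sign-safe coefficients of Eq. A-5 amount to projecting P1 and P2 directly onto P3 and k' × P3, so the signs come out correctly in both cases of Fig. A-6. A minimal Python sketch (illustrative names, not the dissertation's implementation; the inputs are assumed to be unit vectors expressed in the image space):

import numpy as np

def sign_safe_coefficients(p1, p2, p3, k):
    # g1..g4 of Eq. A-5, obtained by direct projection so no sign hypothesis is needed.
    kxp3 = np.cross(k, p3)                       # Z-axis of the image space
    g1, g2 = np.dot(p1, p3), np.dot(p1, kxp3)    # P1 = g1*P3 + g2*(k' x P3)
    g3, g4 = np.dot(p2, p3), np.dot(p2, kxp3)    # P2 = g3*P3 + g4*(k' x P3)
    return g1, g2, g3, g4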
(2) L2

Equation (3) of [Horaud 1989] represents L2 in the object space as follows:

L2 = sin α2 · L3 + cos α2 · cos(β − θ) · P3 + cos α2 · sin(β − θ) · (L3 × P3)

For the configuration of Fig. A-7, however, sin(β − θ) is positive while L2' (and L2) lies in the negative direction of L3 × P3, so the sign of the last term should be negative:

L2 = sin α2 · L3 + cos α2 · cos(β − θ) · P3 − cos α2 · sin(β − θ) · (L3 × P3)

Because the coordinates of P3 in the object space are not yet determined, the relations of P3 with L1 and L2 are not known; hence, the sign of this coefficient is not determined. The undetermined sign may increase the complexity and the number of solutions.

Fig. A-7 Object space coordinate system

(3) cos β

The angle β can be calculated using L1' and L2', the projections of L1 and L2 onto the YZ-plane of the object space:

cos(2β) = L1' · L2',   0 < 2β < 2π
cos β = √((1 + cos 2β) / 2),   sin β = √((1 − cos 2β) / 2)        (Eq. A-6)

But 2β can be larger than 180° (depending on where P3 lies), in which case cos β is negative (sin β > 0 always, because 0 < β < π). Just as for L2, because the coordinates of P3 are not determined at this point, the sign of cos β is also undetermined.

A.3.3. Calculation of φ, θ, and dx

A.3.3.1. Calculation of φ

From the constraint that Li and Pi are perpendicular (by the definition of Pi), two equations with the two unknown angles (φ, θ) are obtained:

L1 · P1 = 0,   L2 · P2 = 0        (Eq. A-7)

L1 and L2 in the object space were previously defined in Eq. A-2. Replacing L3 and L3 × P3 by their image space expressions (Eq. A-3), L1 and L2 in the image space are

L1 = ( ) · k' + cos α1 · cos(β + θ) · P3 + (sin φ · sin α1 + cos φ · cos α1 · sin(β + θ)) · (k' × P3)
L2 = ( ) · k' + cos α2 · cos(β − θ) · P3 + (sin φ · sin α2 − cos φ · cos α2 · sin(β − θ)) · (k' × P3)        (Eq. A-8)

The coefficients of k' are omitted because they are irrelevant to the evaluation of Eq. A-7 (P1 and P2 have no k' component). P1 and P2 in the image space are as defined in Eq. A-5. Substituting into the constraints of Eq. A-7 gives

g1 · cos α1 · cos(β + θ) + g2 · cos φ · cos α1 · sin(β + θ) = −g2 · sin α1 · sin φ
g3 · cos α2 · cos(β − θ) − g4 · cos φ · cos α2 · sin(β − θ) = −g4 · sin α2 · sin φ

Expanding cos(β ± θ) and sin(β ± θ) and dividing by cos α1 and cos α2, respectively:

(g1 · cos β + g2 · sin β · cos φ) · cos θ + (−g1 · sin β + g2 · cos β · cos φ) · sin θ = −g2 · tan α1 · sin φ
(g3 · cos β − g4 · sin β · cos φ) · cos θ + (g3 · sin β + g4 · cos β · cos φ) · sin θ = −g4 · tan α2 · sin φ

Defining

a1 = g1 · cos β     b1 = g3 · cos β
a2 = g2 · sin β     b2 = −g4 · sin β
a3 = −g1 · sin β    b3 = g3 · sin β
a4 = g2 · cos β     b4 = g4 · cos β
a5 = −g2 · tan α1   b5 = −g4 · tan α2

the constraints become

(a1 + a2 · cos φ) · cos θ + (a3 + a4 · cos φ) · sin θ = a5 · sin φ
(b1 + b2 · cos φ) · cos θ + (b3 + b4 · cos φ) · sin θ = b5 · sin φ

Solving each equation for sin θ and equating the two expressions eliminates sin θ:

{(a1 + a2 · cos φ)(b3 + b4 · cos φ) − (b1 + b2 · cos φ)(a3 + a4 · cos φ)} · cos θ
   = {a5 · (b3 + b4 · cos φ) − b5 · (a3 + a4 · cos φ)} · sin φ
   = {(a5·b3 − b5·a3) + (a5·b4 − b5·a4) · cos φ} · sin φ

With D = (a1 + a2 · cos φ)(b3 + b4 · cos φ) − (b1 + b2 · cos φ)(a3 + a4 · cos φ)
       = (a1·b3 − b1·a3) + (a2·b3 − a3·b2 + a1·b4 − b1·a4) · cos φ + (a2·b4 − b2·a4) · cos²φ,

cos θ = (sin φ / D) · {(a5·b3 − b5·a3) + (a5·b4 − b5·a4) · cos φ}        (Eq. A-9)

Likewise, eliminating cos θ:

sin θ = −(sin φ / D) · {(a5·b1 − b5·a1) + (a5·b2 − b5·a2) · cos φ}        (Eq. A-10)

Defining

k1 = a5·b4 − b5·a4
k2 = a5·b3 − b5·a3
k3 = a5·b2 − b5·a2
k4 = a5·b1 − b5·a1
k5 = a1·b3 − b1·a3
k6 = a2·b4 − b2·a4
k7 = a2·b3 − b2·a3 + a1·b4 − b1·a4

and using the constraint cos²θ + sin²θ = 1, one equation with one unknown is derived.
i4 = k1² + k3² + k6²
i3 = 2 · (k1·k2 + k3·k4 + k6·k7)
i2 = k2² + k4² + k7² − k1² − k3² + 2·k5·k6
i1 = 2 · (−k1·k2 − k3·k4 + k5·k7)
i0 = k5² − k2² − k4²

i4 · cos⁴φ + i3 · cos³φ + i2 · cos²φ + i1 · cos φ + i0 = 0        (Eq. A-11)

This equation can be solved analytically; theoretically, there can be up to four solutions. Additional constraints can be applied to reduce the number of solutions. Horaud et al. suggested two constraints: 0 < cos φ < 1 and Li · (k' × P3) > 0. According to the initial synthetic experiments, the second constraint did not reduce the number of solutions.

A.3.3.2. Calculation of θ

Once cos φ has been calculated, θ can be obtained as follows from Eq. A-9 and Eq. A-10:

sin φ = √(1 − cos²φ)   (sin φ > 0 because 0 < φ < π)
D = k5 + k6 · cos²φ + k7 · cos φ
cos θ = (sin φ / D) · (k1 · cos φ + k2)
sin θ = −(sin φ / D) · (k3 · cos φ + k4)        (Eq. A-12)

A.3.3.3. Calculation of dx

In this chapter, an approach different from the original 4-point method [Horaud 1989] was used to calculate dx, the distance between the image space origin and the object space origin. M0M3 is in the object space while the other quantities are in the image space, but Horaud et al. seemed either not to explain this step in detail or to assume that they lie in the same coordinate system. However, because the angle φ has already been calculated, the geometric relations between the triangle components (Fig. A-8) can be determined, and ||FM0|| (the distance between the camera focal point and the object point M0) can be calculated as follows.

Fig. A-8 Calculation of depth

First, δ1, the angle at F between FJ and FJ3, is calculated:

sin δ1 = ||FJ × FJ3|| / (||FJ|| · ||FJ3||)        (Eq. A-13)

Given that δ1 < π/2, δ1 can be uniquely determined (since the field of view of most cameras is less than π/2, the angle subtended by any two image points is less than π/2). Then, in the triangle (F, M0, M3),

δ2 = π − φ
δ3 = π − (δ1 + δ2) = π − (δ1 + π − φ) = φ − δ1
||FM0|| = ||M0M3|| · sin δ3 / sin δ1        (Eq. A-14)
dx = ||FM0|| − ||FJ||
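Putting Eqs. A-11 through A-14 together, the angle and depth recovery can be sketched as below. This is an illustrative Python rendering rather than the dissertation's code; it assumes the coefficients k1..k7 have already been computed as defined above, and it applies only the first constraint (0 < cos φ < 1).

import numpy as np

def solve_phi_theta(k1, k2, k3, k4, k5, k6, k7):
    # Coefficients of the quartic in cos(phi) (Eq. A-11).
    i4 = k1**2 + k3**2 + k6**2
    i3 = 2.0 * (k1*k2 + k3*k4 + k6*k7)
    i2 = k2**2 + k4**2 + k7**2 - k1**2 - k3**2 + 2.0*k5*k6
    i1 = 2.0 * (-k1*k2 - k3*k4 + k5*k7)
    i0 = k5**2 - k2**2 - k4**2

    solutions = []
    for root in np.roots([i4, i3, i2, i1, i0]):           # up to four roots
        if abs(root.imag) > 1e-9 or not (0.0 < root.real < 1.0):
            continue                                       # keep real roots with 0 < cos(phi) < 1
        cos_phi = root.real
        sin_phi = np.sqrt(1.0 - cos_phi**2)                # sin(phi) > 0 since 0 < phi < pi
        D = k5 + k6*cos_phi**2 + k7*cos_phi
        cos_theta = (sin_phi / D) * (k1*cos_phi + k2)      # Eq. A-9
        sin_theta = -(sin_phi / D) * (k3*cos_phi + k4)     # Eq. A-10
        solutions.append((np.arccos(cos_phi), np.arctan2(sin_theta, cos_theta)))
    return solutions                                       # list of (phi, theta) candidates

def depth_dx(phi, delta1, dist_m0_m3, dist_fj):
    # Law of sines in the triangle (F, M0, M3), Eqs. A-13 and A-14.
    delta2 = np.pi - phi
    delta3 = np.pi - (delta1 + delta2)                     # = phi - delta1
    fm0 = dist_m0_m3 * np.sin(delta3) / np.sin(delta1)     # ||FM0||
    return fm0 - dist_fj                                   # dx = ||FM0|| - ||FJ||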
A.4. Experiments and Results

In the experiments, the true camera position was iterated over 3D grid points while the look-at point was kept near the origin of the object space, so that all four points remained within the image. Noise was added to the look-at point for randomness, and Gaussian noise (σ = 0.5 pixel) was added to the image coordinates to simulate measurement noise. The camera position errors of this method were compared with those of the 3-point method (Fig. A-9). The peaks in the chart are camera position outliers (both methods produced outliers); the outliers appear to be due to numerical instability of the computation. Table A-1 shows the average and standard deviation of the camera position errors (ignoring outliers) for the two methods. According to the experiments, the 3-point method produced more accurate camera positions than the 4-point method.

Fig. A-9 Camera position error: 4-point method vs. 3-point method

          P4P      P3P
Average   0.9286   0.8079
SD        0.6440   0.5516

Table A-1 Average and standard deviation of camera position errors (P3P used only 3 of the 4 points)

The number of solutions provided by the 4-point method ranged from 1 to 4; most often it provided 2 solutions. Some of the constraints suggested by Horaud did not reduce the number of solutions. Table A-2 shows the average number of solutions of the 4-point and 3-point methods; the 4-point method produces fewer solutions on average.

            P4P     P3P
Solutions   1.804   2.002

Table A-2 Average number of solutions

A.5. Discussion

The 4-point method reduces the number of multiple solutions by applying additional constraints (the simulated experiments show that it provides fewer solutions, on average, than the 3-point method). However, there are unresolved sign problems (the signs of sin(β − θ) and cos β), and the 4-point method is numerically unstable, just as the 3-point method is. Its camera positions were not more accurate than those of the 3-point method. As a result, the 4-point method does not appear advantageous over the 3-point method.
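For reference, the kind of synthetic perturbation and error statistics described in Section A.4 can be mimicked along the following lines. This is only an illustrative Python sketch: the per-frame position errors are assumed to come from whichever pose solver is being evaluated, and the 2σ outlier cutoff is one possible choice, not the dissertation's.

import numpy as np

def perturb_image_points(points_2d, sigma=0.5, rng=None):
    # Add Gaussian pixel noise (sigma = 0.5 pixel in the experiments) to image coordinates.
    rng = np.random.default_rng() if rng is None else rng
    return points_2d + rng.normal(0.0, sigma, size=points_2d.shape)

def error_stats(errors, outlier_factor=2.0):
    # Mean and standard deviation of camera position errors, ignoring outliers.
    errors = np.asarray(errors, dtype=float)
    cutoff = errors.mean() + outlier_factor * errors.std()
    inliers = errors[errors <= cutoff]
    return inliers.mean(), inliers.std()

Feeding the perturbed image points of each synthetic frame to the two pose solvers and passing the resulting per-frame position errors to error_stats yields summary statistics of the form shown in Table A-1.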