Segmentation and Inference of 3-D Descriptions from an Intensity Image

by

Mourad Zerroug

A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(Computer Science)

August 1994

Copyright © 1994 Mourad Zerroug

This dissertation, written by Mourad Zerroug under the direction of his Dissertation Committee, and approved by all its members, has been presented to and accepted by The Graduate School, in partial fulfillment of requirements for the degree of DOCTOR OF PHILOSOPHY.

Acknowledgments

Praise be to the Almighty without Whose help, none of this would have been possible.

My most sincere gratitude to my advisor, Professor Ram Nevatia, whose support, encouragement and guidance were key to the success of this work. He was always ready to help. I remain indebted to him for all that I learned from him. I feel fortunate to have had the opportunity to be his student.

My deepest thanks to Professor Gerard Medioni, whose help and advice are much appreciated. I was inspired by his enthusiasm in research.

My heartfelt thanks to Professor Irv Biederman for his encouragement and the instructive discussions about psychological aspects of human vision, and for his kind invitations to discuss technical matters over tea and cookies.

I also would like to thank Professor Keith Price for finding time to serve on my qualifying exam and for his help when I had trouble with the machines and the software. My sincere thanks to Dr. Ken Goldberg for serving on my qualifying exam and for his friendship.

To Andres Huertas goes my special appreciation for his encouragement, his help, his sincere friendship and many interesting discussions.

My special thanks to the student members and visitors of IRIS (Yang Chen, Fatih Ulupinar, Hillel Rom, Sanjay Noronha, Parag Havaldar, Dongsung Kim, Yong Cheol Kim, Gideon Guy, Mi-Suen Lee, Chia-Wei Liao, Bahram Parvin, Craig Reinhart, Andy Lin, Alex Francois, Mathias Bejanin), whose friendship and encouragement made my stay at IRIS and USC an enjoyable one.

My very special thanks to Ahmed Abd-Allah, Ahmed Abutaleb, Irfan Khan, Jemshed Nawaz and Ashfaq Khokhar for their sincere friendship, which was so helpful in making my student life so enjoyable.

I am also grateful to Delsa Castelo, Dorothy Steele and Kusum Shori for their help in administrative matters.

Finally, my most distinguished thanks to my parents, my brothers and their families, and my sister and her family for their encouragement and their support throughout my entire student life.
I will always remain grateful to them.

Contents

Acknowledgments
List of Figures
List of Tables
Abstract

1 Introduction
1.1 Problems and Goals
1.1.1 The Figure/Ground Problem Using Contours
1.1.2 The Inference of 3-D Shape
1.2 An Approach
1.2.1 Scope
1.2.2 Method
1.3 Contributions of this Thesis
1.4 Outline

2 Previous Work
2.1 Inferring Ribbon Descriptions from Boundaries
2.1.1 Methods Using Perfect Boundaries
2.1.2 Methods Using Real Images
2.2 Inferring 3-D Descriptions from a 3-D Image
2.2.1 Surface-Based Methods
2.2.2 Volume-Based Methods
2.3 Inferring 3-D Descriptions from a 2-D Image
2.3.1 Psychological Accounts of Human Perception of 3-D Objects from a 2-D Image
2.3.2 Methods Using Perfect/Synthetic Boundaries
2.3.3 Methods Using Real Images

3 Projective Properties
3.1 Geometric Projective Properties
3.1.1 Geometric Projective Properties of SHGCs
3.1.2 Geometric Projective Properties of PRCGCs
3.1.3 Geometric Projective Properties of Circular PRGCs
3.2 Structural Properties
3.2.1 Visible Cross-Section Pointing Towards the Camera
3.2.2 Visible Cross-Section Pointing Away from the Camera
3.2.3 Non-Visible Cross-Section

4 Segmentation and Description of SHGCs
4.1 The Curve Level
4.2 The Symmetry Level
4.2.1 Detection of Local Parallel Symmetries
4.2.2 Grouping of Parallel Symmetries
4.2.3 Selection of Hypotheses
4.2.4 Verification of Global Correspondences
4.3 The Surface Patch Level
4.3.1 Detection of Local SHGC Patches
4.3.2 Grouping of Local SHGC Patches
4.3.3 Verification of SHGC Parts Hypotheses
4.4 3-D Shape Inference
4.5 Discussion
4.5.1 Further Capabilities and Limitations
4.5.2 Performance

5 Segmentation and Description of PRCGCs and Circular PRGCs
5.1 The Surface Patch Level
5.1.1 Detection of Local Curved-Axis Surface Patches
5.1.2 Grouping of Local Curved-Axis Surface Patches
5.1.3 Verification of Curved-Axis Parts Hypotheses
5.2 3-D Shape Inference
5.3 Discussion
5.3.1 Extensions and Limitations
5.3.2 Performance

6 Detection and 3-D Inference of Compound Objects
6.1 Properties of Compound Objects
6.1.1 End-to-End Joints
6.1.2 End-to-Body Joints
6.2 Object Detection
6.2.1 Detection of Joints
6.2.2 Analysis of Ambiguities
6.3 3-D Shape Inference
6.3.1 Classification of Parts
6.3.2 3-D Inference of Parts
6.3.3 Constraints on 3-D Shape from Joints
6.3.4 Analysis of the Results
6.4 Discussion
6.4.1 Strengths and Limitations
6.4.2 Performance

7 Conclusion
7.1 Summary
7.2 Future Research
7.2.1 Applications
7.2.2 Robustness
7.2.3 Efficiency
7.2.4 Scope

References

Appendix A. Proofs
Appendix B. Analysis of the Correspondence Finding Methods of SHGCs
B.1 SHGCs of Type 1
B.2 SHGCs of Type 2
Appendix C. Further Analysis of Curved-Axis Primitives
C.1 Equivalence Classes of PRCGCs
C.2 Size-Constancy of the Cross-Section Segments of PRCGCs
C.2.1 PRCGCs with Smooth Rotationally Symmetric Cross-Section
C.2.2 PRCGCs with Rectangular Cross-Section
C.3 An Additional Property of Circular PRGCs
C.4 Non-Circular PRGCs
Appendix D. Evaluation of the Inferred 3-D Descriptions of Circular PRGCs
D.1 Variation of the 3-D Descriptions as Functions of the Parameter Space
D.2 Similarity of the 3-D Descriptions with the Ground-Truth
D.2.1 Similarity of the Recovered Axis with the Ground Truth Axis
D.2.2 Similarity of the Recovered Scaling Function with the Ground Truth Scaling Function
Appendix E. Control of the Search and Complexity
E.1 Control of the Search
E.1.1 Indexing
E.1.2 Constrained Search
E.2 Time Complexity
E.2.1 Curve Level
E.2.2 Symmetry Level
E.2.3 Surface Patch Level
E.2.4 Object Level

List of Figures

Figure 1.1 A sample intensity image of a scene with a complex object.
Figure 1.2 Edges detected from the intensity image of figure 1.1.
Figure 1.3 Difficulties of the inverse-projection problem. a. ambiguities of the projection; b. possible view dependence of image boundaries.
Figure 1.4 Shape perception depends on the interaction of surfaces. a. sample boundaries; b. a cylindrical surface is perceived; c. the same surface might be perceived as a planar side of a book.
Figure 1.5 Sample generalized cylinder.
Figure 1.6 Classes of shapes addressed.
Figure 1.7 The two major levels of the method.
Figure 1.8 Illustration of the method.
Figure 1.9 Example of results of our method (from the image of figure 1.1). a. resulting description of the detected object (teapot); b. inferred 3-D descriptions for the parts that could be recovered.
Figure 2.1 Sample right ribbon.
Figure 3.1 Invariant (a) and quasi-invariant (b) mappings.
Figure 3.2 SHGC representation and terminology.
Figure 3.3 Example of linear parallel symmetry.
Figure 3.4 Geometric invariant properties of SHGCs.
Figure 3.5 Sample PRCGC and related representation.
Figure 3.6 Terminology of (the projection of) a GC.
Figure 3.7 Coordinates of the viewing direction in the Frenet-Serret frame.
Figure 3.8 Limb points properties for a circular PRCGC.
Figure 3.9 Invariant properties of circular PRCGCs.
Figure 3.10 Sample circular PRGC and related representation.
Figure 3.11 Limb points properties of an SOR.
Figure 3.12 Invariant properties of SORs.
Figure 3.13 Plots of the size of the space of observation for different upper-bounds of the image angles g (a) and f (b).
Figure 3.14 Property 3.12: a. 3-D plot of g as a function of the viewing angles; b. corresponding half viewing sphere.
Figure 3.15 Property 3.13: a. 3-D plot; b. half viewing sphere.
Figure 3.16 Properties hold exactly for special viewing directions. a. side view; b. frontal view.
Figure 3.17 Right ribbon and parallel symmetry correspondences. Coincident in the constant sweep case (a), offset in the non-constant sweep case (b).
Figure 3.18 Junctions used for the structural properties.
Figure 3.19 Inner-surface structural properties.
Figure 3.20 Impossible structural arrangements of self-occluding surface patches.
Figure 3.21 Closure patterns of an SHGC with a cross-section pointing towards the camera.
Figure 3.22 Closure patterns of a PRCGC with a cross-section pointing towards the camera.
Figure 3.23 Closure patterns of a circular PRGC with a cross-section pointing towards the camera.
Figure 3.24 Closure patterns for an SHGC with a cross-section pointing away from the camera.
Figure 3.25 Closure patterns for a PRCGC with a cross-section pointing away from the camera.
Figure 3.26 Closure patterns for a circular PRGC with a cross-section pointing away from the camera.
Figure 3.27 Closure patterns for an invisible cross-section.
Figure 4.1 A three-level hierarchy for the part level.
Figure 4.2 Co-curvilinearity measure for boundary grouping.
Figure 4.3 Example of results of curve level grouping. a. original boundaries; b. resulting boundaries. Boundaries are delimited by large dots.
Figure 4.4 Example of parallel symmetries produced on some images. a. initial boundaries; b. symmetry axes in thick lines overlaid on the boundaries.
Figure 4.5 Grouping of parallel symmetry elements.
Figure 4.6 Non-grouped symmetries. a. non-parallel connections; b. non-similar local and global scales.
Figure 4.7 Competing hypotheses. Grouping of ps1 and ps2 conflicts with that of ps3 and ps4.
Figure 4.8 Verification of linear correspondences.
Figure 4.9 Results of the symmetry level on some images.
Figure 4.10 Block diagram of the SHGC patch level.
Figure 4.11 Sample local SHGC patches of type 1. a. cylindrical patch; b. conical patch; c. non-linear patch.
Figure 4.12 Result of the "cross-sections" detection step. a. intensity image; b. boundaries resulting from the curve level; c. hypothesized cross-sections.
Figure 4.13 Finding local SHGC patches of type 1.
Figure 4.14 Using Corollary 3.4 to estimate the projection of the axis.
Figure 4.15 Examples of hypothesized local SHGC patches detected from the contours of figure 4.12. a. the "correct" hypotheses; b. examples of "false" hypotheses.
Figure 4.16 Sample local SHGC patches of type 2. a. cylindrical patch; b. conical patch; c. non-linear patch.
Figure 4.17 Local convexity and tangentiality of the cross-section in the vicinity of limb boundaries.
Figure 4.18 Using an ellipse to approximate the image of the generating region of the cross-section.
Figure 4.19 The search for co-cross-sectional points (a) and their completion (b).
Figure 4.20 Performance of the method on some examples. a. real image bilaterally symmetric boundaries; b. actual correspondences and axis; c. resulting correspondences and axis by the method; d. synthetic non-bilaterally symmetric boundaries; e. actual correspondences and axis; f. resulting correspondences and axis by the method.
Figure 4.21 Examples of geometrically compatible local SHGC patches. a. from figure 4.9; b. from figure 4.15; d. from figure 4.3; c, e, f are illustrative examples.
Figure 4.22 Structural analysis of geometrically compatible local SHGC patches.
Figure 4.23 Ambiguity of the local structure of discontinuous connections.
Figure 4.24 Axis-based cross-section recovery method. a. for LSHGCs; b. for non-linear SHGCs.
Figure 4.25 Axis-based cross-section recovery for previous SHGCs. a and b from figure 4.15.a; c from figure 4.21.a.
Figure 4.26 Limb reconstruction method.
Figure 4.27 Limb reconstruction for the SHGCs of figure 4.25.
257 Figure C.2 A rotationally symmetric curve................................................. 258 Figure C.3 Cross-section segments lengths of a PRCGC with a rectangular cross-section.............................................................................. 261 Figure C.4 Extremal cross-section and axis orientations are constrained in the image, a. coincident in a regular cut; b. non-coincident in an irregu lar cut.......................................................................................... 262 xiv Figure D. 1 Plots of size of the parameter space for which the recovered 3-D de scription is within different bounds of the actual one. . . . 266 Figure D.2 Synthetic objects and the resulting 3-D descriptions.............. 268 Figure D.3 Plots of the ground truth (a) and recovered (b) 3-D axes and scaling functions for the objects of figure D.2.a and b...................... 269 Figure E .l The spatial index used for direct access to image boundaries. 273 xv List of Tables Table 3.1 Viewing sets sizes (percent) where the observed image angle g is within 5o of 90o........................................................................ 80 Table 3.2 Viewing sets sizes (percent) where the image angle f is 3o or less. 82 Table 4.1 Evolution of the number of hypotheses of the SHGC module 155 Table 4.2 Run times of the SHGC module (m in:sec)........................... 155 Table 5.1 Evolution of the number of hypotheses of the PRGC module 180 Table 5.2 Run times of the PRGC module (minrsec.)........................... 180 Table 6.1 Evolution of the number of hypotheses of the SHGC module 228 Table 6.2 Evolution of the number of hypotheses of the PRGC module 228 Table 6.3 Evolution of the number of hypotheses of the object level. 228 Table 6.4 Run times of the SHGC module (m in:sec)........................... 229 Table 6.5 Run times of the PRGC module (m in:sec)........................... 229 Table 6.6 Run times of the object level (min:sec).................................. 229 Table D. 1 Similarity measures between recovered descriptions and ground truth descriptions....................................................................... 270 xvi Abstract To produce a description of a three-dimensional (3-D) scene, a computer vision system must have the capability to detect its constituent objects and describe them in some shape representation scheme. 3-D shape descriptions are of particular interest because they have the same dimensions as the real world objects. The 3-D descriptions can provide essential means to recognize objects, manipulate them, navigate around them and even learn about new objects. Achieving 3-D descriptions from a single intensity image is particularly important because such an image can be conveniently acquired with little control of the viewing conditions and it conveys by itself rich information about the viewed objects’ shapes. To develop systems with the ability to infer such 3-D shape descriptions from an intensity image is, however, one of the most challenging problems in computer vision. The problem has two central difficulties. First, objects are not directly given in an image. Finding them in the clutter of image features with noise, boundary breaks, markings, shadows and occlusion is a particularly difficult task. Second, because the image is 2-D, inferring the 3-D shapes is a mathematically under constrained problem. The first problem, known as the figure!ground separation (also object segmentation), has received little attention in past work. 
Most methods that addressed it derive 2-dimensional descriptions. The second problem has been addressed in the past, but mostly under the assumption of perfect and segmented images. In this thesis, we show that the segmentation and recovery of 3-D descriptions from a real intensity image can be solved for a large class of objects. The approach consists of using a structured shape description scheme based on generalized cylinders (GCs) and their joint relationships. Three large sub-classes of GCs are addressed in this thesis: straight homogeneous generalized cylinders (SHGCs), planar right constant generalized cylinders (PRCGCs) and circular planar right generalized cylinders (circular PRGCs). SHGCs are characterized by straight axes and continuous scaling of the cross-sections. PRCGCs are characterized by planar, curved axes orthogonal to constant-size cross-sections. Circular PRGCs are characterized by planar, curved axes orthogonal to varying-size circular cross-sections.

A central characteristic of the proposed approach is the explicit use of the three-dimensionality of the objects and of their desired descriptions to provide viewpoint insensitive solutions to the segmentation and 3-D description problems. For this, a number of rigorous projective geometric and structural properties of the classes of shapes addressed are analyzed. These properties are exploited to provide viewpoint insensitive constraints to detect relevant features in the image that are likely to project from volumetric objects. This differentiates this work from past work on generic monocular shape analysis from real images, which has used largely heuristic methods. We derive new geometric invariant and quasi-invariant properties of SHGCs, PRCGCs and circular PRGCs. We also provide novel ways to exploit such properties to segment and describe complex curved objects in the presence of real image phenomena, such as spurious and missing information, including occlusion. The method uses a grouping-based, hypothesize-verify methodology applied to a hierarchy of feature levels. It exploits the projective properties of the shapes of interest to generate, group and verify object hypotheses, and finally to infer their 3-D descriptions.

We believe this work represents an important advance in the development of practical vision systems for monocular 3-D scene analysis. To our knowledge, it is the first work that provides working methods for the segmentation and 3-D description by parts of complex curved objects from a single real intensity image. The proposed methods are demonstrated on several intensity images that are complex by current standards in the field.

1 Introduction

Computer vision has since its early days addressed how a description of a scene can be derived from one or more images of that scene. One aspect of describing a scene is to describe the shape of its constituent objects. Such shape descriptions provide compact representations which can be used to recognize the objects, manipulate them, navigate around them and learn about new objects. Such abilities are important for applications such as inspection, remote sensing, automatic target recognition, medicine, navigation and computer-aided design and manufacturing. Two central issues in achieving object descriptions are the choice of the shape description scheme and the development of methods that produce relevant descriptions from image information. Past work [8,13,43,48] has indicated a number of desirable characteristics for a shape description scheme.
They include stability, in that descriptions should not change with small changes in either imaging conditions or object shape, and locality, in that descriptions should be obtainable even if objects are only partially visible, for example due to occlusion. It is also desirable to use rich descriptions, so as to preserve expressiveness in the presence of incomplete image information, and descriptive ones, so as to analyze similarities and differences between different objects. Several shape description schemes have been used in the literature. They vary from low level ones based on point or line primitives to higher level ones based on object-level primitives. While there is no single universal scheme that is capable of representing all objects and at the same time satisfying the above characteristics, schemes that reasonably capture a large number of common objects can be derived. It is commonly accepted in the literature [5,8,13,43,48] that such schemes should be structured and segmented in that they consist of part/whole descriptions of objects, i.e. in terms of their components and their relationships. Such structured shape description schemes allow the analysis of the overall structure of an object as well as of the detailed shape of its components. Thus, they can be used to produce, as needed, different levels of abstraction. Furthermore, it is desirable to use 3-D shape descriptions because they have the same dimensions as the objects of interest. Such object-level 3-D descriptions are highly useful for 3-D scene analysis.

In this thesis, the goal is to produce such 3-D structured descriptions from an intensity image of a scene. An example that illustrates this goal is given in figure 1.1. In this figure, the desired result would be a description of the teapot in terms of four parts with specific relationships: its main body (the pot) in contact with the upper lid, and a spout and a handle each in contact with the pot. These parts would be described by a 3-D shape scheme.

Figure 1.1 A sample intensity image of a scene with a complex object.

The reason a single image is used in this thesis is that it is a cheaper alternative to multiple images such as in stereo or motion. Using multiple images requires finding correspondences between them, i.e. determining the images of the same scene point in all of them, and requires elaborate calibration procedures. Further, a single image by itself often contains rich information about the shape of the scene objects. This is demonstrated by our (human) ability to perceive and recognize complex objects from a single image. To achieve such ability in machines has continued to be one of the most challenging problems in computer vision.

However, inferring 3-D structured descriptions from a single intensity image has its own difficulties. First, the objects are not directly given in an image. In fact, all an image directly gives are measures of the light reflected onto the photosensitive elements of a camera. Methods that detect those objects in an image thus need to be devised. Such methods have to use the image data to find and separate the objects from one another and from the background. This problem, known as the figure/ground (or segmentation) problem, is central to many visual tasks and needs to be solved if computer vision is to have practical applications. Second, the image does not directly provide 3-D shape information (such as surface orientations and curvature) but only a 2-D projection of the 3-D objects.
Therefore, methods that infer the 3-D descriptions from the image also need to be devised.

The above problems also occur if direct depth measurements are available, such as in a range image. Such an image does not directly give its constituent objects either. It also requires processing of the depth values to infer descriptions in terms of some shape scheme. Moreover, the use of range images requires highly controlled viewing conditions (and technology). An intensity image, on the other hand, can be conveniently acquired under ordinary viewing conditions.

Before describing in more detail the specific goals of this thesis and our approach, a discussion of the issues and problems related to the figure/ground separation and 3-D shape inference is first given in the following section. At the end of this section, our goals are stated in a precise way. Section 1.2 discusses our approach and its relation to past work. The contributions of this thesis are highlighted in section 1.3.

1.1 Problems and Goals

An important issue in solving the object segmentation and 3-D shape inference problems from an intensity image is which image information (cue) to use. Several image cues have been used in past work. They include contours, shading and texture. Object contours convey rich geometric information and are stable with respect to illumination changes [8]. Shading refers to the reflectance properties of an object's surface, which depend both on its shape and on the illumination conditions. Thus, prior knowledge of photometric properties is needed to exploit shading. Use of texture requires the presence (and assumption) of some uniform pattern on the surface [73]. Experiments with humans indicate that the contour is as good for recognition as a full color image [5] and that in case of conflicts, contour cues generally dominate the shading cues [2]. The dominance of contour information over other cues can also be justified on computational grounds. All shape from 2-D image methods require some assumptions. We believe that those using contour are much less restrictive than those using shading, for example, where complete reflectance functions need to be known and assumed to be constant over the scene. For this reason, a contour-based approach is adopted in this thesis.

1.1.1 The Figure/Ground Problem Using Contours

Using contours for segmentation and description requires finding boundaries in the image. One possible way is to assume that the image of an object corresponds to a uniform intensity region. This is the approach taken by so-called region segmentation methods, which first find such homogeneous regions in the image and then extract their (closed) boundaries. While this type of approach works well for images with sharp variations between regions of interest and others (such as images of typewritten characters), such methods fail on images of objects which have markings (such as writings and drawings) on their surface, and they wrongfully join nearby regions corresponding to different objects at low contrast regions. For example, in the image of figure 1.1, the intensity is not uniform in the region occupied by the object or any of its parts. Notice, for example, the region corresponding to the lid, where specularities create sharp intensity variations. The pot itself has surface markings, and the region between the spout and the pot (near the occlusion point) has a low contrast which may result in joining the two parts. The alternative is to use edges produced by an edge detector.
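For concreteness, the following is a minimal sketch of how such an edge map can be produced with current tools. The thesis does not commit to a particular edge detector, so the use of OpenCV's Canny detector, the file names and the threshold values here are illustrative assumptions, not the thesis's actual setup.

```python
# A minimal sketch of producing the kind of low-level edge input discussed
# here. Detector choice (Canny via OpenCV), file names and thresholds are
# illustrative assumptions.
import cv2

image = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)  # e.g. the scene of figure 1.1
edges = cv2.Canny(image, 50, 150)  # binary map of local intensity discontinuities

# The edge map marks object boundaries, but also surface markings, shadows,
# specularities and noise; separating these is the figure/ground problem.
cv2.imwrite("scene_edges.png", edges)
```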
Edge detectors do not attempt to do segmentation into meaningful regions but only produce edges that correspond to local intensity discontinuities. Thus, in principle, they produce edges at regions corresponding to object surface and depth discontinuities, which are important for analyzing the geometry of an object and describing its shape. However, the intensity edges include not only object boundaries but surface markings, shadows, specularities and noise as well. This stems from the fact that intensity discontinuities (image edges) can result not only from discontinuities in surface orientation and depth but also from discontinuities in surface reflectance and illumination. Furthermore, due to the local nature of edges, object boundaries can be broken, due either to the edge detection algorithm (such as errors in edge localization) or to the image itself, particularly at low contrast regions. Occlusion between objects is yet another source of image boundary discontinuity. Figure 1.2 shows the edges detected from the image of figure 1.1. Notice that the boundaries are not all perfect, continuous or even part of the outline of the object (most are, in fact, not). Notice the clutter of edges inside the object caused by markings (the upper portion of the pot), shadows (the spout) and specularities (the lid and the handle). Also, notice that the pot is partially occluded by both the spout and the flat object in the front, which also occludes part of the spout. The pot itself partially occludes the handle, whose ends are not visible.

Figure 1.2 Edges detected from the intensity image of figure 1.1.

Despite these difficulties, we believe such edges to be more useful than region boundaries precisely because edge detectors do not attempt to segment the image into regions. They give the needed low level information to analyze the geometry of relevant object features, which is useful for achieving both segmentation and description. This, of course, necessitates the use of more global information in order to differentiate meaningful edges (projecting from relevant scene objects) from the others (corresponding to markings and shadows, for example).

1.1.2 The Inference of 3-D Shape

The use of monocular contours to infer the 3-D shape of viewed objects also introduces several difficulties. Because contours emphasize the geometric aspect of the image formation, the difficulties lie in the under-constrained nature of the problem. This is due to the fact that projection is in general not invertible. Thus, mathematically, there is an infinite number of 3-D objects (shapes) that could have given rise to the observed image boundaries. For example, for an image point p, the set of possible 3-D points P is the line that contains p and the focal point of the camera (possibly at infinity). The difficulty is compounded by the fact that certain curved object boundaries are an artifact of the viewing geometry and the object's pose and are thus viewpoint dependent (known as contour generators or limb boundaries). More precisely, they are the loci of points on the object's surface where the viewing direction lies in their tangent plane. Thus, their geometry can change in unpredictable ways when the viewpoint changes. Figure 1.3 illustrates this situation. The central issue here is how to use the image boundaries to infer the 3-D shape in a viewpoint insensitive way. But an image only offers appearances of objects, which do depend on the viewpoint.
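Stated in symbols (the notation is introduced here for illustration and is not the thesis's own): under orthography the focal point is at infinity, so the back-projection of an image point is a line of constant direction, and limbs are where the surface turns away from the viewer.

```latex
% Back-projection ambiguity: with viewing direction \mathbf{v} (orthographic
% projection, i.e. the focal point at infinity), an image point $p$ is the
% image of every 3-D point on the line
\[
\pi^{-1}(p) \;=\; \{\, p + t\,\mathbf{v} \;:\; t \in \mathbb{R} \,\}.
\]
% Limb (contour generator) condition: a surface point $P$ with surface
% normal $\mathbf{n}(P)$ projects onto a limb boundary exactly when the
% viewing direction lies in the tangent plane at $P$:
\[
\mathbf{n}(P) \cdot \mathbf{v} \;=\; 0.
\]
```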
It is crucial for 3-D scene analysis that the 3-D shape inference methods be relatively insensitive to viewpoint, so that the recovered object descriptions do not drastically change with changes in viewpoint.

Figure 1.3 Difficulties of the inverse-projection problem. a. ambiguities of the projection; b. possible view dependence of image boundaries.

To achieve this, an analysis of the viewpoint independent relationships between image and 3-D shapes needs to be performed. For example, a straight image boundary identified as a limb implies that the 3-D surface is locally parabolic [38]. In this case the relationship (constraint) involves a single boundary. The 3-D inference problem cannot be solved by examining boundaries individually, however. Rather, interactions between boundaries, such as symmetries, and interactions between surfaces are important in shape perception. This is illustrated in figure 1.4, where the same boundaries (figure 1.4.a) are perceived once as the curved surface of a cylinder (left-most example) and once as the planar top of a book (right-most example). In differential geometric terms, the surface orientations of two intersecting surfaces in the vicinity of their intersection curve are related by well-defined relationships: both surface orientations are orthogonal to the tangent to their common curve [23]. Thus, analyzing the interactions between the boundaries and surfaces of an object is also an important issue in 3-D shape from contour.

Figure 1.4 Shape perception depends on the interaction of surfaces. a. sample boundaries; b. a cylindrical surface is perceived; c. the same surface might be perceived as a planar side of a book.

In this case also, the shape description scheme plays an important role because it is an important source for the analysis of the viewpoint insensitive relationships between image and 3-D shapes. Thus, the figure/ground separation and the 3-D shape inference are related problems in that a chosen generic 3-D shape description scheme affects the scope, the methods and the underlying constraints of both problems in similar ways.

The above discussion of the issues and their difficulties, in a sense, defines the goals of this thesis. More precisely, this thesis addresses the figure/ground problem and the inference of 3-D segmented descriptions of complex curved objects that consist of the arrangement of several parts (compound objects), in the presence of imperfections such as noise, boundary breaks, surface markings, shadows and occlusion. It attempts to understand and solve the fundamental issues of automatically detecting relevant objects while explicitly using the 3-dimensionality of their desired descriptions. The goal is also to develop viewpoint insensitive methods for object segmentation and 3-D shape inference.

1.2 An Approach

From the previous discussion about real image imperfections, two broad issues need to be addressed in order to develop methods for object segmentation and description:

• what an "object" is
• how to infer "objects" from an image

The first issue is related to a description of an object both as a 3-dimensional body and as its projection in an image. The second issue is related to the algorithmic aspect of the problem, where the task is to actually find objects in the image. One possible way is to adopt an approach where the problem is to recognize image objects using a pre-stored database of object models. Each model is a particular object represented by its own features. The problem is transformed into one of finding correspondences between image and model features. The nature of the image features should be the same as those used to represent the object models. Successful matches between image and object features indicate the likelihood of the presence of an object model in the image.
Each model is a particular object represented by its own features. The problem is transformed into one of finding correspondences between image and model features. The nature of the image features should be the same as those used to represent the object models. Successful matches between image and object features indicate the likelihood of the 11 presence of an object model in the image. Such hypotheses can be verified by finding a consistent transformation to compute the pose of the object; i.e. all individual feature matches are consistent with one another. The description of a recognized object is thus one which has been a priori associated to the object model. This type of approach, which can be termed specific model-based, requires that the object be matched to some pre-existing model in order to obtain a description for it or even find it in the image. Consequently, the recognition process may be very expensive for large databases as both the correspondence and pose spaces become large. There are a number of ways, including representational and algorithmic, which can be used to reduce the search cost [27] but the scope of such approaches remains limited to the available object models which in most current systems do not exceed a few dozens. Another way is to use generic instead of specific models. This can be achieved by using a number of generic primitive shapes that are capable of representing classes of objects. A method that addresses such a generic class of shapes addresses all objects whose shape belongs to that class and is not limited to any particular object instance. A relevant issue, in this case, is which generic shape scheme to use and what its scope is; i.e. how large and common is the set of objects it captures. Since most objects in our environment are structured into one or more components, a natural choice of a generic shape scheme is one which is well suited to structured and 12 segmented descriptions. Further, one would ideally use a small number of generic shape primitives which capture a large number of common objects. This makes the method general and the segmentation problem tractable. Generalized cylinders (GCs) [7] are an example of such a generic shape description scheme. A GC is defined by a planar cross-section curve C, swept along a space curve (axis) A following a sweep function r; C, A and r constitute the intrinsic description of a GC (see figure 1.5). cross-section curve C / axis curve A GCs are particularly suited to structured and segmented 3-D descriptions because they provide natural means to describe a complex object by providing both an axial representation and the volumetric properties of its parts. Although not all objects in our environment can be described by GCs, the fraction of them which can is large [5,7,8]. Moreover, most of these objects can be captured by a small number sweep junction r Figure 1.5 Sample generalized cylinder 13 of sub-classes of GCs. In our approach GCs are used as the basic shape description scheme to describe objects. The scope of this thesis is discussed more specifically in the section below. The method is discussed in section 1.2.2. 1.2.1 Scope The adopted approach is to use a small number of sub-classes of GCs (primitives, or parts) and their relationships (joints) which we believe are common in our environment. The primitive GCs addressed are straight homogeneous generalized cylinders (SHGCs) and planar right generalized cylinders (PRGCs). 
SHGCs are obtained by sweeping a planar curve (cross-section) along a straight line (axis) while continuously scaling it. Examples include vases, bottles, screw-drivers and many other industrial and household objects. PRGCs are obtained by orthogonally sweeping a cross-section curve along a curved, planar, axis. Two sub-classes of PRGCs are actually addressed in this thesis: planar right constant generalized cylinders (PRCGCs) where the cross-section remains constant during the sweep and circular planar right generalized cylinders (circular PRGCs) where the cross- section is circular but whose size is allowed to vary along the sweep. Examples include tubes, some musical instruments, pipe fittings, snakes and some animal horns. Figure 1.6.a shows sample SHGCs, PRCGCs and circular PRGCs. 14 The type of joints addressed include end-to-end joints where parts are joined at their extremities and end-to-body joints where one part’s extremity is in contact with another part’s body. Figure 1.6.b gives examples of these types of joints. c j > SHGC PRCGC Circular PRGC a. generic volumetric primitives end-to-end joint end-to-body joints. end-to-body joint b. generic joints Figure 1,6 Classes o f shapes addressed. Using the above primitives and their relationships for the purposes of shape description or recognition can also be justified on psychological grounds. Psychophysical experiments conducted by Biederman [5] indicate that human perception and recognition from line drawings is highly influenced by perception of 15 a small number of shape primitives, geons (qualitative GCs). These experiments have lead to the recognition by components (RBC) account of human perception. Central to this theory is also the fact that such primitives induce a viewpoint independent perception (and recognition) from line drawings. 1.2.2 Method The method uses orthography to approximate the projection geometry.1 This is reasonable when the dimensions of scene objects are small compared to their distance from the camera. The method aims at producing descriptions that are not sensitive to the particular viewpoint the scene is viewed from, or equivalently to the pose of the scene objects. To achieve viewpoint independence, a careful analysis of the properties of image observables characterizing the classes of shapes addressed needs to be carried out. In the proposed approach, these image observables are derived from the analysis of the projective properties of those shapes. These properties consist of rigorous relationships between the features of an object as the projection o f a 3-D body. They also establish useful links between image and 3-D shape. They are obtained by analyzing the properties of the image of an object as functions of the viewing direction. 1. it is applied, however, to images acquired by a regular camera (which can be best modeled by perspective projection). See section 4.5.1 for a discussion. 16 Deriving the projective properties of shapes does not by itself provide mechanisms to find them in an image. To automatically segment the objects from a real image, requires the development of algorithms that detect “relevant” image observables and their relationships and use them to produce object-level descriptions in terms of GCs and their relationships. This necessitates the ability to discriminate between “relevant” observables (that correspond to meaningful objects) and “irrelevant” ones in a way that handles the previously mentioned image imperfections, including feature fragmentation and occlusion. 
This is naturally a search problem. How to organize the search to detect objects is central to the segmentation problem. The search algorithms and their constraints can also be derived from the projective properties of the generic shapes addressed. The problem of object segmentation and description is not new. Several efforts have been made in the past to achieve object descriptions from an image. Past work on the problem of segmentation and shape description can be characterized by the assumed nature of the input data, the shape descriptions sought and the methods used to compute them from the input data. Many approaches have assumed the same dimensionality between input images and output descriptions (for example 2-D from 2-D, 3-D from 3-D). Methods of 3-D descriptions from range data include those of [4,26,52] Methods of 2-D descriptions from image boundaries include those of [32,45,48,58,62], Among these latter, most assume perfect and/or segmented 17 boundaries. Efforts that addressed imperfect boundaries [32,45,58] have used mostly intuitive methods. Previous work on 3-D shape from 2-D boundaries includes early work on polyhedral scene analysis [20,33,37,41] and more recent efforts on curved objects [2,3,16,19,21,22,30,34,38,42,43,46,47,55,66,69,70,72,75, 76,77,78,79,80]. Among those, most also assume perfect and segmented boundaries. A subset of these efforts have also addressed projective properties of GCs, but mostly with application to either the detection of partial shape information such as in [21,55] or for shape description from perfect data also assumed to be given in a segmented form [28,46,69,70], Other recent efforts such as [3,22,34] have addressed the recovery of qualitative volumetric descriptions (geons) from monocular line drawings. Although they addressed the segmentation of input boundaries into meaningful object descriptions, they also assumed perfect and/or synthetic boundaries and did not address the real image imperfections discussed previously. The problem of segmentation and inference of 3-D structured descriptions from a single, real, intensity image has virtually not been addressed in the research community. An exception is the work of [66] which addresses SHGC descriptions from a real image (we will describe previous work in more details in section 2). 18 In this work, the approach is to exploit the projective properties of the classes of 3-D objects introduced in section 1.2.1, to segment and describe them from a real intensity image. Its different aspects are discussed in the sections below. Analysis and Derivation of Projective Properties The projective properties we seek must satisfy a number of requirements in order to be useful in solving our problem. Those requirements should include those postulated for a shape description scheme because the properties are the means to obtain the descriptions. For example, if a property cannot be locally applicable and requires the whole object boundary to be visible, then it is not useful for producing local descriptions under occlusion. Similarly, a property that is noise-sensitive cannot be used to produce stable descriptions. Because the projective properties are to be exploited by methods (for segmentation and shape description), it is also desirable that they satisfy additional requirements: • They should provide strong evidence on the existence of relevant objects. When observed, this makes them useful for hypothesizing presence of such objects from image boundaries. 
• They must be viewpoint insensitive and provide rigorous relationships between the 3-D descriptions of the objects and their image features. This makes them useful in producing viewpoint insensitive segmentation and 3-D shape descriptions.

We find two classes of projective properties to be of interest: geometric properties and structural properties. By geometric properties, we mean those that give relationships between the boundaries of the projection of a GC (henceforth a part) and that characterize its intrinsic description (as previously defined). Examples include properties that relate limb points that belong to the same cross-section. This type of property usually consists of symmetry relationships between the boundaries of a part. Structural properties characterize the interactions among boundaries of a part and of joined parts. They consist of junction relationships between those boundaries and provide not only constraints on regular interactions but useful information about shape as well.

Two classes of geometric properties are of interest: invariant properties and quasi-invariant properties. Invariant properties are those which hold independently of the viewing parameters;2 i.e. they have constant measures. Quasi-invariant properties have measures that are allowed to vary but do so slowly and remain restricted to a small range of values over most of the parameter space. In using the geometric and structural projective properties, the 3-dimensionality of the desired descriptions is explicitly taken into account. We will derive a number of invariant and quasi-invariant properties of SHGCs, PRCGCs and circular PRGCs in chapter 3. We will also discuss their use for the segmentation and shape description problems in chapters 4, 5 and 6.

2. except perhaps on a set of measure zero in the parameter space; i.e. almost everywhere

Hierarchical Grouping for Segmentation and Description of Volumetric Shapes

The second aspect of the proposed method lies in using a hierarchical, grouping-based method to detect objects and recover their shape. The method is organized in two levels: the part level and the object level (see figure 1.7). In the former, the objective is to form part hypotheses which satisfy the projective properties of the volumetric primitives in our catalog; in the latter, it is to form complete object descriptions in terms of parts and their joint relationships, represented as graphs. Each of these two levels is itself organized into several sub-levels which operate in a bottom-up fashion. The part level is organized in three sub-levels: the first one detects boundaries, the next one detects symmetries and the third one consists of two independent modules, one for the detection of SHGCs and the other for the detection of PRCGCs and circular PRGCs (henceforth, when mentioned jointly, we will use the term PRGCs or alternatively curved-axis primitives). The object level is organized in two sub-levels: the first one detects joints between parts and identifies the different interpretations possible from the available descriptions and the second one recovers (and completes) 3-D shape descriptions. Hypothesize-verify processes are used which at each level search for evidence of the presence of relevant (partial) object features, group together those likely to project from the same object, and verify object hypotheses and recover their shape whenever sufficient (global) information becomes available.
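To fix ideas, the short Python sketch below gives one possible encoding of the output descriptions: parts as nodes and joints as arcs of an object graph. This is only an illustrative sketch (the class and attribute names are ours, not the thesis'); the actual representation is developed in chapters 4 through 6.

    class PartHypothesis:
        """A volumetric part hypothesis produced by the part level."""
        def __init__(self, kind, boundaries):
            self.kind = kind              # 'SHGC', 'PRCGC' or 'circular PRGC'
            self.boundaries = boundaries  # supporting image boundaries

    class Joint:
        """A joint relationship detected at the object level."""
        def __init__(self, part_a, part_b, kind):
            self.parts = (part_a, part_b)
            self.kind = kind              # 'end-to-end' or 'end-to-body'

    class ObjectGraph:
        """Compound object description: parts as nodes, joints as arcs."""
        def __init__(self):
            self.parts = []
            self.joints = []
        def add_part(self, part):
            self.parts.append(part)
        def add_joint(self, joint):
            self.joints.append(joint)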
Figure 1.7 The two major levels of the method: from the intensity image, the part level produces part hypotheses, from which the object level produces segmented volumetric descriptions.

Throughout the hierarchy, the projective properties (geometric and structural) are used as a source of strong constraints for object segmentation and description. The motivation behind such an approach is two-fold. First, an object description involves features that are naturally hierarchical. For example, the description is in terms of several parts, each of which consists of several surfaces, each of which consists of boundaries, etc. Second, the observed projective properties used to form descriptions are only necessary properties of the image of an object, not sufficient ones to firmly conclude its presence. Therefore, image features which satisfy the properties at each level can only be used as hypotheses. In grouping features, we account for their fragmentation and obtain a better segmentation. Filtering out false hypotheses requires verification tests based on additional criteria which consist of the satisfaction of other properties (at the same or a higher level).

Although our method is largely bottom-up, each level does not rely on perfect decisions made at lower levels. Rather, each level contributes to the overall segmentation and description problems to the extent made possible by its scope. Wrong decisions, such as false negatives or false positives, can be corrected at higher levels where more global information becomes available. An instance is boundary grouping, which is effected at several levels of the hierarchy. In this sense, the method provides functional feedback.

Inference of Structured 3-D Shape Descriptions

The methods used in this thesis to infer 3-D shape descriptions are based on those of Ulupinar and Nevatia [69,70,71,72]. These latter produce viewer-centered descriptions in terms of the surface orientations at sampled points on the visible surfaces of an object. Our methods extend theirs to produce the 3-D intrinsic elements of SHGCs and PRGCs, including circular PRGCs which were not addressed in past work, giving object-centered descriptions, independent of the viewer coordinate system. Such descriptions are compact and symbolic and decompose a part's shape into independent components (cross-section, axis and sweep) which can be analyzed separately.

The methods of [69,70,71,72] cast the 3-D shape inference problem into a set of constraint equations on the 3-D shape unknowns, the surface normals. The source of those constraints is a combination of the (symmetry) properties of the boundaries of a part, knowledge of the projection geometry (orthographic) and knowledge of the class of the part at hand (SHGC or PRCGC). Our method further uses the joint relationships between parts of a compound object. More specifically, it exploits the end-to-end joints to set constraints on the plane orientations of the joined parts' cross-sections and to infer missing shape information such as the invisible cross-sections of joined parts. An overall illustration of our method, summarizing its different components, is shown in figure 1.8. This approach shares the philosophy of [45,58] where scene segmentation and shape description are solved concurrently rather than in a purely sequential fashion. Consequently, the figure-ground problem encompasses both (in the remainder of this thesis, we will interchangeably use the terms figure/ground and segmentation).
The method discussed in this thesis has been implemented and tested on several real intensity images, seven of which are shown in chapters 4 through 6. An example of the results produced by our method on the input of figure 1.1 is given in figure 1.9. The figure shows the resulting descriptions consisting of a graph whose nodes are the parts (primitive GCs) and whose arcs are joint relationships between the parts. The fact that 3-D shape descriptions have been recovered (whenever possible3) is demonstrated in figure 1.9.b, which shows the 3-D descriptions from different viewpoints. Each part is represented by the intrinsic parameters of a GC, all in 3-D space.

3. in certain cases, the image boundaries do not contain sufficient information to infer the 3-D shape, such as is the case for the shown teapot handle whose cross-sections are not visible.

Figure 1.8 Illustration of the method: a 3-D object projects onto an image; the projective properties constrain segmentation and description, whose output feeds 3-D shape inference to produce the 3-D shape description.

This work has several applications. The recovered descriptions can be used for recognition using higher-level object descriptions. They also have applications in robotic grasping, where the recovered shapes can be used to plan and execute a grasp plan, navigation, where the shapes can be used to avoid obstacles and plan optimal paths, and object learning, where shape-based symbolic descriptions allow the analysis of similarities and differences between objects.

Figure 1.9 Example of results of our method (from the image of figure 1.1): a. resulting description of the detected object (teapot); b. inferred 3-D descriptions for the parts that could be recovered.

1.3 Contributions of this Thesis

As previously mentioned, this thesis addresses a virtually unexplored problem, the 3-D shape segmentation and recovery from a real intensity image in the presence of noise, boundary breaks, markings, shadows and occlusion. Its contributions are the following:

• Derivation of new geometric invariant properties of SHGCs.
• Derivation of new geometric invariant and new geometric quasi-invariant properties of circular PRGCs.
• Development of a method for recovering 3-D shape (object-centered description) of circular PRGCs from monocular contours.
• Development of a method for viewpoint insensitive segmentation and 3-D object-centered recovery of SHGCs from a single real intensity image in the presence of noise, markings, shadows and occlusion.
• Development of a method for viewpoint insensitive segmentation of PRGCs from a single real intensity image with noise, markings, shadows and occlusion.
• Development of a method for viewpoint insensitive segmentation and recovery of segmented 3-D descriptions of compound objects (made up of SHGCs and PRGCs) from a single real intensity image with noise, markings, shadows and occlusion.

1.4 Outline

So far, the discussion has focused on defining and motivating the problem and giving an overview of the proposed approach and its implementation. In chapter 2, a more elaborate description of related research in the literature is given. The discussion will concentrate more on the generic shape-based methods as they are closer to this research. In chapter 3, a number of projective properties of SHGCs and PRGCs (both those derived in this thesis and relevant past ones) are discussed. In chapter 4, a method for the segmentation and inference of SHGCs from a single real intensity image is proposed.
In chapter 5, a method for the segmentation and inference of PRGCs from a single real intensity image is described. Since the methods for the detection of SHGCs and PRGCs share some processes, the shared processes will be described in the SHGC chapter. In chapter 6, a method for the segmentation and 3-D inference of compound objects (the object level) is proposed. Results of the implemented systems, together with their performance and the strengths and limitations of the proposed methods, are given throughout the discussion. Chapter 7 concludes this thesis and discusses future research issues. Some details of the mathematical formulations, such as proofs of the derived properties, analyses of the results, some implementation details relevant to the performance of the methods and the complexity analyses of key algorithms, are given in appendices A through E.

2 Previous Work

In this chapter, we discuss past work that has addressed the problem of monocular shape analysis. Because such work is too broad to cover exhaustively, we emphasize in the discussion the more closely related efforts on generic shape analysis. We organize the previous work into three categories depending on the dimensionality of the input image and the inferred descriptions. In section 2.1, we discuss some methods that use boundaries to infer 2-D generic descriptions based on so-called ribbons (defined later). The emphasis on ribbon-based methods is due to the common use of ribbons as 2-D counterparts of GCs. They provide generic axial descriptions which capture important shape characteristics of image objects such as surface elongation and curvature. In section 2.2, we briefly discuss past work on 3-D shape description from a 3-D range image. The methods we discuss also aim at producing meaningful descriptions from the direct depth measurements. Issues similar to those discussed for intensity images in section 1.1 arise for range images. We also emphasize the discussion on methods that use generic descriptors such as surface and volumetric primitives. In section 2.3, we discuss past work on 3-D shape descriptions from a 2-D image. The discussion emphasizes methods that address generic primitives such as GCs.

2.1 Inferring Ribbon Descriptions from Boundaries

There are many methods that attempt to find shape descriptions from boundaries. Here, we will discuss those that infer ribbon descriptions. Several (possibly overlapping) classes of ribbons have been used in the past and include so-called Blum's ribbons, Brady ribbons and Brooks' ribbons (see [54] for a comparison between the three classes). Since ribbons consist of symmetry relationships between boundaries, part of a ribbon-based description method is to find those symmetries. Most work that used Blum's ribbons or Brady's ribbons was concerned with actually computing those symmetries (recovering the ribbon descriptions) given perfect image boundaries of a single and simple planar object [11,14,54]. In this section, we discuss the methods that addressed the more general problem of finding "relevant" descriptions of non-simple objects. For this, additional issues need to be addressed besides finding symmetry relationships. They include how to obtain the "relevant" descriptions from all those which can be found, and the related issue of how to define the relevance of a description. We classify these methods into those that used perfect boundaries and those that used real image boundaries.

2.1.1 Methods Using Perfect Boundaries

We discuss two methods using different types of ribbons.
Nevatia-Binford

The method of Nevatia and Binford [48] was one of the first to address the segmentation and description of complex curved objects from image boundaries. The input is a range image of a complex (3-D) object, from which a closed outline is derived, and the output is a ribbon-based structured representation of the object. Before we proceed further into the description of the method, we first give the definition of the type of ribbons addressed, which we term right ribbons.

Definition 2.1: A right ribbon is obtained by sweeping a line segment along a plane curve (axis) while smoothly scaling it and maintaining it orthogonal to it. The axis passes through the mid-point of the swept segment. Figure 2.1 gives an example.

This definition can be generalized to include not necessarily a right angle but any constant angle α. This class of ribbons is also termed Brooks' ribbons (with a right angle if the angle is right). This class of ribbons can be thought of as the 2-D counterpart of GCs.

Nevatia and Binford's method consists of forming such ribbon descriptions from the outline. Since parts are not directly given, a search for such descriptions is made.

Figure 2.1 Sample right ribbon, showing symmetric (corresponding) points and the ribbon axis.

The authors use a projection method which identifies potential ribbon axes in different orientations, then apply a linking process, followed by a selection step, aimed at forming complete ribbon descriptions which correspond to parts. An analysis of the relationship between neighboring parts results in the establishment of joint relationships between them in a graph that represents the viewed object. The graph is a representation of the object where nodes correspond to joints and arcs correspond to joined parts (the dual version can also be envisaged). Each part is represented by its ribbon, from which several shape properties, such as measures of axis shape and cross-section size, are derived. The descriptions do provide 3-D information along the ribbons (also known as 2 1/2-D). The graphical representation can be used for object recognition using a graph matching approach.

Rom-Medioni

The method of Rom and Medioni attempts to find a "natural" hierarchical axial (ribbon-based) description of an object given by its outline. The method is based on
Also, the existence of extraneous boundaries, such as markings, would produce many spurious hypotheses whose effect on the true hypotheses are not addressed by the methods. 33 2.1.2 Methods Using Real Images There are many efforts to derive 2-D descriptions from a real image. In this section, we discuss two boundary-based methods that also detect ribbons. The first method addresses ribbon-based structured descriptions of compound objects and the second method addresses object segmentation using ribbon-like surface descriptions in a perceptual grouping framework. Rao-Nevatia The method of Rao and Nevatia [58] is an extension of the method of Nevatia and Binford [48] to handling imperfect image boundaries with gaps, markings and shadows. It takes as input a real image whose boundaries are extracted using a standard edge-detector. Obtained edges are used to find right ribbons segments using the projection method. Obtained ribbons may describe portions of relevant parts or may involve irrelevant or non-related boundaries. They are subsequently grouped to form super-ribbons with the intent to produce more complete parts. The grouping criteria are based on the local similarity of the ribbons; their orientation and widths for example. Obtained super-ribbons are used to form a graphical representation (,scene graph) intended to identify those which are closed, indicating isolated parts, and those which form joints with other ribbons. For instance, a ribbon is mapped onto four nodes, one for each extremity at each end of the ribbon. An analysis of the cycles of that graph is used to identify closures and joints. Ribbons 34 which are not verified (neither closed nor forming joints) are rejected. Others are accepted as descriptors of actual parts. Mohan-N evatia The method of Mohan-Nevatia [45] is similar in spirit to the method of Rao and Nevatia discussed above. It differs in the way the segmentation problem is organized and in the type of ribbons used. It uses perceptual organization as the basis for solving the figure/ground problem. Starting from edges which are linked into boundaries, their method uses co-curvilinearity as a basis for grouping boundaries into more complete contour structures. Axial descriptions between obtained contours are then detected. The type of ribbon they use is not the right ribbon (as previously defined), but one which maps extremities of a pair of boundaries and whose corresponding points have the same relative distance from the matched extremities. This process is applied to all obtained boundaries resulting in several axes of symmetry that include both “good” and “bad” ones. Selection of the “best” axial descriptions (i.e. of likely object surfaces) is based on a constraint satisfaction network implemented as a set of units with both positive and negative links. The network, after a few iterations, is intended to produce higher values for units representing “good” descriptions, based on a number of geometric regularity criteria. Those criteria include the aspect ratio of a symmetric pair of curves, the amount of skew between the curves, the length of the axis and the similarity of the symmetric curves. Selected axial descriptions are then subjected to a further 35 filtering step based on closure verification. The process results in surface descriptions of objects which are further merged, also according to geometric regularities, to produce object descriptions. 
For example, a hypothesized surface (axis of symmetry between two contours) which is completely included within another one is considered part of the same object as the enclosing one. Both of the above methods address several key issues of the figure/ground problem, namely the inter-dependence of segmentation and shape description and the selection of relevant (likely) hypotheses from false ones. The properties they use in their methods are generic in the sense that they apply to a wide range of objects and viewing conditions. For example, closure is a generic property of the image of an object. Their methods are rather of an intuitive nature, however. For example, the closure criteria used consist of simple boundary links between the extremities of a ribbon end. Although this works well for a purely planar shape, it does not capture the more complex closure patterns of 3-D objects. The selection criteria are also intuitive. There are other methods which detect ribbons in a real intensity image using both contour and intensity information. An example is the method of [32] whose geometric part is similar to that of [58] in that it uses smoothness of the shape variation between ribbons for their grouping. That method uses prior information about expected geometric and photometric properties of ribbons. 36 In essence, past ribbon-based methods have used ribbons as intuitive descriptors of the image of scene (3-D) objects. The segmentation and description problems use largely heuristic constraints and do not take advantage of the 3- dimensionality of the objects they describe. That is, they do not actually relate ribbon descriptions to 3-D shape of the objects or to the descriptions that would be obtained for the same scene viewed from a different viewpoint. This is of course a crucial issue if the purpose is obtaining 3-D shape descriptions. 2.2 Inferring 3-D Descriptions from a 3-D image There is a large number of methods for producing 3-D shape descriptions from a 3- D range image. We will limit this section to a brief overview of two classes of methods: those producing surface-based descriptions and those producing volume- based descriptions. 2.2.1 Surface-Based Methods Surface-based methods address the description of scene objects, from a range image, in terms of surface patches and their relationships. One issue is therefore how to segment the range image into meaningful surface regions. This can be done using either a boundary-based approach or a region-based one. In both cases, local shape measurements on the range image need to be computed first. Curvature measures or planar approximations are examples. A surface region can be defined by a set of enclosing boundaries ([26] for example) or by the aggregation of similar 37 (in the shape measurements sense) neighboring sub-regions ([4] for e.g). Obtained surfaces can then be described by their shape properties, such as area or some fitted surface function. A scene description can simply consist of the obtained surfaces (if the scene contains only one object) or of a more elaborate object-level representation. To infer objects from the surfaces, an analysis of the connectivity relationships between the surfaces is useful. An example is the method of [26] which infers adjacency based on whether boundaries are convex, concave, limb or jump. 2.2.2 Volume-Based Methods Volume-based methods address the inference of descriptions in terms of some generic volumetric shape primitives. 
There have been methods that use sparse range data and others that use dense range date. An example of the former is the method of Rao and Nevatia [57] for inferring LSHGC (linear SHGC) descriptions. Examples of the latter are methods that fit particular types of generalized cylinders, such as [1], and those that fit super-quadrics, such as [52], and super-ellipsoids, such as [12], to a range image. Such quantitative volumetric primitives as super-quadrics are described by several parameters. Each parameter specifies a certain property of the shape such as tampering and bending. Issues in fitting volumetric primitives to range data include the definition and evaluation of “goodness of fit” measures and how to achieve segmentation (i.e. produce adequate volume descriptions right where parts are). 38 There has been more emphasis on the first issue rather on the second. We believe this to be due to the inadequacy of expressing object segmentation as an optimization problem. In the system of Pentland [52], initial segmentation (skeletonization) is first manually selected then used to find “best fitting” super quadrics using a coarse to fine optimization procedure. Another problem with primitives such as super-quadrics is that the parameters do not have a “natural” meaning. To fit a super-quadric to a 3-D point set, 14 parameters (specifying the deformation of a prototype super-quadric) need to be found which give among others, its position, orientation, squareness / roundness, bending and tapering. It is difficult to predict the effect of a change in one of these parameters on the shape of the primitive as a whole. We find GCs to be more natural descriptions as they capture intuitive notions such as cross-section and axis. Super-quadrics do not explicitly capture such fundamental shape notions. Another way volumetric descriptions could be achieved is to use differential geometry as a means to predict surface properties. For example, in the method of Rom and Medioni [63], sub-classes of GCs including SHGCs and PRCGCs were used as generic volumetric primitives. Some of their differential geometric properties were used to obtain useful starting features to obtain segmentation into meaningful parts. 39 The advantage of range images is that they provide direct depth measurements and do not produce extraneous information such as markings. However, it appears that the best results have been obtained when edges are explicitly used. Further, range images require carefully controlled viewing conditions, which is not easily achieved outside a properly engineered environment. 2.3 Inferring 3-D Descriptions from a 2-D image Inferring 3-D shape descriptions from a 2-D image is a harder problem than when the image and the shape descriptions have the same dimension. There have been past efforts that addressed issues of this problem, namely the analysis of the image properties of 3-D shapes and the way they can be used to actually infer 3-D descriptions. Some previous work has addressed both issues resulting in methods and accounts for 3-D shape recovery and some of it has only addressed the first issue resulting in a set of useful projective properties. We will first discuss some psychological accounts of human perception of 3- D objects from a 2-D image. Then, we discuss recovery methods which assumed perfect or synthetic boundaries and finally we discuss methods which have addressed real images. 
40 2.3.1 Psychological Accounts of Human Perception of 3-D Objects from a 2-D Image There are two schools of thought on human perception and recognition of 3-D objects from a 2-D line drawing. The first one contends that humans register a set of “familiar” views of an object and achieve perception (and recognition) of a new view by comparing it with those already in store using, if needed, a mental rotation. This is the “view-specific” account of human perception [25,61]. The second one argues that object perception is achieved through view-invariant perception of a set of volumetric primitive shapes which do not require any familiarity with the particular object. This set of primitives are the basic elements of an object’s shape. This is the “invariant-primitives” account of human perception [5,6]. Both schools have presented arguments as to the validity of their accounts. The arguments given by the “view-specific” school involve experiments on wire frame-like objects where our human perception of the 3-D shape of the objects requires some prior familiarity with those objects. The arguments given by the “invariant-primitives” school involve experiments on more common objects which consist of volumetric parts. Studies conducted by Biederman [5] have revealed that a small number of volumetric primitives, geons, constitute the basis for invariant 3-D shape perception. This has lead to the recognition by components (RBC) account of human perception. Biederman and Gerhardstein [6] suggest three conditions for an 41 object to induce this invariance: that the object admits a description in terms of volumetric primitives such as geons, that the same structural description of the object in terms of those primitives is obtained for different viewpoints and that different objects produce different structural descriptions in term of those primitives. We believe that most common objects do satisfy the above conditions. This is not to imply that the “view-specific” theory is wrong however. Rather, the objects which require several specific views for their recognition seem to be less common than those which do not. 2.3.2 Method Using Perfect/Synthetic Boundaries Methods for inferring 3-D descriptions from a perfect and/or synthetic 2-D boundary image can be classified into those producing quantitative descriptions and those producing qualitative descriptions. 2.3.2.1 Quantitative Shape Recovery Methods Quantitative shape descriptions consist either of surface descriptions based on surface orientations or the parameters of some volumetric primitives. Early work has addressed polyhedral objects and include Mackworth’s system [41] which addressed recovery of the orientations of the faces of a polyhedral object from its 2-D outline. The basis of the method is the use of constraints in gradient space (a 2-D space used to represent vectors) on the face normals from the relationships 42 between their boundaries in the image. For example, a line common to two faces is perpendicular, in gradient space, to the line joining the orientations of those faces. Kanade [37] has used skew symmetry as the basis for constraining the face orientations. The idea is that a skew symmetric figure can be interpreted as rising from an orthogonal symmetry in 3-D and consequently constrains the set of possible orientations of the figure to a hyperbola in gradient space. Additional constraints, as the one used by Mackworth based on the intersection of faces, can be used to further restrict the possible solutions. 
Curved objects have more recently been the subject of interest of some researchers. Stevens [68] proposed a method for inferring the 3-D shape of a cylindrical surface patch from the image of rulings drawn on its surface. The idea is also based on the interpretation of image angles (between the rulings) as projections of 3-D orthogonal ones. Similar methods based on this idea were also proposed by Xu and Tsuji [74]. Other classes of methods have used a different approach based on maximizing some regularity criteria. Examples include Brady and Yuille’s shape compactness method for recovering the orientation of a planar face from its 2-D outline [15] and Barrow and Tannenbaum’s shape smoothness method for recovering the orientation of a curved surface from its 2-D outline [2]. The above efforts have addressed individual surfaces not volumetric shapes. Work on volumetric shapes includes Nalwa’s derivation of a bilateral symmetry condition for surfaces of revolution (SORs) [47] which are SHGCs with a circular 43 cross-section and which are symmetric with respect to their axes. It also includes Ponce et al.'s derivation of some projective invariant properties of SHGCs [55]. Those properties are detailed in chapter 3. The properties were used to identify partial shape information (the projection of the axis). They did not, nor did Nalwa, provide methods to recover complete 3-D descriptions. Other methods to recover partial shape information include [21] which recovers the scaling function of an SHGC from its boundaries and prior knowledge of the cross-section shape. Other more recent efforts have derived methods for complete recovery of 3-D shape of GCs. Some of them use only contours and others use a combination of contours and shading. Examples of this latter type of methods include those of Gross and Boult’s [28] and Nakamura et a l * s [46] for recovering SHGCs. In the following, we will discuss in some detail the method of Ulupinar and Nevatia which uses only contours. Their method is a good example to discuss because it captures the key issues of the problem and also because our method uses some of their results and thus the discussion is relevant to the description of our research. Ulupinar-Nevatia Ulupinar and Nevatia [69,70,71,72] have addressed the recovery of a number of generic shapes including zero-gaussian curvature (ZGC) surfaces, SHGCs, PRCGCs and multiple-ZGC surface objects. The input to their methods are perfect (synthetic) and segmented boundaries of an object and the output is a description of 44 the surfaces of the object in terms of their orientations (surface normals). The basis of their method, as the one of Kanade for polyhedral objects, is that a number of symmetry relationships in the image strongly constrain 3-D shape. They have derived a number of orthographic invariant symmetry properties of ZGCs, SHGCs and PRCGCs which relate their boundaries in the image. An example is parallel symmetry which is a property of the cross-sections of an SHGC and of the limb boundaries of PRCGCs (the properties are given in detail in chapter 3). The correspondences given by those symmetries are used in a set of differential geometric constraints on the surface orientations in gradient space. For an SHGC, before applying those constraints, their method uses the cross- section and the limb boundaries to “rule” the surface; i.e. reconstruct the image of the cross-sections and meridians. Application of the constraints on say n points (i.e. 
2n surface normal unknowns) along each of m cross-sections of interest results in 2mn -4 equations, clearly an under-constrained system of equations. The authors show that under the assumption that the SHGC is “right” (cross-section orthogonal to the axis), then 3-D shape is determined up to one degree of freedom. They suggest that this latter can be fixed by finding the interpretation which favors a medium cross-section slant obtained by an ellipse fitting algorithm to the observed cross-section curves. A similar method is used for ZGCs and PRCGCs. This method does not produce the intrinsic description of a GC but surface orientations in a viewer- 45 centered coordinate system. We will show, however, in chapter 6 that for the class of GCs addressed, there is a relationship between both descriptions. 23.2.2 Qualitative Shape Recovery Methods In this sub-section, we discuss methods that have addressed the recovery of qualitative volumetric descriptions from perfect or synthetic image boundaries. These methods provide implementations of some aspects of the RBC theory of Biederman. Dickinson et al. Dickinson et al. [22] approach the problem using a distributed aspect graph (DAG) matching method. The idea is to use a small number of geons (ten) as models from which a hierarchical aspect representation is generated together with conditional probabilities on the mappings between 3-D models and 2-D aspects. This is achieved by sampling the viewing sphere at several points and finding the different aspects of the images of the 3-D primitives. The descriptions in the aspect hierarchy consists of qualitative relationships between faces and boundaries, such as convexity and adjacency. Identification of geons from given, manually segmented, boundaries proceeds by first forming faces then finding possible matches (in a graph-theoretic sense) between each face and those of the aspect hierarchy. Each hypothesis is associated with its probability extracted from the appropriate pre computed probability matrix mapping boundary groups to faces. Face hypotheses 46 are then used to find potential aspects from the model aspect hierarchy that could have given rise to them. Those aspects are then mapped to potential 3-D primitives that could have generated them. The interpretation in terms of 3-D primitives of the set of image faces are chosen to be those which produce the highest probabilities (i.e. the most likely interpretation). Relationships between primitives are then extracted to produce a representation based on geons and their connectivity. Bergevin-Levine Bergevin and Levine [3] have taken a different approach than Dickinson et al.'s for the inference of geon-based structured descriptions from a 2-D image. Their method consists of using an “object visibility model” whereby expected image properties of meaningful parts and their connectivity are defined. The input to their method is a line drawing (perfect boundaries) of an object and its output a decomposition of those boundaries into a set meaningful parts. Initially, boundaries are manually partitioned into straight and curved segments. A set of non-accidental properties such as co-terminations, junctions, parallelism, corners and T-junctions are subsequently detected to provide the needed low-level structures to form faces. These latter are formed using a contour following algorithm based on pairing corners and T-junctions. 
Two subsequent steps, the part inference and the connectivity extraction, proceed to form a part/whole description of the line drawing. The part inference step identifies among the detected faces those which would form consistent parts hypotheses. Consistency is defined in terms of a 47 number of qualitative criteria such as similarity of the faces and their adjacency (derived from their visibility model). A unique geon label is then assigned to each detected part. The label consists of four attributes, namely whether the cross-section is symmetric or not, whether it is polygonal of curved, the way its size varies (constant or expanding for e.g.) and whether its axis is straight or curved. This inference is also based on the qualitative properties of the faces previously found. The connectivity between parts is inferred from the junctions between them. The resulting structured descriptions have been applied to recognition from a database of similarly represented objects. Hummel-Biederman Hummel and Biederman [34] have proposed a neural network implementation of the RBC theory. The input to their method is a synthetic, but possibly broken, line drawing depicting an object and its output a structured description of the object in terms of geons and their relationships. To achieve this, the authors use a seven layer architecture (JIM) whose lower levels are dedicated to parsing and representing image edges, junctions between them and surface properties such as symmetries and elongation. The higher levels are dedicated to representing geons (using their attributes such as cross-section and axis shapes), geon relations and feature assemblies and objects. The crux of the method is to use dynamic binding to represent a viewpoint-invariant structured description of objects whereby conjunctions between features of the same group 48 (geon, object, junction) are captured by synchronizing the activities of the corresponding cells using fast enabling links (FELs). For example, an object consisting of one geon on top of another would result in the synchrony of the activities of the cell corresponding to the first geon with that corresponding to the “above” relationship (a similar representation would be achieved for the second geon). Each geon would correspond to the synchrony of the activities of the cells representing its attributes, etc. The process proceeds by forming an FEL chain for each geon. This begins with the grouping of edges using a dynamic process where continuity (co circularity) between boundaries favors their grouping and the presence of junctions prohibits it. Links between junctions are determined by links between their edges. Thus edges and junctions (vertices) constitute the basic building blocks to form parts. Although the main focus of the method is to achieve viewpoint-invariant geon-based descriptions from a line drawing, it has also been used to perform object classification based on those descriptions at the highest levels of the architecture. The method has been applied to single object line drawings and assumes that certain features such as axes of symmetry are given in advance. The above methods have not been applied to imperfect (real) image boundaries with gaps and markings for example. It is to be expected that they would have difficulties correctly interpreting such boundaries. For example, in the method 49 of [3] the boundary following algorithm may not produce correct segmentation if surface markings interfere with the object’s boundaries. 
We believe that clean boundaries that belong only to the outline of relevant objects cannot be obtained by a low level filter applied to image edges. As previously mentioned in the introduction, and as will be shown in this thesis, this requires simultaneously solving the segmentation and shape description problems, using not only local edge information but global evidence of regularity as well. 2.3.3 Methods Using Real Images Past work that has addressed inference of 3-D descriptions from real (imperfect) images can be divided into specific-model-based approaches and generic ones. While several 3-D from 2-D specific-model-based methods can be found in the literature, very little can be found on generic methods. In this section, we will not survey the methods “purely” based on specific models to recognize 3-D objects from 2-D images as most of them focus little on obtaining generic descriptions in terms of higher level primitives. A suitable source for such a survey would be [35,36]. Recent attempts, such as [29], do address descriptions in terms of groupings of a number of perceptually salient generic features such as symmetries and closure. Here, we discuss two methods for recovery of 3-D objects from a single real 2-D image. The first method uses specific models but also makes use of generic 50 shapes. The other method we discuss follows a generic approach. Both methods do address the segmentation problem' Brooks A classical example of a method using specific object models for the detection of 3-D objects from real monocular image boundaries is Brooks’ Acronym system [16]. The input to this system is an intensity image and the output a description of identified objects in terms of their location, orientation and shape. For this, Acronym uses a database of 3-D models (airplanes) represented as generalized cylinders with their shape attributes and their connectivity relationships. The goal is to identify instances of such airplanes (with their pose) in aerial images of airports. The model airplanes are used to predict image features that can serve to hypothesize presence of image airplanes. Using a rule-based geometric reasoning process, generic shapes such as right ribbons and ellipses are predicted together with ranges for their shape parameters and their relationships. Only straight axis ribbons with linear sweeps are used to represent the images of the 3-D GCs corresponding to parts of airplanes (bodies and wings for e.g). This process results in a prediction graph used to find consistent features in the image. These latter are in turn used to induce back-constraints on the candidate 3-D models both to check global consistency of the image hypotheses and determine the pose of the 3-D models. The matching process, based on symbolic algebraic constraints, either rejects hypotheses or results in consistent identification of models and their refined 51 pose. Although, in principle, this system can be applied to different shape models, it has been tested only on straight axis shapes (airplanes) with a top view. Sato-Binford Sato and Binford [66] have addressed generic detection and recovery of SHGCs from a real intensity image. The input to their method is an intensity image and the output the 3-D intrinsic description of the detected SHGC. Their method is organized in four modules. The first module, end finder, is intended to detect the ends of an SHGC. Those ends are the parallel symmetric parts of the visible “top” and “bottom” cuts (cross-sections). 
Parallel symmetry is detected using a Hough like algorithm to find the correspondences between a pair of curves. In practice, one end is hand-selected and its symmetric ones are detected. The second module, meridian finder, attempts to identify the meridians of an SHGC given the previously detected symmetric ends. For this, junctions involving the detected ends are detected first. Then, using connectivity criteria between junctions, boundaries are tested for some of the properties of SHGCs similar to those used by Ponce et al. [55]. The correspondences between a pair of candidate meridians are assumed to be all parallel to the line joining opposite junctions at one of the symmetric ends. This results in pairs of boundaries that can be interpreted as meridian projections and an estimate of the image axis of the SHGC. The third module, cross-section finder, consists of finding the cross-section and the fourth module, 3-D recoverer, consists of recovering 3-D shape. In this latter, the cross-section is assumed to be skew- 52 symmetric and the axis to pierce it at its center. This constraint provides a 3-D orthogonal system (the planar cross-section system and the axis) which can be used to estimate the scaling function. This method captures a desirable characteristic of inferring 3-D descriptions from a 2-D image and which is the use of well defined projective properties. However, it has been applied to single object scenes and does not handle occlusion. Its scope is further restricted to SORs and LSHGCs (linear SHGCs, where the scaling function is linear in the position along the axis). This method has some similarities with ours on the detection of SHGCs (both were developed independently of each other). The differences lie in the more general scope of our method, which handles occlusion and other types of SHGCs, and in the way the projective invariant properties are applied. Apart from the work of Sato and Binford, and ours, we are not aware of any other work on the automatic segmentation and 3-D inference of GCs from a single real intensity image. 53 3 Projective Properties In this chapter we discuss the projective properties of the three sub-classes of GCs used in this thesis: SHGCs, PRCGCs and circular PRGCs. These properties are important because from them we can derive features and their relationships that can be used for object detection and for 3-D reconstruction. The properties take explicit account of the three-dimensionality of the objects and of their desired descriptions. We first discuss the geometric properties of the primitives and then their structural ones. 3.1 Geometric Projective Properties Geometric projective properties1 are those which characterize the intrinsic description of a GC, namely its cross-section, its axis and its scaling function. They consist of (symmetry) relationships between the boundaries of an object and fall in two categories: invariant properties and quasi-invariant properties. As introduced 1. note that we use the term projective properties to refer to properties of the (orthographic) projection. 54 in section 1.2.2, invariant properties hold independently of the viewing direction while quasi-invariant properties are allowed to vary but remain restricted to a small range of values over a large fraction of the viewing sphere. Both types of properties can be thought of as parameterized mappings between two spaces. Figure 3.1 gives an illustration. 
An invariant mapping I maps an element X of the space V (which could be R3, for example), through a point P of the parameter space p (which could be multi-dimensional; the viewing sphere, for example) onto a point IP (X) of some range space R (which could be the set of real numbers, for example). The fundamental property of the invariant mapping / is that IP (X) is actually independent of P and depends only on X; i.e. 7P (X) = /,» (X )V />,/>’ £ p . A quasi-invariant mapping Q maps an element X through a parameter point P onto Qp (X) which has the property that it lies within a relatively small subset 5 (X) of R when P varies over most of p ; i.e. QP (X) G S (X) for P G “most of p . ” This formulation is clearly not mathematically rigorous. Despite some recent attempts to formalize quasi-invariants as rigorously as invariants [10], they are best analyzed statistically [9,17]. This is due to the difficulty of formally expressing notions such as “small range” and “most of the space.” We have used the above formulation because it parallels the one used for invariants. We will address the 55 statistical analyses of specific quasi-invariant properties derived in this thesis later in this section. mapping / a mapping Q b. Figure 3.1 Invariant (a) and quasi-invariant (b) mappings. We start by discussing the geometric projective properties of SHGCs; the properties of PRCGCs and circular PRGCs are discussed next. The projection geometry is assumed to be orthographic in this thesis. The use of the properties for the segmentation and recovery will be discussed in chapters 4 and 5 and 6. 3.1.1 Geometric Projective Properties of SHGCs SHGCs and their properties have been studied by several researchers in the last few years [55,67,69]. We include the relevant properties from previous work and new properties we have derived in the discussion below. First, we give relevant definitions. Definition 3.1: An SHGC (straight homogeneous generalized cylinder) is given by the intrinsic description {C, A, r}, where C is a planar curve (cross- 56 section), A a straight line (axis; not necessarily orthogonal to the cross-section plane) and r a scaling function. From the definition of the intrinsic description given in section 1.2, an SHGC is the surface obtained by sweeping C along A while scaling it by r. Let C(t) = (u(t), v(t)) be a parametrization of C in a 2-D orthonormal frame (O, h, b) attached to its plane, r(s) the scaling function and a the angle between the cross- section plane and the SHGC axis (^-direction), then the surface of the SHGC can be parameterized as follows (using the formulation of [67]): S(t, s) = ( u(t) r(s) sina, v(t)r(s), s + u(t) r(s) c o s a ) (3.1) Equation (3.1) implies that for each value of s, the cross-section is scaled by an amount r(s). We impose C(t) to be piecewise differentiable (C1) and the function r(s) to be C 1 so that edges created on the surface are only due to discontinuities of the cross-section function derivative. When r{s) is linear, i.e. r(.s) = a (s - s q ), we obtain a linear SHGC, or LSHGC, examples of which are cylinders and cones. When a = n / 2, we obtain a right SHGC (RSHGC), making the 3-D coordinate frame (O, h, i> , I) orthonormal. Curves of constant t are called meridians and curves of constant s are called cross-sections (alsoparallels). The point O where the axis pierces the cross-section plan is called the origin of the SHGC. Note that C need not be circular and O is not necessarily its center. Also, we are only interested 2. 
in case an object has edges caused by discontinuities in the scaling function, it will be considered as more than one SHGC; the superposition of a cone and a cylinder for example. 57 in closed cross-section curves, a characteristic of volumetric parts. Figure 3.2 illustrates the terminology and the chosen configuration of the axes. v cross-sections meridians ► s (aligned with SHGC axis) Figure 3.2 SHGC representation and terminology. Definition 3.2: Two planar unit speed curves3 C\(w{) and C2(w2) are said to be parallel symmetric [69] if there exists a continuous and monotonic function/, such that T ^ w f) = T2(w2) and w2 = /(u^); where TfiVj) is the unit tangent vector of Ctiwi). Thus, corresponding points have parallel tangent vectors. The correspondence is said to be linear if/is a linear function. In this case the two curves are similar up to scale and translation. The axis is the locus of midpoints of lines o f symmetry (also lines o f correspondence) which join symmetric points. Figure 3.3 gives an example. A property of linear parallel symmetric curves is that lines of symmetry are either mutually parallel (for a unit scaling) or all intersect at one point (apex). 3. a curve is unit speed if it is parameterized by arclength. 58 lines of parallel symmetry (also lines of correspondence) axis of symmetr Figure 3.3 Example o f linear parallel symmetry Now we state the invariant properties of SHGCs. Those that have been derived in previous work [55,67,69] are stated without proofs. The ones that we introduce here are given with their proofs. Figure 3.4 illustrates the properties. Property 3.1: Cross-section curves of an SHGC are mutually parallel symmetric with a linear correspondence. This property holds in 3-D and in the 2-D projection regardless of the viewing direction. The proof can be found in theorem 4 and its corollary in [69]. Property 3.2: Contour generators (limbs) of an LSHGC are straight (they are its meridians). This property holds also for the 2-D projection of limbs which are projections of those meridians, regardless of the viewing direction. The proof can be found in section 4 of [67]. 59 Property 3.3: In 3-D, tangents to the surface in the direction of the meridians at points on the same cross-section, when not parallel, intersect at a common point on the axis of the SHGC [67]. In 2-D, tangents to the projections of limbs intersect on the projection of the axis at a common point regardless of the viewpoint [55,69]. The properties we add have been reported without proofs in an overview of this work in [50]. Equivalent ones have been independently derived by [66]. Property 3.4: We give this property in the form of a theorem and its corollary below. Their proofs are given in appendix A. Theorem 3.4: Lines of correspondence between any pair of cross-section curves are either parallel to the axis or intersect on the axis at the same point. Corollary 3.4: In the image projection, lines of parallel symmetry between any pair of projected cross-sections are either parallel to the projection of the axis or intersect on it at a common point regardless of the viewing direction. 60 Property 3.5: Let Ci(u) and C2 (v) be two unit speed parallel symmetric curves with a linear correspondence f u ) = au + b. Then for all u and u’ the vectors V _i = Ci(w’) - Cj(m) and V2 = C2(au’ + b) - C2 (au + b) are parallel and \V2\ / \VL \ = a (i.e. the ratio of their lengths is constant and equal to the scaling of the correspondence). The proof of this property is also given in appendix A. 
Figure 3.4 Geometric invariant properties of SHGCs: parallel symmetric cross-section curves (property 3.1), straight LSHGC limbs (property 3.2), co-intersecting tangents at corresponding points (property 3.3), co-intersecting lines of parallel symmetry (property 3.4) and parallel corresponding vectors (property 3.5).

3.1.2 Geometric Projective Properties of PRCGCs

We first give relevant definitions, then give the properties from past work and the ones newly derived in this thesis.

Definition 3.3: A PRCGC (planar right constant generalized cylinder) is given by the intrinsic description {C, A, r}, where C is a planar curve (cross-section), A a planar curve (axis) always orthogonal to C (and passing through its centroid) and r a constant scaling function (without loss of generality fixed to be equal to 1).

In other words, a PRCGC is obtained by sweeping C along A while maintaining C constant and orthogonal to A. The constraint that A passes through the centroid of C is not restrictive. In appendix C, it is shown that the family of PRCGCs which differ in the positions of their origin (point of contact between A and C) form an equivalence class; i.e. the shape and properties of a PRCGC are independent of the position of the axis (equivalently, of the parameterization of its cross-section C). Letting C(θ) = (ρ(θ) cos θ, ρ(θ) sin θ) be a parameterization4 of C and A(s) an arclength parameterization of A, the surface of a PRCGC can be parameterized as follows (with respect to some world coordinate system):

P(s, θ) = A(s) + ρ(θ) cos θ n(s) + ρ(θ) sin θ b(s)    (3.2)

where n(s) and b(s) are respectively the normal and binormal to the axis curve A(s), whose Frenet-Serret frame is (t(s), n(s), b(s)). Since A is planar, b(s) is a constant vector. Figure 3.5 illustrates this representation. The function A(s) is assumed to be C1, and at least piecewise C3, so that edges on the surface are only due to vertices of the cross-section function ρ(θ), which is assumed to be piecewise C1. The cross-sections are the curves of constant s and the meridians are the curves of constant θ.

4. the cross-section curve C is assumed to be star-shaped to admit such a parameterization, chosen only to simplify the mathematical analysis. Extension to non-star shapes only requires a different parameterization.

Figure 3.5 Sample PRCGC and related representation (axis A(s), meridian, cross-section and world coordinate system).

To make the discussion in the subsequent sections concise, we give the following useful definitions (see figure 3.6).

Definition 3.4: Two points of a GC are called co-cross-sectional points if they belong to the same cross-section.

Definition 3.5: The image line segment joining the projections of co-cross-sectional points is called a cross-section segment. The mid-point of a cross-section segment is called a 2-D axis point. The locus of 2-D axis points is called the 2-D axis.5

5. Note that the 2-D axis is not necessarily the projection of the 3-D axis.

Figure 3.6 Terminology of (the projection of) a GC: co-cross-sectional points, cross-section segment, 2-D axis point and projection of 3-D axis point.

To discuss the geometric projective properties of PRCGCs (and later, circular PRGCs), we need to derive the analytic expressions of their images. For this we need to write their limb equations. This is discussed in the section below. A more general mathematical derivation is provided by [53].

3.1.2.1 Limbs and Projections of PRCGCs

A point P(s, θ) is a limb point if and only if the (unit) viewing vector V is orthogonal to its surface normal N(s, θ). The latter is given by

N(s, θ) = (∂P/∂s × ∂P/∂θ) / |∂P/∂s × ∂P/∂θ|    (3.3)
This latter is given by

\hat{N}(s, \theta) = \frac{\partial P/\partial s \times \partial P/\partial \theta}{\left\| \partial P/\partial s \times \partial P/\partial \theta \right\|}    (3.3)

where P = P(s, θ). The viewing vector V̂ can be expressed in the Frenet-Serret frame (t̂(s), n̂(s), b̂(s)) as follows:

\hat{V} = \cos\beta(s)\, \hat{t}(s) + \sin\beta(s)\cos\alpha(s)\, \hat{n}(s) + \sin\beta(s)\sin\alpha(s)\, \hat{b}(s)    (3.4)

where (α(s), β(s)) are its spherical coordinates in the frame (see figure 3.7).

Figure 3.7 Coordinates of the viewing direction in the Frenet-Serret frame.

Omitting the details of the expression of N̂(s, θ) in the Frenet-Serret frame (which can be found in [53]), the limb equation is obtained by writing that the dot product of N̂(s, θ) and V̂ is zero. This yields the following limb equation:

\rho'(\theta)\sin(\theta - \alpha(s)) + \rho(\theta)\cos(\theta - \alpha(s)) = 0    (3.5)

where ρ′(θ) is the derivative of ρ(θ). Assuming sin(θ − α(s)) ≠ 0⁶ and dropping the arguments θ and s, equation 3.5 can be written as follows:

\cot(\theta - \alpha) = -\rho' / \rho    (3.6)

6. Otherwise equation 3.5 would necessarily imply a zero radius.

which in the special case of a circular cross-section (i.e. a circular PRCGC, where ρ′ = 0) becomes

\cos(\theta - \alpha) = 0    (3.7)

yielding two limb points determined by θ = ±π/2 + α(s); i.e. two diametrically opposite points (figure 3.8).

Figure 3.8 Limb point properties for a circular PRCGC.

The orthographic projection of the (3-D) Frenet-Serret frame (t̂, n̂, b̂) on a plane orthogonal to V̂ gives a "moving" local 2-D frame (û, v̂) in the image for each value of s. The relationship between the 3-D and the 2-D frames is as follows: let P = t t̂ + n n̂ + b b̂ be a point expressed in the 3-D frame; then its projection p on a plane orthogonal to V̂ is given by

p = (-\sin\beta\, t + \cos\beta\cos\alpha\, n + \cos\beta\sin\alpha\, b)\, \hat{u} + (-\sin\alpha\, n + \cos\alpha\, b)\, \hat{v}    (3.8)

where û = û(s) and v̂ = v̂(s) form an orthonormal basis of the image plane (both are orthogonal to V̂, with v̂ orthogonal to û following the right hand rule). Written in vector form, local 3-D coordinates (t, n, b)ᵀ project as local 2-D coordinates

p = (-\sin\beta\, t + \cos\beta\cos\alpha\, n + \cos\beta\sin\alpha\, b,\ -\sin\alpha\, n + \cos\alpha\, b)^T    (3.9)

The projection of the axis point A(s) is thus the origin of the local 2-D frame. Using equation 3.9, a PRCGC point P(s, θ) can be shown to project, in the image frame (û, v̂), as

p(s, \theta) = (\rho(\theta)\cos\beta\cos(\theta - \alpha),\ \rho(\theta)\sin(\theta - \alpha))^T    (3.10)

From the terminology given in definition 3.5, a cross-section segment of a PRCGC, between the projections of co-cross-sectional limb points P(s, θ₁) and P(s, θ₂), can be expressed in its local 2-D frame as

\vec{cs} = (\cos\beta\,(\rho_2\cos(\theta_2 - \alpha) - \rho_1\cos(\theta_1 - \alpha)),\ \rho_2\sin(\theta_2 - \alpha) - \rho_1\sin(\theta_1 - \alpha))^T    (3.11)

where ρᵢ = ρ(θᵢ); and its midpoint (2-D axis point) is given by

p_{2D} = \frac{1}{2}\,(\cos\beta\,(\rho_2\cos(\theta_2 - \alpha) + \rho_1\cos(\theta_1 - \alpha)),\ \rho_2\sin(\theta_2 - \alpha) + \rho_1\sin(\theta_1 - \alpha))^T    (3.12)

3.1.2.2 Properties of PRCGCs

The following property is due to [70] and relates the "side" boundaries⁷ of a PRCGC and its axis. Its proof is omitted here.

7. We will use the term "side" boundaries to refer to both limb and meridian projections.

Property 3.6: The limb (or meridian) projections of a PRCGC are parallel symmetric (not necessarily with a linear correspondence), such that symmetric points are co-cross-sectional, regardless of the viewing direction. Furthermore, the image of the 3-D axis is also parallel symmetric to the limb (or meridian) projections, such that symmetric points are on the same cross-section.

In the special case where the cross-section is circular (i.e. a circular PRCGC), we can derive additional properties. They establish further relationships between the 2-D axis and the projection of the 3-D axis, and between cross-section segments and the 2-D axis of a circular PRCGC. Figure 3.9 illustrates the new properties.
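Limb equation 3.5, incidentally, is directly usable numerically for any star-shaped cross-section function ρ(θ). The sketch below is ours (simple sign-change bracketing plus bisection, with an elliptical cross-section as an example; none of this is the thesis implementation):

    import numpy as np

    def prcgc_limb_points(alpha, rho, drho, n_grid=720):
        """Solve rho'(th)*sin(th - alpha) + rho(th)*cos(th - alpha) = 0
        (equation 3.5) for th in [0, 2*pi): bracket sign changes on a
        grid, then refine each bracket by bisection."""
        g = lambda th: drho(th)*np.sin(th - alpha) + rho(th)*np.cos(th - alpha)
        ths = np.linspace(0.0, 2.0*np.pi, n_grid + 1)
        roots = []
        for lo, hi in zip(ths[:-1], ths[1:]):
            if g(lo)*g(hi) < 0.0:
                for _ in range(50):
                    mid = 0.5*(lo + hi)
                    lo, hi = (lo, mid) if g(lo)*g(mid) <= 0.0 else (mid, hi)
                roots.append(0.5*(lo + hi))
        return roots

    # elliptical cross-section: rho(th) = a*b / sqrt((b cos th)^2 + (a sin th)^2)
    a, b = 2.0, 1.0
    rho  = lambda th: a*b/np.sqrt((b*np.cos(th))**2 + (a*np.sin(th))**2)
    drho = lambda th, h=1e-6: (rho(th + h) - rho(th - h)) / (2.0*h)
    print(prcgc_limb_points(alpha=0.5, rho=rho, drho=drho))
    # for a circular cross-section (rho' = 0) this reduces to
    # cos(th - alpha) = 0, i.e. th = alpha +/- pi/2 (equation 3.7)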
The following three new properties are proved in appendix A.

Property 3.7: In an orthographic projection of a circular PRCGC, the 2-D axis and the projection of the 3-D axis coincide regardless of the viewing direction.

An immediate corollary of this property is that, at co-cross-sectional points, the angle φ between the tangents to the 2-D axis and the projection of the tangent to the 3-D axis is zero (the two axes are parallel since they coincide).

Property 3.8: In the projection of a circular PRCGC, the angle γ between cross-section segments and the 2-D axis tangent is π/2 regardless of the viewing direction.

Property 3.9: In the projection of a circular PRCGC, the length of the cross-section segments is constant (it is equal to the diameter of the 3-D cross-section).

Properties 3.7, 3.8 and 3.9 say that, in the image plane, the projection of a circular PRCGC can be described by a right ribbon which has a constant sweep function, such that the ribbon corresponding points are the projections of co-cross-sectional points and the ribbon axis is the projection of the 3-D axis.

Figure 3.9 Invariant properties of circular PRCGCs: cross-section segments have a constant size and are orthogonal to the 2-D axis; the 2-D axis and the projection of the 3-D axis coincide.

For a PRCGC with a non-circular cross-section, it can be shown that, for arbitrary cross-section shapes, the cross-section segments and the 2-D axis are not mutually orthogonal (although they are parallel symmetric), that the 2-D axis and the projection of the 3-D axis do not coincide in general, and that the cross-section segment length is not constant. In appendix C, an analysis is given to demonstrate this.

3.1.3 Geometric Projective Properties of Circular PRGCs

Definition 3.6: A circular PRGC (circular planar right generalized cylinder) is given by the intrinsic description {C, A, r}, where C is a circular curve (cross-section), A a planar curve (axis) passing through the center of C and always orthogonal to it, and r a scaling function. In other words, a circular PRGC is obtained by sweeping a circle C along a planar axis curve A while scaling it by r and maintaining it orthogonal to A. (A is restricted to pass through the center of C; it can be shown that, unlike for PRCGCs, circular PRGCs which differ only in the position of the axis do not form an equivalence class.)

Letting C(θ) = (ρ cosθ, ρ sinθ) be a parameterization of C (ρ is the constant radius, fixed without loss of generality to 1), A(s) an arclength parameterization of A (assumed to be C¹ and at least piecewise C³, for the same reason as for PRCGCs) and r(s) a C¹ function, the surface of a circular PRGC can be parameterized as follows:

P(s, \theta) = A(s) + r(s)\cos\theta\, \hat{n}(s) + r(s)\sin\theta\, \hat{b}(s)    (3.13)

where, as before, n̂(s) and b̂(s) are respectively the normal and (constant) binormal to the axis curve A(s). Figure 3.10 illustrates this representation. The cross-sections are the curves of constant s and the meridians are the curves of constant θ.

Figure 3.10 Sample circular PRGC and related representation (a meridian, a cross-section, and the world coordinate system).

As before, to study the geometric projective properties of circular PRGCs, it is useful to derive the expression of their projections.

3.1.3.1 Limbs and Projections of Circular PRGCs

Using the expression of the surface normal given in equation 3.3 and of the viewing direction given in equation 3.4, the limb equation of a circular PRGC can be shown to be given by
\sin\beta(s)\,[1 - \kappa(s)\,\rho\, r(s)\cos\theta]\cos(\theta - \alpha(s)) - \rho\,\dot{r}(s)\cos\beta(s) = 0    (3.14)

where κ(s) is the curvature of the axis A(s) and ṙ(s) the derivative of r(s). Assuming sinβ(s) ≠ 0,⁸ and dropping the arguments θ and s, this equation can be written as follows:

[1 - \kappa r \cos\theta]\cos(\theta - \alpha) = \dot{r}\cot\beta    (3.15)

8. Which excludes V̂ being parallel to t̂, a non-general viewing direction for which the limb equation has an infinite or zero number of solutions.

The limb equation 3.15 holds for a general circular PRGC. In the special case of a straight axis (i.e. a surface of revolution or SOR, κ = 0) the limb equation becomes

\cos(\theta - \alpha) = \dot{r}\cot\beta    (3.16)

which yields two solutions of opposite angular distance from α (see figure 3.11). When ṙ = 0, we obtain a circular PRCGC, which has already been analyzed in section 3.1.2.

Figure 3.11 Limb point properties of an SOR: the two limb points P(s, θ₁) and P(s, θ₂) lie at angles θ₁ − α(s) and θ₂ − α(s) of opposite sign.

For the general case of circular PRGCs (κ ≠ 0 and ṙ ≠ 0), the relationship between the two limb points is not as straightforward as in the two sub-cases previously discussed. However, we will later show that they do have a well behaved relationship.

Using the projection equation 3.9, a circular PRGC point P(s, θ) can be shown to project, in the image frame (û, v̂), as

p(s, \theta) = (r(s)\cos\beta\cos(\theta - \alpha),\ r(s)\sin(\theta - \alpha))^T    (3.17)

The cross-section segment between the projections of co-cross-sectional limb points P(s, θ₁) and P(s, θ₂) can be expressed in its local 2-D frame as

\vec{cs} = r\,(\cos\beta\,(\cos(\theta_2 - \alpha) - \cos(\theta_1 - \alpha)),\ \sin(\theta_2 - \alpha) - \sin(\theta_1 - \alpha))^T    (3.18)

and its midpoint (2-D axis point) is given by

p_{2D} = \frac{r}{2}\,(\cos\beta\,(\cos(\theta_2 - \alpha) + \cos(\theta_1 - \alpha)),\ \sin(\theta_2 - \alpha) + \sin(\theta_1 - \alpha))^T    (3.19)

3.1.3.2 Properties of Circular PRGCs

The properties we give below also give relationships between the projections of co-cross-sectional limb points and the 2-D axis, and between the 2-D axis and the projection of the 3-D axis. To make the discussion complete, we first consider the special case of an SOR. The following two properties are illustrated in figure 3.12 and proved in appendix A.

Property 3.10: In an orthographic projection of an SOR, the 2-D axis and the projection of the 3-D axis coincide regardless of the viewing direction.

As for Property 3.7, a consequence of this property is that the angle φ between tangents to the 2-D axis and to the projection of the 3-D axis is zero (since the axes are collinear).

Property 3.11: In an orthographic projection of an SOR, the angle γ between cross-section segments and the (straight) 2-D axis at their midpoints is π/2 regardless of the viewing direction.

Properties 3.10 and 3.11 are equivalent to the bilateral symmetry property Nalwa derived for SORs [47]. Both properties also indicate that the projection of an SOR can be described by a right ribbon (with a non-constant sweep in general) such that the ribbon corresponding points are projections of co-cross-sectional points and the ribbon axis is the projection of the 3-D axis.

Figure 3.12 Invariant properties of SORs: cross-section segments are orthogonal to the 2-D axis; the 2-D axis and the projection of the 3-D axis coincide.

Properties of general circular PRGCs (non-straight axis and non-constant sweep) have not been previously addressed in the literature. So far, all the given properties are invariant with respect to the viewing direction. It can easily be shown that they are not invariant for general circular PRGCs. For example, the orthogonality of the 2-D axis tangents and the cross-section segments is not a property of general circular PRGCs.
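The SOR sub-case, by contrast, is easy to confirm numerically from equations 3.16 through 3.19. The following sketch is ours (the function name and sample values are illustrative only); it computes the two limb points of an SOR at several axis points and checks Properties 3.10 and 3.11:

    import numpy as np

    def sor_projection_check(alpha, beta, r_samples, rdot_samples):
        """Equation 3.16 gives the SOR limb points theta = alpha +/- delta
        with cos(delta) = rdot * cot(beta); equations 3.17-3.19 then give
        the projected limb points, cross-section segments and 2-D axis
        points in the local image frame (u, v). Note that alpha drops
        out: the two limb points are symmetric about alpha."""
        axis_pts, segs = [], []
        for r, rdot in zip(r_samples, rdot_samples):
            cd = rdot / np.tan(beta)          # cos(delta), equation 3.16
            if abs(cd) >= 1.0:
                return None                   # no visible limbs at this point
            delta = np.arccos(cd)
            p1 = r*np.array([np.cos(beta)*np.cos(delta),  np.sin(delta)])
            p2 = r*np.array([np.cos(beta)*np.cos(delta), -np.sin(delta)])
            segs.append(p2 - p1)              # cross-section segment (eq. 3.18)
            axis_pts.append(0.5*(p1 + p2))    # 2-D axis point (eq. 3.19)
        # the axis is straight, so the local frame is the same for every s
        # and the 3-D axis projects along u. Property 3.10: all 2-D axis
        # points have v-coordinate 0; Property 3.11: every cross-section
        # segment is along v, i.e. orthogonal to the (straight) 2-D axis.
        return (all(abs(p[1]) < 1e-12 for p in axis_pts),
                all(abs(s[0]) < 1e-12 for s in segs))

    print(sor_projection_check(alpha=0.7, beta=1.0,
                               r_samples=[1.0, 1.2, 1.5],
                               rdot_samples=[0.30, 0.25, 0.20]))  # (True, True)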
Equation 3.20 gives the expression of the 2-D axis tangent, t⃗₂D, obtained by differentiating the 2-D axis point of equation 3.19 with respect to s, and equation 3.21 gives the dot product of this tangent with the cross-section segment (the value of this product is zero for PRCGCs and SORs but is non-zero otherwise). Both expressions are lengthy; equation 3.21 expands in terms of cᵢ = cos(θᵢ − α), sᵢ = sin(θᵢ − α) and θ̇ᵢ = ∂θᵢ/∂s (for i = 1, 2, evaluated at the limb points), together with r, ṙ, κ and the viewing angles.

Below, we show that there are quasi-invariant properties which are as useful as the previous invariant properties. For this we have to a) identify the relevant parameter space of observation, giving the parameters and their ranges of values, and b) show that the properties of interest constitute quasi-invariant mappings, in that their values remain within a "small" set over "most" of the parameter space.

Space of Observation

Let us rewrite the limb equation 3.15 in the following form:

(1 - \epsilon\cos\theta)\cos(\theta - \alpha) = \dot{r}\cot\beta    (3.22)

where ε = κr = r/R (R being the radius of curvature of the axis). ε is a local measure of the relative thickness of the shape of the circular PRGC. Smaller values indicate rather elongated surfaces (small cross-section radius compared to the axis curvature radius), whereas larger values indicate thick and highly bent surfaces. We will call ε the thickness ratio; ṙ is a measure of how "fast" the cross-section changes its size (radius).

Equation 3.22 indicates that two pairs of parameters affect the behavior of the contours of a general circular PRGC:

• (α, β), corresponding to the viewing direction
• (ε, ṙ), corresponding to the object parameters (local shape measures)

The whole space of observation is a 4-D space (α, β, ε, ṙ). For the same object, those parameters vary as s varies. α and β can take any values on the viewing sphere for which the limb equation 3.22 admits a finite (non-empty) set of solutions; i.e. α ∈ [0, 2π) and β ∈ (0, π) ∪ (π, 2π). Letting α ∈ [0, π] and β ∈ (0, π) is sufficient, as the limb equation 3.22 is symmetric with respect to π for α and β.

The 2-D subspace (ε, ṙ) is also constrained. We have |ε| < 1 (the cross-section radius r is smaller than the radius of curvature, R, of the axis), otherwise the surface self-intersects. ṙ is also constrained: since |1 − ε cosθ| < 2 (as |ε| < 1), equation 3.22 implies |ṙ| < 2|tanβ|. This means that, at each point, the closer the viewing direction is to the axis tangent direction (i.e. the smaller |tanβ|), the smaller |ṙ| has to be (otherwise there would be no visible limbs). Thus, for an axis point where β = 15°, for example, the cross-section has limbs only if |ṙ| < 0.53. This implies that, for a given circular PRGC, in order for the surface to have limb points (and its projection to have visible points) over a wide range of views, |ṙ| has to be small. Objects seen in daily environments, such as animal limbs or industrial parts, appear not to have high values of ε and ṙ at the same time; i.e. when ε is high, |ṙ| is small (otherwise the thickness ratio would rapidly increase, which would cause self-intersection), and when |ṙ| is high, ε is small (such as is the case for some musical wind instruments where the outlet, which has a high value of |ṙ|, has locally a linear axis, ε = 0).
Quasi-Invariant Properties

Property 3.12: In the projection of a circular PRGC, the cross-section segments and the 2-D axis tangents at their mid-points are "almost orthogonal over most of the parameter space." Furthermore, the (angular) measure of orthogonality (γ) tends to degrade only close to degenerate regions of the parameter space for which limbs do not exist, and which include non-general viewing directions or non-common shapes such as those close to self-intersecting.

To illustrate this quasi-invariant property, we analyze the behavior of the angle γ between a cross-section segment and the tangent to the 2-D axis as a function of the parameters in our space of observation. Algebraic analysis of this quasi-invariant property is difficult, since the analytical expression for the tangent to the 2-D axis (equation 3.20) requires knowledge of how θ varies with respect to s at limb points, which is not known in general. Instead, we analyze it numerically by discretizing the space of observation and, for each point (α, β, ε, ṙ) of the space, solving the limb equation and deriving the projections of co-cross-sectional limb points and the tangent to the 2-D axis. The method is as follows (we omit the details):

for each set of parameters (α₁, β₁, ε₁, ṙ₁) (at some s) do
• select an arbitrary 3-D frame F₁ = (t̂₁, n̂₁, b̂₁)
• solve the limb equation 3.22 to obtain the two limb points θ₁₁ and θ₁₂
• determine (α₂, β₂, ε₂, ṙ₂) at s + ds (for some small ds)
• solve equation 3.22 for the second pair of limb points θ₂₁ and θ₂₂ (at s + ds) and express their coordinates in F₁
• using equation 3.17, determine the projections of the two pairs of points P(s, θ₁₁), P(s, θ₁₂) and P(s + ds, θ₂₁), P(s + ds, θ₂₂) (say p₁₁, p₁₂, p₂₁ and p₂₂)
• determine the angle between the cross-section segment given by p₁₁ and p₁₂ and the 2-D axis tangent given by the line joining the midpoints of p₁₁p₁₂ and p₂₁p₂₂.

We derive the angles in the image between cross-section segments and 2-D axis tangents over the space of observation defined by α ∈ [0, π], β ∈ (0, π), ε < 0.5 and |ṙ| < 0.5 (i.e. for a unit radius cross-section, sweep rates less than half the current radius per unit arclength along the axis). To avoid long tables of numbers, we attempt to summarize the results in various ways.

i) Figure 3.13.a shows the graph of the size (in percent) of the space of observation for which the image angle γ is within different (upper-bound) angle values of 90°.

Figure 3.13 Plots of the size of the space of observation (in percent) for different upper-bounds of the image angles γ (a) and φ (b).

ii) Table 3.1 summarizes the parts (in percent) of the regions on the viewing sphere where γ is within 5° of 90° for certain values of (ε, ṙ). The size is with respect to the space region where limbs exist.

iii) Figure 3.14.a shows a 3-D plot of γ as a function of (α, β) for (ε, ṙ) = (0.2, 0.2), and figure 3.14.b shows the corresponding display of the half viewing sphere (the (α, β) sub-space). This figure shows where the property holds, where it does not hold, and where limbs do not exist. In the display, vertical circles correspond to constant β values with a 5° step.

The graph of figure 3.13.a shows that over 84.30% of that 4-D space (excluding the special values ε = 0 or ṙ = 0, i.e. SORs and circular PRCGCs) the 2-D angle γ is within 5° of 90°, and over 92.63% of the space it is within 10° of 90°.
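This sampling procedure is easy to reproduce. The sketch below is our own self-contained re-implementation, not the thesis code: it assumes a circular-arc axis of radius R = r/ε and a locally linear scaling function with slope ṙ, keeps the world viewing direction fixed, solves the limb equation at s and s + ds, and measures γ from the projected limb points; sweeping (α, β) then estimates one cell of table 3.1.

    import numpy as np

    def frenet(s, R):
        # circular-arc axis of radius R in the xy-plane (curvature 1/R)
        t = np.array([np.cos(s/R), np.sin(s/R), 0.0])
        n = np.array([-np.sin(s/R), np.cos(s/R), 0.0])
        b = np.array([0.0, 0.0, 1.0])
        A = np.array([R*np.sin(s/R), R*(1.0 - np.cos(s/R)), 0.0])
        return A, t, n, b

    def limb_projections(s, R, r0, rdot, V, e1, e2):
        # solve (1 - kappa*r*cos(th))*cos(th - al) = rdot*cot(be) at arclength s
        A, t, n, b = frenet(s, R)
        r = r0 + rdot*s
        cb = np.clip(np.dot(V, t), -1.0, 1.0)
        sb = np.sqrt(1.0 - cb*cb)
        if sb < 1e-9:
            return None
        al = np.arctan2(np.dot(V, b), np.dot(V, n))
        g = lambda th: (1.0 - (r/R)*np.cos(th))*np.cos(th - al) - rdot*cb/sb
        ths = np.linspace(0.0, 2.0*np.pi, 721)
        roots = []
        for lo, hi in zip(ths[:-1], ths[1:]):
            if g(lo)*g(hi) < 0.0:
                for _ in range(40):
                    mid = 0.5*(lo + hi)
                    lo, hi = (lo, mid) if g(lo)*g(mid) <= 0.0 else (mid, hi)
                roots.append(0.5*(lo + hi))
        if len(roots) != 2:
            return None
        pts = [A + r*(np.cos(th)*n + np.sin(th)*b) for th in roots]
        return [np.array([np.dot(P, e1), np.dot(P, e2)]) for P in pts]

    def gamma(alpha, beta, eps, rdot, ds=1e-3):
        r0 = 1.0
        R = r0/eps
        _, t0, n0, b0 = frenet(0.0, R)
        V = np.cos(beta)*t0 + np.sin(beta)*np.cos(alpha)*n0 + np.sin(beta)*np.sin(alpha)*b0
        e1 = np.cross(V, b0)                 # orthonormal image basis
        if np.linalg.norm(e1) < 1e-9:
            return None
        e1 /= np.linalg.norm(e1)
        e2 = np.cross(V, e1)
        q0 = limb_projections(0.0, R, r0, rdot, V, e1, e2)
        q1 = limb_projections(ds, R, r0, rdot, V, e1, e2)
        if q0 is None or q1 is None:
            return None                      # no (visible) limbs at this point
        seg = q0[1] - q0[0]                  # cross-section segment
        ax = 0.5*(q1[0] + q1[1]) - 0.5*(q0[0] + q0[1])   # 2-D axis tangent
        c = np.dot(seg, ax)/(np.linalg.norm(seg)*np.linalg.norm(ax))
        return np.degrees(np.arccos(np.clip(abs(c), 0.0, 1.0)))

    # fraction of sampled viewpoints where gamma is within 5 degrees of 90:
    grid = np.linspace(0.05, np.pi - 0.05, 18)
    vals = [gamma(a, b, eps=0.2, rdot=0.2) for a in grid for b in grid]
    vals = [v for v in vals if v is not None]
    print(sum(abs(v - 90.0) <= 5.0 for v in vals)/len(vals))

The printed fraction plays the role of one entry of table 3.1, up to the coarse grid and the simplifying assumptions made above.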
It can be seen from table 3.1 that the size of the space where the property holds (2-D angle within 5° of being right) gradually decreases as both ε and ṙ take higher values.

            ε = 0.1   ε = 0.2   ε = 0.3   ε = 0.4
  ṙ = 0.1    96.72     92.44     88.50     85.57
  ṙ = 0.2    96.86     89.44     83.62     78.55
  ṙ = 0.3    97.11     88.25     80.05     74.83
  ṙ = 0.4    97.29     87.57     78.72     72.86

Table 3.1 Viewing set sizes (percent) where the observed image angle γ is within 5° of 90°.

Also, notice from figure 3.14 that the space where the property holds is connected and the property is well behaved. It tends to gradually degrade for small values of |tanβ|, that is, close to regions where limbs do not exist. Notice that a small value of |tanβ| requires a very specific viewing direction V̂ that is not only close to being in the axis plane but also almost parallel to the 3-D axis tangent; i.e. an unlikely viewpoint. Therefore, even if V̂ is close to being in the axis plane, the property would degrade only at points where the axis tangent is in the direction of V̂ (a small set of points), and it would still hold for the rest of the surface (a much larger set).

Figure 3.14 Property 3.12 for (ε, ṙ) = (0.2, 0.2): a. 3-D plot of γ as a function of the viewing angles; b. corresponding half viewing sphere, showing where the 2-D angle is within 5° of 90°, where it is more than 5° from 90° (thick lines), and where limbs do not exist (medium lines).

This property can be thought of as the projective counterpart of the 3-D orthogonality between the segment connecting limb points and the tangent to the 3-D axis (in 3-D, the cross-section is planar and orthogonal to the axis).

Property 3.13: In an orthographic projection of a circular PRGC, tangents to the 2-D axis and the projections of the tangents to the 3-D axis at corresponding points are "almost parallel over most of the parameter space of observation." Furthermore, the (angular) measure of parallelism (φ) tends to degrade only close to degenerate regions of the parameter space for which limbs do not exist, and which include non-general viewing directions or non-common shapes such as those close to self-intersecting.

To show this quasi-invariant property, we have used the method discussed previously for Property 3.12. The results of the analysis of the image angle φ between these two tangent vectors are summarized in a way analogous to those of Property 3.12.

iv) Figure 3.13.b shows the graph of the size of the previous space of observation where φ is within different (upper-bound) angular values of 0°.

v) Table 3.2 gives the sizes of the regions on the viewing sphere where φ is 3° or less for some values of (ε, ṙ). The size is with respect to the space region where limbs exist.

vi) Figure 3.15.a gives the 3-D plot of φ as a function of (α, β) for (ε, ṙ) = (0.2, 0.2), and figure 3.15.b the corresponding half viewing sphere, such as discussed for figure 3.14.b.

            ε = 0.1   ε = 0.2   ε = 0.3   ε = 0.4
  ṙ = 0.1    96.80     95.04     92.26     91.76
  ṙ = 0.2    99.21     95.45     93.01     90.78
  ṙ = 0.3    100       97.31     94.27     92.82
  ṙ = 0.4    100       99.00     96.30     94.07

Table 3.2 Viewing set sizes (percent) where the image angle φ is 3° or less.

The behavior of this quasi-invariant property is similar to the previous one. The graph of figure 3.13.b shows that the two tangents are within 3° of each other over 94.48% of the previous space of observation, and within 5° over 96.90% of that space. Table 3.2 shows that the degradation is gradual and slow.
Note that because the size of the regions on the viewing sphere where limbs do not exist is mainly influenced by ṙ, the relative size of the region where the property holds may tend to increase as ṙ increases (the ratio is over a smaller region, where limbs exist at all). Also, from figure 3.15, it can be seen that the angle difference tends to increase only close to regions where limbs do not exist.

Figure 3.15 Property 3.13 for (ε, ṙ) = (0.2, 0.2): a. 3-D plot; b. half viewing sphere, showing where the 2-D angle difference is 3° or less (thin lines), where it is greater than 3° (thick lines), and where limbs do not exist (medium lines).

Note that for SORs and circular PRCGCs, Property 3.13 is invariant, as we have proved the stronger property that the two axes coincide. Note also that it can be verified from equations 3.18, 3.19 and 3.21 that the properties hold exactly (perfect orthogonality and coincidence of axes) for general circular PRGCs at two viewpoints: where the viewing direction V̂ is orthogonal to the axis plane (i.e. the side view; cosα = cosβ = 0, θ₁ = 0, θ₂ = π; figure 3.16.a) and where it is in the axis plane (i.e. the frontal view; sinα = 0, cosθ₁ = cosθ₂, sinθ₁ = −sinθ₂; figure 3.16.b).

Figure 3.16 The properties hold exactly for special viewing directions: a. side view; b. frontal view.

These two properties are quasi-invariant with respect to transforms parameterized over the 4-D space of observation. Thus they are orthographic quasi-invariants (with respect to (α, β)) and object-shape parameter quasi-invariants (with respect to (ε, ṙ)). They indicate that the image of a general circular PRGC can be well approximated by a right ribbon, in the sense that the right ribbon correspondences are close to co-cross-sectional point correspondences over most of the parameter space.

So far, the analysis has indicated that two types of relationships characterize curved-axis primitives: parallel symmetry and right ribbons. The former gives exactly the image of co-cross-sectional points of a PRCGC, while the latter gives "good" approximations of co-cross-sectional points of a circular PRGC. Because both types of symmetry will be used to find curved-axis parts in the image, it is useful to analyze the relationship between parallel symmetry and right ribbons. This analysis will later be used to derive efficient methods for curved-axis part detection. The relationships are the subject of the following properties, whose proofs are given in appendix A.

Property 3.14: A right ribbon with constant cross-section produces parallel symmetric side curves. Furthermore, the axis of parallel symmetry coincides with the right ribbon axis.

Figure 3.17.a illustrates this property. In the general case of a non-constant right ribbon, this property does not hold. This is the subject of the following stronger property.

Property 3.15: The only case where the right ribbon correspondences are exactly parallel symmetry correspondences is when the derivative of the right ribbon sweep function vanishes.

There is in general an offset between the parallel symmetry correspondences and the correspondences of a right ribbon with non-constant sweep (figure 3.17.b). This offset is proportional to the derivative of the right ribbon scaling function (see the proof of Property 3.15 in appendix A for the expression of the tangents to the ribbon outline).
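Property 3.14 can be checked numerically in a few lines. The sketch below (ours, for illustration; the axis, width and names are arbitrary choices) builds a constant-sweep right ribbon around a curved axis and verifies that tangents at ribbon-corresponding points, i.e. points with the same axis parameter, are parallel:

    import numpy as np

    def ribbon_sides(axis, half_width):
        """axis: (N, 2) polyline samples of the ribbon axis a(t).
        Returns the side curves a(t) +/- w n(t) of a right ribbon with
        constant sweep w, where n is the unit normal of the axis."""
        d = np.gradient(axis, axis=0)
        t = d/np.linalg.norm(d, axis=1, keepdims=True)
        n = np.stack([-t[:, 1], t[:, 0]], axis=1)   # tangent rotated by 90 deg
        return axis + half_width*n, axis - half_width*n

    def max_tangent_angle(c1, c2):
        """Largest angle (degrees) between tangents at corresponding samples."""
        t1, t2 = np.gradient(c1, axis=0), np.gradient(c2, axis=0)
        t1 /= np.linalg.norm(t1, axis=1, keepdims=True)
        t2 /= np.linalg.norm(t2, axis=1, keepdims=True)
        dots = np.clip(np.abs(np.sum(t1*t2, axis=1)), 0.0, 1.0)
        return np.degrees(np.max(np.arccos(dots)))

    # a curved axis (circular arc) with constant sweep:
    u = np.linspace(0.0, 1.5, 400)
    axis = np.stack([np.cos(u), np.sin(u)], axis=1)
    s1, s2 = ribbon_sides(axis, half_width=0.3)
    print(max_tangent_angle(s1, s2))   # ~0: the sides are parallel symmetric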
3.2 Structural Properties

Among the properties of the image of a GC are also those characterizing the junction patterns (and their relationships) between its boundaries. Such junctions depend on both the part geometry and the viewpoint. Analyzing the possible junction patterns that can be observed in the image for different classes of parts and viewing directions is another way of using the 3-dimensionality of the scene objects.

Figure 3.17 Right ribbon and parallel symmetry correspondences: coincident in the constant sweep case (a), offset in the non-constant sweep case (b).

Junctions of polyhedral and curved objects have been previously addressed in [20,33] and [42] respectively. An important conclusion of those efforts is that the relationships between the image boundaries of a 3-D object belong to a finite, small catalog of junctions (the junction catalog). For piecewise smooth objects, the junction catalog includes so-called three-tangent junctions (henceforth 3-tgt-j), L-junctions (henceforth L-j), T-junctions (henceforth T-j) and terminal junctions (henceforth cusps⁹) (see figure 3.18). For polyhedral objects or non-smooth cross-sections, other junctions such as arrow- and Y-junctions hold [42]. For reasons of simplicity, in the analysis of this section we will use smooth cross-sections to illustrate the properties; i.e. the junctions we use in the analysis are 3-tgt-js, L-js, T-js and cusps.¹⁰ Other aspects of intensity images, such as the effects of occlusion by another object and of breaks in boundaries, are not discussed here but in chapters 4, 5 and 6, where we describe the segmentation and shape description methods. Also, for now, the analysis assumes that parts are isolated and ignores junctions that occur due to joints between parts. The effect of joints on the observed junctions is discussed in chapter 6.

9. Cusps are limb points where the viewing direction is tangential to the limb curve. See [38] for a detailed analysis of similar and other visual generic events.

10. Extension to polyhedral junctions is straightforward and only requires augmenting the set of combinations we give.

Figure 3.18 Junctions used for the structural properties: 3-tgt-j, (curvature-)L-j, T-j, cusp.

It is useful to distinguish structural properties along the surface of a part (inner-surface properties) from those at its extremities (closure properties). Both types of properties give a catalog of possible junctions and connectivity relationships between them.

Inner-surface junctions occur due to self-occlusion. This has the effect of producing cusps and T-js in the image of the surface of a part. The existence of such self-occlusion junctions in the image depends on both the primitive and the viewpoint. Figure 3.19 gives examples. The inner-surface property can be stated as follows.

Property 3.16: In the image, the surface of a part can be delimited by continuous boundaries or by boundaries forming (self-occlusion) T-js such that the cusps (ending parts of the tops of the Ts) lie inside the surface. Further, if more than one self-occlusion T-j is visible, the cusps belong to the same portion of the surface.

The last part of the property is quite natural, because the cusping boundaries belong to the occluding portion of a GC surface, and a portion of such a regular surface is either occluding or occluded, not both.
This is best explained by the examples of figure 3.20, illustrating violations of the inner-surface property.

Figure 3.19 Inner-surface structural properties: a. no self-occlusion; b. object self-occludes on one side; c. object self-occludes on both sides (cusping boundaries and self-occlusion T-js).

Figure 3.20 Impossible structural arrangements of self-occluding surface patches: a "cusp" not inside the surface; cusps that do not belong to the same surface patch.

The closure properties will be given as junction combinations, and connectivity relationships between them, at the end of a part. To analyze the possible arrangements, it is useful to consider the cases when the cross-section is visible and pointing towards the camera, when it is visible and pointing away from the camera, and when it is not visible. In the following analysis we assume that junctions are observed in pairs; i.e. that the surface is bounded by a pair of "side" boundaries.¹¹ In this case, the different pair-wise junction combinations at the ends of the surface boundaries of a part, and their connectivity, depending on the primitive, are discussed below. Note that our purpose is not to give all the different aspects of the parts (in the aspect graph sense). Rather, we want to illustrate the different junction combinations (which could be obtained for different aspects). The given lists are believed to be exhaustive. Arguments are given when appropriate.

11. The analysis can be easily extended to multiple surface parts.

3.2.1 Visible Cross-Section Pointing Towards the Camera

When the cross-section is visible and points towards the camera, the possible patterns are pair-wise combinations of the set {3-tgt-j, T-j, cusp}. Combinations involving L-js cannot occur in this case because L-js correspond to a cross-section pointing away from the camera, a case we discuss later. The observed combinations, depending on the primitive, are discussed below.

3.2.1.1 SHGCs

The possible combinations for SHGCs are shown in figure 3.21 and include (3-tgt-j, 3-tgt-j), (3-tgt-j, T-j) and (T-j, T-j). The combinations involving cusps cannot occur because cusps are due to two reasons: when the cross-section points away from the camera and the surface "flares" due to a relatively high derivative of the scaling function (flaring cusps, making the surface locally closer to the viewer than the cross-section), as was the case for the object in figure 3.19.c, and when the axis is curved, producing self-occlusion between different portions of the part (curvature cusps), as was the case for the object in figure 3.19.b. Flaring cusps cannot occur here because we are considering the cross-section pointing towards the camera (the cross-section is locally closer to the viewer than the surface), and curvature cusps cannot occur since the axis is straight. Notice that LSHGCs can only have the (3-tgt-j, 3-tgt-j) combination, because LSHGCs cannot self-occlude (the combinations involving T-js require a non-linear sweep).

Figure 3.21 Closure patterns of an SHGC with a cross-section pointing towards the camera: a. (3-tgt-j, 3-tgt-j) closure; b. (3-tgt-j, T-j) closure; c. (T-j, T-j) closure.

3.2.1.2 PRCGCs

The possible combinations for PRCGCs are shown in figure 3.22 and include (3-tgt-j, 3-tgt-j), (3-tgt-j, cusp) and (3-tgt-j, T-j). T-js are due to two reasons: when the cross-section points towards the camera and flares (flaring T-j), as in figure 3.21.c, and when the part self-occludes due to the curvature of the axis (curvature T-j), as in figure 3.22.c.
The (T-j, T-j) combination cannot occur here because there is no flaring for PRCGCs (the scaling function is constant), and when both are curvature T-js (the part self-occludes) the cross-section would be completely invisible, whereas we are considering a visible cross-section. Also, the (T-j, cusp) combination cannot occur because both junctions would be curvature junctions (a curvature T-j and a curvature cusp), implying a non-visible cross-section.¹² The (cusp, cusp) combination holds only when the cross-section faces away from the camera. Note that the (3-tgt-j, cusp) combination occurs here, unlike in the SHGC case, because the surface is not constrained to lie on the same side of the cross-section, due to the curvature of the axis.

12. Other combinations involving T-js can occur when occlusion by another body is considered. Such cases are discussed in chapters 4, 5 and 6, where we address the use of the structural properties for part verification.

Figure 3.22 Closure patterns of a PRCGC with a cross-section pointing towards the camera: a. (3-tgt-j, 3-tgt-j) closure; b. (3-tgt-j, cusp) closure; c. (3-tgt-j, T-j) closure.

Notice that the above arguments hold because the axis is planar. For non-planar axes, certain combinations that do not occur in this discussion may occur. Notice also that cusps are usually associated with T-junctions [38] where, this time, the surface boundaries are occluding rather than occluded (this can be seen in all the examples where the cusping boundaries are also the tops of some T-j). We will make use of this remark in later chapters, where we will address the detection of the closure of parts.

3.2.1.3 Circular PRGCs

The possible combinations for circular PRGCs are more complex than for either SHGCs or PRCGCs (they are a super-set of both cases) and are shown in figure 3.23. The only combination that does not occur is (cusp, cusp), which, as mentioned above, corresponds to the case when the cross-section faces away from the camera. Notice that, unlike for PRCGCs, the combination (T-j, cusp) occurs here because the cross-section can flare (case f of figure 3.23).

Figure 3.23 Closure patterns of a circular PRGC with a cross-section pointing towards the camera: a. (3-tgt-j, 3-tgt-j); b. (3-tgt-j, T-j); c. (3-tgt-j, cusp); d. (T-j, T-j); f. (T-j, cusp) closures.

3.2.2 Visible Cross-Section Pointing Away from the Camera

When the cross-section is visible and points away from the camera, the possible junctions are combinations of the set {L-j, T-j, cusp}. Combinations involving 3-tgt-js cannot be observed, because 3-tgt-js indicate a cross-section pointing towards the camera. As before, we separate the cases depending on the primitive.

3.2.2.1 SHGCs

In the case of SHGCs, the possible combinations are shown in figure 3.24 and include (L-j, L-j), (L-j, cusp) and (cusp, cusp). Combinations involving T-js cannot be observed: we cannot have flaring T-js, as they correspond to a cross-section pointing towards the camera (i.e. the cross-section locally closer to the viewer than the surface), whereas here the cross-section points away from the camera (i.e. the surface is locally closer to the camera than the cross-section); neither can we have curvature T-js, as the axis is straight. Notice also that LSHGCs can only have the (L-j, L-j) combination because, as previously mentioned, they cannot self-occlude.

Figure 3.24 Closure patterns for an SHGC with a cross-section pointing away from the camera: a. (L-j, L-j) closure; b. (L-j, cusp) closure; c. (cusp, cusp) closure.

3.2.2.2 PRCGCs

The possible combinations for PRCGCs are shown in figure 3.25 and include (L-j, L-j), (L-j, T-j) and (L-j, cusp).
The combination (T-j, cusp) cannot occur here because it a. L-j L-j closure b. L-j cusp closure c. cusp cusp closure Figure 3.24 Closure patterns for an SHGC with a cross-section pointing away from the camera. 94 would imply that the cross-section is not visible (the cusp would have to be a curvature cusp and the T-j would have to be curvature T-j, both making the cross- section invisible). The combination (T-j, T-j) cannot occur also because the cross- section would not be visible and the combination (cusp, cusp) occurs only when at least one of the cusps is a flaring-cusp, which cannot happen for PRCGCs. a. L-j L-j closure b. L-j T-j closure c. L-j cusp closure Figure 3.25 Closure patters fo r a PRCGC with a cross-section pointing away from the camera 3.2.2.3 Circular PRGCs In the case of circular PRGCs, the different combinations are shown in figure 3.26 and include (L-j, L-j), (L-j, T-j), (L-j, cusp), (T-j, cusp), (cusp, cusp) and (T-j, T-j). Notice that the combination (T-j, cusp) can occur in this case unlike for PRCGCs, because we can have a flaring cusp here (figure 3.26.d). Also, the combination (T- j, T-j), when both T-js are curvature T-js while the cross-section is (partially) visible, can occur because of the possibility of flarings (case f). 95 a. L-j L-j closure b. L-j T-j closure c. L-j cusp closure d. T-j cusp closure e. cusp cusp closure f. T-j T-j closure Figure 3.26 Closure patterns fo r a circular PRGC with a cross-sections pointing away from the camera 3.2.3 Non-Visible Cross-Section When the cross-section is not visible, then this is either due to a vanishing cross- section radius (a case we call convergent closure) or to self-occlusion causing T-js to occur. In the latter case (which cannot occur for SHGCs without occlusion by another part), the different combinations are shown in figure 3.27 and include (T-j, T-j) and (T-j, cusp). The analysis we have given is of a qualitative nature using results from previous work (the junction catalog, for example). Related analyses have been carried in the past. Koenderink [38] has provided a mathematical analysis of the a. convergent closure b. T-j T-j closure c. T-j cusp closure Figure 3.27 Closure patterns fo r an invisible cross-section conditions under which certain generic events of general curved surfaces occur, and Ponce and Chelberg [53] have provided a mathematical derivation of limbs and cusps of GCs for the puiposes of rendering and aspect graphs computation. Also, Shafer and Kanade [67] have derived the conditions on the viewing direction and SHGC parameters which cause self-occlusion to be observed in the image. The purpose of the above analysis of the structural properties is to derive verification tests for parts hypotheses made on the basis of the geometric properties. They are also used to infer useful shape information such as whether the cross- section points away or towards the viewer. This will be discussed in chapters 4, 5 and 6. 97 4 Segmentation and Description of SHGCs In this chapter, we discuss the method we use for solving the segmentation and shape recovery of SHGCs from a real intensity image that includes phenomena such as noise, markings, shadows, highlights and occlusion. As mentioned in section 1.2.2, the part level of our method consists of three sub-levels addressing a hierarchy of features: the curve level, the symmetry level and the surface patch level (see figure 4.1). The curve level, takes as input an intensity image and produces boundaries. 
The symmetry level takes as input those boundaries and outputs parallel symmetry relationships between them. The surface patch level is made up of two independent sub-systems: one dedicated to SHGCs and the other to PRGCs. Because these two sub-systems share the curve and symmetry levels (which are executed only once on the input image), we will describe the latter in this chapter together with the SHGC sub-system. The curve and symmetry levels make use of some previously developed methods in the literature. For this reason, we will omit certain details of these two levels, for which the reader is referred to the appropriate references. The bulk of our implementation lies in the surface patch level, which we will discuss in great detail.

Figure 4.1 A three-level hierarchy for the part level: intensity image, curve level, symmetry level, surface patch level, part hypotheses.

Some details of the organization of the search process at all steps, and the related efficiency issues, are discussed at the end of this chapter.

4.1 The Curve Level

The curve level of our method is rather simple and similar to many previous boundary-based methods. For this reason, we limit this section to an outline of the principle of the boundary formation process.

Given an intensity image, the method uses an edge detector (such as Canny's [15]¹) to find edges in the image. These are then grouped into local boundaries using simple contiguity criteria between edges. This process results in a list of independent boundaries which include object boundaries (edges and limbs) but also surface markings, shadows and highlights.

1. Other edge detectors could be used. This one was chosen because it is available to us.

The goal at this level is to form boundaries which, if corresponding to a relevant scene object, can be used to detect it and recover its shape by analyzing their properties and their interactions. This process requires that an object's boundaries be separated into different structural components (for example, the cross-section boundaries and the side boundaries). However, among the object boundaries, some may be separated (due to low contrast or errors in edge localization) whereas they should form a single boundary, and some may be continuous whereas they should be sub-divided into different structural components.

To handle this situation, a standard first step is to segment the local boundaries at corners (tangent discontinuities and curvature extrema). For this, we use the method of [65], which first applies an adaptive smoothing operator and then finds the corner points. The reason this process is adopted is that separate object boundaries, when they meet, usually form vertices where there is a discontinuity in either boundary orientation or curvature.

It is also useful to group boundaries which are likely to correspond to the same structural component in the scene. Previous work [5,8,24,40,45] has addressed the usefulness of such boundary grouping and the related continuity principles. The argument is that continuity is a non-accidental image property which should be used to infer continuity in the 3-D scene. Another advantage of boundary grouping is that it is always better to have an object boundary as a single symbolic entity instead of several separate ones (corresponding to different portions of it). This reduces the search space at subsequent processing levels. For this we use a simple co-curvilinearity grouping step similar to the method of [45].
The grouping criterion is illustrated in figure 4.2 and is based on a function of the relative change in orientation between a pair of boundaries and of their relative gap. The function is given below:

M = (\alpha + \beta)\cdot\frac{g}{l_1 + g + l_2}    (4.1)

where (α + β) measures the angle variation and g / (l₁ + g + l₂) measures the relative gap between the boundaries (l₁ and l₂ being the boundary lengths and g the gap length).

Figure 4.2 Co-curvilinearity measure for boundary grouping.

In order for a pair of boundaries to be grouped, the co-curvilinearity measure has to be minimized among all the candidates lying in their vicinity, and both the angular variation and the relative gap have to be smaller than fixed thresholds.

In our method, the grouping criterion we use is rather conservative (small thresholds), so that only short breaks with smooth connectivity are bridged. In fact, it is clear that little information about whether neighboring boundaries belong to the same object or not can be obtained by just using the local relationships between them. Grouping decisions over larger gaps are best made when more global information about shape is formed, and consequently more informed choices can be made.

To give an example, consider the sample image of figure 4.3.a showing the local boundaries obtained after the corner-based segmentation (boundaries are delimited by dots). Figure 4.3.b shows the resulting boundaries after the co-curvilinearity grouping. Certain short gaps, such as those at the cross-section of the cone, the lower right portion of the cone's limb and the upper left portion of the vase's limb, have been bridged.

A quadratic B-spline fitting is applied to the obtained boundaries. The purpose of this fitting is to obtain compact representations which can also be used to analytically compute derivatives along the boundaries. The fitting method of [64] is used for this purpose.

Figure 4.3 Example of results of curve level grouping: a. original boundaries; b. resulting boundaries. Boundaries are delimited by large dots.

4.2 The Symmetry Level

From the projective properties discussed in chapter 3, parallel symmetry is an important property of the cross-sections of SHGCs and of the side boundaries of PRGCs. Thus, detecting parallel symmetries is an important step towards hypothesizing such parts from the image. In the case of SHGCs, we use invariant Property 3.1, which indicates that the cross-section boundaries of an SHGC, when visible, are linearly parallel symmetric. The method we have developed is based on that of [64] and is intended to identify such linear parallel symmetries. The method consists of a multi-step process which is described below.

4.2.1 Detection of Local Parallel Symmetries

For the detection of local parallel symmetry correspondences, we have used the method of [64]. That method considers all pairs of B-spline segments to analytically find correspondences such that corresponding points have similar tangent orientations. It does not specifically find linear correspondences, however; the linearity tests are applied in the next steps. Further, the symmetries obtained are not relied upon for accurate correspondences between boundaries; rather, they are used as coarse estimates that are refined by subsequent processes. The reason the local correspondences produced may be inaccurate is the B-spline fitting errors. Details of the local detection method can be found in [64]. Figure 4.4.b shows the parallel symmetries (shown by their axes in thick lines) obtained from the boundaries of the images of figure 4.4.a.
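For illustration, the following is a heavily simplified stand-in for that step (ours, not the analytic B-spline method of [64]; all names are ours): it matches sampled points of two curves by tangent orientation while enforcing a monotonic correspondence.

    import numpy as np

    def tangent_angles(curve):
        d = np.gradient(curve, axis=0)
        return np.arctan2(d[:, 1], d[:, 0])

    def wrapped(x):
        # angular difference wrapped to (-pi, pi]
        return np.angle(np.exp(1j*x))

    def local_parallel_symmetry(c1, c2, tol=np.radians(5.0)):
        """Greedy, monotonic matching of points of c1 to points of c2
        with similar tangent orientation: a crude local parallel
        symmetry detector."""
        a1, a2 = tangent_angles(c1), tangent_angles(c2)
        pairs, j = [], 0
        for i in range(len(c1)):
            # advance monotonically along c2 to the best orientation match
            while j + 1 < len(c2) and \
                  abs(wrapped(a2[j+1] - a1[i])) < abs(wrapped(a2[j] - a1[i])):
                j += 1
            if abs(wrapped(a2[j] - a1[i])) < tol:
                pairs.append((i, j))
        return pairs   # index pairs of (approximately) parallel-tangent points

    # two similar curves are parallel symmetric, so most points match:
    u = np.linspace(0.0, np.pi, 100)
    c1 = np.stack([u, np.sin(u)], axis=1)
    print(len(local_parallel_symmetry(c1, 0.6*c1 + np.array([0.5, 1.2]))))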
The right-most image produced about one thousand local correspondences.

Figure 4.4 Example of parallel symmetries produced on some images: a. initial boundaries; b. symmetry axes in thick lines overlaid on the boundaries.

4.2.2 Grouping of Parallel Symmetries

The purpose of this step is to group parallel symmetry correspondences between boundaries that are likely to belong to the same structural entity, the cross-section of an SHGC. In this sense, linear parallel symmetry provides a more global criterion for grouping boundaries than those at the curve level, thus handling larger gaps and larger angular variations. For this, we propose a perceptual grouping method based on a local compatibility constraint derived from Property 3.5, which constrains any two corresponding segments on a pair of symmetric boundaries to be parallel with a constant length ratio. As shown in figure 4.5, two symmetry elements ps₁ and ps₂ are considered for grouping if the vectors defined by the end points of the symmetries (one on each of the two symmetric boundaries) are parallel, and if the ratio of their lengths (the local scale) is similar to the scale suggested by the grouped symmetries (the global scale). In practice, we also use a connection measure for each grouping hypothesis. For this, we distinguish the two cases discussed below.

Continuous Connection

In this case the two symmetry elements have a common boundary (figure 4.5.a). The local compatibility constraint, in addition to the requirements mentioned above, involves a simple connection measure E based on the gap between the non-connected boundaries: as shown in the figure, E measures the relative length of the gap with respect to the symmetric boundaries. A grouping hypothesis is generated if E is less than a fixed threshold, a fraction of 1 (by default 0.5). This measure has been introduced so as to penalize distant symmetry elements, a case that could occur between unrelated symmetries (involving markings, for example).

Discontinuous Connection

In this case, the symmetry elements do not share a boundary. This can happen, for example, due to occlusion that hides parts of both end cross-sections of an SHGC. Another connection measure is used in this case, E = (relative gap) · (α² + β²), as shown in figure 4.5.b, where (α² + β²) measures the angular variation of the symmetric boundaries across the gap. It thus controls both the relative gap and the angular variation of the symmetric boundaries. Gaps that involve a change in the sign of curvature are not considered.

This local compatibility constraint prevents grouping of symmetry elements such as the ones of figure 4.6.a and 4.6.b. In the former, the connecting segments are not parallel; in the latter, the local scale is not similar to the global one.

Figure 4.5 Grouping of parallel symmetry elements: a. continuous connection; b. discontinuous connection.

Figure 4.6 Non-grouped symmetries: a. non-parallel connections; b. non-similar local and global scales.

This step was originally designed to handle cases where there are large gaps in the cross-section which cannot be bridged by simple extrapolation but need the use of parallel symmetry. Its theoretical basis is quite interesting. However, it may be computationally expensive when applied to scenes with complex markings and background, such as the ones we will later show. Because of its nice theoretical basis, we have included it as an optional step in our system (not used by default).
In a sense, we have chosen to sacrifice it for the benefit of a faster system.

4.2.3 Selection of Hypotheses

The previous step may produce conflicting connection hypotheses. Conflicts arise when there is more than one connection hypothesis involving the same curve at the same end. Figure 4.7 gives an example. A final interpretation may not include conflicting hypotheses. These are handled as follows. First, conflict sets that include mutually competing hypotheses are constructed. Then, within each set, a filtering is performed that discards redundant grouping hypotheses. Redundant grouping hypotheses are those which connect two symmetry elements, say psᵢ and psⱼ, for which there already exists a sequence of "atomic" connections psᵢ psᵢ₊₁ ... psⱼ. The sequence of connections is preferred over the single connection which bypasses them all. In other words, a sequence of short connections is preferred over one large connection, since shorter connections involve more of the object boundaries.

Among the remaining hypotheses within each set, it is difficult to decide at this level which (if any) is the right one. Consequently, all alternative combinations involving non-competing hypotheses (conflict-free sets) are investigated and verified for consistency. This is discussed in the next section.

Figure 4.7 Competing hypotheses: the gap in the middle curve is differently influenced by the upper and lower boundaries, so the grouping suggested by one conflicts with that suggested by the other.

4.2.4 Verification of Global Correspondences

In this step, parallel symmetries are checked for geometric consistency. The objective is to retain only those that produce linear correspondences. The verification consists of checking the similarity between the scale given by the global correspondence and the scales of each of its component parallel symmetry elements and connections (Property 3.5). This is necessary because the local compatibility constraint only ensures scale similarity of two neighboring local correspondences (due to the similarity measures, the relationship may not be transitive). Global verification is performed by selecting correspondences that satisfy the following requirement (see figure 4.8):

\left| 1 - \frac{\|p_j - p_i\| \,/\, \|q_j - q_i\|}{\|p_m - p_0\| \,/\, \|q_m - q_0\|} \right| < \varepsilon \qquad \forall\, i, j    (4.2)

where ‖pₘ − p₀‖ / ‖qₘ − q₀‖ is the global scale of the compound correspondence, ‖pⱼ − pᵢ‖ / ‖qⱼ − qᵢ‖ is the local scale of any of its components,² and ε is a fixed threshold.

Figure 4.8 Verification of linear correspondences.

Verified symmetries which are composed of more than one element are used in order to fill in the gaps. Since symmetries are similarity relationships, missing portions of a curve can be inferred from corresponding portions of a symmetric curve. Boundary completion is different for the two types of connections previously discussed. We discuss them separately.

Continuous Connection

In this case, the common boundary of the connected symmetry elements is used as a model for the missing boundary of the connection. This is done as follows:

• the part of the model boundary that corresponds to the gap is detected
• the missing boundary is obtained by scaling and translating the previous part so that it fills the gap

This is shown by the dashed boundary in figure 4.5.a. This operation is performed efficiently by the use of B-splines. The cross-section gaps of figure 4.9.a and b have been so completed.

2. For non-compound symmetries, the test is applied internally to sampled points within the symmetric boundaries.
Discontinuous Connection

In this case, there are gaps on both sides of the connection. The completion is done in two steps:

• boundaries are inferred up to the extremities of the continuing curves (dashed boundaries in figure 4.5.b), using the same procedure as the one discussed previously
• the two remaining gaps are each filled in by a quadratic B-spline determined by the two joined extremities and their end orientations (dotted curves in figure 4.5.b)

The two filled-in boundaries are parallel symmetric (since they are obtained by quadratic B-spline segments having mutually parallel tangents at their extremities), thus producing a boundary completion consistent with the symmetry requirements. Figure 4.9.c is an example of such completion.

Figure 4.9 Results of the symmetry level on some images.

4.3 The Surface Patch Level

The surface patch level is intended to form part hypotheses using evidence from image surfaces that satisfy the projective properties of the sought primitives. Here, we are concerned with SHGCs and the use of their projective properties for their detection; i.e. the surface patch level of the SHGC sub-system (the SHGC patch level).

To detect SHGCs, we have to take into account the fact that a part's surface can be discontinuous due to boundary breaks and occlusion. Thus, complete part surfaces cannot be directly found in a single-step process. Rather, the process has to proceed by first detecting local surface fragments that are likely to correspond to meaningful portions of parts. Such local surface patches can be defined by groups of boundaries that locally satisfy the projective properties of SHGCs. In order to form complete part descriptions, the process has to group local surface fragments that could belong to the same object. This necessitates the use of compatibility criteria that are most likely to be satisfied by such surface fragments and less likely to be satisfied by those that belong to different objects (or to no object at all). In order to retain meaningful part hypotheses, the process has to verify that the hypothesized parts are consistent with respect to the projective geometric and the structural properties of SHGCs. It is clear that a hypothesize-verify approach is well suited to implement this process. The method we propose is organized as illustrated in figure 4.10. Its steps are described below.

Figure 4.10 Block diagram of the SHGC patch level: boundaries and symmetries, detect "local SHGC patches", group "local SHGC patches", verify part hypotheses, SHGC part hypotheses.

4.3.1 Detection of Local SHGC Patches

We can distinguish two cases for the projection of an SHGC: when its cross-section is visible and when it is not. This distinction is useful because, when the cross-section is visible, we can exploit it to find the rest of the part (i.e. focus the search). When it is not visible, due to occlusion or joints between parts of an object, we have to rely on less efficient and less accurate means, such as searching for the surface without an initial estimate of its location, to hypothesize SHGC parts. We have developed two local SHGC patch detection methods, one for detecting patches with visible cross-sections and the other for detecting patches with non-visible cross-sections. The former is an active part of the current system; the latter, although implemented, is not.
We discuss them separately.

4.3.1.1 Detection of Local SHGC Patches with Visible Cross-Section

Let us first give a rigorous definition of a local SHGC patch with a visible cross-section (henceforth a "local SHGC patch of type 1").

Definition 4.1: A closed (cross-section) curve C_S and a pair of boundaries B₁ and B₂ are said to form a local SHGC patch of type 1 if there exists a continuous and monotonic correspondence between B₁ and B₂ such that their corresponding portions are either straight, or the lines of parallel symmetry between all pairs of cross-sections (tangential to the boundaries at corresponding points and parallel symmetric to C_S) intersect at points which lie along a straight line. The cross-sections tangential to the boundaries at corresponding points constitute a ruling of the area between B₁ and B₂.

Figure 4.11 shows sample local SHGC patches of type 1. This definition uses invariant Property 3.2 (for a local LSHGC patch of type 1) and Property 3.4 (for a non-linear local SHGC patch of type 1). To detect such local SHGC patches, we first detect cross-sections. This is discussed below.

Detection of Cross-Sections

The cross-section of a GC, when not occluded, usually projects onto closed boundaries in the image.
Detection of Local SHGC Patches of Type 1

For each hypothesized cross-section, pairs of boundaries are tested as to whether they form local SHGC patches of type 1. For this, it is necessary to find a mapping between each such pair of boundaries that would correspond to finding co-cross-sectional points. It is known [69] that when the cross-section is visible, given a point P on boundary B and its "matching point" X on the cross-section (having the same tangent orientation3), the corresponding point Q of P on the boundary C can be obtained by translating the cross-section such that X and P coincide and finding the scaling of the cross-section needed to make it tangential to C (the procedure and its proof are discussed in detail in appendix B). This mapping must be continuous and monotonic. It is found by first identifying initial correspondences (say from B to C) that are invertible (if any), then by repeating the process until the end of one of the boundaries is reached or a discontinuity in the scaling function is detected.

3. P and X would have the same t value in equation 3.1; i.e. X = S(t, s) and P = S(t, s') for some s and s'.

Finding such a mapping corresponds to "ruling" the area between the two boundaries and results in a set of recovered cross-sections (and their scalings), all parallel symmetric. If such a mapping is found, let us denote it by (P_i, Q_i, X_i, r_i), where P_i and Q_i are the corresponding points, X_i the point on the cross-section which corresponds to P_i, and r_i the amount of scaling of the cross-section to which P_i and Q_i belong (figure 4.13).

Figure 4.13 Finding local SHGC patches of type 1.

When admitting such a mapping, the boundaries B and C are considered to form a local SHGC patch of type 1 if their portions B_{P1-Pn}, between P1 and Pn, and C_{Q1-Qn}, between Q1 and Qn,

• are both straight; then a local LSHGC patch is hypothesized. The patch is further classified as cylindrical if the segments B_{P1-Pn} and C_{Q1-Qn} are parallel (figure 4.11.a); otherwise it is classified as conical (figure 4.11.b). A cylindrical patch does not fix the axis uniquely; however, the parallel segments give an estimate of its direction (Corollary 3.4). Similarly, for a conical patch the lines supporting the segments intersect at the cone apex, which belongs to the axis (also Corollary 3.4).

• are not both straight; then between each pair of recovered cross-sections having different scalings, the intersection point of their lines of symmetry is determined4 (figure 4.14). A local SHGC patch of type 1 is hypothesized if the locus of such points is a straight line (using fitting criteria). This line is a local estimate of the projection of the axis (Corollary 3.4). We call this patch a non-linear SHGC patch of type 1 (figure 4.11.c).

4. lines of symmetry are easily determined since the relative scaling and correspondence between the two cross-sections have already been determined.

Figure 4.14 Using Corollary 3.4 to estimate the projection of the axis.

For the case of a non-linear SHGC patch of type 1, to find the point of intersection of lines of symmetry between a pair of cross-sections (henceforth cross-section apex) we do not actually need to intersect lines of symmetry. Given such a pair of cross-sections with the correspondence information (P_i, Q_i, X_i, r_i) and (P_j, Q_j, X_j, r_j), and any pair of corresponding (parallel symmetric) points U_i and U_j, one on each of the two cross-sections, the cross-section apex A_ij can be shown to be

A_ij = (r_i U_j − r_j U_i) / (r_i − r_j)    (4.3)

i.e. the center of similitude of the two recovered cross-sections.
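Equation 4.3 translates directly into a collinearity test for the non-linear case. The sketch below is an illustration under stated assumptions, not the system's code: each recovered cross-section is summarized by one representative point U (with representative points corresponding across cross-sections under the parallel symmetry) and its scaling r, and the fitting tolerance tol is illustrative.

```python
import numpy as np

def cross_section_apex(u_i, r_i, u_j, r_j):
    """Equation 4.3: apex (center of similitude) of two parallel symmetric
    cross-sections with scalings r_i != r_j, from corresponding points."""
    u_i, u_j = np.asarray(u_i, float), np.asarray(u_j, float)
    return (r_i * u_j - r_j * u_i) / (r_i - r_j)

def line_fit_residual(points):
    """RMS distance of 2-D points to their total least squares line."""
    pts = np.asarray(points, float)
    centered = pts - pts.mean(axis=0)
    normal = np.linalg.svd(centered)[2][-1]   # direction of least variance
    return float(np.sqrt(np.mean((centered @ normal) ** 2)))

def nonlinear_patch_axis(cross_sections, tol=2.0):
    """cross_sections: list of (U, r) for the recovered ruling. Returns the
    apexes if their locus is straight (a non-linear SHGC patch of type 1),
    else None; the fitted line estimates the projection of the axis."""
    apexes = [cross_section_apex(u1, r1, u2, r2)
              for k, (u1, r1) in enumerate(cross_sections)
              for u2, r2 in cross_sections[k + 1:]
              if abs(r1 - r2) > 1e-6]
    if len(apexes) >= 2 and line_fit_residual(apexes) <= tol:
        return apexes
    return None
```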
In summary, hypothesized local SHGC patches are classified into three types, each giving a local estimate of the axis:
• cylindrical: giving an estimate of the direction of the axis
• conical: giving a point (apex) belonging to the axis
• non-linear: giving the projection of the axis

Not all obtained hypotheses may correspond to portions of real parts. There are two reasons for this. First, the observed projective invariants are necessary properties of the projections of SHGCs but not sufficient ones to firmly conclude their presence in the scene. Second, thresholds are used so as to account for errors in the projection model, noise, quantization, etc. (see section 4.5.2 for a discussion of the thresholds used in the methods). This inevitably results in spurious hypotheses. Figure 4.15 shows some of the local SHGC patches detected by our system from the boundaries of figure 4.12. In the figure, the "correct" hypotheses are shown separately from some of the "false" ones, although at this stage the system cannot differentiate between them (the total number of local SHGC patches of type 1 is 73; only 4 of them correspond to SHGC objects). In appendix E, we discuss the details of the organization of the search for local patches and the related efficiency issues.

Figure 4.15 Examples of hypothesized local SHGC patches detected from the contours of figure 4.12. a. the "correct" hypotheses; b. examples of "false" hypotheses.

4.3.1.2 Detection of Local SHGC Patches with Non-Visible Cross-Section

Let us give a more rigorous definition of a local SHGC patch with a non-visible cross-section (henceforth a "local SHGC patch of type 2").

Definition 4.2: A pair of boundaries B1 and B2 are said to form a local SHGC patch of type 2 if there exists a continuous and monotonic correspondence between them such that their corresponding portions are either straight or the tangents at corresponding points intersect along a straight line.

This definition makes use of invariant Property 3.2 (for a local LSHGC patch of type 2) and invariant Property 3.3 (for a non-linear local SHGC patch of type 2). Figure 4.16 shows sample local SHGC patches of type 2.

The detection of LSHGC patches of type 2 (cylindrical and conical patches) is fairly simple, as it only requires testing the linearity of boundaries. Detection of non-linear SHGC patches of type 2 has been addressed by Ponce et al. in [55] and by Sato and Binford in [66]. In the former, a Hough-like method is used where the combinations are obtained by considering all possible point-wise pairings (which could be reduced if the boundaries have inflection points); in the latter, the mapping is found by assuming the segments connecting corresponding points (cross-section segments) to be all parallel. The Hough method is computationally expensive, with complexity O(km²) where k is the number of cells in, say, the orientation dimension of the Hough table and m the number of points considered. The parallelism assumption of Sato and Binford is restrictive in that it is not a property of general SHGCs (it holds for special cases such as surfaces of revolution and LSHGCs).

Figure 4.16 Sample local SHGC patches of type 2. a. cylindrical patch; b. conical patch; c. non-linear patch.
We have developed a method based on the known fact that an object's surface in the vicinity of its extremal boundaries is locally convex along the viewing direction. Therefore, the image of the cross-section curve is locally convex in the vicinity of the limbs. Further, it can be shown that the main property of the cross-section that allows the identification of the correspondences, besides the local convexity, is that the tangent to the cross-section is the same as the tangent to the limb at the point of contact (this is the basis of the ruling method for SHGCs of type 1, whose proof is given in appendix B). See figure 4.17. Therefore, the local shape of the cross-section is not important as long as this property holds. Using the observation that the cross-section region that generates limb points (call it the generating region) is usually small and convex, it can be approximated by an ellipse in the image (figure 4.18). For objects which have a smooth scaling function, the orientations of the cross-section segments are locally similar and thus the approximating ellipses have similar orientations.

Figure 4.17 Local convexity and tangentiality of the cross-section in the vicinity of limb boundaries (limb and cross-section are mutually tangential).

Figure 4.18 Using an ellipse to approximate the image of the generating region of the cross-section.

Using this argument, we can use a method similar to the one used for ruling SHGCs of type 1, this time using an ellipse as though it were the given cross-section. Because the correspondence orientations are not known, the method uses several orientations (using a discretization of the angular range for which the boundaries can be mapped). For each orientation, the same local minimization method as the one used for SHGCs of type 1 is used (figure 4.19.a; see appendix B for more details). Only the correspondences which verify the tangentiality property are accepted. This results in correspondences (P_i, Q_i). If the locus of intersection points of the tangent lines to the boundaries at the points (P_i, Q_i) is a straight line (using fitting criteria), then a non-linear local SHGC patch of type 2 is hypothesized (Property 3.3). If several such axes are found for different orientations, then several local SHGC patches are hypothesized.

This method may not map all points of a given boundary, since it only maps those belonging to the part of the generating region which the ellipse approximates well. Missing cross-section segments can be inferred, also using Property 3.3. Given the axis A and an unmatched point P_k, the intersection point A_k of the axis and the tangent to the boundary at P_k is determined. The corresponding point Q_k is the one whose tangent line intersects A at A_k. See figure 4.19.b.

Figure 4.19 The search for co-cross-sectional points (a) and their completion (b).
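A hedged sketch of this completion step follows; the helper names are hypothetical, and the inputs (an axis line, boundary points with their tangents) are assumed to come from the correspondence search above. The intersection is undefined when a tangent is parallel to the axis; a full implementation must guard that case.

```python
import numpy as np

def line_intersection(p1, d1, p2, d2):
    """Intersection of the lines p1 + t*d1 and p2 + s*d2 (2-D).
    Singular when the directions are parallel; callers must guard that."""
    M = np.column_stack([np.asarray(d1, float), -np.asarray(d2, float)])
    t, _ = np.linalg.solve(M, np.asarray(p2, float) - np.asarray(p1, float))
    return np.asarray(p1, float) + t * np.asarray(d1, float)

def complete_correspondence(axis, p_k, t_k, boundary, tangents):
    """Property 3.3 completion: intersect the tangent at the unmatched point
    p_k with the axis (giving A_k), then select the point of the other
    boundary whose tangent line passes closest to A_k."""
    a_pt, a_dir = axis
    a_k = line_intersection(p_k, t_k, a_pt, a_dir)
    def miss(i):
        q = np.asarray(boundary[i], float)
        t = np.asarray(tangents[i], float)
        v = a_k - q
        # distance from A_k to the tangent line at boundary[i]
        return abs(v[0] * t[1] - v[1] * t[0]) / np.linalg.norm(t)
    best = min(range(len(boundary)), key=miss)
    return boundary[best], a_k
```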
The performance of this method depends on how well the image ellipse approximates the generating region of the true (but unknown) cross-section. For bilaterally symmetric boundaries, this method should work best because a surface of revolution produces bilaterally symmetric boundaries and the projection of the cross-section is of course an ellipse. Figure 4.20.a-c gives an example on real image boundaries (from the image of the vase of figure 4.15). Figure 4.20.b shows the actual correspondences and axis for the boundaries of figure 4.20.a, and figure 4.20.c shows the correspondences and axis resulting from this method. The method should also produce reasonable results on the boundaries of SHGCs with smooth cross-sections but non-bilaterally symmetric boundaries. Figure 4.20.d-f gives an example for synthetic boundaries. Notice that the correspondences and axis resulting from the method are similar to the actual ones. For other types of SHGCs, the performance may degrade when the boundaries project from a mix of meridians and limbs, making the elliptic approximations poor.

Figure 4.20 Performance of the method on some examples. a. real image bilaterally symmetric boundaries; b. actual correspondences and axis; c. resulting correspondences and axis by the method; d. synthetic non-bilaterally symmetric boundaries; e. actual correspondences and axis; f. resulting correspondences and axis by the method.

The advantages of the proposed method are twofold. First, it exploits an important geometric property of the interaction between cross-sections and limbs, namely the local convexity and the tangentiality. This provides means to select only "good" correspondences. Second, it is more efficient than the Hough transform because its complexity between a pair of curves with m points each is O(km) instead of the O(km²) of the Hough transform (see appendix E for the derivation of the time complexity).

4.3.2 Grouping of Local SHGC Patches

This step is intended to group together local SHGC patches that are likely to belong to the same part, thus explicitly accounting for feature fragmentation. Expressing the compatibility of local SHGC patches belonging to the same part is central to the grouping (segmentation) process. By a simple examination of Corollary 3.4, we can derive a list of viewpoint-invariant geometric compatibility constraints between patches of the same SHGC. The geometric compatibility between a pair of patches consists of the "similarity" of their local axis descriptions. The similarity relationships, depending on the types of the two patches, are given below (a code sketch follows the structural conditions):

• non-linear and non-linear: the axes must be colinear (up to some error; figure 4.21.a and f)
• non-linear and conical: the cone apex must lie on the axis (up to some error; figure 4.21.b)
• non-linear and cylindrical: the direction of the cylinder must be parallel to the axis (figure 4.21.c)
• conical and conical: the limbs must be colinear (figure 4.21.d)
• cylindrical and cylindrical: the limbs must be colinear (figure 4.21.e)

Figure 4.21 Examples of geometrically compatible local SHGC patches. a. from figure 4.9; b. from figure 4.15; d. from figure 4.3; c, e, f are illustrative examples.

The structural relationship between two geometrically compatible local SHGC patches is also examined. Two cases are distinguished: continuous connection, where the patches share a boundary (figure 4.21.a, c and e), and discontinuous connection, where the patches do not share any boundary (figure 4.21.b and f). A grouping hypothesis between two local SHGC patches is generated if
• they form a continuous connection, or
• they form a discontinuous connection and either
  • their spatial extents do not overlap (figure 4.22.a), or
  • there is overlap and
    • their boundaries form (self-occlusion) T-junctions (figure 4.22.b), and
    • the "cusping" boundaries do not lie outside the patches and belong to the same patch.
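The geometric similarity tests listed above reduce to a few line and angle computations. A minimal sketch follows, assuming each patch carries a local axis description of the kind summarized in section 4.3.1; the dictionary layout, tolerances and the aligned ordering of limb lines are illustrative assumptions, not the system's representation.

```python
import math

def angle_between(u, v):
    """Unsigned angle between 2-D directions, folded to [0, pi/2]."""
    c = abs(u[0] * v[0] + u[1] * v[1]) / (math.hypot(*u) * math.hypot(*v))
    return math.acos(min(1.0, c))

def point_to_line(p, a, d):
    """Distance from point p to the line through a with direction d."""
    return abs((p[0] - a[0]) * d[1] - (p[1] - a[1]) * d[0]) / math.hypot(*d)

def colinear(l1, l2, ang_tol, dist_tol):
    (a1, d1), (a2, d2) = l1, l2
    return angle_between(d1, d2) < ang_tol and point_to_line(a2, a1, d1) < dist_tol

def geometrically_compatible(p1, p2, ang_tol=math.radians(5), dist_tol=4.0):
    """Similarity of local axis descriptions. A patch is a dict with
    kind 'non-linear' (axis=(point, dir)), 'conical' (apex=point) or
    'cylindrical' (dir=direction), plus 'limbs': ordered straight limb lines."""
    kinds = {p1['kind'], p2['kind']}
    if kinds == {'non-linear'}:                       # colinear axes
        return colinear(p1['axis'], p2['axis'], ang_tol, dist_tol)
    if kinds == {'non-linear', 'conical'}:            # apex on the axis
        cone, other = (p1, p2) if p1['kind'] == 'conical' else (p2, p1)
        return point_to_line(cone['apex'], *other['axis']) < dist_tol
    if kinds == {'non-linear', 'cylindrical'}:        # parallel direction
        cyl, other = (p1, p2) if p1['kind'] == 'cylindrical' else (p2, p1)
        return angle_between(cyl['dir'], other['axis'][1]) < ang_tol
    if kinds in ({'conical'}, {'cylindrical'}):       # colinear limbs
        return all(colinear(l1, l2, ang_tol, dist_tol)
                   for l1, l2 in zip(p1['limbs'], p2['limbs']))
    return False      # conical/cylindrical pairs are not in the catalog above
```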
These structural constraints are derived from the inner-surface structural property discussed in section 3.2. The non-overlap case is introduced so as to handle occlusion by other bodies and boundary breaks. In the case of spatial overlap, the detection of T-junctions between the boundaries uses measures of contact (T-j measures), which are discussed in section 4.3.3.

Figure 4.22 Structural analysis of geometrically compatible local SHGC patches. a. non-overlapping patches; b. self-occluding patches (spatial overlap, T-junctions).

Conflicting grouping hypotheses (more than one grouping hypothesis at an end of a local SHGC patch) are rare due to the strong nature of the geometric constraints. When they do occur, the hypothesis selection, among conflicting patches, gives preference to the one closest to the common patch among continuous connections (if any), otherwise to the closest one among discontinuous connections. The heuristic preference for continuous connections is due to the fact that they are stronger indicators of surface continuity.

When all connections are found, aggregates of local SHGC patches are formed by grouping all transitively compatible ones into a single entity (SHGC part hypothesis). In practice, discontinuous connections are not used at this stage to aggregate patches but are examined later at the verification stage. This is due to the ambiguity of the structural properties of discontinuous connections, which is best explained by the example of figure 4.23. At this level, the scope of the method is limited to local SHGC patches and no notion of complete parts is achieved yet. As we will discuss in chapter 6, certain joints between parts produce structural arrangements similar to the inner-surface discontinuous connections of the same part. Thus, at this stage we cannot decide whether we are dealing with patches belonging to two joined parts or to a single part. The decision can be made when more information about complete parts is formed; i.e. when the closure property is verified. This is discussed in the following section.

Figure 4.23 Ambiguity of the local structure of discontinuous connections: a. one part; b. two joined parts; same local image structure.

4.3.3 Verification of SHGC Parts Hypotheses

The previous step results in SHGC part hypotheses, each consisting of one or an aggregate of geometrically and structurally compatible local SHGC patches, which only locally verify the geometric invariant properties of SHGCs and the inner-surface structural property. This step is intended to check their global consistency and acts as a filter that rejects unlikely part hypotheses. The filtering we use consists of two consecutive steps, the intra-part filtering and the inter-part filtering. The former uses information within each part hypothesis to eliminate those that are globally inconsistent with the projective properties of an SHGC. The latter uses the interaction between different part hypotheses (context) to eliminate those conflicting with "better" interpretations. Both steps are discussed below.

4.3.3.1 The Intra-Part Filtering

This step uses both a geometric test and a structural one.
The geometric test consists of verifying that a part hypothesis consisting of an aggregate of locally compatible patches is also globally geometrically consistent (due to the use of measures, local compatibility may not be transitive). This is done by first finding the axis of the part hypothesis by combining the local axis descriptions of all component local SHGC patches (e.g. by fitting a line to the axis points of all local patches if the part is non-linear). The computed axis description is then compared with all the local ones using the geometric compatibility rules of section 4.3.2.

The structural test consists of verifying that a part hypothesis has one of the SHGC closure structural patterns of section 3.2 (i.e. the part has a valid closure). However, due to possible boundary breaks or occlusion, some part hypotheses may have incomplete descriptions, and closure may not be directly observed at their ends. For this, completing such partial descriptions is useful. We first discuss the part completion process, then the closure finding process.

Part Completion

The part completion consists of reconstructing the missing shape information at the end of a part hypothesis and in the gaps between aggregated local SHGC patches. For example, the vase of figure 4.15.a and the pot of figure 1.1 have their lower portions occluded, and the object of figure 4.21.a has a gap in its description. For such cases, only partial shape information could be found up to this stage. Completing such descriptions requires reconstructing mappings for each unmapped point of the part boundaries. This is to be done in such a way that the geometry of the part remains consistent. We show that recovering such missing shape information can also be done using the geometric invariant properties of SHGCs.

For SHGC hypotheses of type 1, given an unmatched boundary point P we would like to reconstruct the missing cross-section and its scaling (with respect to the "top" one) as well as its co-cross-sectional limb point Q. For this, its matching point5 X on any ruled cross-section (call it the reference cross-section) of the part hypothesis is identified (see figure 4.24). Then the scaling, with respect to the reference cross-section, of the cross-section that passes through P is determined by the following method based on Corollary 3.4 (which we call the axis-based cross-section recovery):

In the case of an LSHGC (figure 4.24.a), the corresponding point Q is simply the intersection of the line from P parallel to the limb correspondence line of the reference cross-section (line X-Y in the figure) and the other straight limb of the LSHGC. The scale is given by the ratio dist(P, Q) / dist(X, Y).

In the case of a non-linear SHGC (figure 4.24.b), first the intersection point A_x of the line connecting P to X and the axis is determined.6 Then, by the property of linear parallel symmetry between the cross-sections, it can be shown that the scale is given by the ratio dist(A_x, P) / dist(A_x, X).

5. in the sense discussed in section 4.3.1; i.e. its corresponding parallel symmetric point on that cross-section.
6. the reference cross-section is chosen so that the correspondence line is not parallel to the axis.

Figure 4.24 Axis-based cross-section recovery method. a. for LSHGCs (scale = dist(P, Q) / dist(X, Y)); b. for non-linear SHGCs (scale = dist(A_x, P) / dist(A_x, X)).

Application of this method to the vase (left-most object) and cone (middle object) of figure 4.15.a and the object of figure 4.21.a is shown in figure 4.25.

Figure 4.25 Axis-based cross-section recovery for previous SHGCs. a and b from figure 4.15.a; c from figure 4.21.a.
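The non-linear case of the axis-based cross-section recovery is a short computation once the axis is known. A sketch under the same notation follows (P the unmatched point, X its match on the reference cross-section, the axis given as a point and a direction); per footnote 6, the correspondence line must not be parallel to the axis or the linear system below is singular. The function name is hypothetical.

```python
import numpy as np

def recover_scale(p, x, axis_pt, axis_dir):
    """Axis-based cross-section recovery for a non-linear SHGC (Corollary 3.4):
    intersect the line P-X with the axis at A_x; the scaling of the missing
    cross-section through P, relative to the reference cross-section through X,
    is dist(A_x, P) / dist(A_x, X)."""
    p, x = np.asarray(p, float), np.asarray(x, float)
    a, d = np.asarray(axis_pt, float), np.asarray(axis_dir, float)
    u = p - x                            # direction of the correspondence line
    M = np.column_stack([u, -d])         # solve x + t*u = a + s*d
    t, _ = np.linalg.solve(M, a - x)
    a_x = x + t * u
    return np.linalg.norm(p - a_x) / np.linalg.norm(x - a_x)
```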
The co-cross-sectional limb point Q corresponding to P, in the case of a non-linear SHGC patch of type 1, can be found by determining the tangential envelope of all the recovered cross-sections using the following method (called the limb reconstruction):

Let Q0 be the extremity of the boundary opposite to the first point P used in the axis-based cross-section recovery method (see figure 4.26). The point Q1 on the first recovered cross-section whose tangent line passes through Q0 is marked as a limb point. The process is then repeated for Q1 and the next cross-section, until the limb point on the last cross-section is so determined.

In a sense, the limb reconstruction method consists of approximating the area between two successive cross-sections by an LSHGC patch, where limbs and their tangents are colinear (i.e. Property 3.2). This method, applied to the recovered cross-sections of figure 4.25, produced the reconstructed limbs shown in figure 4.27.

Figure 4.26 Limb reconstruction method.

Figure 4.27 Limb reconstruction for the SHGCs of figure 4.25.

This completion is applied until either the recovered cross-section coincides (in the edge support sense) with a boundary which has a linear parallel symmetry with the cross-section such that a valid closure results (discussed in the next section) or the end of the unmatched boundary is reached.

For SHGC patches of type 2, we cannot use Corollary 3.4, since we have no cross-section to assist in the reconstruction of the missing correspondences. In this case, missing shape information cannot be uniquely inferred in general. However, for LSHGCs and non-linear SHGCs with skew symmetric boundaries this becomes possible (SHGCs of type 2 can be classified for skew symmetry at the local SHGC patch detection step by testing whether the cross-section segments form a constant angle with the axis and whether the axis passes through the mid-points of those segments). In this latter case, the corresponding point Q of an unmapped point P is the symmetric point of P with respect to the axis in the direction of the skew symmetry (figure 4.28). The reconstructed limb is given by the list of points Q so recovered.

Figure 4.28 Inferring missing shape for SHGC patches of type 2 with skew symmetric sides.

For an SHGC of either type, the part completion process, at one of its ends, is first applied to the part's continuing unmatched boundary (if any). This is the case for the left-most vase of figure 4.15.a, for example. Then a closure verification / completion cycle is applied (see figure 4.29.b). This time, the completion uses other boundaries in the vicinity of the part (an example is illustrated in figure 4.29.a). The boundaries which produce reconstructed limbs that satisfy the structural constraints discussed in section 4.3.2 (preserving the part's consistency with respect to its projective properties) are appended to the part's description. In doing so, boundaries that did not get grouped at the curve level (for lack of sufficient local evidence of continuity) are grouped using global shape continuity. The cycle continues until either closure is found (as discussed below) or no consistent completion is possible.
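As a concrete illustration of the limb reconstruction method described above, the following sketch walks the recovered cross-sections (assumed here to be closed convex polylines) starting from Q0. A convex curve has two tangent lines through an external point; this sketch simply takes the best-aligned one, whereas a full implementation must select the branch on the correct side of the part. All names are hypothetical.

```python
import numpy as np

def tangent_point_from(q, section):
    """Point of a closed convex polyline `section` whose tangent line passes
    (nearly) through the external point q: the vector q - point is parallel
    to the local tangent there."""
    pts = np.asarray(section, float)
    tang = np.roll(pts, -1, axis=0) - np.roll(pts, 1, axis=0)  # central diffs
    v = np.asarray(q, float) - pts
    # |cross| / (|v| |t|) = sine of the angle between q - point and the tangent
    cross = np.abs(v[:, 0] * tang[:, 1] - v[:, 1] * tang[:, 0])
    s = cross / (np.linalg.norm(v, axis=1) * np.linalg.norm(tang, axis=1))
    return pts[int(np.argmin(s))]

def reconstruct_limb(q0, sections):
    """Limb reconstruction: from the boundary extremity q0, mark on each
    successive recovered cross-section the point whose tangent line passes
    through the previously marked point (LSHGC approximation of the area
    between successive cross-sections, Property 3.2)."""
    limb, q = [], np.asarray(q0, float)
    for section in sections:
        q = tangent_point_from(q, section)
        limb.append(q)
    return limb
```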
Closure Verification

The closure verification consists of finding evidence of a valid structural closure pattern at the end of a part hypothesis using the catalog of figures 3.21 and 3.24 (since that catalog does not account for occlusion by other bodies, the combinations are augmented to include T-junctions where the tops of the Ts are boundaries other than the part's boundaries; an example is given in figure 4.30.c). This is done by searching for junctions at each end of a part hypothesis (between the part's boundaries and other boundaries in its vicinity) and for connectivity relationships between them. This amounts to searching for the projection of the cross-section if it is (partially) visible, or otherwise for evidence of occlusion. Junctions are found using "junction measures" in order to account for image imperfections. Figure 4.30 illustrates the measures for the junctions of the catalog we use. The measures consist of both proximity and angular variation between boundaries. Proximity is measured between the extremity of a part's boundary and the junction point (formed with the other boundaries), and the angles are between the tangent at the extremity of a part's boundary and the line joining that extremity to the junction point. A junction is one for which the interacting boundaries form proximity and angular variations smaller than fixed thresholds. To detect cusps at the end of a part, we use the property that they are usually accompanied by T-junctions [38] where the tops of the Ts are the side boundaries of the part (unlike the other occlusion T-junctions, where the part's side boundaries are the stems of the Ts).

Figure 4.29 Closure/completion cycle. a. example; b. block diagram.

Figure 4.30 Junction measures (3-tgt-j measure, L-j measure and T-j measure).

Note that it should not be expected that only the "correct" junctions are found at the end of a part. Due to the use of junction measures, several "candidate junctions" can be found. The method avoids systematically searching connectivity between all possible junctions and their pairings, as this would be prohibitively expensive for images with a large number of boundaries. Rather, it uses heuristics to select the "best" ones. First, one junction of each type is selected by preferring the one closest to the part's end. The algorithm first searches for junction pairings that correspond to a (partially) visible cross-section, then, in case of failure, those that correspond to an occluded (non-visible) cross-section. In the former, junction pairings are examined in the following order: those using 3-tgt-js, then those using L-js, then those using cusps. The reason for this heuristic ordering is that 3-tgt-js are stronger than, and include, L-js; i.e. the L-j structural arrangement is a sub-case of the 3-tgt-j arrangement where one branch is removed. Also, the existence of an L-j at the end of a part's boundary excludes the possibility of a cusp ending. For a given junction pair of interest, a search for connectivity between them is performed. A successful pairing is one which results in boundary links between the junctions (the cross-section, if it is at least partially visible), except for the occlusion junctions. Wrong junctions (or their pairings), when found, typically result in no connectivity at all. The search for boundary links uses the same algorithm described for finding cycles in section 4.3.1 (using continuity constraints, bounds on depth and indexing).

In case no valid closure is found at a part's end, the discontinuous connections which were deferred at the grouping step (as mentioned in section 4.3.2) are examined to check whether any compatible part hypothesis, itself not closed, can be grouped with the current one such that the resulting aggregate part is closed. This is repeated as long as no closure is found and non-closed compatible parts remain.

The remaining hypotheses of the closure verification step on the part hypotheses of figure 4.15 are shown in figure 4.31. The part hypotheses of this stage are displayed with the recovered cross-sections (after completion) and axes. There are two part hypotheses in the region inside the cone.

Figure 4.31 Results of the intra-part filtering after grouping the surface patches of figure 4.15.
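The junction measures used throughout this closure search combine proximity and angular variation, as described above. A minimal sketch follows; the threshold values are illustrative, not the system's.

```python
import math

def junction_measure(end_pt, end_tangent, junction_pt):
    """Proximity and angular variation between a boundary extremity and a
    candidate junction point: distance from the extremity to the junction,
    and angle between the end tangent and the line joining them."""
    dx, dy = junction_pt[0] - end_pt[0], junction_pt[1] - end_pt[1]
    prox = math.hypot(dx, dy)
    if prox == 0.0:
        return 0.0, 0.0
    tx, ty = end_tangent
    c = (dx * tx + dy * ty) / (prox * math.hypot(tx, ty))
    return prox, math.acos(max(-1.0, min(1.0, c)))

def accept_junction(ends, junction_pt, max_prox=8.0, max_angle=math.radians(30)):
    """A junction is kept when every interacting boundary end has proximity
    and angular variation below fixed thresholds (illustrative values).
    ends: list of (extremity, end tangent) pairs."""
    return all(p <= max_prox and a <= max_angle
               for p, a in (junction_measure(e, t, junction_pt)
                            for e, t in ends))
```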
Wrong junctions (or their pairings) when found, typically result in no connectivity at all. The search for boundary links uses the same algorithm described for finding cycles in section 4.3.1 (using continuity constraints, bounds on depth and indexing). In case no valid closure is found at a part’s end, the discontinuous connections which were deferred at the grouping step (as mentioned in section 4.3.2) are examined to check whether any compatible part hypothesis, itself not closed, can be grouped with the current one such that the resulting aggregate part is closed. This is repeated as long as no closure is found and non-closed compatible parts remain. The remaining hypotheses of the closure verification step on the parts hypotheses of figure 4.15 are shown in figure 4.31. The parts hypotheses of this stage are 141 displayed with the recovered cross-sections (after completion) and axes. There are two parts hypotheses in the region inside the cone. Figure 4.31 Results o f the intra-part filtering after grouping the surface patches o f figure 4.15. 4.33.2 The Inter-Part Filtering It is not always guaranteed that part hypotheses surviving the intra-part filtering correspond to meaningful objects in the scene (although very few do survive in practice). The purpose of this step is to analyze the interactions between the surviving part hypotheses in order to select the “best” among those conflicting with one another. The conflicts and the selection rules used are given below. Case 1. If a set of parts hypotheses have the same cross-section (with 3-tgt-j closure) then first the smallest ones are discarded. Among those whose length is 142 comparable to the longest one, the part with the most regular (tightest) closure, i.e. with the smallest junction measures, is selected. This criterion corresponds to the fact that a given cross-section belongs to only one part. The same cross-section may be used to rule (detect) other parts, however (for example joined parts), but the parts cannot be closed by the same cross-section. Overlapping parts sharing the same cross-section result from images of objects with regular internal markings such as the middle cone of figure 4.15 which did result in more than one part hypothesis (“inside” the cone) as can be seen in figure 4.31. Case 2. If a part hypothesis has a 3-tgt-j closure then all other part hypotheses which have its internal cross-section boundaries as side boundaries are rejected, and all other part hypotheses which have its side boundaries as cross-section boundaries are also rejected. This filtering is based on the fact that 3-tgt-j closures are highly unlikely to occur by chance due to their constrained structure (in the experiments, false 3-tgt-j closures almost never occurred). Therefore, the boundaries internal to the cross- section of a part with a 3-tgt-j closure should not be used as side boundaries in any part hypothesis (figure 4.32.a). Consequently the side boundaries of a 3-tgt-j closed part should not be used as cross-section boundaries in any other part hypothesis. For example, while the boundaries of figure 4.32.b form a plausible part hypothesis when isolated (a football for e.g), they should not be interpreted as a part when 143 boundaries should not have “cross-section” labeling boundaries should not have “side boundary” labeling side boundaries V plausible part hypothesis /■with “convergent” cross-section y closures boundaries v part hypothesis with 3-tgt-j closure dominates b. c. 
considered in the case of figure 4.32.c, where the 3-tgt-j closed part dominates and the same boundaries are now perceived as a cross-section.

Figure 4.32 Inter-part filtering using 3-tgt-j closure.

Ambiguities may occur at this stage, and some of them need a more global context, such as joints between parts. An example is when all the side boundaries of a part are the cross-section boundaries of another. Such ambiguities are dealt with at the object level discussed in chapter 6. At this level, we only consider the case where the side boundaries of a part and the cross-section of another intersect but are disjoint.

Case 3. If a part hypothesis is completely subsumed by another (i.e. included in its surface region), then it is rejected unless it has a 3-tgt-j closure.

The exception for 3-tgt-j closed parts avoids rejecting joined parts such as shown in figure 4.33.b. The other cases are weaker and could result from regular markings (such as drawings) on a part's surface (such as the cone's in figure 4.31). This filtering, save for the closure analysis, is similar to the subsumption filtering used by [45].

Figure 4.33 Using subsumption and closure analysis to filter out unlikely part hypotheses. a. part hypothesis rejected; b. part hypothesis not rejected.

The results of the whole process, including the inter-part filtering, on the boundaries of figure 4.9 are shown in figure 4.34.a. The results of the inter-part filtering step on the part hypotheses of figure 4.31 are shown in figure 4.34.b. Here, the parts are shown with their image cross-sections and meridians. Additional results of the method on other images, with occlusion and more background texture and markings, are shown in figure 4.35. In the figure, only the "correct" local SHGC patches are displayed, although false hypotheses were generated and rejected at the verification step.

Figure 4.34 Results of the SHGC level. a. from the boundaries of figure 4.9; b. from the part hypotheses of figure 4.31.

Figure 4.35 Additional example of results of the method. a. intensity and edge images; b. detected local SHGC patches (only the "correct" hypotheses are shown); c. completed verified part hypotheses.

4.4 3-D Shape Inference

At this stage of the process, the projective descriptions of the detected SHGCs give the necessary information to recover their 3-D shape. Ulupinar and Nevatia [69] have addressed the recovery of a viewer-centered description, based on surface orientations, of the visible portions of an SHGC from its 2-D contours. Our method is built on that method and extends the results to obtain the intrinsic description of an SHGC, namely its 3-D cross-section, axis and scaling function. This latter constitutes an object-centered description independent of the viewer coordinate system. The method first applies the method of [69] on the obtained SHGC descriptions, then infers their intrinsic descriptions using a mathematical analysis we describe in detail in chapter 6. The method requires that an SHGC have its cross-section visible (an argument on this is given in chapter 6) in order to be able to recover its 3-D shape. For SHGC parts which are involved in joints that prevent the visibility of their cross-sections, the joints can be used in some cases to infer the missing cross-sections and make it possible to recover 3-D shape for these SHGCs too. This is also discussed in chapter 6, where compound objects and the constraints they set on 3-D shape are addressed.
The results of this method on the descriptions of figure 4.34 (lower three objects) and figure 4.35 are shown in figure 4.36. The figure displays the SHGC primitives with their cross-sections and meridians for different 3-D orientations.

Figure 4.36 Recovered 3-D descriptions of previous SHGC scenes shown for different poses. a. from figure 4.34 (lower objects); b. from figure 4.35.

4.5 Discussion

4.5.1 Further Capabilities and Limitations

The method that detects SHGCs of type 1 is not limited to special types of cross-sections, nor does it assume bilaterally symmetric limb boundaries. The SHGC in the middle of the first row of figure 4.34 is an example of a non-circular cross-section SHGC, as is the occluded object in the image of figure 4.35. The method also handles cross-sections with tangent discontinuities, which create sharp edges over the surface. The only difference between such edges and limb boundaries is that the former are the trace of a single point of the cross-section (thus, meridians) whereas the latter are not. All the properties and methods we have given necessarily apply to meridian projections. For example, limb correspondences for local SHGC patch detection can still be determined in the same way we have discussed. In fact, some of the methods are even simplified when sharp edges exist. For example, limb reconstruction of a sharp edge consists simply of tracing the corners of the recovered cross-sections. Such sharp corners are detected during the cross-section hypothesis step discussed in section 4.3.1.

Furthermore, the method can also be applied to cases where the cross-section is polygonal or has concavities which create several limb (or meridian) boundaries, which may self-occlude in the image. An example is shown in figure 4.37. The SHGC detection proceeds as though the discontinuities produced by self-occlusion were caused by occlusion from another object. The only difference with other types of SHGCs (with convex cross-section, for example) is that several verified hypotheses will result for the same object. Identification of such a case is straightforward, as all such descriptions involve the same cross-section, same axis and same scaling function. To group the resulting SHGC patches for such parts, it is useful to perform a hypothesis merging step so as to make a single part hypothesis out of the many patches obtained. This step is very similar to the local SHGC patch grouping step discussed in section 4.3.2. Its geometric compatibility constraints use the same rules, but with the additional constraint that the mappings between boundaries be transitive; i.e. if the point P1 of boundary B1 is mapped by the SHGC patch S1 to point P2 of boundary B2, and if P2 is mapped by the SHGC patch S2 to point P3 of boundary B3, then P1 and P3 should correspond to each other in the local SHGC patch mapping sense (figure 4.38).

Figure 4.37 Multiple (merged) hypotheses that correspond to the same SHGC object. a. original boundaries; b. detected local SHGC patches; c. completed descriptions (and contours); d. resulting global SHGC description (contours and rulings).
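The transitivity constraint just stated can be checked directly on sampled correspondences. The sketch below assumes each patch mapping is stored as a dictionary from sampled points (tuples) to points, with a direct B1-B3 correspondence available for comparison; all names and the tolerance are hypothetical, not the system's data structures.

```python
def transitive(map_s1, map_s2, correspondences_13, tol=3.0):
    """Mapping-transitivity test for hypothesis merging: if S1 maps P1 (on B1)
    to P2 (on B2) and S2 maps P2 to P3 (on B3), then P1 and P3 must also
    correspond under the direct B1-B3 patch mapping, up to tol pixels."""
    def close(p, q):
        return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5 <= tol
    for p1, p2 in map_s1.items():
        p3 = map_s2.get(p2)                 # composed correspondence
        q3 = correspondences_13.get(p1)     # direct correspondence
        if p3 is not None and q3 is not None and not close(p3, q3):
            return False
    return True
```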
For example, the patches of figure 4.37.C all have closure with the same “top” cross- section and all have the “same” axis (up to errors) and transitive mappings. In principle, for detecting the presence of such a part, it is not necessary to consider all such limb boundaries. Any pair would be sufficient for the detection of the object and its 3-D recovery since the properties of section 3.1.1 apply regardless of whether the cross-section is concave or convex. The hypothesis merging step, however, allows to have a better segmentation by identifying all observed boundaries belonging to the same object. This step is not fully implemented in the current system. The detection of SHGCs of type 2 has been tested on fewer images. We have shown the obtained results on some of the tested examples and they appear to be promising. The module that detects such SHGCs needs more testing however. All steps, except the local SHGC patch detection, are the same as those of the module detecting SHGCs of type 1 (the hypothesis merging step also applies to SHGCs of type 2). Thus, only the local SHGC patch detection method (the ellipse-based SHGC finding) needs to be tested more and, possibly, improved. We believe this 152 method to have the potential to handle complex images as well. Currently, the module that detects SHGCs of type 2 is not an active part of the system. transitive SHGC patch correspondences SHGC patch Si ___ SHGC patch S2 Figure 4.38 Mapping transitivity constraint fo r hypothesis merging. Also, notice that the geometric projective properties used assume orthographic projection, but the methods have been tested on images exhibiting some perspective effects (the images used to test the methods were taken by a CCD camera). The use of orthographic properties, thus, results in approximations of the shapes. For example, parallel symmetry is not a perspective invariant of the cross- sections of an SHGC at points on the same meridian. However, unless an object is too close to the camera, parallel symmetry gives reasonable approximates of such corresponding points. The goal of the method is in fact to produce approximate shapes for the viewed objects. These approximates are better when the objects’ dimensions are small compared to their distance from the camera and gradually degrade as the perspective effects become significant. 153 4.5.2 Performance This method has been successfully applied to several images and many objects. Three simple images, with a single object each were shown in figure 4.34 (upper row), and results on two complex images were shown in figure 4.34 and figure 4.35. Two additional complex images where the SHGC system performed correctly are shown in chapter 6, making a total of 7 images with 11 different SHGCs. A number of mechanisms have been used to reduce the time complexity of the method. They involve the use of indexes for the spatial access of relevant features such as boundaries and local surface patches and of search constraints so as to focus on promising regions of the search space. Such mechanisms are discussed in detail in appendix E. An analysis of the time complexity of key algorithms described in this chapter is also given in appendix E. Here, we give the performance of the method on the results shown as the evolution of the number of hypotheses throughout the hierarchy and as the corresponding run times. In each table, the corresponding images are referred to by the figure number where they occur in the text. 
The relevant steps are referred to in the top row of each table. The given run times are obtained on a Sparc 10/30 using Sun Common Lisp under the CME environment. Table 4.1 shows that a very small fraction of the initially hypothesized SHGC patches survive the verification tests. Table 4.2 shows that the local SHGC patch detection is among the most time consuming steps. This is due to the nature of the 154 correspondences sought and which require a local minimization for each point of each boundary of interest. The cross-section finding step is also time consuming for the image of figure 4.35. This is due to the high complexity of the background and the surface markings. scene boundaries hypothesiz ed cross- sections local SHGC patches parts hypotheses verified parts hypotheses fig. 4.12 209 7 73 70 3 fig. 4.35 1783 7 89 85 2 Table 4.1 Evolution o f the number o f hypotheses o f the SHGC module scene cross- section finding local SHGC patch detection grouping of local patches verification of parts fig. 4.12 0:25 1:32 0:10 0:31 fig. 4.35 4:40 4:18 0:07 1:00 Table 4.2 Run times o f the SHGC module (min:sec) Some of the methods could have been made more efficient by adopting a coarse to fine method, for example, in searching for the correspondences. We believe, however, that the most significant way is to implement the above methods on a multi-processor architecture. In each step, the hypothesize-verify processes consider one entity (local surface patch for example) at a time which is independent of the others. Thus, the method naturally leads itself to a parallel implementation. 155 Several parameters have been used in the implementation. Most of them are used to measure errors in the properties. Examples include the linearity of the axis of a non-linear SHGC, the linearity of the limbs of an LSHGC, the compatibility measures and the junction measures. All the parameters of the SHGC module have been constant for images shown in this chapter and also those which will be shown in chapter 6. However, the robustness of the system to changes in the parameter values has been tested. Experiments were conducted by changing key parameters by up to 50% of their default values. Those changes have only affected the size of the search space yielding larger spaces for looser parameter values, without affecting the final results of the system. 156 5 Segmentation and Description of PRCGCs and Circular PRGCs In this chapter, we describe the method used for solving the segmentation and shape recovery of PRCGCs and circular PRGCs from a singe real intensity image that includes imperfections such as noise, markings, shadows and occlusion. It uses the results of the curve and parallel symmetry levels in a hypothesize-verify process whose steps are the same as those of the SHGC system. Therefore, the discussion in this chapter will focus on the surface patch level of the system that detects the curved axis primitives. 5.1 The Surface Patch Level The surface patch of the module that handles PRCGCs and circular PRGCs has a hypothesize-verify organization similar to one used in the SHGC patch level (figure 5.1). Its steps include the detection of local curved-axis surface fragments, grouping of compatible ones and verification of parts hypotheses. The methods used in those 157 steps are based on the projective properties discussed in chapter 3. The different steps are discussed below. 
PRGC parts hypotheses t verify parts hypotheses | i „ group “local PRGC patches” I detect “local PRGC patches” | t boundaries and symmetries Figure 5.1 Block diagram o f the PRGC patch level. 5.1.1 Detection of Local Curved-Axis Surface Patches As was done for SHGCs, it is important to characterize local curved-axis surface fragments that could correspond to visible portions of a part. From the properties of section 3.1.2, two types of geometric primitives relate the boundaries of a curved- axis part: parallel symmetries and right ribbons. The former give exactly the co cross-sectional points and the orientation of the image of the 3-D axis of a PRCGC (invariant Property 3.6) while the latter give good approximates of the co-cross sectional points and the orientation of the image of the 3-D axis of a circular PRGC (quasi-invariant Properties 3.12 and 3.13). 158 First, one issue of interest is how to differentiate between PRCGCs and circular PRGCs; i.e. given a pair of image boundaries which admit both parallel symmetry and right ribbon correspondences, which (if any) of the two correspondences give co-cross-sectional points assuming that the boundaries project from a curved-axis part in the 3-D scene. This differentiation cannot, in general, be made on the basis of the size of the image correspondence segments. Property 3.9 indicates that the size of the cross-section segments is constant for circular PRCGCs. But in the case of a PRCGC with a non-circular cross-section it is easy to show that the cross-section segments do not have a constant size in general (see appendix C for an argument and for an analysis of special cases). One partial solution is to assume that the part has a constant sweep if its image has parallel symmetry with constant-size correspondence segments (the discussion above indicates that the reverse does not hold). The argument is that if the part is non-constant in 3-D then it is highly unlikely that it produces constant-size correspondences in the image. If the parallel symmetry correspondence segments do not have a constant size, then the decision as to whether parallel symmetry correspondences or right ribbon correspondences give the co-cross-sectional points of the image of the part is left until 3-D shape is recovered (discussed in chapter 6). For such cases, the image of the part will have a dual description consisting of parallel symmetry correspondences and right ribbon correspondences. This leads us to the following definition of a curved-axis local surface patch. 159 Definition 5.1: A curved-axis local surface patch is given by a pair of boundaries admitting either a parallel symmetry or a right ribbon. Figure 5.2 shows sample curved-axis local surface patches. The parallel symmetries necessary to detect curved-axis local surface patches are given by the symmetry level. The symmetries of interest include both linear and non-linear ones. However, right ribbons are not given and have to be detected in order to hypothesize circular PRGCs. In what follows, we discuss the detection of right ribbons. Detection of right ribbons has been initially addressed by [48] using the projection method mentioned in section 2.1.1. The method consists of discretizing the orientations of the axis into a small number of regularly spaced orientations and finding possible straight axis ribbon portions for each of these orientations. The resulting ribbons are linked to allow for smooth axis curvature. The method was applied to closed outlines obtained from range data. 
More recently, the projection right ribbons \ parallel symmetry correspondences Figure 5.2 Sample local curved-axis surface patches. 160 method also was used by [58] to detect right ribbons from edges detected in a real intensity image. The projection method has complexity 0(km ) where k is the number of discretized orientations and m the number of points. For scenes with a large number of points (or boundaries) this method can be slow. A possible way to speed up the projection method is to use the B-spline representation of the boundaries and cast rays from the B-spline extremities. This would also allow an analytic determination of the axis as a set of B-spline segments [76]. The complexity is still O(jon) but with m being the number of B-spline segments, a much smaller number than the number of points. This slight improvement may still produce slow run times especially if the desired accuracy requires k to be large. The problem with this method and the original projection method lies in the fact that no initial conditions are given for the search for the axis orientations. With a reasonable initial estimate of the axis orientation, a substantial speed up could be obtained. From the analysis in section 3.1.2 (and the related proofs in appendix A), parallel symmetry correspondences are exactly the right ribbon correspondence for a constant-size ribbon and start to offset from them as the derivative of the sweep increases. Thus, between a pair of boundaries, the method we have developed uses their parallel symmetry axes as an initial estimate for the right ribbon axes in a minimization process. Given two boundaries B i and B2, and the parallel symmetry 161 axis (if any) between them, the method consists of searching around the initial correspondences and/>2g2° in figure 5.3.a) for the “best” right ribbon segments between two successive points (p\ andp2). The details are as follows: a) form initial correspondences by intersecting the normal lines at the extremities of each B-spline segment of the parallel symmetry axis with both boundaries (figure 53.a). Select an arbitrary boundary, say B lt as a reference. Call such initial correspondences Pi q® parallel symmetry axis P * 1 symmetry Pi initial right ribbon correspondences a. local axis m Figure 5.3 Right ribbon detection, a. parallel symmetry axis as initial estimate o f right ribbon axis; b. finding right ribbon correspondences. b) find the “best” corresponding points, and q2, f°r the extremal points Pi and p 2. This is done by searching around the initial correspondences p i qi° and p 2 q2 , over an angular region of a selected range A0 and a step dQ (figure 5 A.a). At each step j of the search, the correspondences are the segments s ^ = p i q ^ and s j = p 2 q^. Let us call 162 local axis the quadratic B-spline segment (if any) defined by m x J and m j, the midpoints of and s^, and orientations f j7 and t2] which are the normals to and s-j (see figure 5.3.6). The “best” right ribbon segment mapping the points p i and p 2 is defined as the one which minimizes the following measure (see figure 5.3.6): E = X1E l + h l E2 (5.1) Y^disti^p1 a,P ^ /d ist (p., qt) where E^= - ------------------- -------------------- ; p la is an i'th point on the local axis; qi corresponding points (intersections of the line orthogonal to the axis at p la with the two boundaries) and p lm the midpoint of Pi qt and E2 = y / distim ^, m j); y being the angle between the segments S]7 and s j . 
Steps b) and c) are carried out up to the extremities of the parallel symmetry axis. An extension of the obtained right ribbon is then performed so as to match at least one boundary extremity (the parallel symmetry axis may not match any boundary extremity). The extension procedure uses the same minimization and uses, at each unmatched point, the previous correspondence segment as the initial orientation for the search of its right ribbon corresponding point.

This method has complexity O(l² + lm) when applied to an axis of parallel symmetry with m B-spline segments (smaller than the number of points), l being the number of steps of the local search, l = Δθ / dθ. It has proved much faster, even including the parallel symmetry detection time complexity, than the original projection method.

Figure 5.5.b shows some of the local curved-axis surface patches detected from the image of figure 5.5.a. The figure displays cross-section segments and axes (in bold lines). It can be seen that the method does not yield only local surface patches that project from actual scene objects (the total number of hypothesized surface patches is 43; only 8 are fragments of meaningful parts). The reason is the same as the one discussed for SHGCs and relates to the non-sufficiency of the properties and the use of error thresholds.

Figure 5.5 Resulting local surface patches. a. original intensity and boundary images; b. detected local surface patches.

5.1.2 Grouping of Local Curved-Axis Surface Patches

The grouping step uses a combination of geometric and structural compatibility constraints. We first discuss the structural compatibility constraints, then the geometric ones and the selection of the grouping hypothesis at the end of each surface patch.

The structural compatibility constraints impose that the structural arrangements between the local surface patches be consistent with the inner-surface Property 3.16 of section 3.2. In this case too, continuous connections are distinguished from discontinuous ones. The area overlap constraints and the T-junction and cusp constraints discussed in section 4.3.2 for SHGCs are exactly the same in this case (figure 5.6). When the patches' areas do not overlap, an additional constraint is used that forces the boundaries of the patches to admit an extrapolating conic between their extremities (to avoid invisible inflections; see figure 5.6.c).

Figure 5.6 Structural constraints between local curved-axis surface patches. a. non-overlapping patches; b. self-occluding patches (area overlap, self-occlusion T-junction); c. constraint on boundary continuity (interpolating conic in the regular case; inflection in the irregular case).

Due to the unknown curvature of their axes, curved-axis primitives do not enjoy as strong geometric compatibility constraints as SHGCs do. The geometric compatibility constraints used consist of the continuity of the local surface descriptions of two neighboring local surface patches. Two structurally compatible local curved-axis surface patches S1 and S2 are geometrically compatible if the
The geometric compatibility constraints used consist of the continuity of the local surface descriptions of two neighboring local surface patches. Two structurally compatible local curved-axis surface patches S\ and S2 are geometrically compatible if the 166 I a. non-overlapping patches area overlap T-junction (self-occlusion) b. self-occluding patches interpolating conic regular case inflection irregular case c. constraint on boundary continuity Figure 5.6 Structural constraints between local curved-axis surface patches. relative difference w’12 (i*1 figure 5.7) between the sizes of their extremal cross- section segments and the relative distance g 12 between their extremities are smaller than fixed thresholds. 167 ■ * m *n = \r2 - r i \ / d n Si\ ___ Mj/rj §12 - dl2 I (h + ^12 + h) Figure 5.7 Geometric constraints between local curved-axis surface patches. At the end of a local surface patch the selected compatible local surface patch S2 is the one such that the following measure is minimized between both patches among all neighboring structurally compatible ones i.e. M yi < M ij\/j j 2 andM 12 <M t2 V/ / ^ 1; where a ^ is the angle between the extremal cross-section segments as shown in figure 5.7. Minimizing corresponds to favoring small values of the local shape measures (e, r). In this sense, the relative radius change m’i2 becomes the image counterpart of the 3-D sweep derivative r of the analysis of section 3.1.2 and the angle a 12 becomes the normalized (for scale invariance) counterpart of the 3-D axis curvature k . This geometric constraint is similar in spirit to the one used in the ribbon- based method of [58] described in section 2.1.2 and which imposes similarity of the m 12 ~ £ 1 2 al2 m'l2 168 cross-section segments sizes and orientations. Ours uses a compact mathematical measure that is minimized. The grouping hypotheses for the local surface patches of figure 5.5.b are shown in figure 5.8 where compatible patches are labeled by the same number. The deferral of discontinuous connections is handled in the same way as discussed for SHGCs in section 4.3.2. Figure 5.8 Examples o f grouping hypotheses from the surface patches o f figure 5.5.b. 5.1.3 Verification of Curved-Axis Parts Hypotheses Aggregates of local curved-axis surface patches constitute parts hypotheses that are to be verified for global consistency in a two-step process: the intra-part filtering and the inter-part filtering. The intra-part filtering step consists of verifying that a part hypothesis is closed with one of the closure patterns of section 3.2 (for curved-axis primitives). 169 For the same reasons as those discussed in section 4.3.3, boundary fragmentation may cause closure not to be observed at a part’s end. Thus, a cyclic process “closure verification / completion” is also used. The closure verification uses the same method and junction measures discussed in section 4.3.3; i.e. junctions and boundary links between them are searched. However, the junction pairings are different since they should be based on those which correspond to the structural projective properties of the closure of PRCGCs and circular PRGCs (figures 3.22, 3.23, 3.25 and 3.26). The part completion process consists of finding boundaries in the vicinity of a non-closed part’s end which can be used to consistently (with respect to its geometry and structure) extend the part’s surface in the hope of finding valid closure patterns at the end of the extended part. 
5.1.3 Verification of Curved-Axis Parts Hypotheses

Aggregates of local curved-axis surface patches constitute parts hypotheses that are to be verified for global consistency in a two-step process: intra-part filtering and inter-part filtering.

The intra-part filtering step consists of verifying that a part hypothesis is closed with one of the closure patterns of section 3.2 (for curved-axis primitives). For the same reasons as those discussed in section 4.3.3, boundary fragmentation may cause closure not to be observed at a part's end. Thus, a cyclic "closure verification / completion" process is also used. The closure verification uses the same method and junction measures discussed in section 4.3.3; i.e. junctions and boundary links between them are searched. However, the junction pairings are different, since they should be based on those which correspond to the structural projective properties of the closure of PRCGCs and circular PRGCs (figures 3.22, 3.23, 3.25 and 3.26). The part completion process consists of finding boundaries in the vicinity of a non-closed part's end which can be used to consistently (with respect to its geometry and structure) extend the part's surface, in the hope of finding valid closure patterns at the end of the extended part.

Among the boundaries in the vicinity of a non-closed part's end, the method first finds pairs which admit ribbon descriptions that are geometrically and structurally compatible with the part (see figure 5.9.a). Only boundary pairings that did not form local surface patches are considered (otherwise they would have already been considered at the grouping step); i.e. those boundary pairs do not admit parallel symmetry. For such pairs, the method attempts to find right-ribbon correspondences using the right-ribbon extension step of the method described in section 5.1.1. If a pair of boundaries is found which admits a right-ribbon description that is both geometrically and structurally compatible with the current part, then the resulting local surface patch is appended to the part's description. If no such boundary pair is found, then the method chooses the closest boundary (if any) which preserves the structural consistency of the part (figure 5.9.b). This constrained completion provides global (shape) criteria for grouping boundaries that did not get grouped at the curve level using local criteria.

Figure 5.9 Curved-axis part completion. a. using pairs of new boundaries; b. using a single boundary.

If no such completion is possible and no closure is found, then the deferred discontinuous connections are examined to check whether any compatible part hypothesis can be grouped with the current one such that the resulting aggregate part is closed (in the same way as discussed for SHGCs in section 4.3.3). Part hypotheses for which no valid closure is found at either end are rejected.

The inter-part filtering is the same as the one discussed for SHGCs (it is also applied between both types of primitives). It consists of filtering out part hypotheses which are in conflict with 3-tgt-j closed parts (stronger hypotheses), and removing parts which are completely included in the surface of other parts. The details are given in section 4.3.3.

The results of the verification step on the parts hypotheses of figure 5.5.b are shown in figure 5.10. The figure shows the parts with their boundaries only and shows the closures (cross-sections) found for each part. The gaps in the parts are due to missing boundaries, either in low-contrast regions (e.g. the long straw) or in occlusion regions (e.g. the truncated torus occluded by the long straw). Additional results of the method on other images are shown in figures 5.11 and 5.12.

Figure 5.10 Verified part hypotheses from the surface patches of figure 5.5.b.

Figure 5.11 Additional example of resulting object hypotheses for a different scene of similar objects. a. initial intensity image; b. edge image; c. detected local surface patches; d. verified parts.

Figure 5.12 Additional example of results of the segmentation method. a. initial intensity image; b. edge image; c. some detected local surface patches; d. verified parts.

5.2 3-D Shape Inference

At this stage, each PRGC has a projective description in terms of its cross-sections (if visible) and surface correspondences. This description can be used to recover the 3-D shape of a part.
Ulupinar and Nevatia [70] have addressed the 3-D recovery of PRCGCs in terms of surface orientations along visible portions of the surface. Recovery of the 3-D shape of circular PRGCs has not been previously addressed in the literature.

The method we use is inspired by the method of [70] and directly recovers the intrinsic description of PRCGCs and circular PRGCs, namely their 3-D cross-sections, their 3-D axes and their scaling functions. The method is described in detail in chapter 6, where we also discuss the constraints set on the 3-D shape of the joined parts of a compound object. The results of the method on the objects of figures 5.10, 5.11 and 5.12 are shown in figure 5.13. This figure displays the recovered primitives in terms of their cross-sections, meridians and 3-D axes for different 3-D orientations.

Figure 5.13 Recovered 3-D primitives shown for their original poses and different ones. a. from the detected objects of figure 5.10; b. from the detected objects of figure 5.11; c. from the detected objects of figure 5.12.

5.3 Discussion

5.3.1 Extensions and Limitations

The method has been applied to smooth convex cross-section parts only. We believe it can be extended to handle polygonal and concave cross-sections, and cross-sections with sharp edges, which produce several limbs (meridians) in the image that can even occlude one another (see figure 5.14). As for SHGCs, the issue here is the merging of parts hypotheses, since several surface patches would be produced by such cross-sections. The merging constraints also include geometric and structural tests. The geometric tests should impose the transitivity of the co-cross-sectional point correspondences, such that any surface patch is consistent with the global part description. The structural tests are similar to those for SHGC merging and impose that all patches be closed by the same cross-section. This step is currently not implemented in our system.

Figure 5.14 A PRGC part with multiple surfaces (multiple surface patches, to be merged).

This method has some similarities with the methods of Rao and Nevatia [58] and Mohan and Nevatia [45] discussed in section 2.1.2. For example, both of these methods also use a hypothesize-verify approach. The method of [58] is closer to ours in that it also uses grouping of ribbons. The major difference lies in that our method explicitly uses the 3-dimensionality of the objects by exploiting rigorous projective geometric and structural properties. The other methods use criteria that apply more to 2-dimensional objects and thus do not exploit the 3-dimensionality of the viewed objects.

As an example, consider the left-most object in figure 5.12. In the method of [58], for example, two ribbons would be obtained: one for the elongated surface and one for the larger cross-section. These two ribbons would not be identified as belonging to the same object, due to the "non-overlapping" constraint imposed, which forbids joined ribbons to have overlapping areas; the elongated surface and the cross-section of the above part do have overlapping areas. Further, the closure constraints they use impose that the sequence of boundaries joining the extremities of a part's end (closing boundaries) lie strictly between those extremities ("in-between-ness" constraint). Since this is not the case for the elongated surface, it will be rejected by their method, whereas it should not be.
Our method, based on the analysis of the structural closure patterns of the projection of 3-D bodies, imposes more rigorous constraints on the closure of parts hypotheses. In the case of the above object, the larger end has been identified as having a cross-section facing away from the camera with a cusp-cusp closure (closure pattern of figure 3.26.e). Thus, our method not only accepts more complex structural arrangements but also provides more adequate segmentation (into surface and cross-section) and useful shape information, such as where the cross-section is pointing, which is exploited in the 3-D shape inference step discussed in chapter 6.

However, there are cases where our method would fail where the method of [58] works. This is the case of (flat) complex objects where parts are connected to other parts with no visible cross-section, such as in the drawing of figure 5.15. The parts will be rejected by our method because, at their joined ends, they do not have any of the valid structural closure patterns. A combination of our method with the method of [58] would produce a more general system that handles both 3-D and 2-D objects.

Figure 5.15 A line drawing where our method would fail.

5.3.2 Performance

This method has been applied to several complex images of curved-axis parts, three of which were shown in this chapter (figures 5.5, 5.11 and 5.12); two additional ones will be shown in chapter 6, where we address compound objects. It has produced satisfactory results on those images.

An analysis of the time complexity of the algorithms described is given in appendix E, where we also discuss some mechanisms used in the implementation to reduce the time complexity. In this section, the performance of the method on the three images shown in this chapter is given as the evolution of the number of hypotheses throughout the hierarchy and as the corresponding run times.

Table 5.1 shows that, originally, many hypotheses are generated but very few of them survive the verification tests. Notice also that only a small fraction of the initial parallel symmetries produce local PRGC patches. Table 5.2 shows that the most time-consuming steps are the parallel symmetry detection and the verification of parts hypotheses. This is because at the parallel symmetry detection stage no notion of parts is yet available and the search involves many unrelated boundaries, while the part verification searches for the complex closure patterns of PRGCs (a superset of those of SHGCs). The complexity of the search is also proportional to the complexity of the image.

scene       boundaries   parallel symmetries   local PRGC patches   parts hypotheses   verified parts hypotheses
fig. 5.5    424          1237                  43                   32                 3
fig. 5.11   831          1634                  48                   41                 2
fig. 5.12   1661         7567                  132                  110                2

Table 5.1 Evolution of the number of hypotheses of the PRGC module

scene       parallel symmetry   local PRGC patch detection   grouping of local patches   verification of parts
fig. 5.5    0:30                0:33                         0:07                        0:39
fig. 5.11   0:46                0:29                         0:09                        1:00
fig. 5.12   9:05                2:39                         0:30                        5:51

Table 5.2 Run times of the PRGC module (min:sec)

A number of parameters have been used in the PRGC module. Their purpose, as with those of the SHGC module, is to measure errors in the projective properties. Examples include the right-ribbon measures, the compatibility measures and the junction measures. For all the images shown in this chapter these parameters have been constant. However, the performance of the system has been tested by changing the values of key parameters by up to 50% of their default values. Those changes have not affected the final results of the system on the images shown. The only effect was a larger search space, and longer run times, for looser parameter values.
Detection and 3-D Inference of Compound Objects

The object level of the hierarchy uses the parts hypotheses produced by the part level for further segmentation and shape inference of compound objects consisting of arrangements of one or more parts. Parts which show evidence of belonging to the same compound object are grouped into higher-level symbolic entities which include the component parts and their relationships, resulting in a graphical representation of each object. This level not only builds these representations but also exploits the interaction between an object's parts to infer (and complete) their 3-D shapes. It is organized in two main steps: the object detection step and the 3-D shape inference step (figure 6.1). The object detection step consists of analyzing the relationships between the hypothesized parts in order to find potential joints between them, and the shape inference step uses the detected joints to infer useful (3-D) shape information, including missing shape elements that could not be obtained at the part level. Before describing these two steps, we start by giving the properties of compound objects that are relevant to their analysis.

Figure 6.1 Organization of the object level (parts hypotheses, object detection, 3-D shape inference, segmented volumetric descriptions).

6.1 Properties of Compound Objects

As mentioned in section 1.2.1, the compound objects addressed in this thesis are those which consist of two types of joints between parts: end-to-end and end-to-body. Figure 6.2 illustrates these arrangements. The end-to-end joint corresponds to two parts in contact at their ends, which are not necessarily of the same size or shape. The end-to-body joint corresponds to one part's end in contact with another part's body, with no constraint on their relative shapes and sizes other than the joined part's end being smaller than the supporting part's body. Although not all objects in our environment have these two types of joints, we believe them to be the most frequent ones. Other types of joints would include more complex contacts between parts, such as one part's end in contact with another part's end and body at the same time. Such joints are less common than those we address.

Figure 6.2 The two types of joints addressed: end-to-end and end-to-body.

The properties of the joint relationships are mostly structural; i.e. they consist of junction relationships (the geometry is captured mostly by the parts). To find the different junctions that can be observed, we can make an analysis similar to the one done for part closure in section 3.2. The junctions depend on both the viewing direction (or pose of the object) and the shape of the parts. We discuss each type of joint separately. Effects of occlusion by other bodies and of contour breaks are discussed in section 6.2.

6.1.1 End-to-End Joints

There are two cases for an end-to-end joint: the two parts have the same size at their contact (figure 6.3.a through c) or have different sizes (figure 6.3.d and e). For any viewing direction (except the specific one which lies in a part's cross-section plane), at the joined ends, one part will have its cross-section facing towards the camera and the other will have its cross-section facing away from the camera.
Therefore, except when occluded, a part's end should have one of the junction combinations discussed in section 3.2. An additional analysis is needed for the observed junctions at occlusion regions between the joined parts. In the analysis below, we use the L-j L-j combination to represent any of the closure patterns that result from the cross-section facing away from the camera, and the 3-tgt-j 3-tgt-j combination to represent any of the closure patterns that result from the cross-section facing towards the camera. Both combinations can be replaced by any of the set of combinations they represent. To analyze the effects of joints, we separate the cases of equal-size ends joints and different-size ends joints. In each case, we first consider the case where the intersection curve between the parts (the joint curve) is visible, then the case where it is not visible.

Equal-Size Ends

Case 1. The joint curve is visible.
In this case, the joint curve is the common cross-section. Therefore, the part with its cross-section facing away from the camera will have a regular L-j L-j closure (top part in figures 6.3.a and b). The part with its cross-section facing towards the camera cannot have 3-tgt-j 3-tgt-j closure, because the other part partially occludes its cross-section. Only the portion of its cross-section that coincides with the visible joint curve will be observed. The observed junctions of this part depend on whether its junction points are close to the other part's or not.

In case they are, this has the effect of preventing the upper branch of the 3-tgt-j from being visible, resulting in a sort of "two-tangent junction" (henceforth 2-tgt-j), as can be seen for the bottom part of figure 6.3.a. The 2-tgt-j is a member of the part closure catalog and is used for finding part closure. In the case where the junction points are relatively distant (such as due to a flaring of the part with its cross-section facing towards the camera), the part with its cross-section facing away will partially occlude the other part's closure, causing this latter part to have T-j closures (bottom part of figure 6.3.b).

Case 2. The joint curve is not visible.
The only case where the joint curve is not visible is when the part with its cross-section facing away from the viewer occludes the other part (figure 6.3.c). This happens, for example, when the occluding part has a flaring. This case corresponds to the case of figure 6.3.b seen "from behind."

Different-Size Ends

Case 1. The joint curve is visible.
In this case, both parts have regular closures; i.e. the one facing away has L-j L-j closure and the one facing towards the camera has 3-tgt-j 3-tgt-j closure. However, unless the former part is too small compared to the cross-section of the latter part, T-js are observed between the side boundaries of the smaller-end part and the cross-section boundaries of the larger-end part. See figure 6.3.d.

Case 2. The joint curve is not visible.
This occurs when the larger part's cross-section occludes the smaller part, causing the former to have L-j L-j closure and the latter to have T-j T-j closure. See figure 6.3.e.

Figure 6.3 Structural relationships for end-to-end joints. Equal-size ends (a through c) and different-size ends (d and e).
6.1.2 End-to-Body Joints

In the discussion below, we will call joined part the part joined at its end and supporting part the one joined at its body. As before, we consider two cases depending on the visibility of the joint curve.

Case 1. The joint curve is visible.
In this case, the joint curve constitutes the joined part's closure. The latter part cannot have its cross-section facing towards the camera. This is because the surface of the supporting part faces the viewer (otherwise the surface and the joint curve would not be visible). Thus, the joined part will have L-j L-j closure (as before, any of the closure patterns corresponding to the cross-section facing away from the camera can be observed instead). Further, unless the joined part is too small, it will partially occlude the supporting part's side boundaries, causing T-js where the tops of the Ts are the joined part's side boundaries. This case is illustrated in figure 6.4.a.

Case 2. The joint curve is not visible.
In this case, the supporting part occludes the joined part, resulting in T-j closures for the latter, where the stems of the Ts are the joined part's side boundaries. This case is illustrated in figure 6.4.b.

Figure 6.4 Structural relationships for end-to-body joints. a. visible joint curve; b. non-visible joint curve.

6.2 Object Detection

At this step of the object level, interactions among parts hypotheses are analyzed in order to find joints and form compound object descriptions. This step is organized in two consecutive stages: detection of joints and analysis of ambiguities.

6.2.1 Detection of Joints

The joint detection method consists of finding, for each part, the structural relationships of figures 6.3 and 6.4 with other parts. We give the detection method as a set of rules (a schematic encoding of these rules is given at the end of this subsection). We use the term "away closure" to denote a closure pattern corresponding to a cross-section pointing away from the camera and "towards closure" to denote a closure pattern corresponding to a cross-section pointing towards the camera.

• if two parts share closing boundaries such that one has an "away closure" and the other a "2-tgt-j closure," with "similar" sizes at their ends, then an end-to-end joint with equal-size ends is hypothesized (pattern of figure 6.3.a)

• if a part has T-j closures with another part's (the occluding part's) side boundaries such that the T-junctions are on opposite sides of the occluding part, and the occluding part has "away closure" with "similar" sizes at the point of contact between the two parts, then an end-to-end joint with equal-size ends is hypothesized (pattern of figure 6.3.b)

• if a part (occluding part) has "away closure" and its side boundaries partially occlude another part's (the occluded part's) cross-section boundaries, such that the occluded part has "towards closure" and the closure boundaries of the occluding part lie completely within the region delimited by the occluded part's cross-section, then an end-to-end joint with different-size ends is hypothesized (pattern of figure 6.3.d)
• if a part (occluded part) has T-j closures such that the tops of the Ts are another part's (the occluding part's) cross-section boundaries, and the occluding part has "away closure" with a larger-size end than the occluded part's, then an end-to-end joint with different-size ends is hypothesized (pattern of figure 6.3.e)

• if a part (occluding part) has "away closure" and has T-junctions with another part's (the occluded part's) side boundaries such that the stems of the Ts lie on the same side of the occluded part, and the closing boundaries of the occluding part lie inside the surface of the occluded part, then an end-to-body joint is hypothesized (pattern of figure 6.4.a)

• if a part (occluded part) has T-j closures such that the tops of the Ts are another part's (the occluding part's) side boundaries lying on the same side of the occluded part, then a joint of type end-to-body is hypothesized (pattern of figure 6.4.b)

Notice that the structural patterns of figure 6.3.c and figure 6.3.e are similar. Thus, the above algorithm will only hypothesize the one of figure 6.3.e; i.e. it does not mark the joint as "same-size ends," due to the lack of evidence. This is not critical, though, since the main decision here is the end-to-end interaction.

The conditions on the parts' closures above allow for partial occlusion at one side, at most, of a part's end. For example, the case of figure 6.5.a is accepted as a joint, whereas the one of figure 6.5.b is not. The same junction measures discussed in section 4.3.3 are used to account for boundary breaks.

Figure 6.5 Joint detection allows for partial occlusion. a. a joint is marked between parts p1 and p2; b. no joint is marked.

Figure 6.6 shows an example of the application of this method. Figure 6.6.a shows the edge image (from the intensity image of figure 1.1), figure 6.6.b shows the detected (verified) parts and figure 6.6.c shows the hypothesized joints by displaying the T-junction points as large dots and the junction curves in bold lines. Four joints are found: one end-to-end joint, with equal-size ends and visible joint curve, between the lower cone (the pot) and the upper SHGC (the lid); one end-to-body joint with visible joint curve between the right-most circular PRGC (the spout) and the pot; and two end-to-body joints with non-visible joint curve between the left-most PRCGC (the handle) and the pot. The joint labels can be seen in figure 6.11.

Figure 6.6 Example of detected joints. a. intensity image; b. detected parts; c. detected joints (T-junction points are displayed with large dots and junction curves in bold lines).
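The closure-pattern side of the rules above can be summarized schematically (Python). This is a deliberately simplified illustration: the geometric side conditions (shared boundaries, containment and junction measures) are elided, the part-end descriptor is hypothetical, and the 15% size tolerance is an assumption, not a value from the text.

from dataclasses import dataclass

@dataclass
class PartEnd:
    closure: str        # "away", "towards", "2-tgt-j" or "T-j"
    size: float
    t_config: str = ""  # "opposite-sides" or "same-side" (for T-js)

def similar(a, b, tol=0.15):
    # "Similar" sizes up to a fixed tolerance (illustrative value).
    return abs(a - b) <= tol * max(a, b)

def classify_joint(e1, e2):
    # e1 plays the role of the part with "away closure"; each branch
    # corresponds to one structural pattern of figures 6.3 and 6.4.
    if e1.closure != "away":
        return None
    if e2.closure == "2-tgt-j" and similar(e1.size, e2.size):
        return "end-to-end, equal-size ends (fig. 6.3.a)"
    if e2.closure == "T-j" and e2.t_config == "opposite-sides" \
            and similar(e1.size, e2.size):
        return "end-to-end, equal-size ends (fig. 6.3.b)"
    if e2.closure == "towards" and e1.size < e2.size:
        return "end-to-end, different-size ends (fig. 6.3.d)"
    if e2.closure == "T-j" and e1.size > e2.size:
        return "end-to-end, different-size ends (fig. 6.3.e)"
    if e2.closure == "T-j" and e2.t_config == "same-side":
        return "end-to-body (fig. 6.4.a or b)"
    return None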
6.2.2 Analysis of Ambiguities

Parts hypotheses made at the part level, although satisfying strong verification tests, are not always correct hypotheses. They only satisfy the properties of a part as an individual entity, not as part of a compound object. If objects are assumed to be made up of a single part, then each part hypothesis is likely to correspond to a meaningful object. For multi-part compound objects, ambiguities may occur. They correspond to situations where the same boundaries belong to different parts hypotheses (having different interpretations). The reason such ambiguities occur is that different objects (or parts arrangements) may produce similar properties in the image (figure 6.7).

Figure 6.7 Ambiguities occur due to similar projective properties of different 3-D bodies.

Given the nature of the projective properties used, ambiguities can be of a geometric nature or of a structural one. For example, from the geometric projective properties of section 3.1, parallel symmetry is a property of both SHGC cross-sections (Property 3.1) and PRCGC limbs (Property 3.6). Consequently, it is possible to find situations where parallel symmetric boundaries for which a valid closure is found can be equally interpreted as either an SHGC or a PRCGC (see figure 6.8). Similarly, it can be shown that any skew symmetric boundaries satisfy Property 3.3 of SHGCs. Thus, a part with skew symmetric cross-sections can introduce similar ambiguities.

Figure 6.8 An example of geometric ambiguity. a. parallel symmetric boundaries: a PRCGC or an SHGC?; b. PRCGC interpretation dominates; c. SHGC interpretation dominates.

Structural ambiguities occur due to the similarity between the structural properties of a part and those of a joint. Most of the structural ambiguities are solved by the way discontinuous connections are handled and in the inter-part filtering (examples were given in figures 4.23 and 4.32). Figure 6.9 shows another example of an ambiguity between a 3-tgt-j 3-tgt-j closure and a joint of type end-to-end. The two different interpretations are two parts joined end-to-end with different-size ends (pattern of figure 6.3.d), or three parts joined end-to-end with same-size ends (patterns of figures 6.3.a and b).

Figure 6.9 Structural ambiguities may result in different interpretations. a. original boundaries; b. side views of two possible interpretations (one end-to-end joint versus two end-to-end joints between a cylinder, a dome and a cone).

Certain ambiguities can be resolved by the joint relationships between hypothesized parts. Joints with visible joint curves can be seen as highly regular structures. They involve not only regular (verified) parts but also specific structural relationships between them. For such relationships to be observed in the image without the parts actually being joined in the scene requires, first, the existence of a viewing direction that would accidentally produce these relationships, and then that the scene be viewed from that specific viewing direction. Thus, the joints can be thought of as non-accidental relationships whose presence in the image suggests a certain labeling of boundaries. The idea is that joint curves should be interpreted as part ends.

Therefore, as shown in figure 6.10, the joint boundaries are unlikely to have side-boundary labeling. Thus, parts hypotheses where those boundaries are used as limbs are inconsistent with the observed joints. For example, while the single part of figure 6.8.a, taken by itself, could be interpreted as either an SHGC or a PRCGC, when considered in the joint of figure 6.8.b it can only be interpreted as a PRCGC part, and when considered in the joint of figure 6.8.c as an SHGC part.

Figure 6.10 Visible joint boundaries suggest "part-end" boundary labeling.

If, after application of this filtering step, ambiguities still remain, different object interpretations are generated.
Each such object interpretation is represented as an attributed graph whose nodes are the parts and whose arcs are the joints between the parts. The arcs are partially labeled by the type of joints they represent (a sketch of this representation is given below). This graph is only a representation of the detected objects. Its purpose is not the same as in the method of [58], where the "scene graph" was used to achieve segmentation of objects made up of ribbons. The resulting compound object (its graphical representation) for the image of figure 6.6 is shown in figure 6.11. The joint labels are shown in the lower left portion of the figure. Figure 6.12 shows additional results of the method on another scene with two compound objects (mugs), with more markings within the objects and in the background. Although multiple interpretations are a feature of our approach, in the examples given here only one interpretation is found.

Figure 6.11 Graphical representation of the compound object detected from figure 6.6.

Figure 6.12 Additional results of the compound object segmentation method. a. intensity and edge images; b. detected compound objects (2 objects with 2 parts each).
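A minimal sketch of such an attributed-graph representation follows (Python), assuming simple containers; the field names and labels are hypothetical. The traversal method anticipates the use made of "visible joint curve" arcs in section 6.3.3.1.

from dataclasses import dataclass, field

@dataclass
class Joint:
    kind: str             # e.g. "end-to-end / equal-size / visible-curve"
    curve: object = None  # the joint curve, when visible

@dataclass
class ObjectGraph:
    # One interpretation: nodes are verified parts, arcs are joints.
    parts: list = field(default_factory=list)
    joints: dict = field(default_factory=dict)   # (i, j) -> Joint

    def add_joint(self, i, j, kind, curve=None):
        self.joints[(i, j)] = Joint(kind, curve)

    def common_cut_plane_set(self, start):
        # Collect the part indices reachable through arcs labeled
        # "visible-curve"; these parts' joined ends must share a
        # cut-plane orientation (section 6.3.3.1).
        seen, stack = set(), [start]
        while stack:
            i = stack.pop()
            if i in seen:
                continue
            seen.add(i)
            for (a, b), joint in self.joints.items():
                if "visible-curve" in joint.kind and i in (a, b):
                    stack.append(b if i == a else a)
        return seen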
6.3 3-D Shape Inference

Inferring the 3-D shape of an object requires finding the 3-D intrinsic GC description of each of its parts; i.e. its 3-D cross-section, its 3-D axis and its scaling function. This step aims at inferring such 3-D descriptions for each interpretation produced by the previous step. To infer the 3-D shape of a part, a minimum amount of information needs to be available in its projective description. This includes its body (co-cross-sectional point correspondences) and its cross-section. That both of these entities need to be available is illustrated in figure 6.13, where different examples are given of the same boundaries giving rise to different 3-D shape perception when associated with cross-sections of different shapes or orientations.

Figure 6.13 Both the body and the cross-section are needed to infer 3-D shape. a. one cross-section needs to be visible for an SHGC; b. both cross-sections need to be visible for a curved-axis part.

A complete method for such 3-D recovery would use the information given by each part (its projective description built so far) and by the joints between parts. This is because, in general, the problem is under-constrained and needs the use of additional assumptions (preference criteria). Those assumptions should be consistent with the observed interactions between the surfaces of a part and between the parts of an object. In fact, joint interactions between parts may even provide missing information about a part's description, such as its cross-section, and make its 3-D recovery possible. The 3-D shape inference method we use exploits both types of interactions. First, in the section below, we discuss the classification of parts, a useful step in inferring 3-D shape. In section 6.3.2 we discuss the 3-D shape inference of a part as though it were isolated. In section 6.3.3, we discuss the use of joints in the 3-D inference. In section 6.3.4, issues relevant to the analysis of the resulting 3-D shape descriptions are discussed.

6.3.1 Classification of Parts

This step uses the parts' descriptions to infer useful information about their cuts (properties of their end cross-sections). Such information includes whether a visible part's cut is likely to be cross-sectional¹ or planar. This classification helps decide whether any of the 3-D recovery methods discussed in section 6.3.2 is applicable to a part. Parts with degenerate, non-planar cuts cannot be used to infer 3-D shape.

The cut classification can be inferred from the observed image properties of a part as follows (the properties are used to make the inverse inference):

• if the part is an SHGC with linearly parallel symmetric ends, then the cuts at both ends are assumed to be cross-sectional (implicitly planar). This uses Property 3.1.

• if the part is an LSHGC with line-convergent² symmetric ends, then the cuts are assumed to be planar (not both cross-sectional)

• if the part is an SHGC with neither of the above symmetries, then if it has a full cross-section it is assumed to be a cross-sectional cut (for example, when the bottom end is completely occluded)

• if the part is a PRGC with, at some end, an extremal 2-D axis tangent orthogonal to the major axis of the elliptic cross-section, then the cut is assumed to be cross-sectional (implicitly planar). This uses Property C.1 of appendix C.3

• if the part is a PRGC with, at some end, an elliptic cross-section not orthogonal to the extremal 2-D axis tangent, then the cut is assumed to be planar but not cross-sectional

¹ A part may not be cut along its cross-section (for example, an oblique cut of a cylinder).
² Line-convergence is a form of symmetry whereby tangent lines at symmetric points of two curves intersect along a line [72]. It can be thought of as the generalization of point incidence to line incidence.

This classification can even help select the appropriate 3-D recovery method. For example, in [72], LSHGCs with non-parallel but planar cuts (producing line-convergent symmetry) are recovered with a different method than those with planar and parallel cuts.

6.3.2 3-D Inference of Parts

The image descriptions of parts with both surface and cross-section(s) can be used to recover their intrinsic descriptions. We discuss the 3-D inference methods used to recover SHGCs and PRGCs, in this order.

6.3.2.1 Three-Dimensional Inference of SHGCs

The 3-D SHGC inference method we use is based on the method of [69], which was described in section 2.3.2. That method results in a viewer-centered surface description consisting of the orientations at sampled points on the surface. Our method extends it to obtain an object-centered description in terms of the intrinsic elements of an SHGC: its cross-section, its axis and its scaling function, all in 3-D. This provides rich symbolic descriptions.

The method assumes that we have a right SHGC (i.e. α = π/2 in equation 3.1). This equation becomes:

S(t, s) = (r(s) u(t), r(s) v(t), s)   (6.1)

Without loss of generality, we assume that s = 0 for the "top" cross-section and that r(0) = 1 (the scaling is relative to the top cross-section). Note that our 2-D description already provides the values r(si), because scaling is an orthographic invariant. However, it does not provide the values si necessary for a complete description of the scaling function.

Figure 6.14 gives the configuration of the coordinate systems used. We denote by S = (OS, u, v, s) the SHGC coordinate system, by W = (OW, x, y, z) the viewer coordinate system and by I = (OW, x, y) the image coordinate system. Without loss of generality, we assume that the viewing direction lies in the u-s plane of the SHGC coordinate system and makes an angle σ with the s-axis (the SHGC axis). Consider S' = (OS, u', v', s'), obtained by rotating S around v by σ so that the new s-axis s' is aligned with the viewing direction. Let θ be the angle between the projection of the SHGC axis and the x-axis.

Figure 6.14 SHGC representation and projection geometry.
Consider W' = (OW, x', y', z'), obtained by rotating W by θ around z so that the new x-axis, x', is parallel to the projection of the SHGC axis. Let I' = (OW, x', y') be obtained by rotating I by θ. Then, in I' the SHGC axis is horizontal, and S' and W' differ only by a translation OS - OW. The relationships between coordinates in S, S', W' and I' are as follows (we henceforth omit the arguments t and s in the expressions): letting (xs, ys, zs)^t be the coordinates of OS in W', a point P with coordinates (ru, rv, s)^t in S can be written with the coordinates of equations 6.2 and 6.3 in S' and W' respectively, and its projection p can be written in I' with the coordinates of equation 6.4:

P = (cos σ ru + s sin σ, rv, -sin σ ru + s cos σ)^t in S'   (6.2)
P = (cos σ ru + s sin σ + xs, rv + ys, -sin σ ru + s cos σ + zs)^t in W'   (6.3)
p = (cos σ ru + s sin σ + xs, rv + ys)^t in I'   (6.4)

In the remainder of this analysis, we consider image measurements in I'. World coordinates will be expressed directly in W'. To recover a complete 3-D description of the SHGC, it is necessary to determine the 3-D top cross-section, the orientation of the axis, the coordinates (xs, ys, zs)^t of its origin (the point where the axis pierces the cross-section plane, not necessarily at its center) and the values si (i = 1...n) (call them s-values) of the cross-sections of interest.

First, the 3-D "top" cross-section can be recovered as a set of points (xi, yi, zi)^t on a plane whose normal N is estimated by the method of [69]. The image cross-section points (xi, yi)^t are already given by the 2-D description, and we only have to recover the third coordinate zi of each point. Writing that the cross-section points belong to a plane orthogonal to N yields the equations Nx xi + Ny yi + Nz zi + D = 0, where D gives the distance from the origin to that plane (also the depth of the cross-section). We can arbitrarily fix D to be zero (the absolute depth cannot be recovered from a single image) and we obtain zi = -(Nx xi + Ny yi) / Nz.

The viewing angle σ can also be easily recovered. Since the direction of the axis of the SHGC is given by -N, we have cos σ = -Nz and σ = cos⁻¹(-Nz).

The recovery of the scaling function and the axis depends on whether the part is an LSHGC or a non-linear SHGC. We discuss the two cases separately.

1) Recovery of LSHGCs

Using equation 6.1 with a linear form for r(s), an LSHGC can be parameterized as follows:

S(t, s) = ((as + 1) u, (as + 1) v, s)   (6.5)

where a is a constant (the slope of the sweep), u = u(t) and v = v(t).

In the case of a cylinder, a = 0. The direction of the sweep is given by the normal to the cross-section plane, and the size of the cross-section remains constant. It suffices to recover the s-value (the 3-D distance from the top cross-section plane) of the last ("bottom") cross-section. Let P = (x, y)^t be a point on the last cross-section and P0 = (x0, y0)^t its corresponding point on the "top" one (figure 6.15.a). The vector q = P - P0 has coordinates (x - x0, y - y0)^t which, from equation 6.5 and its corresponding form in I', also correspond to (sin σ s, 0)^t, where s is the s-value of the last cross-section. Equating the last two expressions yields s = (x - x0) / sin σ.

Figure 6.15 Recovery of an LSHGC. a. cylinder; b. cone.
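The cross-section back-projection, the viewing angle and the cylinder case above reduce to a few lines of code. The sketch below (Python; an illustration, not the system's implementation) transcribes those formulas directly, assuming a unit plane normal N and points already expressed in the rotated frame I'.

import numpy as np

def backproject_cross_section(xy_points, N):
    # Back-project the image cross-section onto the plane with unit
    # normal N, fixing D = 0:  z = -(Nx*x + Ny*y)/Nz.
    Nx, Ny, Nz = N
    return [(x, y, -(Nx * x + Ny * y) / Nz) for (x, y) in xy_points]

def viewing_angle(N):
    # sigma = arccos(-Nz), the angle between the viewing direction
    # and the SHGC axis (whose direction is -N).
    return float(np.arccos(-N[2]))

def cylinder_s_value(p, p0, sigma):
    # s = (x - x0)/sin(sigma) for a cylinder (a = 0), with p and p0
    # corresponding points on the bottom and top cross-sections.
    return (p[0] - p0[0]) / np.sin(sigma)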
In the case of a cone (a ≠ 0), we can determine a, the s-value of the last cross-section and the apex A of the cone (figure 6.15.b). As above, consider a vector q = P - P0, with coordinates (x - x0, y - y0)^t in I' and (cos σ a s u + sin σ s, a s v, -sin σ a s u + cos σ s)^t in W'. Equating the former coordinates with the first two of the latter, we obtain:

u = (x - x0 - sin σ s) / (cos σ a s)   (6.6)
v = (y - y0) / (a s)   (6.7)

Consider the surface normal n = (nx, ny, nz)^t at P (computed by the method of [69]). For an LSHGC, n is orthogonal to q (the surface of an LSHGC is a developable surface). We can write this as

(cos σ a s u + sin σ s, a s v, -sin σ a s u + cos σ s)^t . (nx, ny, nz)^t = 0

where "." denotes the vector dot product; when developed and combined with equation 6.7, this yields

u = (ny (y - y0) + s (sin σ nx + cos σ nz)) / (a s (sin σ nz - cos σ nx))   (6.8)

Combining equations 6.6 and 6.8 yields

s = -(cos σ ny (y - y0) + (x - x0) (cos σ nx - sin σ nz)) / nz   (6.9)

Using the value r of the scaling of the last cross-section (given by our 2-D description), a is given by a = (r - 1) / s.

The apex A of the cone is the intersection point of all (straight) meridians. Let its coordinates in I' be (xa, ya)^t (which are given by our method). From equation 6.5 and the analysis in appendix A (proof of Theorem 3.4), its coordinates in W' are (sin σ (s / (1 - r)) + xs, ys, cos σ (s / (1 - r)) + zs)^t, where (xs, ys, zs)^t are the coordinates of the SHGC origin (OS). Equating the former expression with the first two coordinates of the latter yields

xs = xa - sin σ (s / (1 - r))   (6.10)
ys = ya   (6.11)

As we did at the beginning of this section for the 3-D cross-section recovery, writing that OS lies in the top cross-section plane determines zs as zs = -(Nx xs + Ny ys) / Nz.

2) Recovery of Non-Linear SHGCs

a) Recovery of s-Values along the Axis

As previously mentioned, our 2-D description already provides the values rj (the scalings with respect to the top cross-section). Therefore, here we discuss the recovery of the s-values (sj) of the scaling function. Let Pj be a point on cross-section Cj of the surface of the SHGC. Let m = (mx, my, mz)^t be the tangent to the meridian ((mx, my) can be computed in the image) and n = (nx, ny, nz)^t the surface normal at Pj. Let (rj u, rj v, sj)^t be the coordinates of Pj in S, (x, y, z)^t its coordinates in W' ((x, y) are known from the image description) and (x0, y0, z0)^t the coordinates of P0, the corresponding point of Pj on the top cross-section ((x0, y0, z0) are already known). See figure 6.16.

Figure 6.16 Recovery of a non-linear SHGC.

We can readily determine the complete 3-D tangent to the meridian from the fact that the tangent to the meridian and the surface normal are orthogonal [23], i.e. (nx, ny, nz)^t . (mx, my, mz)^t = 0. This yields mz = -(nx mx + ny my) / nz. Using equation 6.1 and expressing the parallelism of the tangent to the meridian in S', (cos σ u r'j + sin σ, v r'j, -sin σ u r'j + cos σ)^t, where r'j = dr/ds (sj), and in W', (mx, my, mz)^t, we obtain the following equations:

my (cos σ u r'j + sin σ) - mx v r'j = 0   (6.12)
mx (-sin σ u r'j + cos σ) - mz (cos σ u r'j + sin σ) = 0   (6.13)

Combining the last two equations yields the relationship between u and v:

u = v ((cos σ mx - sin σ mz) / my)   (6.14)

The value of v can be computed with good accuracy using the relationship between the coordinates of P0 in S' and W': v = y0 - ys (ys is the constant second coordinate of the projection of the axis in I'; recall that in that system the axis is horizontal).
Consider the correspondence vector q = Pj - P0. In I', q has coordinates (x - x0, y - y0)^t, which also correspond to (cos σ u (rj - 1) + sin σ sj, v (rj - 1))^t. Equating the last two expressions yields:

sj = (x - x0 - cos σ u (rj - 1)) / sin σ   (6.15)

In practice, the above process is performed at points where the tangents to the meridians are neither parallel to the axis nor orthogonal to it. More than one point is used for each cross-section, and the obtained values are averaged to obtain an estimate of the value of sj.

b) Recovery of the Origin of the SHGC

The coordinates (xs, ys, zs)^t of the origin of the SHGC (the point of the axis on the top cross-section plane) can be determined in many ways. We have used the following method. Let Cj be an arbitrary cross-section, rj its scaling with respect to the top cross-section, and sj its s-value, computed by the method described previously. Cj is selected so that rj ≠ 1. Let (xa, ya)^t be the coordinates in the image plane of the intersection point Aj of all lines of symmetry between Cj and the top cross-section (those coordinates are given by our 2-D description). From equation 6.1 and the analysis in appendix A (proof of Theorem 3.4), the 3-D coordinates of that point in S are (0, 0, sj / (1 - rj))^t and in W' (sin σ sj / (1 - rj) + xs, ys, cos σ sj / (1 - rj) + zs)^t. Equating the first two coordinates of the latter with (xa, ya) yields:

xs = xa - sin σ sj / (1 - rj)   (6.16)
ys = ya   (6.17)

The third coordinate can be determined by writing the fact that the origin lies in the top cross-section plane. This yields

zs = -(Nx xs + Ny ys) / Nz   (6.18)

The results of the application of the method to the descriptions obtained in figure 4.34.b and figure 4.35 were shown in figure 4.36, where the objects are displayed in poses different from the original ones. The figure shows the 3-D ruled surfaces in terms of cross-sections and meridians. Its application to the SHGCs of the compound objects of figure 6.11 and figure 6.12 can be seen in figure 6.18. At this stage, the complete intrinsic SHGC descriptions are found.
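The per-point s-value computation of equations 6.12-6.15 transcribes directly into code. In the sketch below (Python; the sample representation is hypothetical), the averaging over several points per cross-section follows the remark above.

import numpy as np

def meridian_tangent_z(mx, my, n):
    # Complete the meridian tangent: mz = -(nx*mx + ny*my)/nz.
    nx, ny, nz = n
    return -(nx * mx + ny * my) / nz

def s_value(p, p0, m, rj, sigma, ys):
    # Equations 6.14-6.15:  v = y0 - ys,
    # u = v (cos(sigma) mx - sin(sigma) mz) / my,
    # s_j = (x - x0 - cos(sigma) u (rj - 1)) / sin(sigma).
    (x, y), (x0, y0) = p, p0
    mx, my, mz = m
    v = y0 - ys
    u = v * (np.cos(sigma) * mx - np.sin(sigma) * mz) / my
    return (x - x0 - np.cos(sigma) * u * (rj - 1.0)) / np.sin(sigma)

def s_value_estimate(samples, rj, sigma, ys):
    # Average over several points (p, p0, m) of the same cross-section.
    return float(np.mean([s_value(p, p0, m, rj, sigma, ys)
                          for (p, p0, m) in samples]))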
Calling ^ and resulting plane orientations of the two extremal cross- sections C i and Cj, the axis plane orientation is given by flQ = because the axis is orthogonal to the cross-sections. Its position (depth) can be arbitrarily fixed. The 3-D positions of the two end cross-sections (their planes) are automatically fixed since their centers belong to the axis plane. Each end cross-section is back-projected onto its plane to obtain its 3-D shape (this uses the same principle as for SHGCs). If at least one of the end cross-sections is complete (facing towards the camera for e.g) the 3-D cross-section shape is complete and a full 3-D recovery is feasible (discussed in the next step). If both end cross-sections are incomplete (facing away from the camera or due to joints, for example) then the 3-D recovery may not be possible. For this, some way of inferring the cross-section shape is needed. This could be done either by exploiting certain types of joints or by assuming particular shapes, such as circular, whenever consistent. An instance, is the use of the ellipse fit error to classify whether the cross-section is (likely to be) circular or not (using normalized errors and fixed thresholds, the details of which we omit). In case it is circular (i.e. the visible 213 portion is elliptical) then the recovered 3-D circle is used as the complete 3-D cross- section. The invisible cross-section inference is discussed in detail in section 6.3.3. 2) Inference of the 3-D Axis and the Scaling Function The 3-D cross-section, the axis plane and the image correspondences (co-cross sectional points) can be used to recover the 3-D axis and the scaling function. The method we describe below has some similarities with the method of [69] and consists of finding for each pair of co-cross-sectional points the position and orientation of the 3-D cross-section which passes through them while being internal to the surface. Let p i and P2 be two co-cross-sectional points of the part’s surface and ^ and Y 2 be the respective tangents to the part’s boundaries at those points. Let 0 be the 2-D axis tangent for the cross-section segment p \P 2 (see figure 6.17). The 3-D cross-section which passes through p i and P2 and which is tangential to the part’s (limb) boundary is found as follows: Step 1. Back-project # on the 3-D axis plane to obtain the orientation of the 3-D cross-section, say W. Step 2. In 3-D, rotate a reference end cross-section, say Cj, by the rotation /? ( h j , j which rotates h j to W . Call the resulting cross-section Cr and its projection cr. Step 3. Find, on cn the two points whose tangents are the same as the tangents ^ and r2 to the projected limbs at p \ andp 2. Call them qi and q2. 214 Step 4. Compute the scale r of the desired cross-section with respect to the reference one as r = dist (pi, p f) / dist (q\, qf). This comes from the fact that lengths ratio of parallel segments is an orthographic invariant. Step 5. Scale cr by r and translate it so that p \ and q\ coincide. Call the resulting cross-section cp. Step 6. Back-project cp so that its center lies on the 3-D axis plane. This results in the desired 3-D cross-section Cp whose radius is r rj where rl is the radius of C\. In step 1, the 2-D axis tangent is used as an estimate of the projection of the 3-D axis tangent. 
The basis for this are Property 3.6, which for a PRCGC indicates that the projection of the 3-D axis and the 2-D axis are parallel symmetric and thus have the same orientations, and Property 3.13 which for a circular PRGC indicates that the 2-D axis tangent is “almost parallel” to the projection of the 3-D Figure 6.17 Inferring 3-D descriptions o f curved-axis parts. 215 axis tangent over “most viewing directions.” Step 2 uses the assumption that the part’s cross-section is orthogonal to its axis. If one of the cross-sections is classified as a cross-sectional cut then it is chosen as the reference one. Step 3 exploits the fact that limbs and cross-sections are tangential to each other (limb boundaries are assumed). If the side boundaries are not limbs but meridians (corresponding to corners of the cross-section), the identification of the cross-section points q\ and # 2 of step 3 is trivial because they are constant along the surface and correspond to corners of the cross-section. Step 4 is used for both PRCGCs and circular PRGCs. In theory, the scale should be 1 for a PRCGC but due to errors in cross-section and axis orientations a small offset may result. The method of [69] uses an objective function to refine the axis plane orientation such that the scale is 1 for all points. In our method, we prefer to allow for scaling so that PRCGCs and circular PRGCs are handled in a uniform way. Step 5 consists of verifying whether the reconstruction (inference of the 3-D cross-section at the current location) is possible. If after proper scaling, the image of the cross-section does not lie within the part’s surface, then the part is neither a PRCGC nor a circular PRGC and is marked as “unknown.” Therefore, for parts with dual projective geometric descriptions (where the correspondences consist of two alternatives between parallel symmetry and right ribbons, as discussed in section 5.1.1), this step helps choose which (if any) is the right one. Step 6 fixes the position of the desired 3-D cross-section by forcing its 216 center to lie on the axis plane. The locus of those centers constitutes the inferred 3- D axis. This method is applied to all component local surface patches of a curved- axis part. Its results on the part’s hypotheses of figures 5.10, 5.11 and 5.12 are shown in figure 5.13. The figure shows the reconstructed descriptions in terms of cross-sections, meridians and 3-D axes. Its application on the curved axis parts of the compound objects of figure 6.11 and figure 6.12 are shown in figure 6.18. Notice that the handle of the teapot of figure 6.11 and the handle of the left-most mug of figure 6.12 could not be recovered because their cross-sections are completely occluded by the pot and the cup. To recover a 3-D description of such parts requires a different method, than the one discussed above, which does not rely on the visibility of the cross-sections. For example, if both handles can be approximated by Zero-Gaussian curvature surfaces (almost flat in one direction), a method such as [72] could be used to recover their 3-D shapes. Also, useful 3-D information could be recovered if knowledge about the 3-D relationships between, say, the pot or the cup and their handles is given. This is not pursued in this thesis. However, despite the inability of the method to recover 3-D shape of such occluded parts, their projective descriptions still provide important shape information. At this stage, wherever possible, the 3-D intrinsic description of the volumetric parts are inferred. 
The obtained 3-D descriptions can be used to classify the part at hand. If the sweep function is (almost) constant, then the part is labeled as a PRCGC. If, on the other hand, the sweep is not constant, then if the cross-section shape is circular (using the fitting errors previously mentioned) the part is labeled as a circular PRGC; otherwise it is labeled as "unknown."

Finally, due to image discontinuities such as (self-)occlusion and contour breaks, some 3-D parts may have gaps in their reconstructed 3-D shape (as can be noticed in figure 5.13). At this stage, it is possible to complete the 3-D descriptions by using interpolation for each of the intrinsic elements. For example, gaps in the 3-D axis of a PRGC can be completed by, say, conical patches in the axis plane, and gaps in the scaling function (of either SHGCs or PRGCs) can be completed by, say, linear (or higher-degree) B-splines. The right-most circular PRGC (the spout) of the compound object of figure 6.18.a has been so completed. This completion is not crucial for the results, however, as the incomplete descriptions are still useful.

6.3.3 Constraints on 3-D Shape from Joints

The previous discussion ignored the use of joint interactions among parts in the 3-D inference. Joined parts do set mutual constraints on one another's 3-D shape. For example, well-defined differential geometric relationships exist between the surface orientations of intersecting surfaces: their surface normals, along their intersection curve, are both orthogonal to the tangent to that curve [23]. The problem of exploiting such constraints from a single image of arbitrary joint relationships between parts is a research topic in its own right (see [71] for an approach that addresses objects with multiple zero-Gaussian-curvature surfaces). For this reason, we limit the analysis to the use of end-to-end joints with visible joint curves, as they are simpler to exploit.

Figure 6.18 Recovered 3-D volumetric descriptions from the descriptions of figure 6.11 (a) and from the descriptions of figure 6.12 (b).

There are two ways end-to-end joints can be exploited in 3-D shape recovery. The first is to set mutual constraints on the cross-section plane orientations of joined parts. The second is to infer invisible (or not completely visible) cross-sections for parts with incomplete descriptions. We discuss them separately.

6.3.3.1 The Use of End-to-End Joints to Constrain Cross-Section Orientations

End-to-end joints with visible joint curves indicate that the joined parts are cut by the same plane. Therefore a 3-D description of the joined parts must be consistent with this constraint. For this, before the use of any of the methods described in section 6.3.2, sets of joined parts' ends which must have the same plane orientation are constructed (some of these sets may contain only one element). This is done by traversing the object graph through the arcs labeled as "end-to-end with visible joint curve" and through nodes labeled as having "parallel cuts," such as SHGCs with linearly parallel symmetric ends. An example is shown in figure 6.19. In this example, two sets are obtained: {X11} and {X12, X21, X22, X31, X32, X41, X42}.

To estimate the common plane orientation of the parts' ends of each of the sets so obtained, the method first estimates the orientation of each part's end, independently of the others, using the method described in section 6.3.2.
Then the 220 cross-section cross-section X- cross-section X 12 cross-section X22 cross-section X^i cross-section X 41 cross-section X cross-section X 42 Figure 6.19 Finding parts’ ends which have the same plane orientations. common orientation of those parts’ cross-sections is the average of all the obtained orientations. The common cross-section orientation of the lid and the pot of figure 6.18.a is so computed. At this stage, all visible cross-section orientations have an estimate of their plane orientations consistent with the observed end-to-end joints with visible joint curve. Parts with at least one (almost) completely visible cross-section facing towards the camera are used by the methods of section 6.3.2 to recover their 3-D shapes. The remaining parts do not have a fully visible cross-section at either end. For some of these parts, some way of inferring the missing cross-sections is needed in order to be able to recover their 3-D shapes. For this, we find the end-to-end joints with equal-size ends (and visible joint curve) to be useful. This is discussed below. 221 6.3.3.2 The Use of End-to-End Joints with Equal-Size Ends to Complete Parts’ Descriptions Parts with only partially visible cross-sections and which are involved in an end-to- end joint with equal-size ends may have their cross-sections “inherited” from the joined part if this latter has a complete 3-D description. Thus, parts whose 3-D shape cannot be recovered if considered isolated, may be recovered after this inference is made. In the case such joint-based inference is not possible, then assumptions about the circularity of the cross-sections are made whenever consistent. The different cases are discussed below. • if a part P\ has an end-to-end joint with equal-size ends with another part P^ and if the cross-section of P j at the joined end is known (complete) then it is used as the complete cross-section of P2 at its joined end (see figure 6.20) • if an SHGC part has an elliptic arc as the partially visible cross-section and bilaterally symmetric side boundaries with the axis of symmetry orthogonal to the ellipse major axis, then use the ellipse as the part’s cross-section (see figure 6.21.a) • if a PRGC part has an elliptic arc as the partially visible cross-section and an extremal tangent to the 2-D axis (almost) orthogonal to the ellipse major axis, then use the ellipse as the part’s cross-section (see figure 6.2l.b) An example of the first rule is given in figure 6.20 where the “bottom” cross- section of the cylinder is known (since already recovered) and thus can be used as 222 the “top” cross-section of the lower part. We find the other types of joints to be less strong than the one above and are thus not used for this purpose. visible cross-section p 3 end-to-end joint with equal size ends Figure 6.20 Using end-to-end joints to infer cross-sections. The second rule is based on Property 3.10 and Property 3.11 of SORs and the third rule is based on Property C .l given in appendix C.3. Both rules infer circular cross-sections (elliptic in the image) from the symmetry and the partial cross- section observed. Figure 6.21 gives some examples. If the symmetries, for example, are not observed between the part’s side boundaries then no inference is made and the part’s description remains in terms of its surface correspondences only, without complete information about its cross-section. Its description remains projective. 
In doing so, full advantage of such joints is taken for shape inference. The newly inferred cross-sections can be used to further complete the part's surface if it is incomplete (occluded, for example). In the case of SHGCs with non (completely) visible cross-sections, now that the cross-section is given, the completion step, described in section 4.3.3 for SHGCs with completely visible cross-sections, can be used.

6.3.4 Analysis of the Results

An issue of interest is how to evaluate the "goodness" of the resulting 3-D descriptions. There are two aspects to this "goodness" measure. First, we would like to know whether the reconstruction is sensitive to the image appearance of the part, or to the viewing direction. The second aspect has to do with how close the resulting descriptions are to the "actual" ones, i.e. the ground truth.

If the part is an SHGC or a PRCGC, then the use of the invariant Properties 3.1-3.8 guarantees the insensitivity of the resulting 3-D descriptions to the viewing direction (up to projection, noise and quantization errors). For a circular PRGC, the quasi-invariant Properties 3.12 and 3.13 indicate that the image descriptions are "good approximations" of the projective ones over "most viewing directions," but they do not directly indicate how the inferred 3-D descriptions vary (as functions of the viewing and shape parameters) due to these approximations. In appendix D, a detailed analysis of the variation of the 3-D descriptions is given. The conclusion is that the 3-D descriptions do not change much. Finally, if the part has been classified as "unknown," as discussed at the end of section 6.3.2, then we have no rigorous properties to help evaluate the recovered descriptions. The part is still accepted as a meaningful object, however.

As for the "goodness" in terms of the ground truth, one way is to apply the method to synthetic objects with known ground truth and compare the results with it. This is the subject of the analysis of circular PRGC recovery in appendix D. However, ground truth is not known in general and, after all, the observed boundaries could project from an infinite number of possible 3-D shapes. We believe that a more appropriate measure (perhaps the only one) would be how close the 3-D descriptions are to human perception of the same images. An analysis of this measure was given by [69] for perception of the cross-section orientation of SHGCs (the degree of freedom left unspecified), and the conclusion is that the recovered shapes conform (up to small deviations) to human perception. For our curved-axis primitives, that analysis should also apply, because the cross-section orientations are also the first degrees of freedom to be fixed and a similar ellipse fitting method is used.
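For reference, the orientation of a partially visible cross-section can be estimated by a generic algebraic ellipse fit such as the one sketched below (a standard least-squares conic fit; not necessarily the exact fitting procedure of [69] or of this thesis):

    import numpy as np

    def fit_ellipse_orientation(points):
        """Fit a conic a*x^2 + b*xy + c*y^2 + d*x + e*y + f = 0 to 2-D
        boundary points (at least 6) by least squares via SVD, and return
        the orientation of the principal axes in radians.
        """
        x, y = np.asarray(points, dtype=float).T
        D = np.column_stack([x*x, x*y, y*y, x, y, np.ones_like(x)])
        # The conic coefficients are the right singular vector with the
        # smallest singular value (best algebraic fit).
        _, _, Vt = np.linalg.svd(D)
        a, b, c, d, e, f = Vt[-1]
        # Principal-axis orientation of the conic (the other axis is
        # pi/2 away); for an ellipse, tan(2*theta) = b / (a - c).
        return 0.5 * np.arctan2(b, a - c)

The minor axis of the fitted ellipse (orthogonal to one of the returned directions) plays the role of the projected axis tangent in the properties discussed above.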
6.4 Discussion

6.4.1 Strengths and Limitations

The object level exploits the projective structural properties of the joints not just to infer compound objects, but also to provide global context for both segmentation and shape description. The use of joints with visible joint curves for filtering out false hypotheses, which the part level accepted, is an example of their use for a more complete solution to the figure/ground problem. The use of end-to-end joints with equal-size ends and visible joint curves to infer missing cross-sections, and the use of end-to-end joints with visible joint curves to estimate common 3-D cross-section orientations, also exploit that context for (3-D) shape description.

The method may not handle 2-dimensional compound objects (involving flat GCs) where the joints do not form any of the patterns discussed in section 6.1. An example of such a case was given in figure 5.15. That type of object is handled by a method such as the one of [58], which is best suited to 2-D objects. A combination of both methods would increase the set of objects that can be handled.

6.4.2 Performance

The compound object segmentation and shape description method has been demonstrated on two complex images (figures 6.6 and 6.12), containing in total 4 SHGCs and 4 PRGCs. The performance of the whole method, including the SHGC and PRGC modules, on those scenes is summarized in the tables below. The tables show the evolution of the number of hypotheses of both modules and of key steps of the object level, and the corresponding run times. The final number of parts given is the one obtained after the part verification step between both SHGC and PRGC parts.

It can be seen that the object level processing itself is very fast. This is to be expected because there are few part hypotheses given as input to that level, making the joint detection fast. Also, the 3-D inference methods are purely numerical and take little time to execute. The bulk of the processing takes place at the part level. The second scene (figure 6.12) takes more time to interpret due to its greater complexity (more markings both within the objects and in the background).

The parameters used in the object level are mostly those used in the part level for the analysis of the structural properties. The robustness of the method to changes of the parameters, those of the SHGC and PRGC modules for detecting the parts of figures 6.6 and 6.12, has also been tested by modifying key parameter values by up to 50% of their default values. The final results were unchanged.

  scene      boundaries   hypothesized     local SHGC   parts        verified parts
                          cross-sections   patches      hypotheses   hypotheses
  fig. 6.6   850          3                9            6            2
  fig. 6.12  2057         6                18           17           2

Table 6.1 Evolution of the number of hypotheses of the SHGC module

  scene      parallel     local PRGC   parts        verified parts
             symmetries   patches      hypotheses   hypotheses
  fig. 6.6   3610         84           68           2
  fig. 6.12  3783         105          89           2

Table 6.2 Evolution of the number of hypotheses of the PRGC module

  scene      number of parts   number of joints   number of objects
  fig. 6.6   4                 4                  1
  fig. 6.12  4                 4                  2

Table 6.3 Evolution of the number of hypotheses of the object level

  scene      cross-section   local SHGC        grouping of     verification of
             finding         patch detection   local patches   SHGC parts
  fig. 6.6   0:55            0:15              0:00.6          0:06
  fig. 6.12  4:30            0:42              0:00.9          0:27

Table 6.4 Run times of the SHGC module (min:sec)

  scene      parallel   local PRGC        grouping of     verification of
             symmetry   patch detection   local patches   PRGC parts
  fig. 6.6   1:40       0:54              0:14            3:31
  fig. 6.12  2:49       3:52              0:25            15:10

Table 6.5 Run times of the PRGC module (min:sec)
  scene      joint       3-D SHGC    3-D PRGC
             detection   inference   inference
  fig. 6.6   0:00.5      0:17        0:00.1
  fig. 6.12  0:00.6      1:24        0:00.1

Table 6.6 Run times of the object level (min:sec)

7 Conclusion

7.1 Summary

The method and the results described in this thesis show that the problem of segmentation and 3-D inference of complex curved objects from a real intensity image can be solved for a large class of objects. This work extends and complements many past efforts on the same problem; those efforts, however, have either assumed perfect, pre-segmented boundaries or addressed 2-dimensional shapes. The automatic segmentation and 3-D shape inference of complex curved objects made up of GC parts from a real image, in the presence of noise, boundary breaks, markings, shadows and occlusion, has not been previously addressed in the literature.

The proposed method takes explicit account of both the 3-dimensionality of the scene objects and of the real image imperfections, including occlusion. Its scope includes compound objects that consist of arrangements of SHGCs, PRCGCs and circular PRGCs. The explicit view of the image objects as projections of 3-D bodies led to the analysis and derivation of rigorous projective properties of the above classes of shapes in the image. The explicit account of the real image phenomena led to the derivation of a grouping-based hypothesize-verify method for the segmentation and description of the viewed objects. The method exploits the projective properties as strong constraints to hypothesize, verify and refine object descriptions from the clutter of imperfect intensity boundaries.

The geometric invariants, both those derived in this thesis and those from past work, establish strong relationships between image and 3-D shapes, which allows the characterization of SHGCs and PRCGCs. In the absence of strict invariants, the derived geometric quasi-invariants have proved equally effective for characterizing circular PRGCs. We believe the successful use of the quasi-invariants of circular PRGCs to be an important step towards handling more complex shapes. They are a promising alternative to invariants, which may not exist for all shapes. Both types of geometric properties characterize the projection of the 3-D intrinsic description of a GC, thereby providing means to compute its projective description, which is then used to infer its 3-D shape.

The structural properties, both those of a single part and those of a compound object, establish constraints on the way boundaries of an object interact with one another. Thus, they provide further regularity constraints in hypothesizing relevant 3-D objects. We believe the successful use of these properties in solving the segmentation problem to be important progress. Most past work has focused on the use of
projective properties only to infer shape descriptions, assuming that the segmentation problem has already been solved.

The hierarchical organization of the method allows us to solve the segmentation and description problems in steps, each with a well defined scope. The part level uses evidence of regularity within each part as though it were isolated. The object level uses evidence of joints between parts to provide a more global scope. In doing so, geometric context is exploited as it becomes available, in order to either refine certain descriptions, such as the computation of the global axis of an SHGC or the inference of parts' cross-sections through joints, or to filter out false hypotheses which passed lower-scope verification tests. This distribution of the segmentation decreases the dependency of the higher levels on the results of the lower ones.

However, the performance of the higher levels is not completely independent of the lower levels. Wrong boundary groupings, in the curve level, between the boundaries of an object and surrounding markings may affect the surface patch detection step if the grouping deviates significantly from the object's surface. Also, if the boundaries of a part's surface do not produce the expected symmetry relationships, so that a surface patch cannot be detected, then the part may not be found by the method. An example is given in figure 7.1. To handle such cases requires a further ability to detect parts when the geometric invariant and quasi-invariant properties are not observed. This would require a stronger reliance on the structural properties. In the example of figure 7.1, the visible 3-tgt-j is a strong indicator of a volumetric part. The presence of an occluding body over boundaries which extend in the same direction as the visible surface boundary is a strong indicator of a regular part, as it "explains" the non-visibility of the projective geometric relationships. To exploit only such structural evidence would increase the applicability of the method. It remains to be seen, however, how structural properties alone would perform in the presence of real image phenomena. Their local nature may be too weak to provide the strong constraints needed to handle surface markings or shadows.

One possible way is to use weaker geometric constraints in hypothesizing objects. A ribbon-based method such as [45] would be able to detect the visible portions of such an object. The grouping of these portions into a single object hypothesis cannot be made on rigorous criteria such as the intrinsic projective geometry of the object (collinear axes of portions of an SHGC, for example). However, the detection of such surface portions is a useful step. A more comprehensive method would use a combination of both types of approaches, so that rigorous projective geometry is exploited whenever possible but less rigorous means are used otherwise.

The performance of the SHGC module also depends on whether the right cross-sections are detected or not. In principle, this should not affect the results, as the module that detects SHGCs with invisible cross-sections should handle these cases. However, as mentioned in section 4.5, that module is currently not complete and is not an active part of the system. This not only limits the method to cases where the cross-section is visible, but makes its performance dependent on the cross-section finding step.

Figure 7.1 Example of a part not detectable by our method. A stronger reliance on structural properties to hypothesize parts is needed.

7.2 Future Research

This work has addressed a challenging problem, and we believe it makes an important step towards a practical vision system. However, it does leave many issues that have to be addressed for a more robust, more efficient and more general approach to monocular scene analysis. In the following sub-sections, we discuss the issues believed to be relevant for such an endeavor.
We first discuss the applications of the current method.

7.2.1 Applications

The results produced by the method described in this thesis have direct application to 3-D object recognition and robotic grasping. In 3-D object recognition, the resulting object-level descriptions can be used to derive powerful indexing keys to access likely models from a large database. Each of the intrinsic elements of a part, its cross-section, its axis and its scaling function, can by itself be used to derive such indexing keys. A combination of such keys in a multi-dimensional indexing space would provide particularly strong discriminatory power. If 3-D shape can be recovered for an object, then the indexing keys can be based on the 3-D intrinsic elements. This is very useful if the models are given as 3-D objects, such as obtained from previous image interpretations of the system or simply from an off-line encoding. If, on the other hand, the descriptions remain projective, then the indexing keys will be based on the 2-D descriptions. These descriptions being insensitive to viewpoint (because they are based on viewpoint-invariant and quasi-invariant properties), they are also powerful in matching image to model shape properties. In both cases, the indexing can be based on qualitative shape properties such as those discussed in [5]. For compound objects, a method similar to that of [48] can be used to explicitly account for the joint relationships between parts.

In robotic grasping, the recovered 3-D shape descriptions provide useful information for planning and executing a grasp operation. Such an application was demonstrated in [58], where the 3-D shapes were obtained from range data. In our case, an intensity image produced by a simple CCD camera mounted on the arm can be used to find the 3-D shapes and apply a similar method for object grasping.

7.2.2 Robustness

The method performed well on relatively complex images by current standards in the field. It correctly handled spurious information such as surface markings and shadows, and missing information such as boundary breaks and occlusion. However, so far, the system has been tested on images taken in an indoor setting with a few visible objects. An outdoor or a workplace scene would provide a different type of background and illumination conditions. Although the current method does not assume any particular properties of the background or illumination conditions (all the images shown were taken under ambient light), its performance on highly cluttered scenes with more complex occlusion patterns than shown in this work (a plastic bottle trash dump, for example) remains to be tested.

To handle such complex scenes, a more general approach needs to be developed. For example, cues other than contour alone may be needed. The use of shading may also help guide the segmentation process. A qualitative analysis of shading would help focus the search on homogeneous regions first, for example. This would correspond to a hybrid approach combining both region-based and edge-based methods. Shading can even help disambiguate between conflicting interpretations. In the example given in figure 6.9, where there is an ambiguity as to whether some boundaries delimit a curved surface or a planar cross-section, if the intensity smoothly changes to lower (darker) levels in the vicinity of the side boundaries, then a curved surface interpretation is more likely (assuming a Lambertian surface and a point light source along the viewing direction, for example).
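As a rough illustration of this cue (not part of the implemented system; the sampling scheme and threshold below are assumptions), one could test whether intensity falls off toward a hypothesized side boundary:

    import numpy as np

    def darkens_toward_boundary(image, transects, min_drop=5.0):
        """Crude curved-surface cue: check that intensity decreases
        toward a side boundary along short transects.

        image: 2-D array of gray levels.
        transects: list of (k, 2) integer arrays of (row, col) points,
                   each ordered from the surface interior to the boundary.
        min_drop: required average intensity drop (gray levels).
        """
        drops = []
        for transect in transects:
            vals = np.array([image[r, c] for r, c in transect], dtype=float)
            drops.append(vals[0] - vals[-1])   # interior minus boundary
        return np.mean(drops) >= min_drop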
An analysis of the qualitative properties of shading and their use for segmentation would prove useful in such cases. Combining cues would also increase the robustness of the method to its parameters.

The method has been tested on about a dozen images. In reality, this only indicates that the approach is viable and holds promise. More testing, and probably further refinement of the method, needs to be done in order to better assess its robustness.

7.2.3 Efficiency

Another issue for a practical vision system is to obtain a more efficient method. Although some mechanisms have been used in this thesis to reduce the time complexity of the method (discussed in appendix E), there is much work that remains to be done. An instance is the use of coarse-to-fine methods in computing the local surface patches. For example, in the right ribbon detection method of section 5.1.1, a search using a coarse-to-fine angular step would apply the fine (costly) search only to promising surfaces.

However, the main issue in improving the efficiency of the method is to implement it on a parallel architecture. Most steps of the method lend themselves to a natural parallel implementation. This includes the boundary grouping, the cross-section finding, and the local surface patch detection, grouping and verification. In those steps, the detection of each feature of interest is independent of the others and thus can be done in parallel.

7.2.4 Scope

A practical vision system for generic objects must be able to handle a large number of objects. The current scope of the method is fairly large: SHGCs, PRCGCs and circular PRGCs do make up many objects in our environment. However, the scope could be increased to handle an even larger set of objects.

The method currently does not handle polyhedral objects. Although some polyhedra can be described as LSHGCs, their image consists of several surface patches. As discussed in sections 4.5.1 and 5.3.1, handling such objects requires a further merging step to make a single part hypothesis out of the many that would be produced. An additional issue in dealing with polyhedral objects is their inherent ambiguity as GCs. For example, the cross-section and side boundary labeling may not be unique, and several interpretations could result. A method that addresses polyhedra must also handle those ambiguities.

Analyzing the geometric properties of non-circular PRGCs (varying size and non-circular cross-section) is also of interest. Including non-planar axis primitives in the set of primitives would also increase the applicability of the method. Issues in handling non-planar axis primitives include the analysis of their projective properties, how those properties can be used to identify them (and distinguish them from other primitives), and how to recover their 3-D shape.

Finally, the current method addresses objects which can be reasonably well described by a GC of the primitive set used. Such objects produce images which can be used to compute the invariant and quasi-invariant properties. This may not be the case for all objects, however. There are many objects which can only be "coarsely" described by one of the primitives used. For example, a tree trunk with many irregularities on its surface may be missed by the current method, as it may not satisfy the properties (tangents to its outline, for example, might deviate significantly due to random protrusions). This issue is not to be confused with image noise or quantization effects.
The imperfections mentioned here are related to the object itself, not to the imaging process. We believe a scale-space approach to be useful for handling objects with irregular surfaces. In such an approach, structural details are accessed upon need. It allows us to ignore undesired details when not needed and focus only on the coarser shape attributes, where useful regularities can be detected. The current method uses a single scale, the one provided by the camera resolution. How to use scale, and the analysis of the way the projective properties behave with respect to changes in scale, are useful issues in this direction. In fact, such an approach can also improve the complexity of the methods: false hypotheses are cheaper to identify at coarser scales than at finer ones.

Despite the numerous issues which remain to be addressed, we believe that this work provides a promising methodology for building practical vision systems.

References

[1] G.J. Agin and T.O. Binford. Computer Description of Curved Objects. IEEE Transactions on Computers, 25, 1976.
[2] H.G. Barrow and J.M. Tenenbaum. Interpreting Line Drawings as Three-Dimensional Surfaces. Artificial Intelligence, 17:75-116, 1981.
[3] R. Bergevin and M.D. Levine. Generic Object Recognition: Building and Matching Coarse Descriptions from Line Drawings. IEEE Transactions PAMI, 15, pages 19-36, 1993.
[4] P.J. Besl and R.C. Jain. Segmentation Through Symbolic Surface Descriptions. In Proceedings of IEEE CVPR, pages 77-85, 1986.
[5] I. Biederman. Recognition by Components: A Theory of Human Image Understanding. Psychological Review, 94(2):115-147, 1987.
[6] I. Biederman and C. Gerhardstein. Recognizing Depth-Rotated Objects: Evidence and Conditions for Three-Dimensional Viewpoint Invariance. Journal of Experimental Psychology: Human Perception and Performance, 19(6):1162-1182, 1993.
[7] T.O. Binford. Visual Perception by Computer. IEEE Conference on Systems and Controls, Miami, December 1971.
[8] T.O. Binford. Inferring Surfaces from Images. Artificial Intelligence, 17:205-245, 1981.
[9] T.O. Binford, T.S. Levitt and W.B. Mann. Bayesian Inference in Model-Based Machine Vision. In Proceedings of the AAAI Uncertainty Workshop, 1987.
[10] T.O. Binford and T.S. Levitt. Quasi-invariants: Theory and Exploitation. In Proceedings of the Image Understanding Workshop, pages 819-829, Washington DC, 1993.
[11] H. Blum. A Transformation for Extracting New Descriptors of Shape. MIT Press, 1967.
[12] T.E. Boult and A.D. Gross. On the Recovery of Superellipsoids. In Proceedings of the Image Understanding Workshop, pages 1052-1063, Cambridge, MA, 1988.
[13] J.M. Brady. Criteria for Representation of Shape. In Human and Machine Vision, J. Beck, B. Hope and A. Rosenfeld (eds.), pages 39-84, Academic Press, 1983.
[14] M. Brady and H. Asada. Smoothed Local Symmetries and their Implementation. International Journal of Robotics Research, 3(3):36-61, 1984.
[15] M. Brady and A. Yuille. An Extremum Principle for Shape from Contour. IEEE Transactions PAMI, 6:288-301, 1984.
[16] R.A. Brooks. Model-Based Three Dimensional Interpretation of Two Dimensional Images. IEEE Transactions PAMI, 5(2):140-150, 1983.
[17] J.B. Burns, R.S. Weiss and E.M. Riseman. View Variation of Point-Set and Line Segment Features. IEEE Transactions PAMI, 15, pages 51-68, 1993.
[18] J.F. Canny. A Computational Approach to Edge Detection. IEEE Transactions PAMI, 8(6):679-698, November 1986.
[19] S.W. Chen and G. Stockman. Object Wings: 2 1/2-D Primitives for 3-D Recognition. In Proceedings of IEEE CVPR, pages 535-540, 1989.
[20] M.B. Clowes. On Seeing Things. Artificial Intelligence, 2(1):79-116, 1971.
[21] M. Dhome, R. Glachet and J.T. Lapreste. Recovering the Scaling Function of a SHGC from a Single Perspective View. In Proceedings of IEEE CVPR, pages 36-41, 1992.
[22] S. Dickinson. 3-D Shape Recovery using Distributed Aspect Matching. IEEE Transactions PAMI, 14(2):174-198, 1992.
[23] M.P. DoCarmo. Differential Geometry of Curves and Surfaces. Prentice Hall, 1976.
[24] J. Dolan and R. Weiss. Perceptual Grouping of Curved Lines. In Proceedings of the Image Understanding Workshop, pages 1135-1145, Palo Alto, California, May 1989.
[25] S. Edelman and H.H. Bulthoff. Orientation Dependence in the Recognition of Familiar and Novel Views of 3D Objects. Vision Research, 32, pages 2385-2400, 1992.
[26] T.J. Fan, G. Medioni and R. Nevatia. Recognizing 3-D Objects using Surface Descriptions. IEEE Transactions PAMI, 11(11):1140-1157, 1989.
[27] W.E.L. Grimson. Object Recognition by Computer: The Role of Geometric Constraints. MIT Press, 1990.
[28] A. Gross and T. Boult. Recovery of Generalized Cylinders from a Single Intensity View. In Proceedings of the Image Understanding Workshop, pages 557-564, Pennsylvania, 1990.
[29] P. Havaldar, G. Medioni and F. Stein. Extraction of Groupings for Recognition. In Proceedings of the European Conference on Computer Vision, Stockholm, 1994.
[30] R. Horaud and M. Brady. On the Geometric Interpretation of Image Contours. Artificial Intelligence, 37:333-353, 1988.
[31] B.K.P. Horn. Robot Vision. M.I.T. Press, Cambridge, MA, 1986.
[32] Q. Huang and G.C. Stockman. Generalized Tube Model: Recognizing 3D Elongated Objects from 2D Intensity Images. In Proceedings of IEEE CVPR, pages 104-109, 1993.
[33] D.A. Huffman. Impossible Objects as Nonsense Sentences. In Machine Intelligence 6, B. Meltzer and D. Michie (eds.), pages 295-323, Edinburgh University Press, Edinburgh, 1971.
[34] J.E. Hummel and I. Biederman. Dynamic Binding in a Neural Network for Shape Recognition. Psychological Review, 1992.
[35] IEEE Transactions on Pattern Analysis and Machine Intelligence. Special Issue on Interpretation of 3-D Scenes, Part I, PAMI 13(10), 1991.
[36] IEEE Transactions on Pattern Analysis and Machine Intelligence. Special Issue on Interpretation of 3-D Scenes, Part II, PAMI 14(2), 1992.
[37] T. Kanade. Recovery of the Three-Dimensional Shape of an Object from a Single View. Artificial Intelligence, 17:409-460, 1981.
[38] J.J. Koenderink. Solid Shape. M.I.T. Press, Cambridge, MA, 1990.
[39] J. Liu, J. Mundy, D. Forsyth, A. Zisserman and C. Rothwell. Efficient Recognition of Rotationally Symmetric Surfaces and Straight Homogeneous Generalized Cylinders. In Proceedings of IEEE CVPR, pages 123-128, 1993.
[40] D.G. Lowe. Perceptual Organization and Visual Recognition. Kluwer Academic Publishers, Hingham, MA, 1985.
[41] A.K. Mackworth. Interpreting Pictures of Polyhedral Scenes. Artificial Intelligence, 4:121-137, 1973.
[42] J. Malik. Interpreting Line Drawings of Curved Objects. International Journal of Computer Vision, 1(1):73-103, 1987.
[43] D. Marr. Vision. W.H. Freeman and Co. Publishers, 1981.
[44] R.S. Millman and G.D. Parker. Elements of Differential Geometry. Prentice Hall, 1977.
[45] R. Mohan and R. Nevatia. Perceptual Organization for Scene Segmentation. IEEE Transactions PAMI, 1992.
[46] T. Nakamura, M. Asada and Y. Shirai. A Qualitative Approach to Quantitative Recovery of SHGC's Shape and Pose from Shading and Contour. In Proceedings of IEEE CVPR, pages 116-121, New York, 1993.
[47] V. Nalwa. Line Drawing Interpretation: Bilateral Symmetry. IEEE Transactions PAMI, 11:1117-1120, 1989.
[48] R. Nevatia and T.O. Binford. Description and Recognition of Complex Curved Objects. Artificial Intelligence, 8(1):77-98, 1977.
[49] R. Nevatia. Machine Perception. Prentice Hall, 1982.
[50] R. Nevatia, K. Price and G. Medioni. USC Image Understanding Research: 1990-1991. In Proceedings of the Image Understanding Workshop, San Diego, California, 1992.
[51] R. Nevatia, M. Zerroug and F. Ulupinar. Recovery of 3-D Shape of Curved Objects from a Single Image. In Handbook of Pattern Recognition and Image Processing, chapter 4, Y. Tsay (ed.), Academic Press, 1994.
[52] A. Pentland. Recognition by Parts. In Proceedings of the ICCV, pages 612-620, 1987.
[53] J. Ponce and D. Chelberg. Finding the Limbs and Cusps of Generalized Cylinders. International Journal of Computer Vision, 1:195-210, 1987.
[54] J. Ponce. Ribbons, Symmetries and Skewed Symmetries. In Proceedings of the Image Understanding Workshop, pages 1074-1079, Massachusetts, 1988.
[55] J. Ponce, D. Chelberg and W.B. Mann. Invariant Properties of Straight Homogeneous Generalized Cylinders and their Contours. IEEE Transactions PAMI, 11(9):951-966, 1989.
[56] K. Rao and G. Medioni. Useful Geometric Properties of the Generalized Cone. In Proceedings of IEEE CVPR, pages 276-281, 1988.
[57] K. Rao and R. Nevatia. Computing Volume Descriptions from Sparse 3-D Data. International Journal of Computer Vision, 2:33-50, 1988.
[58] K. Rao and R. Nevatia. Shape Description from Sparse and Imperfect Data. Ph.D. dissertation, IRIS #250, 1988.
[59] M. Richetin, M. Dhome, J.T. Lapreste and G. Rives. Inverse Perspective Transform Using Zero-Curvature Contour Points: Applications to the Localization of Some Generalized Cylinders from a Single View. IEEE Transactions PAMI, 13(2):185-192, 1991.
[60] L. Roberts. Machine Perception of Three-Dimensional Solids. MIT Press, 1965.
[61] I. Rock. Orientation and Form. Academic Press, San Diego, CA, 1973.
[62] H. Rom and G. Medioni. Hierarchical Part Decomposition and Axial Shape Description. IEEE Transactions PAMI, 13(10):973-981, 1993.
[63] H. Rom and G. Medioni. Part Decomposition and Description of 3D Shapes. In Proceedings of the International Conference on Pattern Recognition, 1994 (to appear).
[64] P. Saint-Marc and G. Medioni. B-spline Contour Representation and Symmetry Detection. In Proceedings of the European Conference on Computer Vision, pages 604-606, Antibes, France, April 1990.
[65] P. Saint-Marc, J.-S. Chen and G. Medioni. Adaptive Smoothing: A General Tool for Early Vision. IEEE Transactions PAMI, 13(6):514-529, 1991.
[66] H. Sato and T.O. Binford. Finding and Recovering SHGC Objects in an Edge Image. Computer Vision, Graphics and Image Processing, 57(3), pages 346-356, 1993.
[67] S.A. Shafer and T. Kanade. The Theory of Straight Homogeneous Generalized Cylinders. Technical Report CS-083-105, Carnegie Mellon University, 1983.
[68] K.A. Stevens. The Visual Interpretation of Surface Contours. Artificial Intelligence, 17:47-73, 1981.
[69] F. Ulupinar and R. Nevatia. Shape from Contours: SHGCs. In Proceedings of the IEEE International Conference on Computer Vision, pages 582-582, Osaka, Japan, 1990.
[70] F. Ulupinar and R. Nevatia. Recovering Shape from Contour for Constant Cross Section Generalized Cylinders. In Proceedings of IEEE CVPR, pages 674-676, Maui, Hawaii, 1991.
[71] F. Ulupinar and R. Nevatia. Recovery of 3-D Objects with Multiple Curved Surfaces from 2-D Contours. In Proceedings of the Image Understanding Workshop, San Diego, California, January 1992.
[72] F. Ulupinar and R. Nevatia. Perception of 3-D Surfaces from 2-D Contours. IEEE Transactions PAMI, 15, pages 3-18, 1993.
[73] A.P. Witkin. Recovering Surface Shape and Orientation from Texture. Artificial Intelligence, 17:17-45, 1981.
[74] G. Xu and S. Tsuji. Inferring Surfaces from Boundaries. In Proceedings of the IEEE International Conference on Computer Vision, pages 716-720, London, 1987.
[75] M. Zerroug and R. Nevatia. Volumetric Descriptions from a Single Intensity Image. International Journal of Computer Vision (to appear).
[76] M. Zerroug and R. Nevatia. Quasi-invariant Properties and 3-D Shape Recovery of Non-Straight, Non-Constant Generalized Cylinders. In Proceedings of IEEE CVPR, pages 96-103, New York, 1993.
[77] M. Zerroug and R. Nevatia. Segmentation and 3-D Recovery of SHGCs from a Single Intensity Image. In Proceedings of the European Conference on Computer Vision, pages 319-330, Stockholm, 1994.
[78] M. Zerroug and R. Nevatia. Using Invariance and Quasi-Invariance for the Segmentation and Recovery of Curved Objects. In Proceedings of the Second Workshop on the Application of Invariance in Computer Vision, pages 391-410, The Azores, 1993. Also to appear in Zisserman and Mundy (eds.), Springer-Verlag, 1994.
[79] M. Zerroug and R. Nevatia. From an Intensity Image to 3-D Segmented Descriptions. In Proceedings of the International Conference on Pattern Recognition, 1994 (to appear).
[80] M. Zerroug and R. Nevatia. Segmentation and 3-D Recovery of Curved Axis Generalized Cylinders from an Intensity Image. In Proceedings of the International Conference on Pattern Recognition, 1994 (to appear).

Appendix A Proofs

Proof of Theorem 3.4

Let P1 = S(t, s1) and P2 = S(t, s2) be two corresponding points on two different cross-sections. The line L joining these points can be parameterized as follows:

    u = [u(t) sin α (r(s1) − r(s2))] m + u(t) sin α r(s1)    (A.1)
    v = [v(t) (r(s1) − r(s2))] m + v(t) r(s1)    (A.2)
    s = [s1 − s2 + u(t) cos α (r(s1) − r(s2))] m + s1 + u(t) cos α r(s1)    (A.3)

Case 1: r(s1) ≠ r(s2). The intersection point M of L with the SHGC axis (the s-axis) is given by setting u = v = 0, which implies m = −r(s1) / (r(s1) − r(s2)), and M has coordinates (0, 0, (s2 r(s1) − s1 r(s2)) / (r(s1) − r(s2)))^T, which are independent of t (no matter what pair of corresponding points on the cross-sections is used).

Case 2: r(s1) = r(s2). In this case the direction of L is given by the vector (0, 0, s1 − s2)^T, which is independent of t and parallel to the SHGC axis. □
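The theorem can be checked numerically; in the sketch below, the surface parameterization follows equations A.1-A.3, while the particular cross-section, sweep and angle are arbitrary choices made for the example (not values from the thesis):

    import numpy as np

    alpha = np.radians(70.0)
    r = lambda s: 1.0 + 0.3 * s          # linear scaling, so r(s1) != r(s2)
    u = lambda t: np.cos(t)              # cross-section curve
    v = lambda t: np.sin(t)

    def surface(t, s):
        return np.array([u(t) * r(s) * np.sin(alpha),
                         v(t) * r(s),
                         s + u(t) * r(s) * np.cos(alpha)])

    s1, s2 = 1.0, 2.0
    points = []
    for t in np.linspace(0.1, 2 * np.pi, 7):   # avoid u(t) = 0
        p1, p2 = surface(t, s1), surface(t, s2)
        m = -p1[0] / (p1[0] - p2[0])           # solve u-component = 0
        points.append(p1 + m * (p1 - p2))      # line L(m) = P1 + m (P1 - P2)
    points = np.array(points)
    # All correspondence lines meet the axis (u = v = 0) at one fixed point.
    assert np.allclose(points[:, :2], 0.0, atol=1e-9)
    assert np.allclose(points[:, 2], points[0, 2], atol=1e-9)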
Proof of Corollary 3.4

An algebraic proof is not necessary, as it is similar to the previous one, except that the expression of the lines of correspondence is in the image plane. Instead, we give the following argument: if the correspondence lines are parallel to the axis, then they are also parallel in the image (orthographic projection); otherwise the intersection property also holds in the image, since intersecting lines in 3-D project onto intersecting lines in the 2-D image (general viewpoint). □

Proof of Property 3.5

    C1(u') − C1(u) = ∫ r1(s) ds = ∫ r2(as + b) ds = (1/a) ∫ r2(t) dt = (1/a) (C2(v') − C2(v)). □

Proof of Property 3.7

From equation 3.7, at limb points we have cos(θ1 − α) = cos(θ2 − α) = 0, and thus sin(θ1 − α) = −sin(θ2 − α) = ±1 for each s along the axis. Therefore, from equation 3.19, the 2-D axis point is given by (0, 0)^T, the origin of the local 2-D frame which, as discussed in section 3.1.2, is the projection of the axis point A(s) no matter what the viewing direction (given locally by α and β) is. □

Proof of Property 3.8

Substituting the results of the previous proof in equation 3.18, we obtain the expression of the cross-section segment

    r (0, ±2)^T    (A.4)

The 2-D axis having been shown to be the projection of the 3-D axis, its tangent vector is the projection of the 3-D tangent vector. This latter, from the projection equation 3.9, is given by (−sin β, 0)^T, which is orthogonal to the previous cross-section segment. □

Proof of Property 3.9

Using equation 3.18, the length of the cross-section segment is 2r, where r is the constant 3-D cross-section radius. □

Proof of Property 3.10

From equation 3.16, since the right-hand side is a function of s only, we have cos(θ1 − α) = cos(θ2 − α) and thus sin(θ1 − α) + sin(θ2 − α) = 0. Substituting this in equation 3.19, the 2-D axis point is given by

    (r / 2) (cos β [cos(θ2 − α) + cos(θ1 − α)], 0)^T

which is always a point on the u-axis of the local 2-D frame; i.e. the direction of the projection of the tangent to the 3-D axis. Note that unlike Property 3.7, this point does not coincide with the projection of the 3-D axis point A(s). However,
Proof of Property 3.15 It is sufficient to prove that tangents to the two sides of such a right ribbon are not parallel in the general case. Using equation A.5, the tangent to the C+ curve is given by C’+ (?) = [ 1 - m (?) k (?) ] a' (?) + m’ (?) n (?) and using equation A. 6, the tangent to the C. curve is given by Q j (?) = [ l + m (?) k (?) ] a ’ (?) - m’ (?) tk (?). These two vectors are parallel if and only if their determinant vanishes. It is easy to show that this latter is -2 m’ (?). Therefore, the only condition under which these two vectors are parallel is m ’ (?) = 0; i.e. only at extrema of the ribbon scaling function or in the case of a constant sweep □. 251 Appendix B Analysis of the Correspondence Finding Methods of SHGCs B .l SHGCs of Type 1 To find the corresponding point Q of P is equivalent to finding the proper scaling of the cross-section such that it is tangential to both boundaries B and C [69]. Given the translated cross-section Cjsuch that P and its “matching point” X coincide, then define the function fp ci(Q ‘) = dist (P, Q1 ) / dist (P, R*) where Qi is a point of the boundary C and R l the intersection of the line P Ql with the cross-section Cj (see figure B. 1 .a). The corresponding point Q is a point of C for which fpciiQ ) is a local minimum; i.e. fp (Q) < fp (Qk) for Qk in the vicinity of Q. (B.l) A C2 a. b. Figure B .l Finding the local SHGC patch correspondences. 252 This property was given without proof in [69]. We give its proof in what follows. Proof. Let us assume without loss of generality that both B and C are limb boundaries (the case of extremal meridians is similar). Then by the nature of limb boundaries, the 3-D surface is locally convex, in the viewing direction, along both B and C and in particular in the vicinity of the corresponding point Q. This implies that the image of the cross-section C2 that passes through P and Q is locally convex at < 2; i.e. locally lies on one side of C at Q (figure B.l.b). This local convexity of C2 can be expressed as follows: if we perturb the line P-Q by a small angle then the new line P-Q1 (Ql on the boundary C) will first hit the cross-section C2 then the limb boundary C (since the cross-section lies on one side of C at Q). Denoting Sl the point of intersection of the line P-Q1 with C2, and Ri the point of intersection of the line P-Q1 with the translated cross-section C L , we have dist (P, SO ^ dist (P, Q‘) (B .2) with equality at Q. Using Property 3.1, we know that Cx and C2 are linearly parallel symmetric with apex (intersection of all lines of symmetry) at P. Thus the scaling r between C* and C2 is uniform and we obtain r = dist (P, Q) / dist (P, R ) = dist (J P , Sl) / dist (P, R l) (B.3) 253 Combining the right most term with equation B.2, we can write in the vicinity of Q r =fP (Q) =dist (P, Q) / dist (P, R ) < dist (P, Ql) / dist (P, R l) = fP (Ql) (B .4) with equality at Q. Therefore the ratio r (the true scaling) is a local minimum of the function f P (<2) □. B.2 SHGCs of Type 2 The basic idea of the elliptic approximation of the generating region of the cross- section is the convexity and the tangentiality of the cross-section with the limb boundary, both of which are also the essence of the method and its proof in the previous section. Given a generating curve region R G of the cross-section as shown in figure B.2, and the actual corresponding point Q of a given point P, then at the point of contact between R G and the boundary (i.e. 
Q), the tangent to RG is parallel to the tangent to the boundary at Q. Hence, so long as the tangent at R G is the same, the local minimum property of the previous section still holds if the curvature of RG changes, to obtain another region RG , in a way that maintains its locally convexity. Therefore if R G is chosen to be the portion of an ellipse with the same tangent at R g , the local minimization method can still be used. However, since the orientation of R q (or the point Q) is not known a priori, multiple ellipse orientations must be used. One of them (up to orientation quantization) will find the point Q. 254 B P Figure B.2 Local convexity and tangentiality o f generating region are key to finding correspondences. Along a boundary the orientation of the ellipse that finds the correct co-cross sectional point should not change much (it does not change at all for surfaces of revolution and LSHGCs). In case it does, some inaccuracies may result by using the same ellipse orientation along a boundary. This is alleviated by accepting only the corresponding points (which locally minimize the scaling) for which the tangentiality and the convexity properties are satisfied (up to errors). An issue is which eccentricity to use for the ellipse. Normally, any eccentricity which results in generating regions with higher curvatures than the mapped boundaries should work. This can be done adaptively depending on the boundaries. In the implementation however, a fixed eccentricity of | has been used as it results in relatively high curvature elliptic regions. 255 Appendix C Further Analysis of Curved-Axis Primitives C .l Equivalence Classes of PRCG Cs We show that the family of PRCGCs with the same cross-section and the same axis plane and shape (not necessarily same size) which differ in the position of the point of contact between the axis and the cross-section forms an equivalence class. This property is similar to the pivot theorem given by Shafer and Kanade for SHGCs [67]. Let us state it formally as a theorem. Theorem C .l: The family of PRCGCs which differ only in the relative position of their axes with respect to the cross-section form an equivalence class with respect to their shape. Proof: To prove this theorem, we have to show that the shape of a PRCGC remains unchanged when its origin is displaced within its cross-section plane. This can be achieved by showing the independence of the shape of a PRCGC to the cross-section parameterization C (0) = (p (0) cos 0, p (0) sin 0)r written with respect to some origin O which is not necessarily the centroid of the cross-section (see figure C .l). For this, it is sufficient to prove that the surface 256 F igure C .l E quivalent P R C G C s having different origins. normal at a point on the surface is independent of the above parameterization. From equations 3.2 and 3.3, the surface normal of a PRCGC can be shown to be but this is precisely the normal to the cross-section curve at the same point when the cross-section is written in the Frenet-Serret frame as Using the fact that a normal to a curve is independent of its parameterization [23], the theorem is proved □. C.2 Size-Constancy of the Cross-Section Segments of PRCGCs The parallel symmetry Property 3.6 is independent of the cross-section shape of a PRCGC. However the size constancy Property 3.9 was given only for circular 0 - (psin0 + pcos0) ^ p c o s 0 -p s in 0 j C (0) = (0, p (0) cos 0, p (0) sin 0) cross-sections. 
It is useful to analyze whether this constancy property also holds for non-circular PRCGCs. In the following we show that, contrary to intuition, the size constancy property does not necessarily hold in general. To simplify the analysis we will consider two common types of cross-section, a smooth rotationally symmetric curve and a rectangular one with one side parallel to the axis plane and the other orthogonal to it. C.2.1 PRCG Cs with Smooth Rotationally Symmetric Cross-Section The definition of a rotationally symmetric curve is given first. Definition C .l: a closed curve C (0) is rotationally symmetric if for all 0, we have C (0) - P = - (C (0 + Jt) - P), where P is the center point of the curve. In other words, any line through P intersects the curve C at opposite points with equal distances from P. Simple instances include circles, ellipses, body shapes (figure C.2). Many objects we see in our environment have rotationally symmetric cross-sections. The following proposition is a useful step in the analysis. Figure C.2 A rotationally symmetric curve. 258 Proposition C.l: For a PRCGC with a rotationally symmetric cross-section, if a point P(s, 0) is a limb point then P (s, 0 + it) is also a limb point. Proof. From the limb equation 3.6, P (s, 0) is a limb point if and only if cot (0 - a ) = - p (0) / p (0) (C.l) Since C is rotationally symmetric then p (0) = p (0 + rt) and thus p(0) =p (0 + 7t). Therefore, we can write cot ((0 + n) - a) = cot (0 - a ) = - p (0) / p (0) = - p (0 + rt) / p (0 + n); which is equivalent to P (s, 0 + n) being a limb point □ . Using the above argument and replacing 0j by 0 and 02 by 0 + n in equation 3.11, the cross-section segment can be expressed as e . - 2p (0) ( C O S sfn “ g ^ “ >) (C.2) The length of this segment is given by = 2p (0) a / cos2(3cos2 (0 - a ) +sin2( 0 - a ) (C.3) The value of | ? J is not constant in general because p (0) is not constant and a and (3 vary independently of the cross-section shape (they vary with the axis). C.2.2 PRCGCs with Rectangular Cross-Section For a rectangular cross-section with one side orthogonal to the axis plane and the other parallel to it, we can analyze the size of the segments in both directions. 259 C.2.2.1 Analysis of the Side Orthogonal to the Axis Plane The cross-section side which is orthogonal to the axis plane joins two points P(s, 0!) and P(s, 02) such that p (0 ^ cos 0i = p (02) cos 02 and the expression of the of 3-D segment in its Frenet-Serret frame is (0,0, p (02) sin 02 - p (0X ) sin 0*)*; i.e. parallel to the binormal. Using the projection equation 3.9, its image is given by < £ s = (cos (3sin a [ p (02) sin 02 - p (0i) sin 0j], cosa [p (0 ^ sin 02 - p (0 ^ sin 0X ])* where 0j and 02 are constant (there are no limb points in this case). The length of this vector is given by ||£ J = |p (0 2) sin02 - p (O p sinOjj V cos^Psin^ot^cos^a which after developing the term under the radical is equivalent to |£*J = |p (0 2) sin02 - p (0j) sin0jj J l - sin2a s in 2|3 (C.4) But since the viewing direction P is a constant vector then its coordinate in the direction of the binormal S is constant; i.e. P . t> = sin a sin |3 = constant (C.5) and since 0! and 02 are constant so is |p (0 2) sin02 - p (0j) sin0j| . Therefore ||<?s|| is constant regardless of the viewing direction □ . The property of the cross-section segments is in fact stronger since they are also parallel. An illustrative example is given in figure C.3. 
It can be shown that if a sequence of cross-section segments are all parallel and have constant size, then they must be orthogonal to the axis plane; i.e. the property we gave is necessary and sufficient, under orthographic projection and a general viewing direction.

Figure C.3 Cross-section segment lengths of a PRCGC with a rectangular cross-section (constant-length, parallel segments vs. non-constant-length segments).

C.2.2.2 Analysis of the Side Parallel to the Axis Plane

Using a derivation similar to the previous one, the side parallel to the axis plane can be expressed in the image as

    e_s = (cos β cos α [ρ(θ2) cos θ2 − ρ(θ1) cos θ1], −sin α [ρ(θ2) cos θ2 − ρ(θ1) cos θ1])^T

and its length is given by

    ||e_s|| = l √(1 − cos²α sin²β)    (C.6)

where l = |ρ(θ2) cos θ2 − ρ(θ1) cos θ1| and m = sin α sin β. In this expression l and m are constant. The former is the 3-D length of the side segment, and the latter is the cosine of the angle between the viewing direction P and the axis binormal b (see the analysis in the previous section); i.e. m = cos δ. Because of the dependency on α, ||e_s|| is not necessarily constant in the image (see figure C.3).

C.3 An Additional Property of Circular PRGCs

We can derive an additional useful property of circular PRGCs that relates the image of the orientation of an extremal cross-section and the 2-D axis tangent. By the nature of the primitive, the cross-section orientation is the same as the axis orientation at a given extremity of the primitive (see figure C.4.a). From Property 3.13, the 2-D axis tangent at that extremity is "almost" parallel to the projection of the 3-D axis tangent which, in the image, is the orientation of the minor axis of the ellipse (the projection of the circular cross-section). Therefore the following property holds.

Figure C.4 Extremal cross-section and axis orientations are constrained in the image. a. coincident in a regular cut; b. non-coincident in an irregular cut.

Property C.1: In the image of a circular PRGC cut along its cross-section, the 2-D axis tangent at a given extremity is "almost parallel" (almost orthogonal) to the ellipse minor (major) axis at that extremity.

The extent to which the two directions are "almost parallel" (or almost orthogonal) is the same as the one given by the analysis of the quasi-invariant Properties 3.12 and 3.13 in section 3.1.2. This property can be used in classifying the cut of a circular PRGC, useful information for inferring 3-D shape.

C.4 Non-Circular PRGCs

When the cross-section of a non-constant, planar, right generalized cylinder is not circular, Properties 3.12 and 3.13 do not have the same ranges of values and space sizes as when it is circular; i.e. the angles γ and φ have a wider range of variation over the same parameter space region. The particular behavior depends on the cross-section shape (as was the case for PRCGCs and the 2-D cross-section segment constancy property). It is difficult to analyze the variation of those angles for all types of cross-sections. Our experiments on rotationally symmetric cross-sections showed values of γ up to 25° from 90° for sample objects. The rate of change depends on the curvature of the cross-section at the limb boundaries, besides its dependence on the thickness ratio and the sweep derivative. As the curvature departs from a constant function, the angular range starts to increase. Therefore, right ribbon approximations of the image surface of non-circular PRGCs are in general not as good estimates of the projection of the 3-D descriptions as they are for circular PRGCs.
Therefore, right ribbon approximations of the image surface of non-circular PRGCs are in general not as good estimates of the projection of the 3-D descriptions as they are for circular PRGCs. 263 Appendix D Evaluation of the Inferred 3-D Descriptions of Circular PRGCs In this appendix we analyze the “goodness” of the inferred 3-D descriptions of circular PRGCs. We only address circular PRGCs because their recovery uses two levels of approximations, one is when the projective descriptions are approximated by the quasi-in variant Properties 3.12 and 3.13 and the other is at the 3-D recovery step. These two aspects of the “goodness” of the results are discussed below. D .l Variation of the 3-D Descriptions as Functions of the Parameter Space In the following, we give an analysis of the variation of the recovered 3-D descriptions, as a function of the parameters ( a , (3 , e, r) , when the projection of the 3-D axis orientation is approximated by a vector orthogonal to the cross-section segment. This orientation has been shown to be “almost” orthogonal to the cross- section segment and therefore we wish to analyze the sensitivity of choosing an exactly orthogonal one on the 3-D shape recovery when the parameters change (this analysis can also be thought of as an analysis of the sensitivity of 3-D shape to small perturbations in image measurements of the axis orientation). 264 From the recovery method given previously, the 2-D axis orientation affects the 3-D orientation of the cross-section, the position of the 3-D axis point on the axis plane and the radius (scaling) of the recovered cross-section. The principle of the analysis is as follows: given the analytical expression of the cross-section segments (equation 3.18), the limb equation 3.22, the actual 3-D radius ra of the cross-section, the position of the actual 3-D axis point p a = (0,0, 0)r in its local Frenet-Serret frame and the actual 3-D axis tangent (or the cross-section orientation) = (1,0,0)* in the local frame, what are the variations of the recovered 3-D radius rr with respect to ra, of the recovered 3-D axis point p r with respect to p a and of the recovered 3-D axis orientation tr with respect to ta when (a , |3, e, r) vary in the parameter space, given that in the image the projection of the 3-D axis tangent is approximated by the line orthogonal to the cross-section segment. For this, we assume that the axis plane orientation is given (for example, by observing the two extremal cross-sections). We have carried this analysis for the space of parameters used for the analysis of the quasi-invariant Properties 3.12 and 3.13, namely a € [0, n], (3 € (0, t c ), e < : 0.5 and |r| < 0.5. We omit the details of the mathematical manipulation as they are similar in spirit to the derivation of those quasi-invariant properties. Figure D. 1 .a shows the plot of the size of the parameter space for which the angle between tr and ta is within several upper-bounds. Figure D .l.b shows the plot of the size of the parameter space for which the ratio of rr to ra is within different bounds and 265 figure D .l.c shows the plot of size of the parameter space for which the ratio of the distance between p r andp a to the cross-section radius is within different bounds (i.e. relative closeness of the axis points with respect to the cross-section radius). 
From the plots of figure D.1, it can be seen that the recovered 3-D description is close to the actual one over most of the parameter space. For example, r_r is within 2% of r_a over 99% of the parameter space, the displacement of the axis point is within 5% of the cross-section radius over 89.6% of the space, and the angular displacement of the cross-section orientation is 5° or less over 93.4% of the space. The displacements tend to be larger only close to degenerate regions of the parameter space, namely where limbs do not exist, or where both the thickness ratio and the sweep derivative take high values (non-common shapes).

D.2 Similarity of the 3-D Descriptions with the Ground Truth

We have applied our method to synthetic object boundaries generated from objects with known intrinsic GC descriptions and compared the inferred descriptions with them. Figure D.2.a and b show the objects used for the analysis, and figure D.2.c shows the recovered objects from different viewing directions. Figure D.3.a shows the ground truth 3-D axis and scaling functions of the two objects, and figure D.3.b shows the ones recovered by our method from the 2-D boundaries of figure D.2.a and b. The results do "look" similar to the ground truth. However, we can quantify their similarity. For this, we define the following measures.

Figure D.2 Synthetic objects and the resulting 3-D descriptions. a. original (synthetic) boundaries; b. inferred 3-D descriptions.

Figure D.3 Plots of the ground truth (a) and recovered (b) 3-D axes and scaling functions for the objects of figure D.2.a and b.

D.2.1 Similarity of the Recovered Axis with the Ground Truth Axis

Let the ground truth axis be a_G, with arclength parameterization a_G(t), and the recovered axis a_R, with arclength parameterization a_R(t). Those parameterizations are in discrete form, in the same direction and with identical total arclength a_T (this can be ensured by scaling either axis). Furthermore, the recovered axis a_R is expressed in the reference frame with respect to which a_G is expressed (by bringing their reference frames into coincidence). For each point p_R of the recovered axis a_R, let its corresponding point (with the same arclength) on the ground truth axis a_G be p_G. The distance (error) measure between p_R and p_G is given by the normalized distance

    d_a(t) = d_a(p_R(t), p_G(t)) = ||p_R − p_G|| / a_T

where ||v|| is the norm of the vector v. This measure is equal to 0 for a perfect reconstruction of the 3-D axis and increases for poorer reconstructions.
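A minimal sketch of this axis measure, assuming both axes are already resampled at the same arclength values and registered in a common frame:

    import numpy as np

    def axis_distance(a_R, a_G):
        """Normalized axis error d_a(t) between a recovered axis a_R and
        a ground truth axis a_G, both (n,3) arrays sampled at the same
        arclength values and expressed in the same reference frame.
        """
        a_R, a_G = np.asarray(a_R, float), np.asarray(a_G, float)
        # Total arclength of the ground truth axis (sum of segment lengths).
        a_T = np.sum(np.linalg.norm(np.diff(a_G, axis=0), axis=1))
        return np.linalg.norm(a_R - a_G, axis=1) / a_T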
D.2.2 Similarity of the Recovered Scaling Function with the Ground Truth Scaling Function

Let the ground truth scaling function be r_G(t) and the recovered scaling function be r_R(t), where t refers to the above-mentioned arclength. r_G(t) is normalized with respect to the radius of a chosen extremal cross-section of the ground-truth circular PRGC. Similarly, r_R(t) is normalized with respect to the radius of the corresponding extremal cross-section of the recovered circular PRGC. The distance measure between the radii r_G(t) and r_R(t) is given by

d_r(t) = d_r(r_R(t), r_G(t)) = 1 - min(r_R(t), r_G(t)) / max(r_R(t), r_G(t)).

This measure is equal to 0 for a perfect reconstruction of the 3-D scaling function and increases with poorer reconstructions; it is upper-bounded by 1.

We have computed these "distance" measures between the ground truth and recovered 3-D axes and between the ground truth and recovered 3-D scaling functions shown in figure D.3. Table D.1 summarizes the comparison results by giving the minima, maxima and means of d_a and d_r. Recall that the values are ratios. The results indicate that the recovered descriptions are "close" to the actual ones.

object      min(d_a)  max(d_a)  mean(d_a)  min(d_r)  max(d_r)  mean(d_r)
fig. D.2.a  0         0.012     0.008      0.0003    0.016     0.005
fig. D.2.b  0         0.035     0.019      0.00002   0.02      0.003

Table D.1 Similarity measures between recovered descriptions and ground truth descriptions.

However, these measures cannot always be used, because the ground truth is in general not known and the viewed boundaries could, in principle, have been projected from an infinite number of 3-D objects. A more adequate measure is one which compares the results with human perception; this is briefly discussed in section 6.3.4.

Appendix E
Control of the Search and Complexity

In section E.1, certain aspects of the implementation pertinent to reducing the search cost are discussed. Section E.2 gives the time complexity of the key algorithms of the method.

E.1 Control of the Search

The methods described have been implemented with the concern of reducing the search cost. The two main mechanisms used are indexing and constrained search. We give their principle and some examples below.

E.1.1 Indexing

Most steps require pairwise analyses of image features such as boundaries and surface patches. At the curve level, a spatial index is used to access image boundaries within a desired region, such as the vicinity of a boundary end. The image is partitioned into m × n cells of size 2^k × 2^k pixels each (the size used is k = 5, i.e. 32 × 32 pixel cells) and the boundaries passing through each cell are directly accessible through the position of that cell in the index table. Figure E.1 shows the overlay of the spatial index on the image of figure 4.12, showing the indexing cells and the points used to index the boundaries within each cell. At the curve level, this index is used to find the local compatibility described in section 4.1 and greatly reduces the number of boundaries examined at each boundary's end. The same index is used in the cross-section hypothesis step of section 4.3.1 and in the verification steps of sections 4.3.3 and 5.1.3.

Figure E.1 The spatial index used for direct access to image boundaries.

At the surface patch level of both the SHGC and PRGC modules, a surface patch index is used that directly gives the local surface patches (and their ends) which lie in an image region of interest. This index also partitions the image into a number of cells, in each of which are indexed the surface patches (if any) whose 2-D axis point belongs to that cell. This index is used in the surface grouping steps of sections 4.3.2 and 5.1.2.
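The curve-level index is essentially a uniform grid hash over pixel coordinates. The sketch below shows one plausible realization, assuming boundaries are stored as point lists; the function names and the one-cell query neighborhood are illustrative choices, not the thesis implementation.

```python
from collections import defaultdict

CELL = 32  # 2**k with k = 5, as described above

def build_spatial_index(boundaries):
    """Map each grid cell to the set of boundaries passing through it.
    `boundaries` is a list of point lists [(x, y), ...]."""
    index = defaultdict(set)
    for bid, pts in enumerate(boundaries):
        for (x, y) in pts:
            index[(int(x) // CELL, int(y) // CELL)].add(bid)
    return index

def boundaries_near(index, x, y, radius_cells=1):
    """Boundaries registered in the cells around (x, y), e.g. near a
    boundary's end during the local-compatibility search of section 4.1."""
    cx, cy = int(x) // CELL, int(y) // CELL
    found = set()
    for dx in range(-radius_cells, radius_cells + 1):
        for dy in range(-radius_cells, radius_cells + 1):
            found |= index.get((cx + dx, cy + dy), set())
    return found
```

With such an index, the pairwise tests at each boundary end touch only the few boundaries returned by boundaries_near rather than all n image boundaries, which is the source of the n versus n_e factors in section E.2.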
E.1.2 Constrained Search

Constraining the search to relevant regions of a search space is another way in which the time complexity is reduced. This is done by imposing certain constraints as early as possible in the search process instead of leaving them as verification tests once the search has terminated.

For example, at the symmetry level, each boundary is associated with the angular extent of its B-spline segments, and the parallel symmetry detection method is not applied to a boundary pair if there is no overlap between their angular extents (a minimal sketch of this pre-check is given at the end of this appendix).

At the local SHGC patch of type 1 detection step, the boundaries in the vicinity of the hypothesized cross-section at hand are first accessed and checked for valid junctions with it. If no such junctions are found, the cross-section hypothesis is rejected and no SHGC patch detection is attempted for it. If certain boundaries do form valid junctions, they are first used to detect local SHGC patches. The resulting patches (if any) are used to define image regions in the direction of their surface sweep, and the search for the other patches is focused in those regions. The detection of local SHGC patches of type 2 and of local PRGC patches proceeds by defining, for each boundary, a surrounding region whose size is proportional to the boundary's length and then finding patches with other boundaries in that region. Junctions, closure patterns and joints are detected in a similar fashion.

E.2 Time Complexity

E.2.1 Curve Level

The boundary grouping method described in section 4.1 has complexity O(n^2), where n is the number of boundaries. Using the spatial indexing of the image boundaries, the complexity becomes O(n·n_e), where n_e is the maximum number of boundaries in the vicinity of a given boundary's end for the neighborhood size used (n_e ≪ n).

E.2.2 Symmetry Level

The detection of parallel symmetry has complexity O(n^2·b^2), where n is the number of boundaries and b the number of B-spline segments per boundary. The grouping step of section 4.2.2 has complexity O(m^2), where m is the number of parallel symmetry elements, with m = O(n^2) in the worst case. In practice, however, not all curves are parallel symmetric to one another, so the number of parallel symmetries is much smaller (see tables 5.1 and 6.2 for the number of symmetries for some of the images used in this thesis).

E.2.3 Surface Patch Level

The cross-section finding method of section 4.3.1 has complexity O(n·n_e^d), where n is the number of boundaries, n_e the maximum number of boundaries in the vicinity of a boundary's end (the branching factor of the search using the spatial index, n_e ≪ n) and d the maximum depth imposed on the search.

The method for ruling local SHGC patches of type 1 of section 4.3.1 has complexity O(n^2·p) for each hypothesized cross-section, where n is the number of boundaries and p the maximum number of points in a curve. In practice, the complexity is much lower because not all boundaries are examined: the focused search described in the previous section first considers boundaries in an area around the cross-section, then those in the areas defined by the found local SHGC patches. The complexity thus becomes O(n_a^2·n_b^2·p), where n_a is the number of boundaries "around" the hypothesized cross-section and n_b the number of boundaries in the search area defined by each local surface patch.

The method for ruling local SHGC patches of type 2 of section 4.3.1 has complexity O(n·n_e·k·p) when applied to all image boundaries, where n is the number of boundaries, n_e the number of boundaries in the region of interest around each boundary (n_e ≪ n), k the number of sampled ellipse orientations and p the number of points in a boundary. The Hough-like method of [55] has complexity O(n^2·k·p^2) when applied to all image boundaries.
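To make this comparison concrete with purely illustrative numbers (assumed here, not measured in the thesis): for n = 200 boundaries, n_e = 10, k = 16 ellipse orientations and p = 100 points per boundary, the constrained method costs on the order of n·n_e·k·p = 200 × 10 × 16 × 100 = 3.2 × 10^6 operations, whereas the Hough-like method costs on the order of n^2·k·p^2 = 200^2 × 16 × 100^2 = 6.4 × 10^9, i.e. a factor of (n/n_e)·p = 2000 more.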
The method for ruling local PRGC patches of section 5.1.1 has complexity O(n·n_e·p_s·(s + s·b)), where n is the number of boundaries, n_e the number of boundaries in the region of interest around each boundary, p_s the number of parallel symmetry axes between a pair of boundaries (in practice, p_s ≪ 1 on average, implying that few boundary pairs admit parallel symmetry), s the number of steps in the angular search (s = Δθ/δθ) and b the number of B-spline segments of an axis of parallel symmetry.

The surface patch grouping (for SHGCs and PRGCs) has complexity O(m·m_e), where m is the number of local surface patches and m_e the number of local surface patches in the vicinity of a given surface patch's end (accessed through the surface patch index described in section E.1).

The junction finding of the closure verification step has a complexity which depends on the type of junction searched. Three-tangent junctions may require up to O(n_e^2), where n_e is the number of boundaries in the vicinity of a part's end; this is because one boundary of the part provides one branch of the junction and the search is for the other two branches. Finding each of the other junctions (L-junctions, T-junctions and cusps) has complexity O(n_e). The search for the connectivity between a pair of junctions takes O(n_e^d), where d is the depth limit of the search.

E.2.4 Object Level

The joint detection step has complexity O(n_p^2), where n_p is the number of parts (the junctions are given by the parts' descriptions).
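Finally, here is the angular-extent pre-check promised in section E.1.2, as a minimal sketch. The interval representation and names are illustrative assumptions; in particular, tangent angles are assumed precomputed per boundary, and wrap-around of extents at 2π is ignored for brevity.

```python
def extent(tangent_angles):
    """Conservative angular extent [lo, hi] of a boundary's B-spline
    segments, from its precomputed tangent angles (radians)."""
    return min(tangent_angles), max(tangent_angles)

def overlaps(e1, e2):
    """True if two angular extents intersect."""
    return e1[0] <= e2[1] and e2[0] <= e1[1]

def candidate_pairs(per_boundary_angles):
    """Yield only the boundary pairs worth passing to the (expensive)
    parallel symmetry detector -- the pre-check of section E.1.2."""
    extents = [extent(a) for a in per_boundary_angles]
    for i in range(len(extents)):
        for j in range(i + 1, len(extents)):
            if overlaps(extents[i], extents[j]):
                yield i, j
```

The pre-check itself is O(n^2) in the number of boundaries, but with a constant far smaller than that of the symmetry detector, so pruning pairs here pays off whenever few pairs actually overlap.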