A MODULAR APPROACH TO HARDWARE-ACCELERATED DEFORMABLE MODELING AND ANIMATION

Copyright 2003 by Clint Chester N. Chua

A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the Requirements for the Degree
DOCTOR OF PHILOSOPHY (COMPUTER SCIENCE)

August 2003

Clint Chester N. Chua

This dissertation, written by Clint Chester N. Chua under the direction of his dissertation committee, and approved by all its members, has been presented to and accepted by the Director of Graduate and Professional Programs, in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY. August 12, 2003.

Acknowledgements

It has been a while since I started the program, and along the way I have received so much help from many people. First and foremost, I would like to thank Dr. Ulrich Neumann, my advisor, for teaching me not only valuable lessons regarding my thesis development, but also lessons in life and interpersonal relationships. I will cherish these lessons and moments and carry them forward as I turn over a new chapter in my life. Thank you so much for helping me build up my character and values.

I would also like to thank my doctoral and qualifying exam committee members. Dr. Aristides Requicha, while strict with me during the defense, caused me to fully understand the fundamental issues concerning my field of research. He is the catalyst that made me think harder and faster on my feet. Dr. Alice Parker is very special and dear to me, not only because she has seen my development first hand since my undergraduate career, but also because she was instrumental in getting me to think outside the box and to see the bigger picture. I'd also like to thank Dr. Isaac Cohen, who helped me flesh out some heavy mathematics and provided insight into the Finite Element Method, a truly difficult subject to grasp.
Thanks to Dr. Gaurav Sukhatme, who practically saved my qualifying exam; without him, there would not have been one. Special thanks go to Dr. Mathieu Desbrun and Dr. James Gain for their help in designing and understanding some algorithms that I needed.

During the program, I also got to meet and interact with the greatest researchers I have ever met, my CGIT labmates. Just watching them work inspired me to better myself and to be like them. They were my role models in more ways than one. There is the cheerful Young Kwan, the compassionate Ilmi, the hardworking Jun, the intelligent Tae-Yong, the unwavering Jun-Yong, the unbiased Doug, the outspoken Bolan, the upbeat Mo, and the rational Deng. Over time, some of these great traits of theirs filtered into me and profoundly changed me.

Special thanks go to J.P. Lewis. He is one of the most unique individuals I have had the opportunity to work with. Without his input, I would not feel that I have written a great thesis. Always full of ideas, J.P. is truly deserving of special recognition for his past achievements and the ones that are sure to follow. More power to you, J.P.!!

Special thanks also go to Douglas Fidaleo, a truly good friend indeed. Doug has proven his worth as a friend through thick and thin; he never wavered when I really needed him the most. Doug always had a unique perspective on things that gave me a breath of fresh air. His most outstanding trait, one I wish more people had, is his sense of fairness. Doug is always fair towards others, and even when I did not want to relent, he made sure I did it the right way. Thanks for keeping me on the straight path, Doug!!

I would also like to recognize Didi Yao. Though a recent friend, Didi propelled himself to fame in my book by completing his degree in such a short time and still having the time to teach me how to ingest alcoholic beverages while debugging code. Midori Sours all around!!

I would like to thank my parents and my brothers: my parents for making it possible for me to come to the United States. Without them, I would not have had this opportunity to experience this learning process. I would like to thank my brother for training me in getting started and taking care of myself here in the U.S. Life is truly different here.

And finally, I would like to acknowledge my wife, Eun-Young Elaine Kang. She stuck with me through thick and thin. She inspired me to keep going, even when I had self-doubt about my progress or ideas. Elaine was instrumental in getting me to push forward no matter what. Elaine, I thank you from the bottom of my heart, and I love you very, very much.

I thank God for watching over me during all this time. God bless you all!!

Table of Contents

Acknowledgements
List of Tables
List of Figures
List of Equations
Abstract
Chapter 1 Introduction
  1.1 Problem Statement
  1.2 Approach
  1.3 Motivation
Chapter 2 Related Work
  2.1 Evolution
  2.2 State of the Art
  2.3 Current Trends
  2.4 Conclusion
Chapter 3 The Framework
  3.1 Goals and Benefits
    3.1.1 Hardware Acceleration
    3.1.2 Standard Deformation API
    3.1.3 Modular Deformation Primitives
  3.2 Issues
  3.3 Conclusion
Chapter 4 Hardware Accelerated Free Form Deformation
  4.1 FFD Basics
  4.2 Lattice Translation Table
  4.3 Extrapolation
  4.4 Hardware Acceleration
    4.4.1 Evaluator Support
      4.4.1.1 Control Point Cache
      4.4.1.2 Polynomial Coefficient Register File
    4.4.2 Vertex Programming
    4.4.3 Control Point Access with AGP Support
      4.4.3.1 Review of AGP Concepts
      4.4.3.2 Control Point Access
  4.5 Conclusion
Chapter 5 Layered Approach
  5.1 System Overview
  5.2 Mesh Refinement
    5.2.1 Midpoint Subdivision
    5.2.2 Representation
    5.2.3 Subdivision Criteria
    5.2.4 Curvature Based Criteria
      5.2.4.1 Analytic Normal Transformation
      5.2.4.2 Approximation Through Indirect Calculation
    5.2.5 Hardware Accelerated Subdivision
    5.2.6 Alternative Adaptive Algorithm and Representation
  5.3 Controlling Mechanisms
    5.3.1 Radial Basis Function
    5.3.2 Skeletal Animation Systems
    5.3.3 Directly Manipulated Free-Form Deformation
    5.3.4 Mass Spring Systems
    5.3.5 Finite Element Methods and Other Physical Controllers
  5.4 Conclusion
Chapter 6 Modular Approach
  6.1 Introduction
  6.2 Selection of Core Techniques
  6.3 Analysis of Core Computation
  6.4 Design of a Graphics Co-processor
  6.5 Emulation on Current Graphics Hardware
  6.6 Conclusion
Chapter 7 Summary and Future Work
  7.1 Summary
  7.2 Future Work
Bibliography
Appendix A
  A.1 Cg Program for Quadratic Uniform Free-Form Deformation
  A.2 Cg Program for Radial Basis Functions
Appendix B
  B.1 Proposed API Specification

List of Tables

Table 1: Layered Abstraction Table
Table 2: Timing Table
Table 3: Basic Setup API
Table 4: Rendering Primitives

List of Figures

Figure 1: OpenGL 2.0 Overview
Figure 2: Global and Local computation
Figure 3: A 2-D view of the lattice grid
Figure 4: The Lattice Translation Table
Figure 5: Extrapolation procedure
Figure 6: Block-level design of the OpenGL Evaluator Sub-block
Figure 7: Data Flow of the Pipeline with optional components
Figure 8: Accelerated Graphics Port
Figure 9: Graphics Address Remapping Table
Figure 10: Side Band Addressing
Figure 11: System Overview
Figure 12: Subdivided Triangle
Figure 13: Termination Cases
Figure 14: Cascading vertex
Figure 15: Tree Representation
Figure 16: Boundary Cases
Figure 17: Progressive Degradation
Figure 18: Pictorial Overview
Figure 19: New Design
Figure 20: Register Design Close-up
Figure 21: Various Loops
Figure 22: Quadratic Free-Form Deformation
Figure 23: Pose-Space Deformation
Figure 24: Radial-Basis Functions

List of Equations

Eq. 1 through Eq. 10

Abstract

Deformable objects are those whose shape over time cannot be described by a simple rigid movement. The addition of deformable objects to a virtual scene not only increases realism but also broadens the application's utility. People and clothing are examples of important object classes that cannot be realistically portrayed without modeling deformation. Unfortunately, whereas rigid body motions have standard APIs such as OpenGL and Direct3D, existing animation systems treat deformation as a collection of special cases. Supporting deformable objects in an animation system traditionally requires building a specific deformation engine for each type of deformation needed. These deformation engines may overlap in terms of both algorithmic components and function. Such overlap leads us to question whether a single "general" approach to deformation is possible.
Can a variety of different approaches and algorithms be subsumed under a common framework and API? And if this is possible, can it be integrated with existing 3D graphics hardware architectures?

My approach partitions the computation of general character animation deformation into global and local computations. Global computations are those that require knowledge of other primitives, whereas local computations require only a small set of parameters. By identifying the common set of operations underlying several different deformation primitives, I develop a generalized hardware-accelerated framework for deformation. Due to this hardware design's flexibility and simplicity, it provides a basis for a standard deformation API.

Chapter 1 Introduction

Static objects populate most virtual environments. While this may be sufficient for some applications, the addition of deformable objects to a scene will not only increase realism but also broaden the application's utility. For instance, training medical students using virtual surgery simulations on the human body is made possible with deformable objects. Special effects and CG-rendered movies rely on deformation techniques to animate characters. A virtual museum that is no longer restricted to displaying rigid body artifacts can now showcase amorphous solids and non-rigid objects. Another common application of deformable techniques is character animation: animation techniques applied specifically to 3D human avatars, including facial expression, speech, and body articulation. These examples show that as we move away from simplistic static and rigid body environments such as architectural walkthroughs, existing applications can be made more compelling and engaging.

But existing animation systems treat deformation as a collection of special cases. For instance, in animating a CG character, animators will need a bone-muscle system, a facial animation system, a cloth animation system, and a hair animation system. Despite the independent implementation of each technique, one can see that there are processes involved in the different deformation techniques that are similar. This means that although each deformation technique is different, they differ only in how they achieve a desired behavior or shape; the manipulation of the mesh remains the same. On the other hand, rigid body motions have standard APIs such as OpenGL and Direct3D that formalize the hardware architecture of a 3D graphics pipeline and identify the necessary and common components of a 3D renderer. This raises the question of whether a similar standard deformation API is possible.

Our proposed idea is therefore to impose a structure on deformation engines and to see how it can be incorporated into a standard 3D graphics pipeline. The proposed architecture draws upon common elements of available deformation techniques and formulates a standard API and its related hardware components.

Traditionally, independent and monolithic systems are built to solve specific applications such as cloth animation or facial animation, but in a realistic application these systems do not work independently; they work simultaneously. Let us take for example a boy playing basketball.
From this scenario, we can already point out many deformable animation systems that are required to achieve realism. They include cloth animation, facial animation, hair animation, skeletal controllers, and physical dynamics. Notice that as we take away each of these systems one by one, we reduce the realism of the basketball game. From this example, we can see the benefits of a standard deformation API.

• During the early stages of development of a basketball game, a programmer would most likely be interested in creating an initial prototype. One well-known advantage of using a standard, well-designed API is faster prototype creation. In this case, the programmer is not overly concerned with correct physical behavior but would still like to have some rudimentary deformable objects in the prototype.

• Another benefit of a standard API is that the programmer can spend more time on the development of better physical controllers. Since the API already provides the fundamental layers of deformable objects, the programmer can devote more effort to creating faster or more realistic controllers.

• Finally, the programmer can integrate these systems more easily, since each deformation system conceptually performs similar operations. By reusing these common layers, as shown later, the system can function more efficiently.

While the proposed framework does not cover all types of deformation techniques, it is still applicable to a wide variety of deformable objects. By imposing structure on deformation engines and creating a standard API, it becomes easier to accelerate certain functions in hardware.

1.1 Problem Statement

Including deformable objects in an animation system requires building a specific deformation engine. Typically, this would only satisfy a small part of the whole animation system. As the demand for more realistic animation systems increases, there is a need to incorporate more than one deformation technique. Incorporating several deformation techniques traditionally involves implementing each system independently. Since deformation is commonly used in character animation, I will use deformation systems and character animation systems synonymously for the rest of this thesis.

As mentioned before, despite the various deformation techniques being outwardly dissimilar, many of these techniques share a common set of processes. These processes may overlap in terms of either conceptual function or a commonly used algorithm. Such overlaps call into question the efficiency of an animation system that includes yet isolates each technique. This observation leads us to our main problem: whether or not a common deformation process is possible. If it is possible, how compatible is it not only with other deformation techniques but also with existing 3D graphics hardware? In addition, how can a general framework be imposed upon all the common sub-processes in order to combine them and bring about a standard API for deformation?

1.2 Approach

The proposed framework attempts to solve the questions raised in the previous section by identifying and segmenting the different processes involved in any deformation system. A structure is then imposed upon these processes in order to standardize and formalize the division of labor of a deformation system.
Issues that are considered in designing the architecture include:

• Compatibility with existing graphics pipelines
• Ease of use of the deformation engine
• The class of deformations that can be represented

Deformation algorithms are surveyed to identify common attributes shared with other deformation systems and potential for reuse. Candidate techniques are then selected to fulfill the duties of the common processes. This results in a proposed framework for deformation systems.

Our approach to a modular framework is to properly identify the different processes involved in most character animation techniques. After surveying a large set of character animation techniques, I propose that character animation deformation can be divided into two general classes of computation: global and local. Global computations are characterized by the need to either propagate or collect information to and from neighboring primitives. The key observation here is that the evolution of the system depends not only on the current parameters defining a local primitive, but also on the interactions between that primitive and possibly other external factors. Typical examples of this form of calculation are collision detection and response, physically based modeling of a phenomenon, or direct user interaction with an object. Local computations are computations that depend solely on the current set of parameters defining the primitive. A typical example of this form of calculation is per-vertex or per-triangle operations.

This first-order analysis suggests that the global computation portion of character animation should remain on the host processor, where all the global state information is stored, accessed, and updated. Local computations should be done on a local computation engine, a paradigm analogous to OpenGL. The next step in the design process is to properly define the structure and capabilities of this local computation engine. Our approach to this next step is to develop a set of representative primitives, or core computational approaches, that are able to complement or supplant the local computations required by most character animation techniques. This can be thought of as both an identification of fundamental computation techniques and a method to simplify the set of basis techniques required by most character animation.

The final step is to design a system that implements the set of core computational techniques. This gives the developer access to a standard set of computation techniques that have been specialized for character animation on graphics hardware. The key benefits of structuring the framework as presented include:

a) Efficient utilization of resources, where global computation runs on the host processor and local computations run on graphics hardware (the OpenGL paradigm).
b) Identification of a core set of computational techniques that can be readily used in a variety of character animation techniques.
c) Hardware acceleration of the core techniques is possible.
d) An easily scalable architecture that is applicable to a large library of character animation techniques.
e) Shorter development times for character animation plug-ins while retaining modularity.
f) The GPU is freed up to perform special effects (lighting and shading), so that both deformation and special effects can be performed simultaneously.

1.3 Motivation

The migration of commonly performed tasks to specialized processors serves as a great motivation for this work. Over the history of computing, a variety of examples illustrate the performance gained from such migration. Before graphics became commonly used in computer applications, all graphics-related computation ran on the host processor, including vector graphics redrawing. The first leap was to create a simple rasterizing display chip that handled the mundane and repetitive task of refreshing the screen. This opened a new world of possibilities: the host processor could undertake more complex calculations since it was freed from the screen refresh duty. Thus a trend emerged in which a specialized processor always outperformed general-purpose computers when performing its intended, specific task. The reason for this is quite clear. Whereas the general-purpose computer cannot predict the data flow or the types of commands that it will execute, the specialized processor deals with only a very small set of instructions, usually in a set pattern of execution. This simplicity and regularity can be exploited by hardware designers to obtain better performance.

The migration of functionality first dealt with fundamental primitives like screen refreshing, shading, and increasing pixel fill bandwidth. While these operations are still what most graphics card manufacturers attempt to improve, the idea of closely integrating more sophisticated graphics functionality into the graphics card system became more popular. This sparked the inclusion of texture mapping, the depth test, and lighting calculations. This migration trend still has not slowed down: the latest graphics cards include new and complex functionality such as programmable vertex shading. This suggests that performance can always be improved by including needed functionality on the graphics card.

Myers and Sutherland [Myers 1968] made an astute observation on the design of specialized processors. They stated that the design of a specialized processor will eventually expand its functionality until it resembles a general-purpose processor with a special set of instructions. This new general-purpose processor will then reach a critical point and, in turn, spawn off another simple specialized sub-processor. This cyclical pattern is dubbed the "Wheel of Reincarnation." Based on Myers and Sutherland's observation, I believe that the current generation graphics processor has already reached the critical point of its developmental cycle with the introduction of Cg, "C for graphics" [Nvidia 2003]: the graphics chip today has matured to resemble a general-purpose processor with a set of specialized instructions. I predict that the next stage of evolution will be to spawn off a new specialized sub-processor to further accelerate other functions. My hypothesis is that the next generation of graphics cards will include support for character animation on the GPU. One of the fundamental operations used by character animation systems is geometric deformation, and thus it is this geometric deformation system that is the focus of this work.
OpenGL provides both a standard API and a model that all graphics card designers use to provide a set of core functionality in every graphics card. In this thesis I pursue a similar goal: to identify the core functionality that is required by a character deformation system.

Chapter 2 Related Work

Most of the previous approaches to deformation are isolated systems developed to solve a specific class of object deformation and thus are typically stand-alone systems. Other works concern themselves with increasing the performance and efficiency of a specific technique or extending the capabilities of a specific technique. With the ever-increasing demand for realism in computer animation, hardware systems with rudimentary support for deformable modeling are starting to appear. Despite all this work in deformable modeling and animation, no standards have been proposed or adopted. This section gives a brief overview of the field.

2.1 Evolution

The beginnings of deformable surfaces can be traced back to when the first curve representations were introduced as compact and reliable representations for the computer. With the pioneering work of Bezier and others, it was discovered that, using a set of basis functions and control points, one could define a curve, such as the B-spline, based on the positions of the control points. Since it took only a few control points to represent continuous curved segments, this compact representation of a curve could easily be stored and shared. Naturally, this technique was extended to represent 3D curved surfaces. Deformable objects could now be modeled and animated using control points. By moving the control points that represented the surface, the object would smoothly alter its shape in response to the new control point positions.

More geometric techniques evolved from this in order to animate 3D objects and surfaces. Barr introduced geometric deformation as a first-order concept [Barr 1984]. Instead of looking for a deformable object representation, Barr introduced the notion of a deformation primitive, an operation applied to a model. Barr's technique was limited to axis-aligned objects and included simple twists and bends. This sparked research directions in Free-Form Modeling, whose primary focus was defining a set of useful primitives that would allow artists to easily sculpt and create free-form solids.

Not much later, the Free Form Deformation (FFD) primitive was introduced [Sederberg 1986]. This tool allowed us to embed models into the volume of the FFD, and as the artist manipulated the volume, the shape of the model changed accordingly. As it turns out, the mathematics of the FFD is an extension of curves and surfaces to a 3D space, hence a volume. Again, we are dealing with control points that demarcate a bounding volume, and controlling the positions of the control points can deform any object within the zone of influence. The first incarnation of the FFD also had some limitations. In order to keep the whole FFD process simple, it was necessary to constrain the bounding volume of the FFD to be axis-aligned, and the shape of the volume was required to be a rectangular parallelepiped.
So it was natural that the second incarnation of the FFD, called the Extended FFD, removed these restrictions while keeping the mathematics the same as the original FFD. Artists were now able to use a wider set of bounding volumes, from cylinders and spheres to even star-shaped volumes. The cost of removing this limitation was an increase in pre-processing time [Coquillart 1990].

Unfortunately, since these techniques were a purely geometric type of deformation, the behavior of the object was not intuitive and thus required a lot of manual work to create physically plausible behavior. One attempt to automate this process was to implement simple Hooke spring dynamics [Christensen 1997]. The mass-spring system was easy to implement and was quite fast. When used to control the object deformation, the mass-spring system gave physically plausible results. When more realism was required, researchers turned to the fields of physical object mechanics and material science, and thus Terzopoulos adopted the Finite Element Method for deformable object animation [Terzopoulos 1987]. The Finite Element Method (FEM) is similar to the mass-spring system in that the equations and techniques are borrowed from mechanics. Just as mass-spring models are represented by point masses and a web of springs, the FEM partitions the object into a set of finite elements, and appropriate links are created to represent the relationship between each element in order to facilitate the calculation of continuous functions representing the internal stress and strain within the object. In order to accomplish dynamics with an FEM approach, Terzopoulos derived potential energy functions for these finite elements and used a solver to minimize the object's potential energy [Terzopoulos 1987]. The resulting resting state of the object then determines the final shape. Since this method is firmly grounded in the physics that governs elastic bodies, the results obtained from using FEM are by far the most realistic. The drawback to this method is that the minimization process is quite slow and may not be applicable to a highly deformable object that may no longer retain its shape [Gibson 1997].

2.2 State of the Art

Recent developments in this field can be categorized into two distinct classes, geometric and physically based deformations. Since the introduction of the aforementioned techniques, the research in this field has moved towards solving specific problems such as physical correctness for a specific material, efficiency and speed of deformation algorithms, or new deformation techniques for free-form sculpting. The most recent developments are summarized below.

There has been a continued effort to find new ways to deform objects and new ways to represent deformable objects. Some of the more notable developments in geometric deformation include skinning (the process of embedding a skeletal structure that manipulates the shape of the mesh, otherwise known as the "skin") and new deformation techniques. In a recent technical demonstration, Rhythm and Hues, a visual effects company in Los Angeles, showed how they were able to render life-like animals in movies such as Stuart Little, Cats and Dogs, and Scooby Doo. What they demonstrated was a standard skinning system augmented with a physically plausible muscle system.
While the idea of layering a muscle system between the skin and bones is not a particularly new idea, most work in this area tends to use physically correct simulators such as FEMs to calculate the correct amount of muscle expansion and contraction. At the other extreme, some skinning systems simply omit the muscle layer and perform linear interpolation based on the bone influence and position to manipulate the skin. The demonstrated system included physically plausible muscle systems and was still able to run in real time without the high-fidelity shading, rendering, and fur effects. With this system, the artists were able to reproduce effects such as muscle bulging and skin wrinkling, thus giving a better overall realism to the animated character.

Free Form Deformation is a popular tool used to sculpt and animate objects, but until recently FFD was capable of using only a restricted set of bounding volumes. FFD and its earlier variants were only able to work with regular solids such as rectangular parallelepipeds, cylinders, spheres, and the like. Although this restriction does not limit the ability of the FFD to deform most common objects, FFD was unable to properly manipulate an arbitrary object with a single bounding volume. Recently a new method was proposed to apply the FFD technique to arbitrarily shaped objects at the expense of the mathematical simplicity of the earlier FFD variants. The details can be found in [MacCraken 1996].

A recent entry into geometric deformation is WIRES. Unlike its predecessors, WIRES uses an axial deformation paradigm instead of a bounding solid paradigm. As its name implies, this deformation primitive uses curves to demarcate its bounding volume. The authors drew inspiration from clay models with actual wires in them that give the clay a skeleton to stick to; the shapes of the clay models were manipulated by changing the underlying wire's shape. This method is analogous to an FFD that uses solely cylindrical shapes as the bounding volume. In fact, it can be shown that FFDs and the WIRES system are related, since WIRES was able to mimic the behavior of a rectangular parallelepiped FFD [Singh 1998].

Instead of dealing with mesh representations, another line of work approaches deformation by choosing a better representation for deformable objects. The work in this area includes the use of Metaballs [Blinn 1982] and implicit surface representations [Menon 1996] for deformable objects. While these may indeed be a better internal representation (to the algorithm) of a deformable object, rendering them usually relies on some form of mesh representation (or renderable representation), since most graphics hardware is optimized for meshes. One advantage of this approach, though, is that since the internal representation is decoupled from its renderable representation, creating a highly tessellated mesh is equivalent (with respect to algorithm complexity) to creating a lower-resolution tessellated mesh.

Another interesting approach to object deformation is Pose Space Deformation (PSD) [Lewis 2000]. The idea behind PSD is to map all possible object configurations into an abstract parameter space known as pose space.
Within pose space, a rich set of interpolatory techniques can be applied to points in pose space (representing different object configurations) to generate new shapes and object configurations. The advantage of this technique is that it combines disparate deformation types into a unified approach.

New advances in physical controllers were primarily concerned with achieving real-time performance using FEM. Since FEM methods are grounded in material science and mechanics, these techniques give the most physically correct behavior. Two notable research directions recently propelled the FEM into contention as a real-time deformation system. The first technique [James 1999] uses Boundary Element Methods (BEM). The main difference between an FEM and a BEM is that whereas the FEM simulates finite elements throughout the entire volume of the object, the boundary element model only simulates finite elements lying on the boundary of the object. The second paper, entitled Dynamic Real-Time Deformation using Space and Time Adaptive Sampling [Debunne 2001], focuses on how to efficiently perform the FEM only on regions of interest. This is based on the observation that when interacting with a deformable object, one typically does not cause the entire object to deform; only a specific region is affected.

A recent addition to the FEM methodologies focuses on using simplistic FEM axial-based meshes to solve Skeleton Subspace Deformation (SSD) specific problems. The idea is to introduce fast and physically plausible secondary animation (muscle flexing or fat jiggling) to existing skeleton-based deformation techniques by embedding the object's surface and SSD skeletal system inside a simple FEM mesh roughly approximating the shape of the object. The authors show that despite using a low-resolution FEM simulation, it is possible to obtain fast and physically plausible deformation.

New techniques such as EigenSkin [Kry 2002] and DyRT [James 2002] attempt to leverage the programmability of the current generation of graphics cards. The introduction of "C for graphics" (Cg) has made it possible to tap into the power of the graphics card to perform non-rendering tasks on the GPU. More on the programmability of current graphics cards will be discussed in the next section. EigenSkin reduces the number of parameters normally required to represent the different degrees of freedom in PSD. By reducing the number of parameters, it becomes possible to feed the data stream (vertex positions, deformation parameters, and surface normals) into the limited input stream size that current graphics cards can handle. The result is a compact PSD representation that allows the graphics card to easily evaluate the deformation. DyRT represents different modes of deformation via Principal Component Analysis (PCA) coefficients. The appropriate deformation is then reconstructed using these PCA coefficients and additional deformation parameters. Thus the host processor is freed from computing deformation; it only provides the necessary coefficients and deformation parameters to the graphics card.

2.3 Current Trends

Although the research community has not proposed any deformation API, the consumer graphics card manufacturers have started to include interesting features related to deformation systems.
In this section we look at some examples of how consumer graphics cards and systems include capabilities that start bringing deformation algorithms to the hardware.

The first system that we will discuss is Sony's Playstation 2. The PS2 is composed of two nearly identical RISC processors, the Emotion Engine (EE) and the Graphics Synthesizer (GS). The most interesting feature of these processing elements is that this is one of the first commercial graphics systems that has ten floating point multiply-accumulate (FMAC) hardware units on chip. While there is nothing novel about having multiple FMAC units in commercial digital signal processing chips, the PS2 is the first that makes this fundamental operation available in a low-cost mass-market system. FMAC units are widely used in graphics operations such as dot products, matrix multiplication, antialiasing, alpha blending, discrete convolutions, and other primitive operations. In addition, FMAC units are very useful in B-spline based surface rendering, and these units can thus be readily used to implement FFD on the PS2. The software API would then provide the interface to the low-level assembly instructions that control the FMAC units for an FFD. With such a multiprocessor architecture, application programmers are able to concurrently draw static environments with the GS and do calculations such as physically correct deformation on the EE. Once the GS is done rendering the static objects, the results from the EE are passed on to the GS to complete the rendering of the deformable object.

The next system we will investigate is the GeForceFX and its variants. The GeForceFX supports two key features that suggest it is also capable of supporting deformation in hardware. The first feature is the nFinite Effects Engine, also referred to as vertex programming. This engine was originally intended to give the graphics programmer full control over the transformation and lighting (T&L) calculation of the pipeline in order to implement more complex shading operations such as Phong shading, bump and displacement mapping, and even procedural texturing. This feature was made possible by exposing the input/output registers, the state registers, and the proprietary instruction set of the T&L pipeline to the application programmer. Due to the general SIMD instructions available to the T&L pipeline, programmers are able to modify vertex positions as they are passed into the pipeline. This allows the programmer to write procedural deformation microcode using the T&L pipeline.

The second key feature of the GeForceFX graphics chip is a built-in capability for NURBS surface rendering. Since the FFD can be thought of as a three-dimensional version of a B-spline surface, it is not that difficult to extend the existing NURBS surface renderer into an FFD volume deformation tool by adding a third dimension to the NURBS evaluator system. However, due to the current Nvidia API and implementation of the NURBS surface renderer, any extension to the system would still require a more involved process of API design and hardware architecture. In its current implementation, the NURBS surface renderer is designed to work as a "black box" where the programmer can only control the control points and knot positions. The graphics pipeline then evaluates a uniform grid in order to build the NURBS surface.
In order for the FFD to work appropriately in this environment, Nvidia would need to make the current NURBS surface renderer visible to the application programmer.

The final system discussed is ATI's Radeon 7500 and 8500 series of graphics cards. Compared to the aforementioned graphics cards, the ATI Radeon provides the most interesting features among commercial graphics cards available today. These features include the vertex morphing system and the N-gon tessellation system. What makes these features the most interesting is that ATI system designers included them in order to address the issue of character animation in hardware, and they have parallels in our proposed system. These techniques were first presented at the Game Developers Conference 2000, where ATI showed emerging trends of including animation systems in hardware and of developing standard OpenGL API extensions supporting these systems. This trend validates the need to formulate a standard API for deformation systems.

ATI's current venture into this trend begins with the Radeon's vertex morphing capabilities. The vertex morphing system was originally intended to increase character animation performance by supporting skinning and rudimentary facial animation techniques in hardware. This was achieved via a key framing system that took source and target models and a set of vertex-by-vertex correspondences between these models, then linearly interpolated between the two models. Since the source and target models and their correspondences are not constrained to any specific mesh model, such a 3D morphing technique can be thought of as a geometric surface deformation system. Thus with the ATI Radeon graphics cards, programmers are able to take advantage of a simple key-frame morphing system.

The Radeon's N-gon tessellation system is also of interest since it has parallels to the proposed framework. In addition, having multi-resolution techniques available in hardware is important since nearly any deformation of any object will cause surface wrinkles. The programmer thus has two options: either work directly with a highly tessellated mesh or include a multi-resolution technique to refine the object further in order to maintain representation accuracy. Although working with a highly tessellated mesh seems to be the method of choice of most practitioners due to its simplicity, this limits both the rendering speed and the number of other objects that can share the scene with the deformable object. On the other hand, working with a multi-resolution technique allows the programmer to apply the deformation operation to a smaller number of vertices and then automatically increase tessellation afterwards. The only drawback of this technique is that creating these additional triangles eats up processor time, time that could have been spent on physical simulation or other application-related tasks. The ATI Radeon brings us one step closer by implementing the multi-resolution technique in hardware. So we still get the benefit of applying the deformation operation to a smaller number of vertices, but since the increase in mesh resolution is offloaded to the graphics card, there is no longer a CPU time penalty associated with it.
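The key-frame vertex morphing described above reduces to a per-vertex linear blend between corresponding source and target positions. The following is a minimal illustrative sketch in C written for this edit; it shows the host-side arithmetic only and is not ATI's hardware path, driver API, or the thesis's own code.

#include <stddef.h>

typedef struct { float x, y, z; } Vec3;

/* Blend each source vertex toward its corresponding target vertex.
 * t = 0 reproduces the source model, t = 1 reproduces the target model. */
void morph_vertices(const Vec3 *src, const Vec3 *dst, Vec3 *out,
                    size_t count, float t)
{
    for (size_t i = 0; i < count; ++i) {
        out[i].x = (1.0f - t) * src[i].x + t * dst[i].x;
        out[i].y = (1.0f - t) * src[i].y + t * dst[i].y;
        out[i].z = (1.0f - t) * src[i].z + t * dst[i].z;
    }
}

Because each output vertex depends only on its own source and target positions and the blend factor, this is exactly the kind of local, per-vertex operation that can be moved onto graphics hardware.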
So what we see from these available commercial graphics cards is that we can already migrate deformation systems to the hardware, and that companies such as ATI have already acknowledged the need to include animation primitives in hardware. This reinforces the need for a hardware/software architecture and API for deformable modeling and animation.

In a more recent development, the OpenGL Architecture Review Board (ARB), the board governing the revision of the OpenGL standards and specification, has met to discuss the schism in functional standards proposed by the various graphics hardware companies. Due to the proprietary nature of the extensions that companies have provided, software developers now have to commit twice the effort to support the same functionality across the various graphics cards. Although the newly proposed OpenGL 2.0 specification is a significant step, the OpenGL ARB focused on the unification of hardware programmability via a standard vertex shading language. What is still missing from OpenGL 2.0 are standards for deformation (Figure 1). It is only a matter of time before the ARB will need to extend OpenGL once again when graphics hardware manufacturers introduce a new schism in deformation standards.

Figure 1: OpenGL 2.0 Overview

2.4 Conclusion

This section discussed the evolution of the different approaches to deformable object animation, beginning from Barr's pioneering work up to today's hardware-accelerated support. Throughout this time, researchers focused on either improving these techniques or refining them to work in specific situations. While there has been improvement in these two directions, building a system incorporating several of these techniques has not been addressed effectively. Simply including the various techniques may lead to functional overlap and thus affect the performance and complexity of the system. In the following chapters, I begin the discussion of a first attempt at designing a framework to solve these problems.

Chapter 3 The Framework

In order to understand the deformation framework better, let us take a look at a typical deformation system as shown in Figure 2. Most deformation systems are implemented as independent and monolithic systems, as shown on the left side of Figure 2. The main reason most researchers do this is that their goal is to solve a specific problem in deformation, and they therefore exclude any other unnecessary features or techniques in order to concentrate on the results of their technique. Taking a closer look at any existing deformation technique, such as a Finite Element Method (FEM) controller, I observe that a deformation system can be abstracted into two different layers, as shown in Figure 2.

Figure 2: Global and Local computation

The first layer, called global computation, refers to computation that requires knowledge of the properties or parameters of neighboring primitives. For instance, the computation needed to calculate how a force applied to an FEM model is spread among the finite elements requires knowledge of the shape of the object and of how the finite elements interact with each other.
Global computation by nature requires random access to the deformable object and thus requires immediate access to the whole object's data structure. Because of this requirement, the most efficient place to execute such computation is the host processor, since this is by default where the whole deformable object's data structure resides. In addition, parts of the deformation calculation also entail the interaction between two or more distinct deformable objects. Access to both objects' current state information, including current shape, orientation, and external forces, is also easily stored and handled in the existing host processor architecture. The main purpose of this layer is then to separate what part of the deformation should run on the host processor and what should not.

The second layer, called local computation, is characterized by computation that operates on fundamental graphics primitives such as vertices or triangles and a corresponding set of supporting animation parameters. The idea behind this layer is to move repetitive computation off the host processor to a specialized processor and thus free up computation time on the host. This layer assumes that knowledge of any other primitive is not required and that the host processor passes the primitive and the animation parameters to the specialized processor. A key characteristic of local computation is the parallelizability of operations on the data stream. Thus a computation is classified as a local computation if, in theory, one is able to create independent, non-communicating threads of execution for each primitive.

This division of labor is analogous to how current rigid body animation is partitioned. Computation such as animation timing (e.g. ease in, ease out), animation paths, and animation interaction is done on the host processor, whereas the transformation, texturing, lighting, and filling is done on the graphics card. It is natural to extend this analogy to deformable object animation.

3.1 Goals and Benefits

3.1.1 Hardware Acceleration

One of the goals of this thesis is to outline how to accelerate the local computation layer in hardware. Therefore the techniques chosen for this layer must be implementable in hardware. I have already shown that there is a trend to extend the responsibilities of the graphics card to front-end mesh processing, including character animation. The next generation of graphics cards, such as Nvidia's GeForce FX and ATI's Radeon 8500, is starting to include rudimentary mesh functions such as vertex morphing, triangle subdivision, and vertex programmability. In addition, existing hardware units available for other purposes can easily be used to form the basis of a simple yet powerful deformation primitive.

The problem with the hardware manufacturers' current approach to the inclusion of deformation support is that it is limited in scope. The algorithms chosen for inclusion are usually simple and meant to do a specific type of deformation, such as skinning support on ATI Radeons. In contrast, our framework is designed to include a wider class of deformation techniques so that the animator can tailor the graphics card's capability to his specific needs.
If skinning is not needed, then the inclusion of skinning in hardware is questionable.

Ultimately, hardware acceleration also serves as a benefit to the system. If the system's layers were standardized, this would give graphics hardware manufacturers a set of standardized functions that can be included in the graphics chip. This would in turn give the software developer a new set of features to take advantage of when writing a deformation system. The two advantages gained by a system designer are that the burden of computation is shifted from a general-purpose CPU to a dedicated processor on the graphics card, and that the cost of computing the same algorithm on a dedicated processor should be less than the computation time on a general-purpose CPU.

3.1.2 Standard Deformation API

Just as OpenGL and DirectX unified what was once a disarray of non-standard methods of creating 3D applications, one goal of this work is to select a set of deformation primitives as a basis set of API calls that are readily available on all next-generation graphics hardware. Having a standard deformation API not only helps the developer avoid reinventing the wheel for these commonly used techniques, it also helps graphics hardware manufacturers optimize the design of graphics hardware units to incorporate such standard functionality.
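To give a feel for what such an API could look like at the call level, the following is a purely hypothetical C sketch written for this edit. It is not the thesis's actual proposal (that appears in Appendix B, which is not reproduced in this excerpt), and the names dfCreatePrimitive, dfPrimitiveParameters, and dfBindPrimitive are invented for illustration; the stub bodies only print what a real driver would forward to the graphics hardware.

#include <stddef.h>
#include <stdio.h>

typedef unsigned int DeformHandle;

enum { DF_FFD_LATTICE = 1, DF_RADIAL_BASIS = 2 };

/* Create a deformation primitive of a given type (e.g., an FFD lattice). */
static DeformHandle dfCreatePrimitive(int type)
{
    printf("create primitive of type %d\n", type);
    return 1u;                        /* stub handle */
}

/* Upload the primitive's animation parameters (e.g., FFD control points). */
static void dfPrimitiveParameters(DeformHandle h, const float *params, size_t n)
{
    (void)params;                     /* a real driver would copy these to the card */
    printf("upload %zu parameters to primitive %u\n", n, h);
}

/* Bind the primitive so that subsequently submitted vertices are deformed by it. */
static void dfBindPrimitive(DeformHandle h)
{
    printf("bind primitive %u for subsequent vertices\n", h);
}

int main(void)
{
    float control_points[16] = { 0 };               /* placeholder parameter data */
    DeformHandle ffd = dfCreatePrimitive(DF_FFD_LATTICE);

    /* Per frame: global computation on the host updates the parameters;
     * local per-vertex evaluation then happens downstream of the bind. */
    dfPrimitiveParameters(ffd, control_points, 16);
    dfBindPrimitive(ffd);
    return 0;
}

The intent mirrors the OpenGL division of labor described in this chapter: the host performs global computation and updates parameters, while per-vertex (local) evaluation is left to the hardware.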
Simplicity ensures that such a hardware design is feasible and straightforward and that hardware optimization techniques can be applied to it. Even if a special sub-processor exists that could accelerate computation for deformable object animation, it would not see much use if it could not interface with existing animation systems and graphics cards. This becomes a two-pronged test for the integrability of the sub-processor. First, the sub-processor must be easily integrated with existing graphics hardware. Second, a proper API must fit within the existing graphics APIs. Designing such a system would also be in vain if the system is not widely accepted. Thus the usability of the system is also an important consideration in the design process. One way to ensure that the system would be widely used is to design it to incorporate already well-known and widely used deformation techniques. In addition, incorporating several classes of popular deformation primitives is important to ensure that the programmer does not feel constrained to only a specific class of deformation. Finally, a system is made more valuable if it is evolvable. More precisely, if the design of a system does not include options for future growth, it will remain static and will easily become obsolete as the demands on specialized deformation processors change. Incorporating different paths of growth in hardware design, optimization, increasing capability and programmability ensures that the system can adapt to the ever-changing needs of deformable object animation.

3.3 Conclusion

This chapter discussed the fundamental principle of the separation of global and local computation used to design a system for deformable object modeling. Hardware acceleration, a standard deformation API, and modular deformation primitives were then introduced as the goals and benefits of our system. The chapter concluded with the key issues associated with the design of a successful framework, namely efficiency, simplicity, integrability, usability and evolvability. In the next chapter, I address a key point in the design of a framework, namely the selection, hardware unit design and architecture of the "core computation." This will serve as the fundamental building block around which the rest of the system is designed and is the key to unlocking a design that achieves the goals of efficiency, simplicity, integrability, usability and evolvability.

Chapter 4 Hardware Accelerated Free Form Deformation

The design of the framework began with a bottom-up approach. It is critical that the design of the core computation of the system be addressed first, since it will be the building block of all deformation calculation that runs as a hardware unit. Thus, taking into account usability and simplicity, free-form deformation (FFD) is selected as the core computation. Free-form deformation was introduced in 1986 as an implicit method of model sculpting, but since then it has branched out in many different directions, spawning many variants of the technique and novel uses in deformable object animation. The technique is widely used as a standard tool available in commercial animation packages and thus satisfies the requirement of usability.
As discussed in later sections, FFD is also represented by a simple mathematical formula, making it computationally efficient and easily amenable to hardware design. The first design iteration therefore focused primarily on the hardware design of the FFD sub-processor and all the corresponding issues of integrating the sub-processor with existing graphics pipelines.

4.1 FFD Basics

The analogy behind all types of FFDs is embedding an object into a piece of gelatin: as the gelatin deforms, the embedded object deforms with it. The different FFDs vary only in the initial shape of the gelatin in which the object is embedded. A standard FFD works solely with rectangular parallelepipeds. Arbitrary-topology FFDs allow any control lattice. The EFFD, while not as versatile as the arbitrary-topology FFD, can work with cylinders, spheres and other regular shapes more complex than a rectangular parallelepiped, while still using the efficient deformation equations of the standard FFD.

EFFD operation is divided into three steps. The first step embeds the object in the initial control lattice and computes the parameterized coordinates of the object. The next step moves the control lattice points to new locations, thus deforming the enclosed region of space. The last step calculates the deformed positions of every object point based on the new locations of the control points. The deformed object is then ready for rendering.

The embedding process starts with a lattice bounding a volume of space to be deformed. An object is embedded within the lattice by computing a transformation of coordinate systems from the object coordinate system to the local lattice coordinate system. The embedding process requires a solution to a system of non-linear equations. This system of non-linear equations is defined by the equations of deformation, so we first introduce these equations and then show how the embedding is done. Assuming that each object point X = (x, y, z) has a parameterized local coordinate (s, t, u) with 0 \le s, t, u \le 1, then with the set of control points P we calculate the deformed position q with

q(s,t,u) = \sum_{l=0}^{3} \sum_{m=0}^{3} \sum_{n=0}^{3} P_{i+l,\, j+m,\, k+n} \, B_l(s) \, B_m(t) \, B_n(u)    (Eq. 1)

where P_{i,j,k} is the (i, j, k)-th control point and the B's are the uniform cubic B-spline blending functions shown below. Thus, given a set of parameterized local coordinates, we evaluate the above equation to get the set of deformed points based on the new positions of the control points.

B_0(u) = \tfrac{1}{6}(1 - 3u + 3u^2 - u^3)
B_1(u) = \tfrac{1}{6}(4 - 6u^2 + 3u^3)
B_2(u) = \tfrac{1}{6}(1 + 3u + 3u^2 - 3u^3)
B_3(u) = \tfrac{1}{6}u^3    (Eq. 2)

For the embedding process, we need to derive the system of non-linear equations that determines the parameterized local coordinates. From Equation (1), we can derive the system of non-linear equations for the (s, t, u) parameterization of an object point X. The intuition behind the embedding procedure for the EFFD is that it is the inverse of the deformation of the object from the initial control lattice to a rectangular parallelepiped control lattice. Hence, the initial EFFD control lattice shape is constrained by the existence of a mapping, or morph, between it and a standard rectangular parallelepiped. The morph must not incur any space folding; that is, it must be invertible. From this we can see why the EFFD is not appropriate for lattices of arbitrary topology, since the morph from the arbitrary lattice to the rectangular parallelepiped may not exist.
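Before deriving the embedding equations, it may help to see the forward evaluation of Equations 1 and 2 in software. The following C++ fragment is only an illustrative sketch; the types and names (Vec3, evaluateFFD, the layout of the 4x4x4 cell) are assumptions made for this example and are not part of any existing API.

    #include <array>

    struct Vec3 { double x, y, z; };

    // Uniform cubic B-spline blending functions of Equation 2.
    inline double B(int i, double u) {
        switch (i) {
            case 0:  return (1.0 / 6.0) * (1 - 3*u + 3*u*u - u*u*u);
            case 1:  return (1.0 / 6.0) * (4 - 6*u*u + 3*u*u*u);
            case 2:  return (1.0 / 6.0) * (1 + 3*u + 3*u*u - 3*u*u*u);
            default: return (1.0 / 6.0) * (u*u*u);
        }
    }

    // Equation 1: deform one parameterized point (s,t,u) using the 4x4x4
    // block of control points whose lowest corner is P[i][j][k]; 'cell' is
    // assumed to hold that sub-lattice.
    Vec3 evaluateFFD(double s, double t, double u,
                     const std::array<std::array<std::array<Vec3,4>,4>,4>& cell) {
        Vec3 q{0, 0, 0};
        for (int l = 0; l < 4; ++l)
            for (int m = 0; m < 4; ++m)
                for (int n = 0; n < 4; ++n) {
                    double w = B(l, s) * B(m, t) * B(n, u);   // blending weight
                    q.x += w * cell[l][m][n].x;               // 64-term sum per axis
                    q.y += w * cell[l][m][n].y;
                    q.z += w * cell[l][m][n].z;
                }
        return q;
    }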
To derive the system of non-linear equations, we set the object point X equal to q, the deformed point in Equation (1). The goal is then to find a parameterization (s, t, u) that satisfies this equation. To find (s, t, u), we rearrange Equation (1) to obtain

\sum_{l=0}^{3} \sum_{m=0}^{3} \sum_{n=0}^{3} P_{i+l,\, j+m,\, k+n} \, B_l(s) \, B_m(t) \, B_n(u) - X_{i,j,k} = 0    (Eq. 3)

where X_{i,j,k} is the object point within the deformable region spanned by the control points P_{i,j,k} to P_{i+3,j+3,k+3}. Since this is a 3-D vector equation, we have one equation for each dimension and three unknowns, namely (s, t, u). We use a Newton-Raphson root-finding method with an initial guess of 0.5 to find (s, t, u). For a more detailed explanation of the EFFD process and properties, refer to [Coquillart 1990].

4.2 Lattice Translation Table

Our formulation of the FFD equation (Equation 1) uses the uniform cubic B-spline basis functions. The parameterized point depends on a 4x4x4 set of local lattice points that surround the object point. A 4x4x4 lattice grid is required to deform a cell volume within the lattice, as shown in Figure 3. For more complicated lattices with more than one cell volume, the process of embedding an object point also entails locating which cell volume controls the space in which the object point is located.

Figure 3: A 2-D view of the lattice grid, showing the 4x4 set of control points for a single cell of deformation, the region of deformation in the center of the bounded region, and the non-deformable phantom regions surrounding it.

Since the lattice shape is not always a rectangular parallelepiped, we propose an indexing structure for fast object-point to cell-location translation called the Lattice Translation Table (LTT), shown in Figure 4. The LTT is an axis-aligned bounding box of the lattice that is divided into regularly spaced cubes. For each cube that intersects the deformable area of the lattice, an index into the deformable area is kept in that cell. If more than one deformable region occupies a cube, only one index is kept, since the appropriate cell can be deduced by extrapolation from the results of the embedding process. This extrapolation mechanism is discussed in the next section.

Figure 4: The Lattice Translation Table: the LTT bounding box around the EFFD deformable volume (phantom regions omitted), with cubes swept in raster order (left to right, bottom to top) and cubes that intersect multiple lattice volumes marked.

The creation of the LTT is done in a raster-order sweep of each cube in the LTT. A set of test points in each cube is chosen, typically the center and the corner points of the cube, and these points are tested for their positions within the lattice volume by parameterizing them in the lattice. If the parameterization yields coordinates such that 0 \le s, t, u \le 1, then we have found a lattice volume-cube intersection. A good starting point for the raster sweep is to use the location of the control point at (0,0,0) in the lattice grid and then use the cube that contains that control point. Notice that some cubes that contain a lattice volume-cube intersection may be missed by this algorithm if the lattice volume is concave (e.g. an extruded star shape).
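As a rough illustration of this construction, the sketch below walks the cubes in raster order and records one lattice cell index per cube. It is a hypothetical C++ fragment: parameterize() is an assumed helper standing in for the Newton-Raphson embedding of Equation 3, and the cube size, bounding box and data layout are assumptions made for the example.

    #include <vector>

    struct Vec3 { double x, y, z; };
    struct CellIndex { int i, j, k; };

    // Assumed helper: Newton-Raphson solve of Equation 3 for (s,t,u) against
    // one lattice cell; returns true when 0 <= s,t,u <= 1.
    bool parameterize(const Vec3& p, const CellIndex& cell,
                      double& s, double& t, double& u);

    // Build an LTT over a bounding box divided into nx*ny*nz cubes; each
    // entry stores one lattice cell index (or {-1,-1,-1} when empty).
    std::vector<CellIndex> buildLTT(const Vec3& boxMin, const Vec3& cubeSize,
                                    int nx, int ny, int nz,
                                    const std::vector<CellIndex>& latticeCells) {
        std::vector<CellIndex> ltt(nx * ny * nz, CellIndex{-1, -1, -1});
        for (int cz = 0; cz < nz; ++cz)            // raster-order sweep of the cubes
          for (int cy = 0; cy < ny; ++cy)
            for (int cx = 0; cx < nx; ++cx) {
                // Test point: the cube center (corner points could be added too).
                Vec3 test{boxMin.x + (cx + 0.5) * cubeSize.x,
                          boxMin.y + (cy + 0.5) * cubeSize.y,
                          boxMin.z + (cz + 0.5) * cubeSize.z};
                for (const CellIndex& cell : latticeCells) {
                    double s, t, u;
                    if (parameterize(test, cell, s, t, u)) {
                        // Lattice volume-cube intersection found; keep one index only.
                        ltt[(cz * ny + cy) * nx + cx] = cell;
                        break;
                    }
                }
            }
        return ltt;
    }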
One solution to this missed-cube problem is to add more test points in each cube. Another option is to grow the existing cubes to encompass the entire volume of the lattice. This latter method takes the existing set of lattice volume-cube intersections and tests the neighboring cubes that do not yet have lattice volume-cube intersections. Neighboring cubes are chosen for testing based on control points that share edges with control points lying inside cubes that already have lattice volume-cube intersections.

The LTT is created as a pre-processing step, and each LTT is associated with a specific lattice. This means that a whole library of lattices with their associated LTT files can be created beforehand, without prior knowledge of the object to be deformed. Once the lattice and LTT structures are created, embedding the object points involves an LTT lookup of the lattice index belonging to the LTT cube that the object point is in. These indices are then used in the Newton-Raphson root-finding method for each point.

4.3 Extrapolation

The embedding process described in Section 4.1 can also be used to extrapolate the correct cell that an object point belongs to, provided a good initial guess is available. Since overlapping deformable volumes may exist in an LTT cube, only one lattice index in the LTT is used as a starting guess for all points within that cube. Since the parameterized coordinate is constrained such that 0 \le s, t, u \le 1, coordinates that do not satisfy this constraint mean that the object point does not belong to that lattice cell, and also that the object point lies in the direction of the axis that violates the constraint. In other words, if s < 0 or s > 1, then the lattice cell in the x-1 or x+1 position relative to the current cell, respectively, will contain the object point (Figure 5). Thus the process of embedding and extrapolation is as follows. First, the initial guess (x, y, z) of the lattice cell index is taken from the LTT. The object point is parameterized against that lattice cell. If it satisfies 0 \le s, t, u \le 1, then the object point lies in that lattice cell with parameterized coordinates (s, t, u). Otherwise, the lattice cell index (x, y, z) is decremented or incremented depending on whether s, t or u is less than zero or greater than one, respectively.

Figure 5: Extrapolation procedure: starting from the initial guess taken from the LTT, the search steps to the new, extrapolated deformable volume containing the object point.

4.4 Hardware Acceleration

Once the deformable object has been pre-processed by creating the auxiliary data structures described above, the evaluation of the FFD can be performed in hardware. Two methods are described here that demonstrate how FFD evaluation can be done. In addition, we describe an issue related to data access patterns and the hardware technology that can assist with it.

4.4.1 Evaluator Support

The first method is to instantiate hardware units that support direct FFD evaluation. This can be achieved by taking advantage of OpenGL evaluators. In OpenGL design documents, evaluators are pre-transformation units that feed vertices to the vertex pipeline. Their primary role is to evaluate the Bernstein basis functions for Bezier curve and surface rendering. Until now, no graphics hardware manufacturer has directly instantiated hardware support for OpenGL evaluators. Instead, most graphics hardware manufacturers opt to implement specialized extension hardware units that are specific to their implementation.
While this may indeed be a more efficient design with respect to performance, it does not afford programmers the same functionality as the OpenGL evaluators. Our design concentrates on using and extending standard OpenGL evaluators for FFD evaluation.

Our conceptual design (Figure 6) shows a three-stage pipelined trivariate EFFD evaluator. The first stage uses three polynomial evaluators to evaluate each blending function in parallel. The next stage is a four-input floating-point vector multiplier that multiplies the results of the blending functions with the control point vector. The last stage is the vector accumulator that computes the summations for each dimension in Equation (1). A 6-bit counter controls the asynchronous reset of the accumulators after every 64 additions. The system also includes two optional components: the register file for polynomial coefficients and the control point cache. Their functions are discussed in later sections.

Figure 6: Block-level design of the OpenGL evaluator sub-block, from input latches through pipeline stages 1 and 2 to the output latch, with the polynomial-coefficient register file and the control point cache feeding it and the result passed to OpenGL vertex rendering. (a) PE refers to a polynomial evaluator. (b) Stage 2 is a floating-point vector multiplier. (c) The output stage is a vector accumulator. (d) The counter's outputs go through a NOR gate attached to the asynchronous reset of the accumulator's register.

The base system without optional components works as follows. First, the parameterized coordinates (s, t, u) for a model vertex and a control point P are fed into the polynomial evaluators. Once the blending functions are evaluated, the results are multiplied together and this product is then multiplied with the control point vector. The final vertex coordinate is accumulated for each dimension after every 64 additions. With a pipelined architecture, vertex coordinates are fed into the pipeline continuously and removed at the same rate after a fixed processing latency. The deformed vertices are then passed to the transformation engine for rendering. A summary of the data flow is shown in Figure 7.

Figure 7: Data flow of the pipeline with optional components. Data flows through the pipeline with the general polynomial evaluators set to the uniform cubic B-spline blending function B_0.

4.4.1.1 Control Point Cache

This optional component ensures that the control points are available to the pipeline as required. The control points are loaded into the control point cache prior to sending the parameterized coordinates (s, t, u) for any vertices in a model. Once the control points are loaded, vertices in the model can be deformed efficiently. Since the model is embedded in the control lattice as a preprocess, a synchronized sequence of model vertices and control points can be computed and stored for all or any part of a model (like a vertex array) to optimize cache performance and minimize its required size. Although the control point positions change to obtain deformations, their number and their neighborhood relationships with the control lattice and the model vertices do not change. It is clearly possible to maintain either separate model and control point arrays or a single unified model and control point array. In either case, cache performance can be statically determined by the degree of array synchronization.
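The fragment below is not a hardware description; it is a cycle-agnostic C++ model of the datapath just described, written only to make the three stages and the role of the optional components explicit. The coefficient table, the ordering of the streamed control points and all names are assumptions for this sketch.

    struct Vec3 { double x, y, z; };

    // Stage 1 building block: a polynomial evaluator in nested (Horner) form,
    // a*x^3 + b*x^2 + c*x + d; three of these run in parallel on s, t and u.
    inline double polyEval(double a, double b, double c, double d, double x) {
        return ((a * x + b) * x + c) * x + d;
    }

    // Coefficients of the four uniform cubic B-spline basis polynomials
    // (rows B0..B3, columns a, b, c, d), as they might sit in the optional
    // polynomial-coefficient register file of Figure 6.
    const double kBSpline[4][4] = {
        {-1.0/6,  3.0/6, -3.0/6, 1.0/6},   // B0
        { 3.0/6, -6.0/6,  0.0,   4.0/6},   // B1
        {-3.0/6,  3.0/6,  3.0/6, 1.0/6},   // B2
        { 1.0/6,  0.0,    0.0,   0.0  }    // B3
    };

    // Behavioural model of the evaluator sub-block for one output vertex.
    // The control point cache is assumed to stream the 64 points of the cell
    // in the same (l, m, n) order used to pick the basis functions below.
    Vec3 evaluatorBlock(double s, double t, double u, const Vec3 controlStream[64]) {
        Vec3 acc{0, 0, 0};
        for (int cycle = 0; cycle < 64; ++cycle) {
            int l = cycle / 16, m = (cycle / 4) % 4, n = cycle % 4;
            // Stage 1: three polynomial evaluators in parallel.
            double bl = polyEval(kBSpline[l][0], kBSpline[l][1], kBSpline[l][2], kBSpline[l][3], s);
            double bm = polyEval(kBSpline[m][0], kBSpline[m][1], kBSpline[m][2], kBSpline[m][3], t);
            double bn = polyEval(kBSpline[n][0], kBSpline[n][1], kBSpline[n][2], kBSpline[n][3], u);
            // Stage 2: four-input multiply of the three weights and the control point.
            double w = bl * bm * bn;
            // Stage 3: vector accumulation; the counter resets this after 64 additions.
            acc.x += w * controlStream[cycle].x;
            acc.y += w * controlStream[cycle].y;
            acc.z += w * controlStream[cycle].z;
        }
        return acc;   // deformed vertex, handed on to OpenGL vertex rendering
    }

In hardware, of course, the three evaluations, the multiply and the accumulate overlap across pipeline stages rather than executing sequentially as they do here.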
4.4.1.2 Polynomial Coefficient Register File

Although predetermining the polynomial evaluator has the advantage of simplifying the hardware design, it forces programmers to convert from one family of curves to another. This additional conversion process can be seen as another point of optimization. Instead of using CPU cycles to compute this transformation, we could add specialized hardware just for it. This is, however, not optimal, since the transformation involves matrix multiplication: duplicating the existing multiplication hardware is wasteful, and rerouting the existing OpenGL matrix multiplication hardware disrupts the pipeline. By modifying the behavior of the polynomial evaluators in our system, we can bypass the transformation step. The only modification is to implement a more general polynomial evaluator, one that can evaluate a degree-three polynomial with arbitrary coefficients. Thus, our general polynomial evaluators are of the following form:

GPE(x) = a x^3 + b x^2 + c x + d

With this general polynomial evaluator, we can change the blending functions by changing the coefficients and degree of the polynomial evaluator. This is where the polynomial coefficient register file is useful. In order to maintain state for the family of curves we are currently rendering, we set the current coefficients and polynomial degree in the register file.

4.4.2 Vertex Programming

Another method that supports FFD evaluation in hardware is vertex programming. Originally intended to allow programmers to perform complex texturing and lighting effects such as bump mapping, vertex programming has opened up the basic transformation and lighting (T&L) engine for the programmer to control. The current method of controlling the T&L engine is to allow the programmer to load into the graphics card a microcode program composed of a proprietary SIMD instruction set. The programmer is also able to access all input, output and state registers of the graphics card. The only drawback is that the maximum microcode length is 128 instructions. This is not so restrictive considering that vertex programming is not intended to replace general software programs. We will restrict our discussion of card-specific vertex instructions and properties to Nvidia's vertex programming instruction set architecture.

There are several notable features in Nvidia's vertex programming architecture that allow us to implement FFD evaluation within 128 microcode instructions. The first is that each instruction is a SIMD instruction. This means that each operation works on a 4-tuple, be it a 3D homogeneous coordinate or an RGBA color value. The benefit we gain from this is that, because each data element is a 4-tuple, the amount of code needed to represent several operations on a vector is shorter than with a non-SIMD instruction set. A 4-tuple representation also makes it convenient to encode several common scalar values into a single vector register. The primary benefit we see from all of this is space savings in both instruction count and data representation, both of which are important since we are working with a very limited microcode length.
The second notable feature is that the instruction set provided by Nvidia has a rich set of operations defined on its vectors, including dot products, multiply-accumulate, vector element duplication and others. This directly affects the microcode length needed to code an FFD evaluator. From Equation 1 we see that, since we are dealing with a summation of multiplied parameters, the dot product operation shortens the 64-term summation chain by a factor of 4. Notice also from Equation 1 that, because the uniform cubic B-spline basis is cubic, we are able to evaluate the basis functions easily using the multiply-accumulate operation and the nested form of a polynomial. Finally, encoding several scalar values into a single vector would be useless if the architecture allowed only vector operands in element-wise operations such as vector addition or scalar-vector multiplication; a scalar would then need to be expanded to a vector (by duplicating the scalar value for each entry of the vector), negating the space savings gained. Fortunately, the instruction set allows us to use scalar values in scalar-vector operations by selecting an element of a vector as a scalar.

The third notable feature is a set of 96 floating-point vector registers. Using the current formulation of the FFD with the uniform cubic B-spline basis functions, defining a single deformable cell requires the specification of 64 control points. Thus, evaluating an FFD using vertex programming requires at least 64 vector registers. In the current implementation of Nvidia's GeForce3, implementing FFD evaluation already requires 66% of the total state registers. This leaves little room for any other operation relying on register file space, but it ensures that FFD evaluation can indeed be performed using vertex programming.

There are both advantages and disadvantages to using vertex programming to implement FFD evaluation. The main advantage is that, once implemented as microcode, it is not farfetched to encode that microcode into a ROM added to the graphics card. Thus, the design cycle of adding a hardcoded vertex program is much shorter than that of adding new hardware functional units. In fact, I suspect that the implementation of Nvidia's standard T&L unit is done via a hardcoded vertex program embedded in the graphics chip. Implementing a microcode version of FFD therefore validates the claim that FFD can be supported in hardware. A big disadvantage right now is the 128-instruction microcode limit. Since FFD evaluation is quite a complex operation, most, if not nearly all, of the 128 instructions will be used up. Very little space is left to do standard perspective projection, let alone texturing and lighting. Thus, in its current incarnation, it is quite unlikely that shading and texturing can be supported within the current limits of the vertex programming architecture. One idea would be to allow the results of one vertex program to be used as input to the standard T&L on the graphics card, since FFD evaluation in essence manipulates the vertex positions and normal directions but neither the texture coordinates nor the shading algorithms.
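The fragment below is not Nvidia microcode. It is a hypothetical C++ model of how the arithmetic could be arranged to match the instruction set features described above: the basis functions in nested (multiply-accumulate) form, and the 64-term summation of Equation 1 regrouped into sixteen four-wide dot products. All names and the packing order are assumptions for this sketch.

    #include <array>

    struct Vec4 { double x, y, z, w; };

    // Analogue of a four-wide dot-product instruction.
    inline double dp4(const Vec4& a, const Vec4& b) {
        return a.x*b.x + a.y*b.y + a.z*b.z + a.w*b.w;
    }

    // Cubic basis value in nested form: three multiply-accumulates.
    inline double mad3(double a, double b, double c, double d, double x) {
        return ((a * x + b) * x + c) * x + d;
    }

    // Pack the 64 blending weights B_l(s)*B_m(t)*B_n(u) into 16 four-wide
    // registers, using the uniform cubic B-spline coefficients of Equation 2.
    std::array<Vec4,16> packWeights(double s, double t, double u) {
        const double k[4][4] = {{-1.0/6, 3.0/6, -3.0/6, 1.0/6},
                                { 3.0/6,-6.0/6,  0.0,   4.0/6},
                                {-3.0/6, 3.0/6,  3.0/6, 1.0/6},
                                { 1.0/6, 0.0,    0.0,   0.0  }};
        double bs[4], bt[4], bu[4];
        for (int i = 0; i < 4; ++i) {
            bs[i] = mad3(k[i][0], k[i][1], k[i][2], k[i][3], s);
            bt[i] = mad3(k[i][0], k[i][1], k[i][2], k[i][3], t);
            bu[i] = mad3(k[i][0], k[i][1], k[i][2], k[i][3], u);
        }
        std::array<Vec4,16> packed{};
        for (int l = 0; l < 4; ++l)
            for (int m = 0; m < 4; ++m)
                packed[l*4 + m] = Vec4{bs[l]*bt[m]*bu[0], bs[l]*bt[m]*bu[1],
                                       bs[l]*bt[m]*bu[2], bs[l]*bt[m]*bu[3]};
        return packed;
    }

    // One coordinate axis of Equation 1: the 64 control point coordinates are
    // assumed pre-packed into 16 registers in the same order as the weights,
    // so the 64-term sum collapses into 16 dot products.
    double ffdAxis(const std::array<Vec4,16>& weights,
                   const std::array<Vec4,16>& controls) {
        double sum = 0.0;
        for (int r = 0; r < 16; ++r)
            sum += dp4(weights[r], controls[r]);
        return sum;
    }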
4.4.3 Control Point Access with AGP Support

So far, we have discussed all FFD calculations with respect to a single deformable cell volume and assumed that the control points of that cell volume were somehow already in the local registers, ready for FFD evaluation. While it is possible to pass in the correct set of control points per vertex, this would require far more bandwidth than necessary and would also require processor I/O calls to index into the correct set of control points. Fortunately, current standards and techniques can be exploited to access control points in main memory efficiently.

4.4.3.1 Review of AGP Concepts

As graphics cards evolved, a faster interface became necessary. The legacy shared bus interface, ISA, simply was not fast enough to handle the bandwidth requirements, and because it is a shared interface, the total amount of available bandwidth is effectively reduced. At the other extreme, the VESA local bus interface allowed a direct, non-shared interface between the CPU and the graphics card. The drawback was that the clocking, transfer rate and even the data transfers to the graphics card were directly dependent on the CPU, and with a large proliferation of different CPUs the design of the graphics card quickly became complex. Even as the main shared system bus was upgraded to the PCI standard, the bandwidth required by graphics cards easily outgrew the available bandwidth of the PCI bus. The most recent solution is the Accelerated Graphics Port (AGP) [Intel], introduced by Intel in 1996. The most notable feature of AGP is that, like the VESA local bus, it is a non-shared bus separated from the standard system peripheral bus. It is tied directly to the North Bridge controller, which also connects the system memory and the CPU, as shown in Figure 8. From our FFD system's point of view, AGP also provides some notable features that help increase the efficiency of control point access.

Figure 8: Accelerated Graphics Port

The first feature that helps efficient control point access is the Graphics Address Re-mapping Table (GART), shown in Figure 9. The GART is a mechanism by which the graphics card is able to address a larger address space than there is real physical local memory on the graphics board. This is analogous to the CPU's virtual memory model. With the GART, the graphics card can allocate a portion of system memory for its exclusive use. In addition, the GART enables the graphics card to address both local graphics memory and the allocated system memory as a single linear address space. The original intent of the GART was to give programmers access to more texture memory than is available on board. As triangles are sent down the pipeline to be textured, the appropriate texture is retrieved directly from main memory. This also saves texture setup time, since the programmer no longer needs to keep loading and unloading large textures needed in sequence.
Although accessing textures via main memory is slower than most on-board memory solutions, the cost of retrieving data from the AGP aperture is reduced with the most recently proposed AGP 3.0 standard, which has a theoretical peak bandwidth of over 2.1 GB/s. For a more realistic computer system today under the AGP 2.0 standard, the AGP 4X transfer protocol has a theoretical peak bandwidth of 1.017 GB/s. Unfortunately, the protocol is hampered by the current motherboard northbridge chipset, the chipset bridging the graphics chip with main memory, running at 66 MHz, giving it a total usable bandwidth of only 508.6 MB/s. This is slowly being rectified as motherboard chipsets are clocked faster, increasing system bus speeds.

Figure 9: Graphics Address Remapping Table

The other important feature of AGP for control point access is sideband addressing, shown in Figure 10. When data is requested from memory, as in the PCI interface, the device first requests the appropriate address and waits for the data to arrive before sending out the next address request. Since there is a memory latency associated with each address-data transaction, multiple transactions add up to a lot of wasted time due to the accumulated memory latencies. The reason the system must wait after a request is that the shared bus carries both the address and data lines. With AGP, an additional 8 dedicated address lines are available to the graphics card. During the memory latency cycle, these 8 sideband address lines can continue to request more data, as 8-bit offsets from the original address, while the first data is still being transferred from main memory. In this way, an interleaving of data and address requests fills in the latency gaps and ultimately increases the performance of the system. This mechanism, again, was conceived with the intent of accessing textures that reside in non-local memory. Notice that the graphics card is now generating the address requests and hence controls the addressing patterns needed to obtain the texture map efficiently.

Figure 10: Sideband Addressing

4.4.3.2 Control Point Access

With the preceding AGP technologies, we can now propose an efficient method of accessing control points. The control point lattice is represented by a 3D array that can be stored in the allocated AGP aperture via the GART system. The problem then lies in efficiently accessing the correct subset of control points within the 3D array, the subset that represents the current deformable volume cell. Using the current representation of an FFD parameterized point, each parameterized point has an associated origin index that corresponds to a corner of the 4x4x4 lattice representing the volume cell that deforms the parameterized point. This origin index can easily be translated into the appropriate memory address when used in conjunction with the starting address of the 3D array.
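A minimal sketch of that translation is shown below, assuming the lattice is stored as a dense, row-major 3D array of control points whose base address is known; the function and parameter names are hypothetical, and this is address arithmetic only, not an AGP or GART API.

    #include <cstdint>
    #include <vector>

    // Compute the 64 addresses of the 4x4x4 sub-lattice whose lowest corner
    // is the origin index (oi, oj, ok), for a lattice stored as a dense
    // nx * ny * nz row-major array starting at 'baseAddress'.
    std::vector<std::uintptr_t> subLatticeAddresses(std::uintptr_t baseAddress,
                                                    std::size_t strideBytes, // bytes per control point
                                                    int nx, int ny,
                                                    int oi, int oj, int ok) {
        std::vector<std::uintptr_t> addrs;
        addrs.reserve(64);
        for (int n = 0; n < 4; ++n)
            for (int m = 0; m < 4; ++m)
                for (int l = 0; l < 4; ++l) {
                    // Linear index of control point (oi+l, oj+m, ok+n).
                    std::size_t linear =
                        ((std::size_t)(ok + n) * ny + (oj + m)) * nx + (oi + l);
                    addrs.push_back(baseAddress + linear * strideBytes);
                }
        return addrs;
    }

Note that the four consecutive l-values for a fixed (m, n) map to contiguous addresses, which is the kind of regular pattern that the streaming address requests described above can exploit.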
The problem is then how to address this sub-lattice through the addressing modes that are currently available. Since the addressing module was designed to bring in parts of a 2D texture, we can partition the 3D array into a set of 2D texture slices and bring in the appropriate sub-textures corresponding to the sub-lattice we are interested in. Another proposed method of generating the appropriate addresses is to treat the 3D array as a 3D texture. Since 3D textures operate similarly to our access patterns, it is possible to use existing 3D texture addressing to extract the sub-lattice.

4.5 Conclusion

This chapter discussed the features necessary to implement a hardware-accelerated FFD system, ranging from the LTT to extrapolation. It also discussed the different methods of hardware acceleration, from evaluators to vertex programming, and concluded with lattice point access using existing AGP technologies to assist efficient retrieval. At this point, the system has a solid core computational model with its accompanying hardware design. The next step is to build around this core computation to define a viable system that meets all the other key criteria. In the next chapter, the system is expanded on top of the FFD layer to incorporate the handling of multiple controllers and other issues in completing the system.

Chapter 5 Layered Approach

Modeling and animating a deformable object often involves the simultaneous use of physical simulators, surface deformation and mesh refinement. Thus, to incorporate a deformable object into a virtual environment, these three processes must be integrated into a system. Existing approaches to deformable objects are either material-specific [Baraff 1998] or method-specific [Terzopoulos 1987], which limits the class of deformable objects in any one system. Implementing several deformable modeling approaches is difficult since there is no common interface among them. This chapter shows how the three processes can be integrated in one deformation system that supports a wide range of options for each component.

The first process is the controlling mechanism. This refers to the process that controls the deformation of the object. It is often a physical simulator, but in some cases it can be a user interface to the surface deformation engine. The latter case is common when artists require full control over the object rather than having the object respond in a physically correct manner.

The second process is the surface deformation engine. This process alters the surface geometry of the deformable object. In some approaches, the physical simulator and the surface deformation engine cannot be differentiated: most physical simulators directly alter the surface geometry. Combining the controller with the surface deformation process results in method-specific deformable systems. We separate these processes in order to enforce a standard structure in our system that allows multiple controllers to be used with the same deformation engine.

The third process is the mesh refinement system. Although models could use several representations, triangular meshes are common and efficient for rendering. The accuracy of the model representing the deformed object can therefore depend on the amount of refinement applied to the model. This process ranges from re-meshing, commonly used with implicit surface representations, to subdivision surfaces for triangular or quadrilateral meshes.
Some deformation 38 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. techniques ignore this process by using a fixed but highly tessellated mesh. This is inefficient if some portions of the mesh remain planar while other portions have high surface curvature. Our system implementation focuses on the second and third processes since we intend to use existing controlling techniques. A natural extension to our approach is its use as a framework for general deformable modeling and animation with a variety of controllers. 5.1 System Overview The approach divides the deformation system into three layers, as shown in Figure 11. The first layer, called the controlling mechanism, consists of either purely user-based control or physically based control. The responsibility of this layer is to direct the surface deformation layers either through physical simulation or direct user interaction. Software Only Hardware Accelerated Support Retargeting Layer Surface Deformation Mesh Refinement Figure 11: System Overview The second layer is the surface deformation system. This layer represents a geometric deformation method. Our implementation uses the Extended Free-Form Deformation (EFFD) primitive since FFD is popular, computationally inexpensive, and easy to implement. Among all the variants of the FFD, EFFD retains the mathematical simplicity of the FFD and yet is able to use more complex lattice shapes compared to the original FFD method. In addition, it is shown in our previous chapter that the EFFD technique can be hardware accelerated as a vertex preprocessor in a standard OpenGL pipeline [Chua 2002]. The details of the second layer are discussed in the previous section. The third layer is the mesh refinement system. I chose to work with a triangular mesh representation since graphics pipelines render triangular meshes directly. In order to maintain representation accuracy while minimizing the total number of triangles in the mesh, I employ an 39 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. adaptive midpoint subdivision operator for mesh refinement. For all deformed states, this allows me to process a minimal number of triangles. I chose midpoint subdivision since it is simple and can be hardware accelerated as proposed by Bischoff and Kobbelt [Bischoff 2000], The basics of midpoint subdivision are introduced in the next section. I add a retargeting layer between the controlling mechanism and the surface deformation layer. In our approach, physical controllers no longer displace vertices of the mesh directly. Thus, a retargeting layer is needed to convert the output of the physical controller and map it to an appropriate input to the EFFD. 5.2 Mesh Refinement 5.2.1 Midpoint Subdivision Midpoint subdivision is the process of refining a triangular mesh by subdividing existing triangles. Vertices are inserted at the midpoint of each edge of the triangle and a new triangle is created as shown in Figure 12. Midpoint subdivision is often applied uniformly to the entire mesh. This has both advantages and disadvantages. One advantage is that several recursive applications of midpoint subdivision easily create a smooth finely tessellated mesh. The disadvantage of uniformly applying midpoint subdivision is the exponential growth in the number of triangles in areas that do not need the subdivision. 
Adaptive subdivision gives rise to a problem with boundary triangles. Options for dealing with boundary triangles are shown in Figure 13.

Figure 13: Termination cases. (a) Single-point case. (b) Re-subdivide. (c) Early termination.

Two cases arise when performing adaptive midpoint subdivision: a boundary triangle will have either one or two new vertices added to its edges. In the single-point case shown in Figure 13a, the triangle is cut in half. In the two-point case shown in Figures 13b and 13c, there are two ways to subdivide the triangle. The first is to add the missing third point and apply midpoint subdivision (Figure 13b). The other is to split it into three triangles, as shown in Figure 13c. The first method has the advantage of maintaining the uniformity of subdivision; however, in introducing the third vertex, the triangle changes from being a border triangle to a midpoint-subdivided triangle. This method not only expands the boundary triangle region but also causes a cascading midpoint subdivision effect whenever the added vertex also lies on another boundary triangle, as shown in Figure 14.

Figure 14: Cascading vertex: the third vertex inserted into a boundary triangle lands on another boundary triangle, producing a cascaded vertex.

The second method does not produce this cascading effect since no new vertices are added. It has the drawback that long, thin triangles may be produced, and degenerate configurations of boundary triangles can still cause cascading, leading to the introduction of unnecessary triangles.

5.2.2 Representation

We use a "forest of trees" representation for the subdivided triangles, as shown in Figure 15. Instead of re-evaluating the FFD equation (Equation 1) from the parameterized coordinates and lattice indices, object coordinates are kept if the lattice volume was not deformed. Pointers to neighboring triangles are kept in order to perform boundary triangle subdivision: if a triangle is beside a midpoint-subdivided triangle, the neighboring triangle is subdivided into either two or three triangles, depending on the number of new vertices introduced by other midpoint-subdivided triangles. Pointers to subdivided triangles are used to keep track of the current level of subdivision. In addition, reversing the subdivision process is possible since the original triangle is still kept in this representation.

Figure 15: Tree representation: each node stores object and parameterized coordinates, the parameterized coordinate of the triangle's center of mass, pointers to neighboring triangles, and pointers to its subdivided triangles.

The root of each tree corresponds to a triangle in the original undeformed mesh. The depth of the tree depends on the number of applications of subdivision a particular triangle has undergone. Traversing each tree and rendering its leaf nodes renders the model.

5.2.3 Subdivision Criteria

As the mesh is deformed, the accuracy of the deformed model generally diminishes. A fidelity criterion determines where additional triangles are needed. We use the distance between the undeformed and the deformed center of mass of the triangle as a metric that is simple and fast to evaluate. During the pre-processing stage of parameterizing the object coordinates, the center of mass of each triangle's vertices is also parameterized. Whenever a triangle undergoes deformation, its center of mass in world coordinates is calculated from the current configuration of the triangle. The original center of mass is recalculated using the parameterized coordinates and Equation 1. If the distance between these two coordinates exceeds a threshold, the triangle is subdivided. (To avoid a square root calculation, the square of the distance can be compared instead.)
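A sketch of this test is given below. deformCentroid() is a hypothetical helper standing in for the re-evaluation of Equation 1 at the triangle's parameterized center of mass; the squared distance is compared, as noted above, to avoid the square root.

    struct Vec3 { double x, y, z; };

    // Assumed helper: re-evaluate Equation 1 for the triangle's parameterized
    // center of mass against the current control point positions.
    Vec3 deformCentroid(double s, double t, double u);

    // Subdivide when the centroid of the triangle's current (deformed) vertices
    // has moved too far from the centroid predicted by the parameterization.
    bool needsSubdivision(const Vec3& a, const Vec3& b, const Vec3& c,
                          double s, double t, double u, double threshold) {
        Vec3 current{(a.x + b.x + c.x) / 3.0,
                     (a.y + b.y + c.y) / 3.0,
                     (a.z + b.z + c.z) / 3.0};
        Vec3 reference = deformCentroid(s, t, u);
        double dx = current.x - reference.x;
        double dy = current.y - reference.y;
        double dz = current.z - reference.z;
        return dx*dx + dy*dy + dz*dz > threshold * threshold;   // squared-distance test
    }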
5.2.4 Curvature-Based Criteria

Another popular criterion for adaptive subdivision is to measure the angular displacement of the normals from their pre-deformed state to their current state. In this section we discuss different ways to calculate the transformed normals based on the FFD equation.

5.2.4.1 Analytic Normal Transformation

Tensor calculus defines the behavior of normals under a multidimensional transformation. Given a function F(U) = X : \mathbb{R}^3 \to \mathbb{R}^3, the covariant and contravariant rules provide the transformation rules [Nimscheck 1995] for the normals (N_U \to N_X) and tangents (T_U \to T_X) undergoing the transformation F. Let E and D be the respective Jacobians of the embedding and deformation transformations. As suggested by Nimscheck [Nimscheck 1995], scale factors such as determinants can be omitted since we are only interested in the resulting direction and not in the difference of magnitude. In addition, instead of calculating the inverse, it is suggested to use the adjoint of the matrix (denoted J^*), since the adjoint differs from the inverse only by a scale factor (J^{-1} = \det(J)^{-1} J^*). The simplified rules are rewritten as follows.

Contravariant FFD transformation: T_D = D E^* T_X
Covariant FFD transformation: N_D = (E^* D^*)^T N_X

Using these transformation rules, we are able to calculate the appropriate transformation matrices for the normals undergoing FFD. One benefit directly stemming from this calculation is that, once the transformation matrix is known, it can be sent directly to the normal transformation matrix in an OpenGL pipeline. The drawback is that calculating this analytic transformation matrix is very computationally expensive, equivalent to evaluating each equation up to three times. An interesting idea would be to route the results of either the vertex program or the FFD evaluator system directly to the OpenGL matrices. Unfortunately, this would be a radical departure from the standard feed-forward pipeline architecture used in graphics cards today.

5.2.4.2 Approximation Through Indirect Calculation

A similar method can be devised based on the shortcoming of the previous approach. Instead of attempting to calculate the transformation matrix, we can attempt to calculate the normals directly. We do so by perturbing the object vertices by a small amount in the direction of the surface normal. These perturbed points are also treated as object points and are embedded and deformed normally. The vector between the deformed position of the perturbed point and the deformed position of the object point then yields the deformed normal.
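A minimal sketch of this indirect calculation follows. embedAndDeform() is a hypothetical helper standing in for the embedding and Equation 1 evaluation of an arbitrary object point, and epsilon is the perturbation size whose choice is discussed next.

    #include <cmath>

    struct Vec3 { double x, y, z; };

    // Assumed helper: embed an object-space point in the lattice and evaluate
    // Equation 1 against the current control point positions.
    Vec3 embedAndDeform(const Vec3& objectPoint);

    // Approximate the deformed normal by deforming a point nudged a small
    // distance along the undeformed normal and normalizing the difference.
    Vec3 deformedNormal(const Vec3& p, const Vec3& n, double epsilon) {
        Vec3 nudged{p.x + epsilon * n.x, p.y + epsilon * n.y, p.z + epsilon * n.z};
        Vec3 dp = embedAndDeform(p);
        Vec3 dn = embedAndDeform(nudged);
        Vec3 d{dn.x - dp.x, dn.y - dp.y, dn.z - dp.z};
        double len = std::sqrt(d.x*d.x + d.y*d.y + d.z*d.z);
        return Vec3{d.x / len, d.y / len, d.z / len};
    }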
Although this technique is similar in calculation complexity, it directly yields the deformed normals rather than a transformation matrix that still needs to be applied to the normals. The main difference here is in architecture and utility. Since this procedure's results are generated by a vertex program, the results will most likely be fed directly into the graphics pipeline. Since updating the normal matrix is not on that path in a standard graphics card, the previous technique is not amenable to the current architecture. This approximation method, on the other hand, directly calculates the normals, which are a valid input to the system as part of a triangle specification. A drawback to this approach is that, since it is an approximation of the normal, it may not guarantee the correct calculation of the angular displacements. In addition, using too large a perturbation may lead to degenerate cases where the angular displacement is incorrect, while using too small a perturbation may run into machine precision problems.

5.2.5 Hardware Accelerated Subdivision

Now that we have all the elements of both the FFD and the adaptive subdivision processes, let us look at the step-by-step process by which a triangle is deformed and subdivided.

1) Given a triangle for deformation, the first step is to determine whether the triangle needs to be subdivided.
2) The level of subdivision is then determined. This specifies how many new triangles are generated and how many intermediate points are needed.
3) Based on the number of new triangles and points, and because we are using midpoint subdivision, we can calculate the differential length for each triangle edge. By adding the differentials in the appropriate fashion, we generate the new points. New triangles are specified in a z-traverse fashion (triangle strip ordering).
4) Since each new point is specified in object coordinates, we need to retrieve the approximate location of the deformable volume cell in which these new points may reside. This is done via the LTT translation.
5) The proper control points are then retrieved via 3D texture access patterns and AGP sideband addressing schemes.
6) The new object points are then sent to the embedding function.
7) The point is finally sent to the FFD deformation system.

Based on the current experiments, the suggested hardware acceleration is feasible but requires too much bookkeeping to transfer the algorithm successfully to a hardware design. The major flaw is that adaptive subdivision requires knowledge of the neighboring triangles, triangles that do not necessarily come next in the triangle stream unless some triangle ordering is imposed. Thus, accessing neighborhood information becomes difficult once the object triangles are being streamed from memory to the graphics card.

5.2.6 Alternative Adaptive Algorithm and Representation

The drawback of the prior representation is that too much bookkeeping is involved in maintaining a coherent model representation. In fact, it is quite possible that the majority of the processing involves the correct update of neighboring triangles and the proper splitting of the subdivided triangles.
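The seven steps above could be driven by a loop like the hypothetical sketch below. Every helper named here is an assumption standing in for a component described in earlier sections, and the neighboring-triangle bookkeeping that dominates in practice is exactly what this simple per-triangle loop leaves out.

    #include <vector>

    struct Vec3 { double x, y, z; };
    struct CellIndex { int i, j, k; };

    // Assumed helpers standing in for components described earlier.
    bool  needsSubdivision(const Vec3& a, const Vec3& b, const Vec3& c);    // step 1
    int   subdivisionLevel(const Vec3& a, const Vec3& b, const Vec3& c);    // step 2
    std::vector<Vec3> generateMidpoints(const Vec3& a, const Vec3& b,
                                        const Vec3& c, int level);          // step 3
    CellIndex lttLookup(const Vec3& p);                                      // step 4
    void  fetchControlPoints(const CellIndex& cell);                         // step 5
    void  embed(const Vec3& p, const CellIndex& cell,
                double& s, double& t, double& u);                            // step 6
    Vec3  deform(double s, double t, double u);                              // step 7

    // Deform and, if required, subdivide one incoming triangle.
    std::vector<Vec3> processTriangle(const Vec3& a, const Vec3& b, const Vec3& c) {
        std::vector<Vec3> points{a, b, c};
        if (needsSubdivision(a, b, c)) {
            int level = subdivisionLevel(a, b, c);
            std::vector<Vec3> extra = generateMidpoints(a, b, c, level);
            points.insert(points.end(), extra.begin(), extra.end());
        }
        std::vector<Vec3> deformed;
        for (const Vec3& p : points) {
            CellIndex cell = lttLookup(p);        // approximate cell from the LTT
            fetchControlPoints(cell);             // e.g. 3D texture / sideband access
            double s, t, u;
            embed(p, cell, s, t, u);              // parameterize the new point
            deformed.push_back(deform(s, t, u));  // evaluate Equation 1
        }
        return deformed;   // neighboring-triangle bookkeeping is not handled here
    }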
This sparks the question of whether there is an alternative representation that can alleviate the problems associated with extensive bookkeeping and coherency checking. Since we would ultimately like the hardware to perform the actual subdivision processing, we can immediately simplify the representation of the subdivided triangles. Instead of actually creating the triangles corresponding to the individual subdivision levels, we only need to keep track of the subdivision level of a triangle and let the hardware perform the subdivision. The modified adaptive subdivision data structure and algorithm would change as follows. The adaptive criterion would still be evaluated on the host processor; this test must be able to determine the level of subdivision necessary for a triangle. The model representation keeps track only of the subdivision level and a set of sample points that depends on the current subdivision level. The sample points are necessary to ensure that visible undulations of the surface not represented by the object's vertex normals are properly represented and checked.

An open problem with this technique is determining the best method of taking care of boundary triangles. One solution is to perform immediate termination: the additional points are connected to the farthest point on the boundary triangle. Figure 16 shows how to terminate the boundary triangles when the boundary triangle is surrounded by two or more subdivided triangles. The advantage of this approach is that the termination algorithm is simple even for case 2 and case 3 triangles. The disadvantage is the long, thin triangles produced by the algorithm.

Figure 16: Boundary cases

An interesting approach would be to perform some sort of progressive degradation algorithm to slowly shrink the triangle front. This is quite simple for case 1 boundaries, as shown in Figure 17. For case 2 triangles the algorithm becomes quite complex, but this method may spark other possibilities with better triangulation methods.

Figure 17: Progressive degradation

5.3 Controlling Mechanisms

In order to deform the object, we need to manipulate the control points. Control points generally number significantly fewer than the deformed mesh vertices; however, control of these points is still nontrivial. This section reviews some geometric and physically based controllers best suited for controlling FFD lattices.

5.3.1 Radial Basis Function

Radial basis functions (RBF) are a popular multidimensional interpolation method. By controlling a small subset of FFD control points, the RBF method can interpolate new positions for the remaining control points. For example, the corner control points of the lattice are chosen as the interpolation points of the RBF, from which the internal control point locations are derived. The RBF is a system of linear equations of the following form:

y_j = \sum_{i=1}^{N} c_i \, R(\|x_j - x_i\|)    (Eq. 4)

where y_j is a vector containing a single dimension of the control point location, c_i is an unknown weighting factor, \|x_i - x_j\| are the pair-wise Euclidean distances between the control points, R is the radial basis blending function, and N is the total number of control points. Interpolation is done as follows. Define a symmetric matrix H such that H_{ij} = R(\|x_i - x_j\|), the pair-wise Euclidean distances of the control points modulated by the radial basis blending function. Recover the weighting factors c by solving the equation c = H^{-1} y. The new locations of the internal control points are calculated by substituting the internal control point location for x_j in Equation 4. This whole process is done for all three dimensions, but since H is the same for each dimension, the inverse is only calculated once. For more details on RBFs, refer to [Poggio 1989].
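A compact sketch of this interpolation for one coordinate axis is given below. A Gaussian is assumed for the radial basis blending function R purely for illustration, and the plain Gaussian elimination stands in for whatever solver a real implementation would use.

    #include <cmath>
    #include <vector>

    struct Vec3 { double x, y, z; };

    // Illustrative radial basis blending function (a Gaussian is assumed here).
    inline double R(double r) { return std::exp(-r * r); }

    inline double dist(const Vec3& a, const Vec3& b) {
        return std::sqrt((a.x-b.x)*(a.x-b.x) + (a.y-b.y)*(a.y-b.y) + (a.z-b.z)*(a.z-b.z));
    }

    // Solve H c = y by Gaussian elimination (no pivoting; illustration only).
    std::vector<double> solve(std::vector<std::vector<double>> H, std::vector<double> y) {
        const int n = (int)y.size();
        for (int k = 0; k < n; ++k)
            for (int i = k + 1; i < n; ++i) {
                double f = H[i][k] / H[k][k];
                for (int j = k; j < n; ++j) H[i][j] -= f * H[k][j];
                y[i] -= f * y[k];
            }
        std::vector<double> c(n);
        for (int i = n - 1; i >= 0; --i) {
            double s = y[i];
            for (int j = i + 1; j < n; ++j) s -= H[i][j] * c[j];
            c[i] = s / H[i][i];
        }
        return c;
    }

    // Given the interpolation points x and one coordinate of their target
    // positions y (Equation 4), recover the weights c and evaluate the
    // interpolant at an internal control point location p.
    double rbfInterpolate(const std::vector<Vec3>& x,
                          const std::vector<double>& y, const Vec3& p) {
        const int n = (int)x.size();
        std::vector<std::vector<double>> H(n, std::vector<double>(n));
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j)
                H[i][j] = R(dist(x[i], x[j]));   // pairwise distances through R
        std::vector<double> c = solve(H, y);
        double value = 0.0;
        for (int i = 0; i < n; ++i)
            value += c[i] * R(dist(p, x[i]));    // Equation 4 at the new location
        return value;                            // repeat per dimension (H is shared)
    }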
5.3.2 Skeletal Animation Systems

Skeletal animation systems embed a movable skeleton inside an object. Skeleton motions then influence the surrounding mesh. As the name implies, skeletal animation systems are a form of axial deformation: the skeleton bones form the axes, and vertices move so as to preserve their distance from an axis. Two or more bones may influence vertices near joints. Skeletal animation can control axial deformations of an FFD lattice. Analogous to a skin that wraps around a bone, the bone can control an FFD lattice, and multiple bones at joints can control a single control point. In turn, the object embedded in the FFD lattice moves according to the skeletal deformation. For more details on skeletal systems, refer to [Lewis 2000].

5.3.3 Directly Manipulated Free-Form Deformation

While global deformations are easily obtained by direct manipulation of FFD control points, attaining specific detailed deformations can require unintuitive configurations of control points. Animators and modelers often prefer to manipulate the surface directly. This requires the system to hide the indirect and automatic manipulation of the control points from the user. This is called Directly Manipulated Free-Form Deformation (DMFFD). The user manipulates a set of vertices on the surface, moving them to their desired final locations. The DMFFD system calculates the best configuration of control points to produce the desired vertex positions. Determining the best control point configuration can result in either an underdetermined or an overdetermined system of equations, depending on the number of vertices and the total number of control points. Standard solution methods employ either naive pseudo-inverses or Householder's QR factorization. For details on solving these equations, refer to [Gain 1996, Hsu 1992].

A more important aspect of DMFFD is how it can be used as a retargeting layer, as mentioned earlier in Section 3. Since most physically based controllers manipulate the mesh directly, DMFFD is the best method by which the output of a physically based controller can be retargeted to control the FFD control points. This means that no changes to the physically based controller algorithm are needed because of DMFFD. In addition, since DMFFD handles general direct surface manipulation, it is conceivable to design a whole new set of controllers on top of DMFFD, whether physically based or not.
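For the special case of a single manipulated vertex, the pseudo-inverse solution takes a particularly simple form, sketched below: the vertex displacement is distributed over the 64 influencing control points in proportion to their blending weights. This covers only the one-constraint case; several simultaneous constraints require the full pseudo-inverse or QR methods cited above.

    #include <array>

    struct Vec3 { double x, y, z; };

    // Minimum-norm (pseudo-inverse) control point update for one constrained
    // vertex: 'weights' holds the 64 blending weights B_l(s)B_m(t)B_n(u) of the
    // vertex in its cell, and 'delta' is the displacement applied to it.
    // Assumes at least one weight is non-zero.
    std::array<Vec3,64> singleVertexDMFFD(const std::array<double,64>& weights,
                                          const Vec3& delta) {
        double sumSq = 0.0;
        for (double w : weights) sumSq += w * w;        // sum of squared weights
        std::array<Vec3,64> moves{};
        for (int i = 0; i < 64; ++i) {
            double f = weights[i] / sumSq;              // proportional distribution
            moves[i] = Vec3{f * delta.x, f * delta.y, f * delta.z};
        }
        return moves;   // applying these offsets reproduces 'delta' at the vertex
    }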
5.3.4 Mass Spring Systems

Mass-spring systems are often used in animation systems. Their applications range from facial animation to simple deformable objects. In addition, they are easy to implement and computationally inexpensive for lattices of up to a few hundred points. As external forces, such as gravity, are applied to one or more of the mass elements, a simulation of the mass and spring interactions is computed until an equilibrium state is reached. The application of a mass-spring system as an FFD controller is described in [Christensen 1997]. Each control point is considered a mass element, while the lattice topology represents the spring network that ties the masses together. With this arrangement, a user can either apply forces to the FFD lattice or manipulate the mass elements to control the FFD control points. This system produces physically plausible dynamics and responses for the object.

5.3.5 Finite Element Methods and Other Physical Controllers

The finite element method (FEM) can be thought of as a generalized extension of the mass-spring model. An object is represented by a set of discrete finite elements, and the interconnections between the finite elements define how these elements interact. Instead of using Hooke's spring laws, FEMs use equations that model the potential energy within the object, and the physical simulation aims at minimizing the total potential energy. FEMs give the most physically accurate representation of elastic deformation, but they tend to be computationally expensive. Recent techniques achieving real-time simulations using FEMs include [Debunne 2001, Debunne 2000, James 1999, McDonnell 2000, Mandal 2000]. Since FEM control mechanisms produce the best representations of elastic objects, they are desirable to incorporate as controllers. As discussed in Section 6.3, DMFFD can serve as a retargeting layer to translate the output of FEMs into FFD control point configurations. The physical simulator and the representations used by the FEM remain the same; only the final output object configuration is retargeted. This approach allows our deformation and subdivision method to utilize a rich library of physical controllers.

Another method of integrating physical controllers into FFD systems was proposed in Dynamic Free-Form Deformations [Faloutsos 1997]. This method defines a parametric space of different lattice configurations representing basic operations performed on the object, such as bending and shearing. Lagrangian dynamics is used to simulate both locomotion and internal strain energy. Deformation is calculated as a linear combination of the several operation parameters based on the calculated strain energy of the object.

5.4 Conclusion

I developed a unified deformation system by identifying the common processes of deformation systems: controlling mechanisms, surface deformation and mesh refinement. Emphasis was placed on the first and last layers by discussing our selection of possible controllers and midpoint subdivision as components of our system. Since hardware acceleration of the surface deformation component is feasible, I believe this system can form the foundation of real-time interactive deformable modeling and animation.

Chapter 6 Modular Approach

6.1 Introduction

In the last chapter we saw how it is possible to design a layered approach to deformable modeling and animation.
This design was geared towards building the proper interface layers to allow multiple types of controllers to run on top of a common surface deformation primitive, the Free-Form Deformation. The philosophy behind the design of this system is akin to a "one-size-fits-all" mentality. While there is nothing inherently incorrect about this approach, a common criticism of the system stemmed precisely from this philosophy. Why should the programmer be restricted to a B-spline volume based deformation primitive? While it may be possible to retarget all other controllers to run on top of the FFD layer, I agree that the inherent limitations, for example enforced continuity constraints, will always affect the resulting deformation in ways that may not coincide with what the animator originally intended. One could contend that a speed tradeoff may justify the loss of accuracy, but after investigating further, Directly Manipulated FFD techniques require a substantial amount of computation, such that the fastest implementation so far runs at 15 frames per second [Gain 1996]. This, coupled with the fact that the original controller may also be computationally expensive and still has to go through the retargeting layer using DMFFD, makes DMFFD an unattractive option.

Another problem that arose while designing this system is that the adaptive subdivision system was very difficult to design. The main reason is that I had violated the constraint that only local computation should be run on the graphics hardware. To perform adaptive subdivision, the hardware unit must have knowledge of all the neighboring triangles in order to ensure that no cracks or seams are created due to unresolved T-junctions. This made it difficult to terminate the adaptive algorithm, since the neighboring triangle could either have been rendered already or not be the next incoming primitive. This and additional cascading problems discussed in the previous chapter kept arising when trying to design a hardware unit for adaptive triangle subdivision.

While the previous system did indeed attempt to address all the key issues in Chapter 3, the one major flaw of the system was not its design but rather its philosophy. It is quite difficult to find instances of problems where a single type of algorithm will perform sufficiently well on each of them. In fact, it is more common that a specialized algorithm will tend to run faster and more efficiently for the specific problem it is trying to solve.

What, then, should the philosophy be in designing such a system? One method is to take the complete opposite approach to the previous philosophy. Instead of trying to fit disparate techniques to a single common deformation primitive in hardware, a new philosophy could be: why not fit the hardware to a set of disparate deformation techniques? Clearly this philosophy could easily draw more criticism, since it is typically impractical or infeasible to make such a complex chip that performs a whole set of deformation primitives. Ignoring the practical side for now, though, this approach is clearly superior to the previous one, since there is no retargeting and hence it is efficient, and the computation runs on a specialized sub-processor that ensures both increased free cycles on the host processor and faster execution on the sub-processor.
The question then becomes: is it possible to design a practical deformable object system with such a philosophy in mind? As this chapter will show, with the proper theory and simplification, it is possible to design a practical system for a set of representative deformation primitives. The overview of this design process is as follows:

a) A set of representative techniques needs to be selected.
b) The underlying theory for unifying these techniques needs to be developed.
c) The design is made based on this underlying theory.

6.2 Selection of Core Techniques

The first step in designing a new system for deformable object animation is to select a set of representative deformation primitives that encapsulates a large portion of all possible types of deformation. For this system, I am primarily concerned with character animation and thus will focus on types of deformation that apply to character animation.

The first class of deformation to be considered is implicit deformation. This class is implicit in the sense that the object is not directly manipulated in any manner; instead, an indirect controlling mechanism is used to influence how the surface is to be deformed. Several general techniques that fit in this category include Free-Form Deformation, WIRES, metaballs and implicit surfaces. In FFD, the implicit control is the warping of the space in which the object is embedded. As the space is manipulated, the shape of the object changes accordingly. WIRES relies on the same principle as Skeleton Subspace Deformation, except that instead of linear constraints influencing the object mesh, generalized curvilinear constraints are used. Metaballs and implicit surfaces, on the other hand, use point constraints and mathematical equations to map out the volume of influence over a region. Based on these available choices, I decided to use FFD. In addition to being widely used in popular character animation packages, FFD is mathematically simple, compact, and has been studied thoroughly in Chapter 3.

The next class of deformation to be considered is shape interpolation. Shape interpolation deals with the smooth morphing of one shape into another. While there has been a lot of work done in 3D geometric morphing, my primary concern is character animation, so I decided to work with shape interpolation techniques that have been successfully used in character animation algorithms. Based on a survey of character animation techniques [Noh 1998], I selected the radial basis function as the candidate for the shape interpolation primitive. This is because of a technique called Pose-Space Deformation that utilizes RBFs to blend multiple shapes according to user-specified blending weights. PSD provides a powerful and expressive framework for shape blending through the abstraction of blending parameters into what is called pose space. Thus RBF, and essentially PSD as well, was selected because of the power of PSD and the simplicity of RBF.

One common task in character animation is the animation of articulated figures. Whether it is a human body, a four-legged animal or an imagined mythical beast, most character animation productions revolve around protagonists that are articulated figures. In this category, the most commonly used technique is Skeleton Subspace Deformation.
This technique embeds linear constraints, called bones, that serve as linear axial controllers of the space around them. Several bones are then interlinked at articulation points that serve as the joints of the articulated figure. The reason this is such a popular technique is that the computation required for SSD is essentially a matrix-vector multiply, the same operation used to transform the 3D vertices of a mesh from world coordinate space to screen space. This implies that SSD can be easily done on existing graphics hardware, since the fundamental operation is already available in hardware.

The final class of deformation considered is physically based deformation. As opposed to the other classes discussed so far, physically based deformation attempts to simulate the real world by evaluating deformation equations based on real elastic properties of materials. A commonly used technique in mechanical engineering, the Finite Element Method, was recently introduced into computer graphics as the computational power of general-purpose CPUs became fast enough. Despite this increase in computational power, the high degree of regular volume partitioning imposed on the object, required for accurate results, proved to be very demanding on the CPU. A simplified variant of the FEM, called steady-state FEM analysis, computes the final equilibrium state of the deformed object under the given force constraints. The core computation of this variant also turns out to be a matrix-vector multiplication. Thus, I chose to work with this form of FEM since it holds promise of being hardware accelerated and simple, yet is also physically accurate.

6.3 Analysis of Core Computation

Looking closely at the techniques selected in the previous section, a pattern emerges from the mathematical formula of each technique. Equation 5 represents the dot product of a vector and a row of a matrix, commonly performed in FEM. Equation 6 is the RBF interpolation equation, where w_k are the blending weights, R is a blending basis function and θ is the blending parameter. Equation 7 is the FFD evaluation equation, where CP are the lattice control points and B_l, B_m, B_n are the uniform B-spline blending functions.

\sum_k a_k b_k    (Eq. 5)
\sum_k w_k R(\lVert \theta - \theta_k \rVert)    (Eq. 6)
\sum_{l,m,n} CP_{lmn} B_l(s) B_m(t) B_n(u)    (Eq. 7)

Notice that the basic operation in all three equations is a multiply-accumulate, except that the parameters of this multiply-accumulate operation differ slightly from technique to technique. More specifically, these three equations can be written in the following form:

\sum_k X_k Y_k    (Eq. 8)

where X is a for dot products, w_k for the RBF equation, and CP for the FFD equation. Y is b for dot products, the blending function R for RBF, and the multiplication of uniform B-spline basis functions for FFD. Notice that X is always a scalar value and that Y can be a scalar, a polynomial or an RBF blending function. This observation leads me to the following conclusions:

a) The multiply-accumulate operation is the unifying fundamental computation for these selected deformation techniques.
b) Designing different strategies for implementing Y becomes the focus of this new system.
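To illustrate conclusion (a) in ordinary host code, the sketch below (my own, not part of the proposed hardware) shows how the dot product, RBF interpolation and one parametric axis of a quadratic uniform B-spline FFD all reduce to the same multiply-accumulate loop, differing only in how the Y term is produced. The helper names and the one-dimensional simplifications are assumptions made for illustration.

#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Generic multiply-accumulate: sum_k X[k] * Y(k).  The three techniques
// below differ only in how the Y term is produced for each k.
double macc(const std::vector<double>& X,
            const std::function<double(std::size_t)>& Y) {
    double acc = 0.0;
    for (std::size_t k = 0; k < X.size(); ++k) acc += X[k] * Y(k);
    return acc;
}

// Dot product (SSD / steady-state FEM row evaluation): Y is just another scalar.
double dot(const std::vector<double>& a, const std::vector<double>& b) {
    return macc(a, [&](std::size_t k) { return b[k]; });
}

// RBF interpolation (PSD / direct surface deformation), one-dimensional
// pose parameter for simplicity: Y is a radial blending function.
double rbf(const std::vector<double>& w, const std::vector<double>& theta_k,
           double theta, double c) {
    return macc(w, [&](std::size_t k) {
        double r = std::fabs(theta - theta_k[k]);
        return std::exp(-(r / c) * (r / c));        // Gaussian kernel
    });
}

// One parametric axis of a quadratic uniform B-spline FFD:
// Y is a polynomial basis function evaluated at the local parameter u.
double ffd_axis(const std::vector<double>& cp, double u) {  // cp.size() == 3
    auto B = [&](std::size_t k) {
        if (k == 0) return 0.5 - u + 0.5 * u * u;
        if (k == 1) return 0.5 + u - u * u;
        return 0.5 * u * u;
    };
    return macc(cp, B);
}

In hardware terms, the loop body corresponds to the FMAC unit and the choice of Y corresponds to the input-function stage discussed in Section 6.4.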
A more general framework for any existing, and possibly undiscovered, geometric warps has been proposed in [Milliron 2002]. This framework is more general and expressive and encompasses far more techniques than the one proposed in this section. In fact, one can think of my derivation as a special case of their general framework. Because of this, I will briefly discuss their framework and the simplifications that were made to arrive at my system.

Milliron's goal is to provide an elegant mathematical framework for all different kinds of geometric warps and deformations in order to provide a means of performing mathematical analysis on geometric deformation in a structured manner. This means that once the appropriate formulation of a geometric deformation has been made in this framework, mathematical properties such as continuity can be easily calculated or enforced. The general framework is posed as follows:

D(u, M) = \sum_i \int w_i(u, M) \, s_i(u, M) \, T_i(v) \, dv    (Eq. 9)

where D(u, M) is the deformed point u of object model M, w_i is the falloff function, s_i is the strength function, T_i is a general matrix transformation and v is one of the continuous features. To explain this framework better, let us dissect each component of the equation. T_i refers to a generalized linear transformation applied to a model point u in M. This can range from a simple translation to a complex set of translations, rotations, shears and other operations. Without loss of generality, we can assume it to be a translation for now. We can then visualize T_i as a dense set of translation vectors associated with each continuous feature v. A continuous feature v is essentially a controlling mechanism that allows the user to manipulate the deformation effect. This can be a set of points, a linear constraint, a curvilinear constraint, or a mathematical equation. We can imagine, for example, how a linear constraint changes the dense displacement field. So far we have assumed that the displacement field is uniform in length; to modulate the strength of the vector field, the strength function s is introduced. Finally, the falloff function w controls the extent, or region of influence, of the deformation primitive. A pictorial summary of this is shown in Figure 18.

Thus a geometric deformation primitive is cast into this framework by carefully selecting the falloff function, the strength function, the feature set primitives and the general transformation. Milliron et al. show how a Free-Form Deformation is cast into this framework by selecting each parameter as follows:

Feature set: points
Transformation: T = translation between feature points
Strength function: s = 1
Falloff function: w = B-spline basis functions

Figure 18: Pictorial Overview. (a) Dense displacement field (b) Linear constraint added (c) Strength function added (d) Region of influence added

It is now evident how I simplified this general framework for my proposed system. First, the feature set is fixed to 3D points. Second, the general transformation parameter is always fixed to translations, or displacements between the feature points. Third, the strength function is always set to 1. Finally, the weighting function can only be a scalar, a multiplication of polynomials, or a radial basis blending function.
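Under these constraints, as I read them, the general framework collapses to a sum of weighted feature-point displacements. The following minimal sketch (my own; the names and the additive form are assumptions, not the dissertation's notation) makes that concrete:

#include <cstddef>
#include <vector>

struct Vec3f { float x = 0, y = 0, z = 0; };

// Constrained form of the general warp framework used here:
// point features, unit strength, translation-only transforms, i.e.
//   D(u) = u + sum_k w_k(u) * (x_k' - x_k)
// where w_k is a scalar weighting function (constant, product of
// polynomials, or radial basis function) and (x_k' - x_k) is the
// displacement of the k-th feature point.
Vec3f deform(const Vec3f& u,
             const std::vector<Vec3f>& displacement,   // x_k' - x_k per feature
             const std::vector<float>& weight) {       // w_k(u), precomputed
    Vec3f d = u;
    for (std::size_t k = 0; k < displacement.size(); ++k) {
        d.x += weight[k] * displacement[k].x;
        d.y += weight[k] * displacement[k].y;
        d.z += weight[k] * displacement[k].z;
    }
    return d;
}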
By constraining this general framework as shown, designing a new graphics co-processor becomes feasible and practical.

6.4 Design of a Graphics Co-processor

The previous section can be summarized by Table 1. As this table shows, the fundamental operation we are dealing with is a floating-point multiply-accumulate (FMAC), and the method by which a new core technique is added on top of this fundamental operation is to vary the input parameters to the FMAC. For example, for FFD the input would be a multiplication of B-spline basis functions, whereas for RBF it would be an RBF blending function. From the programmer's point of view, the usage of these core techniques depends on the preprocessing that the deformation primitive requires. For example, RBF direct surface manipulation requires a one-time pre-processing setup of the surface control points, whereas PSD requires the pre-processing setup of the different blend shapes.

API (differs in usage)                                 | Core Technique (differs in parameters) | Fundamental Operation
FFD                                                    | FFD                                    | Floating-point multiply-accumulate
PSD                                                    | RBF                                    | Floating-point multiply-accumulate
RBF (direct surface deformation / expression cloning)  | RBF                                    | Floating-point multiply-accumulate
SSD                                                    | Dot product                            | Floating-point multiply-accumulate
FEM                                                    | Dot product                            | Floating-point multiply-accumulate

Table 1: Layered Abstraction Table

Since there has already been a lot of work done in digital signal processing chip design on the design of an FMAC, I am going to assume that an FMAC unit is available to work with in the following block diagram designs of a graphics co-processor. This also means that the design will be primarily concerned with the structure and design of the available input functions. The input functions that I am interested in are a constant (for dot products), the uniform B-spline basis functions, and the RBF blending functions. They are outlined in Eq. 10. Notice that the design shown in Figure 7 handles both the scalar case and the FFD case (both quadratic and cubic basis functions). The problem now becomes how the design is to be augmented to take care of the various RBF blending functions. Notice that the RBF blending functions are quite different from the uniform B-spline blending functions. So unlike constant functions and polynomials, there is no direct method of representing the various RBF blending functions in hardware. One alternative is to introduce a table lookup for each of the different kinds of RBF blending functions. The advantages are that it is fast, the accuracy of table-lookup based techniques depends solely on memory size, and the method is applicable to any kind of function. The drawback is that each function requires its own table. Ensuring a certain accuracy for each represented function also means that the memory requirements of every function increase rapidly.

Uniform quadratic B-spline basis functions:
B_0(u) = 1/2 - u + (1/2)u^2
B_1(u) = 1/2 + u - u^2
B_2(u) = (1/2)u^2

Uniform cubic B-spline basis functions:
B_0(u) = (1/6)(1 - 3u + 3u^2 - u^3)
B_1(u) = (1/6)(4 - 6u^2 + 3u^3)
B_2(u) = (1/6)(1 + 3u + 3u^2 - 3u^3)
B_3(u) = (1/6)u^3

Various RBF blending functions:
h(r) = e^{-(r/c)^2}              (Gaussian)
h(r) = 1 / (c^2 + r^2)^beta,  beta > 0
h(r) = (c^2 + r^2)^beta,  0 < beta < 1
h(r) = sqrt(r^2 + c^2)           (multiquadric)
h(r) = r^2 ln(r)                 (thin-plate spline)
h(r) = r                         (linear)
                                                     (Eq. 10)
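As a rough illustration of the table-lookup alternative just discussed (a sketch of mine; the table size, clamping behavior and class layout are assumptions), a sampled kernel can be stored once and evaluated with linear interpolation, which makes the memory-versus-accuracy tradeoff explicit:

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Table-lookup evaluator for an arbitrary RBF blending function h(r).
// One table is needed per kernel; accuracy is governed by the table size.
class KernelTable {
public:
    KernelTable(double (*h)(double), double rMax, std::size_t samples)
        : rMax_(rMax), table_(samples) {
        for (std::size_t i = 0; i < samples; ++i)
            table_[i] = h(rMax * static_cast<double>(i) / (samples - 1));
    }
    // Linear interpolation between the two nearest samples.
    double operator()(double r) const {
        double t = std::min(std::max(r / rMax_, 0.0), 1.0) * (table_.size() - 1);
        std::size_t i = static_cast<std::size_t>(t);
        if (i + 1 >= table_.size()) return table_.back();
        double f = t - i;
        return (1.0 - f) * table_[i] + f * table_[i + 1];
    }
private:
    double rMax_;
    std::vector<double> table_;
};

// Example: a Gaussian kernel with c = 1, sampled at 256 points up to r = 4.
double gaussian(double r) { return std::exp(-r * r); }
// KernelTable gaussTable(&gaussian, 4.0, 256);   // usage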
One other method, which takes advantage of the proposed hardware, is to use a spline approximation of the function. First of all, this reduces the memory requirements for each blending function: instead of a whole table of sample points and corresponding values, all that is required is a set of spline control points. As was observed in Chapter 4, an FFD can be thought of as a 3D spline. Thus it is possible to leverage the current hardware immediately by adding the appropriate control flow, as shown in Figure 19. Generating the appropriate blending function then consists of loading the proper spline control points for the function that is to be evaluated and passing in the proper parameters. A loop back is then required to pass the result back into the system for the final evaluation of the RBF equation.

Figure 19 shows the modifications made to the previous hardware design for FFD. First, a new bank of multiplexers is added to allow for the three types of possible inputs to the system. The leftmost is a feedback loop directly from the accumulation register. The middle input is a pass-through from the polynomial evaluators. The final input is the standard input from the vector multiplier. Each of these inputs will be discussed shortly. Essentially, the design has not changed with respect to FFD; by selecting the appropriate multiplexer input, the data flow remains the same as was discussed in Chapter 4.

To obtain the dot product primitive, two settings need to be configured properly. The first is to set the polynomial evaluators to pass-through mode, which is essentially f(x) = x. This allows an external program to feed both parameters (s, t, u) and (P_x, P_y, P_z). The second is to set the multiplexer to use the middle input, the polynomial evaluator pass-through. This setting bypasses both the polynomial evaluator and vector multiplier stages and feeds the data directly into the multiply-accumulate unit.

The final setting, used for RBF interpolation, is more involved since, unlike the previous two settings, the data flow is no longer simply linear. Before analyzing the data flow for RBF interpolation, a more detailed look into the feedback path (leftmost input) and the register file in the multiply-accumulate unit is necessary. As shown in Figure 20, the register found inside the multiply-accumulate unit is actually a pair of registers with the appropriate data flow components. As will be explained in the data flow discussion below, this is necessary due to the complex feedback loop needed to reuse both the existing polynomial evaluator and multiply-accumulate units.

Figure 19: New Design (block diagram: three polynomial evaluators feeding the multiply-accumulate unit through the new multiplexer bank)

Figure 20: Register Design Close-up (input from adder, MUX/DEMUX, register pair, feedback loop, output back to adder)
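Before walking through the timing table below, here is a small software sketch (mine, not the dissertation's) of the spline-approximation idea: an arbitrary blending function is approximated with a uniform quadratic B-spline, the same basis the FFD path already evaluates. Fitting the control points by direct sampling of the function is a simplifying assumption; a least-squares fit would be more accurate.

#include <algorithm>
#include <cstddef>
#include <vector>

// Uniform quadratic B-spline basis (the same functions used by the FFD path).
static double B0(double u) { return 0.5 - u + 0.5 * u * u; }
static double B1(double u) { return 0.5 + u - u * u; }
static double B2(double u) { return 0.5 * u * u; }

// Approximate a 1D blending function h on [0, rMax] with a uniform quadratic
// B-spline whose control points are simply samples of h (a crude fit, but
// enough to show that only control points, not a full table, must be stored).
struct SplineApprox {
    std::vector<double> cp;   // n + 2 control points => n knot spans
    double rMax;

    static SplineApprox build(double (*h)(double), double rMax, std::size_t n) {
        SplineApprox s{std::vector<double>(n + 2), rMax};
        for (std::size_t i = 0; i < n + 2; ++i)
            s.cp[i] = h(rMax * static_cast<double>(i) / (n + 1));
        return s;
    }

    // Evaluate: pick the knot span, then multiply-accumulate three control
    // points against the quadratic basis -- exactly the FFD-style data flow.
    double eval(double r) const {
        double t = std::min(std::max(r / rMax, 0.0), 1.0) * (cp.size() - 2);
        std::size_t span = static_cast<std::size_t>(t);
        if (span >= cp.size() - 2) span = cp.size() - 3;   // clamp to last span
        double u = t - span;                               // local parameter in [0, 1]
        return cp[span] * B0(u) + cp[span + 1] * B1(u) + cp[span + 2] * B2(u);
    }
};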
Thus the first three time steps consist of the evaluation of the RBF blending function using a B-spline interpolation. The data parameter is first passed into the polynomial evaluator. Then the polynomial evaluator’s result is then fed directly into the multiply-accumulate unit. The result of the multiply-accumulate is accumulated in register 1. This set of operations (time step 1 thru 3) is repeated until the iterative summation of the spline interpolation is completed, typically 2 iterations for quadratic and three iterations for cubic. Then, the output of register 1 is then selected as the input back into the multiply-accumulate unit, as indicated in time step 4. Register 1 currently contains the approximation of the RBF blending function and is then multiplied with the appropriate weights. The result is then accumulated into register 2. The entire cycle (time steps 1 thru 5) is then repeated for the next iteration of the RBF equation. Another way to look at this is that the first set of iterations (steps 1 thru 3) is approximating one blending function of the RBF summation and steps 4 thru 5 is multiplying the blending function with the corresponding weight. Executing Steps 1 thru 5 completely only represents one cycle in this summation and must be repeated to complete the RBF calculation. This is also shown in Figure 21. 63 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Time Steps 1 thru 3 wk x R(uk) + wk + l x R(uk+l) +... A v ______ ______ / r \ Time Represents one Steps 4 complete cycle o f thru 5 intepolation and accumulation Figure 21: Various Loops 6.5 Emulation on Current Graphics Hardware I emulated the basic functions of some candidate deformation techniques including Pose-space deformation, radial basis functions and free-form deformation. Here are some pictorial results and some observations. Figure 22: Quadratic Free-Form Deformation Figure 22 shows a bouncing ball that has been embedded in a simple FFD lattice. This demo shows how a 3x3x3 FFD lattice is being animated by a set of sinusoids and how the ball deforms along 64 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. with the lattice. Notice that the FFD lattice is degraded to a quadratic FFD instead of a cubic FFD. This is primarily due to the fact that the current state of the art programmable graphics cards have such a limited space in terms of both memory store for the control points and program store. This demo shows the limits placed on the FFD due to the limited size of input and storage of current graphics cards. Figure 23: Pose-Space Deformation 65 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Figure 23 shows the results of using Pose-space deformation. This demo uses a one­ dimensional pose-space to represent the shape morphing from the arms extended pose (top most) to the fully flexed pose (bottom most). This demo shows a simplistic use of PSD for a single articulation point with only one degree of freedom. In a more complex model of the whole human body, one can surmise that there will be a lot of degrees of freedom and thus more blend shape pairs will be required (ie. a neutral to flexed pair of shapes). A higher dimensional pose-space will also be required to represent and calculate the different shape morphs between all these pairs. 
The currently available graphics cards will not be able to deal with the high-dimensional pose space PSD requires for finely sculpted articulated figures.

Figure 24: Radial Basis Functions

Figure 24 shows RBF surface deformation using 10 control points around the mouth area. To get more flexible facial animation, more control points around the whole face are necessary. Unfortunately, current graphics cards are limited in the total number of control points due to a limited number of input registers.

6.6 Conclusion

A new hardware unit design is presented here. This design stems from a new philosophy of adapting the hardware design to the selected deformation primitives. As the design progressed, a new problem became evident: the design of a general function evaluator. By introducing a general function evaluator, the modified FFD hardware unit can now handle various kinds of primitives and even more kinds of deformation APIs, as described by the layered abstraction. Finally, this new design is emulated on current-generation graphics cards using Cg, "C for Graphics", a programming language for programmable graphics cards.

Chapter 7 Summary and Future Work

7.1 Summary

In the course of building a Virtual Reality system, several problems became evident. First, there is no standard API to help programmers write deformation systems, and second, these deformation algorithms ran solely on the host processor and thus slowed down the VR simulation. To solve these problems, I set out to design a system that would be a new standard for deformation algorithms running on a graphics card. During this time, new advances in technology and research showed great promise in solving these problems. Hardware manufacturers such as Nvidia and ATI both introduced problem-specific deformation APIs. This only helped fragment the OpenGL graphics standard. From this, we can see that a problem-specific approach cannot solve this problem. This began my search for a unifying principle that would bring together seemingly disparate types of deformation primitives.

The first attempt was to introduce a simple yet versatile core computation. Hence, the FFD was chosen and was designed to work with existing graphics card architectures as a small vertex pre-processor. The second stage was to wrap this core computation so as to unify the various kinds of deformation primitives. This brought about the layered approach, where layer upon layer was added on top of and below the core computation to make a whole system. While it was possible to retarget many deformation primitives to work with this system, there were two major drawbacks that ultimately showed that the design was flawed. The first problem is that the retargeting layer ran on the host processor. This ensured that nearly any technique could be retargeted via this flexible system, but it consumed computation cycles that could be used for other work. Since the system's purpose was to accelerate the deformation calculation by offloading computation to the graphics card, it did not make sense for more computation to be introduced on the host processor as a side effect.
The second problem with the system is that it is fundamentally an FFD operation. Since FFD has its own advantages and drawbacks, those drawbacks also carry forward to the retargeted deformation primitive. This caused many additional problems in the design of the hardware system, specifically the adaptive subdivision layer.

In order to solve the problem at hand, a new design philosophy of adapting the hardware design to the deformation primitives was adopted. This ensures that all the computation is done on the graphics processor, since the actual algorithm of the specific deformation primitive is implemented in hardware. This would have introduced the problem of implementing each and every selected primitive in hardware; luckily, the selected primitives share a common operation and do not vary too greatly from each other. This observation was then extended and generalized by Milliron et al. to encompass nearly any type of geometric deformation and warp. Upon closer inspection, my observation is a simplification of their framework. This simplification allowed me to alter the original FFD hardware unit design only minimally, and yet achieve the functionality of all selected primitives. In addition, the new problem of general function evaluation was also solved via recursion: by reusing the existing hardware units, I was able to approximate any general function with a B-spline approximation. Thus the problem of evaluating any general function was reduced to finding and loading the correct set of B-spline control points into the graphics co-processor.

This new design of a modular system shows more promise than before as a standard deformation system that can be hardware accelerated and can work with existing graphics cards. In addition, it does not suffer from adding extra computation on the host processor, nor does it have the limitations or disadvantages associated with a single deformation primitive. This modular design spans a wide range of commonly used deformation primitives and yet requires only simple and elegant hardware.

7.2 Future Work

The next logical step following this work is to actually implement the hardware design with existing graphics cards, either as a PCI daughter board, an ASIC or FPGA, or directly integrated with the GPU. Once this is done, extensions to the OpenGL or DirectX API need to be formulated. This will directly test the theories presented in this dissertation.

Thinking further ahead, the next possible avenue of interest is programmability. Notice that the B-spline function approximation unit is able to represent a wide variety of functions, and changing the behavior of the B-spline only entails a change in the set of control points used. This allows the user to tinker with the type of function that is used for blending. By varying the blending function, one can conceivably create a new type of geometric deformation. This is evident since our design hinges on the fact that it is a special case of Milliron et al.'s general framework for geometric deformations. The basic premise of their framework is that all geometric deformation can be expressed as a multiply-accumulate of three distinct "functions": the scale, the blending function and the weight.
Our constraints applied only to the scale and the weights, which means that while my design spans a smaller subspace of the general framework, there is still a subspace of constrained geometric deformations that my design is capable of executing. This flexibility amounts to the same paradigm of programmability for the graphics co-processor: by sending it different kinds of control points, the programmer is able to change its behavior.

Another extension along the same lines is to introduce another method for general function evaluation. One popular approach to such a problem is to use machine-learning algorithms to learn the different kinds of functions that need to be evaluated. In fact, RBF itself is a type of machine-learning algorithm. This would allow us to reuse the existing design and modify it to handle the RBF in a more efficient manner. By doing so, a new function evaluator unit could be used in lieu of the B-spline based approximation.

The final extension would be the generalization of this hardware design to encompass Milliron's general framework for geometric deformation. By not constraining the other two elementary functions in their framework, the next generation of graphics cards would be able to run a powerful character animation system entirely in hardware. Since the general framework encompasses any type of geometric deformation primitive, the programmer's choice of deformation algorithm would not be constrained in any way.

Bibliography

[Baraff 1998] D. Baraff, A. Witkin. Large Steps in Cloth Simulation. SIGGRAPH 98, pp. 43-54.

[Barr 1984] A. Barr. Global and Local Deformations of Solid Primitives. SIGGRAPH, July 1984, pp. 21-30.

[Bischoff 2000] S. Bischoff, L. Kobbelt, H.P. Seidel. Towards Hardware Implementation of Loop Subdivision. SIGGRAPH-EUROGRAPHICS Workshop on Graphics Hardware 2000, pp. 41-50.

[Blinn 1982] J. Blinn. A Generalization of Algebraic Surface Drawing. ACM Transactions on Graphics, Vol. 1, No. 3, pp. 235-256, July 1982.

[Capell 2002] S. Capell, S. Green, B. Curless, T. Duchamp, Z. Popovic. Interactive Skeleton-Driven Dynamic Deformations. SIGGRAPH 2002, pp. 586-593.

[Catmull 1978] E. Catmull and J. Clark. Recursively Generated B-Spline Surfaces on Arbitrary Topological Meshes. Computer Aided Design 10(6): 350-355, 1978.

[Coquillart 1990] S. Coquillart. Extended Free-Form Deformation: A Sculpting Tool for 3D Geometric Modeling. SIGGRAPH 1990, pp. 187-196.

[Christensen 1997] J. Christensen, J. Marks, J. Ngo. Automatic Motion Synthesis for 3D Mass-Spring Models. Visual Computer, Vol. 13, pp. 20-28, 1997.

[Chua 2000] C. Chua and U. Neumann. Hardware Accelerated Free Form Deformation. SIGGRAPH-EUROGRAPHICS Workshop on Graphics Hardware 2000, pp. 33-39.

[Chua 2001] C. Chua and U. Neumann. A Layered Approach to Deformable Modeling and Animation. IEEE Computer Animation, pp. 184-191, November 2001.

[Debunne 2000] G. Debunne, M. Desbrun, M.P. Cani, A. Barr. Adaptive Simulation of Soft Bodies in Real-Time. Computer Animation, pp. 17-24, 2000.

[Debunne 2001] G. Debunne, M. Desbrun, M.P. Cani, A. Barr. Dynamic Real-Time Deformation using Space and Time Adaptive Sampling. SIGGRAPH 2001, pp. 31-36.

[DeRose 1998] T. DeRose, M. Kass and T. Truong. Subdivision Surfaces in Character Animation. SIGGRAPH 1998, pp. 85-94.
[Desbrun 1995] M. Desbrun and M.P. Gascuel. Animating Soft Substances with Implicit Surfaces. SIGGRAPH 1995, pp. 287-290.

[Desbrun 1996] M. Desbrun and M.P. Gascuel. Smoothed Particles: A New Paradigm for Animating Highly Deformable Bodies. 6th Eurographics Workshop on Animation and Simulation 1996.

[Desbrun 1998] M. Desbrun and M.P. Gascuel. Active Implicit Surfaces for Animation. Graphics Interface 1998.

[Dyn 1987] N. Dyn, D. Levin, J. Gregory. A 4-point Interpolatory Subdivision Scheme for Curve Design. Computer Aided Geometric Design 4 (1987), pp. 257-268.

[Dyn 1990] N. Dyn, D. Levin, J. Gregory. A Butterfly Subdivision Scheme for Surface Interpolation with Tension Control. ACM Transactions on Graphics, Vol. 9, No. 2, April 1990, pp. 160-169.

[Faloutsos 1997] P. Faloutsos, M. van de Panne, D. Terzopoulos. Dynamic Free-Form Deformations for Animation Synthesis. IEEE Transactions on Visualization and Computer Graphics, Vol. 3, No. 3, pp. 201-214, July-September 1997.

[Gibson 1997] S. Gibson, B. Mirtich. A Survey of Deformable Modeling in Computer Graphics. MERL Technical Report TR-97-19, November 1997.

[Gain 1996] J. Gain. Virtual Sculpting: An Investigation of Directly Manipulated Free-Form Deformation in a Virtual Environment. Masters Thesis, Rhodes University, February 1996.

[Gain 1999] J. Gain and N. Dodgson. Adaptive Refinement and Decimation under Free-Form Deformation. Eurographics UK 1999, Cambridge, UK, April 1999.

[Gain 2000] J. Gain. PhD Thesis, The Computer Laboratory, University of Cambridge, Technical Report TR499, June 2000.

[Gascuel 1997] M.P. Gascuel and M. Desbrun. Animation of Deformable Models using Implicit Surfaces. IEEE Transactions on Visualization and Computer Graphics, March 1997.

[Hsu 1992] W. Hsu, J. Hughes and H. Kaufman. Direct Manipulation of Free-Form Deformations. SIGGRAPH 1992, pp. 177-184.

[Intel] Intel Corp. http://developer.intel.com/technology/agp.

[James 1999] D. James and D. Pai. ArtDefo: Accurate Real Time Deformable Objects. SIGGRAPH 1999, pp. 65-72.

[James 2002] D. James, D. Pai. DyRT: Dynamic Response Textures for Real Time Deformation and Simulation with Graphics Hardware. SIGGRAPH 2002, pp. 582-585.

[Kobbelt 1996] L. Kobbelt. Variational Subdivision Schemes. Computer Aided Geometric Design 13 (1996), pp. 743-761.

[Kry 2002] P. Kry, D. James, D. Pai. EigenSkin: Real Time Large Deformation Character Skinning in Hardware. Symposium on Computer Animation 2002, pp. 153-159.

[Levoy 1985] M. Levoy and T. Whitted. The Use of Points as a Display Primitive. Technical Report 85-022, Computer Science Department, University of North Carolina at Chapel Hill, January 1985.

[Lewis 2000] J. P. Lewis, M. Cordner, N. Fong. Pose Space Deformation: A Unified Approach to Shape Interpolation and Skeleton-Driven Deformation. SIGGRAPH 2000, pp. 165-172.

[MacCraken 1996] R. MacCracken and K. Joy. Free-Form Deformation with Lattices of Arbitrary Topology. SIGGRAPH 1996, pp. 181-188.

[Mandal 2000] C. Mandal, H. Qin, B. Vemuri. Dynamic Modeling of Butterfly Subdivision Surfaces. IEEE Transactions on Visualization and Computer Graphics, July-September 2000, pp. 265-287.

[McDonnell 2000] K. McDonnell, H. Qin. Dynamic Sculpting and Animation of Free-Form Subdivision Solids. Computer Animation 2000, pp. 138-145.

[Menon 1996] J. Menon. An Introduction to Implicit Techniques. SIGGRAPH Course Notes on Implicit Surfaces for Geometric Modeling and Computer Graphics, 1996.
[Milliron 2002] T. Milliron, R. Jensen, R. Barzel, A. Finkelstein. A Framework for Geometric Warps and Deformations. ACM Transactions on Graphics, Vol. 21, No. 1, January 2002, pp. 20-51.

[Molnar 2002] S. Molnar. Nvidia Corp., Chief Architect. Personal communication, March 2002.

[Muller 2000] K. Muller and S. Havemann. Subdivision Surface Tesselation on the Fly using a Versatile Mesh Data Structure. Eurographics, Vol. 19, No. 3, pp. 151-159, 2000.

[Myer 1968] T. H. Myer and I. E. Sutherland. On the Design of Display Processors. Communications of the ACM, Vol. 11, No. 6, June 1968, pp. 410-414.

[Nimscheck 1995] U. M. Nimscheck. PhD Thesis, The Computer Laboratory, University of Cambridge, Technical Report TR381, October 1995.

[Noh 1998] J. Noh and U. Neumann. A Survey of Facial Modeling and Animation Techniques. USC TR-99-705, 1998.

[Nvidia 2003] http://www.nvidia.com/Cg

[Peng 1997] Q. Peng, X. Jin, J. Feng. Arc-Length-Based Axial Deformation and Length Preserving Deformation. Computer Animation, 1997.

[Poggio 1989] T. Poggio and F. Girosi. A Theory of Networks for Approximation and Learning. A.I. Memo No. 1140, Artificial Intelligence Lab, MIT, Cambridge, MA, July 1989.

[Sederberg 1986] T. Sederberg and S. Parry. Free-Form Deformation of Solid Geometric Models. SIGGRAPH 1986, pp. 151-160.

[Singh 1998] K. Singh and E. Fiume. Wires: A Geometric Deformation Technique. SIGGRAPH 1998, pp. 405-414.

[Terzopoulos 1987] D. Terzopoulos, J. Platt, A. Barr, K. Fleischer. Elastically Deformable Models. SIGGRAPH 1987, pp. 205-214.

[Wyvill 1986] G. Wyvill, C. McPheeters, B. Wyvill. Data Structure for Soft Objects. The Visual Computer, Vol. 2, pp. 227-234, 1986.

[Zorin 1997] D. Zorin, P. Schroder, W. Sweldens. Interpolating Subdivision for Meshes with Arbitrary Topology. SIGGRAPH 1997, pp. 259-268.

Appendix A

A.1 Cg Program for Quadratic Uniform Free-Form Deformation

struct appin : application2vertex {
    float4 u;        // the vertex to displace
    float4 normal;
};

struct vertOut : vertex2fragment {
    float4 HPOS : HPOS;
    float4 COL0 : COL0;
};

// Uniform quadratic B-spline basis functions
float4 b0( float4 u ) { return 0.5 - u + 0.5*u*u; }
float4 b1( float4 u ) { return 0.5 + u - u*u; }
float4 b2( float4 u ) { return 0.5*u*u; }

vertOut main( appin In,
              uniform float4x4 ModelViewProj,
              uniform float4 LightVec,
              uniform float4 cp000, uniform float4 cp001, uniform float4 cp002,
              uniform float4 cp010, uniform float4 cp011, uniform float4 cp012,
              uniform float4 cp020, uniform float4 cp021, uniform float4 cp022,
              uniform float4 cp100, uniform float4 cp101, uniform float4 cp102,
              uniform float4 cp110, uniform float4 cp111, uniform float4 cp112,
              uniform float4 cp120, uniform float4 cp121, uniform float4 cp122,
              uniform float4 cp200, uniform float4 cp201, uniform float4 cp202,
              uniform float4 cp210, uniform float4 cp211, uniform float4 cp212,
              uniform float4 cp220, uniform float4 cp221, uniform float4 cp222 )
{
    vertOut Out;

    float4 sum = float4( 0.0, 0.0, 0.0, 0.0 );
    float4 temp;
    float4 u = In.u;
    u.w = 0.0;

    // first calculate the quadratic basis functions
    float4 r1 = b0( u );
    float4 r2 = b1( u );
    float4 r3 = b2( u );
    r1.w = 0.0;
    r2.w = 0.0;
    r3.w = 0.0;

    // transpose the basis matrix: r5/r6/r7 hold (B0, B1, B2) evaluated at s, t, u
    float4 r5 = r1.xwww + r2.wxww;
    r5.z = r3.x;
    r5.w = 0.0;
    float4 r6 = r1.ywww + r2.wyww;
    r6.z = r3.y;
    r6.w = 0.0;
    float4 r7 = r1.zwww + r2.wzww;
    r7.z = r3.z;
    r7.w = 0.0;

    // multiply them out
    float4 r8  = r5.x * r6;
    float4 r9  = r5.y * r6;
    float4 r10 = r5.z * r6;

    // accumulate ra = sum over the 3x3x3 lattice of cp_ijk * B_i(s) B_j(t) B_k(u)
    float4 rw;
    float4 ra = float4( 0.0, 0.0, 0.0, 0.0 );

    rw = r8.x * r7;
    ra = cp000 * rw.x + ra;
    ra = cp001 * rw.y + ra;
    ra = cp002 * rw.z + ra;

    rw = r8.y * r7;
    ra = cp010 * rw.x + ra;
    ra = cp011 * rw.y + ra;
    ra = cp012 * rw.z + ra;

    rw = r8.z * r7;
    ra = cp020 * rw.x + ra;
    ra = cp021 * rw.y + ra;
    ra = cp022 * rw.z + ra;

    rw = r9.x * r7;
    ra = cp100 * rw.x + ra;
    ra = cp101 * rw.y + ra;
    ra = cp102 * rw.z + ra;

    rw = r9.y * r7;
    ra = cp110 * rw.x + ra;
    ra = cp111 * rw.y + ra;
    ra = cp112 * rw.z + ra;

    rw = r9.z * r7;
    ra = cp120 * rw.x + ra;
    ra = cp121 * rw.y + ra;
    ra = cp122 * rw.z + ra;

    rw = r10.x * r7;
    ra = cp200 * rw.x + ra;
    ra = cp201 * rw.y + ra;
    ra = cp202 * rw.z + ra;

    rw = r10.y * r7;
    ra = cp210 * rw.x + ra;
    ra = cp211 * rw.y + ra;
    ra = cp212 * rw.z + ra;

    rw = r10.z * r7;
    ra = cp220 * rw.x + ra;
    ra = cp221 * rw.y + ra;
    ra = cp222 * rw.z + ra;

    // need to calculate the color
    /* lighting calculations */
    float4 light = normalize( LightVec );
    float4 eye = float4( 0.0, -40.0, 200.0, 1.0 );
    float4 halfVec = normalize( light + eye );   // half-angle vector ('half' is a Cg keyword)
    float diffuse = dot( normalize( In.normal ), light );
    float specular = dot( normalize( In.normal ), halfVec );
    specular = pow( specular, 32 );
    float4 diffuseMaterial = float4( 1.0, 1.0, 1.0, 1.0 );
    float4 specularMaterial = float4( 0.5, 0.5, 0.5, 1.0 );
    Out.COL0 = diffuse * diffuseMaterial + specular * specularMaterial + 0.3;

    temp = ra;
    temp.w = 1;
    Out.HPOS = mul( ModelViewProj, temp );

    return Out;
}
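As a hedged side note (not part of the dissertation's appendix), the vertex program above could be loaded and its uniforms bound from the host through the Cg runtime roughly as sketched below; the file name, error handling and lattice storage order are my own assumptions.

#include <cstdio>
#include <Cg/cg.h>
#include <Cg/cgGL.h>

void setupFFDProgram(const float lattice[27][4], const float lightVec[4]) {
    CGcontext ctx = cgCreateContext();
    CGprofile profile = cgGLGetLatestProfile(CG_GL_VERTEX);
    CGprogram prog = cgCreateProgramFromFile(ctx, CG_SOURCE, "ffd_quadratic.cg",
                                             profile, "main", NULL);
    cgGLLoadProgram(prog);
    cgGLEnableProfile(profile);
    cgGLBindProgram(prog);

    // Bind the current GL modelview-projection matrix and the light vector.
    cgGLSetStateMatrixParameter(cgGetNamedParameter(prog, "ModelViewProj"),
                                CG_GL_MODELVIEW_PROJECTION_MATRIX,
                                CG_GL_MATRIX_IDENTITY);
    cgGLSetParameter4fv(cgGetNamedParameter(prog, "LightVec"), lightVec);

    // Upload the 3x3x3 lattice into cp000 .. cp222 (row-major i, j, k order
    // is an assumption about how the application stores its lattice).
    char name[8];
    int idx = 0;
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            for (int k = 0; k < 3; ++k) {
                std::snprintf(name, sizeof(name), "cp%d%d%d", i, j, k);
                cgGLSetParameter4fv(cgGetNamedParameter(prog, name), lattice[idx++]);
            }
}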
A.2 Cg Program for Radial Basis Functions

struct appin : application2vertex {
    float4 Position;   // the vertex to displace
    float4 normal;
    float4 theta;      // the NEW pose space point
    float4 w0;
    float4 w1;
    float4 w2;
    float4 w3;
    float4 w4;
    float4 w5;
    float4 w6;
    float4 w7;
};

// define the output registers
struct vertOut : vertex2fragment {
    float4 HPOS : HPOS;
    float4 COL0 : COL0;
};

float h( float x, float c )
{
    // for gaussian
    float temp;
    temp = x / c;
    temp = temp * temp;
    return exp( -temp );
}

vertOut psd( appin In,
             uniform float4x4 ModelViewProj,
             uniform float4x4 ModelViewIT,
             uniform float c,
             uniform float4 LightVec,
             uniform float4 theta0,
             uniform float4 theta1,
             uniform float4 theta2,
             uniform float4 theta3,
             uniform float4 theta4,
             uniform float4 theta5,
             uniform float4 theta6,
             uniform float4 theta7 )
{
    vertOut Out;

    float4 sum = float4( 0.0, 0.0, 0.0, 0.0 );
    float4 temp;

    // accumulate sum = sum_k w_k * h(||theta - theta_k||, c)
    temp = In.theta - theta0;
    sum = sum + ( In.w0 * h( length( temp ), c ) );
    temp = In.theta - theta1;
    sum = sum + ( In.w1 * h( length( temp ), c ) );
    temp = In.theta - theta2;
    sum = sum + ( In.w2 * h( length( temp ), c ) );
    temp = In.theta - theta3;
    sum = sum + ( In.w3 * h( length( temp ), c ) );
    temp = In.theta - theta4;
    sum = sum + ( In.w4 * h( length( temp ), c ) );
    temp = In.theta - theta5;
    sum = sum + ( In.w5 * h( length( temp ), c ) );
    temp = In.theta - theta6;
    sum = sum + ( In.w6 * h( length( temp ), c ) );
    temp = In.theta - theta7;
    sum = sum + ( In.w7 * h( length( temp ), c ) );

    /* calculating lighting */
    float4 light = normalize( LightVec );
    float4 eye = float4( -10.0, 0.0, 20.0, 1.0 );
    float4 halfVec = normalize( light + eye );   // half-angle vector ('half' is a Cg keyword)
    float diffuse = dot( normalize( In.normal ), light );
    float specular = dot( normalize( In.normal ), halfVec );
    specular = pow( specular, 32 );
    float4 diffuseMaterial = float4( 0.7, 0.7, 0.7, 1.0 );
    float4 specularMaterial = float4( 0.3, 0.3, 0.3, 1.0 );
    Out.COL0 = diffuse * diffuseMaterial + specular * specularMaterial + 0.1;

    temp = In.Position + sum;
    temp.w = 1;
    Out.HPOS = mul( ModelViewProj, temp );

    return Out;
}
Appendix B

B.1 Proposed API Specification

Basic Setup API

Function Name               | Parameters                                                          | Description
glMacc                      | Number of loops, two arrays to multiply-accumulate                  | Direct access function to the hardware FMAC unit
glDot                       | Two arrays on which to perform the dot product                      | Initializes the dot product engine
glNewFFDRenderer            | FFD lattice points, the basis functions, the order of the basis functions | Initializes the FFD engine with the specified parameters
glNewRBFRenderer            | Control points, weight vectors, blending function                   | Initializes the RBF engine with the specified parameters
gluNewPSDRenderer           | Pose space definition (number of degrees of freedom)                | Initializes the PSD engine with the specified parameter
gluNewSurfaceDeformRenderer | Control points, weight vectors, blending function                   | Initializes the RBF engine with the specified parameters
gluNewSSDRenderer           | Blending matrices                                                   | Initializes the dot product engine with the specified parameters
gluNewFEMRenderer           | FEM stiffness matrix and displacement matrix                        | Initializes the FEM engine with the specified parameters

Table 3: Basic Setup API

Rendering Primitives

Function Name    | Parameters                                                                       | Description
gluBeginSpecial  | Token to specify the type of primitive to render, ID to specify the rendering engine parameters to use | Works like glBegin except for an additional parameter that specifies the use of a renderer defined by one of the setup functions above
gluEndSpecial    | None                                                                             | Marks the end of the rendering sequence; works like glEnd
glFFDVertex      | Parameterized coordinate, the location of the lattice cell to use               | Using the parameterized coordinate and a specific cell of the lattice, calculates the new object point
glRBFVertex      | Weight vectors, new point location                                              | Used by RBF and PSD renderers to render the primitive by specifying the weight vectors and the new point location
glDotVertex      | Vectors on which to perform the dot product                                     | Used by SSD and FEM renderers to calculate the new point location

Table 4: Rendering Primitives
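To give a feel for how the proposed API in Tables 3 and 4 might be used, here is a hypothetical sketch of my own. The dissertation specifies only function names and parameter roles, so the exact C signatures, the GL_FFD token and the data layout below are illustrative assumptions, not a real or finalized interface.

// Assumed prototypes for the proposed entry points (not an existing library).
enum SpecialToken { GL_FFD, GL_RBF, GL_PSD, GL_SSD, GL_FEM };
int  glNewFFDRenderer(const float* latticePoints, int basisOrder);
void gluBeginSpecial(SpecialToken type, int rendererId);
void glFFDVertex(const float stu[3], const int cell[3]);
void gluEndSpecial();

void drawDeformedMesh(const float* lattice, int vertexCount,
                      const float (*stu)[3], const int (*cell)[3]) {
    // One-time setup: a 3x3x3 lattice with quadratic (order-2) basis functions.
    int ffd = glNewFFDRenderer(lattice, /*basisOrder=*/2);

    // Per frame: each vertex supplies its parametric coordinate and lattice
    // cell; the co-processor evaluates the FFD sum (Eq. 7) in hardware.
    gluBeginSpecial(GL_FFD, ffd);
    for (int i = 0; i < vertexCount; ++i)
        glFFDVertex(stu[i], cell[i]);
    gluEndSpecial();
}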