RGBD Camera Based Wearable Indoor Navigation System for the Visually Impaired

by Young Hoon Lee

A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the Requirements for the Degree
DOCTOR OF PHILOSOPHY (ELECTRICAL ENGINEERING)

May 2016

Copyright 2016 Young Hoon Lee

Acknowledgements

First of all, I would like to thank my PhD advisor, Professor Gérard Medioni, for being an excellent advisor and mentor. He has helped me pursue my ideas while offering insights and a fresh perspective at important moments. His enthusiasm for research has been a great inspiration for my computer vision research. Pursuing a PhD is a long journey that has many (hidden) pitfalls along the way. I could have been the one to give up on this journey when trapped in one of them. His guidance, patience, and encouragement have helped me to overcome those difficulties and move on. My PhD work would not have been accomplished without his constructive advice and warm support. I would also like to thank the members of my PhD committee, Professor James Weiland, Professor Alexander Sawchuk, Professor Armand Tanguay, Professor Laurent Itti, and Professor Gaurav Sukhatme, for their insightful advice.

I am very grateful to Dr. Jongmoo Choi and Dr. Sungchun Lee for being such good mentors and friends of mine. Thanks to their help and support, I could easily get used to living in LA and complete my PhD program without any big trouble. I am glad to have Anh Tran Tuan, Dr. Tung-Sing Leung, and Dr. Anustup Kumar Choudhury as office mates in the lab. We have had many conversations on life and research, and have encouraged each other to make more solid progress in our research. I truly thank them for being such nice and supportive friends.

All of the past and current members of the USC Computer Vision Lab also made my time at USC special: Dr. Prithviraj Banerjee, Dr. Pramod K. Sharma, Dr. Dian Gong, Dr. Xuemei Zhao, Dr. Furqan M Khan, Dr. Cheng-Hao Kuo, Dr. Zhuoliang Kang, Dr. Chen Sun, Dr. Vivek Pradeep, Dr. Jan Prokaj, Dr. Bo Yang, Dr. Yann Dumortier, Dr. Iacopo Masi, Dr. Remi Trichet, Dr. Yinghao Cai, Shay Deutsch, Matthias Hernandez, Kanggeon Kim, Jungyeon Kim, Borjeng Chen, Jatuporn Leksut, Rama Reddy Knv, Ruizhe Wang, and Aminat Adebiyi. I would like to thank all of them for being friendly and helpful colleagues. I wish them all the best.

I am thankful to my family in Korea for their endless support and love across thousands of miles. And last but not least, I would like to express special thanks to my wife, Heeyon, and daughter, Chloe, for their unconditional love, support, and encouragement. This dissertation is dedicated to you.

Table of Contents

Acknowledgements
List of Tables
List of Figures
Abstract
1 Introduction
1.1 Problem statement
1.2 Challenges
1.3 Related work
1.3.1 Reflectance based devices
1.3.2 GPS based devices
1.3.3 IMU based devices
1.3.4 Computer vision based devices
1.3.5 Other approaches
1.4 Outline
1.4.1 Hardware components
1.4.2 Navigation software components
2 Real-time 6DOF pose estimation
2.1 Introduction
2.1.1 Iterative Closest Point (ICP)
2.1.2 Assumptions
2.2 Orientation initialization
2.3 RGBD visual odometry
2.3.1 Frame-to-frame pose estimation
2.3.2 Pose refinement using ICP and the ground plane
2.3.3 Results
2.3.3.1 Experiment 1: Pose estimation with a blind-folded subject
2.3.3.2 Experiment 2: Pose estimation with blind subjects
3 Real-time 2D occupancy grid mapping
3.1 Related Work
3.2 2D Occupancy Grid Mapping and Traversability Analysis
3.3 Safety measure using chamfer distance transform
4 Offline Data storage
4.1 Managing multiple map building results
4.2 Multi level hash table data structure
4.3 Map merge
4.3.1 Limitation
5 Dynamic path planning in the presence of obstacles
5.1 D* Lite algorithm
5.2 Waypoint generation
5.3 Cue generation and Haptic feedback
5.3.1 Introduction
5.4 Results
5.4.1 Validation of waypoint generation algorithm
6 Mobility experiments
6.1 Mobility Experiments with Visually Impaired Subjects
6.1.1 Motivation
6.1.2 Method
6.1.2.1 Subject Recruitment
6.1.2.2 Training with manual cue generation
6.1.2.3 Testing with autonomous cue generation
6.1.3 Results and Discussion
6.1.3.1 Training with manual cue generation
6.1.3.2 Testing with autonomous cue generation
6.2 Mobility Experiments with Blind-Folded Subjects
6.2.1 Motivation
6.2.2 Method
6.2.3 Results
6.3 Summary
7 Conclusion and Future work
7.1 Contributions
7.2 Directions for Future Work
7.2.1 Robust egomotion estimation by multi modal sensor integration
7.2.2 Usability improvement
Appendix A Staircase Detection From a Wearable Stereo Camera
A.1 Introduction
A.2 Related work
A.3 Overview of algorithm
A.3.1 Candidate Detection
A.3.2 False Alarm Removal
A.3.2.1 Ground plane estimation
A.3.2.2 Temporal Consistency
A.4 Experiments and Results
A.4.1 Results
A.5 Conclusion
Reference List

List of Tables

6.1 Summary of data from subject recruitment.

List of Figures

1.1 Obstacles in blind spots of the white cane which may cause injuries.
1.2 Overview of the proposed prototype: a head-mounted camera+IMU sensor, a smartphone user interface, and a tactile interface system.
1.3 Overview of the navigation system.
1.4 Snapshots of the smartphone application interface for a blind user.
2.1 Overview of the pose estimation algorithm using both sparse visual features and dense point clouds.
2.2 (Left) Shaded areas in green are points that have a normal parallel to the gravity vector and are used to extract the ground plane P_G. (Right) The ground plane model represented in the global reference frame.
2.3 Sparse feature based visual odometry that finds matches between visual feature points in image frames between the current observation and a reference keyframe, then matches the corresponding 3D points to calculate motion in 3D space.
2.4 FAST corners and their correspondences from the last keyframe after refinement.
2.5 Fast ICP using the ground plane and an initial guess aided by the sparse feature based pose estimation algorithm.
2.6 A sliding window buffer that maintains the n most recent keyframes for Fast ICP.
2.7 A floor plan of the experimentation area.
2.8 Trajectories of the sparse feature based algorithm (red), our algorithm (green), and our algorithm after graph optimization (blue). The ground truth is displayed as a dotted gray line.
2.9 Qualitative comparison of reconstructed 3D maps using different methods.
2.10 Experimental setup and results with the blind subjects.
2.11 Absolute and relative errors of each experiment.
2.12 Error comparison against one of the state-of-the-art RGBD pose estimation algorithms based on sparse visual features.
2.13 Error comparison against one of the state-of-the-art RGBD pose estimation algorithms based on sparse visual features.
3.1 Local 3D voxels displayed with respect to the world reference frame. Local traversability is calculated; O_w and O_c represent the origins of the world reference frame and the camera reference frame, respectively.
3.2 A filtering process using erosion and dilation. Black and white cells represent vertical and horizontal patches, respectively.
3.3 Red dots represent obstacles. The number in each cell represents the chamfer distance; the darker, the closer to obstacles.
4.1 Trajectories of two data collection tasks on a floor plan (left). 3D maps and the 2D occupancy grid map of the data collection tasks (right).
4.2 2D, 3D, and chamfer distance maps are stored after a mapping process.
4.3 Multi level hash table structure of a map.
4.4 Time and memory profile of the proposed data structure. B and F represent the number of buildings and the number of floors in a building, respectively.
4.5 Graphical representation of how the system makes a decision to merge two maps.
4.6 Graphical representation of how the system makes a decision to merge two maps.
4.7 Deformation of the 2D grid map caused by error accumulation cannot be corrected by rigid transformations only.
5.1 An example of a waypoint update scenario.
5.2 Hysteresis loop equation for cue generation. The left turn cases are simply a reflection of the graph about the vertical axis.
5.3 Timings of each process over 1500 frames.
5.4 Scene 1: a scanning desk to an exit.
5.5 Two sighted users traveled and recorded RGBD data for the system in 10 different spatial configurations. Estimated trajectories of the sighted subjects (green), the shortest path to reach a goal (red), and the waypoints generated by the proposed algorithm (blue) are represented.
6.1 Testing protocol.
6.2 Training environment.
6.3 Experimental setup and results with visually impaired subjects.
6.4 Time to completion of a training navigation course per subject per trial.
6.5 Average time and variability to completion of a training navigation course.
6.6 Trajectories each subject took to reach the destination.
6.7 Time to completion of a testing navigation course per subject per trial.
6.8 Average time to completion of a testing navigation course.
6.9 Responses collected from visually impaired subjects to the System Usability Scale (SUS).
6.10 A floor plan of the mobility experiments. Red boxes and lines indicate obstacles that are not reflected in the floor plan.
6.11 The average elapsed time to complete a navigation task for the Printer (left) and Exit (right) destinations per subject.
6.12 The average time to complete a navigation task and the variability of the blind-folded subjects.
A.1 Algorithm flowchart. Candidates are detected using Haar cascade classifiers, and additional false detection (yellow boxes) removal processes are applied using the ground plane from a stereo vision system and temporal consistency.
A.2 Staircases detected using the proposed algorithm. Note that the proposed algorithm can detect staircases from various viewpoints and lighting conditions.
A.3 ROC curves and performance analysis of our staircase detector.

Abstract

This thesis presents a wearable RGBD camera based indoor navigation system for the visually impaired (VI) that guides a blind user from the current location to another location. The requirements for such a system are accurate real-time camera pose estimation, real-time dynamic path planning in the presence of obstacles, and an efficient interface with low cognitive load. In order to estimate the 6-DOF pose of the blind user in real time, we perform ego-motion estimation, known as visual odometry, using sparse visual features. However, sparse feature based approaches suffer from drift. Therefore, we refine the estimated poses using dense point clouds and the ground plane when a new keyframe is generated. The system subsequently builds a local vicinity map based on dense 3D data obtained from an RGB-D camera, as well as a global 2D probabilistic occupancy grid map for traversability analysis. We have improved the system over the previous prototype by enabling the storage of the results of a mapping process. An efficient multi level data structure is proposed to store, load, and manage the data.

Next, the shortest path from point A to point B is generated based on this traversability analysis. The system analyzes the shortest path and generates a safe and efficient waypoint, an intermediate goal on the way to the destination. We developed an automatic waypoint generation algorithm that enhances trajectory planning for the blind subjects. Pilot studies showed that the proposed waypoint generation algorithm provides trajectories more similar to sighted subjects' trajectories than the shortest path algorithm does. A relative direction from the camera to the waypoint is calculated, and appropriate cues are generated to guide a blind user to the adjacent area of the waypoint and eventually to the destination.

The wearable system prototype is composed of a smartphone user interface, a head-mounted RGBD camera, a mapping algorithm that integrates multiple Simultaneous Localization And Mapping (SLAM) results at large scale, a real-time navigation algorithm, and a haptic feedback vest.
The proposed system achieves real-time navigation performance at 28.6 Hz on average on a laptop, and helps the visually impaired extend the range of their activities and improve their orientation and mobility performance in cluttered environments.

We have evaluated the performance of the proposed system with blind-folded and blind subjects. The mobility experiment results show that indoor navigation with the proposed system successfully avoids collisions and provides more stable mobility performance for the user compared to conventional and state-of-the-art mobility aid devices.

Chapter 1 Introduction

Try to imagine going somewhere with your eyes closed. You will soon realize that walking without sight severely affects your confidence in your ability to travel independently. According to the 2011 National Health Interview Survey, 21.2 million American adults age 18 and older reported suffering from vision loss [52]. Vision loss interferes with various daily routines such as reading, mobility, and face recognition. Moreover, it also increases the risk of injuries caused by collisions and falls. Previous studies have shown that vision loss severely lowers the mobility of the visually impaired [10, 56, 61]. Visual impairment also increases the risk of unintentional injuries, which often result in medical consequences [31, 36].

The long cane and the guide dog have been the most popular aids among the blind for navigation and collision avoidance. An interview in [36] indicated that 12% and 55% of the visually impaired interviewees have used the guide dog and the long cane as a primary mobility aid, respectively. However, the guide dog has very limited availability, as only about 1,500 individuals are estimated to graduate from a dog-guide user program every year [43]. The relatively small coverage range and the reactive nature of the long cane still carry a high risk of collision, because the visually impaired can avoid obstacles only after making contact with them. Moreover, the long cane covers only the lower body of a blind user. As shown in Figure 1.1, the long cane, the most popular mobility aid, sometimes fails to help the visually impaired safely ambulate in many real, complex scenes.

Figure 1.1: Obstacles in blind spots of the white cane which may cause injuries.

Over the past years, computer vision and mobile robotics technologies have advanced remarkably, aided by improvements of vision sensors for measuring 3D structure in terms of accuracy, price, power profile, and size. This noteworthy improvement in 3D sensing has driven various successful applications in autonomous navigation, manipulation, and human-robot interaction. Autonomous robot navigation and human navigation are different in many respects, including sensory systems, spatial representation, and behaviors. However, the underlying problems in the two domains are quite similar [60]. Therefore, we claim that it is very appropriate to transfer these advanced computer vision and autonomous robotics technologies into a wearable navigation system for the visually impaired to improve their mobility. This thesis investigates the relevant system and algorithmic design issues, provides solutions to some of the issues addressed, and validates the solutions through experiments.

1.1 Problem statement

Figure 1.2: Overview of the proposed prototype: a head-mounted camera+IMU sensor, a smartphone user interface, and a tactile interface system.
The goal of the proposed wearable indoor navigation system for the visually impaired is to provide navigation aid from the current location to another location. To achieve this goal, the system must answer the following questions: Where am I in the environment and with respect to the destination (localization)? What mobility-related risks are in the environment (mapping)? How do I get to the destination safely (dynamic path planning in the presence of obstacles)? Therefore, the system needs the ability to 1) compute self-motion and 2) interpret the spatial representation in a way that supports motion planning.

The process of computing self-motion in the robotics and computer vision fields is referred to as egomotion estimation. In order to maintain a sense of egocentricity for the blind user, we continuously estimate the relative camera pose with respect to a reference coordinate frame. Delayed processing that does not correspond to the current spatial configuration often gives rise to the risk of collisions and to confusion of the blind user. Hence, we need to perform real-time egomotion estimation while maintaining accuracy, which is a very challenging problem. Pose estimation algorithms using dense 3D point clouds, such as volumetric reconstruction [42, 63, 62], are computationally very expensive and generally not suitable for wearable real-time navigation applications because of their demanding hardware requirements. Therefore, we solve the egomotion estimation problem mainly using sparse visual features (FAST) from an RGB image and their corresponding 3D information. In order to reduce the drift of the sparse visual feature based ego-motion algorithm, we make use of dense 3D point clouds. To meet the real-time requirements, we developed a fast dense 3D point cloud Iterative Closest Point (ICP) algorithm using the ground plane as a global reference.

Mapping and dynamic path planning at large scale in 3D space are computationally expensive tasks for real-time applications where dynamic replanning is often required. Therefore, an efficient representation of the surrounding environment is essential for real-time mapping and dynamic path planning. We divide the world using a grid representation with three states: occupied, free, and unknown. The state of a cell in the grid is determined by the occupancy probability of the cell. This representation maintains an abstract scheme of the surrounding environment relevant to the navigation task, which becomes a basis for more efficient operations in path planning. Shortest path planning algorithms such as [28] provide the most efficient path in distance to reach a destination. However, they do not take physical constraints, such as the dimensions of the blind user's body, into consideration. Therefore, the shortest path is perceived as unrealistic and cannot be used directly for our application. Furthermore, delivering the entire path to a blind user causes another difficult user interface issue. Here, we solve this problem by generating a waypoint, an intermediate goal towards the destination, to guide the blind subject. The goal of this process is to generate an efficient, stable, and safe waypoint, based on the shortest path and the traversability analysis, that can be easily delivered to the tactile feedback system to guide the blind user. The design of a working prototype and the implementation of the aforementioned algorithms on that prototype is another problem to solve.
The proposed wearable indoor navigation system is composed of several modules and is designed to minimize the cognitive load of the blind user. Finally, we need to validate the performance of the proposed system on human subjects.

1.2 Challenges

A wearable RGB-D indoor navigation system faces the following common challenges.

Lack of a global reference: For successful navigation, localization is an essential step. Localization is the process of finding the relative location and orientation of a system by analyzing measurements from attached sensors with respect to a global reference coordinate frame or landmarks. Unlike outdoor environments, such a reference from a global positioning system (GPS) is not readily available in indoor environments. Therefore, a main challenge for indoor localization methods is how to perform accurate relative pose estimation without such a global reference.

Real-time constraint: The wearable navigation system should process sensor inputs in a timely manner to generate a cue that corresponds to the specific spatial configuration at a given time. Delays often give rise to the risk of collisions and to confusion of the blind user, because the information delivered to the user no longer matches the surroundings presented to the user. Visual Simultaneous Localization And Mapping (SLAM) in the robotics and computer vision fields has recently shown very successful performance. Visual SLAM approaches build an incremental map of the traversed environment and simultaneously localize the system within the map by detecting and matching visual features. One disadvantage of employing Visual SLAM approaches for real-time applications at large scale is that they become computationally very expensive as the number of features and the size of the map grow. Thus, a challenge in using Visual SLAM approaches is to design an efficient data structure and process to meet the real-time requirement.

Physical limitations of the wearable platform: There exist various design considerations for a wearable navigation system. As illustrated in [12], there are many physical requirements on wearable form factors, such as size and weight, compared to autonomous vehicle platforms or robotics platforms. Therefore, there are far fewer sensors available to use. As Hayden pointed out in [12], the performance of a device usually degrades as its size is reduced. Therefore, only relatively low quality, narrow field of view, and limited 3D information is available from wearable platforms. The performance of Visual SLAM approaches depends on the quality of the 2D and 3D inputs. Hence, maintaining accuracy and performing successful navigation tasks from a moving camera using the limited hardware resources of a wearable navigation system is also a very challenging problem.

Effective user interface: An important component of a wearable system is the human interface, which converts a cue generated by the navigation algorithms into a signal suitable for the user. Vision is the most dominant and reliable sensory modality for building spatial information. The visually impaired, however, have limitations in utilizing this most reliable sensory modality, partly or entirely, depending on the severity of their vision loss. Hence, people with vision loss rely on other sensory modalities to substitute for vision for many tasks, which increases cognitive load. For this reason, a very efficient communication interface that adds minimal cognitive load for the blind user is required.
1.3 Related work

Electronic Travel Aids (ETAs) are designed for the visually impaired to enhance user confidence for independent travel, rather than to completely replace conventional aids such as the long cane and guide dog. Various types of ETAs based on ultrasound, laser ranging, or imaging sensors have been proposed; however, there is currently no standardized or complete system available on the market. Such devices mainly consist of three basic components: a sensor, a processing system, and an interface. A sensor captures data about the surrounding environment in a specific form. The processing system then turns this data into useful information with which the visually impaired user can make a decision. Lastly, an interface delivers this processed information to the person using an appropriate sensory modality. Since the target users of these applications suffer from blindness, the user interface needs to employ a sensory modality such as audition or touch to convey information, which is called sensory substitution.

1.3.1 Reflectance based devices

Reflectance based devices emit a signal, either light or sound, and analyze the reflected signal to detect obstacles. Examples include the Mowat Sensor [50], the Nottingham Obstacle Detector (NOD) [3], the Miniguide [16], and the Binaural Sonic Aid (Sonicguide) [25]. The virtual white cane used a laser sensor and a vision sensor to detect uneven surfaces and especially staircases. However, these devices only provide local obstacle avoidance functionality. Therefore, they require significant training, as the lack of spatial context in local range data does not deliver rich enough information to interpret the environment directly. In addition, the user often has to perform additional measurements when an obstacle is detected in order to figure out its dimensions. The Sonic Pathfinder [8] and NavBelt [54] used an array of ultrasound sensors attached to a belt or to a mobile robot to provide enhanced obstacle avoidance algorithms. However, the design is too constrained for commercial use by the blind. [54] used a mobile robot obstacle avoidance system as a guidance device for blind and visually impaired people. Just as electronic signals are sent to a mobile robot's motor controllers, they used auditory signals to guide the blind traveler around obstacles, or alternatively, to provide an "acoustic image" of the surroundings. All reflectance based systems experience power consumption, portability, traversability, and lack of complete user control issues that limit system effectiveness.

1.3.2 GPS based devices

Some ETA devices use the Global Positioning System (GPS) to help a visually impaired user localize. GPS based ETAs receive signals from satellites that indicate the satellites' locations and the current time, and determine the position from these signals. With the increasing prevalence of smart devices, various sensors including GPS have become ubiquitous. Wilson et al. proposed SWAN [64], a wearable computer consisting of audio-only output and tactile input via a task-specific hand-held interface device. SWAN aids a user in safe pedestrian navigation and lets the user add new GIS data pertinent to wayfinding, obstacle avoidance, and situational awareness tasks. Loadstone GPS is an application written for Symbian/Series60 mobile phones that works in conjunction with a Bluetooth GPS receiver to guide a user to various geographical locations.
Drishti [14] uses a GPS and ultrasound positioning system with a wireless wearable platform and an audio feedback communication interface to guide blind users and help them travel in familiar and unfamiliar environments independently and safely. Drishti uses DGPS as its location system to provide the user with an optimal route by means of its dynamic routing and rerouting ability outdoors. The system can also be switched from outdoor to indoor operation with a simple vocal command. An ultrasound positioning system is used to provide precise indoor location measurements. The user can get vocal prompts to avoid possible obstacles and step-by-step walking guidance to move about in an indoor environment. In [47], the MOBIC Travel Aid (MOTA) was proposed to complement primary mobility aids such as the long cane or guide dog based on geographical information system (GIS) and Global Positioning System (GPS) technologies. The MOBIC Travel Aid consists of two interrelated components: the MOBIC Pre-journey System (MOPS) to assist users in planning journeys, and the MOBIC Outdoor System (MOODS) to execute plans by providing users with orientation and navigation assistance during journeys. However, the primary usage of these devices is in outdoor applications, since GPS based devices do not receive GPS satellite signals indoors. In addition, the devices do not aid obstacle avoidance.

1.3.3 IMU based devices

A recently proposed electronic mobility aid for the blind uses an IMU sensor mounted on a smartphone [2, 33]. One advantage of the system is that it provides an inexpensive localization solution if the blind user already owns a smartphone. The system estimates the step length of the user and the number of steps at each frame, and calculates the traveled location of the user using a particle filter. However, the system assumes that it already has a metric map of accessible areas with some distinctive landmarks. Hence, the system neither operates in unknown environments nor has the ability to build a map of the surrounding environment, and it does not provide obstacle avoidance either.

1.3.4 Computer vision based devices

Researchers have studied the Simultaneous Localization and Mapping (SLAM) problem over the past decades and made astonishing advances. SLAM is a technique to estimate the motion of an observer as it continuously explores unknown environments, while simultaneously mapping the previously unknown environments using observations from sensors. As cheaper and faster cameras and processors have become available on the market, many researchers have investigated computer vision techniques to build more intelligent mobility aids using high bandwidth image data.

Stereo camera based algorithms have been adopted widely, as they provide 3D information suitable for obstacle detection and map building. In [6], depth information is generated using disparity information from a simple stereo camera pair. The user learns to associate patterns of tactile stimuli with the depth information of objects. However, the learning process for different objects requires intensive inference work from the human subject, imposing heavy cognitive loads. Furthermore, the subject is forced to pay conscious attention to redundant or unnecessary data. Lu et al. [34] used a stereo system to detect a curb/stairway and localize for use in a navigation assistance system.
As the subject traverses and scans the surrounding environment, the camera undergoes continuous motion in six degrees of freedom (6DOF). The position of the camera in every frame has to be adjusted for the change in the global coordinate system, which requires estimating the camera motion from frame to frame. This problem is well known in computer vision as Structure from Motion (SfM) and in robotics as Simultaneous Localization and Mapping (SLAM). In [37, 51], the authors proposed a shoulder-mounted stereo vision based system that estimates the camera trajectory to predict user motion and builds 3D maps of the vicinity. Recently, a real-time wearable, stereo-vision based navigation system for the visually impaired [49] was proposed. The system adopted a head-mounted design to enable blind users to stand and scan. The system implements a real-time SLAM algorithm, an obstacle avoidance algorithm, and a path planning algorithm. An appropriate cue is generated based on these algorithms and delivered to a tactile sensor array to alert the user to the presence of obstacles and to provide the blind user with guidance along the generated safe path. Stereo vision systems suffer in low-textured environments, such as solid color walls, because stereo matching algorithms do not provide depth information accurate enough to build a consistent 3D map of the environment for successful navigation.

With the recent introduction of affordable yet effective RGB-D sensors, SLAM using RGB-D data has gained enormous popularity among researchers. First, traditional approaches based on visual feature matching or Iterative Closest Point (ICP) algorithms were proposed [23, 15] to solve the localization, motion estimation, and mapping problems. Advances in GPU performance along with the RGB-D sensor brought forth huge popularity for real-time 3D reconstruction applications. KinectFusion [42] used depth images only and achieved real-time dense 3D reconstruction using 3D voxel grid models. Whelan et al. propose Kintinuous, a spatially extended version of KinectFusion, which allows dense volumetric modeling of a larger area by translating the volumetric model as the sensor translates [63, 62]. [55] introduced a multi-resolution surface reconstruction algorithm that integrates dense image alignment in a SLAM framework to effectively reduce drift while keeping the advantages of dense image registration. However, the works mentioned above mainly focus on small scale 3D reconstruction problems and do not provide a real-time solution to navigation tasks.

Monocular cameras have also been adopted to read signs or traffic signals for the visually impaired. OrCam's product consists of a camera attached to a glasses frame [46]. The camera sees objects, shapes, sizes, words, or anything else around a user, analyzes the image, and tells users what they are seeing. Using their finger, users point in the direction of the object they want to see, and the system instantly announces what is observed. As a result, a visually impaired user could use the system to cross the street, just by pointing at a traffic light and waiting for the system to announce "green light". In a supermarket, users can pick up a product and the device will tell them what it is, even reading off the ingredients for the user.
The focus of that system is more on enabling the blind user to perceive signs and letters than on providing mobility aid, as it does not provide the rich 3D information required for mapping tasks.

1.3.5 Other approaches

Besides the aforementioned approaches, a variety of other solutions have been proposed. Some researchers propose to utilize an RFID tag network for mapping and sonar sensors for local obstacle avoidance [27], while [58] introduced a robot guide to replace the guide dog. While these systems perform admirably, they are likely not cost- or time-effective for installation in homes, smaller locations, or environments familiar to the traveler.

1.4 Outline

1.4.1 Hardware components

The navigation system is composed of a smartphone user interface, a head-mounted RGBD camera, navigation software, and a haptic feedback vest, as shown in Figure 1.2. A head-mounted design has become a popular form factor for wearable device applications [11, 40]. The head-mounted design is ideal for scanning the surrounding environment as the VI user traverses it to reach a goal. It matches the frame of reference of the person, allowing relative position commands, and extends the range of obstacle detection by detecting obstacles located in the blind spots of the long cane. The system is also equipped with a small 9 DOF MEMS IMU sensor on top of the RGB-D sensor for orientation initialization. The IMU sensor communicates with the system at 15 Hz over Bluetooth. Lastly, the haptic feedback system, consisting of four micro-vibration motors, is designed to guide the VI user along the computed path and to minimize cognitive load.

Figure 1.3: Overview of the navigation system.

1.4.2 Navigation software components

The navigation software is comprised of several modules: user interaction, single session mapping, a mapping algorithm that integrates multiple single SLAM results, and the real-time navigation module, as shown in Figure 1.3. The system first interacts with the blind user to initiate a navigation task.

Figure 1.4: Snapshots of the smartphone application interface for a blind user.

The blind user provides the system with a minimal set of information such as a building name, floor number, the name of a starting location, and the name of a destination. The starting location can be an approximation of the current location of the user, as the localization algorithm can correct a minor deviation. The smartphone application shown in Figure 1.4 helps the VI user choose a place from a list of registered places, or translates a spoken place name for the system using a speech recognition engine. If the proposed system does not find either the starting location or the destination in the database, the system starts a map building module to collect the missing information. In this case, a sighted person, or the blind user with the aid of a sighted guide, travels to the destination. While traveling, the system performs a SLAM task, estimating the motion of the VI user and building a map between the starting point and the destination.

First, a real-time 6DOF RGBD camera pose estimation algorithm for the navigation system is proposed (Chapter 2). In order to extract orientation information for the blind user, we perform ego-motion estimation, known as visual odometry, using sparse visual features. However, sparse visual feature based approaches suffer from drift. In order to reduce the drift of the sparse visual feature based ego-motion algorithm, we make use of dense 3D point clouds.
Finding the motion between two dense 3D point clouds is a computationally expensive process. Therefore, we propose a fast dense 3D point cloud Iterative Closest Point (ICP) algorithm that uses the ground plane as a global reference to meet the real-time requirements. While performing pose estimation, the system builds a local vicinity map based on dense 3D data obtained from the RGB-D camera, as well as a global 2D Probabilistic Occupancy Grid (POG) map for traversability analysis, which becomes the basis for path planning and obstacle avoidance (Chapter 3). The traversability analysis classifies the surrounding environment into free, occupied, and unknown states based on the occupancy probability of each cell.

After a map building task has finished, the system associates the starting point and the destination with the information generated by the map building algorithm and stores the collected information offline for reuse. The system stores a 3D map consisting of a set of point clouds of the traversed areas, RGB images, the poses of keyframes, and an adjacency list of each place visited. We believe that 3D information, which is viewpoint invariant, is required in addition to RGB images in our application because of physical limitations of the sensor such as its limited field of view. For example, a VI user may travel through an area from the direction opposite to the one in which the system collected the RGB images; 2D image based loop detection algorithms often fail to localize under such drastic viewpoint changes. The system also needs to associate a new set of stored data with the existing map and integrate the information accordingly. In order for a navigation system to handle an increasing amount of information in real time, a scalable and efficient data structure is required. For that purpose, we propose to construct a multi level hash table data structure with constant read, write, and access time. The system also stores an adjacency list of each place labeled by the VI user. This allows the system to quickly find the connectivity between any two nodes in the system before initiating a real-time navigation task.

If the proposed system finds the information needed to navigate from the starting location to the destination in the database, the system loads the required maps, such as the 2D POG and 3D maps, and the locations of the destination and starting point on the 2D POG map. After loading from the database has completed, the system begins the real-time navigation module (Chapter 5). Next, the shortest path from point A to point B is generated based on the traversability analysis. The system analyzes the shortest path and the surrounding environment and generates a safe and efficient waypoint, an intermediate goal on the way to the destination. A relative direction from the camera to the waypoint is calculated, and appropriate cues are generated to guide the blind user to the adjacent area of the waypoint and eventually to the destination. A vest-type interface consisting of four micro-vibration motors receives cues from the navigation algorithm through a wireless communication channel (ZigBee).

We present experimental results showing that the proposed pose estimation algorithm outperforms the state-of-the-art sparse feature based algorithm in accuracy while maintaining near real-time performance in experiments with real blind subjects.
Mobility experiment results with blind subjects in a known environment show that the proposed navigation system, when used together with the white cane, guides the subjects to a designated goal without colliding with obstacles and with more stable performance in terms of navigation completion time. Another mobility experiment, with blind-folded subjects in an unknown environment, shows that the system successfully guides the subjects to a goal, a very challenging task that requires external help, such as guidance by a sighted person, when the white cane is used alone.

Chapter 2 Real-time 6DOF pose estimation

Figure 2.1: Overview of the pose estimation algorithm using both sparse visual features and dense point clouds.

2.1 Introduction

In order to extract orientation information for the blind user, we perform real-time camera pose estimation, known as visual odometry, using sparse visual features. However, sparse visual feature based approaches suffer from drift. In order to reduce the drift of the sparse visual feature based ego-motion algorithm, we make use of dense 3D point clouds. Finding the motion between two dense 3D point clouds is a computationally expensive process. Therefore, we propose a real-time ICP algorithm that uses the ground plane as a global reference to meet the real-time requirements.

2.1.1 Iterative Closest Point (ICP)

2.1.2 Assumptions

We begin with a few practical assumptions to make the proposed navigation system perform more reliably. First, we assume that when a VI user travels to a physical location and labels the place for the first time, he or she is accompanied by someone who can guide the VI user to the correct physical location matching the label. Second, we assume that the ground plane is observed most of the time when the proposed system is in operation. For this purpose, we trained the VI users in the experiments to orient the head-mounted sensor tilted slightly toward the ground plane.

2.2 Orientation initialization

The roll and pitch angles of the camera orientation are initialized so that the camera is aligned with the ground plane of the world coordinate frame. Using IMU sensor readings as an orientation prior and the normals of the point clouds obtained with a real-time normal estimation algorithm [18], we first obtain the set of points whose normals are parallel to the gravity vector. From this set of points, we find the plane coefficients using a RANSAC-based least squares method and extract the major plane P_G = (n, D), where n = (A, B, C)^T is the normal vector of the plane. We then find R_0 \in SO(3) such that R_0 n = (0, 1, 0)^T. The yaw angle is initialized using the magnetometer reading in order to maintain the orientation with respect to one global reference frame. The pose of the blind user is represented by T_n, where n is the frame number, and the initial pose is

T_0 = \begin{bmatrix} R_0 & t_0 \\ 0^T & 1 \end{bmatrix}, \quad \text{where } t_0 = (0, 0, 0)^T.

Figure 2.2: (Left) Shaded areas in green are points that have a normal parallel to the gravity vector and are used to extract the ground plane P_G. (Right) The ground plane model represented in the global reference frame.

This initialization process is essential for a real-time wearable navigation system for two reasons. The first reason is that the initialization step helps the VI user maintain a consistent orientation with respect to the surroundings, because the system has a reference coordinate frame that takes the configuration of the physical world into consideration. Incremental SLAM methods build a map with respect to the camera pose of the first frame.
Failing to maintain the orientation of the first frame with respect to the real world causes the navigation system to create an incorrect model, for example, converting the ground plane into obstacles. The other reason is that the ground plane model helps the navigation algorithm maintain much more stable motion estimation results. By finding the ground plane, the system can estimate changes in the roll and pitch angles of the VI user faster and more robustly, which enables the motion estimation algorithm to run faster and more accurately.

2.3 RGBD visual odometry

Visual odometry is the process of estimating the pose of a vision system, and it has a very successful history in the fields of computer vision and robotics. In order to maintain a sense of egocentricity for the blind user, we continuously estimate the 6 degrees of freedom (DOF) camera pose in the coordinate frame obtained from the initialization step. The RGBD visual odometry algorithm consists of a real-time FAST corner based motion estimator that keeps track of the motion at every frame and an ICP algorithm that refines the motion at every keyframe using 3D point clouds.

2.3.1 Frame-to-frame pose estimation

We adapt Fast Odometry from VISion (FOVIS) [23] to achieve real-time frame-by-frame motion estimation. Every frame begins with an RGB and depth image capture from the camera. The RGB image is converted to grayscale and smoothed using a fixed size Gaussian kernel. Then a Gaussian pyramid is constructed to detect robust features at each pyramid level. At each level, FAST corners are extracted. In order to guarantee a uniform distribution of features, the image at each level is divided into 80×80 pixel sub-images, and the 25 strongest FAST corners are selected from each sub-image. FAST corners associated with invalid depth are discarded.

Figure 2.3: Sparse feature based visual odometry that finds matches between visual feature points in image frames between the current observation and a reference keyframe. It then finds matches of the corresponding 3D points to calculate the motion in 3D space.

In continuous motion estimation applications, the feature motion in the image plane is generally dominated by 3D rotation. An algorithm proposed by Mei et al. [39] is used to compute an initial rotation by directly minimizing the sum of squared pixel errors between downsampled versions of the current and previous frames. This initial rotation estimate helps to constrain the search area for feature matching. An 80-byte descriptor, the 9×9 pixel patch around a feature with the last pixel omitted, is normalized around zero mean. Features are then matched across frames using the sum of absolute differences (SAD) as the match score. A graph of consistent feature matches is constructed using Howard's approach [21]. The graph consists of vertices, each a pair of matched features, and edges that connect consistent vertices. Rigid motion preserves the Euclidean distance between two feature matches over time in a static scene: the 3D Euclidean distance between feature matches does not change drastically between consecutive frames. Hence, in a static scene the set of inliers makes up the maximal clique of consistent matches, and the maximal clique in the graph is found using a greedy algorithm [17, 21]. An initial estimate of the 6DOF motion is then calculated from the inliers using Horn's absolute orientation method [19], by minimizing the Euclidean distances between the inlier feature matches.
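To make the last step concrete, the following is a minimal sketch of closed-form rigid alignment between matched 3D points. It uses the SVD-based (Kabsch) formulation, which yields the same least-squares solution as Horn's quaternion method but is not the exact FOVIS implementation; the function name and array shapes are illustrative assumptions.

```python
import numpy as np

def estimate_rigid_motion(p_ref, p_cur):
    # Closed-form least-squares rigid motion (R, t) such that p_cur ~= R @ p_ref + t.
    # p_ref, p_cur: (N, 3) arrays of matched inlier 3D feature locations.
    mu_ref, mu_cur = p_ref.mean(axis=0), p_cur.mean(axis=0)
    # Cross-covariance of the centered correspondences
    H = (p_ref - mu_ref).T @ (p_cur - mu_cur)
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection in the least-squares solution
    d = 1.0 if np.linalg.det(Vt.T @ U.T) > 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_cur - R @ mu_ref
    return R, t
```

In the same spirit as the refinement described next, matches whose residual after this step remains large can be dropped and the motion re-estimated from the surviving inliers.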
The initial motion estimate is further refined by minimizing reprojection errors. Matches with a reprojection error above a threshold are discarded from the inlier set, and the motion estimate is refined once again to obtain the final motion estimate. Some visual odometry results are shown in Figure 2.4.

Figure 2.4: FAST corners and their correspondences from the last keyframe after refinement.

2.3.2 Pose refinement using ICP and the ground plane

Frame-to-frame motion estimation suffers from drift and error accumulation. We refine the estimated pose by performing an ICP algorithm at every keyframe to limit this drift and error accumulation. Keyframes are extracted when the motion since the last keyframe exceeds a threshold or the number of tracked features drops below a predefined threshold. When a new keyframe is generated, the ground plane P_{i,n} is estimated, if available, in the same manner as in the initialization step. We can easily align P_{i,n} to the ground plane extracted at the initialization step, P_G, using their plane equations. It is known that the accuracy of ICP algorithms is affected by the initial pose estimate between the two sets of point clouds. This initial alignment step using the ground plane reduces matching errors, as shown in Figure 2.5. This step also reduces the search space from 6D (X, Y, Z, \phi, \theta, \psi) to 3D (X, Z, \theta), because the two sets of point clouds are aligned on a plane. The initial alignment steps are illustrated in Figure 2.5. The ICP algorithm then finds the transformation between the point cloud of the current keyframe, p, and the existing map, q, that minimizes the following point-to-plane error metric developed by [5]:

E(p, q) = \sum_i \| (R p_i + t - q_i)^T n_i \|^2    (2.1)

where R \in SO(3) is a rotation matrix and t, p_i, q_i, n_i \in R^3 represent the translation vector, the i-th elements of p and q, and the normal vector of p_i, respectively.

To further reduce the computation, we create a sliding window based buffer that maintains the 3D point clouds from the last n frames. Because the computational complexity of the ICP algorithm is O(N^2), reducing the number of point clouds helps to reduce the computation time. Furthermore, since we process sequential inputs, the current observation is matched to recent observations as opposed to outdated ones. Finally, the point clouds p and q are downsampled to a predefined resolution, 5 cm in our case. A probabilistic 2D occupancy grid map is also constructed for the real-time navigation algorithm to run; the details of the 2D probabilistic occupancy grid map can be found in Section 3.2.

Figure 2.5: Fast ICP using the ground plane and an initial guess aided by the sparse feature based pose estimation algorithm: (a) before alignment by the ground plane; (b) after alignment by the normal vectors of the ground plane; (c) after distance alignment by the plane equation of the ground plane; (d) after additional alignment by the motion estimate from the sparse visual feature based pose estimation algorithm.

Figure 2.6: A sliding window buffer that maintains the n most recent keyframes for Fast ICP: (a) t-th keyframe; (b) (t+1)-th keyframe; (c) (t+2)-th keyframe.

2.3.3 Results

2.3.3.1 Experiment 1: Pose estimation with a blind-folded subject

The first experiment estimates the 6DOF poses of a blind-folded person navigating the indoor environment shown in Figure 2.7, guided by a sighted guide, and compares the results against a state-of-the-art RGBD sparse feature based pose estimation algorithm.
In the experiment, the blind-folded subject makes a loop of a building. While the subject is traveling, the RGBD data is captured. The captured data is then processed using the sparse feature based pose estimation algorithm (FOVIS) and our algorithm. Recently, graph optimization based SLAM algorithms such as [26] have shown that the pose estimates can be further optimized. Therefore, we also provide the pose estimation results of the proposed algorithm after performing graph SLAM [26]. The error metric used measures the difference in position (translation) after completing the loop, which ideally is 0:

E(S, F) = \| S - F \|    (2.2)

where S, F \in R^3 represent the 3D locations of the start and finish points. The total traversed distance is 49 m. As shown in Figure 2.8, the proposed algorithm reduces the drift from 3.674 m to 2.661 m, a 27.6% reduction.

Figure 2.7: A floor plan of the experimentation area.

Figure 2.8: Trajectories of the sparse feature based algorithm (red), our algorithm (green), and our algorithm after graph optimization (blue). The ground truth is displayed as a dotted gray line.

Figure 2.9: Qualitative comparison of the reconstructed 3D maps using different methods: (a) the map reconstructed by the sparse feature based algorithm (FOVIS), where severe distortion along the horizontal direction is observed; (b) the map reconstructed by our algorithm, which corrects errors in the direction of the ground plane through the ground plane matching algorithm and reduces drift effectively; (c) the map reconstructed by our algorithm after graph optimization by iSAM [26], where the additional graph optimization process improves the accuracy of the pose estimation.

2.3.3.2 Experiment 2: Pose estimation with blind subjects

The goal of the pose estimation algorithm is to capture the accurate pose of the blind person while walking from point A to point B. We recruited 6 blind subjects for an additional experiment, conducted while the mobility test was performed by Aminat Adebiyi, a Ph.D. student in the Biomedical Engineering department at the University of Southern California. The 6 blind subjects provided verbal consent for the motion capture data to be used for experimental purposes. Datasets from 4 of the 6 subjects were used for evaluation, as the others suffered from sensor obstructions by the sighted guide and by the wire of the sensor. Figure 2.10a shows the experimental environment and the approximated ground truth for the subjects. The 4 subjects were asked to move from one corner of a room to the diagonally opposite corner, for example A to C or B to D. Each subject collected two datasets with different starting points, always guided by hand by a sighted user.

The performance of the pose estimation algorithm was evaluated using absolute and relative error metrics. The absolute error E was measured as the displacement of the estimated destination location from the ground truth location, E = \| d_G - \hat{d}_E \|, where d_G and \hat{d}_E represent the 3D ground truth location of the destination and the corresponding location estimated by the system. The relative error is obtained as E / d_{traj}, where d_{traj} is the total distance of the trajectory. The experimental results show that the system is capable of tracking the movement of the blind user, under the oscillations caused by walking, within a 6.32% error of the traveled distance. In the experiments, the average traveled distance was 13.93 m and the average E was 0.88 m, which is close enough to reach the destination within a couple of steps.
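As a small worked example of the error metrics above (the loop-closure error of Eq. (2.2) and the absolute/relative errors used in Experiment 2), the sketch below evaluates them on made-up coordinates; the numbers are purely illustrative and are not taken from the experiments.

```python
import numpy as np

def loop_closure_error(S, F):
    # Eq. (2.2): distance between the start S and finish F of a loop (ideally 0)
    return np.linalg.norm(np.asarray(S, float) - np.asarray(F, float))

def destination_errors(d_gt, d_est, d_traj):
    # Absolute error E = ||d_G - d_E|| and relative error E / d_traj
    E = np.linalg.norm(np.asarray(d_gt, float) - np.asarray(d_est, float))
    return E, E / d_traj

# Illustrative values only
print(loop_closure_error([0, 0, 0], [2.0, 0.3, 1.7]))                  # drift after a loop, in meters
E, rel = destination_errors([10.0, 0.0, 9.5], [10.6, 0.1, 9.0], 13.93)
print(E, 100.0 * rel)                                                  # absolute error (m), relative error (%)
```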
Figure 2.10: Experimental setup and results with the blind subjects. (a) The floor plan and the approximate ground truth of the blind subjects' trajectories. (b) The trajectories estimated by the proposed algorithm, shown on top of the approximate ground truth.

As shown in Figure 2.12, the proposed algorithm outperforms one of the state-of-the-art algorithms, FOVIS [23], in most of the walking scenarios. Detailed comparison results are presented in Figure 2.13. The large errors of the FOVIS algorithm result from the unreliable 3D information of the visual features that the algorithm relies on for pose estimation. Since the visual-feature-based algorithm uses only a few visually distinctive points and the depth measurement error of the sensor is known to increase quadratically with the depth value, the performance of the pose estimation algorithm starts to decrease when most of the detected features are located beyond a certain distance. On the other hand, utilizing dense point clouds helps to mitigate the problem, as a few features at a far distance are no longer the dominant elements, and the pose is refined using points evenly distributed across all depth ranges.

Figure 2.11: Absolute and relative errors of each experiment.
  Route (distance):    A-C (13.5m)  C-A (14.8m)  A-C (13.5m)  C-A (14.8m)  B-D (13.5m)  D-B (13.9m)  B-D (13.5m)  D-B (13.9m)  Avg. (13.93m)
  Absolute error (m):  0.64         1.76         0.62         1.11         0.78         0.44         0.42         1.28         0.88
  Relative error (%):  4.7          11.83        4.6          7.48         5.81         3.16         3.10         9.24         6.32

Figure 2.12: Per-trial error comparison against one of the state-of-the-art RGBD pose estimation algorithms based on sparse visual features (FOVIS).

Figure 2.13: Absolute error comparison against one of the state-of-the-art RGBD pose estimation algorithms based on sparse visual features.
  Route (distance):    A-C (13.5m)  C-A (14.8m)  A-C (13.5m)  C-A (14.8m)  B-D (13.5m)  D-B (13.9m)  B-D (13.5m)  D-B (13.9m)  Avg. (13.93m)
  Ours (m):            0.64         1.76         0.62         1.11         0.78         0.44         0.42         1.28         0.88
  FOVIS (m):           0.65         2.04         2.45         0.61         1.18         3.83         0.99         2.52         1.95

Chapter 3
Real-time 2D occupancy grid mapping

Mapping is the process of modeling the surrounding environment in a chosen representation by integrating the information gathered with sensors. Among the many representations available for mapping, we adopt a grid map representation that uses discretized cells to represent the surrounding environment. In our application, the blind subject is assumed to walk on the ground plane; therefore, a 2D grid map representation is used for computational efficiency. It becomes the basis for the path planning described in Chapter 5. As the blind user moves toward a destination, the system infers which cells are occupied given an estimated pose and a sensor model. Furthermore, the system also calculates how far each cell is from obstacles, which directly indicates how safe a cell is from collisions. Details of the updates of the occupancy probability and of the distance to the closest obstacle of each cell are discussed in the following sections.

3.1 Related Work

3.2 2D Occupancy Grid Mapping and Traversability Analysis

Even though path planning is performed in 2D space to reduce the computational complexity of the subsequent stage, traversability is analyzed in 3D space. This allows the visually impaired to avoid collisions, since the system is able to detect obstacles in 3D space that may be located in blind spots of the long cane.
A 3D local voxel grid map (LVGM), a quantized representation of the 3D point clouds, is generated with respect to the camera reference frame, as represented in Figure 3.1. Using the estimated camera orientation, we align the 3D LVGM with respect to the global reference frame at each frame, as shown in Figure 3.1. The aligned 3D LVGM is classified into occupied, free, and unknown states. Invisible voxels that are out of the field of view of the camera or obstructed by another voxel are classified as unknown. The state of the remaining voxels, either occupied or free, is determined from the number of points in each voxel. Green and red voxels in the upper-left corner of Figure 3.1 represent free and occupied voxels, respectively, while light blue voxels are voxels with unknown state. These voxels are further classified into vertical and horizontal patches by projection onto the local traversability map (LTM), a local 2D grid parallel to the ground plane and centered at O'_c, as shown in the upper-left corner of Figure 3.1. If one or more occupied voxels project onto a cell of the LTM, the cell is registered as a vertical patch. On the other hand, cells that only free voxels fall into are registered as horizontal patches. Green and red cells at the bottom of the upper-left corner of Figure 3.1 indicate horizontal and vertical patches on the LTM, respectively.

Figure 3.1: Local 3D voxels displayed with respect to the world reference frame, from which the local traversability is calculated. O_w and O_c represent the origins of the world reference frame and the camera reference frame, respectively.

Figure 3.2: A filtering process using erosion and dilation. Erosion picks the lowest value in the kernel and writes it to the center of the kernel; dilation picks the highest value in the kernel and writes it to the center, filling the hole. Black and white cells represent vertical and horizontal patches, respectively.

A head-mounted camera used for navigation often faces a spatial configuration in which most of the points from the RGBD sensor lie beyond a certain distance, because the goal of the system is to steer the subject away from possible collision threats such as walls and obstacles, which also act as a reliable source for estimating the motion of the camera. Hence, depth values from the head-mounted platform tend to be quite noisy, since the noise of RGB-D sensors grows quadratically with the distance to a point [44]. We have empirically found that a single vertical patch without neighboring vertical patches, or an isolated horizontal patch surrounded by neighboring vertical patches, generally results from noisy sensor readings. We therefore apply a simple filtering algorithm based on the 2D image processing operations erosion and dilation for noise reduction on the LTM. Figure 3.2 shows a simple example of how this erosion and dilation process fills a hole in the local grid map that is likely to be a misinterpretation of the environment introduced by sensor noise.

The vertical and horizontal patches in the LTM are then transformed to the world reference frame to update the occupancy probabilities of the 2D global traversability map (GTM), as displayed in Figure 3.1. The occupancy probability of each cell of the GTM is updated by the occupancy grid mapping rule suggested by Moravec [41]:

P(n \mid z_{1:t}) = \left[ 1 + \frac{1 - P(n \mid z_t)}{P(n \mid z_t)} \cdot \frac{1 - P(n \mid z_{1:t-1})}{P(n \mid z_{1:t-1})} \cdot \frac{P(n)}{1 - P(n)} \right]^{-1}    (3.1)
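A minimal sketch of the per-cell update in Eq. (3.1) is given below; the inverse sensor model value of 0.7 and the prior of 0.5 used in the example are illustrative assumptions, not values from the system.

```python
# Minimal sketch of the per-cell occupancy update in Eq. (3.1); the inverse
# sensor model value (0.7) and the prior (0.5) used below are illustrative
# assumptions, not values taken from the system.
def update_occupancy(p_prev, p_meas, prior=0.5):
    """Return P(n | z_1:t) given P(n | z_1:t-1), P(n | z_t), and the prior P(n)."""
    odds = ((1.0 - p_meas) / p_meas) * ((1.0 - p_prev) / p_prev) * (prior / (1.0 - prior))
    return 1.0 / (1.0 + odds)

# A cell starting at the prior 0.5 and observed as occupied (P(n|z_t) = 0.7) twice:
p = 0.5
for _ in range(2):
    p = update_occupancy(p, 0.7)   # 0.70 after one update, ~0.845 after two
```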
For efficient computation of the updates, we use the log-odds notation, logOdds(), which can be directly converted back into probability values when needed. As stated in [20], this notation replaces multiplication with addition, which is more efficient to compute:

L(n \mid z_{1:t}) = L(n \mid z_{1:t-1}) + L(n \mid z_t)    (3.2)

One disadvantage of the update policy in (3.2) is that changing the state of a cell requires as many observations as have already been integrated. In order for the system to respond to dynamic changes in the environment immediately and to overcome overconfidence in the map, upper and lower bounds, l_max and l_min, on the log-odds values are enforced using the clamping update policy proposed by Yguel [65]:

L(n \mid z_{1:t}) = \max(\min(L(n \mid z_{1:t-1}) + L(n \mid z_t), \; l_{max}), \; l_{min})    (3.3)

Cells of the probabilistic global traversability map whose occupancy probability is lower than P(l_min) are registered as traversable, while cells with an occupancy probability higher than P(l_max) are registered as non-traversable, where P(l_min) and P(l_max) are the probability values corresponding to l_min and l_max, respectively. All cells are initialized to the unknown state at the beginning of a mapping process. Once the state of a cell has changed from unknown to either traversable or non-traversable, it is registered as unknown again if its occupancy probability falls between P(l_min) and P(l_max).

3.3 Safety measure using the chamfer distance transform

We also build and update a chamfer distance array, aligned with the global 2D traversability map, that stores the distance to the closest obstacle and is used to check the risk of collision instantaneously. Equation (3.4) defines the distance to the closest obstacle, where i and j represent the row and column indices of the 2D grid map, and x_obs and y_obs represent the column and row indices of the closest obstacle to the cell at (i, j):

d_c(i, j) = \min(|x_{obs} - i|, \; |y_{obs} - j|)    (3.4)

Figure 3.3 shows an example of the chamfer distance array for 3 obstacles, which can be defined during the traversability analysis process.

Figure 3.3: Red dots represent obstacles. The number in each cell represents its chamfer distance; the darker the cell, the closer it is to an obstacle.
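The sketch below illustrates a standard two-pass chamfer distance transform over an occupancy grid. It is only an illustration of the idea: the unit (chessboard) weights and the pure-Python loops are assumptions, not necessarily the metric or the implementation used by the system for Eq. (3.4).

```python
# Minimal sketch of a standard two-pass chamfer distance transform over the
# occupancy grid; the unit (chessboard) weights and the pure-Python loops are
# illustrative choices, not the system's implementation of Eq. (3.4).
import numpy as np

def chamfer_distance(occupied):
    """occupied: 2D boolean array; returns each cell's distance to the nearest obstacle."""
    h, w = occupied.shape
    d = np.where(occupied, 0, h + w).astype(np.int32)
    # Forward pass (top-left to bottom-right), then backward pass.
    for i in range(h):
        for j in range(w):
            for di, dj in ((-1, 0), (0, -1), (-1, -1), (-1, 1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < h and 0 <= nj < w:
                    d[i, j] = min(d[i, j], d[ni, nj] + 1)
    for i in range(h - 1, -1, -1):
        for j in range(w - 1, -1, -1):
            for di, dj in ((1, 0), (0, 1), (1, 1), (1, -1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < h and 0 <= nj < w:
                    d[i, j] = min(d[i, j], d[ni, nj] + 1)
    return d
```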
Chapter 4
Offline Data storage

After a navigation task is finished, M = {M_3D, M_2D, G, C, N_s, N_d} is stored, where M_3D, M_2D, G, C, N_s, and N_d represent a 3D map, a 2D probabilistic occupancy grid map, the ground plane equation, the chamfer distance map, the starting node, and the destination node, respectively, as shown in Figure 4.2.

Figure 4.1: Trajectories of two data collection tasks on a floor plan (left). The 3D maps and 2D occupancy grid maps of the data collection tasks are presented (right).

A node is represented as N = {label, L, x, y, I}, where L indicates an adjacency list of nodes on the same floor, x and y the location of the node, and I the map ID. There may exist multiple maps for the same floor if a blind user has traveled different places without making a loop, as will be discussed in Section 4.3; hence, the map ID is required to find out which map the node belongs to. This information is essential to determine the connectivity between nodes and to decide whether to perform a single-session SLAM or the real-time navigation process. Each node is identified by a label that the blind user assigns, so that searching the stored information becomes easier for the blind user. Nodes are stored in a multi-level structure using a building, a floor, and a node name. For example, a conference room and a kitchen on the second floor of the PHE building are stored under the same building and floor, as shown in Listing 4.1.

Figure 4.2: 2D, 3D, and chamfer distance maps are stored after a mapping process.

<?xml version="1.0" encoding="utf-8"?>
<document>
  <Mapper N="2" />
  <Building Name="phe" N="3">
    <Floor Name="1" N="3">
      <Node Name="printer" X="-1.69413" Y="6.01168" L="office" I="0" />
      <Node Name="exit" X="-2.1" Y="11" L="office" I="0" />
      <Node Name="office" X="0.0" Y="0.0" L="exit printer" I="0" />
    </Floor>
    ...
  </Building>
  <Building Name="hsc" N="1">
    <Floor Name="1" N="2">
      <Node Name="dvrc" X="0.0" Y="0.0" L="grocery" I="0" />
      <Node Name="grocery" X="3.37884" Y="7.83843" L="dvrc" I="0" />
    </Floor>
  </Building>
</document>

Listing 4.1: A simple example of stored map information in an XML file.

4.1 Managing multiple map building results

The system needs to be able to integrate the results created by multiple single-session SLAM tasks, as shown in Figure 4.1. The idea of the mapping algorithm is similar to the multi-session SLAM algorithms introduced in [38, 29]. Multi-session SLAM addresses the problem of integrating the results of multiple SLAM tasks carried out repeatedly over time in environments with overlaps. The goal of the mapping algorithm in our application is to combine multiple maps that have partial overlaps, under uncertainty, and to estimate a consistent map in a common metric coordinate system in a robust and efficient manner. In order to tackle this problem, we take advantage of the fact that the user provides approximate locations through labels. Another goal is to achieve real-time performance under a growing amount of information. A VI user will visit different places over time, for example home, workplaces, schools, and grocery stores, and the amount of information the system needs to maintain obviously grows as the number of places the VI user has visited increases. One of the requirements for such a system is therefore an efficient and scalable data structure, so that the system can handle the user's requests in real time. In the following sections, we introduce an efficient multi-level hash table data structure for storing multiple SLAM results and describe how to merge new information into an existing map.

4.2 Multi-level hash table data structure

Figure 4.3: Multi-level hash table structure of a map. Building buckets (e.g., Bldg A, Bldg C, ..., Bldg K) point to floor buckets, which point to node buckets (e.g., Conf. Room, 317, Restroom, B13, 202, Printer); each node bucket stores a location and an adjacency list.

Let us assume that the blind user requests a new navigation task and that the system already contains some maps in its data structure. The system classifies locations using a multi-level structure consisting of building, floor, and node levels, whose elements are identified by their names. The multi-level structure enables more efficient searches and alleviates labeling issues. For instance, a location name that appears repeatedly on multiple floors or in multiple buildings, such as elevator, can be reused without further manipulation.
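A minimal sketch of such a building -> floor -> node structure built from hash maps is shown below; the class and field names are illustrative and are not taken from the dissertation's code.

```python
# Minimal sketch of the building -> floor -> node hash structure; class and
# field names are illustrative, not taken from the dissertation's code.
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    x: float
    y: float
    map_id: int
    adjacent: list = field(default_factory=list)   # labels of connected nodes on this floor

class MapStore:
    def __init__(self):
        self.buildings = {}                         # building -> floor -> label -> Node

    def add_node(self, building, floor, node):
        self.buildings.setdefault(building, {}).setdefault(floor, {})[node.label] = node

    def find(self, building, floor, label):
        """Average O(1) lookup instead of scanning buildings, floors, and nodes."""
        return self.buildings.get(building, {}).get(floor, {}).get(label)

store = MapStore()
store.add_node("phe", "1", Node("office", 0.0, 0.0, 0, ["exit", "printer"]))
store.add_node("phe", "1", Node("printer", -1.69, 6.01, 0, ["office"]))
```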
The system must first determine whether it is capable of guiding the blind user from the current location to the destination or whether it needs to collect more information. Note that, in the current system, nodes on the same floor of the same building are considered reachable within one navigation task. To make this decision, the system queries whether the requested building, floor, and nodes exist in the map, using their labels. If the system finds both the current location and the destination in the data structure, the connectivity of the two nodes is further examined using the adjacency list information of each node. Once the connectivity is established, the system initiates the real-time navigation algorithm presented earlier. Otherwise, the system performs another single-session SLAM to build a map of the unexplored area. At the end of each single-session SLAM task, the system also associates the new information with an existing map in the same manner, using labels, and merges the new information into an existing map found to have a common node. Details of the map merge process are discussed in the next section.

Evaluations of the query time for a node, the time to find a connection between two nodes in a map, and the memory consumption of the data structure are illustrated in Figure 4.4. A search operation in a sorted array has O(log(n)) complexity. A hash table performs very efficient search, insertion, and deletion at O(1) complexity on average; however, it consumes more memory, since a large enough bucket size must be reserved to minimize collisions in hashing. The system avoids this problem by storing the number of items at each level in the XML file, as shown in Listing 4.1. Hence, when loading map data from an offline file at program start-up, the number of buildings, floors, and places can be read first, and the system reserves the appropriate bucket sizes to avoid hash collisions while minimizing memory occupation. By implementing the efficient multi-level hash table, the system can access or modify a node in the map in O(1) time, as opposed to O(log(n_B) log(n_F) log(n_N)), where n_B, n_F, and n_N represent the number of buildings in the entire map, the number of floors in the building of interest, and the number of nodes on the floor of interest, respectively. The structure of the multi-level hash table is illustrated in Figure 4.3. The connectivity between two nodes on the same floor can be checked in O(n) time in the worst case, where n is the number of nodes on the floor, if a hash table structure is used. As shown, the time to find a query node and a connection between two nodes in a map remains almost constant even when there is a large number of elements in the data structure. The query time increases slightly in some cases as the size of the data structure grows, for example when B = 10 and F = 10; however, the increase is negligible compared to the change in the number of nodes. Memory consumption grows with the bucket size required for stable hashing, which is pre-allocated as described above so that no rehashing is required during online operations.
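The connectivity check described above can be sketched as a breadth-first search over the per-floor adjacency lists, which gives the O(n) worst case mentioned; it reuses the illustrative Node sketch shown earlier and is not the system's code.

```python
# Minimal sketch of the connectivity check between two labeled nodes on the
# same floor: a breadth-first search over the adjacency lists, giving the O(n)
# worst case mentioned above. It reuses the illustrative Node sketch and is not
# the system's code.
from collections import deque

def connected(floor_nodes, start_label, goal_label):
    """floor_nodes: dict mapping label -> Node; returns True if a path exists."""
    if start_label not in floor_nodes or goal_label not in floor_nodes:
        return False
    seen, frontier = {start_label}, deque([start_label])
    while frontier:
        label = frontier.popleft()
        if label == goal_label:
            return True
        for nxt in floor_nodes[label].adjacent:
            if nxt in floor_nodes and nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return False
```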
Figure 4.4: Time and memory profiles of the proposed data structure as a function of the number of nodes: (a) memory consumption, (b) query time for a node, and (c) query time to find a connection between two nodes. B and F represent the number of buildings and the number of floors per building, respectively.

4.3 Map merge

When a new map generated by a single mapping task finds an associated node in an existing map through a label, the system calculates a rigid transformation with which the two maps can be merged consistently and verifies the geometric consistency of the merged map using the 3D point clouds neighboring the associated node, by evaluating Equation (2.1). If the error E(p, q) obtained from Equation (2.1) is greater than a threshold value, θ_merge, the two maps are not merged and are stored as separate maps. For example, a blind user collects information by traveling from S_2 to G_2 in Figure 4.1 while the system performs a single-session SLAM. Once the blind user reaches the destination, the system terminates the single-session SLAM task and checks whether the new information can be merged into the existing map: it associates the two nodes named 'printer' on the same floor of the same building, and once the two nodes are verified to be the same location, it merges the new map into the existing one.

Unlike [29, 38], we use 3D point clouds instead of RGB images for place matching, to achieve robust matching results under severe viewpoint changes. This is important in our application because the navigation system often encounters scenarios in which the RGB images of two maps have little or no overlap due to the limited field of view of the RGBD sensor, for example when a user travels from point A to point B and later wants to come back from B to A. 3D point clouds are far less sensitive to the viewing direction and are hence more suitable for our application. As stated earlier, the computational complexity of an ICP approach is O(n^2), where n is the number of points. Therefore, we reduce the number of points fed to the ICP algorithm by extracting two sets of k nearest neighbors, p and q, around the anchors, i.e., the 3D locations of the common label in each map.

Figure 4.5: Graphical representation of how the system decides whether to merge two maps. (a) Two nodes named A from M_1 and M_2 are associated with a high matching score and their rigid transformation (R, T) is found. (b) Two nodes could not be associated because of a low matching score.

Figure 4.6: Graphical representation of how the system merges maps. (a) Two maps, Map1 and Map2, are merged using the common label Elevator and its neighboring 3D point clouds from each map. (b) An additional map is merged using another common label, printer, in the same manner; the merged map now covers the combined area.
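The merge decision can be sketched as follows; the icp() routine, the neighborhood size k, and the threshold value are assumptions used purely for illustration, not the dissertation's implementation.

```python
# Minimal sketch of the merge decision: restrict both maps to the k nearest
# neighbors of the shared label, align them with ICP, and accept the merge only
# if the residual of Eq. (2.1) stays below a threshold. The icp() routine, k,
# and the threshold value are assumptions used purely for illustration.
import numpy as np

def knn(points, anchor, k=500):
    """k nearest neighbors of the anchor location, to keep ICP tractable."""
    d = np.linalg.norm(points - anchor, axis=1)
    return points[np.argsort(d)[:k]]

def try_merge(existing_map, new_map, anchor_existing, anchor_new, icp, theta_merge=0.05):
    q = knn(existing_map, anchor_existing)
    p = knn(new_map, anchor_new)
    R, T, residual = icp(p, q)              # residual: value of Eq. (2.1) after alignment
    if residual > theta_merge:
        return None                         # inconsistent: keep the maps separate
    return np.vstack([existing_map, new_map @ R.T + T])
```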
4.3.1 Limitation

The 2D and 3D maps built in a single map-building task accumulate errors caused by pose estimation and sensor noise, which in turn deform the maps. The current implementation merges two maps using only a rigid transformation when an overlap between two separate maps is found. Deformation, which is difficult to correct with rigid transformations alone, therefore results in inconsistencies in the merged map. Future directions regarding this issue are discussed in Chapter 7.1.

Figure 4.7: Deformation of a 2D grid map caused by error accumulation cannot be corrected by rigid transformations alone.

Chapter 5
Dynamic path planning in the presence of obstacles

The goal of this process is to generate an efficient, stable, and safe waypoint, based on the shortest path and the traversability analysis information, that can be easily delivered to the tactile feedback system to guide the blind user. Shortest path planning algorithms such as [28] provide the path that is most efficient in distance to reach a destination. However, they do not take physical constraints, such as the dimensions of the blind user's body, into consideration. Therefore, the shortest path is perceived as unrealistic and cannot be used directly in our application. As stated in [49], the ideal trajectories for a navigation system for the blind to generate are trajectories created by sighted subjects, or by blind users guided by sighted guides. Hence, the proposed navigation system aims to generate trajectories close to those of a sighted person, based on the shortest path algorithm for efficiency and the chamfer distance map for safety. Furthermore, delivering the entire path to a blind user raises another difficult user interface issue. According to [45], blind users prefer a very simple and precise set of instructions, with few details or long descriptions about the place or route. Here, we solve this problem by generating a waypoint, an intermediate goal toward the destination, to guide the blind subject, creating a much sparser representation that is still close to the sighted subjects' trajectories.

5.1 D* Lite algorithm

The shortest path toward the destination is computed using the D* Lite algorithm [28]. Occupied cells are assigned a cost of infinity, while traversable regions are given appropriate cost values. D* Lite is an incremental heuristic search algorithm that is able to handle dynamic changes in the occupancy map in real time. In order to reduce the computational load, the path information is updated only for sections of the map that have undergone some modification. A full discussion of the D* Lite algorithm is beyond the scope of this thesis; interested readers can find more details in [28].

5.2 Waypoint generation

The blind subject is assumed to travel on a ground plane; therefore, the path planning is performed on the global 2D traversability map for computational efficiency. We also build and update a chamfer distance array, aligned with the global 2D traversability map, that stores the distance to the closest obstacle and provides a direct measure of the risk of collision that can be checked instantaneously. The shortest path from the current location to the destination is produced using the D* Lite algorithm [28]. The D* algorithm can handle dynamic changes of the surroundings very efficiently. However, subtle changes in the map sometimes cause the generated path to change at high frequency, which confuses the blind subject.
The shortest path produced is a set of cells, DL, on the grid map that connects the current location to the destination without using untraversable cells. Instead of directly following DL, we generate a reliable waypoint, W, that is confirmed to be traversable and located at least some distance from obstacles. In order to generate a waypoint, we find the furthest point in DL that is directly visible and traversable from the current location, D_1, and another cell, D_2, located furthest in the set DL that is directly visible and traversable from D_1. We then search the neighborhood of D_1, limited by a predefined radius, and find the cell that minimizes the cost function f given in (5.2). C_{i,j} and CF_{i,j} represent the cell in the i-th row and j-th column and the chamfer distance value of C_{i,j}, respectively, and d(C_1, C_2) denotes the distance between two cells C_1 and C_2.

W = \arg\min_{i,j} f_{i,j}    (5.1)

f_{i,j} = w_1 [ d(C_{i,j}, D_1) + d(C_{i,j}, D_2) ] + w_2 \, CF_{i,j}    (5.2)

Figure 5.1: An example of a waypoint update scenario. The grid map contains occupied, free, and unknown cells; the camera location and heading, the destination, the D* Lite path, the search radius, the generated waypoint, and the direction to the waypoint are indicated.

The temporary waypoint is updated when there is motion or rotation greater than a certain threshold, when the current waypoint is no longer visible from the current location, or when the blind subject arrives in the vicinity of the waypoint; this avoids frequent updates of the generated path and redundant computation.

5.3 Cue generation and Haptic feedback

5.3.1 Introduction

The human interface is an essential part of an ETA device: it delivers commands created by processing the camera input and converts the processed data into a format that is appropriate for the blind user to perceive. In this dissertation, a tactile vest, as introduced in Section 1.4.1, is designed to convey cues to human subjects for the purpose of obstacle avoidance and orientation guidance toward a destination. Two main sensory substitution modalities have been researched in the context of navigation for the visually impaired: audio and tactile. Audio cueing, however, imposes a severe cognitive load on the user. In [1], the two main sensory substitution modalities for the visually impaired were evaluated in the context of navigating a complex route. Even though the visually impaired subjects showed a higher rate of correct responses to navigational (directional) commands when using audio feedback, the tactile feedback system showed promise as an alternative to verbal-assist devices as a means of communicating important mobility information to users. Moreover, the visually impaired use their hearing for many daily activities, including basic echolocation tasks, and it has been shown that a tactile display can help people avoid obstacles [49]. Given these observations, the tactile modality is employed to convey the cues created by the motion estimation and path planning algorithms discussed in the previous sections. A tactile vest with micro-vibration motors and wireless control logic has been designed and integrated into the system. The system provides four intuitive cues to the user:

Straight (no vibration motors on)
Stop & Scan / Proximity Alert (all vibration motors on)
Turn left (top-left motor on)
Turn right (top-right motor on)

The cue is generated simply from the relative angle between the current heading and the direction obtained by drawing a line from the current position to the waypoint. In order to prevent frequent changes in cues, which often confuse the visually impaired subject, we utilize a hysteresis rule that takes into account the current cue as well as the rotation angle, θ, required to reach the waypoint W:

New cue = Turn left      if -180° < θ ≤ -θ_2
          Current cue    if -θ_2 < θ < -θ_1
          Straight       if -θ_1 ≤ θ ≤ θ_1
          Current cue    if θ_1 < θ < θ_2
          Turn right     if θ_2 ≤ θ < 180°

For example, let us assume the rotation angle θ falls somewhere between θ_1 and θ_2. If the current cue is Straight, the new cue will still be Straight. On the other hand, if the current cue is Turn right, the new cue will remain Turn right until the subject completes the rotation so that θ < θ_1. In our application, θ_1 and θ_2 are set to 5° and 15°, respectively.

Figure 5.2: Hysteresis rule for cue generation. The left-turn cases are simply a reflection of the graph about the vertical axis.
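A minimal sketch of this hysteresis rule follows; the sign convention (positive angles meaning a turn to the right) and the cue labels are illustrative assumptions, not the system's code.

```python
# Minimal sketch of the hysteresis cue rule above; the sign convention (positive
# angles meaning a turn to the right) and the cue labels are illustrative
# assumptions.
def next_cue(theta, current_cue, theta1=5.0, theta2=15.0):
    if theta <= -theta2:
        return "TURN_LEFT"
    if theta >= theta2:
        return "TURN_RIGHT"
    if -theta1 <= theta <= theta1:
        return "STRAIGHT"
    # Hysteresis band: keep the current cue until the turn is completed.
    return current_cue

# A subject told to turn right keeps the cue until theta drops below theta1:
cue = "TURN_RIGHT"
for theta in (20.0, 12.0, 7.0, 4.0):
    cue = next_cue(theta, cue)   # TURN_RIGHT, TURN_RIGHT, TURN_RIGHT, STRAIGHT
```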
The generated cue is transmitted through the wireless transmitter to the vest interface, which has four vibration motors, as shown in Fig. 1.2. Each vibration motor delivers a specific cue, such as turn left or turn right.

5.4 Results

We present the experimental details and results of the mobility experiments with the proposed system. Our navigation system runs at 28.6 Hz on average on the following configuration:

CPU: Intel(R) Core(TM) i7-3630QM @ 2.40 GHz
RAM: 8.00 GB
OS: Windows 7, 64-bit

The breakdown of the processing time is presented in Figure 5.3.

Figure 5.3: Timings of each process over 1500 frames.

5.4.1 Validation of the waypoint generation algorithm

We recorded RGBD data using the proposed system from 2 sighted subjects in 10 different scenarios. Each scene was selected to have a different combination of starting point and destination, in order to test the robustness of the proposed waypoint generation algorithm in different spatial configurations. The system estimates the pose of the camera and builds 2D grid maps as the subject records the RGBD data. The system then loads the map and generates the shortest path, as presented on the right side of Figure 5.4, as if the same navigation task were being performed anew; the corresponding path is plotted in red on the left side. The shortest path is the most efficient; however, minimizing the traveled distance is not the optimal path for a human to follow, as it carries a higher risk of collisions. The estimated trajectories of the sighted subjects are close to the shortest paths, but they are generally smoother, which is expected, since sighted subjects plan collision-free paths before reaching the area adjacent to obstacles; this is exactly what the proposed waypoint generation algorithm aims to achieve.

Figure 5.4: Scene 1: from the scanning desk to an exit.

As presented in Figure 5.5, the waypoint algorithm successfully creates a much smaller number of candidate intermediate goals that are close to the trajectories of the sighted subjects. The quality of the waypoint generation is difficult to measure quantitatively. Therefore, at the moment we visually confirm the similarity between the trajectories of the control group and those of the proposed algorithm, and compare them with the results of the shortest path algorithm, as shown in Figure 5.5 and as done in [49]. A quantitative metric to measure the similarity between the trajectories (curves) of the proposed algorithm and the control group is left as future work.
Figure 5.5: Two sighted users traveled and recorded RGBD data for the system in 10 different spatial configurations: (a) scanning desk to exit, (b) elevator to printer, (c) exit to office, (d) exit to printer, (e) office to scanning desk, (f) office to elevator, (g) office to exit, (h) office to printer, (i) printer to exit, and (j) printer to office. Estimated trajectories of the sighted subjects (green), the shortest path to the goal (red), and the waypoints generated by the proposed algorithm (blue) are shown.

Chapter 6
Mobility experiments

In this chapter, mobility experiments conducted with both visually impaired subjects and blind-folded subjects are presented. These experiments are designed to test the ability of the system to guide a blind user from point A to point B while avoiding obstacles, by warning about the presence of obstacles in the vicinity of the blind subject. As addressed in Section 2.3, it is assumed that a map between point A and point B has been built before the navigation tasks, either by a blind subject or by a sighted person. Two different scenarios for the mobility experiments are addressed in this section, a known and an unknown scenario, depending on how much information is provided for the navigation task. In a known scenario, the subjects are given a goal and a starting location in an indoor space where they have already traveled before; therefore, the subject knows how to travel between the starting location and the goal even with only the white cane. An unknown scenario involves a navigation task in a completely new place. It can also involve a navigation task in a previously traveled place without full information about the task, for example when the goal and/or the starting location within the indoor space is missing. This type of task is nearly impossible for blind users to accomplish without guidance from a sighted person. In the following sections, we discuss the motivation, method, and quantitative performance evaluation of the system in the two experiment scenarios in detail.

6.1 Mobility Experiments with Visually Impaired Subjects

6.1.1 Motivation

In this experiment, we investigate whether the tactile cues successfully guide users to a specific location of interest in the presence of obstacles, and whether the cues provide additional advantages over the conventional white cane in a known scenario.
Again, a blind subject in this experiment is expected to be able to travel the route with the white cane, without the guidance of a sighted person, given a starting location and a goal in the experiment environment. In the first part of the experiments (training), we evaluate how successfully the visually impaired subjects comply with commands from the tactile vest. For this, we designed a human-in-the-loop scenario in which cues were provided manually by a human operator, without using the autonomous mapping and path planning algorithms. The second experiment (testing) was designed to evaluate the entire system, which performs motion estimation, mapping, and path planning autonomously and delivers the appropriate commands to the vibration motors on the vest to guide the subjects.

6.1.2 Method

6.1.2.1 Subject Recruitment

5 visually impaired subjects were enrolled in this study, and a signed informed consent was obtained from each participant. The experiment was carried out as part of mobility experiments approved by the Institutional Review Board (IRB) at the University of Southern California. Subjects were required to speak English, be 18+ years of age, have no history of motion sickness or vertigo, have no cognitive or language/hearing impairments, and not require any ambulatory aids such as crutches. Totally blind participants with at best light perception were selected for the study. Orientation and mobility instructors at the Braille Institute of Los Angeles were asked to advertise the study among their students. All selected subjects were paid 20 USD in cash upon conclusion of each experiment session. In addition, detailed information pertaining to the subjects' vision loss was collected; the relevant data are presented in Table 6.1. Subjects RT and NM dropped out of the study later and are therefore not reported in the results section.

Table 6.1: Summary of data from subject recruitment.
  Subject  Age  Diagnosis of Vision Loss
  NM       40   Detached optic nerve (since birth)
  RT       40   Optic nerve hypoplasia
  GB       55   Microphthalmia (L) / Anophthalmia (R)
  JV       63   Cataracts
  JN       69   Retinopathy of prematurity

The protocol of the experiments is shown in Figure 6.1. Even though the known scenario assumes that the subjects travel a familiar environment with a known starting location and goal, at least one week of separation was kept between the mobility experiments using the white cane and those using the system, with the blind subjects traveling the same test route in each experiment. This eliminates the effect of learning the route from previous experiments performed close in time, as well as the fatigue a subject may feel after long hours of experiments.

Figure 6.1: Testing protocol.

6.1.2.2 Training with manual cue generation

This experiment is designed to test the capability of the tactile vest to guide a visually impaired user to a specific location without collisions with obstacles, using the system together with the white cane. The training navigation course is shown in Figure 6.2. A visually impaired subject begins from the starting point marked S in Figure 6.2 and finishes at the location marked E. A maximum of 10 trials per subject were performed for this training. If a subject reaches a performance asymptote, indicated by a flat region in Figure 6.4, the training is terminated.

Figure 6.2: Training environment.
6.1.2.3 Testing with autonomous cue generation

This experiment was designed to test the capability of the system in guiding a visually impaired user using the system together with the white cane, and whether any improvements can be obtained with the proposed system, which warns about the presence of obstacles in the vicinity of the subject and also guides the subject toward a destination. The testing navigation course is shown in Figure 6.3. A visually impaired subject begins from the starting point marked S in Figure 6.3a and finishes at the location marked E. 3 trials per subject were performed in the testing phase. An occupancy grid map was built with the system prior to testing and loaded before starting each task. The system generated and transmitted signals to the tactile vest based on the waypoint and cue generation algorithms discussed in Chapter 5.

Figure 6.3: Experimental setup with visually impaired subjects. (a) Top-down view of the testing environment for the navigation task. (b) A view of the environment from the camera, observed from the location of the gray triangle in (a).

6.1.3 Results and Discussion

6.1.3.1 Training with manual cue generation

The average time to complete the training course per trial is presented in Figure 6.5. An improvement in the average completion time over trials is generally observed, which indicates learning effects among the visually impaired subjects. The first three trials show the highest means and variances, which agrees with mobility studies under simulated prosthetic conditions [7]. It is reported that most learning during mobility experiments for partially sighted subjects occurs during the first three trials, with flattening of the curve by the fifth trial [13], and our experimental results agree with this observation.

Figure 6.4: Time to complete the training navigation course, per subject and per trial.

Figure 6.5: Average time and variability to complete the training navigation course, among the subjects who complied with the system.

6.1.3.2 Testing with autonomous cue generation

Figure 6.6: Trajectories each subject took to reach the destination.

After the training, RS dropped out of testing, since the subject did not demonstrate learning over the trials. Furthermore, NM also dropped out of testing, as she could not comply with the commands on the testing date. Since the Braille Institute closed from the 4th week of November, no further experiments could be scheduled. Therefore, three subjects, JV, GB, and JN, carried out the navigation task in the testing phase. Figures 6.7 and 6.8 show the time to complete the navigation task per subject per trial and the average time to complete the navigation task, respectively. The visually impaired use corners of walls or the sounds of a place as landmarks while navigating. Subjects JV and GB were fast walkers and advanced white cane users. Because they walk fast, they easily miss a landmark, accumulate errors, and consequently miss important cues (turns to make) along the way, as represented in Figure 6.6. JV missed turns in 2 out of 3 trials, GB missed 1, while JN did not miss any.
Missing important navigational cues results in additional time to re-localize within their spatial map and to come back to known places, which also results in larger variation in completion time, as can be seen in Figure 6.7. Note that JN does not follow the usual Orientation and Mobility (O&M) convention among the visually impaired, which is to follow the right side of a wall. This brought him closer to the left-side wall, where the first left turn toward the destination is located, and he did not miss any turns. The trajectory that JN took was more similar to the trajectory that the shortest path algorithm generates, namely following the shortest path while maintaining enough distance from the walls to prevent possible collisions.

Figure 6.7: Time to complete the testing navigation course, per subject (JV, GB, JN) and per trial, using the cane alone and the cane together with the system.

One interesting finding is that the performance in the navigation tasks is not significantly different according to a paired t-test between the two conditions, white cane versus white cane + system (p = 0.7033, two-tailed). This indicates that the proposed navigation system achieves similar completion times even though the subjects had significantly less training time with it. This is a big improvement over previous results for obstacle avoidance tasks, where walking speed is significantly lower while the number of collisions is reduced.

Figure 6.8: Mean and standard deviation of the navigation task completion time per subject (cane only versus cane + system).

JN, who had the best record, even outperforming the white cane, showed the highest SUS score, indicating that JN found the system highly usable, while the other subjects, GB and JV, showed lower SUS scores, which indicates room for improvement; this is discussed in Section 7.1.

Figure 6.9: Responses collected from the visually impaired subjects (JV, GB, JN) to the System Usability Scale (SUS) statements: I think that I would like to use this system frequently; I found the system unnecessarily complex; I thought the system was easy to use; I think that I would need the support of a technical person to be able to use this system; I found the various functions in this system were well integrated; I thought there was too much inconsistency in this system; I would imagine that most people would learn to use this system very quickly; I found the system very cumbersome to use; I felt very confident using the system; I needed to learn a lot of things before I could get going with this system.

6.2 Mobility Experiments with Blind-Folded Subjects

Figure 6.10: A floor plan of the mobility experiments. Red boxes and lines indicate obstacles that are not reflected in the floor plan.

6.2.1 Motivation

Another navigation experiment, conducted with blind-folded subjects, is presented in this section. This experiment is designed to test the ability of the system to guide a blind user in an unknown scenario. Generally speaking, it is very challenging for the visually impaired to travel in unfamiliar environments when using the white cane only.
In order to show the feasibility of the proposed system in such a challenging case, a simple unknown navigation scenario, in which partial information is missing for the navigation task, was designed. The blind-folded subjects can be considered inexperienced white cane users, which the author believes is a good group to introduce a new technology to, and they are expected to show different behaviors from the blind subjects of the previous section, who are experienced white cane users. In this experiment, there are many obstacles, such as tables, chairs, cubicle walls, and a fridge, that are not present on the floor plan shown in Figure 6.10, which is a primary reason why the system does not rely entirely on the floor plan for mapping and planning and instead performs a real-time probabilistic map update.

6.2.2 Method

In this experiment, the blind-folded subjects are asked to carry out two navigation tasks. For each task, the subject makes two trials, and their average performance is presented and discussed. For safety reasons, the experiment is performed in an area where the blind-folded subjects have traveled recently. However, the starting orientation is unknown to the subjects. The experiment is one of the easiest unknown scenarios possible and is designed to simulate a situation that often happens on a daily basis, even in a very familiar environment, when a subject loses the sense of orientation or gets confused by changes in the spatial configuration. Missing the starting orientation makes it impossible for the subjects to carry out a navigation task immediately, because the relative information between the starting location and the destination is essential for the subjects to complete the task.

6.2.3 Results

Figure 6.11: The average elapsed time to complete a navigation task for the Printer (left) and Exit (right) destinations, per subject (TS, RC, AT, YL), using the cane alone and the cane together with the system.

As shown in Figure 6.11, the proposed system used together with the white cane improved the mobility performance of every subject when going to the Exit and of 3 out of 4 subjects when going to the Printer. The proposed system helps to reduce the navigation completion time by 57.77% on average. Note that losing the sense of the starting orientation can decrease mobility performance by about a half in terms of the completion time compared to the scenario where both the starting location and the destination are given. The improvement results from the ability of the system to initialize the orientation with respect to the stored information and to help the subjects align themselves with the existing map. In a similar sense, the system can make a navigation task in a completely new place feasible if a map of the environment, built beforehand by another person using the system, is available, which would otherwise be an impossible task for a visually impaired subject using the white cane alone.

Figure 6.12: The average time to complete a navigation task and its variability for the blind-folded subjects (81.71 s with the cane only versus 52.23 s with the cane and the system).
As can be seen in Figure 6.12, this also confirms that using the proposed system improves the stability of the navigation performance.

6.3 Summary

In this chapter, we have presented quantitative results indicating that most visually impaired subjects were able to demonstrate a learning effect when using the proposed system for navigation tasks. It is reported that most learning during mobility experiments for partially sighted subjects occurs during the first three trials, with flattening of the curve by the fifth trial [13], and our experimental results agree with this observation. The proposed system shows efficiency comparable to the conventional mobility aid: the mean completion time when using the proposed system does not significantly differ from that of the white cane alone, while the performance is more stable, as indicated by the smaller variance in completion time in both the tests with visually impaired subjects and those with blind-folded subjects. The ability of the system to initialize the orientation with respect to the stored map helps the blind-folded subjects reach the destination faster than when using the white cane only. This ability is also expected to enable blind subjects to navigate even in a completely unknown environment using the system, a task impossible to accomplish with the white cane alone when guidance from a sighted person is not available.

Chapter 7
Conclusion and Future work

7.1 Contributions

We have presented an integrated framework that uses RGB-D input to perform real-time SLAM and dynamic path planning in the presence of obstacles, in the context of a mobility aid for the blind. The proposed pose estimation algorithm, which uses sparse features and dense 3D point clouds guided by the ground plane, outperforms one of the state-of-the-art sparse-feature-based pose estimation algorithms: on a 49.0 m trajectory walked by a blind-folded subject guided by a sighted person, the absolute error is reduced from 3.674 m to 2.661 m. The proposed pose estimation algorithm achieves real-time performance given QVGA (320 x 240) image and depth streams. A performance evaluation of the real-time pose estimation with 4 blind subjects has also been presented; the proposed algorithm provided a much more accurate result, with an average error of 0.88 m on trajectories averaging 13.93 m. Dynamic path planning in the presence of obstacles is an essential step for human navigation, where the pose of the subject changes and the surrounding environment may also change. In order to overcome the unrealistic trajectories produced by the shortest path algorithm and to make it easier to generate cues for the blind user, we propose to generate a waypoint based on the shortest path algorithm [28] and the chamfer distance map of the occupancy grid. Pilot studies show that the waypoints generated by the system yield trajectories that are close to those of sighted people, with fewer intermediate locations to follow than the shortest path generated by the D* Lite algorithm. Mobility experiments in a known scenario with the blind subjects show no significant difference in efficiency according to a statistical test, while stability improves through a reduction in the number of missed important cues (rotations) in a navigation task. Generally speaking, it is very challenging for the visually impaired to travel in unknown or unfamiliar environments.
In order to show the feasibility of the proposed system in such a challenging case, we designed a simple unknown navigation scenario in which partial information is missing for the navigation task. The mobility experiments in the unknown scenario indicate that the proposed system, when used together with the white cane, improves the mobility performance of the blind-folded subjects by 57.77% on average over using the white cane alone.

The papers pertaining to the above work that we have already published or that are under review are as follows:

Young Hoon Lee, Gérard Medioni, A RGB-D camera Based Navigation for the Visually Impaired, RSS 2011 RGB-D: Advanced Reasoning with Depth Camera Workshop, pp. 1-6, Los Angeles, USA, Jun. 2011.

Young Hoon Lee, Tung-Sing Leung and Gérard Medioni, Real-time staircase detection from a wearable stereo system, ICPR 2012 (Oral paper), pp. 3770-3773, Tsukuba, Japan, Nov. 11-15, 2012.

Young Hoon Lee, Gérard Medioni, Wearable RGB-D camera navigation system for the blind, ECCV 2014 Workshop on Assistive Computer Vision and Robotics (Oral paper), pp. 493-508, Zurich, Switzerland, Sep. 2014.

Young Hoon Lee, Gérard Medioni, Wearable RGB-D camera navigation system for the blind, Computer Vision and Image Understanding (under review), 2015.

However, there is still room for improvement. Our future work aims at building a wearable navigation system that performs in a more reliable and efficient way. First, the visual odometry should better handle the abrupt motions that arise occasionally. The visual odometry algorithm should also efficiently filter out unreliable features located on moving objects, such as a pedestrian in front of the camera, as they produce false motion information. The following sections present our future directions of research, followed by a tentative schedule.

7.2 Directions for Future Work

The experiments with visually impaired subjects in our research open up several interesting future directions. In particular, I would like to discuss three topics as future work.

7.2.1 Robust egomotion estimation by multi-modal sensor integration

It is essential to continuously maintain accurate and robust motion estimation in order to provide visually impaired users with correct navigation cues to follow. However, there are cases where the proposed motion estimation algorithm fails to meet this requirement. For example, it is well known that visual odometry algorithms fail to perform robustly in low-texture areas. Limitations of computer vision sensors, such as a limited field of view or sensing range, also cause robust egomotion estimation to fail under certain conditions. Integration with other sensing modalities, such as indoor GPS or IMUs, will help the egomotion estimation overcome such problems, as they provide characteristics complementary to computer vision sensors. Pilot studies show that integrating an indoor GPS network located inside a building helps the motion estimation system maintain the translation information by triangulating the Received Signal Strength Indicator (RSSI) from known transmitters. IMUs usually have a higher update rate than computer vision sensors and are capable of measuring rapid changes in angular velocity. This will improve the accuracy and robustness not only in static environments but also in dynamic environments, as shown in [32].

7.2.2 Usability improvement

The System Usability Scale (SUS) provides a quick and reliable tool for measuring usability [57].
According to our own SUS responses from the visually impaired subjects, the visually impaired mainly expect the system to be easier and less cumbersome to use. In order to improve the usability of the system, I propose to adopt some of the conventions from orientation and mobility (O&M) classes. It was observed in our experiments that, in some cases, commands generated by the system create additional processing time for the user because they do not coincide with conventions taught in O&M classes. For example, some of the blind subjects follow walls on their right side in order to maintain orientation and detect landmarks as they move, while the navigation system creates waypoints located far enough from walls to avoid collisions. Adopting such navigation skills when planning a path for blind subjects will help blind users who are accustomed to these skills reduce processing time and cognitive load while navigating with our system. Pilot studies with the blind subjects suggest that learning parameters, such as the threshold value used to generate a rotation command, helps the visually impaired subjects respond better, with shorter response times and improved accuracy.

Appendix A
Staircase Detection From a Wearable Stereo Camera

A.1 Introduction

We have designed and built a real-time, wearable, stereo-vision-based navigation system for the visually impaired, which has proved to be well accepted by patients [49]. It consists of a head-mounted stereo camera and a vest interface with four tactile feedback effectors. The system uses a stereo camera as the data acquisition device and implements real-time SLAM, obstacle avoidance, and path planning algorithms. Based on these algorithms, an appropriate cue is generated and delivered at every frame to the tactile sensor array, which alerts the user to the presence of obstacles and provides the blind user with guidance along the generated safe path. A main assumption we make here is that the ground is flat. In real scenarios, however, users have to navigate complex environments that include stairs. Stereo vision sensors are the primary choice for outdoor navigation, but our experiments show that, for staircases, a head-mounted stereo system produces poor 3D, because of low texture, the short baseline, and the alignment of the stairs with the epipolar lines. Hence, we first consider staircase detection as a classical object recognition problem in 2D image analysis. For this reason, we propose to use Haar-like features with a cascade classifier, an approach widely used in object detection because of its efficiency and reliable performance. However, unlike human faces, the texture, shape, and size of staircases vary widely among individual instances, which makes staircase detection a very difficult problem when using local features. Ground plane estimation from stereo vision and a motion smoothness constraint are applied to achieve more robust and accurate detection performance at a reasonable additional computational cost. The main contribution of this work is an accurate real-time staircase detection framework. To confirm this, we have performed experiments on real datasets collected in different areas. Before describing the details of our algorithm in Section A.3, related works are reviewed in the next section. Experimental details and results are provided in Section A.4. Conclusions and future work are discussed in Section A.5.
Appendix A

Staircase Detection From a Wearable Stereo Camera

A.1 Introduction

We have designed and built a real-time, wearable, stereo-vision based navigation system for the visually impaired which has proved to be well accepted by patients [49]. It consists of a head-mounted stereo camera and a vest interface with four tactile feedback effectors. The system uses a stereo camera as a data acquisition device and implements real-time SLAM, obstacle avoidance, and path planning algorithms. Based on the algorithms mentioned above, an appropriate cue is generated and delivered at every frame to the tactile sensor array to alert the user to the presence of obstacles and to provide a blind user with guidance along the generated safe path. A main assumption we made here is that the ground is flat. In real scenarios, however, users have to navigate in complex environments which include stairs. Stereo vision sensors are the primary choice for outdoor navigation. Our experiments show that, for staircases, a head-mounted stereo system produces poor 3D, because of low texture, the short baseline, and the alignment of the stairs with the epipolar lines. Hence, we first consider staircase detection as a classical object recognition problem in 2D image analysis. For this reason, we propose to use Haar-like features with a cascade classifier, which are widely used in object detection because of their efficiency and reliable performance. However, unlike human faces, the texture, shape, and size of staircases vary greatly among individual instances, which makes staircase detection a very difficult problem when using local features. Ground plane estimation from stereo vision and a smoothness-of-movement constraint are applied to achieve more robust and accurate detection performance at a reasonable additional computational cost. The main contribution of this work is an accurate real-time staircase detection framework. To confirm this, we have performed experiments based on real datasets collected in different areas. Before describing the details of our algorithm in Section A.3, related work is reviewed in the next section. Experimental details and results are provided in Section A.4. Conclusion and future work are discussed in Section A.5.

A.2 Related work

Staircase detection for autonomous navigation systems has been addressed in the literature [35, 48]. Different approaches have been proposed to tackle this problem. Works based on range scanners rely on accurate 3D models from costly sensors [4] or from inexpensive sensors that do not work outdoors [30]. Some researchers extended their curb detection frameworks to the staircase detection problem, making use of the disparity map and 3D point clouds from a stereo camera, as shown in [35]. However, the accuracy of 3D models from stereo algorithms severely decreases in low-texture scenes, which are quite common in staircase detection. To overcome the shortcomings of previous approaches, researchers proposed algorithms that use a single camera to provide an answer to this problem. [53] uses the Gabor filter for detection at a distance and then finds edges that are equally spaced when approached closely. Unfortunately, line extraction algorithms are brittle under lighting changes and overlaid shadows. Wang et al. proposed an algorithm similar to our method in [59]. The main difference is that [59] aims to detect staircases observed only in a frontal view, and thus may fail in many real-world scenarios.

A.3 Overview of algorithm

[Figure A.1: Algorithm flowchart. The pipeline runs image acquisition (left and right), stereo matching, ground plane estimation, candidate detection with the staircase detector, temporal consistency filtering, and final detection. Candidates are detected using Haar cascade classifiers, and additional false detection (yellow boxes) removal processes are applied using the ground plane from a stereo vision system and temporal consistency.]

The objective of our approach is to improve the detection accuracy and robustness. To achieve this goal, we collect 210 images containing 3 to 4 steps of a staircase taken under various conditions. 5250 positive 40×40-pixel images are prepared by creating 25 distorted or transformed images from each of the 210 original images. We also have 7000 negative samples that do not contain staircases. The Discrete AdaBoost training was performed until we obtained an 18-layer cascade of classifiers with a false positive (FP) rate of approximately 7.6×10⁻⁶ and a true positive (TP) rate of approximately 0.983. Note that the training is designed to have fewer training stages than classical applications, because a higher detection rate is important for this staircase detection problem. However, a classifier with a higher detection rate generally suffers from a high FP rate, which produces low precision. In the proposed detector, FPs are filtered out using the estimated ground plane and temporal consistency, which enables the detector to achieve high precision.

A.3.1 Candidate Detection

For candidate detection, we used the cascade classifiers generated from the training stage and a 40×40 subwindow for scanning. The scale increase rate and the minimum number of neighbors are two parameters that we optimize empirically for the best performance. To deal with scale changes effectively, different sizes of scanning window must be applied. Test results indicate that increasing the window size by 30% works robustly at a faster speed for the staircase detection task. The Haar detector generates multiple hits (detections) over a correct staircase region, whereas false detections usually consist of a single isolated detection. In our detection algorithm, a group of detections with more than 10 overlaps is chosen, and their average rectangle region becomes the candidate. The n-th detected staircase in a frame is then represented by its center (x_n, y_n), size s_n, and confidence level r_n. On detection, r_n of every candidate is initialized to 1.
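The candidate-detection step above maps closely onto OpenCV's cascade-classifier interface. The minimal sketch below assumes a trained cascade stored in a file named stairs_cascade.xml (a hypothetical name); the 1.3 scale factor mirrors the 30% window-size increase, and the minNeighbors parameter roughly plays the role of the minimum-overlap grouping described above.

```python
import cv2

# Load a trained Haar cascade (hypothetical file name).
cascade = cv2.CascadeClassifier("stairs_cascade.xml")

def detect_candidates(gray_frame):
    """Multi-scale sliding-window scan; returns candidate boxes with r_n = 1."""
    boxes = cascade.detectMultiScale(
        gray_frame,
        scaleFactor=1.3,   # 30% window-size increase per scale
        minNeighbors=10,   # grouping threshold for overlapping hits
        minSize=(40, 40),  # matches the 40x40 training window
    )
    # Each grouped box becomes a candidate with an initial confidence of 1.0,
    # to be re-weighted later by the ground plane and temporal consistency.
    return [{"box": tuple(b), "confidence": 1.0} for b in boxes]
```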
A.3.2 False Alarm Removal

Additional false detection removal steps are applied to increase the accuracy of the detector. Recall that the main reason we approach the staircase detection problem using 2D images is not that a stereo camera system is unavailable, but that the stereo system produces poor results specifically in the staircase detection task. However, the stereo camera system can provide very meaningful information that helps decrease the false detection rate. Furthermore, we add robustness to the algorithm by introducing a smooth motion constraint.

A.3.2.1 Ground plane estimation

The approximate ground plane assumption provides useful information and is widely used in many SLAM systems that operate on the ground. We use a simple ground plane estimation method based on a stereo camera pair. For each stereo image pair, we compute a dense disparity map. We then use the V-disparity algorithm [22] to locate the potential ground region within the left camera image. Prominent features are selected in these regions using [9]. If more than 8 feature points with valid disparity values are found on the largest connected region, we estimate the ground plane normal and center location by fitting a 3D plane to the points using RANSAC. Staircases near the ground plane obtain a higher weight because they are more critical to blind users. For faster computation, we divide the image into 16 by 16 disjoint subregions. The confidence level of each candidate is then determined by a modified sigmoid weight function proportional to the inverse of the Euclidean distance between the ground plane and a detection. The confidence calculation based on the ground plane estimation is performed only when the ground plane is extractable in a scene.

A.3.2.2 Temporal Consistency

Appearance and motion smoothness over time is a general constraint in detection and motion tracking. The low-level association method based on affinities between two consecutive frames suggested by Huang [24] is modified and applied to corroborate temporal consistency. The confidence level from the affinities between two staircases is defined by two parameters, the position and the size of a detection over two consecutive frames:

r_{n,\text{new}} = r_{n,\text{old}} \cdot \max_{i=1,\dots,k} \left( A_{\text{pos}}(S'_i, S_n)\, A_{\text{size}}(S'_i, S_n) \right) \qquad (A.1)

A_pos(S'_i, S_n) and A_size(S'_i, S_n) represent the affinities of position and size of two detected patches and range from 0 to 1. They are also modified sigmoid weight functions of the inverse of the Euclidean distance between S'_i and S_n. S'_i and S_n represent staircase detections from the previous frame and from the current frame, respectively, and k is the number of detections from the previous frame. Detections whose r_n is greater than a threshold value θ_fin are considered to be the final detections of staircases.

[Figure A.2: Staircases detected using the proposed algorithm. Note that the proposed algorithm can detect staircases from various viewpoints and lighting conditions.]
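A minimal sketch of the confidence update of Eq. (A.1) is given below. The text only states that the affinities are modified sigmoid functions of inverse Euclidean distance, so the particular sigmoid form, scale constants, and data layout here are assumptions for illustration.

```python
import math

def _sigmoid_affinity(distance, scale):
    """Map a distance (position offset or size difference) to (0, 1]."""
    return 2.0 / (1.0 + math.exp(distance / scale))   # equals 1 when distance == 0

def a_pos(prev, cur, scale=20.0):
    dx = cur["center"][0] - prev["center"][0]
    dy = cur["center"][1] - prev["center"][1]
    return _sigmoid_affinity(math.hypot(dx, dy), scale)

def a_size(prev, cur, scale=15.0):
    return _sigmoid_affinity(abs(cur["size"] - prev["size"]), scale)

def update_confidence(prev_detections, cur):
    """r_{n,new} = r_{n,old} * max_i( A_pos(S'_i, S_n) * A_size(S'_i, S_n) )."""
    if not prev_detections:
        return cur["confidence"]
    best = max(a_pos(p, cur) * a_size(p, cur) for p in prev_detections)
    return cur["confidence"] * best
```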
A.4 Experiments and Results

The following is a brief description of the parameters used and the hardware configuration of the experiments. θ_fin is set to 0.8 for the best performance. The machine used has an Intel(R) Xeon quad-core CPU @ 3.72 GHz and an Nvidia GeForce GTX 460 for the GPU implementation. The experiments are designed to verify the accuracy of the staircase detector and the effectiveness of the false detection removal process. Processing speed is also measured to validate the feasibility of the proposed algorithm as a real-time application. All experiments were performed on datasets consisting of 852 frames of 320x240 pixels that contain staircases with various shapes, textures, sizes, and lighting conditions. The datasets are collected from the wearable stereo camera shown in Figure A.1 to contain staircases observed from many different viewpoints and distances, in order to test the robustness of the detector. Ground truth is created by manually delineating the staircase area using polygons. Detections with more than 75% of their area overlapping with the ground truth data are defined as TPs.

A.4.1 Results

[Figure A.3: ROC curves and performance analysis of our staircase detector: (a) precision vs. recall rate; (b) recall rate vs. FP rate; (c) recall rate vs. number of false positives. Each plot compares CC, CC+TC, CC+GP, and CC+GP+TC.]

Stable performance is essential in our application, which implies that the precision of the detector is important. CC, TC, and GP in Figure A.3 indicate the cascade classifier, temporal consistency, and ground plane estimation, respectively. Figure A.3(a) shows that CC+GP+TC achieves the highest precision, 0.76 to 0.84, in the 0 to 0.7 recall rate range. As shown in Figure A.3(b), recall rates greater than 0.7 give high FP rates, and the detector does not operate stably there. Hence the range of interest for our application is limited to recall rates smaller than 0.7. Figure A.3(b) also shows the improvement in detection accuracy obtained by applying TC and GP: CC+TC, CC+GP, and CC+TC+GP have smaller FP rates in the range of interest. At a recall rate of 0.7 with a precision of 0.76, CC+GP+TC outperforms CC by 8% in terms of FP rate. Figure A.3(c) shows that TC, GP, and TC+GP reduce the number of FPs very effectively. Experiments show that the number of FPs is reduced from 1400 to 501, approximately 65%. The improvement from the ground plane estimation is somewhat small, and certainly less than anticipated. This is due to current limitations in the performance of the ground plane detector; when its confidence is low, the result is discarded. Unfortunately, most previously published results on staircase detection did not provide quantitative results, hence a direct comparison to previous approaches is not provided. The scanning process to find candidates is the biggest computational component of our detection algorithm. By applying the OpenCV GPU cascade classifier algorithm, the scanning process runs at 49.7 fps on 320x240 images, which provides a speed-up factor of 18.4 (CPU: 2.7 fps).
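The TP criterion defined at the beginning of Section A.4 can be checked with a simple overlap computation. The sketch below assumes axis-aligned bounding boxes, which is a simplification of the polygon ground truth described above; function names and the box layout are illustrative.

```python
def overlap_ratio(det, gt):
    """Fraction of the detection box covered by a ground-truth box.
    Boxes are (x, y, w, h) in pixels."""
    dx, dy, dw, dh = det
    gx, gy, gw, gh = gt
    ix = max(0, min(dx + dw, gx + gw) - max(dx, gx))
    iy = max(0, min(dy + dh, gy + gh) - max(dy, gy))
    return (ix * iy) / float(dw * dh)

def is_true_positive(det, gt_boxes, threshold=0.75):
    """A detection counts as a TP when more than 75% of its area overlaps
    ground truth, per the criterion above."""
    return any(overlap_ratio(det, gt) > threshold for gt in gt_boxes)
```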
A.5 Conclusion

In this section, we introduced a real-time staircase detection framework for which high accuracy and a low false alarm rate are essential. In order to meet the real-time requirement with an accurate detection rate, the proposed algorithm uses a 2D image based detection framework with classifiers built using the AdaBoost learning algorithm, which is widely used for object detection. To meet the accuracy requirement, a ground plane estimation step and a temporal consistency validation step are added to lower the FP rate. Experimental results on real datasets confirm the accuracy and effectiveness of the method in various scenarios. The detection algorithm is designed to run either on the CPU or the GPU. As the results suggest, the GPU implementation runs faster than 30 fps on average, which guarantees real-time performance.

Reference List

[1] Aminat Adebiyi, Shadi Bohlool, Mort Arditti, and James D. Weiland. Tactile output as a method for communicating with the visually impaired in mobility. Investigative Ophthalmology & Visual Science, 55(13):4144–4144, 2014.
[2] I. Apostolopoulos, N. Fallah, E. Folmer, and K. Bekris. Integrated online localization and navigation for people with visual impairments using smart phones. ACM Transactions on Interactive Intelligent Systems (TiiS), 3(4):21, 2014.
[3] D. Bissitt and A. Heyes. An application of bio-feedback in the rehabilitation of the blind. Applied Ergonomics, 11(1):31–33, 1980.
[4] G. Capi and H. Toda. A new robotic system to assist visually impaired people. In IEEE International Symposium on Robot and Human Interactive Communication, pages 259–263, 2011.
[5] Yang Chen and Gérard Medioni. Object modeling by registration of multiple range images. In Proc. ICRA 1991, pages 2724–2729. IEEE, 1991.
[6] G. Costa, A. Gusberti, J. Graffigna, M. Guzzo, and O. Nasisi. Mobility and orientation aid for blind persons using artificial vision. In Journal of Physics: Conference Series, volume 90, page 012090. IOP Publishing, 2007.
[7] Gislin Dagnelie, Pearse Keane, Venkata Narla, Liancheng Yang, James Weiland, and Mark Humayun. Real and virtual mobility performance in simulated prosthetic vision. Journal of Neural Engineering, 4(1):S92, 2007.
[8] A. Dodds et al. The sonic pathfinder: An evaluation. Journal of Visual Impairment and Blindness, 78(5):203–06, 1984.
[9] A. Geiger, J. Ziegler, and C. Stiller. StereoScan: Dense 3D reconstruction in real-time. In IEEE Intelligent Vehicles Symposium, pages 25–38, 2011.
[10] R. G. Golledge, J. R. Marston, and C. M. Costanzo. Attitudes of visually impaired persons towards the use of public transportation. Journal of Visual Impairment & Blindness, 91(5):446–459, 1997.
[11] Google, 2013. http://www.google.com/glass.
[12] D. Hayden, C. Vondrick, S. Jia, Y. Landa, R. Miller, A. Torralba, and S. Teller. The accuracy-obtrusiveness tradeoff for wearable vision platforms. In CVPR 2012 Egocentric Vision Workshop (abstract), pages 1–2, 2012.
[13] Sharon Haymes, Daryl Guest, Anthony Heyes, and Alan Johnston. Mobility of people with retinitis pigmentosa as a function of vision and psychological variables. Optometry & Vision Science, 73(10):621–637, 1996.
[14] A. Helal, S. Moore, and B. Ramachandran. Drishti: An integrated navigation system for visually impaired and disabled. In Wearable Computers, 2001. Proceedings. Fifth International Symposium on, pages 149–156. IEEE, 2001.
[15] P. Henry, M. Krainin, E. Herbst, X. Ren, and D. Fox. RGB-D mapping: Using Kinect-style depth cameras for dense 3D modeling of indoor environments. The International Journal of Robotics Research, 31(5):647–663, 2012.
[16] J. Hill and J. Black. The Miniguide: a new electronic travel device. Journal of Visual Impairment & Blindness, 97(10):1–6, 2003.
[17] H. Hirschmuller, P. Innocent, and J. Garibaldi. Fast, unconstrained camera motion estimation from stereo without tracking and robust statistics. In Control, Automation, Robotics and Vision, 2002. ICARCV 2002. 7th International Conference on, volume 2, pages 1099–1104. IEEE, 2002.
[18] S. Holzer, R. Rusu, M. Dixon, S. Gedikli, and N. Navab. Adaptive neighborhood selection for real-time surface normal estimation from organized point cloud data using integral images. In Intelligent Robots and Systems (IROS), 2012 IEEE/RSJ International Conference on, pages 2684–2689. IEEE, 2012.
[19] B. Horn. Closed-form solution of absolute orientation using unit quaternions. JOSA A, 4(4):629–642, 1987.
[20] Armin Hornung, Kai M. Wurm, Maren Bennewitz, Cyrill Stachniss, and Wolfram Burgard. OctoMap: An efficient probabilistic 3D mapping framework based on octrees. Autonomous Robots, 2013. Software available at http://octomap.github.com.
[21] A. Howard. Real-time stereo visual odometry for autonomous ground vehicles. In Intelligent Robots and Systems, 2008. IROS 2008. IEEE/RSJ International Conference on, pages 3946–3952. IEEE, 2008.
[22] Z. Hu and K. Uchimura. U-V-disparity: an efficient algorithm for stereovision based scene analysis. In Intelligent Vehicles Symposium, 2005. Proceedings. IEEE, pages 48–54, 2005.
[23] A. Huang, A. Bachrach, P. Henry, M. Krainin, D. Maturana, D. Fox, and N. Roy. Visual odometry and mapping for autonomous flight using an RGB-D camera. In International Symposium on Robotics Research (ISRR), pages 1–16, 2011.
[24] C. Huang, B. Wu, and R. Nevatia. Robust object tracking by hierarchical association of detection responses. In Proceedings of the 10th European Conference on Computer Vision: Part II, pages 788–801, 2008.
[25] L. Jones, M. Nakamura, and B. Lockyer. Development of a tactile vest. In Haptic Interfaces for Virtual Environment and Teleoperator Systems, 2004. HAPTICS '04. Proceedings. 12th International Symposium on, pages 82–89. IEEE, 2004.
[26] M. Kaess, H. Johannsson, R. Roberts, V. Ila, J. Leonard, and F. Dellaert. iSAM2: Incremental smoothing and mapping using the Bayes tree. The International Journal of Robotics Research, 31(2):216–235, 2012.
[27] O. Koch, S. Teller, et al. A self-calibrating, vision-based navigation assistant. In Workshop on Computer Vision Applications for the Visually Impaired, 2008.
[28] S. Koenig and M. Likhachev. Fast replanning for navigation in unknown terrain. IEEE Transactions on Robotics, 3(21):354–363, 2005.
[29] M. Labbé and F. Michaud. Online global loop closure detection for large-scale multi-session graph-based SLAM. In Proc. IROS 2014, pages 2661–2666. IEEE, 2014.
[30] Y. Lee and G. Medioni. A RGB-D camera based navigation for the visually impaired. In RSS 2011 RGB-D: Advanced Reasoning with Depth Camera Workshop, pages 1–6, 2011.
[31] R. Legood, P. Scuffham, and C. Cryer. Are we blind to injuries in the visually impaired? A review of the literature. Injury Prevention, 8(2):155–160, 2002.
[32] Tung-Sing Leung and Gérard Medioni. Visual navigation aid for the blind in dynamic environments. In Computer Vision and Pattern Recognition Workshops (CVPRW), 2014 IEEE Conference on, pages 579–586. IEEE, 2014.
[33] Fan Li, Chunshui Zhao, Guanzhong Ding, Jian Gong, Chenxing Liu, and Feng Zhao. A reliable and accurate indoor localization method using phone inertial sensors. In Proc. UbiComp 2012, pages 421–430. ACM, 2012.
[34] X. Lu and R. Manduchi. Detection and localization of curbs and stairways using stereo vision. In IEEE International Conference on Robotics and Automation, volume 4, page 4648. Citeseer, 2005.
[35] X. Lu and R. Manduchi. Detection and localization of curbs and stairways using stereo vision. In IEEE International Conference on Robotics and Automation (ICRA), pages 4648–4654, 2005.
[36] R. Manduchi and S. Kurniawan. Mobility-related accidents experienced by people with visual impairment. Insight: Research & Practice in Visual Impairment & Blindness, 4(2):44–54, 2011.
[37] J. Martinez, F. Ruiz, et al. Stereo-based aerial obstacle detection for the visually impaired. In Workshop on Computer Vision Applications for the Visually Impaired, 2008.
[38] J. McDonald, M. Kaess, C. Cadena, J. Neira, and J. Leonard. 6-DOF multi-session visual SLAM using anchor nodes. In Proc. ECMR 2011, pages 69–76, 2011.
[39] C. Mei, G. Sibley, M. Cummins, P. Newman, and I. Reid. A constant-time efficient stereo SLAM system. In BMVC, pages 1–11, 2009.
[40] Microsoft. Microsoft HoloLens, 2015. http://www.microsoft.com/microsoft-hololens.
[41] H. Moravec and A. Elfes. High resolution maps from wide angle sonar. In Robotics and Automation. Proceedings. 1985 IEEE International Conference on, volume 2, pages 116–121. IEEE, 1985.
[42] R. Newcombe, A. Davison, et al. KinectFusion: Real-time dense surface mapping and tracking. In Mixed and Augmented Reality (ISMAR), 2011 10th IEEE International Symposium on, pages 127–136. IEEE, 2011.
[43] JVIB news service. Demographics update: Alternate estimate of the number of guide dog users. Journal of Visual Impairment & Blindness, 89(2):4–6, 1995.
[44] C. Nguyen, S. Izadi, and D. Lovell. Modeling Kinect sensor noise for improved 3D reconstruction and tracking. In 3D Imaging, Modeling, Processing, Visualization and Transmission (3DIMPVT), 2012 Second International Conference on, pages 524–530. IEEE, 2012.
[45] Hugo Nicolau, Tiago Guerreiro, and Joaquim Jorge. Designing guides for blind people. Departamento de Engenharia Informatica, Instituto Superior Tecnico, Lisboa, 2009.
[46] Orcam, 2014. http://www.orcam.com.
[47] H. Petrie, V. Johnson, T. Strothotte, A. Raab, S. Fritz, and R. Michel. MoBIC: Designing a travel aid for blind and elderly people. Journal of Navigation, 49(01):45–52, 1996.
[48] V. Pradeep, G. Medioni, and J. Weiland. Piecewise planar modeling for step detection using stereo vision. In Workshop on Computer Vision Applications for the Visually Impaired (CVAVI), pages 1–8, 2008.
[49] V. Pradeep, G. Medioni, and J. Weiland. Robot vision for the visually impaired. In Computer Vision Applications for the Visually Impaired, pages 15–22, 2010.
[50] N. Pressey. Mowat sensor. Focus, 11(3):35–39, 1977.
[51] J. Sáez and F. Escolano. 6DOF entropy minimization SLAM for stereo-based wearable devices. Computer Vision and Image Understanding, 115(2):270–285, 2011.
[52] J. Schiller and J. Peregoy. Provisional report: Summary health statistics for U.S. adults: National Health Interview Survey, 2011. National Center for Health Statistics, 10(256), 2012.
[53] S. Se. Vision-based detection of staircases. In Proceedings of the Asian Conference on Computer Vision (ACCV), pages 1–6, 2000.
[54] S. Shoval, J. Borenstein, and Y. Koren. Auditory guidance with the NavBelt, a computerized travel aid for the blind. Systems, Man, and Cybernetics, Part C: Applications and Reviews, IEEE Transactions on, 28(3):459–467, 1998.
[55] F. Steinbrucker, C. Kerl, D. Cremers, and J. Sturm. Large-scale multi-resolution surface reconstruction from RGB-D sequences. In Computer Vision (ICCV), 2013 IEEE International Conference on, pages 3264–3271. IEEE, 2013.
[56] K. Turano, A. Broman, K. Bandeen-Roche, B. Munoz, G. Rubin, S. West, and the SEE Project Team. Association of visual field loss and mobility performance in older adults: Salisbury Eye Evaluation study. Optometry & Vision Science, 81(5):298–307, 2004.
[57] usability.gov, 2015. http://www.usability.gov/how-to-and-tools/methods/system-usability-scale.html.
[58] K. Wakita, J. Huang, P. Di, K. Sekiyama, and T. Fukuda. Human-walking-intention-based motion control of an omnidirectional-type cane robot. Mechatronics, IEEE/ASME Transactions on, 18(1):285–296, 2013.
[59] S. Wang and H. Wang. 2D staircase detection using real AdaBoost. In The Seventh International Conference on Information, Communications, and Signal Processing (ICICS), pages 1–6, 2009.
[60] S. Werner, B. Krieg-Brückner, H. Mallot, K. Schweizer, and C. Freksa. Spatial cognition: The role of landmark, route, and survey knowledge in human and robot navigation. In Informatik '97: Informatik als Innovationsmotor, pages 41–50. Springer, 1997.
[61] S. West, G. Rubin, A. Broman, B. Munoz, K. Bandeen-Roche, and K. Turano. How does visual impairment affect performance on tasks of everyday life?: The SEE project. Archives of Ophthalmology, 120(6):774–780, 2002.
[62] T. Whelan, H. Johannsson, M. Kaess, J. Leonard, and J. McDonald. Robust real-time visual odometry for dense RGB-D mapping. In Robotics and Automation (ICRA), 2013 IEEE International Conference on, pages 5724–5731. IEEE, 2013.
[63] T. Whelan, M. Kaess, M. Fallon, H. Johannsson, J. Leonard, and J. McDonald. Kintinuous: Spatially extended KinectFusion. In RSS 2012 RGB-D: Advanced Reasoning with Depth Camera Workshop, 2012.
[64] J. Wilson, B. Walker, J. Lindsay, C. Cambias, and F. Dellaert. SWAN: System for wearable audio navigation. In Wearable Computers, 2007 11th IEEE International Symposium on, pages 91–98. IEEE, 2007.
[65] M. Yguel, O. Aycard, and C. Laugier. Update policy of dense maps: Efficient algorithms and sparse representation. In Field and Service Robotics, pages 23–33. Springer, 2008.
Abstract
This thesis presents a wearable RGBD camera based indoor navigation system for the visually impaired (VI) that guides a blind user from the current location to another location. The requirements for such a system are accurate real-time camera pose estimation, real-time dynamic path planning in the presence of obstacles, and an efficient interface with low cognitive load. In order to estimate the 6-DOF pose of the blind user in real time, we perform ego-motion estimation, known as visual odometry, using sparse visual features. However, sparse feature based approaches suffer from drift. Therefore, we refine the estimated poses using dense point clouds and the ground plane when a new keyframe is generated.

The system subsequently builds a local vicinity map based on dense 3D data obtained from an RGB-D camera, as well as a global 2D probabilistic occupancy grid map for traversability analysis. We have improved the system over the previous prototype by enabling the storage of the results of a mapping process. An efficient multi-level data structure is proposed to store, load, and manage data efficiently.

Next, the shortest path from point A to point B is generated based on this traversability analysis. The system analyzes the shortest path and generates a safe and efficient waypoint, which is an intermediate goal on the way to the destination. We developed an automatic waypoint generation algorithm that enhances the trajectory planning of the blind subjects. Pilot studies showed that the proposed waypoint generation algorithm provides trajectories closer to sighted subjects' trajectories than the shortest path algorithm does. A relative direction from the camera to the waypoint is calculated, and appropriate cues are generated to guide a blind user to the vicinity of the waypoint and eventually the destination.

The wearable system prototype is composed of a smart phone user interface, a head-mounted RGBD camera, a mapping algorithm that integrates multiple Simultaneous Localization And Mapping (SLAM) results at large scale, a real-time navigation algorithm, and a haptic feedback vest. The proposed system achieves real-time navigation performance at 28.6 Hz on average on a laptop, and helps the visually impaired extend the range of their activities and improve their orientation and mobility performance in a cluttered environment.

We have evaluated the performance of the proposed system with blindfolded and blind subjects. The mobility experiment results show that navigation in indoor environments with the proposed system successfully avoids collisions and provides more stable mobility performance for the user compared to conventional and state-of-the-art mobility aid devices.