EFFICIENT INDEXING AND QUERYING OF GEO-TAGGED MOBILE VIDEOS

by Ying Lu

A Dissertation Presented to the FACULTY OF THE USC GRADUATE SCHOOL, UNIVERSITY OF SOUTHERN CALIFORNIA, In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (COMPUTER SCIENCE)

May 2018. Copyright 2018 Ying Lu

Acknowledgements

Having now gone through the process, I would like to express my gratitude to those who helped me reach this point. I have enjoyed having the opportunities over the past years to share experiences, conversations and relationships with a number of extraordinary people along the way.

First and foremost I would like to express my sincerest gratitude to my supervisor, Prof. Cyrus Shahabi, for your persistent guidance and support throughout my PhD studies. I am greatly thankful that you guided me through every step of doing research and provided me the opportunities to work on various interesting projects such as MediaQ, iWatch and GIFT. Your passion and devotion to research have inspired me and I will strive to emulate them in my career. Your commitment to working on real problems and making real-world impact has shaped my philosophy of research. I am truly honored to have the opportunity to directly learn from you.

I would like to thank Dr. Seon Ho Kim, Dr. Farnoush Banaei-Kashani, Dr. Luciano Nocera, Dr. Ugur Demiryurek and Prof. Roger Zimmermann for your continuous guidance and collaborations. I am grateful to have explored many research projects with you.

I would like to thank my proposal and dissertation committee members Prof. Bhaskar Krishnamachari, Prof. Shahram Ghandeharizadeh, Prof. Craig Knoblock, Prof. Nora Ayanian, Prof. Xiang Ren and Prof. Evan Suma Rosenberg for your valuable comments, suggestions and guidance.

I was fortunate to share my PhD experiences with all the bright and talented students at USC InfoLab. Thank you for making my PhD journey full of excitement and happiness. Special thanks to Bei (Penny) Pan for your help on my research and life during my first year, to Hien To, Abdullah Alfarrarjeh and Giorgos Constantinou for all the collaborations, and to Dingxiong Deng, Yu (Rose) Qi, Mohammad Asghari and Yaguang Li for all your kind help. I wish all of you personal and professional successes in life.

I also would like to show my gratitude to my old friend Jianguo Wang. I am happy to have been your friend for over twelve years. Thank you for always giving me encouragement and help whenever I encountered difficulties during my PhD life.

Last but not least, I would like to express my deep appreciation to my parents, my two sisters and my brother for your kind understanding, unreserved support and unselfish love over the years. Thank you for always believing in me, supporting my decisions and sharing my good times and bad times.

Contents

Acknowledgements
List of Tables
List of Figures
Abstract
1 Introduction
  1.1 Motivation and Background
  1.2 Research Problems and Challenges
  1.3 Contributions
  1.4 Thesis Overview
2 Background
  2.1 Ground Mobile Videos
    2.1.1 Spatial Coverage Model
    2.1.2 Spatial Queries on Ground Videos
  2.2 Aerial Mobile Videos
    2.2.1 Spatial Coverage Model
    2.2.2 Spatial Queries on Aerial Videos
3 Related Work
  3.1 Content-based Video Management
  3.2 Keyword-based Video Management
  3.3 Geo-sensor Data based Video Management
    3.3.1 Geo-tagged Ground Video Indexing and Querying
    3.3.2 Geo-tagged Aerial Video Indexing and Querying
  3.4 Sensor Rich Video Systems and Applications
    3.4.1 Geo-tagged Video Systems
    3.4.2 Geo-tagged Video Applications
4 Ground Video Indexing and Querying
  4.1 Baseline Methods
    4.1.1 R-tree
    4.1.2 Grid-based Index
  4.2 OR-trees
    4.2.1 Orientation Augmented R-tree: OAR-tree
    4.2.2 Orientation Optimized R-tree: O2R-tree
    4.2.3 View Distance and Orientation Optimized R-tree: DO2R-tree
  4.3 Query Processing with OR-trees
    4.3.1 Range Queries
    4.3.2 Directional Queries
  4.4 Performance Analysis
  4.5 Experimental Studies
    4.5.1 Evaluation of Range Queries
    4.5.2 Evaluation of Directional Queries
    4.5.3 Impact of the Query Direction Interval
    4.5.4 Impact of Weight Parameters
    4.5.5 Evaluation Using the Skewed Generated Dataset
    4.5.6 Evaluation Using the Real World Dataset
  4.6 Chapter Summary
5 Aerial Video Indexing and Querying
  5.1 Baseline Methods
    5.1.1 R-trees
    5.1.2 OR-trees
  5.2 TetraR-tree
    5.2.1 TetraR-tree Index Structure
    5.2.2 TetraR-tree Optimization
  5.3 Query Processing with TetraR-trees
    5.3.1 Point Query
    5.3.2 Range Query
  5.4 Experimental Studies
    5.4.1 Experimental Methodology and Settings
    5.4.2 Performance Evaluation of Index Construction
    5.4.3 Evaluation for Point Queries
    5.4.4 Evaluation for Range Queries
    5.4.5 Evaluation of Search Strategies
    5.4.6 Evaluation on the Real World Dataset
  5.5 Chapter Summary
6 Mobile Video Management System and Applications
  6.1 MediaQ: Mobile Video Management System
    6.1.1 System Architecture
    6.1.2 Geo-tagged Mobile Video Data Collection and Accuracy Enhancement
    6.1.3 Video Data Management in MediaQ
  6.2 Applications of Geo-tagged Mobile Videos
    6.2.1 Event Coverage
    6.2.2 GIFT: Geospatial Image and Video Filtering Tool for Computer Vision Applications
    6.2.3 Janus: Intelligent Surveillance of Criminal Activity with Geo-tagged Mobile Videos
    6.2.4 Points of Interest Detection from Geo-tagged Mobile Videos
  6.3 Chapter Summary
7 Conclusions and Future Work
  7.1 Conclusion
  7.2 Future Work
Reference List

List of Tables

4.1 Datasets for the experiments
4.2 Synthetically Generated Dataset
4.3 Sizes of indexing structures (Megabytes)
5.1 Datasets
5.2 Synthetically Generated Gen Datasets
5.3 Performance of indexing structures on RW Dataset
5.4 Performance of indexing structures on Gen Dataset (1B)
6.1 Overview of MediaQ Dataset

List of Figures

1.1 Statistics from Cisco Systems, Inc., of the mobile data traffic in 2016 and the prediction by 2021 [1]
1.2 MediaQ [42] interface
1.3 An example of range queries on aerial-FOVs
2.1 Ground-FOV model
2.2 Aerial Field of View (Aerial-FOV) model
2.3 Aerial-FOVs with various azimuth, pitch and roll angles. The vector is the rotation angle vector <θ_a, θ_p, θ_r>. Dashed areas denote the "dead spaces".
4.1 Sample dataset of FOV objects
4.2 Dead spaces of an object and an index node of R-tree. Dashed areas denote the dead spaces.
4.3 Virtual dead spaces of OAR-tree nodes based on different optimizations. Dashed areas indicate virtual dead space.
4.4 Leaf nodes of O2R-tree for the example in Figure 4.1
4.5 Illustration of optimization criteria with and without considering view distance. Suppose the fanout is 2.
4.6 Overlap identification for a ground-FOV object
4.7 Illustration of MinA and MaxA
4.8 Page accesses of range queries
4.9 Query processing time of range queries
4.10 Impact of radius in range queries
4.11 Varying dataset size for directional queries
4.12 Varying β_o in O2R-tree
4.13 Varying β_d in DO2R-tree
4.14 Comparison using the skewed Gen dataset
4.15 Varying the query direction interval for directional queries
4.16 Comparison on the RW dataset
5.1 A TetraR-tree index node including two aerial-FOVs f1 and f2
5.2 Various optimization mechanisms during TetraR-tree construction
5.3 Even-odd rule algorithm for point query
5.4 Pruning strategy with outer convex hull for point query
5.5 Total hit strategy with inner convex hull for point query
5.6 Illustration of the outer convex hull computation
5.7 Inner convex hull: not constructed from non-dominated corner points
5.8 Illustration of the inner convex hull computation
5.9 Illustration of how to transform a range query into a point query
5.10 Point query on Gen Dataset
5.11 Range query on Gen Dataset
5.12 Varying the query radius for range queries on Gen Dataset
5.13 Evaluation of the search strategies
5.14 Evaluation on RW Dataset
6.1 Overall structure of the MediaQ framework with its sub-components
6.2 Architecture of the MediaQ mobile app
6.3 Cumulative distribution function of average error distances with the Kalman-filtering-based algorithm. The height of each point represents the total number of GPS sequence data files whose average distance to the ground-truth positions is less than the given distance value.
6.4 MediaQ video distribution
6.5 MediaQ video search interface (ground videos)
6.6 MediaQ range query interface (aerial videos)
6.7 Fine-granular mobile video search
6.8 Screenshot: NATO Summit 2012 experiments in Chicago utilizing a custom Wi-Fi setup with range extenders and mobile access points
6.9 Screenshot: Videos from the NATO Summit 2012 in Chicago. The starting position of every trajectory is marked with a pin on the map.
6.10 Screenshot: Illustration of the FOV of the current frame and the GPS uncertainty (the yellow circle around the camera position)
6.11 GIFT framework
6.12 Example images from the PBS experiments
6.13 Example panoramic image: SelectedFOV# = 17, Source Video# = 3, Selection and Stitching Time = 11.8 sec
6.14 Geospatial filtering for 3D model reconstruction
6.15 Illustration of the geo-based active key frame selection algorithm
6.16 An overview of persistent target tracking with GIFT
6.17 Spatial queries for persistent target tracking
6.18 Janus system architecture: Data, Analytics and Presentation tiers are shown in green, orange and blue. Color-coded arrows outline how raw and pre-processed data, incidents and events are routed through the modules of the system.
6.19 Example data that can be used for stalking forensic analysis: (a) tagged pedestrian trajectories, (b) mobile video generated by the public and (c) automatically extracted trajectories and high-resolution face images
6.20 POI detection based on the FOV model

Abstract

We are witnessing a significant growth in the number of smartphone users and advances in phone sensor technology. More recently, driven by advances in control engineering, material science and sensor technologies, drones are becoming significantly prevalent in daily life (e.g., event coverage, tourism). Consequently, an unprecedented number of both smartphone videos (i.e., ground videos) and drone videos (i.e., aerial videos) are generated and consumed by public users. In such a large repository, it is difficult to index and search the mobile videos in an unstructured form. Content-based and tag-annotation-based video management suffer from efficiency and scalability problems. However, due to the rich sensor instrumentation in smartphones and drones, both ground videos and aerial videos can be geo-tagged (e.g., GPS locations and compass directions) at acquisition time, providing an opportunity for efficient management of mobile videos by exploiting their corresponding spatial structures. Ideally, each ground (aerial) video frame can be tagged by the spatial extent of its coverage area, termed ground Field-Of-View (i.e., ground-FOV; aerial-FOV for aerial videos). This provides an opportunity for efficient management of mobile videos by exploiting their corresponding geo-metadata.

My thesis tackles the challenges of large-scale mobile video data management (ground videos and aerial videos are collectively called mobile videos) using spatial indexing and querying of their corresponding FOVs (ground-FOVs and aerial-FOVs are collectively called FOVs). Unlike traditional spatial objects (e.g., points and rectangles), ground-FOVs are shaped like slices of pie and contain both location and orientation information, and aerial-FOVs are shaped as irregular quadrilaterals. Therefore, conventional spatial indexes, such as the R-tree, cannot index them efficiently. Additionally, the distribution of user-generated mobile videos is non-uniform (e.g., more FOVs in popular locations). Consequently, even multilevel grid-based indexes have limitations in managing the skewed distribution.
Moreover, since user-generated mobile videos are usually captured in a casual way with diverse setups and movements, no a priori assumption can be made to condense them in an index structure.

To overcome these challenges, I propose a class of new index structures called OR-trees and a new index structure called TetraR-tree for efficiently indexing ground and aerial videos, respectively. The key idea of both index structures is to maximally harness the corresponding geographical properties of ground-FOVs and aerial-FOVs. OR-trees effectively harness the camera locations and orientations of ground-FOVs by taking both into consideration during index construction to optimize the combination of location closeness and orientation proximity. TetraR-tree effectively captures the geometric property of quadrilaterals by indexing their four corner points. Based on the proposed indexes, I present novel search strategies and algorithms for efficient spatial queries on ground and aerial videos. Experiments using both real-world and large synthetic datasets (over 30 years' worth of videos) demonstrate the scalability and efficiency of the proposed indexes and search algorithms.

Chapter 1  Introduction

1.1 Motivation and Background

Driven by the advances in video technologies and mobile devices (e.g., smartphones, tablets, Google Glass, drones), a large number of user-generated mobile videos are being produced and consumed. For example, people often use their smartphones to take videos to capture memorable subjects and situations at places (e.g., tourist attractions, concerts, and political rallies), and such videos are often uploaded to social media websites such as Facebook, Flickr, YouTube and Instagram. YouTube [2] has indicated that by the end of 2017, over 72 hours of mobile video were uploaded every minute and there were 1 billion video views on mobile devices per day on average. Facebook [3], another top player in the social media video landscape, reported over 8 billion average daily views and 100 million hours of video watched every day in 2017. According to a study by Cisco [1], overall mobile data traffic reached 5.9 exabytes per month at the end of 2016, 70% of which was mobile video data. It is forecast that mobile video traffic will reach 34.6 exabytes per month by 2021. The statistics of the mobile data traffic in 2016 and the forecast for the following five years are shown in Figure 1.1.

Figure 1.1: Statistics from Cisco Systems, Inc., of the mobile data traffic in 2016 and the prediction by 2021 [1]

Recently, with advances in control engineering, material science and sensor technologies, unmanned aerial vehicles (UAVs, a.k.a. drones) have gained significant commercial momentum. While drones have been used mainly in military activities and public safety for more than a century, personal drones are recently becoming popular as a new consumer mobile device category, primarily used for aerial photography and video. The number of drones is consequently increasing dramatically. According to the statistics report of Digital Marketing Ramblings (DMR) [4, 5], about 0.7 million drones were shipped in 2015 and the number is expected to increase to 7 million by 2020. The estimated value of the drone industry was $3.3 billion in 2015 and is forecast to rise to $127 billion by 2020.
In this thesis, videos recorded on the ground with mobile devices (e.g., smartphones, iPads, Google Glass or tablets) are called "ground videos", and videos recorded in the sky with mobile devices (e.g., drones or helicopters) are called "aerial videos". Both ground and aerial mobile videos play critical roles in daily life. With billions of daily ground video views [2, 3], it is obvious that ground videos provide an important source of information from which online users can learn about subjects, places, or events they are interested in. For example, one could search user-generated videos of the Los Angeles wildfire [6]; travelers could search videos to explore a place they are going to visit. Aerial videos also play an important role in daily life and are used in growing applications such as event coverage, tourism, traffic monitoring in transportation, rescue, agriculture and disaster response. For example, drone videos can provide a spectacular view capturing events such as sporting events, concerts, rodeos, weddings, and other outdoor celebrations; resorts can show breathtaking scenic vistas of tourist attractions and landscapes.

In the presence of a huge amount of user-generated mobile videos, it is challenging to effectively and efficiently organize and search mobile videos in an unstructured form. Content-based video organization and retrieval have been extensively studied in the past decades in the multimedia community. Video content can be represented by various features including color (e.g., color histogram [70]), texture (e.g., Gabor [70]), local interest points (e.g., SIFT [56]), and bag-of-words descriptors [80]. Typical content-based video retrieval systems can be roughly divided into two major components: 1) a module for extracting representative visual features from video frames, and 2) an appropriate similarity model to find similar video frames in a high-dimensional visual feature database. While content-based techniques are helpful for understanding videos' contents and semantics, they are time consuming and suffer from efficiency and scalability problems for large-scale video collections.

Another popular group of video search approaches utilizes textual annotations, commonly known as tags (or keywords). Tags can be anything that is related to the content of the video, for example location names, people or animal names, event or action descriptions, etc. However, generating a set of meaningful tags to describe the video content is expensive, laborious and often ambiguous. Consequently, textual-annotation-based video searching also struggles to achieve satisfactory results with high accuracy and scalability.

However, mobile devices nowadays (e.g., smartphones and drones) are embedded with rich sensor instrumentation (e.g., GPS, compass, accelerometers, and gyroscope units). Mobile videos can be automatically geo-tagged (e.g., with GPS locations and camera shooting directions) at acquisition time. This provides an opportunity for efficient management of mobile videos by exploiting their corresponding geo-metadata. Associating geographic sensor information with video search has become an active research topic [68, 88, 73, 35, 22, 14, 11]. Many social media platforms also have increasing interest in GIS applications.
For example, Google Maps (https://www.google.com/maps/), YouTube (https://www.youtube.com/), Facebook Places (https://www.facebook.com/places/), Flickr (http://www.flickr.com) and Panoramio (http://www.panoramio.com) allow users to upload images or videos with attached geo-locations of the cameras and support nearby search applications. However, most of the existing studies (e.g., [68, 88, 73, 35]) and the web services organize and search videos or images based on their camera locations. For example, YouTube represents an entire video with a single point for nearby video search. The results are inherently imprecise, as the camera location is insufficient to represent the coverage of a photo or video: 1) the camera and the subject in the video are usually at different locations and often far apart (e.g., pictures of the Statue of Liberty are usually taken at a considerable distance from it) [29, 93]; 2) camera operators often move during video recording, so a single camera location is inadequate for a video with a trajectory.

To overcome the above challenges, we leverage the geo-sensor data to represent the visible scene of each individual video frame by the spatial extent of its coverage area at a fine granularity. For ground videos, a typical spatial model is the Field-Of-View (ground-FOV) model [14], which represents a video frame as a geometric figure, e.g., using the camera location, viewing direction and visible distance. The FOV model has been proven to be very useful for various media applications, as demonstrated by the online mobile media management systems MediaQ [42] and GeoVid [7]. Figure 1.2 shows the ground-FOV (blue pie slice) of the video frame currently being displayed in MediaQ. MediaQ (http://mediaq.usc.edu/) is an online media management system to collect, organize, share, and search mobile videos using automatically tagged geospatial metadata. For aerial videos, each aerial video frame can be modeled as the spatial extent of its coverage area, termed aerial Field-Of-View (aerial-FOV), using the camera location, drone rotation angles (i.e., azimuth, pitch and roll) and camera viewable angle [95, 74, 45]. An aerial-FOV, the spatial coverage on the ground of an aerial video frame, is shaped as a quadrilateral, as shown in Figure 1.3. As most applications are typically interested in the aerial video coverage area on the ground [45, 53, 95], in my thesis I focus on the spatial coverage on the ground and represent an aerial-FOV as a quadrilateral. Compared to the content-based and tag-annotation-based approaches, the structured geo-metadata occupies much less space and can be organized efficiently, which makes searching among large-scale video collections practical. Further, instead of geo-tagging based on camera locations only, we search videos based on their spatial coverage at the video frame level, which describes the visible scenes of videos much more precisely.

1.2 Research Problems and Challenges

Figure 1.2: MediaQ [42] interface: (a) range query; (b) directional query
Figure 1.3: An example of range queries on aerial-FOVs

Based on the rich geospatial metadata and the fine-grained spatial coverage models (i.e., ground-FOVs and aerial-FOVs), we can support rich spatial queries on both ground and aerial mobile videos: point queries, range queries and directional queries. A point query finds all the FOVs that cover a user-specified query point (e.g., a person, a statue).
A range query finds all the FOVs that overlap with a circular query range (e.g., a college campus, a park). Figure 1.2a illustrates a circular range query in MediaQ [42], searching for videos covering an area at the University of Southern California, where the markers on the map show the locations of the video results of the range query. Figure 1.3 illustrates a range query that searches for aerial-FOVs overlapping with the Los Angeles Memorial Coliseum (red circle). Besides point and range queries, we can also support directional queries on mobile videos [64, 63]. A directional query finds all the FOVs whose orientations overlap with the user-specified direction within a range. Figure 1.2b shows the results of a directional query when the input direction is north. Note that "direction" discussed here is an inherent attribute of a spatial object (i.e., an FOV). This is different from how direction has been treated in the past in the spatial database field, where direction is only a component of a query. For example, the goal of an object-based directional query in [55] is to find objects that satisfy the specified direction of the query object (e.g., "finding restaurants to the north of an apartment"), while the goal of a directional query in this study is to find all the objects pointing towards the given direction. To distinguish these two characteristics, we will use the term "orientation" when referring to the direction attribute of FOV objects and "direction" when we refer to the query component.

Such spatial queries are the main building blocks of many applications in various domains such as urban planning, criminal investigation, transportation, aerial surveillance, and tourism. For example, to investigate the disappearance of Yingying Zhang at UIUC (http://police.illinois.edu/search-updates/), the FBI needed to find all the videos that cover the bus station that Yingying Zhang visited (point query) and all the videos that cover the areas around the bus station (range query). Similar queries were used to search crowdsourced videos of the Boston Marathon bombing or to search dash camera videos of a car accident. Additionally, not every view from any direction or distance to a target is distinctive or attractive to users [92]. Hence directional queries, which search for videos captured from a specific direction, are helpful in many applications such as panorama generation [43], 3D model reconstruction [89], event reviews and video summaries [76].

To answer spatial queries on large-scale mobile videos efficiently, we need to index the FOVs, and this is the focus of my thesis. However, indexing FOVs poses challenges to existing spatial indexes due to their inability to incorporate the orientation property of FOVs and due to the way user-generated videos are collected in the real world. Let us elaborate each case below. First of all, unlike traditional spatial objects (e.g., points, rectangles), FOVs are spatial objects with both locations and orientations. Ground-FOVs are pie-shaped, including camera locations and viewing directions, while aerial-FOVs are shaped as quadrilaterals. Further, as the drone camera shooting directions change, the shape of an aerial-FOV changes as well and can be any irregular quadrilateral (to be discussed in Section 2.2). Because of this, many existing indexes cannot efficiently support this type of data. For example, one straightforward approach to index FOVs with a typical spatial index such as the R-tree [27] is to enclose the area of each FOV with its Minimum Bounding Rectangle (MBR).
In this case, the R-tree suffers from unnecessarily large MBRs and consequently large "dead spaces" (i.e., empty area that is covered by an index node but does not overlap with any objects in the node). Also, it can perform neither orientation filtering nor orientation optimization, resulting in many unnecessary index node accesses. Another recent work on indexing FOVs is a three-level grid-based index (Grid) [64, 63]. Grid stores the location and direction information at different grid levels, which is not efficient for video querying since video queries (e.g., range queries) involve both the location and the orientation information of FOVs at the same time. Second, in real life, the FOVs of user-generated videos are recorded in a casual way with various shooting directions, moving directions, moving speeds, zoom levels and camera lenses. There is no pattern that can be used to condense the index storage and then optimize the index. For example, GeoTree [44] focuses on dashcam videos recorded by cars driving on road networks, assuming that camera shooting and moving directions do not change frequently. This type of index, however, is not suitable for indexing user-generated videos with frequent changes in shooting and moving directions and zoom levels. Grid suffers from efficiency problems when indexing FOVs with different zoom levels and camera lens properties. In addition, FOVs are not uniformly distributed in the real world. Certain areas produce a significantly larger number of FOVs due to the high frequency and long duration of uploaded videos for those locations. Trivially, Grid performs poorly for non-uniformly distributed FOVs since the occupancy of grid files rises quickly for skewed distributions.

1.3 Contributions

To overcome the drawbacks of the existing approaches, in my thesis I propose a class of new index structures to enable rich and efficient spatial queries on geo-tagged, user-generated ground and aerial mobile videos, respectively, by incorporating the compass orientation and GPS location information into the indexes. Specifically, the contributions of my thesis are listed below:

• For ground mobile videos, I propose a class of new index structures, called OR-trees [60, 61], building on the premises of the R-tree. Specifically, our first, straightforward variation of OR-trees uses an R-tree to index only the camera locations of FOVs as points and then augments the index nodes to store their orientations. This variation of OR-tree is expected to generate smaller MBRs and reduce their dead spaces while supporting orientation filtering. To enhance it further, we devise a second variation by adding an optimization technique that uses orientation information during node split and merge operations. Finally, in our third and last variation, we take the FOVs' viewable distances into consideration during both the filtering and the optimization process. OR-trees harness the camera location and orientation information of FOVs for both filtering and index optimization without any a priori assumption on the user-generated videos.

• Based on OR-trees, novel search strategies and algorithms are presented for efficiently answering range and directional queries on ground videos. For example, I devise a new method to identify an index node of an OR-tree as a "total hit" (i.e., all the child nodes are in the result set) without accessing its child nodes, which results in a significant reduction in processing cost.
To decide whether an OR-tree node needs to be accessed, tight approximations are computed from the aggregated information stored in the node.

• For aerial mobile videos, this is the first work in the literature on aerial video indexing and query processing that exploits their spatial coverages, i.e., quadrilateral-shaped aerial-FOVs. I propose a new index structure, called TetraR-tree [59]. Different from the R-tree, which stores the MBR of aerial-FOVs, at each index node of a TetraR-tree we store four MBRs (tetra-corner-MBRs), each of which covers one of the four corner points of all the aerial-FOVs (i.e., quadrilaterals) in that index node. I also propose a heuristic for the index construction of a TetraR-tree that tries to minimize the alignment waste of the four corners of all the quadrilaterals in an index node, which is critical to improving the performance of index construction without losing much efficiency during query processing.

• I propose two novel search strategies based on the TetraR-tree for both point and range queries on aerial videos. Based on the geometric properties of the tetra-corner-MBRs in a TetraR-tree index node, we can compute two convex hulls: 1) the smallest convex hull (called the outer convex hull) that encloses all the aerial-FOVs in the node, and 2) the largest convex hull (called the inner convex hull) that all the aerial-FOVs in the node cover. Subsequently, based on the outer and inner convex hulls, we propose two new search strategies for both point and range queries: a pruning strategy (to effectively decide whether a node can be pruned) and a total hit strategy (to effectively decide whether all the objects in a node are in the results).

• Extensive experiments using real-world datasets and large synthetically generated datasets (more than 30 years' worth of videos) demonstrate the superiority of OR-trees and the TetraR-tree. Experimental results show that the first variation of OR-tree, which simply augments the nodes with orientations, did not produce any better results than the baseline indexes. However, when utilizing orientation and view distance information for the merge and split operations in the second and third variations, the index performance in supporting range and directional queries improved significantly, by at least 50%. This implies that a naive addition of extra orientation information when augmenting the R-tree does not necessarily enhance indexing performance; rather, our more sophisticated optimization techniques, which consider orientation and view distance when augmenting the R-tree, are critical to the enhancement of index performance. For aerial video indexing, empirical results demonstrate the superiority of the proposed TetraR-tree with the proposed search algorithms over the baselines (e.g., R-tree) for both point and range queries, by at least 70% in terms of query time and I/O cost. Note that on the large-scale generated dataset, both OR-trees and TetraR-trees can respond to spatial queries on (ground and aerial) mobile videos within a reasonable time (1–2 seconds).

• I introduce several of our research projects (MediaQ [42], GIFT [18, 19] and Janus [77]) to demonstrate how geographic sensor data can be used to organize and search mobile videos, and I discuss real-world applications of geo-tagged mobile videos. For example, geo-metadata associated with mobile videos can be used for key video frame selection with spatial filtering in various computer vision applications (e.g., panorama generation [43], 3D model construction [89], and target tracking [18, 19]).
For example, geo-metadata associated with mobile videos could be used for key video frame selection with spatial filtering for various computer vision applications (e.g., panorama generation [43], 3D model construction [89], 12 target tracking [18, 19]). Based on the FOVs, geo-tagged mobile videos can be used for points of interest detection [8] or criminal event detection [77]. In these applications, geo-tagged video indexing and querying are critical to finding the candidate video frames. 1.4 Thesis Overview The structure of the thesis is organized as follows: Chapter 2 introduces the field of viewspatialmodelsforbothgeo-taggedgroundandaerialvideosandformallydefinesthe videoqueries. Chapter3reviewsrelatedworktoprovidethecontextforthecontributions of the thesis. Chapter 4 presents the new index structures and search algorithms for geo- tagged ground mobile videos. Chapter 5 presents the proposed indexing and querying techniques for geo-tagged aerial mobile videos. Chapter 6 describes the proposed mobile video management system and discusses its applications. Chapter 7 summarizes the thesis and presents the possible directions for future work. 13 Chapter 2 Background Let me first introduce the spatial coverage models for ground and aerial mobile videos and define their spatial queries. 2.1 Ground Mobile Videos 2.1.1 Spatial Coverage Model In my thesis, ground videos are represented as a sequence of video frames, and each video frame is modeled as a Field Of View (ground-FOV) [14] as shown in Fig- ure 2.1. A ground-FOV f is denoted as (p, R, − → Θ), in which, p is the camera location <lantitude,longtitude>, R is the visible distance, − → Θ is the orientation of the ground- FOV in form of a tuple < − → θ b − → θ e >, where, in a clockwise direction, − → θ b and − → θ e are the beginning and ending view direction (a ray), respectively. We store − → Θ as a tuple of two numbers: Θ<θ b ,θ e >, where, θ b = − → N − → θ b (respectively θ e = − → N − → θ e ), is the angle from the north − → N to the beginning (respectively ending) view direction − → θ b (respectively − → θ e ) in a clockwise direction. During video recording using sensor-rich camera devices, for each video frame f, we can capture the camera view direction θ of the the video frame f with respect to the north from the compass sensor automatically. Further, according to the camera lens properties and zoom level, we can calculate the visible distance R and viewable angleα of the video framef [14]. Specifically, the visible distanceR is obtained in Eqn(2.1). R = fh y (2.1) 14 where f is the lens focal length, y is the image sensor height and h is the height of the target object that will fully captured within the video frame. The target object is assumed to be a two story, approximately 8.5 meter-tall building (i.e., h = 8.5m). It is claimed in the study [14] that assuming good lighting conditions and no obstructions, the building can be considered visible within a captured frame if it occupies at least 1/20th of the full image height. And the viewable angleα can be computed through the following formula Eqn(2.2) [31]. alpha = 2 tan −1 y 2f (2.2) Consequently, we can derive the starting and ending view directions: θ b = (θ− α 2 + 360)mod360, and θ e = (θ + α 2 + 360)mod360 (θ is the center direction between θ b and θ e ). ground-FOVs in two dimensions are in sector shaped, which consider camera azimuths. 3-dimensional ground-FOVs are in pyramid shaped, which consider other two camera rotation types: pitch and roll. 
We mainly focus on ground videos that are taken outdoors in an open area, assuming that there are no obstructions within ground-FOVs, whose spatial coverage on the ground is shaped as a sector. Additionally, as our proposed indexes and search algorithms for 2-dimensional ground-FOVs can be extended to 3-dimensional ground-FOVs, we focus on 2-dimensional ground-FOVs in this thesis.

Let V be a geo-tagged ground video dataset. For a video v_i ∈ V, let F_{v_i} be the set of ground-FOVs of v_i. Hence, the video database V can be represented as a ground-FOV database F = { F_{v_i} | ∀ v_i ∈ V }.

Figure 2.1: Ground-FOV model

2.1.2 Spatial Queries on Ground Videos

As we represent a video database V as a ground-FOV spatial database F, the problem of geo-video search is transformed into spatial queries on the FOV database F. Given a query circle Q_r(q, r) with center point q and radius r, a range query finds the ground-FOVs that overlap with Q_r. It is formally defined as:

RangeQ(Q_r, F) ⟺ { f ∈ F | f ∩ Q_r ≠ ∅ }    (2.3)

Given a query direction interval Q_d(θ_b, θ_e) and a query circle Q_r(q, r), a directional query finds the ground-FOVs whose orientations overlap with Q_d within the range Q_r, and is formally defined as:

DirectionalQ(Q_d, Q_r, F) ⟺ { f ∈ F | f.Θ ∩ Q_d ≠ ∅ and f ∩ Q_r ≠ ∅ }    (2.4)

where f.Θ denotes the orientation of ground-FOV f.

2.2 Aerial Mobile Videos

These days, typical drones (e.g., Canada Drones [9]) almost always include rich sensors such as cameras, GPS, 3-axis accelerometers and 3-axis gyroscopes. Thus the geographic sensor metadata (e.g., camera locations and viewing directions) of drones can also be automatically collected during video recording.

2.2.1 Spatial Coverage Model

Similarly, aerial videos can be represented as a sequence of video frames, and the field of view of each aerial video frame [95, 74, 45], referred to as an "aerial-FOV", is illustrated in Figure 2.2. An aerial-FOV f is modeled as a 7-tuple Γ(lat, lng, hgt, θ_a, θ_p, θ_r, α), in which <lat, lng, hgt> is the camera location (latitude, longitude and height with respect to the ground), θ_a is the azimuth (yaw) angle rotating around the vertical axis (i.e., the angle from north to the azimuth direction), θ_p is the pitch angle rotating around the lateral axis (i.e., the angle from the direction toward the earth to the pitch direction), θ_r is the roll angle rotating around the longitudinal axis (i.e., the angle from the plane parallel to the ground to the roll direction), and α is the drone camera visible angle.

Figure 2.2: Aerial Field of View (Aerial-FOV) model

As shown in Figure 2.2, the spatial coverage on the ground of a drone video frame is shaped as a quadrilateral P, which can be represented by its four vertices (or corner points) {A, B, C, D}. As most applications are typically interested in the video coverage area on the ground [45, 53, 95], in this thesis I focus on the spatial coverage on the ground of aerial videos and represent an aerial-FOV as a quadrilateral.

Figure 2.3: Aerial-FOVs with various azimuth, pitch and roll angles, given as the rotation angle vector <θ_a, θ_p, θ_r>: (a) square <0°, 0°, 0°>; (b) trapezoidal <0°, 20°, 0°>; (c) kite <0°, 20°, 20°>; (d) irregular quadrilateral <0°, 50°, 10°>; (e) irregular quadrilateral <30°, 50°, 10°>. Dashed areas denote the "dead spaces".

The quadrilateral of an aerial-FOV f(A, B, C, D) can be derived from the 7-tuple Γ(lat, lng, hgt, θ_a, θ_p, θ_r, α) [53], as sketched below.
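As an illustration of how such a quadrilateral can be obtained, the sketch below rotates the four corner view rays of the camera by the azimuth, pitch and roll angles and intersects them with the ground plane. The rotation order and axis conventions are my own assumptions for illustration (the thesis relies on the derivation in [53]), and the conversion of the resulting metric offsets into latitude/longitude is omitted.

```python
import numpy as np

def aerial_fov_quadrilateral(hgt, theta_a, theta_p, theta_r, alpha):
    """Sketch: project the four corner view rays of an aerial-FOV onto the
    ground plane. Angles are in degrees; returns four (east, north) offsets in
    metres from the point directly below the drone."""
    a, p, r = np.radians([theta_a, theta_p, theta_r])
    t = np.tan(np.radians(alpha) / 2)

    # Corner ray directions in the camera frame (camera looks straight down
    # at the ground when all three rotation angles are zero).
    corners_cam = np.array([[-t, -t, -1], [t, -t, -1], [t, t, -1], [-t, t, -1]], float)

    def rot_x(q): return np.array([[1, 0, 0], [0, np.cos(q), -np.sin(q)], [0, np.sin(q), np.cos(q)]])
    def rot_y(q): return np.array([[np.cos(q), 0, np.sin(q)], [0, 1, 0], [-np.sin(q), 0, np.cos(q)]])
    def rot_z(q): return np.array([[np.cos(q), -np.sin(q), 0], [np.sin(q), np.cos(q), 0], [0, 0, 1]])

    # Assumed composition: azimuth (vertical axis, clockwise from north, hence
    # the negative sign), then pitch (lateral axis), then roll (longitudinal axis).
    R = rot_z(-a) @ rot_x(p) @ rot_y(r)

    quad = []
    for d in corners_cam:
        d = R @ d
        if d[2] >= 0:                      # corner ray points above the horizon
            raise ValueError("corner ray does not intersect the ground")
        s = hgt / -d[2]                    # ray-ground intersection scale factor
        quad.append((s * d[0], s * d[1]))  # (east, north) offset in metres
    return quad
```

With all rotation angles set to zero this yields the square coverage of Figure 2.3(a); increasing the pitch and roll angles produces the trapezoidal, kite-shaped and irregular quadrilaterals of Figures 2.3(b)–(e).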
As the pitch and roll angles change, the shape of an aerial-FOV changes and can be any irregular quadrilateral, as shown in Figure 2.3.

2.2.2 Spatial Queries on Aerial Videos

For aerial-FOVs, we introduce two types of spatial queries: point and range queries. As illustrated in Figure 2.2, 1) a point query Q_p finds the aerial-FOVs that cover the query point; and 2) a range query Q_r finds the aerial-FOVs that overlap with the query circle.

As we represent an aerial video database V as an aerial-FOV database F, the problem of aerial video search is transformed into spatial queries on the aerial-FOV database F. Given a query point Q_p(q), a point query finds the aerial-FOVs that cover the query point Q_p. It is formally defined as:

PointQ(Q_p, F) ⟺ { f ∈ F | Q_p ∈ f }    (2.5)

Given a query circle Q_r(q, r) with center point q and radius r, a range query finds the aerial-FOVs that overlap with Q_r. It is formally defined as:

RangeQ(Q_r, F) ⟺ { f ∈ F | f ∩ Q_r ≠ ∅ }    (2.6)

Chapter 3  Related Work

I divide the related work on video data management into three groups: content-based video management, keyword-based video management, and geo-tagged video management. Next, I will briefly introduce the existing studies on content-based and keyword-based video management and focus on the related work on geo-tagged mobile video management, as it is most closely related to my thesis. Finally, I will review the related work on geo-tagged video systems and applications.

3.1 Content-based Video Management

Content-based video organization and retrieval have been extensively studied in the past decades in the multimedia community. Video content can be represented by various features including color (e.g., color histogram [70]), texture (e.g., Gabor [70]), local interest points (e.g., SIFT [56]), and bag-of-words descriptors [80]. Typical content-based video retrieval systems can be roughly divided into two major components: 1) a module for extracting representative visual features from video frames, and 2) an appropriate similarity model to find similar video frames in a high-dimensional visual feature database. Thus content-based video indexing includes high-dimensional indexing and semantic indexing. High-dimensional indexing extracts features from the video segment and uses these features for similarity comparison, while semantic indexing builds the connection between the visual content and the semantics through video data mining, classification and annotation [33]. The video clustering and indexing methods used are also based on the visual content. Ngo et al. [69] proposed a two-level hierarchical clustering structure to organize the content of sport videos: the top level is clustered by color features while the bottom level is clustered by motion features. Video indexing approaches typically use the following key image processing algorithms: camera motion estimation and compensation, object segmentation and tracking, and line detection. Akrami and Zargari [10] proposed a compressed-domain video indexing method based on the position of the blocks used for motion compensation in the coded video.

Summary: While content-based techniques are helpful for understanding videos' content and semantics, they are time consuming and suffer from efficiency and scalability problems for large-scale video collections.

3.2 Keyword-based Video Management

Commercial video search engines like Google Video and YouTube use text annotations and captions for video search.
Repositories containing large numbers of videos are searched using keywords extracted from captions, text annotations and thumbnails. Keywords can also be summarized from video contents by image processing techniques [81, 93]. However, the keywords are neither effectively linked to the video content nor sufficient to make an effective video search engine. Moreover, the process of building such repositories is user dependent and relies on the textual content provided by the user while uploading the video. Consequently, tag-based video searching also struggles to achieve satisfactory results with high accuracy and scalability.

3.3 Geo-sensor Data based Video Management

Mobile devices nowadays (e.g., smartphones and drones) are embedded with rich sensor instrumentation (e.g., GPS, compass, accelerometers, and gyroscope units). Mobile videos can be automatically geo-tagged (e.g., with GPS locations and camera shooting directions) at acquisition time. Associating geographic information with videos and images helps with their management.

3.3.1 Geo-tagged Ground Video Indexing and Querying

I first review the existing methodologies for indexing and query processing of geo-tagged ground videos.

Geo-referenced video management has become an active topic [68, 16, 44, 88, 91, 63, 28]. For example, Kim et al. [41] proposed a vector-based approximation model to efficiently index and search videos based on the ground-FOV model. Flora et al. [25] and Lewis et al. [48] discussed how to use existing spatial databases (e.g., Informix, PostGIS) to answer range queries efficiently by using polygons to represent the coverage of video segments and carefully designing the DB schema. However, the study [25] focused on range queries only. Furthermore, the polygon representation of video frames and the query processing algorithms do not consider the orientation information of videos, which would result in poor performance for directional queries. Navarrete et al. [68] utilized the R-tree [27] to index the camera locations of videos. Toyama et al. [88] used grid files to index the camera location and time information. These two studies [68, 88] treated videos or video frames as points. The studies most closely related to this thesis [16, 44, 63] focused on indexing and query processing of geo-videos represented as ground-FOV objects. Ay et al. [16] indexed ground-FOV objects with an R-tree, and Ma et al. [63] proposed a grid-based index for ground-FOV objects. However, neither of them is efficient; their drawbacks are discussed in Section 4.1.2. Kim et al. [44, 46] presented an R-tree-based index called GeoTree for ground-FOV objects. The difference between GeoTree and the R-tree is that GeoTree stores Minimum Bounding Tilted Rectangles (MBTRs) in the leaf nodes. An MBTR is a long tilted rectangle parallel to the moving direction of the ground-FOV stream enclosed in the MBTR. GeoTree focuses on dashcam videos recorded by cars driving on road networks, assuming that camera shooting and moving directions do not change frequently. However, in real life the moving directions of mobile videos change frequently, e.g., with Google Glass or a mobile phone. Further, GeoTree considers the moving direction of FOV objects instead of the orientations of ground-FOV objects. Furthermore, it does not store the orientation information in the index nodes for filtering. Therefore, the existing work is neither efficient nor effective for indexing and querying geo-videos with both location and orientation information.
To this end, my thesis focuses on indexing such ground-FOV objects with both location and orientation information. Indexing the temporal information of videos is orthogonal to the problem studied in this work.

Since ground-FOV objects are spatial objects with orientations, studies on directions are related to our work. The exploration of directional relationships between spatial objects has been widely researched [85, 75], including absolute directions (e.g., north, southwest) and relative directions (e.g., left, in_front). Some work has mainly studied direction-aware spatial queries [55, 47, 50]. For example, Liu et al. [55] discussed directional queries in spatial databases, e.g., "finding restaurants in the heading direction (or on the left side) when driving on the highway". Lee et al. [47] presented nearest surrounder queries, which search for the nearest surrounding objects around a query object. Other studies focused on the moving directions of moving objects [83, 44]. For example, Tao et al. [83] proposed an index called the TPR*-tree to index moving objects for predictive spatial queries. In my thesis, the directions of ground-FOV objects are inherent attributes of the objects rather than of the queries, and hence differ from the directions discussed in the above studies.

3.3.2 Geo-tagged Aerial Video Indexing and Querying

I next review the related work on aerial videos and their indexing.

UAVs have been widely studied in various academic communities, such as computer vision [45, 38] and robotics [13, 30]. In computer vision, researchers mainly focus on object detection and tracking from aerial videos/images for surveillance or traffic monitoring [45, 38]; for example, vehicles have been detected and tracked from aerial images/videos by incorporating vehicle motion behavior and road networks [90]. Kumar et al. [45] outlined an integrated systems approach for aerial video surveillance. UAV-based systems for traffic monitoring are well surveyed by Kanistras et al. [38]. In robotics, most studies focus on target tracking with UAV robots [30] or on coverage path planning for UAV robots [13], i.e., planning a path that passes over all points of an area of interest while avoiding obstacles. For example, Theodorakopoulos et al. [84] introduced a flight control strategy to achieve static ground target tracking with a fixed camera mounted on a UAV, aiming at maximizing target visibility. Hausman et al. [30] track moving targets with multiple quadrotor robots. However, these studies focus on various applications of UAV videos or robots rather than on indexing and querying of aerial videos.

Most studies on video indexing and querying based on the geographic information of videos or images focus on ground videos. There are few studies on aerial video indexing based on the spatial viewable coverages of videos. The most related one was proposed by Morse et al. [66], which indexed UAV videos by precomputing a "coverage quality map". The coverage quality map is generated by georegistering all the aerial videos to a terrain map. Specifically, each National Elevation Dataset (NED) point in a target area is associated with a set of video frames that cover the point and their corresponding quality values. The quality value of a video frame with respect to an NED point indicates how visible the point is in the video frame (i.e., considering the viewing distance, viewing angle, resolution, etc.). The map is then used to index and search the aerial videos by indexing the NED points.
Clearly, this precomputing-based approach requires large space and suffers from scalability problems for a large number of videos and a large target area (e.g., a city), and it is in fact orthogonal to our work. To the best of our knowledge, this is the first work on aerial video indexing and querying that represents their viewable coverages as quadrilateral-shaped aerial-FOVs.

3.4 Sensor Rich Video Systems and Applications

Associating geo-location and camera orientation information with video retrieval has various applications and thus has been of wide interest to the research and industry communities of multimedia and computer vision.

3.4.1 Geo-tagged Video Systems

A few systems associate videos with their corresponding geo-locations. Hwang et al. [34] and Kim et al. [40] proposed a mapping between the 3D world and videos by linking objects to the video frames in which they appear. However, their work neglected to provide any details on how to use the camera location and direction to build links between video frames and world objects. Liu et al. [54] presented a sensor-enhanced video annotation system (referred to as SEVA) which enables searching videos for the appearance of particular objects. SEVA serves as a good example of how a sensor-rich, controlled environment can support interesting applications. However, it did not propose a broadly applicable approach to geo-spatially annotate videos for effective video search. The studies [14, 16] have extensively investigated these issues and proposed the use of videos' geographical properties (such as camera location and direction) to enable an effective search for specific videos in large video collections. This resulted in the development of the GeoVid framework based on the concept of georeferenced video. GeoVid introduced a viewable scene model that is utilized to describe video content. With mobile apps and a web portal (located at http://geovid.org), it demonstrated how this model enhances video management, search and browsing performance.

However, the systems mentioned above were limited in that they presented ideas on how to search and manage video collections based only on the "where" information (i.e., geo-information such as locations and directions) of videos. In addition to the "where" information, MediaQ [42], to be introduced in my thesis, also considers the "when", "what" and "who" information of video contents. Furthermore, MediaQ provides crowdsourcing services by exploiting the idea of spatial crowdsourcing termed GeoCrowd [?] to collect on-demand media content on behalf of users. Moreover, MediaQ can provide social network services for sharing and following media content in which users are interested.

3.4.2 Geo-tagged Video Applications

Geo-referenced videos have various applications in multimedia, computer vision, transportation, event coverage, tourism, etc. Some studies [14, 41, 72, 28, 34, 40] focused on geo-video modeling and representation by extracting geospatial metadata and associating it with images/videos. For example, Ay et al. [14] modeled the visible scenes of ground mobile videos as a set of ground-FOVs. Hwang et al. [34] and Kim et al. [40] proposed a mapping between the 3D world and videos by linking objects to the video frames in which they appear. Their work used GPS location and camera orientation to build links between video frames and world objects.
It has been observed that the geographical properties of images/videos provide context information for humans to better understand their visual contents, such as a panoramic view mapped onto a specific location (e.g., Google Street View). Further, the associated geo-metadata can facilitate the selection of the most relevant video segments or images for downstream computer vision applications. For example, Zhu et al. [96, 43] generated panoramas from videos based on the geo-metadata of the videos. Wang et al. [65, 89] exploited the geo-information of videos to select key frames for 3D model reconstruction. Shen et al. [78] studied automatic textual tag annotation for geo-tagged videos.

Additionally, geo-tagged photos/videos can be utilized for trip recommendation or trip planning. For example, Lin et al. [17] mined the FOV metadata of photos to extract the frequent travel patterns of users and returned high-quality personalized itineraries. Lu et al. [57, 58] exploited geo-metadata for trip planning on large road networks, and Crandall et al. [21] used geospatial metadata for predicting locations from photos. Moreover, the attached geo-metadata can be exploited to detect points of interest (or hotspots, popular places or events). For example, Liu et al. [? ] discovered hot topics based on the spatio-temporal distributions of geo-tagged videos from YouTube, taking into account the camera locations only. The recent studies [? ? 87, 8] detected POIs from geo-tagged mobile videos based on ground-FOVs.

Chapter 4
Ground Video Indexing and Querying

In this chapter, we focus on the indexing and querying of geo-tagged ground videos. As introduced in Section 2.1, the spatial coverage of each ground video frame is a ground-FOV, in a pie shape (see Figure 2.1). Unlike traditional spatial objects (e.g., points, rectangles), ground-FOVs are spatial objects with both location and orientation information, which poses challenges to existing spatial indexes.

In this chapter, I will discuss the baseline methods for indexing ground-FOVs in Section 4.1. To overcome the drawbacks of the baselines, in Section 4.2, I will propose a class of new index structures, called OR-trees, incorporating the camera locations, orientations and viewable distances of ground-FOVs. Based on OR-trees, two novel search strategies are presented in Section 4.3 for both range and directional queries. Finally, in Section 4.5, we will conduct extensive experiments to demonstrate the superiority of OR-trees with our proposed search algorithms over the baselines.

4.1 Baseline methods

4.1.1 R-tree

One baseline for indexing FOVs is the R-tree [27], which is one of the basic and widely used index structures for spatial objects. To index FOVs efficiently using an R-tree, we index FOVs based on the MBRs of their visible scenes. Consider the example in Figure 4.1, in which f_1, ..., f_8 are FOVs. Since the R-tree is built by minimizing the area of the MBRs of the FOVs, the MBRs of the leaf nodes are the dashed rectangles (assuming a fanout of 2). For clarity, we did not plot the MBRs of the internal nodes.

Range and directional queries based on R-tree. For the range query Q_r in Figure 4.1, we need to access all the nodes (R_1 through R_4) since all of their MBRs overlap Q_r. However, only two FOVs, f_1 and f_2, are results. For the directional query with the query direction interval Q_d (0°-90°) and the query range Q_r, we also need to access all the nodes since the R-tree cannot support orientation filtering.
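To make the MBR-based baseline concrete, the sketch below computes the MBR of a pie-shaped ground-FOV from its camera location, viewable distance R and orientation interval. It is a minimal sketch assuming angles are given in degrees and measured counter-clockwise from the positive x-axis; the helper names fov_mbr and angle_in_interval are illustrative, not from the thesis. The returned rectangle is what the baseline R-tree would index for each FOV.

```python
import math

def angle_in_interval(a, start, end):
    """True if angle a (degrees) lies in the sweep from start to end (counter-clockwise)."""
    span = (end - start) % 360
    return (a - start) % 360 <= span

def fov_mbr(px, py, R, theta_b, theta_e):
    """MBR of the pie-shaped visible scene of a ground-FOV.

    The extreme points are the camera location, the two arc endpoints, and any
    axis-aligned extreme of the circle (0, 90, 180, 270 degrees) whose direction
    falls inside the orientation interval [theta_b, theta_e].
    """
    pts = [(px, py)]
    for a in (theta_b, theta_e):
        rad = math.radians(a)
        pts.append((px + R * math.cos(rad), py + R * math.sin(rad)))
    for a in (0, 90, 180, 270):
        if angle_in_interval(a, theta_b, theta_e):
            rad = math.radians(a)
            pts.append((px + R * math.cos(rad), py + R * math.sin(rad)))
    xs, ys = zip(*pts)
    return min(xs), min(ys), max(xs), max(ys)

# Example: a camera at the origin with a 60-degree view angle facing north-east.
print(fov_mbr(0.0, 0.0, 100.0, 15, 75))   # -> (0.0, 0.0, ~96.6, ~96.6)
```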
Figure 4.1: Sample dataset of FOV objects.

Figure 4.2: Dead spaces of (a) an FOV object and (b) an R-tree index node. The dashed areas denote the dead spaces.

Hence, the R-tree has the following drawbacks for indexing FOVs:

1. Dead space. Figure 4.2 illustrates the "dead spaces" (or empty areas, i.e., the area covered by the MBR of an R-tree node that does not overlap with any object in the subtree of the node [27]) of FOV f_1 and R-tree node R_1 in Figure 4.1. Dead spaces cause false positives for range queries and thus increase index node accesses. Taking the range query in Figure 4.1 as an example, due to the dead spaces of R_3 and R_4, both of them need to be accessed even though no FOV in R_3 or R_4 is a result.

2. Large MBRs. The MBR of an R-tree node can be large due to the large visible scenes of the FOV objects enclosed in the node. Obviously, large MBRs increase the number of accessed nodes for range queries.

3. No orientation filtering. With a regular R-tree, there is no orientation information in the index nodes.

4. No orientation optimization. The R-tree is constructed by minimizing the covering area of the FOV objects, without considering their directions.

4.1.2 Grid-based Index

Another approach for indexing FOVs is the Grid-based Index, termed Grid [64, 63], a three-level grid-based index built on the viewable scenes, camera locations and view directions of FOVs. The first level indexes FOVs in a coarse grid, where each cell maintains the FOVs that overlap with the cell. At the second level, each first-level cell is divided into a set of subcells; each subcell maintains all the FOVs whose camera locations are inside the subcell. At the third level, 360° is divided into x direction intervals; each direction interval maintains all the FOVs whose orientations overlap with the interval. Grid uses the first and second levels for range filtering to process range queries and uses the third level for orientation filtering to process directional queries. However, Grid has the following drawbacks:

1. It stores the location and orientation information at different levels, which is not efficient since video queries usually involve both the location and orientation information of FOVs at the same time during query processing.

2. It performs poorly for skewed distributions of FOVs since the bucket occupancy of grid files rises very steeply for skewed distributions [36].

4.2 OR-trees

To overcome the drawbacks of the baseline indexes, we devise a class of new index structures, called OR-trees, incorporating the camera locations, orientations and viewable distances of ground-FOVs. The class of OR-trees includes three new indexes, each of which is built on the premise of its previous version.

4.2.1 Orientation Augmented R-tree: OAR-tree

Recall that with the R-tree, using MBRs to approximate ground-FOVs results in large MBRs, large "dead spaces" and the loss of orientation information. In this section, we introduce a new index called the Orientation Augmented R-tree (OAR-tree), based on smaller MBRs and incorporating orientation information to accelerate query processing.

Based on R-trees, we add two new fields, storing the orientation and viewable distance information of ground-FOVs, to both the leaf and non-leaf nodes of OAR-trees.
For the leaf nodes of an OAR-tree, instead of storing the MBRs of ground-FOVs, we store the actual ground-FOVs. Thus we can avoid the "dead spaces" of ground-FOVs and reduce false positives. Specifically, each leaf index node contains a set of entries of the form (Oid, p, R, Θ), where Oid is the pointer to a ground-FOV in the database, p is its camera location, R is its visible distance, and Θ is its view orientation. Each non-leaf node N of an OAR-tree contains a set of entries of the form (Ptr, MBRp, MinMaxR, MBO), where:

• Ptr is the pointer to a child node of N;
• MBRp is the MBR of the camera points of the ground-FOVs in the subtree rooted at Ptr;
• MinMaxR is a two-tuple containing the minimum and maximum visible distances of the ground-FOVs in the subtree of Ptr;
• MBO is the Minimum Bounding Orientation (defined in Definition 1 below) of the orientations of the ground-FOVs in the subtree rooted at Ptr.

Definition 1 (Minimum Bounding Orientation (MBO)). Let ∠(θ, θ') denote the angle covered when rotating clockwise from direction θ to direction θ'. Given a set of ground-FOV orientations Ω = {Θ_i⟨θ_{b_i}, θ_{e_i}⟩ | 1 ≤ i ≤ n}, where n is the number of orientations in Ω, the MBO of Ω is the minimum clockwise angle that covers all the orientations in Ω, i.e., MBO(Ω) = ⟨θ_b, θ_e⟩ such that

∠(θ_b, θ_e) = min_{θ_{b_i} ∈ Ω} max_{θ_{e_j} ∈ Ω} ∠(θ_{b_i}, θ_{e_j}).

Like the traditional R-tree [27], the OAR-tree is constructed with the optimization heuristic of minimizing the enlarged area of the MBRs of the camera locations of the index nodes. When indexing the ground-FOVs in Figure 4.1 with an OAR-tree, f_3 and f_4 are grouped into a node N_3, and f_7 and f_8 into a node N_4. Consequently, compared with the baseline R-tree, the OAR-tree visits two fewer index nodes for the range query Q_r (both N_3 and N_4 can be pruned), and one fewer index node for the directional query Q_d (N_4 can be pruned and all the ground-FOVs in the subtree of N_3 can be reported as results).

The OAR-tree stores the MBRs of camera locations and incorporates the aggregate orientation and viewable distance information of all the child nodes to achieve smaller MBRs and orientation filtering. However, the OAR-tree is based only on the optimization of minimizing the covering area of the camera locations, which may result in many false positives for both range and directional queries. Similar to the "dead space" of an R-tree node, we formally define the "virtual dead space" of an OAR-tree node in Definition 2. Figure 4.3 shows the virtual dead spaces of the OAR-tree node containing f_1 and f_5, and of the OAR-tree node containing f_1 and f_2.

Definition 2 (OAR-tree node virtual dead space). Given an OAR-tree index node N(MBRp, MBO, MinMaxR), the virtual dead space of N is the area that is virtually covered by N but does not overlap with any ground-FOV in the subtree of N. The virtual coverage of N is a convex region such that any point in the region can be covered by a ground-FOV (p, Θ, R) with p ∈ N.MBRp, Θ ∈ N.MBO, and R ∈ N.MinMaxR.

Consider Figure 4.3 again for the example in Figure 4.1: ground-FOV f_1 is grouped together with f_5 in the OAR-tree based on the camera point optimization. However, if f_1 is instead grouped together with f_2, additionally considering orientation information, the virtual dead space is significantly reduced and so is the number of false positives. Based on this observation, we next discuss how to enhance the OAR-tree by considering orientation optimization during index construction.
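As a concrete illustration of Definition 1, the sketch below computes the MBO of a set of orientation intervals by brute force over the candidate start angles. It is a minimal sketch with angles in degrees; for robustness it takes the maximum over both endpoints of every interval (a slight generalization of the formula as printed, so that the returned sweep fully covers each input interval), and the names mbo and cw_angle are illustrative, not from the thesis.

```python
def cw_angle(a, b):
    """Clockwise angle swept when rotating from direction a to direction b (degrees)."""
    return (b - a) % 360

def mbo(orientations):
    """Minimum Bounding Orientation of a set of orientation intervals.

    Each orientation is a (theta_b, theta_e) pair covering the clockwise sweep
    from theta_b to theta_e.  Following Definition 1, the start angle is chosen
    among the interval starts so as to minimize the maximum clockwise angle
    needed to reach every interval endpoint.
    """
    best = None
    for tb, _ in orientations:
        span = max(max(cw_angle(tb, s), cw_angle(tb, e)) for s, e in orientations)
        if best is None or span < best[0]:
            best = (span, tb, (tb + span) % 360)
    _, theta_b, theta_e = best
    return theta_b, theta_e

# Example: two FOV orientations; the clockwise sweep from 30 to 120 degrees covers both.
print(mbo([(30, 90), (80, 120)]))   # -> (30, 120)
```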
Figure 4.3: Virtual dead spaces of OAR-tree nodes based on different optimizations: (a) camera point optimization; (b) camera point and orientation optimization. The dashed areas indicate the virtual dead spaces.

4.2.2 Orientation Optimized R-tree: O2R-tree

We proceed to present a new index called the Orientation Optimized R-tree (O2R-tree) that optimizes both the covering area of the camera locations and the similarity in orientation. The information stored in O2R-tree nodes is the same as that of the OAR-tree. The main difference between the O2R-tree and the OAR-tree lies in the optimization criteria used during the merging and splitting of index nodes. While the framework of the O2R-tree construction algorithm (see Algorithm 1) is similar to that of the OAR-tree, the following two procedures differ because they consider the additional orientation information: ChooseLeaf, the procedure that chooses a leaf index node to store a newly inserted ground-FOV, and Split, the procedure that splits an index node into two nodes when the node overflows. ChooseLeaf traverses an O2R-tree from the root to a leaf node; when it visits an internal node N, it chooses the entry E of N with the least Waste, to be given in Eqn (4.3). Split uses the standard Quadratic Split algorithm [27] based on our proposed Waste function.

We proceed to compute Waste as a combination of the wastes of the camera locations and the view orientations. Given an O2R-tree entry E(MBRp, MinMaxR, MBO) and a ground-FOV f(p, R, Θ), let ΔArea(E, f) be the empty area (or dead space) created by enclosing the camera location of f in the MBR of the camera locations of the ground-FOVs in E, formulated in Eqn (4.1):

ΔArea(E, f) = Area(MBR(E, f)) − Area(E)    (4.1)

where Area(MBR(E, f)) is the area of the minimum bounding rectangle enclosing E.MBRp and f.p, and Area(E) is the area of E.MBRp. The angle waste for the view orientation is computed by Eqn (4.2):

ΔAngle(E, f) = |MBO(E.MBO, f.Θ)| − |E.MBO| − |f.Θ|    (4.2)

where MBO(E.MBO, f.Θ) is the minimum bounding orientation enclosing E.MBO and f.Θ, and |·| denotes the clockwise cover angle of an orientation interval. Combining Eqn (4.1) and Eqn (4.2) through normalization and a weighted linear combination, we compute the overall waste cost in Eqn (4.3):

Waste_lo(E, f) = β_l · ΔArea(E, f)/maxΔArea + β_o · ΔAngle(E, f)/maxΔAngle    (4.3)

In Eqn (4.3), maxΔArea (respectively maxΔAngle) is the maximum ΔArea (respectively ΔAngle) over all pairs of entries E_i and E_j, used to normalize the camera location (respectively orientation) waste. The parameters β_l and β_o, with β_l, β_o ≥ 0 and β_l + β_o = 1, are used to strike a balance between the area and angle wastes. A smaller Waste_lo(E, f) indicates that the entry E is more likely to be chosen for the insertion of object f.
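A small sketch of the combined waste of Eqn (4.3), assuming an entry is summarized by the MBR of its camera points and its MBO in degrees; the merged MBO and the normalizers maxΔArea / maxΔAngle are passed in precomputed, and all class and function names are illustrative rather than taken from the thesis. The default weights mirror the β_l = 0.6, β_o = 0.4 setting found to work well in Section 4.5.4.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    mbr: tuple   # (min_x, min_y, max_x, max_y) of the camera points
    mbo: tuple   # (theta_b, theta_e), clockwise orientation interval in degrees

def area(mbr):
    return max(0.0, mbr[2] - mbr[0]) * max(0.0, mbr[3] - mbr[1])

def cover_angle(interval):
    return (interval[1] - interval[0]) % 360

def delta_area(entry, p):
    """Eqn (4.1): enlargement of the camera-point MBR when adding camera location p."""
    enlarged = (min(entry.mbr[0], p[0]), min(entry.mbr[1], p[1]),
                max(entry.mbr[2], p[0]), max(entry.mbr[3], p[1]))
    return area(enlarged) - area(entry.mbr)

def delta_angle(entry, fov_theta, merged_mbo):
    """Eqn (4.2): extra cover angle of the merged MBO beyond the two inputs."""
    return cover_angle(merged_mbo) - cover_angle(entry.mbo) - cover_angle(fov_theta)

def waste_lo(entry, fov_p, fov_theta, merged_mbo,
             max_d_area, max_d_angle, beta_l=0.6, beta_o=0.4):
    """Eqn (4.3): normalized weighted combination of the area and angle wastes."""
    return (beta_l * delta_area(entry, fov_p) / max_d_area +
            beta_o * delta_angle(entry, fov_theta, merged_mbo) / max_d_angle)
```

The same function can serve both ChooseLeaf (pick the entry with the smallest waste) and Split (compare the waste of assigning an entry to either group).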
Algorithm 1: Insert(R: an O2R-tree, E: a new leaf node entry)
Output: the new O2R-tree R
 1  N ← ChooseLeaf(R, E);
 2  add E to node N;
 3  if N needs to be split then
 4      {N, NN} ← Split(N, E);
 5      if N.isroot() then
 6          initialize a new node M;
 7          M.append(N); M.append(NN);
 8          store nodes M and NN;  // N is already stored
 9          R.RootNode ← M;
10      else
11          AdjustTree(N.ParentNode, N, NN);
12  else
13      store node N;
14      if ¬N.isroot() then
15          AdjustTree(N.ParentNode, N, null);

Procedure ChooseLeaf(R, E)
16  N ← R.RootNode;
17  while N is not a leaf do
18      E0 ← argmin_{Ei ∈ N} Waste_lo(Ei, E);  // Eqn (4.3)
19      N ← E0.Ptr;
20  return N;

Procedure Split(N, E)
21  E1, E2 ← argmax_{Ei, Ej ∈ N ∪ {E}} Waste_lo(Ei, Ej);  // Eqn (4.3)
22  for each entry E0 in N ∪ {E}, where E0 ≠ E1 and E0 ≠ E2, do
23      if Waste_lo(E0, E1) ≤ Waste_lo(E0, E2) then
24          classify E0 into Group 1;
25      else classify E0 into Group 2;
26  return Group 1 and Group 2;

Figure 4.4: Leaf nodes of the O2R-tree for the example in Figure 4.1.

For the example in Figure 4.1, as shown in Figure 4.4, the O2R-tree groups f_1 and f_2 into a node N_1 and groups f_5 and f_6 into a node N_2, whereas the OAR-tree groups f_1 and f_5 into a node N_1 and f_2 and f_6 into a node N_2. Hence, compared to the OAR-tree, the O2R-tree visits one fewer index node for the range query Q_r (node N_2 can be pruned), and one fewer index node for the directional query Q_d (node N_2 can be pruned).

4.2.3 View Distance and Orientation Optimized R-tree: DO2R-tree

Considering only the camera location and orientation for optimization may still be insufficient. To illustrate this, consider Figure 4.5: ground-FOV f_1 is packed with f_2 in node N_1 (see Figure 4.5a) based on the O2R-tree optimization, as their camera locations and orientations are the same. When additionally considering the visible distances for optimization, f_1 is packed with f_3 in N_1 (see Figure 4.5b) due to the high dissimilarity of the visible distances of f_1 and f_2. Therefore, the range query Q_r needs to visit two index nodes (i.e., N_1 and N_2 in Figure 4.5a) based on the O2R-tree optimization; however, if we consider the view distances for optimization as well, we only need to visit one node (i.e., N_1 in Figure 4.5b). Hence, we discuss how to construct the index based on an optimization criterion that also incorporates the view distance information of FOV objects, and we call the new index the View Distance and Orientation Optimized R-tree (DO2R-tree).

Figure 4.5: Illustration of the optimization criteria (a) without and (b) with considering the view distance, assuming a fanout of 2.

The difference between the DO2R-tree and the O2R-tree is the optimization criterion: the new waste function of the DO2R-tree incorporates the view distance differences, as given in Eqn (4.5). Given a DO2R-tree entry E and a ground-FOV f, the waste of viewable distance ΔDiff(E, f) is defined in Eqn (4.4):

ΔDiff(E, f) = Diff(E ∪ f) − Diff(E)    (4.4)

where Diff(E) (respectively Diff(E ∪ f)) is the difference between the maximum and minimum viewable distances of the ground-FOVs in the subtree of E (respectively of E after including f). Combining the wastes of the camera location area and the orientation cover angle with the waste of the view distance, we compute the overall waste cost in Eqn (4.5).
Waste_lod(E, f) = β_l · ΔArea(E, f)/maxΔArea + β_o · ΔAngle(E, f)/maxΔAngle + β_d · ΔDiff(E, f)/maxΔDiff    (4.5)

In Eqn (4.5), maxΔDiff is the maximum ΔDiff over all pairs of entries E_i and E_j, used to normalize the visible distance waste. The parameters β_l, β_o and β_d, with 0 ≤ β_l, β_o, β_d ≤ 1 and β_l + β_o + β_d = 1, are used to tune the impact of the three wastes. In particular, if β_d = 0, then the DO2R-tree reduces to the O2R-tree, and if additionally β_o = 0, then it becomes the OAR-tree.

4.3 Query Processing with OR-trees

We proceed to present the query algorithms for range and directional queries based on the DO2R-tree, which is the generalization of the three indexes discussed in Section 4.2.

4.3.1 Range Queries

In this section, we develop an efficient algorithm to answer range queries. At a high level, the algorithm descends the DO2R-tree in a branch-and-bound manner, progressively checking whether each visited ground-FOV object / index node overlaps with the query circle. It then decides whether to prune a ground-FOV / index node, or to report a ground-FOV / index node (i.e., all the ground-FOVs in the node) as result(s). Before presenting the algorithm, we first present an exact approach to identify whether a ground-FOV is a result, and then exploit it to identify whether a DO2R-tree node should be accessed through two newly defined strategies: 1) the pruning strategy and 2) the total hit strategy.

Search Strategies for Range Queries

As shown in Figure 4.6, there are four overlapping cases. We can then derive the lemma below to identify whether a ground-FOV is a result.

Lemma 1 (Overlap identification for an object). Let ∠(u, v) denote the clockwise angle from direction u to direction v. Given a ground-FOV f(p, R, Θ⟨θ_b, θ_e⟩) and a range query Q_r(q, r), f overlaps with Q_r iff it satisfies one of the conditions below:

|pq| ≤ r    (4.6)

|pq| ≤ r + R  and  ∠(θ_b, pq) + ∠(pq, θ_e) = ∠(θ_b, θ_e)    (4.7)

|pq|·cos ∠(pq, θ_b) − √(r² − (|pq|·sin ∠(pq, θ_b))²) ≤ R  and  ∠(θ_b, pq) + ∠(pq, θ_e) ≠ ∠(θ_b, θ_e)    (4.8)

|pq|·cos ∠(pq, θ_e) − √(r² − (|pq|·sin ∠(pq, θ_e))²) ≤ R  and  ∠(θ_b, pq) + ∠(pq, θ_e) ≠ ∠(θ_b, θ_e)    (4.9)

Figure 4.6: Overlap identification for a ground-FOV object (Cases 1 to 4).

Based on Lemma 1, we can develop our first strategy, the pruning strategy. In order to prune an index node N without accessing the objects in its subtree, we first introduce two approximations in Definition 3. Let N(MBRp, MBO⟨θ_b, θ_e⟩) be a DO2R-tree index node. As shown in Figure 4.7, let ⟨p_b q, p_e q⟩, where p_b, p_e ∈ MBRp, be a ray tuple such that for every p ∈ MBRp, the ray pq lies between p_b q and p_e q in the clockwise direction.

Definition 3 (MaxA and MinA). The maximum (respectively minimum) cover angle in the clockwise direction from the MBO of the DO2R-tree index node N to a ray pq, p ∈ MBRp, denoted as MaxA(MBO, MBRp, q) (respectively MinA(MBO, MBRp, q)), is defined as:

MaxA(MBO, MBRp, q) = max{ ∠(θ_b, p_e q), ∠(p_b q, θ_e) }    (4.10)

MinA(MBO, MBRp, q) = 0 if the MBO overlaps with ⟨p_b q, p_e q⟩; otherwise min{ ∠(θ_e, p_b q), ∠(p_e q, θ_b) }    (4.11)

Figure 4.7: Illustration of MinA and MaxA.

Lemma 2. For every f(p, Θ) ∈ N(MBRp, MBO) and every θ ∈ f.Θ, MinA(MBO, MBRp, q) ≤ ∠(θ, pq) ≤ MaxA(MBO, MBRp, q).

Proof. Lemma 2 follows directly from the definitions (see Figure 4.7).

Lemma 3 (Pruning strategy).
An index node N can be pruned if it satisfies Eqn (4.12), Eqn (4.13), or Eqn (4.14):

MinD(q, MBRp) ≥ r + MaxR    (4.12)

MinA(MBO, MBRp, q) ≥ arcsin( r / MinD(q, MBRp) )    (4.13)

MinD(q, MBRp)·cos(MaxA(MBO, MBRp, q)) − √( r² − MinD²(q, MBRp)·sin²(MinA(MBO, MBRp, q)) ) ≥ MaxR    (4.14)

where MinD(q, MBRp) is the minimum distance from q to MBRp.

Proof. For every f ∈ N: 1) If Eqn (4.12) holds, then for every R ∈ [MinR, MaxR], |pq| > r + R, i.e., f does not satisfy Eqn (4.6) in Lemma 1; obviously, f does not satisfy Eqns (4.7), (4.8) or (4.9) either. 2) If Eqn (4.13) holds then, according to Lemma 2, for any viewable distance R of f, the orientation of f does not point to any point in the query circle, and thus f satisfies neither Eqn (4.8) nor Eqn (4.9); obviously, f satisfies neither Eqn (4.6) nor Eqn (4.7). 3) If Eqn (4.14) holds, then f satisfies neither Eqns (4.8) and (4.9) nor Eqns (4.6) and (4.7). Therefore Lemma 3 is true.

We are now ready to discuss our second strategy. We call an index node N a "total hit" iff all the objects in N overlap with the query circle. This is a new concept that does not exist with regular R-trees. If N is a "total hit", then it is not necessary to exhaustively check all the objects in N one by one, so the processing cost can be significantly reduced. Hence, based on the two approximations above, we propose a second novel search strategy, the total hit strategy.

Lemma 4 (Total hit strategy). All the ground-FOVs in the subtree of N can be reported as results if N satisfies Eqn (4.15), or all of Eqns (4.16), (4.17) and (4.18):

MaxD(q, MBRp) ≤ r    (4.15)

MaxD(q, MBRp) ≤ r + MinR    (4.16)

MaxA(MBO, MBRp, q) ≤ arcsin( r / MaxD(q, MBRp) )    (4.17)

MaxD(q, MBRp)·cos(MinA(MBO, MBRp, q)) − √( r² − MaxD²(q, MBRp)·sin²(MaxA(MBO, MBRp, q)) ) ≤ MinR    (4.18)

where MaxD(q, MBRp) is the maximum distance from q to MBRp.

Proof. For every f ∈ N: if Eqn (4.15) holds, then Eqn (4.6) in Lemma 1 obviously holds. If Eqns (4.16), (4.17) and (4.18) hold, then Eqn (4.8) (or Eqn (4.9)) in Lemma 1 holds. Therefore Lemma 4 is true.

Search algorithm for range queries. Based on the two new strategies discussed above, we can develop an efficient algorithm to answer range queries (see Algorithm 2). The algorithm descends the DO2R-tree in a branch-and-bound manner, progressively applying the two strategies to answer the range query.

Algorithm 2: Range Query (R: DO2R-tree root, Q_r: range circle)
Output: all objects f ∈ results
 1  initialize a stack S with the root of the DO2R-tree R;
 2  while ¬S.isEmpty() do
 3      N ← S.top(); S.pop();
 4      if N satisfies Eqn (4.12) ∧ Eqn (4.13) ∧ Eqn (4.14) then
 5          prune N;  // Lemma 3
 6      else
 7          if N satisfies Eqn (4.15) ∨ (Eqn (4.16) ∧ Eqn (4.17) ∧ Eqn (4.18)) then
 8              results.add(N.subtree());  // Lemma 4
 9          else
10              for each child node ChildN of N do
11                  S.push(ChildN);

4.3.2 Directional Queries

We next explain an efficient algorithm (see Algorithm 3) for processing directional queries with the DO2R-tree. Given a direction interval Q_d, we can easily decide whether the orientation of a DO2R-tree index node overlaps with Q_d using the orientation information stored in the DO2R-tree nodes. Like the range query algorithm, Algorithm 3 also proceeds in a branch-and-bound manner, progressively applying the search strategies to answer the directional query. Note that we apply the range search strategies and the orientation filtering at the same time to decide whether a DO2R-tree index node should be pruned or reported as a "total hit".
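The traversal shared by Algorithms 2 and 3 can be summarized by the short sketch below, which abstracts the geometric tests of Lemmas 3 and 4 (and the per-object test of Lemma 1, combined with orientation filtering for directional queries) behind callables. A minimal sketch; the node interface (is_leaf, entries, all_fovs) and function names are assumptions for illustration, not from the thesis.

```python
from collections import deque

def branch_and_bound_query(root, can_prune, is_total_hit, matches):
    """Generic OR-tree traversal with the pruning and total-hit strategies.

    can_prune(node):    True if the node satisfies the pruning conditions (Lemma 3).
    is_total_hit(node): True if every FOV under the node is a result (Lemma 4).
    matches(fov):       exact per-object test (Lemma 1, plus orientation overlap
                        for directional queries).
    """
    results = []
    stack = deque([root])
    while stack:
        node = stack.pop()
        if can_prune(node):
            continue                          # no FOV below this node can qualify
        if is_total_hit(node):
            results.extend(node.all_fovs())   # report the whole subtree at once
        elif node.is_leaf:
            results.extend(f for f in node.entries if matches(f))
        else:
            stack.extend(node.entries)        # descend into child nodes
    return results
```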
4.4 Performance Analysis

We analyze the maximum improvement space of ground-FOV queries over R-trees for both range and directional queries. Assume that the camera locations and orientations of N ground-FOVs are uniformly distributed in an area and in 0°-360°, respectively, and that their viewable distances and viewable angles are identical (i.e., the ground-FOV area is constant).

Algorithm 3: Directional Query (R: DO2R-tree root, Q_d: direction interval, Q_r: range circle)
Output: all objects f ∈ results
 1  initialize a stack S with the root of the DO2R-tree R;
 2  while ¬S.isEmpty() do
 3      N ← S.top(); S.pop();
 4      if N satisfies Eqn (4.12) ∧ Eqn (4.13) ∧ Eqn (4.14) then
 5          prune N;  // Lemma 3
 6      else
 7          if N satisfies Eqn (4.15) ∨ (Eqn (4.16) ∧ Eqn (4.17) ∧ Eqn (4.18)) then
 8              results.add(N.subtree());  // Lemma 4
 9          else
10              for each child node ChildN of N do
11                  S.push(ChildN);

Lemma 5. Compared with the range query algorithm with the R-tree in Section 4.1.1, the I/O cost of the optimal algorithm with the optimal index for range queries is at most 66.7% lower than that of the approach with the R-tree.

Proof. Given a range query Q_r(q, r) and an arbitrary ground-FOV f, if the MBR of f, f.mbr, overlaps with Q_r, then there is at most a 50% probability that f is a false positive, since the area of the "dead space", as shown in Figure 4.2, can be proved to be at most half of that of f.mbr. Let M be the number of ground-FOVs whose MBRs overlap with Q_r, f the index fanout and h the index height. Then the number of R-tree nodes that need to be accessed is (M/f)·(1 + Σ_{i=1}^{h} 1/f^i) ≤ 3M/(2f). The optimal search algorithm with an optimal index (i.e., one accessing only results) therefore visits at most (3M/(2f) − M/(2f)) / (3M/(2f)) < 66.7% fewer nodes than the approach with the R-tree. Hence, Lemma 5 is true.

Lemma 6. Given any query direction interval Q_d(θ_b, θ_e), compared with the algorithm for the directional query Q_d with the R-tree in Section 4.1.1, the optimal algorithm based on the optimal index accesses at most 1 − 33.3%·∠(θ_b, θ_e)/360° fewer nodes than the approach with the R-tree in Section 4.1.1, where ∠(θ_b, θ_e) is the coverage angle from θ_b to θ_e in the clockwise direction.

Proof. Given a directional query with query direction interval Q_d(θ_b, θ_e) and range Q_r(q, r), for an arbitrary ground-FOV f, the probability that f is a false positive for the directional query is 50%·(1 − ∠(θ_b, θ_e)/360°), since there is a 50% probability that f overlaps with Q_r, according to Lemma 5, and a probability of ∠(θ_b, θ_e)/360° that f is within the direction interval Q_d. Let M be the number of ground-FOVs whose MBRs overlap with Q_r and whose orientations overlap with Q_d. Then the number of R-tree nodes that need to be accessed is (M/f)·(1 + Σ_{i=1}^{h} 1/f^i) ≤ 3M/(2f). The optimal search algorithm with an optimal index (i.e., one accessing only results) then visits at most (3M/(2f) − (M/(2f))·∠(θ_b, θ_e)/360°) / (3M/(2f)) < 1 − 33.3%·∠(θ_b, θ_e)/360° fewer nodes than the approach with the R-tree. Hence, Lemma 6 is true.

Let I_Ropt be the maximum percentage by which the range query algorithm with the R-tree can be improved. Lemma 5 implies that I_Ropt = 66.7%. Let I_Dopt (= 1 − 33.3%·∠(θ_b, θ_e)/360°) be the maximum percentage by which the directional query algorithm with the R-tree can be improved. Since 0 ≤ ∠(θ_b, θ_e)/360° ≤ 1, obviously I_Dopt ≥ I_Ropt. Lemma 6 implies that the maximum improvement over the R-tree for directional queries is at least 66.7%. We can see that the improvement space decreases as the query direction interval increases.

4.5 Experimental Studies

Implemented indexes. We implemented our proposed indexes and search algorithms, the OAR-tree, O2R-tree and DO2R-tree, for range and directional queries.
In addition, we implemented two baselines for comparison: the R-tree and the Grid-based Index (Grid).

Datasets and queries. We used two types of datasets, Real World (RW) [62] and Synthetically Generated (Gen), as shown in Table 4.1. RW was collected with the mobile apps of the MediaQ [42] project. The distribution of ground-FOVs in RW was very skewed: 50% of the videos were taken at the University of Southern California (USC), 30% in Singapore, and 20% in 18 other cities. To evaluate the scalability of our solutions, we synthetically generated five Gen datasets with data sizes growing in log scale from 0.1M (million) to 1B (billion) ground-FOVs using the mobile video generation algorithm presented in [15]. In Gen (see Table 4.2), ground-FOVs were uniformly distributed across a 10 km × 10 km area around USC, and their view distances were randomly distributed between 100 and 500 meters, modeling a group of people walking randomly while taking videos with their smartphones. Unless specified otherwise, the dataset size of 1M ground-FOVs was assumed by default. To evaluate the performance of our solutions under a skewed distribution, we additionally generated five Gen datasets from 0.1M to 1B ground-FOVs in which the ground-FOVs are non-uniformly (Gaussian) distributed in the same area.

For each dataset (RW and Gen), we generated 5 query sets for range queries by increasing the query radius from 100 to 500 meters, which is a reasonable range for video queries as users are usually interested in videos within a small specified area. Each query set contained 10,000 queries with different query locations but the same query radius. For Gen, query points were uniformly distributed around USC. For RW, half of the queries were uniformly distributed around USC while the other half were distributed in Singapore. A query radius of 200 meters was assumed by default. Additionally, we generated query sets for directional queries, varying the query direction interval from 45° to 315° in increments of 45°.

Table 4.1: Datasets for the experiments
Statistics                                 | RW       | Gen
total # of ground-FOVs                     | 0.2 M    | 0.1 M – 1 B
total # of videos                          | 1,276    | 100 – 1 M
ground-FOV # per second                    | 1        | 1
average time per video                     | 3 mins   | 0.28 hours
total time                                 | 62 hours | 27.8 hours – 31.71 years
average camera moving speed (km/h)         | 4.50     | 4.86
average camera rotation speed (degrees/s)  | 10       | 10
average viewable distance R (meters)       | 100      | 250
average viewable cover angle α (degrees)   | 51       | 60

Setup and metrics. We implemented all the indexes on a PC with an Intel Core 2 Duo E8500 CPU @ 3.16 GHz, 4 GB RAM and a 4 KB page size. The fanouts of the R-tree and of the OAR-tree (and likewise the O2R-tree and DO2R-tree) were 170 and 102, respectively. For the baseline Grid, following the settings reported in [63], we set the first-level cell size to 250 meters, the second-level cell size to 62.5 meters, and the view direction interval to 45°. As the metrics for our evaluation, we report the average query processing time and the average number of page accesses per query after executing the 10,000 queries of each query set.

Table 4.2: Synthetically Generated Dataset
total ground-FOV # | 0.1 M      | 1 M       | 10 M       | 100 M     | 1 B
total video #      | 100        | 1 K       | 10 K       | 100 K     | 1 M
total time         | 27.8 hours | 11.6 days | 115.7 days | 3.2 years | 31.7 years

Space requirement. The space usage of the index structures for the different datasets is reported in Table 4.3. The space requirements of the OAR-tree, O2R-tree and DO2R-tree were almost identical, so we only report the space usage of the DO2R-tree.
The DO2R-tree requires a little more space than the R-tree since it needs extra space to store the orientations and viewable distances in each index node. However, the space usage of the DO2R-tree was significantly smaller (about 5 times smaller) than that of Grid because Grid redundantly stores each ground-FOV at each level.

Table 4.3: Sizes of the indexing structures (Megabytes)
          | RW    | 0.1M  | 1M    | 10M   | 100M   | 1B
R-tree    | 8.66  | 3.41  | 32.69 | 289   | 2,536  | 23,978
Grid      | 42.02 | 16.70 | 163   | 1,576 | 16,519 | 163,420
DO2R-tree | 9.75  | 5.51  | 60.25 | 379   | 3,784  | 38,299

4.5.1 Evaluation of Range Queries

In this set of experiments, we evaluated the performance of the indexes for range queries using the Gen datasets. Figure 4.8 reports the average number of page accesses of all indexes; Figure 4.8b shows the same results but focuses on the data sizes from 0.1M to 10M. Note that the data size is shown in log scale while the number of page accesses is shown in linear scale. The most important observation is that both the O2R-tree and the DO2R-tree significantly outperformed the other indexes, which demonstrates the superiority of our optimization approaches. The O2R-tree (respectively DO2R-tree) accessed around 40% (respectively 50%) fewer pages than Grid, and around 50% (respectively 60%) fewer than the R-tree. The improvement of the DO2R-tree is very close to the theoretical maximum improvement (66.7%, as analyzed in Section 4.4), clearly showing the effectiveness of the orientation and view distance optimizations and of our search algorithms. Another observation is that the OAR-tree incurred slightly more page accesses than the R-tree. This is expected because the OAR-tree is based only on the optimization of minimizing the covering area of the camera locations. The dead space of an OAR-tree node can be larger than that of an R-tree node, so it may produce more false positives than the R-tree, even though it incorporates view orientation and distance information in its index nodes for filtering. This demonstrates that it is not the simple inclusion of orientation information but the optimization criteria that consider orientation which significantly reduce the dead spaces of tree nodes and, subsequently, the number of false positives. In addition, the DO2R-tree was marginally better than the O2R-tree for range queries, since incorporating the additional viewable distance in the optimization helped accelerate range queries.

Figure 4.9 reports the query processing time for the above experiments. The overall performance improvement showed a similar trend as in Figure 4.8, but one can observe that the O2R-tree and DO2R-tree provided a larger percentage reduction relative to the R-tree and Grid than in the case of page accesses. This is because the "total hit" search strategy applied to our indexes reduces processing time by avoiding checking the ground-FOVs in a node one by one.

Figure 4.10 reports the impact of the query radius in terms of page accesses while varying the query radius from 100 to 500 meters. Apart from Grid, the performance trend held for all indexes as the query radius increased. For Grid, the number of page accesses increases rapidly and is significantly greater than that of the other indexes for radii greater than 200 meters. This is because the first-level cell size of Grid was set
4.5.2 Evaluation of Directional Queries In this set of experiments, we evaluated the performance of the indexes for directional query using the same datasets. We used the default query radius of 200 meters and set the query direction interval to be 90 ◦ in these experiments. As shown in Figure 4.11, O 2 R-tree (respectively DO 2 R-tree) accessed about 70% (respectively 65%) less number of pages than Grid and accessed about 67% (respectively 63%) less than R-tree. This demonstrates that the orientation optimization in build- ing O 2 R-tree and DO 2 R-tree was more effective in supporting directional queries (as expected). DO 2 R-tree performs slightly worse for the directional queries than O 2 R-tree since DO 2 R-tree has a lower extent of optimization on orientation than O 2 R-tree. 4.5.3 Impact of the Query Direction Interval We evaluated the impact of the query direction interval. Figure 4.14 shows that the number of page accesses of R-tree was not influenced by the query direction interval. The number of page accesses for other indexes slightly increased when the angle increased. Clearly, O 2 R-tree and DO 2 R-tree demonstrated at least 2 times better performance than others. The number of page accesses of Grid grows much faster than other indexes as Grid needs to visit its third level for orientation filtering for each candidate at the second level. 4.5.4 Impact of Weight Parameters This set of experiments aimed to evaluate how each optimization criteria (i.e., loca- tion, orientation and view distance) impacts the index performance. First, we built O 2 R-trees with different weight for orientation difference β o (in Eqn 4.3) from 0 to 1 (BetaO in Figure 4.12). Figure 4.12 shows the number of page accesses of O 2 R-tree, 47 built with different β o , for range query with 1M ground-FOV dataset. Note that the case of β o = 0 is actually an OAR-tree which does not use any orientation information for optimization and hence its performance is the worst. This shows simply augment- ing nodes with orientation without optimization does not help. As we increase β o , i.e., applying more optimization using orientation, the performance of O 2 R-tree becomes sig- nificantly better in the mid range ofβ o and becomes the best aroundβ o = 0.4. However, close to the other extreme case of β o = 1 (i.e., β l =0), only orientation difference is con- sidered in building O 2 R-tree so the performance of range query suffers. Hence, a good balance between area and angle wastes can produce the best result. Next, we built DO 2 R-trees with different weight of viewable distance differenceβ d (in Eqn 4.5) from 0 to 1 (BetaD in Figure 4.13). Learned from the previous result, we set β l = 0.6∗ (1−β d ) andβ o = 0.4∗ (1−β d ) to assume their best setting. Whenβ d = 0, the tree is actually O 2 R-tree with β o = 0.4. When β d = 1, only viewable distance difference is considered. Figure 4.13 shows that the performance of DO 2 R-tree was comparable (best when β d = 0.2) when β d ≤ 0.6 but it suffered as β d approached 1. In our dataset, the distance difference among FOVs were not large so the impact ofβ d was minimal. For a specific application, we can find appropriate parameter settings empirically. 4.5.5 Evaluation Using the Skewed Generated Dataset This set of experiments evaluates the performance of our indexes and search algo- rithms for ground-FOV objects with skewed distribution. 
As shown in Figure 4.14a, the O2R-tree and DO2R-tree significantly outperform the R-tree and Grid for range queries. The improvements of the O2R-tree and DO2R-tree over the R-tree show the same trend as in the experiments with the uniform Gen dataset, while those over Grid become more pronounced. Specifically, the O2R-tree (respectively DO2R-tree) accessed about 65% (respectively 60%) fewer pages than Grid. This is because the bucket occupancy of grid files rises very steeply for skewed distributions.

Figure 4.12: Varying β_o in the O2R-tree (number of page accesses).

Figure 4.13: Varying β_d in the DO2R-tree (number of page accesses).

Figure 4.14: Comparison using the skewed Gen dataset: (a) range queries; (b) directional queries.

Figure 4.15: Varying the query direction interval for directional queries.

Figure 4.16: Comparison on the RW dataset.

Figure 4.14b plots the results for directional queries. We can see that the O2R-tree and DO2R-tree show even larger improvements over Grid. This is expected since the orientation optimization in building the O2R-tree and DO2R-tree was more effective in supporting directional queries than range queries.

4.5.6 Evaluation Using the Real World Dataset

To evaluate the effectiveness of our indexes and search algorithms in a real-world setting, we also conducted a set of experiments using the real-world (RW) dataset, which follows a skewed distribution. As shown in Figure 4.16, the results show the same trends as in the previous experiments using the skewed Gen dataset, for both range and directional queries.

To summarize, the experimental results demonstrate that our proposed indexes, the O2R-tree and DO2R-tree, together with the corresponding search algorithms, outperform the two baseline indexes for both range and directional queries.

4.6 Chapter Summary

In this chapter, I proposed a class of new index structures, OR-trees, to efficiently index ground-FOVs, which are spatial objects with both location and viewing orientation information. OR-trees are R-tree-based indexes that use the location, orientation and viewable distance information of ground-FOVs for both filtering and optimization. In addition, our indexes can flexibly support user-generated videos that contain ad-hoc ground-FOVs with potentially skewed distributions. Experimental results demonstrated that it is not the simple consideration of orientation but the optimization criteria that consider orientation which significantly reduce the dead spaces of the index nodes and, subsequently, the number of unnecessary index node accesses. Further, two novel search strategies, the pruning strategy and the total hit strategy, were proposed for fast video range and directional queries on top of our index structures. It is not trivial to process range queries with OR-trees.
The two search strategies make full use of the geometric properties of pie-shaped ground-FOVs and effectively use the aggregated information stored in OR-tree nodes to obtain tight estimates for deciding whether an OR-tree node needs to be accessed. The total hit strategy is a new concept that does not exist for traditional R-trees: with the total hit strategy, it is not necessary to exhaustively check all the objects in an index node one by one, so the processing cost can be significantly reduced. Our indexes are not specific to sector-shaped ground-FOVs and can be used to index any spatial object with orientation in other shapes, such as vectors, triangles and parallelograms. This is because, like the sector-shaped FOV objects, objects shaped as vectors, triangles and parallelograms can also be defined as objects comprising a position p, a direction tuple Θ, and a radius tuple MinMaxR. Additionally, our indexes and search algorithms can easily be extended to 3-dimensional FOVs by augmenting them with the camera pitch and roll shooting directions.

Figure 4.8: Page accesses of range queries ((a) full range of dataset sizes; (b) zoom on the smaller datasets).

Figure 4.9: Query processing time of range queries.

Figure 4.10: Impact of the query radius on range queries.

Figure 4.11: Varying the dataset size for directional queries ((a) full range of dataset sizes; (b) zoom on the smaller datasets).

Chapter 5
Aerial Video Indexing and Querying

In Chapter 4, we focused on ground mobile videos (e.g., smartphone videos). In this chapter, we focus on the indexing and querying of another type of user-generated videos, i.e., aerial videos (e.g., drone videos).

Unmanned aerial vehicles (UAVs, a.k.a. drones) have been used mainly in military activities and public safety for decades. Recently, with advances in control engineering, materials science and sensor technologies, drones have gained significant commercial momentum, and the number of drones is increasing dramatically. Consequently, organizing and searching aerial videos is becoming a critical problem.

As introduced in Section 2.2, the spatial coverage of each aerial video frame is shaped as an irregular quadrilateral (see Figures 2.2 and 2.3). Therefore, indexing aerial-FOVs to answer spatial queries on aerial videos poses challenges to existing spatial indexes.

In this chapter, I will discuss the baseline methods for indexing aerial-FOVs in Section 5.1. To overcome the drawbacks of the baselines, in Section 5.2, I will propose a new index structure called the TetraR-tree to efficiently index aerial-FOVs. Based on the TetraR-tree, two novel search strategies are presented in Section 5.3 for both point and range queries. Finally, in Section 5.4, we will conduct extensive experiments to demonstrate the superiority of the TetraR-tree with our proposed search algorithms over the baselines.
5.1 Baseline Methods

5.1.1 R-trees

One baseline for indexing aerial-FOVs is the R-tree [27], which is one of the basic and widely used spatial index structures. To index aerial-FOVs efficiently with an R-tree, we enclose each aerial-FOV (i.e., each quadrilateral) with its MBR, as illustrated in Figure 2.3. Given a point query Q_p (resp. a range query Q_r), for each visited index node N, the R-tree decides whether to access the subtree of N by checking whether Q_p is covered by (resp. Q_r overlaps with) the MBR of N. Large "dead spaces" produce many unnecessary index node accesses [27], and the R-tree produces large dead spaces when indexing aerial-FOVs. For a ground-FOV, the dead space area is proved to be at most half of its MBR [61]; however, for an aerial-FOV, the dead space area ratio can be larger than 50% (e.g., Figures 2.3d and 2.3e).

5.1.2 OR-trees

Another baseline is extending the OR-tree [60, 61], which is the state-of-the-art index for ground videos based on their spatial coverages. The main idea of the OR-tree is to index the camera locations, directions and visible distances of ground-FOVs individually. The straightforward extension of the OR-tree to aerial-FOVs is to index each dimension of the 7-tuple Γ(lat, lng, hgt, θ_a, θ_p, θ_r, α) separately. Each index node includes a set of entries of the form (MBCube, MBOri, MinMaxα), where MBCube = {minLat, maxLat, minLng, maxLng, minHgt, maxHgt} is the minimum bounding cube of the camera locations of the subtree aerial-FOVs, MBOri = {minθ_a, maxθ_a, minθ_p, maxθ_p, minθ_r, maxθ_r} is the minimum bounding viewing orientation of the subtree aerial-FOVs, and MinMaxα contains the minimum and maximum visible angles of the subtree aerial-FOVs.

However, indexing aerial-FOVs with OR-trees results in poor performance for point and range queries due to the low effectiveness of the OR-tree's pruning strategy. The reason behind this is not trivial, and we explain it in detail below. Different from ground-FOVs, which are regular pie-shaped regions, an aerial-FOV is a quadrilateral with an arbitrary shape. It is complicated and computationally expensive to determine whether a query point (or a query range) is covered by (or overlaps with) an aerial-FOV using its 7-tuple information. For example, for the aerial-FOV f in Figure 2.2, to calculate its four corner points (A, B, C, D), we first need to compute the four points (A′, B′, C′, D′) obtained when the drone camera shoots vertically towards the ground, then rotate each of them based on the rotation angles (θ_a, θ_p, θ_r), and finally project them onto the ground. To decide whether the aerial-FOV f covers the point query Q_p(q) (where q is the query point), we need to examine whether the four angles of the vector pq to the four planes (i.e., pAB, pBC, pCD and pAD) are within the visible angle α. Hence, it is also complicated and computationally expensive to decide whether a point query is covered by an OR-tree node N using its bounding information (MBCube, MBOri, MinMaxα).

An alternative spatial index for ground-FOVs is the Grid-based Index, termed Grid [63]. Grid is a three-level grid-based index that indexes ground-FOVs' viewable scenes, camera locations and view directions individually at different levels. However, experimental results in the study [61] showed that Grid performed worse than OR-trees as it stores the ground-FOVs' information (e.g., locations and orientations) at different levels.
Additionally, Grid performs poorly for skewed distributions of FOVs since the bucket occupancy of grid files rises very steeply for skewed distributions [37]. Therefore, we do not use Grid as a baseline for indexing aerial-FOVs.

5.2 TetraR-tree

5.2.1 TetraR-tree Index Structure

To overcome the drawbacks of the R-tree [27] and the OR-tree [61], I propose a new index structure for aerial-FOVs, called the TetraR-tree. Unlike the OR-tree, which stores the sensor data (i.e., the 7-tuple Γ(lat, lng, hgt, θ_a, θ_p, θ_r, α)) directly for an aerial-FOV object f, the TetraR-tree stores its corresponding quadrilateral.

In particular, for the leaf index nodes of a TetraR-tree, instead of storing the MBRs of aerial-FOVs as in the R-tree, we store the actual aerial-FOVs (i.e., quadrilaterals). As such, each leaf index node N of a TetraR-tree contains a set of entries of the form (Oid, P), where Oid is the pointer to an aerial-FOV in the database and P is the set of four vertices of its quadrilateral. Each internal TetraR-tree node N contains a set of entries of the form (Ptr, R), where:

• Ptr is the pointer to a child index node of N;
• R is a set of four corner-MBRs (the "tetra-corner-MBRs") of the form {r_i | i ∈ [1, 4]}. The four corner points of each quadrilateral P in the subtree rooted at Ptr lie in the four corner-MBRs, respectively (see the definition in Eqn (5.1)):

N.R = { r_i | ∀P{p_1, p_2, p_3, p_4} ∈ subtree(N), p_i ∈ r_i, i ∈ [1, 4] }    (5.1)

Figure 5.1 shows an example of a TetraR-tree leaf node N_1 that includes two aerial-FOVs f_1 and f_2; its tetra-corner-MBRs (r_1, r_2, r_3, r_4) are stored in a branch of its parent node N_2.

Figure 5.1: A TetraR-tree index node including two aerial-FOVs f_1 and f_2: (a) aerial-FOV distribution; (b) TetraR-tree node information.

While the framework of the TetraR-tree construction algorithm (Algorithm 4) is similar to that of the conventional R-tree, the optimization criteria in the procedures ChooseLeaf and Split are different. The procedure ChooseLeaf chooses a leaf node to store a newly inserted aerial-FOV object. ChooseLeaf traverses the TetraR-tree from the root to a leaf node; when it visits an internal node N in the tree, it chooses the entry E of N with the least Waste, to be given in Eqn (5.3). The procedure Split splits an index node N into two nodes when N overflows. We use the standard Quadratic Split algorithm [27] based on a new Waste function. We proceed to present our newly proposed optimization criterion, Waste, for the TetraR-tree with four corner-MBRs in the index nodes.
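Before turning to the waste function, a small illustration of how the tetra-corner-MBRs of Eqn (5.1) can be maintained: the sketch below enlarges each of the four corner-MBRs of an internal entry to enclose the corresponding vertex of a newly inserted quadrilateral. It is a minimal sketch under the assumption that the four vertices of every quadrilateral are stored in a fixed order; the names TetraEntry and insert_quad are illustrative, not from the thesis.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float]
MBR = Tuple[float, float, float, float]   # (min_x, min_y, max_x, max_y)

def enlarge(mbr: MBR, p: Point) -> MBR:
    """Smallest MBR enclosing both the given MBR and the point p."""
    return (min(mbr[0], p[0]), min(mbr[1], p[1]),
            max(mbr[2], p[0]), max(mbr[3], p[1]))

@dataclass
class TetraEntry:
    """Internal TetraR-tree entry: one corner-MBR per quadrilateral vertex (Eqn 5.1)."""
    corner_mbrs: List[MBR] = field(default_factory=list)

    def insert_quad(self, quad: List[Point]) -> None:
        """Extend the four corner-MBRs to cover the four vertices of quad."""
        if not self.corner_mbrs:
            self.corner_mbrs = [(p[0], p[1], p[0], p[1]) for p in quad]
        else:
            self.corner_mbrs = [enlarge(r, p) for r, p in zip(self.corner_mbrs, quad)]

# Example: two aerial-FOV quadrilaterals with vertices given in a consistent order.
entry = TetraEntry()
entry.insert_quad([(0, 0), (4, 0), (5, 3), (-1, 3)])
entry.insert_quad([(1, 1), (5, 1), (6, 4), (0, 4)])
print(entry.corner_mbrs)
```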
Algorithm 4: Insert (R: an old TetraR-tree, E: a new entry) Output: The new TetraR-tree R after inserting E 1 N← ChooseLeaf(R, E); 2 Add E to node N; 3 if N needs to be split then 4 {N, N 0 }← Split(N, E); 5 if N.isroot() then 6 Initialize a new node M; 7 M.append(N); M.append(N 0 ); 8 Store nodes M and N 0 ; /* N is already stored */ 9 R.RootNode← M; 10 else AdjustTree(N.ParentNode, N, N 0 ); 11 else 12 Store node N; 13 if¬N.isroot() then AdjustTree(N.ParentNode, N, null) ; Procedure ChooseLeaf(R, E) 14 N← R.RootNode; 15 while N is not leaf do 16 E 0 = argmax E i ∈N Waste align (E i , E); /* Eqn(5.3) */ 17 N←E 0 .Ptr; 18 return N; Procedure Split(N, E) 19 E 1 ,E 2 = argmin E i ,E j ∈N∪{E} Waste align (E i , E j ); /* Eqn(5.3) */ 20 for each entry E 0 in N∪{E}, where E 0 6=E 1 , E 0 6=E 2 do 21 if Waste align (E 0 ,E 1 )≥ Waste align (E 0 ,E 2 ) then Classify E 0 as Group 1 ; 22 else Classify E 0 as Group 2; 23 return Group 1 and Group 2; 57 During the index construction, one method is to follow the optimization heuristic of R-tree to minimize the enlarged area of the MBRs (i.e., the full coverages) of aerial- FOVs, and we refer to this method as “fullCoverage-based optimization”. As shown in Figure 5.2a, where B{b 1 ,b 2 ,b 3 ,b 4 } and R{r 1 ,r 2 ,r 3 ,r 4 } are two TetraR-tree index nodes, the fullCoverage-based optimization will minimize the enlarged area (the gray area) of the MBRs ofB andR. However, this optimization cannot effectively group the TetraR-treeindexnodes/objectstogetherwhichhavethesimilarviewablecoverages. For example, in Figure 5.2b, the two aerial-FOVs R{r 1 ,r 2 ,r 3 ,r 4 } and G{g 1 ,g 2 ,g 3 ,g 4 } have the same MBRs but their quadrilaterals have few overlap. However, R overlaps much more withB{b 1 ,b 2 ,b 3 ,b 4 } although their MBRs are different. Consequently, we proceed to introduce a new optimization criteria, “alignment-based” waste, which considers the individual alignment wastes of the four corners of quadrilaterals. The intuition is that TetraR-tree index nodes / objects whose corner-MBRs / corner points are closer to each other have a higher probability to have the similar viewable coverages. Given a TetraR-tree entry E with the tetra-corner-MBRsR{r i |i∈ [1, 4]} and an aerial-FOV f with the quadrilateralP{p j |j∈ [1, 4]}, let ΔArea(r,p) be the enlarged space after enclosing a quadrilateral vertexp inf with an MBRr inE. The definition of ΔArea(r,p)isformulatedinEqn(5.2). Andthealignment-basedwasteofinsertingf into E, denoted by Waste align (E,f), is defined in Eqn(5.3). In Figure 5.2c, the gray area is the alignment-based waste of the two nodesR{r 1 ,r 2 ,r 3 ,r 4 } andG{g 1 ,g 2 ,g 3 ,g 4 }, and the blue dashed area is the alignment-based waste of R and the index node B{b 1 ,b 2 ,b 3 ,b 4 }. ClearlyWaste align (R,B) < Waste align (R,G), thuswiththealignment-basedoptimization, the index node R will be grouped together with B instead of G, assuming the fanout is 2. ΔArea(r,p) =Area(MBR(r,p))−Area(r) (5.2) 58 Waste align (E,f) = X r∈E,p∈f ΔArea(r,p) (5.3) (a) fullCoverage-based (b) fullCoverage-based (a bad case) (c) fullCoverage vs Alignment (d) heurAlignment-based Figure 5.2: Various optimization mechanisms during TetraR-tree construction. 5.2.2 TetraR-tree Optimization As there are four corner-MBRsinanindexnodeE andfourverticesin aquadrilateral f, there are 4! = 24 permutations (i.e., compute the alignment-based waste 24 times). 
The exhaustive approach to optimize the alignment-based waste is enumerate all the possiblecombinationsandselecttheonewiththeminimumwaste. Thisapproach,named exhAlignment, can find the minimum alignment-based waste but it is too expensive. To reduce the index construction time, we propose a heuristic approach, named heurAlignment, to find the near-optimal alignment-based waste efficiently. Specifically, heurAlignment first calculates the minimum bounding rectanglesMBR(B),MBR(R) for 59 two TetraR-tree entriesB andR. Then it aligns the four corners (i.e., upper-left, lower- left, lower-right and upper-right) of MBR(B) and MBR(R), as shown in Figure 5.2d. Finally, for each corner, match the two corner-MBRs b∈ B and r∈ R that have not been matched and are closest to the corner among B and R, respectively. With this heuristic, heurAlignment calculates the alignment-based waste only once. The deletion procedure of TetraR-tree is similar to that of the conventional R-tree. Except that the optimization criteria in the R-tree adjustment and merge are replaced with the new criteria we proposed for TetraR-tree. 5.3 Query Processing with TetraR-trees We proceed to present the query processing algorithms for point queries and range queries based on TetraR-tree. 5.3.1 Point Query Inthissection, wedevelopanefficientalgorithmtoanswerpointqueries. Atthehigh- level, the algorithm descends from the TetraR-tree root in a branch-and-bound manner, progressively checking whether each visited aerial-FOV object / index node covers the query point. Subsequently, the key problem is to decide whether to prune an aerial-FOV object / index node, or to report the aerial-FOV / index node (all the aerial-FOVs in the node) to be result(s). In the following, before presenting the search algorithm, we first present an approach to identify whether an aerial-FOV covers the query point, and then exploit it to identify whether a TetraR-tree index node should be visited or not through two newly proposed search strategies: 1) pruning strategy and 2) total hit strategy. Search Strategies for Point Queries For an aerial-FOV object, we can apply the even-odd rule algorithm [79] described in Lemma 8 – a typical point-in-polygon algorithm – to identify whether the query point q 60 is inside the aerial-FOV or not. Its complexity isO(e), where e is the number of edges of the polygon. Lemma 7 (even-odd rule [79]). Given a polygon poly and a point q, let the crossing number cn(q,poly) be the times a ray starting from the point q with any fixed direction crosses the polygon poly boundary edges. It claims that q is inside poly if and only if the crossing number cn(q,poly) is odd, or outside if it is even. Lemma 8 (Point Query for an Aerial-FOV Object). Given an aerial-FOV f P and a query point q, q is inside of f if and only if the crossing number cn(q,f) is odd, or outside if it is even. Proof. Lemma 8 is clearly true according to Lemma 7 as aerial-FOVs are quadrilateral- shaped, convex polygons. For example, in Figure 5.3, query pointq 1 is inside of the aerial-FOVf{A,B,C,D} as its crossing number cn(q 1 ,f) is odd whereas q 2 is outside as cn(q 2 ,f) is even. Figure 5.3: Even-odd rule algorithm for point query Based on the lemmas above, we can develop two search strategies. In order to prune a TetraR-tree index nodeN or to reportN to be result(s) without accessing the objects under the subtree of N, we first introduce the following several concepts. Definition4 (PossibleQuadrilateral). 
Given a TetraR-tree nodeN, we call thepossible quadrilaterals of N as a set of abstract quadrilaterals whose four vertices are respec- tively from the four corner-MBRs ofN, i.e.,{P(p 1 ,p 2 ,p 3 ,p 4 )|p i ∈N.MBR i ,i∈ [1, 4]}. 61 Definition 5 (Outer Convex Hull). Given a TetraR-tree node N, we define the outer convex hull of N, denoted by outerCH(N), as the smallest convex hull that encloses all the possible quadrilaterals of N, i.e., the union of all the possible quadrilaterals. Definition 6 (Inner Convex Hull). Given a TetraR-tree node N, we define the inner convex hull ofN, denoted byinnerCH(N), as the largest convex hull that is covered by all the possible quadrilaterals ofN, i.e., the intersection of all the possible quadrilaterals. The red convex hull in Figure 5.4 illustrates the outer convex hull of a TetraR-tree index node N and the red convex hull in Figure 5.5 shows its inner convex hull. It is non-trivial to efficiently compute the outer and inner convex hulls, and this will be covered in Sec. 16. Based on the outer / inner convex hull definitions above, we propose two novel search strategies: pruning strategy in Lemma 9 and total hit strategy in Lemma 10, to examine whether an index node N is a result or not, without accessing the aerial-FOV objects in the subtree of N. For example, in Figure 5.4, we can prune the index node N, which includes two aerial-FOVs f 1 and f 2 , as the query point Q p is outside of its outer convex hull. Note that, the query point is within the dead space (i.e., the dashed area) of the MBR of the two aerial-FOVs, and thus the index node needs to be accessed if we were indexing with an R-tree. We call an index node N a “total hit” if and only if all the objects under the subtree of N are results. If an index node N is a “total hit”, then it is unnecessary to exhaustively check all the objects in N, so the processing cost can be significantly reduced. In Figure 5.5, as the query point is inside of an index node N, we can claim N is a “total hit” and safely report both f 1 and f 2 to be results without checking whether they are results or not individually. Lemma 9 (Pruning Strategy for Point Query). Given a TetraR-tree index node N and a point query Q p (q), the index node N can be pruned if the query point q is outside of outerCH(N). 62 Proof. Lemma 9 holds as outerCH(N) is the union of all the possible quadrilaterals of N. Lemma 10 (Hit Reporting Strategy for Point Query). Given a TetraR-tree index node N and a point query Q p (q), we can report all the aerial-FOVs under the subtree of N are results if the query point q is inside of innerCH(N). Proof. Lemma 10 holds asinnerCH(N) is the intersection of all the possible quadrilat- erals of N. Figure5.4: Pruningstrategywithouterconvex hull for point query. Figure 5.5: Total hit strategy with inner con- vex hull for point query. Search Algorithm for Point Queries Based on the two new search strategies, we develop an efficient algorithm to answer point queries (see Algorithm 5). The overview of the algorithm is to descend the TetraR- tree in a branch-and-bound manner, progressively applying the two strategies to answer the point query. Convex Hull Computing We proceed to discuss the geometric challenges in the computation of the outer and inner convex hulls. 
Algorithm 5: Point Query (T: TetraR-tree root, Q_p: query point)
Output: All the aerial-FOV objects f ∈ results
 1  Initialize a stack S with the root of the TetraR-tree T;
 2  while ¬S.isEmpty() do
 3      N ← S.pop();
 4      outerCH ← getOuterCH(N);
 5      if cn(Q_p, outerCH) is even then prune N;        /* Lemmas 7 and 9 */
 6      else
 7          innerCH ← getInnerCH(N);
 8          if cn(Q_p, innerCH) is odd then
 9              results.add(N.subtree());                /* Lemmas 7 and 10 */
10          else
11              for each child node ChildN of N do
12                  S.push(ChildN);

Procedure getOuterCH(N)
13  P ← the outer-convex dominate points of N;           /* Lemma 11 */
14  return convexHull(P);                                /* Andrew's Monotone Chain Algorithm [12] */

Procedure getInnerCH(N)
15  if ∀ r_1, r_2 ∈ {MBR_UL, MBR_LL, MBR_LR, MBR_UR}: r_1 ∩ r_2 ≠ ∅ then
16      return CH_UL ∩ CH_LL ∩ CH_LR ∩ CH_UR;            /* Lemma 12 */

We utilize a study [67] which shows that the calculation of the outer convex hull for a set of regions (i.e., rectangles) can be transferred to the well-known convex hull problem [12, 82] on points, as described in Theorem 1. As shown in Figure 5.6, to compute the outer convex hull of the tetra-corner-MBRs {r_1, r_2, r_3, r_4}, we compute the convex hull of the 16 vertices {r_i(p_j) | 1 ≤ i ≤ 4, 1 ≤ j ≤ 4}. Andrew's Monotone Chain Algorithm [12], a typical solution to the convex hull problem, can compute the convex hull of a set of points with a complexity of O(n log n), where n is the number of points. To reduce the processing cost, we can reduce the number of points by removing the points that do not need to be processed. Different from the traditional convex hull problem, where points could be randomly distributed, the points of our tetra-corner-MBRs come from axis-aligned (upright) MBRs. Based on this property, we develop Lemma 11 to remove the "dominated points" of a TetraR-tree index node N (e.g., the black corner points in Figure 5.6) and thereby accelerate the outer convex hull computation of N. Lines 13 - 14 in Algorithm 5 present the pseudocode of the outer convex hull computation.

Theorem 1 (Outer Convex Hull Computing [67]). Given a set of n rectangles R = { r_i{p_1, p_2, p_3, p_4} | 1 ≤ i ≤ n }, the outer convex hull of R is the convex hull of all the corner points of all the rectangles, i.e., { r_i(p_j) | 1 ≤ i ≤ n, 1 ≤ j ≤ 4 }.

Figure 5.6: Illustration of the outer convex hull computation.

Figure 5.7: Inner convex hull: not constructed from the non-dominated corner points.

Lemma 11 (Outer Convex Dominate Points). Given a TetraR-tree index node N with tetra-corner-MBRs R, we say that the corner points of a corner-MBR r in R are boundary corner points if they lie on the boundary of the minimum bounding rectangle that encloses the four corner-MBRs R (in this case we call r a boundary corner-MBR), and we call the other corner points of r dominated corner points. The outer convex hull of N is the convex hull of its non-dominated corner points, i.e., the union of the boundary corner points and the corner points of the non-boundary corner-MBRs.

Proof. For a boundary corner-MBR r in R, each of its dominated corner points p is "dominated" by its boundary corner points p', i.e., p is inside the convex hull that passes through p'; otherwise, p would itself be a boundary corner point of r. Hence Lemma 11 holds.

The calculation of the inner convex hull for a TetraR-tree index node N is more complicated, as the vertices forming the inner convex hull are not necessarily vertices of the tetra-corner-MBRs of N, and it cannot be straightforwardly constructed from the non-dominated corner points (see Figure 5.7).
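Before turning to the inner convex hull, the outer-hull computation just described (Theorem 1 plus Andrew's monotone chain [12]) can be sketched as follows. This is a simplified illustration under the same rectangle/point tuple representation as before; the Lemma 11 filtering of dominated corner points is treated as an optional pre-processing step and omitted here.

```python
def rect_corners(rect):
    """Four corner points of an axis-aligned rectangle (min_x, min_y, max_x, max_y)."""
    x1, y1, x2, y2 = rect
    return [(x1, y1), (x1, y2), (x2, y2), (x2, y1)]

def cross(o, a, b):
    """Cross product of vectors OA and OB; > 0 means a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain, O(n log n); returns hull vertices in CCW order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:                       # build lower hull
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):             # build upper hull
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def outer_convex_hull(corner_mbrs):
    """Theorem 1: the outer convex hull of a node is the convex hull of the
    16 corner points of its four corner-MBRs (Lemma 11 filtering could prune
    this point set before calling convex_hull)."""
    points = [p for r in corner_mbrs for p in rect_corners(r)]
    return convex_hull(points)
```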
The existing solution [67] calculates the inner convex hull by computing the intersection of four convex hulls, as described in Theorem 2. For example, in Figure 5.8a, the inner convex hull is the intersection of the four convex hulls CH_UL (in blue), CH_LL (in green), CH_LR (in black) and CH_UR (in orange). However, in some cases there is no inner convex hull for a TetraR-tree node (e.g., Figure 5.8b). To avoid the expensive computation for a non-existing inner convex hull, we propose a novel technique, given in Lemma 12, to identify the cases where no inner convex hull exists. For example, in Figure 5.8b, there is no inner convex hull because the two minimum bounding rectangles MBR_UL (in blue) and MBR_LR (in red) do not overlap. The pseudocode for computing the inner convex hull is given in Lines 15 - 16 of Algorithm 5.

Theorem 2 (Inner Convex Hull Computation [67]). Given a set of n rectangles R, let CH_UL (resp. CH_LL, CH_LR, CH_UR) be the convex hull of the upper-left (resp. lower-left, lower-right, upper-right) vertices of all the rectangles in R. The inner convex hull of R is the intersection of the four convex hulls CH_UL, CH_LL, CH_LR and CH_UR.

Lemma 12 (Inner Convex Hull Pruning). Given a set of n rectangles R, let MBR_UL (resp. MBR_LL, MBR_LR, MBR_UR) be the minimum bounding rectangle that encloses the upper-left (resp. lower-left, lower-right, upper-right) vertices of all the rectangles in R. If, among these four MBRs, there are two non-overlapping MBRs, then the inner convex hull of R does not exist.

Proof. As the four convex hulls are enclosed in their corresponding MBRs (e.g., CH_UL ⊆ MBR_UL), if two of these MBRs do not overlap, then the corresponding convex hulls do not overlap either, and thus Lemma 12 holds.

Figure 5.8: Illustration of the inner convex hull computation: (a) the inner convex hull is the intersection of four convex hulls; (b) an example where no inner convex hull exists.

5.3.2 Range Query

We next present an efficient algorithm for processing range queries with the TetraR-tree. The basic idea for answering range queries on aerial-FOVs is to transform the range query problem into the point query problem by enlarging the quadrilaterals of the aerial-FOVs with the radius of the range query. As shown in Figure 5.9, given a quadrilateral (or a polygon) P{A, B, C, D}, we denote by enlargedCov the convex region obtained after enlarging each edge (and vertex) of P by a radius r, by expandedPoly the polygon obtained by expanding each edge of enlargedCov to enclose all of its rounded corners, and by trimmedPoly the polygon obtained by trimming off the rounded corners of enlargedCov. Algorithm 6 depicts how to check whether a range query Q_r(q, r) overlaps a quadrilateral P (i.e., an aerial-FOV or a polygon) or not. Specifically, we first check whether the query center q is inside expandedPoly (Line 2). If not, we return false. Otherwise, we calculate trimmedPoly to see whether it covers q (Lines 4-5). If trimmedPoly covers q, we return true. Otherwise, we perform the final check: we examine the vertices p of the polygon P to see whether one of them is within the query range (Lines 7-8).

Similar to point queries, we can also design pruning and total hit strategies for range queries. To decide whether a TetraR-tree index node N overlaps with a range query Q_r(q, r), we calculate the outer and inner convex hulls of N, outerCH(N) and innerCH(N), and then prune N if Q_r does not overlap with outerCH(N) and report N as a "total hit" if Q_r overlaps with innerCH(N) (using Algorithm 6).
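For intuition, the overlap test that Algorithm 6 implements via expandedPoly and trimmedPoly can also be expressed in an equivalent, if less index-friendly, form: the circle Q_r(q, r) overlaps a convex quadrilateral P exactly when q lies inside P or the distance from q to some edge of P is at most r. A minimal sketch of this alternative formulation (not the dissertation's Algorithm 6) is shown below, assuming the polygon vertices are given in counter-clockwise order.

```python
import math

def point_in_convex_polygon(q, poly):
    """Half-plane test for a convex polygon given in CCW order."""
    n = len(poly)
    for i in range(n):
        a, b = poly[i], poly[(i + 1) % n]
        cross = (b[0] - a[0]) * (q[1] - a[1]) - (b[1] - a[1]) * (q[0] - a[0])
        if cross < 0:        # q lies strictly on the outer side of edge (a, b)
            return False
    return True

def point_segment_distance(q, a, b):
    """Euclidean distance from point q to segment ab."""
    (ax, ay), (bx, by), (qx, qy) = a, b, q
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(qx - ax, qy - ay)
    t = ((qx - ax) * dx + (qy - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(qx - (ax + t * dx), qy - (ay + t * dy))

def range_overlaps_polygon(q, r, poly):
    """True iff the circle (q, r) overlaps the convex polygon poly (CCW vertices)."""
    if point_in_convex_polygon(q, poly):
        return True
    return any(point_segment_distance(q, poly[i], poly[(i + 1) % len(poly)]) <= r
               for i in range(len(poly)))
```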
To process a range query with a TetraR-tree, we descend the TetraR-tree in the branch-and-bound manner and repeatedly apply each of the search strategies to answer the range query.

Figure 5.9: Illustration of how to transform a range query into a point query.

Algorithm 6: Range overlap (P: a polygon, Q_r(q, r): range query with query center q and radius r)
1  expandedPoly ← expanded polygon of enlargedCov(P);
2  if cn(q, expandedPoly) is even then return false;
3  else
4      trimmedPoly ← trimmed polygon of enlargedCov(P);
5      if cn(q, trimmedPoly) is odd then return true;
6      else
7          for each vertex p of the polygon P do
8              if EucliDist(q, p) ≤ r then return true;
9          return false;

5.4 Experimental Studies

We conducted a set of experiments to evaluate the efficiency of our methods using two fundamental queries: point and range queries.

5.4.1 Experimental Methodology and Settings

Implemented Indexes. We implemented our proposed indexes and search algorithms: TetraR-tree with the full-coverage-based optimization (TetraR-fullCover), TetraR-tree with the exhaustive alignment-based optimization (TetraR-exhAlign), and TetraR-tree with the alignment-based heuristic (TetraR-heurAlign), for both point and range queries. In addition, we implemented two baselines for comparison: R-tree [27] and OR-tree [61]. For all these indexes, each index node is stored in one disk page. As the TetraR-tree stores three more MBRs than the R-tree, its fanout is around three times smaller than that of the R-tree. Therefore, we also implemented a TetraR-heurAlign variant in which the size of an index node equals four disk pages, allowing it to have the same fanout as the R-tree; we refer to this index as TetraR-heurAlign*.

Datasets. We used two types of datasets: a Real World (RW) dataset [71] and Synthetically Generated (Gen) datasets, as shown in Table 5.1. RW includes aerial videos that were recorded by the Autonomous System Lab at the Swiss Federal Institute of Technology Zurich with a solar-powered UAV (AtlantikSolar 1). To evaluate the scalability of our solutions, we synthetically generated five Gen datasets with different data sizes in log scale, from 0.1 M (million) to 1 B (billion) aerial-FOVs (see Table 5.2), using the mobile video generation algorithm [15]. This process simulates a situation where aerial-FOVs are uniformly distributed across a 16 km × 16 km area around Los Angeles Downtown, assuming several drones are initially distributed in the area and then fly randomly at a speed of around 15 km/h while recording videos with cameras mounted on the drones. Unless specified otherwise, the dataset size of 10 M aerial-FOVs is assumed in the reported experimental results.

1 http://www.atlantiksolar.ethz.ch/

Table 5.1: Datasets Statistics
                                                      RW           Gen
Total # of aerial-FOVs                                0.6 M        0.1 M ∼ 1 B
Aerial-FOV # per second                               0.2          1
Total play time made by all the aerial videos         81 hours     27.8 hours ∼ 31.71 years
Average camera moving speed (km/h)                    35           15
Average flying altitude (meter)                       400          100
Total covered area (km^2)                             2.56         256

Table 5.2: Synthetically Generated Gen Datasets
Aerial-FOV #        0.1 M        1 M          10 M          100 M        1 B
Aerial-video #      100          1 K          10 K          100 K        1 M
Total play time     27.8 hours   11.6 days    115.7 days    3.2 years    31.7 years

Queries. For point queries, we randomly generated 10,000 query points within the dataset space. For range queries, we generated 5 query sets for each dataset (RW and Gen) by increasing the query radius from 0.2 to 1.0 km in increments of 0.2 km, and we use a query radius of 0.6 km by default.
This is a reasonable variation range for aerial video queries, since users are usually interested in videos within a small specified area (e.g., a building with a 0.05 km radius, a college campus with a 0.8 km radius). Each range query set contained 10,000 queries with the same query radius, but the query centers are randomly distributed over the dataset area.

Setup and metrics. We implemented all the indexes on a server with an Intel(R) Xeon(R) CPU E3 @ 3.50 GHz, 6 GB of RAM and a 2000 GB hard disk, and used a page size of 4 KB. We evaluated their performance based on disk-resident data. For the evaluation metrics, we report the average query time (end-to-end) and the average I/O cost (measured with IOreduction in Eqn (5.4)) per query after executing the 10,000 queries of each set. We measure the I/O cost of an index A by its reduction percentage over the R-tree, given in Eqn (5.4), in which IOusage is the disk I/O usage (i.e., the actual number of disk reads and writes per second) monitored with the Linux iotop tool 2, which can provide the I/O usage for a specific process. A larger IOreduction indicates a larger I/O cost reduction over the R-tree.

IOreduction(A) = \left( 1 - \frac{IOusage(A) \times processTime(A)}{IOusage(\text{R-tree}) \times processTime(\text{R-tree})} \right) \times 100\%    (5.4)

2 https://linux.die.net/man/1/iotop

5.4.2 Performance Evaluation of Index Construction

The construction performance and space usage of the index structures for RW and Gen (1 B) are reported in Tables 5.3 and 5.4. The TetraR-tree family requires about 2 times more space than the R-tree and 30% more than the OR-tree, since the TetraR-tree needs more space (four MBRs) in each index node, resulting in higher I/O costs than both R-tree and OR-tree. Moreover, the index construction process of the TetraR-tree is more CPU intensive and hence takes slightly longer than that of the R-tree. For TetraR-exhAlign, given that it is an exhaustive approach that optimizes the index by enumerating all the possible combinations, the index construction takes even longer, about 4 times more than R-tree and 3 times more than OR-tree. We also observe that TetraR-heurAlign* needs at most 30% more construction time than TetraR-heurAlign, as more branches need to be processed when optimizing each index node, even though it has fewer index nodes.

Table 5.3: Performance of indexing structures on RW Dataset
                          R-tree    OR-tree   TetraR-tree
                                              fullCover   exhAlign   heurAlign   heurAlign*
Fanout                    203       71        53          53         53          203
Tree level #              3         4         4           4          4           3
Size (MB)                 21.00     43.89     55.26       54.12      53.26       50.06
I/O cost                  X         1.589X    2.117X      2.015X     1.919X      1.827X
Construct time (min)      1.07      1.24      1.13        5.13       1.30        1.69

Table 5.4: Performance of indexing structures on Gen Dataset (1B)
                          R-tree    OR-tree   TetraR-tree
                                              fullCover   exhAlign   heurAlign   heurAlign*
Fanout                    203       71        53          53         53          203
Tree level #              5         6         6           6          6           5
Size (GB)                 22.98     40.23     60.04       58.01      57.30       53.79
I/O cost                  X         1.657X    2.068X      2.254X     2.189X      1.929X
Construct time (hour)     3.47      4.87      4.26        17.09      5.38        6.97

5.4.3 Evaluation for Point Queries

In this set of experiments, we evaluated the performance of the indexes for point queries using the Gen dataset, varying the dataset size from 0.1 M to 1 B. As shown in Figure 5.10a, we can observe that: 1) all the TetraR-trees with the alignment-based optimization (i.e., TetraR-exhAlign, TetraR-heurAlign and TetraR-heurAlign*) significantly outperformed both R-tree and OR-tree in terms of query time. For example, TetraR-heurAlign took 80% less query time than R-tree and 70% less than OR-tree. This demonstrates the superiority of our alignment-based optimization in the TetraR-tree.
2) TetraR-fullCover took less query time than R-tree but more than OR-tree, as the fullCoverage-based optimization cannot optimize the actual waste of the aerial-FOVs in a TetraR-tree index node; thus, it performed worse than OR-tree, which combines camera locations and viewing orientations [61]. TetraR-fullCover performed better than R-tree due to our proposed novel search strategies, even though the two indexes are based on the same optimization criterion. 3) Among all the indexes, TetraR-exhAlign performed the best for point queries, as it finds the exact (minimum) alignment-based waste. However, as mentioned before, it is too time consuming for the index construction. With the alignment-based heuristic, TetraR-heurAlign reduced the index construction time significantly (more than 65% reduction) while losing little (less than 20%) efficiency in query processing. 4) TetraR-heurAlign* was marginally worse (less than 20% more query time) than TetraR-heurAlign, since the outer and inner convex hulls of an index node become slightly looser (i.e., contain more dead space) when the node holds more aerial-FOVs. In addition, their construction times were also competitive, which demonstrates that the performance of the TetraR-tree is not very sensitive to the number of disk pages per index node. Therefore, we omit TetraR-heurAlign* in the rest of the experiments.

Figure 5.10b reports the average I/O cost reduction (IOreduction) over R-tree. The overall performance improvement showed a trend similar to Figure 5.10a. The TetraR-trees with the alignment-based optimization provided at least 60% I/O cost reduction as compared to R-tree. Additionally, one can observe that the I/O cost reductions of all indexes over R-tree are comparable across the different dataset sizes.

5.4.4 Evaluation for Range Queries

In this set of experiments we evaluated the performance of the indexes for range queries using the Gen dataset. We did not implement OR-tree for range queries, since determining whether a query range overlaps with an OR-tree node, in which all the aerial-FOVs are irregular quadrilaterals, is too complicated and computationally expensive.

Figure 5.10: Point query on Gen Dataset: (a) query time; (b) I/O cost reduction.

As depicted in Figure 5.11, the overall performance of TetraR-exhAlign and TetraR-heurAlign was significant and similar to the results for point queries. Specifically, TetraR-heurAlign (resp. TetraR-exhAlign) took 84% (resp. 87%) less query time than R-tree and resulted in 75% (resp. 80%) I/O cost reduction over R-tree. Note that on the 1 B aerial-FOVs (over 30 years' worth of aerial videos), our TetraR-trees with the alignment-based optimization took less than 2 seconds to answer range queries, which makes the corresponding application feel much more interactive as compared to the 8 – 10 seconds required by the other approaches.

Figure 5.11: Range query on Gen Dataset: (a) query time; (b) I/O cost reduction.

Figure 5.12: Varying the query radius for range queries on Gen Dataset.

Figure 5.12 reports the impact of the query radius, varying the radius from 0.2 km to 1.0 km. The performance of TetraR-exhAlign and TetraR-heurAlign stayed relatively stable as the radius increased, while that of R-tree and TetraR-fullCover worsened as the query radius grew.
This is because the total hit strategy applied in our indexes can safely report all the objects in a TetraR-tree index node to be results if the node is a “total hit” without exhaustively checking all the objects in the node one by one, and thus yields a significant reduction of processing time even when the query radius grew (i.e., more query results). 5.4.5 Evaluation of Search Strategies Thissetofexperimentsevaluatetheeffectsofourproposedsearchstrategies: pruning and total hit strategies. Recall that we applied the outer convex hull for the pruning strategy and applied inner convexhullforthetotalhitstrategy. AsintroducedinSec. 16, based on the existing outer convex hull computing method (outerCH) [67], we proposed an efficient approach (outerCH*) by filtering the dominated corner points. Based on the existing solution (innerCH) [67] to the inner convex hull computing, we proposed an optimized approach (innerCH*) by identifying non-existing inner convex hulls. 74 Figure 5.13a shows the query times of TetraR-heurAlign by applying different pruning strategies (i.e., MBR, outerCH and outerCH*) for both point and range queries. Similar to R-tree, MBR-pruning uses the minimum bounding rectangle of the four corner-MBRs to filter out irrelevant objects or index nodes. In this set of experiments, the innerCH*- based total hit strategy is applied. Figure 5.13a shows that with outerCH, the query time of TetraR-heurAlign was reduced significantly (70%) for both point and range queries comparing to the MBR pruning, which demonstrates the superiority of our pruning strategy. Additionally, if applying outerCH*, it provided better (upto 80%) reduction due to the benefit of the filtering of the dominated corner points before calculating the outer convex hulls. Figure 5.13b illustrates the impact of total hit strategy for TetraR-heurAlign. Here, all the queries applied the outerCH*-based pruning strategy. We can observe that with innerCH, TetraR-heurAlign took 40% less query time than the case where no total hit strategy is applied. Meanwhile applying innerCH*, the query time was reduced by 60% for both point and range queries. This set of experiments show that total hit strategy can reduce the query time and applying the filtering technique to avoid computing a non-existing inner convex hull can reduce the processing time further. (a) Pruning strategy (b) Total hit strategy Figure 5.13: Evaluation on the search strategies 75 5.4.6 Evaluation on the Real World Dataset To evaluate the effectiveness of our indexes and algorithms in a real-world setting, we also conducted a set of experiments using the real-world (RW) dataset. As shown in Figure 5.14, the results show the same trends as the previus experiments with the Gen dataset for both point and range queries. (a) Point query (b) Range query Figure 5.14: Evaluation on RW Dataset To summarize, the experimental results demonstrate that our proposed TetraR-tree with the alignment-based optimization and our search strategies based the outer/inner convex hulls consistently outperform the two baselines for both point and range queries. 5.5 Chapter Summary In this chapter, for the first time we represented aerial videos as a series of spatial objects, i.e., irregular quadrilateral-shaped aerial-FOVs, and proposed a new index struc- ture, called TetraR-tree that effectively captured the geometric property of aerial-FOVs. 
The conventional R-tree, which index aerial-FOVs by enclosing each quadrilateral with a large MBR, suffers large dead spaces and thus have many unnecessary index node accesses. Different from R-tree which stores the MBR of aerial-FOVs, at each index node of TetraR-tree, we store four MBRs (tetra-corner-MBRs), each of which covers one of the four corner points of all the aerial-FOVs (i.e., quadrilaterals) in that index node. I 76 also proposed an alignment-based optimization to effectively and efficiently decide which quadrilaterals or TetraR-tree nodes are grouped together to minimize the dead space. Instead of expensively computing the dead spaces directly, the alignment-based opti- mization minimized the alignment waste of the four corners of all the quadrilaterals in a TetraR-tree index node. To accelerate the alignment-based waste space calculation further, I proposed a heuristic method to avoid exhaustively enumerating all the (24) alignment combinations of the four corner-MBRs. Additionally, I proposed two novel search strategies based on TetraR-tree for both point and range queries. Based on the geometric properties of the tetra-corner-MBRs in a TetraR-tree index node, we can compute two convex hulls: 1) the smallest convex hull (called outer convex hull) that encloses all the aerial-FOVs in the node, and 2) the largest convex hull (called inner convex hull) where all the aerial-FOVs in the node can cover. Subsequently, based on the outer and inter inner convex hulls, we propose two new search strategies: pruning strategy (to effectively decide whether a node can be pruned or not) and total hit strategy (to effectively decide whether all the objects in a node are in the results or not) for both point and range queries. To solve the fundamental geometric challenges in the computation of outer and inner convex hulls, I proposed a filtering technique to accelerate the convex hull computation. Extensive experiments using a real-world dataset and a large synthetically generated dataset demonstrated the superiority of TetraR-tree with the proposed search algorithms over the baselines (R-tree and OR-tree) for both point and range queries by at least 70% in terms of query time and I/O cost. 77 Chapter 6 Mobile Video Management System and Applications In the above chapters, I proposed the indexing and querying solutions to tackle the challenges on large-scale mobile video management. In this chapter, I will first introduce a mobile video management system, MediaQ [42] 1 , to demonstrate how we collect and organize geographic sensor data to organize and search mobile videos in Section 6.1. Based on MediaQ, I will give some example use cases of the geo-tagged mobile videos in the real world in Section 6.2. 6.1 MediaQ: Mobile Video Management System Mobile devices such as smartphones and tablets can capture high-resolution videos and pictures, but their data storage is limited and may not be reliable (e.g., a phone is lost or broken). Reliable backend storages are desirable (e.g., Dropbox, Google Drive, iCloud), but unfortunately, it is very difficult to efficiently search these large storage systems to find required video segments and pictures as the videos are usually file-based and without a facility to systematically organize media content with appropriate indices. This becomes especially troublesome when a huge amount of media data and a large number of users are considered. 
Moreover, current online mobile video applications (e.g., YouTube, Facebook and Flickr) mainly focus on simple services, such as the storage or sharing of media, rather than integrated services towards more value-added applications. The collection and fusion of multiple sensor streams such as the camera location, field-of-view, direction, etc., can provide a comprehensive model of the viewable scene. The objective then is to organize the video data based on the viewable scene and therefore to enable efficient retrieval of more meaningful and relevant video results for various applications. In an effort towards addressing the above challenges, we introduce MediaQ [42], a novel mobile multimedia management system to collect, organize, and search geo-tagged user-generated mobile videos and images.

1 https://mediaq.usc.edu/

6.1.1 System Architecture

The schematic design of the MediaQ system is summarized in Figure 6.1. Client-side components are for user interaction, i.e., the Mobile App and the Web App. The Mobile App is mainly for video capturing with sensed metadata and their uploading. The Web App allows searching the videos and issuing spatial crowdsourcing task requests to collect specific videos. Server-side components consist of Web Services, Video Processing, GeoCrowd Engine, Query Processing, Account Management, and Data Store. The Web Service is the interface between client-side and server-side components. The Video Processing component performs transcoding of uploaded videos so that they can be served in various players. At the same time, uploaded videos are analyzed by the visual analytics module to extract extra information about their content, such as the number of people in a scene. We can plug in open-source visual analytics algorithms here to achieve more advanced analyses, such as face recognition among a small group of people such as a user's family or friends. Automatic keyword tagging is also performed at this stage in parallel to reduce the latency at the server [78]. Metadata (captured sensor data, extracted keywords, and results from visual analytics) are stored separately from the uploaded media content within the Data Store. Query Processing supports effective searching for video content using the metadata in the database. Finally, task management for spatial crowdsourcing, GeoCrowd, can be performed via the GeoCrowd engine. Spatial crowdsourcing using smartphones is emerging as a new paradigm for data collection in an on-demand manner. We use spatial crowdsourcing [39] as the video collection mechanism in MediaQ in order to collect data efficiently and at scale. As the mobile video data collection and the sensor data accuracy are essential for the video query results, I proceed to explain the details of the data acquisition and sensor data correction components in the system.

Figure 6.1: Overall structure of the MediaQ framework with its sub-components.

6.1.2 Geo-tagged Mobile Video Data Collection and Accuracy Enhancement

Geo-tagged Mobile Video Data Collection

Within our mobile app 2, we implement a custom geospatial video module to acquire and process (synchronize and correct) the location and direction metadata along with the video streams.
We collect media contents together with their metadata by exploiting all related mobile sensors, especially those representing the spatial properties of videos. Figure 6.2 depicts the design of the mobile app. It comprises four main components, i.e., the media collection component, the user verification component, the GeoCrowd component, and the storage component. The media collection component is responsible for capturing video data and their metadata. Thus, while the user is recording a video, various sensors are enabled to collect data such as location data (from GPS) and FOV data (from the digital compass). A timer keeps track of the recorded sensor data by relating each sensor record to a timestamp. The correlation of each record with a timestamp is extremely important because video frames must be synchronized with the sensed data. In addition, user data are added to the metadata and a JSON-formatted file is created.

2 http://mediaq.usc.edu:8080/home/

Figure 6.2: Architecture of the MediaQ mobile app.

The mobile app provides the interface to register and login to the MediaQ system. After login, users can use their device to record videos and upload them to the MediaQ server. However, at times users may not have Internet access for login due to unavailable wireless coverage. In such cases users can still record a video and store it locally without logging into the system. Afterwards, when Internet access becomes available, they can upload it to the server. The reason behind this is that every video belongs to a user and the server needs to know who the owner is; we can only achieve that when the users are logged in to the system. After capturing a video, the mobile user is able to select which videos to upload, while others can remain on the device. Before uploading, the user can preview the recorded videos and their captured trajectories to ensure that each video's metadata are correct and the quality of the video is acceptable. GeoCrowd is also integrated into the MediaQ mobile app to support on-demand media collection.

Kalman Filtering-based Location Correction

In MediaQ, camera locations (latitude/longitude coordinates) are collected from the GPS sensor embedded in mobile devices. The accuracy of the location data is critical in our approach. However, in reality, the captured locations may not be highly exact, due to two reasons: 1) the varying surrounding environmental conditions (e.g., reflections of signals between tall buildings) during data acquisition, and 2) inherent sensor errors (e.g., the use of low-cost sensors in mobile devices). In our system, we enhance the accuracy of the positioning data with a post-processing step immediately after the server receives the metadata. We have devised a data correction algorithm based on Kalman filtering as follows.

An original GPS reading p_k is always accompanied by an accuracy measurement value a_k.
The accuracy measure indicates the degree of closeness between a GPS measurement p_k and its true, but unknown, position, say g_k. If a_k is high, the actual position g_k is likely to be far away from p_k. We utilize a model of location measurement noise based on p_k and a_k [51], where the probability distribution of the real position is assumed to be normal with mean p_k and standard deviation σ_k. We then set σ_k^2 = g(a_k), where the function g is monotonically increasing.

We model the correction process in accordance with the framework of Kalman filters. Two streams of noisy data are recursively operated on to produce an optimal estimate of the underlying positions. We describe the position and velocity of the GPS receiver by the linear state

\pi_k = \begin{bmatrix} x_k & y_k & v_{\kappa x} & v_{\kappa y} \end{bmatrix}^T,

where v_{\kappa x} and v_{\kappa y} are the longitude and latitude components of the velocity v_\kappa. In practice, v_\kappa can be estimated from some less uncertain coordinates and their timestamp information. We define the state transition model F_k as

F_k = \begin{bmatrix} 1 & 0 & \Delta t_k & 0 \\ 0 & 1 & 0 & \Delta t_k \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix},

where \Delta t_k is the time duration between t_k and t_{k-1}. We also express the observation model H_k as

H_k = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}.

H_k maps the true state space into the measured space. For the measurement noise model, we use a_k to represent the covariance matrix R_k of the observation noise as follows:

R_k = \begin{bmatrix} g(a_k) & 0 \\ 0 & g(a_k) \end{bmatrix}.

Similarly, Q_k can also be determined by a diagonal matrix, but using the average of g(a_\delta), where the corresponding position coordinates p_\delta and timestamps t_\delta were used to estimate v_\kappa in this segment.

We apply this process model to the recursive estimator in two alternating phases. The first phase is the prediction, which advances the state until the next scheduled measurement arrives. Second, we incorporate the measurement value to update the state.

Figure 6.3: Cumulative distribution function (CDF) of the average error distance to the ground truth of each GPS sample (meters), for the original measurements and the processed data, with the Kalman filtering-based algorithm. The height of each point represents the fraction of GPS sequence data files whose average distance to the ground-truth positions is less than the given distance value.
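As a concrete illustration of the prediction/update cycle described above, the following is a minimal constant-velocity Kalman filter sketch in Python/NumPy. It follows the state, transition, and noise models given in this section, but the mapping g from the reported accuracy a_k to a variance and the simplified process-noise choice are placeholder assumptions, not MediaQ's exact implementation.

```python
import numpy as np

def g(acc):
    """Maps a reported GPS accuracy a_k to a measurement variance; the concrete
    monotonically increasing g used in MediaQ is not specified here (assumption)."""
    return acc ** 2

def kalman_smooth(readings):
    """readings: list of (t_k, x_k, y_k, a_k); returns corrected (x, y) positions.
    State is [x, y, v_x, v_y]; constant-velocity model as in Section 6.1.2."""
    t0, x0, y0, a0 = readings[0]
    state = np.array([x0, y0, 0.0, 0.0])        # first reading initializes the state
    P = np.eye(4) * g(a0)                       # initial state covariance
    H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], dtype=float)
    out, t_prev = [(x0, y0)], t0
    for t, x, y, a in readings[1:]:
        dt = t - t_prev
        F = np.array([[1, 0, dt, 0],
                      [0, 1, 0, dt],
                      [0, 0, 1,  0],
                      [0, 0, 0,  1]], dtype=float)
        Q = np.eye(4) * g(a)                    # simplified process noise (assumption)
        R = np.eye(2) * g(a)                    # measurement noise from accuracy a_k
        # Prediction phase: advance the state to the new measurement time
        state = F @ state
        P = F @ P @ F.T + Q
        # Update phase: incorporate the measurement z_k = (x_k, y_k)
        z = np.array([x, y])
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        state = state + K @ (z - H @ state)
        P = (np.eye(4) - K @ H) @ P
        out.append((float(state[0]), float(state[1])))
        t_prev = t
    return out
```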
As discussed before, videos can be attached with manually typed keywords 3 The dataset is downloadable from http://mediaq.usc.edu/dataset/ 84 Total number of videos with geo-metadata 2,397 Total number of videos with both geo-metadata and contents 1,924 Total length of videos with contents (hour) 38.54 Average length per video with content (sec) 72.14 Percentage of videos which have keywords 22.78% Average camera moving speed (km/h) 4.5 Average camera rotation speed (degrees/sec) 10 Total number of users 289 Average number of videos by each user 8.29 Total number of FOVs 208,976 Total FOV number of video with contents 142,687 Average number of FOV per second 1.03 Average number FOV per video 74.16 Table 6.1: Overview of MediaQ Dataset and/or automatically tagged keywords based on their FOV coverage. In our dataset, 22.78% of the videos have keywords. Most of the videos were recorded by users casually in a walk mode. The camera moving speed is 4.5 km/h on average, and the camera rotation speed is 10 degrees/sec (i.e., the azimuth angle changing speed). In addition, our dataset were collected by 289 users, and each user collected 8.29 videos on average. Moreover, there were a total of 208, 976 FOVs available in our dataset, among which 142, 687 FOVs are associated with video contents as well. Therefore, the average FOV sampling rate is 1.03 FOVs per second, and each video is associated with 74.16 FOVs on average. Figure 6.4: MediaQ Video Distribution 85 Mobile Video Storage and Indexing MediaQ video data includes two components: video contents and their geospatial metadata. On the server side of MediaQ, video contents are stored in the file system and videos’ geo-metadata are stored in two popular database systems (MySQL 4 and Mon- goDB 5 ). MySQL and MongoDB are two different types of database systems. MySQL is a typical open-source relational database management system (RDBMS) which can support various spatial indexes, e.g., R-tree [27]. Since ground-FOVs (aerial-FOVs) are spatial objects in the shape of pie slice (quadrilaterals), we can enclose the area of each FOV with its Minimum Bounding Rectangle (MBR). With R-trees, we can use the prun- ing techniques supported by MySQL for the efficient spatial filtering. However, MySQL has following limitations: 1) its scalability (i.e., database sharding) is problematic, and 2) the extensive use of the user-defined functions require expensive maintenance cost. To overcome these drawbacks of MySQL, an alternative database system is MongoDB. To make system scalable, MongoDB is featured with auto-sharding, where a number of replica set of a master database server node can be deployable and its database par- titioning is performed automatically. Another advantage of MongoDB is that it stores data in JSON-like documents, which supports schema-free design where a newly defined attribute can easily be inserted into the existing database model. It is optimized for JSON data indexing and searching, and it also supports geospatial indexing. As our geo-sensor data uploaded from the client side is already organized as a JSON format, with MongoDB we do not need transform overhead to import into the database. There- fore, MongoDB would perform better than MySQL and we will eventually migrate to MongoDB. Furthermore, the proposed OR-tree in Chapter 4 and TetraR-tree in Chap- ter 5 can be integrated into the data management systems to support mobile video search more efficiently. 
4 https://www.mysql.com/ 5 https://www.mongodb.com/ 86 Spatial Queries on Mobile Videos MediaQcansupportregion, range, directional, keywordqueriesandtemporalqueries. These queries are the fundamental data search functions in the multimedia applications to be discussed in Section 6.2. Figure 6.5 displays the search interface with Google Maps for ground mobile videos in MediaQ; and Figure 6.6 shows the search interface with Google Earth for aerial mobile videos. Figure 6.5: MediaQ Video Search Interface (Ground Videos) Figure 6.6: MediaQ Range Query Interface (Aerial Videos) 87 Region Queries The query region in our implementation implicitly uses the entire visible area on a map interface as the rectangular region. The search engine retrieves all FOVs that overlap with the given visible rectangular region. Our implementation of this kind of query aims to quickly show all the videos on the map without constraints. Range Queries Range queries are defined by a given circle, within which all the FOVs are found that overlap with the area of the circle. The resulting FOV f(p,θ,R,α) of the range circle query (q,r) with the center point q and radius r. We can also support 3D range queries, where a query boundary could be a cube or a sphere and the FOV objects are modeled in 3D pyramids. Figure 6.6 shows a 3D cube range query on aerial videos. Note that for range queries, the camera locations of the result FOV objects could be outside of the query boundaries. Directional Queries A directional query searches all video segments whose FOV direc- tion angles are equal to or less than the range of an allowable error margin to a user- specified input direction angle. The videos to be searched are also restricted to their FOVs residing in the given range on the map interface. A user can initiate a directional query request through MediaQ GUI by defining the input direction angle which is an offest from the North. Then the directional query is automatically submitted to the server and the final query results, similar to those of other spatio-temporal queries, are rendered accordingly. Keyword Queries As mentioned above, textual keywords can be automatically be attached to incoming video frames in MediaQ. The tagged keywords (i.e., “what” meta- data) is related to the content of the videos. The textual keyword search provides an alternative and user-friendly way to search videos. In the MediaQ system, given a set of query keywords S, keyword queries are defined as finding all the video frames such that the associated keywords of each video frame contain all of the keywords in the query keyword set S. Keyword queries can be combined with region queries, range queries, and directional queries to provide richer query functions. 88 Temporal Queries Temporal queries are defined as “given a time interval, find all the videoframeswithintheduration.” Notethattheregionqueries, rangequeries, directional queries, and keyword queries described above can be combined with temporal queries, and they have been implemented in MediaQ. Video Query Result Presentation MediaQ searches mobile videos at the video frame granular level. For example, for the range query in Figure 6.7a, it does not return the entire video but only the video frames that overlap with the range query. The results of aforementioned queries are a set of FOVs, i.e., discrete video frames, which is sufficient when searching images, but not for videos. Videos should be smoothly displayed for human perception. 
Hence, MediaQ presents the results of a video query as a continuous video segment (or segments) by grouping consecutive FOVs in the same video into a video segment. However, since we are targeting mobile videos, there exist some cases where the result consists of several segments within the same video. When the time gap between two adjacent segments of the same video is large (say more than 5 seconds), individual segment will be displayed independently. However, when the time gap is small it would be desirable to display the two adjacent segments as a single segment including the set of FOVs during the time gap (even though these FOVs are not really part of the result of the given query) for a better end-user viewing experience. To achieve this, we group all the identified FOVs by their corresponding videos and rank them based on their timestamps within each group. If two consecutively retrieved FOVs within the same group (e.g., in the same video) differ by more than a given time threshold (say, 5 seconds), we divide the group into two separate video segments. To display a specific video segment in a long video, we implement this with the Media Fragment tool supported by HTML5 Video 6 . Figure 6.7b displays a video segment result for a range query. The entire video is 01:53 seconds in total and only the video segment [00:58, 01:53] overlaps with the query range. 6 https://www.w3.org/TR/media-frags/ 89 (a) Frame-level Video Search (b) Search Result in MediaQ: Video Segment [00:58, 01:53] Figure 6.7: Fine-granular Mobile Video Search 6.2 Applications of Geo-tagged Mobile Videos Since MediaQ can provide the continuous fusion of geospatial metadata and video frames, such correlated information can be used for the generation of new visual infor- mation, not only for plain display of video results. The geo-tagged mobile video search plays a prominent role in many applications. This section provides several example real world use cases. 6.2.1 Event Coverage We used MediaQ as media platform for covering the NATO Summit event that was held in Chicago in May 2012. This was a joint collaboration between the USC Integrated Media Systems Center, the Columbia College at Chicago, and the National University of Singapore. More than twenty journalism students from Columbia College at Chicago coveredthestreetsinChicagoduringtheeventusingiPhones, iPads, andAndroidphones as the video collecting devices. The focus of this experiment was mainly on real-time video data collection and searching of videos. In the experiments, we used a Linux server machine with an 8-core processor, 8GB of main memory, and 10 TB of disk space. A gigabit fiber network was connected to 90 the server. We supported two types of videos: 480p (720× 480 or 640× 480) and 360p (540×360or 480×360). 480pwastheoriginalvideoqualityrecordedwithmobiledevices, whose bandwidth requirement was around 2 to 3 Mbps. 360p was the transcoded video quality of the original 480p video and its bandwidth requirement was 760 kbps. By default, 360p video was served during video streaming. During a three day period, more than 250 videos were recorded and uploaded. Videos were collected in two different ways. One group of recorded videos was stored in smart- phones and uploaded later when enough Wi-Fi bandwidth was available. The other group of recorded videos was uploaded in a streaming manner for real-time viewing. 
The challenge for real-time streaming was the lack of enough cellular network band- width since thousands of people in the streets were sharing the same wireless network. Thus, we installed two directional antennae at the corner roofs of the Columbia College campus to cover two target areas. A wired fiber network was connected from the anten- nae to the MediaQ server. Several students were carrying backpacks to carry directional cantenna for wireless Wi-Fi connections to the antennae installed on the roofs. These worked as wireless access points (hotspots) for nearby students. Thus, private wireless communication was available when a line of sight between antennae and cantenna was maintained (see Figure 6.8). Figure 6.8: Screenshot: NATO Summit 2012 experiments in Chicago utilizing a custom Wi-Fi setup with range extenders and mobile access points. 91 Figure 6.9: Screenshot: Videos from the NATO Summit 2012 in Chicago. The starting position of every trajectory is marked with a pin on the map. Figure 6.10: Screenshot: Illustration of the FOV of the current frame and the GPS uncertainty (the yellow circle around the camera position). Overall, video collection and uploading from mobile devices worked satisfactory even though some transmissions were retried several times to achieve a successful upload. We observed that the performance of the server significantly depended on the transcoding time of uploaded videos (which also depends on the capacity of a server hardware). The transcodingtimeper5-second-longvideosegmentvariedfrom6to20seconds(onaverage 10 seconds). Usually, uploading happened in a bursty manner, i.e., all incoming video segments (unit segment) were scheduled for transcoding immediately. At the server, we observed that the peak number of video segment transcoding jobs in a queue was around 500 at the peak time. During the busiest day, the queue length varied from 100 to 400. 92 Since a newly uploaded video is not retrievable until all previous transcoding tasks are completed, there existed some delays between the uploading time and the viewing time. 6.2.2 GIFT: Geospatial Image and Video Filtering Tool for Computer Vision Applications Computer vision algorithms mainly focus on the analysis of a given set of input images or videos without considering what would be the most effective input dataset for the dataset. Our image/video data management techniques can provide the efficient search capabilities that would be useful for vision applications. To this end, we pro- pose a novel Geospatial Image and Video Filtering Tool (GIFT) [18, 19] that provides a general and systematic mechanisms to select the most relevant input images and video from geo-tagged mobile videos for vision applications and image / video analytics. It is observed that the geographical properties of images/videos provide context information for humans to better understand the images/videos, such as a panoramic view mapped on a specific location (e.g., Google Street View). GIFT harnesses and intelligently man- ages geospatial metadata (e.g., camera locations, viewing directions, record timestamps, textual keywords) to deal with large volumes of mobile video data. The innovative claims of GIFT include: 1) maximizing the utility of existing geospatial metadata acquisition and extraction technologies, 2) efficiently managing media content with the associated geospatial metadata collected from mobile devices for indexing and searching, 3) sup- porting various vision applications to enable scalable image/video analytics. 
GIFT Overview The GIFT framework presented in Figure 6.11 has three com- ponents: vision applications, GIFT, and the video database. The video database (e.g., MySQL) stores the geospatial metadata of videos. Video contents are stored as files. When the vision application requires a set of images/videos to process (e.g., generating a panoramic image), it sends a query request to GIFT. GIFT performs the query and returns a set of images/videos as the query result. The image/video results are trans- ferred to the application over the network. GIFT includes three main modules: Logical 93 Operations, Filtering Functions, and Sorting and Extraction, each of which is explained below. Figure 6.11: GIFT Framework The Logical Operations Module 1) receives query requests from applications, where a query request is a logical combination of a set of filtering functions, 2) converts the specific query request to a set of query functions (i.e., filtering functions) to be executed in the Filtering Functions Module, 3) gets the FOV result sets of the filtering functions and combines the FOV result sets into a result set and 4) sends the result set to the Sorting Module. GIFT supports three basic logical operations, AND, OR, XOR, for combining the filtering functions. The Filtering Functions Module provides query functions with different filtering options based on FOVs stored in the database for different purposes. We provide three types of query functions depending on the type of filtering options: spatial queries, direc- tional queries, temporal queries and two types of queries in terms of query frequency (snapshot or continuous). To accelerate the query processing for filtering functions, we support FOV indexing in GIFT (e.g., R-tree [27], OR-tree [61], TetraR-tree [59]). 94 GIFT provides three basic sorting functions: 1) sorting by distance, where the dis- tance is the Euclidean distance between the FOV’s camera location and the query point; 2) sorting by orientation, which is the angle between the FOV’s orientation and the query direction; and 3) sorting by time, i.e., the closeness of the FOV’s timestamp to the query time. The query results of GIFT consist of a set of FOVs. We further com- bine continuous FOVs that are in the same original video into video segments using the FFMPEG library 7 . To illustrate how GIFT improves the efficiency of computer vision applications, I show several video applications: panorama generation, 3D model reconstruction, and persistent object tracking in the following. Spatial Filtering for Panorama Generation By providing an omnidirectional scene through one image, panoramic images have great potential to produce an immersive sensation and a new way of visual presentation. Panoramas are useful for a large number of applications such as in monitoring systems, virtual reality and image-based rendering. Thus, we consider panoramic image gener- ation from large-scale user-generated mobile videos for an arbitrary given location. To generate good panoramas from a large set of videos efficiently, we are motivated by the following two objectives: • Acceleration of panorama stitching. Panorama stitching is time consuming because itinvolvesapipelineofcomplexalgorithmsforfeatureextraction,featurematching, image adjustment, image blending, etc. • Improving the quality of the generated panoramic images. Consecutive frames in a video typically have large visual overlap. 
Too much overlap between two adjacent 7 https://www.ffmpeg.org/ 95 video frames not only increases the unnecessary computational cost with redun- dant information [23], but also impacts blending effectiveness and thus reduces the panorama quality 8 . Therefore, the goal is to select the minimal number of key video frames from the videos based on their geographic metadata while preserving the visual quality of the generated panoramic images. Several novel key video frame selection methods have been proposed in our prior work [43] to effectively and automatically generate panoramic images from videos to achieve a high efficiency without sacrificing quality. The key video frame selection criteria of the introduced algorithms based on the geo-information are follows: • To select the video frames whose camera locations are as close as possible to the query location; • To select video frames such that every two spatially adjacent FOVs should have appropriate overlap since too much image overlap results in distortions and exces- sive processing for stitching while too little image overlap may result in stitching failure. • To select video frames whose corresponding FOVs cover the panoramic scene as much as possible. We used MediaQ as the media coverage platform for the Presidential Inauguration in Washington DC in January 2013. It was a joint collaboration with the PBS Newshour College Reporting Team. More than 15 journalism students selected from all across the United States covered the streets in Washington DC during the event using the MediaQ Android app as the video collecting device and MediaQ as the server. This experiment mainly focused on video data manipulation and presentation, especially in the generation of panoramic images from the collected videos from smartphones. 8 http://www.ou.edu/class/digitalmedia/Photomerge_Help/Using_Photomerge.htm 96 Figure 6.12: Example images from the PBS experiments. Figure 6.13: Example panoramic image: SelectedFOV# = 17, Source Video# = 3, Selection and Stitching Time = 11.8 sec. Figure 6.13 provides another example for visual verification on the panoramic image generation algorithm. This panorama was generated by selecting the 17 “best” video frames (the 17 frames were from 3 different videos) among 69,238 candidate frames. The total processing time (the selection time and the stitching time) was 11.8 seconds. This example illustrates that video geo-metadata can facilitate the generation of panoramic images efficiently. Spatial Filtering for 3D Model Reconstruction Automatic reconstruction of 3D models is attracting increasing attention in the mul- timedia community [26, 49, 24, 52, 20]. Scene recovery from video sequences requires a selection of representative video frames. With GIFT, we can also leverage the frame- attached geospatial metadata (e.g., camera locations and viewing directions) of user- generated videos available in the region to select key frames for 3D object reconstruction, as shown the framework in Figure 6.14. 97 Figure 6.14: Geospatial Filtering for 3D Model Reconstruction Figure 6.15: Illustration of Geo-based Active Key Frame Selection Algorithm A novel key video frame selection algorithm for 3D reconstruction based on videos’ geospatial properties is proposed in our prior study [89]. 
Spatial Filtering for 3D Model Reconstruction

Automatic reconstruction of 3D models is attracting increasing attention in the multimedia community [26, 49, 24, 52, 20]. Scene recovery from video sequences requires a selection of representative video frames. With GIFT, we can also leverage the frame-attached geospatial metadata (e.g., camera locations and viewing directions) of the user-generated videos available in a region to select key frames for 3D object reconstruction, as shown in the framework in Figure 6.14.

Figure 6.14: Geospatial Filtering for 3D Model Reconstruction

Figure 6.15: Illustration of Geo-based Active Key Frame Selection Algorithm

A novel key video frame selection algorithm for 3D reconstruction based on videos' geospatial properties is proposed in our prior study [89]. First, a point query is applied to filter out all the frames that do not cover the target object (a minimal sketch of this coverage test is given at the end of this subsection); after that, among the candidate frames, a key frame subset with minimal spatial coverage gain difference is extracted by incorporating a manifold structure into a reproducing kernel Hilbert space to analyze the spatial relationship among the frames. As illustrated in Figure 6.15, the black points are the frames' camera locations from an aerial view. Without loss of generality, we assume that those candidate frames record the target object in the center (denoted with a blue square) after the filtering phase. The objective of our algorithm is to select a subset of frames (denoted with red stars) that maintain a minimal, but full, coverage of the target object in the geographic space. In other words, the information loss from any viewing angle towards the target object is minimized by our key frame selection method.

In our experiments, we utilize the public geo-crowdsourced user-generated mobile video dataset from MediaQ [62] with 345 videos and 77,642 frames. We randomly selected 10 target objects (two in Singapore and eight in Los Angeles) to which we applied our active key frame selection method before the 3D reconstruction. Experimental results demonstrate that the execution time of the 3D reconstruction is shortened by 80% on average while the model quality is preserved. Due to pervasive trends and scalability advances in processing contextual data, key frame selection based on geo-sensor data analysis is practical and can complement a content-based approach.
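As a rough illustration of the point-query filtering step described above, the sketch below models an FOV as a circular sector defined by the camera location, viewing direction, viewable angle and visible distance, and keeps only the frames whose sector contains the target location. The planar coordinates, the dictionary field names and the assumption that the viewing direction uses the same angular convention as the computed bearing are simplifications for illustration.

```python
import math

def covers(camera_xy, direction_deg, view_angle_deg, visible_dist, target_xy):
    """Sector-based test: does this FOV's viewable scene contain the target point?"""
    dx, dy = target_xy[0] - camera_xy[0], target_xy[1] - camera_xy[1]
    if math.hypot(dx, dy) > visible_dist:        # beyond the visible distance
        return False
    bearing = math.degrees(math.atan2(dy, dx)) % 360.0
    diff = abs(bearing - direction_deg) % 360.0
    diff = min(diff, 360.0 - diff)
    return diff <= view_angle_deg / 2.0          # within the angular extent

def point_query(frames, target_xy):
    """Keep only the frames that may actually see the target object."""
    return [f for f in frames
            if covers((f["x"], f["y"]), f["dir"], f["angle"], f["dist"], target_xy)]
```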
Spatial Filtering for Persistent Target Tracking

Target tracking is the process of locating a moving target object (e.g., a car or a pedestrian) over time in one or multiple videos. It has a variety of uses in human-computer interaction, security and surveillance, video communication and compression, augmented reality, traffic control, medical imaging and video editing. The objective of video tracking is to associate target objects in consecutive video frames. If the tracker reaches the last frame of a video segment, or the target becomes occluded or exits the view, GIFT can be applied to select video segments which may cover the target for persistent tracking.

Figure 6.16 shows an overview of the persistent tracking system. We first tag a target, which is then automatically tracked in the videos. If the tracker ends, e.g., the tracker reaches the last frame of the video or the tracker loses the target, the persistent tracking system issues a request to GIFT to receive videos that may cover the target to allow for subsequent re-acquisition and tracking. Video selection, re-identification and tracking are repeated and the target is persistently tracked, in an automatic fashion, across multiple video segments from different videos. As shown in Figure 6.17, if the tracker ends at timestamp t_0, then based on the last observed location P_0 and the moving direction of the target, we predict the next possible locations of the target at timestamp t'_0 using a constant velocity model. Suppose the predicted location of the target is P'_0; we then combine spatial (point or range) queries with temporal queries in GIFT to actively select video frames which may cover the target (a minimal sketch of this hand-off step is given at the end of this subsection).

Figure 6.16: An Overview of Persistent Target Tracking with GIFT

Figure 6.17: Spatial Queries for Persistent Target Tracking

Without the frame-by-frame spatial continuity of single-camera tracking, target re-acquisition across views is a difficult problem. The target is re-acquired by choosing the track with the largest affinity score of appearance and position. In computing the appearance affinity, GIFT avoids much unnecessary target matching by filtering out geographically far-apart targets. Furthermore, the latitude and longitude of the tracks can be inferred through camera calibration [32], so it is possible to compute the position affinities of the tracks based on geo-coordinates. Therefore, the accuracy and efficiency of target re-acquisition are improved with GIFT.

We conducted experiments [18, 19] with a real-world dataset (500 mobile videos and 0.5 million FOVs) by tracking 20 targets. Experimental results show that with GIFT, the overall performance of persistent tracking is improved in terms of:

• Efficiency and lower communication cost: GIFT effectively selects a small number of the most relevant video segments as the input to the tracking task. Therefore, the running time efficiency of the tracking system is improved while the amount of video data that needs to be transferred over the network is drastically reduced.

• Re-identification and tracking accuracy: It is known that re-identifying targets in a large video repository is error-prone. GIFT makes the tracking system more accurate by reducing the number of unnecessary target matches.
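The following is a minimal sketch of the hand-off step under a constant velocity model; the `gift.query` helper and the filter names below are hypothetical placeholders that only stand in for GIFT's spatial and temporal filtering functions.

```python
def predict_location(p0, velocity, t0, t1):
    """Constant velocity model: predict the target position at time t1."""
    dt = t1 - t0
    return (p0[0] + velocity[0] * dt, p0[1] + velocity[1] * dt)

def request_candidate_videos(gift, p0, velocity, t0, t1, radius=30.0):
    """Ask GIFT for video segments that may cover the target around time t1."""
    p1 = predict_location(p0, velocity, t0, t1)
    # Combine a spatial range query around the predicted location with a temporal
    # query around the hand-off time (the filter names are assumptions).
    return gift.query(logical_op="AND",
                      filters=[("range", {"center": p1, "radius": radius}),
                               ("time", {"from": t0, "to": t1})])
```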
6.2.3 Janus: Intelligent Surveillance of Criminal Activity with Geo-tagged Mobile Videos

Geo-tagged mobile videos can also be combined with other data sources (e.g., actuated surveillance cameras, geospatial data, social media data, wearable sensor data) to detect and characterize criminal activities. Their integration can compensate for sensor and modality deficiencies by using data from other available sensors and modalities. Janus [77] is our proposed integrated system that enables multi-modal data collection at scale and automates the detection of events of interest for the surveillance and reconnaissance of criminal activities (e.g., stalking, gunshots). As shown in Figure 6.18, the Janus system follows a three-tier architecture comprising data, analytics, and presentation tiers. Each tier processes incoming data streams, and can produce new streams and/or make streams or historical data available to the other tiers. Ultimately, the extracted streams of incidents and events, as well as the raw data streams, both incoming and stored, are made available to end users at the presentation tier.

Janus includes a comprehensive range of datasets: 1) building entrance badge sensor readings, 2) vehicle license plate readings, 3) incident reports from law enforcement, 4) tweets, 5) trajectories of pedestrians extracted from regular surveillance cameras, 6) videos collected with PTZ network cameras that capture high-resolution facial images, and 7) user-generated mobile videos. The mobile video dataset consists of videos recorded in a casual way (e.g., street shots), which can facilitate forensic investigation. For example, officers can find crime clues in a video taken by a passerby at the same time and location as the crime.

Figure 6.18: Janus System Architecture: Data, Analytics and Presentation tiers are shown in green, orange and blue. Color-coded arrows outline how raw and pre-processed data, incidents and events are routed through the modules of the system.

The Space-Time Cross Referencing Module in Janus provides spatial and temporal cross-referencing of raw input streams, incident streams and event streams. There is a large amount of location-based and time-based information generated and used by Janus. Examples of such data include crime incidents, video feeds and tweets. Naturally, Janus is expected to enable users to search not only by keywords but also by specifying the locations and times associated with their desired objects.

Figure 6.19 shows the search results from three different data sources for the forensic analysis of a stalking incident: (a) trajectories extracted from regular surveillance videos, (b) user-generated mobile videos with field-of-view geo-metadata, and (c) high-resolution face images from PTZ cameras. By cross-referencing multiple heterogeneous data sources and providing the tools for querying, visualizing and analyzing the data streams, the complexity of the decision-making problem is greatly simplified.

Figure 6.19: Example data that can be used for stalking forensic analysis: (a) tagged pedestrian trajectories, (b) mobile video generated by the public and (c) automatically extracted trajectories and high-resolution face images.

6.2.4 Points of Interest Detection from Geo-tagged Mobile Videos

Many people take photos and videos with smartphones to capture memorable subjects and situations at popular places and events (e.g., tourist attractions, concerts, and political rallies), and share them on social media. Such abundant and continuously generated images and videos offer online users the opportunity to learn about subjects, places, and events that caught the attention of people in an area where they live or plan to visit. For example, users may search the web to find out which attractions are currently popular in their city, or what the top-k most popular points of interest there are. Most web services today (e.g., YouTube, Facebook) answer this type of question by looking at the camera locations and timestamps that photos and videos have been tagged with. Their results, however, are inherently imprecise because the camera and the subject are usually at different locations. To avoid this issue and provide more precise and meaningful results, we can represent the visible scene of each photo or video frame with the spatial extent of its coverage area at a fine granularity (i.e., FOVs) [93, 29, 8]. As shown in Figure 6.20, where the markers on the map are the (initial) camera locations of mobile videos at Merlion Park in Singapore, we display the top-5 most popular points of interest in the area detected based on the FOV model. POI 2 and POI 3 might seem to be at the same place because the videos were taken from the same camera location, but the two POIs are actually far apart: the cameras pointed in different directions.

Figure 6.20: POI Detection based on FOV Model
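As a rough illustration of FOV-based POI detection, and only as a simplified stand-in for the methods cited above [93, 29, 8], the sketch below counts over a coarse grid how many FOVs cover each cell center and reports the k most-covered cells. The grid granularity and the externally supplied coverage test are assumptions.

```python
from collections import Counter

def top_k_pois(fovs, bbox, cell_size, k, covers):
    """covers(fov, point) tests whether an FOV's viewable scene contains a point."""
    min_x, min_y, max_x, max_y = bbox
    counts = Counter()
    y = min_y
    while y < max_y:
        x = min_x
        while x < max_x:
            center = (x + cell_size / 2.0, y + cell_size / 2.0)
            # Popularity of a cell = number of FOVs whose viewable scene covers it.
            counts[(x, y)] = sum(1 for f in fovs if covers(f, center))
            x += cell_size
        y += cell_size
    return counts.most_common(k)
```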
More Interesting Applications

Associating geospatial metadata with mobile videos has more interesting applications in disaster response [86], transportation [94, 57, 58], video summarization [93], video tag annotation [78], urban planning, tourism [17], and augmented reality [87]. For example, in disaster response and recovery, where the communication network may be damaged and thus the bandwidth is limited, the geo-metadata, which is much smaller in size, can be uploaded first; it can then be used to analyze which videos are more important in the disaster area and to prioritize the uploading of the actual video content on demand [86]. Another example is automatically generating high-quality descriptive tags for outdoor videos by using the viewable scenes of the videos, obtained from the sensor metadata, to query visible objects from geo-information databases (e.g., OpenStreetMap, https://www.openstreetmap.org/).

6.3 Chapter Summary

In this chapter, I introduced the MediaQ system, the first online mobile media management framework that lets public users store their recorded mobile multimedia content such as images and videos. MediaQ provides unprecedented capabilities in organizing and searching media contents by leveraging the underlying sensor-fusion model. MediaQ has demonstrated that the FOV model is very useful for various media applications such as spatial filtering for key video frame selection for panorama generation. Besides media collection with geospatial metadata and efficient geo-tagged mobile video data management techniques, MediaQ also supports keyword extraction, innovative content browsing and search capabilities, and media content sharing.

Based on the MediaQ platform, I studied various applications of geo-tagged mobile videos 1) to show how we can use the associated fine-granular field-of-view spatial models and 2) to demonstrate the importance of efficient and scalable video data management capabilities. By leveraging the associated geo-information and its intelligent management, we can perform spatial filtering to select the key images or video frames for computer vision applications (e.g., panorama generation, 3D model reconstruction, persistent tracking) to improve the video processing performance while preserving the visual quality. With the FOV model, we can identify more precise and meaningful points of interest. Further, the POI detection time for a large video repository in a big city can be reduced through spatial filtering [8]. Moreover, combined with other data sources or algorithms, queries on geo-tagged mobile videos can be used in criminal investigation [77], video data analysis [93, 78, 29], transportation [94, 57, 58], disaster response [86] and so on.
Chapter 7 Conclusions and Future Work

7.1 Conclusion

In my thesis, I tackled the challenges of organizing and searching large-scale user-generated mobile videos by leveraging the geospatial sensor metadata (e.g., camera locations and viewing directions) associated with the mobile videos. Ideally, each video frame can be geo-tagged with the spatial extent of its coverage area (i.e., an FOV). Consequently, a ground mobile video is represented as a sequence of ground-FOVs, and an aerial mobile video is represented as a sequence of aerial-FOVs. This effectively converts a challenging video management problem into a spatial data management problem over the FOVs. However, unlike regular spatial objects such as points or rectangles, FOVs (both ground-FOVs and aerial-FOVs) are spatial objects with both location and orientation information, which renders the existing spatial index structures inefficient.

To index and query ground-FOVs efficiently, I proposed a class of new index structures, OR-trees [60, 61]. The key idea of the OR-tree is to incorporate GPS locations and compass orientations into the index structure and its optimization in order to reduce dead space and thus unnecessary index node accesses. Experimental results demonstrated that it is not the mere consideration of orientation but the optimization criteria that take orientation into account which significantly reduce the dead space of the index nodes. To process range and directional queries on ground videos efficiently, two novel search strategies, the pruning strategy and the total hit strategy, were proposed on top of OR-trees. For example, the newly devised total hit strategy can directly report all the ground-FOVs in the subtree of an OR-tree node as results without accessing its child nodes, which yields a significant reduction in processing cost. Additionally, I developed an analytical model to compute a bound on the maximum possible improvement of OR-trees over the baseline R-tree index.

To index and query the irregular-quadrilateral-shaped aerial-FOVs efficiently, I proposed another new index structure, the TetraR-tree. Instead of enclosing the entire quadrilateral with an MBR, the TetraR-tree stores four MBRs over the corner points of the quadrilaterals in the subtree, to effectively capture the geometric properties of aerial-FOVs and reduce dead space. An alignment-based heuristic was proposed to optimize the TetraR-tree efficiently and effectively. On top of the TetraR-tree, I proposed two novel search strategies that employ the properties of the outer and inner convex hulls to expedite point and range queries on aerial videos. On large-scale generated datasets (more than 30 years' worth of videos), both OR-trees and TetraR-trees can answer spatial queries on (ground and aerial) mobile videos within a reasonable time (1–2 seconds).
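To make the idea a bit more tangible, the sketch below shows one possible, deliberately simplified layout of an index entry that maintains four corner MBRs instead of a single bounding box over the whole quadrilaterals. The fixed corner ordering and the field names are assumptions for illustration; the actual TetraR-tree construction and its alignment-based optimization heuristic are described in [59].

```python
def empty_mbr():
    """An MBR as [min_x, min_y, max_x, max_y], initially empty."""
    return [float("inf"), float("inf"), float("-inf"), float("-inf")]

def extend(mbr, point):
    """Grow an MBR so that it also encloses the given point."""
    x, y = point
    return [min(mbr[0], x), min(mbr[1], y), max(mbr[2], x), max(mbr[3], y)]

class TetraEntry:
    """Simplified index entry keeping one MBR per corner position of the FOVs."""
    def __init__(self):
        self.corner_mbrs = [empty_mbr() for _ in range(4)]

    def insert(self, quad):
        """quad: the four (x, y) corners of an aerial-FOV, in a fixed order."""
        for i, corner in enumerate(quad):
            self.corner_mbrs[i] = extend(self.corner_mbrs[i], corner)
```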
Finally, I introduced the MediaQ [42] system to demonstrate how we collect mobile videos together with the geographic metadata from the embedded sensors of mobile devices, and how we organize and search mobile videos by leveraging the underlying sensor-fusion model. I also discussed our other multimedia projects that explore the applications of geo-tagged mobile videos and their intelligent data management. For example, in GIFT [18, 19], we can perform spatial filtering with the associated geo-metadata to select the key images or video frames for computer vision applications (e.g., panorama generation, 3D model reconstruction, persistent tracking) to improve the video processing performance while preserving the visual quality. In Janus [77], we combine geo-tagged mobile videos with other data sources to provide an integrated intelligent surveillance platform.

7.2 Future Work

In the future, I intend to extend this work in the following directions.

Considering Temporal Information of Videos in the Index Structures

First, in addition to the geographic coverage, I intend to incorporate another important attribute of mobile videos, their temporal information, into the index structures. This thesis focused on tackling the challenges of mobile video indexing considering the spatial coverages and the viewing directions. However, studying the temporal aspect of mobile videos is necessary in real-world applications. Simply augmenting the index structures with a time dimension may not be efficient, as the time dimension is orthogonal to the spatial coverage dimensions (e.g., the viewing orientations). Users may change the camera shooting direction frequently while recording mobile videos. More sophisticated techniques may be needed to optimize the index structures. There may be opportunities for clustering, sampling and/or compression across time. These techniques could be used to condense the mobile video indexes.

Designing a Hybrid Index to Support both Ground and Aerial Videos

In some applications (e.g., criminal investigation, event coverage), searching both ground and aerial mobile videos may help to improve the search results. Searching ground and aerial videos separately would be time consuming. It is therefore necessary to design a hybrid index that supports spatial queries on both ground videos and aerial videos efficiently. As both ground and aerial mobile videos are recorded with mobile devices and can be geo-tagged with sensor data, one idea for combining the two video sources may be to model the spatial coverages of both ground and aerial mobile videos as 3D pyramids. Our proposed index optimization techniques could also be extended to optimize such 3D pyramid indexing. For example, incorporating both camera locations and orientations into the index optimization could reduce dead space, and the alignment-based heuristics applied in the TetraR-tree could improve the optimization performance.

Extending the Proposed Index Structures to the Cloud and Distributed Systems for Even Larger Sets of Video Data with High-Volume Data Ingestion

The number of mobile videos is growing rapidly. In line with this trend, a distributed computing infrastructure is needed to store and manage this ever-growing, large-scale video dataset. Existing cloud data stores such as Hadoop and Hive, as well as NoSQL data stores (e.g., key-value stores, column-based stores, document-based stores, graph-based stores), do not efficiently support spatial indexing for FOVs with both location and orientation information. It is necessary to migrate the proposed OR-tree and TetraR-tree indexes to the cloud for parallel processing and for video hosting services with even larger video databases. Additionally, I would like to study the insertion and update costs of our indexes and study techniques for batch insertion of mobile videos in a cloud environment with high-volume data ingestion.

FOV Model Enhancement

In the FOV model [14] used in my thesis, the visible distance and viewable angle are estimated based on the camera lens properties and zoom levels. Recall from Section 2.1.1 that this FOV model simply assumes that the target object fully captured within the video frame is a two-story, approximately 8.5-meter-tall building. Clearly, this FOV model is not precise enough for many applications. There are several additional factors that influence the effective viewable scene in a video, such as occlusions, visibility depth, resolution, etc. The FOV model could be extended and improved to account for these factors. Occlusions have been well studied in computer graphics research; we plan to incorporate an existing occlusion determination algorithm into our model. Moreover, depth sensors have recently been embedded into modern mobile devices (e.g., iPhone X, https://www.apple.com/iphone-x/). With the depth sensor data, the FOV model can be refined. Subsequently, data organization and queries on such FOVs with depth information could be exploited in more interesting applications such as 3D room mapping, robot navigation, gaming, virtual reality or augmented reality.

Reference List

[1] http://www.cisco.com/c/en/us/solutions/collateral/service-provider/visual-networking-index-vni/white_paper_c11-520862.pdf.
[2] https://fortunelords.com/youtube-statistics/.
[3] https://sproutsocial.com/insights/facebook-stats-for-marketers/#videostats.
[4] http://expandedramblings.com/index.php/drone-statistics/.
[5] http://dronelife.com/2016/07/19/8-incredible-drone-industry-stats/.
[6] https://en.wikipedia.org/wiki/2017_California_wildfires.
[7] http://enterprise.nus.edu.sg/technology-commercialisation/for-industry/technology-highlights/geovid.
[8] Efficient detection of points of interest from georeferenced visual content. In 6th ACM SIGSPATIAL International Workshop on Analytics for Big Geospatial Data, BigSpatial'17.
[9] Canada Drone. http://www.canadadrones.com/, 2017.
[10] F. Akrami and F. Zargari. An efficient compressed domain video indexing method. Multimedia Tools and Applications, 72(1):705–721, 2014.
[11] A. Alfarrarjeh, C. Shahabi, and S. H. Kim. Hybrid indexes for spatial-visual search. In Proceedings of the Thematic Workshops of ACM Multimedia 2017, Mountain View, CA, USA, October 23-27, 2017, pages 75–83, 2017.
[12] A. Andrew. Another efficient algorithm for convex hulls in two dimensions. Information Processing Letters, 9(5):216–219, 1979.
[13] G. S. Avellar, G. A. Pereira, L. C. Pimenta, and P. Iscold. Multi-UAV routing for area coverage and remote sensing with minimum time. Sensors, 15(11):27783–27803, 2015.
[14] A. S. Ay, R. Zimmermann, and S. H. Kim. Viewable scene modeling for geospatial video search. In ACM Intl. Conf. on MM, pages 309–318, 2008.
[15] S. A. Ay, S. H. Kim, and R. Zimmermann. Generating synthetic meta-data for georeferenced video management. In Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems, GIS '10, pages 280–289, New York, NY, USA, 2010. ACM.
[16] S. A. Ay, R. Zimmermann, and S. H. Kim. Relevance ranking in georeferenced video search. ACM Multimedia Systems (MMSys), 16(2):105–125, Mar. 2010.
[17] Y.-L. H. C.-C. Lin, Y. Zhang, and R. Zimmermann. A personalized trip recommendation system based on field of views. In 5th International Conference on Engineering and Applied Sciences, 2015.
[18] Y. Cai, Y. Lu, S. H. Kim, L. Nocera, and C. Shahabi. GIFT: A geospatial image and video filtering tool for computer vision applications with geo-tagged mobile videos. In 2015 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), pages 1–6, June 2015.
[19] Y. Cai, Y. Lu, S. H. Kim, L. Nocera, and C. Shahabi. Querying geo-tagged videos for vision applications using spatial metadata. EURASIP Journal on Image and Video Processing, 2017(1):19, 2017.
[20] A. R. Chowdhury, R. Chellappa, S. Krishnamurthy, and T. Vo. 3D face reconstruction from video using a generic model. In Multimedia and Expo, 2002. ICME '02. Proceedings. 2002 IEEE International Conference on, volume 1, pages 449–452. IEEE, 2002.
[21] D. J. Crandall, L. Backstrom, D. Huttenlocher, and J. Kleinberg. Mapping the world's photos. In Proceedings of the 18th International Conference on World Wide Web, pages 761–770. ACM, 2009.
[22] B. Epshtein, E. Ofek, Y. Wexler, and P. Zhang. Hierarchical photo organization using geo-relevance. In Proceedings of the 15th Annual ACM International Symposium on Advances in Geographic Information Systems, GIS '07, pages 18:1–18:7, New York, NY, USA, 2007. ACM.
[23] M. J. Fadaeieslam, M. Soryani, and M. Fathy. Efficient key frames selection for panorama generation from video. Journal of Electronic Imaging, 20(2):023015, 2011.
[24] Y. Furukawa, B. Curless, S. M. Seitz, and R. Szeliski. Towards internet-scale multi-view stereo. In Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, pages 1434–1441. IEEE, 2010.
[25] F. Gilboa-Solomon, G. Ashour, and O. Azulai. Efficient storage and retrieval of geo-referenced video from moving sensors. In Proceedings of the 21st ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, SIGSPATIAL '13, pages 404–407, New York, NY, USA, 2013. ACM.
[26] M. Goesele, N. Snavely, B. Curless, H. Hoppe, and S. M. Seitz. Multi-view stereo for community photo collections. In Computer Vision, 2007. ICCV 2007. IEEE 11th International Conference on, pages 1–8. IEEE, 2007.
[27] A. Guttman. R-trees: A dynamic index structure for spatial searching. In SIGMOD, pages 47–57, 1984.
[28] Z. Han, C. Cui, Y. Kong, F. Qin, and P. Fu. Video data model and retrieval service framework using geographic information. Transactions in GIS, 20(5):701–717, 2016.
[29] J. Hao, G. Wang, B. Seo, and R. Zimmermann. Keyframe presentation for browsing of user-generated videos on map interfaces. In Proceedings of the 19th ACM International Conference on Multimedia, MM '11, pages 1013–1016, New York, NY, USA, 2011. ACM.
[30] K. Hausman, J. Müller, A. Hariharan, N. Ayanian, and G. S. Sukhatme. Cooperative control for target tracking with onboard sensing. In the 14th Int'l Sym. on Exp. Rob. (ISER), pages 879–892, 2014.
[31] E. Hecht and A. Zajac. Optics (Geometrical Optics). Addison-Wesley Publishing Company, Inc., page 108.
[32] D. Hoiem, A. A. Efros, and M. Hebert. Putting objects in perspective. International Journal of Computer Vision, 80(1):3–15, 2008.
[33] W. Hu, N. Xie, L. Li, X. Zeng, and S. Maybank. A survey on visual content-based video indexing and retrieval. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 41(6):797–819, 2011.
[34] T.-H. Hwang, K.-H. Choi, I.-H. Joo, and J.-H. Lee. MPEG-7 metadata for video-based GIS applications. In Geoscience and Remote Sensing Symposium, 2003. IGARSS '03. Proceedings. 2003 IEEE International, volume 6, pages 3641–3643. IEEE, 2003.
[35] A. Jaffe, M. Naaman, T. Tassa, and M. Davis. Generating summaries and visualization for large collections of geo-referenced photographs. In Proceedings of the 8th ACM International Workshop on Multimedia Information Retrieval, MIR '06, pages 89–98, New York, NY, USA, 2006. ACM.
[36] V. Jain and B. Shneiderman. Data structures for dynamic queries: An analytical and experimental evaluation. In Proc. of the Workshop on Advanced Visual Interfaces. NY: ACM, pages 1–11, 1994.
[37] V. Jain and B. Shneiderman. Data structures for dynamic queries: An analytical and experimental evaluation. In Proc. of the Workshop on Advanced Visual Interfaces, pages 1–11, 1994.
[38] K. Kanistras, G. Martins, M. J. Rutherford, and K. P. Valavanis. Survey of Unmanned Aerial Vehicles (UAVs) for Traffic Monitoring, pages 2643–2666. Springer Netherlands, Dordrecht, 2015.
[39] L. Kazemi and C. Shahabi. GeoCrowd: Enabling query answering with spatial crowdsourcing. In Proceedings of the 20th International Conference on Advances in Geographic Information Systems, pages 189–198. ACM, 2012.
[40] K.-H. Kim, S.-S. Kim, S.-H. Lee, J.-H. Park, and J.-H. Lee. The interactive geographic video. In Geoscience and Remote Sensing Symposium, 2003. IGARSS '03. Proceedings. 2003 IEEE International, volume 1, pages 59–61. IEEE, 2003.
[41] S. H. Kim, S. A. Ay, B. Yu, and R. Zimmermann. Vector model in support of versatile georeferenced video search. In Proceedings of the First Annual ACM SIGMM Conference on Multimedia Systems, MMSys '10, pages 235–246, New York, NY, USA, 2010. ACM.
[42] S. H. Kim, Y. Lu, G. Constantinou, C. Shahabi, G. Wang, and R. Zimmermann. MediaQ: Mobile multimedia management system. In ACM MMSys, pages 224–235, 2014.
[43] S. H. Kim, Y. Lu, J. Shi, A. Alfarrarjeh, C. Shahabi, G. Wang, and R. Zimmermann. Key frame selection algorithms for automatic generation of panoramic images from crowdsourced geo-tagged videos. In Proc. of the Conf. on Web and Wireless Geographical Information Systems (W2GIS), pages 67–84, Seoul, South Korea, 2014.
[44] Y. Kim, J. Kim, and H. Yu. GeoTree: Using spatial information for georeferenced video search. Knowledge-Based Systems, 61:1–12, 2014.
[45] R. Kumar, H. Sawhney, and S. Samarasekera. Aerial video surveillance and exploitation. Proceedings of the IEEE, 89(10):1518–1539, 2002.
[46] D. Lee, J. Oh, W.-K. Loh, and H. Yu. GeoVideoIndex: Indexing for georeferenced videos. Information Sciences, 374:210–223, 2016.
[47] K. C. Lee, W.-C. Lee, and H. V. Leong. Nearest surrounder queries. IEEE TKDE, 22(10):1444–1458, 2010.
[48] P. Lewis, S. Fotheringham, and A. Winstanley. Spatial video and GIS. Int. J. Geogr. Inf. Sci., 25(5):697–716, May 2011.
[49] M. Lhuillier and L. Quan. A quasi-dense approach to surface reconstruction from uncalibrated images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(3):418–433, 2005.
[50] G. Li, J. Feng, and J. Xu. DESKS: Direction-aware spatial keyword search. In Proc. of the 28th IEEE ICDE, pages 474–485, 2012.
[51] K. Lin, A. Kansal, D. Lymberopoulos, and F. Zhao. Energy-accuracy aware localization for mobile devices. In Proceedings of the 8th International Conference on Mobile Systems, Applications, and Services (MobiSys '10), 2010.
[52] L. Ling, I. S. Burrent, and E. Cheng. A dense 3D reconstruction approach from uncalibrated video sequences. In Multimedia and Expo Workshops (ICMEW), 2012 IEEE International Conference on, pages 587–592. IEEE, 2012.
[53] S. Liu, H. Li, Y. Yuan, and W. Ding. A Method for UAV Real-Time Image Simulation Based on Hierarchical Degradation Model, pages 221–232. Springer Berlin Heidelberg, Berlin, Heidelberg, 2014.
[54] X. Liu, M. Corner, and P. Shenoy. SEVA: Sensor-enhanced video annotation. In Proceedings of the 13th Annual ACM International Conference on Multimedia, pages 618–627. ACM, 2005.
[55] X. Liu, S. Shekhar, and S. Chawla. Object-based directional query processing in spatial databases. Proc. of IEEE TKDE, 15(2):295–304, Feb. 2003.
[56] D. G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91–110, 2004.
[57] Y. Lu, G. Cong, J. Lu, and C. Shahabi. Efficient algorithms for answering reverse spatial-keyword nearest neighbor queries. In Proceedings of the 23rd SIGSPATIAL International Conference on Advances in Geographic Information Systems, SIGSPATIAL '15, pages 82:1–82:4, New York, NY, USA, 2015. ACM.
[58] Y. Lu, G. Jossé, T. Emrich, U. Demiryurek, M. Renz, C. Shahabi, and M. Schubert. Scenic routes now: Efficiently solving the time-dependent arc orienteering problem. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pages 487–496. ACM, 2017.
[59] Y. Lu and C. Shahabi. Efficient indexing and querying of geo-tagged aerial videos. In ACM SIGSPATIAL / GIS, 2017.
[60] Y. Lu, C. Shahabi, and S. H. Kim. An efficient index structure for large-scale geo-tagged video databases. In ACM SIGSPATIAL / GIS, pages 465–468, 2014.
[61] Y. Lu, C. Shahabi, and S. H. Kim. Efficient indexing and retrieval of large-scale geo-tagged video databases. GeoInformatica, 20(4):829–857, 2016.
[62] Y. Lu, H. To, A. Alfarrarjeh, S. H. Kim, Y. Yin, R. Zimmermann, and C. Shahabi. GeoUGV: User-generated mobile video dataset with fine granularity spatial metadata. In Proceedings of the 7th International Conference on Multimedia Systems, MMSys '16, pages 43:1–43:6, New York, NY, USA, 2016. ACM.
[63] H. Ma, S. Arslan Ay, R. Zimmermann, and S. H. Kim. Large-scale geo-tagged video indexing and queries. GeoInformatica, 18(4):671–697, 2014.
[64] H. Ma, S. A. Ay, R. Zimmermann, and S. H. Kim. A grid-based index and queries for large-scale geo-tagged video collections. In Proc. of the 17th International Conference, DASFAA Workshops, pages 216–228, 2012.
[65] P. Mordohai, J.-M. Frahm, A. Akbarzadeh, B. Clipp, C. Engels, D. Gallup, P. Merrell, C. Salmi, S. Sinha, B. Talton, et al. Real-time video-based reconstruction of urban environments. ISPRS Working Group, 4, 2007.
[66] B. S. Morse, C. H. Engh, and M. A. Goodrich. UAV video coverage quality maps and prioritized indexing for wilderness search and rescue. In Human-Robot Interaction (HRI), pages 227–234. IEEE, 2010.
[67] T. Nagai, S. Yasutome, and N. Tokura. Convex hull problem with imprecise input. In Japanese Conference on Discrete and Computational Geometry, pages 207–219. Springer, 1998.
[68] T. Navarrete and J. Blat. VideoGIS: Segmenting and indexing video based on geographic information. In Proc. of the Conf. on Geographic Information Science, pages 1–9, 2002.
[69] C.-W. Ngo, T.-C. Pong, and H.-J. Zhang. Motion analysis and segmentation through spatio-temporal slices processing. IEEE Transactions on Image Processing, 12(3):341–355, 2003.
[70] M. S. Nixon and A. S. Aguado. Feature Extraction & Image Processing for Computer Vision. Academic Press, 2012.
[71] P. Oettershagen, A. Melzer, T. Mantel, K. Rudin, T. Stastny, B. Wawrzacz, T. Hinzmann, S. Leutenegger, K. Alexis, and R. Siegwart. Design of small hand-launched solar-powered UAVs: From concept study to a multi-day world endurance record flight. Journal of Field Robotics, 2016.
[72] M. Park, J. Luo, R. T. Collins, and Y. Liu. Estimating the camera direction of a geotagged image using reference images. Pattern Recognition, 47(9):2880–2893, 2014.
[73] A. Pigeau and M. Gelgon. Building and tracking hierarchical geographical & temporal partitions for image collection management on mobile devices. In Proceedings of the 13th ACM International Conference on Multimedia, Singapore, November 6-11, 2005, pages 141–150, 2005.
[74] H. S. Sawhney, A. Arpa, R. Kumar, S. Samarasekera, M. Aggarwal, S. Hsu, D. Nister, and K. Hanna. Video flashlights: Real time rendering of multiple videos for immersive model visualization. In Proceedings of the 13th Eurographics Workshop on Rendering, pages 157–168, 2002.
[75] M. Schneider, T. Chen, G. Viswanathan, and W. Yuan. Cardinal directions between complex regions. ACM TODS, 37(2):8:1–8:40, June 2012.
[76] R. R. Shah, A. D. Shaikh, Y. Yu, W. Geng, R. Zimmermann, and G. Wu. EventBuilder: Real-time multimedia event summarization by visualizing social media. In ACM Multimedia, pages 185–188, 2015.
[77] C. Shahabi, S. H. Kim, L. Nocera, G. Constantinou, Y. Lu, Y. Cai, G. Medioni, R. Nevatia, and F. Banaei-Kashani. Janus - multi source event detection and collection system for effective surveillance of criminal activity. Journal of Information Processing Systems, 10(1):1–22, 2014.
[78] Z. Shen, S. Arslan Ay, S. H. Kim, and R. Zimmermann. Automatic tag generation and ranking for sensor-rich outdoor videos. In Proc. of the 19th ACM Intl. Conf. on Multimedia, pages 93–102, 2011.
[79] M. Shimrat. Algorithm 112: Position of point relative to polygon. Commun. ACM, 5(8):434, Aug. 1962.
[80] J. Sivic and A. Zisserman. Video Google: A text retrieval approach to object matching in videos. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), page 1470. IEEE, 2003.
[81] O. Sornil and K. Gree-Ut. An automatic text summarization approach using content-based and graph-based characteristics. In Cybernetics and Intelligent Systems, 2006 IEEE Conference on, pages 1–6. IEEE, 2006.
[82] S. Suri, K. Verbeek, and H. Yildiz. On the most likely convex hull of uncertain points. In European Symposium on Algorithms, pages 791–802, 2013.
[83] Y. Tao, D. Papadias, and J. Sun. The TPR*-tree: An optimized spatio-temporal access method for predictive queries. In Proc. of the 29th Intl. Conf. on VLDB, volume 29, pages 790–801, 2003.
[84] P. Theodorakopoulos and S. Lacroix. A strategy for tracking a ground target with a UAV. In 2008 IEEE/RSJ International Conference on Intelligent Robots and Systems, September 22-26, 2008, Acropolis Convention Center, Nice, France, pages 1254–1259, 2008.
[85] Y. Theodoridis, D. Papadias, and E. Stefanakis. Supporting direction relations in spatial database systems. In Proc. of the 7th Intl. Symposium on Spatial Data Handling (SDH '96), 1996.
[86] H. To, S. H. Kim, and C. Shahabi. Effectively crowdsourcing the acquisition and analysis of visual data for disaster response. In Big Data (Big Data), 2015 IEEE International Conference on, pages 697–706. IEEE, 2015.
[87] H. To, H. Park, S. H. Kim, and C. Shahabi. Incorporating geo-tagged mobile videos into context-aware augmented reality applications. In Multimedia Big Data (BigMM), 2016 IEEE Second International Conference on, pages 295–302. IEEE, 2016.
[88] K. Toyama, R. Logan, and A. Roseway. Geographic location tags on digital images. In Proc. of the 11th ACM Intl. Conf. on MM, pages 156–166, 2003.
[89] G. Wang, Y. Lu, L. Zhang, A. Alfarrarjeh, R. Zimmermann, S. H. Kim, and C. Shahabi. Active key frame selection for 3D model reconstruction from crowdsourced geo-tagged videos. In IEEE International Conference on Multimedia and Expo (ICME), pages 1–6, 2014.
[90] J. Xiao, H. Cheng, H. S. Sawhney, and F. Han. Vehicle detection and tracking in wide field-of-view aerial video. In The Twenty-Third IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2010, San Francisco, CA, USA, 13-18 June 2010, pages 679–684, 2010.
[91] K.-Y. Whang and R. Krishnamurthy. The multilevel grid file - a dynamic hierarchical multidimensional file structure. In Proc. Intl. Conf. on Database Systems for Advanced Applications, pages 449–459, 1991.
[92] F. X. Yu, R. Ji, and S.-F. Chang. Active query sensing for mobile location search. In Proceedings of the 19th ACM International Conference on Multimedia, MM '11, pages 3–12, New York, NY, USA, 2011. ACM.
[93] Y. Zhang and R. Zimmermann. Efficient summarization from multiple georeferenced user-generated videos. IEEE Transactions on Multimedia, 18(3):418–431, 2016.
[94] Y.-T. Zheng, S. Yan, Z.-J. Zha, Y. Li, X. Zhou, T.-S. Chua, and R. Jain. GPSView: A scenic driving route planner. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), 9(1):3, 2013.
[95] Z. Zhu and A. R. Hanson. Mosaic-based 3D scene representation and rendering. Signal Processing: Image Communication, 21(9):739–754, 2006.
[96] Z. Zhu, E. M. Riseman, A. R. Hanson, and H. Schultz. An efficient method for geo-referenced video mosaicing for environmental monitoring. Mach. Vision Appl., 16(4):203–216, Sept. 2005.