Realistic and Controllable Trajectory Generation

by Haowen Lin

A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(COMPUTER SCIENCE)

August 2024

Copyright 2024 Haowen Lin

Acknowledgements

First and foremost, I want to express my sincere gratitude to my advisor, Prof. Cyrus Shahabi, for his invaluable advice, continuous support, and patience throughout my Ph.D. journey. He has taught me how to advance my research and consistently encouraged me to seek improvement. His remarkable presentation skills have left a lasting impression on me, guiding me through many crucial steps of presenting my research, such as defining a problem and connecting with the audience through a top-down approach.

I would also like to extend my heartfelt thanks to my dissertation committee members, Prof. Bistra Dilkina and Prof. Marlon Boarnet, for their valuable feedback on my thesis.

I would also like to thank Prof. Li Xiong for her mentorship throughout nearly all of my Ph.D. program. We collaborated on various projects, and she has been especially accommodating and kind, always offering valuable advice. I am also thankful to my co-author and good friend, Lou Jian, for our close collaboration on intriguing projects in privacy and security during the early years of the program. I have learned so much from his approach to making progress in research and idea development.

I am proud and grateful to be a member of Infolab at USC. I have enjoyed fantastic times and received immense support from my labmates: Abdullah Alfarrarjeh, Ritesh Ahuja, Kien Nguyen, Luan Tran, Giorgos Constantinou, Chrysovalantis Anastasiou, Mingxuan Yue, Jiao Sun, Chaoyang He, Sina Shaham, Sepanta Zeighami, Yuehan Qin, Nripsuta (Ani) Saxena, Maria Despoina Siampou, Shang-Ling (Kate) Hsu, Bita Azarijoo, and Narges Ghasemi.
I would especially like to thank my co-author Arash Hajisafi for his significant contributions to my research. Additionally, I am grateful to Prof. Yao-Yi Chiang and other university staff for their great help in my academic journey.

Lastly, my heartfelt thanks go to my family for their understanding and encouragement. I am especially thankful to my husband for his tremendous support and advice; without him, I could not have completed this journey. His optimism and patience gave me the confidence to overcome any challenge.

Table of Contents

Acknowledgements
List of Tables
List of Figures
Abstract

Chapter 1: Introduction
  1.1 Motivations
  1.2 Thesis Statement
  1.3 Contributions
  1.4 Thesis Outline

Chapter 2: Related Work
  2.1 Trajectory Generation for Learning Mobility Patterns
    2.1.1 Trajectory Generation
    2.1.2 Moving Behavior Learning
    2.1.3 Trajectory Clustering
    2.1.4 Controlled Sequence Generation
  2.2 Continuous Trajectory Modeling
    2.2.1 Spatiotemporal Point Processes
    2.2.2 STPP for Trajectory Modeling

Chapter 3: Deep Generative Models for Realistic and Representative Trajectories
  3.1 Preliminaries
    3.1.1 Problem Formulation
    3.1.2 Sequential Decision Process
  3.2 Moving Behavior Preserving GAIL (MBP-GAIL)
    3.2.1 Overview
    3.2.2 Policy Network
      3.2.2.1 Mobility Trajectory Encoder
      3.2.2.2 Context Predictor
      3.2.2.3 Spatial Dynamics Enforcer
      3.2.2.4 Density Fusion
    3.2.3 Discriminator and Moving Behavior Classifier
  3.3 Experiments
    3.3.1 Experimental Settings
    3.3.2 Performance Comparison (RQ1)
      3.3.2.1 Evaluation Metric
      3.3.2.2 Dataset-level Evaluation
      3.3.2.3 Individual-level Evaluation
    3.3.3 Moving Behavior Evaluation (RQ2)
      3.3.3.1 Evaluation Metric
      3.3.3.2 Evaluation Results
    3.3.4 Ablation Study (RQ3)
  3.4 Chapter Summary

Chapter 4: Unified Modeling and Clustering of Mobility Trajectories
  4.1 Preliminaries
    4.1.1 Problem Definition
    4.1.2 Background
  4.2 Deep Trajectory Modeling and Clustering (DTMC)
    4.2.1 Trajectory Cluster Inference
    4.2.2 Learning Spatiotemporal Dynamics of Clusters
      4.2.2.1 Decomposing Hidden Variables
      4.2.2.2 Temporal Modeling
      4.2.2.3 Spatial Modeling
    4.2.3 Training Algorithm
      4.2.3.1 E-step: Update Cluster Assignment
      4.2.3.2 M-step: Update Model Parameters
  4.3 Experiments
    4.3.1 Dataset
      4.3.1.1 Synthetic Datasets
      4.3.1.2 Real-world Datasets
    4.3.2 Compared Methods
    4.3.3 Evaluation Metrics
    4.3.4 Clustering Performance on Synthetic Dataset
    4.3.5 Clustering Performance on Real-World Dataset
    4.3.6 Representation Learning Performance via Log-likelihood
  4.4 Chapter Summary
  4.5 Supplementary Proof of Theorems

Chapter 5: Controllable Visit Trajectory Generation with Spatiotemporal Constraints
  5.1 Preliminaries
  5.2 Constraint Enforced Trajectory Generation (Geo-CETRA)
    5.2.1 Model Overview
    5.2.2 Constraint Factorization
    5.2.3 Spatiotemporal Encoder
    5.2.4 Modeling p(τ) with Reparameterization
    5.2.5 Generation
  5.3 Experiments
    5.3.1 Dataset and Preparation
    5.3.2 Experiment Setup
    5.3.3 Baseline Methods
    5.3.4 Evaluation Metrics
    5.3.5 Overall Performance
    5.3.6 Ablation Studies
    5.3.7 Visualization
    5.3.8 Application to Location Prediction
    5.3.9 Comparison with Post-process Constraint Enforcement
  5.4 Chapter Summary
    5.4.1 Summary
    5.4.2 Discussion and Limitations

Chapter 6: Conclusions
  6.1 Summary
  6.2 Future Directions

Bibliography

List of Tables

3.1 Detailed statistics of the datasets. # POI denotes the number of POIs. Longitude and Latitude show the detailed spatial ranges of the selected cities.
3.2 Performance comparison of our model and baselines on the Houston dataset, where a lower value indicates better performance. Bold denotes the best (lowest) results and underline denotes the second-best results.
3.3 Performance comparison of our model and baselines on the Los Angeles dataset.
3.4 Performance comparison of our model and baselines on the Houston dataset, where a lower value indicates better performance. Bold denotes the best (lowest) results and underline denotes the second-best results.
3.5 Performance comparison of our model and baselines on the Los Angeles dataset.
4.1 Clustering performance comparison of our model and baselines on the synthetic datasets, where a higher value indicates better performance. Bold denotes the best (highest) results and underline denotes the second-best results.
4.2 Log-likelihood per event on the synthetic dataset (higher is better).
4.3 Log-likelihood per event on real-world data.
5.1 Dataset statistics.
5.2 Performance comparison of our model and baselines on two mobility datasets, where a lower value indicates better performance. Bold denotes the best (lowest) results and underline denotes the second-best results.
5.3 Results of the ablation study in terms of different metrics.
5.4 Comparison of post-process constraint enforcement in realistic settings.
5.5 Comparison of post-process constraint enforcement in Hotspot settings.

List of Figures

3.1 A Houston map with context types for each location cell. Left: the context type of each grid cell in Houston. Right: the map of Houston in the same region.
3.2 Illustration of the MBP-GAIL framework.
3.3 Distribution of the POI types in Houston.
3.4 Distribution of the POI types in Los Angeles.
3.5 Next location prediction in Houston and Los Angeles.
3.6 Moving behavior distributions of the real-world data, MBP-GAIL, SeqGAN, and MoveSim.
4.1 Example of trajectories with two different moving patterns.
4.2 The workflow of our proposed modeling and clustering framework.
4.3 Illustration of the spatiotemporal modeling with cluster embedding.
4.4 Visualization of the synthetic dataset. Each image represents one trajectory in a specified moving pattern. The x-axis and y-axis represent the spatial region. Dark dots correspond to the oldest points, and light dots to the most recent. Red arrows denote the moving directions.
4.5 Confusion matrix on the synthetic dataset when K = 4.
4.6 Silhouette score on the real-world dataset. A higher value represents better cluster quality.
5.1 An example of a diversified spatiotemporal constraint. To satisfy the constraint, a staypoint must appear in the defined spatial range (outlined in red) and time window (9:15-10:00).
5.2 The workflow of our proposed Geo-CETRA framework. (a) The constrained spatiotemporal (ST) model architecture, which takes past visits and generates the next point within the constrained space. (b) The per-trajectory constraint is factorized into visit-level constraints used as input to the constrained ST model. (c) The training process. (d) The evaluation process with beam decoding.
5.3 Illustration of the beam decoding procedure. In this example, B = 2.
5.4 Spatial distribution of the aggregated population in Houston.
5.5 Next location prediction accuracy based on generated trajectories.

Abstract

Accessing realistic human movements (aka trajectories) is essential for many application domains, such as urban planning, transportation, and public health (e.g., understanding the spread of an epidemic). However, due to privacy and commercial concerns, real-world trajectories are not readily available, giving rise to an important research area of generating synthetic but realistic trajectories. Traditional rule-based methods rely on predefined heuristics and distributions that fail to capture complicated transition patterns in human mobility. Inspired by the success of deep neural networks (DNNs), data-driven methods learn underlying human decision-making mechanisms and generate synthetic trajectories by directly fitting real-world data. Despite this progress, existing approaches lack mechanisms to control the generation process. This lack of control over the generated trajectories greatly limits their practical applicability. In addition, existing studies on trajectory mining applications often project GPS coordinates onto discrete geographical grids and time intervals, utilizing recurrent neural networks (RNNs) or Transformers to capture the sequential information of the trajectories for various analysis tasks such as trajectory simulation [82] and next location prediction. However, modeling human movements requires algorithms that can effectively capture inherently complex spatial and temporal dependencies; transforming trajectories into regular grids and time intervals cannot accurately model real-world trajectories with irregular moving patterns.
This thesis addresses these two shortcomings by proposing generation algorithms under various control settings.

First, existing data-driven trajectory generators assume that the next state is solely determined by mimicking individual human actions. In reality, each trajectory is influenced by an underlying purpose that collectively impacts human decisions. To address this challenge, we propose MBP-GAIL, a novel framework based on generative adversarial imitation learning that synthesizes realistic trajectories that preserve the moving behavior patterns in real data and are thus more representative. MBP-GAIL models temporal dependencies with Recurrent Neural Networks (RNNs) and combines stochastic constraints from moving behavior patterns with spatial constraints in the learning process. Through comprehensive experiments, we demonstrate that MBP-GAIL outperforms state-of-the-art methods and can better support decision-making in trajectory simulations.

Second, current trajectory generation methods often discretize space and time to model trajectory data with sequence-analysis techniques such as Transformers and LSTMs, but this discretization tends to obscure the intrinsic spatial and temporal characteristics of trajectories. Recent work shows the effectiveness of modeling trajectories directly in continuous space and time using the spatiotemporal point process (STPP). However, these approaches often assume that all observed trajectories originate from a single underlying dynamic. In reality, real-world trajectories exhibit varying dynamics or moving patterns. We hypothesize that grouping trajectories governed by similar dynamics into clusters before trajectory modeling could enhance modeling effectiveness. Thus, we present a novel approach that simultaneously models trajectories in continuous space and time using STPP while clustering them.
Our method leverages a variational Expectation-Maximization (EM) framework to iteratively improve the learning of trajectory dynamics and refine cluster assignments within a single training phase. Extensive tests on synthetic and real-world data demonstrate its effectiveness in clustering and modeling trajectories.

Third, existing approaches lack mechanisms to explicitly control the generation process by time and space range, which prevents the incorporation of prior knowledge and the spatiotemporal specification of certain visits. To address these limitations, we formally define the Constraint Trajectory Generation problem and introduce Geo-CETRA (Constraint Enforced Trajectory Generation), a novel framework that operates within the continuous spatiotemporal space, enabling direct generation of the geographical coordinates and duration of each visit in a trajectory. Geo-CETRA then reparameterizes the sampling space for effective enforcement of various spatiotemporal constraints. Finally, by incorporating a constraint factorization approach along with an innovative beam decoding module, Geo-CETRA facilitates the production of high-quality synthetic trajectories that realistically emulate human movement while satisfying predefined spatiotemporal constraints. Extensive experiments on real-world datasets validate the effectiveness of Geo-CETRA, demonstrating its ability to generate more precise and contextually appropriate trajectories than existing methods.

Chapter 1
Introduction

1.1 Motivations

Recent years have witnessed rapid advancements in location-sensing technologies, such as the Global Positioning System (GPS) and Radio Frequency Identification (RFID) embedded in mobile devices. These location-sensing technologies have enabled us to collect large amounts of spatiotemporal trajectory data, depicted as sequences of chronologically ordered geographical points of moving objects. Such sequences are crucial for a broad range of applications [100, 49, 22].
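Concretely, such a sequence can be sketched as a list of latitude/longitude/timestamp records. This is purely an illustration of the data shape, not the thesis's formal notation, and the coordinates below are made up:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Point:
    """One observation of a moving object."""
    lat: float
    lon: float
    t: float  # timestamp, e.g., seconds since midnight

# A trajectory is a chronologically ordered sequence of points.
Trajectory = List[Point]

# A hypothetical two-point morning trip (coordinates are invented):
trip: Trajectory = [
    Point(34.0224, -118.2851, 8 * 3600),         # 8:00 am
    Point(34.0407, -118.2468, 8 * 3600 + 1500),  # 8:25 am
]

# Chronological ordering is the defining invariant of the representation.
assert all(a.t < b.t for a, b in zip(trip, trip[1:]))
```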
In urban planning [1], trajectory data can inform the development of infrastructure and transportation systems, ensuring they meet the actual needs of the population. Similarly, in public health [41], analyzing movement patterns can aid in identifying areas at risk of disease outbreaks and optimizing resource allocation.

Despite their tremendous value, access to these spatiotemporal data is often restricted due to privacy concerns and commercial interests [34, 16]. For instance, trajectories possess a unique ability to connect disparate datasets, potentially revealing personally identifiable information through inference. Research indicates that it is relatively easy to identify a user's name, home address, or religious beliefs using location data [47]. Consequently, researchers often face challenges in obtaining large-scale, high-quality datasets, which are essential for developing accurate models and applications [101]. To address this issue, there is a growing interest in creating generators that can efficiently synthesize realistic trajectories. These generators use advanced algorithms to produce synthetic data that closely resemble real-world movements, providing a viable alternative for research and application development without compromising privacy. This approach not only expands the availability of useful data but also enables researchers to test and validate their models under various scenarios, ultimately advancing the field of spatiotemporal analysis and its applications.

The simulation of geospatial trajectories has been widely studied. Traditionally, rule-based methods assume that individual mobility can be described by predetermined mechanisms using a few specific mobility parameters, such as the average time spent at each visiting location [37, 66].
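As a toy sketch of what such a predetermined mechanism looks like, the following explore-or-return generator hard-codes every behavioral choice behind a hand-set parameter. The logic and all parameter values here are illustrative, not taken from the cited models:

```python
import random

def rule_based_trajectory(n_steps, p_explore=0.2, mean_stay=1800.0, seed=0):
    """Toy rule-based generator: with probability p_explore visit a new
    random location, otherwise return to a previously visited one. Stay
    durations are drawn from an exponential whose mean is a hand-set
    mobility parameter (cf. "average time spent at each visiting location")."""
    rng = random.Random(seed)
    visited = [(0.0, 0.0)]  # arbitrary starting location
    t, traj = 0.0, []
    for _ in range(n_steps):
        if rng.random() < p_explore:
            loc = (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))
            visited.append(loc)        # explore a brand-new location
        else:
            loc = rng.choice(visited)  # return to a known location
        t += rng.expovariate(1.0 / mean_stay)  # stay duration at loc
        traj.append((loc, t))
    return traj
```

Every decision above is a fixed rule with a tunable parameter, which is precisely why such models struggle with complex, data-dependent transition patterns.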
However, real-world trajectories exhibit complex transition patterns that cannot be accurately captured by simple rules. Inferred parameters are often inaccurate, and manually setting these parameters for simulations is a challenging task. Inspired by the success of deep generative neural networks in recent applications such as computer vision and natural language processing [36, 56, 58], data-driven generators such as Variational Autoencoders (VAE) [20, 33], Generative Adversarial Imitation Learning (GAIL) [30], and Generative Adversarial Networks (GAN) [27] have been developed to leverage real-world data for generating synthetic trajectories. These approaches have not only achieved superior performance in mobility trajectory generation but have also been applied to model individual daily activities (e.g., Point-of-Interest check-ins), which would be useful for tracing close contacts and understanding human daily life patterns [93].

Traditional rule-based trajectory generation algorithms are prohibitively expensive and require expert skills, making automated generation methods highly desirable. In this thesis, we formulate trajectory generation as a deep-learning-based task that can be approached using various data-driven techniques. However, generating realistic trajectories presents several challenges. This thesis specifically addresses two crucial aspects: how to integrate various types of control over the generation process to produce more practical and realistic trajectories, and how to accurately model trajectories with irregular patterns in continuous space and time.

First, existing data-driven trajectory generators are limited in that they assume the next state is completely decided by mimicking individual human actions, while in the real world, each trajectory typically comes with an underlying purpose that could collectively influence human decisions.
For example, knowing that the purpose of our travel is to commute to work suggests that we should start from a residential area and end at a business area (with perhaps a stop at a coffee shop on the way). We term such semantic information the "moving behavior" of a trajectory, i.e., the traveling purpose that describes a user's movement, such as a home-to-work commute. Consider, as another example, a leisure trip to a park: the selection of route and transportation will differ from that of commuting to work. As a result, the moving purpose plays an important role in determining the speed and locations of the trajectory generated for the trip, highlighting the difference between a leisure trip and a daily commute. The lack of control over moving behavior information in the trajectory generation process not only limits applications in advanced downstream modeling tasks [94] (e.g., precise ad targeting at locations frequently passed by specific types of moving behavior) but also makes the generation model less realistic.

Second, modeling trajectories is challenging due to the inherently irregular and asynchronous characteristics of moving dynamics, with each data point existing in continuous time and space. Prior research on trajectory mining applications often projects GPS coordinates onto discrete geographical grids and time intervals, utilizing recurrent neural networks (RNNs) or Transformers to capture the sequential information of the trajectories for various analysis tasks such as trajectory simulation [82] and next location prediction [85]. However, modeling human movements requires algorithms that can effectively capture inherently complex spatial and temporal dependencies; transforming trajectories into regular grids and time intervals cannot accurately model real-world trajectories with irregular moving patterns [93].
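The information loss from this grid-and-interval projection is easy to see in code. In the minimal sketch below, the cell size, slot length, and the two sample points are arbitrary choices for illustration, not values from the cited work:

```python
def discretize(lat, lon, t, *, lat0, lon0, cell_deg=0.01, slot_sec=3600):
    """Project a continuous GPS point onto a (grid cell, time slot) token,
    as grid-based trajectory models do. The quantization error is exactly
    the continuous detail such models discard."""
    row = int((lat - lat0) / cell_deg)
    col = int((lon - lon0) / cell_deg)
    slot = int(t // slot_sec)
    return row, col, slot

# Two points roughly 1 km and 40 minutes apart collapse to the same token:
a = discretize(34.0212, -118.2890, t=9 * 3600 + 300,  lat0=34.0, lon0=-118.3)
b = discretize(34.0298, -118.2812, t=9 * 3600 + 2700, lat0=34.0, lon0=-118.3)
assert a == b  # the discrete representation cannot tell them apart
```

Once tokenized this way, irregular inter-event times and fine-grained positions are unrecoverable, which motivates modeling trajectories directly in continuous space and time.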
Finally, we explore a control setting that focuses on explicit control over the generation process concerning location and time. Control is important, for example, to incorporate prior knowledge into the generation process. If we know that certain trajectories typically start from a residential area in the morning, proceed to a business district during working hours, and return to the residential area at night, then controlling the synthetic trajectories to adhere to these patterns results in more realistic outcomes, reflecting feasible speeds and complying with specific personal preferences. Alternatively, for specific downstream tasks such as epidemiology, we may want to ensure that synthetic trajectories pass through specific hotspots, like a concert venue, at certain times of the day. This approach would aid governments in effectively targeting interventions in specific areas.

1.2 Thesis Statement

Given the availability, importance, and privacy of trajectory data, along with the potential and challenges of generation algorithms, we argue that deep-learning-based trajectory generation algorithms can benefit from integrating explicit or implicit constraints on the trajectories. This integration not only enhances the accuracy and realism of synthetic trajectory data but also ensures that the generated sequences respect inherent patterns and variations observed in real-world data. By incorporating these constraints, we can better simulate complex movements and behaviors, leading to more effective applications in various fields such as transportation, urban planning, and security, all while maintaining the necessary privacy safeguards.
More formally, the thesis statement is:

The integration of explicit and implicit constraints within the continuous spatiotemporal domain enhances the generation of realistic and representative irregular moving sequences.

1.3 Contributions

With the goal of enhancing trajectory generation algorithms to address the limitations discussed in Section 1.1, this thesis proposes three deep-learning-based approaches and explores various control settings. It presents the challenges associated with these settings and the corresponding proposed solutions. Each approach is rigorously evaluated through extensive experiments on real-world datasets. Specifically, the contributions of this thesis are as follows:

• To exploit implicit control over the generation process regarding people's moving behaviors (e.g., work commutes, shopping trips), which significantly influence human decisions during trajectory generation, we propose MBP-GAIL (Moving Behavior Preserving GAIL), a novel framework based on generative adversarial imitation learning that synthesizes realistic trajectories that preserve the moving behavior patterns in real data and are thus more representative. MBP-GAIL employs Recurrent Neural Networks (RNNs) to effectively model temporal dependencies, and it integrates stochastic constraints derived from moving behavior patterns along with spatial constraints throughout the learning process. By incorporating these multifaceted constraints, MBP-GAIL ensures that the generated trajectories are not only realistic but also contextually relevant.

• To accurately model trajectories in continuous space and time while also capturing the underlying moving dynamics, we present a novel approach that simultaneously models trajectories in continuous space and time using STPP while clustering them.
Our method leverages a variational Expectation-Maximization (EM) framework to iteratively improve the learning of trajectory dynamics and refine cluster assignments within a single training phase. Extensive tests on synthetic and real-world data demonstrate its effectiveness in clustering and modeling trajectories.

• To explicitly control the generation process, incorporating prior knowledge and the spatiotemporal specification of certain visits in trajectories, we formally define the Constraint Trajectory Generation problem and introduce Geo-CETRA (Constraint Enforced Trajectory Generation), a novel framework that operates within the continuous spatiotemporal space, enabling direct generation of the geographical coordinates and duration of each visit in a trajectory. Geo-CETRA then reparameterizes the sampling space for effective enforcement of various spatiotemporal constraints. Finally, by incorporating a constraint factorization approach along with an innovative beam decoding module, Geo-CETRA facilitates the production of high-quality synthetic trajectories that realistically emulate human movement while satisfying predefined spatiotemporal constraints.

1.4 Thesis Outline

The structure of the thesis is organized as follows. In Chapter 2, we review the related studies of trajectory generation and continuous trajectory modeling. In Chapter 3, we propose a new framework that integrates prior moving behavior patterns into a GAIL method for trajectory generation to generate realistic and representative human mobility sequences. In Chapter 4, we further improve trajectory modeling accuracy by modeling trajectories directly in continuous space and time; we propose a novel framework that unifies modeling the trajectory dynamics and clustering trajectories by their dynamics via spatiotemporal point processes.
In Chapter 5, we study the problem of constrained trajectory generation with hard spatiotemporal constraints and propose Geo-CETRA (Constraint Enforced Trajectory Generation), a general framework that can generate high-quality synthetic trajectories in continuous space-time while effectively enforcing the satisfaction of given spatiotemporal constraints by reparameterizing the sampling space itself and controlling the decoding of the generation process. In Chapter 6, we summarize our contributions and discuss potential future work.

Chapter 2
Related Work

2.1 Trajectory Generation for Learning Mobility Patterns

2.1.1 Trajectory Generation

Simulating geospatial trajectories has been extensively studied. Existing techniques can be roughly divided into two categories: rule-based methods [66] and data-driven methods. Rule-based methods assume that human mobility traces can be modeled by realistic spatiotemporal properties with explicit physical meanings that describe the key characteristics of human mobility. For example, the modeling framework in [37] integrates the temporal (e.g., the propensity to travel or not) and spatial choices (e.g., whether to explore a new location) of individual mobility to generate synthetic trajectories. However, these parametric methods are sensitive to parameter selection, rely on simplified mobility assumptions, and usually fail to capture the inherent randomness and complexity of geospatial trajectories. To better learn trajectory patterns from real-world data, model-free methods apply deep generative models to learn the latent representation of trajectories directly from the data. Ouyang et al. [64] convert location traces into two-dimensional images and use CNN-based GANs for generation. Later, Feng et al.
[23] apply the GAN structure and design a self-attention model that sequentially captures the temporal transitions of trajectories and integrates prior knowledge of the urban structure and the pre-defined mobility regularities of spatial continuity and temporal periodicity to generate realistic trajectories. To accurately replicate the decision-making process of agents, Generative Adversarial Imitation Learning (GAIL) was developed by integrating the GAN framework [65, 82]. This decision-making process has been applied not only to spatiotemporal trajectory generation but also to activity data to find insightful activity patterns [93]. However, these approaches ignore important semantic moving behaviors and do not generate fully realistic trajectories (e.g., no stay durations), thus limiting their performance and applications.

2.1.2 Moving Behavior Learning

Another line of research on trajectories focuses on understanding and analyzing people's moving behaviors from raw GPS points. Traditionally, the semantic purposes of trajectories are collected through census and household travel surveys [24]. However, the high cost of gathering such surveys limits their sample sizes and makes it very hard to infer the choices of the entire population. Recent studies start with an unsupervised neural clustering approach to identify moving behaviors from raw trajectories [95]. They first convert trajectories into "context" trajectories augmented with nearby POIs and then apply the k-means algorithm to the embeddings extracted from RNN layers. Yue et al. apply a variational auto-encoder (VAE) to cluster "context" trajectories and group trajectories with similar moving behaviors. However, most of these approaches focus on clustering, and applying them to trajectory generation introduces additional challenges.
A more recent approach in trajectory generation that brings moving behavior into the generation process is [96], where the underlying purpose is modeled by assigning a global variable as a unit Gaussian in the latent embedding of the whole trajectory. However, they do not enforce any constraint on this global variable for moving purposes; thus, their generated trajectories may not preserve any moving behavior patterns.

2.1.3 Trajectory Clustering

Trajectory clustering methods aim to gain space-time insights from trajectory data [91]. Most clustering techniques working on raw trajectories adopt predefined distance or similarity metrics, such as the classic Euclidean, Hausdorff, and dynamic time warping (DTW) distances, suited to specific applications [3]. However, these methods are ineffective due to their strong parametric assumptions, which fail to account for the complex spatiotemporal associations underlying trajectories. Recent advances have shifted towards deep learning approaches for trajectory clustering [95, 31]. These techniques commonly utilize autoencoder-based strategies, converting trajectories into fixed-length vectors for clustering within a bifurcated training process [63]. However, such clustering methods are sensitive to small changes in the learned features. A recent work provides an end-to-end clustering algorithm considering temporal dynamics [99]; however, it focuses only on temporal data such as stocks and clinic visits, without considering spatial modeling. Another end-to-end trajectory clustering algorithm is [94], but it transforms the trajectories into a description of the POIs (as opposed to using raw GPS points as in our method) to cluster based on mobility purposes (e.g., shopping, eating).

2.1.4 Controlled Sequence Generation

Controlled sequence generation is a significant research area in the field of NLP that focuses on generating text according to specific guidelines or constraints [60, 46].
This capability is crucial for many applications, ensuring that generated text meets precise requirements for style, tone, content, or format. A variety of methods have been developed. One line of research on constrained text generation focuses on fine-tuning language models with control codes or prompt-based methods [39, 46]. Another family of approaches enforces keyword-type constraints by modifying post-processing methods and injecting constraints into the decoding algorithm using a penalizing term [8, 9, 10, 21, 26, 102, 103]. The availability of these controlled generation models has opened up new possibilities for removing biased, non-factual outputs in text generation and for enhancing human-computer interaction in various applications [7, 11, 12, 13, 55, 57, 59]. However, comprehensive research specifically focused on controlling spatiotemporal data is limited. FDSVAE [96] proposes to impose a spatial constraint (i.e., a fixed speed limit) over the entire dataset. However, this method cannot generalize to different constraints per trajectory and is thus inadequate for addressing individual generation requirements with diverse spatiotemporal constraints.

2.2 Continuous Trajectory Modeling

2.2.1 Spatiotemporal Point Processes

Modeling spatiotemporal events that are localized in continuous time and space is a critical task across many scientific fields and applications. Most work has focused on spatiotemporal data measured at regular space-time intervals [75, 87]. However, continuous-time sequence models such as spatiotemporal point processes (STPPs) provide an elegant and principled framework for modeling such irregular event data, leading to their widespread application in a diverse range of disciplines. For instance, STPPs have been instrumental in modeling earthquakes and aftershocks, providing insights into seismic activity and risk assessment [71].
They have also been used to understand the occurrence and propagation of wildfires, aiding in fire management and prevention strategies [29]. In the realm of public health, STPPs model the spread of epidemics and infectious diseases, offering crucial information for containment and mitigation efforts [61].

2.2.2 STPPs for Trajectory Modeling

Utilizing spatiotemporal point processes for trajectory modeling has become a pivotal approach in understanding and predicting the movement patterns of entities over time and space [93, 14]. Spatiotemporal point processes provide a robust framework that captures the dynamic interplay between spatial locations and temporal sequences, allowing for the generation of realistic and contextually relevant trajectories. For example, Yuan et al. [93] propose to generate artificial activity trajectories with a spatiotemporal point process. Their model captures spatiotemporal dynamics using neural differential equations, incorporating both continuous flow and instantaneous updates to model the complex transitions between activities. Long et al. propose a novel two-layer VAE-based model designed to generate practical synthetic human trajectories [50]. The framework incorporates variational temporal point processes, decoupling travel time and dwell time to enhance realism. However, these approaches have not fully exploited the inherent grouped moving dynamics patterns of real-world trajectories, leading to less realistic and less useful trajectory simulations.

Chapter 3
Deep Generative Models for Realistic and Representative Trajectories

In this chapter, we focus on trajectory generation that incorporates people's moving behaviors (e.g., work commutes, shopping trips) into the generation process. A real-world trajectory is influenced by an underlying purpose, such as commuting to work or a leisure trip. The moving purpose significantly impacts the speed and location of the generated trajectory, distinguishing between various types of trips.
Without incorporating moving behavior information, trajectory generation models become less realistic and limit the development of advanced downstream applications, such as precise ad targeting based on frequently visited locations. We propose a new framework that integrates prior moving behavior patterns into the GAIL method. Incorporating moving behavior is not a trivial task, since the raw trajectory coordinates do not contain any useful information indicating the moving behavior. Following [94], we generate the "context sequence" for each trajectory from nearby Points of Interest (POIs), then extend the notion of moving behavior (see Def. 6) to be defined as the transition patterns of context sequences, termed the Context Trajectory. Subsequently, we consider the action of a human as a joint decision influenced by past moving histories and context trajectories guided by a specific moving behavior pattern. We jointly incorporate the dynamics of the transitions between raw locations and their contexts in a generator to learn the movement policy. We also propose a discriminator to differentiate the generated trajectories from the observed real trajectories and a classifier for evaluating moving behavior patterns. Moreover, our framework is flexible enough to incorporate reasonable inductive biases in trajectory generation, such as the inherent spatial dependencies between consecutive raw locations. In summary, our main contributions are:

• To the best of our knowledge, we present the first attempt to explicitly model the intrinsic motivations of traveling purposes with context sequences to generate realistic and representative human mobility sequences.

• We propose a novel synthetic trajectory generation framework based on GAIL, which jointly models the location and context transitions informed by the learned spatial constraint.
• We conduct extensive experiments on real-world data through various evaluation metrics, and the empirical results show that, compared to the state-of-the-art methods, our method not only generates more realistic human movement trajectories but also preserves moving behavior patterns.

The remainder of the chapter is organized as follows. In Section 3.1, we introduce the preliminaries and formally define the problem. In Section 3.2, we propose our MBP-GAIL (Moving Behavior Preserving GAIL) method and outline our solution framework. In Section 3.3.4, we report our experiments on real-world data and present the results of our ablation analysis. Finally, we conclude the chapter in Section 3.4.

3.1 Preliminaries

In this section, we formally define our research problem and provide the necessary preliminary concepts.

3.1.1 Problem Formulation

We first define two types of trajectories, the mobility trajectory and the context trajectory, and then define the moving behavior based on context transitions.

Figure 3.1: A Houston map with context types for each location cell. Left (a): the context type of each grid cell in Houston (Industry Area, Commercial Area, Recreation, Education, Utilities, Health Care, Residential Area, Others). Right (b): the map of Houston in the same region.

Definition 1 (Mobility Trajectory) A mobility trajectory is a sequence of spatiotemporal points, i.e., τ^L = [τ^L_1, τ^L_2, ..., τ^L_N], where τ^L_i is a tuple (l_i, t_i), t_i is the timestamp, l_i denotes the location, which can be a pair of coordinates (lat, long) or a region identification (ID), and N is the total length of the trajectory.

Definition 2 (Time Intervals) We divide the time range into disjoint time intervals of equal length, where a time interval denotes a granular period of time in a day.

Definition 3 (Grid-based Partitioning) We partition an area with a fixed-size grid into small regions. Each region is a square cell.
The sequence of locations for the spatiotemporal points in a trajectory is then transformed into a sequence of region IDs where the points reside. Following a common practice in trajectory preprocessing [23], we first perform grid-based partitioning on the study region and use fixed time intervals to represent the temporal information. However, the raw spatiotemporal points in the trajectory do not provide useful information about the purpose of the movement, i.e., the moving behavior. Hence, we preprocess the mobility trajectories and transform the spatiotemporal trajectories into context sequences. For each location, we group and count the POIs located inside its corresponding grid cell, and, without loss of generality, the context type is represented as the category of the most counted POI type, such as Industry, Residential Area, Education, or Health Care. Figure 3.1 shows the visualization of the context type in each grid cell in Houston (left) and a map of Houston (right). We define a conversion matrix that encodes the mapping relation between locations and context types. Thus, all mobility trajectories can be transformed into context trajectories with the following definitions:

Definition 4 (Conversion Matrix) A conversion matrix used to map between locations and context types is denoted as Γ ∈ {0, 1}^{Q×K}, where Q is the total number of context types, K is the total number of locations, and

    Γ_{q,k} = 1 if the k-th location belongs to the q-th context type, and 0 otherwise.    (3.1)

Definition 5 (Context Trajectory) Similar to the mobility trajectory, a context trajectory is a chronologically ordered sequence, i.e., τ^C = [τ^C_1, τ^C_2, ..., τ^C_N], where each element τ^C_i is a context-time tuple (c_i, t_i), and c_i is the location context type, which can be obtained from the mobility trajectory and the conversion matrix by c_i = arg max(Γ × OneHot(l_i)).
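The conversion in Definitions 4 and 5 can be sketched concretely. The following is a minimal numpy sketch with toy sizes (K = 4 cells, Q = 3 context types) and made-up POI counts; the helper name `to_context` and all numbers are ours, for illustration only.

```python
import numpy as np

# Hypothetical toy setup: K = 4 grid cells, Q = 3 context types
# (0: Commercial, 1: Residential, 2: Education). POI counts per cell
# are invented for illustration.
poi_counts = np.array([
    [5, 1, 0],   # cell 0: mostly commercial POIs
    [0, 8, 1],   # cell 1: mostly residential POIs
    [2, 0, 6],   # cell 2: mostly education POIs
    [0, 3, 0],   # cell 3: mostly residential POIs
])
K, Q = poi_counts.shape

# Conversion matrix Gamma in {0,1}^{Q x K}: Gamma[q, k] = 1 iff the
# dominant POI category of cell k is context type q (Definition 4).
gamma = np.zeros((Q, K), dtype=int)
gamma[poi_counts.argmax(axis=1), np.arange(K)] = 1

def to_context(location_ids):
    """Map a sequence of region IDs to context types (Definition 5):
    c_i = argmax(Gamma @ OneHot(l_i))."""
    onehot = np.eye(K, dtype=int)[location_ids]   # (N, K) one-hot rows
    return (gamma @ onehot.T).argmax(axis=0)      # (N,) context types

mobility = [0, 1, 2, 3]           # a toy mobility trajectory (region IDs)
print(to_context(mobility))       # -> [0 1 2 1]
```

Because each column of Γ has exactly one nonzero entry, the context trajectory is fully determined by the dominant POI category of each visited cell.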
The problem of inferring moving behavior from a trajectory was first identified and explored in [95], where a deep clustering approach is used to detect the moving behavior. We also define the moving behavior using the clustering result of context trajectories, which can be elaborated as follows:

Definition 6 (Moving Behavior) The moving behaviors m ∈ {1, 2, ..., M} are the labels of the grouped trajectories that have high proximity of transition patterns in their context sequences. We apply DBSCAN clustering [62] on the edit distance [70], a common trajectory distance metric, of the context trajectories to obtain the moving behavior labels from the real trajectories.

Problem 1 (Realistic and Representative Synthetic Trajectory Generation) Given the real-world trajectories in a specific area, where each trajectory is associated with a specific moving behavior type m ∈ {1, 2, ..., M}, the goal is to mimic individuals' decision-making process and generate synthetic trajectories while retaining the individual and overall moving behavior properties of each trajectory and the entire trajectory population, respectively.

3.1.2 Sequential Decision Process

We assume each mobility trajectory is generated by an individual's sequential decisions on which location to go to under a specific traveling purpose (e.g., shopping or exercising). We formulate the problem as a Markov Decision Process (MDP). The basic elements of the MDP are defined as follows.

• State. The state s is defined as the history of the mobility trajectory until the observed step n, i.e., s_n = [τ^L_1, ..., τ^L_n], 1 ≤ n ≤ N, which includes the information of the timestamp and the location ID.

• Action. An action a represents a decision that a human agent makes at a state s_n. In our problem, it is defined as moving to location l_{n+1} at time t_{n+1}.

• State transition. The state transition controls how the state updates after the selected action, i.e., the state s_n is updated to [τ^L_1, ..., τ^L_n, τ^L_{n+1}].

• Policy.
The policy π_θ(a|s, m) characterizes the probability distribution used to choose an action at step n, given the state s and the moving behavior type m. The policy function governs how a human agent makes decisions under different circumstances.

• Reward. The reward r is an inherent function that evaluates the decision of taking an action under a state; its input is the state-action pair (s, a).

As a result, a human agent's decision-making strategy can be characterized by two functions: the policy function π_θ(a|s, m), controlling how the agent chooses an action, and the reward function r(s, a, m), governing how the agent evaluates states and actions.

3.2 Moving Behavior Preserving GAIL (MBP-GAIL)

In this section, we present a novel framework (see Fig. 3.2) entitled Moving Behavior Preserving GAIL (MBP-GAIL), which jointly learns the policy and reward functions using a generative adversarial net (GAN).

3.2.1 Overview

MBP-GAIL has two major components. The first is the policy network π_θ (the yellow component in Figure 3.2), which serves as the generator and, when given the desired moving behavior m, learns to generate an action a similar to the real-world cases based on the current state. The second component, the reward network r (the green component in Figure 3.2), consists of a discriminator trained to distinguish between policy-generated and real-world cases and a classifier to detect the moving behavior patterns. The policy network generates the action distribution λ with an RNN-based model that contains three major components: the mobility trajectory encoder, the context predictor, and the spatial dynamics enforcer. The mobility trajectory encoder generates a density vector λ^L as the base for the next-location distribution given the past observed mobility trajectory τ^L and the desired moving behavior m.
Meanwhile, the context predictor provides guidance from the perspective of context types: it first generates a context type probability distribution p^C based on m and the past context trajectory τ^C converted from the observed τ^L, and then uses the conversion matrix Γ to convert the context type distribution to its location correspondence λ^C. Lastly, the spatial dynamics enforcer incorporates spatial continuity constraints, such that the moving distance follows a Gaussian-type distribution, by generating a density factor λ^S based on the encoded latent information of τ^L.

Figure 3.2: Illustration of the MBP-GAIL framework (mobility trajectory encoder, context trajectory encoder, spatial dynamics enforcer, density fusion, action sampling, discriminator, and moving behavior classifier).

The final action distribution λ is calculated by fusing the density vectors λ^L, λ^C, and λ^S by multiplying the three together, from which we can sample an action a accordingly. The generated trajectories are then fed to 1) the discriminator D_ψ, which outputs a score for the probability of the trajectory being similar to realistic ones, and 2) the moving behavior classifier C, which evaluates whether the generated trajectories have preserved the moving behavior pattern. The policy network and the reward function are jointly optimized through the framework of GAIL to solve the following minimax problem [105]:

    max_ψ min_θ L(θ, ψ) = E_{(s,a,m)∈T_E} [log D_ψ(s, a, m)] + E_{(s,a,m)∈T_G} [log(1 − D_ψ(s, a, m))] − βH(π_θ)    (3.2)

where T_E and T_G are the observed true trajectories and the trajectories generated by the policy network π_θ under the moving behavior m, respectively. H(π_θ) is the entropy regularization term, which encourages finding the policy with maximum causal entropy.
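The two discriminator terms of the minimax objective in Eq. (3.2) can be sketched numerically. Below is a minimal numpy sketch: a linear-sigmoid scorer with random weights stands in for the RNN-based D_ψ, and the feature batches for T_E and T_G are synthetic; all names and numbers are our illustrative assumptions, not the thesis's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical stand-in for D_psi: a linear scorer over a fixed-size
# feature encoding of a (state, action, m) tuple. The actual model uses
# an RNN encoder; the weights here are random, for illustration only.
w = rng.normal(size=8)

def discriminator(features):
    # Probability that a (state, action, m) sample is real.
    return sigmoid(features @ w)

real_feats = rng.normal(loc=0.5, size=(32, 8))   # toy "expert" batch T_E
fake_feats = rng.normal(loc=-0.5, size=(32, 8))  # toy generated batch T_G

# Discriminator terms of Eq. (3.2): the discriminator maximizes
#   E_{T_E}[log D] + E_{T_G}[log(1 - D)]
d_real = discriminator(real_feats)
d_fake = discriminator(fake_feats)
loss_D = np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

# A GAIL-style policy reward, -log(1 - D), grows as the generator
# fools the discriminator (always positive since D is in (0, 1)).
reward_D = -np.log(1.0 - d_fake)
print(loss_D, reward_D.mean())
```

In training, the two networks alternate: D_ψ ascends on loss_D while π_θ is updated to increase the reward signal.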
3.2.2 Policy Network

3.2.2.1 Mobility Trajectory Encoder

The mobility trajectory encoder encodes the historical mobility movements and generates a density vector that represents the likelihood of the next location. At each timestamp, the mobility trajectory encoder first embeds the location ID l_i, time t_i, and moving behavior m via embedding layers and concatenates them into a dense representation:

    x^L_i = Concat(W^L_l × OneHot(l_i), W^L_t × OneHot(t_i), W^L_m × OneHot(m))    (3.3)

where Concat is the concatenation operation, W^L_l ∈ R^{H_Ll×K}, W^L_t ∈ R^{H_Lt×N}, and W^L_m ∈ R^{H_Lm×M} are the weight matrices, and x^L_i ∈ R^{H_Ll+H_Lt+H_Lm} is the embedding vector for the dense representation. RNNs have been widely used to predict sequential actions with spatiotemporal context [98]. To capture the sequential actions with spatiotemporal data, at every step i, the mobility trajectory encoder feeds x^L_i together with the latent vector that represents the past observed mobility trajectory into the RNN and then obtains a fixed-length latent vector h^L_i ∈ R^{H_Lh} that encodes the mobility trajectory until step i:

    h^L_i = RNN^L(x^L_i, h^L_{i−1})    (3.4)

Lastly, a multilayer perceptron (MLP) with a softmax function is used to transform the latent vector h^L_i into a density vector λ^L_i ∈ R^K that represents the likelihood of going to location l_i at time t_i as learned from the mobility trajectory:

    λ^L_i = Softmax(φ(W^L_h × h^L_i + b^L_h))    (3.5)

where W^L_h ∈ R^{K×H_Lh} and b^L_h ∈ R^K are the learnable weights in the MLP, and φ is the rectified linear unit (ReLU) activation function. Here, λ^L_i is a K-dimensional vector (K is the total number of locations) representing how likely each grid cell is to be predicted in the next step solely from the past mobility trajectory. A higher value in the density vector indicates a higher chance that the corresponding location will be generated.
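The embed-concatenate-RNN-softmax pipeline of Eqs. (3.3)-(3.5) can be sketched end to end. The following is a minimal numpy sketch with toy dimensions, an Elman-style RNN cell, and randomly initialized weights standing in for the learned parameters; the function name `encode` and all sizes are our assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
K, N_t, M = 6, 4, 3          # toy counts: locations, time slots, behaviors
H_l, H_t, H_m, H = 5, 3, 3, 8  # toy embedding and hidden sizes

# Random weights standing in for W_l^L, W_t^L, W_m^L (Eq. 3.3), the
# recurrent weights (Eq. 3.4), and the MLP head (Eq. 3.5).
W_l = rng.normal(size=(H_l, K))
W_t = rng.normal(size=(H_t, N_t))
W_m = rng.normal(size=(H_m, M))
W_x = rng.normal(size=(H, H_l + H_t + H_m))
W_h = rng.normal(size=(H, H))
W_out = rng.normal(size=(K, H))
b_out = np.zeros(K)

def onehot(i, n):
    v = np.zeros(n); v[i] = 1.0; return v

def softmax(z):
    e = np.exp(z - z.max()); return e / e.sum()

def encode(traj, m):
    """Run the encoder over [(l_i, t_i), ...] with behavior m and
    return a next-location density lambda^L over the K cells."""
    h = np.zeros(H)
    for l, t in traj:
        x = np.concatenate([W_l @ onehot(l, K),     # location embedding
                            W_t @ onehot(t, N_t),   # time embedding
                            W_m @ onehot(m, M)])    # behavior embedding
        h = np.tanh(W_x @ x + W_h @ h)              # Elman RNN step
    z = np.maximum(W_out @ h + b_out, 0.0)          # ReLU MLP
    return softmax(z)                               # K-dim density

lam_L = encode([(0, 0), (2, 1), (3, 2)], m=1)
print(lam_L.round(3), lam_L.sum())
```

The output sums to one, so it can act directly as the base density λ^L that the later fusion step reweights.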
3.2.2.2 Context Predictor

As human mobility reveals the functions and properties of urban regions [83], we consider the context of a location an important factor underlying the decision process. For example, a work-commute trajectory generally travels and stays in the work area and residential area, resulting in locations with those designated region functionalities having a higher likelihood of appearing in the trajectory. To model the simulation process guided by contextual information, we train a context predictor that shares a similar model with the RNN-followed-by-MLP architecture used in the mobility trajectory encoder. The context predictor takes the past context trajectory τ^C and the moving behavior m as inputs and generates a probability vector p^C that predicts the likelihood of the next context type. This context-type probability vector is then mapped to a density vector λ^C for guiding the location prediction in the next step. In particular, we first embed and concatenate the context sequence representation and the moving behavior embedding and feed it into an RNN encoder:

    x^C_i = Concat(W^C_c × OneHot(c_i), W^C_t × OneHot(t_i), W^C_m × OneHot(m))
    h^C_i = RNN^C(x^C_i, h^C_{i−1})    (3.6)

where W^C_c ∈ R^{H_Cc×Q}, W^C_t ∈ R^{H_Ct×N}, and W^C_m ∈ R^{H_Cm×M} are the weight matrices, x^C_i ∈ R^{H_Cc+H_Ct+H_Cm} is the embedding vector, and h^C_i ∈ R^{H_Ch} is the latent vector that encodes the context trajectory until step i. An MLP layer followed by a softmax function is then used to obtain the probability vector p^C_i ∈ R^Q for the prediction of the next context type:

    p^C_i = Softmax(φ(W^C_h × h^C_i + b^C_h), γ)    (3.7)

where W^C_h ∈ R^{Q×H_Ch} and b^C_h ∈ R^Q are the learnable weights in the MLP, and φ is the ReLU activation function. γ is a temperature-like hyperparameter in the softmax function that controls the confidence level. The temperature parameter is an important learning parameter for the exploration-exploitation trade-off in softmax action selection [28].
A low temperature will lead to low entropy in the next context type, and a high temperature will bring more randomness. We apply this tunable temperature to regulate the influence of context trajectories on the final action selection: a high temperature results in less influence from the context trajectory encoder. We then map the context probability vector p^C_i onto the parameter space of the mobility trajectory, i.e., a density vector λ^C_i ∈ R^K representing the likelihood of the corresponding locations:

    λ^C_i = Γ^⊺ × p^C_i    (3.8)

Note that Γ^⊺ ∈ {0, 1}^{K×Q} is the transpose of the conversion matrix, used to map from context types to location IDs. After this conversion, we obtain λ^C_i, a K-dimensional vector where a higher value indicates a higher chance that the corresponding location will be generated, from the context perspective.

3.2.2.3 Spatial Dynamics Enforcer

We also consider spatial continuity an important factor in generating realistic trajectories. For example, the trajectory of movements is usually limited by a speed threshold, i.e., traveling between two consecutive points in the trajectory must be physically feasible for the moving object. To learn the stochastic spatial constraints, existing work either explores a fixed physical distance [96] between two locations or designs a spatial loss to encourage the model to decrease the travel distance between mobility transitions [23]. However, a fixed constraint or enforcing the physical distance between two consecutive locations is not sufficient to capture the complexity of the environment and the variety of transportation modes. To learn the stochastic constraints, MBP-GAIL generates a density vector at each step where each element in the vector follows a parameterized Gaussian distribution N(0, σ^S_i) over the moving distance.
The Gaussian distribution is centered at zero, so closer proximity of the next location to the current one gives a higher value in the spatial density vector of the corresponding location and, consequently, indicates a higher likelihood of that location grid being chosen in the next step. The standard deviation vector σ^S_i ∈ R^K defines how broad the Gaussian function is, which is learned and controlled by the network. A higher value of σ^S indicates a more flattened Gaussian distribution and weaker spatial constraints, which allows more flexible choices of location to be sampled even from a farther distance. Specifically, the spatial dynamics enforcer first generates σ^S_i via h^L_i, the latent representation of the mobility trajectory until step i:

    σ^S_i = sigmoid(W^S_h × h^L_i + b^S_h)    (3.9)

where W^S_h ∈ R^{K×H_Lh} and b^S_h ∈ R^K are the learnable weight matrix and bias terms. Then the density vector λ^S_i ∈ R^K that enforces the spatial continuity is computed given the Gaussian distribution N(0, σ^S_i):

    λ^S_i = exp(− (Δ × OneHot(l_i))^2 / (ασ^S_i)^2)    (3.10)

where Δ ∈ R^{K×K} is a pre-computed distance matrix in which Δ_{a,b} measures the distance between the a-th and the b-th location, and α is a normalization factor. Here, λ^S_i is a K-dimensional vector where a higher value indicates a higher chance that the corresponding location will be generated, from the spatial perspective.

3.2.2.4 Density Fusion

Density fusion is the final part of the policy network; it combines the three density vectors obtained from the previous steps: 1) λ^L, the base density for location prediction, 2) λ^C, the guidance from the context predictor, and 3) λ^S, the distance constraints from the spatial dynamics enforcer.
It fuses the three vectors by multiplying them all together to generate a weighted density λ_i ∈ R^K:

    λ_i = λ^L_i ⊙ λ^C_i ⊙ λ^S_i    (3.11)

where both λ^C_i and λ^S_i can be viewed as weighting masks multiplied onto the base density λ^L_i generated by the mobility trajectory encoder, and ⊙ is element-wise multiplication. In particular, λ^C_i puts higher weights on locations whose context types match what the context predictor predicts, and the temperature parameter γ in Equation 3.7 effectively controls the difference between these weights (for example, a larger γ gives a smaller difference). Similarly, λ^S_i puts higher weights on nearby locations and lower weights on farther locations, and the learned σ^S determines how different these weights are. Lastly, the probability vector for sampling a at step i is calculated based on the fused density λ:

    p_i = Softmax(µλ_i)    (3.12)

where µ is a scaling factor.

3.2.3 Discriminator and Moving Behavior Classifier

GAIL uses a reward function to evaluate actions by comparing the policy-generated actions with real-world actions. It is first modeled by a discriminator D, which aims to distinguish between the real and generated samples. The input of the discriminator is the state-action tuple (s, a) from both the real-world and policy-generated trajectories. Like the policy network, we leverage an RNN to encode the state history and replace the MLP layer with a binary classifier. Following [23], we utilize a sigmoid cross-entropy loss, where we sample positive samples from observed trajectories and negative samples from generated trajectories. The optimization is done via gradient descent with the following loss function:

    L_D = E_{(s,a,m)∈T_E} [log D_ψ(s, a, m)] + E_{(s,a,m)∈T_G} [log(1 − D_ψ(s, a, m))]    (3.13)

where T_E and T_G are real-world trajectories and policy-generated trajectories, respectively. Then, the probability for the generated sequence is used as the reward signal to train the generator.
    r_D = − log(1 − D_ψ(s, a, m))    (3.14)

Moving behavior classifier. Since the discriminator D focuses only on estimating how realistic a generated sequence is, the generator may lose some of the original moving behavior patterns preserved in the context sequences while generating trajectories. To preserve the moving behavior content, we leverage a multiclass classifier C, which has a structure similar to the discriminator D and is trained on the context sequences converted from real-world trajectories with their associated moving behavior labels. Unlike the discriminator, which uses a binary cross-entropy loss, C performs the moving behavior prediction using a softmax cross-entropy loss, and we incorporate the moving behavior reward term into the learning of our policy as follows:

    r_C = argmax(C(τ^L) ⊙ OneHot(m))    (3.15)

where C(τ^L) is the output of the classifier, an M-dimensional vector encoding the classified probability distribution. With such a design, the generator is rewarded if the generated trajectory satisfies the moving behavior pattern property. The final reward for training the policy network π_θ is defined as follows:

    r = (1 − υ) · r_C + υ · r_D    (3.16)

where υ is a hyperparameter that balances the objectives of satisfying the constraints of moving behaviors and mimicking the true trajectories, both of which push the policy learning towards modeling more realistic transitions. Algorithm 1 shows the training procedure of MBP-GAIL.

Algorithm 1: Training procedure of MBP-GAIL
Input: Real trajectories T_E, initial policy θ_0 and discriminator ψ_0, moving behavior classifier C
Output: Policy π_θ, discriminator D_ψ
1 for i ← 0, 1, . . . do
2     Roll out trajectories T_{G,m_i}, where m_i is the desired moving behavior label of the generated trajectories
3     Convert them to context trajectories T_{C,m_i} via the conversion matrix Γ; update π_θ with the reward r based on Eq. (3.16)
4     Update D_ψ using Eq.
(3.13)
6 end

3.3 Experiments

In this section, we present our experimental evaluations of MBP-GAIL. Our evaluations aim to answer the following questions:

RQ1 How does MBP-GAIL perform compared to various state-of-the-art methods (realisticness)?
RQ2 Can MBP-GAIL preserve the moving behavior patterns in its generation (representativeness)?
RQ3 How do the different components of MBP-GAIL affect the results?

3.3.1 Experimental Settings

Dataset. We collect mobility trajectories in Houston and Los Angeles from Veraset∗, which provides movement data collected through GPS signals from approximately 10% of cell phones across the US in March 2020. Due to the large data size, we uniformly sample 15,000 trajectories. We divide the study region into equal side-length (spatial) grid cells with side length l = 200 meters and discretize the time cycle of a trajectory into one-minute periods to represent the temporal information. The POI information of the covered regions can be accessed from the open website of SafeGraph†.

∗ https://www.veraset.com/about-veraset

Table 3.1: Detailed statistics of the datasets. # POI denotes the number of POIs. Longitude and Latitude show the detailed spatial ranges of the selected cities.

Statistics     | Houston                 | Los Angeles
Longitude (W)  | [95.158421, 95.558421]  | [118.0437, 118.4437]
Latitude (N)   | [29.549907, 29.949907]  | [33.8522, 36.0522]
Time           | 1/3-31/3, 2020          | 1/3-31/3, 2020
# Locations    | 43262                   | 41255
# POI          | 42161                   | 103671

Table 3.1 illustrates the basic statistics of Houston and Los Angeles, respectively: the number of POIs, the number of total locations, and the exact spatial range of each selected city. Each point in a trajectory is associated with a latitude, longitude, timestamp, and anonymized user ID. We group the POIs into context types based on their category types: Industry, Commercial Area, Entertainment, Education Services, Utilities, Health Care, Residential Areas, and Others.
We illustrate the distribution of POI types in each city in Figure 3.3 and Figure 3.4. The context type of a location is the category of the most frequent POI type within its spatial range. We label locations that contain no POIs but have been visited more than thresh = 2 times in the dataset as Residential Areas, and locations with no POIs and fewer visits as Others.

Figure 3.3: Distribution of the POI types in Houston (POI counts per category).
Figure 3.4: Distribution of the POI types in Los Angeles (POI counts per category).

Baselines. We compare MBP-GAIL with the following data-driven trajectory generation baseline methods:‡

• Markov Model [25]: defines all visited locations as states and builds a transition matrix to capture the first- or higher-order transition probabilities between them.
• LSTM: a widely used sequential neural network that predicts the next location given historically visited locations.
• TransVAE [79, 84]: a variational autoencoder (VAE)-based generative model whose encoder and decoder are designed with the Transformer architecture.
• SeqGAN [90]: a sequence generative adversarial network that generates the next location based on past states.
• MoveSim [23]: a GAN-based generator that incorporates domain knowledge, such as the urban structure of the regions and POI information, into the model.

† https://docs.safegraph.com/v4.0/docs/places-schema-section-patterns
‡ Note that we do not include baseline methods for activity generation since their output is a sequence of POIs and does not support the generation of realistic trajectories; for example, they cannot synthesize the stay duration at each POI.
3.3.2 Performance Comparison (RQ1)

3.3.2.1 Evaluation Metric

One objective of this work is to generate activity trajectories similar to real-world activities. Following common practice in previous works, we adopt the following metrics to evaluate the statistical similarity of the generated data:

• Distance: the cumulative travel distance per trajectory.
• Radius: the radius of gyration, i.e., the root mean square distance of all activity locations from the central one, which measures the spatial range.
• Duration: the stay duration per location visit.
• P(r): the visiting probability of a location r.
• P(r1, r2): the probability of a trajectory transitioning from location r1 to location r2.

We use the Jensen-Shannon divergence (JSD) to measure the similarity between the mobility pattern distributions of the generated and real-world trajectory data, defined as

$$\mathrm{JSD}(p \,\|\, q) = H\!\left(\frac{p + q}{2}\right) - \frac{1}{2}\big(H(p) + H(q)\big) \tag{3.17}$$

where $H$ is the Shannon entropy and $p$ and $q$ are distributions. A lower JSD indicates a better generation result. In addition to the widely used metrics evaluated by JSD [23], we define P(r1, r2) to evaluate data utility: the probability of a trajectory transitioning from location r1 to location r2. We build the origin-destination matrix $OD \in \mathbb{R}^{K \times K}$ for both real and synthetic trajectories, and take the Frobenius norm of the difference between the two OD matrices:

$$P(r_1, r_2) = \big\| OD_{\mathrm{real}} - OD_{\mathrm{generated}} \big\|_F = \sqrt{\sum_{i=1}^{K} \sum_{j=1}^{K} \big| OD_{\mathrm{real}}(i, j) - OD_{\mathrm{generated}}(i, j) \big|^2} \tag{3.18}$$

For individual-level evaluation, we follow [82, 93] and measure how similar each generated trajectory is to real circumstances by performing the next-location prediction task.
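Eqs. 3.17-3.18 can be sketched with NumPy as follows. The function names are ours, and we assume base-2 entropy (the dissertation does not specify the logarithm base), which bounds the JSD in [0, 1].

```python
import numpy as np

def shannon_entropy(p):
    """Shannon entropy H(p) in bits, ignoring zero-probability bins."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log2(p[nz]))

def jsd(p, q):
    """Jensen-Shannon divergence (Eq. 3.17): H((p+q)/2) - (H(p)+H(q))/2."""
    m = (np.asarray(p, dtype=float) + np.asarray(q, dtype=float)) / 2
    return shannon_entropy(m) - 0.5 * (shannon_entropy(p) + shannon_entropy(q))

def od_distance(od_real, od_gen):
    """Frobenius norm of the OD-matrix difference (Eq. 3.18)."""
    return np.linalg.norm(od_real - od_gen, ord="fro")
```

For example, `jsd(p, p)` is 0 for identical distributions, and with base-2 entropy two disjoint distributions reach the maximum value of 1.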
We take the historical real-world trajectories as the known state to predict the next location and use the standard evaluation metric Acc@k, which ranks the candidate next locations by the probabilities produced by the model and checks whether the ground-truth location appears among the top k candidates.

3.3.2.2 Dataset-level Evaluation

In this section, we investigate the performance of MBP-GAIL in the dataset-level evaluation on real-world data. Tables 3.2 and 3.3 present the performance of MBP-GAIL and the baseline methods. As we can observe from Tables 3.2 and 3.3, the Markov Model performs the worst across all metrics, indicating that simply conditioning on one previous location cannot generate meaningful and realistic trajectories. MoveSim achieves the second-best performance on most of these metrics and consistently performs better than SeqGAN, which validates the importance and necessity of incorporating domain knowledge, such as spatial continuity and temporal periodicity, into the generation process. Despite this, MBP-GAIL achieves consistent performance improvements over state-of-the-art prediction and generation methods, especially in Houston. For example, MBP-GAIL improves the JSD evaluation of the Distance metric over the best baseline, MoveSim, by 69% and reduces the divergence for Radius by 49% in Houston. Note that MoveSim also uses POI information and spatial constraints in its model (although differently from MBP-GAIL). We also observe that MBP-GAIL's performance slightly degrades in Los Angeles. We believe this effect may be caused by the fact that the environment (e.g., traffic conditions) is more complex and the correlation between moving purpose and trajectories is looser in Los Angeles. Nonetheless, for metrics such as P(r) where MBP-GAIL ranks second, its performance remains comparable to the best baseline. These results indicate that MBP-GAIL can generate realistic trajectories thanks to the design of the context trajectory encoder and the spatial dynamics enforcer.

Table 3.2: Performance comparison of our model and baselines on the Houston dataset, where a lower value indicates better performance. Bold denotes the best (lowest) results and underline denotes the second-best results.

Houston       | Distance | Radius | Duration | P(r)   | P(r1, r2)
Markov Model  | 0.5098   | 0.5032 | 0.4428   | 0.0028 | 0.3280
LSTM          | 0.4865   | 0.4050 | 0.3748   | 0.0023 | 0.0881
TransVAE      | 0.4662   | 0.3942 | 0.3276   | 0.0034 | 0.1537
SeqGAN        | 0.3318   | 0.2908 | 0.2160   | 0.0074 | 0.1055
MoveSim       | 0.2413   | 0.2402 | 0.1520   | 0.0025 | 0.0924
MBP-GAIL      | 0.0744   | 0.1215 | 0.1311   | 0.0024 | 0.0874

Table 3.3: Performance comparison of our model and baselines on the Los Angeles dataset, where a lower value indicates better performance.

Los Angeles   | Distance | Radius | Duration | P(r)   | P(r1, r2)
Markov Model  | 0.4086   | 0.4122 | 0.4332   | 0.0046 | 0.3073
LSTM          | 0.3855   | 0.3050 | 0.3830   | 0.0032 | 0.1044
TransVAE      | 0.3872   | 0.3443 | 0.3539   | 0.0042 | 0.1462
SeqGAN        | 0.2948   | 0.1913 | 0.1490   | 0.0025 | 0.0910
MoveSim       | 0.0922   | 0.1274 | 0.1617   | 0.0021 | 0.0932
MBP-GAIL      | 0.0667   | 0.1305 | 0.1452   | 0.0023 | 0.0891

Figure 3.5: Next-location prediction in Houston and Los Angeles.

3.3.2.3 Individual-level Evaluation

In the individual-level evaluation, we measure how similar each generated trajectory is to real circumstances by performing next-location prediction. Following [82], we take the 10 most recent historical locations as the known state and predict the next location. As we can observe in Figure 3.5, MBP-GAIL performs the best across the prediction tasks. The performance gap between the proposed MBP-GAIL and the baseline methods is larger for next-location prediction in Houston than in Los Angeles, which is consistent with the performance in the dataset-level evaluation.
Figure 3.6: Moving behavior distributions of the real-world data, MBP-GAIL, SeqGAN, and MoveSim (percentage of trajectories per moving behavior cluster index 1-5).

3.3.3 Moving Behavior Evaluation (RQ2)

3.3.3.1 Evaluation Metric

Given a desired moving behavior pattern, a good trajectory generation model should not only perform well in generating location sequences that are similar to real-world trajectories but also preserve the moving behavior patterns. To further demonstrate the performance on moving behavior distribution, we visualize the moving behavior distributions of the real trajectories, the trajectories generated by MBP-GAIL, and the trajectories generated by the other two best baselines, SeqGAN and MoveSim. We first perform clustering on the edit distance [70] of the context trajectories to obtain the moving behavior label m from the real trajectories. We assume that, given a certain moving behavior, an ideally generated trajectory should be similar to all the real trajectories with the same moving behavior.

Table 3.4: Performance comparison of our model and its ablated variants on the Houston dataset, where a lower value indicates better performance. Bold denotes the best (lowest) results and underline denotes the second-best results.

Houston       | Distance | Radius | Duration | P(r)   | P(r1, r2)
MBP-GAIL-NC   | 0.1240   | 0.1525 | 0.2904   | 0.0027 | 0.1241
MBP-GAIL-NS   | 0.2257   | 0.2504 | 0.2412   | 0.0027 | 0.1108
MBP-GAIL      | 0.0744   | 0.1215 | 0.1311   | 0.0024 | 0.0875

Thus,
for each generated trajectory, we first calculate the distance between the generated context trajectory $\tau^C$ (converted from the mobility trajectory) and each moving behavior cluster $m_i$ in the real-world data:

$$d(\tau^C, \mathcal{T}_{E,m_i}) = \frac{1}{|\mathcal{T}_{E,m_i}|} \sum_{\tilde{\tau}^C \in \mathcal{T}_{E,m_i}} \mathrm{dist}(\tau^C, \tilde{\tau}^C) \tag{3.19}$$

Then, we assign the generated trajectory the moving behavior label of the closest cluster by comparing the distances between the generated context trajectory and every moving behavior cluster and finding the smallest one:

$$m_\tau = \operatorname*{argmin}_{m_i \in \{1, 2, \dots, M\}} \; d(\tau^C, \mathcal{T}_{E,m_i}) \tag{3.20}$$

3.3.3.2 Evaluation Results

We evaluate the results using the above metrics on the Houston dataset, where we run DBSCAN [62] on the real trajectories and obtain five moving behavior clusters. As we can observe in Figure 3.6, both baselines exhibit a large shift in data distribution compared with the real data. For example, SeqGAN generates most of its trajectories with moving behavior index 5, five times the proportion in the real-world data. In contrast, the data distribution of our framework is very similar to that of the real-world data, indicating the promising performance of our framework on moving behavior preservation.

Table 3.5: Performance comparison of our model and its ablated variants on the Los Angeles dataset.

Los Angeles   | Distance | Radius | Duration | P(r)   | P(r1, r2)
MBP-GAIL-NC   | 0.0892   | 0.1815 | 0.1623   | 0.0031 | 0.0930
MBP-GAIL-NS   | 0.1724   | 0.2443 | 0.2156   | 0.0028 | 0.0947
MBP-GAIL      | 0.0667   | 0.1305 | 0.1452   | 0.0023 | 0.0890

3.3.4 Ablation Study (RQ3)

To further investigate the effect of each component in our model, we create two ablated variants of MBP-GAIL by removing the context encoder and the spatial dynamics multi-graph fusion, respectively. The two variants are named MBP-GAIL-NC and MBP-GAIL-NS in Tables 3.4 and 3.5.
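The moving-behavior assignment of Eqs. 3.19-3.20 above can be sketched as follows. The function names are ours, and a Hamming distance on toy 1-D context sequences stands in for the edit distance used in the dissertation.

```python
import numpy as np

def mean_dist(traj, cluster, dist):
    """Eq. 3.19: mean distance between a generated context trajectory
    and all real trajectories in one moving-behavior cluster."""
    return np.mean([dist(traj, real) for real in cluster])

def assign_behavior(traj, clusters, dist):
    """Eq. 3.20: label of the closest cluster (dict keys are behavior labels)."""
    return min(clusters, key=lambda m: mean_dist(traj, clusters[m], dist))

# Toy example: Hamming distance standing in for edit distance.
hamming = lambda a, b: sum(u != v for u, v in zip(a, b))
clusters = {
    1: [[0, 0, 1, 1], [0, 0, 1, 0]],   # behavior cluster 1
    2: [[1, 1, 0, 0], [1, 1, 1, 0]],   # behavior cluster 2
}
label = assign_behavior([0, 0, 1, 1], clusters, hamming)
```

Here the generated sequence has a mean distance of 0.5 to cluster 1 and 3.5 to cluster 2, so it receives label 1.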
We observe that MBP-GAIL-NS performs significantly worse on the Distance metric, demonstrating that the spatial dynamics module effectively preserves the variations of spatial movements and generates realistic location choices sequentially. Moreover, MBP-GAIL consistently outperforms both variants, which demonstrates the necessity of all components in the framework.

3.4 Chapter Summary

In this chapter, we presented a novel generative adversarial framework, dubbed MBP-GAIL, designed to synthesize human mobility trajectories. The MBP-GAIL framework captures the underlying patterns of moving behavior, a crucial aspect of generating realistic and representative mobility data. We emphasized the importance of integrating both moving behavior and spatial constraints when generating massive amounts of mobility data that closely resemble real-world scenarios. Through extensive experiments, we demonstrated the strong performance of MBP-GAIL in generating synthetic mobility data that closely resembles real-world data in terms of realism and representativeness. MBP-GAIL outperforms existing methods and shows promising results in synthesizing human mobility patterns.

One limitation of MBP-GAIL is that it generates mobility data over discrete locations and time periods. In future work, we plan to investigate the generation of continuous-time mobility data and the conditioning of location generation on specific time periods. This will further enhance the realism and representativeness of the generated mobility data, offering a more comprehensive understanding of human mobility patterns.

Chapter 4
Unified Modeling and Clustering of Mobility Trajectories

In this chapter, we propose to learn trajectories based on grouped moving dynamics, focusing on GPS points localized in continuous time and space.
Therefore, our proposed framework tries to learn a better trajectory representation by considering the continuous characteristics of states in time and space while, at the same time, taking different moving dynamics into account by clustering the trajectories based on those dynamics. We build our trajectory learning model on spatiotemporal point processes (STPPs), a robust and structured framework for modeling trajectory data in continuous space-time [71]. A recent study demonstrates the effectiveness of STPPs in modeling point-of-interest trajectories, emphasizing the importance of spatiotemporal dynamics [93]. However, such models implicitly assume that all observed trajectories are generated by a single moving dynamic that they try to fit. Instead, real-world trajectories exhibit varying dynamics (see Figure 4.1). We hypothesize that if we can group trajectories governed by similar dynamics into a single cluster, an STPP will exhibit greater efficacy in modeling the trajectories within each cluster. For instance, research shows that modeling the inherent modality or moving behaviors of trajectories helps explain how humans move in space and time, which improves other downstream tasks such as urban mobility simulation [97, 44].

Figure 4.1: Example of trajectories with two different moving patterns.

Unfortunately, this gives rise to a classic 'chicken-and-egg' predicament: we must first employ an STPP to model the trajectories and capture their underlying dynamics before we can cluster similar ones; on the other hand, we must cluster them based on their dynamics before we can effectively learn the STPP model. There are several potential ways to work around this problem.
First, we can employ a two-stage approach (clustering-then-modeling) in which we first cluster trajectories based on raw features in the trajectory space, such as their spatiotemporal similarity (e.g., the Euclidean distance between raw trajectories), and then model each cluster with an STPP. Clearly, these clusters may not capture the underlying moving dynamics and may not align well with the subsequent trajectory learning. Second, we can employ the alternative two-stage approach (modeling-then-clustering), in which we first employ models to learn representations from the trajectories, enabling the modeling of continuous-time spatiotemporal dynamics and feature extraction, and subsequently use these features to discern the cluster structure. However, previous research has indicated that such a two-phase training paradigm leads to unstable clustering outcomes that are often sensitive to the quality of the learned features [95]. Despite some recent efforts to jointly learn representations and clustering information in a single training phase for discrete-domain sequences [94], we are unaware of any previous work that directly applies this approach to trajectories embedded in continuous time and space with the aim of capturing the underlying moving dynamics.

In this chapter, we propose a novel framework, named Mobility-aware Deep Trajectory Modeling and Clustering (DTMC), which unifies modeling the trajectory dynamics and clustering the trajectories based on those dynamics via spatiotemporal point processes. Specifically, we decompose the hidden embedding of a trajectory into two representations: an individual representation acquired through a neural STPP model, and a cluster representation that encapsulates the clustered moving patterns.
To obtain cluster assignments, we introduce variational inference, where each trajectory learns a conditional probability given its cluster assignment within an Expectation-Maximization (EM) framework that iteratively refines the trajectory embeddings and cluster assignments, resolving the chicken-and-egg problem. To show the effectiveness of DTMC, we first compare its clustering results with three types of clustering methods: clustering-only, modeling-then-clustering, and concurrent-modeling-clustering methods. Subsequently, we compare the predictive performance of the DTMC model with three types of modeling approaches: a single STPP model, clustering-then-modeling, and concurrent-clustering-modeling methods. The findings underscore DTMC's superiority not only in enhancing trajectory modeling but also in achieving remarkable results in trajectory clustering. In summary, our contributions are:

• We propose a novel unified framework to simultaneously model and cluster trajectories based on their inherent moving patterns. Based on the inferred clustering results for each trajectory, our method improves trajectory representation learning by capturing the underlying clustered moving patterns.
• We model the trajectories with STPPs, which can learn continuous spatiotemporal moving dynamics via neural differential equations.
• Extensive experiments on both synthetic and real datasets show the expressiveness of our proposed framework on both trajectory clustering and predictive tasks.

Figure 4.2: The workflow of our proposed modeling and clustering framework (spatiotemporal dynamics modeling with cluster-embedding selection, alternating E-step pseudo-labeling and M-step updates).

4.1 Preliminaries

4.1.1 Problem Definition

In this section, we formally define our research problem of clustering mobility trajectories. We cluster trajectories based on the moving patterns of the raw trajectories.
We first define the mobility trajectory and then formulate our research problem.

Definition 7 (Mobility Trajectory) A mobility trajectory is a sequence of spatiotemporal points generated by an individual in daily life and recorded through GPS. It is represented by a sequence of chronologically ordered points $S = \{(t_i, x_i)\}_{i=1}^L$, where $t_i$ is the timestamp, $x_i$ is the location, and $L$ is the total length of the mobility trajectory.

Problem 2 Given a dataset of real-world trajectories in a specific area, our goal is to learn grouped trajectory embeddings while clustering the trajectories, assigning each trajectory a cluster type $k \in \{1, 2, \dots, K\}$ such that trajectories sharing similar mobility patterns have the same type.

4.1.2 Background

Spatiotemporal Point Processes [18] are concerned with modeling sequences of random events in continuous space and time. We denote an event sequence as $S = \{(t_i, x_i)\}_{i=1}^L$, where $t_i \in \mathbb{R}$ is the timestamp, $x_i \in \mathbb{R}^d$ is the associated spatial location at each timestamp, and $L$ is the total number of events. An STPP first defines the conditional intensity function:

$$\lambda(t, x \,|\, \mathcal{S}_t) = \lim_{\Delta t \to 0, \, \Delta x \to 0} \frac{p\big(t_i \in [t, t + \Delta t], \, x_i \in B(x, \Delta x) \,|\, \mathcal{S}_t\big)}{|B(x, \Delta x)| \, \Delta t} \tag{4.1}$$

where $\mathcal{S}_t = \{(t_i, x_i) \,|\, t_i < t, (t_i, x_i) \in S\}$ denotes the history of events prior to time $t$, and $B(x, \Delta x)$ denotes a ball centered at $x \in \mathbb{R}^d$ with radius $\Delta x$. The non-negative conditional intensity function $\lambda(t, x|\mathcal{S}_t)$, often denoted $\lambda^*(t, x)$, describes the instantaneous probability of the $i$-th event occurring at time $t$ and location $x$ given the $i-1$ previous events. The joint log-likelihood of observing the trajectory history $S$ within a time interval $[0, T]$ is then given by

$$\log p(S) = \sum_{i=1}^L \log \lambda^*(t_i, x_i) - \int_0^T \int_{\mathbb{R}^d} \lambda^*(\tau, u)\, du\, d\tau \tag{4.2}$$

4.2 Deep Trajectory Modeling and Clustering (DTMC)

In this section, we present our DTMC approach, which integrates trajectory representation learning and clustering. The workflow of our proposed framework is shown in Fig. 4.2.
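Before detailing DTMC, note that the log-likelihood in Eq. 4.2 has a simple closed form for a homogeneous process with constant intensity, where the double integral reduces to $\lambda \cdot |S| \cdot T$. The following sketch (names and toy events are our assumptions, not the neural model used later) is a useful sanity check for STPP implementations.

```python
import numpy as np

def homogeneous_stpp_loglik(events, lam, T, area):
    """Eq. 4.2 for a constant intensity lambda*(t, x) = lam:
    sum of log-intensities at the observed events minus the integral
    of the intensity over [0, T] x S, which equals lam * area * T."""
    L = len(events)
    return L * np.log(lam) - lam * area * T

# Three toy events on S = [0, 2]^2 (area 4) over the window [0, 10].
events = [(0.5, (0.1, 0.2)), (1.3, (0.4, 0.4)), (2.0, (1.5, 1.9))]
ll = homogeneous_stpp_loglik(events, lam=1.0, T=10.0, area=4.0)
```

With `lam=1.0` the log-intensity terms vanish and only the compensator `-lam * area * T = -40` remains; the likelihood is maximized at the empirical rate `L / (area * T)`.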
More specifically, we present the variational EM framework and explain how the probabilistic model infers the cluster assignment (Section 4.2.1). We then elaborate on how we learn spatiotemporal dynamics by incorporating the cluster embedding (Section 4.2.2), and finally summarize the training procedure, which iteratively updates the cluster embeddings and the spatiotemporal models for each cluster based on the cluster membership (Section 4.2.3).

4.2.1 Trajectory Cluster Inference

Given a set of trajectories $\mathcal{S} = \{S_n\}_{n=1}^N$, where $S_n = \{(t_i, x_i)\}_{i=1}^L$ is a sequence as defined in Definition 7, our goal is to divide these $N$ sequences into $K$ groups such that trajectories generated by similar spatiotemporal dynamics are grouped together. We assume that for each $S_n$ there is a corresponding latent variable $z_n \in \{0, 1\}^K$ with $\sum_{k=1}^K z_{nk} = 1$ denoting the cluster membership, i.e., $z_{nk} = 1$ if and only if $S_n$ belongs to group $k$. Each $z_n$ is drawn from a categorical distribution defined on $\pi = [\pi_1, \dots, \pi_K] \in \mathbb{R}^K$, where $\pi$ can be a static prior over cluster types or drawn from a Dirichlet distribution. Denoting the variable representing the distribution of cluster assignments as $Z = \{z_1, \dots, z_N\}$, our goal is to maximize the likelihood of the data while also inferring the latent variables $Z$. Given the prior $\pi$, the conditional distribution of $Z$ is $p(Z|\pi) = \prod_{n=1}^N \prod_{k=1}^K \pi_k^{z_{nk}}$, where $\pi_k$ is the $k$-th dimension of $\pi$. Given $Z$, we model the conditional probability of $\mathcal{S}$ as:

$$p_\theta(\mathcal{S}|Z) = \prod_{n=1}^N \prod_{k=1}^K p_\theta(S_n|k)^{z_{nk}} \tag{4.3}$$

where $\theta$ denotes all the learnable parameters of the model. For a trajectory $S_n$, we input it together with the cluster label $k$ into the model, where we adopt a parametric latent embedding for each cluster and obtain the conditional probability $p_\theta(S_n|k)$. Thus, we can factorize the joint distribution of all variables as:

$$p_\theta(\mathcal{S}, Z, \pi) = p(\pi)\, p(Z|\pi)\, p_\theta(\mathcal{S}|Z) = p(\pi) \prod_{n=1}^N \prod_{k=1}^K \left[ \pi_k \exp\!\left( \sum_{i=1}^L \log \lambda_\theta^*(t_i, x_i|k) - \int_0^T \int_{\mathbb{R}^d} \lambda_\theta^*(\tau, u|k)\, du\, d\tau \right) \right]^{z_{nk}} \tag{4.4}$$

Figure 4.3: Illustration of the spatiotemporal modeling with cluster embedding.

4.2.2 Learning Spatiotemporal Dynamics of Clusters

In this section, we elaborate on how to model the trajectory and learn the conditional intensity function $\lambda^*(t, x|k)$ for cluster $k$, so that $p_\theta(\mathcal{S}|Z)$ in Eq. 4.3 can be computed. Specifically, following [17], we decompose the conditional intensity as

$$\lambda^*(t, x|k) = \underbrace{\lambda_t^*(t|k)}_{\text{Temporal}} \; \underbrace{\lambda_s^*(x|t, k)}_{\text{Spatial}} \tag{4.5}$$

where $\lambda_t^*(t|k)$ is the intensity function of the temporal process and $\lambda_s^*(x|t, k)$ is the conditional intensity of spatial location $x$ at time $t$ given the past trajectory history. Consequently, derived from Eq. 4.2, we can compute $p_\theta(S_n|k)$ accordingly:

$$\log p_\theta(S_n|k) = \underbrace{\sum_{i=1}^L \log \lambda_t^*(t_i|k) - \int_0^T \lambda_t^*(\tau|k)\, d\tau}_{\text{Temporal log-likelihood}} + \underbrace{\sum_{i=1}^L \log \lambda_s^*(x_i|t_i, k)}_{\text{Spatial log-likelihood}} \tag{4.6}$$

In the following, we first introduce how we incorporate the cluster latent embedding to obtain the hidden states of a trajectory. These hidden states not only encapsulate the latent characteristics of a cluster but are also informed by the trajectory data at each time point. We then show how to construct the models that learn the temporal and spatial dynamics, which are jointly conditioned on the hidden states. Fig. 4.3 shows the overall learning process, which we explain in detail below.

4.2.2.1 Decomposing Hidden Variables

To model the conditional intensity function $\lambda^*(t, x|k)$ for each of the $K$ clusters, we introduce cluster hidden states $h_{1:L}^{(k)}$, through which we combine the cluster latent information with the representations learned from the temporal and spatial dynamics.
Similar to a recurrent neural network, at every time point we acquire a hidden state $h_{1:L}^{(t)}$ that acts as a summary of the trajectory history and is used to predict the future temporal and spatial variables $t_i$ and $x_i$. We then augment these representations at each step by adding the cluster embedding:

$$h_{1:L} = h_{1:L}^{(k)} + h_{1:L}^{(t)} \tag{4.7}$$

4.2.2.2 Temporal Modeling

After augmenting the cluster embedding, we model hidden-state dynamics with jumps to parameterize the intensity function $\lambda_t^*(t|k)$, which has been proven effective in [35]. Specifically, we apply a Neural ODE to ensure a continuous-time hidden state and then trigger instantaneous updates in response to the introduction of a new point. This mechanism is essential because it not only captures the continuous temporal pattern between points but also allows historical points to influence future movement. In summary, the continuous flow and instantaneous update can be formulated as:

$$\frac{dh_i}{dt} = f_h(t_i, h_i), \qquad \lim_{\epsilon \to 0^+} h_{i+\epsilon}^{(t)} = g_h(t_i, x_i, h_i) \tag{4.8}$$

where $f_h$ is a multi-layer perceptron (MLP) modeling the continuous evolution between event times, $g_h$ is a gated recurrent unit (GRU) modeling the instantaneous updates of the hidden states at event times, and $h_{i+\epsilon}^{(t)}$ denotes the hidden state at time $t_i + \epsilon$. As a decoder of the hidden representations, we use a standard multi-layer fully connected neural network with a softplus activation to ensure that the intensity is positive. Thus, given $h_{1:L}^{(t)}$, the conditional temporal intensity is computed as:

$$\lambda_t^*(t_{1:L}|k) = \mathrm{Softplus}\big(\mathrm{MLP}(h_{1:L}^{(t)})\big) \tag{4.9}$$

4.2.2.3 Spatial Modeling

Similar to temporal modeling, a core component for modeling locations in the spatial domain is an interpolated continuous spatial intensity function. We model the conditional spatial density $p(x|t)$ based on a Continuous Normalizing Flow (CNF).
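The cluster-embedding augmentation (Eq. 4.7) and the softplus intensity decoder (Eq. 4.9) above can be sketched as follows. This is a minimal stand-in with our own names: a single linear layer replaces the MLP, and the hidden states and cluster embedding are random placeholders for the outputs of the ODE/GRU encoder.

```python
import numpy as np

def softplus(z):
    """Softplus activation: log(1 + exp(z)), always positive."""
    return np.log1p(np.exp(z))

def temporal_intensity(h_traj, h_cluster, W, b):
    """Eq. 4.7: add the cluster embedding to each per-step hidden state;
    Eq. 4.9: decode a strictly positive intensity per step."""
    h = h_traj + h_cluster            # broadcast cluster embedding over L steps
    return softplus(h @ W + b)        # shape (L,), one intensity per event time

rng = np.random.default_rng(0)
L, d = 25, 8
h_traj = rng.normal(size=(L, d))      # placeholder ODE/GRU hidden states
h_cluster = rng.normal(size=d)        # placeholder embedding of cluster k
W, b = rng.normal(size=d), 0.0
lam = temporal_intensity(h_traj, h_cluster, W, b)
```

The softplus output is positive by construction, which is what makes it a valid intensity decoder.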
In several recent studies, CNFs have proven effective for modeling distributions over real-valued domains such as spatial locations and point clouds [69]. As in the temporal domain, we want the spatial intensity function to be continuous everywhere except at the observed points. Consequently, the spatial pattern is updated by a continuous-time normalizing flow that evolves the distribution continuously, together with a standard flow model that changes the distribution instantaneously after conditioning on new events. Additionally, the update of the normalizing flow is conditioned on $h_{1:L}$ (with cluster information included), under the assumption that the trajectory history augmented with cluster information influences the future spatial distribution. These dynamics can be formulated as follows:

$$\frac{dx_i}{dt} = f_x(t_i, x_i, h_i), \qquad \lim_{\epsilon \to 0^+} x_{i+\epsilon} = g_x(t_i, x_i, h_i) \tag{4.10}$$

where $f_x$ is modeled by a continuous normalizing flow, $g_x$ is realized by a standard linear flow, and $x_{i+\epsilon}$ denotes the location at time $t_i + \epsilon$. Since the spatial variables are real-valued features, we parameterize the conditional spatial intensity with a Gaussian mixture model as:

$$\lambda_s^*(x_{1:L}|t_{1:L}, k) = \mathcal{N}_{\mu_k, \Sigma_k}(x_{1:L}) \tag{4.11}$$

where $\mu_k$ and $\Sigma_k$ are, respectively, the learnable mean vector and the learnable covariance matrix of the Gaussian distribution for the $k$-th cluster. The spatial log-likelihood in Eq. 4.6 can then be evaluated accordingly.

4.2.3 Training Algorithm

In this section, we explain how we optimize and train the framework. We leverage a variational EM framework in which the spatiotemporal model learns the spatial and temporal intensity functions of each cluster to predict the joint log-likelihood of the trajectory data, while the cluster assignment is inferred from the posterior $p_{\theta^*}(Z|\mathcal{S})$ based on the spatiotemporal log-likelihood of each cluster. More specifically, the framework tries to maximize the log-likelihood function $p_\theta(\mathcal{S})$.
As directly optimizing this function is often hard, we resort to variational methods and introduce a variational distribution $q(Z, \pi)$ to approximate the posterior $p_\theta(Z, \pi|\mathcal{S})$; the framework instead optimizes the evidence lower bound (ELBO):

$$\mathcal{L}(q, \theta) = \log p_\theta(\mathcal{S}) - \mathrm{KL}\big(q(Z, \pi) \,\|\, p_\theta(Z, \pi|\mathcal{S})\big) \tag{4.12}$$

Using the mean-field approximation [76], we have $q(Z, \pi) = q(Z)\, p(\pi)$. Therefore, we derive:

$$\mathcal{L}(q, \theta) = \int q(Z)\, \mathbb{E}_{p(\pi)} \log \frac{p_\theta(\mathcal{S}, Z|\pi)}{q(Z)}\, dZ \tag{4.13}$$

The ELBO can be optimized by alternating between optimizing the variational distribution $q(Z)$ (i.e., the E-step) to approximate the posterior and optimizing the model parameters $\theta$ (i.e., the M-step) such that $\log p_\theta(\mathcal{S})$ is maximized to better characterize the trajectories.

4.2.3.1 E-step: Update Cluster Assignment

In the E-step, we fix the model parameters $\theta$ and aim to update $q(Z)$ to maximize the ELBO. The log of the optimal $q(Z)$ is given by:

$$\log q^*(Z) = \mathbb{E}_{p(\pi)} \log p_\theta(\mathcal{S}, Z|\pi) = \sum_{n=1}^N \sum_{k=1}^K z_{nk} \Big\{ \mathbb{E}_{p(\pi)}[\log \pi_k] + \log p_\theta(S_n|k) \Big\} \tag{4.14}$$

Normalizing the above formulation, we obtain:

$$q^*(Z) = \prod_{n=1}^N \prod_{k=1}^K r_{nk}^{z_{nk}} \tag{4.15}$$

where $r_{nk} = \dfrac{\exp\big(\mathbb{E}_{p(\pi)}[\log \pi_k] + \log p_\theta(S_n|k)\big)}{\sum_{\kappa=1}^K \exp\big(\mathbb{E}_{p(\pi)}[\log \pi_\kappa] + \log p_\theta(S_n|\kappa)\big)}$ is the pseudo-label. As $q(Z)$ is an approximation of $p_\theta(Z|\mathcal{S})$, when the model is well trained, $p_{\theta^*}(z_{nk} = 1|S_n) = r_{nk}$, i.e., the posterior probability that $S_n$ belongs to group $k$ is $r_{nk}$.

4.2.3.2 M-step: Update Model Parameters

In the M-step, we fix $q(Z)$ and optimize $\mathcal{L}(q, \theta)$ with respect to $\theta$. Since $\theta$ comprises the learnable parameters of the model, we optimize it via gradient descent with the loss:

$$\mathcal{L}(\theta) = \mathbb{E}_{q(Z)}[\log p_\theta(\mathcal{S}|Z)] = \sum_{n=1}^N \sum_{k=1}^K r_{nk} \log p_\theta(S_n|k) \tag{4.16}$$

The training procedure of DTMC is shown in Algorithm 2.

Algorithm 2: Variational EM training algorithm for DTMC
Input: Trajectories $\mathcal{S} = \{S_n\}_{n=1}^N$; number of clusters $K$; mini-batch size $B$; training epochs $T$; number of steps $M$ to update $\theta$ in the M-step.
Output: Cluster assignment $q^*(Z) \approx p_{\theta^*}(Z|\mathcal{S})$.
1 Initialize $\theta$; to prevent bad initialization, pretrain each cluster hidden embedding $h^{(k)}$ on the samples assigned to cluster $k$ by K-means;
2 Evaluate $\mathbb{E}_{p(\pi)}[\log \pi_k]$ for $k \in \{1, \dots, K\}$;
3 for epoch = 1, ..., T do
4   for iter = 1, 2, ... do
5     Sample a batch of sequences $\mathcal{S}'$ from $\mathcal{S}$;
      // E-step: update $q(Z')$
6     Evaluate $\log p_\theta(S_n|k)$ for $S_n \in \mathcal{S}'$;
7     Compute $r_{nk}$ as in Eq. 4.15;
      // M-step: update $\theta$
8     for step = 1, ..., M do
9       $\tilde{\mathcal{L}}(\theta) = \sum_{S_n \in \mathcal{S}'} \sum_{k=1}^K r_{nk} \log p_\theta(S_n|k)$;
10      Update $\theta$ based on the gradient $\nabla \tilde{\mathcal{L}}(\theta)$;
11    end
12  end
13 end
14 Compute $r_{nk}$ for all $n \in \{1, \dots, N\}$, $k \in \{1, \dots, K\}$ using the trained model (lines 6-7);
15 return $q^*(z_{nk} = 1) = r_{nk}$;

4.3 Experiments

To understand different moving patterns in trajectories and validate our hypothesis that capturing the underlying moving dynamics can help with trajectory modeling, we conduct experiments on both trajectory clustering (Sections 4.3.4 and 4.3.5) and trajectory modeling (Section 4.3.6) to evaluate the performance of DTMC.

4.3.1 Dataset

4.3.1.1 Synthetic Datasets

We use two types of traditional STPPs (with different parameters) to simulate three different moving patterns (moving differently in both the temporal and spatial domains)∗: a spatiotemporal homogeneous Poisson process (STHP), and a spatiotemporal Hawkes process with a Gaussian diffusion kernel (STHG) under two different parameter settings (labeled STHG1 and STHG2; they behave similarly along the temporal axis but differ significantly in their spatial distributions). Additionally, we generate a moving pattern based on the "uniform walk" assumption, where an agent consistently moves at a uniform speed (UNI). All trajectory simulations are defined within $S \times T = [0, 2]^2 \times [0, 10]$. Each moving dynamic has 1000 event sequences, each containing $L = 25$ points.
We merge the above trajectories to generate three datasets with different numbers of clusters (K) as ground truth:
• K = 2: STHP + STHG1;
• K = 3: STHP + STHG1 + STHG2;
• K = 4: STHP + STHG1 + STHG2 + UNI.
In this way, the synthetic datasets mimic real-world situations where different moving patterns of trajectories are mixed together. Below, we provide a description of the hyperparameters used in generating our synthetic dataset. A visualization of each simulated moving pattern is presented in Fig. 4.4.

∗ https://github.com/meowoodie/spatiotemporal-Point-Process-Simulator

Figure 4.4: Visualization of the synthetic dataset (panels: STHP, STHG1, STHG2, UNI). Each image represents one trajectory of the specified moving pattern. The x-axis and y-axis represent the spatial region. Dark dots correspond to the oldest points and light dots to the most recent; red arrows denote the moving directions.

STHP Simulation. A spatiotemporal homogeneous Poisson process has a constant intensity λ*(t, x) > 0; we set λ*(t, x) = 1.

STHG1 and STHG2 Simulation. We assume the kernel function takes a single standard Gaussian diffusion kernel over space and decays exponentially over time to capture the spatially non-homogeneous structure [104]. Given the past events, the intensity function is defined as

\lambda^*(t, x) = \lambda_0 + \sum_{t' < t} g(t, t', x, x' \mid \mu_{x'}, \Sigma_{x'}), \quad \forall t' < t,\; x' \in S,

where the Gaussian diffusion kernel g is defined as

g(t, t', x, x' \mid \mu_{x'}, \Sigma_{x'}) = \frac{C e^{-\beta(t - t')}}{2\pi \sqrt{|\Sigma_{x'}|}\,(t - t')} \cdot \exp\left( -\frac{(x - x' - \mu_{x'})^\top \Sigma_{x'}^{-1} (x - x' - \mu_{x'})}{2(t - t')} \right),

where β > 0 controls the temporal decay rate; C > 0 is a constant that decides the magnitude; μ_{x'} and Σ_{x'} denote the mean and covariance of the diffusion kernel; and |·| denotes the determinant of a covariance matrix. These parameters control the shape (shift, rotation, etc.) of the Gaussian distribution.
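To make the kernel concrete, the conditional intensity above can be evaluated numerically by summing the diffusion kernel over past events. A minimal sketch with numpy (the function name and the background rate λ0 value are our own illustrative choices; the chapter's simulations use the linked simulator):

```python
import numpy as np

def sthg_intensity(t, x, history, lam0=0.1, C=1.0, beta=1.0,
                   mu=np.zeros(2), Sigma=0.05 * np.eye(2)):
    """Conditional intensity lambda*(t, x) of the spatiotemporal Hawkes
    process with an exponentially decaying Gaussian diffusion kernel.

    history: list of past events (t_i, x_i) with t_i < t.
    mu, Sigma: mean/covariance of the diffusion kernel (shared across
    events here; in the text they may depend on x')."""
    Sigma_inv = np.linalg.inv(Sigma)
    det = np.linalg.det(Sigma)
    lam = lam0
    for t_p, x_p in history:
        dt = t - t_p
        if dt <= 0:          # kernel only sums over strictly earlier events
            continue
        d = x - x_p - mu
        norm = C * np.exp(-beta * dt) / (2.0 * np.pi * np.sqrt(det) * dt)
        lam += norm * np.exp(-(d @ Sigma_inv @ d) / (2.0 * dt))
    return lam
```

Each past event adds a Gaussian bump centered near its own location whose spatial spread grows with the elapsed time (t − t') while its magnitude decays as e^{−β(t − t')}.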
Table 4.1: Clustering performance comparison of our model and baselines on the synthetic datasets, where a higher value indicates better performance. Bold denotes the best (highest) results and underline denotes the second-best results.

                       K = 2                    K = 3                    K = 4
Method          CP↑    ARI↑   NMI↑       CP↑    ARI↑   NMI↑       CP↑    ARI↑   NMI↑
KM-RAW          0.91   0.71   0.60       0.65   0.41   0.42       0.71   0.54   0.57
KM-DTW          0.83   0.65   0.61       0.66   0.42   0.43       0.62   0.38   0.47
DBSCAN          0.84   0.47   0.50       0.63   0.45   0.50       0.54   0.30   0.40
HPGM+BGM        0.62   0.05   0.11       0.41   0.07   0.03       0.35   0.05   0.04
GMVAE           0.75   0.41   0.36       0.60   0.28   0.27       0.50   0.23   0.29
GMVAE+          0.76   0.43   0.38       0.66   0.32   0.41       0.67   0.41   0.43
THP-EM          0.92   0.72   0.70       0.65   0.40   0.42       0.72   0.55   0.58
DTMC            0.97   0.88   0.83       0.73   0.48   0.51       0.77   0.61   0.63

For STHG1, we set C = 1, β = 1, μ_x = (0, 0)^⊤, and Σ_x = [[0.05, 0], [0, 0.05]]. For STHG2, we set C = 1, β = 1, μ_x = (0, 0)^⊤, and Σ_x = [[0.3, 0], [0, 0.01]].

UNI Simulation. For each sequence, we first randomly sample a pair of points with temporal coordinates t_1 and t_L from [0, T] and spatial coordinates x_1 and x_L from [0, 2]^2 to represent the start and end points of the trajectory. We then equally sample L = 25 points between (t_1, x_1) and (t_L, x_L) with uniform spatial and temporal steps.

4.3.1.2 Real-world Datasets

We evaluate the performance of DTMC on real-world datasets where each trajectory is specified as a list of tuples: user identifier, latitude and longitude, and timestamp. We collect mobility trajectories in Houston from Veraset†, which provides movement data collected through GPS signals from cell phones in March 2020. Due to the large data size, we sample 1,000 trajectories. Another real-world dataset [86] was collected from Foursquare in Tokyo and includes check-ins of 1,000 users over a duration of one month.

† https://www.veraset.com/about-veraset

Figure 4.5: Confusion matrices on the synthetic dataset when K = 4 (panels: KM-RAW, DBSCAN, GMVAE+, THP-EM, Ours).
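The UNI pattern described in Section 4.3.1.1 is simple enough to generate by straight linear interpolation between the sampled endpoints. A sketch under the stated windows (the function name is ours), assuming numpy:

```python
import numpy as np

def simulate_uni(L=25, t_max=10.0, s_max=2.0, rng=None):
    """Uniform-walk (UNI) trajectory: L points evenly interpolated in
    both time and space between a random start and end point."""
    rng = np.random.default_rng() if rng is None else rng
    t0, tL = np.sort(rng.uniform(0.0, t_max, size=2))   # start/end times
    x0, xL = rng.uniform(0.0, s_max, size=(2, 2))       # start/end locations
    alpha = np.linspace(0.0, 1.0, L)                    # uniform steps
    t = t0 + alpha * (tL - t0)
    xy = x0 + alpha[:, None] * (xL - x0)
    return t, xy

t_uni, xy_uni = simulate_uni(rng=np.random.default_rng(1))
```

By construction the inter-point steps are constant in both time and space, which is exactly what makes UNI separable from the stochastic STHP/STHG patterns.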
4.3.2 Compared Methods

To evaluate the clustering ability of DTMC, we compare our model with three types of baselines: clustering-only, modeling-then-clustering, and concurrent-modeling-clustering. Later, to evaluate modeling accuracy, we combine some of the baseline approaches to create clustering-then-modeling approaches for comparison.

The clustering-only methods extract features from trajectories and then apply clustering on them. KM-RAW uses the raw trajectory as input and applies K-means‡ clustering with Euclidean distance. KM-DTW [67] calculates the distance matrix using the Dynamic Time Warping (DTW) distance§ and subsequently applies K-means clustering on the sequential data. DBSCAN employs a density-based clustering approach [72] to cluster the raw trajectories. GMVAE [19] applies a Gaussian Mixture Variational Autoencoder for unsupervised clustering; we build the encoder and decoder with GRU [15] layers to work with sequences (without modeling the continuous spatiotemporal dynamics in the network). GMVAE+ is a variant of GMVAE into which we introduce a supervised loss that aligns its cluster results with those produced by K-means, mirroring our algorithm, which is pretrained with K-means cluster results.

‡ https://scikit-learn.org/stable/
§ https://github.com/maikol-solis/trajectory_distance

Modeling-then-clustering baselines model trajectories using a traditional STPP model and apply clustering on the model parameters. HPGM+BGM [99] learns a specific Hawkes process for the temporal domain, applies a history-dependent Gaussian mixture model for the spatial domain, and then applies a Bayesian Gaussian Mixture model to the learned parameters for clustering.

The concurrent-modeling-clustering baseline models trajectories using a neural network and performs clustering simultaneously. THP-EM [106]: THP leverages the Transformer encoder for temporal point process representation learning.
In the original work, THP was limited to learning representations for sequences in continuous time only (without proper spatial modeling). We discretize the spatial points into a 10 × 10 grid and use grid IDs as markers. For the K clusters, we initialize K different THP models and then apply the EM algorithm to learn the cluster assignment [99].

4.3.3 Evaluation Metrics

To evaluate the clustering ability, we use three widely used clustering metrics. Clustering Purity (CP) [4]: the ratio between the number of correctly matched class samples and the total number of data points. Adjusted Rand Index (ARI) [80]: the similarity of predicted and ground-truth assignments. Normalized Mutual Information (NMI): the reduction in entropy of class labels when the cluster labels are given. Note that CP, ARI, and NMI apply only when the ground-truth cluster assignments are known, which is the case only for the synthetic datasets. For the real-world datasets, where the ground-truth clustering assignment is unknown, we report the silhouette score [73] based on the clustering results, with Euclidean distance calculated on the raw trajectories.

Learning Performance. To evaluate the learning ability, we report log-likelihood as the metric for trajectory sequence fitting (the higher, the better) [99]. We randomly split the dataset into training (80%) and test (20%) sets, train the model on the training set, and report log-likelihood on the spatial and temporal domains separately on the held-out test data.
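Of these metrics, CP has the most direct definition: each predicted cluster is matched to its majority ground-truth class, and CP is the fraction of points falling into that majority class. A minimal numpy sketch (ARI and NMI have standard implementations, e.g. scikit-learn's adjusted_rand_score and normalized_mutual_info_score):

```python
import numpy as np

def clustering_purity(y_true, y_pred):
    """Clustering Purity (CP): for each predicted cluster, count the
    points whose ground-truth class is that cluster's majority class,
    then divide the total by the number of points."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    correct = 0
    for c in np.unique(y_pred):
        labels = y_true[y_pred == c]     # true labels inside cluster c
        correct += np.bincount(labels).max()
    return correct / len(y_true)
```

Note that CP is invariant to a permutation of cluster IDs, but it is inflated by over-clustering (many tiny clusters), which is why ARI and NMI are reported alongside it.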
Table 4.2: Log-likelihood per event on the synthetic dataset (higher is better).

                                        STHP               STHG1              STHG2              UNI
Model                                   Temporal  Spatial  Temporal  Spatial  Temporal  Spatial  Temporal  Spatial
Single STPP-model                       0.603     -2.925   0.179     -2.064   0.136     -2.345   -0.181    -0.334
Clustering-then-modeling (K-Means)      0.613     -2.720   0.213     -1.944   0.148     -2.323   0.601     -0.177
Clustering-then-modeling (GMVAE+)       0.608     -2.715   0.203     -1.945   0.154     -2.329   0.604     -0.174
THP-EM                                  0.615     –        0.210     –        0.151     –        0.619     –
DTMC                                    0.649     -2.530   0.243     -1.733   0.218     -2.051   0.637     -0.135

Implementation Details. In our experiments, the MLPs in the network architecture use a hidden size of 32, and the hidden dimensions are all set to 64. We train with a mini-batch size of 248, with the pretraining epochs set to 10. We use the Adam optimizer with a learning rate of 0.001 and perform the M-step twice after each E-step. All deep-learning-based models are implemented with PyTorch, and the classical methods are implemented in Python. All experiments are trained on one GeForce RTX 2080 GPU.

4.3.4 Clustering Performance on Synthetic Dataset

We assume that the ground-truth K and cluster size distribution π are given in the synthetic dataset evaluation. As we observe in Table 4.1, the modeling-then-clustering baseline HPGM+BGM performs worst across all metrics. This is because it assumes all trajectories strictly follow parametric Hawkes processes, which does not match reality; additionally, it employs a two-step process for feature extraction and clustering, where each step is relatively independent. GMVAE and GMVAE+ outperform HPGM+BGM significantly. This is reasonable, as these two methods fit trajectories with a neural network and provide an end-to-end clustering framework. However, the backbones of both versions of GMVAE are simple RNNs that lack proficiency in modeling trajectories with GPS points in continuous time and space.
THP-EM achieves the second-best performance, possibly reflecting its ability to discern diverse movement patterns in the temporal domain. Overall, DTMC achieves significant improvements across the datasets, which demonstrates the expressiveness of our clustering framework. We further verify our assumption by visualizing the confusion matrices of selected models for K = 4 in Fig. 4.5. Note that although most methods (including ours) effectively distinguish STHP and UNI due to their significantly distinct temporal and spatial movement patterns, our method excels in differentiating STHG1 and STHG2, which are simulated under the same parameters in the temporal domain but with different parameters for their spatial distributions. This proves the importance and necessity of modeling trajectories as STPPs to accurately learn the inherent moving patterns.

4.3.5 Clustering Performance on Real-World Dataset

For the real-world datasets, we calculate the silhouette score on the clusters generated by each method. Fig. 4.6 shows the comparison results on the two real-world datasets.

Figure 4.6: Silhouette scores on the real-world datasets (Houston, Foursquare). A higher value represents better cluster quality.

Overall, DTMC exhibits superior performance, indicating that its generated clusters are distinctly separated from one another. K-means ranks second. This is not surprising, as we directly calculate the silhouette score based on Euclidean distance on the raw trajectories, consistent with how K-means performs the clustering. However, note that the purpose of the clustering is not necessarily to group trajectories that are similar in Euclidean distance, but to form groups that can each be modeled well later. Toward this end, in the next section, we compare the modeling quality of the clusters.
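The silhouette score used above depends only on pairwise Euclidean distances. A minimal numpy sketch of the standard definition (in practice one would use scikit-learn's silhouette_score on, e.g., flattened raw trajectories):

```python
import numpy as np

def silhouette(X, labels):
    """Mean silhouette coefficient with Euclidean distance.
    For point i: a(i) = mean distance to its own cluster (excluding i),
    b(i) = smallest mean distance to any other cluster,
    s(i) = (b - a) / max(a, b); singleton clusters score 0."""
    X, labels = np.asarray(X, float), np.asarray(labels)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() > 1:
            a = D[i, same].sum() / (same.sum() - 1)  # self-distance is 0
            b = min(D[i, labels == c].mean()
                    for c in np.unique(labels) if c != labels[i])
            scores[i] = (b - a) / max(a, b)
    return scores.mean()
```

Scores lie in [−1, 1]; values near 1 indicate compact, well-separated clusters, which is the sense in which DTMC's clusters are "distinctly separated" above.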
4.3.6 Representation Learning Performance via Log-likelihood

Our main thesis in this chapter is that segregating trajectories into groups based on shared spatiotemporal dynamics enhances the accuracy of trajectory modeling. In this section, we report log-likelihood to verify this claim and evaluate the effectiveness of our approach. To show the benefits of utilizing the clusters for trajectory modeling, we compare DTMC with three types of learning methods: 1) Modeling-only: a single STPP model, where we do not cluster the trajectories and apply one STPP model to all trajectory data; 2) Clustering-then-modeling: we first cluster trajectories with either K-means or GMVAE+¶ and then train K different STPP models on each cluster separately. The GMVAE+ variant represents a clustering approach that specifically clusters for better modeling, while K-means represents methods that cluster for similarity. 3) Concurrent-clustering-modeling: THP-EM, where we apply the same EM algorithm to learn the cluster assignment and report learning performance in the temporal domain.

Table 4.2 reports the log-likelihood evaluation on the synthetic datasets. As we observe, the single STPP model performs worst, since it does not group trajectories and thus must capture different moving dynamics using a single set of model parameters. THP-EM ranks second in temporal learning performance, as it also efficiently captures the underlying group patterns in the temporal domain. However, it does not outperform our model in the temporal domain, suggesting that modeling the temporal distribution without conditioning on spatial attributes hampers comprehensive temporal-domain learning. Our DTMC approach achieves the best performance in all cases, which demonstrates the power of clustering trajectories for the purpose of better learning.
The results also show that the relative performance gain of our method is more pronounced on datasets with more complex spatiotemporal dynamics, such as STHG1 and STHG2, which makes it more useful in practical applications.

For the real-world datasets, where the ground-truth number of clusters (K) is unavailable, we apply K-means clustering to the trajectories and use the elbow method [2] to determine the best choice of K (6 for Houston and 4 for Foursquare, respectively).

¶ Since the original GMVAE+ network cannot be directly used for prediction and log-likelihood evaluation, we utilize the clustering result of GMVAE+ and then train an STPP for each cluster.

Table 4.3: Log-likelihood per event on real-world data.

                                        Houston              Foursquare
Model                                   Temporal  Spatial    Temporal  Spatial
Single STPP-model                       0.627     1.336      2.013     -2.115
Clustering-then-modeling (K-Means)      0.823     1.349      2.075     -1.939
Clustering-then-modeling (GMVAE+)       0.813     1.356      2.063     -1.936
THP-EM                                  0.836     –          2.076     –
DTMC                                    0.881     1.423      2.082     -1.912

Table 4.3 reports the corresponding log-likelihood. Similar to the synthetic datasets, we observe that explicitly clustering the moving patterns significantly boosts the performance of modeling the spatiotemporal dynamics of trajectories.

4.4 Chapter Summary

Real-world trajectories are governed by different underlying moving dynamics. Capturing groups of similar trajectories during the learning process enhances the quality of trajectory representations for predictive analysis. In this chapter, we proposed a novel deep learning framework, DTMC, which concurrently clusters and models trajectories based on their inherent moving patterns. Extensive experiments demonstrate the superior performance of DTMC in differentiating trajectory moving patterns and in representation learning, compared to various baselines that follow a sequential clustering-then-modeling or modeling-then-clustering approach.
Moreover, it surpasses approaches that project trajectories into discrete space, which lose detailed spatial and temporal characteristics. In future research, we aim to expand the application of our model to diverse datasets and to utilize the model to generate synthetic datasets with different moving patterns.

4.5 Supplementary Proof of Theorems

We present the derivation of our training algorithm here. We first show the derivation of Eqn. 4.12:

\begin{aligned}
\log p_\theta(S) &= \iint q(Z,\pi)\,\log p_\theta(S)\,dZ\,d\pi \\
&= \iint q(Z,\pi)\,\log \frac{p_\theta(S)\,p_\theta(Z,\pi\mid S)}{p_\theta(Z,\pi\mid S)}\,dZ\,d\pi \\
&= \iint q(Z,\pi)\,\log \frac{p_\theta(S,Z,\pi)}{p_\theta(Z,\pi\mid S)}\,dZ\,d\pi \\
&= \iint q(Z,\pi)\,\log \left( \frac{p_\theta(S,Z,\pi)}{q(Z,\pi)} \cdot \frac{q(Z,\pi)}{p_\theta(Z,\pi\mid S)} \right) dZ\,d\pi \\
&= \underbrace{\iint q(Z,\pi)\,\log \frac{p_\theta(S,Z,\pi)}{q(Z,\pi)}\,dZ\,d\pi}_{\mathcal{L}(q(Z,\pi),\,\theta)} + \underbrace{\iint q(Z,\pi)\,\log \frac{q(Z,\pi)}{p_\theta(Z,\pi\mid S)}\,dZ\,d\pi}_{\mathrm{KL}(q(Z,\pi)\,\|\,p_\theta(Z,\pi\mid S))}. \quad (4.17)
\end{aligned}

By moving the KL-divergence KL(q ∥ p_θ) to the left-hand side, we obtain Eqn. 4.12. As such, maximizing L is equivalent to maximizing log p_θ(S), the original optimization target, while minimizing the KL-divergence, which gives us a good approximation of the posterior over cluster assignments p_θ(Z, π|S). Assuming q(Z, π) = q(Z)p(π), our goal becomes maximizing L(q, θ) with respect to q(Z) and θ. We can further derive the form of the loss function L from Eqn. 4.12 to Eqn. 4.13:

\begin{aligned}
\mathcal{L}(q,\theta) &= \iint q(Z,\pi)\,\log \frac{p_\theta(S,Z,\pi)}{q(Z,\pi)}\,dZ\,d\pi \\
&= \iint q(Z)p(\pi)\,\log \frac{p_\theta(S,Z,\pi)}{q(Z)p(\pi)}\,dZ\,d\pi \\
&= \iint q(Z)p(\pi)\,\log \frac{p_\theta(S,Z\mid\pi)}{q(Z)}\,dZ\,d\pi \\
&= \int q(Z) \int p(\pi)\,\log \frac{p_\theta(S,Z\mid\pi)}{q(Z)}\,d\pi\,dZ \\
&= \int q(Z)\,\mathbb{E}_{p(\pi)}\!\left[\log \frac{p_\theta(S,Z\mid\pi)}{q(Z)}\right] dZ. \quad (4.18)
\end{aligned}

We now derive and prove the convergence of our training algorithm. We maximize L(q, θ) with respect to q(Z) and θ alternately in an iterative manner. First, we fix θ and update q(Z):

\begin{aligned}
\mathcal{L}(q(Z),\theta) &= \int q(Z)\,\mathbb{E}_{p(\pi)}\!\left[\log \frac{p_\theta(S,Z\mid\pi)}{q(Z)}\right] dZ \\
&= \int q(Z)\,\mathbb{E}_{p(\pi)}\big[\log p_\theta(S,Z\mid\pi) - \log q(Z)\big]\,dZ \\
&= \int q(Z)\,\mathbb{E}_{p(\pi)}\log p_\theta(S,Z\mid\pi)\,dZ - \int q(Z)\log q(Z)\,dZ \\
&= -\int q(Z)\,\log \frac{q(Z)}{\tilde{p}_\theta(S,Z)}\,dZ \\
&= -\mathrm{KL}\big(q(Z)\,\|\,\tilde{p}_\theta(S,Z)\big), \quad (4.19)
\end{aligned}

where \log \tilde{p}_\theta(S,Z) = \mathbb{E}_{p(\pi)} \log p_\theta(S,Z\mid\pi).
As KL(q(Z) ∥ p̃_θ(S, Z)) ≥ 0, the minimum of KL(q(Z) ∥ p̃_θ(S, Z)), which is the maximum of L, occurs when q(Z) = p̃_θ(S, Z). Thus, in the i-th iteration, setting \log q^{(i)}(Z) = \mathbb{E}_{p(\pi)} \log p_{\theta^{(i-1)}}(S, Z \mid \pi), we get:

\mathcal{L}(q^{(i)}(Z), \theta^{(i-1)}) \geq \mathcal{L}(q^{(i-1)}(Z), \theta^{(i-1)}). \quad (4.20)

We can then further derive the update equation of q(Z) as in Eqn. 4.14:

\begin{aligned}
\log q^*(Z) &= \log \tilde{p}_\theta(S,Z) \\
&= \mathbb{E}_{p(\pi)} \log p_\theta(S,Z\mid\pi) \\
&= \mathbb{E}_{p(\pi)}[\log p(Z\mid\pi)] + \log p_\theta(S\mid Z) \\
&= \sum_{n=1}^{N}\sum_{k=1}^{K} z_{nk}\big\{\mathbb{E}_{p(\pi)}[\log \pi_k] + \log p_\theta(S_n\mid k)\big\}. \quad (4.21)
\end{aligned}

Then, we fix q(Z) and update θ, where terms irrelevant to θ can be viewed as constants. As θ comprises all the learnable parameters of the neural networks, it is hard to directly obtain a closed-form optimum like q^*(Z), so we optimize θ using gradient descent instead:

\begin{aligned}
\mathcal{L}(q(Z),\theta) &= \int q(Z)\,\mathbb{E}_{p(\pi)}\!\left[\log \frac{p_\theta(S,Z\mid\pi)}{q(Z)}\right] dZ \\
&= \int q(Z)\,\mathbb{E}_{p(\pi)}\big[\log p_\theta(S,Z\mid\pi) - \log q(Z)\big]\,dZ \\
&= \int q(Z)\,\mathbb{E}_{p(\pi)}\log p_\theta(S,Z\mid\pi)\,dZ + \text{Const} \\
&= \int q(Z)\big\{\mathbb{E}_{p(\pi)}[\log p(Z\mid\pi)] + \log p_\theta(S\mid Z)\big\}\,dZ + \text{Const} \\
&= \int q(Z)\,\log p_\theta(S\mid Z)\,dZ + \text{Const} \\
&= \underbrace{\mathbb{E}_{q(Z)}[\log p_\theta(S\mid Z)]}_{\mathcal{L}(\theta)\ \text{in Eqn. 4.16}} + \text{Const}. \quad (4.22)
\end{aligned}

In the i-th iteration, we obtain θ^{(i)} by minimizing −L(θ) with gradient descent. However, as the optimization problem is non-convex, we cannot guarantee that L increases after a single gradient descent step. To address this, during each update iteration we perform M > 1 gradient descent steps. It is therefore reasonable to assume that after M steps:

\mathcal{L}(q^{(i)}(Z), \theta^{(i)}) \geq \mathcal{L}(q^{(i)}(Z), \theta^{(i-1)}). \quad (4.23)

Combining Eqn. 4.20 and Eqn. 4.23, we get:

\mathcal{L}(q^{(i)}(Z), \theta^{(i)}) \geq \mathcal{L}(q^{(i-1)}(Z), \theta^{(i-1)}), \quad (4.24)

which shows that the objective function L(q(Z), θ) is non-decreasing after each iteration. Thus, by updating q(Z) and θ alternately in this iterative manner, we are able to converge to an optimum.

Chapter 5
Controllable Visit Trajectory Generation with Spatiotemporal Constraints

In this chapter, we study the problem of explicit control over the trajectory generation algorithm.
We focus on visit-based trajectories that capture the sequence of locations an individual visits, such as home, a coffee shop, the office, and the gym [95]. Each visit is specified by its geo-coordinates (latitude and longitude) and the time range of the visit. A visit-based trajectory consists of multiple visits per person per day, making it sparse with irregular inter-arrival times and hence challenging to predict [14, 45]. We employ spatiotemporal constraints as our control mechanism. Essentially, a spatiotemporal constraint can be thought of as a 3D cube that defines the valid ranges for a visit in terms of latitude, longitude, and time (e.g., given a time range, generate a trajectory that visits specific locations; see Fig. 5.1). The generation of visit-based synthetic trajectories should take these spatiotemporal constraints as input and produce synthetic trajectories that are learned from previously provided training data and adhere to the constraints. Without these constraints, generated trajectories often contain ill-posed visiting patterns, making them unreliable for experimental analysis of movement patterns.

Figure 5.1: An example of a diversified spatiotemporal constraint. Input: arrive at [9:15–10:00] between [29.909, −95.349] and [29.920, −95.362]. To satisfy the constraint, a staypoint in the output trajectory must appear within the defined spatial range (outlined by the red rectangle) and time window (9:15–10:00).

Imposing spatiotemporal constraints on trajectory generation is challenging. A common approach to constrained generation collects task-specific examples and introduces additional input signals or conditioning information into the model during the training or fine-tuning stages. This method is prevalent in many natural language processing (NLP) applications, ensuring that generated text adheres to specific requirements for style and tone [40, 39].
However, this approach often fails to follow fine-grained constraints (e.g., enforcing a specific word in neural text generation, or following spatiotemporal constraints in our task), even with large training datasets [52]. Large datasets may cover a wider range of scenarios, but they do not inherently teach the model how to prioritize or choose between different trajectory options based on external requirements or constraints. We hypothesize that the major difficulty in constrained trajectory generation lies in the under-specification of the constraints during the training and generation processes. While this line of methods aims to increase the likelihood of generating samples from the same distribution as the training data, it does not guarantee the enforcement of the desired constraints. Another potential approach to circumvent this problem is to optimize the generator in an unconstrained manner and then use threshold filters to select valid trajectories within the constraint range [68]. This method is limited because trajectories inherently exhibit complex and irregular moving patterns, with each data point existing in continuous time and space; this makes it highly unlikely for a trajectory sampled in the unconstrained space to satisfy diverse spatiotemporal conditions, especially constraints with very narrow temporal and spatial windows.

To fill the gap and address the above challenges, we propose Geo-CETRA, a general framework that generates high-quality synthetic trajectories in continuous space-time while effectively enforcing the satisfaction of given spatiotemporal constraints by reparameterizing the sampling space itself and controlling the decoding of the generation process. Specifically, we first show how different constraints can be factorized and interpreted for the model design.
Second, we propose a novel constrained spatiotemporal generation model that extracts knowledge from continuous temporal and spatial space, reflecting real-world moving patterns more accurately via temporal and spatial encoders based on random Fourier features, and explicitly enforces the constraints using reparameterization. Finally, we design a new decoding module to control the generation process and further improve the final performance. The major contributions of this work can be summarized as follows:

• To the best of our knowledge, we are the first to formally formulate and define the problem of constrained trajectory generation with hard spatiotemporal constraints, and we generate trajectories in a continuous spatiotemporal space.

• We present our Geo-CETRA model, which learns the inherently complex and continuous spatial and temporal patterns and produces high-dimensional spatiotemporal representations to support generating realistic trajectories. We apply reparameterization, which provides a general and meaningful scheme for incorporating different spatiotemporal constraints into any gradient-based optimizer.

• We conduct extensive experiments on two real-world datasets with various evaluation metrics. The empirical results show that, compared to state-of-the-art generative methods, our method not only guides generation toward the expected constraints but also generates more realistic human movement trajectories under realistic constrained settings.

5.1 Preliminaries

Definition 8 (Visit-based trajectory). Visit-based trajectories are sequences of GPS points that describe the visiting path of a human agent in continuous space and time. We denote a visit-based trajectory as τ = {(t_i, l_i)}_{i=1}^{L}, where t_i ∈ ℝ^+ is the timestamp, l_i ∈ ℝ^d denotes the spatial location associated with each timestamp, and L is the total number of spatiotemporal points.
In this study, each spatial location corresponds to a two-dimensional latitude-longitude coordinate, such that l_i = (x_i, y_i). The temporal domain of the trajectory can equivalently be represented as a sequence of strictly positive inter-visit times ∆t_i = t_{i+1} − t_i at the i-th point. Representations in terms of t_i and ∆t_i are isomorphic; we will use them interchangeably for learning throughout the paper.

Problem 3 (Constrained trajectory generation). Given a real-world trajectory dataset where each trajectory is subject to a distinct spatiotemporal constraint C(τ), the objective is to develop a model that captures the spatiotemporal patterns p(τ) and generates new trajectories that retain the key characteristics of the original dataset. This objective can be formalized as the following optimization problem:

\begin{aligned}
\text{maximize} \quad & \log p(\tau) \\
\text{subject to} \quad & C(\tau). \quad (5.1)
\end{aligned}

Trajectory models rely on an auto-regressive factorization to perform likelihood estimation and generation of trajectories. The auto-regressive factorization for a trajectory generation model is given by the chain rule as follows:

p_\theta(\tau) = \prod_{i=1}^{L} p_\theta\big((t_i, l_i) \mid \{(t_j, l_j)\}_{j < i}\big).
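The factorization above translates directly into a log-likelihood computed as a sum of per-visit conditional terms. A minimal sketch (point_log_prob is a hypothetical stand-in for a trained model's conditional density, not part of Geo-CETRA's actual interface):

```python
def trajectory_log_likelihood(traj, point_log_prob):
    """Chain-rule log-likelihood of a visit-based trajectory:
    log p(tau) = sum_i log p((t_i, l_i) | {(t_j, l_j) : j < i}).

    traj: list of (t_i, l_i) visits in temporal order.
    point_log_prob(visit, history): conditional log-density of one
    visit given all preceding visits (stand-in for a trained model)."""
    return sum(point_log_prob(traj[i], traj[:i]) for i in range(len(traj)))
```

Generation follows the same factorization left to right, sampling each new visit conditioned on the history, which is where a per-step mechanism can enforce the spatiotemporal constraint C(τ).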