Optimizing Healthcare Decision-Making: Markov Decision Processes for Liver Transplants, Frequent Interventions, and Infectious Disease Control

by

Suyanpeng Zhang

A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(INDUSTRIAL AND SYSTEMS ENGINEERING)

May 2024

Copyright 2024 Suyanpeng Zhang

Dedication

To my parents, Mr. Jin Su and Mrs. Li Zhang, and my wife, Dr. Vicky Li, for their unconditional love.

Acknowledgements

First and foremost, I would like to express my deepest gratitude to my PhD advisor, Dr. Sze-chuan Suen. Being advised by you was an incredible experience. You continuously guided and supported me in my research, preparation for an academic career, and personal life. Your enthusiasm and encouragement motivated me to pursue an academic career.

I also wish to express my appreciation to my defense and qualifying exam committee members: Dr. Shinyi Wu, Dr. Randolph Hall, Dr. Maged Dessouky, and Dr. Renyuan Xu. Dr. Shinyi Wu, the encouragement you gave after my qualifying exam presentation strengthened my resolve to pursue an academic career. Dr. Randolph Hall, your comments and feedback helped me shape the research ideas that are presented here. Dr. Maged Dessouky, I am grateful for your guidance on our projects and for giving me the opportunity to teach classes. Dr. Renyuan Xu, I greatly appreciate your suggestions and help during my job application process.

On this path, my sincere thanks go to Dr. Fengyan Li and Dr. Kristin Bennett at Rensselaer Polytechnic Institute for their advisement and for introducing me to the field of optimization in healthcare. I also wish to acknowledge Dr. Brian Denton, my Master's program advisor, for his unwavering mentorship and assistance with research projects and with PhD and job applications.

My journey through the Ph.D. program was enriched by the love, support, and encouragement from numerous incredible people and groups. Special thanks go to Mingdong, Weizhi, Huiwen, Di, Yingxiao, Qing, Julien, Caroline, Wei, Chris, Peng, Anthony, Geno, Citina, Yiwen, Jinghong, Jing, Ke, and Haozhu.

The foundation of where I stand today has been laid by my family's unwavering support. To my parents, Jin Su and Li Zhang, I owe a debt of gratitude for their endless love and solid backing throughout my life. Additionally, to my wife, Vicky Li, thank you for your boundless support and for empowering me with the courage and strength needed to achieve this milestone.

Table of Contents

Dedication
Acknowledgements
List of Tables
List of Figures
Abstract
Chapter 1: Introduction
1.1 Challenges
1.1.1 Formulating MDP in healthcare contexts
1.1.2 Frequency of decision-making
1.1.3 Continuous state space
1.2 Background
1.2.1 Variety of MDPs
1.2.1.1 Stopping problems
1.2.1.2 Continuous-state MDPs
1.2.2 MDP with Different Epoch Sizes
1.3 Summary of Contributions
Chapter 2: Early transplantation maximizes survival in severe acute-on-chronic liver failure: Results of a Markov decision process model
2.1 Introduction
2.2 Patients and Methods
2.2.1 United Network for Organ Sharing (UNOS) database analysis
2.2.2 Identification of patients with ACLF
2.2.3 Overview of model creation
2.2.4 Model assumptions
2.2.5 Details of Markov model
2.2.6 Timing of organ acceptance and relative risk
2.2.7 Outcome metrics
2.2.8 Statistical analysis
2.3 Results
2.3.1 Patient characteristics, categorized by age group and number of organ failures
2.3.2 Non-transplant survival probability
2.3.3 Post-transplant survival probability
2.3.4 Overall survival probability
2.3.5 Timing of accepting a marginal quality donor organ: base case
2.3.6 Variation in the relative risk and probability of optimal organ offer
2.3.7 Hepatic vs. extrahepatic ACLF-3
2.3.8 Sensitivity analyses
2.3.9 Analysis of length of stay
2.4 Discussion
Chapter 3: Quantifying the Benefits of Increasing Decision-Making Frequency for Health Applications with Regular Decision Epochs
3.1 Introduction
3.1.1 Research Question and Approach
3.2 Literature Review
3.2.1 Markov Decision Processes in Healthcare Applications
3.2.2 Epoch Sizes in Markov Decision Processes
3.2.3 Organ Transplantation with Stochastic Dynamic Models
3.2.4 Treatment Initiation with Stochastic Dynamic Models
3.3 Model Formulation
3.4 Structural Properties
3.4.1 Assumptions
3.4.2 Structural Observations
3.5 Numerical Example
3.5.1 Organ Transplantation Decisions Among ACLF Patients
3.5.1.1 Model Inputs
3.5.1.2 Assumptions
3.5.1.3 Base Case Results
3.5.1.4 Sensitivity Analyses: Variation in Relative Risk of Mortality and Probability of Being Offered an Optimal Liver
3.5.2 Treatment Initiation for Early-Stage CKD
3.6 Conclusions
Chapter 4: State Discretization for Continuous-State MDPs in Infectious Disease Control
4.1 Introduction
4.1.1 Contributions
4.2 Literature Review
4.2.1 Markov Decision Processes in Healthcare Applications
4.2.2 Solving Continuous State MDP
4.2.3 Modeling Disease Dynamics
4.3 Problem Setup
4.3.1 State Space Discretization Problem
4.4 Algorithms
4.4.1 Greedy Algorithm for Finding Discretizations (GreedyCut)
4.4.1.1 Complexity Analysis
4.4.2 Constructing a Corresponding Transition Matrix
4.4.2.1 Complexity Analysis
4.5 Numeric Analysis
4.5.1 Example 1: A Simple SIR Model
4.5.1.1 Inputs
4.5.1.2 MDP Solutions
4.5.1.3 General Algorithm Evaluation
4.5.2 Example 2: COVID-19 Example
4.5.2.1 Model Structure and Inputs
4.5.2.2 MDP Results
4.5.2.3 Extension: MDP with at Most Two Policy Switches
4.6 Conclusion
Chapter 5: Conclusions
Appendix A: Appendix for Chapter 2
A.1 Model details
A.1.1 Calculating the Expected One-Year Survival Probability with a Marginal Liver
A.1.2 Additional details regarding the Markov Decision Process Model
A.1.2.1 Model formulation
A.1.2.2 Identifying the MDP Optimal Solution
A.1.2.3 Base Case Solution
A.2 Supplementary figures
A.3 Supplementary tables
Appendix B: Appendix for Chapter 3
B.1 Proofs
B.2 Organ Transplantation Decisions Among ACLF Patients
B.2.0.1 Model Inputs
B.2.0.2 State Definition for Liver Transplantation Example
B.2.0.3 Transition Probability Matrix
B.2.1 Parameters Used in the Liver Transplant Example
B.2.2 Decision boundaries for ACLF-3 patients with Ck = 2,000
B.2.3 Decision boundaries for ACLF-3 patients with Ck = 10,000
B.2.3.1 Variation in k
B.3 Treatment Initiation for Chronic Kidney Disease
B.3.0.1 Model Structure and Inputs
B.3.0.2 Results
Bibliography

List of Tables

2.1 Statistical analysis performed using Analysis of Variance for continuous variables and Chi-square analysis for categorical variables
2.2 Model parameters and sources, including pre- and post-transplant survival probabilities, relative risk of post-LT mortality, and health-related utility values
4.1 Comparison of MDP solutions
4.2 Runtime of Algorithm 2
4.3 Mean squared error for the trajectories given different numbers of discretizations
4.4 Mean squared error for the trajectories given different numbers of discretizations
A.1 Baseline result
A.2 Criteria to determine presence of organ dysfunction/failure
A.3 Overall 1-year survival probability based on the decision to transplant on a specific day after listing or defer LT for one day
A.4 Pre- and post-transplant survival probabilities for ACLF-2 patients, and likelihood of ACLF-3 patients improving to ACLF-2
A.5 Pre- and post-transplant survival probabilities for hepatic and extrahepatic ACLF-3 patients
A.6 Post-transplant length of hospital stay based on day of transplantation
B.1 State definition for liver transplantation example
B.2 Parameters
B.3 Decision Boundaries (the maximum number of epochs for an optimal liver), Value of Peak D, and Time at Peak D Under Different Conditions
B.4 Decision Boundaries (the maximum number of epochs for an optimal liver), Value of Peak D, and Time at Peak D Under Different Conditions
List of Figures

2.1 Diagram of patient flow while awaiting liver transplantation. ACLF-3, acute-on-chronic liver failure grade 3.
2.2 Overall 1-year survival probability based on the decision to transplant on a specific day or defer LT for 1 day. ACLF-3, acute-on-chronic liver failure grade 3; LT, liver transplantation.
2.3 Two-way sensitivity analyses, accounting for center variation regarding probability of receiving an optimal organ offer and expected 1-year post-LT survival using a marginal quality organ.
3.1 Timeline of the more- and less-frequent problems (using k=4 as an example). In the less-frequent problem, the decision-maker can make one decision every four time units at the beginning of each decision epoch. In the more-frequent problem, the decision-maker can make one decision every time unit at the beginning of each decision epoch, resulting in four times as many decisions as in the less-frequent problem.
3.2 Difference in the expected reward earned over time between the more-frequent (M) and less-frequent (L) problems for the marginal states with ACLF-2, ACLF=3OF, and ACLF>3OF. The difference between the more-frequent and less-frequent problems drops below $0 once both problems' optimal actions are accept. Triangles denote when the optimal action for both problems is wait, '+' when it is wait in M and accept in L, and '*' when the optimal action for both problems is accept.
3.3 Difference in the expected reward earned over time between the more-frequent (M) and less-frequent (L) problems for the marginal states with ACLF-2. The difference in optimal value is marked with a triangle when the optimal action for both problems is wait, with '+' when the optimal action for the more-frequent problem is wait while the optimal action for the less-frequent problem is accept, and with '*' when the optimal action for both problems is accept.
3.4 Difference in the expected reward earned over states between the more-frequent (M) and less-frequent (L) problems for days 2, 4, and 6.
4.1 Four regions defined using G = {[0, 0.6, 1], [0, 0.2, 1]}, shown in different colors. These correspond to four states: (1) [X̄_S, X̄_I] = [0.3, 0.1]; (2) [X̄_S, X̄_I] = [0.3, 0.6]; (3) [X̄_S, X̄_I] = [0.8, 0.1]; (4) [X̄_S, X̄_I] = [0.8, 0.6]. For example, for X_t = [0.1, 0.3], the corresponding discretized state representation is X̄_t = [0.3, 0.6].
4.2 Applying Cut(1,1,G), where G = {[0, 0.6, 1], [0, 0.2, 1]}, gives the new discretization G′ = {[0, 0.3, 0.6, 1], [0, 0.2, 1]}. In the new discretization, X̄_t changes because the Euclidean centroid of the region containing X_t has changed. For G, ||X_t − X̄_t||_2 = 0.11; for G′, ||X_t − X̄_t||_2 = 0.0925.
4.3 Comparison of the optimal solution at t = 5 across different states using the GreedyCut and uniform discretized MDPs against the ground-truth optimal solution found using brute-force methods. (a): Optimal solution from the GreedyCut discretized MDP compared to the brute-force solution; (b): optimal solution from the uniform discretized MDP compared to the brute-force solution. (0: both models recommend not implementing lockdown; 1: both models recommend implementing lockdown; 2: the brute-force method recommends not implementing lockdown while the other method recommends lockdown; 3: the brute-force method recommends implementing lockdown while the other method recommends not implementing lockdown.)
4.4 Runtime of Algorithm 1
4.5 Comparison between trajectories.
4.6 Comparison between trajectories generated from the GreedyCut discretization method and the uniform discretization method (using 300 discretizations in total), given trajectories generated from the SIR model. For each compartment S, I, and R, the GreedyCut discretization method better captures the disease dynamics.
4.7 Proportion of susceptible/infectious over time for different MDPs.
4.8 Comparison between objectives.
4.9 Lockdown policy. The GreedyCut discretized MDP recommends starting the lockdown on week 7 for 42 weeks. The uniform discretized MDP recommends starting the lockdown on week 8 for 50 weeks.
4.10 Proportion of the population susceptible/infectious over time for different MDPs with policy constraints.
A.1 (a): Expected one-year survival probability with a marginal liver by relative risk for two age groups, given a 60% probability of receiving an optimal organ. (b): Expected one-year survival probability with a marginal liver by relative risk for two organ-failure groups, given a 60% probability of receiving an optimal organ. The expected one-year survival probability with a marginal liver gives the proportion of post-transplant patients that will survive up to one year with a marginal liver transplanted. This value depends on the relative risk (rr). There is a positive linear relationship between the expected one-year survival probability with a marginal liver and the relative risk.
A.2 Non-transplant survival probabilities, according to recipient age and number of organ failures present at listing
A.3 Post-transplant survival probabilities, using an optimal or marginal donor organ
A.4 Model schematic. Diagram of ACLF-3 patient flow while waiting for transplant. The left panel shows the flowchart of daily patient outcomes. The pre-transplant health state changes are modeled by the Markov model in the top red box; the post-transplant health states are also described by a Markov model (lower red box). Greek letters are probabilities and are given in Table 2.1 and Table A.2.
A.5 Two-way sensitivity analysis after removal of patients with suspected chronic kidney disease
A.6 Two-way sensitivity analyses after removal of patients transplanted before year 2014
A.7 Two-way sensitivity analyses, accounting for center variation regarding probability of receiving an optimal organ offer and expected 1-year post-LT survival using a marginal quality organ, for hepatic and extrahepatic ACLF-3 patients
B.1 Difference in the expected reward earned over time between the more-frequent (M) and less-frequent (L) problems for CKD stage 1 and CKD stage 2. Triangles mark the difference in optimal value when the optimal action for both the more-frequent and less-frequent problems is wait; '+' marks when the optimal action for both problems is accept.

Abstract

Repeated decision-making problems in the context of uncertainty naturally arise in healthcare settings. Markov decision processes (MDPs) have proven useful in many healthcare contexts, integrating disease progression, decision-making, costs, and benefits into an optimization framework. However, implementing MDPs in healthcare settings is nontrivial due to challenges including incorporating unique characteristics of certain diseases, determining the optimal frequency of decision-making, and dealing with an infinite number of possible states. In this dissertation, we focus on specific healthcare problems and identify key structural properties to address healthcare questions. We present a finite-horizon MDP framework for patients with acute liver failure in need of a transplant, determining the optimal timing for accepting a suboptimal organ to maximize one-year survival probability. Additionally, we study the value provided by having additional decision-making opportunities in each epoch. We provide structural properties of the optimal policies and quantify the difference in optimal values between MDP problems with different decision-making frequencies. We analyze numerical examples using liver transplantation in high-risk patients and treatment initiation for chronic kidney disease patients to illustrate our findings. Finally, in the fourth chapter, to address the curse of dimensionality, we propose a novel greedy algorithm for non-uniform discretization in a population-level MDP for infectious disease control.
The dissertation contributes to the field of healthcare applications by providing practical MDP frameworks and efficient algorithms to tackle complex decision-making problems. The theoretical results and empirical analyses offer valuable guidance for healthcare decision-makers in diverse scenarios.

Chapter 1

Introduction

Numerous healthcare challenges revolve around creating stochastic modeling frameworks that assist caregivers in developing tailored treatment plans for individuals, enable policymakers to evaluate the impact of policies on the public, and help patients attain improved health outcomes. Many of these problems must address repeated decision-making with fixed decision intervals in the context of uncertainty. In individual settings, applications include patients with chronic illnesses who may require a treatment regimen tailored to their disease progression, which may change over time (such as for glaucoma [76], diabetes [85], and other diseases [3]), or patients with organ failure who may be offered organs of varying quality for transplantation and may choose to wait or accept offered organs over time as their own survival probability declines [5, 75, 50, 161]. In the infectious disease control context, Grimm et al. (2021) and Kai et al. (2020) compare the effectiveness of wearing masks and social distancing during the COVID-19 pandemic [60, 72]. Tuite et al. (2014) examines the cost-effectiveness of more-frequent screening recommendations for HIV-positive men who have sex with men (MSM) [137], and Suen et al. (2014) studies the effectiveness of different disease control strategies for tuberculosis (TB) in India [121]. In the various healthcare applications mentioned above, the Markov decision process (MDP) model showcases its adaptability across many different settings, as it seamlessly integrates disease progression, decision-making, and costs and benefits into one optimization framework.
The optimal policy can then be tractably solved using efficient algorithms. By solving this optimization problem, the decision-maker knows the optimal action to take, given diverse health statuses (severe, mild, or on treatment) and decision periods (different ages, for example). MDPs have been extensively employed in the field of operations research, spanning various applications [54, 20, 13]. The literature on MDPs also offers a profound understanding of their structure and effective solution techniques [107, 55, 134], serving as the fundamental basis for subsequent studies concerning threshold structures of MDP policies, upon which we build our analysis as well. MDPs are also a commonly used tool for healthcare applications [46, 2, 68, 90, 6, 5, 28, 82, 93, 116, 47, 36, 85, 3, 95, 87, 119, 17, 114, 159, 111, 112, 80, 12, 118]. In particular, Hu et al. (1996) and Zargoush et al. (2018) used MDPs to study drug therapy problems [68, 159]. Magni et al. (2000) compared static and dynamic models for the same healthcare application [90]. Jennifer (2007) studied disease management for sepsis patients [82]. Shechter et al. (2008) proposed an MDP framework to identify the optimal time to initiate HIV treatment under ordered health states [116]. Liu et al. (2017) considered technology improvement in hepatitis C treatment using an MDP model [87]. Maillart et al. (2008), Chhatwal et al. (2010), and Alagoz et al. (2013) studied breast cancer screening policies using an MDP framework [93, 36, 3]. Denton et al. (2009), Kurt et al. (2010), and Mason et al. (2014) used MDP frameworks to solve problems for type 2 diabetes patients [47, 85, 95]. Arruda et al. (2019) studied a sequential testing problem using dynamic programming and shortest path algorithms [12]. Singh et al. (2020) also presented a POMDP formulation for sequential testing for infectious diseases [118]. Khojandi et al.
(2018) developed a semi-MDP model to find extraction policies maximizing lifetime for patients undergoing cardiovascular implantable electronic device treatment [80]. Ayer et al. (2019) optimized blood collection operations using an MDP [17]. Schell et al. (2019) improved patient adherence to treatment plans through an MDP [114]. Skandari et al. (2021) formulated a partially observable MDP (POMDP) model for patients with chronic kidney disease [119]. Moreover, many organ transplant problems are solved using MDPs and POMDPs [46, 2, 6, 5, 28, 111, 112], which we discuss at greater length below. Several papers provide more detailed overviews of work using MDPs in healthcare applications [113, 4, 120]. Understanding the significance of MDPs in healthcare sets the stage for exploring the associated challenges. For example, when constructing an MDP framework for healthcare applications, the framework may need to be modified based on the unique structure of the problem (a continuous state space, for example). We consequently need to study the structural properties of these problems and provide efficient algorithms to solve them. We next provide a brief overview of some of the challenges we identified for the implementation of MDPs in real-world healthcare problems.

1.1 Challenges

Implementing MDPs to address healthcare issues is far from straightforward, given the models' intricate structure, the need for enhanced solution algorithms, the consideration of continuous-time decision-making, and many other factors. Our primary objective is to concentrate on applying the finite horizon MDP framework to specific healthcare problems while identifying key structural properties that aid in resolving the health-related questions that arise.

1.1.1 Formulating MDPs in healthcare contexts

In diverse health settings, a general MDP framework may need to be adjusted to incorporate certain associated characteristics.
For instance, organ transplant problems are usually modeled as stopping problems [161]. To this end, we parameterize classic MDP models to describe the realistic settings of the different problems in all our work. This involves defining appropriate health states, extracting monetary rewards from the literature, and constructing transition matrices for different healthcare applications. These processes are essential for effectively addressing the distinct challenges and dynamics of each healthcare application.

1.1.2 Frequency of decision-making

In general, the frequency at which decisions can be made or changed in these contexts is determined by some physical limitation that occurs regularly over time, e.g., the frequency of doctor's visits or transplantation offers. In some cases, the frequency could be changed at additional cost (a patient on an organ transplant wait list can choose to transfer or multiply list at other transplant centers, for example). In many contexts, it is important to identify the best times to offer more frequent decision-making opportunities and quantify the associated benefits. This allows for better evaluation of whether the benefits justify the potential costs of creating these additional decision-making opportunities. However, to our knowledge, no prior work has examined the benefits of additional decision-making opportunities in an MDP context. We address this problem in Chapter 3.

1.1.3 Continuous state space

When dealing with MDPs for large populations, we inevitably encounter the curse of dimensionality as the number of states increases. This renders the problem difficult to solve, as the explosion in state space size makes traditional methods impractical even for short time horizons, given that time complexity depends heavily on state space size.
Chapter 4 of this dissertation aims to identify a viable approach to discretize the state space effectively, enabling MDPs to find high-quality solutions with discretized states.

1.2 Background

MDPs date back to the 1960s [67]. An MDP is a discrete-time stochastic control process that has been widely applied to various applications including inventory management [54], portfolio management [20], production and storage [13], and others. An MDP framework consists of states (S), actions (A), transitions (P), and rewards (R). S is the set of states used to describe the system. A is the set of actions/controls that can be applied. The transition probability P_t(s'_{t+1} | s_t, a_t) describes the probability that taking action a_t ∈ A in state s_t ∈ S at time t will lead to state s'_{t+1} ∈ S in the next time period. R_t(s_t, a_t) is the reward or cost associated with taking action a_t ∈ A in state s_t ∈ S at time t. The objective of this process is to find the optimal policy π such that the total expected reward is maximized:

E[ Σ_{t=0}^{∞} λ^t R_t(s_t, a_t) ]

where a_t = π(s_t), i.e., the actions given by the optimal policy. This optimization can be solved using dynamic programming:

V_t(s_t) = Σ_{s'_{t+1}} P(s'_{t+1} | s_t, π(s_t)) ( R_t(s_t, π(s_t)) + V_{t+1}(s'_{t+1}) )

π(s_t) = argmax_{a_t} { Σ_{s'_{t+1}} P(s'_{t+1} | s_t, a_t) ( R_t(s_t, a_t) + V_{t+1}(s'_{t+1}) ) }

We point the reader to Schaefer et al. (2004), Alagoz et al. (2010), Sonnenberg et al. (1993), and Givan et al. (2001) for a more complete review of MDPs [113, 4, 120, 55].

1.2.1 Varieties of MDPs

1.2.1.1 Stopping problems

The stopping problem can be viewed as a simplified version of a standard MDP [103, 117, 21]. The action space is limited to two actions: accept or wait. Once the action accept is chosen, the process terminates.
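Before specializing to the stopping case, the general finite-horizon dynamic program above can be sketched in a few lines of Python. The two-state, two-action transition matrices and rewards below are hypothetical numbers chosen purely for illustration, not parameters from any chapter of this dissertation.

```python
import numpy as np

def backward_induction(P, R, T):
    """Finite-horizon backward induction.

    P[a] is an |S| x |S| transition matrix for action a,
    R[a] is an |S|-vector of rewards for action a,
    T is the number of decision epochs.
    Returns the value function V ((T+1) x |S|) and policy (T x |S|).
    """
    n_actions, n_states = len(P), P[0].shape[0]
    V = np.zeros((T + 1, n_states))            # terminal values V_T = 0
    policy = np.zeros((T, n_states), dtype=int)
    for t in range(T - 1, -1, -1):
        # Q[a, s] = R_t(s, a) + sum over s' of P(s'|s, a) * V_{t+1}(s')
        Q = np.array([R[a] + P[a] @ V[t + 1] for a in range(n_actions)])
        V[t] = Q.max(axis=0)
        policy[t] = Q.argmax(axis=0)
    return V, policy

# Hypothetical two-state ("mild", "severe"), two-action ("wait", "treat") data
P = [np.array([[0.90, 0.10], [0.20, 0.80]]),   # transitions under wait
     np.array([[0.95, 0.05], [0.60, 0.40]])]   # transitions under treat
R = [np.array([1.0, 0.2]),                     # per-epoch reward under wait
     np.array([0.8, 0.5])]                     # per-epoch reward under treat
V, policy = backward_induction(P, R, T=10)
```

With nonnegative rewards and a zero terminal value, having more epochs remaining can only add reward, so V[0] dominates V[1] elementwise in this example.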
For a stopping problem, the revised optimization problem becomes:

V_t(s_t) = Σ_{s'_{t+1}} P(s'_{t+1} | s_t, π(s_t)) ( R_t(s_t, π(s_t)) + V_{t+1}(s'_{t+1}) )

π(s_t) = argmax_{a_t ∈ {wait, accept}} { Σ_{s'_{t+1}} P(s'_{t+1} | s_t, a_t) ( R_t(s_t, a_t) + V_{t+1}(s'_{t+1}) ) }

Studies of stopping problems are often interested in examining the existence of threshold policies and in identifying those thresholds: for example, finding s*_t such that accept is the optimal action in state s_t whenever s_t ≤ s*_t. The stopping problem has been widely applied to healthcare problems, including cancer screening [3], treatment initiation [116, 85], organ transplantation [161], and so on. In Chapter 2 and Chapter 3, we construct stopping problem frameworks for liver transplant and treatment initiation problems. We also examine structural properties under this framework.

1.2.1.2 Continuous-state MDPs

A continuous-state MDP is another extension of the MDP in which the state space S is continuous. For example, s ∈ S can be any continuous number between 0 and 1. Such problems are widely seen in partially observable MDPs (POMDPs), where one can compute a belief state b_t ∈ [0, 1], a continuous state that represents the likelihood of being in each underlying state. The POMDP problem can then be viewed as a continuous-state MDP. Truncating and discretizing the state space [29] are common approaches to the continuous-state issue. For example, Zhou et al. (2010) used Monte Carlo simulation to approximate the belief state by a finite number of particles on a discretized grid mesh [163]. Sandikci et al. (2013) used fixed-resolution, non-uniform grids to discretize the belief state and approximate the optimal policy for a POMDP model [112]. Lovejoy (1991) used fixed, uniform grids to approximate the solution of the POMDP [89].
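A minimal illustration of the fixed, uniform-grid approach is given below. This is a sketch only: the grid size and the nearest-point projection rule are our own choices for exposition, not the constructions used in the cited works.

```python
import numpy as np

def uniform_grid(n_points):
    """Uniform discretization of the one-dimensional belief interval [0, 1]."""
    return np.linspace(0.0, 1.0, n_points)

def project(belief, grid):
    """Snap a continuous belief state to its nearest grid point."""
    return grid[np.abs(grid - belief).argmin()]

grid = uniform_grid(11)        # grid points 0.0, 0.1, ..., 1.0
b_hat = project(0.234, grid)   # nearest grid point is 0.2
```

A belief-state MDP is then solved over the finite grid, with each post-transition belief projected back onto the grid so that standard discrete-state algorithms apply.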
However, employing uniform or pre-defined discretizations might not be suitable in all cases, especially when dealing with infectious disease control problems whose dynamics are hard to predict under varying policy scenarios. To tackle such instances, in Chapter 4 we develop an efficient state discretization algorithm that allows these problems to be formulated as discrete-state MDPs.

1.2.2 MDPs with Different Epoch Sizes

There are two main time-related components that impact a decision-making process: the time horizon and the epoch size. The former has been studied in prior literature, as exemplified by work that considers the effect of different lengths of life on decision-making [49, 48]. The latter has received less attention in healthcare applications. In general, in healthcare scenarios, the timing for making decisions, known as the decision epoch, is constrained by physical limitations that recur at regular intervals. This might include the frequency of doctor's visits or transplantation offers. Such constraints in the healthcare environment are frequently associated with significant costs and must occur after a discrete interval of time. For example, chronic kidney disease (CKD) treatment regimen changes can only be made when the patient visits a doctor's office, which may happen at some interval (e.g., weekly, monthly, etc.). In these contexts, it is important to identify the best times to offer more frequent decision-making opportunities and quantify the associated benefits. This allows for a better evaluation of whether the benefits justify the potential costs of creating these additional decision-making opportunities. While an alternative would be to use a continuous-time MDP (CTMDP) or semi-MDP (SMDP) model, we focus on a discrete-time formulation in alignment with the majority of the work in clinical and healthcare applications using MDPs, with the hope that this makes our work more generalizable to existing MDP models.
CTMDP and SMDP frameworks are usually more difficult and more computationally costly to solve than discrete-time MDP models, which may also contribute to their relative unpopularity in the healthcare application context. Unlike discrete-time MDPs, CTMDPs and SMDPs assume that the time spent in a state before a transition follows an exponential distribution or an arbitrary distribution, respectively. For example, Chitgopekar (1969) generalized SMDPs so that actions can be taken at time points between state transitions [38]. While CTMDPs have been used in many applications [74, 15], these are outnumbered by those using a discrete-time framework. However, even within the context of discrete-time models in healthcare, the choice of epoch size is not always clear. This has led to prior work on methods to convert between epoch sizes; for example, Chhatwal et al. (2016) show how eigendecomposition methods can be used to convert transition probability matrices between different lengths of time [37]. We use this technique to convert transition probabilities as well as reward functions between frequencies in our work. One notable prior work has tangentially addressed the issue of epoch size in an MDP using a variable decision-making frequency model. Alagoz et al. (2013) formulated a finite-horizon MDP model (a stopping problem) for breast cancer diagnosis, with the goal of reducing unnecessary follow-ups by considering follow-up at different frequencies [3]. They introduced two non-terminating actions (follow-ups), which may be chosen every 6 months and every 12 months, respectively [3]. That work introduces the utility of considering different action frequencies when solving for optimal health policies, but it does not quantify the benefits of more frequent decision-making; such quantification informs decision-makers of how much to invest when considering an increase in decision-making frequency, and it is what we do here.
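The eigendecomposition-based conversion described by Chhatwal et al. (2016) amounts to taking a matrix root of the transition matrix. A sketch follows; the three-state monthly matrix is a hypothetical example, and real matrices may need extra care when eigenvalues are complex or negative.

```python
import numpy as np

def convert_epoch(P, k):
    """Approximate the k-th root of a transition matrix via eigendecomposition,
    i.e., convert a one-epoch matrix into a matrix for an epoch 1/k as long."""
    eigvals, V = np.linalg.eig(P)
    D_root = np.diag(eigvals.astype(complex) ** (1.0 / k))
    Q = (V @ D_root @ np.linalg.inv(V)).real
    # Clean up numerical noise and re-normalize rows to sum to one
    Q = np.clip(Q, 0.0, 1.0)
    return Q / Q.sum(axis=1, keepdims=True)

# Hypothetical monthly transition matrix over (healthy, sick, dead)
P_month = np.array([[0.90, 0.08, 0.02],
                    [0.10, 0.80, 0.10],
                    [0.00, 0.00, 1.00]])
P_week = convert_epoch(P_month, 4)   # treating a month as four weeks
```

Composing four weekly steps approximately recovers the monthly matrix, i.e., np.linalg.matrix_power(P_week, 4) is close to P_month, which is the sanity check to run when applying this conversion in practice.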
There are also examples of using restless bandits to choose epoch sizes in decision-making problems. For instance, Prins et al. (2020) developed a restless multi-armed bandit framework for monitoring drug adherence [106]. The doctor can choose to observe the patient's state at each decision epoch, potentially resulting in variable lengths of time between observations. The authors reformulate this problem as a two-state POMDP and solve it heuristically. However, this approach cannot compare the exact additional value of more frequent decision-making, as we do in this work by comparing two MDP formulations. In addition, our approach extends the existing MDP literature on organ transplantation, of which there is a rich legacy [46, 2, 6, 5, 111, 112].

1.3 Summary of Contributions

This thesis addresses the three identified challenges at the intersection of MDPs and healthcare across the three chapters preceding the conclusion. It explores the intricacies of MDP parameterization for ACLF patients in line for organ transplants, quantifies the benefits of additional decision-making opportunities in healthcare applications, and effectively solves continuous-state MDPs in the context of infectious disease control. In Chapter 2, we introduce a novel approach in the field of liver transplantation for acute liver failure patients. To identify the ideal time to start accepting a suboptimal organ so as to maximize the expected one-year survival probability, we create an MDP in a stopping problem framework; this is novel work in liver transplantation for acute liver failure patients. We construct transition probability matrices and rewards using data from the United Network for Organ Sharing (UNOS) database, and we solve the problem for different groups (by age, number of organ failures, and certain disease groups). By utilizing the MDP framework for early transplantation in acute liver failure, we offer valuable and practical insights to healthcare providers.
These insights enable clinicians to make informed decisions regarding the optimal timing of organ transplantation, potentially leading to improved patient outcomes and more efficient allocation of available resources. The work in Chapter 2 has been published in [161]. In Chapter 3, we quantify the benefits of increasing decision-making frequency by comparing two equivalent MDP (stopping problem) frameworks with different decision-making frequencies. We study the existence of threshold policies over both state and time for both problems. Our contributions include novel structural properties that illustrate how these benefits vary across states and over time, as well as a sufficient condition under which the additional benefits are positive. We also validate the applicability of our approach through two real-world examples (1. liver transplant for acute liver failure patients, which extends Chapter 2; 2. early treatment initiation for chronic kidney disease patients), demonstrating its practical relevance. The findings of this research deliver crucial insights to decision-makers, informing them of the necessity and impact of changing decision-making frequency under different circumstances. This work has been published in [130]. In Chapter 4, we formulate an MDP that discretizes the continuous state space for infectious disease control. To overcome the continuous state space in a population-level model, we provide a novel greedy algorithm for non-uniform discretization that captures the non-linear disease dynamics. This work is anticipated to offer significant benefits to modelers by providing an efficient and straightforward method for constructing MDP frameworks from other disease models. This MDP helps identify dynamic optimal policies, making it suitable for addressing complex healthcare decision-making scenarios.
Moreover, our proposed non-uniform discretization algorithm is expected to outperform existing approaches and uniform discretization methods, providing a more effective way to handle continuous state spaces in MDP-based disease modeling. Furthermore, the research aims to provide valuable insights into population-level disease modeling problems. By offering high-quality solutions in real-world applications like COVID-19, we aim to enhance the understanding and management of infectious diseases at a population level. The outcomes of this work can contribute to more effective policy-making and intervention strategies, ultimately benefiting public health outcomes on a broader scale. To sum up, this thesis advances the field of MDPs within healthcare applications, providing novel approaches and methodologies across crucial challenges. Additionally, the methodologies developed have potential applicability in broader contexts beyond healthcare. For instance, the algorithm for discretizing continuous states could also be applied to change-point detection algorithms when the state space is large or continuous. Collectively, these contributions not only push the boundaries of MDP application in healthcare but also offer practical solutions to pressing medical and public health issues.

Chapter 2

Early transplantation maximizes survival in severe acute-on-chronic liver failure: Results of a Markov decision process model

2.1 Introduction

Acute-on-chronic liver failure (ACLF) is an increasingly prevalent syndrome [124], occurring in patients with decompensated cirrhosis, that is associated with severe systemic inflammation [42, 43, 123], organ failures, and high 28-day mortality [98]. The short-term mortality of certain patients with ACLF grade 3 (ACLF-3), defined as the development of ≥3 organ failures [98], is particularly high [10, 11, 70] and potentially surpasses that of acute liver failure [129].
Mortality is especially great for those with 4-6 organ failures, who have been shown in a prospective study to have 100% mortality within 28 days of presentation [62]. Liver transplantation (LT) yields excellent patient survival both at 1 year and in the long term [125, 127]. However, uncertainty still remains regarding the appropriate timing of transplantation in this population, due to challenges related to waitlist and post-transplant mortality. There are several factors that may be incorporated into the timing of transplanting a patient with ACLF-3, including the likelihood of dying on the waiting list if LT is delayed, the potential for recovery of organ failures prior to transplantation to improve post-transplant survival, and the greater post-transplant mortality associated with utilizing a marginal quality organ. A prior registry study demonstrated that the occurrence of LT within 30 days in patients listed with ACLF-3 was associated with reduced 1-year post-LT mortality, but also demonstrated that transplantation using an organ with a donor risk index (DRI) ≥1.7 predicted a greater likelihood of death after LT [125]. Additionally, though earlier transplantation in patients with ACLF-3 may improve post-LT survival, greater post-LT survival may be achieved by transplanting the patient after organ failure improvement and subsequent recovery from ACLF-3, particularly in patients >60 years [126]. To address these uncertainties surrounding LT in patients with ACLF-3, we created a Markov decision process model that maximizes overall survival probability, accounting for expected waitlist mortality, post-transplant survival based on donor organ quality, and the likelihood of organ failure recovery prior to transplantation.
We hypothesized that, due to the high waitlist mortality associated with ACLF-3, earlier transplantation of candidates listed with ACLF-3 yields the greatest survival probability, even when accounting for the reduced post-LT survival with a marginal quality organ and the increased post-LT survival associated with organ failure recovery prior to transplantation [126].

2.2 Patients and Methods

The study protocol was considered exempt from review by the institutional review board at Cedars-Sinai Medical Center. The study and its analysis were performed in accordance with STROBE (STrengthening the Reporting of OBservational studies in Epidemiology) guidelines [145].

2.2.1 United Network for Organ Sharing (UNOS) database analysis

From the UNOS registry (www.unos.org), we evaluated patients aged 18 or older listed for LT from 2005 to 2017. Patients who were listed as status 1a, who were retransplanted, or who underwent multi-organ transplantation (aside from simultaneous liver and kidney transplantation) were excluded. We collected data regarding patient characteristics at the time of waitlist registration, as well as information regarding waitlist outcomes and post-LT outcomes.

2.2.2 Identification of patients with ACLF

ACLF at the time of waitlist registration was identified based on the European Association for the Study of the Liver-Chronic Liver Failure (EASL-CLIF) criteria of having a single hepatic decompensation of either ascites or hepatic encephalopathy and the presence of the following organ failures: single renal failure, single non-renal organ failure with renal dysfunction or hepatic encephalopathy, or 2 non-renal organ failures [98] (Table A.2). Although bacterial infection and variceal hemorrhage are also decompensating events, information regarding these conditions was unavailable in the UNOS database.
Specific organ failures were determined according to the CLIF consortium organ failure score for coagulopathy, liver failure, renal dysfunction and renal failure, neurologic failure, and circulatory failure [98]. We used mechanical ventilation as a surrogate marker for respiratory failure. Grade of ACLF was determined based on the number of organ failures at listing and at transplantation (Table A.2). This methodology has been utilized in several previously published studies regarding LT related to ACLF [123, 125, 129]. All patients analyzed had ACLF-3 at the time of listing and at transplantation. We categorized organ quality as optimal (DRI <1.7) or marginal (DRI ≥1.7) [52].

2.2.3 Overview of model creation

We used a stochastic dynamic programming model, which considers the risk of death over time without transplantation, post-LT survival, and uncertainty in the quality of livers offered for transplantation in the future, to evaluate the optimal time to accept a liver allograft for LT. The Markov decision process model captures the likelihood of death or of being offered an optimal organ each day, for 7 days from the time of listing. We chose a time horizon of 7 days to minimize the chance of daily variation in the patient's course and because non-transplant mortality approached 50% by day 7, per our analysis. On each of the first 7 days after listing, the provider may accept the organ, upon which the model calculates the 1-year post-transplant survival. If the provider declines the organ, the model resets and the provider will be offered either an optimal or marginal liver the next day. In our Markov decision process model, we accounted for the following factors: patient age (> or ≤60 years) [126], number of organ failures at listing (3 vs. 4-6), organ quality, and waiting time until LT.

2.2.4 Model assumptions

We made several assumptions in the model.
First, we assumed that each day, a liver of either optimal or marginal quality will be offered to each patient who has not been transplanted, and that a marginal liver results in a lower 1-year post-LT survival probability. Second, we assumed that the probability of being offered an optimal organ is constant and independent of the organ quality offered the previous day. Finally, we assumed that the best strategy is to always accept an optimal organ if one is offered.

2.2.5 Details of Markov model

We modeled the likelihood of receiving an optimal organ each day as α, and the likelihood of receiving a marginal organ as 1 − α. Because the probability of receiving an organ offer varies across UNOS regions, we examined different values of α. For instance, if a center has an expected 70% probability of an optimal liver offer, then α would be 0.7 (Fig. 2.1). We utilized 2 Markov processes: the pre-transplant process (Fig. 2.1, top box) and the post-transplant process (Fig. 2.1, bottom box). On each day t after listing, the candidate has a non-transplant mortality probability of γt, as determined from the UNOS database.

2.2.6 Timing of organ acceptance and relative risk

To find the optimal time to accept a marginal organ, we used a backwards induction algorithm [23, 24] designed to maximize expected 1-year survival, given all possible decisions on each day (supplemental appendix). Length of hospital stay does not differ substantially between patients with earlier or later transplants, and we therefore omitted this consideration from our analysis. The difference in post-LT survival probability when transplanted with a marginal vs. optimal organ was estimated by the relative risk (Fig. A.1).

Figure 2.1: Diagram of patient flow while awaiting liver transplantation. ACLF-3, acute-on-chronic liver failure grade 3.

In our base case analysis, we used 0.9 as the relative risk, but we varied the relative risk from 0.6 to 0.9 in sensitivity analyses (Fig. A.1).
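To make the backward induction concrete, the following sketch solves a stylized version of the 7-day problem. All numbers (the daily waitlist mortality sequence, survival probabilities, and the rule that the next offer is accepted after day 7) are illustrative assumptions for this sketch, not the UNOS-derived values used in the chapter.

```python
# Illustrative parameters (assumed for this sketch, not the fitted UNOS values)
ALPHA = 0.6                  # daily probability of an optimal-organ offer
S_OPT = 0.86                 # 1-year post-LT survival with an optimal organ
RR = 0.9                     # relative risk for a marginal organ
S_MARG = RR * S_OPT          # 1-year post-LT survival with a marginal organ
GAMMA = [0.08, 0.09, 0.10, 0.11, 0.12, 0.13, 0.14]  # daily waitlist mortality

def solve():
    """Backward induction over the 7-day horizon. An optimal organ is always
    accepted; for a marginal offer, accepting now is compared with surviving
    overnight and facing tomorrow's offer."""
    # Assumption: after day 7, the next offered organ is accepted regardless
    V = ALPHA * S_OPT + (1 - ALPHA) * S_MARG
    accept_by_day = []
    for t in reversed(range(len(GAMMA))):
        wait_value = (1 - GAMMA[t]) * V        # decline marginal, risk death
        accept = S_MARG >= wait_value          # accept-marginal decision rule
        V = ALPHA * S_OPT + (1 - ALPHA) * (S_MARG if accept else wait_value)
        accept_by_day.append(accept)
    accept_by_day.reverse()                    # index 0 corresponds to day 1
    return V, accept_by_day

V0, accept_by_day = solve()
```

With these illustrative numbers, the policy accepts a marginal organ from day 1 onward, echoing the chapter's qualitative finding that early transplantation maximizes expected 1-year survival; lowering the relative risk in the sketch can flip the decision toward waiting.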
2.2.7 Outcome metrics

Our primary outcome was the day on which the provider should stop waiting for an optimal organ and accept a marginal liver. Because of variation in organ availability and post-transplant outcomes between centers, we present our results across different parameter values: specifically, across multiple values of the probability that an optimal organ is offered (α) and the relative risk of survival with a marginal organ (relative risk), in 2-way graphs. Exact equations and details for both can be found in the supplementary information.

2.2.8 Statistical analysis

Data were extracted and analyzed from the UNOS database using Stata 16 (Houston, TX), with descriptive statistics performed using analysis of variance with Bonferroni correction for continuous variables and Chi-square testing for categorical variables. Survival analyses were assessed using Kaplan-Meier methods with log-rank testing. The Markov decision process model was created using Python 3.6.

2.3 Results

2.3.1 Patient characteristics, categorized by age group and number of organ failures

Baseline characteristics of the study population are depicted in Table 2.1. We identified 5,851 patients listed with ACLF-3 who met our inclusion and exclusion criteria, representing 4.3% of the 134,728 patients listed for LT. Patients were subdivided according to age (above or ≤60 years) and the presence of 3 or 4-6 organ failures at waitlist registration. We did not create additional stratifications within these subgroups, such as patients above age 60 with 4-6 organ failures, due to loss of sample size. When classifying the transplant candidates according to the number of organ system failures at listing, we identified 4,035 (68.9%) patients with 3 organ failures and 1,816 (31.1%) patients with 4-6 organ failures. Mean model for end-stage liver disease (MELD)-Na score at listing was higher among patients listed with 4-6 organ failures (39.1 vs. 38.5, p = 0.047).
Additionally, patients with 4-6 organ failures had a greater prevalence of brain failure (63.8% vs. 49.7%, p <0.001), circulatory failure (99.2% vs. 20.5%, p <0.001), and need for mechanical ventilation (98.3% vs. 11.9%), while those with only 3 organ failures had a higher prevalence of liver failure (82.5% vs. 78.3%, p <0.001) and coagulation failure (71.2% vs. 49.6%, p <0.001).

Characteristic | Age ≤60 years (n = 4,359) | Age >60 years (n = 1,492) | p value | 3 organ failures (n = 4,035) | 4-6 organ failures (n = 1,816) | p value
Age, (SD) | 47.6 (9.2) | 63.8 (3.2) | <0.001 | 51.8 (10.3) | 50.3 (11.6) | 0.002
Male (%) | 2,721 (62.4) | 872 (58.5) | <0.001 | 2,558 (63.4) | 1,035 (56.9) | <0.001
Race/ethnicity (%): | | | <0.001 | | | 0.001
  Caucasian | 2,795 (64.1) | 980 (65.7) | | 2,671 (66.2) | 1,104 (60.8) |
  African American | 540 (12.4) | 170 (11.4) | | 481 (11.9) | 229 (12.6) |
  Hispanic | 752 (17.3) | 240 (16.1) | | 647 (16.0) | 345 (19.0) |
  Missing | 88 (1.9) | 14 (0.9) | | 64 (1.6) | 32 (1.8) |
Etiology of liver disease (%): | | | <0.001 | | | <0.001
  Alcohol | 1,408 (32.3) | 373 (25.0) | | 1,278 (31.7) | 472 (25.9) |
  NAFLD | 444 (10.2) | 289 (19.4) | | 528 (13.1) | 205 (11.3) |
  Hepatitis C virus | 1,163 (26.7) | 199 (13.3) | | 1,101 (27.3) | 435 (23.9) |
  Hepatitis B virus | 233 (5.4) | 66 (4.4) | | 198 (4.9) | 101 (5.6) |
  Autoimmune hepatitis | 232 (5.3) | 80 (5.4) | | 213 (5.3) | 99 (5.5) |
  Primary biliary cholangitis | 85 (1.9) | 55 (3.7) | | 93 (2.3) | 47 (2.6) |
  Primary sclerosing cholangitis | 86 (1.9) | 26 (2.4) | | 93 (2.3) | 29 (1.6) |
  Cryptogenic | 230 (5.3) | 192 (10.2) | | 259 (6.4) | 123 (6.8) |
  Other | 478 (10.9) | 212 (14.2) | | 272 (6.7) | 305 (16.8) |
MELD-Na score, (SD) | 39.1 (6.4) | 38.4 (7.0) | 0.001 | 38.5 (6.3) | 39.1 (6.7) | 0.047
Liver failure (%) | 3,605 (82.8) | 1,141 (76.5) | <0.001 | 3,329 (82.5) | 1,421 (78.3) | <0.001
Renal failure (%) | 3,487 (80.1) | 1,243 (83.3) | 0.005 | 3,247 (80.5) | 1,483 (81.7) | 0.284
Coagulation failure (%) | 2,856 (65.5) | 916 (61.4) | 0.004 | 2,872 (71.2) | 900 (49.6) | <0.001
Brain failure (%) | 2,344 (53.8) | 819 (54.9) | 0.463 | 2,006 (49.7) | 1,158 (63.8) | <0.001
Circulatory failure (%) | 1,924 (44.1) | 729 (48.3) | 0.006 | 828 (20.5) | 1,802 (99.2) | <0.001
Mechanical ventilation (%) | 1,647 (37.8) | 616 (41.3) | 0.016 | 478 (11.9) | 1,785 (98.3) | <0.001

Table 2.1: Statistical analysis performed using analysis of variance for continuous variables and Chi-square analysis for categorical variables.

2.3.2 Non-transplant survival probability

Non-transplant survival probabilities are depicted in Fig. A.2. By the seventh day, the survival probability is 60.3% for patients aged ≤60 years and 52.8% for patients aged >60 years (p = 0.009). When examining patients with 3 vs. 4-6 organ failures, we found that by day 7, the survival probability was 62.7% for patients with 3 organ failures and 51.8% for patients with 4-6 organ failures (p = 0.009).

2.3.3 Post-transplant survival probability

One-year post-LT survival probabilities are displayed in Fig. A.3. In Fig. A.3A, survival after LT is depicted according to age and donor organ quality. Among patients younger than 60 years, the 1-year survival probability is 86.2% when transplanted with a low DRI liver and 78.2% using a high DRI organ. Among recipients >60 years old, 1-year survival after LT was 77.1% using an optimal liver and 74.1% with a marginal liver. Fig. A.3B shows similar post-LT survival, categorized by number of organ failures at listing and the type of donor organ. When transplanted with an optimal liver, the 1-year survival probability is 86.5% for a patient with 3 organ failures vs. 80.3% with a marginal organ. Among patients with 4-6 organ system failures, the 1-year survival is 79.2% when using a low DRI organ and 69.6% after LT with a high DRI organ.

2.3.4 Overall survival probability

We next compared overall survival probability among the 4 patient subgroups (Fig. 2.2A-D), based upon whether the decision was made to proceed with transplantation on a specific day, regardless of organ quality, or to decline an organ offer and proceed with LT on the next day. The daily survival probabilities for each group, based on these decisions, are provided in Table A.3. In Fig.
2.2A, which depicts patients with 3 organ failures at listing, we demonstrate that from day 1 through 7 on the waiting list, LT yields a daily average of 4.4% greater overall survival probability than remaining on the waitlist for an additional day (p <0.001). Similar findings were demonstrated among patients with 4-6 organ failures at listing (Fig. 2.2B), with a 5.2% difference in overall survival from day 1-7 after listing (p <0.001). In Fig. 2.2C and Fig. 2.2D, survival probabilities are displayed among patients categorized according to age. For candidates aged ≤60 years, the average daily difference in overall survival was 4.7% (p <0.001), whereas for patients older than 60 years, the average difference in survival was 5.0% (p <0.001). These findings suggest that during the first week on the waiting list for patients with ACLF-3, a delay in LT by 1 day is associated with a reduction in overall survival probability.

Figure 2.2: Overall, 1-year survival probability based on the decision to transplant on a specific day or defer LT for 1 day. ACLF-3, acute-on-chronic liver failure grade 3; LT, liver transplantation.

2.3.5 Timing of accepting a marginal quality donor organ: base case

We created a Markov decision process model to address the timing of when to accept a marginal organ. Values for selected parameters, including pre- and post-LT survival probabilities, likelihood of receiving an organ, and relative risk of post-transplant survival, are listed in Table 2.2. For the base case, we estimated the relative risk of post-transplant survival between a marginal and optimal liver to be roughly 0.9 (equivalent to a 0.78 probability of 1-year survival for a marginal liver compared to 0.86 for an optimal liver) based on analysis of the UNOS database. We assumed the likelihood of being offered an optimal liver to be 60% (α = 0.6).
In this scenario, for a patient with 3 organ failures alone, if an optimal organ is not offered on days 1 and 2, we recommend accepting a marginal liver starting on day 3 and proceeding with LT. However, if the patient has 4-6 organ failures at listing, we recommend accepting a marginal liver on day 1 of listing, regardless of the patient's age, due to the high non-transplant mortality associated with having 4-6 organ failures. The expected 1-year post-transplant survival probability is 79.8% for recipients with 3 organ failures and 71.3% for patients with 4-6 organ failures. For patients in both age groups, we recommend accepting a marginal liver on day 2 of listing. In this scenario, the expected 1-year post-transplant survival probability is 70.3% for patients >60 years and 78.7% for those ≤60 years old.

2.3.6 Variation in the relative risk and probability of optimal organ offer

As the probability of receiving an organ offer and the post-transplant survival utilizing a marginal liver differ between centers, we determined the timing regarding when to accept a marginal quality organ using different probabilities of receiving an optimal liver offer, across variable post-LT survival probabilities when using a marginal liver. Fig. 2.3A-D display 2-way sensitivity analyses depicting the maximum number of days to wait for an optimal liver, based on the expected probability of receiving an offer and the expected post-transplant survival for each center. The y-axis represents the likelihood of receiving an optimal organ, ranging from 0 to 1. The x-axis represents the expected 1-year survival when transplanted with a marginal liver, which varies from 0.5 to 0.9. On the right side of each graph are the representative decision boundaries to determine which day after listing the provider should proceed with LT, even if offered a marginal organ. In Fig. 2.3A, we display a scenario of a transplant candidate >60 years old.
In this setting, if the center has an expected 1-year survival of 70% for patients transplanted with a marginal liver and a 50% daily likelihood of being offered an optimal liver, then LT should proceed on day 1 if an organ is offered, regardless of quality (Fig. 2.3A, red star). However, if the patient is ≤60 years old, the center can wait until day 2 before accepting a marginal organ (Fig. 2.3B, red star). We describe additional scenarios according to the presence of 3 or 4-6 organ failures at waitlist registration in Fig. 2.3C,D. As expected, the decision boundaries occur earlier for patients aged >60 years or with 4-6 organ failures at listing, indicating survival benefit with shorter waiting time.

2.3.7 Hepatic vs. extrahepatic ACLF-3

We also examined outcomes for patients listed with ACLF-3 according to the presence or absence of extrahepatic organ failures, with extrahepatic organ failures defined as either brain failure, circulatory failure, or need for mechanical ventilation. Although renal failure is also deemed an extrahepatic organ failure, for the purposes of this analysis we considered it as a hepatic failure. Our reasons for doing so were twofold. First, if we analyzed transplant candidates only with hepatic failures, specifically liver and coagulation failure, then these patients would be classified as ACLF-2 and not ACLF-3, which was the intended study population. Secondly, a prior study has demonstrated that the presence of brain failure, circulatory failure, or need for mechanical ventilation at LT negatively impacted post-transplant survival, whereas the development of renal failure at LT did not[126]. Survival probabilities are summarized in Table A.5. Fig. A.7 depicts the effect of variation in the relative risk and probability of an optimal organ offer.
For instance, if the center has an expected 1-year survival of 70% for patients transplanted with a marginal liver and a 50% daily likelihood of being offered an optimal liver, then LT should proceed on day 1 if an organ is offered to a patient with hepatic ACLF-3, regardless of quality (Fig. A.7A, red star). However, if the patient has extrahepatic ACLF-3, the center can wait until day 2 before accepting a marginal organ (Fig. A.7B, red star).

2.3.8 Sensitivity analyses

We performed 2 sensitivity analyses to test the robustness of our findings. In the first, we removed patients with suspected chronic kidney disease based on a previously validated methodology[45]. After removal of 582 patients with suspected chronic kidney disease (9.9%), we demonstrated similar decision boundaries across all 4 patient groups (Fig. A.5). In the second analysis, we analyzed patients transplanted from 2014 onward (n = 2,264) to more accurately reflect the current epidemiological landscape of liver disease by evaluating the consistency of our findings in the post direct-acting antiviral era[153]. In this scenario, the decision boundary for patients with 4-6 organ failures increased, thereby allowing for a greater waiting period before recommending acceptance of a marginal organ. The decision boundaries for the other groups remain the same (Fig. A.6).

2.3.9 Analysis of length of stay

Although our study was focused on the outcome of 1-year post-LT survival, we also performed an analysis to determine if the day of transplantation impacted post-transplant length of hospital stay. Among the 4 patient groups studied, the day of transplantation did not significantly affect length of hospital stay after LT (Table A.6).

2.4 Discussion

Our study demonstrates that among the 3 competing variables of earlier transplantation, donor organ quality, and candidate organ failure recovery, it is earlier transplantation that leads to the greatest overall survival probability.
This is due to the high non-transplant mortality after listing, the less consequential impact of organ quality on post-LT mortality, and the low likelihood of organ failure recovery within the first 7 days after listing. Our findings are particularly relevant to patients aged >60 or with 4-6 organ failures at listing, regardless of age, since these patient groups have the highest probability of non-transplant mortality. Mortality rates without LT were higher in our investigation than in prior prospective studies[62, 135], but we believe this is because we evaluated mortality from the time of listing for LT rather than the day of initial presentation with ACLF-3. Although our study provides guidance regarding which day to proceed with LT, we acknowledge that a variety of other factors beyond organ quality are incorporated in the decision to transplant. Therefore, the primary message of our chapter is that the general approach to managing this population should be centered around a principle of earlier transplantation. Ambiguities exist surrounding whether to accept or decline an organ offer for a patient with ACLF-3, partially because data from prior investigations are conflicting regarding whether it is favorable to transplant a patient early or to wait for a higher quality liver[125]. However, per our results, the reduction in post-LT survival when utilizing such an organ is generally less consequential than the daily mortality while remaining on the waiting list, for all patient groups assessed. Although our prior work has suggested earlier transplantation within 30 days of listing may improve post-transplant survival[122], data from the current chapter indicate that within the first 7 days of listing the timing of transplantation does not impact post-LT survival. Therefore, the rationale for earlier transplantation is driven by the high waitlist mortality among candidates with ACLF-3.
Consequently, we suggest that the use of lower quality donor organs can be considered to facilitate earlier transplantation in the setting of ACLF-3, particularly in regions of the United States with higher median MELD-Na scores at LT. An additional factor to consider when offering transplantation to candidates with ACLF-3 is whether organ failure recovery is feasible prior to LT, as this improves post-transplant survival[126, 69], especially among candidates above 60 years[126]. We therefore propose that if, in the judgement of the medical and surgical providers, an opportunity exists for improvement in these specific organ system failures, then transplantation should be deferred. However, as the overall likelihood of organ failure recovery occurring within 7 days from listing is less than 10% per our study results, the general approach to the management of patients with ACLF-3 on the waiting list should be focused on early transplantation rather than postponement of LT in anticipation of future organ failure recovery. Although the relatively small percentage of patients who improved from ACLF-3 at listing to ACLF-2 at LT is notable, we believe this finding is consistent with prior data, which have demonstrated that a patient's grade of ACLF between 3-7 days from hospital admission is indicative of the final ACLF grade[62]. To increase confidence in the decision to proceed with LT using a high DRI organ, we have incorporated considerable variability in our 2-way sensitivity analyses, to allow the clinician to account for both the expected probability of receiving an organ offer and the estimated post-transplant survival, based on their center's prior outcomes. However, we acknowledge that the decision to proceed with LT is complex, involving tradeoffs both at the patient and health system level. In this work, we focus only on patient-level decision making and not system-level optimization of transplant decisions.
Consequently, we do not consider whether the offered organ would be better suited for another patient or would improve performance metrics for the transplantation center. The results of this work are not meant to provide a definitive recommendation on transplantation times, and clinical judgement should be the ultimate arbiter in determining the best course for a particular patient in a given situation. As our investigation indicates that maximum overall survival in the setting of ACLF-3 occurs with earlier LT, it is important to discuss limitations in how such patients are currently prioritized on the waiting list. Though the MELD-Na score performs well in predicting mortality for patients with decompensated cirrhosis alone, in the setting of ACLF and particularly ACLF-3, studies have demonstrated that it underestimates waitlist mortality[1, 64, 128]. The discrepancy between actual mortality in a patient with ACLF-3 and expected mortality as determined by the MELD-Na score is most pronounced among those with MELD-Na scores <30[1, 64, 125]. Furthermore, providing additional waitlist priority using a system based upon the MELD-Na score, such as the Share 35 rule in the United States, does not fully address the mortality risk associated with extrahepatic organ failures[128]. Though we do not advocate for changes in organ allocation policy based on our study findings, we do suggest additional prospective observational trials are needed to determine whether incorporation of ACLF development into waitlist prioritization leads to earlier LT and improvement in overall survival in this population. Additionally, in the United States there is disparity across UNOS regions and between individual transplant institutions in the utilization of marginal livers, with smaller centers being more likely to decline these organs[147].
The reason for this discrepancy is multifactorial, but maintenance of post-transplant survival above the expected outcomes suggested by UNOS is a key driver of current clinical decision making. Consequently, marginal quality organs are often either discarded or transplanted into patients with lower MELD-Na scores, who could afford to remain on the waiting list[147]. Projections have further indicated that donor organ quality will continue to worsen in the United States and that, if existing utilization practices remain constant, organ usage will decrease by more than 30% by the year 2030[101]. When further considering the rising prevalence of ACLF, particularly in the population with non-alcoholic fatty liver disease[124], these findings are concerning. Therefore, we suggest further investigation to explore changing the outcome metrics when utilizing a marginal quality liver for a patient with ACLF-3, so that a center is not disincentivized from performing LT in a patient who would likely die, with an organ which may otherwise have been discarded. Limitations inherent to retrospective studies of public databases also exist in our study; these are primarily related to the potential for misclassification, owing to the lack of data regarding bacterial infection or variceal bleeding, as well as the use of mechanical ventilation as a surrogate for respiratory failure. Although we cannot overcome this limitation, it should be noted that several key findings from our previous publications[123, 125] have been subsequently corroborated in separate studies using granular patient data[14, 92, 64], thus supporting the accuracy of our methodology to identify ACLF. Additionally, post-transplant survival, as determined in our study, may be overestimated due to a selection bias, since only the most “robust” patients in the judgement of the provider would be chosen for transplantation.
This may particularly be the case for patients transplanted with marginal organs, leading to a higher relative risk of post-transplant survival compared to recipients transplanted with an optimal organ. To account for this, our 2-way sensitivity analyses provided variability in post-transplant survival, to allow the clinician to incorporate expected survival probability from their center into the decision to proceed with LT. However, we emphasize that our results should only be used as guidance in the decision to accept an organ for transplantation, and ultimately the provider needs to also account for factors not included in our analyses, such as frailty, degree of ventilatory and vasopressor requirement, and personal experience regarding transplantation with marginal quality organs[151]. While we cannot model all possible scenarios, given the scarcity of literature regarding transplantation in the setting of ACLF, the value of our chapter is the focus on a single base case scenario and several sensitivity scenarios to provide a quantitatively driven outcome of overall survival probability in relation to 3 specific factors which have previously been demonstrated to affect pre- and post-transplant survival[126, 125]. We believe that the sensitivity scenarios illustrate general trends useful for adapting our findings to a center's needs. In summary, earlier transplantation is favored for patients with ACLF-3 within the first 7 days after listing, particularly in candidates aged >60 or with 4-6 organ failures, due to a combination of high mortality without transplantation, relatively lower impact on post-transplant survival when using a marginal organ, and low likelihood of organ failure recovery prior to LT.
Further research is needed regarding providing additional waitlist priority to candidates with ACLF-3 to expedite LT and remove disincentives for a center that utilizes a marginal quality donor organ in this population, to increase access to transplantation for the most critically ill patients with end-stage liver disease.

ACLF-3 patient non-transplant survival probabilities (1-γt), by age and by number of organ failures:

Day | >60 years old | ≤60 years old | >3 organ failures | =3 organ failures
1 | 0.9468 | 0.9485 | 0.9336 | 0.9575
2 | 0.8719 | 0.8850 | 0.8501 | 0.9024
3 | 0.8042 | 0.8201 | 0.7745 | 0.8433
4 | 0.7252 | 0.7650 | 0.7064 | 0.7868
5 | 0.6442 | 0.7040 | 0.6289 | 0.7283
6 | 0.5916 | 0.6590 | 0.5806 | 0.6823
7 | 0.5275 | 0.6032 | 0.5183 | 0.6274

ACLF-3 patient post-transplant survival probabilities (1-µ12), 12 months post-transplant, for patients transplanted in the first week:

Group | Optimal liver | Marginal liver
>60 years old | 0.7709 | 0.7407
≤60 years old | 0.8618 | 0.7819
>3 organ failures | 0.7922 | 0.6963
=3 organ failures | 0.8654 | 0.8030

Other parameters:

Relative risk of post-transplant mortality between recipients of a marginal vs. an optimal liver | 0.90 (varied in sensitivity analysis)
Daily probability of getting an optimal liver (α) | 0.6 (varied in sensitivity analysis)

Table 2.2: Model parameters and sources, including pre- and post-transplant survival probabilities, relative risk of post-LT mortality and health-related utility values.
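The backward-induction logic behind the base-case recommendations of Section 2.3.5 can be sketched in a few lines of Python. The daily dynamics below are our own illustrative simplification (each day an optimal liver arrives with probability α and is accepted; otherwise a marginal liver is offered, and declining it risks waitlist death overnight), with parameters taken from Table 2.2 for the 3-organ-failure group. The dissertation's full model is richer, so this sketch need not reproduce the published decision boundaries exactly.

```python
# Illustrative stopping-problem sketch using the =3 organ failure parameters
# from Table 2.2. The day-by-day dynamics are an assumption for this example.

ALPHA = 0.6                       # daily probability of an optimal-liver offer
V_OPT, V_MARG = 0.8654, 0.8030    # 1-year post-LT survival: optimal, marginal
CUM = [1.0, 0.9575, 0.9024, 0.8433, 0.7868, 0.7283, 0.6823, 0.6274]  # 1 - gamma_t

def first_accept_day(horizon=7):
    """Backward induction: earliest day on which accepting a marginal liver
    is (weakly) better than declining and waiting one more day."""
    v_next = 0.0                  # value of still waiting beyond the horizon
    accept_days = []
    for t in range(horizon, 0, -1):
        p_t = CUM[t] / CUM[t - 1]          # conditional survival through day t
        wait = p_t * v_next                # decline the marginal offer
        if V_MARG >= wait:
            accept_days.append(t)
        v_next = ALPHA * V_OPT + (1 - ALPHA) * max(V_MARG, wait)
    return min(accept_days)

print(first_accept_day())
```

Under these simplified dynamics the sketch accepts a marginal organ from day 2 onward, one day earlier than the day-3 boundary reported above for this group, reflecting model details the sketch omits.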
Figure 2.3: Two-way sensitivity analyses, accounting for center variation regarding probability of receiving an optimal organ offer and expected 1-year post-LT survival using a marginal quality organ. (A) We display a scenario of a transplant candidate >60 years old. With an expected one-year survival of 70% for patients transplanted with a marginal liver and 50% daily likelihood of being offered an optimal liver, LT should proceed on day 1 regardless of organ quality (A, red star). (B) However, if the patient is ≤60 years old, then LT can occur on day 2 or before regardless of organ quality (B, red star). (C,D) Additional scenarios according to the presence of 3 or 4-6 organ failures at waitlist registration. LT, liver transplantation.

Chapter 3
Quantifying the Benefits of Increasing Decision-Making Frequency for Health Applications with Regular Decision Epochs

3.1 Introduction

Sequential decision-making problems with fixed decision intervals under uncertainty arise naturally in healthcare settings: monitoring problems, treatment initiation problems, disease testing, diagnosis frequency, and so on. For example, a patient with a chronic illness may require a tailored treatment regimen as their disease progresses over time, or a patient with organ failures may be offered organ transplants of varying quality and may choose to wait or accept offered organs as their own survival probability declines. In general, the frequency at which decisions can be made or changed in these contexts is determined by some physical limitation that occurs regularly over time, e.g., the frequency of doctor's visits or transplantation offers. In many healthcare settings, such limitations are often costly and must occur after a discrete interval of time. For instance, chronic kidney disease (CKD) treatment regimen changes can only be made when the patient visits a doctor's office, which may happen at some interval (e.g., weekly, monthly, etc.).
These follow-up frequencies often vary by patient health or disease progression rate, but the optimal frequency for a particular health state may be unknown. Increasing the frequency may provide benefits – catching disease progression sooner and faster modification of treatment plans as the patient's needs change – but may also impose costs on patients. It is therefore critical to carefully determine whether more frequent visits are net beneficial. This problem also arises in the context of organ transplantation. A patient waiting for organ transplantation may choose to invest in efforts to increase the frequency of receiving organ offers. Such efforts include transferring to hospitals that have shorter waiting periods [143, 140] and multiple-listing [138]. Transplant centers may have different organ offer frequencies and durations until transplant. Multiple-listing entails enrolling at two or more transplant hospitals. Candidates located near the donor hospital are typically prioritized over those farther away, so opting for multiple-listing can enhance a patient's prospects of receiving a local organ offer and chance of transplantation [138]. For example, people who are multiple-listed for heart transplantation have a shorter average second-listing waiting period (126 days) compared to the first-listing waiting period (335 days) [56]. The Organ Procurement & Transplant Network policy also allows patients to transfer primary waiting time to another hospital or switch wait time between programs if multiple-listed. However, multiple-listing usually involves completing additional evaluations for the new hospital and coordination with the insurance provider. Such efforts may be financially costly and time-consuming, and it may be useful to quantify the value of more frequent organ offers to recipients to better determine whether the costs associated with such efforts are justified.
As these efforts for increasing the frequency of receiving an organ offer are primarily an individual patient's medical decision, we approach the problem from the patient's perspective. In these contexts, it is important to identify the best times to offer more frequent decision-making opportunities and quantify the associated benefits. This allows for better evaluation of whether the benefits justify the potential costs of creating these additional decision-making opportunities.

3.1.1 Research Question and Approach

What is the value of increasing opportunities to make decisions, specifically in the context of stopping problems when decisions can only happen at regular intervals? While many works have studied timing trade-offs, even within the Markov decision process (MDP) literature, we here take a novel approach of directly comparing two MDPs — one with more frequent decisions, structured such that the state outcomes are equivalent if the action is to continue (as opposed to stopping). This comparison allows direct quantification of the value of more frequent decisions in addition to the identification of the optimal stopping time (which is the typical motivation in previous literature). We will explore this problem in the context of the “more-frequent” and the “less-frequent” MDP problems. In scenario 1 (“less-frequent” decision), a decision-maker has an opportunity to ‘stop’ a process at each interval. We contrast this to scenario 2 (“more-frequent” decision), where the decision is made every 1/k interval (where k is an integer). How much more should a policy-maker value these additional opportunities? We use the same state space, action space, and discount factor for both the more- and less-frequent problems, but the number of epochs in the more-frequent problem is k times that of the less-frequent problem.
The transitions are such that, given the same sequence of actions (i.e., the more-frequent problem follows the same policy as in the less-frequent problem for all additional decision-making epochs at each state), both problems generate the same likelihood of ending up in each state. The reward values over a given duration are also equivalent if the same actions are used in both problems. In the more-frequent framework, a per-period cost accounts for all costs associated with additional decision-making opportunities. This problem setup allows us to use these two scenarios to study the benefits of increasing decision-making frequency in stopping problems, all else equal. We make four main contributions in this study. First, we provide structural results around the valuation of decision-making frequency in MDP stopping problem frameworks. Understanding this valuation allows us to decide how often decisions ought to be made to increase utility. Despite substantial prior literature in the area of discrete-time MDPs, we are not aware of any prior work that has examined this problem rigorously. Second, we provide structural results relating less-frequent and more-frequent problem solutions. This allows us to partially solve one problem when the optimal solution is known for the other, allowing us to translate knowledge from one context to another. Third, we analyze the difference between the optimal values of the two problems and when this quantity is maximized. This novel approach allows us to quantify the benefits of making more frequent decisions. Moreover, this informs us of when it would be more profitable to switch to a more-frequent decision-making framework. Fourth, using empirical data, we provide two numerical examples: liver transplantation among a particularly severely ill patient population and early-stage chronic kidney disease treatment initiation.
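As a toy illustration of this paired construction (our own example, not the dissertation's model), the sketch below builds a less-frequent stopping problem whose one-period transition matrix is the k-th power of the more-frequent problem's sub-step matrix, so that "always continue" induces identical state distributions in both problems. The elementwise gap between the two value functions is then the value of the extra decision epochs, here with zero per-period cost.

```python
import numpy as np

def value_stop(P, r_stop, T, gamma):
    """Finite-horizon stopping problem: V_t = max(r_stop, gamma * P @ V_{t+1})."""
    V = np.maximum(r_stop, 0.0)
    for _ in range(T):
        V = np.maximum(r_stop, gamma * (P @ V))
    return V

k = 4                                       # decision epochs per period
P_sub = np.array([[0.97, 0.02, 0.01],       # sub-step transition matrix
                  [0.02, 0.95, 0.03],
                  [0.00, 0.00, 1.00]])      # third state absorbing ("dead")
P = np.linalg.matrix_power(P_sub, k)        # matched less-frequent transitions
r = np.array([0.8, 0.5, 0.0])               # lump-sum reward for stopping
gamma, T = 0.97, 12                         # per-period discount, # of periods

V_less = value_stop(P, r, T, gamma)
V_more = value_stop(P_sub, r, T * k, gamma ** (1 / k))
print(V_more - V_less)    # per-state value of the extra decision epochs (>= 0)
```

Because the more-frequent problem can replicate any less-frequent policy by continuing through the intermediate sub-steps, the printed gap is nonnegative in every state; the structural results in this chapter characterize when it is largest and how a per-period cost changes the comparison.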
These examples demonstrate how this framework might be used in diverse healthcare applications and illustrate its applicability in similar problem contexts.

3.2 Literature Review

3.2.1 Markov Decision Processes in Healthcare Applications

MDPs have long been used in the operations research literature for a variety of applications, including inventory management [54], portfolio management [20], production and storage [13], and others. There is a deep literature on solving and understanding MDP structure [107, 55, 134]. These works have provided the foundations of many subsequent results on threshold structures of MDP policies, and we will similarly rely on those results here. As in prior literature, we will examine threshold policies and monotonic structure over time and state space, but we will extend this work to examine their implications when comparing more- and less-frequent decision-making frameworks. We point the reader to Schaefer et al. (2004), Alagoz et al. (2010), Sonnenberg et al. (1993), and Givan et al. (2010) for a more complete review of MDPs[113, 4, 120, 55]. MDPs are also a commonly used tool for healthcare applications and have been used for screening [36, 3], sequential disease testing [12, 118], treatment initiation [116, 87], and organ transplantation (see below). Within this MDP framework, we focus our analysis on finite-horizon stopping problems. Stopping problems are commonly used for treatment initiation and organ transplantation problems and form an important healthcare policy decision tool. In these and other health-related problems, a finite decision horizon is typically considered. Among the problems mentioned here, several are stopping problems [46, 2, 6, 5, 116, 36, 85, 3, 87]. Previously, authors have focused on establishing threshold policies over either state or time in an MDP framework. For instance, Alagoz et al.
(2007) identified an at-most-three-region optimal policy for an infinite-horizon MDP model for liver transplantation[5]. Shechter et al. (2008) considered both state thresholds and time thresholds to find the optimal HIV treatment initiation time[116]. However, to the best of our knowledge, no paper has considered how these threshold policies may change if the frequency of decision-making is changed. In this work, we extend prior analyses by additionally studying this problem and extending threshold properties to provide novel insights into estimating the value of decision-making frequency.

3.2.2 Epoch Sizes in Markov Decision Processes

There are two main time-related components that impact a decision-making process: the time horizon and the epoch size. The former has been studied in prior literature, as exemplified by work that considers the effect of different lengths of life on decision-making [49, 48]. The latter has received less attention, although many authors have investigated questions involving epoch intervals in their work, particularly within the reliability literature. For instance, Barlow et al. (1975) focus on probabilistic aspects of reliability theory, including a discussion of timing problems[19], and Kuo et al. (2006) used a partially observable MDP (POMDP) in machine maintenance[84], allowing the intervals between sampling draws to vary. See Wang et al. (2002) for a review of the reliability literature[148]. However, unlike prior work, we do not only focus on when an action should be taken, but also on the additional value generated from having the opportunity to make more frequent decisions. While we may find that the optimal time to act is the same, there may be value in having had more chances to change one's decision.
While an alternative would be to use a continuous-time MDP (CTMDP) or semi-MDP (SMDP) model, we focus on a discrete-time formulation in alignment with the majority of the work in clinical and healthcare applications using MDPs, with the hope that this makes our work more generalizable. CTMDP and SMDP frameworks are usually more difficult and more computationally costly to solve than discrete-time MDP models, and this may also contribute to their relative unpopularity in the healthcare application context. However, even within the context of discrete-time models in healthcare, the choice of epoch size is not always clear. This has led to prior work on methods to convert between epoch sizes; for example, Chhatwal et al. (2016) show how eigen-decomposition methods can be used to convert transition probability matrices between different lengths of time [37]. We will use this technique to convert transition probabilities and rewards between frequencies in our work. One notable prior work has tangentially addressed the issue of epoch size in an MDP using a variable decision-making frequency model. Alagoz et al. (2013) formulated a finite-horizon MDP model (a stopping problem) for breast cancer diagnosis [3]. The goal of that work is to reduce unnecessary follow-ups by considering follow-ups at different frequencies. Alagoz et al. (2013) introduced two non-terminal actions (follow-ups) which may be chosen every 6 and 12 months, respectively [3]. This work illustrates the utility of considering different action frequencies when solving for optimal health policies but does not quantify the benefits of more frequent decision-making, which we do here. There are also examples of using restless bandits to choose epoch sizes in decision-making problems. For instance, Prins et al. (2020) develop a restless multi-armed bandit framework for monitoring drug adherence [106].
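The eigen-decomposition conversion technique can be sketched concretely. The following minimal Python example (with a hypothetical transition matrix, not the dissertation's calibrated data) disaggregates a one-year transition matrix into a one-month matrix via the fractional matrix root $P^{1/k} = V D^{1/k} V^{-1}$:

```python
import numpy as np

def convert_epoch_length(P, k):
    """Convert a transition matrix for one epoch length into one for an
    epoch 1/k as long, via eigen-decomposition: P^(1/k) = V D^(1/k) V^-1.

    Note: the fractional root is not guaranteed to be a valid stochastic
    matrix for every P (negative entries can appear), so the result
    should always be checked."""
    eigvals, V = np.linalg.eig(P)
    D_root = np.diag(eigvals.astype(complex) ** (1.0 / k))
    return np.real(V @ D_root @ np.linalg.inv(V))

# Hypothetical yearly matrix over (healthy, sick, dead) converted to monthly.
P_year = np.array([[0.90, 0.08, 0.02],
                   [0.00, 0.70, 0.30],
                   [0.00, 0.00, 1.00]])
P_month = convert_epoch_length(P_year, 12)

# Composing 12 monthly steps recovers the yearly matrix.
assert np.allclose(np.linalg.matrix_power(P_month, 12), P_year)
```

Because the fractional root can fail to be a proper stochastic matrix, a practical implementation would verify that rows sum to one and that all entries are non-negative before using the converted matrix.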
The doctor can choose to observe the patient's state at each decision epoch, potentially resulting in variable lengths of time between observations. However, this approach cannot compare the exact additional value of more frequent decision-making, as we do in this work by comparing two MDP formulations. In addition, our approach extends the existing MDP literature on organ transplantation, of which there is a rich legacy [46, 2, 6, 5, 111, 112].

3.2.3 Organ Transplantation with Stochastic Dynamic Models

We use the optimal timing of liver transplantation as one of our motivating examples, for which we will also provide a numerical analysis. Prior work has applied MDPs to organ transplantation problems before [46, 111, 112, 2, 28]. While prior work has examined liver transplantation problems for patients with end-stage liver failure under an MDP framework, it has not examined the value of increasing transplant offers. In this chapter, we determine the value of increasing the frequency at which livers are offered, to inform how much cost would be justified in doing so. We consider a particularly vulnerable patient population (acute-on-chronic liver failure grade 2 or 3, or ACLF2 and ACLF3, patients, who have two, three, or more failed organs), where patients are severely ill and at very high priority for liver transplant, making more frequent organ offers particularly salient [91].

3.2.4 Treatment Initiation with Stochastic Dynamic Models

In this work, we also examine when to initiate treatment for early-stage chronic kidney disease (CKD) patients. Prior work has studied the optimal time to initiate treatment in the context of stochastic disease progression; for instance, Shechter et al. (2008) identified the optimal timing of initiating HIV treatment using an MDP [116], Kurt et al. (2010) studied structural properties of statin initiation for type 2 diabetic patients using an MDP framework [85], and Liu et al.
(2017) proposed an MDP framework to find the optimal strategy for treatments considering technology changes [87]. Although the structural properties of the optimal policy have been thoroughly analyzed by many, there has been limited exploration of how changes in the frequency of decision-making can affect the optimal policy and value, which is what we focus on here.

3.3 Model Formulation

We formulate two finite-horizon, discrete-time MDPs (the less-frequent and the more-frequent problem). Both MDPs have the same objective, which is to maximize the total expected discounted rewards for a patient. As shown in Figure 3.1, in the more-frequent problem, the decision-maker is able to make k − 1 additional decisions in each interval compared to the less-frequent problem, resulting in k times as many decision-making opportunities in the more-frequent problem. We consider non-stationary transition probabilities and rewards, as these are common in health application problems.

Figure 3.1: Timeline of the more- and less-frequent problems (using k = 4 as an example). In the less-frequent problem, the decision-maker can make one decision every four time units at the beginning of each decision epoch. In the more-frequent problem, the decision-maker can make one decision every time unit at the beginning of each decision epoch, resulting in four times as many decisions as in the less-frequent problem.

The notation used in this chapter is as follows. The set of health states in both the more- and less-frequent problems is the same and is denoted by S = {1, 2, ..., post-decision-making state (|S| − 1), death (|S|)}. We assume there exists an ordering of the states. As is typical in many healthcare MDP problems, we order the states such that state 1 is the healthiest state and the health status in state j is worse than that in state i if i < j. Some states may be absorbing (the death state and the post-decision-making state).
We limit our analysis to stopping problems, in which the decision-maker may continue or stop the problem. In our motivating organ transplantation problem, this is equivalent to continuing to wait for a better organ or stopping the decision process by accepting an organ offer. We therefore denote the set of available actions in both the more- and less-frequent problems as A = {wait (w), accept (a)}. If wait is chosen, the patient can remain alive or die before the next decision period, when the process repeats. Once the accept decision is made, the patient permanently enters a post-decision-making state. We allow the action space to vary across states, where only wait is allowed in some states ('wait states') whereas both wait and accept are allowed in all others ('non-wait states'). This allows us to model situations where no decision besides wait can be made (e.g., if no liver is offered this period). We use $\tilde{S} \subseteq S$ to denote the set containing all non-wait states. The total number of decision epochs in the more-frequent problem is denoted by N. We assume N is a multiple of k. $T = \{1, \ldots, N\}$ is the set of possible decision periods for the more-frequent problem. $\tilde{T} = \{1, k+1, 2k+1, \ldots, N\}$ is the set of possible decision periods for the less-frequent problem. We use $\tilde{t}^+$ to represent the decision period after time $\tilde{t}$ in the less-frequent problem, and $\tilde{t}^-$ to represent the epoch before $\tilde{t}$, i.e., $\tilde{t}^+ = \tilde{t} + k$ and $\tilde{t}^- = \tilde{t} - k$. We denote by $p_t(d)$ the transition probability matrix for the more-frequent problem when the decision-maker chooses action $d \in A$ at $t \in T$. In the less-frequent problem, we use $P_{\tilde{t}}(d) = p_{\tilde{t}}^k(d)$, $\tilde{t} \in \tilde{T}$, which is $p_{\tilde{t}}(d)$ multiplied by itself k times, to represent the transition probability matrix for action $d \in A$. This means that the likelihood of being in any state for each $t \in \tilde{T}$ in the more-frequent problem is the same as in the less-frequent problem at those same epochs, provided the same actions were taken.
In the more-frequent problem, we assume $p_t(d) = p_{t+m}(d)$ for $m \le k$ for any $t \in \tilde{T}$. We make this assumption because transition probabilities typically do not vary much within a short interval (daily, monthly, or yearly). We could relax this assumption using a continuous approximation, e.g., Gompertz functions [59, 120], but we omit this here for simplicity. We use $p_t(i|d)$ to represent the i-th row of the matrix $p_t(d)$. Throughout the chapter, we use $p_{ss',t}(d)$ to denote an element of the matrix: the transition probability from state s to state s′ at time t given action d. We discuss how we parameterize the matrix in Section 3.5. λ denotes the discount factor for the more-frequent problem, $0 \le \lambda \le 1$. For the less-frequent problem, the discount factor is $\lambda^k$. The reward earned for the patient at state $s \in S$ when taking action $d \in A$ at $t \in T$ in the more-frequent problem is denoted by $r_t(s, d)$, the health benefit to the patient. We consider two types of rewards: the immediate reward and the lump-sum reward. Once the decision-maker chooses wait, the patient earns the immediate reward $r_t(s, w)$ based on s and t and advances to the next decision period. If the decision-maker chooses accept, the patient earns the lump-sum reward $r_t(s, a)$ given s and t and enters the post-decision-making state. We assume the value of the reward is the same at both t and t + m ($t \in \tilde{T}$, $m \le k$) for both types of rewards in the more-frequent problem. We therefore use $r_t(s, d)$ to represent both $r_t(s, d)$ and $r_{t+m}(s, d)$ for the more-frequent problem ($t \in \tilde{T}$, $m \le k$). We use $\vec{r}_t(d)$ to denote the vector of rewards for action d across states at time t. In the less-frequent problem, we use $R_{\tilde{t}}(s, d)$ to denote the reward for the patient at state $s \in S$ when taking action $d \in A$ at $\tilde{t} \in \tilde{T}$. We assume the lump-sum reward of the less-frequent problem is the same as that of the more-frequent problem at $\tilde{t} \in \tilde{T}$ ($R_{\tilde{t}}(s, a) = r_{\tilde{t}}(s, a)$). Also, we assume the lump-sum reward of wait states is 0.
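The relationship between the two problems' dynamics can be checked numerically. A minimal sketch, with a hypothetical one-period transition matrix (not the dissertation's data): the less-frequent problem uses $P = p^k$ and discount factor $\lambda^k$.

```python
import numpy as np

# Hypothetical one-period (e.g., daily) transition matrix under wait,
# with states ordered (healthy, sick, dead); dead is absorbing.
p = np.array([[0.95, 0.04, 0.01],
              [0.10, 0.80, 0.10],
              [0.00, 0.00, 1.00]])
k = 2        # the less-frequent decision-maker acts every k periods
lam = 0.999  # one-period discount factor

P = np.linalg.matrix_power(p, k)  # less-frequent transition matrix p^k
Lam = lam ** k                    # less-frequent discount factor

# P is still a valid transition matrix: every row sums to one.
assert np.allclose(P.sum(axis=1), 1.0)
```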
For the immediate reward, the reward earned at time $\tilde{t}$ in the less-frequent problem should equal the immediate reward earned in the more-frequent problem at time $\tilde{t}$ plus the expected discounted immediate rewards in the remainder of that interval:
$$R_{\tilde{t}}(s, w) = r_{\tilde{t}}(s, w) + \sum_{j=1}^{k-1} \lambda^j p_{\tilde{t}}^j(s|w)\,\vec{r}_{\tilde{t}+j}(w) = r_{\tilde{t}}(s, w) + \sum_{j=1}^{k-1} \lambda^j p_{\tilde{t}}^j(s|w)\,\vec{r}_{\tilde{t}}(w).$$
$C_k$ represents the per-period cost of using the more-frequent decision-making frequency compared to the less-frequent frequency; this value depends on k. We assume that $C_k$ is time-homogeneous and, since it captures costs, non-positive. $C_k$ only appears in the more-frequent problem, as this cost is not incurred in the less-frequent problem. Let $v_t(s)$ and $V_{\tilde{t}}(s)$ denote the optimal value functions at state $s \in S$, $t \in T$, $\tilde{t} \in \tilde{T}$ for the more-frequent and less-frequent problems, respectively. At optimality, the following must hold for the more-frequent problem:
$$v_t(s) = \begin{cases} C_k + \max\left[r_t(s, a),\; r_t(s, w) + \lambda p_t(s|w)\vec{v}_{t+1}\right] & \text{if } s \notin \{\text{wait states, post-decision, death}\} \\ C_k + r_t(s, w) + \lambda p_t(s|w)\vec{v}_{t+1} & \text{if } s \in \{\text{wait states}\} \\ 0 & \text{if } s \in \{\text{post-decision, death}\}. \end{cases}$$
Similarly, for the less-frequent problem:
$$V_{\tilde{t}}(s) = \begin{cases} \max\left[R_{\tilde{t}}(s, a),\; R_{\tilde{t}}(s, w) + \lambda^k P_{\tilde{t}}(s|w)\vec{V}_{\tilde{t}^+}\right] & \text{if } s \notin \{\text{wait states, post-decision, death}\} \\ R_{\tilde{t}}(s, w) + \lambda^k P_{\tilde{t}}(s|w)\vec{V}_{\tilde{t}^+} & \text{if } s \in \{\text{wait states}\} \\ 0 & \text{if } s \in \{\text{post-decision, death}\}. \end{cases}$$
This problem is equivalent to one where the decision-maker may choose accept while in a wait state if the lump-sum reward in that state is smaller than $\min(\lambda v_t(s), 0)$, $\forall t \in T$, $s \in S$, as wait will then always be chosen in wait states (see Lemma 1 in Appendix 1.1). We can set the reward for accept in wait states to satisfy this condition for any realistic problem. Therefore, for ease of notation in the remainder of the manuscript, we assume the action space is {accept, wait} for all states and that the lump-sum reward for accept is sufficiently small in wait states.
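The two recursions can be made concrete with a minimal backward-induction sketch. All numbers below are illustrative placeholders, not the dissertation's calibrated inputs; the death state is absorbing with value 0, and (per the convention above) accept in the death row is blocked with a very negative lump-sum reward.

```python
import numpy as np

def solve_stopping(N, p_w, r_w, r_a, lam, Ck):
    """Backward induction for a finite-horizon stopping problem.
    p_w: transition matrix under wait; r_w: immediate wait rewards;
    r_a: lump-sum accept rewards; Ck: per-period cost (0 if none).
    The last state is death, with terminal value 0."""
    m = p_w.shape[0]
    v = np.zeros(m)                       # terminal values
    for t in range(N, 0, -1):
        cont = r_w + lam * p_w @ v        # value of choosing wait
        v_new = Ck + np.maximum(r_a, cont)
        v_new[-1] = 0.0                   # death is absorbing with value 0
        v = v_new
    return v

# Hypothetical stationary inputs: two health states plus death, k = 2.
p_w = np.array([[0.90, 0.07, 0.03],
                [0.00, 0.85, 0.15],
                [0.00, 0.00, 1.00]])
r_w = np.array([1.0, 0.6, 0.0])     # immediate reward for waiting
r_a = np.array([8.0, 5.0, -1e9])    # lump-sum reward for accepting
lam, k, N = 0.99, 2, 28

# More-frequent problem: a decision every period, at per-period cost Ck.
v = solve_stopping(N, p_w, r_w, r_a, lam, Ck=-0.05)

# Less-frequent problem: a decision every k periods, with transitions,
# rewards, and discounting aggregated over each k-period interval.
P_w = np.linalg.matrix_power(p_w, k)
R_w = r_w + lam * p_w @ r_w          # R(s,w) = r + sum_{j<k} lam^j p^j r
V = solve_stopping(N // k, P_w, R_w, r_a, lam ** k, Ck=0.0)

D = v - V   # additional value of more-frequent decision-making
```

With $C_k = 0$, the more-frequent value is at least the less-frequent value at shared epochs, since the decision-maker can always mimic the less-frequent policy; the sketch can be used to check this numerically.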
3.4 Structural Properties

3.4.1 Assumptions

We first make the following reasonable assumptions for the more-frequent problem; we make analogous assumptions for the less-frequent problem as well (not shown for simplicity). In this and the following sections, many results pertain to a threshold policy. Similar to Bertsekas (2012) [25] and Puterman (1994) [107], we define a threshold policy as one where, given that the optimal action is accept for non-wait state i at time t, the optimal action is also accept for non-wait states j > i or for times t′ > t.

Assumption 1. Rewards are non-increasing over time and non-wait states.

Assumption 2. Both P and p have the increasing failure rate (IFR) property for all non-wait states.

This means that as a patient progresses to a worse state, this patient has a higher chance of progressing to an even worse state compared to patients in better health states. This is generally true in the healthcare context.

Assumption 3. $r_t(s, w) + \sum_{j=1}^{|\tilde{S}|} p_{sj,t}(d)u(j) - r_t(s, a) \ge r_t(\bar{s}, w) + \sum_{j=1}^{|\tilde{S}|} p_{\bar{s}j,t}(d)u(j) - r_t(\bar{s}, a)$, for any non-increasing vector u over states, $\forall s, \bar{s} \in \tilde{S}$, $\bar{s} > s$, $t \in T$.

This means that the reward difference between wait and accept is non-increasing over states. For instance, the benefit of waiting is higher in a healthier state, as sicker states usually have higher mortality. Similar assumptions are commonly used in sequential decision-making problems in healthcare and have been used in prior work to show that there exists a threshold policy over states (see Puterman (1994) [107], page 107, and Chhatwal et al. (2010) [36], for example). We will use this assumption for a similar purpose.

Assumption 4. $r_{t-1}(s, w) + \sum_{j=1}^{|\tilde{S}|} p_{sj,t-1}(d)u(j) - r_{t-1}(s, a) \ge r_t(s, w) + \sum_{j=1}^{|\tilde{S}|} p_{sj,t}(d)\bar{u}(j) - r_t(s, a)$, for all non-increasing u, $\bar{u}$ over state j such that $u(j) \ge \bar{u}(j) > 0$, $\forall s \in \tilde{S}$, $\forall t \in T$, $\forall d \in A$.

This means that the reward difference between wait and accept is non-increasing over time.
For instance, the benefit of waiting is higher in an earlier decision epoch, as later decision epochs usually have higher mortality. Diseases with increasing mortality risk and progression probabilities satisfy this assumption. This assumption is very common in healthcare problems, as patients in worse health states are more likely to become sicker, and that effect worsens over time. For instance, one disease can lead to complications and comorbidities, as biological systems within the body are linked (e.g., having severe cirrhosis of the liver can lead to liver failure [142], but over time this can also lead to other organs failing as well [156]).

Assumption 5. The probability of entering wait states is zero.

We focus our attention on non-wait states only, as wait will always be preferred in wait states by construction. This assumption allows us to establish a threshold policy and monotonicity over non-wait states. We show numerically in Section 3.5 that our structural outcomes can hold in non-wait states even when this assumption is violated. The above assumptions are important to establish threshold policies and the theoretical results below. Topkis (2011) [134], Puterman (1994) [107], and other works have used supermodularity assumptions to establish control-type policies. We will also establish theoretical results based on similar assumptions.

3.4.2 Structural Observations

Prior work on MDPs shows that our assumptions generate monotonic (non-increasing) optimal policies in state for both the more-frequent and the less-frequent problem [107]. This means that if wait is preferred at non-wait state s at any time, then it is also the optimal policy for any non-wait state before s; if accept is preferred at non-wait state s, then it will be for all non-wait states after s. Similarly, there also exist monotonic optimal policies in time for both problems, as we show using a similar procedure as in Chhatwal et al. (2016) [37] (see Lemma 3 in the Appendix).
At any non-wait state, if accept is preferred at time t, then it will also be preferred at any time after t; if wait is preferred at time t, then it is also preferred at all times before t. These threshold policy results imply that if it is optimal to stop the problem at one time or state, then it is also optimal to do so at any later time or sicker state.

Proposition 1. $v_{\tilde{t}}(s) - V_{\tilde{t}}(s) \ge \sum_{i=0}^{N - \tilde{t} - \frac{N - \tilde{t}}{k}} \lambda^i C_k$, $\forall \tilde{t} \in \tilde{T}$, $s \in \tilde{S}$.

All proofs are provided in the Appendix. Proposition 1 holds without any of the previous assumptions. In the more-frequent problem, the decision-maker has more opportunities to make decisions, so it is intuitive that the optimal value for the more-frequent problem should be no less than the optimal value for the less-frequent problem if the cost of increasing the frequency is not considered. The right-hand side is the total discounted cost of increasing the decision-making frequency k times. We can then think of a decision-maker comparing total costs to this additional benefit when deciding which offer frequency to use. This proposition also serves as the basis for discovering relationships between the more-frequent and the less-frequent problem.

Proposition 2. When accept is the optimal action in the more-frequent problem for non-wait state s at epoch $\tilde{t} \in \tilde{T}$, if $C_k \le v_{\tilde{t}}(s) - V_{\tilde{t}}(s)$, then accept is also the optimal action in the less-frequent problem for state s at time $\tilde{t}$.

Proposition 2 helps us understand the relationship between the more-frequent and less-frequent problems in the case where financial costs are less than the additional benefits. This can serve as the foundation for building other interesting structural properties. This proposition does not rely on any assumptions and holds for any problem (even without any threshold policies over state or time) described in the problem setup.
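The right-hand side of Proposition 1 is a finite geometric series, so it can be evaluated in closed form. A small sketch with placeholder numbers (a 28-period horizon, k = 2, and an assumed per-period cost of −0.05):

```python
def discounted_cost_bound(Ck, lam, N, t_tilde, k):
    """Closed-form value of sum_{i=0}^{M} lam^i * Ck, where
    M = (N - t_tilde) - (N - t_tilde)/k, as in Proposition 1.
    Assumes lam < 1."""
    M = (N - t_tilde) - (N - t_tilde) // k
    return Ck * (1 - lam ** (M + 1)) / (1 - lam)

# Placeholder numbers: N = 28, k = 2, evaluated from t_tilde = 0.
bound = discounted_cost_bound(-0.05, 0.99, 28, 0, 2)

# The closed form matches the explicit sum (M = 14, so 15 terms).
assert abs(bound - sum(0.99 ** i * -0.05 for i in range(15))) < 1e-12
```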
Using this proposition, we can identify properties of the solution of the more-frequent problem by examining the solution of the less-frequent problem via contraposition. For instance, for a non-wait state s in a period $\tilde{t}$, if the optimal action in the less-frequent problem is wait, then the optimal action at state s in $\tilde{t}$ must also be wait in the more-frequent problem (as it cannot be accept, since accept would then also be optimal in the less-frequent problem). If there exists a threshold policy in both the less-frequent and more-frequent problems, the optimal action at $t \le \tilde{t}$ in the more-frequent problem must also be wait. We can thus pre-solve part of the optimal strategy for the more-frequent problem by solving the less-frequent problem. Similar logic using this proposition shows that when there exists a threshold policy over time, it must be the case that whenever the more-frequent and less-frequent problems differ in optimal policy, the less-frequent problem chooses accept while the more-frequent problem chooses wait, for all $\tilde{t} \in \tilde{T}$. However, note that this proposition does not comment on the optimal actions at times $t \notin \tilde{T}$, and it is possible for the more-frequent problem to choose accept at a time $t \notin \tilde{T}$ while the less-frequent problem does not have the opportunity to change from the wait action. If so, the less-frequent problem's optimal action at the next opportunity $t \in \tilde{T}$ should be accept (provided a threshold policy over time exists in the more-frequent problem). Also, this proposition does not provide a way to check whether $v_{\tilde{t}}(s) - V_{\tilde{t}}(s) \ge C_k$. In Theorem 4, we provide a sufficient condition to ensure $v_{\tilde{t}}(s) - V_{\tilde{t}}(s) \ge C_k$. Let $D_{\tilde{t}}(s) = v_{\tilde{t}}(s) - V_{\tilde{t}}(s)$ denote the difference in optimal values between the more- and less-frequent problems at time $\tilde{t}$ and state s. We next develop properties concerning $D_{\tilde{t}}(s)$.

Theorem 1.
When both problems have different optimal actions and Assumptions 4 and 5 hold, the difference in the optimal value, $D_{\tilde{t}}(s)$, $s \in \tilde{S}$, is non-increasing in time for all $\tilde{t} \in \tilde{T}$ when the optimal action for the more-frequent problem is to wait. Otherwise, $D_{\tilde{t}}(s)$ is non-decreasing in time for all $\tilde{t} \in \tilde{T}$.

Theorem 2. When both problems have different optimal actions and Assumptions 3 and 5 hold, the difference in the optimal value, $D_{\tilde{t}}(s)$, $s \in \tilde{S}$, is non-increasing in state for all $s \in \tilde{S}$ when the optimal action for the more-frequent problem is to wait. Otherwise, $D_{\tilde{t}}(s)$ is non-decreasing in state for all $s \in \tilde{S}$.

Theorems 1 and 2 state that the difference in the optimal value is always non-increasing over time and over non-wait states if the more- and less-frequent problems give different optimal actions and the more-frequent problem chooses to wait. This means that the additional benefit earned from switching to the more-frequent problem from the less-frequent problem is non-increasing over both time and non-wait states when the two problems disagree (however, note that when accept is the optimal action at time $\tilde{t} \in \tilde{T}$ for both problems, then $D_{\tilde{t}}$, the difference between optimal values, is zero). When the more-frequent and less-frequent problems give different optimal actions and the more-frequent problem prefers accept, by Proposition 2 we have $D_{\tilde{t}}(s) \le C_k$. In this case, using the more-frequent problem costs more than the additional benefit gained. Let $B(s) \in \tilde{T}$ be the last epoch where the optimal policy for both problems is wait for state $s \in \tilde{S}$.

Theorem 3.
$D_{\tilde{t}}(s)$, $s \in \tilde{S}$, is non-decreasing over time and states for $\tilde{t} \in [0, B^+(s)) \cap \tilde{T}$, $s \in \tilde{S}$, where $B^+(s) = B(s) + k$ and $B^-(s) = B(s) - k$, if: (a) there is a threshold policy over time and non-wait states; (b) optimal values are non-increasing over time; (c) $D_{B^+(s)}$ is non-decreasing over non-wait state s; (d) $p_{B^-(s)}(i|w)\vec{D}_{B(s)} \le p_{B(s)}(i|w)\vec{D}_{B^+(s)}$, $i \in \tilde{S}$; and (e) Assumption 5 holds.

Theorem 3 requires that a threshold policy over non-wait states exists. However, note that if there exists a threshold policy only over some states, Theorem 3 may still be true. Condition (c) means that the additional benefit of more-frequent decision-making is non-decreasing as health states worsen. This means a more severely ill patient would gain more from additional decision-making opportunities, which may be realistic as the potential health gains of intervention increase as a patient nears death. Condition (d) means the one-step-ahead expected additional benefit of more-frequent decision-making when choosing wait is smaller at time B(s) than at time B(s) + k. For example, this would mean that the one-step-ahead expected additional benefit for an ACLF3 patient aged 50 is less than that of the same individual in the same state at an older age. This may be reasonable, as an older individual may have a faster expected rate of health decline and therefore larger expected one-step-ahead benefits of more frequent decision-making. When Theorems 1 and 3 hold, we know the largest additional benefit from switching to the more-frequent problem in each non-wait state will be garnered either in the last time period where both problems choose wait, in the first time the optimal policies do not agree, or in the last decision epoch. We now turn to when $D_{\tilde{t}}(s) \ge 0$. Let $\psi_{\tilde{t}}(s) = \max\{r_{\tilde{t}}(s, a),\; r_{\tilde{t}}(s, w) + \lambda p_{\tilde{t}}(s)\vec{r}_{\tilde{t}+1}(a)\}$.

Theorem 4. For $s \in \tilde{S}$, $\tilde{t} \in \tilde{T}$, if $C_k + \lambda C_k + r_{\tilde{t}}(s, w) + \lambda p_{\tilde{t}}(s|w)\vec{\psi}_{\tilde{t}} \ge V_{\tilde{t}}(s)$, then $D_{\tilde{t}}(s) \ge 0$.
Theorem 4 provides a sufficient condition for when $D_{\tilde{t}}(s) \ge 0$. When this theorem holds, the largest additional benefit when comparing the less-frequent problem to the more-frequent problem will come either in the last time both problems choose wait or in the first time the optimal policies do not agree. Intuitively, this means that if the benefits of making an additional decision are greater than the costs of doing so, then $D_{\tilde{t}}(s) \ge 0$. Furthermore, for all times $\tilde{t}$ and states s such that $D_{\tilde{t}}(s) \ge 0$, according to Theorem 4, the optimal action for the more-frequent problem is always wait. In other words, we can identify a time threshold $t_s$ (for state s) for the more-frequent problem such that for any $\tilde{t} \le t_s$, the optimal action for state s is to wait. With this, we only need to solve the periods $(t_s, N]$ for each state s in the more-frequent problem. These four theorems are important in determining and quantifying the difference in value when comparing the less-frequent problem to the more-frequent problem. In the next section, we show numerical examples to better illustrate our theoretical results.

3.5 Numerical Example

We provide two numerical examples in this work. The first is on liver transplantation decision-making with k = 2, formulated through a partnership with a physician specializing in liver transplantation at Cedars-Sinai Hospital in Los Angeles (Section 3.5.1). The second example examines treatment initiation for early-stage CKD patients (Section 5.5), with model details and results in Appendix Section B.3 due to space constraints.

3.5.1 Organ Transplantation Decisions Among ACLF Patients

Unlike typical prior literature in organ transplantation [6, 5], which focuses on general end-stage liver diseases (ESLD), where patients have a relatively lower death probability within a year after entering the ESLD health state, we focus on a cohort of patients diagnosed with acute-on-chronic liver failure grade 2 and grade 3 (ACLF2 and ACLF3, respectively).
ACLF2 and ACLF3 are grades of ACLF in which the patient has two (ACLF2) or three or more (ACLF3) simultaneous organ failures (OF) and is therefore in a severe, life-threatening condition. Conventionally, the transplant decision for these patients is made within a week or a month after transplant eligibility to avoid the high likelihood of death [91, 161]. Some livers may have a higher probability of resulting in a successful transplantation, as measured by the donor risk index (DRI), which is a function of age, cause of donor death (if the donor is dead), race, donation after cardiac death, partial/split grafts, donor height, donor location, and organ cool time [51, 110]. The DRI depends on the donor, but not the recipient. Marginal livers (DRI ≥ 1.7) are less likely to lead to successful transplantation, while optimal livers (DRI < 1.7) are more likely to do so. The 1.7 threshold is well accepted in clinical practice; for instance, it is used in various prior medical literature [16, 44, 71]. Patient-physician teams may decline an offered organ in hopes of being offered another with better outcome probabilities later. Different health systems have different expected waiting periods [133]. Our physician partner indicated that in some health systems, ACLF2 and ACLF3 patients are offered a liver for transplantation as often as once per day. We therefore consider a 28-day decision-making framework, in which a liver (either marginal or optimal) may be offered to the patient every two days with some probability. Given these high-need patients, we forgo the queuing systems often seen in liver transplantation models (e.g., Bandi et al. (2019) [18]), as these patients cannot survive a long transplant waitlist, and instead use an MDP stopping-problem framework. Although our decision time horizon is short (a month), we include health outcomes one-year post-transplant and lifetime expected outcomes to capture clinically relevant outcomes.
The model can also be extended to longer time horizons without structural modifications. We assume a liver is offered to eligible patients with probability Ω every two days. Conditional on a liver offer, we assume the likelihood of receiving an optimal liver offer is time-invariant. We assume that a liver offered at the last decision epoch must be accepted if the patient chose to wait at all prior decision epochs. Our objective is to evaluate the outcome of providing more frequent liver offers to eligible patients at specific times/health states, which may improve patients' health at a cost. Additional liver offers may rely on multiple listing or transferring to other transplant centers [140, 138, 9, 144], and may also depend on a hospital's resources, the price of organ transportation, and other logistical costs. It may also mean additional costs borne by the patient, if they must make additional decisions about accepting or rejecting a liver. Identifying the value of increasing the frequency of liver offers is therefore of policy importance; this value must exceed the costs if the increase in offers is to be net beneficial. We compare the less-frequent problem, where an organ is offered with probability Ω every two days, to the more-frequent problem, where a liver is offered with probability ω every day. Ω and ω depend on the number of organ failures suffered by the patient; those with more failures are prioritized over those with fewer [133]. If a liver is offered, with probability o it will be an optimal liver offer, and we assume this is invariant across patient types. Thus, at each epoch in the more-frequent problem, an ACLF2 or ACLF3 patient may be offered one of three options: a marginal liver (with probability (1 − o)ω), an optimal liver (oω), or no liver at all (1 − ω). The patient can decide whether to accept the offer if a liver is offered. If an optimal liver is offered, there are no benefits to rejection, so it will be accepted.
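The per-epoch offer distribution in the more-frequent problem can be written out directly; the ω and o values below are placeholders, not the calibrated values used in the analysis:

```python
omega = 0.5   # placeholder: probability a liver is offered this day
o = 0.4       # placeholder: probability the offer is optimal, given an offer

# The three mutually exclusive outcomes a patient faces each epoch.
offer_probs = {
    "optimal liver": o * omega,
    "marginal liver": (1 - o) * omega,
    "no offer": 1 - omega,
}

# The three outcomes cover every possibility, so probabilities sum to one.
assert abs(sum(offer_probs.values()) - 1.0) < 1e-12
```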
This is confirmed both by clinical experts and by model outputs if we allow patients to make decisions when receiving an optimal liver. However, if a marginal liver is offered, the patient may decide whether to accept the liver or not. Accepting a marginal liver leads to a lower post-transplant quality of life, while rejecting means the patient is exposed to mortality risk for at least the time until another liver is offered. Our goal is to quantify the benefit of increased decision-making frequency. In this scenario, benefits accrue both from additional offered livers (Ω livers offered per two days in expectation in the less-frequent problem and ω livers offered per day in expectation in the more-frequent problem) and from additional opportunities to make decisions as the patient's health state is observed (ACLF3, ACLF2, dead, etc.). We first quantify the total benefit, then analyze the contribution of each separately in Section 3.5.1.4. We quantify the benefit of increased quality of life and duration of life (the rewards R) through quality-adjusted life years (QALYs) and the willingness-to-pay threshold. QALYs were first adopted for cost-effectiveness analysis and are now widely used in medical decision-making problems [150]. QALY weights range from 0 to 1, and they take both quality of life and quantity of life lived into consideration. For example, for a patient with ACLF, the QALY weight for a year of life is 0.4 [152], while a perfectly healthy person has a QALY weight of 1. Because we need to compare these benefits with financial costs, we convert these QALY rewards to dollars using a conversion factor, T (the commonly used 'willingness-to-pay (WTP) threshold') [150]. For example, an accepted WTP per QALY gained in typical cost-effectiveness analyses is $50,000 per QALY gained [61]; the dollar value of the health benefit is then the product of the QALYs gained and $50,000 per QALY gained. Thus R = T × QALYs.
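The conversion R = T × QALYs, and the net monetary benefit it feeds into, can be illustrated with a small worked example. The years-lived and cost figures below are hypothetical; the QALY weight of 0.4 and the $50,000 WTP follow the text.

```python
WTP = 50_000          # dollars per QALY gained (the conversion factor T)
qaly_weight = 0.4     # quality weight for a year lived with ACLF
years_lived = 2.0     # hypothetical duration of life gained
qalys = qaly_weight * years_lived     # 0.8 QALYs
reward_dollars = WTP * qalys          # R = T x QALYs, about $40,000

cost = 5_000                          # hypothetical intervention cost
nmb = reward_dollars - cost           # net monetary benefit
```

The same subtraction is what turns the QALY-based value function into the NMB objective used below, once the per-period costs $C_k$ are included.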
Our value function at time t, which additionally takes into account the financial costs $C_k$ in the more-frequent problem, is then the net monetary benefit (NMB), as commonly referred to in healthcare decision-making [136]. We solve both the more- and less-frequent MDPs with NMB objective values. This allows us to compare the marginal benefits of increasing the frequency of organ offers. We also identify when the difference between the optimal values of the two problems (D) is maximized in each state. We perform this analysis using empirical data and perform sensitivity analyses around uncertain parameters. We examine how D changes over states in our sensitivity analysis to numerically demonstrate Theorems 2 and 3.

3.5.1.1 Model Inputs

We use United Network for Organ Sharing (UNOS) data and values from the medical literature to parameterize both MDPs. We use the likelihood of receiving an organ (ω), the conditional likelihood of receiving an optimal liver given a liver offer (o), the death probability (γ), and the probability of improving from a worse to a better health state (ξ) to parameterize the transition matrix $p_{\tilde{t}}(w)$. We use the eigen-decomposition method to calculate $P_{\tilde{t}}(w)$. The time-invariant relative risk (rr) of survival is used to calculate the post-transplant survival probabilities for a marginal organ. The model structure and inputs were validated by clinical experts from the Cedars-Sinai Health System. We vary o and rr in sensitivity analysis as they are uncertain and may vary by transplant center. Critically, we relax the assumption that there is no probability of entering a wait state to numerically evaluate whether this changes the outcomes from the theoretically proven results above. See Appendix B.2.0.1 for details on model inputs.

3.5.1.2 Assumptions

We make the following additional assumptions in our numerical experiments while relaxing Assumption 5. First, we assume an ACLF2/3 patient will always accept an optimal liver if it is offered.
Second, we stratify ACLF3 patients into ACLF=3OF and ACLF>3OF, with the former having exactly three OFs and the latter having more than three OFs. We assume the likelihood of improving from ACLF>3OF to ACLF=3OF is equivalent to the probability of improving from ACLF=3OF to ACLF2. Third, we assume that all patients will accept the offered liver at the end of the time horizon if a liver is offered, regardless of organ type.

3.5.1.3 Base Case Results

We relax the assumption that the probability of entering wait states is zero in our numerical analyses. Even so, our base case outcomes are consistent with our theoretical results on threshold policies over time and state when a marginal liver is offered, as well as with all propositions and theorems. In the less-frequent problem, for ACLF2, ACLF=3OF, and ACLF>3OF patients, respectively, we find that the optimal policy recommends waiting at most four days, two days, and two days for an optimal liver. We then use these results and Theorem 4 to identify the threshold ts for all states (the threshold for state s such that for any t˜ ≤ ts, Dt˜(s) ≥ 0 – the optimal action for state s at time t˜ must be wait). The thresholds are days four, two, and two for ACLF2, ACLF=3OF, and ACLF>3OF patients, respectively. Given the solution from the less-frequent problem, we know the optimal action for the more-frequent problem must be wait from days 0-4, 0-2, and 0-2 for ACLF2, ACLF=3OF, and ACLF>3OF patients, respectively. We then only need to solve for the optimal action on days 4-28, 2-28, and 2-28 for ACLF2, ACLF=3OF, and ACLF>3OF patients, respectively. After doing so, we find that the optimal policy recommends waiting at most six days, two days, and two days for ACLF2, ACLF=3OF, and ACLF>3OF patients. As expected, the results from the more-frequent problem recommend a longer wait duration than the less-frequent problem, as the more-frequent problem provides an additional offer every two days.
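The warm-start procedure just described (fix wait before the threshold from the less-frequent solution, then optimize only the tail) can be sketched generically. This is our own illustration, not the dissertation's solver; `wait_value` and `best_value` are hypothetical one-epoch backward-induction updates supplied by the user.

```python
# Sketch of the threshold warm-start: the less-frequent solution gives a
# threshold t_s before which the more-frequent problem's optimal action
# must be 'wait', so backward induction only maximizes over actions on
# the remaining tail of the horizon.

def solve_with_threshold(T, k, t_s, wait_value, best_value, terminal=0.0):
    """Backward induction that skips the action maximization for t <= k*t_s.

    wait_value(t, V_next) -> value of waiting at epoch t.
    best_value(t, V_next) -> (value, action) maximizing over all actions.
    """
    policy = [None] * T
    V = terminal
    for t in range(T - 1, -1, -1):
        if t <= k * t_s:
            V, policy[t] = wait_value(t, V), 'wait'  # fixed by the threshold
        else:
            V, policy[t] = best_value(t, V)          # full maximization
    return policy, V
```

In the liver example, k = 2 and t_s corresponds to the day-four (ACLF2) or day-two (ACLF3) threshold, so only later days require comparing wait against accept.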
Figure 3.2: Difference in the expected reward earned over time between the more-frequent (M) and less-frequent (L) problems for the marginal states with ACLF2, ACLF=3OF, and ACLF>3OF. The difference between the more-frequent and less-frequent problems drops below $0 once both problems’ optimal actions are accept. Triangles denote when the optimal actions for both problems are wait, ‘+’ when it is wait in M and accept in L, and ‘*’ when the optimal actions for both problems are accept.

We show the difference in optimal values (D) in Figure 3.2. The largest difference in the optimal value between the more-frequent and less-frequent problems ranges from $50,730 to $59,203. For ACLF3 patients (both with 3OF and >3OF), these differences decrease over time until both problems have the same optimal action, when the difference value goes below zero (when both problems’ optimal action is accept). When D > 0, it would be beneficial for the patient to choose multiple-listing or to transfer to another transplant center for a higher frequency of receiving livers. For example, suppose there was an ACLF2 patient (without an optimal liver offer) who had an opportunity to transfer to a transplant center with an average offer frequency of once a day from another transplant center with an average offer frequency of once every two days. From Figure 3.2, we see that this patient would not benefit from the transfer after day 6, so transfers should be made before then. From Figure 3.2, we see D > 0 even when both problems have the same recommendation, but note that the difference in expected rewards may not be Ck even when the optimal actions for both problems across time are the same. The difference in expected values can vary because the more-frequent problem allows the decision-maker to more closely track the status of the patient’s state, leading to a higher expected reward even when the optimal policy is the same.
Moreover, the decision-maker is provided additional offers, which also leads to a higher expected reward.

We identify when the difference between the optimal values of the two problems (D) is maximized for ACLF2, ACLF=3OF, and ACLF>3OF patients when a marginal liver is offered. We will refer to this epoch as the ‘time of peak D’ and the D value at this time as the ‘value of peak D.’ Note that different states have possibly distinct ‘time of peak D’ and ‘value of peak D.’ Identifying the difference at the time of peak D is important as it is the maximum per-epoch benefit of switching to the more-frequent decision-making framework. In the base case, the times of peak D are day 4, day 2, and day 2 for ACLF2, ACLF=3OF, and ACLF>3OF, respectively. The values of peak D are $59,203, $50,730, and $54,974 for ACLF2, ACLF=3OF, and ACLF>3OF, respectively.

Analyzing how the difference value changes over time helps us determine the time of peak D. The difference increases until the more-frequent and less-frequent problems have different optimal strategies, after which it decreases. When the optimal actions for both problems are accept, the difference value becomes negative, as both problems have the same action but the more-frequent problem incurs the costs of additional decision-making opportunities. This illustrates a general insight into when the time of peak D occurs: the largest difference in value occurs at either the last decision epoch when the optimal action for both problems is wait or the first decision epoch when both problems have different optimal actions. This is because the difference value is non-decreasing when the optimal action for both problems is wait and non-increasing when the problems have different optimal actions. For instance, according to Theorem 3, for ACLF2 patients, the difference value D is non-decreasing between days zero and four as both problems recommend to wait.
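The structural result above means the time of peak D can be read directly off the two optimal policies, without scanning D itself. The sketch below is our own illustration (the action lists are a stylized pattern, not model output).

```python
# Candidate epochs for the 'time of peak D': D is non-decreasing while
# both problems wait and non-increasing once their actions differ, so
# the peak must sit at the last epoch where both wait or the first
# epoch where the actions disagree.

def peak_D_candidates(actions_more, actions_less):
    """Return (last epoch where both wait, first epoch where actions differ)."""
    last_both_wait = first_differ = None
    for t, (a_m, a_l) in enumerate(zip(actions_more, actions_less)):
        if a_m == a_l == 'wait':
            last_both_wait = t
        elif a_m != a_l and first_differ is None:
            first_differ = t
    return last_both_wait, first_differ

# A stylized pattern: both problems wait early, then the policies disagree
# before both eventually recommend accept.
more = ['wait', 'wait', 'wait', 'wait', 'wait', 'wait', 'accept']
less = ['wait', 'wait', 'wait', 'wait', 'accept', 'accept', 'accept']
print(peak_D_candidates(more, less))  # (3, 4)
```

Evaluating D at just these two epochs then identifies the peak.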
This difference in value function between the two problems represents the additional lifetime expected NMB for using the more-frequent framework. As Ck increases, D will decrease. For example, for ACLF2 patients using base case parameter values, the peak value of D will decrease by approximately $1200 if Ck increases by $500. We find that twice-as-frequent offers would not be net beneficial at any time if Ck is greater than $27,209 for ACLF2 patients, $24,256 for ACLF=3OF patients, or $27,044 for ACLF>3OF patients. If costs were higher than these values, it would not be worthwhile to pursue increased organ offers even if it resulted in twice-as-frequent offers. While our analysis is conservative and very uncertain, this provides a ballpark threshold for costs.

We can also use Theorem 4 to identify an upper bound on Ck to ensure that twice-as-frequent decisions would be net beneficial. We find these values to be $21,121, $20,395, and $27,030 for ACLF2, ACLF=3OF, and ACLF>3OF patients, respectively. Comparing these values with the actual Ck upper bound values derived numerically above, we find that these values are smaller, as expected (as Theorem 4 only provides sufficient conditions). However, all theoretical values are close to the numerically derived values, showing that this sufficient condition can be practically useful. The gap between the actual and Theorem 4-derived Ck is larger for healthier patients, as the one-step look-ahead approximation is less accurate when the patient has a longer time to wait.

Variation in k. Our framework also allows us to consider situations where the offer frequency is more than doubled; k can be any integer. With a larger k, we observe a longer maximal waiting duration for an optimal liver for patients and a larger value of peak D. For more details, we refer the reader to Appendix B.2.3.1.
3.5.1.4 Sensitivity Analyses: Variation in Relative Risk of Mortality and Probability of Being Offered an Optimal Liver

Changes in Difference Value Over Time. The value of the relative risk (rr) of post-transplant mortality and the likelihood of receiving an optimal liver (o) depend on hospital characteristics. We vary o between 0.5-0.7 and rr between 0.7-0.9 (ranges determined from discussions with clinical experts). Appendix table B.3 shows the maximum number of days the model recommends waiting for an optimal liver and the maximum benefits provided by the more-frequent compared to the less-frequent problem. We find that the propositions and theorems demonstrated in the base case analysis also hold for the cases in the sensitivity analysis: there exist threshold policies over time and states with a marginal liver offered for all cases, the more-frequent problem always provides more benefits, and the optimal value function is always non-increasing in time. The value of D is monotone over the appropriate times as defined in Theorem 1 and Theorem 3.

Figure 3.3 illustrates the outcome described in Theorem 1, which states that when the problems have different optimal actions, the difference in optimal values must be non-increasing over time. We see this from days 6-8 for ACLF2 patients – in this period, the optimal action for the more-frequent problem is wait while the optimal action for the less-frequent problem is accept. Similarly, Theorem 3 states that when the optimal policy for both problems is the same (wait), the difference in optimal values must be non-decreasing over time, as seen in days 0-4.

Figure 3.3: Difference in the expected reward earned over time between the more-frequent (M) and less-frequent (L) problems for the marginal states with ACLF2.
The difference in optimal value is marked with a triangle when the optimal action for both problems is wait, marked ‘+’ when the optimal action for the more-frequent problem is wait while the optimal action for the less-frequent problem is accept, and marked ‘*’ when the optimal action for both problems is accept.

We see substantial variation across both problems in the maximum number of days to wait for an optimal liver across the o and rr ranges evaluated. In all but the last case in Appendix table B.3, the results from the more-frequent problem recommend a longer waiting period, as expected. Offering an additional liver each day, as well as allowing the decision-maker to track the patient’s status more closely, makes the wait option more beneficial, as there will be more opportunities to be offered an optimal liver in the future and more opportunities to halt a quickly deteriorating health state. As shown in Appendix table B.3, we notice that both the time of peak D and the cost at that time increase as the relative risk decreases, because the decrease in the relative risk will lead to a larger difference in benefit between an optimal and a marginal liver. The time of peak D also increases as the likelihood of receiving an optimal liver increases. This suggests that the decision-maker would switch from the less-frequent problem to the more-frequent problem later when the likelihood of receiving an optimal liver is larger.

Appendix table B.3 also illustrates that the largest difference in value occurs at either the last decision epoch when the optimal action for both problems is wait or the first decision epoch when both problems have different optimal actions. For instance, when o=70% and rr=0.7 for ACLF=3OF patients, the maximum D value is achieved at the last decision epoch when the optimal action for both problems is wait (day 2). For ACLF2 patients in this case, however, the time of peak D is the first decision epoch with discordant optimal policies.
Also, by finding the value differences at peak D for these two cases, we know that the maximum benefit of switching from the less-frequent problem to the more-frequent problem is $79,937 for ACLF2 patients and $68,923 for ACLF=3OF patients.

Changes in Difference Value Over States. Figure 3.4 shows D when o=70% and rr=0.5. Theorem 2 states that when both problems have different optimal actions, the difference in optimal values must be non-increasing over states. We see this among ACLF=3OF and ACLF>3OF patients on day 6, as the optimal action for the more-frequent problem is wait while it is accept for the less-frequent problem. Similarly, Theorem 3 states that when the optimal policy for both problems is the same (wait), the difference in optimal values must be non-decreasing over states, as seen in all states on days 2 and 4, when both the more- and less-frequent problems recommend wait.

Figure 3.4: Difference in the expected reward earned over states between the more-frequent (M) and less-frequent (L) problems for days 2, 4, and 6.

Analyzing how the difference value changes over non-wait states helps us understand how the benefits change over the health state. Generally, in early decision-making periods, when both problems recommend wait, patients with more organ failures will benefit more from additional liver offers. Later, the two problems recommend different optimal actions in sicker states (if offered a marginal liver, the more-frequent problem recommends wait while the less-frequent problem recommends accept) while both problems recommend wait in healthier states. This shows that the sicker states gain less benefit from additional decision-making: sicker patients are recommended to accept a marginal liver in the less-frequent problem due to their higher mortality probability, and therefore have lower benefits compared with healthier states, for which both problems recommend wait.

Where Does the Benefit Come From?
In this example, the benefit of more-frequent decision-making arises from both receiving more liver offers and additional decision-making opportunities. To understand which contributes more towards increasing the value of peak D, we consider a case study for ACLF2 patients, where we fix the probability of receiving a liver offer to be the same over a two-day interval in both the more- and less-frequent problems. The difference between the benefit accrued in this scenario and the previous base case values will then quantify the benefit associated with receiving more livers. To do this, we set the likelihood of receiving a liver for the less-frequent problem (Ω) to 1. Then we set the probability for the more-frequent problem (ω) such that both problems produce one liver offer every two days in expectation. Assuming o=70% and rr=0.7, we found that the value of peak D is $1,490. The base case found $79,937 of additional benefit obtained for ACLF2 patients, so the additional benefit mainly comes from receiving more liver offers.

3.5.2 Treatment Initiation for Early-Stage CKD

Our second numerical example focuses on a monitoring problem for treatment initiation, where physicians can only observe patient health states for updating treatment plans during an office visit. Disease progression is stochastic, and the optimal frequency of these visits may vary by health state and time. In this example, we examine a decision for how frequently to monitor early-stage chronic kidney disease (CKD) patients to determine the treatment initiation time. CKD patients are categorized into five stages by disease severity, as measured using the estimated glomerular filtration rate (eGFR). According to current guidelines, CKD patients should undergo at least yearly eGFR checkups and initiate angiotensin-converting enzyme (ACE)/angiotensin receptor blocker (ARB) treatment immediately after reaching stage 3 [100, 131].
However, there is some controversy regarding whether annual checkups are sufficient [65], and considering the low cost associated with ACE/ARB treatment, there is a possibility that early initiation of such treatment could benefit patients [115]. The optimal monitoring frequency and time to initiate treatment would depend on the lifetime health benefits and costs (as measured using NMB). However, more frequent office visits for CKD status monitoring incur additional costs, and more-frequent physician visits would only be beneficial if these costs were offset by the health benefits of catching disease progression earlier. We therefore use our approach to quantify the relative benefit of more frequent treatment initiation opportunities for early-stage (stage 1-2) CKD patients.

To do this, we create two equivalent MDPs for patients in stages 1, 2, and 3+. The less-frequent MDP problem uses a 12-month decision-making frequency, while the more-frequent problem assumes 6-month checkups. The available actions in both MDPs are to either initiate treatment or wait. We assume increasing the decision-making frequency incurs a cost (Ck). We use a previously established CKD simulation model to estimate rewards and transition probabilities [66]. Details of the model, solution, and outcomes are given in Appendix Section B.3.

Our investigation shows that early treatment initiation is necessary and beneficial for stage 1 and 2 CKD patients in some age groups, regardless of the decision-making frequency. Remarkably, the optimal policies derived from both MDPs are identical, indicating that the decision-making frequency does not significantly impact the recommended course of action. Furthermore, the differences in optimal values between the two MDPs fall below zero. This suggests that a higher-frequency decision-making framework may not yield substantial benefits, as the progression of chronic diseases like CKD tends to be slow.
These results indicate that the current monitoring frequency recommendations are sufficient, despite ongoing controversy to the contrary. Our findings in this example also validate and strengthen our theoretical results, as all results were consistent with our theoretical conclusions and served to illustrate that they are applicable in a CKD context. These results demonstrate that our approach to quantifying the benefit of additional decision-making opportunities can be extended and applied to various healthcare scenarios, showcasing its generalizability and relevance in different healthcare applications.

3.6 Conclusions

We examine the value of making more frequent decisions under a discrete-time, finite-horizon MDP framework. We quantify the benefits of making more frequent decisions as well as provide useful structural properties that can help decision-makers decide which frequency decision-making model to use when more frequent decisions are costly.

In our theorems, we provide novel insights comparing the value difference between the more- and less-frequent problems. We identified properties of the more- and less-frequent problems to determine when there exists a threshold policy in time and in state. We additionally found properties of the difference in expected value between these problems and provide sufficient conditions for when more-frequent decision-making would be net beneficial. We provide sufficient conditions under which we can guarantee that the more-frequent decision-making framework would be preferred for some state at some time, and we demonstrate all of our theoretical results numerically in our examples.

We provide two numerical examples to showcase the practical application of our work. The first examines liver transplantation to determine the value of more frequent liver donations. We use data from the UNOS database to parameterize our model for severely ill patients with multiple organ failures (ACLF2/3).
Our numerical example shows that the maximum benefit from additional decisions (the value at peak D) is roughly $60,000 in NMB over the lifetime post-transplant years in the base case (which assumes Ck is $2000). This benefit, which includes both health outcomes and financial costs, goes to zero if the per-patient, per-period cost of using the more-frequent framework increases to roughly $25,000. We also examine the difference in value function when input values are varied in sensitivity analyses, and we find it is inversely proportional to the value of the relative risk, rr. However, it does not change substantially with different likelihoods of receiving an optimal liver, o.

We also show the relationship between the more-frequent and the less-frequent problem in a more relaxed setting (when not all assumptions are satisfied) in Proposition 2. We find that when accept is optimal for time t for the more-frequent problem, it is also the optimal action for time t˜ (t˜ = t) for the less-frequent problem. This suggests that the solution of the less-frequent problem will provide a partial solution to the more-frequent problem, which potentially provides us a direction to speed up the problem-solving process for MDPs. Solving large MDPs is computationally demanding, and speeding up MDP algorithms has been a historic topic of interest [107]. With Proposition 2, we can solve the less-frequent problem first, which is smaller due to its larger epoch size, and then solve the more-frequent problem using the partial solution provided by the less-frequent problem. This reduces the number of computations when there is a threshold policy in both problems and thus speeds up the solving procedure.
The policy implications of our work are that 1) we can identify cost thresholds over which it would not be beneficial to provide even severely ill patients (ACLF2 and ACLF3) with more frequent organ offers, and 2) even if it is net beneficial to do so, it may not be beneficial for all periods and health states of the patient; our analysis framework can identify when it is worth it, thus allowing for targeted offer frequencies. Currently, liver offers already consider patient severity (through MELD scores, etc. [139]), and given the push for more patient-tailored healthcare, perhaps it would be realistic to push for individualized organ offer frequencies.

In a second example, we demonstrate the broader applicability of this framework by considering a treatment initiation problem for CKD patients. This example not only demonstrates the validity of our theoretical findings but also offers valuable insights into the importance of early treatment initiation for CKD patients. We found that the value at peak D is negative, indicating that it would not be net beneficial to increase the frequency of early-stage CKD patient monitoring from the recommended annual check-up. This shows how our comparative MDP analysis can contribute towards clinical controversies surrounding monitoring frequency.

We must acknowledge several limitations of this work. For simplicity, we set up the more-frequent problem such that the two transition probability matrices for each half-day are identical if their time indices belong to the same time epoch in the less-frequent problem. In reality, the transition probability matrix changes continuously over time. However, since the change in the transitions within a short time period is negligible, our results should not change substantially in our numerical analysis even had we allowed the transition matrices for the t and t + 1 periods (t being odd) to be different.
While in practice the decision interval between offers is not completely regular (probabilistic arrival of liver offers), for this problem, due to the short decision epoch size (one day, half day, etc.), we argue that an equal-size interval between offers is a reasonable approximation. However, we agree that in general this may not be an appropriate framework for other problems. We limit our analysis to stopping problems, as these occur naturally in many healthcare settings, but many interesting MDPs also fall outside this category. We only analyze problems where the more-frequent framework allows for k times as many decisions, where k is an integer; this limits the generalizability of our analysis. Our analysis focuses on finite-horizon models, and we leave the extension of this work to infinite-horizon models for future work.

Our numerical analysis relies on highly uncertain input values, particularly Ck. However, we perform sensitivity analyses and identify upper bounds for Ck above which the cost would no longer justify the additional decision-making opportunities, and we compare these numerical outcomes to our theoretical bounds. We also make very rough estimates of what the total allowable cost would be when considering the whole cohort of ACLF patients in the UNOS database. These estimates assume the costs and number of liver offers provided to patients are the same across the nation, which of course is not true in reality. However, these values provide ballpark figures that may be useful for policy.

Despite these limitations, we believe that this work provides interesting insights not only for transplantation applications but also for other applications such as monitoring problems with regular decision epochs. Our chapter draws attention to quantifying the benefits of a more-frequent decision-making framework in healthcare settings.
We derived structural properties between the more-frequent and the less-frequent problems and provided numerical examples to show how to make use of these results. These results have implications for a wide variety of MDP stopping problems and provide insight into future work on improving the speed of MDP solution algorithms.

Chapter 4

State Discretization for Continuous-State MDPs in Infectious Disease Control

4.1 Introduction

Public health officials often need to determine the optimal health intervention policy over time while facing substantial uncertainty about the state and trajectory of disease in a population. Many of these problems require making policy decisions sequentially over time, where the state may be represented using a continuous measure (e.g., the proportion of the population that is infected). For instance, during the COVID-19 pandemic, decision-makers needed to repeatedly set the start and end times of lockdowns that limited transportation and mixing between individuals to reduce transmission, even while the exact transmissibility of COVID-19 was not yet fully understood. Such repeated decision-making problems under uncertainty appear throughout infectious disease control, as evidenced by prior literature on similar problems [73, 27, 132, 160, 96, 53]. These problems often need to consider underlying disease dynamics, which can be measured with uncertainty or depend on a variety of complex social and biological factors.

One of the difficulties in solving repeated decision-making problems for infectious disease is the complexity of infectious disease dynamics, which are typically represented using compartmental models and simulation-based models [30, 81]. Such models are difficult to use for repeated decision-making problems, as one often needs to evaluate the model repeatedly to identify an optimal policy for disease control, which may require a significant investment of time as there is no closed-form solution.
While there are sophisticated ways to identify optimal policies, these techniques have their own challenges. For instance, the maximum principle approach [57, 105, 104] offers a solution framework for optimal control problems governed by differential equation systems. However, its application becomes increasingly challenging with a larger number of states or policies. Such expansion complicates both the Hamiltonian and the differential equation system, thereby rendering the process of deriving analytical or numerical solutions more complicated and time-consuming. Moreover, it is difficult to find the optimal solution when the problem is non-convex. Simulation optimization, which can handle complex systems, has also been used in disease contexts [35]. However, it is also computationally expensive and time-consuming. Furthermore, the quality of the solution depends heavily on the search space and the heuristic function chosen, presenting challenges to its practical application. One can formulate the infectious disease problem as a dynamic programming [33] problem, but the continuous or large state space makes the problem difficult to handle.

Dynamic programming methods, such as Markov decision processes (MDPs), are also commonly used for repeated decision-making under uncertainty. MDPs allow for uncertainty in state transitions, which can be used to describe changes in disease/health states over time, and allow for repeated decisions over time. Given current computing innovations, many MDPs of useful size can be solved effectively using algorithms such as backward induction, value iteration, and policy iteration. MDPs can also be solved efficiently for non-convex problems. Additionally, they inherently account for uncertainty in the outcomes of actions through transition probabilities.
However, incorporating dynamics from compartmental models and simulations into an MDP framework is hard because these models use a continuous or large number of possible states (as the state usually represents a proportion of the whole population in certain statuses, like infectious, recovered, and hospitalized). Having a continuous state space makes the MDP problem hard to solve, since traditional MDP solution methods may then not work even for a short time horizon due to state-space explosion issues. For example, backward induction needs |S|^2|A|(|T| − 1) multiplications. In the case of value iteration, each iteration carries a complexity of O(|S|^2|A|). In the case of policy iteration, each iteration carries a complexity of O(|S|^3 + |S|^2|A|), and modified policy iteration requires O(k|S|^2 + |S|^2|A|) per iteration [107]. For this reason, many traditional MDP studies in the healthcare field focus on finite-state decision-making problems like monitoring, treatment initiation, and disease testing and diagnosis [86, 46, 2, 68, 90, 6, 5, 82, 93, 116, 47, 36, 85, 3, 95, 34, 87, 161]. Therefore, finding a good state discretization method that translates infectious disease dynamics into a limited number of states would save considerable computational time in solving an MDP.

Uniform discretization is a traditional way of addressing continuous-state problems. However, this methodology is suboptimal for addressing infectious disease control challenges. The heterogeneity in state visit frequencies (some states, such as those with extremely high prevalence, may remain unvisited, while others, with lower prevalence, are visited more frequently) renders uniform discretization inefficient. This approach may devote too many discretized cells to states that are unlikely to be visited and too few to those with higher probabilities of being reached.
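To make the O(|S|^2|A|) per-iteration cost concrete, here is a minimal value-iteration sweep (our own generic sketch, not tied to any disease model): each sweep multiplies an (A, S, S) transition tensor by the value vector, which is exactly |S|^2|A| multiplications.

```python
# Minimal value-iteration sweep illustrating the per-iteration cost.
import numpy as np

def sweep(P, R, V, gamma=0.95):
    """One value-iteration sweep. P: (A, S, S) transitions; R: (A, S) rewards."""
    Q = R + gamma * (P @ V)   # the |S|^2 |A| multiplications happen in P @ V
    return Q.max(axis=0)      # greedy Bellman backup over actions

S, A = 100, 4
rng = np.random.default_rng(0)
P = rng.random((A, S, S))
P /= P.sum(axis=2, keepdims=True)  # normalize rows into probabilities
R = rng.random((A, S))
V = np.zeros(S)
for _ in range(100):
    V = sweep(P, R, V)  # contraction: converges geometrically at rate gamma
```

Doubling |S| quadruples the work per sweep, which is why reducing a continuous state to as few well-placed discrete cells as possible matters.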
How can we find a better way of discretizing the state space to closely represent the changes in health systems/disease? While many works have used various discretization methods to reduce state spaces [89, 112], we here take a novel approach that treats the state discretization problem as an optimization problem. This allows us to find the discretization that would provide smaller discretized regions in more likely visited states for a more accurate description of the true dynamics.

We explore this state discretization in the context of a disease control problem where states are used to describe the disease dynamics over a population (how many people are infected, how many people die from the disease, etc.), and actions are implemented to prevent disease spread (lockdowns, social distancing, face masks, and so on). States are assumed to be fully observable at each time. Under this framework, we find a better way of discretizing states such that the discretized state space serves as a good proxy of the original state space. This chapter addresses the challenge of formulating infectious disease control problems as MDPs by proposing a new algorithm for non-uniform state discretization that enables the discrete representation of infinite state spaces.

4.1.1 Contributions

We make several contributions in this study. We provide a novel algorithm for defining a non-uniform, discrete state space for infectious disease control problems that well approximates the original continuous-state dynamics. Our algorithm exploits the likelihood of each state being visited in the system to more efficiently capture the transitions between states. Using our method to define a discrete set of states from an originally continuous system allows us to incorporate infectious disease dynamics within frameworks that are better suited for discrete state spaces, such as MDPs.
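A minimal sketch of the visit-frequency idea (our own quantile-based illustration, not the chapter's algorithm): place grid breakpoints so that frequently visited regions of a continuous state, such as the infected fraction, receive finer cells.

```python
# Non-uniform discretization sketch: equal visit mass per cell, so cells
# are narrow where simulated visits cluster and wide in rarely visited
# regions. The simulated visit distribution below is hypothetical.
import numpy as np

def quantile_breakpoints(visited_states, n_cells):
    """Breakpoints giving each of the n_cells cells roughly equal visit mass."""
    qs = np.linspace(0.0, 1.0, n_cells + 1)[1:-1]  # interior quantiles
    return np.quantile(visited_states, qs)

rng = np.random.default_rng(1)
# Hypothetical simulated state visits, concentrated at low prevalence
# (most trajectories keep the infected fraction small).
visits = rng.beta(2, 8, size=10_000)
edges = quantile_breakpoints(visits, 10)
# The resulting cells are fine near the dense low-prevalence region and
# coarse in the rarely visited high-prevalence tail.
```

By contrast, a uniform grid would spend the same number of cells on the high-prevalence tail that trajectories almost never reach.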
We demonstrate that our state space discretization allows for more accurate MDP outcomes through two numerical examples, one using a classic SIR compartmental model and one using a COVID-19 model of Los Angeles County. The remainder of this chapter is organized as follows: we review the related literature in Section 4.2, present the problem setup in Section 4.3, and provide the algorithms in Section 4.4. The numerical examples are shown in Section 4.5. In Section 4.6, we conclude.
4.2 Literature Review
4.2.1 Markov Decision Processes in Healthcare Applications
MDPs have a rich history in the field of operations research, with a wide range of applications such as inventory management [54], portfolio management [20], production and storage optimization [13], and various others. Extensive research has been conducted to solve and understand the structure of MDPs, with notable contributions from works such as Puterman (1994)[107] and Topkis (2011)[134]. MDPs also find widespread application in the field of healthcare. They offer valuable insights and solutions to various health-related issues, including screening[93, 36, 3], sequential disease testing[12, 118], treatment initiation[116, 87], and organ transplantation[111, 112, 161]. For instance, patients in different age groups with different risks of breast cancer may need personalized mammography exam frequencies[3]; or, in another example, a patient with organ failure may be presented over time with organ transplant offers that vary in their compatibility with the patient, and must decide whether to wait for a better match or accept the offered organ as their own survival probability decreases over time [161]. For a more extensive exploration of MDPs in healthcare, refer to the comprehensive reviews by Schaefer et al. (2004)[113], Alagoz et al. (2010)[4], and Sonnenberg et al. (1993)[120]. Although MDPs are widely used in healthcare applications, most of these consider finite-state decision-making.
Constructing an MDP for infectious disease control problems with repeated decisions is challenging, especially when the state space for such problems is continuous.
4.2.2 Solving Continuous-State MDPs
As previously discussed, an infinite or continuous state space is a major challenge when formulating MDPs. Several methods have been proposed to address this problem. Munos et al. (2002) discuss different criteria for discretizing the state and time spaces non-uniformly[99]. These methods involve evaluating values or policies using dynamic programming; however, some of them raise computational concerns for problems with continuous or large numbers of states. Brooks et al. (2006) proposed a parametric method to discretize a continuous state space by constraining distributions over the state space to a parametric family[31]. However, since prior knowledge of the distribution is required, MDPs for population-level infectious disease control would be difficult to solve in this manner. One remedy for such a problem is to solve the MDP formulation by truncation and discretization of the state [29]. Researchers have used various methods to achieve this. For example, Zhou et al. (2010) used Monte Carlo simulation to approximate the belief state by a finite number of particles on a discretized grid mesh[163]. Sandikci et al. (2013) used fixed-resolution, non-uniform grids to discretize the belief state and approximate the optimal policy for a partially observable MDP (POMDP) model[112]. Lovejoy (1991) used fixed or uniform grids to approximate the solution of the POMDP[89]. However, using uniform or pre-defined discretizations (which require domain knowledge) may not always be appropriate, particularly for infectious disease control problems where disease spread is subject to uncertainty across different policy scenarios. In such cases, a more effective discretization algorithm is needed to enable the computation of the optimal policy.
4.2.3 Modeling Disease Dynamics
To identify the optimal policy for an infectious disease control problem, it is necessary to have a model describing the disease dynamics. For instance, during an emerging pandemic, how would disease transmission change if the government imposed a 1-month lockdown? How would it change under a 3-month lockdown instead? Different policies may change the patterns of disease transmission and thus the total proportion of infections. To efficiently avert infections, these different possibilities need to be evaluated to understand the resultant health and cost outcomes. Multiple methods are available for assessing the impact of different policies on a specific population. One common method for modeling disease dynamics is to use compartmental models based on differential equations [30, 77, 78, 79]. A compartmental model is a mathematical framework that provides insight into the mechanisms that affect the transmission and progression of disease. The framework partitions the population into different health or treatment states (compartments): each compartment represents a specific stage of the infectious disease (e.g., susceptible, infected, recovered), and individuals move between compartments at rates described by differential equations. This type of model is fundamental in epidemiology for understanding the spread of diseases and evaluating the potential impact of public health interventions. For example, compartmental models can compare the effectiveness of wearing masks and social distancing during the COVID-19 pandemic [60, 72]. Long et al. (2018) use a classical compartmental model to assist with the decision of allocating resources during the 2014 Ebola outbreak in Africa[88]. In the numerical section of this chapter, we consider a classic Susceptible-Infected-Recovered (SIR) epidemic model, which has been extensively used in the epidemiological literature [22, 63, 83].
Another method of evaluating disease dynamics is to use simulation models, which can track transmission, progression, and behavior as well as policy outcomes. For instance, simulation models have been employed to examine the cost-effectiveness of screening recommendations for HIV-positive men who have sex with men (MSM) [137], as well as to study the effectiveness of different disease control strategies for tuberculosis (TB) in India [121]. Although these methods capture the dynamics of complicated diseases, they cannot compute dynamic policies effectively, as m^t evaluations are usually needed when there are m possible interventions and t decision epochs. Therefore, it is beneficial to find alternative, effective ways of identifying the optimal policy for infectious disease control. In this chapter, we consider a discrete-state MDP framework that takes advantage of effective MDP solution methods, with the underlying disease dynamics estimated from traditional disease models such as compartmental and simulation models. To model this problem as a discrete-state MDP, we also need to define a transition function that describes the probability of transitioning between states. Several existing techniques help to construct this function. For instance, Yaesoubi et al. (2011) proposed a way to compute transition probabilities given a system of ODEs[157]. In another example, Mishalani et al. (2002) proposed a method of developing transition probabilities from a stochastic duration model based on the hazard rate function[97]. However, these methods are computationally intensive, which limits their usage to problems with small populations or disease models with special structures.
4.3 Problem Setup
The notation used in this chapter is as follows. We denote Xt ∈ X as the state of the epidemic at time t. Xt = [X1t, X2t, ..., Xnt] has n components, each representing the proportion of the population in a compartment (e.g., for an SIR model, n = 3).
For example, Xt = [XSt, XIt, XRt] ∈ [0, 1]^3 can describe the proportions of the population in the susceptible (S), infected (I), and recovered (R) compartments at time t for an SIR model. We denote X0 as the initial state and assume it follows an initial distribution Ω. We use {Xt} = (X0, ..., XN) to denote the disease trajectory. In this chapter, we focus on the finite-horizon problem. Let T = {1, ..., N} be the set of possible decision epochs and A = {1, ..., |A|} the set of possible policy interventions for the problem. We assume a small, finite number of actions/policies (e.g., lockdown versus no lockdown). We denote πt ∈ A as the policy intervention at time t. We consider a model f(Xt, πt) = Xt+1 that describes the disease dynamics across time epochs t. The function f(Xt, πt) can capture disease progression, transmission over time, mortality, and interventions. Generally, f(Xt, πt) takes the state of the system and the policy intervention as input and returns the state in the next period. We assume that f(Xt, πt) is time-homogeneous for simplicity (if time-inhomogeneous dynamics are desired, our methods can be easily extended). The cost of being in state Xt ∈ X and taking action πt ∈ A for t ∈ T in the infectious disease control problem is denoted r(Xt, πt). This cost can depend on health outcomes (e.g., number of infected, total vaccinated population, etc.) as well as other factors (financial cost, economic burden, etc.). We let λ denote the discount factor. Given the transition function f(Xt, πt) and the cost function r(Xt, πt), we have the following optimization formulation for our repeated decision-making disease control problem:

min_{π0,...,πN−1} Σ_{t=0}^{N} λ^t r(Xt, πt) | X0   (4.1)

s.t.
Xt = f(Xt−1, πt−1)   (4.2)

In the above problem, the objective is to find a sequence of actions {π0, ..., πN−1} that minimizes the total discounted cost over the whole N-period time horizon given a known initial state X0. For example, Xt can represent the proportion of individuals in each COVID-19-related health stage at time t, and r(Xt, πt) can compute the proportion of people dead from COVID-19 at time t. If πt denotes the policy intervention (lockdown or not) at time t, then f(Xt−1, πt−1) could be a system of difference equations that describes the population flow across different health stages. Our objective then is to find the optimal policy intervention at each time t that minimizes the total cost over the N periods. There are challenges to solving the above formulation using traditional MDP solution methods (e.g., backward induction, value iteration, policy iteration) because the formulation usually contains constraints with non-linear dynamics on a continuous state space. These solution methods require a finite number of states for effective evaluation. Moreover, the function f(Xt, πt) may not be expressed as transition probabilities from state to state, while many traditional MDP solution methods use transition probability matrices to model uncertainty and variability in the decision-making process. To discretize the continuous state space, we partition the state space X into a discrete set of states X̄. For each component i of X, we use a discretization vector Gi to describe how the continuous range is partitioned into discrete regions: Gi contains the boundary values (including the minimum and maximum) of the discretized regions for component i.
We use G to represent the list of discretization vectors for all components in X.

Figure 4.1: Four regions defined using G = {[0, 0.6, 1], [0, 0.2, 1]} are shown in different colors (axes: susceptible proportion, GS = [0, 0.6, 1]; infectious proportion, GI = [0, 0.2, 1]). These correspond to four states: (1): [X̄S, X̄I] = [0.3, 0.1]; (2): [X̄S, X̄I] = [0.3, 0.6]; (3): [X̄S, X̄I] = [0.8, 0.1]; (4): [X̄S, X̄I] = [0.8, 0.6]. For example, for Xt = [0.1, 0.3], the corresponding discretized state representation is X̄t = [0.3, 0.6].

For instance, for an SI model, G = {[0, 0.6, 1], [0, 0.2, 1]} means that group 1 (the susceptible proportion of the population) is partitioned into the two regions [0, 0.6) and [0.6, 1], and the second group (the infected proportion) is partitioned into the two regions [0, 0.2) and [0.2, 1]. In this case, we have a total of 2 × 2 = 4 regions: (1): XS ∈ [0, 0.6), XI ∈ [0, 0.2); (2): XS ∈ [0, 0.6), XI ∈ [0.2, 1]; (3): XS ∈ [0.6, 1], XI ∈ [0, 0.2); (4): XS ∈ [0.6, 1], XI ∈ [0.2, 1] (shown in Figure 4.1). From these regions, we capture the discretized state space in the matrix X̄, which is comprised of the Euclidean centroids of the regions. The dimension of X̄ is |G| × n, where |G| is the number of regions and n is the number of components. Thus, in the example above, we would have four states: (1): [X̄S, X̄I] = [0.3, 0.1]; (2): [X̄S, X̄I] = [0.3, 0.6]; (3): [X̄S, X̄I] = [0.8, 0.1]; (4): [X̄S, X̄I] = [0.8, 0.6]. In this case,

X̄ = [ 0.3 0.1
      0.3 0.6
      0.8 0.1
      0.8 0.6 ].

Similarly, we define X̄t ∈ X̄ to be the discretized state at time t and {X̄t} = (X̄0, ..., X̄N) to be the trajectory of the discretized state. With this new discretized state space, we can now define f̄(X̄t, πt, G), the disease dynamics on the discretized state space.
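The region construction and centroid mapping just described can be sketched in a few lines of Python (a minimal illustration; the function names `centroids` and `discretize` are ours, not the chapter's):

```python
from itertools import product

def centroids(G):
    """Enumerate the discretized state space (the matrix X-bar): one
    Euclidean centroid per region, where G is a list of breakpoint vectors."""
    mids = [[(lo + hi) / 2 for lo, hi in zip(Gi, Gi[1:])] for Gi in G]
    return [list(p) for p in product(*mids)]

def discretize(x, G):
    """Map a continuous state x to the centroid of the region containing it.
    Each component uses half-open intervals [lo, hi), except the last
    interval, which is closed at the upper boundary."""
    out = []
    for xi, Gi in zip(x, G):
        for lo, hi in zip(Gi, Gi[1:]):
            if lo <= xi < hi or (xi == Gi[-1] and hi == Gi[-1]):
                out.append((lo + hi) / 2)
                break
    return out

G = [[0, 0.6, 1], [0, 0.2, 1]]     # the discretization of Figure 4.1
X_bar = centroids(G)               # [[0.3, 0.1], [0.3, 0.6], [0.8, 0.1], [0.8, 0.6]]
x_disc = discretize([0.1, 0.3], G) # [0.3, 0.6]
```

Running `discretize([0.1, 0.3], G)` reproduces the mapping Xt = [0.1, 0.3] → X̄t = [0.3, 0.6] from the example above.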
Even though the true disease dynamics might be non-linear, we approximate the transitions on the discretized state space using a linear transition matrix. This is a reasonably good approximation if the time step is sufficiently small. We denote this transition probability matrix as P(πt) for πt ∈ A. P(πt) has dimension |X̄| × |X̄|, where |X̄| is the size of the discretized state space. The probability of the system being in state X̄t+1 at time t + 1, given it was in state X̄t at time t under policy intervention πt ∈ A, is denoted P(X̄t+1 | X̄t, πt). Let Vt(X̄t) denote the optimal value function of the discretized state X̄t ∈ X̄, t ∈ T, for the discretized infectious disease control problem. At optimality, the following must hold:

Vt(X̄t) = min_{πt∈A} [ r(X̄t, πt) + λ Σ_{X̄t+1∈X̄} P(X̄t+1 | X̄t, πt) Vt+1(X̄t+1) ]

4.3.1 State Space Discretization Problem
Given the original system f(Xt, πt) and state space X, we aim to find a discretized state space X̄ and transition matrices P that approximate the original system well, in that they give a similar objective value Vt(X̄t), similar trajectories {X̄t} given {π0, ..., πN−1}, and a small optimality gap. To do this, we need to find a suitable G and a map from f̄(X̄t, πt, G) to P. We focus on approximating the original system by establishing an appropriate discretization approach. An effective discretization method should consist of a small number of discretized states chosen with intervention effects in mind. To do this efficiently, the discretized states should provide higher precision in areas of the state space that are more likely to be visited. This can lead to a better approximation of the true disease dynamics and thus a more accurate MDP solution. Given the function f(Xt, πt), the initial state, the time horizon, and a sequence of policies {π0, ..., πN−1}, we can calculate a trajectory {Xt}.
Subsequently, we require a state discretization G that ensures the discretized trajectory {X̄t} closely approximates {Xt} for various initial states and policies. Therefore, our objective is to minimize the distance between the true trajectory and the trajectory from the discretized model over all samples θ = (X0, {π0, ..., πN−1}) ∈ Θ, all policy intervention scenarios πt, and all time, i.e., minimizing Σ_{θ∈Θ} Σ_{t=1}^{N} ||Xt − X̄t||2 | θ. Given a sequence of policy interventions {π0, ..., πN−1} and an initial state X0, we compute the true trajectory using f(Xt, πt). We use f̄(X̄t, πt, G) to compute the trajectory on the discretized state space X̄. We then map the transition function for discretized states, f̄(X̄t, πt, G), to a transition probability matrix P. Various existing techniques help to construct transition probabilities given the function f̄(X̄t, πt, G); we discuss a generalizable and efficient way of computing transition probabilities from f̄(X̄t, πt, G) given the state discretization in the next section.
4.4 Algorithms
In this section, we present a generalizable framework for discretizing a continuous state space for use in MDP frameworks and for constructing the corresponding transition probability matrices.
4.4.1 Greedy Algorithm for Finding Discretizations (GreedyCut)
The main objective of discretization is to design an effective approach for approximating the disease dynamics with a high level of accuracy, making such problems tractable for conventional discrete-space MDP frameworks. However, it would not be advantageous if the process of finding discretizations itself
We assume there is a budget B that represents the total number of regions/discretizations we can have in realize of computational considerations. We use simulated initial states and policy interventions θ = (X0, {π0, ..., πN−1}) ∈ Θ to find the discretizations. The greedy approach has been widely applied to various optimization tasks, which is easy to implement and effective at finding solutions [26, 155, 162]. We now propose Algorithm 1 (GreedyCut), a greedy-based iterative approach to finding a good discretization. In Algorithm 1, we have three functions. The cost function computes the sum of squared error between the trajectory from the discretized state space {X¯ t} and true trajectory {Xt} from f(Xt , πt). We compute the discretized trajectory {X¯ t} using ¯f(X¯ t , πt , G), where the i-th component X¯ it = P|Gi|−1 j=0 1Gi,j≤f(X¯ t−1,πt−1)i<Gi,j+1 Gi,j+Gi,j+1 2 takes the average value of the region it belongs to after the discretization. The cost function can also be customized (e.g., introduce another penalty term to emphasize certain disease compartments). The cut function halves the i-th region of the d-th component to make two discretized states from one continuous range. For example, if we apply CUT(1, 1, G) where G = {[0, 0.6, 1], [0, 0.2, 1]} is defined the same as Figure 4.1, then the new discretization G′ becomes G′ = {[0, 0.3, 0.6, 1], [0, 0.2, 1]}. The new discretization G′ is shown in Figure 4.2b. After the cut, there are now 6 discretized regions. The greedy function then iteratively computes the cost of cutting one continuous range into two equal discretizations along each component (dimension) and finds the best cut. If each cut has the same costs, a point (Xdt ∈ Xt ) from the sampled trajectory ({Xt}|θ) will be randomly drawn, and the region that this point belongs to (component d of the region i of G such that Gd,i ≤ Xdt ≤ Gd,i+1) will be cut into halves. 
Algorithm 1 Iterative Discretization for Disease Control Problems
1: procedure Cost({Xt}, {X̄t}) ▷ {Xt} is the true trajectory, {X̄t} is the trajectory from the discretization
2:     return Σ_{t=1}^{N} ||X̄t − Xt||₂²
3: procedure Cut(d, i, G) ▷ d is the component we want to cut, and we want to cut interval [i, i + 1] in half
4:     Gd = [0, ..., Gd,i, (Gd,i + Gd,i+1)/2, Gd,i+1, ..., 1]
5:     return G
6: procedure Greedy(B, G, f(Xt, πt), f̄(X̄t, πt, G), Θ) ▷ B is the budget, f(Xt, πt) is the compartmental model dynamics, f̄(X̄t, πt, G) calculates the trajectory using discretized states X̄t, Θ is the pre-generated set of samples of initial states and policies; for each θ ∈ Θ, θ = (X0, {π0, ..., πN−1})
7:     iter_per_sample = B/|Θ|
8:     for θ ∈ Θ do
9:         for iteration = 1 : iter_per_sample do
10:            best_cost = UB
11:            worst_cost = LB
12:            for each component d do
13:                for each discretization i ∈ Gd do
14:                    Compute {Xt} using Xt = f(Xt−1, πt−1)
15:                    Compute {X̄t} using X̄t = f̄(X̄t−1, πt−1, Cut(d, i, G))
16:                    tmp_cost = Cost({Xt}, {X̄t})
17:                    if tmp_cost < best_cost then
18:                        best_cost = tmp_cost
19:                    if tmp_cost > worst_cost then
20:                        worst_cost = tmp_cost
21:            if worst_cost = best_cost then
22:                draw a point Xdt from {Xt}
23:                update G = Cut(d, i, G) such that Gd,i ≤ Xdt ≤ Gd,i+1
24:            else
25:                cut where the cost is minimized
26:    return G

Figure 4.2: Applying Cut(1, 1, G), where G = {[0, 0.6, 1], [0, 0.2, 1]}, gives the new discretization G′ = {[0, 0.3, 0.6, 1], [0, 0.2, 1]} (panel (a) shows the regions under G; panel (b) shows the regions under G′). In the new discretization, X̄t changes because the Euclidean centroid of the region containing Xt has changed. For G, ||Xt − X̄t||2 = 0.11; for G′, ||Xt − X̄t||2 = 0.0925.

When every cut incurs the same cost, we want to cut based on the data obtained through sampling.
In general, it is unlikely that the costs of all cuts will be exactly the same; this can happen at the beginning of the algorithm, when each discretized state encompasses a large range and the approximation does not improve after a single cut. In total, |G| = Σd |Gd| discretizations will be generated.
4.4.1.1 Complexity Analysis
In Algorithm 1, if we assume that computing {Xt} and {X̄t} using f(Xt, πt) and f̄(X̄t, πt, G) requires K and K̄ operations respectively, we can analyze the total number of operations performed by the GreedyCut algorithm. Since the computational costs of generating new discretizations, comparing costs, and computing costs are small relative to computing {Xt} and {X̄t} in our problem, we treat them as negligible. The GreedyCut algorithm enumerates each of the b discretizations at the current iteration, where 1 ≤ b ≤ |G|, and for each candidate cut performs the computations of {Xt} and {X̄t}, with time complexities of K and K̄ operations respectively. The total number of operations for the GreedyCut algorithm can therefore be estimated as the sum over all iterations: Σ_{b=1}^{|G|} b(K + K̄) = (|G|(|G| + 1)/2)(K + K̄). The complexity of the GreedyCut algorithm is thus on the order of O(|G|²(K + K̄)): the time complexity grows quadratically with the number of discretizations |G| and linearly with the number of operations required for each trajectory computation, (K + K̄). As a result, the computational cost of the algorithm grows rapidly as the number of discretizations increases, highlighting the quadratic relationship between the complexity and the desired level of granularity of the discretization.
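To make the greedy loop concrete, the following is a compressed one-compartment sketch of Algorithm 1 in Python (the toy dynamics and names are ours; the random tie-breaking of lines 21-23 is omitted for brevity, with ties broken by first index instead):

```python
def snap(x, Gd):
    """Replace scalar x by the centroid of the interval of Gd containing it."""
    for lo, hi in zip(Gd, Gd[1:]):
        if lo <= x < hi or x == hi == Gd[-1]:
            return (lo + hi) / 2
    return x

def traj_cost(f, G, x0, policy):
    """Cost procedure: squared error between the true trajectory and the
    trajectory computed on the snapped (discretized) states."""
    x, xbar, err = x0, snap(x0, G[0]), 0.0
    for a in policy:
        x, xbar = f(x, a), snap(f(xbar, a), G[0])
        err += (x - xbar) ** 2
    return err

def cut(G, d, i):
    """Cut procedure: halve the interval [G[d][i], G[d][i+1]]."""
    return [G[k][:i + 1] + [(G[k][i] + G[k][i + 1]) / 2] + G[k][i + 1:]
            if k == d else G[k] for k in range(len(G))]

def greedy_cut(f, samples, budget):
    """Greedy procedure: repeatedly apply the single cut that most reduces
    the summed trajectory error over the pre-generated samples."""
    G = [[0.0, 1.0]]                       # one compartment, one region to start
    while sum(len(Gd) - 1 for Gd in G) < budget:
        best = None
        for d in range(len(G)):
            for i in range(len(G[d]) - 1):
                c = sum(traj_cost(f, cut(G, d, i), x0, pol)
                        for x0, pol in samples)
                if best is None or c < best[0]:
                    best = (c, d, i)
        G = cut(G, best[1], best[2])
    return G
```

For example, with toy dynamics `f = lambda x, a: min(1.0, x + 0.1) if a == 0 else 0.9 * x` and a budget of four regions, `greedy_cut` returns a breakpoint vector whose intervals concentrate where the sampled trajectories actually travel.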
4.4.2 Constructing a Corresponding Transition Matrix
Once a suitable discretization of a continuous state space has been constructed, we additionally need a transition matrix between these discretized states to capture the dynamics for use in an MDP framework. To do this, we present Algorithm 2, which draws samples from each discretized state to determine the frequency of transitions to subsequent states via the function f(Xt, πt) and G.

Algorithm 2 Generating a Transition Matrix
1: procedure Generate(f(Xt, πt), G) ▷ f(Xt, πt) is the ground-truth discrete-time model, G is the discretization
2:     Let P be a |X̄| × |X̄| × |A| transition matrix of all zeros, where each state represents a discretized state from the discretized state space X̄ defined by G
3:     for each policy intervention πt ∈ A do
4:         for each discretized state i in X̄ do
5:             Uniformly draw c samples (X̂0) within the region that contains i (including the centroid of this region)
6:             for each sample X̂0 do
7:                 Compute X̂1 = f(X̂0, πt)
8:                 Find the discretized state j whose region contains X̂1
9:                 P(j | i, πt) = P(j | i, πt) + 1
10:    Normalize P to make it a stochastic matrix
11:    return P

In Algorithm 2, we draw c samples within each region of G and count the frequency of transitions from the current region to other regions under policy πt and f(Xt, πt). By sampling and counting the transitions from the original system, we approximate the underlying transition probabilities directly. Creating approximate transition probability matrices in this way offers a practical approach to capturing the essential dynamics of the system and enabling efficient decision-making at the population level.
4.4.2.1 Complexity Analysis
In Algorithm 2, we generate c samples |G||A| times, where |G| denotes the number of discretizations and |A| represents the size of the action space.
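A one-compartment sketch of Algorithm 2's sampling scheme follows (helper names are ours; a fuller version would also include the region centroid among the samples, as line 5 specifies):

```python
import random

def build_transition_matrix(f, G, actions, c=1000, seed=0):
    """Algorithm 2 sketch (one compartment): estimate P(j | i, a) by drawing
    c uniform samples in region i, pushing each through f, counting the
    region each sample lands in, and normalizing rows."""
    rng = random.Random(seed)
    n = len(G) - 1                              # number of discretized regions

    def region(x):
        for k in range(n):
            if G[k] <= x < G[k + 1] or (x == G[-1] and k == n - 1):
                return k
        return n - 1 if x >= G[-1] else 0       # clamp points leaving [G[0], G[-1]]

    P = {a: [[0.0] * n for _ in range(n)] for a in actions}
    for a in actions:
        for i in range(n):
            for _ in range(c):
                x0 = rng.uniform(G[i], G[i + 1])    # uniform draw in region i
                P[a][i][region(f(x0, a))] += 1
            for j in range(n):                      # row-normalize the counts
                P[a][i][j] /= c
    return P
```

With `f = lambda x, a: min(1.0, 1.2 * x) if a == 0 else 0.5 * x` and `G = [0.0, 0.5, 1.0]`, every sample drawn in [0.5, 1) under action 1 maps into [0.25, 0.5), so the estimated row P(· | region 1, a = 1) puts essentially all of its mass on region 0.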
Assuming that computing X̂1 and locating it in the appropriate discretized region takes K̂ operations, the number of operations performed by the algorithm is O(cK̂|G||A|). Furthermore, in most disease control problems, such as COVID-19 mitigation strategies like lockdown, social distancing, and face masks, the size of the action space |A| is typically small. This implies that the algorithm's time complexity is primarily driven by the number of samples c, the number of discretizations |G|, and the K̂ operations needed for computing and locating X̂1.
4.5 Numeric Analysis
In this section, we first showcase our proposed framework by reformulating an SIR model as an infectious disease control MDP for supporting public health decisions around social distancing policy. We then demonstrate the utility of the framework on an example of COVID-19 in Los Angeles County, drawing from empirical case count data from 2020. In both examples, we benchmark the outcome of our method (referred to hereafter as the ‘GreedyCut discretization method’) by comparing our model outcomes to those of a uniform discretization framework, in which the entire state space is discretized uniformly using the same number of discretizations (the same budget) as the GreedyCut discretization method. The transition probabilities for both methods are generated using Algorithm 2. In the second example, we additionally compare our model outcomes to the empirical status-quo policy in Los Angeles in 2020 to demonstrate how much improvement our method can make.
4.5.1 Example 1: A Simple SIR Model
The SIR model tracks the proportions of the population that are susceptible (S), infected (I), and recovered (R) at each time t.
We use a discrete-time model in which the SIR dynamics are described by a system of difference equations [7]:

St+1 = St − βStIt
It+1 = It + βStIt − γIt
Rt+1 = Rt + γIt

The parameter β is the rate at which an infectious individual transmits the disease to a susceptible person; it depends on the average contact rate and the probability of transmission given a discordant contact. Similarly, γ is the recovery rate. Typically, at the beginning of an epidemic, the exact proportion of the population that is infected may be unknown. We use X0 = [XS0, XI0, XR0] = [S0, I0, R0] to denote the initial state at the first decision epoch. We assume that while the exact initial state is unknown, we know upper and lower bounds on each of the compartments. We use [S, S̄], [I, Ī], and [R, R̄] to denote the lower and upper bounds on the initial states S0, I0, and R0, respectively. Suppose that at each time t, the health department can choose to implement a social distancing policy (a "lockdown") until time epoch t + 1 that reduces the transmission rate β. We assume there is a finite number of periods N. The decision maker wishes to minimize the negative health outcomes and the economic and social costs of implementing a lockdown policy. To capture this objective, at each decision epoch we let the cost be r(Xt, a) = It + u(a): the proportion of the population that is infectious (It) plus a time-invariant disutility u(a) that captures the economic and social costs incurred only when the intervention is in effect. Throughout this section, we refer to this discrete-time system as the ground-truth system, and we construct our discretized MDP framework based on it. We assume no discounting in the objective (i.e., λ = 1). The objective of the MDP is therefore to minimize the total cost over the whole time horizon.
4.5.1.1 Inputs
To evaluate this example, we let the transmission rate (β) be 1.4 and the recovery rate (γ) be 0.49.
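With the parameter values just given, the ground-truth dynamics f(Xt, πt) and cost r(Xt, a) can be written directly from the difference equations above (the multiplicative reduction of β under lockdown is an illustrative assumption; its value is not specified here):

```python
BETA, GAMMA = 1.4, 0.49      # weekly transmission and recovery rates (see Inputs)
LOCKDOWN_FACTOR = 0.5        # assumed reduction of beta under lockdown (illustrative)
U = {0: 0.0, 1: 0.03}        # disutility u(a): a = 0 no lockdown, a = 1 lockdown

def sir_step(x, a):
    """One decision interval of the discrete-time SIR dynamics f(Xt, pi_t)."""
    S, I, R = x
    beta = BETA * (LOCKDOWN_FACTOR if a == 1 else 1.0)
    new_infections = beta * S * I
    recoveries = GAMMA * I
    return (S - new_infections, I + new_infections - recoveries, R + recoveries)

def cost(x, a):
    """Per-period cost r(Xt, a) = It + u(a)."""
    return x[1] + U[a]
```

For X0 = (0.95, 0.05, 0), `sir_step(X0, 0)` gives approximately (0.8835, 0.092, 0.0245); the compartments always sum to one, since the difference equations only move mass between them.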
The decision interval (∆t) is one week, and the time horizon (N) is ten weeks. Implementing a lockdown incurs economic and social burdens, but it is unclear how this disutility should be quantified. For simplicity, we assume the disutility is 0.03 if lockdown is implemented and 0 otherwise (u(lockdown) = 0.03, u(no lockdown) = 0). During the early stages of a pandemic, there is typically a large population in the susceptible category, while only a small population is infectious. Therefore, we choose the initial states to be uniformly distributed within the following upper and lower bounds for each compartment: [S, S̄] = [0.7, 0.99], [I, Ī] = [0.01, 0.1], and [R, R̄] = [0, 0.29]. In the GreedyCut discretization method, for each sample (θ) generated, ten iterations are run to generate ten additional discretizations (lines 9-24 in Algorithm 1). To generate samples θ, X0 is drawn uniformly from the region above, and π is a vector of ten random binary variables indicating the policy intervention (0 - no lockdown, 1 - lockdown). In Algorithm 2, we generate c = 1000 samples to compute two transition probability matrices corresponding to the no-lockdown and lockdown policies, respectively.
4.5.1.2 MDP Solutions
We compare MDP solutions between the GreedyCut and the uniform discretization methods on 90, 150, 300, and 1200 discretizations. To evaluate our algorithm's performance, we create 300 samples (we denote the set of all samples as X̌0) by selecting the initial susceptible proportion from 0.7 to 0.99 with a step size of 0.01 and the initial infectious proportion from 0.001 to 0.01 with a step size of 0.001. By enumerating each pair of S and I, we obtain a total of 300 different possible initial states (if S + I > 1, we renormalize each compartment).
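The 300-state evaluation grid described above can be generated as follows (a sketch; the renormalization branch mirrors the rule stated in the text):

```python
def evaluation_grid():
    """The 300 evaluation initial states X0 = (S, I, R): S in 0.70..0.99 by
    0.01, I in 0.001..0.010 by 0.001, with R taking the remaining mass."""
    states = []
    for s_step in range(30):
        for i_step in range(1, 11):
            S = 0.70 + 0.01 * s_step
            I = 0.001 * i_step
            if S + I > 1:                     # renormalize, per the rule above
                S, I = S / (S + I), I / (S + I)
            states.append((S, I, 1.0 - S - I))
    return states
```

`evaluation_grid()` returns exactly 300 states, each summing to one.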
We compute the following metrics for both GreedyCut and uniform discretization methods on 90, 150, 300, and 1200 discretizations: • ACC: accuracy in matching the percentage of optimal actions by comparing discretized MDP with brute force (ground-truth) solution over each state-time pair (a total of 3000 state-time pairs). ACC = 1 − #mismatch 3000 . • MSE: mean squared error between the optimal value of the discretized MDP (V¯ ∗ 0 ) and the brute force solution (V ∗ 0 ) on the first decision epoch over all states. MSE = EX0∈Xˇ 0 [||V¯ ∗ 0 (X0) − V ∗ 0 (X0)||2 ]. • E2: relative mean absolute error on the first decision epoch over all states. E2 = EX0∈Xˇ 0 [ |V¯ ∗ 0 (X0)−V ∗ 0 (X0)| V ∗ 0 (X0) ]. • Opt. Gap: average of the relative difference between the optimal value of brute force solution and the value of running optimal policy from discretized MDP on the true disease model (V˜ 0) on the first decision epoch. Opt. Gap = EX0∈Xˇ 0 [ |V˜0(X0)−V ∗ 0 (X0)| V ∗ 0 (X0) ]. 81 0.7 0.71 0.72 0.73 0.74 0.75 0.76 0.77 0.78 0.79 0.8 0.81 0.82 0.83 0.84 0.85 0.86 0.87 0.88 0.89 0.9 0.91 0.92 0.93 0.94 0.95 0.96 0.97 0.98 0.99 Susceptible Proportion 0.001 0.002 0.003 0.004 0.005 0.006 0.007 0.008 0.009 0.01 Infectious Proportion Week 5 -- GreedyCut Vs Brute Force 0 0 0 0 2 1 1 1 1 1 0 0 0 0 1 1 1 1 1 1 0 0 0 0 1 1 1 1 1 1 0 0 0 0 1 1 1 1 1 1 0 0 0 2 1 1 1 1 1 1 0 0 0 2 1 1 1 1 1 1 0 0 0 1 1 1 1 1 1 1 0 0 0 1 1 1 1 1 1 1 0 0 0 1 1 1 1 1 1 1 0 0 0 1 1 1 1 1 1 1 0 0 0 1 1 1 1 1 1 1 0 0 0 1 1 1 1 1 1 1 0 0 2 1 1 1 1 1 1 1 0 0 2 1 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 0 2 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 0 0.5 1 1.5 2 2.5 3 (a) 0.7 0.71 0.72 0.73 0.74 0.75 0.76 0.77 0.78 0.79 0.8 0.81 0.82 0.83 0.84 0.85 0.86 0.87 0.88 0.89 0.9 0.91 
Figure 4.3: We compare the optimal solution at t = 5 across different states using the GreedyCut and uniform discretized MDPs against the ground-truth optimal solution found using brute force methods. (a): Optimal solution from the GreedyCut discretized MDP compared to the brute force solution; (b): Optimal solution from the uniform discretized MDP compared to the brute force solution. (0 – both models recommend not implementing lockdown; 1 – both models recommend implementing lockdown; 2 – the brute force method recommends not implementing lockdown while the other method recommends lockdown; 3 – the brute force method recommends implementing lockdown while the other method recommends not implementing lockdown.)

The GreedyCut discretized MDP generates solutions with higher accuracy than the uniform discretized MDP. For illustration, we compare the solutions from the GreedyCut and uniform discretized MDPs against the brute force solution in the fifth week (t = 5), comparing optimal actions across 300 states.
The results indicate that the GreedyCut discretized MDP has six mismatches, whereas the uniform discretized MDP has 26 mismatches when compared with the brute force solution. Moreover, the direction of error may be worse with the uniform discretized MDP. In the GreedyCut discretized MDP, all six mismatches are cases where the brute force solution recommends not locking down while the GreedyCut discretized MDP recommends implementing a lockdown. In the uniform discretized MDP, however, all 26 mismatches are cases where the brute force solution recommends a lockdown while the discretized MDP recommends not implementing one. In infectious disease control, failing to implement a lockdown when it is necessary can cause a rapid increase in the proportion of infectious cases. Therefore, error in this direction may be practically worse than error in the converse direction.

We see this illustrated in the optimality gap between the two discretized MDPs (last columns in Table 4.1), which measures the distance between the solutions from the discretized MDPs and the true optimal solution. The GreedyCut MDP achieves a much lower optimality gap for this reason.

|G|      ACC                  MSE                      E2                   Opt. Gap
         GreedyCut  Uniform   GreedyCut   Uniform     GreedyCut  Uniform   GreedyCut  Uniform
90       0.9657     0.8120    4.3239e-04  0.0580      0.0689     0.8846    0.0033     0.0954
150      0.9850     0.8820    1.7896e-04  0.0169      0.0435     0.4946    0.0011     0.0583
300      0.9787     0.8613    6.5032e-05  0.0054      0.0233     0.2487    0.0018     0.0251
1200     0.9880     0.9150    3.1079e-05  4.3529e-04  0.0181     0.0654    0.001      0.0072

Table 4.1: Comparison of MDP solutions

The GreedyCut discretization method outperforms the uniform discretization method across all evaluation metrics, all time periods, and all discretization budgets. Table 4.1 shows the comparison between the GreedyCut and uniform discretization methods.
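The four metrics defined at the start of this section reduce to a few lines of array arithmetic. The following is a minimal sketch (NumPy); the array names are hypothetical placeholders, not objects from the chapter's implementation:

```python
import numpy as np

def evaluate_discretization(policy_mdp, policy_bf, V_mdp, V_bf, V_rollout):
    """Compare a discretized-MDP solution against the brute-force ground truth.

    policy_mdp, policy_bf : optimal actions over all state-time pairs.
    V_mdp, V_bf           : optimal values at the first decision epoch, per state.
    V_rollout             : value of running the MDP policy on the true model.
    """
    acc = 1.0 - np.mean(policy_mdp != policy_bf)                 # ACC
    mse = np.mean((V_mdp - V_bf) ** 2)                           # MSE
    e2 = np.mean(np.abs(V_mdp - V_bf) / np.abs(V_bf))            # E2
    opt_gap = np.mean(np.abs(V_rollout - V_bf) / np.abs(V_bf))   # Opt. Gap
    return acc, mse, e2, opt_gap
```

Each expectation over X_0 ∈ X̌_0 is realized as a mean over the state dimension of the corresponding value arrays.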
The GreedyCut discretization method has higher accuracy (approximately 10% higher) in matching the optimal actions from the brute force (ground-truth) solution over all state-time pairs, and is able to generate accurate recommendations on policy interventions even with a small number of discretizations. Additionally, the GreedyCut discretization method provides a closer approximation of the objective value in both the MSE and E2 metrics across all discretizations. When the number of discretizations is small, the GreedyCut algorithm has an MSE that is under 1% of the MSE generated using the uniform discretization approach. Similarly, under these conditions, the GreedyCut algorithm's E2 remains below 10% of the E2 from the uniform discretization method. Moreover, the GreedyCut algorithm outperforms the uniform discretization method in reducing the optimality gap: the optimality gap ranges from 0.1% to 0.33% across discretization budgets for the GreedyCut algorithm, compared with 0.72% to 9.54% for the uniform discretization method. As expected, with a small number of discretizations, the difference in performance between the GreedyCut and uniform discretization methods is large. The performance gap shrinks as the number of discretizations increases, since uniform discretizations naturally benefit from smaller discretized regions, i.e., higher resolution.

Run Time Outcomes. To understand how much time is needed to construct an MDP using the GreedyCut discretization method and Algorithm 2, we compare the run times of Algorithm 1 and Algorithm 2 with different numbers of discretizations, using Matlab 2022b on a laptop with 16 GB of memory and an Apple M1 Pro chip.
Figure 4.4: Runtime of Algorithm 1 (runtime in seconds versus number of discretizations).

There is an exponential relationship between the number of discretizations and the algorithm runtime (see Figure 4.4), consistent with the time complexity analysis in Section 4.4.1.1. The curvature of the exponential function will depend on the complexity of the disease model f(X_t, π_t); with more compartments or population stratifications, total runtime may be larger for a similar number of discretizations.

|G|     Runtime (hours)
90      0.12
150     0.34
≥300    >1

Table 4.2: Runtime of Algorithm 2

Generating the transition matrices is much more costly than generating the discretizations when the number of discretizations (|G|) is small, for both GreedyCut and uniform discretization. Table 4.2 shows the runtime of generating transitions using Algorithm 2 for the GreedyCut discretization method (the uniform discretization method has the same runtimes, as the same number of iterations is needed). The runtime exceeds one hour with 300 discretizations given c = 1000. Therefore, when |G| is small, the total runtime of constructing an approximate MDP is roughly the time for generating transitions using Algorithm 2. In this case, the time it takes to construct the MDP using the GreedyCut discretization method is close to that of the uniform discretization method; this is because the runtime of Algorithm 1 (which is only needed for GreedyCut, not for uniform discretization) is negligible compared to the runtime of Algorithm 2.

4.5.1.3 General Algorithm Evaluation

In this section, we evaluate the GreedyCut discretization method's performance in generating discretizations that approximate the disease dynamics {X_t}, and Algorithm 2's performance in generating transition probability matrices to produce the discretized trajectories {X̄_t}.
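As background for this evaluation, the core of a sampling-based transition-matrix construction like Algorithm 2 can be sketched in a simplified one-dimensional form. This is an illustrative sketch only: the interval list, toy dynamics `f`, and default sample count are assumptions, with c = 1000 matching the budget mentioned above:

```python
import numpy as np

def estimate_transitions(regions, f, c=1000, rng=None):
    """Monte-Carlo estimate of a transition matrix over discretized regions.

    regions : list of (lo, hi) intervals partitioning a 1-D state space.
    f       : one-step dynamics x_{t+1} = f(x_t), a toy stand-in for f(X_t, pi_t).
    c       : number of samples drawn per region.
    """
    rng = rng or np.random.default_rng(0)
    n = len(regions)
    P = np.zeros((n, n))
    for i, (lo, hi) in enumerate(regions):
        xs = rng.uniform(lo, hi, size=c)   # sample states inside region i
        nxt = f(xs)                        # propagate through the dynamics
        for j, (lo2, hi2) in enumerate(regions):
            P[i, j] = np.mean((nxt >= lo2) & (nxt < hi2))
    return P  # row i approximates the transition distribution out of region i
```

For example, with regions [(0, 0.5), (0.5, 1.0)] and contracting dynamics f(x) = 0.5x, all sampled points land in the first region, so both rows concentrate on column 0.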
We first examine the performance of Algorithm 2 to highlight its capability to generate precise transition probabilities; these probabilities are crucial for describing the discretized trajectory across various discretization settings. Subsequently, we assess the performance of the GreedyCut discretization method by comparing the Markovian trajectories (using transition probabilities from Algorithm 2) between the GreedyCut algorithm and the uniform discretization method.

How Accurate Are the Generated Transition Probabilities? To evaluate the accuracy of the transition probabilities generated by Algorithm 2, we draw samples and compare the computed trajectories with the trajectories {X̄_t} from f̄(X̄_t, π_t, G), which eliminates the influence of the quality of the discretization algorithm. For evaluation, we uniformly draw 1000 samples (θ̂ ∈ Θ̂), each consisting of an initial state within the upper and lower bounds of the proportions in each compartment and a sequence of policy interventions for that initial state. For each evaluation sample θ̂, we compute the discretized trajectory {X̄_t} using f̄(X̄_t, π_t, G). To obtain the Markovian trajectories from the discretized Markov model, we use an initial belief b_0 = e_i, where all entries of b_0 are zero except the i-th entry (corresponding to X_0), which has value one; this indicates that we know the initial state of the discretized Markov model with certainty. We then update the belief b_t = P(π_t) b_{t−1} over time. To compute the expected proportion of people at each time t (the Markovian trajectory at time t, X̃_t), we take the weighted average over the belief vector at time t, i.e., X̃_t = b_t^T X̄. We then compute the cost E_{θ̂ ∈ Θ̂}[ Σ_{t=1}^N ‖X̃_t − X̄_t‖²₂ ] to evaluate how closely Algorithm 2 is able to reproduce reliable transition probability matrices.
Figure 4.5: Comparison between trajectories generated from Algorithm 2 given discretizations and trajectories generated from discretized states. For each compartment S, I, and R, the two trajectories are close to each other.

The trajectories obtained from Algorithm 2 closely align with those generated from discretized states for each compartment. As shown in Figure 4.5, we compared the trajectories obtained from Algorithm 2 using 300 discretizations with the f̄(X̄_t, π_t, G) trajectories generated from the same 300 discretized states. We observed that as the number of discretizations increases, Algorithm 2 is capable of generating transitions that closely resemble the dynamics of the disease, represented by {X̄_t}. In Table 4.3, we show a comparison of the cost E[ Σ_{t=1}^N ‖X̃_t − X̄_t‖²₂ ] for |G| of 90, 150, 300, and 1200 using the GreedyCut discretization method. We observe that the population proportions generated using the Markovian and discretized processes closely align, even over time. With fewer discretizations, each individual discretization covers a larger range, making it more difficult for samples drawn from these discretizations to transition accurately between decision epochs, leading to more error in approximating {X̄_t}. On the other hand, when a larger number of discretizations is employed, each discretization covers a smaller range; by drawing a sufficient number of samples, it becomes possible to provide a more precise description of {X̄_t}. These findings highlight the algorithm's reliability and accuracy in capturing the system's dynamics.
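The belief-propagation procedure described above (b_0 = e_i, beliefs updated through the transition matrices, X̃_t = b_t^T X̄) can be sketched as follows. This sketch assumes row-stochastic transition matrices, so the update is written b_{t−1}P rather than P b_{t−1}; all inputs are illustrative placeholders:

```python
import numpy as np

def markov_trajectory(P_seq, X_bar, i0):
    """Propagate a belief through the discretized Markov model.

    P_seq : list of row-stochastic transition matrices P(pi_t), one per epoch.
    X_bar : (n_states, n_compartments) representative values of each region.
    i0    : index of the known initial discretized state (b_0 = e_{i0}).
    Returns the expected trajectory X~_t = b_t^T X_bar at each epoch.
    """
    b = np.zeros(X_bar.shape[0])
    b[i0] = 1.0                 # initial state known with certainty
    traj = [b @ X_bar]
    for P in P_seq:
        b = b @ P               # belief update over one decision epoch
        traj.append(b @ X_bar)
    return np.array(traj)
```

The evaluation cost above is then a mean of squared differences between this output and the discretized trajectories over the drawn samples.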
|G|     Mean Squared Error Between Discretized Trajectory and Markovian Trajectory [95% uncertainty interval]
90      0.4362 [0.3921, 0.4802]
150     0.1869 [0.1665, 0.2072]
300     0.1370 [0.1191, 0.1548]
1200    0.1212 [0.1075, 0.1349]

Table 4.3: Mean squared error for the trajectories given different numbers of discretizations

How Good Are the Discretizations Generated from the GreedyCut Discretization Method? Next, to evaluate the quality of the discretizations generated from the GreedyCut discretization method, we draw samples and calculate the mean squared error across all samples. This error is measured between the actual trajectory ({X_t}) and the anticipated Markovian trajectory ({X̃_t}), using the transition matrix created in Algorithm 2 from the discretizations generated by Algorithm 1, over the same 1000 samples used to evaluate Algorithm 2. We use the same discretization levels (90, 150, 300, and 1200 discretizations) generated in the previous section for evaluation. To benchmark our model, we also generated uniform discretizations with the same numbers of discretizations. Then, for both discretization methods, transition probability matrices were generated using Algorithm 2.

Figure 4.6: Comparison between trajectories generated from the GreedyCut discretization method against the uniform discretization method (using 300 discretizations in total), given trajectories generated from the SIR model. For each compartment S, I, and R, the GreedyCut discretization method better captures the disease dynamics.

We find that the GreedyCut discretization method is able to better approximate the disease dynamics than the uniform discretization method.
As shown in Figure 4.6, a comparison between the GreedyCut and uniform discretization methods based on 300 discretizations shows that the Markovian trajectories for the GreedyCut discretization method align closely with the actual trajectories. However, the uniform discretization method shows poor approximation, in particular an incorrect trend for the proportion of infectious people over time: the proportion of the infectious population starts to decline after week 8 in the Markovian trajectory, whereas it increases over the entire time horizon in the actual trajectory. Additionally, the uniform discretization method overestimates the proportion of the recovered population by more than a factor of two relative to the actual proportion.

|G|     GreedyCut                 Uniform
90      0.1411 [0.1264, 0.1558]   0.3269 [0.2388, 0.4150]
150     0.1300 [0.1162, 0.1439]   0.2483 [0.1887, 0.3708]
300     0.1218 [0.1087, 0.1350]   0.1969 [0.1729, 0.2210]
1200    0.1205 [0.1071, 0.1339]   0.1797 [0.1617, 0.1977]

Table 4.4: Mean squared error for the trajectories given different numbers of discretizations

For all comparison pairs, the GreedyCut discretization method outperforms the uniform discretization method in the squared error between the Markovian and actual trajectories. In Table 4.4, we show the result of the comparison of E_{θ̂ ∈ Θ̂}[ ‖{X̃_t} − {X_t}‖² ] over 1000 samples and 10 time periods between the GreedyCut and uniform discretization methods. Both algorithms improve their approximations as the number of discretizations increases, as expected. However, the improvement for the GreedyCut discretization method is small compared with that of uniform discretization, which suggests a high budget may not be necessary.
When the number of allowable discretizations is small due to the computational budget, the GreedyCut discretization method can provide a much better approximation than the uniform discretization method, and adding discretizations may not add much accuracy.

4.5.2 Example 2: COVID-19 Example

COVID-19 led to a significant surge in infections within Los Angeles County (LAC). To mitigate the pandemic during its initial phases, LAC implemented a lockdown from the second week to the tenth week following March 1st, 2020, which marked the onset of the epidemic. In this example, we use an MDP with discretizations to identify the optimal timing of imposing lockdowns in LAC to minimize the proportion of infectious cases while considering the cost of a lockdown.

4.5.2.1 Model Structure and Inputs

To describe the disease dynamics of COVID-19 in LAC, we calibrated an SIR model that is stratified by health districts (HD) [109], meaning that the model allows for heterogeneity in health outcomes across HDs. The transmission rates between HDs are also allowed to vary. The disease dynamics for HD i are then described as follows:

S^i_{t+1} = S^i_t − Σ_j β_{ji} S^i_t I^j_t Δt
I^i_{t+1} = I^i_t + Σ_j β_{ji} S^i_t I^j_t Δt − γ I^i_t Δt
R^i_{t+1} = R^i_t + γ I^i_t Δt

We use β_{ij} to represent the transmission rate from HD i to HD j, and all HDs are assumed to have the same clearance rate γ. We consider whether to implement a lockdown policy at each decision epoch, Δt, which has a duration of one week. Decisions need to be made over a total time horizon of 60 weeks (N = 60). If a lockdown is implemented, transmission is reduced (β decreases by 80%). The model is calibrated using LAC COVID-19 data [41] and traffic data from the Archived Transportation Data Management System [8]. We assume there were 1000 infections (0.01% of the total population) at our initial time epoch of analysis.
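A minimal sketch of the HD-stratified SIR update above, with the 80% transmission reduction under lockdown; the array shapes and parameter values are illustrative, not the calibrated LAC inputs:

```python
import numpy as np

def sir_step(S, I, R, beta, gamma, lockdown, dt=1.0, reduction=0.8):
    """One weekly update of the HD-stratified SIR model.

    S, I, R : arrays of per-health-district population proportions.
    beta    : beta[j, i] = transmission rate from HD j to HD i.
    lockdown: if True, all transmission rates are scaled down by 80%.
    """
    b = beta * (1.0 - reduction) if lockdown else beta
    new_inf = S * (b.T @ I) * dt          # sum_j beta_ji * S^i * I^j per HD i
    S_next = S - new_inf
    I_next = I + new_inf - gamma * I * dt
    R_next = R + gamma * I * dt
    return S_next, I_next, R_next
```

Because the infection term leaves S and enters I, and the clearance term leaves I and enters R, each district's proportions sum to one at every step.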
This initial prevalence is consistent with the early stage of the COVID-19 epidemic in LAC, when infectious individuals remained a small proportion of the overall population. To calibrate the parameters of the stratified SIR model, we used empirical COVID-19 case count data to calibrate the transmission rates β and recovery rate γ [41]. During calibration, we assume that the number of infected individuals reported in the data accurately reflects the true number of infections. LAC mobility data is also used to help capture the heterogeneity in transmission rates among HDs [32, 158]. We let the stage cost be the proportion of the population that is infectious plus the dis-utility if a lockdown is implemented. We assume that the dis-utility without a policy intervention is zero. However, determining the dis-utility associated with a lockdown is challenging: if the dis-utility is excessively low, the optimal choice consistently leans towards implementing the lockdown, ignoring the potential economic and social burden brought by the lockdown; on the contrary, if the dis-utility is excessively burdensome, a lockdown will never be enforced. To better reflect this tradeoff, we assume the dis-utility of implementing a one-week lockdown is 0.005, implying it equates to a 0.5% infection rate within the population. We create two discretized MDPs with 150 discretizations, one using the GreedyCut discretization method and one using a uniform discretization method, and compare their outcomes. This guarantees the completion of model construction within an hour for both methods.

4.5.2.2 MDP Results

We compared the optimal actions recommended by the discretized-state MDPs from the GreedyCut and uniform discretization methods. We find that our GreedyCut algorithm outperforms the uniform discretization by identifying a better MDP optimal solution with a lower cumulative proportion of the population infectious.
Figure 4.7 also shows a comparison of disease dynamics across different policies. Compared with no lockdown, the empirical policy in LAC (lockdown from week 2 to week 10) does not prevent infections but rather postpones them: the total cumulative cases (as a proportion of the population) over time is 0.7504 under the empirical policy and 0.7508 if no intervention is used. Both the uniform discretized MDP and the GreedyCut discretized MDP reduce the cumulative number of infections and the peak of infections. Comparing the two MDP solutions, the GreedyCut discretization method outperforms the uniform discretization method, reducing the cumulative proportion of the population infectious by 0.4793 over the 60-week time horizon.

Figure 4.7: Proportion of the population susceptible/infectious over time for different MDPs. The GreedyCut policy outperforms the uniform, empirical, and 'do nothing' policies by achieving the lowest objective value (proportion of the population infectious plus the dis-utility value from implementing lockdowns).

Figure 4.8 compares the objective value among the different models evaluated on the ground-truth disease (compartmental) model (GreedyCut: 0.2452; Uniform: 0.5745; Empirical: 0.7954; Do nothing: 0.7508).

Figure 4.8: Comparison between objective values across policies from the GreedyCut MDP, uniform MDP, empirical policy, and no policy.

The 8-week lockdown policy LAC imposed has a higher objective value than doing nothing after considering the cost of lockdown, as it was not able to reduce infections while
imposing dis-utility values. Both the uniform and GreedyCut MDPs provide better solutions. The GreedyCut MDP improves the objective value relative to doing nothing by 67%, and outperforms the uniform MDP outcome by 57%. This example demonstrates that the GreedyCut algorithm provides a solution with a smaller total cost than the empirical lockdown policy in LAC and the policy generated by the uniform discretized MDP. Even when the number of discretizations is limited for each compartment (for example, when a stratified compartmental model includes death, hospitalizations, exposed, etc.), the GreedyCut discretization method can generate quality solutions with low total discounted costs. Moreover, we found that the uniform discretization method is not able to generate a near-optimal solution, as its optimal value exceeds twice the objective value of the GreedyCut discretized MDP.

4.5.2.3 Extension: MDP with at Most Two Policy Switches

In Section 4.5.2.2, both MDPs recommended a policy with many policy switches, where lockdowns would be imposed for many small durations. For example, the policy from the uniform discretized MDP recommends lockdowns every few weeks in weeks 9-23 (shown in Figure 4.9). This is not practical, as inconsistency in policies can lead to poor adherence or psychological issues [149, 102]. In this section, we consider the same problem with additional constraints where we allow the policy to switch at most twice (once from no lockdown to lockdown, and once from lockdown to no lockdown). We set up the COVID-19 dynamics in the same way as in Section 4.5.2.2.
Figure 4.9: Lockdown policy over time (rows: Empirical, GreedyCut, Uniform, GreedyCut – 2 switch, Uniform – 2 switch). The GreedyCut discretized MDP recommends starting the lockdown on week 7 for 42 weeks. The uniform discretized MDP recommends starting the lockdown on week 8 for 50 weeks.

With at most two policy switches (one lockdown duration), the GreedyCut discretized MDP recommends a shorter lockdown duration and an earlier lockdown initiation date than the uniform discretized MDP. Figure 4.9 shows the lockdown policy outcomes given the disease dynamics of the ground-truth model. The GreedyCut discretized MDP recommends starting the lockdown on week 7 for 42 weeks; the uniform discretized MDP recommends starting the lockdown on week 9 for 50 weeks. Due to the highly transmissible nature of COVID-19, a lockdown of over 40 weeks is needed to reduce transmission. Figure 4.10 compares the trajectories using the policies from the uniform discretized MDP and the GreedyCut discretized MDP. The GreedyCut discretized MDP generates a policy that reduces the cumulative proportion of the population infectious by 0.0864 with fewer weeks of lockdown compared to the uniform discretized MDP.
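Under the at-most-two-switches constraint, the policy space collapses to a single (start, duration) lockdown window, which is small enough to search exhaustively. A sketch follows; the `simulate` callback and the objective (cumulative infectious proportion plus the 0.005 weekly lockdown dis-utility from Section 4.5.2.1) are illustrative stand-ins for the chapter's calibrated model:

```python
import numpy as np

def best_single_window(simulate, N=60, cost_lockdown=0.005):
    """Brute-force search over policies with at most two switches.

    Each candidate policy is one contiguous lockdown window [start, start+dur).
    `simulate(policy)` returns the infectious proportions I_1..I_N under a
    0/1 lockdown vector; the objective adds 0.005 per locked-down week.
    """
    best_policy, best_obj = None, np.inf
    for start in range(N + 1):
        for dur in range(N - start + 1):
            policy = np.zeros(N, dtype=int)
            policy[start:start + dur] = 1
            obj = simulate(policy).sum() + cost_lockdown * dur
            if obj < best_obj:
                best_policy, best_obj = policy, obj
    return best_policy, best_obj
```

With N = 60 this is only (61 × 62)/2 candidates, so exhaustive evaluation on the compartmental model is cheap compared with solving the unconstrained MDP.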
Figure 4.10: Proportion of the population susceptible/infectious over time for different MDPs with policy constraints.

With the additional constraint on policy switches, the GreedyCut discretized MDP consistently generates a better solution than the uniform discretized MDP and the other policies we considered in this analysis. In addition, the GreedyCut discretized MDP recommends a shorter lockdown duration than the uniform discretized MDP, which could reduce the economic and societal burden brought by the lockdown.

4.6 Conclusion

In this chapter, we introduce a novel algorithm for formulating an MDP framework tailored for continuous or large state-space problems in repeated decision-making under uncertainty for infectious disease control. In our numerical analyses, we found that our algorithm provides better MDP solutions than the uniform discretized MDP for the models we evaluated. Our approach better approximates the true value function than the uniform discretized MDP, therefore leading to a better policy with a lower optimality gap. Compared to uniform discretization, our method demonstrates better performance across different discretization budgets, with especially notable benefits when the number of discretizations is small. This may be particularly pertinent for a compartmental model with many compartments, as the resultant number of discretizations for each compartment may be extremely limited. In this case, a uniform discretization method may result in a poor estimation of disease dynamics.
We found that our algorithm provides a better state discretization than the uniform discretization method in approximating disease dynamics for the examples considered. Our approach generates smaller regions for states with a higher likelihood of being visited and increases the range of the regions for those less frequently visited. Using the GreedyCut method substantially improves the approximation quality, thus resulting in an improved decision-making process. Additionally, we offer an effective method to compute the transition probability matrices for formulating MDPs given f(X_t, π_t), which may be a compartmental or simulation-based model. This helps us incorporate other common disease modeling frameworks into the MDP decision-making framework, facilitating the seamless integration of diverse healthcare applications. This approach is not only straightforward to implement but also effectively captures the complex function f(X_t, π_t), which represents the dynamics of the disease, in the transition probability matrices. With a small number of discretizations, the time spent discretizing the state space using our approach is considerably smaller than the time required to produce transition probability matrices. In our examples, the time needed to formulate the discretized MDP using our approach is nearly equivalent to the time necessary to construct a uniform discretized MDP. Therefore, our approach can offer an improvement to the MDP solution without substantially increasing computational expense under limited budgets. We provide numeric examples to demonstrate the efficiency and effectiveness of our algorithm in discretizing continuous-state decision-making problems. Benchmarking against the uniform discretization method, we demonstrate that our algorithm generates preferable discretizations with a limited budget that are a good proxy of the ground-truth system.
We also demonstrate that our algorithm can generate better policies in both a synthetic example and a COVID-19 example. In the synthetic example, our algorithm outperforms the uniform discretization in all metrics for different discretization budgets. In the COVID-19 example, our algorithm reduces the objective by more than half relative to the uniform discretization. Our numerical analysis also generated policy implications for social distancing policy during COVID-19 in LAC. The first policy implication is that the threshold for implementing a lockdown depends on both the proportion susceptible and the proportion infectious (a lockdown is recommended if these proportions are above certain levels). When the proportion of the infectious population increases, the susceptible-population threshold for implementing a lockdown decreases. This is because a smaller susceptible population is needed to spread the disease as the infectious population becomes larger. Similarly, when the proportion of the susceptible population increases, the infectious-population threshold for implementing a lockdown decreases. Secondly, a short lockdown interval may not effectively reduce the total number of cases; instead, it only delays the epidemic peak. To control cases more effectively, a prolonged lockdown period is needed. We must acknowledge several limitations of this work. The GreedyCut algorithm may not find the discretization that globally minimizes the cost function. The performance gap between the GreedyCut and uniform discretization methods is small with a large discretization budget. The GreedyCut algorithm may have computational difficulty if there is a large action space, and it does not consider continuous action spaces; this leaves an interesting optimization direction for future studies.
The output of the GreedyCut algorithm may be sensitive to the choice of cost function; different choices may result in widely different discretizations, thus requiring reevaluation if the objective function is changed. Moreover, the calibration of the COVID-19 disease dynamics is based on the assumption that the COVID-19 data is accurate; nonetheless, there is potential for undercounting due to the severe limitations in testing resources during the early stages of the outbreak. Given that our algorithm depends on actual disease trajectories and assumes knowledge of the true underlying disease dynamics, uncertainties in these dynamics can affect the discretizations of states and consequently change the recommended policy interventions. However, if the uncertainties are minor (for example, if the parameters exhibit only slight variations), they are unlikely to significantly alter the disease trajectory, and our algorithm is expected to outperform uniform discretization. For situations where the underlying disease dynamics remain largely unknown, future work should focus on improving the robustness of our algorithm. Despite these limitations, we believe that this work provides an effective and easy-to-use scheme for dealing with decision-making problems in large or continuous state spaces. This chapter provides insight for future work on improving discretization for solving large-scale MDPs.

Chapter 5 Conclusions

In conclusion, this dissertation has delved into the challenges and opportunities presented by applying the MDP framework to address various healthcare issues, and provides valuable insights and solutions to address challenges in different healthcare applications. In Chapter 2, we construct an MDP framework for the organ transplant problem for ACLF patients.
By formulating this problem as a stopping problem within the MDP framework, we identify optimal strategies for patients, maximizing their health outcomes under varying conditions and decision periods. These findings offer valuable guidance to healthcare providers and policymakers in making informed decisions to improve patient care. In Chapter 3, we quantify the benefits of increasing decision-making frequency in healthcare applications. We construct two equivalent MDPs with different decision epochs. We provide sufficient conditions for a threshold policy (both over states and over time) to hold. Moreover, we shed light on the relationship between the two problems, as well as how the difference in optimal values between the two problems changes over states and over time. In addition, we establish a theoretical result that identifies a sufficient condition for the difference in optimal values between the two problems to be positive. This condition enables us to gain valuable insights into the structure of optimal actions for the more-frequent problem without explicitly solving it; instead, we can leverage the solution obtained from the less-frequent problem to infer the optimal actions for the more-frequent problem. We demonstrate the validity of our theoretical results through organ transplant and treatment initiation problems, which also provide insights to patients and healthcare providers about whether increasing the frequency of decision-making is beneficial. In Chapter 4, we propose a greedy algorithm to deal with challenges from continuous-state problems in the infectious disease control context. We provide an algorithm for defining a non-uniform, discrete state space that is capable of providing higher precision in areas of the state space that are more likely to be visited. This allows us to effectively incorporate infectious disease dynamics within frameworks that are better suited for discrete state spaces, such as MDPs.
Our numerical results demonstrate the benefits of using a greedy approach rather than a uniform approach. In both examples, our algorithm closely approximates disease dynamics and generates MDP solutions that outperform both the status quo and the uniform approach. Overall, this dissertation contributes to the field of healthcare applications by offering practical MDP frameworks, theoretical insights, and efficient algorithms to address real-world challenges. The methodologies developed here have potential applicability in broader contexts beyond healthcare. Additionally, our work provides decision-makers with valuable tools and insights to optimize treatment plans, evaluate policies, and improve healthcare outcomes for individuals and populations. As healthcare continues to evolve, MDPs will play an increasingly important role in guiding decision-making processes, and we hope that our contributions will pave the way for further advancements in this area. For future research, we plan to focus on incorporating parameter uncertainties into the MDP framework, aiming to create a more robust model for solving medical decision-making challenges that inherently involve parameter estimation uncertainties. By drawing on insights from the field of robust optimization, we intend to adapt and refine these strategies specifically for healthcare contexts, thereby uncovering and addressing additional challenges. Moreover, we plan to continue optimizing our methodology for discretizing the continuous-state space and constructing the MDP framework. In particular, our future work will involve investigating algorithms that require a smaller computational budget for creating transition probability matrices based on these discretizations. Another interesting direction is to extend the continuous-state MDP to a continuous-state, continuous-action MDP. This development requires discretizing both the state and action spaces, highlighting the need for a more sophisticated algorithm.
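To illustrate the core idea behind the non-uniform discretization discussed above, here is a minimal sketch: a continuous state in [0, 1] is mapped to a cell index, with cells concentrated where trajectories are expected to spend more time. The breakpoints below are purely illustrative and are not the output of the GreedyCut algorithm.

```python
import bisect

def make_discretizer(breakpoints):
    """Map a continuous state in [0, 1] to the index of its (non-uniform) cell.

    `breakpoints` are interior cut points; denser cuts give finer resolution
    in the regions of the state space more likely to be visited.
    """
    cuts = sorted(breakpoints)
    return lambda x: bisect.bisect_right(cuts, x)

# Uniform grid: 4 equal cells over [0, 1].
uniform = make_discretizer([0.25, 0.5, 0.75])
# Non-uniform grid: extra resolution near low prevalence (illustrative only).
nonuniform = make_discretizer([0.02, 0.05, 0.10, 0.5])

print(uniform(0.03), nonuniform(0.03))  # -> 0 1
```

With the same number of cells, the non-uniform grid distinguishes states near 0 that the uniform grid lumps together, which is exactly the precision gain the chapter exploits when building the discrete-state MDP.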
Appendix A
Appendix for Chapter 2

A.1 Model details

A.1.1 Calculating the Expected One-Year Survival Probability with a Marginal Liver
We use the following equation to calculate the one-year survival probability with a marginal liver. This metric is used in the chapter to assist in providing clinical insight into our model outcomes. The expected one-year survival probability given that a marginal liver is accepted is

E[Marginal] = p_12 × rr,

where p_12 is the post-transplant survival probability after month 12 and rr is the relative risk of survival between a patient with a transplanted marginal versus optimal liver; it is the ratio between the likelihood of post-transplant mortality with a transplanted marginal liver and an optimal liver. We assume rr to be a constant. For example, suppose rr = 0.9; the expected one-year survival probability with a marginal liver is then 0.77 for patients younger than 60 years old.

A.1.2 Additional details regarding the Markov Decision Process Model
A.1.2.1 Model formulation
A Markov decision process is a sequential decision model that incorporates both the stochasticity of health states and time-dependent decision-making in our problem. Such a model consists of a set of decision epochs, health states, available actions, and rewards. In our model, we have seven decision epochs, represented by each of the seven days after the ACLF3 patient becomes eligible for a liver transplant. There are four states: ACLF3 with an optimal liver offered (pre-transplant), ACLF3 with a marginal liver offered (pre-transplant), post-transplant, and death. Two actions are available if a liver is offered: wait and accept. The decision-maker needs to decide whether to accept the offered liver on each day.
Rewards (total one-year survival probability) are associated with each action: if the action is wait, then the patient is expected to receive a certain total one-year survival probability conditional on his or her state in the next decision period. If the action is to accept the liver, then the patient is expected to receive a certain reward for the post-transplant process depending on the quality of the liver. Once the decision-maker has made the decision to accept, the patient terminates the pre-transplant process and enters the post-transplant process. We assume that if an optimal liver is offered, the patient will immediately accept. The question then becomes how long the patient should wait if no optimal liver is offered. We compute the optimal action (wait or accept) at each state and decision epoch using a dynamic programming approach. The mathematical formulation is as follows:

J_k(x_k) = max{ Σ_{x_{k+1}=1}^{4} p_{x_k, x_{k+1}}(wait) J_{k+1}(x_{k+1}), r_a(w, l) },

where J_k(x_k) is the expected total one-year survival probability the patient will receive in state x_k at time k, r_a(w, l) is the reward for patients who choose to accept, which depends on the week of transplantation (w) and the type of liver (l) the patient accepts, and p_{x_k, x_{k+1}}(wait) is the transition probability that the patient goes from state x_k at the kth period to state x_{k+1} at the (k+1)th period given that the patient chooses to wait. When J_k(x_k) = r_a(w, l), the optimal policy is to accept the liver (l) offered.

A.1.2.2 Identifying the MDP Optimal Solution
We solve the above problem using the backward induction algorithm [107]. First, we assume that at the end of the decision period, the patient must accept the transplant to avoid dying. Therefore J_N(x_N) = r_a(w, l).
Then we compute the following using backward induction until we reach the current decision epoch:

J_k(x_k) = max{ Σ_{x_{k+1}=1}^{4} p_{x_k, x_{k+1}}(wait) J_{k+1}(x_{k+1}), r_a(w, l) }
u_k(x_k) = argmax{ Σ_{x_{k+1}=1}^{4} p_{x_k, x_{k+1}}(wait) J_{k+1}(x_{k+1}), r_a(w, l) }.

The set of u_k(x_k) for all x_k and k is called the optimal policy. Based on the structure of our problem, the sequential decisions are monotone: once the optimal action is to accept, the optimal decision cannot revert to wait on any later day through the last decision period, due to the increasing mortality rate and the decreasing benefit from transplant over time.

A.1.2.3 Base Case Solution
Below we provide the solution for patients less than 60 years old with a 60% likelihood of getting an optimal organ and a 90% relative risk of post-transplant survival if they get a marginal liver instead of an optimal liver:

Day             1       2       3       4       5       6       7
Optimal liver   Accept  Accept  Accept  Accept  Accept  Accept  Accept
Marginal liver  Wait    Accept  Accept  Accept  Accept  Accept  Accept
Table A.1: Baseline result

The last day the patient should wait for the optimal liver (the decision boundary) is then day 1. Note that when the decision boundary is day 0, the patient should accept the offered liver as soon as he or she is eligible for liver transplant, regardless of the type of liver offered.

A.2 Supplementary figures
Figure A.1: (a): Expected one-year survival probability with a marginal liver by relative risk for two age groups, given a probability of receiving an optimal organ of 60%. (b): Expected one-year survival probability with a marginal liver by relative risk for two organ failure groups, given a probability of receiving an optimal organ of 60%. The expected one-year survival probability with a marginal liver gives the proportion of post-transplant patients with a transplanted marginal liver who will survive up to one year. This value depends on the relative risk (rr).
There is a positive linear relationship between the expected one-year survival probability with a marginal liver and the relative risk.
Figure A.2: Non-transplant survival probabilities, according to recipient age and number of organ failures present at listing
Figure A.3: Post-transplant survival probabilities, using an optimal or marginal donor organ
Figure A.4: Model schematic. Diagram of ACLF3 patient flow while waiting for transplant. The left panel shows the flowchart of daily patient outcomes. The pre-transplant health state changes are modeled by the Markov model in the top red box; the post-transplant health states are also described by a Markov model (lower red box). Greek letters are probabilities and are given in Table 2.1 and Table A.2.
Figure A.5: Two-way sensitivity analysis after removal of patients with suspected chronic kidney disease
Figure A.6: Two-way sensitivity analyses after removal of patients transplanted before year 2014
Figure A.7: Two-way sensitivity analyses, accounting for center variation regarding probability of receiving an optimal organ offer and expected 1-year post-LT survival using a marginal quality organ, for hepatic and extrahepatic ACLF3 patients

A.3 Supplementary tables

Organ failure | UNOS database variables
Liver         | Total bilirubin ≥ 12 mg/dL
Renal         | Insufficiency: creatinine 1.5-1.9 mg/dL; Failure: creatinine ≥ 2.0 mg/dL or renal replacement therapy
Coagulation   | INR ≥ 2.5
Brain         | Grade 3-4 encephalopathy based on West-Haven criteria
Circulatory   | Requirement of vasopressors
Respiratory   | Requirement of mechanical ventilation
ACLF-1        | Single renal failure; renal insufficiency with non-renal organ failure; or grade 1-2 encephalopathy (West-Haven criteria) with non-renal organ failure
ACLF-2        | Two organ failures
ACLF-3        | Three or more organ failures
Table A.2: Criteria to determine presence of organ dysfunction/failure

                                       Day 1   Day 2   Day 3   Day 4   Day 5   Day 6   Day 7
3 OF at listing    Proceed with LT     83.1%   79.5%   74.9%   70.0%   65.4%   60.5%   56.7%
                   Defer LT for 1 Day  79.5%   74.9%   70.0%   65.4%   60.5%   56.7%   52.1%
4-6 OF at listing  Proceed with LT     76.1%   71.0%   64.6%   58.9%   53.7%   47.8%   44.1%
                   Defer LT for 1 Day  71.0%   64.6%   58.9%   53.7%   47.8%   44.1%   39.4%
Age ≤60 years      Proceed with LT     82.7%   78.4%   73.2%   67.8%   63.3%   58.2%   54.5%
                   Defer LT for 1 Day  78.4%   73.2%   67.8%   63.3%   58.2%   54.5%   49.9%
Age >60 years      Proceed with LT     74.0%   70.0%   64.5%   59.5%   53.6%   47.7%   43.8%
                   Defer LT for 1 Day  70.0%   64.5%   59.5%   53.6%   47.7%   43.8%   39.0%
Hepatic            Proceed with LT     85.9%   81.8%   76.9%   71.6%   66.7%   60.7%   56.1%
                   Defer LT for 1 Day  81.8%   76.9%   71.6%   66.7%   60.7%   56.1%   49.4%
Extrahepatic       Proceed with LT     79.0%   74.9%   69.4%   64.1%   59.3%   54.1%   50.6%
                   Defer LT for 1 Day  74.9%   69.4%   64.1%   59.3%   54.1%   50.6%   46.4%
Table A.3: Overall 1-year survival probability based on the decision to transplant on a specific day after listing or defer LT for one day

ACLF-2 patients' non-transplant survival probabilities (1-γt), by age and day (1-7):
  >60 years old:  Day 1: 0.9839, Day 2: 0.9646, Day 3: 0.9226, Day 4: 0.8916, Day 5: 0.8623, Day 6: 0.8245, Day 7: 0.7950
  ≤60 years old:  Day 1: 0.9890, Day 2: 0.9714, Day 3: 0.9439, Day 4: 0.9108, Day 5: 0.8804, Day 6: 0.8511, Day 7: 0.8179
ACLF-2 patients' post-transplant survival probabilities (1-µ12), transplanted in the first week:
  Optimal liver:  >60 years old, 12 months post-transplant: 0.8372; ≤60 years old, 12 months post-transplant: 0.9040
  Marginal liver: >60 years old, 12 months post-transplant: 0.8302; ≤60 years old, 12 months post-transplant: 0.9529
Likelihood of ACLF3 patients improving to ACLF2, by age and day (1-7):
  >60 years old:  Day 1: 0.0040, Day 2: 0.0184, Day 3: 0.0300, Day 4: 0.0399, Day 5: 0.0522, Day 6: 0.0676, Day 7: 0.0903
  ≤60 years old:  Day 1: 0.0040, Day 2: 0.0182, Day 3: 0.0353, Day 4: 0.0483, Day 5: 0.0609, Day 6: 0.0712, Day 7: 0.0829
Table A.4: Pre- and post-transplant survival probabilities for ACLF2 patients, and likelihood of ACLF-3 patients improving to ACLF2

ACLF-3 patients' non-transplant survival probabilities (1-γt), by age and day (1-7):
  >60 years old:  Day 1: 0.9523, Day 2: 0.8955, Day 3: 0.8338, Day 4: 0.7662, Day 5: 0.7065, Day 6: 0.6527, Day 7: 0.5749
  ≤60 years old:  Day 1: 0.9470, Day 2: 0.8784, Day 3: 0.8119, Day 4: 0.7499, Day 5: 0.6848, Day 6: 0.6395, Day 7: 0.5864
ACLF-3 patients' post-transplant survival probabilities (1-µ12), transplanted in the first week:
  Optimal liver:  with hepatic, 12 months post-transplant: 0.8950; with extrahepatic, 12 months post-transplant: 0.8234
  Marginal liver: with hepatic, 12 months post-transplant: 0.8441; with extrahepatic, 12 months post-transplant: 0.7482
Table A.5: Pre- and post-transplant survival probabilities for hepatic and extrahepatic ACLF-3 patients

Post-transplant length of hospital stay, median (IQR) days, by day of transplantation:
            Age ≤60                                  Age >60
LT Day  Optimal organ      Marginal organ        Optimal organ      Marginal organ
1       20.0 (11-45)       11.0 (9-13)           18.0 (12-32)       17.0 (10-28)
2       21.5 (14-41)       29.0 (20-57)          16.0 (11-29)       16.0 (10-31)
3       18.0 (12-36)       26.0 (20-44)          18.0 (11-29)       16.0 (12-28)
4       18.0 (12-35)       15.0 (10-35)          18.0 (11-34)       22.0 (11-31)
5       24.0 (13-37)       18.0 (12-40)          17.0 (10-29.5)     21.5 (14-54)
6       14.0 (9-27)        17.0 (8-78)           18.0 (12-32)       19.0 (10-26)
7       22.0 (18-29)       30.0 (25-32)          22.0 (14-35)       15.0 (12-27)
            3 OF                                 4-6 OF
LT Day  Optimal organ      Marginal organ        Optimal organ      Marginal organ
1       16.0 (11-26)       13.0 (11-25)          22.0 (9-48)        12.0 (8-44)
2       16.0 (10-29)       16.0 (10.5-33.5)      25.0 (13-39)       26.0 (15-38)
3       16.5 (11-28)       16.0 (12-28)          22.0 (15-39)       25.0 (16-36)
4       16.0 (10.5-31)     15.0 (8-30)           24.0 (12-50)       27.0 (19-55)
5       16.0 (10-28)       18.5 (12-36)          26.0 (17-40)       23.0 (17-60)
6       16.0 (11-26)       16.0 (10-25)          22.0 (16-42)       22.0 (12.5-40)
7       21.5 (14.5-32.5)   16.0 (12-27)          25.5 (16-39)       17.0 (12-28)
Table A.6: Post-transplant length of hospital stay based on day of transplantation

Appendix B
Appendix for Chapter 3

B.1 Proofs
The first two proofs concern the threshold policy over time. The process closely follows [36], with minor changes in assumptions.
Lemma 1. Recall the optimality condition in Section 3 for the more-frequent problem:

v_t(s) = C_k + max[ r_t(s, a), r_t(s, w) + λ p_t(s|w) v⃗_{t+1} ]   if s ∉ {wait states, post-decision, death},
v_t(s) = C_k + r_t(s, w) + λ p_t(s|w) v⃗_{t+1}                     if s ∈ {wait states},
v_t(s) = 0                                                         if s ∈ {post-decision, death}.

Similarly, for the less-frequent problem:

V_t̃(s) = max[ R_t̃(s, a), R_t̃(s, w) + λ^k P_t̃(s|w) V⃗_{t̃⁺} ]   if s ∉ {wait states, post-decision, death},
V_t̃(s) = R_t̃(s, w) + λ^k P_t̃(s|w) V⃗_{t̃⁺}                      if s ∈ {wait states},
V_t̃(s) = 0                                                        if s ∈ {post-decision, death}.

Now consider a modified system such that the decision-maker can choose accept in wait states with a small lump-sum reward value of ϵ such that ϵ < min{λ v_t(s), 0} for all t ∈ T, s ∈ S. The modified formulation generates the same optimal actions and optimal values as the original formulation.

Proof. It is sufficient to show that wait will be the optimal action in all wait states in the modified formulation. WLOG, we show the proof for the less-frequent problem; the proof for the more-frequent problem follows in a similar way. For any wait state s at time N: r_N(s, w) − r_N(s, a) = C_k + r_N(s, w) − C_k − ϵ ≥ 0, so wait is the optimal action. Induction step: we assume wait is the optimal action for all wait states at time t > m. Then for any wait state s at time m,

C_k + r_m(s, w) + λ p_m(s|w) v⃗_{m+1} − C_k − r_m(s, a) > r_m(s, w) + λ p_m(s|w) v⃗_{m+1} − C_k − λ p_m(s|w) v⃗_{m+1} ≥ 0.

Therefore wait is the optimal action in all wait states at time m. This completes the proof.

Lemma 2. If Assumption 4 is true, then v_t(s) and V_t̃(s) are non-increasing in t ∈ T and t̃ ∈ T̃ for all s ∈ S.

Proof. We show the proof for v_t(s) using induction. The proof for V_t̃(s) follows similarly.
Base case: Consider the case where the optimal action for any state s is wait at time N and wait at time N − 1:

v_{N−1}(s) − v_N(s) = C_k + r_{N−1}(s, w) + λ p_{N−1}(s|w) v⃗_N − C_k − r_N(s, w) ≥ 0.

Consider the case where the optimal action for any state s is wait at time N and accept at time N − 1:

v_{N−1}(s) − v_N(s) = C_k + r_{N−1}(s, a) − C_k − r_N(s, w) ≥ r_{N−1}(s, w) − r_N(s, w) ≥ 0.

We can similarly show v_{N−1}(s) − v_N(s) ≥ 0 for the remaining two cases, when the optimal action is to accept at N and wait at N − 1, or to accept at both N and N − 1.

Induction step: We assume v_t(s) is non-increasing in t for any s from m + 1 to N. Then consider the case where the optimal action for any state s is wait at time m and wait at time m + 1:

v_m(s) − v_{m+1}(s) = C_k + r_m(s, w) + λ p_m(s|w) v⃗_{m+1} − C_k − r_{m+1}(s, w) − λ p_{m+1}(s|w) v⃗_{m+2}
  ≥ r_m(s, a) − r_{m+1}(s, a)   [Assumption 4]
  ≥ 0   [Assumption 1]

Now consider the case where the optimal action for any state s is accept at time m and wait at time m + 1:

v_m(s) − v_{m+1}(s) = C_k + r_m(s, a) − C_k − r_{m+1}(s, w) − λ p_{m+1}(s|w) v⃗_{m+2}
  ≥ r_m(s, w) + λ p_m(s|w) v⃗_{m+1} − r_{m+1}(s, w) − λ p_{m+1}(s|w) v⃗_{m+2}   [accept is preferred at time m]
  ≥ r_m(s, a) − r_{m+1}(s, a)   [Asm 4]
  ≥ 0   [Asm 1]

We can similarly show v_m(s) − v_{m+1}(s) ≥ 0 for the remaining two cases, when the optimal action is to accept at m + 1 and wait at m, or to accept at both m and m + 1.

Lemma 3. If Assumption 4 holds, then there exists a threshold policy over time for all states s.

Proof. First, we know that the optimal value is non-increasing over time, which follows from Assumption 4 given Lemma 1. WLOG, we show the proof for the more-frequent problem; the proof for the less-frequent problem follows similarly. Let N denote the last epoch at which the optimal action is wait. We use a proof by induction. For the base case:

C_k + r_{N−1}(s, w) + λ p_{N−1}(s|w) v⃗_N
  ≥ C_k + r_{N−1}(s, a) − r_N(s, a) + r_N(s, w) + λ p_N(s|w) v⃗_{N+1}   [Asm 4, Lemma 1]
  ≥ C_k + r_{N−1}(s, a)   [wait is preferred at time N]

This holds for all time periods prior to N − 1.
For the inductive step, we assume the optimal action is to wait from m to N. Then:

C_k + r_{m−1}(s, w) + λ p_{m−1}(s|w) v⃗_m
  ≥ C_k + r_{m−1}(s, a) − r_m(s, a) + r_m(s, w) + λ p_m(s|w) v⃗_{m+1}   [Asm 4, Lemma 1]
  ≥ C_k + r_{m−1}(s, a)   [wait is preferred at time m]

Therefore there exists a monotone policy in time which is non-decreasing.

Proposition 1. v_t̃(s) − V_t̃(s) ≥ Σ_{i=0}^{N−t̃−(N−t̃)/k} λ^i C_k for all t̃ ∈ T̃, s ∈ S̃.

Proof. It suffices to show that the value of the more-frequent problem is at least the value of the less-frequent problem plus the right-hand side of the statement above when both follow the same policy. When A_t̃(s) = accept,

v_t̃(s) − V_t̃(s) = r_t̃(s, a) + C_k − R_t̃(s, a) = C_k ≥ Σ_{i=0}^{N−t̃−(N−t̃)/k} λ^i C_k.

When A_t̃(s) = wait, we use induction. Base case:

v_{N−k}(s) = C_k + r_{N−k}(s, w) + λ p_{N−k}(s|w)(C_k + r⃗_{N−k+1}(w) + λ p_{N−k+1}(w)(C_k + r⃗_{N−k+2}(w) + ... + λ p_{N−k+k−1}(w) v⃗_N) ...)
  ≥ Σ_{i=0}^{k−2} λ^i C_k + r_{N−k}(s, w) + λ p_{N−k}(s|w)(r⃗_{N−k+1}(w) + λ p_{N−k+1}(w)(r⃗_{N−k+2}(w) + ... + λ p_{N−k+k−1}(w) V⃗_N) ...)   [since V_N(s) = v_N(s) for all s ∈ S̃]
  ≥ Σ_{i=0}^{k−1} λ^i C_k + V_{N−k}(s)

Inductive step:

v_t̃(s) = C_k + r_t̃(s, w) + λ p_t̃(s|w)(C_k + r⃗_{t̃+1}(w) + λ p_{t̃+1}(w)(C_k + r⃗_{t̃+2}(w) + ... + λ p_{t̃+k−1}(w) v⃗_{t̃+k}) ...)
  ≥ Σ_{i=0}^{k−2} λ^i C_k + λ^{k−1} Σ_{i=0}^{N−t̃−k−(N−t̃−k)/k} λ^i C_k + r_t̃(s, w) + λ p_t̃(s|w)(r⃗_{t̃+1}(w) + λ p_{t̃+1}(w)(r⃗_{t̃+2}(w) + ... + λ p_{t̃+k−1}(w) V⃗_{t̃+k}) ...)
  ≥ Σ_{i=0}^{k−2} λ^i C_k + Σ_{i=k−1}^{N−t̃−(N−t̃)/k} λ^i C_k + r_t̃(s, w) + λ p_t̃(s|w)(r⃗_{t̃+1}(w) + λ p_{t̃+1}(w)(r⃗_{t̃+2}(w) + ... + λ p_{t̃+k−1}(w) V⃗_{t̃+k}) ...)
  = Σ_{i=0}^{N−t̃−(N−t̃)/k} λ^i C_k + V_t̃(s)

Therefore, compared with the optimal policy generated by the less-frequent problem, the more-frequent problem either follows the same policy or finds another policy with a higher objective value. Hence v_t̃(s) − V_t̃(s) ≥ Σ_{i=0}^{N−t̃−(N−t̃)/k} λ^i C_k for all t̃ ∈ T̃, s ∈ S̃.

Proposition 2. When accept is the optimal action in the more-frequent problem for a non-wait state s and epoch t̃ ∈ T̃, if C_k ≤ v_t̃(s) − V_t̃(s), then accept is also the optimal action for the less-frequent problem for state s at time t̃.

Proof.
Let t̃ be any epoch where we see a*_t̃(s) = accept (assume t̃ is not the last decision epoch). Assume the more-frequent problem is k times as frequent as the less-frequent problem. Then

v_t̃(s) = C_k + r_t̃(s, a) ≤ v_t̃(s) − V_t̃(s) + r_t̃(s, a).

Since r_t̃(s, a) = R_t̃(s, a), this leads to V_t̃(s) ≤ R_t̃(s, a). Therefore we also have A*_t̃(s) = accept.

Theorem 1. When both problems have different optimal actions, if Assumption 4 and Assumption 5 hold, then the difference in the optimal value, D_t̃(s), s ∈ S̃, is non-increasing in time for all t̃ ∈ T̃ when the optimal action for the more-frequent problem is to wait. Otherwise, D_t̃(s) is non-decreasing in time for all t̃ ∈ T̃.

Proof. We know that the optimal value is non-increasing over time, as this follows from Lemma 1. Note that p_{s′s,t}(w) u = 0 for all s ∉ S̃ for any t and value u, given Assumption 5. Let B̄(s) be the first period in which both problems have different optimal decisions for s ∈ S̃, and let B(s) be the last epoch in which the two problems have different optimal actions for s ∈ S̃. Thus, at B(s) + k, both problems have the same optimal action. When the optimal action for the more-frequent problem is wait, we wish to show D_t̃(s) is non-increasing in t̃ ∈ [B̄(s), B(s)]. By definition,

D_{B(s)}(s) = v_{B(s)}(s, w) − V_{B(s)}(s, a)
  = C_k + r_{B(s)}(s, w) + λ p_{B(s)}(s|w) v⃗_{B(s)+1} − V_{B(s)}(s, a)
  = C_k + r_{B(s)}(s, w) + λ p_{B(s)}(s|w) v⃗_{B(s)+1} − r_{B(s)}(s, a)

Also, we have

D_{B(s)−k}(s) = C_k + r_{B(s)−k}(s, w) + λ p_{B(s)−k}(s|w) v⃗_{B(s)−k+1} − V_{B(s)−k}(s, a)
  = C_k + r_{B(s)−k}(s, w) + λ p_{B(s)−k}(s|w) v⃗_{B(s)−k+1} − r_{B(s)−k}(s, a)
  ≥ C_k + r_{B(s)}(s, w) + λ p_{B(s)}(s|w) v⃗_{B(s)+1} − r_{B(s)}(s, a)   [Asm 4, Lemma 1]
  = D_{B(s)}(s)

We use the same logic to show this is true for all t̃ ∈ [B̄(s), B(s)]. Next, we wish to show D_t̃(s) is non-decreasing in t̃ ∈ [B̄(s), B(s)] when the optimal action for the more-frequent problem is accept. By definition,

D_{B(s)}(s) = v_{B(s)}(s, a) − V_{B(s)}(s, w)
  = C_k + r_{B(s)}(s, a) − r_{B(s)}(s, w) − λ p_{B(s)}(s|w)(r⃗_{B(s)+1}(w) + ... + λ p_{B(s)+k−1}(w) V⃗_{B(s)+k} ...)

Also, we have

D_{B(s)−k}(s) = C_k + r_{B(s)−k}(s, a) − r_{B(s)−k}(s, w) − λ p_{B(s)−k}(s|w)(r⃗_{B(s)−k+1}(w) + ... + λ p_{B(s)−1}(w) V⃗_{B(s)} ...)
  ≤ C_k + r_{B(s)}(s, a) − r_{B(s)}(s, w) − λ p_{B(s)}(s|w)(r⃗_{B(s)+1}(w) + ... + λ p_{B(s)+k−1}(w) V⃗_{B(s)} ...)   [Asm 4, non-increasing rewards]
  ≤ C_k + r_{B(s)}(s, a) − r_{B(s)}(s, w) − λ p_{B(s)}(s|w)(r⃗_{B(s)+1}(w) + ... + λ p_{B(s)+k−1}(w) V⃗_{B(s)+k} ...)   [Lemma 1]
  = D_{B(s)}(s)

This completes the proof.

Theorem 2. When both problems have different optimal actions, if Assumption 3 and Assumption 5 hold, then the difference in the optimal value, D_t̃(s), s ∈ S̃, is non-increasing in state for all s ∈ S̃ when the optimal action for the more-frequent problem is to wait. Otherwise, D_t̃(s) is non-decreasing in state for all s ∈ S̃.

Proof. Note that p_{s′s,t}(w) u = 0 for all s ∉ S̃ for any t and value u, given Assumption 5. Let Ȳ(t̃) be the first state in which both problems have different optimal decisions for t̃ ∈ T̃, and let Y(t̃) be the last state in which the two problems have different optimal actions for t̃ ∈ T̃. Thus, at Y(t̃) + 1, both problems have the same optimal action. When the optimal action for the more-frequent problem is wait, we wish to show D_t̃(s) is non-increasing in s ∈ [Ȳ(t̃), Y(t̃)] ⊆ S̃. By definition,

D_t(Y(t̃)) = v_t(Y(t̃), w) − V_t(Y(t̃), a)
  = C_k + r_t(Y(t̃), w) + λ p_t(Y(t̃)|w) v⃗_{t+1} − V_t(Y(t̃), a)
  = C_k + r_t(Y(t̃), w) + λ p_t(Y(t̃)|w) v⃗_{t+1} − r_t(Y(t̃), a)

Also, we have

D_t(Y(t̃) − 1) = v_t(Y(t̃) − 1, w) − V_t(Y(t̃) − 1, a)
  = C_k + r_t(Y(t̃) − 1, w) + λ p_t(Y(t̃) − 1|w) v⃗_{t+1} − r_t(Y(t̃) − 1, a)
  ≥ C_k + r_t(Y(t̃), w) + λ p_t(Y(t̃)|w) v⃗_{t+1} − r_t(Y(t̃), a)   [Asm 3]
  = D_t(Y(t̃))

We use the same logic to show this is true for all s ∈ [Ȳ(t̃), Y(t̃)] ⊆ S̃. Next, we wish to show D_t̃(s) is non-decreasing in s ∈ [Ȳ(t̃), Y(t̃)] ⊆ S̃ when the optimal action for the more-frequent problem is accept. By definition,

D_t(Y(t̃)) = v_t(Y(t̃), a) − V_t(Y(t̃), w)
  = C_k + r_t(Y(t̃), a) − r_t(Y(t̃), w) − λ p_t(Y(t̃)|w)(r⃗_{t+1}(w) + ... + λ p_{t+k−1}(w) V⃗_{t+k} ...)

Also, we have

D_t(Y(t̃) − 1) = C_k + r_t(Y(t̃) − 1, a) − r_t(Y(t̃) − 1, w) − λ p_t(Y(t̃) − 1|w)(r⃗_{t+1}(w) + ... + λ p_{t+k−1}(w) V⃗_{t+k} ...)
  ≤ C_k + r_t(Y(t̃), a) − r_t(Y(t̃), w) − λ p_t(Y(t̃)|w)(r⃗_{t+1}(w) + ... + λ p_{t+k−1}(w) V⃗_{t+k} ...)   [Asm 3]
  = D_t(Y(t̃))

This completes the proof.

Theorem 3. Let B(s) ∈ T̃ be the last epoch where the optimal policy for both problems is wait for non-wait state s ∈ S̃. If the following conditions hold, then D_t̃(s), s ∈ S̃, is non-decreasing over time and state for t̃ ∈ [0, B⁺(s)) ∩ T̃, s ∈ S̃, where B⁺(s) = B(s) + k and B⁻(s) = B(s) − k:
• A threshold policy over states exists,
• The optimal value is non-increasing over time,
• A threshold policy over time exists,
• D_{B⁺(s)} is non-decreasing over non-wait states s,
• p_{B⁻(s)}(i|w) D⃗_{B(s)} ≤ p_{B(s)}(i|w) D⃗_{B⁺(s)}, i ∈ S̃,
• Assumption 5 holds.
Note that the sixth condition ensures that the optimal policies for both problems at time B⁺(s) cannot both be accept.

Proof. Note that p_{s′s,t}(w) u = 0 for all s ∉ S̃ for any t and value u, given Assumption 5. B⁺(s) is the first period in which the less-frequent problem has a different optimal action than the more-frequent problem in state s within S̃. We first want to show that D_t̃(s) is non-decreasing over states s for t̃ ∈ [0, B⁺(s)). Then we will use this fact to prove that D_t̃(s) is non-decreasing over time for all t̃ ∈ [0, B⁺(s)). First, let ϕ_t = {s | a*_t(s) = wait, s ∈ S̃, t ∈ T}. Then, because threshold policies over non-wait states and over time exist, |ϕ_t| is non-increasing over time for t ∈ [0, B⁺(s)). Note that we assume WLOG that the states are ordered so that the set ϕ_t contains the consecutive numbers from 1 to m_t, where m_t = |ϕ_t|. When k = 2, let M⃗(B(s)) be the vector with components

λ p_{B(s)+1}(1|w) D⃗_{B⁺(s)},
...,
λ p_{B(s)+1}(m_{B(s)+1}|w) D⃗_{B⁺(s)},
r_{B(s)+1}(m_{B(s)+1}+1, a) − r_{B(s)+1}(m_{B(s)+1}+1, w) − λ p_{B(s)+1}(m_{B(s)+1}+1|w) V⃗_{B⁺(s)},
...,
r_{B(s)+1}(|S|, a) − r_{B(s)+1}(|S|, w) − λ p_{B(s)+1}(|S| | w) V⃗_{B⁺(s)},

where m_{B(s)+1} = |ϕ_{B(s)+1}|.
Since D_{B⁺(s)} is non-decreasing by assumption and p has the IFR property over non-wait states, it follows that λ p_{B(s)+1}(s|w) D⃗_{B⁺(s)} is non-decreasing in s. Also, we have

r_{B(s)+1}(m_{B(s)+1}+1, a) − r_{B(s)+1}(m_{B(s)+1}+1, w) − λ p_{B(s)+1}(m_{B(s)+1}+1|w) V⃗_{B⁺(s)} ≥ λ p_{B(s)+1}(m_{B(s)+1}+1|w) D⃗_{B⁺(s)}.

This is because the difference between the two problems is larger when the more-frequent problem chooses the optimal action. Moreover, for any non-wait states s and s̄ where s̄ > s and s > m_{B(s)+1}:

r_{B(s)+1}(s, a) − r_{B(s)+1}(s, w) − λ p_{B(s)+1}(s|w) V⃗_{B⁺(s)} − r_{B(s)+1}(s̄, a) + r_{B(s)+1}(s̄, w) + λ p_{B(s)+1}(s̄|w) V⃗_{B⁺(s)}
  ≤ λ p_{B(s)+1}(s̄|w) V⃗_{B⁺(s)} − λ p_{B(s)+1}(s|w) V⃗_{B⁺(s)}
  ≤ 0

It follows that M⃗(B(s)) is a non-decreasing vector over non-wait states. Now we have:

D_{B(s)}(s) = v_{B(s)}(s) − V_{B(s)}(s)
  = C_k + r_{B(s)}(s, w) + λ p_{B(s)}(s|w) v⃗_{B(s)+1} − r_{B(s)}(s, w) − λ p_{B(s)}(s|w)(r⃗_{B(s)+1}(w) + λ p_{B(s)+1}(s|w) v⃗_{B(s)+2})
  = C_k + λ p_{B(s)}(s|w) M⃗(B(s))

Then, by the fact that p has the IFR property over non-wait states at each time period, D_{B(s)} is non-decreasing over S̃. Similarly, we use the same logic to show for all t̃ ∈ [0, B⁺(s)) that D_t̃ is a non-decreasing vector over S̃. We continue the proof with k = 2. Now we use induction to show that D_t̃(s) is non-decreasing over time. For the base case, let m_{B(s)−k+1} = |ϕ_{B(s)−k+1}| and B⁻(s) = B(s) − k. Then:

D_{B⁻(s)}(s) = C_k + r_{B⁻(s)}(s, w) + λ p_{B⁻(s)}(s|w) v⃗_{B(s)−1} − r_{B⁻(s)}(s, w) − λ p_{B⁻(s)}(s|w)(r⃗_{B(s)−1} + λ p_{B(s)−1}(s|w) V⃗_{B(s)})
  = C_k + λ p_{B⁻(s)}(s|w) [ λ p_{B(s)−1}(1|w) D⃗_{B(s)}; ...; λ p_{B(s)−1}(m_{B(s)−k+1}|w) D⃗_{B(s)}; r_{B(s)−1}(m_{B(s)−k+1}+1, a) − r_{B(s)−1}(m_{B(s)−k+1}+1, w) − λ p_{B(s)−1}(m_{B(s)−k+1}+1|w) V⃗_{B(s)}; ...; r_{B(s)−1}(|S|, a) − r_{B(s)−1}(|S|, w) − λ p_{B(s)−1}(|S| | w) V⃗_{B(s)} ]
  ≤ C_k + λ p_{B(s)}(s|w) [ λ p_{B(s)+1}(1|w) D⃗_{B⁺(s)}; ...; λ p_{B(s)+1}(m_{B(s)−k+1}|w) D⃗_{B⁺(s)}; r_{B(s)+1}(m_{B(s)−k+1}+1, a) − r_{B(s)+1}(m_{B(s)−k+1}+1, w) − λ p_{B(s)+1}(m_{B(s)−k+1}+1|w) V⃗_{B⁺(s)}; ...; r_{B(s)+1}(|S|, a) − r_{B(s)+1}(|S|, w) − λ p_{B(s)+1}(|S| | w) V⃗_{B⁺(s)} ]   [by Asm 4, Lemma 1]
  ≤ C_k + λ p_{B(s)}(s|w) [ λ p_{B(s)+1}(1|w) D⃗_{B⁺(s)}; ...; λ p_{B(s)+1}(m_{B(s)+1}|w) D⃗_{B⁺(s)}; r_{B(s)+1}(m_{B(s)+1}+1, a) − r_{B(s)+1}(m_{B(s)+1}+1, w) − λ p_{B(s)+1}(m_{B(s)+1}+1|w) V⃗_{B⁺(s)}; ...; r_{B(s)+1}(|S|, a) − r_{B(s)+1}(|S|, w) − λ p_{B(s)+1}(|S| | w) V⃗_{B⁺(s)} ]   [since m_{B(s)−k+1} ≥ m_{B(s)+1}]
  = D_{B(s)}(s)

For the inductive step, we assume D_t̃(s) ≤ D_{t̃⁺}(s), and we want to show that D_{t̃⁻}(s) ≤ D_t̃(s):

D_{t̃⁻}(s) = C_k + r_{t̃⁻}(s, w) + λ p_{t̃⁻}(s|w) v⃗_{t̃⁻+1} − r_{t̃⁻}(s, w) − λ p_{t̃⁻}(s|w)(r⃗_{t̃⁻+1} + λ p_{t̃⁻+1}(s|w) V⃗_{t̃})
  = C_k + λ p_{t̃⁻}(s|w) [ λ p_{t̃⁻+1}(1|w) D⃗_{t̃}; ...; λ p_{t̃⁻+1}(m_{t̃−k+1}|w) D⃗_{t̃}; r_{t̃⁻+1}(m_{t̃−k+1}+1, a) − r_{t̃⁻+1}(m_{t̃−k+1}+1, w) − λ p_{t̃⁻+1}(m_{t̃−k+1}+1|w) V⃗_{t̃}; ...; r_{t̃⁻+1}(|S|, a) − r_{t̃⁻+1}(|S|, w) − λ p_{t̃⁻+1}(|S| | w) V⃗_{t̃} ]
  ≤ C_k + λ p_t̃(s|w) [ λ p_{t̃+1}(1|w) D⃗_{t̃⁺}; ...; λ p_{t̃+1}(m_{t̃−k+1}|w) D⃗_{t̃⁺}; r_{t̃+1}(m_{t̃−k+1}+1, a) − r_{t̃+1}(m_{t̃−k+1}+1, w) − λ p_{t̃+1}(m_{t̃−k+1}+1|w) V⃗_{t̃⁺}; ...; r_{t̃+1}(|S|, a) − r_{t̃+1}(|S|, w) − λ p_{t̃+1}(|S| | w) V⃗_{t̃⁺} ]   [by Asm 4, Lemma 1, and the induction hypothesis]
  ≤ C_k + λ p_t̃(s|w) [ λ p_{t̃+1}(1|w) D⃗_{t̃⁺}; ...; λ p_{t̃+1}(m_{t̃+1}|w) D⃗_{t̃⁺}; r_{t̃+1}(m_{t̃+1}+1, a) − r_{t̃+1}(m_{t̃+1}+1, w) − λ p_{t̃+1}(m_{t̃+1}+1|w) V⃗_{t̃⁺}; ...; r_{t̃+1}(|S|, a) − r_{t̃+1}(|S|, w) − λ p_{t̃+1}(|S| | w) V⃗_{t̃⁺} ]   [since m_{t̃+1} ≤ m_{t̃−k+1}]
  = D_t̃(s)

This shows that when k = 2, D_t̃(s) is non-decreasing over time for t̃ ∈ [0, B⁺(s)) ∩ T̃. We next turn to the case k > 2. We use induction to show that M⃗(t̃) is non-decreasing over non-wait states, where M⃗(t̃) is the vector with components

λ p_{t̃+1}(1|w)(v⃗_{t̃+2} − (r⃗_{t̃+2}(w) + ... + λ p_{t̃+k−1}(1|w) V⃗_{t̃⁺})),
...,
λ p_{t̃+1}(m_t̃|w)(v⃗_{t̃+2} − (r⃗_{t̃+2}(w) + ... + λ p_{t̃+k−1}(m_t̃|w) V⃗_{t̃⁺})),
r_{t̃+1}(m_t̃+1, a) − (r_{t̃+1}(m_t̃+1, w) + λ p_{t̃+1}(m_t̃+1|w)(r⃗_{t̃+2}(w) + ... + λ p_{t̃+k−1}(m_t̃+1|w) V⃗_{t̃⁺})),
...,
r_{t̃+1}(|S|, a) − (r_{t̃+1}(|S|, w) + λ p_{t̃+1}(|S| | w)(r⃗_{t̃+2}(w) + ... + λ p_{t̃+k−1}(|S| | w) V⃗_{t̃⁺})).

We have shown that the elements of M⃗(t̃) are non-decreasing for s > m_t̃. For s ≤ m_t̃, the elements of the vector are also non-decreasing by the fact that (r⃗_{t̃+2}(w) + ... + λ p_{t̃+k−1}(s|w) V⃗_{t̃⁺}) ...) is non-decreasing over non-wait states from the inductive hypothesis and that p has the IFR property over non-wait states. Therefore, using the previous rationale, D_t̃(s) is non-decreasing over S̃. The rest of the proof then follows identically and shows that D_t̃(s) is non-decreasing over time for t̃ ∈ [0, B⁺(s)) ∩ T̃.

Theorem 4. Let ψ_t̃(s) = max{ r_t̃(s, a), r_t̃(s, w) + λ p_t̃(s) r⃗_{t̃+1}(a) }. For s ∈ S̃, t̃ ∈ T̃, if C_k + λ C_k + r_t̃(s, w) + λ p_t̃(s|w) ψ⃗_t̃ ≥ V_t̃(s), then D_t̃(s) ≥ 0.

Proof.

D_t̃(s) = v_t̃(s) − V_t̃(s) ≥ C_k + λ C_k + r_t̃(s, w) + λ p_t̃(s|w) ψ⃗_t̃(s) − V_t̃(s) ≥ V_t̃(s) − V_t̃(s) = 0

B.2 Organ Transplantation Decisions Among ACLF Patients
B.2.0.1 Model Inputs
We use data and values from the medical literature to parameterize both MDPs. A table of all parameter values is provided in the Appendix. We consider eleven states, described in Appendix Table B.1. When the patient is alive and waiting for a transplant, the decision-maker can either wait or accept an offered liver within the first twenty-eight days of being on the waitlist (N = 28), before the patient must accept a liver. We treat the post-transplant state, death, and the states in which patients receive an optimal liver as absorbing, and we treat the states where no organ is offered as wait states. We use the typical 3% annual discount rate for both problems. It is common in the health literature to assume a constant discount rate regardless of the change in cycle length [37], whether for problems with daily [4], monthly [39], annual [36], or longer intervals [58] of decision-making. We use United Network for Organ Sharing (UNOS) data to construct the daily mortality and post-transplant survival probabilities for patients with ACLF2 and ACLF3. The UNOS dataset is widely used; it includes data on all transplants since 1987 and covers over 250 transplant hospitals [141, 6, 5]. To construct our transition probability matrix and rewards, we select patients from this dataset who have two to six organ failures and require liver transplantation within a month for our UNOS analysis cohort.
We calculate p_t̃(w) based on UNOS data and use the eigen-decomposition method [37] to calculate P_t̃(w). Note that the likelihood of a liver being optimal is identical in both problems. This allows us to have the same accumulated daily transition probabilities for both problems and thus makes the two problems equivalent in terms of the daily probability of receiving an optimal liver and the daily mortality probability. We use the likelihood of receiving an organ (ω), the conditional likelihood of receiving an optimal liver given a liver offer (o), the death probability (γ), and the probability of improving from a worse health state to a better health state (ξ) to parameterize the transition matrix p_t̃(w) (matrix shown in Appendix Section B.2.0.3). We use UNOS data to find the values for γ and ξ. We assume o is 70% for the baseline likelihood of receiving an optimal liver and ω = {90%, 95%, 100%} for ACLF2, ACLF=3OF, and ACLF>3OF patients, respectively. o is uncertain and may vary by transplant center, so we vary it in the sensitivity analysis. We use the time-invariant relative risk (rr) of survival to calculate the post-transplant survival probabilities for a marginal organ. This relative risk is given by the ratio between the likelihood of post-transplant mortality with a transplanted marginal liver and an optimal liver. Note that rr is less than one, as a marginal liver should provide only equal or lesser survival benefits compared to an optimal liver. Using UNOS data, we found that the value varies between age groups and ACLF types. After discussion with our clinical expert, we assume rr is 0.8 for the baseline model but perform sensitivity analyses around this parameter to determine the robustness of our results. C_k reflects the costs of transferring to another hospital or the costs associated with the multiple-listing process. Both multiple-listing and transferring are ways to shorten the wait time.
However, multiple-listing usually involves completing additional evaluations for the new hospital and coordinating with the insurance provider. While patients are not required to move to a different residence to transfer to another transplant center, most transplant centers consider local patients first, so moving close to the new transplant center may increase the chance of receiving an offer sooner. There may be other inconveniences or risks associated with transferring as well. For instance, a new transplant center might require the transfer request in a specific format, and patients risk losing all their accrued waiting time if they end their current listing before the new transplant center accepts their transfer request [138]. There might also be an inactive period during the transition from one center to the other, depending on the transplant center [138]. There may also be costs associated with maintaining the transfer or multiple listing, including transportation costs (which can be as high as $6,000 per hour [146]), additional evaluations, and potential additional in-hospital charges (if the new transplant center charges more than the old one) [40]. All of these costs are captured by the per-period cost Ck.

To provide values for our numerical example, we use a per-patient, per-period cost (Ck) of $2,000 [146], loosely estimated from general long-distance medical transportation costs. We assume each patient can spend up to 28 days on the waitlist. Costs can therefore range from thousands to tens of thousands of dollars per day per patient. However, given the uncertainty and regional variation in this value, we also provide results for Ck = $10,000 in the Appendix, and we sweep over Ck values to identify the maximum Ck for which the more-frequent problem would be used.

To evaluate QALYs for our health states, we adopt the annual 0.4 QALY weight for ACLF2 and ACLF3 transplant patients [152] for both wait and non-wait states.
This is equivalent to receiving 0.4/365 QALYs per day, which is the reward we use for the more-frequent problem. For each full epoch the patient survives in the less-frequent problem, the patient receives the expected discounted reward for a full day of life with ACLF3 ($\vec{r}_{\tilde{t}}(w) + \lambda p_{\tilde{t}}\vec{r}_{\tilde{t}+1}(w) = \vec{R}_{\tilde{t}}(w)$). Therefore, the reward for waiting is non-increasing in time for both problems.

We similarly calculate rewards for the post-transplant state, which include the benefits of living with a transplanted liver. We do so by considering the lifetime post-transplant rewards. We assume that the one-year post-transplant monthly QALY weights are the same for those with either an optimal or a marginal liver, the only difference being mortality risk, and use values from the literature [108]. Patients who survive the first post-transplant year have lifetime discounted QALYs similar to those of other people their age [126], so we use this fact to calculate lifetime QALYs. The expected lifetime rewards are calculated by multiplying the monthly post-transplant survival probabilities for each liver type by the appropriate QALY weight and WTP threshold (T) and then adding the remaining discounted expected lifetime QALYs (L). Therefore
$$R_{\tilde{t}}(\text{marginal}, a) = T \times \sum_{i=1}^{12} \mu_i(\tilde{t}) \times rr \times Q_i + \mu_{12}(\tilde{t}) \times rr \times L,$$
where T = $50,000 per QALY gained [61], $\mu_i(\tilde{t})$ is the post-transplant survival probability at month $i$ with an optimal liver transplanted at time $\tilde{t}$, and $Q_i$ is the QALY value for post-transplant patients at month $i$. We assume this reward is the same for both problems.
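As a concrete illustration of this reward calculation, a minimal sketch of $R_{\tilde{t}}(\text{marginal}, a)$; the monthly survival vector mu, the QALY increments Q, and the lifetime term L below are placeholders for exposition only (the dissertation's values come from UNOS data and [108]):

```python
# Sketch of the marginal-liver post-transplant reward
#   R(marginal, a) = T * sum_{i=1}^{12} mu_i * rr * Q_i  +  mu_12 * rr * L.
# mu, Q, and L are illustrative placeholders, not the fitted values.
T = 50_000                                   # WTP per QALY gained [61]
rr = 0.8                                     # baseline relative risk, marginal vs. optimal
L = 10.0                                     # placeholder remaining lifetime QALYs

mu = [0.95 - 0.005 * i for i in range(12)]   # placeholder monthly survival, optimal liver
Q = [0.05] * 12                              # placeholder monthly QALY increments

reward = T * sum(m * rr * q for m, q in zip(mu, Q)) + mu[-1] * rr * L
```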
B.2.0.2 State Definition for Liver Transplantation Example

See Appendix Table B.1.

Table B.1: State definition for liver transplantation example

Health state   Number of organ failures   Organ received
1              2                          Optimal
2              3                          Optimal
3              > 3                        Optimal
4              2                          Marginal
5              3                          Marginal
6              > 3                        Marginal
7              2                          No offer
8              3                          No offer
9              > 3                        No offer
10             N/A (post-transplant)      N/A (post-transplant)
11             N/A (death)                N/A (death)

B.2.0.3 Transition Probability Matrix

For the more-frequent problem, we use the transition matrix $p_t(\text{wait})$, where o denotes the likelihood of receiving an optimal liver conditional on the patient being alive and receiving an offer; $\gamma_t$, $\gamma'_t$, $\gamma''_t$ denote the mortality probabilities at time $t$ for ACLF3 patients with 3 organ failures, ACLF2 patients, and ACLF3 patients with more than 3 organ failures, respectively; and $\omega_1$, $\omega_2$, $\omega_3$ denote the probabilities of receiving an offer for ACLF2, ACLF=3OF, and ACLF>3OF patients. We assume mortality probabilities are the same for odd $t$. For the less-frequent problem, we use $P_{\tilde{t}}(\text{wait})$, where O denotes the likelihood of receiving an optimal liver conditional on the patient being alive and receiving an offer; $\Gamma_{\tilde{t}}$, $\Gamma'_{\tilde{t}}$, $\Gamma''_{\tilde{t}}$ denote the mortality probabilities at time $\tilde{t}$ (Day $\frac{\tilde{t}+1}{2}$); and $\Omega_1$, $\Omega_2$, $\Omega_3$ denote the probabilities of receiving an offer for ACLF2, ACLF=3OF, and ACLF>3OF patients. Also, we use $\Xi_t$ and $\Xi'_t$ to denote the probability of improving from ACLF3 with 3 organ failures to ACLF2 and the probability of improving from ACLF3 with more than 3 organ failures to ACLF3 with 3 organ failures, in the more-frequent problem and in the less-frequent problem, respectively.
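Each wait-action row of the matrix displayed below is a product of offer, improvement, and survival probabilities, so the rows can be assembled (and sanity-checked) programmatically. A minimal sketch for an ACLF=3OF state, with illustrative placeholder values rather than the fitted UNOS parameters, and with columns ordered as in the displayed matrix (states grouped by health level):

```python
# Sketch of one wait-action row (an ACLF=3OF state), built from the offer
# probability (Omega2), the optimal-given-offer probability (O), mortality
# (Gamma), and improvement (Xi). All values are illustrative placeholders.
Omega2, O, Gamma, Xi = 0.95, 0.7, 0.05, 0.05
surv = 1 - Gamma

row = [
    Omega2 * Xi * O * surv,              # improve to ACLF2, optimal liver offered
    Omega2 * Xi * (1 - O) * surv,        # improve to ACLF2, marginal liver offered
    (1 - Omega2) * Xi * surv,            # improve to ACLF2, no offer
    Omega2 * (1 - Xi) * O * surv,        # stay ACLF=3OF, optimal liver offered
    Omega2 * (1 - Xi) * (1 - O) * surv,  # stay ACLF=3OF, marginal liver offered
    (1 - Omega2) * (1 - Xi) * surv,      # stay ACLF=3OF, no offer
    0.0, 0.0, 0.0,                       # no worsening to >3 organ failures while waiting
    0.0,                                 # post-transplant is reached only by accepting
    Gamma,                               # death
]
assert abs(sum(row) - 1.0) < 1e-12       # each row must be a probability distribution
```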
$\tilde{P}_{\tilde{t}}(\text{wait})$ is an 11 × 11 matrix with three distinct wait-state row types. Rows 1–3 (ACLF2 states) are

$$\big(\Omega_1 O(1-\Gamma'_{\tilde{t}}),\; \Omega_1(1-O)(1-\Gamma'_{\tilde{t}}),\; (1-\Omega_1)(1-\Gamma'_{\tilde{t}}),\; 0,\; 0,\; 0,\; 0,\; 0,\; 0,\; 0,\; \Gamma'_{\tilde{t}}\big);$$

rows 4–6 (ACLF=3OF states) are

$$\big(\Omega_2\Xi_{\tilde{t}} O(1-\Gamma_{\tilde{t}}),\; \Omega_2\Xi_{\tilde{t}}(1-O)(1-\Gamma_{\tilde{t}}),\; (1-\Omega_2)\Xi_{\tilde{t}}(1-\Gamma_{\tilde{t}}),\; \Omega_2(1-\Xi_{\tilde{t}})O(1-\Gamma_{\tilde{t}}),\; \Omega_2(1-\Xi_{\tilde{t}})(1-O)(1-\Gamma_{\tilde{t}}),\; (1-\Omega_2)(1-\Xi_{\tilde{t}})(1-\Gamma_{\tilde{t}}),\; 0,\; 0,\; 0,\; 0,\; \Gamma_{\tilde{t}}\big);$$

rows 7–9 (ACLF>3OF states) are

$$\big(0,\; 0,\; 0,\; \Omega_3\Xi'_{\tilde{t}} O(1-\Gamma''_{\tilde{t}}),\; \Omega_3\Xi'_{\tilde{t}}(1-O)(1-\Gamma''_{\tilde{t}}),\; (1-\Omega_3)\Xi'_{\tilde{t}}(1-\Gamma''_{\tilde{t}}),\; \Omega_3(1-\Xi'_{\tilde{t}})O(1-\Gamma''_{\tilde{t}}),\; \Omega_3(1-\Xi'_{\tilde{t}})(1-O)(1-\Gamma''_{\tilde{t}}),\; (1-\Omega_3)(1-\Xi'_{\tilde{t}})(1-\Gamma''_{\tilde{t}}),\; 0,\; \Gamma''_{\tilde{t}}\big);$$

and rows 10 (post-transplant) and 11 (death) are absorbing, $(0, \ldots, 0, 1, 0)$ and $(0, \ldots, 0, 1)$, respectively.

B.2.1 Parameters Used in the Liver Transplant Example

See Appendix Table B.2.

B.2.2 Decision boundaries for ACLF3 patients with Ck = 2,000

See Appendix Table B.3.

B.2.3 Decision boundaries for ACLF3 patients with Ck = 10,000

Appendix Table B.4 shows the maximum number of days the model recommends waiting for an optimal liver, as well as the maximum benefit provided by the more-frequent problem compared to the less-frequent problem (day and value of peak D), with Ck = $10,000. Compared with Appendix Table B.3, the value of peak D decreases as Ck increases from $2,000 to $10,000. All other observations remain unchanged compared to the main manuscript.

B.2.3.1 Variation in k

Our framework also allows us to consider situations where the offer frequency is more than doubled; k can be any integer. We therefore also perform this numerical analysis with the same per-period cost Ck = $2,000 but with k = 4 as an example (four times as many offers instead of two). We observe that all of our theoretical conclusions are confirmed by the numerical results.
The optimal policy for k = 4 shows longer maximum durations for waiting for an optimal liver for ACLF2 (10 days), ACLF=3OF (4 days), and ACLF>3OF patients (4 days). The peak difference in the optimal value between the more-frequent and less-frequent problems ranges from $83,021 to $98,169; this peak difference is almost doubled with k = 4 compared with k = 2. Moreover, we find that four-times-as-frequent offers would not be net beneficial at any time if Ck is greater than $41,029 for ACLF2 patients, $35,832 for ACLF=3OF patients, or $41,683 for ACLF>3OF patients. In a nationwide analysis, this translates to roughly $182M when considering the number of ACLF2 and ACLF3 patients in the US in 2019 [141].

B.3 Treatment Initiation for Chronic Kidney Disease

B.3.0.1 Model Structure and Inputs

Using the approach outlined in Section 3.4, we investigate two equivalent Markov Decision Processes (MDPs) that vary in decision-making frequency. Our analysis focuses on the following states: CKD1, CKD2, CKD3+, being on treatment, and death, with the last two being terminal states. The time horizon ranges from patient age 45 to 65. For both MDPs, the available actions are either to wait or to initiate ARB/ACE treatment. To derive the transition probability matrices and rewards, we use a previously calibrated microsimulation (a replication of a CKD simulation from prior work [66]) that simulates a cohort of 10,000 patients as they progress through CKD stages [154]. Using the microsimulation outputs, we obtain the yearly transition probabilities between stages. To compute the transition probability matrices for the more-frequent MDP, we utilize the eigen-decomposition approach from [36]. As in the organ transplant model, we use the net monetary benefit (NMB) to quantify rewards.
The function is computed by multiplying QALYs by the WTP threshold of $50,000 per QALY gained and then subtracting costs, where QALYs and costs at each stage and age are given by the microsimulation outputs. The cost of monitoring eGFR through a blood test is approximately $20 [94], so we assume Ck = 20.

B.3.0.2 Results

In both the more- and less-frequent problems, the optimal policy recommends treatment initiation at age 50 if the patient is in stage 1, at age 49 if in stage 2, and immediately (age 45) if in higher stages. Threshold policies over state and time exist in both problems. This suggests that early initiation of treatment may be beneficial for early-stage CKD patients.

In this example, the optimal policies derived from both MDPs are identical, indicating that the decision-making frequency does not significantly impact the recommended course of action. Furthermore, the differences in optimal values ($D_t(s)$) between the two MDPs fall below zero for all states and times. This suggests that a higher-frequency decision-making framework may not yield benefits here. This may be because the progression of chronic diseases like CKD tends to be slow, so additional decision-making opportunities may not contribute significantly to the overall value gained, even when additional monitoring costs would be low.

The results depicted in Figure B.1 illustrate that the difference in optimal value ($D_t$) for stage 1 and stage 2 remains non-decreasing across time and states, corroborating the findings of Theorem 3. We also verified that the propositions and Theorem 4 hold in this numerical analysis.

Figure B.1: Difference in the expected reward earned over time between the more-frequent (M) and less-frequent (L) problems for CKD stage 1 and CKD stage 2.
Triangles mark the difference in optimal value when the optimal action in both the more-frequent and the less-frequent problems is wait; points are marked ‘+’ when the optimal action in both problems is accept.

Pre-transplant survival probabilities, by day (Source: UNOS):
  Day   ACLF2    ACLF=3   ACLF>3
  1     0.989    0.9575   0.9336
  2     0.9714   0.9024   0.8501
  3     0.9439   0.8433   0.7745
  4     0.9108   0.7868   0.7064
  5     0.8804   0.7283   0.6289
  6     0.8511   0.6823   0.5806
  7     0.8179   0.6274   0.5183
  8     0.7965   0.5821   0.4636
  9     0.7733   0.5405   0.4273
  10    0.756    0.5058   0.3978
  11    0.7369   0.471    0.3705
  12    0.7165   0.4462   0.3491
  13    0.6873   0.4174   0.33
  14    0.6623   0.3767   0.3011
  15    0.6492   0.3602   0.2718
  16    0.6291   0.3422   0.2422
  17    0.6083   0.3288   0.2284
  18    0.5918   0.2981   0.2209
  19    0.5772   0.2823   0.2082
  20    0.5619   0.2656   0.1955
  21    0.5491   0.2503   0.1865
  22    0.5381   0.2351   0.1761
  23    0.5278   0.2278   0.1644
  24    0.5194   0.2205   0.1591
  25    0.5103   0.2132   0.1539
  26    0.498    0.1994   0.15
  27    0.4895   0.1977   0.1434
  28    0.4764   0.1936   0.1408

Post-transplant survival probabilities, optimal liver (Source: UNOS):
  Months post-transplant   ACLF2    ACLF=3   ACLF>3
  3                        0.9375   0.9284   0.8805
  6                        0.925    0.8987   0.8276
  12                       0.904    0.8654   0.7922

Relative risk of post-transplant mortality between transplantees with a marginal versus an optimal liver (rr): 0.9 (Assumed)
Daily probability of getting an optimal liver (O): 0.6 (Assumed)
ACLF3 patients pre-transplant QALYs: 0.4 (Wells et al. (2004) [152])
ACLF3 patients post-transplant QALYs, by month (Ratcliffe et al. (2005) [108]): 3 months, 0.576; 6 months, 0.601; 12 months, 0.606
Daily probability of improving to a better state ($\Xi_t$) (Source: UNOS): day 1, 0.004; day 8, 0.0829; day 15, 0.1587; day 22, 0.2047; day 28, 0.2427

Table B.2: Parameters

                                  Less-freq.   More-freq.   Value of     Time at
                                  problem      problem      peak D ($)   peak D (Day)
o = 50%, rr = 0.7:
  Two organ failures              4            6            76,120       4
  Three organ failures            2            4            66,573       2
  More than three organ failures  2            2            66,796       2
o = 50%, rr = 0.8:
  Two organ failures              2            4            48,389       2
  Three organ failures            2            2            48,577       2
  More than three organ failures  0            2            39,109       2
o = 50%, rr = 0.9:
  Two organ failures              2            2            21,625       2
  Three organ failures            0            1            1,501        0
  More than three organ failures  0            0            -2,000       0
o = 60%, rr = 0.7:
  Two organ failures              4            7            80,162       4
  Three organ failures            2            4            71,279       2
  More than three organ failures  2            2            69,648       2
o = 60%, rr = 0.8:
  Two organ failures              2            4            56,200       4
  Three organ failures            2            2            50,959       2
  More than three organ failures  2            2            54,084       2
o = 60%, rr = 0.9:
  Two organ failures              2            2            22,359       2
  Three organ failures            0            2            11,046       2
  More than three organ failures  0            0            -2,000       0
o = 70%, rr = 0.7:
  Two organ failures              4            8            79,937       6
  Three organ failures            2            4            68,923       2
  More than three organ failures  2            4            68,817       2
o = 70%, rr = 0.8 (base case):
  Two organ failures              4            6            59,203       4
  Three organ failures            2            2            50,730       2
  More than three organ failures  2            2            54,974       2
o = 70%, rr = 0.9:
  Two organ failures              2            4            22,282       2
  Three organ failures            0            2            19,470       2
  More than three organ failures  0            2            945          2

Table B.3: Decision Boundaries (the maximum number of epochs for an optimal liver), Value of Peak D, and Time at Peak D Under Different Conditions

                                  Less-freq.   More-freq.   Value of     Time at
                                  problem      problem      peak D ($)   peak D (Day)
o = 50%, rr = 0.7:
  Two organ failures              4            6            53,751       4
  Three organ failures            2            3            44,834       2
  More than three organ failures  2            2            47,826       2
o = 50%, rr = 0.8:
  Two organ failures              2            4            24,670       2
  Three organ failures            2            2            28,872       2
  More than three organ failures  0            2            20,154       2
o = 50%, rr = 0.9:
  Two organ failures              2            2            962          2
  Three organ failures            0            0            -10,000      2
  More than three organ failures  0            0            -10,000      0
o = 60%, rr = 0.7:
  Two organ failures              4            6            59,369       4
  Three organ failures            2            4            50,868       2
  More than three organ failures  2            2            51,379       2
o = 60%, rr = 0.8:
  Two organ failures              2            4            37,056       4
  Three organ failures            2            2            31,991       2
  More than three organ failures  2            2            35,827       2
o = 60%, rr = 0.9:
  Two organ failures              2            2            2,473        2
  Three organ failures            0            1            -7,442       0
  More than three organ failures  0            0            -10,000      0
o = 70%, rr = 0.7:
  Two organ failures              4            8            61,679       6
  Three organ failures            2            4            49,900       2
  More than three organ failures  2            2            51,027       2
o = 70%, rr = 0.8 (base case):
  Two organ failures              4            5            39,889       4
  Three organ failures            2            2            32,495       2
  More than three organ failures  2            2            37,413       2
o = 70%, rr = 0.9:
  Two organ failures              2            2            2,680        2
  Three organ failures            0            2            1,242        2
  More than three organ failures  0            0            -10,000      0

Table B.4: Decision Boundaries (the maximum number of epochs for an optimal liver), Value of Peak D, and Time at Peak D Under Different Conditions

Bibliography

[1] Mohamed A Abdallah, Yong-Fang Kuo, Sumeet Asrani, Robert J Wong, Aijaz Ahmed, Paul Kwo, Norah Terrault, Patrick S Kamath, Rajiv Jalan, and Ashwani K Singal. “Validating a novel score based on interaction between ACLF grade and MELD score to predict waitlist mortality”. In: Journal of hepatology 74.6 (2021), pp. 1355–1361.
[2] Jae-Hyeon Ahn and John C. Hornberger. “Involving Patients in the Cadaveric Kidney Transplant Allocation Process: A Decision-Theoretic Perspective”. In: Management Science 42.5 (1996), pp. 629–641.
[3] Oguzhan Alagoz, Jagpreet Chhatwal, and Elizabeth S. Burnside.
“Optimal Policies for Reducing Unnecessary Follow-Up Mammography Exams in Breast Cancer Diagnosis”. In: Decision Analysis 10.3 (2013), pp. 200–224.
[4] Oguzhan Alagoz, Heather E Hsu, Andrew J. Schaefer, and Mark S. Roberts. “Markov Decision Processes: A Tool for Sequential Decision Making under Uncertainty”. In: Medical Decision Making 30 (2010), pp. 474–483.
[5] Oguzhan Alagoz, Lisa M. Maillart, Andrew J. Schaefer, and Mark S. Roberts. “Choosing Among Living-Donor and Cadaveric Livers”. In: Management Science 53.11 (2007), pp. 1702–1715.
[6] Oguzhan Alagoz, Lisa M. Maillart, Andrew J. Schaefer, and Mark S. Roberts. “The Optimal Timing of Living-Donor Liver Transplantation”. In: Management Science 50.10 (2004), pp. 1420–1430.
[7] Linda JS Allen. “Some discrete-time SI, SIR, and SIS epidemic models”. In: Mathematical biosciences 124.1 (1994), pp. 83–105.
[8] Chrysovalantis Anastasiou, Jianfa Lin, Chaoyang He, Yao-Yi Chiang, and Cyrus Shahabi. “Admsv2: A modern architecture for transportation data management and analysis”. In: Proceedings of the 2nd ACM SIGSPATIAL International Workshop on Advances on Resilient and Intelligent Cities. 2019, pp. 25–28.
[9] Mohammad Sanaei Ardekani and Janis M Orlowski. “Multiple listing in kidney transplantation”. In: American Journal of Kidney Diseases 55.4 (2010), pp. 717–725.
[10] Vicente Arroyo, Richard Moreau, Rajiv Jalan, Pere Ginès, and EASL-CLIF Consortium CANONIC Study. “Acute-on-chronic liver failure: a new syndrome that will re-classify cirrhosis”. In: Journal of hepatology 62.1 (2015), S131–S143.
[11] Vicente Arroyo, Richard Moreau, Patrick S Kamath, Rajiv Jalan, Pere Ginès, Frederik Nevens, Javier Fernández, Uyen To, Guadalupe García-Tsao, and Bernd Schnabl. “Acute-on-chronic liver failure in cirrhosis”. In: Nature reviews Disease primers 2.1 (2016), pp. 1–18.
[12] Edilson F Arruda, Basílio B Pereira, Clarissa A Thiers, and Bernardo R Tura.
“Optimal testing policies for diagnosing patients with intermediary probability of disease”. In: Artificial intelligence in medicine 97 (2019), pp. 89–97.
[13] Edilson F Arruda and João Bosco Ribeiro do Val. “Stability and optimality of a multi-product production and storage system under demand uncertainty”. In: European Journal of Operational Research 188.2 (2008), pp. 406–427.
[14] Thierry Artzner, Baptiste Michard, Emmanuel Weiss, Louise Barbier, Zair Noorah, Jean-Claude Merle, Catherine Paugam-Burtz, Claire Francoz, François Durand, Olivier Soubrane, et al. “Liver transplantation for critically ill cirrhotic patients: stratifying utility based on pretransplant factors”. In: American Journal of Transplantation 20.9 (2020), pp. 2437–2448.
[15] Joshua Aurand and Yu-Jui Huang. “Mortality and Healthcare: A Stochastic Control Analysis under Epstein–Zin Preferences”. In: SIAM Journal on Control and Optimization 59.5 (2021), pp. 4051–4080.
[16] AW Avolio, M Siciliano, R Barbarino, E Nure, BE Annicchiarico, A Gasbarrini, S Agnes, and M Castagneto. “Donor risk index and organ patient index as predictors of graft survival after liver transplantation”. In: Transplantation proceedings. Vol. 40. 6. Elsevier. 2008, pp. 1899–1902.
[17] Turgay Ayer, Can Zhang, Chenxi Zeng, Chelsea C White III, and V Roshan Joseph. “Analysis and improvement of blood collection operations: winner—2017 M&SOM practice-based research competition”. In: Manufacturing & Service Operations Management 21.1 (2019), pp. 29–46.
[18] Chaithanya Bandi, Nikolaos Trichakis, and Phebe Vayanos. “Robust Multiclass Queuing Theory for Wait Time Estimation in Resource Allocation Systems”. In: Management Science 65.1 (2019), pp. 152–187.
[19] Richard E Barlow and Frank Proschan. Statistical theory of reliability and life testing: probability models. Tech. rep. Florida State Univ Tallahassee, 1975.
[20] Nicole Bäuerle and Ulrich Rieder. “MDP algorithms for portfolio optimization problems in pure jump markets”.
In: Finance and Stochastics 13.4 (2009), pp. 591–611.
[21] Sebastian Becker, Patrick Cheridito, and Arnulf Jentzen. “Deep optimal stopping”. In: The Journal of Machine Learning Research 20.1 (2019), pp. 2712–2736.
[22] Ross Beckley, Cametria Weatherspoon, Michael Alexander, Marissa Chandler, Anthony Johnson, and Ghan S Bhatt. “Modeling epidemics with differential equations”. In: Tennessee State University Internal Report (2013).
[23] Richard Bellman. “Functional equations in the theory of dynamic programming. V. Positivity and quasi-linearity”. In: Proceedings of the National Academy of Sciences 41.10 (1955), pp. 743–746.
[24] Richard Bellman and Robert Kalaba. “Dynamic programming and statistical communication theory”. In: Proceedings of the National Academy of Sciences 43.8 (1957), pp. 749–751.
[25] Dimitri Bertsekas. Dynamic programming and optimal control: Volume I. Vol. 1. Athena Scientific, 2012.
[26] Jeffrey D Blanchard, Michael Cermak, David Hanle, and Yirong Jing. “Greedy algorithms for joint sparse recovery”. In: IEEE Transactions on Signal Processing 62.7 (2014), pp. 1694–1704.
[27] SM Blower, Katia Koelle, and John Mills. “Health policy modeling: epidemic control, HIV vaccines, and risky behavior”. In: Quantitative evaluation of HIV prevention programs (2002), pp. 260–289.
[28] Alireza Boloori, Soroush Saghafian, Harini A Chakkera, and Curtiss B Cook. “Data-driven management of post-transplant medications: An ambiguous partially observable markov decision process approach”. In: Manufacturing & Service Operations Management 22.5 (2020), pp. 1066–1087.
[29] Richard J Boucherie and Nico M Van Dijk. Markov decision processes in practice. Vol. 248. Springer, 2017.
[30] Fred Brauer. “Compartmental models in epidemiology”. In: Mathematical epidemiology (2008), pp. 19–79.
[31] Alex Brooks, Alexei Makarenko, Stefan Williams, and Hugh Durrant-Whyte. “Parametric POMDPs for planning in continuous state spaces”.
In: Robotics and Autonomous Systems 54.11 (2006), pp. 887–897.
[32] Caltrans. Performance Measurement System (PeMS) Data Source. Available at https://dot.ca.gov/programs/traffic-operations/mpr/pems-source (accessed October 30, 2023). 2023.
[33] Alessandro Calvia, Fausto Gozzi, Francesco Lippi, and Giovanni Zanco. “A simple planning problem for COVID-19 lockdown: a dynamic programming approach”. In: Economic Theory (2023), pp. 1–28.
[34] Muge Capan, Julie S Ivy, James R Wilson, and Jeanne M Huddleston. “A stochastic model of acute-care decisions based on patient and provider heterogeneity”. In: Health care management science 20.2 (June 2017), pp. 187–206. issn: 1386-9620.
[35] Yolanda Carson and Anu Maria. “Simulation optimization: methods and applications”. In: Proceedings of the 29th conference on Winter simulation. 1997, pp. 118–126.
[36] Jagpreet Chhatwal, Oguzhan Alagoz, and Elizabeth S. Burnside. “Optimal Breast Biopsy Decision-Making Based on Mammographic Features and Demographic Factors”. In: Operations Research 58.6 (2010), pp. 1577–1591.
[37] Jagpreet Chhatwal, Suren Jayasuriya, and Elamin H. Elbasha. “Changing Cycle Lengths in State-Transition Models: Challenges and Solutions”. In: Medical Decision Making 36.8 (2016), pp. 952–964.
[38] Shared S Chitgopekar. “Continuous time Markovian sequential control processes”. In: SIAM Journal on Control 7.3 (1969), pp. 367–389.
[39] Sung Eun Choi, Margaret L Brandeau, and Sanjay Basu. “Dynamic treatment selection and modification for personalised blood pressure therapy using a Markov decision process model: a cost-effectiveness analysis”. In: BMJ open 7.11 (2017), e018374.
[40] George Cholankeril, Ryan B Perumpail, Zeynep Tulu, Channa R Jayasekera, Stephen A Harrison, Menghan Hu, Carlos O Esquivel, and Aijaz Ahmed. “Trends in liver transplantation multiple listing practices associated with disparities in donor availability: An endless pursuit to implement the final rule”.
In: Gastroenterology 151.3 (2016), pp. 382–386.
[41] City of Los Angeles Public Health. LA County COVID-19 Data. Available at http://publichealth.lacounty.gov/media/Coronavirus/data/ (accessed October 30, 2023). 2023.
[42] Joan Clària, Vicente Arroyo, and Richard Moreau. “The acute-on-chronic liver failure syndrome, or when the innate immune system goes astray”. In: The Journal of Immunology 197.10 (2016), pp. 3755–3761.
[43] Joan Clària, Rudolf E Stauber, Minneke J Coenraad, Richard Moreau, Rajiv Jalan, Marco Pavesi, Àlex Amorós, Esther Titos, José Alcaraz-Quiles, Karl Oettl, et al. “Systemic inflammation in decompensated cirrhosis: characterization and role in acute-on-chronic liver failure”. In: Hepatology 64.4 (2016), pp. 1249–1264.
[44] KP Croome, P Marotta, WJ Wall, C Dale, MA Levstik, N Chandok, and R Hernandez-Alejandro. “Should a lower quality organ go to the least sick patient? Model for end-stage liver disease score and donor risk index as predictors of early allograft dysfunction”. In: Transplantation proceedings. Vol. 44. 5. Elsevier. 2012, pp. 1303–1306.
[45] Giuseppe Cullaro, Elizabeth C Verna, Brian P Lee, and Jennifer C Lai. “Chronic kidney disease in liver transplant candidates: a rising burden impacting post–liver transplant outcomes”. In: Liver transplantation 26.4 (2020), pp. 498–506.
[46] Israel David and Uri Yechiali. “A Time-dependent Stopping Problem with Application to Live Organ Transplants”. In: Operations Research 33.3 (1985), pp. 491–504.
[47] Brian T. Denton, Murat Kurt, Nilay D. Shah, Sandra C. Bryant, and Steven A. Smith. “Optimizing the Start Time of Statin Therapy for Patients with Diabetes”. In: Medical Decision Making 29.3 (2009), pp. 351–367.
[48] Philip H Dybvig and Hong Liu. “Lifetime consumption and investment: retirement and constrained borrowing”. In: Journal of Economic Theory 145.3 (2010), pp. 885–907.
[49] Isaac Ehrlich. “Uncertain lifetime, life protection, and the value of life saving”.
In: Journal of health economics 19.3 (2000), pp. 341–367.
[50] Wenjuan Fan, Yang Zong, and Subodha Kumar. “Optimal treatment of chronic kidney disease with uncertainty in obtaining a transplantable kidney: an MDP based approach”. In: Annals of Operations Research (2020), pp. 1–34.
[51] S. Feng, N.P. Goodrich, J.L. Bragg-Gresham, D.M. Dykstra, J.D. Punch, M.A. DebRoy, S.M. Greenstein, and R.M. Merion. “Characteristics Associated with Liver Graft Failure: The Concept of a Donor Risk Index”. In: American Journal of Transplantation 6.4 (2006), pp. 783–790.
[52] Sandy Feng, Nathan P Goodrich, Jennifer L Bragg-Gresham, Dawn M Dykstra, Jeffery D Punch, MA DebRoy, Stuart M Greenstein, and Robert M Merion. “Characteristics associated with liver graft failure: the concept of a donor risk index”. In: American journal of transplantation 6.4 (2006), pp. 783–790.
[53] Yuting Fu, Hanqing Jin, Haitao Xiang, and Ning Wang. “Optimal lockdown policy for vaccination during COVID-19 pandemic”. In: Finance research letters 45 (2022), p. 102123.
[54] Ilaria Giannoccaro and Pierpaolo Pontrandolfo. “Inventory management in supply chains: a reinforcement learning approach”. In: International Journal of Production Economics 78.2 (2002), pp. 153–161.
[55] Bob Givan and Ron Parr. “An introduction to Markov decision processes”. In: Purdue University (2001). Available at https://faculty.kfupm.edu.sa/coe/ashraf/RichFilesTeaching/COE101_540/Projects/givan1.pdf (Accessed 6 June 2023).
[56] Raymond C Givens, Todd Dardas, Kevin J Clerkin, Susan Restaino, P Christian Schulze, and Donna M Mancini. “Outcomes of multiple listing for adult heart transplantation in the United States: analysis of OPTN data from 2000 to 2013”. In: JACC: Heart Failure 3.12 (2015), pp. 933–941.
[57] Aditya Goenka, Lin Liu, and Manh-Hung Nguyen. “Infectious diseases and economic growth”. In: Journal of Mathematical Economics 50 (2014), pp. 34–53.
[58] Joel Goh, Mohsen Bayati, Stefanos A Zenios, Sundeep Singh, and David Moore. “Data uncertainty in Markov chains: Application to cost-effectiveness analyses of medical innovations”. In: Operations Research 66.3 (2018), pp. 697–715.
[59] Benjamin Gompertz. “XXIV. On the nature of the function expressive of the law of human mortality, and on a new mode of determining the value of life contingencies. In a letter to Francis Baily, Esq. FRS &c”. In: Philosophical transactions of the Royal Society of London 115 (1825), pp. 513–583.
[60] Veronika Grimm, Friederike Mengel, and Martin Schmidt. “Extensions of the SEIR model for the analysis of tailored social distancing and tracing approaches to cope with COVID-19”. In: Scientific Reports 11.1 (2021), pp. 1–16.
[61] Scott D Grosse. “Assessing cost-effectiveness in healthcare: history of the $50,000 per QALY threshold”. In: Expert review of pharmacoeconomics & outcomes research 8.2 (2008), pp. 165–178.
[62] Thierry Gustot, Javier Fernandez, Elisabet Garcia, Filippo Morando, Paolo Caraceni, Carlo Alessandria, Wim Laleman, Jonel Trebicka, Laure Elkrief, Corinna Hopf, et al. “Clinical course of acute-on-chronic liver failure syndrome and effects on prognosis”. In: Hepatology 62.1 (2015), pp. 243–252.
[63] Tiberiu Harko, Francisco SN Lobo, and MK Mak. “Exact analytical solutions of the Susceptible-Infected-Recovered (SIR) epidemic model and of the SIR model with equal death and birth rates”. In: Applied Mathematics and Computation 236 (2014), pp. 184–194.
[64] Ruben Hernaez, Yan Liu, Jennifer R Kramer, Abbas Rana, Hashem B El-Serag, and Fasiha Kanwal. “Model for end-stage liver disease-sodium underestimates 90-day mortality risk in patients with acute-on-chronic liver failure”. In: Journal of hepatology 73.6 (2020), pp. 1425–1433.
[65] Keita Hirano, Daiki Kobayashi, Naoto Kohtani, Yukari Uemura, Yasuo Ohashi, Yasuhiro Komatsu, Motoko Yanagita, and Akira Hishida.
“Optimal follow-up intervals for different stages of chronic kidney disease: a prospective observational study”. In: Clinical and experimental nephrology 23 (2019), pp. 613–620.
[66] Thomas J Hoerger, John S Wittenborn, Joel E Segel, Nilka R Burrows, Kumiko Imai, Paul Eggers, Meda E Pavkov, Regina Jordan, Susan M Hailpern, Anton C Schoolwerth, et al. “A health policy model of CKD: 1. Model construction, assumptions, and validation of health consequences”. In: American journal of kidney diseases 55.3 (2010), pp. 452–462.
[67] Ronald A Howard. Dynamic Programming and Markov Processes. 1960.
[68] Chuanpu Hu, William S. Lovejoy, and Steven L. Shafer. “Comparison of Some Suboptimal Control Policies in Medical Drug Therapy”. In: Operations Research 44.5 (1996), pp. 696–709.
[69] P Huebener, MR Sterneck, K Bangert, A Drolz, AW Lohse, S Kluge, L Fischer, and V Fuhrmann. “Stabilisation of acute-on-chronic liver failure patients before liver transplantation predicts post-transplant survival”. In: Alimentary pharmacology & therapeutics 47.11 (2018), pp. 1502–1510.
[70] Rajiv Jalan, Faouzi Saliba, Marco Pavesi, Alex Amoros, Richard Moreau, Pere Ginès, Eric Levesque, Francois Durand, Paolo Angeli, Paolo Caraceni, et al. “Development and validation of a prognostic score to predict mortality in patients with acute-on-chronic liver failure”. In: Journal of hepatology 61.5 (2014), pp. 1038–1047.
[71] Arun Jesudian, Sameer Desale, Jonathan Julia, Elizabeth Landry, Christopher Maxwell, Bhaskar Kallakury, Jacqueline Laurin, and Kirti Shetty. “Donor factors including donor risk index predict fibrosis progression, allograft loss, and patient survival following liver transplantation for hepatitis c virus”. In: Journal of Clinical and Experimental Hepatology 6.2 (2016), pp. 109–114.
[72] De Kai, Guy-Philippe Goldstein, Alexey Morgunov, Vishal Nangalia, and Anna Rotkirch.
Universal Masking is Urgent in the COVID-19 Pandemic: SEIR and Agent Based Models, Empirical Validation, Policy Recommendations. 2020. arXiv: 2004.13553 [physics.soc-ph].
[73] Robert M Kaplan, John P Anderson, et al. “The general health policy model: an integrated approach”. In: Quality of life and pharmacoeconomics in clinical trials 2 (1996), pp. 302–322.
[74] Ioannis Karatzas and Hui Wang. “Utility maximization with discretionary stopping”. In: SIAM Journal on Control and Optimization 39.1 (2000), pp. 306–329.
[75] David Kaufman, Andrew J Schaefer, and Mark S Roberts. “Living-donor liver transplantation timing under ambiguous health state transition probabilities”. In: Available at SSRN 3003590 (2017).
[76] Pooyan Kazemian, Jonathan E Helm, Mariel S Lavieri, Joshua D Stein, and Mark P Van Oyen. “Dynamic monitoring and control of irreversible chronic diseases with application to glaucoma”. In: Production and operations management 28.5 (2019), pp. 1082–1107.
[77] William O Kermack and Anderson G McKendrick. “Contributions to the mathematical theory of epidemics–I. 1927.” In: Bulletin of mathematical biology 53.1-2 (1991), pp. 33–55.
[78] William O Kermack and Anderson G McKendrick. “Contributions to the mathematical theory of epidemics—II. The problem of endemicity”. In: Bulletin of mathematical biology 53.1-2 (1991), pp. 57–87.
[79] WO Kermack and AG McKendrick. “Contributions to the mathematical theory of epidemics—III. Further studies of the problem of endemicity”. In: Bulletin of mathematical biology 53.1-2 (1991), pp. 89–118.
[80] Anahita Khojandi, Lisa M Maillart, Oleg A Prokopyev, Mark S Roberts, and Samir F Saba. “Dynamic abandon/extract decisions for failed cardiac leads”. In: Management Science 64.2 (2018), pp. 633–651.
[81] Jacek A Kopec, Philippe Finès, Douglas G Manuel, David L Buckeridge, William M Flanagan, Jillian Oderkirk, Michal Abrahamowicz, Samuel Harper, Behnam Sharif, Anya Okhmatovskaia, et al.
“Validation of population-based disease simulation models: a review of concepts and methods”. In: BMC public health 10 (2010), pp. 1–13. [82] Jennifer E Kreke. “Modeling disease management decisions for patients with pneumonia-related sepsis”. In: (2007). (Doctoral dissertation, University of Pittsburgh). [83] Martin Kröger and Reinhard Schlickeiser. “Analytical solution of the SIR-model for the temporal evolution of epidemics. Part A: time-independent reproduction factor”. In: Journal of Physics A: Mathematical and Theoretical 53.50 (2020), p. 505601. [84] Yarlin Kuo. “Optimal adaptive control policy for joint machine maintenance and product quality control”. In: European Journal of Operational Research 171.2 (2006), pp. 586–597. [85] Murat Kurt, Brian T. Denton, Andrew J. Schaefer, Nilay D. Shah, and Steven A. Smith. “The structure of optimal statin initiation policies for patients with Type 2 diabetes”. In: IIE Transactions on Healthcare Systems Engineering 1.1 (2011), pp. 49–65. [86] Claude Lefèvre. “Optimal Control of a Birth and Death Epidemic Process”. In: Operations Research 29.5 (1981), pp. 971–982. issn: 0030364X, 15265463. 148 [87] Shan Liu, Margaret L. Brandeau, and Jeremy D. Goldhaber-Fiebert. “Optimizing patient treatment decisions in an era of rapid technological advances: the case of hepatitis C treatment”. eng. In: Health care management science 20.1 (Mar. 2017), pp. 16–32. issn: 1386-9620. [88] Elisa F Long, Eike Nohdurft, and Stefan Spinler. “Spatial resource allocation for emerging epidemics: A comparison of greedy, myopic, and dynamic policies”. In: Manufacturing & Service Operations Management 20.2 (2018), pp. 181–198. [89] William S Lovejoy. “Computationally feasible bounds for partially observed Markov decision processes”. In: Operations research 39.1 (1991), pp. 162–175. [90] Paolo Magni, Silvana Quaglini, Monia Marchetti, and Giovanni Barosi. “Deciding when to intervene: a Markov decision process approach”. 
In: International Journal of Medical Informatics 60.3 (2000), pp. 237–253. issn: 1386-5056. [91] Nadim Mahmud, Ruben Hernaez, Tiffany Wu, and Vinay Sundaram. “Early Transplantation in Acute on Chronic Liver Failure: Who and When”. In: Current Hepatology Reports 19.3 (Sept. 2020), pp. 168–173. issn: 2195-9595. [92] Nadim Mahmud, David E Kaplan, Tamar H Taddei, and David S Goldberg. “Incidence and mortality of acute-on-chronic liver failure using two definitions in patients with compensated cirrhosis”. In: Hepatology 69.5 (2019), pp. 2150–2163. [93] Lisa M. Maillart, Julie Simmons Ivy, Scott Ransom, and Kathleen Diehl. “Assessing Dynamic Breast Cancer Screening Policies”. In: Operations Research 56.6 (2008), pp. 1411–1427. [94] Braden Manns, Brenda Hemmelgarn, Marcello Tonelli, Flora Au, T Carter Chiasson, James Dong, and Scott Klarenbach. “Population based screening for chronic kidney disease: cost effectiveness study”. In: Bmj 341 (2010). [95] J.E. Mason, B.T. Denton, N.D. Shah, and S.A. Smith. “Optimizing the simultaneous management of blood pressure and cholesterol for type 2 diabetes patients”. In: European Journal of Operational Research 233.3 (2014), pp. 727–738. issn: 0377-2217. [96] Laura Matrajt, Julia Eaton, Tiffany Leung, Dobromir Dimitrov, Joshua T Schiffer, David A Swan, and Holly Janes. “Optimizing vaccine allocation for COVID-19 vaccines shows the potential role of single-dose vaccination”. In: Nature communications 12.1 (2021), p. 3449. [97] Rabi G Mishalani and Samer M Madanat. “Computation of infrastructure transition probabilities using stochastic duration models”. In: Journal of Infrastructure systems 8.4 (2002), pp. 139–148. [98] Richard Moreau, Rajiv Jalan, Pere Gines, Marco Pavesi, Paolo Angeli, Juan Cordoba, Francois Durand, Thierry Gustot, Faouzi Saliba, Marco Domenicali, et al. “Acute-on-chronic liver failure is a distinct syndrome that develops in patients with acute decompensation of cirrhosis”. In: Gastroenterology 144.7 (2013), pp. 
1426–1437. [99] Remi Munos and Andrew Moore. “Variable resolution discretization in optimal control”. In: Machine learning 49 (2002), pp. 291–323. 149 [100] JP New, RJ Middleton, B Klebe, CKT Farmer, S De Lusignan, PE Stevens, and DJ O’Donoghue. “Assessing the prevalence, monitoring and management of chronic kidney disease in patients with diabetes compared with those without diabetes in general practice”. In: Diabetic medicine 24.4 (2007), pp. 364–369. [101] Eric S Orman, Maria E Mayorga, Stephanie B Wheeler, Rachel M Townsley, Hector H Toro-Diaz, Paul H Hayashi, and A Sidney Barritt IV. “Declining liver graft quality threatens the future of liver transplantation in the United States”. In: Liver Transplantation 21.8 (2015), pp. 1040–1050. [102] John Carlos Pedrozo-Pupo, Maria Jose Pedrozo-Cortes, and Adalberto Campo-Arias. “Perceived stress associated with COVID-19 epidemic in Colombia: an online survey”. In: Cadernos de saude publica 36 (2020), e00090520. [103] Goran Peskir and Albert Shiryaev. Optimal stopping and free-boundary problems. Springer, 2006. [104] Facundo Piguillem and Liyan Shi. “Optimal COVID-19 quarantine and testing policies”. In: The Economic Journal 132.647 (2022), pp. 2534–2562. [105] Lev Semenovich Pontryagin. Mathematical theory of optimal processes. Routledge, 2018. [106] Aviva Prins, Aditya Mate, Jackson A Killian, Rediet Abebe, and Milind Tambe. “Incorporating Healthcare Motivated Constraints in Restless Bandit Based Resource Allocation”. In: preprint (2020). [107] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. 1st. USA: John Wiley & Sons, Inc., 1994. isbn: 0471619779. [108] Julie Ratcliffe, Tracey Young, Louise Longworth, and Martin Buxton. “An Assessment of the Impact of Informative Dropout and Nonresponse in Measuring Health-Related Quality of Life Using the EuroQol (EQ-5D) Descriptive System”. In: Value in Health 8.1 (2005), pp. 53–58. issn: 1098-3015. 
[109] Matthew Redelings, Loren Lieb, and Frank Sorvillo. “Years off your life? The effects of homicide on life expectancy by neighborhood and race/ethnicity in Los Angeles County”. In: Journal of Urban Health 87 (2010), pp. 670–676. [110] Laura H Rosenberger, Jacob R Gillen, Tjasa Hranjec, Jayme B Stokes, Kenneth L Brayman, Sean C Kumer, Timothy M Schmitt, and Robert G Sawyer. “Donor risk index predicts graft failure reliably but not post-transplant infections”. In: Surgical infections 15.2 (2014), pp. 94–98. [111] Burhaneddin Sandıkçı, Lisa M Maillart, Andrew J Schaefer, Oguzhan Alagoz, and Mark S Roberts. “Estimating the patient’s price of privacy in liver transplantation”. In: Operations Research 56.6 (2008), pp. 1393–1410. [112] Burhaneddin Sandıkçı, Lisa M Maillart, Andrew J Schaefer, and Mark S Roberts. “Alleviating the patient’s price of privacy through a partially observable waiting list”. In: Management Science 59.8 (2013), pp. 1836–1854. 150 [113] Andrew J. Schaefer, Matthew D. Bailey, Steven M. Shechter, and Mark S. Roberts. “Modeling Medical Treatment Using Markov Decision Processes”. In: Operations Research and Health Care: A Handbook of Methods and Applications. Ed. by Margaret L. Brandeau, François Sainfort, and William P. Pierskalla. Boston, MA: Springer US, 2004, pp. 593–612. isbn: 978-1-4020-8066-1. [114] Greggory J Schell, Gian-Gabriel P Garcia, Mariel S Lavieri, Jeremy B Sussman, and Rodney A Hayward. “Optimal coinsurance rates for a heterogeneous population under inequality and resource constraints”. In: IISE Transactions 51.1 (2019), pp. 74–91. [115] Pawana Sharma, Rachel C Blackburn, Claire L Parke, Keith McCullough, Angharad Marks, and Corri Black. “Angiotensin-converting enzyme inhibitors and angiotensin receptor blockers for adults with early (stage 1 to 3) non-diabetic chronic kidney disease”. In: Cochrane Database of Systematic Reviews 10 (2011). [116] Steven M. Shechter, Matthew D. Bailey, Andrew J. Schaefer, and Mark S. Roberts. 
“The Optimal Time to Initiate HIV Therapy under Ordered Health States”. In: Operations Research 56.1 (2008), pp. 20–33. issn: 0030364X, 15265463. [117] Albert N Shiryaev. Optimal stopping rules. Vol. 8. Springer Science & Business Media, 2007. [118] Rahul Singh, Fang Liu, and Ness B Shroff. “A Partially Observable MDP Approach for Sequential Testing for Infectious Diseases such as COVID-19”. In: arXiv preprint arXiv:2007.13023 (2020). [119] M Reza Skandari and Steven M Shechter. “Patient-type Bayes-adaptive treatment plans”. In: Operations Research 69.2 (2021), pp. 574–598. [120] Frank A. Sonnenberg and J. Robert Beck. “Markov Models in Medical Decision Making: A Practical Guide”. In: Medical Decision Making 13.4 (1993), pp. 322–338. [121] Sze-chuan Suen, Eran Bendavid, and Jeremy D Goldhaber-Fiebert. “Disease control implications of India’s changing multi-drug resistant tuberculosis epidemic”. In: PloS one 9.3 (2014), e89822. [122] Vinay Sundaram and Rajiv Jalan. “Reply”. In: Gastroenterology 157.4 (2019), pp. 1163–1164. issn: 0016-5085. doi: https://doi.org/10.1053/j.gastro.2019.08.004. [123] Vinay Sundaram, Rajiv Jalan, Joseph C Ahn, Michael R Charlton, David S Goldberg, Constantine J Karvellas, Mazen Noureddin, and Robert J Wong. “Class III obesity is a risk factor for the development of acute-on-chronic liver failure in patients with decompensated cirrhosis”. In: Journal of hepatology 69.3 (2018), pp. 617–625. [124] Vinay Sundaram, Rajiv Jalan, Parth Shah, Ashwani K Singal, Arpan A Patel, Tiffany Wu, Mazen Noureddin, Nadim Mahmud, and Robert J Wong. “Acute on chronic liver failure from nonalcoholic fatty liver disease: a growing and aging cohort with rising mortality”. In: Hepatology 73.5 (2021), pp. 1932–1944. [125] Vinay Sundaram, Rajiv Jalan, Tiffany Wu, Michael L Volk, Sumeet K Asrani, Andrew S Klein, and Robert J Wong. “Factors associated with survival of patients with severe acute-on-chronic liver failure before and after liver transplantation”. 
In: Gastroenterology 156.5 (2019), pp. 1381–1391. 151 [126] Vinay Sundaram, Shannon Kogachi, Robert J Wong, Constantine J Karvellas, Brett E Fortune, Nadim Mahmud, Josh Levitsky, Robert S Rahimi, and Rajiv Jalan. “Effect of the clinical course of acute-on-chronic liver failure prior to liver transplantation on post-transplant survival”. In: Journal of Hepatology 72.3 (2020), pp. 481–488. [127] Vinay Sundaram, Nadim Mahmud, Giovanni Perricone, Dev Katarey, Robert J Wong, Constantine J Karvellas, Brett E Fortune, Robert S Rahimi, Harapriya Maddur, Janice H Jou, et al. “Longterm outcomes of patients undergoing liver transplantation for acute-on-chronic liver failure”. In: Liver Transplantation 26.12 (2020), pp. 1594–1602. [128] Vinay Sundaram, Parth Shah, Nadim Mahmud, Christina C Lindenmeyer, Andrew S Klein, Robert J Wong, Constantine J Karvellas, Sumeet K Asrani, and Rajiv Jalan. “Patients with severe acute-on-chronic liver failure are disadvantaged by model for end-stage liver disease-based organ allocation policy”. In: Alimentary pharmacology & therapeutics 52.7 (2020), pp. 1204–1213. [129] Vinay Sundaram, Parth Shah, Robert J Wong, Constantine J Karvellas, Brett E Fortune, Nadim Mahmud, Alexander Kuo, and Rajiv Jalan. “Patients with acute on chronic liver failure grade 3 have greater 14-day waitlist mortality than status-1a patients”. In: Hepatology 70.1 (2019), pp. 334–345. [130] Vinay Sundaram Suyanpeng Zhang Sze-Chuan Suen and Cynthia L. Gong. “Quantifying the benefits of increasing decision-making frequency for health applications with regular decision epochs”. In: IISE Transactions 0.0 (2024), pp. 1–15. doi: 10.1080/24725854.2024.2321492. eprint: https://doi.org/10.1080/24725854.2024.2321492. [131] Hindia Tahir, Leslie L Jackson, and David G Warnock. “Antiproteinuric therapy and Fabry nephropathy: sustained reduction of proteinuria in patients receiving enzyme replacement therapy with agalsidase-β”. 
In: Journal of the American Society of Nephrology 18.9 (2007), pp. 2609–2617. [132] Thomas R Talbot, Suzanne F Bradley, Sara E Cosgrove, Christian Ruef, Jane D Siegel, and David J Weber. “Influenza vaccination of healthcare workers and vaccine allocation for healthcare workers during vaccine shortages”. In: Infection Control & Hospital Epidemiology 26.11 (2005), pp. 882–890. [133] Paul J Thuluvath, Avesh J Thuluvath, Steven Hanish, and Yulia Savva. “Liver transplantation in patients with multiple organ failures: feasibility and outcomes”. In: Journal of hepatology 69.5 (2018), pp. 1047–1056. [134] Donald M Topkis. Supermodularity and complementarity. Princeton university press, 2011. [135] Jonel Trebicka, Javier Fernandez, Maria Papp, Paolo Caraceni, Wim Laleman, Carmine Gambino, Ilaria Giovo, Frank Erhard Uschner, Cesar Jimenez, Rajeshwar Mookerjee, et al. “The PREDICT study uncovers three clinical courses of acutely decompensated cirrhosis that have distinct pathophysiology”. In: Journal of hepatology 73.4 (2020), pp. 842–854. [136] Sabrina Trippoli. “Incremental cost-effectiveness ratio and net monetary benefit: current use in pharmacoeconomics and future perspectives”. In: European journal of internal medicine 43 (2017), e36. 152 [137] Ashleigh R Tuite, Ann N Burchell, and David N Fisman. “Cost-effectiveness of enhanced syphilis screening among HIV-positive men who have sex with men: a microsimulation model”. In: PloS one 9.7 (2014), e101240. [138] U.S. Department of Health & Human Services. Multiple listing. Availible at https://optn.transplant.hrsa.gov/patients/about-transplantation/multiple-listing/ (accessed June 9, 2023). 2023. url: https://optn.transplant.hrsa.gov/patients/by-organ/liver/questionsand-answers-about-liver-allocation/. [139] U.S. Department of Health & Human Services. Questions and answers about liver allocation. 
Availible at https://optn.transplant.hrsa.gov/patients/by-organ/liver/questions-and-answersabout-liver-allocation/ (accessed November 3, 2022). 2022. url: https://optn.transplant.hrsa.gov/patients/by-organ/liver/questions-and-answers-about-liverallocation/. [140] UCSF Health. FAQ: Getting a Liver Transplant. Availible at https://www.ucsfhealth.org/education/faq-getting-a-liver-transplant (accessed June 9, 2023). 2023. [141] United Network for Organ Sharing. How we collect data. Availible at https://unos.org/data/ (accessed December 14, 2020). 2020. [142] University of California San Francisco. End-stage Liver Disease (ESLD). Availible at https://surgery.ucsf.edu/conditions--procedures/end-stage-liver-disease-(esld).aspx (accessed November 3, 2022). 2021. url: https://surgery.ucsf.edu/conditions--procedures/end-stage-liver-disease-(esld).aspx. [143] UW Health. Multiple listing with the UW Health transplant center. Availible at https://patient.uwhealth.org/education/multiple-waitlist-listings (accessed June 9, 2023). 2023. [144] Parsia A Vagefi, Sandy Feng, Jennifer L Dodge, James F Markmann, and John P Roberts. “Multiple listings as a reflection of geographic disparity in liver transplantation”. In: Journal of the American College of Surgeons 219.3 (2014), pp. 496–504. [145] Jan P Vandenbroucke, Erik von Elm, Douglas G Altman, Peter C Gøtzsche, Cynthia D Mulrow, Stuart J Pocock, Charles Poole, James J Schlesselman, Matthias Egger, and Strobe Initiative. “Strengthening the Reporting of Observational Studies in Epidemiology (STROBE): explanation and elaboration”. In: Annals of internal medicine 147.8 (2007), W–163. [146] Alex Veldman, Michael Diefenbach, Doris Fischer, Alida Benton, and Richard Bloch. “Long-distance transport of ventilated patients: advantages and limitations of air medical repatriation on commercial airlines”. In: Air medical journal 23.2 (2004), pp. 24–28. [147] Michael L Volk, Heidi A Reichert, Anna SF Lok, and Rodney A Hayward. 
“Variation in organ quality between liver transplant centers”. In: American Journal of Transplantation 11.5 (2011), pp. 958–964. 153 [148] Hongzhou Wang. “A survey of maintenance policies of deteriorating systems”. In: European journal of operational research 139.3 (2002), pp. 469–489. [149] Rebecca K Webster, Samantha K Brooks, Louise E Smith, Lisa Woodland, Simon Wessely, and G James Rubin. “How to improve adherence with quarantine: rapid review of the evidence”. In: Public health 182 (2020), pp. 163–169. [150] Milton C. Weinstein, George Torrance, and Alistair McGuire. “QALYs: The Basics”. In: Value in Health 12 (2009), S5–S9. issn: 1098-3015. [151] Emmanuel Weiss, Mikhael Giabicani, and Fuat Saner. “Do liver transplantation criteria differ according to the type of donor and the severity of the recipient?” In: Transplantation 105.3 (2021), e32. [152] Charles D. Wells, Wayne Murrill, and Miguel R. Arguedas. “Comparison of Health-Related Quality of Life Preferences Between Physicians and Cirrhotic Patients: Implications for Cost–Utility Analyses in Chronic Liver Disease”. In: Digestive Diseases and Sciences 49 (2004), pp. 453–458. [153] Robert J Wong, Maria Aguilar, Ramsey Cheung, Ryan B Perumpail, Stephen A Harrison, Zobair M Younossi, and Aijaz Ahmed. “Nonalcoholic steatohepatitis is the second leading etiology of liver disease among adults awaiting liver transplantation in the United States”. In: Gastroenterology 148.3 (2015), pp. 547–555. [154] Chun-Chou Wu, Yiwen Cao, Sze-chuan Suen, and Eugene Lin. “Examining Chronic Kidney Disease Screening Frequency Among Diabetics: A POMDP Approach”. In: Available at SSRN 4544591 (2023). [155] Chunxue Wu, Chong Luo, Naixue Xiong, Wei Zhang, and Tai-Hoon Kim. “A greedy deep learning method for medical disease analysis”. In: IEEE Access 6 (2018), pp. 20021–20030. [156] Tiffany Wu and Vinay Sundaram. “Transplantation for acute-on-chronic liver failure”. In: Clinical liver disease 14.4 (2019), p. 152. 
[157] Reza Yaesoubi and Ted Cohen. “Generalized Markov models of infectious disease spread: A novel framework for developing dynamic health policies”. In: European journal of operational research 215.3 (2011), pp. 679–687. [158] Han Yu, Suyanpeng Zhang, Sze-chuan Suen, Maged Dessouky, and Fernando Ordonez. “Extending Dynamic Origin-Destination Estimation to Understand Traffic Patterns During COVID-19”. In: arXiv preprint arXiv:2401.10308 (2024). [159] Manaf Zargoush, Mehmet Gümüş, Vedat Verter, and Stella S Daskalopoulou. “Designing risk-adjusted therapy for patients with hypertension”. In: Production and Operations Management 27.12 (2018), pp. 2291–2312. [160] Fang Zhang, Anita K Wagner, and Dennis Ross-Degnan. “Simulation-based power calculation for designing interrupted time series analyses of health policy interventions”. In: Journal of clinical epidemiology 64.11 (2011), pp. 1252–1261. 154 [161] Suyanpeng Zhang, Sze-Chuan Suen, Cynthia L Gong, Jessica Pham, Jonel Trebicka, Christophe Duvoux, Andrew S Klein, Tiffany Wu, Rajiv Jalan, and Vinay Sundaram. “Early transplantation maximizes survival in severe acute-on-chronic liver failure: results of a Markov decision process model”. In: JHEP Reports (2021), p. 100367. [162] ZiYan Zhao, MengChu Zhou, and ShiXin Liu. “Iterated greedy algorithms for flow-shop scheduling problems: A tutorial”. In: IEEE Transactions on Automation Science and Engineering (2021). [163] Enlu Zhou, Michael C Fu, and Steven I Marcus. “Solving continuous-state POMDPs via density projection”. In: IEEE Transactions on Automatic Control 55.5 (2010), pp. 1101–1116. 155
Abstract
Repeated decision-making problems under uncertainty arise naturally in healthcare settings. Markov decision processes (MDPs) have proven useful in many healthcare contexts, integrating disease progression, decision-making, costs, and benefits into a single optimization framework. However, implementing MDPs in healthcare settings is nontrivial: challenges include incorporating the unique characteristics of particular diseases, determining the optimal frequency of decision-making, and dealing with infinitely many possible states.
In this dissertation, we address these challenges in specific healthcare problems by identifying key structural properties of the underlying models. We present a finite-horizon MDP framework for patients with acute-on-chronic liver failure in need of a transplant, determining the optimal timing for accepting a suboptimal organ to maximize one-year survival probability. We then study the value provided by additional decision-making opportunities within each epoch: we establish structural properties of the optimal policies and quantify the difference in optimal values between MDP problems with different decision-making frequencies, illustrating our findings with numerical examples on liver transplantation in high-risk patients and treatment initiation for chronic kidney disease patients. Finally, in the fourth chapter, to address the curse of dimensionality, we propose a novel greedy algorithm for non-uniform state discretization in a population-level MDP for infectious disease control.
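The accept-or-wait structure of the transplant-timing problem can be sketched with standard finite-horizon backward induction. The sketch below is purely illustrative: the states, transition probabilities, and post-transplant survival values are placeholder numbers, not the dissertation's calibrated model.

```python
# Minimal backward-induction sketch of a finite-horizon MDP for an
# accept-or-wait organ decision. All numbers are illustrative placeholders.

# States: 0 = moderate illness, 1 = severe illness, 2 = dead (absorbing).
# Actions: "wait" (health evolves stochastically) or "accept" (transplant
# now, receiving a state-dependent one-year survival probability).
T = 12  # decision epochs (e.g., weeks on the waitlist)

# P[s][s']: illustrative per-epoch health-transition probabilities under "wait"
P = [
    [0.85, 0.10, 0.05],
    [0.15, 0.65, 0.20],
    [0.00, 0.00, 1.00],
]
# Illustrative one-year survival if a suboptimal organ is accepted in state s
# (the terminal reward of the stopping action).
survival_if_accept = [0.75, 0.30, 0.0]

# V[t][s]: maximum survival probability from epoch t onward in state s.
V = [[0.0] * 3 for _ in range(T + 1)]
V[T] = survival_if_accept[:]  # at the horizon, accepting is the only option
policy = [[None] * 3 for _ in range(T)]

for t in range(T - 1, -1, -1):  # backward induction over epochs
    for s in range(3):
        wait = sum(P[s][sp] * V[t + 1][sp] for sp in range(3))
        accept = survival_if_accept[s]
        V[t][s] = max(wait, accept)
        # Action recorded for the dead state is irrelevant (absorbing).
        policy[t][s] = "accept" if accept >= wait else "wait"

print(policy[0])  # optimal first-epoch action in each health state
```

With these placeholder numbers, a moderately ill patient accepts immediately, while a severely ill patient waits, hoping to stabilize before transplant; the real chapter derives when such threshold structure holds rather than assuming particular values.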
This dissertation contributes to healthcare decision-making by providing practical MDP frameworks and efficient algorithms for complex sequential decision problems. The theoretical results and empirical analyses offer guidance for healthcare decision-makers across diverse scenarios.
Asset Metadata
Creator
Zhang, Suyanpeng
(author)
Core Title
Optimizing healthcare decision-making: Markov decision processes for liver transplants, frequent interventions, and infectious disease control
School
Viterbi School of Engineering
Degree
Doctor of Philosophy
Degree Program
Industrial and Systems Engineering
Degree Conferral Date
2024-05
Publication Date
05/21/2024
Defense Date
04/26/2024
Publisher
Los Angeles, California
(original),
University of Southern California
(original),
University of Southern California. Libraries
(digital)
Tag
dynamic programming,infectious disease control,liver transplant,Markov decision process,OAI-PMH Harvest,state discretization
Format
theses
(aat)
Language
English
Contributor
Electronically uploaded by the author
(provenance)
Advisor
Suen, Sze-Chuan (
committee chair
), Hall, Randolph (
committee member
), Wu, Shinyi (
committee member
)
Creator Email
suyanpen@usc.edu,suyanpengzhang@gmail.com
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-oUC113950167
Unique identifier
UC113950167
Identifier
etd-ZhangSuyan-12978.pdf (filename)
Legacy Identifier
etd-ZhangSuyan-12978
Document Type
Dissertation
Rights
Zhang, Suyanpeng
Internet Media Type
application/pdf
Type
texts
Source
20240521-usctheses-batch-1157
(batch),
University of Southern California
(contributing entity),
University of Southern California Dissertations and Theses
(collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright.
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Repository Email
cisadmin@lib.usc.edu